diff --git a/.circleci/config.yml b/.circleci/config.yml index 6f3b4af86..d640cbf91 100644 --- a/.circleci/config.yml +++ b/.circleci/config.yml @@ -60,7 +60,7 @@ jobs: - checkout - run: name: 'Check Snapshots' - command: "./configure && make check-sql-snapshots -C src/backend/distributed" + command: ci/check_sql_snapshots.sh test-11_check-multi: docker: - image: 'citus/exttester-11:latest' @@ -252,7 +252,7 @@ jobs: - checkout - run: command: | - ./src/test/scripts/check_enterprise_merge.sh + ci/check_enterprise_merge.sh ch_benchmark: docker: - image: buildpack-deps:stretch diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index deff4b45f..b667dc455 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -110,31 +110,43 @@ why we ask this as well as instructions for how to proceed, see the ### Following our coding conventions -CircleCI will automatically reject any PRs which do not follow our coding conventions, it -won't even run tests! The easiest way to ensure your PR adheres to those conventions is +CircleCI will automatically reject any PRs which do not follow our coding +conventions. The easiest way to ensure your PR adheres to those conventions is to use the [citus_indent](https://github.com/citusdata/tools/tree/develop/uncrustify) -tool. +tool. This tool uses `uncrustify` under the hood. - ```bash - # Ubuntu does have uncrustify in the package manager however it's an older - # version which doesn't work with our citus-style.cfg file. We require version - # 0.60 or greater. If your package manager has a more recent version of uncrustify - # feel free to use that instead of installing from source - git clone --branch uncrustify-0.60 https://github.com/uncrustify/uncrustify.git - pushd uncrustify - ./configure - sudo make install - popd +```bash +# Uncrustify changes the way it formats code every release a bit. To make sure +# everyone formats consistently we use version 0.68.1: +curl -L https://github.com/uncrustify/uncrustify/archive/uncrustify-0.68.1.tar.gz | tar xz +cd uncrustify-uncrustify-0.68.1/ +mkdir build +cd build +cmake .. +make -j5 +sudo make install +cd ../.. - git clone https://github.com/citusdata/tools.git - pushd tools/uncrustify - make install - popd - ``` +git clone https://github.com/citusdata/tools.git +cd tools +make uncrustify/.install +``` -Once you've done that, you can run the `make reindent` command from the top directory to recursively check and -correct the style of any source files in the current directory. Under the hood, `make reindent` will run `citus_indent` and some -other style corrections for you. +Once you've done that, you can run the `make reindent` command from the top +directory to recursively check and correct the style of any source files in the +current directory. Under the hood, `make reindent` will run `citus_indent` and +some other style corrections for you. + +You can also run the following in the directory of this repository to +automatically format all the files that you have changed before committing: + +```bash +cat > .git/hooks/pre-commit << __EOF__ +#!/bin/bash +citus_indent --check --diff || { citus_indent --diff; exit 1; } +__EOF__ +chmod +x .git/hooks/pre-commit +``` ### Making SQL changes diff --git a/ci/README.md b/ci/README.md new file mode 100644 index 000000000..f9afb7bd2 --- /dev/null +++ b/ci/README.md @@ -0,0 +1,179 @@ +# CI scripts + +We have a few scripts that we run in CI to confirm that code confirms to our +standards. Be sure you have followed the setup in the [Following our coding +conventions](https://github.com/citusdata/citus/blob/master/CONTRIBUTING.md#following-our-coding-conventions) +section of `CONTRIBUTING.md`. Once you've done that, most of them should be +fixed automatically, when running: +``` +make reindent +``` + +See the sections below for details on what a specific failing script means. + +## `citus_indent` + +We format all our code using the coding conventions in the +[citus_indent](https://github.com/citusdata/tools/tree/develop/uncrustify) +tool. This tool uses `uncrustify` under the hood. See [Following our coding +conventions](https://github.com/citusdata/citus/blob/master/CONTRIBUTING.md#following-our-coding-conventions) on how to install this. + +## `editorconfig.sh` + +You should install the Editorconfig plugin for your editor/IDE +https://editorconfig.org/ + +## `banned.h.sh` + +You're using a C library function that is banned by Microsoft, mostly because of +risk for buffer overflows. This page lists the Microsoft suggested replacements: +https://liquid.microsoft.com/Web/Object/Read/ms.security/Requirements/Microsoft.Security.SystemsADM.10082#guide +These replacements are only available on Windows normally. Since we build for +Linux we make most of them available with this header file: +```c +#include "distributed/citus_safe_lib.h" +``` +This uses https://github.com/intel/safestringlib to provide them. + +However, still not all of them are available. For those cases we provide +some extra functions in `citus_safe_lib.h`, with similar functionality. + +If none of those replacements match your requirements you have to do one of the +following: +1. Add a replacement to `citus_safe_lib.{c,h}` that handles the same error cases + that the `{func_name}_s` function that Microsoft suggests. +2. Add a `/* IGNORE-BANNED */` comment to the line that complains. Doing this + requires also adding a comment before explaining why this specific use of the + function is safe. + +## `check_enterprise_merge.sh` + +When you open a PR on community, if it creates a conflict with +enterprise-master, the check-merge-to-enterprise will fail. Say your branch name +is `$PR_BRANCH`, we will refer to `$PR_BRANCH` on community as +`community/$PR_BRANCH` and on enterprise as `enterprise/$PR_BRANCH`. If the +job already passes, you are done, nothing further required! Otherwise follow the +below steps. First make sure these two things are the case: + +1. Get approval from your reviewer for `community/$PR_BRANCH`. Only follow the + next steps after you are about to merge the branch to community master. +2. Make sure your commits are in a nice state, since you should not do + "squash and merge" on Github later. Otherwise you will certainly get + duplicate commits and possibly get merge conflicts with enterprise again. + +Once that's done, you need to create a merged version of your PR branch on the +enterprise repo. For example if `community` is added as a remote in +your enterprise repo, you can do the following: + +```bash +export PR_BRANCH= +git checkout enterprise-master +git pull # Make sure your local enterprise-master is up to date +git fetch community # Fetch your up to date branch name +git checkout -b "$PR_BRANCH" enterprise-master +``` +Now you have X in your enterprise repo, which we refer to as +`enterprise/$PR_BRANCH` (even though in git commands you would reference it as +`origin/$PR_BRANCH`). This branch is currently the same as `enterprise-master`. +First to make review easier, you should merge community master into it. This +should apply without any merge conflicts: + +```bash +git merge community/master +``` +Now you need to merge `community/$PR_BRANCH` to `enterprise/$PR_BRANCH`. Solve +any conflicts and make sure to remove any parts that should not be in enterprise +even though it doesn't have a conflict, on enterprise repository: + +```bash +git merge "community/$PR_BRANCH" +``` + +1. You should push this branch to the enterprise repo. This is so that the job + on community will see this branch. +2. Wait until tests on `enterprise/$PR_BRANCH` pass. +3. Create a PR on the enterprise repo for your `enterprise/$PR_BRANCH` branch. +4. You should get approval for the merge conflict changes on + `enterprise/$PR_BRANCH`, preferably from the same reviewer as they are + familiar with the change. +5. You should rerun the `check-merge-to-enterprise` check on + `community/$PR_BRANCH`. You can use re-run from failed option in circle CI. +6. You can now merge the PR on community. Be sure to NOT use "squash and merge", + but instead use the regular "merge commit" mode. +7. You can now merge the PR on enterprise. Be sure to NOT use "squash and merge", + but instead use the regular "merge commit" mode. + +The subsequent PRs on community will be able to pass the +`check-merge-to-enterprise` check as long as they don't have a conflict with +`enterprise-master`. + +## `check_sql_snapshots.sh` + +To allow for better diffs during review we have snapshots of SQL UDFs. This +means that `latest.sql` is not up to date with the SQL file of the highest +version number in the directory. The output of the script shows you what is +different. + +## `disallow_c_comments_in_migrations.sh` + +We do not use C-style comments in migration files as the stripped +zero-length migration files cause warning during packaging. +Instead use SQL type comments, i.e: +``` +-- this is a comment +``` +See [#3115](https://github.com/citusdata/citus/pull/3115) for more info. + + +## `disallow_long_changelog_entries.sh` + +Having changelog items with entries that are longer than 80 characters are +forbidden. It's allowed to split up the entry over multiple lines, as long as +each line of the entry is 80 characters or less. + +## `normalize_expected.sh` + +All files in `src/test/expected` should be committed in normalized form. +This error mostly happens if someone added a new normalization rule and you have +not rerun tests that you have added. + +We normalize the test output files using a `sed` script called +[`normalize.sed`](https://github.com/citusdata/citus/blob/master/src/test/regress/bin/normalize.sed). +The reason for this is that some output changes randomly in ways we don't care +about. An example of this is when an error happens on a different port number, +or a different worker shard, or a different placement, etc. Either randomly or +because we are running the tests in a slightly different configuration. + +## `remove_useless_declarations.sh` + +This script tries to make sure that we don't add useless declarations to our +code. What it effectively does is replace this: +```c +int a = 0; +int b = 2; +Assert(b == 2); +a = b + b; +``` +With this equivalent, but shorter version: +```c +int b = 2; +Assert(b == 2); +int a = b + b; +``` + +It relies on the fact that `citus_indent` formats our code in certain ways. So +before running this script, make sure that you've done that. +This replacement is all done using a [regex replace](xkcd.com/1171), so it's +definitely possible there's a bug in there. So far no bad ones have been found. + +A known issue is that it does not replace code in a block after an `#ifdef` like +this. +```c +int foo = 0; +#ifdef SOMETHING +foo = 1 +#else +foo = 2 +#endif +``` +This was deemed to be error prone and not worth the effort. diff --git a/ci/banned.h.sh b/ci/banned.h.sh index 6b91e6072..d814d9292 100755 --- a/ci/banned.h.sh +++ b/ci/banned.h.sh @@ -1,4 +1,4 @@ -#!/bin/sh +#!/bin/bash # Checks for the APIs that are banned by microsoft. Since we compile for Linux # we use the replacements from https://github.com/intel/safestringlib @@ -11,6 +11,8 @@ # https://liquid.microsoft.com/Web/Object/Read/ms.security/Requirements/Microsoft.Security.SystemsADM.10082#guide set -eu +# shellcheck disable=SC1091 +source ci/ci_helpers.sh files=$(find src -iname '*.[ch]' | git check-attr --stdin citus-style | grep -v ': unset$' | sed 's/: citus-style: set$//') diff --git a/src/test/scripts/check_enterprise_merge.sh b/ci/check_enterprise_merge.sh similarity index 69% rename from src/test/scripts/check_enterprise_merge.sh rename to ci/check_enterprise_merge.sh index 007c27d10..e2b12b7eb 100755 --- a/src/test/scripts/check_enterprise_merge.sh +++ b/ci/check_enterprise_merge.sh @@ -14,33 +14,8 @@ set -o pipefail PR_BRANCH="${CIRCLE_BRANCH}" ENTERPRISE_REMOTE="https://${GIT_USERNAME}:${GIT_TOKEN}@github.com/citusdata/citus-enterprise" -# For echo commands "set -x" would show the message effectively twice. Once as -# part of the echo command shown by "set -x" and once because of the output of -# the echo command. We do not want "set -x" to show the echo command. We only -# want to see the actual message in the output of echo itself. This function is -# a trick to do so. Read the StackOverflow post below to understand why this -# works and what this works around. -# Source: https://superuser.com/a/1141026/242593 -shopt -s expand_aliases -alias echo='{ save_flags="$-"; set +x;} 2> /dev/null; echo_and_restore' -echo_and_restore() { - builtin echo "$*" - #shellcheck disable=SC2154 - case "$save_flags" in - (*x*) set -x - esac -} - -# Make sure that on a failing exit we show a useful message -hint_on_fail() { - exit_code=$? - if [ $exit_code != 0 ]; then - echo HINT: To solve this failure look here: https://github.com/citusdata/citus/blob/master/src/test/scripts/README.md#check-merge-to-enterprise-job - fi - exit $exit_code -} -trap hint_on_fail EXIT - +# shellcheck disable=SC1091 +source ci/ci_helpers.sh # List executed commands. This is done so debugging this script is easier when # it fails. It's explicitly done after git remote add so username and password diff --git a/ci/check_sql_snapshots.sh b/ci/check_sql_snapshots.sh new file mode 100755 index 000000000..3c871ab8b --- /dev/null +++ b/ci/check_sql_snapshots.sh @@ -0,0 +1,20 @@ +#!/bin/bash +set -euo pipefail + +# shellcheck disable=SC1091 +source ci/ci_helpers.sh + +for udf_dir in src/backend/distributed/sql/udfs/*; do + # We want to find the last snapshotted sql file, to make sure it's the same + # as "latest.sql". This is done by: + # 1. Getting the filenames in the UDF directory (using find instead of ls, to keep shellcheck happy) + # 2. Filter out latest.sql + # 3. Sort using "version sort" + # 4. Get the last one using tail + latest_snapshot=$(\ + find "$udf_dir" -iname "*.sql" -exec basename {} \; \ + | { grep --invert-match latest.sql || true; } \ + | sort --version-sort \ + | tail --lines 1); + diff --unified --color=auto "$udf_dir/latest.sql" "$udf_dir/$latest_snapshot"; \ +done diff --git a/ci/ci_helpers.sh b/ci/ci_helpers.sh new file mode 100644 index 000000000..9c0ab7a88 --- /dev/null +++ b/ci/ci_helpers.sh @@ -0,0 +1,32 @@ +#!/bin/bash + +# For echo commands "set -x" would show the message effectively twice. Once as +# part of the echo command shown by "set -x" and once because of the output of +# the echo command. We do not want "set -x" to show the echo command. We only +# want to see the actual message in the output of echo itself. This function is +# a trick to do so. Read the StackOverflow post below to understand why this +# works and what this works around. +# Source: https://superuser.com/a/1141026/242593 +shopt -s expand_aliases +alias echo='{ save_flags="$-"; set +x;} 2> /dev/null && echo_and_restore' +echo_and_restore() { + builtin echo "$*" + #shellcheck disable=SC2154 + case "$save_flags" in + (*x*) set -x + esac +} + +# Make sure that on a failing exit we show a useful message +hint_on_fail() { + exit_code=$? + # Get filename of the currently running script + # Source: https://stackoverflow.com/a/192337/2570866 + filename=$(basename "$0") + if [ $exit_code != 0 ]; then + echo "HINT: To solve this failure look here: https://github.com/citusdata/citus/blob/master/ci/README.md#$filename" + fi + exit $exit_code +} +trap hint_on_fail EXIT + diff --git a/ci/disallow_c_comments_in_migrations.sh b/ci/disallow_c_comments_in_migrations.sh index 72ab74587..764822b41 100755 --- a/ci/disallow_c_comments_in_migrations.sh +++ b/ci/disallow_c_comments_in_migrations.sh @@ -1,6 +1,8 @@ #! /bin/bash set -euo pipefail +# shellcheck disable=SC1091 +source ci/ci_helpers.sh # We do not use c-style comments in migration files as the stripped # zero-length migration files cause warning during packaging diff --git a/ci/disallow_long_changelog_entries.sh b/ci/disallow_long_changelog_entries.sh index 36d7e5879..74d93c303 100755 --- a/ci/disallow_long_changelog_entries.sh +++ b/ci/disallow_long_changelog_entries.sh @@ -1,6 +1,8 @@ #! /bin/bash set -eu +# shellcheck disable=SC1091 +source ci/ci_helpers.sh # Having changelog items with entries that are longer than 80 characters are forbidden. # Find all lines with disallowed length, and for all such lines store diff --git a/ci/editorconfig.sh b/ci/editorconfig.sh index 427c45061..206c9a445 100755 --- a/ci/editorconfig.sh +++ b/ci/editorconfig.sh @@ -1,5 +1,9 @@ -#!/bin/sh -set -eu +#!/bin/bash + +set -euo pipefail +# shellcheck disable=SC1091 +source ci/ci_helpers.sh + for f in $(git ls-tree -r HEAD --name-only); do if [ "$f" = "${f%.out}" ] && [ "$f" = "${f%.data}" ] && diff --git a/ci/fix_style.sh b/ci/fix_style.sh index 63d899f28..8846eda38 100755 --- a/ci/fix_style.sh +++ b/ci/fix_style.sh @@ -8,9 +8,9 @@ set -e cidir="${0%/*}" cd ${cidir}/.. +citus_indent . --quiet ci/editorconfig.sh ci/remove_useless_declarations.sh ci/disallow_c_comments_in_migrations.sh ci/disallow_long_changelog_entries.sh - -citus_indent . --quiet +ci/normalize_expected.sh diff --git a/ci/normalize_expected.sh b/ci/normalize_expected.sh index 431ff83a6..0ff7f9c1c 100755 --- a/ci/normalize_expected.sh +++ b/ci/normalize_expected.sh @@ -1,6 +1,9 @@ -#!/bin/sh +#!/bin/bash + +set -euo pipefail +# shellcheck disable=SC1091 +source ci/ci_helpers.sh -set -eu for f in $(git ls-tree -r HEAD --name-only src/test/regress/expected/*.out); do sed -Ef src/test/regress/bin/normalize.sed < "$f" > "$f.modified" mv "$f.modified" "$f" diff --git a/ci/remove_useless_declarations.sh b/ci/remove_useless_declarations.sh index 0e77eff24..5925bfc66 100755 --- a/ci/remove_useless_declarations.sh +++ b/ci/remove_useless_declarations.sh @@ -1,12 +1,21 @@ -#!/bin/sh +#!/bin/bash -set -eu +set -euo pipefail +# shellcheck disable=SC1091 +source ci/ci_helpers.sh -files=$(find src -iname '*.c' | git check-attr --stdin citus-style | grep -v ': unset$' | sed 's/: citus-style: set$//') +files=$(find src -iname '*.c' -type f | git check-attr --stdin citus-style | grep -v ': unset$' | sed 's/: citus-style: set$//') while true; do + # A visual version of this regex can be seen here (it is MUCH clearer): + # https://www.debuggex.com/r/XodMNE9auT9e-bTx + # This visual version only contains the search bit, the replacement bit is + # quite simple. It looks like when extracted from the command below: + # \n$+{code_between}\t$+{type}$+{variable} = # shellcheck disable=SC2086 perl -i -p0e 's/\n\t(?!return )(?P(\w+ )+\**)(?>(?P\w+)( = *[\w>\s\n-]*?)?;\n(?P(?>(?P\/\*.*?\*\/|"(?>\\"|.)*?"|[^#]))*?)(\t)?(?=\b(?P=variable)\b))(?<=\n\t)(?P=variable) =(?![^;]*?[^>_]\b(?P=variable)\b[^_])/\n$+{code_between}\t$+{type}$+{variable} =/sg' $files - # The following are simply the same regex, but repeated for different tab sizes + # The following are simply the same regex, but repeated for different + # indentation levels, i.e. finding declarations indented using 2, 3, 4, 5 + # and 6 tabs. More than 6 don't really occur in the wild. # (this is needed because variable sized backtracking is not supported in perl) # shellcheck disable=SC2086 perl -i -p0e 's/\n\t\t(?!return )(?P(\w+ )+\**)(?>(?P\w+)( = *[\w>\s\n-]*?)?;\n(?P(?>(?P\/\*.*?\*\/|"(?>\\"|.)*?"|[^#]))*?)(\t\t)?(?=\b(?P=variable)\b))(?<=\n\t\t)(?P=variable) =(?![^;]*?[^>_]\b(?P=variable)\b[^_])/\n$+{code_between}\t\t$+{type}$+{variable} =/sg' $files @@ -18,6 +27,8 @@ while true; do perl -i -p0e 's/\n\t\t\t\t\t(?!return )(?P(\w+ )+\**)(?>(?P\w+)( = *[\w>\s\n-]*?)?;\n(?P(?>(?P\/\*.*?\*\/|"(?>\\"|.)*?"|[^#]))*?)(\t\t\t\t\t)?(?=\b(?P=variable)\b))(?<=\n\t\t\t\t\t)(?P=variable) =(?![^;]*?[^>_]\b(?P=variable)\b[^_])/\n$+{code_between}\t\t\t\t\t$+{type}$+{variable} =/sg' $files # shellcheck disable=SC2086 perl -i -p0e 's/\n\t\t\t\t\t\t(?!return )(?P(\w+ )+\**)(?>(?P\w+)( = *[\w>\s\n-]*?)?;\n(?P(?>(?P\/\*.*?\*\/|"(?>\\"|.)*?"|[^#]))*?)(\t\t\t\t\t\t)?(?=\b(?P=variable)\b))(?<=\n\t\t\t\t\t\t)(?P=variable) =(?![^;]*?[^>_]\b(?P=variable)\b[^_])/\n$+{code_between}\t\t\t\t\t\t$+{type}$+{variable} =/sg' $files - git diff --quiet && break; - git add .; + # shellcheck disable=SC2086 + git diff --quiet $files && break; + # shellcheck disable=SC2086 + git add $files; done diff --git a/src/backend/distributed/Makefile b/src/backend/distributed/Makefile index d530080d7..99ad9e365 100644 --- a/src/backend/distributed/Makefile +++ b/src/backend/distributed/Makefile @@ -61,15 +61,7 @@ ifneq (,$(SQL_Po_files)) include $(SQL_Po_files) endif -.PHONY: check-sql-snapshots clean-full - -check-sql-snapshots: - bash -c '\ - set -eu -o pipefail; \ - for f in sql/udfs/*; do \ - latest_snapshot=$$(ls $$f | { grep -v latest.sql || true; } | sort -V | tail -n 1); \ - diff -u $$f/latest.sql $$f/$$latest_snapshot; \ - done' +.PHONY: clean-full cleanup-before-install: rm -f $(DESTDIR)$(datadir)/$(datamoduledir)/citus* diff --git a/src/test/scripts/README.md b/src/test/scripts/README.md deleted file mode 100644 index 1d2c9ee93..000000000 --- a/src/test/scripts/README.md +++ /dev/null @@ -1,60 +0,0 @@ -# check-merge-to-enterprise Job - -When you open a PR on community, if it creates a conflict with -enterprise-master, the check-merge-to-enterprise will fail. Say your branch name -is `$PR_BRANCH`, we will refer to `$PR_BRANCH` on community as -`community/$PR_BRANCH` and on enterprise as `enterprise/$PR_BRANCH`. If the -job already passes, you are done, nothing further required! Otherwise follow the -below steps. First make sure these two things are the case: - -1. Get approval from your reviewer for `community/$PR_BRANCH`. Only follow the - next steps after you are about to merge the branch to community master. -2. Make sure your commits are in a nice state, since you should not do - "squash and merge" on Github later. Otherwise you will certainly get - duplicate commits and possibly get merge conflicts with enterprise again. - -Once that's done, you need to create a merged version of your PR branch on the -enterprise repo. For example if `community` is added as a remote in -your enterprise repo, you can do the following: - -```bash -export PR_BRANCH= -git checkout enterprise-master -git pull # Make sure your local enterprise-master is up to date -git fetch community # Fetch your up to date branch name -git checkout -b "$PR_BRANCH" enterprise-master -``` -Now you have X in your enterprise repo, which we refer to as -`enterprise/$PR_BRANCH` (even though in git commands you would reference it as -`origin/$PR_BRANCH`). This branch is currently the same as `enterprise-master`. -First to make review easier, you should merge community master into it. This -should apply without any merge conflicts: - -```bash -git merge community/master -``` -Now you need to merge `community/$PR_BRANCH` to `enterprise/$PR_BRANCH`. Solve -any conflicts and make sure to remove any parts that should not be in enterprise -even though it doesn't have a conflict, on enterprise repository: - -```bash -git merge "community/$PR_BRANCH" -``` - -1. You should push this branch to the enterprise repo. This is so that the job - on community will see this branch. -2. Wait until tests on `enterprise/$PR_BRANCH` pass. -3. Create a PR on the enterprise repo for your `enterprise/$PR_BRANCH` branch. -4. You should get approval for the merge conflict changes on - `enterprise/$PR_BRANCH`, preferably from the same reviewer as they are - familiar with the change. -5. You should rerun the `check-merge-to-enterprise` check on - `community/$PR_BRANCH`. You can use re-run from failed option in circle CI. -6. You can now merge the PR on community. Be sure to NOT use "squash and merge", - but instead use the regular "merge commit" mode. -7. You can now merge the PR on enterprise. Be sure to NOT use "squash and merge", - but instead use the regular "merge commit" mode. - -The subsequent PRs on community will be able to pass the -`check-merge-to-enterprise` check as long as they don't have a conflict with -`enterprise-master`.