citus/src/test/regress
Onder Kalaci aa6b641828 Throttle connections to the worker nodes
With this commit, we're introducing a new infrastructure to throttle
connections to the worker nodes. This infrastructure is useful for
multi-shard queries; router queries are not affected by it.

The goal is to prevent establishing more than citus.max_shared_pool_size
number of connections per worker node in total, across sessions.

To do that, we've introduced a new connection flag OPTIONAL_CONNECTION.
The idea is that some connections are optional such as the second
(and further connections) for the adaptive executor. A single connection
is enough to finish the distributed execution; the others only help to
execute the query faster. Thus, they can be considered optional connections.
When an optional connection is not allowed to the adaptive executor, it
simply skips it and continues the execution with the already established
connections. However, it'll keep retrying to establish optional
connections, in case some slots are open again.
2020-04-14 10:27:48 +02:00
bin Creates normalize_modified.sed 2020-04-10 13:03:19 +03:00
data add tests for local copy execution 2020-03-18 09:28:59 +03:00
expected Throttle connections to the worker nodes 2020-04-14 10:27:48 +02:00
input Adds multi_schedule_hyperscale schedule 2020-04-10 15:54:47 +03:00
mitmscripts mitmscripts/fluent.py: use atomic increment 2020-01-13 20:35:08 +00:00
output Adds multi_schedule_hyperscale schedule 2020-04-10 15:54:47 +03:00
spec Throttle connections to the worker nodes 2020-04-14 10:27:48 +02:00
sql Throttle connections to the worker nodes 2020-04-14 10:27:48 +02:00
upgrade Add missing pieces for version bump of #3482 (#3523) 2020-02-21 12:35:29 +01:00
.gitignore Implement direct COPY table TO stdout 2020-02-17 15:15:10 +01:00
Makefile Adds multi_schedule_hyperscale schedule 2020-04-10 15:54:47 +03:00
Pipfile Add upgrade postgres version test (#2940) 2019-09-10 17:56:04 +03:00
Pipfile.lock Improve upgrade test runner 2019-10-03 13:10:11 +02:00
README.md Add a basic testing README including normalization explanation 2020-01-06 09:32:03 +01:00
after_citus_upgrade_coord_schedule Introduce objects to dist. infrastructure when updating Citus (#3477) 2020-02-07 18:07:59 +03:00
after_pg_upgrade_schedule Add the necessary changes for rebalance strategies on enterprise (#3325) 2019-12-19 15:23:08 +01:00
base_schedule Adds multi_schedule_hyperscale schedule 2020-04-10 15:54:47 +03:00
before_citus_upgrade_coord_schedule Introduce objects to dist. infrastructure when updating Citus (#3477) 2020-02-07 18:07:59 +03:00
before_pg_upgrade_schedule Adds multi_schedule_hyperscale schedule 2020-04-10 15:54:47 +03:00
failure_base_schedule Adds multi_schedule_hyperscale schedule 2020-04-10 15:54:47 +03:00
failure_schedule Throttle connections to the worker nodes 2020-04-14 10:27:48 +02:00
isolation_schedule Throttle connections to the worker nodes 2020-04-14 10:27:48 +02:00
log_test_times Add test-timing script 2019-02-26 23:01:40 -07:00
minimal_schedule Adds multi_schedule_hyperscale schedule 2020-04-10 15:54:47 +03:00
multi_follower_schedule End regression tests with ensure_no_intermediate_data_leak 2020-01-03 18:59:02 +00:00
multi_mx_schedule Throttle connections to the worker nodes 2020-04-14 10:27:48 +02:00
multi_schedule Throttle connections to the worker nodes 2020-04-14 10:27:48 +02:00
multi_schedule_hyperscale Adds multi_schedule_hyperscale schedule 2020-04-10 15:54:47 +03:00
multi_schedule_hyperscale_superuser Adds multi_schedule_hyperscale schedule 2020-04-10 15:54:47 +03:00
multi_task_tracker_extra_schedule Throttle connections to the worker nodes 2020-04-14 10:27:48 +02:00
mx_base_schedule Adds multi_schedule_hyperscale schedule 2020-04-10 15:54:47 +03:00
pg_regress_multi.pl Fixes the psql connection bug 2020-04-10 15:54:47 +03:00
worker_schedule End regression tests with ensure_no_intermediate_data_leak 2020-01-03 18:59:02 +00:00

README.md

How our testing works

We use the test tooling of Postgres to run our tests. This tooling is very simple but effective. In essence, it runs a series of .sql scripts, captures their output, and stores it in results/$sqlfilename.out. It then compares the actual output to the expected output with a simple diff command:

diff results/$sqlfilename.out expected/$sqlfilename.out
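
As a rough illustration of how that comparison decides pass or fail, here is a minimal sketch that runs in a throwaway directory. The test name `example` and the file contents are made up for the illustration; the only real behaviour relied on is that `diff` exits with 0 when the files match and 1 when they differ.

```shell
# Sketch only: a throwaway results/ and expected/ pair for a hypothetical test
workdir=$(mktemp -d)
mkdir -p "$workdir/results" "$workdir/expected"

# Pretend the test produced exactly the expected output
printf '%s\n' 'SELECT 1;' '(1 row)' > "$workdir/expected/example.out"
cp "$workdir/expected/example.out" "$workdir/results/example.out"

# diff exits with 0 when the files are identical, 1 when they differ
if diff "$workdir/results/example.out" "$workdir/expected/example.out" > /dev/null; then
    status="ok"
else
    status="failed"
fi
echo "example: $status"
```

Changing a single character in either file flips the result to `failed`, which is why unpredictable output needs the normalization described further down.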

Schedules

Which sql scripts to run is defined in a schedule file, e.g. multi_schedule, multi_mx_schedule.
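
For orientation, here is a small sketch of the schedule format that the Postgres test tooling reads, assuming the usual `test:` lines where tests named on the same line run in parallel. The test names are hypothetical and not taken from any real schedule in this directory.

```shell
# Sketch only: a throwaway schedule file with hypothetical test names
schedule=$(mktemp)
cat > "$schedule" <<'EOF'
# comments start with a hash
test: example_setup
test: example_a example_b
EOF

# List every test the schedule would run, one name per line
tests=$(grep '^test:' "$schedule" | sed 's/^test:[[:space:]]*//' | tr -s ' ' '\n')
echo "$tests"
rm -f "$schedule"
```

The same `grep` pipeline can be pointed at a real schedule file such as multi_schedule to see which tests it contains.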

Makefile

In our Makefile we have rules to run the different types of test schedules. You can run them from the root of the repository like so:

# e.g. the multi_schedule
make install -j9 && make -C src/test/regress/ check-multi

Take a look at the Makefile for a list of all the testing targets.

Running a specific test

Often you want to run a specific test and don't want to run everything. You can use one of the following commands to do so:

# If your test needs almost no setup you can use check-minimal
make install -j9 && make -C src/test/regress/ check-minimal EXTRA_TESTS='multi_utility_warnings'
# Often tests need some testing data, if you get missing table errors using
# check-minimal you should try check-base
make install -j9 && make -C src/test/regress/ check-base EXTRA_TESTS='with_prepare'
# Sometimes this is still not enough and some other test needs to be run before
# the test you want to run. You can do so by adding it to EXTRA_TESTS too.
make install -j9 && make -C src/test/regress/ check-base EXTRA_TESTS='add_coordinator coordinator_shouldhaveshards'

Normalization

The output of tests is sadly not completely predictable. Still, we want to compare the output of different runs and raise an error when the important things differ. We do this by not using the regular system diff to compare files. Instead we use src/test/regress/bin/diff, which does the following things:

  1. Change the $sqlfilename.out file by running it through sed using the src/test/regress/bin/normalize.sed file. This replaces values that keep changing across runs, e.g. port numbers or transaction numbers, with an XXX string.
  2. Back up the original output to $sqlfilename.out.unmodified in case it's needed for debugging
  3. Compare the changed results and expected files with the system diff command.
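
The three steps above can be sketched roughly as follows. This is not the actual src/test/regress/bin/diff script: the single sed rule, the file names, and the sample output line are all hypothetical, and the real rules live in normalize.sed. The sketch copies the raw output aside before rewriting it, which is one straightforward way to get both files.

```shell
# Sketch only: normalize-then-diff with one hypothetical sed rule
workdir=$(mktemp -d)
mkdir -p "$workdir/results" "$workdir/expected"

# The actual output contains a port number that changes between runs
echo 'connection to localhost:57637 failed' > "$workdir/results/example.out"
echo 'connection to localhost:XXX failed' > "$workdir/expected/example.out"

# Hypothetical rule: mask the digits after "localhost:"
echo 's/localhost:[0-9]+/localhost:XXX/g' > "$workdir/normalize.sed"

# Keep the raw output for debugging
cp "$workdir/results/example.out" "$workdir/results/example.out.unmodified"
# Rewrite the output through the sed rules
sed -E -f "$workdir/normalize.sed" "$workdir/results/example.out.unmodified" \
    > "$workdir/results/example.out"

# Compare the normalized output with the expected file
if diff "$workdir/results/example.out" "$workdir/expected/example.out" > /dev/null; then
    normalized_status="match"
else
    normalized_status="differ"
fi
echo "$normalized_status"
```

Without the sed step the two files would differ on the port number alone, even though nothing important changed.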

Updating the expected test output

Sometimes you add a test to an existing file, or test output changes in a way that's not bad (possibly even good if support for queries is added). In those cases you want to update the expected test output. Doing so is simple: run the test and copy the new .out file from the results directory to the expected directory, e.g.:

make install -j9 && make -C src/test/regress/ check-minimal EXTRA_TESTS='multi_utility_warnings'
cp src/test/regress/{results,expected}/multi_utility_warnings.out

Adding a new test file

Adding a new test file is quite simple:

  1. Write the SQL file in the sql directory
  2. Add it to a schedule file, to make sure it's run in CI
  3. Run the test
  4. Check that the output is as expected
  5. Copy the .out file from results to expected

Isolation testing

See src/test/regress/spec/README.md

Upgrade testing

See src/test/regress/upgrade/README.md

Failure testing

See src/test/regress/mitmscripts/README.md

Perl test setup script

To automatically set up a Citus cluster in tests we use our src/test/regress/pg_regress_multi.pl script. This sets up a Citus cluster and then starts the standard Postgres test tooling. You almost never have to change this file.