README.md

How our testing works

We use the Postgres test tooling to run our tests. This tooling is simple but effective. In short, it runs a series of .sql scripts, captures their output, and stores it in results/$sqlfilename.out. It then compares the actual output to the expected output with a simple diff command:

diff results/$sqlfilename.out expected/$sqlfilename.out

Schedules

Which sql scripts to run is defined in a schedule file, e.g. multi_schedule, multi_mx_schedule.
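As a rough illustration, schedule files follow the pg_regress schedule format: each `test:` line names one or more tests, and tests named on the same line run in parallel. The sketch below builds a small sample schedule and lists the tests it names, in order. The sample contents and the `list_tests` helper are illustrative assumptions, not copied from a real Citus schedule.

```shell
#!/bin/sh
set -eu

# Build a small sample schedule in a temporary directory.
# The contents are illustrative only, not a copy of a real schedule.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

cat > "$workdir/sample_schedule" <<'EOF'
# comment lines start with a hash
test: multi_test_helpers
test: multi_utility_warnings with_prepare
EOF

# Hypothetical helper: print one test name per line, in schedule order.
list_tests() {
    # Keep only "test:" lines, strip the prefix, then split on spaces.
    sed -n 's/^test:[[:space:]]*//p' "$1" | tr -s ' ' '\n'
}

list_tests "$workdir/sample_schedule"
```

Running the same helper against a real schedule such as multi_schedule is a quick way to check whether a test is already listed before adding it.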

Makefile

In our Makefile we have rules to run the different types of test schedules. You can run them from the root of the repository like so:

# e.g. the multi_schedule
make install -j9 && make -C src/test/regress/ check-multi

Take a look at the Makefile for a list of all the testing targets.

Running a specific test

Often you want to run a specific test without running everything. In that case you can use the run_test.py [test_name] script as shown below. It detects the test schedule and make target needed to run the given test.

src/test/regress/citus_tests/run_test.py multi_utility_warnings

You can pass the --repeat or -r parameter to run the given test multiple times.

src/test/regress/citus_tests/run_test.py multi_utility_warnings -r 1000

To force the script to use base schedules rather than minimal ones, you can pass -b or --use-base-schedule.

src/test/regress/citus_tests/run_test.py coordinator_shouldhaveshards -r 1000 --use-base-schedule

If you would like to run a specific test on a certain target you can use one of the following commands to do so:

# If your test needs almost no setup you can use check-minimal
make install -j9 && make -C src/test/regress/ check-minimal EXTRA_TESTS='multi_utility_warnings'
# Often tests need some test data. If you get missing table errors using
# check-minimal you should try check-base
make install -j9 && make -C src/test/regress/ check-base EXTRA_TESTS='with_prepare'
# Sometimes this is still not enough and some other test needs to be run before
# the test you want to run. You can do so by adding it to EXTRA_TESTS too.
make install -j9 && make -C src/test/regress/ check-base EXTRA_TESTS='add_coordinator coordinator_shouldhaveshards'

Normalization

Test output is unfortunately not completely predictable. We still want to compare the output of different runs and fail when important things differ. To do this we do not use the regular system diff to compare files. Instead we use src/test/regress/bin/diff, which does the following:

Change the $sqlfilename.out file by running it through sed using the src/test/regress/bin/normalize.sed file. This replaces values that keep changing across runs, e.g. port numbers or transaction numbers, with an XXX string.
Back up the original output to $sqlfilename.out.unmodified in case it's needed for debugging.
Compare the changed results and expected files with the system diff command.
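The three steps above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual bin/diff script: the two sed rules, the sample file contents, and the `normalize_and_diff` helper are hypothetical, and the sketch assumes the expected file is already stored in normalized form.

```shell
#!/bin/sh
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
mkdir "$workdir/results" "$workdir/expected"

# Hypothetical normalization rules; the real ones live in normalize.sed.
cat > "$workdir/rules.sed" <<'EOF'
s/localhost:[0-9]+/localhost:xxxxx/g
s/transaction [0-9]+/transaction XXX/g
EOF

# Assumption: the expected output is stored already normalized.
printf 'connected to localhost:xxxxx\nfinished transaction XXX\n' \
    > "$workdir/expected/demo.out"
# Raw output from a run, containing run-specific numbers.
printf 'connected to localhost:57637\nfinished transaction 4821\n' \
    > "$workdir/results/demo.out"

normalize_and_diff() {
    result=$1
    expected=$2
    rules=$3
    # Keep a copy of the raw output for debugging.
    cp "$result" "$result.unmodified"
    # Overwrite the result file with its normalized form,
    # reading from the untouched copy.
    sed -E -f "$rules" "$result.unmodified" > "$result"
    # Compare with the system diff; exit status 0 means no difference.
    diff "$result" "$expected"
}

normalize_and_diff "$workdir/results/demo.out" \
    "$workdir/expected/demo.out" "$workdir/rules.sed" && echo "outputs match"
```

Because the raw file survives as demo.out.unmodified, a failing comparison can still be traced back to the exact values the run produced.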

Updating the expected test output

Sometimes you add a test to an existing file, or test output changes in a way that's not bad (possibly even good, if support for new queries is added). In those cases you want to update the expected test output. To do this, run the test and copy the new .out file from the results directory to the expected directory, e.g.:

make install -j9 && make -C src/test/regress/ check-minimal EXTRA_TESTS='multi_utility_warnings'
cp src/test/regress/{results,expected}/multi_utility_warnings.out

Adding a new test file

Adding a new test file is quite simple:

Write the SQL file in the sql directory
Add it to a schedule file, to make sure it's run in CI
Run the test
Check that the output is as expected
Copy the .out file from results to expected

Isolation testing

See src/test/regress/spec/README.md

Pytest testing

See src/test/regress/citus_tests/test/README.md

Upgrade testing

See src/test/regress/citus_tests/upgrade/README.md

Arbitrary configs testing

See src/test/regress/citus_tests/arbitrary_configs/README.md

Failure testing

See src/test/regress/mitmscripts/README.md

Perl test setup script

To automatically set up a Citus cluster in tests we use our src/test/regress/pg_regress_multi.pl script. This sets up a Citus cluster and then starts the standard Postgres test tooling. You almost never have to change this file.

Handling different test outputs

Sometimes the test output changes because we run tests in different configurations. The most common example is output that differs between Postgres versions. We strongly encourage finding a way to avoid such differing outputs. You can try the following, if applicable to the changing output:

Change the test such that you still test what you want, but you avoid the different test outputs.
Reduce the test verbosity via: \set VERBOSITY terse, SET client_min_messages TO error, etc
Drop the specific test lines altogether, if the test is not critical.
Use utility functions that modify the output to your preference, like coordinator_plan, which modifies EXPLAIN output
Add a normalization rule

Alternative test output files are highly discouraged, so only add one when strictly necessary. In order to maintain a clean test suite, make sure to explain why it has an alternative output in the test header, and when we can drop the alternative output file in the future.

For example:

--
-- MULTI_INSERT_SELECT
--
-- This test file has an alternative output because of the change in the
-- display of SQL-standard function's arguments in INSERT/SELECT in PG15.
-- The alternative output can be deleted when we drop support for PG14
--

Including important keywords such as "PG14", "PG15", and "alternative output" makes future cleanup easier.
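One way such keywords pay off is that a simple search can list the test files whose alternative outputs become removable once a Postgres version is dropped. The sketch below is a hypothetical illustration: it creates two sample files in a temporary directory, and the `find_alternative_outputs` helper is an assumed name, not an existing script in the repository.

```shell
#!/bin/sh
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
mkdir "$workdir/sql"

# Sample test file carrying the kind of header shown above.
cat > "$workdir/sql/multi_insert_select.sql" <<'EOF'
--
-- MULTI_INSERT_SELECT
--
-- This test file has an alternative output because of the change in the
-- display of SQL-standard function's arguments in INSERT/SELECT in PG15.
-- The alternative output can be deleted when we drop support for PG14
--
EOF

# Sample test file without an alternative output.
cat > "$workdir/sql/with_prepare.sql" <<'EOF'
-- A test with a single expected output
SELECT 1;
EOF

# Hypothetical helper: print the test files in a directory that mention
# both "alternative output" and the given Postgres version keyword.
find_alternative_outputs() {
    dir=$1
    version=$2
    for file in "$dir"/*.sql; do
        if grep -q 'alternative output' "$file" && grep -q "$version" "$file"; then
            basename "$file"
        fi
    done
}

find_alternative_outputs "$workdir/sql" PG14
```

Pointing the helper at src/test/regress/sql when support for a version is dropped gives a starting list of files to review.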

Randomly failing tests

In CI a test sometimes fails randomly; we call these tests "flaky". To fix flaky tests, see src/test/regress/flaky_tests.md

Regression test best practices

Instead of connecting to different nodes to check catalog tables, use run_command_on_all_nodes(), because it's faster than repeatedly disconnecting from and connecting to different nodes.
Tests should define functions for repetitive actions, e.g., by wrapping common queries used to check catalog tables. If the function is likely to be used by other tests in the future, it needs to be defined in multi_test_helpers.sql.
If you're adding a new file, consider using src/test/regress/bin/create_test.py your_new_test_name to create the file. If you want to create it manually, make sure that your test file creates a schema and drops that schema at the end of the test, so that it doesn't leave any objects behind. See which lines src/test/regress/bin/create_test.py adds to the test file to understand what you need to do.

For objects that are not bound to a schema, such as databases and roles, make sure to drop them at the end of the test too.
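A hand-written test file following the schema advice above might start from a skeleton like the one this sketch generates. It is an assumption about a reasonable minimal layout, not a copy of what create_test.py emits; the test name `my_new_test`, the `write_skeleton` helper, and the example role name are hypothetical.

```shell
#!/bin/sh
set -eu

# Hypothetical test name, used for both the file and the schema.
test_name=my_new_test
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Hypothetical helper: write a minimal test skeleton that creates a
# schema at the start and drops it at the end.
write_skeleton() {
    name=$1
    outfile=$2
    cat > "$outfile" <<EOF
CREATE SCHEMA $name;
SET search_path TO $name;

-- add tests here

-- Objects outside the schema (roles, databases) need explicit drops,
-- for example: DROP ROLE ${name}_role;
SET client_min_messages TO WARNING;
DROP SCHEMA $name CASCADE;
EOF
}

write_skeleton "$test_name" "$workdir/$test_name.sql"
cat "$workdir/$test_name.sql"
```

Setting client_min_messages to WARNING before the final drop keeps the cascade notices out of the expected output, which makes the output less sensitive to the order in which objects are dropped.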