citus

Commit Graph

Author	SHA1	Message	Date
Jelte Fennema	c1f58fac6c	Don't run any isolation tests in parallel (#6212 ) By running isolation tests in parallel we're just asking for flaky tasks. The first test might temporarily block one of the commands in the second test, which we then detect as waiting like this: ```diff step s2-vacuum-analyze: VACUUM ANALYZE test_insert_vacuum; - + <waiting ...> step s1-commit: COMMIT; +step s2-vacuum-analyze: <... completed> ``` Debugging flaky tests is also much harder when they are run in parallel. This PR starts running all our isolation tests sequentially. The reason for opening this PR was me seeing this failing test: https://app.circleci.com/pipelines/github/citusdata/citus/26194/workflows/ff57e2cf-8ac4-40fe-bc0c-74a7f8fecb53/jobs/740454 As well as having fixed a similar issue recently in #6122 (cherry picked from commit `85305b2773`)	2022-09-07 13:27:49 +02:00
Jelte Fennema	3d67ae6497	Fix flakyness in failure_insert_select_repartition (#6202 ) This fixes our most commonly randomly failing failure test. The failing diff is as follows: ```diff SELECT citus.mitmproxy('conn.onQuery(query="fetch_intermediate_results").kill()'); mitmproxy ----------- (1 row) INSERT INTO target_table SELECT * FROM source_table; -ERROR: connection to the remote node localhost:xxxxx failed with the following error: connection not open +ERROR: could not open file "base/pgsql_job_cache/10_0_40/repartitioned_results_20770193413_from_4213590_to_1.data": No such file or directory +CONTEXT: while executing command on localhost:9060 +while executing command on localhost:57637 SELECT * FROM target_table ORDER BY a; ``` As far as I can tell this is the cause of a race condition: After killing fetch_intermediate_results on worker 9060, the previously created data file gets cleaned up. The fetch_intermediate_results call that's sent to worker 57637 will be cancelled and rolled back soon because of the failure on the other connection. But if that fetch_intermediate_results call is able to connect to 9060 before it is cancelled, it won't find the file it's looking for there anymore. So while it's not the error we expect, it does indicate that we succeeded. To avoid this issue instead of killing the fetch_intermediate_results call directly, we kill the COPY command that it uses to do the fetch. This results in stable output as can be seen here, where 227 runs of failure_insert_select_repartition succeeded: https://app.circleci.com/pipelines/github/citusdata/citus/26168/workflows/9c64a3b6-f46c-4725-9fb4-8f6a2d00a023/jobs/739389 To be clear this changes the test to affects the opposite fetch_intermediate_results call. This kills the fetch_intermediate_results call of worker 57637, instead of killing the fetch_intermediate_results call on worker 9060. Example of failing test: https://app.circleci.com/pipelines/github/citusdata/citus/26147/workflows/780e95ea-264a-4c9f-ad2e-cf11449a795e/jobs/738467 (cherry picked from commit `8ce12eb51f`)	2022-09-07 13:27:49 +02:00
Önder Kalacı	173b7ed5ad	Properly add / remove coordinator for isolation tests (#6181 ) We used to rely on a seperate session to add the coordinator. However, that might prevent the existing sessions to get assigned proper gpids, which causes flaky tests. (cherry picked from commit `961fcff5db`)	2022-09-07 13:27:49 +02:00
Jelte Fennema	eed4d6452d	Fix flakyness in columnar_first_row_number test (#6192 ) When running columnar_first_row_number in parallel with the columnar_query test sometimes it would fail. This bug is tracked in #6191. For now to make CI less flaky we simply don't run these tests in parallel. Example of failed test: https://app.circleci.com/pipelines/github/citusdata/citus/26106/workflows/75d00ea9-23f8-4bff-a927-bced19e1f81b/jobs/736713 Fixes #6184 (cherry picked from commit `0a045afd3a`)	2022-09-07 13:27:49 +02:00
Jelte Fennema	aa879108b7	Remove the flaky rollback_to_savepoint test (#6190 ) This removes a flaky test that I introduced in #3868 after I fixed the issue described in #3622. This test is sometimes fails randomly in CI. The way it fails indicates that there might be some bug: A connection breaks after rolling back to a savepoint. I tried reproducing this issue locally, but I wasn't able to. I don't understand what causes the failure. Things that I tried were: 1. Running the test with: ```sql SET citus.force_max_query_parallelization = true; ``` 2. Running the test with: ```sql SET citus.max_adaptive_executor_pool_size = 1; ``` 3. Running the test in parallel with the same tests that it is run in parallel with in multi_schedule. None of these allowed me to reproduce the issue locally. So I think it's time to give on fixing this test and simply remove the test. The regression that this test protects against seems very unlikely to reappear, since in #3868 I also added a big comment about the need for the newly added `UnclaimConnection` call. So, I think the need for the test is quite small, and removing it will make our CI less flaky. In case the cause of the bug ever gets found, I tracked the bug in #6189 Example of a failing CI run: https://app.circleci.com/pipelines/github/citusdata/citus/26098/workflows/f84741d9-13b1-4ae7-9155-c21ed3466951/jobs/736424 For reference the unexpected diff is this (so both warnings and an error): ```diff INSERT INTO t SELECT i FROM generate_series(1, 100) i; +WARNING: connection to the remote node localhost:57638 failed with the following error: +WARNING: +CONTEXT: while executing command on localhost:57638 +ERROR: connection to the remote node localhost:57638 failed with the following error: ROLLBACK; ``` This test is also mentioned as the most failing regression test in #5975 (cherry picked from commit `d16b458e2a`)	2022-09-07 13:27:49 +02:00
Jelte Fennema	ee887ef648	Fix flakyness in create index concurrently isolation tests (#6158 ) This creates consistent test output for isolation tests that involve `CREATE INDEX CONCURRENTLY`. `CREATE INDEX CONCURRENTLY` is sometimes temporarily detected as blocking, even though it will complete without any other queries needing to be run. This change makes sure that we wait until that happens without running any other queries in the meantime. This way we always get consistent output. The way we do that is addressed by using an empty step in the same session as the `CREATE INDEX CONCURRENLTY` command. Doing so forces the isolation tester to wait until the command is finished and not continue with steps from other sessions. This is [the recommended approach by Postgres][1]. There's two separate cases which are addressed in slightly different ways: 1. If `CREATE INDEX CONCURRENTLY` is actually blocked on another session: Add an empty step right after the commit of blocking session. e.g. `"s2-ddl-create-index-concurrently" "s1-commit" "s2-empty"` 2. If it's not actually blocked on another session: Add [an asterisk marker][2] to make it look like it's blocked (because sometimes this happens randomly) and right after that we add an empty step to trigger waiting. e.g. `"s2-ddl-create-index-concurrently"(*) "s2-empty" "s1-commit"` In passing this also enables isolation tests that were disabled due to a bug that has already been fixed for a while. Fixes #5993 Related to #5910 and #2966 [1]: `5f0adec253/src/test/isolation/README (L197-L204)` [2]: `5f0adec253/src/test/isolation/README (L174-L179)` Co-authored-by: Hanefi Onaldi <Hanefi.Onaldi@microsoft.com> (cherry picked from commit `fd07cc9baf`)	2022-09-07 13:27:49 +02:00
Jelte Fennema	a9a114145a	Fix flakyness in isolation_data_migration.spec (#6122 ) The tests isolation_concurrent_dml and isolation_data_migration tests were being run in parallel, but they were interfering with each others output. Sometimes queries from isolation_concurrent_dml were blocking create_distributed_table in isolation_data_migration: 1. https://app.circleci.com/pipelines/github/citusdata/citus/25562/workflows/f9d0a6ff-bb7a-4b71-9fcf-1a3e46d54425/jobs/713270 2. https://app.circleci.com/pipelines/github/citusdata/citus/25562/workflows/1e22454c-1623-48a7-97fb-c6803c7959c7/jobs/713223 3. https://app.circleci.com/pipelines/github/citusdata/citus/25562/workflows/618c419e-eefb-4582-9482-322dbb9ac96d/jobs/713110 This fixes it changing the schedule to not run these tests in parallel. (cherry picked from commit `dff71abc32`)	2022-09-07 13:27:49 +02:00
Jelte Fennema	86614e3555	Fix flakyness in isolation_replicate_reference_tables_to_coordinator.spec (#6123 ) When the deadlock detector kills s2-update-dist-table both sessions finish at the same time. The order in which they are displayed can be swapped. To counteract this we start using the ["marker" feature][1] of the isolationtester framework to create consistent output. In passing this also sets the next_shard_id to the expected value by this test so it can be run using `make check-isolation-base`. Failed CI test: https://app.circleci.com/pipelines/github/citusdata/citus/25562/workflows/dfe6f88a-c306-4d91-b771-d5d1deb1798d/jobs/713417 [1]: `ec62ce55a8/src/test/isolation/README (L152)` (cherry picked from commit `8bbc1a45e1`)	2022-09-07 13:27:49 +02:00
Hanefi Onaldi	a29b689fc9	Replace isolation tester func only once on enterprise tests (#6064 ) This is a continuation of a refactor (with commit sha `2b7cf0c097`) that aimed to use Citus helper UDFs by default in iso tests. PostgreSQL isolation test infrastructure uses some UDFs to detect whether concurrent sessions block each other. Citus implements alternatives to that UDF so that we are able to detect and report distributed transactions that get blocked on the worker nodes as well. We needed to explicitly replace PG helper functions with Citus implementations in each isolation file. Now we replace them by default. (cherry picked from commit `ae58ca5783`)	2022-09-07 13:27:49 +02:00
Hanefi Onaldi	b90523628f	Replace iso tester func only once (#5964 ) Use Citus helper UDFs by default in iso tests PostgreSQL isolation test infrastructure uses some UDFs to detect whether concurrent sessions block each other. Citus implements alternatives to that UDF so that we are able to detect and report distributed transactions that get blocked on the worker nodes as well. We needed to explicitly replace PG helper functions with Citus implementations in each isolation file. Now we replace them by default. (cherry picked from commit `2b7cf0c097`)	2022-09-07 13:27:49 +02:00
Jelte Fennema	75cf7a748d	Define symbols required for downgrade from 11.1 (#6301 ) Since #6300/e29db74 changed the C symbol that our bigint overrides of pg_cancel_backend and pg_terminate_backend called. We needed to do something to continue to make these functions work after downgrading. Recreating the old definition with a downgrade scripts is not really possible, since people are expected to run the downgrade steps when using the new .so file, which does not contain the old symbols. So, the easiest way to solve it was also defining the new symbols in our old Citus versions. Luckily our overrides haven't existed for long, so these symbol definitions only needed to be backported to 11.0.	2022-09-07 12:18:39 +02:00
Marco Slot	5f57d77899	Allow citus_internal application_name with additional suffix (#6282 ) Co-authored-by: Marco Slot <marco.slot@gmail.com>	2022-09-05 21:41:06 +02:00
Marco Slot	0a11da1291	Add an allow_unsafe_constraints flag for constraints without distribution column (#6237 ) Co-authored-by: Marco Slot <marco.slot@gmail.com>	2022-08-25 16:13:07 +02:00
Gokhan Gulbiz	07143e7d12	Use the same colocation group for child and parent rels when altering a distributed table (#6225 ) * Alter_distributed_table colocateWith:none bug fix for partitioned tables. * Regression tests added for alter_distributed_table colocateWith:none for partitioned tables * Update query comparision to be more accurate (cherry picked from commit `69d2fcf5c0`)	2022-08-25 11:47:06 +03:00
Marco Slot	006c8eacc0	Verify that we can replicate reference tables using rebalancer	2022-08-23 23:26:49 +02:00
Marco Slot	9dc6273b88	Set application_name to citus_rebalancer when copying reference tables	2022-08-23 23:26:49 +02:00
Onur Tirtir	dbfdaca0f0	Add changelog entries for 11.0.6 (cherry picked from commit `b6b8f198d9`)	2022-08-19 11:09:46 +03:00
Onur Tirtir	a5d6e841df	Bump citus version to 11.0.6	2022-08-19 10:58:38 +03:00
Jelte Fennema	73e993e908	Fix flakyness in isolation_reference_table (#6193 ) The newly introduced isolation_reference_table test had some flakyness, because the assumption on how the arbitrary reference table gets chosen was incorrect. This introduces a VACUUM FULL at the start of the test to ensure the assumption actually holds. Example of failed test: https://app.circleci.com/pipelines/github/citusdata/citus/26108/workflows/0a5cd526-006b-423e-8b67-7411b9c6be36/jobs/736802	2022-08-18 14:47:59 +02:00
Nils Dijk	08dee6fe08	Fix reference table lock contention (#6173 ) DESCRIPTION: Fix reference table lock contention Dropping and creating reference tables unintentionally blocked on each other due to the use of an ExclusiveLock for both the Drop and conditionally copying existing reference tables to (new) nodes. The patch does the following: - Lower lock lever for dropping (reference) tables to `ShareLock` so they don't self conflict - Treat reference tables and distributed tables equally and acquire the colocation lock when dropping any table that is in a colocation group - Perform the precondition check for copying reference tables twice, first time with a lower lock that doesn't conflict with anything. Could have been a NoLock, however, in preparation for dropping a colocation group, it is an `AccessShareLock` During normal operation the first check will always pass and we don't have to escalate that lock. Making it that we won't be blocked on adding and remove reference tables. Only after a node addition the first `create_reference_table` will still need to acquire an `ExclusiveLock` on the colocation group to perform the copy.	2022-08-18 13:22:31 +02:00
Onder Kalaci	87787dd146	Support Sequences owned by columns before distributing tables There are 3 different ways that a sequence can be interacting with tables. (1) and (2) are already supported. This commit adds support for (3). (1) column DEFAULT nextval('seq'): The dependency is roughly like below, and ExpandCitusSupportedTypes() is responsible for finding the depending sequences. schema <--- table <--- column <---- default value ^ \| \|------------------ sequence <--------\| (2) serial columns: Bigserial/small serial etc: The dependency is roughly like below, and ExpandCitusSupportedTypes() is responsible for finding the depending sequences. schema <--- table <--- column <---- default value ^ \| \| \| sequence <--------\| (3) Sequence OWNED BY table.column: Added support for this type of resolution in this commit. The dependency is almost like the following, and ExpandCitusSupportedTypes() is NOT responsible for finding the dependency. schema <--- table <--- column ^ \| sequence (cherry picked from commit `9ec8e627c1`)	2022-08-18 11:22:25 +02:00
Marco Slot	56939f0d14	Fix relation access tracking for local only transactions on release-11.0 (#6182 ) Co-authored-by: Onder Kalaci <onderkalaci@gmail.com>	2022-08-18 10:13:41 +02:00
Ahmet Gedemenli	7df8588107	Fix upgrade paths for 11.0 (#6171 ) * Fix upgrade paths for 11.0	2022-08-17 21:34:23 +03:00
aykut-bozkurt	e0b4455e45	sysid should be parsed as int. (#6150 ) (cherry picked from commit `898801504e`)	2022-08-11 11:03:41 +03:00
Onur Tirtir	f8e3e8c444	Add CHANGELOG entries for 11.0.5 (#6108 ) (cherry picked from commit `0a04b115aa`)	2022-08-01 13:40:43 +03:00
Onur Tirtir	a18f6c4e40	Bump citus version to 11.0.5	2022-08-01 10:56:31 +03:00
Jelte Fennema	d6c885713e	Work around flaky test related to search_path (#5894 ) For some reason search_path is not always set correctly on the worker when calling a distributed function, this shows up when calling `insert_document` in our distributed_triggers test. The underlying reason is currently unknown and warrants deeper investigation. Currently this test is one of the main causes for random CI failures. So this change sets the search_path of each function explicitly, to reduce these failures. So other devs can be more efficient, while I continue investigating the root cause of this issue. Also changes explicit `SET citus.enable_unsafe_triggers = false` to `RESET citus.enable_unsafe_triggers` in passing. (cherry picked from commit `6d8c5931d6`)	2022-08-01 10:17:26 +03:00
Ying Xu	a8aa82a3ec	Bugfix for IN clause to be considered during planner phase in Columnar (#6030 ) Reported bug #5803 shows that we are currently not sending the IN clause to our planner for columnar. This PR fixes it by checking for ScalarArrayOpExpr in ExtractPushdownClause so that we do not skip it. Also added a test case for this new addition.	2022-07-29 17:56:24 +02:00
Ahmet Gedemenli	2f1719c149	Do not create truncate triggers on foreign tables (#6103 )	2022-07-29 16:43:09 +03:00
Marco Slot	4eb0749369	Avoid catalog read via superuser() call in DecrementSharedConnectionCounter	2022-07-29 14:22:51 +02:00
Marco Slot	4439124b6d	Fix issues with insert..select casts and column ordering	2022-07-28 13:54:04 +02:00
Jelte Fennema	1cf079581f	Avoid possible information leakage about existing users (#6090 ) (cherry picked from commit `0f50bef696`)	2022-07-27 17:58:24 +02:00
Ahmet Gedemenli	4d01af5160	Error out for views with circular dependencies (#6051 ) Adds error check for views with circular dependencies (cherry picked from commit `2b2a529653`)	2022-07-27 17:59:49 +03:00
Marco Slot	e45b6ece0d	Allow WITH HOLD cursors with parameters	2022-07-27 14:08:18 +02:00
Onder Kalaci	9af736c7a6	Concurrent shard move/copy and colocated table creation fix It turns out that create_distributed_table and citus_move/copy_shard_placement does not work well concurrently. To fix that, we need to acquire a lock, which sounds like a good use of colocation lock. However, the current usage of colocation lock is limited to higher level UDFs like rebalance_table_shards etc. Those usage of lock is still useful, but we cannot acquire the same lock on citus_move_shard_placement etc. because the coordinator connects to itself to acquire the lock. Hence, the high level UDF blocks itself. To fix that, we use one more colocation lock, with the placements are the main objects to consider. (cherry picked from commit `12fa3aaf6b`)	2022-07-27 10:10:46 +02:00
Onder Kalaci	a21a4e128c	Optimize StringJoin() for when prefix-postfix is needed Before this commit, we required multiple copies of the same stringInfo if we needed to append/prepend data to the stringInfo. Now, we optionally get prefix/postfix. For large string operations, this can save up to %10 memory. (cherry picked from commit `26fdcb68f0`)	2022-07-27 10:02:32 +02:00
Onder Kalaci	2a684e426c	Do not cache all the metadata during fix_all_partition_shard_index_names (cherry picked from commit `f076e81166`)	2022-07-27 10:02:05 +02:00
Onder Kalaci	377375de2a	Reduce memory consumption while adjust partition index names Previously, CreateFixPartitionShardIndexNames() created all the relevant query strings for all the shards, and executed the large query string. And, in terms of the memory consumption, this huge command (and its ExprContext generated while running the command) is the main bottleneck/ With this change, we are reducing the total amount of memory usage to almost 1/shard_count. On my local machine, a distributed partitioned table with 120 partitions, each 32 shards, the total memory consumption reduced from ~3GB to ~0.1GB. And, the total execution time increased from ~28 seconds to ~30 seconds. This seems like a good trade-off. (cherry picked from commit `b8008999dc`)	2022-07-27 10:02:00 +02:00
Nitish Upreti	fcdf4434c6	Fix blocking shard moves failure due to constraint failure. DESCRIPTION: Fix Bug #4949 where Blocking shard moves fails if there is a foreign key between partitioned distributed tables (from child to parent). This is because we try to create constraints before attaching child partitions to parent. This causes constraint failure as parent table will be empty. Fix is to reverse the order i.e. attach partitions before we create constraints. TESTING: Added a new test 'shard_move_constraints_blocking' inspired for existing 'shard_move_constraints' where we trigger shard move with 'block_writes' instead of 'force_logical' to add coverage for this scenario.	2022-07-24 21:21:25 -07:00
Hanefi Onaldi	5ca792aef9	Bump Citus version to 11.0.4	2022-07-13 18:06:04 +03:00
Hanefi Onaldi	096047bfbc	Add changelog entry for 11.0.4	2022-07-13 18:05:19 +03:00
Onder Kalaci	c51095c462	Add more generic read-replica tests (cherry picked from commit `6cd7319f12`)	2022-07-13 15:16:04 +02:00
Onder Kalaci	857a770b86	Add regression tests for LOCK command citus.use_secondary_nodes=always mode (cherry picked from commit `3c343d4563`)	2022-07-13 15:15:52 +02:00
Onder Kalaci	06e55df141	Make sure citus_is_coordinator works on read replicas (cherry picked from commit `b2e9a5baf1`)	2022-07-13 15:15:46 +02:00
Onder Kalaci	06d6ffbb6e	LOCK COMMAND does not require primaries at the start (cherry picked from commit `8ab696f7e2`)	2022-07-13 15:15:40 +02:00
Hanefi Onaldi	2bb106508a	Bump Citus version to 11.0.3	2022-07-05 13:19:10 +03:00
Hanefi Onaldi	5f46f2e9f7	Add changelog entry for 11.0.3 (cherry picked from commit `c33915c3e6`)	2022-07-05 13:19:10 +03:00
Ahmet Gedemenli	ac7511de7d	Fix matviews for citus_add_local_table_to_metadata (#6023 ) (cherry picked from commit `c8e1e243b8`)	2022-07-04 17:01:40 +03:00
Hanefi Onaldi	0eee7fd9b8	Fix downgrade scripts from 11.0-2 to 11.0-1 (cherry picked from commit `f60809a6c1`) Conflicts: src/test/regress/expected/multi_extension.out src/test/regress/sql/multi_extension.sql	2022-06-29 22:52:07 +03:00
Önder Kalacı	03a4305e06	Fixes a bug that prevents upgrades when there are no worker nodes (#6037 ) (cherry picked from commit `bab4c0a8c3`)	2022-06-29 14:36:24 +03:00

1 2 3 4 5 ...

5893 Commits (7603fe510ac34e6e534aaaceaea50c7294fd142f) All Branches Search

5893 Commits (7603fe510ac34e6e534aaaceaea50c7294fd142f)

All Branches