citus

Commit Graph

Author	SHA1	Message	Date
Colm	bb6eeb17cc	Fix bug in redundant WHERE clause detection. (#8162 ) Need to also check Postgres plan's rangetables for relations used in Initplans. DESCRIPTION: Fix a bug in redundant WHERE clause detection; we need to additionally check the Postgres plan's range tables for the presence of citus tables, to account for relations that are referenced from scalar subqueries. There is a fundamental flaw in `4139370`, the assumption that, after Postgres planning has completed, all tables used in a query can be obtained by walking the query tree. This is not the case for scalar subqueries, which will be referenced by `PARAM` nodes. The fix adds an additional check of the Postgres plan range tables; if there is at least one citus table in there we do not need to change the needs distributed planning flag. Fixes #8159	2025-08-27 13:32:02 +01:00
Muhammad Usama	be6668e440	Snapshot-Based Node Split – Foundation and Core Implementation (#8122 ) DESCRIPTION: This pull request introduces the foundation and core logic for the snapshot-based node split feature in Citus. This feature enables promoting a streaming replica (referred to as a clone in this feature and UI) to a primary node and rebalancing shards between the original and the newly promoted node without requiring a full data copy. This significantly reduces rebalance times for scale-out operations where the new node already contains a full copy of the data via streaming replication. Key Highlights: 1. Replica (Clone) Registration & Management Infrastructure Introduces a new set of UDFs to register and manage clone nodes: - citus_add_clone_node() - citus_add_clone_node_with_nodeid() - citus_remove_clone_node() - citus_remove_clone_node_with_nodeid() These functions allow administrators to register a streaming replica of an existing worker node as a clone, making it eligible for later promotion via snapshot-based split. 2. Snapshot-Based Node Split (Core Implementation) New core UDF: - citus_promote_clone_and_rebalance() This function implements the full workflow to promote a clone and rebalance shards between the old and new primaries. Steps include: 1. Ensuring Exclusivity – Blocks any concurrent placement-changing operations. 2. Blocking Writes – Temporarily blocks writes on the primary to ensure consistency. 3. Replica Catch-up – Waits for the replica to be fully in sync. 4. Promotion – Promotes the replica to a primary using pg_promote. 5. Metadata Update – Updates metadata to reflect the newly promoted primary node. 6. Shard Rebalancing – Redistributes shards between the old and new primary nodes. 3. Split Plan Preview A new helper UDF get_snapshot_based_node_split_plan() provides a preview of the shard distribution post-split, without executing the promotion. Example: ``` reb 63796> select * from pg_catalog.get_snapshot_based_node_split_plan('127.0.0.1',5433,'127.0.0.1',5453); table_name \| shardid \| shard_size \| placement_node --------------+---------+------------+---------------- companies \| 102008 \| 0 \| Primary Node campaigns \| 102010 \| 0 \| Primary Node ads \| 102012 \| 0 \| Primary Node mscompanies \| 102014 \| 0 \| Primary Node mscampaigns \| 102016 \| 0 \| Primary Node msads \| 102018 \| 0 \| Primary Node mscompanies2 \| 102020 \| 0 \| Primary Node mscampaigns2 \| 102022 \| 0 \| Primary Node msads2 \| 102024 \| 0 \| Primary Node companies \| 102009 \| 0 \| Clone Node campaigns \| 102011 \| 0 \| Clone Node ads \| 102013 \| 0 \| Clone Node mscompanies \| 102015 \| 0 \| Clone Node mscampaigns \| 102017 \| 0 \| Clone Node msads \| 102019 \| 0 \| Clone Node mscompanies2 \| 102021 \| 0 \| Clone Node mscampaigns2 \| 102023 \| 0 \| Clone Node msads2 \| 102025 \| 0 \| Clone Node (18 rows) ``` 4 Test Infrastructure Enhancements - Added a new test case scheduler for snapshot-based split scenarios. - Enhanced pg_regress_multi.pl to support creating node backups with slightly modified options to simulate real-world backup-based clone creation. ### 5. Usage Guide The snapshot-based node split can be performed using the following workflow: - Take a Backup of the Worker Node Run pg_basebackup (or an equivalent tool) against the existing worker node to create a physical backup. `pg_basebackup -h <primary_worker_host> -p <port> -D /path/to/replica/data --write-recovery-conf ` - Start the Replica Node Start PostgreSQL on the replica using the backup data directory, ensuring it is configured as a streaming replica of the original worker node. - Register the Backup Node as a Clone Mark the registered replica as a clone of its original worker node: `SELECT * FROM citus_add_clone_node('<clone_host>', <clone_port>, '<primary_host>', <primary_port>); ` - Promote and Rebalance the Clone Promote the clone to a primary and rebalance shards between it and the original worker: `SELECT * FROM citus_promote_clone_and_rebalance('clone_node_id'); ` - Drop Any Replication Slots from the Original Worker After promotion, clean up any unused replication slots from the original worker: `SELECT pg_drop_replication_slot('<slot_name>'); `	2025-08-19 14:13:55 +03:00
Muhammad Usama	f743b35fc2	Parallelize Shard Rebalancing & Unlock Concurrent Logical Shard Moves (#7983 ) DESCRIPTION: Parallelizes shard rebalancing and removes the bottlenecks that previously blocked concurrent logical-replication moves. These improvements reduce rebalance windows—particularly for clusters with large reference tables and enable multiple shard transfers to run in parallel. Motivation: Citus’ shard rebalancer has some key performance bottlenecks: Sequential Movement of Reference Tables: Reference tables are often assumed to be small, but in real-world deployments, they can grow significantly large. Previously, reference table shards were transferred as a single unit, making the process monolithic and time-consuming. No Parallelism Within a Colocation Group: Although Citus distributes data using colocated shards, shard movements within the same colocation group were serialized. In environments with hundreds of distributed tables colocated together, this serialization significantly slowed down rebalance operations. Excessive Locking: Rebalancer used restrictive locks and redundant logical replication guards, further limiting concurrency. The goal of this commit is to eliminate these inefficiencies and enable maximum parallelism during rebalance, without compromising correctness or compatibility. Parallelize shard rebalancing to reduce rebalance time. Feature Summary: 1. Parallel Reference Table Rebalancing Each reference-table shard is now copied in its own background task. Foreign key and other constraints are deferred until all shards are copied. For single shard movement without considering colocation a new internal-only UDF '`citus_internal_copy_single_shard_placement`' is introduced to allow single-shard copy/move operations. Since this function is internal, we do not allow users to call it directly. Temporary Hack to Set Background Task Context Background tasks cannot currently set custom GUCs like application_name before executing internal-only functions. 'citus_rebalancer ...' statement as a prefix in the task command. This is a temporary hack to label internal tasks until proper GUC injection support is added to the background task executor. 2. Changes in Locking Strategy - Drop the leftover replication lock that previously serialized shard moves performed via logical replication. This lock was only needed when we used to drop and recreate the subscriptions/publications before each move. Since Citus now removes those objects later as part of the “unused distributed objects” cleanup, shard moves via logical replication can safely run in parallel without additional locking. - Introduced a per-shard advisory lock to prevent concurrent operations on the same shard while allowing maximum parallelism elsewhere. - Change the lock mode in AcquirePlacementColocationLock from ExclusiveLock to RowExclusiveLock to allow concurrent updates within the same colocation group, while still preventing concurrent DDL operations. 3. citus_rebalance_start() enhancements The citus_rebalance_start() function now accepts two new optional parameters: ``` - parallel_transfer_colocated_shards BOOLEAN DEFAULT false, - parallel_transfer_reference_tables BOOLEAN DEFAULT false ``` This ensures backward compatibility by preserving the existing behavior and avoiding any disruption to user expectations and when both are set to true, the rebalancer operates with full parallelism. Previous Rebalancer Behavior: `SELECT citus_rebalance_start(shard_transfer_mode := 'force_logical');` This would: Start a single background task for replicating all reference tables Then, move all shards serially, one at a time. ``` Task 1: replicate_reference_tables() ↓ Task 2: move_shard_1() ↓ Task 3: move_shard_2() ↓ Task 4: move_shard_3() ``` Slow and sequential. Reference table copy is a bottleneck. Colocated shards must wait for each other. New Parallel Rebalancer: ``` SELECT citus_rebalance_start( shard_transfer_mode := 'force_logical', parallel_transfer_colocated_shards := true, parallel_transfer_reference_tables := true ); ``` This would: - Schedule independent background tasks for each reference-table shard. - Move colocated shards in parallel, while still maintaining dependency order. - Defer constraint application until all reference shards are in place. - ``` Task 1: copy_ref_shard_1() Task 2: copy_ref_shard_2() Task 3: copy_ref_shard_3() → Task 4: apply_constraints() ↓ Task 5: copy_shard_1() Task 6: copy_shard_2() Task 7: copy_shard_3() ↓ Task 8-10: move_shard_1..3() ``` Each operation is scheduled independently and can run as soon as dependencies are satisfied.	2025-08-18 17:44:14 +03:00
eaydingol	8d929d3bf8	Push down recurring outer joins when possible (#7973 ) DESCRIPTION: Adds support for pushing down LEFT/RIGHT outer joins having a reference table in the outer side and a distributed table on the inner side (e.g., <reference table> LEFT JOIN <distributed table>) Partially addresses #6546 1) `<outer:reference>` LEFT JOIN `<inner:distributed>` 2) `<inner:distributed>` RIGHT JOIN `<outer:reference>` Previously, for outer joins of types (1) and (2), the distributed side was computed recursively. This was necessary because, when the inner side of a recurring outer join is a distributed table, it is not possible to directly distribute the join; the preserved (outer and recurring) side may generate rows with join keys that hash to different shards. To implement distributed planning while maintaining consistency with global execution semantics, this PR restricts the outer side only to those partition key values that route to the selected shard during distributed shard query computation. This method is employed )when the following criteria are met: (recursive planning applied otherwise) - The join type is (1) or (2) (lateral joins are not supported). - The outer side is a reference table. - The outer join qualifications include an equality condition between the partition column of a distributed table and the recurring table. - The join is not part of a chained join. - The “enable_recurring_outer_join_pushdown” GUC is enabled (default is on). --------- Co-authored-by: ebruaydingol <ebruaydingol@microsoft.com> Co-authored-by: Onur Tirtir <onurcantirtir@gmail.com>	2025-08-18 14:03:44 +03:00
Onur Tirtir	87a1b631e8	Not automatically create citus_columnar when creating citus extension (#8081 ) DESCRIPTION: Not automatically create citus_columnar when there are no relations using it. Previously, we were always creating citus_columnar when creating citus with version >= 11.1. And how we were doing was as follows: * Detach SQL objects owned by old columnar, i.e., "drop" them from citus, but not actually drop them from the database * "old columnar" is the one that we had before Citus 11.1 as part of citus, i.e., before splitting the access method ands its catalog to citus_columnar. * Create citus_columnar and attach the SQL objects leftover from old columnar to it so that we can continue supporting the columnar tables that user had before Citus 11.1 with citus_columnar. First part is unchanged, however, now we don't create citus_columnar automatically anymore if the user didn't have any relations using columnar. For this reason, as of Citus 13.2, when these SQL objects are not owned by an extension and there are no relations using columnar access method, we drop these SQL objects when updating Citus to 13.2. The net effect is still the same as if we automatically created citus_columnar and user dropped citus_columnar later, so we should not have any issues with dropping them. (Update: Seems we've made some assumptions in citus, e.g., citus_finish_pg_upgrade() still assumes columnar metadata exists and tries to apply some fixes for it, so this PR fixes them as well. See the last section of this PR description.) Also, ideally I was hoping to just remove some lines of code from extension.c, where we decide automatically creating citus_columnar when creating citus, however, this didn't happen to be the case for two reasons: * We still need to automatically create it for the servers using columnar access method. * We need to clean-up the leftover SQL objects from old columnar when the above is not case otherwise we would have leftover SQL objects from old columnar for no reason, and that would confuse users too. * Old columnar cannot be used to create columnar tables properly, so we should clean them up and let the user decide whether they want to create citus_columnar when they really need it later. --- Also made several changes in the test suite because similarly, we don't always want to have citus_columnar created in citus tests anymore: * Now, columnar specific test targets, which cover 41 test sql files, always install columnar by default, by using "--load-extension=citus_columnar". * "--load-extension=citus_columnar" is not added to citus specific test targets because by default we don't want to have citus_columnar created during citus tests. * Excluding citus_columnar specific tests, we have 601 sql files that we have as citus tests and in 27 of them we manually create citus_columnar at the very beginning of the test because these tests do test some functionalities of citus together with columnar tables. Also, before and after schedules for PG upgrade tests are now duplicated so we have two versions of each: one with columnar tests and one without. To choose between them, check-pg-upgrade now supports a "test-with-columnar" option, which can be set to "true" or anything else to logically indicate "false". In CI, we run the check-pg-upgrade test target with both options. The purpose is to ensure we can test PG upgrades where citus_columnar is not created in the cluster before the upgrade as well. Finally, added more tests to multi_extension.sql to test Citus upgrade scenarios with / without columnar tables / citus_columnar extension. --- Also, seems citus_finish_pg_upgrade was assuming that citus_columnar is always created but actually we should have never made such an assumption. To fix that, moved columnar specific post-PG-upgrade work from citus to a new columnar UDF, which is columnar_finish_pg_upgrade. But to avoid breaking existing customer / managed service scripts, we continue to automatically perform post PG-upgrade work for columnar within citus_finish_pg_upgrade, but only if columnar access method exists this time.	2025-08-18 08:29:27 +01:00
eaydingol	3d8fd337e5	Check outer table partition column (#8092 ) DESCRIPTION: Introduce a new check to push down a query including union and outer join to fix #8091 . In "SafeToPushdownUnionSubquery", we check if the distribution column of the outer relation is in the target list.	2025-08-06 16:13:14 +03:00
Cédric Villemain	0c1b31cdb5	Fix UPDATE stmts with indirection & array/jsonb subscripting with more than 1 field (#7675 ) DESCRIPTION: Fixes problematic UPDATE statements with indirection and array/jsonb subscripting with more than one field. Fixes #4092, #7674 and #5621. Issues #7674 and #4092 involve an UPDATE with out of order columns and a sublink (SELECT) in the source, e.g. `UPDATE T SET (col3, col1, col4) = (SELECT 3, 1, 4)` where an incorrect value could get written to a column because query deparsing generated an incorrect SQL statement. To address this the fix adds an additional check to `ruleutils` to ensure that the target list of an UPDATE statement is in an order so that deparsing can be done safely. It is needed when the source of the UPDATE has a sublink, because Postgres `rewrite` will have put the target list in attribute order, but for deparsing to produce a correct SQL text the target list needs to be in order of the references (or `paramids`) to the target list of the sublink(s). Issue #5621 involves an UPDATE with array/jsonb subscripting that can behave incorrectly with more than one field, again because Citus query deparsing is receiving a post-`rewrite` query tree. The fix also adds a check to `ruleutils` to enable correct query deparsing of the UPDATE. --------- Co-authored-by: Ibrahim Halatci <ihalatci@gmail.com> Co-authored-by: Colm McHugh <colm.mchugh@gmail.com>	2025-07-22 17:49:26 +01:00
Naisila Puka	a18040869a	Error out for queries with outer joins and pseudoconstant quals in PG<17 (#7937 ) PG15 commit d1ef5631e620f9a5b6480a32bb70124c857af4f1 and PG16 commit 695f5deb7902865901eb2d50a70523af655c3a00 disallow replacing joins with scans in queries with pseudoconstant quals. This commit prevents the set_join_pathlist_hook from being called if any of the join restrictions is a pseudo-constant. So in these cases, citus has no info on the join, never sees that the query has an outer join, and ends up producing an incorrect plan. PG17 fixes this by commit 9e9931d2bf40e2fea447d779c2e133c2c1256ef3 Therefore, we take this extra measure here for PG versions less than 17. hasOuterJoin can never be true when set_join_pathlist_hook is absent.	2025-05-11 21:47:28 +00:00
Onur Tirtir	3ce731d497	Make multi_metadata_sync runnable via run_test.py (#7472 )	2024-01-31 09:50:16 +00:00
Jelte Fennema-Nio	fcfedff8d1	Support running isolation_update_node in flaky test detection (#7425 ) I noticed in #7423 that `isolation_update_node` could not be run using flaky test detection. This fixes that.	2024-01-17 15:36:26 +00:00
Onur Tirtir	1d55debb98	Support CREATE / DROP database commands from any node (#7359 ) DESCRIPTION: Adds support for issuing `CREATE`/`DROP` DATABASE commands from worker nodes With this commit, we allow issuing CREATE / DROP DATABASE commands from worker nodes too. As in #7278, this is not allowed when the coordinator is not added to metadata because we don't ever sync metadata changes to coordinator when adding coordinator to the metadata via `SELECT citus_set_coordinator_host('<hostname>')`, or equivalently, via `SELECT citus_add_node(<coordinator_node_name>, <coordinator_node_port>, 0)`. We serialize database management commands by acquiring a Citus specific advisory lock on the first primary worker node if there are any workers in the cluster. As opposed to what we've done in https://github.com/citusdata/citus/pull/7278 for role management commands, we try to avoid from running into distributed deadlocks as much as possible. This is because, while distributed deadlocks that can happen around role management commands can be detected by Citus, this is not the case for database management commands because most of them cannot be run inside in a transaction block. In that case, Citus cannot even detect the distributed deadlock because the command is not part of a distributed transaction at all, then the command execution might not return the control back to the user for an indefinite amount of time.	2024-01-08 16:47:49 +00:00
Karina	20dc58cf5d	Fix getting heap tuple size (#7387 ) This fixes #7230. First of all, using HeapTupleHeaderGetDatumLength(heapTuple) is definetly wrong, it gives a number that's 4 times less than the correct tuple size (heapTuple.t_len). See https://github.com/postgres/postgres/blob/REL_16_0/src/include/access/htup_details.h#L455-L456 https://github.com/postgres/postgres/blob/REL_16_0/src/include/varatt.h#L279 https://github.com/postgres/postgres/blob/REL_16_0/src/include/varatt.h#L225-L226 When I fixed it, the limit_intermediate_size test failed, so I tried to understand what's going on there. In original commit `fd546cf` these queries were supposed to fail. Then in `b3af63c` three of the queries that were supposed to fail suddenly worked and tests were changed to pass without understanding why the output had changed or how to keep test testing what it had to test. Even comments saying that these queries should fail were left untouched. Commit message gives no clue about why exactly test has changed: > It seems that when we use adaptive executor instead of task tracker, we > exceed the intermediate result size less in the test. Therefore updated > the tests accordingly. Then `3fda2c3` also blindly raised the limit for one of the queries to keep it working: `3fda2c3254 (diff-a9b7b617f9dfd345318cb8987d5897143ca1b723c87b81049bbadd94dcc86570R19)` When in `fe3caf3` that HeapTupleHeaderGetDatumLength(heapTuple) call was finally added, one of those test queries became failing again. The other two of them now also failing after the fix. I don't understand how exactly the calculation of "intermediate result size" that is limited by citus.max_intermediate_result_size had changed through `b3af63c` and `fe3caf3`, but these numbers are now closer to what they originally were when this limitation was added in `fd546cf`. So these queries should fail, like in the original version of the limit_intermediate_size test. Co-authored-by: Karina Litskevich <litskevichkarina@gmail.com>	2024-01-08 17:09:30 +01:00
Onur Tirtir	968ac74cde	Fix foreign_key_to_reference_shard_rebalance test (#7400 ) foreign_key_to_reference_shard_rebalance failed because partition of 2024 year does not exist, fixed by add default partition. Replaces https://github.com/citusdata/citus/pull/7396 by adding a rule that allows properly testing foreign_key_to_reference_shard_rebalance via run_test.py. Closes #7396 Co-authored-by: chuhx <148182736+cstarc1@users.noreply.github.com>	2024-01-04 13:16:45 +01:00
Onur Tirtir	d940cfa992	Do nothing if the database is not distributed (#7392 ) Fixes the remaining cases reported in https://github.com/citusdata/citus/issues/7370.	2024-01-03 17:03:06 +03:00
Naisila Puka	c6fbb72c02	Fix flaky multi_prepare_plsql (#7346 ) Simple need of an `ORDER BY` clause Ran into this twice this week already! https://github.com/citusdata/citus/actions/runs/6849701315/attempts/1#summary-18622563506 https://github.com/citusdata/citus/actions/runs/6875051160/attempts/1#summary-18698009952 ```diff SELECT nspname, typname FROM pg_type JOIN pg_namespace ON pg_namespace.oid = pg_type.typnamespace WHERE typname = 'prepare_ddl_type_backup'; nspname \| typname -------------+------------------------- - public \| prepare_ddl_type_backup otherschema \| prepare_ddl_type_backup + public \| prepare_ddl_type_backup (2 rows) ```	2023-11-15 13:28:43 +03:00
Naisila Puka	cdef2d5224	Random tests refactoring (#7342 ) While investigating replication slots leftovers in PR https://github.com/citusdata/citus/pull/7338, I ran into the following refactoring/cleanup that can be done in our test suite: - Add separate test to remove non default nodes - Remove coordinator removal from `add_coordinator` test Use `remove_coordinator_from_metadata` test where needed - Don't print nodeids in `multi_multiuser_auth` and `multi_poolinfo_usage` tests - Use `startswith` when checking for isolation or failure tests - Add some dependencies accordingly in `run_test.py` for running flaky test schedules	2023-11-14 12:49:15 +03:00
Naisila Puka	57ff762c82	Fix VACUUM flakiness in multi_utilities (#7334 ) When I run this test in my local, the size of the table after the DELETE command is around 58785792. Hence, I assume that the diffs suggest that the Vacuum had no effect. The current solution is to run the VACUUM command three times instead of once. Example diff: https://github.com/citusdata/citus/actions/runs/6722231142/attempts/1#summary-18269870674 ```diff insert into local_vacuum_table select i from generate_series(1,1000000) i; delete from local_vacuum_table; VACUUM local_vacuum_table; SELECT CASE WHEN s BETWEEN 20000000 AND 25000000 THEN 22500000 ELSE s END FROM pg_total_relation_size('local_vacuum_table') s ; s ---------- - 22500000 + 58785792 (1 row) ``` See more diff examples in the PR description https://github.com/citusdata/citus/pull/7334	2023-11-09 21:00:24 +03:00
Onur Tirtir	5e2439a117	Make some more tests re-runable (#7322 ) * multi_mx_create_table * multi_mx_function_table_reference * multi_mx_add_coordinator * create_role_propagation * metadata_sync_helpers * text_search https://github.com/citusdata/citus/pull/7278 requires this.	2023-11-02 18:32:56 +03:00
Onur Tirtir	9867c5b949	Fix flaky multi_mx_node_metadata.sql test (#7317 ) Fixes the flaky test that results in following diff: ```diff --- /__w/citus/citus/src/test/regress/expected/multi_mx_node_metadata.out.modified 2023-11-01 14:22:12.890476575 +0000 +++ /__w/citus/citus/src/test/regress/results/multi_mx_node_metadata.out.modified 2023-11-01 14:22:12.914476657 +0000 @@ -840,24 +840,26 @@ (1 row) \c :datname - - :master_port SELECT datname FROM pg_stat_activity WHERE application_name LIKE 'Citus Met%'; datname ------------ db_to_drop (1 row) DROP DATABASE db_to_drop; +ERROR: database "db_to_drop" is being accessed by other users SELECT datname FROM pg_stat_activity WHERE application_name LIKE 'Citus Met%'; datname ------------ -(0 rows) + db_to_drop +(1 row) -- cleanup DROP SEQUENCE sequence CASCADE; NOTICE: drop cascades to default value for column a of table reference_table ```	2023-11-02 11:02:34 +00:00
Cédric Villemain	37415ef8f5	Allow citus__size on index related to a distributed table (#7271 ) I just enhanced the existing code to check if the relation is an index belonging to a distributed table. If so the shardId is appended to relation (index) name and the _size function are executed as before. There is a change in an extern function: `extern StringInfo GenerateSizeQueryOnMultiplePlacements(...)` It's possible to create a new function and deprecate this one later if compatibility is an issue. Fixes https://github.com/citusdata/citus/issues/6496. DESCRIPTION: Allows using Citus size functions on distributed tables indexes. --------- Co-authored-by: Onur Tirtir <onurcantirtir@gmail.com>	2023-11-01 09:05:51 +00:00
Jelte Fennema-Nio	788e09a39a	Add a test for citus_shards where table names have spaces (#7224 ) There was a bug reported for previous versions of Citus where shard\_size was returning NULL for tables with spaces in them. It works fine on the main branch though, but I'm still adding a test for this to the main branch because it seems a good test to have.	2023-10-16 11:38:24 +02:00
Gürkan İndibay	7fa109c977	Adds alter user missing features (#7204 ) DESCRIPTION: Adds alter user rename propagation and enriches alter user tests --------- Co-authored-by: Jelte Fennema <github-tech@jeltef.nl>	2023-09-26 12:28:07 +03:00
aykut-bozkurt	8eb3360017	Fixes visibility problems with dependency propagation (#7028 ) Problem: Previously we always used an outside superuser connection to overcome permission issues for the current user while propagating dependencies. That has mainly 2 problems: 1. Visibility issues during dependency propagation, (metadata connection propagates some objects like a schema, and outside transaction does not see it and tries to create it again) 2. Security issues (it is preferrable to use current user's connection instead of extension superuser) Solution (high level): Now, we try to make a smarter decision on whether should we use an outside superuser connection or current user's metadata connection. We prefer using current user's connection if any of the objects, which is already propagated in the current transaction, is a dependency for a target object. We do that since we assume if current user has permissions to create the dependency, then it can most probably propagate the target as well. Our assumption is expected to hold most of the times but it can still be wrong. In those cases, transaction would fail and user should set the GUC `citus.create_object_propagation` to `deferred` to work around it. Solution: 1. We track all objects propagated in the current transaction (we can handle subtransactions), 2. We propagate dependencies via the current user's metadata connection if any dependency is created in the current transaction to address issues listed above. Otherwise, we still use an outside superuser connection. DESCRIPTION: Fixes some object propagation errors seen with transaction blocks. Fixes https://github.com/citusdata/citus/issues/6614 --------- Co-authored-by: Nils Dijk <nils@citusdata.com>	2023-09-05 18:04:16 +03:00
Onur Tirtir	a830862717	Not undistribute Citus local table when converting it to a reference table / single-shard table	2023-08-29 12:57:28 +03:00
Jelte Fennema	fd1427de2c	Change by_disk_size rebalance strategy to have a base size (#7035 ) One problem with rebalancing by disk size is that shards in newly created collocation groups are considered extremely small. This can easily result in bad balances if there are some other collocation groups that do have some data. One extremely bad example of this is: 1. You have 2 workers 2. Both contain about 100GB of data, but there's a 70MB difference. 3. You create 100 new distributed schemas with a few empty tables in them 4. You run the rebalancer 5. Now all new distributed schemas are placed on the node with that had 70MB less. 6. You start loading some data in these shards and quickly the balance is completely off To address this edge case, this PR changes the by_disk_size rebalance strategy to add a a base size of 100MB to the actual size of each shard group. This can still result in a bad balance when shard groups are empty, but it solves some of the worst cases.	2023-06-27 16:37:09 +02:00
Naisila Puka	69af3e8509	Drop PG13 Support Phase 2 - Remove PG13 specific paths/tests (#7007 ) This commit is the second and last phase of dropping PG13 support. It consists of the following: - Removes all PG_VERSION_13 & PG_VERSION_14 from codepaths - Removes pg_version_compat entries and columnar_version_compat entries specific for PG13 - Removes alternative pg13 test outputs - Removes PG13 normalize lines and fix the test outputs based on that It is a continuation of `5bf163a27d`	2023-06-21 14:18:23 +03:00
Onur Tirtir	12a093b456	Allow using generated identity column based on int/smallint when creating a distributed table (#7008 ) Allow using generated identity column based on int/smallint when creating a distributed table so that applications that rely on those data types don't break. Inserting into / modifying such columns from workers is not allowed but it's better than not allowing such columns altogether.	2023-06-16 14:34:23 +03:00
Emel Şimşek	3fda2c3254	Change test files in multi and multi-1 schedules to accommodate coordinator in the metadata. (#6939 ) Changes test files in multi and multi-1 schedules such that they accomodate coordinator in metadata. Changes fall into the following buckets: 1. When coordinator is in metadata, reference table shards are present in coordinator too. This changes test outputs checking the table size, shard numbers etc. for reference tables. 2. When coordinator is in metadata, postgres tables are converted to citus local tables whenever a foreign key relationship to them is created. This changes some test cases which tests it should not be possible to create foreign keys to postgres tables. 3. Remove lines that add/remove coordinator for testing purposes.	2023-06-05 10:37:48 +03:00
Ahmet Gedemenli	1ca80813f6	Citus UDFs support for single shard tables (#6916 ) Verify Citus UDFs work well with single shard tables SUPPORTED * citus_table_size * citus_total_relation_size * citus_relation_size * citus_shard_sizes * truncate_local_data_after_distributing_table * create_distributed_function // test function colocated with a single shard table * undistribute_table * alter_table_set_access_method UNSUPPORTED - error out for single shard tables * master_create_empty_shard * create_distributed_table_concurrently * create_distributed_table * create_reference_table * citus_add_local_table_to_metadata * citus_split_shard_by_split_points * alter_distributed_table	2023-05-26 17:30:05 +03:00
Onur Tirtir	246b054a7d	Add support for schema-based-sharding via a GUC (#6866 ) DESCRIPTION: Adds citus.enable_schema_based_sharding GUC that allows sharding the database based on schemas when enabled. * Refactor the logic that automatically creates Citus managed tables * Refactor CreateSingleShardTable() to allow specifying colocation id instead * Add support for schema-based-sharding via a GUC ### What this PR is about: Add citus.enable_schema_based_sharding GUC to enable schema-based sharding. Each schema created while this GUC is ON will be considered as a tenant schema. Later on, regardless of whether the GUC is ON or OFF, any table created in a tenant schema will be converted to a single shard distributed table (without a shard key). All the tenant tables that belong to a particular schema will be co-located with each other and will have a shard count of 1. We introduce a new metadata table --pg_dist_tenant_schema-- to do the bookkeeping for tenant schemas: ```sql psql> \d pg_dist_tenant_schema Table "pg_catalog.pg_dist_tenant_schema" ┌───────────────┬─────────┬───────────┬──────────┬─────────┐ │ Column │ Type │ Collation │ Nullable │ Default │ ├───────────────┼─────────┼───────────┼──────────┼─────────┤ │ schemaid │ oid │ │ not null │ │ │ colocationid │ integer │ │ not null │ │ └───────────────┴─────────┴───────────┴──────────┴─────────┘ Indexes: "pg_dist_tenant_schema_pkey" PRIMARY KEY, btree (schemaid) "pg_dist_tenant_schema_unique_colocationid_index" UNIQUE, btree (colocationid) psql> table pg_dist_tenant_schema; ┌───────────┬───────────────┐ │ schemaid │ colocationid │ ├───────────┼───────────────┤ │ 41963 │ 91 │ │ 41962 │ 90 │ └───────────┴───────────────┘ (2 rows) ``` Colocation id column of pg_dist_tenant_schema can never be NULL even for the tenant schemas that don't have a tenant table yet. This is because, we assign colocation ids to tenant schemas as soon as they are created. That way, we can keep associating tenant schemas with particular colocation groups even if all the tenant tables of a tenant schema are dropped and recreated later on. When a tenant schema is dropped, we delete the corresponding row from pg_dist_tenant_schema. In that case, we delete the corresponding colocation group from pg_dist_colocation as well. ### Future work for 12.0 release: We're building schema-based sharding on top of the infrastructure that adds support for creating distributed tables without a shard key (https://github.com/citusdata/citus/pull/6867). However, not all the operations that can be done on distributed tables without a shard key necessarily make sense (in the same way) in the context of schema-based sharding. For example, we need to think about what happens if user attempts altering schema of a tenant table. We will tackle such scenarios in a future PR. We will also add a new UDF --citus.schema_tenant_set() or such-- to allow users to use an existing schema as a tenant schema, and another one --citus.schema_tenant_unset() or such-- to stop using a schema as a tenant schema in future PRs.	2023-05-26 10:49:58 +03:00
Jelte Fennema	350a0f6417	Support running Citus upgrade tests with run_test.py (#6832 ) Citus upgrade tests require some additional logic to run, because we have a before and after schedule and we need to swap the Citus version in-between. This adds that logic to `run_test.py`. In passing this makes running upgrade tests locally multiple times faster by caching tarballs.	2023-05-23 14:38:54 +02:00
Emel Şimşek	02f815ce1f	Disable local execution when Explain Analyze is requested for a query. (#6892 ) DESCRIPTION: Fixes a crash when explain analyze is requested for a query that is normally locally executed. When explain analyze is requested for a query, a task with two queries is created. Those two queries are 1. Wrapped Query --> `SELECT ... FROM worker_save_query_explain_analyze(<query>, <explain analyze options>)` 2. Fetch Query -->` SELECT explain_analyze_output, execution_duration FROM worker_last_saved_explain_analyze();` When the query is locally executed a task with multiple queries causes a crash in production. See the Assert at `57455dc64d/src/backend/distributed/executor/tuple_destination.c`#:~:text=Assert(task%2D%3EqueryCount%20%3D%3D%201)%3B This becomes a critical issue when auto_explain extension is used. When auto_explain extension is enabled, explain analyze is automatically requested for every query. One possible solution could be not to create two queries for a locally executed query. The fetch part may not have to be a query since the values are available in local variables. Until we enable local execution for explain analyze, it is best to disable local execution. Fixes #6777.	2023-05-23 14:33:22 +03:00
Emel Şimşek	f9a5be59b9	Run replicate_reference_tables background task as superuser. (#6930 ) DESCRIPTION: Fixes a bug in background shard rebalancer where the replicate reference tables task fails if the current user is not a superuser. This change is to be backported to earlier releases. We should fix the permissions for replicate_reference_tables on main branch such that it can be run by non-superuser roles. Fixes #6925. Fixes #6926.	2023-05-18 23:46:32 +03:00
Onur Tirtir	56d217b108	Mark objects as distributed even when pg_dist_node is empty (#6900 ) We mark objects as distributed objects in Citus metadata only if we need to propagate given the command that creates it to worker nodes. For this reason, we were not doing this for the objects that are created while pg_dist_node is empty. One implication of doing so is that we defer the schema propagation to the time when user creates the first distributed table in the schema. However, this doesn't help for schema-based sharding (#6866) because we want to sync pg_dist_tenant_schema to the worker nodes even for empty schemas too. * Support test dependencies for isolation tests without a schedule * Comment out a test due to a known issue (#6901) * Also, reduce the verbosity for some log messages and make some tests compatible with run_test.py.	2023-05-16 11:45:42 +03:00
Onur Tirtir	db2514ef78	Call null-shard-key tables as single-shard distributed tables in code	2023-05-03 17:02:43 +03:00
Onur Tirtir	fa467e05e7	Add support for creating distributed tables with a null shard key (#6745 ) With this PR, we allow creating distributed tables with without specifying a shard key via create_distributed_table(). Here are the the important details about those tables: * Specifying `shard_count` is not allowed because it is assumed to be 1. * We mostly call such tables as "null shard-key" table in code / comments. * To avoid doing a breaking layout change in create_distributed_table(); instead of throwing an error, it will inform the user that `distribution_type` param is ignored unless it's explicitly set to NULL or 'h'. * `colocate_with` param allows colocating such null shard-key tables to each other. * We define this table type, i.e., NULL_SHARD_KEY_TABLE, as a subclass of DISTRIBUTED_TABLE because we mostly want to treat them as distributed tables in terms of SQL / DDL / operation support. * Metadata for such tables look like: - distribution method => DISTRIBUTE_BY_NONE - replication model => REPLICATION_MODEL_STREAMING - colocation id => != INVALID_COLOCATION_ID (distinguishes from Citus local tables) * We assign colocation groups for such tables to different nodes in a round-robin fashion based on the modulo of "colocation id". Note that this PR doesn't care about DDL (except CREATE TABLE) / SQL / operation (i.e., Citus UDFs) support for such tables but adds a preliminary API.	2023-05-03 16:18:27 +03:00
aykut-bozkurt	a7fa1db696	fix flaky test regex (#6890 ) There was a bug related to regex. We sometimes caught the wrong line when the test name is also included in comments. Example: We caught the wrong line as multi_metadata_sync is included in the comment before the test line. ``` # ---------- # multi_metadata_sync tests the propagation of mx-related metadata changes to metadata workers # multi_unsupported_worker_operations tests that unsupported operations error out on metadata workers # ---------- test: multi_metadata_sync ``` Solution: Restrict regex rule better.	2023-04-27 13:14:40 +03:00
Jelte Fennema	a5f4fece13	Fix running PG upgrade tests with run_test.py (#6829 ) In #6814 we started using the Python test runner for upgrade tests in run_test.py, instead of the Perl based one. This had a problem though, not all tests in minimal_schedule can be run with the Python runner. This adds a separate minimal schedule for the pg_upgrade tests which doesn't include the tests that break with the Python runner. This PR also fixes various other issues that came up while testing the upgrade tests.	2023-04-24 15:54:32 +02:00
Emel Şimşek	2675a68218	Make coordinator always in metadata by default in regression tests. (#6847 ) DESCRIPTION: Changes the regression test setups adding the coordinator to metadata by default. When creating a Citus cluster, coordinator can be added in metadata explicitly by running `citus_set_coordinator_host ` function. Adding the coordinator to metadata allows to create citus managed local tables. Other Citus functionality is expected to be unaffected. This change adds the coordinator to metadata by default when creating test clusters in regression tests. There are 3 ways to run commands in a sql file (or a schedule which is a sequence of sql files) with Citus regression tests. Below is how this PR adds the coordinator to metadata for each. 1. `make <schedule_name>` Changed the sql files (sql/multi_cluster_management.sql and sql/minimal_cluster_management.sql) which sets up the test clusters such that they call `citus_set_coordinator_host`. This ensures any following tests will have the coordinator in metadata by default. 2. `citus_tests/run_test.py <sql_file_name>` Changed the python code that sets up the cluster to always call ` citus_set_coordinator_host`. For the upgrade tests, a version check is included to make sure `citus_set_coordinator_host` function is available for a given version. 3. ` make check-arbitrary-configs ` Changed the python code that sets up the cluster to always call `citus_set_coordinator_host `. #6864 will be used to track the remaining work which is to change the tests where coordinator is added/removed as a node.	2023-04-17 14:14:37 +03:00
Jelte Fennema	d04d32b314	In run_test.py actually return worker_count (#6830 ) Fixes a small mistake that was missed in the refactor of run_test.py that was done in #6816.	2023-04-05 16:38:57 +03:00
Jelte Fennema	e5e5eb35c7	Refactor run_test.py (#6816 ) Over the last few months run_test.py got more and more complex. This refactors the code in `run_test.py` to be better understandable. Mostly this splits up separate pieces of logic into separate functions.	2023-04-05 11:11:30 +02:00
Jelte Fennema	92b358fe0a	Make python-regress based tests runnable with run_test.py (#6814 ) For some tests such as upgrade tests and arbitrary config tests we set up the citus cluster using Python. This setup is slightly different from the perl based setup script (`multi_regress.pl`). Most importantly it uses replication factor 1 by default. This changes our run_test.py script to be able to run a schedule using python instead of `multi_regress.pl`, for the tests that require it. For now arbitrary config tests are still not runnable with `run_test.py`, but this brings us one step closer to being able to do that. Fixes #6804	2023-03-31 17:07:12 +02:00
Jelte Fennema	085b59f586	Fix flaky multi_mx_schema_support test (#6813 ) This happened sometimes: ```diff SELECT objid::oid::regnamespace as "Distributed Schemas" FROM pg_catalog.pg_dist_object WHERE objid::oid::regnamespace IN ('mx_old_schema', 'mx_new_schema'); Distributed Schemas --------------------- - mx_old_schema mx_new_schema + mx_old_schema (2 rows) ``` Source: https://app.circleci.com/pipelines/github/citusdata/citus/31706/workflows/edc84a6a-dfef-42b3-ab5c-54daa64c2154/jobs/1065463 In passing make multi_mx_schema_support runnable with run_test.py	2023-03-31 12:36:53 +02:00
Jelte Fennema	7b60cdd13b	Convert columnar tap tests to pytest (#6720 ) Having as little Perl as possible in our repo seems a worthy goal. Sadly Postgres its Perl based TAP infrastructure was the only way in which we could run tests that were hard to do using only SQL commands. This change adds infrastructure to run such "application style tests" using python and converts all our existing Perl TAP tests to this new infrastructure. Some of the helper functions that are added in this PR are currently unused. Most of these will be used by the CDC PR that depends on this. Some others are there because they were needed by the PgBouncer test framework that this is based on, and the functions seemed useful enough to citus testing to keep. The main features of the test suite are: 1. Application style tests using a programming language that our developers know how to write. 2. Caching of Citus clusters in-between tests using the ["fixture" pattern][fixture] from `pytest` to achieve speedy tests. To make this work in practice any changes made during a test are automatically undone. Schemas, replication slots, subscriptions, publications are dropped at the end of each test. And any changes made by `ALTER SYSTEM` or manually editing of `pg_hba.conf` are undone too. 3. Automatic parallel execution of tests using the `-n auto` flag that's added by `pytest-xdist`. This improved the speed of tests greatly with the similar test framework I created for PgBouncer. Right now it doesn't help much yet though, since this PR only adds two tests (one of which takes ~10 times longer than the other). Possible future improvements are: 1. Clean up even more things at the end of each test (e.g. users that were created). These are fairly easy to add, but I have not done so yet since they were not needed yet for this PR or the CDC PR. So I would not be able to test the cleanup easily. 2. Support for query block detection similar to what we can now do using isolation tests. [fixture]: https://docs.pytest.org/en/6.2.x/fixture.html	2023-03-31 12:25:19 +02:00
Emel Şimşek	d3fb9288ab	Schedule parallel shard moves in background rebalancer by removing task dependencies between shard moves across colocation groups. (#6756 ) DESCRIPTION: This PR removes the task dependencies between shard moves for which the shards belong to different colocation groups. This change results in scheduling multiple tasks in the RUNNABLE state. Therefore it is possible that the background task monitor can run them concurrently. Previously, all the shard moves planned in a rebalance operation took dependency on each other sequentially. For instance, given the following table and shards colocation group 1 colocation group 2 table1 table2 table3 table4 table 5 shard11 shard21 shard31 shard41 shard51 shard12 shard22 shard32 shard42 shard52 if the rebalancer planner returned the below set of moves ` {move(shard11), move(shard12), move(shard41), move(shard42)}` background rebalancer scheduled them such that they depend on each other sequentially. ``` {move(reftables) if there is any, none} \| move( shard11) \| move(shard12) \| {move(shard41)<--- move(shard12)} This is an artificial dependency move(shard41) \| move(shard42) ``` This results in artificial dependencies between otherwise independent moves. Considering that the shards in different colocation groups can be moved concurrently, this PR changes the dependency relationship between the moves as follows: ``` {move(reftables) if there is any, none} {move(reftables) if there is any, none} \| \| move(shard11) move(shard41) \| \| move(shard12) move(shard42) ``` --------- Co-authored-by: Jelte Fennema <jelte.fennema@microsoft.com>	2023-03-29 22:03:37 +03:00
rajeshkt78	85b8a2c7a1	CDC implementation for Citus using Logical Replication (#6623 ) Description: Implementing CDC changes using Logical Replication to avoid re-publishing events multiple times by setting up replication origin session, which will add "DoNotReplicateId" to every WAL entry. - shard splits - shard moves - create distributed table - undistribute table - alter distributed tables (for some cases) - reference table operations The citus decoder which will be decoding WAL events for CDC clients, ignores any WAL entry with replication origin that is not zero. It also maps the shard names to distributed table names.	2023-03-28 16:00:21 +05:30
Onur Tirtir	372a93b529	Make 8 more tests runnable multiple times via run_test.py (#6791 ) Soon I will be doing some changes related to #692 in router planner and those changes require updating ~5/6 tests related to router planning. And to make those test files runnable by run_test.py multiple times, we need to make some other tests (that they're run in parallel / they badly depend on) ready for run_test.py too.	2023-03-27 12:19:06 +03:00
aykut-bozkurt	ea3093bdb6	Make workerCount configurable for regression tests (#6764 ) Make worker count flexible in our regression tests instead of hardcoding it to 2 workers.	2023-03-20 12:06:31 +03:00
Onur Tirtir	821f26cc74	Fix flaky test detection for upgrade tests When run_test.py is run for an upgrade_._after.sql then, then automatically run the corresponding uprade_._before.sql file first. This is because all those upgrade_._after.sql files depend on the objects created in upgrade_._before.sql files by definition.	2023-03-14 17:13:52 +03:00
Jelte Fennema	b489d763e1	Use pg_total_relation_size in citus_shards (#6748 ) DESCRIPTION: Correctly report shard size in citus_shards view When looking at citus_shards, people are interested in the actual size that all the data related to the shard takes up on disk. `pg_total_relation_size` is the function to use for that purpose. The previously used `pg_relation_size` does not include indexes or TOAST. Especially the missing toast can have enormous impact on the size of the shown data.	2023-03-06 10:53:12 +01:00

1 2

65 Commits (bb6eeb17cc397893f0151ffdfcede02644179c60)