citus

Commit Graph

Author	SHA1	Message	Date
aykutbozkurt	29ef9117e6	PR #6728 / commit - 4 Add new metadata sync methods which uses MemorySyncContext api so that during the sync we can - free memory to prevent OOM, - use either transactional or nontransactional modes according to the GUC .	2023-03-30 10:53:22 +03:00
Gokhan Gulbiz	e71bfd6074	Identity column implementation refactorings (#6738 ) This pull request proposes a change to the logic used for propagating identity columns to worker nodes in citus. Instead of creating a dependent sequence for each identity column and changing its default value to `nextval(seq)/worker_nextval(seq)`, this update will pass the identity columns as-is to the worker nodes. Please note that there are a few limitations to this change. 1. Only bigint identity columns will be allowed in distributed tables to ensure compatibility with the DDL from any node functionality. Our current distributed sequence implementation only allows insert statements from all nodes for bigint sequences. 2. `alter_distributed_table` and `undistribute_table` operations will not be allowed for tables with identity columns. This is because we do not have a proper way of keeping sequence states consistent across the cluster. DESCRIPTION: Prevents using identity columns on data types other than `bigint` on distributed tables DESCRIPTION: Prevents using `alter_distributed_table` and `undistribute_table` UDFs when a table has identity columns DESCRIPTION: Fixes a bug that prevents enforcing identity column restrictions on worker nodes Depends on #6740 Fixes #6694	2023-03-30 10:41:01 +03:00
Emel Şimşek	d3fb9288ab	Schedule parallel shard moves in background rebalancer by removing task dependencies between shard moves across colocation groups. (#6756 ) DESCRIPTION: This PR removes the task dependencies between shard moves for which the shards belong to different colocation groups. This change results in scheduling multiple tasks in the RUNNABLE state. Therefore it is possible that the background task monitor can run them concurrently. Previously, all the shard moves planned in a rebalance operation took dependency on each other sequentially. For instance, given the following table and shards colocation group 1 colocation group 2 table1 table2 table3 table4 table 5 shard11 shard21 shard31 shard41 shard51 shard12 shard22 shard32 shard42 shard52 if the rebalancer planner returned the below set of moves ` {move(shard11), move(shard12), move(shard41), move(shard42)}` background rebalancer scheduled them such that they depend on each other sequentially. ``` {move(reftables) if there is any, none} \| move( shard11) \| move(shard12) \| {move(shard41)<--- move(shard12)} This is an artificial dependency move(shard41) \| move(shard42) ``` This results in artificial dependencies between otherwise independent moves. Considering that the shards in different colocation groups can be moved concurrently, this PR changes the dependency relationship between the moves as follows: ``` {move(reftables) if there is any, none} {move(reftables) if there is any, none} \| \| move(shard11) move(shard41) \| \| move(shard12) move(shard42) ``` --------- Co-authored-by: Jelte Fennema <jelte.fennema@microsoft.com>	2023-03-29 22:03:37 +03:00
Marco Slot	8ad444f8ef	Hide shards from CDC subscriptions	2023-03-29 00:59:12 +02:00
Marco Slot	b09d239809	Propagate CREATE PUBLICATION statements	2023-03-29 00:59:12 +02:00
Gokhan Gulbiz	e618345703	Handle identity columns properly in the router planner (#6802 ) DESCRIPTION: Fixes a bug with insert..select queries with identity columns Fixes #6798	2023-03-29 15:50:12 +03:00
Onur Tirtir	616b5018a0	Add a GUC to disallow planning the queries that reference non-colocated tables via router planner (#6793 ) Today we allow planning the queries that reference non-colocated tables if the shards that query targets are placed on the same node. However, this may not be the case, e.g., after rebalancing shards because it's not guaranteed to have those shards on the same node anymore. This commit adds citus.enable_non_colocated_router_query_pushdown GUC that can be used to disallow planning such queries via router planner, when it's set to false. Note that the default value for this GUC will be "true" for 11.3, but we will alter it to "false" on 12.0 to not introduce a breaking change in a minor release. Closes #692. Even more, allowing such queries to go through router planner also causes generating an incorrect plan for the DML queries that reference distributed tables that are sharded based on different replication factor settings. For this reason, #6779 can be closed after altering the default value for this GUC to "false", hence not now. DESCRIPTION: Adds `citus.enable_non_colocated_router_query_pushdown` GUC to ensure generating a consistent distributed plan for the queries that reference non-colocated distributed tables (when set to "false", the default is "true").	2023-03-28 13:10:29 +03:00
Teja Mupparti	9bab819f26	Disentangle MERGE planning code from the modify-planning code path	2023-03-27 10:41:46 -07:00
Onur Tirtir	372a93b529	Make 8 more tests runnable multiple times via run_test.py (#6791 ) Soon I will be doing some changes related to #692 in router planner and those changes require updating ~5/6 tests related to router planning. And to make those test files runnable by run_test.py multiple times, we need to make some other tests (that they're run in parallel / they badly depend on) ready for run_test.py too.	2023-03-27 12:19:06 +03:00
Onur Tirtir	4960ced175	Add an arbitrary config test heavily based on multi_router_planner_fast_path.sql (#6782 ) This would be useful for testing #6773. This is because, given that #6773 only adds support for router / fast-path queries, theoretically almost all the tests that we have in that test file should work for null-shard-key tables too (and they indeed do). I deliberately did not replace multi_router_planner_fast_path.sql with the one that I'm adding into arbitrary configs because we might still want to see when we're able to go through fast-path planning for the usual distributed tables (the ones that have a shard key).	2023-03-22 10:49:08 +03:00
Ahmet Gedemenli	2713e015d6	Check before logicalrep for rebalancer, error if needed (#6754 ) DESCRIPTION: Check before logicalrep for rebalancer, error if needed Check if we can use logical replication or not, in case of shard transfer mode = auto, before executing the shard moves. If we can't, error out. Before this PR, we used to error out in the middle of shard moves: ```sql set citus.shard_count = 4; -- just to get the error sooner select citus_remove_node('localhost',9702); create table t1 (a int primary key); select create_distributed_table('t1','a'); create table t2 (a bigint); select create_distributed_table('t2','a'); select citus_add_node('localhost',9702); select rebalance_table_shards(); NOTICE: Moving shard 102008 from localhost:9701 to localhost:9702 ... NOTICE: Moving shard 102009 from localhost:9701 to localhost:9702 ... NOTICE: Moving shard 102012 from localhost:9701 to localhost:9702 ... ERROR: cannot use logical replication to transfer shards of the relation t2 since it doesn't have a REPLICA IDENTITY or PRIMARY KEY ``` Now we check and error out in the beginning, without moving the shards. fixes: #6727	2023-03-21 16:34:52 +03:00
Teja Mupparti	cf55136281	1) Restrict MERGE command INSERT to the source's distribution column Fixes #6672 2) Move all MERGE related routines to a new file merge_planner.c 3) Make ConjunctionContainsColumnFilter() static again, and rearrange the code in MergeQuerySupported() 4) Restore the original format in the comments section. 5) Add big serial test. Implement latest set of comments	2023-03-16 13:43:08 -07:00
Teja Mupparti	1e42cd3da0	Support MERGE on distributed tables with restrictions This implements the phase - II of MERGE sql support Support routable query where all the tables in the merge-sql are distributed, co-located, and both the source and target relations are joined on the distribution column with a constant qual. This should be a Citus single-task query. Below is an example. SELECT create_distributed_table('t1', 'id'); SELECT create_distributed_table('s1', 'id', colocate_with => ‘t1’); MERGE INTO t1 USING s1 ON t1.id = s1.id AND t1.id = 100 WHEN MATCHED THEN UPDATE SET val = s1.val + 10 WHEN MATCHED THEN DELETE WHEN NOT MATCHED THEN INSERT (id, val, src) VALUES (s1.id, s1.val, s1.src) Basically, MERGE checks to see if There are a minimum of two distributed tables (source and a target). All the distributed tables are indeed colocated. MERGE relations are joined on the distribution column MERGE .. USING .. ON target.dist_key = source.dist_key The query should touch only a single shard i.e. JOIN AND with a constant qual MERGE .. USING .. ON target.dist_key = source.dist_key AND target.dist_key = <> If any of the conditions are not met, it raises an exception. (cherry picked from commit `44c387b978`) This implements MERGE phase3 Support pushdown query where all the tables in the merge-sql are Citus-distributed, co-located, and both the source and target relations are joined on the distribution column. This will generate multiple tasks which execute independently after pushdown. SELECT create_distributed_table('t1', 'id'); SELECT create_distributed_table('s1', 'id', colocate_with => ‘t1’); MERGE INTO t1 USING s1 ON t1.id = s1.id WHEN MATCHED THEN UPDATE SET val = s1.val + 10 WHEN MATCHED THEN DELETE WHEN NOT MATCHED THEN INSERT (id, val, src) VALUES (s1.id, s1.val, s1.src) *The only exception for both the phases II and III is, UPDATEs and INSERTs must be done on the same shard-group as the joined key; for example, below scenarios are NOT supported as the key-value to be inserted/updated is not guaranteed to be on the same node as the id distribution-column. MERGE INTO target t USING source s ON (t.customer_id = s.customer_id) WHEN NOT MATCHED THEN - - INSERT(customer_id, …) VALUES (<non-local-constant-key-value>, ……); OR this scenario where we update the distribution column itself MERGE INTO target t USING source s On (t.customer_id = s.customer_id) WHEN MATCHED THEN UPDATE SET customer_id = 100; (cherry picked from commit `fa7b8949a8`)	2023-03-16 13:43:08 -07:00
Onur Tirtir	9550ebd118	Remove pg_depend entries from columnar metadata indexes to columnar-am In the past, having columnar tables in the cluster was causing pg upgrades to fail when attempting to access columnar metadata. This is because, pg_dump doesn't see objects that we use for columnar-am related booking as the dependencies of the tables using columnar-am. To fix that; in #5456, we inserted some "normal dependency" edges (from those objects to columnar-am) into pg_depend. This helped us ensuring the existency of a class of metadata objects --such as columnar.storageid_seq-- and helped fixing #5437. However, the normal-dependency edges that we added for indexes on columnar metadata tables --such columnar.stripe_pkey-- didn't help at all because they were indeed causing dependency loops (#5510) and pg_dump was not able to take those dependency edges into the account. For this reason, this commit deletes those dependency edges so that pg_dump stops complaining about them. Note that it's not critical to delete those edges from pg_depend since they're not breaking pg upgrades but were triggering some warning messages. And given that backporting a sql change into older versions is hard a lot, we skip backporting this.	2023-03-14 17:13:52 +03:00
Onur Tirtir	2b4be535de	Do clean-up before upgrade_columnar_before to make it runnable multiple times So that flaky test detector can run upgrade_columnar_before.sql multiple times.	2023-03-14 17:13:52 +03:00
Onur Tirtir	994f67185f	Make upgrade_columnar_after runnable multiple times This commit hides port numbers in upgrade_columnar_after because the port numbers assigned to nodes in upgrade schedule differ from the ones that flaky test detector assigns.	2023-03-14 17:13:52 +03:00
Emel Şimşek	4043abd5aa	Exclude-Generated-Columns-In-Copy (#6721 ) DESCRIPTION: Fixes a bug in shard copy operations. For copying shards in both shard move and shard split operations, Citus uses the COPY statement. A COPY all statement in the following form ` COPY target_shard FROM STDIN;` throws an error when there is a GENERATED column in the shard table. In order to fix this issue, we need to exclude the GENERATED columns in the COPY and the matching SELECT statements. Hence this fix converts the COPY and SELECT all statements to the following form: ``` COPY target_shard (col1, col2, ..., coln) FROM STDIN; SELECT (col1, col2, ..., coln) FROM source_shard; ``` where (col1, col2, ..., coln) does not include a GENERATED column. GENERATED column values are created in the target_shard as the values are inserted. Fixes #6705. --------- Co-authored-by: Teja Mupparti <temuppar@microsoft.com> Co-authored-by: aykut-bozkurt <51649454+aykut-bozkurt@users.noreply.github.com> Co-authored-by: Jelte Fennema <jelte.fennema@microsoft.com> Co-authored-by: Gürkan İndibay <gindibay@microsoft.com>	2023-03-07 18:15:50 +03:00
Ahmet Gedemenli	03f1bb70b7	Rebalance shard groups with placement count less than worker count (#6739 ) DESCRIPTION: Adds logic to distribute unbalanced shards If the number of shard placements (for a colocation group) is less than the number of workers, it means that some of the workers will remain empty. With this PR, we consider these shard groups as a colocation group, in order to make them be distributed evenly as much as possible across the cluster. Example: ```sql create table t1 (a int primary key); create table t2 (a int primary key); create table t3 (a int primary key); set citus.shard_count =1; select create_distributed_table('t1','a'); select create_distributed_table('t2','a',colocate_with=>'t1'); select create_distributed_table('t3','a',colocate_with=>'t2'); create table tb1 (a bigint); create table tb2 (a bigint); select create_distributed_table('tb1','a'); select create_distributed_table('tb2','a',colocate_with=>'tb1'); select citus_add_node('localhost',9702); select rebalance_table_shards(); ``` Here we have two colocation groups, each with one shard group. Both shard groups are placed on the first worker node. When we add a new worker node and try to rebalance table shards, the rebalance planner considers it well balanced and does nothing. With this PR, the rebalancer tries to distribute these shard groups evenly across the cluster as much as possible. For this example, with this PR, the rebalancer moves one of the shard groups to the second worker node. fixes: #6715	2023-03-06 14:14:27 +03:00
Onur Tirtir	a9820e96a3	Make single_node_truncate.sql re-runnable First of all, this commit sets next_shard_id for single_node_truncate.sql because shard ids in the test output were changing whenever we modify a prior test file. Then the flaky test detector started complaining about single_node_truncate.sql. We fix that by specifying the correct test dependency for it in run_test.py.	2023-03-02 16:33:18 +03:00
Onur Tirtir	40105bf1fc	Make single_node.sql re-runnable	2023-03-02 16:33:17 +03:00
Teja Mupparti	ca65d2ba0b	Fix flaky tests local_shards_execution and local_shards_execution_replication. O Simple fix is to add ORDER BY to have definitive results. O Add search_path explicitly after reconnecting, this avoids creating objects in public schema which prevents us from repetitive running of tests. O multi_mx_modification is not designed to run repetitive, so isolate it.	2023-02-15 09:18:10 -08:00
Jelte Fennema	09be4bb5fd	Allow multi_insert_select to run repeatably (#6707 ) It was not cleaning up all the tables it created. This changes it to create a dedicated schema for this test, like we have for many others.	2023-02-10 10:06:42 +01:00
Jelte Fennema	590df5360c	Fix flakyness in failure_create_distributed_table_non_empty (#6708 ) The failure_create_distributed_table_non_empty test would sometimes fail like this: ```diff -- in the first test, cancel the first connection we sent from the coordinator SELECT citus.mitmproxy('conn.cancel(' \|\| pg_backend_pid() \|\| ')'); - mitmproxy ---------------------------------------------------------------------- - -(1 row) - +ERROR: canceling statement due to user request +CONTEXT: COPY mitmproxy_result, line 0 +SQL statement "COPY mitmproxy_result FROM '/home/circleci/project/src/test/regress/tmp_check/mitmproxy.fifo'" +PL/pgSQL function citus.mitmproxy(text) line 11 at EXECUTE SELECT create_distributed_table('test_table', 'id'); ``` Source: https://app.circleci.com/pipelines/github/citusdata/citus/30474/workflows/be1c9f9d-22c9-465c-964a-dcdd1cb8c99c/jobs/985441 Because the cancel command had no filter it would actually sometimes cancel the mitmproxy cancel command itself. This PR addresses that by simply removing this test. This is basically the exact same issue as #6217, only in a different place in the file. It's fixed here by removing the test since there's already many different similar tests.	2023-02-10 09:55:12 +01:00
Onur Tirtir	483b51392f	Bump Citus to 11.3devel (#6690 )	2023-02-06 10:23:25 +00:00
Gokhan Gulbiz	b6a4652849	Stop background daemon before dropping the database (#6688 ) DESCRIPTION: Stop maintenance daemon when dropping a database even without Citus extension Fixes #6670	2023-02-03 15:15:44 +03:00
Jelte Fennema	14c31fbb07	Fix background rebalance when reference table has no PK (#6682 ) DESCRIPTION: Fix background rebalance when reference table has no PK For the background rebalance we would always fail if a reference table that was not replicated to all nodes would not have a PK (or replica identity). Even when we used force_logical or block_writes as the shard transfer mode. This fixes that and adds some regression tests. Fixes #6680	2023-01-31 12:18:29 +01:00
aykut-bozkurt	8a9bb272e4	fix dropping table_name option from foreign table (#6669 ) We should disallow dropping table_name option if foreign table is in metadata. Otherwise, we get table not found error which contains shardid. DESCRIPTION: Fixes an unexpected foreign table error by disallowing to drop the table_name option. Fixes #6663	2023-01-30 17:24:30 +03:00
Marco Slot	a482b36760	Revert "Support MERGE on distributed tables with restrictions" (#6675 ) Co-authored-by: Marco Slot <marco.slot@gmail.com>	2023-01-30 15:01:59 +01:00
Onur Tirtir	594684bb33	Do clean-up before columnar_create to make it runnable multiple times So that flaky test detector can run columnar_create.sql multiple times.	2023-01-30 15:58:34 +03:00
Onur Tirtir	1c51ddae49	Fall-back to seq-scan when accessing columnar metadata if the index doesn't exist Fixes #6570. In the past, having columnar tables in the cluster was causing pg upgrades to fail when attempting to access columnar metadata. This is because, pg_dump doesn't see objects that we use for columnar-am related booking as the dependencies of the tables using columnar-am. To fix that; in #5456, we inserted some "normal dependency" edges (from those objects to columnar-am) into pg_depend. This helped us ensuring the existency of a class of metadata objects --such as columnar.storageid_seq-- and helped fixing #5437. However, the normal-dependency edges that we added for indexes on columnar metadata tables --such columnar.stripe_pkey-- didn't help at all because they were indeed causing dependency loops (#5510) and pg_dump was not able to take those dependency edges into the account. For this reason, instead of inserting such dependency edges from indexes to columnar-am, we allow columnar metadata accessors to fall-back to sequential scan during pg upgrades.	2023-01-30 15:58:34 +03:00
Jelte Fennema	10603ed5d4	Fix flaky multi_reference_table test (#6664 ) Sometimes in CI our multi_reference_table test fails like this: ```diff WHERE colocated_table_test.value_2 = reference_table_test.value_2; LOG: join order: [ "colocated_table_test" ][ reference join "reference_table_test" ] value_2 --------- - 1 2 + 1 (2 rows) ``` Source: https://app.circleci.com/pipelines/github/citusdata/citus/30223/workflows/ce3ab5db-310f-4e30-ba0b-c3b31927d9b6/jobs/970041 We forgot an ORDER BY in this test.	2023-01-30 13:32:38 +01:00
aykut-bozkurt	ab71cd01ee	fix multi level recursive plan (#6650 ) Recursive planner should handle all the tree from bottom to top at single pass. i.e. It should have already recursively planned all required parts in its first pass. Otherwise, this means we have bug at recursive planner, which needs to be handled. We add a check here and return error. DESCRIPTION: Fixes wrong results by throwing error in case recursive planner multipass the query. We found 3 different cases which causes recursive planner passes the query multiple times. 1. Sublink in WHERE clause is planned at second pass after we recursively planned a distributed table at the first pass. Fixed by PR #6657. 2. Local-distributed joins are recursively planned at both the first and the second pass. Issue #6659. 3. Some parts of the query is considered to be noncolocated at the second pass as we do not generate attribute equivalances between nondistributed and distributed tables. Issue #6653	2023-01-27 21:25:04 +03:00
Gokhan Gulbiz	4e26464969	Allow plain pg foreign tables without a table_name option (#6652 )	2023-01-27 16:34:11 +03:00
aykut-bozkurt	8870f0f90b	fix order of recursive sublink planning (#6657 ) We should do the sublink conversations at the end of the recursive planning because earlier steps might have transformed the query into a shape that needs recursively planning the sublinks. DESCRIPTION: Fixes early sublink check at recursive planner. Related to PR https://github.com/citusdata/citus/pull/6650	2023-01-27 14:35:16 +03:00
Emel Şimşek	24f6136f72	Fixes ADD {PRIMARY KEY/UNIQUE} USING INDEX cmd (#6647 ) This change allows creating a constraint without a name using an index. The index name will be used as the constraint name the same way postgres handles it. Fixes issue #6644 This commit also cleans up some leftovers from nameless constraint checks. With this commit, we now fully support adding all nameless constraints directly to a table. Co-authored-by: naisila <nicypp@gmail.com>	2023-01-25 21:28:07 +03:00
Emel Şimşek	2169e0222b	Propagates NOT VALID option for FK&CHECK constraints w/out a name (#6649 ) Adds NOT VALID option to deparser. When we need to deparse: "ALTER TABLE ADD FOREIGN KEY ... NOT VALID" "ALTER TABLE ADD CHECK ... NOT VALID" NOT VALID option should be propagated to workers. Fixes issue #6646 This commit also uses AppendColumnNameList function instead of repeated code blocks in two appropriate places in the "ALTER TABLE" deparser.	2023-01-25 20:41:04 +03:00
Hanefi Onaldi	94b63f35a5	Prevent crashes on update with returning clauses (#6643 ) If an update query on a reference table has a returns clause with a subquery that accesses some other local table, we end-up with an crash. This commit prevents the crash, but does not prevent other error messages from happening due to Citus not being able to pushdown the results of that subquery in a valid SQL command. Related: #6634	2023-01-24 20:07:43 +03:00
Jelte Fennema	7a7880aec9	Fix regression in allowed foreign keys on distributed tables (#6550 ) DESCRIPTION: Fix regression in allowed foreign keys on distributed tables In commit `eadc88a` we changed how we skip foreign key validation. The goal was to skip it in more cases. However, one change had the unintended regression of introducing failures when trying to create certain foreign keys. This reverts that part of the change. The way of skipping validation of foreign keys that was introduced in `eadc88a` was skipping validation during execution. The reason that this caused this regression was because some foreign key validation queries already fail during planning. In those cases it never gets to the execution step where it would later be skipped. Fixes #6543	2023-01-24 16:09:21 +03:00
Emel Şimşek	58368b7783	Enable adding FOREIGN KEY constraints on Citus tables without a name. (#6616 ) DESCRIPTION: Enable adding FOREIGN KEY constraints on Citus tables without a name This PR enables adding a foreign key to a distributed/reference/Citus local table without specifying the name of the constraint, e.g. `ALTER TABLE items ADD FOREIGN KEY (user_id) REFERENCES users (id);`	2023-01-20 01:43:52 +03:00
Gokhan Gulbiz	2388fbea6e	Identity Column Support on Citus Managed Tables (#6591 ) DESCRIPTION: Identity Column Support on Citus Managed Tables	2023-01-19 15:45:41 +03:00
Marco Slot	64e3fee89b	Remove shardstate leftovers (#6627 ) Remove ShardState enum and associated logic. Co-authored-by: Marco Slot <marco.slot@gmail.com> Co-authored-by: Ahmet Gedemenli <afgedemenli@gmail.com>	2023-01-19 11:43:58 +03:00
Teja Mupparti	44c387b978	Support MERGE on distributed tables with restrictions This implements the phase - II of MERGE sql support Support routable query where all the tables in the merge-sql are distributed, co-located, and both the source and target relations are joined on the distribution column with a constant qual. This should be a Citus single-task query. Below is an example. SELECT create_distributed_table('t1', 'id'); SELECT create_distributed_table('s1', 'id', colocate_with => ‘t1’); MERGE INTO t1 USING s1 ON t1.id = s1.id AND t1.id = 100 WHEN MATCHED THEN UPDATE SET val = s1.val + 10 WHEN MATCHED THEN DELETE WHEN NOT MATCHED THEN INSERT (id, val, src) VALUES (s1.id, s1.val, s1.src) Basically, MERGE checks to see if There are a minimum of two distributed tables (source and a target). All the distributed tables are indeed colocated. MERGE relations are joined on the distribution column MERGE .. USING .. ON target.dist_key = source.dist_key The query should touch only a single shard i.e. JOIN AND with a constant qual MERGE .. USING .. ON target.dist_key = source.dist_key AND target.dist_key = <> If any of the conditions are not met, it raises an exception.	2023-01-18 11:05:27 -08:00
Ahmet Gedemenli	b3b135867e	Remove shardstate from placement insert functions (#6615 )	2023-01-18 09:52:38 +01:00
Emel Şimşek	28ed013a91	Enable ALTER TABLE ... ADD CHECK (#6606 ) DESCRIPTION: Enable adding CHECK constraints on distributed tables without the client having to provide a constraint name. This PR enables the following command syntax for adding check constraints to distributed tables. ALTER TABLE ... ADD CHECK ... by creating a default constraint name and transforming the command into the below syntax before sending it to workers. ALTER TABLE ... ADD CONSTRAINT \<conname> CHECK ...	2023-01-12 23:31:06 +03:00
Ahmet Gedemenli	e5fef40c06	Introduce citus_move_shard_placement UDF with nodeid	2023-01-12 16:57:51 +03:00
Ahmet Gedemenli	e19c545fbf	Introduce citus_copy_shard_placement UDF with nodeid	2023-01-12 16:57:51 +03:00
Emel Şimşek	d322f9e382	Handle DEFERRABLE option for the relevant constraints at deparser. (#6613 ) Table Constraints UNIQUE, PRIMARY KEY and EXCLUDE may have option DEFERRABLE in their command syntax. This PR handles the option when deparsing the relevant constraint statements. NOT DEFERRABLE and INITIALLY IMMEDIATE (if DEFERRABLE} are the default values for the option so we only append the non-default values to the alter table statement.	2023-01-12 12:32:38 +03:00
Ahmet Gedemenli	bc3383170e	Fix crash when trying to replicate a ref table that is actually dropped (#6595 ) DESCRIPTION: Fix crash when trying to replicate a ref table that is actually dropped see #6592 We should have a real solution for it.	2023-01-06 14:52:08 +03:00
Emel Şimşek	db7a70ef3e	Enable ALTER TABLE ... ADD UNIQUE and ADD EXCLUDE. (#6582 ) DESCRIPTION: Adds support for creating table constraints UNIQUE and EXCLUDE via ALTER TABLE command without client having to specify a name. ALTER TABLE ... ADD CONSTRAINT <conname> UNIQUE ... ALTER TABLE ... ADD CONSTRAINT <conname> EXCLUDE ... commands require the client to provide an explicit constraint name. However, in postgres it is possible for clients not to provide a name and let the postgres generate it using the following commands ALTER TABLE ... ADD UNIQUE ... ALTER TABLE ... ADD EXCLUDE ... This PR enables the same functionality for citus tables.	2023-01-05 18:12:32 +03:00
Emel Şimşek	135c519c62	Fix flakyness for multi_name_lengths test (#6599 ) Fix flakyness for multi_name_lengths test.	2023-01-05 17:27:16 +03:00
Ahmet Gedemenli	235047670d	Drop SHARD_STATE_TO_DELETE (#6494 ) DESCRIPTION: Drop `SHARD_STATE_TO_DELETE` and use the cleanup records instead Drops the shard state that is used to mark shards as orphaned. Now we insert cleanup records into `pg_dist_cleanup` so "orphaned" shards will be dropped either by maintenance daemon or internal cleanup calls. With this PR, we make the "cleanup orphaned shards" functions to be no-op, as they would not be needed anymore. This PR includes some naming changes about placement functions. We don't need functions that filter orphaned shards, as there will be no orphaned shards anymore. We will also be introducing a small script with this PR, for users with orphaned shards. We'll basically delete the orphaned shard entries from `pg_dist_placement` and insert cleanup records into `pg_dist_cleanup` for each one of them, during Citus upgrade. We also have a lot of flakiness fixes in this PR. Co-authored-by: Jelte Fennema <github-tech@jeltef.nl>	2023-01-03 14:38:16 +03:00
Ahmet Gedemenli	0c74e4cc0f	Fix some flaky tests (#6587 ) Fix for some simple flakiness'. All `DROP USER' and cleanup function calls.	2022-12-29 10:19:09 +03:00
Ahmet Gedemenli	1b1e737e51	Drop cleanup on failure (#6584 ) DESCRIPTION: Defers cleanup after a failure in shard move or split We don't need to do a cleanup in case of failure on a shard transfer or split anymore. Because, * Maintenance daemon will clean them up anyway. * We trigger a cleanup at the beginning of shard transfers/splits. * The cleanup on failure logic also can fail sometimes and instead of the original error, we throw the error that is raised by the cleanup procedure, and it causes confusion.	2022-12-28 15:48:44 +03:00
Ahmet Gedemenli	eba9abeee2	Fix leftover shard copy on the target node when tx with move is aborted (#6583 ) DESCRIPTION: Cleanup the shard on the target node in case of a failed/aborted shard move Inserts a cleanup record for the moved shard placement on the target node. If the move operation succeeds, the record will be deleted. If not, it will remain there to be cleaned up later. fixes: #6580	2022-12-27 22:42:46 +03:00
Ahmet Gedemenli	96bf648d1a	Unify dependent test files into one	2022-12-22 13:06:26 +03:00
Ahmet Gedemenli	acf3539a90	Fix split schedule	2022-12-22 13:06:26 +03:00
Hanefi Onaldi	303db172f8	Use Citus version comparison in upgrade tests, not equality (#6568 ) We have several version checks in our Citus upgrade tests. However, as we drop support for PG versions, we need to update the Citus versions used in our CI images. Therefore we must compare Citus versions in our tests instead of using equality checks so that the queries are ran in all the associated Citus versions. For example, we have many conditionals where we early exit if the Citus version is not equal to 9.0. However, as of today we never use version 9.0 and thus we always early exit in those tests.	2022-12-21 14:01:57 +03:00
Teja Mupparti	9a9989fc15	Support MERGE Phase – I All the tables (target, source or any CTE present) in the SQL statement are local i.e. a merge-sql with a combination of Citus local and Non-Citus tables (regular Postgres tables) should work and give the same result as Postgres MERGE on regular tables. Catch and throw an exception (not-yet-supported) for all other scenarios during Citus-planning phase.	2022-12-18 20:32:15 -08:00
Emel Şimşek	5268d0a6cb	Enable PRIMARY KEY generation via ALTER TABLE even if the constraint name is not provided (#6520 ) DESCRIPTION: Support ALTER TABLE .. ADD PRIMARY KEY ... command Before processing > ALTER TABLE ... ADD PRIMARY KEY ... command 1. Create a primary key name to use as the constraint name. 2. Change the ALTER TABLE ... ADD PRIMARY KEY ... command to into ALTER TABLE ... ADD CONSTRAINT \<constraint name> PRIMARY KEY ... form. This is the only form we can specify a name for a primary key. If we run ALTER TABLE .. ADD PRIMARY KEY, postgres would create a constraint name internally in its own scheme. But the problem is that we need to create constraint names for shards in our own scheme which is \<constraint name>_\<shardid>. Hence we need to create a name and send it to workers so that the workers can append the shardid. 4. Run the changed command on the coordinator to make sure we are using the same constraint name across the board. 5. Send the changed command to workers such that it is executed for the main table as well as for the shards. Fixes #6515.	2022-12-16 20:34:00 +03:00
Onder Kalaci	feb5534c65	Do not create additional WaitEventSet for RemoteSocketClosed checks Before this commit, we created an additional WaitEventSet for checking whether the remote socket is closed per connection - only once at the start of the execution. However, for certain workloads, such as pgbench select-only workloads, the creation/deletion of the additional WaitEventSet adds ~7% CPU overhead, which is also reflected on the benchmark results. With this commit, we use the same WaitEventSet for the purposes of checking the remote socket at the start of the execution. We use "rebuildWaitEventSet" flag so that the executor can re-use the existing WaitEventSet. As a result, we see the following improvements on PG 15: main : 120051 tps, 0.532 ms latency avg. avoid_wes_rebuild: 127119 tps, 0.503 ms latency avg. And, on PG 14, as expected, there is no difference main : 129191 tps, 0.495 ms latency avg. avoid_wes_rebuild: 129480 tps, 0.494 ms latency avg. But, note that PG 15 is slightly (~1.5%) slower than PG 14. That is probably the overhead of checking the remote socket.	2022-12-14 22:42:55 +01:00
Gürkan İndibay	3f091e3493	Give nicer error message when using alter_table_set_access_method on a view (#6553 ) DESCRIPTION: Fixes alter_table_set_access_method error for views. Fixes #6001	2022-12-12 23:56:22 +03:00
aykut-bozkurt	1ad1a0a336	add citus_task_wait udf to wait on desired task status (#6475 ) We already have citus_job_wait to wait until the job reaches the desired state. That PR adds waiting on task state to allow more granular waiting. It can be used for Citus operations. Moreover, it is also useful for testing purposes. (wait until a task reaches specified state) Related to #6459.	2022-12-12 22:41:03 +03:00
Ahmet Gedemenli	190307e8d8	Wait for cleanup function (#6549 ) Adding a testing function `wait_for_resource_cleanup` which waits until all records in `pg_dist_cleanup` are cleaned up. The motivation is to prevent flakiness in our tests, since the `NOTICE: cleaned up X orphaned resources` message is not consistent in many cases. This PR replaces `citus_cleanup_orphaned_resources` calls with `wait_for_resource_cleanup` calls.	2022-12-08 13:19:25 +03:00
Teja Mupparti	cbb33167f9	Fix the flaky test in clock.sql	2022-12-07 09:47:35 -08:00
Ahmet Gedemenli	989a3b54c9	Disable maintenance daemon for cleanup test (#6551 ) Disabling the cleanup in maintenance daemon, to prevent a flaky test.	2022-12-07 20:28:55 +03:00
Onur Tirtir	b177975371	Add new regression tests	2022-12-07 18:27:50 +03:00
Onur Tirtir	e7e4881289	Phase - III: recursively plan non-recurring sub join trees too	2022-12-07 18:27:50 +03:00
Onur Tirtir	f52381387e	Phase - II: recursively plan non-recurring subqueries too	2022-12-07 18:27:50 +03:00
Onur Tirtir	f339450a9d	Phase - I: recursively plan non-recurring relations	2022-12-07 18:27:50 +03:00
Ahmet Gedemenli	3cc5d9842a	Remove IF EXISTS from cleanup on failure test for subscription object (#6547 ) Nothing critical. Just improving a DROP SUBSCRIPTION test for a cleanup after failure scenario.	2022-12-07 17:51:36 +03:00
Ahmet Gedemenli	cb02d62369	Unique names for replication artifacts (#6529 ) DESCRIPTION: Create replication artifacts with unique names We're creating replication objects with generic names. This disallows us to enable parallel shard moves, as two operations might use the same objects. With this PR, we'll create below objects with operation specific names, by appending OparationId to the names. * Subscriptions * Publications * Replication Slots * Users created for subscriptions	2022-12-06 15:48:16 +03:00
Teja Mupparti	e14dc5d45d	Address the issues/comments from the original PR# 6315 1) Regular users fail to use clock UDF with permission issue. 2) Clock functions were declared as STABLE, whereas by definition they are VOLATILE. By design, any clock/time functions will return different results for each call even within a single SQL statement. Note: UDF citus_get_transaction_clock() is a misnomer as it internally calls the clock tick which always returns different results for every invocation in the same transaction.	2022-12-05 11:06:21 -08:00
aykut-bozkurt	65f256eec4	* add SIGTERM handler to gracefully terminate task executors, \ (#6473 ) Adds signal handlers for graceful termination, cancellation of task executors and detecting config updates. Related to PR #6459. #### How to handle termination signal? Monitor need to gracefully terminate all running task executors before terminating. Hence, we have sigterm handler for the monitor. #### How to handle cancellation signal? Monitor need to gracefully cancel all running task executors before terminating. Hence, we have sigint handler for the monitor. #### How to detect configuration changes? Monitor has SIGHUP handler to reflect configuration changes while executing tasks.	2022-12-02 18:15:31 +03:00
songjinzhou	ad6450b793	fix the problem #5763 (#6519 ) Co-authored-by: TsinghuaLucky912 <tsinghualucky912@foxmail.com> Fixes https://github.com/citusdata/citus/issues/5763	2022-12-02 13:49:32 +01:00
Ahmet Gedemenli	3b24c47470	Fix flaky cleanup tests (#6530 ) We are having some flakiness in our test schedule because of the objects leftover from shard moves/splits. With this commit we prevent logging cleanup object counts. fixes: #6534	2022-12-02 12:39:36 +03:00
songjinzhou	29f0196fdf	Add support for SET ACCESS METHOD in altering a distributed table (#6525 ) Co-authored-by: TsinghuaLucky912 <postgres@localhost.localdomain>	2022-12-01 17:45:32 +01:00
Ahmet Gedemenli	0e92244bfe	Cleanup for shard moves (#6472 ) DESCRIPTION: Extend cleanup process for replication artifacts This PR adds new cleanup record types for: * Subscriptions * Replication slots * Publications * Users created for subscriptions We add records for these object types, to `pg_dist_cleanup` during creation phase. Once the operation is done, in case of success or failure, we iterate those records and drop the objects. With this PR we will not be dropping any of these objects during the operation. In short, we will always be deferring the drop. One thing that's worth mentioning is that we sort cleanup records before processing (dropping) them, because of dependency relations among those objects, e.g a subscription might depend on a publication. Therefore, we always drop subscriptions before publications. We have some renames in this PR: * `TryDropOrphanedShards` -> `TryDropOrphanedResources` * `DropOrphanedShardsForCleanup` -> `DropOrphanedResourcesForCleanup` * `run_try_drop_marked_shards` -> `run_try_drop_marked_resources` as these functions now process replication artifacts as well. This PR drops function `DropAllLogicalReplicationLeftovers` and its all usages, since now we rely on the deferring drop mechanism.	2022-11-30 15:38:05 +03:00
aykut-bozkurt	1f8675da43	nonblocking concurrent task execution via background workers (#6459 ) Improvement on our background task monitoring API (PR #6296) to support concurrent and nonblocking task execution. Mainly we have a queue monitor background process which forks task executors for `Runnable` tasks and then monitors their status by fetching messages from shared memory queue in nonblocking way.	2022-11-30 14:29:46 +03:00
aykut-bozkurt	83ef600f27	fix false full join pushdown error check (#6523 ) Problem: Currently, we error out if we detect recurring tuples in one side without checking the other side of the join. Solution: When one side of the full join consists recurring tuples and the other side consists nonrecurring tuples, we should not pushdown to prevent duplicate results. Otherwise, safe to pushdown.	2022-11-30 14:17:56 +03:00
Gokhan Gulbiz	bc118ee551	Change GUC propagation flag's default value to off (#6516 ) This PR changes ```citus.propagate_session_settings_for_loopback_connection``` default value to off not to expose this feature publicly at this point. See #6488 for details.	2022-11-29 13:25:53 +03:00
Jelte Fennema	e12d97def2	Fix flakyness in multi_metadata_access (#6524 ) Sometimes multi_metadata_access failed like this in CI: ```diff AND ext.extname = 'citus' AND nsp.nspname = 'pg_catalog' AND NOT has_table_privilege(pg_class.oid, 'select'); oid --------------------------- - pg_dist_authinfo pg_dist_clock_logical_seq + pg_dist_authinfo (2 rows) ``` Source: https://app.circleci.com/pipelines/github/citusdata/citus/28784/workflows/e462f118-eb64-4a3f-941a-e04115334f9b/jobs/883443 This fixes that by ordering the column.	2022-11-29 10:00:06 +01:00
Philip Dubé	cf69fc3652	Grammar: it's to its Includes an error message & one case of its to it's Also fix "to the to" typos	2022-11-28 20:43:44 +00:00
Jelte Fennema	68de2ce601	Include gpid in all internal application names (#6431 ) When debugging issues it's quite useful to see the originating gpid in the application_name of a query on a worker. This already happens for most queries, but not for queries created by the rebalancer or by run_command_on_worker. This adds a gpid to those two application_names too. Note, that if the GPID of the new application_names is different than the current GPID of the backend the backend will continue to keep the old gpid as its actual GPID. This PR is just meant to make sure that the application_name is as useful as it can be for users to look at. Updating of gpids will be done in a follow-up PR, and adding gpids to all internal connections will make this easier.	2022-11-25 11:16:33 +01:00
Emel Şimşek	8e5ba45b74	Fixes a bug that causes crash when using auto_explain extension with ALTER TABLE...ADD FOREIGN KEY... queries. (#6470 ) Fixes a bug that causes crash when using auto_explain extension with ALTER TABLE...ADD FOREIGN KEY... queries. Those queries trigger a SELECT query on the citus tables as part of the foreign key constraint validation check. At the explain hook, workers try to explain this SELECT query as a distributed query causing memory corruption in the connection data structures. Hence, we will not explain ALTER TABLE...ADD FOREIGN KEY... and the triggered queries on the workers. Fixes #6424.	2022-11-15 17:53:39 +03:00
Teja Mupparti	7358b826ef	Remove the explicit-transaction requirement for the UDF citus_get_transaction_clock() as implicit transactions too use this UDF.	2022-11-10 10:54:36 -08:00
Marco Slot	77fbcfaf14	Propagate BEGIN properties to worker nodes (#6483 ) Co-authored-by: Marco Slot <marco.slot@gmail.com>	2022-11-10 18:08:43 +01:00
Marco Slot	fcaabfdcf3	Remove remaining master_create_distributed_table usages (#6477 ) Co-authored-by: Marco Slot <marco.slot@gmail.com>	2022-11-04 16:30:06 +01:00
Marco Slot	666696c01c	Deprecate citus.replicate_reference_tables_on_activate, make it always off (#6474 ) Co-authored-by: Marco Slot <marco.slot@gmail.com>	2022-11-04 16:21:10 +01:00
Onur Tirtir	a5f7f001b0	Make sure to disallow triggers that depend on extensions (#6399 ) DESCRIPTION: Makes sure to disallow triggers that depend on extensions We were already doing so for `ALTER trigger DEPENDS ON EXTENSION` commands. However, we also need to disallow creating Citus tables having such triggers already, so this PR fixes that.	2022-11-02 16:27:31 +03:00
Alexander Kukushkin	402a30a2b7	Allow citus_update_node() to work with nodes from different clusters (#6466 ) DESCRIPTION: Allow citus_update_node() to work with nodes from different clusters citus_update_node(), citus_nodename_for_nodeid(), and citus_nodeport_for_nodeid() functions only checked for nodes in their own clusters and hence last two returned NULLs and the first one showed an error is the nodeId was from a different cluster. Fixes https://github.com/citusdata/citus/issues/6433	2022-11-02 10:07:01 +01:00
Teja Mupparti	01103ce05d	This implements a new UDF citus_get_cluster_clock() that returns a monotonically increasing logical clock. Clock guarantees to never go back in value after restarts, and makes best attempt to keep the value close to unix epoch time in milliseconds. Also, introduces a new GUC "citus.enable_cluster_clock", when true, every distributed transaction is stamped with logical causal clock and persisted in a catalog pg_dist_commit_transaction.	2022-10-28 10:15:08 -07:00
Ahmet Gedemenli	c379ff8614	Drop defer drop gucs (#6447 ) DESCRIPTION: Drops GUC defer_drop_after_shard_split DESCRIPTION: Drops GUC defer_drop_after_shard_move Drop GUCs and related parts from the code. Delete tests that specifically added for the GUCs. Keep tests that can be used without the GUCs. Update test output changes. The motivation for this PR is to have an "always deferring" mechanism. These two GUCs provide an option to not deferring dropping objects during a shard move/split, and dropping them immediately. With this PR, we will be always deferring dropping orphaned shards and other types of objects. We will have a separate PR to extend the deferred cleanup operation, so that we would create records for deferred drop, for Subscriptions, Publications, Replication Slots etc. This will make us be able to keep track of created objects that needs to be dropped, during a shard move/split. We will have objects created specifically for the current operation; and those objects will be dropped at the end. We have an issue (a draft roadmap) for enabling parallel shard moves. For details please see: https://github.com/citusdata/citus/issues/6437	2022-10-25 16:48:34 +03:00
Hanefi Onaldi	915d1b3b38	Repartition tests for numeric types with neg scale (#6358 ) This PR adds some test cases where repartition join correctly prunes shards on two tables that have numeric columns with negative scale.	2022-10-24 20:59:05 +03:00
Jelte Fennema	20a4d742aa	Fix flakyness in failure_split_cleanup (#6450 ) Sometimes in CI our failure_split_cleanup test would fail like this: ```diff CALL pg_catalog.citus_cleanup_orphaned_resources(); -NOTICE: cleaned up 79 orphaned resources +NOTICE: cleaned up 82 orphaned resources SELECT operation_id, object_type, object_name, node_group_id, policy_type ``` Source: https://app.circleci.com/pipelines/github/citusdata/citus/28107/workflows/4ec712c9-98b5-4e90-9806-e02a37d71679/jobs/846107 The reason was that previous tests in the schedule would also create some orphaned resources. Sometimes some of those would already be cleaned up by the maintenance daemon, resulting in a different number of cleaned up resources than expected. This cleans up any previously created resources at the start of the test without logging how many exactly were cleaned up. As a bonus this now also allows running this test using check-failure-base.	2022-10-24 17:35:31 +02:00
Jelte Fennema	737e2bb1bb	Don't leak search_path to workers on DDL (#6444 ) DESCRIPTION: Don't leak search_path to workers on DDL For DDL we have to set the `search_path` on workers to the same as on the coordinator for some DDL to work. Previously this search_path would leak outside of the transaction that was used for the DDL. This fixes that by using `SET LOCAL` instead of `SET`. The only place where we still use plain `SET` is for DDL commands that are not allowed within transactions, such as `CREATE INDEX CONCURRENLTY`. This fixes this flaky test: ```diff CONTEXT: SQL statement "SELECT change_id FROM distributed_triggers.data_changes WHERE shard_key_value = NEW.shard_key_value AND object_id = NEW.object_id ORDER BY change_id DESC LIMIT 1" -PL/pgSQL function record_change() line XX at SQL statement +PL/pgSQL function distributed_triggers.record_change() line 17 at SQL statement while executing command on localhost:57638 DELETE FROM data_ref_table where shard_key_value = 'hello'; ``` Source: https://app.circleci.com/pipelines/github/citusdata/citus/27849/workflows/75ae5f1a-100b-4b7a-b991-7de069f39ee1/jobs/831429 I had tried to fix this flaky test in #5894 and then I tried implementing a better fix in #5896, where @marcocitus suggested this better fix. This change reverts the fix from #5894 and implements the fix suggested by Marco. Our multi_mx_alter_distributed_table test actually depended on the old buggy search_path leaking behavior. After fixing the bug that test would fail like this: ```diff CALL proc_0(1.0); DEBUG: pushing down the procedure -NOTICE: Res: 3 -DETAIL: from localhost:xxxxx +ERROR: relation "test_proc_colocation_0" does not exist +CONTEXT: PL/pgSQL function mx_alter_distributed_table.proc_0(double precision) line 5 at SQL statement +while executing command on localhost:57637 RESET client_min_messages; ``` I fixed this test by fully qualifying the table names used in the procedure. I think it's quite unlikely that actual users depend on this behavior though. Since it would require first doing DDL before calling a procedure in a session where the search_path was changed after connecting.	2022-10-19 16:47:35 +02:00
Ahmet Gedemenli	cdbda9ea6b	Add failure test for shard move (#6325 ) DESCRIPTION: Adds failure test for shard move DESCRIPTION: Remove function `WaitForAllSubscriptionsToBecomeReady` and related tests Adding some failure tests for shard moves. Dropping the not-needed-anymore function `WaitForAllSubscriptionsToBecomeReady`, as the subscriptions now start as ready from the beginning because we don't use logical replication table sync workers anymore. fixes: #6260	2022-10-19 14:25:26 +02:00
Gokhan Gulbiz	56da3cf6aa	Increase node_connection_timeout to prevent flakiness in shard_rebalancer regression tests (#6445 ) In CI shard_rebalancer sometimes fails with this error: ```diff SET citus.node_connection_timeout to 60; BEGIN; SET LOCAL citus.shard_replication_factor TO 2; SET citus.log_remote_commands TO ON; SET SESSION citus.max_adaptive_executor_pool_size TO 5; SELECT replicate_table_shards('dist_table_test_2', max_shard_copies := 4, shard_transfer_mode:='block_writes'); +WARNING: could not establish connection after 60 ms ``` Source https://app.circleci.com/pipelines/github/citusdata/citus/28128/workflows/38eeacc4-4191-4366-87ed-9a628414965a/jobs/847458?invite=true#step-107-21 This PR avoids this issue by increasing ```citus.node_connection_timeout``` to 35s.	2022-10-19 13:03:14 +03:00
Onur Tirtir	5aec88d084	Not try locking relations referencing to views (#6430 ) Since there can't be such a foreign key already. This mainly fixes the error that Citus throws when trying to truncate a distributed view. Fixes #5990.	2022-10-19 11:24:22 +03:00
Gokhan Gulbiz	e87eda6496	Introduce a new GUC to propagate local settings to new connections in rebalancer (#6396 ) DESCRIPTION: Introduce ```citus.propagate_session_settings_for_loopback_connection``` GUC to propagate local settings to new connections. Fixes: #5289	2022-10-18 12:50:30 +03:00
Naisila Puka	8323f4f12c	Cleans up test outputs (#6434 )	2022-10-17 15:13:07 +03:00
Onur Tirtir	4152a391c2	Properly set col names for shard rels that citus_extradata_container points to (#6428 ) Deparser function set_relation_column_names() knows that it needs to re-evaluate column names based on relation's tuple descriptor when the rte belongs to a relation (RTE_RELATION). However before this commit, it didn't know about the fact that citus might wrap such an rte with an rte that points to citus_extradata_container() placeholder. And because of this, it was simply taking the column aliases (e.g., "bar" in "foo AS bar") into the account and this might result in an incorrectly deparsed query as in below case: * Say, if we had view based on following query: ```sql SELECT a FROM table; ``` * And if we rename column "a" to "b", the view query normally becomes: ```sql SELECT b AS a FROM table; ``` * So before this commit, deparsing a query based on that view was resulting in such a query due to deparsing based on the column aliases, which is not correct: ```sql SELECT a FROM table; ``` Fixes #5932. DESCRIPTION: Fixes a bug that might cause failing to query the views based on tables that have renamed columns	2022-10-14 17:31:25 +03:00
Önder Kalacı	8b624b5c9d	Detect remotely closed sockets and add a single connection retry in the executor (#6404 ) PostgreSQL 15 exposes WL_SOCKET_CLOSED in WaitEventSet API, which is useful for detecting closed remote sockets. In this patch, we use this new event and try to detect closed remote sockets in the executor. When a closed socket is detected, the executor now has the ability to retry the connection establishment. Note that, the executor can retry connection establishments only for the connection that has not been used. Basically, this patch is mostly useful for preventing the executor to fail if a cached connection is closed because of the worker node restart (or worker failover). In other words, the executor cannot retry connection establishment if we are in a distributed transaction AND any command has been sent over the connection. That requires more sophisticated retry mechanisms. For now, fixing the above use case is enough. Fixes #5538 Earlier discussions: #5908, #6259 and #6283 ### Summary of the current approach regards to earlier trials As noted, we explored some alternatives before getting into this. https://github.com/citusdata/citus/pull/6283 is simple, but lacks an important property. We should be checking for `WL_SOCKET_CLOSED` _before_ sending anything over the wire. Otherwise, it becomes very tricky to understand which connection is actually safe to retry. For example, in the current patch, we can safely check `transaction->transactionState == REMOTE_TRANS_NOT_STARTED` before restarting a connection. #6259 does what we intent here (e.g., check for sending any command). However, as @marcocitus noted, it is very tricky to handle `WaitEventSets` in multiple places. And, the executor is designed such that it reacts to the events. So, adding anything `pre-executor` seemed too ugly. In the end, I converged into this patch. This patch relies on the simplicity of #6283 and also does a very limited handling of `WaitEventSets`, just for our purpose. Just before we add any connection to the execution, we check if the remote session has already closed. With that, we do a brief interaction of multiple wait event processing, but with different purposes. The new wait event processing we added does not even consider cancellations. We let that handled by the main event processing loop. Co-authored-by: Marco Slot <marco.slot@gmail.com>	2022-10-14 15:08:49 +02:00
Onur Tirtir	20847515fa	Hint users to call "citus_set_coordinator_host" first (#6425 ) If an operation requires having coordinator in pg_dist_node and if that is not the case, then we automatically add the coordinator into pg_dist_node if user didn't add any worker nodes yet. However, if user have already added some worker nodes before, we throw an error. With this commit, we improve the error thrown in that case. Closes #6423 based on the discussion made there.	2022-10-12 18:18:51 +03:00
Jelte Fennema	cb34adf7ac	Don't reassign global PID when already assigned (#6412 ) DESCRIPTION: Fix bug in global PID assignment for rebalancer sub-connections In CI our isolation_shard_rebalancer_progress test would sometimes fail like this: ```diff +isolationtester: canceling step s1-rebalance-c1-block-writes after 60 seconds step s1-rebalance-c1-block-writes: SELECT rebalance_table_shards('colocated1', shard_transfer_mode:='block_writes'); - <waiting ...> + +ERROR: canceling statement due to user request step s7-get-progress: ``` Source: https://app.circleci.com/pipelines/github/citusdata/citus/27855/workflows/2a7e335a-f3e8-46ed-b6bd-6920d42f7214/jobs/831710 It turned out this was an actual bug in the way our assigning of global PIDs interacts with the way we connect to ourselves as the shard rebalancer. The first command the shard rebalancer sends is a SET ommand to change the application_name to `citus_rebalancer`. If `StartupCitusBackend` is called after this command is processed, then it overwrites the global PID that was extracted from the previous application_name. This makes sure that we don't do that, and continue to use the original global PID. While it might seem that we only call `StartupCitusBackend` once for each query backend, this isn't actually the case. Whenever pg_dist_partition gets ANALYZEd by autovacuum we indirectly call `StartupCitusBackend` again, because we invalidate the cache then. In passing this fixes two other things as well: 1. It sets `distributedCommandOriginator` correctly in `AssignGlobalPID`, by using IsExternalClientBackend(). This doesn't matter much anymore, since AssignGlobalPID effectively becomes a no-op in this PR for any non-external client backends. 2. It passes the application_name to InitializeBackendData in StartupCitusBackend, instead of INVALID_CITUS_INTERNAL_BACKEND_GPID (which effectively got casted to NULL). In practice this doesn't change the behaviour of the call, since the call is a no-op for every backend except the maintenance daemon. And the behaviour of the call is the same for NULL as for the application_name of the maintenance daemon.	2022-10-11 16:41:01 +02:00
Naisila Puka	b5d70d2e11	Fix flakyness in alter_table_set_access_method (#6421 ) We decrease verbosity level here to avoid the flaky output https://app.circleci.com/pipelines/github/citusdata/citus/27936/workflows/dc63128a-1570-41a0-8722-08f3e3cfe301/jobs/836153 ```diff select alter_table_set_access_method('ref','heap'); NOTICE: creating a new table for alter_table_set_access_method.ref NOTICE: moving the data of alter_table_set_access_method.ref NOTICE: dropping the old alter_table_set_access_method.ref NOTICE: drop cascades to 2 other objects -DETAIL: drop cascades to materialized view m_ref -drop cascades to view v_ref +DETAIL: drop cascades to view v_ref +drop cascades to materialized view m_ref CONTEXT: SQL statement "DROP TABLE alter_table_set_access_method.ref CASCADE" NOTICE: renaming the new table to alter_table_set_access_method.ref alter_table_set_access_method ------------------------------- (1 row) ```	2022-10-11 16:31:24 +03:00
Naisila Puka	89aa9a015f	Fixes empty password issue (#6417 )	2022-10-11 15:56:44 +03:00
Hanefi Onaldi	cbe4298c5b	Remove references to optimization PG15 reverted PG15 introduced an optimization on GROUP BY keys that is now reverted on RC2. Relevant PG commit: Revert "Optimize order of GROUP BY keys". 443df6e2db932a7cd6d85ddfb67e11a43345130d	2022-10-10 21:54:08 +03:00
Gokhan Gulbiz	1776bdf654	Limit citus_drain_node to drain the specified node only (#6361 ) DESCRIPTION: Fixes citus_drain_node to drain the specified worker only. Fixes #6267	2022-10-09 13:33:08 +03:00
Onur Tirtir	86e186f671	Retain trigger settings when re-creating the triggers (on shards) (#6398 ) Fixes https://github.com/citusdata/citus/issues/6394. DESCRIPTION: Fixes a bug that causes creating disabled-triggers on shards as enabled Since CREATE TRIGGER doesn't have syntax support to specify whether the trigger should be enabled/disabled, the underlying PG function (`pg_get_triggerdef()`) that we use to generate the command to create the trigger is not enough. For this reason, we append a second command to enable/disable trigger, right after creating it. We don't retain explicit extension dependencies set by using `ALTER trigger DEPENDS ON EXTENSION` commands too, but apparently right fix for that is to throw an error as in `PreprocessAlterTriggerDependsStmt()`; so, opened a separate PR to fix that #6399.	2022-10-06 14:51:07 +03:00
Naisila Puka	27e867afbc	Propagates column aliases (#6400 ) Propagates column aliases in the shard-level commands	2022-10-06 12:27:31 +03:00
Naisila Puka	b5cba3a3fe	Use original relation to retrieve column name because of syscache (#6387 ) During alter_distributed_table, we create a new table like the original table but with the altered options. To retrieve the name of the distribution column, we were using the attribute syscache of the new table, since we already created the new table as identical to the original table. However, the attribute syscaches of these two tables are not the same if the original table has dropped columns. The reason is that dropped columns are all still present in the cache. Hence, for example, the attnos would be different in the syscaches. So, let's use the attribute syscache of the original table.	2022-10-06 12:08:00 +03:00
Ying Xu	f21cbe68f8	[Columnar] Bugfix for Columnar: options ignored during ALTER TABLE rewrite (#6337 ) DESCRIPTION: Fixes a bug that prevents retaining columnar table options after a table-rewrite A fix for this issue: Columnar: options ignored during ALTER TABLE rewrite #5927 The OID for the temporary table created during ALTER TABLE was not the same as the original table's OID so the columnar options were not being applied during rewrite. The change is that I applied the original table's columnar options to the new table so that it has the correct options during write. I also added a test.	2022-10-05 11:42:09 -07:00
Hanefi Onaldi	5ddd4754a2	Document failing downgrades from 10.2-4 to 10.2-2	2022-10-04 18:56:20 +03:00
Hanefi Onaldi	0efd6f7829	Fix tests for missing downgrades	2022-10-04 18:56:20 +03:00
Hanefi Onaldi	24f247b5a1	Cleanup multi_utility_warnings test This test used to contain some utility commands that Citus did not support. However we added support for most of the commands, and this test got outdated. We used to error out on community when user attempted to use pooler options. Now that we open sourced all enterprise features, the test can now be removed.	2022-10-04 15:27:42 +03:00
Ahmet Gedemenli	d0fa10a98c	Bump Citus to 11.2devel (#6385 )	2022-09-30 14:47:42 +03:00
Hanefi Onaldi	7e0edee4ec	Add tests for CREATE DATABASE with OID option (#6376 ) PG15 now allows users to specify oids when creating databases. This feature is a side effect of a bigger feature in pg_upgrade. Relevant PG Commit: pg_upgrade: Preserve database OIDs. aa01051418f10afbdfa781b8dc109615ca785ff9	2022-09-27 19:54:51 +02:00
Naisila Puka	1b26d57288	Adds tests for suppressed constants in postgres_fdw queries (#6370 ) PG15 has suppressed some casts on constants when querying foreign tables. For example, we can use text to represent a type that's an enum on the remote side. A comparison on such a column will get shipped as "var = 'foo'::text". But there's no enum = text operator on the remote side. If we leave off the explicit cast, the comparison will work. Test we behave in the same way with a Citus foreign table Reminder: foreign tables cannot be distributed/reference, can only be Citus local Relevant PG commit: `f8abb0f5e1`	2022-09-27 13:40:48 +03:00
Hanefi Onaldi	30ac6f0fe9	Add tests for jsonpath changes on PG15 PostgreSQL 15 had some changes to jsonpath to conform with ECMA-262 referenced by SQL standard. This commit adds tests to make sure Citus also supports the same standards. Relevant pg commit: e26114c817b610424010cfbe91a743f591246ff1	2022-09-26 22:55:54 +03:00
Jelte Fennema	24e06af6d2	Reuse connections for Splits and Logical Replication (#6314 ) In Split, Logical replication logic and ShardCleaner we call `SendCommandListToWorkerOutsideTransaction` and `SendOptionalCommandListToWorkerOutsideTransaction` frequently. This opens new connection for each of those calls, even though we already have a perfectly good connection lying around. This PR adds two new APIs `SendCommandListToWorkerOutsideTransactionWithConnection` and `SendOptionalCommandListToWorkerOutsideTransactionWithConnection` that allow sending a list of queries in a transaction over an existing connection. We also update the callers (Split, ShardCleaner, Logical Replication) to use these new APIs instead. Co-authored-by: Nitish Upreti <niupre@microsoft.com> Co-authored-by: Onder Kalaci <onderkalaci@gmail.com>	2022-09-26 13:37:40 +02:00
Naisila Puka	dc9723fa45	Comment about column list for fk ON DELETE SET in PG15 (#6372 ) As a part of `a868cc049a`	2022-09-26 11:45:05 +03:00
Onur Tirtir	a868cc049a	Not allow ON DELETE/UPDATE SET DEFAULT actions on columns that default to sequences (#6340 ) Given that we drop DEFAULT nextval('sequence') expressions from shard relation columns, allowing `ON DELETE/UPDATE SET DEFAULT` on such columns might cause inserting NULL values as a result of a delete/update operation. For this reason, we disallow ON DELETE/UPDATE SET DEFAULT actions on columns that default to sequences. DESCRIPTION: Disallows having ON DELETE/UPDATE SET DEFAULT actions on columns that default to sequences Fixes #6339.	2022-09-23 03:34:02 -07:00
Onur Tirtir	de24a3eda5	Not drop default col exprs from shard when adding local table to metadata (#6323 ) As we did for GENERATED STORED columns in #4613, we should not drop column default expressions that are not based on sequences from shard relation since such expressions need to exist e.g. for foreign key actions. For the column default expressions that are based on sequences we cannot do much, so we need to disallow having ON DELETE SET DEFAULT actions on such columns in a separate PR, see #6339. Fixes #6318. DESCRIPTION: Fixes a bug that might cause inserting incorrect DEFAULT values when applying foreign key actions	2022-09-23 03:05:08 -07:00
Naisila Puka	1ede0b9db3	Add tests to verify we support security invoker views (#6362 ) PG15 added support for security invoker views. Relevant PG conmit: `7faa5fc84b` These views check the permissions for the underlying tables of the view invoker user, not the view definer user. When the view has underlying distributed tables, the queries to the shards are sent by opening connections with the current user, which is the view invoker, no matter what the type of the view is. This means that, for distributed views, they were always behaving like security invoker views. Check the following issue for more details: https://github.com/citusdata/citus/issues/6161 So, Citus doesn't fully support security definer views. However Citus does fully support security invoker views. We add tests to make sure we cover different cases.	2022-09-23 10:55:46 +03:00
Onder Kalaci	03ac8b4f82	Add tests for PG15 new aggregate commands Both tests include pushdown and pull to coordinator type of aggregate execution. Relevant PG commits: Add min() and max() aggregates for xid8 400fc6b6487ddf16aa82c9d76e5cfbe64d94f660 Add range_agg with multirange inputs 7ae1619bc5b1794938c7387a766b8cae34e38d8a Co-authored-by: Onder Kalaci <onderkalaci@gmail.com>	2022-09-20 17:08:17 +03:00
Marco Slot	8544346a78	Allow create_distributed_table_concurrently on an empty node (#6353 ) Co-authored-by: Marco Slot <marco.slot@gmail.com>	2022-09-16 10:55:02 +02:00
Onder Kalaci	766f340ce0	Prevent failures on partitioned distributed tables with statistics objects on PG 15 Comment from the code is clear on this: /* * The statistics objects of the distributed table are not relevant * for the distributed planning, so we can override it. * * Normally, we should not need this. However, the combination of * Postgres commit 269b532aef55a579ae02a3e8e8df14101570dfd9 and * Citus function AdjustPartitioningForDistributedPlanning() * forces us to do this. The commit expects statistics objects * of partitions to have "inh" flag set properly. Whereas, the * function overrides "inh" flag. To avoid Postgres to throw error, * we override statlist such that Postgres does not try to process * any statistics objects during the standard_planner() on the * coordinator. In the end, we do not need the standard_planner() * on the coordinator to generate an optimized plan. We call * into standard_planner() for other purposes, such as generating the * relationRestrictionContext here. * * AdjustPartitioningForDistributedPlanning() is a hack that we use * to prevent Postgres' standard_planner() to expand all the partitions * for the distributed planning when a distributed partitioned table * is queried. It is required for both correctness and performance * reasons. Although we can eliminate the use of the function for * the correctness (e.g., make sure that rest of the planner can handle * partitions), it's performance implication is hard to avoid. Certain * planning logic of Citus (such as router or query pushdown) relies * heavily on the relationRestrictionList. If * AdjustPartitioningForDistributedPlanning() is removed, all the * partitions show up in the, causing high planning times for * such queries. */	2022-09-15 14:36:05 +03:00
aykut-bozkurt	739b91afa6	ensure we have more active nodes than replication factor. (#6341 ) DESCRIPTION: Fixes floating exception during create_distributed_table_concurrently. Fixes #6332. During create_distributed_table_concurrently, when there is no active primary node, it fails with floating exception. We added similar check with create_distributed_table. It will fail with proper message if current active node is less than replication factor.	2022-09-14 18:20:50 +03:00
Nils Dijk	da527951ca	Fix: rebalance stop non super user (#6334 ) No need for description, fixing issue introduced with new feature for 11.1 Fixes #6333 Due to Postgres' C api being o-indexed and postgres' attributes being 1-indexed, we were reading the wrong Datum as the Task owner when cancelling. Here we add a test to show the error and fix the off-by-one error.	2022-09-13 23:19:31 +02:00
Naisila Puka	76ff4ab188	Adds support for unlogged distributed sequences (#6292 ) We can now do the following: - Distribute sequence with logged/unlogged option - ALTER TABLE my_sequence SET LOGGED/UNLOGGED - ALTER SEQUENCE my_sequence SET LOGGED/UNLOGGED Relevant PG commit `344d62fb9a`	2022-09-13 10:53:39 +03:00
Hanefi Onaldi	5cfcc63308	Add warning messages for cluster commands on partitioned tables (#6306 ) PG15 introduces `CLUSTER` commands for partitioned tables. Similar to a `CLUSTER` command with no supplied table names, these commands also can not be run inside transaction blocks and therefore can not be propagated in a distributed transaction block with ease. Therefore we raise warnings. Relevant PG commit: cfdd03f45e6afc632fbe70519250ec19167d6765	2022-09-13 00:05:58 +03:00
Hanefi Onaldi	164f2fa0a6	PG15: Add support for NULLS NOT DISTINCT (#6308 ) Relevant PG commit: 94aa7cc5f707712f592885995a28e018c7c80488	2022-09-12 23:47:37 +03:00
Marco Slot	b79111527e	Avoid blocking writes in create_distributed_table_concurrently (#6324 ) Co-authored-by: Marco Slot <marco.slot@gmail.com>	2022-09-12 12:09:37 -07:00
Nils Dijk	cda3686d86	Feature: run rebalancer in the background (#6215 ) DESCRIPTION: Add a rebalancer that uses background tasks for its execution Based on the baclground jobs and tasks introduced in #6296 we implement a new rebalancer on top of the primitives of background execution. This allows the user to initiate a rebalance and let Citus execute the long running steps in the background until completion. Users can invoke the new background rebalancer with `SELECT citus_rebalance_start();`. It will output information on its job id and how to track progress. Also it returns its job id for automation purposes. If you simply want to wait till the rebalance is done you can use `SELECT citus_rebalance_wait();` A running rebalance can be canelled/stopped with `SELECT citus_rebalance_stop();`.	2022-09-12 20:46:53 +03:00
Marco Slot	48f7d6c279	Show local managed tables in citus_tables and hide tables owned by extensions (#6321 ) Co-authored-by: Marco Slot <marco.slot@gmail.com>	2022-09-12 17:49:17 +03:00
Marco Slot	b036e44aa4	Fix bug preventing isolate_tenant_to_new_shard with text column (#6320 ) Co-authored-by: Marco Slot <marco.slot@gmail.com>	2022-09-12 16:29:57 +02:00
naisila	47bea76c6c	Revert "Support JSON_TABLE on PG 15 (#6241 )" This reverts commit `1f4fe35512`.	2022-09-12 15:20:17 +03:00
Onder Kalaci	36f8c48560	Add tests for allowing SET NULL/DEFAULT for subseet of columns PG 15 added support for that (d6f96ed94e73052f99a2e545ed17a8b2fdc1fb8a). We also add support, but we already do not support ON DELETE SET NULL/DEFAULT for distribution column. So, in essence, we add support for reference tables and Citus local tables.	2022-09-12 13:56:09 +03:00
Marco Slot	2e943a64a0	Make shard moves more idempotent (#6313 ) Co-authored-by: Marco Slot <marco.slot@gmail.com>	2022-09-09 18:21:36 +02:00
Jelte Fennema	a2d86214b2	Share more replication code between moves and splits (#6310 ) The logical replication catchup part for shard splits and shard moves is very similar. This abstracts most of that similarity away into a single function. This also improves the logic for non blocking shard splits a bit, by using faster foreign key creation. It also parallelizes index creation which shard moves were already doing, but shard splits did not.	2022-09-09 16:45:38 +02:00
Marco Slot	ba2fe3e3c4	Remove do_repair option from citus_copy_shard_placement (#6299 ) Co-authored-by: Marco Slot <marco.slot@gmail.com>	2022-09-09 15:44:30 +02:00
Nils Dijk	00a94c7f13	Implement infrastructure to run sql jobs in the background (#6296 ) DESCRIPTION: Add infrastructure to run long running management operations in background This infrastructure introduces the primitives of jobs and tasks. A task consists of a sql statement and an owner. Tasks belong to a Job and can depend on other tasks from the same job. When there are either runnable or running tasks we would like to make sure a bacgrkound task queue monitor process is running. A Task could be in running state while there is actually no monitor present due to a database restart or failover. Once the monitor starts it will reset any running task to its runnable state. To make sure only one background task queue monitor is ever running at once it will acquire an advisory lock that self conflicts. Once a task is done it will find all tasks depending on this task. After checking that the task doesn't have unmet dependencies it will transition the task from blocked to runnable state for the task to be picked up on a subsequent task start. Currently only one task can be running at a time. This can be improved upon in later releases without changes to the higher level API. The initial goal for this background tasks is to allow a rebalance to run in the background. This will be implemented in a subsequent PR.	2022-09-09 16:11:19 +03:00
Jelte Fennema	76137e967f	Create all foreign keys quickly at the end of a shard move (#6148 ) Previously we would create foreign keys to reference table in an extra fast way at the end of a shard move. This uses that same logic to also do it for foreign keys between distributed tables. Fixes #6141	2022-09-09 09:58:33 +02:00
Ahmet Gedemenli	eadc88a800	Introduce GUC citus.skip_constraint_validation (#6281 ) Introduces a new GUC named citus.skip_constraint_validation, which basically skips constraint validation when set to on. For some several places that we hack to skip the foreign key validation phase, now we use this GUC.	2022-09-08 18:13:18 +03:00
Hanefi Onaldi	a557a196aa	Add tests for numeric with scale greater than precision	2022-09-07 13:12:04 +03:00
Hanefi Onaldi	4db113496f	Add tests for new COPY features in PG15	2022-09-07 13:12:04 +03:00
Hanefi Onaldi	3e4e42253f	Add tests for new regexp sql functions	2022-09-07 13:12:04 +03:00
Nitish Upreti	d7404a9446	'Deferred Drop' and robust 'Shard Cleanup' for Splits. (#6258 ) DESCRIPTION: This PR adds support for 'Deferred Drop' and robust 'Shard Cleanup' for Splits. Common Infrastructure This PR introduces new common infrastructure so as any operation that wants robust cleanup of resources can register with the cleaner and have the resources cleaned appropriately based on a specified policy. 'Shard Split' is the first consumer using this new infrastructure. Note : We only support adding 'shards' as resources to be cleaned-up right now but the framework will be extended to support other resources in future. Deferred Drop for Split Deferred Drop Support ensures that shards undergoing split are not dropped inline as part of operation but dropped later when no active read queries are running on shard. This helps with : Avoids any potential deadlock scenarios that can cause long running Split operation to rollback. Avoids Split operation blocking writes and then getting blocked (due to running queries on the shard) when trying to drop shards. Deferred drop is the new default behavior going forward. Shard Cleaner Extension Shard Cleaner is a background task responsible for deferred drops in case of 'Move' operations. The cleaner has been extended to ensure robust cleanup of shards (dummy shards and split children) in case of a failure based on the new infrastructure mentioned above. The cleaner also handles deferred drop for 'Splits'. TESTING: New test ''citus_split_shard_by_split_points_deferred_drop' to test deferred drop support. New test 'failure_split_cleanup' to test shard cleanup with failures in different stages. Update 'isolation_blocking_shard_split and isolation_non_blocking_shard_split' for deferred drop. Added non-deferred drop version of existing tests : 'citus_split_shard_no_deferred_drop' and 'citus_non_blocking_splits_no_deferred_drop'	2022-09-06 12:11:20 -07:00
Gokhan Gulbiz	ac96370ddf	Use IsMultiStatementTransaction for SELECT .. FOR UPDATE queries (#6288 ) * Use IsMultiStatementTransaction instead of IsTransaction for row-locking operations. * Add regression test for SELECT..FOR UPDATE statement	2022-09-06 16:38:41 +02:00
Emel Şimşek	6f06ff78cc	Throw an error if there is a RangeTblEntry that is not assigned an RTE identity. (#6295 ) * Fix issue : 6109 Segfault or (assertion failure) is possible when using a SQL function * DESCRIPTION: Ensures disallowing the usage of SQL functions referencing to a distributed table and prevents a segfault. Using a SQL function may result in segmentation fault in some cases. This change fixes the issue by throwing an error message when a SQL function cannot be handled. Fixes #6109. * DESCRIPTION: Ensures disallowing the usage of SQL functions referencing to a distributed table and prevents a segfault. Using a SQL function may result in segmentation fault in some cases. This change fixes the issue by throwing an error message when a SQL function cannot be handled. Fixes #6109. Co-authored-by: Emel Simsek <emel.simsek@microsoft.com>	2022-09-06 15:46:41 +02:00
Hanefi Onaldi	85b19c851a	Disallow distributing by numeric with negative scale PG15 allows numeric scale to be negative or greater than precision. This causes issues and we may end up routing queries to a wrong shard due to differing hash results after rounding. Formerly, when specifying NUMERIC(precision, scale), the scale had to be in the range [0, precision], which was per SQL spec. PG15 extends the range of allowed scales to [-1000, 1000]. A negative scale implies rounding before the decimal point. For example, a column might be declared with a scale of -3 to round values to the nearest thousand. Note that the display scale remains non-negative, so in this case the display scale will be zero, and all digits before the decimal point will be displayed. Relevant PG commit: 085f931f52494e1f304e35571924efa6fcdc2b44	2022-09-06 12:40:56 +03:00
Naisila Puka	d7f41cacbe	Prohibit renaming child trigger on distributed partition pre PG15 (#6290 ) Pre PG15, renaming child triggers on partitions is allowed. When creating a trigger in a distributed parent partitioned table, the triggers on the shards of the partitions have the same name with the triggers on the corresponding parent shards of the parent table. Therefore, they don't have the same appended shard id as the shard id of the partition. Hence, when trying to rename a child trigger on a partition of a distributed table, we can't correctly find the triggers on the shards of the partition in order to rename them since we append a different shard id to the name of the trigger. Since we can't find the trigger we get a misleading error of inexistent trigger. In this commit we prohibit renaming child triggers on distributed partitions altogether.	2022-09-06 12:19:25 +03:00
Naisila Puka	fd9b3f4ae9	Add tests to make sure distributed clone trigger rename fails in PG15 (#6291 ) Relevant PG commit: 80ba4bb383538a2ee846fece6a7b8da9518b6866	2022-09-06 11:04:14 +03:00
Marco Slot	e6b1845931	Change split logic to avoid EnsureReferenceTablesExistOnAllNodesExtended (#6208 ) Co-authored-by: Marco Slot <marco.slot@gmail.com>	2022-09-05 22:02:18 +02:00
Önder Kalacı	bd13836648	Add citus.skip_advisory_lock_permission_checks (#6293 )	2022-09-05 17:47:41 +02:00
Ahmet Gedemenli	7c8cc7fc61	Fix flakiness for view tests (#6284 )	2022-09-02 10:12:07 +03:00
Marco Slot	432f399a5d	Allow citus_internal application_name with additional suffix (#6282 ) Co-authored-by: Marco Slot <marco.slot@gmail.com>	2022-09-01 14:26:43 +02:00
Naisila Puka	9e2b96caa5	Add pg14->pg15 upgrade test for dist. triggers on part. tables (#6265 ) PRE PG15, Renaming the parent triggers on partitioned tables doesn't recurse to renaming the child triggers on the partitions as well. In PG15, Renaming triggers on partitioned tables recurses to renaming the triggers on the partitions as well. Add an upgrade test to make sure we are not breaking anything with distributed triggers on distributed partitioned tables. Relevant PG commit: 80ba4bb383538a2ee846fece6a7b8da9518b6866	2022-09-01 12:32:44 +03:00
Naisila Puka	98dcbeb304	Specifies that our CustomScan providers support projections (#6244 ) Before, this was the default mode for CustomScan providers. Now, the default is to assume that they can't project. This causes performance penalties due to adding unnecessary Result nodes. Hence we use the newly added flag, CUSTOMPATH_SUPPORT_PROJECTION to get it back to how it was. In PG15 support branch we created explain functions to ignore the new Result nodes, so we undo that in this commit. Relevant PG commit: 955b3e0f9269639fb916cee3dea37aee50b82df0	2022-08-31 10:48:01 +03:00
Jelte Fennema	24e695ca27	Fix flakyness in multi_utilities (#6272 ) Sometimes in CI our multi_utilities test fails like this: ```diff VACUUM (INDEX_CLEANUP ON, PARALLEL 1) local_vacuum_table; SELECT CASE WHEN s BETWEEN 20000000 AND 25000000 THEN 22500000 ELSE s END size FROM pg_total_relation_size('local_vacuum_table') s ; size ---------- - 22500000 + 39518208 (1 row) ``` Source: https://app.circleci.com/pipelines/github/citusdata/citus/26641/workflows/5caea99c-9f58-4baa-839a-805aea714628/jobs/762870 Apparently VACUUM is not as reliable in cleaning up as we thought. This increases the range of allowed values. Important to note is that the range is still completely outside of the allowed range of the initial size. So we know for sure that some data was cleaned up.	2022-08-30 14:32:34 -07:00
Jelte Fennema	f22a47981a	Fix flakyness in adaptive_executor (#6275 ) Sometimes in CI our adaptive_executor test would fail randomly with the following error: ```diff SELECT sum(result::bigint) FROM run_command_on_workers($$ SELECT count(*) FROM pg_stat_activity WHERE pid <> pg_backend_pid() AND query LIKE '%8010090%' $$); sum ----- - 4 + 2 (1 row) END; ``` Source: https://app.circleci.com/pipelines/github/citusdata/citus/26665/workflows/40665680-0044-4852-8fe4-5fd628f9fb47/jobs/764371 This means that the low slow start interval did not have any effect on the number of connections being opened. I could see two possibilities for this to happen: 1. CI was slow and actually doing the start of the second connection. I tried to solve this by doubling the time a query to the worker takes. 2. The second option is that the shards were queried in the oposite order than we expect. This would mean that the first query to the worker completes quickly because there's no, sleep because it doesn't contain any rows. I tried to solve this option by adding a row to each shard. After trying to reproduce the random failure in CI it turned out that I needed both of these fixes to resolve the random failure.	2022-08-30 23:23:30 +02:00
Jelte Fennema	8354853dec	Fix flakyness in citus_split_shard_columnar_partitioned (#6273 ) On CI our citus_split_shard_columnar_partitioned test would sometimes randomly fail like this: ```diff 8970008 \| colocated_dist_table \| -2147483648 \| 2147483647 \| localhost \| 57637 8970009 \| colocated_partitioned_table \| -2147483648 \| 2147483647 \| localhost \| 57637 8970010 \| colocated_partitioned_table_2020_01_01 \| -2147483648 \| 2147483647 \| localhost \| 57637 - 8970011 \| reference_table \| \| \| localhost \| 57637 8970011 \| reference_table \| \| \| localhost \| 57638 + 8970011 \| reference_table \| \| \| localhost \| 57637 (13 rows) ``` Source: https://app.circleci.com/pipelines/github/citusdata/citus/26651/workflows/f695b4fb-ad81-46ff-b97e-0100e5d167ea/jobs/763517 This is a harmless diff due to a missing column in the order by list. This fixes that by adding the nodeport as a tiebreaker.	2022-08-30 19:54:50 +03:00
Marco Slot	6bb31c5d75	Add non-blocking variant of create_distributed_table (#6087 ) Added create_distributed_table_concurrently which is nonblocking variant of create_distributed_table. It bases on the split API which takes advantage of logical replication to support nonblocking split operations. Co-authored-by: Marco Slot <marco.slot@gmail.com> Co-authored-by: aykutbozkurt <aykut.bozkurt1995@gmail.com>	2022-08-30 15:35:40 +03:00
Önder Kalacı	33af407ac8	Add missing orderbys (#6271 )	2022-08-30 12:49:15 +03:00
Jelte Fennema	895a484b39	Hopefully fix flakyeness in drop_partitioned_table (#6270 ) Sometimes in CI our drop_partitioned_talbe test would fail with the following error: ```diff NOTICE: issuing SELECT worker_drop_distributed_table('drop_partitioned_table.child1') NOTICE: issuing SELECT worker_drop_distributed_table('drop_partitioned_table.child1') NOTICE: issuing DROP TABLE IF EXISTS drop_partitioned_table.child1_727001 CASCADE -NOTICE: issuing SELECT pg_catalog.citus_internal_delete_colocation_metadata(100047) -NOTICE: issuing SELECT pg_catalog.citus_internal_delete_colocation_metadata(100047) +NOTICE: issuing SELECT pg_catalog.citus_internal_delete_colocation_metadata(100046) +NOTICE: issuing SELECT pg_catalog.citus_internal_delete_colocation_metadata(100046) ROLLBACK; NOTICE: issuing ROLLBACK NOTICE: issuing ROLLBACK ``` Source: https://app.circleci.com/pipelines/github/citusdata/citus/26631/workflows/31536032-e1ba-493b-b12a-f40757f3a7d6/jobs/762170 For some reason the colocationid of the distributed partitioned table would be one less than we expected. Why this happens I'm not sure, but it seems fairly harmless that it does. In an attempt to work around this flakyness I now reset the colocation id sequence right before creating the table in question. This is good practice in general, because it allows us to run the test successfully using `check-minimal` and it also allows us to rerun it multiple times.	2022-08-30 12:21:16 +03:00
Naisila Puka	13fe89f018	Fixes flakyness in columnar_permissions test (#6266 ) `columnar_permissions.sql` test is flaky due to a missing `ORDER BY` clauses. Added the other `ORDER BY` clauses for consistency in the test. ```diff where relation in ('no_access'::regclass, 'columnar_permissions'::regclass); relation \| chunk_group_row_limit \| stripe_row_limit \| compression \| compression_level ----------------------+-----------------------+------------------+-------------+------------------- - no_access \| 10000 \| 150000 \| zstd \| 3 columnar_permissions \| 10000 \| 2222 \| none \| 3 + no_access \| 10000 \| 150000 \| zstd \| 3 (2 rows) ``` Source: https://app.circleci.com/pipelines/github/citusdata/citus/26610/workflows/79f03ef9-7674-4567-a087-02536c9ddf04/jobs/760942	2022-08-29 14:33:26 +02:00
Ahmet Gedemenli	0855a9d1d4	Use SUM for calculating non partitioned table sizes (#6222 ) We currently do a `pg_relation_total_size('t1') + pg_relation_total_size('t2') + ..` on shard lists, especially when rebalancing the shards. This in some cases goes huge. With this PR, we basically use a SUM for all table sizes, instead of using thousands of pluses.	2022-08-26 18:02:14 +03:00
Sameer Awasekar	4df8eca77f	Add worker_split_shard_release_dsm udf to release dynamic shared memory (#6248 ) The code introduces worker_split_shard_release_dsm udf to release the dynamic shared memory segment allocated during non-blocking split workflow.	2022-08-26 18:27:32 +05:30
Jelte Fennema	77dd49fcf8	Fix flakyness in failure_online_move_shard_placement (#6250 ) Sometimes in CI failure_online_move_shard_placement fails with the following error: ```diff SELECT citus.mitmproxy('conn.onQuery(query="^ALTER SUBSCRIPTION .* ENABLE").cancel(' \|\| :pid \|\| ')'); mitmproxy ----------- (1 row) SELECT master_move_shard_placement(101, 'localhost', :worker_1_port, 'localhost', :worker_2_proxy_port); -ERROR: canceling statement due to user request +ERROR: tuple concurrently updated +CONTEXT: while executing command on localhost:9060 -- failure on polling subscription state ``` Source: https://app.circleci.com/pipelines/github/citusdata/citus/26441/workflows/dd6e3475-6121-47b3-aea3-4ac92be114f4/jobs/751476/steps This error is not completely harmless, because based on the logs it mean that our cleanup logic failed, which in turn means that replication slots are left around: ``` 2022-08-24 16:01:29.247 UTC [1219] ERROR: XX000: tuple concurrently updated 2022-08-24 16:01:29.247 UTC [1219] LOCATION: simple_heap_update, heapam.c:4179 2022-08-24 16:01:29.247 UTC [1219] STATEMENT: ALTER SUBSCRIPTION citus_shard_move_subscription_10 DISABLE ``` However, we have other mechanisms to clean up any leftovers in case of a failed cleanup. So it's not that big of a problem. The reason we run into this error is arguably because of a Postgres bug, so I created a patch for Postgres that fixes this. While we wait for this (or a similar) patch to be merged, this PR disables the flaky test. There's still a test that tests in case of a connection "kill" instead of a "cancel", so I don't think we lose very important coverage by disabling this test. When trying to reproduce this I only hit this issue in the cancel case, so I don't think there's a need to disable the kill case for now.	2022-08-26 12:49:45 +02:00
Jelte Fennema	2a0c0b3ba6	Fix flakyness in failure_connection_establishment (#6251 ) In CI sometimes failure_connection_establishment would fail with the following error: ```diff -- cancel all connections to this node SELECT citus.mitmproxy('conn.onAuthenticationOk().cancel(' \|\| pg_backend_pid() \|\| ')'); - mitmproxy ---------------------------------------------------------------------- - -(1 row) - +ERROR: canceling statement due to user request +CONTEXT: COPY mitmproxy_result, line 1: "" +SQL statement "COPY mitmproxy_result FROM '/home/circleci/project/src/test/regress/tmp_check/mitmproxy.fifo'" +PL/pgSQL function citus.mitmproxy(text) line 11 at EXECUTE SELECT * FROM citus_check_cluster_node_health(); ``` The reason for this is that the mitm command that was used is very broad and doesn't actually do what the comment says. What happens is that if any connection is made, the current backend is cancelled, which is not the always the same as the backend that made the connection. My assessment is that likely the maintenance daemon makes a connection to the node while we are executing the mitmproxy command. The mitmproxy command goes through, and then triggers a cancel of itself due to the connection made by the maintenance daemon. This PR simply removes this test, since it doesn't seem to test what it intended to test anyway. There's also still the "kill" version of this test, which does do the intended thing. So I don't think we lose important coverage by removing this test.	2022-08-26 10:01:36 +00:00
Jelte Fennema	18015ca501	Fix flakyness in multi_transaction_recovery (#6249 ) Sometimes in CI multi_transaction_recovery would fail with the following error: ```diff SET LOCAL citus.defer_drop_after_shard_move TO OFF; SELECT citus_move_shard_placement((SELECT * FROM selected_shard), 'localhost', :worker_1_port, 'localhost', :worker_2_port, shard_transfer_mode := 'block_writes'); - citus_move_shard_placement ---------------------------------------------------------------------- - -(1 row) - +ERROR: could not find placement matching "localhost:57637" +HINT: Confirm the placement still exists and try again. COMMIT; ``` Source: https://app.circleci.com/pipelines/github/citusdata/citus/26510/workflows/8269ea93-d9b4-4376-ae0e-8332a5c15fc6/jobs/755548 The reason for this was that when choosing `selected_shard` we didn't ensure that it was actually located on the node that we were moving it from. Instead we simply picked the first shard for the table that was returned by the query. To fix this issue this PR adds a filter to only choose shards that are located on the intended node.	2022-08-26 11:48:55 +02:00
Jelte Fennema	b5cd1676f9	Fix flakyness in multi_utilities (#6245 ) In CI multi_utilities would sometimes fail randomly with this error: ```diff VACUUM (INDEX_CLEANUP ON, PARALLEL 1) local_vacuum_table; SELECT pg_size_pretty( pg_total_relation_size('local_vacuum_table') ); pg_size_pretty ---------------- - 21 MB + 22 MB (1 row) ``` Source: https://app.circleci.com/pipelines/github/citusdata/citus/26459/workflows/da47d9b6-f70b-49fe-806f-5ebf75bf0b11/jobs/752482 This is a harmless change in output where the relation size after vacuuming was slightly more than we expected. This changes the size checks for the local_vacuum_table to allow a wider range of values. It uses the same trick as #6216 to show the actual value when it's outside this valid range, which is useful if this test ever starts failing again.	2022-08-25 22:50:47 +02:00
Jelte Fennema	00485d45a6	Make multi_utilities not leak tables (#6246 ) When trying to fix #6245 I realized that multi_utilities was leaking some tables that it created during the test. This fixes that by creating all these tables in a schema that's dedicated for this test.	2022-08-25 19:33:03 +03:00
Jelte Fennema	1688bcda33	Fix errors in base_schedule (#6247 ) When running `make check-base` locally it would fail with two different errors. The first one was this: ```diff SELECT create_distributed_table('pg_class', 'relname'); -ERROR: cannot create a citus table from a catalog table +ERROR: deadlock detected +DETAIL: Process 28950 waits for ExclusiveLock on relation 16551 of database 16384; blocked by process 28951. +Process 28951 waits for RowExclusiveLock on relation 1259 of database 16384; blocked by process 28950. +HINT: See server log for query details. SELECT create_reference_table('pg_class'); ``` This happened because multi_behavioral_analytics_create_table and multi_create_table were being run in parallel. Running them separately resolved this issue. The second one was this: ```diff CREATE OR REPLACE FUNCTION wait_until_metadata_sync(timeout INTEGER DEFAULT 15000) RETURNS void LANGUAGE C STRICT AS 'citus'; +ERROR: duplicate key value violates unique constraint "pg_proc_proname_args_nsp_index" +DETAIL: Key (proname, proargtypes, pronamespace)=(wait_until_metadata_sync, 23, 2200) already exists. -- Add some helper functions for sending commands to mitmproxy ``` Which was because failure_test_helpers and multi_test_helpers were trying to create the same function at the exact same time. The easy fix here is to simply not create this function in the failure_test_helpers file. This is fine, because any test schedule that runs failure_test_helpers also runs multi_test_helpers.	2022-08-25 18:06:41 +02:00
Önder Kalacı	3ed6fea1cf	Prevent Merge command on distributed tables [PG 15] (#6238 )	2022-08-25 13:27:08 +03:00
Marco Slot	9bf3c3dd5c	Add an allow_unsafe_constraints flag for constraints without distribution column (#6237 ) Co-authored-by: Marco Slot <marco.slot@gmail.com>	2022-08-25 11:37:50 +03:00
Gokhan Gulbiz	69d2fcf5c0	Use the same colocation group for child and parent rels when altering a distributed table (#6225 ) * Alter_distributed_table colocateWith:none bug fix for partitioned tables. * Regression tests added for alter_distributed_table colocateWith:none for partitioned tables * Update query comparision to be more accurate	2022-08-25 11:23:59 +03:00
Naisila Puka	1f4fe35512	Support JSON_TABLE on PG 15 (#6241 ) Postgres supports JSON_TABLE feature on PG 15. We treat JSON_TABLE the same as correlated functions (e.g., recurring tuples). In the end, for multi-shard JSON_TABLE commands, we apply the same restrictions as reference tables (e.g., cannot be in the outer part of an outer join etc.) Co-authored-by: Onder Kalaci <onderkalaci@gmail.com>	2022-08-24 19:11:18 +03:00
Naisila Puka	35b4ddc355	Pg15 support (#6085 ) * Adjust configure script to allow PG15 * Adds copy of ruleutils_14.c as ruleutils_15.c * Uses get_namespace_name_or_temp in ruleutils_15.c Relevant PG commit: 48c5c9068211e0a04fd9553c8714b2821ed3ad17 * Clean up code using "(expr) ? true : false" in ruleutils_15.c Relevant PG commit: fd0625c7a9c679c0c1e896014b8f49a489c3a245 * Change varno from Index (unsigned int) to int in ruleutils_15.c Relevant PG commit: e3ec3c00d85bd2844ffddee83df2bd67c4f8297f * Adds find_recursive_union to ruleutils_15.c Relevant PG commit: 3f50b82639637c9908afa2087de7588450aa866b * Fix display of SQL-std func's args in INSERT/SELECT in ruleutils_15.c Relevant PG commit: a8d8445a7b2f80f6d0bfe97b19f90bd2cbef8759 * Fix ruleutils_15.c's dumping of whole-row Vars in more contexts Relevant PG commit: 43c2175121c829c8591fc5117b725f1f22bfb670 * Fix assorted missing logic for GroupingFunc nodes in ruleutils_15.c Relevant PG commit: 2591ee8ec44d8cbc8e1226550337a64c684746e4 * Adds grammar support for SQL/JSON clauses in ruleutils_15.c Relevant PG commit: f79b803dcc98d707450e158db3638dc67ff8380b * Adds SQL/JSON constructors to ruleutils_15.c Relevant PG commits: f4fb45d15c59d7add2e1b81a9d477d0119a9691a cc7401d5ca498a84d9b47fd2e01cebd8e830e558 * Adds support for MERGE in ruleutils_15.c Relevant PG commit: 7103ebb7aae8ab8076b7e85f335ceb8fe799097c * Add IS JSON predicate to ruleutils_15.c Relevant PG commit: 33a377608fc29cdd1f6b63be561eab0aee5c81f0 * Add SQL/JSON query functions to ruleutils_15.c Relevant PG commit: 1a36bc9dba8eae90963a586d37b6457b32b2fed4 * Adds three different SQL/JSON values to ruleutils_15.c Relevant PG commits: 606948b058dc16bce494270eea577011a602810e 49082c2cc3d8167cca70cfe697afb064710828ca * Adds JSON table functions in ruleutils_15.c Relevant PG commit: 4e34747c88a03ede6e9d731727815e37273d4bc9 * Add PLAN function for JSON table in ruleutils_15.c Relevant PG commit: fadb48b00e02ccfd152baa80942de30205ab3c4f * Remove extra blank lines before block-closing braces ruleutils_15.c Relevant PG commit: 24d2b2680a8d0e01b30ce8a41c4eb3b47aca5031 * set_deparse_plan: Reuse variable to appease Coverity ruleutils_15.c Relevant PG commit: e70813fbc4aaca35ec012d5a426706bd54e4acab * Mechanical code beautification ruleutils_15.c Relevant PG commit: 23e7b38bfe396f919fdb66057174d29e17086418 * Rename value_type to item_type in ruleutils_15.c Relevant PG commit: 3ab9a63cb638a1fd99475668e2da9c237495aeda * Show 'AS "?column?"' explicitly when it's important in ruleutils_15.c Relevant PG commit: c7461fc25558832dd347a9c8150b0f1ed85e36e8 * Fix ruleutils_15.c issues with dropped cols in funcs-returning-composite Relevant PG commit: c1d1e8469c77ce6b8e5310955580b4a3eee7fe96 * Change comment regarding functions returning composite in ruleutils_15.c Relevant PG commit: c2fa113ddb1117b1f03e91960f65d5d7d8a90270 * Replace int nodes with bool nodes where needed In PG15, Boolean nodes are added. Pre PG15, internal Boolean values in Create Role commands were represented by Integer nodes. This commit replaces int nodes logic with bool nodes logic where needed. Mostly there are CREATE ROLE logic changes. Relevant PG commit: 941460fcf731a32e6a90691508d5cfa3d1f8eeaf * Handle new option colliculocale in CREATE COLLATION logic In PG15, there is an added option to use ICU as global locale provider. pg_collation has three locale-related fields: collcollate and collctype, which are libc-related fields, and a new one colliculocale, which is the ICU-related field. Only the libc-related fields or the ICU-related field is set, never both. Relevant PG commits: f2553d43060edb210b36c63187d52a632448e1d2 54637508f87bd5f07fb9406bac6b08240283be3b * Add PG15 tests to CI using test images that have 15beta2 (#6093) * Change warning message in pg_signal_backend() Relevant PG commit: 7fa945b857cc1b2964799411f1633468826861ff * Revert "Add missing ifdef for PG 15" This reverts commit `c7b51025ab`. * Fixes tests for ALTER TRIGGER RENAME consistency for part. tables Relevant PG commit: 80ba4bb383538a2ee846fece6a7b8da9518b6866 * Prevent creating child triggers on partitions when adding new node Pre PG15, tgisinternal is true for a "child" trigger on a partition cloned from the trigger on the parent. In PG15, tgisinternal is false in that case. However, we don't want to create this trigger on the partition since it will create a conflict when we try to attach the partition to the parent table: ERROR: trigger "..." for relation "{partition_name}" already exists Relevant PG commit: f4566345cf40b068368cb5617e61318da60676ec * Fix tests for generated columns dependency changes In PG15, For GENERATED columns, all dependencies of the generation expression are recorded as NORMAL dependencies of the column itself. This requires CASCADE to drop generated cols with the original col. PRE PG15, dependencies were recorded as AUTO, with which generated columns are silently dropped with the original column. Relevant PG commit: cb02fcb4c95bae08adaca1202c2081cfc81a28b5 * Explicitly cast catalog "char" column to text before concatenation Relevant PG commit: 07eee5a0dc642d26f44d65c4e6263304208e8583 * Remove 'AS "?column?"' from test outputs There were some instances in the following tst outputs in planning debug outputs where AS "?column?" is added. We add a normalization rule to remove it as it is not important. cte_inline.out recursive_relation_planning_restriction_pushdown.out Relevant PG commit: c7461fc25558832dd347a9c8150b0f1ed85e36e8 * Use pg_backup_stop(PG15) instead of pg_stop_backup(PG<15) Add an alternative test output because of the change in the backup modes of Postgres. Specifically here, there is a renaming issue: pg_stop_backup PRE PG15 vs pg_backup_stop PG15+ The alternative output can be deleted when we drop support for PG14 Relevant PG commit: 39969e2a1e4d7f5a37f3ef37d53bbfe171e7d77a * Adds citus.mitmfifo GUC Previously we setting this configuration parameter in the fly for failure tests schedule. However, PG15 doesn't allow that anymore: reserved prefixes like "citus" cannot be used to set non-existing GUCs. Relevant PG commit: 88103567cb8fa5be46dc9fac3e3b8774951a2be7 * Handles EXPLAIN output diffs in PG15 - Extra result lines To handle extra "Result" lines in explain outputs, we add explain method to multi_test_helpers.sql file - plan_without_result_lines() is added for cases where we want the whole explain output with only "Result" lines removed * Handles EXPLAIN output diffs in PG15, Hash Agg/Join leverage To handle differences in usage of GroupAggregate vs HashAggregate or Merge Join vs Hash join in cases where this detail doesn't seem to matter, we use coordinator_plan(). - coordinator_plan() is updated to remove "Result" lines There are some cases where we have subplans so we add a new function that prints all Task Count lines as well - coordinator_plan_with_subplans() Still not sure of the relevant PG commit Could be db0d67db2401eb6238ccc04c6407a4fd4f985832 but disabling enable_group_by_reordering didn't help. * Handles EXPLAIN output diffs in PG15: enable_group_by_reordering Relevant PG commit db0d67db2401eb6238ccc04c6407a4fd4f985832 * Normalizes Memory Usage, Buckets, Batches for PG15 explain diffs We create a new function in multi_test_helpers, which is similar to explain_merge function in PG15. This explain helper function normalies Memory Usage, Buckets and Batches, and we use it in the tests which give a different output for PG15. * Bump test images to 15beta3 (#6172) * Omit namespace in post-copy errmsg Relevant PG commit: 069d33d0c5a021601245e44df77a0423ddd69359 * Handles EXPLAIN output diffs in PG15: extra arrows&result lines To handle extra "->" arrows resulting from extra Result lines in explain outputs, we add the following explain method to multi_test_helpers.sql file - plan_without_arrows() is added for cases where we want the whole explain output without arrows and without Result lines * Alters public schema's owner to pg_database_owner in PG15 In PG15, public schema is owned by pg_database_owner role. In multi_extension, we drop and recreate the ppublic schema, hence its owner become the default user in our tests, postgres. Change that to pg_database_owner for PG15 consistency. This results in alternative test output for public schema grants in the following test: grant_on_schema_propagation.sql Relevant PG commit: b073c3ccd06e4cb845e121387a43faa8c68a7b62 * Add alternative test outputs for change in Insert Select display citus_local_tables_queries.sql coordinator_shouldhaveshards.sql cte_inline.sql insert_select_repartition.sql intermediate_result_pruning.sql local_shard_execution.sql local_shard_execution_replicated.sql multi_deparse_shard_query.sql multi_insert_select.sql multi_insert_select_conflict.sql multi_mx_insert_select_repartition.sql mx_coordinator_shouldhaveshards.sql single_node.sql Relevant PG commit: a8d8445a7b2f80f6d0bfe97b19f90bd2cbef8759 * Fixes columnar tap tests for PG15 In PG15, Perl test modules have been moved to a new namespace. Also, postgres node new() and get_new_node() methods have been unified to one method: new() We create separate tap tests for PG13/14 and PG15+ and update the Makefiles accordingly. Relevant PG commits: 201a76183e2056c2217129e12d68c25ec9c559c8 b3b4d8e68ae83f432f43f035c7eb481ef93e1583 * Handles EXPLAIN output diffs in PG15: HashAgg Leverage,alt. output Still not sure of the relevant PG commit Could be db0d67db2401eb6238ccc04c6407a4fd4f985832 but disabling enable_group_by_reordering didn't help.	2022-08-24 17:59:17 +02:00
Naisila Puka	ddbd10d2e7	Rename server version checks in tests (#6239 )	2022-08-24 16:31:52 +03:00
Jelte Fennema	5c0205ce10	Fix flakyness in multi_replicate_reference_table (#6235 ) In CI multi_replicate_reference_table would sometimes fail like this: ```diff -- detects correctly that referecence table doesn't have replica identity SELECT replicate_reference_tables(); -ERROR: cannot use logical replication to transfer shards of the relation initially_not_replicated_reference_table since it doesn't have a REPLICA IDENTITY or PRIMARY KEY +ERROR: cannot use logical replication to transfer shards of the relation ref_table since it doesn't have a REPLICA IDENTITY or PRIMARY KEY DETAIL: UPDATE and DELETE commands on the shard will error out during logical replication unless there is a REPLICA IDENTITY or PRIMARY KEY. HINT: If you wish to continue without a replica identity set the shard_transfer_mode to 'force_logical' or 'block_writes'. ``` Because `CitusTableTypeIdList` returns tables in heap order so it's a bit random which one is first in the list. And the test contained multiple tables that didn't have a primary key or replica identity. So it made sense that the error could be for either one of these tables. This PR makes the test output consistent by changing one of the tables to have a primary key. Example of failing test: https://app.circleci.com/pipelines/github/citusdata/citus/26387/workflows/fc3196e7-ddf2-4000-a70b-5ac71c836321/jobs/748940	2022-08-24 13:34:10 +03:00
aykut-bozkurt	041f88d7bf	Revert "Revert "Creates new colocation for colocate_with:='none' too"" (#6227 ) This reverts commit `d171a736ab`.	2022-08-24 10:54:04 +03:00
Marco Slot	bad8196da3	Verify that we can replicate reference tables using rebalancer (#6232 ) Co-authored-by: Marco Slot <marco.slot@gmail.com>	2022-08-24 00:34:21 +02:00
Jelte Fennema	e0ada050aa	Enable binary logical replication for shard moves (#6017 ) Using binary encoding can save a lot of CPU cycles, both on the sender and on the receiver. Since the walsender and walreceiver processes are single threaded, this can matter a lot for the throughput if they are bottlenecked on CPU. This feature is only available in PG14, not PG13. It should be safe to always enable because it's only used for types that support binary encoding according to the PG docs: > Even when this option is enabled, only data types that have binary > send and receive functions will be transferred in binary. But in case it causes problems, it can still be disabled by setting `citus.enable_binary_protocol` to `false`.	2022-08-23 16:38:00 +02:00
Jelte Fennema	cc7e93a56a	Fix flakyness in failure_connection_establishment (#6226 ) In CI our failure_connection_establishment sometimes failed randomly with the following error: ```diff -- verify a connection attempt was made to the intercepted node, this would have cause the -- connection to have been delayed and thus caused a timeout SELECT * FROM citus.dump_network_traffic() WHERE conn=0; conn \| source \| message ------+--------+--------- - 0 \| coordinator \| [initial message] -(1 row) +(0 rows) SELECT citus.mitmproxy('conn.allow()'); ``` Source: https://app.circleci.com/pipelines/github/citusdata/citus/26318/workflows/d3354024-9a67-4b01-9416-5cf79aec6bd8/jobs/745558 The way I fixed this was by removing the dump_network_traffic call. This might sound simple, but doing this while continuing to let the test serve its intended purpose required quite some more changes. This dump_network_traffic call was there because we didn't want to show warnings in the queries above, because the exact warnings were not reliable. The main reason this error was not reliable was because we were using round-robin task assignment. We did the same query twice, so that it would hit the node with the intercepted connection in one of those connections. Instead of doing that I'm now using the "first-replica" policy and do the queries only once. This works, because the first placements by placementid for each of the used tables are on the second node, so first-replica will cause the first connection to go there. This solved most of the flakyness, but when confirming that the flakyness was fixed I found some additional errors: ```diff -- show that INSERT failed SELECT citus.mitmproxy('conn.allow()'); mitmproxy ----------- (1 row) SELECT count(*) FROM single_replicatated WHERE key = 100; - count ---------------------------------------------------------------------- - 0 -(1 row) - +ERROR: could not establish any connections to the node localhost:9060 after 400 ms RESET client_min_messages; ``` Source: https://app.circleci.com/pipelines/github/citusdata/citus/26321/workflows/fd5f4622-400c-465e-8d82-83f5f55a87ec/jobs/745666 I addressed this with a combination of two things: 1. Only change citus.node_connection_timeout for the queries that we want to test timeout behaviour for. When those queries are done I reset the value to the default again. 2. Change our mitm framework to only delay the initial connection packet instead of all packets. I think sometimes a follow on packet of a previous connection attempt was causing the next connection attempt to be delayed even if `conn.allow()` was already called. For our tests we only care about connection timeouts, so there's no reason to delay any other packets than the initial connection packet. Then there was some last flakyness in the exact error that was given: ```diff -- tests for connectivity checks SELECT name FROM r1 WHERE id = 2; WARNING: could not establish any connections to the node localhost:9060 after 900 ms +WARNING: connection to the remote node localhost:9060 failed with the following error: name ------ bar (1 row) ``` Source: https://app.circleci.com/pipelines/github/citusdata/citus/26338/workflows/9610941c-4d01-4f62-84dc-b91abc56c252/jobs/746467 I don't have a good explaination for this slight change in error message, but given that it is missing the actual error message I expected this to be related to some small difference in timing: e.g. the server responding to the connection attempt right after the coordinator determined that the connection timed out. To solve this last flakyness I increased the connection timeouts and made the difference between the timeout and the delay a bit bigger. With these tweaks I wasn't able to reproduce this error on CI anymore. Finally, I made most of the same changes to failure_failover_to_local_execution, since it was using the `conn.delay()` mitm method too. The only change that I left out was the timing increase, since it might not be strictly necessary and increases time it takes to run the test. If this test ever becomes flaky the first thing we should try is increase its timeout.	2022-08-23 15:04:20 +03:00
Jelte Fennema	506c16efdf	Fix flakyness in failure_single_select (#6223 ) The failure_single_select test would sometimes fail with an error that's similar to this: ```diff -- cancel after first SELECT; txn should fail and nothing should be marked as invalid SELECT citus.mitmproxy('conn.onQuery(query="^SELECT").cancel(' \|\| pg_backend_pid() \|\| ')'); - mitmproxy ---------------------------------------------------------------------- - -(1 row) - +ERROR: canceling statement due to user request +CONTEXT: COPY mitmproxy_result, line 1: "" +SQL statement "COPY mitmproxy_result FROM '/home/circleci/project/src/test/regress/tmp_check/mitmproxy.fifo'" +PL/pgSQL function citus.mitmproxy(text) line 11 at EXECUTE BEGIN; ``` This error looked very to the one from #6217 and indeed the cause turned out to be similar. Because we were canceling all SELECT queries, we would actually sometimes cancel our mitmproxy SELECT queries itself. This puts some additional restrictions on the queries that we cancel, most importantly it should contain the name of the table that we're selecting from. I was able to reproduce the original issue locally pretty reliably. With the changes in this PR it didn't happen again. In passing this also changes one other failure test that was cancelling all selects and puts similar additional restrictions on those cancellations. Example of failed test in CI: https://app.circleci.com/pipelines/github/citusdata/citus/26305/workflows/4d942b91-f83c-453c-8d9a-ae22d608e756/jobs/745071	2022-08-22 20:06:33 +02:00
Hanefi Onaldi	e33ba7da9e	Decrease min messages for normalization	2022-08-22 17:16:52 +03:00
Jelte Fennema	e2a24b921e	Fix flakyness in failure_create_distributed_table_non_empty (#6217 ) The failure_create_distributed_table_non_empty test would sometimes fail like this: ```diff -- in the first test, cancel the first connection we sent from the coordinator SELECT citus.mitmproxy('conn.cancel(' \|\| pg_backend_pid() \|\| ')'); - mitmproxy ---------------------------------------------------------------------- - -(1 row) - +ERROR: canceling statement due to user request +CONTEXT: COPY mitmproxy_result, line 1: "" +SQL statement "COPY mitmproxy_result FROM '/home/circleci/project/src/test/regress/tmp_check/mitmproxy.fifo'" +PL/pgSQL function citus.mitmproxy(text) line 11 at EXECUTE SELECT create_distributed_table('test_table', 'id'); ``` Because the cancel command had no filter it would actually sometimes cancel the mitmproxy cancel command itself. This PR addresses that by filtering on CREATE TABLE, which is one of the command that create_distributed_table will send to the workers. Example of failing test: https://app.circleci.com/pipelines/github/citusdata/citus/26252/workflows/1b7e5464-cca4-4ec1-99b3-48ddf25c29fa/jobs/742829	2022-08-20 01:23:25 +03:00
Jelte Fennema	4ce17f015b	Fix flakyness in columnar_memory test (#6216 ) Sometimes in CI the columnar_memory test was using slightly more memory than expected. ```diff SELECT CASE WHEN 1.0 * TopMemoryContext / :top_post BETWEEN 0.98 AND 1.02 THEN 1 ELSE 1.0 * TopMemoryContext / :top_post END AS top_growth FROM columnar_test_helpers.columnar_store_memory_stats(); --[ RECORD 1 ]- -top_growth \| 1 +-[ RECORD 1 ]------------------ +top_growth \| 1.0206132116232119 -- before this change, max mem usage while executing inserts was 28MB and ``` This PR changes the expectation to be slightly higher, such that this random increase in memory usage doesn't cause a flaky test. Failing test: https://app.circleci.com/pipelines/github/citusdata/citus/26256/workflows/c0870f66-3346-4f8d-a1d3-36dfd7c98289/jobs/743028	2022-08-19 23:46:28 +02:00
Jelte Fennema	de475feb69	Actually connect to the right database in logical_replication test (#6211 ) In the logical_replication test we test that the cleanup logic at the start of a shard move works as expected. To do so we create a subscription and publication slot manually. This changes the test to make that subscription actually connect to the database that the publication is in. Useful for #5987 #6085	2022-08-20 00:09:50 +03:00
Naisila Puka	9cfadd7965	Deletes unnecessary test outputs pt2 (#6214 )	2022-08-19 18:21:13 +03:00
Jelte Fennema	e6a1a86db0	Improve debugability for columnar_memory flakyness (#6203 ) Sometimes the columnar_memory test fails in CI with the following error: ```diff SELECT 1.0 * TopMemoryContext / :top_post BETWEEN 0.98 AND 1.02 AS top_growth_ok FROM columnar_test_helpers.columnar_store_memory_stats(); -[ RECORD 1 ]-+-- -top_growth_ok \| t +top_growth_ok \| f -- before this change, max mem usage while executing inserts was 28MB and ``` This is almost certainly a harmless failure that simply requires bumping the margin a little bit. However, it's impossible to say with the current output. I was unable to reproduce this on-demand on my local machine or even in CI. So this changes the test to include the actual value difference in the size of TopMemoryContext when it's outside the expected range. Then next time it fails we at least have some information about why. Example of failing test: https://app.circleci.com/pipelines/github/citusdata/citus/25966/workflows/d472a57b-419a-4f33-b8bc-2e174a98d4d6/jobs/730576	2022-08-19 15:41:16 +02:00
Jelte Fennema	fe1668e43f	Fix flakyness in multi_utilities (#6204 ) Sometimes this multi_utilities would fail with the following error: ```diff SET citus.log_remote_commands TO ON; -- should propagate to all workers because no table is specified ANALYZE; NOTICE: issuing BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;SELECT assign_distributed_transaction_id(0, 3461, '2022-08-19 01:56:06.35816-07'); DETAIL: on server postgres@localhost:57637 connectionId: 1 NOTICE: issuing BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;SELECT assign_distributed_transaction_id(0, 3461, '2022-08-19 01:56:06.35816-07'); DETAIL: on server postgres@localhost:57638 connectionId: 2 NOTICE: issuing SET citus.enable_ddl_propagation TO 'off' DETAIL: on server postgres@localhost:57637 connectionId: 1 -NOTICE: issuing SET citus.enable_ddl_propagation TO 'off' -DETAIL: on server postgres@localhost:xxxxx connectionId: xxxxxxx NOTICE: issuing ANALYZE DETAIL: on server postgres@localhost:57637 connectionId: 1 +NOTICE: issuing SET citus.enable_ddl_propagation TO 'off' +DETAIL: on server postgres@localhost:57638 connectionId: 2 NOTICE: issuing ANALYZE DETAIL: on server postgres@localhost:57638 connectionId: 2 ``` This is simply a harmless change in output due to some timing differences. This PR makes the test output consistent by only logging the remote ANALYZE commands, not the SET commands.	2022-08-19 12:38:55 +02:00
Jelte Fennema	8ce12eb51f	Fix flakyness in failure_insert_select_repartition (#6202 ) This fixes our most commonly randomly failing failure test. The failing diff is as follows: ```diff SELECT citus.mitmproxy('conn.onQuery(query="fetch_intermediate_results").kill()'); mitmproxy ----------- (1 row) INSERT INTO target_table SELECT * FROM source_table; -ERROR: connection to the remote node localhost:xxxxx failed with the following error: connection not open +ERROR: could not open file "base/pgsql_job_cache/10_0_40/repartitioned_results_20770193413_from_4213590_to_1.data": No such file or directory +CONTEXT: while executing command on localhost:9060 +while executing command on localhost:57637 SELECT * FROM target_table ORDER BY a; ``` As far as I can tell this is the cause of a race condition: After killing fetch_intermediate_results on worker 9060, the previously created data file gets cleaned up. The fetch_intermediate_results call that's sent to worker 57637 will be cancelled and rolled back soon because of the failure on the other connection. But if that fetch_intermediate_results call is able to connect to 9060 before it is cancelled, it won't find the file it's looking for there anymore. So while it's not the error we expect, it does indicate that we succeeded. To avoid this issue instead of killing the fetch_intermediate_results call directly, we kill the COPY command that it uses to do the fetch. This results in stable output as can be seen here, where 227 runs of failure_insert_select_repartition succeeded: https://app.circleci.com/pipelines/github/citusdata/citus/26168/workflows/9c64a3b6-f46c-4725-9fb4-8f6a2d00a023/jobs/739389 To be clear this changes the test to affects the opposite fetch_intermediate_results call. This kills the fetch_intermediate_results call of worker 57637, instead of killing the fetch_intermediate_results call on worker 9060. Example of failing test: https://app.circleci.com/pipelines/github/citusdata/citus/26147/workflows/780e95ea-264a-4c9f-ad2e-cf11449a795e/jobs/738467	2022-08-19 09:11:07 +00:00
Naisila Puka	5a9fdc221b	Add explicit alias to avoid debug output diff in pg15 (#6183 )	2022-08-19 11:39:18 +03:00
Jelte Fennema	d16b458e2a	Remove the flaky rollback_to_savepoint test (#6190 ) This removes a flaky test that I introduced in #3868 after I fixed the issue described in #3622. This test is sometimes fails randomly in CI. The way it fails indicates that there might be some bug: A connection breaks after rolling back to a savepoint. I tried reproducing this issue locally, but I wasn't able to. I don't understand what causes the failure. Things that I tried were: 1. Running the test with: ```sql SET citus.force_max_query_parallelization = true; ``` 2. Running the test with: ```sql SET citus.max_adaptive_executor_pool_size = 1; ``` 3. Running the test in parallel with the same tests that it is run in parallel with in multi_schedule. None of these allowed me to reproduce the issue locally. So I think it's time to give on fixing this test and simply remove the test. The regression that this test protects against seems very unlikely to reappear, since in #3868 I also added a big comment about the need for the newly added `UnclaimConnection` call. So, I think the need for the test is quite small, and removing it will make our CI less flaky. In case the cause of the bug ever gets found, I tracked the bug in #6189 Example of a failing CI run: https://app.circleci.com/pipelines/github/citusdata/citus/26098/workflows/f84741d9-13b1-4ae7-9155-c21ed3466951/jobs/736424 For reference the unexpected diff is this (so both warnings and an error): ```diff INSERT INTO t SELECT i FROM generate_series(1, 100) i; +WARNING: connection to the remote node localhost:57638 failed with the following error: +WARNING: +CONTEXT: while executing command on localhost:57638 +ERROR: connection to the remote node localhost:57638 failed with the following error: ROLLBACK; ``` This test is also mentioned as the most failing regression test in #5975	2022-08-18 15:14:16 +03:00
Onder Kalaci	9ec8e627c1	Support Sequences owned by columns before distributing tables There are 3 different ways that a sequence can be interacting with tables. (1) and (2) are already supported. This commit adds support for (3). (1) column DEFAULT nextval('seq'): The dependency is roughly like below, and ExpandCitusSupportedTypes() is responsible for finding the depending sequences. schema <--- table <--- column <---- default value ^ \| \|------------------ sequence <--------\| (2) serial columns: Bigserial/small serial etc: The dependency is roughly like below, and ExpandCitusSupportedTypes() is responsible for finding the depending sequences. schema <--- table <--- column <---- default value ^ \| \| \| sequence <--------\| (3) Sequence OWNED BY table.column: Added support for this type of resolution in this commit. The dependency is almost like the following, and ExpandCitusSupportedTypes() is NOT responsible for finding the dependency. schema <--- table <--- column ^ \| sequence	2022-08-18 10:29:40 +02:00
Ying Xu	91473635db	[Columnar] Check for existence of Citus before creating Citus_Columnar (#6178 ) * Added a check to see if Citus has already been loaded before creating citus_columnar * added tests	2022-08-17 15:12:42 -07:00
Ahmet Gedemenli	0631e1998b	Fix upgrade paths for #6100 (#6176 ) * Fix upgrade paths for #6100 Co-authored-by: Hanefi Onaldi <Hanefi.Onaldi@microsoft.com>	2022-08-17 18:56:53 +03:00
Naisila Puka	20a0e0ed39	Grant create on public to some users where necessary (for PG15) (#6180 )	2022-08-17 17:35:10 +03:00

... 2 3 4 5 6 ...

2183 Commits (49c62b35a0635c470a96460d4eb08ebc7cf9b7d1)