citus

Commit Graph

Author	SHA1	Message	Date
Onur Tirtir	9550ebd118	Remove pg_depend entries from columnar metadata indexes to columnar-am In the past, having columnar tables in the cluster was causing pg upgrades to fail when attempting to access columnar metadata. This is because pg_dump doesn't see the objects that we use for columnar-am related bookkeeping as dependencies of the tables using columnar-am. To fix that, in #5456 we inserted some "normal dependency" edges (from those objects to columnar-am) into pg_depend. This helped us ensure the existence of a class of metadata objects --such as columnar.storageid_seq-- and helped fix #5437. However, the normal-dependency edges that we added for indexes on columnar metadata tables --such as columnar.stripe_pkey-- didn't help at all because they were in fact causing dependency loops (#5510) and pg_dump was not able to take those dependency edges into account. For this reason, this commit deletes those dependency edges so that pg_dump stops complaining about them. Note that it's not critical to delete those edges from pg_depend since they're not breaking pg upgrades but were triggering some warning messages. And given that backporting a sql change into older versions is quite hard, we skip backporting this.	2023-03-14 17:13:52 +03:00
Onur Tirtir	be0735a329	Use "cpp" to expand "#include" directives in columnar sql files	2023-03-14 17:13:52 +03:00
Onur Tirtir	2b4be535de	Do clean-up before upgrade_columnar_before to make it runnable multiple times So that flaky test detector can run upgrade_columnar_before.sql multiple times.	2023-03-14 17:13:52 +03:00
Onur Tirtir	994f67185f	Make upgrade_columnar_after runnable multiple times This commit hides port numbers in upgrade_columnar_after because the port numbers assigned to nodes in upgrade schedule differ from the ones that flaky test detector assigns.	2023-03-14 17:13:52 +03:00
Onur Tirtir	821f26cc74	Fix flaky test detection for upgrade tests When run_test.py is run for an upgrade_._after.sql file, automatically run the corresponding upgrade_._before.sql file first. This is because all those upgrade_._after.sql files depend on the objects created in upgrade_._before.sql files by definition.	2023-03-14 17:13:52 +03:00
Onur Tirtir	f68fc9e69c	Decide core distribution params in CreateCitusTable (#6760 ) Decide core distribution params in CreateCitusTable to reduce the chances of creating Citus tables based on incorrect combinations of distribution method and replication model params. Also introduce DistributedTableParams struct to encapsulate the parameters that are specific to distributed tables.	2023-03-14 14:24:52 +03:00
Onur Tirtir	cc945fa331	Add multi_create_fdw into minimal_schedule (#6759 ) So that we can run the tests that require fake_fdw by using minimal schedule too. Also move multi_create_fdw.sql up in multi_1_schedule to make it available to more tests.	2023-03-14 10:22:34 +03:00
Gokhan Gulbiz	74b4ef1f45	Indent	2023-03-10 16:15:46 +03:00
Gokhan Gulbiz	3a24649748	Add schema name to worker_modify_identity_columns UDF	2023-03-10 16:12:05 +03:00
Gokhan Gulbiz	9f2ff9ed1f	Merge remote-tracking branch 'upstream/main' into issue/6694	2023-03-10 15:58:17 +03:00
Onur Tirtir	20a5f3af2b	Replace CITUS_TABLE_WITH_NO_DIST_KEY checks with HasDistributionKey() (#6743 ) Now that we will soon add another table type having DISTRIBUTE_BY_NONE as distribution method and that we want the code to interpret such tables mostly as distributed tables, let's make the definition of those other two table types more strict by removing CITUS_TABLE_WITH_NO_DIST_KEY macro. And instead, use HasDistributionKey() check in the places where the logic applies to all table types that have / don't have a distribution key. In future PRs, we might want to convert some of those HasDistributionKey() checks if logic only applies to Citus local / reference tables, not the others. And adding HasDistributionKey() also allows us to consider having DISTRIBUTE_BY_NONE as the distribution method as a "table attribute" that can apply to distributed tables too, rather than something that determines the table type.	2023-03-10 13:55:52 +03:00
Gokhan Gulbiz	bd213a0970	Add tests for a columnar table with identity columns	2023-03-10 11:19:35 +03:00
Gokhan Gulbiz	3311b44e6a	Move supported identity column check to EnsureRelationCanBeDistributed()	2023-03-10 11:18:55 +03:00
Gokhan Gulbiz	4fb37f33cc	Move identity column existence check to ConvertTable()	2023-03-10 10:51:44 +03:00
Gokhan Gulbiz	55ebabc875	Keep read lock until we are done with identity columns check	2023-03-10 10:43:19 +03:00
Gokhan Gulbiz	b649df41c5	Merge remote-tracking branch 'upstream/main' into issue/6694	2023-03-09 11:37:18 +03:00
Onur Tirtir	e3cf7ace7c	Stabilize single_node.sql and others that report illegal node removal (#6751 ) See https://app.circleci.com/pipelines/github/citusdata/citus/30859/workflows/223d61db-8c1d-4909-9aea-d8e470f0368b/jobs/1009243.	2023-03-08 15:25:36 +03:00
Onur Tirtir	d82c11f793	Refactor CreateDistributedTable() (#6742 ) Split the main logic that allows creating a Citus table into the internal function CreateCitusTable(). Old CreateDistributedTable() function was assuming that it's creating a reference table when the distribution method is DISTRIBUTE_BY_NONE. However, soon this won't be the case when adding support for creating single-shard distributed tables because their distribution method would also be the same. Now the internal method CreateCitusTable() doesn't make any assumptions about table's replication model or such. Instead, it expects callers to properly set all such metadata bits. Moreover, some of the parameters that the old CreateDistributedTable() takes --such as the shard count-- were not meaningful for a reference table, and the same would be true for the new table type.	2023-03-08 13:38:51 +03:00
Emel Şimşek	4043abd5aa	Exclude-Generated-Columns-In-Copy (#6721 ) DESCRIPTION: Fixes a bug in shard copy operations. For copying shards in both shard move and shard split operations, Citus uses the COPY statement. A COPY all statement in the following form ` COPY target_shard FROM STDIN;` throws an error when there is a GENERATED column in the shard table. In order to fix this issue, we need to exclude the GENERATED columns in the COPY and the matching SELECT statements. Hence this fix converts the COPY and SELECT all statements to the following form: ``` COPY target_shard (col1, col2, ..., coln) FROM STDIN; SELECT (col1, col2, ..., coln) FROM source_shard; ``` where (col1, col2, ..., coln) does not include a GENERATED column. GENERATED column values are created in the target_shard as the values are inserted. Fixes #6705. --------- Co-authored-by: Teja Mupparti <temuppar@microsoft.com> Co-authored-by: aykut-bozkurt <51649454+aykut-bozkurt@users.noreply.github.com> Co-authored-by: Jelte Fennema <jelte.fennema@microsoft.com> Co-authored-by: Gürkan İndibay <gindibay@microsoft.com>	2023-03-07 18:15:50 +03:00
Ahmet Gedemenli	03f1bb70b7	Rebalance shard groups with placement count less than worker count (#6739 ) DESCRIPTION: Adds logic to distribute unbalanced shards If the number of shard placements (for a colocation group) is less than the number of workers, it means that some of the workers will remain empty. With this PR, we consider these shard groups as a colocation group, in order to make them be distributed evenly as much as possible across the cluster. Example: ```sql create table t1 (a int primary key); create table t2 (a int primary key); create table t3 (a int primary key); set citus.shard_count =1; select create_distributed_table('t1','a'); select create_distributed_table('t2','a',colocate_with=>'t1'); select create_distributed_table('t3','a',colocate_with=>'t2'); create table tb1 (a bigint); create table tb2 (a bigint); select create_distributed_table('tb1','a'); select create_distributed_table('tb2','a',colocate_with=>'tb1'); select citus_add_node('localhost',9702); select rebalance_table_shards(); ``` Here we have two colocation groups, each with one shard group. Both shard groups are placed on the first worker node. When we add a new worker node and try to rebalance table shards, the rebalance planner considers it well balanced and does nothing. With this PR, the rebalancer tries to distribute these shard groups evenly across the cluster as much as possible. For this example, with this PR, the rebalancer moves one of the shard groups to the second worker node. fixes: #6715	2023-03-06 14:14:27 +03:00
Emel Şimşek	ed7cc8f460	Remove unused lock functions (#6747 ) Code cleanup. This change removes two unused functions seemingly left over after a previous refactoring of shard move code.	2023-03-06 13:59:45 +03:00
Jelte Fennema	b489d763e1	Use pg_total_relation_size in citus_shards (#6748 ) DESCRIPTION: Correctly report shard size in citus_shards view When looking at citus_shards, people are interested in the actual size that all the data related to the shard takes up on disk. `pg_total_relation_size` is the function to use for that purpose. The previously used `pg_relation_size` does not include indexes or TOAST. Especially the missing toast can have enormous impact on the size of the shown data.	2023-03-06 10:53:12 +01:00
Gokhan Gulbiz	41e8255b16	Add table ownership check for identity column modifications	2023-03-06 11:42:17 +03:00
Gledis Zeneli	dc7fa0d5af	Fix multiple output version arbitrary config tests (#6744 ) With this small change, arbitrary config tests can have multiple acceptable correct outputs. For an arbitrary config test named `t`, you can now define `expected/t.out`, `expected/t_0.out`, `expected/t_1.out` etc. and the test will succeed if the output of `sql/t.sql` is equal to any of the `t.out` or `t_{0, 1, ...}.out` files.	2023-03-03 21:06:59 +03:00
Onur Tirtir	0d401344c2	Stabilize single node tests (#6741 ) First of all, we set next_shard_id for single_node_truncate.sql because shard ids in the test output were changing whenever we modify a prior test file, such as single_node.sql. Then the flaky test detector started complaining about single_node_truncate.sql. We fix that by specifying the correct test dependency for it in run_test.py. We also do the same for single_node.sql.	2023-03-03 17:17:08 +03:00
Onur Tirtir	a9820e96a3	Make single_node_truncate.sql re-runnable First of all, this commit sets next_shard_id for single_node_truncate.sql because shard ids in the test output were changing whenever we modify a prior test file. Then the flaky test detector started complaining about single_node_truncate.sql. We fix that by specifying the correct test dependency for it in run_test.py.	2023-03-02 16:33:18 +03:00
Onur Tirtir	40105bf1fc	Make single_node.sql re-runnable	2023-03-02 16:33:17 +03:00
Gokhan Gulbiz	fe9304054a	Merge branch 'main' into issue/6694	2023-03-02 08:45:35 +03:00
Gokhan Gulbiz	f027a47ca8	Fix string eval bug in migration files check (#6740 )	2023-03-02 08:44:57 +03:00
Gokhan Gulbiz	d2002823d1	Merge remote-tracking branch 'upstream/main' into issue/6694	2023-03-01 14:41:50 +03:00
Gokhan Gulbiz	d8d2ce3c49	Add tests for create_distributed_table_concurrently	2023-03-01 14:11:08 +03:00
Gokhan Gulbiz	7701ca12e0	Update migration sql scripts.	2023-03-01 14:10:15 +03:00
Gokhan Gulbiz	cd69b975a3	Fix flakiness	2023-03-01 14:10:15 +03:00
Gokhan Gulbiz	08bf877d29	Minor refactorings	2023-03-01 14:09:06 +03:00
Gokhan Gulbiz	9e45f8c6b7	Indent	2023-03-01 14:09:06 +03:00
Gokhan Gulbiz	e684e15e40	Fix regressions	2023-03-01 14:09:06 +03:00
Gokhan Gulbiz	02eacd4113	Introduce worker_modify_identity_columns() udf	2023-03-01 14:09:06 +03:00
Gokhan Gulbiz	afed03336d	Propagate identity columns as-is to workers.	2023-03-01 14:09:06 +03:00
aykut-bozkurt	e2654deeae	fix memory leak during altering distributed table with a lot of partitions and shards (#6726 ) 2 improvements to prevent memory leaks during altering or undistributing distributed tables with a lot of partitions and shards: 1. Free memory for each call to ConvertTable so that colocated and partition tables at `AlterDistributedTable`, `UndistributeTable`, or `AlterTableSetAccessMethod` will not cause an increase in memory usage, 2. Free memory while executing attach partition commands for each partition table at `AlterDistributedTable` to prevent an increase in memory usage. DESCRIPTION: Fixes memory leak issue during altering distributed table with a lot of partitions and shards. Fixes https://github.com/citusdata/citus/issues/6503.	2023-02-28 21:23:41 +03:00
Jelte Fennema	17ad61678f	Make run_test.py and create_test.py importable without errors (#6736 ) Allowing scripts to be importable is good practice in general and it's required for the pytest testing framework that I'm adding in a follow up PR.	2023-02-28 00:34:42 +03:00
Jelte Fennema	c018e29bec	Don't blanket ignore flake8 E402 error (#6734 ) Instead this starts ignoring it in specific places only, because most files don't actually need it ignored.	2023-02-27 18:17:15 +03:00
Gürkan İndibay	7b8e614039	Fixes bookworm packaging pipeline problem (#6737 ) Recently, I changed the Python execution structure to use a virtual environment. Therefore, there is now no need to change the built-in Python for the images. Since GitHub provisions images with specific permissions, this caused an error. With this PR, I removed the unnecessary installation of pip and setuptools in the container Docker image. Additionally, I removed some unnecessary sudos and used apt-get instead of apt in one place.	2023-02-27 15:28:36 +03:00
Jelte Fennema	24ad8574b5	Fix run_test.py on python 3.9 (#6735 ) In #6718 I accidentally added Python type hint syntax that was only supported on Python 3.10. Our CI uses 3.9, so this PR changes that to a syntax that's supported on 3.9 too.	2023-02-27 10:12:18 +01:00
Teja Mupparti	9cbfdc86dd	MERGE: In deparser, add missing check for RETURNING clause.	2023-02-26 22:38:14 -08:00
Teja Mupparti	d7b499929c	Rearrange the common code into a new function to facilitate the multiple checks of the same conditions in a multi-modify MERGE statement	2023-02-24 12:55:11 -08:00
aykut-bozkurt	a7689c3f8d	fix memory leak during distribution of a table with a lot of partitions (#6722 ) We have a memory leak during distribution of a table with a lot of partitions because we do not release memory at ExprContext until all partitions are distributed. We improved 2 things to resolve the issue: 1. We create and delete MemoryContext for each call to `CreateDistributedTable` by partitions, 2. We rebuild the cache after we insert all the placements instead of each placement for a shard. DESCRIPTION: Fixes memory leak during distribution of a table with a lot of partitions and shards. Fixes https://github.com/citusdata/citus/issues/6572.	2023-02-17 18:12:49 +03:00
Emel Şimşek	756c1d3f5d	Remove auto_explain workaround in citus explain hook for ALTER TABLE (#6714 ) When auto_explain module is loaded and configured, EXPLAIN will be implicitly run for all the supported commands. Postgres does not support `EXPLAIN` for `ALTER` command. However, auto_explain will try to `EXPLAIN` other supported commands internally triggered by `ALTER`. For instance, `ALTER TABLE target_table ADD CONSTRAINT fkey_167 FOREIGN KEY (col_1) REFERENCES ref_table(key) ... ` command may trigger a SELECT command in the following form for foreign key validation purpose: `SELECT fk.col_1 FROM ONLY target_table fk LEFT OUTER JOIN ONLY ref_table pk ON ( pk.key OPERATOR(pg_catalog.=) fk.col_1) WHERE pk.key IS NULL AND (fk.col_1 IS NOT NULL) ` For Citus tables, the Citus utility hook should ensure that constraint validation is skipped for shell tables but they are done for shard tables. The reason behind this design choice can be summed up as: - An ALTER TABLE command via coordinator node is run in a distributed transaction. - Citus does not support nested distributed transactions. - A SELECT query on a distributed table (aka shell table) is also run in a distributed transaction. - Therefore, Citus does not support running a SELECT query on a shell table while an ALTER TABLE command is running. With `eadc88a800` a bug is introduced breaking the skip constraint validation behaviour of Citus. With this change, we see that validation queries on distributed tables are triggered within `ALTER` command adding constraints with validation check. This regression did not cause an issue for regular use cases since the citus executor hook blocks those queries heuristically when there is an ALTER TABLE command in progress. The issue is surfaced as a crash (#6424 Workers, when configured to use auto_explain, crash during distributed transactions.) when auto_explain is enabled. This is due to auto_explain trying to execute the SELECT queries in a nested distributed transaction. Now since the regression with constraint validation is fixed in https://github.com/citusdata/citus/issues/6543, we should be able to remove the workaround.	2023-02-17 17:47:03 +03:00
aykut-bozkurt	9e69dd0e7f	fix single tuple result memory leak (#6724 ) We should not omit to free PGResult when we receive single tuple result from an internal backend. Single tuple results are normally freed by our ReceiveResults for `tupleDescriptor != NULL` flow but not for those with `tupleDescriptor == NULL`. See PR #6722 for details. DESCRIPTION: Fixes memory leak issue with query results that return a single row.	2023-02-17 14:15:09 +03:00
Teja Mupparti	ca65d2ba0b	Fix flaky tests local_shards_execution and local_shards_execution_replication. - The simple fix is to add ORDER BY to get deterministic results. - Set search_path explicitly after reconnecting; this avoids creating objects in the public schema, which prevents us from running the tests repeatedly. - multi_mx_modification is not designed to be run repeatedly, so isolate it.	2023-02-15 09:18:10 -08:00
Hanefi Onaldi	902d4262f9	CI checks to check for missing downgrade updates (#6661 ) A branch that touches a set of upgrade scripts is also expected to touch the corresponding downgrade scripts. To ensure that, I introduce a new CI script. If this script fails, read the output and make sure you update the downgrade scripts in the printed list.	2023-02-15 18:20:14 +03:00
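
Commit dc7fa0d5af in the list above describes a matching rule for arbitrary config tests: a test named `t` passes if its output equals `expected/t.out` or any `expected/t_N.out`. The rule can be sketched roughly as below. This is a minimal illustration only, not the actual Citus test runner code: the helper names `expected_candidates` and `matches_any_expected` are hypothetical, and the sketch assumes a plain directory of `.out` files compared byte for byte.

```python
import re
from pathlib import Path


def expected_candidates(expected_dir, test_name):
    """Return <test_name>.out plus every <test_name>_<N>.out in expected_dir.

    Hypothetical helper: only numeric suffixes count as alternative outputs.
    """
    pattern = re.compile(rf"{re.escape(test_name)}(_\d+)?\.out")
    return sorted(
        path
        for path in Path(expected_dir).iterdir()
        if pattern.fullmatch(path.name)
    )


def matches_any_expected(actual_output, expected_dir, test_name):
    """Return True if actual_output equals the content of any candidate file."""
    return any(
        path.read_text() == actual_output
        for path in expected_candidates(expected_dir, test_name)
    )
```

With `expected/t.out` and `expected/t_0.out` present, `matches_any_expected(output, "expected", "t")` would return True as soon as either file matches the output exactly.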

6488 Commits (24db425288cae064476246fb1b1841a4381e8f3b)