Commit Graph

7031 Commits (leftjoin_push)

Naisila Puka cedcc220bf
Fixes flaky VACUUM (freeze, process toast true) result (#7348)
https://app.circleci.com/pipelines/github/citusdata/citus/34550/workflows/5b802f66-2666-4623-a209-6d7799f7ee5f/jobs/1229153
```diff
VACUUM (FREEZE, PROCESS_TOAST true) local_vacuum_table;
 SELECT relfrozenxid::text::integer > :frozenxid AS frozen_performed FROM pg_class
 WHERE oid=:reltoastrelid::regclass;
  frozen_performed 
 ------------------
- t
+ f
 (1 row)
```
The PROCESS_TOAST option for VACUUM was introduced in PG14. The failing
test was supposed to be part of `multi_utilities.sql`, but it was
included in `pg14.sql` to avoid an alternative output file for PG13. See
ba62c0a148 (diff-ed03478f693155e2fe092e9ad356bf884dc097f554e8d75eff562d52bbcf7a75L255-L272)
for reference.
However, now that we don't support PG13 anymore, we can move this test
to `multi_utilities.sql`. Moving the test, plus inserting data before
running VACUUM FREEZE so that the freeze actually has work to do, fixes
the flakiness of the test.
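
A minimal sketch of the reshaped test, with illustrative table contents
(the real test in `multi_utilities.sql` may differ):

```sql
-- Sketch: load data (including a toastable column) so VACUUM FREEZE has
-- real work to do, then check that relfrozenxid advanced.
CREATE TABLE local_vacuum_table(i int, t text);
INSERT INTO local_vacuum_table
  SELECT i, repeat('x', 3000) FROM generate_series(1, 10000) i;
SELECT relfrozenxid::text::integer AS frozenxid
  FROM pg_class WHERE oid = 'local_vacuum_table'::regclass \gset
VACUUM (FREEZE, PROCESS_TOAST true) local_vacuum_table;
SELECT relfrozenxid::text::integer > :frozenxid AS frozen_performed
  FROM pg_class WHERE oid = 'local_vacuum_table'::regclass;
```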
2023-11-17 18:58:06 +03:00
Naisila Puka c88bf5ff1c
Cleanup leftover replication slots in publication test (#7354) 2023-11-17 15:11:38 +03:00
Japin Li e14e8667cc
Fix redundant variable declaration (#7353)
The `$workerCount` variable was declared twice in
src/test/regress/pg_regress_multi.pl.
2023-11-17 13:01:23 +03:00
Gürkan İndibay 32b0fc23f5
Removes unnecessary package installations in packaging pipelines (#7341)
With the recent changes in the packaging images, the Linux package
installations used to execute validate_output are no longer necessary.
This PR removes them to make the pipeline more efficient.

- [x] Remove the test warning before merge
2023-11-17 08:51:56 +03:00
Naisila Puka 55d500de8d
Remove accidentally added gucs.out (#7349) 2023-11-16 14:51:31 +03:00
Hanefi Onaldi 5efd3f181a
Fix wrong PR links in changelog (#7350)
When preparing changelog for 12.1.1 release, I accidentally swapped
the PR numbers for the two commits. This commit fixes the changelog
to point to the correct PRs.
2023-11-16 14:12:17 +03:00
Naisila Puka 0d1f18862b
Propagates SECURITY LABEL ON ROLE stmt (#7304)
We propagate `SECURITY LABEL [for provider] ON ROLE rolename IS
labelname` to the worker nodes.
We also make sure to run the relevant `SecLabelStmt` commands on a
newly added node by looking at roles found in `pg_shseclabel`.

See official docs for explanation on how this command works:
https://www.postgresql.org/docs/current/sql-security-label.html
This command stores the role label in the `pg_shseclabel` catalog table.

This commit also fixes the regex string in the
`check_gucs_are_alphabetically_sorted.sh` script so that it escapes
the dot. Previously it matched all strings starting with "citus"
instead of "citus." as it should.

To test this feature, I currently make use of a special GUC to control
label provider registration in `_PG_init` when creating the Citus
extension.
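
For illustration, the propagated statement has this shape (the provider,
role, and label names here are hypothetical):

```sql
-- Hypothetical example: with this change, Citus runs the same statement
-- on all workers, and replays labels found in pg_shseclabel on any
-- newly added node.
SECURITY LABEL FOR my_label_provider
    ON ROLE my_role
    IS 'my_label';
```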
2023-11-16 13:12:30 +03:00
Naisila Puka c6fbb72c02
Fix flaky multi_prepare_plsql (#7346)
The test simply needed an `ORDER BY` clause.

Ran into this twice this week already!

https://github.com/citusdata/citus/actions/runs/6849701315/attempts/1#summary-18622563506

https://github.com/citusdata/citus/actions/runs/6875051160/attempts/1#summary-18698009952

```diff
 SELECT nspname, typname FROM pg_type JOIN pg_namespace ON pg_namespace.oid = pg_type.typnamespace WHERE typname = 'prepare_ddl_type_backup';
    nspname   |         typname         
 -------------+-------------------------
- public      | prepare_ddl_type_backup
  otherschema | prepare_ddl_type_backup
+ public      | prepare_ddl_type_backup
 (2 rows)
```
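
The fix just makes the output order deterministic; a sketch:

```sql
-- Sketch of the fix: add an ORDER BY so the row order no longer depends
-- on scan order.
SELECT nspname, typname FROM pg_type
JOIN pg_namespace ON pg_namespace.oid = pg_type.typnamespace
WHERE typname = 'prepare_ddl_type_backup'
ORDER BY nspname;
```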
2023-11-15 13:28:43 +03:00
Naisila Puka a960799dfb
Clean up leftover replication slots in tests (#7338)
This commit fixes the flakiness in `logical_replication` and
`citus_non_blocking_split_shard_cleanup` tests. The flakiness
was related to leftover replication slots.
Below is a flaky example for each test:

logical_replication https://github.com/citusdata/citus/actions/runs/6721324131/attempts/1#summary-18267030604
citus_non_blocking_split_shard_cleanup https://github.com/citusdata/citus/actions/runs/6721324131/attempts/1#summary-18267006967

```diff
 -- Replication slots should be cleaned up
 SELECT slot_name FROM pg_replication_slots;
             slot_name            
 ---------------------------------
-(0 rows)
+ citus_shard_split_slot_19_10_17
+(1 row)
```

The tests by themselves are not flaky: 32 flaky test
schedules, each with 20 runs, ran successfully.
https://github.com/citusdata/citus/actions/runs/6822020127?pr=7338

The conclusion is that:
1. `multi_tenant_isolation_nonblocking` is the problematic test running
before `logical_replication` in the `enterprise_schedule`, so I added a
cleanup at the end of `multi_tenant_isolation_nonblocking`.
https://github.com/citusdata/citus/actions/runs/6824334614/attempts/1#summary-18560127461
2. `citus_split_shard_by_split_points_negative` is the problematic test
running before `citus_non_blocking_split_shards_cleanup` in the split
schedule, so I added a cleanup line there as well (sketched below).
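
The added cleanup is essentially one line, sketched here assuming the
regression suite's `wait_for_resource_cleanup()` helper:

```sql
-- Sketch: block until the deferred cleanup of replication slots,
-- publications, etc. has run, so the next test starts clean.
SELECT public.wait_for_resource_cleanup();
```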

For details on the investigation of leftover replication slots,
please check the PR https://github.com/citusdata/citus/pull/7338
2023-11-14 18:50:54 +03:00
Naisila Puka cdef2d5224
Random tests refactoring (#7342)
While investigating replication slot leftovers
in PR https://github.com/citusdata/citus/pull/7338,
I ran into the following refactoring/cleanup
opportunities in our test suite:

- Add a separate test to remove non-default nodes
- Remove coordinator removal from the `add_coordinator` test;
  use the `remove_coordinator_from_metadata` test where needed
- Don't print node ids in the `multi_multiuser_auth` and
  `multi_poolinfo_usage` tests
- Use `startswith` when checking for isolation or failure tests
- Add some dependencies accordingly in `run_test.py` for running flaky
  test schedules
2023-11-14 12:49:15 +03:00
Naisila Puka e4ac3e6d9a
Bump PG versions to latest minors 14.10, 15.5, 16.1 (#7336)
Postgres got minor updates on Nov 9; this starts using the images with
the latest versions for our tests, namely 14.10, 15.5, and 16.1.
These minor updates were compatible with Citus.
Sister PR: https://github.com/citusdata/the-process/pull/152
2023-11-13 15:05:38 +03:00
Onur Tirtir 240313e286
Support role commands from any node (#7278)
DESCRIPTION: Adds support for issuing role management commands from worker nodes

It's unlikely to get into a distributed deadlock with role commands, so
we don't worry much about that at the moment.
There were several attempts to reduce the chances of a deadlock, but we
haven't gotten any of them merged into the main branch yet; see:
#7325
#7016
#7009
2023-11-10 09:58:51 +00:00
Naisila Puka 57ff762c82
Fix VACUUM flakiness in multi_utilities (#7334)
When I run this test locally, the size of the table after the DELETE
command is around 58785792. Hence, the diffs suggest that the VACUUM
had no effect. The current solution is to run the VACUUM command three
times instead of once.

Example diff:
https://github.com/citusdata/citus/actions/runs/6722231142/attempts/1#summary-18269870674
```diff
insert into local_vacuum_table select i from generate_series(1,1000000) i;
 delete from local_vacuum_table;
 VACUUM local_vacuum_table;
 SELECT CASE WHEN s BETWEEN 20000000 AND 25000000 THEN 22500000 ELSE s END
 FROM pg_total_relation_size('local_vacuum_table') s ;
     s     
 ----------
- 22500000
+ 58785792
 (1 row)
```
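
The fix, roughly (a sketch; VACUUM can legitimately skip work, e.g. when
it cannot get a cleanup lock on a page, so repeating it makes the size
reduction reliable):

```sql
-- Sketch of the fix: run VACUUM three times instead of once before
-- checking the table size.
VACUUM local_vacuum_table;
VACUUM local_vacuum_table;
VACUUM local_vacuum_table;
SELECT CASE WHEN s BETWEEN 20000000 AND 25000000 THEN 22500000 ELSE s END
FROM pg_total_relation_size('local_vacuum_table') s;
```
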
See more diff examples in the PR description
https://github.com/citusdata/citus/pull/7334
2023-11-09 21:00:24 +03:00
dependabot[bot] c028d929b5 Bump werkzeug from 2.3.7 to 3.0.1 in /.devcontainer/src/test/regress
Bumps [werkzeug](https://github.com/pallets/werkzeug) from 2.3.7 to 3.0.1.
- [Release notes](https://github.com/pallets/werkzeug/releases)
- [Changelog](https://github.com/pallets/werkzeug/blob/main/CHANGES.rst)
- [Commits](https://github.com/pallets/werkzeug/compare/2.3.7...3.0.1)

---
updated-dependencies:
- dependency-name: werkzeug
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-11-09 17:14:14 +01:00
dependabot[bot] d4663212f4 Bump werkzeug from 2.3.7 to 3.0.1 in /src/test/regress
Bumps [werkzeug](https://github.com/pallets/werkzeug) from 2.3.7 to 3.0.1.
- [Release notes](https://github.com/pallets/werkzeug/releases)
- [Changelog](https://github.com/pallets/werkzeug/blob/main/CHANGES.rst)
- [Commits](https://github.com/pallets/werkzeug/compare/2.3.7...3.0.1)

---
updated-dependencies:
- dependency-name: werkzeug
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-11-09 17:14:14 +01:00
Nils Dijk 0dac63afc0
move pg_version_constants.h to toplevel include (#7335)
In preparation for sorting and grouping all includes, we wanted to move
this file to the top-level includes for good grouping/sorting.
2023-11-09 15:09:39 +00:00
Hanefi Onaldi 92228b279a
Add changelog entries for 12.1.1 (#7332)
Co-authored-by: Onur Tirtir <onurcantirtir@gmail.com>
2023-11-09 14:19:28 +00:00
Naisila Puka 0dc41ee5a0
Fix flaky multi_mx_insert_select_repartition test (#7331)
https://github.com/citusdata/citus/actions/runs/6745019678/attempts/1#summary-18336188930
```diff
     insert into target_table SELECT a*2 FROM source_table RETURNING a;
-NOTICE:  executing the command locally: SELECT bytes FROM fetch_intermediate_results(ARRAY['repartitioned_results_xxxxx_from_4213582_to_0','repartitioned_results_xxxxx_from_4213584_to_0']::text[],'localhost',57638) bytes
+NOTICE:  executing the command locally: SELECT bytes FROM fetch_intermediate_results(ARRAY['repartitioned_results_3940758121873413_from_4213584_to_0','repartitioned_results_3940758121873413_from_4213582_to_0']::text[],'localhost',57638) bytes
```

The elements in the array passed to `fetch_intermediate_results` are the
same, but in the opposite order from what was expected.

To fix this flakiness, we can omit the `"SELECT bytes FROM
fetch_intermediate_results..."` line. The remaining logs already make it
clear that the intermediate results have been fetched.
2023-11-08 15:15:33 +03:00
Onur Tirtir 444e6cb7d6
Remove useless variables (#7327)
To fix warnings observed when using different compiler versions.
2023-11-07 16:39:08 +03:00
cvbhjkl e535f53ce5
Fix typo in local_executor.c (#7324)
Fix a typo 'remaning' -> 'remaining' in local_executor.c
2023-11-03 12:14:11 +00:00
Onur Tirtir 21646ca1e9
Fix flaky isolation_get_all_active_transactions.spec test (#7323)
Fix the flaky test that results in the following diff by waiting, for up
to 5 seconds, until the backend that we want to terminate has really
terminated.

```diff
--- /__w/citus/citus/src/test/regress/expected/isolation_get_all_active_transactions.out.modified	2023-11-01 16:30:57.648749795 +0000
+++ /__w/citus/citus/src/test/regress/results/isolation_get_all_active_transactions.out.modified	2023-11-01 16:30:57.656749877 +0000
@@ -114,13 +114,13 @@
 --------------------
 t                   
 (1 row)
 
 step s3-show-activity: 
  SET ROLE postgres;
  select count(*) from get_all_active_transactions() where process_id IN (SELECT * FROM selected_pid);
 
 count
 -----
-    0
+    1
 (1 row)
```
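
A sketch of the added wait, written as a polling loop (the actual
isolation spec code may differ):

```sql
-- Sketch: poll for up to 5 seconds until the terminated backend is
-- really gone from pg_stat_activity.
DO $$
BEGIN
  FOR i IN 1 .. 50 LOOP
    EXIT WHEN NOT EXISTS (
      SELECT 1 FROM pg_stat_activity
      WHERE pid IN (SELECT * FROM selected_pid));
    PERFORM pg_sleep(0.1);
  END LOOP;
END $$;
```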
2023-11-03 09:00:32 +01:00
Onur Tirtir 5e2439a117
Make some more tests re-runnable (#7322)
* multi_mx_create_table
* multi_mx_function_table_reference
* multi_mx_add_coordinator
* create_role_propagation
* metadata_sync_helpers
* text_search

https://github.com/citusdata/citus/pull/7278 requires this.
2023-11-02 18:32:56 +03:00
Jelte Fennema-Nio 85b997a0fb
Fix flaky multi_alter_table_statements (#7321)
Sometimes multi_alter_table_statements would fail in CI like this:

```diff
 -- Verify that DROP NOT NULL works
 ALTER TABLE lineitem_alter ALTER COLUMN int_column2 DROP NOT NULL;
 SELECT "Column", "Type", "Modifiers" FROM table_desc WHERE relid='lineitem_alter'::regclass;
-     Column      |         Type          | Modifiers
----------------------------------------------------------------------
- l_orderkey      | bigint                | not null
- l_partkey       | integer               | not null
- l_suppkey       | integer               | not null
- l_linenumber    | integer               | not null
- l_quantity      | numeric(15,2)         | not null
- l_extendedprice | numeric(15,2)         | not null
- l_discount      | numeric(15,2)         | not null
- l_tax           | numeric(15,2)         | not null
- l_returnflag    | character(1)          | not null
- l_linestatus    | character(1)          | not null
- l_shipdate      | date                  | not null
- l_commitdate    | date                  | not null
- l_receiptdate   | date                  | not null
- l_shipinstruct  | character(25)         | not null
- l_shipmode      | character(10)         | not null
- l_comment       | character varying(44) | not null
- float_column    | double precision      | default 1
- date_column     | date                  |
- int_column1     | integer               |
- int_column2     | integer               |
- null_column     | integer               |
-(21 rows)
-
+ERROR:  schema "alter_table_add_column" does not exist
 -- COPY should succeed now
 SELECT master_create_empty_shard('lineitem_alter') as shardid \gset
 ```

Reading from table_desc apparently has an issue: if the schema of one
of the items gets deleted while the view is being read, we get such an
error.

This change fixes that by no longer running multi_alter_table_statements
in parallel with alter_table_add_column.

This is another instance of the same issue as in #7294
2023-11-02 16:42:45 +03:00
Jelte Fennema-Nio f171ec98fc
Fix flaky failure_distributed_results (#7307)
Sometimes in CI we run into this failure:

```diff
   SELECT resultId, nodeport, rowcount, targetShardId, targetShardIndex
   FROM partition_task_list_results('test', $$ SELECT * FROM source_table $$, 'target_table')
           NATURAL JOIN pg_dist_node;
-WARNING:  connection to the remote node localhost:xxxxx failed with the following error: connection not open
+ERROR:  connection to the remote node localhost:9060 failed with the following error: connection not open
 SELECT * FROM distributed_result_info ORDER BY resultId;
-       resultid        | nodeport | rowcount | targetshardid | targetshardindex
----------------------------------------------------------------------
- test_from_100800_to_0 |     9060 |       22 |        100805 |                0
- test_from_100801_to_0 |    57637 |        2 |        100805 |                0
- test_from_100801_to_1 |    57637 |       15 |        100806 |                1
- test_from_100802_to_1 |    57637 |       10 |        100806 |                1
- test_from_100802_to_2 |    57637 |        5 |        100807 |                2
- test_from_100803_to_2 |    57637 |       18 |        100807 |                2
- test_from_100803_to_3 |    57637 |        4 |        100808 |                3
- test_from_100804_to_3 |     9060 |       24 |        100808 |                3
-(8 rows)
-
+ERROR:  current transaction is aborted, commands ignored until end of transaction block
 -- fetch from worker 2 should fail
 SAVEPOINT s1;
+ERROR:  current transaction is aborted, commands ignored until end of transaction block
 SELECT fetch_intermediate_results('{test_from_100802_to_1,test_from_100802_to_2}'::text[], 'localhost', :worker_2_port) > 0 AS fetched;
-ERROR:  could not open file "base/pgsql_job_cache/xx_x_xxx/test_from_100802_to_1.data": No such file or directory
-CONTEXT:  while executing command on localhost:xxxxx
+ERROR:  current transaction is aborted, commands ignored until end of transaction block
 ROLLBACK TO SAVEPOINT s1;
+ERROR:  savepoint "s1" does not exist
 -- fetch from worker 1 should succeed
 SELECT fetch_intermediate_results('{test_from_100802_to_1,test_from_100802_to_2}'::text[], 'localhost', :worker_1_port) > 0 AS fetched;
- fetched
----------------------------------------------------------------------
- t
-(1 row)
-
+ERROR:  current transaction is aborted, commands ignored until end of transaction block
 -- make sure the results read are same as the previous transaction block
 SELECT count(*), sum(x) FROM
   read_intermediate_results('{test_from_100802_to_1,test_from_100802_to_2}'::text[],'binary') AS res (x int);
- count | sum
----------------------------------------------------------------------
-    15 | 863
-(1 row)
-
+ERROR:  current transaction is aborted, commands ignored until end of transaction block
 ROLLBACk;
```

As outlined in #7306, which I created, the reason for this is related
to only having a single connection open to the node. Finding and fixing
the full cause is not trivial, so instead this PR works around the bug
by forcing maximum parallelism. Preferably this workaround would not be
necessary, but that requires spending time on a proper fix. For now,
having a less flaky CI is good enough.
2023-11-02 12:31:56 +00:00
Jelte Fennema-Nio b47c8b3fb0
Fix flaky insert_select_connection_leak (#7302)
Sometimes in CI insert_select_connection_leak would fail like this:

```diff
 END;
 SELECT worker_connection_count(:worker_1_port) - :pre_xact_worker_1_connections AS leaked_worker_1_connections,
        worker_connection_count(:worker_2_port) - :pre_xact_worker_2_connections AS leaked_worker_2_connections;
  leaked_worker_1_connections | leaked_worker_2_connections
 -----------------------------+-----------------------------
-                           0 |                           0
+                          -1 |                           0
 (1 row)

 -- ROLLBACK
 BEGIN;
 INSERT INTO target_table SELECT * FROM source_table;
 INSERT INTO target_table SELECT * FROM source_table;
 ROLLBACK;
 SELECT worker_connection_count(:worker_1_port) - :pre_xact_worker_1_connections AS leaked_worker_1_connections,
        worker_connection_count(:worker_2_port) - :pre_xact_worker_2_connections AS leaked_worker_2_connections;
  leaked_worker_1_connections | leaked_worker_2_connections
 -----------------------------+-----------------------------
-                           0 |                           0
+                          -1 |                           0
 (1 row)

 \set VERBOSITY TERSE
 -- Error on constraint failure
 BEGIN;
 INSERT INTO target_table SELECT * FROM source_table;
 SELECT worker_connection_count(:worker_1_port) AS worker_1_connections,
        worker_connection_count(:worker_2_port) AS worker_2_connections \gset
 SAVEPOINT s1;
 INSERT INTO target_table SELECT a, CASE WHEN a < 50 THEN b ELSE null END  FROM source_table;
@@ -89,15 +89,15 @@
  leaked_worker_1_connections | leaked_worker_2_connections
 -----------------------------+-----------------------------
                            0 |                           0
 (1 row)

 END;
 SELECT worker_connection_count(:worker_1_port) - :pre_xact_worker_1_connections AS leaked_worker_1_connections,
        worker_connection_count(:worker_2_port) - :pre_xact_worker_2_connections AS leaked_worker_2_connections;
  leaked_worker_1_connections | leaked_worker_2_connections
 -----------------------------+-----------------------------
-                           0 |                           0
+                          -1 |                           0
 (1 row)
```

Source:
https://github.com/citusdata/citus/actions/runs/6718401194/attempts/1#summary-18258258387

A negative number of leaked connections is obviously not possible. For
some reason there was a connection open when we checked the initial
number of connections that was closed afterwards. This could be from
the maintenance daemon, or maybe from the previous test that had not
fully closed its connections just yet.

The change in this PR doesn't actually fix the cause of the negative
count; it simply considers it fine as well, by clamping negative values
to zero (sketched below).

With this fix we might sometimes miss a leak, because the negative
number can cancel out the leak and still result in a 0. But since the
negative number only occurs sometimes, we'll still find the leak often
enough.
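
A sketch of the adjusted check:

```sql
-- Sketch of the fix: clamp negative differences to zero, so a stray
-- pre-existing connection that closes mid-test doesn't fail the test.
SELECT GREATEST(0, worker_connection_count(:worker_1_port)
                     - :pre_xact_worker_1_connections) AS leaked_worker_1_connections,
       GREATEST(0, worker_connection_count(:worker_2_port)
                     - :pre_xact_worker_2_connections) AS leaked_worker_2_connections;
```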
2023-11-02 13:15:43 +01:00
Cédric Villemain 0678a2fd89
Fix #7242, CALL(@0) crash backend (#7288)
When executing a prepared CALL, which is not pure SQL but is available
with some drivers like npgsql and pgjdbc, Citus entered a code path
where no plan is defined while trying to increase the plan's cost,
hence a SIGSEGV when the plan is a NULL pointer.

Fix by only increasing the plan cost when the plan is not NULL.

However, it is a bit suspicious to get here with a NULL plan, and maybe
a better change would be to not call
ShardPlacementForFunctionColocatedWithDistTable() with a NULL plan at
all (in call.c:134).

The bug can be hit with, for example:
```
CallableStatement proc = con.prepareCall("{CALL p(?)}");
proc.registerOutParameter(1, java.sql.Types.BIGINT);
proc.setInt(1, -100);
proc.execute();
```

where `p(bigint)` is a distributed "function" and the parameter is the
distribution key (also in a distributed table); see #7242 for details.

Fixes #7242
2023-11-02 13:15:24 +01:00
Jelte Fennema-Nio 5a48a1602e
Debug flaky logical_replication test (#7309)
Sometimes in CI our logical_replication test fails like this:

```diff
+++ /__w/citus/citus/src/test/regress/results/logical_replication.out.modified	2023-11-01 14:15:08.562758546 +0000
@@ -40,21 +40,21 @@

 SELECT count(*) from pg_publication;
  count
 -------
      0
 (1 row)

 SELECT count(*) from pg_replication_slots;
  count
 -------
-     0
+     1
 (1 row)

 SELECT count(*) FROM dist;
  count
 -------
```

It's hard to understand what is going on here based just on the wrong
number. So this PR changes the test to show the names of the
subscription, publication, and replication slot to make finding the
cause easier.

In passing this also fixes another flaky test in the same file that our
flaky test detection picked up. This is done by waiting for resource
cleanup after the shard move.
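
A sketch of the more informative queries:

```sql
-- Sketch: select object names instead of bare counts, so a failing run
-- immediately shows which object was left behind.
SELECT pubname FROM pg_publication;
SELECT subname FROM pg_subscription;
SELECT slot_name FROM pg_replication_slots;
```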
2023-11-02 13:15:02 +01:00
Jelte Fennema-Nio 6fed82609c
Do not download all artifacts for flaky test detection (#7320)
This is causing 404 failures due to a race condition:
https://github.com/actions/toolkit/issues/1235

It also makes the tests take unnecessarily long.

This was tested by changing a test file and seeing that the flaky test
detection was still working.
2023-11-02 12:13:29 +00:00
Onur Tirtir 9867c5b949
Fix flaky multi_mx_node_metadata.sql test (#7317)
Fixes the flaky test that results in the following diff:
```diff
--- /__w/citus/citus/src/test/regress/expected/multi_mx_node_metadata.out.modified	2023-11-01 14:22:12.890476575 +0000
+++ /__w/citus/citus/src/test/regress/results/multi_mx_node_metadata.out.modified	2023-11-01 14:22:12.914476657 +0000
@@ -840,24 +840,26 @@
 (1 row)
 
 \c :datname - - :master_port
 SELECT datname FROM pg_stat_activity WHERE application_name LIKE 'Citus Met%';
   datname   
 ------------
  db_to_drop
 (1 row)
 
 DROP DATABASE db_to_drop;
+ERROR:  database "db_to_drop" is being accessed by other users
 SELECT datname FROM pg_stat_activity WHERE application_name LIKE 'Citus Met%';
   datname   
 ------------
-(0 rows)
+ db_to_drop
+(1 row)
 
 -- cleanup
 DROP SEQUENCE sequence CASCADE;
 NOTICE:  drop cascades to default value for column a of table reference_table
```
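
A plausible shape for the fix, sketched as a polling loop (the actual
test code may differ):

```sql
-- Sketch: wait until no backend (e.g. the metadata sync connection) is
-- connected to the database anymore, then drop it.
DO $$
BEGIN
  FOR i IN 1 .. 100 LOOP
    EXIT WHEN NOT EXISTS (
      SELECT 1 FROM pg_stat_activity WHERE datname = 'db_to_drop');
    PERFORM pg_sleep(0.1);
  END LOOP;
END $$;
DROP DATABASE db_to_drop;
```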
2023-11-02 11:02:34 +00:00
Gürkan İndibay 184c8fc1ee
Enriches statement propagation document (#7267)
Co-authored-by: Onur Tirtir <onurcantirtir@gmail.com>
Co-authored-by: Hanefi Onaldi <Hanefi.Onaldi@microsoft.com>
Co-authored-by: Jelte Fennema-Nio <jelte.fennema@microsoft.com>
2023-11-02 09:59:34 +00:00
Jelte Fennema-Nio a6e86884f6
Fix flaky isolation_metadata_sync_deadlock (#7312)
Sometimes isolation_metadata_sync_deadlock fails in CI like this:

```diff
diff -dU10 -w /__w/citus/citus/src/test/regress/expected/isolation_metadata_sync_deadlock.out /__w/citus/citus/src/test/regress/results/isolation_metadata_sync_deadlock.out
--- /__w/citus/citus/src/test/regress/expected/isolation_metadata_sync_deadlock.out.modified	2023-11-01 16:03:15.090199229 +0000
+++ /__w/citus/citus/src/test/regress/results/isolation_metadata_sync_deadlock.out.modified	2023-11-01 16:03:15.098199312 +0000
@@ -110,10 +110,14 @@
 t
 (1 row)

 step s2-stop-connection:
  SELECT stop_session_level_connection_to_node();

 stop_session_level_connection_to_node
 -------------------------------------

 (1 row)
+
+teardown failed: ERROR:  localhost:57638 is a metadata node, but is out of sync
+HINT:  If the node is up, wait until metadata gets synced to it and try again.
+CONTEXT:  SQL statement "SELECT master_remove_distributed_table_metadata_from_workers(v_obj.objid, v_obj.schema_name, v_obj.object_name)"
```

Source:
https://github.com/citusdata/citus/actions/runs/6721938040/attempts/1#summary-18268946448

To fix this we now wait for the metadata to be fully synced to all
nodes at the start of the teardown steps.
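
The added teardown step is essentially:

```sql
-- Sketch: block until metadata is fully synced to all nodes before the
-- teardown drops distributed objects.
SELECT public.wait_until_metadata_sync(60000);
```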
2023-11-02 10:39:05 +01:00
Jelte Fennema-Nio ea5551689e
Prepare github actions pipelines for merge queue (#7315)
GitHub has a built-in merge queue. I think it would be good to try this
out, to speed up merging PRs when multiple people want to merge at the
same time. This PR does not enable it yet, but it starts triggering
GitHub Actions also for the `merge_queue` event. This is a requirement
for trying it out.

Announcement:
https://github.blog/2023-07-12-github-merge-queue-is-generally-available/
Docs:
https://docs.github.com/en/repositories/configuring-branches-and-merges-in-your-repository/configuring-pull-request-merges/managing-a-merge-queue
2023-11-02 08:23:34 +00:00
Onur Tirtir 2cf4c04023
Fix flaky global_cancel.sql test (#7316) 2023-11-01 23:59:41 +01:00
Jelte Fennema-Nio e3c93c303d
Fix flaky citus_non_blocking_split_shard_cleanup (#7311)
Sometimes in CI citus_non_blocking_split_shard_cleanup failed like this:

```diff
--- /__w/citus/citus/src/test/regress/expected/citus_non_blocking_split_shard_cleanup.out.modified	2023-11-01 15:07:14.280551207 +0000
+++ /__w/citus/citus/src/test/regress/results/citus_non_blocking_split_shard_cleanup.out.modified	2023-11-01 15:07:14.292551358 +0000
@@ -106,21 +106,22 @@
 -----------------------------------

 (1 row)

 \c - - - :worker_2_port
 SET search_path TO "citus_split_test_schema";
 -- Replication slots should be cleaned up
 SELECT slot_name FROM pg_replication_slots;
             slot_name
 ---------------------------------
-(0 rows)
+ citus_shard_split_slot_19_10_17
+(1 row)

 -- Publications should be cleanedup
 SELECT count(*) FROM pg_publication;
  count
```

It's expected that the replication slot is sometimes not cleaned up if
we don't wait until resource cleanup completes, so this PR starts doing
that here.
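
The added wait, sketched with the suite's cleanup helper (assumed here):

```sql
-- Sketch: let the deferred split-shard cleanup finish before checking
-- pg_replication_slots, which then reliably returns 0 rows.
SELECT public.wait_for_resource_cleanup();
SELECT slot_name FROM pg_replication_slots;
```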
2023-11-01 16:21:12 +00:00
Gürkan İndibay 5903196020
Removes use-base-schedule flag from CI (#7301)
Normally, tests that are written independently of other tests can use
the minimal schedule, and should do so. However, in our test settings
the base schedule was being used, which may introduce unnecessary
dependencies and thus unrelated errors that developers don't see in
their local environment. With this change, the default setting is
minimal, so tests will be free of unnecessary dependencies.
2023-11-01 15:52:22 +00:00
Jelte Fennema-Nio c9f2fc892d
Fix flaky failure_split_cleanup (#7299)
Sometimes failure_split_cleanup failed in CI like this:

```diff
 ERROR:  server closed the connection unexpectedly
 CONTEXT:  while executing command on localhost:9060
     SELECT operation_id, object_type, object_name, node_group_id, policy_type
     FROM pg_dist_cleanup where operation_id = 777 ORDER BY object_name;
  operation_id | object_type |                        object_name                        | node_group_id | policy_type
 --------------+-------------+-----------------------------------------------------------+---------------+-------------
           777 |           1 | citus_failure_split_cleanup_schema.table_to_split_8981000 |             1 |           0
-          777 |           1 | citus_failure_split_cleanup_schema.table_to_split_8981002 |             1 |           1
           777 |           1 | citus_failure_split_cleanup_schema.table_to_split_8981002 |             2 |           0
+          777 |           1 | citus_failure_split_cleanup_schema.table_to_split_8981002 |             1 |           1
           777 |           1 | citus_failure_split_cleanup_schema.table_to_split_8981003 |             2 |           1
           777 |           4 | citus_shard_split_publication_1_10_777                    |             2 |           0
 (5 rows)

     -- we need to allow connection so that we can connect to proxy
```

Source:
https://github.com/citusdata/citus/actions/runs/6717642291/attempts/1#summary-18256014949

It's the common problem of a missing column in the ORDER BY clause.
This fixes that by adding node_group_id to the ORDER BY of the query in
question.
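
A sketch of the fixed query:

```sql
-- Sketch of the fix: include node_group_id in the ORDER BY so rows with
-- the same object_name come out in a deterministic order.
SELECT operation_id, object_type, object_name, node_group_id, policy_type
FROM pg_dist_cleanup WHERE operation_id = 777
ORDER BY object_name, node_group_id;
```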
2023-11-01 14:08:51 +00:00
Jelte Fennema-Nio c83c556702
Fix flaky isolation_master_update_node (#7303)
Sometimes in CI isolation_master_update_node fails like this:

```diff
 ------------------

 (1 row)

 step s2-abort: ABORT;
 step s1-abort: ABORT;
 FATAL:  terminating connection due to administrator command
 FATAL:  terminating connection due to administrator command
 SSL connection has been closed unexpectedly
+server closed the connection unexpectedly

 master_remove_node
 ------------------

```

This just seems like a random extra error line. The only way to
reasonably fix this is by adding an extra expected output file, so
that's what this PR does.
2023-11-01 16:44:45 +03:00
Jelte Fennema-Nio 2bccb58157
Run github actions on main (#7292)
We want the nice looking green checkmark on our main branch too.

This PR includes running on pushes to release branches too, but that
won't come into effect until we have release branches with this
workflow file.
2023-11-01 13:12:20 +01:00
Jelte Fennema-Nio 0d83ab57de
Fix flaky multi_cluster_management (#7295)
One of our flakiest and most annoying tests is
multi_cluster_management. It usually fails like this:
```diff
 SELECT citus_disable_node('localhost', :worker_2_port);
  citus_disable_node
 --------------------

 (1 row)

 SELECT public.wait_until_metadata_sync(60000);
+WARNING:  waiting for metadata sync timed out
  wait_until_metadata_sync
 --------------------------

 (1 row)

```

This tries to address that by hardening wait_until_metadata_sync. I
believe the reason for this warning is a race condition in
wait_until_metadata_sync: it's possible for the pre-check to fail, then
have the maintenance daemon send a notification, and only then have the
backend start to listen. I tried to fix it in two ways (sketched
conceptually after the list):
1. First run LISTEN, and only then do the pre-check.
2. If we time out, check again just to make sure that we did not miss
   the notification somehow, and don't show a warning if all metadata is
   synced after the timeout.
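
Conceptually, fix 1 reorders the subscribe and the check.
wait_until_metadata_sync itself is implemented in C, so this SQL is only
an illustration (the channel name is hypothetical):

```sql
-- Conceptual sketch: subscribe first, so a notification sent between
-- the pre-check and the LISTEN cannot be lost.
LISTEN metadata_sync;
SELECT bool_and(metadatasynced) FROM pg_dist_node WHERE hasmetadata;
-- ...then wait for the notification, re-checking once more on timeout.
```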

It's hard to know for sure that this fixes it because the test is not
repeatable and I could not reproduce it locally. Let's just hope for the
best.

---------

Co-authored-by: Onur Tirtir <onurcantirtir@gmail.com>
2023-11-01 10:46:01 +00:00
Jelte Fennema-Nio 20ae42e7fa
Fix flaky multi_reference_table test (#7294)
Sometimes multi_reference_table failed in CI like this:

```diff
 \c - - - :master_port
 DROP INDEX reference_schema.reference_index_2;
 \c - - - :worker_1_port
 SELECT "Column", "Type", "Modifiers" FROM table_desc WHERE relid='reference_schema.reference_table_ddl_1250019'::regclass;
- Column  |            Type             |  Modifiers
----------------------------------------------------------------------
- value_2 | double precision            | default 25.0
- value_3 | text                        | not null
- value_4 | timestamp without time zone |
- value_5 | double precision            |
-(4 rows)
-
+ERROR:  schema "citus_local_table_queries" does not exist
 \di reference_schema.reference_index_2*
           List of relations
  Schema | Name | Type | Owner | Table
```

Source:
https://github.com/citusdata/citus/actions/runs/6707535961/attempts/2#summary-18226879513

Reading from table_desc apparently has an issue: if the schema of one
of the items gets deleted while the view is being read, we get such an
error.

This change fixes that by no longer running multi_reference_table in
parallel with citus_local_tables_queries.
2023-11-01 10:12:06 +00:00
Cédric Villemain 37415ef8f5
Allow citus_*_size on index related to a distributed table (#7271)
I just enhanced the existing code to check whether the relation is an
index belonging to a distributed table.
If so, the shardId is appended to the relation (index) name and the
*_size functions are executed as before.

There is a change in an extern function:
  `extern StringInfo GenerateSizeQueryOnMultiplePlacements(...)`
It's possible to create a new function and deprecate this one later if
compatibility is an issue.

Fixes https://github.com/citusdata/citus/issues/6496.

DESCRIPTION: Allows using Citus size functions on distributed table
indexes.
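
Hypothetical usage enabled by this change (the index name is
illustrative):

```sql
-- Citus size functions now accept an index on a distributed table, not
-- just the table itself.
SELECT citus_relation_size('my_dist_table_pkey');
SELECT citus_total_relation_size('my_dist_table_pkey');
```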

---------

Co-authored-by: Onur Tirtir <onurcantirtir@gmail.com>
2023-11-01 09:05:51 +00:00
Jelte Fennema-Nio a76a832553
Fix flaky validate_constraint test (#7293)
Sometimes the validate_constraint test would fail like this:

```diff
  validatable_constraint_8000016 | t
 (10 rows)

 DROP TABLE constrained_table;
+ERROR:  deadlock detected
+DETAIL:  Process 16602 waits for ShareRowExclusiveLock on relation 56258 of database 16384; blocked by process 16601.
+Process 16601 waits for AccessShareLock on relation 56120 of database 16384; blocked by process 16602.
+HINT:  See server log for query details.
 DROP TABLE referenced_table CASCADE;
 DROP TABLE referencing_table;
 DROP SCHEMA validate_constraint CASCADE;
-NOTICE:  drop cascades to 3 other objects
+NOTICE:  drop cascades to 4 other objects
 DETAIL:  drop cascades to type constraint_validity
 drop cascades to view constraint_validations_in_workers
 drop cascades to view constraint_validations
+drop cascades to table constrained_table
 SET search_path TO DEFAULT;

```

Source:
https://github.com/citusdata/citus/actions/runs/6708383699?pr=7291

This change fixes that by not running together with the
foreign_key_to_reference_table test anymore. In passing it also
simplifies dropping the test's resources.
2023-11-01 09:41:28 +01:00
Jelte Fennema-Nio 81aa660b31
Fix flaky test detection (#7291)
PR #7289 broke flaky test detection. This fixes that.
2023-10-31 15:59:16 +00:00
Gokhan Gulbiz ce58c04304
Disable CircleCI (#7276)
We are switching to GitHub Actions. During the test period it has worked well enough, so now we can stop using CircleCI.
2023-10-31 16:00:10 +01:00
Jelte Fennema-Nio 83e3fb817d
Only put major Postgres version in CI task name (#7289)
Making tasks in CI required before merging to master is important and
useful. The way this works is by saving the exact names of the required
tasks in the admin interface of the repo. It has a search box to add
them so it's not completely horrible, but doing so is quite a hassle
since we have so many jobs. So limiting the amount of churn in this list
of required jobs is quite useful.

This changes the names of tasks to only include the major versions of
Postgres, not the minor ones. Otherwise the next time we bump the minor
versions we would have to remove and re-add each of the jobs.
2023-10-31 14:05:09 +01:00
Emel Şimşek ee8f4bb7e8
Start Maintenance Daemon for Main DB at the server start. (#7254)
DESCRIPTION: This change starts a maintenance daemon at the time of
server start if there is a designated main database.

This is the code flow:

1. User designates a main database:
   `ALTER SYSTEM SET citus.main_db = 'myadmindb';`

2. When the postmaster starts, in `_PG_init`, Citus calls
   `InitializeMaintenanceDaemonForMainDb`.

   This function registers a background worker to run
   `CitusMaintenanceDaemonMain` with `databaseOid = 0`.

3. `CitusMaintenanceDaemonMain` takes some special actions when
databaseOid is 0:
     - Gets the citus.main_db value.
     - Connects to the citus.main_db.
     - Now that `MyDatabaseId` is available, creates a hash entry for it.
     - Then follows the same control flow as for a regular db.
2023-10-30 09:44:13 +03:00
Nils Dijk d0b093c975
automatically add a breakpoint that breaks on postgres errors (#7279)
When debugging postgres it is quite hard to get to the source for
`errfinish` in `elog.c`. Instead of relying on the developer to set a
breakpoint in the `elog.c` file for `errfinish` for `elevel == ERROR`,
this change adds the breakpoint to `.gdbinit`. This makes sure that
whenever a debugger is attached to a postgres backend it will break on
postgres errors.

When attaching the debugger a small banner is printed that explains how
to disable the breakpoint.
2023-10-27 16:57:51 +02:00
Benjamin O f9218d9780
Support replacing IPv6 Loopback in `normalize.sed` (#7269)
I had a test failure due to my machine using the IPv6 loopback
address. This change to `normalize.sed` solves that issue.
2023-10-27 16:42:55 +02:00
Gokhan Gulbiz 2bf1472c8e
Move GHA environment variables to workflow file (#7275)
Since GHA does not interpolate env variables in a matrix context, this
PR defines them in a separate job and uses them in other jobs.
2023-10-26 14:54:58 +03:00
Naisila Puka 10198b18e8
Technical readme small fixes (#7261) 2023-10-23 13:43:43 +03:00