Commit Graph

6788 Commits (pg16_grant_inherit_set)

Author SHA1 Message Date
Naisila Puka 30b397840b Fix WITH ADMIN FALSE propagation (#7191) 2023-11-13 13:51:21 -05:00
Naisila Puka 0fb471e7c5 PG16 - Add reindex database/system tests (#7167)
In PG16, REINDEX DATABASE/SYSTEM name is optional.
We already don't propagate these commands automatically.
Testing here with run_command_on_workers.

Relevant PG commit:
https://github.com/postgres/postgres/commit/2cbc3c1
2023-11-13 13:51:21 -05:00
Jodi-Ann Francis 79f27b6761 Review changes for pg16 update GRANT and REVOKE 2023-11-13 13:51:19 -05:00
Jodi-Ann Francis 712d15dc20 Review changes 2023-11-13 13:51:05 -05:00
Naisila Puka 9da1a70072 Fix WITH ADMIN FALSE propagation (#7191) 2023-11-13 13:50:52 -05:00
Naisila Puka 0d1e973793 PG16 - Add reindex database/system tests (#7167)
In PG16, REINDEX DATABASE/SYSTEM name is optional.
We already don't propagate these commands automatically.
Testing here with run_command_on_workers.

Relevant PG commit:
https://github.com/postgres/postgres/commit/2cbc3c1
2023-11-13 13:50:10 -05:00
Jodi-Ann Francis d7761b3158 Review changes for pg16 update GRANT and REVOKE 2023-11-13 13:48:55 -05:00
Jodi-Ann Francis 62e0d0d01b Review changes 2023-11-13 13:47:59 -05:00
Naisila Puka a212a9d0cc Fix WITH ADMIN FALSE propagation (#7191) 2023-11-13 13:47:27 -05:00
Naisila Puka 645a0d1227 PG16 - Add reindex database/system tests (#7167)
In PG16, REINDEX DATABASE/SYSTEM name is optional.
We already don't propagate these commands automatically.
Testing here with run_command_on_workers.

Relevant PG commit:
https://github.com/postgres/postgres/commit/2cbc3c1
2023-11-13 13:46:00 -05:00
Jodi-Ann Francis c6d5240af7 Review changes for pg16 update GRANT and REVOKE 2023-11-13 13:32:07 -05:00
Jodi-Ann Francis 587718aeae Review changes 2023-11-13 13:32:06 -05:00
Naisila Puka 29dbabbcf3 Fix WITH ADMIN FALSE propagation (#7191) 2023-11-13 13:31:08 -05:00
Naisila Puka b967ee0935 PG16 - Add reindex database/system tests (#7167)
In PG16, REINDEX DATABASE/SYSTEM name is optional.
We already don't propagate these commands automatically.
Testing here with run_command_on_workers.

Relevant PG commit:
https://github.com/postgres/postgres/commit/2cbc3c1
2023-11-13 13:30:53 -05:00
Jodi-Ann Francis ff3cabb1cb Review changes for pg16 update GRANT and REVOKE 2023-11-13 13:30:51 -05:00
Jodi-Ann Francis 816fb85333 Review changes 2023-11-13 13:30:14 -05:00
Naisila Puka 7a205f2dd2 Bump PG versions to latest minors 14.10, 15.5, 16.1 (#7336)
Postgres got minor updates on Nov9, this starts using the images with
the latest version for our tests, namely 14.10, 15.5 and 16.1.
These minor updates were compatible with Citus.
Sister PR: https://github.com/citusdata/the-process/pull/152
2023-11-13 13:29:11 -05:00
Onur Tirtir cc23fd8068 Support role commands from any node (#7278)
DESCRIPTION: Adds support from issuing role management commands from worker nodes

It's unlikely to get into a distributed deadlock with role commands, we
don't care much about them at the moment.
There were several attempts to reduce the chances of a deadlock but we
didn't any of them merged into main branch yet, see:
#7325
#7016
#7009
2023-11-13 13:29:11 -05:00
Naisila Puka 56425ca6c5 Fix VACUUM flakiness in multi_utilities (#7334)
When I run this test in my local, the size of the table after the DELETE
command is around 58785792. Hence, I assume that the diffs suggest that
the Vacuum had no effect. The current solution is to run the VACUUM
command three times instead of once.

Example diff:
https://github.com/citusdata/citus/actions/runs/6722231142/attempts/1#summary-18269870674
```diff
insert into local_vacuum_table select i from generate_series(1,1000000) i;
 delete from local_vacuum_table;
 VACUUM local_vacuum_table;
 SELECT CASE WHEN s BETWEEN 20000000 AND 25000000 THEN 22500000 ELSE s END
 FROM pg_total_relation_size('local_vacuum_table') s ;
     s     
 ----------
- 22500000
+ 58785792
 (1 row)
```
See more diff examples in the PR description
https://github.com/citusdata/citus/pull/7334
2023-11-13 13:29:11 -05:00
dependabot[bot] e19036a124 Bump werkzeug from 2.3.7 to 3.0.1 in /.devcontainer/src/test/regress
Bumps [werkzeug](https://github.com/pallets/werkzeug) from 2.3.7 to 3.0.1.
- [Release notes](https://github.com/pallets/werkzeug/releases)
- [Changelog](https://github.com/pallets/werkzeug/blob/main/CHANGES.rst)
- [Commits](https://github.com/pallets/werkzeug/compare/2.3.7...3.0.1)

---
updated-dependencies:
- dependency-name: werkzeug
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-11-13 13:29:11 -05:00
dependabot[bot] fffb7839e5 Bump werkzeug from 2.3.7 to 3.0.1 in /src/test/regress
Bumps [werkzeug](https://github.com/pallets/werkzeug) from 2.3.7 to 3.0.1.
- [Release notes](https://github.com/pallets/werkzeug/releases)
- [Changelog](https://github.com/pallets/werkzeug/blob/main/CHANGES.rst)
- [Commits](https://github.com/pallets/werkzeug/compare/2.3.7...3.0.1)

---
updated-dependencies:
- dependency-name: werkzeug
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-11-13 13:29:11 -05:00
Nils Dijk c8b1000450 move pg_version_constants.h to toplevel include (#7335)
In preparation of sorting and grouping all includes we wanted to move
this file to the toplevel includes for good grouping/sorting.
2023-11-13 13:29:11 -05:00
Hanefi Onaldi e8eb0cf9a1 Add changelog entries for 12.1.1 (#7332)
Co-authored-by: Onur Tirtir <onurcantirtir@gmail.com>
2023-11-13 13:29:11 -05:00
Naisila Puka d515b37275 Fix flaky multi_mx_insert_select_repartition test (#7331)
https://github.com/citusdata/citus/actions/runs/6745019678/attempts/1#summary-18336188930
```diff
     insert into target_table SELECT a*2 FROM source_table RETURNING a;
-NOTICE:  executing the command locally: SELECT bytes FROM fetch_intermediate_results(ARRAY['repartitioned_results_xxxxx_from_4213582_to_0','repartitioned_results_xxxxx_from_4213584_to_0']::text[],'localhost',57638) bytes
+NOTICE:  executing the command locally: SELECT bytes FROM fetch_intermediate_results(ARRAY['repartitioned_results_3940758121873413_from_4213584_to_0','repartitioned_results_3940758121873413_from_4213582_to_0']::text[],'localhost',57638) bytes
```

The elements in the array passed to `fetch_intermediate_results` are the
same, but in the opposite order than expected.

To fix this flakiness, we can omit the `"SELECT bytes FROM
fetch_intermediate_results..."` line. From the following logs, it is
understandable that the intermediate results have been fetched.
2023-11-13 13:29:11 -05:00
Onur Tirtir 319484e44a Remove useless variables (#7327)
To fix warnings observed when using different compiler versions.
2023-11-13 13:29:11 -05:00
cvbhjkl e8e0b834c0 Fix typo in local_executor.c (#7324)
Fix a typo 'remaning' -> 'remaining' in local_executor.c
2023-11-13 13:29:11 -05:00
Onur Tirtir d42c145dab Fix flaky isolation_get_all_active_transactions.spec test (#7323)
Fix the flaky test that results in following diff by waiting until the
backend that we want to terminate really terminates, until 5secs.

```diff
--- /__w/citus/citus/src/test/regress/expected/isolation_get_all_active_transactions.out.modified	2023-11-01 16:30:57.648749795 +0000
+++ /__w/citus/citus/src/test/regress/results/isolation_get_all_active_transactions.out.modified	2023-11-01 16:30:57.656749877 +0000
@@ -114,13 +114,13 @@
 --------------------
 t                   
 (1 row)
 
 step s3-show-activity: 
  SET ROLE postgres;
  select count(*) from get_all_active_transactions() where process_id IN (SELECT * FROM selected_pid);
 
 count
 -----
-    0
+    1
 (1 row)
```
2023-11-13 13:29:11 -05:00
Onur Tirtir 8cc9fb3025 Make some more tests re-runable (#7322)
* multi_mx_create_table
* multi_mx_function_table_reference
* multi_mx_add_coordinator
* create_role_propagation
* metadata_sync_helpers
* text_search

https://github.com/citusdata/citus/pull/7278 requires this.
2023-11-13 13:29:11 -05:00
Jelte Fennema-Nio 1a2f02216c Fix flaky multi_alter_table_statements (#7321)
Sometimes multi_alter_table_statements would fail in CI like this:

```diff
 -- Verify that DROP NOT NULL works
 ALTER TABLE lineitem_alter ALTER COLUMN int_column2 DROP NOT NULL;
 SELECT "Column", "Type", "Modifiers" FROM table_desc WHERE relid='lineitem_alter'::regclass;
-     Column      |         Type          | Modifiers
----------------------------------------------------------------------
- l_orderkey      | bigint                | not null
- l_partkey       | integer               | not null
- l_suppkey       | integer               | not null
- l_linenumber    | integer               | not null
- l_quantity      | numeric(15,2)         | not null
- l_extendedprice | numeric(15,2)         | not null
- l_discount      | numeric(15,2)         | not null
- l_tax           | numeric(15,2)         | not null
- l_returnflag    | character(1)          | not null
- l_linestatus    | character(1)          | not null
- l_shipdate      | date                  | not null
- l_commitdate    | date                  | not null
- l_receiptdate   | date                  | not null
- l_shipinstruct  | character(25)         | not null
- l_shipmode      | character(10)         | not null
- l_comment       | character varying(44) | not null
- float_column    | double precision      | default 1
- date_column     | date                  |
- int_column1     | integer               |
- int_column2     | integer               |
- null_column     | integer               |
-(21 rows)
-
+ERROR:  schema "alter_table_add_column" does not exist
 -- COPY should succeed now
 SELECT master_create_empty_shard('lineitem_alter') as shardid \gset
 ```

Reading from table_desc apparantly has an issue that if the schema gets
deleted from one of the items, while it is being read that we get such
an error.

This change fixes that by not running multi_alter_table_statements in parallel
with alter_table_add_column anymore.

This is another instance of the same issue as in #7294
2023-11-13 13:29:11 -05:00
Jelte Fennema-Nio 7539fa3403 Fix flaky failure_distributed_results (#7307)
Sometimes in CI we run into this failure:

```diff
   SELECT resultId, nodeport, rowcount, targetShardId, targetShardIndex
   FROM partition_task_list_results('test', $$ SELECT * FROM source_table $$, 'target_table')
           NATURAL JOIN pg_dist_node;
-WARNING:  connection to the remote node localhost:xxxxx failed with the following error: connection not open
+ERROR:  connection to the remote node localhost:9060 failed with the following error: connection not open
 SELECT * FROM distributed_result_info ORDER BY resultId;
-       resultid        | nodeport | rowcount | targetshardid | targetshardindex
----------------------------------------------------------------------
- test_from_100800_to_0 |     9060 |       22 |        100805 |                0
- test_from_100801_to_0 |    57637 |        2 |        100805 |                0
- test_from_100801_to_1 |    57637 |       15 |        100806 |                1
- test_from_100802_to_1 |    57637 |       10 |        100806 |                1
- test_from_100802_to_2 |    57637 |        5 |        100807 |                2
- test_from_100803_to_2 |    57637 |       18 |        100807 |                2
- test_from_100803_to_3 |    57637 |        4 |        100808 |                3
- test_from_100804_to_3 |     9060 |       24 |        100808 |                3
-(8 rows)
-
+ERROR:  current transaction is aborted, commands ignored until end of transaction block
 -- fetch from worker 2 should fail
 SAVEPOINT s1;
+ERROR:  current transaction is aborted, commands ignored until end of transaction block
 SELECT fetch_intermediate_results('{test_from_100802_to_1,test_from_100802_to_2}'::text[], 'localhost', :worker_2_port) > 0 AS fetched;
-ERROR:  could not open file "base/pgsql_job_cache/xx_x_xxx/test_from_100802_to_1.data": No such file or directory
-CONTEXT:  while executing command on localhost:xxxxx
+ERROR:  current transaction is aborted, commands ignored until end of transaction block
 ROLLBACK TO SAVEPOINT s1;
+ERROR:  savepoint "s1" does not exist
 -- fetch from worker 1 should succeed
 SELECT fetch_intermediate_results('{test_from_100802_to_1,test_from_100802_to_2}'::text[], 'localhost', :worker_1_port) > 0 AS fetched;
- fetched
----------------------------------------------------------------------
- t
-(1 row)
-
+ERROR:  current transaction is aborted, commands ignored until end of transaction block
 -- make sure the results read are same as the previous transaction block
 SELECT count(*), sum(x) FROM
   read_intermediate_results('{test_from_100802_to_1,test_from_100802_to_2}'::text[],'binary') AS res (x int);
- count | sum
----------------------------------------------------------------------
-    15 | 863
-(1 row)
-
+ERROR:  current transaction is aborted, commands ignored until end of transaction block
 ROLLBACk;
```

As outlined in the #7306 I created, the reason for this is related to
only having a single connection open to the node. Finding and fixing the
full cause is not trivial, so instead this PR starts working around
this bug by forcing maximum parallelism. Preferably we'd want
this workaround not to be necessary, but that requires
spending time to fix this. For now having a less flaky CI is
good enough.
2023-11-13 13:29:11 -05:00
Jelte Fennema-Nio a953705017 Fix flaky insert_select_connection_leak (#7302)
Sometimes in CI insert_select_connection_leak would fail like this:

```diff
 END;
 SELECT worker_connection_count(:worker_1_port) - :pre_xact_worker_1_connections AS leaked_worker_1_connections,
        worker_connection_count(:worker_2_port) - :pre_xact_worker_2_connections AS leaked_worker_2_connections;
  leaked_worker_1_connections | leaked_worker_2_connections
 -----------------------------+-----------------------------
-                           0 |                           0
+                          -1 |                           0
 (1 row)

 -- ROLLBACK
 BEGIN;
 INSERT INTO target_table SELECT * FROM source_table;
 INSERT INTO target_table SELECT * FROM source_table;
 ROLLBACK;
 SELECT worker_connection_count(:worker_1_port) - :pre_xact_worker_1_connections AS leaked_worker_1_connections,
        worker_connection_count(:worker_2_port) - :pre_xact_worker_2_connections AS leaked_worker_2_connections;
  leaked_worker_1_connections | leaked_worker_2_connections
 -----------------------------+-----------------------------
-                           0 |                           0
+                          -1 |                           0
 (1 row)

 \set VERBOSITY TERSE
 -- Error on constraint failure
 BEGIN;
 INSERT INTO target_table SELECT * FROM source_table;
 SELECT worker_connection_count(:worker_1_port) AS worker_1_connections,
        worker_connection_count(:worker_2_port) AS worker_2_connections \gset
 SAVEPOINT s1;
 INSERT INTO target_table SELECT a, CASE WHEN a < 50 THEN b ELSE null END  FROM source_table;
@@ -89,15 +89,15 @@
  leaked_worker_1_connections | leaked_worker_2_connections
 -----------------------------+-----------------------------
                            0 |                           0
 (1 row)

 END;
 SELECT worker_connection_count(:worker_1_port) - :pre_xact_worker_1_connections AS leaked_worker_1_connections,
        worker_connection_count(:worker_2_port) - :pre_xact_worker_2_connections AS leaked_worker_2_connections;
  leaked_worker_1_connections | leaked_worker_2_connections
 -----------------------------+-----------------------------
-                           0 |                           0
+                          -1 |                           0
 (1 row)
```

Source:
https://github.com/citusdata/citus/actions/runs/6718401194/attempts/1#summary-18258258387

A negative amount of leaked connectios is obviously not possible. For
some reason there was a connection open when we checked the initial
amount of connections that was closed afterwards. This could be the
from the maintenance daemon or maybe from the previous test that had not
fully closed its connections just yet.

The change in this PR doesnt't actually fix the cause of the negative
connection, but it simply considers it good as well, by changing the
result to zero for negative values.

With this fix we might sometimes miss a leak, because the negative
number can cancel out the leak and still result in a 0. But since the
negative number only occurs sometimes, we'll still find the leak often
enough.
2023-11-13 13:29:11 -05:00
Cédric Villemain e10b62a6a3 Fix #7242, CALL(@0) crash backend (#7288)
When executing a prepared CALL, which is not pure SQL but available with
some drivers like npgsql and jpgdbc, Citus entered a code path where a
plan is not defined, while trying to increase its cost. Thus SIG11 when
plan is a NULL pointer.

Fix by only increasing plan cost when plan is not null.

However, it is a bit suspicious to get here with a NULL plan and maybe a
better change will be to not call
ShardPlacementForFunctionColocatedWithDistTable() with a NULL plan at
all (in call.c:134)

bug hit with for example:
```
CallableStatement proc = con.prepareCall("{CALL p(?)}");
proc.registerOutParameter(1, java.sql.Types.BIGINT);
proc.setInt(1, -100);
proc.execute();
```

where `p(bigint)` is a distributed "function" and the param the
distribution key (also in a distributed table), see #7242 for details

Fixes #7242
2023-11-13 13:29:11 -05:00
Jelte Fennema-Nio c9494c682f Debug flaky logical_replication test (#7309)
Sometimes in CI our logical_replication test fails like this:

```diff
+++ /__w/citus/citus/src/test/regress/results/logical_replication.out.modified	2023-11-01 14:15:08.562758546 +0000
@@ -40,21 +40,21 @@

 SELECT count(*) from pg_publication;
  count
 -------
      0
 (1 row)

 SELECT count(*) from pg_replication_slots;
  count
 -------
-     0
+     1
 (1 row)

 SELECT count(*) FROM dist;
  count
 -------
```

It's hard to understand what is going on here, just based on the wrong
number. So this PR changes the test to show the name of the
subscription, publication and replication slot to make finding the cause
easier.

In passing this also fixes another flaky test in the same file that our
flaky test detection picked up. This is done by waiting for resource
cleanup after the shard move.
2023-11-13 13:29:11 -05:00
Jelte Fennema-Nio a1ca8b9dc3 Do not download all artifacts for flaky test detection (#7320)
This is causing 404 failures due to a race condition:
https://github.com/actions/toolkit/issues/1235

It also makes the tests take unnecessarily long.

This was tested by changing a test file and seeing that the flaky test
detection was still working.
2023-11-13 13:29:11 -05:00
Onur Tirtir 99551bd7aa Fix flaky multi_mx_node_metadata.sql test (#7317)
Fixes the flaky test that results in following diff:
```diff
--- /__w/citus/citus/src/test/regress/expected/multi_mx_node_metadata.out.modified	2023-11-01 14:22:12.890476575 +0000
+++ /__w/citus/citus/src/test/regress/results/multi_mx_node_metadata.out.modified	2023-11-01 14:22:12.914476657 +0000
@@ -840,24 +840,26 @@
 (1 row)
 
 \c :datname - - :master_port
 SELECT datname FROM pg_stat_activity WHERE application_name LIKE 'Citus Met%';
   datname   
 ------------
  db_to_drop
 (1 row)
 
 DROP DATABASE db_to_drop;
+ERROR:  database "db_to_drop" is being accessed by other users
 SELECT datname FROM pg_stat_activity WHERE application_name LIKE 'Citus Met%';
   datname   
 ------------
-(0 rows)
+ db_to_drop
+(1 row)
 
 -- cleanup
 DROP SEQUENCE sequence CASCADE;
 NOTICE:  drop cascades to default value for column a of table reference_table
```
2023-11-13 13:29:11 -05:00
Gürkan İndibay 341e9393a4 Enriches statement propagation document (#7267)
Co-authored-by: Onur Tirtir <onurcantirtir@gmail.com>
Co-authored-by: Hanefi Onaldi <Hanefi.Onaldi@microsoft.com>
Co-authored-by: Jelte Fennema-Nio <jelte.fennema@microsoft.com>
2023-11-13 13:29:11 -05:00
Jelte Fennema-Nio 7cc1354617 Fix flaky isolation_metadata_sync_deadlock (#7312)
Sometimes isolation_metadata_sync_deadlock fails in CI like this:

```diff
diff -dU10 -w /__w/citus/citus/src/test/regress/expected/isolation_metadata_sync_deadlock.out /__w/citus/citus/src/test/regress/results/isolation_metadata_sync_deadlock.out
--- /__w/citus/citus/src/test/regress/expected/isolation_metadata_sync_deadlock.out.modified	2023-11-01 16:03:15.090199229 +0000
+++ /__w/citus/citus/src/test/regress/results/isolation_metadata_sync_deadlock.out.modified	2023-11-01 16:03:15.098199312 +0000
@@ -110,10 +110,14 @@
 t
 (1 row)

 step s2-stop-connection:
  SELECT stop_session_level_connection_to_node();

 stop_session_level_connection_to_node
 -------------------------------------

 (1 row)
+
+teardown failed: ERROR:  localhost:57638 is a metadata node, but is out of sync
+HINT:  If the node is up, wait until metadata gets synced to it and try again.
+CONTEXT:  SQL statement "SELECT master_remove_distributed_table_metadata_from_workers(v_obj.objid, v_obj.schema_name, v_obj.object_name)"
```

Source:
https://github.com/citusdata/citus/actions/runs/6721938040/attempts/1#summary-18268946448

To fix this we now wait for the metadata to be fully synced to all
nodes at the start of the teardown steps.
2023-11-13 13:29:11 -05:00
Jelte Fennema-Nio 22c8c17f70 Prepare github actions pipelines for merge queue (#7315)
Github has a built in merge queue. I think it would be good to try this
out, to speed up merging PRs when multiple people want to merge at the
same time. This PR does not enable it yet, but it starts triggering
Github actions also for the `merge_queue` event. This is a requirement
for trying them out.

Announcment:
https://github.blog/2023-07-12-github-merge-queue-is-generally-available/
Docs:
https://docs.github.com/en/repositories/configuring-branches-and-merges-in-your-repository/configuring-pull-request-merges/managing-a-merge-queue
2023-11-13 13:29:11 -05:00
Onur Tirtir b2338b1a6c Fix flaky global_cancel.sql test (#7316) 2023-11-13 13:29:11 -05:00
Jelte Fennema-Nio 14ed67f38d Fix flaky citus_non_blocking_split_shard_cleanup (#7311)
Sometimes in CI citus_non_blocking_split_shard_cleanup failed like this:

```diff
--- /__w/citus/citus/src/test/regress/expected/citus_non_blocking_split_shard_cleanup.out.modified	2023-11-01 15:07:14.280551207 +0000
+++ /__w/citus/citus/src/test/regress/results/citus_non_blocking_split_shard_cleanup.out.modified	2023-11-01 15:07:14.292551358 +0000
@@ -106,21 +106,22 @@
 -----------------------------------

 (1 row)

 \c - - - :worker_2_port
 SET search_path TO "citus_split_test_schema";
 -- Replication slots should be cleaned up
 SELECT slot_name FROM pg_replication_slots;
             slot_name
 ---------------------------------
-(0 rows)
+ citus_shard_split_slot_19_10_17
+(1 row)

 -- Publications should be cleanedup
 SELECT count(*) FROM pg_publication;
  count
```

It's expected that the replication slot is sometimes not cleaned up if
we don't wait until resource cleanup completes. This PR starts doing
that here.
2023-11-13 13:29:11 -05:00
Gürkan İndibay 4f9d525ab6 Removes use-base-schedule flag from CI (#7301)
Normally, tests which are written non-dependent to other tests can use
minimal-tests and should use as well. However, in our test settings
base-schedule is being used which may cause unnecessary dependencies and
so unrelated errors that developers don't see in their local environment
With this change, default setting will be minimal, so that tests will be
free of unnecessary dependencies.
2023-11-13 13:29:11 -05:00
Jelte Fennema-Nio d708f13b5c Fix flaky failure_split_cleanup (#7299)
Sometimes failure_split_cleanup failed in CI like this:

```diff
 ERROR:  server closed the connection unexpectedly
 CONTEXT:  while executing command on localhost:9060
     SELECT operation_id, object_type, object_name, node_group_id, policy_type
     FROM pg_dist_cleanup where operation_id = 777 ORDER BY object_name;
  operation_id | object_type |                        object_name                        | node_group_id | policy_type
 --------------+-------------+-----------------------------------------------------------+---------------+-------------
           777 |           1 | citus_failure_split_cleanup_schema.table_to_split_8981000 |             1 |           0
-          777 |           1 | citus_failure_split_cleanup_schema.table_to_split_8981002 |             1 |           1
           777 |           1 | citus_failure_split_cleanup_schema.table_to_split_8981002 |             2 |           0
+          777 |           1 | citus_failure_split_cleanup_schema.table_to_split_8981002 |             1 |           1
           777 |           1 | citus_failure_split_cleanup_schema.table_to_split_8981003 |             2 |           1
           777 |           4 | citus_shard_split_publication_1_10_777                    |             2 |           0
 (5 rows)

     -- we need to allow connection so that we can connect to proxy
```

Source:
https://github.com/citusdata/citus/actions/runs/6717642291/attempts/1#summary-18256014949

It's the common problem where we're missing a column in the ORDER BY
clause. This fixes that by adding an node_group_id to the query in
question.
2023-11-13 13:29:11 -05:00
Jelte Fennema-Nio e0ed54141b Fix flaky isolation_master_update_node (#7303)
Sometimes in CI isolation_master_update_node fails like this:

```diff
 ------------------

 (1 row)

 step s2-abort: ABORT;
 step s1-abort: ABORT;
 FATAL:  terminating connection due to administrator command
 FATAL:  terminating connection due to administrator command
 SSL connection has been closed unexpectedly
+server closed the connection unexpectedly

 master_remove_node
 ------------------

```

This just seesm like a random error line. The only way to reasonably fix
this is by adding an extra output file. So that's what this PR does.
2023-11-13 13:29:11 -05:00
Jelte Fennema-Nio 171cc08fb2 Run github actions on main (#7292)
We want the nice looking green checkmark on our main branch too.

This PR includes running on pushes to release branches too, but that
won't come into effect until we have release branches with this
workflow file.
2023-11-13 13:29:11 -05:00
Jelte Fennema-Nio 1e283bee4c Fix flaky multi_cluster_management (#7295)
One of our most flaky and most anoying tests is
multi_cluster_management. It usually fails like this:
```diff
 SELECT citus_disable_node('localhost', :worker_2_port);
  citus_disable_node
 --------------------

 (1 row)

 SELECT public.wait_until_metadata_sync(60000);
+WARNING:  waiting for metadata sync timed out
  wait_until_metadata_sync
 --------------------------

 (1 row)

```

This tries to address that by hardening wait_until_metadata_sync. I
believe the reason for this warning is that there is a race condition in
wait_until_metadata_sync. It's possible for the pre-check to fail, then
have the maintenance daemon send a notification. And only then have the
backend start to listen. I tried to fix it in two ways:
1. First run LISTEN, and only then read do the pre-check.
2. If we time out, check again just to make sure that we did not miss
   the notification somehow. And don't show a warning if all metadata is
   synced after the timeout.

It's hard to know for sure that this fixes it because the test is not
repeatable and I could not reproduce it locally. Let's just hope for the
best.

---------

Co-authored-by: Onur Tirtir <onurcantirtir@gmail.com>
2023-11-13 13:29:11 -05:00
Jelte Fennema-Nio a1a140eeeb Fix flaky multi_reference_table test (#7294)
Sometimes multi_reference_table failed in CI like this:

```diff
 \c - - - :master_port
 DROP INDEX reference_schema.reference_index_2;
 \c - - - :worker_1_port
 SELECT "Column", "Type", "Modifiers" FROM table_desc WHERE relid='reference_schema.reference_table_ddl_1250019'::regclass;
- Column  |            Type             |  Modifiers
----------------------------------------------------------------------
- value_2 | double precision            | default 25.0
- value_3 | text                        | not null
- value_4 | timestamp without time zone |
- value_5 | double precision            |
-(4 rows)
-
+ERROR:  schema "citus_local_table_queries" does not exist
 \di reference_schema.reference_index_2*
           List of relations
  Schema | Name | Type | Owner | Table
```

Source:
https://github.com/citusdata/citus/actions/runs/6707535961/attempts/2#summary-18226879513

Reading from table_desc apparantly has an issue that if the schema gets
deleted from one of the items, while it is being read that we get such
an error.

This change fixes that by not running multi_reference_table in parallel
with citus_local_tables_queries anymore.
2023-11-13 13:29:11 -05:00
Cédric Villemain 49c62b35a0 Allow citus_*_size on index related to a distributed table (#7271)
I just enhanced the existing code to check if the relation is an index
belonging to a distributed table.
If so the shardId is appended to relation (index) name and the *_size
function are executed as before.

There is a change in an extern function:
  `extern StringInfo GenerateSizeQueryOnMultiplePlacements(...)`
It's possible to create a new function and deprecate this one later if
compatibility is an issue.

Fixes https://github.com/citusdata/citus/issues/6496.

DESCRIPTION: Allows using Citus size functions on distributed tables
indexes.

---------

Co-authored-by: Onur Tirtir <onurcantirtir@gmail.com>
2023-11-13 13:29:11 -05:00
Jelte Fennema-Nio 5f3cf5a266 Fix flaky validate_constraint test (#7293)
Sometimes validate constraint would fail like this:

```diff
  validatable_constraint_8000016 | t
 (10 rows)

 DROP TABLE constrained_table;
+ERROR:  deadlock detected
+DETAIL:  Process 16602 waits for ShareRowExclusiveLock on relation 56258 of database 16384; blocked by process 16601.
+Process 16601 waits for AccessShareLock on relation 56120 of database 16384; blocked by process 16602.
+HINT:  See server log for query details.
 DROP TABLE referenced_table CASCADE;
 DROP TABLE referencing_table;
 DROP SCHEMA validate_constraint CASCADE;
-NOTICE:  drop cascades to 3 other objects
+NOTICE:  drop cascades to 4 other objects
 DETAIL:  drop cascades to type constraint_validity
 drop cascades to view constraint_validations_in_workers
 drop cascades to view constraint_validations
+drop cascades to table constrained_table
 SET search_path TO DEFAULT;

```

Source:
https://github.com/citusdata/citus/actions/runs/6708383699?pr=7291

This change fixes that by not running together with the
foreign_key_to_reference_table test anymore. In passing it also
simplifies dropping of the test its resources.
2023-11-13 13:29:11 -05:00
Jelte Fennema-Nio e205241634 Fix flaky test detection (#7291)
PR #7289 broke flaky test detction. This fixes that.
2023-11-13 13:29:11 -05:00
Gokhan Gulbiz bdf0a4effe Disable CircleCI (#7276)
We are switching to Github Actions. In the test period it has worked well enough, so now we can stop using CircleCI.
2023-11-13 13:29:11 -05:00