citus/src/backend
Jelte Fennema-Nio 0d83ab57de
Fix flaky multi_cluster_management (#7295)
One of our most flaky and most anoying tests is
multi_cluster_management. It usually fails like this:
```diff
 SELECT citus_disable_node('localhost', :worker_2_port);
  citus_disable_node
 --------------------

 (1 row)

 SELECT public.wait_until_metadata_sync(60000);
+WARNING:  waiting for metadata sync timed out
  wait_until_metadata_sync
 --------------------------

 (1 row)

```

This tries to address that by hardening wait_until_metadata_sync. I
believe the reason for this warning is that there is a race condition in
wait_until_metadata_sync. It's possible for the pre-check to fail, then
have the maintenance daemon send a notification. And only then have the
backend start to listen. I tried to fix it in two ways:
1. First run LISTEN, and only then read do the pre-check.
2. If we time out, check again just to make sure that we did not miss
   the notification somehow. And don't show a warning if all metadata is
   synced after the timeout.

It's hard to know for sure that this fixes it because the test is not
repeatable and I could not reproduce it locally. Let's just hope for the
best.

---------

Co-authored-by: Onur Tirtir <onurcantirtir@gmail.com>
2023-11-01 10:46:01 +00:00
..
columnar bump citus and columnar into 12.2devel (#7200) 2023-09-14 12:03:09 +03:00
distributed Fix flaky multi_cluster_management (#7295) 2023-11-01 10:46:01 +00:00
.gitignore Initial commit of Citus 5.0 2016-02-11 04:05:32 +02:00