Hopefully reduce flaky tests by disabling the maintenance daemon (#6252)

Sometimes our CI randomly fails on a test in a way similar to this:
```diff
 step s2-drop:
     DROP TABLE cancel_table;
-
+ <waiting ...>
+step s2-drop: <... completed>

 starting permutation: s1-timeout s1-begin s1-sleep10000 s1-rollback s1-reset s1-drop
```
Source:
https://app.circleci.com/pipelines/github/citusdata/citus/26524/workflows/5415b84f-13a3-482f-bef9-648314c79a67/jobs/756377

Another example of a failure like this:
```diff
 stop_session_level_connection_to_node
 -------------------------------------
                                      
 (1 row)
 
 step s3-display: 
  SELECT * FROM ref_table ORDER BY id, value;
  SELECT * FROM dist_table ORDER BY id, value;
-
+ <waiting ...>
+step s3-display: <... completed>
 id|value
 --+-----
 ```
Source: https://app.circleci.com/pipelines/github/citusdata/citus/26551/workflows/91dca4b2-bb1c-4cae-b2ef-ce3f9c689ce5/jobs/757781

A step that shouldn't be blocked is temporarily reported as "waiting..."
and then unblocks on its own right afterwards. I'm not certain of the
reason for this, but one explanation is that the maintenance daemon is
doing something that blocks the query. In the cases shown, my hunch is
that it could be the deferred shard deletion.

This PR disables all the features of the maintenance daemon during
isolation testing to try to prevent processes from randomly being
detected as blocked.
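
As a rough illustration of the idea, here is a minimal Perl sketch that builds the numeric "turn it off" settings in one place. The helper name `maintenance_daemon_off_options` is hypothetical and is not part of the test harness; the setting names and the `-1` values are copied from this change. `citus.enable_statistics_collection` is left out because it is an on/off switch rather than a numeric setting.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the real test script): returns the
# "name=value" strings that switch off periodic maintenance daemon work.
# Setting names and the -1 value are taken from this change.
sub maintenance_daemon_off_options {
    my @numeric_settings = qw(
        citus.distributed_deadlock_detection_factor
        citus.recover_2pc_interval
        citus.defer_shard_delete_interval
        citus.stat_statements_purge_interval
        citus.background_task_queue_interval
    );
    # Pair every setting with -1, the value this change uses to disable it.
    return map { $_ . "=-1" } @numeric_settings;
}

# Usage: append the settings to an option list, as the test script does
# with its own @pgOptions array.
my @pgOptions;
push(@pgOptions, maintenance_daemon_off_options());
print "$_\n" for @pgOptions;
```

Keeping the names in one list makes it easier to see at a glance which daemon features are switched off, though the actual change simply pushes each option separately.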

NOTE: I'm not certain that this will actually fix this issue. If the
issue persists even after this change, at least we know that it's not
the maintenance daemon that's blocking it.
Jelte Fennema 2022-10-04 13:33:57 +02:00 committed by GitHub
parent 813542dfa1
commit 5c64227223
1 changed files with 9 additions and 1 deletions


@@ -550,13 +550,21 @@ if($isolationtester)
 {
 push(@pgOptions, "citus.worker_min_messages='warning'");
 push(@pgOptions, "citus.log_distributed_deadlock_detection=on");
-push(@pgOptions, "citus.distributed_deadlock_detection_factor=-1");
 push(@pgOptions, "citus.shard_count=4");
 push(@pgOptions, "citus.metadata_sync_interval=1000");
 push(@pgOptions, "citus.metadata_sync_retry_interval=100");
 push(@pgOptions, "client_min_messages='warning'"); # pg12 introduced notice showing during isolation tests
 push(@pgOptions, "citus.running_under_isolation_test=true");
+# Disable all features of the maintenance daemon. Otherwise queries might
+# randomly show temporarily as "waiting..." because they are waiting for the
+# maintenance daemon.
+push(@pgOptions, "citus.distributed_deadlock_detection_factor=-1");
+push(@pgOptions, "citus.recover_2pc_interval=-1");
+push(@pgOptions, "citus.enable_statistics_collection=-1");
+push(@pgOptions, "citus.defer_shard_delete_interval=-1");
+push(@pgOptions, "citus.stat_statements_purge_interval=-1");
+push(@pgOptions, "citus.background_task_queue_interval=-1");
 }
 # Add externally added options last, so they overwrite the default ones above