Increase isolation timeout because of shard splits (#6213)

Recently isolation tests involving shard splits have been randomly
failing in CI with timeouts. It's possible that there's an actual bug
here, but it's also quite likely that our timeout is just slightly too
low for the combination of shard splits and the CI VM having a bad day.

Increasing the timeout is fairly low cost and allows us to find out if
there's an actual bug or if it's simply slowness. So that's what this PR
does. If it turns out to be an actual bug, we can decrease the timeout
again when we fix it.

Examples of failed tests:
1. https://app.circleci.com/pipelines/github/citusdata/citus/26241/workflows/9e0bb721-d798-481b-907c-914236b63e38/jobs/742409
2. https://app.circleci.com/pipelines/github/citusdata/citus/26171/workflows/8f352e3b-e6e4-4f7f-b0d0-2543f62a0209/jobs/739470
Jelte Fennema 2022-08-19 21:37:45 +02:00 committed by GitHub
parent 9cfadd7965
commit dfa6c26d7d
1 changed file with 7 additions and 3 deletions

@@ -16,9 +16,13 @@ MAKEFILE_DIR := $(dir $(realpath $(firstword $(MAKEFILE_LIST))))
 export PATH := $(MAKEFILE_DIR)/bin:$(PATH)
 export PG_REGRESS_DIFF_OPTS = -dU10 -w
 # Use lower isolation test timeout, the 5 minute default is waaay too long for
-# us so we use 20 seconds instead. We should detect blockages very quickly and
-# the queries we run are also very fast.
-export PGISOLATIONTIMEOUT = 20
+# us so we use 60 seconds instead. We should detect blockages very quickly and
+# most queries that we run are also very fast. So fast even that 60 seconds is
+# usually too long. However, any commands involving logical replication can be
+# quite slow, especially shard splits and especially on CI. So we still keep
+# this value at the pretty high 60 seconds because even those slow commands are
+# definitely stuck when they take longer than that.
+export PGISOLATIONTIMEOUT = 60
 ##
 ## Citus regression support
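
A hedged aside, not part of the commit itself: as the Makefile comment implies, PGISOLATIONTIMEOUT is read by PostgreSQL's isolation tester to bound how long it waits on a step before treating the run as stuck and failing the test. Because the Makefile sets the value with a plain = assignment, the easiest way to experiment with a different timeout for a one-off local run is a make command-line override, which takes precedence over the assignment while the existing export still passes the value down to the tester. The directory and the check-isolation target name below are assumptions based on the usual Citus regression test layout, not something stated in this diff.

    # Sketch: re-run the isolation suite with a much larger timeout to see
    # whether a "timed out" shard-split test is genuinely stuck or just slow.
    # (check-isolation and the path are assumed from the Citus regress setup.)
    cd src/test/regress
    make check-isolation PGISOLATIONTIMEOUT=300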