mirror of https://github.com/citusdata/citus.git
Fix flakyness in failure_online_move_shard_placement (#6250)
Sometimes in CI failure_online_move_shard_placement fails with the
following error:
```diff
SELECT citus.mitmproxy('conn.onQuery(query="^ALTER SUBSCRIPTION .* ENABLE").cancel(' || :pid || ')');
mitmproxy
-----------
(1 row)
SELECT master_move_shard_placement(101, 'localhost', :worker_1_port, 'localhost', :worker_2_proxy_port);
-ERROR: canceling statement due to user request
+ERROR: tuple concurrently updated
+CONTEXT: while executing command on localhost:9060
-- failure on polling subscription state
```
Source: https://app.circleci.com/pipelines/github/citusdata/citus/26441/workflows/dd6e3475-6121-47b3-aea3-4ac92be114f4/jobs/751476/steps
This error is not completely harmless, because based on the logs it mean
that our cleanup logic failed, which in turn means that replication
slots are left around:
```
2022-08-24 16:01:29.247 UTC [1219] ERROR: XX000: tuple concurrently updated
2022-08-24 16:01:29.247 UTC [1219] LOCATION: simple_heap_update, heapam.c:4179
2022-08-24 16:01:29.247 UTC [1219] STATEMENT: ALTER SUBSCRIPTION citus_shard_move_subscription_10 DISABLE
```
However, we have other mechanisms to clean up any leftovers in case of a
failed cleanup. So it's not that big of a problem.
The reason we run into this error is arguably because of a Postgres bug,
so I created a patch for Postgres that fixes this.
While we wait for this (or a similar) patch to be merged, this PR
disables the flaky test. There's still a test that tests in case of a
connection "kill" instead of a "cancel", so I don't think we lose very
important coverage by disabling this test. When trying to reproduce this
I only hit this issue in the cancel case, so I don't think there's a
need to disable the kill case for now.
pull/6248/head
parent
2a0c0b3ba6
commit
77dd49fcf8
|
|
@ -110,14 +110,10 @@ SELECT master_move_shard_placement(101, 'localhost', :worker_1_port, 'localhost'
|
||||||
ERROR: connection not open
|
ERROR: connection not open
|
||||||
CONTEXT: while executing command on localhost:xxxxx
|
CONTEXT: while executing command on localhost:xxxxx
|
||||||
-- failure when enabling the subscriptions
|
-- failure when enabling the subscriptions
|
||||||
SELECT citus.mitmproxy('conn.onQuery(query="^ALTER SUBSCRIPTION .* ENABLE").cancel(' || :pid || ')');
|
-- This test can be enabled again once this postgres bug is fixed:
|
||||||
mitmproxy
|
-- https://www.postgresql.org/message-id/flat/HE1PR8303MB0075BF78AF1BE904050DA16BF7729%40HE1PR8303MB0075.EURPRD83.prod.outlook.com
|
||||||
---------------------------------------------------------------------
|
-- SELECT citus.mitmproxy('conn.onQuery(query="^ALTER SUBSCRIPTION .* ENABLE").cancel(' || :pid || ')');
|
||||||
|
-- SELECT master_move_shard_placement(101, 'localhost', :worker_1_port, 'localhost', :worker_2_proxy_port);
|
||||||
(1 row)
|
|
||||||
|
|
||||||
SELECT master_move_shard_placement(101, 'localhost', :worker_1_port, 'localhost', :worker_2_proxy_port);
|
|
||||||
ERROR: canceling statement due to user request
|
|
||||||
-- failure on polling subscription state
|
-- failure on polling subscription state
|
||||||
SELECT citus.mitmproxy('conn.onQuery(query="^SELECT count\(\*\) FROM pg_subscription_rel").kill()');
|
SELECT citus.mitmproxy('conn.onQuery(query="^SELECT count\(\*\) FROM pg_subscription_rel").kill()');
|
||||||
mitmproxy
|
mitmproxy
|
||||||
|
|
|
||||||
|
|
@ -60,8 +60,10 @@ SELECT citus.mitmproxy('conn.onQuery(query="^ALTER SUBSCRIPTION .* ENABLE").kill
|
||||||
SELECT master_move_shard_placement(101, 'localhost', :worker_1_port, 'localhost', :worker_2_proxy_port);
|
SELECT master_move_shard_placement(101, 'localhost', :worker_1_port, 'localhost', :worker_2_proxy_port);
|
||||||
|
|
||||||
-- failure when enabling the subscriptions
|
-- failure when enabling the subscriptions
|
||||||
SELECT citus.mitmproxy('conn.onQuery(query="^ALTER SUBSCRIPTION .* ENABLE").cancel(' || :pid || ')');
|
-- This test can be enabled again once this postgres bug is fixed:
|
||||||
SELECT master_move_shard_placement(101, 'localhost', :worker_1_port, 'localhost', :worker_2_proxy_port);
|
-- https://www.postgresql.org/message-id/flat/HE1PR8303MB0075BF78AF1BE904050DA16BF7729%40HE1PR8303MB0075.EURPRD83.prod.outlook.com
|
||||||
|
-- SELECT citus.mitmproxy('conn.onQuery(query="^ALTER SUBSCRIPTION .* ENABLE").cancel(' || :pid || ')');
|
||||||
|
-- SELECT master_move_shard_placement(101, 'localhost', :worker_1_port, 'localhost', :worker_2_proxy_port);
|
||||||
|
|
||||||
-- failure on polling subscription state
|
-- failure on polling subscription state
|
||||||
SELECT citus.mitmproxy('conn.onQuery(query="^SELECT count\(\*\) FROM pg_subscription_rel").kill()');
|
SELECT citus.mitmproxy('conn.onQuery(query="^SELECT count\(\*\) FROM pg_subscription_rel").kill()');
|
||||||
|
|
|
||||||
Loading…
Reference in New Issue