mirror of https://github.com/citusdata/citus.git
Fix flakyness in failure_online_move_shard_placement (#6250)
Sometimes in CI failure_online_move_shard_placement fails with the following error: ```diff SELECT citus.mitmproxy('conn.onQuery(query="^ALTER SUBSCRIPTION .* ENABLE").cancel(' || :pid || ')'); mitmproxy ----------- (1 row) SELECT master_move_shard_placement(101, 'localhost', :worker_1_port, 'localhost', :worker_2_proxy_port); -ERROR: canceling statement due to user request +ERROR: tuple concurrently updated +CONTEXT: while executing command on localhost:9060 -- failure on polling subscription state ``` Source: https://app.circleci.com/pipelines/github/citusdata/citus/26441/workflows/dd6e3475-6121-47b3-aea3-4ac92be114f4/jobs/751476/steps This error is not completely harmless, because based on the logs it mean that our cleanup logic failed, which in turn means that replication slots are left around: ``` 2022-08-24 16:01:29.247 UTC [1219] ERROR: XX000: tuple concurrently updated 2022-08-24 16:01:29.247 UTC [1219] LOCATION: simple_heap_update, heapam.c:4179 2022-08-24 16:01:29.247 UTC [1219] STATEMENT: ALTER SUBSCRIPTION citus_shard_move_subscription_10 DISABLE ``` However, we have other mechanisms to clean up any leftovers in case of a failed cleanup. So it's not that big of a problem. The reason we run into this error is arguably because of a Postgres bug, so I created a patch for Postgres that fixes this. While we wait for this (or a similar) patch to be merged, this PR disables the flaky test. There's still a test that tests in case of a connection "kill" instead of a "cancel", so I don't think we lose very important coverage by disabling this test. When trying to reproduce this I only hit this issue in the cancel case, so I don't think there's a need to disable the kill case for now.pull/6248/head
parent
2a0c0b3ba6
commit
77dd49fcf8
|
@ -110,14 +110,10 @@ SELECT master_move_shard_placement(101, 'localhost', :worker_1_port, 'localhost'
|
|||
ERROR: connection not open
|
||||
CONTEXT: while executing command on localhost:xxxxx
|
||||
-- failure when enabling the subscriptions
|
||||
SELECT citus.mitmproxy('conn.onQuery(query="^ALTER SUBSCRIPTION .* ENABLE").cancel(' || :pid || ')');
|
||||
mitmproxy
|
||||
---------------------------------------------------------------------
|
||||
|
||||
(1 row)
|
||||
|
||||
SELECT master_move_shard_placement(101, 'localhost', :worker_1_port, 'localhost', :worker_2_proxy_port);
|
||||
ERROR: canceling statement due to user request
|
||||
-- This test can be enabled again once this postgres bug is fixed:
|
||||
-- https://www.postgresql.org/message-id/flat/HE1PR8303MB0075BF78AF1BE904050DA16BF7729%40HE1PR8303MB0075.EURPRD83.prod.outlook.com
|
||||
-- SELECT citus.mitmproxy('conn.onQuery(query="^ALTER SUBSCRIPTION .* ENABLE").cancel(' || :pid || ')');
|
||||
-- SELECT master_move_shard_placement(101, 'localhost', :worker_1_port, 'localhost', :worker_2_proxy_port);
|
||||
-- failure on polling subscription state
|
||||
SELECT citus.mitmproxy('conn.onQuery(query="^SELECT count\(\*\) FROM pg_subscription_rel").kill()');
|
||||
mitmproxy
|
||||
|
|
|
@ -60,8 +60,10 @@ SELECT citus.mitmproxy('conn.onQuery(query="^ALTER SUBSCRIPTION .* ENABLE").kill
|
|||
SELECT master_move_shard_placement(101, 'localhost', :worker_1_port, 'localhost', :worker_2_proxy_port);
|
||||
|
||||
-- failure when enabling the subscriptions
|
||||
SELECT citus.mitmproxy('conn.onQuery(query="^ALTER SUBSCRIPTION .* ENABLE").cancel(' || :pid || ')');
|
||||
SELECT master_move_shard_placement(101, 'localhost', :worker_1_port, 'localhost', :worker_2_proxy_port);
|
||||
-- This test can be enabled again once this postgres bug is fixed:
|
||||
-- https://www.postgresql.org/message-id/flat/HE1PR8303MB0075BF78AF1BE904050DA16BF7729%40HE1PR8303MB0075.EURPRD83.prod.outlook.com
|
||||
-- SELECT citus.mitmproxy('conn.onQuery(query="^ALTER SUBSCRIPTION .* ENABLE").cancel(' || :pid || ')');
|
||||
-- SELECT master_move_shard_placement(101, 'localhost', :worker_1_port, 'localhost', :worker_2_proxy_port);
|
||||
|
||||
-- failure on polling subscription state
|
||||
SELECT citus.mitmproxy('conn.onQuery(query="^SELECT count\(\*\) FROM pg_subscription_rel").kill()');
|
||||
|
|
Loading…
Reference in New Issue