Fix flakiness in multi_transaction_recovery (#6249)

In CI, multi_transaction_recovery would sometimes fail with the following
error:
```diff
 SET LOCAL citus.defer_drop_after_shard_move TO OFF;
 SELECT citus_move_shard_placement((SELECT * FROM selected_shard), 'localhost', :worker_1_port, 'localhost', :worker_2_port, shard_transfer_mode := 'block_writes');
- citus_move_shard_placement
----------------------------------------------------------------------
-
-(1 row)
-
+ERROR:  could not find placement matching "localhost:57637"
+HINT:  Confirm the placement still exists and try again.
 COMMIT;
```
Source: https://app.circleci.com/pipelines/github/citusdata/citus/26510/workflows/8269ea93-d9b4-4376-ae0e-8332a5c15fc6/jobs/755548

This happened because, when choosing `selected_shard`, we didn't ensure
that it was actually located on the node that we were moving it from.
Instead we simply picked the first shard for the table that was returned
by the query, so the move failed whenever that shard had no placement on
the source node.

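To fix this issue, this PR selects the shard from `citus_shards` instead
of `pg_dist_shard` and adds a `nodeport = :worker_1_port` filter, so only
shards that are located on the intended node are chosen.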
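
As a rough illustration of the invariant the new filter enforces, the
following psql sketch lists where each shard of the table lives and then
counts how many placements of the selected shard sit on the source node.
It is not part of this commit. It assumes the test's `selected_shard`
table and `:worker_1_port` psql variable are already defined, and the
column alias `placements_on_source` is made up for this example.
```sql
-- Hypothetical sanity check, not part of the commit.
-- Show which node each shard of the test table is placed on.
SELECT shardid, nodename, nodeport
FROM citus_shards
WHERE table_name = 'test_2pcskip'::regclass
ORDER BY shardid;

-- Count placements of the selected shard on the source node. With the
-- nodeport filter in place this should be non-zero; with the old
-- unfiltered query it could be 0, which is the case that made
-- citus_move_shard_placement fail.
SELECT count(*) AS placements_on_source
FROM citus_shards
WHERE shardid = (SELECT shardid FROM selected_shard)
  AND nodeport = :worker_1_port;
```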

(cherry picked from commit 18015ca501)
Jelte Fennema 2022-08-26 11:48:55 +02:00
parent bf63788b98
commit 77596cd62f
2 changed files with 9 additions and 2 deletions

@@ -352,7 +352,10 @@ SELECT recover_prepared_transactions();
 0
 (1 row)
-SELECT shardid INTO selected_shard FROM pg_dist_shard WHERE logicalrelid='test_2pcskip'::regclass LIMIT 1;
+SELECT shardid INTO selected_shard
+FROM citus_shards
+WHERE table_name='test_2pcskip'::regclass AND nodeport = :worker_1_port
+LIMIT 1;
 SELECT COUNT(*) FROM pg_dist_transaction;
 count
 ---------------------------------------------------------------------

@@ -193,7 +193,11 @@ SELECT create_distributed_table('test_2pcskip', 'a');
 INSERT INTO test_2pcskip SELECT i FROM generate_series(0, 5)i;
 SELECT recover_prepared_transactions();
-SELECT shardid INTO selected_shard FROM pg_dist_shard WHERE logicalrelid='test_2pcskip'::regclass LIMIT 1;
+SELECT shardid INTO selected_shard
+FROM citus_shards
+WHERE table_name='test_2pcskip'::regclass AND nodeport = :worker_1_port
+LIMIT 1;
 SELECT COUNT(*) FROM pg_dist_transaction;
 BEGIN;
 SET LOCAL citus.defer_drop_after_shard_move TO OFF;