Fix distributed deadlock with partitions

A multi-shard update on a partition that runs concurrently with a DROP/CREATE of a partition could cause a distributed deadlock. Postgres takes an AccessExclusiveLock on the parent table when dropping a partition, but it might not take an AccessShareLock on the parent table when updating a partition if the child quals are already cached. When the AccessShareLock is not taken on the parent, the DROP command is sent to the workers, and depending on the timing, the DROP can take an AccessExclusiveLock on one worker while the UPDATE takes an AccessShareLock on another worker, so each blocks the other from proceeding.

To resolve the issue, we now take an AccessShareLock on the parent table for modify commands. This makes sure that DROP commands wait.
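The lock interaction described above can be sketched as a small model. This is an illustration only, not code from this commit: the `Model*` names, `model_locks_conflict`, and `drop_waits_on_coordinator` are hypothetical, and the table simply restates the conflict rules for Postgres table-level lock modes. The point is that AccessShareLock conflicts only with AccessExclusiveLock, so an UPDATE that holds it on the parent up front forces a concurrent DROP to wait in one place instead of racing across workers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of Postgres table-level lock modes, weakest first. */
typedef enum
{
	MODEL_NO_LOCK = 0,
	MODEL_ACCESS_SHARE,
	MODEL_ROW_SHARE,
	MODEL_ROW_EXCLUSIVE,
	MODEL_SHARE_UPDATE_EXCLUSIVE,
	MODEL_SHARE,
	MODEL_SHARE_ROW_EXCLUSIVE,
	MODEL_EXCLUSIVE,
	MODEL_ACCESS_EXCLUSIVE
} ModelLockMode;

#define MODEL_BIT(mode) (1u << (mode))

/* For each requested mode, the set of held modes that it conflicts with. */
static const unsigned int modelConflicts[] = {
	/* no lock */ 0,
	/* AccessShare */ MODEL_BIT(MODEL_ACCESS_EXCLUSIVE),
	/* RowShare */ MODEL_BIT(MODEL_EXCLUSIVE) | MODEL_BIT(MODEL_ACCESS_EXCLUSIVE),
	/* RowExclusive */ MODEL_BIT(MODEL_SHARE) | MODEL_BIT(MODEL_SHARE_ROW_EXCLUSIVE) |
	MODEL_BIT(MODEL_EXCLUSIVE) | MODEL_BIT(MODEL_ACCESS_EXCLUSIVE),
	/* ShareUpdateExclusive */ MODEL_BIT(MODEL_SHARE_UPDATE_EXCLUSIVE) |
	MODEL_BIT(MODEL_SHARE) | MODEL_BIT(MODEL_SHARE_ROW_EXCLUSIVE) |
	MODEL_BIT(MODEL_EXCLUSIVE) | MODEL_BIT(MODEL_ACCESS_EXCLUSIVE),
	/* Share */ MODEL_BIT(MODEL_ROW_EXCLUSIVE) | MODEL_BIT(MODEL_SHARE_UPDATE_EXCLUSIVE) |
	MODEL_BIT(MODEL_SHARE_ROW_EXCLUSIVE) | MODEL_BIT(MODEL_EXCLUSIVE) |
	MODEL_BIT(MODEL_ACCESS_EXCLUSIVE),
	/* ShareRowExclusive */ MODEL_BIT(MODEL_ROW_EXCLUSIVE) |
	MODEL_BIT(MODEL_SHARE_UPDATE_EXCLUSIVE) | MODEL_BIT(MODEL_SHARE) |
	MODEL_BIT(MODEL_SHARE_ROW_EXCLUSIVE) | MODEL_BIT(MODEL_EXCLUSIVE) |
	MODEL_BIT(MODEL_ACCESS_EXCLUSIVE),
	/* Exclusive: everything except AccessShare */
	MODEL_BIT(MODEL_ROW_SHARE) | MODEL_BIT(MODEL_ROW_EXCLUSIVE) |
	MODEL_BIT(MODEL_SHARE_UPDATE_EXCLUSIVE) | MODEL_BIT(MODEL_SHARE) |
	MODEL_BIT(MODEL_SHARE_ROW_EXCLUSIVE) | MODEL_BIT(MODEL_EXCLUSIVE) |
	MODEL_BIT(MODEL_ACCESS_EXCLUSIVE),
	/* AccessExclusive: every real lock mode */
	MODEL_BIT(MODEL_ACCESS_SHARE) | MODEL_BIT(MODEL_ROW_SHARE) |
	MODEL_BIT(MODEL_ROW_EXCLUSIVE) | MODEL_BIT(MODEL_SHARE_UPDATE_EXCLUSIVE) |
	MODEL_BIT(MODEL_SHARE) | MODEL_BIT(MODEL_SHARE_ROW_EXCLUSIVE) |
	MODEL_BIT(MODEL_EXCLUSIVE) | MODEL_BIT(MODEL_ACCESS_EXCLUSIVE)
};

/* Returns true if a request for `requested` must wait while `held` is held. */
bool
model_locks_conflict(ModelLockMode held, ModelLockMode requested)
{
	assert(held >= MODEL_NO_LOCK && held <= MODEL_ACCESS_EXCLUSIVE);
	assert(requested >= MODEL_NO_LOCK && requested <= MODEL_ACCESS_EXCLUSIVE);
	return (modelConflicts[requested] & MODEL_BIT(held)) != 0;
}

/*
 * Models the parent table while an UPDATE is in flight: does a concurrent
 * DROP (AccessExclusiveLock on the parent) have to wait before any DROP
 * is sent to the workers?
 */
bool
drop_waits_on_coordinator(bool updateLocksParent)
{
	ModelLockMode heldOnParent =
		updateLocksParent ? MODEL_ACCESS_SHARE : MODEL_NO_LOCK;
	return model_locks_conflict(heldOnParent, MODEL_ACCESS_EXCLUSIVE);
}
```

In this model, `drop_waits_on_coordinator(false)` corresponds to the old behavior, where nothing is held on the parent and the DROP proceeds to the workers, and `drop_waits_on_coordinator(true)` corresponds to the behavior after this change.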
2021-08-03 16:39:42 +03:00 · 2021-08-03 16:39:42 +03:00 · 7ca6794706
parent 0722ec95bc
commit 7ca6794706
1 changed file with 14 additions and 0 deletions
--- a/src/backend/distributed/executor/adaptive_executor.c
+++ b/src/backend/distributed/executor/adaptive_executor.c
@@ -1559,6 +1559,20 @@ LockPartitionsForDistributedPlan(DistributedPlan *distributedPlan)
 		Oid targetRelationId = distributedPlan->targetRelationId;
 		LockPartitionsInRelationList(list_make1_oid(targetRelationId), RowExclusiveLock);
 		if (PartitionTable(targetRelationId))
 		{
 			Oid parentRelationId = PartitionParentOid(targetRelationId);
 			/*
 			 * We lock the parent relation after locking relations to prevent
 			 * distributed deadlock.
 			 * Postgres doesn't take AccessShareLock on the parent table when the
 			 * child quals are already cached and a drop/create partition can
 			 * result in a distributed deadlock with multi-shard update.
 			 */
 			LockRelationOid(parentRelationId, AccessShareLock);
 		}
 	}
 	/*