Fix distributed deadlock with partitions

A multi-shard update on a partition running concurrently with a
drop/create partition could cause a distributed deadlock.

The deadlock happens because Postgres takes an AccessExclusiveLock on
the parent table when dropping a partition, but it might not take an
AccessShareLock on the parent table when updating a partition if the
child quals are already cached.

When AccessShareLock is not taken on the parent, the DROP command is not
blocked on the coordinator and is sent to the workers. Depending on the
timing, the DROP can take an AccessExclusiveLock on one worker while the
UPDATE takes an AccessShareLock on another worker, and each then blocks
the other from proceeding.

To resolve the issue, we now take an AccessShareLock on the parent table
for modify commands. This makes sure that a concurrent DROP command
waits.
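
The interleaving described above can be replayed in a small toy model.
This is only a sketch under stated assumptions, not Citus or Postgres
code: every name in it (Session, step, interleaving_deadlocks, the node
numbering) is hypothetical, only one lock holder per node is tracked,
and it replays a single unlucky schedule in which the DROP moves first.
Node 0 stands for the coordinator, nodes 1 and 2 for two workers.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: node 0 = coordinator, nodes 1 and 2 = workers. */
enum { NODE_COUNT = 3, NO_HOLDER = -1, DROP_ID = 0, UPDATE_ID = 1 };

typedef enum { ACCESS_SHARE, ACCESS_EXCLUSIVE } LockMode;

/* AccessExclusiveLock conflicts with every mode; AccessShareLock only with it. */
static bool modes_conflict(LockMode requested, LockMode held)
{
    return requested == ACCESS_EXCLUSIVE || held == ACCESS_EXCLUSIVE;
}

typedef struct {
    const int *plan;  /* nodes where the parent is locked, in order; -1 ends it */
    LockMode mode;    /* mode this session requests on the parent table */
    int next;         /* index of the next plan entry */
    bool waiting;     /* true while the last request could not be granted */
} Session;

/* Advance one session by one lock request and record whether it must wait. */
static void step(Session *s, int self, int holder[], LockMode held[])
{
    int node = s->plan[s->next];
    if (node < 0)
        return;                                /* plan already finished */
    assert(node < NODE_COUNT);

    if (holder[node] != NO_HOLDER && holder[node] != self &&
        modes_conflict(s->mode, held[node])) {
        s->waiting = true;                     /* blocked by the other session */
        return;
    }

    holder[node] = self;
    held[node] = s->mode;
    s->waiting = false;
    s->next++;

    if (s->plan[s->next] < 0)                  /* commit: release all locks */
        for (int n = 0; n < NODE_COUNT; n++)
            if (holder[n] == self)
                holder[n] = NO_HOLDER;
}

/*
 * Replays one interleaving of a DROP and a multi-shard UPDATE.
 * Returns true when both sessions end up waiting on each other.
 */
bool interleaving_deadlocks(bool update_locks_parent_on_coordinator)
{
    static const int drop_plan[] = { 0, 1, 2, -1 };
    static const int update_with_fix[] = { 0, 2, 1, -1 };
    static const int update_without_fix[] = { 2, 1, -1 };

    int holder[NODE_COUNT] = { NO_HOLDER, NO_HOLDER, NO_HOLDER };
    LockMode held[NODE_COUNT] = { ACCESS_SHARE, ACCESS_SHARE, ACCESS_SHARE };

    Session drop = { drop_plan, ACCESS_EXCLUSIVE, 0, false };
    Session update = {
        update_locks_parent_on_coordinator ? update_with_fix : update_without_fix,
        ACCESS_SHARE, 0, false
    };

    for (int round = 0; round < 16; round++) {
        step(&drop, DROP_ID, holder, held);
        step(&update, UPDATE_ID, holder, held);
        if (drop.waiting && update.waiting)
            return true;                       /* wait-for cycle across workers */
    }
    return false;
}
```

In this model, interleaving_deadlocks(false) ends with the DROP holding
the parent on worker 1 and the UPDATE holding it on worker 2, each
waiting for the other. With the parent locked on the coordinator first,
the two sessions conflict there before either reaches a worker, so one
simply runs after the other.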
fix/distributed_deadlock1
Sait Talha Nisanci 2021-08-03 16:39:42 +03:00
parent 0722ec95bc
commit 7ca6794706
1 changed files with 14 additions and 0 deletions

@@ -1559,6 +1559,20 @@ LockPartitionsForDistributedPlan(DistributedPlan *distributedPlan)
         Oid targetRelationId = distributedPlan->targetRelationId;
         LockPartitionsInRelationList(list_make1_oid(targetRelationId), RowExclusiveLock);
+        if (PartitionTable(targetRelationId))
+        {
+            Oid parentRelationId = PartitionParentOid(targetRelationId);
+            /*
+             * We lock the parent relation after locking relations to prevent
+             * distributed deadlock.
+             * Postgres doesn't take AccessShareLock on the parent table when the
+             * child quals are already cached and a drop/create partition can
+             * result in a distributed deadlock with multi-shard update.
+             */
+            LockRelationOid(parentRelationId, AccessShareLock);
+        }
     }
     /*