mirror of https://github.com/citusdata/citus.git
Fix distributed deadlock with partitions
A distributed deadlock was possible between a multi-shard UPDATE on a partition and a concurrent DROP or CREATE of a partition.

The cause is a difference in how Postgres locks the parent table. Dropping a partition takes an AccessExclusiveLock on the parent, but updating a partition might not take an AccessShareLock on the parent if the child quals are already cached. When the AccessShareLock is not taken on the parent, the DROP command is not blocked and is sent to the workers. Depending on the timing, the DROP can then take an AccessExclusiveLock on one worker while the UPDATE takes an AccessShareLock on another worker, and each blocks the other from proceeding.

To resolve the issue, we now take an AccessShareLock on the parent table for modify commands. This makes sure that DROP commands will wait.
fix/distributed_deadlock1
parent 0722ec95bc
commit 7ca6794706
@@ -1559,6 +1559,20 @@ LockPartitionsForDistributedPlan(DistributedPlan *distributedPlan)
 		Oid targetRelationId = distributedPlan->targetRelationId;
 
 		LockPartitionsInRelationList(list_make1_oid(targetRelationId), RowExclusiveLock);
+
+		if (PartitionTable(targetRelationId))
+		{
+			Oid parentRelationId = PartitionParentOid(targetRelationId);
+
+			/*
+			 * We lock the parent relation after locking relations to prevent
+			 * distributed deadlock.
+			 * Postgres doesn't take AccessShareLock on the parent table when the
+			 * child quals are already cached and a drop/create partition can
+			 * result in a distributed deadlock with multi-shard update.
+			 */
+			LockRelationOid(parentRelationId, AccessShareLock);
+		}
 	}
 
 	/*
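The lock interaction described in the commit message can be sketched in a few lines of C. This is only an illustrative model, not Citus or Postgres code: the type SketchLockMode and the functions SketchModesConflict and SketchDropWaitsForUpdate are hypothetical names, and the model covers only the three lock modes mentioned on this page. It relies on the Postgres rule that AccessExclusiveLock conflicts with every other table lock mode, while AccessShareLock and RowExclusiveLock do not conflict with each other.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the three Postgres lock modes named in this commit. */
typedef enum
{
	MODE_ACCESS_SHARE,
	MODE_ROW_EXCLUSIVE,
	MODE_ACCESS_EXCLUSIVE
} SketchLockMode;

/*
 * Among these three modes, a conflict exists only when at least one side is
 * AccessExclusiveLock, which conflicts with every mode including itself.
 */
static bool
SketchModesConflict(SketchLockMode held, SketchLockMode requested)
{
	return held == MODE_ACCESS_EXCLUSIVE || requested == MODE_ACCESS_EXCLUSIVE;
}

/*
 * Returns true when a DROP of a partition, which requests AccessExclusiveLock
 * on the parent table, must wait behind a running multi-shard UPDATE before
 * it can be sent to the workers.
 *
 * updateLocksParent models whether the UPDATE holds AccessShareLock on the
 * parent: false for the behavior before this commit when the child quals are
 * cached, true for the behavior after it.
 */
static bool
SketchDropWaitsForUpdate(bool updateLocksParent)
{
	if (!updateLocksParent)
	{
		/* Nothing is held on the parent, so the DROP goes straight to the workers. */
		return false;
	}

	return SketchModesConflict(MODE_ACCESS_SHARE, MODE_ACCESS_EXCLUSIVE);
}
```

In this model, SketchDropWaitsForUpdate(false) corresponds to the deadlock-prone case, where the DROP and the UPDATE can reach different workers in different orders. SketchDropWaitsForUpdate(true) corresponds to the patched behavior, where the DROP is serialized behind the UPDATE before any worker is contacted.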