Revert replica identity creation order for shard moves (#6367)

In Citus 11.1.0 we changed the order of the initial data copy and the
replica identity creation when doing a non-blocking shard move. This
was done to try to speed up shard moves. But after doing more extensive
performance testing, this change turned out to have a negative impact
on the speed of moves on the setups that I tested.

Looking at the resource usage metrics of the VMs, the reason for this
seems to be that these shard moves were bottlenecked by disk bandwidth.
While creating replica identities in bulk after the initial copy
reduces CPU usage a bit, it requires an additional sequential scan of
the just-written data. So when a VM is bottlenecked on disk, it makes
sense to spend a little more CPU to avoid that extra scan, especially
since primary keys are usually simple indexes that don't require much
CPU to update incrementally, as opposed to e.g. GiST indexes.

This reverts the order change to avoid a regression in shard move speed
in these cases.

For future releases we might consider re-evaluating our index creation
order for other indexes too, and creating "simple" indexes before the
copy.
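
To make the restored ordering concrete, below is a minimal, hypothetical
libpq sketch (not Citus code; the connection string, table, constraint, and
subscription names are made up for illustration). It only shows the sequence
this commit goes back to: create the replica identity on the target shard
first, then do the initial COPY, and only then enable the subscription.

/*
 * Hypothetical sketch of the restored ordering; all names are placeholders.
 * Compile against libpq, e.g.: cc sketch.c -lpq
 */
#include <stdio.h>
#include <stdlib.h>
#include <libpq-fe.h>

static void
ExecOrDie(PGconn *conn, const char *sql)
{
	PGresult *result = PQexec(conn, sql);
	if (PQresultStatus(result) != PGRES_COMMAND_OK)
	{
		fprintf(stderr, "command failed: %s", PQerrorMessage(conn));
		PQclear(result);
		PQfinish(conn);
		exit(1);
	}
	PQclear(result);
}

int
main(void)
{
	/* hypothetical connection to the target node */
	PGconn *target = PQconnectdb("host=target-node dbname=postgres");
	if (PQstatus(target) != CONNECTION_OK)
	{
		fprintf(stderr, "connection failed: %s", PQerrorMessage(target));
		return 1;
	}

	/* 1) create the replica identity (here: the primary key) first */
	ExecOrDie(target,
			  "ALTER TABLE items_102008 "
			  "ADD CONSTRAINT items_102008_pkey PRIMARY KEY (item_id)");

	/* 2) the initial data COPY into the shard would run here */

	/*
	 * 3) only now enable the subscription, so queued UPDATE/DELETE changes
	 * can be applied using the identity created in step 1
	 */
	ExecOrDie(target, "ALTER SUBSCRIPTION shard_move_sub ENABLE");

	PQfinish(target);
	return 0;
}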
Jelte Fennema 2022-09-23 14:55:25 +02:00 committed by GitHub
parent a868cc049a
commit d9a9a3263b
2 changed files with 40 additions and 14 deletions

@@ -1656,6 +1656,26 @@ NonBlockingShardSplit(SplitOperation splitOperation,
 							databaseName,
 							logicalRepTargetList);
 
+	/*
+	 * We have to create the primary key (or any other replica identity)
+	 * before the update/delete operations that are queued will be
+	 * replicated. Because if the replica identity does not exist on the
+	 * target, the replication would fail.
+	 *
+	 * So the latest possible moment we could do this is right after the
+	 * initial data COPY, but before enabling the subscriptions. It might
+	 * seem like a good idea to do it after the initial data COPY, since
+	 * it's generally the rule that it's cheaper to build an index at once
+	 * than to create it incrementally. This general rule is why we create
+	 * all the regular indexes as late in the move as possible.
+	 *
+	 * But as it turns out, in practice it's not as clear cut, and we saw a
+	 * speed degradation in the time it takes to move shards when doing the
+	 * replica identity creation after the initial COPY. So instead we
+	 * keep it before the COPY.
+	 */
+	CreateReplicaIdentities(logicalRepTargetList);
+
 	ereport(LOG, (errmsg("performing copy for %s", operationName)));
 
 	/* 8) Do snapshotted Copy */

@@ -234,6 +234,26 @@ LogicallyReplicateShards(List *shardList, char *sourceNodeName, int sourceNodePo
 	/* only useful for isolation testing, see the function comment for the details */
 	ConflictOnlyWithIsolationTesting();
 
+	/*
+	 * We have to create the primary key (or any other replica identity)
+	 * before the update/delete operations that are queued will be
+	 * replicated. Because if the replica identity does not exist on the
+	 * target, the replication would fail.
+	 *
+	 * So the latest possible moment we could do this is right after the
+	 * initial data COPY, but before enabling the subscriptions. It might
+	 * seem like a good idea to do it after the initial data COPY, since
+	 * it's generally the rule that it's cheaper to build an index at once
+	 * than to create it incrementally. This general rule is why we create
+	 * all the regular indexes as late in the move as possible.
+	 *
+	 * But as it turns out, in practice it's not as clear cut, and we saw a
+	 * speed degradation in the time it takes to move shards when doing the
+	 * replica identity creation after the initial COPY. So instead we
+	 * keep it before the COPY.
+	 */
+	CreateReplicaIdentities(logicalRepTargetList);
+
 	CopyShardsToNode(sourceNode, targetNode, shardList, snapshot);
 
 	/*
@@ -347,20 +367,6 @@ CompleteNonBlockingShardTransfer(List *shardList,
 								 HTAB *groupedLogicalRepTargetsHash,
 								 LogicalRepType type)
 {
-	/*
-	 * We have to create the primary key (or any other replica identity)
-	 * before the update/delete operations that are queued will be
-	 * replicated. Because if the replica identity does not exist on the
-	 * target, the replication would fail.
-	 *
-	 * So we it right after the initial data COPY, but before enabling the
-	 * susbcriptions. We do it at this latest possible moment, because its
-	 * much cheaper to build an index at once than to create it
-	 * incrementally. So this way we create the primary key index in one go
-	 * for all data from the initial COPY.
-	 */
-	CreateReplicaIdentities(logicalRepTargetList);
-
 	/* Start applying the changes from the replication slots to catch up. */
 	EnableSubscriptions(logicalRepTargetList);
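
The comment added above says replication would fail if the replica identity
does not exist on the target when UPDATE/DELETE changes are applied. As an
illustration only (not part of Citus; the table name is a placeholder), here
is a small libpq sketch that checks whether a target relation has a primary
key or replica identity index before a subscription would be enabled:

/*
 * Hypothetical check, for illustration: does the target relation have a
 * primary key or replica identity index that the logical replication
 * apply worker could use for UPDATE/DELETE?
 */
#include <stdio.h>
#include <libpq-fe.h>

static int
HasReplicaIdentityIndex(PGconn *conn, const char *relationName)
{
	const char *paramValues[1] = { relationName };
	PGresult *result = PQexecParams(
		conn,
		"SELECT 1 FROM pg_index i JOIN pg_class c ON c.oid = i.indrelid "
		"WHERE c.relname = $1 AND (i.indisprimary OR i.indisreplident)",
		1, NULL, paramValues, NULL, NULL, 0);
	int found = (PQresultStatus(result) == PGRES_TUPLES_OK &&
				 PQntuples(result) > 0);
	PQclear(result);
	return found;
}

int
main(void)
{
	/* hypothetical connection to the target node */
	PGconn *target = PQconnectdb("host=target-node dbname=postgres");
	if (PQstatus(target) != CONNECTION_OK)
	{
		fprintf(stderr, "connection failed: %s", PQerrorMessage(target));
		return 1;
	}

	if (!HasReplicaIdentityIndex(target, "items_102008"))
	{
		fprintf(stderr, "no replica identity on target; create it before "
						"enabling the subscription\n");
		PQfinish(target);
		return 1;
	}

	printf("replica identity present; safe to enable the subscription\n");
	PQfinish(target);
	return 0;
}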