citus/src/backend/distributed/executor
Colm c1f5762645
Enhance MERGE .. WHEN NOT MATCHED BY SOURCE for repartitioned source (#7900)
DESCRIPTION: Ensure that a MERGE command on a distributed table with a
`WHEN NOT MATCHED BY SOURCE` clause runs against all shards of the
distributed table.

The Postgres MERGE command updates a table using a table or a query as a
data source. It provides three ways to match the target table with the
source: `WHEN MATCHED` means that there is a row in both the target and
source; `WHEN NOT MATCHED` means that there is a row in the source that
has no match (is not present) in the target; and, as of PG17, `WHEN NOT
MATCHED BY SOURCE` means that there is a row in the target that has no
match in the source.
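
The three match clauses can be illustrated with a small model that treats each table as an array of integer join keys. This is only a sketch of the semantics described above, not Postgres or Citus code; the function names (`count_matched`, `count_not_matched`, `count_not_matched_by_source`) are hypothetical and exist only for this illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true if `id` occurs in ids[0..n). Safe when n == 0 (ids may be NULL). */
static bool contains_id(const int *ids, size_t n, int id)
{
    for (size_t i = 0; i < n; i++) {
        if (ids[i] == id) {
            return true;
        }
    }
    return false;
}

/* WHEN MATCHED: target rows that have a matching source row. */
size_t count_matched(const int *target, size_t nt, const int *source, size_t ns)
{
    size_t count = 0;
    for (size_t i = 0; i < nt; i++) {
        if (contains_id(source, ns, target[i])) {
            count++;
        }
    }
    return count;
}

/* WHEN NOT MATCHED: source rows that have no matching target row. */
size_t count_not_matched(const int *target, size_t nt, const int *source, size_t ns)
{
    size_t count = 0;
    for (size_t i = 0; i < ns; i++) {
        if (!contains_id(target, nt, source[i])) {
            count++;
        }
    }
    return count;
}

/* WHEN NOT MATCHED BY SOURCE: target rows that have no matching source row. */
size_t count_not_matched_by_source(const int *target, size_t nt,
                                   const int *source, size_t ns)
{
    return nt - count_matched(target, nt, source, ns);
}
```

With target keys {1, 2, 3} and source keys {2, 4}, one row is matched, one source row is not matched, and two target rows are not matched by source. With an empty source, all three target rows fall under `WHEN NOT MATCHED BY SOURCE`.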

In Citus, when a MERGE command updates a distributed table using a
local/reference table or a distributed query as source, that source is
repartitioned, and for each repartitioned shard that has data (i.e. 1 or
more rows) the MERGE is run against the corresponding distributed table
shard. Suppose the distributed table has 32 shards, and the source
repartitions into 4 shards that have data, with the remaining 28 shards
being empty; then the MERGE command is performed on the 4 corresponding
shards of the distributed table. However, the semantics of `WHEN NOT
MATCHED BY SOURCE` are that the specified action must be performed on
the target for each row in the target that is not in the source; so if
the source is empty, all target rows should be updated. To see this,
consider the following MERGE command:
```
MERGE INTO target AS t
USING source AS s ON t.id = s.id
WHEN NOT MATCHED BY SOURCE THEN UPDATE SET col1 = 100;
```
If the source has zero rows, then every row in the target is updated so
that its col1 value is 100. Currently in Citus a MERGE on a distributed table
with a local/reference table or a distributed query as source ignores
shards of the distributed table when the corresponding shard of the
repartitioned source has zero rows. However, if the MERGE command
specifies a `WHEN NOT MATCHED BY SOURCE` clause, then the MERGE should
be performed on all shards of the distributed table, to ensure that the
specified action is performed on the target for each row in the target
that is not in the source. This PR enhances Citus MERGE execution so
that when a repartitioned source shard has zero rows, and the MERGE
command specifies a `WHEN NOT MATCHED BY SOURCE` clause, the MERGE is
performed against the corresponding shard of the distributed table using
an empty (zero row) relation as source, by generating a query of the
form:
```
MERGE INTO target_shard_0002 AS t
USING (SELECT id FROM (VALUES (NULL) ) source_0002(id) WHERE FALSE) AS s ON t.id = s.id
WHEN NOT MATCHED BY SOURCE THEN UPDATE SET col1 = 100;
```
This works because each row in the target shard will be updated, and
`WHEN MATCHED` and `WHEN NOT MATCHED`, if specified, will be no-ops
because the source has zero rows.
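
The per-shard decision described above can be sketched as follows. This is a simplified model under stated assumptions, not the code in `merge_executor.c`: the enum `ShardTaskAction` and the functions `decide_shard_task` and `count_shards_to_run` are hypothetical names, and each repartitioned source shard is reduced to a row count.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome for one target shard; not a Citus type. */
typedef enum {
    TASK_SKIP,                    /* source shard empty, nothing to do        */
    TASK_MERGE_WITH_SOURCE,       /* source shard has rows, run MERGE as usual */
    TASK_MERGE_WITH_EMPTY_SOURCE  /* source shard empty, but the clause forces
                                     the MERGE to run with a zero-row source   */
} ShardTaskAction;

/* Decide what to do for one target shard given its source shard's row count. */
ShardTaskAction decide_shard_task(size_t source_row_count,
                                  bool has_not_matched_by_source)
{
    if (source_row_count > 0) {
        return TASK_MERGE_WITH_SOURCE;
    }
    return has_not_matched_by_source ? TASK_MERGE_WITH_EMPTY_SOURCE : TASK_SKIP;
}

/* Count how many target shards the MERGE is executed against. */
size_t count_shards_to_run(const size_t *source_rows, size_t shard_count,
                           bool has_not_matched_by_source)
{
    size_t count = 0;
    for (size_t i = 0; i < shard_count; i++) {
        if (decide_shard_task(source_rows[i], has_not_matched_by_source) != TASK_SKIP) {
            count++;
        }
    }
    return count;
}
```

For the 32-shard example above, where 4 source shards have data, this model runs the MERGE on 4 shards without the clause and on all 32 shards with it.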

When the source is a local or reference table, implementing this involves
teaching `ExecuteSourceAtCoordAndRedistribution()` in `merge_executor.c`
not to prune tasks when the query has `WHEN NOT MATCHED BY SOURCE`, but
instead to replace the task's query with one that uses an empty relation
as source. When the source is a distributed query,
`ExecuteMergeSourcePlanIntoColocatedIntermediateResults()` (also in
`merge_executor.c`) no longer skips empty tasks; it now generates a
query that uses an empty relation as source for the corresponding target
shard of the distributed table, but again only when the query has `WHEN
NOT MATCHED BY SOURCE`. A new function, `BuildEmptyResultQuery()`, is
added to `recursive_planning.c` and is used by both of the
aforementioned functions in `merge_executor.c` to build an empty
relation to use as the source. It applies the appropriate type to each
column of the empty relation so that the join with the target
type-checks.
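
To show why the column types matter, here is a minimal sketch that renders a typed, zero-row source subquery as SQL text. It is not `BuildEmptyResultQuery()`, whose implementation is not shown here; the struct `EmptyColumn` and the function `build_empty_source_sql` are hypothetical, the explicit `NULL::type` casts are an assumption about one way to type the columns, and identifiers are not quoted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical description of one source column: its name and SQL type name. */
typedef struct {
    const char *name;
    const char *type_name;
} EmptyColumn;

/* Append `text` to buf at offset *len; returns -1 if it would not fit. */
static int append_text(char *buf, size_t cap, size_t *len, const char *text)
{
    size_t n = strlen(text);
    if (*len + n >= cap) {
        return -1;
    }
    memcpy(buf + *len, text, n);
    *len += n;
    buf[*len] = '\0';
    return 0;
}

/* Append "name1, name2, ..." for the given columns. */
static int append_column_names(char *buf, size_t cap, size_t *len,
                               const EmptyColumn *cols, size_t ncols)
{
    for (size_t i = 0; i < ncols; i++) {
        if (i > 0 && append_text(buf, cap, len, ", ") != 0) return -1;
        if (append_text(buf, cap, len, cols[i].name) != 0) return -1;
    }
    return 0;
}

/*
 * Write a subquery that yields zero rows but has typed columns, e.g.
 *   SELECT id FROM (VALUES (NULL::integer)) source_0002(id) WHERE FALSE
 * Returns 0 on success, -1 on bad arguments or if buf is too small.
 */
int build_empty_source_sql(char *buf, size_t cap, const char *alias,
                           const EmptyColumn *cols, size_t ncols)
{
    size_t len = 0;

    if (buf == NULL || cap == 0 || alias == NULL || cols == NULL || ncols == 0) {
        return -1;
    }
    buf[0] = '\0';

    if (append_text(buf, cap, &len, "SELECT ") != 0) return -1;
    if (append_column_names(buf, cap, &len, cols, ncols) != 0) return -1;
    if (append_text(buf, cap, &len, " FROM (VALUES (") != 0) return -1;
    for (size_t i = 0; i < ncols; i++) {
        if (i > 0 && append_text(buf, cap, &len, ", ") != 0) return -1;
        if (append_text(buf, cap, &len, "NULL::") != 0) return -1;
        if (append_text(buf, cap, &len, cols[i].type_name) != 0) return -1;
    }
    if (append_text(buf, cap, &len, ")) ") != 0) return -1;
    if (append_text(buf, cap, &len, alias) != 0) return -1;
    if (append_text(buf, cap, &len, "(") != 0) return -1;
    if (append_column_names(buf, cap, &len, cols, ncols) != 0) return -1;
    if (append_text(buf, cap, &len, ") WHERE FALSE") != 0) return -1;

    assert(len < cap); /* append_text keeps the buffer NUL-terminated */
    return 0;
}
```

Calling it with alias `source_0002` and a single column `id` of type `integer` writes `SELECT id FROM (VALUES (NULL::integer)) source_0002(id) WHERE FALSE` into the buffer. Because the column carries the target's type, the `ON t.id = s.id` comparison resolves to an ordinary integer equality even though the relation has no rows.
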
2025-02-24 09:11:19 +00:00
| File | Last commit | Date |
| --- | --- | --- |
| adaptive_executor.c | Drops PG14 support (#7753) | 2025-02-03 17:13:40 +03:00 |
| citus_custom_scan.c | Rename foreach_ macros to foreach_declared_ macros (#7700) | 2024-10-16 17:01:39 +03:00 |
| directed_acyclic_graph_execution.c | Rename foreach_ macros to foreach_declared_ macros (#7700) | 2024-10-16 17:01:39 +03:00 |
| distributed_execution_locks.c | Rename foreach_ macros to foreach_declared_ macros (#7700) | 2024-10-16 17:01:39 +03:00 |
| distributed_intermediate_results.c | Rename foreach_ macros to foreach_declared_ macros (#7700) | 2024-10-16 17:01:39 +03:00 |
| executor_util_params.c | Actually sort includes after cherry-pick | 2024-04-17 10:26:50 +02:00 |
| executor_util_tasks.c | Rename foreach_ macros to foreach_declared_ macros (#7700) | 2024-10-16 17:01:39 +03:00 |
| executor_util_tuples.c | Actually sort includes after cherry-pick | 2024-04-17 10:26:50 +02:00 |
| insert_select_executor.c | Generate qualified relation name (#7427) | 2025-01-12 22:08:41 +03:00 |
| intermediate_results.c | Rename foreach_ macros to foreach_declared_ macros (#7700) | 2024-10-16 17:01:39 +03:00 |
| local_executor.c | Rename foreach_ macros to foreach_declared_ macros (#7700) | 2024-10-16 17:01:39 +03:00 |
| merge_executor.c | Enhance MERGE .. WHEN NOT MATCHED BY SOURCE for repartitioned source (#7900) | 2025-02-24 09:11:19 +00:00 |
| multi_executor.c | Rename foreach_ macros to foreach_declared_ macros (#7700) | 2024-10-16 17:01:39 +03:00 |
| multi_server_executor.c | Actually sort includes after cherry-pick | 2024-04-17 10:26:50 +02:00 |
| partitioned_intermediate_results.c | Actually sort includes after cherry-pick | 2024-04-17 10:26:50 +02:00 |
| placement_access.c | Rename foreach_ macros to foreach_declared_ macros (#7700) | 2024-10-16 17:01:39 +03:00 |
| query_stats.c | Drops PG14 support (#7753) | 2025-02-03 17:13:40 +03:00 |
| repartition_executor.c | Enhance MERGE .. WHEN NOT MATCHED BY SOURCE for repartitioned source (#7900) | 2025-02-24 09:11:19 +00:00 |
| repartition_join_execution.c | Rename foreach_ macros to foreach_declared_ macros (#7700) | 2024-10-16 17:01:39 +03:00 |
| subplan_execution.c | Rename foreach_ macros to foreach_declared_ macros (#7700) | 2024-10-16 17:01:39 +03:00 |
| transmit.c | Actually sort includes after cherry-pick | 2024-04-17 10:26:50 +02:00 |
| tuple_destination.c | Fix getting heap tuple size (#7387) | 2025-01-12 22:06:46 +03:00 |