citus/src/include/distributed
Onur Tirtir e4ce4d184e Properly detect no-op shard-key updates via UPDATE / MERGE (#8214)
DESCRIPTION: Fixes a bug that allowed UPDATE / MERGE queries
that may change the distribution column value.

Fixes: #8087.

Probably as of #769, we were not properly checking if UPDATE
may change the distribution column.

In #769, we had these checks:
```c
	if (targetEntry->resno != column->varattno)
	{
		/* target entry of the form SET some_other_col = <x> */
		isColumnValueChanged = false;
	}
	else if (IsA(setExpr, Var))
	{
		Var *newValue = (Var *) setExpr;
		if (newValue->varattno == column->varattno)
		{
			/* target entry of the form SET col = table.col */
			isColumnValueChanged = false;
		}
	}
```

However, what we check in the "if" and in the "else if" is not so
different, in the sense that both attempt to verify whether the SET
expr of the target entry points to the attno of the given column. So,
in #5220, the first check was removed.
Also see this PR comment from #5220:
https://github.com/citusdata/citus/pull/5220#discussion_r699230597.
In #769, we probably wanted to first check whether both the SET expr
of the target entry and the given variable point to the same range
var entry, but this wasn't what the "if" was checking, which is why
it was removed.

As a result, in the cases that are mentioned in the linked issue,
we were incorrectly concluding that the SET expr of the target
entry won't change given column just because it's pointing to the
same attno as given variable, regardless of what range var entries
the column and the SET expr are pointing to. Then we also started
using the same function to check for such cases for update action
of MERGE, so we have the same bug there as well.

So with this PR, we properly check for such cases by comparing
varno as well in TargetEntryChangesValue(). However, some of the
existing tests then started failing where the SET expr doesn't
directly assign the column to itself but the "where" clause could
actually imply that the distribution column won't change. Even before
this PR, we were not attempting to verify whether the "where" clause
quals could imply a no-op assignment for the SET expr in such cases,
but that was not a problem. This is because, in most cases, we were
always qualifying such SET expressions as a no-op update as long as
the SET expr's attno was the same as the given column's. For this
reason, to prevent regressions, this PR also adds some extra logic to
understand whether the "where" clause quals could imply that the SET
expr for the distribution key is a no-op.
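The difference between the old attno-only comparison and the stricter check described above can be sketched as follows. This is a minimal, self-contained model and not the actual TargetEntryChangesValue() code: `SimpleVar`, `SimpleEqualityQual`, `SameVar()`, `AttnoOnlyIsNoOp()` and `SetExprIsNoOp()` are hypothetical names, and the "where" clause is assumed to be already flattened into a list of plain column equalities.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a Var node. */
typedef struct SimpleVar
{
	int varno;    /* index of the range table entry the column belongs to */
	int varattno; /* attribute number of the column within that entry */
} SimpleVar;

/* Hypothetical stand-in for a "left = right" qual of the "where" clause. */
typedef struct SimpleEqualityQual
{
	SimpleVar left;
	SimpleVar right;
} SimpleEqualityQual;

/* Two vars are the same column only if both varno and varattno match. */
bool
SameVar(SimpleVar a, SimpleVar b)
{
	return a.varno == b.varno && a.varattno == b.varattno;
}

/* Models the buggy behavior: the range table entry is ignored. */
bool
AttnoOnlyIsNoOp(SimpleVar column, SimpleVar setExpr)
{
	return setExpr.varattno == column.varattno;
}

/*
 * Models the stricter check: "SET column = setExpr" is treated as a no-op
 * if setExpr is the column itself, or if some qual directly states
 * "column = setExpr" (in either order).
 */
bool
SetExprIsNoOp(SimpleVar column, SimpleVar setExpr,
			  const SimpleEqualityQual *quals, size_t qualCount)
{
	assert(quals != NULL || qualCount == 0);

	if (SameVar(column, setExpr))
	{
		return true;
	}

	for (size_t i = 0; i < qualCount; i++)
	{
		bool direct = SameVar(quals[i].left, column) &&
					  SameVar(quals[i].right, setExpr);
		bool reversed = SameVar(quals[i].left, setExpr) &&
						SameVar(quals[i].right, column);
		if (direct || reversed)
		{
			return true;
		}
	}

	return false;
}
```

In this model, a SET expr that shares only the attno with the distribution column but lives in another range table entry passes the attno-only check and is rejected by the stricter one, unless a qual directly equates the two columns.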

Ideally, we should instead use the "relation restriction equivalence"
mechanism to understand whether the "where" clause implies a no-op
update. This is because, for instance, right now we're not able to
deduce that the update is a no-op when the "where" clause only
transitively implies it, as in the case where we're setting "column a"
to "column c" and the "where" clause looks like:
  "column a = column b AND column b = column c".
If this means a regression for some users, we can consider doing it
that way. Until then, as a workaround, we can suggest adding additional
quals to the "where" clause that would directly imply equivalence.
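To illustrate what deducing a transitively implied no-op would require, the sketch below groups columns into equivalence classes with a small union-find structure. It is only an illustration of the general idea under stated assumptions, not the relation restriction equivalence code in Citus: `ColumnEquivalence`, `InitColumnEquivalence()`, `FindColumnClass()`, `AddEqualityQual()` and `ColumnsProvablyEqual()` are hypothetical names, and columns are assumed to be identified by small integer ids.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_COLUMNS 16

/* Hypothetical: parent[i] links column id i to its equivalence class. */
typedef struct ColumnEquivalence
{
	int parent[MAX_COLUMNS];
} ColumnEquivalence;

/* Initially, every column is alone in its own class. */
void
InitColumnEquivalence(ColumnEquivalence *eq)
{
	for (int i = 0; i < MAX_COLUMNS; i++)
	{
		eq->parent[i] = i;
	}
}

/* Returns the representative of the class that the column belongs to. */
int
FindColumnClass(ColumnEquivalence *eq, int column)
{
	assert(column >= 0 && column < MAX_COLUMNS);

	while (eq->parent[column] != column)
	{
		/* path halving: point the column at its grandparent */
		eq->parent[column] = eq->parent[eq->parent[column]];
		column = eq->parent[column];
	}

	return column;
}

/* Records a "left = right" qual by merging the two classes. */
void
AddEqualityQual(ColumnEquivalence *eq, int left, int right)
{
	int leftClass = FindColumnClass(eq, left);
	int rightClass = FindColumnClass(eq, right);

	if (leftClass != rightClass)
	{
		eq->parent[leftClass] = rightClass;
	}
}

/* True if the recorded quals imply a = b, directly or transitively. */
bool
ColumnsProvablyEqual(ColumnEquivalence *eq, int a, int b)
{
	return FindColumnClass(eq, a) == FindColumnClass(eq, b);
}
```

With ids 0, 1 and 2 standing for columns a, b and c, recording "a = b" and "b = c" places all three in one class, so "SET a = c" could be recognized as a no-op. A check that only looks for a direct "a = c" qual misses this, which is why the workaround is to add that qual explicitly.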

Also, after fixing TargetEntryChangesValue(), we started successfully
deducing that the update action is a no-op for such MERGE queries:
```sql
MERGE INTO dist_1
USING dist_1 src
ON (dist_1.a = src.b)
WHEN MATCHED THEN UPDATE SET a = src.b;
```
However, we then started seeing the error below for the query above,
even though the update is now qualified as a no-op update:
```
ERROR:  Unexpected column index of the source list
```
This was because of #8180, and #8201 fixed that.

In summary, with this PR:

* We disallow such queries,
  ```sql
  -- attno for dist_1.a, dist_1.b: 1, 2
  -- attno for dist_different_order_1.a, dist_different_order_1.b: 2, 1
  UPDATE dist_1 SET a = dist_different_order_1.b
  FROM dist_different_order_1
  WHERE dist_1.a = dist_different_order_1.a;

  -- attno for dist_1.a, dist_1.b: 1, 2
  -- but ON (..) doesn't imply a no-op update for SET expr
  MERGE INTO dist_1
  USING dist_1 src
  ON (dist_1.a = src.b)
  WHEN MATCHED THEN UPDATE SET a = src.a;
  ```

* .. and allow such queries,
  ```sql
  MERGE INTO dist_1
  USING dist_1 src
  ON (dist_1.a = src.b)
  WHEN MATCHED THEN UPDATE SET a = src.b;
  ```

(cherry picked from commit 5eb1d93be1)
(cherry picked from commit 2fd20b3bb5dcc4d24cdee5985cf97c2e37a2b5e6)
2025-09-30 15:27:38 +03:00
..
commands Actually sort includes after cherry-pick 2024-04-17 10:26:50 +02:00
metadata Actually sort includes after cherry-pick 2024-04-17 10:26:50 +02:00
utils Actually sort includes after cherry-pick 2024-04-17 10:26:50 +02:00
adaptive_executor.h Refactor executor utility functions into multiple files (#6593) 2023-03-31 13:07:48 +02:00
argutils.h Add the necessary changes for rebalance strategies on enterprise (#3325) 2019-12-19 15:23:08 +01:00
backend_data.h Avoid re-assigning the global pid for client backends and bg workers when the application_name changes (#7791) 2024-12-24 09:34:08 +03:00
background_jobs.h PG16 compatibility: ruleutils and successful CREATE EXTENSION (#7087) 2023-08-02 16:04:51 +03:00
cancel_utils.h add IsHoldOffCancellationReceived utility function (#3290) 2019-12-12 17:32:59 +03:00
causal_clock.h Remove unused macros 2022-10-28 10:38:07 -07:00
citus_acquire_lock.h remove copyright years (#3286) 2019-12-11 21:14:08 +03:00
citus_clauses.h Rename master evaluation to coordinator evaluation 2020-07-07 10:37:41 +02:00
citus_custom_scan.h Actually sort includes after cherry-pick 2024-04-17 10:26:50 +02:00
citus_depended_object.h Actually sort includes after cherry-pick 2024-04-17 10:26:50 +02:00
citus_nodefuncs.h Actually sort includes after cherry-pick 2024-04-17 10:26:50 +02:00
citus_nodes.h refactor table ddl events scoped for shards (#4342) 2020-11-26 13:31:59 +01:00
citus_ruleutils.h Actually sort includes after cherry-pick 2024-04-17 10:26:50 +02:00
citus_safe_lib.h Fix format attribute and IsLocalReplicationOriginSessionActive errors (#7055) 2023-07-13 17:41:57 +03:00
colocation_utils.h Actually sort includes after cherry-pick 2024-04-17 10:26:50 +02:00
combine_query_planner.h Actually sort includes after cherry-pick 2024-04-17 10:26:50 +02:00
commands.h Propagates SECURITY LABEL ON ROLE stmt (#7304) (#7735) 2024-11-13 14:21:08 +03:00
connection_management.h Actually sort includes after cherry-pick 2024-04-17 10:26:50 +02:00
coordinator_protocol.h Actually sort includes after cherry-pick 2024-04-17 10:26:50 +02:00
cte_inline.h Remove citus.enable_cte_inlining GUC 2022-03-22 17:14:44 +01:00
deparse_shard_query.h Actually sort includes after cherry-pick 2024-04-17 10:26:50 +02:00
deparser.h Propagates SECURITY LABEL ON ROLE stmt (#7304) (#7735) 2024-11-13 14:21:08 +03:00
directed_acyclic_graph_execution.h Fill in jobIdList field of DistributedExecution 2020-02-05 17:32:22 +00:00
distributed_deadlock_detection.h Actually sort includes after cherry-pick 2024-04-17 10:26:50 +02:00
distributed_execution_locks.h Actually sort includes after cherry-pick 2024-04-17 10:26:50 +02:00
distributed_planner.h Actually sort includes after cherry-pick 2024-04-17 10:26:50 +02:00
distribution_column.h Add non-blocking variant of create_distributed_table (#6087) 2022-08-30 15:35:40 +03:00
enterprise.h Actually sort includes after cherry-pick 2024-04-17 10:26:50 +02:00
error_codes.h Issue worker messages with the same log level 2020-04-14 21:08:25 +02:00
errormessage.h Actually sort includes after cherry-pick 2024-04-17 10:26:50 +02:00
executor_util.h Actually sort includes after cherry-pick 2024-04-17 10:26:50 +02:00
extended_op_node_utils.h Fix some more master->coordinator comments 2020-07-07 10:37:53 +02:00
foreign_key_relationship.h Actually sort includes after cherry-pick 2024-04-17 10:26:50 +02:00
function_call_delegation.h Force-delegated functions' distribution argument must be reset as soon as the routine completes execution, 2022-02-17 10:48:30 -08:00
function_utils.h Semmle: Fix obvious issues (#3502) 2020-02-21 10:16:00 +01:00
hash_helpers.h Actually sort includes after cherry-pick 2024-04-17 10:26:50 +02:00
insert_select_executor.h This pull request introduces support for nonroutable merge commands in the following scenarios: 2023-06-19 12:23:40 -07:00
insert_select_planner.h Actually sort includes after cherry-pick 2024-04-17 10:26:50 +02:00
intermediate_result_pruning.h Refactor executor utility functions into multiple files (#6593) 2023-03-31 13:07:48 +02:00
intermediate_results.h Actually sort includes after cherry-pick 2024-04-17 10:26:50 +02:00
jsonbutils.h Multi tenant monitoring (#6725) 2023-04-05 17:44:17 +03:00
listutils.h Actually sort includes after cherry-pick 2024-04-17 10:26:50 +02:00
local_distributed_join_planner.h Actually sort includes after cherry-pick 2024-04-17 10:26:50 +02:00
local_executor.h Refactor executor utility functions into multiple files (#6593) 2023-03-31 13:07:48 +02:00
local_multi_copy.h Allow local execution for intermediate results in COPY 2021-02-09 15:00:06 +01:00
local_plan_cache.h Deparse/parse the local cached queries 2021-06-21 12:24:29 +03:00
locally_reserved_shared_connections.h COPY uses adaptive connection management on local node 2021-02-04 09:45:07 +01:00
lock_graph.h Actually sort includes after cherry-pick 2024-04-17 10:26:50 +02:00
log_utils.h Remove unused functions (#6220) 2022-08-22 11:53:25 +03:00
maintenanced.h PG16 compatibility: ruleutils and successful CREATE EXTENSION (#7087) 2023-08-02 16:04:51 +03:00
memutils.h Implementation for asycn FinishConnectionListEstablishment (#2584) 2019-03-22 17:30:42 +01:00
merge_executor.h This pull request introduces support for nonroutable merge commands in the following scenarios: 2023-06-19 12:23:40 -07:00
merge_planner.h Actually sort includes after cherry-pick 2024-04-17 10:26:50 +02:00
metadata_cache.h Actually sort includes after cherry-pick 2024-04-17 10:26:50 +02:00
metadata_sync.h Speed up SequenceUsedInDistributedTable (#7579) 2024-04-17 10:26:50 +02:00
metadata_utility.h Actually sort includes after cherry-pick 2024-04-17 10:26:50 +02:00
multi_executor.h Actually sort includes after cherry-pick 2024-04-17 10:26:50 +02:00
multi_explain.h Actually sort includes after cherry-pick 2024-04-17 10:26:50 +02:00
multi_join_order.h In the MERGE join clause, there is a datatype mismatch between target's distribution column 2023-07-27 16:06:00 -07:00
multi_logical_optimizer.h Fix union pushdown issue (#5079) 2021-07-29 13:52:55 +03:00
multi_logical_planner.h Actually sort includes after cherry-pick 2024-04-17 10:26:50 +02:00
multi_logical_replication.h Actually sort includes after cherry-pick 2024-04-17 10:26:50 +02:00
multi_partitioning_utils.h Actually sort includes after cherry-pick 2024-04-17 10:26:50 +02:00
multi_physical_planner.h Properly detect no-op shard-key updates via UPDATE / MERGE (#8214) 2025-09-30 15:27:38 +03:00
multi_progress.h Actually sort includes after cherry-pick 2024-04-17 10:26:50 +02:00
multi_router_planner.h Actually sort includes after cherry-pick 2024-04-17 10:26:50 +02:00
multi_server_executor.h This pull request introduces support for nonroutable merge commands in the following scenarios: 2023-06-19 12:23:40 -07:00
namespace_utils.h get rid of {Push/Pop}OverrideSearchPath (#7145) 2023-09-05 17:40:22 +02:00
param_utils.h Fixes: #5787 In prepared statements, map any unused parameters 2022-05-13 19:31:05 -07:00
pg_dist_background_job.h Implement infrastructure to run sql jobs in the background (#6296) 2022-09-09 16:11:19 +03:00
pg_dist_background_task.h Adds control for background task executors involving a node (#6771) 2023-04-06 14:12:39 +03:00
pg_dist_backrgound_task_depend.h Implement infrastructure to run sql jobs in the background (#6296) 2022-09-09 16:11:19 +03:00
pg_dist_cleanup.h 'Deferred Drop' and robust 'Shard Cleanup' for Splits. (#6258) 2022-09-06 12:11:20 -07:00
pg_dist_colocation.h Add distributioncolumncollation to to pg_dist_colocation 2019-12-09 19:51:40 +00:00
pg_dist_local_group.h Remove copyright years (#2918) 2019-10-15 17:44:30 +03:00
pg_dist_node.h Replace master with citus in logs and comments (#5210) 2021-08-26 11:31:17 +03:00
pg_dist_node_metadata.h MERGE: Support reference table as source with local table as target 2023-05-02 11:37:29 -07:00
pg_dist_partition.h Dont auto-undistribute user-added citus local tables (#5314) 2021-10-28 12:10:26 +03:00
pg_dist_placement.h Fix some more master->coordinator comments 2020-07-07 10:37:53 +02:00
pg_dist_rebalance_strategy.h Implement an improvement threshold in the rebalancer (#4927) 2021-05-11 14:24:59 +02:00
pg_dist_schema.h Rename pg_dist tenant_schema to pg_dist_schema (#7001) 2023-06-14 12:12:15 +03:00
pg_dist_shard.h Remove cstore_fdw-related logic 2021-11-16 13:59:03 +01:00
pg_dist_transaction.h Remove copyright years (#2918) 2019-10-15 17:44:30 +03:00
placement_access.h Actually sort includes after cherry-pick 2024-04-17 10:26:50 +02:00
placement_connection.h Drop support for Inactive Shard placements 2021-10-22 18:03:35 +02:00
priority.h Support changing CPU priorities for backends and shard moves (#6126) 2022-08-16 13:07:17 +03:00
query_colocation_checker.h Actually sort includes after cherry-pick 2024-04-17 10:26:50 +02:00
query_pushdown_planning.h Error out for queries with outer joins and pseudoconstant quals in PG<17 (#7937) 2025-05-20 16:56:23 +02:00
query_stats.h PG 15 Compat: Resolve compile issues + shmem requests 2022-07-15 10:11:39 +02:00
query_utils.h Actually sort includes after cherry-pick 2024-04-17 10:26:50 +02:00
recursive_planning.h Add GUC for queries with outer joins and pseudoconstant quals (#8163) 2025-08-28 00:17:55 +03:00
reference_table_utils.h PR #6728  / commit - 5 2023-03-30 10:53:22 +03:00
relation_access_tracking.h Detach relation access tracking from connection management 2022-07-28 11:27:59 +02:00
relation_restriction_equivalence.h PG16 compatibility - one more outer join check (#7126) 2023-08-17 19:07:18 +03:00
relation_utils.h move pg_version_constants.h to toplevel include (#7335) 2024-04-17 10:26:50 +02:00
relay_utility.h Actually sort includes after cherry-pick 2024-04-17 10:26:50 +02:00
remote_commands.h CDC implementation for Citus using Logical Replication (#6623) 2023-03-28 16:00:21 +05:30
remote_transaction.h Actually sort includes after cherry-pick 2024-04-17 10:26:50 +02:00
repartition_executor.h This pull request introduces support for nonroutable merge commands in the following scenarios: 2023-06-19 12:23:40 -07:00
repartition_join_execution.h Enable re-partition joins after local execution 2022-02-23 19:40:21 +01:00
replicate_none_dist_table_shard.h Not undistribute Citus local table when converting it to a reference table / single-shard table 2023-08-29 12:57:28 +03:00
replication_origin_session_utils.h Actually sort includes after cherry-pick 2024-04-17 10:26:50 +02:00
resource_lock.h Actually sort includes after cherry-pick 2024-04-17 10:26:50 +02:00
run_from_same_connection.h Remove copyright years (#2918) 2019-10-15 17:44:30 +03:00
shard_cleaner.h Drop SHARD_STATE_TO_DELETE (#6494) 2023-01-03 14:38:16 +03:00
shard_pruning.h Actually sort includes after cherry-pick 2024-04-17 10:26:50 +02:00
shard_rebalancer.h Propagates SECURITY LABEL ON ROLE stmt (#7304) (#7735) 2024-11-13 14:21:08 +03:00
shard_split.h 'Deferred Drop' and robust 'Shard Cleanup' for Splits. (#6258) 2022-09-06 12:11:20 -07:00
shard_transfer.h Actually sort includes after cherry-pick 2024-04-17 10:26:50 +02:00
shard_utils.h Switch to sequential mode on long partition names 2021-04-14 15:27:50 +03:00
shardinterval_utils.h Actually sort includes after cherry-pick 2024-04-17 10:26:50 +02:00
shardsplit_logical_replication.h Add non-blocking variant of create_distributed_table (#6087) 2022-08-30 15:35:40 +03:00
shardsplit_shared_memory.h Share more code between splits and moves (#6152) 2022-08-15 20:21:51 +03:00
shared_connection_stats.h PG 15 Compat: Resolve compile issues + shmem requests 2022-07-15 10:11:39 +02:00
shared_library_init.h PG16 compatibility: ruleutils and successful CREATE EXTENSION (#7087) 2023-08-02 16:04:51 +03:00
statistics_collection.h Remove copyright years (#2918) 2019-10-15 17:44:30 +03:00
string_utils.h Make enterprise features open source (#6008) 2022-06-16 00:23:46 -07:00
subplan_execution.h Locally execute queries that don't need any data access (#3410) 2020-01-23 18:28:34 +01:00
task_execution_utils.h Remove task tracker executor (#3850) 2020-07-18 13:11:36 +03:00
tdigest_extension.h Feature: tdigest aggregate (#3897) 2020-06-12 13:50:28 +02:00
tenant_schema_metadata.h Make citus_stat_tenants work with schema-based tenants. (#6936) 2023-06-13 14:11:45 +03:00
time_constants.h refactor some of hard coded values in citus gucs (#3137) 2019-10-30 10:35:39 +03:00
transaction_identifier.h Remove copyright years (#2918) 2019-10-15 17:44:30 +03:00
transaction_management.h Actually sort includes after cherry-pick 2024-04-17 10:26:50 +02:00
transaction_recovery.h Delete transactions when removing node 2020-12-07 11:35:20 +03:00
transmit.h Remove old re-partitioning functions 2022-04-04 18:11:52 +02:00
tuple_destination.h Actually sort includes after cherry-pick 2024-04-17 10:26:50 +02:00
tuplestore.h Remove copyright years (#2918) 2019-10-15 17:44:30 +03:00
type_utils.h This implements a new UDF citus_get_cluster_clock() that returns a monotonically 2022-10-28 10:15:08 -07:00
version_compat.h Actually sort includes after cherry-pick 2024-04-17 10:26:50 +02:00
worker_create_or_replace.h Propagate CREATE PUBLICATION statements 2023-03-29 00:59:12 +02:00
worker_log_messages.h Issue worker messages with the same log level 2020-04-14 21:08:25 +02:00
worker_manager.h Actually sort includes after cherry-pick 2024-04-17 10:26:50 +02:00
worker_protocol.h Actually sort includes after cherry-pick 2024-04-17 10:26:50 +02:00
worker_shard_copy.h Exclude-Generated-Columns-In-Copy (#6721) 2023-03-07 18:15:50 +03:00
worker_shard_visibility.h Convert citus.hide_shards_from_app_name_prefixes to citus.show_shards_for_app_name_prefixes 2022-05-03 14:22:13 +02:00
worker_transaction.h Actually sort includes after cherry-pick 2024-04-17 10:26:50 +02:00