citus/src/backend/distributed/operations
Muhammad Usama be6668e440
Snapshot-Based Node Split – Foundation and Core Implementation (#8122)
**DESCRIPTION:**
This pull request introduces the foundation and core logic for the
snapshot-based node split feature in Citus. This feature enables
promoting a streaming replica (referred to as a clone in this feature
and UI) to a primary node and rebalancing shards between the original
and the newly promoted node without requiring a full data copy.

This significantly reduces rebalance times for scale-out operations
where the new node already contains a full copy of the data via
streaming replication.

Key Highlights:
**1. Replica (Clone) Registration & Management Infrastructure**

Introduces a new set of UDFs to register and manage clone nodes:

- citus_add_clone_node()
- citus_add_clone_node_with_nodeid()
- citus_remove_clone_node()
- citus_remove_clone_node_with_nodeid()

These functions allow administrators to register a streaming replica of
an existing worker node as a clone, making it eligible for later
promotion via snapshot-based split.

**2. Snapshot-Based Node Split (Core Implementation)**
New core UDF: 

- citus_promote_clone_and_rebalance()

This function implements the full workflow to promote a clone and
rebalance shards between the old and new primaries. Steps include:

1. Ensuring Exclusivity – Blocks any concurrent placement-changing
operations.
2. Blocking Writes – Temporarily blocks writes on the primary to ensure
consistency.
3. Replica Catch-up – Waits for the replica to be fully in sync.
4. Promotion – Promotes the replica to a primary using pg_promote.
5. Metadata Update – Updates metadata to reflect the newly promoted
primary node.
6. Shard Rebalancing – Redistributes shards between the old and new
primary nodes.
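
Steps 3 and 4 rest on standard PostgreSQL replication primitives. As a rough illustration of what "caught up" and "promoted" mean, the queries below could be run by hand. They use only built-in PostgreSQL views and functions and are not taken from the Citus implementation, which may perform these checks differently.

```sql
-- On the original worker (primary): how far behind is each standby, in bytes?
-- A replay_lag_bytes of 0 while writes are blocked means the standby is in sync.
SELECT application_name,
       state,
       replay_lsn,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes
FROM pg_stat_replication;

-- On the clone (standby): confirm it is still in recovery and see its replay position.
SELECT pg_is_in_recovery()      AS in_recovery,
       pg_last_wal_replay_lsn() AS replayed_up_to;

-- Step 4 corresponds to the built-in pg_promote() function (PostgreSQL 12+).
-- Shown commented out for illustration only: citus_promote_clone_and_rebalance()
-- performs the promotion itself, so it should not be run by hand in this workflow.
-- SELECT pg_promote(wait => true, wait_seconds => 60);
```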


**3. Split Plan Preview**
A new helper UDF get_snapshot_based_node_split_plan() provides a preview
of the shard distribution post-split, without executing the promotion.

**Example:**

```
reb 63796> select * from pg_catalog.get_snapshot_based_node_split_plan('127.0.0.1',5433,'127.0.0.1',5453);
  table_name  | shardid | shard_size | placement_node 
--------------+---------+------------+----------------
 companies    |  102008 |          0 | Primary Node
 campaigns    |  102010 |          0 | Primary Node
 ads          |  102012 |          0 | Primary Node
 mscompanies  |  102014 |          0 | Primary Node
 mscampaigns  |  102016 |          0 | Primary Node
 msads        |  102018 |          0 | Primary Node
 mscompanies2 |  102020 |          0 | Primary Node
 mscampaigns2 |  102022 |          0 | Primary Node
 msads2       |  102024 |          0 | Primary Node
 companies    |  102009 |          0 | Clone Node
 campaigns    |  102011 |          0 | Clone Node
 ads          |  102013 |          0 | Clone Node
 mscompanies  |  102015 |          0 | Clone Node
 mscampaigns  |  102017 |          0 | Clone Node
 msads        |  102019 |          0 | Clone Node
 mscompanies2 |  102021 |          0 | Clone Node
 mscampaigns2 |  102023 |          0 | Clone Node
 msads2       |  102025 |          0 | Clone Node
(18 rows)

```
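
Because the preview is returned as an ordinary result set, it can be aggregated like any other query. The sketch below counts the planned shards per node, using only the column names visible in the example output and the same host and port values.

```sql
-- Summarize the previewed split plan per target node.
-- Host and port arguments are the ones used in the example above.
SELECT placement_node,
       count(*)                   AS shard_count,
       count(DISTINCT table_name) AS table_count
FROM pg_catalog.get_snapshot_based_node_split_plan('127.0.0.1', 5433, '127.0.0.1', 5453)
GROUP BY placement_node
ORDER BY placement_node;
```

For the 18 rows shown above, this would report nine shards on each of the two nodes.
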
**4. Test Infrastructure Enhancements**

- Added a new test case scheduler for snapshot-based split scenarios.
- Enhanced pg_regress_multi.pl to support creating node backups with
slightly modified options to simulate real-world backup-based clone
creation.

**5. Usage Guide**
The snapshot-based node split can be performed using the following
workflow:

**- Take a Backup of the Worker Node**
Run pg_basebackup (or an equivalent tool) against the existing worker
node to create a physical backup.

`pg_basebackup -h <primary_worker_host> -p <port> -D /path/to/replica/data --write-recovery-conf`

**- Start the Replica Node**
Start PostgreSQL on the replica using the backup data directory,
ensuring it is configured as a streaming replica of the original worker
node.
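
Before registering the replica, it may be worth confirming that it is actually streaming from the original worker. The following is a minimal check that uses only built-in PostgreSQL views and functions. It is a suggested sanity check, not a step required by the feature.

```sql
-- Run on the replica: true means it is still a standby in recovery.
SELECT pg_is_in_recovery();

-- Run on the replica: a row with status 'streaming' indicates an active WAL receiver
-- connected to the expected upstream host and port.
SELECT status, sender_host, sender_port
FROM pg_stat_wal_receiver;

-- Run on the original worker: the replica should appear as a connected standby.
SELECT client_addr, state, sync_state
FROM pg_stat_replication;
```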

**- Register the Backup Node as a Clone**
Register the running replica as a clone of its original worker node:

`SELECT * FROM citus_add_clone_node('<clone_host>', <clone_port>, '<primary_host>', <primary_port>);`
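
The promotion step takes the clone's node ID. One way to look it up afterwards is sketched below, assuming the registered clone is recorded in the `pg_dist_node` catalog like other nodes. That is an assumption rather than something stated in this description, and the placeholders are the same as in the registration call.

```sql
-- Run on the coordinator: find the node ID assigned to the registered clone.
-- Assumes the clone appears in pg_dist_node; replace the placeholders
-- with the values passed to citus_add_clone_node().
SELECT nodeid, nodename, nodeport
FROM pg_dist_node
WHERE nodename = '<clone_host>'
  AND nodeport = <clone_port>;
```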

**- Promote and Rebalance the Clone**
Promote the clone to a primary and rebalance shards between it and the
original worker:

`SELECT * FROM citus_promote_clone_and_rebalance('clone_node_id');`

**- Drop Any Replication Slots from the Original Worker**
After promotion, clean up any unused replication slots from the original
worker:

`SELECT pg_drop_replication_slot('<slot_name>');`
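
To find the slot names to drop, the standard `pg_replication_slots` view can be queried on the original worker. A slot that is no longer active still retains WAL, and `pg_drop_replication_slot()` only succeeds on a slot that is not in use.

```sql
-- Run on the original worker: list replication slots and whether they are in use.
-- Slots with active = false that belonged to the promoted clone are candidates for removal.
SELECT slot_name, slot_type, active, restart_lsn
FROM pg_replication_slots
ORDER BY slot_name;
```
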
2025-08-19 14:13:55 +03:00
..
| File | Last commit | Date |
|------|-------------|------|
| citus_create_restore_point.c | Rename foreach_ macros to foreach_declared_ macros (#7700) | 2025-03-12 11:01:49 +03:00 |
| citus_split_shard_by_split_points.c | Sort includes (#7326) | 2023-11-23 18:19:54 +01:00 |
| citus_tools.c | Sort includes (#7326) | 2023-11-23 18:19:54 +01:00 |
| create_shards.c | Rename foreach_ macros to foreach_declared_ macros (#7700) | 2025-03-12 11:01:49 +03:00 |
| delete_protocol.c | Rename foreach_ macros to foreach_declared_ macros (#7700) | 2025-03-12 11:01:49 +03:00 |
| health_check.c | Rename foreach_ macros to foreach_declared_ macros (#7700) | 2025-03-12 11:01:49 +03:00 |
| isolate_shards.c | Sort includes (#7326) | 2023-11-23 18:19:54 +01:00 |
| modify_multiple_shards.c | Sort includes (#7326) | 2023-11-23 18:19:54 +01:00 |
| node_promotion.c | Snapshot-Based Node Split – Foundation and Core Implementation (#8122) | 2025-08-19 14:13:55 +03:00 |
| node_protocol.c | Add PG 18Beta1 compatibility (Build + RuleUtils) (#7981) | 2025-07-16 15:30:41 +03:00 |
| partitioning.c | Sort includes (#7326) | 2023-11-23 18:19:54 +01:00 |
| replicate_none_dist_table_shard.c | Rename foreach_ macros to foreach_declared_ macros (#7700) | 2025-03-12 11:01:49 +03:00 |
| shard_cleaner.c | Rename some more foreach_ptr to foreach_declared_ptr | 2025-03-13 15:13:56 +03:00 |
| shard_rebalancer.c | Snapshot-Based Node Split – Foundation and Core Implementation (#8122) | 2025-08-19 14:13:55 +03:00 |
| shard_split.c | Parallelize Shard Rebalancing & Unlock Concurrent Logical Shard Moves (#7983) | 2025-08-18 17:44:14 +03:00 |
| shard_transfer.c | Snapshot-Based Node Split – Foundation and Core Implementation (#8122) | 2025-08-19 14:13:55 +03:00 |
| stage_protocol.c | Propagates GRANT/REVOKE rights on table columns (#7918) | 2025-04-04 11:54:16 +03:00 |
| worker_copy_table_to_node_udf.c | Add PG 18Beta1 compatibility (Build + RuleUtils) (#7981) | 2025-07-16 15:30:41 +03:00 |
| worker_node_manager.c | Rename foreach_ macros to foreach_declared_ macros (#7700) | 2025-03-12 11:01:49 +03:00 |
| worker_shard_copy.c | Sort includes (#7326) | 2023-11-23 18:19:54 +01:00 |
| worker_split_copy_udf.c | Rename foreach_ macros to foreach_declared_ macros (#7700) | 2025-03-12 11:01:49 +03:00 |
| worker_split_shard_release_dsm_udf.c | Sort includes (#7326) | 2023-11-23 18:19:54 +01:00 |
| worker_split_shard_replication_setup_udf.c | Rename foreach_ macros to foreach_declared_ macros (#7700) | 2025-03-12 11:01:49 +03:00 |