Commit Graph

2 Commits (122a541c5896ce3d7abc125f9949fcf9d9dcac06)

Author SHA1 Message Date
Muhammad Usama 122a541c58 Add test for parallel reference table transfer and adjust tests for concurrent locking behavior
This commit introduces the following changes:
- New test case to verify parallel transfer of reference table shards
  during rebalance.

- Adjustments to existing test cases to reflect the revised locking strategy,
  which now permits concurrent shard movements within a colocation group.
  These changes align test expectations with the new execution model,
  where multiple shard transfers can progress in parallel.
  Additionally, with the updated locking mechanism, shard transfer tasks
  that involve minimal or no data movement may complete almost
  instantly—transitioning from running to done in a split second.
  As a result, some test assertions were updated accordingly.
  Note that this does not alter any underlying functionality,
  only the timing of task state transitions.

- A few indentation fixes are also part of this commit.
- Move the DROP statement for citus_rebalance_start before its CREATE statement.
- Fix the path in the downgrade script.
2025-08-11 13:24:58 +03:00
Muhammad Usama b86fde3520 Parallelize Shard Rebalancing & Enable Concurrent Logical Shard Movement
Citus’ shard rebalancer has some key performance bottlenecks:
- Sequential Movement of Reference Tables:
  Reference tables are often assumed to be small, but in real-world deployments
  they can grow quite large. Previously, reference table shards were
  transferred as a single unit, making the process monolithic and time-consuming.

- No Parallelism Within a Colocation Group:
  Although Citus distributes data using colocated shards, shard movements
  within the same colocation group were serialized.
  In environments with hundreds of distributed tables colocated together,
  this serialization significantly slowed down rebalance operations.

- Excessive Locking:
  The rebalancer used overly restrictive locks and redundant logical
  replication guards, further limiting concurrency.

The goal of this commit is to eliminate these inefficiencies and enable maximum
parallelism during rebalance, without compromising correctness or compatibility.

Parallelize shard rebalancing to reduce rebalance time

Feature Summary:

1. Parallel Reference Table Rebalancing
   Each reference-table shard is now copied in its own background task.
   Foreign key and other constraints are deferred until all shards are copied.

   To move a single shard without considering colocation, a new internal-only
   UDF, citus_internal_copy_single_shard_placement, is introduced to allow
   single-shard copy/move operations.
   Since this function is internal, users are not allowed to call it directly.

   **Temporary Hack to Set Background Task Context**
     Background tasks cannot currently set custom GUCs like application_name
     before executing internal-only functions.
     To work around this, we inject a SET LOCAL application_name TO
     'citus_rebalancer ...' statement as a prefix in the task command.
     This is a temporary hack to label internal tasks until proper GUC
     injection support is added to the background task executor.
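
     As a rough sketch, the command a background task ends up executing could
     look like the following. Note that the gpid suffix, shard ID, node names,
     and the UDF's argument list are all hypothetical illustrations, not taken
     from the actual implementation:

     ```sql
     -- Hypothetical sketch only: the 'gpid' value, shard ID, nodes, and the
     -- UDF's argument list are assumed for illustration.
     SET LOCAL application_name TO 'citus_rebalancer gpid=10000000001';
     SELECT pg_catalog.citus_internal_copy_single_shard_placement(
              102008,            -- shard id (illustrative)
              'worker-1', 5432,  -- source node
              'worker-2', 5432,  -- target node
              'force_logical');  -- transfer mode
     ```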

2. Changes in Locking Strategy
  - Dropped the leftover replication lock that previously serialized shard
    moves performed via logical replication.
    This lock was only needed when we used to drop and recreate the
    subscriptions/publications before each move. Since Citus now removes
    those objects later as part of the “unused distributed objects” cleanup,
    shard moves via logical replication can safely run in parallel without
    additional locking.

  - Introduced a per-shard advisory lock to prevent concurrent operations
    on the same shard while allowing maximum parallelism elsewhere.
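
    The idea can be illustrated with PostgreSQL's built-in advisory lock
    functions. This is only a conceptual sketch: the actual Citus code takes
    the lock through its internal lock machinery, and the shard ID below is
    hypothetical.

    ```sql
    -- Conceptual sketch, not the actual Citus implementation: use the
    -- shard ID as an advisory lock key so that two operations on the same
    -- shard serialize, while operations on different shards proceed freely.
    BEGIN;
    SELECT pg_advisory_xact_lock(102008);  -- hypothetical shard id as key
    -- ... copy or move shard 102008 here ...
    COMMIT;  -- transaction-scoped advisory lock is released automatically
    ```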

  - Changed the lock mode in AcquirePlacementColocationLock from
    ExclusiveLock to RowExclusiveLock to allow concurrent updates within
    the same colocation group, while still preventing concurrent DDL operations.
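
    The effect of the weaker lock mode can be illustrated with PostgreSQL's
    table-level lock semantics, where the same conflict matrix applies:
    ROW EXCLUSIVE does not conflict with itself, so two movers can hold it
    concurrently, while DDL-style ACCESS EXCLUSIVE locks still conflict with
    it. The table name below is hypothetical.

    ```sql
    -- Session A:
    BEGIN;
    LOCK TABLE some_colocated_table IN ROW EXCLUSIVE MODE;

    -- Session B (proceeds immediately: ROW EXCLUSIVE is self-compatible):
    BEGIN;
    LOCK TABLE some_colocated_table IN ROW EXCLUSIVE MODE;

    -- Session C (blocks: ACCESS EXCLUSIVE, as taken by most DDL, conflicts
    -- with the ROW EXCLUSIVE locks held by A and B):
    ALTER TABLE some_colocated_table ADD COLUMN note text;
    ```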

3. citus_rebalance_start() enhancements
    The citus_rebalance_start() function now accepts two new optional parameters:
     - parallel_transfer_colocated_shards BOOLEAN DEFAULT false
     - parallel_transfer_reference_tables BOOLEAN DEFAULT false
    Both default to false, which preserves the existing behavior and ensures
    backward compatibility. When both are set to true, the rebalancer
    operates with full parallelism.

  Previous Rebalancer Behavior:

  SELECT citus_rebalance_start(shard_transfer_mode := 'force_logical');
  This would:
    - Start a single background task that replicates all reference tables.
    - Then move all shards serially, one at a time.

     Task 1: replicate_reference_tables()
     ↓
     Task 2: move_shard_1()
     ↓
     Task 3: move_shard_2()
     ↓
     Task 4: move_shard_3()
    Slow and sequential. Reference table copy is a bottleneck.
    Colocated shards must wait for each other.

  New Parallel Rebalancer:

  SELECT citus_rebalance_start(
    shard_transfer_mode := 'force_logical',
    parallel_transfer_colocated_shards := true,
    parallel_transfer_reference_tables := true
  );
  This would:
    - Schedule independent background tasks for each reference-table shard.
    - Move colocated shards in parallel, while still maintaining
      dependency order.
    - Defer constraint application until all reference shards are in place.

      Task 1: copy_ref_shard_1()
      Task 2: copy_ref_shard_2()
      Task 3: copy_ref_shard_3()
        → Task 4: apply_constraints()
      ↓
     Task 5: copy_shard_1()
     Task 6: copy_shard_2()
     Task 7: copy_shard_3()
     ↓
     Task 8-10: move_shard_1..3()
   Each operation is scheduled independently and can run as
   soon as dependencies are satisfied.
2025-08-06 18:33:00 +03:00