citus/src/backend
Mehmet YILMAZ 31911d8297
PG18 – Respect VACUUM/ANALYZE ONLY semantics for Citus tables (#8365)
fixes #8364

PostgreSQL 18 changes VACUUM/ANALYZE to recurse into inheritance
children by default, and introduces `ONLY` to limit processing to the
parent. Upstream change:

[https://github.com/postgres/postgres/commit/62ddf7ee9](https://github.com/postgres/postgres/commit/62ddf7ee9)

For Citus tables, we should treat shard placements as “children” and
avoid propagating `VACUUM/ANALYZE` to shards when the user explicitly
asks for `ONLY`.

This PR adjusts the Citus VACUUM handling to align with PG18 semantics,
and adds regression coverage on both regular distributed tables and
partitioned distributed tables.

---

### Behavior changes

* Introduce a per-relation helper struct:

  ```c
  typedef struct CitusVacuumRelation
  {
      VacuumRelation *vacuumRelation;
      Oid             relationId;
  } CitusVacuumRelation;
  ```

  This lets us keep both:

  * the resolved relation OID (for `IsCitusTable` and task building), and
  * the original `VacuumRelation` node (for the column list and the
    ONLY/`inh` flag).

* Replace the old `VacuumRelationIdList` / `ExtractVacuumTargetRels`
flow with:

  ```c
  static List *VacuumRelationList(VacuumStmt *vacuumStmt,
                                  CitusVacuumParams vacuumParams);
  ```

  `VacuumRelationList` now:

  * Iterates over `vacuumStmt->rels`.
  * Resolves `relid` via `RangeVarGetRelidExtended` when `relation` is
    present.
  * Falls back to locking `VacuumRelation->oid` when only an OID is
    available.
  * Respects `VACOPT_FULL` for the lock mode and `VACOPT_SKIP_LOCKED` for
    the locking behavior.
  * Builds a `List *` of `CitusVacuumRelation` entries.

* Update:

  ```c
  IsDistributedVacuumStmt(List *vacuumRelationList);
  ExecuteVacuumOnDistributedTables(VacuumStmt *vacuumStmt,
                                   List *vacuumRelationList,
                                   CitusVacuumParams vacuumParams);
  ```

  to operate on `CitusVacuumRelation` instead of bare OIDs.

* Implement `ONLY` semantics in `ExecuteVacuumOnDistributedTables`:

  ```c
  RangeVar *relation = vacuumRelation->relation;
  if (relation != NULL && !relation->inh)
  {
      /* ONLY specified, so don't recurse to shard placements */
      continue;
  }
  ```

  Effect:

  * `VACUUM` / `ANALYZE` (no `ONLY`) on a Citus table: behavior is
    unchanged; Citus creates tasks and propagates to shard placements.
  * `VACUUM ONLY <citus_table>` / `ANALYZE ONLY <citus_table>`:

    * Core still processes the coordinator relation as usual.
    * Citus **skips** building tasks for shard placements, so we do not
      recurse into distributed children.
  * On pre-PG18 versions the code compiles and behaves as before; the new
    behavior becomes observable only on PG18, where the core parser sets
    `inh = false` for `ONLY`.

* Unqualified `VACUUM` / `ANALYZE` (no rels) is unchanged and still
handled via `ExecuteUnqualifiedVacuumTasks`.

* Remove now-redundant helpers:

  * `VacuumColumnList`
  * `ExtractVacuumTargetRels`

Column lists are now taken directly from `vacuumRelation->va_cols` via
`CitusVacuumRelation`.
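
The per-relation flow described above can be condensed into a small, self-contained C sketch. It is not the Citus implementation: the node structs are trimmed stand-ins with only the fields read here, and `SKETCH_VACOPT_*`, `SketchLockMode`, `VacuumLockMode`, `SkipTarget`, `IsCitusTableStub`, and `PropagatesToShards` are hypothetical names with invented OIDs. It assumes the lock choice mirrors core PostgreSQL (`ShareUpdateExclusiveLock`, or `AccessExclusiveLock` for `VACUUM FULL`) and reuses the `ONLY` check from `ExecuteVacuumOnDistributedTables`.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Minimal stand-ins for the PostgreSQL node types involved (hypothetical).
 * Only the fields this sketch reads are modeled; the real structs have more.
 */
typedef unsigned int Oid;

typedef struct RangeVar
{
	const char *relname;
	bool inh;                        /* false when the user wrote ONLY */
} RangeVar;

typedef struct VacuumRelation
{
	RangeVar *relation;              /* NULL when only an OID is known */
	Oid oid;
} VacuumRelation;

typedef struct CitusVacuumRelation
{
	VacuumRelation *vacuumRelation;  /* original node: ONLY flag, columns */
	Oid relationId;                  /* resolved OID: metadata lookups */
} CitusVacuumRelation;

/* Illustrative option bits; PostgreSQL's actual VACOPT_* values differ. */
#define SKETCH_VACOPT_FULL        (1 << 0)
#define SKETCH_VACOPT_SKIP_LOCKED (1 << 1)

typedef enum SketchLockMode
{
	SKETCH_SHARE_UPDATE_EXCLUSIVE,
	SKETCH_ACCESS_EXCLUSIVE
} SketchLockMode;

/* VACUUM FULL rewrites the table, so it takes the stronger lock. */
SketchLockMode
VacuumLockMode(int options)
{
	return (options & SKETCH_VACOPT_FULL) ? SKETCH_ACCESS_EXCLUSIVE
										  : SKETCH_SHARE_UPDATE_EXCLUSIVE;
}

/*
 * With SKIP_LOCKED, an unavailable lock drops the target. Without it,
 * real code would wait for the lock; this sketch just reports "don't skip".
 */
bool
SkipTarget(int options, bool lockAvailable)
{
	return !lockAvailable && (options & SKETCH_VACOPT_SKIP_LOCKED) != 0;
}

/* Hypothetical stand-in for Citus metadata: OIDs of Citus tables. */
static const Oid citusTableIds[] = { 16384, 16390 };

bool
IsCitusTableStub(Oid relationId)
{
	for (size_t i = 0; i < sizeof(citusTableIds) / sizeof(citusTableIds[0]); i++)
	{
		if (citusTableIds[i] == relationId)
		{
			return true;
		}
	}
	return false;
}

/* Should shard-placement tasks be built for this target? */
bool
PropagatesToShards(const CitusVacuumRelation *target)
{
	const RangeVar *relation = target->vacuumRelation->relation;

	if (!IsCitusTableStub(target->relationId))
	{
		return false;            /* not a Citus table: core handles it alone */
	}

	if (relation != NULL && !relation->inh)
	{
		return false;            /* ONLY specified: stay on the coordinator */
	}

	return true;                 /* no ONLY, or an OID-only target */
}
```

Keeping the `VacuumRelation` pointer next to the resolved OID lets `PropagatesToShards` consult both the metadata (through the OID) and the `ONLY` flag (through `relation->inh`) in one place.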

---

### Testing

Extend `src/test/regress/sql/pg18.sql` and `expected/pg18.out` with two
PG18-only blocks that verify we do not recurse into shard placements
when `ONLY` is used:

1. **Simple distributed table (`pg18_vacuum_part`)**

   * Create and distribute a regular table:

     ```sql
     CREATE SCHEMA pg18_vacuum_part;
     SET search_path TO pg18_vacuum_part;
     CREATE TABLE vac_analyze_only (a int);
     SELECT create_distributed_table('vac_analyze_only', 'a');
     INSERT INTO vac_analyze_only VALUES (1), (2), (3);
     ```

   * On the coordinator:

     * Run `ANALYZE vac_analyze_only;` and later
       `ANALYZE ONLY vac_analyze_only;`.
     * Run `VACUUM vac_analyze_only;` and later
       `VACUUM ONLY vac_analyze_only;`.

   * On `worker_1`:

     * Capture `coalesce(max(last_analyze), 'epoch')` from
       `pg_stat_user_tables` for `vac_analyze_only_%` into
       `:analyze_before_only`, then assert:

       ```sql
       SELECT max(last_analyze) = :'analyze_before_only'::timestamptz
              AS analyze_only_skipped;
       ```

     * Capture `coalesce(max(last_vacuum), 'epoch')` into
       `:vacuum_before_only`, then assert:

       ```sql
       SELECT max(last_vacuum) = :'vacuum_before_only'::timestamptz
              AS vacuum_only_skipped;
       ```

   Both checks return `t`, confirming that `ONLY` does not change
   `last_analyze` / `last_vacuum` on the shard tables.

2. **Partitioned distributed table (`pg18_vacuum_part_dist`)**

   * Create a partitioned table whose parent is distributed:

     ```sql
     CREATE SCHEMA pg18_vacuum_part_dist;
     SET search_path TO pg18_vacuum_part_dist;
     SET citus.shard_count = 2;
     SET citus.shard_replication_factor = 1;

     CREATE TABLE part_dist (id int, v int) PARTITION BY RANGE (id);
     CREATE TABLE part_dist_1 PARTITION OF part_dist
         FOR VALUES FROM (1) TO (100);
     CREATE TABLE part_dist_2 PARTITION OF part_dist
         FOR VALUES FROM (100) TO (200);

     SELECT create_distributed_table('part_dist', 'id');
     INSERT INTO part_dist SELECT g, g FROM generate_series(1, 199) g;
     ```

   * On the coordinator:

     * Run `ANALYZE part_dist;` then `ANALYZE ONLY part_dist;`.
     * Run `VACUUM part_dist;` then `VACUUM ONLY part_dist;` (PG18 emits
       the expected warning:
       `VACUUM ONLY of partitioned table "part_dist" has no effect`).

   * On `worker_1`:

     * Capture `coalesce(max(last_analyze), 'epoch')` for `part_dist_%`
       into `:analyze_before_only`, then assert:

       ```sql
       SELECT max(last_analyze) = :'analyze_before_only'::timestamptz
              AS analyze_only_partitioned_skipped;
       ```

     * Capture `coalesce(max(last_vacuum), 'epoch')` into
       `:vacuum_before_only`, then assert:

       ```sql
       SELECT max(last_vacuum) = :'vacuum_before_only'::timestamptz
              AS vacuum_only_partitioned_skipped;
       ```

   Both checks return `t`, confirming that even for a partitioned
   distributed parent, `VACUUM/ANALYZE ONLY` does not recurse into shard
   placements, and Citus behavior matches PG18's "ONLY = parent only"
   semantics.
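
Both regression blocks rely on the same invariant: if `ONLY` is honored, the newest shard `last_analyze` (or `last_vacuum`) on the worker cannot move. The sketch below models that before/after check in memory. It is a toy model, not Citus or PostgreSQL code: `ToyShard`, `ToyAnalyze`, `MaxLastAnalyze`, and `MaxUnchangedAfterSecondAnalyze` are hypothetical, timestamps are plain integers, and `0` stands in for the `'epoch'` fallback from `coalesce`.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model of one shard's pg_stat_user_tables row; 0 means "never analyzed". */
typedef struct ToyShard
{
	long lastAnalyze;
} ToyShard;

/*
 * Toy ANALYZE on a distributed table: without ONLY, every shard placement
 * is analyzed at time `now`; with ONLY, the shards are left untouched.
 */
void
ToyAnalyze(ToyShard *shards, size_t shardCount, bool only, long now)
{
	if (only)
	{
		return;                  /* coordinator only: no shard tasks */
	}

	for (size_t i = 0; i < shardCount; i++)
	{
		shards[i].lastAnalyze = now;
	}
}

/* Mirrors coalesce(max(last_analyze), 'epoch'), with 0 as the epoch. */
long
MaxLastAnalyze(const ToyShard *shards, size_t shardCount)
{
	long max = 0;

	for (size_t i = 0; i < shardCount; i++)
	{
		if (shards[i].lastAnalyze > max)
		{
			max = shards[i].lastAnalyze;
		}
	}
	return max;
}

/*
 * Replays the test sequence: a plain ANALYZE, capture the maximum, run a
 * second ANALYZE (with or without ONLY), and report whether it stayed put.
 */
bool
MaxUnchangedAfterSecondAnalyze(bool secondIsOnly)
{
	ToyShard shards[4] = { { 0 }, { 0 }, { 0 }, { 0 } };

	ToyAnalyze(shards, 4, false, 100);
	long analyzeBeforeOnly = MaxLastAnalyze(shards, 4);

	ToyAnalyze(shards, 4, secondIsOnly, 200);
	return MaxLastAnalyze(shards, 4) == analyzeBeforeOnly;
}
```

Running the plain `ANALYZE` first matters: it guarantees the shards have a non-null timestamp, so the later comparison is a meaningful equality rather than a comparison against the fallback value.
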
2025-12-05 16:50:42 +03:00
columnar Keep temp reloid for columnar cases (#8309) 2025-11-06 23:15:52 +03:00
distributed PG18 – Respect VACUUM/ANALYZE ONLY semantics for Citus tables (#8365) 2025-12-05 16:50:42 +03:00
.gitignore Initial commit of Citus 5.0 2016-02-11 04:05:32 +02:00