mirror of https://github.com/citusdata/citus.git
fixes #8364

PostgreSQL 18 changes VACUUM/ANALYZE to recurse into inheritance children by default, and introduces `ONLY` to limit processing to the parent.

Upstream change: [https://github.com/postgres/postgres/commit/62ddf7ee9](https://github.com/postgres/postgres/commit/62ddf7ee9)

For Citus tables, we should treat shard placements as “children” and avoid propagating `VACUUM/ANALYZE` to shards when the user explicitly asks for `ONLY`.

This PR adjusts the Citus VACUUM handling to align with PG18 semantics, and adds regression coverage on both regular distributed tables and partitioned distributed tables.

---

### Behavior changes

* Introduce a per-relation helper struct:

  ```c
  typedef struct CitusVacuumRelation
  {
      VacuumRelation *vacuumRelation;
      Oid relationId;
  } CitusVacuumRelation;
  ```

  This lets us keep both:

  * the resolved relation OID (for `IsCitusTable`, task building), and
  * the original `VacuumRelation` node (for column list and ONLY/inh flag).

* Replace the old `VacuumRelationIdList` / `ExtractVacuumTargetRels` flow with:

  ```c
  static List *VacuumRelationList(VacuumStmt *vacuumStmt, CitusVacuumParams vacuumParams);
  ```

  `VacuumRelationList` now:

  * Iterates over `vacuumStmt->rels`.
  * Resolves `relid` via `RangeVarGetRelidExtended` when `relation` is present.
  * Falls back to locking `VacuumRelation->oid` when only an OID is available.
  * Respects `VACOPT_FULL` for lock mode and `VACOPT_SKIP_LOCKED` for locking behavior.
  * Builds a `List *` of `CitusVacuumRelation` entries.

* Update:

  ```c
  IsDistributedVacuumStmt(List *vacuumRelationList);
  ExecuteVacuumOnDistributedTables(VacuumStmt *vacuumStmt, List *vacuumRelationList, CitusVacuumParams vacuumParams);
  ```

  to operate on `CitusVacuumRelation` instead of bare OIDs.
* Implement `ONLY` semantics in `ExecuteVacuumOnDistributedTables`:

  ```c
  RangeVar *relation = vacuumRelation->relation;
  if (relation != NULL && !relation->inh)
  {
      /* ONLY specified, so don't recurse to shard placements */
      continue;
  }
  ```

  Effect:

  * `VACUUM / ANALYZE` (no `ONLY`) on a Citus table: behavior unchanged, Citus creates tasks and propagates to shard placements.
  * `VACUUM ONLY <citus_table>` / `ANALYZE ONLY <citus_table>`:
    * Core still processes the coordinator relation as usual.
    * Citus **skips** building tasks for shard placements, so we do not recurse into distributed children.
  * The code compiles and behaves as before on pre-PG18; the new behavior becomes observable only when the core parser starts setting `inh = false` for `ONLY` (PG18).

* Unqualified `VACUUM` / `ANALYZE` (no rels) is unchanged and still handled via `ExecuteUnqualifiedVacuumTasks`.

* Remove now-redundant helpers:

  * `VacuumColumnList`
  * `ExtractVacuumTargetRels`

  Column lists are now taken directly from `vacuumRelation->va_cols` via `CitusVacuumRelation`.

---

### Testing

Extend `src/test/regress/sql/pg18.sql` and `expected/pg18.out` with two PG18-only blocks that verify we do not recurse into shard placements when `ONLY` is used:

1. **Simple distributed table (`pg18_vacuum_part`)**

   * Create and distribute a regular table:

     ```sql
     CREATE SCHEMA pg18_vacuum_part;
     SET search_path TO pg18_vacuum_part;
     CREATE TABLE vac_analyze_only (a int);
     SELECT create_distributed_table('vac_analyze_only', 'a');
     INSERT INTO vac_analyze_only VALUES (1), (2), (3);
     ```

   * On the coordinator:

     * Run `ANALYZE vac_analyze_only;` and later `ANALYZE ONLY vac_analyze_only;`.
     * Run `VACUUM vac_analyze_only;` and later `VACUUM ONLY vac_analyze_only;`.
   * On `worker_1`:

     * Capture `coalesce(max(last_analyze), 'epoch')` from `pg_stat_user_tables` for `vac_analyze_only_%` into `:analyze_before_only`, then assert:

       ```sql
       SELECT max(last_analyze) = :'analyze_before_only'::timestamptz
              AS analyze_only_skipped;
       ```

     * Capture `coalesce(max(last_vacuum), 'epoch')` into `:vacuum_before_only`, then assert:

       ```sql
       SELECT max(last_vacuum) = :'vacuum_before_only'::timestamptz
              AS vacuum_only_skipped;
       ```

   Both checks return `t`, confirming `ONLY` does not change `last_analyze` / `last_vacuum` on shard tables.

2. **Partitioned distributed table (`pg18_vacuum_part_dist`)**

   * Create a partitioned table whose parent is distributed:

     ```sql
     CREATE SCHEMA pg18_vacuum_part_dist;
     SET search_path TO pg18_vacuum_part_dist;
     SET citus.shard_count = 2;
     SET citus.shard_replication_factor = 1;
     CREATE TABLE part_dist (id int, v int) PARTITION BY RANGE (id);
     CREATE TABLE part_dist_1 PARTITION OF part_dist FOR VALUES FROM (1) TO (100);
     CREATE TABLE part_dist_2 PARTITION OF part_dist FOR VALUES FROM (100) TO (200);
     SELECT create_distributed_table('part_dist', 'id');
     INSERT INTO part_dist SELECT g, g FROM generate_series(1, 199) g;
     ```

   * On the coordinator:

     * Run `ANALYZE part_dist;` then `ANALYZE ONLY part_dist;`.
     * Run `VACUUM part_dist;` then `VACUUM ONLY part_dist;` (PG18 emits the expected warning: `VACUUM ONLY of partitioned table "part_dist" has no effect`).
   * On `worker_1`:

     * Capture `coalesce(max(last_analyze), 'epoch')` for `part_dist_%` into `:analyze_before_only`, then assert:

       ```sql
       SELECT max(last_analyze) = :'analyze_before_only'::timestamptz
              AS analyze_only_partitioned_skipped;
       ```

     * Capture `coalesce(max(last_vacuum), 'epoch')` into `:vacuum_before_only`, then assert:

       ```sql
       SELECT max(last_vacuum) = :'vacuum_before_only'::timestamptz
              AS vacuum_only_partitioned_skipped;
       ```

   Both checks return `t`, confirming that even for a partitioned distributed parent, `VACUUM/ANALYZE ONLY` does not recurse into shard placements, and Citus behavior matches PG18’s “ONLY = parent only” semantics.
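
The `ONLY` decision described under "Behavior changes" can be modeled in isolation. The following is a minimal sketch, not the Citus implementation: every `Mock*` type, the `isCitusTable` field (a stand-in for calling `IsCitusTable`), and the helpers `ShouldPropagateToShards` and `CountPropagatedRelations` are hypothetical names introduced only for illustration. It assumes, as the description above suggests, that a relation is propagated to shard placements only when it is a Citus table and the statement did not say `ONLY`.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for PostgreSQL's RangeVar; only the fields used here. */
typedef struct MockRangeVar
{
	const char *relname;
	bool inh;                       /* false when the statement said ONLY */
} MockRangeVar;

/* Hypothetical stand-in for PostgreSQL's VacuumRelation. */
typedef struct MockVacuumRelation
{
	MockRangeVar *relation;         /* NULL when only an OID is available */
	unsigned int oid;
} MockVacuumRelation;

/* Hypothetical stand-in for CitusVacuumRelation, plus an assumed flag. */
typedef struct MockCitusVacuumRelation
{
	MockVacuumRelation *vacuumRelation;
	unsigned int relationId;
	bool isCitusTable;              /* assumption: result of an IsCitusTable-style check */
} MockCitusVacuumRelation;

/*
 * Returns true when shard-placement tasks would be built for this entry:
 * the relation is a Citus table and ONLY was not specified.
 */
static bool
ShouldPropagateToShards(const MockCitusVacuumRelation *entry)
{
	const MockRangeVar *relation = entry->vacuumRelation->relation;

	if (!entry->isCitusTable)
	{
		return false;
	}

	if (relation != NULL && !relation->inh)
	{
		/* ONLY specified, so don't recurse to shard placements */
		return false;
	}

	return true;
}

/* Counts how many entries in the array would be propagated to shards. */
static size_t
CountPropagatedRelations(const MockCitusVacuumRelation *entries, size_t count)
{
	size_t propagated = 0;

	for (size_t i = 0; i < count; i++)
	{
		if (ShouldPropagateToShards(&entries[i]))
		{
			propagated++;
		}
	}

	return propagated;
}
```

In this model an entry with no `RangeVar` (OID only) is still propagated, because there is no `inh` flag to consult; this mirrors the `relation != NULL` guard in the snippet quoted in the description.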

| Name |
|---|
| .. |
| columnar |
| distributed |
| .gitignore |