PG18: syntax & semantics behavior in Citus, part 1.
Includes PG18 tests for:
- OLD/NEW support in RETURNING clause of DML queries (PG commit
80feb727c)
- WITHOUT OVERLAPS in PRIMARY KEY and UNIQUE constraints (PG commit
fc0438b4e)
- COLUMNS clause in JSON_TABLE (PG commit bb766cd)
- Foreign tables created with LIKE <table> clause (PG commit 302cf1575)
- Foreign Key constraint with PERIOD clause (PG commit 89f908a6d)
- COPY command REJECT_LIMIT option (PG commit 4ac2a9bec)
- COPY TABLE TO on a materialized view (PG commit 534874fac)
Partially addresses issue #8250
fixes#8279
PostgreSQL 18 planner changes (probably AIO and updated cost model) make
sequential scans cheaper, so the psql `\d table`-style query that uses a
regex on `pg_class.relname` no longer chooses an index scan. This causes
a plan difference in the `mx_hide_shard_names` regression test.
This patch adds an alternative out for the multi_mx_hide_shard_names
test.
This PR introduces infrastructure and validation to detect breaking
changes during Citus minor version upgrades, designed to run in release
branches only.
**Breaking change detection:**
- [GUCs] Detects removed GUCs and changes to default values
- [UDFs] Detects removed functions and function signature changes
-- Supports backward-compatible function overloading (new optional
parameters allowed)
- [types] Detects removed data types
- [tables/views] Detects removed tables/views and removed/changed
columns
- New make targets for minor version upgrade tests
- Follow-up PRs will add test schedules with different upgrade scenarios
The test will be enabled in release branches (e.g., release-13) via the
new test-citus-minor-upgrade job shown below. It will not run on the
main branch.
Testing
Verified locally with sample breaking changes:
`make check-citus-minor-upgrade-local citus-old-version=v13.2.0
`
**Test case 1:** Backward-compatible signature change (allowed)
```
-- Old: CREATE FUNCTION pg_catalog.citus_blocking_pids(pBlockedPid integer)
-- New: CREATE FUNCTION pg_catalog.citus_blocking_pids(pBlockedPid integer, pBlockedByPid integer DEFAULT NULL)
```
No breaking change detected (new parameter has DEFAULT)
**Test case 2:** Incompatible signature change (breaking)
```
-- Old: CREATE FUNCTION pg_catalog.citus_blocking_pids(pBlockedPid integer)
-- New: CREATE FUNCTION pg_catalog.citus_blocking_pids(pBlockedPid integer, pBlockedByPid integer)
```
Breaking change detected:
`UDF signature removed: pg_catalog.citus_blocking_pids(pblockedpid
integer) RETURNS integer[]`
**Test case 3:** GUC changes (breaking)
- Removed `citus.max_worker_nodes_tracked`
- Changed default value of `citus.max_shared_pool_size` from 0 to 4
Breaking change detected:
```
The default value of GUC citus.max_shared_pool_size was changed from 0 to 4
GUC citus.max_worker_nodes_tracked was removed
```
**Test case 4:** Table/view changes
- Dropped `pg_catalog.pg_dist_rebalance_strategy` and removed a column
from `pg_catalog.citus_lock_waits`
```
- Column blocking_nodeid in table/view pg_catalog.citus_lock_waits was removed
- Table/view pg_catalog.pg_dist_rebalance_strategy was removed
```
**Test case 5:** Remove a custom type
- Dropped `cluster_clock` and the objects depend on it. In addition to
the dependent objects, test shows:
```
- Type pg_catalog.cluster_clock was removed
```
Sample new job for build and test workflow (for release branches):
```
test-citus-minor-upgrade:
name: PG17 - check-citus-minor-upgrade
runs-on: ubuntu-latest
container:
image: "${{ needs.params.outputs.citusupgrade_image_name }}:${{ fromJson(needs.params.outputs.pg17_version).full }}${{ needs.params.outputs.image_suffix }}"
options: --user root
needs:
- params
- build
env:
citus_version: 13.2
steps:
- uses: actions/checkout@v4
- uses: "./.github/actions/setup_extension"
with:
skip_installation: true
- name: Install and test citus minor version upgrade
run: |-
gosu circleci \
make -C src/test/regress \
check-citus-minor-upgrade \
bindir=/usr/lib/postgresql/${PG_MAJOR}/bin \
citus-pre-tar=/install-pg${PG_MAJOR}-citus${citus_version}.tar \
citus-post-tar=${GITHUB_WORKSPACE}/install-$PG_MAJOR.tar;
- uses: "./.github/actions/save_logs_and_results"
if: always()
with:
folder: ${{ env.PG_MAJOR }}_citus_minor_upgrade
- uses: "./.github/actions/upload_coverage"
if: always()
with:
flags: ${{ env.PG_MAJOR }}_citus_minor_upgrade
codecov_token: ${{ secrets.CODECOV_TOKEN }}
```
fixes#8264
PostgreSQL 18 introduced a planner improvement (commit `ae4569161`) that
rewrites simple `OR` equality clauses into `= ANY(...)` forms, allowing
the use of a single index scan instead of multiple scans or a custom
scan.
This change affects the columnar path tests where queries like `a=0 OR
a=5` previously chose a Columnar or Seq Scan plan.
In this PR:
* Updated test expectations for `uses_custom_scan` and `uses_seq_scan`
to reflect the new index scan plan.
This keeps the test output consistent with PostgreSQL 18’s updated
planner behavior.
fixes#83170d8bd0a72e
PG18 changed the wording of connection failures during `CREATE
SUBSCRIPTION` to include a subscription prefix and a verbose “connection
to server … failed:” preamble. This breaks one regression output
(`multi_move_mx`). This PR adds normalization rules to map PG18 output
back to the prior form so results are stable across PG15–PG18.
**What changes**
Add two rules in `src/test/regress/bin/normalize.sed`:
```sed
# PG18: drop 'subscription "<name>"' prefix
# remove when PG18 is the minimum supported version
s/^[[:space:]]*ERROR:[[:space:]]+subscription "[^"]+" could not connect to the publisher:[[:space:]]*/ERROR: could not connect to the publisher: /I
# PG18: drop verbose 'connection to server … failed:' preamble
s/^[[:space:]]*ERROR:[[:space:]]+could not connect to the publisher:[[:space:]]*connection to server .* failed:[[:space:]]*/ERROR: could not connect to the publisher: /I
```
**Before (PG18)**
```
ERROR: subscription "subs_01" could not connect to the publisher:
connection to server at "localhost" (::1), port 57637 failed:
root certificate file "/non/existing/certificate.crt" does not exist
```
**After normalization**
```
ERROR: could not connect to the publisher:
root certificate file "/non/existing/certificate.crt" does not exist
```
**Why**
Maintain identical regression outputs across supported PG versions while
Citus still supports PG<18.
PostgreSQL 18 adds a new line to text EXPLAIN with ANALYZE (`Index
Searches: N`). That extra line both creates noise and bumps psql’s `(N
rows)` footer. This PR keeps ANALYZE (so statements still execute) while
removing the version-specific churn in our regress outputs.
### What changed
* **Use `explain_filter(...)` instead of raw text EXPLAIN**
* In `local_shard_execution.sql` and
`local_shard_execution_replicated.sql`, replace direct:
```sql
EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF, BUFFERS OFF)
<stmt>;
```
with:
```sql
\pset footer off
SELECT public.explain_filter('EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF,
TIMING OFF, BUFFERS OFF) <stmt>');
\pset footer on
```
* Expected files updated accordingly to show the `explain_filter` output
block instead of raw EXPLAIN text.
* **Extend `explain_filter` to drop the PG18 line**
* Filter now removes any `Index Searches: <number>` line before
normalizing numeric fields, preventing the “N” version of the same line
from sneaking in.
* **Keep suite-wide normalizer intact**
fixes#8318
PostgreSQL 18 started printing an extra line in `EXPLAIN (ANALYZE …)`
for index scans:
```
Index Searches: N
```
This makes our test output flap (extra line, footer row count changes)
while the intent of the test is simply to **prove we plan an index
scan**, not to assert runtime counters.
### What this PR does
```sql
EXPLAIN (... analyze on, ...)
```
to
```sql
EXPLAIN (... analyze off, ...)
```
### Why this approach
* **Minimal change**: keeps the test’s purpose (“we exercise an index
scan”) without introducing normalization rules
* **Version-stable**: avoids PG18’s new text while remaining valid on
PG15–PG18.
* **No behavioral impact**: we still validate the plan node is an Index
Scan; we just don’t request execution.
### Before → After (essence)
Before (PG18):
```
Aggregate
-> Index Scan ... (actual rows=1 loops=1)
Index Cond: ...
Index Searches: 1
```
After:
```
Aggregate
-> Index Scan ...
Index Cond: ...
```
Fixes#8275 by printing the names in order so that in every message
`DETAIL: x and y are not co-located` x precedes (or is lexicographically
less than) y.
fixes#8267
* Extend `src/test/regress/bin/normalize.sed` to drop the new PG18
EXPLAIN instrumentation line:
```
Storage: <Memory|Disk|Memory and Disk> Maximum Storage: <size>
```
which appears under `Materialize`, some `CTE Scan`s, etc. when `ANALYZE`
is on.
**Why**
* PG18 added storage usage reporting for materialization/tuplestore
nodes. It’s useful for humans but creates noisy, non-semantic diffs in
regression output. There’s no EXPLAIN flag to suppress it, so we
normalize in tests instead. This PR wires that normalization into our
sed pipeline.
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=1eff8279d
**How**
* Add a narrowly scoped sed rule that matches only lines starting with
`Storage:` (keeping `Sort Method`, `Hash`, `Buffers`, etc. intact). Use
ERE compatible with `sed -Ef` and Python `re` (no POSIX character
classes), e.g.:
```
/^[ \t]*Storage:[ \t].*$/d
```
fixes#8263
* Introduces `columnar_test_helpers._plan_json(q text) -> jsonb`, which
runs `EXPLAIN (FORMAT JSON, COSTS OFF, ANALYZE OFF)` and returns the
plan as `jsonb`. This lets us assert on plan structure instead of text
matching.
* Updates `columnar_indexes.sql` tests to detect whether any index-based
scan is used by searching the JSON plan’s `"Node Type"` (e.g., `Index
Scan`, `Index Only Scan`, `Bitmap Index Scan`).
## Notable changes
* New helper in `src/test/regress/sql/columnar_test_helpers.sql`:
* `EXECUTE format('EXPLAIN (FORMAT JSON, COSTS OFF, ANALYZE OFF) %s', q)
INTO j;`
* Returns the `jsonb` plan for downstream assertions.
```sql
SELECT NOT jsonb_path_exists(
columnar_test_helpers._plan_json('SELECT b FROM columnar_table WHERE b =
30000'),
'$[*].Plan.** ? (@."Node Type" like_regex "^(Index|Bitmap
Index).*Scan$")'
) AS uses_no_index_scan;
```
to verify no index scan occurs at the partial index boundary (`b =
30000`) and that an index scan is used where expected (`b = 30001`).
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
fixes#8277https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=45188c2ea
PostgreSQL 18 + newer OpenSSL builds surface `ssl_ciphers` as a **rule
string** (e.g., `HIGH:MEDIUM:+3DES:!aNULL`) instead of an expanded
cipher list. Our tests hard-pinned the literal list and started failing
on PG18. Also, with TLS 1.3 in the picture, we need to assert that
cipher configuration is sane without coupling to OpenSSL’s expansion.
**What changed**
* **sql/ssl_by_default.sql**
* Replace brittle `SHOW ssl_ciphers` string matching with invariant
checks:
* non-empty ciphers: `current_setting('ssl_ciphers') <> ''`
* looks like a rule/list: `position(':' in
current_setting('ssl_ciphers')) > 0`
* Run the same checks on **workers** via `run_command_on_workers`.
* Keep existing validations for `ssl=on`, `sslmode=require` in
`citus.node_conninfo`, and `pg_stat_ssl.ssl = true`.
* **expected/ssl_by_default.out**
* Update expected output to booleans for the new checks (less diff-prone
across PG/SSL variants).
Fixes#8268
PG18 added core async I/O infrastructure. Relevant PG commit:
https://github.com/postgres/postgres/commit/da72269
In `pg16.sql test`, we tested `REINDEX SYSTEM` command, which would
later cause io debug lines in a command of `multi_insert_select.sql`
test, that runs _after_ `pg16.sql` test.
"This is expected because the I/O subsystem is repopulating its internal
callback pool after heavy catalog I/O churn caused by REINDEX SYSTEM." -
chatgpt (seems right 😄)
This PR removes `REINDEX SYSTEM` test as its not crucial anyway.
`REINDEX DATABASE` is sufficient for the purpose presented in pg16.sql
test.
The failing queries all have a GROUP BY, and the fix teaches the Citus recursive planner how to handle a PG18 GROUP range table in the outer query:
- In recursive query planning, don't recurse into subquery expressions in a GROUP BY clause
- Flatten references to a GROUP rte before creating the worker subquery in pushdown planning
- If a PARAM node points to a GROUP rte then tunnel through to the underlying expression
Fixes#8296.
fixes#82650fbceae841
PostgreSQL 18 started printing an extra line in `EXPLAIN (ANALYZE …)`
for index scans:
```
Index Searches: N
```
**normalize.sed**: add a rule to remove the PG18-only line
```
/^\s*Index Searches:\s*\d+\s*$/d
```
0dca5d68d7fixes#8153
```diff
/citus/src/test/regress/expected/multi_subquery_misc.out
-- should error out
SELECT sql_subquery_test(1,1);
-ERROR: could not create distributed plan
-DETAIL: Possibly this is caused by the use of parameters in SQL functions, which is not supported in Citus.
-HINT: Consider using PL/pgSQL functions instead.
-CONTEXT: SQL function "sql_subquery_test" statement 1
+ sql_subquery_test
+-------------------
+ 307
+(1 row)
+
```
PostgreSQL 18 changes planner behavior for inlining/parameter handling
in SQL functions (pg18 commit `0dca5d68d`). As a result, a query in
`multi_subquery_misc` that previously failed to create a distributed
plan now succeeds. This PR updates the regression **expected** file to
reflect the new outcome on PG18+.
### What changed
* Updated `src/test/regress/expected/multi_subquery_misc.out`:
* Replaced the previous error block:
```
ERROR: could not create distributed plan
DETAIL: Possibly this is caused by the use of parameters in SQL
functions, which is not supported in Citus.
HINT: Consider using PL/pgSQL functions instead.
CONTEXT: SQL function "sql_subquery_test" statement 1
```
* with the actual successful result on PG18+:
```
sql_subquery_test
--------------------------------------------
307
(1 row)
```
* **PG < 18:** Behavior remains unchanged; the test still errors on
older versions.
* **PG ≥ 18:** The call succeeds; the updated expected output matches
actual results.
similar PR: https://github.com/citusdata/citus/pull/8184
fixes#8093c2a4078eba
- Enable buffer-usage reporting by default in `EXPLAIN ANALYZE` on
PostgreSQL 18 and above.
- Introduce the explicit `BUFFERS OFF` option in every existing
regression test to maintain pre-PG18 output consistency.
- This appends, `BUFFERS OFF` to all `EXPLAIN(...)` calls in
src/test/regress/sql and the corresponding .out files.
PG18 changed the names generated for child foreign key constraints.
https://github.com/postgres/postgres/commit/3db61db48
The test failures in Citus regression suite are all changing the name of
a constraint from `'sensors%'` to `'%to_parent%_1'`: the naming is very
nice here because `to_parent` means that we have a foreign key to a
parent table.
To fix the diff, we exclude those constraints from the output. To verify
correctness, we still count the problematic constraints to make sure
they are there - we are simply removing them from the first output (we
add this count query right after the previous one)
Fixes#8126
Co-authored-by: Mehmet YILMAZ <mehmety87@gmail.com>
This PR solves the following diffs, originating from the addition of
`varreturningtype` field to the `Var` struct in PG18:
https://github.com/postgres/postgres/commit/80feb727c
Previously we didn't account for this new field (as it's new), so this
wouldn't allow the parser to correctly reconstruct the `Var` node
structure, but rather it would error out with `did not find '}' at end
of input node`:
```diff
SELECT column_to_column_name(logicalrelid, partkey)
FROM pg_dist_partition WHERE partkey IS NOT NULL ORDER BY 1 LIMIT 1;
- column_to_column_name
----------------------------------------------------------------------
- a
-(1 row)
-
+ERROR: did not find '}' at end of input node
```
Solution follows precedent https://github.com/citusdata/citus/pull/7107,
when varnullingrels field was added to the `Var` struct in PG16.
The solution includes:
- Taking care of the `partkey` in `pg_dist_partition` table because it's
coming from the `Var` struct. This mainly includes fixing the upgrade
script to PG18, by saving all the `partkey` infos before upgrading to
PG18 (in `citus_prepare_pg_upgrade`), and then re-generating `partkey`
columns in `pg_dist_partition` (using `UPDATE`) after upgrading to PG18
(in `citus_finish_pg_upgrade`).
- Adding a normalize rule to fix output differences among PG versions.
Note that we need two normalize lines: one for PG15 since it doesn't
have `varnullingrels`, and one for PG16/PG17.
- Small trick on `metadata_sync_helpers` to use different text when
generating the `partkey`, based on the PG version.
Fixes#8189
3 minor changes to reduce some noise from the regression diffs.
1 - Reduce verbosity when ALTER EXTENSION fails
PG18 has improved reporting of errors in extension script files
Relevant PG commit:
https://github.com/postgres/postgres/commit/774171c4f
There was more context in PG18, so reducing verbosity
```
ALTER EXTENSION citus UPDATE TO '11.0-1';
ERROR: cstore_fdw tables are deprecated as of Citus 11.0
HINT: Install Citus 10.2 and convert your cstore_fdw tables to the
columnar access method before upgrading further
CONTEXT: PL/pgSQL function inline_code_block line 4 at RAISE
+SQL statement "DO LANGUAGE plpgsql
+$$
+BEGIN
+ IF EXISTS (SELECT 1 FROM pg_dist_shard where shardstorage = 'c') THEN
+ RAISE EXCEPTION 'cstore_fdw tables are deprecated as of Citus 11.0'
+ USING HINT = 'Install Citus 10.2 and convert your cstore_fdw tables
to the columnar access method before upgrading further';
+ END IF;
+END;
+$$"
+extension script file "citus--10.2-5--11.0-1.sql", near line 532
```
2 - Fix backend type order in tests for PG18
PG18 added another backend type which messed the order
in this test
Adding a separate IF condition for PG18
Relevant PG commit:
https://github.com/postgres/postgres/commit/18d67a8d7d
3 - Ignore "DEBUG: find_in_path" lines in output
Relevant PG commit:
https://github.com/postgres/postgres/commit/4f7f7b0375
The new GUC extension_control_path specifies a path to look for
extension control files.
6 minor changes to reduce some noise from the regression diffs.
1 - Add ORDER BY to fix subquery_in_where diff
2 - Disable buffers in explain analyze calls
Leftover work from
https://github.com/citusdata/citus/commit/f1f0b09f7
3 - Reduce verbosity to avoid diffs between PG versions
Relevant PG commit:
https://github.com/postgres/postgres/commit/0dca5d68d7
diff was:
```
CALL test_procedure_commit(2,5);
ERROR: COMMIT is not allowed in an SQL function
-CONTEXT: SQL function "test_procedure_commit" during startup
+CONTEXT: SQL function "test_procedure_commit" statement 2
```
4 - Rename array_sort to array_sort_citus since PG18 added array_sort
Relevant PG commit:
https://github.com/postgres/postgres/commit/6c12ae09f5a
Diff we were seeing in multi_array_agg, because the PG18 test was using
PG18's array_sort function instead:
```
-- Check that we return NULL in case there are no input rows to array_agg()
SELECT array_sort(array_agg(l_orderkey))
FROM lineitem WHERE l_orderkey < 0;
array_sort
------------
- {}
+
(1 row)
```
5 - Exclude not-null constraints from output to avoid diffs
PG18 has added pg_constraint rows for not-null constraints
Relevant PG commit
https://github.com/postgres/postgres/commit/14e87ffa5c
Remove them by condition contype <> 'n'
6 - Reduce verbosity to avoid md5 pwd deprecation warning in PG18
PG18 has deprecated MD5 passwords
Relevant PG commit:
https://github.com/postgres/postgres/commit/db6a4a985Fixes#8154Fixes#8157
Checksums are now on by default in PG18: 04bec894a
Upgrade to PG18 fails with the following error:
`old cluster does not use data checksums but the new one does`
To overcome this error, we add --data-checksums option such that
clusters with PG less than 18 also use data checksums.
Fixes#8229
DESCRIPTION: Normalize \d+ output in PG18 by filtering “Not-null
constraints” blocks
fixes#8095
**PR Description**
Postgres 18 started representing column `NOT NULL` as named constraints
in `pg_constraint`, and `psql \d+` now prints them under a `Not-null
constraints:` section. This caused extra diffs in our regression tests.
14e87ffa5c
This PR updates the normalization rules to strip those sections during
diff filtering by adding two regex rules:
* remove the `Not-null constraints:` header
* remove any indented constraint lines ending in `_not_null`
0dca5d68d7fixes#8153
```diff
diff -dU10 -w /__w/citus/citus/src/test/regress/expected/multi_sql_function.out /__w/citus/citus/src/test/regress/results/multi_sql_function.out
--- /__w/citus/citus/src/test/regress/expected/multi_sql_function.out.modified 2025-08-25 12:43:24.373634581 +0000
+++ /__w/citus/citus/src/test/regress/results/multi_sql_function.out.modified 2025-08-25 12:43:24.383634533 +0000
@@ -317,24 +317,25 @@
$$ LANGUAGE SQL STABLE;
INSERT INTO test_parameterized_sql VALUES(1, 1);
-- all of them should fail
SELECT * FROM test_parameterized_sql_function(1);
ERROR: cannot perform distributed planning on this query because parameterized queries for SQL functions referencing distributed tables are not supported
HINT: Consider using PL/pgSQL functions instead.
SELECT (SELECT 1 FROM test_parameterized_sql limit 1) FROM test_parameterized_sql_function(1);
ERROR: cannot perform distributed planning on this query because parameterized queries for SQL functions referencing distributed tables are not supported
HINT: Consider using PL/pgSQL functions instead.
SELECT test_parameterized_sql_function_in_subquery_where(1);
-ERROR: could not create distributed plan
-DETAIL: Possibly this is caused by the use of parameters in SQL functions, which is not supported in Citus.
-HINT: Consider using PL/pgSQL functions instead.
-CONTEXT: SQL function "test_parameterized_sql_function_in_subquery_where" statement 1
+ test_parameterized_sql_function_in_subquery_where
+---------------------------------------------------
+ 1
+(1 row)
+
```
allows custom vs. generic plans for SQL functions; arguments can be
folded to consts, enabling more rewrites/optimizations (and in your
case, routable Citus plans)
seems that P18 rewrote how LANGUAGE SQL functions are planned/executed:
they now go through the plan cache (like PL/pgSQL does) and can produce
custom plans with the function arguments substituted as constants. That
means your call
SELECT test_parameterized_sql_function_in_subquery_where(1);
is planned with org_id_val = 1 baked in, so Citus no longer sees an
unresolved Param inside the function body and is able to build a
distributed plan instead of tripping the old “params in SQL functions”
error path.
**What’s in here**
- Update `expected/multi_sql_function.out` to reflect PG18 behavior
- Add `expected/multi_sql_function_0.out` as an alternate expected file
that retains the pre-PG18 error output for the same test
This crash has been there for a while but wasn't tested before pg18.
PG18 added this test:
CREATE STATISTICS tst ON a FROM (VALUES (x)) AS foo;
which tries to create statistics on a derived-on-the-fly table (which is
not allowed) However Citus assumes we always have a valid table when
intercepting CREATE STATISTICS command to check for Citus tables
Added a check to return early if needed.
pg18 commit: https://github.com/postgres/postgres/commit/3eea4dc2cFixes#8212
DESCRIPTION: Fixes a bug that causes allowing UPDATE / MERGE queries
that may change the distribution column value.
Fixes: #8087.
Probably as of #769, we were not properly checking if UPDATE
may change the distribution column.
In #769, we had these checks:
```c
if (targetEntry->resno != column->varattno)
{
/* target entry of the form SET some_other_col = <x> */
isColumnValueChanged = false;
}
else if (IsA(setExpr, Var))
{
Var *newValue = (Var *) setExpr;
if (newValue->varattno == column->varattno)
{
/* target entry of the form SET col = table.col */
isColumnValueChanged = false;
}
}
```
However, what we check in "if" and in the "else if" are not so
different in the sense they both attempt to verify if SET expr
of the target entry points to the attno of given column. So, in
#5220, we even removed the first check because it was redundant.
Also see this PR comment from #5220:
https://github.com/citusdata/citus/pull/5220#discussion_r699230597.
In #769, probably we actually wanted to first check whether both
SET expr of the target entry and given variable are pointing to the
same range var entry, but this wasn't what the "if" was checking,
so removed.
As a result, in the cases that are mentioned in the linked issue,
we were incorrectly concluding that the SET expr of the target
entry won't change given column just because it's pointing to the
same attno as given variable, regardless of what range var entries
the column and the SET expr are pointing to. Then we also started
using the same function to check for such cases for update action
of MERGE, so we have the same bug there as well.
So with this PR, we properly check for such cases by comparing
varno as well in TargetEntryChangesValue(). However, then some of
the existing tests started failing where the SET expr doesn't
directly assign the column to itself but the "where" clause could
actually imply that the distribution column won't change. Even before
we were not attempting to verify if "where" cluse quals could imply a
no-op assignment for the SET expr in such cases but that was not a
problem. This is because, for the most cases, we were always qualifying
such SET expressions as a no-op update as long as the SET expr's
attno is the same as given column's. For this reason, to prevent
regressions, this PR also adds some extra logic as well to understand
if the "where" clause quals could imply that SET expr for the
distribution key is a no-op.
Ideally, we should instead use "relation restriction equivalence"
mechanism to understand if the "where" clause implies a no-op
update. This is because, for instance, right now we're not able to
deduce that the update is a no-op when the "where" clause transitively
implies a no-op update, as in the case where we're setting "column a"
to "column c" and where clause looks like:
"column a = column b AND column b = column c".
If this means a regression for some users, we can consider doing it
that way. Until then, as a workaround, we can suggest adding additional
quals to "where" clause that would directly imply equivalence.
Also, after fixing TargetEntryChangesValue(), we started successfully
deducing that the update action is a no-op for such MERGE queries:
```sql
MERGE INTO dist_1
USING dist_1 src
ON (dist_1.a = src.b)
WHEN MATCHED THEN UPDATE SET a = src.b;
```
However, we then started seeing below error for above query even
though now the update is qualified as a no-op update:
```
ERROR: Unexpected column index of the source list
```
This was because of #8180 and #8201 fixed that.
In summary, with this PR:
* We disallow such queries,
```sql
-- attno for dist_1.a, dist_1.b: 1, 2
-- attno for dist_different_order_1.a, dist_different_order_1.b: 2, 1
UPDATE dist_1 SET a = dist_different_order_1.b
FROM dist_different_order_1
WHERE dist_1.a dist_different_order_1.a;
-- attno for dist_1.a, dist_1.b: 1, 2
-- but ON (..) doesn't imply a no-op update for SET expr
MERGE INTO dist_1
USING dist_1 src
ON (dist_1.a = src.b)
WHEN MATCHED THEN UPDATE SET a = src.a;
```
* .. and allow such queries,
```sql
MERGE INTO dist_1
USING dist_1 src
ON (dist_1.a = src.b)
WHEN MATCHED THEN UPDATE SET a = src.b;
```
DESCRIPTION: Fixes a bug that causes an unexpected error when executing
repartitioned merge.
Fixes#8180.
This was happening because of a bug in
SourceResultPartitionColumnIndex(). And to fix it, this PR avoids
using DistributionColumnIndex() in SourceResultPartitionColumnIndex().
Instead, invents FindTargetListEntryWithVarExprAttno(), which finds
the index of the target entry in the source query's target list that
can be used to repartition the source for a repartitioned merge. In
short, to find the source target entry that refences the Var used in
ON (..) clause and that references the source rte, we should check the
varattno of the underlying expr, which presumably is always a Var for
repartitioned merge as we always wrap the source rte with a subquery,
where all target entries point to the columns of the original source
relation.
Using DistributionColumnIndex() prior to 13.0 wasn't causing such an
issue because prior to 13.0, the varattno of the underlying expr of
the source target entries was almost (*1) always equal to resno of the
target entry as we were including all target entries of the source
relation. However, starting with #7659, which is merged to main before
13.0, we started using CreateFilteredTargetListForRelation() instead of
CreateAllTargetListForRelation() to compute the target entry list for
the source rte to fix another bug. So we cannot revert to using
CreateAllTargetListForRelation() because otherwise we would re-introduce
bug that it helped fixing, so we instead had to find a way to properly
deal with the "filtered target list"s, as in this commit. Plus (*1),
even before #7659, probably we would still fail when the source relation
has dropped attributes or such because that would probably also cause
such a mismatch between the varattno of the underlying expr of the
target entry and its resno.
DESCRIPTION: Stabilize table_checks across PG15–PG18: switch to
pg_constraint, remove dupes, exclude NOT NUL
fixes#8138fixes#8131
**Problem**
```diff
diff -dU10 -w /__w/citus/citus/src/test/regress/expected/multi_create_table_constraints.out /__w/citus/citus/src/test/regress/results/multi_create_table_constraints.out
--- /__w/citus/citus/src/test/regress/expected/multi_create_table_constraints.out.modified 2025-08-18 12:26:51.991598284 +0000
+++ /__w/citus/citus/src/test/regress/results/multi_create_table_constraints.out.modified 2025-08-18 12:26:52.004598519 +0000
@@ -403,22 +403,30 @@
relid = 'check_example_partition_col_key_365068'::regclass;
Column | Type | Definition
---------------+---------+---------------
partition_col | integer | partition_col
(1 row)
SELECT "Constraint", "Definition" FROM table_checks WHERE relid='public.check_example_365068'::regclass;
Constraint | Definition
-------------------------------------+-----------------------------------
check_example_other_col_check | CHECK other_col >= 100
+ check_example_other_col_check | CHECK other_col >= 100
+ check_example_other_col_check | CHECK other_col >= 100
+ check_example_other_col_check | CHECK other_col >= 100
+ check_example_other_col_check | CHECK other_col >= 100
check_example_other_other_col_check | CHECK abs(other_other_col) >= 100
-(2 rows)
+ check_example_other_other_col_check | CHECK abs(other_other_col) >= 100
+ check_example_other_other_col_check | CHECK abs(other_other_col) >= 100
+ check_example_other_other_col_check | CHECK abs(other_other_col) >= 100
+ check_example_other_other_col_check | CHECK abs(other_other_col) >= 100
+(10 rows)
```
On PostgreSQL 18, `NOT NULL` is represented as a cataloged constraint
and surfaces through `information_schema.check_constraints`.
14e87ffa5c
Our helper view `table_checks` (built on
`information_schema.check_constraints` + `constraint_column_usage`)
started returning:
* Extra `…_not_null` rows (noise for our tests)
* Duplicate rows for real CHECKs due to the one-to-many join via
`constraint_column_usage`
* Occasional literal formatting differences (e.g., dates) coming from
the information\_schema deparser
### What changed
1. **Rewrite `table_checks` to use system catalogs directly**
We now select only expression-based, table-level constraints—excluding
NOT NULL—by filtering on `contype <> 'n'` and requiring `conbin IS NOT
NULL`. This yields the same effective set as real CHECKs while remaining
future-proof against non-CHECK constraint types.
```sql
CREATE OR REPLACE VIEW table_checks AS
SELECT
c.conname AS "Constraint",
'CHECK ' ||
-- drop a single pair of outer parens if the deparser adds them
regexp_replace(pg_get_expr(c.conbin, c.conrelid, true), '^\((.*)\)$', '\1')
AS "Definition",
c.conrelid AS relid
FROM pg_catalog.pg_constraint AS c
WHERE c.contype <> 'n' -- drop NOT NULL (PG18)
AND c.conbin IS NOT NULL -- only expression-bearing constraints (i.e., CHECKs)
AND c.conrelid <> 0 -- table-level only (exclude domains)
ORDER BY "Constraint", "Definition";
```
Why this filter?
* `contype <> 'n'` excludes PG18’s NOT NULL rows.
* `conbin IS NOT NULL` restricts to expression-backed constraints
(CHECKs); PK/UNIQUE/FK/EXCLUSION don’t have `conbin`.
* `conrelid <> 0` removes domain constraints.
2. **Add a PG18-specific regression test for `contype = 'n'`**
New test (`pg18_not_null_constraints`) verifies:
* Coordinator tables have `n` rows for NOT NULL (columns `a`, `c`),
* A worker shard has matching `n` rows,
* Dropping a NOT NULL on the coordinator propagates to shards (count
goes from 2 → 1),
* `table_checks` *never* reports NOT NULL, but does report a real CHECK
added for the test.
---
### Why this works (PG15–PG18)
* **Stable source of truth:** Directly reads `pg_constraint` instead of
`information_schema`.
* **No duplicates:** Eliminates the `constraint_column_usage` join,
removing multiplicity.
* **No NOT NULL noise:** PG18’s `contype = 'n'` is filtered out by
design.
* **Deterministic text:** Uses `pg_get_expr` and strips a single outer
set of parentheses for consistent output.
---
### Impact on tests
* Removes spurious `…_not_null` entries and duplicate `checky_…` rows
(e.g., in `multi_name_lengths` and similar).
* Existing expected files stabilize without adding brittle
normalizations.
* New PG18 test asserts correct catalog behavior and Citus propagation
while remaining a no-op on earlier PG versions.
---
14e87ffa5c
PostgreSQL 18 now records column `NOT NULL` constraints in
`pg_constraint` (`contype = 'n'`). That means queries that previously
listed “all constraints” for a relation now return extra rows, causing
noisy diffs in Citus regression tests. This PR narrows each catalog
probe to the specific constraint type under test
(PK/UNIQUE/EXCLUDE/CHECK), keeping results stable across PG15–PG18.
## What changed
* Update
`src/test/regress/sql/multi_alter_table_add_constraints_without_name.sql`
to:
* Add `AND con.contype IN ('p'|'u'|'x'|'c')` in each query, matching the
constraint just created.
* Join namespace via `rel.relnamespace` for robustness.
* Refresh
`src/test/regress/expected/multi_alter_table_add_constraints_without_name.out`
to reflect the filtered results.
## Why
* PG18 adds named `NOT NULL` entries to `pg_constraint`, which
previously lived only in `pg_attribute`. Tests that select from
`pg_constraint` without filtering now see extra rows (e.g.,
`*_not_null`), breaking expectations. Filtering by `contype` validates
exactly what the test intends (PK/UNIQUE/EXCLUDE/CHECK
naming/propagation) and ignores unrelated `NOT NULL` rows.
```diff
diff -dU10 -w /__w/citus/citus/src/test/regress/expected/multi_alter_table_add_constraints_without_name.out /__w/citus/citus/src/test/regress/results/multi_alter_table_add_constraints_without_name.out
--- /__w/citus/citus/src/test/regress/expected/multi_alter_table_add_constraints_without_name.out.modified 2025-09-11 14:36:52.521254512 +0000
+++ /__w/citus/citus/src/test/regress/results/multi_alter_table_add_constraints_without_name.out.modified 2025-09-11 14:36:52.549254440 +0000
@@ -20,34 +20,36 @@
ALTER TABLE AT_AddConstNoName.products ADD PRIMARY KEY(product_no);
SELECT con.conname
FROM pg_catalog.pg_constraint con
INNER JOIN pg_catalog.pg_class rel ON rel.oid = con.conrelid
INNER JOIN pg_catalog.pg_namespace nsp ON nsp.oid = connamespace
WHERE rel.relname = 'products';
conname
------------------------------
products_pkey
-(1 row)
+ products_product_no_not_null
+(2 rows)
-- Check that the primary key name created on the coordinator is sent to workers and
-- the constraints created for the shard tables conform to the <conname>_shardid naming scheme.
\c - - :public_worker_1_host :worker_1_port
SELECT con.conname
FROM pg_catalog.pg_constraint con
INNER JOIN pg_catalog.pg_class rel ON rel.oid = con.conrelid
INNER JOIN pg_catalog.pg_namespace nsp ON nsp.oid = connamespace
WHERE rel.relname = 'products_5410000';
conname
--------------------------------------
+ products_5410000_product_no_not_null
products_pkey_5410000
-(1 row)
+(2 rows)
```
after pr:
https://github.com/citusdata/citus/actions/runs/17697415668/job/50298622183#step:5:265
fixes#8186086c84b23d
PG18 emitting a more specific message for foreign-key violations when
the action is `RESTRICT` (SQLSTATE 23001), e.g.
`violates RESTRICT setting of foreign key constraint ...` and `Key (...)
is referenced from table ...`.
Older versions printed the generic FK text (SQLSTATE 23503), e.g.
`violates foreign key constraint ...` and `Key (...) is still referenced
from table ...`.
This change was causing noisy diffs in our regression tests (e.g.,
`multi_foreign_key.out`).
To keep a single set of expected files across PG15–PG18, this PR adds
two normalization rules to the test filter:
```sed
# PG18 FK wording -> legacy generic form
s/violates RESTRICT setting of foreign key constraint/violates foreign key constraint/g
# DETAIL line: "is referenced" -> "is still referenced"
s/\<is referenced from table\>/is still referenced from table/g
```
**Scope / impact**
* Test-only change; runtime behavior is unaffected.
* Keeps outputs stable across PG15–PG18 without version-splitting
expected files.
* Rules are narrowly targeted to the FK wording introduced in PG18.
with pr:
https://github.com/citusdata/citus/actions/runs/17698469722/job/50300960878#step:5:252
Also use pid in valgrind logfile name to avoid overwriting the valgrind
logs due to the memory errors that can happen in different processes
concurrently:
(from https://valgrind.org/docs/manual/manual-core.html)
```
--log-file=<filename>
Specifies that Valgrind should send all of its messages to the specified file. If the file name is empty, it causes an abort. There are three special format specifiers that can be used in the file name.
%p is replaced with the current process ID. This is very useful for program that invoke multiple processes. WARNING: If you use --trace-children=yes and your program invokes multiple processes OR your program forks without calling exec afterwards, and you don't use this specifier (or the %q specifier below), the Valgrind output from all those processes will go into one file, possibly jumbled up, and possibly incomplete.
```
With this change, we'll start having lots of valgrind output files
generated under "src/test/regress" with the same prefix,
citus_valgrind_test_log.txt, by default, during valgrind tests, so it'll
look a bit ugly; but one can use `cat
src/test/regress/citus_valgrind_test_log.txt.[0-9]*"` or such to combine
them into a single valgrind log file later.
Need to also check Postgres plan's rangetables for relations used in Initplans.
DESCRIPTION: Fix a bug in redundant WHERE clause detection; we need to
additionally check the Postgres plan's range tables for the presence of
citus tables, to account for relations that are referenced from scalar
subqueries.
There is a fundamental flaw in 4139370, the assumption that, after
Postgres planning has completed, all tables used in a query can be
obtained by walking the query tree. This is not the case for scalar
subqueries, which will be referenced by `PARAM` nodes. The fix adds an
additional check of the Postgres plan range tables; if there is at least
one citus table in there we do not need to change the needs distributed
planning flag.
Fixes#8159
- Downgrade replication lag reporting from NOTICE to DEBUG to reduce
noise and improve regression test stability.
- Add hints to certain replication status messages for better clarity.
- Update expected output files accordingly.
Test was not cleaning up after itself therefore failed consecutive runs
Test locally with:
make check-columnar-minimal
\ EXTRA_TESTS='columnar_chunk_filtering columnar_chunk_filtering'
DESCRIPTION: Normalize Actual Rows output in regression tests for PG18
compatibility
PostgreSQL 18 changed `EXPLAIN ANALYZE` to always print fractional row
counts (e.g. `1.00` instead of `1`).
95dbd827f2
This caused diffs across multiple output formats in Citus regression
tests:
* Text EXPLAIN: `actual rows=50.00` vs `actual rows=50`
* YAML: `Actual Rows: 1.00` vs `Actual Rows: 1`
* XML: `<Actual-Rows>1.00</Actual-Rows>` vs
`<Actual-Rows>1</Actual-Rows>`
* JSON: `"Actual Rows": 1.00` vs `"Actual Rows": 1`
* Placeholders: `rows=N.N` vs `rows=N`
This patch extends `normalize.sed` to strip trailing `.0…` from `Actual
Rows` in all supported formats and collapses placeholder values back to
`N`. With these changes, regression tests produce stable output across
PG15–PG18.
No functional changes to Citus itself — only test normalization was
updated.
Relevant PG18 commit:
c2a4078eba
- Enable buffer-usage reporting by default in `EXPLAIN ANALYZE` on
PostgreSQL 18 and above.
Solution:
- Introduce the explicit `BUFFERS OFF` option in every existing
regression test to maintain pre-PG18 output consistency.
- This appends, `BUFFERS OFF` to all `EXPLAIN ANALYZE(...)` calls in
src/test/regress/sql and the corresponding .out files.
fixes#8093
DESCRIPTION: Add `citus_stats` UDF
This UDF acts on a Citus table, and provides `null_frac`,
`most_common_vals` and `most_common_freqs` for each column in the table,
based on the definitions of these columns in the Postgres view
`pg_stats`.
**Aggregated Views: pg\_stats > citus\_stats**
citus\_stats, is a **view** intended for use in **Citus**, a distributed
extension of PostgreSQL. It collects and returns **column-level**
**statistics** for a distributed table—specifically, the **most common
values**, their **frequencies,** and **fraction of null values**, like
pg\_stats view does for regular Postgres tables.
**Use Case**
This view is useful when:
- You need **column-level insights** on a distributed table.
- You're performing **query optimization**, **cardinality estimation**,
or **data profiling** across shards.
**What It Returns**
A **table** with:
| Column Name | Data Type | Description |
|---------------------|-----------|-----------------------------------------------------------------------------|
| schemaname | text | Name of the schema containing the distributed
table |
| tablename | text | Name of the distributed table |
| attname | text | Name of the column (attribute) |
| null_frac | float4 | Estimated fraction of NULLs in the column across
all shards |
| most_common_vals | text[] | Array of most common values for the column
|
| most_common_freqs | float4[] | Array of corresponding frequencies (as
fractions) of the most common values|
**Caveats**
- The function assumes that the array of the most common values among
different shards will be the same, therefore it just adds everything up.
**DESCRIPTION:**
This pull request introduces the foundation and core logic for the
snapshot-based node split feature in Citus. This feature enables
promoting a streaming replica (referred to as a clone in this feature
and UI) to a primary node and rebalancing shards between the original
and the newly promoted node without requiring a full data copy.
This significantly reduces rebalance times for scale-out operations
where the new node already contains a full copy of the data via
streaming replication.
Key Highlights:
**1. Replica (Clone) Registration & Management Infrastructure**
Introduces a new set of UDFs to register and manage clone nodes:
- citus_add_clone_node()
- citus_add_clone_node_with_nodeid()
- citus_remove_clone_node()
- citus_remove_clone_node_with_nodeid()
These functions allow administrators to register a streaming replica of
an existing worker node as a clone, making it eligible for later
promotion via snapshot-based split.
**2. Snapshot-Based Node Split (Core Implementation)**
New core UDF:
- citus_promote_clone_and_rebalance()
This function implements the full workflow to promote a clone and
rebalance shards between the old and new primaries. Steps include:
1. Ensuring Exclusivity – Blocks any concurrent placement-changing
operations.
2. Blocking Writes – Temporarily blocks writes on the primary to ensure
consistency.
3. Replica Catch-up – Waits for the replica to be fully in sync.
4. Promotion – Promotes the replica to a primary using pg_promote.
5. Metadata Update – Updates metadata to reflect the newly promoted
primary node.
6. Shard Rebalancing – Redistributes shards between the old and new
primary nodes.
**3. Split Plan Preview**
A new helper UDF get_snapshot_based_node_split_plan() provides a preview
of the shard distribution post-split, without executing the promotion.
**Example:**
```
reb 63796> select * from pg_catalog.get_snapshot_based_node_split_plan('127.0.0.1',5433,'127.0.0.1',5453);
table_name | shardid | shard_size | placement_node
--------------+---------+------------+----------------
companies | 102008 | 0 | Primary Node
campaigns | 102010 | 0 | Primary Node
ads | 102012 | 0 | Primary Node
mscompanies | 102014 | 0 | Primary Node
mscampaigns | 102016 | 0 | Primary Node
msads | 102018 | 0 | Primary Node
mscompanies2 | 102020 | 0 | Primary Node
mscampaigns2 | 102022 | 0 | Primary Node
msads2 | 102024 | 0 | Primary Node
companies | 102009 | 0 | Clone Node
campaigns | 102011 | 0 | Clone Node
ads | 102013 | 0 | Clone Node
mscompanies | 102015 | 0 | Clone Node
mscampaigns | 102017 | 0 | Clone Node
msads | 102019 | 0 | Clone Node
mscompanies2 | 102021 | 0 | Clone Node
mscampaigns2 | 102023 | 0 | Clone Node
msads2 | 102025 | 0 | Clone Node
(18 rows)
```
**4 Test Infrastructure Enhancements**
- Added a new test case scheduler for snapshot-based split scenarios.
- Enhanced pg_regress_multi.pl to support creating node backups with
slightly modified options to simulate real-world backup-based clone
creation.
### 5. Usage Guide
The snapshot-based node split can be performed using the following
workflow:
**- Take a Backup of the Worker Node**
Run pg_basebackup (or an equivalent tool) against the existing worker
node to create a physical backup.
`pg_basebackup -h <primary_worker_host> -p <port> -D
/path/to/replica/data --write-recovery-conf
`
**- Start the Replica Node**
Start PostgreSQL on the replica using the backup data directory,
ensuring it is configured as a streaming replica of the original worker
node.
**- Register the Backup Node as a Clone**
Mark the registered replica as a clone of its original worker node:
`SELECT * FROM citus_add_clone_node('<clone_host>', <clone_port>,
'<primary_host>', <primary_port>);
`
**- Promote and Rebalance the Clone**
Promote the clone to a primary and rebalance shards between it and the
original worker:
`SELECT * FROM citus_promote_clone_and_rebalance('clone_node_id');
`
**- Drop Any Replication Slots from the Original Worker**
After promotion, clean up any unused replication slots from the original
worker:
`SELECT pg_drop_replication_slot('<slot_name>');
`