Disable DDL propagation for the vanilla test suite. This enables the
vanilla `database ` test to pass, where previously it was correctly
returning `ERROR: unrecognized ALTER DATABASE option: tablespace`
because release-13.0 does not propagate this ALTER DATABASE variant.
We (Citus team) discussed cherry picking
[#7253](https://github.com/citusdata/citus/pull/7253) from main to
release-13.0 because it does propagate ALTER DATABASE tablespace option
(as well as a couple of others) but decided fixing the regress test was
not the proper context for that. The fix disables
`citus.enable_metadata_sync` when running vanilla, we discussed
disabling `citus.enable_create_database_propagation` but this is not in
release-13.0.
Preserve the test error message by adjusting the query so that PG17
cannot pull it up to a join. Another instance of a subquery that can be
pulled up to a join with PG17 (#7745)
This should have been fixed in, but slipped by, #7745
In PG17, Auto-generated array types, multirange types, and relation
rowtypes
are treated as dependent objects, hence changing the output of the
print_extension_changes function.
Relevant PG commit:
e5bc9454e527b1cba97553531d8d4992892fdeef
e5bc9454e5
Here we create a table with only the basic extension types
in order to avoid printing extra ones for now.
This can be removed when we drop PG16 support.
https://github.com/citusdata/citus/actions/runs/11960253650/attempts/1#summary-33343972656
```diff
| table pg_dist_rebalance_strategy
+ | type citus.distribution_type[]
+ | type citus.pg_dist_object
+ | type pg_dist_shard
+ | type pg_dist_shard[]
+ | type pg_dist_shard_placement
+ | type pg_dist_shard_placement[]
+ | type pg_dist_transaction
+ | type pg_dist_transaction[]
| view citus_dist_stat_activity
| view pg_dist_shard_placement
```
This work was already done by @m3hm3t and approved as part of
https://github.com/citusdata/citus/pull/7722
I separated it in this PR since the previous one contained other changes
which we don't currently want to merge.
Relevant PG commit:
---------
Co-authored-by: Mehmet YILMAZ <mehmety87@gmail.com>
A recent Postgres commit (*) that refactored error messages is the cause
of the diffs in pg16 regress test when running Citus on Postgres 17. The
fix changes the pg16 goldfile and includes a normalization rule for the
error messages so pg16 will pass when running with version 16 of
Postgres.
(*)
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=498ee9ee2f
PG17 changed how scalar subquery outputs appear in EXPLAIN output (*).
This commit changes impacted regress goldfiles to the PG17 format, and
adds a helper function to covert pre-PG17 plans to the PG17 format. The
conversion is required when testing Citus on pgversions prior to 17. The
helper function can and should be removed when 17 becomes the minimum
supported version.
(*)
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=fd0398fcb
Fix Test Failure in subquery_in_where, set_operations, dml_recursive in
PG17 #7741
The test failures are caused by[ this commit in
PG17](https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=9f1337639),
which enables correlated subqueries to be pulled up to a join. Prior to
this, the correlated subquery was implemented as a subplan. In citus, it
is not possible to pushdown a correlated subplan, but with a different
plan in PG17 the query can be executed, per the test diff from
`subquery_in_where`:
```
37,39c37,41
< DEBUG: generating subplan XXX_1 for CTE event_id: SELECT user_id AS events_user_id, "time" AS events_time, event_type FROM public.events_table
< DEBUG: Plan XXX query after replacing subqueries and CTEs: SELECT count(*) AS count FROM ...
< ERROR: correlated subqueries are not supported when the FROM clause contains a CTE or subquery
---
> count
> ---------------------------------------------------------------------
> 0
> (1 row)
>
```
This is because with pg17 `= ANY subquery` in the queries can be
implemented as a join, instead of as a subplan filter on a table scan.
For example, `SELECT * FROM test a WHERE x IN (SELECT x FROM test b
UNION SELECT y FROM test c WHERE a.x = c.x) ORDER BY 1,2` (from
set_operations) has this plan in pg17; note that the subquery is the
inner side of a nested loop join:
```
┌───────────────────────────────────────────────────┐
│ QUERY PLAN │
├───────────────────────────────────────────────────┤
│ Sort │
│ Sort Key: a.x, a.y │
│ -> Nested Loop │
│ -> Seq Scan on test a │
│ -> Subquery Scan on "ANY_subquery" │
│ Filter: (a.x = "ANY_subquery".x) │
│ -> HashAggregate │
│ Group Key: b.x │
│ -> Append │
│ -> Seq Scan on test b │
│ -> Seq Scan on test c │
│ Filter: (a.x = x) │
└───────────────────────────────────────────────────┘
```
and this plan in pg16 (and previous pg versions); the subquery is a
correlated subplan filter on a table scan:
```
┌───────────────────────────────────────────────┐
│ QUERY PLAN │
├───────────────────────────────────────────────┤
│ Sort │
│ Sort Key: a.x, a.y │
│ -> Seq Scan on test a │
│ Filter: (SubPlan 1) │
│ SubPlan 1 │
│ -> HashAggregate │
│ Group Key: b.x │
│ -> Append │
│ -> Seq Scan on test b │
│ -> Seq Scan on test c │
│ Filter: (a.x = x) │
└───────────────────────────────────────────────┘
```
The fix Modifies the queries causing the test failures so that an ANY
subquery is not folded to a join, preserving the expected output of the
tests. A similar approach was taken for existing regress tests in the[
postgres
commit](https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=9f1337639).
See the `join `regress test, for example.
We also add pg17 specific tests that leverage this improvement in Postgres
with Citus distributed planning as well.
Regression test cte_inline has the following diff;
```
DEBUG: CTE cte_1 is going to be inlined via distributed planning
DEBUG: CTE cte_1 is going to be inlined via distributed planning
DEBUG: Creating router plan
-DEBUG: query has a single distribution column value: 1
```
DEBUG message `query has a single distribution column value` does not
appear with PG17. This is because PG17 can recognize when a Result node
does not need to have an input node, so the predicate on the
distribution column is not present in the query plan. Comparing the
query plan obtained before PG17:
```
│ Result │
│ One-Time Filter: false │
│ -> GroupAggregate │
│ -> Seq Scan on public.test_table │
│ Filter: (test_table.key = 1) │
```
with the PG17 query plan:
```
┌──────────────────────────────────┐
│ QUERY PLAN │
├──────────────────────────────────┤
│ Result │
│ One-Time Filter: false │
└──────────────────────────────────┘
```
we see that the Result node in the PG16 plan has an Aggregate node, but
the Result node in the PG17 plan does not have any input node; PG17
recognizes it is not needed given a Filter that evaluates to False at
compile-time. The Result node is present in both plans because PG in
both versions can recognize when a combination of predicates equate to
false at compile time; this is the because the successive predicates in
the test query (key=6, key=5, key=4, etc) become contradictory when the
CTEs are inlined. Here is an example query showing the effect of the CTE
inlining:
```
select count(*), key FROM test_table WHERE key = 1 AND key = 2 GROUP BY key;
```
In this case, the WHERE clause obviously evaluates to False. The PG16
query plan for this query is:
```
┌────────────────────────────────────┐
│ QUERY PLAN │
├────────────────────────────────────┤
│ GroupAggregate │
│ -> Result │
│ One-Time Filter: false │
│ -> Seq Scan on test_table │
│ Filter: (key = 1) │
└────────────────────────────────────┘
```
The PG17 query plan is:
```
┌────────────────────────────────┐
│ QUERY PLAN │
├────────────────────────────────┤
│ GroupAggregate │
│ -> Result │
│ One-Time Filter: false │
└────────────────────────────────┘
```
In both plans the PG optimizer is able to derive the predicate 1=2 from
the equivalence class { key, 1, 2 } and then constant fold this to
False. But, in the PG16 plan the Result node has an input node (a
sequential scan on test_table), while in the PG17 plan the Result node
does not have any input. This is because PG17 recognizes that when the
Result filter resolves to False at compile time it is not necessary to
set an input on the Result. I think this is a consequence of this PG17
commit:
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=b262ad440
which handles redundant IS [NOT] NULL predicates, but also refactored
evaluating of predicates to true/false at compile-time, enabling
optimizations such as those seen here.
Given the reason for the diff, the fix preserves the test output by
modifying the query so the predicates are not contradictory when the
CTEs are inlined.
In PG17 adds builtin C.UTF-8 locale option, we add it in the code to
avoid "unknown collation provider" in vanilla tests.
Relevant PG commit:
f69319f2f1
f69319f2f1fb16eda4b535bcccec90dff3a6795e
Also in PG17, colliculocale, daticulocale renamed to colllocale,
datlocale
Here we fix the following tests to avoid alternative output
pg15 pg16 multi_mx_create_table multi_schema_support
Relevant PG commit:
f696c0cd5f
f696c0cd5f299f1b51e214efc55a22a782cc175d
PG 17 added support for DEFAULT in ALTER TABLE .. SET ACCESS METHOD
Relevant PG commit:
d61a6cad6418f643a5773352038d0dfe5d3535b8
d61a6cad64
In that case, name in `AlterTableCmd->name` would be null.
Add a null check here to avoid crash.
In PG17, the outer loop in `acquire_sample_rows()` changed
from
`while (BlockSampler_HasMore(&bs))`
to
`while (table_scan_analyze_next_block(scan, stream))`
Relevant PG commit:
041b96802efa33d2bc9456f2ad946976b92b5ae1
041b96802e
It is expected that the `scan_analyze_next_block` function will
check if there are any blocks left. So we add that check in
`columnar_scan_analyze_next_block`
Without this fix, we will have an indefinite loop causing timeout.
Specifically, in our test schedules,
`multi schedule` stuck at `drop_column_partitioned_table` test
`multi-mx` schedule stuck at `start_stop_metadata_sync` test
`columnar schedule` stuck at `columnar_create` test
Changed `attstattarget` in `pg_attribute` to use `NullableDatum`,
allowing null representation for default statistics target in PostgreSQL
17.
Relevant PG commit:
6a004f1be87d34cfe51acf2fe2552d2b08a79273
6a004f1be8
```diff
-- verify statistics is set
SELECT c.relname, a.attstattarget
FROM pg_attribute a
JOIN pg_class c ON a.attrelid = c.oid AND c.relname LIKE 'test\_idx%'
ORDER BY c.relname, a.attnum;
relname | attstattarget
-----------+---------------
test_idx | 4646
- test_idx2 | -1
+ test_idx2 |
test_idx2 | 10000
test_idx2 | 3737
(4 rows)
```
Changed stxstattarget in pg_statistic_ext to use nullable
representation, removing explicit -1 for default statistics target in
PostgreSQL 17.
Relevant PG commit:
012460ee93c304fbc7220e5b55d9d0577fc766ab
012460ee93
```diff
SELECT stxstattarget, stxrelid::regclass
FROM pg_statistic_ext
WHERE stxnamespace IN (
SELECT oid
FROM pg_namespace
WHERE nspname IN ('statistics''TestTarget')
)
AND stxname SIMILAR TO '%\_\d+'
ORDER BY stxstattarget, stxrelid::regclass ASC;
stxstattarget | stxrelid
---------------+-----------------------------------
- -1 | "statistics'TestTarget".t1_980000
- -1 | "statistics'TestTarget".t1_980002
...
+ | "statistics'TestTarget".t1_980000
+ | "statistics'TestTarget".t1_980002
...
```
PG17 compatibility - Part 2
https://github.com/citusdata/citus/pull/7699 was the first PG17
compatibility PR merged to main branch, which provided ONLY successful
Citus compilation with PG17.0.
This PR, consider it as Part 2, provides ruleutils changes for PG17.
Ruleutils changes is the first thing we should merge, after successful
build. It's the core for deparsing logic in Citus.
# Question: How do we add ruleutils changes?
- We add a new ruleutils file specific to PG17.
- We keep track of the changes in Postgres's ruleutils file from here
https://github.com/postgres/postgres/commits/REL_17_0/src/backend/utils/adt/ruleutils.c
- Per each commit in that history that belongs only to 17.0, we add the
relevant changes to static functions to our ruleutils file for PG17.
It's like a manual commit copying.
# Check the PR's commits for detailed steps
https://github.com/citusdata/citus/pull/7725/commits
This PR provides successful compilation against PG17.0.
- Remove ExecFreeExprContext call
Relevant PG commit
d060e921ea5aa47b6265174c32e1128cebdbc3df
d060e921ea
- PG17 uses streaming IO in analyze, fix scan_analyze_next_block function
Relevant PG commit
041b96802efa33d2bc9456f2ad946976b92b5ae1
041b96802e
- Define ObjectClass for PG17+ only since it's removed
Relevant PG commit:
89e5ef7e21812916c9cf9fcf56e45f0f74034656
89e5ef7e21
- Remove ReorderBufferTupleBuf structure.
Relevant PG commit:
08e6344fd6423210b339e92c069bb979ba4e7cd6
08e6344fd6
- Define colliculocale and daticulocale since they have been renamed
Relevant PG commit:
f696c0cd5f299f1b51e214efc55a22a782cc175d
f696c0cd5f
- makeStringConst defined in PG17
Relevant PG commit:
de3600452b61d1bc3967e9e37e86db8956c8f577
de3600452b
- RangeVarCallbackOwnsTable was replaced by RangeVarCallbackMaintainsTable
Relevant PG commit:
ecb0fd33720fab91df1207e85704f382f55e1eb7
ecb0fd3372
- attstattarget is nullable, define pg compatible functions for it
Relevant PG commit:
4f622503d6de975ac87448aea5cea7de4bc140d5
4f622503d6
- stxstattarget is nullable in PG17, write compat functions for it
Relevant PG commit:
012460ee93c304fbc7220e5b55d9d0577fc766ab
012460ee93
- Use ResourceOwner to track WaitEventSet in PG17
Relevant PG commit:
50c67c2019ab9ade8aa8768bfe604cd802fe8591
50c67c2019
- getIdentitySequence now uses Relation instead of relation_id
Relevant PG commit:
509199587df73f06eda898ae13284292f4ae573a
509199587d
- Remove no-op tuplestore_donestoring function
Relevant PG commit:
75680c3d805e2323cd437ac567f0677fdfc7b680
75680c3d80
- MergeAction can have 3 merge kinds (now enum) in PG17, write compat
Relevant PG commit:
0294df2f1f842dfb0eed79007b21016f486a3c6c
0294df2f1f
- EXPLAIN (MEMORY) is added, make changes to ExplainOnePlan
Relevant PG commit:
5de890e3610d5a12cdaea36413d967cf5c544e20
5de890e361
- LIMIT_OPTION_DEFAULT has been removed as it's useless, use LIMIT_OPTION_COUNT
Relevant PG commit:
a6be0600ac3b71dda8277ab0fcbe59ee101ac1ce
a6be0600ac
- write compat for create_foreignscan_path bcs of more arguments in PG17
Relevant PG commit:
9e9931d2bf40e2fea447d779c2e133c2c1256ef3
9e9931d2bf
- pgprocno and lxid have been combined into a struct in PGPROC
Relevant PG commits:
28f3915b73f75bd1b50ba070f56b34241fe53fd1
28f3915b73
ab355e3a88de745607f6dd4c21f0119b5c68f2ad
ab355e3a88
024c521117579a6d356050ad3d78fdc95e44eefa
024c521117
- Simplify CitusNewNode (#7434)
postgres refactored newNode() in PG 17, the main point for doing this is
the original tricks is no longer neccessary for modern compilers[1].
This does the same for Citus.
This should have no backward compatibility issues since it just replaces
palloc0fast with palloc0.
This is good for forward compatibility since palloc0fast no longer
exists in PG 17.
[1]
https://www.postgresql.org/message-id/b51f1fa7-7e6a-4ecc-936d-90a8a1659e7c@iki.fi
(cherry picked from commit 4b295cc)
This is prep work for successful compilation with PG17
PG17added foreach_ptr, foreach_int and foreach_oid macros
Relevant PG commit
14dd0f27d7cd56ffae9ecdbe324965073d01a9ff
14dd0f27d7
We already have these macros, but they are different with the
PG17 ones because our macros take a DECLARED variable, whereas
the PG16 macros declare a locally-scoped loop variable themselves.
Hence I am renaming our macros to foreach_declared_
I am separating this into its own PR since it touches many files. The
main compilation PR is https://github.com/citusdata/citus/pull/7699
In the function TaskConcurrentCancelCheck() the pointer "task" was
utilized after checking against NULL, which can lead to dereference of
the null pointer.
To avoid the problem, added a separate handling of the case when the
pointer is null with an interruption of execution.
Fixes: #7693.
Fixes: 1f8675da4382f6e("nonblocking concurrent task execution via
background workers")
Signed-off-by: Maksim Korotkov <m.korotkov@postgrespro.ru>
Fixes#6795
The `worker_copy_table_to_node` is not supposed to be called for Citus
tables. When this function was initially introduced in #6098 , it had
the respective check. But the check was omitted, since
`worker_copy_table_to_node` called for Citus table finishes with error
anyway:
```
ERROR: cannot execute a distributed query from a query on a shard
DETAIL: Executing a distributed query in a function call that may be pushed to a remote node can lead to incorrect results.
```
It turns out that in some cases this error does not occur. See #6795
I suggest restoring that check.
Co-authored-by: Karina Litskevich <litskevichkarina@gmail.com>
The test added in #7604 doesn't reach the `HasRangeTableRef` function
and thus doesn't test what it should.
Co-authored-by: Karina Litskevich <litskevichkarina@gmail.com>
Bump PG versions to the latest minors 14.15, 15.10, 16.6
There is a libpq symlink issue when the images are built remotely
https://github.com/citusdata/citus/actions/runs/12583502447/job/35071296238
Hence, we use the commit sha of a local build of the images, pushed.
This is temporary, until we find the underlying cause of the symlink
issue.
---------
Co-authored-by: Onur Tirtir <onurcantirtir@gmail.com>
We thought we provided support for this in
b8c493f2c4
However the use of parameters in SQL is not supported in Citus. Since
generic plan queries use parameters, we can't support for now.
Relevant PG16 commit https://github.com/postgres/postgres/commit/3c05284Fixes#7813 with proper error message
DESCRIPTION: Fixes a crash that happens because of unsafe catalog access
when re-assigning the global pid after application_name changes.
When application_name changes, we don't actually need to
try re-assigning the global pid for external client backends because
application_name doesn't affect the global pid for such backends. Plus,
trying to re-assign the global pid for external client backends would
unnecessarily cause performing a catalog access when the cached local
node id is invalidated. However, accessing to the catalog tables is
dangerous in certain situations like when we're not in a transaction
block. And for the other types of backends, i.e., the Citus internal
backends, we need to re-assign the global pid when the application_name
changes because for such backends we simply extract the global pid
inherited from the originating backend from the application_name -that's
specified by originating backend when openning that connection- and this
doesn't require catalog access.
This PR is a proposed fix for issue
[7705](https://github.com/citusdata/citus/issues/7705). The following is
the background and rationale for the fix (please refer to
[7705](https://github.com/citusdata/citus/issues/7705) for context);
The `varnullingrels `field was introduced to the Var node struct
definition in Postgres 16. Its purpose is to associate a variable with
the set of outer join relations that can cause the variable to be NULL.
The `varnullingrels ` for the variable
`"gianluca_camp_test"."start_timestamp"` in the problem query is 3,
because the variable "gianluca_camp_test"."start_timestamp" is coming
from the inner (nullable) side of an outer join and 3 is the RT index
(aka relid) of that outer join. The problem occurs when the Postgres
planner attempts to plan the combine query. The format of a combine
query is:
```
SELECT <targets>
FROM pg_catalog.citus_extradata_container();
```
There is only one relation in a combine query, so no outer joins are
present, but the non-empty `varnullingrels `field causes the Postgres
planner to access structures for a non-existent relation. The source of
the problem is that, when creating the target list for the combine
query, function MasterAggregateMutator() uses copyObject() to construct
a Var node before setting the master table ID, and this copies over the
non-empty varnullingrels field in the case of the
`"gianluca_camp_test"."start_timestamp"` var. The proposed solution is
to have MasterAggregateMutator() use makeVar() instead of copyObject(),
and only set the fields that make sense for the combine query; var type,
collation and type modifier. The `varnullingrels `field can be left
empty because there is only one relation in the combine query.
A new regress test issue_7705.sql is added to exercise the fix. The
issue is not specific to window functions, any target expression that
cannot be pushed down and contains at least one column from the inner
side of a left outer join (so has a non-empty varnullingrels field) can
cause the same issue.
More about Citus combine queries
[here](https://github.com/citusdata/citus/tree/main/src/backend/distributed#combine-query-planner).
More about Postgres varnullingrels
[here](https://github.com/postgres/postgres/blob/master/src/backend/optimizer/README).
In function MasterAggregateMutator(), when the original Node is a Var node use makeVar() instead
of copyObject() when constructing the Var node for the target list of the combine query.
The varnullingrels field of the original Var node is ignored because it is not relevant for the
combine query; copying this cause the problem in issue 7705, where a coordinator query had
a Var with a reference to a non-existent join relation.
Very small PR, no changes to behaviour. Just a typo fix :-)
Under
`src/backend/distributed/sql/udfs/citus_finalize_upgrade_to_citus11/`
the sql has a typo "runnnig", which will be displayed to the user if the
`citus_check_cluster_node_health()` fails when calling
`citus_finish_citus_upgrade();`
Co-authored-by: eaydingol <60466783+eaydingol@users.noreply.github.com>
When multiple sessions concurrently attempt to add the same coordinator
node using `citus_set_coordinator_host`, there is a potential race
condition. Both sessions may pass the initial metadata check
(`isCoordinatorInMetadata`), but only one will succeed in adding the
node. The other session will fail with an assertion error
(`Assert(!nodeAlreadyExists)`), causing the server to crash. Even though
the `AddNodeMetadata` function takes an exclusive lock, it appears that
the lock is not preventing the race condition before the initial
metadata check.
- **Issue**: The current logic allows concurrent sessions to pass the
check for existing coordinators, leading to an attempt to insert
duplicate nodes, which triggers the assertion failure.
- **Impact**: This race condition leads to crashes during operations
that involve concurrent coordinator additions, as seen in
https://github.com/citusdata/citus/issues/7646.
**Test Plan:**
- Isolation Test Limitation: An isolation test was added to simulate
concurrent additions of the same coordinator node, but due to the
behavior of PostgreSQL locking mechanisms, the test does not trigger the
edge case. The lock applied within the function serializes the
operations, preventing the race condition from occurring in the
isolation test environment.
While the edge case is difficult to reproduce in an isolation test, the
fix addresses the core issue by ensuring concurrency control through
proper locking.
- Existing Tests: All existing tests related to node metadata and
coordinator management have been run to ensure that no regressions were
introduced.
**After the Fix:**
- Concurrent attempts to add the same coordinator node will be
serialized. One session will succeed in adding the node, while the
others will skip the operation without crashing the server.
Co-authored-by: Mehmet YILMAZ <mehmet.yilmaz@microsoft.com>