Nontrivial bump because of the following PG15.3 commit
317aba70e
https://github.com/postgres/postgres/commit/317aba70e
Previously, when views were converted to RTE_SUBQUERY the relid
would be cleared in PG15. In this patch of PG15, relid is retained.
Therefore, we add a check with the "relkind and rtekind" to
identify the converted views in 15.13
Sister PR https://github.com/citusdata/the-process/pull/164
Using dev image sha because I encountered the libpq
symlink issue again with "-v219b87c"
_Since we've never released a Citus release that contains the commit
that introduced this bug (see #7461), we don't need to have a
DESCRIPTION line that shows up in release changelog._
From 8 valgrind test targets run for release-13.1 with PG 17.5, we got
1344 stack traces and except one of them, they were all about below
unsafe memory access because this is a very hot code-path that we
execute via our drop trigger.
On main, even `make -C src/test/regress/ check-base-vg` dumps this stack
trace with PG 16/17 to src/test/regress/citus_valgrind_test_log.txt when
executing "multi_cluster_management", and this is not the case with this
PR anymore.
```c
==27337== VALGRINDERROR-BEGIN
==27337== Conditional jump or move depends on uninitialised value(s)
==27337== at 0x7E26B68: citus_unmark_object_distributed (home/onurctirtir/citus/src/backend/distributed/metadata/distobject.c:113)
==27337== by 0x7E26CC7: master_unmark_object_distributed (home/onurctirtir/citus/src/backend/distributed/metadata/distobject.c:153)
==27337== by 0x4BD852: ExecInterpExpr (home/onurctirtir/.pgenv/src/postgresql-16.2/src/backend/executor/execExprInterp.c:758)
==27337== by 0x4BFD00: ExecInterpExprStillValid (home/onurctirtir/.pgenv/src/postgresql-16.2/src/backend/executor/execExprInterp.c:1870)
==27337== by 0x51D82C: ExecEvalExprSwitchContext (home/onurctirtir/.pgenv/src/postgresql-16.2/src/backend/executor/../../../src/include/executor/executor.h:355)
==27337== by 0x51D8A4: ExecProject (home/onurctirtir/.pgenv/src/postgresql-16.2/src/backend/executor/../../../src/include/executor/executor.h:389)
==27337== by 0x51DADB: ExecResult (home/onurctirtir/.pgenv/src/postgresql-16.2/src/backend/executor/nodeResult.c:136)
==27337== by 0x4D72ED: ExecProcNodeFirst (home/onurctirtir/.pgenv/src/postgresql-16.2/src/backend/executor/execProcnode.c:464)
==27337== by 0x4CA394: ExecProcNode (home/onurctirtir/.pgenv/src/postgresql-16.2/src/backend/executor/../../../src/include/executor/executor.h:273)
==27337== by 0x4CD34C: ExecutePlan (home/onurctirtir/.pgenv/src/postgresql-16.2/src/backend/executor/execMain.c:1670)
==27337== by 0x4CAA7C: standard_ExecutorRun (home/onurctirtir/.pgenv/src/postgresql-16.2/src/backend/executor/execMain.c:365)
==27337== by 0x7E1E475: CitusExecutorRun (home/onurctirtir/citus/src/backend/distributed/executor/multi_executor.c:238)
==27337== Uninitialised value was created by a heap allocation
==27337== at 0x4848899: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==27337== by 0x9AB1F7: AllocSetContextCreateInternal (home/onurctirtir/.pgenv/src/postgresql-16.2/src/backend/utils/mmgr/aset.c:438)
==27337== by 0x4E0D56: CreateExprContextInternal (home/onurctirtir/.pgenv/src/postgresql-16.2/src/backend/executor/execUtils.c:261)
==27337== by 0x4E0E3E: CreateExprContext (home/onurctirtir/.pgenv/src/postgresql-16.2/src/backend/executor/execUtils.c:311)
==27337== by 0x4E10D9: ExecAssignExprContext (home/onurctirtir/.pgenv/src/postgresql-16.2/src/backend/executor/execUtils.c:490)
==27337== by 0x51EE09: ExecInitSeqScan (home/onurctirtir/.pgenv/src/postgresql-16.2/src/backend/executor/nodeSeqscan.c:147)
==27337== by 0x4D6CE1: ExecInitNode (home/onurctirtir/.pgenv/src/postgresql-16.2/src/backend/executor/execProcnode.c:210)
==27337== by 0x5243C7: ExecInitSubqueryScan (home/onurctirtir/.pgenv/src/postgresql-16.2/src/backend/executor/nodeSubqueryscan.c:126)
==27337== by 0x4D6DD9: ExecInitNode (home/onurctirtir/.pgenv/src/postgresql-16.2/src/backend/executor/execProcnode.c:250)
==27337== by 0x4F05B2: ExecInitAppend (home/onurctirtir/.pgenv/src/postgresql-16.2/src/backend/executor/nodeAppend.c:223)
==27337== by 0x4D6C46: ExecInitNode (home/onurctirtir/.pgenv/src/postgresql-16.2/src/backend/executor/execProcnode.c:182)
==27337== by 0x52003D: ExecInitSetOp (home/onurctirtir/.pgenv/src/postgresql-16.2/src/backend/executor/nodeSetOp.c:530)
==27337==
==27337== VALGRINDERROR-END
```
DESCRIPTION: Adds `citus_nodes` view that displays the node name, port,
role, and "active" for nodes in the cluster.
This PR adds `citus_nodes` view to the `pg_catalog` schema. The
`citus_nodes` view is created in the `citus` schema and is used to
display the node name, port, role, and active status of each node in the
`pg_dist_node` table.
The view is granted `SELECT` permission to the `PUBLIC` role and is set
to the `pg_catalog` schema.
Test cases was added to `multi_cluster_management` tests.
structs.py was modified to add white spaces as `citus_indent` required.
---------
Co-authored-by: Alper Kocatas <alperkocatas@microsoft.com>
PG15 commit d1ef5631e620f9a5b6480a32bb70124c857af4f1
and PG16 commit 695f5deb7902865901eb2d50a70523af655c3a00
disallow replacing joins with scans in queries with pseudoconstant quals.
This commit prevents the set_join_pathlist_hook from being called
if any of the join restrictions is a pseudo-constant.
So in these cases, citus has no info on the join, never sees that
the query has an outer join, and ends up producing an incorrect plan.
PG17 fixes this by commit 9e9931d2bf40e2fea447d779c2e133c2c1256ef3
Therefore, we take this extra measure here for PG versions less than 17.
hasOuterJoin can never be true when set_join_pathlist_hook is absent.
This PR fixes#7784 and refactors the `WrapSubquery(Query *subquery)`
function to improve clarity and correctness when handling volatile
expressions in subqueries during Citus insert-select rewriting.
### Background
The `WrapSubquery` function rewrites a query of the form:
```sql
INSERT INTO target_table SELECT ... FROM ...
```
...by wrapping the `SELECT` in a subquery:
```sql
SELECT <outer-TL>
FROM ( <subquery with volatile expressions replaced with NULL> ) citus_insert_select_subquery
```
This transformation allows:
* **Volatile expressions** (e.g., `nextval`, `now`) **not used in `GROUP
BY` or `ORDER BY`** to be evaluated **exactly once on the coordinator**.
* **Stable/immutable or sort-relevant expressions** to remain in the
worker-executed subquery.
* Placeholder `NULL`s to maintain column alignment in the inner
subquery.
### Fix Details
* Restructured the code into labeled logical sections:
1. Build wrapper query (`SELECT … FROM (subquery)`)
2. Rewrite target lists with volatility analysis
3. Assign and return updated query trees
* Preserved existing behavior, focusing on clarity and maintainability.
### How the new code handles volatile items
stage | what we look for | what we do | why
-- | -- | -- | --
scan target list once | 1. `expr_is_volatile(te->expr)` 2.
`te->ressortgroupref != 0` (is the column used in GROUP BY / ORDER BY?)
| decide whether to hoist or keep | we must not hoist an expression the
inner query still needs for sorting/grouping, otherwise its
`SortGroupClause` breaks
volatile & not used in sort/group | deep‑copy the expression into the
outer target list | executes once on the coordinator |
| leave a typed `NULL `placeholder (visible, not `resjunk`) in the
inner target list | keeps column numbering stable for helpers that
already ran (reorder, cast); the worker sends a cheap constant |
stable / immutable, or volatile but used in sort/group | keep the
original expression in the inner list; outer list references it via a
`Var `| workers can evaluate it safely and, if needed, the inner
ORDER BY still works |
### Example
Given this query:
```sql
INSERT INTO t SELECT nextval('s'), 42 FROM generate_series(1, 2);
```
The planner rewrites it as:
```sql
SELECT nextval('s'), col2
FROM (SELECT NULL::bigint AS col1, 42 AS col2 FROM generate_series(1, 2)) citus_insert_select_subquery;
```
This ensures `nextval('s')` is evaluated only once per row on the
**coordinator**, not on each worker node, preserving correct sequence
semantics.
#### **Outer‑Var guard (`FindReferencedTableColumn`)**
Because `WrapSubquery` adds an extra query level, lots of Vars that the
old code never expected become “outer” Vars; without teaching
`FindReferencedTableColumn` to climb that extra level reliably, Citus
would intermittently reject valid foreign keys and even hit asserts.
* Re‑implemented the outer‑Var guard so that the function:
* **Walks deterministically up the query stack** when `skipOuterVars =
false` (default for FK / UNION checks). A new while‑loop copies — rather
than truncates — `parentQueryList` on each hop, eliminating
list‑aliasing that made *issue 5248* fail intermittently in parallel
regressions.
* Handles multi‑level `varlevelsup` in a single loop; never mutates the
caller’s list in place.
Issue #7709 asks for security labels on columns to be propagated, to
support the `anon` extension. Before, Citus supported security labels
on roles (#7735) and this PR adds support for propagating security
labels on tables and columns.
All scenarios that involve propagating metadata for a Citus table now
include the security labels on the table and on the columns of the
table. These scenarios are:
- When a table becomes distributed using `create_distributed_table()` or
`create_reference_table()`, its security labels (if any) are propageted.
- When a security label is defined on a distributed table, or one of its
columns, the label is propagated.
- When a node is added to a Citus cluster, all distributed tables have
their security labels propagated.
- When a column of a distributed table is dropped, any security labels
on the column are also dropped.
- When a column is added to a distributed table, security labels can be
defined on the column and are propagated.
- Security labels on a distributed table or its columns are not
propagated when `citus.enable_metadata_sync` is enabled.
Regress test `seclabel` is extended with tests to cover these scenarios.
The implementation is somewhat involved because it impacts DDL
propagation of Citus tables, but can be broken down as follows:
- distributed_object_ops has `Role_SecLabel`, `Table_SecLabel` and
`Column_SecLabel` to take care of security labels on roles, tables and
columns. `Any_SecLabel` is used for all other security labels and is
essentially a nop.
- Deparser support - `DeparseRoleSecLabelStmt()`,
`DeparseTableSecLabelStmt()` and `DeparseColumnSecLabelStmt()` take care
of deparsing security label statements on roles, tables and columns
respectively.
- When reconstructing the DDL for a citus table, security labels on the
table or its columns are included by having
`GetPreLoadTableCreationCommands()` call a new function
`CreateSecurityLabelCommands()` to take care of any security labels on
the table or its columns.
- When changing a distributed table name to a shard name before running
a command locally on a worker, function `RelayEventExtendNames()` checks
for security labels on a table or its columns.
DESCRIPTION: Adds citus_stat_counters view that can be used to query
stat counters that Citus collects while the feature is enabled, which is
controlled by citus.enable_stat_counters. citus_stat_counters() can be
used to query the stat counters for the provided database oid and
citus_stat_counters_reset() can be used to reset them for the provided
database oid or for the current database if nothing or 0 is provided.
Today we don't persist stat counters on server shutdown. In other words,
stat counters are automatically reset in case of a server restart.
Details on the underlying design can be found in header comment of
stat_counters.c and in the technical readme.
-------
Here are the details about what we track as of this PR:
For connection management, we have three statistics about the inter-node
connections initiated by the node itself:
* **connection_establishment_succeeded**
* **connection_establishment_failed**
* **connection_reused**
While the first two are relatively easier to understand, the third one
covers the case where a connection is reused. This can happen when a
connection was already established to the desired node, Citus decided to
cache it for some time (see citus.max_cached_conns_per_worker &
citus.max_cached_connection_lifetime), and then reused it for a new
remote operation. Here are the other important details about these
connection statistics:
1. connection_establishment_failed doesn't care about the connections
that we could establish but are lost later in the transaction. Plus, we
cannot guarantee that the connections that are counted in
connection_establishment_succeeded were not lost later.
2. connection_establishment_failed doesn't care about the optional
connections (see OPTIONAL_CONNECTION flag) that we gave up establishing
because of the connection throttling rules we follow (see
citus.max_shared_pool_size & citus.local_shared_pool_size). The reaason
for this is that we didn't even try to establish these connections.
3. For the rest of the cases where a connection failed for some reason,
we always increment connection_establishment_failed even if the caller
was okay with the failure and know how to recover from it (e.g., the
adaptive executor knows how to fall back local execution when the target
node is the local node and if it cannot establish a connection to the
local node). The reason is that even if it's likely that we can still
serve the operation, we still failed to establish the connection and we
want to track this.
4. Finally, the connection failures that we count in
connection_establishment_failed might be caused by any of the following
reasons and for now we prefer to _not_ further distinguish them for
simplicity:
a. remote node is down or cannot accept any more connections, or
overloaded such that citus.node_connection_timeout is not enough to
establish a connection
b. any internal Citus error that might result in preparing a bad
connection string so that libpq fails when parsing the connection string
even before actually trying to establish a connection via connect() call
c. broken citus.node_conninfo or such Citus configuration that was
incorrectly set by the user can also result in similar outcomes as in b
d. internal waitevent set / poll errors or OOM in local node
We also track two more statistics for query execution:
* **query_execution_single_shard**
* **query_execution_multi_shard**
And more importantly, both query_execution_single_shard and
query_execution_multi_shard are not only tracked for the top-level
queries but also for the subplans etc. The reason is that for some
queries, e.g., the ones that go through recursive planning, after Citus
performs the heavy work as part of subplans, the work that needs to be
done for the top-level query becomes quite straightforward. And for such
query types, it would be deceiving if we only incremented the query stat
counters for the top-level query. Similarly, for non-pushable INSERT ..
SELECT and MERGE queries, we perform separate counter increments for the
SELECT / source part of the query besides the final INSERT / MERGE
query.
Fixes#7105.
DESCRIPTION: Fixes a bug that causes omitting CASCADE clause for the
commands sent to workers for REVOKE commands on tables.
---------
Co-authored-by: ThomasC02 <thomascantrell02@gmail.com>
Co-authored-by: Onur Tirtir <onurcantirtir@gmail.com>
Co-authored-by: Tiago Silva <tiagos3373@gmail.com>
DESCRIPTION: Adjusts max_prepared_transactions only when it's set to
default on PG >= 16
Fixes#7711.
Change AdjustMaxPreparedTransactions to really check if
max_prepared_transactions is explicitly set by user, and only adjust
max_prepared_transactions when it is default.
This fixes 021_twophase test failure with loaded Citus library after
postgres/postgres@b39c5272.
Co-authored-by: Karina Litskevich <litskevichkarina@gmail.com>
Var externParamPlaceholder is created on stack, and its address is used
for paramFetch. Postgres code return address of externParamPlaceholder
var to externParam, then code flow go out of scope and dereference
pointer on stack out of scope.
Fixes https://github.com/citusdata/citus/issues/7941.
---------
Co-authored-by: Onur Tirtir <onurcantirtir@gmail.com>
Var jobTypeName is created on stack and its value over pointer is used
in heap_form_tuple, so we
have stack use out of scope.
Issue was detected with adress sanitizer.
Fixes#7943.
DESCRIPTION: Makes sure to prevent `INSERT INTO ... SELECT` queries involving subfield or sublink, to avoid crashes
The following query was crashing the backend:
```
INSERT INTO field_indirection_test_1 (
int_col, ct1_col.int_1,ct1_col.int_2
) SELECT 0, 1, 2;
-- crash
```
En passant, added more tests with sublink in distributed_types and found
another query with wrong behavior:
```
INSERT INTO domain_indirection_test (f1,f3.if1) SELECT 0, 1;
ERROR: could not find a conversion path from type 23 to 17619
-- not the expected ERROR
```
Fixed them by using `strip_implicit_coercions()` on target entry
expression before checking for the presence of a subscript or
fieldstore, else we fail to find the existing ones and wrongly accept to
execute unsafe query.
DESCRIPTION: Fixes a bug in deparsing of shard query in case of
"output-table column" name conflict
If an `ORDER BY` item in `SELECT` is a bare identifier, the parser
_first seeks it as an output column name_ of the `SELECT` (for SQL92
compatibility). However, ruleutils.c is expecting the SQL99
interpretation _where such a name is an input column name_. So it's
possible to produce an incorrect display of a view in the (admittedly
pretty ill-advised) case where some other column is renamed in the
`SELECT` output list to match an `ORDER BY` column.
The `DISTINCT ON` expressions are interpreted using the same rules as
for `ORDER BY`.
We had an issue reported that actually uses `DISTINCT ON`: #7684
Since Citus uses ruleutils deparsing logic to create the shard queries,
it would not
table-qualify the column names as needed.
PG17 fixed this https://github.com/postgres/postgres/commit/a7eb633563c
by table-qualifying such names in the dumped view text. Therefore,
Citus doesn't reproduce the issue in PG17, since PG17 table-qualifies
the column names when needed, and the produced shard queries are
correct.
This PR applies the PG17 patch to `ruleutils_15.c` and `ruleutils_16.c`.
Even though we generally try to avoid modifying the ruleutils files, in
this case
we are applying a Postgres patch that `ruleutils_17.c` already has:
897d996b8f
Thanks @c2main for your discussion and idea in the issue.
Fixes#7684
DESCRIPTION: Adds citus_is_primary_node() UDF to determine if the
current node is a primary node in the cluster.
---------
Co-authored-by: German Eichberger <geeichbe@microsoft.com>
Co-authored-by: Onur Tirtir <onurcantirtir@gmail.com>