The code was suppopsed to be heavely based on aclexplode() from
PostgreSQL. It's partially true and I moved the essence of the part of
the code into a new static function to build GRANT sql strings.
Also removed the INDENT-OFF marker has the code has changed and should
anyway be reviewed to manage "grantor".
When privileges are set for an attribute, it is required to propage the
privileges when the command is executed but also on any future
synchronisation with a (new) node.
Add 2 tests case with GRANT executed:
* before the distribution of the table
* before a node is added
PostgreSQL introduces MAINTAIN privilege, this change the default output
of some test queries.
I added a new alternative "grant_on_table_propagation_0.out" which is used
for PostgreSQL versions < 17 (so we just have to drop it when 17 will be
the oldest supported version).
In a new file `grant_on_table_propagation, added with others
`grant_on_*_propagation` tests.
Only test related to GRANT/REVOKE columns are added here.
Also included several suggestion from comments in the PR.
Testing with real queries leaded to fix the ACL checks used by Citus
(see previous commit)
Previously this function was only checking ACL at table level, however
for propagating GRANT on attributes, it is required to lock the shard
while the user may not have table privilege at all (but attribute level
ones).
In order to solve that, I add a new parameter to the function: the ACL
mask to be used when checking attributes acl:
-----
Check that the current user has `mode` permissions on relationId.
If not, also check relationId's attributes with `mask`, error out
privileges are not defined.
ACL mask is used because we assume that user has enought privilege
to distribute a table when either ACL_INSERT on the TABLE or
ACL_INSERT on ALL attributes.
In other situations, having a single attribute privilege is enough.
-----
In `EnsureTablePermissions()`, which is used in `lock_shard_resources()`
but also in few other places, citus does check the ACL at the table
level but is not examining attributes ACL.
It prevents GRANT INSERT/UPDATE(col) to work as citus requires to lock
shared but this GRANT is not a table lock.
Change the behavior to check attributes ACL also.
Other usage:
* `Datum get_shard_id_for_distribution_column()`, looks safe.
* `Datum master_create_empty_shard()`, in this case it is apparently
really required to be more careful. This is managed in the next
commit.
`GRANT/REVOKE ALL (col, ...)` cannot be mixed with other privilege and
because there is at least one column defined, the privileges node is
defined so the existing code is valid for table but not for columns.
Add a condition to ensure `ALL` is defined alone and added to the
privilege string when needed.
Comment added based on:
https://github.com/citusdata/citus/pull/7918#discussion_r1994929886
When assigning acl from datum, it's possible that we have a detoasted copy.
In such situation, we ought to pfree() it, here during TABLE GRANTS
propagation.
It's not directly related with the PR, but a discovery while being here.
In multi_read_from_secondaries, there was a select on nodeid and groupid.
But in another (future) test a node is dropped then added again, it's ok but
its nodeid and groupid are increased which leads to a faillure in this
test file.
Apparently nodeid and groupid are not really the subject of the test so
just remove them from the SELECT.
Precisely, I moved the check for cols presence a bit later.
It's safe to exit with ERROR after having append to the string
and it will ease the reading of next commit.
DESCRIPTION: Makes sure to prevent `INSERT INTO ... SELECT` queries involving subfield or sublink, to avoid crashes
The following query was crashing the backend:
```
INSERT INTO field_indirection_test_1 (
int_col, ct1_col.int_1,ct1_col.int_2
) SELECT 0, 1, 2;
-- crash
```
En passant, added more tests with sublink in distributed_types and found
another query with wrong behavior:
```
INSERT INTO domain_indirection_test (f1,f3.if1) SELECT 0, 1;
ERROR: could not find a conversion path from type 23 to 17619
-- not the expected ERROR
```
Fixed them by using `strip_implicit_coercions()` on target entry
expression before checking for the presence of a subscript or
fieldstore, else we fail to find the existing ones and wrongly accept to
execute unsafe query.
DESCRIPTION: Fixes a bug in deparsing of shard query in case of
"output-table column" name conflict
If an `ORDER BY` item in `SELECT` is a bare identifier, the parser
_first seeks it as an output column name_ of the `SELECT` (for SQL92
compatibility). However, ruleutils.c is expecting the SQL99
interpretation _where such a name is an input column name_. So it's
possible to produce an incorrect display of a view in the (admittedly
pretty ill-advised) case where some other column is renamed in the
`SELECT` output list to match an `ORDER BY` column.
The `DISTINCT ON` expressions are interpreted using the same rules as
for `ORDER BY`.
We had an issue reported that actually uses `DISTINCT ON`: #7684
Since Citus uses ruleutils deparsing logic to create the shard queries,
it would not
table-qualify the column names as needed.
PG17 fixed this https://github.com/postgres/postgres/commit/a7eb633563c
by table-qualifying such names in the dumped view text. Therefore,
Citus doesn't reproduce the issue in PG17, since PG17 table-qualifies
the column names when needed, and the produced shard queries are
correct.
This PR applies the PG17 patch to `ruleutils_15.c` and `ruleutils_16.c`.
Even though we generally try to avoid modifying the ruleutils files, in
this case
we are applying a Postgres patch that `ruleutils_17.c` already has:
897d996b8f
Thanks @c2main for your discussion and idea in the issue.
Fixes#7684
DESCRIPTION: Adds citus_is_primary_node() UDF to determine if the
current node is a primary node in the cluster.
---------
Co-authored-by: German Eichberger <geeichbe@microsoft.com>
Co-authored-by: Onur Tirtir <onurcantirtir@gmail.com>
This is a Merge commit that includes all changes from
release-13.0 branch into main branch.
This Merge commit adds PG17 support and drops PG14 support
from the main branch.
Local steps to open this PR and
include `release-13.0` commits to the `main` branch:
```bash
git checkout release-13.0
git checkout -b naisila/merge_13_0
git rebase main
```
Understandably, the rebase step was a resolve-conflict pain. On top of
resolving some conflicts, I had to add some more commits to this PR such
that the main branch compiles and runs as we want it to. Mainly there
were PG17 additions or PG14 subtractions.
I chose this approach as it cleanly stacks _any new_ `release-13.0`
changes on top of the current main branch. Only new ones, not stuff
there is already on main (we had backported several commits from main to
`release-13.0`, so we ignore those in this PR). The idea is to merge all
these commits in the main branch, not squash and merge.
Note 0: We should remove PG14 tests from required tests as this PR
will drop PG14 support in the main branch as well.
Note 1: `check-style` fails because it considers
`src/backend/distributed/sql/citus--12.1-1--12.2-1.sql` as deleted, and
`src/backend/distributed/sql/downgrades/citus--12.2-1--12.1-1.sql` as
renamed. The reason is that the downgrade script actually stayed 98% the
same therefore was considered a rename. I don't think we can fix this.
Note 2:
I tried the following approach as well:
```bash
git checkout main
git checkout -b naisila/merge_13_0
git merge release-13.0
```
However, this approach was a mess as it included several irrelevant
commits that differ between the main and `release-13.0` branch which
just make this PR difficult to understand. For reference, I have pushed
a different branch with that approach.
https://github.com/citusdata/citus/tree/naisila/merge_13_0_first_try As
you can see it's 156 commits ahead of main, with irrelevant commits such
as
1b4d7a51f8.
The reason is that it's including commits from the very first point of
divergence between `main` and `release-12.1` branch (because we had
cloned `release-13.0` branch from `release-12.1` branch, not `main`).
This commit also has to do with renaming of
daticulocale to datlocale
Relevant PG commit:
f696c0cd5f299f1b51e214efc55a22a782cc175d
f696c0cd5f
Keeping this commit separate from the previous one because
these changes will be different once we drop PG15 support.
For now I renamed pg_ge_15_options to pg_ge_15_17_options
and together with it I changed the meaning of the variable.
However when we drop PG14 support, we will use pg_ge_17_options
and delete pg_ge_15_options altogether
DESCRIPTION: Fixes a bug with `UPDATE SET (...) = (SELECT
some_func(),... )` (#7676)
Citus was checking for presence of sublink, but forgot to manage
multiexpr while evaluating clauses during planning. At this stage (citus
planner), it's not always possible to call PostgreSQL code because the
tree is not yet ready for PostgreSQL pure executor.
Fixes https://github.com/citusdata/citus/issues/7676.
Fixed by adding a new function to check sublink or multiexpr in the
tree.
---------
Co-authored-by: Colm <colmmchugh@microsoft.com>
## Enhance `AddInsertSelectCasts` for Identity Columns
This PR fixes#7887 and improves the behavior of partial inserts into
**identity columns** by modifying the **`AddInsertSelectCasts`**
function. Specifically, we introduce **special-case handling** for
`nextval(...)` calls (represented in the parse tree as `NextValueExpr`)
to ensure that if the identity column’s declared type differs from
`nextval`’s default return type (`int8`), we **cast** the expression
properly. This prevents mismatches like `int8` → `int4` from causing
“invalid string enlargement” errors or other type-related failures.
When `INSERT ... SELECT` is processed, `AddInsertSelectCasts` reconciles
each target column’s type with the corresponding SELECT expression’s
type. Historically, for identity columns that rely on `nextval(...)`, we
can end up with a mismatch:
- `nextval` returns **`int8`**,
- The identity column might be **`int4`**, **`bigint`**, or another
integer type.
Without a correct cast, Postgres or Citus can produce plan-time or
runtime errors. By **detecting** `NextValueExpr` and applying a cast to
the column’s type, the final plan ensures consistent insertion without
errors.
## What Changed
1. **Check for `NextValueExpr`**:
In `AddInsertSelectCasts`, we now have a code block:
```c
if (IsA(selectEntry->expr, NextValueExpr))
{
Oid nextvalType = GetNextvalReturnTypeCatalog();
...
// If (targetType != nextvalType), build a cast from int8 -> targetType
}
else
{
// fallback to generic mismatch logic
}
```
This short-circuits any expression that’s a `nextval(...)` call, letting
us explicitly cast to the correct type.
2. **Fallback Generic Logic**:
If it isn’t a `NextValueExpr` (i.e. a normal column or expression
mismatch), we still rely on the existing path that compares `sourceType`
vs. `targetType` and calls `CastExpr(...)` if they differ.
3. **`GetNextvalReturnTypeCatalog`**:
We added or refined a helper function to confirm that `nextval` returns
`int8`, or do a `LookupFuncName("nextval", ...)` to discover the
function’s return type from `pg_proc`—making it robust if future changes
happen.
## Benefits
- **Partial inserts** into identity columns no longer fail with type
mismatches.
- When `nextval` yields `int8` but the identity column is `int4` (or
another type), we properly cast to the column’s type in the plan.
- Preserves the **existing** approach for other columns—only identity
calls get the specialized `NextValueExpr` logic.
## Testing
- Extended `generatedidentity.sql` test scenario to cover partial
inserts into both `GENERATED ALWAYS` and `GENERATED BY DEFAULT` identity
columns, including tests for the `OVERRIDING SYSTEM VALUE` clause and
partial inserts referencing foreign-key columns.
DESCRIPTION: Fixes deadlock with transaction recovery that is possible
during Citus upgrades.
Fixes#7875.
This commit addresses two interrelated deadlock issues uncovered during Citus
upgrades:
1. Local Deadlock:
- **Problem:**
In `RecoverWorkerTransactions()`, a new connection is created for each worker
node to perform transaction recovery by locking the
`pg_dist_transaction` catalog table until the end of the transaction. When
`RecoverTwoPhaseCommits()` calls this function for each worker node, the order
of acquiring locks on `pg_dist_authinfo` and `pg_dist_transaction` can alternate.
This reversal can lead to a deadlock if any concurrent process requires locks on
these tables.
- **Fix:**
Pre-establish all worker node connections upfront so that
`RecoverWorkerTransactions()` operates with a single, consistent connection.
This ensures that locks on `pg_dist_authinfo` and `pg_dist_transaction` are always
acquired in the correct order, thereby preventing the local deadlock.
2. Distributed Deadlock:
- **Problem:**
After resolving the local deadlock, a distributed deadlock issue emerges. The
maintenance daemon calls `RecoverWorkerTransactions()` on each worker node—
including the local node—which leads to a complex locking sequence:
- A RowExclusiveLock is taken on the `pg_dist_transaction` table in
`RecoverWorkerTransactions()`.
- An update extension then tries to acquire an AccessExclusiveLock on the same
table, getting blocked by the RowExclusiveLock.
- A subsequent query (e.g., a SELECT on `pg_prepared_xacts`) issued using a
separate connection on the local node gets blocked due to locks held during a
call to `BuildCitusTableCacheEntry()`.
- The maintenance daemon waits for this query, resulting in a circular wait and
stalling the entire cluster.
- **Fix:**
Avoid cache lookups for internal PostgreSQL tables by implementing an early bailout
for relation IDs below `FirstNormalObjectId` (system objects). This eliminates
unnecessary calls to `BuildCitusTableCache`, reducing lock contention and mitigating
the distributed deadlock.
Furthermore, this optimization improves performance in fast
connect→query_catalog→disconnect cycles by eliminating redundant
cache creation and lookups.
3. Also reverts the commit that disabled the relevant test cases.
DESCRIPTION: fix a planning error caused by a redundant WHERE clause
Fix a Citus planning glitch that occurs in a DML query when the WHERE
clause of the query is of the form:
` WHERE true OR <expression with 1 or more citus tables> `
and this is the only place in the query referencing a citus table.
Postgres' standard planner transforms the WHERE clause to:
` WHERE true `
So the query now has no citus tables, confusing the Citus planner as
described in issues #7782 and #7783. The fix is to check, after Postgres
standard planner, if the Query has been transformed as shown, and re-run
the check of whether or not the query needs distributed planning.
This PR fixes an issue #7891 in the Citus planner where an `UPDATE` on a
local table with a subquery referencing a reference table could produce
a 0-task plan. Historically, the planner sometimes failed to detect that
both the target and referenced tables were effectively “local,”
assigning `INVALID_SHARD_ID `and yielding a no-op plan.
### Root Cause
- In the Citus router logic (`PlanRouterQuery`), we relied on `shardId`
to determine whether a query should be routed to a single shard.
- If `shardId == INVALID_SHARD_ID`, but we also had not marked the query
as a “local table modification,” the code path would produce zero tasks.
- Local + reference tables do not require multi-shard routing. Failing
to detect this “purely local” scenario caused Citus to incorrectly route
to zero tasks.
### Changes
**Enhanced Local Table Detection**
- Updated `IsLocalTableModification` and related checks to consider both
local and reference tables as “local” for planning, preventing the
0-task scenario.
- Expanded `ContainsOnlyLocalOrReferenceTables` to return true if there
are no fully distributed tables in the query.
**Added Regress Test**
- Introduced a new regress test (`issue_7891.sql`) which reproduces the
scenario.
- Verifies we get a valid single- or local-task plan rather than a
0-task plan.
DESCRIPTION: Ensure that a MERGE command on a distributed table with a
`WHEN NOT MATCHED BY SOURCE` clause runs against all shards of the
distributed table.
The Postgres MERGE command updates a table using a table or a query as a
data source. It provides three ways to match the target table with the
source: `WHEN MATCHED` means that there is a row in both the target and
source; `WHEN NOT MATCHED` means that there is a row in the source that
has no match (is not present) in the target; and, as of PG17, `WHEN NOT
MATCHED BY SOURCE` means that there is a row in the target that has no
match in the source.
In Citus, when a MERGE command updates a distributed table using a
local/reference table or a distributed query as source, that source is
repartitioned, and for each repartitioned shard that has data (i.e. 1 or
more rows) the MERGE is run against the corresponding distributed table
shard. Suppose the distributed table has 32 shards, and the source
repartitions into 4 shards that have data, with the remaining 28 shards
being empty; then the MERGE command is performed on the 4 corresponding
shards of the distributed table. However, the semantics of `WHEN NOT
MATCHED BY SOURCE` are that the specified action must be performed on
the target for each row in the target that is not in the source; so if
the source is empty, all target rows should be updated. To see this,
consider the following MERGE command:
```
MERGE INTO target AS t
USING source AS s ON t.id = s.id
WHEN NOT MATCHED BY SOURCE THEN UPDATE t SET t.col1 = 100
```
If the source has zero rows then every row in the target is updated s.t.
its col1 value is 100. Currently in Citus a MERGE on a distributed table
with a local/reference table or a distributed query as source ignores
shards of the distributed table when the corresponding shard of the
repartitioned source has zero rows. However, if the MERGE command
specifies a `WHEN NOT MATCHED BY SOURCE` clause, then the MERGE should
be performed on all shards of the distributed table, to ensure that the
specified action is performed on the target for each row in the target
that is not in the source. This PR enhances Citus MERGE execution so
that when a repartitioned source shard has zero rows, and the MERGE
command specifies a `WHEN NOT MATCHED BY SOURCE` clause, the MERGE is
performed against the corresponding shard of the distributed table using
an empty (zero row) relation as source, by generating a query of the
form:
```
MERGE INTO target_shard_0002 AS t
USING (SELECT id FROM (VALUES (NULL) ) source_0002(id) WHERE FALSE) AS s ON t.id = s.id
WHEN NOT MATCHED BY SOURCE THEN UPDATE t set t.col1 = 100
```
This works because each row in the target shard will be updated, and
`WHEN MATCHED` and `WHEN NOT MATCHED`, if specified, will be no-ops
because the source has zero rows.
To implement this when the source is a local or reference table involves
teaching function `ExcuteSourceAtCoordAndRedistribution()` in
`merge_executor.c` to not prune tasks when the query has `WHEN NOT
MATCHED BY SOURCE` but to instead replace the task's query to one that
uses an empty relation as source. And when the source is a distributed
query, function
`ExecuteMergeSourcePlanIntoColocatedIntermediateResults()` (also in
`merge_executor.c`) instead of skipping empty tasks now generates a
query that uses an empty relation as source for the corresponding target
shard of the distributed table, but again only when the query has `WHEN
NOT MATCHED BY SOURCE`. A new function `BuildEmptyResultQuery()` is
added to `recursive_planning.c` and it is used by both the
aforementioned functions in `merge_executor.c` to build an empty
relation to use as the source. It applies the appropriate type to each
column of the empty relation so the join with the target makes sense to
the query compiler.
DESCRIPTION: Fixes a crash in columnar custom scan that happens when a
columnar table is used in a join. Fixes issue #7647.
Co-authored-by: Ольга Сергеева <ob-sergeeva@it-serv.ru>
DESCRIPTION: Fixes a crash in left outer joins that can happen when
there is an an aggregate on a column from the inner side of the join.
Fix the SEGV seen in #7787 and #7899; it occurs because a column in the
targetlist of a worker subquery can contain a non-empty varnullingrels
field if the column is from the inner side of a left outer join. The
issue can also occur with the columns in the HAVING clause, and this is
also tested in the fix. The issue was triggered by the introduction of
the varnullingrels to Vars in Postgres 16 (2489d76c)
There is a related issue, #7705, where a non-empty varnullingrels was
incorrectly copied into the query tree for the combine query. Here, a
non-empty varnullingrels field of a var is incorrectly copied into the
query tree for a worker subquery.
The regress file from #7705 is used (and renamed) to also test this
(#7787). An alternative test output file is required for Postgres 15
because of an optimization to DISTINCT in Postgres 16 (1349d2790bf).
DESCRIPTION: Drops PG14 support
1. Remove "$version_num" != 'xx' from configure file
2. delete all PG_VERSION_NUM = PG_VERSION_XX references in the code
3. Look at pg_version_compat.h file, remove all _compat functions etc
defined specifically for PGXX differences
4. delete all PG_VERSION_NUM >= PG_VERSION_(XX+1), PG_VERSION_NUM <
PG_VERSION_(XX+1) ifs in the codebase
5. delete ruleutils_xx.c file
6. cleanup normalize.sed file from pg14 specific lines
7. delete all alternative output files for that particular PG version,
server_version_ge variable helps here
As of this commit, after recovering the remote transactions, now we release the lock
on pg_dist_transaction while closing it to avoid deadlocks that might occur because
of trying to acquire a lock on pg_dist_authinfo while holding a lock on
pg_dist_transaction. Such a scenario can only cause a deadlock if another transaction
is trying to acquire a strong lock on pg_dist_transaction while holding a lock on
pg_dist_authinfo. As of today, we (implicitly) acquire a strong lock on
pg_dist_transaction only when upgrading Citus to 11.3-1 and this happens when creating
a REPLICA IDENTITY on pg_dist_transaction.
And regardless of the code-path we are in, it should be okay to release the lock there
because all we do after that point is to abort the prepared transactions that are not
part of an in-progress distributed transaction and releasing the lock before doing so
should be just fine.
This also changes the blocking behavior between citus_create_restore_point and the
transaction recovery code-path in the sense that now citus_create_restore_point doesn't
until transaction recovery completes aborting the prepared transactions that are not
part of an in-progress distributed transaction. However, this should be fine because
even before this was possible, e.g., if transaction recovery fails to open a remote
connection to a node.