**DESCRIPTION:**
This pull request introduces the foundation and core logic for the
snapshot-based node split feature in Citus. This feature enables
promoting a streaming replica (referred to as a clone in this feature
and UI) to a primary node and rebalancing shards between the original
and the newly promoted node without requiring a full data copy.
This significantly reduces rebalance times for scale-out operations
where the new node already contains a full copy of the data via
streaming replication.
Key Highlights:
**1. Replica (Clone) Registration & Management Infrastructure**
Introduces a new set of UDFs to register and manage clone nodes:
- citus_add_clone_node()
- citus_add_clone_node_with_nodeid()
- citus_remove_clone_node()
- citus_remove_clone_node_with_nodeid()
These functions allow administrators to register a streaming replica of
an existing worker node as a clone, making it eligible for later
promotion via snapshot-based split.
**2. Snapshot-Based Node Split (Core Implementation)**
New core UDF:
- citus_promote_clone_and_rebalance()
This function implements the full workflow to promote a clone and
rebalance shards between the old and new primaries. Steps include:
1. Ensuring Exclusivity – Blocks any concurrent placement-changing
operations.
2. Blocking Writes – Temporarily blocks writes on the primary to ensure
consistency.
3. Replica Catch-up – Waits for the replica to be fully in sync.
4. Promotion – Promotes the replica to a primary using pg_promote.
5. Metadata Update – Updates metadata to reflect the newly promoted
primary node.
6. Shard Rebalancing – Redistributes shards between the old and new
primary nodes.
**3. Split Plan Preview**
A new helper UDF get_snapshot_based_node_split_plan() provides a preview
of the shard distribution post-split, without executing the promotion.
**Example:**
```
reb 63796> select * from pg_catalog.get_snapshot_based_node_split_plan('127.0.0.1',5433,'127.0.0.1',5453);
table_name | shardid | shard_size | placement_node
--------------+---------+------------+----------------
companies | 102008 | 0 | Primary Node
campaigns | 102010 | 0 | Primary Node
ads | 102012 | 0 | Primary Node
mscompanies | 102014 | 0 | Primary Node
mscampaigns | 102016 | 0 | Primary Node
msads | 102018 | 0 | Primary Node
mscompanies2 | 102020 | 0 | Primary Node
mscampaigns2 | 102022 | 0 | Primary Node
msads2 | 102024 | 0 | Primary Node
companies | 102009 | 0 | Clone Node
campaigns | 102011 | 0 | Clone Node
ads | 102013 | 0 | Clone Node
mscompanies | 102015 | 0 | Clone Node
mscampaigns | 102017 | 0 | Clone Node
msads | 102019 | 0 | Clone Node
mscompanies2 | 102021 | 0 | Clone Node
mscampaigns2 | 102023 | 0 | Clone Node
msads2 | 102025 | 0 | Clone Node
(18 rows)
```
**4 Test Infrastructure Enhancements**
- Added a new test case scheduler for snapshot-based split scenarios.
- Enhanced pg_regress_multi.pl to support creating node backups with
slightly modified options to simulate real-world backup-based clone
creation.
### 5. Usage Guide
The snapshot-based node split can be performed using the following
workflow:
**- Take a Backup of the Worker Node**
Run pg_basebackup (or an equivalent tool) against the existing worker
node to create a physical backup.
`pg_basebackup -h <primary_worker_host> -p <port> -D
/path/to/replica/data --write-recovery-conf
`
**- Start the Replica Node**
Start PostgreSQL on the replica using the backup data directory,
ensuring it is configured as a streaming replica of the original worker
node.
**- Register the Backup Node as a Clone**
Mark the registered replica as a clone of its original worker node:
`SELECT * FROM citus_add_clone_node('<clone_host>', <clone_port>,
'<primary_host>', <primary_port>);
`
**- Promote and Rebalance the Clone**
Promote the clone to a primary and rebalance shards between it and the
original worker:
`SELECT * FROM citus_promote_clone_and_rebalance('clone_node_id');
`
**- Drop Any Replication Slots from the Original Worker**
After promotion, clean up any unused replication slots from the original
worker:
`SELECT pg_drop_replication_slot('<slot_name>');
`
DESCRIPTION: Parallelizes shard rebalancing and removes the bottlenecks
that previously blocked concurrent logical-replication moves.
These improvements reduce rebalance windows—particularly for clusters
with large reference tables and enable multiple shard transfers to run in parallel.
Motivation:
Citus’ shard rebalancer has some key performance bottlenecks:
**Sequential Movement of Reference Tables:**
Reference tables are often assumed to be small, but in real-world
deployments, they can grow significantly large. Previously, reference
table shards were transferred as a single unit, making the process
monolithic and time-consuming.
**No Parallelism Within a Colocation Group:**
Although Citus distributes data using colocated shards, shard
movements within the same colocation group were serialized. In
environments with hundreds of distributed tables colocated
together, this serialization significantly slowed down rebalance
operations.
**Excessive Locking:**
Rebalancer used restrictive locks and redundant logical replication
guards, further limiting concurrency.
The goal of this commit is to eliminate these inefficiencies and enable
maximum parallelism during rebalance, without compromising correctness
or compatibility. Parallelize shard rebalancing to reduce rebalance
time.
Feature Summary:
**1. Parallel Reference Table Rebalancing**
Each reference-table shard is now copied in its own background task.
Foreign key and other constraints are deferred until all shards are
copied.
For single shard movement without considering colocation a new
internal-only UDF '`citus_internal_copy_single_shard_placement`' is
introduced to allow single-shard copy/move operations.
Since this function is internal, we do not allow users to call it
directly.
**Temporary Hack to Set Background Task Context** Background tasks
cannot currently set custom GUCs like application_name before executing
internal-only functions. 'citus_rebalancer ...' statement as a prefix in
the task command. This is a temporary hack to label internal tasks until
proper GUC injection support is added to the background task executor.
**2. Changes in Locking Strategy**
- Drop the leftover replication lock that previously serialized shard
moves performed via logical replication. This lock was only needed when
we used to drop and recreate the subscriptions/publications before each
move. Since Citus now removes those objects later as part of the “unused
distributed objects” cleanup, shard moves via logical replication can
safely run in parallel without additional locking.
- Introduced a per-shard advisory lock to prevent concurrent operations
on the same shard while allowing maximum parallelism elsewhere.
- Change the lock mode in AcquirePlacementColocationLock from
ExclusiveLock to RowExclusiveLock to allow concurrent updates within the
same colocation group, while still preventing concurrent DDL operations.
**3. citus_rebalance_start() enhancements**
The citus_rebalance_start() function now accepts two new optional
parameters:
```
- parallel_transfer_colocated_shards BOOLEAN DEFAULT false,
- parallel_transfer_reference_tables BOOLEAN DEFAULT false
```
This ensures backward compatibility by preserving the existing behavior
and avoiding any disruption to user expectations and when both are set
to true, the rebalancer operates with full parallelism.
**Previous Rebalancer Behavior:**
`SELECT citus_rebalance_start(shard_transfer_mode := 'force_logical');`
This would:
Start a single background task for replicating all reference tables
Then, move all shards serially, one at a time.
```
Task 1: replicate_reference_tables()
↓
Task 2: move_shard_1()
↓
Task 3: move_shard_2()
↓
Task 4: move_shard_3()
```
Slow and sequential. Reference table copy is a bottleneck. Colocated
shards must wait for each other.
**New Parallel Rebalancer:**
```
SELECT citus_rebalance_start(
shard_transfer_mode := 'force_logical',
parallel_transfer_colocated_shards := true,
parallel_transfer_reference_tables := true
);
```
This would:
- Schedule independent background tasks for each reference-table shard.
- Move colocated shards in parallel, while still maintaining dependency
order.
- Defer constraint application until all reference shards are in place.
-
```
Task 1: copy_ref_shard_1()
Task 2: copy_ref_shard_2()
Task 3: copy_ref_shard_3()
→ Task 4: apply_constraints()
↓
Task 5: copy_shard_1()
Task 6: copy_shard_2()
Task 7: copy_shard_3()
↓
Task 8-10: move_shard_1..3()
```
Each operation is scheduled independently and can run as soon as
dependencies are satisfied.
Enhance security by addressing a code scanning alert and refactoring the
background worker setup code for better maintainability and clarity.
---------
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
DESCRIPTION: Fixes potential memory corruptions that could happen when a
Citus downgrade is followed by a Citus upgrade.
In case of citus downgrade and further upgrade citus crash with core
dump.
The reason is that citus hardcoded number of columns in
pg_dist_partition table,
but in case of downgrade and following update table can have more
columns, and
some of then can be marked as dropped.
Patch suggest decision for this problem with using
tupleDescriptor->nattrs(postgres internal approach).
Fixes#7933.
---------
Co-authored-by: Onur Tirtir <onurcantirtir@gmail.com>
DESCRIPTION: Fixed a bug in EXPLAIN ANALYZE to prevent unintended (duplicate) execution of the (sub)plans during the explain phase.
Fixes#4212
### 🐞 Bug #4212 : Redundant (Subplan) Execution in `EXPLAIN ANALYZE`
codepath
#### 🔍 Background
In the standard PostgreSQL execution path, `ExplainOnePlan()` is
responsible for two distinct operations depending on whether `EXPLAIN
ANALYZE` is requested:
1. **Execute the plan**
```c
if (es->analyze)
ExecutorRun(queryDesc, direction, 0L, true);
```
2. **Print the plan tree**
```c
ExplainPrintPlan(es, queryDesc);
```
When printing the plan, the executor should **not run the plan again**.
Execution is only expected to happen once—at the top level when
`es->analyze = true`.
---
#### ⚠️ Issue in Citus
In the Citus implementation of `CustomScanMethods.ExplainCustomScan =
CitusExplainScan`, which is a custom scan explain callback function used
to print explain information of a Citus plan incorrectly performs
**redundant execution** inside the explain path of `ExplainPrintPlan()`
```c
ExplainOnePlan()
ExplainPrintPlan()
ExplainNode()
CitusExplainScan()
if (distributedPlan->subPlanList != NIL)
{
ExplainSubPlans(distributedPlan, es);
{
PlannedStmt *plan = subPlan->plan;
ExplainOnePlan(plan, ...); // ⚠️ May re-execute subplan if es->analyze is true
}
}
```
This causes the subplans to be **executed again**, even though they have
already been executed during the top-level plan execution. This behavior
violates the expectation in PostgreSQL where `EXPLAIN ANALYZE` should
**execute each node exactly once** for analysis.
---
#### ✅ Fix (proposed)
Save the output of Subplans during `ExecuteSubPlans()`, and later use it
in `ExplainSubPlans()`
DESCRIPTION: Avoid query deparse and planning of shard query in local execution. Adds citus.enable_local_execution_local_plan GUC to allow avoiding unnecessary query deparsing to improve performance of fast-path queries targeting local shards.
If a fast path query resolves to a shard that is local to the node planning the query, a shortcut can be taken so that the OID of the shard is plugged into the parse tree, which is then planned by Postgres. In `local_executor.c` the task uses that plan instead of parsing and planning a shard query. How this is done: The fast path planner identifies if the shortcut is possible, and then the distributed planner checks, using `CheckAndBuildDelayedFastPathPlan()`, if a local plan can be generated or if the shard query should be generated.
This optimization is controlled by a GUC `citus.enable_local_execution_local_plan` which is on by default. A new
regress test `local_execution_local_plan` tests both row-sharding and schema sharding. Negative tests are added to
`local_shard_execution_dropped_column` to verify that the optimization is not taken when the shard is local but there is a difference between the shard and distributed table because of a dropped column.
This PR provides successful build against PG18Beta1. RuleUtils PR was
reviewed separately: #8010
## PG 18Beta1–related changes for building Citus
### TupleDesc / Attr layout
**What changed in PG:** Postgres consolidated the
`TupleDescData.attrs[]` array into a more compact representation. Direct
field access (tupdesc->attrs[i]) was replaced by the new
`TupleDescAttr()` API.
**Citus adaptation:** Everywhere we previously used
`tupdesc->attrs[...]`, we now call `TupleDescAttr(tupdesc, idx)` (or our
own `Attr()` macro) under a compatibility guard.
*
5983a4cffc
General Logic:
* Use `Attr(...)` in places where `columnar_version_compat.h` is
included. This avoids the need to sprinkle `#if PG_VERSION_NUM` guards
around each attribute access.
* Use `TupleDescAttr(tupdesc, i)` when the relevant PostgreSQL header is
already included and the additional macro indirection is unnecessary.
### Collation‐aware `LIKE`
**What changed in PG:** The `textlike` operator now requires an explicit
collation, to avoid ambiguous‐collation errors. Core code switched from
`DirectFunctionCall2(textlike, ...)` to
`DirectFunctionCall2Coll(textlike, DEFAULT_COLLATION_OID, ...)`.
**Citus adaptation:** In `remote_commands.c` and any other LIKE call, we
now use `DirectFunctionCall2Coll(textlike, DEFAULT_COLLATION_OID, ...)`
and `#include <utils/pg_collation.h>`.
*
85b7efa1cd
### Columnar storage API
* Adapt `columnar_relation_set_new_filelocator` (and related init
routines) for PG 18’s revised SMGR and storage-initialization hooks.
* Pull in the new headers (`explain_format.h`,
`columnar_version_compat.h`) so the columnar module compiles cleanly
against PG 18.
- heap_modify_tuple + heap_inplace_update only exist on PG < 18; on PG18
the in-place helper was removed upstream
-
a07e03fd8f
### OpenSSL / TLS integration
**What changed in PG:** Moved from the legacy `SSL_library_init()` to
`OPENSSL_init_ssl(OPENSSL_INIT_LOAD_CONFIG, NULL)`, updated certificate
API calls (`X509_getm_notBefore`, `X509_getm_notAfter`), and
standardized on `TLS_method()`.
**Citus adaptation:** We now `#include <openssl/opensslv.h>` and use
`#if OPENSSL_VERSION_NUMBER >= 0x10100000L` to choose between`
OPENSSL_init_ssl()` or `SSL_library_init()`, and wrap`
X509_gmtime_adj()` calls around the new accessor functions.
*
6c66b7443c
### Adapt `ExtractColumns()` to the new PG-18 `expandRTE()` signature
PostgreSQL 18
80feb727c8
added a fourth argument of type `VarReturningType` to `expandRTE()`, so
calls that used the old 7-parameter form no longer compile. This patch:
* Wraps the `expandRTE(...)` call in a `#if PG_VERSION_NUM >= 180000`
guard.
* On PG 18+ passes the new `VAR_RETURNING_DEFAULT` argument before
`location`.
* On PG 15–17 continues to call the original 7-arg form.
* Adds the necessary includes (`parser/parse_relation.h` for `expandRTE`
and `VarReturningType`, and `pg_version_constants.h` for
`PG_VERSION_NUM`).
### Adapt `ExecutorStart`/`ExecutorRun` hooks to PG-18’s new signatures
PostgreSQL 18
525392d572
changed the signatures of the executor hooks:
* `ExecutorStart_hook` now returns `bool` instead of `void`, and
* `ExecutorRun_hook` drops its old `run_once` argument.
This patch preserves Citus’s existing hook logic by:
1. **Adding two adapter functions** under `#if PG_VERSION_NUM >=
PG_VERSION_18`:
* `citus_executor_start_adapter(QueryDesc *queryDesc, int eflags)`
Calls the old `CitusExecutorStart(queryDesc, eflags)` and then returns
`true` to satisfy the new hook’s `bool` return type.
* `citus_executor_run_adapter(QueryDesc *queryDesc, ScanDirection
direction, uint64 count)`
Calls the old `CitusExecutorRun(queryDesc, direction, count, true)`
(passing `true` for the dropped `run_once` argument), and returns
`void`.
2. **Installing the adapters** in `_PG_init()` instead of the original
hooks when building against PG 18+:
```c
#if PG_VERSION_NUM >= PG_VERSION_18
ExecutorStart_hook = citus_executor_start_adapter;
ExecutorRun_hook = citus_executor_run_adapter;
#else
ExecutorStart_hook = CitusExecutorStart;
ExecutorRun_hook = CitusExecutorRun;
#endif
```
### Adapt to PG-18’s removal of the “run\_once” flag from
ExecutorRun/PortalRun
PostgreSQL commit
[[3eea7a0](3eea7a0c97)
rationalized the executor’s parallelism logic by moving the “execute a
plan only once” check into `ExecutePlan()` itself and dropping the old
`bool run_once` argument from the public APIs:
```diff
- void ExecutorRun(QueryDesc *queryDesc,
- ScanDirection direction,
- uint64 count,
- bool run_once);
+ void ExecutorRun(QueryDesc *queryDesc,
+ ScanDirection direction,
+ uint64 count);
```
(and similarly for `PortalRun()`).
To stay compatible across PG 15–18, Citus now:
1. **Updates all internal calls** to `ExecutorRun(...)` and
`PortalRun(...)`:
* On PG 18+, use the new three-argument form (`ExecutorRun(qd, dir,
count)`).
* On PG 15–17, keep the old four-arg form (`ExecutorRun(qd, dir, count,
true)`) under a `#if PG_VERSION_NUM < 180000` guard.
2. **Guards the dispatcher hooks** via the adapter functions (from the
earlier patch) so that Citus’s executor hooks continue to work under
both the old and new signatures.
### Adapt to PG-18’s shortened PortalRun signature
PostgreSQL 18’s refactoring (see commit
[3eea7a0](3eea7a0c97))
also removed the old run_once and alternate‐dest arguments from the
public PortalRun() API. The signature changed from:
```diff
- bool PortalRun(Portal portal,
- long count,
- bool isTopLevel,
- bool run_once,
- DestReceiver *dest,
- DestReceiver *altdest,
- QueryCompletion *qc);
+ bool PortalRun(Portal portal,
+ long count,
+ bool isTopLevel,
+ DestReceiver *dest,
+ DestReceiver *altdest,
+ QueryCompletion *qc);
```
To support both versions in Citus, we:
1. **Version-guard each call** to `PortalRun()`:
* **On PG 18+** invoke the new 6-argument form.
* **On PG 15–17** fall back to the legacy 7-argument form, passing
`true` for `run_once`.
### Add support for PG-18’s new `plansource` argument in
`PortalDefineQuery`**
PostgreSQL 18 extended the `PortalDefineQuery` API to carry a
`CachedPlanSource *plansource` pointer so that the portal machinery can
track cached‐plan invalidation (as introduced alongside deferred-locking
in commit
525392d572.
To remain compatible across PG 15–18, Citus now wraps its calls under a
version guard:
```diff
- PortalDefineQuery(portal, NULL, sql, commandTag, plantree_list, NULL);
+#if PG_VERSION_NUM >= 180000
+ /* PG 18+: seven-arg signature (adds plansource) */
+ PortalDefineQuery(
+ portal,
+ NULL, /* no prepared-stmt name */
+ sql, /* the query text */
+ commandTag, /* the CommandTag */
+ plantree_list, /* List of PlannedStmt* */
+ NULL, /* no CachedPlan */
+ NULL /* no CachedPlanSource */
+ );
+#else
+ /* PG 15–17: six-arg signature */
+ PortalDefineQuery(
+ portal,
+ NULL, /* no prepared-stmt name */
+ sql, /* the query text */
+ commandTag, /* the CommandTag */
+ plantree_list, /* List of PlannedStmt* */
+ NULL /* no CachedPlan */
+ );
+#endif
```
### Adapt ExecInitRangeTable() calls to PG-18’s new signature
PostgreSQL commit
[cbc127917e04a978a788b8bc9d35a70244396d5b](cbc127917e)
overhauled the planner API for range‐table initialization:
**PG 18+**: added a fourth `Bitmapset *unpruned_relids` argument to
support deferred partition pruning
In Citus’s `create_estate_for_relation()` (in `columnar_metadata.c`), we
now wrap the call in a compile‐time guard so that the code compiles
correctly on all supported PostgreSQL versions:
```
/* Prepare permission info on PG 16+ */
#if PG_VERSION_NUM >= PG_VERSION_16
List *perminfos = NIL;
addRTEPermissionInfo(&perminfos, rte);
#else
List *perminfos = NIL; /* unused on PG 15 */
#endif
/* Initialize the range table, with the right signature for each PG version */
#if PG_VERSION_NUM >= PG_VERSION_18
/* PG 18+: four‐arg signature (adds unpruned_relids) */
ExecInitRangeTable(
estate,
list_make1(rte),
perminfos,
NULL /* unpruned_relids: not used by columnar */
);
#elif PG_VERSION_NUM >= PG_VERSION_16
/* PG 16–17: three‐arg signature (permInfos) */
ExecInitRangeTable(
estate,
list_make1(rte),
perminfos
);
#else
/* PG 15: two‐arg signature */
ExecInitRangeTable(
estate,
list_make1(rte)
);
#endif
estate->es_output_cid = GetCurrentCommandId(true);
```
### Adapt `pgstat_report_vacuum()` to PG-18’s new timestamp argument
PostgreSQL commit
[[30a6ed0ce4bb18212ec38cdb537ea4b43bc99b83](30a6ed0ce4)
extended the `pgstat_report_vacuum()` API by adding a `TimestampTz
start_time` parameter at the end so that the VACUUM statistics collector
can record when the operation began:
```diff
/* PG ≤17: four-arg signature */
- void pgstat_report_vacuum(Oid tableoid,
- bool shared,
- double num_live_tuples,
- double num_dead_tuples);
+/* PG ≥18: five-arg signature adds a start_time */
+ void pgstat_report_vacuum(Oid tableoid,
+ bool shared,
+ double num_live_tuples,
+ double num_dead_tuples,
+ TimestampTz start_time);
```
To support both versions, we now wrap the call in `columnar_tableam.c`
with a version guard, supplying `GetCurrentTimestamp()` for PG-18+:
```c
#if PG_VERSION_NUM >= 180000
/* PG 18+: include start_timestamp */
pgstat_report_vacuum(
RelationGetRelid(rel),
rel->rd_rel->relisshared,
Max(new_live_tuples, 0), /* live tuples */
0, /* dead tuples */
GetCurrentTimestamp() /* start time */
);
#else
/* PG 15–17: original signature */
pgstat_report_vacuum(
RelationGetRelid(rel),
rel->rd_rel->relisshared,
Max(new_live_tuples, 0), /* live tuples */
0 /* dead tuples */
);
#endif
```
### Adapt `ExecuteTaskPlan()` to PG-18’s expanded `CreateQueryDesc()`
signature
PostgreSQL 18 changed `CreateQueryDesc()` from an eight-argument to a
nine-argument call by inserting a `CachedPlan *cplan` parameter
immediately after the `PlannedStmt *plannedstmt` argument (see commit
525392d572).
To remain compatible with PG 15–17, Citus now wraps its invocation in
`local_executor.c` with a version guard:
```diff
- /* PG15–17: eight-arg CreateQueryDesc without cached plan */
- QueryDesc *queryDesc = CreateQueryDesc(
- taskPlan, /* PlannedStmt *plannedstmt */
- queryString, /* const char *sourceText */
- GetActiveSnapshot(),/* Snapshot snapshot */
- InvalidSnapshot, /* Snapshot crosscheck_snapshot */
- destReceiver, /* DestReceiver *dest */
- paramListInfo, /* ParamListInfo params */
- queryEnv, /* QueryEnvironment *queryEnv */
- 0 /* int instrument_options */
- );
+#if PG_VERSION_NUM >= 180000
+ /* PG18+: nine-arg CreateQueryDesc with a CachedPlan slot */
+ QueryDesc *queryDesc = CreateQueryDesc(
+ taskPlan, /* PlannedStmt *plannedstmt */
+ NULL, /* CachedPlan *cplan (none) */
+ queryString, /* const char *sourceText */
+ GetActiveSnapshot(),/* Snapshot snapshot */
+ InvalidSnapshot, /* Snapshot crosscheck_snapshot */
+ destReceiver, /* DestReceiver *dest */
+ paramListInfo, /* ParamListInfo params */
+ queryEnv, /* QueryEnvironment *queryEnv */
+ 0 /* int instrument_options */
+ );
+#else
+ /* PG15–17: eight-arg CreateQueryDesc without cached plan */
+ QueryDesc *queryDesc = CreateQueryDesc(
+ taskPlan, /* PlannedStmt *plannedstmt */
+ queryString, /* const char *sourceText */
+ GetActiveSnapshot(),/* Snapshot snapshot */
+ InvalidSnapshot, /* Snapshot crosscheck_snapshot */
+ destReceiver, /* DestReceiver *dest */
+ paramListInfo, /* ParamListInfo params */
+ queryEnv, /* QueryEnvironment *queryEnv */
+ 0 /* int instrument_options */
+ );
+#endif
```
### Adapt `RelationGetPrimaryKeyIndex()` to PG-18’s new “deferrable\_ok”
flag
PostgreSQL commit
14e87ffa5c
added a new Boolean `deferrable_ok` parameter to
`RelationGetPrimaryKeyIndex()` so that the lock manager can defer
unique‐constraint locks when requested. The API changed from:
```c
RelationGetPrimaryKeyIndex(Relation relation)
```
to:
```c
RelationGetPrimaryKeyIndex(Relation relation, bool deferrable_ok)
```
```diff
diff --git a/src/backend/distributed/metadata/node_metadata.c
b/src/backend/distributed/metadata/node_metadata.c
index e3a1b2c..f4d5e6f 100644
--- a/src/backend/distributed/metadata/node_metadata.c
+++ b/src/backend/distributed/metadata/node_metadata.c
@@ -2965,8 +2965,18 @@
*/
- Relation replicaIndex =
index_open(RelationGetPrimaryKeyIndex(pgDistNode),
- AccessShareLock);
+ #if PG_VERSION_NUM >= PG_VERSION_18
+ /* PG 18+ adds a bool "deferrable_ok" parameter */
+ Relation replicaIndex =
+ index_open(
+ RelationGetPrimaryKeyIndex(pgDistNode, false),
+ AccessShareLock);
+ #else
+ Relation replicaIndex =
+ index_open(
+ RelationGetPrimaryKeyIndex(pgDistNode),
+ AccessShareLock);
+ #endif
ScanKeyInit(&scanKey[0], Anum_pg_dist_node_nodename,
BTEqualStrategyNumber, F_TEXTEQ, CStringGetTextDatum(nodeName));
```
```diff
diff --git a/src/backend/distributed/operations/node_protocol.c b/src/backend/distributed/operations/node_protocol.c
index e3a1b2c..f4d5e6f 100644
--- a/src/backend/distributed/operations/node_protocol.c
+++ b/src/backend/distributed/operations/node_protocol.c
@@ -746,7 +746,12 @@
if (!OidIsValid(idxoid))
{
- idxoid = RelationGetPrimaryKeyIndex(rel);
+ /* Determine the index OID of the primary key (PG18 adds a second parameter) */
+#if PG_VERSION_NUM >= PG_VERSION_18
+ idxoid = RelationGetPrimaryKeyIndex(rel, false);
+#else
+ idxoid = RelationGetPrimaryKeyIndex(rel);
+#endif
}
return idxoid;
```
Because Citus has always taken the lock immediately—just as the old
two-arg call did—we pass `false` to keep that same immediate-lock
behavior. Passing `true` would switch to deferred locking, which we
don’t want.
### Adapt `ExplainOnePlan()` to PG-18’s expanded API
PostgreSQL 18 extended
525392d572
the `ExplainOnePlan()` function to carry the `CachedPlan *` and
`CachedPlanSource *` pointers plus an explicit `query_index`, letting
the EXPLAIN machinery track plan‐source invalidation. The old signature:
```c
/* PG ≤17 */
void
ExplainOnePlan(PlannedStmt *plannedstmt,
IntoClause *into,
struct ExplainState *es,
const char *queryString,
ParamListInfo params,
QueryEnvironment *queryEnv,
const instr_time *planduration,
const BufferUsage *bufusage);
```
became, in PG 18:
```c
/* PG ≥18 */
void
ExplainOnePlan(PlannedStmt *plannedstmt,
CachedPlan *cplan,
CachedPlanSource *plansource,
int query_index,
IntoClause *into,
struct ExplainState *es,
const char *queryString,
ParamListInfo params,
QueryEnvironment *queryEnv,
const instr_time *planduration,
const BufferUsage *bufusage,
const MemoryContextCounters *mem_counters);
```
To compile under both versions, Citus now wraps each call in
`multi_explain.c` with:
```c
#if PG_VERSION_NUM >= PG_VERSION_18
/* PG 18+: pass NULL for the new cached‐plan fields and zero for query_index */
ExplainOnePlan(
plan, /* PlannedStmt *plannedstmt */
NULL, /* CachedPlan *cplan */
NULL, /* CachedPlanSource *plansource */
0, /* query_index */
into, /* IntoClause *into */
es, /* ExplainState *es */
queryString, /* const char *queryString */
params, /* ParamListInfo params */
NULL, /* QueryEnvironment *queryEnv */
&planduration,/* const instr_time *planduration */
(es->buffers ? &bufusage : NULL),
(es->memory ? &mem_counters : NULL)
);
#elif PG_VERSION_NUM >= PG_VERSION_17
/* PG 17: same as before, plus passing mem_counters if enabled */
ExplainOnePlan(
plan,
into,
es,
queryString,
params,
queryEnv,
&planduration,
(es->buffers ? &bufusage : NULL),
(es->memory ? &mem_counters : NULL)
);
#else
/* PG 15–16: original seven-arg form */
ExplainOnePlan(
plan,
into,
es,
queryString,
params,
queryEnv,
&planduration,
(es->buffers ? &bufusage : NULL)
);
#endif
```
### Adapt to the unified “index interpretation” API in PG 18 (commit
a8025f544854)
PostgreSQL commit
a8025f5448
generalized the old btree‐specific operator‐interpretation API into a
single “index interpretation” interface:
* **Renamed type**:
`OpBtreeInterpretation` → `OpIndexInterpretation`
* **Renamed function**:
`get_op_btree_interpretation(opno)` →
`get_op_index_interpretation(opno)`
* **Unified field**:
Each interpretation now carries `cmptype` instead of `strategy`.
To build cleanly on PG 18 while still supporting PG 15–17, Citus’s
shard‐pruning code now wraps these changes:
```c
#include "pg_version_constants.h"
#if PG_VERSION_NUM >= PG_VERSION_18
/* On PG 18+ the btree‐only APIs vanished; alias them to the new generic versions */
typedef OpIndexInterpretation OpBtreeInterpretation;
#define get_op_btree_interpretation(opno) get_op_index_interpretation(opno)
#define ROWCOMPARE_NE COMPARE_NE
#endif
/* … later, when checking an interpretation … */
OpBtreeInterpretation *interp =
(OpBtreeInterpretation *) lfirst(cell);
#if PG_VERSION_NUM >= PG_VERSION_18
/* use cmptype on PG 18+ */
if (interp->cmptype == ROWCOMPARE_NE)
#else
/* use strategy on PG 15–17 */
if (interp->strategy == ROWCOMPARE_NE)
#endif
{
/* … */
}
```
### Adapt `create_foreignscan_path()` for PG-18’s revised signature
PostgreSQL commit
e222534679
reordered and removed a couple of parameters in the FDW‐path builder:
* **PG 15–17 signature (11 args)**
```c
create_foreignscan_path(PlannerInfo *root,
RelOptInfo *rel,
PathTarget *target,
double rows,
Cost startup_cost,
Cost total_cost,
List *pathkeys,
Relids required_outer,
Path *fdw_outerpath,
List *fdw_restrictinfo,
List *fdw_private);
```
* **PG 18+ signature (9 args)**
```c
create_foreignscan_path(PlannerInfo *root,
RelOptInfo *rel,
PathTarget *target,
double rows,
int disabled_nodes,
Cost startup_cost,
Cost total_cost,
Relids required_outer,
Path *fdw_outerpath,
List *fdw_private);
```
To support both, Citus now defines a compatibility macro in
`pg_version_compat.h`:
```c
#include "nodes/bitmapset.h" /* for Relids */
#include "nodes/pg_list.h" /* for List */
#include "optimizer/pathnode.h" /* for create_foreignscan_path() */
#if PG_VERSION_NUM >= PG_VERSION_18
/* PG18+: drop pathkeys & fdw_restrictinfo, add disabled_nodes */
#define create_foreignscan_path_compat(a, b, c, d, e, f, g, h, i, j, k) \
create_foreignscan_path( \
(a), /* root */ \
(b), /* rel */ \
(c), /* target */ \
(d), /* rows */ \
(0), /* disabled_nodes (unused by Citus) */ \
(e), /* startup_cost */ \
(f), /* total_cost */ \
(g), /* required_outer */ \
(h), /* fdw_outerpath */ \
(k) /* fdw_private */ \
)
#else
/* PG15–17: original signature */
#define create_foreignscan_path_compat(a, b, c, d, e, f, g, h, i, j, k) \
create_foreignscan_path( \
(a), (b), (c), (d), \
(e), (f), \
(g), (h), (i), (j), (k) \
)
#endif
```
Now every call to `create_foreignscan_path_compat(...)`—even in tests
like `fake_fdw.c`—automatically picks the correct argument list for
PG 15 through PG 18.
### Drop the obsolete bitmap‐scan hooks on PG 18+
PostgreSQL commit
c3953226a0
cleaned up the `TableAmRoutine` API by removing the two bitmap‐scan
callback slots:
* `scan_bitmap_next_block`
* `scan_bitmap_next_tuple`
Since those hook‐slots no longer exist in PG 18, Citus now wraps their
NULL‐initialization in a `#if PG_VERSION_NUM < PG_VERSION_18` guard. On
PG 15–17 we still explicitly set them to `NULL` (to satisfy the old
struct layout), and on PG 18+ we omit them entirely:
```c
#if PG_VERSION_NUM < PG_VERSION_18
/* PG 15–17 only: these fields were removed upstream in PG 18 */
.scan_bitmap_next_block = NULL,
.scan_bitmap_next_tuple = NULL,
#endif
```
### Adapt `vac_update_relstats()` invocation to PG-18’s new
“all\_frozen” argument
PostgreSQL commit
99f8f3fbbc
extended the `vac_update_relstats()` API by inserting a
`num_all_frozen_pages` parameter between the existing
`num_all_visible_pages` and `hasindex` arguments:
```diff
- /* PG ≤17: */
- void
- vac_update_relstats(Relation relation,
- BlockNumber num_pages,
- double num_tuples,
- BlockNumber num_all_visible_pages,
- bool hasindex,
- TransactionId frozenxid,
- MultiXactId minmulti,
- bool *frozenxid_updated,
- bool *minmulti_updated,
- bool in_outer_xact);
+ /* PG ≥18: adds num_all_frozen_pages */
+ void
+ vac_update_relstats(Relation relation,
+ BlockNumber num_pages,
+ double num_tuples,
+ BlockNumber num_all_visible_pages,
+ BlockNumber num_all_frozen_pages,
+ bool hasindex,
+ TransactionId frozenxid,
+ MultiXactId minmulti,
+ bool *frozenxid_updated,
+ bool *minmulti_updated,
+ bool in_outer_xact);
```
To compile cleanly on both PG 15–17 and PG 18+, Citus wraps its call in
a version guard and supplies a zero placeholder for the new field:
```c
#if PG_VERSION_NUM >= 180000
/* PG 18+: supply explicit “all_frozen” count */
vac_update_relstats(
rel,
new_rel_pages,
new_live_tuples,
new_rel_allvisible, /* allvisible */
0, /* all_frozen */
nindexes > 0,
newRelFrozenXid,
newRelminMxid,
&frozenxid_updated,
&minmulti_updated,
false /* in_outer_xact */
);
#else
/* PG 15–17: original signature */
vac_update_relstats(
rel,
new_rel_pages,
new_live_tuples,
new_rel_allvisible,
nindexes > 0,
newRelFrozenXid,
newRelminMxid,
&frozenxid_updated,
&minmulti_updated,
false /* in_outer_xact */
);
#endif
```
**Why all_frozen = 0?**
Columnar storage never embeds transaction IDs in its pages, so it never
needs to track “all‐frozen” pages the way a heap does. Setting both
allvisible and allfrozen to zero simply tells Postgres “there are no
pages with the visibility or frozen‐status bits set,” matching our
existing behavior.
This change ensures Citus’s VACUUM‐statistic updates work unmodified
across all supported Postgres versions.
DESCRIPTION: Fixes a bug with `UPDATE SET (...) = (SELECT
some_func(),... )` (#7676)
Citus was checking for presence of sublink, but forgot to manage
multiexpr while evaluating clauses during planning. At this stage (citus
planner), it's not always possible to call PostgreSQL code because the
tree is not yet ready for PostgreSQL pure executor.
Fixes https://github.com/citusdata/citus/issues/7676.
Fixed by adding a new function to check sublink or multiexpr in the
tree.
---------
Co-authored-by: Colm <colmmchugh@microsoft.com>
DESCRIPTION: Drops PG14 support
1. Remove "$version_num" != 'xx' from configure file
2. delete all PG_VERSION_NUM = PG_VERSION_XX references in the code
3. Look at pg_version_compat.h file, remove all _compat functions etc
defined specifically for PGXX differences
4. delete all PG_VERSION_NUM >= PG_VERSION_(XX+1), PG_VERSION_NUM <
PG_VERSION_(XX+1) ifs in the codebase
5. delete ruleutils_xx.c file
6. cleanup normalize.sed file from pg14 specific lines
7. delete all alternative output files for that particular PG version,
server_version_ge variable helps here
There is a crash when running vanilla tests because of the
`citus.hide_citus_dependent_objects` GUC. We turn on this GUC only for
the pg vanilla tests. This GUC runs the following function
`HideCitusDependentObjectsOnQueriesOfPgMetaTables`. This function
doesn't take into account the new `mergeJoinCondition`. I rewrote the
function such that it checks for merge join conditions as well.
Relevant PG commit:
https://github.com/postgres/postgres/commit/0294df2f1
The crash could be reproduced locally like the following:
```SQL
SET citus.hide_citus_dependent_objects TO on;
CREATE OR REPLACE FUNCTION
pg_catalog.is_citus_depended_object(oid,oid)
RETURNS bool
LANGUAGE C
AS 'citus', $$is_citus_depended_object$$;
-- try a system catalog
MERGE INTO pg_class c
USING (SELECT 'pg_depend'::regclass AS oid) AS j
ON j.oid = c.oid
WHEN MATCHED THEN
UPDATE SET reltuples = reltuples + 1
RETURNING j.oid;
CREATE VIEW classv AS SELECT * FROM pg_class;
MERGE INTO classv c
USING pg_namespace n
ON n.oid = c.relnamespace
WHEN MATCHED AND c.oid = 'pg_depend'::regclass THEN
UPDATE SET reltuples = reltuples - 1
RETURNING c.oid;
-- crash happens here
```
This PR provides successful compilation against PG17.0.
- Remove ExecFreeExprContext call
Relevant PG commit
d060e921ea5aa47b6265174c32e1128cebdbc3df
d060e921ea
- PG17 uses streaming IO in analyze, fix scan_analyze_next_block function
Relevant PG commit
041b96802efa33d2bc9456f2ad946976b92b5ae1
041b96802e
- Define ObjectClass for PG17+ only since it's removed
Relevant PG commit:
89e5ef7e21812916c9cf9fcf56e45f0f74034656
89e5ef7e21
- Remove ReorderBufferTupleBuf structure.
Relevant PG commit:
08e6344fd6423210b339e92c069bb979ba4e7cd6
08e6344fd6
- Define colliculocale and daticulocale since they have been renamed
Relevant PG commit:
f696c0cd5f299f1b51e214efc55a22a782cc175d
f696c0cd5f
- makeStringConst defined in PG17
Relevant PG commit:
de3600452b61d1bc3967e9e37e86db8956c8f577
de3600452b
- RangeVarCallbackOwnsTable was replaced by RangeVarCallbackMaintainsTable
Relevant PG commit:
ecb0fd33720fab91df1207e85704f382f55e1eb7
ecb0fd3372
- attstattarget is nullable, define pg compatible functions for it
Relevant PG commit:
4f622503d6de975ac87448aea5cea7de4bc140d5
4f622503d6
- stxstattarget is nullable in PG17, write compat functions for it
Relevant PG commit:
012460ee93c304fbc7220e5b55d9d0577fc766ab
012460ee93
- Use ResourceOwner to track WaitEventSet in PG17
Relevant PG commit:
50c67c2019ab9ade8aa8768bfe604cd802fe8591
50c67c2019
- getIdentitySequence now uses Relation instead of relation_id
Relevant PG commit:
509199587df73f06eda898ae13284292f4ae573a
509199587d
- Remove no-op tuplestore_donestoring function
Relevant PG commit:
75680c3d805e2323cd437ac567f0677fdfc7b680
75680c3d80
- MergeAction can have 3 merge kinds (now enum) in PG17, write compat
Relevant PG commit:
0294df2f1f842dfb0eed79007b21016f486a3c6c
0294df2f1f
- EXPLAIN (MEMORY) is added, make changes to ExplainOnePlan
Relevant PG commit:
5de890e3610d5a12cdaea36413d967cf5c544e20
5de890e361
- LIMIT_OPTION_DEFAULT has been removed as it's useless, use LIMIT_OPTION_COUNT
Relevant PG commit:
a6be0600ac3b71dda8277ab0fcbe59ee101ac1ce
a6be0600ac
- write compat for create_foreignscan_path bcs of more arguments in PG17
Relevant PG commit:
9e9931d2bf40e2fea447d779c2e133c2c1256ef3
9e9931d2bf
- pgprocno and lxid have been combined into a struct in PGPROC
Relevant PG commits:
28f3915b73f75bd1b50ba070f56b34241fe53fd1
28f3915b73
ab355e3a88de745607f6dd4c21f0119b5c68f2ad
ab355e3a88
024c521117579a6d356050ad3d78fdc95e44eefa
024c521117
- Simplify CitusNewNode (#7434)
postgres refactored newNode() in PG 17, the main point for doing this is
the original tricks is no longer neccessary for modern compilers[1].
This does the same for Citus.
This should have no backward compatibility issues since it just replaces
palloc0fast with palloc0.
This is good for forward compatibility since palloc0fast no longer
exists in PG 17.
[1]
https://www.postgresql.org/message-id/b51f1fa7-7e6a-4ecc-936d-90a8a1659e7c@iki.fi
(cherry picked from commit 4b295cc)
This is prep work for successful compilation with PG17
PG17added foreach_ptr, foreach_int and foreach_oid macros
Relevant PG commit
14dd0f27d7cd56ffae9ecdbe324965073d01a9ff
14dd0f27d7
We already have these macros, but they are different with the
PG17 ones because our macros take a DECLARED variable, whereas
the PG16 macros declare a locally-scoped loop variable themselves.
Hence I am renaming our macros to foreach_declared_
I am separating this into its own PR since it touches many files. The
main compilation PR is https://github.com/citusdata/citus/pull/7699
In the function TaskConcurrentCancelCheck() the pointer "task" was
utilized after checking against NULL, which can lead to dereference of
the null pointer.
To avoid the problem, added a separate handling of the case when the
pointer is null with an interruption of execution.
Fixes: #7693.
Fixes: 1f8675da4382f6e("nonblocking concurrent task execution via
background workers")
Signed-off-by: Maksim Korotkov <m.korotkov@postgrespro.ru>
Variables being modified in the PG_TRY block and read in the PG_CATCH
block should be qualified with volatile.
The variable waitEventSet is modified in the PG_TRY block (line 1085)
and read in the PG_CATCH block (line 1095).
The variable relation is modified in the PG_TRY block (line 500) and
read in the PG_CATCH block (line 515).
Besides, the variable objectAddress doesn't need the volatile qualifier.
Ref: C99 7.13.2.1[^1],
> All accessible objects have values, and all other components of the
abstract machine have state, as of the time the longjmp function was
called, except that the values of objects of automatic storage duration
that are local to the function containing the invocation of the
corresponding setjmp macro that do not have volatile-qualified type and
have been changed between the setjmp invocation and longjmp call are
indeterminate.
[^1]: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf
DESCRIPTION: Correctly mark some variables as volatile
---------
Co-authored-by: Hong Yi <zouzou0208@gmail.com>
This PR changes the order in which the locks are acquired (for the
target and reference tables), when a modify request is initiated from a
worker node that is not the "FirstWorkerNode".
To prevent concurrent writes, locks are acquired on the first worker
node for the replicated tables. When the update statement originates
from the first worker node, it acquires the lock on the reference
table(s) first, followed by the target table(s). However, if the update
statement is initiated in another worker node, the lock requests are
sent to the first worker in a different order. This PR unifies the
modification order on the first worker node. With the third commit,
independent of the node that received the request, the locks are
acquired for the modified table and then the reference tables on the
first node.
The first commit shows a sample output for the test prior to the fix.
Fixes#7477
---------
Co-authored-by: Jelte Fennema-Nio <jelte.fennema@microsoft.com>
Moves the following functions to the Citus internal schema:
citus_internal_local_blocked_processes
citus_internal_global_blocked_processes
citus_internal_mark_node_not_synced
citus_internal_unregister_tenant_schema_globally
citus_internal_update_none_dist_table_metadata
citus_internal_update_placement_metadata
citus_internal_update_relation_colocation
citus_internal_start_replication_origin_tracking
citus_internal_stop_replication_origin_tracking
citus_internal_is_replication_origin_tracking_active
#7405
---------
Co-authored-by: Jelte Fennema-Nio <jelte.fennema@microsoft.com>
Since Postgres commit da9b580d files and directories are supposed to
be created with pg_file_create_mode and pg_dir_create_mode permissions
when default permissions are expected.
This fixes a failure of one of the postgres tests:
If we create file add.conf containing
```
shared_preload_libraries='citus'
```
and run postgres tests
```
TEMP_CONFIG=/path/to/add.conf make installcheck -C src/bin/pg_ctl/
```
then 001_start_stop.pl fails with
```
.../data/base/pgsql_job_cache mode must be 0750
```
in the log.
In passing this also stops creating directories that we haven't used
since Citus 7.4
This change explicitely doesn't change permissions of certificates/keys
that we create.
---------
Co-authored-by: Karina Litskevich <litskevichkarina@gmail.com>
LoadShardList is called twice, which is not neccessary, and there is no
need to sort the shard placement list since we only want to know the list
length.
This change adds a script to programatically group all includes in a
specific order. The script was used as a one time invocation to group
and sort all includes throught our formatted code. The grouping is as
follows:
- System includes (eg. `#include<...>`)
- Postgres.h (eg. `#include "postgres.h"`)
- Toplevel imports from postgres, not contained in a directory (eg.
`#include "miscadmin.h"`)
- General postgres includes (eg . `#include "nodes/..."`)
- Toplevel citus includes, not contained in a directory (eg. `#include
"citus_verion.h"`)
- Columnar includes (eg. `#include "columnar/..."`)
- Distributed includes (eg. `#include "distributed/..."`)
Because it is quite hard to understand the difference between toplevel
citus includes and toplevel postgres includes it hardcodes the list of
toplevel citus includes. In the same manner it assumes anything not
prefixed with `columnar/` or `distributed/` as a postgres include.
The sorting/grouping is enforced by CI. Since we do so with our own
script there are not changes required in our uncrustify configuration.
DESCRIPTION: This change starts a maintenance deamon at the time of
server start if there is a designated main database.
This is the code flow:
1. User designates a main database:
`ALTER SYSTEM SET citus.main_db = "myadmindb";`
2. When postmaster starts, in _PG_Init, citus calls
`InitializeMaintenanceDaemonForMainDb`
This function registers a background worker to run
`CitusMaintenanceDaemonMain `with `databaseOid = 0 `
3. `CitusMaintenanceDaemonMain ` takes some special actions when
databaseOid is 0:
- Gets the citus.main_db value.
- Connects to the citus.main_db
- Now the `MyDatabaseId `is available, creates a hash entry for it.
- Then follows the same control flow as for a regular db,
DESCRIPTION: Fix leaking of memory and memory contexts in Foreign
Constraint Graphs
Previously, every time we (re)created the Foreign Constraint
Relationship Graph, we created a new Memory Context while loosing a
reference to the previous context. This old context could still have
left over memory in there causing a memory leak.
With this patch we statically have one memory context that we lazily
initialize the first time we create our foreign constraint relationship
graph. On every subsequent creation, beside destroying our previous
hashmap we also reset our memory context to remove any left over
references.
When braking a colocation, we need to create a new colocation group
record in pg_dist_colocation for the relation. It is not sufficient to
have a new colocationid value in pg_dist_partition only.
This patch also fixes a bug when deleting a colocation group if no
tables are left in it. Previously we passed a relation id as a parameter
to DeleteColocationGroupIfNoTablesBelong function, where we should have
passed a colocation id.
PG16 compatibility - Part 6
Check out part 1 42d956888d
part 2 0d503dd5ac
part 3 907d72e60d
part 4 7c6b4ce103
part 5 6056cb2c29
This commit is in the series of PG16 compatibility commits.
It handles the Permission Info changes in PG16. See below:
The main issue lies in the following entries of PlannedStmt: {
rtable
permInfos
}
Each rtable has an int perminfoindex, and its actual permission info is
obtained through the following:
permInfos[perminfoindex]
We had crashes because perminfoindexes were not updated in the finalized
planned statement after distributed planner hook.
So, basically, everywhere we set a query's or planned statement's rtable
entry, we need to set the rteperminfos/permInfos accordingly.
Relevant PG commits:
a61b1f7482
a61b1f74823c9c4f79c95226a461f1e7a367764b
b803b7d132
b803b7d132e3505ab77c29acf91f3d1caa298f95
More PG16 compatibility commits are coming soon ...
PG16 compatibility - Part 2
Part 1 provided successful compilation against pg16beta2.
42d956888d
This PR provides ruleutils changes with pg16beta2 and successful CREATE EXTENSION command.
Note that more changes are needed in order to have successful regression tests.
More commits are coming soon ...
For any_value changes, I referred to this commit
8ef94dc1f5
where we did something similar for PG14 support.
This PR provides successful compilation against PG16Beta2. It does some
necessary refactoring to prepare for full support of version 16, in
https://github.com/citusdata/citus/pull/6952 .
Change RelFileNode to RelFileNumber or RelFileLocator
Relevant PG commit
b0a55e43299c4ea2a9a8c757f9c26352407d0ccc
new header for varatt.h
Relevant PG commit:
d952373a987bad331c0e499463159dd142ced1ef
drop support for Abs, use fabs
Relevant PG commit
357cfefb09115292cfb98d504199e6df8201c957
tuplesort PGcommit: d37aa3d35832afde94e100c4d2a9618b3eb76472
Relevant PG commit:
d37aa3d35832afde94e100c4d2a9618b3eb76472
Fix vacuum in columnar
Relevant PG commit:
4ce3afb82ecfbf64d4f6247e725004e1da30f47c
older one:
b6074846cebc33d752f1d9a66e5a9932f21ad177
Add alloc_flags to pg_clean_ascii
Relevant PG commit:
45b1a67a0fcb3f1588df596431871de4c93cb76f
Merge GetNumConfigOptions() into get_guc_variables()
Relevant PG commit:
3057465acfbea2f3dd7a914a1478064022c6eecd
Minor PG refactor PG_FUNCNAME_MACRO __func__
Relevant PG commit
320f92b744b44f961e5d56f5f21de003e8027a7f
Pass NULL context to stringToQualifiedNameList, typeStringToTypeName
The pre-PG16 error behaviour for the following
stringToQualifiedNameList & typeStringToTypeName
was ereport(ERROR, ...)
Now with PG16 we have this context input. We preserve the same behaviour
by passing a NULL context, because of the following:
(copy paste comment from PG16)
If "context" isn't an ErrorSaveContext node, this behaves as
errstart(ERROR, domain), and the errsave() macro ends up acting
exactly like ereport(ERROR, ...).
Relevant PG commit
858e776c84f48841e7e16fba7b690b76e54f3675
Use RangeVarCallbackMaintainsTable instead of RangeVarCallbackOwnsTable
Relevant PG commit:
60684dd834a222fefedd49b19d1f0a6189c1632e
FIX THIS: Not implemented grant-level control of role inheritance
see PG commit
e3ce2de09d814f8770b2e3b3c152b7671bcdb83f
Make Scan node abstract
PG commit:
8c73c11a0d39049de2c1f400d8765a0eb21f5228
Change in Var representations, get_relids_in_jointree
PG commit
2489d76c4906f4461a364ca8ad7e0751ead8aa0d
Deadlock detection changes because SHM_QUEUE is removed
Relevant PG Commit:
d137cb52cb7fd44a3f24f3c750fbf7924a4e9532
TU_UpdateIndexes
Relevant PG commit
19d8e2308bc51ec4ab993ce90077342c915dd116
Use object_ownercheck and object_aclcheck functions
Relevant PG commits:
afbfc02983f86c4d71825efa6befd547fe81a926
c727f511bd7bf3c58063737bcf7a8f331346f253
Rework Permission Info for successful compilation
Relevant PG commits:
postgres/postgres@a61b1f7postgres/postgres@b803b7d
---------
Co-authored-by: onderkalaci <onderkalaci@gmail.com>
Index scans in PG16 return empty sets because of extra compatibility
enforcement for `ScanKeyInit` arguments.
Could be one of the relevant PG commits:
c8b2ef05f4
This PR fixes all incompatible `RegProcedure` and `Datum` arguments in
all `ScanKeyInit` functions used throughout the codebase.
Helpful for https://github.com/citusdata/citus/pull/6952
This PR fixes the following:
- in oraclelinux-7 `Make` step
```
/usr/bin/ld: utils/replication_origin_session_utils.o: relocation R_X86_64_PC32 against undefined symbol
`IsLocalReplicationOriginSessionActive' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: final link failed: Bad value
collect2: error: ld returned 1 exit status
```
`IsLocalReplicationOriginSessionActive` function has improper inline
declaration, fixed that
- in centos-7 `Make` step
```
utils/background_jobs.c: In function 'StartCitusBackgroundTaskExecutor':
utils/background_jobs.c:1746:6: warning: function might be possible candidate for 'gnu_printf' format attribute
[-Wsuggest-attribute=format]
database, user, jobId, taskId);
^
```
should use `pg_attribute_printf(3,4)` instead of
`pg_attribute_printf(3,0)` since the number of arguments varies for
`SafeSnprintf(char *str, rsize_t count, const char *fmt, ...)`
---------
Co-authored-by: naisila <nicypp@gmail.com>
This PR
* Addresses a concurrency issue in the probabilistic approach of tenant
monitoring by acquiring a shared lock for tenant existence checks.
* Changes `citus.stat_tenants_sample_rate_for_new_tenants` type to
double
* Renames `citus.stat_tenants_sample_rate_for_new_tenants` to
`citus.stat_tenants_untracked_sample_rate`
We did not properly handle the error at ownership check method, which
causes `max stack depth for errors` as in
https://github.com/citusdata/citus/issues/6980.
**Fix:**
In case of an error, we should rollback subtransaction and throw the
message with log level to `LOG_SERVER_ONLY`.
Note: We prevent logs from the client to prevent pg vanilla test
failures due to Citus logs which differs from the actual Postgres logs.
(For context: https://github.com/citusdata/citus/pull/6130)
I also needed to fix a flaky test: `multi_schema_support`
DESCRIPTION: Fixes a bug related to non-existent objects in DDL
commands.
Fixes https://github.com/citusdata/citus/issues/6980
This commit is the second and last phase of dropping PG13 support.
It consists of the following:
- Removes all PG_VERSION_13 & PG_VERSION_14 from codepaths
- Removes pg_version_compat entries and columnar_version_compat entries
specific for PG13
- Removes alternative pg13 test outputs
- Removes PG13 normalize lines and fix the test outputs based on that
It is a continuation of 5bf163a27d
Adds support for altering schema of single shard tables. We do that in 2
steps.
1. Undistribute the tenant table at `preprocess` step,
2. Distribute new schema if it is a distributed schema after DDLs are
propagated.
DESCRIPTION: Adds support for altering a table's schema to/from
distributed schemas.
DESCRIPTION: Enabling citus_stat_tenants to support schema-based
tenants.
This pull request modifies the existing logic to enable tenant
monitoring with schema-based tenants. The changes made are as follows:
- If a query has a partitionKeyValue (which serves as a tenant
key/identifier for distributed tables), Citus annotates the query with
both the partitionKeyValue and colocationId. This allows for accurate
tracking of the query.
- If a query does not have a partitionKeyValue, but its colocationId
belongs to a distributed schema, Citus annotates the query with only the
colocationId. The tenant monitor can then easily look up the schema to
determine if it's a distributed schema and make a decision on whether to
track the query.
---------
Co-authored-by: Jelte Fennema <jelte.fennema@microsoft.com>
Citus build with PG16 fails because of the following warnings:
- using char* instead of Datum
- using pointer instead of oid
- candidate function for format attribute
- remove old definition from PG11 compatibility 62bf571ced
This commit fixes the above.
Adds Support for Single Shard Tables in
`update_distributed_table_colocation`.
This PR changes checks that make sure tables should be hash distributed
table to hash or single shard distributed tables.
DESCRIPTION: Adds citus.enable_schema_based_sharding GUC that allows
sharding the database based on schemas when enabled.
* Refactor the logic that automatically creates Citus managed tables
* Refactor CreateSingleShardTable() to allow specifying colocation id
instead
* Add support for schema-based-sharding via a GUC
### What this PR is about:
Add **citus.enable_schema_based_sharding GUC** to enable schema-based
sharding. Each schema created while this GUC is ON will be considered
as a tenant schema. Later on, regardless of whether the GUC is ON or
OFF, any table created in a tenant schema will be converted to a
single shard distributed table (without a shard key). All the tenant
tables that belong to a particular schema will be co-located with each
other and will have a shard count of 1.
We introduce a new metadata table --pg_dist_tenant_schema-- to do the
bookkeeping for tenant schemas:
```sql
psql> \d pg_dist_tenant_schema
Table "pg_catalog.pg_dist_tenant_schema"
┌───────────────┬─────────┬───────────┬──────────┬─────────┐
│ Column │ Type │ Collation │ Nullable │ Default │
├───────────────┼─────────┼───────────┼──────────┼─────────┤
│ schemaid │ oid │ │ not null │ │
│ colocationid │ integer │ │ not null │ │
└───────────────┴─────────┴───────────┴──────────┴─────────┘
Indexes:
"pg_dist_tenant_schema_pkey" PRIMARY KEY, btree (schemaid)
"pg_dist_tenant_schema_unique_colocationid_index" UNIQUE, btree (colocationid)
psql> table pg_dist_tenant_schema;
┌───────────┬───────────────┐
│ schemaid │ colocationid │
├───────────┼───────────────┤
│ 41963 │ 91 │
│ 41962 │ 90 │
└───────────┴───────────────┘
(2 rows)
```
Colocation id column of pg_dist_tenant_schema can never be NULL even
for the tenant schemas that don't have a tenant table yet. This is
because, we assign colocation ids to tenant schemas as soon as they
are created. That way, we can keep associating tenant schemas with
particular colocation groups even if all the tenant tables of a tenant
schema are dropped and recreated later on.
When a tenant schema is dropped, we delete the corresponding row from
pg_dist_tenant_schema. In that case, we delete the corresponding
colocation group from pg_dist_colocation as well.
### Future work for 12.0 release:
We're building schema-based sharding on top of the infrastructure that
adds support for creating distributed tables without a shard key
(https://github.com/citusdata/citus/pull/6867).
However, not all the operations that can be done on distributed tables
without a shard key necessarily make sense (in the same way) in the
context of schema-based sharding. For example, we need to think about
what happens if user attempts altering schema of a tenant table. We
will tackle such scenarios in a future PR.
We will also add a new UDF --citus.schema_tenant_set() or such-- to
allow users to use an existing schema as a tenant schema, and another
one --citus.schema_tenant_unset() or such-- to stop using a schema as
a tenant schema in future PRs.