This PR provides successful build against PG18Beta1. RuleUtils PR was
reviewed separately: #8010
## PG 18Beta1–related changes for building Citus
### TupleDesc / Attr layout
**What changed in PG:** Postgres consolidated the
`TupleDescData.attrs[]` array into a more compact representation. Direct
field access (tupdesc->attrs[i]) was replaced by the new
`TupleDescAttr()` API.
**Citus adaptation:** Everywhere we previously used
`tupdesc->attrs[...]`, we now call `TupleDescAttr(tupdesc, idx)` (or our
own `Attr()` macro) under a compatibility guard.
*
5983a4cffc
General Logic:
* Use `Attr(...)` in places where `columnar_version_compat.h` is
included. This avoids the need to sprinkle `#if PG_VERSION_NUM` guards
around each attribute access.
* Use `TupleDescAttr(tupdesc, i)` when the relevant PostgreSQL header is
already included and the additional macro indirection is unnecessary.
### Collation‐aware `LIKE`
**What changed in PG:** The `textlike` operator now requires an explicit
collation, to avoid ambiguous‐collation errors. Core code switched from
`DirectFunctionCall2(textlike, ...)` to
`DirectFunctionCall2Coll(textlike, DEFAULT_COLLATION_OID, ...)`.
**Citus adaptation:** In `remote_commands.c` and any other LIKE call, we
now use `DirectFunctionCall2Coll(textlike, DEFAULT_COLLATION_OID, ...)`
and `#include <utils/pg_collation.h>`.
*
85b7efa1cd
### Columnar storage API
* Adapt `columnar_relation_set_new_filelocator` (and related init
routines) for PG 18’s revised SMGR and storage-initialization hooks.
* Pull in the new headers (`explain_format.h`,
`columnar_version_compat.h`) so the columnar module compiles cleanly
against PG 18.
- heap_modify_tuple + heap_inplace_update only exist on PG < 18; on PG18
the in-place helper was removed upstream
-
a07e03fd8f
### OpenSSL / TLS integration
**What changed in PG:** Moved from the legacy `SSL_library_init()` to
`OPENSSL_init_ssl(OPENSSL_INIT_LOAD_CONFIG, NULL)`, updated certificate
API calls (`X509_getm_notBefore`, `X509_getm_notAfter`), and
standardized on `TLS_method()`.
**Citus adaptation:** We now `#include <openssl/opensslv.h>` and use
`#if OPENSSL_VERSION_NUMBER >= 0x10100000L` to choose between`
OPENSSL_init_ssl()` or `SSL_library_init()`, and wrap`
X509_gmtime_adj()` calls around the new accessor functions.
*
6c66b7443c
### Adapt `ExtractColumns()` to the new PG-18 `expandRTE()` signature
PostgreSQL 18
80feb727c8
added a fourth argument of type `VarReturningType` to `expandRTE()`, so
calls that used the old 7-parameter form no longer compile. This patch:
* Wraps the `expandRTE(...)` call in a `#if PG_VERSION_NUM >= 180000`
guard.
* On PG 18+ passes the new `VAR_RETURNING_DEFAULT` argument before
`location`.
* On PG 15–17 continues to call the original 7-arg form.
* Adds the necessary includes (`parser/parse_relation.h` for `expandRTE`
and `VarReturningType`, and `pg_version_constants.h` for
`PG_VERSION_NUM`).
### Adapt `ExecutorStart`/`ExecutorRun` hooks to PG-18’s new signatures
PostgreSQL 18
525392d572
changed the signatures of the executor hooks:
* `ExecutorStart_hook` now returns `bool` instead of `void`, and
* `ExecutorRun_hook` drops its old `run_once` argument.
This patch preserves Citus’s existing hook logic by:
1. **Adding two adapter functions** under `#if PG_VERSION_NUM >=
PG_VERSION_18`:
* `citus_executor_start_adapter(QueryDesc *queryDesc, int eflags)`
Calls the old `CitusExecutorStart(queryDesc, eflags)` and then returns
`true` to satisfy the new hook’s `bool` return type.
* `citus_executor_run_adapter(QueryDesc *queryDesc, ScanDirection
direction, uint64 count)`
Calls the old `CitusExecutorRun(queryDesc, direction, count, true)`
(passing `true` for the dropped `run_once` argument), and returns
`void`.
2. **Installing the adapters** in `_PG_init()` instead of the original
hooks when building against PG 18+:
```c
#if PG_VERSION_NUM >= PG_VERSION_18
ExecutorStart_hook = citus_executor_start_adapter;
ExecutorRun_hook = citus_executor_run_adapter;
#else
ExecutorStart_hook = CitusExecutorStart;
ExecutorRun_hook = CitusExecutorRun;
#endif
```
### Adapt to PG-18’s removal of the “run\_once” flag from
ExecutorRun/PortalRun
PostgreSQL commit
[[3eea7a0](3eea7a0c97)
rationalized the executor’s parallelism logic by moving the “execute a
plan only once” check into `ExecutePlan()` itself and dropping the old
`bool run_once` argument from the public APIs:
```diff
- void ExecutorRun(QueryDesc *queryDesc,
- ScanDirection direction,
- uint64 count,
- bool run_once);
+ void ExecutorRun(QueryDesc *queryDesc,
+ ScanDirection direction,
+ uint64 count);
```
(and similarly for `PortalRun()`).
To stay compatible across PG 15–18, Citus now:
1. **Updates all internal calls** to `ExecutorRun(...)` and
`PortalRun(...)`:
* On PG 18+, use the new three-argument form (`ExecutorRun(qd, dir,
count)`).
* On PG 15–17, keep the old four-arg form (`ExecutorRun(qd, dir, count,
true)`) under a `#if PG_VERSION_NUM < 180000` guard.
2. **Guards the dispatcher hooks** via the adapter functions (from the
earlier patch) so that Citus’s executor hooks continue to work under
both the old and new signatures.
### Adapt to PG-18’s shortened PortalRun signature
PostgreSQL 18’s refactoring (see commit
[3eea7a0](3eea7a0c97))
also removed the old run_once and alternate‐dest arguments from the
public PortalRun() API. The signature changed from:
```diff
- bool PortalRun(Portal portal,
- long count,
- bool isTopLevel,
- bool run_once,
- DestReceiver *dest,
- DestReceiver *altdest,
- QueryCompletion *qc);
+ bool PortalRun(Portal portal,
+ long count,
+ bool isTopLevel,
+ DestReceiver *dest,
+ DestReceiver *altdest,
+ QueryCompletion *qc);
```
To support both versions in Citus, we:
1. **Version-guard each call** to `PortalRun()`:
* **On PG 18+** invoke the new 6-argument form.
* **On PG 15–17** fall back to the legacy 7-argument form, passing
`true` for `run_once`.
### Add support for PG-18’s new `plansource` argument in
`PortalDefineQuery`**
PostgreSQL 18 extended the `PortalDefineQuery` API to carry a
`CachedPlanSource *plansource` pointer so that the portal machinery can
track cached‐plan invalidation (as introduced alongside deferred-locking
in commit
525392d572.
To remain compatible across PG 15–18, Citus now wraps its calls under a
version guard:
```diff
- PortalDefineQuery(portal, NULL, sql, commandTag, plantree_list, NULL);
+#if PG_VERSION_NUM >= 180000
+ /* PG 18+: seven-arg signature (adds plansource) */
+ PortalDefineQuery(
+ portal,
+ NULL, /* no prepared-stmt name */
+ sql, /* the query text */
+ commandTag, /* the CommandTag */
+ plantree_list, /* List of PlannedStmt* */
+ NULL, /* no CachedPlan */
+ NULL /* no CachedPlanSource */
+ );
+#else
+ /* PG 15–17: six-arg signature */
+ PortalDefineQuery(
+ portal,
+ NULL, /* no prepared-stmt name */
+ sql, /* the query text */
+ commandTag, /* the CommandTag */
+ plantree_list, /* List of PlannedStmt* */
+ NULL /* no CachedPlan */
+ );
+#endif
```
### Adapt ExecInitRangeTable() calls to PG-18’s new signature
PostgreSQL commit
[cbc127917e04a978a788b8bc9d35a70244396d5b](cbc127917e)
overhauled the planner API for range‐table initialization:
**PG 18+**: added a fourth `Bitmapset *unpruned_relids` argument to
support deferred partition pruning
In Citus’s `create_estate_for_relation()` (in `columnar_metadata.c`), we
now wrap the call in a compile‐time guard so that the code compiles
correctly on all supported PostgreSQL versions:
```
/* Prepare permission info on PG 16+ */
#if PG_VERSION_NUM >= PG_VERSION_16
List *perminfos = NIL;
addRTEPermissionInfo(&perminfos, rte);
#else
List *perminfos = NIL; /* unused on PG 15 */
#endif
/* Initialize the range table, with the right signature for each PG version */
#if PG_VERSION_NUM >= PG_VERSION_18
/* PG 18+: four‐arg signature (adds unpruned_relids) */
ExecInitRangeTable(
estate,
list_make1(rte),
perminfos,
NULL /* unpruned_relids: not used by columnar */
);
#elif PG_VERSION_NUM >= PG_VERSION_16
/* PG 16–17: three‐arg signature (permInfos) */
ExecInitRangeTable(
estate,
list_make1(rte),
perminfos
);
#else
/* PG 15: two‐arg signature */
ExecInitRangeTable(
estate,
list_make1(rte)
);
#endif
estate->es_output_cid = GetCurrentCommandId(true);
```
### Adapt `pgstat_report_vacuum()` to PG-18’s new timestamp argument
PostgreSQL commit
[[30a6ed0ce4bb18212ec38cdb537ea4b43bc99b83](30a6ed0ce4)
extended the `pgstat_report_vacuum()` API by adding a `TimestampTz
start_time` parameter at the end so that the VACUUM statistics collector
can record when the operation began:
```diff
/* PG ≤17: four-arg signature */
- void pgstat_report_vacuum(Oid tableoid,
- bool shared,
- double num_live_tuples,
- double num_dead_tuples);
+/* PG ≥18: five-arg signature adds a start_time */
+ void pgstat_report_vacuum(Oid tableoid,
+ bool shared,
+ double num_live_tuples,
+ double num_dead_tuples,
+ TimestampTz start_time);
```
To support both versions, we now wrap the call in `columnar_tableam.c`
with a version guard, supplying `GetCurrentTimestamp()` for PG-18+:
```c
#if PG_VERSION_NUM >= 180000
/* PG 18+: include start_timestamp */
pgstat_report_vacuum(
RelationGetRelid(rel),
rel->rd_rel->relisshared,
Max(new_live_tuples, 0), /* live tuples */
0, /* dead tuples */
GetCurrentTimestamp() /* start time */
);
#else
/* PG 15–17: original signature */
pgstat_report_vacuum(
RelationGetRelid(rel),
rel->rd_rel->relisshared,
Max(new_live_tuples, 0), /* live tuples */
0 /* dead tuples */
);
#endif
```
### Adapt `ExecuteTaskPlan()` to PG-18’s expanded `CreateQueryDesc()`
signature
PostgreSQL 18 changed `CreateQueryDesc()` from an eight-argument to a
nine-argument call by inserting a `CachedPlan *cplan` parameter
immediately after the `PlannedStmt *plannedstmt` argument (see commit
525392d572).
To remain compatible with PG 15–17, Citus now wraps its invocation in
`local_executor.c` with a version guard:
```diff
- /* PG15–17: eight-arg CreateQueryDesc without cached plan */
- QueryDesc *queryDesc = CreateQueryDesc(
- taskPlan, /* PlannedStmt *plannedstmt */
- queryString, /* const char *sourceText */
- GetActiveSnapshot(),/* Snapshot snapshot */
- InvalidSnapshot, /* Snapshot crosscheck_snapshot */
- destReceiver, /* DestReceiver *dest */
- paramListInfo, /* ParamListInfo params */
- queryEnv, /* QueryEnvironment *queryEnv */
- 0 /* int instrument_options */
- );
+#if PG_VERSION_NUM >= 180000
+ /* PG18+: nine-arg CreateQueryDesc with a CachedPlan slot */
+ QueryDesc *queryDesc = CreateQueryDesc(
+ taskPlan, /* PlannedStmt *plannedstmt */
+ NULL, /* CachedPlan *cplan (none) */
+ queryString, /* const char *sourceText */
+ GetActiveSnapshot(),/* Snapshot snapshot */
+ InvalidSnapshot, /* Snapshot crosscheck_snapshot */
+ destReceiver, /* DestReceiver *dest */
+ paramListInfo, /* ParamListInfo params */
+ queryEnv, /* QueryEnvironment *queryEnv */
+ 0 /* int instrument_options */
+ );
+#else
+ /* PG15–17: eight-arg CreateQueryDesc without cached plan */
+ QueryDesc *queryDesc = CreateQueryDesc(
+ taskPlan, /* PlannedStmt *plannedstmt */
+ queryString, /* const char *sourceText */
+ GetActiveSnapshot(),/* Snapshot snapshot */
+ InvalidSnapshot, /* Snapshot crosscheck_snapshot */
+ destReceiver, /* DestReceiver *dest */
+ paramListInfo, /* ParamListInfo params */
+ queryEnv, /* QueryEnvironment *queryEnv */
+ 0 /* int instrument_options */
+ );
+#endif
```
### Adapt `RelationGetPrimaryKeyIndex()` to PG-18’s new “deferrable\_ok”
flag
PostgreSQL commit
14e87ffa5c
added a new Boolean `deferrable_ok` parameter to
`RelationGetPrimaryKeyIndex()` so that the lock manager can defer
unique‐constraint locks when requested. The API changed from:
```c
RelationGetPrimaryKeyIndex(Relation relation)
```
to:
```c
RelationGetPrimaryKeyIndex(Relation relation, bool deferrable_ok)
```
```diff
diff --git a/src/backend/distributed/metadata/node_metadata.c
b/src/backend/distributed/metadata/node_metadata.c
index e3a1b2c..f4d5e6f 100644
--- a/src/backend/distributed/metadata/node_metadata.c
+++ b/src/backend/distributed/metadata/node_metadata.c
@@ -2965,8 +2965,18 @@
*/
- Relation replicaIndex =
index_open(RelationGetPrimaryKeyIndex(pgDistNode),
- AccessShareLock);
+ #if PG_VERSION_NUM >= PG_VERSION_18
+ /* PG 18+ adds a bool "deferrable_ok" parameter */
+ Relation replicaIndex =
+ index_open(
+ RelationGetPrimaryKeyIndex(pgDistNode, false),
+ AccessShareLock);
+ #else
+ Relation replicaIndex =
+ index_open(
+ RelationGetPrimaryKeyIndex(pgDistNode),
+ AccessShareLock);
+ #endif
ScanKeyInit(&scanKey[0], Anum_pg_dist_node_nodename,
BTEqualStrategyNumber, F_TEXTEQ, CStringGetTextDatum(nodeName));
```
```diff
diff --git a/src/backend/distributed/operations/node_protocol.c b/src/backend/distributed/operations/node_protocol.c
index e3a1b2c..f4d5e6f 100644
--- a/src/backend/distributed/operations/node_protocol.c
+++ b/src/backend/distributed/operations/node_protocol.c
@@ -746,7 +746,12 @@
if (!OidIsValid(idxoid))
{
- idxoid = RelationGetPrimaryKeyIndex(rel);
+ /* Determine the index OID of the primary key (PG18 adds a second parameter) */
+#if PG_VERSION_NUM >= PG_VERSION_18
+ idxoid = RelationGetPrimaryKeyIndex(rel, false);
+#else
+ idxoid = RelationGetPrimaryKeyIndex(rel);
+#endif
}
return idxoid;
```
Because Citus has always taken the lock immediately—just as the old
two-arg call did—we pass `false` to keep that same immediate-lock
behavior. Passing `true` would switch to deferred locking, which we
don’t want.
### Adapt `ExplainOnePlan()` to PG-18’s expanded API
PostgreSQL 18 extended
525392d572
the `ExplainOnePlan()` function to carry the `CachedPlan *` and
`CachedPlanSource *` pointers plus an explicit `query_index`, letting
the EXPLAIN machinery track plan‐source invalidation. The old signature:
```c
/* PG ≤17 */
void
ExplainOnePlan(PlannedStmt *plannedstmt,
IntoClause *into,
struct ExplainState *es,
const char *queryString,
ParamListInfo params,
QueryEnvironment *queryEnv,
const instr_time *planduration,
const BufferUsage *bufusage);
```
became, in PG 18:
```c
/* PG ≥18 */
void
ExplainOnePlan(PlannedStmt *plannedstmt,
CachedPlan *cplan,
CachedPlanSource *plansource,
int query_index,
IntoClause *into,
struct ExplainState *es,
const char *queryString,
ParamListInfo params,
QueryEnvironment *queryEnv,
const instr_time *planduration,
const BufferUsage *bufusage,
const MemoryContextCounters *mem_counters);
```
To compile under both versions, Citus now wraps each call in
`multi_explain.c` with:
```c
#if PG_VERSION_NUM >= PG_VERSION_18
/* PG 18+: pass NULL for the new cached‐plan fields and zero for query_index */
ExplainOnePlan(
plan, /* PlannedStmt *plannedstmt */
NULL, /* CachedPlan *cplan */
NULL, /* CachedPlanSource *plansource */
0, /* query_index */
into, /* IntoClause *into */
es, /* ExplainState *es */
queryString, /* const char *queryString */
params, /* ParamListInfo params */
NULL, /* QueryEnvironment *queryEnv */
&planduration,/* const instr_time *planduration */
(es->buffers ? &bufusage : NULL),
(es->memory ? &mem_counters : NULL)
);
#elif PG_VERSION_NUM >= PG_VERSION_17
/* PG 17: same as before, plus passing mem_counters if enabled */
ExplainOnePlan(
plan,
into,
es,
queryString,
params,
queryEnv,
&planduration,
(es->buffers ? &bufusage : NULL),
(es->memory ? &mem_counters : NULL)
);
#else
/* PG 15–16: original seven-arg form */
ExplainOnePlan(
plan,
into,
es,
queryString,
params,
queryEnv,
&planduration,
(es->buffers ? &bufusage : NULL)
);
#endif
```
### Adapt to the unified “index interpretation” API in PG 18 (commit
a8025f544854)
PostgreSQL commit
a8025f5448
generalized the old btree‐specific operator‐interpretation API into a
single “index interpretation” interface:
* **Renamed type**:
`OpBtreeInterpretation` → `OpIndexInterpretation`
* **Renamed function**:
`get_op_btree_interpretation(opno)` →
`get_op_index_interpretation(opno)`
* **Unified field**:
Each interpretation now carries `cmptype` instead of `strategy`.
To build cleanly on PG 18 while still supporting PG 15–17, Citus’s
shard‐pruning code now wraps these changes:
```c
#include "pg_version_constants.h"
#if PG_VERSION_NUM >= PG_VERSION_18
/* On PG 18+ the btree‐only APIs vanished; alias them to the new generic versions */
typedef OpIndexInterpretation OpBtreeInterpretation;
#define get_op_btree_interpretation(opno) get_op_index_interpretation(opno)
#define ROWCOMPARE_NE COMPARE_NE
#endif
/* … later, when checking an interpretation … */
OpBtreeInterpretation *interp =
(OpBtreeInterpretation *) lfirst(cell);
#if PG_VERSION_NUM >= PG_VERSION_18
/* use cmptype on PG 18+ */
if (interp->cmptype == ROWCOMPARE_NE)
#else
/* use strategy on PG 15–17 */
if (interp->strategy == ROWCOMPARE_NE)
#endif
{
/* … */
}
```
### Adapt `create_foreignscan_path()` for PG-18’s revised signature
PostgreSQL commit
e222534679
reordered and removed a couple of parameters in the FDW‐path builder:
* **PG 15–17 signature (11 args)**
```c
create_foreignscan_path(PlannerInfo *root,
RelOptInfo *rel,
PathTarget *target,
double rows,
Cost startup_cost,
Cost total_cost,
List *pathkeys,
Relids required_outer,
Path *fdw_outerpath,
List *fdw_restrictinfo,
List *fdw_private);
```
* **PG 18+ signature (9 args)**
```c
create_foreignscan_path(PlannerInfo *root,
RelOptInfo *rel,
PathTarget *target,
double rows,
int disabled_nodes,
Cost startup_cost,
Cost total_cost,
Relids required_outer,
Path *fdw_outerpath,
List *fdw_private);
```
To support both, Citus now defines a compatibility macro in
`pg_version_compat.h`:
```c
#include "nodes/bitmapset.h" /* for Relids */
#include "nodes/pg_list.h" /* for List */
#include "optimizer/pathnode.h" /* for create_foreignscan_path() */
#if PG_VERSION_NUM >= PG_VERSION_18
/* PG18+: drop pathkeys & fdw_restrictinfo, add disabled_nodes */
#define create_foreignscan_path_compat(a, b, c, d, e, f, g, h, i, j, k) \
create_foreignscan_path( \
(a), /* root */ \
(b), /* rel */ \
(c), /* target */ \
(d), /* rows */ \
(0), /* disabled_nodes (unused by Citus) */ \
(e), /* startup_cost */ \
(f), /* total_cost */ \
(g), /* required_outer */ \
(h), /* fdw_outerpath */ \
(k) /* fdw_private */ \
)
#else
/* PG15–17: original signature */
#define create_foreignscan_path_compat(a, b, c, d, e, f, g, h, i, j, k) \
create_foreignscan_path( \
(a), (b), (c), (d), \
(e), (f), \
(g), (h), (i), (j), (k) \
)
#endif
```
Now every call to `create_foreignscan_path_compat(...)`—even in tests
like `fake_fdw.c`—automatically picks the correct argument list for
PG 15 through PG 18.
### Drop the obsolete bitmap‐scan hooks on PG 18+
PostgreSQL commit
c3953226a0
cleaned up the `TableAmRoutine` API by removing the two bitmap‐scan
callback slots:
* `scan_bitmap_next_block`
* `scan_bitmap_next_tuple`
Since those hook‐slots no longer exist in PG 18, Citus now wraps their
NULL‐initialization in a `#if PG_VERSION_NUM < PG_VERSION_18` guard. On
PG 15–17 we still explicitly set them to `NULL` (to satisfy the old
struct layout), and on PG 18+ we omit them entirely:
```c
#if PG_VERSION_NUM < PG_VERSION_18
/* PG 15–17 only: these fields were removed upstream in PG 18 */
.scan_bitmap_next_block = NULL,
.scan_bitmap_next_tuple = NULL,
#endif
```
### Adapt `vac_update_relstats()` invocation to PG-18’s new
“all\_frozen” argument
PostgreSQL commit
99f8f3fbbc
extended the `vac_update_relstats()` API by inserting a
`num_all_frozen_pages` parameter between the existing
`num_all_visible_pages` and `hasindex` arguments:
```diff
- /* PG ≤17: */
- void
- vac_update_relstats(Relation relation,
- BlockNumber num_pages,
- double num_tuples,
- BlockNumber num_all_visible_pages,
- bool hasindex,
- TransactionId frozenxid,
- MultiXactId minmulti,
- bool *frozenxid_updated,
- bool *minmulti_updated,
- bool in_outer_xact);
+ /* PG ≥18: adds num_all_frozen_pages */
+ void
+ vac_update_relstats(Relation relation,
+ BlockNumber num_pages,
+ double num_tuples,
+ BlockNumber num_all_visible_pages,
+ BlockNumber num_all_frozen_pages,
+ bool hasindex,
+ TransactionId frozenxid,
+ MultiXactId minmulti,
+ bool *frozenxid_updated,
+ bool *minmulti_updated,
+ bool in_outer_xact);
```
To compile cleanly on both PG 15–17 and PG 18+, Citus wraps its call in
a version guard and supplies a zero placeholder for the new field:
```c
#if PG_VERSION_NUM >= 180000
/* PG 18+: supply explicit “all_frozen” count */
vac_update_relstats(
rel,
new_rel_pages,
new_live_tuples,
new_rel_allvisible, /* allvisible */
0, /* all_frozen */
nindexes > 0,
newRelFrozenXid,
newRelminMxid,
&frozenxid_updated,
&minmulti_updated,
false /* in_outer_xact */
);
#else
/* PG 15–17: original signature */
vac_update_relstats(
rel,
new_rel_pages,
new_live_tuples,
new_rel_allvisible,
nindexes > 0,
newRelFrozenXid,
newRelminMxid,
&frozenxid_updated,
&minmulti_updated,
false /* in_outer_xact */
);
#endif
```
**Why all_frozen = 0?**
Columnar storage never embeds transaction IDs in its pages, so it never
needs to track “all‐frozen” pages the way a heap does. Setting both
allvisible and allfrozen to zero simply tells Postgres “there are no
pages with the visibility or frozen‐status bits set,” matching our
existing behavior.
This change ensures Citus’s VACUUM‐statistic updates work unmodified
across all supported Postgres versions.
DESCRIPTION: Fixes deadlock with transaction recovery that is possible
during Citus upgrades.
Fixes#7875.
This commit addresses two interrelated deadlock issues uncovered during Citus
upgrades:
1. Local Deadlock:
- **Problem:**
In `RecoverWorkerTransactions()`, a new connection is created for each worker
node to perform transaction recovery by locking the
`pg_dist_transaction` catalog table until the end of the transaction. When
`RecoverTwoPhaseCommits()` calls this function for each worker node, the order
of acquiring locks on `pg_dist_authinfo` and `pg_dist_transaction` can alternate.
This reversal can lead to a deadlock if any concurrent process requires locks on
these tables.
- **Fix:**
Pre-establish all worker node connections upfront so that
`RecoverWorkerTransactions()` operates with a single, consistent connection.
This ensures that locks on `pg_dist_authinfo` and `pg_dist_transaction` are always
acquired in the correct order, thereby preventing the local deadlock.
2. Distributed Deadlock:
- **Problem:**
After resolving the local deadlock, a distributed deadlock issue emerges. The
maintenance daemon calls `RecoverWorkerTransactions()` on each worker node—
including the local node—which leads to a complex locking sequence:
- A RowExclusiveLock is taken on the `pg_dist_transaction` table in
`RecoverWorkerTransactions()`.
- An update extension then tries to acquire an AccessExclusiveLock on the same
table, getting blocked by the RowExclusiveLock.
- A subsequent query (e.g., a SELECT on `pg_prepared_xacts`) issued using a
separate connection on the local node gets blocked due to locks held during a
call to `BuildCitusTableCacheEntry()`.
- The maintenance daemon waits for this query, resulting in a circular wait and
stalling the entire cluster.
- **Fix:**
Avoid cache lookups for internal PostgreSQL tables by implementing an early bailout
for relation IDs below `FirstNormalObjectId` (system objects). This eliminates
unnecessary calls to `BuildCitusTableCache`, reducing lock contention and mitigating
the distributed deadlock.
Furthermore, this optimization improves performance in fast
connect→query_catalog→disconnect cycles by eliminating redundant
cache creation and lookups.
3. Also reverts the commit that disabled the relevant test cases.
As of this commit, after recovering the remote transactions, now we release the lock
on pg_dist_transaction while closing it to avoid deadlocks that might occur because
of trying to acquire a lock on pg_dist_authinfo while holding a lock on
pg_dist_transaction. Such a scenario can only cause a deadlock if another transaction
is trying to acquire a strong lock on pg_dist_transaction while holding a lock on
pg_dist_authinfo. As of today, we (implicitly) acquire a strong lock on
pg_dist_transaction only when upgrading Citus to 11.3-1 and this happens when creating
a REPLICA IDENTITY on pg_dist_transaction.
And regardless of the code-path we are in, it should be okay to release the lock there
because all we do after that point is to abort the prepared transactions that are not
part of an in-progress distributed transaction and releasing the lock before doing so
should be just fine.
This also changes the blocking behavior between citus_create_restore_point and the
transaction recovery code-path in the sense that now citus_create_restore_point doesn't
until transaction recovery completes aborting the prepared transactions that are not
part of an in-progress distributed transaction. However, this should be fine because
even before this was possible, e.g., if transaction recovery fails to open a remote
connection to a node.
This is prep work for successful compilation with PG17
PG17added foreach_ptr, foreach_int and foreach_oid macros
Relevant PG commit
14dd0f27d7cd56ffae9ecdbe324965073d01a9ff
14dd0f27d7
We already have these macros, but they are different with the
PG17 ones because our macros take a DECLARED variable, whereas
the PG16 macros declare a locally-scoped loop variable themselves.
Hence I am renaming our macros to foreach_declared_
I am separating this into its own PR since it touches many files. The
main compilation PR is https://github.com/citusdata/citus/pull/7699
Fixes https://github.com/citusdata/citus/issues/7536.
Note to reviewer:
Before this commit, the following results in an assertion failure when
executed locally and this won't be the case anymore:
```console
make -C src/test/regress/ check-citus-upgrade-local citus-old-version=v10.2.0
```
Note that this doesn't happen on CI as we don't enable assertions there.
---------
Co-authored-by: Jelte Fennema-Nio <jelte.fennema@microsoft.com>
DESCRIPTION: Adds support for 2PC from non-Citus main databases
This PR only adds support for `CREATE USER` queries, other queries need
to be added. But it should be simple because this PR creates the
underlying structure.
Citus main database is the database where the Citus extension is
created. A non-main database is all the other databases that are in the
same node with a Citus main database.
When a `CREATE USER` query is run on a non-main database we:
1. Run `start_management_transaction` on the main database. This
function saves the outer transaction's xid (the non-main database
query's transaction id) and marks the current query as main db command.
2. Run `execute_command_on_remote_nodes_as_user("CREATE USER
<username>", <username to run the command>)` on the main database. This
function creates the users in the rest of the cluster by running the
query on the other nodes. The user on the current node is created by the
query on the outer, non-main db, query to make sure consequent commands
in the same transaction can see this user.
3. Run `mark_object_distributed` on the main database. This function
adds the user to `pg_dist_object` in all of the nodes, including the
current one.
This PR also implements transaction recovery for the queries from
non-main databases.
This change adds a script to programatically group all includes in a
specific order. The script was used as a one time invocation to group
and sort all includes throught our formatted code. The grouping is as
follows:
- System includes (eg. `#include<...>`)
- Postgres.h (eg. `#include "postgres.h"`)
- Toplevel imports from postgres, not contained in a directory (eg.
`#include "miscadmin.h"`)
- General postgres includes (eg . `#include "nodes/..."`)
- Toplevel citus includes, not contained in a directory (eg. `#include
"citus_verion.h"`)
- Columnar includes (eg. `#include "columnar/..."`)
- Distributed includes (eg. `#include "distributed/..."`)
Because it is quite hard to understand the difference between toplevel
citus includes and toplevel postgres includes it hardcodes the list of
toplevel citus includes. In the same manner it assumes anything not
prefixed with `columnar/` or `distributed/` as a postgres include.
The sorting/grouping is enforced by CI. Since we do so with our own
script there are not changes required in our uncrustify configuration.
* Not take ShareUpdateExlusiveLock on pg_dist_transaction
We were taking ShareUpdateExlusiveLock on pg_dist_transaction during
recovery to prevent multiple recoveries happening concurrenly. VACUUM(
not FULL) also takes ShareUpdateExclusiveLock, and they can conflict. It
seems that VACUUM will skip the table if there is a conflicting lock
already taken unless it is doing the vacuum to prevent id wraparound, in
which case there can be a deadlock. I guess the deadlock happens if:
- VACUUM takes a lock on pg_dist_transaction and is done for id
wraparound problem
- The transaction in the maintenance tries to take a lock but
cannot as that conflicts with the lock acquired by VACUUM
- The transaction in the maintenance daemon has a very old xid hence
VACUUM cannot proceed.
If we take a row exclusive lock in transaction recovery then it wouldn't
conflict with VACUUM hence it could proceed so the deadlock would be
resolved. To prevent concurrent transaction recoveries happening, an
advisory lock is taken with ShareUpdateExlusiveLock as before.
* Use CITUS_OPERATIONS tag
CMDTAG_SELECT exists in PG12 hence defining a MACRO such as
CMDTAG_SELECT -> "SELECT" is not possible. I chose CMDTAG_SELECT_COMPAT
because with the COMPAT suffix it is explicit that it maps to different
things in different versions and also has a less chance of mapping
something irrevelant. For example if we used SELECT as a macro, then it
would map every SELECT to whatever it is mapping to, which might have
unexpected/undesired behaviour.
With PG13 heap_* (heap_open, heap_close etc) are replaced with table_*
(table_open, table_close etc).
It is better to use the new table access methods in the codebase and
define the macros for the previous versions as we can easily remove the
macro without having to change the codebase when we drop the support for
the old version.
Commits that introduced this change on Postgres:
f25968c49697db673f6cd2a07b3f7626779f1827
e0c4ec07284db817e1f8d9adfb3fffc952252db0
4b21acf522d751ba5b6679df391d5121b6c4a35f
Command to see relevant commits on Postgres side:
git log --all --grep="heap_open"
- changes in ruleutils_11.c is reflected
- vacuum statement api change is handled. We now allow
multi-table vacuum commands.
- some other function header changes are reflected
- api conflicts between PG11 and earlier versions
are handled by adding shims in version_compat.h
- various regression tests are fixed due output and
functionality in PG1
- no change is made to support new features in PG11
they need to be handled by new commit
- master_add_node enforces that there is only one primary per group
- there's also a trigger on pg_dist_node to prevent multiple primaries
per group
- functions in metadata cache only return primary nodes
- Rename ActiveWorkerNodeList -> ActivePrimaryNodeList
- Rename WorkerGetLive{Node->Group}Count()
- Refactor WorkerGetRandomCandidateNode
- master_remove_node only complains about active shard placements if the
node being removed is a primary.
- master_remove_node only deletes all reference table placements in the
group if the node being removed is the primary.
- Rename {Node->NodeGroup}HasShardPlacements, this reflects the behavior it
already had.
- Rename DeleteAllReferenceTablePlacementsFrom{Node->NodeGroup}. This also
reflects the behavior it already had, but the new signature forces the
caller to pass in a groupId
- Rename {WorkerGetLiveGroup->ActivePrimaryNode}Count
Adds support for PostgreSQL 10 by copying in the requisite ruleutils
and updating all API usages to conform with changes in PostgreSQL 10.
Most changes are fairly minor but they are numerous. One particular
obstacle was the change in \d behavior in PostgreSQL 10's psql; I had
to add SQL implementations (views, mostly) to mimic the pre-10 output.
With this change we add an option to add a node without replicating all reference
tables to that node. If a node is added with this option, we mark the node as
inactive and no queries will sent to that node.
We also added two new UDFs;
- master_activate_node(host, port):
- marks node as active and replicates all reference tables to that node
- master_add_inactive_node(host, port):
- only adds node to pg_dist_node
A small refactor which pulls some code out of `RecoverWorkerTransactions`
and into `remote_commands.c`. This code block currently only occurs in
`RecoverWorkerTransactions` but will be useful to other functions
shortly.
Unfortunately we couldn't call it `ExecuteRemoteCommand`, that name was
already taken.