citus

Commit Graph

Author	SHA1	Message	Date
ibrahim halatci	6b9962c0c0	[doc] wrong code comments for function PopUnassignedPlacementExecution (#8079) Fixes #7621 DESCRIPTION: function comment correction	2025-07-29 13:24:42 +03:00
Mehmet YILMAZ	9327df8446	Add PG 18Beta2 Build compatibility (#8060) Fixes #8061 Add PG 18Beta2 Build compatibility: Revert "Don't lock partitions pruned by initial pruning" Relevant PG commit: 1722d5eb05d8e5d2e064cd1798abcae4f296ca9d https://github.com/postgres/postgres/commit/1722d5e	2025-07-23 15:15:55 +03:00
Colm	245a62df3e	Avoid query deparse and planning of shard query in local execution. (#8035) DESCRIPTION: Avoid query deparse and planning of shard query in local execution. Adds citus.enable_local_execution_local_plan GUC to allow avoiding unnecessary query deparsing to improve performance of fast-path queries targeting local shards. If a fast path query resolves to a shard that is local to the node planning the query, a shortcut can be taken so that the OID of the shard is plugged into the parse tree, which is then planned by Postgres. In `local_executor.c` the task uses that plan instead of parsing and planning a shard query. How this is done: The fast path planner identifies whether the shortcut is possible, and then the distributed planner checks, using `CheckAndBuildDelayedFastPathPlan()`, whether a local plan can be generated or the shard query should be generated. This optimization is controlled by a GUC `citus.enable_local_execution_local_plan` which is on by default. A new regress test `local_execution_local_plan` tests both row-sharding and schema sharding. Negative tests are added to `local_shard_execution_dropped_column` to verify that the optimization is not taken when the shard is local but there is a difference between the shard and distributed table because of a dropped column.	2025-07-22 17:16:53 +01:00
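A minimal sketch of the decision described above, assuming it reduces to four conditions: the GUC is on, the query took the fast path, the target shard is local, and the shard does not differ from the distributed table (for example because of a dropped column). The struct, enum, and function names are hypothetical; this is not the code of `CheckAndBuildDelayedFastPathPlan()`, which works on parse trees and shard metadata rather than booleans.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the citus.enable_local_execution_local_plan GUC (on by default). */
static bool EnableLocalExecutionLocalPlan = true;

/* Hypothetical summary of what the fast-path planner learned about a query. */
typedef struct FastPathShardInfo
{
	bool isFastPath;        /* query qualified for the fast-path planner */
	bool shardIsLocal;      /* target shard lives on the planning node */
	bool shardMatchesTable; /* no dropped-column (or similar) shard/table mismatch */
} FastPathShardInfo;

typedef enum ShardPlanStrategy
{
	PLAN_FROM_DEPARSED_SHARD_QUERY, /* deparse a shard query, then parse and plan it */
	PLAN_LOCALLY_WITH_SHARD_OID     /* plug the shard OID into the parse tree and plan that */
} ShardPlanStrategy;

/* Take the shortcut only when every precondition holds; otherwise use the old path. */
static ShardPlanStrategy
ChooseShardPlanStrategy(const FastPathShardInfo *info)
{
	if (EnableLocalExecutionLocalPlan &&
		info->isFastPath &&
		info->shardIsLocal &&
		info->shardMatchesTable)
	{
		return PLAN_LOCALLY_WITH_SHARD_OID;
	}

	return PLAN_FROM_DEPARSED_SHARD_QUERY;
}
```

The dropped-column case is the reason for the third flag: when the shard's column layout differs from the distributed table's, plugging the shard OID into the table's parse tree would be unsafe, so the sketch falls back to deparsing.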
Mehmet YILMAZ	9e42f3f2c4	Add PG 18Beta1 compatibility (Build + RuleUtils) (#7981)
This PR provides a successful build against PG18Beta1. The RuleUtils PR was reviewed separately: #8010

## PG 18Beta1-related changes for building Citus

### TupleDesc / Attr layout

What changed in PG: Postgres consolidated the `TupleDescData.attrs[]` array into a more compact representation. Direct field access (`tupdesc->attrs[i]`) was replaced by the `TupleDescAttr()` API.

Citus adaptation: Everywhere we previously used `tupdesc->attrs[...]`, we now call `TupleDescAttr(tupdesc, idx)` (or our own `Attr()` macro) under a compatibility guard.

* `5983a4cffc`

General logic:

* Use `Attr(...)` in places where `columnar_version_compat.h` is included. This avoids the need to sprinkle `#if PG_VERSION_NUM` guards around each attribute access.
* Use `TupleDescAttr(tupdesc, i)` when the relevant PostgreSQL header is already included and the additional macro indirection is unnecessary.

### Collation-aware `LIKE`

What changed in PG: The `textlike` operator now requires an explicit collation, to avoid ambiguous-collation errors. Core code switched from `DirectFunctionCall2(textlike, ...)` to `DirectFunctionCall2Coll(textlike, DEFAULT_COLLATION_OID, ...)`.

Citus adaptation: In `remote_commands.c` and any other `LIKE` call, we now use `DirectFunctionCall2Coll(textlike, DEFAULT_COLLATION_OID, ...)` and `#include <utils/pg_collation.h>`.

* `85b7efa1cd`

### Columnar storage API

* Adapt `columnar_relation_set_new_filelocator` (and related init routines) for PG 18's revised SMGR and storage-initialization hooks.
* Pull in the new headers (`explain_format.h`, `columnar_version_compat.h`) so the columnar module compiles cleanly against PG 18.
* `heap_modify_tuple` + `heap_inplace_update` only exist on PG < 18; on PG 18 the in-place helper was removed upstream.
* `a07e03fd8f`

### OpenSSL / TLS integration

What changed in PG: Moved from the legacy `SSL_library_init()` to `OPENSSL_init_ssl(OPENSSL_INIT_LOAD_CONFIG, NULL)`, updated certificate API calls (`X509_getm_notBefore`, `X509_getm_notAfter`), and standardized on `TLS_method()`.

Citus adaptation: We now `#include <openssl/opensslv.h>` and use `#if OPENSSL_VERSION_NUMBER >= 0x10100000L` to choose between `OPENSSL_init_ssl()` and `SSL_library_init()`, and wrap `X509_gmtime_adj()` calls around the new accessor functions.

* `6c66b7443c`

### Adapt `ExtractColumns()` to the new PG-18 `expandRTE()` signature

PostgreSQL 18 `80feb727c8` added a new fourth parameter of type `VarReturningType` to `expandRTE()`, so calls that use the old 7-parameter form no longer compile. This patch:

* Wraps the `expandRTE(...)` call in a `#if PG_VERSION_NUM >= 180000` guard.
* On PG 18+ passes the new `VAR_RETURNING_DEFAULT` argument before `location`.
* On PG 15–17 continues to call the original 7-argument form.
* Adds the necessary includes (`parser/parse_relation.h` for `expandRTE` and `VarReturningType`, and `pg_version_constants.h` for `PG_VERSION_NUM`).

### Adapt `ExecutorStart`/`ExecutorRun` hooks to PG-18's new signatures

PostgreSQL 18 `525392d572` changed the signatures of the executor hooks:

* `ExecutorStart_hook` now returns `bool` instead of `void`, and
* `ExecutorRun_hook` drops its old `run_once` argument.

This patch preserves Citus's existing hook logic by:

1. Adding two adapter functions under `#if PG_VERSION_NUM >= PG_VERSION_18`:
   * `citus_executor_start_adapter(QueryDesc *queryDesc, int eflags)`: calls the old `CitusExecutorStart(queryDesc, eflags)` and then returns `true` to satisfy the new hook's `bool` return type.
   * `citus_executor_run_adapter(QueryDesc *queryDesc, ScanDirection direction, uint64 count)`: calls the old `CitusExecutorRun(queryDesc, direction, count, true)` (passing `true` for the dropped `run_once` argument) and returns `void`.
2. Installing the adapters in `_PG_init()` instead of the original hooks when building against PG 18+:

```c
#if PG_VERSION_NUM >= PG_VERSION_18
	ExecutorStart_hook = citus_executor_start_adapter;
	ExecutorRun_hook = citus_executor_run_adapter;
#else
	ExecutorStart_hook = CitusExecutorStart;
	ExecutorRun_hook = CitusExecutorRun;
#endif
```

### Adapt to PG-18's removal of the "run_once" flag from ExecutorRun/PortalRun

PostgreSQL commit `3eea7a0c97` rationalized the executor's parallelism logic by moving the "execute a plan only once" check into `ExecutePlan()` itself and dropping the old `bool run_once` argument from the public APIs:

```diff
- void ExecutorRun(QueryDesc *queryDesc,
-                  ScanDirection direction,
-                  uint64 count,
-                  bool run_once);
+ void ExecutorRun(QueryDesc *queryDesc,
+                  ScanDirection direction,
+                  uint64 count);
```

(and similarly for `PortalRun()`). To stay compatible across PG 15–18, Citus now:

1. Updates all internal calls to `ExecutorRun(...)` and `PortalRun(...)`:
   * On PG 18+, uses the new three-argument form (`ExecutorRun(qd, dir, count)`).
   * On PG 15–17, keeps the old four-argument form (`ExecutorRun(qd, dir, count, true)`) under a `#if PG_VERSION_NUM < 180000` guard.
2. Guards the dispatcher hooks via the adapter functions (from the earlier patch) so that Citus's executor hooks continue to work under both the old and new signatures.

### Adapt to PG-18's shortened PortalRun signature

PostgreSQL 18's refactoring (see commit `3eea7a0c97`) also removed the old `run_once` argument from the public `PortalRun()` API.
The signature changed from:

```diff
- bool PortalRun(Portal portal,
-                long count,
-                bool isTopLevel,
-                bool run_once,
-                DestReceiver *dest,
-                DestReceiver *altdest,
-                QueryCompletion *qc);
+ bool PortalRun(Portal portal,
+                long count,
+                bool isTopLevel,
+                DestReceiver *dest,
+                DestReceiver *altdest,
+                QueryCompletion *qc);
```

To support both versions in Citus, we:

1. Version-guard each call to `PortalRun()`:
   * On PG 18+, invoke the new 6-argument form.
   * On PG 15–17, fall back to the legacy 7-argument form, passing `true` for `run_once`.

### Add support for PG-18's new `plansource` argument in `PortalDefineQuery`

PostgreSQL 18 extended the `PortalDefineQuery` API to carry a `CachedPlanSource *plansource` pointer so that the portal machinery can track cached-plan invalidation (introduced alongside deferred locking in commit `525392d572`). To remain compatible across PG 15–18, Citus now wraps its calls in a version guard:

```diff
- PortalDefineQuery(portal, NULL, sql, commandTag, plantree_list, NULL);
+#if PG_VERSION_NUM >= 180000
+    /* PG 18+: seven-arg signature (adds plansource) */
+    PortalDefineQuery(
+        portal,
+        NULL,          /* no prepared-stmt name */
+        sql,           /* the query text */
+        commandTag,    /* the CommandTag */
+        plantree_list, /* List of PlannedStmt* */
+        NULL,          /* no CachedPlan */
+        NULL           /* no CachedPlanSource */
+        );
+#else
+    /* PG 15–17: six-arg signature */
+    PortalDefineQuery(
+        portal,
+        NULL,          /* no prepared-stmt name */
+        sql,           /* the query text */
+        commandTag,    /* the CommandTag */
+        plantree_list, /* List of PlannedStmt* */
+        NULL           /* no CachedPlan */
+        );
+#endif
```

### Adapt ExecInitRangeTable() calls to PG-18's new signature

PostgreSQL commit `cbc127917e` (cbc127917e04a978a788b8bc9d35a70244396d5b) changed the executor's range-table initialization API: on PG 18+, `ExecInitRangeTable()` takes a fourth `Bitmapset *unpruned_relids` argument to support deferred partition pruning.

In Citus's `create_estate_for_relation()` (in `columnar_metadata.c`), we now wrap the call in a compile-time guard so that the code compiles correctly on all supported PostgreSQL versions:

```c
	/* Prepare permission info on PG 16+ */
#if PG_VERSION_NUM >= PG_VERSION_16
	List *perminfos = NIL;
	addRTEPermissionInfo(&perminfos, rte);
#else
	List *perminfos = NIL; /* unused on PG 15 */
#endif

	/* Initialize the range table, with the right signature for each PG version */
#if PG_VERSION_NUM >= PG_VERSION_18
	/* PG 18+: four-arg signature (adds unpruned_relids) */
	ExecInitRangeTable(
		estate,
		list_make1(rte),
		perminfos,
		NULL /* unpruned_relids: not used by columnar */
		);
#elif PG_VERSION_NUM >= PG_VERSION_16
	/* PG 16–17: three-arg signature (permInfos) */
	ExecInitRangeTable(
		estate,
		list_make1(rte),
		perminfos
		);
#else
	/* PG 15: two-arg signature */
	ExecInitRangeTable(
		estate,
		list_make1(rte)
		);
#endif

	estate->es_output_cid = GetCurrentCommandId(true);
```

### Adapt `pgstat_report_vacuum()` to PG-18's new timestamp argument

PostgreSQL commit `30a6ed0ce4` (30a6ed0ce4bb18212ec38cdb537ea4b43bc99b83) extended the `pgstat_report_vacuum()` API by adding a `TimestampTz start_time` parameter at the end so that the VACUUM statistics collector can record when the operation began:

```diff
-/* PG ≤17: four-arg signature */
- void pgstat_report_vacuum(Oid tableoid,
-                           bool shared,
-                           double num_live_tuples,
-                           double num_dead_tuples);
+/* PG ≥18: five-arg signature adds a start_time */
+ void pgstat_report_vacuum(Oid tableoid,
+                           bool shared,
+                           double num_live_tuples,
+                           double num_dead_tuples,
+                           TimestampTz start_time);
```

To support both versions, we now wrap the call in `columnar_tableam.c` with a version guard, supplying `GetCurrentTimestamp()` on PG 18+:

```c
#if PG_VERSION_NUM >= 180000
	/* PG 18+: include start_timestamp */
	pgstat_report_vacuum(
		RelationGetRelid(rel),
		rel->rd_rel->relisshared,
		Max(new_live_tuples, 0), /* live tuples */
		0,                       /* dead tuples */
		GetCurrentTimestamp()    /* start time */
		);
#else
	/* PG 15–17: original signature */
	pgstat_report_vacuum(
		RelationGetRelid(rel),
		rel->rd_rel->relisshared,
		Max(new_live_tuples, 0), /* live tuples */
		0                        /* dead tuples */
		);
#endif
```

### Adapt `ExecuteTaskPlan()` to PG-18's expanded `CreateQueryDesc()` signature

PostgreSQL 18 changed `CreateQueryDesc()` from an eight-argument to a nine-argument call by inserting a `CachedPlan *cplan` parameter immediately after the `PlannedStmt *plannedstmt` argument (see commit `525392d572`). To remain compatible with PG 15–17, Citus now wraps its invocation in `local_executor.c` with a version guard:

```diff
- /* PG15–17: eight-arg CreateQueryDesc without cached plan */
- QueryDesc *queryDesc = CreateQueryDesc(
-     taskPlan,            /* PlannedStmt *plannedstmt */
-     queryString,         /* const char *sourceText */
-     GetActiveSnapshot(), /* Snapshot snapshot */
-     InvalidSnapshot,     /* Snapshot crosscheck_snapshot */
-     destReceiver,        /* DestReceiver *dest */
-     paramListInfo,       /* ParamListInfo params */
-     queryEnv,            /* QueryEnvironment *queryEnv */
-     0                    /* int instrument_options */
- );
+#if PG_VERSION_NUM >= 180000
+ /* PG18+: nine-arg CreateQueryDesc with a CachedPlan slot */
+ QueryDesc *queryDesc = CreateQueryDesc(
+     taskPlan,            /* PlannedStmt *plannedstmt */
+     NULL,                /* CachedPlan *cplan (none) */
+     queryString,         /* const char *sourceText */
+     GetActiveSnapshot(), /* Snapshot snapshot */
+     InvalidSnapshot,     /* Snapshot crosscheck_snapshot */
+     destReceiver,        /* DestReceiver *dest */
+     paramListInfo,       /* ParamListInfo params */
+     queryEnv,            /* QueryEnvironment *queryEnv */
+     0                    /* int instrument_options */
+ );
+#else
+ /* PG15–17: eight-arg CreateQueryDesc without cached plan */
+ QueryDesc *queryDesc = CreateQueryDesc(
+     taskPlan,            /* PlannedStmt *plannedstmt */
+     queryString,         /* const char *sourceText */
+     GetActiveSnapshot(), /* Snapshot snapshot */
+     InvalidSnapshot,     /* Snapshot crosscheck_snapshot */
+     destReceiver,        /* DestReceiver *dest */
+     paramListInfo,       /* ParamListInfo params */
+     queryEnv,            /* QueryEnvironment *queryEnv */
+     0                    /* int instrument_options */
+ );
+#endif
```

### Adapt `RelationGetPrimaryKeyIndex()` to PG-18's new "deferrable_ok" flag

PostgreSQL commit `14e87ffa5c` added a new Boolean `deferrable_ok` parameter to `RelationGetPrimaryKeyIndex()` so that callers can choose whether a DEFERRABLE primary key index may be returned. The API changed from:

```c
RelationGetPrimaryKeyIndex(Relation relation)
```

to:

```c
RelationGetPrimaryKeyIndex(Relation relation, bool deferrable_ok)
```

```diff
diff --git a/src/backend/distributed/metadata/node_metadata.c b/src/backend/distributed/metadata/node_metadata.c
index e3a1b2c..f4d5e6f 100644
--- a/src/backend/distributed/metadata/node_metadata.c
+++ b/src/backend/distributed/metadata/node_metadata.c
@@ -2965,8 +2965,18 @@
-	Relation replicaIndex = index_open(RelationGetPrimaryKeyIndex(pgDistNode),
-									   AccessShareLock);
+#if PG_VERSION_NUM >= PG_VERSION_18
+	/* PG 18+ adds a bool "deferrable_ok" parameter */
+	Relation replicaIndex =
+		index_open(
+			RelationGetPrimaryKeyIndex(pgDistNode, false),
+			AccessShareLock);
+#else
+	Relation replicaIndex =
+		index_open(
+			RelationGetPrimaryKeyIndex(pgDistNode),
+			AccessShareLock);
+#endif

 	ScanKeyInit(&scanKey[0], Anum_pg_dist_node_nodename,
 				BTEqualStrategyNumber, F_TEXTEQ, CStringGetTextDatum(nodeName));
```

```diff
diff --git a/src/backend/distributed/operations/node_protocol.c b/src/backend/distributed/operations/node_protocol.c
index e3a1b2c..f4d5e6f 100644
--- a/src/backend/distributed/operations/node_protocol.c
+++ b/src/backend/distributed/operations/node_protocol.c
@@ -746,7 +746,12 @@
 	if (!OidIsValid(idxoid))
 	{
-		idxoid = RelationGetPrimaryKeyIndex(rel);
+		/* Determine the index OID of the primary key (PG18 adds a second parameter) */
+#if PG_VERSION_NUM >= PG_VERSION_18
+		idxoid = RelationGetPrimaryKeyIndex(rel, false);
+#else
+		idxoid = RelationGetPrimaryKeyIndex(rel);
+#endif
 	}
 
 	return idxoid;
```

Because Citus has always relied on the behavior of the old one-argument call, we pass `false` to preserve it. Passing `true` would opt in to the new behavior, which we don't want.
### Adapt `ExplainOnePlan()` to PG-18's expanded API

PostgreSQL 18 (commit `525392d572`) extended the `ExplainOnePlan()` function to carry `CachedPlan *` and `CachedPlanSource *` pointers plus an explicit `query_index`, letting the EXPLAIN machinery track plan-source invalidation. The PG 15–16 signature (PG 17 appended a `mem_counters` argument for `EXPLAIN (MEMORY)`):

```c
/* PG 15–16 */
void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
                    struct ExplainState *es, const char *queryString,
                    ParamListInfo params, QueryEnvironment *queryEnv,
                    const instr_time *planduration,
                    const BufferUsage *bufusage);
```

became, in PG 18:

```c
/* PG ≥18 */
void ExplainOnePlan(PlannedStmt *plannedstmt, CachedPlan *cplan,
                    CachedPlanSource *plansource, int query_index,
                    IntoClause *into, struct ExplainState *es,
                    const char *queryString, ParamListInfo params,
                    QueryEnvironment *queryEnv,
                    const instr_time *planduration,
                    const BufferUsage *bufusage,
                    const MemoryContextCounters *mem_counters);
```

To compile under all versions, Citus now wraps each call in `multi_explain.c` with:

```c
#if PG_VERSION_NUM >= PG_VERSION_18
	/* PG 18+: pass NULL for the new cached-plan fields and zero for query_index */
	ExplainOnePlan(
		plan,          /* PlannedStmt *plannedstmt */
		NULL,          /* CachedPlan *cplan */
		NULL,          /* CachedPlanSource *plansource */
		0,             /* query_index */
		into,          /* IntoClause *into */
		es,            /* ExplainState *es */
		queryString,   /* const char *queryString */
		params,        /* ParamListInfo params */
		NULL,          /* QueryEnvironment *queryEnv */
		&planduration, /* const instr_time *planduration */
		(es->buffers ? &bufusage : NULL),
		(es->memory ? &mem_counters : NULL)
		);
#elif PG_VERSION_NUM >= PG_VERSION_17
	/* PG 17: same as before, plus passing mem_counters if enabled */
	ExplainOnePlan(
		plan, into, es, queryString, params, queryEnv,
		&planduration,
		(es->buffers ? &bufusage : NULL),
		(es->memory ? &mem_counters : NULL)
		);
#else
	/* PG 15–16: original eight-arg form */
	ExplainOnePlan(
		plan, into, es, queryString, params, queryEnv,
		&planduration,
		(es->buffers ? &bufusage : NULL)
		);
#endif
```

### Adapt to the unified "index interpretation" API in PG 18 (commit a8025f544854)

PostgreSQL commit `a8025f5448` generalized the old btree-specific operator-interpretation API into a single "index interpretation" interface:

* Renamed type: `OpBtreeInterpretation` → `OpIndexInterpretation`
* Renamed function: `get_op_btree_interpretation(opno)` → `get_op_index_interpretation(opno)`
* Unified field: each interpretation now carries `cmptype` instead of `strategy`.

To build cleanly on PG 18 while still supporting PG 15–17, Citus's shard-pruning code now wraps these changes:

```c
#include "pg_version_constants.h"

#if PG_VERSION_NUM >= PG_VERSION_18
/* On PG 18+ the btree-only APIs vanished; alias them to the new generic versions */
typedef OpIndexInterpretation OpBtreeInterpretation;
#define get_op_btree_interpretation(opno) get_op_index_interpretation(opno)
#define ROWCOMPARE_NE COMPARE_NE
#endif

/* ... later, when checking an interpretation ... */
OpBtreeInterpretation *interp = (OpBtreeInterpretation *) lfirst(cell);
#if PG_VERSION_NUM >= PG_VERSION_18
	/* use cmptype on PG 18+ */
	if (interp->cmptype == ROWCOMPARE_NE)
#else
	/* use strategy on PG 15–17 */
	if (interp->strategy == ROWCOMPARE_NE)
#endif
	{
		/* ... */
	}
```

### Adapt `create_foreignscan_path()` for PG-18's revised signature

PostgreSQL commit `e222534679` reordered and removed a couple of parameters in the FDW-path builder.

PG 15–17 signature (11 args):

```c
create_foreignscan_path(PlannerInfo *root, RelOptInfo *rel,
                        PathTarget *target, double rows,
                        Cost startup_cost, Cost total_cost,
                        List *pathkeys, Relids required_outer,
                        Path *fdw_outerpath, List *fdw_restrictinfo,
                        List *fdw_private);
```

PG 18+ signature (10 args):

```c
create_foreignscan_path(PlannerInfo *root, RelOptInfo *rel,
                        PathTarget *target, double rows,
                        int disabled_nodes,
                        Cost startup_cost, Cost total_cost,
                        Relids required_outer,
                        Path *fdw_outerpath, List *fdw_private);
```

To support both, Citus now defines a compatibility macro in `pg_version_compat.h`:

```c
#include "nodes/bitmapset.h"    /* for Relids */
#include "nodes/pg_list.h"      /* for List */
#include "optimizer/pathnode.h" /* for create_foreignscan_path() */

#if PG_VERSION_NUM >= PG_VERSION_18

/* PG18+: drop pathkeys (g) & fdw_restrictinfo (j), add disabled_nodes */
#define create_foreignscan_path_compat(a, b, c, d, e, f, g, h, i, j, k) \
	create_foreignscan_path( \
		(a), /* root */ \
		(b), /* rel */ \
		(c), /* target */ \
		(d), /* rows */ \
		(0), /* disabled_nodes (unused by Citus) */ \
		(e), /* startup_cost */ \
		(f), /* total_cost */ \
		(h), /* required_outer */ \
		(i), /* fdw_outerpath */ \
		(k)  /* fdw_private */ \
		)
#else

/* PG15–17: original signature */
#define create_foreignscan_path_compat(a, b, c, d, e, f, g, h, i, j, k) \
	create_foreignscan_path( \
		(a), (b), (c), (d), \
		(e), (f), \
		(g), (h), (i), (j), (k) \
		)
#endif
```

Now every call to `create_foreignscan_path_compat(...)`, even in tests like `fake_fdw.c`, automatically picks the correct argument list for PG 15 through PG 18.

### Drop the obsolete bitmap-scan hooks on PG 18+

PostgreSQL commit `c3953226a0` cleaned up the `TableAmRoutine` API by removing the two bitmap-scan callback slots:

* `scan_bitmap_next_block`
* `scan_bitmap_next_tuple`

Since those hook slots no longer exist in PG 18, Citus now wraps their NULL-initialization in a `#if PG_VERSION_NUM < PG_VERSION_18` guard.
On PG 15–17 we still explicitly set them to `NULL` (to satisfy the old struct layout), and on PG 18+ we omit them entirely:

```c
#if PG_VERSION_NUM < PG_VERSION_18
	/* PG 15–17 only: these fields were removed upstream in PG 18 */
	.scan_bitmap_next_block = NULL,
	.scan_bitmap_next_tuple = NULL,
#endif
```

### Adapt `vac_update_relstats()` invocation to PG-18's new "all_frozen" argument

PostgreSQL commit `99f8f3fbbc` extended the `vac_update_relstats()` API by inserting a `num_all_frozen_pages` parameter between the existing `num_all_visible_pages` and `hasindex` arguments:

```diff
- /* PG ≤17: */
- void
- vac_update_relstats(Relation relation,
-                     BlockNumber num_pages,
-                     double num_tuples,
-                     BlockNumber num_all_visible_pages,
-                     bool hasindex,
-                     TransactionId frozenxid,
-                     MultiXactId minmulti,
-                     bool *frozenxid_updated,
-                     bool *minmulti_updated,
-                     bool in_outer_xact);
+ /* PG ≥18: adds num_all_frozen_pages */
+ void
+ vac_update_relstats(Relation relation,
+                     BlockNumber num_pages,
+                     double num_tuples,
+                     BlockNumber num_all_visible_pages,
+                     BlockNumber num_all_frozen_pages,
+                     bool hasindex,
+                     TransactionId frozenxid,
+                     MultiXactId minmulti,
+                     bool *frozenxid_updated,
+                     bool *minmulti_updated,
+                     bool in_outer_xact);
```

To compile cleanly on both PG 15–17 and PG 18+, Citus wraps its call in a version guard and supplies a zero placeholder for the new field:

```c
#if PG_VERSION_NUM >= 180000
	/* PG 18+: supply explicit "all_frozen" count */
	vac_update_relstats(
		rel,
		new_rel_pages,
		new_live_tuples,
		new_rel_allvisible, /* allvisible */
		0,                  /* all_frozen */
		nindexes > 0,
		newRelFrozenXid,
		newRelminMxid,
		&frozenxid_updated,
		&minmulti_updated,
		false               /* in_outer_xact */
		);
#else
	/* PG 15–17: original signature */
	vac_update_relstats(
		rel,
		new_rel_pages,
		new_live_tuples,
		new_rel_allvisible,
		nindexes > 0,
		newRelFrozenXid,
		newRelminMxid,
		&frozenxid_updated,
		&minmulti_updated,
		false               /* in_outer_xact */
		);
#endif
```

Why all_frozen = 0? Columnar storage never embeds transaction IDs in its pages, so it never needs to track "all-frozen" pages the way a heap does. Passing zero for all_frozen simply tells Postgres "there are no pages with the all-frozen bit set," matching our existing behavior. This change ensures Citus's VACUUM statistics updates work unmodified across all supported Postgres versions.	2025-07-16 15:30:41 +03:00
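The two adapter functions described in the executor-hook section can be sketched in isolation. This is a self-contained illustration with stub types (`StubQueryDesc`, `StubScanDirection`, the hook typedefs, and all function names are hypothetical, not PostgreSQL's headers or Citus's code); it only shows the shape of the technique: a `bool`-returning start hook wraps a legacy `void` body, and the new run hook supplies `true` for the dropped `run_once` argument.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stub stand-ins for PostgreSQL executor types (not the real headers). */
typedef struct StubQueryDesc
{
	bool started;     /* set by the legacy start body */
	int  runs;        /* incremented by the legacy run body */
	bool lastRunOnce; /* run_once value the legacy run body received */
} StubQueryDesc;

typedef enum StubScanDirection
{
	StubBackwardScan = -1,
	StubNoMovementScan = 0,
	StubForwardScan = 1
} StubScanDirection;

/* Legacy (PG 15-17 shaped) bodies: start returns void, run takes run_once. */
static void
LegacyExecutorStart(StubQueryDesc *queryDesc, int eflags)
{
	(void) eflags;
	queryDesc->started = true;
}

static void
LegacyExecutorRun(StubQueryDesc *queryDesc, StubScanDirection direction,
				  uint64_t count, bool run_once)
{
	(void) direction;
	(void) count;
	queryDesc->runs++;
	queryDesc->lastRunOnce = run_once;
}

/* PG 18 shaped hook types: start returns bool, run no longer takes run_once. */
typedef bool (*StubExecutorStartHook)(StubQueryDesc *queryDesc, int eflags);
typedef void (*StubExecutorRunHook)(StubQueryDesc *queryDesc,
									StubScanDirection direction, uint64_t count);

/* Adapter: run the legacy body, then satisfy the new bool return type. */
static bool
executor_start_adapter(StubQueryDesc *queryDesc, int eflags)
{
	LegacyExecutorStart(queryDesc, eflags);
	return true;
}

/* Adapter: pass true for the argument that PG 18 dropped. */
static void
executor_run_adapter(StubQueryDesc *queryDesc, StubScanDirection direction,
					 uint64_t count)
{
	LegacyExecutorRun(queryDesc, direction, count, true);
}

/* What an init function would install on PG 18+ (here, plain file-scope variables). */
static StubExecutorStartHook stub_ExecutorStart_hook = executor_start_adapter;
static StubExecutorRunHook stub_ExecutorRun_hook = executor_run_adapter;
```

Keeping the legacy bodies untouched and adding thin adapters means only the hook installation is version-guarded, rather than every line of the hook logic.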
Onur Tirtir	ea7aa6712d	Move stat view implementations into a submodule (#7975) Also move serialize_distributed_ddls into commands submodule, seems like an oversight from last year (by me).	2025-04-29 14:22:29 +03:00
Onur Tirtir	3d61c4dc71	Add citus_stat_counters view and citus_stat_counters_reset() function to reset it (#7917)
DESCRIPTION: Adds citus_stat_counters view that can be used to query stat counters that Citus collects while the feature is enabled, which is controlled by citus.enable_stat_counters. citus_stat_counters() can be used to query the stat counters for the provided database oid, and citus_stat_counters_reset() can be used to reset them for the provided database oid, or for the current database if nothing or 0 is provided.

Today we don't persist stat counters on server shutdown. In other words, stat counters are automatically reset in case of a server restart.

Details on the underlying design can be found in the header comment of stat_counters.c and in the technical readme.

-------

Here are the details about what we track as of this PR:

For connection management, we have three statistics about the inter-node connections initiated by the node itself:

* connection_establishment_succeeded
* connection_establishment_failed
* connection_reused

While the first two are relatively easy to understand, the third one covers the case where a connection is reused. This can happen when a connection was already established to the desired node, Citus decided to cache it for some time (see citus.max_cached_conns_per_worker & citus.max_cached_connection_lifetime), and then reused it for a new remote operation.

Here are the other important details about these connection statistics:

1. connection_establishment_failed doesn't count connections that we could establish but that were lost later in the transaction. Plus, we cannot guarantee that the connections counted in connection_establishment_succeeded were not lost later.
2. connection_establishment_failed doesn't count the optional connections (see OPTIONAL_CONNECTION flag) that we gave up establishing because of the connection throttling rules we follow (see citus.max_shared_pool_size & citus.local_shared_pool_size). The reason for this is that we didn't even try to establish these connections.
3. For the rest of the cases where a connection failed for some reason, we always increment connection_establishment_failed even if the caller was okay with the failure and knows how to recover from it (e.g., the adaptive executor knows how to fall back to local execution when the target node is the local node and it cannot establish a connection to the local node). The reason is that even if it's likely that we can still serve the operation, we still failed to establish the connection, and we want to track this.
4. Finally, the connection failures that we count in connection_establishment_failed might be caused by any of the following reasons, and for now we prefer _not_ to further distinguish them for simplicity:
   a. the remote node is down or cannot accept any more connections, or is overloaded such that citus.node_connection_timeout is not enough to establish a connection
   b. any internal Citus error that might result in preparing a bad connection string, so that libpq fails when parsing the connection string even before actually trying to establish a connection via the connect() call
   c. a broken citus.node_conninfo or similar Citus configuration that was incorrectly set by the user, which can result in similar outcomes as in b
   d. internal WaitEventSet / poll errors or OOM in the local node

We also track two more statistics for query execution:

* query_execution_single_shard
* query_execution_multi_shard

More importantly, both query_execution_single_shard and query_execution_multi_shard are tracked not only for the top-level queries but also for the subplans etc. The reason is that for some queries, e.g., the ones that go through recursive planning, after Citus performs the heavy work as part of subplans, the work that needs to be done for the top-level query becomes quite straightforward. For such query types, it would be misleading if we only incremented the query stat counters for the top-level query. Similarly, for non-pushable INSERT .. SELECT and MERGE queries, we perform separate counter increments for the SELECT / source part of the query besides the final INSERT / MERGE query.	2025-04-28 12:23:52 +00:00
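A single-process sketch of the connection-counting rules above, using C11 atomics. It is an illustration under stated assumptions, not the design described in `stat_counters.c`: it ignores per-database keying and shared memory, and every type and function name (including the `EnableStatCounters` stand-in for `citus.enable_stat_counters`) is hypothetical. It encodes rule 2 (throttled optional connections are never counted) and rule 3 (failures are counted even when the caller can recover).

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical counter slots mirroring the names in the commit message. */
typedef enum StatCounterId
{
	STAT_CONNECTION_ESTABLISHMENT_SUCCEEDED,
	STAT_CONNECTION_ESTABLISHMENT_FAILED,
	STAT_CONNECTION_REUSED,
	STAT_QUERY_EXECUTION_SINGLE_SHARD,
	STAT_QUERY_EXECUTION_MULTI_SHARD,
	STAT_COUNTER_COUNT /* number of counters, not a counter itself */
} StatCounterId;

typedef struct StatCounters
{
	_Atomic uint64_t values[STAT_COUNTER_COUNT];
} StatCounters;

/* Hypothetical stand-in for the citus.enable_stat_counters setting. */
static bool EnableStatCounters = true;

/* Hypothetical outcomes of asking for an inter-node connection. */
typedef enum ConnectionOutcome
{
	CONNECTION_ESTABLISHED,
	CONNECTION_REUSED_FROM_CACHE,
	CONNECTION_FAILED,             /* counted even if the caller can recover */
	CONNECTION_SKIPPED_BY_THROTTLE /* optional connection that was never attempted */
} ConnectionOutcome;

static void
IncrementStatCounter(StatCounters *stats, StatCounterId id)
{
	if (!EnableStatCounters)
		return;
	atomic_fetch_add_explicit(&stats->values[id], 1, memory_order_relaxed);
}

static uint64_t
ReadStatCounter(StatCounters *stats, StatCounterId id)
{
	return atomic_load_explicit(&stats->values[id], memory_order_relaxed);
}

static void
ResetStatCounters(StatCounters *stats)
{
	for (int i = 0; i < STAT_COUNTER_COUNT; i++)
		atomic_store_explicit(&stats->values[i], 0, memory_order_relaxed);
}

static void
RecordConnectionOutcome(StatCounters *stats, ConnectionOutcome outcome)
{
	switch (outcome)
	{
		case CONNECTION_ESTABLISHED:
			IncrementStatCounter(stats, STAT_CONNECTION_ESTABLISHMENT_SUCCEEDED);
			break;
		case CONNECTION_REUSED_FROM_CACHE:
			IncrementStatCounter(stats, STAT_CONNECTION_REUSED);
			break;
		case CONNECTION_FAILED:
			IncrementStatCounter(stats, STAT_CONNECTION_ESTABLISHMENT_FAILED);
			break;
		case CONNECTION_SKIPPED_BY_THROTTLE:
			/* We never tried to connect, so this is not a failure (rule 2). */
			break;
	}
}
```

Relaxed ordering is enough here because each counter is an independent tally; nothing else is synchronized through it.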
Colm	ec141f696a	Enhance MERGE .. WHEN NOT MATCHED BY SOURCE for repartitioned source (#7900)
DESCRIPTION: Ensure that a MERGE command on a distributed table with a `WHEN NOT MATCHED BY SOURCE` clause runs against all shards of the distributed table.

The Postgres MERGE command updates a table using a table or a query as a data source. It provides three ways to match the target table with the source: `WHEN MATCHED` means that there is a row in both the target and source; `WHEN NOT MATCHED` means that there is a row in the source that has no match (is not present) in the target; and, as of PG17, `WHEN NOT MATCHED BY SOURCE` means that there is a row in the target that has no match in the source.

In Citus, when a MERGE command updates a distributed table using a local/reference table or a distributed query as source, that source is repartitioned, and for each repartitioned shard that has data (i.e. one or more rows) the MERGE is run against the corresponding distributed table shard. Suppose the distributed table has 32 shards, and the source repartitions into 4 shards that have data, with the remaining 28 shards being empty; then the MERGE command is performed on the 4 corresponding shards of the distributed table. However, the semantics of `WHEN NOT MATCHED BY SOURCE` are that the specified action must be performed on the target for each row in the target that is not in the source; so if the source is empty, all target rows should be updated. To see this, consider the following MERGE command:

```
MERGE INTO target AS t
USING source AS s ON t.id = s.id
WHEN NOT MATCHED BY SOURCE THEN
    UPDATE SET col1 = 100
```

If the source has zero rows, then every row in the target is updated so that its col1 value is 100. Currently in Citus, a MERGE on a distributed table with a local/reference table or a distributed query as source ignores shards of the distributed table when the corresponding shard of the repartitioned source has zero rows. However, if the MERGE command specifies a `WHEN NOT MATCHED BY SOURCE` clause, then the MERGE should be performed on all shards of the distributed table, to ensure that the specified action is performed on the target for each row in the target that is not in the source.

This PR enhances Citus MERGE execution so that when a repartitioned source shard has zero rows, and the MERGE command specifies a `WHEN NOT MATCHED BY SOURCE` clause, the MERGE is performed against the corresponding shard of the distributed table using an empty (zero-row) relation as source, by generating a query of the form:

```
MERGE INTO target_shard_0002 AS t
USING (SELECT id FROM (VALUES (NULL)) source_0002(id) WHERE FALSE) AS s
ON t.id = s.id
WHEN NOT MATCHED BY SOURCE THEN
    UPDATE SET col1 = 100
```

This works because each row in the target shard will be updated, and `WHEN MATCHED` and `WHEN NOT MATCHED`, if specified, will be no-ops because the source has zero rows.

Implementing this when the source is a local or reference table involves teaching function `ExecuteSourceAtCoordAndRedistribution()` in `merge_executor.c` not to prune tasks when the query has `WHEN NOT MATCHED BY SOURCE`, but instead to replace the task's query with one that uses an empty relation as source. When the source is a distributed query, function `ExecuteMergeSourcePlanIntoColocatedIntermediateResults()` (also in `merge_executor.c`), instead of skipping empty tasks, now generates a query that uses an empty relation as source for the corresponding target shard of the distributed table, but again only when the query has `WHEN NOT MATCHED BY SOURCE`. A new function `BuildEmptyResultQuery()` is added to `recursive_planning.c`, and it is used by both of the aforementioned functions in `merge_executor.c` to build an empty relation to use as the source. It applies the appropriate type to each column of the empty relation so that the join condition with the target type-checks.	2025-03-12 12:43:01 +03:00
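The per-shard decision and the shape of the zero-row source can be sketched as follows. All names here (`DecideMergeShardTask()`, `BuildEmptySourceSql()`, the enum) are hypothetical; the commit's `BuildEmptyResultQuery()` works on query trees, not strings. One deliberate simplification: this sketch emits an untyped `NULL`, which PostgreSQL would resolve to `text`, whereas the commit stresses that the real helper gives each column the appropriate type so the join with the target is valid.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

typedef enum MergeShardTaskAction
{
	MERGE_TASK_SKIP,             /* prune the task: no MERGE action can fire */
	MERGE_TASK_USE_SOURCE_SHARD, /* normal case: the source shard has rows */
	MERGE_TASK_USE_EMPTY_SOURCE  /* zero source rows, but NOT MATCHED BY SOURCE is present */
} MergeShardTaskAction;

static MergeShardTaskAction
DecideMergeShardTask(long sourceShardRowCount, bool hasNotMatchedBySource)
{
	if (sourceShardRowCount > 0)
		return MERGE_TASK_USE_SOURCE_SHARD;

	/* With an empty source, only WHEN NOT MATCHED BY SOURCE can touch target rows. */
	return hasNotMatchedBySource ? MERGE_TASK_USE_EMPTY_SOURCE : MERGE_TASK_SKIP;
}

/*
 * Writes a zero-row relation such as
 *   (SELECT id FROM (VALUES (NULL)) source_0002(id) WHERE FALSE)
 * into buf. Returns false if an argument is empty or the text does not fit.
 */
static bool
BuildEmptySourceSql(char *buf, size_t buflen, const char *alias, const char *column)
{
	if (strlen(alias) == 0 || strlen(column) == 0)
		return false;

	int written = snprintf(buf, buflen,
						   "(SELECT %s FROM (VALUES (NULL)) %s(%s) WHERE FALSE)",
						   column, alias, column);
	return written >= 0 && (size_t) written < buflen;
}
```

The `WHERE FALSE` filter is what guarantees zero rows while still giving the relation a column list the `ON t.id = s.id` condition can reference.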
Naisila Puka	3b1c082791	Drops PG14 support (#7753)
DESCRIPTION: Drops PG14 support

1. Remove "$version_num" != 'xx' from configure file
2. Delete all PG_VERSION_NUM = PG_VERSION_XX references in the code
3. Look at pg_version_compat.h file, remove all _compat functions etc. defined specifically for PGXX differences
4. Delete all PG_VERSION_NUM >= PG_VERSION_(XX+1), PG_VERSION_NUM < PG_VERSION_(XX+1) ifs in the codebase
5. Delete ruleutils_xx.c file
6. Clean up normalize.sed file from pg14-specific lines
7. Delete all alternative output files for that particular PG version; the server_version_ge variable helps here	2025-03-12 12:43:01 +03:00
Naisila Puka	dce54db494	PG17 compatibility: Resolve compilation issues (#7699)
This PR provides successful compilation against PG17.0.

- Remove ExecFreeExprContext call
  Relevant PG commit: d060e921ea5aa47b6265174c32e1128cebdbc3df
- PG17 uses streaming IO in analyze, fix scan_analyze_next_block function
  Relevant PG commit: 041b96802efa33d2bc9456f2ad946976b92b5ae1
- Define ObjectClass for PG17+ only since it's removed
  Relevant PG commit: 89e5ef7e21812916c9cf9fcf56e45f0f74034656
- Remove ReorderBufferTupleBuf structure
  Relevant PG commit: 08e6344fd6423210b339e92c069bb979ba4e7cd6
- Define colliculocale and daticulocale since they have been renamed
  Relevant PG commit: f696c0cd5f299f1b51e214efc55a22a782cc175d
- makeStringConst defined in PG17
  Relevant PG commit: de3600452b61d1bc3967e9e37e86db8956c8f577
- RangeVarCallbackOwnsTable was replaced by RangeVarCallbackMaintainsTable
  Relevant PG commit: ecb0fd33720fab91df1207e85704f382f55e1eb7
- attstattarget is nullable, define pg-compatible functions for it
  Relevant PG commit: 4f622503d6de975ac87448aea5cea7de4bc140d5
- stxstattarget is nullable in PG17, write compat functions for it
  Relevant PG commit: 012460ee93c304fbc7220e5b55d9d0577fc766ab
- Use ResourceOwner to track WaitEventSet in PG17
  Relevant PG commit: 50c67c2019ab9ade8aa8768bfe604cd802fe8591
- getIdentitySequence now uses Relation instead of relation_id
  Relevant PG commit: 509199587df73f06eda898ae13284292f4ae573a
- Remove no-op tuplestore_donestoring function
  Relevant PG commit: 75680c3d805e2323cd437ac567f0677fdfc7b680
- MergeAction can have 3 merge kinds (now an enum) in PG17, write compat
  Relevant PG commit: 0294df2f1f842dfb0eed79007b21016f486a3c6c
- EXPLAIN (MEMORY) is added, make changes to ExplainOnePlan
  Relevant PG commit: 5de890e3610d5a12cdaea36413d967cf5c544e20
- LIMIT_OPTION_DEFAULT has been removed as it's useless, use LIMIT_OPTION_COUNT
  Relevant PG commit: a6be0600ac3b71dda8277ab0fcbe59ee101ac1ce
- Write compat for create_foreignscan_path because of more arguments in PG17
  Relevant PG commit: 9e9931d2bf40e2fea447d779c2e133c2c1256ef3
- pgprocno and lxid have been combined into a struct in PGPROC
  Relevant PG commits: 28f3915b73f75bd1b50ba070f56b34241fe53fd1, ab355e3a88de745607f6dd4c21f0119b5c68f2ad, 024c521117579a6d356050ad3d78fdc95e44eefa
- Simplify CitusNewNode (#7434)
  Postgres refactored newNode() in PG 17; the main reason for doing this is that the original tricks are no longer necessary for modern compilers [1]. This does the same for Citus. This should have no backward compatibility issues since it just replaces palloc0fast with palloc0. This is good for forward compatibility since palloc0fast no longer exists in PG 17.
  [1] https://www.postgresql.org/message-id/b51f1fa7-7e6a-4ecc-936d-90a8a1659e7c@iki.fi
  (cherry picked from commit `4b295cc`)	2025-03-12 11:01:49 +03:00
Naisila Puka	6bd3474804	Rename foreach_ macros to foreach_declared_ macros (#7700 ) This is prep work for successful compilation with PG17. PG17 added foreach_ptr, foreach_int and foreach_oid macros. Relevant PG commit: 14dd0f27d7cd56ffae9ecdbe324965073d01a9ff We already have these macros, but they differ from the PG17 ones: our macros take a DECLARED variable, whereas the PG17 macros declare a locally-scoped loop variable themselves. Hence I am renaming our macros to foreach_declared_. I am separating this into its own PR since it touches many files. The main compilation PR is https://github.com/citusdata/citus/pull/7699	2025-03-12 11:01:49 +03:00
Karina	9ff8436f14	Create directories and files with pg_file_create_mode and pg_dir_create_mode permissions (#7479 ) Since Postgres commit da9b580d, files and directories are supposed to be created with pg_file_create_mode and pg_dir_create_mode permissions when default permissions are expected. This fixes a failure in one of the postgres tests: if we create a file add.conf containing `shared_preload_libraries='citus'` and run the postgres tests with `TEMP_CONFIG=/path/to/add.conf make installcheck -C src/bin/pg_ctl/`, then 001_start_stop.pl fails with `.../data/base/pgsql_job_cache mode must be 0750` in the log. In passing, this also stops creating directories that we haven't used since Citus 7.4. This change explicitly doesn't change the permissions of certificates/keys that we create. --------- Co-authored-by: Karina Litskevich <litskevichkarina@gmail.com>	2024-02-07 12:48:31 +01:00
Gürkan İndibay	863713e9b7	Refactors ExtendedTaskList methods (#7372 ) ExecuteTaskListIntoTupleDestWithParam and ExecuteTaskListIntoTupleDest are nearly identical, so I parameterized them into a reusable structure. --------- Co-authored-by: Onur Tirtir <onurcantirtir@gmail.com>	2024-01-24 06:00:19 +00:00
eaydingol	ee11492a0e	Generate qualified relation name (#7427 ) This change refactors the code to use generate_qualified_relation_name, which builds the name from the relation id, instead of a sequence of function calls. Fixes #6602	2024-01-22 17:32:49 +03:00
zhjwpku	51e607878b	remove a duplicate forward declaration and polish some comments (#7371 ) Signed-off-by: Zhao Junwang <zhjwpku@gmail.com>	2024-01-17 14:30:23 +00:00
Karina	20dc58cf5d	Fix getting heap tuple size (#7387 ) This fixes #7230. First of all, using HeapTupleHeaderGetDatumLength(heapTuple) is definitely wrong: it gives a number that is 4 times less than the correct tuple size (heapTuple.t_len). See https://github.com/postgres/postgres/blob/REL_16_0/src/include/access/htup_details.h#L455-L456 https://github.com/postgres/postgres/blob/REL_16_0/src/include/varatt.h#L279 https://github.com/postgres/postgres/blob/REL_16_0/src/include/varatt.h#L225-L226 When I fixed it, the limit_intermediate_size test failed, so I tried to understand what was going on there. In the original commit `fd546cf` these queries were supposed to fail. Then in `b3af63c` three of the queries that were supposed to fail suddenly worked, and the tests were changed to pass without understanding why the output had changed or how to keep the test testing what it was meant to test. Even the comments saying that these queries should fail were left untouched. The commit message gives no clue about why exactly the test changed: > It seems that when we use adaptive executor instead of task tracker, we > exceed the intermediate result size less in the test. Therefore updated > the tests accordingly. Then `3fda2c3` also blindly raised the limit for one of the queries to keep it working: `3fda2c3254` (diff-a9b7b617f9dfd345318cb8987d5897143ca1b723c87b81049bbadd94dcc86570R19). When the HeapTupleHeaderGetDatumLength(heapTuple) call was finally added in `fe3caf3`, one of those test queries started failing again. The other two now also fail after the fix. I don't understand exactly how the calculation of the "intermediate result size" limited by citus.max_intermediate_result_size changed through `b3af63c` and `fe3caf3`, but these numbers are now closer to what they originally were when this limit was added in `fd546cf`. So these queries should fail, as in the original version of the limit_intermediate_size test. 
Co-authored-by: Karina Litskevich <litskevichkarina@gmail.com>	2024-01-08 17:09:30 +01:00
Nils Dijk	0620c8f9a6	Sort includes (#7326 ) This change adds a script to programmatically group all includes in a specific order. The script was used as a one-time invocation to group and sort all includes throughout our formatted code. The grouping is as follows: - System includes (e.g. `#include <...>`) - Postgres.h (e.g. `#include "postgres.h"`) - Toplevel imports from postgres, not contained in a directory (e.g. `#include "miscadmin.h"`) - General postgres includes (e.g. `#include "nodes/..."`) - Toplevel citus includes, not contained in a directory (e.g. `#include "citus_version.h"`) - Columnar includes (e.g. `#include "columnar/..."`) - Distributed includes (e.g. `#include "distributed/..."`) Because it is quite hard to tell toplevel citus includes apart from toplevel postgres includes, the script hardcodes the list of toplevel citus includes. In the same manner, it treats anything not prefixed with `columnar/` or `distributed/` as a postgres include. The sorting/grouping is enforced by CI. Since we do so with our own script, no changes are required in our uncrustify configuration.	2023-11-23 18:19:54 +01:00
Nils Dijk	0dac63afc0	move pg_version_constants.h to toplevel include (#7335 ) In preparation for sorting and grouping all includes, we wanted to move this file to the toplevel includes for proper grouping/sorting.	2023-11-09 15:09:39 +00:00
cvbhjkl	e535f53ce5	Fix typo in local_executor.c (#7324 ) Fix a typo 'remaning' -> 'remaining' in local_executor.c	2023-11-03 12:14:11 +00:00
Gürkan İndibay	71a4633dad	Fixes typo and renames multi_process_utility (#7259 )	2023-10-17 16:39:37 +03:00
zhjwpku	5034f8eba5	polish the codebase by fixing dozens of typos (#7166 )	2023-09-01 12:21:53 +02:00
Önder Kalacı	4ae3982d14	Add single-shard router Merge command support (#7088 ) Similar to https://github.com/citusdata/citus/pull/7077. As PG 16+ has changed the join restriction information for certain outer joins, MERGE is also impacted, given that it is also planned as an outer join under the hood. See #7077 for the details.	2023-08-04 08:16:29 +03:00
Önder Kalacı	960a5f6104	Improve failure handling of distributed execution (#7090 ) Prior to this commit, the code would skip processing errors that happened during local commands. Before https://github.com/citusdata/citus/pull/5379, it might have made sense to let the execution continue. But as of today, if a modification fails on any placement, we can safely fail the execution. The first commit shows the problem in action. The second commit includes the fix and the test fixes.	2023-08-01 16:47:59 +03:00
Naisila Puka	69af3e8509	Drop PG13 Support Phase 2 - Remove PG13 specific paths/tests (#7007 ) This commit is the second and last phase of dropping PG13 support. It consists of the following: - Removes all PG_VERSION_13 & PG_VERSION_14 from codepaths - Removes pg_version_compat entries and columnar_version_compat entries specific to PG13 - Removes alternative pg13 test outputs - Removes PG13 normalize lines and fixes the test outputs based on that It is a continuation of `5bf163a27d`	2023-06-21 14:18:23 +03:00
aykut-bozkurt	f667f14029	Rewind tuple store to fix scrollable with hold cursor fetches (#7014 ) We need to rewind the tuplestorestate's tuple index to get correct results when fetching from scrollable WITH HOLD cursors. `PersistHoldablePortal` is responsible for persisting the tuplestorestate inside a WITH HOLD cursor before committing a transaction. It rewinds the cursor as follows (`ExecutorRewind` calls `rescan`): `if (portal->cursorOptions & CURSOR_OPT_SCROLL) { ExecutorRewind(queryDesc); }` At the end, it adjusts the tuple index for the portal's holdStore accordingly: `if (portal->cursorOptions & CURSOR_OPT_SCROLL) { if (!tuplestore_skiptuples(portal->holdStore, portal->portalPos, true)) elog(ERROR, "unexpected end of tuple stream"); }` DESCRIPTION: Fixes incorrect results on fetching scrollable with hold cursors. Fixes https://github.com/citusdata/citus/issues/7010	2023-06-19 23:00:18 +03:00
Teja Mupparti	58da8771aa	This pull request introduces support for nonroutable merge commands in the following scenarios: 1) For distributed tables that are not colocated. 2) When joining on a non-distribution column for colocated tables. 3) When merging into a distributed table using reference or citus-local tables as the data source. This is accomplished primarily through the implementation of the following two strategies. Repartition: Plan the source query independently, execute the results into intermediate files, and repartition the files to co-locate them with the merge-target table. Subsequently, compile a final merge query on the target table using the intermediate results as the data source. Pull-to-coordinator: Execute the plan that requires evaluation at the coordinator, run the query on the coordinator, and redistribute the resulting rows to ensure colocation with the target shards. Direct the MERGE SQL operation to the worker nodes' target shards, using the intermediate files colocated with the data as the data source.	2023-06-19 12:23:40 -07:00
Naisila Puka	ba40eb363c	Fix some gucs' initial and boot values, and flag combinations (#6957 ) PG16beta1 added some sanity checks for GUCs; see the relevant PG commits below: 1- Add check on initial and boot values when loading GUCs `a73952b795` 2- Extend check_GUC_init() with checks on flag combinations when loading GUCs `009f8d1714` I fixed our currently problematic GUCs. We can merge this directly into main, as these fixes make sense for any PG version. There was a particular NodeConninfo issue: previously we relied on the fact that the NodeConninfo initial value is an empty string. However, with PG16 enforcing the same initial and boot values, we can't use an empty initial value for NodeConninfo anymore. Therefore we add a new flag to indicate whether we are at the boot check.	2023-06-14 11:55:52 +03:00
Gokhan Gulbiz	e0ccd155ab	Make citus_stat_tenants work with schema-based tenants. (#6936 ) DESCRIPTION: Enabling citus_stat_tenants to support schema-based tenants. This pull request modifies the existing logic to enable tenant monitoring with schema-based tenants. The changes made are as follows: - If a query has a partitionKeyValue (which serves as a tenant key/identifier for distributed tables), Citus annotates the query with both the partitionKeyValue and colocationId. This allows for accurate tracking of the query. - If a query does not have a partitionKeyValue, but its colocationId belongs to a distributed schema, Citus annotates the query with only the colocationId. The tenant monitor can then easily look up the schema to determine if it's a distributed schema and make a decision on whether to track the query. --------- Co-authored-by: Jelte Fennema <jelte.fennema@microsoft.com>	2023-06-13 14:11:45 +03:00
Teja Mupparti	f6a516dab5	Refactor repartitioning code into generic format	2023-06-05 09:06:05 -07:00
Teja Mupparti	ff2062e8c3	Rename insert-select redistribute code base to generic purpose	2023-06-01 09:43:43 -07:00
Emel Şimşek	02f815ce1f	Disable local execution when Explain Analyze is requested for a query. (#6892 ) DESCRIPTION: Fixes a crash when explain analyze is requested for a query that is normally locally executed. When explain analyze is requested for a query, a task with two queries is created. Those two queries are 1. Wrapped Query --> `SELECT ... FROM worker_save_query_explain_analyze(<query>, <explain analyze options>)` 2. Fetch Query --> `SELECT explain_analyze_output, execution_duration FROM worker_last_saved_explain_analyze();` When the query is executed locally, a task with multiple queries causes a crash in production. See the `Assert(task->queryCount == 1);` in `src/backend/distributed/executor/tuple_destination.c` at `57455dc64d`. This becomes a critical issue when the auto_explain extension is used. When the auto_explain extension is enabled, explain analyze is automatically requested for every query. One possible solution could be to not create two queries for a locally executed query. The fetch part may not have to be a query, since the values are available in local variables. Until we enable local execution for explain analyze, it is best to disable local execution. Fixes #6777.	2023-05-23 14:33:22 +03:00
Onur Tirtir	56d217b108	Mark objects as distributed even when pg_dist_node is empty (#6900 ) We mark objects as distributed objects in Citus metadata only if we need to propagate the command that creates them to worker nodes. For this reason, we were not doing this for objects that are created while pg_dist_node is empty. One implication of this is that we defer schema propagation until the user creates the first distributed table in the schema. However, this doesn't help for schema-based sharding (#6866) because we want to sync pg_dist_tenant_schema to the worker nodes even for empty schemas. * Support test dependencies for isolation tests without a schedule * Comment out a test due to a known issue (#6901) * Also, reduce the verbosity for some log messages and make some tests compatible with run_test.py.	2023-05-16 11:45:42 +03:00
Gokhan Gulbiz	8782ea1582	Ensure partitionKeyValue and colocationId are set for proper tenant stats gathering (#6834 ) This PR updates the tenant stats implementation to set partitionKeyValue and colocationId in ExecuteLocalTaskListExtended, in addition to LocallyExecuteTaskPlan. This ensures that tenant stats can be properly gathered regardless of the code path taken. The changes were initially made while testing stored procedure calls for tenant stats.	2023-04-17 09:35:26 +03:00
Halil Ozan Akgül	52ad2d08c7	Multi tenant monitoring (#6725 ) DESCRIPTION: Adds views that monitor statistics on tenant usages This PR adds a `citus_stats_tenants` view that monitors the tenants on the cluster. `citus_stats_tenants` shows the node id, colocation id, tenant attribute, read count in this period and last period, and query count in this period and last period of the tenant. The tenant attribute is currently the tenant's distribution column value; later, when schema-based sharding is introduced, this meaning might change. A period is a time bucket the queries are counted by. Read and query counts for this period can increase until the current period ends. After that, those counts are moved to the last period's counts, which cannot change. The period length can be set using 'citus.stats_tenants_period'. `SELECT` queries are counted as _read_ queries; `INSERT`, `UPDATE` and `DELETE` queries are counted as _write_ queries. So in the view, read counts are `SELECT` counts and query counts are `SELECT`, `INSERT`, `UPDATE` and `DELETE` counts. The data is stored in shared memory, in a struct named `MultiTenantMonitor`. `citus_stats_tenants` shows the data from local tenants, up to `citus.stats_tenant_limit` tenants. The tenants are scored based on the number of queries they run and the recency of those queries. Every query run increases the tenant's score by `ONE_QUERY_SCORE`, and after every period ends the scores are halved. Halving is done lazily. To retain information longer, the monitor keeps up to 3 times `citus.stats_tenant_limit` tenants. When the tenant count hits `3 * citus.stats_tenant_limit`, the last `citus.stats_tenant_limit` tenants are removed. To see all stored tenants you can use `citus_stats_tenants(return_all_tenants := true)` - [x] Create collector view that gets data from all nodes. #6761 - [x] Add monitoring log #6762 - [x] Create enable/disable GUC #6769 - [x] Parse the annotation string correctly #6796 - [x] Add local queries and prepared statements #6797 - [x] Rename to citus_stat_statements #6821 - [x] Run pgbench - [x] Fix role permissions #6812 --------- Co-authored-by: Gokhan Gulbiz <ggulbiz@gmail.com> Co-authored-by: Jelte Fennema <github-tech@jeltef.nl>	2023-04-05 17:44:17 +03:00
Marco Slot	343d1c5072	Refactor executor utility functions into multiple files (#6593 ) Co-authored-by: Marco Slot <marco.slot@gmail.com>	2023-03-31 13:07:48 +02:00
Marco Slot	b09d239809	Propagate CREATE PUBLICATION statements	2023-03-29 00:59:12 +02:00
rajeshkt78	85b8a2c7a1	CDC implementation for Citus using Logical Replication (#6623 ) Description: Implements CDC using Logical Replication. To avoid re-publishing events multiple times, Citus sets up a replication origin session, which adds "DoNotReplicateId" to every WAL entry for the following operations: - shard splits - shard moves - create distributed table - undistribute table - alter distributed tables (for some cases) - reference table operations The citus decoder, which decodes WAL events for CDC clients, ignores any WAL entry whose replication origin is not zero. It also maps shard names to distributed table names.	2023-03-28 16:00:21 +05:30
aykut-bozkurt	9e69dd0e7f	fix single tuple result memory leak (#6724 ) We should not forget to free the PGresult when we receive a single tuple result from an internal backend. Single tuple results are normally freed by our ReceiveResults for the `tupleDescriptor != NULL` flow, but not for those with `tupleDescriptor == NULL`. See PR #6722 for details. DESCRIPTION: Fixes a memory leak with query results that return a single row.	2023-02-17 14:15:09 +03:00
Jelte Fennema	81dcddd1ef	Actually skip constraint validation on shards after shard move (#6640 ) DESCRIPTION: Fix foreign key validation skip at the end of shard move In `eadc88a` we started completely skipping foreign key constraint validation at the end of a non-blocking shard move, instead of only for foreign keys to reference tables. However, it turns out that this didn't work at all because of a hard-to-notice bug: by resetting the SkipConstraintValidation flag at the end of our utility hook, we actually make the SET command that sets it a no-op. This fixes that bug by removing the code that resets it. This is fine because #6543 removed the only place where we set the flag in C code, so resetting the flag no longer serves any purpose. This PR also adds a regression test, because it turned out we didn't have any; otherwise we would have caught that the feature was completely broken. It also moves the constraint validation skipping to the utility hook. The reason is that #6550 showed us that this is the better place to skip it, because it will also skip the planning phase and not just the execution.	2023-01-27 13:08:05 +01:00
Onder Kalaci	feb5534c65	Do not create additional WaitEventSet for RemoteSocketClosed checks Before this commit, we created an additional WaitEventSet per connection for checking whether the remote socket is closed, only once at the start of the execution. However, for certain workloads, such as pgbench select-only workloads, the creation/deletion of the additional WaitEventSet adds ~7% CPU overhead, which is also reflected in the benchmark results. With this commit, we use the same WaitEventSet for the purpose of checking the remote socket at the start of the execution. We use the "rebuildWaitEventSet" flag so that the executor can reuse the existing WaitEventSet. As a result, we see the following improvements on PG 15: main : 120051 tps, 0.532 ms latency avg. avoid_wes_rebuild: 127119 tps, 0.503 ms latency avg. And, on PG 14, as expected, there is no difference: main : 129191 tps, 0.495 ms latency avg. avoid_wes_rebuild: 129480 tps, 0.494 ms latency avg. But note that PG 15 is slightly (~1.5%) slower than PG 14. That is probably the overhead of checking the remote socket.	2022-12-14 22:42:55 +01:00
Onder Kalaci	d52da55ac0	Move WaitEvent to DistributedExecution Prep. for caching WaitEventSet/WaitEvents	2022-12-14 21:59:19 +01:00
Marco Slot	666696c01c	Deprecate citus.replicate_reference_tables_on_activate, make it always off (#6474 ) Co-authored-by: Marco Slot <marco.slot@gmail.com>	2022-11-04 16:21:10 +01:00
Önder Kalacı	8b624b5c9d	Detect remotely closed sockets and add a single connection retry in the executor (#6404 ) PostgreSQL 15 exposes WL_SOCKET_CLOSED in the WaitEventSet API, which is useful for detecting closed remote sockets. In this patch, we use this new event to detect closed remote sockets in the executor. When a closed socket is detected, the executor can now retry the connection establishment. Note that the executor can retry connection establishment only for connections that have not been used. Basically, this patch is mostly useful for preventing the executor from failing if a cached connection is closed because of a worker node restart (or worker failover). In other words, the executor cannot retry connection establishment if we are in a distributed transaction AND any command has been sent over the connection. That requires more sophisticated retry mechanisms. For now, fixing the above use case is enough. Fixes #5538 Earlier discussions: #5908, #6259 and #6283 ### Summary of the current approach with regard to earlier trials As noted, we explored some alternatives before getting into this. https://github.com/citusdata/citus/pull/6283 is simple, but lacks an important property: we should be checking for `WL_SOCKET_CLOSED` _before_ sending anything over the wire. Otherwise, it becomes very tricky to understand which connection is actually safe to retry. For example, in the current patch, we can safely check `transaction->transactionState == REMOTE_TRANS_NOT_STARTED` before restarting a connection. #6259 does what we intend here (e.g., check for sending any command). However, as @marcocitus noted, it is very tricky to handle `WaitEventSets` in multiple places. And the executor is designed to react to events, so adding anything `pre-executor` seemed too ugly. In the end, I converged on this patch. This patch relies on the simplicity of #6283 and does only very limited handling of `WaitEventSets`, just for our purpose. 
Just before we add any connection to the execution, we check if the remote session has already closed. With that, we briefly have multiple rounds of wait event processing, each serving a different purpose. The new wait event processing we added does not even consider cancellations; we let the main event processing loop handle them. Co-authored-by: Marco Slot <marco.slot@gmail.com>	2022-10-14 15:08:49 +02:00
Ahmet Gedemenli	eadc88a800	Introduce GUC citus.skip_constraint_validation (#6281 ) Introduces a new GUC named citus.skip_constraint_validation, which skips constraint validation when set to on. In the several places where we used hacks to skip the foreign key validation phase, we now use this GUC.	2022-09-08 18:13:18 +03:00
Gokhan Gulbiz	ac96370ddf	Use IsMultiStatementTransaction for SELECT .. FOR UPDATE queries (#6288 ) * Use IsMultiStatementTransaction instead of IsTransaction for row-locking operations. * Add regression test for SELECT..FOR UPDATE statement	2022-09-06 16:38:41 +02:00
aykut-bozkurt	69726648ab	verify shards if exists for insert, delete, update (#6280 ) Co-authored-by: Marco Slot <marco.slot@gmail.com>	2022-09-06 15:29:14 +02:00
Marco Slot	639588bee0	Remove unused functions (#6220 ) Co-authored-by: Marco Slot <marco.slot@gmail.com>	2022-08-22 11:53:25 +03:00
Jelte Fennema	3f6ce889eb	Use CreateSimpleHash (and variants) whenever possible (#6177 ) This is a refactoring PR that starts using our new hash table creation helper function. It adds a few more macros for ease of use, because C doesn't have default arguments. It also adds a macro to check if a struct contains automatic padding bytes. No struct that is hashed using tag_hash should have automatic padding bytes, because those bytes are undefined and thus using them to create a hash will result in undefined behaviour (usually a random hash).	2022-08-17 13:01:59 +03:00
Onder Kalaci	bdaeb40b51	Add missing relation access record for local utility command While testing `5670dffd33`, I realized that we have a missing RecordNonDistTableAccessesForTask() for local utility commands. Although we don't have to record the relation access for local-only cases, we really want to keep the behaviour the same for scale-out and for a single node in all aspects. We wouldn't want a complex transaction to work on a single machine but not on a multi-node cluster. Hence, we apply the same restrictions. For example, on a distributed cluster the following fails, and after this commit it fails locally as well: `CREATE TABLE ref(a int primary key); INSERT INTO ref VALUES (1); CREATE TABLE dist(a int REFERENCES ref(a)); SELECT create_reference_table('ref'); SELECT create_distributed_table('dist', 'a'); BEGIN; SELECT * FROM dist; TRUNCATE ref CASCADE;` raises `ERROR: cannot execute DDL on table "ref" because there was a parallel SELECT access to distributed table "dist" in the same transaction` with `HINT: Try re-running the transaction with "SET LOCAL citus.multi_shard_modify_mode TO 'sequential';"` before the final `COMMIT;`. We also add a comprehensive test suite and run the same tests locally.	2022-07-29 11:36:33 +02:00
Onder Kalaci	149771792b	Remove useless version compats most likely leftover from earlier versions	2022-07-29 10:31:55 +02:00
Marco Slot	cff013a057	Fix issues with insert..select casts and column ordering	2022-07-28 13:23:57 +02:00

710 Commits (6b9962c0c03da7b5ee24945c9c1ef24b01f52d7e)