citus

Commit Graph

Author	SHA1	Message	Date
Mehmet Yilmaz	b1a38e0bca	Refactor CTE level shifting logic in insert_select_planner.c to improve clarity and functionality	2025-04-18 09:40:42 +00:00
Mehmet Yilmaz	d7a78ed17c	Refactor CTE level handling in insert_select_planner.c to shift every level (1→0, 2→1, 3→2, etc.) all in one pass	2025-04-18 09:40:42 +00:00
Mehmet Yilmaz	a7ca49c4bd	Remove unused variable isWrapped in PrepareInsertSelectForCitusPlanner function	2025-04-18 09:40:42 +00:00
Mehmet Yilmaz	ac0084eb92	Refactor comments and formatting in insert_select_planner.c for clarity and consistency	2025-04-18 09:40:42 +00:00
Mehmet Yilmaz	93ab49702d	Add regression tests for INSERT ... SELECT with CTE for schema-based and distributed tables	2025-04-18 09:40:42 +00:00
Cédric Villemain	a7e686c106	Make sure to prevent INSERT INTO ... SELECT queries involving subfield or sublink (#7912 ) DESCRIPTION: Makes sure to prevent `INSERT INTO ... SELECT` queries involving subfield or sublink, to avoid crashes The following query was crashing the backend: ``` INSERT INTO field_indirection_test_1 ( int_col, ct1_col.int_1,ct1_col.int_2 ) SELECT 0, 1, 2; -- crash ``` En passant, added more tests with sublink in distributed_types and found another query with wrong behavior: ``` INSERT INTO domain_indirection_test (f1,f3.if1) SELECT 0, 1; ERROR: could not find a conversion path from type 23 to 17619 -- not the expected ERROR ``` Fixed them by using `strip_implicit_coercions()` on target entry expression before checking for the presence of a subscript or fieldstore, else we fail to find the existing ones and wrongly accept to execute unsafe query.	2025-03-27 09:39:43 +00:00
Mehmet YILMAZ	e50563fbd8	Issue 7887 Enhance AddInsertSelectCasts for Identity Columns (#7920 ) ## Enhance `AddInsertSelectCasts` for Identity Columns This PR fixes #7887 and improves the behavior of partial inserts into identity columns by modifying the `AddInsertSelectCasts` function. Specifically, we introduce special-case handling for `nextval(...)` calls (represented in the parse tree as `NextValueExpr`) to ensure that if the identity column’s declared type differs from `nextval`’s default return type (`int8`), we cast the expression properly. This prevents mismatches like `int8` → `int4` from causing “invalid string enlargement” errors or other type-related failures. When `INSERT ... SELECT` is processed, `AddInsertSelectCasts` reconciles each target column’s type with the corresponding SELECT expression’s type. Historically, for identity columns that rely on `nextval(...)`, we can end up with a mismatch: - `nextval` returns `int8`, - The identity column might be `int4`, `bigint`, or another integer type. Without a correct cast, Postgres or Citus can produce plan-time or runtime errors. By detecting `NextValueExpr` and applying a cast to the column’s type, the final plan ensures consistent insertion without errors. ## What Changed 1. Check for `NextValueExpr`: In `AddInsertSelectCasts`, we now have a code block: ```c if (IsA(selectEntry->expr, NextValueExpr)) { Oid nextvalType = GetNextvalReturnTypeCatalog(); ... // If (targetType != nextvalType), build a cast from int8 -> targetType } else { // fallback to generic mismatch logic } ``` This short-circuits any expression that’s a `nextval(...)` call, letting us explicitly cast to the correct type. 2. Fallback Generic Logic: If it isn’t a `NextValueExpr` (i.e. a normal column or expression mismatch), we still rely on the existing path that compares `sourceType` vs. `targetType` and calls `CastExpr(...)` if they differ. 3. `GetNextvalReturnTypeCatalog`: We added or refined a helper function to confirm that `nextval` returns `int8`, or do a `LookupFuncName("nextval", ...)` to discover the function’s return type from `pg_proc`—making it robust if future changes happen. ## Benefits - Partial inserts into identity columns no longer fail with type mismatches. - When `nextval` yields `int8` but the identity column is `int4` (or another type), we properly cast to the column’s type in the plan. - Preserves the existing approach for other columns—only identity calls get the specialized `NextValueExpr` logic. ## Testing - Extended `generatedidentity.sql` test scenario to cover partial inserts into both `GENERATED ALWAYS` and `GENERATED BY DEFAULT` identity columns, including tests for the `OVERRIDING SYSTEM VALUE` clause and partial inserts referencing foreign-key columns.	2025-03-12 12:43:01 +03:00
Naisila Puka	6bd3474804	Rename foreach_ macros to foreach_declared_ macros (#7700 ) This is prep work for successful compilation with PG17 PG17added foreach_ptr, foreach_int and foreach_oid macros Relevant PG commit 14dd0f27d7cd56ffae9ecdbe324965073d01a9ff `14dd0f27d7` We already have these macros, but they are different with the PG17 ones because our macros take a DECLARED variable, whereas the PG16 macros declare a locally-scoped loop variable themselves. Hence I am renaming our macros to foreach_declared_ I am separating this into its own PR since it touches many files. The main compilation PR is https://github.com/citusdata/citus/pull/7699	2025-03-12 11:01:49 +03:00
Pavel Seleznev	fe6d198ab2	Remove warnings on some builds (#7680 ) Co-authored-by: Pavel Seleznev <PNSeleznev@sberbank.ru>	2024-12-03 17:10:36 +03:00
Nils Dijk	0620c8f9a6	Sort includes (#7326 ) This change adds a script to programatically group all includes in a specific order. The script was used as a one time invocation to group and sort all includes throught our formatted code. The grouping is as follows: - System includes (eg. `#include<...>`) - Postgres.h (eg. `#include "postgres.h"`) - Toplevel imports from postgres, not contained in a directory (eg. `#include "miscadmin.h"`) - General postgres includes (eg . `#include "nodes/..."`) - Toplevel citus includes, not contained in a directory (eg. `#include "citus_verion.h"`) - Columnar includes (eg. `#include "columnar/..."`) - Distributed includes (eg. `#include "distributed/..."`) Because it is quite hard to understand the difference between toplevel citus includes and toplevel postgres includes it hardcodes the list of toplevel citus includes. In the same manner it assumes anything not prefixed with `columnar/` or `distributed/` as a postgres include. The sorting/grouping is enforced by CI. Since we do so with our own script there are not changes required in our uncrustify configuration.	2023-11-23 18:19:54 +01:00
Nils Dijk	0dac63afc0	move pg_version_constants.h to toplevel include (#7335 ) In preparation of sorting and grouping all includes we wanted to move this file to the toplevel includes for good grouping/sorting.	2023-11-09 15:09:39 +00:00
Naisila Puka	b36c431abb	PG16 compatibility - Rework PlannedStmt and Query's Permission Info (#7098 ) PG16 compatibility - Part 6 Check out part 1 `42d956888d` part 2 `0d503dd5ac` part 3 `907d72e60d` part 4 `7c6b4ce103` part 5 `6056cb2c29` This commit is in the series of PG16 compatibility commits. It handles the Permission Info changes in PG16. See below: The main issue lies in the following entries of PlannedStmt: { rtable permInfos } Each rtable has an int perminfoindex, and its actual permission info is obtained through the following: permInfos[perminfoindex] We had crashes because perminfoindexes were not updated in the finalized planned statement after distributed planner hook. So, basically, everywhere we set a query's or planned statement's rtable entry, we need to set the rteperminfos/permInfos accordingly. Relevant PG commits: `a61b1f7482` a61b1f74823c9c4f79c95226a461f1e7a367764b `b803b7d132` b803b7d132e3505ab77c29acf91f3d1caa298f95 More PG16 compatibility commits are coming soon ...	2023-08-09 15:23:00 +03:00
Naisila Puka	7c6b4ce103	PG16 compatibility - outer join checks, subscription password, crash fixes (#7097 ) PG16 compatibility - Part 4 Check out part 1 `42d956888d` part 2 `0d503dd5ac` part 3 `907d72e60d` This commit is in the series of PG16 compatibility commits. It adds some outer join checks to the planner, the new password_required option to the subscription, and a crash fix related to PGIOAlignedBlock, see below for more details: - Fix PGIOAlignedBlock Assert crash in PG16 Relevant PG commit: `faeedbcefd` faeedbcefd40bfdf314e048c425b6d9208896d90 - Pass planner info as argument to make_simple_restrictinfo Pre PG16 passing plannerInfo to make_simple_restrictinfo was only needed for placeholder Vars, which is not the case in this part of the codebase because we are building the expression from shard intervals which don't have placeholder vars. However, PG16 is counting baserels appearing in clause_relids and is deleting the rels mentioned in plannerinfo->outer_join_rels Hence directly accessing plannerinfo. We will crash if we leave it as NULL. For reference `2489d76c49 (diff-e045c41eda9686451a7993e91518e40056b3739365e39eb1b70ae438dc1f7c76R207)` Relevant PG commit: `2489d76c49` 2489d76c4906f4461a364ca8ad7e0751ead8aa0d - Add outer join checks, root->simple_rel_array - fix rebalancer to include passwork_required option Relevant PG commit: `c3afe8cf5a` c3afe8cf5a1e465bd71e48e4bc717f5bfdc7a7d6 More PG16 compatibility commits are coming soon ...	2023-08-04 14:51:28 +03:00
Önder Kalacı	cb5eb73048	Add support for router INSERT .. SELECT commands (#7077 ) Tradionally our planner works in the following order: router - > pushdown -> repartition -> pull to coordinator However, for INSERT .. SELECT commands, we did not support "router". In practice, that is not a big issue, because pushdown planning can handle router case as well. However, with PG 16, certain outer joins are converted to JOIN without any conditions (e.g., JOIN .. ON (true)) and the filters are pushed down to the tables. When the filters are pushed down to the tables, router planner can detect. However, pushdown planner relies on JOIN conditions. An example query: ``` INSERT INTO agg_events (user_id) SELECT raw_events_first.user_id FROM raw_events_first LEFT JOIN raw_events_second ON raw_events_first.user_id = raw_events_second.user_id WHERE raw_events_first.user_id = 10; ``` As a side effect of this change, now we can also relax certain limitation that "pushdown" planner emposes, but not "router". So, with this PR, we also allow those. Closes https://github.com/citusdata/citus/pull/6772 DESCRIPTION: Prevents unnecessarily pulling the data into coordinator for some INSERT .. SELECT queries that target a single-shard group	2023-07-28 15:07:20 +03:00
Naisila Puka	69af3e8509	Drop PG13 Support Phase 2 - Remove PG13 specific paths/tests (#7007 ) This commit is the second and last phase of dropping PG13 support. It consists of the following: - Removes all PG_VERSION_13 & PG_VERSION_14 from codepaths - Removes pg_version_compat entries and columnar_version_compat entries specific for PG13 - Removes alternative pg13 test outputs - Removes PG13 normalize lines and fix the test outputs based on that It is a continuation of `5bf163a27d`	2023-06-21 14:18:23 +03:00
Teja Mupparti	58da8771aa	This pull request introduces support for nonroutable merge commands in the following scenarios: 1) For distributed tables that are not colocated. 2) When joining on a non-distribution column for colocated tables. 3) When merging into a distributed table using reference or citus-local tables as the data source. This is accomplished primarily through the implementation of the following two strategies. Repartition: Plan the source query independently, execute the results into intermediate files, and repartition the files to co-locate them with the merge-target table. Subsequently, compile a final merge query on the target table using the intermediate results as the data source. Pull-to-coordinator: Execute the plan that requires evaluation at the coordinator, run the query on the coordinator, and redistribute the resulting rows to ensure colocation with the target shards. Direct the MERGE SQL operation to the worker nodes' target shards, using the intermediate files colocated with the data as the data source.	2023-06-19 12:23:40 -07:00
Onur Tirtir	fa8870217d	Enable logical planner for single-shard tables (#6950 ) * Enable using logical planner for single-shard tables * Improve non-colocated table error in physical planner * Favor distributed tables over reference tables when chosing anchor shard	2023-06-08 10:57:23 +03:00
Teja Mupparti	f6a516dab5	Refactor repartitioning code into generic format	2023-06-05 09:06:05 -07:00
Naisila Puka	48f068d08e	Remove AssertArg and AssertState (#6970 ) PG16 removed them. They were already identical to Assert. We can merge this directly to main branch Relevant PG commit: `b1099eca8f` b1099eca8f38ff5cfaf0901bb91cb6a22f909bc6 Co-authored-by: onderkalaci <onderkalaci@gmail.com>	2023-06-05 13:25:21 +03:00
Teja Mupparti	ff2062e8c3	Rename insert-select redistribute code base to generic purpose	2023-06-01 09:43:43 -07:00
Onur Tirtir	8ff9dde4b3	Prevent pushing down INSERT .. SELECT queries that we shouldn't (and allow some more) (#6752 ) Previously INSERT .. SELECT planner were pushing down some queries that should not be pushed down due to wrong colocation checks. It was checking whether one of the table in SELECT part and target table are colocated. But now, we check colocation for all tables in SELECT part and the target table. Another problem with INSERT .. SELECT planner was that some queries, which is valid to be pushed down, were not pushed down due to unnecessary checks which are currently supported. e.g. UNION check. As solution, we reused the pushdown planner checks for INSERT .. SELECT planner. DESCRIPTION: Fixes a bug that causes incorrectly pushing down some INSERT .. SELECT queries that we shouldn't DESCRIPTION: Prevents unnecessarily pulling the data into coordinator for some INSERT .. SELECT queries DESCRIPTION: Drops support for pushing down INSERT .. SELECT with append table as target Fixes #6749. Fixes #1428. Fixes #6920. --------- Co-authored-by: aykutbozkurt <aykut.bozkurt1995@gmail.com>	2023-05-17 15:05:08 +03:00
Onur Tirtir	db2514ef78	Call null-shard-key tables as single-shard distributed tables in code	2023-05-03 17:02:43 +03:00
Onur Tirtir	39b7711527	Add support for more pushable / non-pushable insert .. select queries with null-shard-key tables (#6823 ) * Add support for dist insert select by selecting from a reference table. This was the only pushable insert .. select case that #6773 didn't cover. * For the cases where we insert into a Citus table but the INSERT .. SELECT query cannot be pushed down, allow pull-to-coordinator when possible. Remove the checks that we had at the very beginning of CreateInsertSelectPlanInternal so that we can try insert .. select via pull-to-coordinator for the cases where we cannot push-down the insert .. select query. What we support via pull-to-coordinator is still limited due to lacking of logical planner support for SELECT queries, but this commit at least allows using pull-to-coordinator for the cases where the select query can be planned via router planner, without limiting ourselves to restrictive top-level checks. Also introduce some additional restrictions into CreateDistributedInsertSelectPlan for the cases it was missing to check for null-shard-key tables. Indeed, it would make more sense to have those checks for distributed tables in general, via separate PRs against main branch. See https://github.com/citusdata/citus/pull/6817. * Add support for inserting into a Postgres table.	2023-05-03 16:24:20 +03:00
Onur Tirtir	85745b46d5	Add initial sql support for distributed tables that don't have a shard key (#6773/#6822) Enable router planner and a limited version of INSERT .. SELECT planner for the queries that reference colocated null shard key tables. * SELECT / UPDATE / DELETE / MERGE is supported as long as it's a router query. * INSERT .. SELECT is supported as long as it only references colocated null shard key tables. Note that this is not only limited to distributed INSERT .. SELECT but also covers a limited set of query types that require pull-to-coordinator, e.g., due to LIMIT clause, generate_series() etc. ... (Ideally distributed INSERT .. SELECT could handle such queries too, e.g., when we're only referencing tables that don't have a shard key, but today this is not the case. See https://github.com/citusdata/citus/pull/6773#discussion_r1140130562.	2023-05-03 16:24:20 +03:00
Onur Tirtir	fa467e05e7	Add support for creating distributed tables with a null shard key (#6745 ) With this PR, we allow creating distributed tables with without specifying a shard key via create_distributed_table(). Here are the the important details about those tables: * Specifying `shard_count` is not allowed because it is assumed to be 1. * We mostly call such tables as "null shard-key" table in code / comments. * To avoid doing a breaking layout change in create_distributed_table(); instead of throwing an error, it will inform the user that `distribution_type` param is ignored unless it's explicitly set to NULL or 'h'. * `colocate_with` param allows colocating such null shard-key tables to each other. * We define this table type, i.e., NULL_SHARD_KEY_TABLE, as a subclass of DISTRIBUTED_TABLE because we mostly want to treat them as distributed tables in terms of SQL / DDL / operation support. * Metadata for such tables look like: - distribution method => DISTRIBUTE_BY_NONE - replication model => REPLICATION_MODEL_STREAMING - colocation id => != INVALID_COLOCATION_ID (distinguishes from Citus local tables) * We assign colocation groups for such tables to different nodes in a round-robin fashion based on the modulo of "colocation id". Note that this PR doesn't care about DDL (except CREATE TABLE) / SQL / operation (i.e., Citus UDFs) support for such tables but adds a preliminary API.	2023-05-03 16:18:27 +03:00
Onur Tirtir	d4f9de7875	Explicitly disallow local rels when inserting into dist table (#6817 )	2023-04-04 17:46:43 +02:00
Teja Mupparti	01ea5f58a9	Fix the incorrect value passed to pointer-to-bool parameter, pass a NULL as the value is not used for this invocation.	2023-03-30 10:45:32 -07:00
aykut-bozkurt	ab71cd01ee	fix multi level recursive plan (#6650 ) Recursive planner should handle all the tree from bottom to top at single pass. i.e. It should have already recursively planned all required parts in its first pass. Otherwise, this means we have bug at recursive planner, which needs to be handled. We add a check here and return error. DESCRIPTION: Fixes wrong results by throwing error in case recursive planner multipass the query. We found 3 different cases which causes recursive planner passes the query multiple times. 1. Sublink in WHERE clause is planned at second pass after we recursively planned a distributed table at the first pass. Fixed by PR #6657. 2. Local-distributed joins are recursively planned at both the first and the second pass. Issue #6659. 3. Some parts of the query is considered to be noncolocated at the second pass as we do not generate attribute equivalances between nondistributed and distributed tables. Issue #6653	2023-01-27 21:25:04 +03:00
Marco Slot	639588bee0	Remove unused functions (#6220 ) Co-authored-by: Marco Slot <marco.slot@gmail.com>	2022-08-22 11:53:25 +03:00
Onder Kalaci	149771792b	Remove useless version compats most likely leftover from earlier versions	2022-07-29 10:31:55 +02:00
Marco Slot	cff013a057	Fix issues with insert..select casts and column ordering	2022-07-28 13:23:57 +02:00
Jelte Fennema	3a44fa827a	Add versions of forboth that don't need ListCell (#5856 ) We've had custom versions of Postgres its `foreach` macro which with a hidden ListCell for quite some time now. People like these custom macros, because they are easier to use and require less boilerplate. This adds similar custom versions of Postgres its `forboth` macro. Now you don't need ListCells anymore when looping over two lists at the same time.	2022-03-23 14:50:36 +03:00
Halil Ozan Akgul	b01e7e884c	Pass NULL for plannerInfo as we don't generate PlaceHolderVars	2021-09-03 15:27:25 +03:00
SaitTalhaNisanci	4559d02c41	Fix union pushdown issue (#5079 ) * Fix UNION not being pushdown Postgres optimizes column fields that are not needed in the output. We were relying on these fields to understand if it is safe to push down a union query. This fix looks at the parse query, which has the original column fields to detect if it is safe to push down a union query. * Add more tests * Simplify code and make it more robust * Process varlevelsup > 0 in FindReferencedTableColumn * Only look for outers vars in union path * Add more comments * Remove UNION ALL specific logic for pulling up childvars	2021-07-29 13:52:55 +03:00
SaitTalhaNisanci	03832f353c	Drop postgres 11 support	2021-03-25 09:20:28 +03:00
Onur Tirtir	cacb76d2c6	Not mention citus local tables in error messages (#4579 )	2021-01-27 12:36:53 +03:00
Sait Talha Nisanci	5618f3a3fc	Use BaseRestrictInfo for finding equality columns Baseinfo also has pushed down filters etc, so it makes more sense to use BaseRestrictInfo to determine what columns have constant equality filters. Also RteIdentity is used for removing conversion candidates instead of rteIndex.	2020-12-15 18:18:36 +03:00
Hanefi Önaldı	6d8e83d24f	Replace worker_hash calls with partkey IS NOT NULL filters	2020-10-02 18:16:24 +03:00
Onur Tirtir	3a73fba810	Apply planner changes for citus local tables	2020-09-09 11:51:18 +03:00
SaitTalhaNisanci	366461ccdb	Introduce cache entry/table utilities (#4132 ) Introduce table entry utility functions Citus table cache entry utilities are introduced so that we can easily extend existing functionality with minimum changes, specifically changes to these functions. For example IsNonDistributedTableCacheEntry can be extended for citus local tables without the need to scan the whole codebase and update each relevant part. * Introduce utility functions to find the type of tables A table type can be a reference table, a hash/range/append distributed table. Utility methods are created so that we don't have to worry about how a table is considered as a reference table etc. This also makes it easy to extend the table types. * Add IsCitusTableType utilities * Rename IsCacheEntryCitusTableType -> IsCitusTableTypeCacheEntry * Change citus table types in some checks	2020-09-02 22:26:05 +03:00
SaitTalhaNisanci	73ef40886b	Rename FindNodeCheckXXX functions (#4106 ) FindNodeCheck is not clear about what the function is doing. They are renamed to FindNodeMatchingCheckFunctionXXX. Also for choosing elements in these functions, CheckNodeFunc type is introduced.	2020-08-11 15:01:23 +03:00
Sait Talha Nisanci	8e9b52971c	Use new var field names in the codebase The codebase is updated to use varattnosync and varnosyn and we defined the macros for older versions. This way we can just remove the macros when we drop an older version.	2020-08-04 15:38:13 +03:00
Sait Talha Nisanci	d68bfc5687	Improve error for index operator class parameters The error message when index has opclassopts is improved and the commit from postgres side is also included for future reference. Also some minor style related changes are applied.	2020-08-04 15:38:13 +03:00
Sait Talha Nisanci	62879ee8c1	introduce planner_compat and pg_plan_query_compat macros As the new planner and pg_plan_query_compat methods expect the query string as well, macros are defined to be compatible in different versions of postgres. Relevant commit on Postgres: 6aba63ef3e606db71beb596210dd95fa73c44ce2 Command on Postgres: git log --all --grep="pg_plan_query"	2020-08-04 15:10:22 +03:00
Sait Talha Nisanci	bf831d2e59	Use table_openXXX methods in the codebase With PG13 heap_* (heap_open, heap_close etc) are replaced with table_* (table_open, table_close etc). It is better to use the new table access methods in the codebase and define the macros for the previous versions as we can easily remove the macro without having to change the codebase when we drop the support for the old version. Commits that introduced this change on Postgres: f25968c49697db673f6cd2a07b3f7626779f1827 e0c4ec07284db817e1f8d9adfb3fffc952252db0 4b21acf522d751ba5b6679df391d5121b6c4a35f Command to see relevant commits on Postgres side: git log --all --grep="heap_open"	2020-08-04 15:10:22 +03:00
Onder Kalaci	c25de2cf22	Remove flag from As it doesn't make any sense anymore	2020-07-20 12:45:05 +02:00
Jelte Fennema	759e628dd5	Handle some NULL issues that static analysis found (#4001 ) Static analysis found some issues where we used the result from ExtractResultRelationRTE, without checking that it wasn't NULL. It seems like in all these cases it can never actually be NULL, since we have checked before that it isn't a SELECT query. So, this PR is mostly to make static analysis happy (and protect a bit against future changes of the code).	2020-07-09 15:46:42 +02:00
Marco Slot	b4fec63bc0	Rename master evaluation to coordinator evaluation	2020-07-07 10:37:41 +02:00
Hadi Moshayedi	4ed59d2db3	Move more from insert_select_executor to insert_select_planner	2020-06-26 08:08:26 -07:00
Hadi Moshayedi	4e8d79998e	Save INSERT/SELECT method in DistributedPlan. This is so we don't need to calculate it twice in insert_select_executor.c and multi_explain.c, which can cause discrepancy if an update in one of them is not reflected in the other site.	2020-06-25 08:55:48 -07:00

1 2

100 Commits (b1a38e0bca9a51af64544cf28ed494a1c21cdfde)