citus

Commit Graph

Author	SHA1	Message	Date
Onur Tirtir	fa8870217d	Enable logical planner for single-shard tables (#6950 ) * Enable using logical planner for single-shard tables * Improve non-colocated table error in physical planner * Favor distributed tables over reference tables when chosing anchor shard	2023-06-08 10:57:23 +03:00
Onur Tirtir	db2514ef78	Call null-shard-key tables as single-shard distributed tables in code	2023-05-03 17:02:43 +03:00
Onur Tirtir	85745b46d5	Add initial sql support for distributed tables that don't have a shard key (#6773/#6822) Enable router planner and a limited version of INSERT .. SELECT planner for the queries that reference colocated null shard key tables. * SELECT / UPDATE / DELETE / MERGE is supported as long as it's a router query. * INSERT .. SELECT is supported as long as it only references colocated null shard key tables. Note that this is not only limited to distributed INSERT .. SELECT but also covers a limited set of query types that require pull-to-coordinator, e.g., due to LIMIT clause, generate_series() etc. ... (Ideally distributed INSERT .. SELECT could handle such queries too, e.g., when we're only referencing tables that don't have a shard key, but today this is not the case. See https://github.com/citusdata/citus/pull/6773#discussion_r1140130562.	2023-05-03 16:24:20 +03:00
Teja Mupparti	e444dd4f3f	MERGE: Support reference table as source with local table as target	2023-05-02 11:37:29 -07:00
Onur Tirtir	616b5018a0	Add a GUC to disallow planning the queries that reference non-colocated tables via router planner (#6793 ) Today we allow planning the queries that reference non-colocated tables if the shards that query targets are placed on the same node. However, this may not be the case, e.g., after rebalancing shards because it's not guaranteed to have those shards on the same node anymore. This commit adds citus.enable_non_colocated_router_query_pushdown GUC that can be used to disallow planning such queries via router planner, when it's set to false. Note that the default value for this GUC will be "true" for 11.3, but we will alter it to "false" on 12.0 to not introduce a breaking change in a minor release. Closes #692. Even more, allowing such queries to go through router planner also causes generating an incorrect plan for the DML queries that reference distributed tables that are sharded based on different replication factor settings. For this reason, #6779 can be closed after altering the default value for this GUC to "false", hence not now. DESCRIPTION: Adds `citus.enable_non_colocated_router_query_pushdown` GUC to ensure generating a consistent distributed plan for the queries that reference non-colocated distributed tables (when set to "false", the default is "true").	2023-03-28 13:10:29 +03:00
Teja Mupparti	9bab819f26	Disentangle MERGE planning code from the modify-planning code path	2023-03-27 10:41:46 -07:00
Teja Mupparti	cf55136281	1) Restrict MERGE command INSERT to the source's distribution column Fixes #6672 2) Move all MERGE related routines to a new file merge_planner.c 3) Make ConjunctionContainsColumnFilter() static again, and rearrange the code in MergeQuerySupported() 4) Restore the original format in the comments section. 5) Add big serial test. Implement latest set of comments	2023-03-16 13:43:08 -07:00
Teja Mupparti	d7b499929c	Rearrange the common code into a newfunction to facilitate the multiple checks of the same conditions in a multi-modify MERGE statement	2023-02-24 12:55:11 -08:00
Marco Slot	a482b36760	Revert "Support MERGE on distributed tables with restrictions" (#6675 ) Co-authored-by: Marco Slot <marco.slot@gmail.com>	2023-01-30 15:01:59 +01:00
Teja Mupparti	44c387b978	Support MERGE on distributed tables with restrictions This implements the phase - II of MERGE sql support Support routable query where all the tables in the merge-sql are distributed, co-located, and both the source and target relations are joined on the distribution column with a constant qual. This should be a Citus single-task query. Below is an example. SELECT create_distributed_table('t1', 'id'); SELECT create_distributed_table('s1', 'id', colocate_with => ‘t1’); MERGE INTO t1 USING s1 ON t1.id = s1.id AND t1.id = 100 WHEN MATCHED THEN UPDATE SET val = s1.val + 10 WHEN MATCHED THEN DELETE WHEN NOT MATCHED THEN INSERT (id, val, src) VALUES (s1.id, s1.val, s1.src) Basically, MERGE checks to see if There are a minimum of two distributed tables (source and a target). All the distributed tables are indeed colocated. MERGE relations are joined on the distribution column MERGE .. USING .. ON target.dist_key = source.dist_key The query should touch only a single shard i.e. JOIN AND with a constant qual MERGE .. USING .. ON target.dist_key = source.dist_key AND target.dist_key = <> If any of the conditions are not met, it raises an exception.	2023-01-18 11:05:27 -08:00
Teja Mupparti	9a9989fc15	Support MERGE Phase – I All the tables (target, source or any CTE present) in the SQL statement are local i.e. a merge-sql with a combination of Citus local and Non-Citus tables (regular Postgres tables) should work and give the same result as Postgres MERGE on regular tables. Catch and throw an exception (not-yet-supported) for all other scenarios during Citus-planning phase.	2022-12-18 20:32:15 -08:00
SaitTalhaNisanci	bcbd24f8de	Only consider pseudo constants for shortcuts (#4712 ) It seems that we need to consider only pseudo constants while doing some shortcuts in planning. For example there could be a false clause but it can contribute to the result in which case it will not be a pseudo constant.	2021-02-15 18:39:37 +03:00
Sait Talha Nisanci	f7c1509fed	Not check if the query is routable for converting It seems that there are only very few cases where that is useful, and for now we prefer not having that check. This means that we might perform some unnecessary checks, but that should be rare and not performance critical.	2020-12-15 18:18:36 +03:00
Sait Talha Nisanci	3aed6c3ad0	Rename containsOnlyLocalTable as isLocalTableModification Update error message in Modify View	2020-12-15 18:18:36 +03:00
Sait Talha Nisanci	5618f3a3fc	Use BaseRestrictInfo for finding equality columns Baseinfo also has pushed down filters etc, so it makes more sense to use BaseRestrictInfo to determine what columns have constant equality filters. Also RteIdentity is used for removing conversion candidates instead of rteIndex.	2020-12-15 18:18:36 +03:00
Sait Talha Nisanci	28c5b6a425	Convert some hard coded errors to deferred errors in router planner	2020-12-15 18:18:36 +03:00
Sait Talha Nisanci	5693cabc41	Not convert an already routable plannable query We should not recursively plan an already routable plannable query. An example of this is (SELECT * FROM local JOIN (SELECT * FROM dist) d1 USING(a)); So we let the recursive planner do all of its work and at the end we convert the final query to to handle unsupported joins. While doing each conversion, we check if it is router plannable, if so we stop. Only consider range table entries that are in jointree If a range table is not in jointree then there is no point in considering that because we are trying to convert range table entries to subqueries for join use case.	2020-12-15 18:17:10 +03:00
Hanefi Önaldı	6d8e83d24f	Replace worker_hash calls with partkey IS NOT NULL filters	2020-10-02 18:16:24 +03:00
Jelte Fennema	759e628dd5	Handle some NULL issues that static analysis found (#4001 ) Static analysis found some issues where we used the result from ExtractResultRelationRTE, without checking that it wasn't NULL. It seems like in all these cases it can never actually be NULL, since we have checked before that it isn't a SELECT query. So, this PR is mostly to make static analysis happy (and protect a bit against future changes of the code).	2020-07-09 15:46:42 +02:00
Marco Slot	4f7989ad8e	Rename WorkersContainingAllShards to PlacementsForWorkersContainingAllShards	2020-06-12 18:36:02 -07:00
Marco Slot	d953f084db	Rename FindRouterWorkerList to CreateTaskPlacementListForShardIntervals	2020-06-12 18:36:01 -07:00
Marco Slot	24feadc230	Handle joins between local/reference/cte via router planner	2020-06-12 18:36:01 -07:00
Philip Dubé	bec58000d6	Given IsDistributedTableRTE, there's ambiguity in what DistributedTable means Elsewhere we used DistributedTable to include reference tables Marco suggested we use CitusTable for distributed & reference tables So renaming: - IsDistributedTable -> IsCitusTable - IsDistributedTableViaCatalog -> IsCitusTableViaCatalog - DistributedTableCacheEntry -> CitusTableCacheEntry - DistributedTableList -> CitusTableList - isDistributedTable -> isCitusTable - InsertSelectIntoDistributedTable -> InsertSelectIntoCitusTable - ExtractFirstDistributedTableId -> ExtractFirstCitusTableId	2020-03-06 18:57:55 +00:00
Marco Slot	dc4c0c032e	Refactor CitusBeginScan into separate DML / SELECT paths	2020-03-05 12:37:22 +01:00
Onder Kalaci	975c4c2264	Do not prune shards if the distribution key is NULL The root of the problem is that, standard_planner() converts the following qual ``` {OPEXPR :opno 98 :opfuncid 67 :opresulttype 16 :opretset false :opcollid 0 :inputcollid 100 :args ( {VAR :varno 1 :varattno 1 :vartype 25 :vartypmod -1 :varcollid 100 :varlevelsup 0 :varnoold 1 :varoattno 1 :location 45 } {CONST :consttype 25 :consttypmod -1 :constcollid 100 :constlen -1 :constbyval false :constisnull true :location 51 :constvalue <> } ) :location 49 } ``` To ``` ( {CONST :consttype 16 :consttypmod -1 :constcollid 0 :constlen 1 :constbyval true :constisnull true :location -1 :constvalue <> } ) ``` So, Citus doesn't deal with NULL values in real-time or non-fast path router queries. And, in the FastPathRouter planner, we check constisnull in DistKeyInSimpleOpExpression(). However, in deferred pruning case, we do not check for isnull for const. Thus, the fix consists of two parts: - Let PruneShards() not crash when NULL parameter is passed - For deferred shard pruning in fast-path queries, explicitly check that we have CONST which is not NULL	2020-02-13 15:00:31 +01:00
Onder Kalaci	dc17c2658e	Defer shard pruning for fast-path router queries to execution This is purely to enable better performance with prepared statements. Before this commit, the fast path queries with prepared statements where the distribution key includes a parameter always went through distributed planning. After this change, we only go through distributed planning on the first 5 executions.	2020-01-16 16:59:36 +01:00
Onder Kalaci	7f3ab7892d	Skip shard pruning when possible We're already traversing the queryTree and finding the distribution key value, so pass it to the later stages of the planning.	2020-01-06 12:42:43 +01:00
SaitTalhaNisanci	13204487e9	remove copyright years (#3286 )	2019-12-11 21:14:08 +03:00
SaitTalhaNisanci	b9b7fd7660	add IsLoggableLevel utility function (#3149 ) * add IsLoggableLevel utility function * add function comment for IsLoggableLevel * put ApplyLogRedaction to logutils	2019-11-15 14:59:13 +03:00
Philip Dubé	b77c52f95b	PlanRouterQuery: don't store list of list of shard intervals in relationShardList	2019-08-02 14:08:57 +00:00
Philip Dubé	0915027389	DistributedPlan: replace operation with modLevel This causes no behaviorial changes, only organizes better to implement modifying CTEs Also rename ExtactInsertRangeTableEntry to ExtractResultRelationRTE, as the source of this function didn't match the documentation Remove Task's upsertQuery in favor of ROW_MODIFY_NONCOMMUTATIVE Split up AcquireExecutorShardLock into more internal functions Tests: Normalize multi_reference_table multi_create_table_constraints	2019-07-16 13:58:18 -07:00
Onder Kalaci	f144bb4911	Introduce fast path router planning In this context, we define "Fast Path Planning for SELECT" as trivial queries where Citus can skip relying on the standard_planner() and handle all the planning. For router planner, standard_planner() is mostly important to generate the necessary restriction information. Later, the restriction information generated by the standard_planner is used to decide whether all the shards that a distributed query touches reside on a single worker node. However, standard_planner() does a lot of extra things such as cost estimation and execution path generations which are completely unnecessary in the context of distributed planning. There are certain types of queries where Citus could skip relying on standard_planner() to generate the restriction information. For queries in the following format, Citus does not need any information that the standard_planner() generates: SELECT ... FROM single_table WHERE distribution_key = X; or DELETE FROM single_table WHERE distribution_key = X; or UPDATE single_table SET value_1 = value_2 + 1 WHERE distribution_key = X; Note that the queries might not be as simple as the above such that GROUP BY, WINDOW FUNCIONS, ORDER BY or HAVING etc. are all acceptable. The only rule is that the query is on a single distributed (or reference) table and there is a "distribution_key = X;" in the WHERE clause. With that, we could use to decide the shard that a distributed query touches reside on a worker node.	2019-02-21 13:27:01 +03:00
Onder Kalaci	c1b5a04f6e	Allow partitioned tables with replication factor > 1 With this commit, we all partitioned distributed tables with replication factor > 1. However, we also have many restrictions. In summary, we disallow all kinds of modifications (including DDLs) on the partition tables. Instead, the user is allowed to run the modifications over the parent table. The necessity for such a restriction have two aspects: - We need to acquire shard resource locks appropriately - We need to handle marking partitions INVALID in case of any failures. Note that, in theory, the parent table should also become INVALID, which is too aggressive.	2018-09-21 14:40:41 +03:00
Nils Dijk	6a15e1c9fc	extract ErrorIfOnConflictNotSupported function for reuse	2018-07-23 12:20:10 +02:00
Murat Tuncer	4d35b92016	Add groundwork for citus_stat_statements api	2018-06-27 14:20:03 +03:00
Marco Slot	fd4ff29f2f	Add a debug message with distribution column value	2018-06-05 15:09:17 +03:00
velioglu	32bcd610c1	Support modify queries with multiple tables With this commit we begin to support modify queries with multiple tables if these queries are pushdownable.	2018-05-02 16:22:26 +03:00
Marco Slot	ee132c5ead	Prune shards once per relation in subquery pushdown	2018-04-10 20:33:07 +02:00
Metin Doslu	bcf660475a	Add support for modifying CTEs	2018-02-27 15:08:32 +02:00
Marco Slot	6ba3f42d23	Rename MultiPlan to DistributedPlan	2017-11-22 09:36:24 +01:00
velioglu	0b5db5d826	Support multi shard update/delete queries	2017-10-25 15:52:38 +03:00
Marco Slot	0aadbb1760	Convert multi-row INSERT target list to Vars	2017-08-25 10:55:56 +02:00
Marco Slot	ae00795dab	Allow default columns in multi-row INSERTs	2017-08-25 10:55:56 +02:00
Marco Slot	c97692f382	Fix multi-row INSERT with RETURNING on reference tables	2017-08-24 10:42:12 +02:00
Metin Doslu	b8a9e7c1bf	Add support for UPDATE/DELETE with subqueries	2017-08-08 21:35:08 +03:00
Marco Slot	aa7ca81548	Execute UPDATE/DELETE statements with 0 shards	2017-08-07 15:36:58 +02:00
Marco Slot	da47a03b18	Move INSERT ... SELECT planning logic into one place	2017-06-29 15:03:14 +02:00
Marco Slot	2f8ac82660	Execute INSERT..SELECT via coordinator if it cannot be pushed down Add a second implementation of INSERT INTO distributed_table SELECT ... that is used if the query cannot be pushed down. The basic idea is to execute the SELECT query separately and pass the results into the distributed table using a CopyDestReceiver, which is also used for COPY and create_distributed_table. When planning the SELECT, we go through planner hooks again, which means the SELECT can also be a distributed query. EXPLAIN is supported, but EXPLAIN ANALYZE is not because preventing double execution was a lot more complicated in this case.	2017-06-22 15:46:30 +02:00
Marco Slot	155db4d913	Simplify router planner call path	2017-06-22 15:45:57 +02:00
Önder Kalacı	ad5cd326a4	Subquery pushdown - main branch (#1323 ) * Enabling physical planner for subquery pushdown changes This commit applies the logic that exists in INSERT .. SELECT planning to the subquery pushdown changes. The main algorithm is followed as : - pick an anchor relation (i.e., target relation) - per each target shard interval - add the target shard interval's shard range as a restriction to the relations (if all relations joined on the partition keys) - Check whether the query is router plannable per target shard interval. - If router plannable, create a task * Add union support within the JOINS This commit adds support for UNION/UNION ALL subqueries that are in the following form: .... (Q1 UNION Q2 UNION ...) as union_query JOIN (QN) ... In other words, we currently do NOT support the queries that are in the following form where union query is not JOINed with other relations/subqueries : .... (Q1 UNION Q2 UNION ...) as union_query .... * Subquery pushdown planner uses original query With this commit, we change the input to the logical planner for subquery pushdown. Before this commit, the planner was relying on the query tree that is transformed by the postgresql planner. After this commit, the planner uses the original query. The main motivation behind this change is the simplify deparsing of subqueries. * Enable top level subquery join queries This work enables - Top level subquery joins - Joins between subqueries and relations - Joins involving more than 2 range table entries A new regression test file is added to reflect enabled test cases * Add top level union support This commit adds support for UNION/UNION ALL subqueries that are in the following form: .... (Q1 UNION Q2 UNION ...) as union_query .... In other words, Citus supports allow top level unions being wrapped into aggregations queries and/or simple projection queries that only selects some fields from the lower level queries. * Disallow subqueries without a relation in the range table list for subquery pushdown This commit disallows subqueries without relation in the range table list. This commit is only applied for subquery pushdown. In other words, we do not add this limitation for single table re-partition subqueries. The reasoning behind this limitation is that if we allow pushing down such queries, the result would include (shardCount * expectedResults) where in a non distributed world the result would be (expectedResult) only. * Disallow subqueries without a relation in the range table list for INSERT .. SELECT This commit disallows subqueries without relation in the range table list. This commit is only applied for INSERT.. SELECT queries. The reasoning behind this limitation is that if we allow pushing down such queries, the result would include (shardCount * expectedResults) where in a non distributed world the result would be (expectedResult) only. * Change behaviour of subquery pushdown flag (#1315) This commit changes the behaviour of the citus.subquery_pushdown flag. Before this commit, the flag is used to enable subquery pushdown logic. But, with this commit, that behaviour is enabled by default. In other words, the flag is now useless. We prefer to keep the flag since we don't want to break the backward compatibility. Also, we may consider using that flag for other purposes in the next commits. * Require subquery_pushdown when limit is used in subquery Using limit in subqueries may cause returning incorrect results. Therefore we allow limits in subqueries only if user explicitly set subquery_pushdown flag. * Evaluate expressions on the LIMIT clause (#1333) Subquery pushdown uses orignal query, the LIMIT and OFFSET clauses are not evaluated. However, logical optimizer expects these expressions are already evaluated by the standard planner. This commit manually evaluates the functions on the logical planner for subquery pushdown. * Better format subquery regression tests (#1340) * Style fix for subquery pushdown regression tests With this commit we intented a more consistent style for the regression tests we've added in the - multi_subquery_union.sql - multi_subquery_complex_queries.sql - multi_subquery_behavioral_analytics.sql * Enable the tests that are temporarily commented This commit enables some of the regression tests that were commented out until all the development is done. * Fix merge conflicts (#1347) - Update regression tests to meet the changes in the regression test output. - Replace Ifs with Asserts given that the check is already done - Update shard pruning outputs * Add view regression tests for increased subquery coverage (#1348) - joins between views and tables - joins between views - union/union all queries involving views - views with limit - explain queries with view * Improve btree operators for the subquery tests This commit adds the missing comprasion for subquery composite key btree comparator.	2017-04-29 04:09:48 +03:00

1 2

68 Commits (fa8870217d34ec880737e4f17b44b1c77226f966)