citus

Commit Graph

Author	SHA1	Message	Date
Teja Mupparti	fa7b8949a8	This implements MERGE phase3 Support pushdown query where all the tables in the merge-sql are Citus-distributed, co-located, and both the source and target relations are joined on the distribution column. This will generate multiple tasks which execute independently after pushdown. SELECT create_distributed_table('t1', 'id'); SELECT create_distributed_table('s1', 'id', colocate_with => ‘t1’); MERGE INTO t1 USING s1 ON t1.id = s1.id WHEN MATCHED THEN UPDATE SET val = s1.val + 10 WHEN MATCHED THEN DELETE WHEN NOT MATCHED THEN INSERT (id, val, src) VALUES (s1.id, s1.val, s1.src) *The only exception for both the phases II and III is, UPDATEs and INSERTs must be done on the same shard-group as the joined key; for example, below scenarios are NOT supported as the key-value to be inserted/updated is not guaranteed to be on the same node as the id distribution-column. MERGE INTO target t USING source s ON (t.customer_id = s.customer_id) WHEN NOT MATCHED THEN - - INSERT(customer_id, …) VALUES (<non-local-constant-key-value>, ……); OR this scenario where we update the distribution column itself MERGE INTO target t USING source s On (t.customer_id = s.customer_id) WHEN MATCHED THEN UPDATE SET customer_id = 100;	2023-01-24 17:44:21 -08:00
Teja Mupparti	44c387b978	Support MERGE on distributed tables with restrictions This implements the phase - II of MERGE sql support Support routable query where all the tables in the merge-sql are distributed, co-located, and both the source and target relations are joined on the distribution column with a constant qual. This should be a Citus single-task query. Below is an example. SELECT create_distributed_table('t1', 'id'); SELECT create_distributed_table('s1', 'id', colocate_with => ‘t1’); MERGE INTO t1 USING s1 ON t1.id = s1.id AND t1.id = 100 WHEN MATCHED THEN UPDATE SET val = s1.val + 10 WHEN MATCHED THEN DELETE WHEN NOT MATCHED THEN INSERT (id, val, src) VALUES (s1.id, s1.val, s1.src) Basically, MERGE checks to see if There are a minimum of two distributed tables (source and a target). All the distributed tables are indeed colocated. MERGE relations are joined on the distribution column MERGE .. USING .. ON target.dist_key = source.dist_key The query should touch only a single shard i.e. JOIN AND with a constant qual MERGE .. USING .. ON target.dist_key = source.dist_key AND target.dist_key = <> If any of the conditions are not met, it raises an exception.	2023-01-18 11:05:27 -08:00
Önder Kalacı	eb75decbeb	Undo planner extended statistics override (#6492 )	2023-01-04 13:25:57 +01:00
Teja Mupparti	9a9989fc15	Support MERGE Phase – I All the tables (target, source or any CTE present) in the SQL statement are local i.e. a merge-sql with a combination of Citus local and Non-Citus tables (regular Postgres tables) should work and give the same result as Postgres MERGE on regular tables. Catch and throw an exception (not-yet-supported) for all other scenarios during Citus-planning phase.	2022-12-18 20:32:15 -08:00
aykut-bozkurt	9c0073ba57	remove unused boundary type (#6563 ) Removes unused job boundary tag `SUBQUERY_MAP_MERGE_JOB`. Only usage is at `BuildMapMergeJob`, which is only called when the boundary = `JOIN_MAP_MERGE_JOB`. Hence, it should be safe to remove.	2022-12-16 18:19:22 +03:00
Onur Tirtir	2803470b58	Add lateral join checks for outer joins and drop the useless ones for semi joins	2022-12-07 18:27:50 +03:00
Onur Tirtir	e7e4881289	Phase - III: recursively plan non-recurring sub join trees too	2022-12-07 18:27:50 +03:00
Onur Tirtir	f52381387e	Phase - II: recursively plan non-recurring subqueries too	2022-12-07 18:27:50 +03:00
Onur Tirtir	f339450a9d	Phase - I: recursively plan non-recurring relations	2022-12-07 18:27:50 +03:00
aykut-bozkurt	83ef600f27	fix false full join pushdown error check (#6523 ) Problem: Currently, we error out if we detect recurring tuples in one side without checking the other side of the join. Solution: When one side of the full join consists recurring tuples and the other side consists nonrecurring tuples, we should not pushdown to prevent duplicate results. Otherwise, safe to pushdown.	2022-11-30 14:17:56 +03:00
Philip Dubé	cf69fc3652	Grammar: it's to its Includes an error message & one case of its to it's Also fix "to the to" typos	2022-11-28 20:43:44 +00:00
Emel Şimşek	8e5ba45b74	Fixes a bug that causes crash when using auto_explain extension with ALTER TABLE...ADD FOREIGN KEY... queries. (#6470 ) Fixes a bug that causes crash when using auto_explain extension with ALTER TABLE...ADD FOREIGN KEY... queries. Those queries trigger a SELECT query on the citus tables as part of the foreign key constraint validation check. At the explain hook, workers try to explain this SELECT query as a distributed query causing memory corruption in the connection data structures. Hence, we will not explain ALTER TABLE...ADD FOREIGN KEY... and the triggered queries on the workers. Fixes #6424.	2022-11-15 17:53:39 +03:00
Marco Slot	666696c01c	Deprecate citus.replicate_reference_tables_on_activate, make it always off (#6474 ) Co-authored-by: Marco Slot <marco.slot@gmail.com>	2022-11-04 16:21:10 +01:00
Onur Tirtir	dbe2749bbf	Drop unreachable code from query_pushdown_planning.c (#6451 ) Given that we cannot continue after a `RaiseDeferredErrorInternal(.., ERROR)` call.	2022-10-21 18:04:31 +03:00
Emel Şimşek	02fd1e6c03	Fix the crash that happens when using auto_explain extension with recursive queries (#6406 ) This crash happens with recursively planned queries. For such queries, subplans are explained via the ExplainOnePlan function of postgresql. This function reconstructs the query description from the plan therefore it expects the ActiveSnaphot for the query be available. This fix makes sure that the snapshot is in the stack before calling ExplainOnePlan. Fixes #2920.	2022-10-19 18:04:45 +03:00
Onder Kalaci	766f340ce0	Prevent failures on partitioned distributed tables with statistics objects on PG 15 Comment from the code is clear on this: /* * The statistics objects of the distributed table are not relevant * for the distributed planning, so we can override it. * * Normally, we should not need this. However, the combination of * Postgres commit 269b532aef55a579ae02a3e8e8df14101570dfd9 and * Citus function AdjustPartitioningForDistributedPlanning() * forces us to do this. The commit expects statistics objects * of partitions to have "inh" flag set properly. Whereas, the * function overrides "inh" flag. To avoid Postgres to throw error, * we override statlist such that Postgres does not try to process * any statistics objects during the standard_planner() on the * coordinator. In the end, we do not need the standard_planner() * on the coordinator to generate an optimized plan. We call * into standard_planner() for other purposes, such as generating the * relationRestrictionContext here. * * AdjustPartitioningForDistributedPlanning() is a hack that we use * to prevent Postgres' standard_planner() to expand all the partitions * for the distributed planning when a distributed partitioned table * is queried. It is required for both correctness and performance * reasons. Although we can eliminate the use of the function for * the correctness (e.g., make sure that rest of the planner can handle * partitions), it's performance implication is hard to avoid. Certain * planning logic of Citus (such as router or query pushdown) relies * heavily on the relationRestrictionList. If * AdjustPartitioningForDistributedPlanning() is removed, all the * partitions show up in the, causing high planning times for * such queries. */	2022-09-15 14:36:05 +03:00
naisila	47bea76c6c	Revert "Support JSON_TABLE on PG 15 (#6241 )" This reverts commit `1f4fe35512`.	2022-09-12 15:20:17 +03:00
Emel Şimşek	6f06ff78cc	Throw an error if there is a RangeTblEntry that is not assigned an RTE identity. (#6295 ) * Fix issue : 6109 Segfault or (assertion failure) is possible when using a SQL function * DESCRIPTION: Ensures disallowing the usage of SQL functions referencing to a distributed table and prevents a segfault. Using a SQL function may result in segmentation fault in some cases. This change fixes the issue by throwing an error message when a SQL function cannot be handled. Fixes #6109. * DESCRIPTION: Ensures disallowing the usage of SQL functions referencing to a distributed table and prevents a segfault. Using a SQL function may result in segmentation fault in some cases. This change fixes the issue by throwing an error message when a SQL function cannot be handled. Fixes #6109. Co-authored-by: Emel Simsek <emel.simsek@microsoft.com>	2022-09-06 15:46:41 +02:00
aykut-bozkurt	69726648ab	verify shards if exists for insert, delete, update (#6280 ) Co-authored-by: Marco Slot <marco.slot@gmail.com>	2022-09-06 15:29:14 +02:00
Naisila Puka	98dcbeb304	Specifies that our CustomScan providers support projections (#6244 ) Before, this was the default mode for CustomScan providers. Now, the default is to assume that they can't project. This causes performance penalties due to adding unnecessary Result nodes. Hence we use the newly added flag, CUSTOMPATH_SUPPORT_PROJECTION to get it back to how it was. In PG15 support branch we created explain functions to ignore the new Result nodes, so we undo that in this commit. Relevant PG commit: 955b3e0f9269639fb916cee3dea37aee50b82df0	2022-08-31 10:48:01 +03:00
Önder Kalacı	3ed6fea1cf	Prevent Merge command on distributed tables [PG 15] (#6238 )	2022-08-25 13:27:08 +03:00
Marco Slot	ac07d33a29	Remove unused reduceQuery from physical planning (#6221 ) Co-authored-by: Marco Slot <marco.slot@gmail.com>	2022-08-24 17:24:27 +00:00
Naisila Puka	1f4fe35512	Support JSON_TABLE on PG 15 (#6241 ) Postgres supports JSON_TABLE feature on PG 15. We treat JSON_TABLE the same as correlated functions (e.g., recurring tuples). In the end, for multi-shard JSON_TABLE commands, we apply the same restrictions as reference tables (e.g., cannot be in the outer part of an outer join etc.) Co-authored-by: Onder Kalaci <onderkalaci@gmail.com>	2022-08-24 19:11:18 +03:00
Marco Slot	639588bee0	Remove unused functions (#6220 ) Co-authored-by: Marco Slot <marco.slot@gmail.com>	2022-08-22 11:53:25 +03:00
Onder Kalaci	149771792b	Remove useless version compats most likely leftover from earlier versions	2022-07-29 10:31:55 +02:00
Marco Slot	cff013a057	Fix issues with insert..select casts and column ordering	2022-07-28 13:23:57 +02:00
aykut-bozkurt	67ac3da2b0	added citus_depended_objects udf and HideCitusDependentObjects GUC to hide citus depended objects from pg meta queries (#6055 ) use RecurseObjectDependencies api to find if an object is citus depended make vanilla tests runnable to see if citus_depended function is working correctly	2022-07-25 16:43:34 +03:00
Naisila Puka	7d6410c838	Drop postgres 12 support (#6040 ) * Remove if conditions with PG_VERSION_NUM < 13 * Remove server_above_twelve(&eleven) checks from tests * Fix tests * Remove pg12 and pg11 alternative test output files * Remove pg12 specific normalization rules * Some more if conditions in the code * Change RemoteCollationIdExpression and some pg12/pg13 comments * Remove some more normalization rules	2022-07-20 17:49:36 +03:00
Jelte Fennema	184c7c0bce	Make enterprise features open source (#6008 ) This PR makes all of the features open source that were previously only available in Citus Enterprise. Features that this adds: 1. Non blocking shard moves/shard rebalancer (`citus.logical_replication_timeout`) 2. Propagation of CREATE/DROP/ALTER ROLE statements 3. Propagation of GRANT statements 4. Propagation of CLUSTER statements 5. Propagation of ALTER DATABASE ... OWNER TO ... 6. Optimization for COPY when loading JSON to avoid double parsing of the JSON object (`citus.skip_jsonb_validation_in_copy`) 7. Support for row level security 8. Support for `pg_dist_authinfo`, which allows storing different authentication options for different users, e.g. you can store passwords or certificates here. 9. Support for `pg_dist_poolinfo`, which allows using connection poolers in between coordinator and workers 10. Tracking distributed query execution times using citus_stat_statements (`citus.stat_statements_max`, `citus.stat_statements_purge_interval`, `citus.stat_statements_track`). This is disabled by default. 11. Blocking tenant_isolation 12. Support for `sslkey` and `sslcert` in `citus.node_conninfo`	2022-06-16 00:23:46 -07:00
Gledis Zeneli	beef392f5a	Fix memory error with citus_add_node reported by valgrind test (#5967 ) The error comes due to the datum jsonb in pg_dist_metadata_node.metadata being 0 in some scenarios. This is likely due to not copying the data when receiving a datum from a tuple and pg deciding to deallocate that memory when the table that the tuple was from is closed. Also fix another place in the code that might have been susceptible to this issue. I tested on both multi-vg and multi-1-vg and the test were successful.	2022-05-28 00:22:00 +03:00
Marco Slot	7abcfac61f	Add caching for functions that check the backend type	2022-05-20 19:02:37 +02:00
Burak Velioglu	06a94d167e	Use object address instead of relation id on DDLJob to decide on syncing metadata	2022-05-05 17:59:44 +03:00
Jeff Davis	3e1180de78	PG15: handle extra argument to parse_analyze_varparams(). From PG commit 25751f54b8.	2022-05-02 10:12:03 -07:00
Jeff Davis	bd455f42e3	PG15: handle change to SeqScan structure. Account for PG commit 2226b4189b. The one site dependent on it can do just as well with a Scan instead of a SeqScan.	2022-05-02 10:12:03 -07:00
Jeff Davis	3799f95742	PG15: Value -> String, Integer, Float. Handle PG commit 639a86e36a.	2022-05-02 10:12:03 -07:00
Marco Slot	c0827703ec	Fix EXPLAIN ANALYZE JSON format for subplans	2022-04-07 11:38:20 +02:00
Marco Slot	544dce919a	Handle user-defined type parameters in EXPLAIN ANALYZE	2022-04-07 11:14:32 +02:00
Marco Slot	8c8c3b665d	Add TABLESAMPLE support	2022-04-01 15:51:40 +02:00
Jelte Fennema	3a44fa827a	Add versions of forboth that don't need ListCell (#5856 ) We've had custom versions of Postgres its `foreach` macro which with a hidden ListCell for quite some time now. People like these custom macros, because they are easier to use and require less boilerplate. This adds similar custom versions of Postgres its `forboth` macro. Now you don't need ListCells anymore when looping over two lists at the same time.	2022-03-23 14:50:36 +03:00
Onder Kalaci	af4ba3eb1f	Remove citus.enable_cte_inlining GUC In Postgres 12+, users can adjust whether to inline/not inline CTEs by [NOT] MATERIALIZED keywords. So, this GUC is already useless.	2022-03-22 17:14:44 +01:00
Jelte Fennema	e5d5c7be93	Start erroring out for unsupported lateral subqueries (#5753 ) With the introduction of #4385 we inadvertently started allowing and pushing down certain lateral subqueries that were unsafe to push down. To be precise the type of LATERAL subqueries that is unsafe to push down has all of the following properties: 1. The lateral subquery contains some non recurring tuples 2. The lateral subquery references a recurring tuple from outside of the subquery (recurringRelids) 3. The lateral subquery requires a merge step (e.g. a LIMIT) 4. The reference to the recurring tuple should be something else than an equality check on the distribution column, e.g. equality on a non distribution column. Property number four is considered both hard to detect and probably not used very often. Thus this PR ignores property number four and causes query planning to error out if the first three properties hold. Fixes #5327	2022-03-11 11:59:18 +01:00
Marco Slot	49467e27e6	Ensure worker_save_query_explain_analyze always fully qualifies types (#5776 ) Co-authored-by: Onur Tirtir <onurcantirtir@gmail.com> Co-authored-by: Marco Slot <marco.slot@gmail.com>	2022-03-10 07:30:11 -08:00
Gledis Zeneli	2cb02bfb56	Fix node adding itself with citus_add_node leading to deadlock (Fix #5720 ) (#5758 ) If a worker node is being added, a command is sent to get the server_id of the worker from the pg_dist_node_metadata table. If the worker's id is the same as the node executing the code, we will know the node is trying to add itself. If the node tries to add itself without specifying `groupid:=0` the operation will result in an error.	2022-03-10 17:46:33 +03:00
Marco Slot	ef1ceb3953	Only use a single placement for map tasks	2022-02-23 19:40:21 +01:00
Marco Slot	3cd9aa655a	Stop using citus.binary_worker_copy_format	2022-02-23 19:40:21 +01:00
Marco Slot	5ac0d31e8b	Fix re-partition hash range generation	2022-02-23 19:40:21 +01:00
Marco Slot	72d8fde28b	Use intermediate results for re-partition joins	2022-02-23 19:40:21 +01:00
Ahmet Gedemenli	2bc6a00408	Refactor CreateDistributedTable to take column name	2022-02-21 12:07:17 +03:00
Teja Mupparti	46fa47beea	Force-delegated functions' distribution argument must be reset as soon as the routine completes execution, and not wait until the top level Executor ends. This fixes issue #5687	2022-02-17 10:48:30 -08:00
Marco Slot	d0711ea9b4	Delegate function calls in FROM outside of transaction block	2022-02-09 20:56:25 +01:00

1 2 3 4 5 ...

789 Commits (fa7b8949a88bf614d5a07fc33f6159d9efa5d087)