citus

Commit Graph

Author	SHA1	Message	Date
Murat Tuncer	f20258ef10	Expand count distinct support We can now support more complex count distinct operations by pulling necessary columns to coordinator and evalutating the aggreage at coordinator. It supports broad range of expression with the restriction that the expression must contain a column.	2018-07-06 09:44:20 +03:00
mehmet furkan şahin	06217be326	hll aggregate functions are supported natively	2018-07-04 16:41:09 +03:00
Murat Tuncer	e532755a6e	Fix bug in partition column extraction added strip_implicit_coercion prior to checking if the expression is Const. This is important to find values for types like bigint.	2018-07-02 18:08:16 +03:00
Murat Tuncer	4d35b92016	Add groundwork for citus_stat_statements api	2018-06-27 14:20:03 +03:00
velioglu	53b2e81d01	Adds SELECT ... FOR UPDATE support for router plannable queries	2018-06-18 13:55:17 +03:00
Marco Slot	fd4ff29f2f	Add a debug message with distribution column value	2018-06-05 15:09:17 +03:00
velioglu	caa27161ca	Check volatile functions in modify queries	2018-05-08 11:16:40 +03:00
Marco Slot	2f9c8c6af0	Allow DML commands with unreferenced SELECT CTEs	2018-05-03 14:53:26 +02:00
Marco Slot	f8cfe07fd1	Support intermediate results in distributed INSERT..SELECT	2018-05-03 14:42:28 +02:00
Marco Slot	90cdfff602	Implement recursive planning for DML statements	2018-05-03 14:42:28 +02:00
Onder Kalaci	317dd02a2f	Implement single repartitioning on hash distributed tables * Change worker_hash_partition_table() such that the divergence between Citus planner's hashing and worker_hash_partition_table() becomes the same. * Rename single partitioning to single range partitioning. * Add single hash repartitioning. Basically, logical planner treats single hash and range partitioning almost equally. Physical planner, on the other hand, treats single hash and dual hash repartitioning almost equally (except for JoinPruning). * Add a new GUC to enable this feature	2018-05-02 18:50:55 +03:00
velioglu	32bcd610c1	Support modify queries with multiple tables With this commit we begin to support modify queries with multiple tables if these queries are pushdownable.	2018-05-02 16:22:26 +03:00
velioglu	d9fa69c031	Refactor query pushdown related logic	2018-05-02 15:03:09 +03:00
velioglu	121ff39b26	Removes large_table_shard_count GUC	2018-04-29 10:34:50 +02:00
Onder Kalaci	832c91e28c	Move processing each part of the query into its own functions This commit doesn't change any of the logic at all. Instead, the goal is to: * Get rid of any code duplication * Incremental changes to the optimizer made it slightly hard to follow the code, improve that and make it easier to implement new features * Simplify the code by moving each part of query processing (e.g., DISTINCT, LIMIT etc) into its own function * Make the interaction between each part of the query more obvious (e.g., How DISTINCT affects LIMIT etc)	2018-04-27 17:32:38 +03:00
Marco Slot	304b3a41ba	Cache the partition column Var	2018-04-26 14:58:16 -06:00
Murat Tuncer	a6fe5ca183	PG11 compatibility update - changes in ruleutils_11.c is reflected - vacuum statement api change is handled. We now allow multi-table vacuum commands. - some other function header changes are reflected - api conflicts between PG11 and earlier versions are handled by adding shims in version_compat.h - various regression tests are fixed due output and functionality in PG1 - no change is made to support new features in PG11 they need to be handled by new commit	2018-04-26 11:29:43 +03:00
Onder Kalaci	ac8f2f1e6d	Eliminate code duplication in WorkerExtendedOpNode() Before this commit, we had code duplication in the WorkerExtendedOpNode(). The duplication was noticeable and any change is prone to bugs. The PR consists of 4 commits. Each commit incrementally fixes the problem by moving certain parts of the duplicated code into smaller, better-documented functions.	2018-04-25 08:54:59 +03:00
Onder Kalaci	ee748d9140	Unify extendedOpNode Processing Before this commit, we had a divergence among the creation of master/worker extended op nodes. This commit moves the related parts into a single place and allows the creation of master/extended op nodes to share a common data structure.	2018-04-24 11:56:38 +03:00
Onder Kalaci	814f0e3acc	Ensure Citus never try to access a not planned subquery PostgreSQL might remove some of the subqueries when they do not contribute to the query result at all. Citus should not try to access such subqueries during planning.	2018-04-20 13:52:00 +03:00
mehmet furkan şahin	e5a5502b16	Adds support for multiple ANDs in Having This PR adds support for multiple AND expressions in Having for pushdown planner. We simply make a call to make_ands_explicit from MultiLogicalPlanOptimize for the having qual in workerExtendedOpNode.	2018-04-16 14:14:48 +03:00
velioglu	82b2d21b0c	Convert broadcast join to reference join After this commit large_table_shard_count wont be used to check whether broadcast join, which is renamed as reference join, can be applied. Reference join can only be applied over reference tables.	2018-04-13 12:58:14 +03:00
velioglu	1b92812be2	Add co-placement check to CoPartition function	2018-04-13 12:13:08 +03:00
Marco Slot	ee132c5ead	Prune shards once per relation in subquery pushdown	2018-04-10 20:33:07 +02:00
velioglu	72dfe4a289	Adds colocation check to local join	2018-04-04 22:49:27 +03:00
velioglu	698d585fb5	Remove broadcast join logic After this change all the logic related to shard data fetch logic will be removed. Planner won't plan any ShardFetchTask anymore. Shard fetch related steps in real time executor and task-tracker executor have been removed.	2018-03-30 11:45:19 +03:00
Murat Tuncer	1440caeef2	Fix incorrect limit pushdown when distinct clause is not superset of group by (#2035 ) Pushing down limit and order by into workers may produce wrong output when distinct on() clause has expressions, aggregates, or window functions. This checking allows pushing down of limits only if distinct clause is a superset of group by clause. i.e. it contains all clauses in group by.	2018-03-07 13:24:56 +03:00
Onder Kalaci	40b898b59f	Improve error messages for INSERT queries that have subqueries	2018-03-05 14:46:47 +02:00
Murat Tuncer	76f6883d5d	Add support for window functions that can be pushed down to worker (#2008 ) This is the first of series of window function work. We can now support window functions that can be pushed down to workers. Window function must have distribution column in the partition clause to be pushed down.	2018-03-01 19:07:07 +03:00
Marco Slot	e79db17b91	Update comment in WorkerAggregateExpressionList	2018-02-27 23:48:25 +01:00
Murat Tuncer	e13c5beced	Fix worker query when order by avg aggregate is used (#2024 ) We push down order by to worker query when limit is specified (with some other additional checks). If the query has an expression on an aggregate or avg aggregate by itself, and there is an order by on this particular target we may send wrong order by to worker query with potential to affect query result. The fix creates a auxilary target entry in the worker query and uses that target entry for sorting.	2018-02-28 12:12:54 +03:00
Metin Doslu	bcf660475a	Add support for modifying CTEs	2018-02-27 15:08:32 +02:00
velioglu	78e6d990a2	Fix master plan of the query with distinct, aggregate and group by clauses. Before this PR, we were trusting on the columns of group by about guaranteeing the uniqueness of the results. However, this assumption is correct only if the columns in the group by is subset of columns in the distinct clause. It can be wrong if we have part of group by columns and some aggregation columns in the distinct clause. With this PR, we add distinct plan on top of aggregate plan when necessary.	2018-02-26 15:30:15 +03:00
Onder Kalaci	1c930c96a3	Support non-co-located joins between subqueries With #1804 (and related PRs), Citus gained the ability to plan subqueries that are not safe to pushdown. There are two high-level requirements for pushing down subqueries: * Individual subqueries that require a merge step (i.e., GROUP BY on non-distribution key, or LIMIT in the subquery etc). We've handled such subqueries via #1876. * Combination of subqueries that are not joined on distribution keys. This commit aims to recursively plan some of such subqueries to make the whole query safe to pushdown. The main logic behind non colocated subquery joins is that we pick an anchor range table entry and check for distribution key equality of any other subqueries in the given query. If for a given subquery, we cannot find distribution key equality with the anchor rte, we recursively plan that subquery. We also used a hacky solution for picking relations as the anchor range table entries. The hack is that we wrap them into a subquery. This is only necessary since some of the attribute equivalance checks are based on queries rather than range table entries.	2018-02-26 13:50:37 +02:00
Onder Kalaci	7b57e0562a	Add infrastructure for detecting non-colocated subqueries	2018-02-26 13:28:25 +02:00
Onder Kalaci	4d70c86645	Leaf level recursive planning for non colocated subqueries With this commit, we enable recursive planning for the subqueries that are not joined on the distribution keys.	2018-02-26 13:28:24 +02:00
Onder Kalaci	e998703ff8	Enable restriction eq. checks for top level set operations We used to only support pushdownable set operations inside a subquery, however, we could easily expand the restriction checks to cover top level set operations as well.	2018-02-26 13:28:24 +02:00
Onder Kalaci	e8aa532a90	Refactor checks for distribution key equality Change some function names, ensure we stick to Citus' function order rules etc.	2018-02-26 13:28:24 +02:00
Markus Sintonen	6202e80d06	Implemented jsonb_agg, json_agg, jsonb_object_agg, json_object_agg	2018-02-18 00:19:18 +02:00
velioglu	195ac948d2	Recursively plan subqueries in WHERE clause when FROM recurs	2018-02-13 19:52:12 +03:00
Marco Slot	ee6a751798	Only copy distributed plan when modifying it	2018-02-12 16:30:55 +01:00
Onder Kalaci	94c5ac6ebb	Remove duplicate join restrictions We use PostgreSQL hooks to accumulate the join restrictions and PostgreSQL gives us all the join paths it tries while deciding on the join order. Thus, for queries that have many joins, this function is likely to remove lots of duplicate join restrictions. This becomes relevant for Citus on query pushdown check peformance.	2018-02-12 18:35:05 +02:00
Onder Kalaci	c228d8ff3d	Refactor equivalance generation related codes This commit changes the APIs for restriction generation to make future changes simpler.	2018-02-12 18:35:04 +02:00
Onder Kalaci	2f2d350924	Refactor relation restriction related codes This commit moves some of the functions to a more relevant source file.	2018-02-12 18:35:04 +02:00
Murat Tuncer	901b543e20	Fix count distinct using field select on top level query We were allowing count distict queries even if they were not directly on columns if the query is grouped on distribution column. When performing these checks we were skipping subqueries because they also perform this check in a more concise manner. We relied on oid SUBQUERY_RELATION_ID (10000) to decide if a given RTE relation id denotes a subquery, however, we also use SUBQUERY_PUSHDOWN_RELATION_ID (10001) for some subqueries. We skip both type of subqueries with this change.	2018-02-06 13:16:10 +03:00
metdos	35f864bcaf	Respect enable_hashagg in the master planner	2018-02-05 15:06:00 +02:00
metdos	3d540d961c	Fix typo in grouping_is_sortable()	2018-02-05 12:10:19 +02:00
Onder Kalaci	a1bbdf2d44	Outer joins should also use subquery pushdown planner if join clause is not supported This change allows unsupported clauses to go through query pushdown planner instead of erroring out as we already do for non-outer joins.	2017-12-29 16:40:47 +02:00
Marco Slot	09c09f650f	Recursively plan set operations when leaf nodes recur	2017-12-26 13:46:55 +02:00
mehmet furkan şahin	446893234a	unsupported subquery error messages are fixed	2017-12-25 15:10:59 +03:00
mehmet furkan şahin	57bc86e23d	new debug output for subplans	2017-12-25 09:50:51 +03:00
Murat Tuncer	87c6f306f1	Fix join clause eq restrictions (#1884 ) We used to error out if the join clause includes filters like t1.a < t2.a even if other filter like t1.key = t2.key exists. Recently we lifted that restriction in subquery planning by not lifting that restriction and focusing on equivalance classes provided by postgres. This checkin forwards previously erroring out real-time queries due to join clauses to subquery planner and let it handle the join even if the query does not have a subquery. We are now pushing down queries that do not have any subqueries in it. Error message looked misleading, changed to a more descriptive one.	2017-12-22 12:16:14 +03:00
Murat Tuncer	a9cf0c3e66	Fix CTE column alias issue (#1893 ) We were creating intermediate query result's target names from subquery target list. Now we also check if cte re-defines its column name aliases, and create intermediate result query accordingly.	2017-12-22 09:39:40 +03:00
Onder Kalaci	0d5a4b9c72	Recursively plan subqueries that are not safe to pushdown With this commit, Citus recursively plans subqueries that are not safe to pushdown, in other words, requires a merge step. The algorithm is simple: Recursively traverse the query from bottom up (i.e., bottom meaning the leaf queries). On each level, check whether the query is safe to pushdown (or a single repartition subquery). If the answer is yes, do not touch that subquery. If the answer is no, plan the subquery seperately (i.e., create a subPlan for it) and replace the subquery with a call to `read_intermediate_results(planId, subPlanId)`. During the the execution, run the subPlans first, and make them avaliable to the next query executions. Some of the queries hat this change allows us: * Subqueries with LIMIT * Subqueries with GROUP BY/DISTINCT on non-partition keys * Subqueries involving re-partition joins, router queries * Mixed usage of subqueries and CTEs (i.e., use CTEs in subqueries as well). Nested subqueries as long as we support the subquery inside the nested subquery. * Subqueries with local tables (i.e., those subqueries has the limitation that they have to be leaf subqueries) * VIEWs on the distributed tables just works (i.e., the limitations mentioned below still applies to views) Some of the queries that is still NOT supported: * Corrolated subqueries that are not safe to pushdown * Window function on non-partition keys * Recursively planned subqueries or CTEs on the outer side of an outer join * Only recursively planned subqueries and CTEs in the FROM (i.e., not any distributed tables in the FROM) and subqueries in WHERE clause * Subquery joins that are not on the partition columns (i.e., each subquery is individually joined on partition keys but not the upper level subquery.) * Any limitation that logical planner applies such as aggregate distincts (except for count) when GROUP BY is on non-partition key, or array_agg with ORDER BY	2017-12-21 08:37:40 +02:00
Onder Kalaci	e12ea914b9	Refactor ErrorIfQueryNotSupported to defer errors	2017-12-20 09:03:49 +02:00
Onder Kalaci	71ce42b936	Refactor RecursivelyPlanSubqueriesAndCTEs() to make it ready to work with subqueries	2017-12-20 09:03:47 +02:00
Marco Slot	5e0539efa3	Plan CTEs when subquery pushdown is on	2017-12-19 16:34:56 +01:00
Marco Slot	44a1ea631a	Show distributed subplan ID in EXPLAIN output	2017-12-19 16:34:56 +01:00
Marco Slot	7dab078e67	Set cost estimates for read_intermediate_result	2017-12-18 16:23:44 +01:00
Marco Slot	74bd33d0cc	Revert "Plan CTEs when subquery pushdown is on" This reverts commit `e3b953b8e3`.	2017-12-17 22:34:20 +01:00
Marco Slot	aca5f35ab9	Revert "Show distributed subplan ID in EXPLAIN output" This reverts commit `686b079272`.	2017-12-17 22:34:04 +01:00
Marco Slot	e3b953b8e3	Plan CTEs when subquery pushdown is on	2017-12-17 21:49:36 +01:00
Marco Slot	686b079272	Show distributed subplan ID in EXPLAIN output	2017-12-16 11:32:01 +01:00
Marco Slot	ea6b98fda4	Allow count(distinct) in queries with a subquery	2017-12-15 15:24:26 +01:00
Marco Slot	5a69fc1b17	Relax checks on recurring tuples in FROM with sublinks	2017-12-15 11:56:06 +01:00
Marco Slot	2e2b4e81fa	Add support for CTEs in distributed queries	2017-12-14 09:32:55 +01:00
Marco Slot	66f9f1d6cd	Make some intermediate results functions public	2017-12-14 09:32:55 +01:00
Onder Kalaci	86b2d9420c	Treat recurring tuples as reference table for GROUP BY checks read_intermediate_results() and immutable functions are implemented. Empty join trees seems not applicable here.	2017-12-13 14:55:42 +02:00
Marco Slot	60a1e31671	Allow queries with local tables in NeedsDistributedPlanning	2017-12-07 16:20:23 +01:00
Marco Slot	d8fea4efb8	Revert "Allow queries with local tables in NeedsDistributedPlanning" This reverts commit `d2bac081e8`.	2017-12-07 11:19:11 +01:00
Marco Slot	d2bac081e8	Allow queries with local tables in NeedsDistributedPlanning	2017-12-07 11:02:16 +01:00
Onder Kalaci	c42a92afd2	Fix bug related to incrementing an index not properly	2017-12-07 08:50:57 +02:00
Marco Slot	7279d42849	Treat read_intermediate_result as recurring tuples	2017-12-04 14:50:11 +01:00
Murat Tuncer	2d66bf5f16	Fix hard coded formatting strings for 64 bit numbers (#1831 ) Postgres provides OS agnosting formatting macros for formatting 64 bit numbers. Replaced %ld %lu with INT64_FORMAT and UINT64_FORMAT respectively. Also found some incorrect usages of formatting flags and fixed them.	2017-12-04 14:11:06 +03:00
Onder Kalaci	a273711500	The common attribute equivalance class always includes the input relations We added the ability to filter out the planner restriction information for specific parts of the query. This might lead to situations where the common restriction includes some other relations that we're searching for. The reason is that while filtering for join restrictions, we add the restriction as soon as we find the relation. With this commit we make sure that the common attribute equivalance class always includes the input relations.	2017-11-30 16:00:26 +02:00
Marco Slot	3a4d5f8182	Remove filter checks on leaf queries	2017-11-30 12:25:14 +01:00
Marco Slot	3f03cb6a6a	Support UNION with joins in the subqueries	2017-11-30 10:37:56 +01:00
Marco Slot	a9933deac6	Make real time executor work in transactions	2017-11-30 09:59:32 +03:00
Marco Slot	7ea718fd8d	Round-robin over worker nodes for 0-shard router queries	2017-11-29 15:52:22 +01:00
Onder Kalaci	05fb0dd020	Add infrastructure for filtering restriction contexts based on the input query In subquery pushdown, we first ensure that each relation is joined with at least on another relation on the partition keys. That's fine given that the decision is binary: pushdown the query at all or not. With recursive planning, we'd want to check whether any specific part of the query can be pushded down or not. Thus, we need the ability to understand which part(s) of the subquery is safe to pushdown. This commit adds the infrastructure for doing that.	2017-11-28 09:58:21 +02:00
Onder Kalaci	26d9b58e9e	Make sure that ExtractRangeTableRelationWalker never misses RTE_RELATION	2017-11-28 09:27:34 +02:00
Onder Kalaci	32def06ebd	Split assigning RTE identities and partitioning related query modifications Note that we used to iterate over the RTEs once for performance reasons. However, keeping an extra copy of original query seems more costly and hard to maintain/explain.	2017-11-28 09:27:34 +02:00
Marco Slot	feffe86440	Subqueries containing functions go through subquery pushdown	2017-11-27 22:13:02 +01:00
Onder Kalaci	48f96bf3e5	Enable non equi joins in subquery pushdown Subquery pushdown planning is based on relation restriction equivalnce. This brings us the opportuneatly to allow any other joins as long as there is an already equi join between the distributed tables. We already allow that for joins with reference tables and this commit allows that for joins among distributed tables.	2017-11-23 16:13:46 +02:00
Onder Kalaci	83c1143505	Refactor custom scan related codes In this commit, we don't change any codes, only create a new file and move the related functions and types there.	2017-11-23 11:38:12 +02:00
Marco Slot	6ba3f42d23	Rename MultiPlan to DistributedPlan	2017-11-22 09:36:24 +01:00
Marco Slot	0ad39b36fe	Treat immutable table functions and constant subqueries as reference tables	2017-11-21 14:15:22 +01:00
Onder Kalaci	d558ebb923	Relax the checks on ensuring distribution columns for target entries With this commit, we allow pushing down subqueries with only reference tables where GROUP BY or DISTINCT clause or Window functions include only columns from reference tables.	2017-11-21 12:28:14 +02:00
Brian Cloutier	7be1545843	Support implicit casts during INSERT/SELECT It's possible to build INSERT SELECT queries which include implicit casts, currently we attempt to support these by adding explicit casts to the SELECT query, but this sometimes crashes because we don't update all nodes with the new types. (SortClauses, for instance) This commit removes those explicit casts and passes an unmodified SELECT query to the COPY executor (how we implement INSERT SELECT under the scenes). In lieu of those cases, COPY has been given some extra logic to inspect queries, notice that the types don't line up with the table it's supposed to be inserting into, and "manually" casting every tuple before sending them to workers.	2017-11-03 22:27:15 -07:00
Marco Slot	6219186683	Allow distributed INSERT...SELECT via worker nodes in MX	2017-11-02 14:38:39 +01:00
metdos	8c356b2bc8	Don't try to add restrictions for reference tables in insert into select	2017-10-31 19:44:10 +02:00
Murat Tuncer	e16805215d	Support count(distinct) for non-partition columns (#1692 ) Expands count distinct coverage by allowing more cases. We used to support count distinct only if we can push down distinct aggregate to worker query i.e. the count distinct clause was on the partition column of the table, or there was a grouping on the partition column. Now we can support - non-partition columns, with or without grouping on partition column - partition, and non partition column in the same query - having clause - single table subqueries - insert into select queries - join queries where count distinct is on partition, or non-partition column - filters on count distinct clauses (extends existing support) We first try to push down aggregate to worker query (original case), if we can't then we modify worker query to return distinct columns to coordinator node. We do that by adding distinct column targets to group by clauses. Then we perform count distinct operation on the coordinator node. This work should reduce the cases where HLL is used as it can address anything that HLL can. However, if we start having performance issues due to very large number rows, then we can recommend hll use.	2017-10-30 13:12:24 +02:00
velioglu	0b5db5d826	Support multi shard update/delete queries	2017-10-25 15:52:38 +03:00
Murat Tuncer	4832abc7cb	Make multi_master_planner.c coding convention compliant Changed order of function definitions and added declarations in the beginning of the file	2017-10-13 14:59:48 +03:00
Murat Tuncer	f7ab901766	Add select distinct, and distinct on support Distinct, and distinct on() clauses are supported in simple selects, joins, subqueries, and insert into select queries.	2017-10-13 14:59:48 +03:00
Onder Kalaci	498ac80d8b	Add window function support for SUBQUERY PUSHDOWN and INSERT INTO SELECT This commit provides the support for window functions in subquery and insert into select queries. Note that our support for window functions is still limited because it must have a partition by clause on the distribution key. This commit makes changes in the files insert_select_planner and multi_logical_planner. The required tests are also added with files multi_subquery_window_functions.out and multi_insert_select_window.out.	2017-10-04 15:33:07 +03:00
Hadi Moshayedi	11adb9b034	Push down LIMIT and HAVING when grouped by partition key. (#1641 ) We can do this because all rows belonging to a group are in the same shard when grouping by distribution column on a range/hash distributed table.	2017-10-02 20:17:51 -04:00
Jason Petersen	d686123dae	Omit now-public Explain methods from PG11 build This copy-pasted code is no longer needed in PG11.	2017-09-25 17:20:24 -07:00
Jason Petersen	6c9b19a954	Add version-compat header For polyfill macros, etc.	2017-09-25 17:20:23 -07:00
Jason Petersen	fbeaa2f9d0	Remove direct access to tupleDesc->attrs A level of indirection was removed from this field for PostgreSQL 11. By using the handy provided macro, we can be version agnostic.	2017-09-25 17:20:23 -07:00

1 2 3 4 5 ...

326 Commits (796141334dde262da3ebdf449423a2be5d207761)