citus

Commit Graph

Author	SHA1	Message	Date
Onder Kalaci	f144bb4911	Introduce fast path router planning In this context, we define "Fast Path Planning for SELECT" as trivial queries where Citus can skip relying on the standard_planner() and handle all the planning. For router planner, standard_planner() is mostly important to generate the necessary restriction information. Later, the restriction information generated by the standard_planner is used to decide whether all the shards that a distributed query touches reside on a single worker node. However, standard_planner() does a lot of extra things such as cost estimation and execution path generations which are completely unnecessary in the context of distributed planning. There are certain types of queries where Citus could skip relying on standard_planner() to generate the restriction information. For queries in the following format, Citus does not need any information that the standard_planner() generates: SELECT ... FROM single_table WHERE distribution_key = X; or DELETE FROM single_table WHERE distribution_key = X; or UPDATE single_table SET value_1 = value_2 + 1 WHERE distribution_key = X; Note that the queries might not be as simple as the above such that GROUP BY, WINDOW FUNCIONS, ORDER BY or HAVING etc. are all acceptable. The only rule is that the query is on a single distributed (or reference) table and there is a "distribution_key = X;" in the WHERE clause. With that, we could use to decide the shard that a distributed query touches reside on a worker node.	2019-02-21 13:27:01 +03:00
Hanefi Onaldi	1106e14385	Wrap functions in subqueries remove debug logs to fix travis tests Support RowType functions in joins Regression tests for a custom type function in join	2019-02-04 19:19:29 +03:00
Nils Dijk	4af40eee76	Enable SSL by default during installation of citus	2018-12-07 11:23:19 -07:00
velioglu	8764a19464	Adds support for disabling hash agg with hll functions on coordinator query	2018-12-07 18:49:25 +03:00
Marco Slot	8893cc141d	Support INSERT...SELECT with ON CONFLICT or RETURNING via coordinator Before this commit, Citus supported INSERT...SELECT queries with ON CONFLICT or RETURNING clauses only for pushdownable ones, since queries supported via coordinator were utilizing COPY infrastructure of PG to send selected tuples to the target worker nodes. After this PR, INSERT...SELECT queries with ON CONFLICT or RETURNING clauses will be performed in two phases via coordinator. In the first phase selected tuples will be saved to the intermediate table which is colocated with target table of the INSERT...SELECT query. Note that, a utility function to save results to the colocated intermediate result also implemented as a part of this commit. In the second phase, INSERT.. SELECT query is directly run on the worker node using the intermediate table as the source table.	2018-11-30 15:29:12 +03:00
Hanefi Onaldi	b3d897039a	constraint validation regression tests	2018-11-26 14:04:51 +03:00
Hadi Moshayedi	d3e284dcd6	Use heap_deform_tuple() instead of calling heap_getattr(). (#2464 ) After Fast ALTER TABLE ADD COLUMN with a non-NULL default in PG11, physical heaps might not contain all attributes after a ALTER TABLE ADD COLUMN happens. heap_getattr() returns NULL when the physical tuple doesn't contain an attribute. So we should use heap_deform_tuple() in these cases, which fills in the missing attributes. Our catalog tables evolve over time, and an upgrade might involve some ALTER TABLE ADD COLUMN commands. Note that we don't need to worry about postgres catalog tables and we can use heap_getattr() for them, because they only change between major versions. This also fixes #2453.	2018-11-05 15:11:01 -05:00
Onder Kalaci	73696a03e4	Make sure not to leak intermediate result folders on the workers	2018-10-09 22:47:56 +03:00
Onder Kalaci	c1b5a04f6e	Allow partitioned tables with replication factor > 1 With this commit, we all partitioned distributed tables with replication factor > 1. However, we also have many restrictions. In summary, we disallow all kinds of modifications (including DDLs) on the partition tables. Instead, the user is allowed to run the modifications over the parent table. The necessity for such a restriction have two aspects: - We need to acquire shard resource locks appropriately - We need to handle marking partitions INVALID in case of any failures. Note that, in theory, the parent table should also become INVALID, which is too aggressive.	2018-09-21 14:40:41 +03:00
Murat Tuncer	ae0032dff8	Add regression tests for procedure calls PG11 introduced PROCEDURE concept similar to FUNCTION Procedure's allow committing/rolling back behavior. This commmit adds regression tests for procedure calls.	2018-09-12 10:28:50 +03:00
Murat Tuncer	470ee0b4d9	Revert multi_partition test back to being required Test was marked as optional (ignore) by previous commit. Reverting that change to make test required	2018-09-11 12:39:44 -06:00
Jason Petersen	c4e2349b80	Mark failing PostgreSQL 11 test as ignored This commit should be reverted once a new PostgreSQL 11 beta is available: it's due to a bug in the partitioning code which has been fixed in REL_11_STABLE but (not yet) a released tag.	2018-08-16 19:37:37 -06:00
mehmet furkan şahin	df11dda750	hll aggregates are tested	2018-07-05 08:19:01 +03:00
Onder Kalaci	d83be3a33f	Enforce foreign key restrictions inside transaction blocks When a hash distributed table have a foreign key to a reference table, there are few restrictions we have to apply in order to prevent distributed deadlocks or reading wrong results. The necessity to apply the restrictions arise from cascading nature of foreign keys. When a foreign key on a reference table cascades to a distributed table, a single operation over a single connection can acquire locks on multiple shards of the distributed table. Thus, any parallel operation on that distributed table, in the same transaction should not open parallel connections to the shards. Otherwise, we'd either end-up with a self-distributed deadlock or read wrong results. As briefly described above, the restrictions that we apply is done by tracking the distributed/reference relation accesses inside transaction blocks, and act accordingly when necessary. The two main rules are as follows: - Whenever a parallel distributed relation access conflicts with a consecutive reference relation access, Citus errors out - Whenever a reference relation access is followed by a conflicting parallel relation access, the execution mode is switched to sequential mode. There are also some other notes to mention: - If the user does SET LOCAL citus.multi_shard_modify_mode TO 'sequential';, all the queries should simply work with using one connection per worker and sequentially executing the commands. That's obviously a slower approach than Citus' usual parallel execution. However, we've at least have a way to run all commands successfully. - If an unrelated parallel query executed on any distributed table, we cannot switch to sequential mode. Because, the essense of sequential mode is using one connection per worker. However, in the presence of a parallel connection, the connection manager picks those connections to execute the commands. That contradicts with our purpose, thus we error out. - COPY to a distributed table cannot be executed in sequential mode. Thus, if we switch to sequential mode and COPY is executed, the operation fails and there is currently no way of implementing that. Note that, when the local table is not empty and create_distributed_table is used, citus uses COPY internally. Thus, in those cases, create_distributed_table() will also fail. - There is a GUC called citus.enforce_foreign_key_restrictions to disable all the checks. We added that GUC since the restrictions we apply is sometimes a bit more restrictive than its necessary. The user might want to relax those. Similarly, if you don't have CASCADEing reference tables, you might consider disabling all the checks.	2018-07-03 17:05:55 +03:00
velioglu	6be6911ed9	Create foreign key relation graph and functions to query on it	2018-07-03 17:05:55 +03:00
mehmet furkan şahin	2fa4e38841	FK from dist to ref can be added with alter table	2018-07-03 17:05:01 +03:00
Onder Kalaci	2f01894589	Track relation accesses using the connection management infrastructure	2018-06-25 18:40:30 +03:00
velioglu	53b2e81d01	Adds SELECT ... FOR UPDATE support for router plannable queries	2018-06-18 13:55:17 +03:00
Jason Petersen	5bf7bc64ba	Add pg_dist_authinfo schema and validation This table will be used by Citus Enterprise to populate authentication- related fields in outbound connections; Citus Community lacks support for this functionality.	2018-06-13 11:16:26 -06:00
Onder Kalaci	df44956dc3	Make sure that sequential DDL opens a single connection to each node After this commit DDL commands honour `citus.multi_shard_modify_mode`. We preferred using the code-path that executes single task router queries (e.g., ExecuteSingleModifyTask()) in order not to invent a new executor that is only applicable for DDL commands that require sequential execution.	2018-06-05 17:52:17 +03:00
Onder Kalaci	04d9e886fe	Run concurrent modification queries in tests sequentially	2018-05-10 11:59:18 +03:00
mehmet furkan şahin	785a86ed0a	Tests are updated to use create_distributed_table	2018-05-10 11:18:59 +03:00
Marco Slot	90cdfff602	Implement recursive planning for DML statements	2018-05-03 14:42:28 +02:00
Onder Kalaci	317dd02a2f	Implement single repartitioning on hash distributed tables * Change worker_hash_partition_table() such that the divergence between Citus planner's hashing and worker_hash_partition_table() becomes the same. * Rename single partitioning to single range partitioning. * Add single hash repartitioning. Basically, logical planner treats single hash and range partitioning almost equally. Physical planner, on the other hand, treats single hash and dual hash repartitioning almost equally (except for JoinPruning). * Add a new GUC to enable this feature	2018-05-02 18:50:55 +03:00
velioglu	121ff39b26	Removes large_table_shard_count GUC	2018-04-29 10:34:50 +02:00
Murat Tuncer	a6fe5ca183	PG11 compatibility update - changes in ruleutils_11.c is reflected - vacuum statement api change is handled. We now allow multi-table vacuum commands. - some other function header changes are reflected - api conflicts between PG11 and earlier versions are handled by adding shims in version_compat.h - various regression tests are fixed due output and functionality in PG1 - no change is made to support new features in PG11 they need to be handled by new commit	2018-04-26 11:29:43 +03:00
velioglu	1b92812be2	Add co-placement check to CoPartition function	2018-04-13 12:13:08 +03:00
velioglu	698d585fb5	Remove broadcast join logic After this change all the logic related to shard data fetch logic will be removed. Planner won't plan any ShardFetchTask anymore. Shard fetch related steps in real time executor and task-tracker executor have been removed.	2018-03-30 11:45:19 +03:00
Murat Tuncer	76f6883d5d	Add support for window functions that can be pushed down to worker (#2008 ) This is the first of series of window function work. We can now support window functions that can be pushed down to workers. Window function must have distribution column in the partition clause to be pushed down.	2018-03-01 19:07:07 +03:00
Marco Slot	c723a1fa32	Add support for bool and bit aggregates	2018-02-27 23:48:25 +01:00
Murat Tuncer	e13c5beced	Fix worker query when order by avg aggregate is used (#2024 ) We push down order by to worker query when limit is specified (with some other additional checks). If the query has an expression on an aggregate or avg aggregate by itself, and there is an order by on this particular target we may send wrong order by to worker query with potential to affect query result. The fix creates a auxilary target entry in the worker query and uses that target entry for sorting.	2018-02-28 12:12:54 +03:00
Onder Kalaci	1c930c96a3	Support non-co-located joins between subqueries With #1804 (and related PRs), Citus gained the ability to plan subqueries that are not safe to pushdown. There are two high-level requirements for pushing down subqueries: * Individual subqueries that require a merge step (i.e., GROUP BY on non-distribution key, or LIMIT in the subquery etc). We've handled such subqueries via #1876. * Combination of subqueries that are not joined on distribution keys. This commit aims to recursively plan some of such subqueries to make the whole query safe to pushdown. The main logic behind non colocated subquery joins is that we pick an anchor range table entry and check for distribution key equality of any other subqueries in the given query. If for a given subquery, we cannot find distribution key equality with the anchor rte, we recursively plan that subquery. We also used a hacky solution for picking relations as the anchor range table entries. The hack is that we wrap them into a subquery. This is only necessary since some of the attribute equivalance checks are based on queries rather than range table entries.	2018-02-26 13:50:37 +02:00
Onder Kalaci	cdb8d429a7	Add regression tests for non-colocated leaf subqueries	2018-02-26 13:28:24 +02:00
Markus Sintonen	6202e80d06	Implemented jsonb_agg, json_agg, jsonb_object_agg, json_object_agg	2018-02-18 00:19:18 +02:00
velioglu	195ac948d2	Recursively plan subqueries in WHERE clause when FROM recurs	2018-02-13 19:52:12 +03:00
Marco Slot	09c09f650f	Recursively plan set operations when leaf nodes recur	2017-12-26 13:46:55 +02:00
mehmet furkan şahin	fd546cf322	Intermediate result size limitation This commit introduces a new GUC to limit the intermediate result size which we handle when we use read_intermediate_result function for CTEs and complex subqueries.	2017-12-21 14:26:56 +03:00
Onder Kalaci	e2a5124830	Add regression tests for recursive subquery planning	2017-12-21 08:37:40 +02:00
Marco Slot	a811aad264	Deparallelise multi_modifying_xacts tests	2017-12-14 10:27:17 +01:00
mehmet furkan şahin	5851f71bfb	Add CTE regression tests	2017-12-14 09:32:55 +01:00
Marco Slot	716448ddef	Add regression tests for intermediate results	2017-12-04 14:50:11 +01:00
Marco Slot	0d6a7f5884	Add real-time BEGIN regression tests	2017-11-30 12:59:09 +01:00
Marco Slot	a9933deac6	Make real time executor work in transactions	2017-11-30 09:59:32 +03:00
mehmet furkan şahin	032b34ea52	some more parallelization	2017-11-23 14:10:42 +03:00
mehmet furkan şahin	34709c2a16	Regression tests parallelization PART-1	2017-11-20 18:03:37 +03:00
mehmet furkan şahin	314fc09d90	regression test shard_count is changed from 32 to 4	2017-11-20 12:47:49 +03:00
velioglu	0b5db5d826	Support multi shard update/delete queries	2017-10-25 15:52:38 +03:00
Murat Tuncer	f7ab901766	Add select distinct, and distinct on support Distinct, and distinct on() clauses are supported in simple selects, joins, subqueries, and insert into select queries.	2017-10-13 14:59:48 +03:00
Onder Kalaci	498ac80d8b	Add window function support for SUBQUERY PUSHDOWN and INSERT INTO SELECT This commit provides the support for window functions in subquery and insert into select queries. Note that our support for window functions is still limited because it must have a partition by clause on the distribution key. This commit makes changes in the files insert_select_planner and multi_logical_planner. The required tests are also added with files multi_subquery_window_functions.out and multi_insert_select_window.out.	2017-10-04 15:33:07 +03:00
Hadi Moshayedi	11adb9b034	Push down LIMIT and HAVING when grouped by partition key. (#1641 ) We can do this because all rows belonging to a group are in the same shard when grouping by distribution column on a range/hash distributed table.	2017-10-02 20:17:51 -04:00

1 2 3

117 Commits (acc2b0a387603136e7cbea086ccef021de7b8150)