citus

Commit Graph

Author	SHA1	Message	Date
Onder Kalaci	64560b07be	Update regression tests-2 In this commit, we're introducing a way to prevent CTE inlining via a GUC. The GUC is used in all the tests where PG 11 and PG 12 tests would diverge otherwise. Note that, in PG 12, the restriction information for CTEs are generated. It means that for some queries involving CTEs, Citus planner (router planner/ pushdown planner) may behave differently. So, via the GUC, we prevent tests to diverge on PG 11 vs PG 12. When we drop PG 11 support, we should get rid of the GUC, and mark relevant ctes as MATERIALIZED, which does the same thing.	2020-01-16 12:28:15 +01:00
Philip Dubé	261a9de42d	Fix typos: VAR_SET_VALUE_KIND -> VAR_SET_VALUE kind beginnig -> beginning plannig -> planning the the -> the er then -> er than	2019-11-25 23:24:13 +00:00
Philip Dubé	c563e0825c	Strip trailing whitespace and add final newline (#3186 ) This brings files in line with our editorconfig file	2019-11-21 14:25:37 +01:00
Philip Dubé	befd0caddd	Tests: normalize sql_procedure and custom_aggregate_support Also fix typo in multi_insert_select	2019-07-10 14:36:17 +00:00
Önder Kalacı	40da78c6fd	Introduce the adaptive executor (#2798 ) With this commit, we're introducing the Adaptive Executor. The commit message consists of two distinct sections. The first part explains how the executor works. The second part consists of the commit messages of the individual smaller commits that resulted in this commit. The readers can search for the each of the smaller commit messages on https://github.com/citusdata/citus and can learn more about the history of the change. /------------------------------------------------------------------------- * adaptive_executor.c * * The adaptive executor executes a list of tasks (queries on shards) over * a connection pool per worker node. The results of the queries, if any, * are written to a tuple store. * * The concepts in the executor are modelled in a set of structs: * * - DistributedExecution: * Execution of a Task list over a set of WorkerPools. * - WorkerPool * Pool of WorkerSessions for the same worker which opportunistically * executes "unassigned" tasks from a queue. * - WorkerSession: * Connection to a worker that is used to execute "assigned" tasks * from a queue and may execute unasssigned tasks from the WorkerPool. * - ShardCommandExecution: * Execution of a Task across a list of placements. * - TaskPlacementExecution: * Execution of a Task on a specific placement. * Used in the WorkerPool and WorkerSession queues. * * Every connection pool (WorkerPool) and every connection (WorkerSession) * have a queue of tasks that are ready to execute (readyTaskQueue) and a * queue/set of pending tasks that may become ready later in the execution * (pendingTaskQueue). The tasks are wrapped in a ShardCommandExecution, * which keeps track of the state of execution and is referenced from a * TaskPlacementExecution, which is the data structure that is actually * added to the queues and describes the state of the execution of a task * on a particular worker node. * * When the task list is part of a bigger distributed transaction, the * shards that are accessed or modified by the task may have already been * accessed earlier in the transaction. We need to make sure we use the * same connection since it may hold relevant locks or have uncommitted * writes. In that case we "assign" the task to a connection by adding * it to the task queue of specific connection (in * AssignTasksToConnections). Otherwise we consider the task unassigned * and add it to the task queue of a worker pool, which means that it * can be executed over any connection in the pool. * * A task may be executed on multiple placements in case of a reference * table or a replicated distributed table. Depending on the type of * task, it may not be ready to be executed on a worker node immediately. * For instance, INSERTs on a reference table are executed serially across * placements to avoid deadlocks when concurrent INSERTs take conflicting * locks. At the beginning, only the "first" placement is ready to execute * and therefore added to the readyTaskQueue in the pool or connection. * The remaining placements are added to the pendingTaskQueue. Once * execution on the first placement is done the second placement moves * from pendingTaskQueue to readyTaskQueue. The same approach is used to * fail over read-only tasks to another placement. * * Once all the tasks are added to a queue, the main loop in * RunDistributedExecution repeatedly does the following: * * For each pool: * - ManageWorkPool evaluates whether to open additional connections * based on the number unassigned tasks that are ready to execute * and the targetPoolSize of the execution. * * Poll all connections: * - We use a WaitEventSet that contains all (non-failed) connections * and is rebuilt whenever the set of active connections or any of * their wait flags change. * * We almost always check for WL_SOCKET_READABLE because a session * can emit notices at any time during execution, but it will only * wake up WaitEventSetWait when there are actual bytes to read. * * We check for WL_SOCKET_WRITEABLE just after sending bytes in case * there is not enough space in the TCP buffer. Since a socket is * almost always writable we also use WL_SOCKET_WRITEABLE as a * mechanism to wake up WaitEventSetWait for non-I/O events, e.g. * when a task moves from pending to ready. * * For each connection that is ready: * - ConnectionStateMachine handles connection establishment and failure * as well as command execution via TransactionStateMachine. * * When a connection is ready to execute a new task, it first checks its * own readyTaskQueue and otherwise takes a task from the worker pool's * readyTaskQueue (on a first-come-first-serve basis). * * In cases where the tasks finish quickly (e.g. <1ms), a single * connection will often be sufficient to finish all tasks. It is * therefore not necessary that all connections are established * successfully or open a transaction (which may be blocked by an * intermediate pgbouncer in transaction pooling mode). It is therefore * essential that we take a task from the queue only after opening a * transaction block. * * When a command on a worker finishes or the connection is lost, we call * PlacementExecutionDone, which then updates the state of the task * based on whether we need to run it on other placements. When a * connection fails or all connections to a worker fail, we also call * PlacementExecutionDone for all queued tasks to try the next placement * and, if necessary, mark shard placements as inactive. If a task fails * to execute on all placements, the execution fails and the distributed * transaction rolls back. * * For multi-row INSERTs, tasks are executed sequentially by * SequentialRunDistributedExecution instead of in parallel, which allows * a high degree of concurrency without high risk of deadlocks. * Conversely, multi-row UPDATE/DELETE/DDL commands take aggressive locks * which forbids concurrency, but allows parallelism without high risk * of deadlocks. Note that this is unrelated to SEQUENTIAL_CONNECTION, * which indicates that we should use at most one connection per node, but * can run tasks in parallel across nodes. This is used when there are * writes to a reference table that has foreign keys from a distributed * table. * * Execution finishes when all tasks are done, the query errors out, or * the user cancels the query. * ------------------------------------------------------------------------- / All the commits involved here: * Initial unified executor prototype * Latest changes * Fix rebase conflicts to master branch * Add missing variable for assertion * Ensure that master_modify_multiple_shards() returns the affectedTupleCount * Adjust intermediate result sizes The real-time executor uses COPY command to get the results from the worker nodes. Unified executor avoids that which results in less data transfer. Simply adjust the tests to lower sizes. * Force one connection per placement (or co-located placements) when requested The existing executors (real-time and router) always open 1 connection per placement when parallel execution is requested. That might be useful under certain circumstances: (a) User wants to utilize as much as CPUs on the workers per distributed query (b) User has a transaction block which involves COPY command Also, lots of regression tests rely on this execution semantics. So, we'd enable few of the tests with this change as well. * For parameters to be resolved before using them For the details, see PostgreSQL's copyParamList() * Unified executor sorts the returning output * Ensure that unified executor doesn't ignore sequential execution of DDLJob's Certain DDL commands, mainly creating foreign keys to reference tables, should be executed sequentially. Otherwise, we'd end up with a self distributed deadlock. To overcome this situaiton, we set a flag `DDLJob->executeSequentially` and execute it sequentially. Note that we have to do this because the command might not be called within a transaction block, and we cannot call `SetLocalMultiShardModifyModeToSequential()`. This fixes at least two test: multi_insert_select_on_conflit.sql and multi_foreign_key.sql Also, I wouldn't mind scattering local `targetPoolSize` variables within the code. The reason is that we'll soon have a GUC (or a global variable based on a GUC) that'd set the pool size. In that case, we'd simply replace `targetPoolSize` with the global variables. * Fix 2PC conditions for DDL tasks * Improve closing connections that are not fully established in unified execution * Support foreign keys to reference tables in unified executor The idea for supporting foreign keys to reference tables is simple: Keep track of the relation accesses within a transaction block. - If a parallel access happens on a distributed table which has a foreign key to a reference table, one cannot modify the reference table in the same transaction. Otherwise, we're very likely to end-up with a self-distributed deadlock. - If an access to a reference table happens, and then a parallel access to a distributed table (which has a fkey to the reference table) happens, we switch to sequential mode. Unified executor misses the function calls that marks the relation accesses during the execution. Thus, simply add the necessary calls and let the logic kick in. * Make sure to close the failed connections after the execution * Improve comments * Fix savepoints in unified executor. * Rebuild the WaitEventSet only when necessary * Unclaim connections on all errors. * Improve failure handling for unified executor - Implement the notion of errorOnAnyFailure. This is similar to Critical Connections that the connection managament APIs provide - If the nodes inside a modifying transaction expand, activate 2PC - Fix few bugs related to wait event sets - Mark placement INACTIVE during the execution as much as possible as opposed to we do in the COMMIT handler - Fix few bugs related to scheduling next placement executions - Improve decision on when to use 2PC Improve the logic to start a transaction block for distributed transactions - Make sure that only reference table modifications are always executed with distributed transactions - Make sure that stored procedures and functions are executed with distributed transactions * Move waitEventSet to DistributedExecution This could also be local to RunDistributedExecution(), but in that case we had to mark it as "volatile" to avoid PG_TRY()/PG_CATCH() issues, and cast it to non-volatile when doing WaitEventSetFree(). We thought that would make code a bit harder to read than making this non-local, so we move it here. See comments for PG_TRY() in postgres/src/include/elog.h and "man 3 siglongjmp" for more context. * Fix multi_insert_select test outputs Two things: 1) One complex transaction block is now supported. Simply update the test output 2) Due to dynamic nature of the unified executor, the orders of the errors coming from the shards might change (e.g., all of the queries on the shards would fail, but which one appears on the error message?). To fix that, we simply added it to our shardId normalization tool which happens just before diff. * Fix subeury_and_cte test The error message is updated from: failed to execute task To: more than one row returned by a subquery or an expression which is a lot clearer to the user. * Fix intermediate_results test outputs Simply update the error message from: could not receive query results to result "squares" does not exist which makes a lot more sense. * Fix multi_function_in_join test The error messages update from: Failed to execute task XXX To: function f(..) does not exist * Fix multi_query_directory_cleanup test The unified executor does not create any intermediate files. * Fix with_transactions test A test case that just started to work fine * Fix multi_router_planner test outputs The error message is update from: Could not receive query results To: Relation does not exists which is a lot more clearer for the users * Fix multi_router_planner_fast_path test The error message is update from: Could not receive query results To: Relation does not exists which is a lot more clearer for the users * Fix isolation_copy_placement_vs_modification by disabling select_opens_transaction_block * Fix ordering in isolation_multi_shard_modify_vs_all * Add executor locks to unified executor * Make sure to allocate enought WaitEvents The previous code was missing the waitEvents for the latch and postmaster death. * Fix rebase conflicts for master rebase * Make sure that TRUNCATE relies on unified executor * Implement true sequential execution for multi-row INSERTS Execute the individual tasks executed one by one. Note that this is different than MultiShardConnectionType == SEQUENTIAL_CONNECTION case (e.g., sequential execution mode). In that case, running the tasks across the nodes in parallel is acceptable and implemented in that way. However, the executions that are qualified here would perform poorly if the tasks across the workers are executed in parallel. We currently qualify only one class of distributed queries here, multi-row INSERTs. If we do not enforce true sequential execution, concurrent multi-row upserts could easily form a distributed deadlock when the upserts touch the same rows. * Remove SESSION_LIFESPAN flag in unified_executor * Apply failure test updates We've changed the failure behaviour a bit, and also the error messages that show up to the user. This PR covers majority of the updates. * Unified executor honors citus.node_connection_timeout With this commit, unified executor errors out if even a single connection cannot be established within citus.node_connection_timeout. And, as a side effect this fixes failure_connection_establishment test. * Properly increment/decrement pool size variables Before this commit, the idle and active connection counts were not properly calculated. * insert_select_executor goes through unified executor. * Add missing file for task tracker * Modify ExecuteTaskListExtended()'s signature * Sort output of INSERT ... SELECT ... RETURNING * Take partition locks correctly in unified executor * Alternative implementation for force_max_query_parallelization * Fix compile warnings in unified executor * Fix style issues * Decrement idleConnectionCount when idle connection is lost * Always rebuild the wait event sets In the previous implementation, on waitFlag changes, we were only modifying the wait events. However, we've realized that it might be an over optimization since (a) we couldn't see any performance benefits (b) we see some errors on failures and because of (a) we prefer to disable it now. * Make sure to allocate enough sized waitEventSet With multi-row INSERTs, we might have more sessions than taskworkerCount after few calls of RunDistributedExecution() because the previous sessions would also be alive. Instead, re-allocate events when the connectino set changes. Implement SELECT FOR UPDATE on reference tables On master branch, we do two extra things on SELECT FOR UPDATE queries on reference tables: - Acquire executor locks - Execute the query on all replicas With this commit, we're implementing the same logic on the new executor. * SELECT FOR UPDATE opens transaction block even if SelectOpensTransactionBlock disabled Otherwise, users would be very confused and their logic is very likely to break. * Fix build error * Fix the newConnectionCount calculation in ManageWorkerPool * Fix rebase conflicts * Fix minor test output differences * Fix citus indent * Remove duplicate sorts that is added with rebase * Create distributed table via executor * Fix wait flags in CheckConnectionReady * failure_savepoints output for unified executor. * failure_vacuum output (pg 10) for unified executor. * Fix WaitEventSetWait timeout in unified executor * Stabilize failure_truncate test output * Add an ORDER BY to multi_upsert * Fix regression test outputs after rebase to master * Add executor.c comment * Rename executor.c to adaptive_executor.c * Do not schedule tasks if the failed placement is not ready to execute Before the commit, we were blindly scheduling the next placement executions even if the failed placement is not on the ready queue. Now, we're ensuring that if failed placement execution is on a failed pool or session where the execution is on the pendingQueue, we do not schedule the next task. Because the other placement execution should be already running. * Implement a proper custom scan node for adaptive executor - Switch between the executors, add GUC to set the pool size - Add non-adaptive regression test suites - Enable CIRCLE CI for non-adaptive tests - Adjust test output files * Add slow start interval to the executor * Expose max_cached_connection_per_worker to user * Do not start slow when there are cached connections * Consider ExecutorSlowStartInterval in NextEventTimeout * Fix memory issues with ReceiveResults(). * Disable executor via TaskExecutorType * Make sure to execute the tests with the other executor * Use task_executor_type to enable-disable adaptive executor * Remove useless code * Adjust the regression tests * Add slow start regression test * Rebase to master * Fix test failures in adaptive executor. * Rebase to master - 2 * Improve comments & debug messages * Set force_max_query_parallelization in isolation_citus_dist_activity * Force max parallelization for creating shards when asked to use exclusive connection. * Adjust the default pool size * Expand description of max_adaptive_executor_pool_size GUC * Update warnings in FinishRemoteTransactionCommit() * Improve session clean up at the end of execution Explicitly list all the states that the execution might end, otherwise warn. * Remove MULTI_CONNECTION_WAIT_RETRY which is not used at all * Add more ORDER BYs to multi_mx_partitioning	2019-06-28 14:04:40 +02:00
Onder Kalaci	6a8e2c260a	Add order by multi_insert_select	2019-04-09 12:28:57 +03:00
Onder Kalaci	92e87738dd	Make sure that the regression test output is durable to different execution orders Mostly add order bys and suppress worker node ports in the test outputs.	2019-04-08 11:48:08 +03:00
Marco Slot	8893cc141d	Support INSERT...SELECT with ON CONFLICT or RETURNING via coordinator Before this commit, Citus supported INSERT...SELECT queries with ON CONFLICT or RETURNING clauses only for pushdownable ones, since queries supported via coordinator were utilizing COPY infrastructure of PG to send selected tuples to the target worker nodes. After this PR, INSERT...SELECT queries with ON CONFLICT or RETURNING clauses will be performed in two phases via coordinator. In the first phase selected tuples will be saved to the intermediate table which is colocated with target table of the INSERT...SELECT query. Note that, a utility function to save results to the colocated intermediate result also implemented as a part of this commit. In the second phase, INSERT.. SELECT query is directly run on the worker node using the intermediate table as the source table.	2018-11-30 15:29:12 +03:00
Nils Dijk	2d13900230	error on unsupported changing of distirbution column in ON CONFLICT for INSERT ... SELECT	2018-07-23 15:18:21 +02:00
Marco Slot	f3f2805978	Fix use-after-free that may occur for INSERT..SELECT in prepared statements	2018-06-18 22:55:06 -06:00
Onder Kalaci	d918556dca	INSERT .. SELECT pushdown honors multi_shard_modification_mode	2018-06-06 12:42:23 +03:00
mehmet furkan şahin	785a86ed0a	Tests are updated to use create_distributed_table	2018-05-10 11:18:59 +03:00
Marco Slot	f8cfe07fd1	Support intermediate results in distributed INSERT..SELECT	2018-05-03 14:42:28 +02:00
Marco Slot	09c09f650f	Recursively plan set operations when leaf nodes recur	2017-12-26 13:46:55 +02:00
Onder Kalaci	e2a5124830	Add regression tests for recursive subquery planning	2017-12-21 08:37:40 +02:00
Marco Slot	fa73abe6d4	Regression test output changes after CTE support	2017-12-14 09:32:55 +01:00
Marco Slot	a9933deac6	Make real time executor work in transactions	2017-11-30 09:59:32 +03:00
Marco Slot	ea306c6cfe	Use citus.next_placement_id where practical in regression tests	2017-11-15 10:12:06 +01:00
Marco Slot	89eb833375	Use citus.next_shard_id where practical in regression tests	2017-11-15 10:12:05 +01:00
Brian Cloutier	7be1545843	Support implicit casts during INSERT/SELECT It's possible to build INSERT SELECT queries which include implicit casts, currently we attempt to support these by adding explicit casts to the SELECT query, but this sometimes crashes because we don't update all nodes with the new types. (SortClauses, for instance) This commit removes those explicit casts and passes an unmodified SELECT query to the COPY executor (how we implement INSERT SELECT under the scenes). In lieu of those cases, COPY has been given some extra logic to inspect queries, notice that the types don't line up with the table it's supposed to be inserting into, and "manually" casting every tuple before sending them to workers.	2017-11-03 22:27:15 -07:00
metdos	8c356b2bc8	Don't try to add restrictions for reference tables in insert into select	2017-10-31 19:44:10 +02:00
Murat Tuncer	f7ab901766	Add select distinct, and distinct on support Distinct, and distinct on() clauses are supported in simple selects, joins, subqueries, and insert into select queries.	2017-10-13 14:59:48 +03:00
Marco Slot	cf375d6a66	Consider dropped columns that precede the partition column in COPY	2017-08-22 13:02:35 +02:00
Jason Petersen	6a35c2937c	Enable multi-row INSERTs This is a pretty substantial refactoring of the existing modify path within the router executor and planner. In particular, we now hunt for all VALUES range table entries in INSERT statements and group the rows contained therein by shard identifier. These rows are stashed away for later in "ModifyRoute" elements. During deparse, the appropriate RTE is extracted from the Query and its values list is replaced by these rows before any SQL is generated. In this way, we can create multiple Tasks, but only one per shard, to piecemeal execute a multi-row INSERT. The execution of jobs containing such tasks now exclusively go through the "multi-router executor" which was previously used for e.g. INSERT INTO ... SELECT. By piggybacking onto that executor, we participate in ongoing trans- actions, get rollback-ability, etc. In short order, the only remaining use of the "single modify" router executor will be for bare single- row INSERT statements (i.e. those not in a transaction). This change appropriately handles deferred pruning as well as master- evaluated functions.	2017-08-10 00:32:46 -07:00
Marco Slot	d3e9746236	Avoid connections that accessed non-colocated placements in multi-shard commands	2017-08-08 18:32:34 +02:00
Marco Slot	fd72cca6c8	Use predictable placement IDs in regression test output	2017-07-17 13:44:29 +03:00
Marco Slot	d3785b97c0	Remove XactModificationLevel distinction between DML and multi-shard	2017-07-12 11:59:19 +02:00
Marco Slot	2f8ac82660	Execute INSERT..SELECT via coordinator if it cannot be pushed down Add a second implementation of INSERT INTO distributed_table SELECT ... that is used if the query cannot be pushed down. The basic idea is to execute the SELECT query separately and pass the results into the distributed table using a CopyDestReceiver, which is also used for COPY and create_distributed_table. When planning the SELECT, we go through planner hooks again, which means the SELECT can also be a distributed query. EXPLAIN is supported, but EXPLAIN ANALYZE is not because preventing double execution was a lot more complicated in this case.	2017-06-22 15:46:30 +02:00
Jason Petersen	db11324ac7	Add unambiguous ORDER BY clauses to many tests Queries which do not specify an order may arbitrarily change output across PostgreSQL versions.	2017-05-16 11:05:34 -06:00
Önder Kalacı	b74ed3c8e1	Subqueries in where -- updated (#1372 ) * Support for subqueries in WHERE clause This commit enables subqueries in WHERE clause to be pushed down by the subquery pushdown logic. The support covers: - Correlated subqueries with IN, NOT IN, EXISTS, NOT EXISTS, operator expressions such as (>, <, =, ALL, ANY etc.) - Non-correlated subqueries with (partition_key) IN (SELECT partition_key ..) (partition_key) =ANY (SELECT partition_key ...) Note that this commit heavily utilizes the attribute equivalence logic introduced in the `1cb6a34ba8`. In general, this commit mostly adjusts the logical planner not to error out on the subqueries in WHERE clause. * Improve error checks for subquery pushdown and INSERT ... SELECT Since we allow subqueries in WHERE clause with the previous commit, we should apply the same limitations to those subqueries. With this commit, we do not iterate on each subquery one by one. Instead, we extract all the subqueries and apply the checks directly on those subqueries. The aim of this change is to (i) Simplify the code (ii) Make it close to the checks on INSERT .. SELECT code base. * Extend checks for unresolved paramaters to include SubLinks With the presence of subqueries in where clause (i.e., SubPlans on the query) the existing way for checking unresolved parameters fail. The reason is that the parameters for SubPlans are kept on the parent plan not on the query itself (see primnodes.h for the details). With this commit, instead of checking SubPlans on the modified plans we start to use originalQuery, where SubLinks represent the subqueries in where clause. The unresolved parameters can be found on the SubLinks. * Apply code-review feedback * Remove unnecessary copying of shard interval list This commit removes unnecessary copying of shard interval list. Note that there are no copyObject function implemented for shard intervals.	2017-05-01 17:20:21 +03:00
Önder Kalacı	ad5cd326a4	Subquery pushdown - main branch (#1323 ) * Enabling physical planner for subquery pushdown changes This commit applies the logic that exists in INSERT .. SELECT planning to the subquery pushdown changes. The main algorithm is followed as : - pick an anchor relation (i.e., target relation) - per each target shard interval - add the target shard interval's shard range as a restriction to the relations (if all relations joined on the partition keys) - Check whether the query is router plannable per target shard interval. - If router plannable, create a task * Add union support within the JOINS This commit adds support for UNION/UNION ALL subqueries that are in the following form: .... (Q1 UNION Q2 UNION ...) as union_query JOIN (QN) ... In other words, we currently do NOT support the queries that are in the following form where union query is not JOINed with other relations/subqueries : .... (Q1 UNION Q2 UNION ...) as union_query .... * Subquery pushdown planner uses original query With this commit, we change the input to the logical planner for subquery pushdown. Before this commit, the planner was relying on the query tree that is transformed by the postgresql planner. After this commit, the planner uses the original query. The main motivation behind this change is the simplify deparsing of subqueries. * Enable top level subquery join queries This work enables - Top level subquery joins - Joins between subqueries and relations - Joins involving more than 2 range table entries A new regression test file is added to reflect enabled test cases * Add top level union support This commit adds support for UNION/UNION ALL subqueries that are in the following form: .... (Q1 UNION Q2 UNION ...) as union_query .... In other words, Citus supports allow top level unions being wrapped into aggregations queries and/or simple projection queries that only selects some fields from the lower level queries. * Disallow subqueries without a relation in the range table list for subquery pushdown This commit disallows subqueries without relation in the range table list. This commit is only applied for subquery pushdown. In other words, we do not add this limitation for single table re-partition subqueries. The reasoning behind this limitation is that if we allow pushing down such queries, the result would include (shardCount * expectedResults) where in a non distributed world the result would be (expectedResult) only. * Disallow subqueries without a relation in the range table list for INSERT .. SELECT This commit disallows subqueries without relation in the range table list. This commit is only applied for INSERT.. SELECT queries. The reasoning behind this limitation is that if we allow pushing down such queries, the result would include (shardCount * expectedResults) where in a non distributed world the result would be (expectedResult) only. * Change behaviour of subquery pushdown flag (#1315) This commit changes the behaviour of the citus.subquery_pushdown flag. Before this commit, the flag is used to enable subquery pushdown logic. But, with this commit, that behaviour is enabled by default. In other words, the flag is now useless. We prefer to keep the flag since we don't want to break the backward compatibility. Also, we may consider using that flag for other purposes in the next commits. * Require subquery_pushdown when limit is used in subquery Using limit in subqueries may cause returning incorrect results. Therefore we allow limits in subqueries only if user explicitly set subquery_pushdown flag. * Evaluate expressions on the LIMIT clause (#1333) Subquery pushdown uses orignal query, the LIMIT and OFFSET clauses are not evaluated. However, logical optimizer expects these expressions are already evaluated by the standard planner. This commit manually evaluates the functions on the logical planner for subquery pushdown. * Better format subquery regression tests (#1340) * Style fix for subquery pushdown regression tests With this commit we intented a more consistent style for the regression tests we've added in the - multi_subquery_union.sql - multi_subquery_complex_queries.sql - multi_subquery_behavioral_analytics.sql * Enable the tests that are temporarily commented This commit enables some of the regression tests that were commented out until all the development is done. * Fix merge conflicts (#1347) - Update regression tests to meet the changes in the regression test output. - Replace Ifs with Asserts given that the check is already done - Update shard pruning outputs * Add view regression tests for increased subquery coverage (#1348) - joins between views and tables - joins between views - union/union all queries involving views - views with limit - explain queries with view * Improve btree operators for the subquery tests This commit adds the missing comprasion for subquery composite key btree comparator.	2017-04-29 04:09:48 +03:00
velioglu	2327b63291	Change native hash function with worker_hash	2017-04-19 22:16:55 +03:00
Marco Slot	f838c83809	Remove redundant pg_dist_jobid_seq restarts in tests	2017-04-18 11:42:32 +02:00
Onder Kalaci	1cb6a34ba8	Remove uninstantiated qual logic, use attribute equivalences In this PR, we aim to deduce whether each of the RTE_RELATION is joined with at least on another RTE_RELATION on their partition keys. If each RTE_RELATION follows the above rule, we can conclude that all RTE_RELATIONs are joined on their partition keys. In order to do that, we invented a new equivalence class namely: AttributeEquivalenceClass. In very simple words, a AttributeEquivalenceClass is identified by an unique id and consists of a list of AttributeEquivalenceMembers. Each AttributeEquivalenceMember is designed to identify attributes uniquely within the whole query. The necessity of this arise since varno attributes are defined within a single level of a query. Instead, here we want to identify each RTE_RELATION uniquely and try to find equality among each RTE_RELATION's partition key. Whenever we find an equality clause A = B, where both A and B originates from relation attributes (i.e., not random expressions), we create an AttributeEquivalenceClass to record this knowledge. If we later find another equivalence B = C, we create another AttributeEquivalenceClass. Finally, we can apply transitity rules and generate a new AttributeEquivalenceClass which includes A, B and C. Note that equality among the members are identified by the varattno and rteIdentity. Each equality among RTE_RELATION is saved using an AttributeEquivalenceClass where each member attribute is identified by a AttributeEquivalenceMember. In the final step, we try generate a common attribute equivalence class that holds as much as AttributeEquivalenceMembers whose attributes are a partition keys.	2017-04-13 11:51:26 +03:00
Onder Kalaci	11665dbe3c	Fix pushing down wrong queries for INSERT ... SELECT queries Before this commit, in certain cases router planner allowed pushing down JOINs that are not on the partition keys. With @anarazel's suggestion, we change the logic to use uninstantiated parameter. Previously, the planner was traversing on the restriction information and once it finds the parameter, it was replacing it with the shard range. With this commit, instead of traversing the restrict infos, the planner explicitly checks for the equivalence of the relation partition key with the uninstantiated parameter. If finds an equivalence, it adds the restrictions. In this way, we have more control over the queries that are pushed down.	2017-03-24 11:37:35 +02:00
Andres Freund	9721e80901	Use DEBUG2 instead of DEBUG4 in INSERT SELECT tests & debug message. During later work the transaction debug output will change (as it will in postgres 10), which makes it hard to see actual changes in the INSERT ... SELECT ... test. Reduce to DEBUG2 after changing a debug message to that log level.	2017-02-20 12:56:16 +02:00
Murat Tuncer	d76f781ae4	Convert multi copy to use new connection api This enables proper transactional behaviour for copy and relaxes some restrictions like combining COPY with single-row modifications. It also provides the basis for relaxing restrictions further, and for optionally allowing connection caching.	2017-01-20 19:15:19 -08:00
Onder Kalaci	a7ed49c16e	Improve error messages for INSERT INTO .. SELECT This commit is intended to improve the error messages while planning INSERT INTO .. SELECT queries. The main motivation for this change is that we used to map multiple cases into a single message. With this change, we added explicit error messages for many cases.	2017-01-16 12:16:14 -07:00
Murat Tuncer	77f8db6b14	Add view support Enables use views within distributed queries. User can create and use a view on distributed tables/queries as he/she would use with regular queries. After this change router queries will have full support for views, insert into select queries will support reading from views, not writing into. Outer joins would have a limited support, and would error out at certain cases such as when a view is in the inner side of the outer join. Although PostgreSQL supports writing into views under certain circumstances. We disallowed that for distributed views.	2017-01-13 09:39:42 +03:00
Marco Slot	11031bcf55	Enable evaluation of stable functions in INSERT..SELECT	2016-12-23 12:47:21 +01:00
Onder Kalaci	df974e15b8	Bugfix for deparsing INSERT..SELECT queries which involve constant values This commit fixes a bug when the SELECT target list includes a constant value. Previous behaviour of target list re-ordering: * Iterate over the INSERT target list * If it includes a Var, find the corresponding SELECT entry and update its resno accordingly * If it does not include a Var (which we only considered to be DEFAULTs), generate a new SELECT target entry * If the processed target entry count in SELECT target list is less than the original SELECT target list (GROUP BY elements not included in the SELECT target entry), add them in the SELECT target list and update the resnos accordingly. * However, this step was leading to add the CONST SELECT target entries twice. The reason is that when CONST target list entries appear in the SELECT target list, the INSERT target list doesn't include a Var. Instead, it includes CONST as it does for DEFAULTs. New behaviour of target list re-ordering: * Iterate over the INSERT target list * If it includes a Var, find the corresponding SELECT entry and update its resno accordingly * If it does not include a Var (which we consider to be DEFAULTs and CONSTs on the SELECT), generate a new SELECT target entry * If any target entry remains on the SELECT target list which are resjunk, (GROUP BY elements not included in the SELECT target entry), keep them in the SELECT target list by updating the resnos.	2016-12-01 10:41:56 +02:00
Marco Slot	f6b3af7a49	Use co-located shard ID in multi_shard_transaction	2016-11-02 11:01:19 +01:00
Onder Kalaci	a43e3bad56	Improve error semantics for INSERT..SELECT With this commit, we error out if a worker query cannot be executed on all placements of a target insert shard interval.	2016-10-27 14:09:05 +03:00
Onder Kalaci	1673ea937c	Feature: INSERT INTO ... SELECT This commit adds INSERT INTO ... SELECT feature for distributed tables. We implement INSERT INTO ... SELECT by pushing down the SELECT to each shard. To compute that we use the router planner, by adding an "uninstantiated" constraint that the partition column be equal to a certain value. standard_planner() distributes that constraint to all the tables where it knows how to push the restriction safely. An example is that the tables that are connected via equi joins. The router planner then iterates over the target table's shards, for each we replace the "uninstantiated" restriction, with one that PruneShardList() handles. Do so by replacing the partitioning qual parameter added in multi_planner() with the current shard's actual boundary values. Also, add the current shard's boundary values to the top level subquery to ensure that even if the partitioning qual is not distributed to all the tables, we never run the queries on the shards that don't match with the current shard boundaries. Finally, perform the normal shard pruning to decide on whether to push the query to the current shard or not. We do not support certain SQLs on the subquery, which are described/commented on ErrorIfInsertSelectQueryNotSupported(). We also added some locking on the router executor. When an INSERT/SELECT command runs on a distributed table with replication factor >1, we need to ensure that it sees the same result on each placement of a shard. So we added the ability such that router executor takes exclusive locks on shards from which the SELECT in an INSERT/SELECT reads in order to prevent concurrent changes. This is not a very optimal solution, but it's simple and correct. The citus.all_modifications_commutative can be used to avoid aggressive locking. An INSERT/SELECT whose filters are known to exclude any ongoing writes can be marked as commutative. See RequiresConsistentSnapshot() for the details. We also moved the decison of whether the multiPlan should be executed on the router executor or not to the planning phase. This allowed us to integrate multi task router executor tasks to the router executor smoothly.	2016-10-26 10:01:00 +03:00

44 Commits (86876c0473c280b09bfe769d9268da8b0b7aac4a)