citus

Commit Graph

Author	SHA1	Message	Date
Önder Kalacı	40da78c6fd	Introduce the adaptive executor (#2798 ) With this commit, we're introducing the Adaptive Executor. The commit message consists of two distinct sections. The first part explains how the executor works. The second part consists of the commit messages of the individual smaller commits that resulted in this commit. The readers can search for the each of the smaller commit messages on https://github.com/citusdata/citus and can learn more about the history of the change. /------------------------------------------------------------------------- * adaptive_executor.c * * The adaptive executor executes a list of tasks (queries on shards) over * a connection pool per worker node. The results of the queries, if any, * are written to a tuple store. * * The concepts in the executor are modelled in a set of structs: * * - DistributedExecution: * Execution of a Task list over a set of WorkerPools. * - WorkerPool * Pool of WorkerSessions for the same worker which opportunistically * executes "unassigned" tasks from a queue. * - WorkerSession: * Connection to a worker that is used to execute "assigned" tasks * from a queue and may execute unasssigned tasks from the WorkerPool. * - ShardCommandExecution: * Execution of a Task across a list of placements. * - TaskPlacementExecution: * Execution of a Task on a specific placement. * Used in the WorkerPool and WorkerSession queues. * * Every connection pool (WorkerPool) and every connection (WorkerSession) * have a queue of tasks that are ready to execute (readyTaskQueue) and a * queue/set of pending tasks that may become ready later in the execution * (pendingTaskQueue). The tasks are wrapped in a ShardCommandExecution, * which keeps track of the state of execution and is referenced from a * TaskPlacementExecution, which is the data structure that is actually * added to the queues and describes the state of the execution of a task * on a particular worker node. * * When the task list is part of a bigger distributed transaction, the * shards that are accessed or modified by the task may have already been * accessed earlier in the transaction. We need to make sure we use the * same connection since it may hold relevant locks or have uncommitted * writes. In that case we "assign" the task to a connection by adding * it to the task queue of specific connection (in * AssignTasksToConnections). Otherwise we consider the task unassigned * and add it to the task queue of a worker pool, which means that it * can be executed over any connection in the pool. * * A task may be executed on multiple placements in case of a reference * table or a replicated distributed table. Depending on the type of * task, it may not be ready to be executed on a worker node immediately. * For instance, INSERTs on a reference table are executed serially across * placements to avoid deadlocks when concurrent INSERTs take conflicting * locks. At the beginning, only the "first" placement is ready to execute * and therefore added to the readyTaskQueue in the pool or connection. * The remaining placements are added to the pendingTaskQueue. Once * execution on the first placement is done the second placement moves * from pendingTaskQueue to readyTaskQueue. The same approach is used to * fail over read-only tasks to another placement. * * Once all the tasks are added to a queue, the main loop in * RunDistributedExecution repeatedly does the following: * * For each pool: * - ManageWorkPool evaluates whether to open additional connections * based on the number unassigned tasks that are ready to execute * and the targetPoolSize of the execution. * * Poll all connections: * - We use a WaitEventSet that contains all (non-failed) connections * and is rebuilt whenever the set of active connections or any of * their wait flags change. * * We almost always check for WL_SOCKET_READABLE because a session * can emit notices at any time during execution, but it will only * wake up WaitEventSetWait when there are actual bytes to read. * * We check for WL_SOCKET_WRITEABLE just after sending bytes in case * there is not enough space in the TCP buffer. Since a socket is * almost always writable we also use WL_SOCKET_WRITEABLE as a * mechanism to wake up WaitEventSetWait for non-I/O events, e.g. * when a task moves from pending to ready. * * For each connection that is ready: * - ConnectionStateMachine handles connection establishment and failure * as well as command execution via TransactionStateMachine. * * When a connection is ready to execute a new task, it first checks its * own readyTaskQueue and otherwise takes a task from the worker pool's * readyTaskQueue (on a first-come-first-serve basis). * * In cases where the tasks finish quickly (e.g. <1ms), a single * connection will often be sufficient to finish all tasks. It is * therefore not necessary that all connections are established * successfully or open a transaction (which may be blocked by an * intermediate pgbouncer in transaction pooling mode). It is therefore * essential that we take a task from the queue only after opening a * transaction block. * * When a command on a worker finishes or the connection is lost, we call * PlacementExecutionDone, which then updates the state of the task * based on whether we need to run it on other placements. When a * connection fails or all connections to a worker fail, we also call * PlacementExecutionDone for all queued tasks to try the next placement * and, if necessary, mark shard placements as inactive. If a task fails * to execute on all placements, the execution fails and the distributed * transaction rolls back. * * For multi-row INSERTs, tasks are executed sequentially by * SequentialRunDistributedExecution instead of in parallel, which allows * a high degree of concurrency without high risk of deadlocks. * Conversely, multi-row UPDATE/DELETE/DDL commands take aggressive locks * which forbids concurrency, but allows parallelism without high risk * of deadlocks. Note that this is unrelated to SEQUENTIAL_CONNECTION, * which indicates that we should use at most one connection per node, but * can run tasks in parallel across nodes. This is used when there are * writes to a reference table that has foreign keys from a distributed * table. * * Execution finishes when all tasks are done, the query errors out, or * the user cancels the query. * ------------------------------------------------------------------------- / All the commits involved here: * Initial unified executor prototype * Latest changes * Fix rebase conflicts to master branch * Add missing variable for assertion * Ensure that master_modify_multiple_shards() returns the affectedTupleCount * Adjust intermediate result sizes The real-time executor uses COPY command to get the results from the worker nodes. Unified executor avoids that which results in less data transfer. Simply adjust the tests to lower sizes. * Force one connection per placement (or co-located placements) when requested The existing executors (real-time and router) always open 1 connection per placement when parallel execution is requested. That might be useful under certain circumstances: (a) User wants to utilize as much as CPUs on the workers per distributed query (b) User has a transaction block which involves COPY command Also, lots of regression tests rely on this execution semantics. So, we'd enable few of the tests with this change as well. * For parameters to be resolved before using them For the details, see PostgreSQL's copyParamList() * Unified executor sorts the returning output * Ensure that unified executor doesn't ignore sequential execution of DDLJob's Certain DDL commands, mainly creating foreign keys to reference tables, should be executed sequentially. Otherwise, we'd end up with a self distributed deadlock. To overcome this situaiton, we set a flag `DDLJob->executeSequentially` and execute it sequentially. Note that we have to do this because the command might not be called within a transaction block, and we cannot call `SetLocalMultiShardModifyModeToSequential()`. This fixes at least two test: multi_insert_select_on_conflit.sql and multi_foreign_key.sql Also, I wouldn't mind scattering local `targetPoolSize` variables within the code. The reason is that we'll soon have a GUC (or a global variable based on a GUC) that'd set the pool size. In that case, we'd simply replace `targetPoolSize` with the global variables. * Fix 2PC conditions for DDL tasks * Improve closing connections that are not fully established in unified execution * Support foreign keys to reference tables in unified executor The idea for supporting foreign keys to reference tables is simple: Keep track of the relation accesses within a transaction block. - If a parallel access happens on a distributed table which has a foreign key to a reference table, one cannot modify the reference table in the same transaction. Otherwise, we're very likely to end-up with a self-distributed deadlock. - If an access to a reference table happens, and then a parallel access to a distributed table (which has a fkey to the reference table) happens, we switch to sequential mode. Unified executor misses the function calls that marks the relation accesses during the execution. Thus, simply add the necessary calls and let the logic kick in. * Make sure to close the failed connections after the execution * Improve comments * Fix savepoints in unified executor. * Rebuild the WaitEventSet only when necessary * Unclaim connections on all errors. * Improve failure handling for unified executor - Implement the notion of errorOnAnyFailure. This is similar to Critical Connections that the connection managament APIs provide - If the nodes inside a modifying transaction expand, activate 2PC - Fix few bugs related to wait event sets - Mark placement INACTIVE during the execution as much as possible as opposed to we do in the COMMIT handler - Fix few bugs related to scheduling next placement executions - Improve decision on when to use 2PC Improve the logic to start a transaction block for distributed transactions - Make sure that only reference table modifications are always executed with distributed transactions - Make sure that stored procedures and functions are executed with distributed transactions * Move waitEventSet to DistributedExecution This could also be local to RunDistributedExecution(), but in that case we had to mark it as "volatile" to avoid PG_TRY()/PG_CATCH() issues, and cast it to non-volatile when doing WaitEventSetFree(). We thought that would make code a bit harder to read than making this non-local, so we move it here. See comments for PG_TRY() in postgres/src/include/elog.h and "man 3 siglongjmp" for more context. * Fix multi_insert_select test outputs Two things: 1) One complex transaction block is now supported. Simply update the test output 2) Due to dynamic nature of the unified executor, the orders of the errors coming from the shards might change (e.g., all of the queries on the shards would fail, but which one appears on the error message?). To fix that, we simply added it to our shardId normalization tool which happens just before diff. * Fix subeury_and_cte test The error message is updated from: failed to execute task To: more than one row returned by a subquery or an expression which is a lot clearer to the user. * Fix intermediate_results test outputs Simply update the error message from: could not receive query results to result "squares" does not exist which makes a lot more sense. * Fix multi_function_in_join test The error messages update from: Failed to execute task XXX To: function f(..) does not exist * Fix multi_query_directory_cleanup test The unified executor does not create any intermediate files. * Fix with_transactions test A test case that just started to work fine * Fix multi_router_planner test outputs The error message is update from: Could not receive query results To: Relation does not exists which is a lot more clearer for the users * Fix multi_router_planner_fast_path test The error message is update from: Could not receive query results To: Relation does not exists which is a lot more clearer for the users * Fix isolation_copy_placement_vs_modification by disabling select_opens_transaction_block * Fix ordering in isolation_multi_shard_modify_vs_all * Add executor locks to unified executor * Make sure to allocate enought WaitEvents The previous code was missing the waitEvents for the latch and postmaster death. * Fix rebase conflicts for master rebase * Make sure that TRUNCATE relies on unified executor * Implement true sequential execution for multi-row INSERTS Execute the individual tasks executed one by one. Note that this is different than MultiShardConnectionType == SEQUENTIAL_CONNECTION case (e.g., sequential execution mode). In that case, running the tasks across the nodes in parallel is acceptable and implemented in that way. However, the executions that are qualified here would perform poorly if the tasks across the workers are executed in parallel. We currently qualify only one class of distributed queries here, multi-row INSERTs. If we do not enforce true sequential execution, concurrent multi-row upserts could easily form a distributed deadlock when the upserts touch the same rows. * Remove SESSION_LIFESPAN flag in unified_executor * Apply failure test updates We've changed the failure behaviour a bit, and also the error messages that show up to the user. This PR covers majority of the updates. * Unified executor honors citus.node_connection_timeout With this commit, unified executor errors out if even a single connection cannot be established within citus.node_connection_timeout. And, as a side effect this fixes failure_connection_establishment test. * Properly increment/decrement pool size variables Before this commit, the idle and active connection counts were not properly calculated. * insert_select_executor goes through unified executor. * Add missing file for task tracker * Modify ExecuteTaskListExtended()'s signature * Sort output of INSERT ... SELECT ... RETURNING * Take partition locks correctly in unified executor * Alternative implementation for force_max_query_parallelization * Fix compile warnings in unified executor * Fix style issues * Decrement idleConnectionCount when idle connection is lost * Always rebuild the wait event sets In the previous implementation, on waitFlag changes, we were only modifying the wait events. However, we've realized that it might be an over optimization since (a) we couldn't see any performance benefits (b) we see some errors on failures and because of (a) we prefer to disable it now. * Make sure to allocate enough sized waitEventSet With multi-row INSERTs, we might have more sessions than taskworkerCount after few calls of RunDistributedExecution() because the previous sessions would also be alive. Instead, re-allocate events when the connectino set changes. Implement SELECT FOR UPDATE on reference tables On master branch, we do two extra things on SELECT FOR UPDATE queries on reference tables: - Acquire executor locks - Execute the query on all replicas With this commit, we're implementing the same logic on the new executor. * SELECT FOR UPDATE opens transaction block even if SelectOpensTransactionBlock disabled Otherwise, users would be very confused and their logic is very likely to break. * Fix build error * Fix the newConnectionCount calculation in ManageWorkerPool * Fix rebase conflicts * Fix minor test output differences * Fix citus indent * Remove duplicate sorts that is added with rebase * Create distributed table via executor * Fix wait flags in CheckConnectionReady * failure_savepoints output for unified executor. * failure_vacuum output (pg 10) for unified executor. * Fix WaitEventSetWait timeout in unified executor * Stabilize failure_truncate test output * Add an ORDER BY to multi_upsert * Fix regression test outputs after rebase to master * Add executor.c comment * Rename executor.c to adaptive_executor.c * Do not schedule tasks if the failed placement is not ready to execute Before the commit, we were blindly scheduling the next placement executions even if the failed placement is not on the ready queue. Now, we're ensuring that if failed placement execution is on a failed pool or session where the execution is on the pendingQueue, we do not schedule the next task. Because the other placement execution should be already running. * Implement a proper custom scan node for adaptive executor - Switch between the executors, add GUC to set the pool size - Add non-adaptive regression test suites - Enable CIRCLE CI for non-adaptive tests - Adjust test output files * Add slow start interval to the executor * Expose max_cached_connection_per_worker to user * Do not start slow when there are cached connections * Consider ExecutorSlowStartInterval in NextEventTimeout * Fix memory issues with ReceiveResults(). * Disable executor via TaskExecutorType * Make sure to execute the tests with the other executor * Use task_executor_type to enable-disable adaptive executor * Remove useless code * Adjust the regression tests * Add slow start regression test * Rebase to master * Fix test failures in adaptive executor. * Rebase to master - 2 * Improve comments & debug messages * Set force_max_query_parallelization in isolation_citus_dist_activity * Force max parallelization for creating shards when asked to use exclusive connection. * Adjust the default pool size * Expand description of max_adaptive_executor_pool_size GUC * Update warnings in FinishRemoteTransactionCommit() * Improve session clean up at the end of execution Explicitly list all the states that the execution might end, otherwise warn. * Remove MULTI_CONNECTION_WAIT_RETRY which is not used at all * Add more ORDER BYs to multi_mx_partitioning	2019-06-28 14:04:40 +02:00
Philip Dubé	5c62f9935a	Router planner: reject SELECT FOR UPDATE ctes	2019-06-26 10:32:01 +02:00
Onder Kalaci	ad93d6feea	Change the order of placement access added to the list This is to make sure that the error messages related to foreign keys to reference tables shows the exact placement access name instead of SELECT.	2019-06-23 11:32:58 +02:00
Hadi Moshayedi	4bbae02778	Make COPY compatible with unified executor.	2019-06-20 19:53:40 +02:00
Hadi Moshayedi	2e6d04df7b	Refactor ExecuteModifyTasksSequentially.	2019-06-20 18:38:57 +02:00
Philip Dubé	4bfcf5b665	Enable Werror for all warnings Changes to ruleutils match changes made upstream to silence gcc fallthrough warnings	2019-06-18 14:43:54 -07:00
Hadi Moshayedi	85325e0098	Refactor ScanStateGetExecutorState into its own function.	2019-06-05 09:16:43 -07:00
Hadi Moshayedi	0b01c59fa6	Refactor ScanStateGetTupleDescriptor() into a function.	2019-06-04 15:19:49 -07:00
Marco Slot	bb3a96eacb	Cache a configurable number of connections at xact end	2019-05-29 13:24:31 +02:00
Hadi Moshayedi	8ae47e1244	Fix comments for RemoteFileDestReceiverStartup and CitusCopyDestReceiverStartup	2019-05-21 09:03:22 -07:00
Hadi Moshayedi	b5c0ca45f1	Remove stopOnFailure flag from EndRemoteCopy()	2019-05-11 06:18:34 -07:00
Hadi Moshayedi	e584961267	Fix mixed declarations and code warnings	2019-05-08 12:51:40 -07:00
Onder Kalaci	495b6e9b62	Refactor Parallel Relation Access Recording Instead of scattering the code around, we move all the logic into a single function. This will help supporting foreign keys to reference tables in the unified executor with a single line of change, just calling this function.	2019-05-02 18:12:33 +03:00
Hadi Moshayedi	32ecb6884c	Test ROLLBACK TO SAVEPOINT with multi-shard CTE failures	2019-05-01 09:33:43 -07:00
Hadi Moshayedi	b69a762e0b	Fix savepoint rollback after multi-shard update failure.	2019-05-01 09:33:43 -07:00
Jason Petersen	71d5d1c865	Enable variable shadowing warnings; fix all Rather than wait for another place like the previous commit to bite us, I think we should turn on this warning.	2019-04-30 13:24:25 -06:00
Onder Kalaci	004f28e18c	Sort output of RETURNING The feature is only intended for getting consistent outputs for the regression tests. RETURNING does not have any ordering gurantees and with unified executor, the ordering of query executions on the shards are also becoming unpredictable. Thus, we're enforcing ordering when a GUC is set. We implicitly add an `ORDER BY` something equivalent of ` RETURNING expr1, expr2, .. ,exprN ORDER BY expr1, expr2, .. ,exprN ` As described in the code comments as well, this is probably not the most performant approach we could implement. However, since we're only targeting regression tests, I don't see any issues with that. If we decide to expand this to a feature to users, we should revisit the implementation and improve the performance.	2019-04-24 11:51:19 +03:00
Marco Slot	e3b7e74f43	Allow rescan in DECLARE .. WITH HOLD	2019-03-22 11:25:55 +01:00
Marco Slot	5ff1821411	Cache the current database name Purely for performance reasons.	2019-03-20 12:14:46 +03:00
Marco Slot	f2abf2b8e5	Functions are treated as transaction blocks	2019-03-15 16:34:08 -06:00
Hadi Moshayedi	a9e6d06a98	Skip execution of ALTER TABLE constraint checks on the coordinator	2019-03-14 15:40:56 -07:00
Hadi Moshayedi	f4d3b94e22	Fix some of the casts for groupId (#2609 ) A small change which partially addresses #2608.	2019-03-05 12:06:44 -08:00
Jason Petersen	339e6e661e	Remove 9.6 (#2554 ) Removes support and code for PostgreSQL 9.6 cr: @velioglu	2019-01-16 13:11:24 -07:00
Marco Slot	1b1c6374f7	Execute CREATE INDEX CONCURRENTLY concurrently	2018-12-21 14:02:59 -07:00
Marco Slot	8893cc141d	Support INSERT...SELECT with ON CONFLICT or RETURNING via coordinator Before this commit, Citus supported INSERT...SELECT queries with ON CONFLICT or RETURNING clauses only for pushdownable ones, since queries supported via coordinator were utilizing COPY infrastructure of PG to send selected tuples to the target worker nodes. After this PR, INSERT...SELECT queries with ON CONFLICT or RETURNING clauses will be performed in two phases via coordinator. In the first phase selected tuples will be saved to the intermediate table which is colocated with target table of the INSERT...SELECT query. Note that, a utility function to save results to the colocated intermediate result also implemented as a part of this commit. In the second phase, INSERT.. SELECT query is directly run on the worker node using the intermediate table as the source table.	2018-11-30 15:29:12 +03:00
Marco Slot	aff37cf1bc	Control multi-shard modify locks with enable_deadlock_prevention	2018-11-28 02:59:50 +01:00
Marco Slot	6aa5592e52	Add user ID suffix to intermediate files in re-partition jobs	2018-11-23 08:36:11 +01:00
Marco Slot	a59bf31c76	Use worker_execute_sql_task UDF in task-tracker executor	2018-11-22 18:15:33 +01:00
Marco Slot	caf402d506	COPY to a task file no longer switches to superuser	2018-11-22 18:15:33 +01:00
Marco Slot	f383e4f307	Description: Refactor code that handles DDL commands from one file into a module The file handling the utility functions (DDL) for citus organically grew over time and became unreasonably large. This refactor takes that file and refactored the functionality into separate files per command. Initially modeled after the directory and file layout that can be found in postgres. Although the size of the change is quite big there are barely any code changes. Only one two functions have been added for readability purposes: - PostProcessIndexStmt which is extracted from PostProcessUtility - PostProcessAlterTableStmt which is extracted from multi_ProcessUtility A README.md has been added to `src/backend/distributed/commands` describing the contents of the module and every file in the module. We need more documentation around the overloading of the COPY command, for now the boilerplate has been added for people with better knowledge to fill out.	2018-11-14 13:36:27 +01:00
Onder Kalaci	6e05921736	Processes that are blocked on advisory locks show up in wait edges Assign the distributed transaction id before trying to acquire the executor advisory locks. This is useful to show this backend in citus lock graphs (e.g., dump_global_wait_edges() and citus_lock_waits).	2018-10-24 13:32:13 +03:00
Hadi Moshayedi	3e00bf1c0d	Don't throw error for DROP DATABASE IF EXISTS	2018-10-23 09:45:03 -04:00
Marco Slot	d56baefe3d	Allow simple DML commands from hot standby	2018-10-06 10:54:44 +02:00
velioglu	512d23934f	Show router modify,select and real-time queries on MX views	2018-10-02 13:59:38 +03:00
Onder Kalaci	cdc0d1491c	Make sure to use correct execution mode for TRUNCATE We used to set the execution mode in the truncate trigger. However, when multiple tables are truncated with a single command, we could set the execution mode very late. Instead, now set the execution mode on the utility hook.	2018-09-25 15:35:27 +03:00
Onder Kalaci	abc443d7fa	Make sure that shard repair considers replication factor	2018-09-21 15:24:49 +03:00
Onder Kalaci	c1b5a04f6e	Allow partitioned tables with replication factor > 1 With this commit, we all partitioned distributed tables with replication factor > 1. However, we also have many restrictions. In summary, we disallow all kinds of modifications (including DDLs) on the partition tables. Instead, the user is allowed to run the modifications over the parent table. The necessity for such a restriction have two aspects: - We need to acquire shard resource locks appropriately - We need to handle marking partitions INVALID in case of any failures. Note that, in theory, the parent table should also become INVALID, which is too aggressive.	2018-09-21 14:40:41 +03:00
Murat Tuncer	b6930e3db9	Add distributed locking to truncated mx tables We acquire distributed lock on all mx nodes for truncated tables before actually doing truncate operation. This is needed for distributed serialization of the truncate command without causing a deadlock.	2018-09-21 14:23:19 +03:00
Murat Tuncer	0f6e514bfb	Fixes a bug on not being able to drop index on a partitioned table. Reason for the failure is that PG11 introduced a new relation kind RELKIND_PARTITIONED_INDEX to be used for partitioned indices. We expanded our check to cover that case.	2018-09-19 13:15:05 +03:00
Marco Slot	f34ab55389	Fix bug preventing rollback in stored procedure	2018-08-31 20:49:20 +02:00
velioglu	bd30e3e908	Add support for writing to reference tables from MX nodes	2018-08-27 18:15:04 +03:00
Onder Kalaci	b8af8c359b	Make sure that modifying CTEs always use the correct execution mode	2018-08-23 14:53:55 +03:00
mehmet furkan şahin	ef9f38b68d	ApplyLogRedaction noop func is added	2018-08-17 14:48:54 -07:00
Onder Kalaci	85d418412d	Fix DDL execution problem on MX when search_path is used Make sure that the coordinator sends the commands when the search path synchronised with the coordinator's search_path. This is only important when Citus sends the commands that are directly relayed to the worker nodes. For example, the deparsed DLL commands or queries always adds schema qualifications to the queries. So, they do not require this change.	2018-08-13 16:34:50 +03:00
velioglu	e23625bf5e	Use contype to check for FK constraint instead of reading catalog table	2018-07-24 15:53:05 +03:00
mehmet furkan şahin	6d0fbbace7	ALTER TABLE %s ADD COLUMN constraint check is added	2018-07-24 15:53:05 +03:00
Marco Slot	625816242a	Don't try to check unopened connection in EXEC_TASK_FAILED state	2018-07-23 11:41:02 -06:00
Nils Dijk	df98900f80	fix missing space for tablein in error	2018-07-20 15:05:13 +02:00
Marco Slot	89870e76ce	Add a select_opens_transaction_block GUC	2018-07-08 03:50:39 +02:00
Onder Kalaci	7fb529aab9	Some stylistic improvements in the foreign keys to reference table changes.	2018-07-05 23:23:34 +03:00
Nils Dijk	c1c8c38dc9	create placeholder for policy ddl	2018-07-05 11:07:01 +02:00
Murat Tuncer	901066a421	Move partition key logging related code from enterprise	2018-07-04 13:11:34 +03:00
Onder Kalaci	d83be3a33f	Enforce foreign key restrictions inside transaction blocks When a hash distributed table have a foreign key to a reference table, there are few restrictions we have to apply in order to prevent distributed deadlocks or reading wrong results. The necessity to apply the restrictions arise from cascading nature of foreign keys. When a foreign key on a reference table cascades to a distributed table, a single operation over a single connection can acquire locks on multiple shards of the distributed table. Thus, any parallel operation on that distributed table, in the same transaction should not open parallel connections to the shards. Otherwise, we'd either end-up with a self-distributed deadlock or read wrong results. As briefly described above, the restrictions that we apply is done by tracking the distributed/reference relation accesses inside transaction blocks, and act accordingly when necessary. The two main rules are as follows: - Whenever a parallel distributed relation access conflicts with a consecutive reference relation access, Citus errors out - Whenever a reference relation access is followed by a conflicting parallel relation access, the execution mode is switched to sequential mode. There are also some other notes to mention: - If the user does SET LOCAL citus.multi_shard_modify_mode TO 'sequential';, all the queries should simply work with using one connection per worker and sequentially executing the commands. That's obviously a slower approach than Citus' usual parallel execution. However, we've at least have a way to run all commands successfully. - If an unrelated parallel query executed on any distributed table, we cannot switch to sequential mode. Because, the essense of sequential mode is using one connection per worker. However, in the presence of a parallel connection, the connection manager picks those connections to execute the commands. That contradicts with our purpose, thus we error out. - COPY to a distributed table cannot be executed in sequential mode. Thus, if we switch to sequential mode and COPY is executed, the operation fails and there is currently no way of implementing that. Note that, when the local table is not empty and create_distributed_table is used, citus uses COPY internally. Thus, in those cases, create_distributed_table() will also fail. - There is a GUC called citus.enforce_foreign_key_restrictions to disable all the checks. We added that GUC since the restrictions we apply is sometimes a bit more restrictive than its necessary. The user might want to relax those. Similarly, if you don't have CASCADEing reference tables, you might consider disabling all the checks.	2018-07-03 17:05:55 +03:00
velioglu	6be6911ed9	Create foreign key relation graph and functions to query on it	2018-07-03 17:05:55 +03:00
mehmet furkan şahin	4db72c99f6	Specific DDLs are sequentialized when there is FK -[x] drop constraint -[x] drop column -[x] alter column type -[x] truncate are sequentialized if there is a foreign constraint from a distributed table to a reference table on the affected relations by the above commands.	2018-07-03 17:05:55 +03:00
mehmet furkan şahin	2fa4e38841	FK from dist to ref can be added with alter table	2018-07-03 17:05:01 +03:00
Murat Tuncer	4d35b92016	Add groundwork for citus_stat_statements api	2018-06-27 14:20:03 +03:00
Brian Cloutier	5ce18327a7	Don't spinloop when trying to cleanup a failed connection	2018-06-26 13:13:34 -07:00
Onder Kalaci	8ccb8b679e	Real-time executor marks multi shard relation accesses before opening connections	2018-06-25 18:40:31 +03:00
Onder Kalaci	2890154420	Make sure that TRUNCATE always opens a DDL access	2018-06-25 18:40:31 +03:00
Onder Kalaci	21038f0d0e	Make sure that inter-shard DDL commands are always covers both tables	2018-06-25 18:40:30 +03:00
Onder Kalaci	2f01894589	Track relation accesses using the connection management infrastructure	2018-06-25 18:40:30 +03:00
Onder Kalaci	d5472614df	Use non-data connection for intermediate results Make sure that intermediate results use a connection that is not associated with any placement. That is useful in two ways: - More complex queries can be executed with CTEs - Safely use the same connections when there is a foreign key to reference table from a distributed table, which needs to use the same connection for modifications since the reference table might cascade to the distributed table.	2018-06-21 13:26:13 +03:00
Onder Kalaci	7762d81cba	Move test UDF under test folder	2018-06-21 08:42:44 +03:00
Onder Kalaci	8f5821493a	Implement C interface for setting GUC We need the ability to switch to sequential mode (e.g., SET LOCAL citus.multi_shard_modify_mode = 'sequential'). This commit enables that.	2018-06-19 10:23:43 +03:00
Marco Slot	f3f2805978	Fix use-after-free that may occur for INSERT..SELECT in prepared statements	2018-06-18 22:55:06 -06:00
velioglu	53b2e81d01	Adds SELECT ... FOR UPDATE support for router plannable queries	2018-06-18 13:55:17 +03:00
Marco Slot	0bbe778760	Rename failOnError to alwaysThrowErrorOnFailure	2018-06-14 23:37:47 +02:00
Marco Slot	4ab8e87090	Always throw errors on failure on critical connection in router executor	2018-06-14 23:33:07 +02:00
Nils Dijk	73efcb22c4	Extract RoleSpecString and resolve role references	2018-06-14 11:38:42 +02:00
mehmet furkan şahin	d1a3b20115	foreign_constraint_utils is created	2018-06-07 18:19:24 +03:00
Onder Kalaci	a5370f5bb0	Realtime executor honours multi_shard_modify_mode We're relying on multi_shard_modify_mode GUC for real-time SELECTs. The name of the GUC is unfortunate, but, adding one more GUC (or renaming the GUC) would make the UX even worse. Given that this mode is mostly important for transaction blocks that involve modification /DDL queries along with real-time SELECTs, we can live with the confusion.	2018-06-06 14:59:54 +03:00
Onder Kalaci	d918556dca	INSERT .. SELECT pushdown honors multi_shard_modification_mode	2018-06-06 12:42:23 +03:00
Onder Kalaci	336044f2a8	master_modify_multiple_shards() and TRUNCATE honors multi_shard_modification_mode	2018-06-06 12:29:05 +03:00
Onder Kalaci	df44956dc3	Make sure that sequential DDL opens a single connection to each node After this commit DDL commands honour `citus.multi_shard_modify_mode`. We preferred using the code-path that executes single task router queries (e.g., ExecuteSingleModifyTask()) in order not to invent a new executor that is only applicable for DDL commands that require sequential execution.	2018-06-05 17:52:17 +03:00
Marco Slot	fd4ff29f2f	Add a debug message with distribution column value	2018-06-05 15:09:17 +03:00
Murat Tuncer	ba50e3f33e	Add handling for grant/revoke all tables in schema	2018-05-31 13:47:02 +03:00
Marco Slot	5f5f7b4fe0	Throw an error if placements cannot be found in router executor	2018-05-08 22:39:18 -04:00
Marco Slot	9438e5bde9	Ensure single-shard modifying CTEs are part of distributed transaction	2018-05-06 12:49:40 +02:00
Marco Slot	90cdfff602	Implement recursive planning for DML statements	2018-05-03 14:42:28 +02:00
velioglu	32bcd610c1	Support modify queries with multiple tables With this commit we begin to support modify queries with multiple tables if these queries are pushdownable.	2018-05-02 16:22:26 +03:00
Brian Cloutier	f8fb7a27fb	Don't copyObject into the wrong memory context utilityStmt sometimes (such as when it's inside of a plpgsql function) comes from a cached plan, which is kept in a child of the CacheMemoryContext. When we naively call copyObject we're copying it into a statement-local context, which corrupts the cached plan when it's thrown away.	2018-05-01 15:34:32 -07:00
mehmet furkan şahin	f2555317b6	ProcessVacuumStmt update on names	2018-04-27 14:37:01 +03:00
mehmet furkan şahin	a4153c6ab1	notice handler is implemented	2018-04-27 14:37:01 +03:00
Marco Slot	3d3c19a717	Improve messages for essential connection failures	2018-04-26 12:58:47 -06:00
Hadi Moshayedi	24659a97dc	Fail task in real-time executor if no placements found. (#2133 )	2018-04-26 12:05:24 -04:00
Murat Tuncer	a6fe5ca183	PG11 compatibility update - changes in ruleutils_11.c is reflected - vacuum statement api change is handled. We now allow multi-table vacuum commands. - some other function header changes are reflected - api conflicts between PG11 and earlier versions are handled by adding shims in version_compat.h - various regression tests are fixed due output and functionality in PG1 - no change is made to support new features in PG11 they need to be handled by new commit	2018-04-26 11:29:43 +03:00
Brian Cloutier	a59c1c634e	Fix cancellation of real time queries Without this change multi_real_time_transaction blocks forever (on Windows) in the block where it repeatedly calls pg_advisory_lock(15). This happens because the deadlock detector tries to cancel the backend but the backend never processes that signal.	2018-04-17 14:26:22 -07:00
Burak Yucesoy	b33b282030	Fix bug while DROPping partitioned table from worker We recently added partitionin support to Citus MX. We should not execute DROP table commands from MX workers but at the moment we try to execute such commands for partitioned tables. This PR fixes that problem by adding check.	2018-04-09 13:50:21 +03:00
Burak Yucesoy	0c283fa8a3	Add partitioning support to MX tables Previously, we prevented creation of partitioned tables on Citus MX. We decided to not focus on this feature until there is a need. Since now there are requests for this feature, we are implementing support for partitioned tables on Citus MX.	2018-04-06 12:47:06 +03:00
velioglu	698d585fb5	Remove broadcast join logic After this change all the logic related to shard data fetch logic will be removed. Planner won't plan any ShardFetchTask anymore. Shard fetch related steps in real time executor and task-tracker executor have been removed.	2018-03-30 11:45:19 +03:00
Murat Tuncer	224b0a8c14	Replace poll with select/poll Windows does not have poll(), so fall back to select()	2018-03-21 20:05:00 -07:00
Onder Kalaci	7dc9589b56	Handle failures during I/O This commit checks the connection status right after any IO happens on the socket. This is necessary since before this commit we didn't pass any information to the higher level functions whether we're done with the connection (e.g., no IO required anymore) or an errors happened during the IO.	2018-03-02 08:33:53 +02:00
Metin Doslu	bcf660475a	Add support for modifying CTEs	2018-02-27 15:08:32 +02:00
Marco Slot	ee6a751798	Only copy distributed plan when modifying it	2018-02-12 16:30:55 +01:00
Marco Slot	bd0ebac865	Skip call to ActiveReadableNodeList when there are no subplans	2018-01-29 16:05:10 +01:00
Hadi Moshayedi	ff26bcd5a5	Include sys/stat.h for S_IRUSR and S_IWUSR. (#1977 )	2018-01-26 16:21:48 -05:00
Dimitri Fontaine	952da72c55	Implement ALTER TABLE\|INDEX ... SET\|RESET (). PostgreSQL implements support for several relation kinds in a single statement, such as in the AlterTableStmt case, which supports both tables and indexes and more (see ATExecSetRelOptions in PostgreSQL source code file src/backend/commands/tablecmds.c for an example of that). As a consequence, this patch implements support for setting and resetting storage parameters on both relation kinds.	2018-01-17 21:56:40 +01:00
Dimitri Fontaine	17266e3301	Implement ALTER INDEX ... RENAME TO ... The command is now distributed among the shards when the table is distributed. To that effect, we fill in the DDLJob's targetRelationId with the OID of the table for which the index is defined, rather than the OID of the index itself.	2018-01-17 21:56:40 +01:00
Dimitri Fontaine	e010238280	Implement ALTER TABLE ... RENAME TO ... The implementation was already mostly in place, but the code was protected by a principled check against the operation. Turns out there's a nasty concurrency bug though with long identifier names, much as in #1664. To prevent deadlocks from happening, we could either review the DDL transaction management in shards and placements, or we can simply reject names with (NAMEDATALEN - 1) chars or more — that's because of the PostgreSQL array types being created with a one-char prefix: '_'.	2018-01-11 13:21:24 +01:00

1 2 3 4 5 ...

379 Commits (aeec3d1544e91ec47c1ac57c6e746ad7d3aac833)