citus

Commit Graph

Author	SHA1	Message	Date
onderkalaci	31aa589c50	Improve failure handling of distributed execution Prior to this commit, the code would skip processing the errors happened for local commands. Prior to https://github.com/citusdata/citus/pull/5379, it might make sense to allow the execution continue. But, as of today, if a modification fails on any placement, we can safely fail the execution. (cherry picked from commit `b4008bc872`)	2023-08-01 13:42:10 +03:00
aykut-bozkurt	0626f366c1	fix single tuple result memory leak (#6724 ) We should not omit to free PGResult when we receive single tuple result from an internal backend. Single tuple results are normally freed by our ReceiveResults for `tupleDescriptor != NULL` flow but not for those with `tupleDescriptor == NULL`. See PR #6722 for details. DESCRIPTION: Fixes memory leak issue with query results that returns single row. (cherry picked from commit `9e69dd0e7f`)	2023-02-17 14:31:34 +03:00
Onder Kalaci	feb5534c65	Do not create additional WaitEventSet for RemoteSocketClosed checks Before this commit, we created an additional WaitEventSet for checking whether the remote socket is closed per connection - only once at the start of the execution. However, for certain workloads, such as pgbench select-only workloads, the creation/deletion of the additional WaitEventSet adds ~7% CPU overhead, which is also reflected on the benchmark results. With this commit, we use the same WaitEventSet for the purposes of checking the remote socket at the start of the execution. We use "rebuildWaitEventSet" flag so that the executor can re-use the existing WaitEventSet. As a result, we see the following improvements on PG 15: main : 120051 tps, 0.532 ms latency avg. avoid_wes_rebuild: 127119 tps, 0.503 ms latency avg. And, on PG 14, as expected, there is no difference main : 129191 tps, 0.495 ms latency avg. avoid_wes_rebuild: 129480 tps, 0.494 ms latency avg. But, note that PG 15 is slightly (~1.5%) slower than PG 14. That is probably the overhead of checking the remote socket.	2022-12-14 22:42:55 +01:00
Onder Kalaci	d52da55ac0	Move WaitEvent to DistributedExecution Prep. for caching WaitEventsSet/WaitEvents	2022-12-14 21:59:19 +01:00
Marco Slot	666696c01c	Deprecate citus.replicate_reference_tables_on_activate, make it always off (#6474 ) Co-authored-by: Marco Slot <marco.slot@gmail.com>	2022-11-04 16:21:10 +01:00
Önder Kalacı	8b624b5c9d	Detect remotely closed sockets and add a single connection retry in the executor (#6404 ) PostgreSQL 15 exposes WL_SOCKET_CLOSED in WaitEventSet API, which is useful for detecting closed remote sockets. In this patch, we use this new event and try to detect closed remote sockets in the executor. When a closed socket is detected, the executor now has the ability to retry the connection establishment. Note that, the executor can retry connection establishments only for the connection that has not been used. Basically, this patch is mostly useful for preventing the executor to fail if a cached connection is closed because of the worker node restart (or worker failover). In other words, the executor cannot retry connection establishment if we are in a distributed transaction AND any command has been sent over the connection. That requires more sophisticated retry mechanisms. For now, fixing the above use case is enough. Fixes #5538 Earlier discussions: #5908, #6259 and #6283 ### Summary of the current approach regards to earlier trials As noted, we explored some alternatives before getting into this. https://github.com/citusdata/citus/pull/6283 is simple, but lacks an important property. We should be checking for `WL_SOCKET_CLOSED` _before_ sending anything over the wire. Otherwise, it becomes very tricky to understand which connection is actually safe to retry. For example, in the current patch, we can safely check `transaction->transactionState == REMOTE_TRANS_NOT_STARTED` before restarting a connection. #6259 does what we intent here (e.g., check for sending any command). However, as @marcocitus noted, it is very tricky to handle `WaitEventSets` in multiple places. And, the executor is designed such that it reacts to the events. So, adding anything `pre-executor` seemed too ugly. In the end, I converged into this patch. This patch relies on the simplicity of #6283 and also does a very limited handling of `WaitEventSets`, just for our purpose. Just before we add any connection to the execution, we check if the remote session has already closed. With that, we do a brief interaction of multiple wait event processing, but with different purposes. The new wait event processing we added does not even consider cancellations. We let that handled by the main event processing loop. Co-authored-by: Marco Slot <marco.slot@gmail.com>	2022-10-14 15:08:49 +02:00
Gokhan Gulbiz	ac96370ddf	Use IsMultiStatementTransaction for SELECT .. FOR UPDATE queries (#6288 ) * Use IsMultiStatementTransaction instead of IsTransaction for row-locking operations. * Add regression test for SELECT..FOR UPDATE statement	2022-09-06 16:38:41 +02:00
Naisila Puka	7d6410c838	Drop postgres 12 support (#6040 ) * Remove if conditions with PG_VERSION_NUM < 13 * Remove server_above_twelve(&eleven) checks from tests * Fix tests * Remove pg12 and pg11 alternative test output files * Remove pg12 specific normalization rules * Some more if conditions in the code * Change RemoteCollationIdExpression and some pg12/pg13 comments * Remove some more normalization rules	2022-07-20 17:49:36 +03:00
Marco Slot	09ec366ff5	Improve nested execution checks and add GUC to disable	2022-05-20 18:55:43 +02:00
Teja Mupparti	e56fc34404	Fixes: #5787 In prepared statements, map any unused parameters to a generic type.	2022-05-13 19:31:05 -07:00
Jeff Davis	26f5e20580	PG15: update integer parsing APIs. Account for PG commits 3c6f8c011f and cfc7191dfe.	2022-05-02 10:12:03 -07:00
Marco Slot	055bbd6212	Use coordinated transaction when there are multiple queries per task	2022-03-18 15:04:27 +01:00
Onder Kalaci	338752d96e	Guard against hard wait event set errors Similar to https://github.com/citusdata/citus/pull/5158, but this time instead of the executor, use this in all the remaining places.	2022-03-14 14:35:56 +01:00
Onder Kalaci	953951007c	Move wait event error checks to connection manager	2022-03-14 14:35:56 +01:00
Marco Slot	72d8fde28b	Use intermediate results for re-partition joins	2022-02-23 19:40:21 +01:00
Onur Tirtir	8c8d696621	Not fail over to local execution when it's not supported (#5625 ) We fall back to local execution if we cannot establish any more connections to local node. However, we should not do that for the commands that we don't know how to execute locally (or we know we shouldn't execute locally). To fix that, we take localExecutionSupported take into account in CanFailoverPlacementExecutionToLocalExecution too. Moreover, we also prompt a more accurate hint message to inform user about whether the execution is failed because local execution is disabled by them, or because local execution wasn't possible for given command.	2022-01-25 16:43:21 +01:00
Onur Tirtir	4dc38e9e3d	Use EnsureCompatibleLocalExecutionState instead (#5640 )	2022-01-21 15:37:59 +01:00
Teja Mupparti	54862f8c22	(1) Functions will be delegated even when present in the scope of an explicit BEGIN/COMMIT transaction block or in a UDF calling another UDF. (2) Prohibit/Limit the delegated function not to do a 2PC (or any work on a remote connection). (3) Have a safety net to ensure the (2) i.e. we should block the connections from the delegated procedure or make sure that no 2PC happens on the node. (4) Such delegated functions are restricted to use only the distributed argument value. Note: To limit the scope of the project we are considering only Functions(not procedures) for the initial work. DESCRIPTION: Introduce a new flag "force_delegation" in create_distributed_function(), which will allow a function to be delegated in an explicit transaction block. Fixes #3265 Once the function is delegated to the worker, on that node during the planning distributed_planner() TryToDelegateFunctionCall() CheckDelegatedFunctionExecution() EnableInForceDelegatedFuncExecution() Save the distribution argument (Constant) ExecutorStart() CitusBeginScan() IsShardKeyValueAllowed() Ensure to not use non-distribution argument. ExecutorRun() AdaptiveExecutor() StartDistributedExecution() EnsureNoRemoteExecutionFromWorkers() Ensure all the shards are local to the node in the remoteTaskList. NonPushableInsertSelectExecScan() InitializeCopyShardState() EnsureNoRemoteExecutionFromWorkers() Ensure all the shards are local to the node in the placementList. This also fixes a minor issue: Properly handle expressions+parameters in distribution arguments	2022-01-19 16:43:33 -08:00
Marco Slot	ee3b50b026	Disallow remote execution from queries on shards	2022-01-07 17:46:21 +01:00
Önder Kalacı	8c0bc94b51	Enable replication factor > 1 in metadata syncing (#5392 ) - [x] Add some more regression test coverage - [x] Make sure returning works fine in case of local execution + remote execution (task->partiallyLocalOrRemote works as expected, already added tests) - [x] Implement locking properly (and add isolation tests) - [x] We do #shardcount round-trips on `SerializeNonCommutativeWrites`. We made it a single round-trip. - [x] Acquire locks for subselects on the workers & add isolation tests - [x] Add a GUC to prevent modification from the workers, hence increase the coordinator-only throughput - The performance slightly drops (~%15), unless `citus.allow_modifications_from_workers_to_replicated_tables` is set to false	2021-11-15 15:10:18 +03:00
Onder Kalaci	d5e89b1132	Unify distributed execution logic for single replicated tables Citus does not acquire any executor locks for shard replication == 1. With this commit, we unify this decision and exit early.	2021-11-08 13:52:20 +01:00
Philip Dubé	cc50682158	Fix typos. Spurred spotting "connectios" in logs	2021-10-25 13:54:09 +00:00
Onder Kalaci	ce4c4540c5	Simplify 2PC decision in the executor It seems like the decision for 2PC is more complicated than it should be. With this change, we do one behavioral change. In essense, before this commit, when a SELECT task with replication factor > 1 is executed, the executor was triggering 2PC. And, in fact, the transaction manager (`ConnectionModifiedPlacement()`) was able to understand not to trigger 2PC when no modification happens. However, for transaction blocks like: BEGIN; -- a command that triggers 2PC -- A SELECT command on replication > 1 .. COMMIT; The SELECT was used to be qualified as required 2PC. And, as a side-effect the executor was setting `xactProperties.errorOnAnyFailure = true;` So, the commands was failing at the time of execution. Now, they fail at the end of the transaction.	2021-10-23 09:06:28 +02:00
Onder Kalaci	575bb6dde9	Drop support for Inactive Shard placements Given that we do all operations via 2PC, there is no way for any placement to be marked as INACTIVE.	2021-10-22 18:03:35 +02:00
Önder Kalacı	b3299de81c	Drop support for citus.multi_shard_commit_protocol (#5380 ) In the past, we allowed users to manually switch to 1PC (e.g., one phase commit). However, with this commit, we don't. All multi-shard modifications are done via 2PC.	2021-10-21 14:01:28 +02:00
Önder Kalacı	3f726c72e0	When replication factor > 1, all modifications are done via 2PC (#5379 ) With Citus 9.0, we introduced `citus.single_shard_commit_protocol` which defaults to 2PC. With this commit, we prevent any user to set it to 1PC and drop support for `citus.single_shard_commit_protocol`. Although this might add some overhead for users, it is already the default behaviour (so less likely) and marking placements as INVALID is much worse.	2021-10-20 01:39:03 -07:00
Halil Ozan Akgul	43d5853b6d	Fixes function names in comments	2021-10-06 09:24:43 +03:00
Jelte Fennema	bb5c494104	Enable binary encoding by default on PG14 Since PG14 we can now use binary encoding for arrays and composite types that contain user defined types. This was fixed in this commit in Postgres: `670c0a1d47` This change starts using that knowledge, by not necessarily falling back to text encoding anymore for those types. While doing this and testing a bit more I found various cases where binary encoding would fail that our checks didn't cover. This fixes those cases and adds tests for those. It also fixes EXPLAIN ANALYZE never using binary encoding, which was a leftover of workaround that was not necessary anymore. Finally, it changes the default for both `citus.enable_binary_protocol` and `citus.binary_worker_copy_format` to `true` for PG14 and up. In our cloud offering `binary_worker_copy_format` already was true by default. `enable_binary_protocol` had some bug with MX and user defined types, this bug was fixed by the above mentioned fixes.	2021-09-06 10:27:29 +02:00
Onder Kalaci	86bd28b92c	Guard against hard WaitEvenSet errors In short, add wrappers around Postgres' AddWaitEventToSet() and ModifyWaitEvent(). AddWaitEventToSet()/ModifyWaitEvent*() may throw hard errors. For example, when the underlying socket for a connection is closed by the remote server and already reflected by the OS, however Citus hasn't had a chance to get this information. In that case, if replication factor is >1, Citus can failover to other nodes for executing the query. Even if replication factor = 1, Citus can give much nicer errors. So CitusAddWaitEventSetToSet()/CitusModifyWaitEvent() simply puts AddWaitEventToSet()/ModifyWaitEvent() into a PG_TRY/PG_CATCH block in order to catch any hard errors, and returns this information to the caller.	2021-08-10 09:35:03 +02:00
SaitTalhaNisanci	a4944a2102	Rename CoordinatedTransactionShouldUse2PC (#4995 )	2021-05-21 18:57:42 +03:00
Onder Kalaci	926069a859	Wait until all connections are successfully established Comment from the code: /* * Iterate until all the tasks are finished. Once all the tasks * are finished, ensure that that all the connection initializations * are also finished. Otherwise, those connections are terminated * abruptly before they are established (or failed). Instead, we let * the ConnectionStateMachine() to properly handle them. * * Note that we could have the connections that are not established * as a side effect of slow-start algorithm. At the time the algorithm * decides to establish new connections, the execution might have tasks * to finish. But, the execution might finish before the new connections * are established. / Note that the abruptly terminated connections lead to the following errors: 2020-11-16 21:09:09.800 CET [16633] LOG: could not accept SSL connection: Connection reset by peer 2020-11-16 21:09:09.872 CET [16657] LOG: could not accept SSL connection: Undefined error: 0 2020-11-16 21:09:09.894 CET [16667] LOG: could not accept SSL connection: Connection reset by peer To easily reproduce the issue: - Create a single node Citus - Add the coordinator to the metadata - Create a distributed table with shards on the coordinator - f.sql: select count() from test; - pgbench -f /tmp/f.sql postgres -T 12 -c 40 -P 1 or pgbench -f /tmp/f.sql postgres -T 12 -c 40 -P 1 -C	2021-05-19 15:59:13 +02:00
Onder Kalaci	995adf1a19	Executor takes connection establishment and task execution costs into account With this commit, the executor becomes smarter about refrain to open new connections. The very basic example is that, if the connection establishments take 1000ms and task executions as 5 msecs, the executor becomes smart enough to not establish new connections.	2021-05-19 15:48:07 +02:00
Onder Kalaci	28b0b4ebd1	Move slow start increment to generic place	2021-05-19 14:31:20 +02:00
Marco Slot	00792831ad	Add execution memory contexts and free after local query execution	2021-05-18 16:11:43 +02:00
SaitTalhaNisanci	ff2a125a5b	Lookup hostname before execution (#4976 ) We lookup the hostname just before the execution so that even if there are cached entries in the prepared statement cache we use the updated entries.	2021-05-18 16:46:31 +03:00
Onder Kalaci	cc4870a635	Remove wrong PG_USED_FOR_ASSERTS_ONLY	2021-05-11 12:58:37 +02:00
Onder Kalaci	5482d5822f	Keep more statistics about connection establishment times When DEBUG4 enabled, Citus now prints per connection establishment time.	2021-04-16 14:56:31 +02:00
Onder Kalaci	5b78f6cd63	Keep more execution statistics When DEBUG4 enabled, Citus now prints per task execution times.	2021-04-16 14:45:00 +02:00
Onder Kalaci	e65e72130d	Rename use -> shouldUse Because setting the flag doesn't necessarily mean that we'll use 2PC. If connections are read-only, we will not use 2PC. In other words, we'll use 2PC only for connections that modified any placements.	2021-03-12 08:29:43 +00:00
Onder Kalaci	6a7ed7b309	Do not trigger 2PC for reads on local execution Before this commit, Citus used 2PC no matter what kind of local query execution happens. For example, if the coordinator has shards (and the workers as well), even a simple SELECT query could start 2PC: ```SQL WITH cte_1 AS (SELECT * FROM test LIMIT 10) SELECT count(*) FROM cte_1; ``` In this query, the local execution of the shards (and also intermediate result reads) triggers the 2PC. To prevent that, Citus now distinguishes local reads and local writes. And, Citus switches to 2PC only if a modification happens. This may still lead to unnecessary 2PCs when there is a local modification and remote SELECTs only. Though, we handle that separately via #4587.	2021-03-12 08:29:43 +00:00
Philip Dubé	4e22f02997	Fix various typos due to zealous repetition	2021-03-04 19:28:15 +00:00
Nils Dijk	d127516dc8	Mitigate segfault in connection statemachine (#4551 ) As described in the comment, we have observed crashes in production due to a segfault caused by the dereference of a NULL pointer in our connection statemachine. As a mitigation, preventing system crashes, we provide an error with a small explanation of the issue. Unfortunately the case is not reliably reproduced yet, hence the inability to add tests. DESCRIPTION: Prevent segfaults when SAVEPOINT handling cannot recover from connection failures	2021-01-25 15:55:04 +01:00
Onder Kalaci	c546ec5e78	Local node connection management When Citus needs to parallelize queries on the local node (e.g., the node executing the distributed query and the shards are the same), we need to be mindful about the connection management. The reason is that the client backends that are running distributed queries are competing with the client backends that Citus initiates to parallelize the queries in order to get a slot on the max_connections. In that regard, we implemented a "failover" mechanism where if the distributed queries cannot get a connection, the execution failovers the tasks to the local execution. The failover logic is follows: - As the connection manager if it is OK to get a connection - If yes, we are good. - If no, we fail the workerPool and the failure triggers the failover of the tasks to local execution queue The decision of getting a connection is follows: /* * For local nodes, solely relying on citus.max_shared_pool_size or * max_connections might not be sufficient. The former gives us * a preview of the future (e.g., we let the new connections to establish, * but they are not established yet). The latter gives us the close to * precise view of the past (e.g., the active number of client backends). * * Overall, we want to limit both of the metrics. The former limit typically * kics in under regular loads, where the load of the database increases in * a reasonable pace. The latter limit typically kicks in when the database * is issued lots of concurrent sessions at the same time, such as benchmarks. */	2020-12-03 14:16:13 +03:00
Onder Kalaci	f7e1aa3f22	Multi-row INSERTs use local execution when placements are local Multi-row execution already uses sequential execution. When shards are local, using local execution is profitable as it avoids an extra connection establishment to the local node.	2020-12-01 21:37:59 +03:00
Önder Kalacı	c760cd3470	Move local execution after remote execution (#4301 ) * Move local execution after the remote execution Before this commit, when both local and remote tasks exist, the executor was starting the execution with local execution. There is no strict requirements on this. Especially considering the adaptive connection management improvements that we plan to roll soon, moving the local execution after to the remote execution makes more sense. The adaptive connection management for single node Citus would look roughly as follows: - Try to connect back to the coordinator for running parallel queries. - If succeeds, go on and execute tasks in parallel - If fails, fallback to the local execution So, we'll use local execution as a fallback mechanism. And, moving it after to the remote execution allows us to implement such further scenarios.	2020-11-24 13:43:38 +01:00
Önder Kalacı	532b457554	Solidify the slow-start algorithm (#4318 ) The adaptive executor emulates the TCP's slow start algorithm. Whenever the executor needs new connections, it doubles the number of connections established in the previous iteration. This approach is powerful. When the remote queries are very short (like index lookup with < 1ms), even a single connection is sufficent most of the time. When the remote queries are long, the executor can quickly establish necessary number of connections. One missing piece on our implementation seems that the executor keeps doubling the number of connections even if the previous connection attempts have been finalized. Instead, we should wait until all the attempts are finalized. This is how TCP's slow-start works. Plus, it decreases the unnecessary pressure on the remote nodes.	2020-11-23 19:20:13 +01:00
Onder Kalaci	c433c66f2b	Do not execute subplans multiple times with cursors Before this commit, we let AdaptiveExecutorPreExecutorRun() to be effective multiple times on every FETCH on cursors. That does not affect the correctness of the query results, but adds significant overhead.	2020-11-20 10:43:56 +01:00
Onur Tirtir	f80f4839ad	Remove unused functions that cppcheck found	2020-10-19 13:50:52 +03:00
Onder Kalaci	fe3caf3bc8	Local execution considers intermediate result size limit With this commit, we make sure that local execution adds the intermediate result size as the distributed execution adds. Plus, it enforces the citus.max_intermediate_result_size value.	2020-10-15 17:18:55 +02:00
Sait Talha Nisanci	ecde6c6eef	Introduce GetCurrentLocalExecutionStatus wrapper We should not access CurrentLocalExecutionStatus directly because that would mean that we could also set it directly, which we shouldn't because we have checks to see if the new state is possible, otherwise we error.	2020-10-15 15:38:19 +03:00

1 2 3

146 Commits (31aa589c50171853026717c777c0fccc9e5ef481)