citus

Commit Graph

Author	SHA1	Message	Date
aykut-bozkurt	67ac3da2b0	added citus_depended_objects udf and HideCitusDependentObjects GUC to hide citus depended objects from pg meta queries (#6055 ) use RecurseObjectDependencies api to find if an object is citus depended make vanilla tests runnable to see if citus_depended function is working correctly	2022-07-25 16:43:34 +03:00
Onder Kalaci	483a3a5875	PG 15 Compat: Resolve compile issues + shmem requests Similar to #5897, one more step for running Citus with PG 15. This PR at least make Citus run with PG 15. I have not tried running the tests with PG 15. Shmem changes are based on `4f2400cb3f` Compile breaks are mostly due to #6008	2022-07-15 10:11:39 +02:00
Ahmet Gedemenli	c8e1e243b8	Fix matviews for citus_add_local_table_to_metadata (#6023 )	2022-07-04 17:00:07 +03:00
Jelte Fennema	184c7c0bce	Make enterprise features open source (#6008 ) This PR makes all of the features open source that were previously only available in Citus Enterprise. Features that this adds: 1. Non blocking shard moves/shard rebalancer (`citus.logical_replication_timeout`) 2. Propagation of CREATE/DROP/ALTER ROLE statements 3. Propagation of GRANT statements 4. Propagation of CLUSTER statements 5. Propagation of ALTER DATABASE ... OWNER TO ... 6. Optimization for COPY when loading JSON to avoid double parsing of the JSON object (`citus.skip_jsonb_validation_in_copy`) 7. Support for row level security 8. Support for `pg_dist_authinfo`, which allows storing different authentication options for different users, e.g. you can store passwords or certificates here. 9. Support for `pg_dist_poolinfo`, which allows using connection poolers in between coordinator and workers 10. Tracking distributed query execution times using citus_stat_statements (`citus.stat_statements_max`, `citus.stat_statements_purge_interval`, `citus.stat_statements_track`). This is disabled by default. 11. Blocking tenant_isolation 12. Support for `sslkey` and `sslcert` in `citus.node_conninfo`	2022-06-16 00:23:46 -07:00
Ying Xu	a1151c2395	Clear metadatacache during abort for create extension (#5907 ) * Bug fix for bug #5876. Memset MetadataCacheSystem every time there is an abort * Created an ObjectAccessHook that saves the transactionlevel of when citus was created and will clear metadatacache if that transaction level is rolled back. Added additional tests to make sure metadatacache is cleared	2022-05-20 13:47:58 -07:00
Marco Slot	7abcfac61f	Add caching for functions that check the backend type	2022-05-20 19:02:37 +02:00
Marco Slot	09ec366ff5	Improve nested execution checks and add GUC to disable	2022-05-20 18:55:43 +02:00
jeff-davis	a9f8a60007	Columnar: support relation options with ALTER TABLE. (#5935 ) Columnar: support relation options with ALTER TABLE. Use ALTER TABLE ... SET/RESET to specify relation options rather than alter_columnar_table_set() and alter_columnar_table_reset(). Not only is this more ergonomic, but it also allows better integration because it can be treated like DDL on a regular table. For instance, citus can use its own ProcessUtility_hook to distribute the new settings to the shards. DESCRIPTION: Columnar: support relation options with ALTER TABLE.	2022-05-20 08:35:00 -07:00
gledis69	4731630741	Add distributing lock command support	2022-05-20 12:28:07 +03:00
Marco Slot	ceb593c9da	Convert citus.hide_shards_from_app_name_prefixes to citus.show_shards_for_app_name_prefixes	2022-05-03 14:22:13 +02:00
Marco Slot	9476f377b5	Remove old re-partitioning functions	2022-04-04 18:11:52 +02:00
jeff-davis	c485a04139	Separate build of citus.so and citus_columnar.so. (#5805 ) * Separate build of citus.so and citus_columnar.so. Because columnar code is statically-linked to both modules, it doesn't make sense to load them both at once. A subsequent commit will make the modules entirely separate and allow loading them both simultaneously. Author: Yanwen Jin * Separate citus and citus_columnar modules. Now the modules are independent. Columnar can be loaded by itself, or along with citus. Co-authored-by: Jeff Davis <jefdavi@microsoft.com>	2022-03-31 19:47:17 -07:00
Onder Kalaci	af4ba3eb1f	Remove citus.enable_cte_inlining GUC In Postgres 12+, users can adjust whether to inline/not inline CTEs by [NOT] MATERIALIZED keywords. So, this GUC is already useless.	2022-03-22 17:14:44 +01:00
Marco Slot	7559ad12ba	Change create_object_propagation default to immediate	2022-03-09 17:40:50 +01:00
Onder Kalaci	c32b2de1a7	Improve citus_lock_waits 1) Remove useless columns 2) Show backends that are blocked on a DDL even before gpid is assigned 3) One minor bugfix, where we clear distributedCommandOriginator properly.	2022-03-07 11:10:44 +01:00
Marco Slot	43e4dd3808	Add a citus.internal_reserved_connections setting	2022-03-02 19:13:53 +01:00
Marco Slot	dcfbb51b6b	Revert "Build Columnar.so and make Citus depends on it (#5661 )" This reverts commit `a4133c69e8`.	2022-03-02 11:33:15 +01:00
ywj	a4133c69e8	Build Columnar.so and make Citus depends on it (#5661 ) * [Columnar] Build columnar.so and let citus depends on it Co-authored-by: Yanwen Jin <yanwjin@microsoft.com> Co-authored-by: Ying Xu <32597660+yxu2162@users.noreply.github.com> Co-authored-by: jeff-davis <Jeffrey.Davis@microsoft.com>	2022-03-01 23:31:14 +03:00
Nils Dijk	65bd540943	Feature: configure object propagation behaviour in transactions (#5724 ) DESCRIPTION: Add GUC to control ddl creation behaviour in transactions Historically we would _not_ propagate objects when we are in a transaction block. Creation of distributed tables would not always work in sequential mode, hence objects created in the same transaction as distributing a table that would use the just created object wouldn't work. The benefit was that the user could still benefit from parallelism. Now that the creation of distributed tables is supported in sequential mode it would make sense for users to force transactional consistency of ddl commands for distributed tables. A transaction could switch more aggressively to sequential mode when creating new objects in a transaction. We don't change the default behaviour just yet. Also, many objects would not even propagate their creation when the transaction was already set to sequential, leaving the probability of a self deadlock. The new policy checks solve this discrepancy between objects as well.	2022-03-01 17:29:31 +03:00
Marco Slot	3cd9aa655a	Stop using citus.binary_worker_copy_format	2022-02-23 19:40:21 +01:00
Teja Mupparti	a62901396b	Allow unsafe triggers via a GUC	2022-02-21 22:45:17 -08:00
Onder Kalaci	abd5b1c506	Prevent any monitoring view/udf to show already exited backends The low-level StoreAllActiveTransactions() function filters out backends that exited. Before this commit, if you run a pgbench, after that you'd still see the backends show up: ```SQL select count() from get_global_active_transactions(); ┌───────┐ │ count │ ├───────┤ │ 538 │ └───────┘ ``` After this patch, only active backends show-up: ```SQL select count() from get_global_active_transactions(); ┌───────┐ │ count │ ├───────┤ │ 72 │ └───────┘ ```	2022-02-14 17:34:32 +01:00
Onder Kalaci	1c30f61a70	Prevent citus.node_conninfo to use "application_name" With https://github.com/citusdata/citus/pull/5657, Citus uses a fixed application_name while connecting to remote nodes for internal purposes. It means that we cannot allow users to override it via citus.node_conninfo.	2022-02-09 13:22:04 +01:00
Halil Ozan Akgul	8ee02b29d0	Introduce global PID	2022-02-08 16:49:38 +03:00
Marco Slot	872f0a79db	Remove random shard placement policy	2022-02-06 21:55:58 +01:00
Marco Slot	0cae8e7d6b	Remove local-node-first shard placement	2022-02-06 21:36:34 +01:00
Ying Xu	b5c116449b	Removed dependency from EnsureTableOwner (#5676 ) Removed dependency for EnsureTableOwner. Also removed pg_fini() and columnar_tableam_finish() Still need to remove CheckCitusVersion dependency to make Columnar_tableam.h dependency free from Citus.	2022-02-04 12:45:07 -08:00
Onder Kalaci	ff234fbfd2	Unify old GUCs into a single one Replaces citus.enable_object_propagation with citus.enable_metadata_sync Also, within Citus 11 release cycle, we added citus.enable_metadata_sync_by_default, that is also replaced with citus.enable_metadata_sync. In essence, when citus.enable_metadata_sync is set to true, all the objects and the metadata is send to the remote node. We strongly advice that the users never changes the value of this GUC.	2022-02-04 10:52:56 +01:00
Onur Tirtir	ff3913ad99	Copy errmsg for distributed deadlock error into heap (#5641 ) multi_log_hook() hook is called by EmitErrorReport() when emitting the ereport either to frontend or to the server logs. And some callers of EmitErrorReport() (e.g.: errfinish()) seems to assume that string fields of given ErrorData object needs to be freed. For this reason, we copy the message into heap here. I don't think we have faced with such a problem before but it seems worth fixing as it is theoretically possible due to the reasoning above.	2022-01-24 06:27:41 -08:00
Marco Slot	33bfa0b191	Hide shards from application_name's with a specific prefix	2022-01-18 15:20:55 +04:00
jeff-davis	2e03efd91e	Columnar: move DDL hooks to citus to remove dependency. (#5547 ) Add a new hook ColumnarTableSetOptions_hook so that citus can get control when the columnar table options change.	2022-01-04 23:26:46 -08:00
Onder Kalaci	fc98f83af2	Add citus.grep_remote_commands Simply applies ```SQL SELECT textlike(command, citus.grep_remote_commands) ``` And, if returns true, the command is logged. Else, the log is ignored. When citus.grep_remote_commands is empty string, all commands are logged.	2021-12-17 11:47:40 +01:00
Önder Kalacı	8c0bc94b51	Enable replication factor > 1 in metadata syncing (#5392 ) - [x] Add some more regression test coverage - [x] Make sure returning works fine in case of local execution + remote execution (task->partiallyLocalOrRemote works as expected, already added tests) - [x] Implement locking properly (and add isolation tests) - [x] We do #shardcount round-trips on `SerializeNonCommutativeWrites`. We made it a single round-trip. - [x] Acquire locks for subselects on the workers & add isolation tests - [x] Add a GUC to prevent modification from the workers, hence increase the coordinator-only throughput - The performance slightly drops (~%15), unless `citus.allow_modifications_from_workers_to_replicated_tables` is set to false	2021-11-15 15:10:18 +03:00
Ahmet Gedemenli	14a33d4e8e	Introduce GUC citus.use_citus_managed_tables	2021-11-11 14:09:06 +03:00
Marco Slot	78866df13c	Remove master_append_table_to_shard UDF	2021-11-08 10:43:24 +01:00
Jelte Fennema	57a0228c52	Fix string-concatenation warning on Clang 13 (#5425 ) Clang 13 complains about a suspicious string concatenation. It thinks we might have missed a comma. This adds parentheses to make it clear that concatenation is indeed what we meant.	2021-11-01 13:55:43 +03:00
Philip Dubé	cc50682158	Fix typos. Spurred spotting "connectios" in logs	2021-10-25 13:54:09 +00:00
Önder Kalacı	b3299de81c	Drop support for citus.multi_shard_commit_protocol (#5380 ) In the past, we allowed users to manually switch to 1PC (e.g., one phase commit). However, with this commit, we don't. All multi-shard modifications are done via 2PC.	2021-10-21 14:01:28 +02:00
Önder Kalacı	3f726c72e0	When replication factor > 1, all modifications are done via 2PC (#5379 ) With Citus 9.0, we introduced `citus.single_shard_commit_protocol` which defaults to 2PC. With this commit, we prevent any user to set it to 1PC and drop support for `citus.single_shard_commit_protocol`. Although this might add some overhead for users, it is already the default behaviour (so less likely) and marking placements as INVALID is much worse.	2021-10-20 01:39:03 -07:00
Halil Ozan Akgul	9c9d4b5eeb	Turn MX on by default	2021-10-08 18:17:21 +03:00
Jelte Fennema	bb5c494104	Enable binary encoding by default on PG14 Since PG14 we can now use binary encoding for arrays and composite types that contain user defined types. This was fixed in this commit in Postgres: `670c0a1d47` This change starts using that knowledge, by not necessarily falling back to text encoding anymore for those types. While doing this and testing a bit more I found various cases where binary encoding would fail that our checks didn't cover. This fixes those cases and adds tests for those. It also fixes EXPLAIN ANALYZE never using binary encoding, which was a leftover of workaround that was not necessary anymore. Finally, it changes the default for both `citus.enable_binary_protocol` and `citus.binary_worker_copy_format` to `true` for PG14 and up. In our cloud offering `binary_worker_copy_format` already was true by default. `enable_binary_protocol` had some bug with MX and user defined types, this bug was fixed by the above mentioned fixes.	2021-09-06 10:27:29 +02:00
SaitTalhaNisanci	c8326df8c0	Fix missing comma in connection options (#5206 )	2021-08-25 13:40:42 +03:00
Jelte Fennema	a31429aae5	Allow configuring tcp_user_timeout using citus.node_conn_info (#5203 ) `tcp_user_timeout` is the awesome relatively unknown big brother of the TCP keepalive related options. Instead of depending on keepalives being sent, this determines that a socket is dead by waiting at most N seconds for an ack of data that it has sent. It's exposed in libpq starting from PG12.	2021-08-24 11:48:40 +03:00
Ahmet Gedemenli	51d410bb7b	Add check for alphabetically sorted gucs Move to a separate script Add the new script to readme	2021-08-05 16:37:49 +03:00
Onder Kalaci	2c349e6dfd	Use current user to sync metadata Before this commit, we always synced the metadata with superuser. However, that creates various edge cases such as visibility errors or self distributed deadlocks or complicates user access checks. Instead, with this commit, we use the current user to sync the metadata. Note that, `start_metadata_sync_to_node` still requires super user because accessing certain metadata (like pg_dist_node) always require superuser (e.g., the current user should be a superuser). However, metadata syncing operations regarding the distributed tables can now be done with regular users, as long as the user is the owner of the table. A table owner can still insert non-sense metadata, however it'd only affect its own table. So, we cannot do anything about that.	2021-07-16 13:25:27 +02:00
Ahmet Gedemenli	089ef35940	Disable dropping and truncating known shards Add test for disabling dropping and truncating known shards	2021-06-02 14:30:27 +02:00
Jelte Fennema	1a83628195	Use "orphaned shards" naming in more places We were not very consistent in how we named these shards.	2021-06-04 11:39:19 +02:00
Ahmet Gedemenli	103cf34418	Sort GUCs in alphabetical order	2021-06-02 12:52:18 +03:00
Hanefi Onaldi	878513f325	Remove all occurences of replication_model GUC	2021-05-21 16:14:59 +03:00
SaitTalhaNisanci	82f34a8d88	Enable citus.defer_drop_after_shard_move by default (#4961 ) Enable citus.defer_drop_after_shard_move by default	2021-05-21 10:48:32 +03:00
Jelte Fennema	10f06ad753	Fetch shard size on the fly for the rebalance monitor Without this change the rebalancer progress monitor gets the shard sizes from the `shardlength` column in `pg_dist_placement`. This column needs to be updated manually by calling `citus_update_table_statistics`. However, `citus_update_table_statistics` could lead to distributed deadlocks while database traffic is on-going (see #4752). To work around this we don't use `shardlength` column anymore. Instead for every rebalance we now fetch all shard sizes on the fly. Two additional things this does are: 1. It adds tests for the rebalance progress function. 2. If a shard move cannot be done because a source or target node is unreachable, then we error in stop the rebalance, instead of showing a warning and continuing. When using the by_disk_size rebalance strategy it's not safe to continue with other moves if a specific move failed. It's possible that the failed move made space for the next move, and because the failed move never happened this space now does not exist. 3. Adds two new columns to the result of `get_rebalancer_progress` which shows the size of the shard on the source and target node. Fixes #4930	2021-05-20 16:38:17 +02:00
Nils Dijk	a6c2d2a4c4	Feature: alter database owner (#4986 ) DESCRIPTION: Add support for ALTER DATABASE OWNER This adds support for changing the database owner. It achieves this by marking the database as a distributed object. By marking the database as a distributed object it will look for its dependencies and order the user creation commands (enterprise only) before the alter of the database owner. This is mostly important when adding new nodes. By having the database marked as a distributed object it can easily understand for which `ALTER DATABASE ... OWNER TO ...` commands to propagate by resolving the object address of the database and verifying it is a distributed object, and hence should propagate changes of owner ship to all workers. Given the ownership of the database might have implications on subsequent commands in transactions we force sequential mode for transactions that have a `ALTER DATABASE ... OWNER TO ...` command in them. This will fail the transaction with meaningful help when the transaction already executed parallel statements. By default the feature is turned off since roles are not automatically propagated, having it turned on would cause hard to understand errors for the user. It can be turned on by the user via setting the `citus.enable_alter_database_owner`.	2021-05-20 13:27:44 +02:00
Onder Kalaci	926069a859	Wait until all connections are successfully established Comment from the code: /* * Iterate until all the tasks are finished. Once all the tasks * are finished, ensure that that all the connection initializations * are also finished. Otherwise, those connections are terminated * abruptly before they are established (or failed). Instead, we let * the ConnectionStateMachine() to properly handle them. * * Note that we could have the connections that are not established * as a side effect of slow-start algorithm. At the time the algorithm * decides to establish new connections, the execution might have tasks * to finish. But, the execution might finish before the new connections * are established. / Note that the abruptly terminated connections lead to the following errors: 2020-11-16 21:09:09.800 CET [16633] LOG: could not accept SSL connection: Connection reset by peer 2020-11-16 21:09:09.872 CET [16657] LOG: could not accept SSL connection: Undefined error: 0 2020-11-16 21:09:09.894 CET [16667] LOG: could not accept SSL connection: Connection reset by peer To easily reproduce the issue: - Create a single node Citus - Add the coordinator to the metadata - Create a distributed table with shards on the coordinator - f.sql: select count() from test; - pgbench -f /tmp/f.sql postgres -T 12 -c 40 -P 1 or pgbench -f /tmp/f.sql postgres -T 12 -c 40 -P 1 -C	2021-05-19 15:59:13 +02:00
Onder Kalaci	995adf1a19	Executor takes connection establishment and task execution costs into account With this commit, the executor becomes smarter about refrain to open new connections. The very basic example is that, if the connection establishments take 1000ms and task executions as 5 msecs, the executor becomes smart enough to not establish new connections.	2021-05-19 15:48:07 +02:00
Nils Dijk	c91f8d8a15	Feature: localhost guc (#4836 ) DESCRIPTION: introduce `citus.local_hostname` GUC for connections to the current node Citus once in a while needs to connect to itself for some systems operations. This used to be hardcoded to `localhost`. The hardcoded hostname causes some issues, for example in environments where `sslmode=verify-full` is required. It is not always desirable or even feasible to get `localhost` as an alt name on the certificate. By introducing a GUC to use when connecting to the current instance the user has more control what network path is used and what hostname is required to be present in the server certificate.	2021-05-12 16:59:44 +02:00
Jelte Fennema	cbbd10b974	Implement an improvement threshold in the rebalancer (#4927 ) Every move in the rebalancer algorithm results in an improvement in the balance. However, even if the improvement in the balance was very small the move was still chosen. This is especially problematic if the shard itself is very big and the move will take a long time. This changes the rebalancer algorithm to take the relative size of the balance improvement into account when choosing moves. By default a move will not be chosen if it improves the balance by less than half of the size of the shard. An extra argument is added to the rebalancer functions so that the user can decide to lower the default threshold if the ignored move is wanted anyway.	2021-05-11 14:24:59 +02:00
SaitTalhaNisanci	6b1904d37a	When moving a shard to a new node ensure there is enough space (#4929 ) * When moving a shard to a new node ensure there is enough space * Add WairForMiliseconds time utility * Add more tests and increase readability * Remove the retry loop and use a single udf for disk stats * Address review * address review Co-authored-by: Jelte Fennema <github-tech@jeltef.nl>	2021-05-06 17:28:02 +03:00
Ahmet Gedemenli	fe65be993e	Sort GUCs in alphabetic order	2021-04-26 15:05:42 +03:00
Onder Kalaci	918838e488	Allow constant VALUES clauses in pushdown queries As long as the VALUES clause contains constant values, we should not recursively plan the queries/CTEs. This is a follow-up work of #1805. So, we can easily apply OUTER join checks as if VALUES clause is a reference table/immutable function.	2021-04-21 14:28:08 +02:00
SaitTalhaNisanci	03832f353c	Drop postgres 11 support	2021-03-25 09:20:28 +03:00
Marco Slot	fbc2147e11	Replace MAX_PUT_COPY_DATA_BUFFER_SIZE by citus.remote_copy_flush_threshold GUC	2021-03-16 06:00:38 +01:00
Marco Slot	1646fca445	Add GUC to set maximum connection lifetime	2021-03-16 01:57:57 +01:00
Onder Kalaci	f297c96ec5	Add regression tests for COPY into colocated intermediate results To add the tests without too much data, make the copy switchover configurable.	2021-02-11 15:41:06 +01:00
Onder Kalaci	c804c9aa21	Allow local execution for intermediate results in COPY When COPY is used for copying into co-located files, it was not allowed to use local execution. The primary reason was Citus treating co-located intermediate results as co-located shards, and COPY into the distributed table was done via "format result". And, local execution of such COPY commands was not implemented. With this change, we implement support for local execution with "format result". To do that, we use the buffer for every file on shardState->copyOutState, similar to how local copy on shards are implemented. In fact, the logic is similar to local copy on shards, but instead of writing to the shards, Citus writes the results to a file. The logic relies on LOCAL_COPY_FLUSH_THRESHOLD, and flushes only when the size exceeds the threshold. But, unlike local copy on shards, in this case we write the headers and footers just once.	2021-02-09 15:00:06 +01:00
Onder Kalaci	30d0a65f40	Adds citus.enable_local_reference_table_foreign_keys When enabled any foreign keys between local tables and reference tables supported by converting the local table to a citus local table. When the coordinator is not in the metadata, the logic is disabled as foreign keys are not allowed in this configuration.	2021-01-15 18:04:52 +03:00
Ahmet Gedemenli	9a100bcdb9	Remove unused GUCs Remove deprecated variables Remove GUC citus.sslmode Remove GUC citus.expire_cached_shards Remove GUC citus.task_tracker_delay Remove GUC citus.max_assign_task_batch_size Remove GUC citus.max_tracked_tasks_per_node Remove GUC citus.max_running_tasks_per_node Remove GUC citus.large_table_shard_count Remove GUC citus.max_task_string_size Remove GUC citus.binary_master_copy_format	2021-01-15 13:30:45 +03:00
Marco Slot	011283122b	Add the shard rebalancer implementation	2021-01-07 16:51:55 +01:00
Sait Talha Nisanci	1d82972ff4	Increase the performance with a trick Instead of sending NULL's over a network, we now convert the subqueries in the form of: SELECT t.a, NULL, NULL FROM (SELECT a FROM table)t; And we recursively plan the inner part so that we don't send the NULL's over network. We still need the NULLs in the outer subquery because we currently don't have an easy way of updating all the necessary places in the query. Add some documentation for how the conversion is done	2020-12-15 18:18:36 +03:00
Sait Talha Nisanci	5618f3a3fc	Use BaseRestrictInfo for finding equality columns Baseinfo also has pushed down filters etc, so it makes more sense to use BaseRestrictInfo to determine what columns have constant equality filters. Also RteIdentity is used for removing conversion candidates instead of rteIndex.	2020-12-15 18:18:36 +03:00
Sait Talha Nisanci	26d9f0b457	Use auto mode in tests and fix debug message	2020-12-15 18:17:10 +03:00
Sait Talha Nisanci	f3d55448b3	Choose distributed table if it has a unique index in filter When doing local-distributed table joins we convert one of them to subquery. The current policy is that we convert distributed tables to subquery if it has a unique index on a column that has unique index(primary key also has a unique index).	2020-12-15 18:17:10 +03:00
Onder Kalaci	8f8390ed6e	Recursively plan local table joins The logical planner cannot handle joins between local and distributed table. Instead, we can recursively plan one side of the join and let the logical planner handle the rest. Our algorithm is a little smart, trying not to recursively plan distributed tables, but favors local tables.	2020-12-15 18:17:10 +03:00
Onder Kalaci	c546ec5e78	Local node connection management When Citus needs to parallelize queries on the local node (e.g., the node executing the distributed query and the shards are the same), we need to be mindful about the connection management. The reason is that the client backends that are running distributed queries are competing with the client backends that Citus initiates to parallelize the queries in order to get a slot on the max_connections. In that regard, we implemented a "failover" mechanism where if the distributed queries cannot get a connection, the execution failovers the tasks to the local execution. The failover logic is follows: - As the connection manager if it is OK to get a connection - If yes, we are good. - If no, we fail the workerPool and the failure triggers the failover of the tasks to local execution queue The decision of getting a connection is follows: /* * For local nodes, solely relying on citus.max_shared_pool_size or * max_connections might not be sufficient. The former gives us * a preview of the future (e.g., we let the new connections to establish, * but they are not established yet). The latter gives us the close to * precise view of the past (e.g., the active number of client backends). * * Overall, we want to limit both of the metrics. The former limit typically * kics in under regular loads, where the load of the database increases in * a reasonable pace. The latter limit typically kicks in when the database * is issued lots of concurrent sessions at the same time, such as benchmarks. */	2020-12-03 14:16:13 +03:00
Onder Kalaci	629ecc3dee	Add the infrastructure to count the number of client backends Considering the adaptive connection management improvements that we plan to roll soon, it makes it very helpful to know the number of active client backends. We are doing this addition to simplify yhe adaptive connection management for single node Citus. In single node Citus, both the client backends and Citus parallel queries would compete to get slots on Postgres' `max_connections` on the same Citus database. With adaptive connection management, we have the counters for Citus parallel queries. That helps us to adaptively decide on the remote executions pool size (e.g., throttle connections if necessary). However, we do not have any counters for the total number of client backends on the database. For single node Citus, we should consider all the client backends, not only the remote connections that Citus does. Of course Postgres internally knows how many client backends are active. However, to get that number Postgres iterates over all the backends. For examaple, see [pg_stat_get_db_numbackends](`8e90ec5580/src/backend/utils/adt/pgstatfuncs.c (L1240)`) where Postgres iterates over all the backends. For our purpuses, we need this information on every connection establishment. That's why we cannot affort to do this kind of iterattion.	2020-11-25 19:19:24 +01:00
Nils Dijk	213eb93e6d	make columnar compile and functionally working	2020-11-17 18:55:34 +01:00
SaitTalhaNisanci	45bb0fb587	Do initial cleanup only once in pg_init (#4213 ) In postmasters execution of _PG_init, IsUnderPostmaster will be false and we want to do the cleanup at that time only, otherwise there is a chance that there will be parallel queries and we might do a cleanup for things that are already in use.	2020-10-02 09:12:39 +03:00
Ahmet Gedemenli	abfb79bda6	Sort explain analyze output by task time Add sort method parameter for regression tests Fix check-style Change sorting method parameters to enum Polish Add task fields to OutTask Add test into multi_explain Fix isolation test	2020-09-24 11:38:40 +03:00
Önder Kalacı	983206c5e1	Hide `citus.subquery_pushdown` flag and NOTICE when enabled (#4124 ) * Hide citus.subquery_pushdown flag This flag is dangerous and could likely to let queries return wrong results. The flag has a very specific purpose for a very specific data distribution and query structure. In those cases, when the flag is set, the user can skip recursive planning altogether at their own risk. The meaning of the flag is that "I know what I'm doing such that the query structure/data distribution is on my control, so Citus can skip many correctness checks". For regular users, enabling this flag is discouraged. We have to keep the support only for backward compatibility for some users. In addition to that, give a NOTICE to discourage new users to use it.	2020-08-28 14:53:09 +02:00
Onder Kalaci	eeb8c81de2	Implement shared connection count reservation & enable `citus.max_shared_pool_size` for COPY With this patch, we introduce `locally_reserved_shared_connections.c/h` files which are responsible for reserving some space in shared memory counters upfront. We sometimes need to reserve connections, but not necessarily establish them. For example: - COPY command should reserve connections as it cannot know which connections it needs in which order. COPY establishes connections as any input data hits the workers. For example, for router COPY command, it only establishes 1 connection. As discussed here (https://github.com/citusdata/citus/pull/3849#pullrequestreview-431792473), COPY needs to reserve connections up-front, otherwise we can end up with resource starvation/un-detected deadlocks.	2020-08-03 18:51:40 +02:00
Sait Talha Nisanci	a3dc8fe2b5	remove occurrences of task-tracker from gucs	2020-07-21 16:19:46 +03:00
SaitTalhaNisanci	b3af63c8ce	Remove task tracker executor (#3850 ) * use adaptive executor even if task-tracker is set * Update check-multi-mx tests for adaptive executor Basically repartition joins are enabled where necessary. For parallel tests max adaptive executor pool size is decresed to 2, otherwise we would get too many clients error. * Update limit_intermediate_size test It seems that when we use adaptive executor instead of task tracker, we exceed the intermediate result size less in the test. Therefore updated the tests accordingly. * Update multi_router_planner It seems that there is one problem with multi_router_planner when we use adaptive executor, we should fix the following error: +ERROR: relation "authors_range_840010" does not exist +CONTEXT: while executing command on localhost:57637 * update repartition join tests for check-multi * update isolation tests for repartitioning * Error out if shard_replication_factor > 1 with repartitioning As we are removing the task tracker, we cannot switch to it if shard_replication_factor > 1. In that case, we simply error out. * Remove MULTI_EXECUTOR_TASK_TRACKER * Remove multi_task_tracker_executor Some utility methods are moved to task_execution_utils.c. * Remove task tracker protocol methods * Remove task_tracker.c methods * remove unused methods from multi_server_executor * fix style * remove task tracker specific tests from worker_schedule * comment out task tracker udf calls in tests We were using task tracker udfs to test permissions in multi_multiuser.sql. We should find some other way to test them, then we should remove the commented out task tracker calls. * remove task tracker test from follower schedule * remove task tracker tests from multi mx schedule * Remove task-tracker specific functions from worker functions * remove multi task tracker extra schedule * Remove unused methods from multi physical planner * remove task_executor_type related things in tests * remove LoadTuplesIntoTupleStore * Do initial cleanup for repartition leftovers During startup, task tracker would call TrackerCleanupJobDirectories and TrackerCleanupJobSchemas to clean up leftover directories and job schemas. With adaptive executor, while doing repartitions it is possible to leak these things as well. We don't retry cleanups, so it is possible to have leftover in case of errors. TrackerCleanupJobDirectories is renamed as RepartitionCleanupJobDirectories since it is repartition specific now, however TrackerCleanupJobSchemas cannot be used currently because it is task tracker specific. The thing is that this function is a no-op currently. We should add cleaning up intermediate schemas to DoInitialCleanup method when that problem is solved(We might want to solve it in this PR as well) * Revert "remove task tracker tests from multi mx schedule" This reverts commit `03ecc0a681`. * update multi mx repartition parallel tests * not error with task_tracker_conninfo_cache_invalidate * not run 4 repartition queries in parallel It seems that when we run 4 repartition queries in parallel we get too many clients error on CI even though we don't get it locally. Our guess is that, it is because we open/close many connections without doing some work and postgres has some delay to close the connections. Hence even though connections are removed from the pg_stat_activity, they might still not be closed. If the above assumption is correct, it is unlikely for it to happen in practice because: - There is some network latency in clusters, so this leaves some times for connections to be able to close - Repartition joins return some data and that also leaves some time for connections to be fully closed. As we don't get this error in our local, we currently assume that it is not a bug. Ideally this wouldn't happen when we get rid of the task-tracker repartition methods because they don't do any pruning and might be opening more connections than necessary. If this still gives us "too many clients" error, we can try to increase the max_connections in our test suite(which is 100 by default). Also there are different places where this error is given in postgres, but adding some backtrace it seems that we get this from ProcessStartupPacket. The backtraces can be found in this link: https://circleci.com/gh/citusdata/citus/138702 * Set distributePlan->relationIdList when it is needed It seems that we were setting the distributedPlan->relationIdList after JobExecutorType is called, which would choose task-tracker if replication factor > 1 and there is a repartition query. However, it uses relationIdList to decide if the query has a repartition query, and since it was not set yet, it would always think it is not a repartition query and would choose adaptive executor when it should choose task-tracker. * use adaptive executor even with shard_replication_factor > 1 It seems that we were already using adaptive executor when replication_factor > 1. So this commit removes the check. * remove multi_resowner.c and deprecate some settings * remove TaskExecution related leftovers * change deprecated API error message * not recursively plan single relatition repartition subquery * recursively plan single relation repartition subquery * test depreceated task tracker functions * fix overlapping shard intervals in range-distributed test * fix error message for citus_metadata_container * drop task-tracker deprecated functions * put the implemantation back to worker_cleanup_job_schema_cachesince citus cloud uses it * drop some functions, add downgrade script Some deprecated functions are dropped. Downgrade script is added. Some gucs are deprecated. A new guc for repartition joins bucket size is added. * order by a test to fix flappiness	2020-07-18 13:11:36 +03:00
Nils Dijk	d0b6e62c9a	change wording to allowlist and the likes (#3906 ) In the same line as #3904 Change wording to better reflect use and remove words that enforce/maintain bias.	2020-07-15 16:24:40 +02:00
Jelte Fennema	392c5e2c34	Fix wrong cancellation message about distributed deadlocks (#3956 )	2020-06-30 14:57:46 +02:00
Marco Slot	2a3234ca26	Rename masterQuery to combineQuery	2020-06-17 14:14:37 +02:00
Marco Slot	d1bab78d79	Remove master from file hierarchy	2020-06-16 17:49:09 +02:00
Jelte Fennema	0e12d045b1	Support use of binary protocol in between nodes (#3877 ) This can save a lot of data to be sent in some cases, thus improving performance for which inter query bandwidth is the bottleneck. There's some issues with enabling this as default, so that's currently not done.	2020-06-12 15:02:51 +02:00
Hadi Moshayedi	bb96ef5047	Does the EXPLAIN ANALYZE at the same time as execution, so avoids executing twice. We wrap worker tasks in worker_save_query_explain_analyze() so we can fetch their explain output later by a call worker_last_saved_explain_analyze(). Fixes #3519 Fixes #2347 Fixes #2613 Fixes #621	2020-06-11 01:55:57 -07:00
Onder Kalaci	bc54c5125f	Increase the default value of citus.node_connection_timeout The previous default was 5 seconds, and we change it to 30 seconds. The main motivation for this is that for busy clusters, 5 seconds can be too aggressive. Especially with connection throttling, the servers might be kept busy for a really long time, and users may see the connection errors more frequently. We've done some sanity checks, for really quick queries (like `SELECT count(*) from table`), 30 seconds is a decent value even if users execute 300 distributed queries on the coordinator. We've verified this on Hyperscale(Citus).	2020-04-24 15:16:42 +02:00
Nils Dijk	1d6ba1d09e	Refactor alter role to work on distributed roles (#3739 ) DESCRIPTION: Alter role only works for citus managed roles Alter role was implemented before we implemented good role management that hooks into the object propagation framework. This is a refactor of all alter role commands that have been implemented to - be on by default - only work for supported roles - make the citus extension owner a supported role Instead of distributing the alter role commands for roles at the beginning of the node activation role it now _only_ executes the alter role commands for all users in all databases and in the current database. In preparation of full role support small refactors have been done in the deparser. Earlier tests targeting other roles than the citus extension owner have been either slightly changed or removed to be put back where we have full role support. Fixes #2549	2020-04-16 12:23:27 +02:00
Marco Slot	8b83306a27	Issue worker messages with the same log level	2020-04-14 21:08:25 +02:00
Onder Kalaci	aa6b641828	Throttle connections to the worker nodes With this commit, we're introducing a new infrastructure to throttle connections to the worker nodes. This infrastructure is useful for multi-shard queries, router queries are have not been affected by this. The goal is to prevent establishing more than citus.max_shared_pool_size number of connections per worker node in total, across sessions. To do that, we've introduced a new connection flag OPTIONAL_CONNECTION. The idea is that some connections are optional such as the second (and further connections) for the adaptive executor. A single connection is enough to finish the distributed execution, the others are useful to execute the query faster. Thus, they can be consider as optional connections. When an optional connection is not allowed to the adaptive executor, it simply skips it and continues the execution with the already established connections. However, it'll keep retrying to establish optional connections, in case some slots are open again.	2020-04-14 10:27:48 +02:00
Onder Kalaci	0dbfbe0c37	Add the necessary shared memory infrastructure - The hashmap in the shared memory - The lock to access the hashmap - The GUC to control the size	2020-04-14 10:03:26 +02:00
Hadi Moshayedi	dda53a0bba	GUC for replicate reference tables on activate.	2020-04-08 12:42:45 -07:00
SaitTalhaNisanci	ba01f3457a	use macros for pg versions instead of hardcoded values (#3694 ) 3 Macros are defined for removing the hardcoded pg versions. PG_VERSION_11, PG_VERSION_12 and PG_VERSION_13.	2020-04-01 17:01:52 +03:00
Jelte Fennema	2aabe3e2ef	Mark all connections for shutdown when citus.node_conninfo chan… (#3642 ) We cache connections between nodes in our connection management code. This is good for speed. For security this can be a problem though. If the user changes settings related to TLS encryption they want those to be applied to future queries. This is especially important when they did not have TLS enabled before and now they want to enable it. This can normally be achieved by changing citus.node_conninfo. However, because connections are not reopened there will still be old connections that might not be encrypted at all. This commit changes that by marking all connections to be shutdown at the end of their current transaction. This way running transactions will succeed, even if placement requires connections to be reused for this transaction. But after this transaction completes any future statements will use a connection created with the new connection options. If a connection is requested and a connection is found that is marked for shutdown, then we don't return this connection. Instead a new one is created. This is needed to make sure that if there are no running transactions, then the next statement will not use an old cached connection, since connections are only actually shutdown at the end of a transaction.	2020-03-24 15:31:41 +01:00
Onder Kalaci	7b4eb9611b	Properly terminate connections at the end session Citus coordinator (or MX nodes) caches `citus.max_cached_conns_per_worker` connections per node. This means that, those connections are not terminated after each statement. Instead, cached to avoid the cost of re-establishment. This is crucial for OLTP performance. The problem with that approach is that, we never properly handle the termnation of those cached connections. For instance, when a session on the coordinator disconnects, you'd see the following logs on the workers: ``` 2020-03-20 09:13:39.454 CET [64028] LOG: could not receive data from client: Connection reset by peer ``` With this patch, we're terminating the cached connections properly at the end of the connection.	2020-03-20 17:34:34 +01:00
Jelte Fennema	8de8b62669	Convert unsafe APIs to safe ones	2020-02-25 15:39:27 +01:00
Nils Dijk	a77ed9cd23	Refactor master query to be planned by postgres' planner (#3326 ) DESCRIPTION: Replace the query planner for the coordinator part with the postgres planner Closes #2761 Citus had a simple rule based planner for the query executed on the query coordinator. This planner grew over time with the addigion of SQL support till it was getting close to the functionality of the postgres planner. Except the code was brittle and its complexity rose which made it hard to add new SQL support. Given its resemblance with the postgres planner it was a long outstanding wish to replace our hand crafted planner with the well supported postgres planner. This patch replaces our planner with a call to postgres' planner. Due to the functionality of the postgres planner we needed to support both projections and filters/quals on the citus custom scan node. When a sort operation is planned above the custom scan it might require fields to be reordered in the custom scan before returning the tuple (projection). The postgres planner assumes every custom scan node implements projections. Because we controlled the plan that was created we prevented reordering in the custom scan and never had implemented it before. A same optimisation applies to having clauses that could have been where clauses. Instead of applying the filter as a having on the aggregate it will push it down into the plan which could reach a custom scan node. For both filters and projections we have implemented them when tuples are read from the tuple store. If no projections or filters are required it will directly return the tuple from the tuple store. Otherwise it will loop tuples from the tuple store through the filter and projection until a tuple is found and returned. Besides filters being pushed down a side effect of having quals that could have been a where clause is that a call to read intermediate result could be called before the first tuple is fetched from the custom scan. This failed because the intermediate result would only be pulled to the coordinator on the first tuple fetch. To overcome this problem we do run the distributed subplans now before we run the postgres executor. This ensures the intermediate result is present on the coordinator in time. We do account for total time instrumentation by removing the instrumentation before handing control to the psotgres executor and update the timings our self. For future SQL support it is enough to create a valid query structure for the part of the query to be executed on the query coordinating node. As a utility we do serialise and print the query at debug level4 for engineers to inspect what kind of query is being planned on the query coordinator.	2020-02-25 14:39:56 +01:00
Önder Kalacı	412fe719f7	Hide citus.enable_ddl_propagation setting (#3437 ) As that is powerful and cause metadata inconsistency. See the following steps: (Note that we cannot use PGC_SUSET because on Citus MX we need this flag for non- superusers as well) ```SQL CREATE TABLE test_ref_table(key int); SELECT create_reference_table('test_ref_table'); SELECT logicalrelid, logicalrelid::oid FROM pg_dist_partition; ┌────────────────┬──────────────┐ │ logicalrelid │ logicalrelid │ ├────────────────┼──────────────┤ │ test_ref_table │ 16831 │ └────────────────┴──────────────┘ (1 row) Time: 0.929 ms SELECT relname FROM pg_class WHERE oid = 16831; ┌────────────────┐ │ relname │ ├────────────────┤ │ test_ref_table │ └────────────────┘ (1 row) Time: 0.785 ms SET citus.enable_ddl_propagation TO off; DROP TABLE test_ref_table ; SELECT logicalrelid, logicalrelid::oid FROM pg_dist_partition; ┌──────────────┬──────────────┐ │ logicalrelid │ logicalrelid │ ├──────────────┼──────────────┤ │ 16831 │ 16831 │ └──────────────┴──────────────┘ (1 row) Time: 0.972 ms SELECT relname FROM pg_class WHERE oid = 16831; ┌─────────┐ │ relname │ ├─────────┤ └─────────┘ (0 rows) Time: 0.908 ms SELECT master_add_node('localhost', 9703); server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. The connection to the server was lost. Attempting reset: Failed. Time: 5.028 ms !> ```	2020-01-29 10:17:53 +01:00
Hadi Moshayedi	a079278b0c	Repartitioned INSERT/SELECT: Add a GUC to enable/disable it	2020-01-16 23:24:52 -08:00

1 2 3 4 5 ...

271 Commits (0d503dd5ac5547ca71cd0147e53236d8d8a22fce)