citus

Commit Graph

Author	SHA1	Message	Date
Naisila Puka	84f2d8685a	Adds control for background task executors involving a node (#6771) DESCRIPTION: Adds control for background task executors involving a node
### Background and motivation
Nonblocking concurrent task execution via background workers was introduced in [#6459](https://github.com/citusdata/citus/pull/6459), and concurrent shard moves in the background rebalancer were introduced in [#6756](https://github.com/citusdata/citus/pull/6756) - with a hard dependency that limits each node to 1 shard move. As we know, a shard move consists of a shard moving from a source node to a target node. The hard dependency was used because the background task runner didn't have an option to limit the parallel shard moves per node. With the motivation of controlling the number of concurrent shard moves that involve a particular node, either as source or target, this PR introduces a general new GUC citus.max_background_task_executors_per_node to be used in the background task runner infrastructure. So, why do we even want to control and limit the concurrency? Well, it's all about resource availability: because the moves involve the same nodes, extra parallelism won’t make the rebalance complete faster if some resource is already maxed out (usually CPU or disk). Or, if the cluster is being used in a production setting, the moves might compete for resources with production queries much more than if they had been executed sequentially.
### How does it work?
A new column named nodes_involved is added to the catalog table that keeps track of the scheduled background tasks, pg_dist_background_task. It is of type integer[] - to store a list of node ids. It is NULL by default - the column will be filled by the rebalancer, but we may not care about the nodes involved in other uses of the background task runner.
Table "pg_catalog.pg_dist_background_task"
     Column     |           Type
============================================
 job_id         | bigint
 task_id        | bigint
 owner          | regrole
 pid            | integer
 status         | citus_task_status
 command        | text
 retry_count    | integer
 not_before     | timestamp with time zone
 message        | text
+nodes_involved | integer[]

A hashtable named ParallelTasksPerNode keeps track of the number of parallel running background tasks per node. An entry in the hashtable is as follows:

ParallelTasksPerNodeEntry
{
    node_id // The node is used as the hash table key
    counter // Number of concurrent background tasks that involve node node_id
            // The counter limit is citus.max_background_task_executors_per_node
}

When the background task runner assigns a runnable task to a new executor, it increments the counter for each of the nodes involved with that runnable task. The limit of each counter is citus.max_background_task_executors_per_node. If the limit is reached for any of the nodes involved, this runnable task is skipped. Later, when the running task finishes, the background task runner decrements the counter for each of the nodes involved with the done task. The following functions take care of these increment-decrement steps:

IncrementParallelTaskCountForNodesInvolved(task)
DecrementParallelTaskCountForNodesInvolved(task)

citus.max_background_task_executors_per_node can be changed on the fly. In the background rebalancer, we simply give {source_node, target_node} as the nodesInvolved input to the ScheduleBackgroundTask function. The rest is taken care of by the general background task runner infrastructure explained above. Check background_task_queue_monitor.sql and background_rebalance_parallel.sql tests for detailed examples.
#### Note
This PR also adds a hard node dependency if a node is first being used as a source for a move, and then later as a target. The reason this should be a hard dependency is that the first move might make space for the second move.
So, we could run out of disk space (or at least overload the node) if we move the second shard to it before the first one is moved away. Fixes https://github.com/citusdata/citus/issues/6716	2023-04-06 14:12:39 +03:00
Nils Dijk	3801576dfb	Move pg_dist_object to pg_catalog (#5765) DESCRIPTION: Move pg_dist_object to pg_catalog Historically `pg_dist_object` had been created in the `citus` schema as an experiment to understand if we could move our catalog tables to a branded schema. We quickly realised that this interfered with the UX on our managed services and other environments, where users connected via a user with the name of `citus`. By default postgres puts the username on the search_path. To be able to read the catalog in the `citus` schema we would need to grant access permissions to the schema. This caused newly created objects, such as tables, to default to this schema for creation. This failed due to the lack of write permissions on that schema. With this change we move the `pg_dist_object` catalog table to the `pg_catalog` schema, where our other catalog tables are also located. This makes the catalog table visible and readable by any user, like our other catalog tables, for debugging purposes. Note: due to the change of schema, we had to disable 1 test that was running into a discrepancy between the schema and binary. Secondly, we needed to make the lookup functions for the `pg_dist_object` relation and its indexes less strict on the fallback of the naming, because another test needed to look up the relation again after an unfortunate cache invalidation. As a result, outside of upgrades we won't default to resolving _only_ from `pg_catalog`.	2022-03-04 17:40:38 +00:00
Teja Mupparti	54862f8c22	(1) Functions will be delegated even when present in the scope of an explicit BEGIN/COMMIT transaction block or in a UDF calling another UDF. (2) Prohibit/limit the delegated function from doing a 2PC (or any work on a remote connection). (3) Have a safety net to ensure (2), i.e. we should block the connections from the delegated procedure or make sure that no 2PC happens on the node. (4) Such delegated functions are restricted to use only the distribution argument value. Note: To limit the scope of the project we are considering only functions (not procedures) for the initial work. DESCRIPTION: Introduce a new flag "force_delegation" in create_distributed_function(), which will allow a function to be delegated in an explicit transaction block. Fixes #3265
Once the function is delegated to the worker, on that node during the planning:
distributed_planner()
TryToDelegateFunctionCall()
CheckDelegatedFunctionExecution()
EnableInForceDelegatedFuncExecution()
Save the distribution argument (Constant)
ExecutorStart()
CitusBeginScan()
IsShardKeyValueAllowed()
Ensure that no non-distribution argument is used.
ExecutorRun()
AdaptiveExecutor()
StartDistributedExecution()
EnsureNoRemoteExecutionFromWorkers()
Ensure all the shards are local to the node in the remoteTaskList.
NonPushableInsertSelectExecScan()
InitializeCopyShardState()
EnsureNoRemoteExecutionFromWorkers()
Ensure all the shards are local to the node in the placementList.
This also fixes a minor issue: Properly handle expressions+parameters in distribution arguments	2022-01-19 16:43:33 -08:00
Ahmet Gedemenli	67dca4363d	Don't auto-undistribute user-added citus local tables (#5314) * Disable auto-undistribute for user-added citus local tables	2021-10-28 12:10:26 +03:00
Jelte Fennema	cbbd10b974	Implement an improvement threshold in the rebalancer (#4927) Every move in the rebalancer algorithm results in an improvement in the balance. However, a move was still chosen even if its improvement in the balance was very small. This is especially problematic if the shard itself is very big and the move will take a long time. This changes the rebalancer algorithm to take the relative size of the balance improvement into account when choosing moves. By default a move will not be chosen if it improves the balance by less than half of the size of the shard. An extra argument is added to the rebalancer functions so that the user can decide to lower the default threshold if the ignored move is wanted anyway.	2021-05-11 14:24:59 +02:00
Jelte Fennema	78e495e030	Add shouldhaveshards to pg_dist_node (#2960) This is an improvement over #2512. This adds the boolean shouldhaveshards column to pg_dist_node. When it's false, create_distributed_table for new colocation groups will not create shards on that node. Reference tables will still be created on nodes where it is false.	2019-10-22 16:47:16 +02:00
Philip Dubé	74cb168205	Remove Postgres 10 support	2019-10-11 21:56:56 +00:00
Hadi Moshayedi	d3e284dcd6	Use heap_deform_tuple() instead of calling heap_getattr(). (#2464) After Fast ALTER TABLE ADD COLUMN with a non-NULL default in PG11, physical heaps might not contain all attributes after an ALTER TABLE ADD COLUMN happens. heap_getattr() returns NULL when the physical tuple doesn't contain an attribute. So we should use heap_deform_tuple() in these cases, which fills in the missing attributes. Our catalog tables evolve over time, and an upgrade might involve some ALTER TABLE ADD COLUMN commands. Note that we don't need to worry about postgres catalog tables and we can use heap_getattr() for them, because they only change between major versions. This also fixes #2453.	2018-11-05 15:11:01 -05:00
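
The per-node limit described in commit 84f2d8685a can be illustrated with a small Python sketch. This is not the Citus implementation, which lives in the C background task runner. The `Task` dataclass, the `ParallelTasksPerNode` class, and the method names `try_increment` and `decrement` are hypothetical stand-ins for the hash table and the Increment/Decrement functions named in the commit message, and `max_executors_per_node` plays the role of citus.max_background_task_executors_per_node.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Task:
    task_id: int
    # Mirrors the nodes_involved column; None means "no node constraint".
    nodes_involved: Optional[List[int]] = None


class ParallelTasksPerNode:
    """Hypothetical stand-in for the per-node counter hash table."""

    def __init__(self, max_executors_per_node: int) -> None:
        # Assumed to play the role of the GUC limit.
        self.max_executors_per_node = max_executors_per_node
        self.counters: Dict[int, int] = defaultdict(int)

    def try_increment(self, task: Task) -> bool:
        """Reserve a slot on every involved node, or report that the task must be skipped."""
        nodes = task.nodes_involved or []
        if any(self.counters[n] >= self.max_executors_per_node for n in nodes):
            return False  # at least one node is at its limit
        for n in nodes:
            self.counters[n] += 1
        return True

    def decrement(self, task: Task) -> None:
        """Release the reserved slots once the task has finished."""
        for n in task.nodes_involved or []:
            self.counters[n] -= 1


# Three shard moves given as {source_node, target_node}, with a limit of 1 per node.
tracker = ParallelTasksPerNode(max_executors_per_node=1)
moves = [Task(1, [1, 2]), Task(2, [2, 3]), Task(3, [4, 5])]
started = [t.task_id for t in moves if tracker.try_increment(t)]
# started == [1, 3]: task 2 is skipped because node 2 is busy with task 1.
```

Checking every involved node before incrementing any counter keeps a skipped task from leaving a partial reservation behind.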

8 Commits (2675a682184a1d7f7a29e4569a10e605fb549fd7)
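
The improvement threshold from commit cbbd10b974 can be read as a simple ratio check. The sketch below is only an illustration under stated assumptions: the function name `move_is_worthwhile` and its parameters are hypothetical, and it assumes the balance improvement and the shard size are expressed in the same unit, which the commit message does not spell out.

```python
def move_is_worthwhile(
    balance_improvement: float,
    shard_size: float,
    improvement_threshold: float = 0.5,
) -> bool:
    """Accept a move only if its improvement is large relative to the shard size.

    The default of 0.5 follows the commit message: a move is not chosen
    if it improves the balance by less than half of the size of the shard.
    """
    if shard_size <= 0:
        raise ValueError("shard_size must be positive")
    return balance_improvement / shard_size >= improvement_threshold


# A large shard with a small gain is ignored by default...
ignored_by_default = move_is_worthwhile(balance_improvement=40, shard_size=100)
# ...unless the caller lowers the threshold to accept it anyway.
accepted_with_lower_threshold = move_is_worthwhile(40, 100, improvement_threshold=0.3)
```

Lowering the threshold corresponds to the extra rebalancer argument mentioned in the commit message; the real argument name and semantics should be checked against the Citus documentation.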