mirror of https://github.com/citusdata/citus.git
Adds control for background task executors involving a node (#6771)
DESCRIPTION: Adds control for background task executors involving a node

### Background and motivation

Nonblocking concurrent task execution via background workers was introduced in [#6459](https://github.com/citusdata/citus/pull/6459), and concurrent shard moves in the background rebalancer were introduced in [#6756](https://github.com/citusdata/citus/pull/6756) - with a hard dependency that limits moves to one shard move per node. A shard move consists of a shard moving from a source node to a target node. The hard dependency was used because the background task runner didn't have an option to limit the number of parallel shard moves per node.

With the motivation of controlling the number of concurrent shard moves that involve a particular node, either as source or target, this PR introduces a general new GUC `citus.max_background_task_executors_per_node` to be used in the background task runner infrastructure.

So, why do we even want to control and limit the concurrency? It's all about resource availability: because the moves involve the same nodes, extra parallelism won't make the rebalance complete faster if some resource is already maxed out (usually CPU or disk). And if the cluster is being used in a production setting, the moves might compete for resources with production queries much more than if they had been executed sequentially.

### How does it work?

A new column named `nodes_involved` is added to `pg_dist_background_task`, the catalog table that keeps track of scheduled background tasks. It is of type `integer[]` and stores a list of node ids. It is NULL by default - the column is filled by the rebalancer, but other uses of the background task runner may not care about the nodes involved.

    Table "pg_catalog.pg_dist_background_task"
         Column     |           Type
    ============================================
     job_id         | bigint
     task_id        | bigint
     owner          | regrole
     pid            | integer
     status         | citus_task_status
     command        | text
     retry_count    | integer
     not_before     | timestamp with time zone
     message        | text
    +nodes_involved | integer[]

A hash table named `ParallelTasksPerNode` keeps track of the number of parallel running background tasks per node. An entry in the hash table looks like:

    ParallelTasksPerNodeEntry
    {
        node_id  // the node id is used as the hash table key
        counter  // number of concurrent background tasks that involve node node_id
                 // the counter limit is citus.max_background_task_executors_per_node
    }

When the background task runner assigns a runnable task to a new executor, it increments the counter for each of the nodes involved with that task. The limit for each counter is `citus.max_background_task_executors_per_node`. If the limit is reached for any of the nodes involved, the runnable task is skipped. Later, when a running task finishes, the background task runner decrements the counter for each of the nodes involved with the finished task. The following functions take care of these increment/decrement steps:

    IncrementParallelTaskCountForNodesInvolved(task)
    DecrementParallelTaskCountForNodesInvolved(task)

`citus.max_background_task_executors_per_node` can be changed on the fly.

In the background rebalancer, we simply pass `{source_node, target_node}` as the `nodesInvolved` input to the `ScheduleBackgroundTask` function. The rest is taken care of by the general background task runner infrastructure explained above.

Check the background_task_queue_monitor.sql and background_rebalance_parallel.sql tests for detailed examples.
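As a short usage sketch (the statements mirror the regression tests quoted further down in this diff; the value 2 is only an example), the GUC can be raised on the fly and the scheduled tasks inspected through the catalog:

    -- raise the per-node limit for background task executors;
    -- the GUC is PGC_SIGHUP, so a configuration reload is enough
    ALTER SYSTEM SET citus.max_background_task_executors_per_node = 2;
    SELECT pg_reload_conf();

    -- see which nodes each scheduled task involves and its current status
    SELECT job_id, task_id, status, nodes_involved
    FROM pg_dist_background_task
    ORDER BY task_id;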
#### Note

This PR also adds a hard dependency when a node is first used as the source of a move and later as a target. The reason this must be a hard dependency is that the first move might make space for the second one: we could run out of disk space (or at least overload the node) if we moved the second shard onto it before the first one is moved away.

Fixes https://github.com/citusdata/citus/issues/6716
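To observe this dependency on a live cluster, the dependency catalog can be joined back to the task commands. The query below is lifted from the regression tests in this PR (minus the `job_id` filter used there); a move whose target node was previously used as a source shows up with one `depends_on` row for each of those earlier moves:

    SELECT D.task_id,
           (SELECT T.command FROM pg_dist_background_task T WHERE T.task_id = D.task_id),
           D.depends_on,
           (SELECT T.command FROM pg_dist_background_task T WHERE T.task_id = D.depends_on)
    FROM pg_dist_background_task_depend D
    ORDER BY D.task_id, D.depends_on ASC;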
parent fa00fc6e3e
commit 84f2d8685a
@ -32,6 +32,7 @@
#include "catalog/pg_type.h"
#include "commands/extension.h"
#include "commands/sequence.h"
#include "distributed/background_jobs.h"
#include "distributed/colocation_utils.h"
#include "distributed/connection_management.h"
#include "distributed/citus_nodes.h"

@ -57,7 +58,9 @@
#include "distributed/relay_utility.h"
#include "distributed/resource_lock.h"
#include "distributed/remote_commands.h"
#include "distributed/shard_rebalancer.h"
#include "distributed/tuplestore.h"
#include "distributed/utils/array_type.h"
#include "distributed/worker_manager.h"
#include "distributed/worker_protocol.h"
#include "distributed/version_compat.h"
@ -777,7 +780,6 @@ GenerateSizeQueryOnMultiplePlacements(List *shardIntervalList,
{
partitionedShardNames = lappend(partitionedShardNames, quotedShardName);
}

/* for non-partitioned tables, we will use Postgres' size functions */
else
{
@ -2816,7 +2818,8 @@ CreateBackgroundJob(const char *jobType, const char *description)
*/
BackgroundTask *
ScheduleBackgroundTask(int64 jobId, Oid owner, char *command, int dependingTaskCount,
int64 dependingTaskIds[])
int64 dependingTaskIds[], int nodesInvolvedCount, int32
nodesInvolved[])
{
BackgroundTask *task = NULL;

@ -2890,6 +2893,11 @@ ScheduleBackgroundTask(int64 jobId, Oid owner, char *command, int dependingTaskC
values[Anum_pg_dist_background_task_message - 1] = CStringGetTextDatum("");
nulls[Anum_pg_dist_background_task_message - 1] = false;

values[Anum_pg_dist_background_task_nodes_involved - 1] =
IntArrayToDatum(nodesInvolvedCount, nodesInvolved);
nulls[Anum_pg_dist_background_task_nodes_involved - 1] = (nodesInvolvedCount ==
0);

HeapTuple newTuple = heap_form_tuple(RelationGetDescr(pgDistBackgroundTask),
values, nulls);
CatalogTupleInsert(pgDistBackgroundTask, newTuple);
@ -3201,6 +3209,13 @@ DeformBackgroundTaskHeapTuple(TupleDesc tupleDescriptor, HeapTuple taskTuple)
TextDatumGetCString(values[Anum_pg_dist_background_task_message - 1]);
}

if (!nulls[Anum_pg_dist_background_task_nodes_involved - 1])
{
ArrayType *nodesInvolvedArrayObject =
DatumGetArrayTypeP(values[Anum_pg_dist_background_task_nodes_involved - 1]);
task->nodesInvolved = IntegerArrayTypeToList(nodesInvolvedArrayObject);
}

return task;
}
@ -3333,7 +3348,8 @@ GetRunnableBackgroundTask(void)
while (HeapTupleIsValid(taskTuple = systable_getnext(scanDescriptor)))
{
task = DeformBackgroundTaskHeapTuple(tupleDescriptor, taskTuple);
if (BackgroundTaskReadyToRun(task))
if (BackgroundTaskReadyToRun(task) &&
IncrementParallelTaskCountForNodesInvolved(task))
{
/* found task, close table and return */
break;
@ -190,13 +190,32 @@ typedef struct WorkerShardStatistics
|
|||
HTAB *statistics;
|
||||
} WorkerShardStatistics;
|
||||
|
||||
/* ShardMoveDependencyHashEntry contains the taskId which any new shard move task within the corresponding colocation group must take a dependency on */
|
||||
/*
|
||||
* ShardMoveDependencyHashEntry contains the taskId which any new shard
|
||||
* move task within the corresponding colocation group
|
||||
* must take a dependency on
|
||||
*/
|
||||
typedef struct ShardMoveDependencyInfo
|
||||
{
|
||||
int64 key;
|
||||
int64 taskId;
|
||||
} ShardMoveDependencyInfo;
|
||||
|
||||
/*
|
||||
* ShardMoveSourceNodeHashEntry keeps track of the source nodes
|
||||
* of the moves.
|
||||
*/
|
||||
typedef struct ShardMoveSourceNodeHashEntry
|
||||
{
|
||||
/* this is the key */
|
||||
int32 node_id;
|
||||
List *taskIds;
|
||||
} ShardMoveSourceNodeHashEntry;
|
||||
|
||||
/*
|
||||
* ShardMoveDependencies keeps track of all needed dependencies
|
||||
* between shard moves.
|
||||
*/
|
||||
typedef struct ShardMoveDependencies
|
||||
{
|
||||
HTAB *colocationDependencies;
|
||||
|
@ -274,6 +293,15 @@ static void AddToWorkerShardIdSet(HTAB *shardsByWorker, char *workerName, int wo
|
|||
static HTAB * BuildShardSizesHash(ProgressMonitorData *monitor, HTAB *shardStatistics);
|
||||
static void ErrorOnConcurrentRebalance(RebalanceOptions *);
|
||||
static List * GetSetCommandListForNewConnections(void);
|
||||
static int64 GetColocationId(PlacementUpdateEvent *move);
|
||||
static ShardMoveDependencies InitializeShardMoveDependencies();
|
||||
static int64 * GenerateTaskMoveDependencyList(PlacementUpdateEvent *move, int64
|
||||
colocationId,
|
||||
ShardMoveDependencies shardMoveDependencies,
|
||||
int *nDepends);
|
||||
static void UpdateShardMoveDependencies(PlacementUpdateEvent *move, uint64 colocationId,
|
||||
int64 taskId,
|
||||
ShardMoveDependencies shardMoveDependencies);
|
||||
|
||||
/* declarations for dynamic loading */
|
||||
PG_FUNCTION_INFO_V1(rebalance_table_shards);
|
||||
|
@ -1930,8 +1958,7 @@ GetColocationId(PlacementUpdateEvent *move)
|
|||
* InitializeShardMoveDependencies function creates the hash maps that we use to track
|
||||
* the latest moves so that subsequent moves with the same properties must take a dependency
|
||||
* on them. There are two hash maps. One is for tracking the latest move scheduled in a
|
||||
* given colocation group and the other one is for tracking the latest move which involves
|
||||
* a given node either as its source node or its target node.
|
||||
* given colocation group and the other one is for tracking source nodes of all moves.
|
||||
*/
|
||||
static ShardMoveDependencies
|
||||
InitializeShardMoveDependencies()
|
||||
|
@ -1941,18 +1968,17 @@ InitializeShardMoveDependencies()
|
|||
ShardMoveDependencyInfo,
|
||||
"colocationDependencyHashMap",
|
||||
6);
|
||||
shardMoveDependencies.nodeDependencies = CreateSimpleHashWithNameAndSize(int64,
|
||||
ShardMoveDependencyInfo,
|
||||
shardMoveDependencies.nodeDependencies = CreateSimpleHashWithNameAndSize(int32,
|
||||
ShardMoveSourceNodeHashEntry,
|
||||
"nodeDependencyHashMap",
|
||||
6);
|
||||
|
||||
return shardMoveDependencies;
|
||||
}
|
||||
|
||||
|
||||
/*
|
||||
* GenerateTaskMoveDependencyList creates and returns a List of taskIds that
|
||||
* the move must take a dependency on.
|
||||
* the move must take a dependency on, given the shard move dependencies as input.
|
||||
*/
|
||||
static int64 *
|
||||
GenerateTaskMoveDependencyList(PlacementUpdateEvent *move, int64 colocationId,
|
||||
|
@ -1972,27 +1998,24 @@ GenerateTaskMoveDependencyList(PlacementUpdateEvent *move, int64 colocationId,
|
|||
hash_search(dependsList, &shardMoveDependencyInfo->taskId, HASH_ENTER, NULL);
|
||||
}
|
||||
|
||||
/* Check if there exists a move scheduled earlier whose source or target node
|
||||
* overlaps with the current move's source node. */
|
||||
shardMoveDependencyInfo = hash_search(
|
||||
shardMoveDependencies.nodeDependencies, &move->sourceNode->nodeId, HASH_ENTER,
|
||||
/*
|
||||
* Check if there exists moves scheduled earlier whose source node
|
||||
* overlaps with the current move's target node.
|
||||
* The earlier/first move might make space for the later/second move.
|
||||
* So we could run out of disk space (or at least overload the node)
|
||||
* if we move the second shard to it before the first one is moved away.
|
||||
*/
|
||||
ShardMoveSourceNodeHashEntry *shardMoveSourceNodeHashEntry = hash_search(
|
||||
shardMoveDependencies.nodeDependencies, &move->targetNode->nodeId, HASH_FIND,
|
||||
&found);
|
||||
|
||||
if (found)
|
||||
{
|
||||
hash_search(dependsList, &shardMoveDependencyInfo->taskId, HASH_ENTER, NULL);
|
||||
}
|
||||
|
||||
/* Check if there exists a move scheduled earlier whose source or target node
|
||||
* overlaps with the current move's target node. */
|
||||
shardMoveDependencyInfo = hash_search(
|
||||
shardMoveDependencies.nodeDependencies, &move->targetNode->nodeId, HASH_ENTER,
|
||||
&found);
|
||||
|
||||
|
||||
if (found)
|
||||
{
|
||||
hash_search(dependsList, &shardMoveDependencyInfo->taskId, HASH_ENTER, NULL);
|
||||
int64 *taskId = NULL;
|
||||
foreach_ptr(taskId, shardMoveSourceNodeHashEntry->taskIds)
|
||||
{
|
||||
hash_search(dependsList, taskId, HASH_ENTER, NULL);
|
||||
}
|
||||
}
|
||||
|
||||
*nDepends = hash_get_num_entries(dependsList);
|
||||
|
@ -2030,15 +2053,20 @@ UpdateShardMoveDependencies(PlacementUpdateEvent *move, uint64 colocationId, int
|
|||
shardMoveDependencies.colocationDependencies, &colocationId, HASH_ENTER, NULL);
|
||||
shardMoveDependencyInfo->taskId = taskId;
|
||||
|
||||
shardMoveDependencyInfo = hash_search(shardMoveDependencies.nodeDependencies,
|
||||
&move->sourceNode->nodeId, HASH_ENTER, NULL);
|
||||
bool found;
|
||||
ShardMoveSourceNodeHashEntry *shardMoveSourceNodeHashEntry = hash_search(
|
||||
shardMoveDependencies.nodeDependencies, &move->sourceNode->nodeId, HASH_ENTER,
|
||||
&found);
|
||||
|
||||
shardMoveDependencyInfo->taskId = taskId;
|
||||
if (!found)
|
||||
{
|
||||
shardMoveSourceNodeHashEntry->taskIds = NIL;
|
||||
}
|
||||
|
||||
shardMoveDependencyInfo = hash_search(shardMoveDependencies.nodeDependencies,
|
||||
&move->targetNode->nodeId, HASH_ENTER, NULL);
|
||||
|
||||
shardMoveDependencyInfo->taskId = taskId;
|
||||
int64 *newTaskId = palloc0(sizeof(int64));
|
||||
*newTaskId = taskId;
|
||||
shardMoveSourceNodeHashEntry->taskIds = lappend(
|
||||
shardMoveSourceNodeHashEntry->taskIds, newTaskId);
|
||||
}
|
||||
|
||||
|
||||
|
@ -2135,8 +2163,10 @@ RebalanceTableShardsBackground(RebalanceOptions *options, Oid shardReplicationMo
|
|||
appendStringInfo(&buf,
|
||||
"SELECT pg_catalog.replicate_reference_tables(%s)",
|
||||
quote_literal_cstr(shardTranferModeLabel));
|
||||
|
||||
int32 nodesInvolved[] = { 0 };
|
||||
BackgroundTask *task = ScheduleBackgroundTask(jobId, GetUserId(), buf.data, 0,
|
||||
NULL);
|
||||
NULL, 0, nodesInvolved);
|
||||
replicateRefTablesTaskId = task->taskid;
|
||||
}
|
||||
|
||||
|
@ -2170,9 +2200,14 @@ RebalanceTableShardsBackground(RebalanceOptions *options, Oid shardReplicationMo
|
|||
dependsArray[0] = replicateRefTablesTaskId;
|
||||
}
|
||||
|
||||
int32 nodesInvolved[2] = { 0 };
|
||||
nodesInvolved[0] = move->sourceNode->nodeId;
|
||||
nodesInvolved[1] = move->targetNode->nodeId;
|
||||
|
||||
BackgroundTask *task = ScheduleBackgroundTask(jobId, GetUserId(), buf.data,
|
||||
nDepends,
|
||||
dependsArray);
|
||||
dependsArray, 2,
|
||||
nodesInvolved);
|
||||
|
||||
UpdateShardMoveDependencies(move, colocationId, task->taskid,
|
||||
shardMoveDependencies);
|
||||
|
|
|
@ -1793,6 +1793,18 @@ RegisterCitusConfigVariables(void)
|
|||
GUC_STANDARD,
|
||||
NULL, NULL, NULL);
|
||||
|
||||
DefineCustomIntVariable(
|
||||
"citus.max_background_task_executors_per_node",
|
||||
gettext_noop(
|
||||
"Sets the maximum number of parallel background task executor workers "
|
||||
"for scheduled background tasks that involve a particular node"),
|
||||
NULL,
|
||||
&MaxBackgroundTaskExecutorsPerNode,
|
||||
1, 1, 128,
|
||||
PGC_SIGHUP,
|
||||
GUC_STANDARD,
|
||||
NULL, NULL, NULL);
|
||||
|
||||
DefineCustomIntVariable(
|
||||
"citus.max_cached_connection_lifetime",
|
||||
gettext_noop("Sets the maximum lifetime of cached connections to other nodes."),
|
||||
|
|
|
@ -14,3 +14,7 @@ ALTER TABLE pg_catalog.pg_dist_transaction REPLICA IDENTITY USING INDEX pg_dist_
|
|||
|
||||
#include "udfs/citus_stat_tenants_local_reset/11.3-1.sql"
|
||||
#include "udfs/citus_stat_tenants_reset/11.3-1.sql"
|
||||
|
||||
-- we introduce nodes_involved, which will be used internally to
|
||||
-- limit the number of parallel tasks running per node
|
||||
ALTER TABLE pg_catalog.pg_dist_background_task ADD COLUMN nodes_involved int[] DEFAULT NULL;
|
||||
|
|
|
@ -28,3 +28,5 @@ DROP FUNCTION pg_catalog.citus_stat_tenants(boolean);
|
|||
|
||||
DROP FUNCTION pg_catalog.citus_stat_tenants_local_reset();
|
||||
DROP FUNCTION pg_catalog.citus_stat_tenants_reset();
|
||||
|
||||
ALTER TABLE pg_catalog.pg_dist_background_task DROP COLUMN nodes_involved;
|
||||
|
|
|
@ -140,3 +140,34 @@ TextArrayTypeToIntegerList(ArrayType *arrayObject)
|
|||
|
||||
return list;
|
||||
}
|
||||
|
||||
|
||||
/*
|
||||
* IntArrayToDatum
|
||||
*
|
||||
* Convert an integer array to the datum int array format
|
||||
* (currently used for nodes_involved in pg_dist_background_task)
|
||||
*
|
||||
* Returns the array in the form of a Datum, or PointerGetDatum(NULL)
|
||||
* if the int_array is empty.
|
||||
*/
|
||||
Datum
|
||||
IntArrayToDatum(uint32 int_array_size, int int_array[])
|
||||
{
|
||||
if (int_array_size == 0)
|
||||
{
|
||||
return PointerGetDatum(NULL);
|
||||
}
|
||||
|
||||
ArrayBuildState *astate = NULL;
|
||||
for (int i = 0; i < int_array_size; i++)
|
||||
{
|
||||
Datum dvalue = Int32GetDatum(int_array[i]);
|
||||
bool disnull = false;
|
||||
Oid element_type = INT4OID;
|
||||
astate = accumArrayResult(astate, dvalue, disnull, element_type,
|
||||
CurrentMemoryContext);
|
||||
}
|
||||
|
||||
return makeArrayResult(astate, CurrentMemoryContext);
|
||||
}
|
||||
|
|
|
@ -63,6 +63,7 @@
|
|||
#include "distributed/metadata_cache.h"
|
||||
#include "distributed/metadata_utility.h"
|
||||
#include "distributed/shard_cleaner.h"
|
||||
#include "distributed/shard_rebalancer.h"
|
||||
#include "distributed/resource_lock.h"
|
||||
|
||||
/* Table-of-contents constants for our dynamic shared memory segment. */
|
||||
|
@ -115,12 +116,17 @@ static bool MonitorGotTerminationOrCancellationRequest();
|
|||
static void QueueMonitorSigTermHandler(SIGNAL_ARGS);
|
||||
static void QueueMonitorSigIntHandler(SIGNAL_ARGS);
|
||||
static void QueueMonitorSigHupHandler(SIGNAL_ARGS);
|
||||
static void DecrementParallelTaskCountForNodesInvolved(BackgroundTask *task);
|
||||
|
||||
/* flags set by signal handlers */
|
||||
static volatile sig_atomic_t GotSigterm = false;
|
||||
static volatile sig_atomic_t GotSigint = false;
|
||||
static volatile sig_atomic_t GotSighup = false;
|
||||
|
||||
/* keeping track of parallel background tasks per node */
|
||||
HTAB *ParallelTasksPerNode = NULL;
|
||||
int MaxBackgroundTaskExecutorsPerNode = 1;
|
||||
|
||||
PG_FUNCTION_INFO_V1(citus_job_cancel);
|
||||
PG_FUNCTION_INFO_V1(citus_job_wait);
|
||||
PG_FUNCTION_INFO_V1(citus_task_wait);
|
||||
|
@ -211,7 +217,7 @@ citus_job_wait(PG_FUNCTION_ARGS)
|
|||
* assume any terminal state as its desired status. The function returns if the
|
||||
* desired_state was reached.
|
||||
*
|
||||
* The current implementation is a polling implementation with an interval of 1 second.
|
||||
* The current implementation is a polling implementation with an interval of 0.1 seconds.
|
||||
* Ideally we would have some synchronization between the background tasks queue monitor
|
||||
* and any backend calling this function to receive a signal when the task changes state.
|
||||
*/
|
||||
|
@ -857,6 +863,7 @@ TaskEnded(TaskExecutionContext *taskExecutionContext)
|
|||
UpdateBackgroundTask(task);
|
||||
UpdateDependingTasks(task);
|
||||
UpdateBackgroundJob(task->jobid);
|
||||
DecrementParallelTaskCountForNodesInvolved(task);
|
||||
|
||||
/* we are sure that at least one task did not block on current iteration */
|
||||
queueMonitorExecutionContext->allTasksWouldBlock = false;
|
||||
|
@ -868,6 +875,77 @@ TaskEnded(TaskExecutionContext *taskExecutionContext)
|
|||
}
|
||||
|
||||
|
||||
/*
|
||||
* IncrementParallelTaskCountForNodesInvolved
|
||||
* Checks whether we have reached the limit of parallel tasks per node
|
||||
* per each of the nodes involved with the task
|
||||
* If at least one limit is reached, it returns false.
|
||||
* If limits aren't reached, it increments the parallel task count
|
||||
* for each of the nodes involved with the task, and returns true.
|
||||
*/
|
||||
bool
|
||||
IncrementParallelTaskCountForNodesInvolved(BackgroundTask *task)
|
||||
{
|
||||
if (task->nodesInvolved)
|
||||
{
|
||||
int node;
|
||||
|
||||
/* first check whether we have reached the limit for any of the nodes */
|
||||
foreach_int(node, task->nodesInvolved)
|
||||
{
|
||||
bool found;
|
||||
ParallelTasksPerNodeEntry *hashEntry = hash_search(
|
||||
ParallelTasksPerNode, &(node), HASH_ENTER, &found);
|
||||
if (!found)
|
||||
{
|
||||
hashEntry->counter = 0;
|
||||
}
|
||||
else if (hashEntry->counter >= MaxBackgroundTaskExecutorsPerNode)
|
||||
{
|
||||
/* at least one node's limit is reached */
|
||||
return false;
|
||||
}
|
||||
}
|
||||
|
||||
/* then, increment the parallel task count per each node */
|
||||
foreach_int(node, task->nodesInvolved)
|
||||
{
|
||||
ParallelTasksPerNodeEntry *hashEntry = hash_search(
|
||||
ParallelTasksPerNode, &(node), HASH_FIND, NULL);
|
||||
Assert(hashEntry);
|
||||
hashEntry->counter += 1;
|
||||
}
|
||||
}
|
||||
|
||||
return true;
|
||||
}
|
||||
|
||||
|
||||
/*
|
||||
* DecrementParallelTaskCountForNodesInvolved
|
||||
* Decrements the parallel task count for each of the nodes involved
|
||||
* with the task.
|
||||
* We call this function after the task has gone through Running state
|
||||
* and then has ended.
|
||||
*/
|
||||
static void
|
||||
DecrementParallelTaskCountForNodesInvolved(BackgroundTask *task)
|
||||
{
|
||||
if (task->nodesInvolved)
|
||||
{
|
||||
int node;
|
||||
foreach_int(node, task->nodesInvolved)
|
||||
{
|
||||
ParallelTasksPerNodeEntry *hashEntry = hash_search(ParallelTasksPerNode,
|
||||
&(node),
|
||||
HASH_FIND, NULL);
|
||||
|
||||
hashEntry->counter -= 1;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
/*
|
||||
* QueueMonitorSigHupHandler handles SIGHUP to update monitor related config params.
|
||||
*/
|
||||
|
@ -1023,7 +1101,7 @@ CitusBackgroundTaskQueueMonitorMain(Datum arg)
|
|||
/* handle SIGINT to properly cancel active task executors */
|
||||
pqsignal(SIGINT, QueueMonitorSigIntHandler);
|
||||
|
||||
/* handle SIGHUP to update MaxBackgroundTaskExecutors */
|
||||
/* handle SIGHUP to update MaxBackgroundTaskExecutors and MaxBackgroundTaskExecutorsPerNode */
|
||||
pqsignal(SIGHUP, QueueMonitorSigHupHandler);
|
||||
|
||||
/* ready to handle signals */
|
||||
|
@ -1167,10 +1245,15 @@ CitusBackgroundTaskQueueMonitorMain(Datum arg)
|
|||
{
|
||||
GotSighup = false;
|
||||
|
||||
/* update max_background_task_executors if changed */
|
||||
/* update max_background_task_executors and max_background_task_executors_per_node if changed */
|
||||
ProcessConfigFile(PGC_SIGHUP);
|
||||
}
|
||||
|
||||
if (ParallelTasksPerNode == NULL)
|
||||
{
|
||||
ParallelTasksPerNode = CreateSimpleHash(int32, ParallelTasksPerNodeEntry);
|
||||
}
|
||||
|
||||
/* assign runnable tasks, if any, to new task executors in a transaction if we do not have SIGTERM or SIGINT */
|
||||
if (!MonitorGotTerminationOrCancellationRequest())
|
||||
{
|
||||
|
|
|
@ -85,6 +85,21 @@ typedef struct TaskExecutionContext
|
|||
} TaskExecutionContext;
|
||||
|
||||
|
||||
/*
|
||||
* ParallelTasksPerNodeEntry is the struct used
|
||||
* to track the number of concurrent background tasks that
|
||||
* involve a particular node (the key to the entry)
|
||||
*/
|
||||
typedef struct ParallelTasksPerNodeEntry
|
||||
{
|
||||
/* Used as hash key. */
|
||||
int32 node_id;
|
||||
|
||||
/* number of concurrent background tasks that involve node node_id */
|
||||
uint32 counter;
|
||||
} ParallelTasksPerNodeEntry;
|
||||
|
||||
|
||||
extern BackgroundWorkerHandle * StartCitusBackgroundTaskQueueMonitor(Oid database,
|
||||
Oid extensionOwner);
|
||||
extern void CitusBackgroundTaskQueueMonitorMain(Datum arg);
|
||||
|
@ -95,5 +110,6 @@ extern Datum citus_job_wait(PG_FUNCTION_ARGS);
|
|||
extern Datum citus_task_wait(PG_FUNCTION_ARGS);
|
||||
extern void citus_job_wait_internal(int64 jobid, BackgroundJobStatus *desiredStatus);
|
||||
extern void citus_task_wait_internal(int64 taskid, BackgroundTaskStatus *desiredStatus);
|
||||
extern bool IncrementParallelTaskCountForNodesInvolved(BackgroundTask *task);
|
||||
|
||||
#endif /*CITUS_BACKGROUND_JOBS_H */
|
||||
|
|
|
@ -252,6 +252,7 @@ typedef struct BackgroundTask
|
|||
int32 *retry_count;
|
||||
TimestampTz *not_before;
|
||||
char *message;
|
||||
List *nodesInvolved;
|
||||
|
||||
/* extra space to store values for nullable value types above */
|
||||
struct
|
||||
|
@ -388,7 +389,9 @@ extern bool HasNonTerminalJobOfType(const char *jobType, int64 *jobIdOut);
|
|||
extern int64 CreateBackgroundJob(const char *jobType, const char *description);
|
||||
extern BackgroundTask * ScheduleBackgroundTask(int64 jobId, Oid owner, char *command,
|
||||
int dependingTaskCount,
|
||||
int64 dependingTaskIds[]);
|
||||
int64 dependingTaskIds[],
|
||||
int nodesInvolvedCount,
|
||||
int32 nodesInvolved[]);
|
||||
extern BackgroundTask * GetRunnableBackgroundTask(void);
|
||||
extern void ResetRunningBackgroundTasks(void);
|
||||
extern BackgroundJob * GetBackgroundJobByJobId(int64 jobId);
|
||||
|
|
|
@ -15,7 +15,7 @@
|
|||
* compiler constants for pg_dist_background_task
|
||||
* ----------------
|
||||
*/
|
||||
#define Natts_pg_dist_background_task 9
|
||||
#define Natts_pg_dist_background_task 10
|
||||
#define Anum_pg_dist_background_task_job_id 1
|
||||
#define Anum_pg_dist_background_task_task_id 2
|
||||
#define Anum_pg_dist_background_task_owner 3
|
||||
|
@ -25,5 +25,6 @@
|
|||
#define Anum_pg_dist_background_task_retry_count 7
|
||||
#define Anum_pg_dist_background_task_not_before 8
|
||||
#define Anum_pg_dist_background_task_message 9
|
||||
#define Anum_pg_dist_background_task_nodes_involved 10
|
||||
|
||||
#endif /* CITUS_PG_DIST_BACKGROUND_TASK_H */
|
||||
|
|
|
@ -190,6 +190,7 @@ extern char *VariablesToBePassedToNewConnections;
|
|||
extern int MaxRebalancerLoggedIgnoredMoves;
|
||||
extern bool RunningUnderIsolationTest;
|
||||
extern bool PropagateSessionSettingsForLoopbackConnection;
|
||||
extern int MaxBackgroundTaskExecutorsPerNode;
|
||||
|
||||
/* External function declarations */
|
||||
extern Datum shard_placement_rebalance_array(PG_FUNCTION_ARGS);
|
||||
|
|
|
@ -22,5 +22,6 @@ extern ArrayType * DatumArrayToArrayType(Datum *datumArray, int datumCount,
|
|||
Oid datumTypeId);
|
||||
extern List * IntegerArrayTypeToList(ArrayType *arrayObject);
|
||||
extern List * TextArrayTypeToIntegerList(ArrayType *arrayObject);
|
||||
extern Datum IntArrayToDatum(uint32 int_array_size, int int_array[]);
|
||||
|
||||
#endif /* CITUS_ARRAY_TYPE_H */
|
||||
|
|
|
@ -1,12 +1,22 @@
|
|||
/*
|
||||
Test to check if the background tasks scheduled by the background rebalancer
|
||||
has the correct dependencies.
|
||||
*/
|
||||
--
|
||||
-- BACKGROUND_REBALANCE_PARALLEL
|
||||
--
|
||||
-- Test to check if the background tasks scheduled by the background rebalancer
|
||||
-- have the correct dependencies
|
||||
--
|
||||
-- Test to verify that we do not allow parallel rebalancer moves involving a
|
||||
-- particular node (either as source or target) more than
|
||||
-- citus.max_background_task_executors_per_node, and that we can change the GUC on
|
||||
-- the fly, and that will affect the ongoing balance as it should
|
||||
--
|
||||
-- Test to verify that there's a hard dependency when a specific node is first being
|
||||
-- used as a source for a move, and then later as a target.
|
||||
--
|
||||
CREATE SCHEMA background_rebalance_parallel;
|
||||
SET search_path TO background_rebalance_parallel;
|
||||
SET citus.next_shard_id TO 85674000;
|
||||
SET citus.shard_replication_factor TO 1;
|
||||
SET client_min_messages TO WARNING;
|
||||
SET client_min_messages TO ERROR;
|
||||
ALTER SEQUENCE pg_dist_background_job_job_id_seq RESTART 17777;
|
||||
ALTER SEQUENCE pg_dist_background_task_task_id_seq RESTART 1000;
|
||||
ALTER SEQUENCE pg_catalog.pg_dist_colocationid_seq RESTART 50050;
|
||||
|
@ -45,52 +55,52 @@ SELECT pg_reload_conf();
|
|||
t
|
||||
(1 row)
|
||||
|
||||
/* Colocation group 1: create two tables table1_colg1, table2_colg1 and in a colocation group */
|
||||
-- Colocation group 1: create two tables table1_colg1, table2_colg1 and in a colocation group
|
||||
CREATE TABLE table1_colg1 (a int PRIMARY KEY);
|
||||
SELECT create_distributed_table('table1_colg1', 'a', shard_count => 4 , colocate_with => 'none');
|
||||
SELECT create_distributed_table('table1_colg1', 'a', shard_count => 4, colocate_with => 'none');
|
||||
create_distributed_table
|
||||
---------------------------------------------------------------------
|
||||
|
||||
(1 row)
|
||||
|
||||
CREATE TABLE table2_colg1 (b int PRIMARY KEY);
|
||||
SELECT create_distributed_table('table2_colg1', 'b' , colocate_with => 'table1_colg1');
|
||||
SELECT create_distributed_table('table2_colg1', 'b', colocate_with => 'table1_colg1');
|
||||
create_distributed_table
|
||||
---------------------------------------------------------------------
|
||||
|
||||
(1 row)
|
||||
|
||||
/* Colocation group 2: create two tables table1_colg2, table2_colg2 and in a colocation group */
|
||||
-- Colocation group 2: create two tables table1_colg2, table2_colg2 and in a colocation group
|
||||
CREATE TABLE table1_colg2 (a int PRIMARY KEY);
|
||||
SELECT create_distributed_table('table1_colg2 ', 'a', shard_count => 4, colocate_with => 'none');
|
||||
SELECT create_distributed_table('table1_colg2', 'a', shard_count => 4, colocate_with => 'none');
|
||||
create_distributed_table
|
||||
---------------------------------------------------------------------
|
||||
|
||||
(1 row)
|
||||
|
||||
CREATE TABLE table2_colg2 (b int primary key);
|
||||
SELECT create_distributed_table('table2_colg2', 'b' , colocate_with => 'table1_colg2');
|
||||
SELECT create_distributed_table('table2_colg2', 'b', colocate_with => 'table1_colg2');
|
||||
create_distributed_table
|
||||
---------------------------------------------------------------------
|
||||
|
||||
(1 row)
|
||||
|
||||
/* Colocation group 3: create two tables table1_colg3, table2_colg3 and in a colocation group */
|
||||
-- Colocation group 3: create two tables table1_colg3, table2_colg3 and in a colocation group
|
||||
CREATE TABLE table1_colg3 (a int PRIMARY KEY);
|
||||
SELECT create_distributed_table('table1_colg3 ', 'a', shard_count => 4, colocate_with => 'none');
|
||||
SELECT create_distributed_table('table1_colg3', 'a', shard_count => 4, colocate_with => 'none');
|
||||
create_distributed_table
|
||||
---------------------------------------------------------------------
|
||||
|
||||
(1 row)
|
||||
|
||||
CREATE TABLE table2_colg3 (b int primary key);
|
||||
SELECT create_distributed_table('table2_colg3', 'b' , colocate_with => 'table1_colg3');
|
||||
SELECT create_distributed_table('table2_colg3', 'b', colocate_with => 'table1_colg3');
|
||||
create_distributed_table
|
||||
---------------------------------------------------------------------
|
||||
|
||||
(1 row)
|
||||
|
||||
/* Add two new node so that we can rebalance */
|
||||
-- Add two new nodes so that we can rebalance
|
||||
SELECT 1 FROM citus_add_node('localhost', :worker_3_port);
|
||||
?column?
|
||||
---------------------------------------------------------------------
|
||||
|
@ -132,10 +142,11 @@ SELECT citus_rebalance_wait();
|
|||
|
||||
(1 row)
|
||||
|
||||
/*Check that a move is dependent on
|
||||
1. any other move scheduled earlier in its colocation group.
|
||||
2. any other move scheduled earlier whose source node or target
|
||||
node overlaps with the current moves nodes. */
|
||||
-- PART 1
|
||||
-- Test to check if the background tasks scheduled by the background rebalancer
|
||||
-- have the correct dependencies
|
||||
-- Check that a move is dependent on
|
||||
-- any other move scheduled earlier in its colocation group.
|
||||
SELECT S.shardid, P.colocationid
|
||||
FROM pg_dist_shard S, pg_dist_partition P
|
||||
WHERE S.logicalrelid = P.logicalrelid ORDER BY S.shardid ASC;
|
||||
|
@ -175,16 +186,12 @@ FROM pg_dist_background_task_depend D WHERE job_id = 17777 ORDER BY D.task_id,
|
|||
task_id | command | depends_on | command
|
||||
---------------------------------------------------------------------
|
||||
1001 | SELECT pg_catalog.citus_move_shard_placement(85674000,50,53,'auto') | 1000 | SELECT pg_catalog.citus_move_shard_placement(85674001,51,52,'auto')
|
||||
1002 | SELECT pg_catalog.citus_move_shard_placement(85674009,51,52,'auto') | 1000 | SELECT pg_catalog.citus_move_shard_placement(85674001,51,52,'auto')
|
||||
1003 | SELECT pg_catalog.citus_move_shard_placement(85674008,50,53,'auto') | 1001 | SELECT pg_catalog.citus_move_shard_placement(85674000,50,53,'auto')
|
||||
1003 | SELECT pg_catalog.citus_move_shard_placement(85674008,50,53,'auto') | 1002 | SELECT pg_catalog.citus_move_shard_placement(85674009,51,52,'auto')
|
||||
1004 | SELECT pg_catalog.citus_move_shard_placement(85674017,51,52,'auto') | 1002 | SELECT pg_catalog.citus_move_shard_placement(85674009,51,52,'auto')
|
||||
1005 | SELECT pg_catalog.citus_move_shard_placement(85674016,50,53,'auto') | 1003 | SELECT pg_catalog.citus_move_shard_placement(85674008,50,53,'auto')
|
||||
1005 | SELECT pg_catalog.citus_move_shard_placement(85674016,50,53,'auto') | 1004 | SELECT pg_catalog.citus_move_shard_placement(85674017,51,52,'auto')
|
||||
(7 rows)
|
||||
(3 rows)
|
||||
|
||||
/* Check that if there is a reference table that needs to be synched to a node,
|
||||
any move without a dependency must depend on the move task for reference table. */
|
||||
-- Check that if there is a reference table that needs to be synched to a node,
|
||||
-- any move without a dependency must depend on the move task for reference table.
|
||||
SELECT 1 FROM citus_drain_node('localhost',:worker_4_port);
|
||||
?column?
|
||||
---------------------------------------------------------------------
|
||||
|
@ -203,8 +210,8 @@ SELECT 1 FROM citus_disable_node('localhost', :worker_4_port, synchronous:=true)
|
|||
1
|
||||
(1 row)
|
||||
|
||||
/* Drain worker_3 so that we can move only one colocation group to worker_3
|
||||
to create an unbalance that would cause parallel rebalancing. */
|
||||
-- Drain worker_3 so that we can move only one colocation group to worker_3
|
||||
-- to create an unbalance that would cause parallel rebalancing.
|
||||
SELECT 1 FROM citus_drain_node('localhost',:worker_3_port);
|
||||
?column?
|
||||
---------------------------------------------------------------------
|
||||
|
@ -225,7 +232,7 @@ SELECT create_reference_table('ref_table');
|
|||
|
||||
(1 row)
|
||||
|
||||
/* Move all the shards of Colocation group 3 to worker_3.*/
|
||||
-- Move all the shards of Colocation group 3 to worker_3.
|
||||
SELECT
|
||||
master_move_shard_placement(shardid, 'localhost', nodeport, 'localhost', :worker_3_port, 'block_writes')
|
||||
FROM
|
||||
|
@ -243,7 +250,7 @@ ORDER BY
|
|||
(4 rows)
|
||||
|
||||
CALL citus_cleanup_orphaned_resources();
|
||||
/* Activate and new nodes so that we can rebalance. */
|
||||
-- Activate and new nodes so that we can rebalance.
|
||||
SELECT 1 FROM citus_activate_node('localhost', :worker_4_port);
|
||||
?column?
|
||||
---------------------------------------------------------------------
|
||||
|
@ -323,18 +330,34 @@ FROM pg_dist_background_task_depend D WHERE job_id = 17778 ORDER BY D.task_id,
|
|||
1008 | SELECT pg_catalog.citus_move_shard_placement(85674003,51,54,'auto') | 1006 | SELECT pg_catalog.replicate_reference_tables('auto')
|
||||
1009 | SELECT pg_catalog.citus_move_shard_placement(85674000,50,55,'auto') | 1008 | SELECT pg_catalog.citus_move_shard_placement(85674003,51,54,'auto')
|
||||
1010 | SELECT pg_catalog.citus_move_shard_placement(85674017,52,53,'auto') | 1007 | SELECT pg_catalog.citus_move_shard_placement(85674016,52,53,'auto')
|
||||
1011 | SELECT pg_catalog.citus_move_shard_placement(85674008,51,54,'auto') | 1008 | SELECT pg_catalog.citus_move_shard_placement(85674003,51,54,'auto')
|
||||
1011 | SELECT pg_catalog.citus_move_shard_placement(85674008,51,54,'auto') | 1006 | SELECT pg_catalog.replicate_reference_tables('auto')
|
||||
1012 | SELECT pg_catalog.citus_move_shard_placement(85674001,50,55,'auto') | 1009 | SELECT pg_catalog.citus_move_shard_placement(85674000,50,55,'auto')
|
||||
(6 rows)
|
||||
|
||||
-- PART 2
|
||||
-- Test to verify that we do not allow parallel rebalancer moves involving a
|
||||
-- particular node (either as source or target)
|
||||
-- more than citus.max_background_task_executors_per_node
|
||||
-- and that we can change the GUC on the fly
|
||||
-- citus_task_wait calls are used to ensure consistent pg_dist_background_task query
|
||||
-- output i.e. to avoid flakiness
|
||||
-- First let's restart the scenario
|
||||
DROP SCHEMA background_rebalance_parallel CASCADE;
|
||||
TRUNCATE pg_dist_background_job CASCADE;
|
||||
TRUNCATE pg_dist_background_task CASCADE;
|
||||
TRUNCATE pg_dist_background_task_depend;
|
||||
SELECT public.wait_for_resource_cleanup();
|
||||
wait_for_resource_cleanup
|
||||
---------------------------------------------------------------------
|
||||
|
||||
(1 row)
|
||||
|
||||
select citus_remove_node('localhost', :worker_2_port);
|
||||
citus_remove_node
|
||||
---------------------------------------------------------------------
|
||||
|
||||
(1 row)
|
||||
|
||||
select citus_remove_node('localhost', :worker_3_port);
|
||||
citus_remove_node
|
||||
---------------------------------------------------------------------
|
||||
|
@ -359,6 +382,474 @@ select citus_remove_node('localhost', :worker_6_port);
|
|||
|
||||
(1 row)
|
||||
|
||||
CREATE SCHEMA background_rebalance_parallel;
|
||||
SET search_path TO background_rebalance_parallel;
|
||||
-- Create 8 tables in 4 colocation groups, and populate them
|
||||
CREATE TABLE table1_colg1 (a int PRIMARY KEY);
|
||||
SELECT create_distributed_table('table1_colg1', 'a', shard_count => 3, colocate_with => 'none');
|
||||
create_distributed_table
|
||||
---------------------------------------------------------------------
|
||||
|
||||
(1 row)
|
||||
|
||||
INSERT INTO table1_colg1 SELECT i FROM generate_series(0, 100)i;
|
||||
CREATE TABLE table2_colg1 (b int PRIMARY KEY);
|
||||
SELECT create_distributed_table('table2_colg1', 'b', colocate_with => 'table1_colg1');
|
||||
create_distributed_table
|
||||
---------------------------------------------------------------------
|
||||
|
||||
(1 row)
|
||||
|
||||
INSERT INTO table2_colg1 SELECT i FROM generate_series(0, 100)i;
|
||||
CREATE TABLE table1_colg2 (a int PRIMARY KEY);
|
||||
SELECT create_distributed_table('table1_colg2', 'a', shard_count => 3, colocate_with => 'none');
|
||||
create_distributed_table
|
||||
---------------------------------------------------------------------
|
||||
|
||||
(1 row)
|
||||
|
||||
INSERT INTO table1_colg2 SELECT i FROM generate_series(0, 100)i;
|
||||
CREATE TABLE table2_colg2 (b int PRIMARY KEY);
|
||||
SELECT create_distributed_table('table2_colg2', 'b', colocate_with => 'table1_colg2');
|
||||
create_distributed_table
|
||||
---------------------------------------------------------------------
|
||||
|
||||
(1 row)
|
||||
|
||||
INSERT INTO table2_colg2 SELECT i FROM generate_series(0, 100)i;
|
||||
CREATE TABLE table1_colg3 (a int PRIMARY KEY);
|
||||
SELECT create_distributed_table('table1_colg3', 'a', shard_count => 3, colocate_with => 'none');
|
||||
create_distributed_table
|
||||
---------------------------------------------------------------------
|
||||
|
||||
(1 row)
|
||||
|
||||
INSERT INTO table1_colg3 SELECT i FROM generate_series(0, 100)i;
|
||||
CREATE TABLE table2_colg3 (b int primary key);
|
||||
SELECT create_distributed_table('table2_colg3', 'b', colocate_with => 'table1_colg3');
|
||||
create_distributed_table
|
||||
---------------------------------------------------------------------
|
||||
|
||||
(1 row)
|
||||
|
||||
INSERT INTO table2_colg3 SELECT i FROM generate_series(0, 100)i;
|
||||
CREATE TABLE table1_colg4 (a int PRIMARY KEY);
|
||||
SELECT create_distributed_table('table1_colg4', 'a', shard_count => 3, colocate_with => 'none');
|
||||
create_distributed_table
|
||||
---------------------------------------------------------------------
|
||||
|
||||
(1 row)
|
||||
|
||||
INSERT INTO table1_colg4 SELECT i FROM generate_series(0, 100)i;
|
||||
CREATE TABLE table2_colg4 (b int PRIMARY KEY);
|
||||
SELECT create_distributed_table('table2_colg4', 'b', colocate_with => 'table1_colg4');
|
||||
create_distributed_table
|
||||
---------------------------------------------------------------------
|
||||
|
||||
(1 row)
|
||||
|
||||
INSERT INTO table2_colg4 SELECT i FROM generate_series(0, 100)i;
|
||||
-- Add nodes so that we can rebalance
|
||||
SELECT citus_add_node('localhost', :worker_2_port);
|
||||
citus_add_node
|
||||
---------------------------------------------------------------------
|
||||
56
|
||||
(1 row)
|
||||
|
||||
SELECT citus_add_node('localhost', :worker_3_port);
|
||||
citus_add_node
|
||||
---------------------------------------------------------------------
|
||||
57
|
||||
(1 row)
|
||||
|
||||
SELECT citus_rebalance_start AS job_id from citus_rebalance_start() \gset
|
||||
-- see dependent tasks to understand which tasks remain runnable because of
|
||||
-- citus.max_background_task_executors_per_node
|
||||
-- and which tasks are actually blocked from colocation group dependencies
|
||||
SELECT D.task_id,
|
||||
(SELECT T.command FROM pg_dist_background_task T WHERE T.task_id = D.task_id),
|
||||
D.depends_on,
|
||||
(SELECT T.command FROM pg_dist_background_task T WHERE T.task_id = D.depends_on)
|
||||
FROM pg_dist_background_task_depend D WHERE job_id in (:job_id) ORDER BY D.task_id, D.depends_on ASC;
|
||||
task_id | command | depends_on | command
|
||||
---------------------------------------------------------------------
|
||||
1014 | SELECT pg_catalog.citus_move_shard_placement(85674026,50,57,'auto') | 1013 | SELECT pg_catalog.citus_move_shard_placement(85674025,50,56,'auto')
|
||||
1016 | SELECT pg_catalog.citus_move_shard_placement(85674032,50,57,'auto') | 1015 | SELECT pg_catalog.citus_move_shard_placement(85674031,50,56,'auto')
|
||||
1018 | SELECT pg_catalog.citus_move_shard_placement(85674038,50,57,'auto') | 1017 | SELECT pg_catalog.citus_move_shard_placement(85674037,50,56,'auto')
|
||||
1020 | SELECT pg_catalog.citus_move_shard_placement(85674044,50,57,'auto') | 1019 | SELECT pg_catalog.citus_move_shard_placement(85674043,50,56,'auto')
|
||||
(4 rows)
|
||||
|
||||
-- default citus.max_background_task_executors_per_node is 1
|
||||
-- show that first exactly one task per node is running
|
||||
-- among the tasks that are not blocked
|
||||
SELECT citus_task_wait(1013, desired_status => 'running');
|
||||
citus_task_wait
|
||||
---------------------------------------------------------------------
|
||||
|
||||
(1 row)
|
||||
|
||||
SELECT job_id, task_id, status, nodes_involved
|
||||
FROM pg_dist_background_task WHERE job_id in (:job_id) ORDER BY task_id;
|
||||
job_id | task_id | status | nodes_involved
|
||||
---------------------------------------------------------------------
|
||||
17779 | 1013 | running | {50,56}
|
||||
17779 | 1014 | blocked | {50,57}
|
||||
17779 | 1015 | runnable | {50,56}
|
||||
17779 | 1016 | blocked | {50,57}
|
||||
17779 | 1017 | runnable | {50,56}
|
||||
17779 | 1018 | blocked | {50,57}
|
||||
17779 | 1019 | runnable | {50,56}
|
||||
17779 | 1020 | blocked | {50,57}
|
||||
(8 rows)
|
||||
|
||||
-- increase citus.max_background_task_executors_per_node
|
||||
ALTER SYSTEM SET citus.max_background_task_executors_per_node = 2;
|
||||
SELECT pg_reload_conf();
|
||||
pg_reload_conf
|
||||
---------------------------------------------------------------------
|
||||
t
|
||||
(1 row)
|
||||
|
||||
SELECT citus_task_wait(1015, desired_status => 'running');
|
||||
citus_task_wait
|
||||
---------------------------------------------------------------------
|
||||
|
||||
(1 row)
|
||||
|
||||
SELECT citus_task_wait(1013, desired_status => 'done');
|
||||
citus_task_wait
|
||||
---------------------------------------------------------------------
|
||||
|
||||
(1 row)
|
||||
|
||||
-- show that at most 2 tasks per node are running
|
||||
-- among the tasks that are not blocked
|
||||
SELECT job_id, task_id, status, nodes_involved
|
||||
FROM pg_dist_background_task WHERE job_id in (:job_id) ORDER BY task_id;
|
||||
job_id | task_id | status | nodes_involved
|
||||
---------------------------------------------------------------------
|
||||
17779 | 1013 | done | {50,56}
|
||||
17779 | 1014 | running | {50,57}
|
||||
17779 | 1015 | running | {50,56}
|
||||
17779 | 1016 | blocked | {50,57}
|
||||
17779 | 1017 | runnable | {50,56}
|
||||
17779 | 1018 | blocked | {50,57}
|
||||
17779 | 1019 | runnable | {50,56}
|
||||
17779 | 1020 | blocked | {50,57}
|
||||
(8 rows)
|
||||
|
||||
-- decrease to default (1)
|
||||
ALTER SYSTEM RESET citus.max_background_task_executors_per_node;
|
||||
SELECT pg_reload_conf();
|
||||
pg_reload_conf
|
||||
---------------------------------------------------------------------
|
||||
t
|
||||
(1 row)
|
||||
|
||||
SELECT citus_task_wait(1015, desired_status => 'done');
|
||||
citus_task_wait
|
||||
---------------------------------------------------------------------
|
||||
|
||||
(1 row)
|
||||
|
||||
SELECT citus_task_wait(1014, desired_status => 'done');
|
||||
citus_task_wait
|
||||
---------------------------------------------------------------------
|
||||
|
||||
(1 row)
|
||||
|
||||
SELECT citus_task_wait(1016, desired_status => 'running');
|
||||
citus_task_wait
|
||||
---------------------------------------------------------------------
|
||||
|
||||
(1 row)
|
||||
|
||||
-- show that exactly one task per node is running
|
||||
-- among the tasks that are not blocked
|
||||
SELECT job_id, task_id, status, nodes_involved
|
||||
FROM pg_dist_background_task WHERE job_id in (:job_id) ORDER BY task_id;
|
||||
job_id | task_id | status | nodes_involved
|
||||
---------------------------------------------------------------------
|
||||
17779 | 1013 | done | {50,56}
|
||||
17779 | 1014 | done | {50,57}
|
||||
17779 | 1015 | done | {50,56}
|
||||
17779 | 1016 | running | {50,57}
|
||||
17779 | 1017 | runnable | {50,56}
|
||||
17779 | 1018 | blocked | {50,57}
|
||||
17779 | 1019 | runnable | {50,56}
|
||||
17779 | 1020 | blocked | {50,57}
|
||||
(8 rows)
|
||||
|
||||
SELECT citus_rebalance_stop();
|
||||
citus_rebalance_stop
|
||||
---------------------------------------------------------------------
|
||||
|
||||
(1 row)
|
||||
|
||||
-- PART 3
|
||||
-- Test to verify that there's a hard dependency when A specific node is first being used as a
|
||||
-- source for a move, and then later as a target.
|
||||
-- First let's restart the scenario
|
||||
DROP SCHEMA background_rebalance_parallel CASCADE;
|
||||
TRUNCATE pg_dist_background_job CASCADE;
|
||||
TRUNCATE pg_dist_background_task CASCADE;
|
||||
TRUNCATE pg_dist_background_task_depend;
|
||||
SELECT public.wait_for_resource_cleanup();
|
||||
wait_for_resource_cleanup
|
||||
---------------------------------------------------------------------
|
||||
|
||||
(1 row)
|
||||
|
||||
select citus_remove_node('localhost', :worker_1_port);
|
||||
citus_remove_node
|
||||
---------------------------------------------------------------------
|
||||
|
||||
(1 row)
|
||||
|
||||
select citus_remove_node('localhost', :worker_2_port);
|
||||
citus_remove_node
|
||||
---------------------------------------------------------------------
|
||||
|
||||
(1 row)
|
||||
|
||||
select citus_remove_node('localhost', :worker_3_port);
|
||||
citus_remove_node
|
||||
---------------------------------------------------------------------
|
||||
|
||||
(1 row)
|
||||
|
||||
CREATE SCHEMA background_rebalance_parallel;
|
||||
SET search_path TO background_rebalance_parallel;
|
||||
SET citus.next_shard_id TO 85674051;
|
||||
ALTER SEQUENCE pg_catalog.pg_dist_node_nodeid_seq RESTART 61;
|
||||
-- add the first node
|
||||
-- nodeid here is 61
|
||||
select citus_add_node('localhost', :worker_1_port);
|
||||
citus_add_node
|
||||
---------------------------------------------------------------------
|
||||
61
|
||||
(1 row)
|
||||
|
||||
-- create, populate and distribute 6 tables, each with 1 shard, none colocated with each other
|
||||
CREATE TABLE table1_colg1 (a int PRIMARY KEY);
|
||||
SELECT create_distributed_table('table1_colg1', 'a', shard_count => 1, colocate_with => 'none');
|
||||
create_distributed_table
|
||||
---------------------------------------------------------------------
|
||||
|
||||
(1 row)
|
||||
|
||||
INSERT INTO table1_colg1 SELECT i FROM generate_series(0, 100)i;
|
||||
CREATE TABLE table1_colg2 (a int PRIMARY KEY);
|
||||
SELECT create_distributed_table('table1_colg2', 'a', shard_count => 1, colocate_with => 'none');
|
||||
create_distributed_table
|
||||
---------------------------------------------------------------------
|
||||
|
||||
(1 row)
|
||||
|
||||
INSERT INTO table1_colg2 SELECT i FROM generate_series(0, 100)i;
|
||||
CREATE TABLE table1_colg3 (a int PRIMARY KEY);
|
||||
SELECT create_distributed_table('table1_colg3', 'a', shard_count => 1, colocate_with => 'none');
|
||||
create_distributed_table
|
||||
---------------------------------------------------------------------
|
||||
|
||||
(1 row)
|
||||
|
||||
INSERT INTO table1_colg3 SELECT i FROM generate_series(0, 100)i;
|
||||
CREATE TABLE table1_colg4 (a int PRIMARY KEY);
|
||||
SELECT create_distributed_table('table1_colg4', 'a', shard_count => 1, colocate_with => 'none');
|
||||
create_distributed_table
|
||||
---------------------------------------------------------------------
|
||||
|
||||
(1 row)
|
||||
|
||||
INSERT INTO table1_colg4 SELECT i FROM generate_series(0, 100)i;
|
||||
CREATE TABLE table1_colg5 (a int PRIMARY KEY);
|
||||
SELECT create_distributed_table('table1_colg5', 'a', shard_count => 1, colocate_with => 'none');
|
||||
create_distributed_table
|
||||
---------------------------------------------------------------------
|
||||
|
||||
(1 row)
|
||||
|
||||
INSERT INTO table1_colg5 SELECT i FROM generate_series(0, 100)i;
|
||||
CREATE TABLE table1_colg6 (a int PRIMARY KEY);
|
||||
SELECT create_distributed_table('table1_colg6', 'a', shard_count => 1, colocate_with => 'none');
|
||||
create_distributed_table
|
||||
---------------------------------------------------------------------
|
||||
|
||||
(1 row)
|
||||
|
||||
INSERT INTO table1_colg6 SELECT i FROM generate_series(0, 100)i;
|
||||
-- add two other nodes
|
||||
-- nodeid here is 62
|
||||
select citus_add_node('localhost', :worker_2_port);
|
||||
citus_add_node
|
||||
---------------------------------------------------------------------
|
||||
62
|
||||
(1 row)
|
||||
|
||||
-- nodeid here is 63
|
||||
select citus_add_node('localhost', :worker_3_port);
|
||||
citus_add_node
|
||||
---------------------------------------------------------------------
|
||||
63
|
||||
(1 row)
|
||||
|
||||
CREATE OR REPLACE FUNCTION shard_placement_rebalance_array(
|
||||
worker_node_list json[],
|
||||
shard_placement_list json[],
|
||||
threshold float4 DEFAULT 0,
|
||||
max_shard_moves int DEFAULT 1000000,
|
||||
drain_only bool DEFAULT false,
|
||||
improvement_threshold float4 DEFAULT 0.5
|
||||
)
|
||||
RETURNS json[]
|
||||
AS 'citus'
|
||||
LANGUAGE C STRICT VOLATILE;
|
||||
-- we are simulating the following from shard_rebalancer_unit.sql
|
||||
-- the following steps are all according to this scenario
|
||||
-- where the third move should be dependent of the first two
|
||||
-- because the third move's target is the source of the first two
|
||||
SELECT unnest(shard_placement_rebalance_array(
|
||||
ARRAY['{"node_name": "hostname1", "disallowed_shards": "1,2,3,5,6"}',
|
||||
'{"node_name": "hostname2", "disallowed_shards": "4"}',
|
||||
'{"node_name": "hostname3", "disallowed_shards": "4"}'
|
||||
]::json[],
|
||||
ARRAY['{"shardid":1, "nodename":"hostname1"}',
|
||||
'{"shardid":2, "nodename":"hostname1"}',
|
||||
'{"shardid":3, "nodename":"hostname2"}',
|
||||
'{"shardid":4, "nodename":"hostname2"}',
|
||||
'{"shardid":5, "nodename":"hostname3"}',
|
||||
'{"shardid":6, "nodename":"hostname3"}'
|
||||
]::json[]
|
||||
));
|
||||
unnest
|
||||
---------------------------------------------------------------------
|
||||
{"updatetype":1,"shardid":1,"sourcename":"hostname1","sourceport":5432,"targetname":"hostname2","targetport":5432}
|
||||
{"updatetype":1,"shardid":2,"sourcename":"hostname1","sourceport":5432,"targetname":"hostname3","targetport":5432}
|
||||
{"updatetype":1,"shardid":4,"sourcename":"hostname2","sourceport":5432,"targetname":"hostname1","targetport":5432}
|
||||
(3 rows)
|
||||
|
||||
-- manually balance the cluster such that we have
|
||||
-- a balanced cluster like above with 1,2,3,4,5,6 and hostname1/2/3
|
||||
-- shardid 85674051 (1) nodeid 61 (hostname1)
|
||||
-- shardid 85674052 (2) nodeid 61 (hostname1)
|
||||
-- shardid 85674053 (3) nodeid 62 (hostname2)
|
||||
-- shardid 85674054 (4) nodeid 62 (hostname2)
|
||||
-- shardid 85674055 (5) nodeid 63 (hostname3)
|
||||
-- shardid 85674056 (6) nodeid 63 (hostname3)
|
||||
SELECT pg_catalog.citus_move_shard_placement(85674053,61,62,'auto');
|
||||
citus_move_shard_placement
|
||||
---------------------------------------------------------------------
|
||||
|
||||
(1 row)
|
||||
|
||||
SELECT pg_catalog.citus_move_shard_placement(85674054,61,62,'auto');
|
||||
citus_move_shard_placement
|
||||
---------------------------------------------------------------------
|
||||
|
||||
(1 row)
|
||||
|
||||
SELECT pg_catalog.citus_move_shard_placement(85674055,61,63,'auto');
|
||||
citus_move_shard_placement
|
||||
---------------------------------------------------------------------
|
||||
|
||||
(1 row)
|
||||
|
||||
SELECT pg_catalog.citus_move_shard_placement(85674056,61,63,'auto');
|
||||
citus_move_shard_placement
|
||||
---------------------------------------------------------------------
|
||||
|
||||
(1 row)
|
||||
|
||||
-- now create another rebalance strategy in order to simulate moves
|
||||
-- which use as target a node that has been previously used as source
|
||||
CREATE OR REPLACE FUNCTION test_shard_allowed_on_node(shardid bigint, nodeid int)
|
||||
RETURNS boolean AS
|
||||
$$
|
||||
-- analogous to '{"node_name": "hostname1", "disallowed_shards": "1,2,3,5,6"}'
|
||||
select case when (shardid != 85674054 and nodeid = 61)
|
||||
then false
|
||||
-- analogous to '{"node_name": "hostname2", "disallowed_shards": "4"}'
|
||||
-- AND '{"node_name": "hostname2", "disallowed_shards": "4"}'
|
||||
        when (shardid = 85674054 and nodeid != 61)
        then false
        else true
    end;
$$ LANGUAGE sql;
-- insert the new test rebalance strategy
INSERT INTO
    pg_catalog.pg_dist_rebalance_strategy(
        name,
        default_strategy,
        shard_cost_function,
        node_capacity_function,
        shard_allowed_on_node_function,
        default_threshold,
        minimum_threshold,
        improvement_threshold
    ) VALUES (
        'test_source_then_target',
        false,
        'citus_shard_cost_1',
        'citus_node_capacity_1',
        'background_rebalance_parallel.test_shard_allowed_on_node',
        0,
        0,
        0
    );
SELECT * FROM get_rebalance_table_shards_plan(rebalance_strategy := 'test_source_then_target');
  table_name  | shardid  | shard_size | sourcename | sourceport | targetname | targetport
---------------------------------------------------------------------
 table1_colg1 | 85674051 |          0 | localhost  |      57637 | localhost  |      57638
 table1_colg2 | 85674052 |          0 | localhost  |      57637 | localhost  |      57639
 table1_colg4 | 85674054 |          0 | localhost  |      57638 | localhost  |      57637
(3 rows)

SELECT citus_rebalance_start AS job_id from citus_rebalance_start(rebalance_strategy := 'test_source_then_target') \gset
-- check that the third move is blocked and depends on the first two
SELECT job_id, task_id, status, nodes_involved
    FROM pg_dist_background_task WHERE job_id in (:job_id) ORDER BY task_id;
 job_id | task_id |  status  | nodes_involved
---------------------------------------------------------------------
  17780 |    1021 | runnable | {61,62}
  17780 |    1022 | runnable | {61,63}
  17780 |    1023 | blocked  | {62,61}
(3 rows)

SELECT D.task_id,
       (SELECT T.command FROM pg_dist_background_task T WHERE T.task_id = D.task_id),
       D.depends_on,
       (SELECT T.command FROM pg_dist_background_task T WHERE T.task_id = D.depends_on)
FROM pg_dist_background_task_depend D WHERE job_id in (:job_id) ORDER BY D.task_id, D.depends_on ASC;
 task_id |                               command                                | depends_on |                               command
---------------------------------------------------------------------
    1023 | SELECT pg_catalog.citus_move_shard_placement(85674054,62,61,'auto') |       1021 | SELECT pg_catalog.citus_move_shard_placement(85674051,61,62,'auto')
    1023 | SELECT pg_catalog.citus_move_shard_placement(85674054,62,61,'auto') |       1022 | SELECT pg_catalog.citus_move_shard_placement(85674052,61,63,'auto')
(2 rows)
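
Side note (illustrative, not part of the regression output): since nodes_involved stores node ids, they can be resolved to host and port through pg_dist_node with a query along these lines:
-- assumes the tasks scheduled above are still present in pg_dist_background_task
SELECT t.task_id, t.status,
       array_agg(n.nodename || ':' || n.nodeport ORDER BY n.nodeid) AS involved_nodes
FROM pg_dist_background_task t
JOIN pg_dist_node n ON n.nodeid = ANY (t.nodes_involved)
GROUP BY t.task_id, t.status
ORDER BY t.task_id;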

SELECT citus_rebalance_stop();
 citus_rebalance_stop
---------------------------------------------------------------------

(1 row)

DELETE FROM pg_catalog.pg_dist_rebalance_strategy WHERE name='test_source_then_target';
DROP SCHEMA background_rebalance_parallel CASCADE;
TRUNCATE pg_dist_background_job CASCADE;
TRUNCATE pg_dist_background_task CASCADE;
TRUNCATE pg_dist_background_task_depend;
SELECT public.wait_for_resource_cleanup();
 wait_for_resource_cleanup
---------------------------------------------------------------------

(1 row)

select citus_remove_node('localhost', :worker_3_port);
 citus_remove_node
---------------------------------------------------------------------

(1 row)

-- keep the rest of the tests intact that depend on node/group ids
ALTER SEQUENCE pg_catalog.pg_dist_groupid_seq RESTART :last_group_id_cls;
ALTER SEQUENCE pg_catalog.pg_dist_node_nodeid_seq RESTART :last_node_id_cls;

@@ -3,6 +3,7 @@ SET search_path TO background_task_queue_monitor;
SET citus.shard_count TO 4;
SET citus.shard_replication_factor TO 1;
SET citus.next_shard_id TO 3536400;
SET client_min_messages TO ERROR;
-- reset sequence values
ALTER SEQUENCE pg_dist_background_job_job_id_seq RESTART 1450000;
ALTER SEQUENCE pg_dist_background_task_task_id_seq RESTART 1450000;

@@ -654,11 +655,268 @@ SELECT job_id, task_id, status FROM pg_dist_background_task
 1450016 | 1450024 | done
(2 rows)

-- TEST11
-- verify that we do not allow parallel task executors involving a particular node
-- more than citus.max_background_task_executors_per_node
-- verify that we can change citus.max_background_task_executors_per_node on the fly
-- tests are done with dummy node ids
-- citus_task_wait calls are used to ensure consistent pg_dist_background_task query
-- output i.e. to avoid flakiness
BEGIN;
INSERT INTO pg_dist_background_job (job_type, description) VALUES ('test_job', 'simple test to verify changing max background task executors per node on the fly') RETURNING job_id AS job_id1 \gset
INSERT INTO pg_dist_background_task (job_id, command, nodes_involved) VALUES (:job_id1, $job$ SELECT pg_sleep(2); $job$, ARRAY [1, 2]) RETURNING task_id AS task_id1 \gset
INSERT INTO pg_dist_background_task (job_id, command, nodes_involved) VALUES (:job_id1, $job$ SELECT pg_sleep(2); $job$, ARRAY [3, 4]) RETURNING task_id AS task_id2 \gset
INSERT INTO pg_dist_background_task (job_id, command, nodes_involved) VALUES (:job_id1, $job$ SELECT pg_sleep(4); $job$, ARRAY [1, 2]) RETURNING task_id AS task_id3 \gset
INSERT INTO pg_dist_background_task (job_id, command, nodes_involved) VALUES (:job_id1, $job$ SELECT pg_sleep(4); $job$, ARRAY [1, 3]) RETURNING task_id AS task_id4 \gset
INSERT INTO pg_dist_background_task (job_id, command, nodes_involved) VALUES (:job_id1, $job$ SELECT pg_sleep(4); $job$, ARRAY [2, 4]) RETURNING task_id AS task_id5 \gset
INSERT INTO pg_dist_background_task (job_id, command, nodes_involved) VALUES (:job_id1, $job$ SELECT pg_sleep(7); $job$, ARRAY [1, 2]) RETURNING task_id AS task_id6 \gset
INSERT INTO pg_dist_background_task (job_id, command, nodes_involved) VALUES (:job_id1, $job$ SELECT pg_sleep(6); $job$, ARRAY [1, 3]) RETURNING task_id AS task_id7 \gset
INSERT INTO pg_dist_background_task (job_id, command, nodes_involved) VALUES (:job_id1, $job$ SELECT pg_sleep(6); $job$, ARRAY [1, 4]) RETURNING task_id AS task_id8 \gset
COMMIT;
SELECT citus_task_wait(:task_id1, desired_status => 'running');
 citus_task_wait
---------------------------------------------------------------------

(1 row)

SELECT citus_task_wait(:task_id2, desired_status => 'running');
 citus_task_wait
---------------------------------------------------------------------

(1 row)

SELECT job_id, task_id, status, nodes_involved FROM pg_dist_background_task
    WHERE task_id IN (:task_id1, :task_id2, :task_id3, :task_id4,
                      :task_id5, :task_id6, :task_id7, :task_id8)
    ORDER BY job_id, task_id; -- show that at most 1 task per node is running
 job_id  | task_id |  status  | nodes_involved
---------------------------------------------------------------------
 1450017 | 1450025 | running  | {1,2}
 1450017 | 1450026 | running  | {3,4}
 1450017 | 1450027 | runnable | {1,2}
 1450017 | 1450028 | runnable | {1,3}
 1450017 | 1450029 | runnable | {2,4}
 1450017 | 1450030 | runnable | {1,2}
 1450017 | 1450031 | runnable | {1,3}
 1450017 | 1450032 | runnable | {1,4}
(8 rows)
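
Side note (illustrative, not part of the regression output): the per-node cap can be double-checked from this state by counting running executors per involved node, for example:
-- with the default citus.max_background_task_executors_per_node of 1,
-- every count returned here should be at most 1
SELECT node_id, count(*) AS running_executors
FROM pg_dist_background_task, unnest(nodes_involved) AS node_id
WHERE status = 'running'
GROUP BY node_id
ORDER BY node_id;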

SELECT citus_task_wait(:task_id1, desired_status => 'done');
 citus_task_wait
---------------------------------------------------------------------

(1 row)

SELECT citus_task_wait(:task_id2, desired_status => 'done');
 citus_task_wait
---------------------------------------------------------------------

(1 row)

-- increase max_background_task_executors_per_node on the fly
ALTER SYSTEM SET citus.max_background_task_executors_per_node = 2;
SELECT pg_reload_conf();
 pg_reload_conf
---------------------------------------------------------------------
 t
(1 row)

SELECT citus_task_wait(:task_id3, desired_status => 'running');
 citus_task_wait
---------------------------------------------------------------------

(1 row)

SELECT citus_task_wait(:task_id4, desired_status => 'running');
 citus_task_wait
---------------------------------------------------------------------

(1 row)

SELECT citus_task_wait(:task_id5, desired_status => 'running');
 citus_task_wait
---------------------------------------------------------------------

(1 row)

SELECT job_id, task_id, status, nodes_involved FROM pg_dist_background_task
    WHERE task_id IN (:task_id1, :task_id2, :task_id3, :task_id4,
                      :task_id5, :task_id6, :task_id7, :task_id8)
    ORDER BY job_id, task_id; -- show that at most 2 tasks per node are running
 job_id  | task_id |  status  | nodes_involved
---------------------------------------------------------------------
 1450017 | 1450025 | done     | {1,2}
 1450017 | 1450026 | done     | {3,4}
 1450017 | 1450027 | running  | {1,2}
 1450017 | 1450028 | running  | {1,3}
 1450017 | 1450029 | running  | {2,4}
 1450017 | 1450030 | runnable | {1,2}
 1450017 | 1450031 | runnable | {1,3}
 1450017 | 1450032 | runnable | {1,4}
(8 rows)

-- increase max_background_task_executors_per_node to 3 on the fly
SELECT citus_task_wait(:task_id3, desired_status => 'done');
 citus_task_wait
---------------------------------------------------------------------

(1 row)

SELECT citus_task_wait(:task_id4, desired_status => 'done');
 citus_task_wait
---------------------------------------------------------------------

(1 row)

SELECT citus_task_wait(:task_id5, desired_status => 'done');
 citus_task_wait
---------------------------------------------------------------------

(1 row)

ALTER SYSTEM SET citus.max_background_task_executors_per_node = 3;
SELECT pg_reload_conf();
 pg_reload_conf
---------------------------------------------------------------------
 t
(1 row)

SELECT citus_task_wait(:task_id6, desired_status => 'running');
 citus_task_wait
---------------------------------------------------------------------

(1 row)

SELECT citus_task_wait(:task_id7, desired_status => 'running');
 citus_task_wait
---------------------------------------------------------------------

(1 row)

SELECT citus_task_wait(:task_id8, desired_status => 'running');
 citus_task_wait
---------------------------------------------------------------------

(1 row)

SELECT job_id, task_id, status, nodes_involved FROM pg_dist_background_task
    WHERE task_id IN (:task_id1, :task_id2, :task_id3, :task_id4,
                      :task_id5, :task_id6, :task_id7, :task_id8)
    ORDER BY job_id, task_id; -- show that at most 3 tasks per node are running
 job_id  | task_id |  status  | nodes_involved
---------------------------------------------------------------------
 1450017 | 1450025 | done     | {1,2}
 1450017 | 1450026 | done     | {3,4}
 1450017 | 1450027 | done     | {1,2}
 1450017 | 1450028 | done     | {1,3}
 1450017 | 1450029 | done     | {2,4}
 1450017 | 1450030 | running  | {1,2}
 1450017 | 1450031 | running  | {1,3}
 1450017 | 1450032 | running  | {1,4}
(8 rows)

ALTER SYSTEM RESET citus.max_background_task_executors_per_node;
SELECT pg_reload_conf();
 pg_reload_conf
---------------------------------------------------------------------
 t
(1 row)

-- if pg_cancel_backend is called on one of the running task PIDs
-- task doesn't restart because it's not allowed anymore by the limit.
-- node with id 1 can be used only once, unless there are previously running tasks
SELECT pid AS task_id6_pid FROM pg_dist_background_task WHERE task_id IN (:task_id6) \gset
SELECT pg_cancel_backend(:task_id6_pid); -- cancel task_id6 process
 pg_cancel_backend
---------------------------------------------------------------------
 t
(1 row)

-- task goes to only runnable state, not running anymore.
SELECT citus_task_wait(:task_id6, desired_status => 'runnable');
 citus_task_wait
---------------------------------------------------------------------

(1 row)

-- show that cancelled task hasn't restarted because limit doesn't allow it
SELECT job_id, task_id, status, nodes_involved FROM pg_dist_background_task
    WHERE task_id IN (:task_id1, :task_id2, :task_id3, :task_id4,
                      :task_id5, :task_id6, :task_id7, :task_id8)
    ORDER BY job_id, task_id;
 job_id  | task_id |  status  | nodes_involved
---------------------------------------------------------------------
 1450017 | 1450025 | done     | {1,2}
 1450017 | 1450026 | done     | {3,4}
 1450017 | 1450027 | done     | {1,2}
 1450017 | 1450028 | done     | {1,3}
 1450017 | 1450029 | done     | {2,4}
 1450017 | 1450030 | runnable | {1,2}
 1450017 | 1450031 | running  | {1,3}
 1450017 | 1450032 | running  | {1,4}
(8 rows)
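
Side note (illustrative, not part of the regression output): the reason the cancelled task stays runnable can be seen by listing which running executors still involve node 1:
-- tasks currently holding an executor slot that involves node 1
SELECT task_id, pid, nodes_involved
FROM pg_dist_background_task
WHERE status = 'running' AND 1 = ANY (nodes_involved);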

SELECT citus_task_wait(:task_id7, desired_status => 'done');
 citus_task_wait
---------------------------------------------------------------------

(1 row)

SELECT citus_task_wait(:task_id8, desired_status => 'done');
 citus_task_wait
---------------------------------------------------------------------

(1 row)

SELECT citus_task_wait(:task_id6, desired_status => 'running');
 citus_task_wait
---------------------------------------------------------------------

(1 row)

-- show that the 6th task has restarted only after both the 7th and 8th are done
-- since we have a limit of 1 background task executor per node with id 1
SELECT job_id, task_id, status, nodes_involved FROM pg_dist_background_task
    WHERE task_id IN (:task_id1, :task_id2, :task_id3, :task_id4,
                      :task_id5, :task_id6, :task_id7, :task_id8)
    ORDER BY job_id, task_id;
 job_id  | task_id |  status  | nodes_involved
---------------------------------------------------------------------
 1450017 | 1450025 | done     | {1,2}
 1450017 | 1450026 | done     | {3,4}
 1450017 | 1450027 | done     | {1,2}
 1450017 | 1450028 | done     | {1,3}
 1450017 | 1450029 | done     | {2,4}
 1450017 | 1450030 | running  | {1,2}
 1450017 | 1450031 | done     | {1,3}
 1450017 | 1450032 | done     | {1,4}
(8 rows)

SELECT citus_job_cancel(:job_id1);
 citus_job_cancel
---------------------------------------------------------------------

(1 row)

SELECT citus_job_wait(:job_id1);
 citus_job_wait
---------------------------------------------------------------------

(1 row)

ALTER SYSTEM RESET citus.max_background_task_executors_per_node;
SELECT pg_reload_conf();
 pg_reload_conf
---------------------------------------------------------------------
 t
(1 row)

SET client_min_messages TO WARNING;
TRUNCATE pg_dist_background_job CASCADE;
TRUNCATE pg_dist_background_task CASCADE;
TRUNCATE pg_dist_background_task_depend;
DROP SCHEMA background_task_queue_monitor CASCADE;
RESET client_min_messages;
ALTER SYSTEM RESET citus.background_task_queue_interval;
ALTER SYSTEM RESET citus.max_background_task_executors;
SELECT pg_reload_conf();

@@ -9,7 +9,8 @@ FROM pg_attribute
WHERE atthasmissing AND attrelid NOT IN ('pg_dist_node'::regclass,
                                         'pg_dist_rebalance_strategy'::regclass,
                                         'pg_dist_partition'::regclass,
                                         'pg_dist_object'::regclass)
                                         'pg_dist_object'::regclass,
                                         'pg_dist_background_task'::regclass)
ORDER BY attrelid, attname;
 attrelid | attname | atthasmissing | attmissingval
---------------------------------------------------------------------

@@ -1,12 +1,22 @@
/*
Test to check if the background tasks scheduled by the background rebalancer
has the correct dependencies.
*/
--
-- BACKGROUND_REBALANCE_PARALLEL
--
-- Test to check if the background tasks scheduled by the background rebalancer
-- have the correct dependencies
--
-- Test to verify that we do not allow parallel rebalancer moves involving a
-- particular node (either as source or target) more than
-- citus.max_background_task_executors_per_node, and that we can change the GUC on
-- the fly, and that doing so affects the ongoing rebalance as it should
--
-- Test to verify that there's a hard dependency when a specific node is first being
-- used as a source for a move, and then later as a target.
--
CREATE SCHEMA background_rebalance_parallel;
SET search_path TO background_rebalance_parallel;
SET citus.next_shard_id TO 85674000;
SET citus.shard_replication_factor TO 1;
SET client_min_messages TO WARNING;
SET client_min_messages TO ERROR;

ALTER SEQUENCE pg_dist_background_job_job_id_seq RESTART 17777;
ALTER SEQUENCE pg_dist_background_task_task_id_seq RESTART 1000;

@@ -26,34 +36,34 @@ SELECT 1 FROM master_add_node('localhost', :worker_2_port);
ALTER SYSTEM SET citus.background_task_queue_interval TO '1s';
SELECT pg_reload_conf();

/* Colocation group 1: create two tables table1_colg1, table2_colg1 and in a colocation group */
-- Colocation group 1: create two tables table1_colg1, table2_colg1 and in a colocation group
CREATE TABLE table1_colg1 (a int PRIMARY KEY);
SELECT create_distributed_table('table1_colg1', 'a', shard_count => 4 , colocate_with => 'none');
SELECT create_distributed_table('table1_colg1', 'a', shard_count => 4, colocate_with => 'none');

CREATE TABLE table2_colg1 (b int PRIMARY KEY);

SELECT create_distributed_table('table2_colg1', 'b' , colocate_with => 'table1_colg1');
SELECT create_distributed_table('table2_colg1', 'b', colocate_with => 'table1_colg1');

/* Colocation group 2: create two tables table1_colg2, table2_colg2 and in a colocation group */
-- Colocation group 2: create two tables table1_colg2, table2_colg2 and in a colocation group
CREATE TABLE table1_colg2 (a int PRIMARY KEY);

SELECT create_distributed_table('table1_colg2 ', 'a', shard_count => 4, colocate_with => 'none');
SELECT create_distributed_table('table1_colg2', 'a', shard_count => 4, colocate_with => 'none');

CREATE TABLE table2_colg2 (b int primary key);

SELECT create_distributed_table('table2_colg2', 'b' , colocate_with => 'table1_colg2');
SELECT create_distributed_table('table2_colg2', 'b', colocate_with => 'table1_colg2');

/* Colocation group 3: create two tables table1_colg3, table2_colg3 and in a colocation group */
-- Colocation group 3: create two tables table1_colg3, table2_colg3 and in a colocation group
CREATE TABLE table1_colg3 (a int PRIMARY KEY);

SELECT create_distributed_table('table1_colg3 ', 'a', shard_count => 4, colocate_with => 'none');
SELECT create_distributed_table('table1_colg3', 'a', shard_count => 4, colocate_with => 'none');

CREATE TABLE table2_colg3 (b int primary key);

SELECT create_distributed_table('table2_colg3', 'b' , colocate_with => 'table1_colg3');
SELECT create_distributed_table('table2_colg3', 'b', colocate_with => 'table1_colg3');


/* Add two new node so that we can rebalance */
-- Add two new nodes so that we can rebalance
SELECT 1 FROM citus_add_node('localhost', :worker_3_port);
SELECT 1 FROM citus_add_node('localhost', :worker_4_port);

@@ -63,10 +73,12 @@ SELECT * FROM citus_rebalance_start();

SELECT citus_rebalance_wait();

/*Check that a move is dependent on
1. any other move scheduled earlier in its colocation group.
2. any other move scheduled earlier whose source node or target
node overlaps with the current moves nodes. */
-- PART 1
-- Test to check if the background tasks scheduled by the background rebalancer
-- have the correct dependencies

-- Check that a move is dependent on
-- any other move scheduled earlier in its colocation group.
SELECT S.shardid, P.colocationid
FROM pg_dist_shard S, pg_dist_partition P
WHERE S.logicalrelid = P.logicalrelid ORDER BY S.shardid ASC;

@@ -78,14 +90,14 @@ SELECT D.task_id,
FROM pg_dist_background_task_depend D WHERE job_id = 17777 ORDER BY D.task_id, D.depends_on ASC;


/* Check that if there is a reference table that needs to be synched to a node,
any move without a dependency must depend on the move task for reference table. */
-- Check that if there is a reference table that needs to be synched to a node,
-- any move without a dependency must depend on the move task for reference table.
SELECT 1 FROM citus_drain_node('localhost',:worker_4_port);
SELECT public.wait_for_resource_cleanup();
SELECT 1 FROM citus_disable_node('localhost', :worker_4_port, synchronous:=true);

/* Drain worker_3 so that we can move only one colocation group to worker_3
to create an unbalance that would cause parallel rebalancing. */
-- Drain worker_3 so that we can move only one colocation group to worker_3
-- to create an unbalance that would cause parallel rebalancing.
SELECT 1 FROM citus_drain_node('localhost',:worker_3_port);
SELECT citus_set_node_property('localhost', :worker_3_port, 'shouldhaveshards', true);

@@ -95,7 +107,7 @@ CREATE TABLE ref_table(a int PRIMARY KEY);

SELECT create_reference_table('ref_table');

/* Move all the shards of Colocation group 3 to worker_3.*/
-- Move all the shards of Colocation group 3 to worker_3.
SELECT
    master_move_shard_placement(shardid, 'localhost', nodeport, 'localhost', :worker_3_port, 'block_writes')
FROM

@@ -107,7 +119,7 @@ ORDER BY

CALL citus_cleanup_orphaned_resources();

/* Activate and new nodes so that we can rebalance. */
-- Activate the new node so that we can rebalance.
SELECT 1 FROM citus_activate_node('localhost', :worker_4_port);
SELECT citus_set_node_property('localhost', :worker_4_port, 'shouldhaveshards', true);

@@ -128,13 +140,265 @@ SELECT D.task_id,
(SELECT T.command FROM pg_dist_background_task T WHERE T.task_id = D.depends_on)
FROM pg_dist_background_task_depend D WHERE job_id = 17778 ORDER BY D.task_id, D.depends_on ASC;

-- PART 2
-- Test to verify that we do not allow parallel rebalancer moves involving a
-- particular node (either as source or target)
-- more than citus.max_background_task_executors_per_node
-- and that we can change the GUC on the fly
-- citus_task_wait calls are used to ensure consistent pg_dist_background_task query
-- output i.e. to avoid flakiness

-- First let's restart the scenario
DROP SCHEMA background_rebalance_parallel CASCADE;
TRUNCATE pg_dist_background_job CASCADE;
TRUNCATE pg_dist_background_task CASCADE;
TRUNCATE pg_dist_background_task_depend;
SELECT public.wait_for_resource_cleanup();
select citus_remove_node('localhost', :worker_2_port);
select citus_remove_node('localhost', :worker_3_port);
select citus_remove_node('localhost', :worker_4_port);
select citus_remove_node('localhost', :worker_5_port);
select citus_remove_node('localhost', :worker_6_port);
CREATE SCHEMA background_rebalance_parallel;
SET search_path TO background_rebalance_parallel;

-- Create 8 tables in 4 colocation groups, and populate them
CREATE TABLE table1_colg1 (a int PRIMARY KEY);
SELECT create_distributed_table('table1_colg1', 'a', shard_count => 3, colocate_with => 'none');
INSERT INTO table1_colg1 SELECT i FROM generate_series(0, 100)i;

CREATE TABLE table2_colg1 (b int PRIMARY KEY);
SELECT create_distributed_table('table2_colg1', 'b', colocate_with => 'table1_colg1');
INSERT INTO table2_colg1 SELECT i FROM generate_series(0, 100)i;

CREATE TABLE table1_colg2 (a int PRIMARY KEY);
SELECT create_distributed_table('table1_colg2', 'a', shard_count => 3, colocate_with => 'none');
INSERT INTO table1_colg2 SELECT i FROM generate_series(0, 100)i;

CREATE TABLE table2_colg2 (b int PRIMARY KEY);
SELECT create_distributed_table('table2_colg2', 'b', colocate_with => 'table1_colg2');
INSERT INTO table2_colg2 SELECT i FROM generate_series(0, 100)i;

CREATE TABLE table1_colg3 (a int PRIMARY KEY);
SELECT create_distributed_table('table1_colg3', 'a', shard_count => 3, colocate_with => 'none');
INSERT INTO table1_colg3 SELECT i FROM generate_series(0, 100)i;

CREATE TABLE table2_colg3 (b int primary key);
SELECT create_distributed_table('table2_colg3', 'b', colocate_with => 'table1_colg3');
INSERT INTO table2_colg3 SELECT i FROM generate_series(0, 100)i;

CREATE TABLE table1_colg4 (a int PRIMARY KEY);
SELECT create_distributed_table('table1_colg4', 'a', shard_count => 3, colocate_with => 'none');
INSERT INTO table1_colg4 SELECT i FROM generate_series(0, 100)i;

CREATE TABLE table2_colg4 (b int PRIMARY KEY);
SELECT create_distributed_table('table2_colg4', 'b', colocate_with => 'table1_colg4');
INSERT INTO table2_colg4 SELECT i FROM generate_series(0, 100)i;

-- Add nodes so that we can rebalance
SELECT citus_add_node('localhost', :worker_2_port);
SELECT citus_add_node('localhost', :worker_3_port);

SELECT citus_rebalance_start AS job_id from citus_rebalance_start() \gset

-- see dependent tasks to understand which tasks remain runnable because of
-- citus.max_background_task_executors_per_node
-- and which tasks are actually blocked by colocation group dependencies
SELECT D.task_id,
       (SELECT T.command FROM pg_dist_background_task T WHERE T.task_id = D.task_id),
       D.depends_on,
       (SELECT T.command FROM pg_dist_background_task T WHERE T.task_id = D.depends_on)
FROM pg_dist_background_task_depend D WHERE job_id in (:job_id) ORDER BY D.task_id, D.depends_on ASC;

-- default citus.max_background_task_executors_per_node is 1
-- show that first exactly one task per node is running
-- among the tasks that are not blocked
SELECT citus_task_wait(1013, desired_status => 'running');
SELECT job_id, task_id, status, nodes_involved
FROM pg_dist_background_task WHERE job_id in (:job_id) ORDER BY task_id;
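
A small aside (illustrative, not part of the test): tasks that stay runnable purely because of the per-node limit, rather than because of a dependency, can be approximated by checking for node overlap with currently running tasks:
-- runnable tasks that share at least one node with a running task
SELECT r.task_id, r.nodes_involved
FROM pg_dist_background_task r
WHERE r.status = 'runnable'
  AND EXISTS (SELECT 1 FROM pg_dist_background_task a
              WHERE a.status = 'running'
                AND a.nodes_involved && r.nodes_involved);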

-- increase citus.max_background_task_executors_per_node
ALTER SYSTEM SET citus.max_background_task_executors_per_node = 2;
SELECT pg_reload_conf();
SELECT citus_task_wait(1015, desired_status => 'running');
SELECT citus_task_wait(1013, desired_status => 'done');

-- show that at most 2 tasks per node are running
-- among the tasks that are not blocked
SELECT job_id, task_id, status, nodes_involved
FROM pg_dist_background_task WHERE job_id in (:job_id) ORDER BY task_id;

-- decrease to default (1)
ALTER SYSTEM RESET citus.max_background_task_executors_per_node;
SELECT pg_reload_conf();
SELECT citus_task_wait(1015, desired_status => 'done');
SELECT citus_task_wait(1014, desired_status => 'done');
SELECT citus_task_wait(1016, desired_status => 'running');

-- show that exactly one task per node is running
-- among the tasks that are not blocked
SELECT job_id, task_id, status, nodes_involved
FROM pg_dist_background_task WHERE job_id in (:job_id) ORDER BY task_id;

SELECT citus_rebalance_stop();

-- PART 3
-- Test to verify that there's a hard dependency when a specific node is first being used as a
-- source for a move, and then later as a target.

-- First let's restart the scenario
DROP SCHEMA background_rebalance_parallel CASCADE;
TRUNCATE pg_dist_background_job CASCADE;
TRUNCATE pg_dist_background_task CASCADE;
TRUNCATE pg_dist_background_task_depend;
SELECT public.wait_for_resource_cleanup();
select citus_remove_node('localhost', :worker_1_port);
select citus_remove_node('localhost', :worker_2_port);
select citus_remove_node('localhost', :worker_3_port);
CREATE SCHEMA background_rebalance_parallel;
SET search_path TO background_rebalance_parallel;
SET citus.next_shard_id TO 85674051;
ALTER SEQUENCE pg_catalog.pg_dist_node_nodeid_seq RESTART 61;

-- add the first node
-- nodeid here is 61
select citus_add_node('localhost', :worker_1_port);

-- create, populate and distribute 6 tables, each with 1 shard, none colocated with each other
CREATE TABLE table1_colg1 (a int PRIMARY KEY);
SELECT create_distributed_table('table1_colg1', 'a', shard_count => 1, colocate_with => 'none');
INSERT INTO table1_colg1 SELECT i FROM generate_series(0, 100)i;

CREATE TABLE table1_colg2 (a int PRIMARY KEY);
SELECT create_distributed_table('table1_colg2', 'a', shard_count => 1, colocate_with => 'none');
INSERT INTO table1_colg2 SELECT i FROM generate_series(0, 100)i;

CREATE TABLE table1_colg3 (a int PRIMARY KEY);
SELECT create_distributed_table('table1_colg3', 'a', shard_count => 1, colocate_with => 'none');
INSERT INTO table1_colg3 SELECT i FROM generate_series(0, 100)i;

CREATE TABLE table1_colg4 (a int PRIMARY KEY);
SELECT create_distributed_table('table1_colg4', 'a', shard_count => 1, colocate_with => 'none');
INSERT INTO table1_colg4 SELECT i FROM generate_series(0, 100)i;

CREATE TABLE table1_colg5 (a int PRIMARY KEY);
SELECT create_distributed_table('table1_colg5', 'a', shard_count => 1, colocate_with => 'none');
INSERT INTO table1_colg5 SELECT i FROM generate_series(0, 100)i;

CREATE TABLE table1_colg6 (a int PRIMARY KEY);
SELECT create_distributed_table('table1_colg6', 'a', shard_count => 1, colocate_with => 'none');
INSERT INTO table1_colg6 SELECT i FROM generate_series(0, 100)i;

-- add two other nodes
-- nodeid here is 62
select citus_add_node('localhost', :worker_2_port);
-- nodeid here is 63
select citus_add_node('localhost', :worker_3_port);

CREATE OR REPLACE FUNCTION shard_placement_rebalance_array(
    worker_node_list json[],
    shard_placement_list json[],
    threshold float4 DEFAULT 0,
    max_shard_moves int DEFAULT 1000000,
    drain_only bool DEFAULT false,
    improvement_threshold float4 DEFAULT 0.5
)
RETURNS json[]
AS 'citus'
LANGUAGE C STRICT VOLATILE;

-- we are simulating the following from shard_rebalancer_unit.sql
-- the following steps are all according to this scenario
-- where the third move should be dependent on the first two
-- because the third move's target is the source of the first two
SELECT unnest(shard_placement_rebalance_array(
    ARRAY['{"node_name": "hostname1", "disallowed_shards": "1,2,3,5,6"}',
          '{"node_name": "hostname2", "disallowed_shards": "4"}',
          '{"node_name": "hostname3", "disallowed_shards": "4"}'
         ]::json[],
    ARRAY['{"shardid":1, "nodename":"hostname1"}',
          '{"shardid":2, "nodename":"hostname1"}',
          '{"shardid":3, "nodename":"hostname2"}',
          '{"shardid":4, "nodename":"hostname2"}',
          '{"shardid":5, "nodename":"hostname3"}',
          '{"shardid":6, "nodename":"hostname3"}'
         ]::json[]
));

-- manually balance the cluster such that we have
-- a balanced cluster like above with 1,2,3,4,5,6 and hostname1/2/3
-- shardid 85674051 (1) nodeid 61 (hostname1)
-- shardid 85674052 (2) nodeid 61 (hostname1)
-- shardid 85674053 (3) nodeid 62 (hostname2)
-- shardid 85674054 (4) nodeid 62 (hostname2)
-- shardid 85674055 (5) nodeid 63 (hostname3)
-- shardid 85674056 (6) nodeid 63 (hostname3)
SELECT pg_catalog.citus_move_shard_placement(85674053,61,62,'auto');
SELECT pg_catalog.citus_move_shard_placement(85674054,61,62,'auto');
SELECT pg_catalog.citus_move_shard_placement(85674055,61,63,'auto');
SELECT pg_catalog.citus_move_shard_placement(85674056,61,63,'auto');
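
As a quick check (illustrative, not part of the test), the manual moves above should leave shards 85674051-52 on worker_1, 85674053-54 on worker_2 and 85674055-56 on worker_3, which can be confirmed from pg_dist_shard_placement:
-- verify the starting layout before scheduling the rebalance
SELECT shardid, nodename, nodeport
FROM pg_dist_shard_placement
WHERE shardid BETWEEN 85674051 AND 85674056
ORDER BY shardid;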

-- now create another rebalance strategy in order to simulate moves
-- which use as target a node that has been previously used as source
CREATE OR REPLACE FUNCTION test_shard_allowed_on_node(shardid bigint, nodeid int)
    RETURNS boolean AS
$$
    -- analogous to '{"node_name": "hostname1", "disallowed_shards": "1,2,3,5,6"}'
    select case when (shardid != 85674054 and nodeid = 61)
        then false
    -- analogous to '{"node_name": "hostname2", "disallowed_shards": "4"}'
    -- AND '{"node_name": "hostname3", "disallowed_shards": "4"}'
        when (shardid = 85674054 and nodeid != 61)
        then false
        else true
    end;
$$ LANGUAGE sql;
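
One way to sanity-check this allow function before wiring it into a strategy (illustrative, not part of the test) is to evaluate it for every shard/node pair; shard 85674054 should be allowed only on node 61, while the other shards are allowed everywhere except node 61:
-- evaluate the allow function for all shard/node combinations
SELECT s.shardid, n.nodeid,
       test_shard_allowed_on_node(s.shardid, n.nodeid) AS allowed
FROM pg_dist_shard s CROSS JOIN pg_dist_node n
WHERE s.shardid BETWEEN 85674051 AND 85674056
ORDER BY s.shardid, n.nodeid;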

-- insert the new test rebalance strategy
INSERT INTO
    pg_catalog.pg_dist_rebalance_strategy(
        name,
        default_strategy,
        shard_cost_function,
        node_capacity_function,
        shard_allowed_on_node_function,
        default_threshold,
        minimum_threshold,
        improvement_threshold
    ) VALUES (
        'test_source_then_target',
        false,
        'citus_shard_cost_1',
        'citus_node_capacity_1',
        'background_rebalance_parallel.test_shard_allowed_on_node',
        0,
        0,
        0
    );

SELECT * FROM get_rebalance_table_shards_plan(rebalance_strategy := 'test_source_then_target');

SELECT citus_rebalance_start AS job_id from citus_rebalance_start(rebalance_strategy := 'test_source_then_target') \gset

-- check that the third move is blocked and depends on the first two
SELECT job_id, task_id, status, nodes_involved
FROM pg_dist_background_task WHERE job_id in (:job_id) ORDER BY task_id;

SELECT D.task_id,
       (SELECT T.command FROM pg_dist_background_task T WHERE T.task_id = D.task_id),
       D.depends_on,
       (SELECT T.command FROM pg_dist_background_task T WHERE T.task_id = D.depends_on)
FROM pg_dist_background_task_depend D WHERE job_id in (:job_id) ORDER BY D.task_id, D.depends_on ASC;

SELECT citus_rebalance_stop();
DELETE FROM pg_catalog.pg_dist_rebalance_strategy WHERE name='test_source_then_target';

DROP SCHEMA background_rebalance_parallel CASCADE;
TRUNCATE pg_dist_background_job CASCADE;
TRUNCATE pg_dist_background_task CASCADE;
TRUNCATE pg_dist_background_task_depend;
SELECT public.wait_for_resource_cleanup();
select citus_remove_node('localhost', :worker_3_port);
-- keep the rest of the tests intact that depend on node/group ids
ALTER SEQUENCE pg_catalog.pg_dist_groupid_seq RESTART :last_group_id_cls;
ALTER SEQUENCE pg_catalog.pg_dist_node_nodeid_seq RESTART :last_node_id_cls;

@@ -3,6 +3,7 @@ SET search_path TO background_task_queue_monitor;
SET citus.shard_count TO 4;
SET citus.shard_replication_factor TO 1;
SET citus.next_shard_id TO 3536400;
SET client_min_messages TO ERROR;

-- reset sequence values
ALTER SEQUENCE pg_dist_background_job_job_id_seq RESTART 1450000;

@@ -279,11 +280,106 @@ SELECT job_id, task_id, status FROM pg_dist_background_task
    WHERE task_id IN (:task_id1, :task_id2)
    ORDER BY job_id, task_id; -- show that task is cancelled

-- TEST11
-- verify that we do not allow parallel task executors involving a particular node
-- more than citus.max_background_task_executors_per_node
-- verify that we can change citus.max_background_task_executors_per_node on the fly
-- tests are done with dummy node ids
-- citus_task_wait calls are used to ensure consistent pg_dist_background_task query
-- output i.e. to avoid flakiness

BEGIN;
INSERT INTO pg_dist_background_job (job_type, description) VALUES ('test_job', 'simple test to verify changing max background task executors per node on the fly') RETURNING job_id AS job_id1 \gset
INSERT INTO pg_dist_background_task (job_id, command, nodes_involved) VALUES (:job_id1, $job$ SELECT pg_sleep(2); $job$, ARRAY [1, 2]) RETURNING task_id AS task_id1 \gset
INSERT INTO pg_dist_background_task (job_id, command, nodes_involved) VALUES (:job_id1, $job$ SELECT pg_sleep(2); $job$, ARRAY [3, 4]) RETURNING task_id AS task_id2 \gset
INSERT INTO pg_dist_background_task (job_id, command, nodes_involved) VALUES (:job_id1, $job$ SELECT pg_sleep(4); $job$, ARRAY [1, 2]) RETURNING task_id AS task_id3 \gset
INSERT INTO pg_dist_background_task (job_id, command, nodes_involved) VALUES (:job_id1, $job$ SELECT pg_sleep(4); $job$, ARRAY [1, 3]) RETURNING task_id AS task_id4 \gset
INSERT INTO pg_dist_background_task (job_id, command, nodes_involved) VALUES (:job_id1, $job$ SELECT pg_sleep(4); $job$, ARRAY [2, 4]) RETURNING task_id AS task_id5 \gset
INSERT INTO pg_dist_background_task (job_id, command, nodes_involved) VALUES (:job_id1, $job$ SELECT pg_sleep(7); $job$, ARRAY [1, 2]) RETURNING task_id AS task_id6 \gset
INSERT INTO pg_dist_background_task (job_id, command, nodes_involved) VALUES (:job_id1, $job$ SELECT pg_sleep(6); $job$, ARRAY [1, 3]) RETURNING task_id AS task_id7 \gset
INSERT INTO pg_dist_background_task (job_id, command, nodes_involved) VALUES (:job_id1, $job$ SELECT pg_sleep(6); $job$, ARRAY [1, 4]) RETURNING task_id AS task_id8 \gset
COMMIT;

SELECT citus_task_wait(:task_id1, desired_status => 'running');
SELECT citus_task_wait(:task_id2, desired_status => 'running');

SELECT job_id, task_id, status, nodes_involved FROM pg_dist_background_task
    WHERE task_id IN (:task_id1, :task_id2, :task_id3, :task_id4,
                      :task_id5, :task_id6, :task_id7, :task_id8)
    ORDER BY job_id, task_id; -- show that at most 1 task per node is running

SELECT citus_task_wait(:task_id1, desired_status => 'done');
SELECT citus_task_wait(:task_id2, desired_status => 'done');
-- increase max_background_task_executors_per_node on the fly
ALTER SYSTEM SET citus.max_background_task_executors_per_node = 2;
SELECT pg_reload_conf();

SELECT citus_task_wait(:task_id3, desired_status => 'running');
SELECT citus_task_wait(:task_id4, desired_status => 'running');
SELECT citus_task_wait(:task_id5, desired_status => 'running');

SELECT job_id, task_id, status, nodes_involved FROM pg_dist_background_task
    WHERE task_id IN (:task_id1, :task_id2, :task_id3, :task_id4,
                      :task_id5, :task_id6, :task_id7, :task_id8)
    ORDER BY job_id, task_id; -- show that at most 2 tasks per node are running

-- increase max_background_task_executors_per_node to 3 on the fly
SELECT citus_task_wait(:task_id3, desired_status => 'done');
SELECT citus_task_wait(:task_id4, desired_status => 'done');
SELECT citus_task_wait(:task_id5, desired_status => 'done');
ALTER SYSTEM SET citus.max_background_task_executors_per_node = 3;
SELECT pg_reload_conf();

SELECT citus_task_wait(:task_id6, desired_status => 'running');
SELECT citus_task_wait(:task_id7, desired_status => 'running');
SELECT citus_task_wait(:task_id8, desired_status => 'running');

SELECT job_id, task_id, status, nodes_involved FROM pg_dist_background_task
    WHERE task_id IN (:task_id1, :task_id2, :task_id3, :task_id4,
                      :task_id5, :task_id6, :task_id7, :task_id8)
    ORDER BY job_id, task_id; -- show that at most 3 tasks per node are running

ALTER SYSTEM RESET citus.max_background_task_executors_per_node;
SELECT pg_reload_conf();

-- if pg_cancel_backend is called on one of the running task PIDs
-- task doesn't restart because it's not allowed anymore by the limit.
-- node with id 1 can be used only once, unless there are previously running tasks
SELECT pid AS task_id6_pid FROM pg_dist_background_task WHERE task_id IN (:task_id6) \gset
SELECT pg_cancel_backend(:task_id6_pid); -- cancel task_id6 process

-- task goes to only runnable state, not running anymore.
SELECT citus_task_wait(:task_id6, desired_status => 'runnable');

-- show that cancelled task hasn't restarted because limit doesn't allow it
SELECT job_id, task_id, status, nodes_involved FROM pg_dist_background_task
    WHERE task_id IN (:task_id1, :task_id2, :task_id3, :task_id4,
                      :task_id5, :task_id6, :task_id7, :task_id8)
    ORDER BY job_id, task_id;

SELECT citus_task_wait(:task_id7, desired_status => 'done');
SELECT citus_task_wait(:task_id8, desired_status => 'done');
SELECT citus_task_wait(:task_id6, desired_status => 'running');

-- show that the 6th task has restarted only after both the 7th and 8th are done
-- since we have a limit of 1 background task executor per node with id 1
SELECT job_id, task_id, status, nodes_involved FROM pg_dist_background_task
    WHERE task_id IN (:task_id1, :task_id2, :task_id3, :task_id4,
                      :task_id5, :task_id6, :task_id7, :task_id8)
    ORDER BY job_id, task_id;

SELECT citus_job_cancel(:job_id1);
SELECT citus_job_wait(:job_id1);

ALTER SYSTEM RESET citus.max_background_task_executors_per_node;
SELECT pg_reload_conf();

SET client_min_messages TO WARNING;
TRUNCATE pg_dist_background_job CASCADE;
TRUNCATE pg_dist_background_task CASCADE;
TRUNCATE pg_dist_background_task_depend;
DROP SCHEMA background_task_queue_monitor CASCADE;
RESET client_min_messages;

ALTER SYSTEM RESET citus.background_task_queue_interval;
ALTER SYSTEM RESET citus.max_background_task_executors;

@@ -10,5 +10,6 @@ FROM pg_attribute
WHERE atthasmissing AND attrelid NOT IN ('pg_dist_node'::regclass,
                                         'pg_dist_rebalance_strategy'::regclass,
                                         'pg_dist_partition'::regclass,
                                         'pg_dist_object'::regclass)
                                         'pg_dist_object'::regclass,
                                         'pg_dist_background_task'::regclass)
ORDER BY attrelid, attname;