BEGIN/COMMIT transaction block or in a UDF calling another UDF.
(2) Prohibit/Limit the delegated function not to do a 2PC (or any work on a
remote connection).
(3) Have a safety net to ensure the (2) i.e. we should block the connections
from the delegated procedure or make sure that no 2PC happens on the node.
(4) Such delegated functions are restricted to use only the distributed argument
value.
Note: To limit the scope of the project we are considering only Functions(not
procedures) for the initial work.
DESCRIPTION: Introduce a new flag "force_delegation" in create_distributed_function(),
which will allow a function to be delegated in an explicit transaction block.
Fixes#3265
Once the function is delegated to the worker, on that node during the planning
distributed_planner()
TryToDelegateFunctionCall()
CheckDelegatedFunctionExecution()
EnableInForceDelegatedFuncExecution()
Save the distribution argument (Constant)
ExecutorStart()
CitusBeginScan()
IsShardKeyValueAllowed()
Ensure to not use non-distribution argument.
ExecutorRun()
AdaptiveExecutor()
StartDistributedExecution()
EnsureNoRemoteExecutionFromWorkers()
Ensure all the shards are local to the node in the remoteTaskList.
NonPushableInsertSelectExecScan()
InitializeCopyShardState()
EnsureNoRemoteExecutionFromWorkers()
Ensure all the shards are local to the node in the placementList.
This also fixes a minor issue: Properly handle expressions+parameters in distribution arguments
* Removed distributed dependency in columnar_metadata.c
* Changed columnar_debug.c so that it no longer needed distributed/tuplestore and made it return a record instead of a tuplestore
* removed distributed/commands.h dependency
* Made columnar_tableam.c dependency-free
* Fixed spacing for columnar_store_memory_stats function
* indentation fix
* fixed test failures
* Require superuser while activating a node
With this change, we require ActiveNode() (hence citus_add_node(),
citus_activate_node()) explicitly require for a superuser.
Before this commit, these functions were designed to work with
non-superuser roles with the relevent GRANTs given.
However, that is not a widely used way for calling the functions
above.
Due to possibility of non-super user calling the UDFs, they were
designed in a way that some commands were using some additional
short-lived superuser connections. That is:
(a) breaking transactional behavior (e.g., ROLLBACK
wouldn't fully rollback the whole transaction)
(b) Making it very complicated to reason about which
parts of the node activation goes over which connections,
and becoming vulnerable to deadlocks / visibility issues.
We prefer the background daemon to only sync node metadata. That's
why we move placement metadata changes from disable node to
activate node. With that, we can make sure that disable node
only changes node metadata, whereas activate node syncs all
the metadata changes. In essence, we already expect all
nodes to be up when a node is activated. So, this does not change
the behavior much.
Dropping sequences means we need to recreate
and hence losing the sequence.
With this commit, we keep the existing sequences
such that resyncing wouldn't drop the sequence.
We do that by breaking the dependency of the sequence
from the table.
Before this commit, Citus was triggering metadata syncing
in the background when a function is distributed. However,
with Citus 11, we expect all clusters to have metadata synced
enabled. So, we do not expect any nodes not to have the metadata.
This change:
(a) pro: simplifies the code and opens up possibilities
to simplify futher by reducing the scope of
bg worker to only sync node metadata
(b) pro: explicitly asks users to sync the metadata such that
any unforseen impact can be easily detected
(c) con: For distributed functions without distribution
argument, we do not necessarily require the metadata
sycned. However, for completeness and simplicity, we
do so.
With Citus 11, the default behavior is to sync the metadata.
However, partitioned tables created pre-Citus 11 might have
index names that are not compatiable with metadata syncing.
See https://github.com/citusdata/citus/issues/4962 for the
details.
With this commit, we record the existence of partitioned tables
such that we can fix it later if any exists.
With this commit, fix_partition_shard_index_names()
works significantly faster.
For example,
32 shards, 365 partitions, 5 indexes drop from ~120 seconds to ~44 seconds
32 shards, 1095 partitions, 5 indexes drop from ~600 seconds to ~265 seconds
`queryStringList` can be really long, because it may contain #partitions * #indexes entries.
Before this change, we were actually going through the executor where each command
in the query string triggers 1 round trip per entry in queryStringList.
The aim of this commit is to avoid the round-trips by creating a single query string.
I first simply tried sending `q1;q2;..;qn` . However, the executor is designed to
handle `q1;q2;..;qn` type of query executions via the infrastructure mentioned
above (e.g., by tracking the query indexes in the list and doing 1 statement
per round trip).
One another option could have been to change the executor such that only track
the query index when `queryStringList` is provided not with queryString
including multiple `;`s . That is (a) more work (b) could cause weird edge
cases with failure handling (c) felt like coding a special case in to the executor
(cherry picked from commit 90928cfd74)
Fix function signature generation
Fix comment typo
Add test for worker_create_or_replace_object
Add test for recreating distributed functions with OUT/TABLE params
Add test for recreating distributed function that returns setof int
Fix test output
Fix comment
Simply applies
```SQL
SELECT textlike(command, citus.grep_remote_commands)
```
And, if returns true, the command is logged. Else, the log is ignored.
When citus.grep_remote_commands is empty string, all commands are
logged.
This UDF coordinates connectivity checks accross the whole cluster.
This UDF gets the list of active readable nodes in the cluster, and
coordinates all connectivity checks in sequential order.
The algorithm is:
for sourceNode in activeReadableWorkerList:
c = connectToNode(sourceNode)
for targetNode in activeReadableWorkerList:
result = c.execute(
"SELECT citus_check_connection_to_node(targetNode.name,
targetNode.port")
emit sourceNode.name,
sourceNode.port,
targetNode.name,
targetNode.port,
result
- result -> true -> connection attempt from source to target succeeded
- result -> false -> connection attempt from source to target failed
- result -> NULL -> connection attempt from the current node to source node failed
I suggest you use the following query to get an overview on the connectivity:
SELECT bool_and(COALESCE(result, false))
FROM citus_check_cluster_node_health();
Whenever this query returns false, there is a connectivity issue, check in detail.
Before that PR we were updating citus.pg_dist_object metadata, which keeps
the metadata related to objects on Citus, only on the coordinator node. In
order to allow using those object from worker nodes (or erroring out with
proper error message) we've started to propagate that metedata to worker
nodes as well.
citus_check_connection_to_node runs a simple query on a remote node and
reports whether this attempt was successful.
This UDF will be used to make sure each worker node can connect to all
the worker nodes in the cluster.
parameters:
nodename: required
nodeport: optional (default: 5432)
return value:
boolean success
As of master branch, Citus does all the modifications to replicated tables
(e.g., reference tables and distributed tables with replication factor > 1),
via 2PC and avoids any shardstate=3. As a side-effect of those changes,
handling node failures for replicated tables change.
With this PR, when one (or multiple) node failures happen, the users would
see query errors on modifications. If the problem is intermitant, that's OK,
once the node failure(s) recover by themselves, the modification queries would
succeed. If the node failure(s) are permenant, the users should call
`SELECT citus_disable_node(...)` to disable the node. As soon as the node is
disabled, modification would start to succeed. However, now the old node gets
behind. It means that, when the node is up again, the placements should be
re-created on the node. First, use `SELECT citus_activate_node()`. Then, use
`SELECT replicate_table_shards(...)` to replicate the missing placements on
the re-activated node.