Citus heavily relies on application_name, see
`IsCitusInitiatedRemoteBackend()`.
But if the user set the application name, such as export PGAPPNAME=test_name,
Citus uses that name while connecting to the remote node.
With this commit, we ensure that Citus always connects with
the "citus" user name to the remote nodes.
With https://github.com/citusdata/citus/pull/2780, we allow
COPY to use any number of connections that the executor used
in a tx block.
Meaning that, while COPYing data to the shards, create_distributed_table
could allow sequential mode.
We fall back to local execution if we cannot establish any more
connections to local node. However, we should not do that for the
commands that we don't know how to execute locally (or we know we
shouldn't execute locally). To fix that, we take localExecutionSupported
take into account in CanFailoverPlacementExecutionToLocalExecution too.
Moreover, we also prompt a more accurate hint message to inform user
about whether the execution is failed because local execution is
disabled by them, or because local execution wasn't possible for given
command.
multi_log_hook() hook is called by EmitErrorReport() when emitting the
ereport either to frontend or to the server logs. And some callers of
EmitErrorReport() (e.g.: errfinish()) seems to assume that string fields
of given ErrorData object needs to be freed. For this reason, we copy the
message into heap here.
I don't think we have faced with such a problem before but it seems worth
fixing as it is theoretically possible due to the reasoning above.
Drop ruleutils copied for triggers & statistics
While reading trigger related parts of our code-base, realized that
we actually don't need to copy & paste underlying worker functions
from pg/ruleutils.c since higher level functions for those two are
anyway exposed as SQL callables, so we can delete more than ~1k lines
of code from our ruleutils_x.c files.
BEGIN/COMMIT transaction block or in a UDF calling another UDF.
(2) Prohibit/Limit the delegated function not to do a 2PC (or any work on a
remote connection).
(3) Have a safety net to ensure the (2) i.e. we should block the connections
from the delegated procedure or make sure that no 2PC happens on the node.
(4) Such delegated functions are restricted to use only the distributed argument
value.
Note: To limit the scope of the project we are considering only Functions(not
procedures) for the initial work.
DESCRIPTION: Introduce a new flag "force_delegation" in create_distributed_function(),
which will allow a function to be delegated in an explicit transaction block.
Fixes#3265
Once the function is delegated to the worker, on that node during the planning
distributed_planner()
TryToDelegateFunctionCall()
CheckDelegatedFunctionExecution()
EnableInForceDelegatedFuncExecution()
Save the distribution argument (Constant)
ExecutorStart()
CitusBeginScan()
IsShardKeyValueAllowed()
Ensure to not use non-distribution argument.
ExecutorRun()
AdaptiveExecutor()
StartDistributedExecution()
EnsureNoRemoteExecutionFromWorkers()
Ensure all the shards are local to the node in the remoteTaskList.
NonPushableInsertSelectExecScan()
InitializeCopyShardState()
EnsureNoRemoteExecutionFromWorkers()
Ensure all the shards are local to the node in the placementList.
This also fixes a minor issue: Properly handle expressions+parameters in distribution arguments
* Removed distributed dependency in columnar_metadata.c
* Changed columnar_debug.c so that it no longer needed distributed/tuplestore and made it return a record instead of a tuplestore
* removed distributed/commands.h dependency
* Made columnar_tableam.c dependency-free
* Fixed spacing for columnar_store_memory_stats function
* indentation fix
* fixed test failures
* Require superuser while activating a node
With this change, we require ActiveNode() (hence citus_add_node(),
citus_activate_node()) explicitly require for a superuser.
Before this commit, these functions were designed to work with
non-superuser roles with the relevent GRANTs given.
However, that is not a widely used way for calling the functions
above.
Due to possibility of non-super user calling the UDFs, they were
designed in a way that some commands were using some additional
short-lived superuser connections. That is:
(a) breaking transactional behavior (e.g., ROLLBACK
wouldn't fully rollback the whole transaction)
(b) Making it very complicated to reason about which
parts of the node activation goes over which connections,
and becoming vulnerable to deadlocks / visibility issues.
In addition to starting a new transaction, we also need to tell other
backends --including the ones spawned for connections opened to
localhost to build indexes on shards of this relation-- that concurrent
index builds can safely ignore us.
Normally, DefineIndex() only does that if index doesn't have any
predicates (i.e.: where clause) and no index expressions at all.
However, now that we already called standard process utility, index
build on the shell table is finished anyway.
The reason behind doing so is that we cannot guarantee not grabbing any
snapshots via adaptive executor, and the backends creating indexes on
local shards (if any) might block on waiting for current xact of the
current backend to finish, which would cause self deadlocks that are not
detectable.
With https://github.com/citusdata/citus/pull/5493 we introduced
metadata specific connections.
With this connection we guarantee that there is a single metadata connection.
But note that this connection can be used for any other operation.
In other words, this connection is not only reserved for metadata
operations.
However, as https://github.com/citusdata/citus-enterprise/issues/715 showed
us that the logic has a flaw. We allowed ineligible connections to be
picked as metadata connections: such as exclusively claimed connections
or not fully initialized connections.
With this commit, we make sure that we only consider eligable connections
for metadata operations.
We prefer the background daemon to only sync node metadata. That's
why we move placement metadata changes from disable node to
activate node. With that, we can make sure that disable node
only changes node metadata, whereas activate node syncs all
the metadata changes. In essence, we already expect all
nodes to be up when a node is activated. So, this does not change
the behavior much.