With this commit we've started to propagate sequences and shell
tables within the object dependency resolution. So, ensuring any
dependencies for any object will consider shell tables and sequences
as well. Separate logics for both shell tables and sequences have
been removed.
Since both shell tables and sequences logic were implemented as a
part of the metadata handling before that logic, we were propagating
them while syncing table metadata. With this commit we've divided
metadata (which means anything except shards thereafter) syncing
logic into multiple parts and implemented it either as a part of
ActivateNode. You can check the functions called in ActivateNode
to check definition of different metadata.
Definitions of start_metadata_sync_to_node and citus_activate_node
have also been updated. citus_activate_node will basically create
an active node with all metadata and reference table shards.
start_metadata_sync_to_node will be same with citus_activate_node
except replicating reference tables. stop_metadata_sync_to_node
will remove all the metadata. All of those UDFs need to be called
by superuser.
BEGIN/COMMIT transaction block or in a UDF calling another UDF.
(2) Prohibit/Limit the delegated function not to do a 2PC (or any work on a
remote connection).
(3) Have a safety net to ensure the (2) i.e. we should block the connections
from the delegated procedure or make sure that no 2PC happens on the node.
(4) Such delegated functions are restricted to use only the distributed argument
value.
Note: To limit the scope of the project we are considering only Functions(not
procedures) for the initial work.
DESCRIPTION: Introduce a new flag "force_delegation" in create_distributed_function(),
which will allow a function to be delegated in an explicit transaction block.
Fixes#3265
Once the function is delegated to the worker, on that node during the planning
distributed_planner()
TryToDelegateFunctionCall()
CheckDelegatedFunctionExecution()
EnableInForceDelegatedFuncExecution()
Save the distribution argument (Constant)
ExecutorStart()
CitusBeginScan()
IsShardKeyValueAllowed()
Ensure to not use non-distribution argument.
ExecutorRun()
AdaptiveExecutor()
StartDistributedExecution()
EnsureNoRemoteExecutionFromWorkers()
Ensure all the shards are local to the node in the remoteTaskList.
NonPushableInsertSelectExecScan()
InitializeCopyShardState()
EnsureNoRemoteExecutionFromWorkers()
Ensure all the shards are local to the node in the placementList.
This also fixes a minor issue: Properly handle expressions+parameters in distribution arguments
Before this commit, Citus was triggering metadata syncing
in the background when a function is distributed. However,
with Citus 11, we expect all clusters to have metadata synced
enabled. So, we do not expect any nodes not to have the metadata.
This change:
(a) pro: simplifies the code and opens up possibilities
to simplify futher by reducing the scope of
bg worker to only sync node metadata
(b) pro: explicitly asks users to sync the metadata such that
any unforseen impact can be easily detected
(c) con: For distributed functions without distribution
argument, we do not necessarily require the metadata
sycned. However, for completeness and simplicity, we
do so.
(cherry picked from commit 90928cfd74)
Fix function signature generation
Fix comment typo
Add test for worker_create_or_replace_object
Add test for recreating distributed functions with OUT/TABLE params
Add test for recreating distributed function that returns setof int
Fix test output
Fix comment
Before that PR we were updating citus.pg_dist_object metadata, which keeps
the metadata related to objects on Citus, only on the coordinator node. In
order to allow using those object from worker nodes (or erroring out with
proper error message) we've started to propagate that metedata to worker
nodes as well.
Postgres tightened up its checks for invalid GUC names hence we started
to get an alternative output for one of our tests. We add an alternative
output since the file is relatively small.
Commit on PG:
3db826bd55cd1df0dd8c3d811f8e5b936d7ba1e4
create_distributed_function(function_name,
distribution_arg_name,
colocate_with text)
This UDF did not allow colocate_with parameters when there were no
disttribution_arg_name supplied. This commit changes the behaviour to
allow missing distribution_arg_name parameters when the function should
be colocated with a reference table.
In two places I've made code more straight forward by using ROUTINE in our own codegen
Two changes which may seem extraneous:
AppendFunctionName was updated to not use pg_get_function_identity_arguments.
This is because that function includes ORDER BY when printing an aggregate like my_rank.
While ALTER AGGREGATE my_rank(x "any" ORDER BY y "any") is accepted by postgres,
ALTER ROUTINE my_rank(x "any" ORDER BY y "any") is not.
Tests were updated to use macaddr over integer. Using integer is flaky, our logic
could sometimes end up on tables like users_table. I originally wanted to use money,
but money isn't hashable.
Test ALTER ROLE doesn't deadlock when coordinator added, or propagate from mx workers
Consolidate wait_until_metadata_sync & verify_metadata to multi_test_helpers
Phase 1 seeks to implement minimal infrastructure, so does not include:
- dynamic generation of support aggregates to handle multiple arguments
- configuration methods to direct aggregation strategy,
or mark an aggregate's serialize/deserialize as safe to operate across nodes
Aggregates can be distributed when:
- they have a single argument
- they have a combinefunc
- their transition type is not a pseudotype
DESCRIPTION: Disallow distributed functions for functions depending on an extension
Functions depending on an extension cannot (yet) be distributed by citus. If we would allow this it would cause issues with our dependency following mechanism as we stop following objects depending on an extension.
By not allowing functions to be distributed when they depend on an extension as well as not allowing to make distributed functions depend on an extension we won't break the ability to add new nodes. Allowing functions depending on extensions to be distributed at the moment could cause problems in that area.
DESCRIPTION: Propagate CREATE OR REPLACE FUNCTION
Distributed functions could be replaced, which should be propagated to the workers to keep the function in sync between all nodes.
Due to the complexity of deparsing the `CreateFunctionStmt` we actually produce the plan during the processing phase of our utilityhook. Since the changes have already been made in the catalog tables we can reuse `pg_get_functiondef` to get us the generated `CREATE OR REPLACE` sql.
DESCRIPTION: Propagate ALTER FUNCTION statements for distributed functions
Using the implemented deparser for function statements to propagate changes to both functions and procedures that are previously distributed.
Since the distributed functions are useful when the workers have
metadata, we automatically sync it.
Also, after master_add_node(). We do it lazily and let the deamon
sync it. That's mainly because the metadata syncing cannot be done
in transaction blocks, and we don't want to add lots of transactional
limitations to master_add_node() and create_distributed_function().
With this commit, we're changing the API for create_distributed_function()
such that users can provide the distribution argument and the colocation
information.
This PR aims to add the minimal set of changes required to start
distributing functions. You can use create_distributed_function(regproc)
UDF to distribute a function.
SELECT create_distributed_function('add(int,int)');
The function definition should include the param types to properly
identify the correct function that we wish to distribute