We had 2 class definitions for CitusCacheManyConnectionsConfig, where
one of them was a copy of CitusSmallCopyBuffersConfig.
This commit leaves the intended class definition that configures caching
many connections, and removes the one that is a copy of another class
Since sequences are not marked as distributed while creating table if no
metadata worker node exists, we are marking all sequences distributed
while syncing metadata explicitly.
We've both allowed delegating functions and procedures from worker nodes
and also prevented delegation if a function/procedure has already been
propagated from another node.
Before that PR we were updating citus.pg_dist_object metadata, which keeps
the metadata related to objects on Citus, only on the coordinator node. In
order to allow using those object from worker nodes (or erroring out with
proper error message) we've started to propagate that metedata to worker
nodes as well.
citus_check_connection_to_node runs a simple query on a remote node and
reports whether this attempt was successful.
This UDF will be used to make sure each worker node can connect to all
the worker nodes in the cluster.
parameters:
nodename: required
nodeport: optional (default: 5432)
return value:
boolean success
* Update broken link for upgrade tests
* Update src/test/regress/README.md
Co-authored-by: Nils Dijk <nils@citusdata.com>
Co-authored-by: Nils Dijk <nils@citusdata.com>
As of master branch, Citus does all the modifications to replicated tables
(e.g., reference tables and distributed tables with replication factor > 1),
via 2PC and avoids any shardstate=3. As a side-effect of those changes,
handling node failures for replicated tables change.
With this PR, when one (or multiple) node failures happen, the users would
see query errors on modifications. If the problem is intermitant, that's OK,
once the node failure(s) recover by themselves, the modification queries would
succeed. If the node failure(s) are permenant, the users should call
`SELECT citus_disable_node(...)` to disable the node. As soon as the node is
disabled, modification would start to succeed. However, now the old node gets
behind. It means that, when the node is up again, the placements should be
re-created on the node. First, use `SELECT citus_activate_node()`. Then, use
`SELECT replicate_table_shards(...)` to replicate the missing placements on
the re-activated node.
With this commit, we make sure to use a dedicated connection per
node for all the metadata operations within the same transaction.
This is needed because the same metadata (e.g., metadata includes
the distributed table on the workers) can be modified accross
multiple connections.
With this connection we guarantee that there is a single metadata connection.
But note that this connection can be used for any other operation.
In other words, this connection is not only reserved for metadata
operations.
The checks for preventing to remove a node are very much reference
table centric. We are soon going to add the same checks for replicated
tables. So, make the checks generic such that:
(a) replicated tables fit naturally
(b) we can the same checks in `citus_disable_node`.
We do not use comments starting with # in spec files because it creates
errors from C preprocessor that expects directives after this character.
Instead use C style comments, i.e:
// single line comment
You can also use multiline comments as well
/*
* multi line comment
*/
We re-define the meaning of active shard placement. It used
to only be defined via shardstate == SHARD_STATE_ACTIVE.
Now, we also add one more check. The worker node that the
placement is on should be active as well.
This is a preparation for supporting citus_disable_node()
for MX with multiple failures at the same time.
With this change, the maintanince daemon only needs to
sync the "node metadata" (e.g., pg_dist_node), not the
shard metadata.
Before this commit, we acquire the metadata locks on the reference
tables while removing/disabling a node on all the MX nodes.
Although it has some marginal benefits, such as a concurrent
modification during remove/disable node blocks, instead of erroring
out, the drawbacks seems worse. Both citus_remove_node and citus_disable_node
are not tolerant to multiple node failures.
With this commit, we relax the locks. The implication is that while
a node is removed/disabled, users might see query errors. On the
other hand, this change becomes removing/disabling nodes more
tolerant to multiple node failures.