The sections about the rebalancer algorithm and the backround tasks were
empty.
---------
Co-authored-by: Marco Slot <marco.slot@gmail.com>
Co-authored-by: Steven Sheehy <17552371+steven-sheehy@users.noreply.github.com>
Related to issue #7619, #7620
Merge command fails when source query is single sharded and source and
target are co-located and insert is not using distribution key of
source.
Example
```
CREATE TABLE source (id integer);
CREATE TABLE target (id integer );
-- let's distribute both table on id field
SELECT create_distributed_table('source', 'id');
SELECT create_distributed_table('target', 'id');
MERGE INTO target t
USING ( SELECT 1 AS somekey
FROM source
WHERE source.id = 1) s
ON t.id = s.somekey
WHEN NOT MATCHED
THEN INSERT (id)
VALUES (s.somekey)
ERROR: MERGE INSERT must use the source table distribution column value
HINT: MERGE INSERT must use the source table distribution column value
```
Author's Opinion: If join is not between source and target distributed
column, we should not force user to use source distributed column while
inserting value of target distributed column.
Fix: If user is not using distributed key of source for insertion let's
not push down query to workers and don't force user to use source
distributed column if it is not part of join.
This reverts commit fa4fc0b372.
Co-authored-by: paragjain <paragjain@microsoft.com>
Because we want to track PR numbers and to make backporting easy we
(pretty much always) use squash-merges when merging to master. We
accidentally used a rebase merge for PR #7620. This reverts those
changes so we can redo the merge using squash merge.
This reverts all commits from eedb607c to 9e71750fc.
DESCRIPTION: Fix performance issue when using "\d tablename" on a server
with many tables
We introduce a filter to every query on pg_class to automatically remove
shards. This is useful to make sure \d and PgAdmin are not cluttered
with shards. However, the way we were introducing this filter was using
`securityQuals` which can have negative impact on query performance.
On clusters with 100k+ tables this could cause a simple "\d tablename"
command to take multiple seconds, because a skipped optimization by
Postgres causes a full table scan. This changes the code to introduce
this filter in the regular `quals` list instead of in `securityQuals`.
Which causes Postgres to use the intended optimization again.
For reference, this was initially reported as a Postgres issue by me:
https://www.postgresql.org/message-id/flat/4189982.1712785863%40sss.pgh.pa.us#b87421293b362d581ea8677e3bfea920
Variables being modified in the PG_TRY block and read in the PG_CATCH
block should be qualified with volatile.
The variable waitEventSet is modified in the PG_TRY block (line 1085)
and read in the PG_CATCH block (line 1095).
The variable relation is modified in the PG_TRY block (line 500) and
read in the PG_CATCH block (line 515).
Besides, the variable objectAddress doesn't need the volatile qualifier.
Ref: C99 7.13.2.1[^1],
> All accessible objects have values, and all other components of the
abstract machine have state, as of the time the longjmp function was
called, except that the values of objects of automatic storage duration
that are local to the function containing the invocation of the
corresponding setjmp macro that do not have volatile-qualified type and
have been changed between the setjmp invocation and longjmp call are
indeterminate.
[^1]: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf
DESCRIPTION: Correctly mark some variables as volatile
---------
Co-authored-by: Hong Yi <zouzou0208@gmail.com>
DESCRIPTION: Fix performance issue in GetForeignKeyOids on systems with
many constraints
GetForeignKeyOids was showing up in CPU profiles when distributing
schemas on systems with 100k+ constraints. The reason was that this
function was doing a sequence scan of pg_constraint to get the foreign
keys that referenced the requested table.
This fixes that by finding the constraints referencing the table through
pg_depend instead of pg_constraint. We're doing this indirection,
because pg_constraint doesn't have an index that we can use, but
pg_depend does.
DESCRIPTION: Fix PG upgrades when invalid rebalance strategies exist
Without this change an upgrade of a cluster with an invalid rebalance
strategy would fail with an error like this:
```
cache lookup failed for shard_cost_function with oid 6077337
CONTEXT: SQL statement "SELECT citus_validate_rebalance_strategy_functions(
NEW.shard_cost_function,
NEW.node_capacity_function,
NEW.shard_allowed_on_node_function)"
PL/pgSQL function citus_internal.pg_dist_rebalance_strategy_trigger_func() line 5 at PERFORM
SQL statement "INSERT INTO pg_catalog.pg_dist_rebalance_strategy SELECT
name,
default_strategy,
shard_cost_function::regprocedure::regproc,
node_capacity_function::regprocedure::regproc,
shard_allowed_on_node_function::regprocedure::regproc,
default_threshold,
minimum_threshold,
improvement_threshold
FROM public.pg_dist_rebalance_strategy"
PL/pgSQL function citus_finish_pg_upgrade() line 115 at SQL statement
```
This fixes that by disabling the trigger and simply re-inserting the
invalid rebalance strategy without checking. We could also silently
remove it, but this seems nicer.
DESCRIPTION: Fix performance issue when distributing a table that
depends on an extension
When the database contains many objects this function would show up in
profiles because it was doing a sequence scan on pg_depend. And with
many objects pg_depend can get very large.
This starts using an index scan to only look for rows containing FDWs,
of which there are expected to be very few (often even zero).
DESCRIPTION: Fix performance issue when creating distributed tables if
many already exist
This builds on the work to speed up EnsureSequenceTypeSupported, and now
does something similar for SequenceUsedInDistributedTable.
SequenceUsedInDistributedTable had a similar O(number of citus tables)
operation. This fixes that and speeds up creation of distributed tables
significantly when many distributed tables already exist.
Fixes#7022
DESCRIPTION: Fix performance issue when creating distributed tables and many already exist
EnsureSequenceTypeSupported was doing an O(number of distributed tables)
operation. This can become very slow with lots of Citus tables, which
now happens much more frequently in practice due to schema based sharding.
Partially addresses #7022
And when that is the case, directly use it as "host" parameter for the
connections between nodes and use the "hostname" provided in
pg_dist_node / pg_dist_poolinfo as "hostaddr" to avoid host name lookup.
This is to avoid allowing dns resolution (and / or setting up DNS names
for each host in the cluster). This already works currently when using
IPs in the hostname. The only use of setting host is that you can then
use sslmode=verify-full and it will validate that the hostname matches
the certificate provided by the node you're connecting too.
It would be more flexible to make this a per-node setting, but that
requires SQL changes. And we'd like to backport this change, and
backporting such a sql change would be quite hard while backporting this
change would be very easy. And in many setups, a different hostname for
TLS validation is actually not needed. The reason for that is
query-from-any node: With query-from-any-node all nodes usually have a
certificate that is valid for the same "cluster hostname", either using
a wildcard cert or a Subject Alternative Name (SAN). Because if you load
balance across nodes you don't know which node you're connecting to, but
you still want TLS validation to do it's job. So with this change you
can use this same "cluster hostname" for TLS validation within the
cluster. Obviously this means you don't validate that you're connecting
to a particular node, just that you're connecting to one of the nodes in
the cluster, but that should be fine from a security perspective (in
most cases).
Note to self: This change requires updating
https://docs.citusdata.com/en/latest/develop/api_guc.html#citus-node-conninfo-text.
DESCRIPTION: Allows overwriting host name for all inter-node connections
by supporting "host" parameter in citus.node_conninfo
DESCRIPTION: Fixes a crash caused by some form of ALTER TABLE ADD COLUMN
statements. When adding multiple columns, if one of the ADD COLUMN
statements contains a FOREIGN constraint ommitting the referenced
columns in the statement, a SEGFAULT occurs.
For instance, the following statement results in a crash:
```
ALTER TABLE lt ADD COLUMN new_col1 bool,
ADD COLUMN new_col2 int references rt;
```
Fixes#7520.
Fixes https://github.com/citusdata/citus/issues/7536.
Note to reviewer:
Before this commit, the following results in an assertion failure when
executed locally and this won't be the case anymore:
```console
make -C src/test/regress/ check-citus-upgrade-local citus-old-version=v10.2.0
```
Note that this doesn't happen on CI as we don't enable assertions there.
---------
Co-authored-by: Jelte Fennema-Nio <jelte.fennema@microsoft.com>
RunPreprocessNonMainDBCommand and RunPostprocessNonMainDBCommand are
the entrypoints for this module. These functions are called from
utility_hook.c to support some of the node-wide object management
commands from non-main databases.
To add support for a new command type, one needs to define a new
NonMainDbDistributeObjectOps object and add it to
GetNonMainDbDistributeObjectOps.
This PR changes the order in which the locks are acquired (for the
target and reference tables), when a modify request is initiated from a
worker node that is not the "FirstWorkerNode".
To prevent concurrent writes, locks are acquired on the first worker
node for the replicated tables. When the update statement originates
from the first worker node, it acquires the lock on the reference
table(s) first, followed by the target table(s). However, if the update
statement is initiated in another worker node, the lock requests are
sent to the first worker in a different order. This PR unifies the
modification order on the first worker node. With the third commit,
independent of the node that received the request, the locks are
acquired for the modified table and then the reference tables on the
first node.
The first commit shows a sample output for the test prior to the fix.
Fixes#7477
---------
Co-authored-by: Jelte Fennema-Nio <jelte.fennema@microsoft.com>
When using a CASE WHEN expression in the body
of the function that is used in the DO block, a segmentation
fault occured. This fixes that.
Fixes#7381
---------
Co-authored-by: Konstantin Morozov <vzbdryn@yahoo.com>
This fixes#7551 reported by Egor Chindyaskin
Function activate_node_snapshot() is not meant to be called on a cluster
without worker nodes. This commit adds ERROR report for such case to
prevent server crash.
DESCRIPTION: Adds support for distributed `ALTER/DROP ROLE` commands
from the databases where Citus is not installed
---------
Co-authored-by: Onur Tirtir <onurcantirtir@gmail.com>
In preprocess phase, we save the original database name, replace
dbname field of CreatedbStmt with a temporary name (to let Postgres
to create the database with the temporary name locally) and then
we insert a cleanup record for the temporary database name on all
nodes **(\*\*)**.
And in postprocess phase, we first rename the temporary database
back to its original name for local node and then return a list of
distributed DDL jobs i) to create the database with the temporary
name and then ii) to rename it back to its original name on other
nodes. That way, if CREATE DATABASE fails on any of the nodes, the
temporary database will be cleaned up by the cleanup records that
we inserted in preprocess phase and in case of a failure, we won't
leak any databases called as the name that user intended to use for
the database.
Solves the problem documented in
https://github.com/citusdata/citus/issues/7369
for CREATE DATABASE commands.
**(\*\*):** To ensure that we insert cleanup records on all nodes,
with this PR we also start requiring having the coordinator in the
metadata because otherwise we would skip inserting a cleanup record
for the coordinator.
When adding CREATE/DROP DATABASE propagation in #7240, luckily
we've added EnsureSupportedCreateDatabaseCommand() check into
deparser too just to be on the safe side. That way, today CREATE
DATABASE commands from non-main dbs don't silently allow unsupported
options.
I wasn't aware of this when merging #7439 and hence wanted to add
a test so that we don't mistakenly remove that check from deparser
in future.
This fixes#7454: master_disable_node() has only two arguments, but
calls citus_disable_node() that tries to read three arguments
Co-authored-by: Karina Litskevich <litskevichkarina@gmail.com>
DESCRIPTION: Adds support for distributed `CREATE/DROP DATABASE `
commands from the databases where Citus is not installed
---------
Co-authored-by: Onur Tirtir <onurcantirtir@gmail.com>
DESCRIPTION: Adds support for distributed `GRANT .. ON DATABASE TO USER`
commands from the databases where Citus is not installed
---------
Co-authored-by: Onur Tirtir <onurcantirtir@gmail.com>
Rename InsertCleanupRecordInCurrentTransaction ->
InsertCleanupOnSuccessRecordInCurrentTransaction and hardcode policy
type as CLEANUP_DEFERRED_ON_SUCCESS.
Rename InsertCleanupRecordInSubtransaction ->
InsertCleanupRecordOutsideTransaction.
DESCRIPTION: Adds support for distributed role-membership management
commands from the databases where Citus is not installed (`GRANT <role>
TO <role>`)
This PR also refactors the code-path that allows executing some of the
node-wide commands so that we use send deparsed query string to other
nodes instead of the `queryString` passed into utility hook.
---------
Co-authored-by: Onur Tirtir <onurcantirtir@gmail.com>
DESCRIPTION: Fixes incorrect propagating of `GRANTED BY` and
`CASCADE/RESTRICT` clauses for `REVOKE` statements
There are two issues fixed in this PR
1. granted by statement will appear for revoke statements as well
2. revoke/cascade statement will appear after granted by
Since granted by statements does not appear in statements, this bug
hasn't been visible until now. However, after activating the granted by
statement for revoke, order problem arised and this issue was fixed
order problem for cascade/revoke as well
In summary, this PR provides usage of granted by statements properly now
with the correct order of statements.
We can verify the both errors, fixed with just single statement
REVOKE dist_role_3 from non_dist_role_3 granted by test_admin_role
cascade;
DESCRIPTION: Resolves an issue that disrupts distributed GRANT
statements with the grantor option
In this issue 3 issues are being solved:
1.Correcting the erroneous appending of multiple granted by in the
deparser.
2Adding support for grantor (granted by) in grant role propagation.
3. Implementing grantor (granted by) support during the metadata sync
grant role propagation phase.
Limitations: Currently, the grantor must be created prior to the
metadata sync phase. During metadata sync, both the creation of the
grantor and the grants given by that role cannot be performed, as the
grantor role is not detected during the dependency resolution phase.
---------
Co-authored-by: Onur Tirtir <onurcantirtir@gmail.com>
Moves the following functions to the Citus internal schema:
citus_internal_local_blocked_processes
citus_internal_global_blocked_processes
citus_internal_mark_node_not_synced
citus_internal_unregister_tenant_schema_globally
citus_internal_update_none_dist_table_metadata
citus_internal_update_placement_metadata
citus_internal_update_relation_colocation
citus_internal_start_replication_origin_tracking
citus_internal_stop_replication_origin_tracking
citus_internal_is_replication_origin_tracking_active
#7405
---------
Co-authored-by: Jelte Fennema-Nio <jelte.fennema@microsoft.com>
DESCRIPTION: citus_move_shard_placement now fails early when shard
cannot be safely moved
The implementation is quite simplistic -
`citus_move_shard_placement(...)` will fail with an error if there's any
new node in the cluster that doesn't have reference tables yet.
It could have been finer-grained, i.e. erroring only when trying to move
a shard to an unitialized node. Looking at the related functions -
`replicate_reference_tables()` or `citus_rebalance_start()`, I think
it's acceptable behaviour. These other functions also treat "any"
unitialized node as a temporary anomaly.
Fixes#7426
---------
Co-authored-by: Jelte Fennema-Nio <jelte.fennema@microsoft.com>
Since Postgres commit da9b580d files and directories are supposed to
be created with pg_file_create_mode and pg_dir_create_mode permissions
when default permissions are expected.
This fixes a failure of one of the postgres tests:
If we create file add.conf containing
```
shared_preload_libraries='citus'
```
and run postgres tests
```
TEMP_CONFIG=/path/to/add.conf make installcheck -C src/bin/pg_ctl/
```
then 001_start_stop.pl fails with
```
.../data/base/pgsql_job_cache mode must be 0750
```
in the log.
In passing this also stops creating directories that we haven't used
since Citus 7.4
This change explicitely doesn't change permissions of certificates/keys
that we create.
---------
Co-authored-by: Karina Litskevich <litskevichkarina@gmail.com>
Moves the following functions:
citus_internal_delete_colocation_metadata
citus_internal_delete_partition_metadata
citus_internal_delete_placement_metadata
citus_internal_delete_shard_metadata
citus_internal_delete_tenant_schema
Move more functions to citus_internal schema, the list:
citus_internal_add_placement_metadata
citus_internal_add_shard_metadata
citus_internal_add_tenant_schema
citus_internal_adjust_local_clock_to_remote
citus_internal_database_command
#7405
Move citus_internal_acquire_citus_advisory_object_class_lock and
citus_internal_add_colocation_metadata functions from pg_catalog to
citus_internal.
#7405
Fixes a bug that breaks queries from non-maindbs when
citus.local_hostname is set to a value different than "localhost".
This is a very old bug doesn't cause a problem as long as Citus catalog
is available to FindWorkerNode(). And the catalog is always available
unless we're in non-main database, which might be the case on main but
not on older releases, hence not adding a `DESCRIPTION`. For this
reason, I don't see a reason to backport this.
Maybe we should totally refrain using LOCAL_HOST_NAME in all code-paths,
but not doing that in this PR as the other paths don't seem to be
breaking something that is user-facing.
```c
char *
GetAuthinfo(char *hostname, int32 port, char *user)
{
char *authinfo = NULL;
bool isLoopback = (strncmp(LOCAL_HOST_NAME, hostname, MAX_NODE_LENGTH) == 0 &&
PostPortNumber == port);
if (IsTransactionState())
{
int64 nodeId = WILDCARD_NODE_ID;
/* -1 is a special value for loopback connections (task tracker) */
if (isLoopback)
{
nodeId = LOCALHOST_NODE_ID;
}
else
{
WorkerNode *worker = FindWorkerNode(hostname, port);
if (worker != NULL)
{
nodeId = worker->nodeId;
}
}
authinfo = GetAuthinfoViaCatalog(user, nodeId);
}
return (authinfo != NULL) ? authinfo : "";
}
```
This patch includes the username in the reported error message.
This makes debugging easier when certain commands open connections
as other users than the user that is executing the command.
```
monitora_snapshot=# SELECT citus_move_shard_placement(102030, 'monitora.db-dev-worker-a', 6005, 'monitora.db-dev-worker-a', 6017);
ERROR: connection to the remote node monitora_user@monitora.db-dev-worker-a:6017 failed with the following error: fe_sendauth: no password supplied
Time: 40,198 ms
```
This PR makes the connections to other nodes for
`mark_object_distributed` use the same user as
`execute_command_on_remote_nodes_as_user` so they'll use the same
connection.
ExecuteTaskListIntoTupleDestWithParam and ExecuteTaskListIntoTupleDest
are nearly the same. I parameterized and a made a reusable structure
here
---------
Co-authored-by: Onur Tirtir <onurcantirtir@gmail.com>
DESCRIPTION: Remove a few small memory leaks
In #7440 one instance of a strdup was removed. But there were a few
more. This removes the ones that are left over, or adds a comment why
strdup is on purpose.
This change refactors the code by using generate_qualified_relation_name
from id instead of using a sequence of functions to generate the
relation name.
Fixes#6602
DESCRIPTION: Adds comment on database and role propagation.
Example commands are as below
comment on database <db_name> is '<comment_text>'
comment on database <db_name> is NULL
comment on role <role_name> is '<comment_text>'
comment on role <role_name> is NULL
---------
Co-authored-by: Jelte Fennema-Nio <jelte.fennema@microsoft.com>
Running a query from a Citus non-main database that inserts to
pg_dist_object requires a new connection to the main database itself.
This PR adds that connection to the main database.
---------
Co-authored-by: Jelte Fennema-Nio <github-tech@jeltef.nl>
When there are multiple localhost entries in /etc/hosts like following
/etc/hosts:
```
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
127.0.0.1 localhost
```
multi_cluster_management check will failed:
```
@@ -857,20 +857,21 @@
ERROR: group 14 already has a primary node
-- check that you can add secondaries and unavailable nodes to a group
SELECT groupid AS worker_2_group FROM pg_dist_node WHERE nodeport = :worker_2_port \gset
SELECT 1 FROM master_add_node('localhost', 9998, groupid => :worker_1_group, noderole => 'secondary');
?column?
----------
1
(1 row)
SELECT 1 FROM master_add_node('localhost', 9997, groupid => :worker_1_group, noderole => 'unavailable');
+WARNING: could not establish connection after 5000 ms
?column?
----------
1
(1 row)
```
This actually isn't just a problem in test environments, but could occur
as well during actual usage when a hostname in pg_dist_node
resolves to multiple IPs and one of those IPs is unreachable.
Postgres will then automatically continue with the next IP, but
Citus should listen for events on the new socket. Not on the
old one.
Co-authored-by: chuhx43211 <chuhx43211@hundsun.com>
LoadShardList is called twice, which is not neccessary, and there is no
need to sort the shard placement list since we only want to know the list
length.
DESCRIPTION: Adds support for issuing `CREATE`/`DROP` DATABASE commands
from worker nodes
With this commit, we allow issuing CREATE / DROP DATABASE commands from
worker nodes too.
As in #7278, this is not allowed when the coordinator is not added to
metadata because we don't ever sync metadata changes to coordinator
when adding coordinator to the metadata via
`SELECT citus_set_coordinator_host('<hostname>')`, or equivalently, via
`SELECT citus_add_node(<coordinator_node_name>, <coordinator_node_port>, 0)`.
We serialize database management commands by acquiring a Citus specific
advisory lock on the first primary worker node if there are any workers in the
cluster. As opposed to what we've done in https://github.com/citusdata/citus/pull/7278
for role management commands, we try to avoid from running into distributed deadlocks
as much as possible. This is because, while distributed deadlocks that can happen around
role management commands can be detected by Citus, this is not the case for database
management commands because most of them cannot be run inside in a transaction block.
In that case, Citus cannot even detect the distributed deadlock because the command is not
part of a distributed transaction at all, then the command execution might not return the
control back to the user for an indefinite amount of time.
This fixes#7230.
First of all, using HeapTupleHeaderGetDatumLength(heapTuple) is
definetly wrong, it gives a number that's 4 times less than the correct
tuple size (heapTuple.t_len). See
https://github.com/postgres/postgres/blob/REL_16_0/src/include/access/htup_details.h#L455-L456https://github.com/postgres/postgres/blob/REL_16_0/src/include/varatt.h#L279https://github.com/postgres/postgres/blob/REL_16_0/src/include/varatt.h#L225-L226
When I fixed it, the limit_intermediate_size test failed, so I tried to
understand what's going on there. In original commit fd546cf these
queries were supposed to fail. Then in b3af63c three of the queries that
were supposed to fail suddenly worked and tests were changed to pass
without understanding why the output had changed or how to keep test
testing what it had to test. Even comments saying that these queries
should fail were left untouched. Commit message gives no clue about why
exactly test has changed:
> It seems that when we use adaptive executor instead of task tracker,
we
> exceed the intermediate result size less in the test. Therefore
updated
> the tests accordingly.
Then 3fda2c3 also blindly raised the limit for one of the queries to
keep it working:
3fda2c3254 (diff-a9b7b617f9dfd345318cb8987d5897143ca1b723c87b81049bbadd94dcc86570R19)
When in fe3caf3 that HeapTupleHeaderGetDatumLength(heapTuple) call was
finally added, one of those test queries became failing again.
The other two of them now also failing after the fix. I don't understand
how exactly the calculation of "intermediate result size" that is
limited by citus.max_intermediate_result_size had changed through
b3af63c and fe3caf3, but these numbers are now closer to what
they originally were when this limitation was added in
fd546cf. So these queries should fail, like in the original
version of the limit_intermediate_size test.
Co-authored-by: Karina Litskevich <litskevichkarina@gmail.com>
DESCRIPTION: Adds REASSIGN OWNED BY propagation
This pull request introduces the propagation of the "Reassign owned by"
statement. It accommodates both local and distributed roles for both the
old and new assignments. However, when the old role is a local role, it
undergoes filtering and is not propagated. On the other hand, if the new
role is a local role, the process involves first creating the role on
worker nodes before propagating the "Reassign owned" statement.
DESCRIPTION: Adds database connection limit, rename and set tablespace
propagation
In this PR, below statement propagations are added
alter database <database_name> with allow_connections = <boolean_value>;
alter database <database_name> rename to <database_name2>;
alter database <database_name> set TABLESPACE <table_space_name>
---------
Co-authored-by: Jelte Fennema-Nio <github-tech@jeltef.nl>
Co-authored-by: Jelte Fennema-Nio <jelte.fennema@microsoft.com>
Co-authored-by: Onur Tirtir <onurcantirtir@gmail.com>
DESCRIPTION: Adds support for 2PC from non-Citus main databases
This PR only adds support for `CREATE USER` queries, other queries need
to be added. But it should be simple because this PR creates the
underlying structure.
Citus main database is the database where the Citus extension is
created. A non-main database is all the other databases that are in the
same node with a Citus main database.
When a `CREATE USER` query is run on a non-main database we:
1. Run `start_management_transaction` on the main database. This
function saves the outer transaction's xid (the non-main database
query's transaction id) and marks the current query as main db command.
2. Run `execute_command_on_remote_nodes_as_user("CREATE USER
<username>", <username to run the command>)` on the main database. This
function creates the users in the rest of the cluster by running the
query on the other nodes. The user on the current node is created by the
query on the outer, non-main db, query to make sure consequent commands
in the same transaction can see this user.
3. Run `mark_object_distributed` on the main database. This function
adds the user to `pg_dist_object` in all of the nodes, including the
current one.
This PR also implements transaction recovery for the queries from
non-main databases.
Allowing GRANT ADMIN to now also be INHERIT or SET in support of psql16
GRANT role_name [, ...] TO role_specification [, ...] [ WITH { ADMIN |
INHERIT | SET } { OPTION | TRUE | FALSE } ] [ GRANTED BY
role_specification ]
Fixes: #7148
Related: #7138
See review changes from https://github.com/citusdata/citus/pull/7164
This change adds a script to programatically group all includes in a
specific order. The script was used as a one time invocation to group
and sort all includes throught our formatted code. The grouping is as
follows:
- System includes (eg. `#include<...>`)
- Postgres.h (eg. `#include "postgres.h"`)
- Toplevel imports from postgres, not contained in a directory (eg.
`#include "miscadmin.h"`)
- General postgres includes (eg . `#include "nodes/..."`)
- Toplevel citus includes, not contained in a directory (eg. `#include
"citus_verion.h"`)
- Columnar includes (eg. `#include "columnar/..."`)
- Distributed includes (eg. `#include "distributed/..."`)
Because it is quite hard to understand the difference between toplevel
citus includes and toplevel postgres includes it hardcodes the list of
toplevel citus includes. In the same manner it assumes anything not
prefixed with `columnar/` or `distributed/` as a postgres include.
The sorting/grouping is enforced by CI. Since we do so with our own
script there are not changes required in our uncrustify configuration.
DESCRIPTION: Adds support for propagating `CREATE`/`DROP` database
In this PR, create and drop database support is added.
For CREATE DATABASE:
* "oid" option is not supported
* specifying "strategy" to be different than "wal_log" is not supported
* specifying "template" to be different than "template1" is not
supported
The last two are because those are not saved in `pg_database` and when
activating a node, we cannot assume what parameters were provided when
creating the database.
And "oid" is not supported because whether user specified an arbitrary
oid when creating the database is not saved in pg_database and we want
to avoid from oid collisions that might arise from attempting to use an
auto-assigned oid on workers.
Finally, in case of node activation, GRANTs for the database are also
propagated.
---------
Co-authored-by: Jelte Fennema-Nio <github-tech@jeltef.nl>
Co-authored-by: Jelte Fennema-Nio <jelte.fennema@microsoft.com>
Co-authored-by: Onur Tirtir <onurcantirtir@gmail.com>
We propagate `SECURITY LABEL [for provider] ON ROLE rolename IS
labelname` to the worker nodes.
We also make sure to run the relevant `SecLabelStmt` commands on a
newly added node by looking at roles found in `pg_shseclabel`.
See official docs for explanation on how this command works:
https://www.postgresql.org/docs/current/sql-security-label.html
This command stores the role label in the `pg_shseclabel` catalog table.
This commit also fixes the regex string in
`check_gucs_are_alphabetically_sorted.sh` script such that it escapes
the dot. Previously it was looking for all strings starting with "citus"
instead of "citus." as it should.
To test this feature, I currently make use of a special GUC to control
label provider registration in PG_init when creating the Citus extension.
DESCRIPTION: Adds support from issuing role management commands from worker nodes
It's unlikely to get into a distributed deadlock with role commands, we
don't care much about them at the moment.
There were several attempts to reduce the chances of a deadlock but we
didn't any of them merged into main branch yet, see:
#7325#7016#7009
When executing a prepared CALL, which is not pure SQL but available with
some drivers like npgsql and jpgdbc, Citus entered a code path where a
plan is not defined, while trying to increase its cost. Thus SIG11 when
plan is a NULL pointer.
Fix by only increasing plan cost when plan is not null.
However, it is a bit suspicious to get here with a NULL plan and maybe a
better change will be to not call
ShardPlacementForFunctionColocatedWithDistTable() with a NULL plan at
all (in call.c:134)
bug hit with for example:
```
CallableStatement proc = con.prepareCall("{CALL p(?)}");
proc.registerOutParameter(1, java.sql.Types.BIGINT);
proc.setInt(1, -100);
proc.execute();
```
where `p(bigint)` is a distributed "function" and the param the
distribution key (also in a distributed table), see #7242 for details
Fixes#7242
One of our most flaky and most anoying tests is
multi_cluster_management. It usually fails like this:
```diff
SELECT citus_disable_node('localhost', :worker_2_port);
citus_disable_node
--------------------
(1 row)
SELECT public.wait_until_metadata_sync(60000);
+WARNING: waiting for metadata sync timed out
wait_until_metadata_sync
--------------------------
(1 row)
```
This tries to address that by hardening wait_until_metadata_sync. I
believe the reason for this warning is that there is a race condition in
wait_until_metadata_sync. It's possible for the pre-check to fail, then
have the maintenance daemon send a notification. And only then have the
backend start to listen. I tried to fix it in two ways:
1. First run LISTEN, and only then read do the pre-check.
2. If we time out, check again just to make sure that we did not miss
the notification somehow. And don't show a warning if all metadata is
synced after the timeout.
It's hard to know for sure that this fixes it because the test is not
repeatable and I could not reproduce it locally. Let's just hope for the
best.
---------
Co-authored-by: Onur Tirtir <onurcantirtir@gmail.com>
I just enhanced the existing code to check if the relation is an index
belonging to a distributed table.
If so the shardId is appended to relation (index) name and the *_size
function are executed as before.
There is a change in an extern function:
`extern StringInfo GenerateSizeQueryOnMultiplePlacements(...)`
It's possible to create a new function and deprecate this one later if
compatibility is an issue.
Fixes https://github.com/citusdata/citus/issues/6496.
DESCRIPTION: Allows using Citus size functions on distributed tables
indexes.
---------
Co-authored-by: Onur Tirtir <onurcantirtir@gmail.com>
DESCRIPTION: This change starts a maintenance deamon at the time of
server start if there is a designated main database.
This is the code flow:
1. User designates a main database:
`ALTER SYSTEM SET citus.main_db = "myadmindb";`
2. When postmaster starts, in _PG_Init, citus calls
`InitializeMaintenanceDaemonForMainDb`
This function registers a background worker to run
`CitusMaintenanceDaemonMain `with `databaseOid = 0 `
3. `CitusMaintenanceDaemonMain ` takes some special actions when
databaseOid is 0:
- Gets the citus.main_db value.
- Connects to the citus.main_db
- Now the `MyDatabaseId `is available, creates a hash entry for it.
- Then follows the same control flow as for a regular db,
HasDistributionKey & HasDistributionKeyCacheEntry returns true when the
corresponding table has a distribution key, the comments state the
opposite,
which should be fixed.
Signed-off-by: Zhao Junwang <zhjwpku@gmail.com>
Co-authored-by: Onur Tirtir <onurcantirtir@gmail.com>
DESCRIPTION: Send keepalive messages during the logical replication
phase of large shard splits to avoid timeouts.
During the logical replication part of the shard split process, split
decoder filters out the wal records produced by the initial copy. If the
number of wal records is big, then split decoder ends up processing for
a long time before sending out any wal records through pgoutput. Hence
the wal receiver may time out and restarts repeatedly causing our split
driver code catch up logic to fail.
Notes:
1. If the wal_receiver_timeout is set to a very small number e.g. 600ms,
it may time out before receiving the keepalives. My tests show that this
code works best when the` wal_receiver_timeout `is set to 1minute, which
is the default value.
2. Once a logical replication worker time outs, a new one gets launched.
The new logical replication worker sets the pg_stat_subscription columns
to initial values. E.g. the latest_end_lsn is set to 0. Our driver logic
in `WaitForGroupedLogicalRepTargetsToCatchUp` can not handle LSN value
to go back. This is the main reason for it to get stuck in the infinite
loop.
DESCRIPTION: Fix leaking of memory and memory contexts in Foreign
Constraint Graphs
Previously, every time we (re)created the Foreign Constraint
Relationship Graph, we created a new Memory Context while loosing a
reference to the previous context. This old context could still have
left over memory in there causing a memory leak.
With this patch we statically have one memory context that we lazily
initialize the first time we create our foreign constraint relationship
graph. On every subsequent creation, beside destroying our previous
hashmap we also reset our memory context to remove any left over
references.
This commit aims to add a comprehensive guide that covers all essential
aspects of Citus, including planning, execution, locking mechanisms,
shard moves, 2PC, and many other major components of Citus.
Co-authored-by: Marco Slot <marco.slot@gmail.com>
DESCRIPTION: Shard moves/isolate report LSN's in lsn format
While investigating an issue with our catchup mechanism on certain
postgres versions we noticed we print LSN's in the format of the native
long type. This is an uncommon representation for LSN's in postgres
logs.
This patch changes the output of our log message to go from the long
type representation to the native LSN type representation. Making it
easier for postgres users to recognize and compare LSN's with other
related reports.
example of new output:
```
2023-09-25 17:28:47.544 CEST [11345] LOG: The LSN of the target subscriptions on node localhost:9701 have increased from 0/0 to 0/E1ED20F8 at 2023-09-25 17:28:47.544165+02 where the source LSN is 1/415DCAD0
```
When cdc got added the makefiles hardcoded the `.so` extension instead
of using the platform specifc `$(DLSUFFIX)` variable used by `pgxs.mk`.
Also don't remove installed cdc artifacts on `make clean`.
DESCRIPTION: Adds support for ALTER DATABASE <db_name> SET .. statement
propagation
SET statements in Postgres has a common structure which is already being
used in Alter Function
statement.
In this PR, I added a util file; citus_setutils and made it usable for
both for
alter database<db_name>set .. and alter function ... set ... statements.
With this PR, below statements will be propagated
```sql
ALTER DATABASE name SET configuration_parameter { TO | = } { value | DEFAULT }
ALTER DATABASE name SET configuration_parameter FROM CURRENT
ALTER DATABASE name RESET configuration_parameter
ALTER DATABASE name RESET ALL
```
Additionally, there was a bug in processing float values in the common
code block.
I fixed this one as well
Previous
```C
case T_Float:
{
appendStringInfo(buf, " %s", strVal(value));
break;
}
```
Now
```C
case T_Float:
{
appendStringInfo(buf, " %s", nodeToString(value));
break;
}
```
DESCRIPTION: Adds ALTER DATABASE WITH ... and REFRESH COLLATION VERSION
support
This PR adds supports for basic ALTER DATABASE statements propagation
support. Below statements are supported:
ALTER DATABASE <database_name> with IS_TEMPLATE <true/false>;
ALTER DATABASE <database_name> with CONNECTION LIMIT <integer_value>;
ALTER DATABASE <database_name> REFRESH COLLATION VERSION;
---------
Co-authored-by: Jelte Fennema-Nio <jelte.fennema@microsoft.com>
We currently don't support propagating these options in Citus
Relevant PG commits:
https://github.com/postgres/postgres/commit/e3ce2dehttps://github.com/postgres/postgres/commit/3d14e17
Limitation:
We also need to take care of generated GRANT statements by dependencies
in attempt to distribute something else. Specifically, this part of the
code in `GenerateGrantRoleStmtsOfRole`:
```
grantRoleStmt->admin_opt = membership->admin_option;
```
In PG16, membership also has `inherit_option` and `set_option` which
need to properly be part of the `grantRoleStmt`. We can skip for now
since #7164 will take care of this soon, and also this is not an
expected use-case.
Add citus_schema_move() that can be used to move tenant tables within a distributed
schema to another node. The function has two variations as simple wrappers around
citus_move_shard_placement() and citus_move_shard_placement_with_nodeid() respectively.
They pick a shard that belongs to the given tenant schema and resolve the source node
that contain the shards under given tenant schema. Hence their signatures are quite
similar to underlying functions:
```sql
-- citus_schema_move(), using target node name and node port
CREATE OR REPLACE FUNCTION pg_catalog.citus_schema_move(
schema_id regnamespace,
target_node_name text,
target_node_port integer,
shard_transfer_mode citus.shard_transfer_mode default 'auto')
RETURNS void
LANGUAGE C STRICT
AS 'MODULE_PATHNAME', $$citus_schema_move$$;
-- citus_schema_move(), using target node id
CREATE OR REPLACE FUNCTION pg_catalog.citus_schema_move(
schema_id regnamespace,
target_node_id integer,
shard_transfer_mode citus.shard_transfer_mode default 'auto')
RETURNS void
LANGUAGE C STRICT
AS 'MODULE_PATHNAME', $$citus_schema_move_with_nodeid$$;
```
Since in PG16, truncate triggers are supported on foreign tables, we add
the citus_truncate_trigger to Citus foreign tables as well, such that the TRUNCATE
command is propagated to the table's single local shard as well.
Note that TRUNCATE command was working for foreign tables even before this
commit: see https://github.com/citusdata/citus/pull/7170#issuecomment-1706240593 for details
This commit also adds tests with user-enabled truncate triggers on Citus foreign tables:
both trigger on the shell table and on its single foreign local shard.
Relevant PG commit:
https://github.com/postgres/postgres/commit/3b00a94
**Problem:**
Previously we always used an outside superuser connection to overcome
permission issues for the current user while propagating dependencies.
That has mainly 2 problems:
1. Visibility issues during dependency propagation, (metadata connection
propagates some objects like a schema, and outside transaction does not
see it and tries to create it again)
2. Security issues (it is preferrable to use current user's connection
instead of extension superuser)
**Solution (high level):**
Now, we try to make a smarter decision on whether should we use an
outside superuser connection or current user's metadata connection. We
prefer using current user's connection if any of the objects, which is
already propagated in the current transaction, is a dependency for a
target object. We do that since we assume if current user has
permissions to create the dependency, then it can most probably
propagate the target as well.
Our assumption is expected to hold most of the times but it can still be
wrong. In those cases, transaction would fail and user should set the
GUC `citus.create_object_propagation` to `deferred` to work around it.
**Solution:**
1. We track all objects propagated in the current transaction (we can
handle subtransactions),
2. We propagate dependencies via the current user's metadata connection
if any dependency is created in the current transaction to address
issues listed above. Otherwise, we still use an outside superuser
connection.
DESCRIPTION: Fixes some object propagation errors seen with transaction
blocks.
Fixes https://github.com/citusdata/citus/issues/6614
---------
Co-authored-by: Nils Dijk <nils@citusdata.com>
For a database that does not create the citus extension by running
` CREATE EXTENSION citus;`
`CitusHasBeenLoaded ` function ends up querying the `pg_extension` table
every time it is invoked. This is not an ideal situation for a such a
database.
The idea in this PR is as follows:
### A new field in MetadataCache.
Add a new variable `extensionCreatedState `of the following type:
```
typedef enum ExtensionCreatedState
{
UNKNOWN = 0,
CREATED = 1,
NOTCREATED = 2,
} ExtensionCreatedState;
```
When the MetadataCache is invalidated, `ExtensionCreatedState` will be
set to UNKNOWN.
### Invalidate MetadataCache when CREATE/DROP/ALTER EXTENSION citus
commands are run.
- Register a callback function, named
`InvalidateDistRelationCacheCallback`, for relcache invalidation during
the shared library initialization for `citus.so`. This callback function
is invoked in all the backends whenever the relcache is invalidated in
one of the backends. (This could be caused many DDLs operations).
- In the cache invalidation callback,`
InvalidateDistRelationCacheCallback`, invalidate `MetadataCache` zeroing
it out.
- In `CitusHasBeenLoaded`, perform the costly citus is loaded check only
if the `MetadataCache` is not valid.
### Downsides
Any relcache invalidation (caused by various DDL operations) will case
Citus MetadataCache to get invalidated. Most of the time it will be
unnecessary. But we rely on that DDL operations on relations will not be
too frequent.
When braking a colocation, we need to create a new colocation group
record in pg_dist_colocation for the relation. It is not sufficient to
have a new colocationid value in pg_dist_partition only.
This patch also fixes a bug when deleting a colocation group if no
tables are left in it. Previously we passed a relation id as a parameter
to DeleteColocationGroupIfNoTablesBelong function, where we should have
passed a colocation id.
DESCRIPTION: Presenting citus_pause_node UDF enabling pausing by
node_id.
citus_pause_node takes a node_id parameter and fetches all the shards in
that node and puts AccessExclusiveLock on all the shards inside that
node. With this lock, insert is disabled, until citus_pause_node
transaction is closed.
---------
Co-authored-by: Hanefi Onaldi <Hanefi.Onaldi@microsoft.com>
If we're in the middle of a table type conversion (such as from Citus
local table to a reference table), the table might not have all the
placements that we expect from the table type. For this reason, we
should intersect the placements of tables at hand when creating
inter-shard ddl tasks.
What we do to collect foreign key constraint commands in
WorkerCreateShardCommandList is quite similar to what we do in
CopyShardForeignConstraintCommandList. Plus, the code that we used
in WorkerCreateShardCommandList before was not able to properly handle
foreign key constraints between Citus local tables --when creating a
reference table from the referencing one.
With a few slight modifications made to
CopyShardForeignConstraintCommandList, we can use the same logic in
WorkerCreateShardCommandList too.
DESCRIPTION: Adds grant/revoke propagation support for database
privileges
Following the implementation of support for granting and revoking
database privileges, certain tests that issued grants for worker nodes
experienced failures. These ones are fixed in this PR as well.
DESCRIPTION: Adds PG16Beta3 support
This is the final commit that adds
PG16 compatibility with Citus's current features.
You can use Citus community with PG16Beta3. This commit:
- Enables PG16 in the configure script.
- Adds PG16 tests to CI using test images that have 16beta3
- Skips wal2json cdc test since wal2json package is not available for PG16 yet
- Fixes an isolation test
Several PG16 Compatibility commits have been merged before this final one.
All these subtasks are done https://github.com/citusdata/citus/issues/7017
See the list below:
1 - 42d956888d
Resolve compilation issues
2 - 0d503dd5ac
Ruleutils and successful CREATE EXTENSION
3 - 907d72e60d
Some test outputs
4 - 7c6b4ce103
Outer join checks, subscription password, crash fixes
5 - 6056cb2c29
get_relation_info hook to avoid crash from adjusted partitioning
6 - b36c431abb
Rework PlannedStmt and Query's Permission Info
7 - ee3153fe50
More test output fixes
8 - 2c50b5f7ff
varnullingrels additions
9 - b2291374b4
More test output fixes
10- a2315fdc67
New options to vacuum and analyze
11- 9fa72545e2
Fix AM dependency and grant's admin option
12- 2d6cf8e79a
One more outer join check
Stay tuned for PG16 new features in Citus :)
PG16 compatibility - part 11
Check out part 1 42d956888d
part 2 0d503dd5ac
part 3 907d72e60d
part 4 7c6b4ce103
part 5 6056cb2c29
part 6 b36c431abb
part 7 ee3153fe50
part 8 2c50b5f7ff
part 9 b2291374b4
part 10 a2315fdc67
part 11 9fa72545e2
This commit is in the series of PG16 compatibility commits.
We already took care of the majority of necessary outer join checks
in part 4 7c6b4ce103
However, In RelationInfoContainsOnlyRecurringTuples,
we need to add one more check of whether we are dealing
with an outer join RTE using IsRelOptOuterJoin function.
This prevents an outer join crash in sqlancer_failures.sql test.
We expect one more commit of PG compatibility with Citus's current
features are regression tests sanity.
PG16 compatibility - part 11
Check out part 1 42d956888d
part 2 0d503dd5ac
part 3 907d72e60d
part 4 7c6b4ce103
part 5 6056cb2c29
part 6 b36c431abb
part 7 ee3153fe50
part 8 2c50b5f7ff
part 9 b2291374b4
part 10 a2315fdc67
This commit is in the series of PG16 compatibility commits. It fixes
AM dependency and grant's admin option:
- Fix with admin option in grants
grantstmt->admin_opt no longer exists in PG16
instead, grantstmt has a list of options, one of them is admin option.
Relevant PG commit:
e3ce2de09d
e3ce2de09d814f8770b2e3b3c152b7671bcdb83f
- Fix pg_depend entry to AMs after ALTER TABLE .. SET ACCESS METHOD
Relevant PG commit:
97d8910104
97d89101045fac8cb36f4ef6c08526ea0841a596
More PG16 compatibility commits are coming soon:
We are very close to merging "PG16Beta3 Support - Regression tests sanity"
PG16 compatibility - part 10
Check out part 1 42d956888d
part 2 0d503dd5ac
part 3 907d72e60d
part 4 7c6b4ce103
part 5 6056cb2c29
part 6 b36c431abb
part 7 ee3153fe50
part 8 2c50b5f7ff
part 9 b2291374b4
This commit is in the series of PG16 compatibility commits. It:
- Adds buffer_usage_limit to vacuum and analyze
- Adds process_main, skip_database_stats, only_database_stats to vacuum
Important Note: adding these options is actually required for check-vanilla tests to succeed.
However, in concept, this PR belongs to "PG16 new features",
rather than "PG16 regression tests sanity"
Relevant PG commits:
1cbbee0338
1cbbee03385763b066ae3961fc61f2cd01a0d0d7
4211fbd841
4211fbd8413b26e0abedbe4338aa7cda2cd469b4
a46a7011b2
a46a7011b27188af526047a111969f257aaf4db8
More PG16 compatibility commits are coming soon ...
PG16 compatibility - part 7
Check out part 1 42d956888d
part 2 0d503dd5ac
part 3 907d72e60d
part 4 7c6b4ce103
part 5 6056cb2c29
part 6 b36c431abb
part 7 ee3153fe50
This commit is in the series of PG16 compatibility commits. PG16 introduced a new entry
varnnullingrels to Var, which represents our partkey in pg_dist_partition.
This commit does the necessary changes in Citus to support this.
Relevant PG commit:
2489d76c49
2489d76c4906f4461a364ca8ad7e0751ead8aa0d
More PG16 compatibility commits are coming soon ...
PG16 compatibility - Part 6
Check out part 1 42d956888d
part 2 0d503dd5ac
part 3 907d72e60d
part 4 7c6b4ce103
part 5 6056cb2c29
This commit is in the series of PG16 compatibility commits.
It handles the Permission Info changes in PG16. See below:
The main issue lies in the following entries of PlannedStmt: {
rtable
permInfos
}
Each rtable has an int perminfoindex, and its actual permission info is
obtained through the following:
permInfos[perminfoindex]
We had crashes because perminfoindexes were not updated in the finalized
planned statement after distributed planner hook.
So, basically, everywhere we set a query's or planned statement's rtable
entry, we need to set the rteperminfos/permInfos accordingly.
Relevant PG commits:
a61b1f7482
a61b1f74823c9c4f79c95226a461f1e7a367764b
b803b7d132
b803b7d132e3505ab77c29acf91f3d1caa298f95
More PG16 compatibility commits are coming soon ...
PG16 compatibility - Part 5
Check out part 1 42d956888d
part 2 0d503dd5ac
part 3 907d72e60d
part 4 7c6b4ce103
This commit is in the series of PG16 compatibility commits. Find the explanation below:
If we allow to adjust partitioning, we get a crash when accessing
amcostestimate of partitioned indexes, because amcostestimate is NULL
for them. The following PG commit is the culprit:
3c569049b7
3c569049b7b502bb4952483d19ce622ff0af5fd6
Previously, partitioned indexes would just be ignored.
Now, they are added in the list. However get_relation_info expects the
tables which have partitioned indexes to have the inh flag set properly.
AdjustPartitioningForDistributedPlanning plays with that flag, hence we
don't get the desired behaviour.
The hook is simply removing all partitioned indexes from the list.
More PG16 compatibility commits are coming soon ...
PG16 compatibility - Part 4
Check out part 1 42d956888d
part 2 0d503dd5ac
part 3 907d72e60d
This commit is in the series of PG16 compatibility commits.
It adds some outer join checks to the planner,
the new password_required option to the subscription,
and a crash fix related to PGIOAlignedBlock, see below for more details:
- Fix PGIOAlignedBlock Assert crash in PG16
Relevant PG commit:
faeedbcefd
faeedbcefd40bfdf314e048c425b6d9208896d90
- Pass planner info as argument to make_simple_restrictinfo
Pre PG16 passing plannerInfo to make_simple_restrictinfo
was only needed for placeholder Vars, which is not the case
in this part of the codebase because we are building the
expression from shard intervals which don't have placeholder
vars.
However, PG16 is counting baserels appearing in clause_relids
and is deleting the rels mentioned in plannerinfo->outer_join_rels
Hence directly accessing plannerinfo.
We will crash if we leave it as NULL.
For reference
2489d76c49 (diff-e045c41eda9686451a7993e91518e40056b3739365e39eb1b70ae438dc1f7c76R207)
Relevant PG commit:
2489d76c49
2489d76c4906f4461a364ca8ad7e0751ead8aa0d
- Add outer join checks, root->simple_rel_array
- fix rebalancer to include passwork_required option
Relevant PG commit:
c3afe8cf5a
c3afe8cf5a1e465bd71e48e4bc717f5bfdc7a7d6
More PG16 compatibility commits are coming soon ...
Similar to https://github.com/citusdata/citus/pull/7077.
As PG 16+ has changed the join restriction information for certain outer
joins, MERGE is also impacted given that is is also underlying an outer
join.
See #7077 for the details.
PG16 compatibility - Part 2
Part 1 provided successful compilation against pg16beta2.
42d956888d
This PR provides ruleutils changes with pg16beta2 and successful CREATE EXTENSION command.
Note that more changes are needed in order to have successful regression tests.
More commits are coming soon ...
For any_value changes, I referred to this commit
8ef94dc1f5
where we did something similar for PG14 support.
Prior to this commit, the code would skip processing the
errors happened for local commands.
Prior to https://github.com/citusdata/citus/pull/5379, it might
make sense to allow the execution continue. But, as of today,
if a modification fails on any placement, we can safely fail
the execution.
The first commit show the problem in action. The second commit
includes the fix and the test fixes.
Tradionally our planner works in the following order:
router - > pushdown -> repartition -> pull to coordinator
However, for INSERT .. SELECT commands, we did not support "router".
In practice, that is not a big issue, because pushdown planning can
handle router case as well.
However, with PG 16, certain outer joins are converted to JOIN without
any conditions (e.g., JOIN .. ON (true)) and the filters are pushed down
to the tables.
When the filters are pushed down to the tables, router planner can
detect. However, pushdown planner relies on JOIN conditions.
An example query:
```
INSERT INTO agg_events (user_id)
SELECT raw_events_first.user_id
FROM raw_events_first LEFT JOIN raw_events_second
ON raw_events_first.user_id = raw_events_second.user_id
WHERE raw_events_first.user_id = 10;
```
As a side effect of this change, now we can also relax certain
limitation that "pushdown" planner emposes, but not "router". So, with
this PR, we also allow those.
Closes https://github.com/citusdata/citus/pull/6772
DESCRIPTION: Prevents unnecessarily pulling the data into coordinator
for some INSERT .. SELECT queries that target a single-shard group
and the expression originating from the source. If the types are different, Citus uses
different hash functions for the two column types, which might lead to incorrect repartitioning
of the result data
Previously, we only checked whether the relations are colocated, but we
ignore the shard indexes. That causes certain queries still to be
accidentally router. We should enforce colocation checks for both shard
index and table colocation id to make the check restrictive enough.
For example, the following query should not be router, and after this
patch, it won't:
```SQL
SELECT
user_id
FROM
((SELECT user_id FROM raw_events_first WHERE user_id = 15) EXCEPT
(SELECT user_id FROM raw_events_second where user_id = 17)) as foo;
```
DESCRIPTION: Enforce shard level colocation with
citus.enable_non_colocated_router_query_pushdown
This PR provides successful compilation against PG16Beta2. It does some
necessary refactoring to prepare for full support of version 16, in
https://github.com/citusdata/citus/pull/6952 .
Change RelFileNode to RelFileNumber or RelFileLocator
Relevant PG commit
b0a55e43299c4ea2a9a8c757f9c26352407d0ccc
new header for varatt.h
Relevant PG commit:
d952373a987bad331c0e499463159dd142ced1ef
drop support for Abs, use fabs
Relevant PG commit
357cfefb09115292cfb98d504199e6df8201c957
tuplesort PGcommit: d37aa3d35832afde94e100c4d2a9618b3eb76472
Relevant PG commit:
d37aa3d35832afde94e100c4d2a9618b3eb76472
Fix vacuum in columnar
Relevant PG commit:
4ce3afb82ecfbf64d4f6247e725004e1da30f47c
older one:
b6074846cebc33d752f1d9a66e5a9932f21ad177
Add alloc_flags to pg_clean_ascii
Relevant PG commit:
45b1a67a0fcb3f1588df596431871de4c93cb76f
Merge GetNumConfigOptions() into get_guc_variables()
Relevant PG commit:
3057465acfbea2f3dd7a914a1478064022c6eecd
Minor PG refactor PG_FUNCNAME_MACRO __func__
Relevant PG commit
320f92b744b44f961e5d56f5f21de003e8027a7f
Pass NULL context to stringToQualifiedNameList, typeStringToTypeName
The pre-PG16 error behaviour for the following
stringToQualifiedNameList & typeStringToTypeName
was ereport(ERROR, ...)
Now with PG16 we have this context input. We preserve the same behaviour
by passing a NULL context, because of the following:
(copy paste comment from PG16)
If "context" isn't an ErrorSaveContext node, this behaves as
errstart(ERROR, domain), and the errsave() macro ends up acting
exactly like ereport(ERROR, ...).
Relevant PG commit
858e776c84f48841e7e16fba7b690b76e54f3675
Use RangeVarCallbackMaintainsTable instead of RangeVarCallbackOwnsTable
Relevant PG commit:
60684dd834a222fefedd49b19d1f0a6189c1632e
FIX THIS: Not implemented grant-level control of role inheritance
see PG commit
e3ce2de09d814f8770b2e3b3c152b7671bcdb83f
Make Scan node abstract
PG commit:
8c73c11a0d39049de2c1f400d8765a0eb21f5228
Change in Var representations, get_relids_in_jointree
PG commit
2489d76c4906f4461a364ca8ad7e0751ead8aa0d
Deadlock detection changes because SHM_QUEUE is removed
Relevant PG Commit:
d137cb52cb7fd44a3f24f3c750fbf7924a4e9532
TU_UpdateIndexes
Relevant PG commit
19d8e2308bc51ec4ab993ce90077342c915dd116
Use object_ownercheck and object_aclcheck functions
Relevant PG commits:
afbfc02983f86c4d71825efa6befd547fe81a926
c727f511bd7bf3c58063737bcf7a8f331346f253
Rework Permission Info for successful compilation
Relevant PG commits:
postgres/postgres@a61b1f7postgres/postgres@b803b7d
---------
Co-authored-by: onderkalaci <onderkalaci@gmail.com>
Index scans in PG16 return empty sets because of extra compatibility
enforcement for `ScanKeyInit` arguments.
Could be one of the relevant PG commits:
c8b2ef05f4
This PR fixes all incompatible `RegProcedure` and `Datum` arguments in
all `ScanKeyInit` functions used throughout the codebase.
Helpful for https://github.com/citusdata/citus/pull/6952