DESCRIPTION: Makes sure to disallow triggers that depend on extensions
We were already doing so for `ALTER trigger DEPENDS ON EXTENSION`
commands. However, we also need to disallow creating Citus tables
having such triggers already, so this PR fixes that.
DESCRIPTION: Allow citus_update_node() to work with nodes from different clusters
citus_update_node(), citus_nodename_for_nodeid(), and citus_nodeport_for_nodeid() functions only checked for nodes in their own clusters and hence last two returned NULLs and the first one showed an error is the nodeId was from a different cluster.
Fixes https://github.com/citusdata/citus/issues/6433
increasing logical clock. Clock guarantees to never go back in value after restarts,
and makes best attempt to keep the value close to unix epoch time in milliseconds.
Also, introduces a new GUC "citus.enable_cluster_clock", when true, every
distributed transaction is stamped with logical causal clock and persisted
in a catalog pg_dist_commit_transaction.
DESCRIPTION: Drops GUC defer_drop_after_shard_split
DESCRIPTION: Drops GUC defer_drop_after_shard_move
Drop GUCs and related parts from the code.
Delete tests that specifically added for the GUCs.
Keep tests that can be used without the GUCs.
Update test output changes.
The motivation for this PR is to have an "always deferring" mechanism.
These two GUCs provide an option to not deferring dropping objects
during a shard move/split, and dropping them immediately. With this PR,
we will be always deferring dropping orphaned shards and other types of
objects.
We will have a separate PR to extend the deferred cleanup operation, so
that we would create records for deferred drop, for Subscriptions,
Publications, Replication Slots etc. This will make us be able to keep
track of created objects that needs to be dropped, during a shard
move/split. We will have objects created specifically for the current
operation; and those objects will be dropped at the end.
We have an issue (a draft roadmap) for enabling parallel shard moves.
For details please see: https://github.com/citusdata/citus/issues/6437
Sometimes in CI our failure_split_cleanup test would fail like this:
```diff
CALL pg_catalog.citus_cleanup_orphaned_resources();
-NOTICE: cleaned up 79 orphaned resources
+NOTICE: cleaned up 82 orphaned resources
SELECT operation_id, object_type, object_name, node_group_id, policy_type
```
Source:
https://app.circleci.com/pipelines/github/citusdata/citus/28107/workflows/4ec712c9-98b5-4e90-9806-e02a37d71679/jobs/846107
The reason was that previous tests in the schedule would also create
some orphaned resources. Sometimes some of those would already be
cleaned up by the maintenance daemon, resulting in a different number of
cleaned up resources than expected. This cleans up any previously
created resources at the start of the test without logging how many
exactly were cleaned up. As a bonus this now also allows running this
test using check-failure-base.
DESCRIPTION: Don't leak search_path to workers on DDL
For DDL we have to set the `search_path` on workers to the same as on
the coordinator for some DDL to work. Previously this search_path would
leak outside of the transaction that was used for the DDL. This fixes
that by using `SET LOCAL` instead of `SET`. The only place where we
still use plain `SET` is for DDL commands that are not allowed within
transactions, such as `CREATE INDEX CONCURRENLTY`.
This fixes this flaky test:
```diff
CONTEXT: SQL statement "SELECT change_id FROM distributed_triggers.data_changes
WHERE shard_key_value = NEW.shard_key_value AND object_id = NEW.object_id
ORDER BY change_id DESC LIMIT 1"
-PL/pgSQL function record_change() line XX at SQL statement
+PL/pgSQL function distributed_triggers.record_change() line 17 at SQL statement
while executing command on localhost:57638
DELETE FROM data_ref_table where shard_key_value = 'hello';
```
Source:
https://app.circleci.com/pipelines/github/citusdata/citus/27849/workflows/75ae5f1a-100b-4b7a-b991-7de069f39ee1/jobs/831429
I had tried to fix this flaky test in #5894 and then I tried
implementing a better fix in #5896, where @marcocitus suggested this
better fix. This change reverts the fix from #5894 and implements the
fix suggested by Marco.
Our multi_mx_alter_distributed_table test actually depended on the old
buggy search_path leaking behavior. After fixing the bug that test would
fail like this:
```diff
CALL proc_0(1.0);
DEBUG: pushing down the procedure
-NOTICE: Res: 3
-DETAIL: from localhost:xxxxx
+ERROR: relation "test_proc_colocation_0" does not exist
+CONTEXT: PL/pgSQL function mx_alter_distributed_table.proc_0(double precision) line 5 at SQL statement
+while executing command on localhost:57637
RESET client_min_messages;
```
I fixed this test by fully qualifying the table names used in the
procedure. I think it's quite unlikely that actual users depend
on this behavior though. Since it would require first doing
DDL before calling a procedure in a session where the
search_path was changed after connecting.
DESCRIPTION: Adds failure test for shard move
DESCRIPTION: Remove function `WaitForAllSubscriptionsToBecomeReady` and
related tests
Adding some failure tests for shard moves.
Dropping the not-needed-anymore function
`WaitForAllSubscriptionsToBecomeReady`, as the subscriptions now start
as ready from the beginning because we don't use logical replication
table sync workers anymore.
fixes: #6260
In CI shard_rebalancer sometimes fails with this error:
```diff
SET citus.node_connection_timeout to 60;
BEGIN;
SET LOCAL citus.shard_replication_factor TO 2;
SET citus.log_remote_commands TO ON;
SET SESSION citus.max_adaptive_executor_pool_size TO 5;
SELECT replicate_table_shards('dist_table_test_2', max_shard_copies := 4, shard_transfer_mode:='block_writes');
+WARNING: could not establish connection after 60 ms
```
Source
https://app.circleci.com/pipelines/github/citusdata/citus/28128/workflows/38eeacc4-4191-4366-87ed-9a628414965a/jobs/847458?invite=true#step-107-21
This PR avoids this issue by increasing
```citus.node_connection_timeout``` to 35s.
Deparser function set_relation_column_names() knows that it needs to
re-evaluate column names based on relation's tuple descriptor when
the rte belongs to a relation (RTE_RELATION).
However before this commit, it didn't know about the fact that citus
might wrap such an rte with an rte that points to
citus_extradata_container() placeholder.
And because of this, it was simply taking the column aliases
(e.g., "bar" in "foo AS bar") into the account and this might result in
an incorrectly deparsed query as in below case:
* Say, if we had view based on following query:
```sql
SELECT a FROM table;
```
* And if we rename column "a" to "b", the view query normally becomes:
```sql
SELECT b AS a FROM table;
```
* So before this commit, deparsing a query based on that view was
resulting in such a query due to deparsing based on the column aliases,
which is not correct:
```sql
SELECT a FROM table;
```
Fixes#5932.
DESCRIPTION: Fixes a bug that might cause failing to query the views
based on tables that have renamed columns
PostgreSQL 15 exposes WL_SOCKET_CLOSED in WaitEventSet API, which is
useful for detecting closed remote sockets. In this patch, we use this
new event and try to detect closed remote sockets in the executor.
When a closed socket is detected, the executor now has the ability to
retry the connection establishment. Note that, the executor can retry
connection establishments only for the connection that has not been
used. Basically, this patch is mostly useful for preventing the executor
to fail if a cached connection is closed because of the worker node
restart (or worker failover).
In other words, the executor cannot retry connection establishment if we
are in a distributed transaction AND any command has been sent over the
connection. That requires more sophisticated retry mechanisms. For now,
fixing the above use case is enough.
Fixes#5538
Earlier discussions: #5908, #6259 and #6283
### Summary of the current approach regards to earlier trials
As noted, we explored some alternatives before getting into this.
https://github.com/citusdata/citus/pull/6283 is simple, but lacks an
important property. We should be checking for `WL_SOCKET_CLOSED`
_before_ sending anything over the wire. Otherwise, it becomes very
tricky to understand which connection is actually safe to retry. For
example, in the current patch, we can safely check
`transaction->transactionState == REMOTE_TRANS_NOT_STARTED` before
restarting a connection.
#6259 does what we intent here (e.g., check for sending any command).
However, as @marcocitus noted, it is very tricky to handle
`WaitEventSets` in multiple places. And, the executor is designed such
that it reacts to the events. So, adding anything `pre-executor` seemed
too ugly.
In the end, I converged into this patch. This patch relies on the
simplicity of #6283 and also does a very limited handling of
`WaitEventSets`, just for our purpose. Just before we add any connection
to the execution, we check if the remote session has already closed.
With that, we do a brief interaction of multiple wait event processing,
but with different purposes. The new wait event processing we added does
not even consider cancellations. We let that handled by the main event
processing loop.
Co-authored-by: Marco Slot <marco.slot@gmail.com>
If an operation requires having coordinator in pg_dist_node and if that
is not the case, then we automatically add the coordinator into
pg_dist_node if user didn't add any worker nodes yet.
However, if user have already added some worker nodes before, we throw
an error. With this commit, we improve the error thrown in that case.
Closes#6423 based on the discussion made there.
DESCRIPTION: Fix bug in global PID assignment for rebalancer
sub-connections
In CI our isolation_shard_rebalancer_progress test would sometimes fail
like this:
```diff
+isolationtester: canceling step s1-rebalance-c1-block-writes after 60 seconds
step s1-rebalance-c1-block-writes:
SELECT rebalance_table_shards('colocated1', shard_transfer_mode:='block_writes');
- <waiting ...>
+
+ERROR: canceling statement due to user request
step s7-get-progress:
```
Source:
https://app.circleci.com/pipelines/github/citusdata/citus/27855/workflows/2a7e335a-f3e8-46ed-b6bd-6920d42f7214/jobs/831710
It turned out this was an actual bug in the way our assigning of global
PIDs interacts with the way we connect to ourselves as the shard
rebalancer. The first command the shard rebalancer sends is a SET
ommand to change the application_name to `citus_rebalancer`. If
`StartupCitusBackend` is called after this command is processed, then it
overwrites the global PID that was extracted from the previous
application_name. This makes sure that we don't do that, and continue to
use the original global PID. While it might seem that we only call
`StartupCitusBackend` once for each query backend, this isn't actually
the case. Whenever pg_dist_partition gets ANALYZEd by autovacuum
we indirectly call `StartupCitusBackend` again, because we invalidate
the cache then.
In passing this fixes two other things as well:
1. It sets `distributedCommandOriginator` correctly in
`AssignGlobalPID`, by using IsExternalClientBackend(). This doesn't
matter much anymore, since AssignGlobalPID effectively becomes a
no-op in this PR for any non-external client backends.
2. It passes the application_name to InitializeBackendData in
StartupCitusBackend, instead of INVALID_CITUS_INTERNAL_BACKEND_GPID
(which effectively got casted to NULL). In practice this doesn't
change the behaviour of the call, since the call is a no-op for every
backend except the maintenance daemon. And the behaviour of the call
is the same for NULL as for the application_name of the maintenance
daemon.
We decrease verbosity level here to avoid the flaky output
https://app.circleci.com/pipelines/github/citusdata/citus/27936/workflows/dc63128a-1570-41a0-8722-08f3e3cfe301/jobs/836153
```diff
select alter_table_set_access_method('ref','heap');
NOTICE: creating a new table for alter_table_set_access_method.ref
NOTICE: moving the data of alter_table_set_access_method.ref
NOTICE: dropping the old alter_table_set_access_method.ref
NOTICE: drop cascades to 2 other objects
-DETAIL: drop cascades to materialized view m_ref
-drop cascades to view v_ref
+DETAIL: drop cascades to view v_ref
+drop cascades to materialized view m_ref
CONTEXT: SQL statement "DROP TABLE alter_table_set_access_method.ref CASCADE"
NOTICE: renaming the new table to alter_table_set_access_method.ref
alter_table_set_access_method
-------------------------------
(1 row)
```
PG15 introduced an optimization on GROUP BY keys that is now reverted on
RC2.
Relevant PG commit:
Revert "Optimize order of GROUP BY keys".
443df6e2db932a7cd6d85ddfb67e11a43345130d
Fixes https://github.com/citusdata/citus/issues/6394.
DESCRIPTION: Fixes a bug that causes creating disabled-triggers on
shards as enabled
Since CREATE TRIGGER doesn't have syntax support to specify
whether the trigger should be enabled/disabled, the underlying
PG function (`pg_get_triggerdef()`) that we use to generate the
command to create the trigger is not enough. For this reason, we
append a second command to enable/disable trigger, right after
creating it.
We don't retain explicit extension dependencies set by using
`ALTER trigger DEPENDS ON EXTENSION` commands too, but apparently
right fix for that is to throw an error as in
`PreprocessAlterTriggerDependsStmt()`; so, opened a separate PR
to fix that #6399.
During alter_distributed_table, we create a new table like the
original table but with the altered options.
To retrieve the name of the distribution column, we were using
the attribute syscache of the new table, since we already created
the new table as identical to the original table.
However, the attribute syscaches of these two tables are not
the same if the original table has dropped columns. The reason
is that dropped columns are all still present in the cache.
Hence, for example, the attnos would be different in the syscaches.
So, let's use the attribute syscache of the original table.
DESCRIPTION: Fixes a bug that prevents retaining columnar table options after a table-rewrite
A fix for this issue: Columnar: options ignored during ALTER TABLE
rewrite #5927
The OID for the temporary table created during ALTER TABLE was not the
same as the original table's OID so the columnar options were not being
applied during rewrite.
The change is that I applied the original table's columnar options to
the new table so that it has the correct options during write. I also
added a test.
This test used to contain some utility commands that Citus did not
support. However we added support for most of the commands, and this
test got outdated.
We used to error out on community when user attempted to use pooler
options. Now that we open sourced all enterprise features, the test can
now be removed.
PG15 now allows users to specify oids when creating databases. This
feature is a side effect of a bigger feature in pg_upgrade.
Relevant PG Commit:
pg_upgrade: Preserve database OIDs.
aa01051418f10afbdfa781b8dc109615ca785ff9
PG15 has suppressed some casts on constants when querying foreign
tables.
For example, we can use text to represent a type that's an enum on the
remote side.
A comparison on such a column will get shipped as "var = 'foo'::text".
But there's no enum = text operator on the remote side.
If we leave off the explicit cast, the comparison will work.
Test we behave in the same way with a Citus foreign table
Reminder: foreign tables cannot be distributed/reference, can only be
Citus local
Relevant PG commit:
f8abb0f5e1
PostgreSQL 15 had some changes to jsonpath to conform with ECMA-262
referenced by SQL standard. This commit adds tests to make sure Citus
also supports the same standards.
Relevant pg commit:
e26114c817b610424010cfbe91a743f591246ff1
In Split, Logical replication logic and ShardCleaner we call
`SendCommandListToWorkerOutsideTransaction` and
`SendOptionalCommandListToWorkerOutsideTransaction` frequently. This
opens new connection for each of those calls, even though we already
have a perfectly good connection lying around.
This PR adds two new APIs
`SendCommandListToWorkerOutsideTransactionWithConnection` and
`SendOptionalCommandListToWorkerOutsideTransactionWithConnection` that
allow sending a list of queries in a transaction over an existing
connection. We also update the callers (Split, ShardCleaner, Logical
Replication) to use these new APIs instead.
Co-authored-by: Nitish Upreti <niupre@microsoft.com>
Co-authored-by: Onder Kalaci <onderkalaci@gmail.com>
Given that we drop DEFAULT nextval('sequence') expressions from
shard relation columns, allowing `ON DELETE/UPDATE SET DEFAULT`
on such columns might cause inserting NULL values as a result
of a delete/update operation.
For this reason, we disallow ON DELETE/UPDATE SET DEFAULT actions
on columns that default to sequences.
DESCRIPTION: Disallows having ON DELETE/UPDATE SET DEFAULT actions on
columns that default to sequences
Fixes#6339.
As we did for GENERATED STORED columns in #4613, we should not drop
column
default expressions that are not based on sequences from shard relation
since
such expressions need to exist e.g. for foreign key actions.
For the column default expressions that are based on sequences we cannot
do much, so we need to disallow having ON DELETE SET DEFAULT actions on
such columns in a separate PR, see #6339.
Fixes#6318.
DESCRIPTION: Fixes a bug that might cause inserting incorrect DEFAULT
values when applying foreign key actions
PG15 added support for security invoker views. Relevant PG conmit:
7faa5fc84b
These views check the permissions for the underlying tables of the view
invoker user, not the view definer user.
When the view has underlying distributed tables, the queries to the
shards are sent by opening connections with the current user, which is
the view invoker, no matter what the type of the view is. This means
that, for distributed views, they were always behaving like security
invoker views. Check the following issue for more details:
https://github.com/citusdata/citus/issues/6161
So, Citus doesn't fully support security definer views.
However Citus does fully support security invoker views. We add tests to
make sure we cover different cases.
Both tests include pushdown and pull to coordinator type of aggregate
execution.
Relevant PG commits:
Add min() and max() aggregates for xid8
400fc6b6487ddf16aa82c9d76e5cfbe64d94f660
Add range_agg with multirange inputs
7ae1619bc5b1794938c7387a766b8cae34e38d8a
Co-authored-by: Onder Kalaci <onderkalaci@gmail.com>
Comment from the code is clear on this:
/*
* The statistics objects of the distributed table are not relevant
* for the distributed planning, so we can override it.
*
* Normally, we should not need this. However, the combination of
* Postgres commit 269b532aef55a579ae02a3e8e8df14101570dfd9 and
* Citus function AdjustPartitioningForDistributedPlanning()
* forces us to do this. The commit expects statistics objects
* of partitions to have "inh" flag set properly. Whereas, the
* function overrides "inh" flag. To avoid Postgres to throw error,
* we override statlist such that Postgres does not try to process
* any statistics objects during the standard_planner() on the
* coordinator. In the end, we do not need the standard_planner()
* on the coordinator to generate an optimized plan. We call
* into standard_planner() for other purposes, such as generating the
* relationRestrictionContext here.
*
* AdjustPartitioningForDistributedPlanning() is a hack that we use
* to prevent Postgres' standard_planner() to expand all the partitions
* for the distributed planning when a distributed partitioned table
* is queried. It is required for both correctness and performance
* reasons. Although we can eliminate the use of the function for
* the correctness (e.g., make sure that rest of the planner can handle
* partitions), it's performance implication is hard to avoid. Certain
* planning logic of Citus (such as router or query pushdown) relies
* heavily on the relationRestrictionList. If
* AdjustPartitioningForDistributedPlanning() is removed, all the
* partitions show up in the, causing high planning times for
* such queries.
*/
DESCRIPTION: Fixes floating exception during
create_distributed_table_concurrently.
Fixes#6332.
During create_distributed_table_concurrently, when there is no active
primary node, it fails with floating exception. We added similar check
with create_distributed_table. It will fail with proper message if
current active node is less than replication factor.
No need for description, fixing issue introduced with new feature for
11.1
Fixes#6333
Due to Postgres' C api being o-indexed and postgres' attributes being
1-indexed, we were reading the wrong Datum as the Task owner when
cancelling. Here we add a test to show the error and fix the off-by-one
error.
We can now do the following:
- Distribute sequence with logged/unlogged option
- ALTER TABLE my_sequence SET LOGGED/UNLOGGED
- ALTER SEQUENCE my_sequence SET LOGGED/UNLOGGED
Relevant PG commit
344d62fb9a
PG15 introduces `CLUSTER` commands for partitioned tables. Similar to a
`CLUSTER` command with no supplied table names, these commands also can
not be run inside transaction blocks and therefore can not be propagated
in a distributed transaction block with ease. Therefore we raise warnings.
Relevant PG commit: cfdd03f45e6afc632fbe70519250ec19167d6765
DESCRIPTION: Add a rebalancer that uses background tasks for its
execution
Based on the baclground jobs and tasks introduced in #6296 we implement
a new rebalancer on top of the primitives of background execution. This
allows the user to initiate a rebalance and let Citus execute the long
running steps in the background until completion.
Users can invoke the new background rebalancer with `SELECT
citus_rebalance_start();`. It will output information on its job id and
how to track progress. Also it returns its job id for automation
purposes. If you simply want to wait till the rebalance is done you can
use `SELECT citus_rebalance_wait();`
A running rebalance can be canelled/stopped with `SELECT
citus_rebalance_stop();`.