Apparently no-one actually ran the mx_base_schedule, because the tests
in schedule itself were already failing. This updates it to be in line
with multi_mx_schedule again to make the tests pass again. Notably it
doesn't contain multi_mx_node_metadata and multi_extension. Because
those tests take long to run and the were not necessary to make
multi_mx_create_table pass again.
DESCRIPTION: Adds support for creating table constraints UNIQUE and
EXCLUDE via ALTER TABLE command without client having to specify a name.
ALTER TABLE ... ADD CONSTRAINT <conname> UNIQUE ...
ALTER TABLE ... ADD CONSTRAINT <conname> EXCLUDE ...
commands require the client to provide an explicit constraint name.
However, in postgres it is possible for clients not to provide a name
and let the postgres generate it using the following commands
ALTER TABLE ... ADD UNIQUE ...
ALTER TABLE ... ADD EXCLUDE ...
This PR enables the same functionality for citus tables.
DESCRIPTION: Drop `SHARD_STATE_TO_DELETE` and use the cleanup records
instead
Drops the shard state that is used to mark shards as orphaned. Now we
insert cleanup records into `pg_dist_cleanup` so "orphaned" shards will
be dropped either by maintenance daemon or internal cleanup calls. With
this PR, we make the "cleanup orphaned shards" functions to be no-op, as
they would not be needed anymore.
This PR includes some naming changes about placement functions. We don't
need functions that filter orphaned shards, as there will be no orphaned
shards anymore.
We will also be introducing a small script with this PR, for users with
orphaned shards. We'll basically delete the orphaned shard entries from
`pg_dist_placement` and insert cleanup records into `pg_dist_cleanup`
for each one of them, during Citus upgrade.
We also have a lot of flakiness fixes in this PR.
Co-authored-by: Jelte Fennema <github-tech@jeltef.nl>
Sometimes our `isolation_insert_vs_vacuum` test would fail like this.
```diff
step s2-vacuum-analyze:
VACUUM ANALYZE test_insert_vacuum;
-
+ <waiting ...>
step s1-commit:
COMMIT;
+step s2-vacuum-analyze: <... completed>
```
The reason seems to be that VACUUM ANALYZE tries to take some locks that
conflict with the other transaction, but these locks somehow get
released or VACUUM ANALYZE stops waiting for them. This is somewhat
expected since VACUUM has some special locking logic.
To solve the flakyness we now trigger VACUUM ANALYZE to always report as
blocking and after that we wait explicitly wait for it to complete. This
is done
like is suggested by the flaky test tips from postgres:
c68a183990/src/test/isolation/README (L152)
I've confirmed that this fixes the issue suing our flaky-test-debugging
CI workflow.
DESCRIPTION: Defers cleanup after a failure in shard move or split
We don't need to do a cleanup in case of failure on a shard transfer or
split anymore. Because,
* Maintenance daemon will clean them up anyway.
* We trigger a cleanup at the beginning of shard transfers/splits.
* The cleanup on failure logic also can fail sometimes and instead of
the original error, we throw the error that is raised by the cleanup
procedure, and it causes confusion.
* Skip some exceptional test files in the flaky workflow, like
multi_extension
* Run some tests without a schedule, like single_node_enterprise
* Use minimal schedule for the tests in split and operations schedules
DESCRIPTION: Cleanup the shard on the target node in case of a
failed/aborted shard move
Inserts a cleanup record for the moved shard placement on the target
node. If the move operation succeeds, the record will be deleted. If
not, it will remain there to be cleaned up later.
fixes: #6580
We have several version checks in our Citus upgrade tests. However, as
we drop support for PG versions, we need to update the Citus versions
used in our CI images. Therefore we must compare Citus versions in our
tests instead of using equality checks so that the queries are ran in
all the associated Citus versions.
For example, we have many conditionals where we early exit if the Citus
version is not equal to 9.0. However, as of today we never use version
9.0 and thus we always early exit in those tests.
All the tables (target, source or any CTE present) in the SQL statement are local i.e. a merge-sql with a combination of Citus local and
Non-Citus tables (regular Postgres tables) should work and give the same result as Postgres MERGE on regular tables. Catch and throw an
exception (not-yet-supported) for all other scenarios during Citus-planning phase.
DESCRIPTION: Support ALTER TABLE .. ADD PRIMARY KEY ... command
Before processing
> **ALTER TABLE ... ADD PRIMARY KEY ...**
command
1. Create a primary key name to use as the constraint name.
2. Change the **ALTER TABLE ... ADD PRIMARY KEY ...** command to into
**ALTER TABLE ... ADD CONSTRAINT \<constraint name> PRIMARY KEY ...**
form.
This is the only form we can specify a name for a primary key. If we run
ALTER TABLE .. ADD PRIMARY KEY, postgres
would create a constraint name internally in its own scheme. But the
problem is that we need to create constraint names
for shards in our own scheme which is \<constraint name>_\<shardid>.
Hence we need to create a name and send it to workers so that the
workers can append the shardid.
4. Run the changed command on the coordinator to make sure we are using
the same constraint name across the board.
5. Send the changed command to workers such that it is executed for the
main table as well as for the shards.
Fixes#6515.
Removes unused job boundary tag `SUBQUERY_MAP_MERGE_JOB`.
Only usage is at `BuildMapMergeJob`, which is only called when the
boundary = `JOIN_MAP_MERGE_JOB`. Hence, it should be safe to remove.
Before this commit, we created an additional WaitEventSet for
checking whether the remote socket is closed per connection -
only once at the start of the execution.
However, for certain workloads, such as pgbench select-only
workloads, the creation/deletion of the additional WaitEventSet
adds ~7% CPU overhead, which is also reflected on the benchmark
results.
With this commit, we use the same WaitEventSet for the purposes
of checking the remote socket at the start of the execution.
We use "rebuildWaitEventSet" flag so that the executor can re-use
the existing WaitEventSet.
As a result, we see the following improvements on PG 15:
main : 120051 tps, 0.532 ms latency avg.
avoid_wes_rebuild: 127119 tps, 0.503 ms latency avg.
And, on PG 14, as expected, there is no difference
main : 129191 tps, 0.495 ms latency avg.
avoid_wes_rebuild: 129480 tps, 0.494 ms latency avg.
But, note that PG 15 is slightly (~1.5%) slower than PG 14.
That is probably the overhead of checking the remote socket.
Fixes a missed include in #6315.
While adding the cluster clock we have added some extra steps to
`citus_prepare_pg_upgrade` and `citus_finish_pg_upgrade`. These changes
were not added to the citus upgrade and downgrade scripts, this allowed
for a syntax error to slip in.
This PR adds the new versions of both UDF's to the upgrade script while
adding the old version to the downgrade script. This exposed the syntax
error which is also solved.
- Because of the make command used for vanilla tests, test status is
always shown as success on CI. As a fix, I added `&& false` at the end
of the copying diff file to make the command fail when check-vanilla
fails.
```make
check-vanilla: all
$(pg_regress_multi_check) --vanillatest || (cp $(vanilla_diffs_file) $(citus_abs_srcdir)/regression.diffs && false)
```
- I also fixed some vanilla tests that fails due to recently added clock
related operators shown up at some queries.
We already have citus_job_wait to wait until the job reaches the desired
state. That PR adds waiting on task state to allow more granular
waiting. It can be used for Citus operations. Moreover, it is also
useful for testing purposes. (wait until a task reaches specified state)
Related to #6459.
Fixes task executor SIGTERM handling.
Problem:
When task executors are sent SIGTERM, their default handler
`bgworker_die`, which is set at worker startup, logs FATAL error. But
they do not release locks there before logging the error, which
sometimes causes hanging of the monitor. e.g. Monitor waits for the lock
forever at pg_stat flush after calling proc_exit.
Solution:
Because executors have connection to backend, they should handle SIGTERM
similar to normal backends. Normal backends uses `die` handler, in which
they set ProcDiePending flag and the next CHECK_FOR_INTERRUPTS call
handles it gracefully by releasing any lock before termination.
This PR adds a new CI workflow named ```flaky-test``` to run flaky test
detection on newly introduced regression tests.
Co-authored-by: Jelte Fennema <github-tech@jeltef.nl>
Adding a testing function `wait_for_resource_cleanup` which waits until
all records in `pg_dist_cleanup` are cleaned up. The motivation is to
prevent flakiness in our tests, since the `NOTICE: cleaned up X orphaned
resources` message is not consistent in many cases. This PR replaces
`citus_cleanup_orphaned_resources` calls with
`wait_for_resource_cleanup` calls.
DESCRIPTION: Create replication artifacts with unique names
We're creating replication objects with generic names. This disallows us
to enable parallel shard moves, as two operations might use the same
objects. With this PR, we'll create below objects with operation
specific names, by appending OparationId to the names.
* Subscriptions
* Publications
* Replication Slots
* Users created for subscriptions
1) Regular users fail to use clock UDF with permission issue.
2) Clock functions were declared as STABLE, whereas by definition they are VOLATILE. By design, any clock/time
functions will return different results for each call even within a single SQL statement.
Note: UDF citus_get_transaction_clock() is a misnomer as it internally calls the clock tick which always returns
different results for every invocation in the same transaction.
Adds signal handlers for graceful termination, cancellation of
task executors and detecting config updates. Related to PR #6459.
#### How to handle termination signal?
Monitor need to gracefully terminate all running task executors before
terminating. Hence, we have sigterm handler for the monitor.
#### How to handle cancellation signal?
Monitor need to gracefully cancel all running task executors before
terminating. Hence, we have sigint handler for the monitor.
#### How to detect configuration changes?
Monitor has SIGHUP handler to reflect configuration changes while
executing tasks.
We are having some flakiness in our test schedule because of the objects
leftover from shard moves/splits. With this commit we prevent logging
cleanup object counts.
fixes: #6534
When using multiline strings, we occasionally forget to add a single
space at the end of the first line. When this line is concatenated with
the next one, the resulting string has a missing space.
With this PR, citus code will be tested in all packaging environments.
Sometimes, there can be compile errors which blocks packaging and in
this case unplanned delays may occur.
By testing the code in packaging environments, I'm aiming to detect any
compilation errors before packaging.
Co-authored-by: Onur Tirtir <onurcantirtir@gmail.com>
Co-authored-by: Hanefi Onaldi <Hanefi.Onaldi@microsoft.com>
DESCRIPTION: Extend cleanup process for replication artifacts
This PR adds new cleanup record types for:
* Subscriptions
* Replication slots
* Publications
* Users created for subscriptions
We add records for these object types, to `pg_dist_cleanup` during
creation phase. Once the operation is done, in case of success or
failure, we iterate those records and drop the objects. With this PR we
will not be dropping any of these objects during the operation. In
short, we will always be deferring the drop.
One thing that's worth mentioning is that we sort cleanup records before
processing (dropping) them, because of dependency relations among those
objects, e.g a subscription might depend on a publication. Therefore, we
always drop subscriptions before publications.
We have some renames in this PR:
* `TryDropOrphanedShards` -> `TryDropOrphanedResources`
* `DropOrphanedShardsForCleanup` -> `DropOrphanedResourcesForCleanup`
* `run_try_drop_marked_shards` -> `run_try_drop_marked_resources`
as these functions now process replication artifacts as well.
This PR drops function `DropAllLogicalReplicationLeftovers` and its all
usages, since now we rely on the deferring drop mechanism.
Improvement on our background task monitoring API (PR #6296) to support
concurrent and nonblocking task execution.
Mainly we have a queue monitor background process which forks task
executors for `Runnable` tasks and then monitors their status by
fetching messages from shared memory queue in nonblocking way.
**Problem**: Currently, we error out if we detect recurring tuples in
one side without checking the other side of the join.
**Solution**: When one side of the full join consists recurring tuples
and the other side consists nonrecurring tuples, we should not pushdown
to prevent duplicate results. Otherwise, safe to pushdown.
This PR changes
```citus.propagate_session_settings_for_loopback_connection``` default
value to off not to expose this feature publicly at this point. See
#6488 for details.
When debugging issues it's quite useful to see the originating gpid in
the application_name of a query on a worker. This already happens for
most queries, but not for queries created by the rebalancer or by
run_command_on_worker. This adds a gpid to those two application_names
too.
Note, that if the GPID of the new application_names is different than
the current GPID of the backend the backend will continue to keep
the old gpid as its actual GPID. This PR is just meant to make sure
that the application_name is as useful as it can be for users to
look at. Updating of gpids will be done in a follow-up PR, and
adding gpids to all internal connections will make this easier.
DESCRIPTION: Fixes a potential dangling pointer issue
Need to backport to 11.0 & 11.1 since we might want to release packages
for debian/bookworm based on those branches in future.
Fixes a bug that causes crash when using auto_explain extension with
ALTER TABLE...ADD FOREIGN KEY... queries.
Those queries trigger a SELECT query on the citus tables as part of the
foreign key constraint validation check. At the explain hook, workers
try to explain this SELECT query as a distributed query causing memory
corruption in the connection data structures. Hence, we will not explain
ALTER TABLE...ADD FOREIGN KEY... and the triggered queries on the
workers.
Fixes#6424.
I recently cleaned up our test suite from redundant test outputs: #6111#6140#6214#6140#6434
I had to check many files manually, as they didn't have any
documentation on why the alternative test output existed in the first
place.
Adding a section in our test docs to remind developers to add
alternative test outputs with enough information/keywords.
(Hopefully) Fixes#5000.
If memory allocation done for `SubXactContext *state` in `PushSubXact()`
fails, then `PopSubXact()` might segfault, for example, when grabbing
the
topmost `SubXactContext` from `activeSubXactContexts` if this is the
first
ever subxact within the current xact, with the following stack trace:
```c
citus.so!list_nth_cell(const List * list, int n) (\opt\pgenv\pgsql-14.3\include\server\nodes\pg_list.h:260)
citus.so!PopSubXact(SubTransactionId subId) (\home\onurctirtir\citus\src\backend\distributed\transaction\transaction_management.c:761)
citus.so!CoordinatedSubTransactionCallback(SubXactEvent event, SubTransactionId subId, SubTransactionId parentSubid, void * arg) (\home\onurctirtir\citus\src\backend\distributed\transaction\transaction_management.c:673)
CallSubXactCallbacks(SubXactEvent event, SubTransactionId mySubid, SubTransactionId parentSubid) (\opt\pgenv\src\postgresql-14.3\src\backend\access\transam\xact.c:3644)
AbortSubTransaction() (\opt\pgenv\src\postgresql-14.3\src\backend\access\transam\xact.c:5058)
AbortCurrentTransaction() (\opt\pgenv\src\postgresql-14.3\src\backend\access\transam\xact.c:3366)
PostgresMain(int argc, char ** argv, const char * dbname, const char * username) (\opt\pgenv\src\postgresql-14.3\src\backend\tcop\postgres.c:4250)
BackendRun(Port * port) (\opt\pgenv\src\postgresql-14.3\src\backend\postmaster\postmaster.c:4530)
BackendStartup(Port * port) (\opt\pgenv\src\postgresql-14.3\src\backend\postmaster\postmaster.c:4252)
ServerLoop() (\opt\pgenv\src\postgresql-14.3\src\backend\postmaster\postmaster.c:1745)
PostmasterMain(int argc, char ** argv) (\opt\pgenv\src\postgresql-14.3\src\backend\postmaster\postmaster.c:1417)
main(int argc, char ** argv) (\opt\pgenv\src\postgresql-14.3\src\backend\main\main.c:209)
```
For this reason, to be more defensive against memory-allocation errors
that could happen at `PushSubXact()`, now we use our pre-allocated
memory
context for the objects created in `PushSubXact()`.
This commit also attempts reducing the memory allocations done under
CommitContext to reduce the chances of consuming all the memory
available
to CommitContext.
Note that it's problematic to encounter with such a memory-allocation
error for other objects created in `PushSubXact()` as well, so above is
an **example** scenario that might result in a segfault.
DESCRIPTION: Fixes a bug that might cause segfaults when handling deeply
nested subtransactions
DESCRIPTION: Makes sure to disallow triggers that depend on extensions
We were already doing so for `ALTER trigger DEPENDS ON EXTENSION`
commands. However, we also need to disallow creating Citus tables
having such triggers already, so this PR fixes that.
DESCRIPTION: Improve a query that terminates compeling backends from citus_update_node()
1. Use pg_blocking_pids() function instead of self join on pg_locks. It exists since 9.6 and more accurate than pg_locks.
2. Prefix all function calls with pg_catalog schema to prevent privilege escalation by creating functions with similar names in a public schema.
3. Change logs and update comments to reflect the fact that the pg_terminate_backend() function only sends SIGTERM but not wating for the actual backend termination.
DESCRIPTION: Allow citus_update_node() to work with nodes from different clusters
citus_update_node(), citus_nodename_for_nodeid(), and citus_nodeport_for_nodeid() functions only checked for nodes in their own clusters and hence last two returned NULLs and the first one showed an error is the nodeId was from a different cluster.
Fixes https://github.com/citusdata/citus/issues/6433
increasing logical clock. Clock guarantees to never go back in value after restarts,
and makes best attempt to keep the value close to unix epoch time in milliseconds.
Also, introduces a new GUC "citus.enable_cluster_clock", when true, every
distributed transaction is stamped with logical causal clock and persisted
in a catalog pg_dist_commit_transaction.
DESCRIPTION: Drops GUC defer_drop_after_shard_split
DESCRIPTION: Drops GUC defer_drop_after_shard_move
Drop GUCs and related parts from the code.
Delete tests that specifically added for the GUCs.
Keep tests that can be used without the GUCs.
Update test output changes.
The motivation for this PR is to have an "always deferring" mechanism.
These two GUCs provide an option to not deferring dropping objects
during a shard move/split, and dropping them immediately. With this PR,
we will be always deferring dropping orphaned shards and other types of
objects.
We will have a separate PR to extend the deferred cleanup operation, so
that we would create records for deferred drop, for Subscriptions,
Publications, Replication Slots etc. This will make us be able to keep
track of created objects that needs to be dropped, during a shard
move/split. We will have objects created specifically for the current
operation; and those objects will be dropped at the end.
We have an issue (a draft roadmap) for enabling parallel shard moves.
For details please see: https://github.com/citusdata/citus/issues/6437
Sometimes in CI our failure_split_cleanup test would fail like this:
```diff
CALL pg_catalog.citus_cleanup_orphaned_resources();
-NOTICE: cleaned up 79 orphaned resources
+NOTICE: cleaned up 82 orphaned resources
SELECT operation_id, object_type, object_name, node_group_id, policy_type
```
Source:
https://app.circleci.com/pipelines/github/citusdata/citus/28107/workflows/4ec712c9-98b5-4e90-9806-e02a37d71679/jobs/846107
The reason was that previous tests in the schedule would also create
some orphaned resources. Sometimes some of those would already be
cleaned up by the maintenance daemon, resulting in a different number of
cleaned up resources than expected. This cleans up any previously
created resources at the start of the test without logging how many
exactly were cleaned up. As a bonus this now also allows running this
test using check-failure-base.
This didn't cause any bugs since today we're always calling
UpdateAutoConvertedForConnectedRelations with autoconverted=false, so we
don't need to backport this to anywhere.
Good PR descriptions for flaky tests are quite helpful when reviewing.
Although obviously no PR description is the same, there's a few common
pieces of information that are useful for all PRs that fix flaky tests.
We should not introduce breaking sql changes to upgrade files after they
are released. We did that for worker_fetch_foreign_file in v9.0.0 and
worker_repartition_cleanup in v9.2.0. Later when we try to drop those
udfs, they were missing for some clients unexpectedly due to breaking
change in an old upgrade script. For that case, the fix is to add DROP
IF EXISTS for those 2 udfs in 11.0-4--11.1-1.
This crash happens with recursively planned queries. For such queries,
subplans are explained via the ExplainOnePlan function of postgresql.
This function reconstructs the query description from the plan therefore
it expects the ActiveSnaphot for the query be available. This fix makes
sure that the snapshot is in the stack before calling ExplainOnePlan.
Fixes#2920.
DESCRIPTION: Don't leak search_path to workers on DDL
For DDL we have to set the `search_path` on workers to the same as on
the coordinator for some DDL to work. Previously this search_path would
leak outside of the transaction that was used for the DDL. This fixes
that by using `SET LOCAL` instead of `SET`. The only place where we
still use plain `SET` is for DDL commands that are not allowed within
transactions, such as `CREATE INDEX CONCURRENLTY`.
This fixes this flaky test:
```diff
CONTEXT: SQL statement "SELECT change_id FROM distributed_triggers.data_changes
WHERE shard_key_value = NEW.shard_key_value AND object_id = NEW.object_id
ORDER BY change_id DESC LIMIT 1"
-PL/pgSQL function record_change() line XX at SQL statement
+PL/pgSQL function distributed_triggers.record_change() line 17 at SQL statement
while executing command on localhost:57638
DELETE FROM data_ref_table where shard_key_value = 'hello';
```
Source:
https://app.circleci.com/pipelines/github/citusdata/citus/27849/workflows/75ae5f1a-100b-4b7a-b991-7de069f39ee1/jobs/831429
I had tried to fix this flaky test in #5894 and then I tried
implementing a better fix in #5896, where @marcocitus suggested this
better fix. This change reverts the fix from #5894 and implements the
fix suggested by Marco.
Our multi_mx_alter_distributed_table test actually depended on the old
buggy search_path leaking behavior. After fixing the bug that test would
fail like this:
```diff
CALL proc_0(1.0);
DEBUG: pushing down the procedure
-NOTICE: Res: 3
-DETAIL: from localhost:xxxxx
+ERROR: relation "test_proc_colocation_0" does not exist
+CONTEXT: PL/pgSQL function mx_alter_distributed_table.proc_0(double precision) line 5 at SQL statement
+while executing command on localhost:57637
RESET client_min_messages;
```
I fixed this test by fully qualifying the table names used in the
procedure. I think it's quite unlikely that actual users depend
on this behavior though. Since it would require first doing
DDL before calling a procedure in a session where the
search_path was changed after connecting.
DESCRIPTION: Adds failure test for shard move
DESCRIPTION: Remove function `WaitForAllSubscriptionsToBecomeReady` and
related tests
Adding some failure tests for shard moves.
Dropping the not-needed-anymore function
`WaitForAllSubscriptionsToBecomeReady`, as the subscriptions now start
as ready from the beginning because we don't use logical replication
table sync workers anymore.
fixes: #6260
In CI shard_rebalancer sometimes fails with this error:
```diff
SET citus.node_connection_timeout to 60;
BEGIN;
SET LOCAL citus.shard_replication_factor TO 2;
SET citus.log_remote_commands TO ON;
SET SESSION citus.max_adaptive_executor_pool_size TO 5;
SELECT replicate_table_shards('dist_table_test_2', max_shard_copies := 4, shard_transfer_mode:='block_writes');
+WARNING: could not establish connection after 60 ms
```
Source
https://app.circleci.com/pipelines/github/citusdata/citus/28128/workflows/38eeacc4-4191-4366-87ed-9a628414965a/jobs/847458?invite=true#step-107-21
This PR avoids this issue by increasing
```citus.node_connection_timeout``` to 35s.
I fixed a lot of flaky tests recently and I found some patterns in the
type of issues and type of fixes. This adds a document that lists
these types of issues and explains how to fix them.
To be able to test non-blocking shard moves we take an advisory lock, so
we can pause the shard move at an interesting moment. Originally this
was during the logical replication catch up phase. But when I added
tests for the rebalancer progress I moved this lock before the initial
data copy. This allowed testing of the rebalance progress, but
inadvertently made our non-blocking tests not actually test if we held
unintended locks during logical replication catch up.
This fixes that by creating two types of advisory locks, one before the
copy and one after. This causes the tests to actually test their
intended scenario again.
Furthermore it starts using one of these locks for blocking shard moves
too. Which allowed me to reduce the complexity of the rebalance progress
test suite quite a bit. It also allowed enabling some flaky tests again,
because this stopped them from being flaky. And finally it allowed
testing of rebalance progress for blocking shard copy operations as
well.
In passing it fixes a flaky test during parallel blocking shard moves by
ordering the output.
DESCRIPTION: Adds status column to get_rebalance_progress()
Introduces a new column named `status` for the function
`get_rebalance_progress()`. For each ongoing shard move, this column
will reveal information about that shard move operation's current
status.
For now, candidate status messages could be one of the below.
* Not Started
* Setting Up
* Copying Data
* Catching Up
* Creating Constraints
* Final Catchup
* Creating Foreign Keys
* Completing
* Completed
Deparser function set_relation_column_names() knows that it needs to
re-evaluate column names based on relation's tuple descriptor when
the rte belongs to a relation (RTE_RELATION).
However before this commit, it didn't know about the fact that citus
might wrap such an rte with an rte that points to
citus_extradata_container() placeholder.
And because of this, it was simply taking the column aliases
(e.g., "bar" in "foo AS bar") into the account and this might result in
an incorrectly deparsed query as in below case:
* Say, if we had view based on following query:
```sql
SELECT a FROM table;
```
* And if we rename column "a" to "b", the view query normally becomes:
```sql
SELECT b AS a FROM table;
```
* So before this commit, deparsing a query based on that view was
resulting in such a query due to deparsing based on the column aliases,
which is not correct:
```sql
SELECT a FROM table;
```
Fixes#5932.
DESCRIPTION: Fixes a bug that might cause failing to query the views
based on tables that have renamed columns
PostgreSQL 15 exposes WL_SOCKET_CLOSED in WaitEventSet API, which is
useful for detecting closed remote sockets. In this patch, we use this
new event and try to detect closed remote sockets in the executor.
When a closed socket is detected, the executor now has the ability to
retry the connection establishment. Note that, the executor can retry
connection establishments only for the connection that has not been
used. Basically, this patch is mostly useful for preventing the executor
to fail if a cached connection is closed because of the worker node
restart (or worker failover).
In other words, the executor cannot retry connection establishment if we
are in a distributed transaction AND any command has been sent over the
connection. That requires more sophisticated retry mechanisms. For now,
fixing the above use case is enough.
Fixes#5538
Earlier discussions: #5908, #6259 and #6283
### Summary of the current approach regards to earlier trials
As noted, we explored some alternatives before getting into this.
https://github.com/citusdata/citus/pull/6283 is simple, but lacks an
important property. We should be checking for `WL_SOCKET_CLOSED`
_before_ sending anything over the wire. Otherwise, it becomes very
tricky to understand which connection is actually safe to retry. For
example, in the current patch, we can safely check
`transaction->transactionState == REMOTE_TRANS_NOT_STARTED` before
restarting a connection.
#6259 does what we intent here (e.g., check for sending any command).
However, as @marcocitus noted, it is very tricky to handle
`WaitEventSets` in multiple places. And, the executor is designed such
that it reacts to the events. So, adding anything `pre-executor` seemed
too ugly.
In the end, I converged into this patch. This patch relies on the
simplicity of #6283 and also does a very limited handling of
`WaitEventSets`, just for our purpose. Just before we add any connection
to the execution, we check if the remote session has already closed.
With that, we do a brief interaction of multiple wait event processing,
but with different purposes. The new wait event processing we added does
not even consider cancellations. We let that handled by the main event
processing loop.
Co-authored-by: Marco Slot <marco.slot@gmail.com>
In #6405 I added better improved blocked process detection for isolation
tests. But when cleaning up unnecessary code I cleaned up a bit too
much. This actually includes the new function definition in our
migrations.
In CI multi_partitioning sometimes fails with this error:
```diff
SELECT citus_remove_node('localhost', :master_port);
- citus_remove_node
----------------------------------------------------------------------
-
-(1 row)
-
+ERROR: tuple concurrently deleted
-- d) invalid tables for helper UDFs
```
Source:
https://app.circleci.com/pipelines/github/citusdata/citus/27993/workflows/685e5b20-c923-43e5-8a0d-b932ef4c4914/jobs/839466
This PR avoids this concurrency issue by not running the
multi_partitioning test in parallel with other tests.
If an operation requires having coordinator in pg_dist_node and if that
is not the case, then we automatically add the coordinator into
pg_dist_node if user didn't add any worker nodes yet.
However, if user have already added some worker nodes before, we throw
an error. With this commit, we improve the error thrown in that case.
Closes#6423 based on the discussion made there.
Sometimes our CI randomly fails on a test in a way similar to this:
```diff
step s2-drop:
DROP TABLE cancel_table;
-
+ <waiting ...>
+step s2-drop: <... completed>
starting permutation: s1-timeout s1-begin s1-sleep10000 s1-rollback s1-reset s1-drop
```
Source:
https://app.circleci.com/pipelines/github/citusdata/citus/26524/workflows/5415b84f-13a3-482f-bef9-648314c79a67/jobs/756377
I tried to fix that already in #6252 by disabling the maintenance daemon
during isolation tests. But it seems that hasn't fixed all cases of
these errors. This is another attempt at fixing these issues that seems
to have better results.
What it does is that it starts using the pInterestingPids parameter that
citus_isolation_test_session_is_blocked receives. With this change we
start filter out block-edges that are not caused by any of these pids.
In passing this change also makes it possible to run
`isolation_create_distributed_table_concurrently` with
`check-isolation-base`
PG15 introduced a function called ReplicationSlotName that causes
conflicts with our function with the same name. I solved this issue by
renaming our function to ReplicationSlotNameForNodeAndOwner
Relevant PG commit:
c3b5992b91
DESCRIPTION: Fix bug in global PID assignment for rebalancer
sub-connections
In CI our isolation_shard_rebalancer_progress test would sometimes fail
like this:
```diff
+isolationtester: canceling step s1-rebalance-c1-block-writes after 60 seconds
step s1-rebalance-c1-block-writes:
SELECT rebalance_table_shards('colocated1', shard_transfer_mode:='block_writes');
- <waiting ...>
+
+ERROR: canceling statement due to user request
step s7-get-progress:
```
Source:
https://app.circleci.com/pipelines/github/citusdata/citus/27855/workflows/2a7e335a-f3e8-46ed-b6bd-6920d42f7214/jobs/831710
It turned out this was an actual bug in the way our assigning of global
PIDs interacts with the way we connect to ourselves as the shard
rebalancer. The first command the shard rebalancer sends is a SET
ommand to change the application_name to `citus_rebalancer`. If
`StartupCitusBackend` is called after this command is processed, then it
overwrites the global PID that was extracted from the previous
application_name. This makes sure that we don't do that, and continue to
use the original global PID. While it might seem that we only call
`StartupCitusBackend` once for each query backend, this isn't actually
the case. Whenever pg_dist_partition gets ANALYZEd by autovacuum
we indirectly call `StartupCitusBackend` again, because we invalidate
the cache then.
In passing this fixes two other things as well:
1. It sets `distributedCommandOriginator` correctly in
`AssignGlobalPID`, by using IsExternalClientBackend(). This doesn't
matter much anymore, since AssignGlobalPID effectively becomes a
no-op in this PR for any non-external client backends.
2. It passes the application_name to InitializeBackendData in
StartupCitusBackend, instead of INVALID_CITUS_INTERNAL_BACKEND_GPID
(which effectively got casted to NULL). In practice this doesn't
change the behaviour of the call, since the call is a no-op for every
backend except the maintenance daemon. And the behaviour of the call
is the same for NULL as for the application_name of the maintenance
daemon.
We decrease verbosity level here to avoid the flaky output
https://app.circleci.com/pipelines/github/citusdata/citus/27936/workflows/dc63128a-1570-41a0-8722-08f3e3cfe301/jobs/836153
```diff
select alter_table_set_access_method('ref','heap');
NOTICE: creating a new table for alter_table_set_access_method.ref
NOTICE: moving the data of alter_table_set_access_method.ref
NOTICE: dropping the old alter_table_set_access_method.ref
NOTICE: drop cascades to 2 other objects
-DETAIL: drop cascades to materialized view m_ref
-drop cascades to view v_ref
+DETAIL: drop cascades to view v_ref
+drop cascades to materialized view m_ref
CONTEXT: SQL statement "DROP TABLE alter_table_set_access_method.ref CASCADE"
NOTICE: renaming the new table to alter_table_set_access_method.ref
alter_table_set_access_method
-------------------------------
(1 row)
```
DESCRIPTION: Raises memory limits in columnar from 256MB to 1GB for
reads and writes
This doesn't completely fix#5918 but at least increases the
buffer limits that might cause throwing an error when reading
from or writing into into columnar storage. A way better approach
to fix this is documented in #6420.
Replacing memcpy_s with memcpy is quite safe in those places
since we anyway make sure to allocate enough amount of memory
before writing into related buffers.
When you run vanilla tests in your local environment, some of the tests
tries to find path for regress.so which is not in default lib path. That
is why we need to specify regress.so path as dlpath option.
Example failure:
```
LOAD :'regresslib';
+ERROR: could not access file "/home/aykutbozkurt/.pgenv/pgsql-15beta4/lib/regress.so": No such file or directory
```
It is actually in
`~/.pgenv/src/postgresql-15beta4/src/test/regress/regress.so` which is
found by `$regresslibdir`.
PG15 introduced an optimization on GROUP BY keys that is now reverted on
RC2.
Relevant PG commit:
Revert "Optimize order of GROUP BY keys".
443df6e2db932a7cd6d85ddfb67e11a43345130d
Fixes https://github.com/citusdata/citus/issues/6394.
DESCRIPTION: Fixes a bug that causes creating disabled-triggers on
shards as enabled
Since CREATE TRIGGER doesn't have syntax support to specify
whether the trigger should be enabled/disabled, the underlying
PG function (`pg_get_triggerdef()`) that we use to generate the
command to create the trigger is not enough. For this reason, we
append a second command to enable/disable trigger, right after
creating it.
We don't retain explicit extension dependencies set by using
`ALTER trigger DEPENDS ON EXTENSION` commands too, but apparently
right fix for that is to throw an error as in
`PreprocessAlterTriggerDependsStmt()`; so, opened a separate PR
to fix that #6399.
During alter_distributed_table, we create a new table like the
original table but with the altered options.
To retrieve the name of the distribution column, we were using
the attribute syscache of the new table, since we already created
the new table as identical to the original table.
However, the attribute syscaches of these two tables are not
the same if the original table has dropped columns. The reason
is that dropped columns are all still present in the cache.
Hence, for example, the attnos would be different in the syscaches.
So, let's use the attribute syscache of the original table.
DESCRIPTION: Fixes a bug that prevents retaining columnar table options after a table-rewrite
A fix for this issue: Columnar: options ignored during ALTER TABLE
rewrite #5927
The OID for the temporary table created during ALTER TABLE was not the
same as the original table's OID so the columnar options were not being
applied during rewrite.
The change is that I applied the original table's columnar options to
the new table so that it has the correct options during write. I also
added a test.
DESCRIPTION: Adds source_lsn and target_lsn fields into
get_rebalance_progress
Adding two fields named `source_lsn` and `target_lsn` to the function
`get_rebalance_progress`.
Target lsn data is fetched in `GetShardStatistics`, by expanding the
query sent to workers (joining with pg_subscription_rel and
pg_stat_subscription). Then put into the hashmap, for each shard.
Source lsn data is fetched in `BuildWorkerShardStatististicsHash`, in
the loop that iterate each node, by sending a pg_current_wal_lsn query
to each node. Then put into the hashmap, for each node.
On our CI our isolation_shard_rebalancer_progress would sometimes
randomly fail like this:
```diff
table_name|shardid|shard_size|sourcename|sourceport|source_shard_size|targetname|targetport|target_shard_size|progress|operation_type
----------+-------+----------+----------+----------+-----------------+----------+----------+-----------------+--------+--------------
-colocated1|1500001| 49152|localhost | 57637| 49152|localhost | 57638| 73728| 1|move
-colocated2|1500005| 376832|localhost | 57637| 376832|localhost | 57638| 401408| 1|move
+colocated1|1500001| 49152|localhost | 57637| 49152|localhost | 57638| 81920| 1|move
+colocated2|1500005| 376832|localhost | 57637| 376832|localhost | 57638| 409600| 1|move
(2 rows)
```
Source:
https://app.circleci.com/pipelines/github/citusdata/citus/27688/workflows/8c5ca443-5f21-4f21-b74f-0ca7bde69648/jobs/823648/parallel-runs/1
The shard sizes would be slightly larger or smaller than expected. This
fixes this by fixing the output to the nearest expected shard size. To
do so I used a trick described in this stack overflow answer:
https://stackoverflow.com/a/33147437/2570866
When investigating I ran into one more random failure:
```diff
-step s1-shard-move-c1-block-writes: <... completed>
+step s4-shard-move-sep-block-writes: <... completed>
citus_move_shard_placement
--------------------------
(1 row)
-step s4-shard-move-sep-block-writes: <... completed>
+step s1-shard-move-c1-block-writes: <... completed>
citus_move_shard_placement
--------------------------
```
Source:
https://app.circleci.com/pipelines/github/citusdata/citus/27707/workflows/c3ff4fc7-5068-4096-ab9f-803c941ddac0/jobs/824622/parallel-runs/29?filterBy=FAILED
This random failure happens, because the two parallel moves can complete
at the same time. So, it's non-deterministic which one finishes first. To
make this deterministic I used the "marker" feature from the isolation
tester.
And finally I ran into a third random failure:
```diff
table_name|shardid|shard_size|sourcename|sourceport|source_shard_size|targetname|targetport|target_shard_size|progress|operation_type
----------+-------+----------+----------+----------+-----------------+----------+----------+-----------------+--------+--------------
-colocated1|1500001| 50000|localhost | 57637| 50000|localhost | 57638| 50000| 1|move
-colocated2|1500005| 400000|localhost | 57637| 400000|localhost | 57638| 400000| 1|move
+colocated1|1500001| 50000|localhost | 57637| 50000|localhost | 57638| 8000| 1|move
+colocated2|1500005| 400000|localhost | 57637| 400000|localhost | 57638| 8000| 1|move
colocated1|1500002| 200000|localhost | 57637| 200000|localhost | 57638| 0| 0|move
colocated2|1500006| 8000|localhost | 57637| 8000|localhost | 57638| 0| 0|move
```
Source:
https://app.circleci.com/pipelines/github/citusdata/citus/27707/workflows/c3ff4fc7-5068-4096-ab9f-803c941ddac0/jobs/824622/parallel-runs/30?filterBy=FAILED
This happened in two of the tests only. For now I commented these tests
out. I have some ideas on how to fix these, but these ideas require more
impactful changes than I would like in this PR. One of these tests had a
copy paste error too, in passing I fixed that in the commented out line.
This test used to contain some utility commands that Citus did not
support. However we added support for most of the commands, and this
test got outdated.
We used to error out on community when user attempted to use pooler
options. Now that we open sourced all enterprise features, the test can
now be removed.
Sometimes our CI randomly fails on a test in a way similar to this:
```diff
step s2-drop:
DROP TABLE cancel_table;
-
+ <waiting ...>
+step s2-drop: <... completed>
starting permutation: s1-timeout s1-begin s1-sleep10000 s1-rollback s1-reset s1-drop
```
Source:
https://app.circleci.com/pipelines/github/citusdata/citus/26524/workflows/5415b84f-13a3-482f-bef9-648314c79a67/jobs/756377
Another example of a failure like this:
```diff
stop_session_level_connection_to_node
-------------------------------------
(1 row)
step s3-display:
SELECT * FROM ref_table ORDER BY id, value;
SELECT * FROM dist_table ORDER BY id, value;
-
+ <waiting ...>
+step s3-display: <... completed>
id|value
--+-----
```
Source: https://app.circleci.com/pipelines/github/citusdata/citus/26551/workflows/91dca4b2-bb1c-4cae-b2ef-ce3f9c689ce5/jobs/757781
A step that shouldn't be blocked is detected as "waiting..." temporarily
and then gets unblocked automatically immediately after. I'm not
certain of the reason for this, but one explanation is that the
maintenance daemon is doing something that blocks the query. In the
shown case my hunch is that it could be the deferred shard deletion.
This PR disables all the features of the maintenance daemon during
isolation testing to try and prevent process from randomly being
detected as blocking.
NOTE: I'm not certain that this will actually fix this issue. If the
issue persists even after this change, at least we know that it's not
the maintenance daemon that's blocking it.
For the sake of documentation, here is a failing diff:
```diff
step s2-view-dist:
SELECT query, citus_nodename_for_nodeid(citus_nodeid_for_gpid(global_pid)), citus_nodeport_for_nodeid(citus_nodeid_for_gpid(global_pid)), state, wait_event_type, wait_event, usename, datname FROM citus_dist_stat_activity WHERE query NOT ILIKE ALL(VALUES('%pg_prepared_xacts%'), ('%COMMIT%'), ('%BEGIN%'), ('%pg_catalog.pg_isolation_test_session_is_blocked%'), ('%citus_add_node%')) AND backend_type = 'client backend' ORDER BY query DESC;
query |citus_nodename_for_nodeid|citus_nodeport_for_nodeid|state |wait_event_type|wait_event|usename |datname
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------+-------------------------+-------------------+---------------+----------+--------+----------
ALTER TABLE test_table ADD COLUMN x INT;
|localhost | 57636|idle in transaction|Client |ClientRead|postgres|regression
-(1 row)
+
+ SELECT coalesce(to_jsonb(array_agg(csa_from_one_node.*)), '[{}]'::JSONB)
+ FROM (
+ SELECT global_pid, worker_query AS is_worker_query, pg_stat_activity.* FROM
+ pg_stat_activity LEFT JOIN get_all_active_transactions() ON process_id = pid
+ ) AS csa_from_one_node;
+ |localhost | 57638|active | | |postgres|regression
+(2 rows)
```
This failure can be seen at [this CI
run](https://app.circleci.com/pipelines/github/citusdata/citus/27653/workflows/d769701c-8f6e-4f97-a412-16f7b9b288a6/jobs/821416)
PG15 now allows users to specify oids when creating databases. This
feature is a side effect of a bigger feature in pg_upgrade.
Relevant PG Commit:
pg_upgrade: Preserve database OIDs.
aa01051418f10afbdfa781b8dc109615ca785ff9
Depends on https://github.com/citusdata/the-process/pull/92Closes: #6371
Updates test dependencies to not rely on a known vulnerable dependency
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
PG15 has suppressed some casts on constants when querying foreign
tables.
For example, we can use text to represent a type that's an enum on the
remote side.
A comparison on such a column will get shipped as "var = 'foo'::text".
But there's no enum = text operator on the remote side.
If we leave off the explicit cast, the comparison will work.
Test we behave in the same way with a Citus foreign table
Reminder: foreign tables cannot be distributed/reference, can only be
Citus local
Relevant PG commit:
f8abb0f5e1
PostgreSQL 15 had some changes to jsonpath to conform with ECMA-262
referenced by SQL standard. This commit adds tests to make sure Citus
also supports the same standards.
Relevant pg commit:
e26114c817b610424010cfbe91a743f591246ff1
In Split, Logical replication logic and ShardCleaner we call
`SendCommandListToWorkerOutsideTransaction` and
`SendOptionalCommandListToWorkerOutsideTransaction` frequently. This
opens new connection for each of those calls, even though we already
have a perfectly good connection lying around.
This PR adds two new APIs
`SendCommandListToWorkerOutsideTransactionWithConnection` and
`SendOptionalCommandListToWorkerOutsideTransactionWithConnection` that
allow sending a list of queries in a transaction over an existing
connection. We also update the callers (Split, ShardCleaner, Logical
Replication) to use these new APIs instead.
Co-authored-by: Nitish Upreti <niupre@microsoft.com>
Co-authored-by: Onder Kalaci <onderkalaci@gmail.com>
In Citus 11.1.0 we changed the order of doing the initial data copy and
the replica identity creation when doing a non blocking shard move. This
was done to try and increase the speed with which shard moves could be
done. But after doing more extensive performance testing this change
turned out to have a negative impact on the speed of moves on the setups
that I tested.
Looking at the resource usage metrics of the VMs the reason for this
seems to be that these shard moves were bottlenecked by disk bandwidth.
While creating replica identities in bulk after the initial copy will
reduce CPU usage a bit, it does require an additional sequence scan of
the just written data. So when a VM is bottlenecked on disk, it makes
sense to spend a little bit more CPU to avoid an additional scan. Since
PKs are usually simple indexes that don't require lots of CPU to update,
as opposed to e.g. GiST indexes.
This reverts the order change to avoid a regression on shard move speed
in these cases.
For future releases we might consider re-evaluating our index creation
order for other indexes too, and create "simple" indexes before the
copy.
Given that we drop DEFAULT nextval('sequence') expressions from
shard relation columns, allowing `ON DELETE/UPDATE SET DEFAULT`
on such columns might cause inserting NULL values as a result
of a delete/update operation.
For this reason, we disallow ON DELETE/UPDATE SET DEFAULT actions
on columns that default to sequences.
DESCRIPTION: Disallows having ON DELETE/UPDATE SET DEFAULT actions on
columns that default to sequences
Fixes#6339.
As we did for GENERATED STORED columns in #4613, we should not drop
column
default expressions that are not based on sequences from shard relation
since
such expressions need to exist e.g. for foreign key actions.
For the column default expressions that are based on sequences we cannot
do much, so we need to disallow having ON DELETE SET DEFAULT actions on
such columns in a separate PR, see #6339.
Fixes#6318.
DESCRIPTION: Fixes a bug that might cause inserting incorrect DEFAULT
values when applying foreign key actions
PG15 added support for security invoker views. Relevant PG conmit:
7faa5fc84b
These views check the permissions for the underlying tables of the view
invoker user, not the view definer user.
When the view has underlying distributed tables, the queries to the
shards are sent by opening connections with the current user, which is
the view invoker, no matter what the type of the view is. This means
that, for distributed views, they were always behaving like security
invoker views. Check the following issue for more details:
https://github.com/citusdata/citus/issues/6161
So, Citus doesn't fully support security definer views.
However Citus does fully support security invoker views. We add tests to
make sure we cover different cases.
DESCRIPTION: Fixes dropping replication slots
As detected by a flaky test, Citus sometimes fails to drop replication
slots, possibly due to a race condition, at the end of a shard split.
With this PR, we retry to drop them in case of an `OBJECT_IN_USE` error,
consistently for 20 seconds.
fixes: #6326
Both tests include pushdown and pull to coordinator type of aggregate
execution.
Relevant PG commits:
Add min() and max() aggregates for xid8
400fc6b6487ddf16aa82c9d76e5cfbe64d94f660
Add range_agg with multirange inputs
7ae1619bc5b1794938c7387a766b8cae34e38d8a
Co-authored-by: Onder Kalaci <onderkalaci@gmail.com>
DESCRIPTION: Improve logging during shard split and resource cleanup
### DESCRIPTION
This PR makes logging improvements to Shard Split :
1. Update confusing logging to fix#6312
2. Added new `ereport(LOG` to make debugging easier as part of telemetry review.
Comment from the code is clear on this:
/*
* The statistics objects of the distributed table are not relevant
* for the distributed planning, so we can override it.
*
* Normally, we should not need this. However, the combination of
* Postgres commit 269b532aef55a579ae02a3e8e8df14101570dfd9 and
* Citus function AdjustPartitioningForDistributedPlanning()
* forces us to do this. The commit expects statistics objects
* of partitions to have "inh" flag set properly. Whereas, the
* function overrides "inh" flag. To avoid Postgres to throw error,
* we override statlist such that Postgres does not try to process
* any statistics objects during the standard_planner() on the
* coordinator. In the end, we do not need the standard_planner()
* on the coordinator to generate an optimized plan. We call
* into standard_planner() for other purposes, such as generating the
* relationRestrictionContext here.
*
* AdjustPartitioningForDistributedPlanning() is a hack that we use
* to prevent Postgres' standard_planner() to expand all the partitions
* for the distributed planning when a distributed partitioned table
* is queried. It is required for both correctness and performance
* reasons. Although we can eliminate the use of the function for
* the correctness (e.g., make sure that rest of the planner can handle
* partitions), it's performance implication is hard to avoid. Certain
* planning logic of Citus (such as router or query pushdown) relies
* heavily on the relationRestrictionList. If
* AdjustPartitioningForDistributedPlanning() is removed, all the
* partitions show up in the, causing high planning times for
* such queries.
*/
DESCRIPTION: Fixes floating exception during
create_distributed_table_concurrently.
Fixes#6332.
During create_distributed_table_concurrently, when there is no active
primary node, it fails with floating exception. We added similar check
with create_distributed_table. It will fail with proper message if
current active node is less than replication factor.
The PR introduces code changes to fix Issue
[6303](https://github.com/citusdata/citus/issues/6303)
`create_distributed_table_concurrently` following drop column, creates a
buggy situation in split decoder.
* Consider the below scenario:
* Session1 : Drop column followed by
create_distributed_table_concurrently
* Session2 : Concurrent insert workload
The child shards created by `create_distributed_table_concurrently` will
have less columns than the source shard because some column were
dropped. The incoming tuple from session2 will have more columns as the
writes happened on source shard. But now the tuple needs to be applied
on child shard. So we need to format existing tuple according to child
schema and skip dropped column values.
The PR fixes this by reformatting the tuple according the target child
schema.
Test:
1) isolation_create_distributed_concurrently_after_drop_column - Repros
the issue and tests on the same.
No need for description, fixing issue introduced with new feature for
11.1
Fixes#6333
Due to Postgres' C api being o-indexed and postgres' attributes being
1-indexed, we were reading the wrong Datum as the Task owner when
cancelling. Here we add a test to show the error and fix the off-by-one
error.
When I built Citus on PG15beta4 locally, I get a warning message.
```
utils/background_jobs.c:902:5: warning: declaration does not declare anything
[-Wmissing-declarations]
__attribute__((fallthrough));
^
1 warning generated.
```
This is a hint to the compiler that we are deliberately falling through
in a switch-case block.
DESCRIPTION: Show citus_copy_shard_placement progress in
get_rebalance_progress
When rebalancing to a new node that does not have reference tables yet
the rebalancer will first copy the reference tables to the nodes.
Depending on the size of the reference tables, this might take a long
time. However, there's no indication of what's happening at this stage
of the rebalance.
This PR improves this situation by also showing the progress of any
citus_copy_shard_placement calls when calling get_rebalance_progress.
We can now do the following:
- Distribute sequence with logged/unlogged option
- ALTER TABLE my_sequence SET LOGGED/UNLOGGED
- ALTER SEQUENCE my_sequence SET LOGGED/UNLOGGED
Relevant PG commit
344d62fb9a
PG15 introduces `CLUSTER` commands for partitioned tables. Similar to a
`CLUSTER` command with no supplied table names, these commands also can
not be run inside transaction blocks and therefore can not be propagated
in a distributed transaction block with ease. Therefore we raise warnings.
Relevant PG commit: cfdd03f45e6afc632fbe70519250ec19167d6765
DESCRIPTION: Add a rebalancer that uses background tasks for its
execution
Based on the baclground jobs and tasks introduced in #6296 we implement
a new rebalancer on top of the primitives of background execution. This
allows the user to initiate a rebalance and let Citus execute the long
running steps in the background until completion.
Users can invoke the new background rebalancer with `SELECT
citus_rebalance_start();`. It will output information on its job id and
how to track progress. Also it returns its job id for automation
purposes. If you simply want to wait till the rebalance is done you can
use `SELECT citus_rebalance_wait();`
A running rebalance can be canelled/stopped with `SELECT
citus_rebalance_stop();`.