On our CI the isolation_shard_rebalancer_progress test would sometimes
fail randomly like this:
```diff
table_name|shardid|shard_size|sourcename|sourceport|source_shard_size|targetname|targetport|target_shard_size|progress|operation_type
----------+-------+----------+----------+----------+-----------------+----------+----------+-----------------+--------+--------------
-colocated1|1500001| 49152|localhost | 57637| 49152|localhost | 57638| 73728| 1|move
-colocated2|1500005| 376832|localhost | 57637| 376832|localhost | 57638| 401408| 1|move
+colocated1|1500001| 49152|localhost | 57637| 49152|localhost | 57638| 81920| 1|move
+colocated2|1500005| 376832|localhost | 57637| 376832|localhost | 57638| 409600| 1|move
(2 rows)
```
Source:
https://app.circleci.com/pipelines/github/citusdata/citus/27688/workflows/8c5ca443-5f21-4f21-b74f-0ca7bde69648/jobs/823648/parallel-runs/1
The shard sizes would be slightly larger or smaller than expected. This
PR fixes that by snapping the reported sizes to the nearest expected shard
size. To do so I used a trick described in this Stack Overflow answer:
https://stackoverflow.com/a/33147437/2570866
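As an illustration of the snapping idea only (this is not the query used in the test), here is a minimal Python sketch. The list of expected sizes is an assumption made for the example, chosen to match the rounded values that show up in the test output, and `snap_to_nearest` is a hypothetical helper name.

```python
# Assumed set of "expected" shard sizes in bytes (illustrative only).
EXPECTED_SIZES = [0, 8000, 50000, 200000, 400000]


def snap_to_nearest(size, expected=EXPECTED_SIZES):
    """Return the expected size that is closest to the measured size."""
    return min(expected, key=lambda candidate: abs(candidate - size))


# Sizes taken from the failing diff: small fluctuations no longer matter.
for measured in (49152, 73728, 81920, 376832, 409600):
    print(measured, "->", snap_to_nearest(measured))
```

With this kind of snapping, both 73728 and 81920 map to the same value, so the test output stays stable when the on-disk size shifts by a page or two.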
When investigating I ran into one more random failure:
```diff
-step s1-shard-move-c1-block-writes: <... completed>
+step s4-shard-move-sep-block-writes: <... completed>
citus_move_shard_placement
--------------------------
(1 row)
-step s4-shard-move-sep-block-writes: <... completed>
+step s1-shard-move-c1-block-writes: <... completed>
citus_move_shard_placement
--------------------------
```
Source:
https://app.circleci.com/pipelines/github/citusdata/citus/27707/workflows/c3ff4fc7-5068-4096-ab9f-803c941ddac0/jobs/824622/parallel-runs/29?filterBy=FAILED
This random failure happens because the two parallel moves can complete
at the same time, so it's non-deterministic which one finishes first. To
make this deterministic I used the "marker" feature from the isolation
tester.
And finally I ran into a third random failure:
```diff
table_name|shardid|shard_size|sourcename|sourceport|source_shard_size|targetname|targetport|target_shard_size|progress|operation_type
----------+-------+----------+----------+----------+-----------------+----------+----------+-----------------+--------+--------------
-colocated1|1500001| 50000|localhost | 57637| 50000|localhost | 57638| 50000| 1|move
-colocated2|1500005| 400000|localhost | 57637| 400000|localhost | 57638| 400000| 1|move
+colocated1|1500001| 50000|localhost | 57637| 50000|localhost | 57638| 8000| 1|move
+colocated2|1500005| 400000|localhost | 57637| 400000|localhost | 57638| 8000| 1|move
colocated1|1500002| 200000|localhost | 57637| 200000|localhost | 57638| 0| 0|move
colocated2|1500006| 8000|localhost | 57637| 8000|localhost | 57638| 0| 0|move
```
Source:
https://app.circleci.com/pipelines/github/citusdata/citus/27707/workflows/c3ff4fc7-5068-4096-ab9f-803c941ddac0/jobs/824622/parallel-runs/30?filterBy=FAILED
This happened in only two of the tests. For now I commented these tests
out. I have some ideas on how to fix them, but those ideas require more
impactful changes than I would like in this PR. One of these tests also
had a copy-paste error, which I fixed in passing in the commented-out line.
This test used to contain some utility commands that Citus did not
support. However, we have since added support for most of these commands,
so this test became outdated.
We used to error out on community when a user attempted to use pooler
options. Now that we have open sourced all enterprise features, the test
can be removed.
Sometimes our CI randomly fails on a test in a way similar to this:
```diff
step s2-drop:
DROP TABLE cancel_table;
-
+ <waiting ...>
+step s2-drop: <... completed>
starting permutation: s1-timeout s1-begin s1-sleep10000 s1-rollback s1-reset s1-drop
```
Source:
https://app.circleci.com/pipelines/github/citusdata/citus/26524/workflows/5415b84f-13a3-482f-bef9-648314c79a67/jobs/756377
Another example of a failure like this:
```diff
stop_session_level_connection_to_node
-------------------------------------
(1 row)
step s3-display:
SELECT * FROM ref_table ORDER BY id, value;
SELECT * FROM dist_table ORDER BY id, value;
-
+ <waiting ...>
+step s3-display: <... completed>
id|value
--+-----
```
Source: https://app.circleci.com/pipelines/github/citusdata/citus/26551/workflows/91dca4b2-bb1c-4cae-b2ef-ce3f9c689ce5/jobs/757781
A step that shouldn't be blocked is temporarily detected as "waiting..."
and then gets unblocked automatically right after. I'm not
certain of the reason for this, but one explanation is that the
maintenance daemon is doing something that blocks the query. In the
shown case my hunch is that it could be the deferred shard deletion.
This PR disables all the features of the maintenance daemon during
isolation testing to try and prevent processes from randomly being
detected as blocked.
NOTE: I'm not certain that this will actually fix this issue. If the
issue persists even after this change, at least we know that it's not
the maintenance daemon that's blocking it.
For the sake of documentation, here is a failing diff:
```diff
step s2-view-dist:
SELECT query, citus_nodename_for_nodeid(citus_nodeid_for_gpid(global_pid)), citus_nodeport_for_nodeid(citus_nodeid_for_gpid(global_pid)), state, wait_event_type, wait_event, usename, datname FROM citus_dist_stat_activity WHERE query NOT ILIKE ALL(VALUES('%pg_prepared_xacts%'), ('%COMMIT%'), ('%BEGIN%'), ('%pg_catalog.pg_isolation_test_session_is_blocked%'), ('%citus_add_node%')) AND backend_type = 'client backend' ORDER BY query DESC;
query |citus_nodename_for_nodeid|citus_nodeport_for_nodeid|state |wait_event_type|wait_event|usename |datname
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------+-------------------------+-------------------+---------------+----------+--------+----------
ALTER TABLE test_table ADD COLUMN x INT;
|localhost | 57636|idle in transaction|Client |ClientRead|postgres|regression
-(1 row)
+
+ SELECT coalesce(to_jsonb(array_agg(csa_from_one_node.*)), '[{}]'::JSONB)
+ FROM (
+ SELECT global_pid, worker_query AS is_worker_query, pg_stat_activity.* FROM
+ pg_stat_activity LEFT JOIN get_all_active_transactions() ON process_id = pid
+ ) AS csa_from_one_node;
+ |localhost | 57638|active | | |postgres|regression
+(2 rows)
```
This failure can be seen at [this CI
run](https://app.circleci.com/pipelines/github/citusdata/citus/27653/workflows/d769701c-8f6e-4f97-a412-16f7b9b288a6/jobs/821416)
Update the test images from PG15beta4 to PG15rc1.
There is a new commit in 15rc1 that improves message styles. We also
update the messages accordingly.
Relevant PG commit:
[517484b5820e9e20057ff066b5df7d09cbb5f464](517484b582)
Depends on: https://github.com/citusdata/the-process/pull/93
PG15 now allows users to specify oids when creating databases. This
feature is a side effect of a bigger feature in pg_upgrade.
Relevant PG Commit:
pg_upgrade: Preserve database OIDs.
aa01051418f10afbdfa781b8dc109615ca785ff9
Depends on https://github.com/citusdata/the-process/pull/92
Closes: #6371
Updates test dependencies to not rely on a known vulnerable dependency
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
PG15 has suppressed some casts on constants when querying foreign
tables.
For example, we can use text to represent a type that's an enum on the
remote side.
A comparison on such a column will get shipped as "var = 'foo'::text".
But there's no enum = text operator on the remote side.
If we leave off the explicit cast, the comparison will work.
Test that we behave in the same way with a Citus foreign table.
Reminder: foreign tables cannot be distributed or reference tables; they
can only be Citus local tables.
Relevant PG commit:
f8abb0f5e1
PostgreSQL 15 had some changes to jsonpath to conform with ECMA-262
referenced by SQL standard. This commit adds tests to make sure Citus
also supports the same standards.
Relevant pg commit:
e26114c817b610424010cfbe91a743f591246ff1
In Split, Logical replication logic and ShardCleaner we call
`SendCommandListToWorkerOutsideTransaction` and
`SendOptionalCommandListToWorkerOutsideTransaction` frequently. Each of
those calls opens a new connection, even though we already
have a perfectly good connection lying around.
This PR adds two new APIs
`SendCommandListToWorkerOutsideTransactionWithConnection` and
`SendOptionalCommandListToWorkerOutsideTransactionWithConnection` that
allow sending a list of queries in a transaction over an existing
connection. We also update the callers (Split, ShardCleaner, Logical
Replication) to use these new APIs instead.
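To make the difference concrete, here is a toy Python model of sending a command list in a transaction over a connection the caller already holds. It is only a sketch: `FakeConnection` and `send_command_list_with_connection` are hypothetical names for illustration and do not mirror the signatures of the C functions named above.

```python
class FakeConnection:
    """Toy stand-in for a worker connection; counts how many get opened."""

    opened = 0

    def __init__(self):
        FakeConnection.opened += 1
        self.log = []

    def execute(self, sql):
        self.log.append(sql)


def send_command_list_with_connection(conn, commands):
    """Run all commands in one transaction over an existing connection."""
    conn.execute("BEGIN")
    try:
        for command in commands:
            conn.execute(command)
    except Exception:
        # Undo the partial work before reporting the failure to the caller.
        conn.execute("ROLLBACK")
        raise
    conn.execute("COMMIT")
```

A caller would create one connection and pass it to every call, so the number of opened connections stays at one no matter how many command lists are sent.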
Co-authored-by: Nitish Upreti <niupre@microsoft.com>
Co-authored-by: Onder Kalaci <onderkalaci@gmail.com>
In Citus 11.1.0 we changed the order of doing the initial data copy and
the replica identity creation when doing a non blocking shard move. This
was done to try and increase the speed with which shard moves could be
done. But after doing more extensive performance testing this change
turned out to have a negative impact on the speed of moves on the setups
that I tested.
Looking at the resource usage metrics of the VMs, the reason for this
seems to be that these shard moves were bottlenecked by disk bandwidth.
While creating replica identities in bulk after the initial copy will
reduce CPU usage a bit, it does require an additional sequential scan of
the just written data. So when a VM is bottlenecked on disk, it makes
sense to spend a little bit more CPU to avoid an additional scan,
especially since PKs are usually simple indexes that don't require lots
of CPU to update, as opposed to e.g. GiST indexes.
This reverts the order change to avoid a regression on shard move speed
in these cases.
For future releases we might consider re-evaluating our index creation
order for other indexes too, and create "simple" indexes before the
copy.
Given that we drop DEFAULT nextval('sequence') expressions from
shard relation columns, allowing `ON DELETE/UPDATE SET DEFAULT`
on such columns might cause inserting NULL values as a result
of a delete/update operation.
For this reason, we disallow ON DELETE/UPDATE SET DEFAULT actions
on columns that default to sequences.
DESCRIPTION: Disallows having ON DELETE/UPDATE SET DEFAULT actions on
columns that default to sequences
Fixes #6339.
As we did for GENERATED STORED columns in #4613, we should not drop
column default expressions that are not based on sequences from shard
relation since such expressions need to exist e.g. for foreign key
actions.
For the column default expressions that are based on sequences we cannot
do much, so we need to disallow having ON DELETE SET DEFAULT actions on
such columns in a separate PR, see #6339.
Fixes #6318.
DESCRIPTION: Fixes a bug that might cause inserting incorrect DEFAULT
values when applying foreign key actions
PG15 added support for security invoker views. Relevant PG commit:
7faa5fc84b
These views check permissions on the underlying tables as the view
invoker user, not as the view definer user.
When the view has underlying distributed tables, the queries to the
shards are sent by opening connections with the current user, which is
the view invoker, no matter what the type of the view is. This means
that, for distributed views, they were always behaving like security
invoker views. Check the following issue for more details:
https://github.com/citusdata/citus/issues/6161
So, Citus doesn't fully support security definer views.
However Citus does fully support security invoker views. We add tests to
make sure we cover different cases.
DESCRIPTION: Fixes dropping replication slots
As detected by a flaky test, Citus sometimes fails to drop replication
slots at the end of a shard split, possibly due to a race condition.
With this PR, we keep retrying the drop for up to 20 seconds when it fails
with an `OBJECT_IN_USE` error.
fixes: #6326
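The retry behaviour can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the C code from the PR: `ObjectInUseError` and `drop_with_retry` are hypothetical names, and the one second pause between attempts is an assumed value (only the 20 second limit comes from the description above).

```python
import time


class ObjectInUseError(Exception):
    """Hypothetical stand-in for a Postgres OBJECT_IN_USE error."""


def drop_with_retry(drop_slot, timeout_s=20.0, interval_s=1.0,
                    clock=time.monotonic, sleep=time.sleep):
    """Call drop_slot() until it succeeds or timeout_s has elapsed.

    Only ObjectInUseError is retried; any other error propagates at once.
    """
    deadline = clock() + timeout_s
    while True:
        try:
            drop_slot()
            return
        except ObjectInUseError:
            if clock() >= deadline:
                raise  # give up: the slot stayed busy for the whole window
            sleep(interval_s)
```

The `clock` and `sleep` parameters exist only so the loop can be exercised in a test without actually waiting.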
Both tests include pushdown and pull to coordinator type of aggregate
execution.
Relevant PG commits:
Add min() and max() aggregates for xid8
400fc6b6487ddf16aa82c9d76e5cfbe64d94f660
Add range_agg with multirange inputs
7ae1619bc5b1794938c7387a766b8cae34e38d8a
Co-authored-by: Onder Kalaci <onderkalaci@gmail.com>
DESCRIPTION: Improve logging during shard split and resource cleanup
### DESCRIPTION
This PR makes logging improvements to Shard Split:
1. Update confusing logging to fix #6312
2. Add new `ereport(LOG` calls to make debugging easier, as part of telemetry review.
Comment from the code is clear on this:
/*
* The statistics objects of the distributed table are not relevant
* for the distributed planning, so we can override it.
*
* Normally, we should not need this. However, the combination of
* Postgres commit 269b532aef55a579ae02a3e8e8df14101570dfd9 and
* Citus function AdjustPartitioningForDistributedPlanning()
* forces us to do this. The commit expects statistics objects
* of partitions to have "inh" flag set properly. Whereas, the
* function overrides "inh" flag. To avoid Postgres to throw error,
* we override statlist such that Postgres does not try to process
* any statistics objects during the standard_planner() on the
* coordinator. In the end, we do not need the standard_planner()
* on the coordinator to generate an optimized plan. We call
* into standard_planner() for other purposes, such as generating the
* relationRestrictionContext here.
*
* AdjustPartitioningForDistributedPlanning() is a hack that we use
* to prevent Postgres' standard_planner() to expand all the partitions
* for the distributed planning when a distributed partitioned table
* is queried. It is required for both correctness and performance
* reasons. Although we can eliminate the use of the function for
* the correctness (e.g., make sure that rest of the planner can handle
* partitions), its performance implication is hard to avoid. Certain
* planning logic of Citus (such as router or query pushdown) relies
* heavily on the relationRestrictionList. If
* AdjustPartitioningForDistributedPlanning() is removed, all the
* partitions show up in the relationRestrictionList, causing high
* planning times for such queries.
*/
DESCRIPTION: Fixes floating point exception during
create_distributed_table_concurrently.
Fixes #6332.
During create_distributed_table_concurrently, when there is no active
primary node, it fails with a floating point exception. We added a check
similar to the one in create_distributed_table: it now fails with a
proper error message if the number of active nodes is less than the
replication factor.
This PR introduces code changes to fix issue
[6303](https://github.com/citusdata/citus/issues/6303).
Running `create_distributed_table_concurrently` after dropping a column
creates a buggy situation in the split decoder.
* Consider the below scenario:
* Session1 : Drop column followed by create_distributed_table_concurrently
* Session2 : Concurrent insert workload
The child shards created by `create_distributed_table_concurrently` will
have fewer columns than the source shard because some columns were
dropped. The incoming tuple from session2 will have more columns, as the
writes happened on the source shard. But now the tuple needs to be applied
on the child shard. So we need to format the existing tuple according to
the child schema and skip dropped column values.
The PR fixes this by reformatting the tuple according to the target child
schema.
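The reformatting step can be pictured with a small Python sketch. It is a simplification: the real decoder works on Postgres tuple descriptors in C, and `reformat_tuple` plus the list of dropped flags are hypothetical names used only for illustration.

```python
def reformat_tuple(source_values, source_dropped):
    """Keep only the values whose source column still exists.

    source_values  -- values in source shard column order
    source_dropped -- parallel list of booleans, True for dropped columns
    """
    if len(source_values) != len(source_dropped):
        raise ValueError("values and dropped flags must have equal length")
    return [value
            for value, dropped in zip(source_values, source_dropped)
            if not dropped]


# Source shard still carries a slot for the dropped middle column.
source_tuple = [1, None, "alice"]
dropped_flags = [False, True, False]
child_tuple = reformat_tuple(source_tuple, dropped_flags)
print(child_tuple)
```

The resulting tuple has as many values as the child shard has columns, which is the shape the child shard expects.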
Test:
1) isolation_create_distributed_concurrently_after_drop_column - Repros
the issue and tests on the same.
No need for description, fixing issue introduced with new feature for
11.1
Fixes #6333
Due to Postgres' C API being 0-indexed and Postgres' attributes being
1-indexed, we were reading the wrong Datum as the Task owner when
cancelling. Here we add a test to show the error and fix the off-by-one
error.
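The class of bug is easy to reproduce outside of Postgres. The Python sketch below uses a made-up column layout (the attribute numbers and row contents are assumptions, not the real task catalog) to show how a 1-based attribute number used directly as an index into a 0-based array picks the neighbouring value.

```python
# Hypothetical 1-based attribute numbers, numbered the way Postgres
# numbers table attributes.
ATTNUM_JOB_ID = 1
ATTNUM_TASK_ID = 2
ATTNUM_OWNER = 3
ATTNUM_STATUS = 4


def get_attribute(values, attnum):
    """Look up an attribute in a 0-indexed array by its 1-based number."""
    return values[attnum - 1]


row = [42, 7, "task_owner", "running"]

wrong_value = row[ATTNUM_OWNER]                 # off by one: reads the status
right_value = get_attribute(row, ATTNUM_OWNER)  # subtracts 1 first
print(wrong_value, right_value)
```

Routing every lookup through a single helper that does the subtraction keeps the two numbering schemes from being mixed up at call sites.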
When I build Citus on PG15beta4 locally, I get a warning message.
```
utils/background_jobs.c:902:5: warning: declaration does not declare anything
[-Wmissing-declarations]
__attribute__((fallthrough));
^
1 warning generated.
```
`__attribute__((fallthrough))` is a hint to the compiler that we are
deliberately falling through in a switch-case block.
DESCRIPTION: Show citus_copy_shard_placement progress in
get_rebalance_progress
When rebalancing to a new node that does not have reference tables yet
the rebalancer will first copy the reference tables to the nodes.
Depending on the size of the reference tables, this might take a long
time. However, there's no indication of what's happening at this stage
of the rebalance.
This PR improves this situation by also showing the progress of any
citus_copy_shard_placement calls when calling get_rebalance_progress.
We can now do the following:
- Distribute sequence with logged/unlogged option
- ALTER TABLE my_sequence SET LOGGED/UNLOGGED
- ALTER SEQUENCE my_sequence SET LOGGED/UNLOGGED
Relevant PG commit
344d62fb9a
PG15 introduces `CLUSTER` commands for partitioned tables. Similar to a
`CLUSTER` command with no supplied table names, these commands also can
not be run inside transaction blocks and therefore can not be propagated
in a distributed transaction block with ease. Therefore we raise warnings.
Relevant PG commit: cfdd03f45e6afc632fbe70519250ec19167d6765
DESCRIPTION: Add a rebalancer that uses background tasks for its
execution
Based on the background jobs and tasks introduced in #6296 we implement
a new rebalancer on top of the primitives of background execution. This
allows the user to initiate a rebalance and let Citus execute the long
running steps in the background until completion.
Users can invoke the new background rebalancer with `SELECT
citus_rebalance_start();`. It will output information on its job id and
how to track progress. Also it returns its job id for automation
purposes. If you simply want to wait until the rebalance is done you can
use `SELECT citus_rebalance_wait();`.
A running rebalance can be cancelled/stopped with `SELECT
citus_rebalance_stop();`.
Reverting the following commits:
977ddaae564a5cf06def9ae19c181f30447117e5f9c43f433221dba4ed08262932da3e
We have to manually make changes to this file.
Follow the relevant PG commit in ruleutils.c & make the exact same changes in ruleutils_15.c
Relevant PG commit:
96ef3237bf741c12390003e90a4d7115c0c854b7
PG 15 added support for that (d6f96ed94e73052f99a2e545ed17a8b2fdc1fb8a).
We also add support, but we already do not support ON DELETE SET
NULL/DEFAULT for distribution column. So, in essence, we add support for
reference tables and Citus local tables.
Semi-related: We should really consider fixing:
https://github.com/citusdata/citus/issues/6318
The logical replication catchup part for shard splits and shard moves is
very similar. This abstracts most of that similarity away into a single
function. This also improves the logic for non blocking shard splits a
bit, by using faster foreign key creation. It also parallelizes index creation
for shard splits, which shard moves were already doing.