Description:
Implementing CDC changes using Logical Replication to avoid
re-publishing events multiple times by setting up replication origin
session, which will add "DoNotReplicateId" to every WAL entry.
- shard splits
- shard moves
- create distributed table
- undistribute table
- alter distributed tables (for some cases)
- reference table operations
The citus decoder which will be decoding WAL events for CDC clients,
ignores any WAL entry with replication origin that is not zero.
It also maps the shard names to distributed table names.
Today we allow planning the queries that reference non-colocated tables
if the shards that query targets are placed on the same node. However,
this may not be the case, e.g., after rebalancing shards because it's
not guaranteed to have those shards on the same node anymore.
This commit adds citus.enable_non_colocated_router_query_pushdown GUC
that can be used to disallow planning such queries via router planner,
when it's set to false. Note that the default value for this GUC will be
"true" for 11.3, but we will alter it to "false" on 12.0 to not
introduce
a breaking change in a minor release.
Closes#692.
Even more, allowing such queries to go through router planner also
causes
generating an incorrect plan for the DML queries that reference
distributed
tables that are sharded based on different replication factor settings.
For
this reason, #6779 can be closed after altering the default value for
this
GUC to "false", hence not now.
DESCRIPTION: Adds `citus.enable_non_colocated_router_query_pushdown` GUC
to ensure generating a consistent distributed plan for the queries that
reference non-colocated distributed tables (when set to "false", the
default is "true").
Soon I will be doing some changes related to #692 in router planner
and those changes require updating ~5/6 tests related to router
planning. And to make those test files runnable by run_test.py
multiple times, we need to make some other tests (that they're
run in parallel / they badly depend on) ready for run_test.py too.
This would be useful for testing #6773. This is because, given that
#6773
only adds support for router / fast-path queries, theoretically almost
all
the tests that we have in that test file should work for null-shard-key
tables too (and they indeed do).
I deliberately did not replace multi_router_planner_fast_path.sql with
the one that I'm adding into arbitrary configs because we might still
want to see when we're able to go through fast-path planning for the
usual distributed tables (the ones that have a shard key).
DESCRIPTION: Check before logicalrep for rebalancer, error if needed
Check if we can use logical replication or not, in case of shard
transfer mode = auto, before executing the shard moves. If we can't,
error out. Before this PR, we used to error out in the middle of shard
moves:
```sql
set citus.shard_count = 4; -- just to get the error sooner
select citus_remove_node('localhost',9702);
create table t1 (a int primary key);
select create_distributed_table('t1','a');
create table t2 (a bigint);
select create_distributed_table('t2','a');
select citus_add_node('localhost',9702);
select rebalance_table_shards();
NOTICE: Moving shard 102008 from localhost:9701 to localhost:9702 ...
NOTICE: Moving shard 102009 from localhost:9701 to localhost:9702 ...
NOTICE: Moving shard 102012 from localhost:9701 to localhost:9702 ...
ERROR: cannot use logical replication to transfer shards of the relation t2 since it doesn't have a REPLICA IDENTITY or PRIMARY KEY
```
Now we check and error out in the beginning, without moving the shards.
fixes: #6727
Fixes#6672
2) Move all MERGE related routines to a new file merge_planner.c
3) Make ConjunctionContainsColumnFilter() static again, and rearrange the code in MergeQuerySupported()
4) Restore the original format in the comments section.
5) Add big serial test. Implement latest set of comments
This implements the phase - II of MERGE sql support
Support routable query where all the tables in the merge-sql are distributed, co-located, and both the source and
target relations are joined on the distribution column with a constant qual. This should be a Citus single-task
query. Below is an example.
SELECT create_distributed_table('t1', 'id');
SELECT create_distributed_table('s1', 'id', colocate_with => ‘t1’);
MERGE INTO t1
USING s1 ON t1.id = s1.id AND t1.id = 100
WHEN MATCHED THEN
UPDATE SET val = s1.val + 10
WHEN MATCHED THEN
DELETE
WHEN NOT MATCHED THEN
INSERT (id, val, src) VALUES (s1.id, s1.val, s1.src)
Basically, MERGE checks to see if
There are a minimum of two distributed tables (source and a target).
All the distributed tables are indeed colocated.
MERGE relations are joined on the distribution column
MERGE .. USING .. ON target.dist_key = source.dist_key
The query should touch only a single shard i.e. JOIN AND with a constant qual
MERGE .. USING .. ON target.dist_key = source.dist_key AND target.dist_key = <>
If any of the conditions are not met, it raises an exception.
(cherry picked from commit 44c387b978)
This implements MERGE phase3
Support pushdown query where all the tables in the merge-sql are Citus-distributed, co-located, and both
the source and target relations are joined on the distribution column. This will generate multiple tasks
which execute independently after pushdown.
SELECT create_distributed_table('t1', 'id');
SELECT create_distributed_table('s1', 'id', colocate_with => ‘t1’);
MERGE INTO t1
USING s1
ON t1.id = s1.id
WHEN MATCHED THEN
UPDATE SET val = s1.val + 10
WHEN MATCHED THEN
DELETE
WHEN NOT MATCHED THEN
INSERT (id, val, src) VALUES (s1.id, s1.val, s1.src)
*The only exception for both the phases II and III is, UPDATEs and INSERTs must be done on the same shard-group
as the joined key; for example, below scenarios are NOT supported as the key-value to be inserted/updated is not
guaranteed to be on the same node as the id distribution-column.
MERGE INTO target t
USING source s ON (t.customer_id = s.customer_id)
WHEN NOT MATCHED THEN - -
INSERT(customer_id, …) VALUES (<non-local-constant-key-value>, ……);
OR this scenario where we update the distribution column itself
MERGE INTO target t
USING source s On (t.customer_id = s.customer_id)
WHEN MATCHED THEN
UPDATE SET customer_id = 100;
(cherry picked from commit fa7b8949a8)
In the past, having columnar tables in the cluster was causing pg
upgrades to fail when attempting to access columnar metadata. This is
because, pg_dump doesn't see objects that we use for columnar-am related
booking as the dependencies of the tables using columnar-am.
To fix that; in #5456, we inserted some "normal dependency" edges (from
those objects to columnar-am) into pg_depend.
This helped us ensuring the existency of a class of metadata objects
--such as columnar.storageid_seq-- and helped fixing #5437.
However, the normal-dependency edges that we added for indexes on
columnar metadata tables --such columnar.stripe_pkey-- didn't help at
all because they were indeed causing dependency loops (#5510) and
pg_dump was not able to take those dependency edges into the account.
For this reason, this commit deletes those dependency edges so that
pg_dump stops complaining about them. Note that it's not critical to
delete those edges from pg_depend since they're not breaking pg upgrades
but were triggering some warning messages. And given that backporting
a sql change into older versions is hard a lot, we skip backporting
this.
This commit hides port numbers in upgrade_columnar_after because the
port numbers assigned to nodes in upgrade schedule differ from the ones
that flaky test detector assigns.
DESCRIPTION: Fixes a bug in shard copy operations.
For copying shards in both shard move and shard split operations, Citus
uses the COPY statement.
A COPY all statement in the following form
` COPY target_shard FROM STDIN;`
throws an error when there is a GENERATED column in the shard table.
In order to fix this issue, we need to exclude the GENERATED columns in
the COPY and the matching SELECT statements. Hence this fix converts the
COPY and SELECT all statements to the following form:
```
COPY target_shard (col1, col2, ..., coln) FROM STDIN;
SELECT (col1, col2, ..., coln) FROM source_shard;
```
where (col1, col2, ..., coln) does not include a GENERATED column.
GENERATED column values are created in the target_shard as the values
are inserted.
Fixes#6705.
---------
Co-authored-by: Teja Mupparti <temuppar@microsoft.com>
Co-authored-by: aykut-bozkurt <51649454+aykut-bozkurt@users.noreply.github.com>
Co-authored-by: Jelte Fennema <jelte.fennema@microsoft.com>
Co-authored-by: Gürkan İndibay <gindibay@microsoft.com>
DESCRIPTION: Adds logic to distribute unbalanced shards
If the number of shard placements (for a colocation group) is less than
the number of workers, it means that some of the workers will remain
empty. With this PR, we consider these shard groups as a colocation
group, in order to make them be distributed evenly as much as possible
across the cluster.
Example:
```sql
create table t1 (a int primary key);
create table t2 (a int primary key);
create table t3 (a int primary key);
set citus.shard_count =1;
select create_distributed_table('t1','a');
select create_distributed_table('t2','a',colocate_with=>'t1');
select create_distributed_table('t3','a',colocate_with=>'t2');
create table tb1 (a bigint);
create table tb2 (a bigint);
select create_distributed_table('tb1','a');
select create_distributed_table('tb2','a',colocate_with=>'tb1');
select citus_add_node('localhost',9702);
select rebalance_table_shards();
```
Here we have two colocation groups, each with one shard group. Both
shard groups are placed on the first worker node. When we add a new
worker node and try to rebalance table shards, the rebalance planner
considers it well balanced and does nothing. With this PR, the
rebalancer tries to distribute these shard groups evenly across the
cluster as much as possible. For this example, with this PR, the
rebalancer moves one of the shard groups to the second worker node.
fixes: #6715
DESCRIPTION: Correctly report shard size in citus_shards view
When looking at citus_shards, people are interested in the actual size
that all the data related to the shard takes up on disk.
`pg_total_relation_size` is the function to use for that purpose. The
previously used `pg_relation_size` does not include indexes or TOAST.
Especially the missing toast can have enormous impact on the size of the
shown data.
First of all, this commit sets next_shard_id for
single_node_truncate.sql because shard ids in the test output were
changing whenever we modify a prior test file.
Then the flaky test detector started complaining about
single_node_truncate.sql. We fix that by specifying the correct
test dependency for it in run_test.py.
O Simple fix is to add ORDER BY to have definitive results.
O Add search_path explicitly after reconnecting, this avoids creating objects in public schema
which prevents us from repetitive running of tests.
O multi_mx_modification is not designed to run repetitive, so isolate it.
The failure_create_distributed_table_non_empty test would sometimes fail
like this:
```diff
-- in the first test, cancel the first connection we sent from the coordinator
SELECT citus.mitmproxy('conn.cancel(' || pg_backend_pid() || ')');
- mitmproxy
----------------------------------------------------------------------
-
-(1 row)
-
+ERROR: canceling statement due to user request
+CONTEXT: COPY mitmproxy_result, line 0
+SQL statement "COPY mitmproxy_result FROM '/home/circleci/project/src/test/regress/tmp_check/mitmproxy.fifo'"
+PL/pgSQL function citus.mitmproxy(text) line 11 at EXECUTE
SELECT create_distributed_table('test_table', 'id');
```
Source:
https://app.circleci.com/pipelines/github/citusdata/citus/30474/workflows/be1c9f9d-22c9-465c-964a-dcdd1cb8c99c/jobs/985441
Because the cancel command had no filter it would actually sometimes
cancel the mitmproxy cancel command itself. This PR addresses that by
simply removing this test.
This is basically the exact same issue as #6217, only in a different
place in the file. It's fixed here by removing the test since there's
already many different similar tests.
DESCRIPTION: Fix background rebalance when reference table has no PK
For the background rebalance we would always fail if a reference table
that was not replicated to all nodes would not have a PK (or replica
identity). Even when we used force_logical or block_writes as the shard
transfer mode. This fixes that and adds some regression tests.
Fixes#6680
We should disallow dropping table_name option if foreign table is in
metadata. Otherwise, we get table not found error which contains
shardid.
DESCRIPTION: Fixes an unexpected foreign table error by disallowing to drop the table_name option.
Fixes#6663
Fixes#6570.
In the past, having columnar tables in the cluster was causing pg
upgrades to fail when attempting to access columnar metadata. This is
because, pg_dump doesn't see objects that we use for columnar-am related
booking as the dependencies of the tables using columnar-am.
To fix that; in #5456, we inserted some "normal dependency" edges (from
those objects to columnar-am) into pg_depend.
This helped us ensuring the existency of a class of metadata objects
--such as columnar.storageid_seq-- and helped fixing #5437.
However, the normal-dependency edges that we added for indexes on
columnar metadata tables --such columnar.stripe_pkey-- didn't help at
all because they were indeed causing dependency loops (#5510) and
pg_dump was not able to take those dependency edges into the account.
For this reason, instead of inserting such dependency edges from indexes
to columnar-am, we allow columnar metadata accessors to fall-back to
sequential scan during pg upgrades.
Sometimes isolation_non_blocking_shard_split would fail like this:
```diff
step s2-show-pg_dist_cleanup:
SELECT object_name, object_type, policy_type FROM pg_dist_cleanup;
object_name |object_type|policy_type
------------------------------+-----------+-----------
+citus_shard_split_slot_2_10_39| 3| 0
public.to_split_table_1500001 | 1| 2
-(1 row)
+(2 rows)
```
Source:
https://app.circleci.com/pipelines/github/citusdata/citus/30237/workflows/edcf34b7-d7d3-4d10-8293-b6f59b00cdf2/jobs/970960
The reason is that replication slots have now become part of
pg_dist_cleanup too, and sometimes they cannot be cleaned up right away.
This is harmless as they will be cleaned up eventually. So this simply
filters out the replication slots for those tests.
Recursive planner should handle all the tree from bottom to top at
single pass. i.e. It should have already recursively planned all
required parts in its first pass. Otherwise, this means we have bug at
recursive planner, which needs to be handled. We add a check here and
return error.
DESCRIPTION: Fixes wrong results by throwing error in case recursive
planner multipass the query.
We found 3 different cases which causes recursive planner passes the
query multiple times.
1. Sublink in WHERE clause is planned at second pass after we
recursively planned a distributed table at the first pass. Fixed by PR
#6657.
2. Local-distributed joins are recursively planned at both the first and
the second pass. Issue #6659.
3. Some parts of the query is considered to be noncolocated at the
second pass as we do not generate attribute equivalances between
nondistributed and distributed tables. Issue #6653
DESCRIPTION: Fix foreign key validation skip at the end of shard move
In eadc88a we started completely skipping foreign key constraint
validation at the end of a non blocking shard move, instead of only for
foreign keys to reference tables. However, it turns out that this didn't
work at all because of a hard to notice bug: By resetting the
SkipConstraintValidation flag at the end of our utility hook, we
actually make the SET command that sets it a no-op.
This fixes that bug by removing the code that resets it. This is fine
because #6543 removed the only place where we set the flag in C code. So
the resetting of the flag has no purpose anymore. This PR also adds a
regression test, because it turned out we didn't have any otherwise we
would have caught that the feature was completely broken.
It also moves the constraint validation skipping to the utility hook.
The reason is that #6550 showed us that this is the better place to skip
it, because it will also skip the planning phase and not just the
execution.
We should do the sublink conversations at the end of the recursive
planning because earlier steps might have transformed the query into a
shape that needs recursively planning the sublinks.
DESCRIPTION: Fixes early sublink check at recursive planner.
Related to PR https://github.com/citusdata/citus/pull/6650
This change allows creating a constraint without a name using an index.
The index name will be used as the constraint name the same way postgres
handles it.
Fixes issue #6644
This commit also cleans up some leftovers from nameless constraint checks.
With this commit, we now fully support adding all nameless constraints
directly to a table.
Co-authored-by: naisila <nicypp@gmail.com>
Adds NOT VALID option to deparser. When we need to deparse:
"ALTER TABLE ADD FOREIGN KEY ... NOT VALID"
"ALTER TABLE ADD CHECK ... NOT VALID"
NOT VALID option should be propagated to workers.
Fixes issue #6646
This commit also uses AppendColumnNameList function
instead of repeated code blocks in two appropriate places
in the "ALTER TABLE" deparser.
If an update query on a reference table has a returns clause with a
subquery that accesses some other local table, we end-up with an crash.
This commit prevents the crash, but does not prevent other error
messages from happening due to Citus not being able to pushdown the
results of that subquery in a valid SQL command.
Related: #6634
DESCRIPTION: Fix regression in allowed foreign keys on distributed
tables
In commit eadc88a we changed how we skip foreign key validation. The
goal was to skip it in more cases. However, one change had the
unintended regression of introducing failures when trying to create
certain foreign keys. This reverts that part of the change.
The way of skipping validation of foreign keys that was introduced in
eadc88a was skipping validation during execution. The reason that
this caused this regression was because some foreign key validation
queries already fail during planning. In those cases it never gets to
the execution step where it would later be skipped.
Fixes#6543
DESCRIPTION: Enable adding FOREIGN KEY constraints on Citus tables
without a name
This PR enables adding a foreign key to a distributed/reference/Citus
local table without specifying the name of the constraint, e.g. `ALTER
TABLE items ADD FOREIGN KEY (user_id) REFERENCES users (id);`
This implements the phase - II of MERGE sql support
Support routable query where all the tables in the merge-sql are distributed, co-located, and both the source and
target relations are joined on the distribution column with a constant qual. This should be a Citus single-task
query. Below is an example.
SELECT create_distributed_table('t1', 'id');
SELECT create_distributed_table('s1', 'id', colocate_with => ‘t1’);
MERGE INTO t1
USING s1 ON t1.id = s1.id AND t1.id = 100
WHEN MATCHED THEN
UPDATE SET val = s1.val + 10
WHEN MATCHED THEN
DELETE
WHEN NOT MATCHED THEN
INSERT (id, val, src) VALUES (s1.id, s1.val, s1.src)
Basically, MERGE checks to see if
There are a minimum of two distributed tables (source and a target).
All the distributed tables are indeed colocated.
MERGE relations are joined on the distribution column
MERGE .. USING .. ON target.dist_key = source.dist_key
The query should touch only a single shard i.e. JOIN AND with a constant qual
MERGE .. USING .. ON target.dist_key = source.dist_key AND target.dist_key = <>
If any of the conditions are not met, it raises an exception.
citus_job_list() lists all background jobs by simply showing the records
in pg_dist_background_job.
citus_job_status(job_id bigint, raw boolean default false) shows the
status of a single background job by appending a jsonb details column to
the associated row from pg_dist_background_job. If the raw argument is
set, machine readable sizes are used instead of human readable
alternatives.
citus_rebalance_status(raw boolean default false) shows the status of
the last rebalance operation. If the raw argument is set, machine
readable sizes are used instead of human readable alternatives.
DESCRIPTION: Enable adding CHECK constraints on distributed tables
without the client having to provide a constraint name.
This PR enables the following command syntax for adding check
constraints to distributed tables.
ALTER TABLE ... ADD CHECK ...
by creating a default constraint name and transforming the command into
the below syntax before sending it to workers.
ALTER TABLE ... ADD CONSTRAINT \<conname> CHECK ...