Commit Graph

2791 Commits (5027321af006c9ee6e696c1b038b15bf4e947bd2)

Author SHA1 Message Date
aykutbozkurt a74232bb39 PR #6728  / commit - 9
Do not enforce distributed transaction at `EnsureCoordinatorInitiatedOperation`.
2023-03-30 10:53:22 +03:00
aykutbozkurt 1fb3de14df PR #6728  / commit - 6
Let `activate_node_snapshot` use new metadata sync api.
2023-03-30 10:53:22 +03:00
aykutbozkurt bc25ba51c3 PR #6728  / commit - 5
Let `ActivateNode` use new metadata sync api.
2023-03-30 10:53:22 +03:00
aykutbozkurt 29ef9117e6 PR #6728  / commit - 4
Add new metadata sync methods which uses MemorySyncContext api so that during the sync we can
- free memory to prevent OOM,
- use either transactional or nontransactional modes according to the GUC .
2023-03-30 10:53:22 +03:00
aykutbozkurt 8feb8c634a PR #6728  / commit - 3
Let nontransactional sync mode create transaction per shell table during dropping the shell tables from worker.
2023-03-30 10:53:20 +03:00
Gokhan Gulbiz e71bfd6074
Identity column implementation refactorings (#6738)
This pull request proposes a change to the logic used for propagating
identity columns to worker nodes in citus. Instead of creating a
dependent sequence for each identity column and changing its default
value to `nextval(seq)/worker_nextval(seq)`, this update will pass the
identity columns as-is to the worker nodes.

Please note that there are a few limitations to this change. 

1. Only bigint identity columns will be allowed in distributed tables to
ensure compatibility with the DDL from any node functionality. Our
current distributed sequence implementation only allows insert
statements from all nodes for bigint sequences.
2. `alter_distributed_table` and `undistribute_table` operations will
not be allowed for tables with identity columns. This is because we do
not have a proper way of keeping sequence states consistent across the
cluster.

DESCRIPTION: Prevents using identity columns on data types other than
`bigint` on distributed tables
DESCRIPTION: Prevents using `alter_distributed_table` and
`undistribute_table` UDFs when a table has identity columns
DESCRIPTION: Fixes a bug that prevents enforcing identity column
restrictions on worker nodes

Depends on #6740
Fixes #6694
2023-03-30 10:41:01 +03:00
Emel Şimşek d3fb9288ab
Schedule parallel shard moves in background rebalancer by removing task dependencies between shard moves across colocation groups. (#6756)
DESCRIPTION: This PR removes the task dependencies between shard moves
for which the shards belong to different colocation groups. This change
results in scheduling multiple tasks in the RUNNABLE state. Therefore it
is possible that the background task monitor can run them concurrently.

Previously, all the shard moves planned in a rebalance operation took
dependency on each other sequentially.
For instance, given the following table and shards 

colocation group 1 colocation group 2
table1 table2 table3 table4 table 5
shard11 shard21 shard31 shard41 shard51
shard12 shard22 shard32 shard42 shard52
  
 if the rebalancer planner returned the below set of moves 
` {move(shard11), move(shard12), move(shard41), move(shard42)}`

background rebalancer scheduled them such that they depend on each other
sequentially.
```
      {move(reftables) if there is any, none}
               |
      move( shard11)
               |
      move(shard12)
               |                {move(shard41)<--- move(shard12)} This is an artificial dependency  
      move(shard41)
               |
      move(shard42) 

```
This results in artificial dependencies between otherwise independent
moves.

Considering that the shards in different colocation groups can be moved
concurrently, this PR changes the dependency relationship between the
moves as follows:

```
      {move(reftables) if there is any, none}          {move(reftables) if there is any, none}     
               |                                                            |
      move(shard11)                                                  move(shard41)
               |                                                            |
      move(shard12)                                                   move(shard42) 
   
```

---------

Co-authored-by: Jelte Fennema <jelte.fennema@microsoft.com>
2023-03-29 22:03:37 +03:00
Marco Slot e5fd1c3a87 Fix TAP tests after CREATE PUBLICATION changes 2023-03-29 00:59:12 +02:00
Marco Slot 8ad444f8ef Hide shards from CDC subscriptions 2023-03-29 00:59:12 +02:00
Marco Slot b09d239809 Propagate CREATE PUBLICATION statements 2023-03-29 00:59:12 +02:00
Gokhan Gulbiz e618345703
Handle identity columns properly in the router planner (#6802)
DESCRIPTION: Fixes a bug with insert..select queries with identity
columns
Fixes #6798
2023-03-29 15:50:12 +03:00
Teja Mupparti 37500806d6 Add appropriate locks for MERGE to run in parallel 2023-03-28 09:45:40 -07:00
rajeshkt78 85b8a2c7a1
CDC implementation for Citus using Logical Replication (#6623)
Description:
Implementing CDC changes using Logical Replication to avoid
re-publishing events multiple times by setting up replication origin
session, which will add "DoNotReplicateId" to every WAL entry.
   - shard splits
   - shard moves
   - create distributed table
   - undistribute table
   - alter distributed tables (for some cases)
   - reference table operations
   

The citus decoder which will be decoding WAL events for CDC clients, 
ignores any WAL entry with replication origin that is not zero.
It also maps the shard names to distributed table names.
2023-03-28 16:00:21 +05:30
Onur Tirtir 616b5018a0
Add a GUC to disallow planning the queries that reference non-colocated tables via router planner (#6793)
Today we allow planning the queries that reference non-colocated tables
if the shards that query targets are placed on the same node. However,
this may not be the case, e.g., after rebalancing shards because it's
not guaranteed to have those shards on the same node anymore.
This commit adds citus.enable_non_colocated_router_query_pushdown GUC
that can be used to disallow  planning such queries via router planner,
when it's set to false. Note that the default value for this GUC will be
"true" for 11.3, but we will alter it to "false" on 12.0 to not
introduce
a breaking change in a minor release.

Closes #692.

Even more, allowing such queries to go through router planner also
causes
generating an incorrect plan for the DML queries that reference
distributed
tables that are sharded based on different replication factor settings.
For
this reason, #6779 can be closed after altering the default value for
this
GUC to "false", hence not now.

DESCRIPTION: Adds `citus.enable_non_colocated_router_query_pushdown` GUC
to ensure generating a consistent distributed plan for the queries that
reference non-colocated distributed tables (when set to "false", the
default is "true").
2023-03-28 13:10:29 +03:00
Teja Mupparti 9bab819f26 Disentangle MERGE planning code from the modify-planning code path 2023-03-27 10:41:46 -07:00
Onur Tirtir 372a93b529
Make 8 more tests runnable multiple times via run_test.py (#6791)
Soon I will be doing some changes related to #692 in router planner
and those changes require updating ~5/6 tests related to router
planning. And to make those test files runnable by run_test.py
multiple times, we need to make some other tests (that they're
run in parallel / they badly depend on) ready for run_test.py too.
2023-03-27 12:19:06 +03:00
Onur Tirtir 4960ced175
Add an arbitrary config test heavily based on multi_router_planner_fast_path.sql (#6782)
This would be useful for testing #6773. This is because, given that
#6773
only adds support for router / fast-path queries, theoretically almost
all
the tests that we have in that test file should work for null-shard-key
tables too (and they indeed do).

I deliberately did not replace multi_router_planner_fast_path.sql with
the one that I'm adding into arbitrary configs because we might still
want to see when we're able to go through fast-path planning for the
usual distributed tables (the ones that have a shard key).
2023-03-22 10:49:08 +03:00
Ahmet Gedemenli 2713e015d6
Check before logicalrep for rebalancer, error if needed (#6754)
DESCRIPTION: Check before logicalrep for rebalancer, error if needed

Check if we can use logical replication or not, in case of shard
transfer mode = auto, before executing the shard moves. If we can't,
error out. Before this PR, we used to error out in the middle of shard
moves:
```sql
set citus.shard_count = 4; -- just to get the error sooner
select citus_remove_node('localhost',9702);

create table t1 (a int primary key);
select create_distributed_table('t1','a');
create table t2 (a bigint);
select create_distributed_table('t2','a');

select citus_add_node('localhost',9702);
select rebalance_table_shards();
NOTICE:  Moving shard 102008 from localhost:9701 to localhost:9702 ...
NOTICE:  Moving shard 102009 from localhost:9701 to localhost:9702 ...
NOTICE:  Moving shard 102012 from localhost:9701 to localhost:9702 ...
ERROR:  cannot use logical replication to transfer shards of the relation t2 since it doesn't have a REPLICA IDENTITY or PRIMARY KEY
```

Now we check and error out in the beginning, without moving the shards.

fixes: #6727
2023-03-21 16:34:52 +03:00
aykut-bozkurt aa33988c6e
fix pip lock file (#6766)
ci/fix_styles.sh were complaining about `black` and `isort` packages are
not found even if I `pipenv install --dev` due to broken lock file. I
regenerated the lock file and now it works fine. We also wanted to
upgrade required python version for the pipfile.
2023-03-21 00:58:12 +03:00
aykut-bozkurt ea3093bdb6
Make workerCount configurable for regression tests (#6764)
Make worker count flexible in our regression tests instead of hardcoding
it to 2 workers.
2023-03-20 12:06:31 +03:00
Teja Mupparti cf55136281 1) Restrict MERGE command INSERT to the source's distribution column
Fixes #6672

2) Move all MERGE related routines to a new file merge_planner.c

3) Make ConjunctionContainsColumnFilter() static again, and rearrange the code in MergeQuerySupported()
4) Restore the original format in the comments section.
5) Add big serial test. Implement latest set of comments
2023-03-16 13:43:08 -07:00
Teja Mupparti 1e42cd3da0 Support MERGE on distributed tables with restrictions
This implements the phase - II of MERGE sql support

Support routable query where all the tables in the merge-sql are distributed, co-located, and both the source and
target relations are joined on the distribution column with a constant qual. This should be a Citus single-task
query. Below is an example.

SELECT create_distributed_table('t1', 'id');
SELECT create_distributed_table('s1', 'id', colocate_with => ‘t1’);

MERGE INTO t1
USING s1 ON t1.id = s1.id AND t1.id = 100
WHEN MATCHED THEN
UPDATE SET val = s1.val + 10
WHEN MATCHED THEN
DELETE
WHEN NOT MATCHED THEN
INSERT (id, val, src) VALUES (s1.id, s1.val, s1.src)

Basically, MERGE checks to see if

There are a minimum of two distributed tables (source and a target).
All the distributed tables are indeed colocated.
MERGE relations are joined on the distribution column
MERGE .. USING .. ON target.dist_key = source.dist_key
The query should touch only a single shard i.e. JOIN AND with a constant qual
MERGE .. USING .. ON target.dist_key = source.dist_key AND target.dist_key = <>
If any of the conditions are not met, it raises an exception.

(cherry picked from commit 44c387b978)

This implements MERGE phase3

Support pushdown query where all the tables in the merge-sql are Citus-distributed, co-located, and both
the source and target relations are joined on the distribution column. This will generate multiple tasks
which execute independently after pushdown.

SELECT create_distributed_table('t1', 'id');
SELECT create_distributed_table('s1', 'id', colocate_with => ‘t1’);

MERGE INTO t1
USING s1
ON t1.id = s1.id
        WHEN MATCHED THEN
                UPDATE SET val = s1.val + 10
        WHEN MATCHED THEN
                DELETE
        WHEN NOT MATCHED THEN
                INSERT (id, val, src) VALUES (s1.id, s1.val, s1.src)

*The only exception for both the phases II and III is, UPDATEs and INSERTs must be done on the same shard-group
as the joined key; for example, below scenarios are NOT supported as the key-value to be inserted/updated is not
guaranteed to be on the same node as the id distribution-column.

MERGE INTO target t
USING source s ON (t.customer_id = s.customer_id)
WHEN NOT MATCHED THEN - -
     INSERT(customer_id, …) VALUES (<non-local-constant-key-value>, ……);

OR this scenario where we update the distribution column itself

MERGE INTO target t
USING source s On (t.customer_id = s.customer_id)
WHEN MATCHED THEN
     UPDATE SET customer_id = 100;

(cherry picked from commit fa7b8949a8)
2023-03-16 13:43:08 -07:00
Jelte Fennema b8b85072d6
Add pytest depedencies to Pipfile (#6767)
In #6720 I'm adding a `pytest` based testing framework. This adds the
dependencies for those. They have already been [merged into our docker
files][the-process-merge] in the the-process repo preparation for #6720.
But by not having them on our citus main branch it is impossible to
make changes to the Pipfile, because our CI Dockerfiles and master
are out of date.

Since #6720 will need some more discussion and might take a few more
weeks to be merged, this takes out the Pipfile changes. By merging this
PR we can unblock new Pipfile changes.

Unblocks and partially addresses #6766 

[the-process-merge]: https://github.com/citusdata/the-process/pull/117
2023-03-15 14:53:14 +01:00
Onur Tirtir 9550ebd118 Remove pg_depend entries from columnar metadata indexes to columnar-am
In the past, having columnar tables in the cluster was causing pg
upgrades to fail when attempting to access columnar metadata. This is
because, pg_dump doesn't see objects that we use for columnar-am related
booking as the dependencies of the tables using columnar-am.
To fix that; in #5456, we inserted some "normal dependency" edges (from
those objects to columnar-am) into pg_depend.

This helped us ensuring the existency of a class of metadata objects
--such as columnar.storageid_seq-- and helped fixing #5437.

However, the normal-dependency edges that we added for indexes on
columnar metadata tables --such columnar.stripe_pkey-- didn't help at
all because they were indeed causing dependency loops (#5510) and
pg_dump was not able to take those dependency edges into the account.

For this reason, this commit deletes those dependency edges so that
pg_dump stops complaining about them. Note that it's not critical to
delete those edges from pg_depend since they're not breaking pg upgrades
but were triggering some warning messages. And given that backporting
a sql change into older versions is hard a lot, we skip backporting
this.
2023-03-14 17:13:52 +03:00
Onur Tirtir 2b4be535de Do clean-up before upgrade_columnar_before to make it runnable multiple times
So that flaky test detector can run upgrade_columnar_before.sql multiple
times.
2023-03-14 17:13:52 +03:00
Onur Tirtir 994f67185f Make upgrade_columnar_after runnable multiple times
This commit hides port numbers in upgrade_columnar_after because the
port numbers assigned to nodes in upgrade schedule differ from the ones
that flaky test detector assigns.
2023-03-14 17:13:52 +03:00
Onur Tirtir 821f26cc74 Fix flaky test detection for upgrade tests
When run_test.py is run for an upgrade_.*_after.sql then, then
automatically run the corresponding uprade_.*_before.sql file first.
This is because all those upgrade_.*_after.sql files depend on the
objects created in upgrade_.*_before.sql files by definition.
2023-03-14 17:13:52 +03:00
Onur Tirtir cc945fa331
Add multi_create_fdw into minimal_schedule (#6759)
So that we can run the tests that require fake_fdw by using minimal
schedule too.

Also move multi_create_fdw.sql up in multi_1_schedule to make it
available to more tests.
2023-03-14 10:22:34 +03:00
Emel Şimşek 4043abd5aa
Exclude-Generated-Columns-In-Copy (#6721)
DESCRIPTION: Fixes a bug in shard copy operations.

For copying shards in both shard move and shard split operations, Citus
uses the COPY statement.

A COPY all statement in the following form
` COPY target_shard FROM STDIN;`
throws an error when there is a GENERATED column in the shard table.

In order to fix this issue, we need to exclude the GENERATED columns in
the COPY and the matching SELECT statements. Hence this fix converts the
COPY and SELECT all statements to the following form:
```
COPY target_shard (col1, col2, ..., coln) FROM STDIN;
SELECT (col1, col2, ..., coln) FROM source_shard;
```
where (col1, col2, ..., coln) does not include a GENERATED column. 
GENERATED column values are created in the target_shard as the values
are inserted.

Fixes #6705.

---------

Co-authored-by: Teja Mupparti <temuppar@microsoft.com>
Co-authored-by: aykut-bozkurt <51649454+aykut-bozkurt@users.noreply.github.com>
Co-authored-by: Jelte Fennema <jelte.fennema@microsoft.com>
Co-authored-by: Gürkan İndibay <gindibay@microsoft.com>
2023-03-07 18:15:50 +03:00
Ahmet Gedemenli 03f1bb70b7
Rebalance shard groups with placement count less than worker count (#6739)
DESCRIPTION: Adds logic to distribute unbalanced shards

If the number of shard placements (for a colocation group) is less than
the number of workers, it means that some of the workers will remain
empty. With this PR, we consider these shard groups as a colocation
group, in order to make them be distributed evenly as much as possible
across the cluster.

Example:
```sql
create table t1 (a int primary key);
create table t2 (a int primary key);
create table t3 (a int primary key);
set citus.shard_count =1;
select create_distributed_table('t1','a');
select create_distributed_table('t2','a',colocate_with=>'t1');
select create_distributed_table('t3','a',colocate_with=>'t2');

create table tb1 (a bigint);
create table tb2 (a bigint);
select create_distributed_table('tb1','a');
select create_distributed_table('tb2','a',colocate_with=>'tb1');

select citus_add_node('localhost',9702);
select rebalance_table_shards();
```

Here we have two colocation groups, each with one shard group. Both
shard groups are placed on the first worker node. When we add a new
worker node and try to rebalance table shards, the rebalance planner
considers it well balanced and does nothing. With this PR, the
rebalancer tries to distribute these shard groups evenly across the
cluster as much as possible. For this example, with this PR, the
rebalancer moves one of the shard groups to the second worker node.

fixes: #6715
2023-03-06 14:14:27 +03:00
Jelte Fennema b489d763e1
Use pg_total_relation_size in citus_shards (#6748)
DESCRIPTION: Correctly report shard size in citus_shards view

When looking at citus_shards, people are interested in the actual size
that all the data related to the shard takes up on disk.
`pg_total_relation_size` is the function to use for that purpose. The
previously used `pg_relation_size` does not include indexes or TOAST.
Especially the missing toast can have enormous impact on the size of the
shown data.
2023-03-06 10:53:12 +01:00
Gledis Zeneli dc7fa0d5af
Fix multiple output version arbitrary config tests (#6744)
With this small change, arbitrary config tests can have multiple acceptable correct outputs.

For an arbitrary config tests named `t`, now you can define `expected/t.out`, `expected/t_0.out`, `expected/t_1.out` etc and the test will succeed if the output of `sql/t.sql` is equal to any of the `t.out` or `t_{0, 1, ...}.out` files.
2023-03-03 21:06:59 +03:00
Onur Tirtir a9820e96a3 Make single_node_truncate.sql re-runnable
First of all, this commit sets next_shard_id for
single_node_truncate.sql because shard ids in the test output were
changing whenever we modify a prior test file.

Then the flaky test detector started complaining about
single_node_truncate.sql. We fix that by specifying the correct
test dependency for it in run_test.py.
2023-03-02 16:33:18 +03:00
Onur Tirtir 40105bf1fc Make single_node.sql re-runnable 2023-03-02 16:33:17 +03:00
Jelte Fennema 17ad61678f
Make run_test.py and create_test.py importable without errors (#6736)
Allowing scripts to be importable is good practice in general and it's
required for the pytest testing framework that I'm adding in a follow up
PR.
2023-02-28 00:34:42 +03:00
Jelte Fennema c018e29bec
Don't blanket ignore flake8 E402 error (#6734)
Instead this starts ignoring it in specific places only, because most
files don't actually need it ignored.
2023-02-27 18:17:15 +03:00
Jelte Fennema 24ad8574b5
Fix run_test.py on python 3.9 (#6735)
In #6718 I accidentally added Python type hint syntax that was only
supported on Python 3.10. Our CI uses 3.9, so this PR changes that to a
syntax that's supported on 3.9 too.
2023-02-27 10:12:18 +01:00
Teja Mupparti d7b499929c Rearrange the common code into a newfunction to facilitate the multiple checks of the same conditions in a multi-modify MERGE statement 2023-02-24 12:55:11 -08:00
Teja Mupparti ca65d2ba0b Fix flaky tests local_shards_execution and local_shards_execution_replication.
O Simple fix is to add ORDER BY to have definitive results.
O Add search_path explicitly after reconnecting, this avoids creating objects in public schema
  which prevents us from repetitive running of tests.
O multi_mx_modification is not designed to run repetitive, so isolate it.
2023-02-15 09:18:10 -08:00
Jelte Fennema b02a5b5b78
Add more powerfull dependency tracking to run_test.py (#6718)
Some of our tests depend on previous tests. Normally all these tests
should be part of a base schedule, but that's not always the case. The
flaky test detection script should ensure that we don't introduce other
dependencies by accident in new tests. But we have many old tests that
are not worth the effort of changing. This adds a way to define such
test dependencies in `run_test.py`, so that it can make sure to run any
dependencies before the actual test.
2023-02-15 17:20:05 +03:00
Jelte Fennema 3ba639f162
Install non-vulnerable cryptography package (#6710)
Our repo was complaining about the cryptography package being
vulnerable. This updates it, including our mitmproxy fork, because that
was pinning an outdated version.

Relevant commit on our mitmproxy fork:
2fd18ef051

Relevant PR on the-process:
https://github.com/citusdata/the-process/pull/112
2023-02-14 18:03:10 +01:00
Jelte Fennema 9f41ea2157 Fix issues reported by flake8 2023-02-10 13:05:37 +01:00
Jelte Fennema 188cc7d2ae Run python files through isort 2023-02-10 13:05:37 +01:00
Jelte Fennema 530b24a887 Format python files with black 2023-02-10 13:05:37 +01:00
Jelte Fennema 42970665fc Add linting and formatting tools for python 2023-02-10 13:05:37 +01:00
Jelte Fennema 09be4bb5fd
Allow multi_insert_select to run repeatably (#6707)
It was not cleaning up all the tables it created. This changes it to
create a dedicated schema for this test, like we have for many others.
2023-02-10 10:06:42 +01:00
Jelte Fennema 590df5360c
Fix flakyness in failure_create_distributed_table_non_empty (#6708)
The failure_create_distributed_table_non_empty test would sometimes fail
like this:
```diff
 -- in the first test, cancel the first connection we sent from the coordinator
 SELECT citus.mitmproxy('conn.cancel(' ||  pg_backend_pid() || ')');
- mitmproxy
----------------------------------------------------------------------
-
-(1 row)
-
+ERROR:  canceling statement due to user request
+CONTEXT:  COPY mitmproxy_result, line 0
+SQL statement "COPY mitmproxy_result FROM '/home/circleci/project/src/test/regress/tmp_check/mitmproxy.fifo'"
+PL/pgSQL function citus.mitmproxy(text) line 11 at EXECUTE
 SELECT create_distributed_table('test_table', 'id');
```

Source:
https://app.circleci.com/pipelines/github/citusdata/citus/30474/workflows/be1c9f9d-22c9-465c-964a-dcdd1cb8c99c/jobs/985441

Because the cancel command had no filter it would actually sometimes
cancel the mitmproxy cancel command itself. This PR addresses that by
simply removing this test.

This is basically the exact same issue as #6217, only in a different
place in the file. It's fixed here by removing the test since there's
already many different similar tests.
2023-02-10 09:55:12 +01:00
Onur Tirtir 483b51392f
Bump Citus to 11.3devel (#6690) 2023-02-06 10:23:25 +00:00
Gokhan Gulbiz b6a4652849
Stop background daemon before dropping the database (#6688)
DESCRIPTION: Stop maintenance daemon when dropping a database even
without Citus extension

Fixes #6670
2023-02-03 15:15:44 +03:00
Jelte Fennema 14c31fbb07
Fix background rebalance when reference table has no PK (#6682)
DESCRIPTION: Fix background rebalance when reference table has no PK

For the background rebalance we would always fail if a reference table
that was not replicated to all nodes would not have a PK (or replica
identity). Even when we used force_logical or block_writes as the shard
transfer mode. This fixes that and adds some regression tests.

Fixes #6680
2023-01-31 12:18:29 +01:00
aykut-bozkurt 8a9bb272e4
fix dropping table_name option from foreign table (#6669)
We should disallow dropping table_name option if foreign table is in
metadata. Otherwise, we get table not found error which contains
shardid.

DESCRIPTION: Fixes an unexpected foreign table error by disallowing to drop the table_name option.

Fixes #6663
2023-01-30 17:24:30 +03:00
Marco Slot a482b36760
Revert "Support MERGE on distributed tables with restrictions" (#6675)
Co-authored-by: Marco Slot <marco.slot@gmail.com>
2023-01-30 15:01:59 +01:00
Hanefi Onaldi 0962cf7517
Allow empty lines in arbitrary config schedules (#6654)
This change is a precursor to attempts to add more editorconfig rules in
our codebase. It is a good idea to comply with POSIX standards and have
an empty newline at the end of text files. However, once we have such a
rule, arbitrary configs scripts used to fail before this change.

Related: #5981
2023-01-30 16:30:12 +03:00
Onur Tirtir 594684bb33 Do clean-up before columnar_create to make it runnable multiple times
So that flaky test detector can run columnar_create.sql multiple times.
2023-01-30 15:58:34 +03:00
Onur Tirtir 1c51ddae49 Fall-back to seq-scan when accessing columnar metadata if the index doesn't exist
Fixes #6570.

In the past, having columnar tables in the cluster was causing pg
upgrades to fail when attempting to access columnar metadata. This is
because, pg_dump doesn't see objects that we use for columnar-am related
booking as the dependencies of the tables using columnar-am.
To fix that; in #5456, we inserted some "normal dependency" edges (from
those objects to columnar-am) into pg_depend.

This helped us ensuring the existency of a class of metadata objects
--such as columnar.storageid_seq-- and helped fixing #5437.

However, the normal-dependency edges that we added for indexes on
columnar metadata tables --such columnar.stripe_pkey-- didn't help at
all because they were indeed causing dependency loops (#5510) and
pg_dump was not able to take those dependency edges into the account.

For this reason, instead of inserting such dependency edges from indexes
to columnar-am, we allow columnar metadata accessors to fall-back to
sequential scan during pg upgrades.
2023-01-30 15:58:34 +03:00
Jelte Fennema 1109b70e58
Fix flaky isolation_non_blocking_shard_split test (#6666)
Sometimes isolation_non_blocking_shard_split would fail like this:
```diff
 step s2-show-pg_dist_cleanup:
  SELECT object_name, object_type, policy_type FROM pg_dist_cleanup;

 object_name                   |object_type|policy_type
 ------------------------------+-----------+-----------
+citus_shard_split_slot_2_10_39|          3|          0
 public.to_split_table_1500001 |          1|          2
-(1 row)
+(2 rows)
```
Source:
https://app.circleci.com/pipelines/github/citusdata/citus/30237/workflows/edcf34b7-d7d3-4d10-8293-b6f59b00cdf2/jobs/970960

The reason is that replication slots have now become part of
pg_dist_cleanup too, and sometimes they cannot be cleaned up right away.
This is harmless as they will be cleaned up eventually. So this simply
filters out the replication slots for those tests.
2023-01-30 13:44:23 +01:00
Jelte Fennema 10603ed5d4
Fix flaky multi_reference_table test (#6664)
Sometimes in CI our multi_reference_table test fails like this:
```diff
 WHERE
 	colocated_table_test.value_2 = reference_table_test.value_2;
 LOG:  join order: [ "colocated_table_test" ][ reference join "reference_table_test" ]
  value_2
 ---------
-       1
        2
+       1
 (2 rows)
```
Source:
https://app.circleci.com/pipelines/github/citusdata/citus/30223/workflows/ce3ab5db-310f-4e30-ba0b-c3b31927d9b6/jobs/970041

We forgot an ORDER BY in this test.
2023-01-30 13:32:38 +01:00
aykut-bozkurt ab71cd01ee
fix multi level recursive plan (#6650)
Recursive planner should handle all the tree from bottom to top at
single pass. i.e. It should have already recursively planned all
required parts in its first pass. Otherwise, this means we have bug at
recursive planner, which needs to be handled. We add a check here and
return error.

DESCRIPTION: Fixes wrong results by throwing error in case recursive
planner multipass the query.

We found 3 different cases which causes recursive planner passes the
query multiple times.
1. Sublink in WHERE clause is planned at second pass after we
recursively planned a distributed table at the first pass. Fixed by PR
#6657.
2. Local-distributed joins are recursively planned at both the first and
the second pass. Issue #6659.
3. Some parts of the query is considered to be noncolocated at the
second pass as we do not generate attribute equivalances between
nondistributed and distributed tables. Issue #6653
2023-01-27 21:25:04 +03:00
Jelte Fennema 0903091343
Add valgrind support to run_test.py (#6667)
Running tests with valgrind was not possible with our run_test.py script
yet. This adds that support.
2023-01-27 16:01:59 +01:00
Gokhan Gulbiz 4e26464969
Allow plain pg foreign tables without a table_name option (#6652) 2023-01-27 16:34:11 +03:00
Jelte Fennema 81dcddd1ef
Actually skip constraint validation on shards after shard move (#6640)
DESCRIPTION: Fix foreign key validation skip at the end of shard move

In eadc88a we started completely skipping foreign key constraint
validation at the end of a non blocking shard move, instead of only for
foreign keys to reference tables. However, it turns out that this didn't
work at all because of a hard to notice bug: By resetting the
SkipConstraintValidation flag at the end of our utility hook, we
actually make the SET command that sets it a no-op.

This fixes that bug by removing the code that resets it. This is fine
because #6543 removed the only place where we set the flag in C code. So
the resetting of the flag has no purpose anymore. This PR also adds a
regression test, because it turned out we didn't have any otherwise we
would have caught that the feature was completely broken.

It also moves the constraint validation skipping to the utility hook.
The reason is that #6550 showed us that this is the better place to skip
it, because it will also skip the planning phase and not just the
execution.
2023-01-27 13:08:05 +01:00
aykut-bozkurt 8870f0f90b
fix order of recursive sublink planning (#6657)
We should do the sublink conversations at the end of the recursive
planning because earlier steps might have transformed the query into a
shape that needs recursively planning the sublinks.

DESCRIPTION: Fixes early sublink check at recursive planner.

Related to PR https://github.com/citusdata/citus/pull/6650
2023-01-27 14:35:16 +03:00
Emel Şimşek 24f6136f72
Fixes ADD {PRIMARY KEY/UNIQUE} USING INDEX cmd (#6647)
This change allows creating a constraint without a name using an index.
The index name will be used as the constraint name the same way postgres
handles it.
Fixes issue #6644

This commit also cleans up some leftovers from nameless constraint checks.
With this commit, we now fully support adding all nameless constraints
directly to a table.

Co-authored-by: naisila <nicypp@gmail.com>
2023-01-25 21:28:07 +03:00
Emel Şimşek 2169e0222b
Propagates NOT VALID option for FK&CHECK constraints w/out a name (#6649)
Adds NOT VALID option to deparser. When we need to deparse:
  "ALTER TABLE ADD FOREIGN KEY ... NOT VALID"
  "ALTER TABLE ADD CHECK ... NOT VALID"
NOT VALID option should be propagated to workers.

Fixes issue #6646

This commit also uses AppendColumnNameList function
instead of repeated code blocks in two appropriate places
in the "ALTER TABLE" deparser.
2023-01-25 20:41:04 +03:00
Hanefi Onaldi 94b63f35a5
Prevent crashes on update with returning clauses (#6643)
If an update query on a reference table has a returns clause with a
subquery that accesses some other local table, we end-up with an crash.

This commit prevents the crash, but does not prevent other error
messages from happening due to Citus not being able to pushdown the
results of that subquery in a valid SQL command.

Related: #6634
2023-01-24 20:07:43 +03:00
Jelte Fennema aa9cd16d15
Use correct guc value to disable statistics collection (#6641)
The `citus.enable_statistics_collection` is a boolean GUC not an integer
one. Setting it to `-1` showed errors in the logs.
2023-01-24 15:32:50 +01:00
Jelte Fennema 7a7880aec9
Fix regression in allowed foreign keys on distributed tables (#6550)
DESCRIPTION: Fix regression in allowed foreign keys on distributed
tables

In commit eadc88a we changed how we skip foreign key validation. The
goal was to skip it in more cases. However, one change had the
unintended regression of introducing failures when trying to create
certain foreign keys. This reverts that part of the change.

The way of skipping validation of foreign keys that was introduced in
eadc88a was skipping validation during execution. The reason that
this caused this regression was because some foreign key validation
queries already fail during planning. In those cases it never gets to
the execution step where it would later be skipped.

Fixes #6543
2023-01-24 16:09:21 +03:00
Jelte Fennema 93fcc5c5d8
Move tablespace directory creation to pg_regress_multi.pl (#6629)
Multiple `check-xxx` targets create tablespaces. If you run
two of these at the same time you would get an error like:

```diff
CREATE TABLESPACE test_tablespace LOCATION :'test_tablespace';
+ERROR:  directory "/home/rajesh/citus/citus/src/test/regress/tmp_check/ts0/PG_14_202107181" already in use as a tablespace
```

This fixes that by moving creation of table space directory creation and
removal to pg_regress_multi.pl instead of being in the Makefile.
2023-01-20 12:34:33 +00:00
Emel Şimşek 58368b7783
Enable adding FOREIGN KEY constraints on Citus tables without a name. (#6616)
DESCRIPTION: Enable adding FOREIGN KEY constraints on Citus tables
without a name

This PR enables adding a foreign key to a distributed/reference/Citus
local table without specifying the name of the constraint, e.g. `ALTER
TABLE items ADD FOREIGN KEY (user_id) REFERENCES users (id);`
2023-01-20 01:43:52 +03:00
Gokhan Gulbiz 2388fbea6e
Identity Column Support on Citus Managed Tables (#6591)
DESCRIPTION: Identity Column Support on Citus Managed Tables
2023-01-19 15:45:41 +03:00
Marco Slot 64e3fee89b
Remove shardstate leftovers (#6627)
Remove ShardState enum and associated logic.

Co-authored-by: Marco Slot <marco.slot@gmail.com>
Co-authored-by: Ahmet Gedemenli <afgedemenli@gmail.com>
2023-01-19 11:43:58 +03:00
Teja Mupparti 44c387b978 Support MERGE on distributed tables with restrictions
This implements the phase - II of MERGE sql support

Support routable query where all the tables in the merge-sql are distributed, co-located, and both the source and
target relations are joined on the distribution column with a constant qual. This should be a Citus single-task
query. Below is an example.

SELECT create_distributed_table('t1', 'id');
SELECT create_distributed_table('s1', 'id', colocate_with => ‘t1’);

MERGE INTO t1
USING s1 ON t1.id = s1.id AND t1.id = 100
WHEN MATCHED THEN
UPDATE SET val = s1.val + 10
WHEN MATCHED THEN
DELETE
WHEN NOT MATCHED THEN
INSERT (id, val, src) VALUES (s1.id, s1.val, s1.src)

Basically, MERGE checks to see if

There are a minimum of two distributed tables (source and a target).
All the distributed tables are indeed colocated.
MERGE relations are joined on the distribution column
MERGE .. USING .. ON target.dist_key = source.dist_key
The query should touch only a single shard i.e. JOIN AND with a constant qual
MERGE .. USING .. ON target.dist_key = source.dist_key AND target.dist_key = <>
If any of the conditions are not met, it raises an exception.
2023-01-18 11:05:27 -08:00
Ahmet Gedemenli b3b135867e
Remove shardstate from placement insert functions (#6615) 2023-01-18 09:52:38 +01:00
Hanefi Onaldi f21dfd5fae
Rebalance Progress Reporting API (#6576)
citus_job_list() lists all background jobs by simply showing the records
in pg_dist_background_job.

citus_job_status(job_id bigint, raw boolean default false) shows the
status of a single background job by appending a jsonb details column to
the associated row from pg_dist_background_job. If the raw argument is
set, machine readable sizes are used instead of human readable
alternatives.

citus_rebalance_status(raw boolean default false) shows the status of
the last rebalance operation. If the raw argument is set, machine
readable sizes are used instead of human readable alternatives.
2023-01-16 16:17:31 +03:00
Jelte Fennema 92689a8362
Make GPIDs work with pg_dist_poolinfo (#6588)
The original implementation of GPIDs didn't work correctly when using
`pg_dist_poolinfo` together with PgBouncer. The reason is that it
assumed that once a connection was made to a worker, the originating
GPID should stay the same for ever. But when pg_dist_poolinfo is used
this isn't the case, because the same connection on the worker might be
used by different backends of the coordinator.

This fixes that issue by updating the GPID whenever a new application
name is set on a connection. This is the only thing that's needed,
because PgBouncer already sets the application name correctly on the
server connection whenever a client is updated.
2023-01-13 14:39:19 +00:00
Marco Slot ad3407b5ff
Revert "Make the metadata syncing less resource invasive [Phase-1]" (#6618)
Co-authored-by: Marco Slot <marco.slot@gmail.com>
2023-01-13 13:56:55 +01:00
Ahmet Gedemenli ed6dd8b086
Mark 0/0 lsn results as null (#6617)
Marks `source_lsn` and `target_lsn` fields as null if the result is 0/0
2023-01-13 15:33:43 +03:00
Emel Şimşek 28ed013a91
Enable ALTER TABLE ... ADD CHECK (#6606)
DESCRIPTION: Enable adding CHECK constraints on distributed tables
without the client having to provide a constraint name.

This PR enables the following command syntax for adding check
constraints to distributed tables.
 ALTER TABLE ... ADD CHECK ... 

by creating a default constraint name and transforming the command into
the below syntax before sending it to workers.

ALTER TABLE ... ADD CONSTRAINT \<conname> CHECK ...
2023-01-12 23:31:06 +03:00
Ahmet Gedemenli 9b9d8e7abd Use shard transfer UDFs with node ids for rebalancing 2023-01-12 16:57:51 +03:00
Ahmet Gedemenli e5fef40c06 Introduce citus_move_shard_placement UDF with nodeid 2023-01-12 16:57:51 +03:00
Ahmet Gedemenli e19c545fbf Introduce citus_copy_shard_placement UDF with nodeid 2023-01-12 16:57:51 +03:00
Emel Şimşek d322f9e382
Handle DEFERRABLE option for the relevant constraints at deparser. (#6613)
Table Constraints UNIQUE, PRIMARY KEY and EXCLUDE may have option
DEFERRABLE in their command syntax. This PR handles the option when
deparsing the relevant constraint statements.

NOT DEFERRABLE 
and
INITIALLY IMMEDIATE (if DEFERRABLE}

are the default values for the option so we only append the non-default
values to the alter table statement.
2023-01-12 12:32:38 +03:00
Jelte Fennema 44e09128f0
Fix failures in mx_base_schedule (#6601)
Apparently no-one actually ran the mx_base_schedule, because the tests
in schedule itself were already failing. This updates it to be in line
with multi_mx_schedule again to make the tests pass again. Notably it
doesn't contain multi_mx_node_metadata and multi_extension. Because
those tests take long to run and the were not necessary to make
multi_mx_create_table pass again.
2023-01-06 14:48:18 +01:00
Ahmet Gedemenli bc3383170e
Fix crash when trying to replicate a ref table that is actually dropped (#6595)
DESCRIPTION: Fix crash when trying to replicate a ref table that is actually dropped

see #6592 
We should have a real solution for it.
2023-01-06 14:52:08 +03:00
Emel Şimşek db7a70ef3e
Enable ALTER TABLE ... ADD UNIQUE and ADD EXCLUDE. (#6582)
DESCRIPTION: Adds support for creating table constraints UNIQUE and
EXCLUDE via ALTER TABLE command without client having to specify a name.

ALTER TABLE ... ADD CONSTRAINT <conname> UNIQUE ...
ALTER TABLE ... ADD CONSTRAINT <conname> EXCLUDE ...

commands require the client to provide an explicit constraint name.
However, in postgres it is possible for clients not to provide a name
and let the postgres generate it using the following commands

ALTER TABLE ... ADD UNIQUE ...
ALTER TABLE ... ADD EXCLUDE ...

This PR enables the same functionality for citus tables.
2023-01-05 18:12:32 +03:00
Emel Şimşek 135c519c62
Fix flakyness for multi_name_lengths test (#6599)
Fix flakyness for multi_name_lengths test.
2023-01-05 17:27:16 +03:00
Önder Kalacı a1aa96b32c
Make the metadata syncing less resource invasive [Phase-1] (#6537) 2023-01-04 11:36:45 +01:00
Ahmet Gedemenli 235047670d
Drop SHARD_STATE_TO_DELETE (#6494)
DESCRIPTION: Drop `SHARD_STATE_TO_DELETE` and use the cleanup records
instead

Drops the shard state that is used to mark shards as orphaned. Now we
insert cleanup records into `pg_dist_cleanup` so "orphaned" shards will
be dropped either by maintenance daemon or internal cleanup calls. With
this PR, we make the "cleanup orphaned shards" functions to be no-op, as
they would not be needed anymore.

This PR includes some naming changes about placement functions. We don't
need functions that filter orphaned shards, as there will be no orphaned
shards anymore.

We will also be introducing a small script with this PR, for users with
orphaned shards. We'll basically delete the orphaned shard entries from
`pg_dist_placement` and insert cleanup records into `pg_dist_cleanup`
for each one of them, during Citus upgrade.

We also have a lot of flakiness fixes in this PR.

Co-authored-by: Jelte Fennema <github-tech@jeltef.nl>
2023-01-03 14:38:16 +03:00
Jelte Fennema f56904fe04
Fix flakyness in isolation_insert_vs_vacuum (#6589)
Sometimes our `isolation_insert_vs_vacuum` test would fail like this.

```diff
 step s2-vacuum-analyze:
     VACUUM ANALYZE test_insert_vacuum;
-
+ <waiting ...>
 step s1-commit:
     COMMIT;

+step s2-vacuum-analyze: <... completed>

```

The reason seems to be that VACUUM ANALYZE tries to take some locks that
conflict with the other transaction, but these locks somehow get
released or VACUUM ANALYZE stops waiting for them. This is somewhat
expected since VACUUM has some special locking logic.

To solve the flakyness we now trigger VACUUM ANALYZE to always report as
blocking and after that we wait explicitly wait for it to complete. This
is done
like is suggested by the flaky test tips from postgres: 

c68a183990/src/test/isolation/README (L152)

I've confirmed that this fixes the issue suing our flaky-test-debugging
CI workflow.
2023-01-02 16:51:58 +01:00
Ahmet Gedemenli 0c74e4cc0f
Fix some flaky tests (#6587)
Fix for some simple flakiness'.
All `DROP USER' and cleanup function calls.
2022-12-29 10:19:09 +03:00
Ahmet Gedemenli f824c996b3
Remove duplicate split entry in run_test.py (#6586)
Nothing important. Removing the duplicate `"split" in test_schedule`
check
2022-12-28 18:18:24 +03:00
Ahmet Gedemenli 1b1e737e51
Drop cleanup on failure (#6584)
DESCRIPTION: Defers cleanup after a failure in shard move or split

We don't need to do a cleanup in case of failure on a shard transfer or
split anymore. Because,
* Maintenance daemon will clean them up anyway.
* We trigger a cleanup at the beginning of shard transfers/splits. 
* The cleanup on failure logic also can fail sometimes and instead of
the original error, we throw the error that is raised by the cleanup
procedure, and it causes confusion.
2022-12-28 15:48:44 +03:00
Ahmet Gedemenli cfc17385e9
Some minor improvements on flakiness detection (#6585)
* Skip some exceptional test files in the flaky workflow, like
multi_extension
* Run some tests without a schedule, like single_node_enterprise
* Use minimal schedule for the tests in split and operations schedules
2022-12-28 15:08:39 +03:00
Ahmet Gedemenli eba9abeee2
Fix leftover shard copy on the target node when tx with move is aborted (#6583)
DESCRIPTION: Cleanup the shard on the target node in case of a
failed/aborted shard move

Inserts a cleanup record for the moved shard placement on the target
node. If the move operation succeeds, the record will be deleted. If
not, it will remain there to be cleaned up later.

fixes: #6580
2022-12-27 22:42:46 +03:00
Naisila Puka e937935935
Clean up normalize file (#6578) 2022-12-26 12:08:27 +03:00
Naisila Puka bc49616426
Fix link of path to flaky tests docs (#6579) 2022-12-23 12:09:22 +03:00
Ahmet Gedemenli 96bf648d1a Unify dependent test files into one 2022-12-22 13:06:26 +03:00
Ahmet Gedemenli acf3539a90 Fix split schedule 2022-12-22 13:06:26 +03:00
Hanefi Onaldi 303db172f8
Use Citus version comparison in upgrade tests, not equality (#6568)
We have several version checks in our Citus upgrade tests. However, as
we drop support for PG versions, we need to update the Citus versions
used in our CI images. Therefore we must compare Citus versions in our
tests instead of using equality checks so that the queries are ran in
all the associated Citus versions.

For example, we have many conditionals where we early exit if the Citus
version is not equal to 9.0. However, as of today we never use version
9.0 and thus we always early exit in those tests.
2022-12-21 14:01:57 +03:00
Teja Mupparti 9a9989fc15 Support MERGE Phase – I
All the tables (target, source or any CTE present) in the SQL statement are local i.e. a merge-sql with a combination of Citus local and
Non-Citus tables (regular Postgres tables) should work and give the same result as Postgres MERGE on regular tables. Catch and throw an
exception (not-yet-supported) for all other scenarios during Citus-planning phase.
2022-12-18 20:32:15 -08:00