1) For distributed tables that are not colocated.
2) When joining on a non-distribution column for colocated tables.
3) When merging into a distributed table using reference or citus-local tables as the data source.
This is accomplished primarily through the implementation of the following two strategies.
Repartition: Plan the source query independently,
write its results to intermediate files, and repartition those files so that
they are colocated with the merge-target table. Then compile a final
merge query on the target table that uses the intermediate results as the data
source.
Pull-to-coordinator: When the source plan requires evaluation at the coordinator,
run the query on the coordinator and redistribute the resulting rows so that they
are colocated with the target shards. Then direct the MERGE SQL operation to the
target shards on the worker nodes, using the colocated intermediate files as the
data source.
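As a rough illustration of the colocation step that both strategies rely on, the sketch below buckets source rows by a hash of the join key so that each bucket lines up with one target shard. The shard count, the `shard_for` helper, and the use of CRC32 are assumptions made for this example; they do not reflect the hash function or the intermediate file format that Citus actually uses.

```python
import zlib
from collections import defaultdict

SHARD_COUNT = 4  # hypothetical number of target shards


def shard_for(key, shard_count=SHARD_COUNT):
    """Map a distribution-column value to a shard index (toy hash, not Citus's)."""
    return zlib.crc32(str(key).encode()) % shard_count


def repartition(rows, key_column, shard_count=SHARD_COUNT):
    """Group source rows into one bucket per target shard."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[shard_for(row[key_column], shard_count)].append(row)
    return dict(buckets)


# Hypothetical source rows produced by the independently planned source query.
source_rows = [{"id": i, "val": i * 10} for i in range(8)]
buckets = repartition(source_rows, "id")
```

Each bucket can then serve as the data source for the merge on the matching target shard, which is the property the final merge query depends on.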
Citus upgrade tests require some additional logic to run, because they
have a before and an after schedule and we need to swap the Citus
version in between. This adds that logic to `run_test.py`.
As a side effect, running the upgrade tests locally multiple times
becomes faster because the tarballs are cached.
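The caching idea can be sketched as follows. This is only a minimal illustration: the `get_tarball` helper, the file naming, and the version string are hypothetical and are not taken from `run_test.py`.

```python
import os
import tempfile


def get_tarball(cache_dir, version, build):
    """Return the cached tarball path for `version`, building it only once."""
    path = os.path.join(cache_dir, f"citus-{version}.tar")
    if not os.path.exists(path):
        os.makedirs(cache_dir, exist_ok=True)
        build(path)  # expensive step, skipped on later runs
    return path


build_calls = []


def fake_build(path):
    """Stand-in for the real (slow) build; records that it was invoked."""
    build_calls.append(path)
    with open(path, "wb") as tarball:
        tarball.write(b"placeholder")


cache_dir = tempfile.mkdtemp()
first = get_tarball(cache_dir, "v11.1.0", fake_build)   # builds the tarball
second = get_tarball(cache_dir, "v11.1.0", fake_build)  # reuses the cached file
```

The second call returns the same path without invoking the build step again, which is where the time saving on repeated runs would come from.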
Enable the router planner and a limited version of the INSERT .. SELECT planner
for queries that reference colocated null shard key tables.
* SELECT / UPDATE / DELETE / MERGE is supported as long as it's a router
query.
* INSERT .. SELECT is supported as long as it only references colocated
null shard key tables.
Note that this is not limited to distributed INSERT .. SELECT; it also
covers a limited set of query types that require pull-to-coordinator,
e.g., due to a LIMIT clause, generate_series(), etc.
(Ideally, distributed INSERT .. SELECT could handle such queries too,
e.g., when we're only referencing tables that don't have a shard key, but
today this is not the case. See
https://github.com/citusdata/citus/pull/6773#discussion_r1140130562.)
- The query generator creates queries allowed by the grammar that is documented in `query_generator/query_gen.py` (it currently contains only joins).
- This PR adds a CI test that uses the query generator to compare the results of generated queries executed on Citus tables and on local (undistributed) tables. It fails if there is an unexpected error in the results. The error can be related to Citus, the query generator, or even Postgres.
- The tool is configured by the file `query_generator/config/config.yaml`, which limits the table count in generated queries and sets many table-related parameters (e.g. row count).
- The run time of the CI task can be configured from the config file. By default, we run 250 queries with a maximum table count of 40 in each query.
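To make the idea of bounded, grammar-driven generation concrete, here is a toy sketch. It is not the grammar from `query_gen.py`: the table names (`table0`, `table1`, ...), the `id` join column, the join types, and the seed are all assumptions chosen for illustration. Only the defaults of 250 queries and a maximum table count of 40 come from the description above.

```python
import random


def generate_join_query(rng, max_table_count, join_types=("INNER JOIN", "LEFT JOIN")):
    """Build one join-only query over at most `max_table_count` tables."""
    table_count = rng.randint(1, max_table_count)
    query = "SELECT count(*) FROM table0"
    for i in range(1, table_count):
        join = rng.choice(join_types)
        query += f" {join} table{i} ON table{i - 1}.id = table{i}.id"
    return query, table_count


# A seeded generator makes a run reproducible.
rng = random.Random(42)
queries = [generate_join_query(rng, max_table_count=40) for _ in range(250)]
```

In a comparison test of this kind, each generated query would be run once against Citus tables and once against local tables, and the two result sets compared.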
DESCRIPTION: Changes the regression test setups to add the coordinator
to metadata by default.
When creating a Citus cluster, the coordinator can be added to metadata
explicitly by running the `citus_set_coordinator_host` function. Adding the
coordinator to metadata allows creating Citus managed local tables.
Other Citus functionality is expected to be unaffected.
This change adds the coordinator to metadata by default when creating
test clusters in regression tests.
There are 3 ways to run commands in a sql file (or a schedule which is a
sequence of sql files) with Citus regression tests. Below is how this PR
adds the coordinator to metadata for each.
1. `make <schedule_name>`
Changed the sql files (sql/multi_cluster_management.sql and
sql/minimal_cluster_management.sql) that set up the test clusters so
that they call `citus_set_coordinator_host`. This ensures any following
tests will have the coordinator in metadata by default.
2. `citus_tests/run_test.py <sql_file_name>`
Changed the python code that sets up the cluster to always call
`citus_set_coordinator_host`.
For the upgrade tests, a version check is included to make sure the
`citus_set_coordinator_host` function is available for a given version.
3. `make check-arbitrary-configs`
Changed the python code that sets up the cluster to always call
`citus_set_coordinator_host`.
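The version check mentioned for the upgrade tests could look roughly like the sketch below. The helper names and the minimum version `(10, 0)` are assumptions made for illustration; the actual check and threshold used by the test code may differ.

```python
# Assumed minimum version that ships citus_set_coordinator_host (illustrative).
MIN_VERSION = (10, 0)


def parse_version(version):
    """Turn 'v10.2.5' or '9.5' into a tuple of ints such as (10, 2, 5) or (9, 5)."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def should_set_coordinator_host(version, minimum=MIN_VERSION):
    """Return True when the function is expected to exist in `version`."""
    return parse_version(version)[:2] >= minimum
```

A setup routine would call `should_set_coordinator_host` with the version under test and skip the `citus_set_coordinator_host` call when it returns `False`.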
#6864 will be used to track the remaining work which is to change the
tests where coordinator is added/removed as a node.
DESCRIPTION: Support ALTER TABLE .. ADD PRIMARY KEY ... command
Before processing
> **ALTER TABLE ... ADD PRIMARY KEY ...**
command
1. Create a primary key name to use as the constraint name.
2. Change the **ALTER TABLE ... ADD PRIMARY KEY ...** command into the
**ALTER TABLE ... ADD CONSTRAINT \<constraint name> PRIMARY KEY ...**
form.
This is the only form in which we can specify a name for a primary key. If we ran
ALTER TABLE .. ADD PRIMARY KEY, Postgres
would create a constraint name internally using its own scheme. The
problem is that we need to create constraint names
for shards using our own scheme, which is \<constraint name>_\<shardid>.
Hence we need to create a name and send it to the workers so that the
workers can append the shardid.
3. Run the changed command on the coordinator to make sure we are using
the same constraint name across the board.
4. Send the changed command to workers such that it is executed for the
main table as well as for the shards.
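The rewrite and the shard naming scheme can be sketched as follows. This is a simplified illustration, not the actual implementation: the regular expression, the helper names, the example shard id, and the `<table>_pkey` naming choice are assumptions for this example.

```python
import re

# Matches the unnamed form: ALTER TABLE <table> ADD PRIMARY KEY ...
ADD_PK = re.compile(r"^(ALTER TABLE\s+(\S+)\s+)ADD PRIMARY KEY\b", re.IGNORECASE)


def name_primary_key(command):
    """Rewrite the unnamed form into ADD CONSTRAINT <name> PRIMARY KEY."""
    match = ADD_PK.match(command)
    if match is None:
        return command, None
    table = match.group(2).split(".")[-1]  # drop a schema prefix, if any
    constraint = f"{table}_pkey"  # assumed naming choice for this sketch
    rewritten = (
        f"{match.group(1)}ADD CONSTRAINT {constraint} PRIMARY KEY"
        f"{command[match.end():]}"
    )
    return rewritten, constraint


def shard_constraint_name(constraint, shard_id):
    """Apply the <constraint name>_<shardid> scheme used for shards."""
    return f"{constraint}_{shard_id}"


rewritten, constraint = name_primary_key("ALTER TABLE orders ADD PRIMARY KEY (id)")
shard_name = shard_constraint_name(constraint, 102008)  # hypothetical shard id
```

Because the name is fixed before the command is sent anywhere, the coordinator and every shard derive their constraint names from the same base name.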
Fixes #6515.
One of our arbitrary config tests would sometimes fail like this in CI:
```diff
su_nationkey,
cust_nation,
l_year;
- supp_nation | cust_nation | l_year | revenue
----------------------------------------------------------------------
- 9 | C | 2008 | 3.00
-(1 row)
-
+ERROR: cannot connect to localhost:10212 to fetch intermediate results
+CONTEXT: while executing command on localhost:10211
```
Looking at the logs, it seems we were running out of
connections:
```
2022-08-23 14:03:52.856 UTC [28122] FATAL: sorry, too many clients already
2022-08-23 14:03:52.860 UTC [21027] ERROR: cannot connect to localhost:10212 to fetch intermediate results
```
This happened with the `CitusThreeWorkersManyShards` config. This config
intentionally pushes the limits of Citus quite far, and the
`ch_benchmarks_1` test is run in parallel with a few other tests. So
it's not too surprising that it ran out of connections. This doubles the
connection limit in the arbitrary config tests so that we hopefully do not hit this
error again.
Example of failed test: https://app.circleci.com/pipelines/github/citusdata/citus/26365/workflows/7a1b5688-85cc-4bc3-ade5-9bd1d83cd0ed/jobs/747908/parallel-runs/1
We already have fsync disabled for the regular tests in `pg_regress_multi.pl`.
This does the same for the arbitrary config tests.
On my machine this changes the runtime of the following command from
~37 to ~25 seconds:
```bash
make -C src/test/regress/ check-arbitrary-configs CONFIGS=CitusDefaultClusterConfig
```
(cherry picked from commit 4e93afd1f78854e1aaab63690c441b0b0598a82c)
(cherry picked from commit 0295fe2f5b)
(cherry picked from commit 878510725fab9cb6870b4504e0b1f055d7bbc68d)
We had 2 class definitions for CitusCacheManyConnectionsConfig, where
one of them was a copy of CitusSmallCopyBuffersConfig.
This commit keeps the intended class definition, which configures caching
many connections, and removes the one that is a copy of another class.
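A duplicate like this is easy to miss because Python does not complain when a class name is defined twice in one module; the later definition silently rebinds the name. The class and attribute names below are hypothetical and only demonstrate that behavior.

```python
class ExampleConfig:
    setting = "first definition"


class ExampleConfig:  # same name: silently rebinds ExampleConfig
    setting = "second definition"


# Only the later definition is reachable through the name.
active = ExampleConfig.setting
```

Whichever definition appears last in the file is the one that takes effect, so the earlier one is dead code.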
- [x] Add some more regression test coverage
- [x] Make sure returning works fine in case of
local execution + remote execution
(task->partiallyLocalOrRemote works as expected, already added tests)
- [x] Implement locking properly (and add isolation tests)
- [x] We do #shardcount round-trips on `SerializeNonCommutativeWrites`.
We made it a single round-trip.
- [x] Acquire locks for subselects on the workers & add isolation tests
- [x] Add a GUC to prevent modifications from the workers, and hence increase the
coordinator-only throughput
   - The performance drops slightly (~15%), unless
`citus.allow_modifications_from_workers_to_replicated_tables`
is set to false
To run tests in parallel use:
```bash
make check-arbitrary-configs parallel=4
```
To run tests sequentially use:
```bash
make check-arbitrary-configs parallel=1
```
To run only some configs:
```bash
make check-arbitrary-base CONFIGS=CitusSingleNodeClusterConfig,CitusSmallSharedPoolSizeConfig
```
To run only some test files with some config:
```bash
make check-arbitrary-base CONFIGS=CitusSingleNodeClusterConfig EXTRA_TESTS=dropped_columns_1
```
To get a deterministic run, you can pass the random seed:
```bash
make check-arbitrary-configs parallel=4 seed=12312
```
The `seed` will be in the output of the run.
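The reason a seed makes a run reproducible can be shown with a small sketch. How the test runner uses the seed internally is not described here, so the `pick_run_order` helper and the shuffling of config names are assumptions for illustration.

```python
import random


def pick_run_order(config_names, seed):
    """Shuffle config names with a dedicated, seeded random generator."""
    rng = random.Random(seed)
    order = list(config_names)
    rng.shuffle(order)
    return order


configs = ["CitusSingleNodeClusterConfig", "CitusSmallSharedPoolSizeConfig", "PostgresConfig"]
first_run = pick_run_order(configs, seed=12312)
second_run = pick_run_order(configs, seed=12312)  # same seed, same order
```

Passing the seed printed by a failing run should therefore reproduce the same random choices.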
In our regular regression tests, we can see all the details about planning and execution, but this means
we need to run the same query under different configs/cluster setups again and again, which is not really maintainable.
When we care about correctness rather than the internals of how planning/execution is done, especially across different configs,
this infrastructure can be used.
With `check-arbitrary-configs` target, the following happens:
- A set of configs, which are defined in `config.py`, is loaded. These configs differ in settings such as shard count, Citus settings, Postgres settings, worker count, or metadata.
- For each config, a separate data directory is created for tests in `tmp_citus_test` with the config's name.
- For each config, `create_schedule` is run on the coordinator to set up the necessary tables.
- For each config, `sql_schedule` is run. `sql_schedule` is run on the coordinator if it is a non-mx cluster. If it is an mx cluster, it is run either on the coordinator or on a random worker.
- Test results are checked against the expected output.
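The node selection rule for `sql_schedule` can be sketched like this. The function name, the port numbers, and the uniform choice among coordinator and workers are assumptions; the description above only states that an mx cluster uses either the coordinator or a random worker.

```python
import random


def pick_sql_schedule_node(is_mx, coordinator_port, worker_ports, rng):
    """Pick the node that runs sql_schedule for one config."""
    if not is_mx:
        return coordinator_port  # non-mx: always the coordinator
    # mx: the coordinator or any worker may run the schedule
    return rng.choice([coordinator_port] + list(worker_ports))


rng = random.Random(0)
workers = [10201, 10202]  # hypothetical worker ports
non_mx_node = pick_sql_schedule_node(False, 10200, workers, rng)
mx_node = pick_sql_schedule_node(True, 10200, workers, rng)
```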
When test results don't match, you can see the regression diffs in a config's datadir, such as `tmp_citus_tests/dataCitusSingleNodeClusterConfig`.
We also have a PostgresConfig which runs the whole test suite with Postgres.
By default, configs use a regular user, but we have a config to run as a superuser as well.
So the infrastructure tests:
- Postgres vs Citus
- Mx vs Non-Mx
- Superuser vs regular user
- Arbitrary Citus configs
When you want to add a new test, you can add the create statements to `create_schedule` and add the sql queries to `sql_schedule`.
If you are adding Citus UDFs that should be a NO-OP for Postgres, make sure to override the UDFs in `postgres.sql`.
You can add your new config to `config.py`. Make sure to extend either `CitusDefaultClusterConfig` or `CitusMXBaseClusterConfig`.
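Extending a base config might look like the sketch below. The base class here is a stand-in so that the example runs on its own, and the `settings` attribute, its default, and the `CitusManyShardsExampleConfig` name are hypothetical; check `config.py` for the attributes the real base classes expose.

```python
class CitusDefaultClusterConfig:
    """Stand-in for the real base class in config.py (assumed shape)."""

    def __init__(self, arguments):
        self.arguments = arguments
        self.settings = {"shard_count": 4}  # hypothetical default


class CitusManyShardsExampleConfig(CitusDefaultClusterConfig):
    """Hypothetical new config that only overrides what it needs."""

    def __init__(self, arguments):
        super().__init__(arguments)
        self.settings = {**self.settings, "shard_count": 64}


config = CitusManyShardsExampleConfig(arguments={})
```

Keeping the subclass limited to the overridden settings makes it clear how the new config differs from the default cluster.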
On CI, upon a failure, all logfiles are uploaded as artifacts, so you can check the artifacts tab.
All the regressions are shown as part of the job on CI.
Locally, you can check the regression diffs in a config's datadir, such as `tmp_citus_tests/dataCitusSingleNodeClusterConfig`.