The recursive planner should handle the whole tree, from bottom to top, in
a single pass. That is, it should have already recursively planned all
required parts in its first pass. Otherwise, it means we have a bug in the
recursive planner, which needs to be fixed. We add a check here and
return an error.
DESCRIPTION: Fixes wrong results by throwing an error when the recursive
planner plans the query in multiple passes.
We found 3 different cases that cause the recursive planner to plan the
query multiple times.
1. A sublink in the WHERE clause is planned in a second pass after we
recursively planned a distributed table in the first pass. Fixed by PR
#6657.
2. Local-distributed joins are recursively planned in both the first and
the second pass. Issue #6659.
3. Some parts of the query are considered noncolocated in the second pass
because we do not generate attribute equivalences between nondistributed
and distributed tables. Issue #6653.
We should do the sublink conversions at the end of recursive planning,
because earlier steps might have transformed the query into a shape that
requires recursively planning the sublinks.
DESCRIPTION: Fixes early sublink check in the recursive planner.
Related to PR https://github.com/citusdata/citus/pull/6650
This change allows creating a constraint without a name using an index.
The index name will be used as the constraint name the same way postgres
handles it.
Fixes issue #6644
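For example (table and index names here are hypothetical), the following now works on a Citus table the same way it does on plain PostgreSQL, with the index name becoming the constraint name:
```
CREATE UNIQUE INDEX items_id_idx ON items (id);
-- No constraint name is given, so the resulting constraint is named items_id_idx
ALTER TABLE items ADD UNIQUE USING INDEX items_id_idx;
```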
This commit also cleans up some leftovers from nameless constraint checks.
With this commit, we now fully support adding all nameless constraints
directly to a table.
Co-authored-by: naisila <nicypp@gmail.com>
Adds the NOT VALID option to the deparser. When we need to deparse:
"ALTER TABLE ADD FOREIGN KEY ... NOT VALID"
"ALTER TABLE ADD CHECK ... NOT VALID"
the NOT VALID option should be propagated to the workers.
Fixes issue #6646
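For example (table and column names hypothetical), commands like the following are now deparsed with the NOT VALID option intact:
```
ALTER TABLE orders ADD FOREIGN KEY (customer_id) REFERENCES customers (id) NOT VALID;
ALTER TABLE orders ADD CHECK (amount > 0) NOT VALID;
```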
This commit also uses AppendColumnNameList function
instead of repeated code blocks in two appropriate places
in the "ALTER TABLE" deparser.
If an update query on a reference table has a RETURNING clause with a
subquery that accesses some other local table, we end up with a crash.
This commit prevents the crash, but does not prevent other error
messages from happening due to Citus not being able to push down the
results of that subquery in a valid SQL command.
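A hypothetical query of the problematic shape:
```
-- reference_table and local_table are placeholder names
UPDATE reference_table
SET value = value + 1
RETURNING value, (SELECT max(a) FROM local_table);
```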
Related: #6634
DESCRIPTION: Fix regression in allowed foreign keys on distributed
tables
In commit eadc88a we changed how we skip foreign key validation. The
goal was to skip it in more cases. However, one change had the
unintended regression of introducing failures when trying to create
certain foreign keys. This reverts that part of the change.
The way of skipping foreign key validation introduced in eadc88a was to
skip validation during execution. The reason this caused the regression
is that some foreign key validation queries already fail during planning.
In those cases the query never reaches the execution step where
validation would later be skipped.
Fixes #6543
DESCRIPTION: Enable adding FOREIGN KEY constraints on Citus tables
without a name
This PR enables adding a foreign key to a distributed/reference/Citus
local table without specifying the name of the constraint, e.g. `ALTER
TABLE items ADD FOREIGN KEY (user_id) REFERENCES users (id);`
This implements phase II of MERGE SQL support.
Support the routable query case where all the tables in the MERGE command are distributed and co-located, and both the source and
target relations are joined on the distribution column with a constant qual. This should be a Citus single-task
query. Below is an example.
SELECT create_distributed_table('t1', 'id');
SELECT create_distributed_table('s1', 'id', colocate_with => 't1');
MERGE INTO t1
USING s1 ON t1.id = s1.id AND t1.id = 100
WHEN MATCHED THEN
UPDATE SET val = s1.val + 10
WHEN MATCHED THEN
DELETE
WHEN NOT MATCHED THEN
INSERT (id, val, src) VALUES (s1.id, s1.val, s1.src)
Basically, MERGE checks to see if:
* There are a minimum of two distributed tables (a source and a target).
* All the distributed tables are indeed colocated.
* The MERGE relations are joined on the distribution column:
  MERGE .. USING .. ON target.dist_key = source.dist_key
* The query touches only a single shard, i.e. the join is ANDed with a constant qual:
  MERGE .. USING .. ON target.dist_key = source.dist_key AND target.dist_key = <>
If any of these conditions is not met, it raises an exception.
DESCRIPTION: Enable adding CHECK constraints on distributed tables
without the client having to provide a constraint name.
This PR enables the following command syntax for adding check
constraints to distributed tables.
ALTER TABLE ... ADD CHECK ...
by creating a default constraint name and transforming the command into
the below syntax before sending it to workers.
ALTER TABLE ... ADD CONSTRAINT \<conname> CHECK ...
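For illustration (names hypothetical), a command like
```
ALTER TABLE items ADD CHECK (price > 0);
```
is sent to the workers in roughly the following form, with a default constraint name generated the way postgres would name it:
```
ALTER TABLE items ADD CONSTRAINT items_price_check CHECK (price > 0);
```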
Table constraints UNIQUE, PRIMARY KEY and EXCLUDE may have the option
DEFERRABLE in their command syntax. This PR handles the option when
deparsing the relevant constraint statements.
NOT DEFERRABLE
and
INITIALLY IMMEDIATE (if DEFERRABLE)
are the default values for the option, so we only append the non-default
values to the ALTER TABLE statement.
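For example (names hypothetical), the non-default options are kept when deparsing:
```
ALTER TABLE events
  ADD CONSTRAINT events_id_key UNIQUE (id) DEFERRABLE INITIALLY DEFERRED;
```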
DESCRIPTION: Adds support for creating table constraints UNIQUE and
EXCLUDE via ALTER TABLE command without client having to specify a name.
ALTER TABLE ... ADD CONSTRAINT <conname> UNIQUE ...
ALTER TABLE ... ADD CONSTRAINT <conname> EXCLUDE ...
commands require the client to provide an explicit constraint name.
However, in postgres it is possible for clients not to provide a name
and let postgres generate it using the following commands:
ALTER TABLE ... ADD UNIQUE ...
ALTER TABLE ... ADD EXCLUDE ...
This PR enables the same functionality for Citus tables.
DESCRIPTION: Drop `SHARD_STATE_TO_DELETE` and use the cleanup records
instead
Drops the shard state that was used to mark shards as orphaned. Now we
insert cleanup records into `pg_dist_cleanup`, so "orphaned" shards will
be dropped either by the maintenance daemon or by internal cleanup calls.
With this PR, we make the "cleanup orphaned shards" functions no-ops, as
they are not needed anymore.
This PR includes some renaming of placement functions. We don't need
functions that filter out orphaned shards, as there will be no orphaned
shards anymore.
We are also introducing a small script with this PR for users with
orphaned shards. During the Citus upgrade, it deletes the orphaned shard
entries from `pg_dist_placement` and inserts cleanup records into
`pg_dist_cleanup` for each of them.
We also have a lot of flakiness fixes in this PR.
Co-authored-by: Jelte Fennema <github-tech@jeltef.nl>
DESCRIPTION: Defers cleanup after a failure in shard move or split
We no longer need to do cleanup in case of failure of a shard transfer or
split, because:
* The maintenance daemon will clean them up anyway.
* We trigger a cleanup at the beginning of shard transfers/splits.
* The cleanup-on-failure logic can also fail sometimes, and instead of
the original error we throw the error raised by the cleanup procedure,
which causes confusion.
DESCRIPTION: Cleanup the shard on the target node in case of a
failed/aborted shard move
Inserts a cleanup record for the moved shard placement on the target
node. If the move operation succeeds, the record will be deleted. If
not, it will remain there to be cleaned up later.
fixes: #6580
We have several version checks in our Citus upgrade tests. However, as
we drop support for PG versions, we need to update the Citus versions
used in our CI images. Therefore we must compare Citus versions in our
tests instead of using equality checks, so that the queries are run on
all the associated Citus versions.
For example, we have many conditionals where we exit early if the Citus
version is not equal to 9.0. However, as of today we never use version
9.0 and thus we always exit early in those tests.
All the tables (target, source, or any CTE present) in the SQL statement are local, i.e. a MERGE command with a combination of Citus local and
non-Citus tables (regular Postgres tables) should work and give the same result as Postgres MERGE on regular tables. We catch and throw an
exception (not-yet-supported) for all other scenarios during the Citus planning phase.
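A minimal sketch of a supported case, with hypothetical table names (one Citus local table joined with a regular Postgres table):
```
CREATE TABLE t_target (id int, val int);
CREATE TABLE s_source (id int, val int);
SELECT citus_add_local_table_to_metadata('t_target');

MERGE INTO t_target t
USING s_source s ON t.id = s.id
WHEN MATCHED THEN
    UPDATE SET val = s.val
WHEN NOT MATCHED THEN
    INSERT (id, val) VALUES (s.id, s.val);
```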
DESCRIPTION: Support ALTER TABLE .. ADD PRIMARY KEY ... command
Before processing an
> **ALTER TABLE ... ADD PRIMARY KEY ...**
command:
1. Create a primary key name to use as the constraint name.
2. Change the **ALTER TABLE ... ADD PRIMARY KEY ...** command into the
**ALTER TABLE ... ADD CONSTRAINT \<constraint name> PRIMARY KEY ...**
form (see the example after this list).
This is the only form in which we can specify a name for a primary key.
If we ran ALTER TABLE .. ADD PRIMARY KEY, postgres
would create a constraint name internally using its own scheme. But the
problem is that we need to create constraint names
for shards in our own scheme, which is \<constraint name>_\<shardid>.
Hence we need to create a name and send it to the workers so that the
workers can append the shardid.
3. Run the changed command on the coordinator to make sure we are using
the same constraint name across the board.
4. Send the changed command to the workers so that it is executed for the
main table as well as for the shards.
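For illustration (table name hypothetical), the command
```
ALTER TABLE items ADD PRIMARY KEY (id);
```
is rewritten into something like
```
ALTER TABLE items ADD CONSTRAINT items_pkey PRIMARY KEY (id);
```
before being run on the coordinator and sent to the workers, so that the shard-level constraints can be named items_pkey_\<shardid>.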
Fixes #6515.
Before this commit, we created an additional WaitEventSet per connection
for checking whether the remote socket is closed,
only once at the start of the execution.
However, for certain workloads, such as pgbench select-only
workloads, the creation/deletion of the additional WaitEventSet
adds ~7% CPU overhead, which is also reflected on the benchmark
results.
With this commit, we use the same WaitEventSet for the purpose
of checking the remote socket at the start of the execution.
We use the "rebuildWaitEventSet" flag so that the executor can reuse
the existing WaitEventSet.
As a result, we see the following improvements on PG 15:
main : 120051 tps, 0.532 ms latency avg.
avoid_wes_rebuild: 127119 tps, 0.503 ms latency avg.
And, on PG 14, as expected, there is no difference:
main : 129191 tps, 0.495 ms latency avg.
avoid_wes_rebuild: 129480 tps, 0.494 ms latency avg.
But, note that PG 15 is slightly (~1.5%) slower than PG 14.
That is probably the overhead of checking the remote socket.
We already have citus_job_wait to wait until a job reaches the desired
state. This PR adds waiting on task state to allow more granular
waiting. It can be used for Citus operations. Moreover, it is also
useful for testing purposes (wait until a task reaches a specified state).
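A minimal usage sketch, assuming the new wait UDF is exposed as `citus_task_wait` (the task id and status value below are illustrative):
```
SELECT citus_task_wait(1001, 'done');
```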
Related to #6459.
Adding a testing function `wait_for_resource_cleanup` which waits until
all records in `pg_dist_cleanup` are cleaned up. The motivation is to
prevent flakiness in our tests, since the `NOTICE: cleaned up X orphaned
resources` message is not consistent in many cases. This PR replaces
`citus_cleanup_orphaned_resources` calls with
`wait_for_resource_cleanup` calls.
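In the tests this now looks like the following (schema name assumed):
```
SELECT public.wait_for_resource_cleanup();
```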
DESCRIPTION: Create replication artifacts with unique names
We're creating replication objects with generic names. This prevents us
from enabling parallel shard moves, as two operations might use the same
objects. With this PR, we'll create the objects below with
operation-specific names, by appending the OperationId to the names.
* Subscriptions
* Publications
* Replication Slots
* Users created for subscriptions
1) Regular users fail to use the clock UDF due to a permission issue.
2) Clock functions were declared as STABLE, whereas by definition they are VOLATILE. By design, any clock/time
function will return different results for each call, even within a single SQL statement.
Note: the UDF citus_get_transaction_clock() is a misnomer, as it internally calls the clock tick, which always returns
different results for every invocation in the same transaction.
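An illustrative sketch of the volatility point: since the clock ticks on every call, two invocations in the same transaction return different values:
```
BEGIN;
SELECT citus_get_transaction_clock();
SELECT citus_get_transaction_clock(); -- returns a later value than the first call
COMMIT;
```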
Adds signal handlers for graceful termination and cancellation of
task executors, and for detecting config updates. Related to PR #6459.
#### How do we handle the termination signal?
The monitor needs to gracefully terminate all running task executors
before terminating. Hence, we have a SIGTERM handler for the monitor.
#### How do we handle the cancellation signal?
The monitor needs to gracefully cancel all running task executors before
terminating. Hence, we have a SIGINT handler for the monitor.
#### How do we detect configuration changes?
The monitor has a SIGHUP handler to reflect configuration changes while
executing tasks.
We are having some flakiness in our test schedule because of objects
left over from shard moves/splits. With this commit we prevent logging
cleanup object counts.
fixes: #6534
DESCRIPTION: Extend cleanup process for replication artifacts
This PR adds new cleanup record types for:
* Subscriptions
* Replication slots
* Publications
* Users created for subscriptions
We add records for these object types to `pg_dist_cleanup` during the
creation phase. Once the operation is done, in case of success or
failure, we iterate over those records and drop the objects. With this PR we
will not be dropping any of these objects during the operation. In
short, we will always be deferring the drop.
One thing that's worth mentioning is that we sort cleanup records before
processing (dropping) them, because of dependency relations among those
objects, e.g. a subscription might depend on a publication. Therefore, we
always drop subscriptions before publications.
We have some renames in this PR:
* `TryDropOrphanedShards` -> `TryDropOrphanedResources`
* `DropOrphanedShardsForCleanup` -> `DropOrphanedResourcesForCleanup`
* `run_try_drop_marked_shards` -> `run_try_drop_marked_resources`
as these functions now process replication artifacts as well.
This PR drops the function `DropAllLogicalReplicationLeftovers` and all
its usages, since we now rely on the deferred drop mechanism.
Improvement on our background task monitoring API (PR #6296) to support
concurrent and nonblocking task execution.
Mainly, we have a queue monitor background process which forks task
executors for `Runnable` tasks and then monitors their status by
fetching messages from a shared memory queue in a nonblocking way.
**Problem**: Currently, we error out if we detect recurring tuples on
one side without checking the other side of the join.
**Solution**: When one side of the full join consists of recurring tuples
and the other side consists of nonrecurring tuples, we should not push
the join down, to prevent duplicate results. Otherwise, it is safe to push down.
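An illustrative sketch (table names hypothetical): the reference-table side produces recurring tuples on every shard while the distributed side does not, so this join must not be pushed down:
```
SELECT *
FROM reference_table r
FULL JOIN distributed_table d ON r.key = d.key;
```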
This PR changes the
```citus.propagate_session_settings_for_loopback_connection``` default
value to off, so as not to expose this feature publicly at this point. See
#6488 for details.