Because they're only interested in distributed tables. Even more,
this replaces HasDistributionKey() check with
IsCitusTableType(DISTRIBUTED_TABLE) because this doesn't make a
difference on main and sounds slightly more intuitive. Plus, this
would also allow safely using this function in
https://github.com/citusdata/citus/pull/6773.
DESCRIPTION: Check before logicalrep for rebalancer, error if needed
Check if we can use logical replication or not, in case of shard
transfer mode = auto, before executing the shard moves. If we can't,
error out. Before this PR, we used to error out in the middle of shard
moves:
```sql
set citus.shard_count = 4; -- just to get the error sooner
select citus_remove_node('localhost',9702);
create table t1 (a int primary key);
select create_distributed_table('t1','a');
create table t2 (a bigint);
select create_distributed_table('t2','a');
select citus_add_node('localhost',9702);
select rebalance_table_shards();
NOTICE: Moving shard 102008 from localhost:9701 to localhost:9702 ...
NOTICE: Moving shard 102009 from localhost:9701 to localhost:9702 ...
NOTICE: Moving shard 102012 from localhost:9701 to localhost:9702 ...
ERROR: cannot use logical replication to transfer shards of the relation t2 since it doesn't have a REPLICA IDENTITY or PRIMARY KEY
```
Now we check and error out in the beginning, without moving the shards.
fixes: #6727
Fixes#6672
2) Move all MERGE related routines to a new file merge_planner.c
3) Make ConjunctionContainsColumnFilter() static again, and rearrange the code in MergeQuerySupported()
4) Restore the original format in the comments section.
5) Add big serial test. Implement latest set of comments
This implements the phase - II of MERGE sql support
Support routable query where all the tables in the merge-sql are distributed, co-located, and both the source and
target relations are joined on the distribution column with a constant qual. This should be a Citus single-task
query. Below is an example.
SELECT create_distributed_table('t1', 'id');
SELECT create_distributed_table('s1', 'id', colocate_with => ‘t1’);
MERGE INTO t1
USING s1 ON t1.id = s1.id AND t1.id = 100
WHEN MATCHED THEN
UPDATE SET val = s1.val + 10
WHEN MATCHED THEN
DELETE
WHEN NOT MATCHED THEN
INSERT (id, val, src) VALUES (s1.id, s1.val, s1.src)
Basically, MERGE checks to see if
There are a minimum of two distributed tables (source and a target).
All the distributed tables are indeed colocated.
MERGE relations are joined on the distribution column
MERGE .. USING .. ON target.dist_key = source.dist_key
The query should touch only a single shard i.e. JOIN AND with a constant qual
MERGE .. USING .. ON target.dist_key = source.dist_key AND target.dist_key = <>
If any of the conditions are not met, it raises an exception.
(cherry picked from commit 44c387b978)
This implements MERGE phase3
Support pushdown query where all the tables in the merge-sql are Citus-distributed, co-located, and both
the source and target relations are joined on the distribution column. This will generate multiple tasks
which execute independently after pushdown.
SELECT create_distributed_table('t1', 'id');
SELECT create_distributed_table('s1', 'id', colocate_with => ‘t1’);
MERGE INTO t1
USING s1
ON t1.id = s1.id
WHEN MATCHED THEN
UPDATE SET val = s1.val + 10
WHEN MATCHED THEN
DELETE
WHEN NOT MATCHED THEN
INSERT (id, val, src) VALUES (s1.id, s1.val, s1.src)
*The only exception for both the phases II and III is, UPDATEs and INSERTs must be done on the same shard-group
as the joined key; for example, below scenarios are NOT supported as the key-value to be inserted/updated is not
guaranteed to be on the same node as the id distribution-column.
MERGE INTO target t
USING source s ON (t.customer_id = s.customer_id)
WHEN NOT MATCHED THEN - -
INSERT(customer_id, …) VALUES (<non-local-constant-key-value>, ……);
OR this scenario where we update the distribution column itself
MERGE INTO target t
USING source s On (t.customer_id = s.customer_id)
WHEN MATCHED THEN
UPDATE SET customer_id = 100;
(cherry picked from commit fa7b8949a8)
Decide core distribution params in CreateCitusTable to reduce the
chances of
creating Citus tables based on incorrect combinations of distribution
method
and replication model params.
Also introduce DistributedTableParams struct to encapsulate the
parameters
that are specific to distributed tables.
Now that we will soon add another table type having DISTRIBUTE_BY_NONE
as distribution method and that we want the code to interpret such
tables mostly as distributed tables, let's make the definition of those
other two table types more strict by removing
CITUS_TABLE_WITH_NO_DIST_KEY
macro.
And instead, use HasDistributionKey() check in the places where the
logic applies to all table types that have / don't have a distribution
key. In future PRs, we might want to convert some of those
HasDistributionKey() checks if logic only applies to Citus local /
reference tables, not the others.
And adding HasDistributionKey() also allows us to consider having
DISTRIBUTE_BY_NONE as the distribution method as a "table attribute"
that can apply to distributed tables too, rather something that
determines the table type.
Split the main logic that allows creating a Citus table into the
internal function CreateCitusTable().
Old CreateDistributedTable() function was assuming that it's creating
a reference table when the distribution method is DISTRIBUTE_BY_NONE.
However, soon this won't be the case when adding support for creating
single-shard distributed tables because their distribution method would
also be the same.
Now the internal method CreateCitusTable() doesn't make any assumptions
about table's replication model or such. Instead, it expects callers to
properly set all such metadata bits.
Even more, some of the parameters the old CreateDistributedTable() takes
--such as the shard count-- were not meaningful for a reference table,
and would be the same as for new table type.
DESCRIPTION: Fixes a bug in shard copy operations.
For copying shards in both shard move and shard split operations, Citus
uses the COPY statement.
A COPY all statement in the following form
` COPY target_shard FROM STDIN;`
throws an error when there is a GENERATED column in the shard table.
In order to fix this issue, we need to exclude the GENERATED columns in
the COPY and the matching SELECT statements. Hence this fix converts the
COPY and SELECT all statements to the following form:
```
COPY target_shard (col1, col2, ..., coln) FROM STDIN;
SELECT (col1, col2, ..., coln) FROM source_shard;
```
where (col1, col2, ..., coln) does not include a GENERATED column.
GENERATED column values are created in the target_shard as the values
are inserted.
Fixes#6705.
---------
Co-authored-by: Teja Mupparti <temuppar@microsoft.com>
Co-authored-by: aykut-bozkurt <51649454+aykut-bozkurt@users.noreply.github.com>
Co-authored-by: Jelte Fennema <jelte.fennema@microsoft.com>
Co-authored-by: Gürkan İndibay <gindibay@microsoft.com>
DESCRIPTION: Adds logic to distribute unbalanced shards
If the number of shard placements (for a colocation group) is less than
the number of workers, it means that some of the workers will remain
empty. With this PR, we consider these shard groups as a colocation
group, in order to make them be distributed evenly as much as possible
across the cluster.
Example:
```sql
create table t1 (a int primary key);
create table t2 (a int primary key);
create table t3 (a int primary key);
set citus.shard_count =1;
select create_distributed_table('t1','a');
select create_distributed_table('t2','a',colocate_with=>'t1');
select create_distributed_table('t3','a',colocate_with=>'t2');
create table tb1 (a bigint);
create table tb2 (a bigint);
select create_distributed_table('tb1','a');
select create_distributed_table('tb2','a',colocate_with=>'tb1');
select citus_add_node('localhost',9702);
select rebalance_table_shards();
```
Here we have two colocation groups, each with one shard group. Both
shard groups are placed on the first worker node. When we add a new
worker node and try to rebalance table shards, the rebalance planner
considers it well balanced and does nothing. With this PR, the
rebalancer tries to distribute these shard groups evenly across the
cluster as much as possible. For this example, with this PR, the
rebalancer moves one of the shard groups to the second worker node.
fixes: #6715
DESCRIPTION: Correctly report shard size in citus_shards view
When looking at citus_shards, people are interested in the actual size
that all the data related to the shard takes up on disk.
`pg_total_relation_size` is the function to use for that purpose. The
previously used `pg_relation_size` does not include indexes or TOAST.
Especially the missing toast can have enormous impact on the size of the
shown data.
2 improvements to prevent memory leaks during altering or undistributing
distributed tables with a lot of partitions and shards:
1. Free memory for each call to ConvertTable so that colocated and partition tables at
`AlterDistributedTable`, `UndistributeTable`, or
`AlterTableSetAccessMethod` will not cause an increase
in memory usage,
2. Free memory while executing attach partition commands for each partition table at
`AlterDistributedTable` to prevent an increase in memory usage.
DESCRIPTION: Fixes memory leak issue during altering distributed table
with a lot of partition and shards.
Fixes https://github.com/citusdata/citus/issues/6503.
We have memory leak during distribution of a table with a lot of
partitions as we do not release memory at ExprContext until all
partitions are not distributed. We improved 2 things to resolve the
issue:
1. We create and delete MemoryContext for each call to
`CreateDistributedTable` by partitions,
2. We rebuild the cache after we insert all the placements instead of
each placement for a shard.
DESCRIPTION: Fixes memory leak during distribution of a table with a lot
of partitions and shards.
Fixes https://github.com/citusdata/citus/issues/6572.
When auto_explain module is loaded and configured, EXPLAIN will be
implicitly run for all the supported commands. Postgres does not support
`EXPLAIN` for `ALTER` command. However, auto_explain will try to
`EXPLAIN` other supported commands internally triggered by `ALTER`.
For instance,
`ALTER TABLE target_table ADD CONSTRAINT fkey_167 FOREIGN KEY (col_1)
REFERENCES ref_table(key) ... `
command may trigger a SELECT command in the following form for foreign
key validation purpose:
`SELECT fk.col_1 FROM ONLY target_table fk LEFT OUTER JOIN ONLY
ref_table pk ON ( pk.key OPERATOR(pg_catalog.=) fk.col_1) WHERE pk.key
IS NULL AND (fk.col_1 IS NOT NULL) `
For Citus tables, the Citus utility hook should ensure that constraint
validation is skipped for shell tables but they are done for shard
tables. The reason behind this design choice can be summed up as:
- An ALTER TABLE command via coordinator node is run in a distributed
transaction.
- Citus does not support nested distributed transactions.
- A SELECT query on a distributed table (aka shell table) is also run in
a distributed transaction.
- Therefore, Citus does not support running a SELECT query on a shell
table while an ALTER TABLE command is running.
With
eadc88a800
a bug is introduced breaking the skip constraint validation behaviour of
Citus. With this change, we see that validation queries on distributed
tables are triggered within `ALTER` command adding constraints with
validation check. This regression did not cause an issue for regular use
cases since the citus executor hook blocks those queries heuristically
when there is an ALTER TABLE command in progress.
The issue is surfaced as a crash (#6424 Workers, when configured to use
auto_explain, crash during distributed transactions.) when auto_explain
is enabled. This is due to auto_explain trying to execute the SELECT
queries in a nested distributed transaction.
Now since the regression with constraint validation is fixed in
https://github.com/citusdata/citus/issues/6543, we should be able to
remove the workaround.
We should not omit to free PGResult when we receive single tuple result
from an internal backend.
Single tuple results are normally freed by our ReceiveResults for
`tupleDescriptor != NULL` flow but not for those with `tupleDescriptor
== NULL`. See PR #6722 for details.
DESCRIPTION: Fixes memory leak issue with query results that returns
single row.
Prevents memory leak during ConvertTable call for a table with a lot of
partitions.
DESCRIPTION: Fixes memory leak during undistribution and alteration of a
table with a lot of partitions.
In #6314 I refactored the connection cleanup to be simpler to
understand and use. However, by doing so I introduced a use-after-free
possibility (that valgrind luckily picked up):
In the `ShouldShutdownConnection` path of
`AfterXactHostConnectionHandling`
we free connections without removing the `transactionNode` from the
dlist that it might be part of. Before the refactoring this wasn't a
problem, because the dlist would be completely reset quickly after in
`ResetGlobalVariables` (without reading or writing the dlist entries).
The refactoring changed this by moving the `dlist_delete` call to
`ResetRemoteTransaction`, which in turn was called in the
`!ShouldShutdownConnection` path of `AfterXactHostConnectionHandling`.
Thus this `!ShouldShutdownConnection` path would now delete from the
`dlist`, but the `ShouldShutdownConnection` path would not. Thus to
remove itself the deleting path would sometimes update nodes in the list
that were freed right before.
There's two ways of fixing this:
1. Call `dlist_delete` from **both** of paths.
2. Call `dlist_delete` from **neither** of the paths.
This commit implements the second approach, and #6684 implements the
first. We need to choose which approach we prefer.
To make calling `dlist_delete` from both paths actually work, we also need
to use a slightly different check to determine if we need to call dlist_delete.
Various regression tests showed that there can be cases where the
`transactionState` is something else than `REMOTE_TRANS_NOT_STARTED`
but the connection was not added to the `InProgressTransactions` list
One example of such a case is when running `TransactionStateMachine`
without calling `StartRemoteTransactionBegin` beforehand. In those
cases the connection won't be added to `InProgressTransactions`, but
the `transactionState` is changed to `REMOTE_TRANS_SENT_COMMAND`.
Sidenote: This bug already existed in 11.1, but valgrind didn't catch it
back then. My guess is that this happened because #6314 was merged after
the initial release branch was cut.
Fixes#6638
If there is a problem with an ongoing rebalance, we did not show details
on background tasks that are stuck in runnable state. Similar to how we
show details for errored tasks, we now show details on tasks that are
being retried.
Earlier we showed the following output when a task was stuck:
```
┌────────────────────────────┐
│ { ↵│
│ "tasks": [ ↵│
│ ], ↵│
│ "task_state_counts": {↵│
│ "done": 13, ↵│
│ "blocked": 2, ↵│
│ "runnable": 1 ↵│
│ } ↵│
│ } │
└────────────────────────────┘
```
Now we show details like the following:
```
+-----------------------------------------------------------------------
| {
| "tasks": [
| {
| "state": "runnable",
| "command": "SELECT pg_catalog.citus_move_shard_placement(1
| "message": "ERROR: Moving shards to a node that shouldn't
| "retried": 2,
| "task_id": 3
| }
| ],
| "task_state_counts": {
| "blocked": 1,
| "runnable": 1
| }
| }
+-----------------------------------------------------------------------
```
DESCRIPTION: Fix background rebalance when reference table has no PK
For the background rebalance we would always fail if a reference table
that was not replicated to all nodes would not have a PK (or replica
identity). Even when we used force_logical or block_writes as the shard
transfer mode. This fixes that and adds some regression tests.
Fixes#6680
We should disallow dropping table_name option if foreign table is in
metadata. Otherwise, we get table not found error which contains
shardid.
DESCRIPTION: Fixes an unexpected foreign table error by disallowing to drop the table_name option.
Fixes#6663
Recursive planner should handle all the tree from bottom to top at
single pass. i.e. It should have already recursively planned all
required parts in its first pass. Otherwise, this means we have bug at
recursive planner, which needs to be handled. We add a check here and
return error.
DESCRIPTION: Fixes wrong results by throwing error in case recursive
planner multipass the query.
We found 3 different cases which causes recursive planner passes the
query multiple times.
1. Sublink in WHERE clause is planned at second pass after we
recursively planned a distributed table at the first pass. Fixed by PR
#6657.
2. Local-distributed joins are recursively planned at both the first and
the second pass. Issue #6659.
3. Some parts of the query is considered to be noncolocated at the
second pass as we do not generate attribute equivalances between
nondistributed and distributed tables. Issue #6653
DESCRIPTION: Fix foreign key validation skip at the end of shard move
In eadc88a we started completely skipping foreign key constraint
validation at the end of a non blocking shard move, instead of only for
foreign keys to reference tables. However, it turns out that this didn't
work at all because of a hard to notice bug: By resetting the
SkipConstraintValidation flag at the end of our utility hook, we
actually make the SET command that sets it a no-op.
This fixes that bug by removing the code that resets it. This is fine
because #6543 removed the only place where we set the flag in C code. So
the resetting of the flag has no purpose anymore. This PR also adds a
regression test, because it turned out we didn't have any otherwise we
would have caught that the feature was completely broken.
It also moves the constraint validation skipping to the utility hook.
The reason is that #6550 showed us that this is the better place to skip
it, because it will also skip the planning phase and not just the
execution.
We should do the sublink conversations at the end of the recursive
planning because earlier steps might have transformed the query into a
shape that needs recursively planning the sublinks.
DESCRIPTION: Fixes early sublink check at recursive planner.
Related to PR https://github.com/citusdata/citus/pull/6650
Fixes#6655.
heap_modify_tuple() fetches values[i] if replace[i] is set true,
regardless of the fact that whether isnull[i] is true or false. So
similar to replace[], let's init values[] & isnull[] too.
DESCRIPTION: Fixes an uninitialized memory access in
create_distributed_function()
This change allows creating a constraint without a name using an index.
The index name will be used as the constraint name the same way postgres
handles it.
Fixes issue #6644
This commit also cleans up some leftovers from nameless constraint checks.
With this commit, we now fully support adding all nameless constraints
directly to a table.
Co-authored-by: naisila <nicypp@gmail.com>
Adds NOT VALID option to deparser. When we need to deparse:
"ALTER TABLE ADD FOREIGN KEY ... NOT VALID"
"ALTER TABLE ADD CHECK ... NOT VALID"
NOT VALID option should be propagated to workers.
Fixes issue #6646
This commit also uses AppendColumnNameList function
instead of repeated code blocks in two appropriate places
in the "ALTER TABLE" deparser.
If an update query on a reference table has a returns clause with a
subquery that accesses some other local table, we end-up with an crash.
This commit prevents the crash, but does not prevent other error
messages from happening due to Citus not being able to pushdown the
results of that subquery in a valid SQL command.
Related: #6634
DESCRIPTION: Fix regression in allowed foreign keys on distributed
tables
In commit eadc88a we changed how we skip foreign key validation. The
goal was to skip it in more cases. However, one change had the
unintended regression of introducing failures when trying to create
certain foreign keys. This reverts that part of the change.
The way of skipping validation of foreign keys that was introduced in
eadc88a was skipping validation during execution. The reason that
this caused this regression was because some foreign key validation
queries already fail during planning. In those cases it never gets to
the execution step where it would later be skipped.
Fixes#6543
DESCRIPTION: Enable adding FOREIGN KEY constraints on Citus tables
without a name
This PR enables adding a foreign key to a distributed/reference/Citus
local table without specifying the name of the constraint, e.g. `ALTER
TABLE items ADD FOREIGN KEY (user_id) REFERENCES users (id);`
This implements the phase - II of MERGE sql support
Support routable query where all the tables in the merge-sql are distributed, co-located, and both the source and
target relations are joined on the distribution column with a constant qual. This should be a Citus single-task
query. Below is an example.
SELECT create_distributed_table('t1', 'id');
SELECT create_distributed_table('s1', 'id', colocate_with => ‘t1’);
MERGE INTO t1
USING s1 ON t1.id = s1.id AND t1.id = 100
WHEN MATCHED THEN
UPDATE SET val = s1.val + 10
WHEN MATCHED THEN
DELETE
WHEN NOT MATCHED THEN
INSERT (id, val, src) VALUES (s1.id, s1.val, s1.src)
Basically, MERGE checks to see if
There are a minimum of two distributed tables (source and a target).
All the distributed tables are indeed colocated.
MERGE relations are joined on the distribution column
MERGE .. USING .. ON target.dist_key = source.dist_key
The query should touch only a single shard i.e. JOIN AND with a constant qual
MERGE .. USING .. ON target.dist_key = source.dist_key AND target.dist_key = <>
If any of the conditions are not met, it raises an exception.
citus_job_list() lists all background jobs by simply showing the records
in pg_dist_background_job.
citus_job_status(job_id bigint, raw boolean default false) shows the
status of a single background job by appending a jsonb details column to
the associated row from pg_dist_background_job. If the raw argument is
set, machine readable sizes are used instead of human readable
alternatives.
citus_rebalance_status(raw boolean default false) shows the status of
the last rebalance operation. If the raw argument is set, machine
readable sizes are used instead of human readable alternatives.
The original implementation of GPIDs didn't work correctly when using
`pg_dist_poolinfo` together with PgBouncer. The reason is that it
assumed that once a connection was made to a worker, the originating
GPID should stay the same for ever. But when pg_dist_poolinfo is used
this isn't the case, because the same connection on the worker might be
used by different backends of the coordinator.
This fixes that issue by updating the GPID whenever a new application
name is set on a connection. This is the only thing that's needed,
because PgBouncer already sets the application name correctly on the
server connection whenever a client is updated.
DESCRIPTION: Enable adding CHECK constraints on distributed tables
without the client having to provide a constraint name.
This PR enables the following command syntax for adding check
constraints to distributed tables.
ALTER TABLE ... ADD CHECK ...
by creating a default constraint name and transforming the command into
the below syntax before sending it to workers.
ALTER TABLE ... ADD CONSTRAINT \<conname> CHECK ...
Table Constraints UNIQUE, PRIMARY KEY and EXCLUDE may have option
DEFERRABLE in their command syntax. This PR handles the option when
deparsing the relevant constraint statements.
NOT DEFERRABLE
and
INITIALLY IMMEDIATE (if DEFERRABLE}
are the default values for the option so we only append the non-default
values to the alter table statement.
In #6412 I made a change to not re-assign the global PID if it was
already set. This inadvertently introduced a regression where `userId`
and `databaseId` would not be set on the backend data when the global
PID was assigned in the authentication hook.
This fixes it by doing two things:
1. Removing `userId` from `BackendData`, since it's not used anywhere
anyway.
2. Move assignment of `databaseId` to dedicated
`SetBackendDataDatabaseId` function, that isn't a no-op when global
pid is already set.
Since #6412 is not released yet this does not need a description.
In #6598 it was noticed that Citus could generate syntactically invalid
statements during logical replication. With #6603 we resolved the direct
issue, by only generating valid subscription names. But there was also
the underlying problem that we did not escape certain identifier
strings. While in theory this should be okay since we should only
generate names that are valid, this issue reiterated that we should not
take this for granted. As an extra line of defense this quotes all
identifiers we use during logical replication setup.
DESCRIPTION: Adds support for creating table constraints UNIQUE and
EXCLUDE via ALTER TABLE command without client having to specify a name.
ALTER TABLE ... ADD CONSTRAINT <conname> UNIQUE ...
ALTER TABLE ... ADD CONSTRAINT <conname> EXCLUDE ...
commands require the client to provide an explicit constraint name.
However, in postgres it is possible for clients not to provide a name
and let the postgres generate it using the following commands
ALTER TABLE ... ADD UNIQUE ...
ALTER TABLE ... ADD EXCLUDE ...
This PR enables the same functionality for citus tables.
DESCRIPTION: Drop `SHARD_STATE_TO_DELETE` and use the cleanup records
instead
Drops the shard state that is used to mark shards as orphaned. Now we
insert cleanup records into `pg_dist_cleanup` so "orphaned" shards will
be dropped either by maintenance daemon or internal cleanup calls. With
this PR, we make the "cleanup orphaned shards" functions to be no-op, as
they would not be needed anymore.
This PR includes some naming changes about placement functions. We don't
need functions that filter orphaned shards, as there will be no orphaned
shards anymore.
We will also be introducing a small script with this PR, for users with
orphaned shards. We'll basically delete the orphaned shard entries from
`pg_dist_placement` and insert cleanup records into `pg_dist_cleanup`
for each one of them, during Citus upgrade.
We also have a lot of flakiness fixes in this PR.
Co-authored-by: Jelte Fennema <github-tech@jeltef.nl>
DESCRIPTION: Defers cleanup after a failure in shard move or split
We don't need to do a cleanup in case of failure on a shard transfer or
split anymore. Because,
* Maintenance daemon will clean them up anyway.
* We trigger a cleanup at the beginning of shard transfers/splits.
* The cleanup on failure logic also can fail sometimes and instead of
the original error, we throw the error that is raised by the cleanup
procedure, and it causes confusion.
DESCRIPTION: Cleanup the shard on the target node in case of a
failed/aborted shard move
Inserts a cleanup record for the moved shard placement on the target
node. If the move operation succeeds, the record will be deleted. If
not, it will remain there to be cleaned up later.
fixes: #6580
All the tables (target, source or any CTE present) in the SQL statement are local i.e. a merge-sql with a combination of Citus local and
Non-Citus tables (regular Postgres tables) should work and give the same result as Postgres MERGE on regular tables. Catch and throw an
exception (not-yet-supported) for all other scenarios during Citus-planning phase.
DESCRIPTION: Support ALTER TABLE .. ADD PRIMARY KEY ... command
Before processing
> **ALTER TABLE ... ADD PRIMARY KEY ...**
command
1. Create a primary key name to use as the constraint name.
2. Change the **ALTER TABLE ... ADD PRIMARY KEY ...** command to into
**ALTER TABLE ... ADD CONSTRAINT \<constraint name> PRIMARY KEY ...**
form.
This is the only form we can specify a name for a primary key. If we run
ALTER TABLE .. ADD PRIMARY KEY, postgres
would create a constraint name internally in its own scheme. But the
problem is that we need to create constraint names
for shards in our own scheme which is \<constraint name>_\<shardid>.
Hence we need to create a name and send it to workers so that the
workers can append the shardid.
4. Run the changed command on the coordinator to make sure we are using
the same constraint name across the board.
5. Send the changed command to workers such that it is executed for the
main table as well as for the shards.
Fixes#6515.
Removes unused job boundary tag `SUBQUERY_MAP_MERGE_JOB`.
Only usage is at `BuildMapMergeJob`, which is only called when the
boundary = `JOIN_MAP_MERGE_JOB`. Hence, it should be safe to remove.
Before this commit, we created an additional WaitEventSet for
checking whether the remote socket is closed per connection -
only once at the start of the execution.
However, for certain workloads, such as pgbench select-only
workloads, the creation/deletion of the additional WaitEventSet
adds ~7% CPU overhead, which is also reflected on the benchmark
results.
With this commit, we use the same WaitEventSet for the purposes
of checking the remote socket at the start of the execution.
We use "rebuildWaitEventSet" flag so that the executor can re-use
the existing WaitEventSet.
As a result, we see the following improvements on PG 15:
main : 120051 tps, 0.532 ms latency avg.
avoid_wes_rebuild: 127119 tps, 0.503 ms latency avg.
And, on PG 14, as expected, there is no difference
main : 129191 tps, 0.495 ms latency avg.
avoid_wes_rebuild: 129480 tps, 0.494 ms latency avg.
But, note that PG 15 is slightly (~1.5%) slower than PG 14.
That is probably the overhead of checking the remote socket.
Fixes a missed include in #6315.
While adding the cluster clock we have added some extra steps to
`citus_prepare_pg_upgrade` and `citus_finish_pg_upgrade`. These changes
were not added to the citus upgrade and downgrade scripts, this allowed
for a syntax error to slip in.
This PR adds the new versions of both UDF's to the upgrade script while
adding the old version to the downgrade script. This exposed the syntax
error which is also solved.
- Because of the make command used for vanilla tests, test status is
always shown as success on CI. As a fix, I added `&& false` at the end
of the copying diff file to make the command fail when check-vanilla
fails.
```make
check-vanilla: all
$(pg_regress_multi_check) --vanillatest || (cp $(vanilla_diffs_file) $(citus_abs_srcdir)/regression.diffs && false)
```
- I also fixed some vanilla tests that fails due to recently added clock
related operators shown up at some queries.
We already have citus_job_wait to wait until the job reaches the desired
state. That PR adds waiting on task state to allow more granular
waiting. It can be used for Citus operations. Moreover, it is also
useful for testing purposes. (wait until a task reaches specified state)
Related to #6459.
Fixes task executor SIGTERM handling.
Problem:
When task executors are sent SIGTERM, their default handler
`bgworker_die`, which is set at worker startup, logs FATAL error. But
they do not release locks there before logging the error, which
sometimes causes hanging of the monitor. e.g. Monitor waits for the lock
forever at pg_stat flush after calling proc_exit.
Solution:
Because executors have connection to backend, they should handle SIGTERM
similar to normal backends. Normal backends uses `die` handler, in which
they set ProcDiePending flag and the next CHECK_FOR_INTERRUPTS call
handles it gracefully by releasing any lock before termination.
DESCRIPTION: Create replication artifacts with unique names
We're creating replication objects with generic names. This disallows us
to enable parallel shard moves, as two operations might use the same
objects. With this PR, we'll create below objects with operation
specific names, by appending OparationId to the names.
* Subscriptions
* Publications
* Replication Slots
* Users created for subscriptions
1) Regular users fail to use clock UDF with permission issue.
2) Clock functions were declared as STABLE, whereas by definition they are VOLATILE. By design, any clock/time
functions will return different results for each call even within a single SQL statement.
Note: UDF citus_get_transaction_clock() is a misnomer as it internally calls the clock tick which always returns
different results for every invocation in the same transaction.
Adds signal handlers for graceful termination, cancellation of
task executors and detecting config updates. Related to PR #6459.
#### How to handle termination signal?
Monitor need to gracefully terminate all running task executors before
terminating. Hence, we have sigterm handler for the monitor.
#### How to handle cancellation signal?
Monitor need to gracefully cancel all running task executors before
terminating. Hence, we have sigint handler for the monitor.
#### How to detect configuration changes?
Monitor has SIGHUP handler to reflect configuration changes while
executing tasks.
When using multiline strings, we occasionally forget to add a single
space at the end of the first line. When this line is concatenated with
the next one, the resulting string has a missing space.
DESCRIPTION: Extend cleanup process for replication artifacts
This PR adds new cleanup record types for:
* Subscriptions
* Replication slots
* Publications
* Users created for subscriptions
We add records for these object types, to `pg_dist_cleanup` during
creation phase. Once the operation is done, in case of success or
failure, we iterate those records and drop the objects. With this PR we
will not be dropping any of these objects during the operation. In
short, we will always be deferring the drop.
One thing that's worth mentioning is that we sort cleanup records before
processing (dropping) them, because of dependency relations among those
objects, e.g a subscription might depend on a publication. Therefore, we
always drop subscriptions before publications.
We have some renames in this PR:
* `TryDropOrphanedShards` -> `TryDropOrphanedResources`
* `DropOrphanedShardsForCleanup` -> `DropOrphanedResourcesForCleanup`
* `run_try_drop_marked_shards` -> `run_try_drop_marked_resources`
as these functions now process replication artifacts as well.
This PR drops function `DropAllLogicalReplicationLeftovers` and its all
usages, since now we rely on the deferring drop mechanism.
Improvement on our background task monitoring API (PR #6296) to support
concurrent and nonblocking task execution.
Mainly we have a queue monitor background process which forks task
executors for `Runnable` tasks and then monitors their status by
fetching messages from shared memory queue in nonblocking way.
**Problem**: Currently, we error out if we detect recurring tuples in
one side without checking the other side of the join.
**Solution**: When one side of the full join consists recurring tuples
and the other side consists nonrecurring tuples, we should not pushdown
to prevent duplicate results. Otherwise, safe to pushdown.
This PR changes
```citus.propagate_session_settings_for_loopback_connection``` default
value to off not to expose this feature publicly at this point. See
#6488 for details.
When debugging issues it's quite useful to see the originating gpid in
the application_name of a query on a worker. This already happens for
most queries, but not for queries created by the rebalancer or by
run_command_on_worker. This adds a gpid to those two application_names
too.
Note, that if the GPID of the new application_names is different than
the current GPID of the backend the backend will continue to keep
the old gpid as its actual GPID. This PR is just meant to make sure
that the application_name is as useful as it can be for users to
look at. Updating of gpids will be done in a follow-up PR, and
adding gpids to all internal connections will make this easier.
DESCRIPTION: Fixes a potential dangling pointer issue
Need to backport to 11.0 & 11.1 since we might want to release packages
for debian/bookworm based on those branches in future.
Fixes a bug that causes crash when using auto_explain extension with
ALTER TABLE...ADD FOREIGN KEY... queries.
Those queries trigger a SELECT query on the citus tables as part of the
foreign key constraint validation check. At the explain hook, workers
try to explain this SELECT query as a distributed query causing memory
corruption in the connection data structures. Hence, we will not explain
ALTER TABLE...ADD FOREIGN KEY... and the triggered queries on the
workers.
Fixes#6424.
(Hopefully) Fixes#5000.
If memory allocation done for `SubXactContext *state` in `PushSubXact()`
fails, then `PopSubXact()` might segfault, for example, when grabbing
the
topmost `SubXactContext` from `activeSubXactContexts` if this is the
first
ever subxact within the current xact, with the following stack trace:
```c
citus.so!list_nth_cell(const List * list, int n) (\opt\pgenv\pgsql-14.3\include\server\nodes\pg_list.h:260)
citus.so!PopSubXact(SubTransactionId subId) (\home\onurctirtir\citus\src\backend\distributed\transaction\transaction_management.c:761)
citus.so!CoordinatedSubTransactionCallback(SubXactEvent event, SubTransactionId subId, SubTransactionId parentSubid, void * arg) (\home\onurctirtir\citus\src\backend\distributed\transaction\transaction_management.c:673)
CallSubXactCallbacks(SubXactEvent event, SubTransactionId mySubid, SubTransactionId parentSubid) (\opt\pgenv\src\postgresql-14.3\src\backend\access\transam\xact.c:3644)
AbortSubTransaction() (\opt\pgenv\src\postgresql-14.3\src\backend\access\transam\xact.c:5058)
AbortCurrentTransaction() (\opt\pgenv\src\postgresql-14.3\src\backend\access\transam\xact.c:3366)
PostgresMain(int argc, char ** argv, const char * dbname, const char * username) (\opt\pgenv\src\postgresql-14.3\src\backend\tcop\postgres.c:4250)
BackendRun(Port * port) (\opt\pgenv\src\postgresql-14.3\src\backend\postmaster\postmaster.c:4530)
BackendStartup(Port * port) (\opt\pgenv\src\postgresql-14.3\src\backend\postmaster\postmaster.c:4252)
ServerLoop() (\opt\pgenv\src\postgresql-14.3\src\backend\postmaster\postmaster.c:1745)
PostmasterMain(int argc, char ** argv) (\opt\pgenv\src\postgresql-14.3\src\backend\postmaster\postmaster.c:1417)
main(int argc, char ** argv) (\opt\pgenv\src\postgresql-14.3\src\backend\main\main.c:209)
```
For this reason, to be more defensive against memory-allocation errors
that could happen at `PushSubXact()`, now we use our pre-allocated
memory
context for the objects created in `PushSubXact()`.
This commit also attempts reducing the memory allocations done under
CommitContext to reduce the chances of consuming all the memory
available
to CommitContext.
Note that it's problematic to encounter with such a memory-allocation
error for other objects created in `PushSubXact()` as well, so above is
an **example** scenario that might result in a segfault.
DESCRIPTION: Fixes a bug that might cause segfaults when handling deeply
nested subtransactions
DESCRIPTION: Makes sure to disallow triggers that depend on extensions
We were already doing so for `ALTER trigger DEPENDS ON EXTENSION`
commands. However, we also need to disallow creating Citus tables
having such triggers already, so this PR fixes that.
DESCRIPTION: Improve a query that terminates compeling backends from citus_update_node()
1. Use pg_blocking_pids() function instead of self join on pg_locks. It exists since 9.6 and more accurate than pg_locks.
2. Prefix all function calls with pg_catalog schema to prevent privilege escalation by creating functions with similar names in a public schema.
3. Change logs and update comments to reflect the fact that the pg_terminate_backend() function only sends SIGTERM but not wating for the actual backend termination.