Commit Graph

3358 Commits (a664e849bc1e70d15bdc260e87d72f9980c7eee8)

Author SHA1 Message Date
Teja Mupparti da7db53c87 Refactor some of the planning code to accomodate a new planning path for MERGE SQL 2023-03-22 11:29:24 -07:00
Onur Tirtir e1f1d63050
Rename AllRelations.. functions to AllDistributedRelations.. (#6789)
Because they're only interested in distributed tables. Even more,
this replaces HasDistributionKey() check with
IsCitusTableType(DISTRIBUTED_TABLE) because this doesn't make a
difference on main and sounds slightly more intuitive. Plus, this
would also allow safely using this function in
https://github.com/citusdata/citus/pull/6773.
2023-03-22 15:15:23 +03:00
Ahmet Gedemenli 2713e015d6
Check before logicalrep for rebalancer, error if needed (#6754)
DESCRIPTION: Check before logicalrep for rebalancer, error if needed

Check if we can use logical replication or not, in case of shard
transfer mode = auto, before executing the shard moves. If we can't,
error out. Before this PR, we used to error out in the middle of shard
moves:
```sql
set citus.shard_count = 4; -- just to get the error sooner
select citus_remove_node('localhost',9702);

create table t1 (a int primary key);
select create_distributed_table('t1','a');
create table t2 (a bigint);
select create_distributed_table('t2','a');

select citus_add_node('localhost',9702);
select rebalance_table_shards();
NOTICE:  Moving shard 102008 from localhost:9701 to localhost:9702 ...
NOTICE:  Moving shard 102009 from localhost:9701 to localhost:9702 ...
NOTICE:  Moving shard 102012 from localhost:9701 to localhost:9702 ...
ERROR:  cannot use logical replication to transfer shards of the relation t2 since it doesn't have a REPLICA IDENTITY or PRIMARY KEY
```

Now we check and error out in the beginning, without moving the shards.

fixes: #6727
2023-03-21 16:34:52 +03:00
Onur Tirtir aa465b6de1
Decide what to do with router planner error at one place (#6781) 2023-03-21 14:04:07 +03:00
Teja Mupparti cf55136281 1) Restrict MERGE command INSERT to the source's distribution column
Fixes #6672

2) Move all MERGE related routines to a new file merge_planner.c

3) Make ConjunctionContainsColumnFilter() static again, and rearrange the code in MergeQuerySupported()
4) Restore the original format in the comments section.
5) Add big serial test. Implement latest set of comments
2023-03-16 13:43:08 -07:00
Teja Mupparti 1e42cd3da0 Support MERGE on distributed tables with restrictions
This implements the phase - II of MERGE sql support

Support routable query where all the tables in the merge-sql are distributed, co-located, and both the source and
target relations are joined on the distribution column with a constant qual. This should be a Citus single-task
query. Below is an example.

SELECT create_distributed_table('t1', 'id');
SELECT create_distributed_table('s1', 'id', colocate_with => ‘t1’);

MERGE INTO t1
USING s1 ON t1.id = s1.id AND t1.id = 100
WHEN MATCHED THEN
UPDATE SET val = s1.val + 10
WHEN MATCHED THEN
DELETE
WHEN NOT MATCHED THEN
INSERT (id, val, src) VALUES (s1.id, s1.val, s1.src)

Basically, MERGE checks to see if

There are a minimum of two distributed tables (source and a target).
All the distributed tables are indeed colocated.
MERGE relations are joined on the distribution column
MERGE .. USING .. ON target.dist_key = source.dist_key
The query should touch only a single shard i.e. JOIN AND with a constant qual
MERGE .. USING .. ON target.dist_key = source.dist_key AND target.dist_key = <>
If any of the conditions are not met, it raises an exception.

(cherry picked from commit 44c387b978)

This implements MERGE phase3

Support pushdown query where all the tables in the merge-sql are Citus-distributed, co-located, and both
the source and target relations are joined on the distribution column. This will generate multiple tasks
which execute independently after pushdown.

SELECT create_distributed_table('t1', 'id');
SELECT create_distributed_table('s1', 'id', colocate_with => ‘t1’);

MERGE INTO t1
USING s1
ON t1.id = s1.id
        WHEN MATCHED THEN
                UPDATE SET val = s1.val + 10
        WHEN MATCHED THEN
                DELETE
        WHEN NOT MATCHED THEN
                INSERT (id, val, src) VALUES (s1.id, s1.val, s1.src)

*The only exception for both the phases II and III is, UPDATEs and INSERTs must be done on the same shard-group
as the joined key; for example, below scenarios are NOT supported as the key-value to be inserted/updated is not
guaranteed to be on the same node as the id distribution-column.

MERGE INTO target t
USING source s ON (t.customer_id = s.customer_id)
WHEN NOT MATCHED THEN - -
     INSERT(customer_id, …) VALUES (<non-local-constant-key-value>, ……);

OR this scenario where we update the distribution column itself

MERGE INTO target t
USING source s On (t.customer_id = s.customer_id)
WHEN MATCHED THEN
     UPDATE SET customer_id = 100;

(cherry picked from commit fa7b8949a8)
2023-03-16 13:43:08 -07:00
Onur Tirtir 9550ebd118 Remove pg_depend entries from columnar metadata indexes to columnar-am
In the past, having columnar tables in the cluster was causing pg
upgrades to fail when attempting to access columnar metadata. This is
because, pg_dump doesn't see objects that we use for columnar-am related
booking as the dependencies of the tables using columnar-am.
To fix that; in #5456, we inserted some "normal dependency" edges (from
those objects to columnar-am) into pg_depend.

This helped us ensuring the existency of a class of metadata objects
--such as columnar.storageid_seq-- and helped fixing #5437.

However, the normal-dependency edges that we added for indexes on
columnar metadata tables --such columnar.stripe_pkey-- didn't help at
all because they were indeed causing dependency loops (#5510) and
pg_dump was not able to take those dependency edges into the account.

For this reason, this commit deletes those dependency edges so that
pg_dump stops complaining about them. Note that it's not critical to
delete those edges from pg_depend since they're not breaking pg upgrades
but were triggering some warning messages. And given that backporting
a sql change into older versions is hard a lot, we skip backporting
this.
2023-03-14 17:13:52 +03:00
Onur Tirtir be0735a329 Use "cpp" to expand "#include" directives in columnar sql files 2023-03-14 17:13:52 +03:00
Onur Tirtir f68fc9e69c
Decide core distribution params in CreateCitusTable (#6760)
Decide core distribution params in CreateCitusTable to reduce the
chances of
creating Citus tables based on incorrect combinations of distribution
method
and replication model params.

Also introduce DistributedTableParams struct to encapsulate the
parameters
that are specific to distributed tables.
2023-03-14 14:24:52 +03:00
Onur Tirtir 20a5f3af2b
Replace CITUS_TABLE_WITH_NO_DIST_KEY checks with HasDistributionKey() (#6743)
Now that we will soon add another table type having DISTRIBUTE_BY_NONE
as distribution method and that we want the code to interpret such
tables mostly as distributed tables, let's make the definition of those
other two table types more strict by removing
CITUS_TABLE_WITH_NO_DIST_KEY
macro.

And instead, use HasDistributionKey() check in the places where the
logic applies to all table types that have / don't have a distribution
key. In future PRs, we might want to convert some of those
HasDistributionKey() checks if logic only applies to Citus local /
reference tables, not the others.

And adding HasDistributionKey() also allows us to consider having
DISTRIBUTE_BY_NONE as the distribution method as a "table attribute"
that can apply to distributed tables too, rather something that
determines the table type.
2023-03-10 13:55:52 +03:00
Onur Tirtir e3cf7ace7c
Stabilize single_node.sql and others that report illegal node removal (#6751)
See
https://app.circleci.com/pipelines/github/citusdata/citus/30859/workflows/223d61db-8c1d-4909-9aea-d8e470f0368b/jobs/1009243.
2023-03-08 15:25:36 +03:00
Onur Tirtir d82c11f793
Refactor CreateDistributedTable() (#6742)
Split the main logic that allows creating a Citus table into the
internal function CreateCitusTable().

Old CreateDistributedTable() function was assuming that it's creating
a reference table when the distribution method is DISTRIBUTE_BY_NONE.
However, soon this won't be the case when adding support for creating
single-shard distributed tables because their distribution method would
also be the same.

Now the internal method CreateCitusTable() doesn't make any assumptions
about table's replication model or such. Instead, it expects callers to
properly set all such metadata bits.

Even more, some of the parameters the old CreateDistributedTable() takes
 --such as the shard count-- were not meaningful for a reference table,
and would be the same as for new table type.
2023-03-08 13:38:51 +03:00
Emel Şimşek 4043abd5aa
Exclude-Generated-Columns-In-Copy (#6721)
DESCRIPTION: Fixes a bug in shard copy operations.

For copying shards in both shard move and shard split operations, Citus
uses the COPY statement.

A COPY all statement in the following form
` COPY target_shard FROM STDIN;`
throws an error when there is a GENERATED column in the shard table.

In order to fix this issue, we need to exclude the GENERATED columns in
the COPY and the matching SELECT statements. Hence this fix converts the
COPY and SELECT all statements to the following form:
```
COPY target_shard (col1, col2, ..., coln) FROM STDIN;
SELECT (col1, col2, ..., coln) FROM source_shard;
```
where (col1, col2, ..., coln) does not include a GENERATED column. 
GENERATED column values are created in the target_shard as the values
are inserted.

Fixes #6705.

---------

Co-authored-by: Teja Mupparti <temuppar@microsoft.com>
Co-authored-by: aykut-bozkurt <51649454+aykut-bozkurt@users.noreply.github.com>
Co-authored-by: Jelte Fennema <jelte.fennema@microsoft.com>
Co-authored-by: Gürkan İndibay <gindibay@microsoft.com>
2023-03-07 18:15:50 +03:00
Ahmet Gedemenli 03f1bb70b7
Rebalance shard groups with placement count less than worker count (#6739)
DESCRIPTION: Adds logic to distribute unbalanced shards

If the number of shard placements (for a colocation group) is less than
the number of workers, it means that some of the workers will remain
empty. With this PR, we consider these shard groups as a colocation
group, in order to make them be distributed evenly as much as possible
across the cluster.

Example:
```sql
create table t1 (a int primary key);
create table t2 (a int primary key);
create table t3 (a int primary key);
set citus.shard_count =1;
select create_distributed_table('t1','a');
select create_distributed_table('t2','a',colocate_with=>'t1');
select create_distributed_table('t3','a',colocate_with=>'t2');

create table tb1 (a bigint);
create table tb2 (a bigint);
select create_distributed_table('tb1','a');
select create_distributed_table('tb2','a',colocate_with=>'tb1');

select citus_add_node('localhost',9702);
select rebalance_table_shards();
```

Here we have two colocation groups, each with one shard group. Both
shard groups are placed on the first worker node. When we add a new
worker node and try to rebalance table shards, the rebalance planner
considers it well balanced and does nothing. With this PR, the
rebalancer tries to distribute these shard groups evenly across the
cluster as much as possible. For this example, with this PR, the
rebalancer moves one of the shard groups to the second worker node.

fixes: #6715
2023-03-06 14:14:27 +03:00
Emel Şimşek ed7cc8f460
Remove unused lock functions (#6747)
Code cleanup. This change removes two unused functions seemingly left
over after a previous refactoring of shard move code.
2023-03-06 13:59:45 +03:00
Jelte Fennema b489d763e1
Use pg_total_relation_size in citus_shards (#6748)
DESCRIPTION: Correctly report shard size in citus_shards view

When looking at citus_shards, people are interested in the actual size
that all the data related to the shard takes up on disk.
`pg_total_relation_size` is the function to use for that purpose. The
previously used `pg_relation_size` does not include indexes or TOAST.
Especially the missing toast can have enormous impact on the size of the
shown data.
2023-03-06 10:53:12 +01:00
aykut-bozkurt e2654deeae
fix memory leak during altering distributed table with a lot of partition and shards (#6726)
2 improvements to prevent memory leaks during altering or undistributing
distributed tables with a lot of partitions and shards:

1. Free memory for each call to ConvertTable so that colocated and partition tables at
`AlterDistributedTable`, `UndistributeTable`, or
`AlterTableSetAccessMethod` will not cause an increase
in memory usage,
2. Free memory while executing attach partition commands for each partition table at
`AlterDistributedTable` to prevent an increase in memory usage.

DESCRIPTION: Fixes memory leak issue during altering distributed table
with a lot of partition and shards.

Fixes https://github.com/citusdata/citus/issues/6503.
2023-02-28 21:23:41 +03:00
Teja Mupparti 9cbfdc86dd MERGE: In deparser, add missing check for RETURNING clause. 2023-02-26 22:38:14 -08:00
Teja Mupparti d7b499929c Rearrange the common code into a newfunction to facilitate the multiple checks of the same conditions in a multi-modify MERGE statement 2023-02-24 12:55:11 -08:00
aykut-bozkurt a7689c3f8d
fix memory leak during distribution of a table with a lot of partitions (#6722)
We have memory leak during distribution of a table with a lot of
partitions as we do not release memory at ExprContext until all
partitions are not distributed. We improved 2 things to resolve the
issue:

1. We create and delete MemoryContext for each call to
`CreateDistributedTable` by partitions,
2. We rebuild the cache after we insert all the placements instead of
each placement for a shard.

DESCRIPTION: Fixes memory leak during distribution of a table with a lot
of partitions and shards.

Fixes https://github.com/citusdata/citus/issues/6572.
2023-02-17 18:12:49 +03:00
Emel Şimşek 756c1d3f5d
Remove auto_explain workaround in citus explain hook for ALTER TABLE (#6714)
When auto_explain module is loaded and configured, EXPLAIN will be
implicitly run for all the supported commands. Postgres does not support
`EXPLAIN` for `ALTER` command. However, auto_explain will try to
`EXPLAIN` other supported commands internally triggered by `ALTER`.

 For instance, 
`ALTER TABLE target_table ADD CONSTRAINT fkey_167 FOREIGN KEY (col_1)
REFERENCES ref_table(key) ... `

command may trigger a SELECT command in the following form for foreign
key validation purpose:
`SELECT fk.col_1 FROM ONLY target_table fk LEFT OUTER JOIN ONLY
ref_table pk ON ( pk.key OPERATOR(pg_catalog.=) fk.col_1) WHERE pk.key
IS NULL AND (fk.col_1 IS NOT NULL) `

For Citus tables, the Citus utility hook should ensure that constraint
validation is skipped for shell tables but they are done for shard
tables. The reason behind this design choice can be summed up as:

- An ALTER TABLE command via coordinator node is run in a distributed
transaction.
- Citus does not support nested distributed transactions. 
- A SELECT query on a distributed table (aka shell table) is also run in
a distributed transaction.
- Therefore, Citus does not support running a SELECT query on a shell
table while an ALTER TABLE command is running.

With
eadc88a800
a bug is introduced breaking the skip constraint validation behaviour of
Citus. With this change, we see that validation queries on distributed
tables are triggered within `ALTER` command adding constraints with
validation check. This regression did not cause an issue for regular use
cases since the citus executor hook blocks those queries heuristically
when there is an ALTER TABLE command in progress.

The issue is surfaced as a crash (#6424 Workers, when configured to use
auto_explain, crash during distributed transactions.) when auto_explain
is enabled. This is due to auto_explain trying to execute the SELECT
queries in a nested distributed transaction.

Now since the regression with constraint validation is fixed in
https://github.com/citusdata/citus/issues/6543, we should be able to
remove the workaround.
2023-02-17 17:47:03 +03:00
aykut-bozkurt 9e69dd0e7f
fix single tuple result memory leak (#6724)
We should not omit to free PGResult when we receive single tuple result
from an internal backend.
Single tuple results are normally freed by our ReceiveResults for
`tupleDescriptor != NULL` flow but not for those with `tupleDescriptor
== NULL`. See PR #6722 for details.

DESCRIPTION: Fixes memory leak issue with query results that returns
single row.
2023-02-17 14:15:09 +03:00
aykut-bozkurt 273911ac7f
prevent memory leak during ConvertTable with a lot of partitions (#6693)
Prevents memory leak during ConvertTable call for a table with a lot of
partitions.

DESCRIPTION: Fixes memory leak during undistribution and alteration of a
table with a lot of partitions.
2023-02-13 15:22:13 +03:00
Teja Mupparti 0824d9c1fb Miscellaneous cleanup 2023-02-09 13:05:59 -08:00
Onur Tirtir 483b51392f
Bump Citus to 11.3devel (#6690) 2023-02-06 10:23:25 +00:00
Gokhan Gulbiz b6a4652849
Stop background daemon before dropping the database (#6688)
DESCRIPTION: Stop maintenance daemon when dropping a database even
without Citus extension

Fixes #6670
2023-02-03 15:15:44 +03:00
Jelte Fennema f061dbb253
Also reset transactions at connection shutdown (#6685)
In #6314 I refactored the connection cleanup to be simpler to
understand and use. However, by doing so I introduced a use-after-free
possibility (that valgrind luckily picked up):

In the `ShouldShutdownConnection` path of
`AfterXactHostConnectionHandling`
we free connections without removing the `transactionNode` from the
dlist that it might be part of. Before the refactoring this wasn't a
problem, because the dlist would be completely reset quickly after in
`ResetGlobalVariables` (without reading or writing the dlist entries).

The refactoring changed this by moving the `dlist_delete` call to
`ResetRemoteTransaction`, which in turn was called in the
`!ShouldShutdownConnection` path of `AfterXactHostConnectionHandling`.
Thus this `!ShouldShutdownConnection` path would now delete from the
`dlist`, but the `ShouldShutdownConnection` path would not. Thus to
remove itself the deleting path would sometimes update nodes in the list
that were freed right before.

There's two ways of fixing this:
1. Call `dlist_delete` from **both** of paths.
2. Call `dlist_delete` from **neither** of the paths.

This commit implements the second approach, and #6684 implements the
first. We need to choose which approach we prefer.

To make calling `dlist_delete` from both paths actually work, we also need
to use a slightly different check to determine if we need to call dlist_delete.
Various regression tests showed that there can be cases where the
`transactionState` is something else than `REMOTE_TRANS_NOT_STARTED`
but the connection was not added to the `InProgressTransactions` list
One example of such a case is when running `TransactionStateMachine`
without calling `StartRemoteTransactionBegin` beforehand. In those
cases the connection won't be added to `InProgressTransactions`, but
the `transactionState` is changed to `REMOTE_TRANS_SENT_COMMAND`. 

Sidenote: This bug already existed in 11.1, but valgrind didn't catch it
back then. My guess is that this happened because #6314 was merged after
the initial release branch was cut.

Fixes #6638
2023-02-02 16:05:34 +01:00
Hanefi Onaldi 47ff03123b
Improve rebalance reporting for retried tasks (#6683)
If there is a problem with an ongoing rebalance, we did not show details
on background tasks that are stuck in runnable state. Similar to how we
show details for errored tasks, we now show details on tasks that are
being retried.

Earlier we showed the following output when a task was stuck:

```
┌────────────────────────────┐
│ {                         ↵│
│     "tasks": [            ↵│
│     ],                    ↵│
│     "task_state_counts": {↵│
│         "done": 13,       ↵│
│         "blocked": 2,     ↵│
│         "runnable": 1     ↵│
│     }                     ↵│
│ }                          │
└────────────────────────────┘
```

Now we show details like the following:

```
+-----------------------------------------------------------------------
| {
|     "tasks": [
|         {
|             "state": "runnable",
|             "command": "SELECT pg_catalog.citus_move_shard_placement(1
|             "message": "ERROR: Moving shards to a node that shouldn't
|             "retried": 2,
|             "task_id": 3
|         }
|     ],
|     "task_state_counts": {
|         "blocked": 1,
|         "runnable": 1
|     }
| }
+-----------------------------------------------------------------------
```
2023-01-31 15:26:52 +03:00
Jelte Fennema 14c31fbb07
Fix background rebalance when reference table has no PK (#6682)
DESCRIPTION: Fix background rebalance when reference table has no PK

For the background rebalance we would always fail if a reference table
that was not replicated to all nodes would not have a PK (or replica
identity). Even when we used force_logical or block_writes as the shard
transfer mode. This fixes that and adds some regression tests.

Fixes #6680
2023-01-31 12:18:29 +01:00
aykut-bozkurt 8a9bb272e4
fix dropping table_name option from foreign table (#6669)
We should disallow dropping table_name option if foreign table is in
metadata. Otherwise, we get table not found error which contains
shardid.

DESCRIPTION: Fixes an unexpected foreign table error by disallowing to drop the table_name option.

Fixes #6663
2023-01-30 17:24:30 +03:00
Marco Slot a482b36760
Revert "Support MERGE on distributed tables with restrictions" (#6675)
Co-authored-by: Marco Slot <marco.slot@gmail.com>
2023-01-30 15:01:59 +01:00
Onur Tirtir 1c51ddae49 Fall-back to seq-scan when accessing columnar metadata if the index doesn't exist
Fixes #6570.

In the past, having columnar tables in the cluster was causing pg
upgrades to fail when attempting to access columnar metadata. This is
because, pg_dump doesn't see objects that we use for columnar-am related
booking as the dependencies of the tables using columnar-am.
To fix that; in #5456, we inserted some "normal dependency" edges (from
those objects to columnar-am) into pg_depend.

This helped us ensuring the existency of a class of metadata objects
--such as columnar.storageid_seq-- and helped fixing #5437.

However, the normal-dependency edges that we added for indexes on
columnar metadata tables --such columnar.stripe_pkey-- didn't help at
all because they were indeed causing dependency loops (#5510) and
pg_dump was not able to take those dependency edges into the account.

For this reason, instead of inserting such dependency edges from indexes
to columnar-am, we allow columnar metadata accessors to fall-back to
sequential scan during pg upgrades.
2023-01-30 15:58:34 +03:00
aykut-bozkurt ab71cd01ee
fix multi level recursive plan (#6650)
Recursive planner should handle all the tree from bottom to top at
single pass. i.e. It should have already recursively planned all
required parts in its first pass. Otherwise, this means we have bug at
recursive planner, which needs to be handled. We add a check here and
return error.

DESCRIPTION: Fixes wrong results by throwing error in case recursive
planner multipass the query.

We found 3 different cases which causes recursive planner passes the
query multiple times.
1. Sublink in WHERE clause is planned at second pass after we
recursively planned a distributed table at the first pass. Fixed by PR
#6657.
2. Local-distributed joins are recursively planned at both the first and
the second pass. Issue #6659.
3. Some parts of the query is considered to be noncolocated at the
second pass as we do not generate attribute equivalances between
nondistributed and distributed tables. Issue #6653
2023-01-27 21:25:04 +03:00
Gokhan Gulbiz 4e26464969
Allow plain pg foreign tables without a table_name option (#6652) 2023-01-27 16:34:11 +03:00
Jelte Fennema 81dcddd1ef
Actually skip constraint validation on shards after shard move (#6640)
DESCRIPTION: Fix foreign key validation skip at the end of shard move

In eadc88a we started completely skipping foreign key constraint
validation at the end of a non blocking shard move, instead of only for
foreign keys to reference tables. However, it turns out that this didn't
work at all because of a hard to notice bug: By resetting the
SkipConstraintValidation flag at the end of our utility hook, we
actually make the SET command that sets it a no-op.

This fixes that bug by removing the code that resets it. This is fine
because #6543 removed the only place where we set the flag in C code. So
the resetting of the flag has no purpose anymore. This PR also adds a
regression test, because it turned out we didn't have any otherwise we
would have caught that the feature was completely broken.

It also moves the constraint validation skipping to the utility hook.
The reason is that #6550 showed us that this is the better place to skip
it, because it will also skip the planning phase and not just the
execution.
2023-01-27 13:08:05 +01:00
aykut-bozkurt 8870f0f90b
fix order of recursive sublink planning (#6657)
We should do the sublink conversations at the end of the recursive
planning because earlier steps might have transformed the query into a
shape that needs recursively planning the sublinks.

DESCRIPTION: Fixes early sublink check at recursive planner.

Related to PR https://github.com/citusdata/citus/pull/6650
2023-01-27 14:35:16 +03:00
Onur Tirtir 97dba0ac00
Fix uninit mem acceess in UpdateFunctionDistributionInfo (#6658)
Fixes #6655.

heap_modify_tuple() fetches values[i] if replace[i] is set true,
regardless of the fact that whether isnull[i] is true or false. So
similar to replace[], let's init values[] & isnull[] too.

DESCRIPTION: Fixes an uninitialized memory access in
create_distributed_function()
2023-01-27 11:00:41 +03:00
Onur Tirtir d2d507eb85
Fix columnar README.md (#6633)
Reported in #6626.
2023-01-26 10:39:39 +03:00
Emel Şimşek 24f6136f72
Fixes ADD {PRIMARY KEY/UNIQUE} USING INDEX cmd (#6647)
This change allows creating a constraint without a name using an index.
The index name will be used as the constraint name the same way postgres
handles it.
Fixes issue #6644

This commit also cleans up some leftovers from nameless constraint checks.
With this commit, we now fully support adding all nameless constraints
directly to a table.

Co-authored-by: naisila <nicypp@gmail.com>
2023-01-25 21:28:07 +03:00
Emel Şimşek 2169e0222b
Propagates NOT VALID option for FK&CHECK constraints w/out a name (#6649)
Adds NOT VALID option to deparser. When we need to deparse:
  "ALTER TABLE ADD FOREIGN KEY ... NOT VALID"
  "ALTER TABLE ADD CHECK ... NOT VALID"
NOT VALID option should be propagated to workers.

Fixes issue #6646

This commit also uses AppendColumnNameList function
instead of repeated code blocks in two appropriate places
in the "ALTER TABLE" deparser.
2023-01-25 20:41:04 +03:00
Hanefi Onaldi 94b63f35a5
Prevent crashes on update with returning clauses (#6643)
If an update query on a reference table has a returns clause with a
subquery that accesses some other local table, we end-up with an crash.

This commit prevents the crash, but does not prevent other error
messages from happening due to Citus not being able to pushdown the
results of that subquery in a valid SQL command.

Related: #6634
2023-01-24 20:07:43 +03:00
Naisila Puka 3c96b2a0cd
Remove unused function RelationUsesIdentityColumns (#6645)
Cleanup from #6591
2023-01-24 17:10:05 +03:00
Jelte Fennema 7a7880aec9
Fix regression in allowed foreign keys on distributed tables (#6550)
DESCRIPTION: Fix regression in allowed foreign keys on distributed
tables

In commit eadc88a we changed how we skip foreign key validation. The
goal was to skip it in more cases. However, one change had the
unintended regression of introducing failures when trying to create
certain foreign keys. This reverts that part of the change.

The way of skipping validation of foreign keys that was introduced in
eadc88a was skipping validation during execution. The reason that
this caused this regression was because some foreign key validation
queries already fail during planning. In those cases it never gets to
the execution step where it would later be skipped.

Fixes #6543
2023-01-24 16:09:21 +03:00
Emel Şimşek 58368b7783
Enable adding FOREIGN KEY constraints on Citus tables without a name. (#6616)
DESCRIPTION: Enable adding FOREIGN KEY constraints on Citus tables
without a name

This PR enables adding a foreign key to a distributed/reference/Citus
local table without specifying the name of the constraint, e.g. `ALTER
TABLE items ADD FOREIGN KEY (user_id) REFERENCES users (id);`
2023-01-20 01:43:52 +03:00
Gokhan Gulbiz 2388fbea6e
Identity Column Support on Citus Managed Tables (#6591)
DESCRIPTION: Identity Column Support on Citus Managed Tables
2023-01-19 15:45:41 +03:00
Marco Slot 64e3fee89b
Remove shardstate leftovers (#6627)
Remove ShardState enum and associated logic.

Co-authored-by: Marco Slot <marco.slot@gmail.com>
Co-authored-by: Ahmet Gedemenli <afgedemenli@gmail.com>
2023-01-19 11:43:58 +03:00
Teja Mupparti 44c387b978 Support MERGE on distributed tables with restrictions
This implements the phase - II of MERGE sql support

Support routable query where all the tables in the merge-sql are distributed, co-located, and both the source and
target relations are joined on the distribution column with a constant qual. This should be a Citus single-task
query. Below is an example.

SELECT create_distributed_table('t1', 'id');
SELECT create_distributed_table('s1', 'id', colocate_with => ‘t1’);

MERGE INTO t1
USING s1 ON t1.id = s1.id AND t1.id = 100
WHEN MATCHED THEN
UPDATE SET val = s1.val + 10
WHEN MATCHED THEN
DELETE
WHEN NOT MATCHED THEN
INSERT (id, val, src) VALUES (s1.id, s1.val, s1.src)

Basically, MERGE checks to see if

There are a minimum of two distributed tables (source and a target).
All the distributed tables are indeed colocated.
MERGE relations are joined on the distribution column
MERGE .. USING .. ON target.dist_key = source.dist_key
The query should touch only a single shard i.e. JOIN AND with a constant qual
MERGE .. USING .. ON target.dist_key = source.dist_key AND target.dist_key = <>
If any of the conditions are not met, it raises an exception.
2023-01-18 11:05:27 -08:00
Ahmet Gedemenli b3b135867e
Remove shardstate from placement insert functions (#6615) 2023-01-18 09:52:38 +01:00
Hanefi Onaldi f21dfd5fae
Rebalance Progress Reporting API (#6576)
citus_job_list() lists all background jobs by simply showing the records
in pg_dist_background_job.

citus_job_status(job_id bigint, raw boolean default false) shows the
status of a single background job by appending a jsonb details column to
the associated row from pg_dist_background_job. If the raw argument is
set, machine readable sizes are used instead of human readable
alternatives.

citus_rebalance_status(raw boolean default false) shows the status of
the last rebalance operation. If the raw argument is set, machine
readable sizes are used instead of human readable alternatives.
2023-01-16 16:17:31 +03:00
Jelte Fennema 92689a8362
Make GPIDs work with pg_dist_poolinfo (#6588)
The original implementation of GPIDs didn't work correctly when using
`pg_dist_poolinfo` together with PgBouncer. The reason is that it
assumed that once a connection was made to a worker, the originating
GPID should stay the same for ever. But when pg_dist_poolinfo is used
this isn't the case, because the same connection on the worker might be
used by different backends of the coordinator.

This fixes that issue by updating the GPID whenever a new application
name is set on a connection. This is the only thing that's needed,
because PgBouncer already sets the application name correctly on the
server connection whenever a client is updated.
2023-01-13 14:39:19 +00:00
Marco Slot ad3407b5ff
Revert "Make the metadata syncing less resource invasive [Phase-1]" (#6618)
Co-authored-by: Marco Slot <marco.slot@gmail.com>
2023-01-13 13:56:55 +01:00
Ahmet Gedemenli ed6dd8b086
Mark 0/0 lsn results as null (#6617)
Marks `source_lsn` and `target_lsn` fields as null if the result is 0/0
2023-01-13 15:33:43 +03:00
Emel Şimşek 28ed013a91
Enable ALTER TABLE ... ADD CHECK (#6606)
DESCRIPTION: Enable adding CHECK constraints on distributed tables
without the client having to provide a constraint name.

This PR enables the following command syntax for adding check
constraints to distributed tables.
 ALTER TABLE ... ADD CHECK ... 

by creating a default constraint name and transforming the command into
the below syntax before sending it to workers.

ALTER TABLE ... ADD CONSTRAINT \<conname> CHECK ...
2023-01-12 23:31:06 +03:00
Ahmet Gedemenli 9b9d8e7abd Use shard transfer UDFs with node ids for rebalancing 2023-01-12 16:57:51 +03:00
Ahmet Gedemenli e5fef40c06 Introduce citus_move_shard_placement UDF with nodeid 2023-01-12 16:57:51 +03:00
Ahmet Gedemenli e19c545fbf Introduce citus_copy_shard_placement UDF with nodeid 2023-01-12 16:57:51 +03:00
Emel Şimşek d322f9e382
Handle DEFERRABLE option for the relevant constraints at deparser. (#6613)
Table Constraints UNIQUE, PRIMARY KEY and EXCLUDE may have option
DEFERRABLE in their command syntax. This PR handles the option when
deparsing the relevant constraint statements.

NOT DEFERRABLE 
and
INITIALLY IMMEDIATE (if DEFERRABLE}

are the default values for the option so we only append the non-default
values to the alter table statement.
2023-01-12 12:32:38 +03:00
Jelte Fennema 34df853bda
Fix bug introduced by #6412 (#6590)
In #6412 I made a change to not re-assign the global PID if it was
already set. This inadvertently introduced a regression where `userId`
and `databaseId` would not be set on the backend data when the global
PID was assigned in the authentication hook.

This fixes it by doing two things:
1. Removing `userId` from `BackendData`, since it's not used anywhere
   anyway.
2. Move assignment of `databaseId` to dedicated
   `SetBackendDataDatabaseId` function, that isn't a no-op when global
   pid is already set.

Since #6412 is not released yet this does not need a description.
2023-01-10 16:21:57 +01:00
Jelte Fennema c2b4087ff0
Quote all identifiers that we use for logical replication (#6604)
In #6598 it was noticed that Citus could generate syntactically invalid
statements during logical replication. With #6603 we resolved the direct
issue, by only generating valid subscription names. But there was also
the underlying problem that we did not escape certain identifier
strings. While in theory this should be okay since we should only
generate names that are valid, this issue reiterated that we should not
take this for granted. As an extra line of defense this quotes all
identifiers we use during logical replication setup.
2023-01-06 14:12:03 +00:00
Ahmet Gedemenli 26b170e1a8
Use %u instead of %i for naming subscriptions & roles (#6603)
DESCRIPTION: Fix the modifier for subscription and role creation

fixes: #6598 
Reported by @ivyazmitinov
2023-01-06 14:38:32 +01:00
Ahmet Gedemenli bc3383170e
Fix crash when trying to replicate a ref table that is actually dropped (#6595)
DESCRIPTION: Fix crash when trying to replicate a ref table that is actually dropped

see #6592 
We should have a real solution for it.
2023-01-06 14:52:08 +03:00
Emel Şimşek db7a70ef3e
Enable ALTER TABLE ... ADD UNIQUE and ADD EXCLUDE. (#6582)
DESCRIPTION: Adds support for creating table constraints UNIQUE and
EXCLUDE via ALTER TABLE command without client having to specify a name.

ALTER TABLE ... ADD CONSTRAINT <conname> UNIQUE ...
ALTER TABLE ... ADD CONSTRAINT <conname> EXCLUDE ...

commands require the client to provide an explicit constraint name.
However, in postgres it is possible for clients not to provide a name
and let the postgres generate it using the following commands

ALTER TABLE ... ADD UNIQUE ...
ALTER TABLE ... ADD EXCLUDE ...

This PR enables the same functionality for citus tables.
2023-01-05 18:12:32 +03:00
Önder Kalacı eb75decbeb
Undo planner extended statistics override (#6492) 2023-01-04 13:25:57 +01:00
Önder Kalacı a1aa96b32c
Make the metadata syncing less resource invasive [Phase-1] (#6537) 2023-01-04 11:36:45 +01:00
Ahmet Gedemenli 235047670d
Drop SHARD_STATE_TO_DELETE (#6494)
DESCRIPTION: Drop `SHARD_STATE_TO_DELETE` and use the cleanup records
instead

Drops the shard state that is used to mark shards as orphaned. Now we
insert cleanup records into `pg_dist_cleanup` so "orphaned" shards will
be dropped either by maintenance daemon or internal cleanup calls. With
this PR, we make the "cleanup orphaned shards" functions to be no-op, as
they would not be needed anymore.

This PR includes some naming changes about placement functions. We don't
need functions that filter orphaned shards, as there will be no orphaned
shards anymore.

We will also be introducing a small script with this PR, for users with
orphaned shards. We'll basically delete the orphaned shard entries from
`pg_dist_placement` and insert cleanup records into `pg_dist_cleanup`
for each one of them, during Citus upgrade.

We also have a lot of flakiness fixes in this PR.

Co-authored-by: Jelte Fennema <github-tech@jeltef.nl>
2023-01-03 14:38:16 +03:00
Ahmet Gedemenli 1b1e737e51
Drop cleanup on failure (#6584)
DESCRIPTION: Defers cleanup after a failure in shard move or split

We don't need to do a cleanup in case of failure on a shard transfer or
split anymore. Because,
* Maintenance daemon will clean them up anyway.
* We trigger a cleanup at the beginning of shard transfers/splits. 
* The cleanup on failure logic also can fail sometimes and instead of
the original error, we throw the error that is raised by the cleanup
procedure, and it causes confusion.
2022-12-28 15:48:44 +03:00
Ahmet Gedemenli eba9abeee2
Fix leftover shard copy on the target node when tx with move is aborted (#6583)
DESCRIPTION: Cleanup the shard on the target node in case of a
failed/aborted shard move

Inserts a cleanup record for the moved shard placement on the target
node. If the move operation succeeds, the record will be deleted. If
not, it will remain there to be cleaned up later.

fixes: #6580
2022-12-27 22:42:46 +03:00
Teja Mupparti 9a9989fc15 Support MERGE Phase – I
All the tables (target, source or any CTE present) in the SQL statement are local i.e. a merge-sql with a combination of Citus local and
Non-Citus tables (regular Postgres tables) should work and give the same result as Postgres MERGE on regular tables. Catch and throw an
exception (not-yet-supported) for all other scenarios during Citus-planning phase.
2022-12-18 20:32:15 -08:00
Emel Şimşek 5268d0a6cb
Enable PRIMARY KEY generation via ALTER TABLE even if the constraint name is not provided (#6520)
DESCRIPTION: Support ALTER TABLE .. ADD PRIMARY KEY ... command

Before processing
	> **ALTER TABLE ... ADD PRIMARY KEY ...**
command

1. 	Create a primary key name to use as the constraint name.
2. Change the **ALTER TABLE ... ADD PRIMARY KEY ...** command to into
**ALTER TABLE ... ADD CONSTRAINT \<constraint name> PRIMARY KEY ...**
form.
This is the only form we can specify a name for a primary key. If we run
ALTER TABLE .. ADD PRIMARY KEY, postgres
would create a constraint name internally in its own scheme. But the
problem is that we need to create constraint names
for shards in our own scheme which is \<constraint name>_\<shardid>.
Hence we need to create a name and send it to workers so that the
workers can append the shardid.
4. Run the changed command on the coordinator to make sure we are using
the same constraint name across the board.
5. Send the changed command to workers such that it is executed for the
main table as well as for the shards.

Fixes #6515.
2022-12-16 20:34:00 +03:00
aykut-bozkurt 9c0073ba57
remove unused boundary type (#6563)
Removes unused job boundary tag `SUBQUERY_MAP_MERGE_JOB`.

Only usage is at `BuildMapMergeJob`, which is only called when the
boundary = `JOIN_MAP_MERGE_JOB`. Hence, it should be safe to remove.
2022-12-16 18:19:22 +03:00
Onder Kalaci feb5534c65 Do not create additional WaitEventSet for RemoteSocketClosed checks
Before this commit, we created an additional WaitEventSet for
checking whether the remote socket is closed per connection -
only once at the start of the execution.

However, for certain workloads, such as pgbench select-only
workloads, the creation/deletion of the additional WaitEventSet
adds ~7% CPU overhead, which is also reflected on the benchmark
results.

With this commit, we use the same WaitEventSet for the purposes
of checking the remote socket at the start of the execution.

We use "rebuildWaitEventSet" flag so that the executor can re-use
the existing WaitEventSet.

As a result, we see the following improvements on PG 15:

main			 : 120051 tps, 0.532 ms latency avg.
avoid_wes_rebuild: 127119 tps, 0.503 ms latency avg.

And, on PG 14, as expected, there is no difference

main			 : 129191 tps, 0.495 ms latency avg.
avoid_wes_rebuild: 129480 tps, 0.494 ms latency avg.

But, note that PG 15 is slightly (~1.5%) slower than PG 14.
That is probably the overhead of checking the remote socket.
2022-12-14 22:42:55 +01:00
Onder Kalaci d52da55ac0 Move WaitEvent to DistributedExecution
Prep. for caching WaitEventsSet/WaitEvents
2022-12-14 21:59:19 +01:00
Nils Dijk b5b73d78c3
add prepare and finish pg upgrade functions to 11.2-1 (#6560)
Fixes a missed include in #6315.

While adding the cluster clock we have added some extra steps to
`citus_prepare_pg_upgrade` and `citus_finish_pg_upgrade`. These changes
were not added to the citus upgrade and downgrade scripts, this allowed
for a syntax error to slip in.

This PR adds the new versions of both UDF's to the upgrade script while
adding the old version to the downgrade script. This exposed the syntax
error which is also solved.
2022-12-14 12:34:22 +01:00
aykut-bozkurt 8be4ce546e
fix vanilla test status on CI (#6555)
- Because of the make command used for vanilla tests, test status is
always shown as success on CI. As a fix, I added `&& false` at the end
of the copying diff file to make the command fail when check-vanilla
fails.
```make
check-vanilla: all
	$(pg_regress_multi_check) --vanillatest || (cp $(vanilla_diffs_file) $(citus_abs_srcdir)/regression.diffs && false)
```

- I also fixed some vanilla tests that fails due to recently added clock
related operators shown up at some queries.
2022-12-13 11:15:47 +03:00
Gürkan İndibay 3f091e3493
Give nicer error message when using alter_table_set_access_method on a view (#6553)
DESCRIPTION: Fixes alter_table_set_access_method error for views.

Fixes #6001
2022-12-12 23:56:22 +03:00
aykut-bozkurt 1ad1a0a336
add citus_task_wait udf to wait on desired task status (#6475)
We already have citus_job_wait to wait until the job reaches the desired
state. That PR adds waiting on task state to allow more granular
waiting. It can be used for Citus operations. Moreover, it is also
useful for testing purposes. (wait until a task reaches specified state)

Related to #6459.
2022-12-12 22:41:03 +03:00
aykut-bozkurt 3da6e3e743
bgworkers with backend connection should handle SIGTERM properly (#6552)
Fixes task executor SIGTERM handling.

Problem:
When task executors are sent SIGTERM, their default handler
`bgworker_die`, which is set at worker startup, logs FATAL error. But
they do not release locks there before logging the error, which
sometimes causes hanging of the monitor. e.g. Monitor waits for the lock
forever at pg_stat flush after calling proc_exit.

Solution:
Because executors have connection to backend, they should handle SIGTERM
similar to normal backends. Normal backends uses `die` handler, in which
they set ProcDiePending flag and the next CHECK_FOR_INTERRUPTS call
handles it gracefully by releasing any lock before termination.
2022-12-12 16:44:36 +03:00
Onur Tirtir 2803470b58 Add lateral join checks for outer joins and drop the useless ones for semi joins 2022-12-07 18:27:50 +03:00
Onur Tirtir e7e4881289 Phase - III: recursively plan non-recurring sub join trees too 2022-12-07 18:27:50 +03:00
Onur Tirtir f52381387e Phase - II: recursively plan non-recurring subqueries too 2022-12-07 18:27:50 +03:00
Onur Tirtir f339450a9d Phase - I: recursively plan non-recurring relations 2022-12-07 18:27:50 +03:00
Ahmet Gedemenli cb02d62369
Unique names for replication artifacts (#6529)
DESCRIPTION: Create replication artifacts with unique names

We're creating replication objects with generic names. This disallows us
to enable parallel shard moves, as two operations might use the same
objects. With this PR, we'll create below objects with operation
specific names, by appending OparationId to the names.

* Subscriptions
* Publications
* Replication Slots
* Users created for subscriptions
2022-12-06 15:48:16 +03:00
Teja Mupparti e14dc5d45d Address the issues/comments from the original PR# 6315
1) Regular users fail to use clock UDF with permission issue.
2) Clock functions were declared as STABLE, whereas by definition they are VOLATILE. By design, any clock/time
   functions will return different results for each call even within a single SQL statement.

Note: UDF citus_get_transaction_clock() is a misnomer as it internally calls the clock tick which always returns
      different results for every invocation in the same transaction.
2022-12-05 11:06:21 -08:00
aykut-bozkurt 65f256eec4
* add SIGTERM handler to gracefully terminate task executors, \ (#6473)
Adds signal handlers for graceful termination, cancellation of
task executors and detecting config updates. Related to PR #6459.

#### How to handle termination signal?
Monitor need to gracefully terminate all running task executors before
terminating. Hence, we have sigterm handler for the monitor.

#### How to handle cancellation signal?
Monitor need to gracefully cancel all running task executors before
terminating. Hence, we have sigint handler for the monitor.

#### How to detect configuration changes?
Monitor has SIGHUP handler to reflect configuration changes while
executing tasks.
2022-12-02 18:15:31 +03:00
songjinzhou ad6450b793
fix the problem #5763 (#6519)
Co-authored-by: TsinghuaLucky912 <tsinghualucky912@foxmail.com>
Fixes https://github.com/citusdata/citus/issues/5763
2022-12-02 13:49:32 +01:00
Hanefi Onaldi d4394b2e2d
Fix spacing in multiline strings (#6533)
When using multiline strings, we occasionally forget to add a single
space at the end of the first line. When this line is concatenated with
the next one, the resulting string has a missing space.
2022-12-01 23:42:47 +03:00
Fabrízio de Royes Mello 37f3dff1ca
Simplify columnar perf example (#6526)
Rewrite the plpython function to generate random words in SQL to
simplify the usage and run the example.
2022-12-01 20:05:40 +01:00
songjinzhou 29f0196fdf
Add support for SET ACCESS METHOD in altering a distributed table (#6525)
Co-authored-by: TsinghuaLucky912 <postgres@localhost.localdomain>
2022-12-01 17:45:32 +01:00
Hanefi Onaldi 1f29c16262
Fix misleading GUC description (#6532)
citus.skip_advisory_lock_permission_checks skips checks when it is set
to 'on', not 'off'
2022-12-01 15:43:02 +03:00
Ahmet Gedemenli 0e92244bfe
Cleanup for shard moves (#6472)
DESCRIPTION: Extend cleanup process for replication artifacts

This PR adds new cleanup record types for:
* Subscriptions
* Replication slots
* Publications
* Users created for subscriptions

We add records for these object types, to `pg_dist_cleanup` during
creation phase. Once the operation is done, in case of success or
failure, we iterate those records and drop the objects. With this PR we
will not be dropping any of these objects during the operation. In
short, we will always be deferring the drop.

One thing that's worth mentioning is that we sort cleanup records before
processing (dropping) them, because of dependency relations among those
objects, e.g a subscription might depend on a publication. Therefore, we
always drop subscriptions before publications.

We have some renames in this PR:
* `TryDropOrphanedShards` -> `TryDropOrphanedResources`
* `DropOrphanedShardsForCleanup` -> `DropOrphanedResourcesForCleanup`
* `run_try_drop_marked_shards` -> `run_try_drop_marked_resources`
as these functions now process replication artifacts as well.

This PR drops function `DropAllLogicalReplicationLeftovers` and its all
usages, since now we rely on the deferring drop mechanism.
2022-11-30 15:38:05 +03:00
aykut-bozkurt 1f8675da43
nonblocking concurrent task execution via background workers (#6459)
Improvement on our background task monitoring API (PR #6296) to support
concurrent and nonblocking task execution.

Mainly we have a queue monitor background process which forks task
executors for `Runnable` tasks and then monitors their status by
fetching messages from shared memory queue in nonblocking way.
2022-11-30 14:29:46 +03:00
aykut-bozkurt 83ef600f27
fix false full join pushdown error check (#6523)
**Problem**: Currently, we error out if we detect recurring tuples in
one side without checking the other side of the join.

**Solution**: When one side of the full join consists recurring tuples
and the other side consists nonrecurring tuples, we should not pushdown
to prevent duplicate results. Otherwise, safe to pushdown.
2022-11-30 14:17:56 +03:00
Gokhan Gulbiz bc118ee551
Change GUC propagation flag's default value to off (#6516)
This PR changes
```citus.propagate_session_settings_for_loopback_connection``` default
value to off not to expose this feature publicly at this point. See
#6488 for details.
2022-11-29 13:25:53 +03:00
Philip Dubé cf69fc3652 Grammar: it's to its
Includes an error message

& one case of its to it's

Also fix "to the to" typos
2022-11-28 20:43:44 +00:00
Jelte Fennema 68de2ce601
Include gpid in all internal application names (#6431)
When debugging issues it's quite useful to see the originating gpid in
the application_name of a query on a worker. This already happens for
most queries, but not for queries created by the rebalancer or by
run_command_on_worker. This adds a gpid to those two application_names
too.

Note, that if the GPID of the new application_names is different than
the current GPID of the backend the backend will continue to keep 
the old gpid as its actual GPID. This PR is just meant to make sure 
that the application_name is as useful as it can be for users to 
look at. Updating of gpids will be done in a follow-up PR, and 
adding gpids to all internal connections will make this easier.
2022-11-25 11:16:33 +01:00
Teja Mupparti edaf88e0ff Fix the dangling pointer bug in get_merged_argument_list() 2022-11-22 09:41:10 -08:00
Onur Tirtir 80faf47ab5
Fix dangling pointer warning in AnyTableReplicated (#6504)
DESCRIPTION: Fixes a potential dangling pointer issue

Need to backport to 11.0 & 11.1 since we might want to release packages
for debian/bookworm based on those branches in future.
2022-11-21 16:42:00 +03:00
Jelte Fennema a477ffdf4b
Correctly fix OpenSSL 3.0 warnings (#6502)
In #6038 I tried to fix OpenSSL 3.0 warnings with PG13, but I had made a
mistake when doing that. This actually fixes these warnings.
2022-11-18 14:35:41 +01:00
Emel Şimşek 8e5ba45b74
Fixes a bug that causes crash when using auto_explain extension with ALTER TABLE...ADD FOREIGN KEY... queries. (#6470)
Fixes a bug that causes crash when using auto_explain extension with
ALTER TABLE...ADD FOREIGN KEY... queries.
Those queries trigger a SELECT query on the citus tables as part of the
foreign key constraint validation check. At the explain hook, workers
try to explain this SELECT query as a distributed query causing memory
corruption in the connection data structures. Hence, we will not explain
ALTER TABLE...ADD FOREIGN KEY... and the triggered queries on the
workers.

Fixes #6424.
2022-11-15 17:53:39 +03:00
Teja Mupparti 7358b826ef Remove the explicit-transaction requirement for the UDF citus_get_transaction_clock() as implicit transactions too use this UDF. 2022-11-10 10:54:36 -08:00
Marco Slot 77fbcfaf14
Propagate BEGIN properties to worker nodes (#6483)
Co-authored-by: Marco Slot <marco.slot@gmail.com>
2022-11-10 18:08:43 +01:00
Marco Slot fcaabfdcf3
Remove remaining master_create_distributed_table usages (#6477)
Co-authored-by: Marco Slot <marco.slot@gmail.com>
2022-11-04 16:30:06 +01:00
Marco Slot 666696c01c
Deprecate citus.replicate_reference_tables_on_activate, make it always off (#6474)
Co-authored-by: Marco Slot <marco.slot@gmail.com>
2022-11-04 16:21:10 +01:00
Onur Tirtir 1af28b3f27
Use CommitContext for subxact mgmt and reduce memory usage in CommitContext (#6099)
(Hopefully) Fixes #5000.

If memory allocation done for `SubXactContext *state` in `PushSubXact()`
fails, then `PopSubXact()` might segfault, for example, when grabbing
the
topmost `SubXactContext` from `activeSubXactContexts` if this is the
first
ever subxact within the current xact, with the following stack trace:
```c
citus.so!list_nth_cell(const List * list, int n) (\opt\pgenv\pgsql-14.3\include\server\nodes\pg_list.h:260)
citus.so!PopSubXact(SubTransactionId subId) (\home\onurctirtir\citus\src\backend\distributed\transaction\transaction_management.c:761)
citus.so!CoordinatedSubTransactionCallback(SubXactEvent event, SubTransactionId subId, SubTransactionId parentSubid, void * arg) (\home\onurctirtir\citus\src\backend\distributed\transaction\transaction_management.c:673)
CallSubXactCallbacks(SubXactEvent event, SubTransactionId mySubid, SubTransactionId parentSubid) (\opt\pgenv\src\postgresql-14.3\src\backend\access\transam\xact.c:3644)
AbortSubTransaction() (\opt\pgenv\src\postgresql-14.3\src\backend\access\transam\xact.c:5058)
AbortCurrentTransaction() (\opt\pgenv\src\postgresql-14.3\src\backend\access\transam\xact.c:3366)
PostgresMain(int argc, char ** argv, const char * dbname, const char * username) (\opt\pgenv\src\postgresql-14.3\src\backend\tcop\postgres.c:4250)
BackendRun(Port * port) (\opt\pgenv\src\postgresql-14.3\src\backend\postmaster\postmaster.c:4530)
BackendStartup(Port * port) (\opt\pgenv\src\postgresql-14.3\src\backend\postmaster\postmaster.c:4252)
ServerLoop() (\opt\pgenv\src\postgresql-14.3\src\backend\postmaster\postmaster.c:1745)
PostmasterMain(int argc, char ** argv) (\opt\pgenv\src\postgresql-14.3\src\backend\postmaster\postmaster.c:1417)
main(int argc, char ** argv) (\opt\pgenv\src\postgresql-14.3\src\backend\main\main.c:209)
```

For this reason, to be more defensive against memory-allocation errors
that could happen at `PushSubXact()`, now we use our pre-allocated
memory
context for the objects created in `PushSubXact()`.

This commit also attempts reducing the memory allocations done under
CommitContext to reduce the chances of consuming all the memory
available
to CommitContext.

Note that it's problematic to encounter with such a memory-allocation
error for other objects created in `PushSubXact()` as well, so above is
an **example** scenario that might result in a segfault.

DESCRIPTION: Fixes a bug that might cause segfaults when handling deeply
nested subtransactions
2022-11-03 00:57:32 +03:00
Onur Tirtir a5f7f001b0
Make sure to disallow triggers that depend on extensions (#6399)
DESCRIPTION: Makes sure to disallow triggers that depend on extensions

We were already doing so for `ALTER trigger DEPENDS ON EXTENSION`
commands. However, we also need to disallow creating Citus tables
having such triggers already, so this PR fixes that.
2022-11-02 16:27:31 +03:00
Alexander Kukushkin deeacfee04
Improve a query that terminates compeling backends from citus_update_node() (#6468)
DESCRIPTION: Improve a query that terminates compeling backends from citus_update_node()

1. Use pg_blocking_pids() function instead of self join on pg_locks. It exists since 9.6 and more accurate than pg_locks.
2. Prefix all function calls with pg_catalog schema to prevent privilege escalation by creating functions with similar names in a public schema.
3. Change logs and update comments to reflect the fact that the pg_terminate_backend() function only sends SIGTERM but not wating for the actual backend termination.
2022-11-02 12:32:00 +01:00
Alexander Kukushkin 402a30a2b7
Allow citus_update_node() to work with nodes from different clusters (#6466)
DESCRIPTION: Allow citus_update_node() to work with nodes from different clusters

citus_update_node(), citus_nodename_for_nodeid(), and citus_nodeport_for_nodeid() functions only checked for nodes in their own clusters and hence last two returned NULLs and the first one showed an error is the nodeId was from a different cluster.

Fixes https://github.com/citusdata/citus/issues/6433
2022-11-02 10:07:01 +01:00
oohira 3f66f3d9dd
Add missing space to citus.shard_count description (#6464)
DESCRIPTION: Add missing space to citus.shard_count description
2022-10-31 10:37:14 +01:00
Teja Mupparti 01103ce05d This implements a new UDF citus_get_cluster_clock() that returns a monotonically
increasing logical clock. Clock guarantees to never go back in value after restarts,
and makes best attempt to keep the value close to unix epoch time in milliseconds.

Also, introduces a new GUC "citus.enable_cluster_clock", when true, every
distributed transaction is stamped with logical causal clock and persisted
in a catalog pg_dist_commit_transaction.
2022-10-28 10:15:08 -07:00
Ahmet Gedemenli c379ff8614
Drop defer drop gucs (#6447)
DESCRIPTION: Drops GUC defer_drop_after_shard_split
DESCRIPTION: Drops GUC defer_drop_after_shard_move

Drop GUCs and related parts from the code.
Delete tests that specifically added for the GUCs.
Keep tests that can be used without the GUCs.
Update test output changes.

The motivation for this PR is to have an "always deferring" mechanism.
These two GUCs provide an option to not deferring dropping objects
during a shard move/split, and dropping them immediately. With this PR,
we will be always deferring dropping orphaned shards and other types of
objects.

We will have a separate PR to extend the deferred cleanup operation, so
that we would create records for deferred drop, for Subscriptions,
Publications, Replication Slots etc. This will make us be able to keep
track of created objects that needs to be dropped, during a shard
move/split. We will have objects created specifically for the current
operation; and those objects will be dropped at the end.

We have an issue (a draft roadmap) for enabling parallel shard moves.
For details please see: https://github.com/citusdata/citus/issues/6437
2022-10-25 16:48:34 +03:00
Onur Tirtir 2d14dd85e9
Not hardcode "false" in UpdateAutoConvertedForConnectedRelations (#6452)
This didn't cause any bugs since today we're always calling
UpdateAutoConvertedForConnectedRelations with autoconverted=false, so we
don't need to backport this to anywhere.
2022-10-21 18:14:20 +03:00
Onur Tirtir dbe2749bbf
Drop unreachable code from query_pushdown_planning.c (#6451)
Given that we cannot continue after a `RaiseDeferredErrorInternal(..,
ERROR)` call.
2022-10-21 18:04:31 +03:00
aykut-bozkurt 162c8a5160
Drop worker_fetch_foreign_file/worker_repartition_cleanup only if they exist when upgrading Citus (#6441)
We should not introduce breaking sql changes to upgrade files after they
are released. We did that for worker_fetch_foreign_file in v9.0.0 and
worker_repartition_cleanup in v9.2.0. Later when we try to drop those
udfs, they were missing for some clients unexpectedly due to breaking
change in an old upgrade script. For that case, the fix is to add DROP
IF EXISTS for those 2 udfs in 11.0-4--11.1-1.
2022-10-21 14:32:42 +03:00
Emel Şimşek 02fd1e6c03
Fix the crash that happens when using auto_explain extension with recursive queries (#6406)
This crash happens with recursively planned queries. For such queries,
subplans are explained via the ExplainOnePlan function of postgresql.
This function reconstructs the query description from the plan therefore
it expects the ActiveSnaphot for the query be available. This fix makes
sure that the snapshot is in the stack before calling ExplainOnePlan.

Fixes #2920.
2022-10-19 18:04:45 +03:00
Jelte Fennema 737e2bb1bb
Don't leak search_path to workers on DDL (#6444)
DESCRIPTION: Don't leak search_path to workers on DDL

For DDL we have to set the `search_path` on workers to the same as on
the coordinator for some DDL to work. Previously this search_path would
leak outside of the transaction that was used for the DDL. This fixes
that by using `SET LOCAL` instead of `SET`. The only place where we
still use plain `SET` is for DDL commands that are not allowed within
transactions, such as `CREATE INDEX CONCURRENLTY`.

This fixes this flaky test:
```diff
 CONTEXT:  SQL statement "SELECT change_id                           FROM distributed_triggers.data_changes
       WHERE shard_key_value = NEW.shard_key_value AND object_id = NEW.object_id
       ORDER BY change_id DESC LIMIT 1"
-PL/pgSQL function record_change() line XX at SQL statement
+PL/pgSQL function distributed_triggers.record_change() line 17 at SQL statement
 while executing command on localhost:57638
 DELETE FROM data_ref_table where shard_key_value = 'hello';
```
Source:
https://app.circleci.com/pipelines/github/citusdata/citus/27849/workflows/75ae5f1a-100b-4b7a-b991-7de069f39ee1/jobs/831429

I had tried to fix this flaky test in #5894 and then I tried
implementing a better fix in #5896, where @marcocitus suggested this
better fix. This change reverts the fix from #5894 and implements the
fix suggested by Marco.


Our multi_mx_alter_distributed_table test actually depended on the old
buggy search_path leaking behavior. After fixing the bug that test would
fail like this:
```diff
 CALL proc_0(1.0);
 DEBUG:  pushing down the procedure
-NOTICE:  Res: 3
-DETAIL:  from localhost:xxxxx
+ERROR:  relation "test_proc_colocation_0" does not exist
+CONTEXT:  PL/pgSQL function mx_alter_distributed_table.proc_0(double precision) line 5 at SQL statement
+while executing command on localhost:57637
 RESET client_min_messages;
```

I fixed this test by fully qualifying the table names used in the
procedure. I think it's quite unlikely that actual users depend 
on this behavior though. Since it would require first doing 
DDL before calling a procedure in a session where the
search_path was changed after connecting.
2022-10-19 16:47:35 +02:00
Ahmet Gedemenli cdbda9ea6b
Add failure test for shard move (#6325)
DESCRIPTION: Adds failure test for shard move
DESCRIPTION: Remove function `WaitForAllSubscriptionsToBecomeReady` and
related tests

Adding some failure tests for shard moves.
Dropping the not-needed-anymore function
`WaitForAllSubscriptionsToBecomeReady`, as the subscriptions now start
as ready from the beginning because we don't use logical replication
table sync workers anymore.

fixes: #6260
2022-10-19 14:25:26 +02:00
Onur Tirtir 5aec88d084
Not try locking relations referencing to views (#6430)
Since there can't be such a foreign key already.

This mainly fixes the error that Citus throws
when trying to truncate a distributed view.

Fixes #5990.
2022-10-19 11:24:22 +03:00
Gokhan Gulbiz e87eda6496
Introduce a new GUC to propagate local settings to new connections in rebalancer (#6396)
DESCRIPTION: Introduce
```citus.propagate_session_settings_for_loopback_connection``` GUC to
propagate local settings to new connections.

Fixes: #5289
2022-10-18 12:50:30 +03:00
Jelte Fennema 60eb67b908
Increase shard move test coverage by improving advisory locks (#6429)
To be able to test non-blocking shard moves we take an advisory lock, so
we can pause the shard move at an interesting moment. Originally this
was during the logical replication catch up phase. But when I added
tests for the rebalancer progress I moved this lock before the initial
data copy. This allowed testing of the rebalance progress, but
inadvertently made our non-blocking tests not actually test if we held
unintended locks during logical replication catch up.

This fixes that by creating two types of advisory locks, one before the
copy and one after. This causes the tests to actually test their
intended scenario again.

Furthermore it starts using one of these locks for blocking shard moves
too. Which allowed me to reduce the complexity of the rebalance progress
test suite quite a bit. It also allowed enabling some flaky tests again,
because this stopped them from being flaky. And finally it allowed
testing of rebalance progress for blocking shard copy operations as
well.

In passing it fixes a flaky test during parallel blocking shard moves by
ordering the output.
2022-10-17 17:32:28 +02:00
Ahmet Gedemenli 96912d9ba1
Add status column to get_rebalance_progress() (#6403)
DESCRIPTION: Adds status column to get_rebalance_progress()

Introduces a new column named `status` for the function
`get_rebalance_progress()`. For each ongoing shard move, this column
will reveal information about that shard move operation's current
status.

For now, candidate status messages could be one of the below.

* Not Started
* Setting Up
* Copying Data
* Catching Up
* Creating Constraints
* Final Catchup
* Creating Foreign Keys
* Completing
* Completed
2022-10-17 16:55:31 +03:00
Onur Tirtir 4152a391c2
Properly set col names for shard rels that citus_extradata_container points to (#6428)
Deparser function set_relation_column_names() knows that it needs to
re-evaluate column names based on relation's tuple descriptor when
the rte belongs to a relation (RTE_RELATION).

However before this commit, it didn't know about the fact that citus
might wrap such an rte with an rte that points to
citus_extradata_container() placeholder.

And because of this, it was simply taking the column aliases
(e.g., "bar" in "foo AS bar") into the account and this might result in
an incorrectly deparsed query as in below case:

* Say, if we had view based on following query:
  ```sql
  SELECT a FROM table;
  ```
* And if we rename column "a" to "b", the view query normally becomes:
  ```sql
  SELECT b AS a FROM table;
  ```
* So before this commit, deparsing a query based on that view was
resulting in such a query due to deparsing based on the column aliases,
  which is not correct:
  ```sql
  SELECT a FROM table;
  ```

Fixes #5932.

DESCRIPTION: Fixes a bug that might cause failing to query the views
based on tables that have renamed columns
2022-10-14 17:31:25 +03:00
Önder Kalacı 8b624b5c9d
Detect remotely closed sockets and add a single connection retry in the executor (#6404)
PostgreSQL 15 exposes WL_SOCKET_CLOSED in WaitEventSet API, which is
useful for detecting closed remote sockets. In this patch, we use this
new event and try to detect closed remote sockets in the executor.

When a closed socket is detected, the executor now has the ability to
retry the connection establishment. Note that, the executor can retry
connection establishments only for the connection that has not been
used. Basically, this patch is mostly useful for preventing the executor
to fail if a cached connection is closed because of the worker node
restart (or worker failover).

In other words, the executor cannot retry connection establishment if we
are in a distributed transaction AND any command has been sent over the
connection. That requires more sophisticated retry mechanisms. For now,
fixing the above use case is enough.


Fixes #5538 

Earlier discussions: #5908, #6259 and #6283

### Summary of the current approach regards to earlier trials

As noted, we explored some alternatives before getting into this.
https://github.com/citusdata/citus/pull/6283 is simple, but lacks an
important property. We should be checking for `WL_SOCKET_CLOSED`
_before_ sending anything over the wire. Otherwise, it becomes very
tricky to understand which connection is actually safe to retry. For
example, in the current patch, we can safely check
`transaction->transactionState == REMOTE_TRANS_NOT_STARTED` before
restarting a connection.

#6259 does what we intent here (e.g., check for sending any command).
However, as @marcocitus noted, it is very tricky to handle
`WaitEventSets` in multiple places. And, the executor is designed such
that it reacts to the events. So, adding anything `pre-executor` seemed
too ugly.

In the end, I converged into this patch. This patch relies on the
simplicity of #6283 and also does a very limited handling of
`WaitEventSets`, just for our purpose. Just before we add any connection
to the execution, we check if the remote session has already closed.
With that, we do a brief interaction of multiple wait event processing,
but with different purposes. The new wait event processing we added does
not even consider cancellations. We let that handled by the main event
processing loop.

Co-authored-by: Marco Slot <marco.slot@gmail.com>
2022-10-14 15:08:49 +02:00
Jelte Fennema 0cee79a7ab
Actually enable improved blocked process detection (#6426)
In #6405 I added better improved blocked process detection for isolation
tests. But when cleaning up unnecessary code I cleaned up a bit too
much. This actually includes the new function definition in our
migrations.
2022-10-13 09:50:37 +02:00
Onur Tirtir 20847515fa
Hint users to call "citus_set_coordinator_host" first (#6425)
If an operation requires having coordinator in pg_dist_node and if that
is not the case, then we automatically add the coordinator into
pg_dist_node if user didn't add any worker nodes yet.

However, if user have already added some worker nodes before, we throw
an error. With this commit, we improve the error thrown in that case.

Closes #6423 based on the discussion made there.
2022-10-12 18:18:51 +03:00
Jelte Fennema 6277ffd69e
Reduce isolation flakyness by improving blocked process detection (#6405)
Sometimes our CI randomly fails on a test in a way similar to this:
```diff
 step s2-drop:
     DROP TABLE cancel_table;
-
+ <waiting ...>
+step s2-drop: <... completed>

 starting permutation: s1-timeout s1-begin s1-sleep10000 s1-rollback s1-reset s1-drop
```
Source:
https://app.circleci.com/pipelines/github/citusdata/citus/26524/workflows/5415b84f-13a3-482f-bef9-648314c79a67/jobs/756377

I tried to fix that already in #6252 by disabling the maintenance daemon
during isolation tests. But it seems that hasn't fixed all cases of
these errors. This is another attempt at fixing these issues that seems
to have better results.

What it does is that it starts using the pInterestingPids parameter that
citus_isolation_test_session_is_blocked receives. With this change we
start filter out block-edges that are not caused by any of these pids.

In passing this change also makes it possible to run 
`isolation_create_distributed_table_concurrently` with
`check-isolation-base`
2022-10-12 16:35:09 +02:00
Hanefi Onaldi ec3eebbaf6
Rename a function that collides with PG15 (#6422)
PG15 introduced a function called ReplicationSlotName that causes
conflicts with our function with the same name. I solved this issue by
renaming our function to ReplicationSlotNameForNodeAndOwner

Relevant PG commit:
c3b5992b91
2022-10-12 13:24:04 +03:00
Jelte Fennema cb34adf7ac
Don't reassign global PID when already assigned (#6412)
DESCRIPTION: Fix bug in global PID assignment for rebalancer
sub-connections

In CI our isolation_shard_rebalancer_progress test would sometimes fail
like this:
```diff
+isolationtester: canceling step s1-rebalance-c1-block-writes after 60 seconds
 step s1-rebalance-c1-block-writes:
  SELECT rebalance_table_shards('colocated1', shard_transfer_mode:='block_writes');
- <waiting ...>
+
+ERROR:  canceling statement due to user request
 step s7-get-progress:
```

Source:
https://app.circleci.com/pipelines/github/citusdata/citus/27855/workflows/2a7e335a-f3e8-46ed-b6bd-6920d42f7214/jobs/831710

It turned out this was an actual bug in the way our assigning of global
PIDs interacts with the way we connect to ourselves as the shard
rebalancer. The first command the shard rebalancer sends is a SET 
ommand to change the application_name to `citus_rebalancer`. If
`StartupCitusBackend` is called after this command is processed, then it
overwrites the global PID that was extracted from the previous
application_name. This makes sure that we don't do that, and continue to
use the original global PID. While it might seem that we only call 
`StartupCitusBackend` once for each query backend, this isn't actually 
the case. Whenever pg_dist_partition gets ANALYZEd by autovacuum
we indirectly call `StartupCitusBackend` again, because we invalidate 
the cache then.

In passing this fixes two other things as well:
1. It sets `distributedCommandOriginator` correctly in
   `AssignGlobalPID`, by using IsExternalClientBackend(). This doesn't
   matter much anymore, since AssignGlobalPID effectively becomes a
   no-op in this PR for any non-external client backends.
2. It passes the application_name to InitializeBackendData in
   StartupCitusBackend, instead of INVALID_CITUS_INTERNAL_BACKEND_GPID
   (which effectively got casted to NULL). In practice this doesn't
   change the behaviour of the call, since the call is a no-op for every
   backend except the maintenance daemon. And the behaviour of the call
   is the same for NULL as for the application_name of the maintenance
   daemon.
2022-10-11 16:41:01 +02:00
Naisila Puka 89aa9a015f
Fixes empty password issue (#6417) 2022-10-11 15:56:44 +03:00
Onur Tirtir 0b81f68def
Use memcpy instead of memcpy_s to avoid pointless limits in columnar (#6419)
DESCRIPTION: Raises memory limits in columnar from 256MB to 1GB for
reads and writes

This doesn't completely fix #5918 but at least increases the
buffer limits that might cause throwing an error when reading
from or writing into into columnar storage. A way better approach
to fix this is documented in #6420.

Replacing memcpy_s with memcpy is quite safe in those places
since we anyway make sure to allocate enough amount of memory
before writing into related buffers.
2022-10-11 14:57:31 +03:00
Onur Tirtir 517b72a9d5
Fix use-after-free in GetAlterTriggerStateCommand() (#6413)
Fix use-after-free in GetAlterTriggerStateCommand() introduced in #6398.
2022-10-10 16:38:21 +03:00
Gokhan Gulbiz 1776bdf654
Limit citus_drain_node to drain the specified node only (#6361)
DESCRIPTION: Fixes citus_drain_node to drain the specified worker only.

Fixes #6267
2022-10-09 13:33:08 +03:00
Onur Tirtir 86e186f671
Retain trigger settings when re-creating the triggers (on shards) (#6398)
Fixes https://github.com/citusdata/citus/issues/6394.

DESCRIPTION: Fixes a bug that causes creating disabled-triggers on
shards as enabled

Since CREATE TRIGGER doesn't have syntax support to specify
whether the trigger should be enabled/disabled, the underlying
PG function (`pg_get_triggerdef()`) that we use to generate the
command to create the trigger is not enough. For this reason, we
append a second command to enable/disable trigger, right after
creating it.

We don't retain explicit extension dependencies set by using
`ALTER trigger DEPENDS ON EXTENSION` commands too, but apparently
right fix for that is to throw an error as in
`PreprocessAlterTriggerDependsStmt()`; so, opened a separate PR
to fix that #6399.
2022-10-06 14:51:07 +03:00
Naisila Puka 27e867afbc
Propagates column aliases (#6400)
Propagates column aliases in the shard-level commands
2022-10-06 12:27:31 +03:00
Naisila Puka b5cba3a3fe
Use original relation to retrieve column name because of syscache (#6387)
During alter_distributed_table, we create a new table like the
original table but with the altered options.

To retrieve the name of the distribution column, we were using
the attribute syscache of the new table, since we already created
the new table as identical to the original table.

However, the attribute syscaches of these two tables are not
the same if the original table has dropped columns. The reason
is that dropped columns are all still present in the cache.
Hence, for example, the attnos would be different in the syscaches.

So, let's use the attribute syscache of the original table.
2022-10-06 12:08:00 +03:00
Ying Xu f21cbe68f8
[Columnar] Bugfix for Columnar: options ignored during ALTER TABLE rewrite (#6337)
DESCRIPTION: Fixes a bug that prevents retaining columnar table options after a table-rewrite
A fix for this issue: Columnar: options ignored during ALTER TABLE
rewrite #5927
The OID for the temporary table created during ALTER TABLE was not the
same as the original table's OID so the columnar options were not being
applied during rewrite.

The change is that I applied the original table's columnar options to
the new table so that it has the correct options during write. I also
added a test.
2022-10-05 11:42:09 -07:00
Ahmet Gedemenli e36890ce55
Add source_lsn and target_lsn fields into get_rebalance_progress (#6364)
DESCRIPTION: Adds source_lsn and target_lsn fields into
get_rebalance_progress

Adding two fields named `source_lsn` and `target_lsn` to the function
`get_rebalance_progress`.
Target lsn data is fetched in `GetShardStatistics`, by expanding the
query sent to workers (joining with pg_subscription_rel and
pg_stat_subscription). Then put into the hashmap, for each shard.
Source lsn data is fetched in `BuildWorkerShardStatististicsHash`, in
the loop that iterate each node, by sending a pg_current_wal_lsn query
to each node. Then put into the hashmap, for each node.
2022-10-05 11:12:24 +03:00
Hanefi Onaldi 11a9a3771f
Ensure no dependencies to index before drop 2022-10-04 18:56:20 +03:00
Ahmet Gedemenli d0fa10a98c
Bump Citus to 11.2devel (#6385) 2022-09-30 14:47:42 +03:00
Jelte Fennema 24e06af6d2
Reuse connections for Splits and Logical Replication (#6314)
In Split, Logical replication logic and ShardCleaner we call
`SendCommandListToWorkerOutsideTransaction` and
`SendOptionalCommandListToWorkerOutsideTransaction` frequently. This
opens new connection for each of those calls, even though we already
have a perfectly good connection lying around.

This PR adds two new APIs
`SendCommandListToWorkerOutsideTransactionWithConnection` and
`SendOptionalCommandListToWorkerOutsideTransactionWithConnection` that
allow sending a list of queries in a transaction over an existing
connection. We also update the callers (Split, ShardCleaner, Logical
Replication) to use these new APIs instead.

Co-authored-by: Nitish Upreti <niupre@microsoft.com>
Co-authored-by: Onder Kalaci <onderkalaci@gmail.com>
2022-09-26 13:37:40 +02:00
Jelte Fennema d9a9a3263b
Revert replica identity creation order for shard moves (#6367)
In Citus 11.1.0 we changed the order of doing the initial data copy and
the replica identity creation when doing a non blocking shard move. This
was done to try and increase the speed with which shard moves could be
done. But after doing more extensive performance testing this change
turned out to have a negative impact on the speed of moves on the setups
that I tested.

Looking at the resource usage metrics of the VMs the reason for this
seems to be that these shard moves were bottlenecked by disk bandwidth.
While creating replica identities in bulk after the initial copy will
reduce CPU usage a bit, it does require an additional sequence scan of
the just written data. So when a VM is bottlenecked on disk, it makes
sense to spend a little bit more CPU to avoid an additional scan. Since
PKs are usually simple indexes that don't require lots of CPU to update,
as opposed to e.g. GiST indexes.

This reverts the order change to avoid a regression on shard move speed
in these cases.

For future releases we might consider re-evaluating our index creation
order for other indexes too, and create "simple" indexes before the
copy.
2022-09-23 14:55:25 +02:00
Onur Tirtir a868cc049a
Not allow ON DELETE/UPDATE SET DEFAULT actions on columns that default to sequences (#6340)
Given that we drop DEFAULT nextval('sequence') expressions from
shard relation columns, allowing `ON DELETE/UPDATE SET DEFAULT`
on such columns might cause inserting NULL values as a result
of a delete/update operation.

For this reason, we disallow ON DELETE/UPDATE SET DEFAULT actions
on columns that default to sequences.

DESCRIPTION: Disallows having ON DELETE/UPDATE SET DEFAULT actions on
columns that default to sequences

Fixes #6339.
2022-09-23 03:34:02 -07:00
Onur Tirtir de24a3eda5
Not drop default col exprs from shard when adding local table to metadata (#6323)
As we did for GENERATED STORED columns in #4613, we should not drop
column
default expressions that are not based on sequences from shard relation
since
such expressions need to exist e.g. for foreign key actions.

For the column default expressions that are based on sequences we cannot
do much, so we need to disallow having ON DELETE SET DEFAULT actions on
such columns in a separate PR, see #6339.

Fixes #6318.

DESCRIPTION: Fixes a bug that might cause inserting incorrect DEFAULT
values when applying foreign key actions
2022-09-23 03:05:08 -07:00
Ahmet Gedemenli bae4b47c2f
Fix dropping replication slot (#6359)
DESCRIPTION: Fixes dropping replication slots

As detected by a flaky test, Citus sometimes fails to drop replication
slots, possibly due to a race condition, at the end of a shard split.
With this PR, we retry to drop them in case of an `OBJECT_IN_USE` error,
consistently for 20 seconds.

fixes: #6326
2022-09-21 16:29:56 +03:00
Nitish Upreti e9508b2603
Shard Split : Add / Update logging (#6336)
DESCRIPTION: Improve logging during shard split and resource cleanup

### DESCRIPTION

This PR makes logging improvements to Shard Split : 

1. Update confusing logging to fix #6312
2. Added new `ereport(LOG` to make debugging easier as part of telemetry review.
2022-09-16 09:39:08 -07:00
Marco Slot 8544346a78
Allow create_distributed_table_concurrently on an empty node (#6353)
Co-authored-by: Marco Slot <marco.slot@gmail.com>
2022-09-16 10:55:02 +02:00
Onder Kalaci 766f340ce0 Prevent failures on partitioned distributed tables with statistics objects on PG 15
Comment from the code is clear on this:
/*
 * The statistics objects of the distributed table are not relevant
 * for the distributed planning, so we can override it.
 *
 * Normally, we should not need this. However, the combination of
 * Postgres commit 269b532aef55a579ae02a3e8e8df14101570dfd9 and
 * Citus function AdjustPartitioningForDistributedPlanning()
 * forces us to do this. The commit expects statistics objects
 * of partitions to have "inh" flag set properly. Whereas, the
 * function overrides "inh" flag. To avoid Postgres to throw error,
 * we override statlist such that Postgres does not try to process
 * any statistics objects during the standard_planner() on the
 * coordinator. In the end, we do not need the standard_planner()
 * on the coordinator to generate an optimized plan. We call
 * into standard_planner() for other purposes, such as generating the
 * relationRestrictionContext here.
 *
 * AdjustPartitioningForDistributedPlanning() is a hack that we use
 * to prevent Postgres' standard_planner() to expand all the partitions
 * for the distributed planning when a distributed partitioned table
 * is queried. It is required for both correctness and performance
 * reasons. Although we can eliminate the use of the function for
 * the correctness (e.g., make sure that rest of the planner can handle
 * partitions), it's performance implication is hard to avoid. Certain
 * planning logic of Citus (such as router or query pushdown) relies
 * heavily on the relationRestrictionList. If
 * AdjustPartitioningForDistributedPlanning() is removed, all the
 * partitions show up in the, causing high planning times for
 * such queries.
 */
2022-09-15 14:36:05 +03:00
aykut-bozkurt 739b91afa6
ensure we have more active nodes than replication factor. (#6341)
DESCRIPTION: Fixes floating exception during
create_distributed_table_concurrently.

Fixes #6332.
During create_distributed_table_concurrently, when there is no active
primary node, it fails with floating exception. We added similar check
with create_distributed_table. It will fail with proper message if
current active node is less than replication factor.
2022-09-14 18:20:50 +03:00
Marco Slot 4ab415c43a
Fix escaping in sequence dependency queries (#6345)
Co-authored-by: Marco Slot <marco.slot@gmail.com>
2022-09-14 17:43:24 +03:00
Sameer Awasekar 4851b4e8f2
Introduce code changes to fix Issue:6303 (#6328)
The PR introduces code changes to fix Issue
[6303](https://github.com/citusdata/citus/issues/6303)

`create_distributed_table_concurrently` following drop column, creates a
buggy situation in split decoder.
 * Consider the below scenario:
* Session1 : Drop column followed by
create_distributed_table_concurrently
 * Session2 : Concurrent insert workload

The child shards created by `create_distributed_table_concurrently` will
have less columns than the source shard because some column were
dropped. The incoming tuple from session2 will have more columns as the
writes happened on source shard. But now the tuple needs to be applied
on child shard. So we need to format existing tuple according to child
schema and skip dropped column values.
The PR fixes this by reformatting the tuple according the target child
schema.

Test:
1) isolation_create_distributed_concurrently_after_drop_column - Repros
the issue and tests on the same.
2022-09-14 19:56:32 +05:30
Marco Slot 7a92d873b6
Fix bugs in CheckIfRelationWithSameNameExists (#6343)
Co-authored-by: Marco Slot <marco.slot@gmail.com>
2022-09-14 15:42:46 +02:00
Nils Dijk da527951ca
Fix: rebalance stop non super user (#6334)
No need for description, fixing issue introduced with new feature for
11.1

Fixes #6333 

Due to Postgres' C api being o-indexed and postgres' attributes being
1-indexed, we were reading the wrong Datum as the Task owner when
cancelling. Here we add a test to show the error and fix the off-by-one
error.
2022-09-13 23:19:31 +02:00
Hanefi Onaldi f34467dcb3
Remove missing declaration warning (#6330)
When I built Citus on PG15beta4 locally, I get a warning message.

```
utils/background_jobs.c:902:5: warning: declaration does not declare anything
      [-Wmissing-declarations]
                                __attribute__((fallthrough));
                                ^
1 warning generated.
```

This is a hint to the compiler that we are deliberately falling through
in a switch-case block.
2022-09-13 13:48:51 +03:00
Jelte Fennema f13b140621
Show citus_copy_shard_placement progress in get_rebalance_progress (#6322)
DESCRIPTION: Show citus_copy_shard_placement progress in
get_rebalance_progress

When rebalancing to a new node that does not have reference tables yet
the rebalancer will first copy the reference tables to the nodes.
Depending on the size of the reference tables, this might take a long
time. However, there's no indication of what's happening at this stage
of the rebalance.

This PR improves this situation by also showing the progress of any
citus_copy_shard_placement calls when calling get_rebalance_progress.
2022-09-13 08:59:52 +00:00
Naisila Puka 76ff4ab188
Adds support for unlogged distributed sequences (#6292)
We can now do the following:
- Distribute sequence with logged/unlogged option
- ALTER TABLE my_sequence SET LOGGED/UNLOGGED
- ALTER SEQUENCE my_sequence SET LOGGED/UNLOGGED

Relevant PG commit
344d62fb9a
2022-09-13 10:53:39 +03:00
Hanefi Onaldi 5cfcc63308
Add warning messages for cluster commands on partitioned tables (#6306)
PG15 introduces `CLUSTER` commands for partitioned tables. Similar to a
`CLUSTER` command with no supplied table names, these commands also can
not be run inside transaction blocks and therefore can not be propagated
in a distributed transaction block with ease. Therefore we raise warnings.

Relevant PG commit: cfdd03f45e6afc632fbe70519250ec19167d6765
2022-09-13 00:05:58 +03:00
Hanefi Onaldi 164f2fa0a6
PG15: Add support for NULLS NOT DISTINCT (#6308)
Relevant PG commit: 94aa7cc5f707712f592885995a28e018c7c80488
2022-09-12 23:47:37 +03:00
Marco Slot b79111527e
Avoid blocking writes in create_distributed_table_concurrently (#6324)
Co-authored-by: Marco Slot <marco.slot@gmail.com>
2022-09-12 12:09:37 -07:00
Nils Dijk cda3686d86
Feature: run rebalancer in the background (#6215)
DESCRIPTION: Add a rebalancer that uses background tasks for its
execution

Based on the baclground jobs and tasks introduced in #6296 we implement
a new rebalancer on top of the primitives of background execution. This
allows the user to initiate a rebalance and let Citus execute the long
running steps in the background until completion.

Users can invoke the new background rebalancer with `SELECT
citus_rebalance_start();`. It will output information on its job id and
how to track progress. Also it returns its job id for automation
purposes. If you simply want to wait till the rebalance is done you can
use `SELECT citus_rebalance_wait();`

A running rebalance can be canelled/stopped with `SELECT
citus_rebalance_stop();`.
2022-09-12 20:46:53 +03:00
Marco Slot 48f7d6c279
Show local managed tables in citus_tables and hide tables owned by extensions (#6321)
Co-authored-by: Marco Slot <marco.slot@gmail.com>
2022-09-12 17:49:17 +03:00
Marco Slot b036e44aa4
Fix bug preventing isolate_tenant_to_new_shard with text column (#6320)
Co-authored-by: Marco Slot <marco.slot@gmail.com>
2022-09-12 16:29:57 +02:00
naisila 47bea76c6c Revert "Support JSON_TABLE on PG 15 (#6241)"
This reverts commit 1f4fe35512.
2022-09-12 15:20:17 +03:00
naisila 53ffbe440a Revert SQL/JSON features in ruleutils_15.c
Reverting the following commits:
977ddaae56
4a5cf06def
9ae19c181f
30447117e5
f9c43f4332
21dba4ed08
262932da3e

We have to manually make changes to this file.
Follow the relevant PG commit in ruleutils.c & make the exact same changes in ruleutils_15.c

Relevant PG commit:
96ef3237bf741c12390003e90a4d7115c0c854b7
2022-09-12 15:20:17 +03:00
Marco Slot 2e943a64a0
Make shard moves more idempotent (#6313)
Co-authored-by: Marco Slot <marco.slot@gmail.com>
2022-09-09 18:21:36 +02:00
Jelte Fennema a2d86214b2
Share more replication code between moves and splits (#6310)
The logical replication catchup part for shard splits and shard moves is
very similar. This abstracts most of that similarity away into a single
function. This also improves the logic for non blocking shard splits a
bit, by using faster foreign key creation. It also parallelizes index creation
which shard moves were already doing, but shard splits did not.
2022-09-09 16:45:38 +02:00
Marco Slot ba2fe3e3c4
Remove do_repair option from citus_copy_shard_placement (#6299)
Co-authored-by: Marco Slot <marco.slot@gmail.com>
2022-09-09 15:44:30 +02:00
Nils Dijk 00a94c7f13
Implement infrastructure to run sql jobs in the background (#6296)
DESCRIPTION: Add infrastructure to run long running management operations in background

This infrastructure introduces the primitives of jobs and tasks.
A task consists of a sql statement and an owner. Tasks belong to a
Job and can depend on other tasks from the same job.

When there are either runnable or running tasks we would like to
make sure a bacgrkound task queue monitor process is running. A Task
could be in running state while there is actually no monitor present
due to a database restart or failover. Once the monitor starts it
will reset any running task to its runnable state.

To make sure only one background task queue monitor is ever running
at once it will acquire an advisory lock that self conflicts.

Once a task is done it will find all tasks depending on this task.
After checking that the task doesn't have unmet dependencies it will
transition the task from blocked to runnable state for the task to
be picked up on a subsequent task start.

Currently only one task can be running at a time. This can be
improved upon in later releases without changes to the higher level
API.

The initial goal for this background tasks is to allow a rebalance
to run in the background. This will be implemented in a subsequent PR.
2022-09-09 16:11:19 +03:00
Jelte Fennema 76137e967f
Create all foreign keys quickly at the end of a shard move (#6148)
Previously we would create foreign keys to reference table in an extra
fast way at the end of a shard move. This uses that same logic to also
do it for foreign keys between distributed tables.

Fixes #6141
2022-09-09 09:58:33 +02:00
Nils Dijk cc0eeea4c5
remove redundant call to TerminateBackgroundWorker (#6307)
Remove redundant call to TerminateBackgroundWorker
Discussion: https://github.com/citusdata/citus/pull/6296#discussion_r965926695
2022-09-09 07:37:02 +02:00
Ahmet Gedemenli eadc88a800
Introduce GUC citus.skip_constraint_validation (#6281)
Introduces a new GUC named citus.skip_constraint_validation, which basically skips constraint validation when set to on.
For some several places that we hack to skip the foreign key validation phase, now we use this GUC.
2022-09-08 18:13:18 +03:00
Jelte Fennema e29db74a19
Don't override postgres C symbols with our own (#6300)
When introducing our overrides of pg_cancel_backend and
pg_terminate_backend we accidentally did that in such a way that we
cannot call the original pg_cancel_backend and pg_terminate_backend from
C anymore. This happened because we defined the exact same symbols in
our shared library as postgres does in its own binary.

This fixes that by using a different names for the C function than for
the SQL function.

Making this work in all upgrade and downgrade scenarios is not trivial
though, because we actually need to remove the C function definition.
Postgres errors in two different times when the symbol that a C function
wants to call is not defined in the library it expects it in:
1. When creating the SQL function definition
2. When calling the SQL function

Item 1 causes an issue when creating our extension for the first time.
We then go execute all the migrations that we have. So if the 11.0
migration contains a SQL function definition that still references the
pg_cancel_backend symbol, that migration will fail. This issue is solved
by actually changing the SQL definition in the old migration.

This is not enough to fix all issues though. Item 2 causes an issue
after an upgrade to 11.1, because it won't have the new definition of
the SQL function. This is solved by recreating the SQL functions in the
migration to 11.1. That way it gets the new definition.

Then finally there's the case of downgrades. To continue to make our
pg_cancel_backend SQL function work after downgrading, we will need to
make a patch release for 11.0 that includes the new citus_cancel_backend
symbol. This is done in a separate commit.
2022-09-07 11:27:05 +02:00
Nitish Upreti d7404a9446
'Deferred Drop' and robust 'Shard Cleanup' for Splits. (#6258)
DESCRIPTION:
This PR adds support for 'Deferred Drop' and robust 'Shard Cleanup' for Splits.

Common Infrastructure
This PR introduces new common infrastructure so as any operation that wants robust cleanup of resources can register with the cleaner and have the resources cleaned appropriately based on a specified policy. 'Shard Split' is the first consumer using this new infrastructure.
Note : We only support adding 'shards' as resources to be cleaned-up right now but the framework will be extended to support other resources in future.

Deferred Drop for Split
Deferred Drop Support ensures that shards undergoing split are not dropped inline as part of operation but dropped later when no active read queries are running on shard. This helps with :

Avoids any potential deadlock scenarios that can cause long running Split operation to rollback.
Avoids Split operation blocking writes and then getting blocked (due to running queries on the shard) when trying to drop shards.
Deferred drop is the new default behavior going forward.
Shard Cleaner Extension
Shard Cleaner is a background task responsible for deferred drops in case of 'Move' operations.
The cleaner has been extended to ensure robust cleanup of shards (dummy shards and split children) in case of a failure based on the new infrastructure mentioned above. The cleaner also handles deferred drop for 'Splits'.

TESTING:
New test ''citus_split_shard_by_split_points_deferred_drop' to test deferred drop support.
New test 'failure_split_cleanup' to test shard cleanup with failures in different stages.
Update 'isolation_blocking_shard_split and isolation_non_blocking_shard_split' for deferred drop.
Added non-deferred drop version of existing tests : 'citus_split_shard_no_deferred_drop' and 'citus_non_blocking_splits_no_deferred_drop'
2022-09-06 12:11:20 -07:00
Gokhan Gulbiz ac96370ddf
Use IsMultiStatementTransaction for SELECT .. FOR UPDATE queries (#6288)
* Use IsMultiStatementTransaction instead of IsTransaction for row-locking operations.

* Add regression test for SELECT..FOR UPDATE statement
2022-09-06 16:38:41 +02:00
Emel Şimşek 6f06ff78cc
Throw an error if there is a RangeTblEntry that is not assigned an RTE identity. (#6295)
* Fix issue : 6109 Segfault or (assertion failure) is possible when using a SQL function

* DESCRIPTION: Ensures disallowing the usage of SQL functions referencing to a distributed table and prevents a segfault.
Using a SQL function may result in segmentation fault in some cases.
This change fixes the issue by throwing an error message when a SQL function cannot be handled.

Fixes #6109.

* DESCRIPTION: Ensures disallowing the usage of SQL functions referencing to a distributed table and prevents a segfault.
Using a SQL function may result in segmentation fault in some cases. This change fixes the issue by throwing an error message when a SQL function cannot be handled.

Fixes #6109.

Co-authored-by: Emel Simsek <emel.simsek@microsoft.com>
2022-09-06 15:46:41 +02:00
aykut-bozkurt 69726648ab
verify shards if exists for insert, delete, update (#6280)
Co-authored-by: Marco Slot <marco.slot@gmail.com>
2022-09-06 15:29:14 +02:00
Hanefi Onaldi 85b19c851a
Disallow distributing by numeric with negative scale
PG15 allows numeric scale to be negative or greater than precision. This
causes issues and we may end up routing queries to a wrong shard due to
differing hash results after rounding.

Formerly, when specifying NUMERIC(precision, scale), the scale had to be
in the range [0, precision], which was per SQL spec. PG15 extends the
range of allowed scales to [-1000, 1000].

A negative scale implies rounding before the decimal point. For
example, a column might be declared with a scale of -3 to round values
to the nearest thousand. Note that the display scale remains
non-negative, so in this case the display scale will be zero, and all
digits before the decimal point will be displayed.

Relevant PG commit: 085f931f52494e1f304e35571924efa6fcdc2b44
2022-09-06 12:40:56 +03:00
Naisila Puka d7f41cacbe
Prohibit renaming child trigger on distributed partition pre PG15 (#6290)
Pre PG15, renaming child triggers on partitions is allowed. When
creating a trigger in a distributed parent partitioned table, the
triggers on the shards of the partitions have the same name with
the triggers on the corresponding parent shards of the parent
table. Therefore, they don't have the same appended shard id as
the shard id of the partition. Hence, when trying to rename a
child trigger on a partition of a distributed table, we can't
correctly find the triggers on the shards of the partition in
order to rename them since we append a different shard id to the
name of the trigger. Since we can't find the trigger we get a
misleading error of inexistent trigger.

In this commit we prohibit renaming child triggers on distributed
partitions altogether.
2022-09-06 12:19:25 +03:00
Marco Slot e6b1845931
Change split logic to avoid EnsureReferenceTablesExistOnAllNodesExtended (#6208)
Co-authored-by: Marco Slot <marco.slot@gmail.com>
2022-09-05 22:02:18 +02:00
Önder Kalacı bd13836648
Add citus.skip_advisory_lock_permission_checks (#6293) 2022-09-05 17:47:41 +02:00
Jelte Fennema 1c5b8588fe
Address race condition in InitializeBackendData (#6285)
Sometimes in CI our isolation_citus_dist_activity test fails randomly
like this:
```diff
 step s2-view-dist:
  SELECT query, citus_nodename_for_nodeid(citus_nodeid_for_gpid(global_pid)), citus_nodeport_for_nodeid(citus_nodeid_for_gpid(global_pid)), state, wait_event_type, wait_event, usename, datname FROM citus_dist_stat_activity WHERE query NOT ILIKE ALL(VALUES('%pg_prepared_xacts%'), ('%COMMIT%'), ('%BEGIN%'), ('%pg_catalog.pg_isolation_test_session_is_blocked%'), ('%citus_add_node%')) AND backend_type = 'client backend' ORDER BY query DESC;

 query                                                                                                                                                                                                                                                                                                                                                                 |citus_nodename_for_nodeid|citus_nodeport_for_nodeid|state              |wait_event_type|wait_event|usename |datname
 ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------+-------------------------+-------------------+---------------+----------+--------+----------

   INSERT INTO test_table VALUES (100, 100);
                                                                                                                                                                                                                                                                                                                          |localhost                |                    57636|idle in transaction|Client         |ClientRead|postgres|regression
-(1 row)
+
+                SELECT coalesce(to_jsonb(array_agg(csa_from_one_node.*)), '[{}]'::JSONB)
+                FROM (
+                    SELECT global_pid, worker_query AS is_worker_query, pg_stat_activity.* FROM
+                    pg_stat_activity LEFT JOIN get_all_active_transactions() ON process_id = pid
+                ) AS csa_from_one_node;
+            |localhost                |                    57636|active             |               |          |postgres|regression
+(2 rows)

 step s3-view-worker:
```
Source: https://app.circleci.com/pipelines/github/citusdata/citus/26692/workflows/3406e4b4-b686-4667-bec6-8253ee0809b1/jobs/765119

I intended to fix this with #6263, but the fix turned out to be
insufficient. This PR tries to address the issue by setting
distributedCommandOriginator correctly in more situations. However, even
with this change it's still possible to reproduce the flaky test in CI.
In any case this should fix at least some instances of this issue.

In passing this changes the isolation_citus_dist_activity test to allow
running it multiple times in a row.
2022-09-02 14:23:47 +02:00
Marco Slot 432f399a5d
Allow citus_internal application_name with additional suffix (#6282)
Co-authored-by: Marco Slot <marco.slot@gmail.com>
2022-09-01 14:26:43 +02:00
Naisila Puka 317dda6af1
Use RelationGetPrimaryKeyIndex for citus catalog tables (#6262)
pg_dist_node and pg_dist_colocation have a primary key index, not a replica identity index.

Citus catalog tables are created in public schema, which has replica identity index by default 
as primary key index. Later the citus catalog tables are moved to pg_catalog schema.

During pg_upgrade, all tables are recreated, and given that pg_dist_colocation is found in
pg_catalog schema, it is recreated in that schema, and when it is recreated it doesn't
have a replica identity index, because catalog tables have no replica identity.

Further action:
Do we even need to acquire this lock on the primary key index?
Postgres doesn't acquire such locks on indexes before deleting catalog tuples.
Also, catalog tuples don't have replica identities by definition.
2022-09-01 11:56:31 +03:00
Jelte Fennema 8bb082e77d
Fix reporting of progress on waiting and moved shards (#6274)
In commit 31faa88a4e I removed some features of the rebalance progress
monitor. I did this because the plan was to remove the foreground shard
rebalancer later in the PR that would add the background shard
rebalancer. So, I didn't want to spend time fixing something that we
would throw away anyway.

As it turns out we're not removing the foreground shard rebalancer after
all, so it made sens to fix the stuff that I broke. This PR does that.
For the most part this commit reverts the changes in commit 31faa88a4e.
It's not a full revert though, because it keeps the improved tests and
the changes to `citus_move_shard_placement`.
2022-08-31 14:55:47 +03:00
Naisila Puka 98dcbeb304
Specifies that our CustomScan providers support projections (#6244)
Before, this was the default mode for CustomScan providers.
Now, the default is to assume that they can't project.
This causes performance penalties due to adding unnecessary
Result nodes.

Hence we use the newly added flag, CUSTOMPATH_SUPPORT_PROJECTION
to get it back to how it was.

In PG15 support branch we created explain functions to ignore
the new Result nodes, so we undo that in this commit.

Relevant PG commit:
955b3e0f9269639fb916cee3dea37aee50b82df0
2022-08-31 10:48:01 +03:00
Marco Slot 6bb31c5d75
Add non-blocking variant of create_distributed_table (#6087)
Added create_distributed_table_concurrently which is nonblocking variant of create_distributed_table.

It bases on the split API which takes advantage of logical replication to support nonblocking split operations.

Co-authored-by: Marco Slot <marco.slot@gmail.com>
Co-authored-by: aykutbozkurt <aykut.bozkurt1995@gmail.com>
2022-08-30 15:35:40 +03:00
Jelte Fennema d68654680b
Fix flakyness in isolation_citus_dist_activity (#6263)
Sometimes in CI our isolation_citus_dist_activity test fails randomly
like this:
```diff
 step s2-view-dist:
  SELECT query, citus_nodename_for_nodeid(citus_nodeid_for_gpid(global_pid)), citus_nodeport_for_nodeid(citus_nodeid_for_gpid(global_pid)), state, wait_event_type, wait_event, usename, datname FROM citus_dist_stat_activity WHERE query NOT ILIKE ALL(VALUES('%pg_prepared_xacts%'), ('%COMMIT%'), ('%BEGIN%'), ('%pg_catalog.pg_isolation_test_session_is_blocked%'), ('%citus_add_node%')) AND backend_type = 'client backend' ORDER BY query DESC;

 query                                                                                                                                                                                                                                                                                                                                                                 |citus_nodename_for_nodeid|citus_nodeport_for_nodeid|state              |wait_event_type|wait_event|usename |datname
 ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------+-------------------------+-------------------+---------------+----------+--------+----------

   INSERT INTO test_table VALUES (100, 100);
                                                                                                                                                                                                                                                                                                                          |localhost                |                    57636|idle in transaction|Client         |ClientRead|postgres|regression
-(1 row)
+
+                SELECT coalesce(to_jsonb(array_agg(csa_from_one_node.*)), '[{}]'::JSONB)
+                FROM (
+                    SELECT global_pid, worker_query AS is_worker_query, pg_stat_activity.* FROM
+                    pg_stat_activity LEFT JOIN get_all_active_transactions() ON process_id = pid
+                ) AS csa_from_one_node;
+            |localhost                |                    57636|active             |               |          |postgres|regression
+(2 rows)

 step s3-view-worker:
```
Source: https://app.circleci.com/pipelines/github/citusdata/citus/26605/workflows/56d284d2-5bb3-4e64-a0ea-7b9b1626e7cd/jobs/760633

The reason for this is that citus_dist_stat_activity sometimes shows the
query that it uses itself to get the data from pg_stat_activity. This is
actually a bug, because it's a worker query and thus shouldn't show up
there. To try and solve this bug, we remove two small opportunities for a
race condition. These race conditions could happen when the backenddata
was marked as active, but the distributedCommandOriginator was not set
correctly yet/anymore. There was an opportunity for this to happen both 
during connection start and shutdown.
2022-08-30 12:57:37 +03:00
Ahmet Gedemenli 0855a9d1d4
Use SUM for calculating non partitioned table sizes (#6222)
We currently do a `pg_relation_total_size('t1') + pg_relation_total_size('t2') + ..` on shard lists, especially when rebalancing the shards. This in some cases goes huge. With this PR, we basically use a SUM for all table sizes, instead of using thousands of pluses.
2022-08-26 18:02:14 +03:00
Sameer Awasekar 4df8eca77f
Add worker_split_shard_release_dsm udf to release dynamic shared memory (#6248)
The code introduces worker_split_shard_release_dsm udf to release the dynamic shared memory segment allocated during non-blocking split workflow.
2022-08-26 18:27:32 +05:30
Önder Kalacı 3ed6fea1cf
Prevent Merge command on distributed tables [PG 15] (#6238) 2022-08-25 13:27:08 +03:00
Marco Slot 9bf3c3dd5c
Add an allow_unsafe_constraints flag for constraints without distribution column (#6237)
Co-authored-by: Marco Slot <marco.slot@gmail.com>
2022-08-25 11:37:50 +03:00
Gokhan Gulbiz 69d2fcf5c0
Use the same colocation group for child and parent rels when altering a distributed table (#6225)
* Alter_distributed_table colocateWith:none bug fix for partitioned tables.

* Regression tests added for alter_distributed_table colocateWith:none for partitioned tables

* Update query comparision to be more accurate
2022-08-25 11:23:59 +03:00
Marco Slot ac07d33a29
Remove unused reduceQuery from physical planning (#6221)
Co-authored-by: Marco Slot <marco.slot@gmail.com>
2022-08-24 17:24:27 +00:00
Naisila Puka 1f4fe35512
Support JSON_TABLE on PG 15 (#6241)
Postgres supports JSON_TABLE feature on PG 15.

We treat JSON_TABLE the same as correlated functions (e.g., recurring tuples).
In the end, for multi-shard JSON_TABLE commands, we apply the same
restrictions as reference tables (e.g., cannot be in the outer part of
an outer join etc.)

Co-authored-by: Onder Kalaci <onderkalaci@gmail.com>
2022-08-24 19:11:18 +03:00
Naisila Puka 35b4ddc355
Pg15 support (#6085)
* Adjust configure script to allow PG15

* Adds copy of ruleutils_14.c as ruleutils_15.c

* Uses get_namespace_name_or_temp in ruleutils_15.c

Relevant PG commit:
48c5c9068211e0a04fd9553c8714b2821ed3ad17

* Clean up code using "(expr) ? true : false" in ruleutils_15.c

Relevant PG commit:
fd0625c7a9c679c0c1e896014b8f49a489c3a245

* Change varno from Index (unsigned int) to int in ruleutils_15.c

Relevant PG commit:
e3ec3c00d85bd2844ffddee83df2bd67c4f8297f

* Adds find_recursive_union to ruleutils_15.c

Relevant PG commit:
3f50b82639637c9908afa2087de7588450aa866b

* Fix display of SQL-std func's args in INSERT/SELECT in ruleutils_15.c

Relevant PG commit:
a8d8445a7b2f80f6d0bfe97b19f90bd2cbef8759

* Fix ruleutils_15.c's dumping of whole-row Vars in more contexts

Relevant PG commit:
43c2175121c829c8591fc5117b725f1f22bfb670

* Fix assorted missing logic for GroupingFunc nodes in ruleutils_15.c

Relevant PG commit:
2591ee8ec44d8cbc8e1226550337a64c684746e4

* Adds grammar support for SQL/JSON clauses in ruleutils_15.c

Relevant PG commit:
f79b803dcc98d707450e158db3638dc67ff8380b

* Adds SQL/JSON constructors to ruleutils_15.c

Relevant PG commits:
f4fb45d15c59d7add2e1b81a9d477d0119a9691a
cc7401d5ca498a84d9b47fd2e01cebd8e830e558

* Adds support for MERGE in ruleutils_15.c

Relevant PG commit:
7103ebb7aae8ab8076b7e85f335ceb8fe799097c

* Add IS JSON predicate to ruleutils_15.c

Relevant PG commit:
33a377608fc29cdd1f6b63be561eab0aee5c81f0

* Add SQL/JSON query functions to ruleutils_15.c

Relevant PG commit:
1a36bc9dba8eae90963a586d37b6457b32b2fed4

* Adds three different SQL/JSON values to ruleutils_15.c

Relevant PG commits:
606948b058dc16bce494270eea577011a602810e
49082c2cc3d8167cca70cfe697afb064710828ca

* Adds JSON table functions in ruleutils_15.c

Relevant PG commit:
4e34747c88a03ede6e9d731727815e37273d4bc9

* Add PLAN function for JSON table in ruleutils_15.c

Relevant PG commit:
fadb48b00e02ccfd152baa80942de30205ab3c4f

* Remove extra blank lines before block-closing braces ruleutils_15.c

Relevant PG commit:
24d2b2680a8d0e01b30ce8a41c4eb3b47aca5031

* set_deparse_plan: Reuse variable to appease Coverity ruleutils_15.c

Relevant PG commit:
e70813fbc4aaca35ec012d5a426706bd54e4acab

* Mechanical code beautification ruleutils_15.c

Relevant PG commit:
23e7b38bfe396f919fdb66057174d29e17086418

* Rename value_type to item_type in ruleutils_15.c

Relevant PG commit:
3ab9a63cb638a1fd99475668e2da9c237495aeda

* Show 'AS "?column?"' explicitly when it's important in ruleutils_15.c

Relevant PG commit:
c7461fc25558832dd347a9c8150b0f1ed85e36e8

* Fix ruleutils_15.c issues with dropped cols in funcs-returning-composite

Relevant PG commit:
c1d1e8469c77ce6b8e5310955580b4a3eee7fe96

* Change comment regarding functions returning composite in ruleutils_15.c

Relevant PG commit:
c2fa113ddb1117b1f03e91960f65d5d7d8a90270

* Replace int nodes with bool nodes where needed

In PG15, Boolean nodes are added. Pre PG15, internal Boolean values
in Create Role commands were represented by Integer nodes. This
commit replaces int nodes logic with bool nodes logic where needed.
Mostly there are CREATE ROLE logic changes.

Relevant PG commit:
941460fcf731a32e6a90691508d5cfa3d1f8eeaf

* Handle new option colliculocale in CREATE COLLATION logic

In PG15, there is an added option to use ICU as global locale provider.
pg_collation has three locale-related fields: collcollate and collctype,
which are libc-related fields, and a new one colliculocale, which is the
ICU-related field. Only the libc-related fields or the ICU-related field
is set, never both.

Relevant PG commits:
f2553d43060edb210b36c63187d52a632448e1d2
54637508f87bd5f07fb9406bac6b08240283be3b

* Add PG15 tests to CI using test images that have 15beta2 (#6093)

* Change warning message in pg_signal_backend()

Relevant PG commit:
7fa945b857cc1b2964799411f1633468826861ff

* Revert "Add missing ifdef for PG 15"

This reverts commit c7b51025ab.

* Fixes tests for ALTER TRIGGER RENAME consistency for part. tables

Relevant PG commit:
80ba4bb383538a2ee846fece6a7b8da9518b6866

* Prevent creating child triggers on partitions when adding new node

Pre PG15, tgisinternal is true for a "child" trigger on a partition
cloned from the trigger on the parent.
In PG15, tgisinternal is false in that case. However, we don't want to
create this trigger on the partition since it will create a conflict
when we try to attach the partition to the parent table:
ERROR: trigger "..." for relation "{partition_name}" already exists

Relevant PG commit:
f4566345cf40b068368cb5617e61318da60676ec

* Fix tests for generated columns dependency changes

In PG15, For GENERATED columns, all dependencies of the generation
expression are recorded as NORMAL dependencies of the column itself.
This requires CASCADE to drop generated cols with the original col.
PRE PG15, dependencies were recorded as AUTO, with which
generated columns are silently dropped with the original column.

Relevant PG commit:
cb02fcb4c95bae08adaca1202c2081cfc81a28b5

* Explicitly cast catalog "char" column to text before concatenation

Relevant PG commit:
07eee5a0dc642d26f44d65c4e6263304208e8583

* Remove 'AS "?column?"' from test outputs

There were some instances in the following tst outputs
in planning debug outputs where AS "?column?" is added.
We add a normalization rule to remove it as it is not
important.

cte_inline.out
recursive_relation_planning_restriction_pushdown.out

Relevant PG commit:
c7461fc25558832dd347a9c8150b0f1ed85e36e8

* Use pg_backup_stop(PG15) instead of pg_stop_backup(PG<15)

Add an alternative test output because of the change in the
backup modes of Postgres. Specifically here, there is a renaming
issue: pg_stop_backup PRE PG15 vs pg_backup_stop PG15+
The alternative output can be deleted when we drop support for PG14

Relevant PG commit:
39969e2a1e4d7f5a37f3ef37d53bbfe171e7d77a

* Adds citus.mitmfifo GUC

Previously we setting this configuration parameter
in the fly for failure tests schedule.
However, PG15 doesn't allow that anymore: reserved prefixes
like "citus" cannot be used to set non-existing GUCs.

Relevant PG commit:
88103567cb8fa5be46dc9fac3e3b8774951a2be7

* Handles EXPLAIN output diffs in PG15 - Extra result lines

To handle extra "Result" lines in explain outputs, we add explain
method to multi_test_helpers.sql file
- plan_without_result_lines() is added for cases where we want the
whole explain output with only "Result" lines removed

* Handles EXPLAIN output diffs in PG15, Hash Agg/Join leverage

To handle differences in usage of GroupAggregate vs HashAggregate
or Merge Join vs Hash join in cases where this detail doesn't
seem to matter, we use coordinator_plan().
- coordinator_plan() is updated to remove "Result" lines

There are some cases where we have subplans so we add a new
function that prints all Task Count lines as well
- coordinator_plan_with_subplans()

Still not sure of the relevant PG commit
Could be db0d67db2401eb6238ccc04c6407a4fd4f985832
but disabling enable_group_by_reordering didn't help.

* Handles EXPLAIN output diffs in PG15: enable_group_by_reordering

Relevant PG commit
db0d67db2401eb6238ccc04c6407a4fd4f985832

* Normalizes Memory Usage, Buckets, Batches for PG15 explain diffs

We create a new function in multi_test_helpers, which is similar
to explain_merge function in PG15. This explain helper function
normalies Memory Usage, Buckets and Batches, and we use it in the
tests which give a different output for PG15.

* Bump test images to 15beta3 (#6172)

* Omit namespace in post-copy errmsg

Relevant PG commit:
069d33d0c5a021601245e44df77a0423ddd69359

* Handles EXPLAIN output diffs in PG15: extra arrows&result lines

To handle extra "->" arrows resulting from extra Result lines
in explain outputs, we add the following explain method to
multi_test_helpers.sql file

- plan_without_arrows() is added for cases where we want the
whole explain output without arrows and without Result lines

* Alters public schema's owner to pg_database_owner in PG15

In PG15, public schema is owned by pg_database_owner role.
In multi_extension, we drop and recreate the ppublic schema,
hence its owner become the default user in our tests, postgres.
Change that to pg_database_owner for PG15 consistency.

This results in alternative test output for public schema grants
in the following test:

grant_on_schema_propagation.sql

Relevant PG commit: b073c3ccd06e4cb845e121387a43faa8c68a7b62

* Add alternative test outputs for change in Insert Select display

citus_local_tables_queries.sql
coordinator_shouldhaveshards.sql
cte_inline.sql
insert_select_repartition.sql
intermediate_result_pruning.sql
local_shard_execution.sql
local_shard_execution_replicated.sql
multi_deparse_shard_query.sql
multi_insert_select.sql
multi_insert_select_conflict.sql
multi_mx_insert_select_repartition.sql
mx_coordinator_shouldhaveshards.sql
single_node.sql

Relevant PG commit:
a8d8445a7b2f80f6d0bfe97b19f90bd2cbef8759

* Fixes columnar tap tests for PG15

In PG15, Perl test modules have been moved to a new namespace.
Also, postgres node new() and get_new_node() methods have been
unified to one method: new()

We create separate tap tests for PG13/14 and PG15+
and update the Makefiles accordingly.

Relevant PG commits:
201a76183e2056c2217129e12d68c25ec9c559c8
b3b4d8e68ae83f432f43f035c7eb481ef93e1583

* Handles EXPLAIN output diffs in PG15: HashAgg Leverage,alt. output

Still not sure of the relevant PG commit
Could be db0d67db2401eb6238ccc04c6407a4fd4f985832
but disabling enable_group_by_reordering didn't help.
2022-08-24 17:59:17 +02:00
aykut-bozkurt 041f88d7bf
Revert "Revert "Creates new colocation for colocate_with:='none' too"" (#6227)
This reverts commit d171a736ab.
2022-08-24 10:54:04 +03:00
Jelte Fennema e0ada050aa
Enable binary logical replication for shard moves (#6017)
Using binary encoding can save a lot of CPU cycles, both on the sender
and on the receiver. Since the walsender and walreceiver processes are
single threaded, this can matter a lot for the throughput if they are
bottlenecked on CPU.

This feature is only available in PG14, not PG13. It should be safe to 
always enable because it's only used for types that support binary 
encoding according to the PG docs:
> Even when this option is enabled, only data types that have binary 
> send and receive functions will be transferred in binary.

But in case it causes problems, it can still be disabled by setting
`citus.enable_binary_protocol` to `false`.
2022-08-23 16:38:00 +02:00
aykut-bozkurt 07cfba461a
ensuring reference tables on nodes should not create colocation entry. (#6224)
We create colocation entry in create_reference_table.
2022-08-23 16:17:59 +03:00
Marco Slot 639588bee0
Remove unused functions (#6220)
Co-authored-by: Marco Slot <marco.slot@gmail.com>
2022-08-22 11:53:25 +03:00
Jelte Fennema 3fadb98380
Fix compilation warning on PG13 + OpenSSL 3.0 (#6038)
This removes some warnings that are present when building on Ubuntu 22.04. 
It removes warnings on PG13 + OpenSSL 3.0. OpenSSL 3.0 has marked some 
functions that we use as deprecated, but we want to continue support OpenSSL
1.0.1 for the time being too. This indicates that to OpenSSL 3.0, so it doesn't 
show warnings.
2022-08-19 05:51:47 -07:00
Marco Slot 5160cafa82
Do not propagate GRANT ON SCHEMA from CREATE EXTENSION (#6175)
Co-authored-by: Marco Slot <marco.slot@gmail.com>
2022-08-19 13:23:47 +03:00
Jelte Fennema 31faa88a4e
Track rebalance progress at the shard move level (#6187)
We're in the processes of totally changing the shard rebalancer
experience and infrastructure. Soon the shard rebalancer will include
retries, crash recovery and support for running in the background.

These improvements come at a cost though, the way the
get_rebalance_progress UDF currently works is very hard to replicate
with this new structure. This is mostly because the old behaviour
doesn't really make sense anymore with this new infrastructure. A new
and better way to track the progress will be included as part of the new
infrastructure.

This PR is in preparation of the new code rebalancer experience.
It changes the get_rebalance_progress UDF to only display the moves that
are in progress at the moment, not the ones that happened in the past or
that are planned in the future. Another option would have been to
completely remove the current get_rebalance_progress functionality and
point people to the new way of tracking progress. But old blogposts
still reference the old UDF and users might have some automation on top
of it. Showing the progress of the current moves is fairly simple to
achieve, even with the new infrastructure.

So this PR is a kind of compromise: It doesn't have complete feature
parity with the old get_rebalance_progress, but the most common use
cases will still work.

There's also an advantage of the change: You can now see progress of
shard moves that were triggered by calling citus_move_shard_placement
manually. Instead of only being able to see progress of moves that were
initiated using get_rebalance_table_shards.
2022-08-18 18:57:04 +02:00
Onder Kalaci 9ec8e627c1 Support Sequences owned by columns before distributing tables
There are 3 different ways that a sequence can be interacting
with tables. (1) and (2) are already supported. This commit adds
support for (3).

     (1) column DEFAULT nextval('seq'):

	The dependency is roughly like below,
	and ExpandCitusSupportedTypes() is responsible
	for finding the depending sequences.

        schema <--- table <--- column <---- default value
         ^                                     |
         |------------------ sequence <--------|

    (2) serial columns: Bigserial/small serial etc:

	The dependency is roughly like below,
	and ExpandCitusSupportedTypes() is responsible
	for finding the depending sequences.

        schema <--- table <--- column <---- default value
                                 ^             |
				 |             |
          		     sequence <--------|

   (3) Sequence OWNED BY table.column: Added support for
       this type of resolution in this commit.

       The dependency is almost like the following, and
       ExpandCitusSupportedTypes() is NOT responsible for finding
       the dependency.

        schema <--- table <--- column
                                 ^
				 |
          		     sequence
2022-08-18 10:29:40 +02:00
Naisila Puka 69ffdbf0e3
Uses object name in cannot distribute object error (#6186)
Object type ids have changed in PG15 because of at least two added
objects in the list: OBJECT_PARAMETER_ACL, OBJECT_PUBLICATION_NAMESPACE

To avoid different output between pg versions, let's use the object
name in the error, and put the object id in the error detail.

Relevant PG commits:
a0ffa885e478f5eeacc4e250e35ce25a4740c487
5a2832465fd8984d089e8c44c094e6900d987fcd
2022-08-18 11:05:17 +03:00
Ying Xu 91473635db
[Columnar] Check for existence of Citus before creating Citus_Columnar (#6178)
* Added a check to see if Citus has already been loaded before creating citus_columnar

* added tests
2022-08-17 15:12:42 -07:00
Nils Dijk a9d47a96f6
Fix reference table lock contention (#6173)
DESCRIPTION: Fix reference table lock contention

Dropping and creating reference tables unintentionally blocked on each other due to the use of an ExclusiveLock for both the Drop and conditionally copying existing reference tables to (new) nodes.

The patch does the following:
 - Lower lock lever for dropping (reference) tables to `ShareLock` so they don't self conflict
 - Treat reference tables and distributed tables equally and acquire the colocation lock when dropping any table that is in a colocation group
 - Perform the precondition check for copying reference tables twice, first time with a lower lock that doesn't conflict with anything. Could have been a NoLock, however, in preparation for dropping a colocation group, it is an `AccessShareLock`

During normal operation the first check will always pass and we don't have to escalate that lock. Making it that we won't be blocked on adding and remove reference tables. Only after a node addition the first `create_reference_table` will still need to acquire an `ExclusiveLock` on the colocation group to perform the copy.
2022-08-17 18:19:28 +02:00
Ahmet Gedemenli 0631e1998b
Fix upgrade paths for #6100 (#6176)
* Fix upgrade paths for #6100

Co-authored-by: Hanefi Onaldi <Hanefi.Onaldi@microsoft.com>
2022-08-17 18:56:53 +03:00
Jelte Fennema 3f6ce889eb
Use CreateSimpleHash (and variants) whenever possible (#6177)
This is a refactoring PR that starts using our new hash table creation
helper function. It adds a few more macros for ease of use, because C
doesn't have default arguments. It also adds a macro to check if a
struct contains automatic padding bytes. No struct that is hashed using
tag_hash should have automatic padding bytes, because those bytes are
undefined and thus using them to create a hash will result in undefined
behaviour (usually a random hash).
2022-08-17 13:01:59 +03:00
aykut-bozkurt 52efe08642
default mode for shard splitting is set to auto. (#6179) 2022-08-17 12:18:47 +03:00
aykut-bozkurt be06d65721
Nonblocking tenant isolation is supported by using split api. (#6167) 2022-08-17 11:13:07 +03:00
Jelte Fennema 78a5013e24
Support changing CPU priorities for backends and shard moves (#6126)
**Intro**
This adds support to Citus to change the CPU priority values of
backends. This is created with two main usecases in mind:

1. Users might want to run the logical replication part of the shard moves
   or shard splits at a higher speed than they would do by themselves. 
   This might cause some small loss of DB performance for their regular 
   queries, but this is often worth it. During high load it's very possible
   that the logical replication WAL sender is not able to keep up with the
   WAL that is generated. This is especially a big problem when the
   machine is close to running out of disk when doing a rebalance.
2. Users might have certain long running queries that they don't impact
   their regular workload too much.

**Be very careful!!!**
Using CPU priorities to control scheduling can be helpful in some cases
to control which processes are getting more CPU time than others. 
However, due to an issue called "[priority inversion][1]" it's possible that
using CPU priorities together with the many locks that are used within
Postgres cause the exact opposite behavior of what you intended. This
is why this PR only allows the PG superuser to change the CPU priority 
of its own processes. Currently it's not recommended to set `citus.cpu_priority`
directly. Currently the only recommended interface for users is the setting 
called `citus.cpu_priority_for_logical_replication_senders`. This setting
controls CPU priority for a very limited set of processes (the logical 
replication senders). So, the dangers of priority inversion are also limited
with when using it for this usecase.

**Background**
Before reading the rest it's important to understand some basic
background regarding process CPU priorities, because they are a bit
counter intuitive. A lower priority value, means that the process will
be scheduled more and whatever it's doing will thus complete faster. The
default priority for processes is 0. Valid values are from -20 to 19
inclusive. On Linux a larger difference between values of two processes
will result in a bigger difference in percentage of scheduling.

**Handling the usecases**
Usecase 1 can be achieved by setting `citus.cpu_priority_for_logical_replication_senders`
to the priority value that you want it to have. It's necessary to set
this both on the workers and the coordinator. Example:
```
citus.cpu_priority_for_logical_replication_senders = -10
```

Usecase 2 can with this PR be achieved by running the following as
superuser. Note that this is only possible as superuser currently 
due to the dangers mentioned in the "Be very carefull!!!" section. 
And although this is possible it's **NOT** recommended:
```sql
ALTER USER background_job_user SET citus.cpu_priority = 5;
```

**OS configuration**
To actually make these settings work well it's important to run Postgres
with more a more permissive value for the 'nice' resource limit than
Linux will do by default. By default Linux will not allow a process to
set its priority lower than it currently is, even if it was lower when
the process originally started. This capability is necessary to reset
the CPU priority to its original value after a transaction finishes.
Depending on how you run Postgres this needs to be done in one of two
ways:

If you use systemd to start Postgres all you have to do is add  a line
like this to the systemd service file:
```conf
LimitNice=+0 # the + is important, otherwise its interpreted incorrectly as 20
```

If that's not the case you'll have to configure `/etc/security/limits.conf` 
like so, assuming that you are running Postgres as the `postgres` OS user:
```
postgres            soft    nice            0
postgres            hard    nice            0
```
Finally you'd have add the following line to `/etc/pam.d/common-session`
```
session required pam_limits.so
```

These settings would allow to change the priority back after setting it
to a higher value.

However, to actually allow you to set priorities even lower than the
default priority value you would need to change the values in the 
config to something lower than 0. So for example:
```conf
LimitNice=-10
```

or

```
postgres            soft    nice            -10
postgres            hard    nice            -10
```

If you use WSL2 you'll likely have to do another thing. You have to 
open a new shell, because when PAM is only used during login, and 
WSL2 doesn't actually log you in. You can force a login like this:
```
sudo su $USER --shell /bin/bash
```
Source: https://stackoverflow.com/a/68322992/2570866

[1]: https://en.wikipedia.org/wiki/Priority_inversion
2022-08-16 13:07:17 +03:00
Jelte Fennema 1a01c896f0
Fix description of citus.distributed_deadlock_detection_factor (#5860)
The long description of the `citus.distributed_deadlock_detection_factor` 
setting was incorrectly stating that 1000 would disable it. Instead -1 
is the value that disables distributed deadlock detection.
2022-08-16 01:19:49 +03:00
Jelte Fennema 43c2a1e88b
Share more code between splits and moves (#6152)
When introducing non-blocking shard split functionality it was based
heavily on the non-blocking shard moves. However, differences between
usage was slightly to big to be able to reuse the existing functions
easily. So, most logical replication code was simply copied to dedicated
shard split functions and modified for that purpose.

This PR tries to create a more generic logical replication
infrastructure that can be used by both shard splits and shard moves.
There's probably more code sharing possible in the future, but I believe
this is at least a good start and addresses the lowest hanging fruit.

This also adds a CreateSimpleHash function that makes creating the
most common type of hashmap common.
2022-08-15 20:21:51 +03:00
Marco Slot 6c73576606 Fix HTAB memory leaks 2022-08-15 16:10:24 +02:00
Teja Mupparti e962113c63 Remove the GUC mention in the error message as this config is meant for advanced users 2022-08-11 09:43:14 -07:00
aykut-bozkurt 898801504e
sysid should be parsed as int. (#6150) 2022-08-11 10:44:46 +03:00
aykut-bozkurt 166272963a
log NOTICE createdb only if EnableUnsupportedFeatureMessages GUC is enabled. (#6151) 2022-08-09 21:21:22 +03:00
aykut-bozkurt cc694b6bcf
we consider stat object as invalid if it is not owned by current user (#6130) 2022-08-09 20:59:30 +03:00
Hanefi Onaldi a58523f1d8
Remove all references to .source files 2022-08-09 14:15:52 +03:00
Jelte Fennema 8017693b2f
Allow specifying the shard_transfer_mode when replicating reference tables (#6070)
When using `citus.replicate_reference_tables_on_activate = off`,
reference tables need to be replicated later. This can be done using the
`replicate_reference_tables()` UDF. However, this function only allowed
blocking replication. This changes the function to default to logical
replication instead, and allows choosing any of our existing shard
transfer modes.
2022-08-09 13:21:31 +03:00
Marco Slot 3b57ff2867 Fix crash in citus_copy_shard_placement 2022-08-09 09:31:05 +02:00
Jelte Fennema dd548ee3c7
Use faster custom copy logic for non-blocking shard moves (#6119)
DESCRIPTION: Use faster custom copy logic for non-blocking shard moves

Non-blocking shard moves consist of two main phases:
1. Initial data copy
2. Catchup phase

This changes the first of these phases significantly. Previously we used the
copy logic provided by postgres subscriptions. This meant we didn't have
to implement it ourselves, but it came with the downside of little control.
When implementing shard splits we needed more control to even make it
work, so we implemented our own logic for copying data between nodes.

This PR starts using that logic for non-blocking shard moves. Doing so
has four main advantages:
1. It uses COPY in binary format when possible, which is cheaper to encode 
    and decode. Furthermore it very often results in less data that needs to 
    be sent over the network.
2. It allows us to create the primary key (or other replica identity) after doing
    the initial data copy. This should give some speed up over the total run,
    because creating an index is bulk is much faster than incrementally building it.
3. It doesn't require a replication slot per parallel copy. Increasing the maximum
    number of replication slots uses resources in postgres, even if they are not used.
    So reducing the number of replication slots that shard moves need is nice.
4. Logical replication table_sync workers are slow to start up, so if lots of shards
    need to be copied that can make it quite slow. This can happen easily when
    combining Postgres partitioning with Citus.
2022-08-08 17:09:43 +02:00
Marco Slot ead9d28835 Avoid deadlocks on split failure by closing connections 2022-08-08 13:33:23 +02:00
Marco Slot 044dd26e40 Reimplement tenant isolation on top of block shard split 2022-08-08 13:33:23 +02:00
Teja Mupparti 430c201d03 get_current_transaction_id() UDF is not printing the timestamp of the current transaction on the coordinator even when non-null 2022-08-05 10:12:07 -07:00
aykut-bozkurt 4992533e33
support grant statement propagation for aggregates (#6132) 2022-08-05 14:47:33 +03:00
Ahmet Gedemenli 8b68b0b5bb
Fix pg upgrade script for foreign tables (#6100)
Fixes unexpected error for foreign tables when upgrading pg
2022-08-05 13:35:17 +03:00
Sameer Awasekar e236711eea Introduce Non-Blocking Shard Split Workflow 2022-08-04 16:32:38 +02:00
aykut-bozkurt b67abdd28c
we should not log error in preprocess if attached partition is missing. (#6131) 2022-08-04 15:49:14 +03:00
aykut-bozkurt 3ddc089651
stop distributing views with no distributed dependency if GUC DistributeLocalViews is set false. (#6083) 2022-08-04 12:34:40 +03:00
aykut-bozkurt 4ffe436bf9
we validate constraint as well if the statement is alter domain drop constraint (#6125) 2022-08-03 23:06:33 +03:00
aykut-bozkurt a662331668
qualify text dict and conf respect missingok (#6120) 2022-08-03 13:13:53 +03:00
aykutbozkurt 7387c7ed3d address method should take parameter isPostprocess 2022-08-02 21:00:23 +03:00
aykutbozkurt c98a68662a introduces operation type for dist ops 2022-08-02 20:42:32 +03:00
aykutbozkurt 57ce4cf8c4 use address method to decide if we should run preprocess and postprocess steps for a distributed object 2022-08-02 20:42:32 +03:00
Onder Kalaci c7b51025ab Add missing ifdef for PG 15 2022-08-02 09:46:53 +02:00
Jelte Fennema abffa6c3b9
Use shard split copy code for blocking shard moves (#6098)
The new shard copy code that was created for shard splits has some
advantages over the old shard copy code. The old code was using 
worker_append_table_to_shard, which wrote to disk twice. And it also 
didn't use binary copy when that was possible. Both of these issues
were fixed in the new copy code. This PR starts using this new copy
logic also for shard moves, not just for shard splits.

On my local machine I created a single shard table like this.
```sql
set citus.shard_count = 1;
create table t(id bigint, a bigint);
select create_distributed_table('t', 'id');

INSERT into t(id, a) SELECT i, i from generate_series(1, 100000000) i;
```

I then turned `fsync` off to make sure I wasn't bottlenecked by disk. 
Finally I moved this shard between nodes with `citus_move_shard_placement`
with `block_writes`.

Before this PR a move took ~127s, after this PR it took only ~38s. So for this 
small test this resulted in spending ~70% less time.

And I also tried the same test for a table that contained large strings:
```sql
set citus.shard_count = 1;
create table t(id bigint, a bigint, content text);
select create_distributed_table('t', 'id');

INSERT into t(id, a, content) SELECT i, i, 'aunethautnehoautnheaotnuhetnohueoutnehotnuhetncouhaeohuaeochgrhgd.athbetndairgexdbuhaobulrhdbaetoausnetohuracehousncaoehuesousnaceohuenacouhancoexdaseohusnaetobuetnoduhasneouhaceohusnaoetcuhmsnaetohuacoeuhebtokteaoshetouhsanetouhaoug.lcuahesonuthaseauhcoerhuaoecuh.lg;rcydabsnetabuesabhenth' from generate_series(1, 20000000) i;
```
2022-08-01 20:10:36 +03:00
Naisila Puka 85324f3acc
Clean up multi_shard_commit_protocol guc leftovers (#6110) 2022-08-01 15:22:02 +03:00
aykut-bozkurt f372e93d22
we supress notice log during looking up function oid to not break pg vanilla tests. (#6082) 2022-08-01 10:14:35 +03:00
Önder Kalacı cbdc2b3019
Merge branch 'main' into fix_relation_acess_2 2022-07-29 16:45:02 +02:00
Marco Slot 6d6e44166f Avoid catalog read via superuser() call in DecrementSharedConnectionCounter 2022-07-29 14:05:41 +02:00
Onder Kalaci bdaeb40b51 Add missing relation access record for local utility command
While testing 5670dffd33, I realized
that we have a missing RecordNonDistTableAccessesForTask() for
local utility commands.

Although we don't have to record the relation access for local
only cases, we really want to keep the behaviour for scale-out
be the same with single node on all aspects. We wouldn't want
any single node complex transaction to work on single machine,
but not on multi node cluster. Hence, we apply the same restrictions.

For example, on a distributed cluster, the following errors, and
after this commit this errors locally as well

```SQL
CREATE TABLE ref(a int primary key);
INSERT INTO ref VALUES (1);

CREATE TABLE dist(a int REFERENCES ref(a));
SELECT create_reference_table('ref');
SELECT create_distributed_table('dist', 'a');

BEGIN;
		SELECT * FROM dist;
		TRUNCATE ref CASCADE;

ERROR:  cannot execute DDL on table "ref" because there was a parallel SELECT access to distributed table "dist" in the same transaction
HINT:  Try re-running the transaction with "SET LOCAL citus.multi_shard_modify_mode TO 'sequential';"

COMMIT;
```

We also add the comprehensive test suite and run the same locally.
2022-07-29 11:36:33 +02:00
Onder Kalaci 149771792b Remove useless version compats
most likely leftover from earlier versions
2022-07-29 10:31:55 +02:00
Ying Xu 7c1a93b26b
Removed USE_PGXS snippet in Makefile that was blocking citus build when flag is set (#6101)
Code snippet in Makefile was blocking Citus build when USE_PGXS flag was set. This was included for port to FSPG but is not needed for Citus engine and can be safely removed.
2022-07-28 14:15:45 -07:00
aykut-bozkurt a218198e8f
reindex object address should return invalid addresses for unsepported object types in reindex stmt (#6096) 2022-07-28 15:31:49 +03:00
Marco Slot cff013a057 Fix issues with insert..select casts and column ordering 2022-07-28 13:23:57 +02:00
aykut-bozkurt 789d5b9ef9
null check for server in GetObjectAddressByServerName (#6095) 2022-07-28 13:13:28 +03:00
Onder Kalaci 0a5112964d Call relation access hash clean-up irrespective of remote transaction state
Mainly because local-only transactions should be cleaned up
2022-07-28 11:27:59 +02:00
Onder Kalaci d67cf907a2 Detach relation access tracking from connection management 2022-07-28 11:27:59 +02:00
Ying Xu fdf090758b
Bugfix for IN clause to be considered during planner phase in Columnar (#6030)
Reported bug #5803 shows that we are currently not sending the IN clause to our planner for columnar. This PR fixes it by checking for ScalarArrayOpExpr in ExtractPushdownClause so that we do not skip it. Also added a test case for this new addition.
2022-07-27 11:06:49 -07:00
Jelte Fennema 0f50bef696
Avoid possible information leakage about existing users (#6090) 2022-07-27 17:46:32 +02:00
Ahmet Gedemenli 2b2a529653
Error out for views with circular dependencies (#6051)
Adds error check for views with circular dependencies
2022-07-27 17:57:45 +03:00