Commit Graph

7070 Commits (m3hm3t/pg18_support)

Author SHA1 Message Date
Gürkan İndibay 553d5ba15d
Adds changelog for 12.1.3 (#7587)
Co-authored-by: Onur Tirtir <onurcantirtir@gmail.com>
Co-authored-by: Hanefi Onaldi <Hanefi.Onaldi@microsoft.com>
2024-04-22 15:38:51 +03:00
Jelte Fennema-Nio a0151aa31d
Greatly speed up "\d tablename" on servers with many tables (#7577)
DESCRIPTION: Fix performance issue when using "\d tablename" on a server
with many tables

We introduce a filter to every query on pg_class to automatically remove
shards. This is useful to make sure \d and PgAdmin are not cluttered
with shards. However, the way we were introducing this filter was using
`securityQuals` which can have negative impact on query performance.

On clusters with 100k+ tables this could cause a simple "\d tablename"
command to take multiple seconds, because a skipped optimization by
Postgres causes a full table scan. This changes the code to introduce
this filter in the regular `quals` list instead of in `securityQuals`.
Which causes Postgres to use the intended optimization again.

For reference, this was initially reported as a Postgres issue by me:

https://www.postgresql.org/message-id/flat/4189982.1712785863%40sss.pgh.pa.us#b87421293b362d581ea8677e3bfea920
2024-04-16 17:26:12 +02:00
Xing Guo ada3ba2507
Add missing volatile qualifier. (#7570)
Variables being modified in the PG_TRY block and read in the PG_CATCH
block should be qualified with volatile.

The variable waitEventSet is modified in the PG_TRY block (line 1085)
and read in the PG_CATCH block (line 1095).

The variable relation is modified in the PG_TRY block (line 500) and
read in the PG_CATCH block (line 515).

Besides, the variable objectAddress doesn't need the volatile qualifier.

Ref: C99 7.13.2.1[^1],

> All accessible objects have values, and all other components of the
abstract machine have state, as of the time the longjmp function was
called, except that the values of objects of automatic storage duration
that are local to the function containing the invocation of the
corresponding setjmp macro that do not have volatile-qualified type and
have been changed between the setjmp invocation and longjmp call are
indeterminate.

[^1]: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf

DESCRIPTION: Correctly mark some variables as volatile

---------

Co-authored-by: Hong Yi <zouzou0208@gmail.com>
2024-04-16 15:29:14 +02:00
Karina 41e2af8ff5
Use expecteddir option in _run_pg_regress() (#7582)
Fix check-arbitrary-configs tests failure with current REL_16_STABLE.
This is the same problem as described in #7573. I missed pg_regress call
in _run_pg_regress() in that PR.

Co-authored-by: Karina Litskevich <litskevichkarina@gmail.com>
2024-04-16 08:44:47 +00:00
Jelte Fennema-Nio a263ac6f5f
Speed up GetForeignKeyOids (#7578)
DESCRIPTION: Fix performance issue in GetForeignKeyOids on systems with
many constraints

GetForeignKeyOids was showing up in CPU profiles when distributing
schemas on systems with 100k+ constraints. The reason was that this
function was doing a sequence scan of pg_constraint to get the foreign
keys that referenced the requested table.

This fixes that by finding the constraints referencing the table through
pg_depend instead of pg_constraint. We're doing this indirection,
because pg_constraint doesn't have an index that we can use, but
pg_depend does.
2024-04-16 08:16:40 +00:00
Jelte Fennema-Nio 110b4192b2
Fix PG upgrades when invalid rebalance strategies exist (#7580)
DESCRIPTION: Fix PG upgrades when invalid rebalance strategies exist

Without this change an upgrade of a cluster with an invalid rebalance
strategy would fail with an error like this:
```
cache lookup failed for shard_cost_function with oid 6077337
CONTEXT:  SQL statement "SELECT citus_validate_rebalance_strategy_functions(
        NEW.shard_cost_function,
        NEW.node_capacity_function,
        NEW.shard_allowed_on_node_function)"
PL/pgSQL function citus_internal.pg_dist_rebalance_strategy_trigger_func() line 5 at PERFORM
SQL statement "INSERT INTO pg_catalog.pg_dist_rebalance_strategy SELECT
        name,
        default_strategy,
        shard_cost_function::regprocedure::regproc,
        node_capacity_function::regprocedure::regproc,
        shard_allowed_on_node_function::regprocedure::regproc,
        default_threshold,
        minimum_threshold,
        improvement_threshold
    FROM public.pg_dist_rebalance_strategy"
PL/pgSQL function citus_finish_pg_upgrade() line 115 at SQL statement
```

This fixes that by disabling the trigger and simply re-inserting the
invalid rebalance strategy without checking. We could also silently
remove it, but this seems nicer.
2024-04-15 14:26:33 +00:00
Jelte Fennema-Nio 16604a6601
Use an index to get FDWs that depend on extensions (#7574)
DESCRIPTION: Fix performance issue when distributing a table that
depends on an extension

When the database contains many objects this function would show up in
profiles because it was doing a sequence scan on pg_depend. And with
many objects pg_depend can get very large.

This starts using an index scan to only look for rows containing FDWs,
of which there are expected to be very few (often even zero).
2024-04-15 12:42:56 +00:00
Jelte Fennema-Nio cdf51da458
Speed up SequenceUsedInDistributedTable (#7579)
DESCRIPTION: Fix performance issue when creating distributed tables if
many already exist

This builds on the work to speed up EnsureSequenceTypeSupported, and now
does something similar for SequenceUsedInDistributedTable.
SequenceUsedInDistributedTable had a similar O(number of citus tables)
operation. This fixes that and speeds up creation of distributed tables
significantly when many distributed tables already exist.

Fixes #7022
2024-04-15 12:01:55 +00:00
Jelte Fennema-Nio 381f31756e
Speed up EnsureSequenceTypeSupported (#7575)
DESCRIPTION: Fix performance issue when creating distributed tables and many already exist

EnsureSequenceTypeSupported was doing an O(number of distributed tables)
operation. This can become very slow with lots of Citus tables, which
now happens much more frequently in practice due to schema based sharding.

Partially addresses #7022
2024-04-15 10:28:11 +00:00
Onur Tirtir 3586aab17a
Allow providing "host" parameter via citus.node_conninfo (#7541)
And when that is the case, directly use it as "host" parameter for the
connections between nodes and use the "hostname" provided in
pg_dist_node / pg_dist_poolinfo as "hostaddr" to avoid host name lookup.

This is to avoid allowing dns resolution (and / or setting up DNS names
for each host in the cluster). This already works currently when using
IPs in the hostname. The only use of setting host is that you can then
use sslmode=verify-full and it will validate that the hostname matches
the certificate provided by the node you're connecting too.

It would be more flexible to make this a per-node setting, but that
requires SQL changes. And we'd like to backport this change, and
backporting such a sql change would be quite hard while backporting this
change would be very easy. And in many setups, a different hostname for
TLS validation is actually not needed. The reason for that is
query-from-any node: With query-from-any-node all nodes usually have a
certificate that is valid for the same "cluster hostname", either using
a wildcard cert or a Subject Alternative Name (SAN). Because if you load
balance across nodes you don't know which node you're connecting to, but
you still want TLS validation to do it's job. So with this change you
can use this same "cluster hostname" for TLS validation within the
cluster. Obviously this means you don't validate that you're connecting
to a particular node, just that you're connecting to one of the nodes in
the cluster, but that should be fine from a security perspective (in
most cases).

Note to self: This change requires updating

https://docs.citusdata.com/en/latest/develop/api_guc.html#citus-node-conninfo-text.

DESCRIPTION: Allows overwriting host name for all inter-node connections
by supporting "host" parameter in citus.node_conninfo
2024-04-15 09:51:11 +00:00
Karina 41d99249d9
Use expecteddir option when running vanilla tests (#7573)
In PostgreSQL 16 a new option expecteddir was introduced to pg_regress.
Together with fix in
[196eeb6b](https://github.com/postgres/postgres/commit/196eeb6b) it
causes check-vanilla failure if expecteddir is not specified.

Co-authored-by: Karina Litskevich <litskevichkarina@gmail.com>
2024-04-10 16:08:54 +00:00
Onur Tirtir 3929a5b2a6
Fix incorrect "VALID UNTIL" assumption made for roles in node activation (#7534)
Fixes https://github.com/citusdata/citus/issues/7533.

DESCRIPTION: Fixes incorrect `VALID UNTIL` setting assumption made for
roles when syncing them to new nodes
2024-03-20 11:38:33 +00:00
Emel Şimşek fdd658acec
Fix crash caused by some form of ALTER TABLE ADD COLUMN statements. (#7522)
DESCRIPTION: Fixes a crash caused by some form of ALTER TABLE ADD COLUMN
statements. When adding multiple columns, if one of the ADD COLUMN
statements contains a FOREIGN constraint ommitting the referenced
columns in the statement, a SEGFAULT occurs.

For instance, the following statement results in a crash:

```
  ALTER TABLE lt ADD COLUMN new_col1 bool,
                          ADD COLUMN new_col2 int references rt;

```                      


Fixes #7520.
2024-03-20 11:06:05 +03:00
Onur Tirtir 0acb5f6e86
Fix assertion failure in maintenance daemon during Citus upgrades (#7537)
Fixes https://github.com/citusdata/citus/issues/7536.

Note to reviewer:

Before this commit, the following results in an assertion failure when
executed locally and this won't be the case anymore:
```console
make -C src/test/regress/ check-citus-upgrade-local citus-old-version=v10.2.0
```

Note that this doesn't happen on CI as we don't enable assertions there.

---------

Co-authored-by: Jelte Fennema-Nio <jelte.fennema@microsoft.com>
2024-03-20 00:10:12 +00:00
Onur Tirtir d129064280
Refactor the code that supports node-wide object mgmt commands from non-main dbs (#7544)
RunPreprocessNonMainDBCommand and RunPostprocessNonMainDBCommand are
the entrypoints for this module. These functions are called from
utility_hook.c to support some of the node-wide object management
commands from non-main databases.

To add support for a new command type, one needs to define a new
NonMainDbDistributeObjectOps object and add it to
GetNonMainDbDistributeObjectOps.
2024-03-19 14:26:17 +01:00
Hanefi Onaldi bf05bf51ec
Refactor one helper function (#7562)
The code looks simpler and easier to read now.
2024-03-18 12:06:49 +00:00
eaydingol 8afa2d0386
Change the order in which the locks are acquired (#7542)
This PR changes the order in which the locks are acquired (for the
target and reference tables), when a modify request is initiated from a
worker node that is not the "FirstWorkerNode".


To prevent concurrent writes, locks are acquired on the first worker
node for the replicated tables. When the update statement originates
from the first worker node, it acquires the lock on the reference
table(s) first, followed by the target table(s). However, if the update
statement is initiated in another worker node, the lock requests are
sent to the first worker in a different order. This PR unifies the
modification order on the first worker node. With the third commit,
independent of the node that received the request, the locks are
acquired for the modified table and then the reference tables on the
first node.

The first commit shows a sample output for the test prior to the fix. 

Fixes #7477

---------

Co-authored-by: Jelte Fennema-Nio <jelte.fennema@microsoft.com>
2024-03-10 10:20:08 +03:00
copetol 12f56438fc
Fix segfault when using certain DO block in function (#7554)
When using a CASE WHEN expression in the body
of the function that is used in the DO block, a segmentation
fault occured. This fixes that.

Fixes #7381

---------

Co-authored-by: Konstantin Morozov <vzbdryn@yahoo.com>
2024-03-08 14:21:42 +01:00
Karina f0043b64a1
Fix server crash when trying to execute activate_node_snapshot() on a single-node cluster (#7552)
This fixes #7551 reported by Egor Chindyaskin

Function activate_node_snapshot() is not meant to be called on a cluster
without worker nodes. This commit adds ERROR report for such case to
prevent server crash.
2024-03-07 11:08:19 +01:00
eaydingol edcdbe67b1
Fix: store the previous shard cost for order verification (#7550)
Store the previous shard cost so that the invariant checking performs as
expected.
2024-03-06 14:46:49 +03:00
sminux d59c93bc50
fix bad copy-paste rightComparisonLimit (#7547)
DESCRIPTION: change for #7543
2024-03-05 08:49:35 +01:00
Gürkan İndibay 51009d0191
Add support for alter/drop role propagation from non-main databases (#7461)
DESCRIPTION: Adds support for distributed `ALTER/DROP ROLE` commands
from the databases where Citus is not installed

---------

Co-authored-by: Onur Tirtir <onurcantirtir@gmail.com>
2024-02-28 08:58:28 +00:00
Onur Tirtir f4242685e3
Add failure handling for CREATE DATABASE commands (#7483)
In preprocess phase, we save the original database name, replace
dbname field of CreatedbStmt with a temporary name (to let Postgres
to create the database with the temporary name locally) and then
we insert a cleanup record for the temporary database name on all
nodes **(\*\*)**.

And in postprocess phase, we first rename the temporary database
back to its original name for local node and then return a list of
distributed DDL jobs i) to create the database with the temporary
name and then ii) to rename it back to its original name on other
nodes. That way, if CREATE DATABASE fails on any of the nodes, the
temporary database will be cleaned up by the cleanup records that
we inserted in preprocess phase and in case of a failure, we won't
leak any databases called as the name that user intended to use for
the database.

Solves the problem documented in
https://github.com/citusdata/citus/issues/7369
for CREATE DATABASE commands.

**(\*\*):** To ensure that we insert cleanup records on all nodes,
with this PR we also start requiring having the coordinator in the
metadata because otherwise we would skip inserting a cleanup record
for the coordinator.
2024-02-23 17:02:32 +00:00
Nils Dijk cbb90cc4ae
Devcontainer: enable coredumps (#7523)
Add configuration for coredumps and document how to make sure they are
enabled when developing in a devcontainer.

---------

Co-authored-by: Jelte Fennema-Nio <jelte.fennema@microsoft.com>
2024-02-23 13:38:11 +00:00
Onur Tirtir 9ddee5d02a
Test that we check unsupported options for CREATE DATABASE from non-main dbs (#7532)
When adding CREATE/DROP DATABASE propagation in #7240, luckily
we've added EnsureSupportedCreateDatabaseCommand() check into
deparser too just to be on the safe side. That way, today CREATE
DATABASE commands from non-main dbs don't silently allow unsupported
options.

I wasn't aware of this when merging #7439 and hence wanted to add
a test so that we don't mistakenly remove that check from deparser
in future.
2024-02-23 10:37:11 +00:00
eaydingol 3509b7df5a
Add support for SECURITY LABEL on ROLE propagation from non-main databases (#7525)
DESCRIPTION: Adds support for distributed "SECURITY LABEL on ROLE"
commands from the databases where Citus is not installed.
2024-02-23 09:54:19 +03:00
Gürkan İndibay 211415dd4b
Removes granted by statement to fix flaky test errors (#7526)
Fix for the #7519
In metadata sync phase, grant statements for roles are being fetched and
propagated from catalog tables.
However, in some cases grant .. with admin option clauses executes after
the granted by statements which causes #7519 error.
We will fix this issue with the grantor propagation task in the project
2024-02-21 18:37:25 +03:00
Karina 683e10ab69
Fix error in master_disable_node/citus_disable_node (#7492)
This fixes #7454: master_disable_node() has only two arguments, but
calls citus_disable_node() that tries to read three arguments

Co-authored-by: Karina Litskevich <litskevichkarina@gmail.com>
2024-02-21 11:35:27 +00:00
Halil Ozan Akgül 852bcc5483
Add support for create / drop database propagation from non-main databases (#7439)
DESCRIPTION: Adds support for distributed `CREATE/DROP DATABASE `
commands from the databases where Citus is not installed

---------

Co-authored-by: Onur Tirtir <onurcantirtir@gmail.com>
2024-02-21 10:44:01 +00:00
Gürkan İndibay b3ef1b7e39
Add support for grant on database propagation from non-main databases (#7443)
DESCRIPTION: Adds support for distributed `GRANT .. ON DATABASE TO USER`
commands from the databases where Citus is not installed

---------

Co-authored-by: Onur Tirtir <onurcantirtir@gmail.com>
2024-02-21 13:14:58 +03:00
Onur Tirtir 56e014e64e
Clarify resource-cleaner apis (#7518)
Rename InsertCleanupRecordInCurrentTransaction ->
InsertCleanupOnSuccessRecordInCurrentTransaction and hardcode policy
type as CLEANUP_DEFERRED_ON_SUCCESS.

Rename InsertCleanupRecordInSubtransaction ->
InsertCleanupRecordOutsideTransaction.
2024-02-20 08:57:08 +00:00
Gürkan İndibay 71ccbcf3e2
Adds changelog for v11.0.10 (#7513) 2024-02-20 08:06:57 +00:00
Gürkan İndibay 2cbfdbfa46
Adds Grant Role support from non-main db (#7404)
DESCRIPTION: Adds support for distributed role-membership management
commands from the databases where Citus is not installed (`GRANT <role>
TO <role>`)

This PR also refactors the code-path that allows executing some of the
node-wide commands so that we use send deparsed query string to other
nodes instead of the `queryString` passed into utility hook.

---------

Co-authored-by: Onur Tirtir <onurcantirtir@gmail.com>
2024-02-19 17:53:27 +03:00
Gürkan İndibay 9a0cdbf5af
Fixes granted by cascade/restrict statements for revoke (#7517)
DESCRIPTION: Fixes incorrect propagating of `GRANTED BY` and
`CASCADE/RESTRICT` clauses for `REVOKE` statements

There are two issues fixed in this PR
1. granted by statement will appear for revoke statements as well
2. revoke/cascade statement will appear after granted by

Since granted by statements does not appear in statements, this bug
hasn't been visible until now. However, after activating the granted by
statement for revoke, order problem arised and this issue was fixed
order problem for cascade/revoke as well
In summary, this PR provides usage of granted by statements properly now
with the correct order of statements.
We can verify the both errors, fixed with just single statement
REVOKE dist_role_3 from non_dist_role_3 granted by test_admin_role
cascade;
2024-02-19 15:44:21 +03:00
Onur Tirtir 74b55d0546
Enforce using werkzeug 2.3.7 for failure tests and update Postgres versions to latest minors (#7491)
Let's use version 2.3.7 to fix the following error as we do in docker
images created in https://github.com/citusdata/the-process/ repo.
```
ImportError: cannot import name 'url_quote' from 'werkzeug.urls' (/home/onurctirtir/.local/share/virtualenvs/regress-ffZKpSmO/lib/python3.9/site-packages/werkzeug/urls.py)
```

And changing werkzeug version required rebuilding Pipfile.lock file in
src/test/regress. Before updating this Pipfile.lock file, we want to
make sure that versions specified there don't break any tests. And to
ensure that this is the case,
https://github.com/citusdata/the-process/pull/155 synchronizes
requirements.txt file based on new Pipfile.lock and hence this PR
updates test image suffix accordingly.

Also, while updating https://github.com/citusdata/the-process/pull/155,
I also had to update Postgres versions to latest minors to make image
builds passing again and updating Postgres versions in images requires
updating Postgres versions in this repo too. While doing that, we also
update Postgres version used in devcontainer too.
2024-02-16 14:38:32 +00:00
eaydingol 15a3adebe8
Support SECURITY LABEL ON ROLE from any node (#7508)
DESCRIPTION: Propagates SECURITY LABEL ON ROLE statement from any node
2024-02-15 20:34:15 +03:00
Gürkan İndibay 59da0633bb
Fixes invalid grantor field parsing in grant role propagation (#7451)
DESCRIPTION: Resolves an issue that disrupts distributed GRANT
statements with the grantor option

In this issue 3 issues are being solved:
1.Correcting the erroneous appending of multiple granted by in the
deparser.
2Adding support for grantor (granted by) in grant role propagation.
3. Implementing grantor (granted by) support during the metadata sync
grant role propagation phase.

Limitations: Currently, the grantor must be created prior to the
metadata sync phase. During metadata sync, both the creation of the
grantor and the grants given by that role cannot be performed, as the
grantor role is not detected during the dependency resolution phase.

---------

Co-authored-by: Onur Tirtir <onurcantirtir@gmail.com>
2024-02-15 08:27:29 +00:00
Gürkan İndibay c665cb8af3
Adds changelog for 11.0.9,11.1.7,11.2.2,11.3.1,12.0.1,12.1.2 (#7507) 2024-02-14 08:40:28 +03:00
Ivan Vyazmitinov 2fae91c5df
Force LC_COLLATE=C for sort in check_gucs_are_alphabetically_sorted.sh (#7489)
Fixed gucs check, as described
[here](https://github.com/citusdata/citus/pull/7286#discussion_r1481049261)
2024-02-08 12:21:21 +01:00
Onur Tirtir 689c6897a4
Refactor CREATE / DROP database functions for better readability (#7486) 2024-02-08 01:55:50 +03:00
eaydingol f01c5f2593
Move remaining citus_internal functions (#7478)
Moves the following functions to the Citus internal schema: 

citus_internal_local_blocked_processes
citus_internal_global_blocked_processes
citus_internal_mark_node_not_synced
citus_internal_unregister_tenant_schema_globally
citus_internal_update_none_dist_table_metadata
citus_internal_update_placement_metadata
citus_internal_update_relation_colocation
citus_internal_start_replication_origin_tracking
citus_internal_stop_replication_origin_tracking
citus_internal_is_replication_origin_tracking_active


#7405

---------

Co-authored-by: Jelte Fennema-Nio <jelte.fennema@microsoft.com>
2024-02-07 16:58:17 +03:00
Filip Sedlák 6869b3ad10
Fail early when shard can't be safely moved to a new node (#7467)
DESCRIPTION: citus_move_shard_placement now fails early when shard
cannot be safely moved

The implementation is quite simplistic -
`citus_move_shard_placement(...)` will fail with an error if there's any
new node in the cluster that doesn't have reference tables yet.

It could have been finer-grained, i.e. erroring only when trying to move
a shard to an unitialized node. Looking at the related functions -
`replicate_reference_tables()` or `citus_rebalance_start()`, I think
it's acceptable behaviour. These other functions also treat "any"
unitialized node as a temporary anomaly.

Fixes #7426

---------

Co-authored-by: Jelte Fennema-Nio <jelte.fennema@microsoft.com>
2024-02-07 12:04:52 +00:00
Karina 9ff8436f14
Create directories and files with pg_file_create_mode and pg_dir_create_mode permissions (#7479)
Since Postgres commit da9b580d files and directories are supposed to
be created with pg_file_create_mode and pg_dir_create_mode permissions
when default permissions are expected.

This fixes a failure of one of the postgres tests:
If we create file add.conf containing
```
shared_preload_libraries='citus'
```
and run postgres tests
```
TEMP_CONFIG=/path/to/add.conf make installcheck -C src/bin/pg_ctl/
```
then 001_start_stop.pl fails with
```
.../data/base/pgsql_job_cache mode must be 0750
```
in the log.

In passing this also stops creating directories that we haven't used
since Citus 7.4

This change explicitely doesn't change permissions of certificates/keys
that we create.

---------

Co-authored-by: Karina Litskevich <litskevichkarina@gmail.com>
2024-02-07 12:48:31 +01:00
eaydingol 594cb6f274
Move more citus internal functions (#7473)
Moves the following functions:

 citus_internal_delete_colocation_metadata 
 citus_internal_delete_partition_metadata 
 citus_internal_delete_placement_metadata 
 citus_internal_delete_shard_metadata 
 citus_internal_delete_tenant_schema
2024-01-31 23:00:04 +03:00
eaydingol d05174093b
Move citus internal functions (#7470)
Move more functions to citus_internal schema, the list:

citus_internal_add_placement_metadata
citus_internal_add_shard_metadata
citus_internal_add_tenant_schema
citus_internal_adjust_local_clock_to_remote
citus_internal_database_command

#7405
2024-01-31 11:45:19 +00:00
Onur Tirtir 3ce731d497
Make multi_metadata_sync runnable via run_test.py (#7472) 2024-01-31 09:50:16 +00:00
Onur Tirtir 6f43d5c02f
Enhance technical README for DDL propagation (#7471) 2024-01-31 10:30:14 +01:00
Onur Tirtir 5aedec4242
Improve error message for recursive CTEs (#7407)
Fixes #2870
2024-01-30 15:12:48 +00:00
eaydingol f6ea619e27
Move citus internal functions (#7466)
Move the following functions from pg_catalog to citus_internal:

citus_internal_add_object_metadata
citus_internal_add_partition_metadata


#7405
2024-01-30 12:27:10 +03:00
Onur Tirtir 9c243d4477
Improve check_gucs_are_alphabetically_sorted.sh (#7460)
Apparently https://github.com/citusdata/citus/pull/7452 was not enough,
need to consider the GUC-like expressions only within
RegisterCitusConfigVariables
function.
2024-01-26 12:10:35 +00:00