Commit Graph

6763 Commits (v12.1.7)

Author SHA1 Message Date
eaydingol f3cb3d99ee
Bump Citus to 12.1.7 (#7894)
Bump Citus to 12.1.7
2025-02-07 14:58:55 +03:00
eaydingol bae20578d4
Add changelog for 12.1.7 (#7889)
Add changelog entries for 12.1.7

---------

Co-authored-by: Onur Tirtir <onurcantirtir@gmail.com>
2025-02-07 10:56:08 +03:00
Onur Tirtir 02b3c009e7 Fix mixed Citus upgrade tests (#7218)
When testing rolling Citus upgrades, coordinator should not be upgraded
until we upgrade all the workers.

---------

Co-authored-by: Jelte Fennema-Nio <github-tech@jeltef.nl>
(cherry picked from commit 27ac44eb2a)
2025-02-04 01:24:57 +03:00
Onur Tirtir 352516e619 Avoid publishing artifacts with conflicting names
.. as documented in actions/upload-artifact#480.

(cherry picked from commit 26f16a7654)
2025-02-03 10:13:59 +03:00
Onur Tirtir 66d35b35f8 Upgrade download-artifacts action to 4.1.8
(cherry picked from commit cbe0de33a6)
2025-02-03 10:13:59 +03:00
Onur Tirtir 296b623093 Upgrade upload-artifacts action to 4.6.0
(cherry picked from commit b886cfa223)
2025-02-03 10:13:59 +03:00
Naisila Puka 71d921707d Fix foreign_key_to_reference_shard_rebalance test (#7826)
foreign_key_to_reference_shard_rebalance failed because partition of
2024 year does not exist, fixed by add default partition.

Replaces https://github.com/citusdata/citus/pull/7396 by adding a rule
that allows properly testing foreign_key_to_reference_shard_rebalance
via run_test.py.

Closes #7396

Co-authored-by: chuhx <148182736+cstarc1@users.noreply.github.com>
(cherry picked from commit 968ac74cde)

Co-authored-by: Onur Tirtir <onurcantirtir@gmail.com>
2025-01-07 23:03:18 +03:00
Teja Mupparti 39e890d9ce For scenarios, such as, Bug 3697586: Server crashes when assigning distributed transaction: Raise an ERROR instead of a crash 2025-01-07 23:03:18 +03:00
Onur Tirtir cb31a649e9 Avoid re-assigning the global pid for client backends and bg workers when the application_name changes (#7791)
DESCRIPTION: Fixes a crash that happens because of unsafe catalog access
when re-assigning the global pid after application_name changes.

When application_name changes, we don't actually need to
try re-assigning the global pid for external client backends because
application_name doesn't affect the global pid for such backends. Plus,
trying to re-assign the global pid for external client backends would
unnecessarily cause performing a catalog access when the cached local
node id is invalidated. However, accessing to the catalog tables is
dangerous in certain situations like when we're not in a transaction
block. And for the other types of backends, i.e., the Citus internal
backends, we need to re-assign the global pid when the application_name
changes because for such backends we simply extract the global pid
inherited from the originating backend from the application_name -that's
specified by originating backend when openning that connection- and this
doesn't require catalog access.

(cherry picked from commit 73411915a4)
2024-12-24 09:34:08 +03:00
Emel Şimşek c44682a7d0
Bump Citus Version to 12.1.6 (#7744)
Bump Citus Version to 12.1.6
2024-11-14 15:56:38 +03:00
Emel Şimşek d4f1635775
Add changelog entries for 12.1.6 (#7742)
Add changelog entries for 12.1.6.
2024-11-14 13:41:34 +03:00
Emel Şimşek 0f1aa0e16a
[Bug Fix] Query on distributed tables with window partition may cause… (#7740)
… segfault #7705 (#7718)

This PR is a proposed fix for issue
[7705](https://github.com/citusdata/citus/issues/7705). The following is
the background and rationale for the fix (please refer to
[7705](https://github.com/citusdata/citus/issues/7705) for context);

The `varnullingrels `field was introduced to the Var node struct
definition in Postgres 16. Its purpose is to associate a variable with
the set of outer join relations that can cause the variable to be NULL.
The `varnullingrels ` for the variable
`"gianluca_camp_test"."start_timestamp"` in the problem query is 3,
because the variable "gianluca_camp_test"."start_timestamp" is coming
from the inner (nullable) side of an outer join and 3 is the RT index
(aka relid) of that outer join. The problem occurs when the Postgres
planner attempts to plan the combine query. The format of a combine
query is:
```
SELECT <targets>
FROM   pg_catalog.citus_extradata_container();
```
There is only one relation in a combine query, so no outer joins are
present, but the non-empty `varnullingrels `field causes the Postgres
planner to access structures for a non-existent relation. The source of
the problem is that, when creating the target list for the combine
query, function MasterAggregateMutator() uses copyObject() to construct
a Var node before setting the master table ID, and this copies over the
non-empty varnullingrels field in the case of the
`"gianluca_camp_test"."start_timestamp"` var. The proposed solution is
to have MasterAggregateMutator() use makeVar() instead of copyObject(),
and only set the fields that make sense for the combine query; var type,
collation and type modifier. The `varnullingrels `field can be left
empty because there is only one relation in the combine query.

A new regress test issue_7705.sql is added to exercise the fix. The
issue is not specific to window functions, any target expression that
cannot be pushed down and contains at least one column from the inner
side of a left outer join (so has a non-empty varnullingrels field) can
cause the same issue.

More about Citus combine queries

[here](https://github.com/citusdata/citus/tree/main/src/backend/distributed#combine-query-planner).
More about Postgres varnullingrels

[here](https://github.com/postgres/postgres/blob/master/src/backend/optimizer/README).

(cherry picked from commit 248ff5d52a)

DESCRIPTION: PR description that will go into the change log, up to 78
characters

Co-authored-by: Colm <colm.mchugh@gmail.com>
2024-11-13 19:37:51 +03:00
Emel Şimşek 686d2b46ca
Propagates SECURITY LABEL ON ROLE stmt (#7304) (#7735)
Propagates SECURITY LABEL ON ROLE stmt (https://github.com/citusdata/citus/pull/7304)
We propagate `SECURITY LABEL [for provider] ON ROLE rolename IS
labelname` to the worker nodes.
We also make sure to run the relevant `SecLabelStmt` commands on a
newly added node by looking at roles found in `pg_shseclabel`.

See official docs for explanation on how this command works:
https://www.postgresql.org/docs/current/sql-security-label.html
This command stores the role label in the `pg_shseclabel` catalog table.

This commit also fixes the regex string in
`check_gucs_are_alphabetically_sorted.sh` script such that it escapes
the dot. Previously it was looking for all strings starting with "citus"
instead of "citus." as it should.

To test this feature, I currently make use of a special GUC to control
label provider registration in PG_init when creating the Citus extension.

(cherry picked from commit 0d1f18862b)

Co-authored-by: Naisila Puka <37271756+naisila@users.noreply.github.com>
2024-11-13 14:21:08 +03:00
Hanefi Onaldi 15ecc37ecd
Bump Citus to 12.1.5 2024-07-17 15:11:38 +03:00
Hanefi Onaldi 5c2ef8e2d8
Add changelog entries for 12.1.5
(cherry picked from commit 5c097860aa)
2024-07-17 15:11:38 +03:00
Parag Jain 6349f2d52d
Support MERGE command for single_shard_distributed Target (#7643)
This PR has following changes :
1. Enable MERGE command for single_shard_distributed targets.

(cherry picked from commit 3c467e6e02)
2024-07-17 15:11:38 +03:00
Nils Dijk f60c4cbd19
bump postgres versions in CI and dev (#7655)
Upgrade postgres versions to:
 - 14.12
 -  15.7
 - 16.3

Depends on https://github.com/citusdata/the-process/pull/158

(cherry picked from commit accb7d09f7)
2024-07-17 15:11:38 +03:00
Gürkan İndibay f0ea07a813
Removes el/7 and ol/7 as runners (#7650)
Removes el/7 and ol/7 as runners and update checkout action to v4

We use EL/7 and OL/7 runners to test packaging for these distributions.
However, for the past two weeks, we've encountered errors during the
checkout step in the pipelines. The error message is as follows:
```
/__e/node20/bin/node: /lib64/libm.so.6: version `GLIBC_2.27' not found (required by /__e/node20/bin/node)
/__e/node20/bin/node: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.20' not found (required by /__e/node20/bin/node)
/__e/node20/bin/node: /lib64/libstdc++.so.6: version `CXXABI_1.3.9' not found (required by /__e/node20/bin/node)
/__e/node20/bin/node: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.21' not found (required by /__e/node20/bin/node)
/__e/node20/bin/node: /lib64/libc.so.6: version `GLIBC_2.28' not found (required by /__e/node20/bin/node)
/__e/node20/bin/node: /lib64/libc.so.6: version `GLIBC_2.25' not found (required by /__e/node20/bin/node)
```
The GCC version within the EL/7 and OL/7 Docker images is 2.17, and we
cannot upgrade it. Therefore, we need to remove these images from the
packaging test pipelines. Consequently, we will no longer verify if the
code builds for EL/7 and OL/7.

However, we are not using these packaging images as runners within the
packaging infrastructure, so we can continue to use these images for
packaging.

Additional Info: I learned that Marlin team fully dropped the el/7
support so we will drop in further releases as well

(cherry picked from commit c603c3ed74)
2024-07-17 15:11:38 +03:00
Nils Dijk 9dcfcb92ff
CI: move to github container registry (#7652)
We move the CI images to the github container registry.

Given we mostly (if not solely) run these containers on github actions
infra it makes sense to have them hosted closer to where they are
needed.

Image changes: https://github.com/citusdata/the-process/pull/157

(cherry picked from commit e776a7ebbb)
2024-07-17 14:57:50 +03:00
paragjain caee20ad7c fixing expected file of multi_move_mx test 2024-06-18 16:49:39 +02:00
Onur Tirtir d9635609f4 Fix flaky multi_mx_node_metadata.sql test (#7317)
Fixes the flaky test that results in following diff:
```diff
--- /__w/citus/citus/src/test/regress/expected/multi_mx_node_metadata.out.modified	2023-11-01 14:22:12.890476575 +0000
+++ /__w/citus/citus/src/test/regress/results/multi_mx_node_metadata.out.modified	2023-11-01 14:22:12.914476657 +0000
@@ -840,24 +840,26 @@
 (1 row)

 \c :datname - - :master_port
 SELECT datname FROM pg_stat_activity WHERE application_name LIKE 'Citus Met%';
   datname
 ------------
  db_to_drop
 (1 row)

 DROP DATABASE db_to_drop;
+ERROR:  database "db_to_drop" is being accessed by other users
 SELECT datname FROM pg_stat_activity WHERE application_name LIKE 'Citus Met%';
   datname
 ------------
-(0 rows)
+ db_to_drop
+(1 row)

 -- cleanup
 DROP SEQUENCE sequence CASCADE;
 NOTICE:  drop cascades to default value for column a of table reference_table
```

(cherry picked from commit 9867c5b949)
2024-06-18 16:49:39 +02:00
Jelte Fennema-Nio 4f0053ed6d Redo #7620: Fix merge command when insert value does not have source distributed column (#7627)
Related to issue #7619, #7620
Merge command fails when source query is single sharded and source and
target are co-located and insert is not using distribution key of
source.

Example
```
CREATE TABLE source (id integer);
CREATE TABLE target (id integer );

-- let's distribute both table on id field
SELECT create_distributed_table('source', 'id');
SELECT create_distributed_table('target', 'id');

MERGE INTO target t
  USING ( SELECT 1 AS somekey
          FROM source
        WHERE source.id = 1) s
  ON t.id = s.somekey
  WHEN NOT MATCHED
  THEN INSERT (id)
    VALUES (s.somekey)

ERROR:  MERGE INSERT must use the source table distribution column value
HINT:  MERGE INSERT must use the source table distribution column value
```

Author's Opinion: If join is not between source and target distributed
column, we should not force user to use source distributed column while
inserting value of target distributed column.

Fix: If user is not using distributed key of source for insertion let's
not push down query to workers and don't force user to use source
distributed column if it is not part of join.

This reverts commit fa4fc0b372.

Co-authored-by: paragjain <paragjain@microsoft.com>
(cherry picked from commit aaaf637a6b)
2024-06-18 16:49:39 +02:00
Jelte Fennema-Nio 3594bd7ac0 Fix CI issues after Github Actions networking changes (#7624)
For some reason using localhost in our hba file doesn't have the
intended effect anymore in our Github Actions runners. Probably because
of some networking change (IPv6 maybe) or some change in the
`/etc/hosts` file.

Replacing localhost with the equivalent loopback IPv4 and IPv6 addresses
resolved this issue.

(cherry picked from commit 8c9de08b76)
2024-06-18 16:49:39 +02:00
Gürkan İndibay 7e0dc18b22
Bump Citus version to 12.1.4 (#7610) 2024-05-29 11:35:08 +03:00
Gürkan İndibay 4e838a471a
Adds null check for node in HasRangeTableRef (#7604)
DESCRIPTION: Adds null check for node in HasRangeTableRef to prevent
errors

When executing the query below, users encountered an error due to a null
Node object. This PR adds a null check to handle this error.

Query:
```sql
select
    ct.conname as constraint_name,
    a.attname as column_name,
    fc.relname as foreign_table_name,
    fns.nspname as foreign_table_schema,
    fa.attname as foreign_column_name
from
    (SELECT ct.conname, ct.conrelid, ct.confrelid, ct.conkey, ct.contype,
ct.confkey, generate_subscripts(ct.conkey, 1) AS s
       FROM pg_constraint ct
    ) AS ct
    inner join pg_class c on c.oid=ct.conrelid
    inner join pg_namespace ns on c.relnamespace=ns.oid
    inner join pg_attribute a on a.attrelid=ct.conrelid and a.attnum =
ct.conkey[ct.s]
    left join pg_class fc on fc.oid=ct.confrelid
    left join pg_namespace fns on fc.relnamespace=fns.oid
    left join pg_attribute fa on fa.attrelid=ct.confrelid and fa.attnum =
ct.confkey[ct.s]
where
    ct.contype='f'
    and c.relname='table1'
    and ns.nspname='schemauser'
order by
    fns.nspname, fc.relname, a.attnum
;
```

Error:
```
#0  HasRangeTableRef (node=0x0, varno=varno@entry=0x7ffe18cc3674) at worker/worker_shard_visibility.c:507
507             if (IsA(node, RangeTblRef))
#0  HasRangeTableRef (node=0x0, varno=varno@entry=0x7ffe18cc3674) at worker/worker_shard_visibility.c:507
#1  0x0000561b0aae390e in expression_tree_walker_impl (node=0x561b0d19cc78, walker=walker@entry=0x7f2a73249f0a <HasRangeTableRef>, context=0x7ffe18cc3674)
    at nodeFuncs.c:2091
#2  0x00007f2a73249f26 in HasRangeTableRef (node=<optimized out>, varno=<optimized out>) at worker/worker_shard_visibility.c:513
#3  0x0000561b0aae3e09 in expression_tree_walker_impl (node=0x561b0d19cd68, walker=walker@entry=0x7f2a73249f0a <HasRangeTableRef>, context=context@entry=0x7ffe18cc3674)
    at nodeFuncs.c:2405
#4  0x0000561b0aae3945 in expression_tree_walker_impl (node=0x561b0d19d0f8, walker=walker@entry=0x7f2a73249f0a <HasRangeTableRef>, context=0x7ffe18cc3674)
    at nodeFuncs.c:2111
#5  0x00007f2a73249f26 in HasRangeTableRef (node=<optimized out>, varno=<optimized out>) at worker/worker_shard_visibility.c:513
#6  0x0000561b0aae3e09 in expression_tree_walker_impl (node=0x561b0d19cb38, walker=walker@entry=0x7f2a73249f0a <HasRangeTableRef>, context=context@entry=0x7ffe18cc3674)
    at nodeFuncs.c:2405
#7  0x0000561b0aae396d in expression_tree_walker_impl (node=0x561b0d19d198, walker=walker@entry=0x7f2a73249f0a <HasRangeTableRef>, context=0x7ffe18cc3674)
    at nodeFuncs.c:2127
#8  0x00007f2a73249f26 in HasRangeTableRef (node=<optimized out>, varno=<optimized out>) at worker/worker_shard_visibility.c:513
#9  0x0000561b0aae3ef7 in expression_tree_walker_impl (node=0x561b0d183e88, walker=walker@entry=0x7f2a73249f0a <HasRangeTableRef>, context=0x7ffe18cc3674)
    at nodeFuncs.c:2464
#10 0x00007f2a73249f26 in HasRangeTableRef (node=<optimized out>, varno=<optimized out>) at worker/worker_shard_visibility.c:513
#11 0x0000561b0aae3ed3 in expression_tree_walker_impl (node=0x561b0d184278, walker=walker@entry=0x7f2a73249f0a <HasRangeTableRef>, context=0x7ffe18cc3674)
    at nodeFuncs.c:2460
#12 0x00007f2a73249f26 in HasRangeTableRef (node=<optimized out>, varno=<optimized out>) at worker/worker_shard_visibility.c:513
#13 0x0000561b0aae3ed3 in expression_tree_walker_impl (node=0x561b0d184668, walker=walker@entry=0x7f2a73249f0a <HasRangeTableRef>, context=0x7ffe18cc3674)
    at nodeFuncs.c:2460
#14 0x00007f2a73249f26 in HasRangeTableRef (node=<optimized out>, varno=<optimized out>) at worker/worker_shard_visibility.c:513
#15 0x0000561b0aae3ed3 in expression_tree_walker_impl (node=0x561b0d184f68, walker=walker@entry=0x7f2a73249f0a <HasRangeTableRef>, context=0x7ffe18cc3674)
    at nodeFuncs.c:2460
#16 0x00007f2a73249f26 in HasRangeTableRef (node=<optimized out>, varno=<optimized out>) at worker/worker_shard_visibility.c:513
#17 0x0000561b0aae3e09 in expression_tree_walker_impl (node=0x7f2a68010148, walker=walker@entry=0x7f2a73249f0a <HasRangeTableRef>, context=context@entry=0x7ffe18cc3674)
    at nodeFuncs.c:2405
#18 0x00007f2a7324a0eb in FilterShardsFromPgclass (node=node@entry=0x561b0d185de8, context=context@entry=0x0) at worker/worker_shard_visibility.c:464
#19 0x00007f2a7324a5ff in HideShardsFromSomeApplications (query=query@entry=0x561b0d185de8) at worker/worker_shard_visibility.c:294
#20 0x00007f2a731ed7ac in distributed_planner (parse=0x561b0d185de8, 
    query_string=0x561b0d009478 "select\n    ct.conname as constraint_name,\n    a.attname as column_name,\n    fc.relname as foreign_table_name,\n    fns.nspname as foreign_table_schema,\n    fa.attname as foreign_column_name\nfrom\n    (S"..., cursorOptions=<optimized out>, boundParams=0x0) at planner/distributed_planner.c:237
#21 0x00007f2a7311a52a in pgss_planner (parse=0x561b0d185de8, 
    query_string=0x561b0d009478 "select\n    ct.conname as constraint_name,\n    a.attname as column_name,\n    fc.relname as foreign_table_name,\n    fns.nspname as foreign_table_schema,\n    fa.attname as foreign_column_name\nfrom\n    (S"..., cursorOptions=2048, boundParams=0x0) at pg_stat_statements.c:953
#22 0x0000561b0ab65465 in planner (parse=parse@entry=0x561b0d185de8, 
    query_string=query_string@entry=0x561b0d009478 "select\n    ct.conname as constraint_name,\n    a.attname as column_name,\n    fc.relname as foreign_table_name,\n    fns.nspname as foreign_table_schema,\n    fa.attname as foreign_column_name\nfrom\n    (S"..., cursorOptions=cursorOptions@entry=2048, boundParams=boundParams@entry=0x0)
    at planner.c:279
#23 0x0000561b0ac53aa3 in pg_plan_query (querytree=querytree@entry=0x561b0d185de8, 
    query_string=query_string@entry=0x561b0d009478 "select\n    ct.conname as constraint_name,\n    a.attname as column_name,\n    fc.relname as foreign_table_name,\n    fns.nspname as foreign_table_schema,\n    fa.attname as foreign_column_name\nfrom\n    (S"..., cursorOptions=cursorOptions@entry=2048, boundParams=boundParams@entry=0x0)
    at postgres.c:904
#24 0x0000561b0ac53b71 in pg_plan_queries (querytrees=0x7f2a68012878, 
    query_string=query_string@entry=0x561b0d009478 "select\n    ct.conname as constraint_name,\n    a.attname as column_name,\n    fc.relname as foreign_table_name,\n    fns.nspname as foreign_table_schema,\n    fa.attname as foreign_column_name\nfrom\n    (S"..., cursorOptions=cursorOptions@entry=2048, boundParams=boundParams@entry=0x0)
    at postgres.c:996
#25 0x0000561b0ac5408e in exec_simple_query (
    query_string=query_string@entry=0x561b0d009478 "select\n    ct.conname as constraint_name,\n    a.attname as column_name,\n    fc.relname as foreign_table_name,\n    fns.nspname as foreign_table_schema,\n    fa.attname as foreign_column_name\nfrom\n    (S"...) at postgres.c:1193
#26 0x0000561b0ac56116 in PostgresMain (dbname=<optimized out>, username=<optimized out>) at postgres.c:4637
#27 0x0000561b0abab7a7 in BackendRun (port=port@entry=0x561b0d0caf50) at postmaster.c:4464
#28 0x0000561b0abae969 in BackendStartup (port=port@entry=0x561b0d0caf50) at postmaster.c:4192
#29 0x0000561b0abaeaa6 in ServerLoop () at postmaster.c:1782
```


Fixes #7603
2024-05-28 08:54:40 +03:00
Gürkan İndibay 035aa6eada
Bump Citus version to 12.1.3 (#7588) 2024-04-24 11:15:04 +03:00
Gürkan İndibay 75ff237340 Removes unnecessary package installations in packaging pipelines (#7341)
With the recent changes in packaging images, linux package installations
to execute validate_output is unnecessary now.
In this PR, I removed them to make the pipeline more effective.

- [x] Remove the test warning before merge

(cherry picked from commit 32b0fc23f5)
2024-04-17 10:26:50 +02:00
Gürkan İndibay 40e9e2614d Removes centos 7 for PG 16 in packaging pipelines (#7205)
centos 7 and oracle 7 is not being supported for newer releases by
Postgres. Therefore, getting package download errors in packaging
pipelines.
This PR removes el/7 and ol/7 Postgres 16 pipelines

(cherry picked from commit b0e982d0b5)
2024-04-17 10:26:50 +02:00
Jelte Fennema-Nio bac95cc523 Greatly speed up "\d tablename" on servers with many tables (#7577)
DESCRIPTION: Fix performance issue when using "\d tablename" on a server
with many tables

We introduce a filter to every query on pg_class to automatically remove
shards. This is useful to make sure \d and PgAdmin are not cluttered
with shards. However, the way we were introducing this filter was using
`securityQuals` which can have negative impact on query performance.

On clusters with 100k+ tables this could cause a simple "\d tablename"
command to take multiple seconds, because a skipped optimization by
Postgres causes a full table scan. This changes the code to introduce
this filter in the regular `quals` list instead of in `securityQuals`.
Which causes Postgres to use the intended optimization again.

For reference, this was initially reported as a Postgres issue by me:

https://www.postgresql.org/message-id/flat/4189982.1712785863%40sss.pgh.pa.us#b87421293b362d581ea8677e3bfea920
(cherry picked from commit a0151aa31d)
2024-04-17 10:26:50 +02:00
Xing Guo 38967491ef Add missing volatile qualifier. (#7570)
Variables being modified in the PG_TRY block and read in the PG_CATCH
block should be qualified with volatile.

The variable waitEventSet is modified in the PG_TRY block (line 1085)
and read in the PG_CATCH block (line 1095).

The variable relation is modified in the PG_TRY block (line 500) and
read in the PG_CATCH block (line 515).

Besides, the variable objectAddress doesn't need the volatile qualifier.

Ref: C99 7.13.2.1[^1],

> All accessible objects have values, and all other components of the
abstract machine have state, as of the time the longjmp function was
called, except that the values of objects of automatic storage duration
that are local to the function containing the invocation of the
corresponding setjmp macro that do not have volatile-qualified type and
have been changed between the setjmp invocation and longjmp call are
indeterminate.

[^1]: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf

DESCRIPTION: Correctly mark some variables as volatile

---------

Co-authored-by: Hong Yi <zouzou0208@gmail.com>
(cherry picked from commit ada3ba2507)
2024-04-17 10:26:50 +02:00
Filip Sedlák fc09e1cfdc Log username in the failed connection message (#7432)
This patch includes the username in the reported error message.
This makes debugging easier when certain commands open connections
as other users than the user that is executing the command.

```
monitora_snapshot=# SELECT citus_move_shard_placement(102030, 'monitora.db-dev-worker-a', 6005, 'monitora.db-dev-worker-a', 6017);
ERROR:  connection to the remote node monitora_user@monitora.db-dev-worker-a:6017 failed with the following error: fe_sendauth: no password supplied
Time: 40,198 ms
```

(cherry picked from commit 8b48d6ab02)
2024-04-17 10:26:50 +02:00
Karina 7513061057 Make isolation_update_node test system independent (#7423)
Test isolation_update_node fails on some systems with the following error:
```
-s2: WARNING:  connection to the remote node non-existent:57637 failed with the following error: could not translate host name "non-existent" to address: Name or service not known
+s2: WARNING:  connection to the remote node non-existent:57637 failed with the following error: could not translate host name "non-existent" to address: Temporary failure in name resolution
```

This slightly modifies an already existing [normalization
rule](739c6d26df/src/test/regress/bin/normalize.sed (L217-L218))
to fix it.

Co-authored-by: Karina Litskevich <litskevichkarina@gmail.com>
(cherry picked from commit 21464adfec)
2024-04-17 10:26:50 +02:00
Jelte Fennema-Nio f4af59ab4b Support running isolation_update_node in flaky test detection (#7425)
I noticed in #7423 that `isolation_update_node` could not be run using
flaky test detection. This fixes that.
2024-04-17 10:26:50 +02:00
sminux 5708fca1ea fix bad copy-paste rightComparisonLimit (#7547)
DESCRIPTION: change for #7543
(cherry picked from commit d59c93bc50)
2024-04-17 10:26:50 +02:00
LightDB Enterprise Postgres 2a6164d2d9 Fix timeout when underlying socket is changed in a MultiConnection (#7377)
When there are multiple localhost entries in /etc/hosts like following
/etc/hosts:
```
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
127.0.0.1   localhost
```

multi_cluster_management check will failed:
```

@@ -857,20 +857,21 @@
 ERROR:  group 14 already has a primary node
 -- check that you can add secondaries and unavailable nodes to a group
 SELECT groupid AS worker_2_group FROM pg_dist_node WHERE nodeport = :worker_2_port \gset
 SELECT 1 FROM master_add_node('localhost', 9998, groupid => :worker_1_group, noderole => 'secondary');
  ?column?
 ----------
         1
 (1 row)

 SELECT 1 FROM master_add_node('localhost', 9997, groupid => :worker_1_group, noderole => 'unavailable');
+WARNING:  could not establish connection after 5000 ms
  ?column?
 ----------
         1
 (1 row)
```

This actually isn't just a problem in test environments, but could occur
as well during actual usage when a hostname in pg_dist_node
resolves to multiple IPs and one of those IPs is unreachable.
Postgres will then automatically continue with the next IP, but
Citus should listen for events on the new socket. Not on the
old one.

Co-authored-by: chuhx43211 <chuhx43211@hundsun.com>
(cherry picked from commit 9a91136a3d)
2024-04-17 10:26:50 +02:00
eaydingol db391c0bb7 Change the order in which the locks are acquired (#7542)
This PR changes the order in which the locks are acquired (for the
target and reference tables), when a modify request is initiated from a
worker node that is not the "FirstWorkerNode".

To prevent concurrent writes, locks are acquired on the first worker
node for the replicated tables. When the update statement originates
from the first worker node, it acquires the lock on the reference
table(s) first, followed by the target table(s). However, if the update
statement is initiated in another worker node, the lock requests are
sent to the first worker in a different order. This PR unifies the
modification order on the first worker node. With the third commit,
independent of the node that received the request, the locks are
acquired for the modified table and then the reference tables on the
first node.

The first commit shows a sample output for the test prior to the fix.

Fixes #7477

---------

Co-authored-by: Jelte Fennema-Nio <jelte.fennema@microsoft.com>
(cherry picked from commit 8afa2d0386)
2024-04-17 10:26:50 +02:00
Jelte Fennema-Nio 146725fc9b Replace more spurious strdups with pstrdups (#7441)
DESCRIPTION: Remove a few small memory leaks

In #7440 one instance of a strdup was removed. But there were a few
more. This removes the ones that are left over, or adds a comment why
strdup is on purpose.

(cherry picked from commit 9683bef2ec)
2024-04-17 10:26:50 +02:00
Marco Slot 94ab1dc240 Replace spurious strdup with pstrdup (#7440)
Not sure why we never found this using valgrind, but using strdup will
cause memory leaks because the pointer is not tracked in a memory
context.

(cherry picked from commit 72fbea20c4)
2024-04-17 10:26:50 +02:00
Onur Tirtir 812a2b759f Improve error message for recursive CTEs (#7407)
Fixes #2870

(cherry picked from commit 5aedec4242)
2024-04-17 10:26:50 +02:00
Onur Tirtir 452564c19b Allow providing "host" parameter via citus.node_conninfo (#7541)
And when that is the case, directly use it as "host" parameter for the
connections between nodes and use the "hostname" provided in
pg_dist_node / pg_dist_poolinfo as "hostaddr" to avoid host name lookup.

This is to avoid allowing dns resolution (and / or setting up DNS names
for each host in the cluster). This already works currently when using
IPs in the hostname. The only use of setting host is that you can then
use sslmode=verify-full and it will validate that the hostname matches
the certificate provided by the node you're connecting too.

It would be more flexible to make this a per-node setting, but that
requires SQL changes. And we'd like to backport this change, and
backporting such a sql change would be quite hard while backporting this
change would be very easy. And in many setups, a different hostname for
TLS validation is actually not needed. The reason for that is
query-from-any node: With query-from-any-node all nodes usually have a
certificate that is valid for the same "cluster hostname", either using
a wildcard cert or a Subject Alternative Name (SAN). Because if you load
balance across nodes you don't know which node you're connecting to, but
you still want TLS validation to do it's job. So with this change you
can use this same "cluster hostname" for TLS validation within the
cluster. Obviously this means you don't validate that you're connecting
to a particular node, just that you're connecting to one of the nodes in
the cluster, but that should be fine from a security perspective (in
most cases).

Note to self: This change requires updating

https://docs.citusdata.com/en/latest/develop/api_guc.html#citus-node-conninfo-text.

DESCRIPTION: Allows overwriting host name for all inter-node connections
by supporting "host" parameter in citus.node_conninfo

(cherry picked from commit 3586aab17a)
2024-04-17 10:26:50 +02:00
Karina 9b06d02c43 Fix error in master_disable_node/citus_disable_node (#7492)
This fixes #7454: master_disable_node() has only two arguments, but
calls citus_disable_node() that tries to read three arguments

Co-authored-by: Karina Litskevich <litskevichkarina@gmail.com>
(cherry picked from commit 683e10ab69)
2024-04-17 10:26:50 +02:00
copetol 2ee43fd00c Fix segfault when using certain DO block in function (#7554)
When using a CASE WHEN expression in the body
of the function that is used in the DO block, a segmentation
fault occured. This fixes that.

Fixes #7381

---------

Co-authored-by: Konstantin Morozov <vzbdryn@yahoo.com>
(cherry picked from commit 12f56438fc)
2024-04-17 10:26:50 +02:00
Emel Şimşek f2d102d54b Fix crash caused by some form of ALTER TABLE ADD COLUMN statements. (#7522)
DESCRIPTION: Fixes a crash caused by some form of ALTER TABLE ADD COLUMN
statements. When adding multiple columns, if one of the ADD COLUMN
statements contains a FOREIGN constraint ommitting the referenced
columns in the statement, a SEGFAULT occurs.

For instance, the following statement results in a crash:

```
  ALTER TABLE lt ADD COLUMN new_col1 bool,
                          ADD COLUMN new_col2 int references rt;

```

Fixes #7520.

(cherry picked from commit fdd658acec)
2024-04-17 10:26:50 +02:00
Karina 82637f3e13 Use expecteddir option in _run_pg_regress() (#7582)
Fix check-arbitrary-configs tests failure with current REL_16_STABLE.
This is the same problem as described in #7573. I missed pg_regress call
in _run_pg_regress() in that PR.

Co-authored-by: Karina Litskevich <litskevichkarina@gmail.com>
(cherry picked from commit 41e2af8ff5)
2024-04-17 10:26:50 +02:00
Karina 79616bc7db Use expecteddir option when running vanilla tests (#7573)
In PostgreSQL 16 a new option expecteddir was introduced to pg_regress.
Together with fix in
[196eeb6b](https://github.com/postgres/postgres/commit/196eeb6b) it
causes check-vanilla failure if expecteddir is not specified.

Co-authored-by: Karina Litskevich <litskevichkarina@gmail.com>
(cherry picked from commit 41d99249d9)
2024-04-17 10:26:50 +02:00
Jelte Fennema-Nio 364e8ece14 Speed up GetForeignKeyOids (#7578)
DESCRIPTION: Fix performance issue in GetForeignKeyOids on systems with
many constraints

GetForeignKeyOids was showing up in CPU profiles when distributing
schemas on systems with 100k+ constraints. The reason was that this
function was doing a sequence scan of pg_constraint to get the foreign
keys that referenced the requested table.

This fixes that by finding the constraints referencing the table through
pg_depend instead of pg_constraint. We're doing this indirection,
because pg_constraint doesn't have an index that we can use, but
pg_depend does.

(cherry picked from commit a263ac6f5f)
2024-04-17 10:26:50 +02:00
Jelte Fennema-Nio 62c32067f1 Use an index to get FDWs that depend on extensions (#7574)
DESCRIPTION: Fix performance issue when distributing a table that
depends on an extension

When the database contains many objects this function would show up in
profiles because it was doing a sequence scan on pg_depend. And with
many objects pg_depend can get very large.

This starts using an index scan to only look for rows containing FDWs,
of which there are expected to be very few (often even zero).

(cherry picked from commit 16604a6601)
2024-04-17 10:26:50 +02:00
Jelte Fennema-Nio d9069b1e01 Speed up SequenceUsedInDistributedTable (#7579)
DESCRIPTION: Fix performance issue when creating distributed tables if
many already exist

This builds on the work to speed up EnsureSequenceTypeSupported, and now
does something similar for SequenceUsedInDistributedTable.
SequenceUsedInDistributedTable had a similar O(number of citus tables)
operation. This fixes that and speeds up creation of distributed tables
significantly when many distributed tables already exist.

Fixes #7022

(cherry picked from commit cdf51da458)
2024-04-17 10:26:50 +02:00
Jelte Fennema-Nio fa6743d436 Speed up EnsureSequenceTypeSupported (#7575)
DESCRIPTION: Fix performance issue when creating distributed tables and many already exist

EnsureSequenceTypeSupported was doing an O(number of distributed tables)
operation. This can become very slow with lots of Citus tables, which
now happens much more frequently in practice due to schema based sharding.

Partially addresses #7022

(cherry picked from commit 381f31756e)
2024-04-17 10:26:50 +02:00
Jelte Fennema-Nio 26178fb538 Add missing postgres.h includes
After sorting includes in the previous commit some files were now
invalid because they were not including postgres.h
2024-04-17 10:26:50 +02:00