PG15 commit d1ef5631e620f9a5b6480a32bb70124c857af4f1
and PG16 commit 695f5deb7902865901eb2d50a70523af655c3a00
disallow replacing joins with scans in queries with pseudoconstant quals.
This commit prevents the set_join_pathlist_hook from being called
if any of the join restrictions is a pseudo-constant.
So in these cases, citus has no info on the join, never sees that
the query has an outer join, and ends up producing an incorrect plan.
PG17 fixes this by commit 9e9931d2bf40e2fea447d779c2e133c2c1256ef3
Therefore, we take this extra measure here for PG versions less than 17.
hasOuterJoin can never be true when set_join_pathlist_hook is absent.
When executing a prepared CALL, which is not pure SQL but available with
some drivers like npgsql and jpgdbc, Citus entered a code path where a
plan is not defined, while trying to increase its cost. Thus SIG11 when
plan is a NULL pointer.
Fix by only increasing plan cost when plan is not null.
However, it is a bit suspicious to get here with a NULL plan and maybe a
better change will be to not call
ShardPlacementForFunctionColocatedWithDistTable() with a NULL plan at
all (in call.c:134)
bug hit with for example:
```
CallableStatement proc = con.prepareCall("{CALL p(?)}");
proc.registerOutParameter(1, java.sql.Types.BIGINT);
proc.setInt(1, -100);
proc.execute();
```
where `p(bigint)` is a distributed "function" and the param the
distribution key (also in a distributed table), see #7242 for details
Fixes#7242
(cherry picked from commit 0678a2fd89)
DESCRIPTION: Fixes a crash in left outer joins that can happen when
there is an an aggregate on a column from the inner side of the join.
Fix the SEGV seen in #7787 and #7899; it occurs because a column in the
targetlist of a worker subquery can contain a non-empty varnullingrels
field if the column is from the inner side of a left outer join. The
issue can also occur with the columns in the HAVING clause, and this is
also tested in the fix. The issue was triggered by the introduction of
the varnullingrels to Vars in Postgres 16 (2489d76c)
There is a related issue, #7705, where a non-empty varnullingrels was
incorrectly copied into the query tree for the combine query. Here, a
non-empty varnullingrels field of a var is incorrectly copied into the
query tree for a worker subquery.
The regress file from #7705 is used (and renamed) to also test this
(#7787). An alternative test output file is required for Postgres 15
because of an optimization to DISTINCT in Postgres 16 (1349d2790bf).
When testing rolling Citus upgrades, coordinator should not be upgraded
until we upgrade all the workers.
---------
Co-authored-by: Jelte Fennema-Nio <github-tech@jeltef.nl>
(cherry picked from commit 27ac44eb2a)
foreign_key_to_reference_shard_rebalance failed because partition of
2024 year does not exist, fixed by add default partition.
Replaces https://github.com/citusdata/citus/pull/7396 by adding a rule
that allows properly testing foreign_key_to_reference_shard_rebalance
via run_test.py.
Closes#7396
Co-authored-by: chuhx <148182736+cstarc1@users.noreply.github.com>
(cherry picked from commit 968ac74cde)
Co-authored-by: Onur Tirtir <onurcantirtir@gmail.com>
DESCRIPTION: Fixes a crash that happens because of unsafe catalog access
when re-assigning the global pid after application_name changes.
When application_name changes, we don't actually need to
try re-assigning the global pid for external client backends because
application_name doesn't affect the global pid for such backends. Plus,
trying to re-assign the global pid for external client backends would
unnecessarily cause performing a catalog access when the cached local
node id is invalidated. However, accessing to the catalog tables is
dangerous in certain situations like when we're not in a transaction
block. And for the other types of backends, i.e., the Citus internal
backends, we need to re-assign the global pid when the application_name
changes because for such backends we simply extract the global pid
inherited from the originating backend from the application_name -that's
specified by originating backend when openning that connection- and this
doesn't require catalog access.
(cherry picked from commit 73411915a4)
… segfault #7705 (#7718)
This PR is a proposed fix for issue
[7705](https://github.com/citusdata/citus/issues/7705). The following is
the background and rationale for the fix (please refer to
[7705](https://github.com/citusdata/citus/issues/7705) for context);
The `varnullingrels `field was introduced to the Var node struct
definition in Postgres 16. Its purpose is to associate a variable with
the set of outer join relations that can cause the variable to be NULL.
The `varnullingrels ` for the variable
`"gianluca_camp_test"."start_timestamp"` in the problem query is 3,
because the variable "gianluca_camp_test"."start_timestamp" is coming
from the inner (nullable) side of an outer join and 3 is the RT index
(aka relid) of that outer join. The problem occurs when the Postgres
planner attempts to plan the combine query. The format of a combine
query is:
```
SELECT <targets>
FROM pg_catalog.citus_extradata_container();
```
There is only one relation in a combine query, so no outer joins are
present, but the non-empty `varnullingrels `field causes the Postgres
planner to access structures for a non-existent relation. The source of
the problem is that, when creating the target list for the combine
query, function MasterAggregateMutator() uses copyObject() to construct
a Var node before setting the master table ID, and this copies over the
non-empty varnullingrels field in the case of the
`"gianluca_camp_test"."start_timestamp"` var. The proposed solution is
to have MasterAggregateMutator() use makeVar() instead of copyObject(),
and only set the fields that make sense for the combine query; var type,
collation and type modifier. The `varnullingrels `field can be left
empty because there is only one relation in the combine query.
A new regress test issue_7705.sql is added to exercise the fix. The
issue is not specific to window functions, any target expression that
cannot be pushed down and contains at least one column from the inner
side of a left outer join (so has a non-empty varnullingrels field) can
cause the same issue.
More about Citus combine queries
[here](https://github.com/citusdata/citus/tree/main/src/backend/distributed#combine-query-planner).
More about Postgres varnullingrels
[here](https://github.com/postgres/postgres/blob/master/src/backend/optimizer/README).
(cherry picked from commit 248ff5d52a)
DESCRIPTION: PR description that will go into the change log, up to 78
characters
Co-authored-by: Colm <colm.mchugh@gmail.com>
Propagates SECURITY LABEL ON ROLE stmt (https://github.com/citusdata/citus/pull/7304)
We propagate `SECURITY LABEL [for provider] ON ROLE rolename IS
labelname` to the worker nodes.
We also make sure to run the relevant `SecLabelStmt` commands on a
newly added node by looking at roles found in `pg_shseclabel`.
See official docs for explanation on how this command works:
https://www.postgresql.org/docs/current/sql-security-label.html
This command stores the role label in the `pg_shseclabel` catalog table.
This commit also fixes the regex string in
`check_gucs_are_alphabetically_sorted.sh` script such that it escapes
the dot. Previously it was looking for all strings starting with "citus"
instead of "citus." as it should.
To test this feature, I currently make use of a special GUC to control
label provider registration in PG_init when creating the Citus extension.
(cherry picked from commit 0d1f18862b)
Co-authored-by: Naisila Puka <37271756+naisila@users.noreply.github.com>
Removes el/7 and ol/7 as runners and update checkout action to v4
We use EL/7 and OL/7 runners to test packaging for these distributions.
However, for the past two weeks, we've encountered errors during the
checkout step in the pipelines. The error message is as follows:
```
/__e/node20/bin/node: /lib64/libm.so.6: version `GLIBC_2.27' not found (required by /__e/node20/bin/node)
/__e/node20/bin/node: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.20' not found (required by /__e/node20/bin/node)
/__e/node20/bin/node: /lib64/libstdc++.so.6: version `CXXABI_1.3.9' not found (required by /__e/node20/bin/node)
/__e/node20/bin/node: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.21' not found (required by /__e/node20/bin/node)
/__e/node20/bin/node: /lib64/libc.so.6: version `GLIBC_2.28' not found (required by /__e/node20/bin/node)
/__e/node20/bin/node: /lib64/libc.so.6: version `GLIBC_2.25' not found (required by /__e/node20/bin/node)
```
The GCC version within the EL/7 and OL/7 Docker images is 2.17, and we
cannot upgrade it. Therefore, we need to remove these images from the
packaging test pipelines. Consequently, we will no longer verify if the
code builds for EL/7 and OL/7.
However, we are not using these packaging images as runners within the
packaging infrastructure, so we can continue to use these images for
packaging.
Additional Info: I learned that Marlin team fully dropped the el/7
support so we will drop in further releases as well
(cherry picked from commit c603c3ed74)
We move the CI images to the github container registry.
Given we mostly (if not solely) run these containers on github actions
infra it makes sense to have them hosted closer to where they are
needed.
Image changes: https://github.com/citusdata/the-process/pull/157
(cherry picked from commit e776a7ebbb)
Fixes the flaky test that results in following diff:
```diff
--- /__w/citus/citus/src/test/regress/expected/multi_mx_node_metadata.out.modified 2023-11-01 14:22:12.890476575 +0000
+++ /__w/citus/citus/src/test/regress/results/multi_mx_node_metadata.out.modified 2023-11-01 14:22:12.914476657 +0000
@@ -840,24 +840,26 @@
(1 row)
\c :datname - - :master_port
SELECT datname FROM pg_stat_activity WHERE application_name LIKE 'Citus Met%';
datname
------------
db_to_drop
(1 row)
DROP DATABASE db_to_drop;
+ERROR: database "db_to_drop" is being accessed by other users
SELECT datname FROM pg_stat_activity WHERE application_name LIKE 'Citus Met%';
datname
------------
-(0 rows)
+ db_to_drop
+(1 row)
-- cleanup
DROP SEQUENCE sequence CASCADE;
NOTICE: drop cascades to default value for column a of table reference_table
```
(cherry picked from commit 9867c5b949)
Related to issue #7619, #7620
Merge command fails when source query is single sharded and source and
target are co-located and insert is not using distribution key of
source.
Example
```
CREATE TABLE source (id integer);
CREATE TABLE target (id integer );
-- let's distribute both table on id field
SELECT create_distributed_table('source', 'id');
SELECT create_distributed_table('target', 'id');
MERGE INTO target t
USING ( SELECT 1 AS somekey
FROM source
WHERE source.id = 1) s
ON t.id = s.somekey
WHEN NOT MATCHED
THEN INSERT (id)
VALUES (s.somekey)
ERROR: MERGE INSERT must use the source table distribution column value
HINT: MERGE INSERT must use the source table distribution column value
```
Author's Opinion: If join is not between source and target distributed
column, we should not force user to use source distributed column while
inserting value of target distributed column.
Fix: If user is not using distributed key of source for insertion let's
not push down query to workers and don't force user to use source
distributed column if it is not part of join.
This reverts commit fa4fc0b372.
Co-authored-by: paragjain <paragjain@microsoft.com>
(cherry picked from commit aaaf637a6b)
For some reason using localhost in our hba file doesn't have the
intended effect anymore in our Github Actions runners. Probably because
of some networking change (IPv6 maybe) or some change in the
`/etc/hosts` file.
Replacing localhost with the equivalent loopback IPv4 and IPv6 addresses
resolved this issue.
(cherry picked from commit 8c9de08b76)
DESCRIPTION: Adds null check for node in HasRangeTableRef to prevent
errors
When executing the query below, users encountered an error due to a null
Node object. This PR adds a null check to handle this error.
Query:
```sql
select
ct.conname as constraint_name,
a.attname as column_name,
fc.relname as foreign_table_name,
fns.nspname as foreign_table_schema,
fa.attname as foreign_column_name
from
(SELECT ct.conname, ct.conrelid, ct.confrelid, ct.conkey, ct.contype,
ct.confkey, generate_subscripts(ct.conkey, 1) AS s
FROM pg_constraint ct
) AS ct
inner join pg_class c on c.oid=ct.conrelid
inner join pg_namespace ns on c.relnamespace=ns.oid
inner join pg_attribute a on a.attrelid=ct.conrelid and a.attnum =
ct.conkey[ct.s]
left join pg_class fc on fc.oid=ct.confrelid
left join pg_namespace fns on fc.relnamespace=fns.oid
left join pg_attribute fa on fa.attrelid=ct.confrelid and fa.attnum =
ct.confkey[ct.s]
where
ct.contype='f'
and c.relname='table1'
and ns.nspname='schemauser'
order by
fns.nspname, fc.relname, a.attnum
;
```
Error:
```
#0 HasRangeTableRef (node=0x0, varno=varno@entry=0x7ffe18cc3674) at worker/worker_shard_visibility.c:507
507 if (IsA(node, RangeTblRef))
#0 HasRangeTableRef (node=0x0, varno=varno@entry=0x7ffe18cc3674) at worker/worker_shard_visibility.c:507
#1 0x0000561b0aae390e in expression_tree_walker_impl (node=0x561b0d19cc78, walker=walker@entry=0x7f2a73249f0a <HasRangeTableRef>, context=0x7ffe18cc3674)
at nodeFuncs.c:2091
#2 0x00007f2a73249f26 in HasRangeTableRef (node=<optimized out>, varno=<optimized out>) at worker/worker_shard_visibility.c:513
#3 0x0000561b0aae3e09 in expression_tree_walker_impl (node=0x561b0d19cd68, walker=walker@entry=0x7f2a73249f0a <HasRangeTableRef>, context=context@entry=0x7ffe18cc3674)
at nodeFuncs.c:2405
#4 0x0000561b0aae3945 in expression_tree_walker_impl (node=0x561b0d19d0f8, walker=walker@entry=0x7f2a73249f0a <HasRangeTableRef>, context=0x7ffe18cc3674)
at nodeFuncs.c:2111
#5 0x00007f2a73249f26 in HasRangeTableRef (node=<optimized out>, varno=<optimized out>) at worker/worker_shard_visibility.c:513
#6 0x0000561b0aae3e09 in expression_tree_walker_impl (node=0x561b0d19cb38, walker=walker@entry=0x7f2a73249f0a <HasRangeTableRef>, context=context@entry=0x7ffe18cc3674)
at nodeFuncs.c:2405
#7 0x0000561b0aae396d in expression_tree_walker_impl (node=0x561b0d19d198, walker=walker@entry=0x7f2a73249f0a <HasRangeTableRef>, context=0x7ffe18cc3674)
at nodeFuncs.c:2127
#8 0x00007f2a73249f26 in HasRangeTableRef (node=<optimized out>, varno=<optimized out>) at worker/worker_shard_visibility.c:513
#9 0x0000561b0aae3ef7 in expression_tree_walker_impl (node=0x561b0d183e88, walker=walker@entry=0x7f2a73249f0a <HasRangeTableRef>, context=0x7ffe18cc3674)
at nodeFuncs.c:2464
#10 0x00007f2a73249f26 in HasRangeTableRef (node=<optimized out>, varno=<optimized out>) at worker/worker_shard_visibility.c:513
#11 0x0000561b0aae3ed3 in expression_tree_walker_impl (node=0x561b0d184278, walker=walker@entry=0x7f2a73249f0a <HasRangeTableRef>, context=0x7ffe18cc3674)
at nodeFuncs.c:2460
#12 0x00007f2a73249f26 in HasRangeTableRef (node=<optimized out>, varno=<optimized out>) at worker/worker_shard_visibility.c:513
#13 0x0000561b0aae3ed3 in expression_tree_walker_impl (node=0x561b0d184668, walker=walker@entry=0x7f2a73249f0a <HasRangeTableRef>, context=0x7ffe18cc3674)
at nodeFuncs.c:2460
#14 0x00007f2a73249f26 in HasRangeTableRef (node=<optimized out>, varno=<optimized out>) at worker/worker_shard_visibility.c:513
#15 0x0000561b0aae3ed3 in expression_tree_walker_impl (node=0x561b0d184f68, walker=walker@entry=0x7f2a73249f0a <HasRangeTableRef>, context=0x7ffe18cc3674)
at nodeFuncs.c:2460
#16 0x00007f2a73249f26 in HasRangeTableRef (node=<optimized out>, varno=<optimized out>) at worker/worker_shard_visibility.c:513
#17 0x0000561b0aae3e09 in expression_tree_walker_impl (node=0x7f2a68010148, walker=walker@entry=0x7f2a73249f0a <HasRangeTableRef>, context=context@entry=0x7ffe18cc3674)
at nodeFuncs.c:2405
#18 0x00007f2a7324a0eb in FilterShardsFromPgclass (node=node@entry=0x561b0d185de8, context=context@entry=0x0) at worker/worker_shard_visibility.c:464
#19 0x00007f2a7324a5ff in HideShardsFromSomeApplications (query=query@entry=0x561b0d185de8) at worker/worker_shard_visibility.c:294
#20 0x00007f2a731ed7ac in distributed_planner (parse=0x561b0d185de8,
query_string=0x561b0d009478 "select\n ct.conname as constraint_name,\n a.attname as column_name,\n fc.relname as foreign_table_name,\n fns.nspname as foreign_table_schema,\n fa.attname as foreign_column_name\nfrom\n (S"..., cursorOptions=<optimized out>, boundParams=0x0) at planner/distributed_planner.c:237
#21 0x00007f2a7311a52a in pgss_planner (parse=0x561b0d185de8,
query_string=0x561b0d009478 "select\n ct.conname as constraint_name,\n a.attname as column_name,\n fc.relname as foreign_table_name,\n fns.nspname as foreign_table_schema,\n fa.attname as foreign_column_name\nfrom\n (S"..., cursorOptions=2048, boundParams=0x0) at pg_stat_statements.c:953
#22 0x0000561b0ab65465 in planner (parse=parse@entry=0x561b0d185de8,
query_string=query_string@entry=0x561b0d009478 "select\n ct.conname as constraint_name,\n a.attname as column_name,\n fc.relname as foreign_table_name,\n fns.nspname as foreign_table_schema,\n fa.attname as foreign_column_name\nfrom\n (S"..., cursorOptions=cursorOptions@entry=2048, boundParams=boundParams@entry=0x0)
at planner.c:279
#23 0x0000561b0ac53aa3 in pg_plan_query (querytree=querytree@entry=0x561b0d185de8,
query_string=query_string@entry=0x561b0d009478 "select\n ct.conname as constraint_name,\n a.attname as column_name,\n fc.relname as foreign_table_name,\n fns.nspname as foreign_table_schema,\n fa.attname as foreign_column_name\nfrom\n (S"..., cursorOptions=cursorOptions@entry=2048, boundParams=boundParams@entry=0x0)
at postgres.c:904
#24 0x0000561b0ac53b71 in pg_plan_queries (querytrees=0x7f2a68012878,
query_string=query_string@entry=0x561b0d009478 "select\n ct.conname as constraint_name,\n a.attname as column_name,\n fc.relname as foreign_table_name,\n fns.nspname as foreign_table_schema,\n fa.attname as foreign_column_name\nfrom\n (S"..., cursorOptions=cursorOptions@entry=2048, boundParams=boundParams@entry=0x0)
at postgres.c:996
#25 0x0000561b0ac5408e in exec_simple_query (
query_string=query_string@entry=0x561b0d009478 "select\n ct.conname as constraint_name,\n a.attname as column_name,\n fc.relname as foreign_table_name,\n fns.nspname as foreign_table_schema,\n fa.attname as foreign_column_name\nfrom\n (S"...) at postgres.c:1193
#26 0x0000561b0ac56116 in PostgresMain (dbname=<optimized out>, username=<optimized out>) at postgres.c:4637
#27 0x0000561b0abab7a7 in BackendRun (port=port@entry=0x561b0d0caf50) at postmaster.c:4464
#28 0x0000561b0abae969 in BackendStartup (port=port@entry=0x561b0d0caf50) at postmaster.c:4192
#29 0x0000561b0abaeaa6 in ServerLoop () at postmaster.c:1782
```
Fixes#7603
With the recent changes in packaging images, linux package installations
to execute validate_output is unnecessary now.
In this PR, I removed them to make the pipeline more effective.
- [x] Remove the test warning before merge
(cherry picked from commit 32b0fc23f5)
centos 7 and oracle 7 is not being supported for newer releases by
Postgres. Therefore, getting package download errors in packaging
pipelines.
This PR removes el/7 and ol/7 Postgres 16 pipelines
(cherry picked from commit b0e982d0b5)
DESCRIPTION: Fix performance issue when using "\d tablename" on a server
with many tables
We introduce a filter to every query on pg_class to automatically remove
shards. This is useful to make sure \d and PgAdmin are not cluttered
with shards. However, the way we were introducing this filter was using
`securityQuals` which can have negative impact on query performance.
On clusters with 100k+ tables this could cause a simple "\d tablename"
command to take multiple seconds, because a skipped optimization by
Postgres causes a full table scan. This changes the code to introduce
this filter in the regular `quals` list instead of in `securityQuals`.
Which causes Postgres to use the intended optimization again.
For reference, this was initially reported as a Postgres issue by me:
https://www.postgresql.org/message-id/flat/4189982.1712785863%40sss.pgh.pa.us#b87421293b362d581ea8677e3bfea920
(cherry picked from commit a0151aa31d)
Variables being modified in the PG_TRY block and read in the PG_CATCH
block should be qualified with volatile.
The variable waitEventSet is modified in the PG_TRY block (line 1085)
and read in the PG_CATCH block (line 1095).
The variable relation is modified in the PG_TRY block (line 500) and
read in the PG_CATCH block (line 515).
Besides, the variable objectAddress doesn't need the volatile qualifier.
Ref: C99 7.13.2.1[^1],
> All accessible objects have values, and all other components of the
abstract machine have state, as of the time the longjmp function was
called, except that the values of objects of automatic storage duration
that are local to the function containing the invocation of the
corresponding setjmp macro that do not have volatile-qualified type and
have been changed between the setjmp invocation and longjmp call are
indeterminate.
[^1]: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf
DESCRIPTION: Correctly mark some variables as volatile
---------
Co-authored-by: Hong Yi <zouzou0208@gmail.com>
(cherry picked from commit ada3ba2507)
This patch includes the username in the reported error message.
This makes debugging easier when certain commands open connections
as other users than the user that is executing the command.
```
monitora_snapshot=# SELECT citus_move_shard_placement(102030, 'monitora.db-dev-worker-a', 6005, 'monitora.db-dev-worker-a', 6017);
ERROR: connection to the remote node monitora_user@monitora.db-dev-worker-a:6017 failed with the following error: fe_sendauth: no password supplied
Time: 40,198 ms
```
(cherry picked from commit 8b48d6ab02)
Test isolation_update_node fails on some systems with the following error:
```
-s2: WARNING: connection to the remote node non-existent:57637 failed with the following error: could not translate host name "non-existent" to address: Name or service not known
+s2: WARNING: connection to the remote node non-existent:57637 failed with the following error: could not translate host name "non-existent" to address: Temporary failure in name resolution
```
This slightly modifies an already existing [normalization
rule](739c6d26df/src/test/regress/bin/normalize.sed (L217-L218))
to fix it.
Co-authored-by: Karina Litskevich <litskevichkarina@gmail.com>
(cherry picked from commit 21464adfec)
When there are multiple localhost entries in /etc/hosts like following
/etc/hosts:
```
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
127.0.0.1 localhost
```
multi_cluster_management check will failed:
```
@@ -857,20 +857,21 @@
ERROR: group 14 already has a primary node
-- check that you can add secondaries and unavailable nodes to a group
SELECT groupid AS worker_2_group FROM pg_dist_node WHERE nodeport = :worker_2_port \gset
SELECT 1 FROM master_add_node('localhost', 9998, groupid => :worker_1_group, noderole => 'secondary');
?column?
----------
1
(1 row)
SELECT 1 FROM master_add_node('localhost', 9997, groupid => :worker_1_group, noderole => 'unavailable');
+WARNING: could not establish connection after 5000 ms
?column?
----------
1
(1 row)
```
This actually isn't just a problem in test environments, but could occur
as well during actual usage when a hostname in pg_dist_node
resolves to multiple IPs and one of those IPs is unreachable.
Postgres will then automatically continue with the next IP, but
Citus should listen for events on the new socket. Not on the
old one.
Co-authored-by: chuhx43211 <chuhx43211@hundsun.com>
(cherry picked from commit 9a91136a3d)
This PR changes the order in which the locks are acquired (for the
target and reference tables), when a modify request is initiated from a
worker node that is not the "FirstWorkerNode".
To prevent concurrent writes, locks are acquired on the first worker
node for the replicated tables. When the update statement originates
from the first worker node, it acquires the lock on the reference
table(s) first, followed by the target table(s). However, if the update
statement is initiated in another worker node, the lock requests are
sent to the first worker in a different order. This PR unifies the
modification order on the first worker node. With the third commit,
independent of the node that received the request, the locks are
acquired for the modified table and then the reference tables on the
first node.
The first commit shows a sample output for the test prior to the fix.
Fixes#7477
---------
Co-authored-by: Jelte Fennema-Nio <jelte.fennema@microsoft.com>
(cherry picked from commit 8afa2d0386)
DESCRIPTION: Remove a few small memory leaks
In #7440 one instance of a strdup was removed. But there were a few
more. This removes the ones that are left over, or adds a comment why
strdup is on purpose.
(cherry picked from commit 9683bef2ec)
Not sure why we never found this using valgrind, but using strdup will
cause memory leaks because the pointer is not tracked in a memory
context.
(cherry picked from commit 72fbea20c4)
And when that is the case, directly use it as "host" parameter for the
connections between nodes and use the "hostname" provided in
pg_dist_node / pg_dist_poolinfo as "hostaddr" to avoid host name lookup.
This is to avoid allowing dns resolution (and / or setting up DNS names
for each host in the cluster). This already works currently when using
IPs in the hostname. The only use of setting host is that you can then
use sslmode=verify-full and it will validate that the hostname matches
the certificate provided by the node you're connecting too.
It would be more flexible to make this a per-node setting, but that
requires SQL changes. And we'd like to backport this change, and
backporting such a sql change would be quite hard while backporting this
change would be very easy. And in many setups, a different hostname for
TLS validation is actually not needed. The reason for that is
query-from-any node: With query-from-any-node all nodes usually have a
certificate that is valid for the same "cluster hostname", either using
a wildcard cert or a Subject Alternative Name (SAN). Because if you load
balance across nodes you don't know which node you're connecting to, but
you still want TLS validation to do it's job. So with this change you
can use this same "cluster hostname" for TLS validation within the
cluster. Obviously this means you don't validate that you're connecting
to a particular node, just that you're connecting to one of the nodes in
the cluster, but that should be fine from a security perspective (in
most cases).
Note to self: This change requires updating
https://docs.citusdata.com/en/latest/develop/api_guc.html#citus-node-conninfo-text.
DESCRIPTION: Allows overwriting host name for all inter-node connections
by supporting "host" parameter in citus.node_conninfo
(cherry picked from commit 3586aab17a)
This fixes#7454: master_disable_node() has only two arguments, but
calls citus_disable_node() that tries to read three arguments
Co-authored-by: Karina Litskevich <litskevichkarina@gmail.com>
(cherry picked from commit 683e10ab69)
When using a CASE WHEN expression in the body
of the function that is used in the DO block, a segmentation
fault occured. This fixes that.
Fixes#7381
---------
Co-authored-by: Konstantin Morozov <vzbdryn@yahoo.com>
(cherry picked from commit 12f56438fc)