When multiple sessions concurrently attempt to add the same coordinator
node using `citus_set_coordinator_host`, there is a potential race
condition. Both sessions may pass the initial metadata check
(`isCoordinatorInMetadata`), but only one will succeed in adding the
node. The other session will fail with an assertion error
(`Assert(!nodeAlreadyExists)`), causing the server to crash. Even though
the `AddNodeMetadata` function takes an exclusive lock, it appears that
the lock is not preventing the race condition before the initial
metadata check.
- **Issue**: The current logic allows concurrent sessions to pass the
check for existing coordinators, leading to an attempt to insert
duplicate nodes, which triggers the assertion failure.
- **Impact**: This race condition leads to crashes during operations
that involve concurrent coordinator additions, as seen in
https://github.com/citusdata/citus/issues/7646.
**Test Plan:**
- Isolation Test Limitation: An isolation test was added to simulate
concurrent additions of the same coordinator node, but due to the
behavior of PostgreSQL locking mechanisms, the test does not trigger the
edge case. The lock applied within the function serializes the
operations, preventing the race condition from occurring in the
isolation test environment.
While the edge case is difficult to reproduce in an isolation test, the
fix addresses the core issue by ensuring concurrency control through
proper locking.
- Existing Tests: All existing tests related to node metadata and
coordinator management have been run to ensure that no regressions were
introduced.
**After the Fix:**
- Concurrent attempts to add the same coordinator node will be
serialized. One session will succeed in adding the node, while the
others will skip the operation without crashing the server.
Co-authored-by: Mehmet YILMAZ <mehmet.yilmaz@microsoft.com>
**Description:**
This PR adds a section to CONTRIBUTING.md that explains how to set up
debugging in the devcontainer using VS Code.
**Changes:**
- **New Debugging Section**: Clear instructions on starting the
debugger, selecting the appropriate PostgreSQL process, and setting
breakpoints for easier troubleshooting.
**Purpose:**
- **Improved Contributor Workflow**: Enables contributors to debug the
Citus extension within the devcontainer, enhancing productivity and
making it easier to resolve issues.
---------
Co-authored-by: Mehmet YILMAZ <mehmet.yilmaz@microsoft.com>
DESCRIPTION: Add a check to see if the given limit is null.
Fixes a bug by checking if the limit given in the query is null when the
actual limit is computed with respect to the given offset.
Prior to this change, null is interpreted as 0 during the limit
calculation when both limit and offset are given.
Fixes#7663
Removes el/7 and ol/7 as runners and update checkout action to v4
We use EL/7 and OL/7 runners to test packaging for these distributions.
However, for the past two weeks, we've encountered errors during the
checkout step in the pipelines. The error message is as follows:
```
/__e/node20/bin/node: /lib64/libm.so.6: version `GLIBC_2.27' not found (required by /__e/node20/bin/node)
/__e/node20/bin/node: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.20' not found (required by /__e/node20/bin/node)
/__e/node20/bin/node: /lib64/libstdc++.so.6: version `CXXABI_1.3.9' not found (required by /__e/node20/bin/node)
/__e/node20/bin/node: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.21' not found (required by /__e/node20/bin/node)
/__e/node20/bin/node: /lib64/libc.so.6: version `GLIBC_2.28' not found (required by /__e/node20/bin/node)
/__e/node20/bin/node: /lib64/libc.so.6: version `GLIBC_2.25' not found (required by /__e/node20/bin/node)
```
The GCC version within the EL/7 and OL/7 Docker images is 2.17, and we
cannot upgrade it. Therefore, we need to remove these images from the
packaging test pipelines. Consequently, we will no longer verify if the
code builds for EL/7 and OL/7.
However, we are not using these packaging images as runners within the
packaging infrastructure, so we can continue to use these images for
packaging.
Additional Info: I learned that Marlin team fully dropped the el/7
support so we will drop in further releases as well
We move the CI images to the github container registry.
Given we mostly (if not solely) run these containers on github actions
infra it makes sense to have them hosted closer to where they are
needed.
Image changes: https://github.com/citusdata/the-process/pull/157
The sections about the rebalancer algorithm and the backround tasks were
empty.
---------
Co-authored-by: Marco Slot <marco.slot@gmail.com>
Co-authored-by: Steven Sheehy <17552371+steven-sheehy@users.noreply.github.com>
Related to issue #7619, #7620
Merge command fails when source query is single sharded and source and
target are co-located and insert is not using distribution key of
source.
Example
```
CREATE TABLE source (id integer);
CREATE TABLE target (id integer );
-- let's distribute both table on id field
SELECT create_distributed_table('source', 'id');
SELECT create_distributed_table('target', 'id');
MERGE INTO target t
USING ( SELECT 1 AS somekey
FROM source
WHERE source.id = 1) s
ON t.id = s.somekey
WHEN NOT MATCHED
THEN INSERT (id)
VALUES (s.somekey)
ERROR: MERGE INSERT must use the source table distribution column value
HINT: MERGE INSERT must use the source table distribution column value
```
Author's Opinion: If join is not between source and target distributed
column, we should not force user to use source distributed column while
inserting value of target distributed column.
Fix: If user is not using distributed key of source for insertion let's
not push down query to workers and don't force user to use source
distributed column if it is not part of join.
This reverts commit fa4fc0b372.
Co-authored-by: paragjain <paragjain@microsoft.com>
Because we want to track PR numbers and to make backporting easy we
(pretty much always) use squash-merges when merging to master. We
accidentally used a rebase merge for PR #7620. This reverts those
changes so we can redo the merge using squash merge.
This reverts all commits from eedb607c to 9e71750fc.
For some reason using localhost in our hba file doesn't have the
intended effect anymore in our Github Actions runners. Probably because
of some networking change (IPv6 maybe) or some change in the
`/etc/hosts` file.
Replacing localhost with the equivalent loopback IPv4 and IPv6 addresses
resolved this issue.
Updates checkout plugin for github actions to v4. Can not update the
version for check-sql-snapshots since new plugin causes below error in
the docker image this step is using . Please refer to:
https://github.com/citusdata/citus/actions/runs/9286197994/job/25552373953
Error:
```
/__e/node20/bin/node: /lib/x86_64-linux-gnu/libm.so.6: version `GLIBC_2.27' not found (required by /__e/node20/bin/node)
/__e/node20/bin/node: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.28' not found (required by /__e/node20/bin/node)
/__e/node20/bin/node: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.25' not found (required by /__e/node20/bin/node)
```
DESCRIPTION: Fix performance issue when using "\d tablename" on a server
with many tables
We introduce a filter to every query on pg_class to automatically remove
shards. This is useful to make sure \d and PgAdmin are not cluttered
with shards. However, the way we were introducing this filter was using
`securityQuals` which can have negative impact on query performance.
On clusters with 100k+ tables this could cause a simple "\d tablename"
command to take multiple seconds, because a skipped optimization by
Postgres causes a full table scan. This changes the code to introduce
this filter in the regular `quals` list instead of in `securityQuals`.
Which causes Postgres to use the intended optimization again.
For reference, this was initially reported as a Postgres issue by me:
https://www.postgresql.org/message-id/flat/4189982.1712785863%40sss.pgh.pa.us#b87421293b362d581ea8677e3bfea920
Variables being modified in the PG_TRY block and read in the PG_CATCH
block should be qualified with volatile.
The variable waitEventSet is modified in the PG_TRY block (line 1085)
and read in the PG_CATCH block (line 1095).
The variable relation is modified in the PG_TRY block (line 500) and
read in the PG_CATCH block (line 515).
Besides, the variable objectAddress doesn't need the volatile qualifier.
Ref: C99 7.13.2.1[^1],
> All accessible objects have values, and all other components of the
abstract machine have state, as of the time the longjmp function was
called, except that the values of objects of automatic storage duration
that are local to the function containing the invocation of the
corresponding setjmp macro that do not have volatile-qualified type and
have been changed between the setjmp invocation and longjmp call are
indeterminate.
[^1]: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf
DESCRIPTION: Correctly mark some variables as volatile
---------
Co-authored-by: Hong Yi <zouzou0208@gmail.com>
Fix check-arbitrary-configs tests failure with current REL_16_STABLE.
This is the same problem as described in #7573. I missed pg_regress call
in _run_pg_regress() in that PR.
Co-authored-by: Karina Litskevich <litskevichkarina@gmail.com>
DESCRIPTION: Fix performance issue in GetForeignKeyOids on systems with
many constraints
GetForeignKeyOids was showing up in CPU profiles when distributing
schemas on systems with 100k+ constraints. The reason was that this
function was doing a sequence scan of pg_constraint to get the foreign
keys that referenced the requested table.
This fixes that by finding the constraints referencing the table through
pg_depend instead of pg_constraint. We're doing this indirection,
because pg_constraint doesn't have an index that we can use, but
pg_depend does.
DESCRIPTION: Fix PG upgrades when invalid rebalance strategies exist
Without this change an upgrade of a cluster with an invalid rebalance
strategy would fail with an error like this:
```
cache lookup failed for shard_cost_function with oid 6077337
CONTEXT: SQL statement "SELECT citus_validate_rebalance_strategy_functions(
NEW.shard_cost_function,
NEW.node_capacity_function,
NEW.shard_allowed_on_node_function)"
PL/pgSQL function citus_internal.pg_dist_rebalance_strategy_trigger_func() line 5 at PERFORM
SQL statement "INSERT INTO pg_catalog.pg_dist_rebalance_strategy SELECT
name,
default_strategy,
shard_cost_function::regprocedure::regproc,
node_capacity_function::regprocedure::regproc,
shard_allowed_on_node_function::regprocedure::regproc,
default_threshold,
minimum_threshold,
improvement_threshold
FROM public.pg_dist_rebalance_strategy"
PL/pgSQL function citus_finish_pg_upgrade() line 115 at SQL statement
```
This fixes that by disabling the trigger and simply re-inserting the
invalid rebalance strategy without checking. We could also silently
remove it, but this seems nicer.
DESCRIPTION: Fix performance issue when distributing a table that
depends on an extension
When the database contains many objects this function would show up in
profiles because it was doing a sequence scan on pg_depend. And with
many objects pg_depend can get very large.
This starts using an index scan to only look for rows containing FDWs,
of which there are expected to be very few (often even zero).
DESCRIPTION: Fix performance issue when creating distributed tables if
many already exist
This builds on the work to speed up EnsureSequenceTypeSupported, and now
does something similar for SequenceUsedInDistributedTable.
SequenceUsedInDistributedTable had a similar O(number of citus tables)
operation. This fixes that and speeds up creation of distributed tables
significantly when many distributed tables already exist.
Fixes#7022
DESCRIPTION: Fix performance issue when creating distributed tables and many already exist
EnsureSequenceTypeSupported was doing an O(number of distributed tables)
operation. This can become very slow with lots of Citus tables, which
now happens much more frequently in practice due to schema based sharding.
Partially addresses #7022
And when that is the case, directly use it as "host" parameter for the
connections between nodes and use the "hostname" provided in
pg_dist_node / pg_dist_poolinfo as "hostaddr" to avoid host name lookup.
This is to avoid allowing dns resolution (and / or setting up DNS names
for each host in the cluster). This already works currently when using
IPs in the hostname. The only use of setting host is that you can then
use sslmode=verify-full and it will validate that the hostname matches
the certificate provided by the node you're connecting too.
It would be more flexible to make this a per-node setting, but that
requires SQL changes. And we'd like to backport this change, and
backporting such a sql change would be quite hard while backporting this
change would be very easy. And in many setups, a different hostname for
TLS validation is actually not needed. The reason for that is
query-from-any node: With query-from-any-node all nodes usually have a
certificate that is valid for the same "cluster hostname", either using
a wildcard cert or a Subject Alternative Name (SAN). Because if you load
balance across nodes you don't know which node you're connecting to, but
you still want TLS validation to do it's job. So with this change you
can use this same "cluster hostname" for TLS validation within the
cluster. Obviously this means you don't validate that you're connecting
to a particular node, just that you're connecting to one of the nodes in
the cluster, but that should be fine from a security perspective (in
most cases).
Note to self: This change requires updating
https://docs.citusdata.com/en/latest/develop/api_guc.html#citus-node-conninfo-text.
DESCRIPTION: Allows overwriting host name for all inter-node connections
by supporting "host" parameter in citus.node_conninfo
In PostgreSQL 16 a new option expecteddir was introduced to pg_regress.
Together with fix in
[196eeb6b](https://github.com/postgres/postgres/commit/196eeb6b) it
causes check-vanilla failure if expecteddir is not specified.
Co-authored-by: Karina Litskevich <litskevichkarina@gmail.com>
DESCRIPTION: Fixes a crash caused by some form of ALTER TABLE ADD COLUMN
statements. When adding multiple columns, if one of the ADD COLUMN
statements contains a FOREIGN constraint ommitting the referenced
columns in the statement, a SEGFAULT occurs.
For instance, the following statement results in a crash:
```
ALTER TABLE lt ADD COLUMN new_col1 bool,
ADD COLUMN new_col2 int references rt;
```
Fixes#7520.
Fixes https://github.com/citusdata/citus/issues/7536.
Note to reviewer:
Before this commit, the following results in an assertion failure when
executed locally and this won't be the case anymore:
```console
make -C src/test/regress/ check-citus-upgrade-local citus-old-version=v10.2.0
```
Note that this doesn't happen on CI as we don't enable assertions there.
---------
Co-authored-by: Jelte Fennema-Nio <jelte.fennema@microsoft.com>
RunPreprocessNonMainDBCommand and RunPostprocessNonMainDBCommand are
the entrypoints for this module. These functions are called from
utility_hook.c to support some of the node-wide object management
commands from non-main databases.
To add support for a new command type, one needs to define a new
NonMainDbDistributeObjectOps object and add it to
GetNonMainDbDistributeObjectOps.
This PR changes the order in which the locks are acquired (for the
target and reference tables), when a modify request is initiated from a
worker node that is not the "FirstWorkerNode".
To prevent concurrent writes, locks are acquired on the first worker
node for the replicated tables. When the update statement originates
from the first worker node, it acquires the lock on the reference
table(s) first, followed by the target table(s). However, if the update
statement is initiated in another worker node, the lock requests are
sent to the first worker in a different order. This PR unifies the
modification order on the first worker node. With the third commit,
independent of the node that received the request, the locks are
acquired for the modified table and then the reference tables on the
first node.
The first commit shows a sample output for the test prior to the fix.
Fixes#7477
---------
Co-authored-by: Jelte Fennema-Nio <jelte.fennema@microsoft.com>
When using a CASE WHEN expression in the body
of the function that is used in the DO block, a segmentation
fault occured. This fixes that.
Fixes#7381
---------
Co-authored-by: Konstantin Morozov <vzbdryn@yahoo.com>
This fixes#7551 reported by Egor Chindyaskin
Function activate_node_snapshot() is not meant to be called on a cluster
without worker nodes. This commit adds ERROR report for such case to
prevent server crash.
DESCRIPTION: Adds support for distributed `ALTER/DROP ROLE` commands
from the databases where Citus is not installed
---------
Co-authored-by: Onur Tirtir <onurcantirtir@gmail.com>
In preprocess phase, we save the original database name, replace
dbname field of CreatedbStmt with a temporary name (to let Postgres
to create the database with the temporary name locally) and then
we insert a cleanup record for the temporary database name on all
nodes **(\*\*)**.
And in postprocess phase, we first rename the temporary database
back to its original name for local node and then return a list of
distributed DDL jobs i) to create the database with the temporary
name and then ii) to rename it back to its original name on other
nodes. That way, if CREATE DATABASE fails on any of the nodes, the
temporary database will be cleaned up by the cleanup records that
we inserted in preprocess phase and in case of a failure, we won't
leak any databases called as the name that user intended to use for
the database.
Solves the problem documented in
https://github.com/citusdata/citus/issues/7369
for CREATE DATABASE commands.
**(\*\*):** To ensure that we insert cleanup records on all nodes,
with this PR we also start requiring having the coordinator in the
metadata because otherwise we would skip inserting a cleanup record
for the coordinator.
Add configuration for coredumps and document how to make sure they are
enabled when developing in a devcontainer.
---------
Co-authored-by: Jelte Fennema-Nio <jelte.fennema@microsoft.com>