- Fixed the merge conflicts
- Removed some more PG14 paths as we dropped PG14 support
- Renamed some more foreach_ptr to foreach_declared_ptr
- Renamed all 12.2-1.sql files to 13.1-1.sql
- Defined a PG version compat for Anum_pg_database_datlocale
because PG17 renamed it from Anum_pg_database_daticulocale
- Changed the node-wide object Assert to PG17 from PG16
Note: this needs some extra check for new PG17 stuff
DESCRIPTION: Fixes deadlock with transaction recovery that is possible
during Citus upgrades.
Fixes #7875.
This commit addresses two interrelated deadlock issues uncovered during Citus
upgrades:
1. Local Deadlock:
- **Problem:**
In `RecoverWorkerTransactions()`, a new connection is created for each worker
node to perform transaction recovery, and the `pg_dist_transaction` catalog
table is locked until the end of the transaction. When
`RecoverTwoPhaseCommits()` calls this function for each worker node, the order
in which locks on `pg_dist_authinfo` and `pg_dist_transaction` are acquired can
alternate. This reversal can lead to a deadlock if any concurrent process
requires locks on these tables.
- **Fix:**
Pre-establish all worker node connections upfront so that
`RecoverWorkerTransactions()` operates with a single, consistent connection.
This ensures that locks on `pg_dist_authinfo` and `pg_dist_transaction` are always
acquired in the correct order, thereby preventing the local deadlock.
2. Distributed Deadlock:
- **Problem:**
After resolving the local deadlock, a distributed deadlock issue emerges. The
maintenance daemon calls `RecoverWorkerTransactions()` on each worker node—
including the local node—which leads to a complex locking sequence:
- A RowExclusiveLock is taken on the `pg_dist_transaction` table in
`RecoverWorkerTransactions()`.
- An extension update (e.g., `ALTER EXTENSION citus UPDATE`) then tries to acquire
an AccessExclusiveLock on the same table and is blocked by the RowExclusiveLock.
- A subsequent query (e.g., a SELECT on `pg_prepared_xacts`) issued using a
separate connection on the local node gets blocked due to locks held during a
call to `BuildCitusTableCacheEntry()`.
- The maintenance daemon waits for this query, resulting in a circular wait and
stalling the entire cluster.
- **Fix:**
Avoid cache lookups for internal PostgreSQL tables by implementing an early bailout
for relation IDs below `FirstNormalObjectId` (system objects). This eliminates
unnecessary calls to `BuildCitusTableCacheEntry()`, reducing lock contention and
mitigating the distributed deadlock (see the SQL sketch after this list).
Furthermore, this optimization improves performance in fast
connect→query_catalog→disconnect cycles by eliminating redundant
cache creation and lookups.
3. Also reverts the commit that disabled the relevant test cases.
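The early bailout relies on the fact that built-in system objects always have OIDs below `FirstNormalObjectId` (16384), while user relations, including Citus tables, are assigned OIDs at or above it. A minimal SQL illustration of that invariant (the constant is a PostgreSQL fact; the query itself is only for demonstration):
```
-- Built-in catalogs and other system objects have OIDs below 16384
-- (FirstNormalObjectId), so they can never be Citus tables and a metadata
-- cache lookup for them can be skipped.
SELECT oid, relname
FROM pg_class
WHERE oid < 16384
ORDER BY oid
LIMIT 5;
```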
DESCRIPTION: fix a planning error caused by a redundant WHERE clause
Fix a Citus planning glitch that occurs in a DML query when the WHERE
clause of the query is of the form:
` WHERE true OR <expression with 1 or more citus tables> `
and this is the only place in the query referencing a citus table.
Postgres' standard planner transforms the WHERE clause to:
` WHERE true `
So the query now has no Citus table references, confusing the Citus planner as
described in issues #7782 and #7783. The fix is to check, after the Postgres
standard planner runs, whether the Query has been transformed as shown, and if
so re-run the check of whether the query needs distributed planning.
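A minimal repro sketch under assumed table names (dist_table is a distributed Citus table, local_table is a plain Postgres table; not taken from the regress tests):
```
-- Hypothetical repro: the only Citus table reference is behind "true OR ...",
-- so the standard planner folds the WHERE clause down to "WHERE true".
CREATE TABLE local_table (a int);
CREATE TABLE dist_table (b int);
SELECT create_distributed_table('dist_table', 'b');

UPDATE local_table
SET a = 1
WHERE true OR EXISTS (SELECT 1 FROM dist_table);
```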
This PR fixes issue #7891 in the Citus planner, where an `UPDATE` on a
local table with a subquery referencing a reference table could produce
a 0-task plan. Historically, the planner sometimes failed to detect that
both the target and referenced tables were effectively “local,”
assigning `INVALID_SHARD_ID` and yielding a no-op plan.
### Root Cause
- In the Citus router logic (`PlanRouterQuery`), we relied on `shardId`
to determine whether a query should be routed to a single shard.
- If `shardId == INVALID_SHARD_ID`, but we also had not marked the query
as a “local table modification,” the code path would produce zero tasks.
- Local + reference tables do not require multi-shard routing. Failing
to detect this “purely local” scenario caused Citus to incorrectly route
to zero tasks.
### Changes
**Enhanced Local Table Detection**
- Updated `IsLocalTableModification` and related checks to consider both
local and reference tables as “local” for planning, preventing the
0-task scenario.
- Expanded `ContainsOnlyLocalOrReferenceTables` to return true if there
are no fully distributed tables in the query.
**Added Regress Test**
- Introduced a new regress test (`issue_7891.sql`) which reproduces the
scenario.
- Verifies we get a valid single- or local-task plan rather than a
0-task plan.
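A sketch of the scenario the test covers, with hypothetical names (not taken from issue_7891.sql):
```
-- Hypothetical sketch: UPDATE on a local table with a subquery on a reference table.
CREATE TABLE local_target (id int PRIMARY KEY, val int);
CREATE TABLE ref_table (id int PRIMARY KEY, val int);
SELECT create_reference_table('ref_table');

UPDATE local_target
SET val = (SELECT val FROM ref_table WHERE ref_table.id = local_target.id);
-- Before the fix this could be planned as a 0-task (no-op) plan; it should be
-- planned as a local/single-task modification.
```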
DESCRIPTION: Ensure that a MERGE command on a distributed table with a
`WHEN NOT MATCHED BY SOURCE` clause runs against all shards of the
distributed table.
The Postgres MERGE command updates a table using a table or a query as a
data source. It provides three ways to match the target table with the
source: `WHEN MATCHED` means that there is a row in both the target and
source; `WHEN NOT MATCHED` means that there is a row in the source that
has no match (is not present) in the target; and, as of PG17, `WHEN NOT
MATCHED BY SOURCE` means that there is a row in the target that has no
match in the source.
In Citus, when a MERGE command updates a distributed table using a
local/reference table or a distributed query as source, that source is
repartitioned, and for each repartitioned shard that has data (i.e. 1 or
more rows) the MERGE is run against the corresponding distributed table
shard. Suppose the distributed table has 32 shards, and the source
repartitions into 4 shards that have data, with the remaining 28 shards
being empty; then the MERGE command is performed on the 4 corresponding
shards of the distributed table. However, the semantics of `WHEN NOT
MATCHED BY SOURCE` are that the specified action must be performed on
the target for each row in the target that is not in the source; so if
the source is empty, all target rows should be updated. To see this,
consider the following MERGE command:
```
MERGE INTO target AS t
USING source AS s ON t.id = s.id
WHEN NOT MATCHED BY SOURCE THEN UPDATE SET col1 = 100
```
If the source has zero rows then every row in the target is updated such
that its col1 value is 100. Currently in Citus, a MERGE on a distributed table
with a local/reference table or a distributed query as source ignores
shards of the distributed table when the corresponding shard of the
repartitioned source has zero rows. However, if the MERGE command
specifies a `WHEN NOT MATCHED BY SOURCE` clause, then the MERGE should
be performed on all shards of the distributed table, to ensure that the
specified action is performed on the target for each row in the target
that is not in the source. This PR enhances Citus MERGE execution so
that when a repartitioned source shard has zero rows, and the MERGE
command specifies a `WHEN NOT MATCHED BY SOURCE` clause, the MERGE is
performed against the corresponding shard of the distributed table using
an empty (zero row) relation as source, by generating a query of the
form:
```
MERGE INTO target_shard_0002 AS t
USING (SELECT id FROM (VALUES (NULL)) source_0002(id) WHERE FALSE) AS s ON t.id = s.id
WHEN NOT MATCHED BY SOURCE THEN UPDATE SET col1 = 100
```
This works because each row in the target shard will be updated, and
`WHEN MATCHED` and `WHEN NOT MATCHED`, if specified, will be no-ops
because the source has zero rows.
Implementing this when the source is a local or reference table involves
teaching `ExecuteSourceAtCoordAndRedistribution()` in `merge_executor.c` not to
prune tasks when the query has `WHEN NOT MATCHED BY SOURCE`, but instead to
replace the task's query with one that uses an empty relation as the source.
When the source is a distributed query,
`ExecuteMergeSourcePlanIntoColocatedIntermediateResults()` (also in
`merge_executor.c`), instead of skipping empty tasks, now generates a query
that uses an empty relation as the source for the corresponding target shard
of the distributed table, again only when the query has `WHEN NOT MATCHED BY
SOURCE`. A new function `BuildEmptyResultQuery()` is added to
`recursive_planning.c`; it is used by both of the aforementioned functions in
`merge_executor.c` to build an empty relation to use as the source. It applies
the appropriate type to each column of the empty relation so the join with the
target makes sense to the query compiler.
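For illustration, a minimal end-to-end sketch of the behavior this change enables, under assumed table names (the data and shard layout are arbitrary):
```
-- Hypothetical sketch: distributed target, empty reference table as source.
CREATE TABLE target (id int, col1 int);
SELECT create_distributed_table('target', 'id');
CREATE TABLE source (id int);
SELECT create_reference_table('source');

INSERT INTO target SELECT i, 0 FROM generate_series(1, 1000) i;

-- source is empty, so every target row falls under WHEN NOT MATCHED BY SOURCE;
-- with this change the MERGE runs against all shards of target, not only the
-- shards whose repartitioned source happens to contain rows.
MERGE INTO target t
USING source s ON t.id = s.id
WHEN NOT MATCHED BY SOURCE THEN UPDATE SET col1 = 100;
```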
DESCRIPTION: Fixes a crash in columnar custom scan that happens when a
columnar table is used in a join. Fixes issue #7647.
Co-authored-by: Ольга Сергеева <ob-sergeeva@it-serv.ru>
DESCRIPTION: Fixes a crash in left outer joins that can happen when
there is an aggregate on a column from the inner side of the join.
Fix the SEGV seen in #7787 and #7899; it occurs because a column in the
targetlist of a worker subquery can contain a non-empty varnullingrels
field if the column is from the inner side of a left outer join. The
issue can also occur with the columns in the HAVING clause, and this is
also tested in the fix. The issue was triggered by the introduction of
the varnullingrels field to Vars in Postgres 16 (2489d76c).
There is a related issue, #7705, where a non-empty varnullingrels was
incorrectly copied into the query tree for the combine query. Here, a
non-empty varnullingrels field of a var is incorrectly copied into the
query tree for a worker subquery.
The regress file from #7705 is used (and renamed) to also test this
(#7787). An alternative test output file is required for Postgres 15
because of an optimization to DISTINCT in Postgres 16 (1349d2790bf).
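A hedged sketch of the shape of query that triggers the crash (hypothetical distributed tables; the actual repro lives in the renamed regress file):
```
-- Aggregate on a column from the inner (nullable) side of a left outer join;
-- in PG16+ that Var carries a non-empty varnullingrels, which must not be
-- copied into the worker subquery's target list or HAVING clause.
SELECT t1.id, count(t2.val)
FROM dist_t1 t1
LEFT OUTER JOIN dist_t2 t2 ON t1.id = t2.id
GROUP BY t1.id
HAVING count(t2.val) > 0;
```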
The test added in #7604 doesn't reach the `HasRangeTableRef` function
and thus doesn't test what it should.
Co-authored-by: Karina Litskevich <litskevichkarina@gmail.com>
DESCRIPTION: Drops PG14 support
1. Remove "$version_num" != 'xx' from configure file
2. delete all PG_VERSION_NUM = PG_VERSION_XX references in the code
3. Look at pg_version_compat.h file, remove all _compat functions etc
defined specifically for PGXX differences
4. delete all PG_VERSION_NUM >= PG_VERSION_(XX+1), PG_VERSION_NUM <
PG_VERSION_(XX+1) ifs in the codebase
5. delete ruleutils_xx.c file
6. cleanup normalize.sed file from pg14 specific lines
7. delete all alternative output files for that particular PG version,
server_version_ge variable helps here
When testing rolling Citus upgrades, the coordinator should not be upgraded
until all the workers are upgraded.
---------
Co-authored-by: Jelte Fennema-Nio <github-tech@jeltef.nl>
(cherry picked from commit 27ac44eb2a)
Co-authored-by: Onur Tirtir <onurcantirtir@gmail.com>
As of this commit, after recovering the remote transactions, we release the lock
on pg_dist_transaction when closing the table, to avoid deadlocks that might occur
because of trying to acquire a lock on pg_dist_authinfo while holding a lock on
pg_dist_transaction. Such a scenario can only cause a deadlock if another transaction
is trying to acquire a strong lock on pg_dist_transaction while holding a lock on
pg_dist_authinfo. As of today, we (implicitly) acquire a strong lock on
pg_dist_transaction only when upgrading Citus to 11.3-1, which happens when creating
a REPLICA IDENTITY on pg_dist_transaction.
Regardless of the code-path we are in, it should be okay to release the lock there,
because all we do after that point is abort the prepared transactions that are not
part of an in-progress distributed transaction, and releasing the lock before doing
so is fine.
This also changes the blocking behavior between citus_create_restore_point and the
transaction recovery code-path, in the sense that now citus_create_restore_point doesn't
wait until transaction recovery finishes aborting the prepared transactions that are not
part of an in-progress distributed transaction. However, this should be fine because the
same situation was already possible before this change, e.g., if transaction recovery
fails to open a remote connection to a node.
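As a sketch of the two code-paths whose blocking behavior changes (both functions are existing Citus UDFs; the interleaving shown is illustrative):
```
-- Session 1: transaction recovery, normally driven by the maintenance daemon.
SELECT recover_prepared_transactions();

-- Session 2: can now proceed while session 1 is still aborting leftover
-- prepared transactions, because the lock on pg_dist_transaction is released
-- earlier than before.
SELECT citus_create_restore_point('my_restore_point');
```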
This pull request addresses Issue #7846, where specific MERGE queries on
non-distributed and distributed tables can result in crashes in certain
scenarios. The issue stems from the usage of the `pg_class` catalog table
and the `FilterShardsFromPgclass` function in Citus. This function goes
through the query's jointree to hide the shards. However, in PG17,
MERGE's join quals are in a separate structure called
`mergeJoinCondition`. Therefore, `FilterShardsFromPgclass` was not
filtering shards correctly for a `MERGE` command that involves `pg_class`.
To fix the issue, we handle `mergeJoinCondition` separately in PG17.
Relevant PG commit:
0294df2f1f
**Non-Distributed Tables:**
A MERGE query involving a non-distributed table using
`pg_catalog.pg_class` as the source may execute successfully but needs
testing to ensure stability.
**Distributed Tables:**
Performing a MERGE on a distributed table using `pg_catalog.pg_class` as
the source raises an error:
`ERROR: MERGE INTO a distributed table from Postgres table is not yet
supported`
However, in some cases, this can lead to a server crash if the
unsupported operation is not properly handled.
This is the output of the same test prior to the code changes.
```
-- Issue #7846: Test crash scenarios with MERGE on non-distributed and distributed tables
-- Step 1: Connect to a worker node to verify shard visibility
\c postgresql://postgres@localhost::worker_1_port/regression?application_name=psql
SET search_path TO pg17;
-- Step 2: Create and test a non-distributed table
CREATE TABLE non_dist_table_12345 (id INTEGER);
-- Test MERGE on the non-distributed table
MERGE INTO non_dist_table_12345 AS target_0
USING pg_catalog.pg_class AS ref_0
ON target_0.id = ref_0.relpages
WHEN NOT MATCHED THEN DO NOTHING;
SSL SYSCALL error: EOF detected
connection to server was lost
```
Regress test tdigest_aggregate_support has been failing since at least
Citus 12.0, when the tdigest extension is installed in Postgres. This
appears to be because of an omission in commit 03832f3 and a change in
the implementation of the Postgres random() function (pg commit
[d4f109e4a](https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=d4f109e4a)).
To reproduce the test diff:
- Checkout [tdigest](https://github.com/tvondra/tdigest) and run `make;
make install`
- In citus regress directory run `make check-multi` or
`./citus_tests/run_test.py tdigest_aggregate_support`
There are two parts to this commit:
1. Revert `Output: xxxxx` in EXPLAIN VERBOSE. Citus commit fe4ac51
normalized EXPLAIN VERBOSE output because of a change between pg12 and
pg13. When pg12 support was no longer required, the rule was removed
from normalize.sed and `Output: xxxx` was reverted in the impacted
regress output files (03832f3), but `tdigest_aggregate_support` was
omitted.
2. Adjust the query results; the tdigest_aggregate_support test file has
a comment _verifying results - should be stable due to seed while
inserting the data, if failure due to data these queries could be
removed or check for certain ranges_. The result values in this commit
are consistent across Citus 12.0 (PG 15), Citus 12.1 (PG 16) and Citus
13.0 (PG 17) (that is, across all versions since Postgres changed their
[implementation of
random](https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=d4f109e4a)),
so the proposal is to go with these results.
Propagates SECURITY LABEL ON ROLE stmt (https://github.com/citusdata/citus/pull/7304)
We propagate `SECURITY LABEL [for provider] ON ROLE rolename IS
labelname` to the worker nodes.
We also make sure to run the relevant `SecLabelStmt` commands on a
newly added node by looking at roles found in `pg_shseclabel`.
See official docs for explanation on how this command works:
https://www.postgresql.org/docs/current/sql-security-label.html
This command stores the role label in the `pg_shseclabel` catalog table.
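For reference, the statement being propagated and the catalog it lands in (provider, role and label names below are hypothetical):
```
-- Hypothetical provider/role/label names; this statement is now propagated to
-- the worker nodes.
SECURITY LABEL FOR my_label_provider ON ROLE my_role IS 'my_label';

-- The label is stored in the shared catalog that newly added nodes are synced from.
SELECT provider, label
FROM pg_shseclabel
WHERE objoid = 'my_role'::regrole;
```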
This commit also fixes the regex string in
`check_gucs_are_alphabetically_sorted.sh` script such that it escapes
the dot. Previously it was looking for all strings starting with "citus"
instead of "citus." as it should.
To test this feature, I currently make use of a special GUC to control
label provider registration in PG_init when creating the Citus extension.
(cherry picked from commit 0d1f18862b)
Co-authored-by: Naisila Puka <37271756+naisila@users.noreply.github.com>
(cherry picked from commit 686d2b46ca)
DESCRIPTION: Shard moves/isolate report LSNs in LSN format
While investigating an issue with our catchup mechanism on certain
Postgres versions we noticed we print LSNs in the format of the native
long type. This is an uncommon representation for LSNs in Postgres
logs.
This patch changes the output of our log message from the long
type representation to the native LSN type representation, making it
easier for Postgres users to recognize and compare LSNs with other
related reports.
example of new output:
```
2023-09-25 17:28:47.544 CEST [11345] LOG: The LSN of the target subscriptions on node localhost:9701 have increased from 0/0 to 0/E1ED20F8 at 2023-09-25 17:28:47.544165+02 where the source LSN is 1/415DCAD0
```
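A small sketch showing the same LSN in both representations (the value is taken from the example above; the query is only illustrative):
```
-- pg_lsn rendering vs. the plain byte offset the old log line printed.
SELECT '1/415DCAD0'::pg_lsn AS lsn_format,
       '1/415DCAD0'::pg_lsn - '0/0'::pg_lsn AS byte_offset;
```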
(cherry picked from commit b87fbcbf79)
DESCRIPTION: Fixes a crash that happens because of unsafe catalog access
when re-assigning the global pid after application_name changes.
When application_name changes, we don't actually need to
try re-assigning the global pid for external client backends because
application_name doesn't affect the global pid for such backends. Plus,
trying to re-assign the global pid for external client backends would
unnecessarily cause a catalog access when the cached local
node id is invalidated. However, accessing the catalog tables is
dangerous in certain situations, like when we're not in a transaction
block. For the other types of backends, i.e., the Citus internal
backends, we do need to re-assign the global pid when the application_name
changes, because for such backends we simply extract the global pid
inherited from the originating backend from the application_name (which is
specified by the originating backend when opening that connection), and this
doesn't require catalog access.
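A minimal sketch of the external-client case (citus_backend_gpid() is an existing Citus UDF; the application_name value is arbitrary):
```
-- For an external client backend, changing application_name must not change
-- the global pid, so there is no reason to re-derive it (and no catalog
-- access is needed).
SELECT citus_backend_gpid();
SET application_name TO 'my_external_app';
SELECT citus_backend_gpid();  -- same value as before
```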
(cherry picked from commit 73411915a4)
Bump PG versions to the latest minors 14.15, 15.10, 16.6
There is a libpq symlink issue when the images are built remotely
https://github.com/citusdata/citus/actions/runs/12583502447/job/35071296238
Hence, we use the commit SHA of a locally built and pushed version of the images.
This is temporary, until we find the underlying cause of the symlink
issue.
---------
Co-authored-by: Onur Tirtir <onurcantirtir@gmail.com>
In the function MasterAggregateMutator(), when the original node is a Var node, use makeVar()
instead of copyObject() when constructing the Var node for the target list of the combine query.
The varnullingrels field of the original Var node is ignored because it is not relevant for the
combine query; copying it caused the problem in issue #7705, where a coordinator query had
a Var with a reference to a non-existent join relation.
(cherry picked from commit c52f36019f)
When multiple sessions concurrently attempt to add the same coordinator
node using `citus_set_coordinator_host`, there is a potential race
condition. Both sessions may pass the initial metadata check
(`isCoordinatorInMetadata`), but only one will succeed in adding the
node. The other session will fail with an assertion error
(`Assert(!nodeAlreadyExists)`), causing the server to crash. Even though
the `AddNodeMetadata` function takes an exclusive lock, it appears that
the lock is not preventing the race condition before the initial
metadata check.
- **Issue**: The current logic allows concurrent sessions to pass the
check for existing coordinators, leading to an attempt to insert
duplicate nodes, which triggers the assertion failure.
- **Impact**: This race condition leads to crashes during operations
that involve concurrent coordinator additions, as seen in
https://github.com/citusdata/citus/issues/7646.
**Test Plan:**
- Isolation Test Limitation: An isolation test was added to simulate
concurrent additions of the same coordinator node, but due to the
behavior of PostgreSQL locking mechanisms, the test does not trigger the
edge case. The lock applied within the function serializes the
operations, preventing the race condition from occurring in the
isolation test environment.
While the edge case is difficult to reproduce in an isolation test, the
fix addresses the core issue by ensuring concurrency control through
proper locking.
- Existing Tests: All existing tests related to node metadata and
coordinator management have been run to ensure that no regressions were
introduced.
**After the Fix:**
- Concurrent attempts to add the same coordinator node will be
serialized. One session will succeed in adding the node, while the
others will skip the operation without crashing the server.
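A sketch of the racing scenario (hostname and port are hypothetical; citus_set_coordinator_host is the real UDF):
```
-- Two sessions racing to register the same coordinator node.
-- Session 1:
SELECT citus_set_coordinator_host('coordinator-host', 5432);

-- Session 2, concurrently:
SELECT citus_set_coordinator_host('coordinator-host', 5432);
-- Before the fix the losing session could hit Assert(!nodeAlreadyExists) and
-- crash the server; after the fix it observes that the node already exists
-- and skips the insert.
```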
Co-authored-by: Mehmet YILMAZ <mehmet.yilmaz@microsoft.com>
(cherry picked from commit 4775715691)