citus

Commit Graph

Author	SHA1	Message	Date
Mehmet Yilmaz	5fe3aa296e	multi_cluster_management errors: ambiguous collation in LIKE	2025-06-13 14:27:18 +00:00
Muhammad Usama	95da74c47f	Fix Deadlock with transaction recovery is possible during Citus upgrades (#7910 ) DESCRIPTION: Fixes deadlock with transaction recovery that is possible during Citus upgrades. Fixes #7875. This commit addresses two interrelated deadlock issues uncovered during Citus upgrades: 1. Local Deadlock: - Problem: In `RecoverWorkerTransactions()`, a new connection is created for each worker node to perform transaction recovery by locking the `pg_dist_transaction` catalog table until the end of the transaction. When `RecoverTwoPhaseCommits()` calls this function for each worker node, the order of acquiring locks on `pg_dist_authinfo` and `pg_dist_transaction` can alternate. This reversal can lead to a deadlock if any concurrent process requires locks on these tables. - Fix: Pre-establish all worker node connections upfront so that `RecoverWorkerTransactions()` operates with a single, consistent connection. This ensures that locks on `pg_dist_authinfo` and `pg_dist_transaction` are always acquired in the correct order, thereby preventing the local deadlock. 2. Distributed Deadlock: - Problem: After resolving the local deadlock, a distributed deadlock issue emerges. The maintenance daemon calls `RecoverWorkerTransactions()` on each worker node— including the local node—which leads to a complex locking sequence: - A RowExclusiveLock is taken on the `pg_dist_transaction` table in `RecoverWorkerTransactions()`. - An update extension then tries to acquire an AccessExclusiveLock on the same table, getting blocked by the RowExclusiveLock. - A subsequent query (e.g., a SELECT on `pg_prepared_xacts`) issued using a separate connection on the local node gets blocked due to locks held during a call to `BuildCitusTableCacheEntry()`. - The maintenance daemon waits for this query, resulting in a circular wait and stalling the entire cluster. - Fix: Avoid cache lookups for internal PostgreSQL tables by implementing an early bailout for relation IDs below `FirstNormalObjectId` (system objects). This eliminates unnecessary calls to `BuildCitusTableCache`, reducing lock contention and mitigating the distributed deadlock. Furthermore, this optimization improves performance in fast connect→query_catalog→disconnect cycles by eliminating redundant cache creation and lookups. 3. Also reverts the commit that disabled the relevant test cases.	2025-03-12 12:43:01 +03:00
Onur Tirtir	7004295065	Revert "Release RowExclusiveLock on pg_dist_transaction as soon as remote xacts are recovered" This reverts commit `684b4c6b96`.	2025-03-12 12:43:01 +03:00
Onur Tirtir	d5618b6b4c	Release RowExclusiveLock on pg_dist_transaction as soon as remote xacts are recovered As of this commit, after recovering the remote transactions, now we release the lock on pg_dist_transaction while closing it to avoid deadlocks that might occur because of trying to acquire a lock on pg_dist_authinfo while holding a lock on pg_dist_transaction. Such a scenario can only cause a deadlock if another transaction is trying to acquire a strong lock on pg_dist_transaction while holding a lock on pg_dist_authinfo. As of today, we (implicitly) acquire a strong lock on pg_dist_transaction only when upgrading Citus to 11.3-1 and this happens when creating a REPLICA IDENTITY on pg_dist_transaction. And regardless of the code-path we are in, it should be okay to release the lock there because all we do after that point is to abort the prepared transactions that are not part of an in-progress distributed transaction and releasing the lock before doing so should be just fine. This also changes the blocking behavior between citus_create_restore_point and the transaction recovery code-path in the sense that now citus_create_restore_point doesn't until transaction recovery completes aborting the prepared transactions that are not part of an in-progress distributed transaction. However, this should be fine because even before this was possible, e.g., if transaction recovery fails to open a remote connection to a node.	2025-03-12 12:43:01 +03:00
Naisila Puka	6bd3474804	Rename foreach_ macros to foreach_declared_ macros (#7700 ) This is prep work for successful compilation with PG17 PG17added foreach_ptr, foreach_int and foreach_oid macros Relevant PG commit 14dd0f27d7cd56ffae9ecdbe324965073d01a9ff `14dd0f27d7` We already have these macros, but they are different with the PG17 ones because our macros take a DECLARED variable, whereas the PG16 macros declare a locally-scoped loop variable themselves. Hence I am renaming our macros to foreach_declared_ I am separating this into its own PR since it touches many files. The main compilation PR is https://github.com/citusdata/citus/pull/7699	2025-03-12 11:01:49 +03:00
Onur Tirtir	0acb5f6e86	Fix assertion failure in maintenance daemon during Citus upgrades (#7537 ) Fixes https://github.com/citusdata/citus/issues/7536. Note to reviewer: Before this commit, the following results in an assertion failure when executed locally and this won't be the case anymore: ```console make -C src/test/regress/ check-citus-upgrade-local citus-old-version=v10.2.0 ``` Note that this doesn't happen on CI as we don't enable assertions there. --------- Co-authored-by: Jelte Fennema-Nio <jelte.fennema@microsoft.com>	2024-03-20 00:10:12 +00:00
Halil Ozan Akgül	b877d606c7	Adds 2PC distributed commands from other databases (#7203 ) DESCRIPTION: Adds support for 2PC from non-Citus main databases This PR only adds support for `CREATE USER` queries, other queries need to be added. But it should be simple because this PR creates the underlying structure. Citus main database is the database where the Citus extension is created. A non-main database is all the other databases that are in the same node with a Citus main database. When a `CREATE USER` query is run on a non-main database we: 1. Run `start_management_transaction` on the main database. This function saves the outer transaction's xid (the non-main database query's transaction id) and marks the current query as main db command. 2. Run `execute_command_on_remote_nodes_as_user("CREATE USER <username>", <username to run the command>)` on the main database. This function creates the users in the rest of the cluster by running the query on the other nodes. The user on the current node is created by the query on the outer, non-main db, query to make sure consequent commands in the same transaction can see this user. 3. Run `mark_object_distributed` on the main database. This function adds the user to `pg_dist_object` in all of the nodes, including the current one. This PR also implements transaction recovery for the queries from non-main databases.	2023-12-22 19:19:41 +03:00
Nils Dijk	0620c8f9a6	Sort includes (#7326 ) This change adds a script to programatically group all includes in a specific order. The script was used as a one time invocation to group and sort all includes throught our formatted code. The grouping is as follows: - System includes (eg. `#include<...>`) - Postgres.h (eg. `#include "postgres.h"`) - Toplevel imports from postgres, not contained in a directory (eg. `#include "miscadmin.h"`) - General postgres includes (eg . `#include "nodes/..."`) - Toplevel citus includes, not contained in a directory (eg. `#include "citus_verion.h"`) - Columnar includes (eg. `#include "columnar/..."`) - Distributed includes (eg. `#include "distributed/..."`) Because it is quite hard to understand the difference between toplevel citus includes and toplevel postgres includes it hardcodes the list of toplevel citus includes. In the same manner it assumes anything not prefixed with `columnar/` or `distributed/` as a postgres include. The sorting/grouping is enforced by CI. Since we do so with our own script there are not changes required in our uncrustify configuration.	2023-11-23 18:19:54 +01:00
Nils Dijk	0dac63afc0	move pg_version_constants.h to toplevel include (#7335 ) In preparation of sorting and grouping all includes we wanted to move this file to the toplevel includes for good grouping/sorting.	2023-11-09 15:09:39 +00:00
Ivan Vyazmitinov	e94bf93152	#6548 2PC recovery is extremely ineffective on a cluster with multiple DATABASEs fix (#7174 )	2023-09-04 15:28:22 +02:00
Onder Kalaci	149771792b	Remove useless version compats most likely leftover from earlier versions	2022-07-29 10:31:55 +02:00
Hanefi Onaldi	76176caea7	Fix typo s/exlusive/exclusive/	2021-12-23 01:35:01 +03:00
SaitTalhaNisanci	03832f353c	Drop postgres 11 support	2021-03-25 09:20:28 +03:00
Ahmet Gedemenli	936775e8e3	Delete transactions when removing node With this commit, we delete entries in pg_dist_transaction for the primary nodes that are removed by `master_remove_node`.	2020-12-07 11:35:20 +03:00
SaitTalhaNisanci	e7cd1ed0ee	Not take ShareUpdateExlusiveLock on pg_dist_transaction (#4184 ) * Not take ShareUpdateExlusiveLock on pg_dist_transaction We were taking ShareUpdateExlusiveLock on pg_dist_transaction during recovery to prevent multiple recoveries happening concurrenly. VACUUM( not FULL) also takes ShareUpdateExclusiveLock, and they can conflict. It seems that VACUUM will skip the table if there is a conflicting lock already taken unless it is doing the vacuum to prevent id wraparound, in which case there can be a deadlock. I guess the deadlock happens if: - VACUUM takes a lock on pg_dist_transaction and is done for id wraparound problem - The transaction in the maintenance tries to take a lock but cannot as that conflicts with the lock acquired by VACUUM - The transaction in the maintenance daemon has a very old xid hence VACUUM cannot proceed. If we take a row exclusive lock in transaction recovery then it wouldn't conflict with VACUUM hence it could proceed so the deadlock would be resolved. To prevent concurrent transaction recoveries happening, an advisory lock is taken with ShareUpdateExlusiveLock as before. * Use CITUS_OPERATIONS tag	2020-09-21 15:20:38 +03:00
Sait Talha Nisanci	b641f63bfd	Use CMDTAG_SELECT_COMPAT CMDTAG_SELECT exists in PG12 hence defining a MACRO such as CMDTAG_SELECT -> "SELECT" is not possible. I chose CMDTAG_SELECT_COMPAT because with the COMPAT suffix it is explicit that it maps to different things in different versions and also has a less chance of mapping something irrevelant. For example if we used SELECT as a macro, then it would map every SELECT to whatever it is mapping to, which might have unexpected/undesired behaviour.	2020-08-04 15:38:13 +03:00
Sait Talha Nisanci	bf831d2e59	Use table_openXXX methods in the codebase With PG13 heap_* (heap_open, heap_close etc) are replaced with table_* (table_open, table_close etc). It is better to use the new table access methods in the codebase and define the macros for the previous versions as we can easily remove the macro without having to change the codebase when we drop the support for the old version. Commits that introduced this change on Postgres: f25968c49697db673f6cd2a07b3f7626779f1827 e0c4ec07284db817e1f8d9adfb3fffc952252db0 4b21acf522d751ba5b6679df391d5121b6c4a35f Command to see relevant commits on Postgres side: git log --all --grep="heap_open"	2020-08-04 15:10:22 +03:00
SaitTalhaNisanci	ba01f3457a	use macros for pg versions instead of hardcoded values (#3694 ) 3 Macros are defined for removing the hardcoded pg versions. PG_VERSION_11, PG_VERSION_12 and PG_VERSION_13.	2020-04-01 17:01:52 +03:00
Onur Tirtir	88bfd2e4b7	refactor around local group id checks Mostyl optimizes the calls made to GetLocalGroupId and refactors its usages	2020-03-05 20:20:41 +03:00
Philip Dubé	20abc4d2b5	Replace foreach with foreach_ptr/foreach_oid (#3544 )	2020-02-27 16:54:49 +01:00
Philip Dubé	3a906b8210	Fix typos noticed while reading through code trying to understand HAVING	2020-02-11 19:55:10 +00:00
Jelte Fennema	1d8dde232f	Automatically convert useless declarations using regex replace (#3181 ) * Add declaration removal to CI * Convert declarations	2019-11-21 13:47:29 +01:00
Philip Dubé	e8ecbbfcb3	Escape transaction names	2019-11-08 21:23:01 +00:00
SaitTalhaNisanci	94a7e6475c	Remove copyright years (#2918 ) * Update year as 2012-2019 * Remove copyright years	2019-10-15 17:44:30 +03:00
Philip Dubé	492d1b2cba	ActivePrimaryNodeList: add lockMode parameter	2019-09-13 17:44:56 +00:00
Philip Dubé	68c4b71f93	Fix up includes with pg12 changes	2019-08-22 18:56:21 +00:00
Marco Slot	bb3a96eacb	Cache a configurable number of connections at xact end	2019-05-29 13:24:31 +02:00
Hadi Moshayedi	f4d3b94e22	Fix some of the casts for groupId (#2609 ) A small change which partially addresses #2608.	2019-03-05 12:06:44 -08:00
Onder Kalaci	bf28dd0cff	Do not recover wrong distributed transactions in MX	2018-09-07 09:52:46 +03:00
Murat Tuncer	a6fe5ca183	PG11 compatibility update - changes in ruleutils_11.c is reflected - vacuum statement api change is handled. We now allow multi-table vacuum commands. - some other function header changes are reflected - api conflicts between PG11 and earlier versions are handled by adding shims in version_compat.h - various regression tests are fixed due output and functionality in PG1 - no change is made to support new features in PG11 they need to be handled by new commit	2018-04-26 11:29:43 +03:00
Marco Slot	8486f76e15	Auto-recover 2PC transactions	2017-11-22 11:26:58 +01:00
Marco Slot	9793218122	Do not commit already-committed prepared transactions in recovery	2017-11-20 13:18:48 +01:00
Marco Slot	ae47df01ea	Observe prepared xacts twice in RecoverWorkerTransactions to avoid race condition	2017-11-20 11:44:08 +01:00
Marco Slot	2410c2e450	Rewrite recover_prepared_transactions to be fast, non-blocking	2017-11-20 11:27:40 +01:00
Marco Slot	9e516513fc	Use local group ID when querying for prepared transactions	2017-10-03 16:36:53 +02:00
Marco Slot	da6b42a3e2	Use unique constraint index for transaction record deletion	2017-09-28 12:04:56 +02:00
Marco Slot	3d7f79127d	Do not release locks in LogTransactionRecord	2017-07-24 20:44:38 +02:00
Brian Cloutier	ec99f8f983	Add nodeRole column - master_add_node enforces that there is only one primary per group - there's also a trigger on pg_dist_node to prevent multiple primaries per group - functions in metadata cache only return primary nodes - Rename ActiveWorkerNodeList -> ActivePrimaryNodeList - Rename WorkerGetLive{Node->Group}Count() - Refactor WorkerGetRandomCandidateNode - master_remove_node only complains about active shard placements if the node being removed is a primary. - master_remove_node only deletes all reference table placements in the group if the node being removed is the primary. - Rename {Node->NodeGroup}HasShardPlacements, this reflects the behavior it already had. - Rename DeleteAllReferenceTablePlacementsFrom{Node->NodeGroup}. This also reflects the behavior it already had, but the new signature forces the caller to pass in a groupId - Rename {WorkerGetLiveGroup->ActivePrimaryNode}Count	2017-07-24 11:57:46 +03:00
Jason Petersen	2204da19f0	Support PostgreSQL 10 (#1379 ) Adds support for PostgreSQL 10 by copying in the requisite ruleutils and updating all API usages to conform with changes in PostgreSQL 10. Most changes are fairly minor but they are numerous. One particular obstacle was the change in \d behavior in PostgreSQL 10's psql; I had to add SQL implementations (views, mostly) to mimic the pre-10 output.	2017-06-26 02:35:46 -06:00
Burak Yucesoy	9fb15c439c	Add version checks to necessary UDFs	2017-05-22 09:53:29 +03:00
Burak Yucesoy	e9095e62ec	Decouple reference table replication With this change we add an option to add a node without replicating all reference tables to that node. If a node is added with this option, we mark the node as inactive and no queries will sent to that node. We also added two new UDFs; - master_activate_node(host, port): - marks node as active and replicates all reference tables to that node - master_add_inactive_node(host, port): - only adds node to pg_dist_node	2017-04-17 13:33:31 +03:00
Brian Cloutier	b1b2b4fadf	Create ExecuteOptionalRemoteCommand A small refactor which pulls some code out of `RecoverWorkerTransactions` and into `remote_commands.c`. This code block currently only occurs in `RecoverWorkerTransactions` but will be useful to other functions shortly. Unfortunately we couldn't call it `ExecuteRemoteCommand`, that name was already taken.	2017-01-17 17:04:37 +02:00
Marco Slot	31231ce196	Use GetNodeConnection to establish a connection in transaction recovery	2017-01-10 02:44:34 +01:00
Andres Freund	d256f3fca9	Remove unused LogPreparedTransactions() function. This is unused since `92c7567008`.	2017-01-06 09:15:01 -08:00
Marco Slot	6cbc1945f9	Enable transaction recovery in connection API	2016-12-23 16:14:29 +01:00
Eren Basak	cee7b54e7c	Add worker transaction and transaction recovery infrastructure	2016-10-18 14:18:14 +03:00

46 Commits (5fe3aa296e67da1313ea7971d18abf4137b953ef)