citus

Commit Graph

Author	SHA1	Message	Date
naisila	17b4122e84	Rename some more foreach_ptr to foreach_declared_ptr	2025-03-13 15:13:56 +03:00
Muhammad Usama	95da74c47f	Fix Deadlock with transaction recovery is possible during Citus upgrades (#7910 ) DESCRIPTION: Fixes deadlock with transaction recovery that is possible during Citus upgrades. Fixes #7875. This commit addresses two interrelated deadlock issues uncovered during Citus upgrades: 1. Local Deadlock: - Problem: In `RecoverWorkerTransactions()`, a new connection is created for each worker node to perform transaction recovery by locking the `pg_dist_transaction` catalog table until the end of the transaction. When `RecoverTwoPhaseCommits()` calls this function for each worker node, the order of acquiring locks on `pg_dist_authinfo` and `pg_dist_transaction` can alternate. This reversal can lead to a deadlock if any concurrent process requires locks on these tables. - Fix: Pre-establish all worker node connections upfront so that `RecoverWorkerTransactions()` operates with a single, consistent connection. This ensures that locks on `pg_dist_authinfo` and `pg_dist_transaction` are always acquired in the correct order, thereby preventing the local deadlock. 2. Distributed Deadlock: - Problem: After resolving the local deadlock, a distributed deadlock issue emerges. The maintenance daemon calls `RecoverWorkerTransactions()` on each worker node— including the local node—which leads to a complex locking sequence: - A RowExclusiveLock is taken on the `pg_dist_transaction` table in `RecoverWorkerTransactions()`. - An update extension then tries to acquire an AccessExclusiveLock on the same table, getting blocked by the RowExclusiveLock. - A subsequent query (e.g., a SELECT on `pg_prepared_xacts`) issued using a separate connection on the local node gets blocked due to locks held during a call to `BuildCitusTableCacheEntry()`. - The maintenance daemon waits for this query, resulting in a circular wait and stalling the entire cluster. 
- Fix: Avoid cache lookups for internal PostgreSQL tables by implementing an early bailout for relation IDs below `FirstNormalObjectId` (system objects). This eliminates unnecessary calls to `BuildCitusTableCache`, reducing lock contention and mitigating the distributed deadlock. Furthermore, this optimization improves performance in fast connect→query_catalog→disconnect cycles by eliminating redundant cache creation and lookups. 3. Also reverts the commit that disabled the relevant test cases.	2025-03-12 12:43:01 +03:00
Onur Tirtir	7004295065	Revert "Release RowExclusiveLock on pg_dist_transaction as soon as remote xacts are recovered" This reverts commit `684b4c6b96`.	2025-03-12 12:43:01 +03:00
Naisila Puka	3b1c082791	Drops PG14 support (#7753 ) DESCRIPTION: Drops PG14 support 1. Remove "$version_num" != 'xx' from configure file 2. delete all PG_VERSION_NUM = PG_VERSION_XX references in the code 3. Look at pg_version_compat.h file, remove all _compat functions etc defined specifically for PGXX differences 4. delete all PG_VERSION_NUM >= PG_VERSION_(XX+1), PG_VERSION_NUM < PG_VERSION_(XX+1) ifs in the codebase 5. delete ruleutils_xx.c file 6. cleanup normalize.sed file from pg14 specific lines 7. delete all alternative output files for that particular PG version, server_version_ge variable helps here	2025-03-12 12:43:01 +03:00
Onur Tirtir	d5618b6b4c	Release RowExclusiveLock on pg_dist_transaction as soon as remote xacts are recovered As of this commit, after recovering the remote transactions, we now release the lock on pg_dist_transaction while closing it to avoid deadlocks that might occur because of trying to acquire a lock on pg_dist_authinfo while holding a lock on pg_dist_transaction. Such a scenario can only cause a deadlock if another transaction is trying to acquire a strong lock on pg_dist_transaction while holding a lock on pg_dist_authinfo. As of today, we (implicitly) acquire a strong lock on pg_dist_transaction only when upgrading Citus to 11.3-1 and this happens when creating a REPLICA IDENTITY on pg_dist_transaction. And regardless of the code-path we are in, it should be okay to release the lock there because all we do after that point is to abort the prepared transactions that are not part of an in-progress distributed transaction and releasing the lock before doing so should be just fine. This also changes the blocking behavior between citus_create_restore_point and the transaction recovery code-path in the sense that now citus_create_restore_point doesn't wait until transaction recovery completes aborting the prepared transactions that are not part of an in-progress distributed transaction. However, this should be fine because even before this was possible, e.g., if transaction recovery fails to open a remote connection to a node.	2025-03-12 12:43:01 +03:00
Naisila Puka	dce54db494	PG17 compatibility: Resolve compilation issues (#7699 ) This PR provides successful compilation against PG17.0. - Remove ExecFreeExprContext call Relevant PG commit d060e921ea5aa47b6265174c32e1128cebdbc3df `d060e921ea` - PG17 uses streaming IO in analyze, fix scan_analyze_next_block function Relevant PG commit 041b96802efa33d2bc9456f2ad946976b92b5ae1 `041b96802e` - Define ObjectClass for PG17+ only since it's removed Relevant PG commit: 89e5ef7e21812916c9cf9fcf56e45f0f74034656 `89e5ef7e21` - Remove ReorderBufferTupleBuf structure. Relevant PG commit: 08e6344fd6423210b339e92c069bb979ba4e7cd6 `08e6344fd6` - Define colliculocale and daticulocale since they have been renamed Relevant PG commit: f696c0cd5f299f1b51e214efc55a22a782cc175d `f696c0cd5f` - makeStringConst defined in PG17 Relevant PG commit: de3600452b61d1bc3967e9e37e86db8956c8f577 `de3600452b` - RangeVarCallbackOwnsTable was replaced by RangeVarCallbackMaintainsTable Relevant PG commit: ecb0fd33720fab91df1207e85704f382f55e1eb7 `ecb0fd3372` - attstattarget is nullable, define pg compatible functions for it Relevant PG commit: 4f622503d6de975ac87448aea5cea7de4bc140d5 `4f622503d6` - stxstattarget is nullable in PG17, write compat functions for it Relevant PG commit: 012460ee93c304fbc7220e5b55d9d0577fc766ab `012460ee93` - Use ResourceOwner to track WaitEventSet in PG17 Relevant PG commit: 50c67c2019ab9ade8aa8768bfe604cd802fe8591 `50c67c2019` - getIdentitySequence now uses Relation instead of relation_id Relevant PG commit: 509199587df73f06eda898ae13284292f4ae573a `509199587d` - Remove no-op tuplestore_donestoring function Relevant PG commit: 75680c3d805e2323cd437ac567f0677fdfc7b680 `75680c3d80` - MergeAction can have 3 merge kinds (now enum) in PG17, write compat Relevant PG commit: 0294df2f1f842dfb0eed79007b21016f486a3c6c `0294df2f1f` - EXPLAIN (MEMORY) is added, make changes to ExplainOnePlan Relevant PG commit: 5de890e3610d5a12cdaea36413d967cf5c544e20 `5de890e361` - LIMIT_OPTION_DEFAULT 
has been removed as it's useless, use LIMIT_OPTION_COUNT Relevant PG commit: a6be0600ac3b71dda8277ab0fcbe59ee101ac1ce `a6be0600ac` - write compat for create_foreignscan_path because of more arguments in PG17 Relevant PG commit: 9e9931d2bf40e2fea447d779c2e133c2c1256ef3 `9e9931d2bf` - pgprocno and lxid have been combined into a struct in PGPROC Relevant PG commits: 28f3915b73f75bd1b50ba070f56b34241fe53fd1 `28f3915b73` ab355e3a88de745607f6dd4c21f0119b5c68f2ad `ab355e3a88` 024c521117579a6d356050ad3d78fdc95e44eefa `024c521117` - Simplify CitusNewNode (#7434) PostgreSQL refactored newNode() in PG 17; the main reason is that the original tricks are no longer necessary for modern compilers[1]. This does the same for Citus. This should have no backward compatibility issues since it just replaces palloc0fast with palloc0. This is good for forward compatibility since palloc0fast no longer exists in PG 17. [1] https://www.postgresql.org/message-id/b51f1fa7-7e6a-4ecc-936d-90a8a1659e7c@iki.fi (cherry picked from commit `4b295cc`)	2025-03-12 11:01:49 +03:00
Naisila Puka	6bd3474804	Rename foreach_ macros to foreach_declared_ macros (#7700 ) This is prep work for successful compilation with PG17. PG17 added foreach_ptr, foreach_int and foreach_oid macros. Relevant PG commit 14dd0f27d7cd56ffae9ecdbe324965073d01a9ff `14dd0f27d7` We already have these macros, but they differ from the PG17 ones because our macros take a DECLARED variable, whereas the PG17 macros declare a locally-scoped loop variable themselves. Hence I am renaming our macros to foreach_declared_. I am separating this into its own PR since it touches many files. The main compilation PR is https://github.com/citusdata/citus/pull/7699	2025-03-12 11:01:49 +03:00
Teja Mupparti	ab7c13beb5	For scenarios, such as, Bug 3697586: Server crashes when assigning distributed transaction: Raise an ERROR instead of a crash	2024-12-26 10:45:59 -08:00
Onur Tirtir	73411915a4	Avoid re-assigning the global pid for client backends and bg workers when the application_name changes (#7791 ) DESCRIPTION: Fixes a crash that happens because of unsafe catalog access when re-assigning the global pid after application_name changes. When application_name changes, we don't actually need to try re-assigning the global pid for external client backends because application_name doesn't affect the global pid for such backends. Plus, trying to re-assign the global pid for external client backends would unnecessarily cause a catalog access when the cached local node id is invalidated. However, accessing the catalog tables is dangerous in certain situations like when we're not in a transaction block. And for the other types of backends, i.e., the Citus internal backends, we need to re-assign the global pid when the application_name changes because for such backends we simply extract the global pid inherited from the originating backend from the application_name (which is specified by the originating backend when opening that connection) and this doesn't require catalog access.	2024-12-23 14:01:53 +00:00
Evgeny Nechayev	fcc72d8a23	Use macro wrapper to access PGPROC data, which allow to improve compa… (#7607 ) DESCRIPTION: Use macro wrapper to access PGPROC data, to improve compatibility with PostgreSQL forks.	2024-05-28 00:39:13 +00:00
Onur Tirtir	0acb5f6e86	Fix assertion failure in maintenance daemon during Citus upgrades (#7537 ) Fixes https://github.com/citusdata/citus/issues/7536. Note to reviewer: Before this commit, the following results in an assertion failure when executed locally and this won't be the case anymore: ```console make -C src/test/regress/ check-citus-upgrade-local citus-old-version=v10.2.0 ``` Note that this doesn't happen on CI as we don't enable assertions there. --------- Co-authored-by: Jelte Fennema-Nio <jelte.fennema@microsoft.com>	2024-03-20 00:10:12 +00:00
Halil Ozan Akgül	852bcc5483	Add support for create / drop database propagation from non-main databases (#7439 ) DESCRIPTION: Adds support for distributed `CREATE/DROP DATABASE ` commands from the databases where Citus is not installed --------- Co-authored-by: Onur Tirtir <onurcantirtir@gmail.com>	2024-02-21 10:44:01 +00:00
eaydingol	f01c5f2593	Move remaining citus_internal functions (#7478 ) Moves the following functions to the Citus internal schema: citus_internal_local_blocked_processes citus_internal_global_blocked_processes citus_internal_mark_node_not_synced citus_internal_unregister_tenant_schema_globally citus_internal_update_none_dist_table_metadata citus_internal_update_placement_metadata citus_internal_update_relation_colocation citus_internal_start_replication_origin_tracking citus_internal_stop_replication_origin_tracking citus_internal_is_replication_origin_tracking_active #7405 --------- Co-authored-by: Jelte Fennema-Nio <jelte.fennema@microsoft.com>	2024-02-07 16:58:17 +03:00
Halil Ozan Akgül	1cb2e1e4e8	Fixes create user queries from Citus non-main databases with other users (#7442 ) This PR makes the connections to other nodes for `mark_object_distributed` use the same user as `execute_command_on_remote_nodes_as_user` so they'll use the same connection.	2024-01-24 12:57:54 +03:00
Gürkan İndibay	c3579eef06	Adds REASSIGN OWNED BY propagation (#7319 ) DESCRIPTION: Adds REASSIGN OWNED BY propagation This pull request introduces the propagation of the "Reassign owned by" statement. It accommodates both local and distributed roles for both the old and new assignments. However, when the old role is a local role, it undergoes filtering and is not propagated. On the other hand, if the new role is a local role, the process involves first creating the role on worker nodes before propagating the "Reassign owned" statement.	2023-12-28 15:15:58 +03:00
Halil Ozan Akgül	b877d606c7	Adds 2PC distributed commands from other databases (#7203 ) DESCRIPTION: Adds support for 2PC from non-Citus main databases This PR only adds support for `CREATE USER` queries, other queries need to be added. But it should be simple because this PR creates the underlying structure. Citus main database is the database where the Citus extension is created. A non-main database is any other database on the same node as a Citus main database. When a `CREATE USER` query is run on a non-main database we: 1. Run `start_management_transaction` on the main database. This function saves the outer transaction's xid (the non-main database query's transaction id) and marks the current query as a main db command. 2. Run `execute_command_on_remote_nodes_as_user("CREATE USER <username>", <username to run the command>)` on the main database. This function creates the users in the rest of the cluster by running the query on the other nodes. The user on the current node is created by the outer, non-main db query to make sure subsequent commands in the same transaction can see this user. 3. Run `mark_object_distributed` on the main database. This function adds the user to `pg_dist_object` in all of the nodes, including the current one. This PR also implements transaction recovery for the queries from non-main databases.	2023-12-22 19:19:41 +03:00
Nils Dijk	0620c8f9a6	Sort includes (#7326 ) This change adds a script to programmatically group all includes in a specific order. The script was used as a one time invocation to group and sort all includes throughout our formatted code. The grouping is as follows: - System includes (eg. `#include<...>`) - Postgres.h (eg. `#include "postgres.h"`) - Toplevel imports from postgres, not contained in a directory (eg. `#include "miscadmin.h"`) - General postgres includes (eg. `#include "nodes/..."`) - Toplevel citus includes, not contained in a directory (eg. `#include "citus_version.h"`) - Columnar includes (eg. `#include "columnar/..."`) - Distributed includes (eg. `#include "distributed/..."`) Because it is quite hard to understand the difference between toplevel citus includes and toplevel postgres includes it hardcodes the list of toplevel citus includes. In the same manner it assumes anything not prefixed with `columnar/` or `distributed/` is a postgres include. The sorting/grouping is enforced by CI. Since we do so with our own script there are no changes required in our uncrustify configuration.	2023-11-23 18:19:54 +01:00
Onur Tirtir	240313e286	Support role commands from any node (#7278 ) DESCRIPTION: Adds support for issuing role management commands from worker nodes It's unlikely to get into a distributed deadlock with role commands, so we don't care much about them at the moment. There were several attempts to reduce the chances of a deadlock but we haven't gotten any of them merged into the main branch yet, see: #7325 #7016 #7009	2023-11-10 09:58:51 +00:00
Nils Dijk	0dac63afc0	move pg_version_constants.h to toplevel include (#7335 ) In preparation of sorting and grouping all includes we wanted to move this file to the toplevel includes for good grouping/sorting.	2023-11-09 15:09:39 +00:00
aykut-bozkurt	8eb3360017	Fixes visibility problems with dependency propagation (#7028 ) Problem: Previously we always used an outside superuser connection to overcome permission issues for the current user while propagating dependencies. That has mainly 2 problems: 1. Visibility issues during dependency propagation, (metadata connection propagates some objects like a schema, and outside transaction does not see it and tries to create it again) 2. Security issues (it is preferable to use current user's connection instead of extension superuser) Solution (high level): Now, we try to make a smarter decision on whether we should use an outside superuser connection or current user's metadata connection. We prefer using current user's connection if any of the objects, which is already propagated in the current transaction, is a dependency for a target object. We do that since we assume if current user has permissions to create the dependency, then it can most probably propagate the target as well. Our assumption is expected to hold most of the time but it can still be wrong. In those cases, transaction would fail and user should set the GUC `citus.create_object_propagation` to `deferred` to work around it. Solution: 1. We track all objects propagated in the current transaction (we can handle subtransactions), 2. We propagate dependencies via the current user's metadata connection if any dependency is created in the current transaction to address issues listed above. Otherwise, we still use an outside superuser connection. DESCRIPTION: Fixes some object propagation errors seen with transaction blocks. Fixes https://github.com/citusdata/citus/issues/6614 --------- Co-authored-by: Nils Dijk <nils@citusdata.com>	2023-09-05 18:04:16 +03:00
Ivan Vyazmitinov	e94bf93152	#6548 2PC recovery is extremely ineffective on a cluster with multiple DATABASEs fix (#7174 )	2023-09-04 15:28:22 +02:00
zhjwpku	5034f8eba5	polish the codebase by fixing dozens of typos (#7166 )	2023-09-01 12:21:53 +02:00
Naisila Puka	42d956888d	PG16 compatibility: Resolve compilation issues (#7005 ) This PR provides successful compilation against PG16Beta2. It does some necessary refactoring to prepare for full support of version 16, in https://github.com/citusdata/citus/pull/6952 . Change RelFileNode to RelFileNumber or RelFileLocator Relevant PG commit b0a55e43299c4ea2a9a8c757f9c26352407d0ccc new header for varatt.h Relevant PG commit: d952373a987bad331c0e499463159dd142ced1ef drop support for Abs, use fabs Relevant PG commit 357cfefb09115292cfb98d504199e6df8201c957 tuplesort PGcommit: d37aa3d35832afde94e100c4d2a9618b3eb76472 Relevant PG commit: d37aa3d35832afde94e100c4d2a9618b3eb76472 Fix vacuum in columnar Relevant PG commit: 4ce3afb82ecfbf64d4f6247e725004e1da30f47c older one: b6074846cebc33d752f1d9a66e5a9932f21ad177 Add alloc_flags to pg_clean_ascii Relevant PG commit: 45b1a67a0fcb3f1588df596431871de4c93cb76f Merge GetNumConfigOptions() into get_guc_variables() Relevant PG commit: 3057465acfbea2f3dd7a914a1478064022c6eecd Minor PG refactor PG_FUNCNAME_MACRO __func__ Relevant PG commit 320f92b744b44f961e5d56f5f21de003e8027a7f Pass NULL context to stringToQualifiedNameList, typeStringToTypeName The pre-PG16 error behaviour for the following stringToQualifiedNameList & typeStringToTypeName was ereport(ERROR, ...) Now with PG16 we have this context input. We preserve the same behaviour by passing a NULL context, because of the following: (copy paste comment from PG16) If "context" isn't an ErrorSaveContext node, this behaves as errstart(ERROR, domain), and the errsave() macro ends up acting exactly like ereport(ERROR, ...). 
Relevant PG commit 858e776c84f48841e7e16fba7b690b76e54f3675 Use RangeVarCallbackMaintainsTable instead of RangeVarCallbackOwnsTable Relevant PG commit: 60684dd834a222fefedd49b19d1f0a6189c1632e FIX THIS: Not implemented grant-level control of role inheritance see PG commit e3ce2de09d814f8770b2e3b3c152b7671bcdb83f Make Scan node abstract PG commit: 8c73c11a0d39049de2c1f400d8765a0eb21f5228 Change in Var representations, get_relids_in_jointree PG commit 2489d76c4906f4461a364ca8ad7e0751ead8aa0d Deadlock detection changes because SHM_QUEUE is removed Relevant PG Commit: d137cb52cb7fd44a3f24f3c750fbf7924a4e9532 TU_UpdateIndexes Relevant PG commit 19d8e2308bc51ec4ab993ce90077342c915dd116 Use object_ownercheck and object_aclcheck functions Relevant PG commits: afbfc02983f86c4d71825efa6befd547fe81a926 c727f511bd7bf3c58063737bcf7a8f331346f253 Rework Permission Info for successful compilation Relevant PG commits: postgres/postgres@a61b1f7 postgres/postgres@b803b7d --------- Co-authored-by: onderkalaci <onderkalaci@gmail.com>	2023-07-21 14:32:37 +03:00
Naisila Puka	69af3e8509	Drop PG13 Support Phase 2 - Remove PG13 specific paths/tests (#7007 ) This commit is the second and last phase of dropping PG13 support. It consists of the following: - Removes all PG_VERSION_13 & PG_VERSION_14 from codepaths - Removes pg_version_compat entries and columnar_version_compat entries specific for PG13 - Removes alternative pg13 test outputs - Removes PG13 normalize lines and fix the test outputs based on that It is a continuation of `5bf163a27d`	2023-06-21 14:18:23 +03:00
Onur Tirtir	fa467e05e7	Add support for creating distributed tables with a null shard key (#6745 ) With this PR, we allow creating distributed tables without specifying a shard key via create_distributed_table(). Here are the important details about those tables: * Specifying `shard_count` is not allowed because it is assumed to be 1. * We mostly call such tables "null shard-key" tables in code / comments. * To avoid doing a breaking layout change in create_distributed_table(); instead of throwing an error, it will inform the user that `distribution_type` param is ignored unless it's explicitly set to NULL or 'h'. * `colocate_with` param allows colocating such null shard-key tables to each other. * We define this table type, i.e., NULL_SHARD_KEY_TABLE, as a subclass of DISTRIBUTED_TABLE because we mostly want to treat them as distributed tables in terms of SQL / DDL / operation support. * Metadata for such tables looks like: - distribution method => DISTRIBUTE_BY_NONE - replication model => REPLICATION_MODEL_STREAMING - colocation id => != INVALID_COLOCATION_ID (distinguishes from Citus local tables) * We assign colocation groups for such tables to different nodes in a round-robin fashion based on the modulo of "colocation id". Note that this PR doesn't care about DDL (except CREATE TABLE) / SQL / operation (i.e., Citus UDFs) support for such tables but adds a preliminary API.	2023-05-03 16:18:27 +03:00
aykutbozkurt	35dbdae5a4	PR #6728 / commit - 11 Let AddNodeMetadata to use metadatasync api during node addition.	2023-03-30 11:06:16 +03:00
aykutbozkurt	a74232bb39	PR #6728 / commit - 9 Do not enforce distributed transaction at `EnsureCoordinatorInitiatedOperation`.	2023-03-30 10:53:22 +03:00
aykutbozkurt	98abd68178	PR #6728 / commit - 1 Add a method to send multiple commands to worker list reusing the same bare connections. Change will be useful for metadata sync api.	2023-03-30 10:52:46 +03:00
Marco Slot	8ad444f8ef	Hide shards from CDC subscriptions	2023-03-29 00:59:12 +02:00
rajeshkt78	85b8a2c7a1	CDC implementation for Citus using Logical Replication (#6623 ) Description: Implementing CDC changes using Logical Replication to avoid re-publishing events multiple times by setting up replication origin session, which will add "DoNotReplicateId" to every WAL entry. - shard splits - shard moves - create distributed table - undistribute table - alter distributed tables (for some cases) - reference table operations The citus decoder which will be decoding WAL events for CDC clients, ignores any WAL entry with replication origin that is not zero. It also maps the shard names to distributed table names.	2023-03-28 16:00:21 +05:30
Onur Tirtir	20a5f3af2b	Replace CITUS_TABLE_WITH_NO_DIST_KEY checks with HasDistributionKey() (#6743 ) Now that we will soon add another table type having DISTRIBUTE_BY_NONE as distribution method and that we want the code to interpret such tables mostly as distributed tables, let's make the definition of those other two table types more strict by removing CITUS_TABLE_WITH_NO_DIST_KEY macro. And instead, use HasDistributionKey() check in the places where the logic applies to all table types that have / don't have a distribution key. In future PRs, we might want to convert some of those HasDistributionKey() checks if logic only applies to Citus local / reference tables, not the others. And adding HasDistributionKey() also allows us to consider having DISTRIBUTE_BY_NONE as the distribution method as a "table attribute" that can apply to distributed tables too, rather than something that determines the table type.	2023-03-10 13:55:52 +03:00
Jelte Fennema	f061dbb253	Also reset transactions at connection shutdown (#6685 ) In #6314 I refactored the connection cleanup to be simpler to understand and use. However, by doing so I introduced a use-after-free possibility (that valgrind luckily picked up): In the `ShouldShutdownConnection` path of `AfterXactHostConnectionHandling` we free connections without removing the `transactionNode` from the dlist that it might be part of. Before the refactoring this wasn't a problem, because the dlist would be completely reset quickly after in `ResetGlobalVariables` (without reading or writing the dlist entries). The refactoring changed this by moving the `dlist_delete` call to `ResetRemoteTransaction`, which in turn was called in the `!ShouldShutdownConnection` path of `AfterXactHostConnectionHandling`. Thus this `!ShouldShutdownConnection` path would now delete from the `dlist`, but the `ShouldShutdownConnection` path would not. Thus to remove itself the deleting path would sometimes update nodes in the list that were freed right before. There are two ways of fixing this: 1. Call `dlist_delete` from both paths. 2. Call `dlist_delete` from neither of the paths. This commit implements the second approach, and #6684 implements the first. We need to choose which approach we prefer. To make calling `dlist_delete` from both paths actually work, we also need to use a slightly different check to determine if we need to call dlist_delete. Various regression tests showed that there can be cases where the `transactionState` is something other than `REMOTE_TRANS_NOT_STARTED` but the connection was not added to the `InProgressTransactions` list. One example of such a case is when running `TransactionStateMachine` without calling `StartRemoteTransactionBegin` beforehand. In those cases the connection won't be added to `InProgressTransactions`, but the `transactionState` is changed to `REMOTE_TRANS_SENT_COMMAND`. 
Sidenote: This bug already existed in 11.1, but valgrind didn't catch it back then. My guess is that this happened because #6314 was merged after the initial release branch was cut. Fixes #6638	2023-02-02 16:05:34 +01:00
Jelte Fennema	92689a8362	Make GPIDs work with pg_dist_poolinfo (#6588 ) The original implementation of GPIDs didn't work correctly when using `pg_dist_poolinfo` together with PgBouncer. The reason is that it assumed that once a connection was made to a worker, the originating GPID should stay the same for ever. But when pg_dist_poolinfo is used this isn't the case, because the same connection on the worker might be used by different backends of the coordinator. This fixes that issue by updating the GPID whenever a new application name is set on a connection. This is the only thing that's needed, because PgBouncer already sets the application name correctly on the server connection whenever a client is updated.	2023-01-13 14:39:19 +00:00
Jelte Fennema	34df853bda	Fix bug introduced by #6412 (#6590 ) In #6412 I made a change to not re-assign the global PID if it was already set. This inadvertently introduced a regression where `userId` and `databaseId` would not be set on the backend data when the global PID was assigned in the authentication hook. This fixes it by doing two things: 1. Removing `userId` from `BackendData`, since it's not used anywhere anyway. 2. Move assignment of `databaseId` to dedicated `SetBackendDataDatabaseId` function, that isn't a no-op when global pid is already set. Since #6412 is not released yet this does not need a description.	2023-01-10 16:21:57 +01:00
Philip Dubé	cf69fc3652	Grammar: it's to its Includes an error message & one case of its to it's Also fix "to the to" typos	2022-11-28 20:43:44 +00:00
Jelte Fennema	68de2ce601	Include gpid in all internal application names (#6431 ) When debugging issues it's quite useful to see the originating gpid in the application_name of a query on a worker. This already happens for most queries, but not for queries created by the rebalancer or by run_command_on_worker. This adds a gpid to those two application_names too. Note, that if the GPID of the new application_names is different than the current GPID of the backend the backend will continue to keep the old gpid as its actual GPID. This PR is just meant to make sure that the application_name is as useful as it can be for users to look at. Updating of gpids will be done in a follow-up PR, and adding gpids to all internal connections will make this easier.	2022-11-25 11:16:33 +01:00
Marco Slot	77fbcfaf14	Propagate BEGIN properties to worker nodes (#6483 ) Co-authored-by: Marco Slot <marco.slot@gmail.com>	2022-11-10 18:08:43 +01:00
Onur Tirtir	1af28b3f27	Use CommitContext for subxact mgmt and reduce memory usage in CommitContext (#6099 ) (Hopefully) Fixes #5000. If memory allocation done for `SubXactContext state` in `PushSubXact()` fails, then `PopSubXact()` might segfault, for example, when grabbing the topmost `SubXactContext` from `activeSubXactContexts` if this is the first ever subxact within the current xact, with the following stack trace: ```c citus.so!list_nth_cell(const List list, int n) (\opt\pgenv\pgsql-14.3\include\server\nodes\pg_list.h:260) citus.so!PopSubXact(SubTransactionId subId) (\home\onurctirtir\citus\src\backend\distributed\transaction\transaction_management.c:761) citus.so!CoordinatedSubTransactionCallback(SubXactEvent event, SubTransactionId subId, SubTransactionId parentSubid, void * arg) (\home\onurctirtir\citus\src\backend\distributed\transaction\transaction_management.c:673) CallSubXactCallbacks(SubXactEvent event, SubTransactionId mySubid, SubTransactionId parentSubid) (\opt\pgenv\src\postgresql-14.3\src\backend\access\transam\xact.c:3644) AbortSubTransaction() (\opt\pgenv\src\postgresql-14.3\src\backend\access\transam\xact.c:5058) AbortCurrentTransaction() (\opt\pgenv\src\postgresql-14.3\src\backend\access\transam\xact.c:3366) PostgresMain(int argc, char ** argv, const char * dbname, const char * username) (\opt\pgenv\src\postgresql-14.3\src\backend\tcop\postgres.c:4250) BackendRun(Port * port) (\opt\pgenv\src\postgresql-14.3\src\backend\postmaster\postmaster.c:4530) BackendStartup(Port * port) (\opt\pgenv\src\postgresql-14.3\src\backend\postmaster\postmaster.c:4252) ServerLoop() (\opt\pgenv\src\postgresql-14.3\src\backend\postmaster\postmaster.c:1745) PostmasterMain(int argc, char argv) (\opt\pgenv\src\postgresql-14.3\src\backend\postmaster\postmaster.c:1417) main(int argc, char argv) (\opt\pgenv\src\postgresql-14.3\src\backend\main\main.c:209) ``` For this reason, to be more defensive against memory-allocation errors that could happen at `PushSubXact()`, now 
we use our pre-allocated memory context for the objects created in `PushSubXact()`. This commit also attempts to reduce the memory allocations done under CommitContext to reduce the chances of consuming all the memory available to CommitContext. Note that it's problematic to encounter such a memory-allocation error for other objects created in `PushSubXact()` as well, so above is an example scenario that might result in a segfault. DESCRIPTION: Fixes a bug that might cause segfaults when handling deeply nested subtransactions	2022-11-03 00:57:32 +03:00
Jelte Fennema	cb34adf7ac	Don't reassign global PID when already assigned (#6412 ) DESCRIPTION: Fix bug in global PID assignment for rebalancer sub-connections In CI our isolation_shard_rebalancer_progress test would sometimes fail like this: ```diff +isolationtester: canceling step s1-rebalance-c1-block-writes after 60 seconds step s1-rebalance-c1-block-writes: SELECT rebalance_table_shards('colocated1', shard_transfer_mode:='block_writes'); - <waiting ...> + +ERROR: canceling statement due to user request step s7-get-progress: ``` Source: https://app.circleci.com/pipelines/github/citusdata/citus/27855/workflows/2a7e335a-f3e8-46ed-b6bd-6920d42f7214/jobs/831710 It turned out this was an actual bug in the way our assigning of global PIDs interacts with the way we connect to ourselves as the shard rebalancer. The first command the shard rebalancer sends is a SET command to change the application_name to `citus_rebalancer`. If `StartupCitusBackend` is called after this command is processed, then it overwrites the global PID that was extracted from the previous application_name. This makes sure that we don't do that, and continue to use the original global PID. While it might seem that we only call `StartupCitusBackend` once for each query backend, this isn't actually the case. Whenever pg_dist_partition gets ANALYZEd by autovacuum we indirectly call `StartupCitusBackend` again, because we invalidate the cache then. In passing this fixes two other things as well: 1. It sets `distributedCommandOriginator` correctly in `AssignGlobalPID`, by using IsExternalClientBackend(). This doesn't matter much anymore, since AssignGlobalPID effectively becomes a no-op in this PR for any non-external client backends. 2. It passes the application_name to InitializeBackendData in StartupCitusBackend, instead of INVALID_CITUS_INTERNAL_BACKEND_GPID (which effectively was cast to NULL). 
In practice this doesn't change the behaviour of the call, since the call is a no-op for every backend except the maintenance daemon. And the behaviour of the call is the same for NULL as for the application_name of the maintenance daemon.	2022-10-11 16:41:01 +02:00
Jelte Fennema	24e06af6d2	Reuse connections for Splits and Logical Replication (#6314 ) In Split, Logical replication logic and ShardCleaner we call `SendCommandListToWorkerOutsideTransaction` and `SendOptionalCommandListToWorkerOutsideTransaction` frequently. This opens a new connection for each of those calls, even though we already have a perfectly good connection lying around. This PR adds two new APIs `SendCommandListToWorkerOutsideTransactionWithConnection` and `SendOptionalCommandListToWorkerOutsideTransactionWithConnection` that allow sending a list of queries in a transaction over an existing connection. We also update the callers (Split, ShardCleaner, Logical Replication) to use these new APIs instead. Co-authored-by: Nitish Upreti <niupre@microsoft.com> Co-authored-by: Onder Kalaci <onderkalaci@gmail.com>	2022-09-26 13:37:40 +02:00
Nitish Upreti	d7404a9446	'Deferred Drop' and robust 'Shard Cleanup' for Splits. (#6258 ) DESCRIPTION: This PR adds support for 'Deferred Drop' and robust 'Shard Cleanup' for Splits. Common Infrastructure This PR introduces new common infrastructure so that any operation that wants robust cleanup of resources can register with the cleaner and have the resources cleaned appropriately based on a specified policy. 'Shard Split' is the first consumer using this new infrastructure. Note: We only support adding 'shards' as resources to be cleaned up right now, but the framework will be extended to support other resources in the future. Deferred Drop for Split Deferred Drop Support ensures that shards undergoing split are not dropped inline as part of the operation but dropped later when no active read queries are running on the shard. This helps by: avoiding any potential deadlock scenarios that can cause a long-running Split operation to roll back; and avoiding the Split operation blocking writes and then getting blocked (due to running queries on the shard) when trying to drop shards. Deferred drop is the new default behavior going forward. Shard Cleaner Extension Shard Cleaner is a background task responsible for deferred drops in case of 'Move' operations. The cleaner has been extended to ensure robust cleanup of shards (dummy shards and split children) in case of a failure based on the new infrastructure mentioned above. The cleaner also handles deferred drop for 'Splits'. TESTING: New test 'citus_split_shard_by_split_points_deferred_drop' to test deferred drop support. New test 'failure_split_cleanup' to test shard cleanup with failures in different stages. Update 'isolation_blocking_shard_split' and 'isolation_non_blocking_shard_split' for deferred drop. Added non-deferred drop version of existing tests: 'citus_split_shard_no_deferred_drop' and 'citus_non_blocking_splits_no_deferred_drop'	2022-09-06 12:11:20 -07:00
Jelte Fennema	1c5b8588fe	Address race condition in InitializeBackendData (#6285 ) Sometimes in CI our isolation_citus_dist_activity test fails randomly like this: ```diff step s2-view-dist: SELECT query, citus_nodename_for_nodeid(citus_nodeid_for_gpid(global_pid)), citus_nodeport_for_nodeid(citus_nodeid_for_gpid(global_pid)), state, wait_event_type, wait_event, usename, datname FROM citus_dist_stat_activity WHERE query NOT ILIKE ALL(VALUES('%pg_prepared_xacts%'), ('%COMMIT%'), ('%BEGIN%'), ('%pg_catalog.pg_isolation_test_session_is_blocked%'), ('%citus_add_node%')) AND backend_type = 'client backend' ORDER BY query DESC; query \|citus_nodename_for_nodeid\|citus_nodeport_for_nodeid\|state \|wait_event_type\|wait_event\|usename \|datname ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------+-------------------------+-------------------+---------------+----------+--------+---------- INSERT INTO test_table VALUES (100, 100); \|localhost \| 57636\|idle in transaction\|Client \|ClientRead\|postgres\|regression -(1 row) + + SELECT coalesce(to_jsonb(array_agg(csa_from_one_node.)), '[{}]'::JSONB) + FROM ( + SELECT global_pid, worker_query AS is_worker_query, pg_stat_activity. FROM + pg_stat_activity LEFT JOIN get_all_active_transactions() ON process_id = pid + ) AS csa_from_one_node; + \|localhost \| 57636\|active \| \| \|postgres\|regression +(2 rows) step s3-view-worker: ``` Source: https://app.circleci.com/pipelines/github/citusdata/citus/26692/workflows/3406e4b4-b686-4667-bec6-8253ee0809b1/jobs/765119 I intended to fix this with #6263, but the fix turned out to be insufficient. 
This PR tries to address the issue by setting distributedCommandOriginator correctly in more situations. However, even with this change it's still possible to reproduce the flaky test in CI. In any case this should fix at least some instances of this issue. In passing this changes the isolation_citus_dist_activity test to allow running it multiple times in a row.	2022-09-02 14:23:47 +02:00
Marco Slot	432f399a5d	Allow citus_internal application_name with additional suffix (#6282 ) Co-authored-by: Marco Slot <marco.slot@gmail.com>	2022-09-01 14:26:43 +02:00
Jelte Fennema	d68654680b	Fix flakyness in isolation_citus_dist_activity (#6263 ) Sometimes in CI our isolation_citus_dist_activity test fails randomly like this: ```diff step s2-view-dist: SELECT query, citus_nodename_for_nodeid(citus_nodeid_for_gpid(global_pid)), citus_nodeport_for_nodeid(citus_nodeid_for_gpid(global_pid)), state, wait_event_type, wait_event, usename, datname FROM citus_dist_stat_activity WHERE query NOT ILIKE ALL(VALUES('%pg_prepared_xacts%'), ('%COMMIT%'), ('%BEGIN%'), ('%pg_catalog.pg_isolation_test_session_is_blocked%'), ('%citus_add_node%')) AND backend_type = 'client backend' ORDER BY query DESC; query \|citus_nodename_for_nodeid\|citus_nodeport_for_nodeid\|state \|wait_event_type\|wait_event\|usename \|datname ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------+-------------------------+-------------------+---------------+----------+--------+---------- INSERT INTO test_table VALUES (100, 100); \|localhost \| 57636\|idle in transaction\|Client \|ClientRead\|postgres\|regression -(1 row) + + SELECT coalesce(to_jsonb(array_agg(csa_from_one_node.)), '[{}]'::JSONB) + FROM ( + SELECT global_pid, worker_query AS is_worker_query, pg_stat_activity. FROM + pg_stat_activity LEFT JOIN get_all_active_transactions() ON process_id = pid + ) AS csa_from_one_node; + \|localhost \| 57636\|active \| \| \|postgres\|regression +(2 rows) step s3-view-worker: ``` Source: https://app.circleci.com/pipelines/github/citusdata/citus/26605/workflows/56d284d2-5bb3-4e64-a0ea-7b9b1626e7cd/jobs/760633 The reason for this is that citus_dist_stat_activity sometimes shows the query that it uses itself to get the data from pg_stat_activity. 
This is actually a bug, because it's a worker query and thus shouldn't show up there. To try and solve this bug, we remove two small opportunities for a race condition. These race conditions could happen when the backenddata was marked as active, but the distributedCommandOriginator was not set correctly yet/anymore. There was an opportunity for this to happen both during connection start and shutdown.	2022-08-30 12:57:37 +03:00
Marco Slot	639588bee0	Remove unused functions (#6220 ) Co-authored-by: Marco Slot <marco.slot@gmail.com>	2022-08-22 11:53:25 +03:00
Jelte Fennema	dd548ee3c7	Use faster custom copy logic for non-blocking shard moves (#6119 ) DESCRIPTION: Use faster custom copy logic for non-blocking shard moves Non-blocking shard moves consist of two main phases: 1. Initial data copy 2. Catchup phase This changes the first of these phases significantly. Previously we used the copy logic provided by postgres subscriptions. This meant we didn't have to implement it ourselves, but it came with the downside of little control. When implementing shard splits we needed more control to even make it work, so we implemented our own logic for copying data between nodes. This PR starts using that logic for non-blocking shard moves. Doing so has four main advantages: 1. It uses COPY in binary format when possible, which is cheaper to encode and decode. Furthermore it very often results in less data that needs to be sent over the network. 2. It allows us to create the primary key (or other replica identity) after doing the initial data copy. This should give some speed up over the total run, because creating an index in bulk is much faster than incrementally building it. 3. It doesn't require a replication slot per parallel copy. Increasing the maximum number of replication slots uses resources in postgres, even if they are not used. So reducing the number of replication slots that shard moves need is nice. 4. Logical replication table_sync workers are slow to start up, so if lots of shards need to be copied that can make it quite slow. This can happen easily when combining Postgres partitioning with Citus.	2022-08-08 17:09:43 +02:00
Teja Mupparti	430c201d03	get_current_transaction_id() UDF is not printing the timestamp of the current transaction on the coordinator even when non-null	2022-08-05 10:12:07 -07:00
Onder Kalaci	149771792b	Remove useless version compats most likely leftover from earlier versions	2022-07-29 10:31:55 +02:00
Onder Kalaci	0a5112964d	Call relation access hash clean-up irrespective of remote transaction state Mainly because local-only transactions should be cleaned up	2022-07-28 11:27:59 +02:00
Onder Kalaci	d67cf907a2	Detach relation access tracking from connection management	2022-07-28 11:27:59 +02:00

357 Commits (3f67efb2c143f5784ab7050edcd9c199fd425ca3)