Commit Graph

74 Commits (b2e9a5baf15e6fbb461bd8f5476a947c6512be03)

Author SHA1 Message Date
Onder Kalaci b2e9a5baf1 Make sure citus_is_coordinator works on read replicas 2022-07-13 14:11:18 +02:00
Onder Kalaci 7157152f6c Do not send metadata changes during add node if citus.enable_metadata_sync is set to false 2022-05-30 13:24:31 +02:00
Onder Kalaci 010a2a408e Avoid assertion failure on citus_add_node 2022-05-30 12:22:09 +02:00
Gledis Zeneli 27ddb4fc8e
Do not obtain AccessShareLock before actual lock (#5965)
Do not obtain AccessShareLock before acquiring the distributed locks.

Acquiring an AccessShareLock ensures that the relations we are trying to get a distributed lock on will not be dropped between the time the LOCK command is issued and the time the LOCK commands are sent to the workers. However, this also leads to distributed deadlocks in scenarios such as the following:

```sql
-- distributed lock acquisition order: coor, w1, w2

-- on w2
LOCK t1 IN ACCESS EXCLUSIVE MODE;
-- acquire AccessShareLock locally on t1 to ensure it is not dropped while we get ready to distribute the lock

      -- concurrently on w1
      LOCK t1 IN ACCESS EXCLUSIVE MODE;
      -- acquire AccessShareLock locally on t1 to ensure it is not dropped while we get ready to distribute the lock
      -- acquire dist lock on coor, w1, gets blocked on local AccessShareLock on w2

-- on w2 continuation of the execution above
-- starts to acquire dist locks and gets blocked on the coor by the lock acquired by w1

-- distributed deadlock

``` 

We opt to avoid such deadlocks at the cost of possibly running into errors when the relations we are trying to lock get dropped.
2022-05-23 13:06:38 +03:00
Onder Kalaci dd02e1755f Parallelize metadata syncing on node activate
It is often useful to be able to sync the metadata in parallel
across nodes.

Also citus_finalize_upgrade_to_citus11() uses
start_metadata_sync_to_primary_nodes() after this commit.

Note that this commit does not parallelize all pieces of node
activation or metadata syncing. Instead, it tries to parallelize
the potentially large parts of the metadata, namely the objects and
distributed tables (in general, Citus tables).

In the future, it would be nice to sync the reference tables
in parallel across nodes.

Create ~720 distributed tables / ~23450 shards
```SQL
-- declaratively partitioned table
CREATE TABLE github_events_looooooooooooooong_name (
  event_id bigint,
  event_type text,
  event_public boolean,
  repo_id bigint,
  payload jsonb,
  repo jsonb,
  actor jsonb,
  org jsonb,
  created_at timestamp
) PARTITION BY RANGE (created_at);

SELECT create_time_partitions(
  table_name         := 'github_events_looooooooooooooong_name',
  partition_interval := '1 day',
  end_at             := now() + '24 months'
);

CREATE INDEX ON github_events_looooooooooooooong_name USING btree (event_id, event_type, event_public, repo_id);
SELECT create_distributed_table('github_events_looooooooooooooong_name', 'repo_id');

SET client_min_messages TO ERROR;

```

across 1 node: almost the same, as expected
```SQL

SELECT start_metadata_sync_to_primary_nodes();
Time: 15664.418 ms (00:15.664)

select start_metadata_sync_to_node(nodename,nodeport) from pg_dist_node;
Time: 14284.069 ms (00:14.284)
```

across 7 nodes: ~3.5x improvement
```SQL

SELECT start_metadata_sync_to_primary_nodes();
┌──────────────────────────────────────┐
│ start_metadata_sync_to_primary_nodes │
├──────────────────────────────────────┤
│ t                                    │
└──────────────────────────────────────┘
(1 row)

Time: 25711.192 ms (00:25.711)

-- across 7 nodes
select start_metadata_sync_to_node(nodename,nodeport) from pg_dist_node;
Time: 82126.075 ms (01:22.126)
```
2022-05-23 09:15:48 +02:00
Onder Kalaci 127450466e Do not warn unnecessarily when a node is removed
In the past (pre-11), we allowed removing worker nodes
that had active placements for replicated distributed
tables, without even checking whether there were any other
replicas of the same placement.

However, with #5469, we prevent disabling a node via a hard
error when it holds the last active placement of a shard, as we
do for reference tables. Note that otherwise, we'd allow
users to lose data.

As of today, the NOTICE is completely irrelevant.
2022-05-18 17:23:38 +02:00
Onder Kalaci b4dbd84743 Prevent distributed queries while disabling first worker node
The first worker node has a special meaning for modifications on replicated tables.

It is used to acquire a remote lock, such that the modifications are serialized.

With this commit, we make sure that we do not let any distributed query see a
different 'first worker node' while the first worker node is being disabled.

Note that, as implied above, when the first worker node is disabled,
a different node becomes the first worker node; that's why we have to handle the situation.
2022-05-18 17:21:12 +02:00
Onder Kalaci db998b3d66 Adds "sync" option to citus_disable_node() UDF
Before this commit, we had:
```SQL
SELECT citus_disable_node(nodename, nodeport, force boolean DEFAULT false)
```

Here, we allow forcing the disabling of the first worker node with
`force:=true`. However, that entails the risk of losing
data / diverging placement data, etc.

With the `force` flag, we control disabling the first worker node,
and with the `sync` flag we control whether the changes are applied
via the bg worker or immediately.

```SQL
SELECT citus_disable_node(nodename, nodeport, force boolean DEFAULT false, sync boolean DEFAULT false)
```

With this, we can achieve all of the following:

| Mode  | Data loss possibility | Can run in 2PC | Handle multiple node failures | Immediately effective |
| --- |--- |--- |--- |--- |
| force:false, sync: false  | false   | true  | true  | false |
| force:false, sync: true   | false  | false | false | true |
| force:true, sync: false   | true   | true  | true   | false |
| force:true, sync: true    | false  | false | false  | true |
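
A hedged usage sketch based on the signature above ('worker-1' and 5432 are placeholder values):
```sql
-- disable a worker and make the change effective immediately; per the
-- table above, sync:=true cannot run in 2PC or handle multiple node failures
SELECT citus_disable_node('worker-1', 5432, force := false, sync := true);
```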
2022-05-18 17:21:12 +02:00
Marco Slot 6fad5dc207 Add a citus_is_coordinator function 2022-05-13 10:02:52 +02:00
Onder Kalaci 5fc7661169 Do not set coordinator's metadatasynced column to false
after a disable_node call.
2022-04-25 09:25:59 +02:00
Onder Kalaci b0b91bab04 Rename metadata sync to node metadata sync where applicable 2022-04-07 17:51:31 +02:00
Halil Ozan Akgul 4dbc760603 Introduces citus_coordinator_node_id 2022-03-22 10:34:22 +03:00
Gledis Zeneli 56ab64b747
Patches #5758 with some more error checks (#5804)
Add error checks to detect failed connections, and don't ping secondary nodes when detecting self-reference.
2022-03-15 15:02:47 +03:00
Gledis Zeneli 2cb02bfb56
Fix node adding itself with citus_add_node leading to deadlock (Fix #5720) (#5758)
When a worker node is being added, a command is sent to get the server_id of the worker from the pg_dist_node_metadata table. If the worker's id is the same as that of the node executing the code, we know the node is trying to add itself. If the node tries to add itself without specifying `groupid:=0`, the operation results in an error.
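
A hedged sketch of both outcomes described above (the hostname is a placeholder):
```sql
-- run on the node itself: without groupid this now errors out instead of deadlocking
SELECT citus_add_node('coordinator-host', 5432);

-- explicitly registering the current node as the coordinator succeeds
SELECT citus_add_node('coordinator-host', 5432, groupid := 0);
```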
2022-03-10 17:46:33 +03:00
Halil Ozan Akgül 333bcc7948
Global PID Helper Functions (#5768)
* Introduces citus_nodename_for_nodeid and citus_nodeport_for_nodeid functions

* Introduces citus_nodeid_for_gpid and citus_pid_for_gpid functions (usage sketch below)

* Add tests
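
A hedged usage sketch of these helpers; the nodeid and global PID values are illustrative (a global PID encodes a node id and a backend PID):
```sql
-- look up the host and port of a node by its id
SELECT citus_nodename_for_nodeid(1), citus_nodeport_for_nodeid(1);

-- split a global PID back into its node id and backend PID
SELECT citus_nodeid_for_gpid(10000000123), citus_pid_for_gpid(10000000123);
```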
2022-03-09 13:15:59 +03:00
Marco Slot 3ba61244b8 Synchronize pg_dist_colocation metadata 2022-03-03 11:01:59 +01:00
Onder Kalaci df95d59e33 Drop support for CitusInitiatedBackend
CitusInitiatedBackend was a premature implementation of the whole
GlobalPID infrastructure. We used it to track whether an individual
query was triggered by Citus or not.

Now that GlobalPID is in place, we don't need
CitusInitiatedBackend; in fact, it could even be wrong.
2022-02-24 12:12:43 +01:00
Hanefi Onaldi f4e8af2c22
Do not acquire locks on node metadata explicitly 2022-02-24 03:19:56 +03:00
Halil Ozan Akgul f6cd4d0f07 Overrides pg_cancel_backend and pg_terminate_backend to accept global pid 2022-02-21 16:41:35 +03:00
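
A hedged usage sketch of the overridden functions (the global PID value is illustrative):
```sql
-- cancel or terminate a backend anywhere in the cluster by its global PID
SELECT pg_cancel_backend(10000000123);
SELECT pg_terminate_backend(10000000123);
```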
Onder Kalaci ff234fbfd2 Unify old GUCs into a single one
Replaces citus.enable_object_propagation with citus.enable_metadata_sync

Also, within Citus 11 release cycle, we added citus.enable_metadata_sync_by_default,
that is also replaced with citus.enable_metadata_sync.

In essence, when citus.enable_metadata_sync is set to true, all the objects
and the metadata are sent to the remote nodes.

We strongly advise that users never change the value of
this GUC.
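
A hedged sketch of inspecting the unified GUC; changing it is discouraged, per the note above:
```sql
SHOW citus.enable_metadata_sync;

-- strongly discouraged; shown only to illustrate the unified switch
SET citus.enable_metadata_sync TO false;
```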
2022-02-04 10:52:56 +01:00
Onder Kalaci 650243927c Relax some transactional limitations on activate node
We already enforce EnsureSequentialModeMetadataOperations(), and given that node activation is transactional, we should be fine.
2022-02-01 15:56:55 +01:00
Burak Velioglu f88cc230bf
Handle tables and objects as metadata. Update UDFs accordingly
With this commit we've started to propagate sequences and shell
tables within the object dependency resolution. So, ensuring the
dependencies of any object will now consider shell tables and sequences
as well. The separate logic for shell tables and for sequences has
been removed.

Since both the shell table and sequence logic were previously implemented
as part of the metadata handling, we were propagating
them while syncing table metadata. With this commit we've divided
the metadata syncing logic (where metadata means anything except shards)
into multiple parts and implemented them as part of
ActivateNode. You can inspect the functions called in ActivateNode
to see how the different pieces of metadata are defined.

Definitions of start_metadata_sync_to_node and citus_activate_node
have also been updated. citus_activate_node will basically create
an active node with all metadata and reference table shards.
start_metadata_sync_to_node will be the same as citus_activate_node,
except that it does not replicate reference tables. stop_metadata_sync_to_node
will remove all the metadata. All of those UDFs need to be called
by a superuser.
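
A hedged sketch of the updated UDFs as described above ('worker-1' and 5432 are placeholders; all of them require superuser):
```sql
-- create an active node with all metadata and reference table shards
SELECT citus_activate_node('worker-1', 5432);

-- same as citus_activate_node, minus replicating reference tables
SELECT start_metadata_sync_to_node('worker-1', 5432);

-- remove all the metadata again
SELECT stop_metadata_sync_to_node('worker-1', 5432);
```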
2022-01-31 16:20:15 +03:00
Önder Kalacı 885601c02c
Require superuser while activating a node (#5609)
* Require superuser while activating a node

With this change, ActivateNode() (hence citus_add_node() and
citus_activate_node()) explicitly requires a superuser.

Before this commit, these functions were designed to work with
non-superuser roles, given the relevant GRANTs.

However, that is not a widely used way of calling the functions
above.

Due to the possibility of non-superusers calling the UDFs, they were
designed in a way that some commands used additional
short-lived superuser connections. That was:
	(a) breaking transactional behavior (e.g., ROLLBACK
	    wouldn't fully roll back the whole transaction)
	(b) making it very complicated to reason about which
	    parts of the node activation go over which connections,
	    and becoming vulnerable to deadlocks / visibility issues.
2022-01-10 08:30:13 -08:00
Onder Kalaci 9f2d9e1487 Move placement deletion from disable node to activate node
We prefer the background daemon to only sync node metadata. That's
why we move placement metadata changes from disable node to
activate node. With that, we can make sure that disable node
only changes node metadata, whereas activate node syncs all
the metadata changes. In essence, we already expect all
nodes to be up when a node is activated. So, this does not change
the behavior much.
2022-01-07 09:56:03 +01:00
Önder Kalacı 0a8b0b06c6
Do not allow distributed functions on non-metadata synced nodes (#5586)
Before this commit, Citus was triggering metadata syncing
in the background when a function was distributed. However,
with Citus 11, we expect all clusters to have metadata syncing
enabled. So, we do not expect any nodes not to have the metadata.

This change:
	(a) pro: simplifies the code and opens up possibilities
	    to simplify further by reducing the scope of the
	    bg worker to only syncing node metadata
	(b) pro: explicitly asks users to sync the metadata (see the
	    sketch below) such that any unforeseen impact can be
	    easily detected
	(c) con: for distributed functions without a distribution
	    argument, we do not necessarily require the metadata
	    to be synced. However, for completeness and simplicity, we
	    do so.
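
A hedged sketch of the explicit sync this change asks for, following the pattern used in earlier commits:
```sql
-- sync metadata to every node before relying on distributed functions
SELECT start_metadata_sync_to_node(nodename, nodeport) FROM pg_dist_node;
```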
2022-01-04 13:12:57 +01:00
Onder Kalaci 549edcabb6 Allow disabling node(s) when multiple failures happen
As of the master branch, Citus does all modifications to replicated tables
(e.g., reference tables and distributed tables with replication factor > 1)
via 2PC and avoids any shardstate=3. As a side-effect of those changes,
the handling of node failures for replicated tables changes.

With this PR, when one (or multiple) node failures happen, users
see query errors on modifications. If the problem is intermittent, that's OK;
once the node failure(s) recover by themselves, the modification queries
succeed. If the node failure(s) are permanent, the users should call
`SELECT citus_disable_node(...)` to disable the node. As soon as the node is
disabled, modifications start to succeed. However, the old node now falls
behind. It means that, when the node is up again, the placements should be
re-created on it. First, use `SELECT citus_activate_node()`. Then, use
`SELECT replicate_table_shards(...)` to replicate the missing placements on
the re-activated node.
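
A hedged sketch of the recovery sequence described above (hostname, port, and table name are placeholders):
```sql
-- 1. the failure is permanent: disable the node so modifications succeed again
SELECT citus_disable_node('worker-2', 5432);

-- 2. once the node is back up, re-activate it
SELECT citus_activate_node('worker-2', 5432);

-- 3. re-create the missing placements on the re-activated node
SELECT replicate_table_shards('my_replicated_table');
```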
2021-12-01 10:19:48 +01:00
Onder Kalaci d405993b57 Make sure to use a dedicated metadata connection
With this commit, we make sure to use a dedicated connection per
node for all the metadata operations within the same transaction.

This is needed because the same metadata (e.g., the metadata includes
the distributed tables on the workers) could otherwise be modified across
multiple connections.

With this change we guarantee that there is a single metadata connection
per node. But note that this connection can also be used for other
operations; in other words, it is not reserved exclusively for metadata
operations.
2021-11-26 14:36:28 +01:00
Onder Kalaci 38b08ebde9 Generalize the error checks while removing node
The checks that prevent removing a node are very much reference-table
centric. We are soon going to add the same checks for replicated
tables. So, make the checks generic such that:
	 (a) replicated tables fit naturally
	 (b) we can use the same checks in `citus_disable_node`.
2021-11-26 14:25:29 +01:00
Halil Ozan Akgul 91b377490b Fix multi_cluster_management fails for metadata syncing 2021-11-04 11:09:21 +03:00
Halil Ozan Akgul 9c9d4b5eeb Turn MX on by default 2021-10-08 18:17:21 +03:00
Halil Ozan Akgul 43d5853b6d Fixes function names in comments 2021-10-06 09:24:43 +03:00
Hanefi Onaldi 7e39c7ea83
Replace master with citus in logs and comments (#5210)
I replaced 

- master_add_node,
- master_add_inactive_node
- master_activate_node

with

- citus_add_node,
- citus_add_inactive_node
- citus_activate_node

respectively.
2021-08-26 11:31:17 +03:00
Ahmet Gedemenli 9e90894f21
Synchronize hasmetadata flag on mx workers (#5086)
* Synchronize hasmetadata flag on mx workers

* Switch to sequential execution

* Add test

* Use SetWorkerColumn

* Add test for stop_sync

* Remove usage of UpdateHasmetadataOnWorkersWithMetadata

* Remove MarkNodeMetadataSynced

* Fix test for metadatasynced

* Remove MarkNodeMetadataSynced

* Style

* Remove MarkNodeHasMetadata

* Remove UpdateDistNodeBoolAttr

* Refactor SetWorkerColumn

* Use SetWorkerColumnLocalOnly when setting up dependencies

* Use SetWorkerColumnLocalOnly in TriggerSyncMetadataToPrimaryNodes

* Style

* Make update command generator functions static

* Set metadatasynced before syncing

* Call SetWorkerColumn only if the sync is successful

* Try to sync all nodes

* Fix indexno

* Update metadatasynced locally first

* Break if a node fails to sync metadata

* Send worker commands optional

* Style & Rebase

* Add raiseOnError param to SetWorkerColumn

* Style

* Set metadatasynced for all metadata nodes

* Style

* Introduce SetWorkerColumnOptional

* Polish

* Style

* Dont send set command to not synced metadata nodes

* Style

* Polish

* Add test for stop_sync

* Add test for shouldhaveshards

* Add test for isactive flag

* Sort by placementid in the function verify_metadata

* Cover edge cases for failing nodes

* Add comments

* Add nodeport to isactive test

* Add warning if metadata out of sync

* Update warning message
2021-08-12 14:16:18 +03:00
Onder Kalaci 5f02d18ef8 transactional metadata sync for the maintenance daemon
As we use the current user to sync the metadata to the nodes
with #5105 (and many other PRs), there is nothing that
prevents us from using the coordinated transaction for metadata syncing.

This commit also renames a few functions to reflect their actual
implementation.
2021-08-09 10:34:55 +02:00
Marco Slot c03729ad03 Only warn about reference tables when removing last node 2021-06-01 10:53:12 +02:00
Jelte Fennema b1cad26ebc Move CheckCitusVersion to the top of each function
Previously this was usually done after argument parsing. This can cause
SEGFAULTs if the number or type of arguments changes in a new version.
By checking that the Citus version is correct before doing any argument
parsing, we protect against these types of issues. Issues like this have
occurred in pg_auto_failover, so it's not just a theoretical concern.

The main reason why these calls were not at the top of functions is
really just historical. It was because in the past we didn't allow
statements before declarations. Thus having this check before the
argument parsing would have only been possible if we first declared all
variables.

In addition to moving existing CheckCitusVersion calls, this also adds
them to rebalancer-related functions (they were missing there).
2021-06-01 17:43:46 +02:00
SaitTalhaNisanci 8c3f85692d
Do not consider old placements when disabling or removing a node (#4960)
* Do not consider old placements when disabling or removing a node

* update cluster test
2021-05-28 22:38:20 +02:00
Nils Dijk c91f8d8a15
Feature: localhost guc (#4836)
DESCRIPTION: introduce `citus.local_hostname` GUC for connections to the current node

Citus once in a while needs to connect to itself for some system operations. This used to be hardcoded to `localhost`. The hardcoded hostname causes some issues, for example in environments where `sslmode=verify-full` is required. It is not always desirable or even feasible to get `localhost` as an alt name on the certificate.

By introducing a GUC to use when connecting to the current instance, the user has more control over which network path is used and which hostname is required to be present in the server certificate.
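
A hedged configuration sketch; the hostname is a placeholder and must appear in the server certificate when `sslmode=verify-full` is used:
```sql
ALTER SYSTEM SET citus.local_hostname TO 'node-1.internal.example.com';
SELECT pg_reload_conf();
```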
2021-05-12 16:59:44 +02:00
Marco Slot f25de6a0e3 Try to return earlier in idempotent master_add_node 2021-03-02 21:22:47 +01:00
Onder Kalaci fc9a23792c COPY uses adaptive connection management on local node
With #4338, the executor is smart enough to fail over to
the local node if there is not enough room in max_connections
for remote connections.

For COPY, the logic is different. With #4034, we made COPY
work with the adaptive connection management slightly
differently. The cause of the difference is that COPY doesn't
know which placements are going to be accessed, and hence requires
getting connections up-front.

Similarly, COPY decides to use local execution up-front.

With this commit, we change the logic for COPY on local nodes:

Try to reserve a connection to the local host. This follows
the same logic (e.g., citus.local_shared_pool_size) as the
executor, because COPY also relies on TryToIncrementSharedConnectionCounter().
If the reservation to the local node fails, switch to local execution.
Apart from this, if local execution is disabled, we follow the
exact same logic as multi-node Citus; it means that if we are
out of connections, we give an error.
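
A hedged sketch of the GUC involved (the value is illustrative):
```sql
-- cap the connections COPY/the executor may reserve to the local node;
-- when the reservation fails, COPY falls back to local execution
ALTER SYSTEM SET citus.local_shared_pool_size TO 50;
SELECT pg_reload_conf();
```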
2021-02-04 09:45:07 +01:00
Hadi Moshayedi bc01c795a2 Reland #4419 2021-01-19 07:48:47 -08:00
Ahmet Gedemenli 436c9d9d79
Remove the word 'master' from Citus UDFs (#4472)
* Replace master_add_node with citus_add_node

* Replace master_activate_node with citus_activate_node

* Replace master_add_inactive_node with citus_add_inactive_node

* Use master udfs in old scripts

* Replace master_add_secondary_node with citus_add_secondary_node

* Replace master_disable_node with citus_disable_node

* Replace master_drain_node with citus_drain_node

* Replace master_remove_node with citus_remove_node

* Replace master_set_node_property with citus_set_node_property

* Replace master_unmark_object_distributed with citus_unmark_object_distributed

* Replace master_update_node with citus_update_node

* Replace master_update_shard_statistics with citus_update_shard_statistics

* Replace master_update_table_statistics with citus_update_table_statistics

* Rename master_conninfo_cache_invalidate to citus_conninfo_cache_invalidate

Rename master_dist_local_group_cache_invalidate to citus_dist_local_group_cache_invalidate

* Replace master_copy_shard_placement with citus_copy_shard_placement

* Replace master_move_shard_placement with citus_move_shard_placement

* Rename master_dist_node_cache_invalidate to citus_dist_node_cache_invalidate

* Rename master_dist_object_cache_invalidate to citus_dist_object_cache_invalidate

* Rename master_dist_partition_cache_invalidate to citus_dist_partition_cache_invalidate

* Rename master_dist_placement_cache_invalidate to citus_dist_placement_cache_invalidate

* Rename master_dist_shard_cache_invalidate to citus_dist_shard_cache_invalidate

* Drop master_modify_multiple_shards

* Rename master_drop_all_shards to citus_drop_all_shards

* Drop master_create_distributed_table

* Drop master_create_worker_shards

* Revert old function definitions

* Add missing revoke statement for citus_disable_node
2021-01-13 12:10:43 +03:00
Marco Slot d900a7336e Automatically add placeholder record for coordinator 2021-01-08 15:09:53 +01:00
Marco Slot 597533b1ff Add citus_set_coordinator_host 2021-01-08 13:36:26 +01:00
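
A hedged usage sketch of the new UDF, assuming it takes a hostname and an optional port (both placeholders here):
```sql
SELECT citus_set_coordinator_host('coordinator.example.com', 5432);
```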
Marco Slot d9f175532b Revert "Trigger metadata sync at transaction commit"
This reverts commit a2c73bef27.
2021-01-07 10:30:00 +01:00
Hadi Moshayedi a2c73bef27 Trigger metadata sync at transaction commit 2020-12-24 08:28:38 -08:00
Ahmet Gedemenli 936775e8e3 Delete transactions when removing node
With this commit, we delete entries in pg_dist_transaction
for the primary nodes that are removed by `master_remove_node`.
2020-12-07 11:35:20 +03:00
Ahmet Gedemenli 81db4dca5c Degrade gracefully when no background workers available 2020-10-05 16:55:00 +03:00
Onder Kalaci 5d017cd123 Improve node metadata when coordinator is added
The coordinator should always be active, hasmetadata and
metadatasynced. Prevent changing those fields.
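
A hedged sketch for inspecting those fields on the coordinator's row (group 0, per the `groupid:=0` convention above):
```sql
SELECT nodeid, isactive, hasmetadata, metadatasynced
FROM pg_dist_node
WHERE groupid = 0;
```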
2020-09-21 14:53:41 +02:00
Onder Kalaci 6fc1dea85c Improve the robustness of function call delegation
Pushing down a CALL to the node on which the CALL is executed is
dangerous and could lead to infinite recursion.

When the coordinator was added as a worker, Citus was by chance preventing
this: the coordinator was marked as a "not metadatasynced" node
in pg_dist_node, which prevented CALL/function delegation from happening.

With this commit, we do the following:

  - Fix metadatasynced column for the coordinator on pg_dist_node
  - Prevent pushdown of a function/procedure to the same node on which
    the function/procedure is being executed. Today, we do not sync
    pg_dist_object (e.g., distributed function metadata) to the
    worker nodes. But even if we did, the function call delegation
    would prevent the infinite recursion.
2020-09-21 14:53:30 +02:00