Commit Graph

74 Commits (b2e9a5baf15e6fbb461bd8f5476a947c6512be03)

Author SHA1 Message Date
Onder Kalaci b2e9a5baf1 Make sure citus_is_coordinator works on read replicas 2022-07-13 14:11:18 +02:00
Onder Kalaci 7157152f6c Do not send metadata changes during add node if citus.enable_metadata_sync is set to false 2022-05-30 13:24:31 +02:00
Onder Kalaci 010a2a408e Avoid assertion failure on citus_add_node 2022-05-30 12:22:09 +02:00
Gledis Zeneli 27ddb4fc8e
Do not obtain AccessShareLock before actual lock (#5965)
Do not obtain AccessShareLock before acquiring the distributed locks.

Acquiring an AccessShareLock ensures that the relations we are trying to get a distributed lock on will not be dropped between the time the LOCK command is issued and the time the LOCK commands are sent to the workers. However, this also leads to distributed deadlocks in scenarios such as the following:

```sql
-- distributed lock acquisition order: coor, w1, w2

-- on w2
LOCK t1 IN ACCESS EXCLUSIVE MODE;
-- acquire AccessShareLock locally on t1 to ensure it is not dropped while we get ready to distribute the lock

      -- concurrently on w1
      LOCK t1 IN ACCESS EXCLUSIVE MODE;
      -- acquire AccessShareLock locally on t1 to ensure it is not dropped while we get ready to distribute the lock
      -- acquire dist lock on coor, w1, gets blocked on local AccessShareLock on w2

-- on w2 continuation of the execution above
-- starts to acquire dist locks and gets blocked on the coor by the lock acquired by w1

-- distributed deadlock

``` 

We opt to avoid such deadlocks at the cost of possibly running into errors when the relations we are trying to lock get dropped.
2022-05-23 13:06:38 +03:00
Onder Kalaci dd02e1755f Parallelize metadata syncing on node activate
It is often useful to be able to sync the metadata in parallel
across nodes.

Also citus_finalize_upgrade_to_citus11() uses
start_metadata_sync_to_primary_nodes() after this commit.

Note that this commit does not parallelize all pieces of node
activation or metadata syncing. Instead, it tries to parallelize
the potentially large parts of the metadata, namely the objects and
distributed tables (in general, Citus tables).

In the future, it would be nice to sync the reference tables
in parallel across nodes.

Create ~720 distributed tables / ~23450 shards
```SQL
-- declaratively partitioned table
CREATE TABLE github_events_looooooooooooooong_name (
  event_id bigint,
  event_type text,
  event_public boolean,
  repo_id bigint,
  payload jsonb,
  repo jsonb,
  actor jsonb,
  org jsonb,
  created_at timestamp
) PARTITION BY RANGE (created_at);

SELECT create_time_partitions(
  table_name         := 'github_events_looooooooooooooong_name',
  partition_interval := '1 day',
  end_at             := now() + '24 months'
);

CREATE INDEX ON github_events_looooooooooooooong_name USING btree (event_id, event_type, event_public, repo_id);
SELECT create_distributed_table('github_events_looooooooooooooong_name', 'repo_id');

SET client_min_messages TO ERROR;

```

across 1 node: almost the same, as expected
```SQL

SELECT start_metadata_sync_to_primary_nodes();
Time: 15664.418 ms (00:15.664)

select start_metadata_sync_to_node(nodename,nodeport) from pg_dist_node;
Time: 14284.069 ms (00:14.284)
```

across 7 nodes: ~3.5x improvement
```SQL

SELECT start_metadata_sync_to_primary_nodes();
┌──────────────────────────────────────┐
│ start_metadata_sync_to_primary_nodes │
├──────────────────────────────────────┤
│ t                                    │
└──────────────────────────────────────┘
(1 row)

Time: 25711.192 ms (00:25.711)

-- across 7 nodes
select start_metadata_sync_to_node(nodename,nodeport) from pg_dist_node;
Time: 82126.075 ms (01:22.126)
```
2022-05-23 09:15:48 +02:00
Onder Kalaci 127450466e Do not warn unnecessarily when a node is removed
In the past (pre-11), we allowed removing worker nodes
that had active placements for replicated distributed
tables, without even checking whether there were any other
replicas of the same placement.

However, with #5469, we prevent disabling a node via a hard
error when it holds the last active placement of a shard, as we
do for reference tables. Note that otherwise, we'd allow
users to lose data.

As of today, the NOTICE is completely irrelevant.
2022-05-18 17:23:38 +02:00
Onder Kalaci b4dbd84743 Prevent distributed queries while disabling first worker node
The first worker node has a special meaning for modifications on replicated tables.

It is used to acquire a remote lock, such that the modifications are serialized.

With this commit, we make sure that we do not let any distributed query see a
different 'first worker node' while the first worker node is being disabled.

Note that, as implied above, when the first worker node is disabled,
a different node becomes the first worker node; that's why we have to handle the situation.
2022-05-18 17:21:12 +02:00
Onder Kalaci db998b3d66 Adds "sync" option to citus_disable_node() UDF
Before this commit, we had:
```SQL
SELECT citus_disable_node(nodename, nodeport, force boolean DEFAULT false)
```

Here, we allow forcing the disabling of the first worker node with
`force:=true`. However, that entails the risk of losing
data / diverging placement data, etc.

With the `force` flag, we control disabling the first worker node,
and with the `sync` flag we control whether the changes are applied
via the bg worker or immediately.

```SQL
SELECT citus_disable_node(nodename, nodeport, force boolean DEFAULT false, sync boolean DEFAULT false)
```

With this, we can achieve all of the following:

| Mode  | Data loss possibility | Can run in 2PC | Handle multiple node failures | Immediately effective |
| --- |--- |--- |--- |--- |
| force:false, sync: false  | false   | true  | true  | false |
| force:false, sync: true   | false  | false | false | true |
| force:true, sync: false   | true   | true  | true   | false |
| force:true, sync: true    | false  | false | false  | true |
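
A hedged usage sketch based on the signature above ('worker-1' and 5432 are placeholder values):
```sql
-- disable a worker and make the change effective immediately; per the
-- table above, sync:=true cannot run in 2PC or handle multiple node failures
SELECT citus_disable_node('worker-1', 5432, force := false, sync := true);
```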
2022-05-18 17:21:12 +02:00
Marco Slot 6fad5dc207 Add a citus_is_coordinator function 2022-05-13 10:02:52 +02:00
Onder Kalaci 5fc7661169 Do not set coordinator's metadatasynced column to false
after a disable_node call.
2022-04-25 09:25:59 +02:00
Onder Kalaci b0b91bab04 Rename metadata sync to node metadata sync where applicable 2022-04-07 17:51:31 +02:00
Halil Ozan Akgul 4dbc760603 Introduces citus_coordinator_node_id 2022-03-22 10:34:22 +03:00
Gledis Zeneli 56ab64b747
Patches #5758 with some more error checks (#5804)
Add error checks to detect failed connections, and don't ping secondary nodes when detecting self-reference.
2022-03-15 15:02:47 +03:00
Gledis Zeneli 2cb02bfb56
Fix node adding itself with citus_add_node leading to deadlock (Fix #5720) (#5758)
When a worker node is being added, a command is sent to get the server_id of the worker from the pg_dist_node_metadata table. If the worker's id is the same as that of the node executing the code, we know the node is trying to add itself. If the node tries to add itself without specifying `groupid:=0`, the operation results in an error.
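
A hedged sketch of both outcomes described above (the hostname is a placeholder):
```sql
-- run on the node itself: without groupid this now errors out instead of deadlocking
SELECT citus_add_node('coordinator-host', 5432);

-- explicitly registering the current node as the coordinator succeeds
SELECT citus_add_node('coordinator-host', 5432, groupid := 0);
```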
2022-03-10 17:46:33 +03:00
Halil Ozan Akgül 333bcc7948
Global PID Helper Functions (#5768)
* Introduces citus_nodename_for_nodeid and citus_nodeport_for_nodeid functions

* Introduces citus_nodeid_for_gpid and citus_pid_for_gpid functions (usage sketch below)

* Add tests
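
A hedged usage sketch of these helpers; the nodeid and global PID values are illustrative (a global PID encodes a node id and a backend PID):
```sql
-- look up the host and port of a node by its id
SELECT citus_nodename_for_nodeid(1), citus_nodeport_for_nodeid(1);

-- split a global PID back into its node id and backend PID
SELECT citus_nodeid_for_gpid(10000000123), citus_pid_for_gpid(10000000123);
```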
2022-03-09 13:15:59 +03:00
Marco Slot 3ba61244b8 Synchronize pg_dist_colocation metadata 2022-03-03 11:01:59 +01:00
Onder Kalaci df95d59e33 Drop support for CitusInitiatedBackend
CitusInitiatedBackend was a premature implementation of the whole
GlobalPID infrastructure. We used it to track whether an individual
query was triggered by Citus or not.

Now that GlobalPID is in place, we don't need
CitusInitiatedBackend; in fact, it could even be wrong.
2022-02-24 12:12:43 +01:00
Hanefi Onaldi f4e8af2c22
Do not acquire locks on node metadata explicitly 2022-02-24 03:19:56 +03:00
Halil Ozan Akgul f6cd4d0f07 Overrides pg_cancel_backend and pg_terminate_backend to accept global pid 2022-02-21 16:41:35 +03:00
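
A hedged usage sketch of the overridden functions (the global PID value is illustrative):
```sql
-- cancel or terminate a backend anywhere in the cluster by its global PID
SELECT pg_cancel_backend(10000000123);
SELECT pg_terminate_backend(10000000123);
```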
Onder Kalaci ff234fbfd2 Unify old GUCs into a single one
Replaces citus.enable_object_propagation with citus.enable_metadata_sync

Also, within Citus 11 release cycle, we added citus.enable_metadata_sync_by_default,
that is also replaced with citus.enable_metadata_sync.

In essence, when citus.enable_metadata_sync is set to true, all the objects
and the metadata are sent to the remote nodes.

We strongly advise that users never change the value of
this GUC.
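
A hedged sketch of inspecting the unified GUC; changing it is discouraged, per the note above:
```sql
SHOW citus.enable_metadata_sync;

-- strongly discouraged; shown only to illustrate the unified switch
SET citus.enable_metadata_sync TO false;
```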
2022-02-04 10:52:56 +01:00
Onder Kalaci 650243927c Relax some transactional limitations on activate node
We already enforce EnsureSequentialModeMetadataOperations(), and given that node activation is transactional, we should be fine.
2022-02-01 15:56:55 +01:00
Burak Velioglu f88cc230bf
Handle tables and objects as metadata. Update UDFs accordingly
With this commit we've started to propagate sequences and shell
tables within the object dependency resolution. So, ensuring the
dependencies of any object will now consider shell tables and sequences
as well. The separate logic for shell tables and for sequences has
been removed.

Since both the shell table and sequence logic were previously implemented
as part of the metadata handling, we were propagating
them while syncing table metadata. With this commit we've divided
the metadata syncing logic (where metadata means anything except shards)
into multiple parts and implemented them as part of
ActivateNode. You can inspect the functions called in ActivateNode
to see how the different pieces of metadata are defined.

Definitions of start_metadata_sync_to_node and citus_activate_node
have also been updated. citus_activate_node will basically create
an active node with all metadata and reference table shards.
start_metadata_sync_to_node will be the same as citus_activate_node,
except that it does not replicate reference tables. stop_metadata_sync_to_node
will remove all the metadata. All of those UDFs need to be called
by a superuser.
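
A hedged sketch of the updated UDFs as described above ('worker-1' and 5432 are placeholders; all of them require superuser):
```sql
-- create an active node with all metadata and reference table shards
SELECT citus_activate_node('worker-1', 5432);

-- same as citus_activate_node, minus replicating reference tables
SELECT start_metadata_sync_to_node('worker-1', 5432);

-- remove all the metadata again
SELECT stop_metadata_sync_to_node('worker-1', 5432);
```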
2022-01-31 16:20:15 +03:00
Önder Kalacı 885601c02c
Require superuser while activating a node (#5609)
* Require superuser while activating a node

With this change, ActivateNode() (hence citus_add_node() and
citus_activate_node()) explicitly requires a superuser.

Before this commit, these functions were designed to work with
non-superuser roles, given the relevant GRANTs.

However, that is not a widely used way of calling the functions
above.

Due to the possibility of non-superusers calling the UDFs, they were
designed in a way that some commands used additional
short-lived superuser connections. That was:
	(a) breaking transactional behavior (e.g., ROLLBACK
	    wouldn't fully roll back the whole transaction)
	(b) making it very complicated to reason about which
	    parts of the node activation go over which connections,
	    and becoming vulnerable to deadlocks / visibility issues.
2022-01-10 08:30:13 -08:00
Onder Kalaci 9f2d9e1487 Move placement deletion from disable node to activate node
We prefer the background daemon to only sync node metadata. That's
why we move placement metadata changes from disable node to
activate node. With that, we can make sure that disable node
only changes node metadata, whereas activate node syncs all
the metadata changes. In essence, we already expect all
nodes to be up when a node is activated. So, this does not change
the behavior much.
2022-01-07 09:56:03 +01:00
Önder Kalacı 0a8b0b06c6
Do not allow distributed functions on non-metadata synced nodes (#5586)
Before this commit, Citus was triggering metadata syncing
in the background when a function was distributed. However,
with Citus 11, we expect all clusters to have metadata syncing
enabled. So, we do not expect any nodes not to have the metadata.

This change:
	(a) pro: simplifies the code and opens up possibilities
	    to simplify further by reducing the scope of the
	    bg worker to only syncing node metadata
	(b) pro: explicitly asks users to sync the metadata (see the
	    sketch below) such that any unforeseen impact can be
	    easily detected
	(c) con: for distributed functions without a distribution
	    argument, we do not necessarily require the metadata
	    to be synced. However, for completeness and simplicity, we
	    do so.
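
A hedged sketch of the explicit sync this change asks for, following the pattern used in earlier commits:
```sql
-- sync metadata to every node before relying on distributed functions
SELECT start_metadata_sync_to_node(nodename, nodeport) FROM pg_dist_node;
```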
2022-01-04 13:12:57 +01:00
Onder Kalaci 549edcabb6 Allow disabling node(s) when multiple failures happen
As of the master branch, Citus does all modifications to replicated tables
(e.g., reference tables and distributed tables with replication factor > 1)
via 2PC and avoids any shardstate=3. As a side-effect of those changes,
the handling of node failures for replicated tables changes.

With this PR, when one (or multiple) node failures happen, users
see query errors on modifications. If the problem is intermittent, that's OK;
once the node failure(s) recover by themselves, the modification queries
succeed. If the node failure(s) are permanent, the users should call
`SELECT citus_disable_node(...)` to disable the node. As soon as the node is
disabled, modifications start to succeed. However, the old node now falls
behind. It means that, when the node is up again, the placements should be
re-created on it. First, use `SELECT citus_activate_node()`. Then, use
`SELECT replicate_table_shards(...)` to replicate the missing placements on
the re-activated node.
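
A hedged sketch of the recovery sequence described above (hostname, port, and table name are placeholders):
```sql
-- 1. the failure is permanent: disable the node so modifications succeed again
SELECT citus_disable_node('worker-2', 5432);

-- 2. once the node is back up, re-activate it
SELECT citus_activate_node('worker-2', 5432);

-- 3. re-create the missing placements on the re-activated node
SELECT replicate_table_shards('my_replicated_table');
```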
2021-12-01 10:19:48 +01:00
Onder Kalaci d405993b57 Make sure to use a dedicated metadata connection
With this commit, we make sure to use a dedicated connection per
node for all the metadata operations within the same transaction.

This is needed because the same metadata (e.g., the metadata includes
the distributed tables on the workers) could otherwise be modified across
multiple connections.

With this change we guarantee that there is a single metadata connection
per node. But note that this connection can also be used for other
operations; in other words, it is not reserved exclusively for metadata
operations.
2021-11-26 14:36:28 +01:00
Onder Kalaci 38b08ebde9 Generalize the error checks while removing node
The checks that prevent removing a node are very much reference-table
centric. We are soon going to add the same checks for replicated
tables. So, make the checks generic such that:
	 (a) replicated tables fit naturally
	 (b) we can use the same checks in `citus_disable_node`.
2021-11-26 14:25:29 +01:00
Halil Ozan Akgul 91b377490b Fix multi_cluster_management fails for metadata syncing 2021-11-04 11:09:21 +03:00
Halil Ozan Akgul 9c9d4b5eeb Turn MX on by default 2021-10-08 18:17:21 +03:00
Halil Ozan Akgul 43d5853b6d Fixes function names in comments 2021-10-06 09:24:43 +03:00
Hanefi Onaldi 7e39c7ea83
Replace master with citus in logs and comments (#5210)
I replaced 

- master_add_node,
- master_add_inactive_node
- master_activate_node

with

- citus_add_node,
- citus_add_inactive_node
- citus_activate_node

respectively.
2021-08-26 11:31:17 +03:00
Ahmet Gedemenli 9e90894f21
Synchronize hasmetadata flag on mx workers (#5086)
* Synchronize hasmetadata flag on mx workers

* Switch to sequential execution

* Add test

* Use SetWorkerColumn

* Add test for stop_sync

* Remove usage of UpdateHasmetadataOnWorkersWithMetadata

* Remove MarkNodeMetadataSynced

* Fix test for metadatasynced

* Remove MarkNodeMetadataSynced

* Style

* Remove MarkNodeHasMetadata

* Remove UpdateDistNodeBoolAttr

* Refactor SetWorkerColumn

* Use SetWorkerColumnLocalOnly when setting up dependencies

* Use SetWorkerColumnLocalOnly in TriggerSyncMetadataToPrimaryNodes

* Style

* Make update command generator functions static

* Set metadatasynced before syncing

* Call SetWorkerColumn only if the sync is successful

* Try to sync all nodes

* Fix indexno

* Update metadatasynced locally first

* Break if a node fails to sync metadata

* Send worker commands optional

* Style & Rebase

* Add raiseOnError param to SetWorkerColumn

* Style

* Set metadatasynced for all metadata nodes

* Style

* Introduce SetWorkerColumnOptional

* Polish

* Style

* Dont send set command to not synced metadata nodes

* Style

* Polish

* Add test for stop_sync

* Add test for shouldhaveshards

* Add test for isactive flag

* Sort by placementid in the function verify_metadata

* Cover edge cases for failing nodes

* Add comments

* Add nodeport to isactive test

* Add warning if metadata out of sync

* Update warning message
2021-08-12 14:16:18 +03:00
Onder Kalaci 5f02d18ef8 transactional metadata sync for the maintenance daemon
As we use the current user to sync the metadata to the nodes
with #5105 (and many other PRs), there is nothing that
prevents us from using the coordinated transaction for metadata syncing.

This commit also renames a few functions to reflect their actual
implementation.
2021-08-09 10:34:55 +02:00
Marco Slot c03729ad03 Only warn about reference tables when removing last node 2021-06-01 10:53:12 +02:00
Jelte Fennema b1cad26ebc Move CheckCitusVersion to the top of each function
Previously this was usually done after argument parsing. This can cause
SEGFAULTs if the number or type of arguments changes in a new version.
By checking that the Citus version is correct before doing any argument
parsing, we protect against these types of issues. Issues like this have
occurred in pg_auto_failover, so it's not just a theoretical concern.

The main reason why these calls were not at the top of functions is
really just historical. It was because in the past we didn't allow
statements before declarations. Thus having this check before the
argument parsing would have only been possible if we first declared all
variables.

In addition to moving existing CheckCitusVersion calls, this also adds
them to rebalancer-related functions (they were missing there).
2021-06-01 17:43:46 +02:00
SaitTalhaNisanci 8c3f85692d
Do not consider old placements when disabling or removing a node (#4960)
* Do not consider old placements when disabling or removing a node

* update cluster test
2021-05-28 22:38:20 +02:00
Nils Dijk c91f8d8a15
Feature: localhost guc (#4836)
DESCRIPTION: introduce `citus.local_hostname` GUC for connections to the current node

Citus once in a while needs to connect to itself for some system operations. This used to be hardcoded to `localhost`. The hardcoded hostname causes some issues, for example in environments where `sslmode=verify-full` is required. It is not always desirable or even feasible to get `localhost` as an alt name on the certificate.

By introducing a GUC to use when connecting to the current instance, the user has more control over which network path is used and which hostname is required to be present in the server certificate.
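
A hedged configuration sketch; the hostname is a placeholder and must appear in the server certificate when `sslmode=verify-full` is used:
```sql
ALTER SYSTEM SET citus.local_hostname TO 'node-1.internal.example.com';
SELECT pg_reload_conf();
```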
2021-05-12 16:59:44 +02:00
Marco Slot f25de6a0e3 Try to return earlier in idempotent master_add_node 2021-03-02 21:22:47 +01:00
Onder Kalaci fc9a23792c COPY uses adaptive connection management on local node
With #4338, the executor is smart enough to fail over to
the local node if there is not enough room in max_connections
for remote connections.

For COPY, the logic is different. With #4034, we made COPY
work with the adaptive connection management slightly
differently. The cause of the difference is that COPY doesn't
know which placements are going to be accessed, and hence requires
getting connections up-front.

Similarly, COPY decides to use local execution up-front.

With this commit, we change the logic for COPY on local nodes:

Try to reserve a connection to the local host. This follows
the same logic (e.g., citus.local_shared_pool_size) as the
executor, because COPY also relies on TryToIncrementSharedConnectionCounter().
If the reservation to the local node fails, switch to local execution.
Apart from this, if local execution is disabled, we follow the
exact same logic as multi-node Citus; it means that if we are
out of connections, we give an error.
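
A hedged sketch of the GUC involved (the value is illustrative):
```sql
-- cap the connections COPY/the executor may reserve to the local node;
-- when the reservation fails, COPY falls back to local execution
ALTER SYSTEM SET citus.local_shared_pool_size TO 50;
SELECT pg_reload_conf();
```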
2021-02-04 09:45:07 +01:00
Hadi Moshayedi bc01c795a2 Reland #4419 2021-01-19 07:48:47 -08:00
Ahmet Gedemenli 436c9d9d79
Remove the word 'master' from Citus UDFs (#4472)
* Replace master_add_node with citus_add_node

* Replace master_activate_node with citus_activate_node

* Replace master_add_inactive_node with citus_add_inactive_node

* Use master udfs in old scripts

* Replace master_add_secondary_node with citus_add_secondary_node

* Replace master_disable_node with citus_disable_node

* Replace master_drain_node with citus_drain_node

* Replace master_remove_node with citus_remove_node

* Replace master_set_node_property with citus_set_node_property

* Replace master_unmark_object_distributed with citus_unmark_object_distributed

* Replace master_update_node with citus_update_node

* Replace master_update_shard_statistics with citus_update_shard_statistics

* Replace master_update_table_statistics with citus_update_table_statistics

* Rename master_conninfo_cache_invalidate to citus_conninfo_cache_invalidate

Rename master_dist_local_group_cache_invalidate to citus_dist_local_group_cache_invalidate

* Replace master_copy_shard_placement with citus_copy_shard_placement

* Replace master_move_shard_placement with citus_move_shard_placement

* Rename master_dist_node_cache_invalidate to citus_dist_node_cache_invalidate

* Rename master_dist_object_cache_invalidate to citus_dist_object_cache_invalidate

* Rename master_dist_partition_cache_invalidate to citus_dist_partition_cache_invalidate

* Rename master_dist_placement_cache_invalidate to citus_dist_placement_cache_invalidate

* Rename master_dist_shard_cache_invalidate to citus_dist_shard_cache_invalidate

* Drop master_modify_multiple_shards

* Rename master_drop_all_shards to citus_drop_all_shards

* Drop master_create_distributed_table

* Drop master_create_worker_shards

* Revert old function definitions

* Add missing revoke statement for citus_disable_node
2021-01-13 12:10:43 +03:00
Marco Slot d900a7336e Automatically add placeholder record for coordinator 2021-01-08 15:09:53 +01:00
Marco Slot 597533b1ff Add citus_set_coordinator_host 2021-01-08 13:36:26 +01:00
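
A hedged usage sketch of the new UDF, assuming it takes a hostname and an optional port (both placeholders here):
```sql
SELECT citus_set_coordinator_host('coordinator.example.com', 5432);
```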
Marco Slot d9f175532b Revert "Trigger metadata sync at transaction commit"
This reverts commit a2c73bef27.
2021-01-07 10:30:00 +01:00
Hadi Moshayedi a2c73bef27 Trigger metadata sync at transaction commit 2020-12-24 08:28:38 -08:00
Ahmet Gedemenli 936775e8e3 Delete transactions when removing node
With this commit, we delete entries in pg_dist_transaction
for the primary nodes that are removed by `master_remove_node`.
2020-12-07 11:35:20 +03:00
Ahmet Gedemenli 81db4dca5c Degrade gracefully when no background workers available 2020-10-05 16:55:00 +03:00
Onder Kalaci 5d017cd123 Improve node metadata when coordinator is added
The coordinator should always be active, hasmetadata and
metadatasynced. Prevent changing those fields.
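
A hedged sketch for inspecting those fields on the coordinator's row (group 0, per the `groupid:=0` convention above):
```sql
SELECT nodeid, isactive, hasmetadata, metadatasynced
FROM pg_dist_node
WHERE groupid = 0;
```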
2020-09-21 14:53:41 +02:00
Onder Kalaci 6fc1dea85c Improve the robustness of function call delegation
Pushing down a CALL to the node on which the CALL is executed is
dangerous and could lead to infinite recursion.

When the coordinator was added as a worker, Citus was by chance preventing
this: the coordinator was marked as a "not metadatasynced" node
in pg_dist_node, which prevented CALL/function delegation from happening.

With this commit, we do the following:

  - Fix metadatasynced column for the coordinator on pg_dist_node
  - Prevent pushdown of a function/procedure to the same node on which
    the function/procedure is being executed. Today, we do not sync
    pg_dist_object (e.g., distributed function metadata) to the
    worker nodes. But even if we did, the function call delegation
    would prevent the infinite recursion.
2020-09-21 14:53:30 +02:00