Commit Graph

3723 Commits (d19d876b5f54f188155181700bae9606cb91ee6d)

Author SHA1 Message Date
Onder Kalaci af22a30b48 Use citus_finish_citus_upgrade() in the tests
We already have tests relying on citus_finalize_upgrade_to_citus11().
Now, adjust those to rely on citus_finish_citus_upgrade() and
always call citus_finish_citus_upgrade().
2022-06-13 13:15:15 +02:00
Marco Slot 36c4ec6d53 Introduce a citus_finish_citus_upgrade() function 2022-06-13 13:15:15 +02:00
Halil Ozan Akgul b255706189 Fixes the bug where undistribute can drop Citus extension 2022-05-31 16:23:28 +03:00
Onder Kalaci 89c1ccb7a5 Show that no metadata is sent when disabled 2022-05-30 13:41:06 +02:00
Onder Kalaci 7157152f6c Do not send metadata changes during add node if citus.enable_metadata_sync is set to false 2022-05-30 13:24:31 +02:00
Onder Kalaci 010a2a408e Avoid assertion failure on citus_add_node 2022-05-30 12:22:09 +02:00
Gledis Zeneli beef392f5a
Fix memory error with citus_add_node reported by valgrind test (#5967)
The error comes due to the datum jsonb in pg_dist_metadata_node.metadata being 0 in some scenarios. This is likely due to not copying the data when receiving a datum from a tuple and pg deciding to deallocate that memory when the table that the tuple was from is closed.
Also fix another place in the code that might have been susceptible to this issue.
I tested on both multi-vg and multi-1-vg and the test were successful.
2022-05-28 00:22:00 +03:00
Ahmet Gedemenli 26d927178c
Propagate dependent views upon distribution (#5950) 2022-05-26 14:23:45 +03:00
jeff-davis 74ce210f8b
Columnar: fix wraparound bug. (#5962)
columnar_vacuum_rel() now advances relfrozenxid.

Fixes #5958.
2022-05-25 07:50:48 -07:00
Burak Velioglu 1d7dda991f Create view and materialized views with right schema and owner while
altering the distributed table.

To be able to alter view's owner without enforcing sequential mode.
Alter view process functions have been udpated to use metadata
connection.
2022-05-24 15:27:30 +03:00
Gledis Zeneli 27ddb4fc8e
Do not obtain AccessShareLock before actual lock (#5965)
Do not obtain AccessShareLock before acquiring the distributed locks.

Acquiring an AccessShareLock ensures that the relations which we are trying to get a distributed lock on will not be dropped in the time between when the LOCK command is issued and the LOCK commands are send to the worker. However, this also leads to distributed deadlocks in such scenarios:

```sql
-- for dist lock acquiring order coor, w1, w2

-- on w2
LOCK t1 IN ACCESS EXLUSIVE MODE;
-- acquire AccessShareLock locally on t1 to ensure it is not dropped while we get ready to distribute the lock

      -- concurrently on w1
      LOCK t1 IN ACCESS EXLUSIVE MODE;
      -- acquire AccessShareLock locally on t1 to ensure it is not dropped while we get ready to distribute the lock
      -- acquire dist lock on coor, w1, gets blocked on local AccessShareLock on w2

-- on w2 continuation of the execution above
-- starts to acquire dist locks and gets blocked on the coor by the lock acquired by w1

-- distributed deadlock

``` 

We opt for avoiding such deadlocks with the cost of the possibility of running into errors when the relations on which we are trying to acquire locks on get dropped.
2022-05-23 13:06:38 +03:00
Onder Kalaci dd02e1755f Parallelize metadata syncing on node activate
It is often useful to be able to sync the metadata in parallel
across nodes.

Also citus_finalize_upgrade_to_citus11() uses
start_metadata_sync_to_primary_nodes() after this commit.

Note that this commit does not parallelize all pieces of node
activation or metadata syncing. Instead, it tries to parallelize
potenially large parts of metadata, which is the objects and
distributed tables (in general Citus tables).

In the future, it would be nice to sync the reference tables
in parallel across nodes.

Create ~720 distributed tables / ~23450 shards
```SQL
-- declaratively partitioned table
CREATE TABLE github_events_looooooooooooooong_name (
  event_id bigint,
  event_type text,
  event_public boolean,
  repo_id bigint,
  payload jsonb,
  repo jsonb,
  actor jsonb,
  org jsonb,
  created_at timestamp
) PARTITION BY RANGE (created_at);

SELECT create_time_partitions(
  table_name         := 'github_events_looooooooooooooong_name',
  partition_interval := '1 day',
  end_at             := now() + '24 months'
);

CREATE INDEX ON github_events_looooooooooooooong_name USING btree (event_id, event_type, event_public, repo_id);
SELECT create_distributed_table('github_events_looooooooooooooong_name', 'repo_id');

SET client_min_messages TO ERROR;

```

across 1 node: almost same as expected
```SQL

SELECT start_metadata_sync_to_primary_nodes();
Time: 15664.418 ms (00:15.664)

select start_metadata_sync_to_node(nodename,nodeport) from pg_dist_node;
Time: 14284.069 ms (00:14.284)
```

across 7 nodes: ~3.5x improvement
```SQL

SELECT start_metadata_sync_to_primary_nodes();
┌──────────────────────────────────────┐
│ start_metadata_sync_to_primary_nodes │
├──────────────────────────────────────┤
│ t                                    │
└──────────────────────────────────────┘
(1 row)

Time: 25711.192 ms (00:25.711)

-- across 7 nodes
select start_metadata_sync_to_node(nodename,nodeport) from pg_dist_node;
Time: 82126.075 ms (01:22.126)
```
2022-05-23 09:15:48 +02:00
jeff-davis a2f5b068e6
Columnar: tighten security and improve visibility. (#5922)
Move internal storage details to a separate schema with no public
access to limit the possibility for information leakage.

Create views with public access that show storage details for those
columnar tables where the user has ownership privileges. Include
mapping between relation ID and storage ID for easier interpretation.
2022-05-20 15:30:31 -07:00
Hanefi Onaldi 52541c5802 Add normalization rules for flaky isolation tests
We remove `<waiting ...>` and `<... completed>` outputs for some CREATE
INDEX CONCURRENTLY commands since they can cause flakiness in some scenarios.

Postgres calls WaitForOlderSnapshots() and this can cause CREATE INDEX
CONCURRENTLY commands for shards to get blocked by each other for brief
periods of time. The extra waits can pop-up, or they can get completed
at different lines in the output files. To remedy that, we rename those
indexes to be captured by the new normalization rule.
2022-05-21 00:55:47 +03:00
Ying Xu a1151c2395
Clear metadatacache during abort for create extension (#5907)
* Bug fix for bug #5876. Memset MetadataCacheSystem every time there is an abort

* Created an ObjectAccessHook that saves the transactionlevel of when citus was created and will clear metadatacache if that transaction level is rolled back. Added additional tests to make sure metadatacache is cleared
2022-05-20 13:47:58 -07:00
Marco Slot 7abcfac61f Add caching for functions that check the backend type 2022-05-20 19:02:37 +02:00
Marco Slot 09ec366ff5 Improve nested execution checks and add GUC to disable 2022-05-20 18:55:43 +02:00
Marco Slot e683993449 Fix prepared statement bug when switching from local to remote execution 2022-05-20 18:55:43 +02:00
jeff-davis a9f8a60007
Columnar: support relation options with ALTER TABLE. (#5935)
Columnar: support relation options with ALTER TABLE.

Use ALTER TABLE ... SET/RESET to specify relation options rather than
alter_columnar_table_set() and alter_columnar_table_reset().

Not only is this more ergonomic, but it also allows better integration
because it can be treated like DDL on a regular table. For instance,
citus can use its own ProcessUtility_hook to distribute the new
settings to the shards.

DESCRIPTION: Columnar: support relation options with ALTER TABLE.
2022-05-20 08:35:00 -07:00
Marco Slot ad5214b50c Allow distributed execution from run_command_on_* functions 2022-05-20 15:26:47 +02:00
gledis69 4731630741 Add distributing lock command support 2022-05-20 12:28:07 +03:00
Marco Slot 79d7e860e6 Add a run_command_on_coordinator function 2022-05-19 10:26:09 +02:00
Marco Slot fa9cee409c Fix downgrade scripts and add new downgrade tests 2022-05-19 10:26:09 +02:00
Ahmet Gedemenli 48d5c9a1b5 Fix schemaname qualify for rename seq stmts 2022-05-18 19:04:22 +03:00
Onder Kalaci 0596062f96 Serialize reference table modifications with node changes & restore point
With Citus MX enabled, when a reference table is modified, it does
some operations on the first worker node(e.g., acquire locks).

If node metadata is locked (via add node or create restore point),
the changes to the reference tables should be blocked.
2022-05-18 17:23:38 +02:00
Onder Kalaci 127450466e Do not warn unncessarily when a node is removed
In the past (pre-11), we allowed removing worker nodes
that had active placements for replicated distributed
table, without even checking if there are any other
replicas of the same placement.

However, with #5469, we prevent disabling nodes via a hard
error when there is the last active placement of shard, as we
do for reference tables. Note that otherwise, we'd allow
users to lose data.

As of today, the NOTICE is completely irrelevant.
2022-05-18 17:23:38 +02:00
Onder Kalaci b4dbd84743 Prevent distributed queries while disabling first worker node
First worker node has a special meaning for modifications on the replicated tables

It is used to acquire a remote lock, such that the modifications are serialized.

With this commit, we make sure that we do not let any distributed query to see a
different 'first worker node' while first worker node is disabled.

Note that, maybe implicitly mentioned above, when first worker node is disabled,
the first worker node changes, that's why we have to handle the situation.
2022-05-18 17:21:12 +02:00
Onder Kalaci db998b3d66 Adds "sync" option to citus_disable_node() UDF
Before this commit, we had:
```SQL
SELECT citus_disable_node(nodename, nodeport, force boolean DEFAULT false)
```

Where, we allow forcing to disable first worker node with
`force:=true`. However, it entails the risk for losing
data / diverging placement data etc.

With `force` flag, we control disabling the first worker node,
and with `async` flag we control whether the changes are done
via bg worker or immediately.

```SQL
SELECT citus_disable_node(nodename, nodeport, force boolean DEFAULT false, sync boolean DEFAULT false)
```

Where we can achieve all the following:

| Mode  | Data loss possibility | Can run in 2PC | Handle multiple node failures | Immediately effective |
| --- |--- |--- |--- |--- |
| force:false, sync: false  | false   | true  | true  | false |
| force:false, sync: true   | false  | false | false | true |
| force:true, sync: false   | true   | true  | true   | false |
| force:true, sync: true    | false  | false | false  | true |
2022-05-18 17:21:12 +02:00
Onder Kalaci 2cc4053fc1 Fixes a bug that prevents dropping/altering indexes
There are two problems in this area. First, when there are expressions
on the index name, we should call `transformIndexExpression()` before
generating the index name. That is what Postgres does.

Second, because of 40c24bfef9
PG 13 and PG 14 generates different names for indexes with function calls even for local PG tables.
Assume we have:
```SQL
create table t(id int);
select create_distributed_table('t', 'id');
create index ON t (my_very_boring_function(id));
```

On PG 13, the name of the index is `t_expr_idx`
```SQL
\d t
Table "public.t"
┌────────┬─────────┬───────────┬──────────┬─────────┐
│ Column │  Type   │ Collation │ Nullable │ Default │
├────────┼─────────┼───────────┼──────────┼─────────┤
│ id     │ integer │           │          │         │
└────────┴─────────┴───────────┴──────────┴─────────┘
Indexes:
    "t_expr_idx" btree (my_very_boring_function(id::bigint))
```

On PG 14, the name of the index is `t_my_very_boring_function_idx`
```SQL
\d t
 Table "public.t"
┌────────┬─────────┬───────────┬──────────┬─────────┐
│ Column │  Type   │ Collation │ Nullable │ Default │
├────────┼─────────┼───────────┼──────────┼─────────┤
│ id     │ integer │           │          │         │
└────────┴─────────┴───────────┴──────────┴─────────┘
Indexes:
    "t_my_very_boring_function_idx" btree (my_very_boring_function(id::bigint))

```

The second issue is not very critical. The important part is that
we adjust regression tests to drop all the indexes, which ensures
the index names are sane on any version.
2022-05-18 16:35:17 +02:00
Nils Dijk b71a08955a
Refactor: reduce complexity and code duplication for Object Propagation
Over time we have added significantly improved the support for objects to be propagated by Citus as to make scaling out the database more seamless. It became evident that there was a lot of code duplication that got into the codebase to implement the propagation.

This PR tries to reduce the amount of repeated code that is at most only slightly different. To make things worse, most of the differences were actually oversights instead of correct.

This Patch introduces 3 reusable sets of pre/post processing steps for respectively
 - create
 - alter
 - drop

With the use of the common functionality we should have more coherent behaviour between different supported object by Citus.

Some steps either omit the Pre or Post processing step if they would not make sense to include.

All tests pass, only 1 test needed changing, foreign servers, as the dropping of foreign servers didn't implement support for dropping multiple foreign servers at once. Given the common approach correctly supports dropping of multiple objects, either distributed or not, the test that assumed it wouldn't work was now obsolete.
2022-05-18 15:58:28 +02:00
Onder Kalaci ee45e7bfbf Mark existing views as distributed when upgrade to 11.0+
We have a mechanism which ensures that newly distributed
objects are recorded during `alter extension citus update`.

However, the logic was lacking "view"s. With this commit, we make
sure that existing views are also marked as distributed during
upgrade.
2022-05-18 15:43:17 +02:00
Nils Dijk 14c6c799f2
suppress notices when more dependencies are found (#5954)
We are nearing the 100 objects being propagated in `master_copy_shard_placement` and with the extra supported objects this gets pushed over a 100 objects.

When a 100 objects are reached for propagation a notice will be shown to the user, informing them it might take a while to finish the operation.

During testing this is not important to see. Since the message contains the exact number of objects to be propagated the tests becomes very unstable when merging community into enterprsie.

This change makes that the test output stays stable.
2022-05-18 14:31:10 +03:00
Hanefi Onaldi 313104ab9b
Grep logs for deterministic global_cancel test results (#5948) 2022-05-18 11:09:54 +03:00
Halil Ozan Akgul d171a736ab Revert "Creates new colocation for colocate_with:='none' too"
This reverts commit f74447b3b7.
2022-05-17 15:32:22 +03:00
Ahmet Gedemenli aa8f46ead0
Fix schema name bug for sequences (#5937) 2022-05-16 18:11:57 +03:00
Halil Ozan Akgul f74447b3b7 Creates new colocation for colocate_with:='none' too 2022-05-16 13:39:05 +03:00
Teja Mupparti e56fc34404 Fixes: #5787 In prepared statements, map any unused parameters
to a generic type.
2022-05-13 19:31:05 -07:00
Burak Velioglu 1875516ae9 Add ALTER VIEW support
Adds support for propagation ALTER VIEW commands to
- Change owner of view
- SET/RESET option
- Rename view and view's column name
- Change schema of the view

Since PG also supports targeting views with ALTER TABLE
commands, related code also added to direct such ALTER TABLE
commands to ALTER VIEW commands while sending them to workers.
2022-05-13 13:21:53 +03:00
Marco Slot 6fad5dc207 Add a citus_is_coordinator function 2022-05-13 10:02:52 +02:00
Ahmet Gedemenli 00e0f4d8e6 Fix alter statistics namespace name 2022-05-11 18:44:37 +03:00
Gledis Zeneli 4c6f62efc6
Switch to using LOCK instead of lock_relation_if_exists in TRUNCATE (#5930)
Breaking down #5899 into smaller PR-s

This particular PR changes the way TRUNCATE acquires distributed locks on the relations it is truncating to use the LOCK command instead of lock_relation_if_exists. This has the benefit of using pg's recursive locking logic it implements for the LOCK command instead of us having to resolve relation dependencies and lock them explicitly. While this does not directly affect truncate, it will allow us to generalize this locking logic to then log different relations where the pg recursive locking will become useful (e.g. locking views).

This implementation is a bit more complex that it needs to be due to pg not supporting locking foreign tables. We can however, still lock foreign tables with lock_relation_if_exists. So for a command:

TRUNCATE dist_table_1, dist_table_2, foreign_table_1, foreign_table_2, dist_table_3;

We generate and send the following command to all the workers in metadata:
```sql
SEL citus.enable_ddl_propagation TO FALSE;
LOCK dist_table_1, dist_table_2 IN ACCESS EXCLUSIVE MODE;
SELECT lock_relation_if_exists('foreign_table_1', 'ACCESS EXCLUSIVE');
SELECT lock_relation_if_exists('foreign_table_2', 'ACCESS EXCLUSIVE');
LOCK dist_table_3 IN ACCESS EXCLUSIVE MODE;
SEL citus.enable_ddl_propagation TO TRUE;
```

Note that we need to alternate between the lock command and lock_table_if_exists in order to preserve the TRUNCATE order of relations.
When pg supports locking foreign tables, we will be able to massive simplify this logic and send a single LOCK command.
2022-05-11 18:38:48 +03:00
Burak Velioglu 1460452442 Introduce CREATE/DROP VIEW
Adds support for propagating create/drop view commands and views to
worker node while scaling out the cluster. Since views are dropped while
converting the table type, metadata connection will be used while
propagating view commands to not switch to sequential mode.
2022-05-10 13:07:14 +03:00
Burak Velioglu 06a94d167e Use object address instead of relation id on DDLJob to decide on syncing metadata 2022-05-05 17:59:44 +03:00
Onder Kalaci f193e16a01 Refrain reading the metadata cache for all tables during upgrade
First, it is not needed. Second, in the past we had issues regarding
this: https://github.com/citusdata/citus/pull/4344

When I create 10k tables, ~120K shards, this saves
40Mb of memory during ALTER EXTENSION citus UPDATE.

Before the change:  MetadataCacheMemoryContext: 41943040 ~ 40MB
After the change:  MetadataCacheMemoryContext: 8192
2022-05-04 16:44:06 +02:00
Marco Slot ceb593c9da Convert citus.hide_shards_from_app_name_prefixes to citus.show_shards_for_app_name_prefixes 2022-05-03 14:22:13 +02:00
Jeff Davis 3e1180de78 PG15: handle extra argument to parse_analyze_varparams().
From PG commit 25751f54b8.
2022-05-02 10:12:03 -07:00
Jeff Davis b6a5617ea8 PG15: handle pg_analyze_and_rewrite_* renaming.
From PG commit 791b1b71da.
2022-05-02 10:12:03 -07:00
Jeff Davis 33ee4877d4 PG15: rename pgstat_initstats() -> pgstat_init_relation().
From PG commits bff258a273 and be902e2651.
2022-05-02 10:12:03 -07:00
Jeff Davis 033f9cfff7 PG15: update copied pg_get_object_address() code.
Account for PG commits 5a2832465fd8 and a0ffa885e478.
2022-05-02 10:12:03 -07:00
Jeff Davis bd455f42e3 PG15: handle change to SeqScan structure.
Account for PG commit 2226b4189b. The one site dependent on it can do
just as well with a Scan instead of a SeqScan.
2022-05-02 10:12:03 -07:00