Commit Graph

106 Commits (0eee7fd9b8cc86d704687babbe6533c0884a044d)

Author SHA1 Message Date
Hanefi Onaldi 0eee7fd9b8
Fix downgrade scripts from 11.0-2 to 11.0-1
(cherry picked from commit f60809a6c1)

Conflicts:
	src/test/regress/expected/multi_extension.out
	src/test/regress/sql/multi_extension.sql
2022-06-29 22:52:07 +03:00
Önder Kalacı 03a4305e06
Fixes a bug that prevents upgrades when there are no worker nodes (#6037)
(cherry picked from commit bab4c0a8c3)
2022-06-29 14:36:24 +03:00
Marco Slot 4bcffce036 Introduce a citus_finish_citus_upgrade() function 2022-06-13 13:28:31 +02:00
Onder Kalaci 8b0499c91a Parallelize metadata syncing on node activate
It is often useful to be able to sync the metadata in parallel
across nodes.

Also citus_finalize_upgrade_to_citus11() uses
start_metadata_sync_to_primary_nodes() after this commit.

Note that this commit does not parallelize all pieces of node
activation or metadata syncing. Instead, it tries to parallelize
potenially large parts of metadata, which is the objects and
distributed tables (in general Citus tables).

In the future, it would be nice to sync the reference tables
in parallel across nodes.

Create ~720 distributed tables / ~23450 shards
```SQL
-- declaratively partitioned table
CREATE TABLE github_events_looooooooooooooong_name (
  event_id bigint,
  event_type text,
  event_public boolean,
  repo_id bigint,
  payload jsonb,
  repo jsonb,
  actor jsonb,
  org jsonb,
  created_at timestamp
) PARTITION BY RANGE (created_at);

SELECT create_time_partitions(
  table_name         := 'github_events_looooooooooooooong_name',
  partition_interval := '1 day',
  end_at             := now() + '24 months'
);

CREATE INDEX ON github_events_looooooooooooooong_name USING btree (event_id, event_type, event_public, repo_id);
SELECT create_distributed_table('github_events_looooooooooooooong_name', 'repo_id');

SET client_min_messages TO ERROR;

```

across 1 node: almost same as expected
```SQL

SELECT start_metadata_sync_to_primary_nodes();
Time: 15664.418 ms (00:15.664)

select start_metadata_sync_to_node(nodename,nodeport) from pg_dist_node;
Time: 14284.069 ms (00:14.284)
```

across 7 nodes: ~3.5x improvement
```SQL

SELECT start_metadata_sync_to_primary_nodes();
┌──────────────────────────────────────┐
│ start_metadata_sync_to_primary_nodes │
├──────────────────────────────────────┤
│ t                                    │
└──────────────────────────────────────┘
(1 row)

Time: 25711.192 ms (00:25.711)

-- across 7 nodes
select start_metadata_sync_to_node(nodename,nodeport) from pg_dist_node;
Time: 82126.075 ms (01:22.126)
```

(cherry picked from commit dd02e1755f)
2022-05-23 09:25:31 +02:00
Onder Kalaci 5fe384329e Adds "sync" option to citus_disable_node() UDF 2022-05-19 11:00:51 +02:00
Marco Slot c20732142e Add a run_command_on_coordinator function 2022-05-19 10:41:10 +02:00
Marco Slot 082a14656d Fix downgrade scripts and add new downgrade tests 2022-05-19 10:37:56 +02:00
Marco Slot e8b41d1e5b Convert citus.hide_shards_from_app_name_prefixes to citus.show_shards_for_app_name_prefixes 2022-05-05 13:24:23 +02:00
Halil Ozan Akgul 4dbc760603 Introduces citus_coordinator_node_id 2022-03-22 10:34:22 +03:00
Halil Ozan Akgül 333bcc7948
Global PID Helper Functions (#5768)
* Introduces citus_nodename_for_nodeid and citus_nodeport_for_nodeid functions

* Introduces citus_nodeid_for_gpid and citus_pid_for_gpid functions

* Add tests
2022-03-09 13:15:59 +03:00
Onder Kalaci c32b2de1a7 Improve citus_lock_waits
1) Remove useless columns
2) Show backends that are blocked on a DDL even before
   gpid is assigned
3) One minor bugfix, where we clear distributedCommandOriginator
   properly.
2022-03-07 11:10:44 +01:00
Nils Dijk 3801576dfb
Move pg_dist_object to pg_catalog (#5765)
DESCRIPTION: Move pg_dist_object to pg_catalog

Historically `pg_dist_object` had been created in the `citus` schema as an experiment to understand if we could move our catalog tables to a branded schema. We quickly realised that this interfered with the UX on our managed services and other environments, where users connected via a user with the name of `citus`.

By default postgres put the username on the search_path. To be able to read the catalog in the `citus` schema we would need to grant access permissions to the schema. This caused newly created objects like tables etc, to default to this schema for creation. This failed due to the write permissions to that schema.

With this change we move the `pg_dist_object` catalog table to the `pg_catalog` schema, where our other schema's are also located. This makes the catalog table visible and readable by any user, like our other catalog tables, for debugging purposes.

Note: due to the change of schema, we had to disable 1 test that was running into a discrepancy between the schema and binary. Secondly, we needed to make the lookup functions for the `pg_dist_object` relation and their indexes less strict on the fallback of the naming due to an other test that, due to an unfortunate cache invalidation, needed to lookup the relation again. This makes that we won't default to _only_ resolving from `pg_catalog` outside of upgrades.
2022-03-04 17:40:38 +00:00
Halil Ozan Akgul 0500a62515 Updates citus_dist_stat_activity to use citus_stat_activity 2022-03-04 17:28:17 +03:00
Onder Kalaci c7b67ba0ea Add citus_backend_gpid()
And also citus_calculate_gpid(nodeId,pid). These UDFs are just
wrappers for the existing functions. Useful for testing and simple
manipulation of citus_stat_activity.
2022-03-03 15:29:40 +01:00
Halil Ozan Akgul 06a0509b1a Introduces citus_stat_activity view 2022-03-03 16:19:20 +03:00
Marco Slot 3ba61244b8 Synchronize pg_dist_colocation metadata 2022-03-03 11:01:59 +01:00
Onder Kalaci 35ec9721b4 Add a new API for enabling Citus MX for clusters upgrading from earlier versions
Clusters created pre-Citus 11 mostly didn't have metadata sync enabled.
For those clusters, we add a utility UDF which fixes some minor issues
and sync the necessary objects to the workers.
2022-03-02 17:02:55 +01:00
Marco Slot 0c4e3cb69c Drop worker_partition_query_result on downgrade 2022-02-24 10:18:56 +01:00
Marco Slot 72d8fde28b Use intermediate results for re-partition joins 2022-02-23 19:40:21 +01:00
Onder Kalaci dffcafc096 Use global pids in citus_lock_waits 2022-02-21 17:46:34 +01:00
Onder Kalaci 331af3dce8 Dumping wait edges becomes optionally scan all backends
Before this commit, dumping wait edges can only be used for
distributed deadlock detection purposes. With this commit,
we open the possibility that we can use it for any backend.
2022-02-21 17:37:07 +01:00
Halil Ozan Akgul f6cd4d0f07 Overrides pg_cancel_backend and pg_terminate_backend to accept global pid 2022-02-21 16:41:35 +03:00
Nils Dijk ea86f9f94e
Add support for TEXT SEARCH CONFIGURATION objects (#5685)
DESCRIPTION: Implement TEXT SEARCH CONFIGURATION propagation

The change adds support to Citus for propagating TEXT SEARCH CONFIGURATION objects. TSConfig objects cannot always be created in one create statement, and instead require a create statement followed by many alter statements to get turned into the object they should represent.

To support this we add functionality to the worker to create or replace objects based on a list of statements. When the lists of the local object and the remote object correspond 1:1 we skip the creation of the object and simply mark it distributed. This is especially important for TSConfig objects as initdb pre-populates databases with a dozen configurations (for many different languages).

When the user creates a new TSConfig based on the copy of an existing configuration there is no direct link to the object copied from. Since there is no link we can't simply rely on propagating the dependencies to the worker and send a qualified
2022-02-17 13:12:46 +01:00
Halil Ozan Akgul 8ee02b29d0 Introduce global PID 2022-02-08 16:49:38 +03:00
Burak Velioglu f88cc230bf
Handle tables and objects as metadata. Update UDFs accordingly
With this commit we've started to propagate sequences and shell
tables within the object dependency resolution. So, ensuring any
dependencies for any object will consider shell tables and sequences
as well. Separate logics for both shell tables and sequences have
been removed.

Since both shell tables and sequences logic were implemented as a
part of the metadata handling before that logic, we were propagating
them while syncing table metadata. With this commit we've divided
metadata (which means anything except shards thereafter) syncing
logic into multiple parts and implemented it either as a part of
ActivateNode. You can check the functions called in ActivateNode
to check definition of different metadata.

Definitions of start_metadata_sync_to_node and citus_activate_node
have also been updated. citus_activate_node will basically create
an active node with all metadata and reference table shards.
start_metadata_sync_to_node will be same with citus_activate_node
except replicating reference tables. stop_metadata_sync_to_node
will remove all the metadata. All of those UDFs need to be called
by superuser.
2022-01-31 16:20:15 +03:00
Teja Mupparti 54862f8c22 (1) Functions will be delegated even when present in the scope of an explicit
BEGIN/COMMIT transaction block or in a UDF calling another UDF.
(2) Prohibit/Limit the delegated function not to do a 2PC (or any work on a
remote connection).
(3) Have a safety net to ensure the (2) i.e. we should block the connections
from the delegated procedure or make sure that no 2PC happens on the node.
(4) Such delegated functions are restricted to use only the distributed argument
value.

Note: To limit the scope of the project we are considering only Functions(not
procedures) for the initial work.

DESCRIPTION: Introduce a new flag "force_delegation" in create_distributed_function(),
which will allow a function to be delegated in an explicit transaction block.

Fixes #3265

Once the function is delegated to the worker, on that node during the planning

distributed_planner()
TryToDelegateFunctionCall()
CheckDelegatedFunctionExecution()
EnableInForceDelegatedFuncExecution()
Save the distribution argument (Constant)
ExecutorStart()
CitusBeginScan()
IsShardKeyValueAllowed()
Ensure to not use non-distribution argument.

ExecutorRun()
AdaptiveExecutor()
StartDistributedExecution()
EnsureNoRemoteExecutionFromWorkers()
Ensure all the shards are local to the node in the remoteTaskList.
NonPushableInsertSelectExecScan()
InitializeCopyShardState()
EnsureNoRemoteExecutionFromWorkers()
Ensure all the shards are local to the node in the placementList.

This also fixes a minor issue: Properly handle expressions+parameters in distribution arguments
2022-01-19 16:43:33 -08:00
Marco Slot 33bfa0b191 Hide shards from application_name's with a specific prefix 2022-01-18 15:20:55 +04:00
Önder Kalacı 5305aa4246
Do not drop sequences when dropping metadata (#5584)
Dropping sequences means we need to recreate
and hence losing the sequence.

With this commit, we keep the existing sequences
such that resyncing wouldn't drop the sequence.

We do that by breaking the dependency of the sequence
from the table.
2022-01-06 09:48:34 +01:00
Önder Kalacı c9127f921f
Avoid round trips while fixing index names (#5549)
With this commit, fix_partition_shard_index_names()
works significantly faster.

For example,

32 shards, 365 partitions, 5 indexes drop from ~120 seconds to ~44 seconds
32 shards, 1095 partitions, 5 indexes drop from ~600 seconds to ~265 seconds

`queryStringList` can be really long, because it may contain #partitions * #indexes entries.

Before this change, we were actually going through the executor where each command
in the query string triggers 1 round trip per entry in queryStringList.

The aim of this commit is to avoid the round-trips by creating a single query string.

I first simply tried sending `q1;q2;..;qn` . However, the executor is designed to
handle `q1;q2;..;qn` type of query executions via the infrastructure mentioned
above (e.g., by tracking the query indexes in the list and doing 1 statement
per round trip).

One another option could have been to change the executor such that only track
the query index when `queryStringList` is provided not with queryString
including multiple `;`s . That is (a) more work (b) could cause weird edge
cases with failure handling (c) felt like coding a special case in to the executor
2021-12-27 10:29:37 +01:00
Hanefi Onaldi 29e4516642 Introduce citus_check_cluster_node_health UDF
This UDF coordinates connectivity checks accross the whole cluster.

This UDF gets the list of active readable nodes in the cluster, and
coordinates all connectivity checks in sequential order.

The algorithm is:

for sourceNode in activeReadableWorkerList:
    c = connectToNode(sourceNode)
    for targetNode in activeReadableWorkerList:
        result = c.execute(
            "SELECT citus_check_connection_to_node(targetNode.name,
                                                   targetNode.port")
        emit sourceNode.name,
             sourceNode.port,
             targetNode.name,
             targetNode.port,
             result

- result -> true  ->  connection attempt from source to target succeeded
- result -> false -> connection attempt from source to target failed
- result -> NULL  -> connection attempt from the current node to source node failed

I suggest you use the following query to get an overview on the connectivity:

SELECT bool_and(COALESCE(result, false))
FROM citus_check_cluster_node_health();

Whenever this query returns false, there is a connectivity issue, check in detail.
2021-12-15 01:41:51 +03:00
Burak Velioglu ed8e32de5e
Sync pg_dist_object on an update and propagate while syncing to a new node
Before that PR we were updating citus.pg_dist_object metadata, which keeps
the metadata related to objects on Citus, only on the coordinator node. In
order to allow using those object from worker nodes (or erroring out with
proper error message) we've started to propagate that metedata to worker
nodes as well.
2021-12-06 19:25:50 +03:00
Hanefi Onaldi 56e9b1b968 Introduce UDF to check worker connectivity
citus_check_connection_to_node runs a simple query on a remote node and
reports whether this attempt was successful.

This UDF will be used to make sure each worker node can connect to all
the worker nodes in the cluster.

parameters:
nodename: required
nodeport: optional (default: 5432)

return value:
boolean success
2021-12-03 02:30:28 +03:00
Onder Kalaci 549edcabb6 Allow disabling node(s) when multiple failures happen
As of master branch, Citus does all the modifications to replicated tables
(e.g., reference tables and distributed tables with replication factor > 1),
via 2PC and avoids any shardstate=3. As a side-effect of those changes,
handling node failures for replicated tables change.

With this PR, when one (or multiple) node failures happen, the users would
see query errors on modifications. If the problem is intermitant, that's OK,
once the node failure(s) recover by themselves, the modification queries would
succeed. If the node failure(s) are permenant, the users should call
`SELECT citus_disable_node(...)` to disable the node. As soon as the node is
disabled, modification would start to succeed. However, now the old node gets
behind. It means that, when the node is up again, the placements should be
re-created on the node. First, use `SELECT citus_activate_node()`. Then, use
`SELECT replicate_table_shards(...)` to replicate the missing placements on
the re-activated node.
2021-12-01 10:19:48 +01:00
Onur Tirtir 73f06323d8 Introduce dependencies from columnarAM to columnar metadata objects
During pg upgrades, we have seen that it is not guaranteed that a
columnar table will be created after metadata objects got created.
Prior to changes done in this commit, we had such a dependency
relationship in `pg_depend`:

```
columnar_table ----> columnarAM ----> citus extension
                                           ^  ^
                                           |  |
columnar.storage_id_seq --------------------  |
                                              |
columnar.stripe -------------------------------
```

Since `pg_upgrade` just knows to follow topological sort of the objects
when creating database dump, above dependency graph doesn't imply that
`columnar_table` should be created before metadata objects such as
`columnar.storage_id_seq` and `columnar.stripe` are created.

For this reason, with this commit we add new records to `pg_depend` to
make columnarAM depending on all rel objects living in `columnar`
schema. That way, `pg_upgrade` will know it needs to create those before
creating `columnarAM`, and similarly, before creating any tables using
`columnarAM`.

Note that in addition to inserting those records via installation script,
we also do the same in `citus_finish_pg_upgrade()`. This is because,
`pg_upgrade` rebuilds catalog tables in the new cluster and that means,
we must insert them in the new cluster too.
2021-11-23 13:14:00 +03:00
Hanefi Onaldi 3d9cec70fd
Update migration paths from 10.2 to 11.0 (#5459)
We recently introduced a set of patches to 10.2, and introduced 10.2-4
migration version. This migration version only resides on `release-10.2`
branch, and is missing on our default branch. This creates a problem
because we do not have a valid migration path from 10.2 to latest 11.0.

To remedy this issue, I copied the relevant migration files from
`release-10.2` branch, and renamed some of our migration files on
default branch to make sure we have a linear upgrade path.
2021-11-11 13:55:28 +03:00
Marco Slot 78866df13c Remove master_append_table_to_shard UDF 2021-11-08 10:43:24 +01:00
Halil Ozan Akgul c0785d570c Remove EnsureSuperUser from start and stop metadata sync to node 2021-11-01 18:01:49 +03:00
Ahmet Gedemenli 67dca4363d
Dont auto-undistribute user-added citus local tables (#5314)
* Disable auto-undistribute for user-added citus local tables
2021-10-28 12:10:26 +03:00
Marco Slot dafba6c242 Deprecate master_get_table_metadata UDF 2021-10-21 12:08:05 +02:00
Marco Slot 096660d61d Remove master_apply_delete_command 2021-10-18 22:29:37 +02:00
Naisila Puka d0390af72d
Add fix_partition_shard_index_names udf to fix currently broken names (#5291)
* Add udf to include shardId in broken partition shard index names

* Address reviews: rename index such that operations can be done on it

* More comprehensive index tests

* Final touches and formatting
2021-10-07 19:34:52 +03:00
Hanefi Onaldi a74409f24c
Bump Citus to 11.0devel 2021-10-01 22:21:22 +03:00
Önder Kalacı c2311b4c0c
Make (columnar.stripe) first_row_number index a unique constraint (#5324)
* Make (columnar.stripe) first_row_number index a unique constraint

Since stripe_first_row_number_idx is required to scan a columnar
table, we need to make sure that it is created before doing anything
with columnar tables during pg upgrades.

However, a plain btree index is not a dependency of a table, so
pg_upgrade cannot guarantee that stripe_first_row_number_idx gets
created when creating columnar.stripe, unless we make it a unique
"constraint".

To do that, drop stripe_first_row_number_idx and create a unique
constraint with the same name to keep the code change at minimum.

* Add more pg upgrade tests for columnar

* Fix a logic error in uprade_columnar_after test

Co-authored-by: Onur Tirtir <onurcantirtir@gmail.com>
2021-09-30 10:51:56 +03:00
Onur Tirtir 77a2dd68da
Revoke read access to columnar.chunk from unprivileged user (#5313)
Since this could expose chunk min/max values to unprivileged users.
2021-09-22 16:23:02 +03:00
Naisila Puka a69abe3be0
Fixes bug about int and smallint sequences on MX (#5254)
* Introduce worker_nextval udf for int&smallint column defaults

* Fix current tests and add new ones for worker_nextval
2021-09-09 23:41:07 +03:00
Hanefi Onaldi 9ae912a8c8
Prevent C-style comments in all directories (#5250) 2021-09-09 11:54:58 +03:00
Burak Velioglu c3895f35cd
Add helper UDFs for easy time partition management
- get_missing_time_partition_ranges: Gets the ranges of missing partitions for the given table, interval and range unless any existing partition conflicts with calculated missing ranges.

- create_time_partitions: Creates partitions by getting range values from get_missing_time_partition_ranges.

- drop_old_time_partitions: Drops partitions of the table older than given threshold.
2021-09-03 23:03:13 +03:00
Naisila Puka acb5ae6ab6
Skip dropping shards when we know it's a partition (#5176) 2021-08-31 17:41:37 +03:00
Onder Kalaci 482b8096e9 Introduce citus_internal_update_relation_colocation
update_distributed_table_colocation can be called by the relation
owner, and internally it updates pg_dist_partition. With this
commit, update_distributed_table_colocation uses an internal
UDF to access pg_dist_partition.

As a result, this operation can now be done by regular users
on MX.
2021-08-03 11:44:58 +02:00
Onder Kalaci c8368e7929 Introduce citus_internal_delete_shard_metadata
With this function, the owner of the table is allowed to remove
shard metadata. This is going to be useful for tenant-isolation.
2021-07-19 13:25:05 +02:00