Commit Graph

3255 Commits (acecdf88278aa4e9fd7d58c0f3c77874ff21ad05)

Author SHA1 Message Date
Hanefi Onaldi 914aa87c4e
Do not acquire locks if the index does not exist 2023-06-19 01:41:38 +03:00
Marco Slot 3c4cec827d
Propagate create/drop/reindex index via workers 2023-06-19 01:41:38 +03:00
Nils Dijk ce2ba1d07e
Optimize QueryPushdownSqlTaskList on memory and cpu (#6945)
While going over this piece of code (a long time ago) it was bothering
to me we keep a bool array with the size of shardcount to iterate only
over shards present in the list of non-pruned shards. Especially since
we keep min/max of the set shards to optimize iteration.

Postgres has the bitmapset datastructure which a) takes significantly
less space, b) has iterator functions to only iterate over set bits, c)
can efficiently skip long sequences of unset bits and d) stops quickly
once the last set bit has been reached.

I have been contemplating if it is worth to keep the minShardOffset
because of readability and the efficient skipping of unset bits,
however, I have decided to keep it -although less readable-, as there
are known usecases where 100k+ shards are pruned to single digit shards.
If these would end up at the end of `shardcount` a hotloop of zero
checks on the first iteration _could_ cause a theoretical performance
regression.

All in all, this code is using less memory in all cases where it
matters, and less cpu in most cases, while using more idiomatic
datastructures for the task at hand.
2023-06-16 16:06:22 +02:00
Marco Slot 3adc1575d9
Fix DROP CONSTRAINT in command string with other commands (#7012)
Co-authored-by: Marco Slot <marco.slot@gmail.com>
2023-06-16 15:54:37 +02:00
Onur Tirtir 12a093b456
Allow using generated identity column based on int/smallint when creating a distributed table (#7008)
Allow using generated identity column based on int/smallint when
creating a distributed table so that applications that rely on
those data types don't break.

Inserting into / modifying such columns from workers is not allowed
but it's better than not allowing such columns altogether.
2023-06-16 14:34:23 +03:00
Halil Ozan Akgül 04f6868ed2
Add citus_schemas view (#6979)
DESCRIPTION: Adds citus_schemas view

The citus_schemas view will be created in public schema if it exists, if
not the view will be created in pg_catalog.

Need to:
- [x] Add tests
- [x] Fix tests
2023-06-16 14:21:58 +03:00
Ahmet Gedemenli 002a88ae7f
Error for single shard table creation if replication factor > 1 (#7006)
Error for single shard table creation if replication factor > 1
2023-06-15 13:13:45 +03:00
Emel Şimşek 4f793abc4a
Turn on GUC_REPORT flag for search_path to enable reporting back the parameter value upon change. (#6983)
DESCRIPTION: Turns on the GUC_REPORT flag for search_path. This results
in postgres to report the parameter status back in addition to Command
Complete packet.

In response to the following command,

> SET search_path TO client1;

postgres sends back the following packets (shown in pseudo form):

C (Command Complete) SET + **S (Parameter Status) search_path =
client1**
2023-06-14 17:35:52 +03:00
Onur Tirtir dbdf04e8ba
Rename pg_dist tenant_schema to pg_dist_schema (#7001) 2023-06-14 12:12:15 +03:00
Naisila Puka ba40eb363c
Fix some gucs' initial and boot values, and flag combinations (#6957)
PG16beta1 added some sanity checks for GUCS, find the Relevant PG
commits below:

1- Add check on initial and boot values when loading GUCs

a73952b795
2- Extend check_GUC_init() with checks on flag combinations when loading
GUCs

009f8d1714

I fixed our currently problematic GUCS, we can merge this directly into
main as these make sense for any PG version.

There was a particular NodeConninfo issue:
Previously we would rely on the fact that NodeConninfo initial value
is an empty string. However, with PG16 enforcing same initial and boot
values, we can't use an empty initial value for NodeConninfo anymore.
Therefore we add a new flag to indicate whether we are at boot check.
2023-06-14 11:55:52 +03:00
Ahmet Gedemenli 7b0bc62173
Support CREATE TABLE .. AS SELECT .. commands for tenant tables (#6998)
Support CREATE TABLE .. AS SELECT .. commands for tenant tables
2023-06-13 17:54:09 +03:00
Halil Ozan Akgül 772d194357
Changes citus_shard_sizes view's Shard Name Column to Shard Id (#7003)
citus_shard_sizes view had a shard name column we use to extract shard
id. This PR changes the column to shard id so we don't do unnecessary
string operation.
2023-06-13 16:36:35 +03:00
Gokhan Gulbiz e0ccd155ab
Make citus_stat_tenants work with schema-based tenants. (#6936)
DESCRIPTION: Enabling citus_stat_tenants to support schema-based
tenants.

This pull request modifies the existing logic to enable tenant
monitoring with schema-based tenants. The changes made are as follows:

- If a query has a partitionKeyValue (which serves as a tenant
key/identifier for distributed tables), Citus annotates the query with
both the partitionKeyValue and colocationId. This allows for accurate
tracking of the query.
- If a query does not have a partitionKeyValue, but its colocationId
belongs to a distributed schema, Citus annotates the query with only the
colocationId. The tenant monitor can then easily look up the schema to
determine if it's a distributed schema and make a decision on whether to
track the query.

---------

Co-authored-by: Jelte Fennema <jelte.fennema@microsoft.com>
2023-06-13 14:11:45 +03:00
aykut-bozkurt 5acbd735ca
Move 2 functions to correct files (#7000)
Followup item from
https://github.com/citusdata/citus/pull/6933#discussion_r1217896933
2023-06-13 11:43:48 +03:00
aykut-bozkurt 213d363bc3
Add citus_schema_distribute/undistribute udfs to convert a schema into a tenant schema / back to a regular schema (#6933)
* Currently we do not allow any Citus tables other than Citus local
tables inside a regular schema before executing
`citus_schema_distribute`.
* `citus_schema_undistribute` expects only single shard distributed
tables inside a tenant schema.

DESCRIPTION: Adds the udf `citus_schema_distribute` to convert a regular
schema into a tenant schema.
DESCRIPTION: Adds the udf `citus_schema_undistribute` to convert a
tenant schema back to a regular schema.

---------

Co-authored-by: Onur Tirtir <onurcantirtir@gmail.com>
2023-06-12 18:41:31 +03:00
Gokhan Gulbiz 2c509b712a
Tenant monitoring performance improvements (#6868)
- [x] Use spinlock instead of lwlock per tenant
[b437aa9](b437aa9e52)
- [x] Use hashtable to store tenant stats
[ccd464b](ccd464ba04)
- [x] Introduce a new GUC for specifying the sampling rate of new tenant
entries in the tenant monitor.
[a8d3805](a8d3805bd6)

Below are the pgbench metrics with select-only workloads from my local
machine. Here is the
[script](https://gist.github.com/gokhangulbiz/7a2308470597dc06734ff7c08f87c656)
I used for benchmarking.

| | Connection Count | Initial Implementation (TPS) | On/Off Diff |
Final Implementation -Run#1 (TPS) | On/Off Diff | Final Implementation
-Run#2 (TPS) | On/Off Diff | Final Implementation -Run#3 (TPS) | On/Off
Diff | Avg On/Off Diff |
| --- | ---------------- | ---------------------------- | ----------- |
---------------------------------- | ----------- |
---------------------------------- | ----------- |
---------------------------------- | ----------- | --------------- |
| On | 32 | 37488.69839 | \-17% | 42859.94402 | \-5% | 43379.63121 |
\-2% | 42636.2264 | \-7% | \-5% |
| Off | 32 | 43909.83121 | | 45139.63151 | | 44188.77425 | | 45451.9548
| | |
| On | 300 | 30463.03538 | \-15% | 33265.19957 | \-7% | 34685.87233 |
\-2% | 34682.5214 | \-1% | \-3% |
| Off | 300 | 35105.73594 | | 35637.45423 | | 35331.33447 | | 35113.3214
| | |
2023-06-11 12:17:31 +03:00
Ahmet Gedemenli e6ac9f2a68
Propagate ALTER SCHEMA .. OWNER TO .. (#6987)
Propagate `ALTER SCHEMA .. OWNER TO ..` commands to workers
2023-06-09 15:32:18 +03:00
Naisila Puka 2ba3bffe1e
Random warning fixes (#6974)
Citus build with PG16 fails because of the following warnings:
 - using char* instead of Datum
 - using pointer instead of oid
 - candidate function for format attribute
 - remove old definition from PG11 compatibility 62bf571ced

This commit fixes the above.
2023-06-09 14:36:43 +03:00
Emel Şimşek 8b2024b730
When Creating a FOREIGN KEY without a name, schema qualify referenced table name in deparser. (#6986)
DESCRIPTION: Fixes a bug which causes an error when creating a FOREIGN
KEY constraint without a name if the referenced table is schema
qualified.

In deparsing the `ALTER TABLE s1.t1 ADD FOREIGN KEY (key) REFERENCES
s2.t2; `, command back from its cooked form, we should schema qualify
the REFERENCED table.

Fixes #6982.
2023-06-09 14:13:13 +03:00
Onur Tirtir fa8870217d
Enable logical planner for single-shard tables (#6950)
* Enable using logical planner for single-shard tables

* Improve non-colocated table error in physical planner

* Favor distributed tables over reference tables when chosing anchor shard
2023-06-08 10:57:23 +03:00
Ahmet Gedemenli 8d8968ae63
Disable ALTER TABLE .. SET SCHEMA for tenant tables (#6973)
Disables `ALTER TABLE .. SET SCHEMA` for tenant tables.
Disables `ALTER TABLE .. SET SCHEMA` for tenant schemas.
2023-06-07 11:02:53 +03:00
Halil Ozan Akgül 3f7bc0cbf5
Single Shard Partition Column UDFs (#6964)
This PR fixes and tests:
- debug_equality_expression
- partition_column_id
2023-06-06 17:55:40 +03:00
Halil Ozan Akgül 7e486345f1
Fix citus_table_type column in citus_tables and citus_shards views for single shard tables (#6971)
`citus_table_type` column of `citus_tables` and `citus_shards` will show
"schema" for tenants schema tables and "distributed" for single shard
tables that are not in a tenant schema.
2023-06-06 16:20:11 +03:00
Naisila Puka c2f117c559
Citus Revise tree-walk APIs to include context (#6975)
Without revising there are Warnings in PG16 build

Relevant PG commit

1c27d16e6e
1c27d16e6e5c1f463bbe1e9ece88dda811235165
2023-06-06 14:17:51 +03:00
Teja Mupparti f6a516dab5 Refactor repartitioning code into generic format 2023-06-05 09:06:05 -07:00
Naisila Puka 48f068d08e
Remove AssertArg and AssertState (#6970)
PG16 removed them. They were already identical to Assert. We can merge
this directly to main branch

Relevant PG commit:

b1099eca8f
b1099eca8f38ff5cfaf0901bb91cb6a22f909bc6

Co-authored-by: onderkalaci <onderkalaci@gmail.com>
2023-06-05 13:25:21 +03:00
ahmet gedemenli 2bd6ff0e93 Use schema name in the error msg 2023-06-02 15:25:14 +03:00
ahmet gedemenli fccfee08b6 Style 2023-06-02 14:48:07 +03:00
ahmet gedemenli f68ea20009 Disable alter_distributed_table for tenant tables 2023-06-02 14:48:07 +03:00
ahmet gedemenli 4b67e398b1 Disable undistribute_table for tenant tables 2023-06-02 14:48:07 +03:00
ahmet gedemenli f4b2494d0c Disable update_distributed_table_colocation for tenant tables 2023-06-02 14:48:07 +03:00
Halil Ozan Akgül 3e183746b7
Single Shard Misc UDFs 2 (#6963)
Creating a second PR to make reviewing easier.
This PR tests:
- replicate_reference_tables
- fix_partition_shard_index_names
- isolate_tenant_to_new_shard
- replicate_table_shards
2023-06-02 13:46:14 +03:00
Teja Mupparti ff2062e8c3 Rename insert-select redistribute code base to generic purpose 2023-06-01 09:43:43 -07:00
Teja Mupparti f9dbe7784b This commit adds a safety-net to the issue seen in #6785. The fix for the underlying issue will be in the PR#6943 2023-05-30 10:53:05 -07:00
Halil Ozan Akgül d99a5e2f62
Single Shard Table Tests for Shard Lock UDFs (#6944)
This PR adds single shard table tests for shard lock UDFs,
`shard_lock_metadata`, `shard_lock_resources`
2023-05-30 12:23:41 +03:00
Halil Ozan Akgül 321fcfcdb5
Add Support for Single Shard Tables in update_distributed_table_colocation (#6924)
Adds Support for Single Shard Tables in
`update_distributed_table_colocation`.

This PR changes checks that make sure tables should be hash distributed
table to hash or single shard distributed tables.
2023-05-29 11:47:50 +03:00
Ahmet Gedemenli 1ca80813f6
Citus UDFs support for single shard tables (#6916)
Verify Citus UDFs work well with single shard tables

SUPPORTED
* citus_table_size
* citus_total_relation_size
* citus_relation_size
* citus_shard_sizes
* truncate_local_data_after_distributing_table
* create_distributed_function // test function colocated with a single
shard table
* undistribute_table
* alter_table_set_access_method

UNSUPPORTED - error out for single shard tables
* master_create_empty_shard
* create_distributed_table_concurrently
* create_distributed_table
* create_reference_table
* citus_add_local_table_to_metadata
* citus_split_shard_by_split_points
* alter_distributed_table
2023-05-26 17:30:05 +03:00
Onur Tirtir 246b054a7d
Add support for schema-based-sharding via a GUC (#6866)
DESCRIPTION: Adds citus.enable_schema_based_sharding GUC that allows
sharding the database based on schemas when enabled.

* Refactor the logic that automatically creates Citus managed tables 

* Refactor CreateSingleShardTable() to allow specifying colocation id
instead

* Add support for schema-based-sharding via a GUC

### What this PR is about:
Add **citus.enable_schema_based_sharding GUC** to enable schema-based
sharding. Each schema created while this GUC is ON will be considered
as a tenant schema. Later on, regardless of whether the GUC is ON or
OFF, any table created in a tenant schema will be converted to a
single shard distributed table (without a shard key). All the tenant
tables that belong to a particular schema will be co-located with each
other and will have a shard count of 1.

We introduce a new metadata table --pg_dist_tenant_schema-- to do the
bookkeeping for tenant schemas:
```sql
psql> \d pg_dist_tenant_schema
          Table "pg_catalog.pg_dist_tenant_schema"
┌───────────────┬─────────┬───────────┬──────────┬─────────┐
│    Column     │  Type   │ Collation │ Nullable │ Default │
├───────────────┼─────────┼───────────┼──────────┼─────────┤
│ schemaid      │ oid     │           │ not null │         │
│ colocationid  │ integer │           │ not null │         │
└───────────────┴─────────┴───────────┴──────────┴─────────┘
Indexes:
    "pg_dist_tenant_schema_pkey" PRIMARY KEY, btree (schemaid)
    "pg_dist_tenant_schema_unique_colocationid_index" UNIQUE, btree (colocationid)

psql> table pg_dist_tenant_schema;
┌───────────┬───────────────┐
│ schemaid  │ colocationid  │
├───────────┼───────────────┤
│     41963 │            91 │
│     41962 │            90 │
└───────────┴───────────────┘
(2 rows)
```

Colocation id column of pg_dist_tenant_schema can never be NULL even
for the tenant schemas that don't have a tenant table yet. This is
because, we assign colocation ids to tenant schemas as soon as they
are created. That way, we can keep associating tenant schemas with
particular colocation groups even if all the tenant tables of a tenant
schema are dropped and recreated later on.

When a tenant schema is dropped, we delete the corresponding row from
pg_dist_tenant_schema. In that case, we delete the corresponding
colocation group from pg_dist_colocation as well.

### Future work for 12.0 release:
We're building schema-based sharding on top of the infrastructure that
adds support for creating distributed tables without a shard key
(https://github.com/citusdata/citus/pull/6867).
However, not all the operations that can be done on distributed tables
without a shard key necessarily make sense (in the same way) in the
context of schema-based sharding. For example, we need to think about
what happens if user attempts altering schema of a tenant table. We
will tackle such scenarios in a future PR.

We will also add a new UDF --citus.schema_tenant_set() or such-- to
allow users to use an existing schema as a tenant schema, and another
one --citus.schema_tenant_unset() or such-- to stop using a schema as
a tenant schema in future PRs.
2023-05-26 10:49:58 +03:00
Emel Şimşek 02f815ce1f
Disable local execution when Explain Analyze is requested for a query. (#6892)
DESCRIPTION: Fixes a crash when explain analyze is requested for a query
that is normally locally executed.

When explain analyze is requested for a query, a task with two queries
is created. Those two queries are
    
1. Wrapped Query --> `SELECT ... FROM
worker_save_query_explain_analyze(<query>, <explain analyze options>)`
2. Fetch Query -->` SELECT explain_analyze_output, execution_duration
FROM worker_last_saved_explain_analyze();`

When the query is locally executed a task with multiple queries causes a
crash in production. See the Assert at
57455dc64d/src/backend/distributed/executor/tuple_destination.c#:~:text=Assert(task%2D%3EqueryCount%20%3D%3D%201)%3B

This becomes a critical issue when auto_explain extension is used. When
auto_explain extension is enabled, explain analyze is automatically
requested for every query.

One possible solution could be not to create two queries for a locally
executed query. The fetch part may not have to be a query since the
values are available in local variables.

Until we enable local execution for explain analyze, it is best to
disable local execution.

Fixes #6777.
2023-05-23 14:33:22 +03:00
Emel Şimşek f9a5be59b9
Run replicate_reference_tables background task as superuser. (#6930)
DESCRIPTION: Fixes a bug in background shard rebalancer where the
replicate reference tables task fails if the current user is not a
superuser.

This change is to be backported to earlier releases. We should fix the
permissions for replicate_reference_tables on main branch such that it
can be run by non-superuser roles.

Fixes #6925.
Fixes #6926.
2023-05-18 23:46:32 +03:00
Onur Tirtir 8ff9dde4b3
Prevent pushing down INSERT .. SELECT queries that we shouldn't (and allow some more) (#6752)
Previously INSERT .. SELECT planner were pushing down some queries that should not be pushed down due to wrong colocation checks. It was checking whether one of the table in SELECT part and target table are colocated. But now, we check colocation for all tables in SELECT part and the target table.

Another problem with INSERT .. SELECT planner was that some queries, which is valid to be pushed down, were not pushed down due to unnecessary checks which are currently supported. e.g. UNION check. As solution, we reused the pushdown planner checks for INSERT .. SELECT planner.


DESCRIPTION: Fixes a bug that causes incorrectly pushing down some
INSERT .. SELECT queries that we shouldn't
DESCRIPTION: Prevents unnecessarily pulling the data into coordinator
for some INSERT .. SELECT queries
DESCRIPTION: Drops support for pushing down INSERT .. SELECT with append
table as target

Fixes #6749.
Fixes #1428.
Fixes #6920.

---------

Co-authored-by: aykutbozkurt <aykut.bozkurt1995@gmail.com>
2023-05-17 15:05:08 +03:00
Onur Tirtir 56d217b108
Mark objects as distributed even when pg_dist_node is empty (#6900)
We mark objects as distributed objects in Citus metadata only if we need
to propagate given the command that creates it to worker nodes. For this
reason, we were not doing this for the objects that are created while
pg_dist_node is empty.

One implication of doing so is that we defer the schema propagation to
the time when user creates the first distributed table in the schema.
However, this doesn't help for schema-based sharding (#6866) because we
want to sync pg_dist_tenant_schema to the worker nodes even for empty
schemas too.

* Support test dependencies for isolation tests without a schedule

* Comment out a test due to a known issue (#6901)

* Also, reduce the verbosity for some log messages and make some
   tests compatible with run_test.py.
2023-05-16 11:45:42 +03:00
Onur Tirtir e7abde7e81
Prevent downgrades when there is a single-shard table in the cluster (#6908)
Also add a few tests for Citus/PG upgrade/downgrade scenarios.
2023-05-16 09:44:28 +02:00
Onur Tirtir 893ed416f1
Disable citus.enable_non_colocated_router_query_pushdown by default (#6909)
Fixes #6779.

DESCRIPTION: Disables citus.enable_non_colocated_router_query_pushdown
GUC by default to ensure generating a consistent distributed plan for
the queries that reference non-colocated distributed tables

We already have tests for the cases where this GUC is disabled,
so I'm not adding any more tests in this PR.

Also make multi_insert_select_window idempotent.

Related to: #6793
2023-05-15 12:07:50 +03:00
Jelte Fennema 07b8cd2634
Forward to existing emit_log_hook in our log hook (#6877)
DESCRIPTION: Forward to existing emit_log_hook in our log hook

This makes us work better with other extensions installed in Postgres.
Without this change we would overwrite their emit_log_hook, causing it
to never be called.

Fixes #6874
2023-05-09 16:55:56 +02:00
Naisila Puka 905fd46410
Fixes flakiness in background_rebalance_parallel test (#6910)
Fixes the following flaky outputs by decreasing citus_task_wait loop
interval, and changing the order of wait commands.

https://app.circleci.com/pipelines/github/citusdata/citus/32102/workflows/19958297-6c7e-49ef-9bc2-8efe8aacb96f/jobs/1089589

``` diff
SELECT job_id, task_id, status, nodes_involved
 FROM pg_dist_background_task WHERE job_id in (:job_id) ORDER BY task_id;
  job_id | task_id |  status  | nodes_involved 
 --------+---------+----------+----------------
   17779 |    1013 | done     | {50,56}
   17779 |    1014 | running  | {50,57}
-  17779 |    1015 | running  | {50,56}
-  17779 |    1016 | blocked  | {50,57}
+  17779 |    1015 | done     | {50,56}
+  17779 |    1016 | running  | {50,57}
   17779 |    1017 | runnable | {50,56}
   17779 |    1018 | blocked  | {50,57}
   17779 |    1019 | runnable | {50,56}
   17779 |    1020 | blocked  | {50,57}
 (8 rows)
```

https://github.com/citusdata/citus/pull/6893#issuecomment-1525661408
```diff
SELECT job_id, task_id, status, nodes_involved
 FROM pg_dist_background_task WHERE job_id in (:job_id) ORDER BY task_id;
  job_id | task_id |  status  | nodes_involved 
 --------+---------+----------+----------------
   17779 |    1013 | done     | {50,56}
-  17779 |    1014 | running  | {50,57}
+  17779 |    1014 | runnable | {50,57}
   17779 |    1015 | running  | {50,56}
   17779 |    1016 | blocked  | {50,57}
   17779 |    1017 | runnable | {50,56}
   17779 |    1018 | blocked  | {50,57}
   17779 |    1019 | runnable | {50,56}
   17779 |    1020 | blocked  | {50,57}
 (8 rows)
```
2023-05-05 16:47:01 +03:00
Teja Mupparti b58665773b Move all pre-15-defined routines to the bottom of the file 2023-05-04 10:07:08 -07:00
Naisila Puka 072ae44742
Adjusts query's CoerceViaIO & RelabelType nodes that are improper for deparsing (#6391)
Adjusts query's CoerceViaIO & RelabelType nodes that are
improper for deparsing

The standard planner converts some `::text` casts to `::cstring` and
here we convert back because `cstring` is a pseudotype and it cannot be
casted to most types. This problem occurs in CoerceViaIO nodes.
There was another problem with RelabelType nodes fixed in the following
PR:
https://github.com/citusdata/citus/pull/4580
We undo the changes in that PR, and fix both CoerceViaIO and RelabelType
nodes in the planning phase (not in the deparsing phase in ruleutils)

Fixes https://github.com/citusdata/citus/issues/5646
Fixes https://github.com/citusdata/citus/issues/5033
Fixes https://github.com/citusdata/citus/issues/6061
2023-05-04 16:46:02 +03:00
Ahmet Gedemenli 4321286005 Disable master_create_empty_shard udf for single shard tables (#6902) 2023-05-03 17:02:43 +03:00
Onur Tirtir db2514ef78 Call null-shard-key tables as single-shard distributed tables in code 2023-05-03 17:02:43 +03:00