Commit Graph

807 Commits (6acb4dc9f6b4137e8be0bfeb8b47bf7e9bcfd883)

Author SHA1 Message Date
Hanefi Onaldi 6acb4dc9f6
Merge branch 'main' of github.com:citusdata/citus into index-ddl-via-worker 2023-06-20 15:50:21 +03:00
Hanefi Onaldi 979b48e1a7
Disallow index DDL on worker if coord not in metadata 2023-06-20 15:49:15 +03:00
aykut-bozkurt f667f14029
Rewind tuple store to fix scrollable with hold cursor fetches (#7014)
We need to rewind the tuplestorestate's tuple index to get correct
results on fetching scrollable with hold cursors.


`PersistHoldablePortal` is responsible for persisting out
tuplestorestate inside a with hold cursor before commiting a
transaction.

It rewinds the cursor like below (`ExecutorRewindcalls` calls `rescan`):
```c
if (portal->cursorOptions & CURSOR_OPT_SCROLL)
{
  ExecutorRewind(queryDesc);
}
```

At the end, it adjusts tuple index for holdStore in the portal properly.
```c
if (portal->cursorOptions & CURSOR_OPT_SCROLL)
{
         if (!tuplestore_skiptuples(portal->holdStore,
	                                         portal->portalPos,
	                                         true))
	    elog(ERROR, "unexpected end of tuple stream");
}
```

DESCRIPTION: Fixes incorrect results on fetching scrollable with hold
cursors.

Fixes https://github.com/citusdata/citus/issues/7010
2023-06-19 23:00:18 +03:00
Teja Mupparti 58da8771aa This pull request introduces support for nonroutable merge commands in the following scenarios:
1) For distributed tables that are not colocated.
2) When joining on a non-distribution column for colocated tables.
3) When merging into a distributed table using reference or citus-local tables as the data source.

This is accomplished primarily through the implementation of the following two strategies.

Repartition: Plan the source query independently,
execute the results into intermediate files, and repartition the files to
co-locate them with the merge-target table. Subsequently, compile a final
merge query on the target table using the intermediate results as the data
source.

Pull-to-coordinator: Execute the plan that requires evaluation at the coordinator,
run the query on the coordinator, and redistribute the resulting rows to ensure
colocation with the target shards. Direct the MERGE SQL operation to the worker
nodes' target shards, using the intermediate files colocated with the data as the
data source.
2023-06-19 12:23:40 -07:00
aykut-bozkurt fba5c8dd30
ALTER TABLE <tblname> SET SCHEMA <schemaname> for single shard tables (#7004)
Adds support for altering schema of single shard tables. We do that in 2
steps.
1. Undistribute the tenant table at `preprocess` step,
2. Distribute new schema if it is a distributed schema after DDLs are
propagated.

DESCRIPTION: Adds support for altering a table's schema to/from
distributed schemas.
2023-06-19 10:21:13 +03:00
Hanefi Onaldi 914aa87c4e
Do not acquire locks if the index does not exist 2023-06-19 01:41:38 +03:00
Marco Slot 3c4cec827d
Propagate create/drop/reindex index via workers 2023-06-19 01:41:38 +03:00
Marco Slot 3adc1575d9
Fix DROP CONSTRAINT in command string with other commands (#7012)
Co-authored-by: Marco Slot <marco.slot@gmail.com>
2023-06-16 15:54:37 +02:00
Onur Tirtir 12a093b456
Allow using generated identity column based on int/smallint when creating a distributed table (#7008)
Allow using generated identity column based on int/smallint when
creating a distributed table so that applications that rely on
those data types don't break.

Inserting into / modifying such columns from workers is not allowed
but it's better than not allowing such columns altogether.
2023-06-16 14:34:23 +03:00
Ahmet Gedemenli 002a88ae7f
Error for single shard table creation if replication factor > 1 (#7006)
Error for single shard table creation if replication factor > 1
2023-06-15 13:13:45 +03:00
Naisila Puka ba40eb363c
Fix some gucs' initial and boot values, and flag combinations (#6957)
PG16beta1 added some sanity checks for GUCS, find the Relevant PG
commits below:

1- Add check on initial and boot values when loading GUCs

a73952b795
2- Extend check_GUC_init() with checks on flag combinations when loading
GUCs

009f8d1714

I fixed our currently problematic GUCS, we can merge this directly into
main as these make sense for any PG version.

There was a particular NodeConninfo issue:
Previously we would rely on the fact that NodeConninfo initial value
is an empty string. However, with PG16 enforcing same initial and boot
values, we can't use an empty initial value for NodeConninfo anymore.
Therefore we add a new flag to indicate whether we are at boot check.
2023-06-14 11:55:52 +03:00
Ahmet Gedemenli 7b0bc62173
Support CREATE TABLE .. AS SELECT .. commands for tenant tables (#6998)
Support CREATE TABLE .. AS SELECT .. commands for tenant tables
2023-06-13 17:54:09 +03:00
aykut-bozkurt 5acbd735ca
Move 2 functions to correct files (#7000)
Followup item from
https://github.com/citusdata/citus/pull/6933#discussion_r1217896933
2023-06-13 11:43:48 +03:00
aykut-bozkurt 213d363bc3
Add citus_schema_distribute/undistribute udfs to convert a schema into a tenant schema / back to a regular schema (#6933)
* Currently we do not allow any Citus tables other than Citus local
tables inside a regular schema before executing
`citus_schema_distribute`.
* `citus_schema_undistribute` expects only single shard distributed
tables inside a tenant schema.

DESCRIPTION: Adds the udf `citus_schema_distribute` to convert a regular
schema into a tenant schema.
DESCRIPTION: Adds the udf `citus_schema_undistribute` to convert a
tenant schema back to a regular schema.

---------

Co-authored-by: Onur Tirtir <onurcantirtir@gmail.com>
2023-06-12 18:41:31 +03:00
Ahmet Gedemenli e6ac9f2a68
Propagate ALTER SCHEMA .. OWNER TO .. (#6987)
Propagate `ALTER SCHEMA .. OWNER TO ..` commands to workers
2023-06-09 15:32:18 +03:00
Ahmet Gedemenli 8d8968ae63
Disable ALTER TABLE .. SET SCHEMA for tenant tables (#6973)
Disables `ALTER TABLE .. SET SCHEMA` for tenant tables.
Disables `ALTER TABLE .. SET SCHEMA` for tenant schemas.
2023-06-07 11:02:53 +03:00
Naisila Puka 48f068d08e
Remove AssertArg and AssertState (#6970)
PG16 removed them. They were already identical to Assert. We can merge
this directly to main branch

Relevant PG commit:

b1099eca8f
b1099eca8f38ff5cfaf0901bb91cb6a22f909bc6

Co-authored-by: onderkalaci <onderkalaci@gmail.com>
2023-06-05 13:25:21 +03:00
ahmet gedemenli 2bd6ff0e93 Use schema name in the error msg 2023-06-02 15:25:14 +03:00
ahmet gedemenli fccfee08b6 Style 2023-06-02 14:48:07 +03:00
ahmet gedemenli f68ea20009 Disable alter_distributed_table for tenant tables 2023-06-02 14:48:07 +03:00
ahmet gedemenli 4b67e398b1 Disable undistribute_table for tenant tables 2023-06-02 14:48:07 +03:00
ahmet gedemenli f4b2494d0c Disable update_distributed_table_colocation for tenant tables 2023-06-02 14:48:07 +03:00
Ahmet Gedemenli 1ca80813f6
Citus UDFs support for single shard tables (#6916)
Verify Citus UDFs work well with single shard tables

SUPPORTED
* citus_table_size
* citus_total_relation_size
* citus_relation_size
* citus_shard_sizes
* truncate_local_data_after_distributing_table
* create_distributed_function // test function colocated with a single
shard table
* undistribute_table
* alter_table_set_access_method

UNSUPPORTED - error out for single shard tables
* master_create_empty_shard
* create_distributed_table_concurrently
* create_distributed_table
* create_reference_table
* citus_add_local_table_to_metadata
* citus_split_shard_by_split_points
* alter_distributed_table
2023-05-26 17:30:05 +03:00
Onur Tirtir 246b054a7d
Add support for schema-based-sharding via a GUC (#6866)
DESCRIPTION: Adds citus.enable_schema_based_sharding GUC that allows
sharding the database based on schemas when enabled.

* Refactor the logic that automatically creates Citus managed tables 

* Refactor CreateSingleShardTable() to allow specifying colocation id
instead

* Add support for schema-based-sharding via a GUC

### What this PR is about:
Add **citus.enable_schema_based_sharding GUC** to enable schema-based
sharding. Each schema created while this GUC is ON will be considered
as a tenant schema. Later on, regardless of whether the GUC is ON or
OFF, any table created in a tenant schema will be converted to a
single shard distributed table (without a shard key). All the tenant
tables that belong to a particular schema will be co-located with each
other and will have a shard count of 1.

We introduce a new metadata table --pg_dist_tenant_schema-- to do the
bookkeeping for tenant schemas:
```sql
psql> \d pg_dist_tenant_schema
          Table "pg_catalog.pg_dist_tenant_schema"
┌───────────────┬─────────┬───────────┬──────────┬─────────┐
│    Column     │  Type   │ Collation │ Nullable │ Default │
├───────────────┼─────────┼───────────┼──────────┼─────────┤
│ schemaid      │ oid     │           │ not null │         │
│ colocationid  │ integer │           │ not null │         │
└───────────────┴─────────┴───────────┴──────────┴─────────┘
Indexes:
    "pg_dist_tenant_schema_pkey" PRIMARY KEY, btree (schemaid)
    "pg_dist_tenant_schema_unique_colocationid_index" UNIQUE, btree (colocationid)

psql> table pg_dist_tenant_schema;
┌───────────┬───────────────┐
│ schemaid  │ colocationid  │
├───────────┼───────────────┤
│     41963 │            91 │
│     41962 │            90 │
└───────────┴───────────────┘
(2 rows)
```

Colocation id column of pg_dist_tenant_schema can never be NULL even
for the tenant schemas that don't have a tenant table yet. This is
because, we assign colocation ids to tenant schemas as soon as they
are created. That way, we can keep associating tenant schemas with
particular colocation groups even if all the tenant tables of a tenant
schema are dropped and recreated later on.

When a tenant schema is dropped, we delete the corresponding row from
pg_dist_tenant_schema. In that case, we delete the corresponding
colocation group from pg_dist_colocation as well.

### Future work for 12.0 release:
We're building schema-based sharding on top of the infrastructure that
adds support for creating distributed tables without a shard key
(https://github.com/citusdata/citus/pull/6867).
However, not all the operations that can be done on distributed tables
without a shard key necessarily make sense (in the same way) in the
context of schema-based sharding. For example, we need to think about
what happens if user attempts altering schema of a tenant table. We
will tackle such scenarios in a future PR.

We will also add a new UDF --citus.schema_tenant_set() or such-- to
allow users to use an existing schema as a tenant schema, and another
one --citus.schema_tenant_unset() or such-- to stop using a schema as
a tenant schema in future PRs.
2023-05-26 10:49:58 +03:00
Onur Tirtir 56d217b108
Mark objects as distributed even when pg_dist_node is empty (#6900)
We mark objects as distributed objects in Citus metadata only if we need
to propagate given the command that creates it to worker nodes. For this
reason, we were not doing this for the objects that are created while
pg_dist_node is empty.

One implication of doing so is that we defer the schema propagation to
the time when user creates the first distributed table in the schema.
However, this doesn't help for schema-based sharding (#6866) because we
want to sync pg_dist_tenant_schema to the worker nodes even for empty
schemas too.

* Support test dependencies for isolation tests without a schedule

* Comment out a test due to a known issue (#6901)

* Also, reduce the verbosity for some log messages and make some
   tests compatible with run_test.py.
2023-05-16 11:45:42 +03:00
Onur Tirtir db2514ef78 Call null-shard-key tables as single-shard distributed tables in code 2023-05-03 17:02:43 +03:00
Ahmet Gedemenli cdf54ff4b1 Add DDL support null-shard-key tables(#6778/#6784/#6787/#6859)
Add tests for ddl coverage:
* indexes
* partitioned tables + indexes with long names
* triggers
* foreign keys
* statistics
* grant & revoke statements
* truncate & vacuum
* create/test/drop view that depends on a dist table with no shard key
* policy & rls test

* alter table add/drop/alter_type column (using sequences/different data
  types/identity columns)
* alter table add constraint (not null, check, exclusion constraint)
* alter table add column with a default value / set default / drop
  default
* alter table set option (autovacuum)

* indexes / constraints without names
* multiple subcommands

Adds support for
* Creating new partitions after distributing (with null key) the parent
table
* Attaching partitions to a distributed table with null distribution key
(and automatically distribute the new partition with null key as well)
* Detaching partitions from it
2023-05-03 16:18:27 +03:00
Onur Tirtir fa467e05e7 Add support for creating distributed tables with a null shard key (#6745)
With this PR, we allow creating distributed tables with without
specifying a shard key via create_distributed_table(). Here are the
the important details about those tables:
* Specifying `shard_count` is not allowed because it is assumed to be 1.
* We mostly call such tables as "null shard-key" table in code /
comments.
* To avoid doing a breaking layout change in create_distributed_table();
instead of throwing an error, it will inform the user that
`distribution_type`
  param is ignored unless it's explicitly set to NULL or  'h'.
* `colocate_with` param allows colocating such null shard-key tables to
  each other.
* We define this table type, i.e., NULL_SHARD_KEY_TABLE, as a subclass
of
  DISTRIBUTED_TABLE because we mostly want to treat them as distributed
  tables in terms of SQL / DDL / operation support.
* Metadata for such tables look like:
  - distribution method => DISTRIBUTE_BY_NONE
  - replication model => REPLICATION_MODEL_STREAMING
- colocation id => **!=** INVALID_COLOCATION_ID (distinguishes from
Citus local tables)
* We assign colocation groups for such tables to different nodes in a
  round-robin fashion based on the modulo of "colocation id".

Note that this PR doesn't care about DDL (except CREATE TABLE) / SQL /
operation (i.e., Citus UDFs) support for such tables but adds a
preliminary
API.
2023-05-03 16:18:27 +03:00
aykut-bozkurt 8cb69cfd13
break sequence dependency during table creation (#6889)
We need to break sequence dependency for a table while creating the
table during non-transactional metadata sync to ensure idempotency of
the creation of the table.

**Problem:**
When we send `SELECT
pg_catalog.worker_drop_sequence_dependency(logicalrelid::regclass::text)
FROM pg_dist_partition` to workers during the non-transactional sync,
table might not be in `pg_dist_partition` at worker, and sequence
dependency is not broken at the worker.

**Solution:** 
We break sequence dependency via `SELECT
pg_catalog.worker_drop_sequence_dependency(logicalrelid::regclass::text)`
for each table while creating it at the workers. It is safe to send
since the udf is a no-op when there is no sequence dependency.

DESCRIPTION: Fixes a bug related to sequence idempotency at
non-transactional sync.

Fixes https://github.com/citusdata/citus/issues/6888.
2023-04-28 15:09:09 +03:00
Onur Tirtir f87a2d02b0
Move the common logic related to creating a Citus table down to CreateCitusTable (#6836)
.. rather than having it in user facing functions. That way, we
can use the same logic for creating Citus tables from other places
too.

This would be useful for creating tenant tables via a simple function
call in the utility hook, for schema-based sharding purposes.
2023-04-14 16:13:39 +03:00
Marco Slot 343d1c5072
Refactor executor utility functions into multiple files (#6593)
Co-authored-by: Marco Slot <marco.slot@gmail.com>
2023-03-31 13:07:48 +02:00
aykutbozkurt cf4e93a332 PR #6728  / commit - 8
Drop table, if exists, during table dependency creation.
2023-03-30 10:53:22 +03:00
aykutbozkurt f8fb20cc95 PR #6728  / commit - 7
Remove unused old metadata sync methods.
2023-03-30 10:53:22 +03:00
aykutbozkurt 29ef9117e6 PR #6728  / commit - 4
Add new metadata sync methods which uses MemorySyncContext api so that during the sync we can
- free memory to prevent OOM,
- use either transactional or nontransactional modes according to the GUC .
2023-03-30 10:53:22 +03:00
Gokhan Gulbiz e71bfd6074
Identity column implementation refactorings (#6738)
This pull request proposes a change to the logic used for propagating
identity columns to worker nodes in citus. Instead of creating a
dependent sequence for each identity column and changing its default
value to `nextval(seq)/worker_nextval(seq)`, this update will pass the
identity columns as-is to the worker nodes.

Please note that there are a few limitations to this change. 

1. Only bigint identity columns will be allowed in distributed tables to
ensure compatibility with the DDL from any node functionality. Our
current distributed sequence implementation only allows insert
statements from all nodes for bigint sequences.
2. `alter_distributed_table` and `undistribute_table` operations will
not be allowed for tables with identity columns. This is because we do
not have a proper way of keeping sequence states consistent across the
cluster.

DESCRIPTION: Prevents using identity columns on data types other than
`bigint` on distributed tables
DESCRIPTION: Prevents using `alter_distributed_table` and
`undistribute_table` UDFs when a table has identity columns
DESCRIPTION: Fixes a bug that prevents enforcing identity column
restrictions on worker nodes

Depends on #6740
Fixes #6694
2023-03-30 10:41:01 +03:00
Marco Slot b09d239809 Propagate CREATE PUBLICATION statements 2023-03-29 00:59:12 +02:00
rajeshkt78 85b8a2c7a1
CDC implementation for Citus using Logical Replication (#6623)
Description:
Implementing CDC changes using Logical Replication to avoid
re-publishing events multiple times by setting up replication origin
session, which will add "DoNotReplicateId" to every WAL entry.
   - shard splits
   - shard moves
   - create distributed table
   - undistribute table
   - alter distributed tables (for some cases)
   - reference table operations
   

The citus decoder which will be decoding WAL events for CDC clients, 
ignores any WAL entry with replication origin that is not zero.
It also maps the shard names to distributed table names.
2023-03-28 16:00:21 +05:30
Onur Tirtir f68fc9e69c
Decide core distribution params in CreateCitusTable (#6760)
Decide core distribution params in CreateCitusTable to reduce the
chances of
creating Citus tables based on incorrect combinations of distribution
method
and replication model params.

Also introduce DistributedTableParams struct to encapsulate the
parameters
that are specific to distributed tables.
2023-03-14 14:24:52 +03:00
Onur Tirtir 20a5f3af2b
Replace CITUS_TABLE_WITH_NO_DIST_KEY checks with HasDistributionKey() (#6743)
Now that we will soon add another table type having DISTRIBUTE_BY_NONE
as distribution method and that we want the code to interpret such
tables mostly as distributed tables, let's make the definition of those
other two table types more strict by removing
CITUS_TABLE_WITH_NO_DIST_KEY
macro.

And instead, use HasDistributionKey() check in the places where the
logic applies to all table types that have / don't have a distribution
key. In future PRs, we might want to convert some of those
HasDistributionKey() checks if logic only applies to Citus local /
reference tables, not the others.

And adding HasDistributionKey() also allows us to consider having
DISTRIBUTE_BY_NONE as the distribution method as a "table attribute"
that can apply to distributed tables too, rather something that
determines the table type.
2023-03-10 13:55:52 +03:00
Onur Tirtir d82c11f793
Refactor CreateDistributedTable() (#6742)
Split the main logic that allows creating a Citus table into the
internal function CreateCitusTable().

Old CreateDistributedTable() function was assuming that it's creating
a reference table when the distribution method is DISTRIBUTE_BY_NONE.
However, soon this won't be the case when adding support for creating
single-shard distributed tables because their distribution method would
also be the same.

Now the internal method CreateCitusTable() doesn't make any assumptions
about table's replication model or such. Instead, it expects callers to
properly set all such metadata bits.

Even more, some of the parameters the old CreateDistributedTable() takes
 --such as the shard count-- were not meaningful for a reference table,
and would be the same as for new table type.
2023-03-08 13:38:51 +03:00
aykut-bozkurt e2654deeae
fix memory leak during altering distributed table with a lot of partition and shards (#6726)
2 improvements to prevent memory leaks during altering or undistributing
distributed tables with a lot of partitions and shards:

1. Free memory for each call to ConvertTable so that colocated and partition tables at
`AlterDistributedTable`, `UndistributeTable`, or
`AlterTableSetAccessMethod` will not cause an increase
in memory usage,
2. Free memory while executing attach partition commands for each partition table at
`AlterDistributedTable` to prevent an increase in memory usage.

DESCRIPTION: Fixes memory leak issue during altering distributed table
with a lot of partition and shards.

Fixes https://github.com/citusdata/citus/issues/6503.
2023-02-28 21:23:41 +03:00
aykut-bozkurt a7689c3f8d
fix memory leak during distribution of a table with a lot of partitions (#6722)
We have memory leak during distribution of a table with a lot of
partitions as we do not release memory at ExprContext until all
partitions are not distributed. We improved 2 things to resolve the
issue:

1. We create and delete MemoryContext for each call to
`CreateDistributedTable` by partitions,
2. We rebuild the cache after we insert all the placements instead of
each placement for a shard.

DESCRIPTION: Fixes memory leak during distribution of a table with a lot
of partitions and shards.

Fixes https://github.com/citusdata/citus/issues/6572.
2023-02-17 18:12:49 +03:00
aykut-bozkurt 273911ac7f
prevent memory leak during ConvertTable with a lot of partitions (#6693)
Prevents memory leak during ConvertTable call for a table with a lot of
partitions.

DESCRIPTION: Fixes memory leak during undistribution and alteration of a
table with a lot of partitions.
2023-02-13 15:22:13 +03:00
Gokhan Gulbiz b6a4652849
Stop background daemon before dropping the database (#6688)
DESCRIPTION: Stop maintenance daemon when dropping a database even
without Citus extension

Fixes #6670
2023-02-03 15:15:44 +03:00
aykut-bozkurt 8a9bb272e4
fix dropping table_name option from foreign table (#6669)
We should disallow dropping table_name option if foreign table is in
metadata. Otherwise, we get table not found error which contains
shardid.

DESCRIPTION: Fixes an unexpected foreign table error by disallowing to drop the table_name option.

Fixes #6663
2023-01-30 17:24:30 +03:00
Gokhan Gulbiz 4e26464969
Allow plain pg foreign tables without a table_name option (#6652) 2023-01-27 16:34:11 +03:00
Jelte Fennema 81dcddd1ef
Actually skip constraint validation on shards after shard move (#6640)
DESCRIPTION: Fix foreign key validation skip at the end of shard move

In eadc88a we started completely skipping foreign key constraint
validation at the end of a non blocking shard move, instead of only for
foreign keys to reference tables. However, it turns out that this didn't
work at all because of a hard to notice bug: By resetting the
SkipConstraintValidation flag at the end of our utility hook, we
actually make the SET command that sets it a no-op.

This fixes that bug by removing the code that resets it. This is fine
because #6543 removed the only place where we set the flag in C code. So
the resetting of the flag has no purpose anymore. This PR also adds a
regression test, because it turned out we didn't have any otherwise we
would have caught that the feature was completely broken.

It also moves the constraint validation skipping to the utility hook.
The reason is that #6550 showed us that this is the better place to skip
it, because it will also skip the planning phase and not just the
execution.
2023-01-27 13:08:05 +01:00
Onur Tirtir 97dba0ac00
Fix uninit mem acceess in UpdateFunctionDistributionInfo (#6658)
Fixes #6655.

heap_modify_tuple() fetches values[i] if replace[i] is set true,
regardless of the fact that whether isnull[i] is true or false. So
similar to replace[], let's init values[] & isnull[] too.

DESCRIPTION: Fixes an uninitialized memory access in
create_distributed_function()
2023-01-27 11:00:41 +03:00
Emel Şimşek 24f6136f72
Fixes ADD {PRIMARY KEY/UNIQUE} USING INDEX cmd (#6647)
This change allows creating a constraint without a name using an index.
The index name will be used as the constraint name the same way postgres
handles it.
Fixes issue #6644

This commit also cleans up some leftovers from nameless constraint checks.
With this commit, we now fully support adding all nameless constraints
directly to a table.

Co-authored-by: naisila <nicypp@gmail.com>
2023-01-25 21:28:07 +03:00
Naisila Puka 3c96b2a0cd
Remove unused function RelationUsesIdentityColumns (#6645)
Cleanup from #6591
2023-01-24 17:10:05 +03:00