DESCRIPTION: Adds citus.enable_schema_based_sharding GUC that allows
sharding the database based on schemas when enabled.
* Refactor the logic that automatically creates Citus managed tables
* Refactor CreateSingleShardTable() to allow specifying colocation id
instead
* Add support for schema-based-sharding via a GUC
### What this PR is about:
Add **citus.enable_schema_based_sharding GUC** to enable schema-based
sharding. Each schema created while this GUC is ON will be considered
as a tenant schema. Later on, regardless of whether the GUC is ON or
OFF, any table created in a tenant schema will be converted to a
single shard distributed table (without a shard key). All the tenant
tables that belong to a particular schema will be co-located with each
other and will have a shard count of 1.
We introduce a new metadata table --pg_dist_tenant_schema-- to do the
bookkeeping for tenant schemas:
```sql
psql> \d pg_dist_tenant_schema
Table "pg_catalog.pg_dist_tenant_schema"
┌───────────────┬─────────┬───────────┬──────────┬─────────┐
│ Column │ Type │ Collation │ Nullable │ Default │
├───────────────┼─────────┼───────────┼──────────┼─────────┤
│ schemaid │ oid │ │ not null │ │
│ colocationid │ integer │ │ not null │ │
└───────────────┴─────────┴───────────┴──────────┴─────────┘
Indexes:
"pg_dist_tenant_schema_pkey" PRIMARY KEY, btree (schemaid)
"pg_dist_tenant_schema_unique_colocationid_index" UNIQUE, btree (colocationid)
psql> table pg_dist_tenant_schema;
┌───────────┬───────────────┐
│ schemaid │ colocationid │
├───────────┼───────────────┤
│ 41963 │ 91 │
│ 41962 │ 90 │
└───────────┴───────────────┘
(2 rows)
```
Colocation id column of pg_dist_tenant_schema can never be NULL even
for the tenant schemas that don't have a tenant table yet. This is
because, we assign colocation ids to tenant schemas as soon as they
are created. That way, we can keep associating tenant schemas with
particular colocation groups even if all the tenant tables of a tenant
schema are dropped and recreated later on.
When a tenant schema is dropped, we delete the corresponding row from
pg_dist_tenant_schema. In that case, we delete the corresponding
colocation group from pg_dist_colocation as well.
### Future work for 12.0 release:
We're building schema-based sharding on top of the infrastructure that
adds support for creating distributed tables without a shard key
(https://github.com/citusdata/citus/pull/6867).
However, not all the operations that can be done on distributed tables
without a shard key necessarily make sense (in the same way) in the
context of schema-based sharding. For example, we need to think about
what happens if user attempts altering schema of a tenant table. We
will tackle such scenarios in a future PR.
We will also add a new UDF --citus.schema_tenant_set() or such-- to
allow users to use an existing schema as a tenant schema, and another
one --citus.schema_tenant_unset() or such-- to stop using a schema as
a tenant schema in future PRs.
With this PR, we allow creating distributed tables with without
specifying a shard key via create_distributed_table(). Here are the
the important details about those tables:
* Specifying `shard_count` is not allowed because it is assumed to be 1.
* We mostly call such tables as "null shard-key" table in code /
comments.
* To avoid doing a breaking layout change in create_distributed_table();
instead of throwing an error, it will inform the user that
`distribution_type`
param is ignored unless it's explicitly set to NULL or 'h'.
* `colocate_with` param allows colocating such null shard-key tables to
each other.
* We define this table type, i.e., NULL_SHARD_KEY_TABLE, as a subclass
of
DISTRIBUTED_TABLE because we mostly want to treat them as distributed
tables in terms of SQL / DDL / operation support.
* Metadata for such tables look like:
- distribution method => DISTRIBUTE_BY_NONE
- replication model => REPLICATION_MODEL_STREAMING
- colocation id => **!=** INVALID_COLOCATION_ID (distinguishes from
Citus local tables)
* We assign colocation groups for such tables to different nodes in a
round-robin fashion based on the modulo of "colocation id".
Note that this PR doesn't care about DDL (except CREATE TABLE) / SQL /
operation (i.e., Citus UDFs) support for such tables but adds a
preliminary
API.
DESCRIPTION: PR description that will go into the change log, up to 78
characters
---------
Co-authored-by: Hanefi Onaldi <Hanefi.Onaldi@microsoft.com>
In the past, having columnar tables in the cluster was causing pg
upgrades to fail when attempting to access columnar metadata. This is
because, pg_dump doesn't see objects that we use for columnar-am related
booking as the dependencies of the tables using columnar-am.
To fix that; in #5456, we inserted some "normal dependency" edges (from
those objects to columnar-am) into pg_depend.
This helped us ensuring the existency of a class of metadata objects
--such as columnar.storageid_seq-- and helped fixing #5437.
However, the normal-dependency edges that we added for indexes on
columnar metadata tables --such columnar.stripe_pkey-- didn't help at
all because they were indeed causing dependency loops (#5510) and
pg_dump was not able to take those dependency edges into the account.
For this reason, this commit deletes those dependency edges so that
pg_dump stops complaining about them. Note that it's not critical to
delete those edges from pg_depend since they're not breaking pg upgrades
but were triggering some warning messages. And given that backporting
a sql change into older versions is hard a lot, we skip backporting
this.
DESCRIPTION: Drop `SHARD_STATE_TO_DELETE` and use the cleanup records
instead
Drops the shard state that is used to mark shards as orphaned. Now we
insert cleanup records into `pg_dist_cleanup` so "orphaned" shards will
be dropped either by maintenance daemon or internal cleanup calls. With
this PR, we make the "cleanup orphaned shards" functions to be no-op, as
they would not be needed anymore.
This PR includes some naming changes about placement functions. We don't
need functions that filter orphaned shards, as there will be no orphaned
shards anymore.
We will also be introducing a small script with this PR, for users with
orphaned shards. We'll basically delete the orphaned shard entries from
`pg_dist_placement` and insert cleanup records into `pg_dist_cleanup`
for each one of them, during Citus upgrade.
We also have a lot of flakiness fixes in this PR.
Co-authored-by: Jelte Fennema <github-tech@jeltef.nl>
DESCRIPTION: Drops GUC defer_drop_after_shard_split
DESCRIPTION: Drops GUC defer_drop_after_shard_move
Drop GUCs and related parts from the code.
Delete tests that specifically added for the GUCs.
Keep tests that can be used without the GUCs.
Update test output changes.
The motivation for this PR is to have an "always deferring" mechanism.
These two GUCs provide an option to not deferring dropping objects
during a shard move/split, and dropping them immediately. With this PR,
we will be always deferring dropping orphaned shards and other types of
objects.
We will have a separate PR to extend the deferred cleanup operation, so
that we would create records for deferred drop, for Subscriptions,
Publications, Replication Slots etc. This will make us be able to keep
track of created objects that needs to be dropped, during a shard
move/split. We will have objects created specifically for the current
operation; and those objects will be dropped at the end.
We have an issue (a draft roadmap) for enabling parallel shard moves.
For details please see: https://github.com/citusdata/citus/issues/6437
* Adjust configure script to allow PG15
* Adds copy of ruleutils_14.c as ruleutils_15.c
* Uses get_namespace_name_or_temp in ruleutils_15.c
Relevant PG commit:
48c5c9068211e0a04fd9553c8714b2821ed3ad17
* Clean up code using "(expr) ? true : false" in ruleutils_15.c
Relevant PG commit:
fd0625c7a9c679c0c1e896014b8f49a489c3a245
* Change varno from Index (unsigned int) to int in ruleutils_15.c
Relevant PG commit:
e3ec3c00d85bd2844ffddee83df2bd67c4f8297f
* Adds find_recursive_union to ruleutils_15.c
Relevant PG commit:
3f50b82639637c9908afa2087de7588450aa866b
* Fix display of SQL-std func's args in INSERT/SELECT in ruleutils_15.c
Relevant PG commit:
a8d8445a7b2f80f6d0bfe97b19f90bd2cbef8759
* Fix ruleutils_15.c's dumping of whole-row Vars in more contexts
Relevant PG commit:
43c2175121c829c8591fc5117b725f1f22bfb670
* Fix assorted missing logic for GroupingFunc nodes in ruleutils_15.c
Relevant PG commit:
2591ee8ec44d8cbc8e1226550337a64c684746e4
* Adds grammar support for SQL/JSON clauses in ruleutils_15.c
Relevant PG commit:
f79b803dcc98d707450e158db3638dc67ff8380b
* Adds SQL/JSON constructors to ruleutils_15.c
Relevant PG commits:
f4fb45d15c59d7add2e1b81a9d477d0119a9691a
cc7401d5ca498a84d9b47fd2e01cebd8e830e558
* Adds support for MERGE in ruleutils_15.c
Relevant PG commit:
7103ebb7aae8ab8076b7e85f335ceb8fe799097c
* Add IS JSON predicate to ruleutils_15.c
Relevant PG commit:
33a377608fc29cdd1f6b63be561eab0aee5c81f0
* Add SQL/JSON query functions to ruleutils_15.c
Relevant PG commit:
1a36bc9dba8eae90963a586d37b6457b32b2fed4
* Adds three different SQL/JSON values to ruleutils_15.c
Relevant PG commits:
606948b058dc16bce494270eea577011a602810e
49082c2cc3d8167cca70cfe697afb064710828ca
* Adds JSON table functions in ruleutils_15.c
Relevant PG commit:
4e34747c88a03ede6e9d731727815e37273d4bc9
* Add PLAN function for JSON table in ruleutils_15.c
Relevant PG commit:
fadb48b00e02ccfd152baa80942de30205ab3c4f
* Remove extra blank lines before block-closing braces ruleutils_15.c
Relevant PG commit:
24d2b2680a8d0e01b30ce8a41c4eb3b47aca5031
* set_deparse_plan: Reuse variable to appease Coverity ruleutils_15.c
Relevant PG commit:
e70813fbc4aaca35ec012d5a426706bd54e4acab
* Mechanical code beautification ruleutils_15.c
Relevant PG commit:
23e7b38bfe396f919fdb66057174d29e17086418
* Rename value_type to item_type in ruleutils_15.c
Relevant PG commit:
3ab9a63cb638a1fd99475668e2da9c237495aeda
* Show 'AS "?column?"' explicitly when it's important in ruleutils_15.c
Relevant PG commit:
c7461fc25558832dd347a9c8150b0f1ed85e36e8
* Fix ruleutils_15.c issues with dropped cols in funcs-returning-composite
Relevant PG commit:
c1d1e8469c77ce6b8e5310955580b4a3eee7fe96
* Change comment regarding functions returning composite in ruleutils_15.c
Relevant PG commit:
c2fa113ddb1117b1f03e91960f65d5d7d8a90270
* Replace int nodes with bool nodes where needed
In PG15, Boolean nodes are added. Pre PG15, internal Boolean values
in Create Role commands were represented by Integer nodes. This
commit replaces int nodes logic with bool nodes logic where needed.
Mostly there are CREATE ROLE logic changes.
Relevant PG commit:
941460fcf731a32e6a90691508d5cfa3d1f8eeaf
* Handle new option colliculocale in CREATE COLLATION logic
In PG15, there is an added option to use ICU as global locale provider.
pg_collation has three locale-related fields: collcollate and collctype,
which are libc-related fields, and a new one colliculocale, which is the
ICU-related field. Only the libc-related fields or the ICU-related field
is set, never both.
Relevant PG commits:
f2553d43060edb210b36c63187d52a632448e1d2
54637508f87bd5f07fb9406bac6b08240283be3b
* Add PG15 tests to CI using test images that have 15beta2 (#6093)
* Change warning message in pg_signal_backend()
Relevant PG commit:
7fa945b857cc1b2964799411f1633468826861ff
* Revert "Add missing ifdef for PG 15"
This reverts commit c7b51025ab.
* Fixes tests for ALTER TRIGGER RENAME consistency for part. tables
Relevant PG commit:
80ba4bb383538a2ee846fece6a7b8da9518b6866
* Prevent creating child triggers on partitions when adding new node
Pre PG15, tgisinternal is true for a "child" trigger on a partition
cloned from the trigger on the parent.
In PG15, tgisinternal is false in that case. However, we don't want to
create this trigger on the partition since it will create a conflict
when we try to attach the partition to the parent table:
ERROR: trigger "..." for relation "{partition_name}" already exists
Relevant PG commit:
f4566345cf40b068368cb5617e61318da60676ec
* Fix tests for generated columns dependency changes
In PG15, For GENERATED columns, all dependencies of the generation
expression are recorded as NORMAL dependencies of the column itself.
This requires CASCADE to drop generated cols with the original col.
PRE PG15, dependencies were recorded as AUTO, with which
generated columns are silently dropped with the original column.
Relevant PG commit:
cb02fcb4c95bae08adaca1202c2081cfc81a28b5
* Explicitly cast catalog "char" column to text before concatenation
Relevant PG commit:
07eee5a0dc642d26f44d65c4e6263304208e8583
* Remove 'AS "?column?"' from test outputs
There were some instances in the following tst outputs
in planning debug outputs where AS "?column?" is added.
We add a normalization rule to remove it as it is not
important.
cte_inline.out
recursive_relation_planning_restriction_pushdown.out
Relevant PG commit:
c7461fc25558832dd347a9c8150b0f1ed85e36e8
* Use pg_backup_stop(PG15) instead of pg_stop_backup(PG<15)
Add an alternative test output because of the change in the
backup modes of Postgres. Specifically here, there is a renaming
issue: pg_stop_backup PRE PG15 vs pg_backup_stop PG15+
The alternative output can be deleted when we drop support for PG14
Relevant PG commit:
39969e2a1e4d7f5a37f3ef37d53bbfe171e7d77a
* Adds citus.mitmfifo GUC
Previously we setting this configuration parameter
in the fly for failure tests schedule.
However, PG15 doesn't allow that anymore: reserved prefixes
like "citus" cannot be used to set non-existing GUCs.
Relevant PG commit:
88103567cb8fa5be46dc9fac3e3b8774951a2be7
* Handles EXPLAIN output diffs in PG15 - Extra result lines
To handle extra "Result" lines in explain outputs, we add explain
method to multi_test_helpers.sql file
- plan_without_result_lines() is added for cases where we want the
whole explain output with only "Result" lines removed
* Handles EXPLAIN output diffs in PG15, Hash Agg/Join leverage
To handle differences in usage of GroupAggregate vs HashAggregate
or Merge Join vs Hash join in cases where this detail doesn't
seem to matter, we use coordinator_plan().
- coordinator_plan() is updated to remove "Result" lines
There are some cases where we have subplans so we add a new
function that prints all Task Count lines as well
- coordinator_plan_with_subplans()
Still not sure of the relevant PG commit
Could be db0d67db2401eb6238ccc04c6407a4fd4f985832
but disabling enable_group_by_reordering didn't help.
* Handles EXPLAIN output diffs in PG15: enable_group_by_reordering
Relevant PG commit
db0d67db2401eb6238ccc04c6407a4fd4f985832
* Normalizes Memory Usage, Buckets, Batches for PG15 explain diffs
We create a new function in multi_test_helpers, which is similar
to explain_merge function in PG15. This explain helper function
normalies Memory Usage, Buckets and Batches, and we use it in the
tests which give a different output for PG15.
* Bump test images to 15beta3 (#6172)
* Omit namespace in post-copy errmsg
Relevant PG commit:
069d33d0c5a021601245e44df77a0423ddd69359
* Handles EXPLAIN output diffs in PG15: extra arrows&result lines
To handle extra "->" arrows resulting from extra Result lines
in explain outputs, we add the following explain method to
multi_test_helpers.sql file
- plan_without_arrows() is added for cases where we want the
whole explain output without arrows and without Result lines
* Alters public schema's owner to pg_database_owner in PG15
In PG15, public schema is owned by pg_database_owner role.
In multi_extension, we drop and recreate the ppublic schema,
hence its owner become the default user in our tests, postgres.
Change that to pg_database_owner for PG15 consistency.
This results in alternative test output for public schema grants
in the following test:
grant_on_schema_propagation.sql
Relevant PG commit: b073c3ccd06e4cb845e121387a43faa8c68a7b62
* Add alternative test outputs for change in Insert Select display
citus_local_tables_queries.sql
coordinator_shouldhaveshards.sql
cte_inline.sql
insert_select_repartition.sql
intermediate_result_pruning.sql
local_shard_execution.sql
local_shard_execution_replicated.sql
multi_deparse_shard_query.sql
multi_insert_select.sql
multi_insert_select_conflict.sql
multi_mx_insert_select_repartition.sql
mx_coordinator_shouldhaveshards.sql
single_node.sql
Relevant PG commit:
a8d8445a7b2f80f6d0bfe97b19f90bd2cbef8759
* Fixes columnar tap tests for PG15
In PG15, Perl test modules have been moved to a new namespace.
Also, postgres node new() and get_new_node() methods have been
unified to one method: new()
We create separate tap tests for PG13/14 and PG15+
and update the Makefiles accordingly.
Relevant PG commits:
201a76183e2056c2217129e12d68c25ec9c559c8
b3b4d8e68ae83f432f43f035c7eb481ef93e1583
* Handles EXPLAIN output diffs in PG15: HashAgg Leverage,alt. output
Still not sure of the relevant PG commit
Could be db0d67db2401eb6238ccc04c6407a4fd4f985832
but disabling enable_group_by_reordering didn't help.
* Remove if conditions with PG_VERSION_NUM < 13
* Remove server_above_twelve(&eleven) checks from tests
* Fix tests
* Remove pg12 and pg11 alternative test output files
* Remove pg12 specific normalization rules
* Some more if conditions in the code
* Change RemoteCollationIdExpression and some pg12/pg13 comments
* Remove some more normalization rules
* Support upgrade and downgrade and separate columnar as citus_columnar extension
Co-authored-by: Yanwen Jin <yanwjin@microsoft.com>
Co-authored-by: Jeff Davis <jeff@j-davis.com>
Move internal storage details to a separate schema with no public
access to limit the possibility for information leakage.
Create views with public access that show storage details for those
columnar tables where the user has ownership privileges. Include
mapping between relation ID and storage ID for easier interpretation.
Columnar: support relation options with ALTER TABLE.
Use ALTER TABLE ... SET/RESET to specify relation options rather than
alter_columnar_table_set() and alter_columnar_table_reset().
Not only is this more ergonomic, but it also allows better integration
because it can be treated like DDL on a regular table. For instance,
citus can use its own ProcessUtility_hook to distribute the new
settings to the shards.
DESCRIPTION: Columnar: support relation options with ALTER TABLE.
DESCRIPTION: Move pg_dist_object to pg_catalog
Historically `pg_dist_object` had been created in the `citus` schema as an experiment to understand if we could move our catalog tables to a branded schema. We quickly realised that this interfered with the UX on our managed services and other environments, where users connected via a user with the name of `citus`.
By default postgres put the username on the search_path. To be able to read the catalog in the `citus` schema we would need to grant access permissions to the schema. This caused newly created objects like tables etc, to default to this schema for creation. This failed due to the write permissions to that schema.
With this change we move the `pg_dist_object` catalog table to the `pg_catalog` schema, where our other schema's are also located. This makes the catalog table visible and readable by any user, like our other catalog tables, for debugging purposes.
Note: due to the change of schema, we had to disable 1 test that was running into a discrepancy between the schema and binary. Secondly, we needed to make the lookup functions for the `pg_dist_object` relation and their indexes less strict on the fallback of the naming due to an other test that, due to an unfortunate cache invalidation, needed to lookup the relation again. This makes that we won't default to _only_ resolving from `pg_catalog` outside of upgrades.
Clusters created pre-Citus 11 mostly didn't have metadata sync enabled.
For those clusters, we add a utility UDF which fixes some minor issues
and sync the necessary objects to the workers.
Replaces citus.enable_object_propagation with citus.enable_metadata_sync
Also, within Citus 11 release cycle, we added citus.enable_metadata_sync_by_default,
that is also replaced with citus.enable_metadata_sync.
In essence, when citus.enable_metadata_sync is set to true, all the objects
and the metadata is send to the remote node.
We strongly advice that the users never changes the value of
this GUC.
With this commit we've started to propagate sequences and shell
tables within the object dependency resolution. So, ensuring any
dependencies for any object will consider shell tables and sequences
as well. Separate logics for both shell tables and sequences have
been removed.
Since both shell tables and sequences logic were implemented as a
part of the metadata handling before that logic, we were propagating
them while syncing table metadata. With this commit we've divided
metadata (which means anything except shards thereafter) syncing
logic into multiple parts and implemented it either as a part of
ActivateNode. You can check the functions called in ActivateNode
to check definition of different metadata.
Definitions of start_metadata_sync_to_node and citus_activate_node
have also been updated. citus_activate_node will basically create
an active node with all metadata and reference table shards.
start_metadata_sync_to_node will be same with citus_activate_node
except replicating reference tables. stop_metadata_sync_to_node
will remove all the metadata. All of those UDFs need to be called
by superuser.
With Citus 11, the default behavior is to sync the metadata.
However, partitioned tables created pre-Citus 11 might have
index names that are not compatiable with metadata syncing.
See https://github.com/citusdata/citus/issues/4962 for the
details.
With this commit, we record the existence of partitioned tables
such that we can fix it later if any exists.
During pg upgrades, we have seen that it is not guaranteed that a
columnar table will be created after metadata objects got created.
Prior to changes done in this commit, we had such a dependency
relationship in `pg_depend`:
```
columnar_table ----> columnarAM ----> citus extension
^ ^
| |
columnar.storage_id_seq -------------------- |
|
columnar.stripe -------------------------------
```
Since `pg_upgrade` just knows to follow topological sort of the objects
when creating database dump, above dependency graph doesn't imply that
`columnar_table` should be created before metadata objects such as
`columnar.storage_id_seq` and `columnar.stripe` are created.
For this reason, with this commit we add new records to `pg_depend` to
make columnarAM depending on all rel objects living in `columnar`
schema. That way, `pg_upgrade` will know it needs to create those before
creating `columnarAM`, and similarly, before creating any tables using
`columnarAM`.
Note that in addition to inserting those records via installation script,
we also do the same in `citus_finish_pg_upgrade()`. This is because,
`pg_upgrade` rebuilds catalog tables in the new cluster and that means,
we must insert them in the new cluster too.
We recently introduced a set of patches to 10.2, and introduced 10.2-4
migration version. This migration version only resides on `release-10.2`
branch, and is missing on our default branch. This creates a problem
because we do not have a valid migration path from 10.2 to latest 11.0.
To remedy this issue, I copied the relevant migration files from
`release-10.2` branch, and renamed some of our migration files on
default branch to make sure we have a linear upgrade path.
Considering all code-paths that we might interact with a columnar table,
add `CheckCitusVersion` calls to tableAM callbacks:
- initializing table scan (`columnar_beginscan` & `columnar_index_fetch_begin`)
- setting a new filenode for a relation (storage initializiation or a table rewrite)
- truncating the storage
- inserting tuple (single and multi)
Also add `CheckCitusVersion` call to:
- drop hook (`ColumnarTableDropHook`)
- `alter_columnar_table_set` & `alter_columnar_table_reset` UDFs