With this PR, Citus starts to support all possible ways to create
distributed partitioned tables. These are;
- Distributing already created partitioning hierarchy
- CREATE TABLE ... PARTITION OF a distributed_table
- ALTER TABLE distributed_table ATTACH PARTITION non_distributed_table
- ALTER TABLE distributed_table ATTACH PARTITION distributed_table
We also support DETACHing partitions from partitioned tables and propogating
TRUNCATE and DDL commands to distributed partitioned tables.
This PR also refactors some parts of distributed table creation logic.
This change removes distributed tables' dependency on distribution key columns. We already check that we cannot drop distribution key columns in ErrorIfUnsupportedAlterTableStmt() at multi_utility.c, so we don't need to have distributed table to distribution key column dependency to avoid dropping of distribution key column.
Furthermore, having this dependency causes some warnings in pg_dump --schema-only (See #866), which are not desirable.
This change also adds check to disallow drop of distribution keys when citus.enable_ddl_propagation is set to false. Regression tests are updated accordingly.
Before this change, we used ShareLock to acquire lock on distributed tables while
running VACUUM. This makes VACUUM and INSERT block each other. With this change we
changed lock mode from ShareLock to ShareUpdateExclusiveLock, which does not conflict
with the locks INSERT acquire.
Adds support for PostgreSQL 10 by copying in the requisite ruleutils
and updating all API usages to conform with changes in PostgreSQL 10.
Most changes are fairly minor but they are numerous. One particular
obstacle was the change in \d behavior in PostgreSQL 10's psql; I had
to add SQL implementations (views, mostly) to mimic the pre-10 output.
During version update, we indirectly calld CheckInstalledVersion via
ChackCitusVersions. This obviously fails because during version update it is
expected to have version mismatch between installed version and binary version.
Thus, we remove that ChackCitusVersions. We now only call ChackAvailableVersion.
Before this commit, we were erroring out at almost all queries if there is a
version mismatch. With this commit, we started to error out only requested
operation touches distributed tables.
Normally we would need to use distributed cache to understand whether a table
is distributed or not. However, it is not safe to read our metadata tables when
there is a version mismatch, thus it is not safe to create distributed cache.
Therefore for this specific occasion, we directly read from pg_dist_partition
table. However; reading from catalog is costly and we should not use this
method in other places as much as possible.
Pretty straightforward. Had some concerns about locking, but due to the
fact that all distributed operations use either some level of deparsing
or need to enumerate column names, they all block during any concurrent
column renames (due to the AccessExclusive lock).
In addition, I had some misgivings about permitting renames of the dis-
tribution column, but nothing bad comes from just allowing them.
Finally, I tried to trigger any sort of error using prepared statements
and could not trigger any errors not also exhibited by plain PostgreSQL
tables.
This was getting pretty long and complex in the context of the main
utility hook. Moved out the checks for what should skip Citus process-
ing and what should have version checks performed.
With this change, we start to error out if loaded citus binaries does not match
the available major version or installed citus extension version. In this case
we force user to restart the server or run ALTER EXTENSION depending on the
situation
Thought this looked slightly nicer than the default behavior.
Changed preventTransaction to concurrent to be clearer that this code
path presently affects CONCURRENTLY code only.
Coordinator code marks index as invalid as a base, set it as valid in a
transactional layer atop that base, then proceeds with worker commands.
If a worker command has problems, the rollback results in an index with
isvalid = false. If everything succeeds, the user sees a valid index.
Some tests relied on worker errors though local commands were invalid.
Fixed those by ensuring preconditions were met to have command work
correctly. Otherwise most test changes are related to slight changes
in local/remote error ordering.
We have one replication of reference table for each node. Therefore all problems with
replication factor > 1 also applies to reference table. As a solution we will not allow
foreign keys on reference tables. It is not possible to define foreign key from, to or
between reference tables.
With this commit, we implemented some basic features of reference tables.
To start with, a reference table is
* a distributed table whithout a distribution column defined on it
* the distributed table is single sharded
* and the shard is replicated to all nodes
Reference tables follows the same code-path with a single sharded
tables. Thus, broadcast JOINs are applicable to reference tables.
But, since the table is replicated to all nodes, table fetching is
not required any more.
Reference tables support the uniqueness constraints for any column.
Reference tables can be used in INSERT INTO .. SELECT queries with
the following rules:
* If a reference table is in the SELECT part of the query, it is
safe join with another reference table and/or hash partitioned
tables.
* If a reference table is in the INSERT part of the query, all
other participating tables should be reference tables.
Reference tables follow the regular co-location structure. Since
all reference tables are single sharded and replicated to all nodes,
they are always co-located with each other.
Queries involving only reference tables always follows router planner
and executor.
Reference tables can have composite typed columns and there is no need
to create/define the necessary support functions.
All modification queries, master_* UDFs, EXPLAIN, DDLs, TRUNCATE,
sequences, transactions, COPY, schema support works on reference
tables as expected. Plus, all the pre-requisites associated with
distribution columns are dismissed.