With this commit, we implemented some basic features of reference tables.
To start with, a reference table is
* a distributed table whithout a distribution column defined on it
* the distributed table is single sharded
* and the shard is replicated to all nodes
Reference tables follows the same code-path with a single sharded
tables. Thus, broadcast JOINs are applicable to reference tables.
But, since the table is replicated to all nodes, table fetching is
not required any more.
Reference tables support the uniqueness constraints for any column.
Reference tables can be used in INSERT INTO .. SELECT queries with
the following rules:
* If a reference table is in the SELECT part of the query, it is
safe join with another reference table and/or hash partitioned
tables.
* If a reference table is in the INSERT part of the query, all
other participating tables should be reference tables.
Reference tables follow the regular co-location structure. Since
all reference tables are single sharded and replicated to all nodes,
they are always co-located with each other.
Queries involving only reference tables always follows router planner
and executor.
Reference tables can have composite typed columns and there is no need
to create/define the necessary support functions.
All modification queries, master_* UDFs, EXPLAIN, DDLs, TRUNCATE,
sequences, transactions, COPY, schema support works on reference
tables as expected. Plus, all the pre-requisites associated with
distribution columns are dismissed.
With this PR, we add foreign key support to ALTER TABLE commands. For now,
we only support foreign constraint creation via ALTER TABLE query, if it
is only subcommand in ALTER TABLE subcommand list.
We also only allow foreign key creation if replication factor is 1.
While creating a colocated table, we don't want the source table to be dropped.
However, using a ShareLock blocks DML statements on the source table, and
using AccessShareLock is enough to prevent DROP. Therefore, we just loosened
the lock to AccessShareLock.
Added a new UDF, mark_tables_colocated(), to colocate tables with the same
configuration (shard count, shard replication count and distribution column type).
Previously, when a repair is requested on a shard, we also repair all co-located shards
of given shard, which may cause repairing already healthy shards. With this change, we
only repair given shard.
This forces prepared statements to be re-planned after changes of the
placement metadata. There's some locking issues remaining, but that's a
a separate task.
Also add regression tests verifying that invalidations take effect on
prepared statements.
With this change, we now push down foreign key constraints created during CREATE TABLE
statements. We also start to send foreign constraints during shard move along with
other DDL statements
With this change, master_copy_shard_placement and master_move_shard_placement functions
start to copy/move given shard along with its co-located shards.
This change adds the required infrastructure about metadata snapshot from MX
codebase into Citus, mainly metadata_sync.c file and master_metadata_snapshot UDF.
So far placements were assigned an Oid, but that was just used to track
insertion order. It also did so incompletely, as it was not preserved
across changes of the shard state. The behaviour around oid wraparound
was also not entirely as intended.
The newly introduced, explicitly assigned, IDs are preserved across
shard-state changes.
The prime goal of this change is not to improve ordering of task
assignment policies, but to make it easier to reference shards. The
newly introduced UpdateShardPlacementState() makes use of that, and so
will the in-progress connection and transaction management changes.
Related to #786
This change adds the `pg_dist_node` table that contains the information
about the workers in the cluster, replacing the previously used
`pg_worker_list.conf` file (or the one specified with `citus.worker_list_file`).
Upon update, `pg_worker_list.conf` file is read and `pg_dist_node` table is
populated with the file's content. After that, `pg_worker_list.conf` file
is renamed to `pg_worker_list.conf.obsolete`
For adding and removing nodes, the change also includes two new UDFs:
`master_add_node` and `master_remove_node`, which require superuser
permissions.
'citus.worker_list_file' guc is kept for update purposes but not used after the
update is finished.
related to a table that might be distributed, allowing any name
that is within regular PostgreSQL length limits to be extended
with a shard ID for use in shards on workers. Handles multi-byte
character boundaries in identifiers when making prefixes for
shard-extended names. Includes tests.
Uses hash_any from PostgreSQL's access/hashfunc.c.
Removes AppendShardIdToStringInfo() as it's used only once
and arguably is best replaced there with a call to AppendShardIdToName().
Adds UDF shard_name(object_name, shard_id) to expose the shard-extended
name logic to other PL/PGSQL, UDFs and scripts.
Bumps version to 6.0-2 to allow for UDF to be created in migration script.
Fixescitusdata/citus#781 and citusdata/citus#179.
Fixescitusdata/citus#714
On `InsertShardRow`, we previously called `CommandCounterIncrement()` before
`CitusInvalidateRelcacheByRelid(relationId);`. This might prevent to skip
invalidation of the distributed table in the next access within the same session.
UNIQUE or PRIMARY KEY constraints. Also, properly propagate valid
EXCLUDE constraints to worker shard tables.
If an EXCLUDE constraint includes the distribution column,
the operator must be an equality operator.
Tests in regression suite for exclusion constraints that include
the partition column, omit it, and include it but with non-equality
operator. Regression tests also verify that valid exclusion constraints
are propagated to the shard tables. And the tests work in different
timezones now.
Fixescitusdata/citus#748 and citusdata/citus#778.
Three changes here to get to true multi-statement, multi-relation DDL
transactions (same functionality pre-5.2, with benefits of atomicity):
1. Changed the multi-shard utility hook to always run (consistency
with router executor hook, removes ad-hoc "installed" boolean)
2. Change the global connection list in multi_shard_transaction to
instead be a hash; update related functions to operate on global
hash instead of local hash/global list
3. Remove check within DDL code to prevent subsequent DDL commands;
place unset/reset guard around call to ConnectToNode to permit
connecting to additional nodes after DDL transaction has begun
In addition, code has been added to raise an error if a ROLLBACK TO
SAVEPOINT is attempted (similar to router executor), and comprehensive
tests execute all multi-DDL scenarios (full success, user ROLLBACK, any
actual errors (say, duplicate index), partial failure (duplicate index
on one node but not others), partial COMMIT (one node fails), and 2PC
partial PREPARE (one node fails)). Interleavings with other commands
(DML, \copy) are similarly all covered.
This adds support for SERIAL/BIGSERIAL column types. Because we now can
evaluate functions on the master (during execution), adding this is a
matter of ensuring the table creation step works properly.
To accomplish this, I've added some logic to detect sequences owned by
a table (i.e. those related to its columns). Simply creating a sequence
and using it in a default value is insufficient; users who do so must
ensure the sequence is owned by the column using it.
Fortunately, this is exactly what SERIAL and BIGSERIAL do, which is the
use case we're targeting with this feature. While testing this, I found
that worker_apply_shard_ddl_command actually adds shard identifiers to
sequence names, though I found no places that use or test this path. I
removed that code so that sequence names are not mutated and will match
those used by a SERIAL default value expression.
Our use of the new-to-9.5 CREATE SEQUENCE IF NOT EXISTS syntax means we
are dropping support for 9.4 (which is being done regardless, but makes
this change simpler). I've removed 9.4 from the Travis build matrix.
Some edge cases are possible in ALTER SEQUENCE, COPY FROM (on workers),
and CREATE SEQUENCE OWNED BY. I've added errors for each so that users
understand when and why certain operations are prohibited.
Fixes#676
We added old versions (i.e. without schema name) of worker_apply_shard_ddl_command,
worker_fetch_foreign_file and worker_fetch_regular_table back. During function call
of one of these functions, we set schema name as public schema and call the newer
version of the functions.
Fixes#565Fixes#626
To add schema support to citus, we need to schema-prefix all table names, object names etc.
in the queries sent to worker nodes. However; query deparsing is not available for most of
DDL commands, therefore it is not easy to generate worker query in the master node.
As a solution we are sending schema names along with shard id and query to run to worker
nodes with worker_apply_shard_ddl_command.
To not break \STAGE command we pass public schema as paramater while calling
worker_apply_shard_ddl_command from there. This will not cause problem if user uses \STAGE
in different schema because passes schema name is used only if there is no schema name is
given in the query.
Fixes#215Fixes#267Fixes#502Fixes#556Fixes#557Fixes#560Fixes#568Fixes#623Fixes#624
With this change we schema-prefix table names, operator names and composite types.
Fixes#513
This change modifies the DDL Propagation logic so that DDL queries
are propagated via 2-Phase Commit protocol. This way, failures during
the execution of distributed DDL commands will not leave the table in
an intermediate state and the pending prepared transactions can be
commited manually.
DDL commands are not allowed inside other transaction blocks or functions.
DDL commands are performed with 2PC regardless of the value of
`citus.multi_shard_commit_protocol` parameter.
The workflow of the successful case is this:
1. Open individual connections to all shard placements and send `BEGIN`
2. Send `SELECT worker_apply_shard_ddl_command(<shardId>, <DDL Command>)`
to all connections, one by one, in a serial manner.
3. Send `PREPARE TRANSCATION <transaction_id>` to all connections.
4. Sedn `COMMIT` to all connections.
Failure cases:
- If a worker problem occurs before sending of all DDL commands is finished, then
all changes are rolled back.
- If a worker problem occurs after all DDL commands are sent but not after
`PREPARE TRANSACTION` commands are finished, then all changes are rolled back.
However, if a worker node is failed, then the prepared transactions in that worker
should be rolled back manually.
- If a worker problem occurs during `COMMIT PREPARED` statements are being sent,
then the prepared transactions on the failed workers should be commited manually.
- If master fails before the first 'PREPARE TRANSACTION' is sent, then nothing is
changed on workers.
- If master fails during `PREPARE TRANSACTION` commands are being sent, then the
prepared transactions on workers should be rolled back manually.
- If master fails during `COMMIT PREPARED` or `ROLLBACK PREPARED` commands are being
sent, then the remaining prepared transactions on the workers should be handled manually.
This change also helps with #480, since failed DDL changes no longer mark
failed placements as inactive.