Commit Graph

395 Commits (9fd3f62cb62f4c91caea27b32ce7df2d10ed95f0)

Author SHA1 Message Date
Onder Kalaci c1b5a04f6e Allow partitioned tables with replication factor > 1
With this commit, we all partitioned distributed tables with
replication factor > 1. However, we also have many restrictions.

In summary, we disallow all kinds of modifications (including DDLs)
on the partition tables. Instead, the user is allowed to run the
modifications over the parent table.

The necessity for such a restriction have two aspects:
   - We need to acquire shard resource locks appropriately
   - We need to handle marking partitions INVALID in case
     of any failures. Note that, in theory, the parent table
     should also become INVALID, which is too aggressive.
2018-09-21 14:40:41 +03:00
Onder Kalaci 5cf8fbe7b6 Add infrastructure to relation if exists 2018-09-07 14:49:36 +03:00
Onder Kalaci 1b3257816e Make sure that table is dropped before shards are dropped
This commit fixes a bug where a concurrent DROP TABLE deadlocks
with SELECT (or DML) when the SELECT is executed from the workers.

The problem was that Citus used to remove the metadata before
droping the table on the workers. That creates a time window
where the SELECT starts running on some of the nodes and DROP
table on some of the other nodes.
2018-09-04 08:57:20 +03:00
velioglu bd30e3e908 Add support for writing to reference tables from MX nodes 2018-08-27 18:15:04 +03:00
mehmet furkan şahin ef9f38b68d ApplyLogRedaction noop func is added 2018-08-17 14:48:54 -07:00
Onder Kalaci 7fb529aab9 Some stylistic improvements in the foreign keys to reference table
changes.
2018-07-05 23:23:34 +03:00
Nils Dijk c1c8c38dc9 create placeholder for policy ddl 2018-07-05 11:07:01 +02:00
Onder Kalaci d83be3a33f Enforce foreign key restrictions inside transaction blocks
When a hash distributed table have a foreign key to a reference
table, there are few restrictions we have to apply in order to
prevent distributed deadlocks or reading wrong results.

The necessity to apply the restrictions arise from cascading
nature of foreign keys. When a foreign key on a reference table
cascades to a distributed table, a single operation over a single
connection can acquire locks on multiple shards of the distributed
table. Thus, any parallel operation on that distributed table, in the
same transaction should not open parallel connections to the shards.
Otherwise, we'd either end-up with a self-distributed deadlock or
read wrong results.

As briefly described above, the restrictions that we apply is done
by tracking the distributed/reference relation accesses inside
transaction blocks, and act accordingly when necessary.

The two main rules are as follows:
   - Whenever a parallel distributed relation access conflicts
     with a consecutive reference relation access, Citus errors
     out
   - Whenever a reference relation access is followed by a
     conflicting parallel relation access, the execution mode
     is switched to sequential mode.

There are also some other notes to mention:
   - If the user does SET LOCAL citus.multi_shard_modify_mode
     TO 'sequential';, all the queries should simply work with
     using one connection per worker and sequentially executing
     the commands. That's obviously a slower approach than Citus'
     usual parallel execution. However, we've at least have a way
     to run all commands successfully.

   - If an unrelated parallel query executed on any distributed
     table, we cannot switch to sequential mode. Because, the essense
     of sequential mode is using one connection per worker. However,
     in the presence of a parallel connection, the connection manager
     picks those connections to execute the commands. That contradicts
     with our purpose, thus we error out.

   - COPY to a distributed table cannot be executed in sequential mode.
     Thus, if we switch to sequential mode and COPY is executed, the
     operation fails and there is currently no way of implementing that.
     Note that, when the local table is not empty and create_distributed_table
     is used, citus uses COPY internally. Thus, in those cases,
     create_distributed_table() will also fail.

   - There is a GUC called citus.enforce_foreign_key_restrictions
     to disable all the checks. We added that GUC since the restrictions
     we apply is sometimes a bit more restrictive than its necessary.
     The user might want to relax those. Similarly, if you don't have
     CASCADEing reference tables, you might consider disabling all the
     checks.
2018-07-03 17:05:55 +03:00
velioglu 6be6911ed9 Create foreign key relation graph and functions to query on it 2018-07-03 17:05:55 +03:00
mehmet furkan şahin 2c5d59f3a8 create_distributed_table in transaction is fixed 2018-07-03 17:05:01 +03:00
Onder Kalaci 2f01894589 Track relation accesses using the connection management infrastructure 2018-06-25 18:40:30 +03:00
mehmet furkan şahin 2b2ce036eb create_distributed_table honors sequential mode 2018-06-19 17:33:45 +03:00
mehmet furkan şahin d1a3b20115 foreign_constraint_utils is created 2018-06-07 18:19:24 +03:00
Marco Slot 2559b84049 Drop shards as current user instead of super user 2018-05-01 09:57:20 +02:00
Marco Slot 3d3c19a717
Improve messages for essential connection failures 2018-04-26 12:58:47 -06:00
Murat Tuncer a6fe5ca183 PG11 compatibility update
- changes in ruleutils_11.c is reflected
- vacuum statement api change is handled. We now allow
  multi-table vacuum commands.
- some other function header changes are reflected
- api conflicts between PG11 and earlier versions
  are handled by adding shims in version_compat.h
- various regression tests are fixed due output and
  functionality in PG1
- no change is made to support new features in PG11
  they need to be handled by new commit
2018-04-26 11:29:43 +03:00
Brian Cloutier 42ddfa176d Fix crash on Windows where there is no detail 2018-04-13 12:54:22 -07:00
Burak Yucesoy 0c283fa8a3 Add partitioning support to MX tables
Previously, we prevented creation of partitioned tables on Citus MX.
We decided to not focus on this feature until there is a need. Since
now there are requests for this feature, we are implementing support
for partitioned tables on Citus MX.
2018-04-06 12:47:06 +03:00
Metin Doslu 3b7b64a8b6 Remove skip_jsonb_validation_in_copy GUC 2018-03-13 10:33:27 +02:00
Marco Slot 6f7c3bd73b Skip JSON validation on coordinator during COPY 2018-02-02 15:33:27 +01:00
Marco Slot 36ee21c323 Make CanUseBinaryCopyFormatForType public 2017-12-14 09:32:55 +01:00
Marco Slot 4cdadfcab6 Add intermediate results infrastructure 2017-12-04 14:50:11 +01:00
Marco Slot bfcc76df69 Make several COPY-related functions public 2017-12-04 13:12:03 +01:00
Murat Tuncer 2d66bf5f16
Fix hard coded formatting strings for 64 bit numbers (#1831)
Postgres provides OS agnosting formatting macros for
formatting 64 bit numbers. Replaced %ld %lu with
INT64_FORMAT and UINT64_FORMAT respectively.

Also found some incorrect usages of formatting
flags and fixed them.
2017-12-04 14:11:06 +03:00
Hadi Moshayedi ff706cf556 Test that COPY blocks UPDATE/DELETE/INSERT...SELECT when rep factor 2. 2017-11-30 14:52:29 -05:00
Marco Slot acbc0fe0de Use RowExclusiveLock shard resource lock in COPY 2017-11-30 09:15:45 -05:00
Marco Slot a9933deac6 Make real time executor work in transactions 2017-11-30 09:59:32 +03:00
Brian Cloutier 7be1545843 Support implicit casts during INSERT/SELECT
It's possible to build INSERT SELECT queries which include implicit
casts, currently we attempt to support these by adding explicit casts to
the SELECT query, but this sometimes crashes because we don't update all
nodes with the new types. (SortClauses, for instance)

This commit removes those explicit casts and passes an unmodified SELECT
query to the COPY executor (how we implement INSERT SELECT under the
scenes). In lieu of those cases, COPY has been given some extra logic to
inspect queries, notice that the types don't line up with the table it's
supposed to be inserting into, and "manually" casting every tuple before
sending them to workers.
2017-11-03 22:27:15 -07:00
Marco Slot 6883a09cdd Allow distributed partitioned table creation in Cloud 2017-11-03 10:09:18 +01:00
Brian Cloutier 91ff8cd2d5 {*,}create_distributed_table doesn't emit OID (#1710) 2017-10-16 18:08:51 -06:00
Jason Petersen b4d53423fa
Add adapter functions for OpenFile changes 2017-09-25 17:20:24 -07:00
Jason Petersen bbc15e0598
Handle HASHPROC changes
PostgreSQL 11 now has "standard" and "extended" (64-bit) versions of
hash functions.
2017-09-25 17:20:24 -07:00
Jason Petersen 6c9b19a954
Add version-compat header
For polyfill macros, etc.
2017-09-25 17:20:23 -07:00
Jason Petersen fbeaa2f9d0
Remove direct access to tupleDesc->attrs
A level of indirection was removed from this field for PostgreSQL 11.
By using the handy provided macro, we can be version agnostic.
2017-09-25 17:20:23 -07:00
Onder Kalaci 4782f9f98a Properly copy and trim the error messages that come from pg_conn
When a NULL connection is provided to PQerrorMessage(), the
returned error message is a static text. Modifying that static
text, which doesn't necessarly be in a writeable memory, is
dangreous and might cause a segfault.
2017-09-22 19:43:09 +03:00
Onder Kalaci 33ec33c5b3 Ensure schema exists on reference table creation
If the schema doesn't exists on the workers, create it.
2017-09-18 23:50:47 +03:00
Marco Slot cbe16169b4 Free per-tuple COPY memory in INSERT...SELECT 2017-09-12 15:35:53 -07:00
Marco Slot cf375d6a66 Consider dropped columns that precede the partition column in COPY 2017-08-22 13:02:35 +02:00
Marco Slot 4614814de1 Enable 2PC for INSERT...SELECT via coordinator 2017-08-15 13:44:20 +02:00
Marco Slot 9232823070 Abort on failure on master connection during copy from worker 2017-08-15 13:44:20 +02:00
Marco Slot 53584affa8 Fix locking in create_distributed_table 2017-08-11 11:34:33 +03:00
Burak Yucesoy 2eee556738 Add distributed partitioned table support for COPY
For partitioned tables, PostgreSQL opens partition and its partitions
in BeginCopyFrom and it expects its caller to close those relations.
However, we do not have quick access to opened relations and performing
special operations for partitioned tables isn't necessary in coordinator
node. Therefore before calling BeginCopyFrom, we change relkind of those
partitioned tables to RELKIND_RELATION. This prevents PostgreSQL to open
its partitions as well.
2017-08-09 10:01:35 +03:00
Burak Yucesoy fddf9b3fcc Add distributed partitioned table support distributed table creation
With this PR, Citus starts to support all possible ways to create
distributed partitioned tables. These are;

- Distributing already created partitioning hierarchy
- CREATE TABLE ... PARTITION OF a distributed_table
- ALTER TABLE distributed_table ATTACH PARTITION non_distributed_table
- ALTER TABLE distributed_table ATTACH PARTITION distributed_table

We also support DETACHing partitions from partitioned tables and propogating
TRUNCATE and DDL commands to distributed partitioned tables.

This PR also refactors some parts of distributed table creation logic.
2017-08-09 10:01:35 +03:00
Marco Slot d3e9746236 Avoid connections that accessed non-colocated placements in multi-shard commands 2017-08-08 18:32:34 +02:00
Burak Yucesoy 7769f1d012 Refactor distributed table creation logic
This commit is preperation for introducing distributed partitioned
table support. We want to clean and refactor some code in distributed
table creation logic so that we can handle partitioned tables in more
robust way.
2017-07-31 11:11:23 +03:00
Brian Cloutier ec99f8f983 Add nodeRole column
- master_add_node enforces that there is only one primary per group
- there's also a trigger on pg_dist_node to prevent multiple primaries
  per group
- functions in metadata cache only return primary nodes
- Rename ActiveWorkerNodeList -> ActivePrimaryNodeList
- Rename WorkerGetLive{Node->Group}Count()
- Refactor WorkerGetRandomCandidateNode
- master_remove_node only complains about active shard placements if the
  node being removed is a primary.
- master_remove_node only deletes all reference table placements in the
  group if the node being removed is the primary.
- Rename {Node->NodeGroup}HasShardPlacements, this reflects the behavior it
  already had.
- Rename DeleteAllReferenceTablePlacementsFrom{Node->NodeGroup}. This also
  reflects the behavior it already had, but the new signature forces the
  caller to pass in a groupId
- Rename {WorkerGetLiveGroup->ActivePrimaryNode}Count
2017-07-24 11:57:46 +03:00
velioglu 6ea15fbb25 Make create_distributed_table transactional 2017-07-18 12:35:40 +03:00
Andres Freund 90a2d13a64 Move multi_copy.c to interrupt aware libpq wrappers. 2017-07-04 14:46:03 -07:00
Andres Freund 3dedeadb5e Fix memory leak in RemoteFinalizedShardPlacementList(). 2017-07-04 12:38:52 -07:00
Andres Freund b96ba9b490 Fix code only enabled for 9.5.
There's still supporting wrappers used, a subsequent commit will
remove those.

This also removes the already unused tuplecount_t define.
2017-06-26 08:46:32 -07:00
Jason Petersen 2204da19f0 Support PostgreSQL 10 (#1379)
Adds support for PostgreSQL 10 by copying in the requisite ruleutils
and updating all API usages to conform with changes in PostgreSQL 10.
Most changes are fairly minor but they are numerous. One particular
obstacle was the change in \d behavior in PostgreSQL 10's psql; I had
to add SQL implementations (views, mostly) to mimic the pre-10 output.
2017-06-26 02:35:46 -06:00
Marco Slot a6f42e4948 Clarify error message when copying NULL value into table 2017-06-22 15:48:24 +02:00
Marco Slot 2f8ac82660 Execute INSERT..SELECT via coordinator if it cannot be pushed down
Add a second implementation of INSERT INTO distributed_table SELECT ... that is used if
the query cannot be pushed down. The basic idea is to execute the SELECT query separately
and pass the results into the distributed table using a CopyDestReceiver, which is also
used for COPY and create_distributed_table. When planning the SELECT, we go through
planner hooks again, which means the SELECT can also be a distributed query.

EXPLAIN is supported, but EXPLAIN ANALYZE is not because preventing double execution was
a lot more complicated in this case.
2017-06-22 15:46:30 +02:00
Marco Slot 2cd358ad1a Support quoted column-names in COPY logic 2017-06-22 15:45:57 +02:00
Marco Slot 70abfd29d2 Allow COPY after a multi-shard command
This change removes the XactModificationLevel check at the start of COPY
that was made redundant by consistently using GetPlacementConnection.
2017-06-09 13:54:58 +02:00
Burak Yucesoy 9fb15c439c Add version checks to necessary UDFs 2017-05-22 09:53:29 +03:00
Marco Slot 6f9e18de24 Ensure all preceding writes are visible in data migration 2017-05-11 09:42:12 +02:00
Andres Freund d399f395f7 Faster shard pruning.
So far citus used postgres' predicate proofing logic for shard
pruning, except for INSERT and COPY which were already optimized for
speed.  That turns out to be too slow:
* Shard pruning for SELECTs is currently O(#shards), because
  PruneShardList calls predicate_refuted_by() for every
  shard. Obviously using an O(N) type algorithm for general pruning
  isn't good.
* predicate_refuted_by() is quite expensive on its own right. That's
  primarily because it's optimized for doing a single refutation
  proof, rather than performing the same proof over and over.
* predicate_refuted_by() does not keep persistent state (see 2.) for
  function calls, which means that a lot of syscache lookups will be
  performed. That's particularly bad if the partitioning key is a
  composite key, because without a persistent FunctionCallInfo
  record_cmp() has to repeatedly look-up the type definition of the
  composite key. That's quite expensive.

Thus replace this with custom-code that works in two phases:
1) Search restrictions for constraints that can be pruned upon
2) Use those restrictions to search for matching shards in the most
   efficient manner available:
   a) Binary search / Hash Lookup in case of hash partitioned tables
   b) Binary search for equal clauses in case of range or append
      tables without overlapping shards.
   c) Binary search for inequality clauses, searching for both lower
      and upper boundaries, again in case of range or append
      tables without overlapping shards.
   d) exhaustive search testing each ShardInterval

My measurements suggest that we are considerably, often orders of
magnitude, faster than the previous solution, even if we have to fall
back to exhaustive pruning.
2017-04-28 14:40:41 -07:00
Jason Petersen 42ee7c05f5
Refactor FindShardInterval to use cacheEntry
All callers fetch a cache entry and extract/compute arguments for the
eventual FindShardInterval call, so it makes more sense to refactor
into that function itself; this solves the use-after-free bug, too.
2017-04-27 13:32:36 -06:00
velioglu 24d24db25c Implement ALTER TABLE ADD CONSTRAINT command 2017-04-20 15:02:33 +03:00
Burak Yucesoy e9095e62ec Decouple reference table replication
With this change we add an option to add a node without replicating all reference
tables to that node. If a node is added with this option, we mark the node as
inactive and no queries will sent to that node.

We also added two new UDFs;
 - master_activate_node(host, port):
    - marks node as active and replicates all reference tables to that node
 - master_add_inactive_node(host, port):
    - only adds node to pg_dist_node
2017-04-17 13:33:31 +03:00
velioglu 1fb11c738f Check binary output function of type. 2017-04-10 16:28:09 +03:00
Jason Petersen 047825c6ca
Rename misleading allowEmpty parameter
Last bit of PR feedback.
2017-02-28 22:48:00 -07:00
Marco Slot 56d4d375c2 Address review feedback in create_distributed_table data loading 2017-02-28 17:39:45 +01:00
Marco Slot db98c28354 Address review feedback in COPY refactoring 2017-02-28 17:39:45 +01:00
Marco Slot d74fb764b1 Use CitusCopyDestReceiver for regular COPY 2017-02-28 17:24:45 +01:00
Marco Slot d11eca7d4a Load data into distributed table on creation 2017-02-28 17:24:45 +01:00
Marco Slot bf3541cb24 Add CitusCopyDestReceiver infrastructure 2017-02-28 17:24:45 +01:00
Eren Basak df9cf346ee Enforce statement based replication on old APIs and non-hash tables
This change ignores `citus.replication_model` setting and uses the
statement based replication in

- Tables distributed via the old `master_create_distributed_table` function
- Append and range partitioned tables, even if created via
`create_distributed_table` function

This seems like the easiest solution to #1191, without changing the existing
behavior and harming existing users with custom scripts.

This change also prevents RF>1 on streaming replicated tables on `master_create_worker_shards`

Prior to this change, `master_create_worker_shards` command was not checking
the replication model of the target table, thus allowing RF>1 with streaming
replicated tables. With this change, `master_create_worker_shards` errors
out on the case.
2017-02-16 10:37:53 -08:00
Brian Cloutier 1173f3f225 Refactor CheckShardPlacements
- Break CheckShardPlacements into multiple functions (The most important
  is MarkFailedShardPlacements), so that we can get rid of the global
  CoordinatedTransactionUses2PC.
- Call MarkFailedShardPlacements in the router executor, so we mark
  shards as invalid and stop using them while inside transaction blocks.
2017-01-26 13:20:45 +02:00
Marco Slot f56454360c Mark failed placements as inactive immediately after COPY 2017-01-25 19:19:39 +03:00
Marco Slot b1626887d5 Don't mark placements inactive in COPY after successful connection 2017-01-25 19:19:38 +03:00
Marco Slot d0c76407b8 Set placement to inactive on connection failure in COPY 2017-01-25 19:19:38 +03:00
Marco Slot ba940a1de9 Use coordinator instead of schema node in terminology 2017-01-25 11:07:23 +01:00
Jason Petersen 56197dbdba
Add replication_model GUC
This adds a replication_model GUC which is used as the replication
model for any new distributed table that is not a reference table.
With this change, tables with replication factor 1 are no longer
implicitly MX tables.

The GUC is similarly respected during empty shard creation for e.g.
existing append-partitioned tables. If the model is set to streaming
while replication factor is greater than one, table and shard creation
routines will error until this invalid combination is corrected.

Changing this parameter requires superuser permissions.
2017-01-23 09:05:14 -07:00
Murat Tuncer d76f781ae4 Convert multi copy to use new connection api
This enables proper transactional behaviour for copy and relaxes some
restrictions like combining COPY with single-row modifications. It
also provides the basis for relaxing restrictions further, and for
optionally allowing connection caching.
2017-01-20 19:15:19 -08:00
Eren Basak 4def1ca696 Prevent COPY to reference tables from worker nodes 2017-01-18 17:38:01 +03:00
Eren Basak e7c15ecc1f Make `upgrade_to_reference_table` function MX-compatible 2017-01-18 16:49:50 +03:00
Eren Basak e44d226221 Propagate Metadata to Workers on `create_reference_table` call. 2017-01-18 11:05:24 +03:00
Andres Freund bdef35ac14 Query placementId in RemoteFinalizedShardPlacementList().
Not having the id in the ShardPlacement struct causes issues while
making copy use the placement aware connection management.
2017-01-17 13:27:26 -08:00
Onder Kalaci 1efa301ada Copy on reference tables should never mark placements invalid
This commit ensures that COPY does not mark any placement
of reference's state as INVALID in case of an error.
2017-01-12 02:43:41 +02:00
Burak Yucesoy 31cd2357fe Add upgrade_to_reference_table
With this change we introduce new UDF, upgrade_to_reference_table, which can be used to
upgrade existing broadcast tables reference tables. For upgrading, we require that given
table contains only one shard.
2017-01-02 17:54:42 +02:00
Eren Basak 7e09bd6836 Error on Unsupported Features on Workers
This change makes the metadata workers error out on unsupported commands.
2017-01-02 16:03:45 +03:00
Burak Yucesoy 88ee7802dd Address Onder's comments 2016-12-28 12:26:16 +03:00
Burak Yucesoy bb9e95e134 Error out on foreign keys with reference tables
We have one replication of reference table for each node. Therefore all problems with
replication factor > 1 also applies to reference table. As a solution we will not allow
foreign keys on reference tables. It is not possible to define foreign key from, to or
between reference tables.
2016-12-28 10:58:26 +03:00
Eren Basak 31af40cc26 Handle MX tables on workers during drop table commands 2016-12-23 15:43:32 +03:00
Eren Basak 71d73ec5ff Propagate DDL commands to metadata workers for MX tables 2016-12-23 15:43:32 +03:00
Eren Basak 048fddf4da Propagate MX table and shard metadata on `create_distributed_table` call 2016-12-23 15:43:32 +03:00
Eren Basak 61a1e487d0 Mark hash distributed tables with replication factor = 1 as streaming replicated tables (repmodel=s).
This works only with `create_distributed_table` call.
2016-12-23 15:43:31 +03:00
Onder Kalaci 9f0bd4cb36 Reference Table Support - Phase 1
With this commit, we implemented some basic features of reference tables.

To start with, a reference table is
  * a distributed table whithout a distribution column defined on it
  * the distributed table is single sharded
  * and the shard is replicated to all nodes

Reference tables follows the same code-path with a single sharded
tables. Thus, broadcast JOINs are applicable to reference tables.
But, since the table is replicated to all nodes, table fetching is
not required any more.

Reference tables support the uniqueness constraints for any column.

Reference tables can be used in INSERT INTO .. SELECT queries with
the following rules:
  * If a reference table is in the SELECT part of the query, it is
    safe join with another reference table and/or hash partitioned
    tables.
  * If a reference table is in the INSERT part of the query, all
    other participating tables should be reference tables.

Reference tables follow the regular co-location structure. Since
all reference tables are single sharded and replicated to all nodes,
they are always co-located with each other.

Queries involving only reference tables always follows router planner
and executor.

Reference tables can have composite typed columns and there is no need
to create/define the necessary support functions.

All modification queries, master_* UDFs, EXPLAIN, DDLs, TRUNCATE,
sequences, transactions, COPY, schema support works on reference
tables as expected. Plus, all the pre-requisites associated with
distribution columns are dismissed.
2016-12-20 14:09:35 +02:00
Metin Doslu 20b8f1feeb Refactor distribution column type check for colocation 2016-12-16 15:24:45 +02:00
Metin Doslu e2d0bd38f2 Don't allow tables with different replication models to be colocated 2016-12-16 15:23:49 +02:00
Metin Doslu 86cca54857 Add colocate_with option to create_distributed_table()
With this commit, we support three versions of colocate_with: i.default, ii.none
and iii. a specific table name.
2016-12-16 14:53:35 +02:00
Metin Doslu edbedbd744 Move colocation related functions to colocation_utils.c 2016-12-16 14:52:40 +02:00
Eren Basak 5e96e4f60e Make truncate triggers propagated on start_metadata_sync_to_node call 2016-12-14 10:53:10 +03:00
Burak Yucesoy 8d7cd4d746 Add Foreign Key Support to ALTER TABLE commands
With this PR, we add foreign key support to ALTER TABLE commands. For now,
we only support foreign constraint creation via ALTER TABLE query, if it
is only subcommand in ALTER TABLE subcommand list.

We also only allow foreign key creation if replication factor is 1.
2016-12-08 15:03:25 +02:00
Andres Freund a77cf36778 Use connection_management.c from within connection_cache.c.
This is a temporary step towards removing connection_cache.c.
2016-12-07 11:44:24 -08:00
Burak Yucesoy b30b339f91 Fix typo in error message 2016-11-01 16:58:27 +02:00
Burak Yucesoy 6246702a4c Change error message we displayed for foreign constraints if RF > 1
At the moment, we do not support foreign constraints if replication factor is greater
than 1. However foreign constraints can be used in cloud with high availability option.
Therefore we do not want to create an impression such that foreign constraints with
high availability is not supported at all. We call users to action with this error
message.
2016-11-01 15:47:19 +02:00
Metin Doslu 4e555880b7 Add mark_tables_colocated() to update colocation groups
Added a new UDF, mark_tables_colocated(), to colocate tables with the same
configuration (shard count, shard replication count and distribution column type).
2016-10-26 17:29:03 +03:00
Brian Cloutier 80c8cfeabe Don't add a raw 32-bit int to tuples in create_distributed_table 2016-10-26 14:02:42 +03:00
Burak Yucesoy 5a03acf2bf Foreign Constraint Support for create_distributed_table and shard move
With this change, we now push down foreign key constraints created during CREATE TABLE
statements. We also start to send foreign constraints during shard move along with
other DDL statements
2016-10-21 15:38:55 +03:00
Metin Doslu 405335fcee Add create_reference_table()
create_reference_table() creates a hash distributed table with shard count
equals to 1 and replication factor equals to shard_replication_factor
configuration value.
2016-10-20 15:29:30 +03:00
Metin Doslu d3e7d9dc8d Final refactoring 2016-10-20 11:29:11 +03:00
Metin Doslu 58ac477ffb Change return type of BuildDistributionKeyFromColumnName() to Var *
BuildDistributionKeyFromColumnName() always returns a Var pointer, so there is
no reason to return a Node pointer instead of a Var pointer.
2016-10-20 10:59:31 +03:00
Metin Doslu 40bdafa8d1 Add create_distributed_table()
create_distributed_table() creates a hash distributed table with default values
of shard count and shard replication factor.
2016-10-20 10:58:25 +03:00
Marco Slot a497e7178c Parallelise master_modify_multiple_shards 2016-10-19 08:33:08 +02:00
Eren Basak cee7b54e7c Add worker transaction and transaction recovery infrastructure 2016-10-18 14:18:14 +03:00
Eren Basak ed3af403fd Add Metadata Snapshot Infrastructure
This change adds the required infrastructure about metadata snapshot from MX
codebase into Citus, mainly metadata_sync.c file and master_metadata_snapshot UDF.
2016-10-13 10:40:14 +03:00
Marco Slot 33b7723530 Use UpdateShardPlacementState where appropriate 2016-10-07 11:59:20 -07:00
Andres Freund 982ad66753 Introduce placement IDs.
So far placements were assigned an Oid, but that was just used to track
insertion order. It also did so incompletely, as it was not preserved
across changes of the shard state. The behaviour around oid wraparound
was also not entirely as intended.

The newly introduced, explicitly assigned, IDs are preserved across
shard-state changes.

The prime goal of this change is not to improve ordering of task
assignment policies, but to make it easier to reference shards.  The
newly introduced UpdateShardPlacementState() makes use of that, and so
will the in-progress connection and transaction management changes.
2016-10-07 11:59:20 -07:00
Marco Slot 32b2bd4ed8 Add replication model column to pg_dist_partition 2016-10-05 01:14:28 +02:00
Eren Basak ac3a4eee21 Fix command counter increment bug
Fixes citusdata/citus#714

On `InsertShardRow`, we previously called `CommandCounterIncrement()` before
`CitusInvalidateRelcacheByRelid(relationId);`. This might prevent to skip
invalidation of the distributed table in the next access within the same session.
2016-10-03 17:00:27 +03:00
Burak Yucesoy 1ee39eb098 Internal co-location API
With this commit we introduce internal API for co-location related operations.
2016-09-29 11:56:53 +03:00
Murat Tuncer 6317bbe9a8
Address feedback 2016-09-26 18:23:42 -06:00
Murat Tuncer 2eec0167be
Add support for truncate statement 2016-09-26 18:23:42 -06:00
Robin Thomas 614c858375 Forbid EXCLUDE constraints on distributed tables just as we forbid
UNIQUE or PRIMARY KEY constraints. Also, properly propagate valid
EXCLUDE constraints to worker shard tables.

If an EXCLUDE constraint includes the distribution column,
the operator must be an equality operator.
Tests in regression suite for exclusion constraints that include
the partition column, omit it, and include it but with non-equality
operator. Regression tests also verify that valid exclusion constraints
are propagated to the shard tables. And the tests work in different
timezones now.

Fixes citusdata/citus#748 and citusdata/citus#778.
2016-09-21 14:02:42 -04:00
Jason Petersen 74f4e0003b
Permit multiple DDL commands in a transaction
Three changes here to get to true multi-statement, multi-relation DDL
transactions (same functionality pre-5.2, with benefits of atomicity):

    1. Changed the multi-shard utility hook to always run (consistency
       with router executor hook, removes ad-hoc "installed" boolean)

    2. Change the global connection list in multi_shard_transaction to
       instead be a hash; update related functions to operate on global
       hash instead of local hash/global list

    3. Remove check within DDL code to prevent subsequent DDL commands;
       place unset/reset guard around call to ConnectToNode to permit
       connecting to additional nodes after DDL transaction has begun

In addition, code has been added to raise an error if a ROLLBACK TO
SAVEPOINT is attempted (similar to router executor), and comprehensive
tests execute all multi-DDL scenarios (full success, user ROLLBACK, any
actual errors (say, duplicate index), partial failure (duplicate index
on one node but not others), partial COMMIT (one node fails), and 2PC
partial PREPARE (one node fails)). Interleavings with other commands
(DML, \copy) are similarly all covered.
2016-09-08 22:35:55 -05:00
Eric B. Ridge e80f1612a6
Add syscols in queries; extend relnames in indexes
To permit use with ZomboDB (https://github.com/zombodb/zombodb), two
changes were necessary:

  1. Permit use of `tableoid` system column in queries
  2. Extend relation names appearing in index expressions

The first is accomplished by simply changing the deparse logic to allow
system columns in queries destined for distributed tables. The latter
was slightly more complex, given that DDL extension currently occurs on
workers. But since indexes cannot reference tables other than the one
being indexed, it is safe to look for any relation reference ending in
a '*' character and extend their penultimate segments with a shard id.

This change also adds an error to prevent users from distributing any
relations using the WITH (OIDS) feature, which is unsupported.
2016-09-07 11:54:55 -05:00
Burak Yucesoy 12d1aba1fc Error out at master_create_distributed_table if the table has any rows
Before this change, we do not check whether given table which already contains any data
in master_create_distributed_table command. If that table contains any data, making it
it distributed, makes that data hidden to user. With this change, we now gave error to
user if the table contains data.
2016-09-01 17:42:47 +03:00
Jason Petersen 850c51947a
Re-permit DDL in transactions, selectively
Recent changes to DDL and transaction logic resulted in a "regression"
from the viewpoint of users. Previously, DDL commands were allowed in
multi-command transaction blocks, though they were not processed in any
actual transactional manner. We improved the atomicity of our DDL code,
but added a restriction that DDL commands themselves must not occur in
any BEGIN/END transaction block.

To give users back the original functionality (and improved atomicity)
we now keep track of whether a multi-command transaction has modified
data (DML) or schema (DDL). Interleaving the two modification types in
a single transaction is disallowed.

This first step simply permits a single DDL command in such a block,
admittedly an incomplete solution, but one which will permit us to add
full multi-DDL command support in a subsequent commit.
2016-08-30 20:37:19 -06:00
Eren 3eaff48114 Propagate DDL Commands with 2PC
Fixes #513

This change modifies the DDL Propagation logic so that DDL queries
are propagated via 2-Phase Commit protocol. This way, failures during
the execution of distributed DDL commands will not leave the table in
an intermediate state and the pending prepared transactions can be
commited manually.

DDL commands are not allowed inside other transaction blocks or functions.

DDL commands are performed with 2PC regardless of the value of
`citus.multi_shard_commit_protocol` parameter.

The workflow of the successful case is this:
1. Open individual connections to all shard placements and send `BEGIN`
2. Send `SELECT worker_apply_shard_ddl_command(<shardId>, <DDL Command>)`
to all connections, one by one, in a serial manner.
3. Send `PREPARE TRANSCATION <transaction_id>` to all connections.
4. Sedn `COMMIT` to all connections.

Failure cases:
- If a worker problem occurs before sending of all DDL commands is finished, then
all changes are rolled back.
- If a worker problem occurs after all DDL commands are sent but not after
`PREPARE TRANSACTION` commands are finished, then all changes are rolled back.
However, if a worker node is failed, then the prepared transactions in that worker
should be rolled back manually.
- If a worker problem occurs during `COMMIT PREPARED` statements are being sent,
then the prepared transactions on the failed workers should be commited manually.
- If master fails before the first 'PREPARE TRANSACTION' is sent, then nothing is
changed on workers.
- If master fails during `PREPARE TRANSACTION` commands are being sent, then the
prepared transactions on workers should be rolled back manually.
- If master fails during `COMMIT PREPARED` or `ROLLBACK PREPARED` commands are being
sent, then the remaining prepared transactions on the workers should be handled manually.

This change also helps with #480, since failed DDL changes no longer mark
failed placements as inactive.
2016-07-19 10:44:11 +03:00
Burak Yucesoy cab03a6274 Fix COPY produces error when using array of user-defined types
Fixes #463

OID of user-defined types may be different in master and worker nodes. This causes errors
while sending data between nodes with binary nodes. Because binary copy format adds OID
of the element if it is in an array. The code adding OID is in PostgreSQL code, therefore
we cannot change it. Instead we decided to use text format if we try to send array of
user-defined type.
2016-07-13 11:12:24 +03:00
Burak Yucesoy 4a718d293b Append shardId before escaping the table name
Fixes #550, fixes #545

If table name contains special characters, it needs to be escaped. However in some cases,
we escape table name before appending shardId, which causes syntax error in the queries
sent to worker nodes. With this change we now append shardId before escaping table names.
2016-06-15 04:15:40 +03:00
Jason Petersen 9ba02928ac
Refactor ReportRemoteError to remove boolean arg
Broke it into two explicitly-named functions instead: WarnRemoteError
and ReraiseRemoteError.
2016-06-07 12:38:32 -06:00
Metin Doslu 7d0c90b398 Fail fast on constraint violations in router executor 2016-06-07 18:11:17 +03:00
Matthew Seaman 332c322b4f
Add inet includes for htonl and htons funtions
Needed to fix FreeBSD builds.
2016-05-27 12:36:12 -06:00
eren 132d9212d0 ADD master_modify_multiple_shards UDF
Fixes #10

This change creates a new UDF: master_modify_multiple_shards
Parameters:
  modify_query: A simple DELETE or UPDATE query as a string.

The UDF is similar to the existing master_apply_delete_command UDF.
Basically, given the modify query, it prunes the shard list, re-constructs
the query for each shard and sends the query to the placements.

Depending on the value of citus.multi_shard_commit_protocol, the commit
can be done in one-phase or two-phase manner.

Limitations:
* It cannot be called inside a transaction block
* It only be called with simple operator expressions (like Single Shard Modify)

Sample Usage:
```
SELECT master_modify_multiple_shards(
  'DELETE FROM customer_delete_protocol WHERE c_custkey > 500 AND c_custkey < 500');
```
2016-05-26 17:30:35 +03:00
Burak Yucesoy 0e71ffd937 Fix #469
This change renames one of the ReceiveRegularFile functions with
more descriptive name.
2016-05-26 12:03:36 +03:00
Metin Doslu 866271b765 Add COPY support on worker nodes for append partitioned relations
Now, we can copy to an append-partitioned distributed relation from
any worker node by providing master options such as;

COPY relation_name FROM file_path WITH (delimiter '|', master_host 'localhost', master_port 5432);

where master_port is optional and default is 5432.
2016-05-03 16:00:00 +03:00
eren ab240a7d4c Rename copy_transaction_manager
This change renames the distributed transaction manager parameter from
citus.copy_transaction_manager to citus.multi_shard_commit_protocol.

Distributed transaction manager has been used only by the COPY on hash
partitioned tables but it can be used by upcoming features so, we needed
to rename so that its name do not contain a reference to COPY.

The change also includes renames like transaction_manager_options to
commit_protocol_options and TRANSACTION_MANAGER_1PC to COMMIT_PROTOCOL_1PC.

With this change, declaration of MultiShardCommitProtocol (was
CopyTransactionManager) is moved from multi_copy.c to multi_transaction.c.
2016-04-28 15:12:50 +03:00
Andres Freund 12a246de37 Perform permission checks in functions manipulating distributed tables.
Previously several commands, amongst them commands like
master_create_distributed_table(), were allowed for everyone. That's not
good: Even though citus currently requires superuser permissions, we
shouldn't allow non-superusers to perform actions as sensitive as making
a table distributed.

There's no checks on the worker_* functions, as these usually just punt
the action to underlying postgres functionality, which then perform the
necessary checks.
2016-04-27 10:22:20 -07:00
Andres Freund 42d232c0e8 Use the current session's username when connecting to worker nodes.
So far we've always used libpq defaults when connecting to workers; bar
special environment variables being set that'll always be the user that
started the server.  That's not desirable because it prevents using
users with fewer privileges.

Thus change the various APIs creating connections to workers to always
use usernames. That means:
1) MultiClientConnect() needs to, optionally, accept a username
2) GetOrEstablishConnection(), including the underlying cache, need to
   use the current user as part of the connection cache key. That way
   connections for separate users are distinct, and we always use one
   with the correct authorization.
3) The task tracker needs to keep track of the username associated with
   a task, so it can use it when establishing connections outside the
   originating session.
2016-04-27 10:00:08 -07:00
Onder Kalaci c4b783b70b Fix Merge Conflict
This commit fixes merge conflicts.
2016-04-26 11:18:47 +03:00
Onder Kalaci 6c7abc2ba5 Add fast shard pruning path for INSERTs on hash partitioned tables
This commit adds a fast shard pruning path for INSERTs on
hash-partitioned tables. The rationale behind this change is
that if there exists a sorted shard interval array, a single
index lookup on the array allows us to find the corresponding
shard interval. As mentioned above, we need a sorted
(wrt shardminvalue) shard interval array. Thus, this commit
updates shardIntervalArray to sortedShardIntervalArray in the
metadata cache. Then uses the low-level API that is defined in
multi_copy to handle the fast shard pruning.

The performance impact of this change is more apparent as more
shards exist for a distributed table. Previous implementation
was relying on linear search through the shard intervals. However,
this commit relies on constant lookup time on shard interval
array. Thus, the shard pruning becomes less dependent on the
shard count.
2016-04-26 11:16:00 +03:00
Metin Doslu 132a77f992 Add COPY support on master node for append partitioned relations 2016-04-19 21:57:59 +03:00
Metin Doslu 1150ce6414 Send COPY rows in binary format 2016-04-12 20:22:31 +02:00
Marco Slot d25ee8fbd8 Support for COPY FROM, based on pg_shard PR by Postres Pro 2016-04-12 20:22:31 +02:00
Jason Petersen 423e6c8ea0
Update copyright dates
Fixed configure variable and updated all end dates to 2016.
2016-03-23 17:14:37 -06:00
Marco Slot 75a141a7c6 Merge remote-tracking branch 'origin/master' into feature/drop_shards_on_drop_table 2016-02-17 22:52:58 +01:00
Murat Tuncer 3528d7ce85 Merge from master branch into feature/citusdb-to-citus 2016-02-17 14:49:01 +02:00
Marco Slot 2af6797c04 Perform relcache invalidation in CitusInvalidateRelcacheByRelid 2016-02-16 12:59:38 +01:00
Jason Petersen fdb37682b2
First formatting attempt
Skipped csql, ruleutils, readfuncs, and functions obviously copied from
PostgreSQL. Seeing how this looks, then continuing.
2016-02-15 23:29:32 -07:00
Murat Tuncer 55c44b48dd Changed product name to citus
All citusdb references in
- extension, binary names
- file headers
- all configuration name prefixes
- error/warning messages
- some functions names
- regression tests

are changed to be citus.
2016-02-15 16:04:31 +02:00
Onder Kalaci 136306a1fe Initial commit of Citus 5.0 2016-02-11 04:05:32 +02:00