Commit Graph

287 Commits (3b1c082791592097e2c944ef98ec86e4234882bf)

Author SHA1 Message Date
Burak Yucesoy f8e0d37ba1 Fix crashes caused by stack size increase under high memory load
Each PostgreSQL backend starts with a predefined amount of stack and this stack
size can be increased if there is a need. However, stack size increase during
high memory load may cause unexpected crashes, because if there is not enough
memory for stack size increase, there is nothing to do for process apart from
crashing. An interesting thing is; the process would get OOM error instead of
crash, if the process had an explicit memory request (with palloc) for example.
However, in the case of stack size increase, there is no system call to get OOM
error, so the process simply crashes.

With this change, we are increasing the stack size explicitly by requesting extra
memory from the stack, so that, even if there is not memory, we can at least get
an OOM instead of a crash.
2018-11-14 01:27:53 +03:00
Marco Slot d56baefe3d Allow simple DML commands from hot standby 2018-10-06 10:54:44 +02:00
velioglu d1f005daac Adds UDFs for testing MX functionalities with isolation tests 2018-09-12 07:04:16 +03:00
Onder Kalaci 974cbf11a5 Hide shard names on MX worker nodes
This commit by default enables hiding shard names on MX workers
by simple replacing `pg_table_is_visible()` calls with
`citus_table_is_visible()` calls on the MX worker nodes. The latter
function filters out tables that are known to be shards.

The main motivation of this change is a better UX. The functionality
can be opted out via a GUC.

We also added two views, namely citus_shards_on_worker and
citus_shard_indexes_on_worker such that users can query
them to see the shards and their corresponding indexes.

We also added debug messages such that the filtered tables can
be interactively seen by setting the level to DEBUG1.
2018-08-07 14:21:45 +03:00
Marco Slot 89870e76ce Add a select_opens_transaction_block GUC 2018-07-08 03:50:39 +02:00
Brian Cloutier 735218ee5d Remove sslmode structs and add more helpful description 2018-07-05 14:12:36 +02:00
Onder Kalaci d83be3a33f Enforce foreign key restrictions inside transaction blocks
When a hash distributed table have a foreign key to a reference
table, there are few restrictions we have to apply in order to
prevent distributed deadlocks or reading wrong results.

The necessity to apply the restrictions arise from cascading
nature of foreign keys. When a foreign key on a reference table
cascades to a distributed table, a single operation over a single
connection can acquire locks on multiple shards of the distributed
table. Thus, any parallel operation on that distributed table, in the
same transaction should not open parallel connections to the shards.
Otherwise, we'd either end-up with a self-distributed deadlock or
read wrong results.

As briefly described above, the restrictions that we apply is done
by tracking the distributed/reference relation accesses inside
transaction blocks, and act accordingly when necessary.

The two main rules are as follows:
   - Whenever a parallel distributed relation access conflicts
     with a consecutive reference relation access, Citus errors
     out
   - Whenever a reference relation access is followed by a
     conflicting parallel relation access, the execution mode
     is switched to sequential mode.

There are also some other notes to mention:
   - If the user does SET LOCAL citus.multi_shard_modify_mode
     TO 'sequential';, all the queries should simply work with
     using one connection per worker and sequentially executing
     the commands. That's obviously a slower approach than Citus'
     usual parallel execution. However, we've at least have a way
     to run all commands successfully.

   - If an unrelated parallel query executed on any distributed
     table, we cannot switch to sequential mode. Because, the essense
     of sequential mode is using one connection per worker. However,
     in the presence of a parallel connection, the connection manager
     picks those connections to execute the commands. That contradicts
     with our purpose, thus we error out.

   - COPY to a distributed table cannot be executed in sequential mode.
     Thus, if we switch to sequential mode and COPY is executed, the
     operation fails and there is currently no way of implementing that.
     Note that, when the local table is not empty and create_distributed_table
     is used, citus uses COPY internally. Thus, in those cases,
     create_distributed_table() will also fail.

   - There is a GUC called citus.enforce_foreign_key_restrictions
     to disable all the checks. We added that GUC since the restrictions
     we apply is sometimes a bit more restrictive than its necessary.
     The user might want to relax those. Similarly, if you don't have
     CASCADEing reference tables, you might consider disabling all the
     checks.
2018-07-03 17:05:55 +03:00
Murat Tuncer 4d35b92016 Add groundwork for citus_stat_statements api 2018-06-27 14:20:03 +03:00
Jason Petersen 57b3f253c5
Add node_conninfo GUC and related logic
To support more flexible (i.e. not at compile-time) specification of
libpq connection parameters, this change adds a new GUC, node_conninfo,
which must be a space-separated string of key-value pairs suitable for
parsing by libpq's connection establishment methods.

To avoid rebuilding and parsing these values at connection time, this
change also adds a cache in front of the configuration params to permit
immediate use of any previously-calculated parameters.
2018-06-12 20:23:47 -06:00
Onder Kalaci 317dd02a2f Implement single repartitioning on hash distributed tables
* Change worker_hash_partition_table() such that the
     divergence between Citus planner's hashing and
     worker_hash_partition_table() becomes the same.

   * Rename single partitioning to single range partitioning.

   * Add single hash repartitioning. Basically, logical planner
     treats single hash and range partitioning almost equally.
     Physical planner, on the other hand, treats single hash and
     dual hash repartitioning almost equally (except for JoinPruning).

   * Add a new GUC to enable this feature
2018-05-02 18:50:55 +03:00
velioglu d9fa69c031 Refactor query pushdown related logic 2018-05-02 15:03:09 +03:00
velioglu 121ff39b26 Removes large_table_shard_count GUC 2018-04-29 10:34:50 +02:00
velioglu 698d585fb5 Remove broadcast join logic
After this change all the logic related to shard data fetch logic
will be removed. Planner won't plan any ShardFetchTask anymore.
Shard fetch related steps in real time executor and task-tracker
executor have been removed.
2018-03-30 11:45:19 +03:00
Metin Doslu 3b7b64a8b6 Remove skip_jsonb_validation_in_copy GUC 2018-03-13 10:33:27 +02:00
Metin Doslu e86d34256c Change default to false for citus.skip_jsonb_validation_in_copy 2018-03-06 13:19:47 +02:00
Marco Slot 6f7c3bd73b Skip JSON validation on coordinator during COPY 2018-02-02 15:33:27 +01:00
Brian Cloutier 377b31dcf7 Remove enable_deadlock_prevention prevention warning 2017-12-21 14:47:52 +01:00
mehmet furkan ÅŸahin fd546cf322 Intermediate result size limitation
This commit introduces a new GUC to limit the intermediate
result size which we handle when we use read_intermediate_result
function for CTEs and complex subqueries.
2017-12-21 14:26:56 +03:00
mehmet furkan ÅŸahin 3c941aedf1 adds citus.enable_repartition_joins GUC
The new GUC allows Citus to switch between task executors
when necessary
2017-12-11 09:36:37 +03:00
Onder Kalaci 16421f089f Register citus custom scan nodes 2017-11-23 11:38:33 +02:00
Marco Slot f4ceea5a3d Enable 2PC by default 2017-11-22 11:26:58 +01:00
Marco Slot 8486f76e15 Auto-recover 2PC transactions 2017-11-22 11:26:58 +01:00
Marco Slot 6ba3f42d23 Rename MultiPlan to DistributedPlan 2017-11-22 09:36:24 +01:00
Andres Freund d063658d6d Protect some initializations from being called during backend startup.
On EXEC_BACKEND builds these functions shouldn't be called at every
backend start.
2017-11-20 15:29:51 -08:00
Marco Slot d3b634b301 Allow generating placement IDs without using the sequence 2017-11-15 10:12:06 +01:00
Marco Slot c24a0875a5 Allow generating shard IDs without using the sequence 2017-11-15 10:12:05 +01:00
Marco Slot f71728f634 Add GUC for specifying sslmode in connections to workers 2017-11-08 14:15:58 +01:00
Marco Slot 100aaeb3f5 Fix typo in distributed deadlock error message 2017-10-31 19:39:32 +01:00
velioglu 0b5db5d826 Support multi shard update/delete queries 2017-10-25 15:52:38 +03:00
Hadi Moshayedi 2aec6eda49 Properly use #ifdef HAVE_LIBCURL. 2017-10-13 12:04:36 -06:00
Hadi Moshayedi a1387f4aa8 Basic usage statistics collection. (#1656)
Adds ```citus.enable_statistics_collection``` GUC variable, which ```true``` by default, unless built without libcurl. If statistics collection is enabled, sends basic usage data to Citus servers every 24 hours.

The data that is collected consists of:
- Citus version
- OS name & release
- Hardware Id
- Number of tables, rounded to next power of 2
- Size of data, rounded to next power of 2
- Number of workers
2017-10-11 09:55:15 -04:00
Onder Kalaci 59133415b0 Add logging infrasture for distributed deadlock detection
We added a new GUC citus.log_distributed_deadlock_detection
which is off by default. When set to on, we log some debug messages
related to the distributed deadlock to the server logs.
2017-08-12 13:28:37 +03:00
Onder Kalaci e5d5bdff51 Enable distributed deadlock detection on the maintenance deamon
With this commit, the maintenance deamon starts to check for
distributed deadlocks.

We also introduced a GUC variable (distributed_deadlock_detection_factor)
whose value is multiplied with Postgres' deadlock_timeout. Setting
it to -1 disables the distributed deadlock detection.
2017-08-12 13:28:37 +03:00
Onder Kalaci 66936053a0 Improve error messages when a backend is cancelled by deadlock detection
We send SIGINT to a backend that is cancelled due to a deadlock. That
approach ends up being a very confusing error message.

With this commit we intercept the error messages and show a more
meaningful error message to the user.
2017-08-12 13:28:37 +03:00
Onder Kalaci be4fc45c03 Deprecate enable_deadlock_prevention flag
Now that we already have the necessary infrastructure for detecting
distributed deadlocks. Thus, we don't need enable_deadlock_prevention
which is purely intended for preventing some forms of distributed
deadlocks.
2017-08-12 13:28:37 +03:00
Brian Cloutier 9d93fb5551 Create citus.use_secondary_nodes GUC
This GUC has two settings, 'always' and 'never'. When it's set to
'never' all behavior stays exactly as it was prior to this commit. When
it's set to 'always' only SELECT queries are allowed to run, and only
secondary nodes are used when processing those queries.

Add some helper functions:
- WorkerNodeIsSecondary(), checks the noderole of the worker node
- WorkerNodeIsReadable(), returns whether we're currently allowed to
  read from this node
- ActiveReadableNodeList(), some functions (namely, the ones on the
  SELECT path) don't require working with Primary Nodes. They should call
  this function instead of ActivePrimaryNodeList(), because the latter
  will error out in contexts where we're not allowed to write to nodes.
- ActiveReadableNodeCount(), like the above, replaces
  ActivePrimaryNodeCount().
- EnsureModificationsCanRun(), error out if we're not currently allowed
  to run queries which modify data. (Either we're in read-only mode or
  use_secondary_nodes is set)

Some parts of the code were switched over to use readable nodes instead
of primary nodes:
- Deadlock detection
- DistributedTableSize,
- the router, real-time, and task tracker executors
- ShardPlacement resolution
2017-08-10 17:37:17 +03:00
Brian Cloutier 3151b52a0b Add citus.cluster_name GUC
- Nodes with a nodecluster which does not match citus.cluster_name
  are excluded from the metadata cache and never seen by another part of
  Citus.
2017-08-08 13:12:06 +03:00
Murat Tuncer 26f020dc6e Make maxTaskStringSize configurable (#1501)
maxTaskStringSize determines the size of worker query string.
It was originally hard coded to a specific value. This has caused
issues at some users. Since it determines initial shared memory
allocation, we did not want to set it to an arbitrary higher number.
Instead made it configurable.

This commit introduces a new GUC variable max_task_string_size

Changes in this variable requires restart to be in effect.
2017-07-27 11:39:12 -07:00
Onder Kalaci 3369f3486f Introduce distributed transaction ids
This commit adds distributed transaction id infrastructure in
the scope of distributed deadlock detection.

In general, the distributed transaction id consists of a tuple
in the form of: `(databaseId, initiatorNodeIdentifier, transactionId,
timestamp)`.

Briefly, we add a shared memory block on each node, which holds some
information per backend (i.e., an array `BackendData backends[MaxBackends]`).
Later, on each coordinated transaction, Citus sends
`SELECT assign_distributed_transaction_id()` right after `BEGIN`.
For that backend on the worker, the distributed transaction id is set to
the values assigned via the function call.

The aim of the above is to correlate the transactions on the coordinator
to the transactions on the worker nodes.
2017-07-18 15:01:42 +03:00
Jason Petersen 2204da19f0 Support PostgreSQL 10 (#1379)
Adds support for PostgreSQL 10 by copying in the requisite ruleutils
and updating all API usages to conform with changes in PostgreSQL 10.
Most changes are fairly minor but they are numerous. One particular
obstacle was the change in \d behavior in PostgreSQL 10's psql; I had
to add SQL implementations (views, mostly) to mimic the pre-10 output.
2017-06-26 02:35:46 -06:00
Andres Freund c3b7c5dc33 Introduce per-database maintenance process.
This will be used for deadlock detection, prepared transaction
recovery amongst others, but currently is just idling around.
2017-06-23 11:53:39 -07:00
Andres Freund 3483bb99eb Minimal infrastructure for per-backend citus initialization. 2017-06-23 11:20:10 -07:00
velioglu 8cbef819be Log message of across shard queries according to the log level 2017-04-20 12:24:46 +03:00
Marco Slot dfd7d86948 Stop using a sequence to generate unique job IDs 2017-04-18 11:31:51 +02:00
Onder Kalaci 1cb6a34ba8 Remove uninstantiated qual logic, use attribute equivalences
In this PR, we aim to deduce whether each of the RTE_RELATION
is joined with at least on another RTE_RELATION on their partition keys. If each
RTE_RELATION follows the above rule, we can conclude that all RTE_RELATIONs are
joined on their partition keys.

In order to do that, we invented a new equivalence class namely:
AttributeEquivalenceClass. In very simple words, a AttributeEquivalenceClass is
identified by an unique id and consists of a list of AttributeEquivalenceMembers.

Each AttributeEquivalenceMember is designed to identify attributes uniquely within the
whole query. The necessity of this arise since varno attributes are defined within
a single level of a query. Instead, here we want to identify each RTE_RELATION uniquely
and try to find equality among each RTE_RELATION's partition key.

Whenever we find an equality clause A = B, where both A and B originates from
relation attributes (i.e., not random expressions), we create an
AttributeEquivalenceClass to record this knowledge. If we later find another
equivalence B = C, we create another AttributeEquivalenceClass. Finally, we can
apply transitity rules and generate a new AttributeEquivalenceClass which includes
A, B and C.

Note that equality among the members are identified by the varattno and rteIdentity.

Each equality among RTE_RELATION is saved using an AttributeEquivalenceClass where
each member attribute is identified by a AttributeEquivalenceMember. In the final
step, we try generate a common attribute equivalence class that holds as much as
AttributeEquivalenceMembers whose attributes are a partition keys.
2017-04-13 11:51:26 +03:00
Jason Petersen 7e46f41c12
Add comments, use strncmp, clean up GUC desc.
Good to go!
2017-04-04 16:16:49 -06:00
Burak Yucesoy a09614553f Add enable_version_checks GUC and address feedback 2017-04-04 19:11:13 +03:00
Burak Yucesoy 087d8427e3
Error out if binary citus version does not match installed extension
With this change, we start to error out if loaded citus binaries does not match
the available major version or installed citus extension version. In this case
we force user to restart the server or run ALTER EXTENSION depending on the
situation
2017-04-03 17:36:13 -06:00
Metin Doslu 1f838199f8 Use CustomScan API for query execution
Custom Scan is a node in the planned statement which helps external providers
to abstract data scan not just for foreign data wrappers but also for regular
relations so you can benefit your version of caching or hardware optimizations.
This sounds like only an abstraction on the data scan layer, but we can use it
as an abstraction for our distributed queries. The only thing we need to do is
to find distributable parts of the query, plan for them and replace them with
a Citus Custom Scan. Then, whenever PostgreSQL hits this custom scan node in
its Vulcano style execution, it will call our callback functions which run
distributed plan and provides tuples to the upper node as it scans a regular
relation. This means fewer code changes, fewer bugs and more supported features
for us!

First, in the distributed query planner phase, we create a Custom Scan which
wraps the distributed plan. For real-time and task-tracker executors, we add
this custom plan under the master query plan. For router executor, we directly
pass the custom plan because there is not any master query. Then, we simply let
the PostgreSQL executor run this plan. When it hits the custom scan node, we
call the related executor parts for distributed plan, fill the tuple store in
the custom scan and return results to PostgreSQL executor in Vulcano style,
a tuple per XXX_ExecScan() call.

* Modify planner to utilize Custom Scan node.
* Create different scan methods for different executors.
* Use native PostgreSQL Explain for master part of queries.
2017-03-14 12:17:51 +02:00
Andres Freund 52358fe891 Initial temp table removal implementation 2017-03-14 12:09:49 +02:00
Jason Petersen 56197dbdba
Add replication_model GUC
This adds a replication_model GUC which is used as the replication
model for any new distributed table that is not a reference table.
With this change, tables with replication factor 1 are no longer
implicitly MX tables.

The GUC is similarly respected during empty shard creation for e.g.
existing append-partitioned tables. If the model is set to streaming
while replication factor is greater than one, table and shard creation
routines will error until this invalid combination is corrected.

Changing this parameter requires superuser permissions.
2017-01-23 09:05:14 -07:00
Marco Slot ea855ddf86 Add an enable_deadlock_prevention flag to allow router transactions to expand to multiple nodes 2017-01-22 17:31:24 +01:00
Andres Freund 6ec34bed84 Remove remnants of commit_protocol.[ch]. 2017-01-21 09:01:15 -08:00
Jason Petersen 4e7b23472c
Change default replication factor to one
Took the quick-and-dirty approach of changing it back to two during
test runs. Can update tests to expect one in due time.
2017-01-20 18:56:43 -07:00
Andres Freund 0f28a11970 Remove citus.explain_multi_logical/physical_plan.
They make fixing explain for prepared statement harder, and they don't
really fit into EXPLAIN in the first place.  Additionally they're
currently not exercised in any tests.
2017-01-20 12:31:19 -08:00
Andres Freund bfa742d794 Centralized shard/placement connection and state management.
Currently there are several places in citus that map placements to
connections and that manage placement health. Centralize this
knowledge.  Because of the centralized knowledge about which
connection has previously been used for which shard/placement, this
also provides the basis for relaxing restrictions around combining
various forms of DDL/DML.

Connections for a placement can now be acquired using
GetPlacementConnection(). If the connection is used for DML or DDL the
FOR_DDL/DML flags should be used respectively.  If an individual
remote transaction fails (but the transaction on the master succeeds)
and FOR_DDL/DML have been specified, the placement is marked as
invalid, unless that'd mark all placements for a shard as invalid.
2017-01-09 13:13:02 -08:00
Eren Basak 296e0bd33a Add citus.node_connection_timeout GUC 2016-12-20 14:11:37 +03:00
Murat Tuncer c3a60bff70 Make router planner active at all times
We used to disable router planner and executor
when task executor is set to task-tracker.

This change enables router planning and execution
at all times regardless of task execution mode.

We are introducing a hidden flag enable_router_execution
to enable/disable router execution. Its default value is
true. User may disable router planning by setting it to false.
2016-12-20 11:24:01 +03:00
Andres Freund 80b34a5d6b Integrate router executor into transaction management framework.
One less place managing remote transactions. It also makes it fairly
easy to use 2PC for certain modifications (e.g. reference tables). Just
issue a CoordinatedTransactionUse2PC(). If every placement failure
should cause the whole transaction to abort, additionally mark the
relevant transactions as critical.
2016-12-12 15:18:12 -08:00
Andres Freund fa5e202403 Convert multi_shard_transaction.[ch] to new framework. 2016-12-12 15:18:12 -08:00
Andres Freund 3505d431cd Add initial helpers to make interactions with MultiConnection et al. easier.
This includes basic infrastructure for logging of commands sent to
remote/worker nodes. Note that this has no effect as of yet, since no
callers are converted to the new infrastructure.
2016-12-07 11:44:24 -08:00
Andres Freund 3223b3c92d Centralized Connection Lifetime Management.
Connections are tracked and released by integrating into postgres'
transaction handling. That allows to to use connections without having
to resort to having to disable interrupts or using PG_TRY/CATCH blocks
to avoid leaking connections.

This is intended to eventually replace multi_client_executor.c and
connection_cache.c, and to provide the basis of a centralized
transaction management.

The newly introduced transaction hook should, in the future, be the only
one in citus, to allow for proper ordering between operations.  For now
this central handler is responsible for releasing connections and
resetting XactModificationLevel after a transaction.
2016-12-07 11:43:18 -08:00
Murat Tuncer b5c1ecb684 Fix failures during pg_upgrade
- fix error in CitusHasBeenLoaded()
- allow creation of pg_catalog tables during upgrade
2016-11-11 17:22:45 -08:00
Metin Doslu d04f4f5935 Add guc variable for shard count 2016-10-19 10:44:50 +03:00
Andres Freund ac14b2edbc
Support PostgreSQL 9.6
Adds support for PostgreSQL 9.6 by copying in the requisite ruleutils
file and refactoring the out/readfuncs code to flexibly support the
old-style copy/pasted out/readfuncs (prior to 9.6) or use extensible
node APIs (in 9.6 and higher).

Most version-specific code within this change is only needed to set new
fields in the AggRef nodes we build for aggregations. Version-specific
test output files were added in certain cases, though in most they were
not necessary. Each such file begins by e.g. printing the major version
in order to clarify its purpose.

The comment atop citus_nodes.h details how to add support for new nodes
for when that becomes necessary.
2016-10-18 16:23:55 -06:00
Metin Doslu d94a65e0e9 Reduce minimum value of task_tracker_delay to 1ms 2016-10-07 09:55:56 +03:00
Brian Cloutier 9d6699b07c Switch from pg_worker_list.conf file to pg_dist_node metadata table.
Related to #786

This change adds the `pg_dist_node` table that contains the information
about the workers in the cluster, replacing the previously used
`pg_worker_list.conf` file (or the one specified with `citus.worker_list_file`).

Upon update, `pg_worker_list.conf` file is read and `pg_dist_node` table is
populated with the file's content. After that, `pg_worker_list.conf` file
is renamed to `pg_worker_list.conf.obsolete`

For adding and removing nodes, the change also includes two new UDFs:
`master_add_node` and `master_remove_node`, which require superuser
permissions.

'citus.worker_list_file' guc is kept for update purposes but not used after the
update is finished.
2016-10-05 13:01:35 +03:00
Jason Petersen 5b80d4e8dd
Directly register multi-shard callbacks in PG_init
I had changed these callbacks to use the same method I chose for the
router executor (for consistency), but as that method is flawed, we now
want to ensure we directly register them from PG_init as well.
2016-09-29 11:43:19 -06:00
Jason Petersen 5f6264105d
Directly register router xact callbacks in PG_init
Not entirely sure why we went with the shared memory hook approach, but
it causes problems (multiple registration) during crashes. Changing to
a simple direct registration call from PG_init.
2016-09-29 11:43:18 -06:00
Jason Petersen 74f4e0003b
Permit multiple DDL commands in a transaction
Three changes here to get to true multi-statement, multi-relation DDL
transactions (same functionality pre-5.2, with benefits of atomicity):

    1. Changed the multi-shard utility hook to always run (consistency
       with router executor hook, removes ad-hoc "installed" boolean)

    2. Change the global connection list in multi_shard_transaction to
       instead be a hash; update related functions to operate on global
       hash instead of local hash/global list

    3. Remove check within DDL code to prevent subsequent DDL commands;
       place unset/reset guard around call to ConnectToNode to permit
       connecting to additional nodes after DDL transaction has begun

In addition, code has been added to raise an error if a ROLLBACK TO
SAVEPOINT is attempted (similar to router executor), and comprehensive
tests execute all multi-DDL scenarios (full success, user ROLLBACK, any
actual errors (say, duplicate index), partial failure (duplicate index
on one node but not others), partial COMMIT (one node fails), and 2PC
partial PREPARE (one node fails)). Interleavings with other commands
(DML, \copy) are similarly all covered.
2016-09-08 22:35:55 -05:00
Murat Tuncer cc33a450c4 Expand router planner coverage
We can now support richer set of queries in router planner.
This allow us to support CTEs, joins, window function, subqueries
if they are known to be executed at a single worker with a single
task (all tables are filtered down to a single shard and a single
worker contains all table shards referenced in the query).

Fixes : #501
2016-07-27 23:35:38 +03:00
Jason Petersen 5d525fba24
Permit "single-shard" transactions
Allows the use of modification commands (INSERT/UPDATE/DELETE) within
transaction blocks (delimited by BEGIN and ROLLBACK/COMMIT), so long as
all modifications hit a subset of nodes involved in the first such com-
mand in the transaction. This does not circumvent the requirement that
each individual modification command must still target a single shard.

For instance, after sending BEGIN, a user might INSERT some rows to a
shard replicated on two nodes. Subsequent modifications can hit other
shards, so long as they are on one or both of these nodes.

SAVEPOINTs are supported, though if the user actually attempts to send
a ROLLBACK command that specifies a SAVEPOINT they will receive an
ERROR at the end of the topmost transaction.

Placements are only marked inactive if at least one replica succeeds
in a transaction where others fail. Non-atomic behavior is possible if
the shard targeted by the initial modification within a transaction has
a higher replication factor than another shard within the same block
and a node with the latter shard has a failure during the COMMIT phase.

Other methods of denoting transaction blocks (multi-statement commands
sent all at once and functions written in e.g. PL/pgSQL or other such
languages) are not presently supported; their treatment remains the
same as before.
2016-07-21 15:57:22 -06:00
Eren 3eaff48114 Propagate DDL Commands with 2PC
Fixes #513

This change modifies the DDL Propagation logic so that DDL queries
are propagated via 2-Phase Commit protocol. This way, failures during
the execution of distributed DDL commands will not leave the table in
an intermediate state and the pending prepared transactions can be
commited manually.

DDL commands are not allowed inside other transaction blocks or functions.

DDL commands are performed with 2PC regardless of the value of
`citus.multi_shard_commit_protocol` parameter.

The workflow of the successful case is this:
1. Open individual connections to all shard placements and send `BEGIN`
2. Send `SELECT worker_apply_shard_ddl_command(<shardId>, <DDL Command>)`
to all connections, one by one, in a serial manner.
3. Send `PREPARE TRANSCATION <transaction_id>` to all connections.
4. Sedn `COMMIT` to all connections.

Failure cases:
- If a worker problem occurs before sending of all DDL commands is finished, then
all changes are rolled back.
- If a worker problem occurs after all DDL commands are sent but not after
`PREPARE TRANSACTION` commands are finished, then all changes are rolled back.
However, if a worker node is failed, then the prepared transactions in that worker
should be rolled back manually.
- If a worker problem occurs during `COMMIT PREPARED` statements are being sent,
then the prepared transactions on the failed workers should be commited manually.
- If master fails before the first 'PREPARE TRANSACTION' is sent, then nothing is
changed on workers.
- If master fails during `PREPARE TRANSACTION` commands are being sent, then the
prepared transactions on workers should be rolled back manually.
- If master fails during `COMMIT PREPARED` or `ROLLBACK PREPARED` commands are being
sent, then the remaining prepared transactions on the workers should be handled manually.

This change also helps with #480, since failed DDL changes no longer mark
failed placements as inactive.
2016-07-19 10:44:11 +03:00
Jason Petersen e064cacea9
Minor formatting fix
Noticed that uncrustify doesn't like the array-of-struct literals, so
omitting them from formatting (at least here).
2016-06-28 13:09:57 -06:00
Murat Tuncer 360e884de1 Add enable_ddl_propagation flag to control automatic ddl propagation 2016-06-06 13:42:46 +03:00
Metin Doslu afa74ce5ca Make master_create_empty_shard() aware of the shard placement policy
Now, master_create_empty_shard() will create shards according to the
value of citus.shard_placement_policy which also makes default round-robin
instead of random.
2016-05-27 15:05:53 +03:00
Marco Slot fc4f23065a Add EXPLAIN for simple distributed queries 2016-04-30 00:11:02 +02:00
eren ab240a7d4c Rename copy_transaction_manager
This change renames the distributed transaction manager parameter from
citus.copy_transaction_manager to citus.multi_shard_commit_protocol.

Distributed transaction manager has been used only by the COPY on hash
partitioned tables but it can be used by upcoming features so, we needed
to rename so that its name do not contain a reference to COPY.

The change also includes renames like transaction_manager_options to
commit_protocol_options and TRANSACTION_MANAGER_1PC to COMMIT_PROTOCOL_1PC.

With this change, declaration of MultiShardCommitProtocol (was
CopyTransactionManager) is moved from multi_copy.c to multi_transaction.c.
2016-04-28 15:12:50 +03:00
Murat Tuncer a88d3ecd4e Add dynamic executor selection
- non-router plannable queries can be executed
  by router executor if they satisfy the criteria
- router executor is removed from configuration,
  now task executor can not be set to router
- removed some tests that error out for router executor
2016-04-21 09:15:33 +03:00
Murat Tuncer 938546b938 Add router plannable check and router planning logic
for single shard select queries
2016-04-21 09:15:33 +03:00
eren 3c8f275aa9 Clarify Error Message Related to shared_preload_libraries
Fixes #363

This change modifies the error message given when Citus is attempted
to be loaded other than shared_preload_libraries. Explanations have been
extended with that shared_preload_parameters parameter is in
postgresql.conf and citus should be at the beginning.
2016-04-13 12:12:21 +03:00
Marco Slot d25ee8fbd8 Support for COPY FROM, based on pg_shard PR by Postres Pro 2016-04-12 20:22:31 +02:00
Jason Petersen 423e6c8ea0
Update copyright dates
Fixed configure variable and updated all end dates to 2016.
2016-03-23 17:14:37 -06:00
Murat Tuncer 3528d7ce85 Merge from master branch into feature/citusdb-to-citus 2016-02-17 14:49:01 +02:00
Jason Petersen fdb37682b2
First formatting attempt
Skipped csql, ruleutils, readfuncs, and functions obviously copied from
PostgreSQL. Seeing how this looks, then continuing.
2016-02-15 23:29:32 -07:00
Murat Tuncer 55c44b48dd Changed product name to citus
All citusdb references in
- extension, binary names
- file headers
- all configuration name prefixes
- error/warning messages
- some functions names
- regression tests

are changed to be citus.
2016-02-15 16:04:31 +02:00
Onder Kalaci 136306a1fe Initial commit of Citus 5.0 2016-02-11 04:05:32 +02:00