Commit Graph

70 Commits (fb74e08fa5a2e6de29456f1cec36ccbcc0e63c8b)

Author SHA1 Message Date
Marco Slot a2276adcd2 Fix segmentation fault in case of joins with WHERE 1=0 2016-09-26 15:12:29 +02:00
Robin Thomas 6880efce5b Forbid EXCLUDE constraints on distributed tables just as we forbid
UNIQUE or PRIMARY KEY constraints. Also, properly propagate valid
EXCLUDE constraints to worker shard tables.

If an EXCLUDE constraint includes the distribution column,
the operator must be an equality operator.
Tests in regression suite for exclusion constraints that include
the partition column, omit it, and include it but with non-equality
operator. Regression tests also verify that valid exclusion constraints
are propagated to the shard tables. And the tests work in different
timezones now.

Fixes citusdata/citus#748 and citusdata/citus#778.
2016-09-21 14:02:42 -04:00
Jason Petersen b00b15a718 Permit multiple DDL commands in a transaction
Three changes here to get to true multi-statement, multi-relation DDL
transactions (same functionality pre-5.2, with benefits of atomicity):

    1. Changed the multi-shard utility hook to always run (consistency
       with router executor hook, removes ad-hoc "installed" boolean)

    2. Change the global connection list in multi_shard_transaction to
       instead be a hash; update related functions to operate on global
       hash instead of local hash/global list

    3. Remove check within DDL code to prevent subsequent DDL commands;
       place unset/reset guard around call to ConnectToNode to permit
       connecting to additional nodes after DDL transaction has begun

In addition, code has been added to raise an error if a ROLLBACK TO
SAVEPOINT is attempted (similar to router executor), and comprehensive
tests execute all multi-DDL scenarios (full success, user ROLLBACK, any
actual errors (say, duplicate index), partial failure (duplicate index
on one node but not others), partial COMMIT (one node fails), and 2PC
partial PREPARE (one node fails)). Interleavings with other commands
(DML, \copy) are similarly all covered.
2016-09-08 22:35:55 -05:00
Metin Doslu 60d67a39f1 Add outer join clause list extraction for subquery pushdown logic
In subquery pushdown, we allow outer joins if the join condition is on the
partition columns. WhereClauseList() used to return all join conditions including
outer joins. However, this has been changed with a commit related to outer join
support on regular queries. With this commit, we refactored ExtractFromExpressionWalker()
to return two lists of qualifiers. The first list is for inner join and filter
clauses and the second list is for outer join clauses. Therefore, we can also
use outer join clauses to check subquery pushdown prerequisites.
2016-09-02 11:54:44 +03:00
Burak Yucesoy 619ec529d4 Error out at master_create_distributed_table if the table has any rows
Before this change, we do not check whether given table which already contains any data
in master_create_distributed_table command. If that table contains any data, making it
it distributed, makes that data hidden to user. With this change, we now gave error to
user if the table contains data.
2016-09-01 17:42:47 +03:00
Jason Petersen c684ac11ec Re-permit DDL in transactions, selectively
Recent changes to DDL and transaction logic resulted in a "regression"
from the viewpoint of users. Previously, DDL commands were allowed in
multi-command transaction blocks, though they were not processed in any
actual transactional manner. We improved the atomicity of our DDL code,
but added a restriction that DDL commands themselves must not occur in
any BEGIN/END transaction block.

To give users back the original functionality (and improved atomicity)
we now keep track of whether a multi-command transaction has modified
data (DML) or schema (DDL). Interleaving the two modification types in
a single transaction is disallowed.

This first step simply permits a single DDL command in such a block,
admittedly an incomplete solution, but one which will permit us to add
full multi-DDL command support in a subsequent commit.
2016-08-30 20:37:19 -06:00
Robin Thomas 475a6245bf Remove all usage of pg_dist_shard.shardalias in extension code. (#739)
Remove regression test of non-null shardalias.
2016-08-19 17:06:22 +03:00
Marco Slot f538ab7f62 Rewrite WorkerShardStats to avoid invalid value bugs 2016-07-29 20:11:18 +02:00
Marco Slot 061662a51f Add MultiClientExecute and MultiClientValueIsNull for simple remote query execution 2016-07-29 20:07:18 +02:00
Andres Freund 0670ba6376 Don't access pg_dist_partition->partkey directly, use heap_getattr().
Text datums can't be directly accessed via the struct equivalence trick
used to access catalogs. That's because, as an optimization, they're
sometimes aligned to 1 byte ("text"'s alignment), and sometimes to 4
bytes. That depends on it being a short
varlena (cf. VARATT_NOT_PAD_BYTE) or not.

In the case at hand here, partkey became longer than 127 characters -
the boundary for short varlenas (cf. VARATT_CAN_MAKE_SHORT()). Thus it
became 4 byte/int aligned. Which lead to the direct struct access
accessing the wrong data.

The fix is simply to never access partkey that way - to enforce that,
hide partkey ehind the usual ifdef.

Fixes: #674
2016-07-29 10:02:36 -07:00
Jason Petersen d43578c557 Quick fix for possible segfault in PurgeConnection
Now that connections can be acquired without going through the cache,
we have to handle cases where functions assume the cache has been ini-
tialized.
2016-07-29 00:12:56 -06:00
Jason Petersen f19779b0ce Support SERIAL/BIGSERIAL non-partition columns
This adds support for SERIAL/BIGSERIAL column types. Because we now can
evaluate functions on the master (during execution), adding this is a
matter of ensuring the table creation step works properly.

To accomplish this, I've added some logic to detect sequences owned by
a table (i.e. those related to its columns). Simply creating a sequence
and using it in a default value is insufficient; users who do so must
ensure the sequence is owned by the column using it.

Fortunately, this is exactly what SERIAL and BIGSERIAL do, which is the
use case we're targeting with this feature. While testing this, I found
that worker_apply_shard_ddl_command actually adds shard identifiers to
sequence names, though I found no places that use or test this path. I
removed that code so that sequence names are not mutated and will match
those used by a SERIAL default value expression.

Our use of the new-to-9.5 CREATE SEQUENCE IF NOT EXISTS syntax means we
are dropping support for 9.4 (which is being done regardless, but makes
this change simpler). I've removed 9.4 from the Travis build matrix.

Some edge cases are possible in ALTER SEQUENCE, COPY FROM (on workers),
and CREATE SEQUENCE OWNED BY. I've added errors for each so that users
understand when and why certain operations are prohibited.
2016-07-28 23:55:40 -06:00
Burak Yucesoy 0a2c940ae5 Remove schema name parameter from API functions
We remove schema name parameter from worker_fetch_foreign_file and
worker_fetch_regular_table functions. We now send schema name
concatanated with table name.
2016-07-28 20:41:05 +03:00
Burak Yucesoy 98025110f0 Add old version(without schema name parameter) of api functions back
Fixes #676

We added old versions (i.e. without schema name) of worker_apply_shard_ddl_command,
worker_fetch_foreign_file and worker_fetch_regular_table back. During function call
of one of these functions, we set schema name as  public schema and call the newer
version of the functions.
2016-07-28 20:40:38 +03:00
Murat Tuncer 992997b8ad Expand router planner coverage
We can now support richer set of queries in router planner.
This allow us to support CTEs, joins, window function, subqueries
if they are known to be executed at a single worker with a single
task (all tables are filtered down to a single shard and a single
worker contains all table shards referenced in the query).

Fixes : #501
2016-07-27 23:35:38 +03:00
Murat Tuncer 719e44d1f4 Remove PostgreSQL 9.4 support 2016-07-26 20:16:09 +03:00
Burak Yucesoy 444d4eb558 Fix worker_fetch_regular_table with schema
Fixes #504
Fixes #646

We changed signature of worker_fetch_regular_table to accept schema name as parameter to
make it work with schemas.
2016-07-22 00:44:02 -06:00
Jason Petersen 44e444ac6a Permit "single-shard" transactions
Allows the use of modification commands (INSERT/UPDATE/DELETE) within
transaction blocks (delimited by BEGIN and ROLLBACK/COMMIT), so long as
all modifications hit a subset of nodes involved in the first such com-
mand in the transaction. This does not circumvent the requirement that
each individual modification command must still target a single shard.

For instance, after sending BEGIN, a user might INSERT some rows to a
shard replicated on two nodes. Subsequent modifications can hit other
shards, so long as they are on one or both of these nodes.

SAVEPOINTs are supported, though if the user actually attempts to send
a ROLLBACK command that specifies a SAVEPOINT they will receive an
ERROR at the end of the topmost transaction.

Placements are only marked inactive if at least one replica succeeds
in a transaction where others fail. Non-atomic behavior is possible if
the shard targeted by the initial modification within a transaction has
a higher replication factor than another shard within the same block
and a node with the latter shard has a failure during the COMMIT phase.

Other methods of denoting transaction blocks (multi-statement commands
sent all at once and functions written in e.g. PL/pgSQL or other such
languages) are not presently supported; their treatment remains the
same as before.
2016-07-21 15:57:22 -06:00
Burak Yucesoy 7df5a265c7 Fix COUNT DISTINCT approximation with schema
Fixes #555

Before this change, we were resolving HLL function and type Oid without qualified name.
Now we find the schema name where HLL objects are stored and generate qualified names for
each objects.

Similar fix is also applied for cstore_table_size function call.
2016-07-21 17:29:18 +03:00
Burak Yucesoy d0beacc4e1 Change worker_apply_shard_ddl_command to accept schema name as parameter
Fixes #565
Fixes #626

To add schema support to citus, we need to schema-prefix all table names, object names etc.
in the queries sent to worker nodes. However; query deparsing is not available for most of
DDL commands, therefore it is not easy to generate worker query in the master node.

As a solution we are sending schema names along with shard id and query to run to worker
nodes with worker_apply_shard_ddl_command.

To not break \STAGE command we pass public schema as paramater while calling
worker_apply_shard_ddl_command from there. This will not cause problem if user uses \STAGE
in different schema because passes schema name is used only if there is no schema name is
given in the query.
2016-07-21 14:17:26 +03:00
Marco Slot 7c093c5cef Move CompleteShardPlacementTransactions to multi_shard_transaction.c 2016-07-20 12:10:46 +02:00
Eren 692ef0964a Propagate DDL Commands with 2PC
Fixes #513

This change modifies the DDL Propagation logic so that DDL queries
are propagated via 2-Phase Commit protocol. This way, failures during
the execution of distributed DDL commands will not leave the table in
an intermediate state and the pending prepared transactions can be
commited manually.

DDL commands are not allowed inside other transaction blocks or functions.

DDL commands are performed with 2PC regardless of the value of
`citus.multi_shard_commit_protocol` parameter.

The workflow of the successful case is this:
1. Open individual connections to all shard placements and send `BEGIN`
2. Send `SELECT worker_apply_shard_ddl_command(<shardId>, <DDL Command>)`
to all connections, one by one, in a serial manner.
3. Send `PREPARE TRANSCATION <transaction_id>` to all connections.
4. Sedn `COMMIT` to all connections.

Failure cases:
- If a worker problem occurs before sending of all DDL commands is finished, then
all changes are rolled back.
- If a worker problem occurs after all DDL commands are sent but not after
`PREPARE TRANSACTION` commands are finished, then all changes are rolled back.
However, if a worker node is failed, then the prepared transactions in that worker
should be rolled back manually.
- If a worker problem occurs during `COMMIT PREPARED` statements are being sent,
then the prepared transactions on the failed workers should be commited manually.
- If master fails before the first 'PREPARE TRANSACTION' is sent, then nothing is
changed on workers.
- If master fails during `PREPARE TRANSACTION` commands are being sent, then the
prepared transactions on workers should be rolled back manually.
- If master fails during `COMMIT PREPARED` or `ROLLBACK PREPARED` commands are being
sent, then the remaining prepared transactions on the workers should be handled manually.

This change also helps with #480, since failed DDL changes no longer mark
failed placements as inactive.
2016-07-19 10:44:11 +03:00
Murat Tuncer eae7f79a8b Make router planner use original query 2016-07-18 18:23:04 +03:00
Brian Cloutier 728eefcf2b Simplify code and fix include guards in citus_clauses 2016-07-13 11:45:51 -07:00
Brian Cloutier c46cb19cda Only reparse queries if the planner flags them for reparsing 2016-07-13 11:45:51 -07:00
Brian Cloutier d792c0af4d citus_indent and some renaming 2016-07-13 11:45:51 -07:00
Brian Cloutier e73b4ac026 Evaluate functions on the master
- Enables using VOLATILE functions (like nextval()) in INSERT queries
- Enables using STABLE functions (like now()) targetLists and joinTrees

UPDATE and INSERT can now contain non-immutable functions. INSERT can contain any kind of
expression, while UPDATE can contain any STABLE function, so long as a Var is not passed
into the STABLE function, even indirectly. UPDATE TagetEntry's can now also include Vars.

There's an exception, CASE/COALESCE statements may not contain mutable functions.

Functions calls in master_modify_multiple_shards are also evaluated.
2016-07-13 11:45:51 -07:00
Andres Freund 610e17d94a Combine router executor paths for select and modify commands.
The upcoming RETURNING support would otherwise require too much
duplication.  This contains most of the pieces required for RETURNING
support, except removing the planner checks and adjusting regression
test output.
2016-07-01 13:07:12 -07:00
Murat Tuncer e86b4b397c Refactor multi_planner to create router plan directly
If router plan creation fails, it falls back to normal planner
2016-06-21 12:50:21 +03:00
Burak Yucesoy 2da5ae240e Fix master_append_table_to_shard to work with schemas
Fixes #78

With this change, it is possible to append a table in any schema to shard. The function
master_append_table_to_shard now supports schema names.
2016-06-17 04:35:00 +03:00
Marco Slot f15ec5554c Do not copy outer join clauses into WHERE 2016-06-16 16:42:32 -07:00
Jason Petersen 8efb504d1a Refactor ReportRemoteError to remove boolean arg
Broke it into two explicitly-named functions instead: WarnRemoteError
and ReraiseRemoteError.
2016-06-07 12:38:32 -06:00
Metin Doslu dfc7dd8d87 Fail fast on constraint violations in router executor 2016-06-07 18:11:17 +03:00
Murat Tuncer fcd4248f6a Add enable_ddl_propagation flag to control automatic ddl propagation 2016-06-06 13:42:46 +03:00
Murat Tuncer 41096f2076 Change equality operator check for operator expressions 2016-06-06 12:34:16 +03:00
Andres Freund ee5bb2297b Rely less on remote_task_check_interval.
When executing queries with citus.task_executor = 'real-time', query
execution could, so far, spend a significant amount of time
sleeping. That's because we were
a) sleeping after several phases of query execution, even if we're not
   waiting for network IO
b) sleeping for a fixed amount of time when waiting for network IO;
   often a lot longer than actually required.
Just reducing the amount of time slept isn't a real solution, because
that just increases CPU usage.

Instead have the real-time executor's ManageTaskExecution return whether
a task is currently being processed, waiting for reads or writes, or
failed. When all tasks are waiting for IO use poll() to wait for IO
readyness.

That requires to slightly redefine how connection timeouts are handled:
before we counted the number of times ManageTaskExecution() was called,
and compared that with the timeout divided by the task check
interval. That, if processing of tasks took a while, could significantly
increase the time till a timeout occurred. Because it was based on the
ManageTaskExecution() being called on a constant interval, this approach
isn't feasible anymore.  Instead measure the actual time since
connection establishment was started. That could in theory, if task
processing takes a very long time, lead to few passes over
PQconnectPoll().

The problem of sleeping too much also exists for the 'task-tracker'
executor, but is generally less problematic there, as processing the
individual tasks usually will take longer. That said, for e.g. the
regression tests it'd be helpful to use a similar approach.
2016-06-02 12:11:16 -06:00
Murat Tuncer 9167373f54 Add complex distinct count support for repartitioned subqueries
Single table repartition subqueries now support count(distinct column)
and count(distinct (case when ...)) expressions. Repartition query
extracts column used in aggregate expression and adds them to target
list and group by list, master query stays the same (count (distinct ...))
but attribute numbers inside the aggregate expression is modified to
reflect changes in repartition query.
2016-05-27 15:43:05 +03:00
Metin Doslu a82efa6613 Make master_create_empty_shard() aware of the shard placement policy
Now, master_create_empty_shard() will create shards according to the
value of citus.shard_placement_policy which also makes default round-robin
instead of random.
2016-05-27 15:05:53 +03:00
eren 793cb2d004 ADD master_modify_multiple_shards UDF
Fixes #10

This change creates a new UDF: master_modify_multiple_shards
Parameters:
  modify_query: A simple DELETE or UPDATE query as a string.

The UDF is similar to the existing master_apply_delete_command UDF.
Basically, given the modify query, it prunes the shard list, re-constructs
the query for each shard and sends the query to the placements.

Depending on the value of citus.multi_shard_commit_protocol, the commit
can be done in one-phase or two-phase manner.

Limitations:
* It cannot be called inside a transaction block
* It only be called with simple operator expressions (like Single Shard Modify)

Sample Usage:
```
SELECT master_modify_multiple_shards(
  'DELETE FROM customer_delete_protocol WHERE c_custkey > 500 AND c_custkey < 500');
```
2016-05-26 17:30:35 +03:00
Burak Yucesoy 31b0423f1f Fix #469
This change renames one of the ReceiveRegularFile functions with
more descriptive name.
2016-05-26 12:03:36 +03:00
Metin Doslu fb6b6daf9d Add COPY support on worker nodes for append partitioned relations
Now, we can copy to an append-partitioned distributed relation from
any worker node by providing master options such as;

COPY relation_name FROM file_path WITH (delimiter '|', master_host 'localhost', master_port 5432);

where master_port is optional and default is 5432.
2016-05-03 16:00:00 +03:00
Marco Slot cfbdbe29a9 Add EXPLAIN for simple distributed queries 2016-04-30 00:11:02 +02:00
eren 888457bb7f Rename copy_transaction_manager
This change renames the distributed transaction manager parameter from
citus.copy_transaction_manager to citus.multi_shard_commit_protocol.

Distributed transaction manager has been used only by the COPY on hash
partitioned tables but it can be used by upcoming features so, we needed
to rename so that its name do not contain a reference to COPY.

The change also includes renames like transaction_manager_options to
commit_protocol_options and TRANSACTION_MANAGER_1PC to COMMIT_PROTOCOL_1PC.

With this change, declaration of MultiShardCommitProtocol (was
CopyTransactionManager) is moved from multi_copy.c to multi_transaction.c.
2016-04-28 15:12:50 +03:00
Andres Freund 63998786ba Create new shards as owned the distributed table's owner.
That's important because ownership of relations implies special
privileges. Without this change, a distributed table can be accessible
by a table's owner, but a shard created by another user might not.
2016-04-27 10:28:33 -07:00
Andres Freund c45b94e88a Add ReplicateGrantStmt().
This is the basis for coordinating GRANT/REVOKE across nodes.
2016-04-27 10:28:25 -07:00
Andres Freund ee6ef363c0 Add pg_get_table_grants() function and support extending GRANTs. 2016-04-27 10:28:25 -07:00
Andres Freund 99e983433f Run some commands as superuser to allow normal users to execute queries.
Some small parts of citus currently require superuser privileges; which
is obviously not desirable for production scenarios. Run these small
parts under superuser privileges (we use the extension owner) to avoid
that.

This does not yet coordinate grants between master and workers. Thus it
allows to create shards, load data, and run queries as a non-superuser,
but it is not easily possible to allow differentiated accesses to
several users.
2016-04-27 10:28:22 -07:00
Andres Freund a0058023bf Add CitusExtensionOwner(), to execute some priviledged operations under.
There exist some operations we have to execute with elevated
privileges. The most expedient user for that is the user owning the
citusdb extension.
2016-04-27 10:26:08 -07:00
Andres Freund 22ea434cef Perform permission checks in functions manipulating distributed tables.
Previously several commands, amongst them commands like
master_create_distributed_table(), were allowed for everyone. That's not
good: Even though citus currently requires superuser permissions, we
shouldn't allow non-superusers to perform actions as sensitive as making
a table distributed.

There's no checks on the worker_* functions, as these usually just punt
the action to underlying postgres functionality, which then perform the
necessary checks.
2016-04-27 10:22:20 -07:00
Andres Freund 3dae284bbe Use the current session's username when connecting to worker nodes.
So far we've always used libpq defaults when connecting to workers; bar
special environment variables being set that'll always be the user that
started the server.  That's not desirable because it prevents using
users with fewer privileges.

Thus change the various APIs creating connections to workers to always
use usernames. That means:
1) MultiClientConnect() needs to, optionally, accept a username
2) GetOrEstablishConnection(), including the underlying cache, need to
   use the current user as part of the connection cache key. That way
   connections for separate users are distinct, and we always use one
   with the correct authorization.
3) The task tracker needs to keep track of the username associated with
   a task, so it can use it when establishing connections outside the
   originating session.
2016-04-27 10:00:08 -07:00