Commit Graph

119 Commits (a342cacfc40c2c42724d7b14144d9859fabe2bbf)

Author SHA1 Message Date
Murat Tuncer a342cacfc4 Address feedback 2016-09-26 18:23:42 -06:00
Murat Tuncer 66069ed610 Fix regression test failures after rebase 2016-09-26 18:23:42 -06:00
Murat Tuncer 4416b77088 Add support for truncate statement 2016-09-26 18:23:42 -06:00
Marco Slot a2276adcd2 Fix segmentation fault in case of joins with WHERE 1=0 2016-09-26 15:12:29 +02:00
Robin Thomas 6880efce5b Forbid EXCLUDE constraints on distributed tables just as we forbid
UNIQUE or PRIMARY KEY constraints. Also, properly propagate valid
EXCLUDE constraints to worker shard tables.

If an EXCLUDE constraint includes the distribution column,
the operator must be an equality operator.
Tests in regression suite for exclusion constraints that include
the partition column, omit it, and include it but with non-equality
operator. Regression tests also verify that valid exclusion constraints
are propagated to the shard tables. And the tests work in different
timezones now.

Fixes citusdata/citus#748 and citusdata/citus#778.
2016-09-21 14:02:42 -04:00
Metin Doslu 80a833aeb3 Remove pg_toast_* references from regression tests
pg_toast_* oids are constantly changing, and this causes regression tests to
fail time to time. With this commit, we remove all of the pg_toast_* references
from regression test outputs.
2016-09-09 11:31:51 +03:00
Jason Petersen b00b15a718 Permit multiple DDL commands in a transaction
Three changes here to get to true multi-statement, multi-relation DDL
transactions (same functionality pre-5.2, with benefits of atomicity):

    1. Changed the multi-shard utility hook to always run (consistency
       with router executor hook, removes ad-hoc "installed" boolean)

    2. Change the global connection list in multi_shard_transaction to
       instead be a hash; update related functions to operate on global
       hash instead of local hash/global list

    3. Remove check within DDL code to prevent subsequent DDL commands;
       place unset/reset guard around call to ConnectToNode to permit
       connecting to additional nodes after DDL transaction has begun

In addition, code has been added to raise an error if a ROLLBACK TO
SAVEPOINT is attempted (similar to router executor), and comprehensive
tests execute all multi-DDL scenarios (full success, user ROLLBACK, any
actual errors (say, duplicate index), partial failure (duplicate index
on one node but not others), partial COMMIT (one node fails), and 2PC
partial PREPARE (one node fails)). Interleavings with other commands
(DML, \copy) are similarly all covered.
2016-09-08 22:35:55 -05:00
Eric B. Ridge 361e37f921 Add syscols in queries; extend relnames in indexes
To permit use with ZomboDB (https://github.com/zombodb/zombodb), two
changes were necessary:

  1. Permit use of `tableoid` system column in queries
  2. Extend relation names appearing in index expressions

The first is accomplished by simply changing the deparse logic to allow
system columns in queries destined for distributed tables. The latter
was slightly more complex, given that DDL extension currently occurs on
workers. But since indexes cannot reference tables other than the one
being indexed, it is safe to look for any relation reference ending in
a '*' character and extend their penultimate segments with a shard id.

This change also adds an error to prevent users from distributing any
relations using the WITH (OIDS) feature, which is unsupported.
2016-09-07 11:54:55 -05:00
Marco Slot 575bc99be5 Allow noop updates of the partition column 2016-09-07 14:22:41 +02:00
Jason Petersen 407533d0f9 Add sort call to shard placement test
The comparator is kind of broken, but I think this is better than the
current state of random failures.
2016-09-06 11:07:27 -05:00
Metin Doslu 2a8b2f6a99 Add complex subquery pushdown regression tests 2016-09-02 14:21:51 +03:00
Burak Yucesoy 619ec529d4 Error out at master_create_distributed_table if the table has any rows
Before this change, we do not check whether given table which already contains any data
in master_create_distributed_table command. If that table contains any data, making it
it distributed, makes that data hidden to user. With this change, we now gave error to
user if the table contains data.
2016-09-01 17:42:47 +03:00
Jason Petersen c684ac11ec Re-permit DDL in transactions, selectively
Recent changes to DDL and transaction logic resulted in a "regression"
from the viewpoint of users. Previously, DDL commands were allowed in
multi-command transaction blocks, though they were not processed in any
actual transactional manner. We improved the atomicity of our DDL code,
but added a restriction that DDL commands themselves must not occur in
any BEGIN/END transaction block.

To give users back the original functionality (and improved atomicity)
we now keep track of whether a multi-command transaction has modified
data (DML) or schema (DDL). Interleaving the two modification types in
a single transaction is disallowed.

This first step simply permits a single DDL command in such a block,
admittedly an incomplete solution, but one which will permit us to add
full multi-DDL command support in a subsequent commit.
2016-08-30 20:37:19 -06:00
Metin Doslu f1bab608c1 Return false in MultiClientQueryResult() on failing query 2016-08-29 17:05:35 +03:00
Brian Cloutier 24bd08d287 Remove check-multi-fdw tests, nobody uses Citus with fdws 2016-08-26 10:41:33 +03:00
Jason Petersen 1ed9871ead Rename test files with 'stage' in name
Ignored FDW files as those test are being removed entirely, I believe.
2016-08-22 13:32:53 -06:00
Jason Petersen 945ce21320 Replace verb 'stage' with 'load' in test comments
"Staging table" will be the only valid use of 'stage' from now on, we
will now say "load" when talking about data ingestion. If creation of
shards is its own step, we'll just say "shard creation".
2016-08-22 13:24:18 -06:00
Jason Petersen 1500438313 Replace verb 'stage' with 'load' in schedules
"Staging table" will be the only valid use of 'stage' from now on.
2016-08-22 11:48:41 -06:00
Eren BaÅŸak 7e511d278e Lowercase \copy to match PostgreSQL's style for local/psql-level functions 2016-08-22 11:31:26 -06:00
Eren Basak d030eb63d1 Replace \stage With \copy on Regression Tests
Fixes #547

This change removes all references to \stage in the regression tests
and puts \COPY instead. Doing so changed shard counts, min/max
values on some test tables (lineitem, orders, etc.).
2016-08-22 11:31:26 -06:00
Robin Thomas 475a6245bf Remove all usage of pg_dist_shard.shardalias in extension code. (#739)
Remove regression test of non-null shardalias.
2016-08-19 17:06:22 +03:00
Jason Petersen 8e7a567827 Fix Travis local_first_candidate_nodes failures
A recent change to the image used in Travis causes some problems for
the code we use here to ensure the local replica is first. Since this
code is essentially dead in a post-stage world anyhow, we're OK with
ripping out the tests to placate Travis.
2016-08-14 23:12:10 -06:00
Murat Tuncer ed3a442cda Remove a router planner test for materialized view
PostgreSQL 9.5.4 stopped calling planner for materialized view create
command when NO DATA option is provided.

This causes our test to behave differently between pre-9.5.4 and 9.5.4.
2016-08-14 22:57:09 -06:00
Metin Doslu 16e02775ca Bump version numbers for 5.2 release 2016-08-01 13:48:24 -07:00
Marco Slot f538ab7f62 Rewrite WorkerShardStats to avoid invalid value bugs 2016-07-29 20:11:18 +02:00
Eren BaÅŸak bb4f4e25b5 Set 1PC as the Default Commit Protocol for DDL Commands
Fixes #679

This change sets the default commit protocol for distributed DDL
commands to '1pc'. If the user issues a distributed DDL command with
this default setting, then once in a session, a NOTICE message is
shown about using '2pc' being extra safe.
2016-07-29 16:42:55 +03:00
Jason Petersen d43578c557 Quick fix for possible segfault in PurgeConnection
Now that connections can be acquired without going through the cache,
we have to handle cases where functions assume the cache has been ini-
tialized.
2016-07-29 00:12:56 -06:00
Jason Petersen f19779b0ce Support SERIAL/BIGSERIAL non-partition columns
This adds support for SERIAL/BIGSERIAL column types. Because we now can
evaluate functions on the master (during execution), adding this is a
matter of ensuring the table creation step works properly.

To accomplish this, I've added some logic to detect sequences owned by
a table (i.e. those related to its columns). Simply creating a sequence
and using it in a default value is insufficient; users who do so must
ensure the sequence is owned by the column using it.

Fortunately, this is exactly what SERIAL and BIGSERIAL do, which is the
use case we're targeting with this feature. While testing this, I found
that worker_apply_shard_ddl_command actually adds shard identifiers to
sequence names, though I found no places that use or test this path. I
removed that code so that sequence names are not mutated and will match
those used by a SERIAL default value expression.

Our use of the new-to-9.5 CREATE SEQUENCE IF NOT EXISTS syntax means we
are dropping support for 9.4 (which is being done regardless, but makes
this change simpler). I've removed 9.4 from the Travis build matrix.

Some edge cases are possible in ALTER SEQUENCE, COPY FROM (on workers),
and CREATE SEQUENCE OWNED BY. I've added errors for each so that users
understand when and why certain operations are prohibited.
2016-07-28 23:55:40 -06:00
Burak Yucesoy 0a2c940ae5 Remove schema name parameter from API functions
We remove schema name parameter from worker_fetch_foreign_file and
worker_fetch_regular_table functions. We now send schema name
concatanated with table name.
2016-07-28 20:41:05 +03:00
Burak Yucesoy 98025110f0 Add old version(without schema name parameter) of api functions back
Fixes #676

We added old versions (i.e. without schema name) of worker_apply_shard_ddl_command,
worker_fetch_foreign_file and worker_fetch_regular_table back. During function call
of one of these functions, we set schema name as  public schema and call the newer
version of the functions.
2016-07-28 20:40:38 +03:00
Murat Tuncer 992997b8ad Expand router planner coverage
We can now support richer set of queries in router planner.
This allow us to support CTEs, joins, window function, subqueries
if they are known to be executed at a single worker with a single
task (all tables are filtered down to a single shard and a single
worker contains all table shards referenced in the query).

Fixes : #501
2016-07-27 23:35:38 +03:00
Murat Tuncer 719e44d1f4 Remove PostgreSQL 9.4 support 2016-07-26 20:16:09 +03:00
Burak Yucesoy 770ecffc8f Remove warnings on schema creation
Since now we support schema related operations, there is no need to warn user about
schema usage.
2016-07-22 18:24:23 +03:00
Burak Yucesoy 9df8300efa Fix ALTER TABLE SET SCHEMA
Fixes #132

We hook into ALTER ... SET SCHEMA and warn out if user tries to change schema of a
distributed table.

We also hook into ALTER TABLE ALL IN TABLE SPACE statements and warn out if citus has
been loaded.
2016-07-22 17:52:40 +03:00
Murat Tuncer 461fefbdb2 Fix outer join crash when subquery is flatten 2016-07-22 17:01:19 +03:00
Burak Yucesoy 444d4eb558 Fix worker_fetch_regular_table with schema
Fixes #504
Fixes #646

We changed signature of worker_fetch_regular_table to accept schema name as parameter to
make it work with schemas.
2016-07-22 00:44:02 -06:00
Jason Petersen 44e444ac6a Permit "single-shard" transactions
Allows the use of modification commands (INSERT/UPDATE/DELETE) within
transaction blocks (delimited by BEGIN and ROLLBACK/COMMIT), so long as
all modifications hit a subset of nodes involved in the first such com-
mand in the transaction. This does not circumvent the requirement that
each individual modification command must still target a single shard.

For instance, after sending BEGIN, a user might INSERT some rows to a
shard replicated on two nodes. Subsequent modifications can hit other
shards, so long as they are on one or both of these nodes.

SAVEPOINTs are supported, though if the user actually attempts to send
a ROLLBACK command that specifies a SAVEPOINT they will receive an
ERROR at the end of the topmost transaction.

Placements are only marked inactive if at least one replica succeeds
in a transaction where others fail. Non-atomic behavior is possible if
the shard targeted by the initial modification within a transaction has
a higher replication factor than another shard within the same block
and a node with the latter shard has a failure during the COMMIT phase.

Other methods of denoting transaction blocks (multi-statement commands
sent all at once and functions written in e.g. PL/pgSQL or other such
languages) are not presently supported; their treatment remains the
same as before.
2016-07-21 15:57:22 -06:00
Burak Yucesoy 7df5a265c7 Fix COUNT DISTINCT approximation with schema
Fixes #555

Before this change, we were resolving HLL function and type Oid without qualified name.
Now we find the schema name where HLL objects are stored and generate qualified names for
each objects.

Similar fix is also applied for cstore_table_size function call.
2016-07-21 17:29:18 +03:00
Burak Yucesoy 5a93a70e2d Fix master_apply_delete_command with schema
Fixes #73
2016-07-21 15:09:20 +03:00
Burak Yucesoy d0beacc4e1 Change worker_apply_shard_ddl_command to accept schema name as parameter
Fixes #565
Fixes #626

To add schema support to citus, we need to schema-prefix all table names, object names etc.
in the queries sent to worker nodes. However; query deparsing is not available for most of
DDL commands, therefore it is not easy to generate worker query in the master node.

As a solution we are sending schema names along with shard id and query to run to worker
nodes with worker_apply_shard_ddl_command.

To not break \STAGE command we pass public schema as paramater while calling
worker_apply_shard_ddl_command from there. This will not cause problem if user uses \STAGE
in different schema because passes schema name is used only if there is no schema name is
given in the query.
2016-07-21 14:17:26 +03:00
Metin Doslu 28000a8203 Add support for prepared statements with parameterized non-partition columns in router executor 2016-07-21 11:09:28 +03:00
Burak Yucesoy 71bb558641 Always schema-prefix worker queries
Fixes #215
Fixes #267
Fixes #502
Fixes #556
Fixes #557
Fixes #560
Fixes #568
Fixes #623
Fixes #624

With this change we schema-prefix table names, operator names and composite types.
2016-07-20 10:42:24 +03:00
Eren 692ef0964a Propagate DDL Commands with 2PC
Fixes #513

This change modifies the DDL Propagation logic so that DDL queries
are propagated via 2-Phase Commit protocol. This way, failures during
the execution of distributed DDL commands will not leave the table in
an intermediate state and the pending prepared transactions can be
commited manually.

DDL commands are not allowed inside other transaction blocks or functions.

DDL commands are performed with 2PC regardless of the value of
`citus.multi_shard_commit_protocol` parameter.

The workflow of the successful case is this:
1. Open individual connections to all shard placements and send `BEGIN`
2. Send `SELECT worker_apply_shard_ddl_command(<shardId>, <DDL Command>)`
to all connections, one by one, in a serial manner.
3. Send `PREPARE TRANSCATION <transaction_id>` to all connections.
4. Sedn `COMMIT` to all connections.

Failure cases:
- If a worker problem occurs before sending of all DDL commands is finished, then
all changes are rolled back.
- If a worker problem occurs after all DDL commands are sent but not after
`PREPARE TRANSACTION` commands are finished, then all changes are rolled back.
However, if a worker node is failed, then the prepared transactions in that worker
should be rolled back manually.
- If a worker problem occurs during `COMMIT PREPARED` statements are being sent,
then the prepared transactions on the failed workers should be commited manually.
- If master fails before the first 'PREPARE TRANSACTION' is sent, then nothing is
changed on workers.
- If master fails during `PREPARE TRANSACTION` commands are being sent, then the
prepared transactions on workers should be rolled back manually.
- If master fails during `COMMIT PREPARED` or `ROLLBACK PREPARED` commands are being
sent, then the remaining prepared transactions on the workers should be handled manually.

This change also helps with #480, since failed DDL changes no longer mark
failed placements as inactive.
2016-07-19 10:44:11 +03:00
Murat Tuncer eae7f79a8b Make router planner use original query 2016-07-18 18:23:04 +03:00
Eren c92c81b550 Add LIMIT/OFFSET Support
Fixes #394

This change adds LIMIT/OFFSET support for non router-plannable
distributed queries.

In cases that we can push the LIMIT down, we add the OFFSET value to
that LIMIT in the worker queries. When a query with LIMIT x OFFSET y is issued,
the query is propagated to the workers as LIMIT (x+y) OFFSET 0, and on the
master table, the original LIMIT and OFFSET values are used. With this change,
we can use OFFSET wherever we can use LIMIT.
2016-07-18 12:00:24 +03:00
Brian Cloutier e73b4ac026 Evaluate functions on the master
- Enables using VOLATILE functions (like nextval()) in INSERT queries
- Enables using STABLE functions (like now()) targetLists and joinTrees

UPDATE and INSERT can now contain non-immutable functions. INSERT can contain any kind of
expression, while UPDATE can contain any STABLE function, so long as a Var is not passed
into the STABLE function, even indirectly. UPDATE TagetEntry's can now also include Vars.

There's an exception, CASE/COALESCE statements may not contain mutable functions.

Functions calls in master_modify_multiple_shards are also evaluated.
2016-07-13 11:45:51 -07:00
Burak Yucesoy 7cb92b8bb1 Fix COPY produces error when using array of user-defined types
Fixes #463

OID of user-defined types may be different in master and worker nodes. This causes errors
while sending data between nodes with binary nodes. Because binary copy format adds OID
of the element if it is in an array. The code adding OID is in PostgreSQL code, therefore
we cannot change it. Instead we decided to use text format if we try to send array of
user-defined type.
2016-07-13 11:12:24 +03:00
Jason Petersen 9157ac9f10 Remove hash-pruning logic for NULL values
It turns out some tests exercised this behavior, but removing it should
have no ill effects. Besides, both copy and INSERT disallow NULLs in a
table's partition column.

Fixes a bug where anti-joins on hash-partitioned distributed tables
would incorrectly prune shards early, result in incorrect results (test
included).
2016-07-06 17:04:21 -06:00
Andres Freund c945ea310b Add regression tests for RETURNING. 2016-07-01 13:07:12 -07:00
Andres Freund 586f738bc7 Support RETURNING for modification commands.
Fixes: #242
2016-07-01 13:07:12 -07:00