Commit Graph

122 Commits (520e7e3cb247f666270d9cf07e6ff9c30c32a574)

Author SHA1 Message Date
Metin Doslu 520e7e3cb2 Add mark_tables_colocated() to update colocation groups
Added a new UDF, mark_tables_colocated(), to colocate tables with the same
configuration (shard count, shard replication count and distribution column type).
2016-10-26 17:29:03 +03:00
Brian Cloutier c7e722e624 Fix segfault during EXPLAIN EXECUTE
Fix citusdata/citus#886

The way postgres' explain hook is designed means that our hook is never
called during EXPLAIN EXECUTE. So, we special-case EXPLAIN EXECUTE by
catching it in the utility hook.  We then replace the EXECUTE with the
original query and pass it back to Citus.
2016-10-26 15:18:42 +03:00
Andres Freund ebbef819f2 Invalidate relcache after pg_dist_shard_placement changes.
This forces prepared statements to be re-planned after changes of the
placement metadata. There's some locking issues remaining, but that's a
a separate task.

Also add regression tests verifying that invalidations take effect on
prepared statements.
2016-10-26 03:36:35 -07:00
Onder Kalaci 9e82cd6d2d Feature: INSERT INTO ... SELECT
This commit adds INSERT INTO ... SELECT feature for distributed tables.

We implement INSERT INTO ... SELECT by pushing down the SELECT to
each shard. To compute that we use the router planner, by adding
an "uninstantiated" constraint that the partition column be equal to a
certain value. standard_planner() distributes that constraint to all
the tables where it knows how to push the restriction safely. An example
is that the tables that are connected via equi joins.

The router planner then iterates over the target table's shards,
for each we replace the "uninstantiated" restriction, with one that
PruneShardList() handles. Do so by replacing the partitioning qual
parameter added in multi_planner() with the current shard's
actual boundary values. Also, add the current shard's boundary values to the
top level subquery to ensure that even if the partitioning qual is
not distributed to all the tables, we never run the queries on the shards
that don't match with the current shard boundaries. Finally, perform the
normal shard pruning to decide on whether to push the query to the
current shard or not.

We do not support certain SQLs on the subquery, which are described/commented
on ErrorIfInsertSelectQueryNotSupported().

We also added some locking on the router executor. When an INSERT/SELECT command
runs on a distributed table with replication factor >1, we need to ensure that
it sees the same result on each placement of a shard. So we added the ability
such that router executor takes exclusive locks on shards from which the SELECT
in an INSERT/SELECT reads in order to prevent concurrent changes. This is not a
very optimal solution, but it's simple and correct. The
citus.all_modifications_commutative can be used to avoid aggressive locking.
An INSERT/SELECT whose filters are known to exclude any ongoing writes can be
marked as commutative. See RequiresConsistentSnapshot() for the details.

We also moved the decison of whether the multiPlan should be executed on
the router executor or not to the planning phase. This allowed us to
integrate multi task router executor tasks to the router executor smoothly.
2016-10-26 10:01:00 +03:00
Onder Kalaci 75dd6ee3ab Add ability to reorder target list for INSERT/SELECT queries
The necessity for this functionality comes from the fact that ruleutils.c is not supposed to be
used on "rewritten" queries (i.e. ones that have been passed through QueryRewrite()).
Query rewriting is the process in which views and such are expanded,
and, INSERT/UPDATE targetlists are reordered to match the physical order,
defaults etc. For the details of reordeing, see transformInsertRow().
2016-10-26 10:00:03 +03:00
Jason Petersen 95830ded74 Move all funcs to pg_catalog, add test to verify
We'd been relying on a single SET search_path command in an earlier
script, but a subsequent script RESET search_path, causing any further
bare functions to be created in the first schema on the search path.

However, starting with an older extension version and executing ALTER
scripts one at a time DOES avoid putting any functions in the public
namespace, so I wrote an upgrade script resilient to that, especially
because PostgreSQL 9.5 will error out if a function is already in the
schema it's being moved to.
2016-10-25 12:45:53 -06:00
Burak Yucesoy c7414c3af2 Foreign Constraint Support for create_distributed_table and shard move
With this change, we now push down foreign key constraints created during CREATE TABLE
statements. We also start to send foreign constraints during shard move along with
other DDL statements
2016-10-21 15:38:55 +03:00
Marco Slot 6608e4c628 Re-disable master evaluation for SELECT 2016-10-21 10:51:47 +02:00
Metin Doslu 663828363c Add create_reference_table()
create_reference_table() creates a hash distributed table with shard count
equals to 1 and replication factor equals to shard_replication_factor
configuration value.
2016-10-20 15:29:30 +03:00
Metin Doslu 7baf72cc86 Final refactoring 2016-10-20 11:29:11 +03:00
Metin Doslu 31f08f8377 Add create_distributed_table()
create_distributed_table() creates a hash distributed table with default values
of shard count and shard replication factor.
2016-10-20 10:58:25 +03:00
Marco Slot 06e790f420 Parallelise master_modify_multiple_shards 2016-10-19 08:33:08 +02:00
Andres Freund 0e02b838a3 Support PostgreSQL 9.6
Adds support for PostgreSQL 9.6 by copying in the requisite ruleutils
file and refactoring the out/readfuncs code to flexibly support the
old-style copy/pasted out/readfuncs (prior to 9.6) or use extensible
node APIs (in 9.6 and higher).

Most version-specific code within this change is only needed to set new
fields in the AggRef nodes we build for aggregations. Version-specific
test output files were added in certain cases, though in most they were
not necessary. Each such file begins by e.g. printing the major version
in order to clarify its purpose.

The comment atop citus_nodes.h details how to add support for new nodes
for when that becomes necessary.
2016-10-18 16:23:55 -06:00
Murat Tuncer f76cb6672a Add master_run_on_worker UDF 2016-10-18 17:59:54 +03:00
Eren Basak e31830f3fb Add worker transaction and transaction recovery infrastructure 2016-10-18 14:18:14 +03:00
Metin Doslu ca4a94e137 Add regression tests for parameterized queries 2016-10-18 14:02:50 +03:00
Eren Basak 1e95e4576d Add pg_dist_local_group Metadata Table
This change adds the pg_dist_local_group metadata table, which indicates
the group id of the current node. It is expected that this table contains
one and only one row, which only contains the group id of the node as an
integer.
2016-10-14 11:41:14 +03:00
Eren Basak 662b767431 Fix changing placement ids in metadata snapshot test 2016-10-14 11:13:16 +03:00
Brian Cloutier 84c4ba1073 Drop shardalias 2016-10-14 11:03:26 +03:00
Burak Yucesoy 55341815a8 Make shard transfer functions co-location aware
With this change, master_copy_shard_placement and master_move_shard_placement functions
start to copy/move given shard along with its co-located shards.
2016-10-13 18:16:40 +03:00
Metin Doslu 827d1ddb75 Add HAVING support
This commit completes having support in Citus by adding having support for
real-time and task-tracker executors. Multiple tests are added to regression
tests to cover new supported queries with having support.
2016-10-13 15:47:53 +03:00
Eren Basak 6cb3ae93ba Add Metadata Snapshot Infrastructure
This change adds the required infrastructure about metadata snapshot from MX
codebase into Citus, mainly metadata_sync.c file and master_metadata_snapshot UDF.
2016-10-13 10:40:14 +03:00
Jason Petersen 9330719b89 Use single-quote interpolation in partition test
Noticed an old issue and this outdated comment. Figured I'd fix it.
2016-10-10 13:03:43 -06:00
Jason Petersen 60ec3345a7 Fix tests and tell Travis to run them all
Two sets of tests are fixed by this change:
  * multi_agg_approximate_distinct
  * those in multi_task_tracker_extra_schedule

The first broke when we renamed stage to load in many files and was
never being run because the HyperLogLog extension wasn't easily
available in Debian. Now it's in our repo, so we install it and run
the test. I removed the distinct HLL target in favor of just always
running it and providing an output variant to handle when the extension
is absent. Basically, if PostgreSQL thinks HLL is available, the test
installs it and runs normally, otherwise the absent variant is used.

The second broke when I removed a test variant, erroneously believing
it to be related to an older Citus version. I've added a line in that
test to clarify why the variant is necessary (a practice we should
widely adopt).
2016-10-07 17:32:54 -06:00
Andres Freund 5de52c3b04 Introduce placement IDs.
So far placements were assigned an Oid, but that was just used to track
insertion order. It also did so incompletely, as it was not preserved
across changes of the shard state. The behaviour around oid wraparound
was also not entirely as intended.

The newly introduced, explicitly assigned, IDs are preserved across
shard-state changes.

The prime goal of this change is not to improve ordering of task
assignment policies, but to make it easier to reference shards.  The
newly introduced UpdateShardPlacementState() makes use of that, and so
will the in-progress connection and transaction management changes.
2016-10-07 11:59:20 -07:00
Brian Cloutier 62e7bdbdd6 Switch from pg_worker_list.conf file to pg_dist_node metadata table.
Related to #786

This change adds the `pg_dist_node` table that contains the information
about the workers in the cluster, replacing the previously used
`pg_worker_list.conf` file (or the one specified with `citus.worker_list_file`).

Upon update, `pg_worker_list.conf` file is read and `pg_dist_node` table is
populated with the file's content. After that, `pg_worker_list.conf` file
is renamed to `pg_worker_list.conf.obsolete`

For adding and removing nodes, the change also includes two new UDFs:
`master_add_node` and `master_remove_node`, which require superuser
permissions.

'citus.worker_list_file' guc is kept for update purposes but not used after the
update is finished.
2016-10-05 13:01:35 +03:00
Marco Slot 31ab616b31 Add replication model column to pg_dist_partition 2016-10-05 01:14:28 +02:00
Robin Thomas d23f11490a Provides safe, idempotent shard-extended names to any object name
related to a table that might be distributed, allowing any name
that is within regular PostgreSQL length limits to be extended
with a shard ID for use in shards on workers. Handles multi-byte
character boundaries in identifiers when making prefixes for
shard-extended names. Includes tests.
Uses hash_any from PostgreSQL's access/hashfunc.c.
Removes AppendShardIdToStringInfo() as it's used only once
and arguably is best replaced there with a call to AppendShardIdToName().

Adds UDF shard_name(object_name, shard_id) to expose the shard-extended
name logic to other PL/PGSQL, UDFs and scripts.

Bumps version to 6.0-2 to allow for UDF to be created in migration script.

Fixes citusdata/citus#781 and citusdata/citus#179.
2016-10-03 17:02:34 -04:00
Marco Slot 992179c187 Change logicalrelid type in pg_dist_partition and pg_dist_shard to regclass 2016-10-03 20:27:16 +02:00
Marco Slot cf419be848 Remove EventInvokeTrigger from regression test output 2016-10-03 20:21:15 +02:00
Robin Thomas 1e80d27585 During repartitions, the partitionColumnType argument sent to workers
is now a `::regtype` using the qualified name of the column type,
not the column type OID which may differ between master/worker nodes.
Test coverage of a hash reparitition using a UDT as the join column.

Note that the UDFs `worker_hash_partition_table` and `worker_range_partition_table`
are unchanged, and rightly expect an OID for the column type; but the
planner code building the commands now allows for `::regtype` casting
to do its magic.

Fixes citusdata/citus#111.
2016-10-03 13:41:20 -04:00
Robin Thomas de5242fa41 Added test coverage for partial unique indexes and exclude constraints. 2016-10-03 10:47:30 -04:00
Jason Petersen 2a3d8b2913 Remove references to 9.4
Some still lingered.
2016-09-29 17:35:19 -06:00
Burak Yucesoy 1a618b9c43 Internal co-location API
With this commit we introduce internal API for co-location related operations.
2016-09-29 11:56:53 +03:00
Murat Tuncer ba3d035b23 Make where false queries router plannable 2016-09-28 18:49:26 +03:00
Murat Tuncer 3ef841c67b Add UDF master_expire_table_cache 2016-09-28 12:08:37 +03:00
Jason Petersen f9e63097c9 Fix unique-violation-in-xact segfault
An interaction between ReraiseRemoteError and DML transaction support
causes segfaults:

  * ReraiseRemoteError calls PurgeConnection, freeing a connection...
  * That connection is still in the xactParticipantHash

At transaction end, the memory in the freed connection might happen to
pass the "is this connection OK?" check, causing us to try to send an
ABORT over that connection. By removing it from the transaction hash
before calling ReraiseRemoteError, we avoid this possibility.
2016-09-27 16:44:03 -06:00
Metin Doslu db64995353 Pass text oid inteads of invalid oid for null values
Passing invalid oids even for null values in PQsendQueryParams() causes worker
nodes to fail. Therefore, we pass text oid for null values.
2016-09-27 08:15:46 +03:00
Andres Freund 5e01434402 Support NoMovement direction in router executor
This is mainly interesting because it allows to use RETURN QUERY/RETURN
QUERY EXECUTE and FOR ... IN .. LOOPs in plpgsql.
2016-09-26 18:28:36 -06:00
Murat Tuncer fb74e08fa5 Add tests with spaces in table names 2016-09-26 18:23:43 -06:00
Murat Tuncer a342cacfc4 Address feedback 2016-09-26 18:23:42 -06:00
Murat Tuncer 4416b77088 Add support for truncate statement 2016-09-26 18:23:42 -06:00
Marco Slot a2276adcd2 Fix segmentation fault in case of joins with WHERE 1=0 2016-09-26 15:12:29 +02:00
Robin Thomas 6880efce5b Forbid EXCLUDE constraints on distributed tables just as we forbid
UNIQUE or PRIMARY KEY constraints. Also, properly propagate valid
EXCLUDE constraints to worker shard tables.

If an EXCLUDE constraint includes the distribution column,
the operator must be an equality operator.
Tests in regression suite for exclusion constraints that include
the partition column, omit it, and include it but with non-equality
operator. Regression tests also verify that valid exclusion constraints
are propagated to the shard tables. And the tests work in different
timezones now.

Fixes citusdata/citus#748 and citusdata/citus#778.
2016-09-21 14:02:42 -04:00
Metin Doslu 80a833aeb3 Remove pg_toast_* references from regression tests
pg_toast_* oids are constantly changing, and this causes regression tests to
fail time to time. With this commit, we remove all of the pg_toast_* references
from regression test outputs.
2016-09-09 11:31:51 +03:00
Eric B. Ridge 361e37f921 Add syscols in queries; extend relnames in indexes
To permit use with ZomboDB (https://github.com/zombodb/zombodb), two
changes were necessary:

  1. Permit use of `tableoid` system column in queries
  2. Extend relation names appearing in index expressions

The first is accomplished by simply changing the deparse logic to allow
system columns in queries destined for distributed tables. The latter
was slightly more complex, given that DDL extension currently occurs on
workers. But since indexes cannot reference tables other than the one
being indexed, it is safe to look for any relation reference ending in
a '*' character and extend their penultimate segments with a shard id.

This change also adds an error to prevent users from distributing any
relations using the WITH (OIDS) feature, which is unsupported.
2016-09-07 11:54:55 -05:00
Marco Slot 575bc99be5 Allow noop updates of the partition column 2016-09-07 14:22:41 +02:00
Burak Yucesoy 619ec529d4 Error out at master_create_distributed_table if the table has any rows
Before this change, we do not check whether given table which already contains any data
in master_create_distributed_table command. If that table contains any data, making it
it distributed, makes that data hidden to user. With this change, we now gave error to
user if the table contains data.
2016-09-01 17:42:47 +03:00
Jason Petersen c684ac11ec Re-permit DDL in transactions, selectively
Recent changes to DDL and transaction logic resulted in a "regression"
from the viewpoint of users. Previously, DDL commands were allowed in
multi-command transaction blocks, though they were not processed in any
actual transactional manner. We improved the atomicity of our DDL code,
but added a restriction that DDL commands themselves must not occur in
any BEGIN/END transaction block.

To give users back the original functionality (and improved atomicity)
we now keep track of whether a multi-command transaction has modified
data (DML) or schema (DDL). Interleaving the two modification types in
a single transaction is disallowed.

This first step simply permits a single DDL command in such a block,
admittedly an incomplete solution, but one which will permit us to add
full multi-DDL command support in a subsequent commit.
2016-08-30 20:37:19 -06:00
Brian Cloutier 24bd08d287 Remove check-multi-fdw tests, nobody uses Citus with fdws 2016-08-26 10:41:33 +03:00