Commit Graph

19 Commits (b96ba9b4908b8c87e95f0fb7323c7ced662e3dc2)

Author SHA1 Message Date
Andres Freund b96ba9b490 Fix code only enabled for 9.5.
There's still supporting wrappers used, a subsequent commit will
remove those.

This also removes the already unused tuplecount_t define.
2017-06-26 08:46:32 -07:00
Marco Slot 2f8ac82660 Execute INSERT..SELECT via coordinator if it cannot be pushed down
Add a second implementation of INSERT INTO distributed_table SELECT ... that is used if
the query cannot be pushed down. The basic idea is to execute the SELECT query separately
and pass the results into the distributed table using a CopyDestReceiver, which is also
used for COPY and create_distributed_table. When planning the SELECT, we go through
planner hooks again, which means the SELECT can also be a distributed query.

EXPLAIN is supported, but EXPLAIN ANALYZE is not because preventing double execution was
a lot more complicated in this case.
2017-06-22 15:46:30 +02:00
Marco Slot 4ed093970a Support expressions in the partition column in INSERTs 2017-04-21 14:05:52 +02:00
Metin Doslu 1f838199f8 Use CustomScan API for query execution
Custom Scan is a node in the planned statement which helps external providers
to abstract data scan not just for foreign data wrappers but also for regular
relations so you can benefit your version of caching or hardware optimizations.
This sounds like only an abstraction on the data scan layer, but we can use it
as an abstraction for our distributed queries. The only thing we need to do is
to find distributable parts of the query, plan for them and replace them with
a Citus Custom Scan. Then, whenever PostgreSQL hits this custom scan node in
its Vulcano style execution, it will call our callback functions which run
distributed plan and provides tuples to the upper node as it scans a regular
relation. This means fewer code changes, fewer bugs and more supported features
for us!

First, in the distributed query planner phase, we create a Custom Scan which
wraps the distributed plan. For real-time and task-tracker executors, we add
this custom plan under the master query plan. For router executor, we directly
pass the custom plan because there is not any master query. Then, we simply let
the PostgreSQL executor run this plan. When it hits the custom scan node, we
call the related executor parts for distributed plan, fill the tuple store in
the custom scan and return results to PostgreSQL executor in Vulcano style,
a tuple per XXX_ExecScan() call.

* Modify planner to utilize Custom Scan node.
* Create different scan methods for different executors.
* Use native PostgreSQL Explain for master part of queries.
2017-03-14 12:17:51 +02:00
Andres Freund 52358fe891 Initial temp table removal implementation 2017-03-14 12:09:49 +02:00
Andres Freund c244b8ef4a Make router planner error handling more flexible.
So far router planner had encapsulated different functionality in
MultiRouterPlanCreate. Modifications always go through router, selects
sometimes. Modifications always error out if the query is unsupported,
selects return NULL.  Especially the error handling is a problem for
the upcoming extension of prepared statement support.

Split MultiRouterPlanCreate into CreateRouterPlan and
CreateModifyPlan, and change them to not throw errors.

Instead errors are now reported by setting the new
MultiPlan->plannigError.

Callers of router planner functionality now have to throw errors
themselves if desired, but also can skip doing so.

This is a pre-requisite for expanding prepared statement support.

While touching all those lines, improve a number of error messages by
getting them closer to the postgres error message guidelines.
2017-01-23 09:23:50 -08:00
Andres Freund 557ccc6fda Support for deferred error messages.
It can be useful, e.g. in the upcoming prepared statement support, to
be able to return an error from a function that is not raised
immediately, but can later be thrown.  That allows e.g. to attempt to
plan a statment using different methods and to create good error
messages in each planner, but to only error out after all planners
have been run.

To enable that create support for deferred error messages that can be
created (supporting errorcode, message, detail, hint) in one function,
and then thrown in different place.
2017-01-23 09:23:50 -08:00
Andres Freund 6972186652 Add ShardPlacement fields required for colocated placement connection mapping. 2017-01-16 13:42:54 -08:00
Onder Kalaci 6d050fd677 Use 2PC for reference table modification
With this commit, we ensure that router executor always uses
2PC for reference table modifications and never mark the placements
of it as INVALID.
2017-01-04 12:46:35 +02:00
Marco Slot d745d7bf70 Add explicit RelationShards mapping to tasks 2016-12-23 10:23:43 +01:00
Onder Kalaci 1673ea937c Feature: INSERT INTO ... SELECT
This commit adds INSERT INTO ... SELECT feature for distributed tables.

We implement INSERT INTO ... SELECT by pushing down the SELECT to
each shard. To compute that we use the router planner, by adding
an "uninstantiated" constraint that the partition column be equal to a
certain value. standard_planner() distributes that constraint to all
the tables where it knows how to push the restriction safely. An example
is that the tables that are connected via equi joins.

The router planner then iterates over the target table's shards,
for each we replace the "uninstantiated" restriction, with one that
PruneShardList() handles. Do so by replacing the partitioning qual
parameter added in multi_planner() with the current shard's
actual boundary values. Also, add the current shard's boundary values to the
top level subquery to ensure that even if the partitioning qual is
not distributed to all the tables, we never run the queries on the shards
that don't match with the current shard boundaries. Finally, perform the
normal shard pruning to decide on whether to push the query to the
current shard or not.

We do not support certain SQLs on the subquery, which are described/commented
on ErrorIfInsertSelectQueryNotSupported().

We also added some locking on the router executor. When an INSERT/SELECT command
runs on a distributed table with replication factor >1, we need to ensure that
it sees the same result on each placement of a shard. So we added the ability
such that router executor takes exclusive locks on shards from which the SELECT
in an INSERT/SELECT reads in order to prevent concurrent changes. This is not a
very optimal solution, but it's simple and correct. The
citus.all_modifications_commutative can be used to avoid aggressive locking.
An INSERT/SELECT whose filters are known to exclude any ongoing writes can be
marked as commutative. See RequiresConsistentSnapshot() for the details.

We also moved the decison of whether the multiPlan should be executed on
the router executor or not to the planning phase. This allowed us to
integrate multi task router executor tasks to the router executor smoothly.
2016-10-26 10:01:00 +03:00
Marco Slot 9d98acfb6d Move requiresMasterEvaluation from Task to Job 2016-10-19 08:23:06 +02:00
Andres Freund ac14b2edbc
Support PostgreSQL 9.6
Adds support for PostgreSQL 9.6 by copying in the requisite ruleutils
file and refactoring the out/readfuncs code to flexibly support the
old-style copy/pasted out/readfuncs (prior to 9.6) or use extensible
node APIs (in 9.6 and higher).

Most version-specific code within this change is only needed to set new
fields in the AggRef nodes we build for aggregations. Version-specific
test output files were added in certain cases, though in most they were
not necessary. Each such file begins by e.g. printing the major version
in order to clarify its purpose.

The comment atop citus_nodes.h details how to add support for new nodes
for when that becomes necessary.
2016-10-18 16:23:55 -06:00
Metin Doslu d03a2af778 Add HAVING support
This commit completes having support in Citus by adding having support for
real-time and task-tracker executors. Multiple tests are added to regression
tests to cover new supported queries with having support.
2016-10-13 15:47:53 +03:00
Andres Freund 982ad66753 Introduce placement IDs.
So far placements were assigned an Oid, but that was just used to track
insertion order. It also did so incompletely, as it was not preserved
across changes of the shard state. The behaviour around oid wraparound
was also not entirely as intended.

The newly introduced, explicitly assigned, IDs are preserved across
shard-state changes.

The prime goal of this change is not to improve ordering of task
assignment policies, but to make it easier to reference shards.  The
newly introduced UpdateShardPlacementState() makes use of that, and so
will the in-progress connection and transaction management changes.
2016-10-07 11:59:20 -07:00
Brian Cloutier af9515f669 Only reparse queries if the planner flags them for reparsing 2016-07-13 11:45:51 -07:00
Jason Petersen 423e6c8ea0
Update copyright dates
Fixed configure variable and updated all end dates to 2016.
2016-03-23 17:14:37 -06:00
Murat Tuncer 55c44b48dd Changed product name to citus
All citusdb references in
- extension, binary names
- file headers
- all configuration name prefixes
- error/warning messages
- some functions names
- regression tests

are changed to be citus.
2016-02-15 16:04:31 +02:00
Onder Kalaci 136306a1fe Initial commit of Citus 5.0 2016-02-11 04:05:32 +02:00