454 Commits (bc1a800f7095c8fe29f0c95e21c97dc008eca225)

Author SHA1 Message Date
Marco Slot 7279d42849 Treat read_intermediate_result as recurring tuples 2017-12-04 14:50:11 +01:00
Murat Tuncer 2d66bf5f16
Fix hard coded formatting strings for 64 bit numbers (#1831)
Postgres provides OS-agnostic formatting macros for
formatting 64 bit numbers. Replaced %ld %lu with
INT64_FORMAT and UINT64_FORMAT respectively.

Also found some incorrect usages of formatting
flags and fixed them.
2017-12-04 14:11:06 +03:00
Onder Kalaci a273711500 The common attribute equivalence class always includes the input relations
We added the ability to filter out the planner restriction information
for specific parts of the query. This might lead to situations where
the common restriction includes some other relations that we're searching
for. The reason is that while filtering for join restrictions, we add the
restriction as soon as we find the relation.

With this commit we make sure that the common attribute
equivalence class always includes the input relations.
2017-11-30 16:00:26 +02:00
Marco Slot 3a4d5f8182 Remove filter checks on leaf queries 2017-11-30 12:25:14 +01:00
Marco Slot 3f03cb6a6a Support UNION with joins in the subqueries 2017-11-30 10:37:56 +01:00
Marco Slot a9933deac6 Make real time executor work in transactions 2017-11-30 09:59:32 +03:00
Marco Slot 7ea718fd8d Round-robin over worker nodes for 0-shard router queries 2017-11-29 15:52:22 +01:00
Onder Kalaci 05fb0dd020 Add infrastructure for filtering restriction contexts based on the input query
In subquery pushdown, we first ensure that each relation is joined with at least
one other relation on the partition keys. That's fine given that the decision
is binary: push down the query as a whole or not at all.

With recursive planning, we'd want to check whether any specific part
of the query can be pushed down or not. Thus, we need the ability to
understand which part(s) of the subquery are safe to push down. This commit
adds the infrastructure for doing that.
2017-11-28 09:58:21 +02:00
Onder Kalaci 26d9b58e9e Make sure that ExtractRangeTableRelationWalker never misses RTE_RELATION 2017-11-28 09:27:34 +02:00
Onder Kalaci 32def06ebd Split assigning RTE identities and partitioning related query modifications
Note that we used to iterate over the RTEs once for performance reasons.
However, keeping an extra copy of original query seems more costly and
hard to maintain/explain.
2017-11-28 09:27:34 +02:00
Marco Slot feffe86440 Subqueries containing functions go through subquery pushdown 2017-11-27 22:13:02 +01:00
Onder Kalaci 48f96bf3e5 Enable non equi joins in subquery pushdown
Subquery pushdown planning is based on relation restriction
equivalence. This gives us the opportunity to allow any
other joins as long as there is already an equi join between
the distributed tables.

We already allow that for joins with reference tables and
this commit allows that for joins among distributed tables.
2017-11-23 16:13:46 +02:00
Onder Kalaci 83c1143505 Refactor custom scan related codes
In this commit, we don't change any code; we only create a new
file and move the related functions and types there.
2017-11-23 11:38:12 +02:00
Marco Slot 6ba3f42d23 Rename MultiPlan to DistributedPlan 2017-11-22 09:36:24 +01:00
Marco Slot 0ad39b36fe Treat immutable table functions and constant subqueries as reference tables 2017-11-21 14:15:22 +01:00
Onder Kalaci d558ebb923 Relax the checks on ensuring distribution columns for target entries
With this commit, we allow pushing down subqueries with only
reference tables where GROUP BY or DISTINCT clause or Window
functions include only columns from reference tables.
2017-11-21 12:28:14 +02:00
Brian Cloutier 7be1545843 Support implicit casts during INSERT/SELECT
It's possible to build INSERT SELECT queries which include implicit
casts. Currently we attempt to support these by adding explicit casts to
the SELECT query, but this sometimes crashes because we don't update all
nodes with the new types (SortClauses, for instance).

This commit removes those explicit casts and passes an unmodified SELECT
query to the COPY executor (how we implement INSERT SELECT behind the
scenes). In lieu of those casts, COPY has been given some extra logic to
inspect queries, notice that the types don't line up with the table it's
supposed to be inserting into, and "manually" cast every tuple before
sending it to workers.
2017-11-03 22:27:15 -07:00
Marco Slot 6219186683 Allow distributed INSERT...SELECT via worker nodes in MX 2017-11-02 14:38:39 +01:00
metdos 8c356b2bc8 Don't try to add restrictions for reference tables in insert into select 2017-10-31 19:44:10 +02:00
Murat Tuncer e16805215d
Support count(distinct) for non-partition columns (#1692)
Expands count distinct coverage by allowing more cases. We used to support
count distinct only if we can push down distinct aggregate to worker query
i.e. the count distinct clause was on the partition column of the table,
or there was a grouping on the partition column.

Now we can support
- non-partition columns, with or without grouping on partition column
- partition, and non partition column in the same query
- having clause
- single table subqueries
- insert into select queries
- join queries where count distinct is on partition, or non-partition column
- filters on count distinct clauses (extends existing support)

We first try to push down aggregate to worker query (original case), if we
can't then we modify worker query to return distinct columns to coordinator
node. We do that by adding distinct column targets to group by clauses. Then
we perform count distinct operation on the coordinator node.

This work should reduce the cases where HLL is used as it can address anything
that HLL can. However, if we start having performance issues due to a very large
number of rows, then we can recommend using HLL.
2017-10-30 13:12:24 +02:00
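The fallback path described in the count(distinct) commit above can be illustrated with a small sketch: each worker returns the distinct values of the column for its shard, and the coordinator performs the final count over the merged values. This is a simplified model under stated assumptions, not Citus code; the function names and the row layout (dicts keyed by column name) are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def worker_distinct_values(shard_rows: Iterable[Dict[str, object]],
                           column: str) -> Set[object]:
    """Model of the modified worker query: return the distinct values of
    `column` for one shard (the effect of adding the column to GROUP BY)."""
    return {row[column] for row in shard_rows}


def coordinator_count_distinct(shards: List[List[Dict[str, object]]],
                               column: str) -> int:
    """Model of the coordinator step: merge per-shard distinct values and
    count them once, so values present on several shards are not double
    counted."""
    merged: Set[object] = set()
    for shard_rows in shards:
        merged |= worker_distinct_values(shard_rows, column)
    return len(merged)
```

Summing per-shard distinct counts would overcount any value that appears on more than one shard, which is why the final count has to happen on the coordinator when the column is not the partition column.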
velioglu 0b5db5d826 Support multi shard update/delete queries 2017-10-25 15:52:38 +03:00
Murat Tuncer 4832abc7cb Make multi_master_planner.c coding convention compliant
Changed order of function definitions and added
declarations in the beginning of the file
2017-10-13 14:59:48 +03:00
Murat Tuncer f7ab901766 Add select distinct, and distinct on support
Distinct, and distinct on() clauses are supported
in simple selects, joins, subqueries, and insert into select
queries.
2017-10-13 14:59:48 +03:00
Onder Kalaci 498ac80d8b Add window function support for SUBQUERY PUSHDOWN and INSERT INTO SELECT
This commit provides the support for window functions in subquery and insert
into select queries. Note that our support for window functions is still limited
because it must have a partition by clause on the distribution key. This commit
makes changes in the files insert_select_planner and multi_logical_planner. The
required tests are also added with files multi_subquery_window_functions.out
and multi_insert_select_window.out.
2017-10-04 15:33:07 +03:00
Hadi Moshayedi 11adb9b034 Push down LIMIT and HAVING when grouped by partition key. (#1641)
We can do this because all rows belonging to a group are in the same shard when grouping by distribution column on a range/hash distributed table.
2017-10-02 20:17:51 -04:00
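The reasoning in the LIMIT/HAVING commit above can be checked with a toy model: if every row of a group lives in exactly one shard, evaluating a HAVING-style filter per shard and merging the results gives the same answer as evaluating it globally. The sketch below assumes shards are plain lists of partition-key values and uses a `count(*) >= min_count` filter as the example; the names are hypothetical and this is not how Citus implements the pushdown.

```python
from collections import Counter
from typing import Iterable, List, Set


def groups_with_min_count(keys: Iterable[int], min_count: int) -> Set[int]:
    """Return the group keys whose row count is at least min_count."""
    counts = Counter(keys)
    return {key for key, count in counts.items() if count >= min_count}


def pushed_down_having(shards: List[List[int]], min_count: int) -> Set[int]:
    """Apply the filter on each shard separately and merge the results.

    Only valid when each key appears in a single shard, which is the
    situation the commit describes for grouping by the distribution column.
    """
    result: Set[int] = set()
    for shard_keys in shards:
        result |= groups_with_min_count(shard_keys, min_count)
    return result
```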
Jason Petersen d686123dae
Omit now-public Explain methods from PG11 build
This copy-pasted code is no longer needed in PG11.
2017-09-25 17:20:24 -07:00
Jason Petersen 6c9b19a954
Add version-compat header
For polyfill macros, etc.
2017-09-25 17:20:23 -07:00
Jason Petersen fbeaa2f9d0
Remove direct access to tupleDesc->attrs
A level of indirection was removed from this field for PostgreSQL 11.
By using the handy provided macro, we can be version agnostic.
2017-09-25 17:20:23 -07:00
velioglu 0a56ed910b Change error message of queries with distributed and local table
Citus can handle INSERT INTO ... SELECT queries if the query inserts
into local table by reading data from distributed table. The opposite
way is not correct. With this commit we warn the user if the latter
option is used.
2017-09-22 13:46:19 -07:00
Onder Kalaci 6116c8e93d Allow pushing down GROUP BYs when there is at least one distribution
column in the target list
2017-09-15 19:15:06 +03:00
Onder Kalaci a5b66912d4 Expand reference table support in subquery pushdown
With this commit, we relax the restrictions put on the reference
tables with subquery pushdown.

We made three notable improvements:

1) Relax equi-join restrictions

 Previously, we always expected that the non-reference tables are
 equi joined with reference tables on the partition key of the
 non-reference table.

 With this commit, we allow any column of non-reference tables
 joined using non-equi joins as well.

2) Relax OUTER JOIN restrictions

 Previously Citus errored out if any reference table existed at
 any point of the outer part of an outer join. For instance,
 see the sketch below, where (h) denotes a hash distributed relation,
 (r) denotes a reference table, (L) denotes LEFT JOIN and
 (I) denotes INNER JOIN.

             (L)
             /  \
           (I)     h
          /  \
        r      h

 Before this commit Citus would error out since a reference table
 appears on the leftmost part of a left join. However, that was
 too restrictive, so we now only error out if the reference table
 is directly below and in the outer part of an outer join.

3) Bug fixes

 We've done some minor bugfixes in the existing implementation.
2017-09-14 20:59:22 +03:00
Marco Slot 5fe0845d7e Always copy MultiPlan in GetMultiPlan 2017-09-12 11:38:52 -07:00
Marco Slot 0aadbb1760 Convert multi-row INSERT target list to Vars 2017-08-25 10:55:56 +02:00
Marco Slot ae00795dab Allow default columns in multi-row INSERTs 2017-08-25 10:55:56 +02:00
Marco Slot c97692f382 Fix multi-row INSERT with RETURNING on reference tables 2017-08-24 10:42:12 +02:00
Jason Petersen 8cb69e3a14 Add alias for target in multi-row INSERTs
This is necessary for multi-row INSERTs for the same reasons we use it
in e.g. UPSERTs: if the range table list has more than one entry, then
PostgreSQL's deparse logic requires that vars be prefixed by the name
of their corresponding range table entry. This of course doesn't affect
single-row INSERTs, but since multi-row INSERTs have a VALUE RTE, they
were affected.

The piece of ruleutils which builds range table names wasn't modified
to handle shard extension; instead UPSERT/INSERT INTO ... SELECT added
an alias to the RTE. When present, this alias is favored. Doing the
same in the multi-row INSERT case fixes RETURNING for such commands.
2017-08-23 10:24:00 +02:00
Metin Doslu 0d052e9864 Fix a crash on zero-shard tables 2017-08-18 13:53:59 +03:00
Burak Yucesoy 52b9e35d50 Add relationIdList field to the Job struct 2017-08-14 14:06:22 +03:00
velioglu b0efffae1c Correct planner and add more tests 2017-08-11 10:16:13 +03:00
velioglu 7550b8ad52 Fix anchor shard id selection when reference table exists 2017-08-11 10:09:47 +03:00
velioglu ceba81ce35 Move physical planner checks to logical planner 2017-08-11 10:09:47 +03:00
velioglu 0359d03530 Add set operation check for reference tables 2017-08-11 10:09:47 +03:00
velioglu c4e3b8b5e1 Add planner changes and tests for subquery on reference tables 2017-08-11 10:09:47 +03:00
velioglu 45717dd013 Check equivalence on reference tables for subquery pushdown 2017-08-11 10:09:47 +03:00
Brian Cloutier 9d93fb5551 Create citus.use_secondary_nodes GUC
This GUC has two settings, 'always' and 'never'. When it's set to
'never' all behavior stays exactly as it was prior to this commit. When
it's set to 'always' only SELECT queries are allowed to run, and only
secondary nodes are used when processing those queries.

Add some helper functions:
- WorkerNodeIsSecondary(), checks the noderole of the worker node
- WorkerNodeIsReadable(), returns whether we're currently allowed to
  read from this node
- ActiveReadableNodeList(), some functions (namely, the ones on the
  SELECT path) don't require working with Primary Nodes. They should call
  this function instead of ActivePrimaryNodeList(), because the latter
  will error out in contexts where we're not allowed to write to nodes.
- ActiveReadableNodeCount(), like the above, replaces
  ActivePrimaryNodeCount().
- EnsureModificationsCanRun(), error out if we're not currently allowed
  to run queries which modify data. (Either we're in read-only mode or
  use_secondary_nodes is set)

Some parts of the code were switched over to use readable nodes instead
of primary nodes:
- Deadlock detection
- DistributedTableSize,
- the router, real-time, and task tracker executors
- ShardPlacement resolution
2017-08-10 17:37:17 +03:00
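The helper functions listed in the citus.use_secondary_nodes commit above can be modeled roughly as follows. This is a sketch under assumptions: the Python names mirror the C helpers only loosely, the `WorkerNode` class is hypothetical, and the read-only-mode check mentioned for EnsureModificationsCanRun() is left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class WorkerNode:
    """Hypothetical stand-in for a worker node entry."""
    name: str
    role: str              # "primary" or "secondary"
    is_active: bool = True


def worker_node_is_readable(node: WorkerNode, use_secondary_nodes: str) -> bool:
    """Assumed rule: 'never' reads from primaries, 'always' from secondaries."""
    if use_secondary_nodes == "never":
        return node.role == "primary"
    if use_secondary_nodes == "always":
        return node.role == "secondary"
    raise ValueError("use_secondary_nodes must be 'always' or 'never'")


def active_readable_node_list(nodes: List[WorkerNode],
                              use_secondary_nodes: str) -> List[WorkerNode]:
    """Active nodes that the SELECT path may read from under the setting."""
    return [node for node in nodes
            if node.is_active
            and worker_node_is_readable(node, use_secondary_nodes)]


def ensure_modifications_can_run(use_secondary_nodes: str) -> None:
    """Error out when writes are not allowed (read-only mode not modeled)."""
    if use_secondary_nodes == "always":
        raise RuntimeError("writes are not allowed when reading from secondaries")
```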
Jason Petersen dee66e3959
Final review feedback 2017-08-10 01:10:09 -07:00
Jason Petersen 6a35c2937c
Enable multi-row INSERTs
This is a pretty substantial refactoring of the existing modify path
within the router executor and planner. In particular, we now hunt for
all VALUES range table entries in INSERT statements and group the rows
contained therein by shard identifier. These rows are stashed away for
later in "ModifyRoute" elements. During deparse, the appropriate RTE
is extracted from the Query and its values list is replaced by these
rows before any SQL is generated.

In this way, we can create multiple Tasks, but only one per shard, to
piecemeal execute a multi-row INSERT. The execution of jobs containing
such tasks now exclusively goes through the "multi-router executor" which
was previously used for e.g. INSERT INTO ... SELECT.

By piggybacking onto that executor, we participate in ongoing
transactions, get rollback-ability, etc. In short order, the only remaining
use of the "single modify" router executor will be for bare single-row
INSERT statements (i.e. those not in a transaction).

This change appropriately handles deferred pruning as well as master-
evaluated functions.
2017-08-10 00:32:46 -07:00
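The row-grouping step described in the multi-row INSERT commit above can be sketched as a simple bucketing pass: each VALUES row is assigned to a shard based on its partition column, and one task is later built per shard. The function name, the tuple row layout, and the `shard_for_value` callback are assumptions made for illustration; real shard lookup in Citus is not shown here.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

Row = Tuple[object, ...]


def group_rows_by_shard(rows: Sequence[Row],
                        partition_column_index: int,
                        shard_for_value: Callable[[object], int]
                        ) -> Dict[int, List[Row]]:
    """Bucket VALUES rows by the shard their partition column maps to.

    `shard_for_value` is a hypothetical callback standing in for the
    lookup that maps a partition column value to a shard identifier.
    """
    grouped: Dict[int, List[Row]] = defaultdict(list)
    for row in rows:
        shard_id = shard_for_value(row[partition_column_index])
        grouped[shard_id].append(row)
    return dict(grouped)
```

Each resulting bucket corresponds to one task, so a statement touching N shards yields at most N tasks regardless of how many rows it contains.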
velioglu 7e436c0277 Add bool expression to pruning instance with a function 2017-08-10 08:56:36 +03:00
Andres Freund e8b793c454 Support for IN (const, list) and = ANY(const, b, c) pruning. 2017-08-10 08:56:36 +03:00
Burak Yucesoy 31f3221342 Add distributed partitioned table support to router plannable queries
In standard_planner, PostgreSQL expands partitioned tables to their
partitions and calls our restriction hook for each partition. It also,
for some queries, skips the partitioned table itself completely. This
behaviour makes it difficult to prune shards and decide whether query
is router plannable or not. To prevent this behaviour, we change inh
flag of partitioned tables to false in the query tree. In this case,
PostgreSQL treats those partitioned tables as regular relations and
does not expand them.

This behaviour is in line with our expectations, because we do not want
to treat partitioned tables differently on the coordinator. Although we are
not entirely comfortable with modifying the query tree, other solutions to
this problem are overly complicated.
2017-08-09 10:01:35 +03:00
Metin Doslu b8a9e7c1bf Add support for UPDATE/DELETE with subqueries 2017-08-08 21:35:08 +03:00
Marco Slot aa7ca81548 Execute UPDATE/DELETE statements with 0 shards 2017-08-07 15:36:58 +02:00
Murat Tuncer fa18899cf9 Remove serialization/deserialization of multiplan node (#1477)
introduces copy functions for Citus MultiPlan nodes.
uses ExtensibleNode mechanism to store MultiPlan data
drops serialization of MultiPlans
2017-08-02 08:24:00 +03:00
Brian Cloutier ec99f8f983 Add nodeRole column
- master_add_node enforces that there is only one primary per group
- there's also a trigger on pg_dist_node to prevent multiple primaries
  per group
- functions in metadata cache only return primary nodes
- Rename ActiveWorkerNodeList -> ActivePrimaryNodeList
- Rename WorkerGetLive{Node->Group}Count()
- Refactor WorkerGetRandomCandidateNode
- master_remove_node only complains about active shard placements if the
  node being removed is a primary.
- master_remove_node only deletes all reference table placements in the
  group if the node being removed is the primary.
- Rename {Node->NodeGroup}HasShardPlacements, this reflects the behavior it
  already had.
- Rename DeleteAllReferenceTablePlacementsFrom{Node->NodeGroup}. This also
  reflects the behavior it already had, but the new signature forces the
  caller to pass in a groupId
- Rename {WorkerGetLiveGroup->ActivePrimaryNode}Count
2017-07-24 11:57:46 +03:00
Brian Cloutier 7ad95b53d2 Rename pg_dist_shard_placement -> pg_dist_placement
Comes with a few changes:

- Change the signature of some functions to accept groupid
  - InsertShardPlacementRow
  - DeleteShardPlacementRow
  - UpdateShardPlacementState

- NodeHasActiveShardPlacements returns true if the group the node is a
  part of has any active shard placements

- TupleToShardPlacement now returns ShardPlacements which have NULL
  nodeName and nodePort.

- Populate (nodeName, nodePort) when creating ShardPlacements
- Disallow removing a node if it contains any shard placements

- DeleteAllReferenceTablePlacementsFromNode matches based on group. This
  doesn't change behavior for now (while there is only one node per
  group), but means in the future callers should be careful about
  calling it on a secondary node, it'll delete placements on the primary.

- Create concept of a GroupShardPlacement, which represents an actual
  tuple in pg_dist_placement and is distinct from a ShardPlacement,
  which has been resolved to a specific node. In the future
  ShardPlacement should be renamed to NodeShardPlacement.

- Create some triggers which allow existing code to continue to insert
  into and update pg_dist_shard_placement as if it still existed.
2017-07-12 14:17:31 +02:00
Jason Petersen 9018e698ec
Indentation cleanup
Uncrustify 0.65 appears to have changed some defaults, resulting in
breakages for those of us who have already upgraded; Travis still uses
Uncrustify 0.64, but these changes work with both versions (assuming
appropriately updated config), so this should permit use of either
version for the time being.
2017-07-11 15:59:28 -06:00
Murat Tuncer 2a4eada150 Replace duplicate code and call check_functions_in_node (#1478)
MasterIrreducibleExpressionWalker contains code copied from the
function check_functions_in_node(), which is available in
PG 9.6+. Now that PG 9.5 support is dropped, we can remove the
duplicate code and directly call check_functions_in_node().
2017-07-07 10:19:33 +03:00
Marco Slot da47a03b18 Move INSERT ... SELECT planning logic into one place 2017-06-29 15:03:14 +02:00
Andres Freund dc3997c3b8 Remove 9.5 related node wrappers.
Now that all branches support the extensible node infrastructure, we
don't need our wrappers anymore.
2017-06-26 08:46:32 -07:00
Andres Freund b96ba9b490 Fix code only enabled for 9.5.
There are still supporting wrappers in use; a subsequent commit will
remove those.

This also removes the already unused tuplecount_t define.
2017-06-26 08:46:32 -07:00
Jason Petersen 2204da19f0 Support PostgreSQL 10 (#1379)
Adds support for PostgreSQL 10 by copying in the requisite ruleutils
and updating all API usages to conform with changes in PostgreSQL 10.
Most changes are fairly minor but they are numerous. One particular
obstacle was the change in \d behavior in PostgreSQL 10's psql; I had
to add SQL implementations (views, mostly) to mimic the pre-10 output.
2017-06-26 02:35:46 -06:00
Marco Slot 2f8ac82660 Execute INSERT..SELECT via coordinator if it cannot be pushed down
Add a second implementation of INSERT INTO distributed_table SELECT ... that is used if
the query cannot be pushed down. The basic idea is to execute the SELECT query separately
and pass the results into the distributed table using a CopyDestReceiver, which is also
used for COPY and create_distributed_table. When planning the SELECT, we go through
planner hooks again, which means the SELECT can also be a distributed query.

EXPLAIN is supported, but EXPLAIN ANALYZE is not because preventing double execution was
a lot more complicated in this case.
2017-06-22 15:46:30 +02:00
Marco Slot 155db4d913 Simplify router planner call path 2017-06-22 15:45:57 +02:00
jmunsch 1647d17a14 Clarify error message for local and distributed query plans. 2017-06-01 11:52:49 -07:00
Jason Petersen f86920f9d6
Add includes for missing standard headers
We use symbols from each of these and were relying on them being
included by other headers.
2017-05-16 11:05:33 -06:00
Jason Petersen 82b03d5cb6
Add explicit cast for argument to copyObject
PostgreSQL 10 adds a call to typeof, if supported.
2017-05-16 11:05:33 -06:00
Önder Kalacı 3ec502b286 Add support for parametrized execution for subquery pushdown (#1356)
Distributed query planning for subquery pushdown is done on the original
query. This prevents the usage of external parameters on the execution.
To overcome this, we manually replace the parameters on the original
query.
2017-05-10 09:38:48 +03:00
Önder Kalacı ef6d3587b6 Skip exhaustive test in CoPartitionedTables() if declared colocated (#1376)
That's considerably cheaper.
2017-05-02 03:33:21 +03:00
Önder Kalacı b74ed3c8e1 Subqueries in where -- updated (#1372)
* Support for subqueries in WHERE clause

This commit enables subqueries in WHERE clause to be pushed down
by the subquery pushdown logic.

The support covers:
  - Correlated subqueries with IN, NOT IN, EXISTS, NOT EXISTS,
    operator expressions such as (>, <, =, ALL, ANY etc.)
  - Non-correlated subqueries with (partition_key) IN (SELECT partition_key ..)
    (partition_key) =ANY (SELECT partition_key ...)

Note that this commit heavily utilizes the attribute equivalence logic introduced
in 1cb6a34ba8. In general, this commit mostly
adjusts the logical planner not to error out on the subqueries in WHERE clause.

* Improve error checks for subquery pushdown and INSERT ... SELECT

Since we allow subqueries in WHERE clause with the previous commit,
we should apply the same limitations to those subqueries.

With this commit, we do not iterate on each subquery one by one.
Instead, we extract all the subqueries and apply the checks directly
on those subqueries. The aim of this change is to (i) Simplify the
code (ii) Make it close to the checks on INSERT .. SELECT code base.

* Extend checks for unresolved parameters to include SubLinks

With the presence of subqueries in the where clause (i.e., SubPlans on the
query), the existing way of checking unresolved parameters fails. The
reason is that the parameters for SubPlans are kept on the parent plan, not
on the query itself (see primnodes.h for the details).

With this commit, instead of checking SubPlans on the modified plans
we start to use originalQuery, where SubLinks represent the subqueries
in where clause. The unresolved parameters can be found on the SubLinks.

* Apply code-review feedback

* Remove unnecessary copying of shard interval list

This commit removes unnecessary copying of the shard interval list. Note
that there is no copyObject function implemented for shard intervals.
2017-05-01 17:20:21 +03:00
Önder Kalacı ad5cd326a4 Subquery pushdown - main branch (#1323)
* Enabling physical planner for subquery pushdown changes

This commit applies the logic that exists in INSERT .. SELECT
planning to the subquery pushdown changes.

The main algorithm is as follows:
   - pick an anchor relation (i.e., target relation)
   - for each target shard interval
       - add the target shard interval's shard range
         as a restriction to the relations (if all relations
         are joined on the partition keys)
       - check whether the query is router plannable for the
         target shard interval
       - if router plannable, create a task

* Add union support within the JOINS

This commit adds support for UNION/UNION ALL subqueries that are
in the following form:

     .... (Q1 UNION Q2 UNION ...) as union_query JOIN (QN) ...

In other words, we currently do NOT support the queries that are
in the following form where union query is not JOINed with
other relations/subqueries :

     .... (Q1 UNION Q2 UNION ...) as union_query ....

* Subquery pushdown planner uses original query

With this commit, we change the input to the logical planner for
subquery pushdown. Before this commit, the planner was relying
on the query tree that is transformed by the postgresql planner.
After this commit, the planner uses the original query. The main
motivation behind this change is to simplify the deparsing of
subqueries.

* Enable top level subquery join queries

This work enables
- Top level subquery joins
- Joins between subqueries and relations
- Joins involving more than 2 range table entries

A new regression test file is added to reflect enabled test cases

* Add top level union support

This commit adds support for UNION/UNION ALL subqueries that are
in the following form:

     .... (Q1 UNION Q2 UNION ...) as union_query ....

In other words, Citus now allows top-level
unions to be wrapped into aggregation queries
and/or simple projection queries that only select
some fields from the lower-level queries.

* Disallow subqueries without a relation in the range table list for subquery pushdown

This commit disallows subqueries without relation in the range table
list. This commit is only applied for subquery pushdown. In other words,
we do not add this limitation for single table re-partition subqueries.

The reasoning behind this limitation is that if we allow pushing down
such queries, the result would include (shardCount * expectedResults)
where in a non distributed world the result would be (expectedResult)
only.

* Disallow subqueries without a relation in the range table list for INSERT .. SELECT

This commit disallows subqueries without relation in the range table
list. This commit is only applied for INSERT.. SELECT queries.

The reasoning behind this limitation is that if we allow pushing down
such queries, the result would include (shardCount * expectedResults)
where in a non distributed world the result would be (expectedResult)
only.

* Change behaviour of subquery pushdown flag (#1315)

This commit changes the behaviour of the citus.subquery_pushdown flag.
Before this commit, the flag is used to enable subquery pushdown logic. But,
with this commit, that behaviour is enabled by default. In other words, the
flag is now useless. We prefer to keep the flag since we don't want to break
the backward compatibility. Also, we may consider using that flag for other
purposes in the next commits.

* Require subquery_pushdown when limit is used in subquery

Using limit in subqueries may cause returning incorrect
results. Therefore we allow limits in subqueries only
if user explicitly set subquery_pushdown flag.

* Evaluate expressions on the LIMIT clause (#1333)

Since subquery pushdown uses the original query, the LIMIT and OFFSET clauses
are not evaluated. However, the logical optimizer expects these expressions
to have already been evaluated by the standard planner. This commit manually
evaluates the functions in the logical planner for subquery pushdown.

* Better format subquery regression tests (#1340)

* Style fix for subquery pushdown regression tests

With this commit we intended a more consistent style for the
regression tests we've added in the
  - multi_subquery_union.sql
  - multi_subquery_complex_queries.sql
  - multi_subquery_behavioral_analytics.sql

* Enable the tests that are temporarily commented

This commit enables some of the regression tests that were commented
out until all the development is done.

* Fix merge conflicts (#1347)

 - Update regression tests to meet the changes in the regression
   test output.
 - Replace Ifs with Asserts given that the check is already done
 - Update shard pruning outputs

* Add view regression tests for increased subquery coverage (#1348)

- joins between views and tables
- joins between views
- union/union all queries involving views
- views with limit
- explain queries with view

* Improve btree operators for the subquery tests

This commit adds the missing comparison for the subquery composite key
btree comparator.
2017-04-29 04:09:48 +03:00
Andres Freund 90b211267d Perform range based pruning if equality pruning has survivor.
We previously dismissed this as unimportant, but it turns out to be
very useful for the upcoming subquery pushdown, where a user might
specify an equality constraint in a subquery, and the subquery
pushdown machinery adds >= and <= restrictions on the shard boundary.
Previously the latter restriction was ignored.
2017-04-28 17:35:18 -07:00
Andres Freund 6c08fe72f9 Use stricter qual for pruning if both >/< and >=/<= are present.
Previously, if both <= and < (or >= and >, respectively) were specified,
we always used the latter restriction.  Instead use the stricter one.
2017-04-28 17:35:18 -07:00
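One way to read "use the stricter one" in the commit above is sketched below for upper bounds: the smaller constant wins, and when the constants are equal the exclusive operator wins. The commit does not spell out the rule, so this ordering and the names used are assumptions; the lower-bound case would mirror it.

```python
from typing import Iterable, Optional, Tuple

# An upper bound is modeled as (operator, constant), operator "<" or "<=".
Bound = Tuple[str, int]


def strictest_upper_bound(bounds: Iterable[Bound]) -> Optional[Bound]:
    """Pick the most restrictive upper bound from a set of restrictions.

    Smaller constants are stricter; for equal constants, "<" excludes the
    constant itself and is therefore stricter than "<=".
    """
    candidates = list(bounds)
    for operator, _ in candidates:
        if operator not in ("<", "<="):
            raise ValueError(f"unsupported operator: {operator}")
    if not candidates:
        return None
    return min(candidates,
               key=lambda bound: (bound[1], 0 if bound[0] == "<" else 1))
```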
Burak Yucesoy 6599677902 Fix check-vanilla tests
It seems that GEQO optimizations, when enabled, create their own memory context
and free it when it is no longer necessary. In multi_join_restriction_hook
we allocate our variables in the CurrentMemoryContext, which is GEQO's memory context
if it is active. To prevent deallocation of our variables when GEQO's memory context is
freed, we started to allocate memory for these variables in a separate MemoryContext.
2017-04-29 01:55:18 +02:00
Andres Freund d399f395f7 Faster shard pruning.
So far citus used postgres' predicate proofing logic for shard
pruning, except for INSERT and COPY which were already optimized for
speed.  That turns out to be too slow:
* Shard pruning for SELECTs is currently O(#shards), because
  PruneShardList calls predicate_refuted_by() for every
  shard. Obviously using an O(N) type algorithm for general pruning
  isn't good.
* predicate_refuted_by() is quite expensive in its own right. That's
  primarily because it's optimized for doing a single refutation
  proof, rather than performing the same proof over and over.
* predicate_refuted_by() does not keep persistent state (see the previous point) for
  function calls, which means that a lot of syscache lookups will be
  performed. That's particularly bad if the partitioning key is a
  composite key, because without a persistent FunctionCallInfo
  record_cmp() has to repeatedly look-up the type definition of the
  composite key. That's quite expensive.

Thus replace this with custom-code that works in two phases:
1) Search restrictions for constraints that can be pruned upon
2) Use those restrictions to search for matching shards in the most
   efficient manner available:
   a) Binary search / Hash Lookup in case of hash partitioned tables
   b) Binary search for equal clauses in case of range or append
      tables without overlapping shards.
   c) Binary search for inequality clauses, searching for both lower
      and upper boundaries, again in case of range or append
      tables without overlapping shards.
   d) exhaustive search testing each ShardInterval

My measurements suggest that we are considerably, often orders of
magnitude, faster than the previous solution, even if we have to fall
back to exhaustive pruning.
2017-04-28 14:40:41 -07:00
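The binary-search lookup mentioned in the shard pruning commit above can be illustrated with a short sketch for shards that cover sorted, non-overlapping value ranges. The `ShardInterval` type, the function name, and the inclusive-range convention are assumptions for illustration only; the actual Citus implementation is in C and is not reproduced here.

```python
from bisect import bisect_right
from typing import NamedTuple, Optional, Sequence


class ShardInterval(NamedTuple):
    """Hypothetical shard descriptor: an inclusive range of hashed values."""
    shard_id: int
    min_value: int
    max_value: int


def find_shard_for_hash(intervals: Sequence[ShardInterval],
                        hashed_value: int) -> Optional[ShardInterval]:
    """Find the shard whose range contains hashed_value via binary search.

    Assumes `intervals` is sorted by min_value and ranges do not overlap.
    The list of lower bounds is rebuilt here for brevity; a real
    implementation would keep it precomputed so the lookup stays O(log n).
    """
    lower_bounds = [interval.min_value for interval in intervals]
    index = bisect_right(lower_bounds, hashed_value) - 1
    if index < 0:
        return None  # value lies below the first shard's range
    candidate = intervals[index]
    if hashed_value <= candidate.max_value:
        return candidate
    return None  # value falls in a gap or above the last shard
```

Compared with testing every shard interval in turn, this touches only a logarithmic number of intervals, which is the kind of gain the commit attributes to moving away from per-shard refutation proofs.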
Metin Doslu b6659bec22 Send explain queries with savepoints
With this commit, we started to send explain queries within a savepoint. After
running the explain query, we roll back to the savepoint. This saves us from side effects
of EXPLAIN ANALYZE on DML queries.
2017-04-28 12:13:48 -07:00
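The savepoint pattern from the commit above can be demonstrated with any database that supports savepoints. The sketch below uses Python's built-in sqlite3 module purely as a stand-in (it is not PostgreSQL and has no EXPLAIN ANALYZE); the savepoint name and helper function are hypothetical. It shows that a data-modifying statement run inside a savepoint leaves no trace once the savepoint is rolled back.

```python
import sqlite3


def run_without_side_effects(connection: sqlite3.Connection,
                             statement: str) -> int:
    """Execute `statement` inside a savepoint, then undo its effects.

    Returns the number of rows the statement reported as affected.
    Expects a connection opened with isolation_level=None so that the
    savepoint commands are not wrapped in implicit transactions.
    """
    connection.execute("SAVEPOINT explain_guard")
    try:
        cursor = connection.execute(statement)
        return cursor.rowcount
    finally:
        # Undo whatever the statement changed, then drop the savepoint.
        connection.execute("ROLLBACK TO SAVEPOINT explain_guard")
        connection.execute("RELEASE SAVEPOINT explain_guard")
```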
Jason Petersen 93e3afc25c
Remove FastShardPruning method
With the other simplifications, it doesn't make sense to keep around.
2017-04-27 13:32:36 -06:00
Jason Petersen 42ee7c05f5
Refactor FindShardInterval to use cacheEntry
All callers fetch a cache entry and extract/compute arguments for the
eventual FindShardInterval call, so it makes more sense to refactor
into that function itself; this solves the use-after-free bug, too.
2017-04-27 13:32:36 -06:00
Andres Freund b7dfeb0bec Boring regression test output adjustments.
Soon shard pruning will be optimized not to generally work linearly
anymore.  Thus we can't print the pruned shard intervals as currently
done anymore.

The current printing of shard ids also prevents us from running tests
in parallel, as otherwise shard ids aren't linearly numbered.
2017-04-26 11:33:56 -07:00
Andres Freund 71a7f39b05 Skip exhaustive test in CoPartitionedTables() if declared colocated.
That's considerably cheaper.
2017-04-26 11:19:17 -07:00
Marco Slot 4ed093970a Support expressions in the partition column in INSERTs 2017-04-21 14:05:52 +02:00
velioglu 8cbef819be Log message of across shard queries according to the log level 2017-04-20 12:24:46 +03:00
velioglu 2327b63291 Change native hash function with worker_hash 2017-04-19 22:16:55 +03:00
Marco Slot dfd7d86948 Stop using a sequence to generate unique job IDs 2017-04-18 11:31:51 +02:00
Marco Slot af0e462409 Support UPDATE/DELETE with parameterised partition column qual 2017-04-17 16:17:30 +02:00
Burak Yucesoy e9095e62ec Decouple reference table replication
With this change we add an option to add a node without replicating all reference
tables to that node. If a node is added with this option, we mark the node as
inactive and no queries will be sent to that node.

We also added two new UDFs:
 - master_activate_node(host, port):
    - marks node as active and replicates all reference tables to that node
 - master_add_inactive_node(host, port):
    - only adds node to pg_dist_node
2017-04-17 13:33:31 +03:00
Burak Yucesoy 7cfcb7d2f8 Error out on parameterized SQL functions
Before this commit, we were erroring out for queries containing parameterized SQL functions
like 'SELECT parameterized_sql_query(value)' as we should, however we were returning wrong
results for queries like 'SELECT * FROM parameterized_sql_query(value)'. With this commit
we started to error out on such queries too.
2017-04-13 16:36:24 +03:00
Onder Kalaci 1cb6a34ba8 Remove uninstantiated qual logic, use attribute equivalences
In this PR, we aim to deduce whether each RTE_RELATION
is joined with at least one other RTE_RELATION on their partition keys. If each
RTE_RELATION follows the above rule, we can conclude that all RTE_RELATIONs are
joined on their partition keys.

In order to do that, we invented a new equivalence class, namely
AttributeEquivalenceClass. In very simple words, an AttributeEquivalenceClass is
identified by a unique id and consists of a list of AttributeEquivalenceMembers.

Each AttributeEquivalenceMember is designed to identify attributes uniquely within the
whole query. This is necessary because varno attributes are defined within
a single level of a query. Instead, here we want to identify each RTE_RELATION uniquely
and try to find equality among each RTE_RELATION's partition key.

Whenever we find an equality clause A = B, where both A and B originate from
relation attributes (i.e., not random expressions), we create an
AttributeEquivalenceClass to record this knowledge. If we later find another
equivalence B = C, we create another AttributeEquivalenceClass. Finally, we can
apply transitivity rules and generate a new AttributeEquivalenceClass which includes
A, B and C.

Note that equality among the members is identified by the varattno and rteIdentity.

Each equality among RTE_RELATIONs is saved using an AttributeEquivalenceClass where
each member attribute is identified by an AttributeEquivalenceMember. In the final
step, we try to generate a common attribute equivalence class that holds as many
AttributeEquivalenceMembers as possible whose attributes are partition keys.
2017-04-13 11:51:26 +03:00
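The transitivity step described in the commit above (A = B and B = C yield a class containing A, B and C) can be sketched with a small union-find in Python. This is an illustration only, not the Citus code: members are modeled as hypothetical `(rteIdentity, varattno)` tuples, and the final check (one class containing every relation's partition key) is a simplifying assumption about what "common attribute equivalence class" means.

```python
def merge_equivalences(equalities):
    """Merge pairwise equalities into transitive equivalence classes.

    `equalities` is an iterable of (member, member) pairs, where each member
    is a hashable (rte_identity, varattno) tuple.
    """
    parent = {}

    def find(member):
        parent.setdefault(member, member)
        while parent[member] != member:
            parent[member] = parent[parent[member]]  # path halving
            member = parent[member]
        return member

    for left, right in equalities:
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            parent[root_right] = root_left

    classes = {}
    for member in list(parent):
        classes.setdefault(find(member), set()).add(member)
    return list(classes.values())


def all_joined_on_partition_keys(equalities, partition_keys):
    """Return True if a single class contains every relation's partition key.

    `partition_keys` maps rte_identity -> varattno of the partition column.
    """
    wanted = set(partition_keys.items())
    return any(wanted <= cls for cls in merge_equivalences(equalities))
```

For example, with `A = (1, 1)`, `B = (2, 1)` and `C = (3, 1)`, the equalities `[(A, B), (B, C)]` collapse into one class holding all three members.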
Onder Kalaci 11665dbe3c Fix pushing down wrong queries for INSERT ... SELECT queries
Before this commit, in certain cases router planner allowed pushing
down JOINs that are not on the partition keys.

With @anarazel's suggestion, we change the logic to use an uninstantiated
parameter. Previously, the planner was traversing the restriction
information and, once it found the parameter, replaced it with
the shard range. With this commit, instead of traversing the restrict
infos, the planner explicitly checks for the equivalence of the relation
partition key with the uninstantiated parameter. If it finds an equivalence,
it adds the restrictions. In this way, we have more control over the
queries that are pushed down.
2017-03-24 11:37:35 +02:00
Metin Doslu b1ee7ec93e
Fix access permission checks for distributed relations
With this commit, we add the range table list of the original query to our
custom plan. Therefore, PostgreSQL can check relations in the original query
for access permissions and error out if the proper access is not granted.
2017-03-22 15:25:00 -06:00
Murat Tuncer c4734d7d94 Rephrase router modify errors
The generic "distributed modifications must target exactly one shard"
message is replaced by more context-aware error messages.
2017-03-16 15:09:10 +03:00
Metin Doslu 1f838199f8 Use CustomScan API for query execution
Custom Scan is a node in the planned statement which helps external providers
to abstract data scans, not just for foreign data wrappers but also for regular
relations, so you can benefit from your own version of caching or hardware optimizations.
This sounds like only an abstraction on the data scan layer, but we can use it
as an abstraction for our distributed queries. The only thing we need to do is
to find distributable parts of the query, plan for them and replace them with
a Citus Custom Scan. Then, whenever PostgreSQL hits this custom scan node in
its Volcano style execution, it will call our callback functions which run the
distributed plan and provide tuples to the upper node as if it were scanning a regular
relation. This means fewer code changes, fewer bugs and more supported features
for us!

First, in the distributed query planner phase, we create a Custom Scan which
wraps the distributed plan. For real-time and task-tracker executors, we add
this custom plan under the master query plan. For router executor, we directly
pass the custom plan because there is no master query. Then, we simply let
the PostgreSQL executor run this plan. When it hits the custom scan node, we
call the related executor parts for the distributed plan, fill the tuple store in
the custom scan and return results to the PostgreSQL executor in Volcano style,
a tuple per XXX_ExecScan() call.

* Modify planner to utilize Custom Scan node.
* Create different scan methods for different executors.
* Use native PostgreSQL Explain for master part of queries.
2017-03-14 12:17:51 +02:00
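The "fill the tuple store, then return a tuple per XXX_ExecScan() call" flow from the commit above can be pictured with a small Python sketch. It is a conceptual illustration only: the real code uses PostgreSQL's C CustomScan API, and `CitusScanSketch`, `exec_scan`, and `drain` are hypothetical names introduced here.

```python
class CitusScanSketch:
    """Volcano-style scan node: each exec_scan() call yields one tuple."""

    def __init__(self, run_distributed_plan):
        # Callback standing in for the router/real-time/task-tracker executor.
        self._run_distributed_plan = run_distributed_plan
        self._tuple_store = None
        self._position = 0

    def exec_scan(self):
        # On the first call, run the distributed plan and fill the tuple store.
        if self._tuple_store is None:
            self._tuple_store = list(self._run_distributed_plan())
        # Afterwards, hand back one tuple per call; None signals end of scan.
        if self._position >= len(self._tuple_store):
            return None
        row = self._tuple_store[self._position]
        self._position += 1
        return row


def drain(scan):
    """Mimic an upper plan node pulling tuples until the scan is exhausted."""
    rows = []
    while (row := scan.exec_scan()) is not None:
        rows.append(row)
    return rows
```

The upper node never needs to know that the tuples came from worker nodes; it only sees the one-tuple-per-call interface.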
Andres Freund 52358fe891 Initial temp table removal implementation 2017-03-14 12:09:49 +02:00
Murat Tuncer f657a744d5 Enable router planner for queries on range partitioned tables
Router planner now supports queries using range partitioned
tables. Queries on append partitioned tables are still not
supported.
2017-03-09 16:39:15 +03:00
Metin Doslu ee425871ee Get reproducible costs between different PostgreSQL versions 2017-02-22 15:40:02 +02:00
Andres Freund 9721e80901 Use DEBUG2 instead of DEBUG4 in INSERT SELECT tests & debug message.
During later work the transaction debug output will change (as it will
in postgres 10), which makes it hard to see actual changes in the
INSERT ... SELECT ... test.  Reduce to DEBUG2 after changing a debug
message to that log level.
2017-02-20 12:56:16 +02:00
Marco Slot ba940a1de9 Use coordinator instead of schema node in terminology 2017-01-25 11:07:23 +01:00
Andres Freund 6939cb8c56 Hack up PREPARE/EXECUTE for nearly all distributed queries.
All router, real-time, task-tracker plannable queries should now have
full prepared statement support (and even use router when possible),
unless they don't go through the custom plan interface (which
basically just affects LANGUAGE SQL (not plpgsql) functions).

This is achieved by forcing postgres' planner to always choose a
custom plan, by assigning very low costs to plans with bound
parameters (i.e. ones where the postgres planner replanned the query
upon EXECUTE with all parameter values provided), instead of the
generic one.

This requires some trickery, because for custom plans to work the
costs for a non-custom plan have to be known, which means we can't
error out when planning the generic plan.  Instead we have to return a
"faux" plan, that'd trigger an error message if executed.  But due to
the custom plan logic that plan will likely (unless called by an SQL
function, or because we can't support that query for some reason) not
be executed; instead the custom plan will be chosen.
2017-01-23 09:23:50 -08:00
Andres Freund c244b8ef4a Make router planner error handling more flexible.
So far router planner had encapsulated different functionality in
MultiRouterPlanCreate. Modifications always go through router, selects
sometimes. Modifications always error out if the query is unsupported,
selects return NULL.  Especially the error handling is a problem for
the upcoming extension of prepared statement support.

Split MultiRouterPlanCreate into CreateRouterPlan and
CreateModifyPlan, and change them to not throw errors.

Instead errors are now reported by setting the new
MultiPlan->planningError.

Callers of router planner functionality now have to throw errors
themselves if desired, but also can skip doing so.

This is a pre-requisite for expanding prepared statement support.

While touching all those lines, improve a number of error messages by
getting them closer to the postgres error message guidelines.
2017-01-23 09:23:50 -08:00
Andres Freund 7681f6ab9d Centralize more of distributed planning into CreateDistributedPlan().
The name CreatePhysicalPlan() hasn't been accurate for a while, and
the split of work between multi_planner() and CreatePhysicalPlan()
doesn't seem perfect.  So rename to CreateDistributedPlan() and move a
bit more logic in there.
2017-01-23 09:23:50 -08:00
Andres Freund 9a82e8f06b Make usage of static a bit more consistent in multi_planner.c. 2017-01-23 09:23:50 -08:00
Jason Petersen 56197dbdba
Add replication_model GUC
This adds a replication_model GUC which is used as the replication
model for any new distributed table that is not a reference table.
With this change, tables with replication factor 1 are no longer
implicitly MX tables.

The GUC is similarly respected during empty shard creation for e.g.
existing append-partitioned tables. If the model is set to streaming
while replication factor is greater than one, table and shard creation
routines will error until this invalid combination is corrected.

Changing this parameter requires superuser permissions.
2017-01-23 09:05:14 -07:00
Burak Yucesoy 2e1df4c910 Reword error message for outer joins requiring repartition
We changed the error message which appears when a user tries to execute an outer join command
that requires repartitioning. The old error message mentioned 1-to-1 shard
partitioning, which may not be clear to the user.
2017-01-23 10:42:36 +03:00
Marco Slot 87ae26aef3 Ensure job IDs are unique across workers 2017-01-22 16:55:14 +01:00
Andres Freund 3a36d32c43 Mark some now unnecessarily exposed multi_planner.c functions static. 2017-01-20 12:31:56 -08:00
Andres Freund 608bed0387 Don't duplicate planning logic in citus' explain hook.
Instead use pg_plan_query() like the normal explain does, and use that
to explain the query.  That's important because it allows to remove
the duplicated planner logic from multi_explain - and that logic is
about to get more complicated.
2017-01-20 12:31:28 -08:00
Andres Freund 0f28a11970 Remove citus.explain_multi_logical/physical_plan.
They make fixing explain for prepared statement harder, and they don't
really fit into EXPLAIN in the first place.  Additionally they're
currently not exercised in any tests.
2017-01-20 12:31:19 -08:00
Metin Doslu 93e626c896 Refactor get_shard_id_for_distribution_column() and other minor changes 2017-01-20 14:38:01 +02:00
Onder Kalaci a7ed49c16e
Improve error messages for INSERT INTO .. SELECT
This commit is intended to improve the error messages while planning
INSERT INTO .. SELECT queries. The main motivation for this change is
that we used to map multiple cases into a single message. With this change,
we added explicit error messages for many cases.
2017-01-16 12:16:14 -07:00
Murat Tuncer e7935a3be4 Report error when original range table id is not found in NewTableId() 2017-01-13 09:39:43 +03:00
Murat Tuncer 77f8db6b14 Add view support
Enables the use of views within distributed queries.
Users can create and use a view on distributed tables/queries
as they would with regular queries.

After this change router queries will have full support for views;
insert into select queries will support reading from views, but not
writing into them. Outer joins have limited support, and
error out in certain cases, such as when a view is on the inner side
of the outer join.

Although PostgreSQL supports writing into views under certain circumstances,
we disallowed that for distributed views.
2017-01-13 09:39:42 +03:00
Murat Tuncer cb1dfd0a17 Add hint to errored real time queries 2017-01-12 11:33:35 +03:00
Burak Yucesoy 59d3d05bc4 Error out on CTEs with data modifying statement
With this change we start to error out on router planner queries where a common table
expression with a data-modifying statement is present. We already do not support
a data-modifying statement that uses the result of a CTE; now we also error out
if the CTE itself is a data-modifying statement.
2017-01-10 10:30:09 +02:00
Onder Kalaci 6d050fd677 Use 2PC for reference table modification
With this commit, we ensure that the router executor always uses
2PC for reference table modifications and never marks their placements
as INVALID.
2017-01-04 12:46:35 +02:00
Eren Basak 7e09bd6836 Error on Unsupported Features on Workers
This change makes the metadata workers error out on unsupported commands.
2017-01-02 16:03:45 +03:00
Murat Tuncer 2f76b4be99 Add error hint to failing modify query 2016-12-23 19:43:55 +03:00
Marco Slot 11031bcf55 Enable evaluation of stable functions in INSERT..SELECT 2016-12-23 12:47:21 +01:00
Marco Slot d745d7bf70 Add explicit RelationShards mapping to tasks 2016-12-23 10:23:43 +01:00
Onder Kalaci 9f0bd4cb36 Reference Table Support - Phase 1
With this commit, we implemented some basic features of reference tables.

To start with, a reference table is
  * a distributed table without a distribution column defined on it
  * the distributed table is single sharded
  * and the shard is replicated to all nodes

Reference tables follow the same code path as single-sharded
tables. Thus, broadcast JOINs are applicable to reference tables.
But, since the table is replicated to all nodes, table fetching is
not required any more.

Reference tables support the uniqueness constraints for any column.

Reference tables can be used in INSERT INTO .. SELECT queries with
the following rules:
  * If a reference table is in the SELECT part of the query, it is
    safe to join it with another reference table and/or hash partitioned
    tables.
  * If a reference table is in the INSERT part of the query, all
    other participating tables should be reference tables.

Reference tables follow the regular co-location structure. Since
all reference tables are single sharded and replicated to all nodes,
they are always co-located with each other.

Queries involving only reference tables always follow the router planner
and executor.

Reference tables can have composite typed columns and there is no need
to create/define the necessary support functions.

All modification queries, master_* UDFs, EXPLAIN, DDLs, TRUNCATE,
sequences, transactions, COPY, and schema support work on reference
tables as expected. Plus, all the prerequisites associated with
distribution columns are dropped.
2016-12-20 14:09:35 +02:00
Murat Tuncer c3a60bff70 Make router planner active at all times
We used to disable router planner and executor
when task executor is set to task-tracker.

This change enables router planning and execution
at all times regardless of task execution mode.

We are introducing a hidden flag enable_router_execution
to enable/disable router execution. Its default value is
true. User may disable router planning by setting it to false.
2016-12-20 11:24:01 +03:00
Onder Kalaci df974e15b8 Bugfix for deparsing INSERT..SELECT queries which involve constant values
This commit fixes a bug when the SELECT target list includes a constant
value.

Previous behaviour of target list re-ordering:
  * Iterate over the INSERT target list
    * If it includes a Var, find the corresponding SELECT entry
      and update its resno accordingly
    * If it does not include a Var (which we only considered to be
      DEFAULTs), generate a new SELECT target entry
  * If the processed target entry count in SELECT target list is less
    than the original SELECT target list (GROUP BY elements not included in
    the SELECT target entry), add them in the SELECT target list and
    update the resnos accordingly.
     * However, this step was leading to add the CONST SELECT target entries
       twice. The reason is that when CONST target list entries appear in the
       SELECT target list, the INSERT target list doesn't include a Var. Instead,
       it includes CONST as it does for DEFAULTs.

New behaviour of target list re-ordering:
  * Iterate over the INSERT target list
    * If it includes a Var, find the corresponding SELECT entry
      and update its resno accordingly
    * If it does not include a Var (which we consider to be
      DEFAULTs and CONSTs on the SELECT), generate a new SELECT
      target entry
  * If any target entry remains on the SELECT target list which are resjunk,
    (GROUP BY elements not included in the SELECT target entry), keep them
    in the SELECT target list by updating the resnos.
2016-12-01 10:41:56 +02:00
Murat Tuncer 45762006f3 Add support for filters
Ensures filter clauses are stripped from master query, and pushed
down to worker queries.
2016-12-01 08:53:46 +03:00
Onder Kalaci a43e3bad56 Improve error semantics for INSERT..SELECT
With this commit, we error out if a worker query cannot be executed
on all placements of a target insert shard interval.
2016-10-27 14:09:05 +03:00
Brian Cloutier 1e6d1ef67e Fix segfault during EXPLAIN EXECUTE
Fix citusdata/citus#886

The way postgres' explain hook is designed means that our hook is never
called during EXPLAIN EXECUTE. So, we special-case EXPLAIN EXECUTE by
catching it in the utility hook.  We then replace the EXECUTE with the
original query and pass it back to Citus.
2016-10-26 15:18:42 +03:00
Onder Kalaci 1673ea937c Feature: INSERT INTO ... SELECT
This commit adds INSERT INTO ... SELECT feature for distributed tables.

We implement INSERT INTO ... SELECT by pushing down the SELECT to
each shard. To compute that we use the router planner, by adding
an "uninstantiated" constraint that the partition column be equal to a
certain value. standard_planner() distributes that constraint to all
the tables where it knows how to push the restriction safely, for example
tables that are connected via equi-joins.

The router planner then iterates over the target table's shards,
for each we replace the "uninstantiated" restriction, with one that
PruneShardList() handles. Do so by replacing the partitioning qual
parameter added in multi_planner() with the current shard's
actual boundary values. Also, add the current shard's boundary values to the
top level subquery to ensure that even if the partitioning qual is
not distributed to all the tables, we never run the queries on the shards
that don't match with the current shard boundaries. Finally, perform the
normal shard pruning to decide on whether to push the query to the
current shard or not.

We do not support certain SQLs on the subquery, which are described/commented
on ErrorIfInsertSelectQueryNotSupported().

We also added some locking on the router executor. When an INSERT/SELECT command
runs on a distributed table with replication factor >1, we need to ensure that
it sees the same result on each placement of a shard. So we added the ability
such that router executor takes exclusive locks on shards from which the SELECT
in an INSERT/SELECT reads in order to prevent concurrent changes. This is not a
very optimal solution, but it's simple and correct. The
citus.all_modifications_commutative can be used to avoid aggressive locking.
An INSERT/SELECT whose filters are known to exclude any ongoing writes can be
marked as commutative. See RequiresConsistentSnapshot() for the details.

We also moved the decision of whether the multiPlan should be executed on
the router executor or not to the planning phase. This allowed us to
integrate multi task router executor tasks to the router executor smoothly.
2016-10-26 10:01:00 +03:00
Onder Kalaci e0d83d65af Add ability to reorder target list for INSERT/SELECT queries
The necessity for this functionality comes from the fact that ruleutils.c is not supposed to be
used on "rewritten" queries (i.e. ones that have been passed through QueryRewrite()).
Query rewriting is the process in which views and such are expanded,
and, INSERT/UPDATE targetlists are reordered to match the physical order,
defaults etc. For the details of reordering, see transformInsertRow().
2016-10-26 10:00:03 +03:00
Marco Slot 02d2b86e68 Re-disable master evaluation for SELECT 2016-10-21 10:51:47 +02:00
Marco Slot 9d98acfb6d Move requiresMasterEvaluation from Task to Job 2016-10-19 08:23:06 +02:00
Andres Freund ac14b2edbc
Support PostgreSQL 9.6
Adds support for PostgreSQL 9.6 by copying in the requisite ruleutils
file and refactoring the out/readfuncs code to flexibly support the
old-style copy/pasted out/readfuncs (prior to 9.6) or use extensible
node APIs (in 9.6 and higher).

Most version-specific code within this change is only needed to set new
fields in the AggRef nodes we build for aggregations. Version-specific
test output files were added in certain cases, though in most they were
not necessary. Each such file begins by e.g. printing the major version
in order to clarify its purpose.

The comment atop citus_nodes.h details how to add support for new nodes
for when that becomes necessary.
2016-10-18 16:23:55 -06:00
Metin Doslu d03a2af778 Add HAVING support
This commit completes having support in Citus by adding having support for
real-time and task-tracker executors. Multiple tests are added to regression
tests to cover new supported queries with having support.
2016-10-13 15:47:53 +03:00
Andres Freund 982ad66753 Introduce placement IDs.
So far placements were assigned an Oid, but that was just used to track
insertion order. It also did so incompletely, as it was not preserved
across changes of the shard state. The behaviour around oid wraparound
was also not entirely as intended.

The newly introduced, explicitly assigned, IDs are preserved across
shard-state changes.

The prime goal of this change is not to improve ordering of task
assignment policies, but to make it easier to reference shards.  The
newly introduced UpdateShardPlacementState() makes use of that, and so
will the in-progress connection and transaction management changes.
2016-10-07 11:59:20 -07:00
Brian Cloutier 9d6699b07c Switch from pg_worker_list.conf file to pg_dist_node metadata table.
Related to #786

This change adds the `pg_dist_node` table that contains the information
about the workers in the cluster, replacing the previously used
`pg_worker_list.conf` file (or the one specified with `citus.worker_list_file`).

Upon update, `pg_worker_list.conf` file is read and `pg_dist_node` table is
populated with the file's content. After that, `pg_worker_list.conf` file
is renamed to `pg_worker_list.conf.obsolete`

For adding and removing nodes, the change also includes two new UDFs:
`master_add_node` and `master_remove_node`, which require superuser
permissions.

'citus.worker_list_file' guc is kept for update purposes but not used after the
update is finished.
2016-10-05 13:01:35 +03:00
Andres Freund 6d050bc9f8 Initialize count_agg_clauses argument to 0.
count_agg_clause *adds* the cost of the aggregates to the state
variable, it doesn't reinitialize it. That is intentional, as it is used
to incrementally add costs in some places.
2016-10-03 13:07:43 -07:00
Robin Thomas c507a0df1c During repartitions, the partitionColumnType argument sent to workers
is now a `::regtype` using the qualified name of the column type,
not the column type OID which may differ between master/worker nodes.
Test coverage of a hash repartition using a UDT as the join column.

Note that the UDFs `worker_hash_partition_table` and `worker_range_partition_table`
are unchanged, and rightly expect an OID for the column type; but the
planner code building the commands now allows for `::regtype` casting
to do its magic.

Fixes citusdata/citus#111.
2016-10-03 13:41:20 -04:00
Onder Kalaci a533b8e7c1 Differentiate worker and master job temporary folders
This commit makes it possible to create different worker and master temporary folders.
This change is important for citus-mx on task-tracker execution. In simple words,
on citus-mx, the worker could actually be responsible for the master tasks as well.
Prior to this change, both master and worker logic on the task-tracker executor was
accessing and using the same files for different purposes, which was dangerous in
certain cases (e.g., when task_tracker_delay is low).
2016-10-03 14:24:08 +03:00
Marco Slot c4bc0742a7 Make count return 0 if all shards are pruned away
Before this change, count on a distributed table returned NULL if all shards
were pruned away, because on the master we replace the count(..) call
with a sum(..) call to sum the counts from the shards. However, sum
returns NULL when there are no rows, whereas count is expected to return
0.
2016-09-29 20:27:26 +02:00
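The NULL-versus-0 problem described in the commit above can be reproduced in a few lines of Python. This is a sketch of the semantics only; the commit text does not say how the fix is implemented, so the coalesce-style default in `distributed_count` is an assumption, and both function names are hypothetical.

```python
def sql_sum(values):
    """Mimic SQL sum(): returns None (NULL) when there are no input rows."""
    values = list(values)
    return sum(values) if values else None


def distributed_count(shard_counts):
    """Combine per-shard counts on the master.

    Assumption: the master falls back to 0 when sum() yields NULL, so the
    result matches what count() returns on an empty input.
    """
    total = sql_sum(shard_counts)
    return 0 if total is None else total
```

When every shard is pruned away, `shard_counts` is empty, `sql_sum` yields `None`, and `distributed_count` still returns `0`.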
Murat Tuncer 5b42318ac4 Make where false queries router plannable 2016-09-28 18:49:26 +03:00
Marco Slot 3318288d75 Fix segmentation fault in case of joins with WHERE 1=0 2016-09-26 15:12:29 +02:00
Marco Slot 6f6cb1a0d6 Allow noop updates of the partition column 2016-09-07 14:22:41 +02:00
Metin Doslu 7d212b847f Add outer join clause list extraction for subquery pushdown logic
In subquery pushdown, we allow outer joins if the join condition is on the
partition columns. WhereClauseList() used to return all join conditions including
outer joins. However, this has been changed with a commit related to outer join
support on regular queries. With this commit, we refactored ExtractFromExpressionWalker()
to return two lists of qualifiers. The first list is for inner join and filter
clauses and the second list is for outer join clauses. Therefore, we can also
use outer join clauses to check subquery pushdown prerequisites.
2016-09-02 11:54:44 +03:00
Robin Thomas 010cbf16fc Remove all usage of pg_dist_shard.shardalias in extension code. (#739)
Remove regression test of non-null shardalias.
2016-08-19 17:06:22 +03:00
Burak Yucesoy 6f20af9e38 Remove schema name parameter from API functions
We remove the schema name parameter from the worker_fetch_foreign_file and
worker_fetch_regular_table functions. We now send the schema name
concatenated with the table name.
2016-07-28 20:41:05 +03:00
Burak Yucesoy a649b47bac Add old version(without schema name parameter) of api functions back
Fixes #676

We added old versions (i.e. without schema name) of worker_apply_shard_ddl_command,
worker_fetch_foreign_file and worker_fetch_regular_table back. When one of
these functions is called, we set the schema name to the public schema and call the newer
version of the function.
2016-07-28 20:40:38 +03:00
Murat Tuncer cc33a450c4 Expand router planner coverage
We can now support a richer set of queries in the router planner.
This allows us to support CTEs, joins, window functions, and subqueries
if they are known to be executed at a single worker with a single
task (all tables are filtered down to a single shard and a single
worker contains all table shards referenced in the query).

Fixes : #501
2016-07-27 23:35:38 +03:00
Murat Tuncer c20080992d Remove PostgreSQL 9.4 support 2016-07-26 20:16:09 +03:00
Murat Tuncer 5d996a6891 Fix outer join crash when subquery is flattened 2016-07-22 17:01:19 +03:00
Burak Yucesoy b58872b441
Fix worker_fetch_regular_table with schema
Fixes #504
Fixes #646

We changed signature of worker_fetch_regular_table to accept schema name as parameter to
make it work with schemas.
2016-07-22 00:44:02 -06:00
Burak Yucesoy 20debfc0ee Fix COUNT DISTINCT approximation with schema
Fixes #555

Before this change, we were resolving the HLL function and type Oids without qualified names.
Now we find the schema name where HLL objects are stored and generate qualified names for
each object.

Similar fix is also applied for cstore_table_size function call.
2016-07-21 17:29:18 +03:00
Murat Tuncer 4d992c8143 Make router planner use original query 2016-07-18 18:23:04 +03:00
Eren 5b54e28f93 Add LIMIT/OFFSET Support
Fixes #394

This change adds LIMIT/OFFSET support for non router-plannable
distributed queries.

In cases that we can push the LIMIT down, we add the OFFSET value to
that LIMIT in the worker queries. When a query with LIMIT x OFFSET y is issued,
the query is propagated to the workers as LIMIT (x+y) OFFSET 0, and on the
master table, the original LIMIT and OFFSET values are used. With this change,
we can use OFFSET wherever we can use LIMIT.
2016-07-18 12:00:24 +03:00
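The LIMIT (x+y) OFFSET 0 rewrite described in the commit above can be checked with a short Python simulation. This is a sketch, not the Citus planner: `worker_query` and `master_query` are hypothetical names, shards are plain lists of integers, and an ascending ORDER BY on the value is assumed so that the merge is well defined.

```python
import heapq
from itertools import islice


def worker_query(shard_rows, limit, offset):
    """Worker side: ORDER BY value LIMIT (limit + offset) OFFSET 0."""
    return sorted(shard_rows)[: limit + offset]


def master_query(shards, limit, offset):
    """Master side: merge worker results, then apply the original LIMIT/OFFSET."""
    partial_results = [worker_query(rows, limit, offset) for rows in shards]
    merged = heapq.merge(*partial_results)  # inputs are already sorted
    return list(islice(merged, offset, offset + limit))
```

Each worker must return `limit + offset` rows because, in the worst case, all of the first `limit + offset` rows of the global ordering live on a single shard.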
Andres Freund 4cf0a4e48e citus_indent fixups 2016-07-13 11:45:51 -07:00
Brian Cloutier 0cad3b22cc Simplify code and fix include guards in citus_clauses 2016-07-13 11:45:51 -07:00
Brian Cloutier 08384ddc71 cosmetic changes 2016-07-13 11:45:51 -07:00
Brian Cloutier af9515f669 Only reparse queries if the planner flags them for reparsing 2016-07-13 11:45:51 -07:00
Brian Cloutier 4820366a6f citus_indent and some renaming 2016-07-13 11:45:51 -07:00
Brian Cloutier ae91768c96 Evaluate functions on the master
- Enables using VOLATILE functions (like nextval()) in INSERT queries
- Enables using STABLE functions (like now()) in targetLists and joinTrees

UPDATE and INSERT can now contain non-immutable functions. INSERT can contain any kind of
expression, while UPDATE can contain any STABLE function, so long as a Var is not passed
into the STABLE function, even indirectly. UPDATE TargetEntry's can now also include Vars.

There's an exception: CASE/COALESCE statements may not contain mutable functions.

Function calls in master_modify_multiple_shards are also evaluated.
2016-07-13 11:45:51 -07:00
Jason Petersen 41ed433b0e
Remove hash-pruning logic for NULL values
It turns out some tests exercised this behavior, but removing it should
have no ill effects. Besides, both copy and INSERT disallow NULLs in a
table's partition column.

Fixes a bug where anti-joins on hash-partitioned distributed tables
would incorrectly prune shards early, resulting in incorrect results (test
included).
2016-07-06 17:04:21 -06:00
Andres Freund cccba66f24 Support RETURNING for modification commands.
Fixes: #242
2016-07-01 13:07:12 -07:00
Andres Freund e1282b6d70 Remember original targetlist in MultiQueryContainerNode().
The old targetlist wasn't used so far, but the upcoming RETURNING
support relies on it.

This also allows to get rid of some crufty code in
multi_executor.c:multi_ExecutorStart(), which used the worker query's
targetlist instead of the main statement's (which didn't have one up to
now).
2016-07-01 12:50:12 -07:00
Andres Freund f78c135e63 Fix definition of faux targetlist element inserted to prevent backward scans.
The targetlist contains TargetEntrys containing expressions, not
expressions directly. That didn't matter so far, but with the upcoming
RETURNING support, the targetlist is inspected to build a TupleDesc.
ExecCleanTypeFromTL hits an assert when looking at something that's not
a TargetEntry.

Mark the entry as resjunk, so it's not actually used.
2016-07-01 12:50:12 -07:00
Murat Tuncer fb99585ca5 Refactor multi_planner to create router plan directly
If router plan creation fails, it falls back to normal planner
2016-06-21 12:50:21 +03:00
Andres Freund 2e8e8d377e Store ShardInterval instead of shardId in RangeTableFragments.
For CITUS_RTE_RELATION type fragments, reloading shardIntervals from the
database is rather expensive. So store a pointer to the full shard
interval, instead of just the shard id.  There's no new memory lifetime
hazards here, because we already passed a pointer to the shardInterval's
->shardId field around.

The plan time for the query in issue #607 goes from 2889 ms to 106 ms
with this change.
2016-06-16 17:31:35 -07:00
Andres Freund 211a9721a9 Use cached comparator in ShardIntervalsOverlap().
By far the most expensive part of ShardIntervalsOverlap() is computing
the function to use to determine overlap. Luckily we already have that
computed and cached.

The plan time for the query in issue #607 goes from 8764 ms to 2889 ms
with this change.
2016-06-16 17:21:19 -07:00
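The idea behind the cached comparator in ShardIntervalsOverlap() can be sketched in plain C. This is only an illustration, not the Citus code: `Interval`, `CompareFn`, `compare_int`, and `intervals_overlap` are hypothetical names, and plain integers stand in for PostgreSQL datums. The point is that the comparison function is resolved once by the caller and passed in, rather than being looked up on every overlap check.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a shard interval with inclusive bounds. */
typedef struct Interval
{
    int minValue;
    int maxValue;
} Interval;

/* Comparator contract: negative, zero, or positive, like strcmp(). */
typedef int (*CompareFn)(int left, int right);

/* Example comparator; resolving it is assumed to be the expensive step. */
int
compare_int(int left, int right)
{
    return (left > right) - (left < right);
}

/*
 * Two inclusive intervals overlap unless one ends before the other begins.
 * The comparator is supplied by the caller, so it can be cached and reused.
 */
bool
intervals_overlap(const Interval *first, const Interval *second,
                  CompareFn compare)
{
    if (compare(first->maxValue, second->minValue) < 0)
    {
        return false;
    }
    if (compare(second->maxValue, first->minValue) < 0)
    {
        return false;
    }
    return true;
}
```

A caller would look up the comparator once per table and pass the same pointer to every `intervals_overlap()` call made during planning.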
Marco Slot 52bc209c37 Do not copy outer join clauses into WHERE 2016-06-16 16:42:32 -07:00
Eren 57256b3476 Eliminate compile time warnings in multi_logical_optimizer.c
This change removes some issues about mixed declarations
and code in TablePartitioningSupportsDistinct() and
WorkerExtendedOpNode() functions.
2016-06-10 12:27:12 +03:00
Murat Tuncer 0db413491c Fix crash in count distinct with filters in repartition subqueries
Now copies all column references in the count distinct aggregate
to the worker target list and group by. The master target list is
also updated to reflect changes in attribute order.

Fixes 569
2016-06-09 11:47:24 +03:00
Murat Tuncer 20ba0f72a6 Change equality operator check for operator expressions 2016-06-06 12:34:16 +03:00
Burak Yucesoy 5db357eb1a Remove ONLY clause from worker queries
Fixes #475

With this change we prevent addition of ONLY clause to queries prepared for
worker nodes. When we add ONLY clause we may miss the inherited tables in
worker nodes created by users manually.
2016-06-03 11:42:43 +03:00
Murat Tuncer 2b0d6473b9 Add complex distinct count support for repartitioned subqueries
Single table repartition subqueries now support count(distinct column)
and count(distinct (case when ...)) expressions. The repartition query
extracts the columns used in the aggregate expression and adds them to the
target list and group by list. The master query stays the same
(count (distinct ...)), but the attribute numbers inside the aggregate
expression are modified to reflect the changes in the repartition query.
2016-05-27 15:43:05 +03:00
eren 132d9212d0 ADD master_modify_multiple_shards UDF
Fixes #10

This change creates a new UDF: master_modify_multiple_shards
Parameters:
  modify_query: A simple DELETE or UPDATE query as a string.

The UDF is similar to the existing master_apply_delete_command UDF.
Basically, given the modify query, it prunes the shard list, re-constructs
the query for each shard and sends the query to the placements.

Depending on the value of citus.multi_shard_commit_protocol, the commit
can be done in one-phase or two-phase manner.

Limitations:
* It cannot be called inside a transaction block
* It can only be called with simple operator expressions (like Single Shard Modify)

Sample Usage:
```
SELECT master_modify_multiple_shards(
  'DELETE FROM customer_delete_protocol WHERE c_custkey > 500 AND c_custkey < 500');
```
2016-05-26 17:30:35 +03:00
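The prune-then-rewrite flow described for master_modify_multiple_shards can be sketched roughly as below. This is a simplified model, not the UDF's implementation: `Shard`, `shard_may_match`, and `build_shard_delete` are hypothetical names, the filter is assumed to be an inclusive integer range on the partition column, and the `<table>_<shardid>` shard naming is treated as an assumption for illustration.

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical shard descriptor with inclusive partition-column bounds. */
typedef struct Shard
{
    uint64_t shardId;
    int minValue;
    int maxValue;
} Shard;

/*
 * Pruning step: a shard needs the command only if its range intersects
 * the inclusive filter range [low, high]. An empty filter matches nothing.
 */
bool
shard_may_match(const Shard *shard, int low, int high)
{
    if (low > high)
    {
        return false;
    }
    return shard->maxValue >= low && shard->minValue <= high;
}

/*
 * Rewrite step: build the DELETE text for one shard. The shard table name
 * "<table>_<shardid>" is an assumption made for this sketch.
 */
int
build_shard_delete(char *buffer, size_t size, const char *table,
                   const char *column, const Shard *shard, int low, int high)
{
    return snprintf(buffer, size,
                    "DELETE FROM %s_%" PRIu64 " WHERE %s >= %d AND %s <= %d",
                    table, shard->shardId, column, low, column, high);
}
```

In this model the caller loops over all shards, skips those for which `shard_may_match()` returns false, and sends the text from `build_shard_delete()` to each placement of the remaining shards.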
Marco Slot 1b4fbc76e2 Add JSON/XML validation to EXPLAIN regression tests and fix issues 2016-05-06 11:30:07 +02:00
Lukas Fittl 2f694f7af3 Distributed EXPLAIN: Generate valid JSON output.
This modifies the EXPLAIN output functions to actually generate
valid JSON output when (FORMAT JSON) is being used.

Fixes #494.
2016-05-05 12:48:01 +02:00
Onder Kalaci 38da3c826b Fix compile time warning
This change fixes a compile time warning related to definition/declaration order
of the code.
2016-05-04 09:42:10 +03:00
Brian Cloutier 58535eb337 Query Planning Performance Improvements (#474)
- Only look at pruned shards when determining AnchorTable
- Use cached shardIntervalCompareFunction during copartition check
2016-05-03 10:48:46 +03:00
Marco Slot fc4f23065a Add EXPLAIN for simple distributed queries 2016-04-30 00:11:02 +02:00
eren 7e19ebe679 FIX "mixed declarations and code" Warning in multi_physical_planner.c
Fixes #477

This change fixes the compile time warning message in BuildMapMergeJob in
multi_physical_planner.c about mixed declarations and code. Basically, the
problematic declaration is moved up so that no expression is before it.
2016-04-29 11:18:04 +03:00
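The kind of fix described here, moving a declaration above the first statement, looks roughly like the following. The function `clamped_sum` is a hypothetical example, not code from multi_physical_planner.c. The warning text quoted in the comment is the one GCC prints under `-Wdeclaration-after-statement`.

```c
/*
 * A body of this shape triggers
 * "ISO C90 forbids mixed declarations and code":
 *
 *     int total = 0;
 *     total = first + second;
 *     int clamped = (total > limit) ? limit : total;   <- after a statement
 *
 * The version below declares everything before the first statement.
 */
int
clamped_sum(int first, int second, int limit)
{
    /* all declarations come first */
    int total = 0;
    int clamped = 0;

    /* statements follow the declarations */
    total = first + second;
    clamped = (total > limit) ? limit : total;

    return clamped;
}
```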
Brian Cloutier 0036eb3253 Allow references to columns in UPDATE statements (#472)
Allow references to columns in UPDATE statements

Queries like "UPDATE tbl SET column = column + 1" are now allowed, so long as you don't use any non-IMMUTABLE functions.
2016-04-28 05:45:16 -07:00
Andres Freund a5b3dcddb3 Run some commands as superuser to allow normal users to execute queries.
Some small parts of citus currently require superuser privileges; which
is obviously not desirable for production scenarios. Run these small
parts under superuser privileges (we use the extension owner) to avoid
that.

This does not yet coordinate grants between master and workers. Thus it
allows creating shards, loading data, and running queries as a
non-superuser, but it is not easily possible to give several users
differentiated access.
2016-04-27 10:28:22 -07:00
Andres Freund 42d232c0e8 Use the current session's username when connecting to worker nodes.
So far we've always used libpq defaults when connecting to workers; barring
special environment variables being set, that'll always be the user that
started the server.  That's not desirable because it prevents using
users with fewer privileges.

Thus change the various APIs creating connections to workers to always
use usernames. That means:
1) MultiClientConnect() needs to, optionally, accept a username
2) GetOrEstablishConnection(), including the underlying cache, needs to
   use the current user as part of the connection cache key. That way
   connections for separate users are distinct, and we always use one
   with the correct authorization.
3) The task tracker needs to keep track of the username associated with
   a task, so it can use it when establishing connections outside the
   originating session.
2016-04-27 10:00:08 -07:00
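Point 2 above, making the current user part of the connection cache key, can be illustrated with a small sketch. The struct and function names below (`ConnectionKey`, `connection_key_init`, `connection_key_equal`) and the fixed name length are hypothetical, not the actual Citus definitions.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MAX_NAME_LENGTH 64

/* Hypothetical cache key: host and port alone no longer identify an entry. */
typedef struct ConnectionKey
{
    char hostname[MAX_NAME_LENGTH];
    int32_t port;
    char username[MAX_NAME_LENGTH];
} ConnectionKey;

/* Zero the key first so unused bytes never hold stale data. */
void
connection_key_init(ConnectionKey *key, const char *hostname, int32_t port,
                    const char *username)
{
    memset(key, 0, sizeof(*key));
    snprintf(key->hostname, sizeof(key->hostname), "%s", hostname);
    key->port = port;
    snprintf(key->username, sizeof(key->username), "%s", username);
}

/* Two keys match only if host, port, and user all agree. */
bool
connection_key_equal(const ConnectionKey *left, const ConnectionKey *right)
{
    return left->port == right->port &&
           strcmp(left->hostname, right->hostname) == 0 &&
           strcmp(left->username, right->username) == 0;
}
```

With the user in the key, a connection opened for one role is never handed to a session running as another role, even when both target the same worker.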
Onder Kalaci 108114ab99 Apply final code review feedback
- Fix O(n^2) loop to O(n)
- Collapse two if statements into a single one
- Some coding conventions feedback
2016-04-27 10:36:03 +03:00
Onder Kalaci c4b783b70b Fix Merge Conflict
This commit fixes merge conflicts.
2016-04-26 11:18:47 +03:00
Onder Kalaci 6c7abc2ba5 Add fast shard pruning path for INSERTs on hash partitioned tables
This commit adds a fast shard pruning path for INSERTs on
hash-partitioned tables. The rationale behind this change is
that if there exists a sorted shard interval array, a single
index lookup on the array allows us to find the corresponding
shard interval. As mentioned above, we need a sorted
(wrt shardminvalue) shard interval array. Thus, this commit
updates shardIntervalArray to sortedShardIntervalArray in the
metadata cache. Then uses the low-level API that is defined in
multi_copy to handle the fast shard pruning.

The performance impact of this change is more apparent as more
shards exist for a distributed table. Previous implementation
was relying on linear search through the shard intervals. However,
this commit relies on constant lookup time on shard interval
array. Thus, the shard pruning becomes less dependent on the
shard count.
2016-04-26 11:16:00 +03:00
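A constant-time lookup of the kind described above might look like the sketch below. It rests on an assumption the commit message does not state: that the shards split the 32-bit hash space into equal, contiguous ranges and are sorted by minimum value. `find_shard_index` and `HASH_TOKEN_COUNT` are names chosen for this illustration.

```c
#include <stdint.h>

/* Number of distinct int32 hash values: 2^32. */
#define HASH_TOKEN_COUNT INT64_C(4294967296)

/*
 * Map a hashed partition value to the index of its shard in a sorted
 * shard interval array, assuming equal-width hash ranges.
 */
int
find_shard_index(int32_t hashedValue, int shardCount)
{
    /* width of each shard's hash range */
    int64_t tokenIncrement = HASH_TOKEN_COUNT / shardCount;

    /* shift the value from [INT32_MIN, INT32_MAX] to [0, 2^32 - 1] */
    int64_t offset = (int64_t) hashedValue - INT32_MIN;

    int shardIndex = (int) (offset / tokenIncrement);

    /* if 2^32 is not divisible by shardCount, the last shard takes the rest */
    if (shardIndex >= shardCount)
    {
        shardIndex = shardCount - 1;
    }

    return shardIndex;
}
```

The cost is one division per INSERT regardless of shard count, in contrast to a linear scan over the shard intervals.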
Murat Tuncer a88d3ecd4e Add dynamic executor selection
- non-router plannable queries can be executed
  by router executor if they satisfy the criteria
- router executor is removed from configuration,
  now task executor can not be set to router
- removed some tests that error out for router executor
2016-04-21 09:15:33 +03:00
Murat Tuncer 938546b938 Add router plannable check and router planning logic
for single shard select queries
2016-04-21 09:15:33 +03:00
Brian Cloutier 7b1dc0d511 Support count(distinct) on hash partitioned tables
Also add test to ensure we get the same results when running
count(distinct) on range and hash partitioned tables.
2016-04-20 04:54:07 -07:00
eren 53186b4e67 FIX Warning Message in multi_logical_optimizer.c
With #426, some new warning messages started to arise, because of
cross assignment of Node and Expr pointers. This change fixes the
warnings with type casts.
2016-04-20 11:33:29 +03:00
eren 448527c3af
Fix JOINs on varchar columns with subquery pushdown
Fixes #379

The varchar VAR struct is wrapped in a RELABELTYPE struct inside PostgreSQL code, and
the IsPartitionColumnRecursive function considers only VAR types, so it returns false
for varchar.

This change adds a strip_implicit_coercions() call to the columnExpression in
the IsPartitionColumnRecursive function so that implicit coercions like
RELABELTYPE are stripped to VAR.
2016-04-19 21:55:50 -06:00
eren 399b5738b0
Fix Join Problem With VARCHAR Partition Columns
This change fixes the problem with joins with VARCHAR columns. Prior to
this change, when we tried to do large table joins on varchar columns, we got
an error of the form:
ERROR: cannot perform local joins that involve expressions
DETAIL: local joins can be performed between columns only.

This is because we have a check in CheckJoinBetweenColumns() which requires the
join clause to have only 'Var' nodes (i.e. columns). Postgres adds a relabel type
cast to cast the varchar to text; hence the type of the node is not T_Var
and the join fails.

The fix involves calling strip_implicit_coercions() on the left and right
arguments so that RELABELTYPE is stripped to VAR.

Fixes #76.
2016-04-19 21:55:50 -06:00
eren 1ffc30d7f5
Fix Shard Pruning Problem With Subqueries on VARCHAR Partition Columns
Fixes #375

Prior to this change, shard pruning couldn't be done if:
- Table is hash-distributed
- Partition column is VARCHAR
- Query to be pruned is a subquery

There were two problems:
- A bug in left-side/right-side checks for the partition column
- We were not considering relabeled types (VARCHAR was relabeled as TEXT)
2016-04-19 21:55:50 -06:00
Andres Freund 39233c54ac
Remove wholly unused variable.
This avoids a -Wunused warning.
2016-04-19 12:31:13 -06:00
Andres Freund 29b8576a33
Annotate variables only used for asserts with PG_USED_FOR_ASSERTS_ONLY.
This avoids '-Wunused-but-set-variable' type warnings when compiling
without assertions, e.g. against a system postgres.
2016-04-19 12:31:12 -06:00
Jason Petersen 30fdb59a80
Add clarifying comment in HashableClauseMutator
While reading this code last week, it appeared as though there was no
place we ensured that the partition clause actually used equality ops.
As such, I was worried that we might transform a clause such as id < 5
into a constraint like hash(id) = hash(5) when doing shard pruning. The
relevant code seemed to just ensure:

  1. The node is an OpExpr
  2. With a related hash function
  3. It compares the partition column
  4. Against a constant

A superficial reading implied we didn't actually make sure the original
op was equality-related, but it turns out the hash lookup function DOES
ensure that for us. So I added a comment.
2016-04-19 12:21:11 -06:00
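The reasoning in this commit message can be modeled with a toy check. The sketch below is not HashableClauseMutator: `Clause`, `operator_has_hash_function`, and `clause_is_hashable` are hypothetical, and a one-entry table stands in for the hash lookup that, per the message, only succeeds for equality operators.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, flattened view of "column <op> constant". */
typedef struct Clause
{
    const char *operatorName;       /* e.g. "=" or "<" */
    bool leftIsPartitionColumn;
    bool rightIsConstant;
} Clause;

/* In this toy model only equality has a related hash function. */
static const char *const hashableOperators[] = { "=" };

/* The lookup fails for non-equality operators, which is the implicit check. */
bool
operator_has_hash_function(const char *operatorName)
{
    size_t count = sizeof(hashableOperators) / sizeof(hashableOperators[0]);
    size_t i = 0;

    for (i = 0; i < count; i++)
    {
        if (strcmp(hashableOperators[i], operatorName) == 0)
        {
            return true;
        }
    }
    return false;
}

/* A clause may become hash(column) = hash(constant) only if all checks pass. */
bool
clause_is_hashable(const Clause *clause)
{
    return operator_has_hash_function(clause->operatorName) &&
           clause->leftIsPartitionColumn &&
           clause->rightIsConstant;
}
```

Under this model a clause such as id < 5 is rejected at the lookup step, so it is never rewritten into a hash equality constraint.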
Onder Kalaci d917d9a615 Allow all types of nodes in the WHERE clauses
This change removes the whitelisting check on the WHERE clauses. Note that, before
this change, citus was already allowing all types of nodes with the following
format (i.e., wrap with a boolean test):

  * SELECT col FROM table WHERE (ANY EXPRESSION) is TRUE;

Thus, this change is mostly useful for allowing the expressions in the WHERE clause
directly and avoiding "unsupported clause type" errors.
2016-03-30 16:39:58 +03:00
eren ef6d5c7571 Fix spurious NOTICE messages with ANY/ALL
Fixes issue #258

Prior to this change, Citus gave a deceptive NOTICE message when a query
including ANY or ALL on a non-partition column was issued on a hash
partitioned table.

Let the github_events table be hash-distributed on repo_id column. Then,
issuing this query:
    SELECT count(*) FROM github_events WHERE event_id = ANY ('{1,2,3}')

Gives this message:
    NOTICE: cannot use shard pruning with ANY (array expression)
    HINT: Consider rewriting the expression with OR clauses.

Note that since event_id is not the partition column, shard pruning would
not be applied in any case. However, the NOTICE message would be valid,
and would still be given, if the ANY clause were applied to the repo_id column.

Reviewer: Murat Tuncer
2016-03-25 14:30:02 +02:00
Jason Petersen 423e6c8ea0
Update copyright dates
Fixed configure variable and updated all end dates to 2016.
2016-03-23 17:14:37 -06:00
Murat Tuncer 3528d7ce85 Merge from master branch into feature/citusdb-to-citus 2016-02-17 14:49:01 +02:00
Metin Doslu 6123022ca7 Add check for count distinct on single table subqueries
Fixes #314
2016-02-17 14:24:07 +02:00
Jason Petersen fdb37682b2
First formatting attempt
Skipped csql, ruleutils, readfuncs, and functions obviously copied from
PostgreSQL. Seeing how this looks, then continuing.
2016-02-15 23:29:32 -07:00
Murat Tuncer 55c44b48dd Changed product name to citus
All citusdb references in
- extension, binary names
- file headers
- all configuration name prefixes
- error/warning messages
- some functions names
- regression tests

are changed to be citus.
2016-02-15 16:04:31 +02:00
Önder Kalacı a55287411b Merge pull request #332 from citusdata/bugfix/memory_context_leak
Remove unnecessary memory context switch on the planner
2016-02-12 11:13:12 -08:00
Onder Kalaci 0a6839e544 Perform distributed planning in the calling memory context
Previously we used, for historical reasons, MessageContext.
That is problematic if a single message from the client
causes a lot of statements to be planned. E.g. for the
copy_to_distributed_table script one insert statement
is planned for each row inserted via COPY, and only freed
when COPY has finished.
2016-02-12 20:50:40 +02:00
Jason Petersen b1ef2e59a2 Merge pull request #331 from citusdata/feature-permit_dml_to_append_tables#321
Allow DML commands on append-partitioned tables

cr: @lithp
2016-02-12 11:24:06 -07:00
Jason Petersen 6f308c5e2d
Allow DML commands on append-partitioned tables
This entirely removes any restriction on the type of partitioning
during DML planning and execution. Though there aren't actually any
technical limitations preventing DML commands against append- (or even
range-) partitioned tables, we had initially forbidden this, as any
future stage operation could cause shards to overlap, banning all
subsequent DML operations on partition values contained within more
than one shard. This ended up mostly restricting us, so we're now
removing that restriction.
2016-02-11 16:09:35 -07:00
Jason Petersen d164305929
Handle hash-partitioned aliased data types
When two data types have the same binary representation, PostgreSQL may
add an implicit coercion between them by wrapping a node in a relabel
type. This wrapper signals that the wrapped value is completely binary
compatible with the designated "final type" of the relabel node. As an
example, the varchar type is often relabeled to text, since functions
provided for use with text (comparisons, hashes, etc.) are completely
compatible with varchar as well.

The hash-partitioned codepath contains functions that verify queries
actually contain an equality constraint on the partition column, but
those functions expect such constraints to be comparison operations
between a Var and Const. The RelabelType wrapper node causes these
functions to always return false, which bypasses shard pruning.
2016-02-11 13:50:43 -07:00
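The effect of a relabel wrapper on a "Var equals Const" check can be shown with a toy node model. The types and functions below (`ToyNode`, `strip_relabel`, `is_var_equals_const`) are hypothetical and far simpler than PostgreSQL's node tree. Unwrapping before the check follows the spirit of strip_implicit_coercions(), which later commits in this log use for the same problem.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum ToyNodeTag
{
    TOY_VAR,
    TOY_CONST,
    TOY_RELABEL
} ToyNodeTag;

/* Hypothetical expression node; arg is only set for TOY_RELABEL. */
typedef struct ToyNode
{
    ToyNodeTag tag;
    struct ToyNode *arg;
} ToyNode;

/* Peel off any number of binary-compatible relabel wrappers. */
ToyNode *
strip_relabel(ToyNode *node)
{
    while (node != NULL && node->tag == TOY_RELABEL)
    {
        node = node->arg;
    }
    return node;
}

/*
 * Without the strip_relabel() calls, a varchar column relabeled to text
 * would show up as TOY_RELABEL and this check would return false.
 */
bool
is_var_equals_const(ToyNode *left, ToyNode *right)
{
    left = strip_relabel(left);
    right = strip_relabel(right);

    return left != NULL && right != NULL &&
           left->tag == TOY_VAR && right->tag == TOY_CONST;
}
```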
Onder Kalaci 136306a1fe Initial commit of Citus 5.0 2016-02-11 04:05:32 +02:00