24 Commits (33375944d6cc189651b33c88cce255088088c87b)

Author SHA1 Message Date
Murat Tuncer 33375944d6 Add support for filters
Ensures filter clauses are stripped from master query, and pushed
down to worker queries.
2016-12-01 08:53:46 +03:00
Onder Kalaci 9e82cd6d2d Feature: INSERT INTO ... SELECT
This commit adds INSERT INTO ... SELECT feature for distributed tables.

We implement INSERT INTO ... SELECT by pushing down the SELECT to
each shard. To compute that we use the router planner, by adding
an "uninstantiated" constraint that the partition column be equal to a
certain value. standard_planner() distributes that constraint to all
the tables where it knows how to push the restriction safely, such as
tables that are connected via equi joins.

The router planner then iterates over the target table's shards. For each
shard, we replace the "uninstantiated" restriction with one that
PruneShardList() handles, by replacing the partitioning qual
parameter added in multi_planner() with the current shard's
actual boundary values. We also add the current shard's boundary values to the
top level subquery to ensure that even if the partitioning qual is
not distributed to all the tables, we never run the queries on the shards
that don't match the current shard boundaries. Finally, we perform the
normal shard pruning to decide whether to push the query to the
current shard or not.

We do not support certain SQL constructs in the subquery; these are described in
the comments on ErrorIfInsertSelectQueryNotSupported().

We also added some locking to the router executor. When an INSERT/SELECT command
runs on a distributed table with replication factor >1, we need to ensure that
it sees the same result on each placement of a shard. The router executor
therefore takes exclusive locks on the shards from which the SELECT
in an INSERT/SELECT reads, in order to prevent concurrent changes. This is not
an optimal solution, but it is simple and correct. The
citus.all_modifications_commutative setting can be used to avoid aggressive locking.
An INSERT/SELECT whose filters are known to exclude any ongoing writes can be
marked as commutative. See RequiresConsistentSnapshot() for the details.

We also moved the decision of whether the multiPlan should be executed on
the router executor to the planning phase. This allowed us to
integrate multi-task router executor tasks into the router executor smoothly.
2016-10-26 10:01:00 +03:00
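The per-shard planning loop described in the INSERT INTO ... SELECT commit above can be illustrated with a small sketch. This is a toy model under stated assumptions, not the Citus implementation: `ShardInterval`, `overlaps` and `plan_insert_select` are hypothetical names, shard boundaries are modeled as plain integer ranges, and "pruning" is reduced to a range-overlap test.

```python
from typing import List, NamedTuple, Tuple


class ShardInterval(NamedTuple):
    """Hypothetical stand-in for a shard and its boundary values."""
    shard_id: int
    min_value: int
    max_value: int


def overlaps(a: ShardInterval, b: ShardInterval) -> bool:
    """True when the two boundary ranges share at least one value."""
    return a.min_value <= b.max_value and b.min_value <= a.max_value


def plan_insert_select(
    target_shards: List[ShardInterval],
    source_shards: List[ShardInterval],
) -> List[Tuple[int, List[int]]]:
    """For each target shard, restrict the SELECT to that shard's
    boundaries, prune source shards that cannot match, and emit a task
    only if something survives pruning."""
    tasks = []
    for target in target_shards:
        # "Instantiate" the restriction with the current shard's boundaries.
        matching = [s.shard_id for s in source_shards if overlaps(s, target)]
        if matching:  # skip target shards whose SELECT was pruned away
            tasks.append((target.shard_id, matching))
    return tasks
```

With co-located tables, each target shard would be paired with exactly one source shard; a target shard with no overlapping source shard produces no task.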
Andres Freund 0e02b838a3 Support PostgreSQL 9.6
Adds support for PostgreSQL 9.6 by copying in the requisite ruleutils
file and refactoring the out/readfuncs code to flexibly support the
old-style copy/pasted out/readfuncs (prior to 9.6) or use extensible
node APIs (in 9.6 and higher).

Most version-specific code within this change is only needed to set new
fields in the AggRef nodes we build for aggregations. Version-specific
test output files were added in certain cases, though in most they were
not necessary. Each such file begins by e.g. printing the major version
in order to clarify its purpose.

The comment atop citus_nodes.h details how to add support for new nodes
for when that becomes necessary.
2016-10-18 16:23:55 -06:00
Metin Doslu 827d1ddb75 Add HAVING support
This commit completes HAVING support in Citus by adding HAVING support for the
real-time and task-tracker executors. Multiple tests are added to the regression
tests to cover the newly supported queries with HAVING clauses.
2016-10-13 15:47:53 +03:00
Marco Slot 2dfe17b75e Make count return 0 if all shards are pruned away
Before this change, count on a distributed table returned NULL if all shards
were pruned away, because on the master we replace the count(..) call
with a sum(..) call to sum the counts from the shards. However, sum
returns NULL when there are no rows, whereas count is expected to return
0.
2016-09-29 20:27:26 +02:00
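The count/sum mismatch described above can be reproduced in a few lines. The sketch below mimics SQL semantics in Python (None stands in for NULL); `sql_sum` and `master_count` are hypothetical helper names, and the fallback to 0 is one way to express the fix, not necessarily how the commit implements it.

```python
from typing import Optional, Sequence


def sql_sum(values: Sequence[int]) -> Optional[int]:
    """Mimic SQL sum(): the result is NULL (None) when there are no rows."""
    return sum(values) if values else None


def master_count(worker_counts: Sequence[int]) -> int:
    """Combine per-shard counts on the master. When every shard was
    pruned away there are no worker rows, so fall back to 0."""
    total = sql_sum(worker_counts)
    return 0 if total is None else total
```

Here `master_count([])` yields 0, whereas the bare `sql_sum([])` yields None.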
Marco Slot a2276adcd2 Fix segmentation fault in case of joins with WHERE 1=0
2016-09-26 15:12:29 +02:00
Metin Doslu 60d67a39f1 Add outer join clause list extraction for subquery pushdown logic
In subquery pushdown, we allow outer joins if the join condition is on the
partition columns. WhereClauseList() used to return all join conditions including
outer joins. However, this has been changed with a commit related to outer join
support on regular queries. With this commit, we refactored ExtractFromExpressionWalker()
to return two lists of qualifiers. The first list is for inner join and filter
clauses and the second list is for outer join clauses. Therefore, we can also
use outer join clauses to check subquery pushdown prerequisites.
2016-09-02 11:54:44 +03:00
Burak Yucesoy 7df5a265c7 Fix COUNT DISTINCT approximation with schema
Fixes #555

Before this change, we resolved the HLL function and type Oids without a qualified name.
Now we find the schema where the HLL objects are stored and generate qualified names for
each object.

A similar fix is also applied to the cstore_table_size function call.
2016-07-21 17:29:18 +03:00
Eren c92c81b550 Add LIMIT/OFFSET Support
Fixes #394

This change adds LIMIT/OFFSET support for non router-plannable
distributed queries.

In cases where we can push the LIMIT down, we add the OFFSET value to
that LIMIT in the worker queries. When a query with LIMIT x OFFSET y is issued,
the query is propagated to the workers as LIMIT (x+y) OFFSET 0, and on the
master table, the original LIMIT and OFFSET values are used. With this change,
we can use OFFSET wherever we can use LIMIT.
2016-07-18 12:00:24 +03:00
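The LIMIT (x+y) OFFSET 0 rewrite from the LIMIT/OFFSET commit above can be checked with a small simulation. This is a sketch under assumptions: each shard is a plain list of integers, the query is assumed to order rows ascending, and `worker_limit` and `run_distributed` are hypothetical names.

```python
from typing import List, Sequence


def worker_limit(limit: int, offset: int) -> int:
    """LIMIT pushed to each worker: x + y, with OFFSET 0."""
    return limit + offset


def run_distributed(shards: Sequence[Sequence[int]], limit: int, offset: int) -> List[int]:
    """Simulate ORDER BY value LIMIT limit OFFSET offset over shards."""
    pushed = worker_limit(limit, offset)
    partial: List[int] = []
    for rows in shards:
        # Worker query: ORDER BY value LIMIT (x + y) OFFSET 0
        partial.extend(sorted(rows)[:pushed])
    # Master query: apply the original LIMIT and OFFSET to the merged rows
    return sorted(partial)[offset:offset + limit]
```

Each worker has to return x+y rows because, in the worst case, all of the first x+y rows in the global order live on a single shard.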
Eren ae5687e726 Eliminate compile time warnings in multi_logical_optimizer.c
This change fixes warnings about mixed declarations
and code in the TablePartitioningSupportsDistinct() and
WorkerExtendedOpNode() functions.
2016-06-10 12:27:12 +03:00
Murat Tuncer 315b7f3e4c Fix crash in count distinct with filters in repartition subqueries
Now copies all column references in the count distinct aggregate
to the worker target list and group by. The master target list is
also updated to reflect changes in attribute order.

Fixes 569
2016-06-09 11:47:24 +03:00
Murat Tuncer 41096f2076 Change equality operator check for operator expressions
2016-06-06 12:34:16 +03:00
Murat Tuncer 9167373f54 Add complex distinct count support for repartitioned subqueries
Single table repartition subqueries now support count(distinct column)
and count(distinct (case when ...)) expressions. The repartition query
extracts the columns used in the aggregate expression and adds them to the target
list and group by list. The master query stays the same (count (distinct ...)),
but the attribute numbers inside the aggregate expression are modified to
reflect the changes in the repartition query.
2016-05-27 15:43:05 +03:00
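The split described in the commit above (the repartition query groups by the aggregated column, the master still runs count(distinct ...)) can be sketched as follows. This is a simplified model, assuming rows are dictionaries and NULL is represented by None; `worker_distinct_values` and `master_count_distinct` are hypothetical names.

```python
from typing import Dict, Iterable, List, Optional, Set


def worker_distinct_values(rows: Iterable[Dict[str, Optional[int]]], column: str) -> Set[int]:
    """Worker side: adding the column to the target list and GROUP BY
    yields one row per distinct value. NULLs are dropped because
    count(distinct ...) ignores them."""
    return {row[column] for row in rows if row[column] is not None}


def master_count_distinct(worker_results: List[Set[int]]) -> int:
    """Master side: count(distinct ...) over the merged worker rows, so a
    value seen on several workers is counted once."""
    merged: Set[int] = set()
    for values in worker_results:
        merged |= values
    return len(merged)
```

Summing per-worker distinct counts would overcount values that appear on more than one worker, which is why the master keeps the distinct aggregate.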
Brian Cloutier 5962c9b7c8 Query Planning Performance Improvements (#474)
- Only look at pruned shards when determining AnchorTable
- Use cached shardIntervalCompareFunction during copartition check
2016-05-03 10:48:46 +03:00
Onder Kalaci c763d7492c Apply final code review feedback
- Fix O(n^2) loop to O(n)
- Collapse two if statements into a single one
- Some coding conventions feedback
2016-04-27 10:36:03 +03:00
Onder Kalaci 16425e9054 Add fast shard pruning path for INSERTs on hash partitioned tables
This commit adds a fast shard pruning path for INSERTs on
hash-partitioned tables. The rationale behind this change is
that if there exists a sorted shard interval array, a single
index lookup on the array allows us to find the corresponding
shard interval. This requires a shard interval array sorted by
shardminvalue. Thus, this commit
updates shardIntervalArray to sortedShardIntervalArray in the
metadata cache, then uses the low-level API that is defined in
multi_copy to handle the fast shard pruning.

The performance impact of this change is more apparent as more
shards exist for a distributed table. The previous implementation
relied on a linear search through the shard intervals, whereas
this commit relies on a constant-time lookup in the shard interval
array. Thus, shard pruning becomes less dependent on the
shard count.
2016-04-26 11:16:00 +03:00
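The constant-time lookup described in the fast shard pruning commit above can be illustrated with a sketch. It rests on an assumption the commit message does not state: that the shards split the signed 32-bit hash space into equally sized, contiguous ranges. The helper names are hypothetical and this is not the multi_copy API.

```python
from typing import List, Tuple

INT32_MIN = -2**31
INT32_MAX = 2**31 - 1
HASH_SPACE = 2**32


def build_sorted_intervals(shard_count: int) -> List[Tuple[int, int]]:
    """Build (min, max) hash ranges sorted by min value; the last shard
    absorbs the remainder when the space does not divide evenly."""
    increment = HASH_SPACE // shard_count
    intervals = []
    for i in range(shard_count):
        low = INT32_MIN + i * increment
        high = INT32_MAX if i == shard_count - 1 else low + increment - 1
        intervals.append((low, high))
    return intervals


def shard_index(hashed_value: int, shard_count: int) -> int:
    """Compute the array index directly instead of scanning all shards."""
    increment = HASH_SPACE // shard_count
    index = (hashed_value - INT32_MIN) // increment
    return min(index, shard_count - 1)  # clamp into the last shard
```

The index computation costs the same for 4 shards as for 4,000, whereas a linear scan grows with the shard count.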
Brian Cloutier c6135fe0dc Support count(distinct) on hash partitioned tables
Also add test to ensure we get the same results when running
count(distinct) on range and hash partitioned tables.
2016-04-20 04:54:07 -07:00
eren 33b96dfb7f FIX Warning Message in multi_logical_optimizer.c
With #426, some new warning messages started to arise, because of
cross assignment of Node and Expr pointers. This change fixes the
warnings with type casts.
2016-04-20 11:33:29 +03:00
eren f77cff3fb6 Fix JOINs on varchar columns with subquery pushdown
Fixes #379

Inside PostgreSQL, the VAR struct of a varchar column is wrapped in a RELABELTYPE
struct, and the IsPartitionColumnRecursive function considers only VAR types, so it
returned false for varchar.

This change adds a strip_implicit_coercions() call on the columnExpression in the
IsPartitionColumnRecursive function so that implicit coercions like
RELABELTYPE are stripped to VAR.
2016-04-19 21:55:50 -06:00
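The effect of stripping implicit coercions before the partition column check, as described in the commit above, can be modeled with a toy expression tree. The classes below are simplified stand-ins, not PostgreSQL's node structures, and `strip_implicit` and `is_partition_column` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Var:
    """Toy column reference."""
    name: str


@dataclass
class RelabelType:
    """Toy wrapper that relabels its argument's type (e.g. varchar as text)."""
    arg: "Expr"
    implicit: bool = True


Expr = Union[Var, RelabelType]


def strip_implicit(expr: Expr) -> Expr:
    """Peel off implicit relabel wrappers until something else remains."""
    while isinstance(expr, RelabelType) and expr.implicit:
        expr = expr.arg
    return expr


def is_partition_column(expr: Expr, partition_column: str) -> bool:
    """Check the stripped expression, so a wrapped varchar column still matches."""
    stripped = strip_implicit(expr)
    return isinstance(stripped, Var) and stripped.name == partition_column
```

Without the stripping step, the `isinstance(..., Var)` test would fail for the wrapped column, mirroring the false result the commit describes.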
eren f53057c7dd Fix Shard Pruning Problem With Subqueries on VARCHAR Partition Columns
Fixes #375

Prior to this change, shard pruning couldn't be done if:
- Table is hash-distributed
- Partition column is VARCHAR
- Query to be pruned is a subquery

There were two problems:
- A bug in left-side/right-side checks for the partition column
- We were not considering relabeled types (VARCHAR was relabeled as TEXT)
2016-04-19 21:55:50 -06:00
Jason Petersen a95c9da472 Update copyright dates
Fixed configure variable and updated all end dates to 2016.
2016-03-23 17:14:37 -06:00
Metin Doslu 87ff558c1c Add check for count distinct on single table subqueries
Fixes #314
2016-02-17 14:24:07 +02:00
Jason Petersen 166f96bb83 First formatting attempt
Skipped csql, ruleutils, readfuncs, and functions obviously copied from
PostgreSQL. Seeing how this looks, then continuing.
2016-02-15 23:29:32 -07:00
Onder Kalaci 136306a1fe Initial commit of Citus 5.0
2016-02-11 04:05:32 +02:00