Commit Graph

74 Commits (62e7bdbdd67a64002877d73746c974373e4d3be7)

Author SHA1 Message Date
Onder Kalaci 16425e9054 Add fast shard pruning path for INSERTs on hash partitioned tables
This commit adds a fast shard pruning path for INSERTs on
hash-partitioned tables. The rationale behind this change is
that if there exists a sorted shard interval array, a single
index lookup on the array allows us to find the corresponding
shard interval. As mentioned above, we need a sorted
(wrt shardminvalue) shard interval array. Thus, this commit
updates shardIntervalArray to sortedShardIntervalArray in the
metadata cache. Then uses the low-level API that is defined in
multi_copy to handle the fast shard pruning.

The performance impact of this change is more apparent as more
shards exist for a distributed table. Previous implementation
was relying on linear search through the shard intervals. However,
this commit relies on constant lookup time on shard interval
array. Thus, the shard pruning becomes less dependent on the
shard count.
2016-04-26 11:16:00 +03:00
Murat Tuncer 2bc96fabe5 Add dynamic executor selection
- non-router plannable queries can be executed
  by router executor if they satisfy the criteria
- router executor is removed from configuration,
  now task executor can not be set to router
- removed some tests that error out for router executor
2016-04-21 09:15:33 +03:00
Murat Tuncer 68cbf8a482 Add router plannable check and router planning logic
for single shard select queries
2016-04-21 09:15:33 +03:00
Brian Cloutier c6135fe0dc Support count(distinct) on hash partitioned tables
Also add test to ensure we get the same results when running
count(distinct) on range and hash partitioned tables.
2016-04-20 04:54:07 -07:00
eren 33b96dfb7f FIX Warning Message in multi_logical_optimizer.c
With #426, some new warning messages started to arise, because of
cross assignment of Node and Expr pointers. This change fixes the
warnings with type casts.
2016-04-20 11:33:29 +03:00
eren f77cff3fb6 Fix JOINs on varchar columns with subquery pushdown
Fixes #379

Varchar VAR struct is wrapped in RELABELTYPE struct inside PostgreSQL code and
IsPartitionColumnRecursive function considers only VAR types so returning false
for varchar.

This change adds strip_implicit_coercions() call to the columnExpression in
IsPartitionColumnRecursive function so that we get rid of implicit coercions like
RELABELTYPE are stripped to VAR.
2016-04-19 21:55:50 -06:00
eren e786cbed0f Fix Join Problem With VARCHAR Partition Columns
This change fixes the problem with joins with VARCHAR columns. Prior to
this change, when we tried to do large table joins on varchar columns, we got
an error of the form:
ERROR: cannot perform local joins that involve expressions
DETAIL: local joins can be performed between columns only.

This is because we have a check in CheckJoinBetweenColumns() which requires the
join clause to have only 'Var' nodes (i.e. columns). Postgres adds a relabel t
ype cast to cast the varchar to text; hence the type of the node is not T_Var
and the join fails.

The fix involves calling strip_implicit_coercions() to the left and right
arguments so that RELABELTYPE is stripped to VAR.

Fixes #76.
2016-04-19 21:55:50 -06:00
eren f53057c7dd Fix Shard Pruning Problem With Subqueries on VARCHAR Partition Columns
Fixes #375

Prior to this change, shard pruning couldn't be done if:
- Table is hash-distributed
- Partition column of is VARCHAR
- Query to be pruned is a subquery

There were two problems:
- A bug in left-side/right-side checks for the partition column
- We were not considering relabeled types (VARCHAR was relabeled as TEXT)
2016-04-19 21:55:50 -06:00
Andres Freund c53fcd8042 Remove wholly unused variable.
This avoids a -Wunused warning.
2016-04-19 12:31:13 -06:00
Andres Freund 926534bbc2 Annotate variables only used for asserts with PG_USED_FOR_ASSERTS_ONLY.
This avoids '-Wunused-but-set-variable' type warnings when compiling
without assertions, e.g. against a system postgres.
2016-04-19 12:31:12 -06:00
Jason Petersen 37d7f98f50 Add clarifying comment in HashableClauseMutator
While reading this code last week, it appeared as though there was no
place we ensured that the partition clause actually used equality ops.
As such, I was worried that we might transform a clause such as id < 5
into a constraint like hash(id) = hash(5) when doing shard pruning. The
relevant code seemed to just ensure:

  1. The node is an OpExpr
  2. With a related hash function
  3. It compares the partition column
  4. Against a constant

A superficial reading implied we didn't actually make sure the original
op was equality-related, but it turns out the hash lookup function DOES
ensure that for us. So I added a comment.
2016-04-19 12:21:11 -06:00
Onder Kalaci 2eabf3fcfa Allow all types of nodes in the WHERE clauses
This change removes the whitelisting check on the WHERE clauses. Note that, before
this change, citus was already allowing all types of nodes with the following
format (i.e., wrap with a boolean test):

  * SELECT col FROM table WHERE (ANY EXPRESSION) is TRUE;

Thus, this change is mostly useful for allowing the expressions in the WHERE clause
directly and avoiding "unsupport clause type" errors.
2016-03-30 16:39:58 +03:00
eren 3a9a01557f Fix spurious NOTICE messages with ANY/ALL
Fixes issue #258

Prior to this change, Citus gives a deceptive NOTICE message when a query
including ANY or ALL on a non-partition column is issued on a hash
partitioned table.

Let the github_events table be hash-distributed on repo_id column. Then,
issuing this query:
    SELECT count(*) FROM github_events WHERE event_id = ANY ('{1,2,3}')

Gives this message:
    NOTICE: cannot use shard pruning with ANY (array expression)
    HINT: Consider rewriting the expression with OR clauses.

Note that since event_id is not the partition column, shard pruning would
not be applied in any case. However, the NOTICE message would be valid
and be given if the ANY clause would have been applied on repo_id column.

Reviewer: Murat Tuncer
2016-03-25 14:30:02 +02:00
Jason Petersen a95c9da472 Update copyright dates
Fixed configure variable and updated all end dates to 2016.
2016-03-23 17:14:37 -06:00
Murat Tuncer 00b10e5a93 Merge from master branch into feature/citusdb-to-citus 2016-02-17 14:49:01 +02:00
Metin Doslu 87ff558c1c Add check for count distinct on single table subqueries
Fixes #314
2016-02-17 14:24:07 +02:00
Jason Petersen 166f96bb83 First formatting attempt
Skipped csql, ruleutils, readfuncs, and functions obviously copied from
PostgreSQL. Seeing how this looks, then continuing.
2016-02-15 23:29:32 -07:00
Murat Tuncer c1d377b7d2 Changed product name to citus
All citusdb references in
- extension, binary names
- file headers
- all configuration name prefixes
- error/warning messages
- some functions names
- regression tests

are changed to be citus.
2016-02-15 16:04:31 +02:00
Önder Kalacı 2c51168576 Merge pull request #332 from citusdata/bugfix/memory_context_leak
Remove unnecessary memory context switch on the planner
2016-02-12 11:13:12 -08:00
Onder Kalaci 0a6839e544 Perform distributed planning in the calling memory context
Previously we used, for historical reasons, MessageContext.
That is problematic if a single message from the client
causes a lot of statements to be planned. E.g. for the
copy_to_distributed_table script one insert statement
is planned for each row inserted via COPY, and only freed
when COPY has finished.
2016-02-12 20:50:40 +02:00
Jason Petersen 05bb408865 Merge pull request #331 from citusdata/feature-permit_dml_to_append_tables#321
Allow DML commands on append-partitioned tables

cr: @lithp
2016-02-12 11:24:06 -07:00
Jason Petersen ef7206348d Allow DML commands on append-partitioned tables
This entirely removes any restriction on the type of partitioning
during DML planning and execution. Though there aren't actually any
technical limitations preventing DML commands against append- (or even
range-) partitioned tables, we had initially forbidden this, as any
future stage operation could cause shards to overlap, banning all
subsequent DML operations to partition values contained within more
than one shards. This ended up mostly restricting us, so we're now
removing that restriction.
2016-02-11 16:09:35 -07:00
Jason Petersen bbda64aaa1 Handle hash-partitioned aliased data types
When two data types have the same binary representation, PostgreSQL may
add an implicit coercion between them by wrapping a node in a relabel
type. This wrapper signals that the wrapped value is completely binary
compatible with the designated "final type" of the relabel node. As an
example, the varchar type is often relabeled to text, since functions
provided for use with text (comparisons, hashes, etc.) are completely
compatible with varchar as well.

The hash-partitioned codepath contains functions that verify queries
actually contain an equality constraint on the partition column, but
those functions expect such constraints to be comparison operations
between a Var and Const. The RelabelType wrapper node causes these
functions to always return false, which bypasses shard pruning.
2016-02-11 13:50:43 -07:00
Onder Kalaci 136306a1fe Initial commit of Citus 5.0 2016-02-11 04:05:32 +02:00