Commit Graph

996 Commits (8f98d81f48268e153a594a7e8b966181879ab0da)

Author SHA1 Message Date
Marco Slot 8f98d81f48 Don't change query tree of DDL commands 2017-05-04 21:34:28 +02:00
Jason Petersen 5a242bf9b5 Fix CREATE SEQUENCE generation bug
Apparently we've had a typo all this time causing us to pass the cache
value for the start value.
2017-05-03 21:47:06 -07:00
Önder Kalacı 260989677c Skip exhaustive test in CoPartitionedTables() if declared colocated (#1376)
That's considerably cheaper.
2017-05-02 03:33:21 +03:00
Önder Kalacı 2ef11d6b25 Subqueries in where -- updated (#1372)
* Support for subqueries in WHERE clause

This commit enables subqueries in WHERE clause to be pushed down
by the subquery pushdown logic.

The support covers:
  - Correlated subqueries with IN, NOT IN, EXISTS, NOT EXISTS,
    operator expressions such as (>, <, =, ALL, ANY etc.)
  - Non-correlated subqueries with (partition_key) IN (SELECT partition_key ..)
    (partition_key) =ANY (SELECT partition_key ...)

Note that this commit heavily utilizes the attribute equivalence logic introduced
in the 1cb6a34ba8. In general, this commit mostly
adjusts the logical planner not to error out on the subqueries in WHERE clause.

* Improve error checks for subquery pushdown and INSERT ... SELECT

Since we allow subqueries in WHERE clause with the previous commit,
we should apply the same limitations to those subqueries.

With this commit, we do not iterate on each subquery one by one.
Instead, we extract all the subqueries and apply the checks directly
on those subqueries. The aim of this change is to (i) Simplify the
code (ii) Make it close to the checks on INSERT .. SELECT code base.

* Extend checks for unresolved paramaters to include SubLinks

With the presence of subqueries in where clause (i.e., SubPlans on the
query) the existing way for checking unresolved parameters fail. The
reason is that the parameters for SubPlans are kept on the parent plan not
on the query itself (see primnodes.h for the details).

With this commit, instead of checking SubPlans on the modified plans
we start to use originalQuery, where SubLinks represent the subqueries
in where clause. The unresolved parameters can be found on the SubLinks.

* Apply code-review feedback

* Remove unnecessary copying of shard interval list

This commit removes unnecessary copying of shard interval list. Note
that there are no copyObject function implemented for shard intervals.
2017-05-01 17:20:21 +03:00
Marco Slot 79b55fcf88 Merge pull request #1357 from citusdata/fix_gitignore
Add missing regression test output files to .gitignore
2017-04-28 19:13:26 -07:00
Marco Slot 552ccdaa07 Add missing regression test output files to .gitignore 2017-04-29 03:56:14 +02:00
Marco Slot 99c51803a7 Merge pull request #1265 from citusdata/truncate_propagation
Honour enable_ddl_propagation in truncate trigger
2017-04-28 18:47:52 -07:00
Marco Slot d936d535df Honour enable_ddl_propagation in truncate trigger 2017-04-29 03:32:52 +02:00
Brian Cloutier e94879ae7c Fix crash in isolation tests
- There was a crash when the table a shardid belonged to changed during
  a session. Instead of crashing (a failed assert) we now throw an error
- Update the isolation test which was crashing to no longer exercise
  that code path
- Add a regression test to check that the error is thrown
2017-04-29 04:25:26 +03:00
Önder Kalacı 76ea17a9bc Subquery pushdown - main branch (#1323)
* Enabling physical planner for subquery pushdown changes

This commit applies the logic that exists in INSERT .. SELECT
planning to the subquery pushdown changes.

The main algorithm is followed as :
   - pick an anchor relation (i.e., target relation)
   - per each target shard interval
       - add the target shard interval's shard range
         as a restriction to the relations (if all relations
         joined on the partition keys)
        - Check whether the query is router plannable per
          target shard interval.
        - If router plannable, create a task

* Add union support within the JOINS

This commit adds support for UNION/UNION ALL subqueries that are
in the following form:

     .... (Q1 UNION Q2 UNION ...) as union_query JOIN (QN) ...

In other words, we currently do NOT support the queries that are
in the following form where union query is not JOINed with
other relations/subqueries :

     .... (Q1 UNION Q2 UNION ...) as union_query ....

* Subquery pushdown planner uses original query

With this commit, we change the input to the logical planner for
subquery pushdown. Before this commit, the planner was relying
on the query tree that is transformed by the postgresql planner.
After this commit, the planner uses the original query. The main
motivation behind this change is the simplify deparsing of
subqueries.

* Enable top level subquery join queries

This work enables
- Top level subquery joins
- Joins between subqueries and relations
- Joins involving more than 2 range table entries

A new regression test file is added to reflect enabled test cases

* Add top level union support

This commit adds support for UNION/UNION ALL subqueries that are
in the following form:

     .... (Q1 UNION Q2 UNION ...) as union_query ....

In other words, Citus supports allow top level
unions being wrapped into aggregations queries
and/or simple projection queries that only selects
some fields from the lower level queries.

* Disallow subqueries without a relation in the range table list for subquery pushdown

This commit disallows subqueries without relation in the range table
list. This commit is only applied for subquery pushdown. In other words,
we do not add this limitation for single table re-partition subqueries.

The reasoning behind this limitation is that if we allow pushing down
such queries, the result would include (shardCount * expectedResults)
where in a non distributed world the result would be (expectedResult)
only.

* Disallow subqueries without a relation in the range table list for INSERT .. SELECT

This commit disallows subqueries without relation in the range table
list. This commit is only applied for INSERT.. SELECT queries.

The reasoning behind this limitation is that if we allow pushing down
such queries, the result would include (shardCount * expectedResults)
where in a non distributed world the result would be (expectedResult)
only.

* Change behaviour of subquery pushdown flag (#1315)

This commit changes the behaviour of the citus.subquery_pushdown flag.
Before this commit, the flag is used to enable subquery pushdown logic. But,
with this commit, that behaviour is enabled by default. In other words, the
flag is now useless. We prefer to keep the flag since we don't want to break
the backward compatibility. Also, we may consider using that flag for other
purposes in the next commits.

* Require subquery_pushdown when limit is used in subquery

Using limit in subqueries may cause returning incorrect
results. Therefore we allow limits in subqueries only
if user explicitly set subquery_pushdown flag.

* Evaluate expressions on the LIMIT clause (#1333)

Subquery pushdown uses orignal query, the LIMIT and OFFSET clauses
are not evaluated. However, logical optimizer expects these expressions
are already evaluated by the standard planner. This commit manually
evaluates the functions on the logical planner for subquery pushdown.

* Better format subquery regression tests (#1340)

* Style fix for subquery pushdown regression tests

With this commit we intented a more consistent style for the
regression tests we've added in the
  - multi_subquery_union.sql
  - multi_subquery_complex_queries.sql
  - multi_subquery_behavioral_analytics.sql

* Enable the tests that are temporarily commented

This commit enables some of the regression tests that were commented
out until all the development is done.

* Fix merge conflicts (#1347)

 - Update regression tests to meet the changes in the regression
   test output.
 - Replace Ifs with Asserts given that the check is already done
 - Update shard pruning outputs

* Add view regression tests for increased subquery coverage (#1348)

- joins between views and tables
- joins between views
- union/union all queries involving views
- views with limit
- explain queries with view

* Improve btree operators for the subquery tests

This commit adds the missing comprasion for subquery composite key
btree comparator.
2017-04-29 04:09:48 +03:00
Andres Freund e55408bf42 Merge pull request #1369 from citusdata/featurefix/better-range-pruning
Improve / Fix range pruning
2017-04-28 17:45:58 -07:00
Andres Freund 4bf6b8cdfa Perform range based pruning if equality pruning has survivor.
We previously dismissed this as unimportant, but it turns out to be
very useful for the upcoming subquery pushdown, where a user might
specify an equality constraint in a subquery, and the subquery
pushdown machinery adds >= and <= restrictions on the shard boundary.
Previously the latter restriction was ignored.
2017-04-28 17:35:18 -07:00
Andres Freund 042020eabf Use stricter qual for pruning if both >/< and >=/<= are present.
Previously, if both =< and < (>= and < respectively) were specified,
we always used the latter restriction.  Instead use the stricter one.
2017-04-28 17:35:18 -07:00
Marco Slot 6d53c2e79c Merge pull request #1368 from citusdata/fix_get_live_node_count
Fix list length lookup in WorkerGetLiveNodeCount
2017-04-28 17:26:25 -07:00
Marco Slot 053f10d91c Fix list length lookup in WorkerGetLiveNodeCount 2017-04-29 02:13:20 +02:00
Marco Slot c6392bd98b Merge pull request #1349 from citusdata/fix_check_vanilla
Fix check-vanilla tests
2017-04-28 17:10:48 -07:00
Burak Yucesoy edd69310fd Fix check-vanilla tests
It semms that GEQO optimizations, when it is set to on, create their own memory context
and free it after when it is no longer necessary. In join multi_join_restriction_hook
we allocate our variables in the CurrentMemoryContext, which is GEQO's memory context
if it is active. To prevent deallocation of our variables when GEQO's memory context is
freed, we started to allocate memory fo these variables in separate MemoryContext.
2017-04-29 01:55:18 +02:00
Marco Slot 7d77176842 Merge pull request #1365 from citusdata/fix_size
Check whether relation ID exists in DistributedTableSize
2017-04-28 16:51:41 -07:00
Marco Slot 97d36b7dfe Check whether relation ID exists in citus_relation_size 2017-04-29 01:39:39 +02:00
Andres Freund c7b4f170fd Merge pull request #1331 from citusdata/feature/faster-pruning
Faster Shard Pruning Implementation
2017-04-28 15:01:41 -07:00
Andres Freund f6ef7f2c03 Faster shard pruning.
So far citus used postgres' predicate proofing logic for shard
pruning, except for INSERT and COPY which were already optimized for
speed.  That turns out to be too slow:
* Shard pruning for SELECTs is currently O(#shards), because
  PruneShardList calls predicate_refuted_by() for every
  shard. Obviously using an O(N) type algorithm for general pruning
  isn't good.
* predicate_refuted_by() is quite expensive on its own right. That's
  primarily because it's optimized for doing a single refutation
  proof, rather than performing the same proof over and over.
* predicate_refuted_by() does not keep persistent state (see 2.) for
  function calls, which means that a lot of syscache lookups will be
  performed. That's particularly bad if the partitioning key is a
  composite key, because without a persistent FunctionCallInfo
  record_cmp() has to repeatedly look-up the type definition of the
  composite key. That's quite expensive.

Thus replace this with custom-code that works in two phases:
1) Search restrictions for constraints that can be pruned upon
2) Use those restrictions to search for matching shards in the most
   efficient manner available:
   a) Binary search / Hash Lookup in case of hash partitioned tables
   b) Binary search for equal clauses in case of range or append
      tables without overlapping shards.
   c) Binary search for inequality clauses, searching for both lower
      and upper boundaries, again in case of range or append
      tables without overlapping shards.
   d) exhaustive search testing each ShardInterval

My measurements suggest that we are considerably, often orders of
magnitude, faster than the previous solution, even if we have to fall
back to exhaustive pruning.
2017-04-28 14:40:41 -07:00
Andres Freund 2013090a77 Add DistTableCacheEntry->hasOverlappingShardInterval.
This determines whether it's possible to perform binary search on
sortedShardIntervalArray or not.  If e.g. two shards have overlapping
ranges, that'd be prohibitive.

That'll be useful in later commit introducing faster shard pruning.
2017-04-28 14:40:38 -07:00
Andres Freund 15d427f931 Add DistTableCacheEntry->shardValueCompareFunction.
That's useful when comparing values a hash-partitioned table is
filtered by.  The existing shardIntervalCompareFunction is about
comparing hashed values, not unhashed ones.

The added btree opclass function is so we can get a comparator
back. This should be changed much more widely, but is not necessary so
far.
2017-04-28 14:40:38 -07:00
Andres Freund f3172e9719 Build DistTableCacheEntry->shardIntervalCompareFunction even for 0 shards.
Previously we, unnecessarily, used a the first shard's type
information to to look up the comparison function.  But that
information is already available, so use it.  That's helpful because
we sometimes want to access the comparator function even if there's no
shards.
2017-04-28 14:40:38 -07:00
Andres Freund 99642306ed Fix: Make FindShardIntervalIndex robust against 0 shards. 2017-04-28 14:40:38 -07:00
Metin Döşlü 669b5e243f Merge pull request #1361 from citusdata/explain_with_savepoint
Send explain queries with savepoints
2017-04-28 13:43:27 -07:00
Metin Doslu d411892fe6 Send explain queries with savepoints
With this commit, we started to send explain queries within a savepoint. After
running explain query, we rollback to savepoint. This saves us from side effects
of EXPLAIN ANALYZE on DML queries.
2017-04-28 12:13:48 -07:00
Jason Petersen a5ad70379b Merge pull request #1353 from citusdata/fix_copy_crasher
Refactor COPY to not directly use cache entry

cr: @marcocitus
2017-04-27 16:06:11 -06:00
Jason Petersen 1c353e68aa Remove FastShardPruning method
With the other simplifications, it doesn't make sense to keep around.
2017-04-27 13:32:36 -06:00
Jason Petersen 06497d74f5 Refactor FindShardInterval to use cacheEntry
All callers fetch a cache entry and extract/compute arguments for the
eventual FindShardInterval call, so it makes more sense to refactor
into that function itself; this solves the use-after-free bug, too.
2017-04-27 13:32:36 -06:00
Andres Freund 9cefa97972 Merge pull request #1351 from citusdata/feature/remove_pruning_debug
Remove Pruning Debug Output
2017-04-26 11:58:52 -07:00
Andres Freund 4fe14bdeda Some cleanup in multi_subquery test.
Remove trailing whitespace and use of EXPLAIN instead of
EXPLAIN (COSTS OFF).
2017-04-26 11:33:56 -07:00
Andres Freund f064c33d5c Add back pruning coverage lost in last commit.
Because we can't rely on the debuggin message anymore, add a bunch of
explain statements that roughly fulfill the same purpose.
2017-04-26 11:33:56 -07:00
Andres Freund 5b389eb6d7 Boring regression test output adjustments.
Soon shard pruning will be optimized not to generally work linearly
anymore.  Thus we can't print the pruned shard intervals as currently
done anymore.

The current printing of shard ids also prevents us from running tests
in parallel, as otherwise shard ids aren't linearly numbered.
2017-04-26 11:33:56 -07:00
Andres Freund 8923fe7f54 Merge pull request #1354 from citusdata/feature/faster-copartitioned-check
Skip exhaustive test in CoPartitionedTables() if declared colocated.
2017-04-26 11:33:31 -07:00
Andres Freund 9e4ec991d8 Skip exhaustive test in CoPartitionedTables() if declared colocated.
That's considerably cheaper.
2017-04-26 11:19:17 -07:00
Andres Freund c34e357885 Merge pull request #1350 from citusdata/fix/vpath-builds
Fix VPATH builds broken in 087d8427e3.
2017-04-25 16:25:54 -07:00
Andres Freund 5524d389cb Fix VPATH builds broken in 087d8427e3.
1) Generated files reside in the build directory, not the source
   directory.
2) As a generated file is now included in the build, add it to the
   include path (-I)
2017-04-25 16:04:42 -07:00
Marco Slot 1b4ebd490d Only process error if not NULL in StoreErrorMessage 2017-04-21 17:01:01 +02:00
Marco Slot 326f8d9d61 Use right sizeof in UpdateRelationColocationGroup 2017-04-21 16:37:09 +02:00
Burak Yücesoy e291887227 Merge pull request #1294 from citusdata/fix_test_outputs_for_valgrind
Prepare for valgrind automation
2017-04-21 05:51:14 -08:00
Burak Yucesoy a35d0cd8af Configure valgrind command line arguments 2017-04-21 16:30:12 +03:00
Burak Yucesoy 9312ef8bcf Stabilize test outputs 2017-04-21 16:08:52 +03:00
Eren Basak 71d99b72ce Add support for proper valgrind tests
This change allows valgrind tests (`make check-multi-vg`) to be
run seamlessly without test output errors and timeout problems.
2017-04-21 16:08:52 +03:00
Marco Slot 384f32b191 Merge pull request #1302 from citusdata/serial_partition_column
Support expressions in the partition column in INSERTs
2017-04-21 14:18:13 +02:00
Marco Slot 7d1f7b8923 Support expressions in the partition column in INSERTs 2017-04-21 14:05:52 +02:00
Burak Velioglu 41bb84c9c0 Merge pull request #1292 from citusdata/alter_add_constraint_m
Alter Table Add Constraint
2017-04-20 15:33:02 +03:00
velioglu a26edd2249 Implement ALTER TABLE ADD CONSTRAINT command 2017-04-20 15:02:33 +03:00
Burak Velioglu 5170731848 Merge pull request #1316 from citusdata/add_guc_for_cross_shard
Log cross-shard queries
2017-04-20 14:08:21 +03:00
velioglu 5b3e47de7a Log message of across shard queries according to the log level 2017-04-20 12:24:46 +03:00