Commit Graph

287 Commits (601e9431dc30ada92815c70c2339082f997f7583)

Author SHA1 Message Date
Brian Cloutier 5166319f0e Add a test for upgrading shard placements 2017-07-12 14:18:27 +02:00
Brian Cloutier 86ce70a220 Rename pg_dist_shard_placement -> pg_dist_placement
Comes with a few changes:

- Change the signature of some functions to accept groupid
  - InsertShardPlacementRow
  - DeleteShardPlacementRow
  - UpdateShardPlacementState

- NodeHasActiveShardPlacements returns true if the group the node is a
  part of has any active shard placements

- TupleToShardPlacement now returns ShardPlacements which have NULL
  nodeName and nodePort.

- Populate (nodeName, nodePort) when creating ShardPlacements
- Disallow removing a node if it contains any shard placements

- DeleteAllReferenceTablePlacementsFromNode matches based on group. This
  doesn't change behavior for now (while there is only one node per
  group), but means in the future callers should be careful about
  calling it on a secondary node, it'll delete placements on the primary.

- Create concept of a GroupShardPlacement, which represents an actual
  tuple in pg_dist_placement and is distinct from a ShardPlacement,
  which has been resolved to a specific node. In the future
  ShardPlacement should be renamed to NodeShardPlacement.

- Create some triggers which allow existing code to continue to insert
  into and update pg_dist_shard_placement as if it still existed.
2017-07-12 14:17:31 +02:00
Brian Cloutier 74511748c6 Remove functions created just for unit testing
These functions are holdovers from pg_shard and were created for unit
testing c-level functions (like InsertShardPlacementRow) which our
regression tests already test quite effectively. Removing because it
makes refactoring the signatures of those c-level functions
unnecessarily difficult.

- create_healthy_local_shard_placement_row
- update_shard_placement_row_state
- delete_shard_placement_row
2017-07-12 14:16:24 +02:00
Brian Cloutier 405bd91978 Ignore generated multi_behavioral_analytics_create_table test files 2017-07-12 14:16:24 +02:00
Marco Slot e0c960e248 Use consistent placement IDs in mulity_modyfing_xactstest 2017-07-12 14:16:23 +02:00
Marco Slot b7a935b282 Remove XactModificationLevel distinction between DML and multi-shard 2017-07-12 11:59:19 +02:00
Jason Petersen a7317564ea Add some test outputs to gitignore
These were bothering me.
2017-07-11 15:37:32 -06:00
Marco Slot e2091ae96a Handle implicit casts in prepared INSERTs 2017-07-06 16:17:35 +02:00
Marco Slot 1ed7976cb9 Change implementation of shard_name UDF to get schema-qualified shard name 2017-07-04 10:49:40 +03:00
Onder Kalaci dc9b63504e Add some utility functions for partitioned tables
This commit is intended to be a base for supporting declarative partitioning
on distributed tables. Here we add the following utility functions and their
unit tests:

  * Very basic functions including differnentiating partitioned tables and
    partitions, listing the partitions
  * Generating the PARTITION BY (expr) and adding this to the DDL events
    of partitioned tables
  * Ability to generate text representations of the ranges for partitions
  * Ability to generate the `ALTER TABLE parent_table ATTACH PARTITION
    partition_table FOR VALUES value_range`
  * Ability to apply add shard ids to the above command using
    `worker_apply_inter_shard_ddl_command()`
  * Ability to generate `ALTER TABLE parent_table DETACH PARTITION`
2017-06-28 09:39:55 +03:00
Andres Freund 1a728f747b Remove 9.5 regression test output files. 2017-06-26 12:17:46 -07:00
Jason Petersen 05aef7ce29 Support PostgreSQL 10 (#1379)
Adds support for PostgreSQL 10 by copying in the requisite ruleutils
and updating all API usages to conform with changes in PostgreSQL 10.
Most changes are fairly minor but they are numerous. One particular
obstacle was the change in \d behavior in PostgreSQL 10's psql; I had
to add SQL implementations (views, mostly) to mimic the pre-10 output.
2017-06-26 02:35:46 -06:00
Andres Freund 2e7e40dc66 Add some tests checking that maintenance daemon gets started.
The 2nd database one is a bit slow, but also shows something
important, so we might want to keep it?
2017-06-23 11:53:39 -07:00
Andres Freund 0df00a0955 Force cache invalidation machinery to be initialized earlier.
Previously it was not guaranteed that invalidations were registered
after creating the extension, only if the extension was used
afterwards.
2017-06-23 11:20:10 -07:00
Marco Slot d4295587aa Fix spuriously failing regression test 2017-06-23 10:06:15 +02:00
Marco Slot 5f84d8e0c4 Add weird column name to create_distributed_table test 2017-06-22 16:27:39 +02:00
Marco Slot f1c6c966f0 Execute INSERT..SELECT via coordinator if it cannot be pushed down
Add a second implementation of INSERT INTO distributed_table SELECT ... that is used if
the query cannot be pushed down. The basic idea is to execute the SELECT query separately
and pass the results into the distributed table using a CopyDestReceiver, which is also
used for COPY and create_distributed_table. When planning the SELECT, we go through
planner hooks again, which means the SELECT can also be a distributed query.

EXPLAIN is supported, but EXPLAIN ANALYZE is not because preventing double execution was
a lot more complicated in this case.
2017-06-22 15:46:30 +02:00
Jason Petersen ffaae84fa5 Don't call PostProcessUtility for local commands
It is intended only to aid in processing of distributed DDL commands,
but as written could execute during local CREATE INDEX CONCURRENTLY
commands.
2017-06-19 15:56:03 -06:00
Marco Slot a6eb469ff5 Add support for unlogged distributed tables 2017-06-14 13:50:00 +02:00
velioglu 57bb512069 Use placement connection to drop shards instead of node connection 2017-06-14 14:14:59 +03:00
Marco Slot 7396cbe9e2 Allow COPY after a multi-shard command
This change removes the XactModificationLevel check at the start of COPY
that was made redundant by consistently using GetPlacementConnection.
2017-06-09 13:54:58 +02:00
Jason Petersen 5353705dc3 Add ORDER clause to subquery test missing it 2017-06-08 18:30:14 -06:00
Jason Petersen d8083e5cb5 Remove tracked files from gitignore
Causes very hard-to-debug test failures.
2017-06-08 17:39:31 -06:00
Onder Kalaci c6629fd4e8 Improve subquery pushdown regression tests
- Use native postgres function for composite key btree functions
  - Move explain tests to multi_explain.sql (get rid of .out _0.out files)
  - Get rid of input/output files for multi_subquery.sql by moving table creations
  - Update some comments
2017-05-30 14:05:15 +03:00
Burak Yucesoy 4c04d3beb4 Add tests for version check 2017-05-24 17:39:25 +03:00
Önder Kalacı 9fbdbd183e Merge branch 'master' into better_comment_for_tests 2017-05-22 10:58:21 +03:00
Onder Kalaci 4a1730d85d Add comment to the regression test file to prevent any misunderstandings about
the usage of enable_router_execution GUC variable.
2017-05-22 10:39:32 +03:00
Burak Yucesoy 2448950665 Add tests for version checks 2017-05-22 09:53:29 +03:00
Jason Petersen 618102cda5 Bump extension and configure PACKAGE versions
Actually getting this done before the next dev cycle begins.
2017-05-17 15:25:30 -06:00
Jason Petersen 1b60dd71d9 Limit sequence SELECT to last_value
Unbounded column output differs by version.
2017-05-16 11:05:34 -06:00
Jason Petersen 8fe806db58 Suppress hash index warning
Irrelevant to the test.
2017-05-16 11:05:34 -06:00
Jason Petersen e3edf83b7c Change version-sensitive tests to handle '10'
Previously assumed period in version; this makes tests future-proof.
2017-05-16 11:05:34 -06:00
Jason Petersen 53e54c66fa Remove ALTER SEQUENCE from parallel groups
Removing these has no side effect, and in the (current) PostgreSQL 10,
an ERROR is printed during concurrent sequence modification.
2017-05-16 11:05:34 -06:00
Jason Petersen 8b0a4d8315 Add unambiguous ORDER BY clauses to many tests
Queries which do not specify an order may arbitrarily change output
across PostgreSQL versions.
2017-05-16 11:05:34 -06:00
Jason Petersen c51c17327d Rename very long test files
In addition to not actually providing much information, these names can
cause problems in PostgreSQL 10.
2017-05-16 11:05:33 -06:00
Burak Yucesoy 25d2cac8f6 Add tests for non-default schema owner 2017-05-15 16:49:37 +03:00
Önder Kalacı df20f7c5ed Add support for parametrized execution for subquery pushdown (#1356)
Distributed query planning for subquery pushdown is done on the original
query. This prevents the usage of external parameters on the execution.
To overcome this, we manually replace the parameters on the original
query.
2017-05-10 09:38:48 +03:00
Marco Slot 8f98d81f48 Don't change query tree of DDL commands 2017-05-04 21:34:28 +02:00
Önder Kalacı 2ef11d6b25 Subqueries in where -- updated (#1372)
* Support for subqueries in WHERE clause

This commit enables subqueries in WHERE clause to be pushed down
by the subquery pushdown logic.

The support covers:
  - Correlated subqueries with IN, NOT IN, EXISTS, NOT EXISTS,
    operator expressions such as (>, <, =, ALL, ANY etc.)
  - Non-correlated subqueries with (partition_key) IN (SELECT partition_key ..)
    (partition_key) =ANY (SELECT partition_key ...)

Note that this commit heavily utilizes the attribute equivalence logic introduced
in the 1cb6a34ba8. In general, this commit mostly
adjusts the logical planner not to error out on the subqueries in WHERE clause.

* Improve error checks for subquery pushdown and INSERT ... SELECT

Since we allow subqueries in WHERE clause with the previous commit,
we should apply the same limitations to those subqueries.

With this commit, we do not iterate on each subquery one by one.
Instead, we extract all the subqueries and apply the checks directly
on those subqueries. The aim of this change is to (i) Simplify the
code (ii) Make it close to the checks on INSERT .. SELECT code base.

* Extend checks for unresolved paramaters to include SubLinks

With the presence of subqueries in where clause (i.e., SubPlans on the
query) the existing way for checking unresolved parameters fail. The
reason is that the parameters for SubPlans are kept on the parent plan not
on the query itself (see primnodes.h for the details).

With this commit, instead of checking SubPlans on the modified plans
we start to use originalQuery, where SubLinks represent the subqueries
in where clause. The unresolved parameters can be found on the SubLinks.

* Apply code-review feedback

* Remove unnecessary copying of shard interval list

This commit removes unnecessary copying of shard interval list. Note
that there are no copyObject function implemented for shard intervals.
2017-05-01 17:20:21 +03:00
Marco Slot d936d535df Honour enable_ddl_propagation in truncate trigger 2017-04-29 03:32:52 +02:00
Brian Cloutier e94879ae7c Fix crash in isolation tests
- There was a crash when the table a shardid belonged to changed during
  a session. Instead of crashing (a failed assert) we now throw an error
- Update the isolation test which was crashing to no longer exercise
  that code path
- Add a regression test to check that the error is thrown
2017-04-29 04:25:26 +03:00
Önder Kalacı 76ea17a9bc Subquery pushdown - main branch (#1323)
* Enabling physical planner for subquery pushdown changes

This commit applies the logic that exists in INSERT .. SELECT
planning to the subquery pushdown changes.

The main algorithm is followed as :
   - pick an anchor relation (i.e., target relation)
   - per each target shard interval
       - add the target shard interval's shard range
         as a restriction to the relations (if all relations
         joined on the partition keys)
        - Check whether the query is router plannable per
          target shard interval.
        - If router plannable, create a task

* Add union support within the JOINS

This commit adds support for UNION/UNION ALL subqueries that are
in the following form:

     .... (Q1 UNION Q2 UNION ...) as union_query JOIN (QN) ...

In other words, we currently do NOT support the queries that are
in the following form where union query is not JOINed with
other relations/subqueries :

     .... (Q1 UNION Q2 UNION ...) as union_query ....

* Subquery pushdown planner uses original query

With this commit, we change the input to the logical planner for
subquery pushdown. Before this commit, the planner was relying
on the query tree that is transformed by the postgresql planner.
After this commit, the planner uses the original query. The main
motivation behind this change is the simplify deparsing of
subqueries.

* Enable top level subquery join queries

This work enables
- Top level subquery joins
- Joins between subqueries and relations
- Joins involving more than 2 range table entries

A new regression test file is added to reflect enabled test cases

* Add top level union support

This commit adds support for UNION/UNION ALL subqueries that are
in the following form:

     .... (Q1 UNION Q2 UNION ...) as union_query ....

In other words, Citus supports allow top level
unions being wrapped into aggregations queries
and/or simple projection queries that only selects
some fields from the lower level queries.

* Disallow subqueries without a relation in the range table list for subquery pushdown

This commit disallows subqueries without relation in the range table
list. This commit is only applied for subquery pushdown. In other words,
we do not add this limitation for single table re-partition subqueries.

The reasoning behind this limitation is that if we allow pushing down
such queries, the result would include (shardCount * expectedResults)
where in a non distributed world the result would be (expectedResult)
only.

* Disallow subqueries without a relation in the range table list for INSERT .. SELECT

This commit disallows subqueries without relation in the range table
list. This commit is only applied for INSERT.. SELECT queries.

The reasoning behind this limitation is that if we allow pushing down
such queries, the result would include (shardCount * expectedResults)
where in a non distributed world the result would be (expectedResult)
only.

* Change behaviour of subquery pushdown flag (#1315)

This commit changes the behaviour of the citus.subquery_pushdown flag.
Before this commit, the flag is used to enable subquery pushdown logic. But,
with this commit, that behaviour is enabled by default. In other words, the
flag is now useless. We prefer to keep the flag since we don't want to break
the backward compatibility. Also, we may consider using that flag for other
purposes in the next commits.

* Require subquery_pushdown when limit is used in subquery

Using limit in subqueries may cause returning incorrect
results. Therefore we allow limits in subqueries only
if user explicitly set subquery_pushdown flag.

* Evaluate expressions on the LIMIT clause (#1333)

Subquery pushdown uses orignal query, the LIMIT and OFFSET clauses
are not evaluated. However, logical optimizer expects these expressions
are already evaluated by the standard planner. This commit manually
evaluates the functions on the logical planner for subquery pushdown.

* Better format subquery regression tests (#1340)

* Style fix for subquery pushdown regression tests

With this commit we intented a more consistent style for the
regression tests we've added in the
  - multi_subquery_union.sql
  - multi_subquery_complex_queries.sql
  - multi_subquery_behavioral_analytics.sql

* Enable the tests that are temporarily commented

This commit enables some of the regression tests that were commented
out until all the development is done.

* Fix merge conflicts (#1347)

 - Update regression tests to meet the changes in the regression
   test output.
 - Replace Ifs with Asserts given that the check is already done
 - Update shard pruning outputs

* Add view regression tests for increased subquery coverage (#1348)

- joins between views and tables
- joins between views
- union/union all queries involving views
- views with limit
- explain queries with view

* Improve btree operators for the subquery tests

This commit adds the missing comprasion for subquery composite key
btree comparator.
2017-04-29 04:09:48 +03:00
Marco Slot 97d36b7dfe Check whether relation ID exists in citus_relation_size 2017-04-29 01:39:39 +02:00
Andres Freund f6ef7f2c03 Faster shard pruning.
So far citus used postgres' predicate proofing logic for shard
pruning, except for INSERT and COPY which were already optimized for
speed.  That turns out to be too slow:
* Shard pruning for SELECTs is currently O(#shards), because
  PruneShardList calls predicate_refuted_by() for every
  shard. Obviously using an O(N) type algorithm for general pruning
  isn't good.
* predicate_refuted_by() is quite expensive on its own right. That's
  primarily because it's optimized for doing a single refutation
  proof, rather than performing the same proof over and over.
* predicate_refuted_by() does not keep persistent state (see 2.) for
  function calls, which means that a lot of syscache lookups will be
  performed. That's particularly bad if the partitioning key is a
  composite key, because without a persistent FunctionCallInfo
  record_cmp() has to repeatedly look-up the type definition of the
  composite key. That's quite expensive.

Thus replace this with custom-code that works in two phases:
1) Search restrictions for constraints that can be pruned upon
2) Use those restrictions to search for matching shards in the most
   efficient manner available:
   a) Binary search / Hash Lookup in case of hash partitioned tables
   b) Binary search for equal clauses in case of range or append
      tables without overlapping shards.
   c) Binary search for inequality clauses, searching for both lower
      and upper boundaries, again in case of range or append
      tables without overlapping shards.
   d) exhaustive search testing each ShardInterval

My measurements suggest that we are considerably, often orders of
magnitude, faster than the previous solution, even if we have to fall
back to exhaustive pruning.
2017-04-28 14:40:41 -07:00
Andres Freund 15d427f931 Add DistTableCacheEntry->shardValueCompareFunction.
That's useful when comparing values a hash-partitioned table is
filtered by.  The existing shardIntervalCompareFunction is about
comparing hashed values, not unhashed ones.

The added btree opclass function is so we can get a comparator
back. This should be changed much more widely, but is not necessary so
far.
2017-04-28 14:40:38 -07:00
Metin Doslu d411892fe6 Send explain queries with savepoints
With this commit, we started to send explain queries within a savepoint. After
running explain query, we rollback to savepoint. This saves us from side effects
of EXPLAIN ANALYZE on DML queries.
2017-04-28 12:13:48 -07:00
Andres Freund f064c33d5c Add back pruning coverage lost in last commit.
Because we can't rely on the debuggin message anymore, add a bunch of
explain statements that roughly fulfill the same purpose.
2017-04-26 11:33:56 -07:00
Burak Yucesoy 9312ef8bcf Stabilize test outputs 2017-04-21 16:08:52 +03:00
Marco Slot 7d1f7b8923 Support expressions in the partition column in INSERTs 2017-04-21 14:05:52 +02:00
velioglu a26edd2249 Implement ALTER TABLE ADD CONSTRAINT command 2017-04-20 15:02:33 +03:00