Commit Graph

119 Commits (2e4029973c0a4189e61337611f0b789e64c7426f)

Author SHA1 Message Date
Nils Dijk 6cf4516fdb
fix \d change for indexes in pg11 2018-08-15 23:27:31 -06:00
Nils Dijk 2a9d47e1a6
fix pg11 tests 2018-08-15 23:27:31 -06:00
mehmet furkan şahin 6d0fbbace7 ALTER TABLE %s ADD COLUMN constraint check is added 2018-07-24 15:53:05 +03:00
Murat Tuncer f20258ef10 Expand count distinct support
We can now support more complex count distinct operations by
pulling necessary columns to coordinator and evalutating the
aggreage at coordinator.

It supports broad range of expression with the restriction that
the expression must contain a column.
2018-07-06 09:44:20 +03:00
Onder Kalaci ed47e4e6b9 Remove placementId from the ORDER BY to make results consistent 2018-05-11 17:04:50 +03:00
mehmet furkan şahin 785a86ed0a Tests are updated to use create_distributed_table 2018-05-10 11:18:59 +03:00
mehmet furkan şahin ef90122cd3 shard count for some of the tests are increased 2018-05-03 10:44:43 +03:00
velioglu 121ff39b26 Removes large_table_shard_count GUC 2018-04-29 10:34:50 +02:00
Brian Cloutier 0104790385 Fix hard-coded temp directory in multi_copy
/tmp does not exist on Windows, use :temp_dir instead
2018-04-17 15:01:22 -07:00
velioglu 82b2d21b0c Convert broadcast join to reference join
After this commit large_table_shard_count wont be used to
check whether broadcast join, which is renamed as reference
join, can be applied. Reference join can only be applied over
reference tables.
2018-04-13 12:58:14 +03:00
velioglu 72dfe4a289 Adds colocation check to local join 2018-04-04 22:49:27 +03:00
velioglu 698d585fb5 Remove broadcast join logic
After this change all the logic related to shard data fetch logic
will be removed. Planner won't plan any ShardFetchTask anymore.
Shard fetch related steps in real time executor and task-tracker
executor have been removed.
2018-03-30 11:45:19 +03:00
Brian Cloutier 9aff4384a1 Make tests platform independent
- Force all platforms to use the same collation
- Force all platforms to use the same locale
- Use /dev/null or NUL, depending on platform
- Use /tmp or %TEMP%, dpeending on platform
2018-03-27 14:18:48 -07:00
Onder Kalaci 4d4648aabd Change single shard mx test tables to reference tables 2018-02-26 13:28:24 +02:00
Murat Tuncer 901b543e20 Fix count distinct using field select on top level query
We were allowing count distict queries even if they were
not directly on columns if the query is grouped on
distribution column.

When performing these checks we were skipping subqueries
because they also perform this check in a more concise manner.
We relied on oid SUBQUERY_RELATION_ID (10000) to decide if
a given RTE relation id denotes a subquery, however, we also
use SUBQUERY_PUSHDOWN_RELATION_ID (10001) for some subqueries.

We skip both type of subqueries with this change.
2018-02-06 13:16:10 +03:00
Marco Slot 6f7c3bd73b Skip JSON validation on coordinator during COPY 2018-02-02 15:33:27 +01:00
Dimitri Fontaine c9760fbb64 Fix CREATE INDEX with storage options on distributed tables.
By sharing the implementation of the function AppendOptionListToString on
three call sites, we would expand an extra OPTIONS keyword in a create index
statement, and omit other bits of the specific syntax here.

This patch introduces an AppendStorageParametersToString() function that is
very similar to AppendOptionListToString() but handles WITH(a="foo",...)
syntax that is used in reloptions (aka Storage Parameters).

Fixes #1747.
2018-01-17 21:56:40 +01:00
Dimitri Fontaine 952da72c55 Implement ALTER TABLE|INDEX ... SET|RESET ().
PostgreSQL implements support for several relation kinds in a single
statement, such as in the AlterTableStmt case, which supports both tables
and indexes and more (see ATExecSetRelOptions in PostgreSQL source code file
src/backend/commands/tablecmds.c for an example of that).

As a consequence, this patch implements support for setting and resetting
storage parameters on both relation kinds.
2018-01-17 21:56:40 +01:00
Dimitri Fontaine 17266e3301 Implement ALTER INDEX ... RENAME TO ...
The command is now distributed among the shards when the table is
distributed. To that effect, we fill in the DDLJob's targetRelationId with
the OID of the table for which the index is defined, rather than the OID of
the index itself.
2018-01-17 21:56:40 +01:00
Dimitri Fontaine e010238280 Implement ALTER TABLE ... RENAME TO ...
The implementation was already mostly in place, but the code was protected
by a principled check against the operation. Turns out there's a nasty
concurrency bug though with long identifier names, much as in #1664.

To prevent deadlocks from happening, we could either review the DDL
transaction management in shards and placements, or we can simply reject
names with (NAMEDATALEN - 1) chars or more — that's because of the
PostgreSQL array types being created with a one-char prefix: '_'.
2018-01-11 13:21:24 +01:00
Marco Slot a64f0060ba Reduce the frequency of FinishConnectionIO calls during COPY (#1864) 2017-12-14 13:21:59 -05:00
Marco Slot a9933deac6 Make real time executor work in transactions 2017-11-30 09:59:32 +03:00
Marco Slot f4ceea5a3d Enable 2PC by default 2017-11-22 11:26:58 +01:00
mehmet furkan şahin 34709c2a16 Regression tests parallelization PART-1 2017-11-20 18:03:37 +03:00
mehmet furkan şahin 314fc09d90 regression test shard_count is changed from 32 to 4 2017-11-20 12:47:49 +03:00
Marco Slot 89eb833375 Use citus.next_shard_id where practical in regression tests 2017-11-15 10:12:05 +01:00
Murat Tuncer 2332678793
Fix intermittent test failures on count distinct tests (#1761)
Added analyze to test which seem to force planner to use index.
2017-11-06 10:58:46 +02:00
Murat Tuncer e16805215d
Support count(distinct) for non-partition columns (#1692)
Expands count distinct coverage by allowing more cases. We used to support
count distinct only if we can push down distinct aggregate to worker query
i.e. the count distinct clause was on the partition column of the table,
or there was a grouping on the partition column.

Now we can support
- non-partition columns, with or without grouping on partition column
- partition, and non partition column in the same query
- having clause
- single table subqueries
- insert into select queries
- join queries where count distinct is on partition, or non-partition column
- filters on count distinct clauses (extends existing support)

We first try to push down aggregate to worker query (original case), if we
can't then we modify worker query to return distinct columns to coordinator
node. We do that by adding distinct column targets to group by clauses. Then
we perform count distinct operation on the coordinator node.

This work should reduce the cases where HLL is used as it can address anything
that HLL can. However, if we start having performance issues due to very large
number rows, then we can recommend hll use.
2017-10-30 13:12:24 +02:00
mehmet furkan şahin 61ae33dc7f ALTER TABLE .. REPLICA IDENTITY support is implemented 2017-10-26 13:44:28 +03:00
velioglu 0b5db5d826 Support multi shard update/delete queries 2017-10-25 15:52:38 +03:00
Onder Kalaci 498ac80d8b Add window function support for SUBQUERY PUSHDOWN and INSERT INTO SELECT
This commit provides the support for window functions in subquery and insert
into select queries. Note that our support for window functions is still limited
because it must have a partition by clause on the distribution key. This commit
makes changes in the files insert_select_planner and multi_logical_planner. The
required tests are also added with files multi_subquery_window_functions.out
and multi_insert_select_window.out.
2017-10-04 15:33:07 +03:00
Marco Slot 24915779d1 Remove separate citus.binary_worker_copy_format regression tests 2017-10-03 17:44:50 +02:00
Onder Kalaci a5b66912d4 Expand reference table support in subquery pushdown
With this commit, we relax the restrictions put on the reference
tables with subquery pushdown.

We did three notable improvements:

1) Relax equi-join restrictions

 Previously, we always expected that the non-reference tables are
 equi joined with reference tables on the partition key of the
 non-reference table.

 With this commit, we allow any column of non-reference tables
 joined using non-equi joins as well.

2) Relax OUTER JOIN restrictions

 Previously Citus errored out if any reference table exists at
 any point of the outer part of an outer join. For instance,
 See the below sketch where (h) denotes a hash distributed relation,
 (r) denotes a reference table, (L) denotes LEFT JOIN and
 (I) denotes INNER JOIN.

             (L)
             /  \
           (I)     h
          /  \
        r      h

 Before this commit Citus would error out since a reference table
 appears on the left most part of an left join. However, that was
 too restrictive so that we only error out if the reference table
 is directly below and in the outer part of an outer join.

3) Bug fixes

 We've done some minor bugfixes in the existing implementation.
2017-09-14 20:59:22 +03:00
Marco Slot cf375d6a66 Consider dropped columns that precede the partition column in COPY 2017-08-22 13:02:35 +02:00
Hadi Moshayedi e5fbcf37dd Add Savepoint Support (#1539)
This change adds support for SAVEPOINT, ROLLBACK TO SAVEPOINT, and RELEASE SAVEPOINT.

When transaction connections are not established yet, savepoints are kept in a stack and sent to the worker when the connection is later established. After establishing connections, savepoint commands are sent as they arrive.

This change fixes #1493 .
2017-08-15 13:02:28 -04:00
Marco Slot 3ff46245b3 Make sure we don't use 2PC in copy from worker 2017-08-15 13:44:20 +02:00
velioglu b0efffae1c Correct planner and add more tests 2017-08-11 10:16:13 +03:00
Brian Cloutier 74ce4faab5 Make multi_cluster_management test more stable 2017-08-08 11:18:31 +03:00
Hadi Moshayedi 8229a64fe8 Remove distributed tables' dependency on distribution key columns. (#1527)
This change removes distributed tables' dependency on distribution key columns. We already check that we cannot drop distribution key columns in ErrorIfUnsupportedAlterTableStmt() at multi_utility.c, so we don't need to have distributed table to distribution key column dependency to avoid dropping of distribution key column.

Furthermore, having this dependency causes some warnings in pg_dump --schema-only (See #866), which are not desirable.

This change also adds check to disallow drop of distribution keys when citus.enable_ddl_propagation is set to false. Regression tests are updated accordingly.
2017-08-03 10:07:04 -04:00
Jason Petersen 2204da19f0 Support PostgreSQL 10 (#1379)
Adds support for PostgreSQL 10 by copying in the requisite ruleutils
and updating all API usages to conform with changes in PostgreSQL 10.
Most changes are fairly minor but they are numerous. One particular
obstacle was the change in \d behavior in PostgreSQL 10's psql; I had
to add SQL implementations (views, mostly) to mimic the pre-10 output.
2017-06-26 02:35:46 -06:00
velioglu a1ea29ec2b Use placement connection to drop shards instead of node connection 2017-06-14 14:14:59 +03:00
Onder Kalaci df494c0403 Improve subquery pushdown regression tests
- Use native postgres function for composite key btree functions
  - Move explain tests to multi_explain.sql (get rid of .out _0.out files)
  - Get rid of input/output files for multi_subquery.sql by moving table creations
  - Update some comments
2017-05-30 14:05:15 +03:00
Jason Petersen 97f8302c9c
Change version-sensitive tests to handle '10'
Previously assumed period in version; this makes tests future-proof.
2017-05-16 11:05:34 -06:00
Jason Petersen d6cccee5bc
Remove ALTER SEQUENCE from parallel groups
Removing these has no side effect, and in the (current) PostgreSQL 10,
an ERROR is printed during concurrent sequence modification.
2017-05-16 11:05:34 -06:00
Jason Petersen db11324ac7
Add unambiguous ORDER BY clauses to many tests
Queries which do not specify an order may arbitrarily change output
across PostgreSQL versions.
2017-05-16 11:05:34 -06:00
Jason Petersen b9bc3fdada
Update header comments to match new test names
Just keeping these in sync with the actual file name.
2017-05-16 11:05:33 -06:00
Jason Petersen 9f4a33eee1
Rename very long test files
In addition to not actually providing much information, these names can
cause problems in PostgreSQL 10.
2017-05-16 11:05:33 -06:00
Önder Kalacı ad5cd326a4 Subquery pushdown - main branch (#1323)
* Enabling physical planner for subquery pushdown changes

This commit applies the logic that exists in INSERT .. SELECT
planning to the subquery pushdown changes.

The main algorithm is followed as :
   - pick an anchor relation (i.e., target relation)
   - per each target shard interval
       - add the target shard interval's shard range
         as a restriction to the relations (if all relations
         joined on the partition keys)
        - Check whether the query is router plannable per
          target shard interval.
        - If router plannable, create a task

* Add union support within the JOINS

This commit adds support for UNION/UNION ALL subqueries that are
in the following form:

     .... (Q1 UNION Q2 UNION ...) as union_query JOIN (QN) ...

In other words, we currently do NOT support the queries that are
in the following form where union query is not JOINed with
other relations/subqueries :

     .... (Q1 UNION Q2 UNION ...) as union_query ....

* Subquery pushdown planner uses original query

With this commit, we change the input to the logical planner for
subquery pushdown. Before this commit, the planner was relying
on the query tree that is transformed by the postgresql planner.
After this commit, the planner uses the original query. The main
motivation behind this change is the simplify deparsing of
subqueries.

* Enable top level subquery join queries

This work enables
- Top level subquery joins
- Joins between subqueries and relations
- Joins involving more than 2 range table entries

A new regression test file is added to reflect enabled test cases

* Add top level union support

This commit adds support for UNION/UNION ALL subqueries that are
in the following form:

     .... (Q1 UNION Q2 UNION ...) as union_query ....

In other words, Citus supports allow top level
unions being wrapped into aggregations queries
and/or simple projection queries that only selects
some fields from the lower level queries.

* Disallow subqueries without a relation in the range table list for subquery pushdown

This commit disallows subqueries without relation in the range table
list. This commit is only applied for subquery pushdown. In other words,
we do not add this limitation for single table re-partition subqueries.

The reasoning behind this limitation is that if we allow pushing down
such queries, the result would include (shardCount * expectedResults)
where in a non distributed world the result would be (expectedResult)
only.

* Disallow subqueries without a relation in the range table list for INSERT .. SELECT

This commit disallows subqueries without relation in the range table
list. This commit is only applied for INSERT.. SELECT queries.

The reasoning behind this limitation is that if we allow pushing down
such queries, the result would include (shardCount * expectedResults)
where in a non distributed world the result would be (expectedResult)
only.

* Change behaviour of subquery pushdown flag (#1315)

This commit changes the behaviour of the citus.subquery_pushdown flag.
Before this commit, the flag is used to enable subquery pushdown logic. But,
with this commit, that behaviour is enabled by default. In other words, the
flag is now useless. We prefer to keep the flag since we don't want to break
the backward compatibility. Also, we may consider using that flag for other
purposes in the next commits.

* Require subquery_pushdown when limit is used in subquery

Using limit in subqueries may cause returning incorrect
results. Therefore we allow limits in subqueries only
if user explicitly set subquery_pushdown flag.

* Evaluate expressions on the LIMIT clause (#1333)

Subquery pushdown uses orignal query, the LIMIT and OFFSET clauses
are not evaluated. However, logical optimizer expects these expressions
are already evaluated by the standard planner. This commit manually
evaluates the functions on the logical planner for subquery pushdown.

* Better format subquery regression tests (#1340)

* Style fix for subquery pushdown regression tests

With this commit we intented a more consistent style for the
regression tests we've added in the
  - multi_subquery_union.sql
  - multi_subquery_complex_queries.sql
  - multi_subquery_behavioral_analytics.sql

* Enable the tests that are temporarily commented

This commit enables some of the regression tests that were commented
out until all the development is done.

* Fix merge conflicts (#1347)

 - Update regression tests to meet the changes in the regression
   test output.
 - Replace Ifs with Asserts given that the check is already done
 - Update shard pruning outputs

* Add view regression tests for increased subquery coverage (#1348)

- joins between views and tables
- joins between views
- union/union all queries involving views
- views with limit
- explain queries with view

* Improve btree operators for the subquery tests

This commit adds the missing comprasion for subquery composite key
btree comparator.
2017-04-29 04:09:48 +03:00
Andres Freund 6bd2e3ed30 Add DistTableCacheEntry->hasOverlappingShardInterval.
This determines whether it's possible to perform binary search on
sortedShardIntervalArray or not.  If e.g. two shards have overlapping
ranges, that'd be prohibitive.

That'll be useful in later commit introducing faster shard pruning.
2017-04-28 14:40:38 -07:00
Andres Freund 1f93c325fa Some cleanup in multi_subquery test.
Remove trailing whitespace and use of EXPLAIN instead of
EXPLAIN (COSTS OFF).
2017-04-26 11:33:56 -07:00