Commit Graph

22 Commits (1d7dda991f54a2da75febf9040a4356221e9a4ba)

Author SHA1 Message Date
Marco Slot 8976f245ab Support reference table view in reference table modification 2020-10-16 11:31:24 +02:00
SaitTalhaNisanci b3af63c8ce
Remove task tracker executor (#3850)
* use adaptive executor even if task-tracker is set

* Update check-multi-mx tests for adaptive executor

Basically repartition joins are enabled where necessary. For parallel
tests max adaptive executor pool size is decresed to 2, otherwise we
would get too many clients error.

* Update limit_intermediate_size test

It seems that when we use adaptive executor instead of task tracker, we
exceed the intermediate result size less in the test. Therefore updated
the tests accordingly.

* Update multi_router_planner

It seems that there is one problem with multi_router_planner when we use
adaptive executor, we should fix the following error:
+ERROR:  relation "authors_range_840010" does not exist
+CONTEXT:  while executing command on localhost:57637

* update repartition join tests for check-multi

* update isolation tests for repartitioning

* Error out if shard_replication_factor > 1 with repartitioning

As we are removing the task tracker, we cannot switch to it if
shard_replication_factor > 1. In that case, we simply error out.

* Remove MULTI_EXECUTOR_TASK_TRACKER

* Remove multi_task_tracker_executor

Some utility methods are moved to task_execution_utils.c.

* Remove task tracker protocol methods

* Remove task_tracker.c methods

* remove unused methods from multi_server_executor

* fix style

* remove task tracker specific tests from worker_schedule

* comment out task tracker udf calls in tests

We were using task tracker udfs to test permissions in
multi_multiuser.sql. We should find some other way to test them, then we
should remove the commented out task tracker calls.

* remove task tracker test from follower schedule

* remove task tracker tests from multi mx schedule

* Remove task-tracker specific functions from worker functions

* remove multi task tracker extra schedule

* Remove unused methods from multi physical planner

* remove task_executor_type related things in tests

* remove LoadTuplesIntoTupleStore

* Do initial cleanup for repartition leftovers

During startup, task tracker would call TrackerCleanupJobDirectories and
TrackerCleanupJobSchemas to clean up leftover directories and job
schemas. With adaptive executor, while doing repartitions it is possible
to leak these things as well. We don't retry cleanups, so it is possible
to have leftover in case of errors.

TrackerCleanupJobDirectories is renamed as
RepartitionCleanupJobDirectories since it is repartition specific now,
however TrackerCleanupJobSchemas cannot be used currently because it is
task tracker specific. The thing is that this function is a no-op
currently.

We should add cleaning up intermediate schemas to DoInitialCleanup
method when that problem is solved(We might want to solve it in this PR
as well)

* Revert "remove task tracker tests from multi mx schedule"

This reverts commit 03ecc0a681.

* update multi mx repartition parallel tests

* not error with task_tracker_conninfo_cache_invalidate

* not run 4 repartition queries in parallel

It seems that when we run 4 repartition queries in parallel we get too
many clients error on CI even though we don't get it locally. Our guess
is that, it is because we open/close many connections without doing some
work and postgres has some delay to close the connections. Hence even
though connections are removed from the pg_stat_activity, they might
still not be closed. If the above assumption is correct, it is unlikely
for it to happen in practice because:
- There is some network latency in clusters, so this leaves some times
for connections to be able to close
- Repartition joins return some data and that also leaves some time for
connections to be fully closed.

As we don't get this error in our local, we currently assume that it is
not a bug. Ideally this wouldn't happen when we get rid of the
task-tracker repartition methods because they don't do any pruning and
might be opening more connections than necessary.

If this still gives us "too many clients" error, we can try to increase
the max_connections in our test suite(which is 100 by default).

Also there are different places where this error is given in postgres,
but adding some backtrace it seems that we get this from
ProcessStartupPacket. The backtraces can be found in this link:
https://circleci.com/gh/citusdata/citus/138702

* Set distributePlan->relationIdList when it is needed

It seems that we were setting the distributedPlan->relationIdList after
JobExecutorType is called, which would choose task-tracker if
replication factor > 1 and there is a repartition query. However, it
uses relationIdList to decide if the query has a repartition query, and
since it was not set yet, it would always think it is not a repartition
query and would choose adaptive executor when it should choose
task-tracker.

* use adaptive executor even with shard_replication_factor > 1

It seems that we were already using adaptive executor when
replication_factor > 1. So this commit removes the check.

* remove multi_resowner.c and deprecate some settings

* remove TaskExecution related leftovers

* change deprecated API error message

* not recursively plan single relatition repartition subquery

* recursively plan single relation repartition subquery

* test depreceated task tracker functions

* fix overlapping shard intervals in range-distributed test

* fix error message for citus_metadata_container

* drop task-tracker deprecated functions

* put the implemantation back to worker_cleanup_job_schema_cachesince citus cloud uses it

* drop some functions, add downgrade script

Some deprecated functions are dropped.
Downgrade script is added.
Some gucs are deprecated.
A new guc for repartition joins bucket size is added.

* order by a test to fix flappiness
2020-07-18 13:11:36 +03:00
Philip Dubé c563e0825c Strip trailing whitespace and add final newline (#3186)
This brings files in line with our editorconfig file
2019-11-21 14:25:37 +01:00
Onur TIRTIR d3f68bf44f
Fix view is not distributed error when view is used in modify statements (#3104) 2019-11-01 16:34:01 +03:00
Halil Ozan Akgul 5f04ac774f Adds the tests for refresh materialized views 2019-10-17 16:00:56 +03:00
Philip Dubé 74cb168205 Remove Postgres 10 support 2019-10-11 21:56:56 +00:00
Philip Dubé 66ce2d2d2d Materialize c1 to keep subplan ids in sync 2019-08-09 15:25:59 +00:00
Onder Kalaci 92e87738dd Make sure that the regression test output is durable to different execution orders
Mostly add order bys and suppress worker node ports in the test
outputs.
2019-04-08 11:48:08 +03:00
Marco Slot 1656b519c4 Plan outer joins through pushdown planning 2019-01-05 20:55:27 +01:00
Murat Tuncer a72d959735 Fix multi_view tests 2019-01-03 17:07:26 +03:00
Murat Tuncer a6fe5ca183 PG11 compatibility update
- changes in ruleutils_11.c is reflected
- vacuum statement api change is handled. We now allow
  multi-table vacuum commands.
- some other function header changes are reflected
- api conflicts between PG11 and earlier versions
  are handled by adding shims in version_compat.h
- various regression tests are fixed due output and
  functionality in PG1
- no change is made to support new features in PG11
  they need to be handled by new commit
2018-04-26 11:29:43 +03:00
Onder Kalaci 1c930c96a3 Support non-co-located joins between subqueries
With #1804 (and related PRs), Citus gained the ability to
plan subqueries that are not safe to pushdown.

There are two high-level requirements for pushing down subqueries:

   * Individual subqueries that require a merge step (i.e., GROUP BY
     on non-distribution key, or LIMIT in the subquery etc). We've
     handled such subqueries via #1876.

    * Combination of subqueries that are not joined on distribution keys.
      This commit aims to recursively plan some of such subqueries to make
      the whole query safe to pushdown.

The main logic behind non colocated subquery joins is that we pick
an anchor range table entry and check for distribution key equality
of any  other subqueries in the given query. If for a given subquery,
we cannot find distribution key equality with the anchor rte, we
recursively plan that subquery.

We also used a hacky solution for picking relations as the anchor range
table entries. The hack is that we wrap them into a subquery. This is only
necessary since some of the attribute equivalance checks are based on
queries rather than range table entries.
2018-02-26 13:50:37 +02:00
Marco Slot 09c09f650f Recursively plan set operations when leaf nodes recur 2017-12-26 13:46:55 +02:00
Onder Kalaci e2a5124830 Add regression tests for recursive subquery planning 2017-12-21 08:37:40 +02:00
Marco Slot 5895c88552 Add materialized view regression tests 2017-12-07 16:20:23 +01:00
mehmet furkan şahin 1b06b2b306 The data used in regression tests is reduced
This commit reduces the size of the data in users_table.data
and events_table.data from 10K rows to 100 rows.
2017-11-28 14:15:46 +03:00
Murat Tuncer e16805215d
Support count(distinct) for non-partition columns (#1692)
Expands count distinct coverage by allowing more cases. We used to support
count distinct only if we can push down distinct aggregate to worker query
i.e. the count distinct clause was on the partition column of the table,
or there was a grouping on the partition column.

Now we can support
- non-partition columns, with or without grouping on partition column
- partition, and non partition column in the same query
- having clause
- single table subqueries
- insert into select queries
- join queries where count distinct is on partition, or non-partition column
- filters on count distinct clauses (extends existing support)

We first try to push down aggregate to worker query (original case), if we
can't then we modify worker query to return distinct columns to coordinator
node. We do that by adding distinct column targets to group by clauses. Then
we perform count distinct operation on the coordinator node.

This work should reduce the cases where HLL is used as it can address anything
that HLL can. However, if we start having performance issues due to very large
number rows, then we can recommend hll use.
2017-10-30 13:12:24 +02:00
Onder Kalaci 4f668ad38b Make the test outputs consistent
by using VACUUM ANALYZE on the tables.
2017-08-12 13:29:25 +03:00
Marco Slot 2f8ac82660 Execute INSERT..SELECT via coordinator if it cannot be pushed down
Add a second implementation of INSERT INTO distributed_table SELECT ... that is used if
the query cannot be pushed down. The basic idea is to execute the SELECT query separately
and pass the results into the distributed table using a CopyDestReceiver, which is also
used for COPY and create_distributed_table. When planning the SELECT, we go through
planner hooks again, which means the SELECT can also be a distributed query.

EXPLAIN is supported, but EXPLAIN ANALYZE is not because preventing double execution was
a lot more complicated in this case.
2017-06-22 15:46:30 +02:00
Önder Kalacı b74ed3c8e1 Subqueries in where -- updated (#1372)
* Support for subqueries in WHERE clause

This commit enables subqueries in WHERE clause to be pushed down
by the subquery pushdown logic.

The support covers:
  - Correlated subqueries with IN, NOT IN, EXISTS, NOT EXISTS,
    operator expressions such as (>, <, =, ALL, ANY etc.)
  - Non-correlated subqueries with (partition_key) IN (SELECT partition_key ..)
    (partition_key) =ANY (SELECT partition_key ...)

Note that this commit heavily utilizes the attribute equivalence logic introduced
in the 1cb6a34ba8. In general, this commit mostly
adjusts the logical planner not to error out on the subqueries in WHERE clause.

* Improve error checks for subquery pushdown and INSERT ... SELECT

Since we allow subqueries in WHERE clause with the previous commit,
we should apply the same limitations to those subqueries.

With this commit, we do not iterate on each subquery one by one.
Instead, we extract all the subqueries and apply the checks directly
on those subqueries. The aim of this change is to (i) Simplify the
code (ii) Make it close to the checks on INSERT .. SELECT code base.

* Extend checks for unresolved paramaters to include SubLinks

With the presence of subqueries in where clause (i.e., SubPlans on the
query) the existing way for checking unresolved parameters fail. The
reason is that the parameters for SubPlans are kept on the parent plan not
on the query itself (see primnodes.h for the details).

With this commit, instead of checking SubPlans on the modified plans
we start to use originalQuery, where SubLinks represent the subqueries
in where clause. The unresolved parameters can be found on the SubLinks.

* Apply code-review feedback

* Remove unnecessary copying of shard interval list

This commit removes unnecessary copying of shard interval list. Note
that there are no copyObject function implemented for shard intervals.
2017-05-01 17:20:21 +03:00
Önder Kalacı ad5cd326a4 Subquery pushdown - main branch (#1323)
* Enabling physical planner for subquery pushdown changes

This commit applies the logic that exists in INSERT .. SELECT
planning to the subquery pushdown changes.

The main algorithm is followed as :
   - pick an anchor relation (i.e., target relation)
   - per each target shard interval
       - add the target shard interval's shard range
         as a restriction to the relations (if all relations
         joined on the partition keys)
        - Check whether the query is router plannable per
          target shard interval.
        - If router plannable, create a task

* Add union support within the JOINS

This commit adds support for UNION/UNION ALL subqueries that are
in the following form:

     .... (Q1 UNION Q2 UNION ...) as union_query JOIN (QN) ...

In other words, we currently do NOT support the queries that are
in the following form where union query is not JOINed with
other relations/subqueries :

     .... (Q1 UNION Q2 UNION ...) as union_query ....

* Subquery pushdown planner uses original query

With this commit, we change the input to the logical planner for
subquery pushdown. Before this commit, the planner was relying
on the query tree that is transformed by the postgresql planner.
After this commit, the planner uses the original query. The main
motivation behind this change is the simplify deparsing of
subqueries.

* Enable top level subquery join queries

This work enables
- Top level subquery joins
- Joins between subqueries and relations
- Joins involving more than 2 range table entries

A new regression test file is added to reflect enabled test cases

* Add top level union support

This commit adds support for UNION/UNION ALL subqueries that are
in the following form:

     .... (Q1 UNION Q2 UNION ...) as union_query ....

In other words, Citus supports allow top level
unions being wrapped into aggregations queries
and/or simple projection queries that only selects
some fields from the lower level queries.

* Disallow subqueries without a relation in the range table list for subquery pushdown

This commit disallows subqueries without relation in the range table
list. This commit is only applied for subquery pushdown. In other words,
we do not add this limitation for single table re-partition subqueries.

The reasoning behind this limitation is that if we allow pushing down
such queries, the result would include (shardCount * expectedResults)
where in a non distributed world the result would be (expectedResult)
only.

* Disallow subqueries without a relation in the range table list for INSERT .. SELECT

This commit disallows subqueries without relation in the range table
list. This commit is only applied for INSERT.. SELECT queries.

The reasoning behind this limitation is that if we allow pushing down
such queries, the result would include (shardCount * expectedResults)
where in a non distributed world the result would be (expectedResult)
only.

* Change behaviour of subquery pushdown flag (#1315)

This commit changes the behaviour of the citus.subquery_pushdown flag.
Before this commit, the flag is used to enable subquery pushdown logic. But,
with this commit, that behaviour is enabled by default. In other words, the
flag is now useless. We prefer to keep the flag since we don't want to break
the backward compatibility. Also, we may consider using that flag for other
purposes in the next commits.

* Require subquery_pushdown when limit is used in subquery

Using limit in subqueries may cause returning incorrect
results. Therefore we allow limits in subqueries only
if user explicitly set subquery_pushdown flag.

* Evaluate expressions on the LIMIT clause (#1333)

Subquery pushdown uses orignal query, the LIMIT and OFFSET clauses
are not evaluated. However, logical optimizer expects these expressions
are already evaluated by the standard planner. This commit manually
evaluates the functions on the logical planner for subquery pushdown.

* Better format subquery regression tests (#1340)

* Style fix for subquery pushdown regression tests

With this commit we intented a more consistent style for the
regression tests we've added in the
  - multi_subquery_union.sql
  - multi_subquery_complex_queries.sql
  - multi_subquery_behavioral_analytics.sql

* Enable the tests that are temporarily commented

This commit enables some of the regression tests that were commented
out until all the development is done.

* Fix merge conflicts (#1347)

 - Update regression tests to meet the changes in the regression
   test output.
 - Replace Ifs with Asserts given that the check is already done
 - Update shard pruning outputs

* Add view regression tests for increased subquery coverage (#1348)

- joins between views and tables
- joins between views
- union/union all queries involving views
- views with limit
- explain queries with view

* Improve btree operators for the subquery tests

This commit adds the missing comprasion for subquery composite key
btree comparator.
2017-04-29 04:09:48 +03:00
Murat Tuncer 77f8db6b14 Add view support
Enables use views within distributed queries.
User can create and use a view on distributed tables/queries
as he/she would use with regular queries.

After this change router queries will have full support for views,
insert into select queries will support reading from views, not
writing into. Outer joins would have a limited support, and would
error out at certain cases such as when a view is in the inner side
of the outer join.

Although PostgreSQL supports writing into views under certain circumstances.
We disallowed that for distributed views.
2017-01-13 09:39:42 +03:00