Commit Graph

15 Commits (808f30c1a26eea3ac9e67ef4a3fecb086df8cefc)

Author SHA1 Message Date
Onder Kalaci 5c4c9304ba Remove RemoveDuplicateJoinRestrictions() function
RemoveDuplicateJoinRestrictions() function was introduced with the aim of decrasing the overall planning times by eliminating the duplicate JOIN restriction entries (#1989). However, it turns out that the function itself is so CPU intensive with a very high algorithmic complexity, it hurts a lot more than it helps. The function is a clear example of premature optimization.

The table below shows the difference clearly:

"distributed query planning
 time master"	RemoveDuplicateJoinRestrictions() execution time on master	"Remove the function RemoveDuplicateJoinRestrictions()
this PR"
5 table INNER JOIN	9 msec	2msec	7 msec
10 table INNER JOIN	227 msec	194 msec	29  msec
20 table INNER JOIN	1 sec 235 msec	1  sec 139  msec	90 msecs
50 table INNER JOIN	24 seconds	21 seconds	1.5 seconds
100 table INNER JOIN	2 minutes 16 secods	1 minute 53 seconds	23 seconds
250 table INNER JOIN	Bottleneck on JoinClauseList	18 minutes 52 seconds	Bottleneck on JoinClauseList

5 table INNER JOIN in subquery	9 msec	0 msec	6 msec
10 table INNER JOIN subquery	33 msec	10 msec	32 msec
20 table INNER JOIN subquery	132 msec	67 msec	123 msec
50 table INNER JOIN subquery	1.2  seconds	900 msec	500 msec
100 table INNER JOIN subquery	6 seconds	5  seconds	2 seconds
250 table INNER JOIN subquery	54 seconds	37 seconds	20  seconds

5 table LEFT JOIN	5 msec	0 msec	5 msec
10 table LEFT JOIN	11 msec	0 msec	13 msec
20 table LEFT JOIN	26 msec	2 msec	30 msec
50 table LEFT JOIN	150 msec	15 msec	193 msec
100 table LEFT JOIN	757 msec	71 msec	722 msec
250 table LEFT JOIN	8 seconds	600 msec	8 seconds

5 JOINs among 2 table JOINs 	37 msec	11 msec	25 msec
10 JOINs among 2 table JOINs 	536 msec	306 msec	352 msec
20 JOINs among 2 table JOINs 	794 msec	181 msec	640 msec
50 JOINs among 2 table JOINs 	25 seconds	2 seconds	22 seconds
100 JOINs among 2 table JOINs 	Bottleneck on JoinClauseList	9 seconds	Bottleneck on JoinClauseList
150 JOINs among 2 table JOINs 	Bottleneck on JoinClauseList	46 seconds	Bottleneck on JoinClauseList

On top of the performance penalty, the function had a critical bug #4255, and with #4254 we hit one more important bug. It should be fixed by adding the followig check to the ContextCoversJoinRestriction():
```
static bool
JoinRelIdsSame(JoinRestriction *leftRestriction, JoinRestriction *rightRestriction)
{
	Relids leftInnerRelIds = leftRestriction->innerrel->relids;
	Relids rightInnerRelIds = rightRestriction->innerrel->relids;
	if (!bms_equal(leftInnerRelIds, rightInnerRelIds))
	{
		return false;
	}

	Relids leftOuterRelIds = leftRestriction->outerrel->relids;
	Relids rightOuterRelIds = rightRestriction->outerrel->relids;
	if (!bms_equal(leftOuterRelIds, rightOuterRelIds))
	{
		return false;
	}

	return true;
}
```

However, adding this eliminates all the benefits tha RemoveDuplicateJoinRestrictions() brings.

I've used the commands here to generate the JOINs mentioned in the PR: https://gist.github.com/onderkalaci/fe8654f9df5916c7af4c7c5eb892561e#file-gistfile1-txt

Inner and outer JOINs behave roughly the same, to simplify the table only added INNER joins.
2020-10-21 10:29:39 +02:00
Onder Kalaci bbedfca761 Improve the relation restriction counters
It seems like Postgres could call set_rel_pathlist() for
the same relation multiple times. This breaks the logic
where we assume relationCount eqauls to the number of
entries in relationRestrictionList.

In summary, relationRestrictionList may contain duplicate
entries.
2020-10-19 08:51:16 +02:00
Philip Dubé 08f6842d50 Fix typos
Equivalance -> Equivalence
utillity -> utility
shorted lived one -> shortly lived one
elegible -> eligible
2020-02-18 17:14:40 +00:00
Önder Kalacı ffd89e4e01
Include all relevant relations in the ExtractRangeTableRelationWalker (#3135)
We've changed the logic for pulling RTE_RELATIONs in #3109 and
non-colocated subquery joins and partitioned tables.
@onurctirtir found this steps where I traced back and found the issues.

While looking into it in more detail, we decided to expand the list in a
way that the callers get all the relevant RTE_RELATIONs RELKIND_RELATION,
RELKIND_PARTITIONED_TABLE, RELKIND_FOREIGN_TABLE and RELKIND_MATVIEW.
These are all relation kinds that Citus planner is aware of.
2019-11-01 16:06:58 +01:00
SaitTalhaNisanci 94a7e6475c
Remove copyright years (#2918)
* Update year as 2012-2019

* Remove copyright years
2019-10-15 17:44:30 +03:00
Onder Kalaci 1c930c96a3 Support non-co-located joins between subqueries
With #1804 (and related PRs), Citus gained the ability to
plan subqueries that are not safe to pushdown.

There are two high-level requirements for pushing down subqueries:

   * Individual subqueries that require a merge step (i.e., GROUP BY
     on non-distribution key, or LIMIT in the subquery etc). We've
     handled such subqueries via #1876.

    * Combination of subqueries that are not joined on distribution keys.
      This commit aims to recursively plan some of such subqueries to make
      the whole query safe to pushdown.

The main logic behind non colocated subquery joins is that we pick
an anchor range table entry and check for distribution key equality
of any  other subqueries in the given query. If for a given subquery,
we cannot find distribution key equality with the anchor rte, we
recursively plan that subquery.

We also used a hacky solution for picking relations as the anchor range
table entries. The hack is that we wrap them into a subquery. This is only
necessary since some of the attribute equivalance checks are based on
queries rather than range table entries.
2018-02-26 13:50:37 +02:00
Onder Kalaci e8aa532a90 Refactor checks for distribution key equality
Change some function names, ensure we stick to Citus'
function order rules etc.
2018-02-26 13:28:24 +02:00
Onder Kalaci 94c5ac6ebb Remove duplicate join restrictions
We use PostgreSQL hooks to accumulate the join restrictions
and PostgreSQL gives us all the join paths it tries while
deciding on the join order. Thus, for queries that have many
joins, this function is likely to remove lots of duplicate join
restrictions. This becomes relevant for Citus on query pushdown
check peformance.
2018-02-12 18:35:05 +02:00
Onder Kalaci c228d8ff3d Refactor equivalance generation related codes
This commit changes the APIs for restriction generation to make future
changes simpler.
2018-02-12 18:35:04 +02:00
Onder Kalaci 2f2d350924 Refactor relation restriction related codes
This commit moves some of the functions to a more relevant
source file.
2018-02-12 18:35:04 +02:00
Marco Slot 3f03cb6a6a Support UNION with joins in the subqueries 2017-11-30 10:37:56 +01:00
Onder Kalaci 05fb0dd020 Add infrastructure for filtering restriction contexts based on the input query
In subquery pushdown, we first ensure that each relation is joined with at least
on another relation on the partition keys. That's fine given that the decision
is binary: pushdown the query at all or not.

With recursive planning, we'd want to check whether any specific part
of the query can be pushded down or not. Thus, we need the ability to
understand which part(s) of the subquery is safe to pushdown. This commit
adds the infrastructure for doing that.
2017-11-28 09:58:21 +02:00
Marco Slot 6ba3f42d23 Rename MultiPlan to DistributedPlan 2017-11-22 09:36:24 +01:00
Önder Kalacı ad5cd326a4 Subquery pushdown - main branch (#1323)
* Enabling physical planner for subquery pushdown changes

This commit applies the logic that exists in INSERT .. SELECT
planning to the subquery pushdown changes.

The main algorithm is followed as :
   - pick an anchor relation (i.e., target relation)
   - per each target shard interval
       - add the target shard interval's shard range
         as a restriction to the relations (if all relations
         joined on the partition keys)
        - Check whether the query is router plannable per
          target shard interval.
        - If router plannable, create a task

* Add union support within the JOINS

This commit adds support for UNION/UNION ALL subqueries that are
in the following form:

     .... (Q1 UNION Q2 UNION ...) as union_query JOIN (QN) ...

In other words, we currently do NOT support the queries that are
in the following form where union query is not JOINed with
other relations/subqueries :

     .... (Q1 UNION Q2 UNION ...) as union_query ....

* Subquery pushdown planner uses original query

With this commit, we change the input to the logical planner for
subquery pushdown. Before this commit, the planner was relying
on the query tree that is transformed by the postgresql planner.
After this commit, the planner uses the original query. The main
motivation behind this change is the simplify deparsing of
subqueries.

* Enable top level subquery join queries

This work enables
- Top level subquery joins
- Joins between subqueries and relations
- Joins involving more than 2 range table entries

A new regression test file is added to reflect enabled test cases

* Add top level union support

This commit adds support for UNION/UNION ALL subqueries that are
in the following form:

     .... (Q1 UNION Q2 UNION ...) as union_query ....

In other words, Citus supports allow top level
unions being wrapped into aggregations queries
and/or simple projection queries that only selects
some fields from the lower level queries.

* Disallow subqueries without a relation in the range table list for subquery pushdown

This commit disallows subqueries without relation in the range table
list. This commit is only applied for subquery pushdown. In other words,
we do not add this limitation for single table re-partition subqueries.

The reasoning behind this limitation is that if we allow pushing down
such queries, the result would include (shardCount * expectedResults)
where in a non distributed world the result would be (expectedResult)
only.

* Disallow subqueries without a relation in the range table list for INSERT .. SELECT

This commit disallows subqueries without relation in the range table
list. This commit is only applied for INSERT.. SELECT queries.

The reasoning behind this limitation is that if we allow pushing down
such queries, the result would include (shardCount * expectedResults)
where in a non distributed world the result would be (expectedResult)
only.

* Change behaviour of subquery pushdown flag (#1315)

This commit changes the behaviour of the citus.subquery_pushdown flag.
Before this commit, the flag is used to enable subquery pushdown logic. But,
with this commit, that behaviour is enabled by default. In other words, the
flag is now useless. We prefer to keep the flag since we don't want to break
the backward compatibility. Also, we may consider using that flag for other
purposes in the next commits.

* Require subquery_pushdown when limit is used in subquery

Using limit in subqueries may cause returning incorrect
results. Therefore we allow limits in subqueries only
if user explicitly set subquery_pushdown flag.

* Evaluate expressions on the LIMIT clause (#1333)

Subquery pushdown uses orignal query, the LIMIT and OFFSET clauses
are not evaluated. However, logical optimizer expects these expressions
are already evaluated by the standard planner. This commit manually
evaluates the functions on the logical planner for subquery pushdown.

* Better format subquery regression tests (#1340)

* Style fix for subquery pushdown regression tests

With this commit we intented a more consistent style for the
regression tests we've added in the
  - multi_subquery_union.sql
  - multi_subquery_complex_queries.sql
  - multi_subquery_behavioral_analytics.sql

* Enable the tests that are temporarily commented

This commit enables some of the regression tests that were commented
out until all the development is done.

* Fix merge conflicts (#1347)

 - Update regression tests to meet the changes in the regression
   test output.
 - Replace Ifs with Asserts given that the check is already done
 - Update shard pruning outputs

* Add view regression tests for increased subquery coverage (#1348)

- joins between views and tables
- joins between views
- union/union all queries involving views
- views with limit
- explain queries with view

* Improve btree operators for the subquery tests

This commit adds the missing comprasion for subquery composite key
btree comparator.
2017-04-29 04:09:48 +03:00
Onder Kalaci 1cb6a34ba8 Remove uninstantiated qual logic, use attribute equivalences
In this PR, we aim to deduce whether each of the RTE_RELATION
is joined with at least on another RTE_RELATION on their partition keys. If each
RTE_RELATION follows the above rule, we can conclude that all RTE_RELATIONs are
joined on their partition keys.

In order to do that, we invented a new equivalence class namely:
AttributeEquivalenceClass. In very simple words, a AttributeEquivalenceClass is
identified by an unique id and consists of a list of AttributeEquivalenceMembers.

Each AttributeEquivalenceMember is designed to identify attributes uniquely within the
whole query. The necessity of this arise since varno attributes are defined within
a single level of a query. Instead, here we want to identify each RTE_RELATION uniquely
and try to find equality among each RTE_RELATION's partition key.

Whenever we find an equality clause A = B, where both A and B originates from
relation attributes (i.e., not random expressions), we create an
AttributeEquivalenceClass to record this knowledge. If we later find another
equivalence B = C, we create another AttributeEquivalenceClass. Finally, we can
apply transitity rules and generate a new AttributeEquivalenceClass which includes
A, B and C.

Note that equality among the members are identified by the varattno and rteIdentity.

Each equality among RTE_RELATION is saved using an AttributeEquivalenceClass where
each member attribute is identified by a AttributeEquivalenceMember. In the final
step, we try generate a common attribute equivalence class that holds as much as
AttributeEquivalenceMembers whose attributes are a partition keys.
2017-04-13 11:51:26 +03:00