Some refactoring:
Consolidate expression which decides whether GROUP BY/HAVING are pushed down
Rename early pullUpIntermediateRows to hasNonDistributableAggregates
Create WorkerColumnName to handle formatting WORKER_COLUMN_FORMAT
Ignore NULL StringInfo pointers to SafeToPushdownWindowFunction
Fix bug where SubqueryPushdownMultiNodeTree mutates supplied Query,
SafeToPushdownWindowFunction requires the original query as it relies on rtable
Previously we checked if an operator is in pg_catalog, and if it wasn't we prefixed it with namespace in worker queries. This can have a huge impact on performance of physical planner when using custom data types.
This happened regardless of current search_path config, because Citus overrides the search path in get_query_def_extended(). When we do so, the check for existence of the operator in current search path in generate_operator_name() fails for any operators outside pg_catalog. This means that nothing gets cached, and in the following calls we will again recheck the system tables for existence of the operators, which took an additional 40-50ms for some of the usecases we were seeing.
In this change we skip the pg_catalog check, and always prefix the operator with its namespace.
- changes in ruleutils_11.c is reflected
- vacuum statement api change is handled. We now allow
multi-table vacuum commands.
- some other function header changes are reflected
- api conflicts between PG11 and earlier versions
are handled by adding shims in version_compat.h
- various regression tests are fixed due output and
functionality in PG1
- no change is made to support new features in PG11
they need to be handled by new commit
This is the first of series of window function work.
We can now support window functions that can be pushed down to workers.
Window function must have distribution column in the partition clause
to be pushed down.
With #1804 (and related PRs), Citus gained the ability to
plan subqueries that are not safe to pushdown.
There are two high-level requirements for pushing down subqueries:
* Individual subqueries that require a merge step (i.e., GROUP BY
on non-distribution key, or LIMIT in the subquery etc). We've
handled such subqueries via #1876.
* Combination of subqueries that are not joined on distribution keys.
This commit aims to recursively plan some of such subqueries to make
the whole query safe to pushdown.
The main logic behind non colocated subquery joins is that we pick
an anchor range table entry and check for distribution key equality
of any other subqueries in the given query. If for a given subquery,
we cannot find distribution key equality with the anchor rte, we
recursively plan that subquery.
We also used a hacky solution for picking relations as the anchor range
table entries. The hack is that we wrap them into a subquery. This is only
necessary since some of the attribute equivalance checks are based on
queries rather than range table entries.
We used to error out if the join clause includes filters like
t1.a < t2.a even if other filter like t1.key = t2.key exists.
Recently we lifted that restriction in subquery planning by
not lifting that restriction and focusing on equivalance classes
provided by postgres.
This checkin forwards previously erroring out real-time queries
due to join clauses to subquery planner and let it handle the
join even if the query does not have a subquery.
We are now pushing down queries that do not have any
subqueries in it. Error message looked misleading, changed to a more descriptive one.