Our checks to find subqueries in the rewritten query were not sufficient. When
multiple subqueries are present in the original query and some would be
replaced by a join, we could miss other subqueries that were not rewritten.
This in turn caused us not to go into the subquery planner, causing some
queries that were planning fine before to suddenly not plan anymore.
This was a regression introduced by #3171.
This is purely to enable better performance with prepared statements.
Before this commit, the fast path queries with prepared statements
where the distribution key includes a parameter always went through
distributed planning. After this change, we only go through distributed
planning on the first 5 executions.
DESCRIPTION: Fixes a problem when adding a new node due to tables referenced in a functions body
Fixes#3378
It was reported that `master_add_node` would fail if a distributed function has a table name referenced in its declare section of the body. By default postgres validates the body of a function on creation. This is not a problem in the normal case as tables are replicated to the workers when we distribute functions.
However when a new node is added we first create dependencies on the workers before we try to create any tables, and the original tables get created out of bound when the metadata gets synced to the new node. This causes the function body validator to raise an error the table is not on the worker.
To mitigate this issue we set `check_function_bodies` to `off` right before we are creating the function.
The added test shows this does resolve the issue. (issue can be reproduced on the commit without the fix)
Before this patch, Citus used to always recursively plan CTEs.
In PostgreSQL 12, there is a [logic](https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=608b167f9f9c4553c35bb1ec0eab9ddae643989b) for inlining CTEs, which is basically converting certain CTEs to subqueries.
With this patch, citus becomes capable of doing the same, can get rid of
recursively planning all the CTEs. Instead, the pushdown-able ones would
simply be converted to subquery pushdown. If the inlined CTE query cannot
be pushed down, it'd simply follow the recursive planning logic.
See an example below:
```SQL
-- the query that users pass
WITH some_users AS
(SELECT
users_table.user_id
FROM
users_table JOIN events_table USING (user_id) WHERE event_type = 5)
SELECT count(*) FROM users_table JOIN some_users USING (user_id);
-- worker query
SELECT count(*) AS COUNT
FROM ((users_table_102039 users_table
JOIN users_table_102039 users_table_1 ON ((users_table_1.user_id OPERATOR(pg_catalog.=) users_table.user_id)))
JOIN events_table_102071 events_table ON ((users_table.user_id OPERATOR(pg_catalog.=) events_table.user_id)))
WHERE (events_table.event_type OPERATOR(pg_catalog.=) 5)
```
There are few things to call-out for future reference and
help the reviewer(s) to understand the patch easier:
1) On top of Postgres' restrictions to inline CTEs, Citus enforces one more.
This is to prevent regressing on the SQL support. For example, the following
cte is OK to inline by Postgres. However, if inlined, Citus cannot plan the whole
query, so we prefer to skip inlining that cte:
```SQL
-- Citus should not inline this CTE because otherwise it cannot
-- plan the query
WITH cte_1 AS (SELECT * FROM test_table)
SELECT
*, row_number() OVER ()
FROM
cte_1;
```
2) Some exotic queries with multiple colocation groups involved
could become repartition joins. Basically, after the CTE inlining
happens, ShouldRecursivelyPlanNonColocatedSubqueries() fails to
detect that the query is a non-colocated subquery. We should improve
there to fix it. But, since we fall-back to planning again, the query is
successfully executed by Citus.
```SQL
SET citus.shard_count TO 4;
CREATE TABLE colocation_1 (key int, value int);
SELECT create_distributed_table('colocation_1', 'key');
SET citus.shard_count TO 8;
CREATE TABLE colocation_2 (key int, value int);
SELECT create_distributed_table('colocation_2', 'key');
-- which used to work because the cte was recursively planned
-- now the cte becomes a repartition join since
--- (a) the cte is replaced to a subquery
--- (b) since the subquery is very simple, postgres pulled it to become
--- a simple join
WITH cte AS (SELECT * FROM colocation_1)
SELECT count(*) FROM cte JOIN colocation_2 USING (key);
...
message: the query contains a join that requires repartitioning
detail:
hint: Set citus.enable_repartition_joins to on to enable repartitioning
...
┌───────┐
│ count │
├───────┤
│ 0 │
└───────┘
(1 row)
```
3) We decided to implement inlining CTEs even after standard planner.
In Postgres 12+, the restriction information in CTEs are generated
because the CTEs are actually treated as subqueries via Postgres' inline
capabilities.
In Postgres 11-, the restriction information is not generated for CTEs. Because
of that, some queries work differently on pg 11 vs pg 12. To see such queries,
see cte_inline.sql file, where the file has two output files.
4) As a side-effect of (2), we're now able to inline CTEs for INSERT .. SELECT
queries as well. Postgres prevents it, I cannot see a reason to prevent it. With
this capability, some of the INSERT ... SELECT queries where the cte is in the
SELECT query could become pushdownable. See an example:
```SQL
INSERT INTO test_table
WITH fist_table_cte AS
(SELECT * FROM test_table)
SELECT
key, value
FROM
fist_table_cte;
```
5) A class of queries now could be supported. Previously, if a CTE is used in the
outer part of an outer join, Citus would complained about that.
So, the following query:
```SQL
WITH cte AS (
SELECT * FROM users_table WHERE user_id = 1 ORDER BY value_1
)
SELECT
cte.user_id, cte.time, events_table.event_type
FROM
cte
LEFT JOIN
events_table ON cte.user_id = events_table.user_id
ORDER BY
1,2,3
LIMIT
5;
ERROR: cannot pushdown the subquery
DETAIL: Complex subqueries and CTEs cannot be in the outer part of the outer join
```
Becomes
```SQL
-- cte LEFT JOIN distributed_table should error out
WITH cte AS (
SELECT * FROM users_table WHERE user_id = 1 ORDER BY value_1
)
SELECT
cte.user_id, cte.time, events_table.event_type
FROM
cte
LEFT JOIN
events_table ON cte.user_id = events_table.user_id
ORDER BY
1,2,3
LIMIT
5;
user_id | time | event_type
---------+---------------------------------+------------
1 | Wed Nov 22 22:51:43.132261 2017 | 0
1 | Wed Nov 22 22:51:43.132261 2017 | 0
1 | Wed Nov 22 22:51:43.132261 2017 | 1
1 | Wed Nov 22 22:51:43.132261 2017 | 1
1 | Wed Nov 22 22:51:43.132261 2017 | 2
(5 rows)
```
In this commit, we're introducing a way to prevent CTE inlining via a GUC.
The GUC is used in all the tests where PG 11 and PG 12 tests would diverge
otherwise.
Note that, in PG 12, the restriction information for CTEs are generated. It
means that for some queries involving CTEs, Citus planner (router planner/
pushdown planner) may behave differently. So, via the GUC, we prevent
tests to diverge on PG 11 vs PG 12.
When we drop PG 11 support, we should get rid of the GUC, and mark
relevant ctes as MATERIALIZED, which does the same thing.
These set of tests has changed in both PG 11 and PG 12.
The changes are only about CTE inlining kicking in both
versions, and yielding the exact same distributed planning.
With this commit, we're adding the specific tests for CTE inlining.
The test has a different output file for pg 11, because as mentioned
in the previous commits, PG 12 generates more restriction information
for CTEs.
The idea is simple: Inline CTEs(if any), try distributed planning.
If the planning yields a successful distributed plan, simply return
it.
If the planning fails, fallback to distributed planning on the query
tree where CTEs are not inlined. In that case, if the planning failed
just because of the CTE inlining, via recursive planning, the same
query would yield a successful plan.
A very basic set of examples:
WITH cte_1 AS (SELECT * FROM test_table)
SELECT
*, row_number() OVER ()
FROM
cte_1;
or
WITH a AS (SELECT * FROM test_table),
b AS (SELECT * FROM test_table)
SELECT * FROM a JOIN b ON (a.value> b.value);
With this commit we add the necessary Citus function to inline CTEs
in a queryTree.
You might ask, why do we need to inline CTEs if Postgres is already
going to do it?
Few reasons behind this decision:
- One techinal node here is that Citus does the recursive CTE planning
by checking the originalQuery which is the query that has not gone
through the standard_planner().
CTEs in Citus is super powerful. It is practically key for full SQL
coverage for multi-shard queries. With CTEs, you can always reduce
any query multi-shard query into a router query via recursive
planning (thus full SQL coverage).
We cannot let CTE inlining break that. The main idea is Citus should
be able to retry planning if anything goes after CTE inlining.
So, by taking ownership of CTE inlining on the originalQuery, Citus
can fallback to recursive planning of CTEs if the planning with the
inlined query fails. It could have been a lot harder if we had relied
on standard_planner() to have the inlined CTEs on the original query.
- We want to have this feature in PostgreSQL 11 as well, but Postgres
only inlines in version 12
All the code in this commit is direct copy & paste from Postgres
source code.
We can classify the copy&paste code into two:
- Copy paste from CTE inline patch from postgres
(https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=608b167f9f9c4553c35bb1ec0eab9ddae643989b)
These include the functions inline_cte(), inline_cte_walker(),
contain_dml(), contain_dml_walker().
It also include the code in function PostgreSQLCTEInlineCondition().
We prefer to extract that code into a seperate function, because
(a) we'll re-use the logic later (b) we added one check for PG_11
Finally, the struct "inline_cte_walker_context" is also copied from
the same Postgres commit.
- Copy paste from the other parts of the Postgres code
In order to implement CTE inlining in Postgres 12, the hackers
modified the query_tree_walker()/range_table_walker() with the
18c0da88a5
Since Citus needs to support the same logic in PG 11, we copy & pasted
that functions (and related flags) with the names pg_12_query_tree_walker()
and pg_12_range_table_walker()
In two places I've made code more straight forward by using ROUTINE in our own codegen
Two changes which may seem extraneous:
AppendFunctionName was updated to not use pg_get_function_identity_arguments.
This is because that function includes ORDER BY when printing an aggregate like my_rank.
While ALTER AGGREGATE my_rank(x "any" ORDER BY y "any") is accepted by postgres,
ALTER ROUTINE my_rank(x "any" ORDER BY y "any") is not.
Tests were updated to use macaddr over integer. Using integer is flaky, our logic
could sometimes end up on tables like users_table. I originally wanted to use money,
but money isn't hashable.