Commit Graph

5883 Commits (release-11.0-ahmet)

Author SHA1 Message Date
Philip Dubé fdcc413559 Code cleanup of adaptive_executor, connection_management, placement_connection
adaptive_executor: sort includes, use foreach_ptr, remove lies from FinishDistributedExecution docs
connection_management: rename msecs, which isn't milliseconds
placement_connection: small typos
2020-01-17 17:44:47 +00:00
Önder Kalacı 5f34399e1f
Merge pull request #3388 from citusdata/local_prepared_on_top_lazy_deparse
Cache local plans on shards for Citus MX
2020-01-17 17:17:41 +01:00
Onder Kalaci 2f0ef8bc36 Apply feedback 1 2020-01-17 16:06:04 +01:00
Onder Kalaci fd17e4578e Improve tests 2020-01-17 16:02:57 +01:00
Onder Kalaci 0bf1e81e33 Cache local plans on BeginScan 2020-01-17 16:02:57 +01:00
Onder Kalaci 08d148d43e Make TaskAccessesLocalNode external function 2020-01-17 16:02:57 +01:00
Onder Kalaci 5dc454cdad Exclude localPlannedStatements from copy distributedPlan 2020-01-17 16:02:57 +01:00
Onder Kalaci ff12df411b Add LocalPlannedStatement struct 2020-01-17 16:02:57 +01:00
Önder Kalacı 4b5241c7b2
Merge pull request #3397 from citusdata/cte_inline_pg_11
Fix issues for CTE inlining on Postgres 11
2020-01-17 14:39:21 +01:00
Onder Kalaci 016f561e45 Ingest data for cte_inline tests 2020-01-17 12:46:00 +01:00
Onder Kalaci 3833a7e686 Fix issues for CTE inlining on Postgres 11
Comment from code:

/*
 * We had to implement this hack because on Postgres11 and below, the originalQuery
 * and the query would have significant differences in terms of CTEs where CTEs
 * would not be inlined on the query (as standard_planner() wouldn't inline CTEs
 * on PG 11 and below).
 *
 * Instead, we prefer to pass the inlined query to the distributed planning. We rely
 * on the fact that the query includes subqueries, and it'd definitely go through
 * query pushdown planning. During query pushdown planning, the only relevant query
 * tree is the original query.
 */
2020-01-17 11:59:02 +01:00
Jelte Fennema 246435be7e
Lazy query deparsing executable queries (#3350)
Deparsing and parsing a query can be heavy on CPU. When executing the
query locally, in theory we don't need to do this most of the time.

This PR is the first step toward skipping the deparsing and parsing of
the query in these cases, by lazily creating the query string and
storing the query in the task. Future commits will make use of this and
no longer deparse and parse the query, but use the one from the task
directly.
2020-01-17 11:49:43 +01:00
Hadi Moshayedi 60a2bc5ec2
Merge pull request #3376 from citusdata/insert_select
INSERT...SELECT with re-partitioning
2020-01-17 01:36:36 -08:00
Hadi Moshayedi 6cf1c01660 Don't use repartitioned INSERT/SELECT for repartition joins 2020-01-16 23:40:31 -08:00
Hadi Moshayedi 5eeb07124f Repartitioned INSERT/SELECT: include job id in result id prefix 2020-01-16 23:24:52 -08:00
Hadi Moshayedi a079278b0c Repartitioned INSERT/SELECT: Add a GUC to enable/disable it 2020-01-16 23:24:52 -08:00
Hadi Moshayedi ce5eea4885 INSERT/SELECT: make SELECT column names unique 2020-01-16 23:24:52 -08:00
Hadi Moshayedi 3258d87f3e Isolation tests for INSERT/SELECT repartition 2020-01-16 23:24:52 -08:00
Hadi Moshayedi 8b27a9a195 More range partitioned tests 2020-01-16 23:24:52 -08:00
Hadi Moshayedi 8635396cea Repartitioned INSERT/SELECT: Test rollback behaviour 2020-01-16 23:24:52 -08:00
Hadi Moshayedi 43218eebf6 Failure tests for INSERT/SELECT repartition 2020-01-16 23:24:52 -08:00
Hadi Moshayedi 665b33dca1 MX tests for INSERT/SELECT repartition 2020-01-16 23:24:52 -08:00
Hadi Moshayedi af2349f21f Repartitioned INSERT/SELECT: Add a prepared statement test 2020-01-16 23:24:52 -08:00
Hadi Moshayedi 97072c9eb1 INSERT/SELECT: show method in EXPLAIN output 2020-01-16 23:24:52 -08:00
Hadi Moshayedi b143d9588a Repartitioned INSERT/SELECT: Test GROUP BY 2020-01-16 23:24:52 -08:00
Hadi Moshayedi fe548b762f Repartitioned INSERT/SELECT: Test CTEs 2020-01-16 23:24:52 -08:00
Hadi Moshayedi 494cc383cc Repartitioned INSERT/SELECT: Enable RETURNING 2020-01-16 23:24:52 -08:00
Hadi Moshayedi 4b14347fc3 Tests for DML followed by insert/select repartition 2020-01-16 23:24:52 -08:00
Hadi Moshayedi 44a2aede16 Don't start a coordinated transaction on workers.
Otherwise transaction hooks of Citus kick in and might cause unwanted errors.
2020-01-16 23:24:52 -08:00
Hadi Moshayedi 42c3c03b85 Handle extra columns added in ExpandWorkerTargetEntry() in repartitioned INSERT/SELECT 2020-01-16 23:24:52 -08:00
Hadi Moshayedi 89463f9760 Repartitioned INSERT/SELECT: cast columns in SELECT targets 2020-01-16 23:24:52 -08:00
Hadi Moshayedi d67a384350 Enable repartitioned INSERT/SELECT ON CONFLICT. 2020-01-16 23:24:52 -08:00
Hadi Moshayedi b4e5f4b10a Implement INSERT ... SELECT with repartitioning 2020-01-16 23:24:52 -08:00
Hadi Moshayedi ced876358d INSERT/SELECT: Refactor out AddInsertSelectCasts 2020-01-16 23:24:52 -08:00
Hadi Moshayedi d449c1857c INSERT/SELECT: Use ExecutePlan* instead of ExecuteSelect* 2020-01-16 23:24:52 -08:00
Philip Dubé a53b844939
Merge pull request #3393 from citusdata/order_by_multirow_insert
Add ORDER BY to multi_row_insert.sql
2020-01-17 00:31:32 +00:00
Hadi Moshayedi e30580e2bd Add ORDER BY to multi_row_insert.sql 2020-01-16 15:20:39 -08:00
Jelte Fennema 062bda29fb
Fix bug causing errors when planning a query with multiple subq… (#3389)
Our checks to find subqueries in the rewritten query were not sufficient. When
multiple subqueries are present in the original query and some would be
replaced by a join, we could miss other subqueries that were not rewritten.
This in turn caused us not to go into the subquery planner, causing some
queries that were planning fine before to suddenly not plan anymore.

This was a regression introduced by #3171.
2020-01-16 19:01:13 +01:00
Jelte Fennema 0ee1eab070 Make tests fail with a useful error message 2020-01-16 18:30:30 +01:00
Jelte Fennema cb5154cf03 Add more failing tests, of which some have bad error messages 2020-01-16 18:30:30 +01:00
Marco Slot 82f1fffa28 Fix epoll_ctl() error message on connection error 2020-01-16 06:40:57 +01:00
Önder Kalacı 89d5bed88d
Merge pull request #3369 from citusdata/move_fast_path_pruning_to_executor
Defer shard pruning for fast-path router queries to execution
2020-01-16 17:35:33 +01:00
Onder Kalaci dc17c2658e Defer shard pruning for fast-path router queries to execution
This is purely to enable better performance with prepared statements.
Before this commit, the fast path queries with prepared statements
where the distribution key includes a parameter always went through
distributed planning. After this change, we only go through distributed
planning on the first 5 executions.
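
As a rough illustration of the kind of query this change targets, here is a minimal sketch of a fast-path prepared statement whose distribution key is a parameter. The table `fast_path_demo` and the statement name `get_value` are hypothetical names chosen for this sketch; the threshold of five executions is taken from the description above.

```SQL
-- Hypothetical table, distributed on "key" (names are illustrative only)
CREATE TABLE fast_path_demo (key int, value int);
SELECT create_distributed_table('fast_path_demo', 'key');

-- The distribution key is compared against a parameter
PREPARE get_value(int) AS
  SELECT value FROM fast_path_demo WHERE key = $1;

-- Per the description above, the first 5 executions still go through
-- distributed planning
EXECUTE get_value(1);
EXECUTE get_value(2);
EXECUTE get_value(3);
EXECUTE get_value(4);
EXECUTE get_value(5);

-- Later executions are expected to defer shard pruning to execution time
EXECUTE get_value(6);
```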
2020-01-16 16:59:36 +01:00
Onder Kalaci 933d666c0d Do not forget to copy fastPathRouterPlan@DistributedPlan 2020-01-16 16:39:20 +01:00
Halil Ozan Akgül 023f40ca60
Merge pull request #3373 from citusdata/alter_table_schema_propagation
Adds alter table schema propagation
2020-01-16 17:18:01 +03:00
Halil Ozan Akgul c5539d20d9 Adds alter table schema propagation 2020-01-16 17:04:16 +03:00
Nils Dijk b6e09eb691
Fix: distributed function with table reference in declare (#3384)
DESCRIPTION: Fixes a problem when adding a new node due to tables referenced in a function's body

Fixes #3378 

It was reported that `master_add_node` would fail if a distributed function has a table name referenced in the declare section of its body. By default, Postgres validates the body of a function on creation. This is not a problem in the normal case, as tables are replicated to the workers when we distribute functions.

However, when a new node is added, we first create dependencies on the workers before we try to create any tables, and the original tables get created out of band when the metadata gets synced to the new node. This causes the function body validator to raise an error because the table is not yet on the worker.

To mitigate this issue, we set `check_function_bodies` to `off` right before creating the function.

The added test shows this does resolve the issue. (issue can be reproduced on the commit without the fix)
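
The mitigation can be sketched in plain PostgreSQL as follows. This is an illustration under stated assumptions, not the code path Citus itself runs: the function `count_rows` and the table `missing_table` are hypothetical, and the table is assumed not to exist yet on the node where the function is created.

```SQL
-- Skip body validation so that CREATE FUNCTION does not try to resolve
-- the table referenced in the DECLARE section
SET check_function_bodies TO off;

CREATE FUNCTION count_rows() RETURNS bigint AS $$
DECLARE
  r missing_table%ROWTYPE;  -- hypothetical table that does not exist yet
BEGIN
  RETURN 0;
END;
$$ LANGUAGE plpgsql;

-- Restore the session default afterwards
RESET check_function_bodies;
```

With `check_function_bodies` left at its default of `on`, the same `CREATE FUNCTION` would be expected to fail because the relation cannot be found.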
2020-01-16 14:21:54 +01:00
Jelte Fennema e76281500c
Replace shardId lock with lock on colocation+shardIntervalIndex (#3374)
This new locking pattern makes sure that some deadlocks that could
happen during rebalancing cannot occur anymore.
2020-01-16 13:14:01 +01:00
Jelte Fennema 86876c0473
CTE pushdown via CTE inlining in distributed planning (#3161)
Before this patch, Citus used to always recursively plan CTEs.
In PostgreSQL 12, there is [logic](https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=608b167f9f9c4553c35bb1ec0eab9ddae643989b) for inlining CTEs, which basically converts certain CTEs to subqueries.

With this patch, Citus becomes capable of doing the same and no longer
needs to recursively plan all the CTEs. Instead, the pushdown-able ones are
simply converted to subquery pushdown. If the inlined CTE query cannot
be pushed down, it simply follows the recursive planning logic.

See an example below:

```SQL
-- the query that users pass
WITH some_users AS
(SELECT
        users_table.user_id
FROM
 users_table JOIN events_table USING (user_id) WHERE event_type = 5)
SELECT count(*) FROM users_table JOIN some_users USING (user_id);

-- worker query
SELECT count(*) AS COUNT
FROM ((users_table_102039 users_table
       JOIN users_table_102039 users_table_1 ON ((users_table_1.user_id OPERATOR(pg_catalog.=) users_table.user_id)))
      JOIN events_table_102071 events_table ON ((users_table.user_id OPERATOR(pg_catalog.=) events_table.user_id)))
WHERE (events_table.event_type OPERATOR(pg_catalog.=) 5)
```
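
For reference, plain PostgreSQL 12 and later lets the query author steer this inlining with the `MATERIALIZED` keyword. The sketch below reuses `test_table` from the examples in this message. Whether Citus treats a `MATERIALIZED` CTE the same way is an assumption here, not something this description states.

```SQL
-- Requires PostgreSQL 12 or later.
-- A non-recursive, side-effect-free CTE that is referenced once is
-- eligible for inlining by the Postgres planner.
WITH cte_1 AS (SELECT * FROM test_table)
SELECT count(*) FROM cte_1;

-- MATERIALIZED asks the planner to compute the CTE separately
-- instead of inlining it into the outer query.
WITH cte_1 AS MATERIALIZED (SELECT * FROM test_table)
SELECT count(*) FROM cte_1;
```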

There are a few things to call out for future reference and
to help the reviewer(s) understand the patch more easily:

1) On top of Postgres' restrictions on inlining CTEs, Citus enforces one more.
This is to prevent regressing on the SQL support. For example, Postgres
considers the following CTE OK to inline. However, if it is inlined, Citus cannot
plan the whole query, so we prefer to skip inlining that CTE:

```SQL
-- Citus should not inline this CTE because otherwise it cannot
-- plan the query
WITH cte_1 AS (SELECT * FROM test_table) 
SELECT 
	*, row_number() OVER () 
FROM 
	cte_1;
```

2) Some exotic queries with multiple colocation groups involved
   could become repartition joins. Basically, after the CTE inlining
   happens, ShouldRecursivelyPlanNonColocatedSubqueries() fails to
   detect that the query is a non-colocated subquery. We should improve
   that check to fix it. But since we fall back to planning again, the query is
   successfully executed by Citus.
```SQL
SET citus.shard_count TO 4;
CREATE TABLE  colocation_1 (key int, value int);
SELECT create_distributed_table('colocation_1', 'key');

SET citus.shard_count TO 8;
CREATE TABLE  colocation_2 (key int, value int);
SELECT create_distributed_table('colocation_2', 'key');

-- this used to work because the cte was recursively planned
-- now the cte becomes a repartition join since
    --- (a) the cte is replaced by a subquery
    --- (b) since the subquery is very simple, postgres pulled it up to become
    ---     a simple join
WITH cte AS (SELECT * FROM colocation_1)
SELECT count(*) FROM cte JOIN colocation_2 USING (key);
...
message: the query contains a join that requires repartitioning
detail: 
hint: Set citus.enable_repartition_joins to on to enable repartitioning
...
┌───────┐
│ count │
├───────┤
│     0 │
└───────┘
(1 row)
```
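
The hint in the output above names the relevant setting. A minimal sketch of enabling it for the session before re-running the same query, assuming the `colocation_1` and `colocation_2` tables created above:

```SQL
-- Allow Citus to plan the join as a repartition join directly
SET citus.enable_repartition_joins TO on;

WITH cte AS (SELECT * FROM colocation_1)
SELECT count(*) FROM cte JOIN colocation_2 USING (key);
```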

3) We decided to inline CTEs even after the standard planner has run.
In Postgres 12+, the restriction information for CTEs is generated
because the CTEs are actually treated as subqueries via Postgres' inlining
capabilities.

In Postgres 11 and below, the restriction information is not generated for CTEs.
Because of that, some queries work differently on PG 11 vs PG 12. To see such
queries, see the cte_inline.sql file, which has two output files.

4) As a side-effect of (2), we're now able to inline CTEs for INSERT .. SELECT
queries as well. Postgres prevents it, but I cannot see a reason to do so. With
this capability, some of the INSERT ... SELECT queries where the CTE is in the
SELECT query could become pushdownable. See an example:

```SQL
INSERT INTO test_table
WITH first_table_cte AS
  (SELECT * FROM test_table)
    SELECT
      key, value
    FROM
      first_table_cte;
```

5) A new class of queries can now be supported. Previously, if a CTE was used in the
outer part of an outer join, Citus would complain about it.

So, the following query:

```SQL
WITH cte AS (
  SELECT * FROM users_table WHERE user_id = 1 ORDER BY value_1
)
SELECT
  cte.user_id, cte.time, events_table.event_type
FROM
  cte
LEFT JOIN
  events_table ON cte.user_id = events_table.user_id
ORDER BY
  1,2,3
LIMIT
  5;
ERROR:  cannot pushdown the subquery
DETAIL:  Complex subqueries and CTEs cannot be in the outer part of the outer join
```

Becomes
```SQL
-- cte LEFT JOIN distributed_table no longer errors out
WITH cte AS (
  SELECT * FROM users_table WHERE user_id = 1 ORDER BY value_1
)
SELECT
  cte.user_id, cte.time, events_table.event_type
FROM
  cte
LEFT JOIN
  events_table ON cte.user_id = events_table.user_id
ORDER BY
  1,2,3
LIMIT
  5;
 user_id |              time               | event_type
---------+---------------------------------+------------
       1 | Wed Nov 22 22:51:43.132261 2017 |          0
       1 | Wed Nov 22 22:51:43.132261 2017 |          0
       1 | Wed Nov 22 22:51:43.132261 2017 |          1
       1 | Wed Nov 22 22:51:43.132261 2017 |          1
       1 | Wed Nov 22 22:51:43.132261 2017 |          2
(5 rows)
```
2020-01-16 12:43:48 +01:00
Jelte Fennema 86343bcc8f Re-add test that broke with GUC workaround 2020-01-16 12:34:50 +01:00