Commit Graph

1245 Commits (1ca108fedae270ca97eec3046d70be3cced926f6)

Author SHA1 Message Date
Philip Dubé 40ce531850 Update diff-filter to handle lines removed by normalization
Add a script to test our diff logic

pg_regress_multi updated to rely on $PATH having copy_modified to fix testing with VPATH
2020-01-28 15:39:40 +00:00
Jelte Fennema b9eee70fa5 Fix random output ordering in CTE inlining test (#3434) 2020-01-27 16:38:27 +01:00
Jelte Fennema c38446b5f5 Replace denormalized test output with normalized at the end of the run 2020-01-24 11:42:38 +01:00
Önder Kalacı 4519d3411d
Improve the representation of used sub plans (#3411)
Previously, we've identified the usedSubPlans by only looking
to the subPlanId.

With this commit, we're expanding it to also include information
on the location of the subPlan.

This is useful to distinguish the cases where the subPlan is used
either on only HAVING or both HAVING and any other part of the query.
2020-01-24 10:47:14 +01:00
Philip Dubé 69dde460de See what flaky multi_extension test is doing with roles 2020-01-23 21:50:40 +00:00
Philip Dubé 5e55a36172 Avoid obscuring regression test diffs with normalization
First, diff is updated to not update the files in-place

For some reason diff is being called multiple times,
so $file1.unmodified becomes normalized on second invocation

Secondly, diff-filter updates output to come from the unmodified version

Normalization is serving two purposes:
- avoid diff noise in regressions
- avoid diff noise in commits when expected result is updated

The first purpose only wants to reduce the lines which diff registers,
whereas the second wants those changes to be committed
2020-01-23 18:51:23 +00:00
Önder Kalacı ef7d1ea91d
Locally execute queries that don't need any data access (#3410)
* Update shardPlacement->nodeId to uint

As the source of the shardPlacement->nodeId is always workerNode->nodeId,
and that is uint32.

We had this hack because of: 0ea4e52df5 (r266421409)

And, that is gone with: 90056f7d3c (diff-c532177d74c72d3f0e7cd10e448ab3c6L1123)

So, we're safe to do it now.

* Relax the restrictions on using the local execution

Previously, whenever any local execution happens, we disabled further
commands to do any remote queries. The basic motivation for doing that
is to prevent any accesses in the same transaction block to access the
same placements over multiple sessions: one is local session the other
is remote session to the same placement.

However, the current implementation does not distinguish local accesses
being to a placement or not. For example, we could have local accesses
that only touches intermediate results. In that case, we should not
implement the same restrictions as they become useless.

So, this is a pre-requisite for executing the intermediate result only
queries locally.

* Update the error messages

As the underlying implementation has changed, reflect it in the error
messages.

* Keep track of connections to local node

With this commit, we're adding infrastructure to track if any connection
to the same local host is done or not.

The main motivation for doing this is that we've previously were more
conservative about not choosing local execution. Simply, we disallowed
local execution if any connection to any remote node is done. However,
if we want to use local execution for intermediate result only queries,
this'd be annoying because we expect all queries to touch remote node
before the final query.

Note that this approach is still limiting in Citus MX case, but for now
we can ignore that.

* Formalize the concept of Local Node

Also some minor refactoring while creating the dummy placement

* Write intermediate results locally when the results are only needed locally

Before this commit, Citus used to always broadcast all the intermediate
results to remote nodes. However, it is possible to skip pushing
the results to remote nodes always.

There are two notable cases for doing that:

   (a) When the query consists of only intermediate results
   (b) When the query is a zero shard query

In both of the above cases, we don't need to access any data on the shards. So,
it is a valuable optimization to skip pushing the results to remote nodes.

The pattern mentioned in (a) is actually a common patterns that Citus users
use in practice. For example, if you have the following query:

WITH cte_1 AS (...), cte_2 AS (....), ... cte_n (...)
SELECT ... FROM cte_1 JOIN cte_2 .... JOIN cte_n ...;

The final query could be operating only on intermediate results. With this patch,
the intermediate results of the ctes are not unnecessarily pushed to remote
nodes.

* Add specific regression tests

As there are edge cases in Citus MX and with round-robin policy,
use the same queries on those cases as well.

* Fix failure tests

By forcing not to use local execution for intermediate results since
all the tests expects the results to be pushed remotely.

* Fix flaky test

* Apply code-review feedback

Mostly style changes

* Limit the max value of pg_dist_node_seq to reserve for internal use
2020-01-23 18:28:34 +01:00
Hadi Moshayedi be647ad944 Output filenames in ensure_no_intermediate_data_leak
This can helpful in guiding us where to look when this test fails.
For example, if the result file has repartitioned_results_ prefix,
then we need to look into repartitioned insert/select. Otherwise
it is probably a CTE or a subquery.
2020-01-22 11:12:16 -08:00
Jelte Fennema cd5259a25a
Do not place new shards with shards in TO_DELETE state (#3408)
When creating a new distributed table. The shards would colocate with shards
with SHARD_STATE_TO_DELETE (shardstate = 4). This means if that state was
because of a shard move the new shard would be created on two nodes and it
would not get deleted since it's shard state would be 1.
2020-01-22 14:52:12 +01:00
Halil Ozan Akgul b40f067d05 Adds propagation for grant on schema commands 2020-01-20 14:51:28 +03:00
Onder Kalaci fd17e4578e Improve tests 2020-01-17 16:02:57 +01:00
Onder Kalaci 0bf1e81e33 Cache local plans on BeginScan 2020-01-17 16:02:57 +01:00
Onder Kalaci 016f561e45 Ingest data for cte_inline tests 2020-01-17 12:46:00 +01:00
Onder Kalaci 3833a7e686 Fix issues for CTE inlining on Postgres 11
Comment from code:

/*
 * We had to implement this hack because on Postgres11 and below, the originalQuery
 * and the query would have significant differences in terms of CTEs where CTEs
 * would not be inlined on the query (as standard_planner() wouldn't inline CTEs
 * on PG 11 and below).
 *
 * Instead, we prefer to pass the inlined query to the distributed planning. We rely
 * on the fact that the query includes subqueries, and it'd definitely go through
 * query pushdown planning. During query pushdown planning, the only relevant query
 * tree is the original query.
 */
2020-01-17 11:59:02 +01:00
Jelte Fennema 246435be7e
Lazy query deparsing executable queries (#3350)
Deparsing and parsing a query can be heavy on CPU. When locally executing 
the query we don't need to do this in theory most of the time.

This PR is the first step in allowing to skip deparsing and parsing
the query in these cases, by lazily creating the query string and
storing the query in the task. Future commits will make use of this and
not deparse and parse the query anymore, but use the one from the task
directly.
2020-01-17 11:49:43 +01:00
Hadi Moshayedi 6cf1c01660 Don't use repartitioned INSERT/SELECT for repartition joins 2020-01-16 23:40:31 -08:00
Hadi Moshayedi 5eeb07124f Repartitioned INSERT/SELECT: include job id in result id prefix 2020-01-16 23:24:52 -08:00
Hadi Moshayedi a079278b0c Repartitioned INSERT/SELECT: Add a GUC to enable/disable it 2020-01-16 23:24:52 -08:00
Hadi Moshayedi ce5eea4885 INSERT/SELECT: make SELECT column names unique 2020-01-16 23:24:52 -08:00
Hadi Moshayedi 3258d87f3e Isolation tests for INSERT/SELECT repartition 2020-01-16 23:24:52 -08:00
Hadi Moshayedi 8b27a9a195 More range partitioned tests 2020-01-16 23:24:52 -08:00
Hadi Moshayedi 8635396cea Repartitioned INSERT/SELECT: Test rollback behaviour 2020-01-16 23:24:52 -08:00
Hadi Moshayedi 43218eebf6 Failure tests for INSERT/SELECT repartition 2020-01-16 23:24:52 -08:00
Hadi Moshayedi 665b33dca1 MX tests for INSERT/SELECT repartition 2020-01-16 23:24:52 -08:00
Hadi Moshayedi af2349f21f Repartitioned INSERT/SELECT: Add a prepared statement test 2020-01-16 23:24:52 -08:00
Hadi Moshayedi 97072c9eb1 INSERT/SELECT: show method in EXPLAIN output 2020-01-16 23:24:52 -08:00
Hadi Moshayedi b143d9588a Repartitioned INSERT/SELECT: Test GROUP BY 2020-01-16 23:24:52 -08:00
Hadi Moshayedi fe548b762f Repartitioned INSERT/SELECT: Test CTEs 2020-01-16 23:24:52 -08:00
Hadi Moshayedi 494cc383cc Repartitioned INSERT/SELECT: Enable RETURNING 2020-01-16 23:24:52 -08:00
Hadi Moshayedi 4b14347fc3 Tests for DML followed by insert/select repartition 2020-01-16 23:24:52 -08:00
Hadi Moshayedi 44a2aede16 Don't start a coordinated transaction on workers.
Otherwise transaction hooks of Citus kick in and might cause unwanted errors.
2020-01-16 23:24:52 -08:00
Hadi Moshayedi 42c3c03b85 Handle extra columns added in ExpandWorkerTargetEntry() in repartitioned INSERT/SELECT 2020-01-16 23:24:52 -08:00
Hadi Moshayedi 89463f9760 Repartitioned INSERT/SELECT: cast columns in SELECT targets 2020-01-16 23:24:52 -08:00
Hadi Moshayedi d67a384350 Enable repartitioned INSERT/SELECT ON CONFLICT. 2020-01-16 23:24:52 -08:00
Hadi Moshayedi b4e5f4b10a Implement INSERT ... SELECT with repartitioning 2020-01-16 23:24:52 -08:00
Hadi Moshayedi e30580e2bd Add ORDER BY to multi_row_insert.sql 2020-01-16 15:20:39 -08:00
Jelte Fennema 0ee1eab070 Make tests fail with a useful error message 2020-01-16 18:30:30 +01:00
Jelte Fennema cb5154cf03 Add more failing tests, of which some have bad error messages 2020-01-16 18:30:30 +01:00
Onder Kalaci dc17c2658e Defer shard pruning for fast-path router queries to execution
This is purely to enable better performance with prepared statements.
Before this commit, the fast path queries with prepared statements
where the distribution key includes a parameter always went through
distributed planning. After this change, we only go through distributed
planning on the first 5 executions.
2020-01-16 16:59:36 +01:00
Halil Ozan Akgul c5539d20d9 Adds alter table schema propagation 2020-01-16 17:04:16 +03:00
Nils Dijk b6e09eb691
Fix: distributed function with table reference in declare (#3384)
DESCRIPTION: Fixes a problem when adding a new node due to tables referenced in a functions body

Fixes #3378 

It was reported that `master_add_node` would fail if a distributed function has a table name referenced in its declare section of the body. By default postgres validates the body of a function on creation. This is not a problem in the normal case as tables are replicated to the workers when we distribute functions.

However when a new node is added we first create dependencies on the workers before we try to create any tables, and the original tables get created out of bound when the metadata gets synced to the new node. This causes the function body validator to raise an error the table is not on the worker.

To mitigate this issue we set `check_function_bodies` to `off` right before we are creating the function.

The added test shows this does resolve the issue. (issue can be reproduced on the commit without the fix)
2020-01-16 14:21:54 +01:00
Jelte Fennema e76281500c
Replace shardId lock with lock on colocation+shardIntervalIndex (#3374)
This new locking pattern makes sure that some deadlocks that could
happend during rebalancing cannot occur anymore.
2020-01-16 13:14:01 +01:00
Jelte Fennema 86343bcc8f Re-add test that broke with GUC workaround 2020-01-16 12:34:50 +01:00
Jelte Fennema 6b9b633695 Add more tests for prepared statements 2020-01-16 12:28:15 +01:00
Jelte Fennema 43a3fdd12f Fix comment 2020-01-16 12:28:15 +01:00
Jelte Fennema fe3827e499 Add tests for [NOT] MATERIALEZED 2020-01-16 12:28:15 +01:00
Onder Kalaci 326dfab44a Fix a query which triggers an existing bug, see https://github.com/citusdata/citus/issues/3189#issuecomment-571497051 2020-01-16 12:28:15 +01:00
Onder Kalaci c653923960 Update regression tests 6
Local execution and CTE pushdown
2020-01-16 12:28:15 +01:00
Onder Kalaci 3818be45a6 Update regression tests-5
Failure tests that rely on intermediate results
2020-01-16 12:28:15 +01:00
Onder Kalaci 1e85938b46 Update regression tests-4
Update the MX tests. Similar to the previous commits, prevent CTE
inlining in some cases to prevent divergent test outputs.
2020-01-16 12:28:15 +01:00