Commit Graph

1314 Commits (61bc47e6a8339fc15b7caed631f72d5db06403b5)

Author SHA1 Message Date
Önder Kalacı ef7d1ea91d
Locally execute queries that don't need any data access (#3410)
* Update shardPlacement->nodeId to uint

As the source of the shardPlacement->nodeId is always workerNode->nodeId,
and that is uint32.

We had this hack because of: 0ea4e52df5 (r266421409)

And, that is gone with: 90056f7d3c (diff-c532177d74c72d3f0e7cd10e448ab3c6L1123)

So, we're safe to do it now.

* Relax the restrictions on using the local execution

Previously, whenever any local execution happens, we disabled further
commands to do any remote queries. The basic motivation for doing that
is to prevent any accesses in the same transaction block to access the
same placements over multiple sessions: one is local session the other
is remote session to the same placement.

However, the current implementation does not distinguish local accesses
being to a placement or not. For example, we could have local accesses
that only touches intermediate results. In that case, we should not
implement the same restrictions as they become useless.

So, this is a pre-requisite for executing the intermediate result only
queries locally.

* Update the error messages

As the underlying implementation has changed, reflect it in the error
messages.

* Keep track of connections to local node

With this commit, we're adding infrastructure to track if any connection
to the same local host is done or not.

The main motivation for doing this is that we've previously were more
conservative about not choosing local execution. Simply, we disallowed
local execution if any connection to any remote node is done. However,
if we want to use local execution for intermediate result only queries,
this'd be annoying because we expect all queries to touch remote node
before the final query.

Note that this approach is still limiting in Citus MX case, but for now
we can ignore that.

* Formalize the concept of Local Node

Also some minor refactoring while creating the dummy placement

* Write intermediate results locally when the results are only needed locally

Before this commit, Citus used to always broadcast all the intermediate
results to remote nodes. However, it is possible to skip pushing
the results to remote nodes always.

There are two notable cases for doing that:

   (a) When the query consists of only intermediate results
   (b) When the query is a zero shard query

In both of the above cases, we don't need to access any data on the shards. So,
it is a valuable optimization to skip pushing the results to remote nodes.

The pattern mentioned in (a) is actually a common patterns that Citus users
use in practice. For example, if you have the following query:

WITH cte_1 AS (...), cte_2 AS (....), ... cte_n (...)
SELECT ... FROM cte_1 JOIN cte_2 .... JOIN cte_n ...;

The final query could be operating only on intermediate results. With this patch,
the intermediate results of the ctes are not unnecessarily pushed to remote
nodes.

* Add specific regression tests

As there are edge cases in Citus MX and with round-robin policy,
use the same queries on those cases as well.

* Fix failure tests

By forcing not to use local execution for intermediate results since
all the tests expects the results to be pushed remotely.

* Fix flaky test

* Apply code-review feedback

Mostly style changes

* Limit the max value of pg_dist_node_seq to reserve for internal use
2020-01-23 18:28:34 +01:00
Hadi Moshayedi be647ad944 Output filenames in ensure_no_intermediate_data_leak
This can helpful in guiding us where to look when this test fails.
For example, if the result file has repartitioned_results_ prefix,
then we need to look into repartitioned insert/select. Otherwise
it is probably a CTE or a subquery.
2020-01-22 11:12:16 -08:00
Jelte Fennema cd5259a25a
Do not place new shards with shards in TO_DELETE state (#3408)
When creating a new distributed table. The shards would colocate with shards
with SHARD_STATE_TO_DELETE (shardstate = 4). This means if that state was
because of a shard move the new shard would be created on two nodes and it
would not get deleted since it's shard state would be 1.
2020-01-22 14:52:12 +01:00
Halil Ozan Akgul b40f067d05 Adds propagation for grant on schema commands 2020-01-20 14:51:28 +03:00
Onder Kalaci fd17e4578e Improve tests 2020-01-17 16:02:57 +01:00
Onder Kalaci 0bf1e81e33 Cache local plans on BeginScan 2020-01-17 16:02:57 +01:00
Onder Kalaci 016f561e45 Ingest data for cte_inline tests 2020-01-17 12:46:00 +01:00
Onder Kalaci 3833a7e686 Fix issues for CTE inlining on Postgres 11
Comment from code:

/*
 * We had to implement this hack because on Postgres11 and below, the originalQuery
 * and the query would have significant differences in terms of CTEs where CTEs
 * would not be inlined on the query (as standard_planner() wouldn't inline CTEs
 * on PG 11 and below).
 *
 * Instead, we prefer to pass the inlined query to the distributed planning. We rely
 * on the fact that the query includes subqueries, and it'd definitely go through
 * query pushdown planning. During query pushdown planning, the only relevant query
 * tree is the original query.
 */
2020-01-17 11:59:02 +01:00
Jelte Fennema 246435be7e
Lazy query deparsing executable queries (#3350)
Deparsing and parsing a query can be heavy on CPU. When locally executing 
the query we don't need to do this in theory most of the time.

This PR is the first step in allowing to skip deparsing and parsing
the query in these cases, by lazily creating the query string and
storing the query in the task. Future commits will make use of this and
not deparse and parse the query anymore, but use the one from the task
directly.
2020-01-17 11:49:43 +01:00
Hadi Moshayedi 6cf1c01660 Don't use repartitioned INSERT/SELECT for repartition joins 2020-01-16 23:40:31 -08:00
Hadi Moshayedi 5eeb07124f Repartitioned INSERT/SELECT: include job id in result id prefix 2020-01-16 23:24:52 -08:00
Hadi Moshayedi a079278b0c Repartitioned INSERT/SELECT: Add a GUC to enable/disable it 2020-01-16 23:24:52 -08:00
Hadi Moshayedi ce5eea4885 INSERT/SELECT: make SELECT column names unique 2020-01-16 23:24:52 -08:00
Hadi Moshayedi 3258d87f3e Isolation tests for INSERT/SELECT repartition 2020-01-16 23:24:52 -08:00
Hadi Moshayedi 8b27a9a195 More range partitioned tests 2020-01-16 23:24:52 -08:00
Hadi Moshayedi 8635396cea Repartitioned INSERT/SELECT: Test rollback behaviour 2020-01-16 23:24:52 -08:00
Hadi Moshayedi 43218eebf6 Failure tests for INSERT/SELECT repartition 2020-01-16 23:24:52 -08:00
Hadi Moshayedi 665b33dca1 MX tests for INSERT/SELECT repartition 2020-01-16 23:24:52 -08:00
Hadi Moshayedi af2349f21f Repartitioned INSERT/SELECT: Add a prepared statement test 2020-01-16 23:24:52 -08:00
Hadi Moshayedi 97072c9eb1 INSERT/SELECT: show method in EXPLAIN output 2020-01-16 23:24:52 -08:00
Hadi Moshayedi b143d9588a Repartitioned INSERT/SELECT: Test GROUP BY 2020-01-16 23:24:52 -08:00
Hadi Moshayedi fe548b762f Repartitioned INSERT/SELECT: Test CTEs 2020-01-16 23:24:52 -08:00
Hadi Moshayedi 494cc383cc Repartitioned INSERT/SELECT: Enable RETURNING 2020-01-16 23:24:52 -08:00
Hadi Moshayedi 4b14347fc3 Tests for DML followed by insert/select repartition 2020-01-16 23:24:52 -08:00
Hadi Moshayedi 44a2aede16 Don't start a coordinated transaction on workers.
Otherwise transaction hooks of Citus kick in and might cause unwanted errors.
2020-01-16 23:24:52 -08:00
Hadi Moshayedi 42c3c03b85 Handle extra columns added in ExpandWorkerTargetEntry() in repartitioned INSERT/SELECT 2020-01-16 23:24:52 -08:00
Hadi Moshayedi 89463f9760 Repartitioned INSERT/SELECT: cast columns in SELECT targets 2020-01-16 23:24:52 -08:00
Hadi Moshayedi d67a384350 Enable repartitioned INSERT/SELECT ON CONFLICT. 2020-01-16 23:24:52 -08:00
Hadi Moshayedi b4e5f4b10a Implement INSERT ... SELECT with repartitioning 2020-01-16 23:24:52 -08:00
Hadi Moshayedi e30580e2bd Add ORDER BY to multi_row_insert.sql 2020-01-16 15:20:39 -08:00
Jelte Fennema 0ee1eab070 Make tests fail with a useful error message 2020-01-16 18:30:30 +01:00
Jelte Fennema cb5154cf03 Add more failing tests, of which some have bad error messages 2020-01-16 18:30:30 +01:00
Onder Kalaci dc17c2658e Defer shard pruning for fast-path router queries to execution
This is purely to enable better performance with prepared statements.
Before this commit, the fast path queries with prepared statements
where the distribution key includes a parameter always went through
distributed planning. After this change, we only go through distributed
planning on the first 5 executions.
2020-01-16 16:59:36 +01:00
Halil Ozan Akgul c5539d20d9 Adds alter table schema propagation 2020-01-16 17:04:16 +03:00
Nils Dijk b6e09eb691
Fix: distributed function with table reference in declare (#3384)
DESCRIPTION: Fixes a problem when adding a new node due to tables referenced in a functions body

Fixes #3378 

It was reported that `master_add_node` would fail if a distributed function has a table name referenced in its declare section of the body. By default postgres validates the body of a function on creation. This is not a problem in the normal case as tables are replicated to the workers when we distribute functions.

However when a new node is added we first create dependencies on the workers before we try to create any tables, and the original tables get created out of bound when the metadata gets synced to the new node. This causes the function body validator to raise an error the table is not on the worker.

To mitigate this issue we set `check_function_bodies` to `off` right before we are creating the function.

The added test shows this does resolve the issue. (issue can be reproduced on the commit without the fix)
2020-01-16 14:21:54 +01:00
Jelte Fennema e76281500c
Replace shardId lock with lock on colocation+shardIntervalIndex (#3374)
This new locking pattern makes sure that some deadlocks that could
happend during rebalancing cannot occur anymore.
2020-01-16 13:14:01 +01:00
Jelte Fennema 86343bcc8f Re-add test that broke with GUC workaround 2020-01-16 12:34:50 +01:00
Jelte Fennema 6b9b633695 Add more tests for prepared statements 2020-01-16 12:28:15 +01:00
Jelte Fennema 43a3fdd12f Fix comment 2020-01-16 12:28:15 +01:00
Jelte Fennema fe3827e499 Add tests for [NOT] MATERIALEZED 2020-01-16 12:28:15 +01:00
Onder Kalaci 326dfab44a Fix a query which triggers an existing bug, see https://github.com/citusdata/citus/issues/3189#issuecomment-571497051 2020-01-16 12:28:15 +01:00
Onder Kalaci c653923960 Update regression tests 6
Local execution and CTE pushdown
2020-01-16 12:28:15 +01:00
Onder Kalaci 3818be45a6 Update regression tests-5
Failure tests that rely on intermediate results
2020-01-16 12:28:15 +01:00
Onder Kalaci 1e85938b46 Update regression tests-4
Update the MX tests. Similar to the previous commits, prevent CTE
inlining in some cases to prevent divergent test outputs.
2020-01-16 12:28:15 +01:00
Onder Kalaci fc07bd7c5b Update regression tests-3
Update the regression tests which only change in PG 12.
2020-01-16 12:28:15 +01:00
Onder Kalaci 64560b07be Update regression tests-2
In this commit, we're introducing a way to prevent CTE inlining via a GUC.

The GUC is used in all the tests where PG 11 and PG 12 tests would diverge
otherwise.

Note that, in PG 12, the restriction information for CTEs are generated. It
means that for some queries involving CTEs, Citus planner (router planner/
pushdown planner) may behave differently. So, via the GUC, we prevent
tests to diverge on PG 11 vs PG 12.

When we drop PG 11 support, we should get rid of the GUC, and mark
relevant ctes as MATERIALIZED, which does the same thing.
2020-01-16 12:28:15 +01:00
Onder Kalaci 5cb203b276 Update regression tests-1
These set of tests has changed in both PG 11 and PG 12.
The changes are only about CTE inlining kicking in both
versions, and yielding the exact same distributed planning.
2020-01-16 12:28:15 +01:00
Onder Kalaci 421bf68516 Add the specific regression tests
With this commit, we're adding the specific tests for CTE inlining.
The test has a different output file for pg 11, because as mentioned
in the previous commits, PG 12 generates more restriction information
for CTEs.
2020-01-16 12:28:15 +01:00
Philip Dubé 4d9a733c2f Fix inserting multiple values with row expression partition column causing the insert to be ignored
Raise an error instead of silently inserting nothing if we hit this condition in the future
2020-01-15 21:10:50 +00:00
Marco Slot 06709ee108 Always use NOTICE in log_remote_commands and avoid redaction when possible 2020-01-13 18:24:36 +01:00
Philip Dubé ccabf19090 Propagate DROP ROUTINE, ALTER ROUTINE
In two places I've made code more straight forward by using ROUTINE in our own codegen

Two changes which may seem extraneous:

AppendFunctionName was updated to not use pg_get_function_identity_arguments.
This is because that function includes ORDER BY when printing an aggregate like my_rank.
While ALTER AGGREGATE my_rank(x "any" ORDER BY y "any") is accepted by postgres,
ALTER ROUTINE my_rank(x "any" ORDER BY y "any") is not.

Tests were updated to use macaddr over integer. Using integer is flaky, our logic
could sometimes end up on tables like users_table. I originally wanted to use money,
but money isn't hashable.
2020-01-13 15:37:46 +00:00
Philip Dubé 4b5d6c3ebe Rename RelayFileState to ShardState
Replace FILE_ prefix with SHARD_STATE_
2020-01-12 05:57:53 +00:00
Hadi Moshayedi 40ba2cdd6e Test RedistributeTaskListResult 2020-01-09 23:47:25 -08:00
Philip Dubé 281aacce9b Fix row-gather for subqueries being handled by task-tracker
task-tracker has specific logic for MultiPartition when GROUP BY is missing

We were ending up in this code path because row-gather removes GROUP BY
2020-01-10 01:51:37 +00:00
Hadi Moshayedi bb65669186 Failure tests for PartitionTasklistResults 2020-01-09 10:55:58 -08:00
Hadi Moshayedi f38d0e5b3f Partitioned task list results. 2020-01-09 10:32:58 -08:00
Jelte Fennema 9724e25065 Normalize plan numbers in insert_select output 2020-01-07 10:34:08 +01:00
Philip Dubé bf7d86a3e8 Fix typo: aggragate -> aggregate 2020-01-07 01:16:09 +00:00
Philip Dubé 863bf49507 Implement pulling up rows to coordinator when aggregates cannot be pushed down. Enabled by default 2020-01-07 01:16:04 +00:00
Onder Kalaci 7f3ab7892d Skip shard pruning when possible
We're already traversing the queryTree and finding the distribution
key value, so pass it to the later stages of the planning.
2020-01-06 12:42:43 +01:00
Onder Kalaci c8f14c9f6c Make sure to update shard states of partitions on failures
Fixes #3331

In #2389, we've implemented support for partitioned tables with rep > 1.
The implementation is limiting the use of modification queries on the
partitions. In fact, we error out when any partition is modified via
EnsurePartitionTableNotReplicated().

However, we seem to forgot an important case, where the parent table's
partition is marked as INVALID. In that case, at least one of the partition
becomes INVALID. However, we do not mark partitions as INVALID ever.

If the user queries the partition table directly, Citus could happily send
the query to INVALID placements -- which are not marked as INVALID.

This PR fixes it by marking the placements of the partitions as INVALID
as well.

The shard placement repair logic already re-creates all the partitions,
so should be fine in that front.
2020-01-06 12:26:08 +01:00
Jelte Fennema d29ce8965c
Actually check that test output normalization is applied in CI (#3358)
Fixup of an issue with #3336 that caused CI not to check correctly that
normalized test output was committed.
2020-01-06 10:37:34 +01:00
Jelte Fennema 4a20ba3bfc Merge remote-tracking branch 'origin/master' into normalized-test-output 2020-01-06 09:36:04 +01:00
Jelte Fennema 7c3e8e150e Normalize tests: s/Subplan [0-9]+\_/Subplan XXX\_/g 2020-01-06 09:32:03 +01:00
Jelte Fennema acd12a6de5 Normalize tests: s/read_intermediate_result\('[0-9]+_/read_intermediate_result('XXX_/g 2020-01-06 09:32:03 +01:00
Jelte Fennema 21dbd4e55d Normalize tests: s/generating subplan [0-9]+\_/generating subplan XXX\_/g 2020-01-06 09:32:03 +01:00
Jelte Fennema 58723dd8b0 Normalize tests: s/DEBUG: Plan [0-9]+/DEBUG: Plan XXX/g 2020-01-06 09:32:03 +01:00
Jelte Fennema 38ac28b4b8 Normalize tests: intermediate_results 2020-01-06 09:32:03 +01:00
Jelte Fennema 0c6983a80e Normalize tests: pg12 changes 2020-01-06 09:32:03 +01:00
Jelte Fennema 7730bd449c Normalize tests: Remove trailing whitespace 2020-01-06 09:32:03 +01:00
Jelte Fennema 6353c9907f Normalize tests: Line info varies between versions 2020-01-06 09:32:03 +01:00
Jelte Fennema bf2c203908 Normalize tests: solation_ref2ref_foreign_keys 2020-01-06 09:32:03 +01:00
Jelte Fennema 7b2c769a5d Normalize tests: normalize file names for partitioned files 2020-01-06 09:32:03 +01:00
Jelte Fennema d0ade90cd0 Normalize tests: pkey constraints for multi_insert_select 2020-01-06 09:32:03 +01:00
Jelte Fennema 704e1d2bc8 Normalize tests: shard table names for multi_name_lengths 2020-01-06 09:32:03 +01:00
Jelte Fennema 1c4ea6836b Normalize tests: shard table names for multi_insert_select_conflict 2020-01-06 09:32:03 +01:00
Jelte Fennema 27997c054e Normalize tests: shard table names for foreign_key_restrection_enforcement 2020-01-06 09:32:03 +01:00
Jelte Fennema 432b5baac7 Normalize tests: shard table names for custom_aggregate_support 2020-01-06 09:32:03 +01:00
Jelte Fennema 0c23caeb75 Normalize tests: shard table names for multi_subtransactions 2020-01-06 09:32:03 +01:00
Jelte Fennema 883ee9121f Normalize tests: shard table names in foreign_key_to_reference_table 2020-01-06 09:32:03 +01:00
Jelte Fennema 7f3de68b0d Normalize tests: header separator length 2020-01-06 09:32:03 +01:00
Philip Dubé 566246ecd4 End regression tests with ensure_no_intermediate_data_leak
Also update tests to clean up jobs when they're directly testing job udfs
2020-01-03 18:59:02 +00:00
Önder Kalacı 0c70a5470e
Allow RETURNING in fast-path queries (#3352)
* Allow RETURNING in fast-path queries

Because there is no specific reason for that.
2020-01-03 13:42:50 +00:00
Önder Kalacı a174eb4f7b
Do not go through standard_planner() for INSERTs (#3348)
That seems unnecessary. We already have the notion of FastPath queries,
simply add it there.
2020-01-03 12:15:22 +00:00
Jelte Fennema 75a9c25acd Normalize tests: s/node group [12] (but|does)/node group \1/ 2020-01-03 11:46:01 +01:00
Jelte Fennema 96434e898f Normalize tests: s/assigned task [0-9]+ to node/assigned task to node/ 2020-01-03 11:45:22 +01:00
Jelte Fennema 7b833466ba Normalize tests: s/shard [0-9]+/shard xxxxx/g 2020-01-03 11:44:30 +01:00
Jelte Fennema 8b5fe8aa17 Normalize tests: s/placement [0-9]+/placement xxxxx/g 2020-01-03 11:42:48 +01:00
Jelte Fennema f21f00544e Normalize tests: s/ port=[0-9]+ / port=xxxxx /g 2020-01-03 11:42:09 +01:00
Jelte Fennema 8c5c0dd74c Normalize tests: s/localhost:[0-9]+/localhost:xxxxx/g 2020-01-03 11:40:50 +01:00
Jelte Fennema 5fee9d04c9 Uncomment local execution EXPLAIN ANALYZE tests 2020-01-02 18:56:32 +00:00
Marco Slot ba39d72fe1 Fix incorrect union all pushdown issue 2020-01-01 09:03:50 +01:00
Jelte Fennema cf88bdf833 Add tests for complex joins on reference tables 2019-12-27 15:05:51 +01:00
Jelte Fennema 3a042e4611 Allow cartesian products on reference tables 2019-12-27 15:05:51 +01:00
Jelte Fennema 4233cd0d9d Allow non equi joins on reference tables 2019-12-27 15:05:51 +01:00
Marco Slot b21b6905ae Do not repeat GROUP BY distribution_column on coordinator
Allow arbitrary aggregates to be pushed down in these scenarios
2019-12-25 01:33:41 +00:00
Philip Dubé a6ffcab59d CREATE EXTENSION is propagated now 2019-12-24 21:04:37 +00:00
Marco Slot a2ddfecd86 Fix inconsistent shard metadata issue 2019-12-24 08:01:32 +01:00
Hadi Moshayedi d7aea7fa10 Implement partitioned intermediate results. 2019-12-24 03:53:39 -08:00
Marco Slot b37ef0e394 Fix error in distributed queries when shards are on the coordinator 2019-12-24 06:36:43 +01:00
Philip Dubé e9bbdb8f31 Fix handling of empty intermediate results when distributing custom aggregates 2019-12-23 17:27:52 +00:00
Jelte Fennema b655c02352
Add the necessary changes for rebalance strategies on enterprise (#3325)
This commit adds the SQL and C changes necessary to support custom rebalance
strategies in the Enterprise version of Citus.
2019-12-19 15:23:08 +01:00
Hadi Moshayedi ef487e0792 Implement fetch_intermediate_results 2019-12-18 10:46:35 -08:00
Hadi Moshayedi 249508d267 Estimate cost of read_intermediate_results() 2019-12-17 13:51:51 -08:00
Hadi Moshayedi 113bd1e5f1 Implement read_intermediate_results 2019-12-17 13:51:16 -08:00
SaitTalhaNisanci 7ff4ce2169
Add adaptive executor support for repartition joins (#3169)
* WIP

* wip

* add basic logic to run a single job with repartioning joins with adaptive executor

* fix some warnings and return in ExecuteDependedTasks if there is none

* Add the logic to run depended jobs in adaptive executor

The execution of depended tasks logic is changed. With the current
logic:
- All tasks are created from the top level task list.
- At one iteration:
	- CurTasks whose dependencies are executed are found.
	- CurTasks are executed in parallel with adapter executor main
logic.
- The iteration is repeated until all tasks are completed.

* Separate adaptive executor repartioning logic

* Remove duplicate parts

* cleanup directories and schemas

* add basic repartion tests for adaptive executor

* Use the first placement to fetch data

In task tracker, when there are replicas, we try to fetch from a replica
for which a map task is succeeded. TaskExecution is used for this,
however TaskExecution is not used in adaptive executor. So we cannot use
the same thing as task tracker.

Since adaptive executor fails when a map task fails (There is no retry
logic yet). We know that if we try to execute a fetch task, all of its
map tasks already succeeded, so we can just use the first one to fetch
from.

* fix clean directories logic

* do not change the search path while creating a udf

* Enable repartition joins with adaptive executor with only enable_reparitition_joins guc

* Add comments to adaptive_executor_repartition

* dont run adaptive executor repartition test in paralle with other tests

* execute cleanup only in the top level execution

* do cleanup only in the top level ezecution

* not begin a transaction if repartition query is used

* use new connections for repartititon specific queries

New connections are opened to send repartition specific queries. The
opened connections will be closed at the FinishDistributedExecution.

While sending repartition queries no transaction is begun so that
we can see all changes.

* error if a modification was done prior to repartition execution

* not start a transaction if a repartition query and sql task, and clean temporary files and schemas at each subplan level

* fix cleanup logic

* update tests

* add missing function comments

* add test for transaction with DDL before repartition query

* do not close repartition connections in adaptive executor

* rollback instead of commit in repartition join test

* use close connection instead of shutdown connection

* remove unnecesary connection list, ensure schema owner before removing directory

* rename ExecuteTaskListRepartition

* put fetch query string in planner not executor as we currently support only replication factor = 1 with adaptive executor and repartition query and we know the query string in the planner phase in that case

* split adaptive executor repartition to DAG execution logic and repartition logic

* apply review items

* apply review items

* use an enum for remote transaction state and fix cleanup for repartition

* add outside transaction flag to find connections that are unclaimed instead of always opening a new transaction

* fix style

* wip

* rename removejobdir to partition cleanup

* do not close connections at the end of repartition queries

* do repartition cleanup in pg catch

* apply review items

* decide whether to use transaction or not at execution creation

* rename isOutsideTransaction and add missing comment

* not error in pg catch while doing cleanup

* use replication factor of the creation time, not current time to decide if task tracker should be chosen

* apply review items

* apply review items

* apply review item
2019-12-17 19:09:45 +03:00
Marco Slot 2f568ad5a5 Forbid using connections that sent intermediate results for data access and vice versa 2019-12-17 11:49:13 +01:00
Onur TIRTIR 8092529a2c
Split propagate extension test and add alternative output (#3314)
* Split extension name tests from propagate_extension_commands.sql

* Add alternative output for escape_extension_name.sql
2019-12-17 13:49:16 +03:00
Marco Slot 5f656e22db Fix issue in IsMultiStatementTransaction detection 2019-12-16 17:01:43 +01:00
Marco Slot 1633123d78 Fix crash in IN (NULL) queries 2019-12-13 08:35:54 +01:00
Hadi Moshayedi e7a6cc0801 Fix some typos from #3280 2019-12-12 13:29:26 -08:00
Marco Slot e7a8db5493 Fix issue with some zero-shard modifications 2019-12-12 07:19:10 +01:00
SaitTalhaNisanci 053fe18404
not continue in sequential execution if a cancellation is received (#3289) 2019-12-12 17:22:30 +03:00
Hadi Moshayedi 383d34f51b Tests for multi-statement transactions with subqueries or ctes 2019-12-11 19:54:15 -08:00
Hadi Moshayedi 939d3c955b Don't plan function joins locally 2019-12-11 16:53:29 -08:00
Hadi Moshayedi 067d92a7f6 Don't plan joins between ref tables and views locally 2019-12-11 14:31:34 -08:00
Hadi Moshayedi e3e174f30f Fix the way we check for local/reference table joins in the executor 2019-12-11 12:50:20 -08:00
Önder Kalacı fecf61ef1f
Add missing ORDER BY in a CTE (#3282)
Otherwise, the query output might not be consistent.
2019-12-11 10:24:54 +01:00
Marco Slot 133b8e1e0e Move coordinator insert..select logic into executor 2019-12-10 11:21:35 -08:00
Marco Slot 486c620a3c Fix inserts into local tables with distributed subqueries 2019-12-10 10:17:18 +01:00
SaitTalhaNisanci 8e5041885d Refactor isolation tests (#3062)
Currently in mx isolation tests the setup is the same except the creation of tables. Isolation framework lets us define multiple `setup` stages, therefore I thought that we can put the `mx_setup` to one file and prepend this prior to running tests. 

How the structure works:
- cpp is used before running isolation tests to preprocess spec files. This way we can include any file we want to. Currently this is used to include mx common part.
- spec files are put to `/build/specs` for clear separation between generated files and template files
- a symbolic link is created for `/expected` in `build/expected/`.
- when running isolation tests, as the `inputdir`, `build` is passed so it runs the spec files from `build/specs` and checks the expected output from `build/expected`.

`/specs` is renamed as `/spec` because postgres first look at the `specs` file under current directory, so this is renamed to avoid that since we are running the isolation tests from `build/specs` now.

Note: now we use `//` instead of `#` in comments in spec files, because cpp interprets `#` as a directive and it ignores `//`.
2019-12-10 16:12:54 +01:00
Önder Kalacı f027e9dd77
Improve Recursive CTE tests (#3274)
Postgres keeps track of recursive CTEs in the queryTree in two ways:

   - queryTree->hasRecursive is set to true, whenever a RECURSIVE CTE
     is used in the SQL. Citus checks for it
   - If the CTE is actually a recursive one (a.k.a., references itself)
     Postgres marks CommonTableExpr->cterecursive as true as well

The tests that are changed in the PR doesn't cover (b), and this becomes
an issue with CTE inlining (#3161). In that case, Citus/Postgres can inline
such CTEs, and the queries works with Citus.

However, this tests intend to check if there is any recursive CTE in the queryTree.
So, we're actually making the CTEs recursive CTEs by referring itself.

We'll add cases where a recursive CTE works by inlining in #3161.
2019-12-10 09:38:45 +01:00
Philip Dubé fcf2fd819b Add distributioncolumncollation to to pg_dist_colocation
Use partition column's collation for range distributed tables
Don't allow non deterministic collations for hash distributed tables
CoPartitionedTables: don't compare unequal types
2019-12-09 19:51:40 +00:00
Philip Dubé d138bb89bf Support creating collations as part of dependency resolution. Propagate ALTER/DROP on distributed collations
Propagate CREATE COLLATION when outside transaction
2019-12-09 04:42:51 +00:00
Marco Slot 6a9c0ea7fe Fix errors in DML with sublinks hidden by null expressions 2019-12-06 14:25:04 +01:00
Hadi Moshayedi d28beb3711 Detect SQL UDF Calls. 2019-12-05 14:31:05 -08:00
Philip Dubé 5a17fd6d9d Test more reference/local cases, also ALTER ROLE
Test ALTER ROLE doesn't deadlock when coordinator added, or propagate from mx workers

Consolidate wait_until_metadata_sync & verify_metadata to multi_test_helpers
2019-12-03 22:23:14 +00:00
Philip Dubé 1597fbb369 aggregate_support test: test DISTINCT, ORDER BY, FILTER, & no intermediate results
Previously,
- we'd push down ORDER BY, but this doesn't order intermediate results between workers
- we'd keep FILTER on master aggregate, which would raise an error about unexpected cstrings
2019-12-03 15:46:01 +00:00
Philip Dubé 5fcc169a3a Stray depended to dependent tidy up 2019-12-03 15:28:32 +00:00
Marco Slot bb3bc10f0c Fix segfault in column_to_column_name 2019-12-01 23:57:25 +01:00
Marco Slot b1b13e394e Fix segfault when executing DDL via UDF 2019-12-01 22:54:41 +01:00
Marco Slot 4c8d43c5d0 Bump repo version to 9.2devel 2019-11-29 07:33:39 +01:00
Philip Dubé 0d04ff1692 RECORD: Add support for more expression types
- OpExpr
- NullIfExpr
- MinMaxExpr
- CoalesceExpr
- CaseExpr

Also fix case where ARRAY[(1,2), NULL] was rejected
2019-11-27 17:07:22 +00:00
Philip Dubé 168e11cc9b Implement support for RECORD[] where we support RECORD
Support for ARRAY[] expressions is limited to having a consistent shape,
eg ARRAY[(int,text),(int,text)] as opposed to ARRAY[(int,text),(float,text)] or ARRAY[(int,text),(int,text,float)]
2019-11-27 15:02:43 +00:00
Hadi Moshayedi 2268a9cae6 Error for metadata commands if any metadata node is out-of-sync (#3226)
* Error for metadata commands if any metadata node is out-of-sync

* Make the functions have separate APIs for all workers/metadata workers
2019-11-27 09:52:57 +01:00
Marco Slot 2329157406 Swap aggregate_support tests to simplify enterprise merge 2019-11-26 13:39:18 +01:00
Philip Dubé 261a9de42d Fix typos:
VAR_SET_VALUE_KIND -> VAR_SET_VALUE kind
beginnig -> beginning
plannig -> planning
the the -> the
er then -> er than
2019-11-25 23:24:13 +00:00
Marco Slot 4b0ac4b0dd Properly escape ALTER FUNCTION .. SET deparsing. Also test 2019-11-25 23:01:30 +00:00
Philip Dubé 3c10c27b13 GetFunctionAlterOwnerCommand: use format_procedure_qualified
distributed_functions: test a function with a quote in name
AppendDefElemSet: quote variable names
2019-11-25 23:01:30 +00:00
Philip Dubé a81e6a81ab Fix distributed aggregation for non superuser roles
Moves support functions to pg_catalog for now. We'd prefer a different solution
for when we're creating these support functions dynamically
2019-11-25 20:46:25 +00:00
Onur TIRTIR bef32624c3
Escape extension name in extension command propagation (#3218) 2019-11-24 12:16:10 +03:00
Philip Dubé 99164398bf Fix potential segfault from standard_planner inlining functions 2019-11-21 18:47:36 +00:00
Philip Dubé c563e0825c Strip trailing whitespace and add final newline (#3186)
This brings files in line with our editorconfig file
2019-11-21 14:25:37 +01:00
Onur TIRTIR 9961297d7b Improve extension command propagation logic and tests
* Improve extension command propagation tests

* patch for hardcoded citus extension name

(cherry picked from commit 0bb3dbac0afabda10e8928f9c17eda048dc4361a)
2019-11-21 11:24:39 +03:00
Hanefi Onaldi d82f3e9406
Introduce intermediate result broadcasting
In plain words, each distributed plan pulls the necessary intermediate
results to the worker nodes that the plan hits. This is primarily useful
in three ways. 

(i) If the distributed plan that uses intermediate
result(s) is a router query, then the intermediate results are only
broadcasted to a single node.

(ii) If a distributed plan consists of only intermediate results, which
is not uncommon, the intermediate results are broadcasted to a single
node only.

(iii) If a distributed query hits a sub-set of the shards in multiple
workers, the intermediate results will be broadcasted to the relevant
node(s).

The final item (iii) becomes crucial for append/range distributed
tables where typically the distributed queries hit a small subset of
shards/workers.

To do this, for each query that Citus creates a distributed plan, we keep
track of the subPlans used in the queryTree, and save it in the distributed
plan. Just before Citus executes each subPlan, Citus first keeps track of
every worker node that the distributed plan hits, and marks every subPlan
should be broadcasted to these nodes. Later, for each subPlan which is a
distributed plan, Citus does this operation recursively since these
distributed plans may access to different subPlans, and those have to be
recorded as well.
2019-11-20 15:26:36 +03:00
Jelte Fennema 1ed05be82c
Flaky test: Fix recover_prepared_transactions (#3205)
Failed test: https://app.circleci.com/jobs/github/citusdata/citus/35994

We now always take a new connection
2019-11-19 17:49:13 +01:00
Jelte Fennema 1ac96f228b
Flaky test: Force correct plan (#3203)
Failing test: https://app.circleci.com/jobs/github/citusdata/citus/23148
2019-11-19 17:11:05 +01:00
Onur TIRTIR 26c306d188
Add extensions to distributed object propagation infrastructure (#3185) 2019-11-19 17:56:28 +03:00
Jelte Fennema 87f57eb92b
Fix verify_metadata not returning consistent results (#3199)
Failing test: https://app.circleci.com/jobs/github/citusdata/citus/58827
2019-11-19 11:02:35 +01:00
Hanefi Onaldi e3ad4aba94
Bump 9.1devel
* Add Changelog entry for 9.0.1
* Bump citus version to 9.1devel
2019-11-19 10:35:57 +03:00
Marco Slot 622462cad7 Return early in CitusHasBeenLoaded when creating a different extension 2019-11-15 03:00:20 +01:00
Halil Ozan Akgul 5ae7b219ff Create the ALTER ROLE propagation 2019-11-18 18:31:28 +03:00
Nils Dijk 217890af5f
Feature: Expression in reference join (#3180)
DESCRIPTION: Expression in reference join

Fixed: #2582

This patch allows arbitrary expressions in the join clause when joining to a reference table. An example of such joins could be found in CHbenCHmark queries 7, 8, 9 and 11; `mod((s_w_id * s_i_id),10000) = su_suppkey` and `ascii(substr(c_state,1,1)) = n2.n_nationkey`. Since the join is on a reference table these queries are able to be pushed down to the workers.

To implement these queries we will widen the `IsJoinClause` predicate to not check if the expressions are a type `Var` after stripping the implicit coerciens. Instead we define a join clause when the `Var`'s in a clause come from more than 1 table.

This allows more clauses to pass into the logical planner's `MultiNodeTree(...)` planning function. To compensate for this we tighten down the `LocalJoin`, `SinglePartitionJoin` and `DualPartitionJoin` to check for direct column references when planning. This allows the planner to work with arbitrary join expressions on reference tables.
2019-11-18 16:25:46 +01:00
Hadi Moshayedi d9dcba25e3 Plan reference/local table joins locally 2019-11-15 07:36:50 -08:00
Onder Kalaci 90943a6ce6 Do not include coordinator shards when round-robin is selected
When the user picks "round-robin" policy, the aim is that the load
is distributed across nodes. However, for reference tables on the
coordinator, since local execution kicks in immediately, round-robin
is ignored.

With this change, we're excluding the placement on the coordinator.
Although the approach seems a little bit invasive because of
modifications in the placement list, that sounds acceptable.

We could have done this in some other ways such as:

1) Add a field to "Task->roundRobinPlacement" (or such), which is
updated as the first element after RoundRobinPolicy is applied.
During the execution, if that placement is local to the coordinator,
skip it and try the other remote placements.

2) On TaskAccessesLocalNode()@local_execution.c, check
task_assignment_policy, if round-robin selected and there is local
placement on the coordinator, skip it. However, task assignment is done
on planning, but this decision is happening on the execution, which
could create weird edge cases.
2019-11-15 06:03:32 -08:00
Hadi Moshayedi 15af1637aa Replicate reference tables to coordinator. 2019-11-15 05:50:19 -08:00
Hadi Moshayedi cb011bb30f Propagate isactive to metadata nodes. 2019-11-15 05:48:42 -08:00
Philip Dubé 495c0f5117 Phase 1 implementation of custom aggregates
Phase 1 seeks to implement minimal infrastructure, so does not include:
	- dynamic generation of support aggregates to handle multiple arguments
	- configuration methods to direct aggregation strategy,
		or mark an aggregate's serialize/deserialize as safe to operate across nodes

Aggregates can be distributed when:
	- they have a single argument
	- they have a combinefunc
	- their transition type is not a pseudotype
2019-11-14 19:01:24 +00:00
Philip Dubé edc7a2ee38 Improve RECORD support 2019-11-14 18:32:22 +00:00
Philip Dubé eb35743c3f Remove citus.worker_list_file & master_initialize_node_metadata 2019-11-13 00:49:58 +00:00
Jelte Fennema adc6ca6100
Make simple in queries on unique columns work with repartion join (#3171)
This is necassery to support Q20 of the CHbenCHmark: #2582.

To summarize the fix: The subquery is converted into an INNER JOIN on a
table. This fixes the issue, since an INNER JOIN on a table is already
supported by the repartion planner.

The way this replacement is happening.:
1. Postgres replaces `col in (subquery)` with a SEMI JOIN (subquery) on col = subquery_result
2. If this subquery is simple enough Postgres will replace it with a
   regular read from a table
3. If the subquery returns unique results (e.g. a primary key) Postgres
   will convert the SEMI JOIN into an INNER JOIN during the planning. It
   will not change this in the rewritten query though.
4. We check if Postgres sends us any SEMI JOINs during its join order
   planning, if it doesn't we replace all SEMI JOINs in the rewritten
   query with INNER JOIN (which we already support).
2019-11-11 13:44:28 +01:00
Önder Kalacı 460f000218
Remove failure tests related to real-time executor (#3174)
Since we've removed the executor, we don't need the specific tests.
Since the tests are already using adaptive executor, they were passing.
But, we've plenty of extra tests for adaptive executor, so seems safe
to remove.
2019-11-11 10:18:37 +01:00
Philip Dubé ad86c1b866 AcquireDistributedLockOnRelations: escape relation names 2019-11-08 21:23:01 +00:00
Jelte Fennema 9fb897a074
Fix queries with repartition joins and group by unique column (#3157)
Postgres doesn't require you to add all columns that are in the target list to
the GROUP BY when you group by a unique column (or columns). It even actively
removes these group by clauses when you do.

This is normally fine, but for repartition joins it is not. The reason for this
is that the temporary tables don't have these primary key columns. So when the
worker executes the query it will complain that it is missing columns in the
group by.

This PR fixes that by adding an ANY_VALUE aggregate around each variable in
the target list that does is not contained in the group by or in an aggregate.
This is done only for repartition joins.

The ANY_VALUE aggregate chooses the value from an undefined row in the
group.
2019-11-08 15:36:18 +01:00
Önder Kalacı 0b3d4e55d9
Local execution should not change hasReturning for distributed tables (#3160)
It looks like the logic to prevent RETURNING in reference tables to
have duplicate entries that comes from local and remote executions
leads to missing some tuples for distributed tables.

With this PR, we're ensuring to kick in the logic for reference tables
only.
2019-11-08 12:49:56 +01:00
Philip Dubé 9a31837647 isolation_create_restore_point: test reference tables too 2019-11-07 17:50:22 +00:00
Philip Dubé 2fc45e5897 create_distributed_function: accept aggregates
Adds support for OCLASS_PROC to worker_create_or_replace_object
2019-11-06 18:23:37 +00:00
Hadi Moshayedi e00d1546f3 Don't maintain replicationfactor of reference tables 2019-11-05 07:23:14 -08:00
Önder Kalacı 960cd02c67
Remove real time router executors (#3142)
* Remove unused executor codes

All of the codes of real-time executor. Some functions
in router executor still remains there because there
are common functions. We'll move them to accurate places
in the follow-up commits.

* Move GUCs to transaction mngnt and remove unused struct

* Update test output

* Get rid of references of real-time executor from code

* Warn if real-time executor is picked

* Remove lots of unused connection codes

* Removed unused code for connection restrictions

Real-time and router executors cannot handle re-using of the existing
connections within a transaction block.

Adaptive executor and COPY can re-use the connections. So, there is no
reason to keep the code around for applying the restrictions in the
placement connection logic.
2019-11-05 12:48:10 +01:00
Önder Kalacı ffd89e4e01
Include all relevant relations in the ExtractRangeTableRelationWalker (#3135)
We've changed the logic for pulling RTE_RELATIONs in #3109 and
non-colocated subquery joins and partitioned tables.
@onurctirtir found this steps where I traced back and found the issues.

While looking into it in more detail, we decided to expand the list in a
way that the callers get all the relevant RTE_RELATIONs RELKIND_RELATION,
RELKIND_PARTITIONED_TABLE, RELKIND_FOREIGN_TABLE and RELKIND_MATVIEW.
These are all relation kinds that Citus planner is aware of.
2019-11-01 16:06:58 +01:00
Onur TIRTIR d3f68bf44f
Fix view is not distributed error when view is used in modify statements (#3104) 2019-11-01 16:34:01 +03:00
SaitTalhaNisanci 70e46703aa
Fix debug1 message in JobExecutorType (#3147)
When citus.enable_repartition_joins guc is set to on, and we have
adaptive executor, there was a typo in the debug message, which was
saying realtime executor no adaptive executor.
2019-11-01 11:14:19 +03:00
Marco Slot 03cae27782 Add tests for distributing functions with replication_model statement 2019-10-26 23:57:59 +02:00
SaitTalhaNisanci 29d45bd1b9
Do not assign InvalidOid for local execution while extracting parameters (#3131)
* do not assign InvalidOid for local execution while extracting parameters

* rename functions

* rename parameter and replace function
2019-10-28 14:28:22 +03:00
Önder Kalacı dceaddbe4d
Remove real-time/router executors (step 1) (#3125)
See #3125 for details on each item.

* Remove real-time/router executor tests-1

These are the ones which doesn't have '_%d' in the test
output files.

* Remove real-time/router executor tests-2

These are the ones which has in the test
output files.

* Move the tests outputs to correct place

* Make sure that single shard commits use 2PC on adaptive executor

It looks like we've messed the tests in #2891. Fixing back.

* Use adaptive executor for all router queries

This becomes important because when task-tracker is picked, we
used to pick router executor, which doesn't make sense.

* Remove explicit references to real-time/router executors in the tests

* JobExecutorType never picks real-time/router executors

* Make sure to go incremental in test output numbers

* Even users cannot pick real-time anymore

* Do not use real-time/router custom scans

* Get rid of unnecessary normalizations

* Reflect unneeded normalizations

* Get rid of unnecessary test output file
2019-10-25 10:54:54 +02:00
Marco Slot b8c8fd4612 Fix run_command_on_colocated_placements tests 2019-10-23 00:08:17 +02:00
Onder Kalaci c2460a1c31 Add upgrade test for distributed functions
Simply make sure that Citus can pushdown functions after pg upgrade.
2019-10-23 12:07:51 +02:00
Philip Dubé 2a969fe4bb ssl_by_default: remove stray PG10 check 2019-10-23 00:27:54 +00:00
Philip Dubé 2204a17dbd isolation_multiuser_locking: reorder GRANT to avoid deadlock on enterprise 2019-10-22 21:10:55 +00:00
Jelte Fennema 78e495e030
Add shouldhaveshards to pg_dist_node (#2960)
This is an improvement over #2512.

This adds the boolean shouldhaveshards column to pg_dist_node. When it's false, create_distributed_table for new collocation groups will not create shards on that node. Reference tables will still be created on nodes where it is false.
2019-10-22 16:47:16 +02:00
Halil Ozan Akgul 5f04ac774f Adds the tests for refresh materialized views 2019-10-17 16:00:56 +03:00
Jelte Fennema 7abedc38b0
Support subqueries in HAVING (#3098)
Areas for further optimization:
- Don't save subquery results to a local file on the coordinator when the subquery is not in the having clause
- Push the the HAVING with subquery to the workers if there's a group by on the distribution column
- Don't push down the results to the workers when we don't push down the HAVING clause, only the coordinator needs it

Fixes #520
Fixes #756
Closes #2047
2019-10-16 16:40:14 +02:00
Jelte Fennema 9b2f4d71ac
Make sure some MX tests use defined shard_ids (#3103) 2019-10-12 22:46:14 +02:00
Philip Dubé 74cb168205 Remove Postgres 10 support 2019-10-11 21:56:56 +00:00
Philip Dubé 4063e7ca67 CALL delegation: apply strip_implicit_coercions to distribution argument 2019-10-10 17:42:43 +00:00
Philip Dubé 7ffd78b6e0 isolation_multiuser_locking
Introduce a test which checks that locks are only acquired when a user has necessary permissions
Currently tests REINDEX, CREATE INDEX, TRUNCATE
2019-10-10 16:58:41 +00:00
Nils Dijk 4a4a220945
Fix enum add value order and pg12 (#3082)
DESCRIPTION: Fix order for enum values and correctly support pg12

PG 12 introduces `ALTER TYPE ... ADD VALUE ...` during transactions. Earlier versions would error out when called in a transaction, hence we connect to workers outside of the transaction which could cause inconsistencies on pg12 now that postgres doesn't error with this syntax anymore.

During the implementation of this fix it became apparent there was an error with the ordering of enum labels when the type was recreated. A patch and test have been included.
2019-10-07 17:16:19 +02:00
Jelte Fennema 01da11f264
Change citus truncate trigger to AFTER and add more upgrade tests (#3070)
* Add more upgrade tests

* Fix citus trigger generation after upgrade

citus_truncate_trigger runs before truncate when created by create_distributed_table:
492d1b2cba/src/backend/distributed/commands/create_distributed_table.c (L1163)

* Remove pg_dist_jobid_seq
2019-10-07 16:43:04 +02:00
Onder Kalaci 3be72ce42f Make sure that distributed functions always have the correct user
Objectives:

(a) both super user and regular user should have the correct owner for the function on the worker
(b) The transactional semantics would work fine for both super user and regular user
(c) non-super-user and non-function owner would get a reasonable error message if tries to distribute the function

Co-authored-by: @serprex
2019-10-04 21:38:49 +00:00
SaitTalhaNisanci c547664fae
Add Citus upgrade tests with its job (#3003)
* Add initial citus upgrade test

* Add restart databases and run tests in all nodes

* Add output for citus versions 8.0 8.1 8.2 and 8.3

* Add verify step for citus upgrade

* Add target for citus upgrade test in makefile

* Add check citus upgrade job

* Fix installation file path and add missing tar

* Run citus upgrade for v8.0 v8.1 v8.2 and v8.3

* Create upgrade_common file and rename upgrade check

* Add pg version to citus upgrade test

* Test with postgres 10 and 11 in citus upgrade tests

* Add readme for citus upgrade test

* Add some basic tests to citus upgrade tests

* Add citus upgrade mixed mode test

* Remove citus artifacts before installing another one

* Refactor citus upgrade test according to reviews

* quick and dirty rewrite of citus upgrade tests to support local execution.

I think we need to change the makefile in such a way that the tar files can be injected from the circle ci config file.

Also I removed some of the citus version checks you had to not have the requirement to pass that in separately from the pre tar file. I am not super happy with it, but two flags that need to be kept in sync is also not desirable. Instead I print out the citus version that is installed per node. This will not cause a failure if they are not what one would expect but it lets us verify we are running the expected version.

* use latest citusupgradetester in circleci

* update readme and use common alias for upgrade_common import
2019-10-04 17:44:49 +03:00
Marco Slot 1a3a174f67 Grant usage on schema citus to public 2019-10-04 12:26:08 +02:00
Hadi Moshayedi 217db2a03e Don't block for locks in SyncMetadataToNodes() 2019-10-03 16:53:36 -07:00
Halil Ozan Akgul bda8f6f87b Created tests for distribution to reference table foreign keys on mx 2019-10-03 09:31:13 +03:00
SaitTalhaNisanci 19bdca14d8
Add jobs to run tests with pg 12 (#3033)
* Add PG12 test outputs

* Add jobs to run tests with pg 12

* use POSIX collate for compatibility between pg10/pg11/pg12

* do not override the new default value when running vanilla tests

* fix 2 problems with pg12 tests

* update pg12 images with pg12 rc1

* remove pg10 jobs

* Revert "Add PG12 test outputs"

This reverts commit f3545b92ef.

* change images to use latest instead of dev

* add missing coverage flags
2019-10-02 15:33:12 +03:00
Halil Ozan Akgul e5906bead2 Created isolation tests for update, delete, upsert on reference tables with MX. 2019-10-02 10:11:21 +03:00
Hanefi Onaldi bd416ef68f Fix empty FROM clauses in PG12 2019-10-01 19:54:11 +00:00
Halil Ozan Akgul 1d7030a651 Created isolation tests for select for update on reference tables with MX. 2019-10-01 16:29:15 +03:00
Philip Dubé 89d35e9692 Attempt to force custom plans for prepared statements when trying to delegate function calls
We discern between PARAM_EXEC & PARAM_EXTERN:
d52eaa0948/src/include/nodes/primnodes.h (L211)
According to primnodes.h we should only run into PARAM_EXEC or PARAM_EXTERN
2019-09-30 23:49:14 +00:00
Hadi Moshayedi 5e97e5c98e Don't push down queries when in subqueries/ctes 2019-09-30 14:22:05 -07:00
Nils Dijk 01b26cf91a
Disallow distributed functions for functions depending on an extension (#3049)
DESCRIPTION: Disallow distributed functions for functions depending on an extension

Functions depending on an extension cannot (yet) be distributed by citus. If we would allow this it would cause issues with our dependency following mechanism as we stop following objects depending on an extension.

By not allowing functions to be distributed when they depend on an extension as well as not allowing to make distributed functions depend on an extension we won't break the ability to add new nodes. Allowing functions depending on extensions to be distributed at the moment could cause problems in that area.
2019-09-30 15:19:47 +02:00
Nils Dijk 473cbc0115
Propagate CREATE OR REPLACE FUNCTION to workers for distributed functions (#3043)
DESCRIPTION: Propagate CREATE OR REPLACE FUNCTION

Distributed functions could be replaced, which should be propagated to the workers to keep the function in sync between all nodes.

Due to the complexity of deparsing the `CreateFunctionStmt` we actually produce the plan during the processing phase of our utilityhook. Since the changes have already been made in the catalog tables we can reuse `pg_get_functiondef` to get us the generated `CREATE OR REPLACE` sql.
2019-09-30 12:41:17 +02:00
Jelte Fennema 82ec918b29
Add explain summary support (#3046)
Fixes #2922 and also adds explain analyze regression tests
2019-09-30 10:58:49 +02:00
Nils Dijk 9c2c50d875
Hookup function/procedure deparsing to our utility hook (#3041)
DESCRIPTION: Propagate ALTER FUNCTION statements for distributed functions

Using the implemented deparser for function statements to propagate changes to both functions and procedures that are previously distributed.
2019-09-27 22:06:49 +02:00
Philip Dubé 363409a0c2 Propagate REINDEX TABLE & REINDEX INDEX 2019-09-27 18:14:53 +00:00
Hanefi Onaldi 66b9f2e887 Deparsing and qualifiying for FUNCTION/PROCEDURE statements (#3014)
This PR aims to add all the necessary logic to qualify and deparse all possible `{ALTER|DROP} .. {FUNCTION|PROCEDURE}` queries.

As Procedures are introduced in PG11, the code contains many PG version checks. I tried my best to make it easy to clean up once we drop PG10 support.


Here are some caveats:
- I assumed that the parse tree is a valid one. There are some queries that are not allowed, but still are parsed successfully by postgres planner. Such queries will result in errors in execution time. (e.g. `ALTER PROCEDURE p STRICT` -> `STRICT` action is valid for functions but not procedures. Postgres decides to parse them nevertheless.)
2019-09-27 19:02:52 +02:00
Marco Slot 2868e02a3d Implement SELECT function call delegation.
When a function is marked as colocated with a distributed table,
we try delegating queries of kind "SELECT func(...)" to workers.

We currently only support this simple form, and don't delegate
forms like "SELECT f1(...), f2(...)", "SELECT f1(...) FROM ...",
or function calls inside transactions.

As a side effect, we also fix the transactional semantics of DO blocks.
Previously we didn't consider a DO block a multi-statement transaction.
Now we do.

Co-authored-by: Marco Slot <marco@citusdata.com>
Co-authored-by: serprex <serprex@users.noreply.github.com>
Co-authored-by: pykello <hadi.moshayedi@microsoft.com>
2019-09-27 09:13:25 -07:00
Halil Ozan Akgul 824a69587c Created isolation tests for insert select on MX 2019-09-26 17:40:36 +03:00
Halil Ozan Akgul d56ab6274c Created isolation tests for drop, alter, index and select for update on MX. 2019-09-26 10:47:14 +03:00
Halil Ozan Akgul d426fb2159 Created isolation tests for truncate on MX. 2019-09-25 16:51:20 +03:00
Halil Ozan Akgul 62b6852923 Created isolation tests for copy on MX. 2019-09-25 15:36:05 +03:00
Onder Kalaci 219f3676a0 Improve some tests around local execution and CTE inlining on pg 12 2019-09-25 10:53:19 +02:00
Philip Dubé 4f60e3a149 Feedback 2019-09-24 17:31:09 +00:00
Marco Slot c1e43b25da Use the new create_distributed_function API in some call tests 2019-09-24 17:31:09 +00:00
Philip Dubé 90e1f1442a Annotated tests for multi_mx_call.
Co-authored-by: pykello <hadi.moshayedi@microsoft.com>
2019-09-24 17:31:09 +00:00
Philip Dubé c95d46b4f3 Extend multi_mx_call with some of Hadi's suggestions for better test coverage 2019-09-24 17:31:09 +00:00
Philip Dubé 16b8d17aba Test: multi_mx_call 2019-09-24 17:31:09 +00:00
Onder Kalaci 18de78f386 Relax the colocation checks for distributed functions
As long as the types can be coerced, it is safe to pushdown
functions.
2019-09-24 16:31:08 +02:00
Jelte Fennema bd2103e597 Remove flappy test 2019-09-24 14:15:33 +02:00
Jelte Fennema 897ec1bdeb Revert "Temporarily disable flappy test"
This reverts commit 4b4459ee62.
2019-09-24 14:15:33 +02:00
Marco Slot 42be8afd74 Swap pg_dist_node groupid and nodeid sequences 2019-09-24 12:03:44 +02:00
Marco Slot 0dea485c68 Fix misspelling in multi_colocation_utils 2019-09-24 11:27:30 +02:00
Marco Slot 4b4459ee62 Temporarily disable flappy test 2019-09-24 11:02:34 +02:00
Hadi Moshayedi 48078a30e6 Fix wait_until_metadata_sync() for postgres 12.
Postgres 12 now has an assertion that the calls to WaitLatchOrSocket
handle postmaster death.
2019-09-23 14:15:35 -07:00
Philip Dubé 06faba91c0 Include ifdefs for pg12 API changes, update local_shard_executiuon test to avoid CTE inlining 2019-09-23 20:22:35 +00:00
Onder Kalaci d37745bfc7 Sync metadata to worker nodes after create_distributed_function
Since the distributed functions are useful when the workers have
metadata, we automatically sync it.

Also, after master_add_node(). We do it lazily and let the deamon
sync it. That's mainly because the metadata syncing cannot be done
in transaction blocks, and we don't want to add lots of transactional
limitations to master_add_node() and create_distributed_function().
2019-09-23 18:30:53 +02:00
Marco Slot 5f23b951c7 Support serial and smallserial when syncing metadata 2019-09-23 17:39:21 +02:00
Marco Slot e58d76c5f6 Fix assert failure in bare SELECT FROM reference table FOR UPDATE in MX 2019-09-23 17:00:09 +02:00
SaitTalhaNisanci 71e7047e65
Enhance pg upgrade tests (#3002)
* Enhance pg upgrade tests

* Add a specific upgrade test for pg_dist_partition

We store the index of distribution column, and when a column with an
index that is smaller than distribution column index is dropped before
an upgrade, the index should still match the distribution column after
an upgrade
2019-09-23 17:37:14 +03:00
Marco Slot d85d77634d Handle anonymous composite types on the target list 2019-09-23 14:53:02 +02:00
Halil Ozan Akgul b55b275a30 Created isolation tests for update, delete and upsert on MX 2019-09-23 14:13:29 +03:00
Onder Kalaci d7e2968120 Add parameters to create_distributed_function()
With this commit, we're changing the API for create_distributed_function()
such that users can provide the distribution argument and the colocation
information.
2019-09-22 21:53:33 +02:00
Onder Kalaci e1fe8d60b4 Make sure that functions are also listed in SupportedDependencyByCitus
We've recently merged two commits, db5d03931d
and eccba1d4c3, which actually operates
on the very similar places.

It turns out that we've an integration issue, where master_add_node()
fails to replicate the functions to newly added node.
2019-09-20 11:02:50 +02:00
Nils Dijk 72015faeb2
fix disable_object_propagation test for pg12 2019-09-19 17:40:24 +02:00
Hanefi Onaldi ed11b9590c
Add distributed func creation queries in dependency replication logic 2019-09-18 20:07:45 +03:00
Hadi Moshayedi d2f2acc4b2 Make master_update_node citus-ha friendly. 2019-09-18 09:32:54 -07:00
Hadi Moshayedi 76f3933b05 Add metadatasynced, and sync on master_update_node()
Co-authored-by: pykello <hadi.moshayedi@microsoft.com>
Co-authored-by: serprex <serprex@users.noreply.github.com>
2019-09-18 09:32:54 -07:00
Nils Dijk db5d03931d
Feature disable object propagation (#2986)
DESCRIPTION: Provide a GUC to turn of the new dependency propagation functionality

In the case the dependency propagation functionality introduced in 9.0 causes issues to a cluster of a user they can turn it off almost completely. The only dependency that will still be propagated and kept track of is the schema to emulate the old behaviour.

GUC to change is `citus.enable_object_propagation`. When set to `false` the functionality will be mostly turned off. Be aware that objects marked as distributed in `pg_dist_object` will still be kept in the catalog as a distributed object. Alter statements to these objects will not be propagated to workers and may cause desynchronisation.
2019-09-18 17:16:22 +02:00
Philip Dubé ac14f1dd49 pg12 doesn't support client_min_messages as 'fatal' 2019-09-17 20:37:06 +00:00
Nils Dijk 2b7f5552c8
Fix: rename remote type on conflict (#2983)
DESCRIPTION: Rename remote types during type propagation

To prevent data to be destructed when a remote type differs from the type on the coordinator during type propagation we wanted to rename the type instead of `DROP CASCADE`.

This patch removes the `DROP` logic and adds the creation of a rename statement to a free name.
2019-09-17 18:54:10 +02:00
Nils Dijk 0a3152d09c
Add feature flag to turn off create type propagation (#2982)
DESCRIPTION: Add feature flag to turn off create type propagation

When `citus.enable_create_type_propagation` is set to `false` citus will not propagate `CREATE TYPE` statements to the workers. Types are still distributed when tables that depend on these types are distributed.
2019-09-17 15:50:06 +02:00
Halil Ozan Akgul 5333296a54 Created isolation tests for select on MX 2019-09-17 12:44:45 +03:00
Halil Ozan Akgul 7cde785031 Added the MX isolation tests for insert 2019-09-16 15:49:43 +03:00
Hanefi Onaldi 8f2a3a0604
Introduce create_distributed_function(regproc) UDF (#2961)
This PR aims to add the minimal set of changes required to start
distributing functions. You can use create_distributed_function(regproc)
UDF to distribute a function.

    SELECT create_distributed_function('add(int,int)');

The function definition should include the param types to properly
identify the correct function that we wish to distribute
2019-09-13 23:27:46 +03:00
Philip Dubé fb10edcb9d isolation_add_node_vs_reference_table_operations: test add in parallel with create_reference_table 2019-09-13 18:13:58 +00:00
Jelte Fennema 4bbf65d913
Change SQL migration build process for easier reviews (#2951)
@thanodnl told me it was a bit of a problem that it's impossible to see
the history of a UDF in git. The only way to do so is by reading all the
sql migration files from new to old. Another problem is that it's also
hard to review the changed UDF during code review, because to find out
what changed you have to do the same. I thought of a IMHO better (but
not perfect) way to handle this.

We keep the definition of a UDF in sql/udfs/{name_of_udf}/latest.sql.
That file we change whenever we need to make a change to the the UDF. On
top of that you also make a snapshot of the file in
sql/udfs/{name_of_udf}/{migration-version}.sql (e.g. 9.0-1.sql) by
copying the contents. This way you can easily view what the actual
changes were by looking at the latest.sql file.

There's still the question on how to use these files then. Sadly
postgres doesn't allow inclusion of other sql files in the migration sql
file (it does in psql using \i). So instead I used the C preprocessor+
make to compile a sql/xxx.sql to a build/sql/xxx.sql file. This final
build/sql/xxx.sql file has every occurence of #include "somefile.sql" in
sql/xxx.sql replaced by the contents of somefile.sql.
2019-09-13 18:44:27 +02:00
Nils Dijk 2879689441
Distribute Types to worker nodes (#2893)
DESCRIPTION: Distribute Types to worker nodes

When to propagate
==============

There are two logical moments that types could be distributed to the worker nodes
 - When they get used ( just in time distribution )
 - When they get created ( proactive distribution )

The just in time distribution follows the model used by how schema's get created right before we are going to create a table in that schema, for types this would be when the table uses a type as its column.

The proactive distribution is suitable for situations where it is benificial to have the type on the worker nodes directly. They can later on be used in queries where an intermediate result gets created with a cast to this type.

Just in time creation is always the last resort, you cannot create a distributed table before the type gets created. A good example use case is; you have an existing postgres server that needs to scale out. By adding the citus extension, add some nodes to the cluster, and distribute the table. The type got created before citus existed. There was no moment where citus could have propagated the creation of a type.

Proactive is almost always a good option. Types are not resource intensive objects, there is no performance overhead of having 100's of types. If you want to use them in a query to represent an intermediate result (which happens in our test suite) they just work.

There is however a moment when proactive type distribution is not beneficial; in transactions where the type is used in a distributed table.

Lets assume the following transaction:

```sql
BEGIN;
CREATE TYPE tt1 AS (a int, b int);
CREATE TABLE t1 AS (a int PRIMARY KEY, b tt1);
SELECT create_distributed_table('t1', 'a');
\copy t1 FROM bigdata.csv
```

Types are node scoped objects; meaning the type exists once per worker. Shards however have best performance when they are created over their own connection. For the type to be visible on all connections it needs to be created and committed before we try to create the shards. Here the just in time situation is most beneficial and follows how we create schema's on the workers. Outside of a transaction block we will just use 1 connection to propagate the creation.

How propagation works
=================

Just in time
-----------

Just in time propagation hooks into the infrastructure introduced in #2882. It adds types as a supported object in `SupportedDependencyByCitus`. This will make sure that any object being distributed by citus that depends on types will now cascade into types. When types are depending them self on other objects they will get created first.

Creation later works by getting the ddl commands to create the object by its `ObjectAddress` in `GetDependencyCreateDDLCommands` which will dispatch types to `CreateTypeDDLCommandsIdempotent`.

For the correct walking of the graph we follow array types, when later asked for the ddl commands for array types we return `NIL` (empty list) which makes that the object will not be recorded as distributed, (its an internal type, dependant on the user type).

Proactive distribution
---------------------

When the user creates a type (composite or enum) we will have a hook running in `multi_ProcessUtility` after the command has been applied locally. Running after running locally makes that we already have an `ObjectAddress` for the type. This is required to mark the type as being distributed.

Keeping the type up to date
====================

For types that are recorded in `pg_dist_object` (eg. `IsObjectDistributed` returns true for the `ObjectAddress`) we will intercept the utility commands that alter the type.
 - `AlterTableStmt` with `relkind` set to `OBJECT_TYPE` encapsulate changes to the fields of a composite type.
 - `DropStmt` with removeType set to `OBJECT_TYPE` encapsulate `DROP TYPE`.
 - `AlterEnumStmt` encapsulates changes to enum values.
    Enum types can not be changed transactionally. When the execution on a worker fails a warning will be shown to the user the propagation was incomplete due to worker communication failure. An idempotent command is shown for the user to re-execute when the worker communication is fixed.

Keeping types up to date is done via the executor. Before the statement is executed locally we create a plan on how to apply it on the workers. This plan is executed after we have applied the statement locally.

All changes to types need to be done in the same transaction for types that have already been distributed and will fail with an error if parallel queries have already been executed in the same transaction. Much like foreign keys to reference tables.
2019-09-13 17:46:07 +02:00
Jelte Fennema e4cfea3751 Correctly add schema when distributing sequence definitons
Fixes 2958
2019-09-13 17:19:35 +02:00
Jelte Fennema 579a40dfa5 Add make check-base-mx 2019-09-13 17:19:35 +02:00
Halil Ozan Akgul 4d34b79b87 There were two multi insert - single insert tests but no multi insert - multi insert test. Fixed it. 2019-09-13 16:09:11 +03:00
Nils Dijk 05f0668cdc
Fix: schema leak onto create index statement cache (#2964)
DESCRIPTION: Fix schema leak on CREATE INDEX statement

When a CREATE INDEX is cached between execution we might leak the schema name onto the cached statement of an earlier execution preventing the right index to be created.

Even though the cache is cleared when the search_path changes we can trigger this behaviour by having the schema already on the search path before a colliding table is created in a schema earlier on the `search_path`. When calling an unqualified create index via a function (used to trigger the caching behaviour) we see that the index is created on the wrong table after the schema leaked onto the statement.

By copying the complete `PlannedStmt` and `utilityStmt` during our planning phase for distributed ddls we make sure we are not leaking the schema name onto a cached data structure.

Caveat; COPY statements already have a lot of parsestree copying ongoing without directly putting it back on the `pstmt`. We should verify that copies modify the statement and potentially copy the complete `pstmt` there already.
2019-09-13 14:04:23 +02:00