Commit Graph

1696 Commits (07f9a442b0f822c90c2c71ad01b0728ef1e5883f)

Author SHA1 Message Date
SaitTalhaNisanci 07f9a442b0
Refactor CopyLocalDataIntoShards (#3693)
This PR:
- Declares variables when they are needed.
- Creates DoCopyFromLocalTableIntoShards for better readability.
- Doesn't use a hardcoded value, instead use a variable for better
readability.
2020-04-10 09:25:26 +03:00
Marco Slot a4b2197450 Correctly handle non-constant LIMIT/OFFSET clauses 2020-04-09 19:59:50 +00:00
SaitTalhaNisanci 3dc7cad754
use an enum for local execution status (#3733)
We have two variables that are related to local execution status.
TransactionAccessedLocalPlacement and
TransactionConnectedToLocalGroup. Only one of these fields should be
set, however we didn't have any check for this contraint and it was
error prone.

What those two variables are used is that we are trying to understand if
we should use local execution, the current session, or if we should be
using a connection to execute the current query, therefore the tasks. In
the enum, now it is more clear what these variables mean.

Also, now we have a method to change the local execution status. The
method will error if we are trying to transition from a state to a wrong
state. This will help us avoid problems.
2020-04-09 19:11:04 +03:00
SaitTalhaNisanci 24dcb02bca
enable local table join with reference table (#3697)
* enable local table join with reference table

* test different cases with local table and reference join
2020-04-09 15:25:54 +03:00
SaitTalhaNisanci 233e4a24d1
use local execution within transaction block (#3714)
* use local executon when in a transaction block

When we are inside a transaction block, there could be other methods
that need local execution, therefore we will use local execution in a
transaction block.

* update test outputs with transaction block local execution

* add a test to verify we dont leak intermediate schemas
2020-04-09 12:41:58 +03:00
SaitTalhaNisanci fa88046ce1
test that we don't leak intermediate schemas (#3737)
* test that we don't leak intermediate schemas

We have tests to make sure that we don't intermediate any intermediate
files, tables etc but we don't test if we are leaking schemas. It makes
sense to test this as well.

* remove all repartition schemas in case of error

This solution is not an ideal one but it seems to be doing the job.
We should have a more generic solution for the cleanup but it seems that
putting the cleanup in the abort handler is dangerous and it was
crashing.
2020-04-09 12:17:41 +03:00
SaitTalhaNisanci 362d72853c
return early in ExecuteTaskListExtended (#3738)
It is possible to return an error in ExecuteTaskListExtended after
performing local execution with the current structure. However there is
no point in execution the local tasks if we are going to return an error
later. So the local execution is moved after the error check.
2020-04-09 10:10:49 +03:00
Hadi Moshayedi 9b8802ba2d Remove todo from reference_table_utils 2020-04-08 12:46:55 -07:00
Hadi Moshayedi dda53a0bba GUC for replicate reference tables on activate. 2020-04-08 12:42:45 -07:00
Hadi Moshayedi 0758a81287 Prevent reference tables being dropped when replicating reference tables 2020-04-08 12:41:36 -07:00
Marco Slot 924cd7343a Defer reference table replication to shard creation time 2020-04-08 12:41:36 -07:00
Philip Dubé 26797bfb94 Verify trigger relation before reading old/new tuples
master_dist_placement_cache_invalidate: bail when triggering on pg_dist_shard_placement
2020-04-07 15:39:31 +00:00
Önder Kalacı 70012dfd33 Do not error when an intermediate file does not exit (#3707)
When the file does not exist, it could mean two different things.
First -- and a lot more common -- case is that a failure happened
in a concurrent backend on the same distributed transaction. And,
one of the backends in that transaction has already been roll
backed, which has already removed the file. If we throw an error
here, the user might see this error instead of the actual error
message. Instead, we prefer to WARN the user and pretend that the
file has no data in it. In the end, the user would see the actual
error message for the failure.

Second, in case of any bugs in intermediate result broadcasts,
we could try to read a non-existing file. That is most likely
to happen during development. Thus, when asserts enabled, we throw
an error instead of WARNING so that the developers cannot miss.
2020-04-07 17:06:55 +02:00
Onder Kalaci 4f7c902c6c Move connection establishment for intermediate results after query execution
When we have a query like the following:

```SQL
WITH a AS (SELECT * FROM foo LIMIT 10) SELECT max(x) FROM a JOIN bar 2 USING (y);
```

Citus currently opens side channels for doing the
	`COPY "1_1"` FROM STDIN (format 'result')

before starting the execution of
	`SELECT * FROM foo LIMIT 10`

Since we need at least 1 connection per worker to do
	`SELECT * FROM foo LIMIT 10`
We need to have 2 connections to worker in order to broadcast the results.

However, we don't actually send a single row over the side channel until the
execution of `SELECT * FROM foo LIMIT 10` is completely done (and connections
unclaimed) and the results are written to a tuple store. We could actually
reuse the same connection for doing the `COPY "1_1"` FROM STDIN (format 'result').

This also fixes the issue that Citus doesn't obey `citus.max_adaptive_executor_pool_size`
when the query includes an intermediate result.
2020-04-07 17:06:55 +02:00
Onder Kalaci 721daec9a5 Move the logic that initilize connections/local files into a function 2020-04-07 17:06:55 +02:00
Onder Kalaci 9b29a32d7a Remove all references for side channel connections
We don't need any side channel connections. That is actually
problematic in the sense that it creates extra connections.
Say, citus.max_adaptive_executor_pool_size equals to 1, Citus
ends up using one extra connection for the intermediate results.
Thus, not obeying citus.max_adaptive_executor_pool_size.

In this PR, we remove the following entities from the codebase
to allow further commits to implement not requiring extra connection
for the intermediate results:

- The connection flag REQUIRE_SIDECHANNEL
- The function GivePurposeToConnection
- The ConnectionPurpose struct and related fields
2020-04-07 17:06:55 +02:00
Hanefi Onaldi 1d22d0c2ff
Remove metadata locks from size functions 2020-04-07 17:37:15 +03:00
SaitTalhaNisanci 0430b568be
explicitly return false if transaction connected to local node (#3715)
* explicitly return false if transaction connected to local node

* not set TransactionConnectedToLocalGroup if we are writing to a file

We use TransactionConnectedToLocalGroup to prevent local execution from
happening as that might cause visibility problems. As files are visible
to all transactions, we shouldn't set this variable if we are writing to
a file.
2020-04-07 17:30:34 +03:00
Marco Slot 2632343f64 Fix intermediate result pruning for INSERT..SELECT 2020-04-07 11:07:49 +02:00
Marco Slot 84672c3dbd Simplify intermediate result pruning logic 2020-04-07 10:53:29 +02:00
SaitTalhaNisanci a710b3cdc5
fix null tupleStoreState case in ExecuteLocalTaskListExtended (#3711)
In case we don't care about the tupleStoreState in
ExecuteLocalTaskListExtended, it could be passed as null. In that case
we will get a seg error. This changes it so that a dummy tuple store
will be created when it is null.

Do not use local execution in ExecuteTaskListOutsideTransaction.
As we are going to run the tasks outside transaction, we shouldn't use local execution.
However, there is some problem when using local execution related to
repartition joins, when we solve that problem, we can execute the tasks
coming to this path with local execution.

Also logging the local command is simplified.

normalize job id in worker_hash_partition_table in test outputs.
2020-04-07 11:47:09 +03:00
SaitTalhaNisanci a369f9001d
fix incorrect groupid or nodeid (#3710)
For shardplacements, we were setting nodeid, nodename, nodeport and
nodegroup manually. This makes it very error prone, and it seems that we
already forgot to set some of them. This would mean that they would have
their default values, e.g group id would be 0 when its group id is not
0.

So the implication is that we would have inconsistent worker metadata.

A new method is introduced, and we call the method to set those fields
now, so that as long as we call this method, we won't be setting
inconsistent metadata.

It probably makes sense to have a struct for these fields. We already
have NodeMetadata but it doesn't have nodename or nodeport. So that
could be done over another refactor to make things simpler.
2020-04-07 11:14:14 +03:00
Philip Dubé 4860e11561 Duplicate grouping on worker whenever possible
This is possible whenever we aren't pulling up intermediate rows

We want to do this because this was done in 9.2,
some queries rely on the performance of grouping causing distinct values

This change was introduced when implementing window functions on coordinator
2020-04-06 18:51:30 +00:00
Philip Dubé b01bae5937 Check connections from connection_placement before polling 2020-04-06 17:45:44 +00:00
Onur Tirtir 4c95ad1579 do not traverse parse tree in distributed planner one more time 2020-04-03 18:24:48 +03:00
Onur Tirtir abdabbedb2 refactor distributed_planner.c 2020-04-03 18:24:41 +03:00
Onur Tirtir 13a35c6813 implement GetOnlyShardOidOfReferenceTable and some refactor in shard_uitls 2020-04-03 18:24:13 +03:00
Hanefi Önaldı d1223bd6cc
Remove migration paths to 9.3-1, introduce 9.3-2 2020-04-03 12:50:45 +03:00
Marco Slot fd8cdb92f4 Evaluate nextval in the target list on the coordinator 2020-04-02 02:53:19 +02:00
SaitTalhaNisanci df88ab71b6 normalize assign_distributed_transaction_id in tests 2020-04-01 18:23:16 +03:00
SaitTalhaNisanci 0aebd78ea7 use localExecution in ExecuteTaskListExtended
ExecuteTaskListExtended is the common method for different codepaths,
and instead of writing separate local execution logics in different
codepaths, it makes more sense to have the logic here. We still need to
do some refactoring, this is an initial step.

After this commit, we can run create shard commands locally. There is a
special case with shard creation commands. A create shard command might
have a concatenated query string, however local execution did not know
how to execute a task with multiple query strings. This is also
implemented in this commit. We go over each query in the concatenated
query string and plan/execute them one by one.

A more clean solution to this would be to make sure that each task has a
single query. We currently cannot do that because we need to ensure the
task dependencies. However, it would make sense to do that at some point
and it would simplify the code a lot.
2020-04-01 18:23:16 +03:00
SaitTalhaNisanci ba01f3457a
use macros for pg versions instead of hardcoded values (#3694)
3 Macros are defined for removing the hardcoded pg versions.
PG_VERSION_11, PG_VERSION_12 and PG_VERSION_13.
2020-04-01 17:01:52 +03:00
Philip Dubé ddc3377026 Assert bounds checks on two array reads which rely on data not being out of bounds 2020-03-31 18:58:35 +00:00
Marco Slot 252abcce16 Allow table type to be used in target list 2020-03-31 11:11:01 -07:00
SaitTalhaNisanci 6cd32b0db1
refactor ExecuteLocalTaskList (#3617)
ExecuteLocalTaskList doesn't need scanState as it only uses
paramListInfo, distributedPlan and tupleStoreState. It is better to pass
only the variables that the function needs, so that we can call this
function from other places when we dont have scanState.
2020-03-31 19:19:54 +03:00
SaitTalhaNisanci b5591b1b28 use taskQuery as a struct to simplify the code 2020-03-31 15:47:55 +03:00
SaitTalhaNisanci 8806c4d697 move queryStringList into taskQuery
Also allocate task query in the memory context of task.
2020-03-31 15:47:55 +03:00
SaitTalhaNisanci c796ac335d add TaskQuery struct to abstract query string related fields
We had many fields in task related to query strings. It was kind of
complex, and only of them could be set at a time. Therefore it makes
more sense to abstract this and use a union so that it is clear that
only of them should be set.

We have three fields that could have query related strings:
- queryForLocation
- queryStringLazy
- perPlacementQueryStrings

Relatively, they can be set with:
- SetTaskQueryString
- SetTaskQueryIfShouldLazyDeparse
- SetTaskPerPlacementQueryStrings

The direct usage of the query related fields are also removed.

Rename queryForLocalExecution

Currently queryForLocalExecution is only used for deparsing purposes,
therefore it makes sense to rename it to what it is doing.
2020-03-31 15:47:55 +03:00
SaitTalhaNisanci 98f95e2a5e add TaskQueryStringForPlacement
TaskQueryStringForPlacement simplifies how the executor gets the query
string for a given placement. Task will use the necessary fields to
return the correct query placement string. Executor doesn't need to know
the details for this.

rename TaskQueryString as TaskQueryStringAllPlacements

TaskQueryString returns the query string that will be the same for all
the placements. In INSERT..SELECT the query string can be different for
each placement. Adaptive executor uses TaskQueryStringForPlacement,
which returns the query string for a placement. It makes sense to rename
TaskQueryString as TaskQueryStringAllPlacements as it is returning the
query string for all placements.

rename SetTaskQuery as SetTaskQueryIfShouldLazyDeparse

SetTaskQuery does not always sets the task query. It can set the query
string as well. So it is more clear to name it
SetTaskQueryIfShouldLazyDeparse, since it will set the query not query
string only when we should deparse the query in a lazy way.
2020-03-31 15:47:55 +03:00
SaitTalhaNisanci 982b5fbabf add SetTaskPerPlacementStrings
It is possible that a task will have different query string for each
placement. This is the case in INSERT..SELECT via repartitioning. When
we are setting task->perPlacementQueryString, we should set
queryStringLazy to NULL. Therefore a method for that purpose is created.
2020-03-31 15:47:55 +03:00
Marco Slot 331b45348c Fix error when using LEFT JOIN with GROUP BY on primary key 2020-03-30 16:42:22 +02:00
SaitTalhaNisanci e1802c5c00
extract local plan cache related methods into a file (#3667) 2020-03-31 11:11:34 +03:00
SaitTalhaNisanci 8dfc2cb122
not append ; if end of the list in StringJoin (#3672) 2020-03-31 10:01:28 +03:00
Philip Dubé 4eb2c33f38 multi_copy.c: remove tableMetadata 2020-03-30 19:26:44 +00:00
Jelte Fennema 3be665269f
Reintroduce ForceSearchShardPlacementInList (#3664)
This was added to silence static analysis errors. It was removed
accidentally in #3591. This reintroduces it again.
2020-03-27 14:28:50 +01:00
Hanefi Onaldi 0e8103b101
Propagate ALTER ROLE .. SET statements
In PostgreSQL, user defaults for config parameters can be changed by
ALTER ROLE .. SET statements. We wish to propagate those defaults
accross the Citus cluster so that the behaviour will be similar in
different workers.

The defaults can either be set in a specific database, or the whole
cluster, similarly they can be set for a single role or all roles.

We propagate the ALTER ROLE .. SET if all the conditions below are met:
- The query affects the current database, or all databases
- The user is already created in worker nodes
2020-03-27 13:02:48 +03:00
Marco Slot a65ffee266 Fixes a bug that causes some DML queries containing aggregates to fail 2020-03-26 16:08:34 +00:00
SaitTalhaNisanci d3fdade2e8
add missing perPlacementQueryStrings to copy and out funcs (#3657) 2020-03-26 17:16:29 +03:00
Marco Slot b89e9dc158 Fix a bug which caused queries with SRFs and function evalution to fail 2020-03-25 06:55:53 +01:00
SaitTalhaNisanci dd1a456407
store query command list in task (#3649)
Sometimes we have concatenated query strings for a task. However,
when we want to find each query string, it is not a trivial task.
Therefore, it makes sense to store this in task so that when we need
each query string we can easily get it.
2020-03-26 12:04:08 +03:00