As suggested by @marcocitus in https://github.com/citusdata/citus/pull/3911#issuecomment-643978531, there was
a regression in #3893. If another backend would write a file during deletion of
the intermediate results directory, this file would not necessarily be deleted.
The approach used in `CitusRemoveDirectory` is to try recursive removal of the
directory again if it has failed. This does not work here, since when a file
can not be removed for other reasons (e.g. `EPERM`) it will not throw an error
anymore. So then we would get into an infinite removal loop. Instead I now
`rename` the directory before removing it. That way other backends will not
write files to it anymore.
We sort the workerList because adaptive connection management
(e.g., OPTIONAL_CONNECTION) requires any concurrent executions
to wait for the connections in the same order to prevent any
starvation. If we don't sort, we might end up with:
Execution 1: Get connection for worker 1, wait for worker 2
Execution 2: Get connection for worker 2, wait for worker 1
and, none could proceed. Instead, we enforce every execution establish
the required connections to workers in the same order.
We've had two issues with merge conflicts to enterprise in the last week, that
suddenly happened. Because of this CI check this actually blocks all community
PRs from being merged.
This PR tries to improve on the previous script we had, by putting tougher
constraints on when a merge is allowed.
Previously the check would pass in two cases:
1. This PR be merged without conflicts into `enterprise-master`
2. A branch exists with the same name as this PR on enterprise and that can be
merged into `enterprise-master`.
The first case stays the same, but I've changed the second case to require the
following instead:
1. A branch exists on enterprise with the same name as this PR
2. **NEW: This branch contains the the last commit of the community PR branch**
3. This branch can be merged into enterprise-master
This makes sure the enterprise branch is actually up to date and not forgotten about.
If we still get problems with this change, future improvements could be:
1. Check that the PR on enterprise passes CI
2. Check that the PR on enterprise has been approved
3. Require the enterprise PR branch to be merged before merging community.
When we are using hammerdb jobs, the job creates a branch on test
automation, since that branch should be deleted, it would have
`delete_me` prefix, however since the result branch on
release-test-results will have the test automation branch as prefix, it
will also have `delete_me` prefix, which seems a bit confusing.
This PR updates it as citus_github_push
In #3901 the "Data received from worker(s)" sections were added to EXPLAIN
ANALYZE. After merging @pykello posted some review comments. This addresses
those comments as well as fixing a other issues that I found while addressing
them. The things this does:
1. Fix `EXPLAIN ANALYZE EXECUTE p1` to not increase received data on every
execution
2. Fix `EXPLAIN ANALYZE EXECUTE p1(1)` to not return 0 bytes as received data
allways.
3. Move `EXPLAIN ANALYZE` specific logic to `multi_explain.c` from
`adaptive_executor.c`
4. Change naming of new explain sections to `Tuple data received from node(s)`.
Firstly because a task can reference the coordinator too, so "worker(s)" was
incorrect. Secondly to indicate that this is tuple data and not all network
traffic that was performed.
5. Rename `totalReceivedData` in our codebase to `totalReceivedTupleData` to
make it clearer that it's a tuple data counter, not all network traffic.
6. Actually add `binary_protocol` test to `multi_schedule` (woops)
7. Fix a randomly failing test in `local_shard_execution.sql`.
Sometimes isolation tests get stuck in CI and we cannot see why, because
the job is killed by the CI runner. This will instead fail inside make
the testsuite continue, but mark it as a failure like this in the diff
output:
```diff
+isolationtester: canceling step s2-ddl-create-index-concurrently after 5 seconds
step s2-ddl-create-index-concurrently: CREATE INDEX CONCURRENTLY select_append_index ON select_append(id);
+ERROR: CONCURRENTLY-enabled index command failed
```
We should detect blockages very quickly and the queries we run are also
very fast, so 5 seconds should be more than enough to catch any random
slowness. The default from Postgres is 5 minutes, which is waaay to much
for us.
Shard id to index mapping stored in cache entry as there may now be multiple entries alive for a given relation
insert_select_executor: revert copying cache entry, which was a hack added to avoid memory safety issues
Sadly this does not actually work yet for binary protocol data, because
when doing EXPLAIN ANALYZE we send two commands at the same time. This
means we cannot use `SendRemoteCommandParams`, and thus cannot use the
binary protocol. This can still be useful though when using the text
protocol, to find out that a lot of data is being sent.
Due to the problem described in #3908 we don't cover the tdigest integration (and other extensions) on CI.
Due to this a bug got in the patch due to a change in `EXPLAIN VERBOSE` being merged concurrently with the tdigest integration. This PR fixes the test output that missed the newly added information.
* Insert select with master query
* Use relid to set custom_scan_tlist varno
* Reviews
* Fixes null check
Co-authored-by: Marco Slot <marco.slot@gmail.com>
This can save a lot of data to be sent in some cases, thus improving
performance for which inter query bandwidth is the bottleneck.
There's some issues with enabling this as default, so that's currently not done.
DESCRIPTION: Adds support to partially push down tdigest aggregates
tdigest extensions: https://github.com/tvondra/tdigest
This PR implements the partial pushdown of tdigest calculations when possible. The extension adds a tdigest type which can be combined into the same structure. There are several aggregate functions that can be used to get;
- a quantile
- a list of quantiles
- the quantile of a hypothetical value
- a list of quantiles for a list of hypothetical values
These function can work both on values or tdigest types.
Since we can create tdigest values either by combining them, or based on a group of values we can rewrite the aggregates in such a way that most of the computation gets delegated to the compute on the shards. This both speeds up the percentile calculations because the values don't have to be sorted while at the same time making the transfer size from the shards to the coordinator significantly less.
We still recursively plan some cases, eg:
- INSERTs
- SELECT FOR UPDATE when reference tables in query
- Everything must be same single shard & replication model
We wrap worker tasks in worker_save_query_explain_analyze() so we can fetch
their explain output later by a call worker_last_saved_explain_analyze().
Fixes#3519Fixes#2347Fixes#2613Fixes#621
This code is not needed anymore since #3668 was merged.
It's actually causing some issues when using the binary Postgres
protocol, because postgres thinks it gets a `bigint` from
the worker, but actually gets an normal `int`.
The query in question that fails is this:
```sql
CREATE TABLE test_table_1(id int, val1 int);
CREATE TABLE test_table_2(id int, val1 bigint);
SELECT create_distributed_table('test_table_1', 'id');
SELECT create_distributed_table('test_table_2', 'id');
INSERT INTO test_table_1 VALUES(1,1),(2,2),(3,3);
INSERT INTO test_table_2 VALUES(1,1),(3,3),(4,5);
SELECT val1
FROM test_table_1 LEFT JOIN test_table_2 USING(id, val1)
ORDER BY 1;
```
The difference in queries that is sent to the workers after this change is this, for this query:
```diff
--- query_old.sql 2020-06-09 09:51:21.460000000 +0200
+++ query_new.sql 2020-06-09 09:51:39.500000000 +0200
@@ -1 +1 @@
-SELECT worker_column_1 AS val1 FROM (SELECT test_table_1.val1 AS worker_column_1 FROM (public.test_table_1_102015 test_table_1(id, val1) LEFT JOIN public.test_table_2_102019 test_table_2(id, val1) USING (id, val1))) worker_subquery
+SELECT worker_column_1 AS val1 FROM (SELECT val1 AS worker_column_1 FROM (public.test_table_1_102015 test_table_1(id, val1) LEFT JOIN public.test_table_2_102019 test_table_2(id, val1) USING (id, val1))) worker_subquery
```
This is a different version of #3634. It also removes SwallowErrors, but
instead of modifying our own functions to not throw errors, it uses the
postgres built in `PathNameDeleteTemporaryDir` function. This function
does not throw errors.
Since this change is for a bugfix, I tried to minimize the changes.
PRs with the following changes would be good to do separately from this
PR:
1. Use PathName(Create|Open|Delete)Temporary(File|Dir) to open and
remove all files/dirs instead of our own custom file functions.
2. Prefix our outmost files/directories with `PG_TEMP_FILE_PREFIX` so
that they are identified by Postgres as temporary files, which will be
removed at postmaster start. This way we do not have to do this cleanup
ourselves.
3. Store the files in the temporary table space if it exists.
Fixes#3634Fixes#3618
Implements worker_save_query_explain_analyze and worker_last_saved_explain_analyze.
worker_save_query_explain_analyze executes and returns results of query while
saving its EXPLAIN ANALYZE to be fetched later.
worker_last_saved_explain_analyze returns the saved EXPLAIN ANALYZE result.
If we want to get necessary lockmode for a relation RangeVar within
a query, we can get the lockmode easily from the RangeVar itself (if
pg version >= 12).
However, if we want to decide the lockmode appropriate for the
"query", we can derive this information by using GetQueryLockMode
according to the code comment from RangeTblEntry->rellockmode.
Implements a new `TupleDestination` interface to allow custom tuple processing per task.
This can be specially useful if a task contains multiple queries. An example of this EXPLAIN
ANALYZE, where it needs to add some UDF calls to the query to fetch the explain output
from worker after fetching the actual query results.
We should check the remove type in IsDropCitusStmt because if the remove
type is not OBJECT_EXTENSION then the stored objects in
dropStmt->objects may not be of type Value. This was crashing PG-13.
Also rename the method as IsDropCitusExtensionStmt.
To reduce code duplication, implement function that pushes search_path
to be NIL and sets addCatalog to true so that all objects outside of
pg_catalog will be schema-prefixed.
Append IF NOT EXISTS to CREATE SERVER commands generated by
pg_get_serverdef_string function when deparsing an existing server
object that a foreign table depends.
SELECT_TASK is renamed to READ_TASK as a SELECT with modifying CTEs will be a MODIFYING_TASK
RouterInsertJob: Assert originalQuery->commandType == CMD_INSERT
CreateModifyPlan: Assert originalQuery->commandType != CMD_SELECT
Remove unused function IsModifyDistributedPlan
DistributedExecution, ExecutionParams, DistributedPlan: Rename hasReturning to expectResults
SELECTs set expectResults to true
Rename CreateSingleTaskRouterPlan to CreateSingleTaskRouterSelectPlan
Do not release AccessShareLock when closing pg_constraint to prevent
modifications to be done on pg_constraint to make sure that caller
will process valid foreign key constraints through the transaction.
With this commit:
You can trigger two types of hammerdb benchmark jobs:
-ch_benchmark (analytical and transactional queries)
-tpcc_benchmark (only transactional queries)
Your branch will be run against `master` branch.
In order to trigger the jobs prepend `ch_benchmark/` or `tpcc_benchmark/` to your branch and push it.
For example if you were running on a feature/improvement branch with name `improve/adaptive_executor`. In order to trigger a tpcc benchmark, you can do the following:
```bash
git checkout improve/adaptive_executor
git checkout -b tpcc_benchmark/improve/adaptive_executor
git push origin tpcc_benchmark/improve/adaptive_executor # the tpcc benchmark job will be triggered.
```
You will see the results in a branch in [https://github.com/citusdata/release-test-results](https://github.com/citusdata/release-test-results).
The branch name will be something like: `delete_me/citusbot_tpcc_benchmark_rg/<date>/<date>`.
The resource groups will be deleted automatically but if the benchmark fails, they won't be deleted(If you don't see the results after a reasonable time, it might mean it failed, you can check the resource usage from portal, if it is almost 0 and you didn't see the results, it means it probably failed). In that case, you will need to delete the resource groups manually from portal, the resource groups are `citusbot_ch_benchmark_rg` and `citusbot_tpcc_benchmark_rg`.
This PR removes ExecuteUtilityTaskListWithoutResults and uses the same
path for local execution via ExecuteTaskListExtended.
ExecuteUtilityTaskList is added. ExecuteLocalTaskListExtended now has a
parameter for utility commands so that it can call the right method. In
order not to change the existing calls,
ExecuteTaskListExtendedInternal is added, which is the main method that
runs the execution, via local and remote execution.
DESCRIPTION: Ignore pruned target list entries in coordinator plan
The postgres planner has the ability to prune target list entries that are proven not used in the output relation. When this happens at the `CitusCustomScan` boundary we need to _not_ return these pruned columns to not upset the rest of the planner.
By using the target list the planner asks us to return we fix issues that lead to Assertion failures, and potentially could be runtime errors when they hit in a production build.
Fixes#3809
* add a job to check if merge to enterprise master would fail
Add a job to check if merge to enterprise master would fail.
The job does the following:
- It checks if there is already a branch with the same name on
enterprise, if so it tries to merge it to enterprise master, if the
merge fails the job fails.
- If the branch doesn't exist on the enterprise, it tries to merge the
current branch to enterprise master, it fails if there is any conflict
while merging.
The motivation is that if a branch on community would create a conflict
on enterprise-master, until we create a PR on enterprise that would
solve this conflict, we won't be able to merge the PR on community. This
way we won't have many conflicts when merging to enterprise master and
the author, who has the most context will be responsible for resolving
the conflict when he has the most context, not after 1 month.
* Improve test suite to be able to easily run locally
* Add documentation on how to resolve conflicts to enterprise master
* Improve enterprise merge script
* Improve merge conflict job README
* Improve merge conflict job README
* Improve merge conflict job README
* Improve merge conflict job README
Co-authored-by: Nils Dijk <nils@citusdata.com>
In the code, we had the assumption that if restriction information
is NULL, it means that we cannot have any disributetd tables in
the subquery.
However, for subqueries in WHERE clause, that is not the case when
the subquery is ANDed with FALSE. In that case, Citus operates
on the originalQuery (which doesn't go through the standard_planner()),
and rely on the restriction information generated by standard_plannner().
As Postgres is smart enough to no generate restriction information for
subqueries ANDed with FALSE, we hit the assertion.
The reason is that PQconnectPoll() may change the underlying
socket. If we don't rebuild the wait event set, the low level
APIs (such as epoll_ctl()) may fail due to invalid sockets.
Instead, rebuilding ensures that we'll use accurate/active sockets.
* Not append empty task in ExtractLocalAndRemoteTasks
ExtractLocalAndRemoteTasks extracts the local and remote tasks. If we do
not have a local task the localTaskPlacementList will be NIL, in this
case we should not append anything to local tasks. Previously we would
first check if a task contains a single placement or not, now we first
check if there is any local task before doing anything.
* fix copy of node task
Task node has task query, which might contain a list of strings in its
fields. We were using postgres copyObject for these lists. Postgres
assumes that each element of list will be a node type. If it is not a
node type it will error.
As a solution to that, a new macro is introduced to copy a list of
strings.
The previous default was 5 seconds, and we change it to 30 seconds.
The main motivation for this is that for busy clusters, 5 seconds
can be too aggressive. Especially with connection throttling, the servers
might be kept busy for a really long time, and users may see the
connection errors more frequently.
We've done some sanity checks, for really quick queries (like
`SELECT count(*) from table`), 30 seconds is a decent value even
if users execute 300 distributed queries on the coordinator. We've
verified this on Hyperscale(Citus).
Physical planner doesn't support parameters. If the parameters have already
been resolved when the physical planner handling the queries, mark it.
The reason is that the executor is unaware of this, and sends the parameters
along with the worker queries, which fails for composite types.
(See `DissuadePlannerFromUsingPlan()` for the details of paramater resolving)
As reported in #3787, we were having issues while building citus with "GCC Red Hat 10" (maybe in some other versions of gcc as well).
Fixes "multiple definition of 'CitusNodeTagNames'" error by explicitly specifying storage of CitusNodeTagNames to be extern.
We currently put the actual error message to the detail part. However,
many drivers don't show detail part.
As connection errors are somehow common, and hard to trace back, can't
we added the detail to the message itself.
In addition to that, we changed "connection error" message, as it
was confusing to the users who think that the error was happening
while connecting to the coordinator. In fact, this error is showing
up when the coordinator fails to connect remote nodes.
* invalidate plan cache in master_update_node
If a plan is cached by postgres but a user uses master_update_node, then
when the plan cache is used for the updated node, they will get the old
nodename/nodepost in the plan. This is because the plan cache doesn't
know about the master_update_node. This could be a problem in prepared
statements or anything that goes into plancache. As a solution the plan
cache is invalidated inside master_update_node.
* add invalidate_inactive_shared_connections test function
We introduce invalidate_inactive_shared_connections udf to be used in
testing. It is possible that a connection count for an inactive node
will be greater than 0 and in that case it will not be removed at the
time of invalidation. However, later we don't have a mechanism to remove
it, which means that it will stay in the hash. For this not to cause a
problem, we use this udf in testing.
* move invalidate_inactive_shared_connections to udfs from test as it will be used in mx
* remove the test udf
* remove the IsInactive check
This copies over fixes from reference counting branch,
all CitusTableCacheEntry data may be freed when a GetCitusTableCacheEntry call occurs for its relationId
This fix is not complete, but reference counting is being deferred until 9.4
CopyShardInterval: remove dest parameter, always return newly allocated object
We initially considered removing entries just before any change to
pg_dist_node. However, that ended-up being very complex and making
MX even more complex.
Instead, we're switching to a simpler solution, where we remove entries
when the counter gets to 0.
With certain workloads, this may have some performance penalty. But, two
notes on that:
- When counter == 0, it implies that the cluster is not busy
- With cached connections, that's not possible
When we call SetTaskQueryString we would set the task type to
TASK_QUERY_TEXT, and some parts of the codebase rely on the fact that if
TASK_QUERY_TEXT is set, the data can be read safely. However if
SetTaskQueryString is called with a NULL taskQueryString this can cause
crashes. In that case taskQueryType will simply be set to
TASK_QUERY_NULL.
DESCRIPTION: Alter role only works for citus managed roles
Alter role was implemented before we implemented good role management that hooks into the object propagation framework. This is a refactor of all alter role commands that have been implemented to
- be on by default
- only work for supported roles
- make the citus extension owner a supported role
Instead of distributing the alter role commands for roles at the beginning of the node activation role it now _only_ executes the alter role commands for all users in all databases and in the current database.
In preparation of full role support small refactors have been done in the deparser.
Earlier tests targeting other roles than the citus extension owner have been either slightly changed or removed to be put back where we have full role support.
Fixes#2549
We had 9+ parameters in some of the functions related to execution.
Execution params is created to simplify this a bit so that we can set
only the fields that we are interested in and it is easier to read.
With this commit, we're introducing a new infrastructure to throttle
connections to the worker nodes. This infrastructure is useful for
multi-shard queries, router queries are have not been affected by this.
The goal is to prevent establishing more than citus.max_shared_pool_size
number of connections per worker node in total, across sessions.
To do that, we've introduced a new connection flag OPTIONAL_CONNECTION.
The idea is that some connections are optional such as the second
(and further connections) for the adaptive executor. A single connection
is enough to finish the distributed execution, the others are useful to
execute the query faster. Thus, they can be consider as optional connections.
When an optional connection is not allowed to the adaptive executor, it
simply skips it and continues the execution with the already established
connections. However, it'll keep retrying to establish optional
connections, in case some slots are open again.
We currently don't use any cursor flags in local execution, but we can
use CURSOR_OPT_PARALLEL_OK flag to potentially benefit from parallelism
when possible.
This PR:
- Declares variables when they are needed.
- Creates DoCopyFromLocalTableIntoShards for better readability.
- Doesn't use a hardcoded value, instead use a variable for better
readability.
We have two variables that are related to local execution status.
TransactionAccessedLocalPlacement and
TransactionConnectedToLocalGroup. Only one of these fields should be
set, however we didn't have any check for this contraint and it was
error prone.
What those two variables are used is that we are trying to understand if
we should use local execution, the current session, or if we should be
using a connection to execute the current query, therefore the tasks. In
the enum, now it is more clear what these variables mean.
Also, now we have a method to change the local execution status. The
method will error if we are trying to transition from a state to a wrong
state. This will help us avoid problems.
* use local executon when in a transaction block
When we are inside a transaction block, there could be other methods
that need local execution, therefore we will use local execution in a
transaction block.
* update test outputs with transaction block local execution
* add a test to verify we dont leak intermediate schemas
* test that we don't leak intermediate schemas
We have tests to make sure that we don't intermediate any intermediate
files, tables etc but we don't test if we are leaking schemas. It makes
sense to test this as well.
* remove all repartition schemas in case of error
This solution is not an ideal one but it seems to be doing the job.
We should have a more generic solution for the cleanup but it seems that
putting the cleanup in the abort handler is dangerous and it was
crashing.
It is possible to return an error in ExecuteTaskListExtended after
performing local execution with the current structure. However there is
no point in execution the local tasks if we are going to return an error
later. So the local execution is moved after the error check.
When the file does not exist, it could mean two different things.
First -- and a lot more common -- case is that a failure happened
in a concurrent backend on the same distributed transaction. And,
one of the backends in that transaction has already been roll
backed, which has already removed the file. If we throw an error
here, the user might see this error instead of the actual error
message. Instead, we prefer to WARN the user and pretend that the
file has no data in it. In the end, the user would see the actual
error message for the failure.
Second, in case of any bugs in intermediate result broadcasts,
we could try to read a non-existing file. That is most likely
to happen during development. Thus, when asserts enabled, we throw
an error instead of WARNING so that the developers cannot miss.
When we have a query like the following:
```SQL
WITH a AS (SELECT * FROM foo LIMIT 10) SELECT max(x) FROM a JOIN bar 2 USING (y);
```
Citus currently opens side channels for doing the
`COPY "1_1"` FROM STDIN (format 'result')
before starting the execution of
`SELECT * FROM foo LIMIT 10`
Since we need at least 1 connection per worker to do
`SELECT * FROM foo LIMIT 10`
We need to have 2 connections to worker in order to broadcast the results.
However, we don't actually send a single row over the side channel until the
execution of `SELECT * FROM foo LIMIT 10` is completely done (and connections
unclaimed) and the results are written to a tuple store. We could actually
reuse the same connection for doing the `COPY "1_1"` FROM STDIN (format 'result').
This also fixes the issue that Citus doesn't obey `citus.max_adaptive_executor_pool_size`
when the query includes an intermediate result.
We don't need any side channel connections. That is actually
problematic in the sense that it creates extra connections.
Say, citus.max_adaptive_executor_pool_size equals to 1, Citus
ends up using one extra connection for the intermediate results.
Thus, not obeying citus.max_adaptive_executor_pool_size.
In this PR, we remove the following entities from the codebase
to allow further commits to implement not requiring extra connection
for the intermediate results:
- The connection flag REQUIRE_SIDECHANNEL
- The function GivePurposeToConnection
- The ConnectionPurpose struct and related fields
* explicitly return false if transaction connected to local node
* not set TransactionConnectedToLocalGroup if we are writing to a file
We use TransactionConnectedToLocalGroup to prevent local execution from
happening as that might cause visibility problems. As files are visible
to all transactions, we shouldn't set this variable if we are writing to
a file.
In case we don't care about the tupleStoreState in
ExecuteLocalTaskListExtended, it could be passed as null. In that case
we will get a seg error. This changes it so that a dummy tuple store
will be created when it is null.
Do not use local execution in ExecuteTaskListOutsideTransaction.
As we are going to run the tasks outside transaction, we shouldn't use local execution.
However, there is some problem when using local execution related to
repartition joins, when we solve that problem, we can execute the tasks
coming to this path with local execution.
Also logging the local command is simplified.
normalize job id in worker_hash_partition_table in test outputs.
For shardplacements, we were setting nodeid, nodename, nodeport and
nodegroup manually. This makes it very error prone, and it seems that we
already forgot to set some of them. This would mean that they would have
their default values, e.g group id would be 0 when its group id is not
0.
So the implication is that we would have inconsistent worker metadata.
A new method is introduced, and we call the method to set those fields
now, so that as long as we call this method, we won't be setting
inconsistent metadata.
It probably makes sense to have a struct for these fields. We already
have NodeMetadata but it doesn't have nodename or nodeport. So that
could be done over another refactor to make things simpler.
This is possible whenever we aren't pulling up intermediate rows
We want to do this because this was done in 9.2,
some queries rely on the performance of grouping causing distinct values
This change was introduced when implementing window functions on coordinator
The purpose of null_parameters is to make sure that citus doesn't crash
with null parameters. (The related issue is #3493.) The logs in this
file are not that important and they are flaky. The flakiness is related
to postgres part as well so it is hard to reproduce them. Therefore it
makes sense to decrease the log level.
look at sent commands to simplify complex logic in vacuum test
also normalize connection id as that can differ when we don't have to
choose a specific connection.
It seems that sometimes the pruning is deferred and sometimes not with
this statement. What we care in this test is to see that it doesn't
crash. I think we don't care about the log statement for this line. So
it makes sense to not log this statement, and care about the result.
ExecuteTaskListExtended is the common method for different codepaths,
and instead of writing separate local execution logics in different
codepaths, it makes more sense to have the logic here. We still need to
do some refactoring, this is an initial step.
After this commit, we can run create shard commands locally. There is a
special case with shard creation commands. A create shard command might
have a concatenated query string, however local execution did not know
how to execute a task with multiple query strings. This is also
implemented in this commit. We go over each query in the concatenated
query string and plan/execute them one by one.
A more clean solution to this would be to make sure that each task has a
single query. We currently cannot do that because we need to ensure the
task dependencies. However, it would make sense to do that at some point
and it would simplify the code a lot.
It seems that one of the deadlock detection tests fails way too often in
our CI. The difference is only ordering. Currently it seems that it is a
good idea to disable this test for the sake of development.
ExecuteLocalTaskList doesn't need scanState as it only uses
paramListInfo, distributedPlan and tupleStoreState. It is better to pass
only the variables that the function needs, so that we can call this
function from other places when we dont have scanState.
We had many fields in task related to query strings. It was kind of
complex, and only of them could be set at a time. Therefore it makes
more sense to abstract this and use a union so that it is clear that
only of them should be set.
We have three fields that could have query related strings:
- queryForLocation
- queryStringLazy
- perPlacementQueryStrings
Relatively, they can be set with:
- SetTaskQueryString
- SetTaskQueryIfShouldLazyDeparse
- SetTaskPerPlacementQueryStrings
The direct usage of the query related fields are also removed.
Rename queryForLocalExecution
Currently queryForLocalExecution is only used for deparsing purposes,
therefore it makes sense to rename it to what it is doing.
TaskQueryStringForPlacement simplifies how the executor gets the query
string for a given placement. Task will use the necessary fields to
return the correct query placement string. Executor doesn't need to know
the details for this.
rename TaskQueryString as TaskQueryStringAllPlacements
TaskQueryString returns the query string that will be the same for all
the placements. In INSERT..SELECT the query string can be different for
each placement. Adaptive executor uses TaskQueryStringForPlacement,
which returns the query string for a placement. It makes sense to rename
TaskQueryString as TaskQueryStringAllPlacements as it is returning the
query string for all placements.
rename SetTaskQuery as SetTaskQueryIfShouldLazyDeparse
SetTaskQuery does not always sets the task query. It can set the query
string as well. So it is more clear to name it
SetTaskQueryIfShouldLazyDeparse, since it will set the query not query
string only when we should deparse the query in a lazy way.
It is possible that a task will have different query string for each
placement. This is the case in INSERT..SELECT via repartitioning. When
we are setting task->perPlacementQueryString, we should set
queryStringLazy to NULL. Therefore a method for that purpose is created.
In PostgreSQL, user defaults for config parameters can be changed by
ALTER ROLE .. SET statements. We wish to propagate those defaults
accross the Citus cluster so that the behaviour will be similar in
different workers.
The defaults can either be set in a specific database, or the whole
cluster, similarly they can be set for a single role or all roles.
We propagate the ALTER ROLE .. SET if all the conditions below are met:
- The query affects the current database, or all databases
- The user is already created in worker nodes
Sometimes we have concatenated query strings for a task. However,
when we want to find each query string, it is not a trivial task.
Therefore, it makes sense to store this in task so that when we need
each query string we can easily get it.
Some refactoring:
Consolidate expression which decides whether GROUP BY/HAVING are pushed down
Rename early pullUpIntermediateRows to hasNonDistributableAggregates
Create WorkerColumnName to handle formatting WORKER_COLUMN_FORMAT
Ignore NULL StringInfo pointers to SafeToPushdownWindowFunction
Fix bug where SubqueryPushdownMultiNodeTree mutates supplied Query,
SafeToPushdownWindowFunction requires the original query as it relies on rtable
DESCRIPTION: Refactor dependency resolution and resolve from pg_shdepend
This PR refactors how dependencies are resolved by not assuming solely a `pg_depend` record describing the dependency. Instead we keep a definition of the dependency around which records how the dependency is resolved. This can be one of the following ways
- `pg_depend`, data will contain a copy of the `pg_depend` record
- `pg_shdepend`, data will contain a copy of the `pg_shdepend` record
- `ObjectAddress`, data will contain only an `ObjectAddress` describing a dependency
Irregardless of way the dependency was found it will always be able to get to the address of the dependency as that is the most important property.
For some checks we can inspect the source where the dependency was found and perform a deep inspection to decide if we want to follow the dependency. This is important to not distribute dependencies coming from extensions for example.
We cache connections between nodes in our connection management code.
This is good for speed. For security this can be a problem though. If
the user changes settings related to TLS encryption they want those to
be applied to future queries. This is especially important when they did
not have TLS enabled before and now they want to enable it. This can
normally be achieved by changing citus.node_conninfo. However, because
connections are not reopened there will still be old connections that
might not be encrypted at all.
This commit changes that by marking all connections to be shutdown at
the end of their current transaction. This way running transactions will
succeed, even if placement requires connections to be reused for this
transaction. But after this transaction completes any future statements
will use a connection created with the new connection options.
If a connection is requested and a connection is found that is marked
for shutdown, then we don't return this connection. Instead a new one is
created. This is needed to make sure that if there are no running
transactions, then the next statement will not use an old cached
connection, since connections are only actually shutdown at the end of a
transaction.