For another PR I needed to add another column which would require to add
another argument to an already 9 argument function signature. In this
case it would be a boolean flag and there were already two boolean flags
in there. In my experience it becomes really easy to mess up the order
of these flags at that point. Especially because the type system doesn't
distinguish between the 3 different booleans with completely different
meanings.
So I refactored these signatures to receive a struct containing most of
these arguments. Like that you don't mess up orderening, because the
meaning of the boolean is not order dependent but fieldname dependent.
It also makes it possible to set good shared defaults for this struct.
DESCRIPTION: Fix schema leak on CREATE INDEX statement
When a CREATE INDEX is cached between execution we might leak the schema name onto the cached statement of an earlier execution preventing the right index to be created.
Even though the cache is cleared when the search_path changes we can trigger this behaviour by having the schema already on the search path before a colliding table is created in a schema earlier on the `search_path`. When calling an unqualified create index via a function (used to trigger the caching behaviour) we see that the index is created on the wrong table after the schema leaked onto the statement.
By copying the complete `PlannedStmt` and `utilityStmt` during our planning phase for distributed ddls we make sure we are not leaking the schema name onto a cached data structure.
Caveat; COPY statements already have a lot of parsestree copying ongoing without directly putting it back on the `pstmt`. We should verify that copies modify the statement and potentially copy the complete `pstmt` there already.
/*
* local_executor.c
*
* The scope of the local execution is locally executing the queries on the
* shards. In other words, local execution does not deal with any local tables
* that are not shards on the node that the query is being executed. In that sense,
* the local executor is only triggered if the node has both the metadata and the
* shards (e.g., only Citus MX worker nodes).
*
* The goal of the local execution is to skip the unnecessary network round-trip
* happening on the node itself. Instead, identify the locally executable tasks and
* simply call PostgreSQL's planner and executor.
*
* The local executor is an extension of the adaptive executor. So, the executor uses
* adaptive executor's custom scan nodes.
*
* One thing to note that Citus MX is only supported with replication factor = 1, so
* keep that in mind while continuing the comments below.
*
* On the high level, there are 3 slightly different ways of utilizing local execution:
*
* (1) Execution of local single shard queries of a distributed table
*
* This is the simplest case. The executor kicks at the start of the adaptive
* executor, and since the query is only a single task the execution finishes
* without going to the network at all.
*
* Even if there is a transaction block (or recursively planned CTEs), as long
* as the queries hit the shards on the same, the local execution will kick in.
*
* (2) Execution of local single queries and remote multi-shard queries
*
* The rule is simple. If a transaction block starts with a local query execution,
* all the other queries in the same transaction block that touch any local shard
* have to use the local execution. Although this sounds restrictive, we prefer to
* implement in this way, otherwise we'd end-up with as complex scenarious as we
* have in the connection managements due to foreign keys.
*
* See the following example:
* BEGIN;
* -- assume that the query is executed locally
* SELECT count(*) FROM test WHERE key = 1;
*
* -- at this point, all the shards that reside on the
* -- node is executed locally one-by-one. After those finishes
* -- the remaining tasks are handled by adaptive executor
* SELECT count(*) FROM test;
*
*
* (3) Modifications of reference tables
*
* Modifications to reference tables have to be executed on all nodes. So, after the
* local execution, the adaptive executor keeps continuing the execution on the other
* nodes.
*
* Note that for read-only queries, after the local execution, there is no need to
* kick in adaptive executor.
*
* There are also few limitations/trade-offs that is worth mentioning. First, the
* local execution on multiple shards might be slow because the execution has to
* happen one task at a time (e.g., no parallelism). Second, if a transaction
* block/CTE starts with a multi-shard command, we do not use local query execution
* since local execution is sequential. Basically, we do not want to lose parallelism
* across local tasks by switching to local execution. Third, the local execution
* currently only supports queries. In other words, any utility commands like TRUNCATE,
* fails if the command is executed after a local execution inside a transaction block.
* Forth, the local execution cannot be mixed with the executors other than adaptive,
* namely task-tracker, real-time and router executors. Finally, related with the
* previous item, COPY command cannot be mixed with local execution in a transaction.
* The implication of that any part of INSERT..SELECT via coordinator cannot happen
* via the local execution.
*/
Before this patch, when a connection is lost, we'd have the following
situation:
- Pop a task execution from readyQueue
- Lost connection
- Fail the session/pool. -> This step was not acting properly
because we've popped the task, but not set to session->currentTask
yet
After the patch:
- Pop a task execution from readyQueue
- Immediately set it to session->currentTask
- Lost connection
- Fail the session/pool. -> At this step, failing the
session would trigger query failures (or failovers)
properly.
* Add creating a citus cluster script
Creating a citus cluster is automated.
Before running this script:
- Citus should be installed and its control file should be added to postgres. (make install)
- Postgres should be installed.
* Initialize upgrade test table and fill
* Finalize the layout of upgrade tests
Postgres upgrade function is added.
The newly added UDFs(citus_prepare_pg_upgrade, citus_finish_pg_upgrade) are used to
perform upgrade.
* Refactor upgrade test and add config file
* Add schedules for upgrade testing
* Use pg_regress for upgrade tests
pg_regress is used for creating a simple distributed table in
upgrade tests. After upgrading another schedule is used to verify
that the distributed table exists. Router and realtime queries are
used for verifying.
* Run upgrade tests as a postgres user in a temp dir
postgres user is used for psql to be consistent at running tests.
A temp dir is created and the temp dir's permissions are changed so
that postgres user can access it. All psql commands are now run with
postgres user.
"Select * from t" query is changed as "Select * from t order by a"
so that the result is always in the same order.
* Add docopt and arguments for the upgrade script
Docopt dependency is added to parse flags in script.
Some refactoring in variable names is done.
* Add readme for upgrade tests
* Refactor upgrade tests
Use relative data path instead of absolute assuming that this script will
always be run from 'src/test/regress'
Remove 'citus-path' flag
Use specific version for docopt instead of *
Use named args in string formatting
* Resolve a security problem
Instead of using string formatting in subprocess.call, arguments
list is used. Otherwise users could do shell injection.
Shell = True is removed from subprocess call as it is not recommended
to use this.
* Add how the test works to readme
* Refactor some variables to be consistent
* Update upgrade script based on the reviews
It was possible that postgres server would stay running even when the script
crashes, atexit library is used to ensure that we always do a teardown where we stop
the databases.
Some formatting is done in the code for better readability.
Config class is used instead of a dictonary.
A target for upgrade test is added to makefile.
Unused flags/functions/variables are removed.
* Format commands and remove unnecessary flag from readme
This is a bug that got in when we inlined the body of a function into this loop. Earlier revisions had two loops, hence a function that would be reused.
With a return instead of a continue the list of dependencies being walked is dependent on the order in which we find them in pg_depend. This became apparent during pg12 compatibility. The order of entries in pg12 was luckily different causing a random test to fail due to this return.
By changing it to a continue we only skip the entries that we don’t want to follow instead of skipping all entries that happen to be found later.
sidefix for more stable isolation tests around ensure dependency
DESCRIPTION: Refactor ensure schema exists to dependency exists
Historically we only supported schema's as table dependencies to be created on the workers before a table gets distributed. This PR puts infrastructure in place to walk pg_depend to figure out which dependencies to create on the workers. Currently only schema's are supported as objects to create before creating a table.
We also keep track of dependencies that have been created in the cluster. When we add a new node to the cluster we use this catalog to know which objects need to be created on the worker.
Side effect of knowing which objects are already distributed is that we don't have debug messages anymore when creating schema's that are already created on the workers.
* Add tuplestore helpers
* More detailed error messages in tuplestore
* Add CreateTupleDescCopy to SetupTuplestore
* Use new SetupTuplestore helper function
* Remove unnecessary copy
* Remove comment about undefined behaviour