create_distributed_function(function_name,
distribution_arg_name,
colocate_with text)
This UDF did not allow colocate_with parameters when there were no
disttribution_arg_name supplied. This commit changes the behaviour to
allow missing distribution_arg_name parameters when the function should
be colocated with a reference table.
SELECT_TASK is renamed to READ_TASK as a SELECT with modifying CTEs will be a MODIFYING_TASK
RouterInsertJob: Assert originalQuery->commandType == CMD_INSERT
CreateModifyPlan: Assert originalQuery->commandType != CMD_SELECT
Remove unused function IsModifyDistributedPlan
DistributedExecution, ExecutionParams, DistributedPlan: Rename hasReturning to expectResults
SELECTs set expectResults to true
Rename CreateSingleTaskRouterPlan to CreateSingleTaskRouterSelectPlan
We had 9+ parameters in some of the functions related to execution.
Execution params is created to simplify this a bit so that we can set
only the fields that we are interested in and it is easier to read.
ExecuteTaskListExtended is the common method for different codepaths,
and instead of writing separate local execution logics in different
codepaths, it makes more sense to have the logic here. We still need to
do some refactoring, this is an initial step.
After this commit, we can run create shard commands locally. There is a
special case with shard creation commands. A create shard command might
have a concatenated query string, however local execution did not know
how to execute a task with multiple query strings. This is also
implemented in this commit. We go over each query in the concatenated
query string and plan/execute them one by one.
A more clean solution to this would be to make sure that each task has a
single query. We currently cannot do that because we need to ensure the
task dependencies. However, it would make sense to do that at some point
and it would simplify the code a lot.
Deparsing and parsing a query can be heavy on CPU. When locally executing
the query we don't need to do this in theory most of the time.
This PR is the first step in allowing to skip deparsing and parsing
the query in these cases, by lazily creating the query string and
storing the query in the task. Future commits will make use of this and
not deparse and parse the query anymore, but use the one from the task
directly.
* WIP
* wip
* add basic logic to run a single job with repartioning joins with adaptive executor
* fix some warnings and return in ExecuteDependedTasks if there is none
* Add the logic to run depended jobs in adaptive executor
The execution of depended tasks logic is changed. With the current
logic:
- All tasks are created from the top level task list.
- At one iteration:
- CurTasks whose dependencies are executed are found.
- CurTasks are executed in parallel with adapter executor main
logic.
- The iteration is repeated until all tasks are completed.
* Separate adaptive executor repartioning logic
* Remove duplicate parts
* cleanup directories and schemas
* add basic repartion tests for adaptive executor
* Use the first placement to fetch data
In task tracker, when there are replicas, we try to fetch from a replica
for which a map task is succeeded. TaskExecution is used for this,
however TaskExecution is not used in adaptive executor. So we cannot use
the same thing as task tracker.
Since adaptive executor fails when a map task fails (There is no retry
logic yet). We know that if we try to execute a fetch task, all of its
map tasks already succeeded, so we can just use the first one to fetch
from.
* fix clean directories logic
* do not change the search path while creating a udf
* Enable repartition joins with adaptive executor with only enable_reparitition_joins guc
* Add comments to adaptive_executor_repartition
* dont run adaptive executor repartition test in paralle with other tests
* execute cleanup only in the top level execution
* do cleanup only in the top level ezecution
* not begin a transaction if repartition query is used
* use new connections for repartititon specific queries
New connections are opened to send repartition specific queries. The
opened connections will be closed at the FinishDistributedExecution.
While sending repartition queries no transaction is begun so that
we can see all changes.
* error if a modification was done prior to repartition execution
* not start a transaction if a repartition query and sql task, and clean temporary files and schemas at each subplan level
* fix cleanup logic
* update tests
* add missing function comments
* add test for transaction with DDL before repartition query
* do not close repartition connections in adaptive executor
* rollback instead of commit in repartition join test
* use close connection instead of shutdown connection
* remove unnecesary connection list, ensure schema owner before removing directory
* rename ExecuteTaskListRepartition
* put fetch query string in planner not executor as we currently support only replication factor = 1 with adaptive executor and repartition query and we know the query string in the planner phase in that case
* split adaptive executor repartition to DAG execution logic and repartition logic
* apply review items
* apply review items
* use an enum for remote transaction state and fix cleanup for repartition
* add outside transaction flag to find connections that are unclaimed instead of always opening a new transaction
* fix style
* wip
* rename removejobdir to partition cleanup
* do not close connections at the end of repartition queries
* do repartition cleanup in pg catch
* apply review items
* decide whether to use transaction or not at execution creation
* rename isOutsideTransaction and add missing comment
* not error in pg catch while doing cleanup
* use replication factor of the creation time, not current time to decide if task tracker should be chosen
* apply review items
* apply review items
* apply review item