With this commit, we're enabling recursive planning for the
queries that used to fail due to recurring tuples.
We're simply returning the inner relations/subqueries to the caller
of the error check, which is then able to recursively plan them.
With this commit, we only pull&push the necessary columns. In this
context, necessary columns means that the columns that are required
for the query to be executed when the relation is wrapped into a
subquery.
We could potentially optimize things further:
(a) If a column only appears as a qual filter, we don't need to
pull it to the coordinator
(b) We currently pull unnecessary columns as NULL. However, we
could potentially adjust remaining of the query tree and do
not add columns of the relation to the target entry.
PostgreSQL already keeps track of the restrictions that are on the relation.
With this commit, Citus uses that information and pushes down the filters
to the subquery that is recursively planned for the table that is in
considiration.
Description: Support round-robin `task_assignment_policy` for queries to reference tables.
This PR allows users to query multiple placements of shards in a round robin fashion. When `citus.task_assignment_policy` is set to `'round-robin'` the planner will use a round robin scheduling feature when multiple shard placements are available.
The primary use-case is spreading the load of reference table queries to all the nodes in the cluster instead of hammering only the first placement of the reference table. Since reference tables share the same path for selecting the shards with single shard queries that have multiple placements (`citus.shard_replication_factor > 1`) this setting also allows users to spread the query load on these shards.
For modifying queries we do not apply a round-robin strategy. This would be negated by an extra reordering step in the executor for such queries where a `first-replica` strategy is enforced.
The file handling the utility functions (DDL) for citus organically grew over time and became unreasonably large. This refactor takes that file and refactored the functionality into separate files per command. Initially modeled after the directory and file layout that can be found in postgres.
Although the size of the change is quite big there are barely any code changes. Only one two functions have been added for readability purposes:
- PostProcessIndexStmt which is extracted from PostProcessUtility
- PostProcessAlterTableStmt which is extracted from multi_ProcessUtility
A README.md has been added to `src/backend/distributed/commands` describing the contents of the module and every file in the module.
We need more documentation around the overloading of the COPY command, for now the boilerplate has been added for people with better knowledge to fill out.
Each PostgreSQL backend starts with a predefined amount of stack and this stack
size can be increased if there is a need. However, stack size increase during
high memory load may cause unexpected crashes, because if there is not enough
memory for stack size increase, there is nothing to do for process apart from
crashing. An interesting thing is; the process would get OOM error instead of
crash, if the process had an explicit memory request (with palloc) for example.
However, in the case of stack size increase, there is no system call to get OOM
error, so the process simply crashes.
With this change, we are increasing the stack size explicitly by requesting extra
memory from the stack, so that, even if there is not memory, we can at least get
an OOM instead of a crash.
In recent postgres builds you cannot set client_min_messages to
values higher then ERROR, if will silently set it to ERROR if so.
During some tests we would set it to fatal to hide random values
(eg. pid's of processes) from the test output. This patch will use
different tactics for hiding these values.