* Support for subqueries in WHERE clause
This commit enables subqueries in WHERE clause to be pushed down
by the subquery pushdown logic.
The support covers:
- Correlated subqueries with IN, NOT IN, EXISTS, NOT EXISTS,
operator expressions such as (>, <, =, ALL, ANY etc.)
- Non-correlated subqueries with (partition_key) IN (SELECT partition_key ..)
(partition_key) =ANY (SELECT partition_key ...)
Note that this commit heavily utilizes the attribute equivalence logic introduced
in the 1cb6a34ba8. In general, this commit mostly
adjusts the logical planner not to error out on the subqueries in WHERE clause.
* Improve error checks for subquery pushdown and INSERT ... SELECT
Since we allow subqueries in WHERE clause with the previous commit,
we should apply the same limitations to those subqueries.
With this commit, we do not iterate on each subquery one by one.
Instead, we extract all the subqueries and apply the checks directly
on those subqueries. The aim of this change is to (i) Simplify the
code (ii) Make it close to the checks on INSERT .. SELECT code base.
* Extend checks for unresolved paramaters to include SubLinks
With the presence of subqueries in where clause (i.e., SubPlans on the
query) the existing way for checking unresolved parameters fail. The
reason is that the parameters for SubPlans are kept on the parent plan not
on the query itself (see primnodes.h for the details).
With this commit, instead of checking SubPlans on the modified plans
we start to use originalQuery, where SubLinks represent the subqueries
in where clause. The unresolved parameters can be found on the SubLinks.
* Apply code-review feedback
* Remove unnecessary copying of shard interval list
This commit removes unnecessary copying of shard interval list. Note
that there are no copyObject function implemented for shard intervals.
Custom Scan is a node in the planned statement which helps external providers
to abstract data scan not just for foreign data wrappers but also for regular
relations so you can benefit your version of caching or hardware optimizations.
This sounds like only an abstraction on the data scan layer, but we can use it
as an abstraction for our distributed queries. The only thing we need to do is
to find distributable parts of the query, plan for them and replace them with
a Citus Custom Scan. Then, whenever PostgreSQL hits this custom scan node in
its Vulcano style execution, it will call our callback functions which run
distributed plan and provides tuples to the upper node as it scans a regular
relation. This means fewer code changes, fewer bugs and more supported features
for us!
First, in the distributed query planner phase, we create a Custom Scan which
wraps the distributed plan. For real-time and task-tracker executors, we add
this custom plan under the master query plan. For router executor, we directly
pass the custom plan because there is not any master query. Then, we simply let
the PostgreSQL executor run this plan. When it hits the custom scan node, we
call the related executor parts for distributed plan, fill the tuple store in
the custom scan and return results to PostgreSQL executor in Vulcano style,
a tuple per XXX_ExecScan() call.
* Modify planner to utilize Custom Scan node.
* Create different scan methods for different executors.
* Use native PostgreSQL Explain for master part of queries.
- Break CheckShardPlacements into multiple functions (The most important
is MarkFailedShardPlacements), so that we can get rid of the global
CoordinatedTransactionUses2PC.
- Call MarkFailedShardPlacements in the router executor, so we mark
shards as invalid and stop using them while inside transaction blocks.
With this change we start to error out on router planner queries where a common table
expression with data-modifying statement is present. We already do not support if
there is a data-modifying statement using result of the CTE, now we also error out
if CTE itself is data-modifying statement.
Router planner already handles cases when all shards
are pruned out. This is about missing test cases. Notice that
"column is null" and "column = null" have different shard
pruning behavior.
We used to disable router planner and executor
when task executor is set to task-tracker.
This change enables router planning and execution
at all times regardless of task execution mode.
We are introducing a hidden flag enable_router_execution
to enable/disable router execution. Its default value is
true. User may disable router planning by setting it to false.
Adds support for PostgreSQL 9.6 by copying in the requisite ruleutils
file and refactoring the out/readfuncs code to flexibly support the
old-style copy/pasted out/readfuncs (prior to 9.6) or use extensible
node APIs (in 9.6 and higher).
Most version-specific code within this change is only needed to set new
fields in the AggRef nodes we build for aggregations. Version-specific
test output files were added in certain cases, though in most they were
not necessary. Each such file begins by e.g. printing the major version
in order to clarify its purpose.
The comment atop citus_nodes.h details how to add support for new nodes
for when that becomes necessary.
PostgreSQL 9.5.4 stopped calling planner for materialized view create
command when NO DATA option is provided.
This causes our test to behave differently between pre-9.5.4 and 9.5.4.
We can now support richer set of queries in router planner.
This allow us to support CTEs, joins, window function, subqueries
if they are known to be executed at a single worker with a single
task (all tables are filtered down to a single shard and a single
worker contains all table shards referenced in the query).
Fixes : #501
Fixes#271
This change sets ShardIds and JobIds for each test case. Before this change,
when a new test that somehow increments Job or Shard IDs is added, then
the tests after the new test should be updated.
ShardID and JobID sequences are set at the beginning of each file with the
following commands:
```
ALTER SEQUENCE pg_catalog.pg_dist_shardid_seq RESTART 290000;
ALTER SEQUENCE pg_catalog.pg_dist_jobid_seq RESTART 290000;
```
ShardIds and JobIds are multiples of 10000. Exceptions are:
- multi_large_shardid: shardid and jobid sequences are set to much larger values
- multi_fdw_large_shardid: same as above
- multi_join_pruning: Causes a race condition with multi_hash_pruning since
they are run in parallel.
- non-router plannable queries can be executed
by router executor if they satisfy the criteria
- router executor is removed from configuration,
now task executor can not be set to router
- removed some tests that error out for router executor