Semmle reported quite some places where we use a value that could be NULL. Most of these are not actually a real issue, but better to be on the safe side with these things and make the static analysis happy.
DESCRIPTION: Fix unnecessary repartition on joins with more than 4 tables
In 9.1 we have introduced support for all CH-benCHmark queries by widening our definitions of joins to include joins with expressions in them. This had the undesired side effect of Q5 regressing on its plan by implementing a repartition join.
It turned out this regression was not directly related to widening of the join clause, nor the schema employed by CH-benCHmark. Instead it had to do with 4 or more tables being joined in a chain. A chain meaning:
```sql
SELECT * FROM a,b,c,d WHERE a.part = b.part AND b.part = c.part AND ....
```
Due to how our join order planner was implemented it would only keep track of 1 of the partition columns when comparing if the join could be executed locally. This manifested in a join chain of 4 tables to _always_ be executed as a repartition join. 3 tables joined in a chain would have the middle table shared by the two outer tables causing the local join possibility to be found.
With this patch we keep a unique list (or set) of all partition columns participating in the join. When a candidate table is checked for a possibility to execute a local join it will check if there is any partition column in that set that matches an equality join clause on the partition column of the candidate table.
By taking into account all partition columns in the left relation it will now find the local join path on >= 4 tables joined in a chain.
fixes: #3276
DESCRIPTION: Expression in reference join
Fixed: #2582
This patch allows arbitrary expressions in the join clause when joining to a reference table. An example of such joins could be found in CHbenCHmark queries 7, 8, 9 and 11; `mod((s_w_id * s_i_id),10000) = su_suppkey` and `ascii(substr(c_state,1,1)) = n2.n_nationkey`. Since the join is on a reference table these queries are able to be pushed down to the workers.
To implement these queries we will widen the `IsJoinClause` predicate to not check if the expressions are a type `Var` after stripping the implicit coerciens. Instead we define a join clause when the `Var`'s in a clause come from more than 1 table.
This allows more clauses to pass into the logical planner's `MultiNodeTree(...)` planning function. To compensate for this we tighten down the `LocalJoin`, `SinglePartitionJoin` and `DualPartitionJoin` to check for direct column references when planning. This allows the planner to work with arbitrary join expressions on reference tables.
* Change worker_hash_partition_table() such that the
divergence between Citus planner's hashing and
worker_hash_partition_table() becomes the same.
* Rename single partitioning to single range partitioning.
* Add single hash repartitioning. Basically, logical planner
treats single hash and range partitioning almost equally.
Physical planner, on the other hand, treats single hash and
dual hash repartitioning almost equally (except for JoinPruning).
* Add a new GUC to enable this feature
After this commit large_table_shard_count wont be used to
check whether broadcast join, which is renamed as reference
join, can be applied. Reference join can only be applied over
reference tables.
Adds support for PostgreSQL 10 by copying in the requisite ruleutils
and updating all API usages to conform with changes in PostgreSQL 10.
Most changes are fairly minor but they are numerous. One particular
obstacle was the change in \d behavior in PostgreSQL 10's psql; I had
to add SQL implementations (views, mostly) to mimic the pre-10 output.
This adds a replication_model GUC which is used as the replication
model for any new distributed table that is not a reference table.
With this change, tables with replication factor 1 are no longer
implicitly MX tables.
The GUC is similarly respected during empty shard creation for e.g.
existing append-partitioned tables. If the model is set to streaming
while replication factor is greater than one, table and shard creation
routines will error until this invalid combination is corrected.
Changing this parameter requires superuser permissions.