citus

Author	SHA1	Message	Date
Marco Slot	2a3234ca26	Rename masterQuery to combineQuery	2020-06-17 14:14:37 +02:00
Marco Slot	2632343f64	Fix intermediate result pruning for INSERT..SELECT	2020-04-07 11:07:49 +02:00
Marco Slot	84672c3dbd	Simplify intermediate result pruning logic	2020-04-07 10:53:29 +02:00
Hanefi Onaldi	c0ad44f975	Fix early exit bug on intermediate result pruning. There are 2 problems with our early exit strategy that this commit fixes: 1- When we decided that a subplan's results are sent to all worker nodes, we used to skip traversing the whole distributed plan instead of skipping only that subplan. 2- When deciding on the early exit, we used to consider all available nodes in the cluster (secondaries and inactive nodes as well as active primaries). This prevented the early exit whenever there were secondaries or inactive nodes.	2020-03-05 16:41:44 +03:00
Önder Kalacı	4519d3411d	Improve the representation of used sub plans (#3411). Previously, we identified the usedSubPlans only by looking at the subPlanId. With this commit, we expand it to also include information on the location of the subPlan. This is useful to distinguish the cases where the subPlan is used only in HAVING from those where it is used in both HAVING and another part of the query.	2020-01-24 10:47:14 +01:00
Önder Kalacı	ef7d1ea91d	Locally execute queries that don't need any data access (#3410). * Update shardPlacement->nodeId to uint: the source of shardPlacement->nodeId is always workerNode->nodeId, which is a uint32. We had this hack because of `0ea4e52df5 (r266421409)`, and that is gone with `90056f7d3c (diff-c532177d74c72d3f0e7cd10e448ab3c6L1123)`, so it is now safe to change. * Relax the restrictions on using local execution: previously, whenever any local execution happened, we disallowed further commands from doing any remote queries. The basic motivation was to prevent accesses in the same transaction block from reaching the same placements over multiple sessions, one local and one remote. However, the implementation did not distinguish whether a local access touched a placement at all. For example, a local access might touch only intermediate results; in that case the same restrictions are useless. So this is a prerequisite for executing intermediate-result-only queries locally. * Update the error messages: as the underlying implementation has changed, reflect it in the error messages. * Keep track of connections to the local node: this commit adds infrastructure to track whether any connection to the local host has been made. The main motivation is that we were previously more conservative about choosing local execution: we disallowed it if any connection to any remote node had been made. However, that would be too restrictive for intermediate-result-only queries, because we expect all queries to touch a remote node before the final query. Note that this approach is still limiting in the Citus MX case, but we can ignore that for now. * Formalize the concept of Local Node: also some minor refactoring while creating the dummy placement. * Write intermediate results locally when they are only needed locally: before this commit, Citus always broadcast all intermediate results to remote nodes. However, pushing the results to remote nodes can be skipped in two notable cases: (a) when the query consists only of intermediate results, and (b) when the query is a zero-shard query. In both cases we don't need to access any data on the shards, so skipping the push to remote nodes is a valuable optimization. The pattern in (a) is common in practice. For example, in the query WITH cte_1 AS (...), cte_2 AS (....), ... cte_n (...) SELECT ... FROM cte_1 JOIN cte_2 .... JOIN cte_n ...; the final query could operate only on intermediate results. With this patch, the intermediate results of the CTEs are no longer pushed to remote nodes unnecessarily. * Add specific regression tests: as there are edge cases in Citus MX and with the round-robin policy, run the same queries in those cases as well. * Fix failure tests: force intermediate results not to use local execution, since all the tests expect the results to be pushed remotely. * Fix flaky test * Apply code-review feedback: mostly style changes * Limit the max value of pg_dist_node_seq to reserve values for internal use	2020-01-23 18:28:34 +01:00
Jelte Fennema	1d8dde232f	Automatically convert useless declarations using regex replace (#3181). * Add declaration removal to CI * Convert declarations	2019-11-21 13:47:29 +01:00
Hanefi Onaldi	d82f3e9406	Introduce intermediate result broadcasting. In plain words, each distributed plan pulls the necessary intermediate results to the worker nodes that the plan hits. This is primarily useful in three ways. (i) If the distributed plan that uses intermediate result(s) is a router query, the intermediate results are broadcast to a single node only. (ii) If a distributed plan consists only of intermediate results, which is not uncommon, the intermediate results are broadcast to a single node only. (iii) If a distributed query hits a subset of the shards in multiple workers, the intermediate results are broadcast only to the relevant node(s). Item (iii) becomes crucial for append/range distributed tables, where distributed queries typically hit a small subset of shards/workers. To do this, for each query for which Citus creates a distributed plan, we keep track of the subPlans used in the queryTree and save them in the distributed plan. Just before executing each subPlan, Citus records every worker node that the distributed plan hits and marks every used subPlan to be broadcast to these nodes. Then, for each subPlan that is itself a distributed plan, Citus repeats this recursively, since those plans may access different subPlans, which have to be recorded as well.	2019-11-20 15:26:36 +03:00

8 Commits (2a3234ca2659a6345759eadbacea44b7797b9918)
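The pruning idea described in d82f3e9406 and c0ad44f975 can be sketched as below. This is a minimal Python sketch under stated assumptions, not Citus's C implementation: `DistributedPlan`, its fields, and `record_subplan_nodes` are hypothetical simplifications. Each plan adds the nodes its tasks hit to every subPlan it uses, recurses into subPlans that are themselves distributed plans, and skips only a subPlan whose result already reaches every active primary.

```python
from dataclasses import dataclass, field


@dataclass
class DistributedPlan:
    """Hypothetical, simplified stand-in for a Citus distributed plan."""
    task_nodes: set[int]            # node ids this plan's tasks run on
    used_subplan_ids: list[int]     # subPlans referenced in this plan's query tree
    subplans: dict[int, "DistributedPlan"] = field(default_factory=dict)


def record_subplan_nodes(plan, active_primaries, targets=None):
    """Map each subPlan id to the set of nodes that need its intermediate result.

    Sketch only; names and structure are assumptions, not Citus's API.
    """
    if targets is None:
        targets = {}
    for subplan_id in plan.used_subplan_ids:
        nodes = targets.setdefault(subplan_id, set())
        # Early exit for this subPlan only: if its result already goes to every
        # active primary, skip it ('continue', not 'return') so the remaining
        # subPlans and nested plans are still visited. Comparing against all
        # nodes, including secondaries and inactive nodes, would keep this
        # check from ever succeeding when such nodes exist.
        if nodes >= active_primaries:
            continue
        nodes |= plan.task_nodes
    # A subPlan that is itself a distributed plan may use further subPlans,
    # so record those recursively into the same mapping.
    for subplan in plan.subplans.values():
        record_subplan_nodes(subplan, active_primaries, targets)
    return targets
```

For instance, with active primaries `{1, 2}`, a router query whose tasks hit only node 1 and that reads subPlan 10 (a CTE whose own tasks hit both workers) yields `{10: {1}}`, so that intermediate result is sent to a single worker, matching case (i) of d82f3e9406.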