citus

Commit Graph

Author	SHA1	Message	Date
Jelte Fennema	4c3676feda	Always return a dummy placement	2020-07-07 14:22:37 +02:00
Marco Slot	b4fec63bc0	Rename master evaluation to coordinator evaluation	2020-07-07 10:37:41 +02:00
Marco Slot	2a3234ca26	Rename masterQuery to combineQuery	2020-06-17 14:14:37 +02:00
Marco Slot	d1bab78d79	Remove master from file hierarchy	2020-06-16 17:49:09 +02:00
Hadi Moshayedi	ef778c1cd7	address feedback from Sait Talha & Hadi	2020-06-12 18:36:02 -07:00
Marco Slot	4f7989ad8e	Rename WorkersContainingAllShards to PlacementsForWorkersContainingAllShards	2020-06-12 18:36:02 -07:00
Marco Slot	080f711e62	Remove useless debug message in router planner	2020-06-12 18:36:02 -07:00
Marco Slot	d953f084db	Rename FindRouterWorkerList to CreateTaskPlacementListForShardIntervals	2020-06-12 18:36:01 -07:00
Marco Slot	24feadc230	Handle joins between local/reference/cte via router planner	2020-06-12 18:36:01 -07:00
Halil Ozan Akgül	8c5eb6b7ea	Insert Select Into Local Table (#3870 ) * Insert select with master query * Use relid to set custom_scan_tlist varno * Reviews * Fixes null check Co-authored-by: Marco Slot <marco.slot@gmail.com>	2020-06-12 17:06:31 +03:00
Philip Dubé	1722d8ac8b	Allow routing modifying CTEs We still recursively plan some cases, eg: - INSERTs - SELECT FOR UPDATE when reference tables in query - Everything must be same single shard & replication model	2020-06-11 15:14:06 +00:00
Onder Kalaci	06461ca55f	Coerce types properly for INSERT Also, unify similar code-paths to rely on more accurate function.	2020-06-10 10:40:28 +02:00
Philip Dubé	25f86bca3f	multi_router_planner: Remove NULL check which would've segfaulted earlier	2020-06-02 13:08:38 +00:00
Philip Dubé	2623aefe38	multi_router_planner: replace GetUpdateOrDeleteRTE with ExtractResultRelationRTE	2020-06-02 00:22:30 +00:00
Philip Dubé	c0515dcd67	This prepares for routing modifying CTEs, where modLevel should not be used to infer whether a plan is a select or not SELECT_TASK is renamed to READ_TASK as a SELECT with modifying CTEs will be a MODIFYING_TASK RouterInsertJob: Assert originalQuery->commandType == CMD_INSERT CreateModifyPlan: Assert originalQuery->commandType != CMD_SELECT Remove unused function IsModifyDistributedPlan DistributedExecution, ExecutionParams, DistributedPlan: Rename hasReturning to expectResults SELECTs set expectResults to true Rename CreateSingleTaskRouterPlan to CreateSingleTaskRouterSelectPlan	2020-05-20 17:26:12 +00:00
Philip Dubé	c0a95a3adb	Copy data from CitusTableCacheEntry more often This copies over fixes from reference counting branch, all CitusTableCacheEntry data may be freed when a GetCitusTableCacheEntry call occurs for its relationId This fix is not complete, but reference counting is being deferred until 9.4 CopyShardInterval: remove dest parameter, always return newly allocated object	2020-04-17 14:17:18 +00:00
SaitTalhaNisanci	a369f9001d	fix incorrect groupid or nodeid (#3710 ) For shardplacements, we were setting nodeid, nodename, nodeport and nodegroup manually. This makes it very error prone, and it seems that we already forgot to set some of them. This would mean that they would have their default values, e.g group id would be 0 when its group id is not 0. So the implication is that we would have inconsistent worker metadata. A new method is introduced, and we call the method to set those fields now, so that as long as we call this method, we won't be setting inconsistent metadata. It probably makes sense to have a struct for these fields. We already have NodeMetadata but it doesn't have nodename or nodeport. So that could be done over another refactor to make things simpler.	2020-04-07 11:14:14 +03:00
Marco Slot	fd8cdb92f4	Evaluate nextval in the target list on the coordinator	2020-04-02 02:53:19 +02:00
SaitTalhaNisanci	ba01f3457a	use macros for pg versions instead of hardcoded values (#3694 ) 3 Macros are defined for removing the hardcoded pg versions. PG_VERSION_11, PG_VERSION_12 and PG_VERSION_13.	2020-04-01 17:01:52 +03:00
SaitTalhaNisanci	98f95e2a5e	add TaskQueryStringForPlacement TaskQueryStringForPlacement simplifies how the executor gets the query string for a given placement. Task will use the necessary fields to return the correct query placement string. Executor doesn't need to know the details for this. rename TaskQueryString as TaskQueryStringAllPlacements TaskQueryString returns the query string that will be the same for all the placements. In INSERT..SELECT the query string can be different for each placement. Adaptive executor uses TaskQueryStringForPlacement, which returns the query string for a placement. It makes sense to rename TaskQueryString as TaskQueryStringAllPlacements as it is returning the query string for all placements. rename SetTaskQuery as SetTaskQueryIfShouldLazyDeparse SetTaskQuery does not always sets the task query. It can set the query string as well. So it is more clear to name it SetTaskQueryIfShouldLazyDeparse, since it will set the query not query string only when we should deparse the query in a lazy way.	2020-03-31 15:47:55 +03:00
Marco Slot	cb3d90bdc8	Simplify INSERT logic in router planner	2020-03-10 15:54:40 +01:00
Philip Dubé	7cdfa1daab	Rename LookupCitusTableCacheEntry to GetCitusTableCacheEntry, LookupLookupCitusTableCacheEntry back to LookupCitusTableCacheEntry	2020-03-08 14:08:23 +00:00
Philip Dubé	a7cca1bcde	Rename DistTableCacheEntry to CitusTableCacheEntry	2020-03-07 14:08:03 +00:00
Philip Dubé	bec58000d6	Given IsDistributedTableRTE, there's ambiguity in what DistributedTable means Elsewhere we used DistributedTable to include reference tables Marco suggested we use CitusTable for distributed & reference tables So renaming: - IsDistributedTable -> IsCitusTable - IsDistributedTableViaCatalog -> IsCitusTableViaCatalog - DistributedTableCacheEntry -> CitusTableCacheEntry - DistributedTableList -> CitusTableList - isDistributedTable -> isCitusTable - InsertSelectIntoDistributedTable -> InsertSelectIntoCitusTable - ExtractFirstDistributedTableId -> ExtractFirstCitusTableId	2020-03-06 18:57:55 +00:00
Marco Slot	dc4c0c032e	Refactor CitusBeginScan into separate DML / SELECT paths	2020-03-05 12:37:22 +01:00
Jelte Fennema	685b54b3de	Semmle: Check for NULL in some places where it might occur (#3509 ) Semmle reported quite some places where we use a value that could be NULL. Most of these are not actually a real issue, but better to be on the safe side with these things and make the static analysis happy.	2020-02-27 10:45:29 +01:00
Jelte Fennema	8de8b62669	Convert unsafe APIs to safe ones	2020-02-25 15:39:27 +01:00
Onder Kalaci	975c4c2264	Do not prune shards if the distribution key is NULL The root of the problem is that, standard_planner() converts the following qual ``` {OPEXPR :opno 98 :opfuncid 67 :opresulttype 16 :opretset false :opcollid 0 :inputcollid 100 :args ( {VAR :varno 1 :varattno 1 :vartype 25 :vartypmod -1 :varcollid 100 :varlevelsup 0 :varnoold 1 :varoattno 1 :location 45 } {CONST :consttype 25 :consttypmod -1 :constcollid 100 :constlen -1 :constbyval false :constisnull true :location 51 :constvalue <> } ) :location 49 } ``` To ``` ( {CONST :consttype 16 :consttypmod -1 :constcollid 0 :constlen 1 :constbyval true :constisnull true :location -1 :constvalue <> } ) ``` So, Citus doesn't deal with NULL values in real-time or non-fast path router queries. And, in the FastPathRouter planner, we check constisnull in DistKeyInSimpleOpExpression(). However, in deferred pruning case, we do not check for isnull for const. Thus, the fix consists of two parts: - Let PruneShards() not crash when NULL parameter is passed - For deferred shard pruning in fast-path queries, explicitly check that we have CONST which is not NULL	2020-02-13 15:00:31 +01:00
Önder Kalacı	ef7d1ea91d	Locally execute queries that don't need any data access (#3410 ) * Update shardPlacement->nodeId to uint As the source of the shardPlacement->nodeId is always workerNode->nodeId, and that is uint32. We had this hack because of: `0ea4e52df5 (r266421409)` And, that is gone with: `90056f7d3c (diff-c532177d74c72d3f0e7cd10e448ab3c6L1123)` So, we're safe to do it now. * Relax the restrictions on using the local execution Previously, whenever any local execution happens, we disabled further commands to do any remote queries. The basic motivation for doing that is to prevent any accesses in the same transaction block to access the same placements over multiple sessions: one is local session the other is remote session to the same placement. However, the current implementation does not distinguish local accesses being to a placement or not. For example, we could have local accesses that only touches intermediate results. In that case, we should not implement the same restrictions as they become useless. So, this is a pre-requisite for executing the intermediate result only queries locally. * Update the error messages As the underlying implementation has changed, reflect it in the error messages. * Keep track of connections to local node With this commit, we're adding infrastructure to track if any connection to the same local host is done or not. The main motivation for doing this is that we've previously were more conservative about not choosing local execution. Simply, we disallowed local execution if any connection to any remote node is done. However, if we want to use local execution for intermediate result only queries, this'd be annoying because we expect all queries to touch remote node before the final query. Note that this approach is still limiting in Citus MX case, but for now we can ignore that. * Formalize the concept of Local Node Also some minor refactoring while creating the dummy placement * Write intermediate results locally when the results are only needed locally Before this commit, Citus used to always broadcast all the intermediate results to remote nodes. However, it is possible to skip pushing the results to remote nodes always. There are two notable cases for doing that: (a) When the query consists of only intermediate results (b) When the query is a zero shard query In both of the above cases, we don't need to access any data on the shards. So, it is a valuable optimization to skip pushing the results to remote nodes. The pattern mentioned in (a) is actually a common patterns that Citus users use in practice. For example, if you have the following query: WITH cte_1 AS (...), cte_2 AS (....), ... cte_n (...) SELECT ... FROM cte_1 JOIN cte_2 .... JOIN cte_n ...; The final query could be operating only on intermediate results. With this patch, the intermediate results of the ctes are not unnecessarily pushed to remote nodes. * Add specific regression tests As there are edge cases in Citus MX and with round-robin policy, use the same queries on those cases as well. * Fix failure tests By forcing not to use local execution for intermediate results since all the tests expects the results to be pushed remotely. * Fix flaky test * Apply code-review feedback Mostly style changes * Limit the max value of pg_dist_node_seq to reserve for internal use	2020-01-23 18:28:34 +01:00
Jelte Fennema	246435be7e	Lazy query deparsing executable queries (#3350 ) Deparsing and parsing a query can be heavy on CPU. When locally executing the query we don't need to do this in theory most of the time. This PR is the first step in allowing to skip deparsing and parsing the query in these cases, by lazily creating the query string and storing the query in the task. Future commits will make use of this and not deparse and parse the query anymore, but use the one from the task directly.	2020-01-17 11:49:43 +01:00
Onder Kalaci	dc17c2658e	Defer shard pruning for fast-path router queries to execution This is purely to enable better performance with prepared statements. Before this commit, the fast path queries with prepared statements where the distribution key includes a parameter always went through distributed planning. After this change, we only go through distributed planning on the first 5 executions.	2020-01-16 16:59:36 +01:00
Philip Dubé	4d9a733c2f	Fix inserting multiple values with row expression partition column causing the insert to be ignored Raise an error instead of silently inserting nothing if we hit this condition in the future	2020-01-15 21:10:50 +00:00
Philip Dubé	4b5d6c3ebe	Rename RelayFileState to ShardState Replace FILE_ prefix with SHARD_STATE_	2020-01-12 05:57:53 +00:00
Jelte Fennema	5b0baea72c	Refactor distributed_planner for better understandability	2020-01-06 14:23:38 +01:00
Onder Kalaci	5a1e752726	Apply feedback - add fastPath field to plan	2020-01-06 12:42:43 +01:00
Onder Kalaci	13a9b55695	Skip expensive checks when fast-path query The definition of fast-path query is very strict. So, we don't need to do some extra checks.	2020-01-06 12:42:43 +01:00
Onder Kalaci	7f3ab7892d	Skip shard pruning when possible We're already traversing the queryTree and finding the distribution key value, so pass it to the later stages of the planning.	2020-01-06 12:42:43 +01:00
Onder Kalaci	ca293116fa	Reduce calls to FastPathRouterQuery() Before this commit, we called it twice durning planning. Instead, we save the information and pass it.	2020-01-06 12:42:43 +01:00
SaitTalhaNisanci	420e21919b	refactor extract distributed insert values rte (#3287 )	2019-12-12 23:47:44 +03:00
Marco Slot	e7a8db5493	Fix issue with some zero-shard modifications	2019-12-12 07:19:10 +01:00
SaitTalhaNisanci	13204487e9	remove copyright years (#3286 )	2019-12-11 21:14:08 +03:00
Marco Slot	486c620a3c	Fix inserts into local tables with distributed subqueries	2019-12-10 10:17:18 +01:00
Philip Dubé	fcf2fd819b	Add distributioncolumncollation to to pg_dist_colocation Use partition column's collation for range distributed tables Don't allow non deterministic collations for hash distributed tables CoPartitionedTables: don't compare unequal types	2019-12-09 19:51:40 +00:00
Marco Slot	6a9c0ea7fe	Fix errors in DML with sublinks hidden by null expressions	2019-12-06 14:25:04 +01:00
Marco Slot	bb3bc10f0c	Fix segfault in column_to_column_name	2019-12-01 23:57:25 +01:00
Marco Slot	16d1ad3666	Remove distinction between SQL_TASK and ROUTER_TASK	2019-11-29 05:58:29 +01:00
SaitTalhaNisanci	aeec3d1544	fix typo in dependent jobs and dependent task (#3244 )	2019-11-28 23:47:28 +03:00
Philip Dubé	261a9de42d	Fix typos: VAR_SET_VALUE_KIND -> VAR_SET_VALUE kind beginnig -> beginning plannig -> planning the the -> the er then -> er than	2019-11-25 23:24:13 +00:00
Jelte Fennema	1d8dde232f	Automatically convert useless declarations using regex replace (#3181 ) * Add declaration removal to CI * Convert declarations	2019-11-21 13:47:29 +01:00
Onder Kalaci	90943a6ce6	Do not include coordinator shards when round-robin is selected When the user picks "round-robin" policy, the aim is that the load is distributed across nodes. However, for reference tables on the coordinator, since local execution kicks in immediately, round-robin is ignored. With this change, we're excluding the placement on the coordinator. Although the approach seems a little bit invasive because of modifications in the placement list, that sounds acceptable. We could have done this in some other ways such as: 1) Add a field to "Task->roundRobinPlacement" (or such), which is updated as the first element after RoundRobinPolicy is applied. During the execution, if that placement is local to the coordinator, skip it and try the other remote placements. 2) On TaskAccessesLocalNode()@local_execution.c, check task_assignment_policy, if round-robin selected and there is local placement on the coordinator, skip it. However, task assignment is done on planning, but this decision is happening on the execution, which could create weird edge cases.	2019-11-15 06:03:32 -08:00

1 2 3 4

174 Commits (4c3676fedaeec4391fc90a18a613d48d3d2d376f)