Commit Graph

970 Commits (remove-unnecessary-assertion)

Author SHA1 Message Date
SaitTalhaNisanci 96adce77d6
rename node/worker utilities (#4003)
The names were not explicit about what they do, and we have many
misuses in the codebase, so they are renamed to be more explicit.
2020-07-09 15:30:35 +03:00
Jelte Fennema f6e2f1b1cb
Replace words that have bad associations (#3992)
We had a few words in our codebase that static analysis flagged as having bad
associations.
2020-07-08 14:57:48 +02:00
Hadi Moshayedi 23fa421639 Fix task->fetchedExplainAnalyzePlan memory issue. 2020-07-07 07:58:02 -07:00
citus bot f0693e2f75 Remove unused MaxMasterConnectionCount function 2020-07-07 10:37:57 +02:00
citus bot bdfeb380d3 Fix some more master->coordinator comments 2020-07-07 10:37:53 +02:00
Marco Slot b4fec63bc0 Rename master evaluation to coordinator evaluation 2020-07-07 10:37:41 +02:00
Marco Slot eeffbde8bd Fix pushdown of constants in aggregate queries 2020-06-30 11:41:16 -07:00
Jelte Fennema 392c5e2c34
Fix wrong cancellation message about distributed deadlocks (#3956) 2020-06-30 14:57:46 +02:00
Marco Slot 634d6cf9d7
Improve performance of metadata cache (#3924)
#3866 removed the shard ID hash in metadata_cache.c to simplify cache management, 
but we observed a significant performance regression that was being masked by the
performance improvement provided by #3654 in our benchmarks, but #3654 only 
applies to specific workloads.

This PR brings back the shard ID cache as it existed before #3866 with some extra
 measures to handle invalidation. When we load a table entry, we overwrite 
ShardIdCacheEntry->tableEntry pointers for all the shards in that table, though 
it's possible that the table no longer contains the old shard ID or the table 
entry is never reloaded, which would leave a dangling pointer once the table 
entry is freed. To handle that case, we remove all shard ID cache entries that 
point exactly to that table entry when a table is freed (at the end of the 
transaction or any call to CitusTableCacheFlushInvalidatedEntries).

Co-authored-by: SaitTalhaNisanci <s.talhanisanci@gmail.com>
Co-authored-by: Marco Slot <marco.slot@gmail.com>
Co-authored-by: Jelte Fennema <github-tech@jeltef.nl>
2020-06-30 12:10:10 +02:00
Hadi Moshayedi 4ed59d2db3 Move more from insert_select_executor to insert_select_planner 2020-06-26 08:08:26 -07:00
Hadi Moshayedi d34c21890f Rename CoordinatorInsertSelect... to NonPushableInsertSelect 2020-06-25 08:55:48 -07:00
Hadi Moshayedi 4e8d79998e Save INSERT/SELECT method in DistributedPlan.
This is so we don't need to calculate it twice in
insert_select_executor.c and multi_explain.c, which could
cause a discrepancy if an update in one of them is not
reflected in the other file.
2020-06-25 08:55:48 -07:00
SaitTalhaNisanci f458d1fd1c
Fix/task execution (#3941)
* Do not set TaskExecution with the adaptive executor

The adaptive executor uses a utility method from the task tracker for
repartition joins, however the adaptive executor doesn't need taskExecution;
it is only used by the task tracker. This causes a problem when EXPLAIN
ANALYZE is used, because what taskExecution points to might be
random.

We solve this by not setting taskExecution from the adaptive executor, so it
will stay NULL as set by CreateTask.

* use same memory context as task for taskExecution

Co-authored-by: Jelte Fennema <github-tech@jeltef.nl>
2020-06-24 12:10:00 +03:00
Marco Slot 2a3234ca26 Rename masterQuery to combineQuery 2020-06-17 14:14:37 +02:00
Jelte Fennema 0259815d3a
Fix EXPLAIN ANALYZE received data counter issues (#3917)
In #3901 the "Data received from worker(s)" sections were added to EXPLAIN
ANALYZE. After merging, @pykello posted some review comments. This addresses
those comments as well as fixing a few other issues that I found while addressing
them. The things this does:

1. Fix `EXPLAIN ANALYZE EXECUTE p1` to not increase received data on every
   execution
2. Fix `EXPLAIN ANALYZE EXECUTE p1(1)` to not return 0 bytes as received data
   always.
3. Move `EXPLAIN ANALYZE` specific logic to `multi_explain.c` from
   `adaptive_executor.c`
4. Change naming of new explain sections to `Tuple data received from node(s)`.
   Firstly because a task can reference the coordinator too, so "worker(s)" was
   incorrect. Secondly to indicate that this is tuple data and not all network
   traffic that was performed.
5. Rename `totalReceivedData` in our codebase to `totalReceivedTupleData` to
   make it clearer that it's a tuple data counter, not all network traffic.
6. Actually add `binary_protocol` test to `multi_schedule` (woops)
7. Fix a randomly failing test in `local_shard_execution.sql`.
2020-06-17 11:33:38 +02:00
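A hedged illustration of the renamed output section described above; the table name is hypothetical and the exact plan text depends on the query:

```sql
-- After this change, each task in the distributed plan reports a
-- "Tuple data received from node(s)" line instead of the old
-- "Data received from worker(s)" wording.
EXPLAIN (ANALYZE) SELECT count(*) FROM orders WHERE status = 'open';
```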
Marco Slot d1bab78d79 Remove master from file hierarchy 2020-06-16 17:49:09 +02:00
Philip Dubé 39400319e6 Defer freeing CitusTableCacheEntry, as there were memory safety issues before
Shard id to index mapping is stored in the cache entry, as there may now be multiple entries alive for a given relation

insert_select_executor: revert copying cache entry, which was a hack added to avoid memory safety issues
2020-06-15 16:20:50 +00:00
Jelte Fennema 927de6d187
Show amount of data received in EXPLAIN ANALYZE (#3901)
Sadly this does not actually work yet for binary protocol data, because
when doing EXPLAIN ANALYZE we send two commands at the same time. This
means we cannot use `SendRemoteCommandParams`, and thus cannot use the
binary protocol. This can still be useful though when using the text
protocol, to find out that a lot of data is being sent.
2020-06-15 16:01:05 +02:00
Marco Slot 4f7989ad8e Rename WorkersContainingAllShards to PlacementsForWorkersContainingAllShards 2020-06-12 18:36:02 -07:00
Marco Slot d953f084db Rename FindRouterWorkerList to CreateTaskPlacementListForShardIntervals 2020-06-12 18:36:01 -07:00
Marco Slot 24feadc230 Handle joins between local/reference/cte via router planner 2020-06-12 18:36:01 -07:00
Halil Ozan Akgül 8c5eb6b7ea
Insert Select Into Local Table (#3870)
* Insert select with master query

* Use relid to set custom_scan_tlist varno

* Reviews

* Fixes null check

Co-authored-by: Marco Slot <marco.slot@gmail.com>
2020-06-12 17:06:31 +03:00
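A minimal sketch of the kind of statement enabled by the change above; assume orders is a distributed table and order_totals is a plain local table on the coordinator:

```sql
INSERT INTO order_totals (customer_id, total)
SELECT customer_id, sum(amount)
FROM orders            -- distributed table
GROUP BY customer_id;  -- results are pulled to the coordinator and inserted locally
```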
Jelte Fennema 0e12d045b1
Support use of binary protocol in between nodes (#3877)
This can save a lot of data being sent over the network in some cases, thus
improving performance for workloads where inter-node bandwidth is the bottleneck.
There are some issues with enabling this by default, so that's currently not done.
2020-06-12 15:02:51 +02:00
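A hedged sketch of switching the inter-node wire format; the GUC name is an assumption and is not spelled out in the commit message above:

```sql
-- Assumed GUC name; disabled by default per the note above.
SET citus.enable_binary_protocol TO on;
SELECT * FROM orders LIMIT 10;  -- tuples between nodes may now use the binary format
```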
Nils Dijk da8f2b0134
Feature: tdigest aggregate (#3897)
DESCRIPTION: Adds support to partially push down tdigest aggregates

tdigest extensions: https://github.com/tvondra/tdigest

This PR implements the partial pushdown of tdigest calculations when possible. The extension adds a tdigest type, and multiple tdigests can be combined into a single one. There are several aggregate functions that can be used to get:
 - a quantile
 - a list of quantiles
 - the quantile of a hypothetical value
 - a list of quantiles for a list of hypothetical values

These functions can work on either plain values or tdigest types.

Since we can create tdigest values either by combining existing ones or from a group of values, we can rewrite the aggregates in such a way that most of the computation gets delegated to the shards. This speeds up the percentile calculations, because the values don't have to be sorted on the coordinator, while at the same time significantly reducing the transfer size from the shards to the coordinator.
2020-06-12 13:50:28 +02:00
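A hedged usage sketch of the partially pushed down tdigest aggregation described above, assuming the tdigest extension is installed on all nodes and latencies is a distributed table:

```sql
CREATE EXTENSION tdigest;
-- 99th percentile with compression 100; per-shard digests are built on the
-- workers and only the compact digests are merged on the coordinator.
SELECT tdigest_percentile(response_time, 100, 0.99) FROM latencies;
```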
Philip Dubé 1722d8ac8b Allow routing modifying CTEs
We still recursively plan some cases, e.g.:
- INSERTs
- SELECT FOR UPDATE when reference tables are in the query
- Everything must be on the same single shard & replication model
2020-06-11 15:14:06 +00:00
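An illustrative single-shard example of a modifying CTE that can now be routed rather than recursively planned (table and values are hypothetical):

```sql
WITH shipped AS (
    UPDATE orders SET status = 'shipped'
    WHERE order_id = 5            -- distribution column, single shard
    RETURNING order_id, status
)
SELECT * FROM shipped;
```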
Hadi Moshayedi 7c52c6edb0 CTE statistics in EXPLAIN ANALYZE 2020-06-11 02:39:59 -07:00
Hadi Moshayedi bb96ef5047 Do the EXPLAIN ANALYZE at the same time as execution, so we avoid executing twice.
We wrap worker tasks in worker_save_query_explain_analyze() so we can fetch
their explain output later by a call to worker_last_saved_explain_analyze().

Fixes #3519
Fixes #2347
Fixes #2613
Fixes #621
2020-06-11 01:55:57 -07:00
Marco Slot 1243b6a948 Execute shard creation as utility tasks 2020-06-10 11:29:49 +02:00
Onder Kalaci 06461ca55f Coerce types properly for INSERT
Also, unify similar code paths to rely on a more accurate function.
2020-06-10 10:40:28 +02:00
Hadi Moshayedi 5cdfa9f571 Implement EXPLAIN ANALYZE udfs.
Implements worker_save_query_explain_analyze and worker_last_saved_explain_analyze.

worker_save_query_explain_analyze executes and returns results of query while
saving its EXPLAIN ANALYZE to be fetched later.

worker_last_saved_explain_analyze returns the saved EXPLAIN ANALYZE result.
2020-06-09 10:02:05 -07:00
Onur Tirtir a4f1c41391
Implement GetQueryLockMode helper (#3860)
If we want to get the necessary lockmode for a relation RangeVar within
a query, we can get the lockmode easily from the RangeVar itself (if
pg version >= 12).

However, if we want to decide the lockmode appropriate for the
"query" itself, we can derive this information by using GetQueryLockMode,
following the code comment on RangeTblEntry->rellockmode.
2020-06-09 13:08:44 +03:00
Hadi Moshayedi 198d5d8b0f typedef TupleDestination once 2020-06-08 20:38:28 -07:00
Hadi Moshayedi 0bfd39ea52 Implement TupleDestination interface.
Implements a new `TupleDestination` interface to allow custom tuple processing per task.

This can be especially useful if a task contains multiple queries. An example of this is EXPLAIN
ANALYZE, where it needs to add some UDF calls to the query to fetch the explain output
from the worker after fetching the actual query results.
2020-06-05 17:47:40 -07:00
SaitTalhaNisanci d0f47eb338
Check the removeType in IsDropCitusStmt (#3859)
We should check the remove type in IsDropCitusStmt because if the remove
type is not OBJECT_EXTENSION then the stored objects in
dropStmt->objects may not be of type Value. This was crashing on PG 13.

Also rename the method as IsDropCitusExtensionStmt.
2020-06-05 20:49:54 +03:00
Onur Tirtir f7224a12f2
Implement PushOverrideEmptySearchPath (#3874)
To reduce code duplication, implement function that pushes search_path
to be NIL and sets addCatalog to true so that all objects outside of
pg_catalog will be schema-prefixed.
2020-06-05 19:23:59 +03:00
Onur Tirtir f3f711e097 Implement IndexIsImpliedByAConstraint 2020-06-05 15:33:54 +03:00
Onur Tirtir dfcc18468c Error out for unsupported trigger objects
Error out if creating a citus table from a table having triggers.
Error out for CREATE TRIGGER commands that are run on citus tables.
2020-05-31 23:10:01 +03:00
Onur Tirtir 6e6bc155a9 Implement methods to process & recreate triggers on citus tables 2020-05-31 15:28:17 +03:00
Philip Dubé c0515dcd67 This prepares for routing modifying CTEs, where modLevel should not be used to infer whether a plan is a select or not
SELECT_TASK is renamed to READ_TASK as a SELECT with modifying CTEs will be a MODIFYING_TASK

RouterInsertJob: Assert originalQuery->commandType == CMD_INSERT
CreateModifyPlan: Assert originalQuery->commandType != CMD_SELECT

Remove unused function IsModifyDistributedPlan

DistributedExecution, ExecutionParams, DistributedPlan: Rename hasReturning to expectResults
SELECTs set expectResults to true

Rename CreateSingleTaskRouterPlan to CreateSingleTaskRouterSelectPlan
2020-05-20 17:26:12 +00:00
Onur Tirtir 79a688ffe0 Refactor the methods accessing pg_constraint
Implement internal functions to access pg_constraint
and utilize them in existing foreign key checks.
2020-05-20 17:27:17 +03:00
SaitTalhaNisanci 22c903b151
remove ExecuteUtilityTaskListWithoutResults (#3696)
This PR removes ExecuteUtilityTaskListWithoutResults and uses the same
path for local execution via ExecuteTaskListExtended.
ExecuteUtilityTaskList is added. ExecuteLocalTaskListExtended now has a
parameter for utility commands so that it can call the right method. In
order not to change the existing calls,
ExecuteTaskListExtendedInternal is added, which is the main method that
runs the execution, via local and remote execution.
2020-05-07 13:30:50 +03:00
Marco Slot 6ce2803777 Make sure we don't wrap GROUP BY expressions in any_value 2020-05-05 05:12:45 +02:00
Onder Kalaci 0cb7ab2d05 Explicitly mark queries in physical planner for [not] having parameters
The physical planner doesn't support parameters. If the parameters have already
been resolved by the time the physical planner handles the queries, mark it.
The reason is that the executor is unaware of this, and sends the parameters
along with the worker queries, which fails for composite types.

(See `DissuadePlannerFromUsingPlan()` for the details of parameter resolving)
2020-04-24 12:49:43 +02:00
Onur Tirtir b8dd8f50d1
Fix build issue in GCC 10 (#3790)
As reported in #3787, we were having issues while building citus with "GCC Red Hat 10" (maybe in some other versions of gcc as well).
Fixes "multiple definition of 'CitusNodeTagNames'" error by explicitly specifying storage of CitusNodeTagNames to be extern.
2020-04-22 16:41:34 +03:00
Philip Dubé c0a95a3adb Copy data from CitusTableCacheEntry more often
This copies over fixes from the reference counting branch:
all CitusTableCacheEntry data may be freed when a GetCitusTableCacheEntry call occurs for its relationId

This fix is not complete, but reference counting is being deferred until 9.4

CopyShardInterval: remove dest parameter, always return newly allocated object
2020-04-17 14:17:18 +00:00
Önder Kalacı a919f09c96
Remove the entries from the shared connection counter hash when no connections remain (#3775)
We initially considered removing entries just before any change to
pg_dist_node. However, that ended up being very complex and made
MX even more complex.

Instead, we're switching to a simpler solution, where we remove entries
when the counter gets to 0.

With certain workloads, this may have some performance penalty. But, two
notes on that:
 - When counter == 0, it implies that the cluster is not busy
 - With cached connections, that's not possible
2020-04-17 17:14:58 +03:00
SaitTalhaNisanci a9a3be15cc
introduce TASK_QUERY_NULL task type (#3774)
When we call SetTaskQueryString we would set the task type to
TASK_QUERY_TEXT, and some parts of the codebase rely on the fact that if
TASK_QUERY_TEXT is set, the data can be read safely. However if
SetTaskQueryString is called with a NULL taskQueryString this can cause
crashes. In that case taskQueryType will simply be set to
TASK_QUERY_NULL.
2020-04-17 14:59:22 +03:00
Nils Dijk 1d6ba1d09e
Refactor alter role to work on distributed roles (#3739)
DESCRIPTION: Alter role only works for citus managed roles

Alter role was implemented before we implemented good role management that hooks into the object propagation framework. This is a refactor of all alter role commands that have been implemented to
 - be on by default
 - only work for supported roles
 - make the citus extension owner a supported role

Instead of distributing the alter role commands for roles at the beginning of node activation, it now _only_ executes the alter role commands for all users in all databases and in the current database.

In preparation of full role support small refactors have been done in the deparser.

Earlier tests targeting roles other than the citus extension owner have been either slightly changed or removed, to be put back once we have full role support.

Fixes #2549
2020-04-16 12:23:27 +02:00
Marco Slot 8b83306a27 Issue worker messages with the same log level 2020-04-14 21:08:25 +02:00
SaitTalhaNisanci 132efdbc56
add execution params struct (#3747)
We had 9+ parameters in some of the functions related to execution.
An execution params struct is created to simplify this a bit, so that we can set
only the fields that we are interested in and the code is easier to read.
2020-04-14 14:32:40 +03:00
Onder Kalaci aa6b641828 Throttle connections to the worker nodes
With this commit, we're introducing a new infrastructure to throttle
connections to the worker nodes. This infrastructure is useful for
multi-shard queries; router queries are not affected by this.

The goal is to prevent establishing more than citus.max_shared_pool_size
number of connections per worker node in total, across sessions.

To do that, we've introduced a new connection flag OPTIONAL_CONNECTION.
The idea is that some connections are optional such as the second
(and further connections) for the adaptive executor. A single connection
is enough to finish the distributed execution, the others are useful to
execute the query faster. Thus, they can be considered optional connections.
When an optional connection is not allowed to the adaptive executor, it
simply skips it and continues the execution with the already established
connections. However, it'll keep retrying to establish optional
connections, in case some slots are open again.
2020-04-14 10:27:48 +02:00
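A minimal sketch of configuring the throttling described above; the value is illustrative:

```sql
ALTER SYSTEM SET citus.max_shared_pool_size = 50;  -- per-worker cap across all sessions
SELECT pg_reload_conf();
-- Beyond this cap, the adaptive executor skips "optional" extra connections
-- and keeps retrying them only if slots free up.
```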
Onder Kalaci 0dbfbe0c37 Add the necessary shared memory infrastructure
- The hashmap in the shared memory
- The lock to access the hashmap
- The GUC to control the size
2020-04-14 10:03:26 +02:00
Philip Dubé 30f10984e1 Defer get_agg_clause_costs, it happens later & avoids errors 2020-04-10 13:26:05 +00:00
SaitTalhaNisanci 3dc7cad754
use an enum for local execution status (#3733)
We have two variables that are related to local execution status.
TransactionAccessedLocalPlacement and
TransactionConnectedToLocalGroup. Only one of these fields should be
set, however we didn't have any check for this constraint and it was
error prone.

What those two variables are used for is understanding whether we should
use local execution (the current session) or a connection to execute the
current query, and therefore its tasks. With the enum, it is now clearer
what these states mean.

Also, now we have a method to change the local execution status. The
method will error if we are trying to transition from a state to a wrong
state. This will help us avoid problems.
2020-04-09 19:11:04 +03:00
Hadi Moshayedi dda53a0bba GUC for replicate reference tables on activate. 2020-04-08 12:42:45 -07:00
Hadi Moshayedi 0758a81287 Prevent reference tables being dropped when replicating reference tables 2020-04-08 12:41:36 -07:00
Marco Slot 924cd7343a Defer reference table replication to shard creation time 2020-04-08 12:41:36 -07:00
Onder Kalaci 9b29a32d7a Remove all references for side channel connections
We don't need any side channel connections. That is actually
problematic in the sense that it creates extra connections.
Say citus.max_adaptive_executor_pool_size equals 1: Citus
ends up using one extra connection for the intermediate results,
thus not obeying citus.max_adaptive_executor_pool_size.

In this PR, we remove the following entities from the codebase
to allow further commits to implement not requiring extra connection
for the intermediate results:

- The connection flag REQUIRE_SIDECHANNEL
- The function GivePurposeToConnection
- The ConnectionPurpose struct and related fields
2020-04-07 17:06:55 +02:00
Marco Slot 2632343f64 Fix intermediate result pruning for INSERT..SELECT 2020-04-07 11:07:49 +02:00
Marco Slot 84672c3dbd Simplify intermediate result pruning logic 2020-04-07 10:53:29 +02:00
SaitTalhaNisanci a369f9001d
fix incorrect groupid or nodeid (#3710)
For shardplacements, we were setting nodeid, nodename, nodeport and
nodegroup manually. This was very error prone, and it seems that we
had already forgotten to set some of them. This would mean that they would have
their default values, e.g. group id would be 0 when its actual group id is not
0.

So the implication is that we would have inconsistent worker metadata.

A new method is introduced, and we call the method to set those fields
now, so that as long as we call this method, we won't be setting
inconsistent metadata.

It probably makes sense to have a struct for these fields. We already
have NodeMetadata but it doesn't have nodename or nodeport. So that
could be done over another refactor to make things simpler.
2020-04-07 11:14:14 +03:00
Onur Tirtir 4c95ad1579 do not traverse parse tree in distributed planner one more time 2020-04-03 18:24:48 +03:00
Onur Tirtir 13a35c6813 implement GetOnlyShardOidOfReferenceTable and some refactor in shard_uitls 2020-04-03 18:24:13 +03:00
SaitTalhaNisanci 0aebd78ea7 use localExecution in ExecuteTaskListExtended
ExecuteTaskListExtended is the common method for different codepaths,
and instead of writing separate local execution logics in different
codepaths, it makes more sense to have the logic here. We still need to
do some refactoring, this is an initial step.

After this commit, we can run create shard commands locally. There is a
special case with shard creation commands. A create shard command might
have a concatenated query string, however local execution did not know
how to execute a task with multiple query strings. This is also
implemented in this commit. We go over each query in the concatenated
query string and plan/execute them one by one.

A more clean solution to this would be to make sure that each task has a
single query. We currently cannot do that because we need to ensure the
task dependencies. However, it would make sense to do that at some point
and it would simplify the code a lot.
2020-04-01 18:23:16 +03:00
SaitTalhaNisanci ba01f3457a
use macros for pg versions instead of hardcoded values (#3694)
3 Macros are defined for removing the hardcoded pg versions.
PG_VERSION_11, PG_VERSION_12 and PG_VERSION_13.
2020-04-01 17:01:52 +03:00
SaitTalhaNisanci 6cd32b0db1
refactor ExecuteLocalTaskList (#3617)
ExecuteLocalTaskList doesn't need scanState as it only uses
paramListInfo, distributedPlan and tupleStoreState. It is better to pass
only the variables that the function needs, so that we can call this
function from other places when we don't have scanState.
2020-03-31 19:19:54 +03:00
SaitTalhaNisanci b5591b1b28 use taskQuery as a struct to simplify the code 2020-03-31 15:47:55 +03:00
SaitTalhaNisanci 8806c4d697 move queryStringList into taskQuery
Also allocate task query in the memory context of task.
2020-03-31 15:47:55 +03:00
SaitTalhaNisanci c796ac335d add TaskQuery struct to abstract query string related fields
We had many fields in task related to query strings. It was kind of
complex, and only one of them could be set at a time. Therefore it makes
more sense to abstract this and use a union so that it is clear that
only one of them should be set.

We have three fields that could have query related strings:
- queryForLocation
- queryStringLazy
- perPlacementQueryStrings

Relatively, they can be set with:
- SetTaskQueryString
- SetTaskQueryIfShouldLazyDeparse
- SetTaskPerPlacementQueryStrings

The direct usage of the query related fields are also removed.

Rename queryForLocalExecution

Currently queryForLocalExecution is only used for deparsing purposes,
therefore it makes sense to rename it to what it is doing.
2020-03-31 15:47:55 +03:00
SaitTalhaNisanci 98f95e2a5e add TaskQueryStringForPlacement
TaskQueryStringForPlacement simplifies how the executor gets the query
string for a given placement. Task will use the necessary fields to
return the correct query placement string. Executor doesn't need to know
the details for this.

rename TaskQueryString as TaskQueryStringAllPlacements

TaskQueryString returns the query string that will be the same for all
the placements. In INSERT..SELECT the query string can be different for
each placement. Adaptive executor uses TaskQueryStringForPlacement,
which returns the query string for a placement. It makes sense to rename
TaskQueryString as TaskQueryStringAllPlacements as it is returning the
query string for all placements.

rename SetTaskQuery as SetTaskQueryIfShouldLazyDeparse

SetTaskQuery does not always set the task query. It can set the query
string as well. So it is more clear to name it
SetTaskQueryIfShouldLazyDeparse, since it will set the query not query
string only when we should deparse the query in a lazy way.
2020-03-31 15:47:55 +03:00
SaitTalhaNisanci 982b5fbabf add SetTaskPerPlacementStrings
It is possible that a task will have a different query string for each
placement. This is the case in INSERT..SELECT via repartitioning. When
we are setting task->perPlacementQueryString, we should set
queryStringLazy to NULL. Therefore a method for that purpose is created.
2020-03-31 15:47:55 +03:00
Marco Slot 331b45348c Fix error when using LEFT JOIN with GROUP BY on primary key 2020-03-30 16:42:22 +02:00
SaitTalhaNisanci e1802c5c00
extract local plan cache related methods into a file (#3667) 2020-03-31 11:11:34 +03:00
Philip Dubé 4eb2c33f38 multi_copy.c: remove tableMetadata 2020-03-30 19:26:44 +00:00
Jelte Fennema 3be665269f
Reintroduce ForceSearchShardPlacementInList (#3664)
This was added to silence static analysis errors. It was removed
accidentally in #3591. This reintroduces it again.
2020-03-27 14:28:50 +01:00
Hanefi Onaldi 0e8103b101
Propagate ALTER ROLE .. SET statements
In PostgreSQL, user defaults for config parameters can be changed by
ALTER ROLE .. SET statements. We wish to propagate those defaults
across the Citus cluster so that the behaviour will be similar on
different workers.

The defaults can either be set in a specific database, or the whole
cluster, similarly they can be set for a single role or all roles.

We propagate the ALTER ROLE .. SET if all the conditions below are met:
- The query affects the current database, or all databases
- The user is already created in worker nodes
2020-03-27 13:02:48 +03:00
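Hedged examples of the statements that are now propagated, using standard PostgreSQL syntax; role and database names are hypothetical:

```sql
ALTER ROLE app_user SET work_mem TO '64MB';                         -- one role, all databases
ALTER ROLE ALL IN DATABASE app_db SET statement_timeout TO '30s';   -- all roles, one database
-- Propagated to workers when the statement affects the current database (or all
-- databases) and the role already exists on the worker nodes.
```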
SaitTalhaNisanci dd1a456407
store query command list in task (#3649)
Sometimes we have concatenated query strings for a task. However,
when we want to find each query string, it is not a trivial task.
Therefore, it makes sense to store this in the task so that when we need
each query string we can easily get it.
2020-03-26 12:04:08 +03:00
Philip Dubé 720525cfda Add support for window functions on coordinator
Some refactoring:
Consolidate expression which decides whether GROUP BY/HAVING are pushed down
Rename early pullUpIntermediateRows to hasNonDistributableAggregates
Create WorkerColumnName to handle formatting WORKER_COLUMN_FORMAT
Ignore NULL StringInfo pointers to SafeToPushdownWindowFunction
Fix bug where SubqueryPushdownMultiNodeTree mutates supplied Query,
	SafeToPushdownWindowFunction requires the original query as it relies on rtable
2020-03-25 15:31:20 +00:00
Onur Tirtir 52fd58d51f move MakeNameListFromRangeVar function to a more appropriate file 2020-03-25 11:01:50 +03:00
Jelte Fennema 2aabe3e2ef
Mark all connections for shutdown when citus.node_conninfo chan… (#3642)
We cache connections between nodes in our connection management code.
This is good for speed. For security this can be a problem though. If
the user changes settings related to TLS encryption they want those to
be applied to future queries. This is especially important when they did
not have TLS enabled before and now they want to enable it. This can
normally be achieved by changing citus.node_conninfo.  However, because
connections are not reopened there will still be old connections that
might not be encrypted at all.

This commit changes that by marking all connections to be shut down at
the end of their current transaction. This way running transactions will
succeed, even if placement requires connections to be reused for this
transaction. But after this transaction completes any future statements
will use a connection created with the new connection options.

If a connection is requested and a connection is found that is marked
for shutdown, then we don't return this connection. Instead a new one is
created. This is needed to make sure that if there are no running
transactions, then the next statement will not use an old cached
connection, since connections are only actually shut down at the end of a
transaction.
2020-03-24 15:31:41 +01:00
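An illustrative sequence for the TLS scenario mentioned above; after this change, cached connections are marked for shutdown at the end of their current transaction instead of being reused with the old settings:

```sql
ALTER SYSTEM SET citus.node_conninfo TO 'sslmode=require';
SELECT pg_reload_conf();
-- Subsequent statements open new connections with the updated conninfo.
```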
Marco Slot ede176d849 Implement shard placement copying 2020-03-23 08:33:08 -07:00
SaitTalhaNisanci 3df578010e
add a UDF to update colocation (#3623)
If two tables have the same distribution column type, we implicitly
colocate them. This is useful since colocation has a big performance
impact in most applications.

When a table is rebalanced, all of the colocated tables are also
rebalanced. If table A and table B are colocated and we want to
rebalance table A, table B will also be rebalanced. We need replica
identity so that logical replication can replicate updates and deletes
during rebalancing. If table B does not have a replica identity we
error out.

A solution to this is to introduce a UDF so that colocation can be
updated. The remaining tables in the colocation group will stay
colocated. For example, if tables A, B and C are colocated, then after
updating table B's colocation, table A and table C stay colocated.

The "updating colocation" step does not move any data around, it only
updates the pg_dist_partition and pg_dist_colocation tables. Specifically it
creates a new colocation group for the table and updates the entry in
pg_dist_partition while invalidating any cache.
2020-03-23 13:22:24 +03:00
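A hedged sketch of the UDF described above; the function name and argument are assumptions based on later Citus releases and are not spelled out in the commit message:

```sql
-- Move table_b out of its colocation group without moving any data;
-- table_a and table_c remain colocated with each other.
SELECT update_distributed_table_colocation('table_b', colocate_with => 'none');
```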
Onder Kalaci 7b4eb9611b Properly terminate connections at the end of a session
Citus coordinator (or MX nodes) caches `citus.max_cached_conns_per_worker` connections
per node. This means that those connections are not terminated after each statement.
Instead, they are cached to avoid the cost of re-establishment. This is crucial for OLTP performance.

The problem with that approach is that we never properly handle the termination of
those cached connections. For instance, when a session on the coordinator disconnects,
you'd see the following logs on the workers:

```
2020-03-20 09:13:39.454 CET [64028] LOG:  could not receive data from client: Connection reset by peer
```

With this patch, we're terminating the cached connections properly at the end of the connection.
2020-03-20 17:34:34 +01:00
SaitTalhaNisanci 9d2f3c392a enable local execution in INSERT..SELECT and add more tests
We can use local copy in INSERT..SELECT, so the check that disables
local execution is removed.

Also a test for local copy where the data size >
LOCAL_COPY_FLUSH_THRESHOLD is added.

use local execution with insert..select
2020-03-18 09:34:39 +03:00
SaitTalhaNisanci 42cfc4c0e9 apply review items
log shard id in local copy and add more comments
2020-03-18 09:33:55 +03:00
SaitTalhaNisanci 1df9601e13 not use local copy if current transaction is connected to local group
If the current transaction is connected to the local group we should not use
local copy, because we might not see some of the changes that are made
over the connection to the local group.
2020-03-18 09:28:59 +03:00
SaitTalhaNisanci f9c4431885 add the support to execute copy locally
A copy will be executed locally if
- Local execution is enabled and current transaction accessed a local placement
- Local execution is enabled and we are inside a transaction block.

So even if local execution is enabled but we are not in a transaction block, the copy will not be run locally.

This will not run locally:
```
COPY distributed_table FROM STDIN;
....
```

This will run locally:
```
SET citus.enable_local_execution to 'on';
BEGIN;
COPY distributed_table FROM STDIN;
COMMIT;
....
```
.
There are 3 ways to do a copy in postgres programmatically:
- from a file
- from a program
- from a callback function

I have chosen to implement it with a callback function, which means that we write the rows of copy from a callback function to the output buffer, which is used to insert tuples into the actual table.

For each shard id, we have a buffer that keeps the current rows to be written, we perform the actual copy operation either when:
- the copy buffer for the given shard id reaches a threshold, which is currently 512KB
- we reach the end of the copy

The buffer size (512KB) is debatable. At a given time, we might allocate at most (local placements * buffer size) memory.

The local copy uses the same copy format as remote copy, which means that we serialize the data in the same format as remote copy and send it locally.

There was also the option to use ExecSimpleRelationInsert to insert
slots one by one, which would avoid the extra
serialization/deserialization, but after doing some benchmarks it seems that
using buffers is significantly better in terms of performance.

You can see this comment for more details: https://github.com/citusdata/citus/pull/3557#discussion_r389499054
2020-03-18 09:28:59 +03:00
Onur Tirtir a14739f808
Local execution of ddl/drop/truncate commands (#3514)
* reimplement ExecuteUtilityTaskListWithoutResults for local utility command execution

* introduce new functions for local execution of utility commands

* change ErrorIfTransactionAccessedPlacementsLocally logic for local utility command execution

* enable local execution for TRUNCATE command on distributed & reference tables

* update existing tests for local utility command execution

* enable local execution for DDL commands on distributed & reference tables

* enable local execution for DROP command on distributed & reference tables

* add normalization rules for cascaded commands

* add new tests for local utility command execution
2020-03-13 15:39:32 +03:00
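An illustrative transaction for the utility commands above; the table name is hypothetical, and placements residing on the local node, if any, can now be handled via local execution:

```sql
BEGIN;
SET LOCAL citus.enable_local_execution TO on;
TRUNCATE reference_table_events;   -- local shard placements are truncated through the local backend
COMMIT;
```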
Jelte Fennema c4cc26ed37
Semmle: Ensure stack memory is not leaked through uninitialized… (#3561)
New stack memory can contain anything including passwords/private keys.
In these functions we return structs that can have their padding
bytes uninitialized. By first zeroing out the struct fully, we try to
ensure that any data that is in these padding bytes is at least
overwritten once. It might not be zero anymore after setting the fields,
but at least it shouldn't be private data anymore.
2020-03-11 20:05:36 +01:00
Philip Dubé 81cfa05d3d First phase of addressing HAVING subquery issues
Add failing tests, make changes to avoid crashes at least

Fix HAVING subquery pushdown ignoring reference table only subqueries,
also include HAVING in recursive planning

Given that we have a function IsDistributedTable which includes reference tables,
it seems best to have IsDistributedTableRTE & QueryContainsDistributedTableRTE
reflect that they do not include reference tables in their check

Similarly SublinkList's name should reflect that it only scans WHERE

contain_agg_clause asserts that we don't have SubLinks,
use contain_aggs_of_level as suggested by pg sourcecode
2020-03-09 17:58:30 +00:00
SaitTalhaNisanci 321d0152c1
add a utility to get shard oid from relation oid and shard id (#3596) 2020-03-09 15:50:29 +03:00
Hanefi Onaldi 2595b4864b
Remove all GetWorkerNodeCount() references
As @onderkalaci suggested, remove the definition of GetWorkerNodeCount(), which can potentially cause misunderstandings.

ActiveReadableWorkerNodeCount(), which returns the number of active primaries, is a safer alternative to GetWorkerNodeCount(), which returns the total number of workers including inactive, primary, and unavailable nodes. I introduced a bug in #3556 and in the bugfix #3564 removed the single usage of said function
2020-03-09 13:35:18 +03:00
Philip Dubé 7cdfa1daab Rename LookupCitusTableCacheEntry to GetCitusTableCacheEntry, LookupLookupCitusTableCacheEntry back to LookupCitusTableCacheEntry 2020-03-08 14:08:23 +00:00
Philip Dubé a7cca1bcde Rename DistTableCacheEntry to CitusTableCacheEntry 2020-03-07 14:08:03 +00:00
Philip Dubé b514ab0f55 Fix typos, rename isDistributedRelation to isCitusRelation 2020-03-06 19:20:34 +00:00
Philip Dubé bec58000d6 Given IsDistributedTableRTE, there's ambiguity in what DistributedTable means
Elsewhere we used DistributedTable to include reference tables
Marco suggested we use CitusTable for distributed & reference tables

So renaming:
- IsDistributedTable -> IsCitusTable
- IsDistributedTableViaCatalog -> IsCitusTableViaCatalog
- DistributedTableCacheEntry -> CitusTableCacheEntry
- DistributedTableList -> CitusTableList
- isDistributedTable -> isCitusTable
- InsertSelectIntoDistributedTable -> InsertSelectIntoCitusTable
- ExtractFirstDistributedTableId -> ExtractFirstCitusTableId
2020-03-06 18:57:55 +00:00
Marco Slot dc4c0c032e Refactor CitusBeginScan into separate DML / SELECT paths 2020-03-05 12:37:22 +01:00
Nils Dijk 268ad741a9
Refactor the deparsing of a CREATE EXTENSION to prevent NULL POINTER dereferences (#3518)
DESCRIPTION: satisfy static analysis tool for a nullptr dereference

During the static analysis project on the codebase this code was flagged as having the potential for a null pointer dereference. Funnily enough, the author had already noted in a code comment that this was not possible, due to us setting the schema name before we pass in the statement. If we want to reuse this code in a later setting this comment might not always apply and we could actually run into a null pointer dereference.

This patch changes a bit of the code around to first of all make sure there is no NULL pointer dereference in this code anymore.
Secondly we allow for better deparsing by setting and adhering to the `if_not_exists` flag on the statement.
And finally add support for all syntax described in the documentation of postgres (FROM was missing).
2020-03-04 16:47:07 +01:00
Philip Dubé 20abc4d2b5
Replace foreach with foreach_ptr/foreach_oid (#3544) 2020-02-27 16:54:49 +01:00
Jelte Fennema c48f0ca7e5 Make bad refactors to foreach_xxx error out
Without this commit you could still use varCell in the body of the loop.
This makes it easy for bad refactors that still use the ListCell to slip
through unnoticed, because the new ListCell will be named the same as the
one used in the old code. By renaming the ListCell to varCellDoNotUse
this will not happen.
2020-02-27 10:59:45 +01:00
Jelte Fennema 685b54b3de
Semmle: Check for NULL in some places where it might occur (#3509)
Semmle reported quite some places where we use a value that could be NULL. Most of these are not actually a real issue, but better to be on the safe side with these things and make the static analysis happy.
2020-02-27 10:45:29 +01:00
SaitTalhaNisanci 82d22b34fe
create temp schemas in parallel (#3540) 2020-02-26 16:20:08 +03:00
SaitTalhaNisanci d94c3fd43d
send repartition cleanup jobs in parallel to all workers (#3485)
* send repartition cleanup jobs in parallel to all workers

* add review items
2020-02-26 13:44:06 +03:00
Jelte Fennema 8de8b62669 Convert unsafe APIs to safe ones 2020-02-25 15:39:27 +01:00
Nils Dijk a77ed9cd23
Refactor master query to be planned by postgres' planner (#3326)
DESCRIPTION: Replace the query planner for the coordinator part with the postgres planner

Closes #2761 

Citus had a simple rule based planner for the query executed on the query coordinator. This planner grew over time with the addition of SQL support until it was getting close to the functionality of the postgres planner. Except the code was brittle and its complexity rose, which made it hard to add new SQL support.

Given its resemblance with the postgres planner it was a long outstanding wish to replace our hand crafted planner with the well supported postgres planner. This patch replaces our planner with a call to postgres' planner.

Due to the functionality of the postgres planner we needed to support both projections and filters/quals on the citus custom scan node. When a sort operation is planned above the custom scan it might require fields to be reordered in the custom scan before returning the tuple (projection). The postgres planner assumes every custom scan node implements projections. Because we controlled the plan that was created we prevented reordering in the custom scan and never had implemented it before.

The same optimisation applies to HAVING clauses that could have been WHERE clauses. Instead of applying the filter as a HAVING on the aggregate, it will push it down into the plan, which could reach a custom scan node.

For both filters and projections we have implemented them when tuples are read from the tuple store. If no projections or filters are required it will directly return the tuple from the tuple store. Otherwise it will loop tuples from the tuple store through the filter and projection until a tuple is found and returned.

Besides filters being pushed down, a side effect of having quals that could have been a where clause is that a read of an intermediate result could happen before the first tuple is fetched from the custom scan. This failed because the intermediate result would only be pulled to the coordinator on the first tuple fetch. To overcome this problem we now run the distributed subplans before we run the postgres executor. This ensures the intermediate result is present on the coordinator in time. We account for total time instrumentation by removing the instrumentation before handing control to the postgres executor and updating the timings ourselves.

For future SQL support it is enough to create a valid query structure for the part of the query to be executed on the query coordinating node. As a utility we serialise and print the query at debug level 4 for engineers to inspect what kind of query is being planned on the query coordinator.
2020-02-25 14:39:56 +01:00
Onur Tirtir 3c99db40b9 Some small typos & cleanup 2020-02-24 16:37:55 +03:00
Jelte Fennema 2a9fccc7a0
Remove READFUNCs (#3536)
We don't actually use these functions anymore since merging #1477.

Advantages of removing:
1. They add work whenever we add a new node.
2. They contain some usage of stdlib APIs that are banned by Microsoft.
   Removing it means we don't have to replace those with safe ones.
2020-02-24 12:43:28 +01:00
Philip Dubé bcf54c5014 Address a couple of issues with maintenance daemon management:
- Stop the daemon when citus extension is dropped
- Bail on maintenance daemon startup if myDbData is started with a non-zero pid
- Stop maintenance daemon from spawning itself
- Don't use postgres die, just wrap proc_exit(0)
- Assert(myDbData->workerPid == MyProcPid)

The two issues were that multiple daemons could be running for a database,
or that a daemon would be leftover after DROP EXTENSION citus
2020-02-21 16:49:01 +00:00
Jelte Fennema 00d667c41d
Semmle: Fix obvious issues (#3502)
Fixes some obvious issues found by the Semmle static analysis tool.
2020-02-21 10:16:00 +01:00
Philip Dubé 52042d4a00 Prefer instr_time to TimestampTz when we want CLOCK_MONOTONIC 2020-02-19 00:34:17 +00:00
Philip Dubé 08f6842d50 Fix typos
Equivalance -> Equivalence
utillity -> utility
shorted lived one -> shortly lived one
elegible -> eligible
2020-02-18 17:14:40 +00:00
Marco Slot 038e5999cb Implement direct COPY table TO stdout 2020-02-17 15:15:10 +01:00
Onder Kalaci 975c4c2264 Do not prune shards if the distribution key is NULL
The root of the problem is that, standard_planner() converts the following qual

```
   {OPEXPR
   :opno 98
   :opfuncid 67
   :opresulttype 16
   :opretset false
   :opcollid 0
   :inputcollid 100
   :args (
      {VAR
      :varno 1
      :varattno 1
      :vartype 25
      :vartypmod -1
      :varcollid 100
      :varlevelsup 0
      :varnoold 1
      :varoattno 1
      :location 45
      }
      {CONST
      :consttype 25
      :consttypmod -1
      :constcollid 100
      :constlen -1
      :constbyval false
      :constisnull true
      :location 51
      :constvalue <>
      }
   )
   :location 49
   }
```

To

```
(
   {CONST
   :consttype 16
   :consttypmod -1
   :constcollid 0
   :constlen 1
   :constbyval true
   :constisnull true
   :location -1
   :constvalue <>
   }
)
```

So, Citus doesn't deal with NULL values in real-time or non-fast path router queries.

And, in the FastPathRouter planner, we check constisnull in DistKeyInSimpleOpExpression().
However, in the deferred pruning case, we do not check isnull for the const.

Thus, the fix consists of two parts:
- Let PruneShards() not crash when NULL parameter is passed
- For deferred shard pruning in fast-path queries, explicitly check that we have CONST which is not NULL
2020-02-13 15:00:31 +01:00
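An illustrative fast-path prepared statement hitting the deferred-pruning case described above; the table orders and its distribution column order_id are hypothetical:

```sql
PREPARE find_order (int) AS
    SELECT * FROM orders WHERE order_id = $1;   -- order_id is the distribution column
EXECUTE find_order(NULL);  -- with the fix, PruneShards() handles the NULL and returns no rows
```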
Onur Tirtir 39df51e903
Introduce objects to dist. infrastructure when updating Citus (#3477)
Mark existing objects that are not included in distributed object infrastructure
in older versions of Citus (but now should be) as distributed, after updating
Citus successfully.
2020-02-07 18:07:59 +03:00
Nils Dijk d5433400f9
Fix: Unnecessary repartition on joins with more than 4 tables (#3473)
DESCRIPTION: Fix unnecessary repartition on joins with more than 4 tables

In 9.1 we have introduced support for all CH-benCHmark queries by widening our definitions of joins to include joins with expressions in them. This had the undesired side effect of Q5 regressing on its plan by implementing a repartition join.

It turned out this regression was not directly related to widening of the join clause, nor the schema employed by CH-benCHmark. Instead it had to do with 4 or more tables being joined in a chain. A chain meaning:

```sql
SELECT * FROM a,b,c,d WHERE a.part = b.part AND b.part = c.part AND ....
```

Due to how our join order planner was implemented it would only keep track of 1 of the partition columns when comparing if the join could be executed locally. This manifested in a join chain of 4 tables to _always_ be executed as a repartition join. 3 tables joined in a chain would have the middle table shared by the two outer tables causing the local join possibility to be found.

With this patch we keep a  unique list (or set) of all partition columns participating in the join. When a candidate table is checked for a possibility to execute a local join it will check if there is any partition column in that set that matches an equality join clause on the partition column of the candidate table.

By taking into account all partition columns in the left relation it will now find the local join path on >= 4 tables joined in a chain. 

fixes: #3276
2020-02-06 15:07:07 +01:00
Philip Dubé ecad4aa5e6 Fill in jobIdList field of DistributedExecution
Pass down jobIdList from ExecuteTasksInDependencyOrder

Also clean up comment for ExecuteTaskListOutsideTransaction
2020-02-05 17:32:22 +00:00
Philip Dubé c252811884 dont: don't, wont: won't, acylic: acyclic 2020-02-05 17:32:22 +00:00
Hadi Moshayedi bc1a800f70 Use current user for repartition join temp schemas.
Otherwise when using a less privileged user we might get
errors when trying to create the schema.
2020-02-04 09:48:20 -08:00
Onder Kalaci 2f274a4fce Make sure to go deeper into the functions to search for PARAMs
For example, a PARAM might reside inside a function just because
of a casting of a type such as the follows:

```
               {FUNCEXPR
               :funcid 1740
               :funcresulttype 1700
               :funcretset false
               :funcvariadic false
               :funcformat 2
               :funccollid 0
               :inputcollid 0
               :args (
                  {PARAM
                  :paramkind 0
                  :paramid 15
                  :paramtype 23
                  :paramtypmod -1
                  :paramcollid 0
                  :location 356
                  }
               )
```

We should recursively check the expression before bailing out.
2020-02-03 09:36:12 +01:00
Philip Dubé 84a500ffc6 CitusRemoveDirectory: loop when directory is not empty
Sometimes during errors workers will create files while we're deleting intermediate directories

example:
DEBUG:  could not remove file "base/pgsql_job_cache/10_0_431": Directory not empty
DETAIL:  WARNING from localhost:57637
2020-01-30 20:02:08 +00:00
Önder Kalacı 8584cb005b
Do not evaluate functions on the coordinator for SELECT queries (#3440)
Previously, the logic for evaluating the functions and the parameters
was the same. That ended up evaluating the functions inaccurately
on the coordinator. Instead, split the function evaluation logic
from the parameter evaluation logic.
2020-01-30 08:47:28 +01:00
Önder Kalacı 4519d3411d
Improve the representation of used sub plans (#3411)
Previously, we've identified the usedSubPlans by only looking
to the subPlanId.

With this commit, we're expanding it to also include information
on the location of the subPlan.

This is useful to distinguish the cases where the subPlan is used
either on only HAVING or both HAVING and any other part of the query.
2020-01-24 10:47:14 +01:00
Philip Dubé 50c5e814c8 CurrentDatabaseName: return const char* as we're borrowing from cache 2020-01-23 22:49:35 +00:00
Hadi Moshayedi 3e1004c232 Change DistributedResultFragment::nodeId to uint32.
This is to match the type of WorkerNode::nodeId.
2020-01-23 09:33:15 -08:00
Önder Kalacı ef7d1ea91d
Locally execute queries that don't need any data access (#3410)
* Update shardPlacement->nodeId to uint

As the source of the shardPlacement->nodeId is always workerNode->nodeId,
and that is uint32.

We had this hack because of: 0ea4e52df5 (r266421409)

And, that is gone with: 90056f7d3c (diff-c532177d74c72d3f0e7cd10e448ab3c6L1123)

So, we're safe to do it now.

* Relax the restrictions on using the local execution

Previously, whenever any local execution happened, we disallowed further
commands from doing any remote queries. The basic motivation for doing that
is to prevent accesses in the same transaction block from accessing the
same placements over multiple sessions: one being the local session, the other
a remote session to the same placement.

However, the current implementation does not distinguish whether local accesses
are to a placement or not. For example, we could have local accesses
that only touches intermediate results. In that case, we should not
implement the same restrictions as they become useless.

So, this is a pre-requisite for executing the intermediate result only
queries locally.

* Update the error messages

As the underlying implementation has changed, reflect it in the error
messages.

* Keep track of connections to local node

With this commit, we're adding infrastructure to track if any connection
to the same local host is done or not.

The main motivation for doing this is that we previously were more
conservative about not choosing local execution. Simply, we disallowed
local execution if any connection to any remote node had been made. However,
if we want to use local execution for intermediate-result-only queries,
this'd be annoying because we expect all queries to touch a remote node
before the final query.

Note that this approach is still limiting in Citus MX case, but for now
we can ignore that.

* Formalize the concept of Local Node

Also some minor refactoring while creating the dummy placement

* Write intermediate results locally when the results are only needed locally

Before this commit, Citus used to always broadcast all the intermediate
results to remote nodes. However, it is possible to skip pushing
the results to remote nodes always.

There are two notable cases for doing that:

   (a) When the query consists of only intermediate results
   (b) When the query is a zero shard query

In both of the above cases, we don't need to access any data on the shards. So,
it is a valuable optimization to skip pushing the results to remote nodes.

The pattern mentioned in (a) is actually a common pattern that Citus users
use in practice. For example, if you have the following query:

WITH cte_1 AS (...), cte_2 AS (....), ... cte_n (...)
SELECT ... FROM cte_1 JOIN cte_2 .... JOIN cte_n ...;

The final query could be operating only on intermediate results. With this patch,
the intermediate results of the ctes are not unnecessarily pushed to remote
nodes.

* Add specific regression tests

As there are edge cases in Citus MX and with round-robin policy,
use the same queries on those cases as well.

* Fix failure tests

By forcing not to use local execution for intermediate results since
all the tests expects the results to be pushed remotely.

* Fix flaky test

* Apply code-review feedback

Mostly style changes

* Limit the max value of pg_dist_node_seq to reserve for internal use
2020-01-23 18:28:34 +01:00
Onder Kalaci a0dff301c7 Update shardPlacement->nodeId to uint
As the source of the shardPlacement->nodeId is always workerNode->nodeId,
and that is uint32.

We had this hack because of: 0ea4e52df5 (r266421409)

And, that is gone with: 90056f7d3c (diff-c532177d74c72d3f0e7cd10e448ab3c6L1123)

So, we're safe to do it now.
2020-01-23 13:00:24 +01:00
Halil Ozan Akgul b40f067d05 Adds propagation for grant on schema commands 2020-01-20 14:51:28 +03:00
Onder Kalaci 0bf1e81e33 Cache local plans on BeginScan 2020-01-17 16:02:57 +01:00
Onder Kalaci 08d148d43e Make TaskAccessesLocalNode external function 2020-01-17 16:02:57 +01:00
Onder Kalaci ff12df411b Add LocalPlannedStatement struct 2020-01-17 16:02:57 +01:00
Jelte Fennema 246435be7e
Lazy query deparsing executable queries (#3350)
Deparsing and parsing a query can be heavy on CPU. When locally executing 
the query we don't need to do this in theory most of the time.

This PR is the first step in allowing to skip deparsing and parsing
the query in these cases, by lazily creating the query string and
storing the query in the task. Future commits will make use of this and
not deparse and parse the query anymore, but use the one from the task
directly.
2020-01-17 11:49:43 +01:00
Hadi Moshayedi a079278b0c Repartitioned INSERT/SELECT: Add a GUC to enable/disable it 2020-01-16 23:24:52 -08:00
Hadi Moshayedi 97072c9eb1 INSERT/SELECT: show method in EXPLAIN output 2020-01-16 23:24:52 -08:00
Hadi Moshayedi 44a2aede16 Don't start a coordinated transaction on workers.
Otherwise transaction hooks of Citus kick in and might cause unwanted errors.
2020-01-16 23:24:52 -08:00
Hadi Moshayedi 42c3c03b85 Handle extra columns added in ExpandWorkerTargetEntry() in repartitioned INSERT/SELECT 2020-01-16 23:24:52 -08:00
Hadi Moshayedi b4e5f4b10a Implement INSERT ... SELECT with repartitioning 2020-01-16 23:24:52 -08:00
Onder Kalaci dc17c2658e Defer shard pruning for fast-path router queries to execution
This is purely to enable better performance with prepared statements.
Before this commit, the fast path queries with prepared statements
where the distribution key includes a parameter always went through
distributed planning. After this change, we only go through distributed
planning on the first 5 executions.
2020-01-16 16:59:36 +01:00
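A sketch of the scenario this optimizes; the table and column names are hypothetical:

```sql
PREPARE get_order (int) AS
    SELECT * FROM orders WHERE order_id = $1;   -- distribution column as a parameter
EXECUTE get_order(42);   -- the first few executions still go through distributed planning
EXECUTE get_order(42);   -- later executions defer shard pruning to execution time
```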
Halil Ozan Akgul c5539d20d9 Adds alter table schema propagation 2020-01-16 17:04:16 +03:00
Jelte Fennema e76281500c
Replace shardId lock with lock on colocation+shardIntervalIndex (#3374)
This new locking pattern makes sure that some deadlocks that could
happen during rebalancing cannot occur anymore.
2020-01-16 13:14:01 +01:00
Onder Kalaci 64560b07be Update regression tests-2
In this commit, we're introducing a way to prevent CTE inlining via a GUC.

The GUC is used in all the tests where PG 11 and PG 12 tests would diverge
otherwise.

Note that, in PG 12, the restriction information for CTEs is generated. It
means that for some queries involving CTEs, Citus planner (router planner/
pushdown planner) may behave differently. So, via the GUC, we prevent
tests to diverge on PG 11 vs PG 12.

When we drop PG 11 support, we should get rid of the GUC, and mark
relevant ctes as MATERIALIZED, which does the same thing.
2020-01-16 12:28:15 +01:00
Onder Kalaci 01a5800ee8 Add Citus' CTE inlining functions
With this commit we add the necessary Citus function to inline CTEs
in a queryTree.

You might ask, why do we need to inline CTEs if Postgres is already
going to do it?

Few reasons behind this decision:

- One technical note here is that Citus does the recursive CTE planning
  by checking the originalQuery which is the query that has not gone
  through the standard_planner().

  CTEs in Citus are super powerful. They are practically key to full SQL
  coverage for multi-shard queries. With CTEs, you can always reduce
  any multi-shard query into a router query via recursive
  planning (thus full SQL coverage).
  We cannot let CTE inlining break that. The main idea is Citus should
  be able to retry planning if anything goes wrong after CTE inlining.

  So, by taking ownership of CTE inlining on the originalQuery, Citus
  can fallback to recursive planning of CTEs if the planning with the
  inlined query fails. It could have been a lot harder if we had relied
  on standard_planner() to have the inlined CTEs on the original query.

- We want to have this feature in PostgreSQL 11 as well, but Postgres
  only inlines in version 12
2020-01-16 12:28:15 +01:00
Marco Slot f1a0582973 Make ApplyLogRedaction a macro and redefine ereport 2020-01-13 18:24:36 +01:00
Marco Slot 90056f7d3c Remove copy from worker for append-partitioned table 2020-01-13 23:03:40 -08:00
Philip Dubé 4b5d6c3ebe Rename RelayFileState to ShardState
Replace FILE_ prefix with SHARD_STATE_
2020-01-12 05:57:53 +00:00
Philip Dubé e71386af33 Replace ARRAY_OUT_FUNC_ID with postgres's F_ARRAY_OUT
Also use stack allocation for walkerContext in multi_logical_optimizer
2020-01-10 16:54:00 +00:00
Hadi Moshayedi 527d7d41c1 Implement RedistributeTaskListResult 2020-01-09 23:47:25 -08:00
Hadi Moshayedi e1e383cb59 Don't override xact id assigned by coordinator on workers.
We might need to send commands from workers to other workers. In
these cases we shouldn't override the xact id assigned by the coordinator;
otherwise we won't read a consistent set of result files
across the nodes.
2020-01-09 11:09:11 -08:00
Hadi Moshayedi c7c460e843 PartitionTasklistResults: Use different queries per placement
We need to know which placement succeeded in executing the worker_partition_query_result() call. Otherwise we wouldn't know which node to fetch from. This change allows that by introducing Task::perPlacementQueryStrings.
2020-01-09 10:55:58 -08:00
Hadi Moshayedi f38d0e5b3f Partitioned task list results. 2020-01-09 10:32:58 -08:00
Philip Dubé 73c06fae3b Introduce GetDistributeObjectOps to organize dispatch of logic dependent on node/object type 2020-01-09 18:24:29 +00:00
Philip Dubé 863bf49507 Implement pulling up rows to coordinator when aggregates cannot be pushed down. Enabled by default 2020-01-07 01:16:04 +00:00
Jelte Fennema 5b0baea72c Refactor distributed_planner for better understandability 2020-01-06 14:23:38 +01:00
Onder Kalaci 5a1e752726 Apply feedback - add fastPath field to plan 2020-01-06 12:42:43 +01:00
Onder Kalaci 7f3ab7892d Skip shard pruning when possible
We're already traversing the queryTree and finding the distribution
key value, so pass it to the later stages of the planning.
2020-01-06 12:42:43 +01:00
Onder Kalaci ca293116fa Reduce calls to FastPathRouterQuery()
Before this commit, we called it twice during planning. Instead,
we save the information and pass it.
2020-01-06 12:42:43 +01:00
Onder Kalaci c8f14c9f6c Make sure to update shard states of partitions on failures
Fixes #3331

In #2389, we've implemented support for partitioned tables with rep > 1.
The implementation is limiting the use of modification queries on the
partitions. In fact, we error out when any partition is modified via
EnsurePartitionTableNotReplicated().

However, we seem to have forgotten an important case, where the parent
table's placement is marked as INVALID. In that case, at least one of the
partitions' placements becomes INVALID as well. However, we never mark
partitions as INVALID.

If the user queries the partition table directly, Citus could happily send
the query to INVALID placements -- which are not marked as INVALID.

This PR fixes it by marking the placements of the partitions as INVALID
as well.

The shard placement repair logic already re-creates all the partitions,
so we should be fine on that front.
2020-01-06 12:26:08 +01:00
Önder Kalacı a174eb4f7b
Do not go through standard_planner() for INSERTs (#3348)
That seems unnecessary. We already have the notion of FastPath queries,
simply add it there.
2020-01-03 12:15:22 +00:00
Jelte Fennema 3a042e4611 Allow cartesian products on reference tables 2019-12-27 15:05:51 +01:00
Jelte Fennema 61e2501645 Make any expression with two or more tables a join expression 2019-12-27 15:05:51 +01:00
Hadi Moshayedi d7aea7fa10 Implement partitioned intermediate results. 2019-12-24 03:53:39 -08:00
Jelte Fennema b655c02352
Add the necessary changes for rebalance strategies on enterprise (#3325)
This commit adds the SQL and C changes necessary to support custom rebalance
strategies in the Enterprise version of Citus.
2019-12-19 15:23:08 +01:00
Hadi Moshayedi ef487e0792 Implement fetch_intermediate_results 2019-12-18 10:46:35 -08:00
Hadi Moshayedi 249508d267 Estimate cost of read_intermediate_results() 2019-12-17 13:51:51 -08:00
SaitTalhaNisanci 7ff4ce2169
Add adaptive executor support for repartition joins (#3169)
* WIP

* wip

* add basic logic to run a single job with repartitioning joins with adaptive executor

* fix some warnings and return in ExecuteDependedTasks if there is none

* Add the logic to run dependent jobs in adaptive executor

The execution logic of dependent tasks is changed. With the current
logic:
- All tasks are created from the top level task list.
- At one iteration:
	- CurTasks whose dependencies are executed are found.
	- CurTasks are executed in parallel with the adaptive executor main
logic.
- The iteration is repeated until all tasks are completed.

* Separate adaptive executor repartitioning logic

* Remove duplicate parts

* cleanup directories and schemas

* add basic repartition tests for adaptive executor

* Use the first placement to fetch data

In task tracker, when there are replicas, we try to fetch from a replica
for which a map task has succeeded. TaskExecution is used for this,
however TaskExecution is not used in the adaptive executor. So we cannot use
the same thing as task tracker.

Since the adaptive executor fails when a map task fails (there is no retry
logic yet), we know that if we try to execute a fetch task, all of its
map tasks already succeeded, so we can just use the first one to fetch
from.

* fix clean directories logic

* do not change the search path while creating a udf

* Enable repartition joins with adaptive executor only with the enable_repartition_joins GUC

* Add comments to adaptive_executor_repartition

* don't run adaptive executor repartition test in parallel with other tests

* execute cleanup only in the top level execution

* do cleanup only in the top level execution

* not begin a transaction if repartition query is used

* use new connections for repartition specific queries

New connections are opened to send repartition specific queries. The
opened connections will be closed at the FinishDistributedExecution.

While sending repartition queries no transaction is begun so that
we can see all changes.

* error if a modification was done prior to repartition execution

* not start a transaction if a repartition query and sql task, and clean temporary files and schemas at each subplan level

* fix cleanup logic

* update tests

* add missing function comments

* add test for transaction with DDL before repartition query

* do not close repartition connections in adaptive executor

* rollback instead of commit in repartition join test

* use close connection instead of shutdown connection

* remove unnecessary connection list, ensure schema owner before removing directory

* rename ExecuteTaskListRepartition

* put fetch query string in planner not executor as we currently support only replication factor = 1 with adaptive executor and repartition query and we know the query string in the planner phase in that case

* split adaptive executor repartition to DAG execution logic and repartition logic

* apply review items

* apply review items

* use an enum for remote transaction state and fix cleanup for repartition

* add outside transaction flag to find connections that are unclaimed instead of always opening a new transaction

* fix style

* wip

* rename removejobdir to partition cleanup

* do not close connections at the end of repartition queries

* do repartition cleanup in pg catch

* apply review items

* decide whether to use transaction or not at execution creation

* rename isOutsideTransaction and add missing comment

* not error in pg catch while doing cleanup

* use replication factor of the creation time, not current time to decide if task tracker should be chosen

* apply review items

* apply review items

* apply review item
2019-12-17 19:09:45 +03:00
Marco Slot 2f568ad5a5 Forbid using connections that sent intermediate results for data access and vice versa 2019-12-17 11:49:13 +01:00
Marco Slot f4031dd477 Clean up transaction block usage logic in adaptive executor 2019-12-17 10:48:19 +01:00
Marco Slot 5f656e22db Fix issue in IsMultiStatementTransaction detection 2019-12-16 17:01:43 +01:00
SaitTalhaNisanci 2829c601dd
replace Begin words in coordinated transactions with use (#3293) 2019-12-16 10:40:31 +03:00
SaitTalhaNisanci a0fe8646e0
add IsHoldOffCancellationReceived utility function (#3290) 2019-12-12 17:32:59 +03:00
SaitTalhaNisanci 13204487e9
remove copyright years (#3286) 2019-12-11 21:14:08 +03:00
SaitTalhaNisanci d10f97998c rename REMOTE_TRANS_INVALID to REMOTE_TRANS_NOT_STARTED 2019-12-11 15:24:18 +03:00
Marco Slot 133b8e1e0e Move coordinator insert..select logic into executor 2019-12-10 11:21:35 -08:00
Philip Dubé fcf2fd819b Add distributioncolumncollation to to pg_dist_colocation
Use partition column's collation for range distributed tables
Don't allow non deterministic collations for hash distributed tables
CoPartitionedTables: don't compare unequal types
2019-12-09 19:51:40 +00:00
Philip Dubé d138bb89bf Support creating collations as part of dependency resolution. Propagate ALTER/DROP on distributed collations
Propagate CREATE COLLATION when outside transaction
2019-12-09 04:42:51 +00:00
Hadi Moshayedi d28beb3711 Detect SQL UDF Calls. 2019-12-05 14:31:05 -08:00
Marco Slot bb3bc10f0c Fix segfault in column_to_column_name 2019-12-01 23:57:25 +01:00
Marco Slot 16d1ad3666 Remove distinction between SQL_TASK and ROUTER_TASK 2019-11-29 05:58:29 +01:00
SaitTalhaNisanci aeec3d1544
fix typo in dependent jobs and dependent task (#3244) 2019-11-28 23:47:28 +03:00
Hadi Moshayedi 2268a9cae6 Error for metadata commands if any metadata node is out-of-sync (#3226)
* Error for metadata commands if any metadata node is out-of-sync

* Make the functions have separate APIs for all workers/metadata workers
2019-11-27 09:52:57 +01:00
Önder Kalacı 1cfbeb89ec
Make NodeCanHaveDistTablePlacements() public (#3229)
Since it is required by the rebalancer.
2019-11-26 12:15:38 +01:00
Philip Dubé 261a9de42d Fix typos:
VAR_SET_VALUE_KIND -> VAR_SET_VALUE kind
beginnig -> beginning
plannig -> planning
the the -> the
er then -> er than
2019-11-25 23:24:13 +00:00
Hanefi Onaldi d82f3e9406
Introduce intermediate result broadcasting
In plain words, each distributed plan pulls the necessary intermediate
results to the worker nodes that the plan hits. This is primarily useful
in three ways. 

(i) If the distributed plan that uses intermediate
result(s) is a router query, then the intermediate results are only
broadcasted to a single node.

(ii) If a distributed plan consists of only intermediate results, which
is not uncommon, the intermediate results are broadcasted to a single
node only.

(iii) If a distributed query hits a subset of the shards in multiple
workers, the intermediate results will be broadcast to the relevant
node(s).

The final item (iii) becomes crucial for append/range distributed
tables where typically the distributed queries hit a small subset of
shards/workers.

To do this, for each query for which Citus creates a distributed plan, we keep
track of the subPlans used in the queryTree, and save them in the distributed
plan. Just before Citus executes each subPlan, Citus first keeps track of
every worker node that the distributed plan hits, and marks that every subPlan
should be broadcast to these nodes. Later, for each subPlan which is a
distributed plan, Citus does this operation recursively since these
distributed plans may access different subPlans, and those have to be
recorded as well.
2019-11-20 15:26:36 +03:00
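A hypothetical query shape where item (i) applies, assuming distributed `orders` and `customers` tables: the CTE is planned recursively, and its intermediate result only needs to be shipped to the single worker that the router query hits:

```sql
WITH top_customers AS (
    SELECT customer_id
    FROM orders
    GROUP BY customer_id
    ORDER BY count(*) DESC
    LIMIT 10
)
SELECT *
FROM customers
WHERE customer_id = 42   -- router query, single shard
  AND customer_id IN (SELECT customer_id FROM top_customers);
```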
Philip Dubé b7fef5c31a Miscellaneous cleanup in prep for collation propagation 2019-11-19 17:28:59 +00:00
Onur TIRTIR 26c306d188
Add extensions to distributed object propagation infrastructure (#3185) 2019-11-19 17:56:28 +03:00
Önder Kalacı 40fa3862ce
Prevent Citus extension becoming distributed object (#3197)
Prevent Citus extension being distributed

Because that could prevent doing rolling upgrades, where users may
prefer to upgrade the version on the coordinator but not the workers.

There could be some other edge cases, so I'd prefer to keep Citus
extension outside the picture for now.
2019-11-18 16:57:10 +01:00
Halil Ozan Akgul 5ae7b219ff Create the ALTER ROLE propagation 2019-11-18 18:31:28 +03:00
Nils Dijk 217890af5f
Feature: Expression in reference join (#3180)
DESCRIPTION: Expression in reference join

Fixed: #2582

This patch allows arbitrary expressions in the join clause when joining to a reference table. An example of such joins could be found in CHbenCHmark queries 7, 8, 9 and 11; `mod((s_w_id * s_i_id),10000) = su_suppkey` and `ascii(substr(c_state,1,1)) = n2.n_nationkey`. Since the join is on a reference table these queries are able to be pushed down to the workers.

To implement these queries we will widen the `IsJoinClause` predicate to not check if the expressions are of type `Var` after stripping the implicit coercions. Instead we define a join clause when the `Var`s in a clause come from more than 1 table.

This allows more clauses to pass into the logical planner's `MultiNodeTree(...)` planning function. To compensate for this we tighten down the `LocalJoin`, `SinglePartitionJoin` and `DualPartitionJoin` to check for direct column references when planning. This allows the planner to work with arbitrary join expressions on reference tables.
2019-11-18 16:25:46 +01:00
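A sketch of the CHbenCHmark-style join clause quoted above, assuming `stock` is a distributed table and `supplier` is a reference table:

```sql
SELECT su_name, count(*)
FROM stock
JOIN supplier ON mod((s_w_id * s_i_id), 10000) = su_suppkey
GROUP BY su_name;
```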
Hadi Moshayedi d9dcba25e3 Plan reference/local table joins locally 2019-11-15 07:36:50 -08:00
Onder Kalaci 90943a6ce6 Do not include coordinator shards when round-robin is selected
When the user picks "round-robin" policy, the aim is that the load
is distributed across nodes. However, for reference tables on the
coordinator, since local execution kicks in immediately, round-robin
is ignored.

With this change, we're excluding the placement on the coordinator.
Although the approach seems a little bit invasive because of
modifications in the placement list, that sounds acceptable.

We could have done this in some other ways such as:

1) Add a field to "Task->roundRobinPlacement" (or such), which is
updated as the first element after RoundRobinPolicy is applied.
During the execution, if that placement is local to the coordinator,
skip it and try the other remote placements.

2) In TaskAccessesLocalNode()@local_execution.c, check
task_assignment_policy; if round-robin is selected and there is a local
placement on the coordinator, skip it. However, task assignment is done
during planning, but this decision would happen during execution, which
could create weird edge cases.
2019-11-15 06:03:32 -08:00
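A hedged usage sketch, assuming the `citus.task_assignment_policy` GUC and an illustrative reference table: with round-robin selected, reads rotate across placements, and after this change a placement on the coordinator is skipped:

```sql
SET citus.task_assignment_policy TO 'round-robin';
SELECT count(*) FROM countries_reference;
```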
Hadi Moshayedi 15af1637aa Replicate reference tables to coordinator. 2019-11-15 05:50:19 -08:00
SaitTalhaNisanci b9b7fd7660
add IsLoggableLevel utility function (#3149)
* add IsLoggableLevel utility function

* add function comment for IsLoggableLevel

* put ApplyLogRedaction to logutils
2019-11-15 14:59:13 +03:00
Jelte Fennema 1b2c438e69
Rename variables to not shadow globals in RHEL6 (#3194)
Fixes #2839
2019-11-15 12:12:24 +01:00
Philip Dubé 495c0f5117 Phase 1 implementation of custom aggregates
Phase 1 seeks to implement minimal infrastructure, so does not include:
	- dynamic generation of support aggregates to handle multiple arguments
	- configuration methods to direct aggregation strategy,
		or mark an aggregate's serialize/deserialize as safe to operate across nodes

Aggregates can be distributed when:
	- they have a single argument
	- they have a combinefunc
	- their transition type is not a pseudotype
2019-11-14 19:01:24 +00:00
Philip Dubé edc7a2ee38 Improve RECORD support 2019-11-14 18:32:22 +00:00
Philip Dubé eb35743c3f Remove citus.worker_list_file & master_initialize_node_metadata 2019-11-13 00:49:58 +00:00
Jelte Fennema adc6ca6100
Make simple in queries on unique columns work with repartion join (#3171)
This is necessary to support Q20 of the CHbenCHmark: #2582.

To summarize the fix: The subquery is converted into an INNER JOIN on a
table. This fixes the issue, since an INNER JOIN on a table is already
supported by the repartition planner.

The way this replacement happens:
1. Postgres replaces `col in (subquery)` with a SEMI JOIN (subquery) on col = subquery_result
2. If this subquery is simple enough Postgres will replace it with a
   regular read from a table
3. If the subquery returns unique results (e.g. a primary key) Postgres
   will convert the SEMI JOIN into an INNER JOIN during the planning. It
   will not change this in the rewritten query though.
4. We check if Postgres sends us any SEMI JOINs during its join order
   planning, if it doesn't we replace all SEMI JOINs in the rewritten
   query with INNER JOIN (which we already support).
2019-11-11 13:44:28 +01:00
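A rough example of the query shape this enables (in the spirit of CHbenCHmark Q20), assuming illustrative distributed tables where the subquery filters on a unique column:

```sql
SELECT s_name
FROM supplier
WHERE s_suppkey IN (
    SELECT su_suppkey      -- unique column, so the SEMI JOIN becomes an INNER JOIN
    FROM stock_summary
    WHERE quantity > 100
);
```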
Jelte Fennema 9fb897a074
Fix queries with repartition joins and group by unique column (#3157)
Postgres doesn't require you to add all columns that are in the target list to
the GROUP BY when you group by a unique column (or columns). It even actively
removes these group by clauses when you do.

This is normally fine, but for repartition joins it is not. The reason for this
is that the temporary tables don't have these primary key columns. So when the
worker executes the query it will complain that it is missing columns in the
group by.

This PR fixes that by adding an ANY_VALUE aggregate around each variable in
the target list that is not contained in the group by or in an aggregate.
This is done only for repartition joins.

The ANY_VALUE aggregate chooses the value from an undefined row in the
group.
2019-11-08 15:36:18 +01:00
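An illustrative repartition-join query that relies on the Postgres behaviour described above, assuming `users.user_id` is the primary key so `u.name` may be omitted from the GROUP BY:

```sql
SELECT u.user_id, u.name, count(*)
FROM users u
JOIN events e ON u.user_id = e.user_id
GROUP BY u.user_id;
-- Before pushing the query to the workers, Citus wraps u.name in ANY_VALUE,
-- since the repartitioned temporary tables don't carry the primary-key info.
```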
Önder Kalacı 0b3d4e55d9
Local execution should not change hasReturning for distributed tables (#3160)
It looks like the logic that prevents RETURNING for reference tables from
having duplicate entries coming from local and remote executions
leads to missing some tuples for distributed tables.

With this PR, we're ensuring that the logic kicks in for reference tables
only.
2019-11-08 12:49:56 +01:00
Philip Dubé 2fc45e5897 create_distributed_function: accept aggregates
Adds support for OCLASS_PROC to worker_create_or_replace_object
2019-11-06 18:23:37 +00:00
Hadi Moshayedi e00d1546f3 Don't maintain replicationfactor of reference tables 2019-11-05 07:23:14 -08:00
Önder Kalacı 960cd02c67
Remove real time router executors (#3142)
* Remove unused executor codes

All of the code of the real-time executor. Some functions
of the router executor still remain there because they
are common functions. We'll move them to the right places
in follow-up commits.

* Move GUCs to transaction mngnt and remove unused struct

* Update test output

* Get rid of references of real-time executor from code

* Warn if real-time executor is picked

* Remove lots of unused connection codes

* Removed unused code for connection restrictions

Real-time and router executors cannot handle re-using existing
connections within a transaction block.

Adaptive executor and COPY can re-use the connections. So, there is no
reason to keep the code around for applying the restrictions in the
placement connection logic.
2019-11-05 12:48:10 +01:00
SaitTalhaNisanci 7c410e3cd7
pass CitusCustomState directly to adaptive executor (#3151) 2019-11-01 19:57:32 +03:00
Önder Kalacı ffd89e4e01
Include all relevant relations in the ExtractRangeTableRelationWalker (#3135)
We've changed the logic for pulling RTE_RELATIONs in #3109, which affected
non-colocated subquery joins and partitioned tables.
@onurctirtir found the failing steps, which I traced back to find the issues.

While looking into it in more detail, we decided to expand the list in a
way that the callers get all the relevant RTE_RELATIONs: RELKIND_RELATION,
RELKIND_PARTITIONED_TABLE, RELKIND_FOREIGN_TABLE and RELKIND_MATVIEW.
These are all relation kinds that the Citus planner is aware of.
2019-11-01 16:06:58 +01:00
SaitTalhaNisanci dadbe86af1
refactor some of hard coded values in citus gucs (#3137)
* refactor some of hard coded values in citus gucs

* rename GUC_ALLOW_ALL to GUC_STANDARD
2019-10-30 10:35:39 +03:00
SaitTalhaNisanci 29d45bd1b9
Do not assign InvalidOid for local execution while extracting parameters (#3131)
* do not assign InvalidOid for local execution while extracting parameters

* rename functions

* rename parameter and replace function
2019-10-28 14:28:22 +03:00
Jelte Fennema a5010e5b17
Add extra foreach convenience macros (#3117)
This completely hides `ListCell` from the user of the loop

Example usage:
```c
WorkerNode *workerNode = NULL;

foreach_ptr(workerNode, workerNodeList) {
	// Do stuff with workerNode
}
```

Instead of:
```c
ListCell *workerNodeCell = NULL;

foreach(workerNodeCell, workerNodeList) {
    WorkerNode *workerNode = lfirst(workerNodeCell);
	// Do stuff with workerNode
}
```
2019-10-23 16:49:12 +02:00
Jelte Fennema 78e495e030
Add shouldhaveshards to pg_dist_node (#2960)
This is an improvement over #2512.

This adds the boolean shouldhaveshards column to pg_dist_node. When it's false, create_distributed_table for new colocation groups will not create shards on that node. Reference tables will still be created on nodes where it is false.
2019-10-22 16:47:16 +02:00
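A minimal way to inspect the new column (table and column names are taken from the description above):

```sql
SELECT nodename, nodeport, shouldhaveshards
FROM pg_dist_node;
```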
Jelte Fennema 7abedc38b0
Support subqueries in HAVING (#3098)
Areas for further optimization:
- Don't save subquery results to a local file on the coordinator when the subquery is not in the having clause
- Push the HAVING with subquery to the workers if there's a group by on the distribution column
- Don't push down the results to the workers when we don't push down the HAVING clause, only the coordinator needs it

Fixes #520
Fixes #756
Closes #2047
2019-10-16 16:40:14 +02:00
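A sketch of a HAVING clause containing a subquery, the shape this PR supports (table names are illustrative):

```sql
SELECT tenant_id, count(*) AS event_count
FROM events
GROUP BY tenant_id
HAVING count(*) > (SELECT avg(event_count) FROM tenant_stats);
```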
Onur TIRTIR d5f83dc110
Refactor range table walkers (#3109) 2019-10-16 01:20:49 +03:00
SaitTalhaNisanci 94a7e6475c
Remove copyright years (#2918)
* Update year as 2012-2019

* Remove copyright years
2019-10-15 17:44:30 +03:00
Philip Dubé 74cb168205 Remove Postgres 10 support 2019-10-11 21:56:56 +00:00
Philip Dubé dd490b6376 Cache whether an object is in pg_dist_object. Avoids redundant lookups for non-distributed objects 2019-10-10 14:50:38 +00:00
Onder Kalaci 3be72ce42f Make sure that distributed functions always have the correct user
Objectives:

(a) both super user and regular user should have the correct owner for the function on the worker
(b) The transactional semantics would work fine for both super user and regular user
(c) a non-super-user and non-function-owner would get a reasonable error message if they try to distribute the function

Co-authored-by: @serprex
2019-10-04 21:38:49 +00:00
Hadi Moshayedi 217db2a03e Don't block for locks in SyncMetadataToNodes() 2019-10-03 16:53:36 -07:00
Philip Dubé 89d35e9692 Attempt to force custom plans for prepared statements when trying to delegate function calls
We discern between PARAM_EXEC & PARAM_EXTERN:
d52eaa0948/src/include/nodes/primnodes.h (L211)
According to primnodes.h we should only run into PARAM_EXEC or PARAM_EXTERN
2019-09-30 23:49:14 +00:00
Hadi Moshayedi 5e97e5c98e Don't push down queries when in subqueries/ctes 2019-09-30 14:22:05 -07:00
Marco Slot 35bef0f3db Avoid caching connections from backends that service internal connections 2019-09-28 08:32:10 +02:00
Nils Dijk 01b26cf91a
Disallow distributed functions for functions depending on an extension (#3049)
DESCRIPTION: Disallow distributed functions for functions depending on an extension

Functions depending on an extension cannot (yet) be distributed by Citus. If we allowed this, it would cause issues with our dependency-following mechanism, as we stop following objects that depend on an extension.

By not allowing functions to be distributed when they depend on an extension, as well as not allowing distributed functions to be made dependent on an extension, we won't break the ability to add new nodes. Allowing functions depending on extensions to be distributed at the moment could cause problems in that area.
2019-09-30 15:19:47 +02:00
Nils Dijk 473cbc0115
Propagate CREATE OR REPLACE FUNCTION to workers for distributed functions (#3043)
DESCRIPTION: Propagate CREATE OR REPLACE FUNCTION

Distributed functions could be replaced, which should be propagated to the workers to keep the function in sync between all nodes.

Due to the complexity of deparsing the `CreateFunctionStmt` we actually produce the plan during the processing phase of our utility hook. Since the changes have already been made in the catalog tables, we can reuse `pg_get_functiondef` to get the generated `CREATE OR REPLACE` SQL.
2019-09-30 12:41:17 +02:00
Nils Dijk 9c2c50d875
Hookup function/procedure deparsing to our utility hook (#3041)
DESCRIPTION: Propagate ALTER FUNCTION statements for distributed functions

Using the implemented deparser for function statements to propagate changes to both functions and procedures that were previously distributed.
2019-09-27 22:06:49 +02:00
Philip Dubé 363409a0c2 Propagate REINDEX TABLE & REINDEX INDEX 2019-09-27 18:14:53 +00:00
Hanefi Onaldi 66b9f2e887 Deparsing and qualifiying for FUNCTION/PROCEDURE statements (#3014)
This PR aims to add all the necessary logic to qualify and deparse all possible `{ALTER|DROP} .. {FUNCTION|PROCEDURE}` queries.

As Procedures are introduced in PG11, the code contains many PG version checks. I tried my best to make it easy to clean up once we drop PG10 support.


Here are some caveats:
- I assumed that the parse tree is a valid one. There are some queries that are not allowed, but are still parsed successfully by the Postgres planner. Such queries will result in errors at execution time. (e.g. `ALTER PROCEDURE p STRICT` -> the `STRICT` action is valid for functions but not procedures. Postgres decides to parse them nevertheless.)
2019-09-27 19:02:52 +02:00
Marco Slot 2868e02a3d Implement SELECT function call delegation.
When a function is marked as colocated with a distributed table,
we try delegating queries of kind "SELECT func(...)" to workers.

We currently only support this simple form, and don't delegate
forms like "SELECT f1(...), f2(...)", "SELECT f1(...) FROM ...",
or function calls inside transactions.

As a side effect, we also fix the transactional semantics of DO blocks.
Previously we didn't consider a DO block a multi-statement transaction.
Now we do.

Co-authored-by: Marco Slot <marco@citusdata.com>
Co-authored-by: serprex <serprex@users.noreply.github.com>
Co-authored-by: pykello <hadi.moshayedi@microsoft.com>
2019-09-27 09:13:25 -07:00
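A hedged sketch of the supported form, assuming an illustrative `record_event` function colocated with a distributed `events` table:

```sql
SELECT create_distributed_function('record_event(int, text)', '$1',
                                   colocate_with := 'events');

-- Only the bare "SELECT func(...)" form is delegated to a worker.
SELECT record_event(42, 'login');
```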
Marco Slot ca478defeb Deparse CALL statement instead of using original query string 2019-09-24 17:31:09 +00:00
Marco Slot e269d990c9 Cast the distribution argument value when possible 2019-09-24 17:31:09 +00:00
Philip Dubé 432a8ef85b Hadi's feedback
Co-authored-by: pykello <hadi.moshayedi@microsoft.com>
Co-authored-by: serprex <serprex@users.noreply.github.com>
2019-09-24 17:31:09 +00:00
Philip Dubé bc1ad67eb5 Distribute CALL on distributed procedures to metadata workers
Lots taken from https://github.com/citusdata/citus/pull/2829
2019-09-24 17:31:09 +00:00
Onder Kalaci d37745bfc7 Sync metadata to worker nodes after create_distributed_function
Since the distributed functions are useful when the workers have
metadata, we automatically sync it.

Also, after master_add_node(), we do it lazily and let the daemon
sync it. That's mainly because the metadata syncing cannot be done
in transaction blocks, and we don't want to add lots of transactional
limitations to master_add_node() and create_distributed_function().
2019-09-23 18:30:53 +02:00
Marco Slot 5f23b951c7 Support serial and smallserial when syncing metadata 2019-09-23 17:39:21 +02:00
Marco Slot e58d76c5f6 Fix assert failure in bare SELECT FROM reference table FOR UPDATE in MX 2019-09-23 17:00:09 +02:00
Hanefi Onaldi ed11b9590c
Add distributed func creation queries in dependency replication logic 2019-09-18 20:07:45 +03:00
Hadi Moshayedi 76f3933b05 Add metadatasynced, and sync on master_update_node()
Co-authored-by: pykello <hadi.moshayedi@microsoft.com>
Co-authored-by: serprex <serprex@users.noreply.github.com>
2019-09-18 09:32:54 -07:00
Nils Dijk db5d03931d
Feature disable object propagation (#2986)
DESCRIPTION: Provide a GUC to turn off the new dependency propagation functionality

In case the dependency propagation functionality introduced in 9.0 causes issues for a user's cluster, they can turn it off almost completely. The only dependency that will still be propagated and kept track of is the schema, to emulate the old behaviour.

The GUC to change is `citus.enable_object_propagation`. When set to `false` the functionality will be mostly turned off. Be aware that objects marked as distributed in `pg_dist_object` will still be kept in the catalog as distributed objects. Alter statements to these objects will not be propagated to workers and may cause desynchronisation.
2019-09-18 17:16:22 +02:00
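Turning the functionality off uses the GUC named in the description:

```sql
SET citus.enable_object_propagation TO false;
```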
Nils Dijk 2b7f5552c8
Fix: rename remote type on conflict (#2983)
DESCRIPTION: Rename remote types during type propagation

To prevent data from being destroyed when a remote type differs from the type on the coordinator during type propagation, we wanted to rename the type instead of `DROP CASCADE`.

This patch removes the `DROP` logic and adds the creation of a rename statement to a free name.
2019-09-17 18:54:10 +02:00
Nils Dijk 0a3152d09c
Add feature flag to turn off create type propagation (#2982)
DESCRIPTION: Add feature flag to turn off create type propagation

When `citus.enable_create_type_propagation` is set to `false` citus will not propagate `CREATE TYPE` statements to the workers. Types are still distributed when tables that depend on these types are distributed.
2019-09-17 15:50:06 +02:00
Philip Dubé 964020097d Merge two conflicting pg_dist_object headers 2019-09-16 19:19:21 +00:00
Onder Kalaci cde6b02858 Add columns to pg_dist_object for distributed functions
This PR simply adds the columns to pg_dist_object and
implements the necessary metadata changes to keep track of
distribution argument of the functions/procedures.
2019-09-16 17:28:04 +02:00
Philip Dubé 492d1b2cba ActivePrimaryNodeList: add lockMode parameter 2019-09-13 17:44:56 +00:00
Nils Dijk 2879689441
Distribute Types to worker nodes (#2893)
DESCRIPTION: Distribute Types to worker nodes

When to propagate
==============

There are two logical moments at which types could be distributed to the worker nodes
 - When they get used ( just in time distribution )
 - When they get created ( proactive distribution )

The just in time distribution follows the model of how schemas get created right before we are going to create a table in that schema; for types this would be when the table uses a type as one of its columns.

The proactive distribution is suitable for situations where it is beneficial to have the type on the worker nodes directly. They can later on be used in queries where an intermediate result gets created with a cast to this type.

Just in time creation is always the last resort; you cannot create a distributed table before the type gets created. A good example use case is: you have an existing Postgres server that needs to scale out. You add the Citus extension, add some nodes to the cluster, and distribute the table. The type got created before Citus existed, so there was no moment at which Citus could have propagated the creation of the type.

Proactive is almost always a good option. Types are not resource-intensive objects; there is no performance overhead of having hundreds of types. If you want to use them in a query to represent an intermediate result (which happens in our test suite) they just work.

There is however a moment when proactive type distribution is not beneficial: in transactions where the type is used in a distributed table.

Lets assume the following transaction:

```sql
BEGIN;
CREATE TYPE tt1 AS (a int, b int);
CREATE TABLE t1 (a int PRIMARY KEY, b tt1);
SELECT create_distributed_table('t1', 'a');
\copy t1 FROM bigdata.csv
```

Types are node-scoped objects, meaning the type exists once per worker. Shards, however, have the best performance when they are created over their own connection. For the type to be visible on all connections it needs to be created and committed before we try to create the shards. Here the just in time situation is most beneficial and follows how we create schemas on the workers. Outside of a transaction block we will just use 1 connection to propagate the creation.

How propagation works
=================

Just in time
-----------

Just in time propagation hooks into the infrastructure introduced in #2882. It adds types as a supported object in `SupportedDependencyByCitus`. This will make sure that any object being distributed by Citus that depends on types will now cascade into types. When types themselves depend on other objects, those will get created first.

Creation later works by getting the DDL commands to create the object by its `ObjectAddress` in `GetDependencyCreateDDLCommands`, which will dispatch types to `CreateTypeDDLCommandsIdempotent`.

For the correct walking of the graph we follow array types; when later asked for the DDL commands for array types we return `NIL` (empty list), which means the object will not be recorded as distributed (it's an internal type, dependent on the user type).

Proactive distribution
---------------------

When the user creates a type (composite or enum) we will have a hook running in `multi_ProcessUtility` after the command has been applied locally. Running after the command has been applied locally means we already have an `ObjectAddress` for the type. This is required to mark the type as being distributed.

Keeping the type up to date
====================

For types that are recorded in `pg_dist_object` (eg. `IsObjectDistributed` returns true for the `ObjectAddress`) we will intercept the utility commands that alter the type.
 - `AlterTableStmt` with `relkind` set to `OBJECT_TYPE` encapsulate changes to the fields of a composite type.
 - `DropStmt` with removeType set to `OBJECT_TYPE` encapsulate `DROP TYPE`.
 - `AlterEnumStmt` encapsulates changes to enum values.
    Enum types cannot be changed transactionally. When the execution on a worker fails, a warning will be shown to the user that the propagation was incomplete due to worker communication failure. An idempotent command is shown for the user to re-execute once worker communication is fixed.

Keeping types up to date is done via the executor. Before the statement is executed locally we create a plan on how to apply it on the workers. This plan is executed after we have applied the statement locally.

All changes to types need to be done in the same transaction for types that have already been distributed and will fail with an error if parallel queries have already been executed in the same transaction. Much like foreign keys to reference tables.
2019-09-13 17:46:07 +02:00
Jelte Fennema d6deb062aa Add shard rebalancer stubs 2019-09-12 16:40:25 +02:00
Jelte Fennema 58012054c9 Add an extra advisory lock tag class 2019-09-12 16:40:25 +02:00
Jelte Fennema eb7e45d556 Make LookupNodeForGroup extern 2019-09-12 16:40:25 +02:00
Jelte Fennema de5174f763 include postgres.h into some of our .h files to silence warnings 2019-09-12 16:40:25 +02:00
Onder Kalaci 0b0c779c77 Introduce the concept of Local Execution
/*
 * local_executor.c
 *
 * The scope of the local execution is locally executing the queries on the
 * shards. In other words, local execution does not deal with any local tables
 * that are not shards on the node on which the query is being executed. In that sense,
 * the local executor is only triggered if the node has both the metadata and the
 * shards (e.g., only Citus MX worker nodes).
 *
 * The goal of the local execution is to skip the unnecessary network round-trip
 * happening on the node itself. Instead, identify the locally executable tasks and
 * simply call PostgreSQL's planner and executor.
 *
 * The local executor is an extension of the adaptive executor. So, the executor uses
 * adaptive executor's custom scan nodes.
 *
 * One thing to note is that Citus MX is only supported with replication factor = 1, so
 * keep that in mind while continuing the comments below.
 *
 * On the high level, there are 3 slightly different ways of utilizing local execution:
 *
 * (1) Execution of local single shard queries of a distributed table
 *
 *      This is the simplest case. The executor kicks at the start of the adaptive
 *      executor, and since the query is only a single task the execution finishes
 *      without going to the network at all.
 *
 *      Even if there is a transaction block (or recursively planned CTEs), as long
 *      as the queries hit the shards on the same node, the local execution will kick in.
 *
 * (2) Execution of local single queries and remote multi-shard queries
 *
 *      The rule is simple. If a transaction block starts with a local query execution,
 *      all the other queries in the same transaction block that touch any local shard
 *      have to use the local execution. Although this sounds restrictive, we prefer to
 *      implement it this way; otherwise we'd end up with scenarios as complex as the
 *      ones we have in connection management due to foreign keys.
 *
 *      See the following example:
 *      BEGIN;
 *          -- assume that the query is executed locally
 *          SELECT count(*) FROM test WHERE key = 1;
 *
 *          -- at this point, all the shards that reside on the
 *          -- node are executed locally one-by-one. After those finish,
 *          -- the remaining tasks are handled by the adaptive executor
 *          SELECT count(*) FROM test;
 *
 *
 * (3) Modifications of reference tables
 *
 *		Modifications to reference tables have to be executed on all nodes. So, after the
 *		local execution, the adaptive executor keeps continuing the execution on the other
 *		nodes.
 *
 *		Note that for read-only queries, after the local execution, there is no need to
 *		kick in adaptive executor.
 *
 *  There are also a few limitations/trade-offs that are worth mentioning. First, the
 *  local execution on multiple shards might be slow because the execution has to
 *  happen one task at a time (e.g., no parallelism). Second, if a transaction
 *  block/CTE starts with a multi-shard command, we do not use local query execution
 *  since local execution is sequential. Basically, we do not want to lose parallelism
 *  across local tasks by switching to local execution. Third, the local execution
 *  currently only supports queries. In other words, any utility command like TRUNCATE
 *  fails if the command is executed after a local execution inside a transaction block.
 *  Fourth, the local execution cannot be mixed with executors other than adaptive,
 *  namely the task-tracker, real-time and router executors. Finally, related to the
 *  previous item, the COPY command cannot be mixed with local execution in a transaction.
 *  The implication of that is that any part of INSERT..SELECT via the coordinator cannot
 *  happen via local execution.
 */
2019-09-12 11:51:25 +02:00
Nils Dijk 936d546a3c
Refactor Ensure Schema Exists to Ensure Dependencies Exists (#2882)
DESCRIPTION: Refactor ensure schema exists to dependency exists

Historically we only supported schemas as table dependencies to be created on the workers before a table gets distributed. This PR puts infrastructure in place to walk pg_depend to figure out which dependencies to create on the workers. Currently only schemas are supported as objects to create before creating a table.

We also keep track of dependencies that have been created in the cluster. When we add a new node to the cluster we use this catalog to know which objects need to be created on the worker.

A side effect of knowing which objects are already distributed is that we don't have debug messages anymore when creating schemas that are already created on the workers.
2019-09-04 14:10:20 +02:00
Philip Dubé 28d964240f Remove CheckForUpdates
https://reports.citusdata.com/v1/releases/latest
We haven't updated the version CheckForUpdates sees since 7.1.0
2019-09-03 21:11:25 +00:00
Jelte Fennema cbecf97c84
Move tuplestore setup to a helper function (#2898)
* Add tuplestore helpers

* More detailed error messages in tuplestore

* Add CreateTupleDescCopy to SetupTuplestore

* Use new SetupTuplestore helper function

* Remove unnecessary copy

* Remove comment about undefined behaviour
2019-08-27 09:11:08 +02:00
Philip Dubé 6b0d8ed83d SortList in FinalizedShardPlacementList, makes 3 failure tests consistent between 11/12 2019-08-22 19:30:56 +00:00
Philip Dubé e5cd298a98 pg12 revised layout of FunctionCallInfoData
See a9c35cf85c

clang raises a warning due to FunctionCall2InfoData technically being variable sized
This is fine, as the struct is the size we want it to be. So silence the warning
2019-08-22 19:02:35 +00:00
Philip Dubé be3285828f Collations matter for hashing strings in pg12
See https://www.postgresql.org/docs/12/collation.html#COLLATION-NONDETERMINISTIC
2019-08-22 18:58:37 +00:00