citus

Commit Graph

Author	SHA1	Message	Date
Sait Talha Nisanci	4308d867d9	remove task-tracker in comments, documentation	2020-07-21 16:21:01 +03:00
SaitTalhaNisanci	b3af63c8ce	Remove task tracker executor (#3850 ) * use adaptive executor even if task-tracker is set * Update check-multi-mx tests for adaptive executor Basically repartition joins are enabled where necessary. For parallel tests max adaptive executor pool size is decresed to 2, otherwise we would get too many clients error. * Update limit_intermediate_size test It seems that when we use adaptive executor instead of task tracker, we exceed the intermediate result size less in the test. Therefore updated the tests accordingly. * Update multi_router_planner It seems that there is one problem with multi_router_planner when we use adaptive executor, we should fix the following error: +ERROR: relation "authors_range_840010" does not exist +CONTEXT: while executing command on localhost:57637 * update repartition join tests for check-multi * update isolation tests for repartitioning * Error out if shard_replication_factor > 1 with repartitioning As we are removing the task tracker, we cannot switch to it if shard_replication_factor > 1. In that case, we simply error out. * Remove MULTI_EXECUTOR_TASK_TRACKER * Remove multi_task_tracker_executor Some utility methods are moved to task_execution_utils.c. * Remove task tracker protocol methods * Remove task_tracker.c methods * remove unused methods from multi_server_executor * fix style * remove task tracker specific tests from worker_schedule * comment out task tracker udf calls in tests We were using task tracker udfs to test permissions in multi_multiuser.sql. We should find some other way to test them, then we should remove the commented out task tracker calls. * remove task tracker test from follower schedule * remove task tracker tests from multi mx schedule * Remove task-tracker specific functions from worker functions * remove multi task tracker extra schedule * Remove unused methods from multi physical planner * remove task_executor_type related things in tests * remove LoadTuplesIntoTupleStore * Do initial cleanup for repartition leftovers During startup, task tracker would call TrackerCleanupJobDirectories and TrackerCleanupJobSchemas to clean up leftover directories and job schemas. With adaptive executor, while doing repartitions it is possible to leak these things as well. We don't retry cleanups, so it is possible to have leftover in case of errors. TrackerCleanupJobDirectories is renamed as RepartitionCleanupJobDirectories since it is repartition specific now, however TrackerCleanupJobSchemas cannot be used currently because it is task tracker specific. The thing is that this function is a no-op currently. We should add cleaning up intermediate schemas to DoInitialCleanup method when that problem is solved(We might want to solve it in this PR as well) * Revert "remove task tracker tests from multi mx schedule" This reverts commit `03ecc0a681`. * update multi mx repartition parallel tests * not error with task_tracker_conninfo_cache_invalidate * not run 4 repartition queries in parallel It seems that when we run 4 repartition queries in parallel we get too many clients error on CI even though we don't get it locally. Our guess is that, it is because we open/close many connections without doing some work and postgres has some delay to close the connections. Hence even though connections are removed from the pg_stat_activity, they might still not be closed. If the above assumption is correct, it is unlikely for it to happen in practice because: - There is some network latency in clusters, so this leaves some times for connections to be able to close - Repartition joins return some data and that also leaves some time for connections to be fully closed. As we don't get this error in our local, we currently assume that it is not a bug. Ideally this wouldn't happen when we get rid of the task-tracker repartition methods because they don't do any pruning and might be opening more connections than necessary. If this still gives us "too many clients" error, we can try to increase the max_connections in our test suite(which is 100 by default). Also there are different places where this error is given in postgres, but adding some backtrace it seems that we get this from ProcessStartupPacket. The backtraces can be found in this link: https://circleci.com/gh/citusdata/citus/138702 * Set distributePlan->relationIdList when it is needed It seems that we were setting the distributedPlan->relationIdList after JobExecutorType is called, which would choose task-tracker if replication factor > 1 and there is a repartition query. However, it uses relationIdList to decide if the query has a repartition query, and since it was not set yet, it would always think it is not a repartition query and would choose adaptive executor when it should choose task-tracker. * use adaptive executor even with shard_replication_factor > 1 It seems that we were already using adaptive executor when replication_factor > 1. So this commit removes the check. * remove multi_resowner.c and deprecate some settings * remove TaskExecution related leftovers * change deprecated API error message * not recursively plan single relatition repartition subquery * recursively plan single relation repartition subquery * test depreceated task tracker functions * fix overlapping shard intervals in range-distributed test * fix error message for citus_metadata_container * drop task-tracker deprecated functions * put the implemantation back to worker_cleanup_job_schema_cachesince citus cloud uses it * drop some functions, add downgrade script Some deprecated functions are dropped. Downgrade script is added. Some gucs are deprecated. A new guc for repartition joins bucket size is added. * order by a test to fix flappiness	2020-07-18 13:11:36 +03:00
Marco Slot	fd8cdb92f4	Evaluate nextval in the target list on the coordinator	2020-04-02 02:53:19 +02:00
Marco Slot	038e5999cb	Implement direct COPY table TO stdout	2020-02-17 15:15:10 +01:00
Philip Dubé	863bf49507	Implement pulling up rows to coordinator when aggregates cannot be pushed down. Enabled by default	2020-01-07 01:16:04 +00:00
Jelte Fennema	7730bd449c	Normalize tests: Remove trailing whitespace	2020-01-06 09:32:03 +01:00
Jelte Fennema	7f3de68b0d	Normalize tests: header separator length	2020-01-06 09:32:03 +01:00
Önder Kalacı	960cd02c67	Remove real time router executors (#3142 ) * Remove unused executor codes All of the codes of real-time executor. Some functions in router executor still remains there because there are common functions. We'll move them to accurate places in the follow-up commits. * Move GUCs to transaction mngnt and remove unused struct * Update test output * Get rid of references of real-time executor from code * Warn if real-time executor is picked * Remove lots of unused connection codes * Removed unused code for connection restrictions Real-time and router executors cannot handle re-using of the existing connections within a transaction block. Adaptive executor and COPY can re-use the connections. So, there is no reason to keep the code around for applying the restrictions in the placement connection logic.	2019-11-05 12:48:10 +01:00
Jelte Fennema	7abedc38b0	Support subqueries in HAVING (#3098 ) Areas for further optimization: - Don't save subquery results to a local file on the coordinator when the subquery is not in the having clause - Push the the HAVING with subquery to the workers if there's a group by on the distribution column - Don't push down the results to the workers when we don't push down the HAVING clause, only the coordinator needs it Fixes #520 Fixes #756 Closes #2047	2019-10-16 16:40:14 +02:00
Philip Dubé	ae1171a373	Test invalid aggregate	2019-09-12 16:55:05 +00:00
Philip Dubé	77efec04a0	Router Planner: accept SELECT_CMD ctes in modification queries	2019-06-26 10:32:01 +02:00
Onder Kalaci	f144bb4911	Introduce fast path router planning In this context, we define "Fast Path Planning for SELECT" as trivial queries where Citus can skip relying on the standard_planner() and handle all the planning. For router planner, standard_planner() is mostly important to generate the necessary restriction information. Later, the restriction information generated by the standard_planner is used to decide whether all the shards that a distributed query touches reside on a single worker node. However, standard_planner() does a lot of extra things such as cost estimation and execution path generations which are completely unnecessary in the context of distributed planning. There are certain types of queries where Citus could skip relying on standard_planner() to generate the restriction information. For queries in the following format, Citus does not need any information that the standard_planner() generates: SELECT ... FROM single_table WHERE distribution_key = X; or DELETE FROM single_table WHERE distribution_key = X; or UPDATE single_table SET value_1 = value_2 + 1 WHERE distribution_key = X; Note that the queries might not be as simple as the above such that GROUP BY, WINDOW FUNCIONS, ORDER BY or HAVING etc. are all acceptable. The only rule is that the query is on a single distributed (or reference) table and there is a "distribution_key = X;" in the WHERE clause. With that, we could use to decide the shard that a distributed query touches reside on a worker node.	2019-02-21 13:27:01 +03:00
Marco Slot	55f46acedf	Support TABLESAMPLE in router queries	2018-08-31 13:22:38 +02:00
Marco Slot	fd4ff29f2f	Add a debug message with distribution column value	2018-06-05 15:09:17 +03:00
velioglu	121ff39b26	Removes large_table_shard_count GUC	2018-04-29 10:34:50 +02:00
velioglu	72dfe4a289	Adds colocation check to local join	2018-04-04 22:49:27 +03:00
velioglu	698d585fb5	Remove broadcast join logic After this change all the logic related to shard data fetch logic will be removed. Planner won't plan any ShardFetchTask anymore. Shard fetch related steps in real time executor and task-tracker executor have been removed.	2018-03-30 11:45:19 +03:00
Onder Kalaci	1c930c96a3	Support non-co-located joins between subqueries With #1804 (and related PRs), Citus gained the ability to plan subqueries that are not safe to pushdown. There are two high-level requirements for pushing down subqueries: * Individual subqueries that require a merge step (i.e., GROUP BY on non-distribution key, or LIMIT in the subquery etc). We've handled such subqueries via #1876. * Combination of subqueries that are not joined on distribution keys. This commit aims to recursively plan some of such subqueries to make the whole query safe to pushdown. The main logic behind non colocated subquery joins is that we pick an anchor range table entry and check for distribution key equality of any other subqueries in the given query. If for a given subquery, we cannot find distribution key equality with the anchor rte, we recursively plan that subquery. We also used a hacky solution for picking relations as the anchor range table entries. The hack is that we wrap them into a subquery. This is only necessary since some of the attribute equivalance checks are based on queries rather than range table entries.	2018-02-26 13:50:37 +02:00
Marco Slot	09c09f650f	Recursively plan set operations when leaf nodes recur	2017-12-26 13:46:55 +02:00
Onder Kalaci	0d5a4b9c72	Recursively plan subqueries that are not safe to pushdown With this commit, Citus recursively plans subqueries that are not safe to pushdown, in other words, requires a merge step. The algorithm is simple: Recursively traverse the query from bottom up (i.e., bottom meaning the leaf queries). On each level, check whether the query is safe to pushdown (or a single repartition subquery). If the answer is yes, do not touch that subquery. If the answer is no, plan the subquery seperately (i.e., create a subPlan for it) and replace the subquery with a call to `read_intermediate_results(planId, subPlanId)`. During the the execution, run the subPlans first, and make them avaliable to the next query executions. Some of the queries hat this change allows us: * Subqueries with LIMIT * Subqueries with GROUP BY/DISTINCT on non-partition keys * Subqueries involving re-partition joins, router queries * Mixed usage of subqueries and CTEs (i.e., use CTEs in subqueries as well). Nested subqueries as long as we support the subquery inside the nested subquery. * Subqueries with local tables (i.e., those subqueries has the limitation that they have to be leaf subqueries) * VIEWs on the distributed tables just works (i.e., the limitations mentioned below still applies to views) Some of the queries that is still NOT supported: * Corrolated subqueries that are not safe to pushdown * Window function on non-partition keys * Recursively planned subqueries or CTEs on the outer side of an outer join * Only recursively planned subqueries and CTEs in the FROM (i.e., not any distributed tables in the FROM) and subqueries in WHERE clause * Subquery joins that are not on the partition columns (i.e., each subquery is individually joined on partition keys but not the upper level subquery.) * Any limitation that logical planner applies such as aggregate distincts (except for count) when GROUP BY is on non-partition key, or array_agg with ORDER BY	2017-12-21 08:37:40 +02:00

20 Commits (c23bdb129d2b936fdfa03fa842d0868f454237d7)