citus

Commit Graph

Author	SHA1	Message	Date
SaitTalhaNisanci	b3af63c8ce	Remove task tracker executor (#3850 ) * use adaptive executor even if task-tracker is set * Update check-multi-mx tests for adaptive executor Basically repartition joins are enabled where necessary. For parallel tests max adaptive executor pool size is decresed to 2, otherwise we would get too many clients error. * Update limit_intermediate_size test It seems that when we use adaptive executor instead of task tracker, we exceed the intermediate result size less in the test. Therefore updated the tests accordingly. * Update multi_router_planner It seems that there is one problem with multi_router_planner when we use adaptive executor, we should fix the following error: +ERROR: relation "authors_range_840010" does not exist +CONTEXT: while executing command on localhost:57637 * update repartition join tests for check-multi * update isolation tests for repartitioning * Error out if shard_replication_factor > 1 with repartitioning As we are removing the task tracker, we cannot switch to it if shard_replication_factor > 1. In that case, we simply error out. * Remove MULTI_EXECUTOR_TASK_TRACKER * Remove multi_task_tracker_executor Some utility methods are moved to task_execution_utils.c. * Remove task tracker protocol methods * Remove task_tracker.c methods * remove unused methods from multi_server_executor * fix style * remove task tracker specific tests from worker_schedule * comment out task tracker udf calls in tests We were using task tracker udfs to test permissions in multi_multiuser.sql. We should find some other way to test them, then we should remove the commented out task tracker calls. * remove task tracker test from follower schedule * remove task tracker tests from multi mx schedule * Remove task-tracker specific functions from worker functions * remove multi task tracker extra schedule * Remove unused methods from multi physical planner * remove task_executor_type related things in tests * remove LoadTuplesIntoTupleStore * Do initial cleanup for repartition leftovers During startup, task tracker would call TrackerCleanupJobDirectories and TrackerCleanupJobSchemas to clean up leftover directories and job schemas. With adaptive executor, while doing repartitions it is possible to leak these things as well. We don't retry cleanups, so it is possible to have leftover in case of errors. TrackerCleanupJobDirectories is renamed as RepartitionCleanupJobDirectories since it is repartition specific now, however TrackerCleanupJobSchemas cannot be used currently because it is task tracker specific. The thing is that this function is a no-op currently. We should add cleaning up intermediate schemas to DoInitialCleanup method when that problem is solved(We might want to solve it in this PR as well) * Revert "remove task tracker tests from multi mx schedule" This reverts commit `03ecc0a681`. * update multi mx repartition parallel tests * not error with task_tracker_conninfo_cache_invalidate * not run 4 repartition queries in parallel It seems that when we run 4 repartition queries in parallel we get too many clients error on CI even though we don't get it locally. Our guess is that, it is because we open/close many connections without doing some work and postgres has some delay to close the connections. Hence even though connections are removed from the pg_stat_activity, they might still not be closed. If the above assumption is correct, it is unlikely for it to happen in practice because: - There is some network latency in clusters, so this leaves some times for connections to be able to close - Repartition joins return some data and that also leaves some time for connections to be fully closed. As we don't get this error in our local, we currently assume that it is not a bug. Ideally this wouldn't happen when we get rid of the task-tracker repartition methods because they don't do any pruning and might be opening more connections than necessary. If this still gives us "too many clients" error, we can try to increase the max_connections in our test suite(which is 100 by default). Also there are different places where this error is given in postgres, but adding some backtrace it seems that we get this from ProcessStartupPacket. The backtraces can be found in this link: https://circleci.com/gh/citusdata/citus/138702 * Set distributePlan->relationIdList when it is needed It seems that we were setting the distributedPlan->relationIdList after JobExecutorType is called, which would choose task-tracker if replication factor > 1 and there is a repartition query. However, it uses relationIdList to decide if the query has a repartition query, and since it was not set yet, it would always think it is not a repartition query and would choose adaptive executor when it should choose task-tracker. * use adaptive executor even with shard_replication_factor > 1 It seems that we were already using adaptive executor when replication_factor > 1. So this commit removes the check. * remove multi_resowner.c and deprecate some settings * remove TaskExecution related leftovers * change deprecated API error message * not recursively plan single relatition repartition subquery * recursively plan single relation repartition subquery * test depreceated task tracker functions * fix overlapping shard intervals in range-distributed test * fix error message for citus_metadata_container * drop task-tracker deprecated functions * put the implemantation back to worker_cleanup_job_schema_cachesince citus cloud uses it * drop some functions, add downgrade script Some deprecated functions are dropped. Downgrade script is added. Some gucs are deprecated. A new guc for repartition joins bucket size is added. * order by a test to fix flappiness	2020-07-18 13:11:36 +03:00
Markus Sintonen	cdedb98c54	Improve shard pruning logic to understand OR-conditions. Previously a limitation in the shard pruning logic caused multi distribution value queries to always go into all the shards/workers whenever query also used OR conditions in WHERE clause. Related to https://github.com/citusdata/citus/issues/2593 and https://github.com/citusdata/citus/issues/1537 There was no good workaround for this limitation. The limitation caused quite a bit of overhead with simple queries being sent to all workers/shards (especially with setups having lot of workers/shards). An example of a previous plan which was inadequately pruned: ``` EXPLAIN SELECT count() FROM orders_hash_partitioned WHERE (o_orderkey IN (1,2)) AND (o_custkey = 11 OR o_custkey = 22); QUERY PLAN --------------------------------------------------------------------- Aggregate (cost=0.00..0.00 rows=0 width=0) -> Custom Scan (Citus Adaptive) (cost=0.00..0.00 rows=0 width=0) Task Count: 4 Tasks Shown: One of 4 -> Task Node: host=localhost port=xxxxx dbname=regression -> Aggregate (cost=13.68..13.69 rows=1 width=8) -> Seq Scan on orders_hash_partitioned_630000 orders_hash_partitioned (cost=0.00..13.68 rows=1 width=0) Filter: ((o_orderkey = ANY ('{1,2}'::integer[])) AND ((o_custkey = 11) OR (o_custkey = 22))) (9 rows) ``` After this commit the task count is what one would expect from the query defining multiple distinct values for the distribution column: ``` EXPLAIN SELECT count() FROM orders_hash_partitioned WHERE (o_orderkey IN (1,2)) AND (o_custkey = 11 OR o_custkey = 22); QUERY PLAN --------------------------------------------------------------------- Aggregate (cost=0.00..0.00 rows=0 width=0) -> Custom Scan (Citus Adaptive) (cost=0.00..0.00 rows=0 width=0) Task Count: 2 Tasks Shown: One of 2 -> Task Node: host=localhost port=xxxxx dbname=regression -> Aggregate (cost=13.68..13.69 rows=1 width=8) -> Seq Scan on orders_hash_partitioned_630000 orders_hash_partitioned (cost=0.00..13.68 rows=1 width=0) Filter: ((o_orderkey = ANY ('{1,2}'::integer[])) AND ((o_custkey = 11) OR (o_custkey = 22))) (9 rows) ``` "Core" of the pruning logic works as previously where it uses `PrunableInstances` to queue ORable valid constraints for shard pruning. The difference is that now we build a compact internal representation of the query expression tree with PruningTreeNodes before actual shard pruning is run. Pruning tree nodes represent boolean operators and the associated constraints of it. This internal format allows us to have compact representation of the query WHERE clauses which allows "core" pruning logic to work with OR-clauses correctly. For example query having `WHERE (o_orderkey IN (1,2)) AND (o_custkey=11 OR (o_shippriority > 1 AND o_shippriority < 10))` gets transformed into: 1. AND(o_orderkey IN (1,2), OR(X, AND(X, X))) 2. AND(o_orderkey IN (1,2), OR(X, X)) 3. AND(o_orderkey IN (1,2), X) Here X is any set of unknown condition(s) for shard pruning. This allow the final shard pruning to correctly recognize that shard pruning is done with the valid condition of `o_orderkey IN (1,2)`. Another example with unprunable condition in query `WHERE (o_orderkey IN (1,2)) OR (o_custkey=11 AND o_custkey=22)` gets transformed into: 1. OR(o_orderkey IN (1,2), AND(X, X)) 2. OR(o_orderkey IN (1,2), X) Which is recognized as unprunable due to the OR condition between distribution column and unknown constraint -> goes to all shards. Issue https://github.com/citusdata/citus/issues/1537 originally suggested transforming the query conditions into a full disjunctive normal form (DNF), but this process of transforming into DNF is quite a heavy operation. It may "blow up" into a really large DNF form with complex queries having non trivial `WHERE` clauses. I think the logic for shard pruning could be simplified further but I decided to leave the "core" of the shard pruning untouched.	2020-02-14 17:58:13 +00:00
Jelte Fennema	7730bd449c	Normalize tests: Remove trailing whitespace	2020-01-06 09:32:03 +01:00
Jelte Fennema	7f3de68b0d	Normalize tests: header separator length	2020-01-06 09:32:03 +01:00
Jelte Fennema	96434e898f	Normalize tests: s/assigned task [0-9]+ to node/assigned task to node/	2020-01-03 11:45:22 +01:00
Jelte Fennema	8c5c0dd74c	Normalize tests: s/localhost:[0-9]+/localhost:xxxxx/g	2020-01-03 11:40:50 +01:00
Philip Dubé	5fcc169a3a	Stray depended to dependent tidy up	2019-12-03 15:28:32 +00:00
Philip Dubé	74cb168205	Remove Postgres 10 support	2019-10-11 21:56:56 +00:00
Philip Dubé	77efec04a0	Router Planner: accept SELECT_CMD ctes in modification queries	2019-06-26 10:32:01 +02:00
Philip Dubé	84fe626378	multi_router_planner: refactor error propagation	2019-06-26 10:32:01 +02:00
velioglu	121ff39b26	Removes large_table_shard_count GUC	2018-04-29 10:34:50 +02:00

11 Commits (61bc47e6a8339fc15b7caed631f72d5db06403b5)