citus

Commit Graph

Author	SHA1	Message	Date
Nils Dijk	b5f3ed6523	support 2 level of objecttypes	2020-02-07 16:03:26 +01:00
Nils Dijk	a52ef4ea35	extract collation DistributeObjectOps	2020-02-07 01:49:54 +01:00
Nils Dijk	7fe71eb124	extract rename DistributeObjectOps	2020-02-07 01:45:29 +01:00
Nils Dijk	4ca33dde64	extract schema DistributeObjectOps	2020-02-07 01:43:13 +01:00
Nils Dijk	91e01a657b	extract index DistributeObjectOps	2020-02-07 01:40:41 +01:00
Nils Dijk	f497a7a994	extract cluster DistributeObjectOps	2020-02-07 01:37:54 +01:00
Nils Dijk	cc86236fa1	extract table DistributeObjectOps	2020-02-07 01:36:56 +01:00
Nils Dijk	2e0b395c26	extract role DistributeObjectOps	2020-02-07 01:30:54 +01:00
Nils Dijk	e1f9f37206	extract policy DistributeObjectOps	2020-02-07 01:29:11 +01:00
Nils Dijk	6a5b126910	extract function DistributeObjectOps	2020-02-07 01:25:24 +01:00
Nils Dijk	ee964dde3d	extract extension DistributeObjectOps	2020-02-07 01:21:37 +01:00
Nils Dijk	28c523cb1a	move distops for functionesques to commands/function.c	2020-02-07 01:13:09 +01:00
Nils Dijk	000f9032a2	move type related distops to commands/type.c	2020-02-07 00:56:07 +01:00
Nils Dijk	91a7047b6d	use linker for every distops	2020-02-07 00:16:49 +01:00
Nils Dijk	fa6f2ed382	beginning on using linker for dist ops collection	2020-02-06 18:44:48 +01:00
Philip Dubé	ecad4aa5e6	Fill in jobIdList field of DistributedExecution Pass down jobIdList from ExecuteTasksInDependencyOrder Also clean up comment for ExecuteTaskListOutsideTransaction	2020-02-05 17:32:22 +00:00
Halil Ozan Akgul	8ce4f20061	Fixes the bug of grants on public schema propagation	2020-02-05 18:05:58 +03:00
Önder Kalacı	ef7d1ea91d	Locally execute queries that don't need any data access (#3410 ) * Update shardPlacement->nodeId to uint As the source of the shardPlacement->nodeId is always workerNode->nodeId, and that is uint32. We had this hack because of: `0ea4e52df5 (r266421409)` And, that is gone with: `90056f7d3c (diff-c532177d74c72d3f0e7cd10e448ab3c6L1123)` So, we're safe to do it now. * Relax the restrictions on using the local execution Previously, whenever any local execution happens, we disabled further commands to do any remote queries. The basic motivation for doing that is to prevent any accesses in the same transaction block to access the same placements over multiple sessions: one is local session the other is remote session to the same placement. However, the current implementation does not distinguish local accesses being to a placement or not. For example, we could have local accesses that only touches intermediate results. In that case, we should not implement the same restrictions as they become useless. So, this is a pre-requisite for executing the intermediate result only queries locally. * Update the error messages As the underlying implementation has changed, reflect it in the error messages. * Keep track of connections to local node With this commit, we're adding infrastructure to track if any connection to the same local host is done or not. The main motivation for doing this is that we've previously were more conservative about not choosing local execution. Simply, we disallowed local execution if any connection to any remote node is done. However, if we want to use local execution for intermediate result only queries, this'd be annoying because we expect all queries to touch remote node before the final query. Note that this approach is still limiting in Citus MX case, but for now we can ignore that. 
* Formalize the concept of Local Node Also some minor refactoring while creating the dummy placement * Write intermediate results locally when the results are only needed locally Before this commit, Citus used to always broadcast all the intermediate results to remote nodes. However, it is possible to skip pushing the results to remote nodes always. There are two notable cases for doing that: (a) When the query consists of only intermediate results (b) When the query is a zero shard query In both of the above cases, we don't need to access any data on the shards. So, it is a valuable optimization to skip pushing the results to remote nodes. The pattern mentioned in (a) is actually a common patterns that Citus users use in practice. For example, if you have the following query: WITH cte_1 AS (...), cte_2 AS (....), ... cte_n (...) SELECT ... FROM cte_1 JOIN cte_2 .... JOIN cte_n ...; The final query could be operating only on intermediate results. With this patch, the intermediate results of the ctes are not unnecessarily pushed to remote nodes. * Add specific regression tests As there are edge cases in Citus MX and with round-robin policy, use the same queries on those cases as well. * Fix failure tests By forcing not to use local execution for intermediate results since all the tests expects the results to be pushed remotely. * Fix flaky test * Apply code-review feedback Mostly style changes * Limit the max value of pg_dist_node_seq to reserve for internal use	2020-01-23 18:28:34 +01:00
Halil Ozan Akgul	b40f067d05	Adds propagation for grant on schema commands	2020-01-20 14:51:28 +03:00
Jelte Fennema	246435be7e	Lazy query deparsing executable queries (#3350) Deparsing and parsing a query can be heavy on CPU. When locally executing the query we don't need to do this in theory most of the time. This PR is the first step in allowing to skip deparsing and parsing the query in these cases, by lazily creating the query string and storing the query in the task. Future commits will make use of this and not deparse and parse the query anymore, but use the one from the task directly.	2020-01-17 11:49:43 +01:00
Hadi Moshayedi	b4e5f4b10a	Implement INSERT ... SELECT with repartitioning	2020-01-16 23:24:52 -08:00
Halil Ozan Akgul	c5539d20d9	Adds alter table schema propagation	2020-01-16 17:04:16 +03:00
Nils Dijk	b6e09eb691	Fix: distributed function with table reference in declare (#3384) DESCRIPTION: Fixes a problem when adding a new node due to tables referenced in a functions body Fixes #3378 It was reported that `master_add_node` would fail if a distributed function has a table name referenced in its declare section of the body. By default postgres validates the body of a function on creation. This is not a problem in the normal case as tables are replicated to the workers when we distribute functions. However when a new node is added we first create dependencies on the workers before we try to create any tables, and the original tables get created out of bound when the metadata gets synced to the new node. This causes the function body validator to raise an error the table is not on the worker. To mitigate this issue we set `check_function_bodies` to `off` right before we are creating the function. The added test shows this does resolve the issue. (issue can be reproduced on the commit without the fix)	2020-01-16 14:21:54 +01:00
Marco Slot	90056f7d3c	Remove copy from worker for append-partitioned table	2020-01-13 23:03:40 -08:00
Philip Dubé	ccabf19090	Propagate DROP ROUTINE, ALTER ROUTINE In two places I've made code more straight forward by using ROUTINE in our own codegen Two changes which may seem extraneous: AppendFunctionName was updated to not use pg_get_function_identity_arguments. This is because that function includes ORDER BY when printing an aggregate like my_rank. While ALTER AGGREGATE my_rank(x "any" ORDER BY y "any") is accepted by postgres, ALTER ROUTINE my_rank(x "any" ORDER BY y "any") is not. Tests were updated to use macaddr over integer. Using integer is flaky, our logic could sometimes end up on tables like users_table. I originally wanted to use money, but money isn't hashable.	2020-01-13 15:37:46 +00:00
Philip Dubé	4b5d6c3ebe	Rename RelayFileState to ShardState Replace FILE_ prefix with SHARD_STATE_	2020-01-12 05:57:53 +00:00
Philip Dubé	73c06fae3b	Introduce GetDistributeObjectOps to organize dispatch of logic dependent on node/object type	2020-01-09 18:24:29 +00:00
Hadi Moshayedi	d7aea7fa10	Implement partitioned intermediate results.	2019-12-24 03:53:39 -08:00
Hadi Moshayedi	ef487e0792	Implement fetch_intermediate_results	2019-12-18 10:46:35 -08:00
SaitTalhaNisanci	7ff4ce2169	Add adaptive executor support for repartition joins (#3169 ) * WIP * wip * add basic logic to run a single job with repartioning joins with adaptive executor * fix some warnings and return in ExecuteDependedTasks if there is none * Add the logic to run depended jobs in adaptive executor The execution of depended tasks logic is changed. With the current logic: - All tasks are created from the top level task list. - At one iteration: - CurTasks whose dependencies are executed are found. - CurTasks are executed in parallel with adapter executor main logic. - The iteration is repeated until all tasks are completed. * Separate adaptive executor repartioning logic * Remove duplicate parts * cleanup directories and schemas * add basic repartion tests for adaptive executor * Use the first placement to fetch data In task tracker, when there are replicas, we try to fetch from a replica for which a map task is succeeded. TaskExecution is used for this, however TaskExecution is not used in adaptive executor. So we cannot use the same thing as task tracker. Since adaptive executor fails when a map task fails (There is no retry logic yet). We know that if we try to execute a fetch task, all of its map tasks already succeeded, so we can just use the first one to fetch from. * fix clean directories logic * do not change the search path while creating a udf * Enable repartition joins with adaptive executor with only enable_reparitition_joins guc * Add comments to adaptive_executor_repartition * dont run adaptive executor repartition test in paralle with other tests * execute cleanup only in the top level execution * do cleanup only in the top level ezecution * not begin a transaction if repartition query is used * use new connections for repartititon specific queries New connections are opened to send repartition specific queries. The opened connections will be closed at the FinishDistributedExecution. 
While sending repartition queries no transaction is begun so that we can see all changes. * error if a modification was done prior to repartition execution * not start a transaction if a repartition query and sql task, and clean temporary files and schemas at each subplan level * fix cleanup logic * update tests * add missing function comments * add test for transaction with DDL before repartition query * do not close repartition connections in adaptive executor * rollback instead of commit in repartition join test * use close connection instead of shutdown connection * remove unnecesary connection list, ensure schema owner before removing directory * rename ExecuteTaskListRepartition * put fetch query string in planner not executor as we currently support only replication factor = 1 with adaptive executor and repartition query and we know the query string in the planner phase in that case * split adaptive executor repartition to DAG execution logic and repartition logic * apply review items * apply review items * use an enum for remote transaction state and fix cleanup for repartition * add outside transaction flag to find connections that are unclaimed instead of always opening a new transaction * fix style * wip * rename removejobdir to partition cleanup * do not close connections at the end of repartition queries * do repartition cleanup in pg catch * apply review items * decide whether to use transaction or not at execution creation * rename isOutsideTransaction and add missing comment * not error in pg catch while doing cleanup * use replication factor of the creation time, not current time to decide if task tracker should be chosen * apply review items * apply review items * apply review item	2019-12-17 19:09:45 +03:00
Marco Slot	f4031dd477	Clean up transaction block usage logic in adaptive executor	2019-12-17 10:48:19 +01:00
SaitTalhaNisanci	2829c601dd	replace Begin words in coordinated transactions with use (#3293)	2019-12-16 10:40:31 +03:00
SaitTalhaNisanci	13204487e9	remove copyright years (#3286)	2019-12-11 21:14:08 +03:00
Philip Dubé	fcf2fd819b	Add distributioncolumncollation to to pg_dist_colocation Use partition column's collation for range distributed tables Don't allow non deterministic collations for hash distributed tables CoPartitionedTables: don't compare unequal types	2019-12-09 19:51:40 +00:00
Philip Dubé	d138bb89bf	Support creating collations as part of dependency resolution. Propagate ALTER/DROP on distributed collations Propagate CREATE COLLATION when outside transaction	2019-12-09 04:42:51 +00:00
SaitTalhaNisanci	aeec3d1544	fix typo in dependent jobs and dependent task (#3244)	2019-11-28 23:47:28 +03:00
Hadi Moshayedi	2268a9cae6	Error for metadata commands if any metadata node is out-of-sync (#3226) * Error for metadata commands if any metadata node is out-of-sync * Make the functions have separate APIs for all workers/metadata workers	2019-11-27 09:52:57 +01:00
Philip Dubé	261a9de42d	Fix typos: VAR_SET_VALUE_KIND -> VAR_SET_VALUE kind beginnig -> beginning plannig -> planning the the -> the er then -> er than	2019-11-25 23:24:13 +00:00
Philip Dubé	3c10c27b13	GetFunctionAlterOwnerCommand: use format_procedure_qualified distributed_functions: test a function with a quote in name AppendDefElemSet: quote variable names	2019-11-25 23:01:30 +00:00
Jelte Fennema	1d8dde232f	Automatically convert useless declarations using regex replace (#3181) * Add declaration removal to CI * Convert declarations	2019-11-21 13:47:29 +01:00
Onur TIRTIR	9961297d7b	Improve extension command propagation logic and tests * Improve extension command propagation tests * patch for hardcoded citus extension name (cherry picked from commit 0bb3dbac0afabda10e8928f9c17eda048dc4361a)	2019-11-21 11:24:39 +03:00
Philip Dubé	b7fef5c31a	Miscellaneous cleanup in prep for collation propagation	2019-11-19 17:28:59 +00:00
Onur TIRTIR	26c306d188	Add extensions to distributed object propagation infrastructure (#3185)	2019-11-19 17:56:28 +03:00
Halil Ozan Akgul	5ae7b219ff	Create the ALTER ROLE propagation	2019-11-18 18:31:28 +03:00
Hadi Moshayedi	15af1637aa	Replicate reference tables to coordinator.	2019-11-15 05:50:19 -08:00
SaitTalhaNisanci	b9b7fd7660	add IsLoggableLevel utility function (#3149) * add IsLoggableLevel utility function * add function comment for IsLoggableLevel * put ApplyLogRedaction to logutils	2019-11-15 14:59:13 +03:00
Jelte Fennema	1b2c438e69	Rename variables to not shadow globals in RHEL6 (#3194) Fixes #2839	2019-11-15 12:12:24 +01:00
Philip Dubé	48552bfffe	Call DestReceiver rDestroy before it goes out of scope CitusCopyDestReceiverDestroy: call hash_destroy on shardStateHash & connectionStateHash	2019-11-12 15:03:07 +00:00
Philip Dubé	ad86c1b866	AcquireDistributedLockOnRelations: escape relation names	2019-11-08 21:23:01 +00:00
Philip Dubé	2fc45e5897	create_distributed_function: accept aggregates Adds support for OCLASS_PROC to worker_create_or_replace_object	2019-11-06 18:23:37 +00:00

280 Commits (b5f3ed6523373d12686aadd42cce87b629f1e283)