citus

Commit Graph

Author	SHA1	Message	Date
Onur Tirtir	81d48d3466	fix some typos	2020-03-25 11:01:26 +03:00
SaitTalhaNisanci	3df578010e	add a UDF to update colocation (#3623 ) If two tables have the same distribution column type, we implicitly colocate them. This is useful since colocation has a big performance impact in most applications. When a table is rebalanced, all of the colocated tables are also rebalanced. If table A and table B are colocated and we want to rebalance table A, table B will also be rebalanced. We need replica identity so that logical replication can replicate updates and deletes during rebalancing. If table B does not have a replica identity we error out. A solution to this is to introduce a UDF so that colocation can be updated. The remaining tables in the colocation group will stay colocated. For example if table A, B and C are colocated and after updating table B's colocations, table A and table C stay colocated. The "updating colocation" step does not move any data around, it only updated pg_dist_partition and pg_dist_colocation tables. Specifically it creates a new colocation group for the table and updates the entry in pg_dist_partition while invalidating any cache.	2020-03-23 13:22:24 +03:00
SaitTalhaNisanci	2eaf7bba69	not use local copy if we are copying into intermediate results file We have special logic to copy into intermediate results and we use a custom format for that, "result" copy format. Postgres internally does not know this format and if we use this locally it will error saying that it does not know this format. Files are visible to all transactions, which means that we can use any connection to access files. In order to use the existing logic, it makes sense that in case we have intermediate results, which means we will write the results to a file, we preserve the same behavior, which is opening connections to localhost. Therefore if we have intermediate results we return false in ShouldExecuteCopyLocally.	2020-03-18 09:35:20 +03:00
SaitTalhaNisanci	9d2f3c392a	enable local execution in INSERT..SELECT and add more tests We can use local copy in INSERT..SELECT, so the check that disables local execution is removed. Also a test for local copy where the data size > LOCAL_COPY_FLUSH_THRESHOLD is added. use local execution with insert..select	2020-03-18 09:34:39 +03:00
SaitTalhaNisanci	42cfc4c0e9	apply review items log shard id in local copy and add more comments	2020-03-18 09:33:55 +03:00
SaitTalhaNisanci	c22068e75a	use the right partition for partitioned tables	2020-03-18 09:28:59 +03:00
SaitTalhaNisanci	1df9601e13	not use local copy if current transaction is connected to local group If current transaction is connected to local group we should not use local copy, because we might not see some of the changes that are made over the connection to the local group.	2020-03-18 09:28:59 +03:00
SaitTalhaNisanci	f9c4431885	add the support to execute copy locally A copy will be executed locally if - Local execution is enabled and current transaction accessed a local placement - Local execution is enabled and we are inside a transaction block. So even if local execution is enabled but we are not in a transaction block, the copy will not be run locally. This will not run locally: ``` COPY distributed_table FROM STDIN; .... ``` This will run locally: ``` SET citus.enable_local_execution to 'on'; BEGIN; COPY distributed_table FROM STDIN; COMMIT; .... ``` . There are 3 ways to do a copy in postgres programmatically: - from a file - from a program - from a callback function I have chosen to implement it with a callback function, which means that we write the rows of copy from a callback function to the output buffer, which is used to insert tuples into the actual table. For each shard id, we have a buffer that keeps the current rows to be written, we perform the actual copy operation either when: - copy buffer for the given shard id reaches to a threshold, which is currently 512KB - we reach to the end of the copy The buffer size is debatable(512KB). At a given time, we might allocate (local placement * buffer size) memory at most. The local copy uses the same copy format as remote copy, which means that we serialize the data in the same format as remote copy and send it locally. There was also the option to use ExecSimpleRelationInsert to insert slots one by one, which would avoid the extra serialization/deserialization but doing some benchmarks it seems that using buffers are significantly better in terms of the performance. You can see this comment for more details: https://github.com/citusdata/citus/pull/3557#discussion_r389499054	2020-03-18 09:28:59 +03:00
Onur Tirtir	a14739f808	Local execution of ddl/drop/truncate commands (#3514 ) * reimplement ExecuteUtilityTaskListWithoutResults for local utility command execution * introduce new functions for local execution of utility commands * change ErrorIfTransactionAccessedPlacementsLocally logic for local utility command execution * enable local execution for TRUNCATE command on distributed & reference tables * update existing tests for local utility command execution * enable local execution for DDL commands on distributed & reference tables * enable local execution for DROP command on distributed & reference tables * add normalization rules for cascaded commands * add new tests for local utility command execution	2020-03-13 15:39:32 +03:00
Jelte Fennema	e0bbe1ca38	Semmle: Actively check one possible NULL deref case (#3560 ) Calling ErrorIfUnsupportedConstraint was still giving errors on Semmle. This makes sure that we check for NULL at runtime. This way we can safely ignore all errors created by this function.	2020-03-11 18:11:56 +01:00
Onder Kalaci	7d787e3d5e	Prevent create_distributed_function() from the workers As this could cause weird edge cases.	2020-03-10 18:24:20 +01:00
Philip Dubé	7cdfa1daab	Rename LookupCitusTableCacheEntry to GetCitusTableCacheEntry, LookupLookupCitusTableCacheEntry back to LookupCitusTableCacheEntry	2020-03-08 14:08:23 +00:00
Philip Dubé	a7cca1bcde	Rename DistTableCacheEntry to CitusTableCacheEntry	2020-03-07 14:08:03 +00:00
Philip Dubé	b514ab0f55	Fix typos, rename isDistributedRelation to isCitusRelation	2020-03-06 19:20:34 +00:00
Philip Dubé	bec58000d6	Given IsDistributedTableRTE, there's ambiguity in what DistributedTable means Elsewhere we used DistributedTable to include reference tables Marco suggested we use CitusTable for distributed & reference tables So renaming: - IsDistributedTable -> IsCitusTable - IsDistributedTableViaCatalog -> IsCitusTableViaCatalog - DistributedTableCacheEntry -> CitusTableCacheEntry - DistributedTableList -> CitusTableList - isDistributedTable -> isCitusTable - InsertSelectIntoDistributedTable -> InsertSelectIntoCitusTable - ExtractFirstDistributedTableId -> ExtractFirstCitusTableId	2020-03-06 18:57:55 +00:00
Onur Tirtir	bdce9acc30	some refactor around foreign key constraints	2020-03-05 20:20:41 +03:00
Onur Tirtir	88bfd2e4b7	refactor around local group id checks Mostyl optimizes the calls made to GetLocalGroupId and refactors its usages	2020-03-05 20:20:41 +03:00
Onur Tirtir	1e128a6ee4	fix a potential infinite loop	2020-03-05 20:20:41 +03:00
Nils Dijk	268ad741a9	Refactor the deparsing of a CREATE EXTENSION to prevent NULL POINTER dereferences (#3518 ) DESCRIPTION: satisfy static analysis tool for a nullptr dereference During the static analysis project on the codebase this code has been flagged as having the potential for a null pointer dereference. Funnily enough the author had already made a comment of it in the code this was not possible due to us setting the schema name before we pass in the statement. If we want to reuse this code in a later setting this comment might not always apply and we could actually run into null pointer dereference. This patch changes a bit of the code around to first of all make sure there is no NULL pointer dereference in this code anymore. Secondly we allow for better deparsing by setting and adhering to the `if_not_exists` flag on the statement. And finally add support for all syntax described in the documentation of postgres (FROM was missing).	2020-03-04 16:47:07 +01:00
Onur Tirtir	ff9c9d1808	make VacuumTaskList even with other taskList functions and some safety changes Makees VacuumTaskList function even with other TaskList creator functions. Also, previously we were generating per-shard vacuum command strings via unconventional usage of StringInfo struct (setting the stringInfo->len field manually) which could cause unexepected memory errors (that I cannot foresee now).	2020-03-02 10:25:28 +03:00
Philip Dubé	34f241af16	Fix create_distributed_table on a table using GENERATED ALWAYS AS If the generated column does not come at the end of the column list, columnNameList doesn't line up with the column indexes. Seek past CREATE TABLE test_table ( test_id int PRIMARY KEY, gen_n int GENERATED ALWAYS AS (1) STORED, created_at TIMESTAMPTZ NOT NULL DEFAULT now() ); SELECT create_distributed_table('test_table', 'test_id'); Would raise ERROR: cannot cast 23 to 1184	2020-02-28 09:34:26 -08:00
Philip Dubé	20abc4d2b5	Replace foreach with foreach_ptr/foreach_oid (#3544 )	2020-02-27 16:54:49 +01:00
Jelte Fennema	685b54b3de	Semmle: Check for NULL in some places where it might occur (#3509 ) Semmle reported quite some places where we use a value that could be NULL. Most of these are not actually a real issue, but better to be on the safe side with these things and make the static analysis happy.	2020-02-27 10:45:29 +01:00
Jelte Fennema	8de8b62669	Convert unsafe APIs to safe ones	2020-02-25 15:39:27 +01:00
Philip Dubé	bcf54c5014	Address a couple issues with maintenace daemon management: - Stop the daemon when citus extension is dropped - Bail on maintenance daemon startup if myDbData is started with a non-zero pid - Stop maintenance daemon from spawning itself - Don't use postgres die, just wrap proc_exit(0) - Assert(myDbData->workerPid == MyProcPid) The two issues were that multiple daemons could be running for a database, or that a daemon would be leftover after DROP EXTENSION citus	2020-02-21 16:49:01 +00:00
Onur Tirtir	926a1a61b9	change "relation" with "table" in error messages related with foreign keys on reference tables	2020-02-20 09:58:47 +03:00
Philip Dubé	08f6842d50	Fix typos Equivalance -> Equivalence utillity -> utility shorted lived one -> shortly lived one elegible -> eligible	2020-02-18 17:14:40 +00:00
Marco Slot	038e5999cb	Implement direct COPY table TO stdout	2020-02-17 15:15:10 +01:00
Philip Dubé	3a906b8210	Fix typos noticed while reading through code trying to understand HAVING	2020-02-11 19:55:10 +00:00
Onur Tirtir	39df51e903	Introduce objects to dist. infrastructure when updating Citus (#3477 ) Mark existing objects that are not included in distributed object infrastructure in older versions of Citus (but now should be) as distributed, after updating Citus successfully.	2020-02-07 18:07:59 +03:00
Philip Dubé	ecad4aa5e6	Fill in jobIdList field of DistributedExecution Pass down jobIdList from ExecuteTasksInDependencyOrder Also clean up comment for ExecuteTaskListOutsideTransaction	2020-02-05 17:32:22 +00:00
Halil Ozan Akgul	8ce4f20061	Fixes the bug of grants on public schema propagation	2020-02-05 18:05:58 +03:00
Önder Kalacı	ef7d1ea91d	Locally execute queries that don't need any data access (#3410 ) * Update shardPlacement->nodeId to uint As the source of the shardPlacement->nodeId is always workerNode->nodeId, and that is uint32. We had this hack because of: `0ea4e52df5 (r266421409)` And, that is gone with: `90056f7d3c (diff-c532177d74c72d3f0e7cd10e448ab3c6L1123)` So, we're safe to do it now. * Relax the restrictions on using the local execution Previously, whenever any local execution happens, we disabled further commands to do any remote queries. The basic motivation for doing that is to prevent any accesses in the same transaction block to access the same placements over multiple sessions: one is local session the other is remote session to the same placement. However, the current implementation does not distinguish local accesses being to a placement or not. For example, we could have local accesses that only touches intermediate results. In that case, we should not implement the same restrictions as they become useless. So, this is a pre-requisite for executing the intermediate result only queries locally. * Update the error messages As the underlying implementation has changed, reflect it in the error messages. * Keep track of connections to local node With this commit, we're adding infrastructure to track if any connection to the same local host is done or not. The main motivation for doing this is that we've previously were more conservative about not choosing local execution. Simply, we disallowed local execution if any connection to any remote node is done. However, if we want to use local execution for intermediate result only queries, this'd be annoying because we expect all queries to touch remote node before the final query. Note that this approach is still limiting in Citus MX case, but for now we can ignore that. * Formalize the concept of Local Node Also some minor refactoring while creating the dummy placement * Write intermediate results locally when the results are only needed locally Before this commit, Citus used to always broadcast all the intermediate results to remote nodes. However, it is possible to skip pushing the results to remote nodes always. There are two notable cases for doing that: (a) When the query consists of only intermediate results (b) When the query is a zero shard query In both of the above cases, we don't need to access any data on the shards. So, it is a valuable optimization to skip pushing the results to remote nodes. The pattern mentioned in (a) is actually a common patterns that Citus users use in practice. For example, if you have the following query: WITH cte_1 AS (...), cte_2 AS (....), ... cte_n (...) SELECT ... FROM cte_1 JOIN cte_2 .... JOIN cte_n ...; The final query could be operating only on intermediate results. With this patch, the intermediate results of the ctes are not unnecessarily pushed to remote nodes. * Add specific regression tests As there are edge cases in Citus MX and with round-robin policy, use the same queries on those cases as well. * Fix failure tests By forcing not to use local execution for intermediate results since all the tests expects the results to be pushed remotely. * Fix flaky test * Apply code-review feedback Mostly style changes * Limit the max value of pg_dist_node_seq to reserve for internal use	2020-01-23 18:28:34 +01:00
Halil Ozan Akgul	b40f067d05	Adds propagation for grant on schema commands	2020-01-20 14:51:28 +03:00
Jelte Fennema	246435be7e	Lazy query deparsing executable queries (#3350 ) Deparsing and parsing a query can be heavy on CPU. When locally executing the query we don't need to do this in theory most of the time. This PR is the first step in allowing to skip deparsing and parsing the query in these cases, by lazily creating the query string and storing the query in the task. Future commits will make use of this and not deparse and parse the query anymore, but use the one from the task directly.	2020-01-17 11:49:43 +01:00
Hadi Moshayedi	b4e5f4b10a	Implement INSERT ... SELECT with repartitioning	2020-01-16 23:24:52 -08:00
Halil Ozan Akgul	c5539d20d9	Adds alter table schema propagation	2020-01-16 17:04:16 +03:00
Nils Dijk	b6e09eb691	Fix: distributed function with table reference in declare (#3384 ) DESCRIPTION: Fixes a problem when adding a new node due to tables referenced in a functions body Fixes #3378 It was reported that `master_add_node` would fail if a distributed function has a table name referenced in its declare section of the body. By default postgres validates the body of a function on creation. This is not a problem in the normal case as tables are replicated to the workers when we distribute functions. However when a new node is added we first create dependencies on the workers before we try to create any tables, and the original tables get created out of bound when the metadata gets synced to the new node. This causes the function body validator to raise an error the table is not on the worker. To mitigate this issue we set `check_function_bodies` to `off` right before we are creating the function. The added test shows this does resolve the issue. (issue can be reproduced on the commit without the fix)	2020-01-16 14:21:54 +01:00
Marco Slot	90056f7d3c	Remove copy from worker for append-partitioned table	2020-01-13 23:03:40 -08:00
Philip Dubé	ccabf19090	Propagate DROP ROUTINE, ALTER ROUTINE In two places I've made code more straight forward by using ROUTINE in our own codegen Two changes which may seem extraneous: AppendFunctionName was updated to not use pg_get_function_identity_arguments. This is because that function includes ORDER BY when printing an aggregate like my_rank. While ALTER AGGREGATE my_rank(x "any" ORDER BY y "any") is accepted by postgres, ALTER ROUTINE my_rank(x "any" ORDER BY y "any") is not. Tests were updated to use macaddr over integer. Using integer is flaky, our logic could sometimes end up on tables like users_table. I originally wanted to use money, but money isn't hashable.	2020-01-13 15:37:46 +00:00
Philip Dubé	4b5d6c3ebe	Rename RelayFileState to ShardState Replace FILE_ prefix with SHARD_STATE_	2020-01-12 05:57:53 +00:00
Philip Dubé	73c06fae3b	Introduce GetDistributeObjectOps to organize dispatch of logic dependent on node/object type	2020-01-09 18:24:29 +00:00
Hadi Moshayedi	d7aea7fa10	Implement partitioned intermediate results.	2019-12-24 03:53:39 -08:00
Hadi Moshayedi	ef487e0792	Implement fetch_intermediate_results	2019-12-18 10:46:35 -08:00
SaitTalhaNisanci	7ff4ce2169	Add adaptive executor support for repartition joins (#3169 ) * WIP * wip * add basic logic to run a single job with repartioning joins with adaptive executor * fix some warnings and return in ExecuteDependedTasks if there is none * Add the logic to run depended jobs in adaptive executor The execution of depended tasks logic is changed. With the current logic: - All tasks are created from the top level task list. - At one iteration: - CurTasks whose dependencies are executed are found. - CurTasks are executed in parallel with adapter executor main logic. - The iteration is repeated until all tasks are completed. * Separate adaptive executor repartioning logic * Remove duplicate parts * cleanup directories and schemas * add basic repartion tests for adaptive executor * Use the first placement to fetch data In task tracker, when there are replicas, we try to fetch from a replica for which a map task is succeeded. TaskExecution is used for this, however TaskExecution is not used in adaptive executor. So we cannot use the same thing as task tracker. Since adaptive executor fails when a map task fails (There is no retry logic yet). We know that if we try to execute a fetch task, all of its map tasks already succeeded, so we can just use the first one to fetch from. * fix clean directories logic * do not change the search path while creating a udf * Enable repartition joins with adaptive executor with only enable_reparitition_joins guc * Add comments to adaptive_executor_repartition * dont run adaptive executor repartition test in paralle with other tests * execute cleanup only in the top level execution * do cleanup only in the top level ezecution * not begin a transaction if repartition query is used * use new connections for repartititon specific queries New connections are opened to send repartition specific queries. The opened connections will be closed at the FinishDistributedExecution. While sending repartition queries no transaction is begun so that we can see all changes. * error if a modification was done prior to repartition execution * not start a transaction if a repartition query and sql task, and clean temporary files and schemas at each subplan level * fix cleanup logic * update tests * add missing function comments * add test for transaction with DDL before repartition query * do not close repartition connections in adaptive executor * rollback instead of commit in repartition join test * use close connection instead of shutdown connection * remove unnecesary connection list, ensure schema owner before removing directory * rename ExecuteTaskListRepartition * put fetch query string in planner not executor as we currently support only replication factor = 1 with adaptive executor and repartition query and we know the query string in the planner phase in that case * split adaptive executor repartition to DAG execution logic and repartition logic * apply review items * apply review items * use an enum for remote transaction state and fix cleanup for repartition * add outside transaction flag to find connections that are unclaimed instead of always opening a new transaction * fix style * wip * rename removejobdir to partition cleanup * do not close connections at the end of repartition queries * do repartition cleanup in pg catch * apply review items * decide whether to use transaction or not at execution creation * rename isOutsideTransaction and add missing comment * not error in pg catch while doing cleanup * use replication factor of the creation time, not current time to decide if task tracker should be chosen * apply review items * apply review items * apply review item	2019-12-17 19:09:45 +03:00
Marco Slot	f4031dd477	Clean up transaction block usage logic in adaptive executor	2019-12-17 10:48:19 +01:00
SaitTalhaNisanci	2829c601dd	replace Begin words in coordinated transactions with use (#3293 )	2019-12-16 10:40:31 +03:00
SaitTalhaNisanci	13204487e9	remove copyright years (#3286 )	2019-12-11 21:14:08 +03:00
Philip Dubé	fcf2fd819b	Add distributioncolumncollation to to pg_dist_colocation Use partition column's collation for range distributed tables Don't allow non deterministic collations for hash distributed tables CoPartitionedTables: don't compare unequal types	2019-12-09 19:51:40 +00:00
Philip Dubé	d138bb89bf	Support creating collations as part of dependency resolution. Propagate ALTER/DROP on distributed collations Propagate CREATE COLLATION when outside transaction	2019-12-09 04:42:51 +00:00
SaitTalhaNisanci	aeec3d1544	fix typo in dependent jobs and dependent task (#3244 )	2019-11-28 23:47:28 +03:00
Hadi Moshayedi	2268a9cae6	Error for metadata commands if any metadata node is out-of-sync (#3226 ) * Error for metadata commands if any metadata node is out-of-sync * Make the functions have separate APIs for all workers/metadata workers	2019-11-27 09:52:57 +01:00
Philip Dubé	261a9de42d	Fix typos: VAR_SET_VALUE_KIND -> VAR_SET_VALUE kind beginnig -> beginning plannig -> planning the the -> the er then -> er than	2019-11-25 23:24:13 +00:00
Philip Dubé	3c10c27b13	GetFunctionAlterOwnerCommand: use format_procedure_qualified distributed_functions: test a function with a quote in name AppendDefElemSet: quote variable names	2019-11-25 23:01:30 +00:00
Jelte Fennema	1d8dde232f	Automatically convert useless declarations using regex replace (#3181 ) * Add declaration removal to CI * Convert declarations	2019-11-21 13:47:29 +01:00
Onur TIRTIR	9961297d7b	Improve extension command propagation logic and tests * Improve extension command propagation tests * patch for hardcoded citus extension name (cherry picked from commit 0bb3dbac0afabda10e8928f9c17eda048dc4361a)	2019-11-21 11:24:39 +03:00
Philip Dubé	b7fef5c31a	Miscellaneous cleanup in prep for collation propagation	2019-11-19 17:28:59 +00:00
Onur TIRTIR	26c306d188	Add extensions to distributed object propagation infrastructure (#3185 )	2019-11-19 17:56:28 +03:00
Halil Ozan Akgul	5ae7b219ff	Create the ALTER ROLE propagation	2019-11-18 18:31:28 +03:00
Hadi Moshayedi	15af1637aa	Replicate reference tables to coordinator.	2019-11-15 05:50:19 -08:00
SaitTalhaNisanci	b9b7fd7660	add IsLoggableLevel utility function (#3149 ) * add IsLoggableLevel utility function * add function comment for IsLoggableLevel * put ApplyLogRedaction to logutils	2019-11-15 14:59:13 +03:00
Jelte Fennema	1b2c438e69	Rename variables to not shadow globals in RHEL6 (#3194 ) Fixes #2839	2019-11-15 12:12:24 +01:00
Philip Dubé	48552bfffe	Call DestReceiver rDestroy before it goes out of scope CitusCopyDestReceiverDestroy: call hash_destroy on shardStateHash & connectionStateHash	2019-11-12 15:03:07 +00:00
Philip Dubé	ad86c1b866	AcquireDistributedLockOnRelations: escape relation names	2019-11-08 21:23:01 +00:00
Philip Dubé	2fc45e5897	create_distributed_function: accept aggregates Adds support for OCLASS_PROC to worker_create_or_replace_object	2019-11-06 18:23:37 +00:00
Önder Kalacı	960cd02c67	Remove real time router executors (#3142 ) * Remove unused executor codes All of the codes of real-time executor. Some functions in router executor still remains there because there are common functions. We'll move them to accurate places in the follow-up commits. * Move GUCs to transaction mngnt and remove unused struct * Update test output * Get rid of references of real-time executor from code * Warn if real-time executor is picked * Remove lots of unused connection codes * Removed unused code for connection restrictions Real-time and router executors cannot handle re-using of the existing connections within a transaction block. Adaptive executor and COPY can re-use the connections. So, there is no reason to keep the code around for applying the restrictions in the placement connection logic.	2019-11-05 12:48:10 +01:00
Marco Slot	067657af26	Disallow distributed functions with distribution arguments unless replication_model is streaming	2019-10-26 23:57:59 +02:00
Jelte Fennema	a5010e5b17	Add extra foreach convenience macros (#3117 ) This completely hides `ListCell` to the user of the loop Example usage: ```c WorkerNode workerNode = NULL; foreach_ptr(workerNode, workerNodeList) { // Do stuff with workerNode } ``` Instead of: ```c ListCell workerNodeCell = NULL; foreach(cell, workerNodeList) { WorkerNode *workerNode = lfirst(workerNodeCell); // Do stuff with workerNode } ```	2019-10-23 16:49:12 +02:00
SaitTalhaNisanci	94a7e6475c	Remove copyright years (#2918 ) * Update year as 2012-2019 * Remove copyright years	2019-10-15 17:44:30 +03:00
Philip Dubé	74cb168205	Remove Postgres 10 support	2019-10-11 21:56:56 +00:00
Philip Dubé	4063e7ca67	CALL delegation: apply strip_implicit_coercions to distribution argument	2019-10-10 17:42:43 +00:00
Philip Dubé	dd490b6376	Cache whether an object is in pg_dist_object. Avoids redundant lookups for non-distributed objects	2019-10-10 14:50:38 +00:00
Nils Dijk	4a4a220945	Fix enum add value order and pg12 (#3082 ) DESCRIPTION: Fix order for enum values and correctly support pg12 PG 12 introduces `ALTER TYPE ... ADD VALUE ...` during transactions. Earlier versions would error out when called in a transaction, hence we connect to workers outside of the transaction which could cause inconsistencies on pg12 now that postgres doesn't error with this syntax anymore. During the implementation of this fix it became apparent there was an error with the ordering of enum labels when the type was recreated. A patch and test have been included.	2019-10-07 17:16:19 +02:00
Jelte Fennema	01da11f264	Change citus truncate trigger to AFTER and add more upgrade tests (#3070 ) * Add more upgrade tests * Fix citus trigger generation after upgrade citus_truncate_trigger runs before truncate when created by create_distributed_table: `492d1b2cba/src/backend/distributed/commands/create_distributed_table.c (L1163)` * Remove pg_dist_jobid_seq	2019-10-07 16:43:04 +02:00
Onder Kalaci	3be72ce42f	Make sure that distributed functions always have the correct user Objectives: (a) both super user and regular user should have the correct owner for the function on the worker (b) The transactional semantics would work fine for both super user and regular user (c) non-super-user and non-function owner would get a reasonable error message if tries to distribute the function Co-authored-by: @serprex	2019-10-04 21:38:49 +00:00
Philip Dubé	29f1ea079b	PG_VERSION_NUM > 110000 should be PG_VERSION_NUM >= 110000 Also fix a > 12000 typo	2019-09-30 23:37:43 +00:00
Nils Dijk	01b26cf91a	Disallow distributed functions for functions depending on an extension (#3049 ) DESCRIPTION: Disallow distributed functions for functions depending on an extension Functions depending on an extension cannot (yet) be distributed by citus. If we would allow this it would cause issues with our dependency following mechanism as we stop following objects depending on an extension. By not allowing functions to be distributed when they depend on an extension as well as not allowing to make distributed functions depend on an extension we won't break the ability to add new nodes. Allowing functions depending on extensions to be distributed at the moment could cause problems in that area.	2019-09-30 15:19:47 +02:00
Nils Dijk	473cbc0115	Propagate CREATE OR REPLACE FUNCTION to workers for distributed functions (#3043 ) DESCRIPTION: Propagate CREATE OR REPLACE FUNCTION Distributed functions could be replaced, which should be propagated to the workers to keep the function in sync between all nodes. Due to the complexity of deparsing the `CreateFunctionStmt` we actually produce the plan during the processing phase of our utilityhook. Since the changes have already been made in the catalog tables we can reuse `pg_get_functiondef` to get us the generated `CREATE OR REPLACE` sql.	2019-09-30 12:41:17 +02:00
Nils Dijk	9c2c50d875	Hookup function/procedure deparsing to our utility hook (#3041 ) DESCRIPTION: Propagate ALTER FUNCTION statements for distributed functions Using the implemented deparser for function statements to propagate changes to both functions and procedures that are previously distributed.	2019-09-27 22:06:49 +02:00
Philip Dubé	363409a0c2	Propagate REINDEX TABLE & REINDEX INDEX	2019-09-27 18:14:53 +00:00
Marco Slot	2868e02a3d	Implement SELECT function call delegation. When a function is marked as colocated with a distributed table, we try delegating queries of kind "SELECT func(...)" to workers. We currently only support this simple form, and don't delegate forms like "SELECT f1(...), f2(...)", "SELECT f1(...) FROM ...", or function calls inside transactions. As a side effect, we also fix the transactional semantics of DO blocks. Previously we didn't consider a DO block a multi-statement transaction. Now we do. Co-authored-by: Marco Slot <marco@citusdata.com> Co-authored-by: serprex <serprex@users.noreply.github.com> Co-authored-by: pykello <hadi.moshayedi@microsoft.com>	2019-09-27 09:13:25 -07:00
Marco Slot	32a11bdf6c	Return early for common commands in the utility hook (#3031 ) We started copying parse trees by default further on in `multi_ProcessUtility`. That's not a problem for maintenance command, but might register for things like `PREPARE` and `EXECUTE`, which might happen thousands of times per second. Add a few common commands to the check at the start.	2019-09-26 11:43:35 +02:00
Philip Dubé	4f60e3a149	Feedback	2019-09-24 17:31:09 +00:00
Marco Slot	ca478defeb	Deparse CALL statement instead of using original query string	2019-09-24 17:31:09 +00:00
Philip Dubé	90e1f1442a	Annotated tests for multi_mx_call. Co-authored-by: pykello <hadi.moshayedi@microsoft.com>	2019-09-24 17:31:09 +00:00
Marco Slot	e269d990c9	Cast the distribution argument value when possible	2019-09-24 17:31:09 +00:00
Philip Dubé	432a8ef85b	Hadi's feedback Co-authored-by: pykello <hadi.moshayedi@microsoft.com> Co-authored-by: serprex <serprex@users.noreply.github.com>	2019-09-24 17:31:09 +00:00
Philip Dubé	bc1ad67eb5	Distribute CALL on distributed procedures to metadata workers Lots taken from https://github.com/citusdata/citus/pull/2829	2019-09-24 17:31:09 +00:00
Onder Kalaci	18de78f386	Relax the colocation checks for distributed functions As long as the types can be coerced, it is safe to pushdown functions.	2019-09-24 16:31:08 +02:00
Philip Dubé	06faba91c0	Include ifdefs for pg12 API changes, update local_shard_executiuon test to avoid CTE inlining	2019-09-23 20:22:35 +00:00
Onder Kalaci	d37745bfc7	Sync metadata to worker nodes after create_distributed_function Since the distributed functions are useful when the workers have metadata, we automatically sync it. Also, after master_add_node(). We do it lazily and let the deamon sync it. That's mainly because the metadata syncing cannot be done in transaction blocks, and we don't want to add lots of transactional limitations to master_add_node() and create_distributed_function().	2019-09-23 18:30:53 +02:00
Onder Kalaci	d7e2968120	Add parameters to create_distributed_function() With this commit, we're changing the API for create_distributed_function() such that users can provide the distribution argument and the colocation information.	2019-09-22 21:53:33 +02:00
Hanefi Onaldi	ed11b9590c	Add distributed func creation queries in dependency replication logic	2019-09-18 20:07:45 +03:00
Nils Dijk	db5d03931d	Feature disable object propagation (#2986 ) DESCRIPTION: Provide a GUC to turn of the new dependency propagation functionality In the case the dependency propagation functionality introduced in 9.0 causes issues to a cluster of a user they can turn it off almost completely. The only dependency that will still be propagated and kept track of is the schema to emulate the old behaviour. GUC to change is `citus.enable_object_propagation`. When set to `false` the functionality will be mostly turned off. Be aware that objects marked as distributed in `pg_dist_object` will still be kept in the catalog as a distributed object. Alter statements to these objects will not be propagated to workers and may cause desynchronisation.	2019-09-18 17:16:22 +02:00
Nils Dijk	2b7f5552c8	Fix: rename remote type on conflict (#2983 ) DESCRIPTION: Rename remote types during type propagation To prevent data to be destructed when a remote type differs from the type on the coordinator during type propagation we wanted to rename the type instead of `DROP CASCADE`. This patch removes the `DROP` logic and adds the creation of a rename statement to a free name.	2019-09-17 18:54:10 +02:00
Nils Dijk	0a3152d09c	Add feature flag to turn off create type propagation (#2982 ) DESCRIPTION: Add feature flag to turn off create type propagation When `citus.enable_create_type_propagation` is set to `false` citus will not propagate `CREATE TYPE` statements to the workers. Types are still distributed when tables that depend on these types are distributed.	2019-09-17 15:50:06 +02:00
Hanefi Onaldi	8f2a3a0604	Introduce create_distributed_function(regproc) UDF (#2961 ) This PR aims to add the minimal set of changes required to start distributing functions. You can use create_distributed_function(regproc) UDF to distribute a function. SELECT create_distributed_function('add(int,int)'); The function definition should include the param types to properly identify the correct function that we wish to distribute	2019-09-13 23:27:46 +03:00
Philip Dubé	492d1b2cba	ActivePrimaryNodeList: add lockMode parameter	2019-09-13 17:44:56 +00:00
Nils Dijk	2879689441	Distribute Types to worker nodes (#2893 ) DESCRIPTION: Distribute Types to worker nodes When to propagate ============== There are two logical moments that types could be distributed to the worker nodes - When they get used ( just in time distribution ) - When they get created ( proactive distribution ) The just in time distribution follows the model used by how schema's get created right before we are going to create a table in that schema, for types this would be when the table uses a type as its column. The proactive distribution is suitable for situations where it is benificial to have the type on the worker nodes directly. They can later on be used in queries where an intermediate result gets created with a cast to this type. Just in time creation is always the last resort, you cannot create a distributed table before the type gets created. A good example use case is; you have an existing postgres server that needs to scale out. By adding the citus extension, add some nodes to the cluster, and distribute the table. The type got created before citus existed. There was no moment where citus could have propagated the creation of a type. Proactive is almost always a good option. Types are not resource intensive objects, there is no performance overhead of having 100's of types. If you want to use them in a query to represent an intermediate result (which happens in our test suite) they just work. There is however a moment when proactive type distribution is not beneficial; in transactions where the type is used in a distributed table. Lets assume the following transaction: ```sql BEGIN; CREATE TYPE tt1 AS (a int, b int); CREATE TABLE t1 AS (a int PRIMARY KEY, b tt1); SELECT create_distributed_table('t1', 'a'); \copy t1 FROM bigdata.csv ``` Types are node scoped objects; meaning the type exists once per worker. Shards however have best performance when they are created over their own connection. For the type to be visible on all connections it needs to be created and committed before we try to create the shards. Here the just in time situation is most beneficial and follows how we create schema's on the workers. Outside of a transaction block we will just use 1 connection to propagate the creation. How propagation works ================= Just in time ----------- Just in time propagation hooks into the infrastructure introduced in #2882. It adds types as a supported object in `SupportedDependencyByCitus`. This will make sure that any object being distributed by citus that depends on types will now cascade into types. When types are depending them self on other objects they will get created first. Creation later works by getting the ddl commands to create the object by its `ObjectAddress` in `GetDependencyCreateDDLCommands` which will dispatch types to `CreateTypeDDLCommandsIdempotent`. For the correct walking of the graph we follow array types, when later asked for the ddl commands for array types we return `NIL` (empty list) which makes that the object will not be recorded as distributed, (its an internal type, dependant on the user type). Proactive distribution --------------------- When the user creates a type (composite or enum) we will have a hook running in `multi_ProcessUtility` after the command has been applied locally. Running after running locally makes that we already have an `ObjectAddress` for the type. This is required to mark the type as being distributed. Keeping the type up to date ==================== For types that are recorded in `pg_dist_object` (eg. `IsObjectDistributed` returns true for the `ObjectAddress`) we will intercept the utility commands that alter the type. - `AlterTableStmt` with `relkind` set to `OBJECT_TYPE` encapsulate changes to the fields of a composite type. - `DropStmt` with removeType set to `OBJECT_TYPE` encapsulate `DROP TYPE`. - `AlterEnumStmt` encapsulates changes to enum values. Enum types can not be changed transactionally. When the execution on a worker fails a warning will be shown to the user the propagation was incomplete due to worker communication failure. An idempotent command is shown for the user to re-execute when the worker communication is fixed. Keeping types up to date is done via the executor. Before the statement is executed locally we create a plan on how to apply it on the workers. This plan is executed after we have applied the statement locally. All changes to types need to be done in the same transaction for types that have already been distributed and will fail with an error if parallel queries have already been executed in the same transaction. Much like foreign keys to reference tables.	2019-09-13 17:46:07 +02:00
Nils Dijk	05f0668cdc	Fix: schema leak onto create index statement cache (#2964 ) DESCRIPTION: Fix schema leak on CREATE INDEX statement When a CREATE INDEX is cached between execution we might leak the schema name onto the cached statement of an earlier execution preventing the right index to be created. Even though the cache is cleared when the search_path changes we can trigger this behaviour by having the schema already on the search path before a colliding table is created in a schema earlier on the `search_path`. When calling an unqualified create index via a function (used to trigger the caching behaviour) we see that the index is created on the wrong table after the schema leaked onto the statement. By copying the complete `PlannedStmt` and `utilityStmt` during our planning phase for distributed ddls we make sure we are not leaking the schema name onto a cached data structure. Caveat; COPY statements already have a lot of parsestree copying ongoing without directly putting it back on the `pstmt`. We should verify that copies modify the statement and potentially copy the complete `pstmt` there already.	2019-09-13 14:04:23 +02:00
Onder Kalaci	0b0c779c77	Introduce the concept of Local Execution /* * local_executor.c * * The scope of the local execution is locally executing the queries on the * shards. In other words, local execution does not deal with any local tables * that are not shards on the node that the query is being executed. In that sense, * the local executor is only triggered if the node has both the metadata and the * shards (e.g., only Citus MX worker nodes). * * The goal of the local execution is to skip the unnecessary network round-trip * happening on the node itself. Instead, identify the locally executable tasks and * simply call PostgreSQL's planner and executor. * * The local executor is an extension of the adaptive executor. So, the executor uses * adaptive executor's custom scan nodes. * * One thing to note that Citus MX is only supported with replication factor = 1, so * keep that in mind while continuing the comments below. * * On the high level, there are 3 slightly different ways of utilizing local execution: * * (1) Execution of local single shard queries of a distributed table * * This is the simplest case. The executor kicks at the start of the adaptive * executor, and since the query is only a single task the execution finishes * without going to the network at all. * * Even if there is a transaction block (or recursively planned CTEs), as long * as the queries hit the shards on the same, the local execution will kick in. * * (2) Execution of local single queries and remote multi-shard queries * * The rule is simple. If a transaction block starts with a local query execution, * all the other queries in the same transaction block that touch any local shard * have to use the local execution. Although this sounds restrictive, we prefer to * implement in this way, otherwise we'd end-up with as complex scenarious as we * have in the connection managements due to foreign keys. * * See the following example: * BEGIN; * -- assume that the query is executed locally * SELECT count() FROM test WHERE key = 1; * -- at this point, all the shards that reside on the * -- node is executed locally one-by-one. After those finishes * -- the remaining tasks are handled by adaptive executor * SELECT count() FROM test; * * (3) Modifications of reference tables * * Modifications to reference tables have to be executed on all nodes. So, after the * local execution, the adaptive executor keeps continuing the execution on the other * nodes. * * Note that for read-only queries, after the local execution, there is no need to * kick in adaptive executor. * * There are also few limitations/trade-offs that is worth mentioning. First, the * local execution on multiple shards might be slow because the execution has to * happen one task at a time (e.g., no parallelism). Second, if a transaction * block/CTE starts with a multi-shard command, we do not use local query execution * since local execution is sequential. Basically, we do not want to lose parallelism * across local tasks by switching to local execution. Third, the local execution * currently only supports queries. In other words, any utility commands like TRUNCATE, * fails if the command is executed after a local execution inside a transaction block. * Forth, the local execution cannot be mixed with the executors other than adaptive, * namely task-tracker, real-time and router executors. Finally, related with the * previous item, COPY command cannot be mixed with local execution in a transaction. * The implication of that any part of INSERT..SELECT via coordinator cannot happen * via the local execution. */	2019-09-12 11:51:25 +02:00
Philip Dubé	bdd30bb181	Don't allow distributing by a generated column	2019-09-04 14:50:17 +00:00
Philip Dubé	41dca121e2	Support GENERATE ALWAYS AS STORED	2019-09-04 14:50:17 +00:00
Nils Dijk	936d546a3c	Refactor Ensure Schema Exists to Ensure Dependecies Exists (#2882 ) DESCRIPTION: Refactor ensure schema exists to dependency exists Historically we only supported schema's as table dependencies to be created on the workers before a table gets distributed. This PR puts infrastructure in place to walk pg_depend to figure out which dependencies to create on the workers. Currently only schema's are supported as objects to create before creating a table. We also keep track of dependencies that have been created in the cluster. When we add a new node to the cluster we use this catalog to know which objects need to be created on the worker. Side effect of knowing which objects are already distributed is that we don't have debug messages anymore when creating schema's that are already created on the workers.	2019-09-04 14:10:20 +02:00
Philip Dubé	693d4695d7	Create a test 'pg12' for pg12 features & error on unsupported new features Unsupported new features: COPY FROM WHERE, GENERATED ALWAYS AS, non-heap table access methods	2019-08-22 19:30:56 +00:00
Philip Dubé	fe10ca453d	Implement FileCompat to abstract pg12 requiring API consumer to track file offsets	2019-08-22 18:57:47 +00:00
Philip Dubé	018ad1c58e	pg12: version_compat.h, tuples, oids, misc	2019-08-22 18:57:23 +00:00
Philip Dubé	9643ff580e	Update commands/vacuum.c with pg12 changes Adds support for SKIP_LOCKED, INDEX_CLEANUP, TRUNCATE Removes broken assert	2019-08-22 18:56:54 +00:00
Philip Dubé	68c4b71f93	Fix up includes with pg12 changes	2019-08-22 18:56:21 +00:00
Hadi Moshayedi	6be1bacddd	Fix distributed deadlock for TRUNCATE	2019-08-22 11:03:53 -07:00
Hadi Moshayedi	a5b087c89b	Support FKs between reference tables	2019-08-21 16:11:27 -07:00
Philip Dubé	7bf7e41594	commands/index.c: Fix assertion typo	2019-08-21 18:54:05 +00:00
Philip Dubé	f4b90419ae	Raise an error when REINDEX TABLE or INDEX is invoked on a distributed relation	2019-08-21 17:03:14 +00:00
Hadi Moshayedi	c582eb89c8	Add some missing locks.	2019-08-15 12:34:31 -07:00
Philip Dubé	705d1bf0e0	Use PG_JOB_CACHE_DIR	2019-08-09 15:25:59 +00:00
Onder Kalaci	060ac11476	Do not record relation accessess unnecessarily Before this commit, we've recorded the relation accesses in 3 different places - FindPlacementListConnection -- applies all executor in tx block - StartPlacementExecutionOnSession() -- adaptive executor only - StartPlacementListConnection() -- router/real-time only This is different than Citus 8.2, and could lead to query execution times increase considerably on multi-shard commands in transaction block that are on partitioned tables. Benchmarks: ``` 1+8 c5.4xlarge cluster Empty distributed partitioned table with 365 partitions: https://gist.github.com/onderkalaci/1edace4ed6bd6f061c8a15594865bb51#file-partitions_365-sql ./pgbench -f /tmp/multi_shard.sql -c10 -j10 -P 1 -T 120 postgres://citus:w3r6KLJpv3mxe9E-NIUeJw@c.fy5fkjcv45vcepaogqcaskmmkee.db.citusdata.com:5432/citus?sslmode=require cat /tmp/multi_shard.sql BEGIN; DELETE FROM collections_list; DELETE FROM collections_list; DELETE FROM collections_list; COMMIT; cat /tmp/single_shard.sql BEGIN; DELETE FROM collections_list WHERE key = :aid; DELETE FROM collections_list WHERE key = :aid; DELETE FROM collections_list WHERE key = :aid; COMMIT; cat /tmp/mix.sql BEGIN; DELETE FROM collections_list WHERE key = :aid; DELETE FROM collections_list WHERE key = :aid; DELETE FROM collections_list WHERE key = :aid; DELETE FROM collections_list; DELETE FROM collections_list; DELETE FROM collections_list; COMMIT; ``` The table shows `latency average` of pgbench runs explained above, so we have a pretty solid improvement even over 8.2.2. \| Test \| Citus 8.2.2 \| Citus 8.3.1 \| Citus 8.3.2 (this branch) \| Citus 8.3.1 (FKEYs disabled via GUC) \| \| ------------- \| ------------- \| ------------- \|------------- \| ------------- \| \|multi_shard \| 2370.083 ms \|3605.040 ms \|1324.094 ms \|1247.255 ms \| \| single_shard \| 85.338 ms \|120.934 ms \|73.216 ms \| 78.765 ms \| \| mix \| 2434.459 ms \| 3727.080 ms \|1306.456 ms \| 1280.326 ms \|	2019-08-08 18:42:08 +02:00
Onder Kalaci	35ee896f3d	Get rid of an unnecessary parameter targetPoolSize parameter for ExecuteUtilityTaskListWithoutResults becomes obsolete, just remove it.	2019-08-07 19:35:56 +02:00
Onder Kalaci	b2e01d0745	Refactor switching to sequential mode We don't need to wait until the execution. As soon as we realize that we need sequential execution, we should do it.	2019-08-07 19:35:56 +02:00
Hadi Moshayedi	032167c553	Fix Assert() in ProcessVariableSetStmt()	2019-07-05 14:11:22 -07:00
Marco Slot	07d2266e11	Fix RESET and other types of SET	2019-07-05 19:30:48 +02:00
Önder Kalacı	40da78c6fd	Introduce the adaptive executor (#2798 ) With this commit, we're introducing the Adaptive Executor. The commit message consists of two distinct sections. The first part explains how the executor works. The second part consists of the commit messages of the individual smaller commits that resulted in this commit. The readers can search for the each of the smaller commit messages on https://github.com/citusdata/citus and can learn more about the history of the change. /------------------------------------------------------------------------- * adaptive_executor.c * * The adaptive executor executes a list of tasks (queries on shards) over * a connection pool per worker node. The results of the queries, if any, * are written to a tuple store. * * The concepts in the executor are modelled in a set of structs: * * - DistributedExecution: * Execution of a Task list over a set of WorkerPools. * - WorkerPool * Pool of WorkerSessions for the same worker which opportunistically * executes "unassigned" tasks from a queue. * - WorkerSession: * Connection to a worker that is used to execute "assigned" tasks * from a queue and may execute unasssigned tasks from the WorkerPool. * - ShardCommandExecution: * Execution of a Task across a list of placements. * - TaskPlacementExecution: * Execution of a Task on a specific placement. * Used in the WorkerPool and WorkerSession queues. * * Every connection pool (WorkerPool) and every connection (WorkerSession) * have a queue of tasks that are ready to execute (readyTaskQueue) and a * queue/set of pending tasks that may become ready later in the execution * (pendingTaskQueue). The tasks are wrapped in a ShardCommandExecution, * which keeps track of the state of execution and is referenced from a * TaskPlacementExecution, which is the data structure that is actually * added to the queues and describes the state of the execution of a task * on a particular worker node. * * When the task list is part of a bigger distributed transaction, the * shards that are accessed or modified by the task may have already been * accessed earlier in the transaction. We need to make sure we use the * same connection since it may hold relevant locks or have uncommitted * writes. In that case we "assign" the task to a connection by adding * it to the task queue of specific connection (in * AssignTasksToConnections). Otherwise we consider the task unassigned * and add it to the task queue of a worker pool, which means that it * can be executed over any connection in the pool. * * A task may be executed on multiple placements in case of a reference * table or a replicated distributed table. Depending on the type of * task, it may not be ready to be executed on a worker node immediately. * For instance, INSERTs on a reference table are executed serially across * placements to avoid deadlocks when concurrent INSERTs take conflicting * locks. At the beginning, only the "first" placement is ready to execute * and therefore added to the readyTaskQueue in the pool or connection. * The remaining placements are added to the pendingTaskQueue. Once * execution on the first placement is done the second placement moves * from pendingTaskQueue to readyTaskQueue. The same approach is used to * fail over read-only tasks to another placement. * * Once all the tasks are added to a queue, the main loop in * RunDistributedExecution repeatedly does the following: * * For each pool: * - ManageWorkPool evaluates whether to open additional connections * based on the number unassigned tasks that are ready to execute * and the targetPoolSize of the execution. * * Poll all connections: * - We use a WaitEventSet that contains all (non-failed) connections * and is rebuilt whenever the set of active connections or any of * their wait flags change. * * We almost always check for WL_SOCKET_READABLE because a session * can emit notices at any time during execution, but it will only * wake up WaitEventSetWait when there are actual bytes to read. * * We check for WL_SOCKET_WRITEABLE just after sending bytes in case * there is not enough space in the TCP buffer. Since a socket is * almost always writable we also use WL_SOCKET_WRITEABLE as a * mechanism to wake up WaitEventSetWait for non-I/O events, e.g. * when a task moves from pending to ready. * * For each connection that is ready: * - ConnectionStateMachine handles connection establishment and failure * as well as command execution via TransactionStateMachine. * * When a connection is ready to execute a new task, it first checks its * own readyTaskQueue and otherwise takes a task from the worker pool's * readyTaskQueue (on a first-come-first-serve basis). * * In cases where the tasks finish quickly (e.g. <1ms), a single * connection will often be sufficient to finish all tasks. It is * therefore not necessary that all connections are established * successfully or open a transaction (which may be blocked by an * intermediate pgbouncer in transaction pooling mode). It is therefore * essential that we take a task from the queue only after opening a * transaction block. * * When a command on a worker finishes or the connection is lost, we call * PlacementExecutionDone, which then updates the state of the task * based on whether we need to run it on other placements. When a * connection fails or all connections to a worker fail, we also call * PlacementExecutionDone for all queued tasks to try the next placement * and, if necessary, mark shard placements as inactive. If a task fails * to execute on all placements, the execution fails and the distributed * transaction rolls back. * * For multi-row INSERTs, tasks are executed sequentially by * SequentialRunDistributedExecution instead of in parallel, which allows * a high degree of concurrency without high risk of deadlocks. * Conversely, multi-row UPDATE/DELETE/DDL commands take aggressive locks * which forbids concurrency, but allows parallelism without high risk * of deadlocks. Note that this is unrelated to SEQUENTIAL_CONNECTION, * which indicates that we should use at most one connection per node, but * can run tasks in parallel across nodes. This is used when there are * writes to a reference table that has foreign keys from a distributed * table. * * Execution finishes when all tasks are done, the query errors out, or * the user cancels the query. * ------------------------------------------------------------------------- / All the commits involved here: * Initial unified executor prototype * Latest changes * Fix rebase conflicts to master branch * Add missing variable for assertion * Ensure that master_modify_multiple_shards() returns the affectedTupleCount * Adjust intermediate result sizes The real-time executor uses COPY command to get the results from the worker nodes. Unified executor avoids that which results in less data transfer. Simply adjust the tests to lower sizes. * Force one connection per placement (or co-located placements) when requested The existing executors (real-time and router) always open 1 connection per placement when parallel execution is requested. That might be useful under certain circumstances: (a) User wants to utilize as much as CPUs on the workers per distributed query (b) User has a transaction block which involves COPY command Also, lots of regression tests rely on this execution semantics. So, we'd enable few of the tests with this change as well. * For parameters to be resolved before using them For the details, see PostgreSQL's copyParamList() * Unified executor sorts the returning output * Ensure that unified executor doesn't ignore sequential execution of DDLJob's Certain DDL commands, mainly creating foreign keys to reference tables, should be executed sequentially. Otherwise, we'd end up with a self distributed deadlock. To overcome this situaiton, we set a flag `DDLJob->executeSequentially` and execute it sequentially. Note that we have to do this because the command might not be called within a transaction block, and we cannot call `SetLocalMultiShardModifyModeToSequential()`. This fixes at least two test: multi_insert_select_on_conflit.sql and multi_foreign_key.sql Also, I wouldn't mind scattering local `targetPoolSize` variables within the code. The reason is that we'll soon have a GUC (or a global variable based on a GUC) that'd set the pool size. In that case, we'd simply replace `targetPoolSize` with the global variables. * Fix 2PC conditions for DDL tasks * Improve closing connections that are not fully established in unified execution * Support foreign keys to reference tables in unified executor The idea for supporting foreign keys to reference tables is simple: Keep track of the relation accesses within a transaction block. - If a parallel access happens on a distributed table which has a foreign key to a reference table, one cannot modify the reference table in the same transaction. Otherwise, we're very likely to end-up with a self-distributed deadlock. - If an access to a reference table happens, and then a parallel access to a distributed table (which has a fkey to the reference table) happens, we switch to sequential mode. Unified executor misses the function calls that marks the relation accesses during the execution. Thus, simply add the necessary calls and let the logic kick in. * Make sure to close the failed connections after the execution * Improve comments * Fix savepoints in unified executor. * Rebuild the WaitEventSet only when necessary * Unclaim connections on all errors. * Improve failure handling for unified executor - Implement the notion of errorOnAnyFailure. This is similar to Critical Connections that the connection managament APIs provide - If the nodes inside a modifying transaction expand, activate 2PC - Fix few bugs related to wait event sets - Mark placement INACTIVE during the execution as much as possible as opposed to we do in the COMMIT handler - Fix few bugs related to scheduling next placement executions - Improve decision on when to use 2PC Improve the logic to start a transaction block for distributed transactions - Make sure that only reference table modifications are always executed with distributed transactions - Make sure that stored procedures and functions are executed with distributed transactions * Move waitEventSet to DistributedExecution This could also be local to RunDistributedExecution(), but in that case we had to mark it as "volatile" to avoid PG_TRY()/PG_CATCH() issues, and cast it to non-volatile when doing WaitEventSetFree(). We thought that would make code a bit harder to read than making this non-local, so we move it here. See comments for PG_TRY() in postgres/src/include/elog.h and "man 3 siglongjmp" for more context. * Fix multi_insert_select test outputs Two things: 1) One complex transaction block is now supported. Simply update the test output 2) Due to dynamic nature of the unified executor, the orders of the errors coming from the shards might change (e.g., all of the queries on the shards would fail, but which one appears on the error message?). To fix that, we simply added it to our shardId normalization tool which happens just before diff. * Fix subeury_and_cte test The error message is updated from: failed to execute task To: more than one row returned by a subquery or an expression which is a lot clearer to the user. * Fix intermediate_results test outputs Simply update the error message from: could not receive query results to result "squares" does not exist which makes a lot more sense. * Fix multi_function_in_join test The error messages update from: Failed to execute task XXX To: function f(..) does not exist * Fix multi_query_directory_cleanup test The unified executor does not create any intermediate files. * Fix with_transactions test A test case that just started to work fine * Fix multi_router_planner test outputs The error message is update from: Could not receive query results To: Relation does not exists which is a lot more clearer for the users * Fix multi_router_planner_fast_path test The error message is update from: Could not receive query results To: Relation does not exists which is a lot more clearer for the users * Fix isolation_copy_placement_vs_modification by disabling select_opens_transaction_block * Fix ordering in isolation_multi_shard_modify_vs_all * Add executor locks to unified executor * Make sure to allocate enought WaitEvents The previous code was missing the waitEvents for the latch and postmaster death. * Fix rebase conflicts for master rebase * Make sure that TRUNCATE relies on unified executor * Implement true sequential execution for multi-row INSERTS Execute the individual tasks executed one by one. Note that this is different than MultiShardConnectionType == SEQUENTIAL_CONNECTION case (e.g., sequential execution mode). In that case, running the tasks across the nodes in parallel is acceptable and implemented in that way. However, the executions that are qualified here would perform poorly if the tasks across the workers are executed in parallel. We currently qualify only one class of distributed queries here, multi-row INSERTs. If we do not enforce true sequential execution, concurrent multi-row upserts could easily form a distributed deadlock when the upserts touch the same rows. * Remove SESSION_LIFESPAN flag in unified_executor * Apply failure test updates We've changed the failure behaviour a bit, and also the error messages that show up to the user. This PR covers majority of the updates. * Unified executor honors citus.node_connection_timeout With this commit, unified executor errors out if even a single connection cannot be established within citus.node_connection_timeout. And, as a side effect this fixes failure_connection_establishment test. * Properly increment/decrement pool size variables Before this commit, the idle and active connection counts were not properly calculated. * insert_select_executor goes through unified executor. * Add missing file for task tracker * Modify ExecuteTaskListExtended()'s signature * Sort output of INSERT ... SELECT ... RETURNING * Take partition locks correctly in unified executor * Alternative implementation for force_max_query_parallelization * Fix compile warnings in unified executor * Fix style issues * Decrement idleConnectionCount when idle connection is lost * Always rebuild the wait event sets In the previous implementation, on waitFlag changes, we were only modifying the wait events. However, we've realized that it might be an over optimization since (a) we couldn't see any performance benefits (b) we see some errors on failures and because of (a) we prefer to disable it now. * Make sure to allocate enough sized waitEventSet With multi-row INSERTs, we might have more sessions than taskworkerCount after few calls of RunDistributedExecution() because the previous sessions would also be alive. Instead, re-allocate events when the connectino set changes. Implement SELECT FOR UPDATE on reference tables On master branch, we do two extra things on SELECT FOR UPDATE queries on reference tables: - Acquire executor locks - Execute the query on all replicas With this commit, we're implementing the same logic on the new executor. * SELECT FOR UPDATE opens transaction block even if SelectOpensTransactionBlock disabled Otherwise, users would be very confused and their logic is very likely to break. * Fix build error * Fix the newConnectionCount calculation in ManageWorkerPool * Fix rebase conflicts * Fix minor test output differences * Fix citus indent * Remove duplicate sorts that is added with rebase * Create distributed table via executor * Fix wait flags in CheckConnectionReady * failure_savepoints output for unified executor. * failure_vacuum output (pg 10) for unified executor. * Fix WaitEventSetWait timeout in unified executor * Stabilize failure_truncate test output * Add an ORDER BY to multi_upsert * Fix regression test outputs after rebase to master * Add executor.c comment * Rename executor.c to adaptive_executor.c * Do not schedule tasks if the failed placement is not ready to execute Before the commit, we were blindly scheduling the next placement executions even if the failed placement is not on the ready queue. Now, we're ensuring that if failed placement execution is on a failed pool or session where the execution is on the pendingQueue, we do not schedule the next task. Because the other placement execution should be already running. * Implement a proper custom scan node for adaptive executor - Switch between the executors, add GUC to set the pool size - Add non-adaptive regression test suites - Enable CIRCLE CI for non-adaptive tests - Adjust test output files * Add slow start interval to the executor * Expose max_cached_connection_per_worker to user * Do not start slow when there are cached connections * Consider ExecutorSlowStartInterval in NextEventTimeout * Fix memory issues with ReceiveResults(). * Disable executor via TaskExecutorType * Make sure to execute the tests with the other executor * Use task_executor_type to enable-disable adaptive executor * Remove useless code * Adjust the regression tests * Add slow start regression test * Rebase to master * Fix test failures in adaptive executor. * Rebase to master - 2 * Improve comments & debug messages * Set force_max_query_parallelization in isolation_citus_dist_activity * Force max parallelization for creating shards when asked to use exclusive connection. * Adjust the default pool size * Expand description of max_adaptive_executor_pool_size GUC * Update warnings in FinishRemoteTransactionCommit() * Improve session clean up at the end of execution Explicitly list all the states that the execution might end, otherwise warn. * Remove MULTI_CONNECTION_WAIT_RETRY which is not used at all * Add more ORDER BYs to multi_mx_partitioning	2019-06-28 14:04:40 +02:00
Hanefi Onaldi	7e8fd49b94	Create Schemas as superuser on all shard/table creation UDFs - All the schema creations on the workers will now be via superuser connections - If a shard is being repaired or a shard is replicated, we will create the schema only in the relevant worker; and in all the other cases where a schema creation is needed, we will block operations until we ensure the schema exists in all the workers	2019-06-26 17:12:28 +02:00
Jason Petersen	d4e1172247	Implement propagation of SET LOCAL commands Adds support for propagation of SET LOCAL commands to all workers involved in a query. For now, SET SESSION (i.e. plain SET) is not supported whatsoever, though this code is intended as somewhat of a base for implementing such support in the future. As SET LOCAL modifications are scoped to the body of a BEGIN/END xact block, queries wishing to use SET LOCAL propagation must be within such a block. In addition, subsequent modifications after e.g. any SAVEPOINT or ROLLBACK statements will correspondingly push or pop variable mod- ifications onto an internal stack such that the behavior of changed values across the cluster will be identical to such behavior on e.g. single-node PostgreSQL (or equivalently, what values are visible to the end user by running SHOW on such variables on the coordinator). If nodes enter the set of participants at some point after SET LOCAL modifications (or SAVEPOINT, ROLLBACK, etc.) have occurred, the SET variable state is eagerly propagated to them upon their entrance (this is identical to, and indeed just augments, the existing logic for the propagation of the SAVEPOINT "stack"). A new GUC (citus.propagate_set_commands) has been added to control this behavior. Though the code suggests the valid settings are 'none', 'local', 'session', and 'all', only 'none' (the default) and 'local' are presently implemented: attempting to use other values will result in an error.	2019-06-20 16:15:43 -07:00
Hadi Moshayedi	4bbae02778	Make COPY compatible with unified executor.	2019-06-20 19:53:40 +02:00
Philip Dubé	4bfcf5b665	Enable Werror for all warnings Changes to ruleutils match changes made upstream to silence gcc fallthrough warnings	2019-06-18 14:43:54 -07:00
Hadi Moshayedi	dee5bc31b4	Refactor ShardIdForTuple() to a separate function.	2019-06-02 09:48:15 -07:00
Philip Dubé	b8871d9ff4	Propagate more ALTER FOREIGN TABLE to workers	2019-05-24 12:54:05 -07:00
Philip Dubé	16886b3c63	Fix misc typos	2019-05-23 17:23:27 -07:00
Hadi Moshayedi	8ae47e1244	Fix comments for RemoteFileDestReceiverStartup and CitusCopyDestReceiverStartup	2019-05-21 09:03:22 -07:00
Hadi Moshayedi	b5c0ca45f1	Remove stopOnFailure flag from EndRemoteCopy()	2019-05-11 06:18:34 -07:00
Hadi Moshayedi	32ecb6884c	Test ROLLBACK TO SAVEPOINT with multi-shard CTE failures	2019-05-01 09:33:43 -07:00
Hadi Moshayedi	aafd22dffa	Fix savepoint rollback for INSERT INTO ... SELECT.	2019-05-01 09:33:43 -07:00
Jason Petersen	71d5d1c865	Enable variable shadowing warnings; fix all Rather than wait for another place like the previous commit to bite us, I think we should turn on this warning.	2019-04-30 13:24:25 -06:00
Jason Petersen	1125fc9da0	Fix self-strncmp in ConstrIsFKToReferenceTable Make the function do what I assume was intended.	2019-04-30 13:24:25 -06:00
Marco Slot	0ea4e52df5	Add nodeId to shardPlacements and use it for shard placement comparisons Before this commit, shardPlacements were identified with shardId, nodeName and nodeport. Instead of using nodeName and nodePort, we now use nodeId since it apparently has performance benefits in several places in the code.	2019-03-20 12:14:46 +03:00
Marco Slot	f2abf2b8e5	Functions are treated as transaction blocks	2019-03-15 16:34:08 -06:00
Hadi Moshayedi	a9e6d06a98	Skip execution of ALTER TABLE constraint checks on the coordinator	2019-03-14 15:40:56 -07:00
Hadi Moshayedi	cdd3b15ac8	Fix distributed deadlock for ALTER TABLE ... ATTACH PARTITION. Following scenario resulted in distributed deadlock before this commit: CREATE TABLE partitioning_test(id int, time date) PARTITION BY RANGE (time); CREATE TABLE partitioning_test_2009 (LIKE partitioning_test); CREATE TABLE partitioning_test_reference(id int PRIMARY KEY, subid int); SELECT create_distributed_table('partitioning_test_2009', 'id'), create_distributed_table('partitioning_test', 'id'), create_reference_table('partitioning_test_reference'); ALTER TABLE partitioning_test ADD CONSTRAINT partitioning_reference_fkey FOREIGN KEY (id) REFERENCES partitioning_test_reference(id) ON DELETE CASCADE; ALTER TABLE partitioning_test_2009 ADD CONSTRAINT partitioning_reference_fkey_2009 FOREIGN KEY (id) REFERENCES partitioning_test_reference(id) ON DELETE CASCADE; ALTER TABLE partitioning_test ATTACH PARTITION partitioning_test_2009 FOR VALUES FROM ('2009-01-01') TO ('2010-01-01');	2019-03-14 15:28:37 -07:00
Hadi Moshayedi	f19feb742c	Remove never assigned colocatedRelation from CreateDistributedTable (#2479 )	2019-03-12 14:50:18 -07:00
Murat Tuncer	cd5213abee	Set sequential mode execution GUC for alter partitioned table PG recently started propagating foreign key constraints to partition tables. This came with a select query to validate the the constaint. We are already setting sequential mode execution for this command. In order for validation select query to respect this setting we need to explicitly set the GUC. This commit also handles detach partition part.	2019-01-25 15:28:07 +03:00
Jason Petersen	339e6e661e	Remove 9.6 (#2554 ) Removes support and code for PostgreSQL 9.6 cr: @velioglu	2019-01-16 13:11:24 -07:00
Marco Slot	1b1c6374f7	Execute CREATE INDEX CONCURRENTLY concurrently	2018-12-21 14:02:59 -07:00
Marco Slot	5b9376a7f8	Check ownership before taking locks in distributed table creation	2018-12-18 15:32:07 +01:00
Marco Slot	9cf91c438b	Only allow transmit from pgsql_job_cache directory	2018-12-05 10:18:27 +01:00
Marco Slot	8893cc141d	Support INSERT...SELECT with ON CONFLICT or RETURNING via coordinator Before this commit, Citus supported INSERT...SELECT queries with ON CONFLICT or RETURNING clauses only for pushdownable ones, since queries supported via coordinator were utilizing COPY infrastructure of PG to send selected tuples to the target worker nodes. After this PR, INSERT...SELECT queries with ON CONFLICT or RETURNING clauses will be performed in two phases via coordinator. In the first phase selected tuples will be saved to the intermediate table which is colocated with target table of the INSERT...SELECT query. Note that, a utility function to save results to the colocated intermediate result also implemented as a part of this commit. In the second phase, INSERT.. SELECT query is directly run on the worker node using the intermediate table as the source table.	2018-11-30 15:29:12 +03:00
Hanefi Onaldi	7db6991dc0	propagate validate queries to workers	2018-11-26 14:04:51 +03:00
Marco Slot	6aa5592e52	Add user ID suffix to intermediate files in re-partition jobs	2018-11-23 08:36:11 +01:00
Marco Slot	caf402d506	COPY to a task file no longer switches to superuser	2018-11-22 18:15:33 +01:00
Onder Kalaci	7f0a57a153	Make sure to prevent unauthorized users to drop tables in Citus MX	2018-11-15 18:07:03 +03:00
Marco Slot	f383e4f307	Description: Refactor code that handles DDL commands from one file into a module The file handling the utility functions (DDL) for citus organically grew over time and became unreasonably large. This refactor takes that file and refactored the functionality into separate files per command. Initially modeled after the directory and file layout that can be found in postgres. Although the size of the change is quite big there are barely any code changes. Only one two functions have been added for readability purposes: - PostProcessIndexStmt which is extracted from PostProcessUtility - PostProcessAlterTableStmt which is extracted from multi_ProcessUtility A README.md has been added to `src/backend/distributed/commands` describing the contents of the module and every file in the module. We need more documentation around the overloading of the COPY command, for now the boilerplate has been added for people with better knowledge to fill out.	2018-11-14 13:36:27 +01:00

1 2 3 4 5 ...

395 Commits (9fd3f62cb62f4c91caea27b32ce7df2d10ed95f0)