Since the distributed functions are useful when the workers have
metadata, we automatically sync it, and we also sync it after
master_add_node(). We do this lazily and let the daemon
perform the sync. That's mainly because metadata syncing cannot be done
in transaction blocks, and we don't want to add lots of transactional
limitations to master_add_node() and create_distributed_function().
With this commit, we're changing the API for create_distributed_function()
such that users can provide the distribution argument and the colocation
information.
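For illustration, a sketch of what such a call could look like; the table, function, and named parameters (`distribution_arg_name`, `colocate_with`) below are illustrative assumptions rather than text from this commit:
```sql
-- Illustrative example: object names and parameter names are assumptions.
CREATE TABLE companies (company_id int PRIMARY KEY, score int DEFAULT 0);
SELECT create_distributed_table('companies', 'company_id');

CREATE OR REPLACE FUNCTION increment_score(cid int, delta int)
RETURNS void LANGUAGE sql AS $fn$
    UPDATE companies SET score = score + delta WHERE company_id = cid;
$fn$;

-- Distribute the function, declaring its first argument as the distribution
-- argument and colocating it with the companies table.
SELECT create_distributed_function('increment_score(int,int)',
                                   distribution_arg_name := '$1',
                                   colocate_with := 'companies');
```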
We've recently merged two commits, db5d03931d
and eccba1d4c3, which operate on very
similar places in the code.
It turns out that we have an integration issue: master_add_node()
fails to replicate the functions to a newly added node.
DESCRIPTION: Provide a GUC to turn off the new dependency propagation functionality
If the dependency propagation functionality introduced in 9.0 causes issues in a user's cluster, they can turn it off almost completely. The only dependency that will still be propagated and kept track of is the schema, to emulate the old behaviour.
The GUC to change is `citus.enable_object_propagation`. When set to `false` the functionality is mostly turned off. Be aware that objects marked as distributed in `pg_dist_object` will still be kept in the catalog as distributed objects. ALTER statements on these objects will not be propagated to the workers and may cause desynchronisation.
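A minimal sketch of turning the propagation off for a session (the usual GUC mechanisms apply for a cluster-wide setting):
```sql
-- Turn off the new dependency propagation for this session.
SET citus.enable_object_propagation TO false;
```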
DESCRIPTION: Rename remote types during type propagation
To prevent data from being destroyed when a remote type differs from the type on the coordinator during type propagation, we rename the existing type instead of running `DROP CASCADE`.
This patch removes the `DROP` logic and instead generates a rename statement that moves the existing type to a free (unused) name.
DESCRIPTION: Add feature flag to turn off create type propagation
When `citus.enable_create_type_propagation` is set to `false` citus will not propagate `CREATE TYPE` statements to the workers. Types are still distributed when tables that depend on these types are distributed.
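A short sketch of the behaviour described above; the type and table names are illustrative:
```sql
-- With CREATE TYPE propagation disabled, the type is not sent to the workers...
SET citus.enable_create_type_propagation TO false;

CREATE TYPE complex AS (r double precision, i double precision);

-- ...but it is still distributed once a table that depends on it is distributed.
CREATE TABLE numbers (key int, value complex);
SELECT create_distributed_table('numbers', 'key');
```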
This PR simply adds the columns to pg_dist_object and
implements the necessary metadata changes to keep track of
the distribution argument of functions/procedures.
A better fix for #2975. Apparently, on OSX cpp's -MF and -MT flags shouldn't have a
space between the flag and its value. Without the space it still works for
gcc as well.
This PR aims to add the minimal set of changes required to start
distributing functions. You can use the create_distributed_function(regproc)
UDF to distribute a function:
SELECT create_distributed_function('add(int,int)');
The function definition should include the parameter types to properly
identify the correct function that we wish to distribute.
@thanodnl told me it was a bit of a problem that it's impossible to see
the history of a UDF in git. The only way to do so is by reading all the
sql migration files from new to old. Another problem is that it's also
hard to review the changed UDF during code review, because to find out
what changed you have to do the same. I thought of an IMHO better (but
not perfect) way to handle this.
We keep the definition of a UDF in sql/udfs/{name_of_udf}/latest.sql.
We change that file whenever we need to make a change to the UDF. On
top of that you also make a snapshot of the file in
sql/udfs/{name_of_udf}/{migration-version}.sql (e.g. 9.0-1.sql) by
copying the contents. This way you can easily view what the actual
changes were by looking at the latest.sql file.
There's still the question of how to use these files then. Sadly
postgres doesn't allow inclusion of other sql files in the migration sql
file (it does in psql using \i). So instead I used the C preprocessor +
make to compile a sql/xxx.sql to a build/sql/xxx.sql file. This final
build/sql/xxx.sql file has every occurrence of #include "somefile.sql" in
sql/xxx.sql replaced by the contents of somefile.sql.
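As a rough sketch (the file and version names here are illustrative), a migration file under this scheme could look like:
```sql
-- sql/citus--8.3-1--9.0-1.sql, before preprocessing (names illustrative).
-- cpp + make replace the #include below with the snapshotted contents,
-- producing build/sql/citus--8.3-1--9.0-1.sql, which is what postgres runs.
#include "udfs/create_distributed_function/9.0-1.sql"
```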
DESCRIPTION: Distribute Types to worker nodes
When to propagate
==============
There are two logical moments that types could be distributed to the worker nodes
- When they get used (just-in-time distribution)
- When they get created (proactive distribution)
Just-in-time distribution follows the model of how schemas get created right before we create a table in that schema; for types this is when a table uses the type as one of its columns.
Proactive distribution is suitable for situations where it is beneficial to have the type on the worker nodes directly. The types can later be used in queries where an intermediate result gets created with a cast to this type.
Just-in-time creation is always the last resort: you cannot create a distributed table before the type gets created. A good example use case: you have an existing postgres server that needs to scale out, so you add the citus extension, add some nodes to the cluster, and distribute the table. The type was created before citus existed, so there was no moment where citus could have propagated its creation.
Proactive distribution is almost always a good option. Types are not resource-intensive objects; there is no performance overhead in having hundreds of types. If you want to use them in a query to represent an intermediate result (which happens in our test suite) they just work.
There is, however, one moment when proactive type distribution is not beneficial: transactions in which the type is used in a distributed table.
Let's assume the following transaction:
```sql
BEGIN;
CREATE TYPE tt1 AS (a int, b int);
CREATE TABLE t1 (a int PRIMARY KEY, b tt1);
SELECT create_distributed_table('t1', 'a');
\copy t1 FROM bigdata.csv
COMMIT;
```
Types are node-scoped objects, meaning the type exists once per worker. Shards, however, perform best when they are created over their own connections. For the type to be visible on all connections it needs to be created and committed before we try to create the shards. Here the just-in-time approach is most beneficial and follows how we create schemas on the workers. Outside of a transaction block we just use one connection to propagate the creation.
How propagation works
=================
Just in time
-----------
Just-in-time propagation hooks into the infrastructure introduced in #2882. It adds types as a supported object in `SupportedDependencyByCitus`. This will make sure that any object being distributed by citus that depends on types will now cascade into types. When types themselves depend on other objects, those objects will get created first.
Creation later works by getting the DDL commands to create the object by its `ObjectAddress` in `GetDependencyCreateDDLCommands`, which dispatches types to `CreateTypeDDLCommandsIdempotent`.
For the correct walking of the graph we follow array types; when later asked for the DDL commands for an array type we return `NIL` (an empty list), which means the object will not be recorded as distributed (it's an internal type, dependent on the user type).
Proactive distribution
---------------------
When the user creates a type (composite or enum), a hook runs in `multi_ProcessUtility` after the command has been applied locally. Running after the local application means we already have an `ObjectAddress` for the type, which is required to mark the type as distributed.
Keeping the type up to date
====================
For types that are recorded in `pg_dist_object` (e.g. `IsObjectDistributed` returns true for the `ObjectAddress`) we will intercept the utility commands that alter the type.
- `AlterTableStmt` with `relkind` set to `OBJECT_TYPE` encapsulates changes to the fields of a composite type.
- `DropStmt` with removeType set to `OBJECT_TYPE` encapsulates `DROP TYPE`.
- `AlterEnumStmt` encapsulates changes to enum values.
Enum types cannot be changed transactionally. When the execution on a worker fails, a warning is shown to the user that the propagation was incomplete due to a worker communication failure. An idempotent command is shown for the user to re-execute once worker communication is restored.
Keeping types up to date is done via the executor. Before the statement is executed locally we create a plan on how to apply it on the workers. This plan is executed after we have applied the statement locally.
For types that have already been distributed, all changes need to be done in the same transaction and will fail with an error if parallel queries have already been executed in that transaction, much like foreign keys to reference tables.
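A hedged sketch of the kinds of statements that are intercepted, reusing `tt1` from the example above (the enum type is illustrative):
```sql
-- Composite type changes are propagated within the same transaction.
ALTER TYPE tt1 ADD ATTRIBUTE c int;

-- Enum values cannot be added transactionally on the workers; on a worker
-- communication failure a warning with an idempotent command is emitted.
CREATE TYPE status AS ENUM ('active');
ALTER TYPE status ADD VALUE 'archived';
```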
For another PR I needed to add another column, which would require adding
another argument to an already nine-argument function signature. In this
case it would be a boolean flag, and there were already two boolean flags
in there. In my experience it becomes really easy to mess up the order
of these flags at that point, especially because the type system doesn't
distinguish between the three different booleans with completely different
meanings.
So I refactored these signatures to receive a struct containing most of
these arguments. That way you can't mess up the ordering, because the
meaning of a boolean is not order-dependent but field-name-dependent.
It also makes it possible to set good shared defaults for this struct.
DESCRIPTION: Fix schema leak on CREATE INDEX statement
When a CREATE INDEX statement is cached between executions, we might leak the schema name onto the cached statement from an earlier execution, preventing the right index from being created.
Even though the cache is cleared when the search_path changes, we can trigger this behaviour by having the schema already on the search path before a colliding table is created in a schema earlier on the `search_path`. When calling an unqualified CREATE INDEX via a function (used to trigger the caching behaviour) we see that the index is created on the wrong table after the schema has leaked onto the statement.
By copying the complete `PlannedStmt` and `utilityStmt` during our planning phase for distributed DDLs we make sure we are not leaking the schema name onto a cached data structure.
Caveat: COPY statements already do a lot of parse-tree copying without directly putting the result back on the `pstmt`. We should verify whether those copies modify the statement, and potentially copy the complete `pstmt` there as well.
/*
* local_executor.c
*
* The scope of the local execution is locally executing the queries on the
* shards. In other words, local execution does not deal with any local tables
* that are not shards on the node on which the query is being executed. In that sense,
* the local executor is only triggered if the node has both the metadata and the
* shards (e.g., only Citus MX worker nodes).
*
* The goal of the local execution is to skip the unnecessary network round-trip
* happening on the node itself. Instead, identify the locally executable tasks and
* simply call PostgreSQL's planner and executor.
*
* The local executor is an extension of the adaptive executor. So, the executor uses
* adaptive executor's custom scan nodes.
*
* One thing to note is that Citus MX is only supported with replication factor = 1, so
* keep that in mind while continuing the comments below.
*
* On the high level, there are 3 slightly different ways of utilizing local execution:
*
* (1) Execution of local single shard queries of a distributed table
*
* This is the simplest case. The executor kicks in at the start of the adaptive
* executor, and since the query consists of only a single task the execution finishes
* without going to the network at all.
*
* Even if there is a transaction block (or recursively planned CTEs), as long
* as the queries hit shards on the same node, the local execution will kick in.
*
* (2) Execution of local single-shard queries and remote multi-shard queries
*
* The rule is simple. If a transaction block starts with a local query execution,
* all the other queries in the same transaction block that touch any local shard
* have to use the local execution. Although this sounds restrictive, we prefer to
* implement it this way; otherwise we'd end up with scenarios as complex as the
* ones we have in connection management due to foreign keys.
*
* See the following example:
* BEGIN;
* -- assume that the query is executed locally
* SELECT count(*) FROM test WHERE key = 1;
*
* -- at this point, all the shards that reside on the
* -- node are executed locally one by one. After those finish,
* -- the remaining tasks are handled by the adaptive executor
* SELECT count(*) FROM test;
*
*
* (3) Modifications of reference tables
*
* Modifications to reference tables have to be executed on all nodes. So, after the
* local execution, the adaptive executor continues the execution on the other
* nodes.
*
* Note that for read-only queries, after the local execution, there is no need to
* kick in the adaptive executor.
*
* There are also a few limitations/trade-offs that are worth mentioning. First,
* local execution on multiple shards might be slow because the execution has to
* happen one task at a time (e.g., no parallelism). Second, if a transaction
* block/CTE starts with a multi-shard command, we do not use local query execution
* since local execution is sequential. Basically, we do not want to lose parallelism
* across local tasks by switching to local execution. Third, the local execution
* currently only supports queries. In other words, any utility command like TRUNCATE
* fails if it is executed after a local execution inside a transaction block.
* Fourth, the local execution cannot be mixed with executors other than the adaptive
* executor, namely the task-tracker, real-time and router executors. Finally, related
* to the previous item, the COPY command cannot be mixed with local execution in a
* transaction. The implication is that no part of INSERT..SELECT via the coordinator
* can happen via local execution.
*/
Before this patch, when a connection is lost, we'd have the following
situation:
- Pop a task execution from readyQueue
- Lost connection
- Fail the session/pool. -> This step was not acting properly
because we'd popped the task but had not yet set it to
session->currentTask
After the patch:
- Pop a task execution from readyQueue
- Immediately set it to session->currentTask
- Lost connection
- Fail the session/pool. -> At this step, failing the
session would trigger query failures (or failovers)
properly.
This is a bug that got in when we inlined the body of a function into this loop. Earlier revisions had two loops, hence a function that would be reused.
With a return instead of a continue, the list of dependencies being walked depends on the order in which we find them in pg_depend. This became apparent during pg12 compatibility work. The order of entries in pg12 was luckily different, causing a random test to fail due to this return.
By changing it to a continue we only skip the entries that we don’t want to follow instead of skipping all entries that happen to be found later.
Side fix for more stable isolation tests around ensuring dependencies
DESCRIPTION: Refactor ensure schema exists to dependency exists
Historically we only supported schemas as table dependencies to be created on the workers before a table gets distributed. This PR puts infrastructure in place to walk pg_depend to figure out which dependencies to create on the workers. Currently only schemas are supported as objects to create before creating a table.
We also keep track of dependencies that have been created in the cluster. When we add a new node to the cluster we use this catalog to know which objects need to be created on the worker.
A side effect of knowing which objects are already distributed is that we no longer emit debug messages when creating schemas that already exist on the workers.
* Add tuplestore helpers
* More detailed error messages in tuplestore
* Add CreateTupleDescCopy to SetupTuplestore
* Use new SetupTuplestore helper function
* Remove unnecessary copy
* Remove comment about undefined behaviour
See a9c35cf85c
clang raises a warning because FunctionCall2InfoData is technically variable-sized.
This is fine, as the struct is the size we want it to be, so silence the warning.
master_deactivate_node is updated to decrement the replication factor;
otherwise deactivation could have create_reference_table produce a second record.
UpdateColocationGroupReplicationFactor is renamed UpdateColocationGroupReplicationFactorForReferenceTables,
and the implementation looks up the record based on distributioncolumntype == InvalidOid rather than by id;
otherwise the record's replication factor fails to be maintained when there are no reference tables.
DESCRIPTION: Add functions to help with postgres upgrades
Currently there is [a list of manual steps](https://docs.citusdata.com/en/v8.2/admin_guide/upgrading_citus.html?highlight=upgrade#upgrading-postgresql-version-from-10-to-11) to perform during a postgres upgrade. These steps guarantee our catalog tables are kept and counter values are maintained across upgrades.
Having more than one command in our docs for users to manually execute during upgrades is error-prone for both the user and our docs. There are already two catalog tables that have been introduced to citus that have not been added to our docs for backing up during upgrades (`pg_authinfo` and `pg_dist_poolinfo`).
As we add more functionality to citus we run into situations where more steps are required either before or after the upgrade. At the same time, when we move catalog tables to a place where the contents will be maintained automatically during upgrades, we could have fewer steps in our docs. This would turn into a hard-to-maintain matrix of citus versions and steps to be performed.
Instead we could take ownership of these steps within the extension itself. This PR introduces two new functions for the user to use instead of long lists of error-prone instructions to follow.
- `citus_prepare_pg_upgrade`
This function should be called by the user right before shutting down the cluster. This will ensure all citus catalog tables are backed up in a location where the information will be retained during an upgrade.
- `citus_finish_pg_upgrade`
This function should be called right after a pg_upgrade of the cluster. This will restore the catalog tables to the state before the upgrade happened.
Both functions need to be executed on both the coordinator and all of the workers, in the same fashion as our current documentation instructs.
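In practice the flow looks like the following, run on the coordinator and on every worker:
```sql
-- Right before shutting down the old cluster:
SELECT citus_prepare_pg_upgrade();

-- ... run pg_upgrade ...

-- Right after starting the upgraded cluster:
SELECT citus_finish_pg_upgrade();
```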
There are two known problems with these functions in their current form, which are also problems with our docs. We should schedule time in the future to improve on this, but having it automated now is better, as we are about to add extra steps to take after upgrades.
- When you install citus in a clean cluster we do enable ssl for communication between the coordinator and the workers. If an upgrade to a clean cluster is performed we do not set up ssl on the new cluster, causing the communication to fail.
- There are no automated tests added in this PR to execute an upgrade test during every build.
Our current test infrastructure does not allow for 2 versions of postgres to exist in the same environment. We will need to invest time to create a new testing harness that could run the following scenario:
1. Create cluster
2. Run extensible scripts to execute arbitrary statements on this cluster
3. Perform an upgrade by preparing, upgrading and finishing
4. Run extensible scripts to verify all objects created by earlier scripts exist in the correct form in the upgraded cluster
Given the non-trivial amount of work involved in such a suite, I'd like to land this before we have
automated testing.
On a side note: as the reviewer noticed, the tables created in the public namespace are not visible in `psql` with `\d`. The backup catalog tables have the same name as the tables in `pg_catalog`. Due to postgres internals `pg_catalog` is first in the search path and therefore the non-qualified name would always resolve to `pg_catalog.pg_dist_*`. Internally this is called a non-visible table as it would resolve to a different table without a qualified name. Only visible tables are shown with `\d`.
Before this commit, we've recorded the relation accesses in 3 different
places
- FindPlacementListConnection -- applies to all executors in tx blocks
- StartPlacementExecutionOnSession() -- adaptive executor only
- StartPlacementListConnection() -- router/real-time only
This is different from Citus 8.2, and could lead to query execution times
increasing considerably for multi-shard commands in transaction blocks
that are on partitioned tables.
Benchmarks:
```
1+8 c5.4xlarge cluster
Empty distributed partitioned table with 365 partitions: https://gist.github.com/onderkalaci/1edace4ed6bd6f061c8a15594865bb51#file-partitions_365-sql
./pgbench -f /tmp/multi_shard.sql -c10 -j10 -P 1 -T 120 postgres://citus:w3r6KLJpv3mxe9E-NIUeJw@c.fy5fkjcv45vcepaogqcaskmmkee.db.citusdata.com:5432/citus?sslmode=require
cat /tmp/multi_shard.sql
BEGIN;
DELETE FROM collections_list;
DELETE FROM collections_list;
DELETE FROM collections_list;
COMMIT;
cat /tmp/single_shard.sql
BEGIN;
DELETE FROM collections_list WHERE key = :aid;
DELETE FROM collections_list WHERE key = :aid;
DELETE FROM collections_list WHERE key = :aid;
COMMIT;
cat /tmp/mix.sql
BEGIN;
DELETE FROM collections_list WHERE key = :aid;
DELETE FROM collections_list WHERE key = :aid;
DELETE FROM collections_list WHERE key = :aid;
DELETE FROM collections_list;
DELETE FROM collections_list;
DELETE FROM collections_list;
COMMIT;
```
The table shows `latency average` of pgbench runs explained above, so we have a pretty solid improvement even over 8.2.2.
| Test | Citus 8.2.2 | Citus 8.3.1 | Citus 8.3.2 (this branch) | Citus 8.3.1 (FKEYs disabled via GUC) |
| ------------- | ------------- | ------------- |------------- | ------------- |
|multi_shard | 2370.083 ms |3605.040 ms |1324.094 ms |1247.255 ms |
| single_shard | 85.338 ms |120.934 ms |73.216 ms | 78.765 ms |
| mix | 2434.459 ms | 3727.080 ms |1306.456 ms | 1280.326 ms |
This causes no behavioral changes; it only organizes the code better to implement modifying CTEs
Also rename ExtactInsertRangeTableEntry to ExtractResultRelationRTE,
as the source of this function didn't match the documentation
Remove Task's upsertQuery in favor of ROW_MODIFY_NONCOMMUTATIVE
Split up AcquireExecutorShardLock into more internal functions
Tests: Normalize multi_reference_table multi_create_table_constraints
Also automated all manual tests around multi user isolation for internal citus udf's
automate upgrade_to_reference_table tests
add negative tests for lock_relation_if_exists
add tests for permissions on worker_cleanup_job_schema_cache
add tests for worker_fetch_partition_file
add tests for worker_merge_files_into_table
fix problem with worker_merge_files_and_run_query when run as non-super user and add tests for behaviour
With this commit, we're introducing the Adaptive Executor.
The commit message consists of two distinct sections. The first part explains
how the executor works. The second part consists of the commit messages of
the individual smaller commits that resulted in this commit. The readers
can search for each of the smaller commit messages on
https://github.com/citusdata/citus and can learn more about the history
of the change.
/*-------------------------------------------------------------------------
*
* adaptive_executor.c
*
* The adaptive executor executes a list of tasks (queries on shards) over
* a connection pool per worker node. The results of the queries, if any,
* are written to a tuple store.
*
* The concepts in the executor are modelled in a set of structs:
*
* - DistributedExecution:
* Execution of a Task list over a set of WorkerPools.
* - WorkerPool:
* Pool of WorkerSessions for the same worker which opportunistically
* executes "unassigned" tasks from a queue.
* - WorkerSession:
* Connection to a worker that is used to execute "assigned" tasks
* from a queue and may execute unassigned tasks from the WorkerPool.
* - ShardCommandExecution:
* Execution of a Task across a list of placements.
* - TaskPlacementExecution:
* Execution of a Task on a specific placement.
* Used in the WorkerPool and WorkerSession queues.
*
* Every connection pool (WorkerPool) and every connection (WorkerSession)
* have a queue of tasks that are ready to execute (readyTaskQueue) and a
* queue/set of pending tasks that may become ready later in the execution
* (pendingTaskQueue). The tasks are wrapped in a ShardCommandExecution,
* which keeps track of the state of execution and is referenced from a
* TaskPlacementExecution, which is the data structure that is actually
* added to the queues and describes the state of the execution of a task
* on a particular worker node.
*
* When the task list is part of a bigger distributed transaction, the
* shards that are accessed or modified by the task may have already been
* accessed earlier in the transaction. We need to make sure we use the
* same connection since it may hold relevant locks or have uncommitted
* writes. In that case we "assign" the task to a connection by adding
* it to the task queue of a specific connection (in
* AssignTasksToConnections). Otherwise we consider the task unassigned
* and add it to the task queue of a worker pool, which means that it
* can be executed over any connection in the pool.
*
* A task may be executed on multiple placements in case of a reference
* table or a replicated distributed table. Depending on the type of
* task, it may not be ready to be executed on a worker node immediately.
* For instance, INSERTs on a reference table are executed serially across
* placements to avoid deadlocks when concurrent INSERTs take conflicting
* locks. At the beginning, only the "first" placement is ready to execute
* and therefore added to the readyTaskQueue in the pool or connection.
* The remaining placements are added to the pendingTaskQueue. Once
* execution on the first placement is done the second placement moves
* from pendingTaskQueue to readyTaskQueue. The same approach is used to
* fail over read-only tasks to another placement.
*
* Once all the tasks are added to a queue, the main loop in
* RunDistributedExecution repeatedly does the following:
*
* For each pool:
* - ManageWorkerPool evaluates whether to open additional connections
* based on the number of unassigned tasks that are ready to execute
* and the targetPoolSize of the execution.
*
* Poll all connections:
* - We use a WaitEventSet that contains all (non-failed) connections
* and is rebuilt whenever the set of active connections or any of
* their wait flags change.
*
* We almost always check for WL_SOCKET_READABLE because a session
* can emit notices at any time during execution, but it will only
* wake up WaitEventSetWait when there are actual bytes to read.
*
* We check for WL_SOCKET_WRITEABLE just after sending bytes in case
* there is not enough space in the TCP buffer. Since a socket is
* almost always writable we also use WL_SOCKET_WRITEABLE as a
* mechanism to wake up WaitEventSetWait for non-I/O events, e.g.
* when a task moves from pending to ready.
*
* For each connection that is ready:
* - ConnectionStateMachine handles connection establishment and failure
* as well as command execution via TransactionStateMachine.
*
* When a connection is ready to execute a new task, it first checks its
* own readyTaskQueue and otherwise takes a task from the worker pool's
* readyTaskQueue (on a first-come-first-serve basis).
*
* In cases where the tasks finish quickly (e.g. <1ms), a single
* connection will often be sufficient to finish all tasks. It is
* therefore not necessary that all connections are established
* successfully or open a transaction (which may be blocked by an
* intermediate pgbouncer in transaction pooling mode). It is therefore
* essential that we take a task from the queue only after opening a
* transaction block.
*
* When a command on a worker finishes or the connection is lost, we call
* PlacementExecutionDone, which then updates the state of the task
* based on whether we need to run it on other placements. When a
* connection fails or all connections to a worker fail, we also call
* PlacementExecutionDone for all queued tasks to try the next placement
* and, if necessary, mark shard placements as inactive. If a task fails
* to execute on all placements, the execution fails and the distributed
* transaction rolls back.
*
* For multi-row INSERTs, tasks are executed sequentially by
* SequentialRunDistributedExecution instead of in parallel, which allows
* a high degree of concurrency without high risk of deadlocks.
* Conversely, multi-row UPDATE/DELETE/DDL commands take aggressive locks
* which forbids concurrency, but allows parallelism without high risk
* of deadlocks. Note that this is unrelated to SEQUENTIAL_CONNECTION,
* which indicates that we should use at most one connection per node, but
* can run tasks in parallel across nodes. This is used when there are
* writes to a reference table that has foreign keys from a distributed
* table.
*
* Execution finishes when all tasks are done, the query errors out, or
* the user cancels the query.
*
*-------------------------------------------------------------------------
*/
All the commits involved here:
* Initial unified executor prototype
* Latest changes
* Fix rebase conflicts to master branch
* Add missing variable for assertion
* Ensure that master_modify_multiple_shards() returns the affectedTupleCount
* Adjust intermediate result sizes
The real-time executor uses the COPY command to get the results
from the worker nodes. The unified executor avoids that, which
results in less data transfer. Simply adjust the tests to lower
sizes.
* Force one connection per placement (or co-located placements) when requested
The existing executors (real-time and router) always open 1 connection per
placement when parallel execution is requested.
That might be useful under certain circumstances:
(a) User wants to utilize as many CPUs as possible on the workers per
distributed query
(b) User has a transaction block which involves COPY command
Also, lots of regression tests rely on these execution semantics.
So, we enable a few of the tests with this change as well.
* Force parameters to be resolved before using them
For the details, see PostgreSQL's copyParamList()
* Unified executor sorts the returning output
* Ensure that unified executor doesn't ignore sequential execution of DDLJob's
Certain DDL commands, mainly creating foreign keys to reference tables,
should be executed sequentially. Otherwise, we'd end up with a self
distributed deadlock.
To overcome this situation, we set a flag `DDLJob->executeSequentially`
and execute it sequentially. Note that we have to do this because
the command might not be called within a transaction block, and
we cannot call `SetLocalMultiShardModifyModeToSequential()`.
This fixes at least two tests: multi_insert_select_on_conflit.sql and
multi_foreign_key.sql
Also, I wouldn't mind scattering local `targetPoolSize` variables within
the code. The reason is that we'll soon have a GUC (or a global
variable based on a GUC) that'd set the pool size. In that case, we'd
simply replace `targetPoolSize` with the global variables.
* Fix 2PC conditions for DDL tasks
* Improve closing connections that are not fully established in unified execution
* Support foreign keys to reference tables in unified executor
The idea for supporting foreign keys to reference tables is simple:
Keep track of the relation accesses within a transaction block.
- If a parallel access happens on a distributed table which
has a foreign key to a reference table, one cannot modify
the reference table in the same transaction. Otherwise,
we're very likely to end-up with a self-distributed deadlock.
- If an access to a reference table happens, and then a parallel
access to a distributed table (which has a fkey to the reference
table) happens, we switch to sequential mode.
The unified executor misses the function calls that mark the relation
accesses during the execution. Thus, simply add the necessary calls
and let the logic kick in.
* Make sure to close the failed connections after the execution
* Improve comments
* Fix savepoints in unified executor.
* Rebuild the WaitEventSet only when necessary
* Unclaim connections on all errors.
* Improve failure handling for unified executor
- Implement the notion of errorOnAnyFailure. This is similar to
Critical Connections that the connection management APIs provide
- If the nodes inside a modifying transaction expand, activate 2PC
- Fix few bugs related to wait event sets
- Mark placements INACTIVE during the execution as much as possible,
as opposed to doing it in the COMMIT handler
- Fix few bugs related to scheduling next placement executions
- Improve decision on when to use 2PC
Improve the logic to start a transaction block for distributed transactions
- Make sure that only reference table modifications are always
executed with distributed transactions
- Make sure that stored procedures and functions are executed
with distributed transactions
* Move waitEventSet to DistributedExecution
This could also be local to RunDistributedExecution(), but in that case
we would have to mark it as "volatile" to avoid PG_TRY()/PG_CATCH() issues, and
cast it to non-volatile when doing WaitEventSetFree(). We thought that
would make code a bit harder to read than making this non-local, so we
move it here. See comments for PG_TRY() in postgres/src/include/elog.h
and "man 3 siglongjmp" for more context.
* Fix multi_insert_select test outputs
Two things:
1) One complex transaction block is now supported. Simply update
the test output
2) Due to the dynamic nature of the unified executor, the order of
the errors coming from the shards might change (e.g., all of
the queries on the shards would fail, but which one appears
in the error message?). To fix that, we simply added it to
our shardId normalization tool, which runs just before diff.
* Fix subquery_and_cte test
The error message is updated from:
failed to execute task
To:
more than one row returned by a subquery or an expression
which is a lot clearer to the user.
* Fix intermediate_results test outputs
Simply update the error message from:
could not receive query results
to
result "squares" does not exist
which makes a lot more sense.
* Fix multi_function_in_join test
The error messages are updated from:
Failed to execute task XXX
To:
function f(..) does not exist
* Fix multi_query_directory_cleanup test
The unified executor does not create any intermediate files.
* Fix with_transactions test
A test case that just started to work fine
* Fix multi_router_planner test outputs
The error message is updated from:
Could not receive query results
To:
Relation does not exists
which is a lot clearer for the users
* Fix multi_router_planner_fast_path test
The error message is updated from:
Could not receive query results
To:
Relation does not exists
which is a lot clearer for the users
* Fix isolation_copy_placement_vs_modification by disabling select_opens_transaction_block
* Fix ordering in isolation_multi_shard_modify_vs_all
* Add executor locks to unified executor
* Make sure to allocate enough WaitEvents
The previous code was missing the waitEvents for the latch and
postmaster death.
* Fix rebase conflicts for master rebase
* Make sure that TRUNCATE relies on unified executor
* Implement true sequential execution for multi-row INSERTS
Execute the individual tasks one by one. Note that this is different from the
MultiShardConnectionType == SEQUENTIAL_CONNECTION case (e.g., sequential execution
mode). In that case, running the tasks across the nodes in parallel is acceptable
and implemented in that way.
However, the executions that are qualified here would perform poorly if the
tasks across the workers are executed in parallel. We currently qualify only
one class of distributed queries here, multi-row INSERTs. If we do not enforce
true sequential execution, concurrent multi-row upserts could easily form
a distributed deadlock when the upserts touch the same rows.
* Remove SESSION_LIFESPAN flag in unified_executor
* Apply failure test updates
We've changed the failure behaviour a bit, and also the error messages
that show up to the user. This PR covers the majority of the updates.
* Unified executor honors citus.node_connection_timeout
With this commit, unified executor errors out if even
a single connection cannot be established within
citus.node_connection_timeout.
And, as a side effect this fixes failure_connection_establishment
test.
* Properly increment/decrement pool size variables
Before this commit, the idle and active connection
counts were not properly calculated.
* insert_select_executor goes through unified executor.
* Add missing file for task tracker
* Modify ExecuteTaskListExtended()'s signature
* Sort output of INSERT ... SELECT ... RETURNING
* Take partition locks correctly in unified executor
* Alternative implementation for force_max_query_parallelization
* Fix compile warnings in unified executor
* Fix style issues
* Decrement idleConnectionCount when idle connection is lost
* Always rebuild the wait event sets
In the previous implementation, on waitFlag changes, we were only
modifying the wait events. However, we've realized that it might
be an over optimization since (a) we couldn't see any performance
benefits (b) we see some errors on failures and because of (a)
we prefer to disable it now.
* Make sure to allocate enough sized waitEventSet
With multi-row INSERTs, we might have more sessions than
task*workerCount after a few calls of RunDistributedExecution()
because the previous sessions would also be alive.
Instead, re-allocate events when the connection set changes.
* Implement SELECT FOR UPDATE on reference tables
On master branch, we do two extra things on SELECT FOR UPDATE
queries on reference tables:
- Acquire executor locks
- Execute the query on all replicas
With this commit, we're implementing the same logic on the
new executor.
* SELECT FOR UPDATE opens transaction block even if SelectOpensTransactionBlock disabled
Otherwise, users would be very confused and their logic is very likely
to break.
* Fix build error
* Fix the newConnectionCount calculation in ManageWorkerPool
* Fix rebase conflicts
* Fix minor test output differences
* Fix citus indent
* Remove duplicate sorts that were added with rebase
* Create distributed table via executor
* Fix wait flags in CheckConnectionReady
* failure_savepoints output for unified executor.
* failure_vacuum output (pg 10) for unified executor.
* Fix WaitEventSetWait timeout in unified executor
* Stabilize failure_truncate test output
* Add an ORDER BY to multi_upsert
* Fix regression test outputs after rebase to master
* Add executor.c comment
* Rename executor.c to adaptive_executor.c
* Do not schedule tasks if the failed placement is not ready to execute
Before the commit, we were blindly scheduling the next placement executions
even if the failed placement was not on the ready queue. Now, we're ensuring
that if the failed placement execution is on a failed pool or session and the
execution is on the pendingQueue, we do not schedule the next task, because
the other placement execution should already be running.
* Implement a proper custom scan node for adaptive executor
- Switch between the executors, add GUC to set the pool size
- Add non-adaptive regression test suites
- Enable CIRCLE CI for non-adaptive tests
- Adjust test output files
* Add slow start interval to the executor
* Expose max_cached_connection_per_worker to user
* Do not start slow when there are cached connections
* Consider ExecutorSlowStartInterval in NextEventTimeout
* Fix memory issues with ReceiveResults().
* Disable executor via TaskExecutorType
* Make sure to execute the tests with the other executor
* Use task_executor_type to enable-disable adaptive executor
* Remove useless code
* Adjust the regression tests
* Add slow start regression test
* Rebase to master
* Fix test failures in adaptive executor.
* Rebase to master - 2
* Improve comments & debug messages
* Set force_max_query_parallelization in isolation_citus_dist_activity
* Force max parallelization for creating shards when asked to use exclusive connection.
* Adjust the default pool size
* Expand description of max_adaptive_executor_pool_size GUC
* Update warnings in FinishRemoteTransactionCommit()
* Improve session clean up at the end of execution
Explicitly list all the states in which the execution might end;
otherwise warn.
* Remove MULTI_CONNECTION_WAIT_RETRY which is not used at all
* Add more ORDER BYs to multi_mx_partitioning
- All the schema creations on the workers will now be via superuser connections
- If a shard is being repaired or a shard is replicated, we will create the
schema only in the relevant worker; and in all the other cases where a schema
creation is needed, we will block operations until we ensure the schema exists
in all the workers
It has been reported that a null pointer dereference could be triggered in FreeConnParamsHashEntryFields. The likely cause is an error in GetConnParams, which leaves the cached ConnParamsHashEntry in a state that causes the null pointer dereference on a subsequent connection establishment to the same server. This has been simulated by inserting ereport(ERROR, ...) at certain places in the code.
Not only would the ConnParamsHashEntry be in a state that would cause a crash, it was also leaking memory in the ConnectionContext due to the loss of pointers, as they are only stored on the ConnParamsHashEntry at the end of the function.
This patch rewrites both GetConnParams, to store pointers 'durably' at every point in the code so that an error does not lose them, and FreeConnParamsHashEntryFields, so that it can clear a half-initialised ConnParamsHashEntry in a safer manner.
When `master_update_node` is called to update a node's location it waits for appropriate locks to become available. This is useful during normal operation as new operations will be blocked till after the metadata update while running operations have time to finish.
When `master_update_node` is called after a node failure it is less useful to wait for running operations to finish, as they can't. The lock being held indicates an operation that, once it attempts to commit, will fail as the machine has already failed. The downside is that the failover is postponed until the termination point of the operation. This has been observed by users to take a significant amount of time, causing the rest of the system to appear unavailable.
With this patch it is possible in such situations to invoke `master_update_node` with 2 optional arguments:
- `force` (bool, defaults to `false`): when called with `true`, the metadata update is forced to proceed by terminating conflicting backends. A cancel is not enough, as the backend might be idle (e.g. an interactive session, or an application going back and forth), therefore the more intrusive solution of termination is used here.
- `lock_cooldown` (int, defaults to `10000`): the time in milliseconds before conflicting backends are terminated. This is to allow the backends to finish cleanly before terminating them, and it allows the user to set an upper bound on the expected time to complete the metadata update, e.g. performing the failover.
The functionality is implemented by spawning a background worker that has the task of helping a certain backend in acquiring its locks. The backend is either terminated on successful execution of the metadata update, or once the memory context of the expression gets reset, eg. on a cancel of the statement.
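A hedged sketch of a forced update after a node failure; the host names are illustrative and the optional argument names follow the description above:
```sql
-- Terminate conflicting backends after 10 seconds and move the failed
-- worker's metadata to its replacement host.
SELECT master_update_node(nodeid, 'new-worker-hostname', nodeport,
                          force := true,
                          lock_cooldown := 10000)
FROM pg_dist_node
WHERE nodename = 'failed-worker-hostname';
```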
Adds support for propagation of SET LOCAL commands to all workers
involved in a query. For now, SET SESSION (i.e. plain SET) is not
supported whatsoever, though this code is intended as somewhat of a
base for implementing such support in the future.
As SET LOCAL modifications are scoped to the body of a BEGIN/END xact
block, queries wishing to use SET LOCAL propagation must be within such
a block. In addition, subsequent modifications after e.g. any SAVEPOINT
or ROLLBACK statements will correspondingly push or pop variable
modifications onto an internal stack such that the behavior of changed
values across the cluster will be identical to such behavior on e.g.
single-node PostgreSQL (or equivalently, what values are visible to
the end user by running SHOW on such variables on the coordinator).
If nodes enter the set of participants at some point after SET LOCAL
modifications (or SAVEPOINT, ROLLBACK, etc.) have occurred, the SET
variable state is eagerly propagated to them upon their entrance (this
is identical to, and indeed just augments, the existing logic for the
propagation of the SAVEPOINT "stack").
A new GUC (citus.propagate_set_commands) has been added to control this
behavior. Though the code suggests the valid settings are 'none', 'local',
'session', and 'all', only 'none' (the default) and 'local' are presently
implemented: attempting to use other values will result in an error.
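A minimal sketch of the intended usage; the table and the setting being propagated are illustrative:
```sql
CREATE TABLE events (key int, value int);
SELECT create_distributed_table('events', 'key');

-- Propagate SET LOCAL commands to the workers participating in the transaction.
SET citus.propagate_set_commands TO 'local';

BEGIN;
SET LOCAL enable_hashjoin TO off;   -- applied on every participating worker
SELECT count(*) FROM events;
COMMIT;
```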
This is a preparation for the new executor, where creating shards
would go through the executor. So, explicitly generate the commands
for further processing.
If a query is router executable, it hits a single shard and therefore has a
single task associated with it. Therefore there is no need to sort the task list
that has a single element.
Also, we already have a list of active shard placements, so send it as a
parameter and reuse it.
The InitializeCaches() method may prematurely set
performedInitialization without actually creating
DistShardCacheHash.
The fix makes sure the flag is set only if DistShardCacheHash is created successfully.
We also introduced a new memory context in which to allocate the aforementioned hash tables.
If allocation/initialization fails for any reason, we make sure
memory is reclaimed by deleting the memory context.
Instead of scattering the code around, we move all the
logic into a single function.
This will help support foreign keys to reference tables
in the unified executor with a single line of change: just
calling this function.
The feature is only intended for getting consistent outputs for the regression tests.
RETURNING does not have any ordering guarantees and, with the unified executor, the ordering
of query executions on the shards is also becoming unpredictable. Thus, we're enforcing
ordering when a GUC is set.
We implicitly add an `ORDER BY`, something equivalent to
`
RETURNING expr1, expr2, .. ,exprN
ORDER BY expr1, expr2, .. ,exprN
`
As described in the code comments as well, this is probably not the most
performant approach we could implement. However, since we're only
targeting regression tests, I don't see any issues with that. If we
decide to expand this to a feature to users, we should revisit the
implementation and improve the performance.
This commit has two goals:
(a) Ensure that we access both edges of the allocated stack
(b) Ensure that compiler optimizations do not optimize the
function away.
Stack size after the patch:
sudo grep -A 1 stack /proc/2119/smaps
7ffe305a6000-7ffe307a9000 rw-p 00000000 00:00 0 [stack]
Size: 2060 kB
Stack size before the patch:
sudo grep -A 1 stack /proc/3610/smaps
7fff09957000-7fff09978000 rw-p 00000000 00:00 0 [stack]
Size: 132 kB
We used to rely on the PG function flatten_join_alias_vars
to resolve the actual columns referenced in the target entry list.
The function goes deep and finds the actual relation. This logic
usually works fine. However, when joins are given an alias, inner
relation names are not visible to the target entry. Thus relation
resolving should stop when the target entry column refers to an
RTE of an aliased join.
We stopped using the PG function and provided our own flatten function.
Our assumption that strip_implicit_coercions would leave us with a
binary-compatible type to that of the partition key was wrong. Instead,
we should ensure the RHS of the comparison we perform is proactively
coerced into a compatible type (at least binary compatible).
At configuration reload, we free all "global" (i.e. GUC-set) connection
parameters, but these may still have live references in the connection
parameters hash. By marking the entries as invalid, we can ensure they
will not be used after free.
Having DATA-segment string literals made blindly freeing the keywords/
values difficult, so I've switched to allocating all in the provided
context; because of this (and with the knowledge of the end point of
the global parameters), we can safely pfree non-global parameters when
we come across an invalid connection parameter entry.
Do it in two ways: (a) re-use the rte list as much as possible instead of
re-calculating it over and over again; (b) limit the recursion to the relevant
parts of the query tree.
Before this commit, shardPlacements were identified with shardId, nodeName
and nodeport. Instead of using nodeName and nodePort, we now use nodeId
since it apparently has performance benefits in several places in the
code.
The rule for infinite recursion is the following:
- If the query contains a subquery which is recursively planned, and
no other subqueries can be recursively planned due to correlation
(e.g., LATERAL joins), the planner keeps recursing again and again.
One interesting thing here is that even if a subquery contains only intermediate
result(s), we re-recursively plan that. In the end, the logic in the code does the following:
- Try recursive planning any of the subqueries in the query tree
- If any subquery is recursively planned, call the planner again
where the subquery is replaced with the intermediate result.
- Try recursively planning any of the queries
- If any subquery is recursively planned, call the planner again
where the subquery (in this case it is already intermediate result)
is replaced with the intermediate result.
- Try recursively planning any of the queries
- If any subquery is recursively planned, call the planner again
where the subquery (in this case it is already intermediate result)
is replaced with the intermediate result.
- Try recursively planning any of the queries
- If any subquery is recursively planned, call the planner again
where the subquery (in this case it is already intermediate result)
is replaced with the intermediate result.
......
The following scenario resulted in a distributed deadlock before this commit:
CREATE TABLE partitioning_test(id int, time date) PARTITION BY RANGE (time);
CREATE TABLE partitioning_test_2009 (LIKE partitioning_test);
CREATE TABLE partitioning_test_reference(id int PRIMARY KEY, subid int);
SELECT create_distributed_table('partitioning_test_2009', 'id'),
create_distributed_table('partitioning_test', 'id'),
create_reference_table('partitioning_test_reference');
ALTER TABLE partitioning_test ADD CONSTRAINT partitioning_reference_fkey FOREIGN KEY (id) REFERENCES partitioning_test_reference(id) ON DELETE CASCADE;
ALTER TABLE partitioning_test_2009 ADD CONSTRAINT partitioning_reference_fkey_2009 FOREIGN KEY (id) REFERENCES partitioning_test_reference(id) ON DELETE CASCADE;
ALTER TABLE partitioning_test ATTACH PARTITION partitioning_test_2009 FOR VALUES FROM ('2009-01-01') TO ('2010-01-01');
Since flattening the query may flatten an outer join's columns into a coalesce expression that is
in the USING part, and that was not expected before this commit, these queries were
erroring out. This commit fixes it by considering coalesce expressions as well.
We'd been ignoring updating uncrustify for some time now because I'd
thought these were misclassifications that would require an update in
our rules to address. Turns out they're legit, so I'm checking them in.
Before this commit, round-robin task assignment policy was relying
on the taskId. Thus, even inside a transaction, the tasks were
assigned to different nodes. This was especially problematic
while reading from reference tables within transaction blocks,
because we had to expand the distributed transaction to many
nodes that were not necessarily already part of the distributed transaction.
In this context, we define "Fast Path Planning for SELECT" as trivial
queries where Citus can skip relying on the standard_planner() and
handle all the planning.
For router planner, standard_planner() is mostly important to generate
the necessary restriction information. Later, the restriction information
generated by the standard_planner is used to decide whether all the shards
that a distributed query touches reside on a single worker node. However,
standard_planner() does a lot of extra things such as cost estimation and
execution path generations which are completely unnecessary in the context
of distributed planning.
There are certain types of queries where Citus could skip relying on
standard_planner() to generate the restriction information. For queries
in the following format, Citus does not need any information that the
standard_planner() generates:
SELECT ... FROM single_table WHERE distribution_key = X; or
DELETE FROM single_table WHERE distribution_key = X; or
UPDATE single_table SET value_1 = value_2 + 1 WHERE distribution_key = X;
Note that the queries might not be as simple as the above; GROUP BY,
WINDOW FUNCTIONS, ORDER BY, HAVING etc. are all acceptable. The
only rule is that the query is on a single distributed (or reference) table
and there is a "distribution_key = X;" in the WHERE clause. With that, we
can decide which shard a distributed query touches and on which worker
node it resides.
We used to error out if there was a reference table
in the query participating in a union. This caused
pushdownable queries to be evaluated on the coordinator.
Now we allow reference tables inside union queries as long
as there is a distributed table in the FROM clause.
The existing join checks (reference table on the outer part)
are sufficient enough that we do not need to check the join relation
of reference tables.
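A sketch of a query shape that is now allowed (table names are illustrative; whether a particular query is pushed down still depends on the planner checks above):
```sql
CREATE TABLE dist (key int, value int);
SELECT create_distributed_table('dist', 'key');
CREATE TABLE ref (key int, value int);
SELECT create_reference_table('ref');

-- A reference table inside a UNION, with a distributed table in the FROM clause.
SELECT count(*)
FROM dist
JOIN (SELECT key FROM dist UNION SELECT key FROM ref) u USING (key);
```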
Previously we allowed the task assignment policy to have an effect on router queries
with only intermediate results. However, that is erroneous since the code path
that assigns placements relies on shardIds and placements, which don't exist
for intermediate results.
With this commit, we do not apply task assignment policies when a router query
hits only intermediate results.
PG recently started propagating foreign key constraints
to partition tables. This came with a select query
to validate the constraint.
We are already setting sequential mode execution for this
command. In order for the validation select query to respect
this setting we need to explicitly set the GUC.
This commit also handles the detach partition part.
We update the column attributes of various clauses of a query,
including target columns and select clauses, when we introduce
new range table entries in the query.
It seems the HAVING clause column attributes were not updated.
This fix resolves the issue.
We recently fixed a spinlock issue where a function was failing
but the spinlock was not being released.
This is the continuation of that work, to eliminate a possible
regression of the issue. The function calls that are moved out of
the spinlock scope are macros and plain type casts. However,
depending on the configuration they have alternate implementations
in the PG source that perform memory allocation.
This commit moves the last bits of code out of the spinlock scope for completeness.
We were establishing connections synchronously. Establishing
connections asynchronously results in some parallelization, saving
hundreds of milliseconds.
In a test I did, this decreased the query time from 150ms to 40ms.
A spinlock is not released when an exception is thrown after the
spinlock is acquired. This has caused infinite waits and an eventual
crash in the maintenance daemon.
This work moves the code that can fail outside of the spinlock
scope, so that in the case of a failure the spinlock is not left locked,
since it was never locked in the first place.
Postgresql loads shared libraries before calculating MaxBackends.
However, Citus relies on MaxBackends being set. Thus, with this
commit we use the same steps to calculate MaxBackends while
Citus is being loaded (e.g., PG_Init is called).
Note that this is safe since all the elements that are used to
calculate MaxBackends are PGC_POSTMASTER gucs and a constant
value.
Before this commit, Citus supported INSERT...SELECT queries with
ON CONFLICT or RETURNING clauses only for pushdownable ones, since
queries supported via coordinator were utilizing COPY infrastructure
of PG to send selected tuples to the target worker nodes.
After this PR, INSERT...SELECT queries with ON CONFLICT or RETURNING
clauses will be performed in two phases via coordinator. In the first
phase selected tuples will be saved to the intermediate table which
is colocated with the target table of the INSERT...SELECT query. Note that
a utility function to save results to the colocated intermediate result
is also implemented as a part of this commit. In the second phase, INSERT..
SELECT query is directly run on the worker node using the intermediate
table as the source table.
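A hedged sketch of a query that now goes through this two-phase coordinator path; the tables are illustrative and are assumed not to be colocated (which is what rules out a plain pushdown here):
```sql
CREATE TABLE source_events (event_id bigint, user_id int, score int);
CREATE TABLE target_totals (user_id int PRIMARY KEY, score int);
SELECT create_distributed_table('source_events', 'event_id');
SELECT create_distributed_table('target_totals', 'user_id');

-- Selected tuples are first saved to an intermediate table colocated with the
-- target, then the INSERT ... SELECT runs on the workers from that table.
INSERT INTO target_totals (user_id, score)
SELECT user_id, sum(score)
FROM source_events
GROUP BY user_id
ON CONFLICT (user_id) DO UPDATE SET score = EXCLUDED.score
RETURNING *;
```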
When initializing a Citus formation automatically from an external piece of
software such as Citus-HA, the following process may be used:
- decide on the groupId in the external software
- SELECT * FROM master_add_inactive_node('localhost', 9701, groupid => X)
When Citus checks for maxGroupId, it forbids other software from picking their
own group ids to use with the master_add_inactive_node() API.
This patch removes the extra testing around maxGroupId.
Description: Support round-robin `task_assignment_policy` for queries to reference tables.
This PR allows users to query multiple placements of shards in a round robin fashion. When `citus.task_assignment_policy` is set to `'round-robin'` the planner will use a round robin scheduling feature when multiple shard placements are available.
The primary use-case is spreading the load of reference table queries to all the nodes in the cluster instead of hammering only the first placement of the reference table. Since reference tables share the same path for selecting the shards with single shard queries that have multiple placements (`citus.shard_replication_factor > 1`) this setting also allows users to spread the query load on these shards.
For modifying queries we do not apply a round-robin strategy. This would be negated by an extra reordering step in the executor for such queries where a `first-replica` strategy is enforced.
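A minimal sketch, with an illustrative reference table:
```sql
CREATE TABLE countries (code text PRIMARY KEY, name text);
SELECT create_reference_table('countries');

SET citus.task_assignment_policy TO 'round-robin';

-- Consecutive executions are assigned to different placements, spreading the
-- load over all nodes that hold a copy of the reference table.
SELECT count(*) FROM countries;
```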