Commit Graph

193 Commits (b877d606c7ab469e7584368fa73ba428905fa164)

Author SHA1 Message Date
Jeff Davis 5b3c32eb38 fixup tests 2020-12-07 13:18:22 -08:00
Marco Slot c9b658daea Add a public.citus_tables view 2020-12-03 17:31:40 +01:00
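
A quick way to try the new view (the exact column set is defined by the migration and is version-dependent):

    -- list every Citus-managed table together with its metadata
    SELECT * FROM public.citus_tables;
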
Nils Dijk 7c891a01a9 create missing objects during upgrade path 2020-11-17 19:01:51 +01:00
Simon Kelly 4f94e544b7 create 9.5-1 udfs and update citus--9.4-1--9.5-1.sql 2020-10-15 13:50:36 +02:00
Simon Kelly 2a6c867cb0 Make citus_prepare_pg_upgrade idempotent
https://github.com/citusdata/citus/issues/3527
2020-10-15 13:49:50 +02:00
Simon Kelly 50fa4af7e4 update migration script 2020-10-08 12:52:27 +02:00
Simon Kelly 6fffee7616
Drop backup table after upgrade
The prepare for upgrade script creates the `public.pg_dist_rebalance_strategy` table, which is not dropped when the upgrade is finished. This may block future upgrades.
2020-10-08 09:48:04 +02:00
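
A minimal sketch of the fix, assuming the backup lives in `public` as the message says (the actual function body may differ):

    -- at the end of citus_finish_pg_upgrade(): drop the backup taken by
    -- citus_prepare_pg_upgrade() so it cannot block the next upgrade
    DROP TABLE IF EXISTS public.pg_dist_rebalance_strategy;
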
Marco Slot dbc348b7e0 Create sequence dependency during metadata syncing 2020-10-06 10:57:39 +02:00
Marco Slot 9bba8bb4e8 Remove master_drop_sequences 2020-10-06 10:57:33 +02:00
Onur Tirtir 17cc810372 Implement "citus local table" creation logic 2020-09-09 11:50:48 +03:00
Halil Ozan Akgul 375310b7f1 Adds support for table undistribution 2020-08-05 14:36:03 +03:00
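
Usage sketch; `undistribute_table` is the 9.5-era UDF name for this feature (treat the exact name and signature as an assumption here):

    -- convert a distributed table back into a regular postgres table
    SELECT undistribute_table('my_distributed_table');
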
Hadi Moshayedi 0e3140c14d Include execution duration in worker_last_saved_explain_analyze 2020-06-11 02:54:54 -07:00
Hadi Moshayedi 5cdfa9f571 Implement EXPLAIN ANALYZE udfs.
Implements worker_save_query_explain_analyze and worker_last_saved_explain_analyze.

worker_save_query_explain_analyze executes and returns the results of a query while
saving its EXPLAIN ANALYZE output to be fetched later.

worker_last_saved_explain_analyze returns the saved EXPLAIN ANALYZE result.
2020-06-09 10:02:05 -07:00
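
A hedged sketch of how the two UDFs pair up (the parameter lists and result shapes are assumptions; the message only names the functions):

    -- execute the query on the worker, returning its results while
    -- saving the EXPLAIN ANALYZE output for a later fetch
    SELECT * FROM worker_save_query_explain_analyze('SELECT count(*) FROM t', '{}')
        AS (count bigint);

    -- fetch the EXPLAIN ANALYZE output saved by the call above
    SELECT * FROM worker_last_saved_explain_analyze();
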
Hanefi Önaldı d535121f8d
Introduce truncate_local_data_after_distributing_table() 2020-04-17 13:21:34 +03:00
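
Typical usage, sketched under the assumption that the UDF takes the table's regclass:

    SELECT create_distributed_table('events', 'tenant_id');
    -- drop the now-redundant local copy that remains on the coordinator
    SELECT truncate_local_data_after_distributing_table('events');
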
Nils Dijk 1d6ba1d09e
Refactor alter role to work on distributed roles (#3739)
DESCRIPTION: Alter role only works for citus managed roles

Alter role was implemented before we had good role management that hooks into the object propagation framework. This is a refactor of all alter role commands that have been implemented, to
 - be on by default
 - only work for supported roles
 - make the citus extension owner a supported role

Instead of distributing the alter role commands for roles at the beginning of node activation, it now _only_ executes the alter role commands for all users in all databases and in the current database.

In preparation for full role support, small refactors have been done in the deparser.

Earlier tests targeting roles other than the citus extension owner have either been slightly changed or removed; they will be put back once we have full role support.

Fixes #2549
2020-04-16 12:23:27 +02:00
Onder Kalaci aa6b641828 Throttle connections to the worker nodes
With this commit, we're introducing a new infrastructure to throttle
connections to the worker nodes. This infrastructure is useful for
multi-shard queries; router queries are not affected by it.

The goal is to prevent establishing more than citus.max_shared_pool_size
number of connections per worker node in total, across sessions.

To do that, we've introduced a new connection flag OPTIONAL_CONNECTION.
The idea is that some connections are optional, such as the second
(and further) connections for the adaptive executor. A single connection
is enough to finish the distributed execution; the others are useful to
execute the query faster. Thus, they can be considered optional connections.
When an optional connection is not granted to the adaptive executor, it
simply skips it and continues the execution with the already established
connections. However, it keeps retrying to establish optional
connections, in case some slots open up again.
2020-04-14 10:27:48 +02:00
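
A sketch of the knob described above (the value is illustrative; size it against your workload before capping the pool):

    -- never open more than 100 connections in total to any one worker,
    -- across all sessions on this coordinator
    ALTER SYSTEM SET citus.max_shared_pool_size = 100;
    SELECT pg_reload_conf();
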
Onder Kalaci 38b8a9ad62 Add citus_remote_connection_stats() function
This function is intended to be used for monitoring
the remote connections.
2020-04-14 10:03:27 +02:00
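
Usage is a plain set-returning call (the output columns are an assumption based on the function's purpose):

    -- one row per worker/database pair with its current connection count
    SELECT * FROM citus_remote_connection_stats();
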
Hadi Moshayedi dda53a0bba GUC for replicate reference tables on activate. 2020-04-08 12:42:45 -07:00
Marco Slot 924cd7343a Defer reference table replication to shard creation time 2020-04-08 12:41:36 -07:00
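
The two commits above are companions; a hedged sketch of the GUC (the name follows the Citus 9.3-era setting):

    -- with this off, a newly activated node gets reference table
    -- placements lazily, at shard creation time, instead of immediately
    SET citus.replicate_reference_tables_on_activate TO off;
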
Hanefi Önaldı d1223bd6cc
Remove migration paths to 9.3-1, introduce 9.3-2 2020-04-03 12:50:45 +03:00
SaitTalhaNisanci 3df578010e
add a UDF to update colocation (#3623)
If two tables have the same distribution column type, we implicitly
colocate them. This is useful since colocation has a big performance
impact in most applications.

When a table is rebalanced, all of the colocated tables are also
rebalanced. If table A and table B are colocated and we want to
rebalance table A, table B will also be rebalanced. We need replica
identity so that logical replication can replicate updates and deletes
during rebalancing. If table B does not have a replica identity we
error out.

A solution to this is to introduce a UDF so that colocation can be
updated. The remaining tables in the colocation group will stay
colocated. For example, if tables A, B and C are colocated, then after
updating table B's colocation, tables A and C stay colocated.

The "updating colocation" step does not move any data around, it only
updated pg_dist_partition and pg_dist_colocation tables. Specifically it
creates a new colocation group for the table and updates the entry in
pg_dist_partition while invalidating any cache.
2020-03-23 13:22:24 +03:00
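
A sketch of the UDF added here; `update_distributed_table_colocation` with a `colocate_with` argument matches the shipped signature as far as I can tell, but treat both as assumptions:

    -- pull table B out of its group; tables A and C remain colocated
    SELECT update_distributed_table_colocation('b', colocate_with => 'none');

    -- or move B into the same group as another table
    SELECT update_distributed_table_colocation('b', colocate_with => 'a');
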
Nils Dijk a77ed9cd23
Refactor master query to be planned by postgres' planner (#3326)
DESCRIPTION: Replace the query planner for the coordinator part with the postgres planner

Closes #2761 

Citus had a simple rule-based planner for the query executed on the query coordinator. This planner grew over time with the addition of SQL support, until it came close to the functionality of the postgres planner. However, the code was brittle and its complexity rose, which made it hard to add new SQL support.

Given its resemblance to the postgres planner, it was a long-standing wish to replace our hand-crafted planner with the well-supported postgres planner. This patch replaces our planner with a call to postgres' planner.

Due to the functionality of the postgres planner we needed to support both projections and filters/quals on the citus custom scan node. When a sort operation is planned above the custom scan, it might require fields to be reordered in the custom scan before returning the tuple (projection). The postgres planner assumes every custom scan node implements projections. Because we controlled the plan that was created, we prevented reordering in the custom scan and had never implemented it before.

The same optimisation applies to HAVING clauses that could have been WHERE clauses. Instead of applying the filter as a HAVING on the aggregate, the planner pushes it down into the plan, where it can reach a custom scan node.

Both filters and projections are applied when tuples are read from the tuple store. If no projections or filters are required, the tuple is returned directly from the tuple store. Otherwise tuples from the tuple store are run through the filter and projection until a qualifying tuple is found and returned.

Besides filters being pushed down, a side effect of quals that could have been a where clause is that a call to read an intermediate result could happen before the first tuple is fetched from the custom scan. This failed because the intermediate result would only be pulled to the coordinator on the first tuple fetch. To overcome this problem we now run the distributed subplans before we run the postgres executor. This ensures the intermediate result is present on the coordinator in time. We account for total time instrumentation by removing the instrumentation before handing control to the postgres executor and updating the timings ourselves.

For future SQL support it is enough to create a valid query structure for the part of the query to be executed on the query coordinating node. As a utility we serialise and print the query at debug level 4 for engineers to inspect what kind of query is being planned on the query coordinator.
2020-02-25 14:39:56 +01:00
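
To see the serialised coordinator query mentioned above, raising the log level should be enough (a sketch; the exact message text is not specified here):

    SET client_min_messages TO debug4;
    -- the coordinator-side query is printed while the query is planned
    SELECT count(*) FROM my_distributed_table;
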
Hadi Moshayedi bc1a800f70 Use current user for repartition join temp schemas.
Otherwise, when using a less privileged user, we might get
errors when trying to create the schema.
2020-02-04 09:48:20 -08:00
Hadi Moshayedi d7aea7fa10 Implement partitioned intermediate results. 2019-12-24 03:53:39 -08:00
Jelte Fennema b655c02352
Add the necessary changes for rebalance strategies on enterprise (#3325)
This commit adds the SQL and C changes necessary to support custom rebalance
strategies in the Enterprise version of Citus.
2019-12-19 15:23:08 +01:00
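
The strategies land in a catalog table; a hedged peek (column names assumed from the feature description):

    -- list the available rebalance strategies and which one is default
    SELECT name, default_strategy FROM pg_catalog.pg_dist_rebalance_strategy;
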
Hadi Moshayedi ef487e0792 Implement fetch_intermediate_results 2019-12-18 10:46:35 -08:00
Hadi Moshayedi 113bd1e5f1 Implement read_intermediate_results 2019-12-17 13:51:16 -08:00
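
A sketch of the reader side, assuming the signature pairs an array of result ids with a copy format and returns records:

    -- read two intermediate results previously placed on this node,
    -- supplying a column definition list for the record type
    SELECT * FROM read_intermediate_results(ARRAY['result_1', 'result_2'],
                                            'binary') AS res(x int, y int);
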
SaitTalhaNisanci 7ff4ce2169
Add adaptive executor support for repartition joins (#3169)
* WIP

* wip

* add basic logic to run a single job with repartitioning joins with adaptive executor

* fix some warnings and return early in ExecuteDependedTasks if there are no depended tasks

* Add the logic to run depended jobs in adaptive executor

The execution logic for depended tasks has changed. With the current
logic:
- All tasks are created from the top level task list.
- At one iteration:
	- CurTasks whose dependencies are executed are found.
	- CurTasks are executed in parallel with the adaptive executor main
logic.
- The iteration is repeated until all tasks are completed.

* Separate adaptive executor repartitioning logic

* Remove duplicate parts

* cleanup directories and schemas

* add basic repartition tests for adaptive executor

* Use the first placement to fetch data

In task tracker, when there are replicas, we try to fetch from a replica
for which a map task succeeded. TaskExecution is used for this;
however, TaskExecution is not used in the adaptive executor, so we
cannot use the same mechanism as the task tracker.

Since the adaptive executor fails when a map task fails (there is no
retry logic yet), we know that if we try to execute a fetch task, all of
its map tasks have already succeeded, so we can just use the first one
to fetch from.

* fix the directory cleanup logic

* do not change the search path while creating a udf

* Enable repartition joins with adaptive executor only via the enable_repartition_joins GUC (see the sketch after this list)

* Add comments to adaptive_executor_repartition

* don't run adaptive executor repartition tests in parallel with other tests

* execute cleanup only in the top level execution

* do cleanup only in the top level execution

* do not begin a transaction if a repartition query is used

* use new connections for repartition-specific queries

New connections are opened to send repartition-specific queries. The
opened connections are closed at FinishDistributedExecution.

While sending repartition queries no transaction is begun, so that
we can see all changes.

* error if a modification was done prior to repartition execution

* do not start a transaction if the query is a repartition query with an SQL task, and clean temporary files and schemas at each subplan level

* fix cleanup logic

* update tests

* add missing function comments

* add test for transaction with DDL before repartition query

* do not close repartition connections in adaptive executor

* rollback instead of commit in repartition join test

* use close connection instead of shutdown connection

* remove unnecessary connection list, ensure schema owner before removing directory

* rename ExecuteTaskListRepartition

* put the fetch query string in the planner, not the executor, as we currently support only replication factor = 1 with the adaptive executor and repartition queries, and we know the query string in the planner phase in that case

* split adaptive executor repartition to DAG execution logic and repartition logic

* apply review items

* apply review items

* use an enum for remote transaction state and fix cleanup for repartition

* add outside transaction flag to find connections that are unclaimed instead of always opening a new transaction

* fix style

* wip

* rename removejobdir to partition cleanup

* do not close connections at the end of repartition queries

* do repartition cleanup in pg catch

* apply review items

* decide whether to use transaction or not at execution creation

* rename isOutsideTransaction and add missing comment

* do not error in pg catch while doing cleanup

* use the replication factor at creation time, not the current one, to decide if the task tracker should be chosen

* apply review items

* apply review items

* apply review item
2019-12-17 19:09:45 +03:00
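
After this PR the behaviour hinges on one GUC (name as used in Citus, modulo the typo in the bullet above; tables are illustrative):

    -- let the adaptive executor plan and run repartition joins,
    -- i.e. joins that are not on the distribution columns
    SET citus.enable_repartition_joins TO on;
    SELECT *
    FROM orders o JOIN items i ON (o.item_id = i.item_id);
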
Philip Dubé c563e0825c Strip trailing whitespace and add final newline (#3186)
This brings files in line with our editorconfig file
2019-11-21 14:25:37 +01:00
Halil Ozan Akgul 5ae7b219ff Create the ALTER ROLE propagation 2019-11-18 18:31:28 +03:00
Hadi Moshayedi 15af1637aa Replicate reference tables to coordinator. 2019-11-15 05:50:19 -08:00
Jelte Fennema a8bd2d58f5
Update SQL definitions to prepare for drain node functionality (#3179) 2019-11-15 10:11:56 +01:00
Jelte Fennema 9fb897a074
Fix queries with repartition joins and group by unique column (#3157)
Postgres doesn't require you to add all columns that are in the target list to
the GROUP BY when you group by a unique column (or columns). It even actively
removes these group by clauses when you do.

This is normally fine, but for repartition joins it is not. The reason is
that the temporary tables don't have these primary key columns, so when a
worker executes the query it complains that columns are missing from the
group by.

This PR fixes that by adding an ANY_VALUE aggregate around each variable in
the target list that is not contained in the group by or in an aggregate.
This is done only for repartition joins.

The ANY_VALUE aggregate chooses the value from an undefined row in the
group.
2019-11-08 15:36:18 +01:00
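
A sketch of the rewrite on an illustrative schema (`id` is the primary key of `t`):

    -- what the user writes; postgres accepts it because id is unique
    SELECT id, name FROM t JOIN s ON (t.ref_id = s.ref_id) GROUP BY id;

    -- what is sent for repartition joins: ungrouped target entries are
    -- wrapped so the temporary worker tables need no primary key
    SELECT id, any_value(name) FROM t JOIN s ON (t.ref_id = s.ref_id) GROUP BY id;
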
Marco Slot 04040e0a37 Revoke usage from the citus schema 2019-10-23 00:08:17 +02:00
Jelte Fennema 78e495e030
Add shouldhaveshards to pg_dist_node (#2960)
This is an improvement over #2512.

This adds the boolean shouldhaveshards column to pg_dist_node. When it's false, create_distributed_table for new colocation groups will not create shards on that node. Reference tables will still be created on nodes where it is false.
2019-10-22 16:47:16 +02:00
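
A hedged sketch of flipping the flag; `master_set_node_property` is the UDF I believe shipped alongside the column, but treat it as an assumption:

    -- stop placing shards of new colocation groups on this node
    SELECT master_set_node_property('worker-1', 5432, 'shouldhaveshards', false);
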
Jelte Fennema ec4a165eec Improve isolation test block detection (#3055) 2019-10-01 14:10:15 +02:00
Jelte Fennema 40f785e6d8 Move citus_isolation_test_session_is_blocked to separate udf sql file 2019-10-01 14:10:15 +02:00
Jelte Fennema dab16be283
Set default threshold on get_rebalance_table_shards_plan to 0, like rebalance_table_shards (#3039)
The default `threshold` of `rebalance_table_shards` was set to 0 in https://github.com/citusdata/shard_rebalancer/pull/73.
However, the default for get_rebalance_table_shards_plan was not updated. This
can cause the confusing situation where the actual steps run by
`rebalance_table_shards` are not the same as the ones returned by
`get_rebalance_table_shards_plan`.
2019-09-27 17:21:36 +02:00
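
With both defaults now at 0, passing the threshold explicitly restores the old behaviour (the argument name `threshold` is assumed here):

    -- plan and execute with the same 10% threshold
    SELECT * FROM get_rebalance_table_shards_plan('my_table', threshold := 0.1);
    SELECT rebalance_table_shards('my_table', threshold := 0.1);
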
Philip Dubé bc1ad67eb5 Distribute CALL on distributed procedures to metadata workers
Lots taken from https://github.com/citusdata/citus/pull/2829
2019-09-24 17:31:09 +00:00
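
A sketch of what this enables, on an assumed procedure distributed by its first argument:

    -- the CALL is delegated to the metadata worker owning the matching shard
    CALL process_order(42, 'shipped');
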
Onder Kalaci d7e2968120 Add parameters to create_distributed_function()
With this commit, we're changing the API for create_distributed_function()
such that users can provide the distribution argument and the colocation
information.
2019-09-22 21:53:33 +02:00
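
A sketch of the extended API (argument names assumed; '$1' picks the first function parameter as the distribution argument):

    SELECT create_distributed_function('process_order(int,text)',
                                       distribution_arg_name := '$1',
                                       colocate_with := 'orders');
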
Onder Kalaci cde6b02858 Add columns to pg_dist_object for distributed functions
This PR simply adds the columns to pg_dist_object and
implements the necessary metadata changes to keep track of the
distribution argument of functions/procedures.
2019-09-16 17:28:04 +02:00
Hanefi Onaldi 8f2a3a0604
Introduce create_distributed_function(regproc) UDF (#2961)
This PR aims to add the minimal set of changes required to start
distributing functions. You can use create_distributed_function(regproc)
UDF to distribute a function.

    SELECT create_distributed_function('add(int,int)');

The function definition should include the parameter types to properly
identify the correct function that we wish to distribute.
2019-09-13 23:27:46 +03:00
Jelte Fennema 4bbf65d913
Change SQL migration build process for easier reviews (#2951)
@thanodnl told me it was a bit of a problem that it's impossible to see
the history of a UDF in git. The only way to do so is by reading all the
sql migration files from new to old. Another problem is that it's also
hard to review a changed UDF during code review, because to find out
what changed you have to do the same. I thought of an IMHO better (but
not perfect) way to handle this.

We keep the definition of a UDF in sql/udfs/{name_of_udf}/latest.sql.
That file we change whenever we need to make a change to the UDF. On
top of that you also make a snapshot of the file in
sql/udfs/{name_of_udf}/{migration-version}.sql (e.g. 9.0-1.sql) by
copying the contents. This way you can easily view what the actual
changes were by looking at the git history of the latest.sql file.

There's still the question on how to use these files then. Sadly
postgres doesn't allow inclusion of other sql files in the migration sql
file (it does in psql using \i). So instead I used the C preprocessor+
make to compile a sql/xxx.sql to a build/sql/xxx.sql file. This final
build/sql/xxx.sql file has every occurence of #include "somefile.sql" in
sql/xxx.sql replaced by the contents of somefile.sql.
2019-09-13 18:44:27 +02:00
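
A sketch of the resulting layout and include mechanism (file names illustrative):

    -- sql/citus--8.3-1--9.0-1.sql, before preprocessing
    #include "udfs/citus_prepare_pg_upgrade/9.0-1.sql"

Make then runs the C preprocessor over it, producing build/sql/citus--8.3-1--9.0-1.sql with the UDF body pasted in place of the #include.
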