citus

Commit Graph

Author	SHA1	Message	Date
aykut-bozkurt	1ad1a0a336	add citus_task_wait udf to wait on desired task status (#6475 ) We already have citus_job_wait to wait until the job reaches the desired state. That PR adds waiting on task state to allow more granular waiting. It can be used for Citus operations. Moreover, it is also useful for testing purposes. (wait until a task reaches specified state) Related to #6459.	2022-12-12 22:41:03 +03:00
Ahmet Gedemenli	cb02d62369	Unique names for replication artifacts (#6529 ) DESCRIPTION: Create replication artifacts with unique names We're creating replication objects with generic names. This disallows us to enable parallel shard moves, as two operations might use the same objects. With this PR, we'll create below objects with operation specific names, by appending OparationId to the names. * Subscriptions * Publications * Replication Slots * Users created for subscriptions	2022-12-06 15:48:16 +03:00
Teja Mupparti	e14dc5d45d	Address the issues/comments from the original PR# 6315 1) Regular users fail to use clock UDF with permission issue. 2) Clock functions were declared as STABLE, whereas by definition they are VOLATILE. By design, any clock/time functions will return different results for each call even within a single SQL statement. Note: UDF citus_get_transaction_clock() is a misnomer as it internally calls the clock tick which always returns different results for every invocation in the same transaction.	2022-12-05 11:06:21 -08:00
Teja Mupparti	01103ce05d	This implements a new UDF citus_get_cluster_clock() that returns a monotonically increasing logical clock. Clock guarantees to never go back in value after restarts, and makes best attempt to keep the value close to unix epoch time in milliseconds. Also, introduces a new GUC "citus.enable_cluster_clock", when true, every distributed transaction is stamped with logical causal clock and persisted in a catalog pg_dist_commit_transaction.	2022-10-28 10:15:08 -07:00
Ahmet Gedemenli	96912d9ba1	Add status column to get_rebalance_progress() (#6403 ) DESCRIPTION: Adds status column to get_rebalance_progress() Introduces a new column named `status` for the function `get_rebalance_progress()`. For each ongoing shard move, this column will reveal information about that shard move operation's current status. For now, candidate status messages could be one of the below. * Not Started * Setting Up * Copying Data * Catching Up * Creating Constraints * Final Catchup * Creating Foreign Keys * Completing * Completed	2022-10-17 16:55:31 +03:00
Jelte Fennema	6277ffd69e	Reduce isolation flakyness by improving blocked process detection (#6405 ) Sometimes our CI randomly fails on a test in a way similar to this: ```diff step s2-drop: DROP TABLE cancel_table; - + <waiting ...> +step s2-drop: <... completed> starting permutation: s1-timeout s1-begin s1-sleep10000 s1-rollback s1-reset s1-drop ``` Source: https://app.circleci.com/pipelines/github/citusdata/citus/26524/workflows/5415b84f-13a3-482f-bef9-648314c79a67/jobs/756377 I tried to fix that already in #6252 by disabling the maintenance daemon during isolation tests. But it seems that hasn't fixed all cases of these errors. This is another attempt at fixing these issues that seems to have better results. What it does is that it starts using the pInterestingPids parameter that citus_isolation_test_session_is_blocked receives. With this change we start filter out block-edges that are not caused by any of these pids. In passing this change also makes it possible to run `isolation_create_distributed_table_concurrently` with `check-isolation-base`	2022-10-12 16:35:09 +02:00
Ahmet Gedemenli	e36890ce55	Add source_lsn and target_lsn fields into get_rebalance_progress (#6364 ) DESCRIPTION: Adds source_lsn and target_lsn fields into get_rebalance_progress Adding two fields named `source_lsn` and `target_lsn` to the function `get_rebalance_progress`. Target lsn data is fetched in `GetShardStatistics`, by expanding the query sent to workers (joining with pg_subscription_rel and pg_stat_subscription). Then put into the hashmap, for each shard. Source lsn data is fetched in `BuildWorkerShardStatististicsHash`, in the loop that iterate each node, by sending a pg_current_wal_lsn query to each node. Then put into the hashmap, for each node.	2022-10-05 11:12:24 +03:00
Jelte Fennema	f13b140621	Show citus_copy_shard_placement progress in get_rebalance_progress (#6322 ) DESCRIPTION: Show citus_copy_shard_placement progress in get_rebalance_progress When rebalancing to a new node that does not have reference tables yet the rebalancer will first copy the reference tables to the nodes. Depending on the size of the reference tables, this might take a long time. However, there's no indication of what's happening at this stage of the rebalance. This PR improves this situation by also showing the progress of any citus_copy_shard_placement calls when calling get_rebalance_progress.	2022-09-13 08:59:52 +00:00
Nils Dijk	cda3686d86	Feature: run rebalancer in the background (#6215 ) DESCRIPTION: Add a rebalancer that uses background tasks for its execution Based on the baclground jobs and tasks introduced in #6296 we implement a new rebalancer on top of the primitives of background execution. This allows the user to initiate a rebalance and let Citus execute the long running steps in the background until completion. Users can invoke the new background rebalancer with `SELECT citus_rebalance_start();`. It will output information on its job id and how to track progress. Also it returns its job id for automation purposes. If you simply want to wait till the rebalance is done you can use `SELECT citus_rebalance_wait();` A running rebalance can be canelled/stopped with `SELECT citus_rebalance_stop();`.	2022-09-12 20:46:53 +03:00
Marco Slot	48f7d6c279	Show local managed tables in citus_tables and hide tables owned by extensions (#6321 ) Co-authored-by: Marco Slot <marco.slot@gmail.com>	2022-09-12 17:49:17 +03:00
Marco Slot	ba2fe3e3c4	Remove do_repair option from citus_copy_shard_placement (#6299 ) Co-authored-by: Marco Slot <marco.slot@gmail.com>	2022-09-09 15:44:30 +02:00
Nils Dijk	00a94c7f13	Implement infrastructure to run sql jobs in the background (#6296 ) DESCRIPTION: Add infrastructure to run long running management operations in background This infrastructure introduces the primitives of jobs and tasks. A task consists of a sql statement and an owner. Tasks belong to a Job and can depend on other tasks from the same job. When there are either runnable or running tasks we would like to make sure a bacgrkound task queue monitor process is running. A Task could be in running state while there is actually no monitor present due to a database restart or failover. Once the monitor starts it will reset any running task to its runnable state. To make sure only one background task queue monitor is ever running at once it will acquire an advisory lock that self conflicts. Once a task is done it will find all tasks depending on this task. After checking that the task doesn't have unmet dependencies it will transition the task from blocked to runnable state for the task to be picked up on a subsequent task start. Currently only one task can be running at a time. This can be improved upon in later releases without changes to the higher level API. The initial goal for this background tasks is to allow a rebalance to run in the background. This will be implemented in a subsequent PR.	2022-09-09 16:11:19 +03:00
Jelte Fennema	e29db74a19	Don't override postgres C symbols with our own (#6300 ) When introducing our overrides of pg_cancel_backend and pg_terminate_backend we accidentally did that in such a way that we cannot call the original pg_cancel_backend and pg_terminate_backend from C anymore. This happened because we defined the exact same symbols in our shared library as postgres does in its own binary. This fixes that by using a different names for the C function than for the SQL function. Making this work in all upgrade and downgrade scenarios is not trivial though, because we actually need to remove the C function definition. Postgres errors in two different times when the symbol that a C function wants to call is not defined in the library it expects it in: 1. When creating the SQL function definition 2. When calling the SQL function Item 1 causes an issue when creating our extension for the first time. We then go execute all the migrations that we have. So if the 11.0 migration contains a SQL function definition that still references the pg_cancel_backend symbol, that migration will fail. This issue is solved by actually changing the SQL definition in the old migration. This is not enough to fix all issues though. Item 2 causes an issue after an upgrade to 11.1, because it won't have the new definition of the SQL function. This is solved by recreating the SQL functions in the migration to 11.1. That way it gets the new definition. Then finally there's the case of downgrades. To continue to make our pg_cancel_backend SQL function work after downgrading, we will need to make a patch release for 11.0 that includes the new citus_cancel_backend symbol. This is done in a separate commit.	2022-09-07 11:27:05 +02:00
Nitish Upreti	d7404a9446	'Deferred Drop' and robust 'Shard Cleanup' for Splits. (#6258 ) DESCRIPTION: This PR adds support for 'Deferred Drop' and robust 'Shard Cleanup' for Splits. Common Infrastructure This PR introduces new common infrastructure so as any operation that wants robust cleanup of resources can register with the cleaner and have the resources cleaned appropriately based on a specified policy. 'Shard Split' is the first consumer using this new infrastructure. Note : We only support adding 'shards' as resources to be cleaned-up right now but the framework will be extended to support other resources in future. Deferred Drop for Split Deferred Drop Support ensures that shards undergoing split are not dropped inline as part of operation but dropped later when no active read queries are running on shard. This helps with : Avoids any potential deadlock scenarios that can cause long running Split operation to rollback. Avoids Split operation blocking writes and then getting blocked (due to running queries on the shard) when trying to drop shards. Deferred drop is the new default behavior going forward. Shard Cleaner Extension Shard Cleaner is a background task responsible for deferred drops in case of 'Move' operations. The cleaner has been extended to ensure robust cleanup of shards (dummy shards and split children) in case of a failure based on the new infrastructure mentioned above. The cleaner also handles deferred drop for 'Splits'. TESTING: New test ''citus_split_shard_by_split_points_deferred_drop' to test deferred drop support. New test 'failure_split_cleanup' to test shard cleanup with failures in different stages. Update 'isolation_blocking_shard_split and isolation_non_blocking_shard_split' for deferred drop. Added non-deferred drop version of existing tests : 'citus_split_shard_no_deferred_drop' and 'citus_non_blocking_splits_no_deferred_drop'	2022-09-06 12:11:20 -07:00
Marco Slot	6bb31c5d75	Add non-blocking variant of create_distributed_table (#6087 ) Added create_distributed_table_concurrently which is nonblocking variant of create_distributed_table. It bases on the split API which takes advantage of logical replication to support nonblocking split operations. Co-authored-by: Marco Slot <marco.slot@gmail.com> Co-authored-by: aykutbozkurt <aykut.bozkurt1995@gmail.com>	2022-08-30 15:35:40 +03:00
Sameer Awasekar	4df8eca77f	Add worker_split_shard_release_dsm udf to release dynamic shared memory (#6248 ) The code introduces worker_split_shard_release_dsm udf to release the dynamic shared memory segment allocated during non-blocking split workflow.	2022-08-26 18:27:32 +05:30
Ahmet Gedemenli	0631e1998b	Fix upgrade paths for #6100 (#6176 ) * Fix upgrade paths for #6100 Co-authored-by: Hanefi Onaldi <Hanefi.Onaldi@microsoft.com>	2022-08-17 18:56:53 +03:00
aykut-bozkurt	52efe08642	default mode for shard splitting is set to auto. (#6179 )	2022-08-17 12:18:47 +03:00
aykut-bozkurt	be06d65721	Nonblocking tenant isolation is supported by using split api. (#6167 )	2022-08-17 11:13:07 +03:00
Jelte Fennema	8017693b2f	Allow specifying the shard_transfer_mode when replicating reference tables (#6070 ) When using `citus.replicate_reference_tables_on_activate = off`, reference tables need to be replicated later. This can be done using the `replicate_reference_tables()` UDF. However, this function only allowed blocking replication. This changes the function to default to logical replication instead, and allows choosing any of our existing shard transfer modes.	2022-08-09 13:21:31 +03:00
Jelte Fennema	dd548ee3c7	Use faster custom copy logic for non-blocking shard moves (#6119 ) DESCRIPTION: Use faster custom copy logic for non-blocking shard moves Non-blocking shard moves consist of two main phases: 1. Initial data copy 2. Catchup phase This changes the first of these phases significantly. Previously we used the copy logic provided by postgres subscriptions. This meant we didn't have to implement it ourselves, but it came with the downside of little control. When implementing shard splits we needed more control to even make it work, so we implemented our own logic for copying data between nodes. This PR starts using that logic for non-blocking shard moves. Doing so has four main advantages: 1. It uses COPY in binary format when possible, which is cheaper to encode and decode. Furthermore it very often results in less data that needs to be sent over the network. 2. It allows us to create the primary key (or other replica identity) after doing the initial data copy. This should give some speed up over the total run, because creating an index is bulk is much faster than incrementally building it. 3. It doesn't require a replication slot per parallel copy. Increasing the maximum number of replication slots uses resources in postgres, even if they are not used. So reducing the number of replication slots that shard moves need is nice. 4. Logical replication table_sync workers are slow to start up, so if lots of shards need to be copied that can make it quite slow. This can happen easily when combining Postgres partitioning with Citus.	2022-08-08 17:09:43 +02:00
Ahmet Gedemenli	8b68b0b5bb	Fix pg upgrade script for foreign tables (#6100 ) Fixes unexpected error for foreign tables when upgrading pg	2022-08-05 13:35:17 +03:00
Sameer Awasekar	e236711eea	Introduce Non-Blocking Shard Split Workflow	2022-08-04 16:32:38 +02:00
Jelte Fennema	abffa6c3b9	Use shard split copy code for blocking shard moves (#6098 ) The new shard copy code that was created for shard splits has some advantages over the old shard copy code. The old code was using worker_append_table_to_shard, which wrote to disk twice. And it also didn't use binary copy when that was possible. Both of these issues were fixed in the new copy code. This PR starts using this new copy logic also for shard moves, not just for shard splits. On my local machine I created a single shard table like this. ```sql set citus.shard_count = 1; create table t(id bigint, a bigint); select create_distributed_table('t', 'id'); INSERT into t(id, a) SELECT i, i from generate_series(1, 100000000) i; ``` I then turned `fsync` off to make sure I wasn't bottlenecked by disk. Finally I moved this shard between nodes with `citus_move_shard_placement` with `block_writes`. Before this PR a move took ~127s, after this PR it took only ~38s. So for this small test this resulted in spending ~70% less time. And I also tried the same test for a table that contained large strings: ```sql set citus.shard_count = 1; create table t(id bigint, a bigint, content text); select create_distributed_table('t', 'id'); INSERT into t(id, a, content) SELECT i, i, 'aunethautnehoautnheaotnuhetnohueoutnehotnuhetncouhaeohuaeochgrhgd.athbetndairgexdbuhaobulrhdbaetoausnetohuracehousncaoehuesousnaceohuenacouhancoexdaseohusnaetobuetnoduhasneouhaceohusnaoetcuhmsnaetohuacoeuhebtokteaoshetouhsanetouhaoug.lcuahesonuthaseauhcoerhuaoecuh.lg;rcydabsnetabuesabhenth' from generate_series(1, 20000000) i; ```	2022-08-01 20:10:36 +03:00
Hanefi Onaldi	eb3e5ee227	Introduce citus_locks view citus_locks combines the pg_locks views from all nodes and adds global_pid, nodeid, and relation_name. The columns of citus_locks don't change based on the Postgres version, however the pg_locks's columns do. Postgres 14 added one more column to pg_locks (waitstart timestamptz). citus_locks has the most expansive column set, including the newly added column. If citus_locks is queried in a Postgres version where pg_locks doesn't have some columns, the values for those columns in citus_locks will be NULL	2022-07-21 03:06:57 +03:00
Nitish Upreti	5b3537cdff	Shard Split for Citus (#6029 ) * Blocking split setup * Add missing type * Missing API from Metadata Sync * Shard Split e2e code * Worker Split Copy DestReceiver skeleton * Basic destreceiver code * worker_split_copy UDF * UDF calling * Split points are text * Isolate Tenant and Split Shard Unification * Fixing executor and misc * Reindent code * Fixing UDF definitions * Hello World Local Copy works * Remote copy hello world works * Local and Remote binary test * Fixing text local copy and adding tests * Hello World shard split works * Negative tests * Blocking Split workflow works * Refactor * Bug fix * Reindent * Cleaning up and adding comments * Basic test for shard split workflow * ReIndent * Circle CI integration * Removing include causing circle-ci build failure * Remove SplitCopyDestReceiver and use PartitionedResultDestReceiver * Add support for citus.enable_binary_protocol * Reindent * Fix build break * Update Test * Cleanup on catch * Addressing open comments * Update downgrade script and quote schema/table in COPY statement * Fix metadata sync issue. Update regression test * Isolation test and bug fix * Add Isolation test, fix foreign constraint deadlock issue * Misc code review comments * Test name needing to be quoted * Refactor code from review comments * Explaining shardGroupSplitIntervalListList * Fix upgrade & downgrade * Fix broken test * Test fix Round 2 * Fixing bug and modifying test appropriately * Fully qualify copy udf name. Run Reindent * Address PR comments * Fix null handling when creating AuxiliaryStructures * Ensure local copy is triggered in tests * Limit max shards that can be created with split * Test failure fix * Remove split_mode and use shard_transfer_mode instead' * Fix test failure * Fix test failure * Fixing permission issue when splitting non-superuser owned tables * Fix test expected output * Remove extra space * Fix test * attempt to fix test * Addressing Marco's PR comment * Only clean shards created by workflow * Remove from merge * Update test	2022-07-18 02:54:15 -07:00
ywj	1675519f93	Support citus_columnar as separate extension (#5911 ) * Support upgrade and downgrade and separate columnar as citus_columnar extension Co-authored-by: Yanwen Jin <yanwjin@microsoft.com> Co-authored-by: Jeff Davis <jeff@j-davis.com>	2022-07-13 21:08:29 -07:00
Halil Ozan Akgul	1490acbbe9	Removes incorrect parameter from get_all_active_transactions	2022-07-06 11:35:46 +03:00
Onder Kalaci	bab4c0a8c3	Fixes a bug that prevents upgrades when there are no worker nodes	2022-06-28 15:54:49 +02:00
Jelte Fennema	184c7c0bce	Make enterprise features open source (#6008 ) This PR makes all of the features open source that were previously only available in Citus Enterprise. Features that this adds: 1. Non blocking shard moves/shard rebalancer (`citus.logical_replication_timeout`) 2. Propagation of CREATE/DROP/ALTER ROLE statements 3. Propagation of GRANT statements 4. Propagation of CLUSTER statements 5. Propagation of ALTER DATABASE ... OWNER TO ... 6. Optimization for COPY when loading JSON to avoid double parsing of the JSON object (`citus.skip_jsonb_validation_in_copy`) 7. Support for row level security 8. Support for `pg_dist_authinfo`, which allows storing different authentication options for different users, e.g. you can store passwords or certificates here. 9. Support for `pg_dist_poolinfo`, which allows using connection poolers in between coordinator and workers 10. Tracking distributed query execution times using citus_stat_statements (`citus.stat_statements_max`, `citus.stat_statements_purge_interval`, `citus.stat_statements_track`). This is disabled by default. 11. Blocking tenant_isolation 12. Support for `sslkey` and `sslcert` in `citus.node_conninfo`	2022-06-16 00:23:46 -07:00
Marco Slot	36c4ec6d53	Introduce a citus_finish_citus_upgrade() function	2022-06-13 13:15:15 +02:00
Onder Kalaci	dd02e1755f	Parallelize metadata syncing on node activate It is often useful to be able to sync the metadata in parallel across nodes. Also citus_finalize_upgrade_to_citus11() uses start_metadata_sync_to_primary_nodes() after this commit. Note that this commit does not parallelize all pieces of node activation or metadata syncing. Instead, it tries to parallelize potenially large parts of metadata, which is the objects and distributed tables (in general Citus tables). In the future, it would be nice to sync the reference tables in parallel across nodes. Create ~720 distributed tables / ~23450 shards ```SQL -- declaratively partitioned table CREATE TABLE github_events_looooooooooooooong_name ( event_id bigint, event_type text, event_public boolean, repo_id bigint, payload jsonb, repo jsonb, actor jsonb, org jsonb, created_at timestamp ) PARTITION BY RANGE (created_at); SELECT create_time_partitions( table_name := 'github_events_looooooooooooooong_name', partition_interval := '1 day', end_at := now() + '24 months' ); CREATE INDEX ON github_events_looooooooooooooong_name USING btree (event_id, event_type, event_public, repo_id); SELECT create_distributed_table('github_events_looooooooooooooong_name', 'repo_id'); SET client_min_messages TO ERROR; ``` across 1 node: almost same as expected ```SQL SELECT start_metadata_sync_to_primary_nodes(); Time: 15664.418 ms (00:15.664) select start_metadata_sync_to_node(nodename,nodeport) from pg_dist_node; Time: 14284.069 ms (00:14.284) ``` across 7 nodes: ~3.5x improvement ```SQL SELECT start_metadata_sync_to_primary_nodes(); ┌──────────────────────────────────────┐ │ start_metadata_sync_to_primary_nodes │ ├──────────────────────────────────────┤ │ t │ └──────────────────────────────────────┘ (1 row) Time: 25711.192 ms (00:25.711) -- across 7 nodes select start_metadata_sync_to_node(nodename,nodeport) from pg_dist_node; Time: 82126.075 ms (01:22.126) ```	2022-05-23 09:15:48 +02:00
Marco Slot	79d7e860e6	Add a run_command_on_coordinator function	2022-05-19 10:26:09 +02:00
Onder Kalaci	127450466e	Do not warn unncessarily when a node is removed In the past (pre-11), we allowed removing worker nodes that had active placements for replicated distributed table, without even checking if there are any other replicas of the same placement. However, with #5469, we prevent disabling nodes via a hard error when there is the last active placement of shard, as we do for reference tables. Note that otherwise, we'd allow users to lose data. As of today, the NOTICE is completely irrelevant.	2022-05-18 17:23:38 +02:00
Onder Kalaci	db998b3d66	Adds "sync" option to citus_disable_node() UDF Before this commit, we had: ```SQL SELECT citus_disable_node(nodename, nodeport, force boolean DEFAULT false) ``` Where, we allow forcing to disable first worker node with `force:=true`. However, it entails the risk for losing data / diverging placement data etc. With `force` flag, we control disabling the first worker node, and with `async` flag we control whether the changes are done via bg worker or immediately. ```SQL SELECT citus_disable_node(nodename, nodeport, force boolean DEFAULT false, sync boolean DEFAULT false) ``` Where we can achieve all the following: \| Mode \| Data loss possibility \| Can run in 2PC \| Handle multiple node failures \| Immediately effective \| \| --- \|--- \|--- \|--- \|--- \| \| force:false, sync: false \| false \| true \| true \| false \| \| force:false, sync: true \| false \| false \| false \| true \| \| force:true, sync: false \| true \| true \| true \| false \| \| force:true, sync: true \| false \| false \| false \| true \|	2022-05-18 17:21:12 +02:00
Marco Slot	6fad5dc207	Add a citus_is_coordinator function	2022-05-13 10:02:52 +02:00
Marco Slot	ceb593c9da	Convert citus.hide_shards_from_app_name_prefixes to citus.show_shards_for_app_name_prefixes	2022-05-03 14:22:13 +02:00
Halil Ozan Akgul	4dbc760603	Introduces citus_coordinator_node_id	2022-03-22 10:34:22 +03:00
Marco Slot	5bb5359da0	Fix worker node version check	2022-03-17 13:23:02 +01:00
Marco Slot	22a18fc1f2	Fix typo in upgrade function	2022-03-17 13:23:02 +01:00
Halil Ozan Akgül	333bcc7948	Global PID Helper Functions (#5768 ) * Introduces citus_nodename_for_nodeid and citus_nodeport_for_nodeid functions * Introduces citus_nodeid_for_gpid and citus_pid_for_gpid functions * Add tests	2022-03-09 13:15:59 +03:00
Onder Kalaci	c32b2de1a7	Improve citus_lock_waits 1) Remove useless columns 2) Show backends that are blocked on a DDL even before gpid is assigned 3) One minor bugfix, where we clear distributedCommandOriginator properly.	2022-03-07 11:10:44 +01:00
Ahmet Gedemenli	2a3c0c1914	Revert upgrade script changes (#5757 )	2022-03-07 13:04:58 +03:00
Nils Dijk	3801576dfb	Move pg_dist_object to pg_catalog (#5765 ) DESCRIPTION: Move pg_dist_object to pg_catalog Historically `pg_dist_object` had been created in the `citus` schema as an experiment to understand if we could move our catalog tables to a branded schema. We quickly realised that this interfered with the UX on our managed services and other environments, where users connected via a user with the name of `citus`. By default postgres put the username on the search_path. To be able to read the catalog in the `citus` schema we would need to grant access permissions to the schema. This caused newly created objects like tables etc, to default to this schema for creation. This failed due to the write permissions to that schema. With this change we move the `pg_dist_object` catalog table to the `pg_catalog` schema, where our other schema's are also located. This makes the catalog table visible and readable by any user, like our other catalog tables, for debugging purposes. Note: due to the change of schema, we had to disable 1 test that was running into a discrepancy between the schema and binary. Secondly, we needed to make the lookup functions for the `pg_dist_object` relation and their indexes less strict on the fallback of the naming due to an other test that, due to an unfortunate cache invalidation, needed to lookup the relation again. This makes that we won't default to _only_ resolving from `pg_catalog` outside of upgrades.	2022-03-04 17:40:38 +00:00
Halil Ozan Akgul	0500a62515	Updates citus_dist_stat_activity to use citus_stat_activity	2022-03-04 17:28:17 +03:00
Onder Kalaci	c7b67ba0ea	Add citus_backend_gpid() And also citus_calculate_gpid(nodeId,pid). These UDFs are just wrappers for the existing functions. Useful for testing and simple manipulation of citus_stat_activity.	2022-03-03 15:29:40 +01:00
Halil Ozan Akgul	06a0509b1a	Introduces citus_stat_activity view	2022-03-03 16:19:20 +03:00
Marco Slot	3ba61244b8	Synchronize pg_dist_colocation metadata	2022-03-03 11:01:59 +01:00
Onder Kalaci	35ec9721b4	Add a new API for enabling Citus MX for clusters upgrading from earlier versions Clusters created pre-Citus 11 mostly didn't have metadata sync enabled. For those clusters, we add a utility UDF which fixes some minor issues and sync the necessary objects to the workers.	2022-03-02 17:02:55 +01:00
Ahmet Gedemenli	e1809af376	Propagate CREATE AGGREGATE commands	2022-03-02 10:52:43 +03:00
Hanefi Onaldi	6c25eea62f	Fix some typos in comments	2022-02-24 19:48:52 +03:00
Marco Slot	72d8fde28b	Use intermediate results for re-partition joins	2022-02-23 19:40:21 +01:00
Onder Kalaci	dffcafc096	Use global pids in citus_lock_waits	2022-02-21 17:46:34 +01:00
Onder Kalaci	331af3dce8	Dumping wait edges becomes optionally scan all backends Before this commit, dumping wait edges can only be used for distributed deadlock detection purposes. With this commit, we open the possibility that we can use it for any backend.	2022-02-21 17:37:07 +01:00
Halil Ozan Akgul	f6cd4d0f07	Overrides pg_cancel_backend and pg_terminate_backend to accept global pid	2022-02-21 16:41:35 +03:00
Nils Dijk	ea86f9f94e	Add support for TEXT SEARCH CONFIGURATION objects (#5685 ) DESCRIPTION: Implement TEXT SEARCH CONFIGURATION propagation The change adds support to Citus for propagating TEXT SEARCH CONFIGURATION objects. TSConfig objects cannot always be created in one create statement, and instead require a create statement followed by many alter statements to get turned into the object they should represent. To support this we add functionality to the worker to create or replace objects based on a list of statements. When the lists of the local object and the remote object correspond 1:1 we skip the creation of the object and simply mark it distributed. This is especially important for TSConfig objects as initdb pre-populates databases with a dozen configurations (for many different languages). When the user creates a new TSConfig based on the copy of an existing configuration there is no direct link to the object copied from. Since there is no link we can't simply rely on propagating the dependencies to the worker and send a qualified	2022-02-17 13:12:46 +01:00
Halil Ozan Akgul	8ee02b29d0	Introduce global PID	2022-02-08 16:49:38 +03:00
Burak Velioglu	f88cc230bf	Handle tables and objects as metadata. Update UDFs accordingly With this commit we've started to propagate sequences and shell tables within the object dependency resolution. So, ensuring any dependencies for any object will consider shell tables and sequences as well. Separate logics for both shell tables and sequences have been removed. Since both shell tables and sequences logic were implemented as a part of the metadata handling before that logic, we were propagating them while syncing table metadata. With this commit we've divided metadata (which means anything except shards thereafter) syncing logic into multiple parts and implemented it either as a part of ActivateNode. You can check the functions called in ActivateNode to check definition of different metadata. Definitions of start_metadata_sync_to_node and citus_activate_node have also been updated. citus_activate_node will basically create an active node with all metadata and reference table shards. start_metadata_sync_to_node will be same with citus_activate_node except replicating reference tables. stop_metadata_sync_to_node will remove all the metadata. All of those UDFs need to be called by superuser.	2022-01-31 16:20:15 +03:00
Teja Mupparti	54862f8c22	(1) Functions will be delegated even when present in the scope of an explicit BEGIN/COMMIT transaction block or in a UDF calling another UDF. (2) Prohibit/Limit the delegated function not to do a 2PC (or any work on a remote connection). (3) Have a safety net to ensure the (2) i.e. we should block the connections from the delegated procedure or make sure that no 2PC happens on the node. (4) Such delegated functions are restricted to use only the distributed argument value. Note: To limit the scope of the project we are considering only Functions(not procedures) for the initial work. DESCRIPTION: Introduce a new flag "force_delegation" in create_distributed_function(), which will allow a function to be delegated in an explicit transaction block. Fixes #3265 Once the function is delegated to the worker, on that node during the planning distributed_planner() TryToDelegateFunctionCall() CheckDelegatedFunctionExecution() EnableInForceDelegatedFuncExecution() Save the distribution argument (Constant) ExecutorStart() CitusBeginScan() IsShardKeyValueAllowed() Ensure to not use non-distribution argument. ExecutorRun() AdaptiveExecutor() StartDistributedExecution() EnsureNoRemoteExecutionFromWorkers() Ensure all the shards are local to the node in the remoteTaskList. NonPushableInsertSelectExecScan() InitializeCopyShardState() EnsureNoRemoteExecutionFromWorkers() Ensure all the shards are local to the node in the placementList. This also fixes a minor issue: Properly handle expressions+parameters in distribution arguments	2022-01-19 16:43:33 -08:00
Marco Slot	33bfa0b191	Hide shards from application_name's with a specific prefix	2022-01-18 15:20:55 +04:00
Önder Kalacı	5305aa4246	Do not drop sequences when dropping metadata (#5584 ) Dropping sequences means we need to recreate and hence losing the sequence. With this commit, we keep the existing sequences such that resyncing wouldn't drop the sequence. We do that by breaking the dependency of the sequence from the table.	2022-01-06 09:48:34 +01:00
Önder Kalacı	c9127f921f	Avoid round trips while fixing index names (#5549 ) With this commit, fix_partition_shard_index_names() works significantly faster. For example, 32 shards, 365 partitions, 5 indexes drop from ~120 seconds to ~44 seconds 32 shards, 1095 partitions, 5 indexes drop from ~600 seconds to ~265 seconds `queryStringList` can be really long, because it may contain #partitions * #indexes entries. Before this change, we were actually going through the executor where each command in the query string triggers 1 round trip per entry in queryStringList. The aim of this commit is to avoid the round-trips by creating a single query string. I first simply tried sending `q1;q2;..;qn` . However, the executor is designed to handle `q1;q2;..;qn` type of query executions via the infrastructure mentioned above (e.g., by tracking the query indexes in the list and doing 1 statement per round trip). One another option could have been to change the executor such that only track the query index when `queryStringList` is provided not with queryString including multiple `;`s . That is (a) more work (b) could cause weird edge cases with failure handling (c) felt like coding a special case in to the executor	2021-12-27 10:29:37 +01:00
Hanefi Onaldi	29e4516642	Introduce citus_check_cluster_node_health UDF This UDF coordinates connectivity checks accross the whole cluster. This UDF gets the list of active readable nodes in the cluster, and coordinates all connectivity checks in sequential order. The algorithm is: for sourceNode in activeReadableWorkerList: c = connectToNode(sourceNode) for targetNode in activeReadableWorkerList: result = c.execute( "SELECT citus_check_connection_to_node(targetNode.name, targetNode.port") emit sourceNode.name, sourceNode.port, targetNode.name, targetNode.port, result - result -> true -> connection attempt from source to target succeeded - result -> false -> connection attempt from source to target failed - result -> NULL -> connection attempt from the current node to source node failed I suggest you use the following query to get an overview on the connectivity: SELECT bool_and(COALESCE(result, false)) FROM citus_check_cluster_node_health(); Whenever this query returns false, there is a connectivity issue, check in detail.	2021-12-15 01:41:51 +03:00
Burak Velioglu	ed8e32de5e	Sync pg_dist_object on an update and propagate while syncing to a new node Before that PR we were updating citus.pg_dist_object metadata, which keeps the metadata related to objects on Citus, only on the coordinator node. In order to allow using those object from worker nodes (or erroring out with proper error message) we've started to propagate that metedata to worker nodes as well.	2021-12-06 19:25:50 +03:00
Hanefi Onaldi	56e9b1b968	Introduce UDF to check worker connectivity citus_check_connection_to_node runs a simple query on a remote node and reports whether this attempt was successful. This UDF will be used to make sure each worker node can connect to all the worker nodes in the cluster. parameters: nodename: required nodeport: optional (default: 5432) return value: boolean success	2021-12-03 02:30:28 +03:00
Onder Kalaci	549edcabb6	Allow disabling node(s) when multiple failures happen As of master branch, Citus does all the modifications to replicated tables (e.g., reference tables and distributed tables with replication factor > 1), via 2PC and avoids any shardstate=3. As a side-effect of those changes, handling node failures for replicated tables change. With this PR, when one (or multiple) node failures happen, the users would see query errors on modifications. If the problem is intermitant, that's OK, once the node failure(s) recover by themselves, the modification queries would succeed. If the node failure(s) are permenant, the users should call `SELECT citus_disable_node(...)` to disable the node. As soon as the node is disabled, modification would start to succeed. However, now the old node gets behind. It means that, when the node is up again, the placements should be re-created on the node. First, use `SELECT citus_activate_node()`. Then, use `SELECT replicate_table_shards(...)` to replicate the missing placements on the re-activated node.	2021-12-01 10:19:48 +01:00
Onur Tirtir	73f06323d8	Introduce dependencies from columnarAM to columnar metadata objects During pg upgrades, we have seen that it is not guaranteed that a columnar table will be created after metadata objects got created. Prior to changes done in this commit, we had such a dependency relationship in `pg_depend`: ``` columnar_table ----> columnarAM ----> citus extension ^ ^ \| \| columnar.storage_id_seq -------------------- \| \| columnar.stripe ------------------------------- ``` Since `pg_upgrade` just knows to follow topological sort of the objects when creating database dump, above dependency graph doesn't imply that `columnar_table` should be created before metadata objects such as `columnar.storage_id_seq` and `columnar.stripe` are created. For this reason, with this commit we add new records to `pg_depend` to make columnarAM depending on all rel objects living in `columnar` schema. That way, `pg_upgrade` will know it needs to create those before creating `columnarAM`, and similarly, before creating any tables using `columnarAM`. Note that in addition to inserting those records via installation script, we also do the same in `citus_finish_pg_upgrade()`. This is because, `pg_upgrade` rebuilds catalog tables in the new cluster and that means, we must insert them in the new cluster too.	2021-11-23 13:14:00 +03:00
Hanefi Onaldi	3d9cec70fd	Update migration paths from 10.2 to 11.0 (#5459 ) We recently introduced a set of patches to 10.2, and introduced 10.2-4 migration version. This migration version only resides on `release-10.2` branch, and is missing on our default branch. This creates a problem because we do not have a valid migration path from 10.2 to latest 11.0. To remedy this issue, I copied the relevant migration files from `release-10.2` branch, and renamed some of our migration files on default branch to make sure we have a linear upgrade path.	2021-11-11 13:55:28 +03:00
Philip Dubé	cc50682158	Fix typos. Spurred spotting "connectios" in logs	2021-10-25 13:54:09 +00:00
Hanefi Onaldi	3e64dc44c8	Fix some typos in comments (#5369 )	2021-10-13 13:00:39 +03:00
Naisila Puka	d0390af72d	Add fix_partition_shard_index_names udf to fix currently broken names (#5291 ) * Add udf to include shardId in broken partition shard index names * Address reviews: rename index such that operations can be done on it * More comprehensive index tests * Final touches and formatting	2021-10-07 19:34:52 +03:00
Naisila Puka	a69abe3be0	Fixes bug about int and smallint sequences on MX (#5254 ) * Introduce worker_nextval udf for int&smallint column defaults * Fix current tests and add new ones for worker_nextval	2021-09-09 23:41:07 +03:00
Hanefi Onaldi	9ae912a8c8	Prevent C-style comments in all directories (#5250 )	2021-09-09 11:54:58 +03:00
Burak Velioglu	c3895f35cd	Add helper UDFs for easy time partition management - get_missing_time_partition_ranges: Gets the ranges of missing partitions for the given table, interval and range unless any existing partition conflicts with calculated missing ranges. - create_time_partitions: Creates partitions by getting range values from get_missing_time_partition_ranges. - drop_old_time_partitions: Drops partitions of the table older than given threshold.	2021-09-03 23:03:13 +03:00
Sait Talha Nisanci	0b67fcf81d	Fix style	2021-09-03 16:09:59 +03:00
Halil Ozan Akgul	8ef94dc1f5	Changes array_cat argument type from anyarray to anycompatiblearray Relevant PG commit: 9e38c2bb5093ceb0c04d6315ccd8975bd17add66 fix array_cat_agg for pg upgrades array_cat_agg now needs to take anycompatiblearray instead of anyarray because array_cat changed its type from anyarray to anycompatiblearray with pg14. To handle upgrades correctly, we drop the aggregate in citus_pg_prepare_upgrade. To be able to drop it, we first remove the dependency from pg_depend. Then we create the right aggregate in citus_finish_pg_upgrade and we also add the dependency back to pg_depend.	2021-09-03 15:41:28 +03:00
Naisila Puka	acb5ae6ab6	Skip dropping shards when we know it's a partition (#5176 )	2021-08-31 17:41:37 +03:00
Onder Kalaci	482b8096e9	Introduce citus_internal_update_relation_colocation update_distributed_table_colocation can be called by the relation owner, and internally it updates pg_dist_partition. With this commit, update_distributed_table_colocation uses an internal UDF to access pg_dist_partition. As a result, this operation can now be done by regular users on MX.	2021-08-03 11:44:58 +02:00
Onder Kalaci	c8368e7929	Introduce citus_internal_delete_shard_metadata With this function, the owner of the table is allowed to remove shard metadata. This is going to be useful for tenant-isolation.	2021-07-19 13:25:05 +02:00
Onder Kalaci	2c349e6dfd	Use current user to sync metadata Before this commit, we always synced the metadata with superuser. However, that creates various edge cases such as visibility errors or self distributed deadlocks or complicates user access checks. Instead, with this commit, we use the current user to sync the metadata. Note that, `start_metadata_sync_to_node` still requires super user because accessing certain metadata (like pg_dist_node) always require superuser (e.g., the current user should be a superuser). However, metadata syncing operations regarding the distributed tables can now be done with regular users, as long as the user is the owner of the table. A table owner can still insert non-sense metadata, however it'd only affect its own table. So, we cannot do anything about that.	2021-07-16 13:25:27 +02:00
Hanefi Onaldi	8e9cc229ff	Remove public schema dependency for 10.0 upgrades This commit contains a subset of the changes that should be cherry picked to 10.0 releases.	2021-07-09 02:08:22 +03:00
Nils Dijk	18652ef9ff	fix 10.1-1 upgrade script to adhere to idempotency	2021-07-08 12:24:52 +02:00
Nils Dijk	e5517dc7b3	fix 9.5-2 upgrade script to adhere to idempotency	2021-07-08 12:24:52 +02:00
Marco Slot	214c674989	Fix PG upgrade scripts for 10.1	2021-07-05 14:38:26 +02:00
Marco Slot	b14955c2bd	Fix PG upgrade scripts for 10.0	2021-07-05 14:38:20 +02:00
Marco Slot	3c0dfc12c0	Fix PG upgrade scripts for 9.5	2021-07-05 13:39:35 +02:00
Marco Slot	bee202aa39	Fix PG upgrade scripts for 9.4	2021-07-05 13:39:28 +02:00
Ahmet Gedemenli	8bae58fdb7	Add parameter to cleanup metadata (#5055 ) * Add parameter to cleanup metadata * Set clear metadata default to true * Add test for clearing metadata * Separate test file for start/stop metadata syncing * Fix stop_sync bug for secondary nodes * Use PreventInTransactionBlock * DRemovedebuggiing logs * Remove relation not found logs from mx test * Revert localGroupId when doing stop_sync * Move metadata sync test to mx schedule * Add test with name that needs to be quoted * Add test for views and matviews * Add test for distributed table with custom type * Add comments to test * Add test with stats, indexes and constraints * Fix matview test * Add test for dropped column * Add notice messages to stop_metadata_sync * Add coordinator check to stop metadat sync * Revert local_group_id only if clearMetadata is true * Add a final check to see the metadata is sane * Remove the drop verbosity in test * Remove table description tests from sync test * Add stop sync to coordinator test * Change the order in stop_sync * Add test for hybrid (columnar+heap) partitioned table * Change error to notice for stop sync to coordinator * Sync at the end of the test to prevent any failures * Add test case in a transaction block * Remove relation not found tests	2021-07-01 16:23:53 +03:00
Jelte Fennema	4c3934272f	Improve performance of citus_shards (#5036 ) We were effectively joining on a calculated column because of our calls to `shard_name`. This caused a really bad plan to be generated. In my specific case it was taking ~18 seconds to show the output of citus_shards. It had this explain plan: ``` QUERY PLAN ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── Subquery Scan on citus_shards (cost=18369.74..18437.34 rows=5408 width=124) (actual time=18277.461..18278.509 rows=5408 loops=1) -> Sort (cost=18369.74..18383.26 rows=5408 width=156) (actual time=18277.457..18277.726 rows=5408 loops=1) Sort Key: ((pg_dist_shard.logicalrelid)::text), pg_dist_shard.shardid Sort Method: quicksort Memory: 1629kB CTE shard_sizes -> Function Scan on citus_shard_sizes (cost=0.00..10.00 rows=1000 width=40) (actual time=71.137..71.934 rows=5413 loops=1) -> Hash Join (cost=177.62..18024.42 rows=5408 width=156) (actual time=77.985..18257.237 rows=5408 loops=1) Hash Cond: ((pg_dist_shard.logicalrelid)::oid = (pg_dist_partition.logicalrelid)::oid) -> Hash Join (cost=169.81..371.98 rows=5408 width=48) (actual time=1.415..13.166 rows=5408 loops=1) Hash Cond: (pg_dist_placement.groupid = pg_dist_node.groupid) -> Hash Join (cost=168.68..296.49 rows=5408 width=16) (actual time=1.403..10.011 rows=5408 loops=1) Hash Cond: (pg_dist_placement.shardid = pg_dist_shard.shardid) -> Seq Scan on pg_dist_placement (cost=0.00..113.60 rows=5408 width=12) (actual time=0.004..3.684 rows=5408 loops=1) Filter: (shardstate = 1) -> Hash (cost=101.08..101.08 rows=5408 width=12) (actual time=1.385..1.386 rows=5408 loops=1) Buckets: 8192 Batches: 1 Memory Usage: 318kB -> Seq Scan on pg_dist_shard (cost=0.00..101.08 rows=5408 width=12) (actual time=0.003..0.688 rows=5408 loops=1) -> Hash (cost=1.06..1.06 rows=6 width=40) (actual time=0.007..0.007 rows=6 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 9kB -> Seq Scan on pg_dist_node (cost=0.00..1.06 rows=6 width=40) (actual time=0.004..0.005 rows=6 loops=1) -> Hash (cost=5.69..5.69 rows=169 width=130) (actual time=0.070..0.071 rows=169 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 36kB -> Seq Scan on pg_dist_partition (cost=0.00..5.69 rows=169 width=130) (actual time=0.009..0.041 rows=169 loops=1) SubPlan 2 -> Limit (cost=0.00..3.25 rows=1 width=8) (actual time=3.370..3.370 rows=1 loops=5408) -> CTE Scan on shard_sizes (cost=0.00..32.50 rows=10 width=8) (actual time=3.369..3.369 rows=1 loops=5408) Filter: ((shard_name(pg_dist_shard.logicalrelid, pg_dist_shard.shardid) = table_name) OR (('public.'::text \|\| shard_name(pg_dist_shard.logicalrelid, pg_dist_shard.shardid)) = table_name)) Rows Removed by Filter: 2707 Planning Time: 0.705 ms Execution Time: 18278.877 ms ``` With the changes it only takes 180ms to show the same output: ``` QUERY PLAN ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── Sort (cost=904.59..918.11 rows=5408 width=156) (actual time=182.508..182.960 rows=5408 loops=1) Sort Key: ((pg_dist_shard.logicalrelid)::text), pg_dist_shard.shardid Sort Method: quicksort Memory: 1629kB -> Hash Join (cost=418.03..569.27 rows=5408 width=156) (actual time=136.333..146.591 rows=5408 loops=1) Hash Cond: ((pg_dist_shard.logicalrelid)::oid = (pg_dist_partition.logicalrelid)::oid) -> Hash Join (cost=410.22..492.83 rows=5408 width=56) (actual time=136.231..140.132 rows=5408 loops=1) Hash Cond: (pg_dist_placement.groupid = pg_dist_node.groupid) -> Hash Right Join (cost=409.09..417.34 rows=5408 width=24) (actual time=136.218..138.890 rows=5408 loops=1) Hash Cond: ((((regexp_matches(citus_shard_sizes.table_name, '_(\d+)$'::text))[1])::integer) = pg_dist_shard.shardid) -> HashAggregate (cost=45.00..48.50 rows=200 width=12) (actual time=131.609..132.481 rows=5408 loops=1) Group Key: ((regexp_matches(citus_shard_sizes.table_name, '_(\d+)$'::text))[1])::integer Batches: 1 Memory Usage: 737kB -> Result (cost=0.00..40.00 rows=1000 width=12) (actual time=107.786..129.831 rows=5408 loops=1) -> ProjectSet (cost=0.00..22.50 rows=1000 width=40) (actual time=107.780..128.492 rows=5408 loops=1) -> Function Scan on citus_shard_sizes (cost=0.00..10.00 rows=1000 width=40) (actual time=107.746..108.107 rows=5414 loops=1) -> Hash (cost=296.49..296.49 rows=5408 width=16) (actual time=4.595..4.598 rows=5408 loops=1) Buckets: 8192 Batches: 1 Memory Usage: 339kB -> Hash Join (cost=168.68..296.49 rows=5408 width=16) (actual time=1.702..3.783 rows=5408 loops=1) Hash Cond: (pg_dist_placement.shardid = pg_dist_shard.shardid) -> Seq Scan on pg_dist_placement (cost=0.00..113.60 rows=5408 width=12) (actual time=0.004..0.837 rows=5408 loops=1) Filter: (shardstate = 1) -> Hash (cost=101.08..101.08 rows=5408 width=12) (actual time=1.683..1.685 rows=5408 loops=1) Buckets: 8192 Batches: 1 Memory Usage: 318kB -> Seq Scan on pg_dist_shard (cost=0.00..101.08 rows=5408 width=12) (actual time=0.004..0.824 rows=5408 loops=1) -> Hash (cost=1.06..1.06 rows=6 width=40) (actual time=0.007..0.008 rows=6 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 9kB -> Seq Scan on pg_dist_node (cost=0.00..1.06 rows=6 width=40) (actual time=0.004..0.006 rows=6 loops=1) -> Hash (cost=5.69..5.69 rows=169 width=130) (actual time=0.079..0.079 rows=169 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 36kB -> Seq Scan on pg_dist_partition (cost=0.00..5.69 rows=169 width=130) (actual time=0.011..0.046 rows=169 loops=1) Planning Time: 0.789 ms Execution Time: 184.095 ms ```	2021-06-14 13:32:30 +02:00
Jelte Fennema	7015049ea5	Add citus_cleanup_orphaned_shards UDF Sometimes the background daemon doesn't cleanup orphaned shards quickly enough. It's useful to have a UDF to trigger this removal when needed. We already had a UDF like this but it was only used during testing. This exposes that UDF to users. As a safety measure it cannot be run in a transaction, because that would cause the background daemon to stop cleaning up shards while this transaction is running.	2021-06-04 11:23:07 +02:00
Jelte Fennema	4c20bf7a36	Remove pg_dist_rebalence_strategy_enterprise_check (#5014 ) This is not necessary anymore now that the rebalancer is open source.	2021-06-01 06:16:46 -07:00
SaitTalhaNisanci	a20cc3b36a	Only consider shard state 1 in citus shards (#4970 )	2021-05-28 11:33:48 +03:00
Jelte Fennema	10f06ad753	Fetch shard size on the fly for the rebalance monitor Without this change the rebalancer progress monitor gets the shard sizes from the `shardlength` column in `pg_dist_placement`. This column needs to be updated manually by calling `citus_update_table_statistics`. However, `citus_update_table_statistics` could lead to distributed deadlocks while database traffic is on-going (see #4752). To work around this we don't use `shardlength` column anymore. Instead for every rebalance we now fetch all shard sizes on the fly. Two additional things this does are: 1. It adds tests for the rebalance progress function. 2. If a shard move cannot be done because a source or target node is unreachable, then we error in stop the rebalance, instead of showing a warning and continuing. When using the by_disk_size rebalance strategy it's not safe to continue with other moves if a specific move failed. It's possible that the failed move made space for the next move, and because the failed move never happened this space now does not exist. 3. Adds two new columns to the result of `get_rebalancer_progress` which shows the size of the shard on the source and target node. Fixes #4930	2021-05-20 16:38:17 +02:00
Jelte Fennema	cbbd10b974	Implement an improvement threshold in the rebalancer (#4927 ) Every move in the rebalancer algorithm results in an improvement in the balance. However, even if the improvement in the balance was very small the move was still chosen. This is especially problematic if the shard itself is very big and the move will take a long time. This changes the rebalancer algorithm to take the relative size of the balance improvement into account when choosing moves. By default a move will not be chosen if it improves the balance by less than half of the size of the shard. An extra argument is added to the rebalancer functions so that the user can decide to lower the default threshold if the ignored move is wanted anyway.	2021-05-11 14:24:59 +02:00
SaitTalhaNisanci	6b1904d37a	When moving a shard to a new node ensure there is enough space (#4929 ) * When moving a shard to a new node ensure there is enough space * Add WairForMiliseconds time utility * Add more tests and increase readability * Remove the retry loop and use a single udf for disk stats * Address review * address review Co-authored-by: Jelte Fennema <github-tech@jeltef.nl>	2021-05-06 17:28:02 +03:00
Ahmet Gedemenli	332c5ce4ad	Fix worker partitioned size functions (#4922 )	2021-04-26 10:29:46 +03:00
Ahmet Gedemenli	e445e3d39c	Introduce 3 partitioned size udfs (#4899 ) * Introduce 3 partitioned size udfs * Add tests for new partition size udfs * Fix type incompatibilities * Convert UDFs into pure sql functions * Fix function comment	2021-04-13 17:36:27 +03:00
Onur Tirtir	fe5c985e1d	Remove HAS_TABLEAM config since we dropped pg11 support (#4862 ) * Remove HAS_TABLEAM config * Drop columnar_ensure_objects_exist * Not call columnar_ensure_objects_exist in citus_finish_pg_upgrade	2021-04-13 10:51:26 +03:00
Halil Ozan Akgul	a5038046f9	Adds shard_count parameter to create_distributed_table	2021-03-29 16:22:49 +03:00
Naisila Puka	71a9f45513	Fix upgrade and downgrade paths for master/citus_update_table_statistics (#4805 )	2021-03-11 14:52:40 +03:00

1 2 3 4 5

212 Commits (ec617b872d50679ea2fb95315a40e3c5fbef8c74)