citus

Commit Graph

Author	SHA1	Message	Date
Marco Slot	a7e4d6c94a	Fix a bug that causes worker_create_or_alter_role to crash with NULL input	2021-06-15 20:07:08 +02:00
Naisila Puka	e26b29d3bb	Fix nextval('seq_name'::text) bug, and schema for seq tests (#5046)	2021-06-16 13:58:49 +03:00
Jelte Fennema	4c3934272f	Improve performance of citus_shards (#5036) We were effectively joining on a calculated column because of our calls to `shard_name`. This caused a really bad plan to be generated. In my specific case it was taking ~18 seconds to show the output of citus_shards. It had this explain plan: ``` QUERY PLAN Subquery Scan on citus_shards (cost=18369.74..18437.34 rows=5408 width=124) (actual time=18277.461..18278.509 rows=5408 loops=1) -> Sort (cost=18369.74..18383.26 rows=5408 width=156) (actual time=18277.457..18277.726 rows=5408 loops=1) Sort Key: ((pg_dist_shard.logicalrelid)::text), pg_dist_shard.shardid Sort Method: quicksort Memory: 1629kB CTE shard_sizes -> Function Scan on citus_shard_sizes (cost=0.00..10.00 rows=1000 width=40) (actual time=71.137..71.934 rows=5413 loops=1) -> Hash Join (cost=177.62..18024.42 rows=5408 width=156) (actual time=77.985..18257.237 rows=5408 loops=1) Hash Cond: ((pg_dist_shard.logicalrelid)::oid = (pg_dist_partition.logicalrelid)::oid) -> Hash Join (cost=169.81..371.98 rows=5408 width=48) (actual time=1.415..13.166 rows=5408 loops=1) Hash Cond: (pg_dist_placement.groupid = pg_dist_node.groupid) -> Hash Join (cost=168.68..296.49 rows=5408 width=16) (actual time=1.403..10.011 rows=5408 loops=1) Hash Cond: (pg_dist_placement.shardid = pg_dist_shard.shardid) -> Seq Scan on pg_dist_placement (cost=0.00..113.60 rows=5408 width=12) (actual time=0.004..3.684 rows=5408 loops=1) Filter: (shardstate = 1) -> Hash (cost=101.08..101.08 rows=5408 width=12) (actual time=1.385..1.386 rows=5408 loops=1) Buckets: 8192 Batches: 1 Memory Usage: 318kB -> Seq Scan on pg_dist_shard (cost=0.00..101.08 rows=5408 width=12) (actual time=0.003..0.688 rows=5408 loops=1) -> Hash (cost=1.06..1.06 rows=6 width=40) (actual time=0.007..0.007 rows=6 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 9kB -> Seq Scan on pg_dist_node (cost=0.00..1.06 rows=6 width=40) (actual time=0.004..0.005 rows=6 loops=1) -> Hash (cost=5.69..5.69 rows=169 width=130) (actual time=0.070..0.071 rows=169 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 36kB -> Seq Scan on pg_dist_partition (cost=0.00..5.69 rows=169 width=130) (actual time=0.009..0.041 rows=169 loops=1) SubPlan 2 -> Limit (cost=0.00..3.25 rows=1 width=8) (actual time=3.370..3.370 rows=1 loops=5408) -> CTE Scan on shard_sizes (cost=0.00..32.50 rows=10 width=8) (actual time=3.369..3.369 rows=1 loops=5408) Filter: ((shard_name(pg_dist_shard.logicalrelid, pg_dist_shard.shardid) = table_name) OR (('public.'::text || shard_name(pg_dist_shard.logicalrelid, pg_dist_shard.shardid)) = table_name)) Rows Removed by Filter: 2707 Planning Time: 0.705 ms Execution Time: 18278.877 ms ``` With the changes it only takes 180ms to show the same output: ``` QUERY PLAN Sort (cost=904.59..918.11 rows=5408 width=156) (actual time=182.508..182.960 rows=5408 loops=1) Sort Key: ((pg_dist_shard.logicalrelid)::text), pg_dist_shard.shardid Sort Method: quicksort Memory: 1629kB -> Hash Join (cost=418.03..569.27 rows=5408 width=156) (actual time=136.333..146.591 rows=5408 loops=1) Hash Cond: ((pg_dist_shard.logicalrelid)::oid = (pg_dist_partition.logicalrelid)::oid) -> Hash Join (cost=410.22..492.83 rows=5408 width=56) (actual time=136.231..140.132 rows=5408 loops=1) Hash Cond: (pg_dist_placement.groupid = pg_dist_node.groupid) -> Hash Right Join (cost=409.09..417.34 rows=5408 width=24) (actual time=136.218..138.890 rows=5408 loops=1) Hash Cond: ((((regexp_matches(citus_shard_sizes.table_name, '_(\d+)$'::text))[1])::integer) = pg_dist_shard.shardid) -> HashAggregate (cost=45.00..48.50 rows=200 width=12) (actual time=131.609..132.481 rows=5408 loops=1) Group Key: ((regexp_matches(citus_shard_sizes.table_name, '_(\d+)$'::text))[1])::integer Batches: 1 Memory Usage: 737kB -> Result (cost=0.00..40.00 rows=1000 width=12) (actual time=107.786..129.831 rows=5408 loops=1) -> ProjectSet (cost=0.00..22.50 rows=1000 width=40) (actual time=107.780..128.492 rows=5408 loops=1) -> Function Scan on citus_shard_sizes (cost=0.00..10.00 rows=1000 width=40) (actual time=107.746..108.107 rows=5414 loops=1) -> Hash (cost=296.49..296.49 rows=5408 width=16) (actual time=4.595..4.598 rows=5408 loops=1) Buckets: 8192 Batches: 1 Memory Usage: 339kB -> Hash Join (cost=168.68..296.49 rows=5408 width=16) (actual time=1.702..3.783 rows=5408 loops=1) Hash Cond: (pg_dist_placement.shardid = pg_dist_shard.shardid) -> Seq Scan on pg_dist_placement (cost=0.00..113.60 rows=5408 width=12) (actual time=0.004..0.837 rows=5408 loops=1) Filter: (shardstate = 1) -> Hash (cost=101.08..101.08 rows=5408 width=12) (actual time=1.683..1.685 rows=5408 loops=1) Buckets: 8192 Batches: 1 Memory Usage: 318kB -> Seq Scan on pg_dist_shard (cost=0.00..101.08 rows=5408 width=12) (actual time=0.004..0.824 rows=5408 loops=1) -> Hash (cost=1.06..1.06 rows=6 width=40) (actual time=0.007..0.008 rows=6 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 9kB -> Seq Scan on pg_dist_node (cost=0.00..1.06 rows=6 width=40) (actual time=0.004..0.006 rows=6 loops=1) -> Hash (cost=5.69..5.69 rows=169 width=130) (actual time=0.079..0.079 rows=169 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 36kB -> Seq Scan on pg_dist_partition (cost=0.00..5.69 rows=169 width=130) (actual time=0.011..0.046 rows=169 loops=1) Planning Time: 0.789 ms Execution Time: 184.095 ms ```	2021-06-14 13:32:30 +02:00
Hanefi Onaldi	5c6069a74a	Do not rely on fk cache when truncating local data (#5018)	2021-06-07 11:56:48 +03:00
Marco Slot	e81d25a7be	Refactor RelationIsAKnownShard to remove onlySearchPath argument	2021-06-02 14:30:27 +02:00
Ahmet Gedemenli	089ef35940	Disable dropping and truncating known shards Add test for disabling dropping and truncating known shards	2021-06-02 14:30:27 +02:00
Jelte Fennema	1a83628195	Use "orphaned shards" naming in more places We were not very consistent in how we named these shards.	2021-06-04 11:39:19 +02:00
Jelte Fennema	3f60e4f394	Add ExecuteCriticalCommandInDifferentTransaction function We use this pattern multiple times throughout the codebase now. Seems like a good moment to abstract it away.	2021-06-04 11:30:27 +02:00
Jelte Fennema	503c70b619	Cleanup orphaned shards before moving when necessary A shard move would fail if there was an orphaned version of the shard on the target node. With this change, before actually failing, we try to clean up orphaned shards to see if that fixes the issue.	2021-06-04 11:23:07 +02:00
Jelte Fennema	280b9ae018	Cleanup orphaned shards at the start of a rebalance In case the background daemon hasn't cleaned up shards yet, we do this manually at the start of a rebalance.	2021-06-04 11:23:07 +02:00
Jelte Fennema	7015049ea5	Add citus_cleanup_orphaned_shards UDF Sometimes the background daemon doesn't clean up orphaned shards quickly enough. It's useful to have a UDF to trigger this removal when needed. We already had a UDF like this but it was only used during testing. This exposes that UDF to users. As a safety measure it cannot be run in a transaction, because that would cause the background daemon to stop cleaning up shards while this transaction is running.	2021-06-04 11:23:07 +02:00
Naisila Puka	0f37ab5f85	Fixes column default coming from a sequence (#4914) * Add user-defined sequence support for MX * Remove default part when propagating to workers * Fix ALTER TABLE with sequences for mx tables * Clean up and add tests * Propagate DROP SEQUENCE * Removing function parts * Propagate ALTER SEQUENCE * Change sequence type before propagation & cleanup * Revert "Propagate ALTER SEQUENCE" This reverts commit 2bef64c5a29f4e7224a7f43b43b88e0133c65159. * Ensure sequence is not used in a different column with different type * Insert select tests * Propagate rename sequence stmt * Fix issue with group ID cache invalidation * Add ALTER TABLE ALTER COLUMN TYPE .. precaution * Fix attnum inconsistency and add various tests * Add ALTER SEQUENCE precaution * Remove Citus hook * More tests Co-authored-by: Marco Slot <marco.slot@gmail.com>	2021-06-03 23:02:09 +03:00
Marco Slot	c03729ad03	Only warn about reference tables when removing last node	2021-06-01 10:53:12 +02:00
Hanefi Onaldi	fa29d6667a	Accept invalidation before fk graph validity check (#5017) InvalidateForeignKeyGraph sends an invalidation via shared memory to all backends, including the current one. However, we might not call AcceptInvalidationMessages before reading from the cache below. It would be better to also add a call to AcceptInvalidationMessages in IsForeignConstraintRelationshipGraphValid.	2021-06-02 14:45:35 +03:00
Ahmet Gedemenli	103cf34418	Sort GUCs in alphabetical order	2021-06-02 12:52:18 +03:00
Jelte Fennema	b1cad26ebc	Move CheckCitusVersion to the top of each function Previously this was usually done after argument parsing. This can cause SEGFAULTs if the number or type of arguments changes in a new version. By checking that Citus version is correct before doing any argument parsing we protect against these types of issues. Issues like this have occurred in pg_auto_failover, so it's not just a theoretical issue. The main reason why these calls were not at the top of functions is really just historical. It was because in the past we didn't allow statements before declarations. Thus having this check before the argument parsing would have only been possible if we first declared all variables. In addition to moving existing CheckCitusVersion calls it also adds these calls to rebalancer related functions (they were missing there).	2021-06-01 17:43:46 +02:00
Jelte Fennema	4c20bf7a36	Remove pg_dist_rebalence_strategy_enterprise_check (#5014) This is not necessary anymore now that the rebalancer is open source.	2021-06-01 06:16:46 -07:00
Ahmet Gedemenli	69d39c0e8b	Fix relname null bug when parallel execution	2021-06-01 14:14:35 +03:00
Ahmet Gedemenli	9638933d9d	Remove function GenerateNewTargetEntriesForSortClauses	2021-06-01 12:35:36 +03:00
Jelte Fennema	3271f1bd13	Fix data race in get_rebalance_progress (#5008) To be able to report progress of the rebalancer, the rebalancer updates the state of a shard move in a shared memory segment. To then fetch the progress, `get_rebalance_progress` can be called which reads this shared memory. Without this change it did so without using any synchronization primitives, allowing for data races. This fixes that by using atomic operations to update and read from the parts of the shared memory that can be changed after initialization.	2021-05-31 15:27:32 +02:00
SaitTalhaNisanci	8c3f85692d	Not consider old placements when disabling or removing a node (#4960) * Not consider old placements when disabling or removing a node * update cluster test	2021-05-28 22:38:20 +02:00
SaitTalhaNisanci	a20cc3b36a	Only consider shard state 1 in citus shards (#4970)	2021-05-28 11:33:48 +03:00
SaitTalhaNisanci	a4944a2102	Rename CoordinatedTransactionShouldUse2PC (#4995)	2021-05-21 18:57:42 +03:00
Hanefi Onaldi	c160325d07	Use streaming replication when repl factor = 1	2021-05-21 16:14:59 +03:00
Hanefi Onaldi	878513f325	Remove all occurrences of replication_model GUC	2021-05-21 16:14:59 +03:00
SaitTalhaNisanci	87e3a5e24a	Use 2PC when using a node connection (#4997)	2021-05-21 14:58:53 +03:00
SaitTalhaNisanci	82f34a8d88	Enable citus.defer_drop_after_shard_move by default (#4961)	2021-05-21 10:48:32 +03:00
Nils Dijk	d7dd247fb5	fix shared dependencies that are not resident in a database (#4992) DESCRIPTION: fix shared dependencies that are not resident in a database, e.g. databases depend on users (their owners), and neither of these objects resides in a database. These dependencies are recorded in pg_shdepend with a `dbid` of `InvalidOid`. When we fetch our shared dependencies we don’t take these links into account. With this patch we use logic inspired by `classIdGetDbId` to decide when to use `MyDatabaseId` vs `InvalidOid` to correctly resolve dependencies between shared objects.	2021-05-20 08:55:02 -07:00
Jelte Fennema	10f06ad753	Fetch shard size on the fly for the rebalance monitor Without this change the rebalancer progress monitor gets the shard sizes from the `shardlength` column in `pg_dist_placement`. This column needs to be updated manually by calling `citus_update_table_statistics`. However, `citus_update_table_statistics` could lead to distributed deadlocks while database traffic is ongoing (see #4752). To work around this we don't use the `shardlength` column anymore. Instead, for every rebalance we now fetch all shard sizes on the fly. Three additional things this does are: 1. It adds tests for the rebalance progress function. 2. If a shard move cannot be done because a source or target node is unreachable, then we error and stop the rebalance, instead of showing a warning and continuing. When using the by_disk_size rebalance strategy it's not safe to continue with other moves if a specific move failed. It's possible that the failed move made space for the next move, and because the failed move never happened this space now does not exist. 3. It adds two new columns to the result of `get_rebalance_progress`, which show the size of the shard on the source and target node. Fixes #4930	2021-05-20 16:38:17 +02:00
Nils Dijk	a6c2d2a4c4	Feature: alter database owner (#4986) DESCRIPTION: Add support for ALTER DATABASE OWNER This adds support for changing the database owner. It achieves this by marking the database as a distributed object. By marking the database as a distributed object it will look for its dependencies and order the user creation commands (enterprise only) before the alter of the database owner. This is mostly important when adding new nodes. By having the database marked as a distributed object it can easily determine which `ALTER DATABASE ... OWNER TO ...` commands to propagate, by resolving the object address of the database and verifying it is a distributed object, and hence should propagate changes of ownership to all workers. Given the ownership of the database might have implications on subsequent commands in transactions, we force sequential mode for transactions that have an `ALTER DATABASE ... OWNER TO ...` command in them. This will fail the transaction with meaningful help when the transaction already executed parallel statements. By default the feature is turned off, since roles are not automatically propagated and having it turned on would cause hard-to-understand errors for the user. It can be turned on by setting `citus.enable_alter_database_owner`.	2021-05-20 13:27:44 +02:00
Onder Kalaci	d07db99ea4	Make sure that target node in shard moves is eligible for shard move	2021-05-20 10:51:01 +02:00
Onder Kalaci	926069a859	Wait until all connections are successfully established Comment from the code: /* * Iterate until all the tasks are finished. Once all the tasks * are finished, ensure that all the connection initializations * are also finished. Otherwise, those connections are terminated * abruptly before they are established (or failed). Instead, we let * the ConnectionStateMachine() to properly handle them. * * Note that we could have the connections that are not established * as a side effect of slow-start algorithm. At the time the algorithm * decides to establish new connections, the execution might have tasks * to finish. But, the execution might finish before the new connections * are established. */ Note that the abruptly terminated connections lead to the following errors: 2020-11-16 21:09:09.800 CET [16633] LOG: could not accept SSL connection: Connection reset by peer 2020-11-16 21:09:09.872 CET [16657] LOG: could not accept SSL connection: Undefined error: 0 2020-11-16 21:09:09.894 CET [16667] LOG: could not accept SSL connection: Connection reset by peer To easily reproduce the issue: - Create a single node Citus - Add the coordinator to the metadata - Create a distributed table with shards on the coordinator - f.sql: select count(*) from test; - pgbench -f /tmp/f.sql postgres -T 12 -c 40 -P 1 or pgbench -f /tmp/f.sql postgres -T 12 -c 40 -P 1 -C	2021-05-19 15:59:13 +02:00
Onder Kalaci	995adf1a19	Executor takes connection establishment and task execution costs into account With this commit, the executor becomes smarter about refraining from opening new connections. The most basic example: if connection establishments take 1000 ms and task executions take 5 ms, the executor is smart enough not to establish new connections.	2021-05-19 15:48:07 +02:00
Onder Kalaci	28b0b4ebd1	Move slow start increment to generic place	2021-05-19 14:31:20 +02:00
Marco Slot	715dce1eea	Reduce local insert memory usage during deparsing	2021-05-18 16:11:43 +02:00
Marco Slot	644b266dee	Only cache local plans when reusing a distributed plan	2021-05-18 16:11:43 +02:00
Marco Slot	00792831ad	Add execution memory contexts and free after local query execution	2021-05-18 16:11:43 +02:00
SaitTalhaNisanci	ff2a125a5b	Lookup hostname before execution (#4976) We look up the hostname just before the execution so that even if there are cached entries in the prepared statement cache, we use the updated entries.	2021-05-18 16:46:31 +03:00
SaitTalhaNisanci	eaa7d2bada	Not block maintenance daemon (#4972) It was possible to block the maintenance daemon by taking a SHARE ROW EXCLUSIVE lock on pg_dist_placement. Until the lock was released, the maintenance daemon would be blocked. We should not block the maintenance daemon in any case, so we now try to get the pg_dist_placement lock without waiting; if we cannot get it, we don't try to drop the old placements.	2021-05-17 03:22:35 -07:00
Nils Dijk	c91f8d8a15	Feature: localhost guc (#4836) DESCRIPTION: introduce `citus.local_hostname` GUC for connections to the current node Citus once in a while needs to connect to itself for some system operations. This used to be hardcoded to `localhost`. The hardcoded hostname causes some issues, for example in environments where `sslmode=verify-full` is required. It is not always desirable or even feasible to get `localhost` as an alt name on the certificate. By introducing a GUC to use when connecting to the current instance, the user has more control over what network path is used and what hostname is required to be present in the server certificate.	2021-05-12 16:59:44 +02:00
Jelte Fennema	cbbd10b974	Implement an improvement threshold in the rebalancer (#4927) Every move in the rebalancer algorithm results in an improvement in the balance. However, even if the improvement in the balance was very small the move was still chosen. This is especially problematic if the shard itself is very big and the move will take a long time. This changes the rebalancer algorithm to take the relative size of the balance improvement into account when choosing moves. By default a move will not be chosen if it improves the balance by less than half of the size of the shard. An extra argument is added to the rebalancer functions so that the user can decide to lower the default threshold if the ignored move is wanted anyway.	2021-05-11 14:24:59 +02:00
Onder Kalaci	cc4870a635	Remove wrong PG_USED_FOR_ASSERTS_ONLY	2021-05-11 12:58:37 +02:00
Onder Kalaci	a231ff29b0	Get prepared for some improvements for online rebalancer To see all the changes, see https://github.com/citusdata/citus-enterprise/pull/586/files	2021-05-10 19:54:31 +02:00
SaitTalhaNisanci	5a941814fd	Close connection after each shard move (#4967)	2021-05-10 16:57:19 +03:00
Ahmet Gedemenli	8cb505d6e1	Fix matview access method change issue (#4959) * Fix matview access method change issue * Use pg function get_am_name * Split view generation command into pieces	2021-05-07 15:47:24 +03:00
SaitTalhaNisanci	6b1904d37a	When moving a shard to a new node ensure there is enough space (#4929) * When moving a shard to a new node ensure there is enough space * Add WairForMiliseconds time utility * Add more tests and increase readability * Remove the retry loop and use a single udf for disk stats * Address review * address review Co-authored-by: Jelte Fennema <github-tech@jeltef.nl>	2021-05-06 17:28:02 +03:00
Ahmet Gedemenli	bc818e76e2	Add notice log message for skipping child tables for optimization	2021-05-06 16:49:37 +03:00
Ahmet Gedemenli	2e0bb5c0c8	Fix nested select query with union bug	2021-05-05 20:35:00 +03:00
Jelte Fennema	50357db957	Simplify code that tests the shard rebalancer algorithm (#4925) This modifies the test code to use sane defaults instead of requiring all values to be specified in the test.	2021-05-03 15:47:19 +02:00
Jelte Fennema	2f29d4e53e	Continue to remove shards after first failure in DropMarkedShards The comment of DropMarkedShards described the behaviour that after a failure we would continue trying to drop other shards. However the code did not do this and would stop after the first failure. Instead of simply fixing the comment I fixed the code, because the described behaviour is more useful. Now a single shard that cannot be removed yet does not block others from being removed.	2021-04-30 15:42:09 +03:00
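
The cleanup behaviour described in eaa7d2bada (take the `pg_dist_placement` lock without waiting) and 2f29d4e53e (keep dropping shards after the first failure) can be sketched as a single loop. This is a minimal Python analog under stated assumptions, not Citus's C implementation: a `threading.Lock` stands in for the table lock, and `drop_marked_shards` and `drop_shard` are hypothetical names.

```python
import threading


def drop_marked_shards(placement_lock, marked_shards, drop_shard):
    """Hypothetical analog of one maintenance-daemon cleanup pass.

    Returns (dropped, failed). If the lock is busy, nothing is attempted.
    """
    # Never wait: if another session holds the lock, skip this round entirely.
    if not placement_lock.acquire(blocking=False):
        return [], []
    dropped, failed = [], []
    try:
        for shard in marked_shards:
            try:
                drop_shard(shard)
            except Exception:
                # A shard that cannot be removed yet must not block the others.
                failed.append(shard)
            else:
                dropped.append(shard)
    finally:
        placement_lock.release()
    return dropped, failed
```

Returning the failed shards instead of raising lets a later pass retry just those, which matches the stated goal that one shard that cannot be removed yet does not block the rest.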

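The improvement threshold from cbbd10b974 can be illustrated with a simplified two-node model. Assumptions: "balance" is taken to mean the load gap between the source and target node, and `balance_improvement`, `should_move` and `improvement_threshold` are hypothetical names; the real rebalancer's cost model may differ. Only the default rule (skip a move that improves the balance by less than half of the shard's size) comes from the commit message.

```python
def balance_improvement(source_load, target_load, shard_size):
    """How much moving one shard narrows the source/target load gap (hypothetical model)."""
    gap_before = abs(source_load - target_load)
    gap_after = abs((source_load - shard_size) - (target_load + shard_size))
    return gap_before - gap_after


def should_move(source_load, target_load, shard_size, improvement_threshold=0.5):
    """Accept a move only if it helps by at least threshold * shard_size.

    The 0.5 default mirrors "less than half of the size of the shard";
    the parameter name is an assumption, not the Citus argument name.
    """
    improvement = balance_improvement(source_load, target_load, shard_size)
    return improvement > 0 and improvement >= improvement_threshold * shard_size
```

For example, `should_move(111, 100, 10)` narrows the gap from 11 to 9, an improvement of 2, which is below the default cutoff of 5, so the move is skipped; passing `improvement_threshold=0.1` lowers the cutoff to 1 and accepts it, much like the extra argument the commit adds for users who want the ignored move anyway.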

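The citus_shards speedup in 4c3934272f replaces a per-row `shard_name(...)` comparison (the `SubPlan 2` that ran with loops=5408) with a join on the shard id parsed out of each shard table name by the pattern `'_(\d+)$'`. A rough Python sketch of that idea, assuming `(table_name, size)` input pairs shaped like `citus_shard_sizes()` output; the function names are hypothetical, and keeping the largest size per id is an assumption, since the plan only shows a `HashAggregate` grouped on the extracted id.

```python
import re

# Shard tables are named <relation>_<shardid>; capture the trailing digits.
SHARD_ID_SUFFIX = re.compile(r"_(\d+)$")


def shard_sizes_by_id(shard_size_rows):
    """Build a shard id -> size map in one pass (the hash side of the join).

    shard_size_rows: iterable of (table_name, size) pairs.
    Names without a numeric suffix are skipped.
    """
    sizes = {}
    for table_name, size in shard_size_rows:
        match = SHARD_ID_SUFFIX.search(table_name)
        if match is None:
            continue
        shard_id = int(match.group(1))
        # Assumption: keep the largest size if a shard id is reported twice.
        sizes[shard_id] = max(size, sizes.get(shard_id, size))
    return sizes


def attach_sizes(placements, sizes):
    """Probe the map once per placement; a missing size becomes None (outer join)."""
    return [(shard_id, node, sizes.get(shard_id)) for shard_id, node in placements]
```

Each placement then costs one dictionary lookup instead of a scan over every size row with a recomputed name, which is the same trade the second plan's `Hash Right Join` makes.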
2284 Commits (a7e4d6c94a96a9013ad23fb6b6c44dc83da15d32)