citus

Commit Graph

Author	SHA1	Message	Date
naisila	bed3770d54	Add alternative test outputs for change in Insert Select display coordinator_shouldhaveshards.sql multi_insert_select.sql multi_insert_select_conflict.sql single_node.sql multi_deparse_shard_query.sql citus_local_tables_queries.sql cte_inline.sql insert_select_repartition.sql intermediate_result_pruning.sql local_shard_execution.sql local_shard_execution_replicated.sql multi_mx_insert_select_repartition.sql mx_coordinator_shouldhaveshards.sql Relevant PG commit: a8d8445a7b2f80f6d0bfe97b19f90bd2cbef8759	2022-08-22 16:54:38 +03:00
naisila	c85b7589ef	pg_database_owner -- commit To be renamed	2022-08-22 16:54:38 +03:00
naisila	7a48229753	Handles EXPLAIN output diffs in PG15: HashAgg Leverage,alt. output multi_select_distinct.sql Still not sure of the relevant PG commit Could be db0d67db2401eb6238ccc04c6407a4fd4f985832 but disabling enable_group_by_reordering didn't help.	2022-08-22 16:54:38 +03:00
naisila	fb0fb0f0c6	Handles EXPLAIN output diffs in PG15: extra arrows&result lines To handle extra "->" arrows resulting from extra Result lines in explain outputs, we add the following explain method to multi_test_helpers.sql file - plan_without_arrows() is added for cases where we want the whole explain output without arrows and without Result lines	2022-08-22 16:54:38 +03:00
naisila	30e6eea7e4	Omit namespace in post-copy errmsg Relevant PG commit: 069d33d0c5a021601245e44df77a0423ddd69359	2022-08-22 16:54:38 +03:00
naisila	7834001233	Normalizes Memory Usage, Buckets, Batches for PG15 explain diffs We create a new function in multi_test_helpers, which is similar to the explain_merge function in PG15. This explain helper function normalizes Memory Usage, Buckets and Batches, and we use it in the tests which give a different output for PG15.	2022-08-22 16:54:38 +03:00
naisila	25767dcf3c	Handles EXPLAIN output diffs in PG15: enable_group_by_reordering Relevant PG commit db0d67db2401eb6238ccc04c6407a4fd4f985832	2022-08-22 16:54:38 +03:00
naisila	2a0814fe28	Handles EXPLAIN output diffs in PG15, Hash Agg/Join leverage To handle differences in usage of GroupAggregate vs HashAggregate or Merge Join vs Hash join in cases where this detail doesn't seem to matter, we use coordinator_plan(). - coordinator_plan() is updated to remove "Result" lines There are some cases where we have subplans so we add a new function that prints all Task Count lines as well - coordinator_plan_with_subplans() Still not sure of the relevant PG commit Could be db0d67db2401eb6238ccc04c6407a4fd4f985832 but disabling enable_group_by_reordering didn't help.	2022-08-22 16:54:38 +03:00
naisila	92880842a4	Handles EXPLAIN output diffs in PG15 - Extra result lines To handle extra "Result" lines in explain outputs, we add explain method to multi_test_helpers.sql file - plan_without_result_lines() is added for cases where we want the whole explain output with only "Result" lines removed	2022-08-22 16:54:38 +03:00
Onder Kalaci	5f0ac25779	Support JSON_TABLE on PG 15 Postgres supports JSON_TABLE feature on PG 15. We treat JSON_TABLE the same as correlated functions (e.g., recurring tuples). In the end, for multi-shard JSON_TABLE commands, we apply the same restrictions as reference tables (e.g., cannot be in the outer part of an outer join etc.)	2022-08-22 16:54:38 +03:00
naisila	4ad8376796	Adds test failure_pg15.sql for duplicate error message cases Each of the following tests: failure_ddl.sql, failure_truncate.sql failure_multi_dml.sql, failure_vacuum.sql has a part with alternative output for PG15 resulting from removal of duplicate error messages This test file has been created to avoid 4 alternative output files Relevant PG commit: 618c16707a6d6e8f5c83ede2092975e4670201ad	2022-08-22 16:54:38 +03:00
naisila	2161b183dd	Adds alt. output for failure_savepoints bcs of PG15 libpq error changes Duplicated error/warning texts are now avoided in PG15. The whole test file has duplications, hence I added an alternative. In some cases ERROR/WARNING order is swapped so I reduced the log level. Relevant PG commit: 618c16707a6d6e8f5c83ede2092975e4670201ad	2022-08-22 16:54:38 +03:00
naisila	c23162208d	Remove "invalid socket" from failure tests' outputs Relevant PG commit: b71a9cb31e46b08aeac35a4355936165648b3c49	2022-08-22 16:54:38 +03:00
naisila	a499fea352	Use pg_backup_stop(PG15) instead of pg_stop_backup(PG<15) Add an alternative test output because of the change in the backup modes of Postgres. Specifically here, there is a renaming issue: pg_stop_backup PRE PG15 vs pg_backup_stop PG15+ The alternative output can be deleted when we drop support for PG14 Relevant PG commit: 39969e2a1e4d7f5a37f3ef37d53bbfe171e7d77a	2022-08-22 16:54:38 +03:00
naisila	b7d7a2e6d8	Remove 'AS "?column?"' from test outputs There were some instances in the planning debug outputs of the following test outputs where AS "?column?" is added. We add a normalization rule to remove it as it is not important. cte_inline.out recursive_relation_planning_restriction_pushdown.out Relevant PG commit: c7461fc25558832dd347a9c8150b0f1ed85e36e8	2022-08-22 16:54:38 +03:00
naisila	7339b9ba53	Explicitly cast catalog "char" column to text before concatenation Relevant PG commit: 07eee5a0dc642d26f44d65c4e6263304208e8583	2022-08-22 16:54:38 +03:00
naisila	653ae8d6a0	Fix tests for generated columns dependency changes In PG15, For GENERATED columns, all dependencies of the generation expression are recorded as NORMAL dependencies of the column itself. This requires CASCADE to drop generated cols with the original col. PRE PG15, dependencies were recorded as AUTO, with which generated columns are silently dropped with the original column. Relevant PG commit: cb02fcb4c95bae08adaca1202c2081cfc81a28b5	2022-08-22 16:54:38 +03:00
naisila	b0349996cd	Fixes tests for ALTER TRIGGER RENAME consistency for part. tables Relevant PG commit: 80ba4bb383538a2ee846fece6a7b8da9518b6866	2022-08-22 16:54:38 +03:00
naisila	ea15caf369	Change warning message in pg_signal_backend() Relevant PG commit: 7fa945b857cc1b2964799411f1633468826861ff	2022-08-22 16:54:38 +03:00
naisila	4717351873	Handle new option colliculocale in CREATE COLLATION logic In PG15, there is an added option to use ICU as global locale provider. pg_collation has three locale-related fields: collcollate and collctype, which are libc-related fields, and a new one colliculocale, which is the ICU-related field. Only the libc-related fields or the ICU-related field is set, never both. Relevant PG commits: f2553d43060edb210b36c63187d52a632448e1d2 54637508f87bd5f07fb9406bac6b08240283be3b	2022-08-22 16:54:38 +03:00
Jelte Fennema	e2a24b921e	Fix flakyness in failure_create_distributed_table_non_empty (#6217 ) The failure_create_distributed_table_non_empty test would sometimes fail like this: ```diff -- in the first test, cancel the first connection we sent from the coordinator SELECT citus.mitmproxy('conn.cancel(' \|\| pg_backend_pid() \|\| ')'); - mitmproxy ---------------------------------------------------------------------- - -(1 row) - +ERROR: canceling statement due to user request +CONTEXT: COPY mitmproxy_result, line 1: "" +SQL statement "COPY mitmproxy_result FROM '/home/circleci/project/src/test/regress/tmp_check/mitmproxy.fifo'" +PL/pgSQL function citus.mitmproxy(text) line 11 at EXECUTE SELECT create_distributed_table('test_table', 'id'); ``` Because the cancel command had no filter, it would actually sometimes cancel the mitmproxy cancel command itself. This PR addresses that by filtering on CREATE TABLE, which is one of the commands that create_distributed_table will send to the workers. Example of failing test: https://app.circleci.com/pipelines/github/citusdata/citus/26252/workflows/1b7e5464-cca4-4ec1-99b3-48ddf25c29fa/jobs/742829	2022-08-20 01:23:25 +03:00
Jelte Fennema	4ce17f015b	Fix flakyness in columnar_memory test (#6216 ) Sometimes in CI the columnar_memory test was using slightly more memory than expected. ```diff SELECT CASE WHEN 1.0 * TopMemoryContext / :top_post BETWEEN 0.98 AND 1.02 THEN 1 ELSE 1.0 * TopMemoryContext / :top_post END AS top_growth FROM columnar_test_helpers.columnar_store_memory_stats(); --[ RECORD 1 ]- -top_growth \| 1 +-[ RECORD 1 ]------------------ +top_growth \| 1.0206132116232119 -- before this change, max mem usage while executing inserts was 28MB and ``` This PR changes the expectation to be slightly higher, such that this random increase in memory usage doesn't cause a flaky test. Failing test: https://app.circleci.com/pipelines/github/citusdata/citus/26256/workflows/c0870f66-3346-4f8d-a1d3-36dfd7c98289/jobs/743028	2022-08-19 23:46:28 +02:00
Jelte Fennema	de475feb69	Actually connect to the right database in logical_replication test (#6211 ) In the logical_replication test we test that the cleanup logic at the start of a shard move works as expected. To do so we create a subscription and publication slot manually. This changes the test to make that subscription actually connect to the database that the publication is in. Useful for #5987 #6085	2022-08-20 00:09:50 +03:00
Jelte Fennema	dfa6c26d7d	Increase isolation timeout because of shards splits (#6213 ) Recently isolation tests involving shard splits have been randomly failing in CI with timeouts. It's possible that there's an actual bug here, but it's also quite likely that our timeout is just slightly too low for the combination of shard splits and the CI VM having a bad day. Increasing the timeout is fairly low cost and allows us to find out if there's an actual bug or if it's simply slowness. So that's what this PR does. If it turns out to be an actual bug, we can decrease the timeout again when we fix it. Examples of failed tests: 1. https://app.circleci.com/pipelines/github/citusdata/citus/26241/workflows/9e0bb721-d798-481b-907c-914236b63e38/jobs/742409 2. https://app.circleci.com/pipelines/github/citusdata/citus/26171/workflows/8f352e3b-e6e4-4f7f-b0d0-2543f62a0209/jobs/739470	2022-08-19 22:37:45 +03:00
Naisila Puka	9cfadd7965	Deletes unnecessary test outputs pt2 (#6214 )	2022-08-19 18:21:13 +03:00
Jelte Fennema	85305b2773	Don't run any isolation tests in parallel (#6212 ) By running isolation tests in parallel we're just asking for flaky tasks. The first test might temporarily block one of the commands in the second test, which we then detect as waiting like this: ```diff step s2-vacuum-analyze: VACUUM ANALYZE test_insert_vacuum; - + <waiting ...> step s1-commit: COMMIT; +step s2-vacuum-analyze: <... completed> ``` Debugging flaky tests is also much harder when they are run in parallel. This PR starts running all our isolation tests sequentially. The reason for opening this PR was me seeing this failing test: https://app.circleci.com/pipelines/github/citusdata/citus/26194/workflows/ff57e2cf-8ac4-40fe-bc0c-74a7f8fecb53/jobs/740454 As well as having fixed a similar issue recently in #6122	2022-08-19 17:05:36 +02:00
Önder Kalacı	616ff2a3fe	Adjust some isolation test for the recent PG commits (#6210 ) * Adjust some isolation test for the recent PG commits In `3f32395612`, Postgres starts any isolation session with `set application_name`. However, one of the tests we had expected that it is exactly the first command in the session. The test tries to show that even if a gpid has not been assigned, we can show it in the citus_lock_waits graph. Now it is literally not possible to have such a test, as the gpid would be assigned after the `set application_name` command. Still, it is good to have a test where a command is blocked on the parser	2022-08-19 17:06:34 +03:00
Jelte Fennema	e6a1a86db0	Improve debugability for columnar_memory flakyness (#6203 ) Sometimes the columnar_memory test fails in CI with the following error: ```diff SELECT 1.0 * TopMemoryContext / :top_post BETWEEN 0.98 AND 1.02 AS top_growth_ok FROM columnar_test_helpers.columnar_store_memory_stats(); -[ RECORD 1 ]-+-- -top_growth_ok \| t +top_growth_ok \| f -- before this change, max mem usage while executing inserts was 28MB and ``` This is almost certainly a harmless failure that simply requires bumping the margin a little bit. However, it's impossible to say with the current output. I was unable to reproduce this on-demand on my local machine or even in CI. So this changes the test to include the actual value difference in the size of TopMemoryContext when it's outside the expected range. Then next time it fails we at least have some information about why. Example of failing test: https://app.circleci.com/pipelines/github/citusdata/citus/25966/workflows/d472a57b-419a-4f33-b8bc-2e174a98d4d6/jobs/730576	2022-08-19 15:41:16 +02:00
Jelte Fennema	3f4440ff69	Improve debugability of failures in isolation_ref2ref_foreign_keys (#6197 ) As shown in #6196 the output of s1-view-locks is sometimes not as expected. However, because its output is very minimal it's hard to understand the reason for that. This adds some more columns and aggregates less, so we can more easily see what locks are unexpectedly held or released. In passing this also fixes the following flaky part of this test by excluding locks taken by the maintenance daemon. After running it with this more detailed output for s1-view-locks it became obvious that that was the problem here. ```diff diff -dU10 -w /home/jelte/work/citus/src/test/regress/expected/isolation_ref2ref_foreign_keys.out /home/jelte/work/citus/src/test/regress/results/isolation_ref2ref_foreign_keys.out --- /home/jelte/work/citus/src/test/regress/expected/isolation_ref2ref_foreign_keys.out.modified 2022-08-18 15:42:08.689525233 +0200 +++ /home/jelte/work/citus/src/test/regress/results/isolation_ref2ref_foreign_keys.out.modified 2022-08-18 15:42:08.729525233 +0200 @@ -288,21 +288,22 @@ step s1-view-locks: SELECT mode, count(*) FROM pg_locks WHERE locktype='advisory' GROUP BY mode ORDER BY 1, 2; mode \|count ------------------------+----- -(0 rows) +ShareUpdateExclusiveLock\| 1 +(1 row) starting permutation: s2-begin s2-insert-table-3 s1-view-locks s2-rollback s1-view-locks step s2-begin: BEGIN; step s2-insert-table-3: INSERT INTO ref_table_3 VALUES (7, 5); step s1-view-locks: ```	2022-08-19 15:12:09 +02:00
Jelte Fennema	25e5cf2e50	Fix flakyness in failure_setup (#6205 ) In CI sometimes failure_setup will fail with the following error: ```diff SELECT master_add_node('localhost', :worker_2_proxy_port); -- an mitmproxy which forwards to the second worker - master_add_node ---------------------------------------------------------------------- - 2 -(1 row) - +ERROR: connection to the remote node localhost:9060 failed with the following error: could not connect to server: Connection refused + Is the server running on host "localhost" (127.0.0.1) and accepting + TCP/IP connections on port 9060? +could not connect to server: Connection refused + Is the server running on host "localhost" (127.0.0.1) and accepting + TCP/IP connections on port 9060? +could not connect to server: Cannot assign requested address + Is the server running on host "localhost" (::1) and accepting + TCP/IP connections on port 9060? diff -dU10 -w /home/circleci/project/src/test/regress/expected/failure_online_move_shard_placement.out /home/circleci/project/src/test/regress/results/failure_online_move_shard_placement.out ``` This then breaks all the tests run after it as well, because we're missing one worker node. Locally I was able to reproduce this error by sleeping for 10 seconds in the forked process sleep before actually starting mitmproxy. So I'm expecting what's happening in CI is that due to limited resources, mitmproxy is not up yet when we try to add its port as a workernode. This PR fixes this by waiting until mitmproxy is listening on its socket before actually starting to run our tests. This fixed it locally for me when I made the forked process sleep for 10 seconds before starting mitmproxy. In passing it also improves the detection and errors that we already had for the case where something was already listening on the mitmproxy port. Because both @gledis69 and me were changing things in our CI images at the same time this also includes a bump of the style checker tools. 
Closes #6200	2022-08-19 13:03:08 +00:00
Jelte Fennema	fe1668e43f	Fix flakyness in multi_utilities (#6204 ) Sometimes this multi_utilities would fail with the following error: ```diff SET citus.log_remote_commands TO ON; -- should propagate to all workers because no table is specified ANALYZE; NOTICE: issuing BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;SELECT assign_distributed_transaction_id(0, 3461, '2022-08-19 01:56:06.35816-07'); DETAIL: on server postgres@localhost:57637 connectionId: 1 NOTICE: issuing BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;SELECT assign_distributed_transaction_id(0, 3461, '2022-08-19 01:56:06.35816-07'); DETAIL: on server postgres@localhost:57638 connectionId: 2 NOTICE: issuing SET citus.enable_ddl_propagation TO 'off' DETAIL: on server postgres@localhost:57637 connectionId: 1 -NOTICE: issuing SET citus.enable_ddl_propagation TO 'off' -DETAIL: on server postgres@localhost:xxxxx connectionId: xxxxxxx NOTICE: issuing ANALYZE DETAIL: on server postgres@localhost:57637 connectionId: 1 +NOTICE: issuing SET citus.enable_ddl_propagation TO 'off' +DETAIL: on server postgres@localhost:57638 connectionId: 2 NOTICE: issuing ANALYZE DETAIL: on server postgres@localhost:57638 connectionId: 2 ``` This is simply a harmless change in output due to some timing differences. This PR makes the test output consistent by only logging the remote ANALYZE commands, not the SET commands.	2022-08-19 12:38:55 +02:00
Jelte Fennema	8ce12eb51f	Fix flakyness in failure_insert_select_repartition (#6202 ) This fixes our most commonly randomly failing failure test. The failing diff is as follows: ```diff SELECT citus.mitmproxy('conn.onQuery(query="fetch_intermediate_results").kill()'); mitmproxy ----------- (1 row) INSERT INTO target_table SELECT * FROM source_table; -ERROR: connection to the remote node localhost:xxxxx failed with the following error: connection not open +ERROR: could not open file "base/pgsql_job_cache/10_0_40/repartitioned_results_20770193413_from_4213590_to_1.data": No such file or directory +CONTEXT: while executing command on localhost:9060 +while executing command on localhost:57637 SELECT * FROM target_table ORDER BY a; ``` As far as I can tell this is caused by a race condition: After killing fetch_intermediate_results on worker 9060, the previously created data file gets cleaned up. The fetch_intermediate_results call that's sent to worker 57637 will be cancelled and rolled back soon because of the failure on the other connection. But if that fetch_intermediate_results call is able to connect to 9060 before it is cancelled, it won't find the file it's looking for there anymore. So while it's not the error we expect, it does indicate that we succeeded. To avoid this issue, instead of killing the fetch_intermediate_results call directly, we kill the COPY command that it uses to do the fetch. This results in stable output as can be seen here, where 227 runs of failure_insert_select_repartition succeeded: https://app.circleci.com/pipelines/github/citusdata/citus/26168/workflows/9c64a3b6-f46c-4725-9fb4-8f6a2d00a023/jobs/739389 To be clear, this changes the test to affect the opposite fetch_intermediate_results call. This kills the fetch_intermediate_results call of worker 57637, instead of killing the fetch_intermediate_results call on worker 9060. 
Example of failing test: https://app.circleci.com/pipelines/github/citusdata/citus/26147/workflows/780e95ea-264a-4c9f-ad2e-cf11449a795e/jobs/738467	2022-08-19 09:11:07 +00:00
Naisila Puka	5a9fdc221b	Add explicit alias to avoid debug output diff in pg15 (#6183 )	2022-08-19 11:39:18 +03:00
Jelte Fennema	31faa88a4e	Track rebalance progress at the shard move level (#6187 ) We're in the process of totally changing the shard rebalancer experience and infrastructure. Soon the shard rebalancer will include retries, crash recovery and support for running in the background. These improvements come at a cost though: the way the get_rebalance_progress UDF currently works is very hard to replicate with this new structure. This is mostly because the old behaviour doesn't really make sense anymore with this new infrastructure. A new and better way to track the progress will be included as part of the new infrastructure. This PR is in preparation of the new code rebalancer experience. It changes the get_rebalance_progress UDF to only display the moves that are in progress at the moment, not the ones that happened in the past or that are planned in the future. Another option would have been to completely remove the current get_rebalance_progress functionality and point people to the new way of tracking progress. But old blogposts still reference the old UDF and users might have some automation on top of it. Showing the progress of the current moves is fairly simple to achieve, even with the new infrastructure. So this PR is a kind of compromise: It doesn't have complete feature parity with the old get_rebalance_progress, but the most common use cases will still work. There's also an advantage of the change: You can now see progress of shard moves that were triggered by calling citus_move_shard_placement manually, instead of only being able to see progress of moves that were initiated using get_rebalance_table_shards.	2022-08-18 18:57:04 +02:00
Önder Kalacı	961fcff5db	Properly add / remove coordinator for isolation tests (#6181 ) We used to rely on a separate session to add the coordinator. However, that might prevent the existing sessions from being assigned proper gpids, which causes flaky tests.	2022-08-18 17:32:12 +03:00
Jelte Fennema	7dca028391	Fix flakyness in isolation_reference_table (#6193 ) The newly introduced isolation_reference_table test had some flakyness, because the assumption on how the arbitrary reference table gets chosen was incorrect. This introduces a VACUUM FULL at the start of the test to ensure the assumption actually holds. Example of failed test: https://app.circleci.com/pipelines/github/citusdata/citus/26108/workflows/0a5cd526-006b-423e-8b67-7411b9c6be36/jobs/736802	2022-08-18 15:47:28 +03:00
Jelte Fennema	0a045afd3a	Fix flakyness in columnar_first_row_number test (#6192 ) When running columnar_first_row_number in parallel with the columnar_query test sometimes it would fail. This bug is tracked in #6191. For now to make CI less flaky we simply don't run these tests in parallel. Example of failed test: https://app.circleci.com/pipelines/github/citusdata/citus/26106/workflows/75d00ea9-23f8-4bff-a927-bced19e1f81b/jobs/736713 Fixes #6184	2022-08-18 15:32:57 +03:00
Jelte Fennema	d16b458e2a	Remove the flaky rollback_to_savepoint test (#6190 ) This removes a flaky test that I introduced in #3868 after I fixed the issue described in #3622. This test sometimes fails randomly in CI. The way it fails indicates that there might be some bug: A connection breaks after rolling back to a savepoint. I tried reproducing this issue locally, but I wasn't able to. I don't understand what causes the failure. Things that I tried were: 1. Running the test with: ```sql SET citus.force_max_query_parallelization = true; ``` 2. Running the test with: ```sql SET citus.max_adaptive_executor_pool_size = 1; ``` 3. Running the test in parallel with the same tests that it is run in parallel with in multi_schedule. None of these allowed me to reproduce the issue locally. So I think it's time to give up on fixing this test and simply remove the test. The regression that this test protects against seems very unlikely to reappear, since in #3868 I also added a big comment about the need for the newly added `UnclaimConnection` call. So, I think the need for the test is quite small, and removing it will make our CI less flaky. In case the cause of the bug ever gets found, I tracked the bug in #6189 Example of a failing CI run: https://app.circleci.com/pipelines/github/citusdata/citus/26098/workflows/f84741d9-13b1-4ae7-9155-c21ed3466951/jobs/736424 For reference the unexpected diff is this (so both warnings and an error): ```diff INSERT INTO t SELECT i FROM generate_series(1, 100) i; +WARNING: connection to the remote node localhost:57638 failed with the following error: +WARNING: +CONTEXT: while executing command on localhost:57638 +ERROR: connection to the remote node localhost:57638 failed with the following error: ROLLBACK; ``` This test is also mentioned as the most failing regression test in #5975	2022-08-18 15:14:16 +03:00
Onder Kalaci	9ec8e627c1	Support Sequences owned by columns before distributing tables There are 3 different ways that a sequence can be interacting with tables. (1) and (2) are already supported. This commit adds support for (3). (1) column DEFAULT nextval('seq'): The dependency is roughly like below, and ExpandCitusSupportedTypes() is responsible for finding the depending sequences. schema <--- table <--- column <---- default value ^ \| \|------------------ sequence <--------\| (2) serial columns: Bigserial/small serial etc: The dependency is roughly like below, and ExpandCitusSupportedTypes() is responsible for finding the depending sequences. schema <--- table <--- column <---- default value ^ \| \| \| sequence <--------\| (3) Sequence OWNED BY table.column: Added support for this type of resolution in this commit. The dependency is almost like the following, and ExpandCitusSupportedTypes() is NOT responsible for finding the dependency. schema <--- table <--- column ^ \| sequence	2022-08-18 10:29:40 +02:00
Naisila Puka	69ffdbf0e3	Uses object name in cannot distribute object error (#6186 ) Object type ids have changed in PG15 because of at least two added objects in the list: OBJECT_PARAMETER_ACL, OBJECT_PUBLICATION_NAMESPACE To avoid different output between pg versions, let's use the object name in the error, and put the object id in the error detail. Relevant PG commits: a0ffa885e478f5eeacc4e250e35ce25a4740c487 5a2832465fd8984d089e8c44c094e6900d987fcd	2022-08-18 11:05:17 +03:00
Ying Xu	91473635db	[Columnar] Check for existence of Citus before creating Citus_Columnar (#6178 ) * Added a check to see if Citus has already been loaded before creating citus_columnar * added tests	2022-08-17 15:12:42 -07:00
Nils Dijk	a9d47a96f6	Fix reference table lock contention (#6173 ) DESCRIPTION: Fix reference table lock contention Dropping and creating reference tables unintentionally blocked on each other due to the use of an ExclusiveLock for both the Drop and conditionally copying existing reference tables to (new) nodes. The patch does the following: - Lower lock level for dropping (reference) tables to `ShareLock` so they don't self conflict - Treat reference tables and distributed tables equally and acquire the colocation lock when dropping any table that is in a colocation group - Perform the precondition check for copying reference tables twice, first time with a lower lock that doesn't conflict with anything. Could have been a NoLock, however, in preparation for dropping a colocation group, it is an `AccessShareLock` During normal operation the first check will always pass and we don't have to escalate that lock. This means we won't be blocked on adding and removing reference tables. Only after a node addition the first `create_reference_table` will still need to acquire an `ExclusiveLock` on the colocation group to perform the copy.	2022-08-17 18:19:28 +02:00
Ahmet Gedemenli	0631e1998b	Fix upgrade paths for #6100 (#6176 ) * Fix upgrade paths for #6100 Co-authored-by: Hanefi Onaldi <Hanefi.Onaldi@microsoft.com>	2022-08-17 18:56:53 +03:00
Naisila Puka	20a0e0ed39	Grant create on public to some users where necessary (for PG15) (#6180 )	2022-08-17 17:35:10 +03:00
aykut-bozkurt	52efe08642	default mode for shard splitting is set to auto. (#6179 )	2022-08-17 12:18:47 +03:00
aykut-bozkurt	be06d65721	Nonblocking tenant isolation is supported by using split api. (#6167 )	2022-08-17 11:13:07 +03:00
Jelte Fennema	78a5013e24	Support changing CPU priorities for backends and shard moves (#6126 ) Intro This adds support to Citus to change the CPU priority values of backends. This is created with two main usecases in mind: 1. Users might want to run the logical replication part of the shard moves or shard splits at a higher speed than they would do by themselves. This might cause some small loss of DB performance for their regular queries, but this is often worth it. During high load it's very possible that the logical replication WAL sender is not able to keep up with the WAL that is generated. This is especially a big problem when the machine is close to running out of disk when doing a rebalance. 2. Users might have certain long running queries that they don't want to impact their regular workload too much. Be very careful!!! Using CPU priorities to control scheduling can be helpful in some cases to control which processes are getting more CPU time than others. However, due to an issue called "[priority inversion][1]" it's possible that using CPU priorities together with the many locks that are used within Postgres causes the exact opposite behavior of what you intended. This is why this PR only allows the PG superuser to change the CPU priority of its own processes. Currently it's not recommended to set `citus.cpu_priority` directly. Currently the only recommended interface for users is the setting called `citus.cpu_priority_for_logical_replication_senders`. This setting controls CPU priority for a very limited set of processes (the logical replication senders). So, the dangers of priority inversion are also limited when using it for this usecase. Background Before reading the rest it's important to understand some basic background regarding process CPU priorities, because they are a bit counterintuitive. A lower priority value means that the process will be scheduled more and whatever it's doing will thus complete faster. The default priority for processes is 0. 
Valid values are from -20 to 19 inclusive. On Linux a larger difference between values of two processes will result in a bigger difference in percentage of scheduling. Handling the usecases Usecase 1 can be achieved by setting `citus.cpu_priority_for_logical_replication_senders` to the priority value that you want it to have. It's necessary to set this both on the workers and the coordinator. Example: ``` citus.cpu_priority_for_logical_replication_senders = -10 ``` Usecase 2 can with this PR be achieved by running the following as superuser. Note that this is only possible as superuser currently due to the dangers mentioned in the "Be very careful!!!" section. And although this is possible it's NOT recommended: ```sql ALTER USER background_job_user SET citus.cpu_priority = 5; ``` OS configuration To actually make these settings work well it's important to run Postgres with a more permissive value for the 'nice' resource limit than Linux will do by default. By default Linux will not allow a process to set its priority lower than it currently is, even if it was lower when the process originally started. This capability is necessary to reset the CPU priority to its original value after a transaction finishes. Depending on how you run Postgres this needs to be done in one of two ways: If you use systemd to start Postgres all you have to do is add a line like this to the systemd service file: ```conf LimitNice=+0 # the + is important, otherwise it's interpreted incorrectly as 20 ``` If that's not the case you'll have to configure `/etc/security/limits.conf` like so, assuming that you are running Postgres as the `postgres` OS user: ``` postgres soft nice 0 postgres hard nice 0 ``` Finally you'd have to add the following line to `/etc/pam.d/common-session` ``` session required pam_limits.so ``` These settings would allow you to change the priority back after setting it to a higher value. 
However, to actually allow you to set priorities even lower than the default priority value you would need to change the values in the config to something lower than 0. So for example: ```conf LimitNice=-10 ``` or ``` postgres soft nice -10 postgres hard nice -10 ``` If you use WSL2 you'll likely have to do another thing. You have to open a new shell, because PAM is only used during login, and WSL2 doesn't actually log you in. You can force a login like this: ``` sudo su $USER --shell /bin/bash ``` Source: https://stackoverflow.com/a/68322992/2570866 [1]: https://en.wikipedia.org/wiki/Priority_inversion	2022-08-16 13:07:17 +03:00
Jelte Fennema	43c2a1e88b	Share more code between splits and moves (#6152 ) When introducing non-blocking shard split functionality it was based heavily on the non-blocking shard moves. However, the differences in usage were slightly too big to be able to reuse the existing functions easily. So, most logical replication code was simply copied to dedicated shard split functions and modified for that purpose. This PR tries to create a more generic logical replication infrastructure that can be used by both shard splits and shard moves. There's probably more code sharing possible in the future, but I believe this is at least a good start and addresses the lowest hanging fruit. This also adds a CreateSimpleHash function that makes creating the most common type of hashmap easier.	2022-08-15 20:21:51 +03:00
yxu2162	e1322ec905	Change for PG15 test because hash_mem_multiplier was changed to 2 as a default instead of 1 which was what PG13/14 have	2022-08-11 09:49:56 -07:00
Teja Mupparti	e962113c63	Remove the GUC mention in the error message as this config is meant for advanced users	2022-08-11 09:43:14 -07:00
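
Several of the PG15 commits above (92880842a4, fb0fb0f0c6) describe test helpers that strip extra "Result" lines and "->" arrows from EXPLAIN output so that expected outputs match across Postgres versions. The actual helpers are SQL functions in multi_test_helpers.sql and are not shown in this log; the following is only a rough Python sketch of the same normalization idea. The function names `drop_result_lines` and `drop_arrows` and the sample plan are hypothetical and chosen for illustration.

```python
import re

# Hypothetical sketch: NOT the Citus helpers, which are SQL functions
# defined in multi_test_helpers.sql. It only illustrates the idea of
# normalizing EXPLAIN text before comparing it with expected output.

# Matches the leading indentation and the "->" marker of a plan node line.
_ARROW = re.compile(r"^(\s*)->\s*")


def _is_result_line(line):
    """Return True for a plan node that is exactly a Result node."""
    node = _ARROW.sub("", line).strip()
    return node == "Result" or node.startswith("Result ")


def drop_result_lines(plan_lines):
    """Keep the whole plan, removing only the Result node lines."""
    return [line for line in plan_lines if not _is_result_line(line)]


def drop_arrows(plan_lines):
    """Remove Result node lines and strip the "->" markers, keeping indentation."""
    kept = drop_result_lines(plan_lines)
    return [_ARROW.sub(r"\1", line) for line in kept]


if __name__ == "__main__":
    sample_plan = [
        "Aggregate",
        "  ->  Result",
        "        ->  Seq Scan on t",
    ]
    for line in drop_arrows(sample_plan):
        print(line)
```

Keeping the indentation when stripping arrows is a design choice of this sketch: the nesting of the remaining nodes stays readable, at the cost of not collapsing the indent level left behind by a removed Result node.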

2493 Commits (bed3770d5454f1f959b568c03dd6e9c9b2938a1b)