citus

Commit Graph

Author	SHA1	Message	Date
Naisila Puka	35b4ddc355	Pg15 support (#6085 ) * Adjust configure script to allow PG15 * Adds copy of ruleutils_14.c as ruleutils_15.c * Uses get_namespace_name_or_temp in ruleutils_15.c Relevant PG commit: 48c5c9068211e0a04fd9553c8714b2821ed3ad17 * Clean up code using "(expr) ? true : false" in ruleutils_15.c Relevant PG commit: fd0625c7a9c679c0c1e896014b8f49a489c3a245 * Change varno from Index (unsigned int) to int in ruleutils_15.c Relevant PG commit: e3ec3c00d85bd2844ffddee83df2bd67c4f8297f * Adds find_recursive_union to ruleutils_15.c Relevant PG commit: 3f50b82639637c9908afa2087de7588450aa866b * Fix display of SQL-std func's args in INSERT/SELECT in ruleutils_15.c Relevant PG commit: a8d8445a7b2f80f6d0bfe97b19f90bd2cbef8759 * Fix ruleutils_15.c's dumping of whole-row Vars in more contexts Relevant PG commit: 43c2175121c829c8591fc5117b725f1f22bfb670 * Fix assorted missing logic for GroupingFunc nodes in ruleutils_15.c Relevant PG commit: 2591ee8ec44d8cbc8e1226550337a64c684746e4 * Adds grammar support for SQL/JSON clauses in ruleutils_15.c Relevant PG commit: f79b803dcc98d707450e158db3638dc67ff8380b * Adds SQL/JSON constructors to ruleutils_15.c Relevant PG commits: f4fb45d15c59d7add2e1b81a9d477d0119a9691a cc7401d5ca498a84d9b47fd2e01cebd8e830e558 * Adds support for MERGE in ruleutils_15.c Relevant PG commit: 7103ebb7aae8ab8076b7e85f335ceb8fe799097c * Add IS JSON predicate to ruleutils_15.c Relevant PG commit: 33a377608fc29cdd1f6b63be561eab0aee5c81f0 * Add SQL/JSON query functions to ruleutils_15.c Relevant PG commit: 1a36bc9dba8eae90963a586d37b6457b32b2fed4 * Adds three different SQL/JSON values to ruleutils_15.c Relevant PG commits: 606948b058dc16bce494270eea577011a602810e 49082c2cc3d8167cca70cfe697afb064710828ca * Adds JSON table functions in ruleutils_15.c Relevant PG commit: 4e34747c88a03ede6e9d731727815e37273d4bc9 * Add PLAN function for JSON table in ruleutils_15.c Relevant PG commit: fadb48b00e02ccfd152baa80942de30205ab3c4f * Remove extra blank lines 
before block-closing braces ruleutils_15.c Relevant PG commit: 24d2b2680a8d0e01b30ce8a41c4eb3b47aca5031 * set_deparse_plan: Reuse variable to appease Coverity ruleutils_15.c Relevant PG commit: e70813fbc4aaca35ec012d5a426706bd54e4acab * Mechanical code beautification ruleutils_15.c Relevant PG commit: 23e7b38bfe396f919fdb66057174d29e17086418 * Rename value_type to item_type in ruleutils_15.c Relevant PG commit: 3ab9a63cb638a1fd99475668e2da9c237495aeda * Show 'AS "?column?"' explicitly when it's important in ruleutils_15.c Relevant PG commit: c7461fc25558832dd347a9c8150b0f1ed85e36e8 * Fix ruleutils_15.c issues with dropped cols in funcs-returning-composite Relevant PG commit: c1d1e8469c77ce6b8e5310955580b4a3eee7fe96 * Change comment regarding functions returning composite in ruleutils_15.c Relevant PG commit: c2fa113ddb1117b1f03e91960f65d5d7d8a90270 * Replace int nodes with bool nodes where needed In PG15, Boolean nodes are added. Pre PG15, internal Boolean values in Create Role commands were represented by Integer nodes. This commit replaces int nodes logic with bool nodes logic where needed. Mostly there are CREATE ROLE logic changes. Relevant PG commit: 941460fcf731a32e6a90691508d5cfa3d1f8eeaf * Handle new option colliculocale in CREATE COLLATION logic In PG15, there is an added option to use ICU as global locale provider. pg_collation has three locale-related fields: collcollate and collctype, which are libc-related fields, and a new one colliculocale, which is the ICU-related field. Only the libc-related fields or the ICU-related field is set, never both. Relevant PG commits: f2553d43060edb210b36c63187d52a632448e1d2 54637508f87bd5f07fb9406bac6b08240283be3b * Add PG15 tests to CI using test images that have 15beta2 (#6093) * Change warning message in pg_signal_backend() Relevant PG commit: 7fa945b857cc1b2964799411f1633468826861ff * Revert "Add missing ifdef for PG 15" This reverts commit `c7b51025ab`. * Fixes tests for ALTER TRIGGER RENAME consistency for part. 
tables Relevant PG commit: 80ba4bb383538a2ee846fece6a7b8da9518b6866 * Prevent creating child triggers on partitions when adding new node Pre PG15, tgisinternal is true for a "child" trigger on a partition cloned from the trigger on the parent. In PG15, tgisinternal is false in that case. However, we don't want to create this trigger on the partition since it will create a conflict when we try to attach the partition to the parent table: ERROR: trigger "..." for relation "{partition_name}" already exists Relevant PG commit: f4566345cf40b068368cb5617e61318da60676ec * Fix tests for generated columns dependency changes In PG15, for GENERATED columns, all dependencies of the generation expression are recorded as NORMAL dependencies of the column itself. This requires CASCADE to drop generated cols with the original col. Pre PG15, dependencies were recorded as AUTO, with which generated columns are silently dropped with the original column. Relevant PG commit: cb02fcb4c95bae08adaca1202c2081cfc81a28b5 * Explicitly cast catalog "char" column to text before concatenation Relevant PG commit: 07eee5a0dc642d26f44d65c4e6263304208e8583 * Remove 'AS "?column?"' from test outputs There were some instances in the planning debug outputs of the following test outputs where AS "?column?" is added. We add a normalization rule to remove it as it is not important. cte_inline.out recursive_relation_planning_restriction_pushdown.out Relevant PG commit: c7461fc25558832dd347a9c8150b0f1ed85e36e8 * Use pg_backup_stop(PG15) instead of pg_stop_backup(PG<15) Add an alternative test output because of the change in the backup modes of Postgres. Specifically here, there is a renaming issue: pg_stop_backup pre PG15 vs pg_backup_stop PG15+ The alternative output can be deleted when we drop support for PG14 Relevant PG commit: 39969e2a1e4d7f5a37f3ef37d53bbfe171e7d77a * Adds citus.mitmfifo GUC Previously we were setting this configuration parameter on the fly for the failure tests schedule. 
However, PG15 doesn't allow that anymore: reserved prefixes like "citus" cannot be used to set non-existing GUCs. Relevant PG commit: 88103567cb8fa5be46dc9fac3e3b8774951a2be7 * Handles EXPLAIN output diffs in PG15 - Extra result lines To handle extra "Result" lines in explain outputs, we add an explain method to the multi_test_helpers.sql file - plan_without_result_lines() is added for cases where we want the whole explain output with only "Result" lines removed * Handles EXPLAIN output diffs in PG15, Hash Agg/Join leverage To handle differences in usage of GroupAggregate vs HashAggregate or Merge Join vs Hash join in cases where this detail doesn't seem to matter, we use coordinator_plan(). - coordinator_plan() is updated to remove "Result" lines There are some cases where we have subplans so we add a new function that prints all Task Count lines as well - coordinator_plan_with_subplans() Still not sure of the relevant PG commit Could be db0d67db2401eb6238ccc04c6407a4fd4f985832 but disabling enable_group_by_reordering didn't help. * Handles EXPLAIN output diffs in PG15: enable_group_by_reordering Relevant PG commit db0d67db2401eb6238ccc04c6407a4fd4f985832 * Normalizes Memory Usage, Buckets, Batches for PG15 explain diffs We create a new function in multi_test_helpers, which is similar to the explain_merge function in PG15. This explain helper function normalizes Memory Usage, Buckets and Batches, and we use it in the tests which give a different output for PG15. 
* Bump test images to 15beta3 (#6172) * Omit namespace in post-copy errmsg Relevant PG commit: 069d33d0c5a021601245e44df77a0423ddd69359 * Handles EXPLAIN output diffs in PG15: extra arrows&result lines To handle extra "->" arrows resulting from extra Result lines in explain outputs, we add the following explain method to the multi_test_helpers.sql file - plan_without_arrows() is added for cases where we want the whole explain output without arrows and without Result lines * Alters public schema's owner to pg_database_owner in PG15 In PG15, public schema is owned by pg_database_owner role. In multi_extension, we drop and recreate the public schema, hence its owner becomes the default user in our tests, postgres. Change that to pg_database_owner for PG15 consistency. This results in alternative test output for public schema grants in the following test: grant_on_schema_propagation.sql Relevant PG commit: b073c3ccd06e4cb845e121387a43faa8c68a7b62 * Add alternative test outputs for change in Insert Select display citus_local_tables_queries.sql coordinator_shouldhaveshards.sql cte_inline.sql insert_select_repartition.sql intermediate_result_pruning.sql local_shard_execution.sql local_shard_execution_replicated.sql multi_deparse_shard_query.sql multi_insert_select.sql multi_insert_select_conflict.sql multi_mx_insert_select_repartition.sql mx_coordinator_shouldhaveshards.sql single_node.sql Relevant PG commit: a8d8445a7b2f80f6d0bfe97b19f90bd2cbef8759 * Fixes columnar tap tests for PG15 In PG15, Perl test modules have been moved to a new namespace. Also, postgres node new() and get_new_node() methods have been unified to one method: new() We create separate tap tests for PG13/14 and PG15+ and update the Makefiles accordingly. Relevant PG commits: 201a76183e2056c2217129e12d68c25ec9c559c8 b3b4d8e68ae83f432f43f035c7eb481ef93e1583 * Handles EXPLAIN output diffs in PG15: HashAgg Leverage,alt. 
output Still not sure of the relevant PG commit Could be db0d67db2401eb6238ccc04c6407a4fd4f985832 but disabling enable_group_by_reordering didn't help.	2022-08-24 17:59:17 +02:00
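The pg_stop_backup to pg_backup_stop rename mentioned in the commit above is the kind of difference that test tooling can branch on by server version. A minimal Python sketch, assuming the caller already knows the integer `server_version_num` (150000 for PG 15.0); the helper name `backup_stop_function` is hypothetical and not part of Citus:

```python
def backup_stop_function(server_version_num: int) -> str:
    """Return the backup-stop function name for a PostgreSQL version.

    Hypothetical helper for illustration only: PG15 renamed
    pg_stop_backup() to pg_backup_stop().
    """
    if server_version_num >= 150000:
        return "pg_backup_stop"
    return "pg_stop_backup"


print(backup_stop_function(140005))
print(backup_stop_function(150000))
```

The commit itself takes a different route (an alternative expected-output file), which avoids any branching inside the test.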
Naisila Puka	ddbd10d2e7	Rename server version checks in tests (#6239 )	2022-08-24 16:31:52 +03:00
Jelte Fennema	5c0205ce10	Fix flakyness in multi_replicate_reference_table (#6235 ) In CI multi_replicate_reference_table would sometimes fail like this: ```diff -- detects correctly that referecence table doesn't have replica identity SELECT replicate_reference_tables(); -ERROR: cannot use logical replication to transfer shards of the relation initially_not_replicated_reference_table since it doesn't have a REPLICA IDENTITY or PRIMARY KEY +ERROR: cannot use logical replication to transfer shards of the relation ref_table since it doesn't have a REPLICA IDENTITY or PRIMARY KEY DETAIL: UPDATE and DELETE commands on the shard will error out during logical replication unless there is a REPLICA IDENTITY or PRIMARY KEY. HINT: If you wish to continue without a replica identity set the shard_transfer_mode to 'force_logical' or 'block_writes'. ``` This happens because `CitusTableTypeIdList` returns tables in heap order, so it's a bit random which one is first in the list. The test also contained multiple tables that didn't have a primary key or replica identity, so it made sense that the error could be for either one of these tables. This PR makes the test output consistent by changing one of the tables to have a primary key. Example of failing test: https://app.circleci.com/pipelines/github/citusdata/citus/26387/workflows/fc3196e7-ddf2-4000-a70b-5ac71c836321/jobs/748940	2022-08-24 13:34:10 +03:00
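The ordering problem described in the commit above can be modelled in a few lines. This is a toy Python sketch, not Citus code: it assumes the error always names the first table in scan order that lacks a primary key, and it assumes (the commit does not say which) that `ref_table` is the table that received the key.

```python
from itertools import permutations


def first_table_without_key(tables):
    """Return the first table in the given order that has no primary key."""
    for name, has_primary_key in tables:
        if not has_primary_key:
            return name
    return None


def possible_error_tables(tables):
    """Collect every table the error could name across all scan orders."""
    return {first_table_without_key(order) for order in permutations(tables)}


# Before the fix: two tables without a key, so the error depends on the order.
before = [("initially_not_replicated_reference_table", False),
          ("ref_table", False)]
# After the fix (assumed choice of table): only one candidate remains.
after = [("initially_not_replicated_reference_table", False),
         ("ref_table", True)]

print(sorted(possible_error_tables(before)))
print(sorted(possible_error_tables(after)))
```

With a single table lacking a key, every scan order produces the same error, which is why the test output becomes stable.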
aykut-bozkurt	041f88d7bf	Revert "Revert "Creates new colocation for colocate_with:='none' too"" (#6227 ) This reverts commit `d171a736ab`.	2022-08-24 10:54:04 +03:00
Marco Slot	bad8196da3	Verify that we can replicate reference tables using rebalancer (#6232 ) Co-authored-by: Marco Slot <marco.slot@gmail.com>	2022-08-24 00:34:21 +02:00
Jelte Fennema	e0ada050aa	Enable binary logical replication for shard moves (#6017 ) Using binary encoding can save a lot of CPU cycles, both on the sender and on the receiver. Since the walsender and walreceiver processes are single threaded, this can matter a lot for the throughput if they are bottlenecked on CPU. This feature is only available in PG14, not PG13. It should be safe to always enable because it's only used for types that support binary encoding according to the PG docs: > Even when this option is enabled, only data types that have binary > send and receive functions will be transferred in binary. But in case it causes problems, it can still be disabled by setting `citus.enable_binary_protocol` to `false`.	2022-08-23 16:38:00 +02:00
Jelte Fennema	cc7e93a56a	Fix flakyness in failure_connection_establishment (#6226 ) In CI our failure_connection_establishment sometimes failed randomly with the following error: ```diff -- verify a connection attempt was made to the intercepted node, this would have cause the -- connection to have been delayed and thus caused a timeout SELECT * FROM citus.dump_network_traffic() WHERE conn=0; conn \| source \| message ------+--------+--------- - 0 \| coordinator \| [initial message] -(1 row) +(0 rows) SELECT citus.mitmproxy('conn.allow()'); ``` Source: https://app.circleci.com/pipelines/github/citusdata/citus/26318/workflows/d3354024-9a67-4b01-9416-5cf79aec6bd8/jobs/745558 The way I fixed this was by removing the dump_network_traffic call. This might sound simple, but doing this while continuing to let the test serve its intended purpose required quite some more changes. This dump_network_traffic call was there because we didn't want to show warnings in the queries above, because the exact warnings were not reliable. The main reason this error was not reliable was because we were using round-robin task assignment. We did the same query twice, so that it would hit the node with the intercepted connection in one of those connections. Instead of doing that I'm now using the "first-replica" policy and do the queries only once. This works, because the first placements by placementid for each of the used tables are on the second node, so first-replica will cause the first connection to go there. 
This solved most of the flakyness, but when confirming that the flakyness was fixed I found some additional errors: ```diff -- show that INSERT failed SELECT citus.mitmproxy('conn.allow()'); mitmproxy ----------- (1 row) SELECT count(*) FROM single_replicatated WHERE key = 100; - count ---------------------------------------------------------------------- - 0 -(1 row) - +ERROR: could not establish any connections to the node localhost:9060 after 400 ms RESET client_min_messages; ``` Source: https://app.circleci.com/pipelines/github/citusdata/citus/26321/workflows/fd5f4622-400c-465e-8d82-83f5f55a87ec/jobs/745666 I addressed this with a combination of two things: 1. Only change citus.node_connection_timeout for the queries that we want to test timeout behaviour for. When those queries are done I reset the value to the default again. 2. Change our mitm framework to only delay the initial connection packet instead of all packets. I think sometimes a follow-on packet of a previous connection attempt was causing the next connection attempt to be delayed even if `conn.allow()` was already called. For our tests we only care about connection timeouts, so there's no reason to delay any other packets than the initial connection packet. Then there was some last flakyness in the exact error that was given: ```diff -- tests for connectivity checks SELECT name FROM r1 WHERE id = 2; WARNING: could not establish any connections to the node localhost:9060 after 900 ms +WARNING: connection to the remote node localhost:9060 failed with the following error: name ------ bar (1 row) ``` Source: https://app.circleci.com/pipelines/github/citusdata/citus/26338/workflows/9610941c-4d01-4f62-84dc-b91abc56c252/jobs/746467 I don't have a good explanation for this slight change in error message, but given that it is missing the actual error message I expected this to be related to some small difference in timing: e.g. 
the server responding to the connection attempt right after the coordinator determined that the connection timed out. To solve this last flakyness I increased the connection timeouts and made the difference between the timeout and the delay a bit bigger. With these tweaks I wasn't able to reproduce this error on CI anymore. Finally, I made most of the same changes to failure_failover_to_local_execution, since it was using the `conn.delay()` mitm method too. The only change that I left out was the timing increase, since it might not be strictly necessary and increases the time it takes to run the test. If this test ever becomes flaky the first thing we should try is increasing its timeout.	2022-08-23 15:04:20 +03:00
Jelte Fennema	506c16efdf	Fix flakyness in failure_single_select (#6223 ) The failure_single_select test would sometimes fail with an error that's similar to this: ```diff -- cancel after first SELECT; txn should fail and nothing should be marked as invalid SELECT citus.mitmproxy('conn.onQuery(query="^SELECT").cancel(' \|\| pg_backend_pid() \|\| ')'); - mitmproxy ---------------------------------------------------------------------- - -(1 row) - +ERROR: canceling statement due to user request +CONTEXT: COPY mitmproxy_result, line 1: "" +SQL statement "COPY mitmproxy_result FROM '/home/circleci/project/src/test/regress/tmp_check/mitmproxy.fifo'" +PL/pgSQL function citus.mitmproxy(text) line 11 at EXECUTE BEGIN; ``` This error looked very similar to the one from #6217 and indeed the cause turned out to be similar. Because we were canceling all SELECT queries, we would actually sometimes cancel our mitmproxy SELECT query itself. This PR puts some additional restrictions on the queries that we cancel; most importantly, they should contain the name of the table that we're selecting from. I was able to reproduce the original issue locally pretty reliably. With the changes in this PR it didn't happen again. In passing this also changes one other failure test that was cancelling all selects and puts similar additional restrictions on those cancellations. Example of failed test in CI: https://app.circleci.com/pipelines/github/citusdata/citus/26305/workflows/4d942b91-f83c-453c-8d9a-ae22d608e756/jobs/745071	2022-08-22 20:06:33 +02:00
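Why a broad filter such as `^SELECT` is risky can be shown with a toy model. The Python sketch below only illustrates regex matching; it is not the mitmproxy framework's code, and the table name `select_test`, the narrowed pattern, and both sample queries are hypothetical.

```python
import re


def would_cancel(pattern: str, query: str) -> bool:
    """Return True if a cancel rule with this pattern would hit the query."""
    return re.search(pattern, query) is not None


# The control query that installs the rule is itself a SELECT.
control_query = "SELECT citus.mitmproxy('conn.allow()');"
# Hypothetical query under test; the table name is an assumption.
target_query = "SELECT count(*) FROM select_test WHERE key = 1;"

broad = r"^SELECT"                 # matches every SELECT, control query included
narrow = r"^SELECT.*select_test"   # also requires the table name

for name, pattern in (("broad", broad), ("narrow", narrow)):
    print(name,
          would_cancel(pattern, control_query),
          would_cancel(pattern, target_query))
```

Requiring the table name keeps the rule away from the control query while still matching the query the test wants to cancel.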
Hanefi Onaldi	e33ba7da9e	Decrease min messages for normalization	2022-08-22 17:16:52 +03:00
Jelte Fennema	e2a24b921e	Fix flakyness in failure_create_distributed_table_non_empty (#6217 ) The failure_create_distributed_table_non_empty test would sometimes fail like this: ```diff -- in the first test, cancel the first connection we sent from the coordinator SELECT citus.mitmproxy('conn.cancel(' \|\| pg_backend_pid() \|\| ')'); - mitmproxy ---------------------------------------------------------------------- - -(1 row) - +ERROR: canceling statement due to user request +CONTEXT: COPY mitmproxy_result, line 1: "" +SQL statement "COPY mitmproxy_result FROM '/home/circleci/project/src/test/regress/tmp_check/mitmproxy.fifo'" +PL/pgSQL function citus.mitmproxy(text) line 11 at EXECUTE SELECT create_distributed_table('test_table', 'id'); ``` Because the cancel command had no filter it would actually sometimes cancel the mitmproxy cancel command itself. This PR addresses that by filtering on CREATE TABLE, which is one of the commands that create_distributed_table will send to the workers. Example of failing test: https://app.circleci.com/pipelines/github/citusdata/citus/26252/workflows/1b7e5464-cca4-4ec1-99b3-48ddf25c29fa/jobs/742829	2022-08-20 01:23:25 +03:00
Jelte Fennema	4ce17f015b	Fix flakyness in columnar_memory test (#6216 ) Sometimes in CI the columnar_memory test was using slightly more memory than expected. ```diff SELECT CASE WHEN 1.0 * TopMemoryContext / :top_post BETWEEN 0.98 AND 1.02 THEN 1 ELSE 1.0 * TopMemoryContext / :top_post END AS top_growth FROM columnar_test_helpers.columnar_store_memory_stats(); --[ RECORD 1 ]- -top_growth \| 1 +-[ RECORD 1 ]------------------ +top_growth \| 1.0206132116232119 -- before this change, max mem usage while executing inserts was 28MB and ``` This PR changes the expectation to be slightly higher, such that this random increase in memory usage doesn't cause a flaky test. Failing test: https://app.circleci.com/pipelines/github/citusdata/citus/26256/workflows/c0870f66-3346-4f8d-a1d3-36dfd7c98289/jobs/743028	2022-08-19 23:46:28 +02:00
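The CASE expression in the diff above collapses any in-range ratio to 1 and otherwise reports the ratio itself, so a failing run shows how far off it was. A small Python sketch of the same logic, using the 0.98 and 1.02 bounds visible in the diff; the function name `top_growth` is borrowed from the column alias, and the new, slightly higher bound chosen by the PR is not stated in the message, so it is left as a parameter:

```python
def top_growth(current: float, baseline: float,
               low: float = 0.98, high: float = 1.02):
    """Mirror the test's CASE expression.

    Returns 1 when current/baseline lies within [low, high] (inclusive,
    like SQL BETWEEN); otherwise returns the actual ratio so the test
    output reveals the size of the deviation.
    """
    ratio = 1.0 * current / baseline
    if low <= ratio <= high:
        return 1
    return ratio


print(top_growth(101, 100))   # within range
print(top_growth(103, 100))   # outside range, reports the ratio
```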
Jelte Fennema	de475feb69	Actually connect to the right database in logical_replication test (#6211 ) In the logical_replication test we test that the cleanup logic at the start of a shard move works as expected. To do so we create a subscription and publication slot manually. This changes the test to make that subscription actually connect to the database that the publication is in. Useful for #5987 #6085	2022-08-20 00:09:50 +03:00
Naisila Puka	9cfadd7965	Deletes unnecessary test outputs pt2 (#6214 )	2022-08-19 18:21:13 +03:00
Jelte Fennema	e6a1a86db0	Improve debugability for columnar_memory flakyness (#6203 ) Sometimes the columnar_memory test fails in CI with the following error: ```diff SELECT 1.0 * TopMemoryContext / :top_post BETWEEN 0.98 AND 1.02 AS top_growth_ok FROM columnar_test_helpers.columnar_store_memory_stats(); -[ RECORD 1 ]-+-- -top_growth_ok \| t +top_growth_ok \| f -- before this change, max mem usage while executing inserts was 28MB and ``` This is almost certainly a harmless failure that simply requires bumping the margin a little bit. However, it's impossible to say with the current output. I was unable to reproduce this on-demand on my local machine or even in CI. So this changes the test to include the actual value difference in the size of TopMemoryContext when it's outside the expected range. Then next time it fails we at least have some information about why. Example of failing test: https://app.circleci.com/pipelines/github/citusdata/citus/25966/workflows/d472a57b-419a-4f33-b8bc-2e174a98d4d6/jobs/730576	2022-08-19 15:41:16 +02:00
Jelte Fennema	fe1668e43f	Fix flakyness in multi_utilities (#6204 ) Sometimes this multi_utilities would fail with the following error: ```diff SET citus.log_remote_commands TO ON; -- should propagate to all workers because no table is specified ANALYZE; NOTICE: issuing BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;SELECT assign_distributed_transaction_id(0, 3461, '2022-08-19 01:56:06.35816-07'); DETAIL: on server postgres@localhost:57637 connectionId: 1 NOTICE: issuing BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;SELECT assign_distributed_transaction_id(0, 3461, '2022-08-19 01:56:06.35816-07'); DETAIL: on server postgres@localhost:57638 connectionId: 2 NOTICE: issuing SET citus.enable_ddl_propagation TO 'off' DETAIL: on server postgres@localhost:57637 connectionId: 1 -NOTICE: issuing SET citus.enable_ddl_propagation TO 'off' -DETAIL: on server postgres@localhost:xxxxx connectionId: xxxxxxx NOTICE: issuing ANALYZE DETAIL: on server postgres@localhost:57637 connectionId: 1 +NOTICE: issuing SET citus.enable_ddl_propagation TO 'off' +DETAIL: on server postgres@localhost:57638 connectionId: 2 NOTICE: issuing ANALYZE DETAIL: on server postgres@localhost:57638 connectionId: 2 ``` This is simply a harmless change in output due to some timing differences. This PR makes the test output consistent by only logging the remote ANALYZE commands, not the SET commands.	2022-08-19 12:38:55 +02:00
Jelte Fennema	8ce12eb51f	Fix flakyness in failure_insert_select_repartition (#6202 ) This fixes our most commonly randomly failing failure test. The failing diff is as follows: ```diff SELECT citus.mitmproxy('conn.onQuery(query="fetch_intermediate_results").kill()'); mitmproxy ----------- (1 row) INSERT INTO target_table SELECT * FROM source_table; -ERROR: connection to the remote node localhost:xxxxx failed with the following error: connection not open +ERROR: could not open file "base/pgsql_job_cache/10_0_40/repartitioned_results_20770193413_from_4213590_to_1.data": No such file or directory +CONTEXT: while executing command on localhost:9060 +while executing command on localhost:57637 SELECT * FROM target_table ORDER BY a; ``` As far as I can tell this is caused by a race condition: After killing fetch_intermediate_results on worker 9060, the previously created data file gets cleaned up. The fetch_intermediate_results call that's sent to worker 57637 will be cancelled and rolled back soon because of the failure on the other connection. But if that fetch_intermediate_results call is able to connect to 9060 before it is cancelled, it won't find the file it's looking for there anymore. So while it's not the error we expect, it does indicate that we succeeded. To avoid this issue, instead of killing the fetch_intermediate_results call directly, we kill the COPY command that it uses to do the fetch. This results in stable output as can be seen here, where 227 runs of failure_insert_select_repartition succeeded: https://app.circleci.com/pipelines/github/citusdata/citus/26168/workflows/9c64a3b6-f46c-4725-9fb4-8f6a2d00a023/jobs/739389 To be clear, this changes the test to affect the opposite fetch_intermediate_results call. This kills the fetch_intermediate_results call of worker 57637, instead of killing the fetch_intermediate_results call on worker 9060. 
Example of failing test: https://app.circleci.com/pipelines/github/citusdata/citus/26147/workflows/780e95ea-264a-4c9f-ad2e-cf11449a795e/jobs/738467	2022-08-19 09:11:07 +00:00
Naisila Puka	5a9fdc221b	Add explicit alias to avoid debug output diff in pg15 (#6183 )	2022-08-19 11:39:18 +03:00
Jelte Fennema	d16b458e2a	Remove the flaky rollback_to_savepoint test (#6190 ) This removes a flaky test that I introduced in #3868 after I fixed the issue described in #3622. This test sometimes fails randomly in CI. The way it fails indicates that there might be some bug: A connection breaks after rolling back to a savepoint. I tried reproducing this issue locally, but I wasn't able to. I don't understand what causes the failure. Things that I tried were: 1. Running the test with: ```sql SET citus.force_max_query_parallelization = true; ``` 2. Running the test with: ```sql SET citus.max_adaptive_executor_pool_size = 1; ``` 3. Running the test in parallel with the same tests that it is run in parallel with in multi_schedule. None of these allowed me to reproduce the issue locally. So I think it's time to give up on fixing this test and simply remove the test. The regression that this test protects against seems very unlikely to reappear, since in #3868 I also added a big comment about the need for the newly added `UnclaimConnection` call. So, I think the need for the test is quite small, and removing it will make our CI less flaky. In case the cause of the bug ever gets found, I tracked the bug in #6189 Example of a failing CI run: https://app.circleci.com/pipelines/github/citusdata/citus/26098/workflows/f84741d9-13b1-4ae7-9155-c21ed3466951/jobs/736424 For reference the unexpected diff is this (so both warnings and an error): ```diff INSERT INTO t SELECT i FROM generate_series(1, 100) i; +WARNING: connection to the remote node localhost:57638 failed with the following error: +WARNING: +CONTEXT: while executing command on localhost:57638 +ERROR: connection to the remote node localhost:57638 failed with the following error: ROLLBACK; ``` This test is also mentioned as the most failing regression test in #5975	2022-08-18 15:14:16 +03:00
Onder Kalaci	9ec8e627c1	Support Sequences owned by columns before distributing tables There are 3 different ways that a sequence can be interacting with tables. (1) and (2) are already supported. This commit adds support for (3). (1) column DEFAULT nextval('seq'): The dependency is roughly like below, and ExpandCitusSupportedTypes() is responsible for finding the depending sequences. schema <--- table <--- column <---- default value ^ \| \|------------------ sequence <--------\| (2) serial columns: Bigserial/small serial etc: The dependency is roughly like below, and ExpandCitusSupportedTypes() is responsible for finding the depending sequences. schema <--- table <--- column <---- default value ^ \| \| \| sequence <--------\| (3) Sequence OWNED BY table.column: Added support for this type of resolution in this commit. The dependency is almost like the following, and ExpandCitusSupportedTypes() is NOT responsible for finding the dependency. schema <--- table <--- column ^ \| sequence	2022-08-18 10:29:40 +02:00
Ying Xu	91473635db	[Columnar] Check for existence of Citus before creating Citus_Columnar (#6178 ) * Added a check to see if Citus has already been loaded before creating citus_columnar * added tests	2022-08-17 15:12:42 -07:00
Ahmet Gedemenli	0631e1998b	Fix upgrade paths for #6100 (#6176 ) * Fix upgrade paths for #6100 Co-authored-by: Hanefi Onaldi <Hanefi.Onaldi@microsoft.com>	2022-08-17 18:56:53 +03:00
Naisila Puka	20a0e0ed39	Grant create on public to some users where necessary (for PG15) (#6180 )	2022-08-17 17:35:10 +03:00
aykut-bozkurt	52efe08642	default mode for shard splitting is set to auto. (#6179 )	2022-08-17 12:18:47 +03:00
aykut-bozkurt	be06d65721	Nonblocking tenant isolation is supported by using split api. (#6167 )	2022-08-17 11:13:07 +03:00
Jelte Fennema	78a5013e24	Support changing CPU priorities for backends and shard moves (#6126 ) Intro This adds support to Citus to change the CPU priority values of backends. This is created with two main usecases in mind: 1. Users might want to run the logical replication part of the shard moves or shard splits at a higher speed than they would do by themselves. This might cause some small loss of DB performance for their regular queries, but this is often worth it. During high load it's very possible that the logical replication WAL sender is not able to keep up with the WAL that is generated. This is especially a big problem when the machine is close to running out of disk when doing a rebalance. 2. Users might have certain long running queries that they don't want to impact their regular workload too much. Be very careful!!! Using CPU priorities to control scheduling can be helpful in some cases to control which processes are getting more CPU time than others. However, due to an issue called "[priority inversion][1]" it's possible that using CPU priorities together with the many locks that are used within Postgres causes the exact opposite behavior of what you intended. This is why this PR only allows the PG superuser to change the CPU priority of its own processes. Currently it's not recommended to set `citus.cpu_priority` directly. Currently the only recommended interface for users is the setting called `citus.cpu_priority_for_logical_replication_senders`. This setting controls CPU priority for a very limited set of processes (the logical replication senders). So, the dangers of priority inversion are also limited when using it for this usecase. Background Before reading the rest it's important to understand some basic background regarding process CPU priorities, because they are a bit counter intuitive. A lower priority value means that the process will be scheduled more and whatever it's doing will thus complete faster. The default priority for processes is 0. 
Valid values are from -20 to 19 inclusive. On Linux a larger difference between values of two processes will result in a bigger difference in percentage of scheduling. Handling the usecases Usecase 1 can be achieved by setting `citus.cpu_priority_for_logical_replication_senders` to the priority value that you want it to have. It's necessary to set this both on the workers and the coordinator. Example: ``` citus.cpu_priority_for_logical_replication_senders = -10 ``` Usecase 2 can with this PR be achieved by running the following as superuser. Note that this is only possible as superuser currently due to the dangers mentioned in the "Be very careful!!!" section. And although this is possible it's NOT recommended: ```sql ALTER USER background_job_user SET citus.cpu_priority = 5; ``` OS configuration To actually make these settings work well it's important to run Postgres with a more permissive value for the 'nice' resource limit than Linux will do by default. By default Linux will not allow a process to set its priority lower than it currently is, even if it was lower when the process originally started. This capability is necessary to reset the CPU priority to its original value after a transaction finishes. Depending on how you run Postgres this needs to be done in one of two ways: If you use systemd to start Postgres all you have to do is add a line like this to the systemd service file: ```conf LimitNice=+0 # the + is important, otherwise it's interpreted incorrectly as 20 ``` If that's not the case you'll have to configure `/etc/security/limits.conf` like so, assuming that you are running Postgres as the `postgres` OS user: ``` postgres soft nice 0 postgres hard nice 0 ``` Finally you'd have to add the following line to `/etc/pam.d/common-session` ``` session required pam_limits.so ``` These settings would allow you to change the priority back after setting it to a higher value. 
However, to actually allow you to set priorities even lower than the default priority value you would need to change the values in the config to something lower than 0. So for example: ```conf LimitNice=-10 ``` or ``` postgres soft nice -10 postgres hard nice -10 ``` If you use WSL2 you'll likely have to do another thing. You have to open a new shell, because PAM is only used during login, and WSL2 doesn't actually log you in. You can force a login like this: ``` sudo su $USER --shell /bin/bash ``` Source: https://stackoverflow.com/a/68322992/2570866 [1]: https://en.wikipedia.org/wiki/Priority_inversion	2022-08-16 13:07:17 +03:00
Jelte Fennema	43c2a1e88b	Share more code between splits and moves (#6152 ) When introducing non-blocking shard split functionality it was based heavily on the non-blocking shard moves. However, the differences in usage were slightly too big to be able to reuse the existing functions easily. So, most logical replication code was simply copied to dedicated shard split functions and modified for that purpose. This PR tries to create a more generic logical replication infrastructure that can be used by both shard splits and shard moves. There's probably more code sharing possible in the future, but I believe this is at least a good start and addresses the lowest hanging fruit. This also adds a CreateSimpleHash function that makes creating the most common type of hashmap common.	2022-08-15 20:21:51 +03:00
yxu2162	e1322ec905	Change for PG15 test because hash_mem_multiplier was changed to 2 as a default instead of 1 which was what PG13/14 have	2022-08-11 09:49:56 -07:00
Önder Kalacı	73fcbdf12c	Merge branch 'main' into add_missing_schema	2022-08-11 11:28:41 +02:00
aykut-bozkurt	898801504e	sysid should be parsed as int. (#6150 )	2022-08-11 10:44:46 +03:00
Hanefi Onaldi	294400b2eb	Fix typos in tests that fail on PG15	2022-08-10 22:45:28 +03:00
Onder Kalaci	00ce7235cb	Set missing search_path in the tests On PG 15, public schema requires explicit GRANT, so let's avoid the conflict. Helpful for #6085	2022-08-10 18:04:10 +02:00
Onder Kalaci	44947d5634	This is not supported in PG15 so fix earlier	2022-08-10 17:44:03 +02:00
naisila	ea209bd11d	Rename remaining regclass to relation in columnar.options	2022-08-10 15:38:53 +02:00
aykut-bozkurt	cc694b6bcf	we consider stat object as invalid if it is not owned by current user (#6130 )	2022-08-09 20:59:30 +03:00
Hanefi Onaldi	6ef96ac560	Use client side \copy when accessing test files	2022-08-09 15:00:42 +03:00
Hanefi Onaldi	9f52fa7610	Remove dynamic translation of regression test scripts, step 2. This commit is inspired by a commit dc9c3b0ff21465fa89d71eecf5e6cc956d647eca from PostgreSQL 15 that shares the same header. I also removed some gitignore rules so that I can add some files to git worktree. We used to ignore the generated files, that are no longer generated after this commit. -------------------- Below is the commit message from PostgreSQL 15 commit dc9c3b0ff21465fa89d71eecf5e6cc956d647eca : "git mv" all the input/.source and output/.source files into the corresponding sql/ and expected/ directories. Then remove the pg_regress and Makefile infrastructure associated with dynamic translation. Discussion: https://postgr.es/m/1655733.1639871614@sss.pgh.pa.us	2022-08-09 14:15:52 +03:00
Jelte Fennema	8017693b2f	Allow specifying the shard_transfer_mode when replicating reference tables (#6070 ) When using `citus.replicate_reference_tables_on_activate = off`, reference tables need to be replicated later. This can be done using the `replicate_reference_tables()` UDF. However, this function only allowed blocking replication. This changes the function to default to logical replication instead, and allows choosing any of our existing shard transfer modes.	2022-08-09 13:21:31 +03:00
Marco Slot	3b57ff2867	Fix crash in citus_copy_shard_placement	2022-08-09 09:31:05 +02:00
naisila	796d90d293	Explain w/out costs in ch_bench to avoid PG15 output diff	2022-08-09 07:53:27 +03:00
Naisila Puka	bcbba99c96	Clean up large_table_shard_count guc leftovers (#6144 )	2022-08-09 06:31:57 +03:00
Naisila Puka	3806f6f6a9	Add ORDER BY in pg_locks to avoid output order diffs (#6145 )	2022-08-09 06:02:07 +03:00
Naisila Puka	ce944c3c0f	Remove bogus guc citus.compression (#6142 )	2022-08-09 05:21:32 +03:00
Jelte Fennema	dd548ee3c7	Use faster custom copy logic for non-blocking shard moves (#6119 ) DESCRIPTION: Use faster custom copy logic for non-blocking shard moves Non-blocking shard moves consist of two main phases: 1. Initial data copy 2. Catchup phase This changes the first of these phases significantly. Previously we used the copy logic provided by postgres subscriptions. This meant we didn't have to implement it ourselves, but it came with the downside of little control. When implementing shard splits we needed more control to even make it work, so we implemented our own logic for copying data between nodes. This PR starts using that logic for non-blocking shard moves. Doing so has four main advantages: 1. It uses COPY in binary format when possible, which is cheaper to encode and decode. Furthermore it very often results in less data that needs to be sent over the network. 2. It allows us to create the primary key (or other replica identity) after doing the initial data copy. This should give some speed up over the total run, because creating an index in bulk is much faster than incrementally building it. 3. It doesn't require a replication slot per parallel copy. Increasing the maximum number of replication slots uses resources in postgres, even if they are not used. So reducing the number of replication slots that shard moves need is nice. 4. Logical replication table_sync workers are slow to start up, so if lots of shards need to be copied that can make it quite slow. This can happen easily when combining Postgres partitioning with Citus.	2022-08-08 17:09:43 +02:00
Marco Slot	6aee8f35a6	Fix tenant isolation failure tests	2022-08-08 13:33:23 +02:00
Marco Slot	044dd26e40	Reimplement tenant isolation on top of block shard split	2022-08-08 13:33:23 +02:00
Naisila Puka	3401b31c13	Deletes unnecessary test outputs (#6140 )	2022-08-08 11:19:14 +03:00
Naisila Puka	9eedf6dcf8	Reduce log level to avoid alternative output for PG15 (#6139 )	2022-08-07 16:07:58 +03:00
aykut-bozkurt	4992533e33	support grant statement propagation for aggregates (#6132 )	2022-08-05 14:47:33 +03:00
Ahmet Gedemenli	8b68b0b5bb	Fix pg upgrade script for foreign tables (#6100 ) Fixes unexpected error for foreign tables when upgrading pg	2022-08-05 13:35:17 +03:00
Sameer Awasekar	e236711eea	Introduce Non-Blocking Shard Split Workflow	2022-08-04 16:32:38 +02:00

1855 Commits (35b4ddc355ecbcae71afebb6568131af3d4a2481)