Sometimes in CI our multi_utilities test fails like this:
```diff
VACUUM (INDEX_CLEANUP ON, PARALLEL 1) local_vacuum_table;
SELECT CASE WHEN s BETWEEN 20000000 AND 25000000 THEN 22500000 ELSE s END size
FROM pg_total_relation_size('local_vacuum_table') s ;
size
----------
- 22500000
+ 39518208
(1 row)
```
Source: https://app.circleci.com/pipelines/github/citusdata/citus/26641/workflows/5caea99c-9f58-4baa-839a-805aea714628/jobs/762870
Apparently VACUUM is not as reliable at cleaning up as we thought. This
widens the range of allowed values. Important to note is that the widened
range still does not overlap with the initial size of the table. So we
know for sure that some data was cleaned up.
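A minimal sketch of the widened check, with hypothetical bounds (the actual values in the test may differ); any size inside the accepted window is normalized to a single value, so the expected output stays stable as long as VACUUM reclaimed some space:
```sql
VACUUM (INDEX_CLEANUP ON, PARALLEL 1) local_vacuum_table;
-- Normalize any size inside the (hypothetical) widened window to one value.
-- The window's upper bound stays below the table's initial size, so a
-- stable result still proves that some data was cleaned up.
SELECT CASE WHEN s BETWEEN 20000000 AND 49999999 THEN 35000000 ELSE s END AS size
FROM pg_total_relation_size('local_vacuum_table') s;
```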
Sometimes in CI our adaptive_executor test would fail randomly with the
following error:
```diff
SELECT sum(result::bigint) FROM run_command_on_workers($$
SELECT count(*) FROM pg_stat_activity
WHERE pid <> pg_backend_pid() AND query LIKE '%8010090%'
$$);
sum
-----
- 4
+ 2
(1 row)
END;
```
Source: https://app.circleci.com/pipelines/github/citusdata/citus/26665/workflows/40665680-0044-4852-8fe4-5fd628f9fb47/jobs/764371
This means that the low slow start interval did not have any effect on
the number of connections being opened. I could see two possible causes:
1. CI was slow and was actually still starting the second connection, so
   it was not visible yet. I tried to solve this by doubling the time a
   query to the worker takes.
2. The shards were queried in the opposite order from what we expect. In
   that case the first query to the worker completes quickly, because the
   shard contains no rows and thus the sleep never runs. I tried to solve
   this by adding a row to each shard.
After trying to reproduce the random failure in CI, it turned out that
both of these fixes were needed to resolve it.
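A hedged sketch of both fixes, with a hypothetical table (the real test uses different names, sleep durations, and shard counts):
```sql
-- Fix 1: double the sleep so that, even on a slow CI machine, the second
-- connection has time to fully start and show up in pg_stat_activity
-- while the first query is still running. Note that pg_sleep() runs once
-- per returned row, so an empty shard skips it entirely.
SELECT *, pg_sleep(0.2) FROM test_table;

-- Fix 2: put at least one row in every shard, so the sleep runs no
-- matter which shard happens to be queried first.
INSERT INTO test_table SELECT i, i FROM generate_series(1, 8) i;
```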
On CI our citus_split_shard_columnar_partitioned test would sometimes
randomly fail like this:
```diff
8970008 | colocated_dist_table | -2147483648 | 2147483647 | localhost | 57637
8970009 | colocated_partitioned_table | -2147483648 | 2147483647 | localhost | 57637
8970010 | colocated_partitioned_table_2020_01_01 | -2147483648 | 2147483647 | localhost | 57637
- 8970011 | reference_table | | | localhost | 57637
8970011 | reference_table | | | localhost | 57638
+ 8970011 | reference_table | | | localhost | 57637
(13 rows)
```
Source: https://app.circleci.com/pipelines/github/citusdata/citus/26651/workflows/f695b4fb-ad81-46ff-b97e-0100e5d167ea/jobs/763517
This is a harmless diff caused by a missing column in the ORDER BY list:
the two placements of the reference table share a shardid, so their order
was nondeterministic. This fixes that by adding the nodeport as a
tiebreaker.
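A sketch of the fix, modeled on the diff above using the Citus metadata tables (the actual test query may differ):
```sql
-- The reference table has a placement on both workers with the same
-- shardid, so without nodeport in the ORDER BY those two rows can come
-- back in either order.
SELECT shardid, logicalrelid, shardminvalue, shardmaxvalue, nodename, nodeport
FROM pg_dist_shard
JOIN pg_dist_shard_placement USING (shardid)
ORDER BY shardid, nodeport;  -- nodeport added as the tiebreaker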
Added create_distributed_table_concurrently, a nonblocking variant of
create_distributed_table.
It is based on the split API, which takes advantage of logical
replication to support nonblocking split operations.
Co-authored-by: Marco Slot <marco.slot@gmail.com>
Co-authored-by: aykutbozkurt <aykut.bozkurt1995@gmail.com>
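A minimal usage sketch of create_distributed_table_concurrently (the table and column are illustrative):
```sql
CREATE TABLE events (tenant_id bigint, payload jsonb);
-- Distributes the table without blocking concurrent reads and writes.
-- Like other *_concurrently operations, it cannot run inside a
-- transaction block.
SELECT create_distributed_table_concurrently('events', 'tenant_id');
```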
Sometimes in CI our isolation_citus_dist_activity test fails randomly
like this:
```diff
step s2-view-dist:
SELECT query, citus_nodename_for_nodeid(citus_nodeid_for_gpid(global_pid)), citus_nodeport_for_nodeid(citus_nodeid_for_gpid(global_pid)), state, wait_event_type, wait_event, usename, datname FROM citus_dist_stat_activity WHERE query NOT ILIKE ALL(VALUES('%pg_prepared_xacts%'), ('%COMMIT%'), ('%BEGIN%'), ('%pg_catalog.pg_isolation_test_session_is_blocked%'), ('%citus_add_node%')) AND backend_type = 'client backend' ORDER BY query DESC;
query |citus_nodename_for_nodeid|citus_nodeport_for_nodeid|state |wait_event_type|wait_event|usename |datname
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------+-------------------------+-------------------+---------------+----------+--------+----------
INSERT INTO test_table VALUES (100, 100);
|localhost | 57636|idle in transaction|Client |ClientRead|postgres|regression
-(1 row)
+
+ SELECT coalesce(to_jsonb(array_agg(csa_from_one_node.*)), '[{}]'::JSONB)
+ FROM (
+ SELECT global_pid, worker_query AS is_worker_query, pg_stat_activity.* FROM
+ pg_stat_activity LEFT JOIN get_all_active_transactions() ON process_id = pid
+ ) AS csa_from_one_node;
+ |localhost | 57636|active | | |postgres|regression
+(2 rows)
step s3-view-worker:
```
Source: https://app.circleci.com/pipelines/github/citusdata/citus/26605/workflows/56d284d2-5bb3-4e64-a0ea-7b9b1626e7cd/jobs/760633
The reason for this is that citus_dist_stat_activity sometimes shows the
query that it itself uses to get the data from pg_stat_activity. This is
actually a bug, because it's a worker query and thus shouldn't show up
there. To try and solve this bug, we remove two small opportunities for a
race condition: the backend data could be marked as active while the
distributedCommandOriginator was not yet set correctly, or had already
been reset. This window existed both during connection start and
shutdown.
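A hedged way to observe the bug directly, assuming the is_worker_query column that citus_stat_activity exposes in Citus 11 (the view and column names may differ across versions):
```sql
-- Worker-internal queries should never be reported as distributed
-- activity; any rows here indicate the race condition described above.
SELECT global_pid, query
FROM citus_stat_activity
WHERE is_worker_query;
```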
Sometimes in CI our drop_partitioned_table test would fail with the
following error:
```diff
NOTICE: issuing SELECT worker_drop_distributed_table('drop_partitioned_table.child1')
NOTICE: issuing SELECT worker_drop_distributed_table('drop_partitioned_table.child1')
NOTICE: issuing DROP TABLE IF EXISTS drop_partitioned_table.child1_727001 CASCADE
-NOTICE: issuing SELECT pg_catalog.citus_internal_delete_colocation_metadata(100047)
-NOTICE: issuing SELECT pg_catalog.citus_internal_delete_colocation_metadata(100047)
+NOTICE: issuing SELECT pg_catalog.citus_internal_delete_colocation_metadata(100046)
+NOTICE: issuing SELECT pg_catalog.citus_internal_delete_colocation_metadata(100046)
ROLLBACK;
NOTICE: issuing ROLLBACK
NOTICE: issuing ROLLBACK
```
Source: https://app.circleci.com/pipelines/github/citusdata/citus/26631/workflows/31536032-e1ba-493b-b12a-f40757f3a7d6/jobs/762170
For some reason the colocationid of the distributed partitioned table
would be one less than we expected. I'm not sure why this happens, but it
seems fairly harmless.
To work around this flakiness, I now reset the colocation id sequence
right before creating the table in question. This is good practice in
general, because it allows us to run the test successfully using
`check-minimal` and also to rerun it multiple times.
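A sketch of that reset, using the pg_dist_colocationid_seq sequence and an illustrative restart value taken from the diff above:
```sql
-- Pin the next colocation id, so the expected output does not depend on
-- how many colocation groups earlier tests happened to create.
ALTER SEQUENCE pg_catalog.pg_dist_colocationid_seq RESTART 100047;
CREATE TABLE drop_partitioned_table.parent (x int, t timestamptz)
    PARTITION BY RANGE (t);
SELECT create_distributed_table('drop_partitioned_table.parent', 'x');
```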
Our Python-based tests didn't always copy the normalized files after the
regress run. After running our PG upgrade tests locally, the following
command would copy non-normalized files into the expected directory:
```
cp src/test/regress/{results,expected}/upgrade_list_citus_objects.out
```
This PR fixes that by always running `copy_modified`, even if the tests
fail. The same was already being done for our Perl-based tests at the
end of the `pg_regress_multi.pl` file.
We currently build a query like `pg_total_relation_size('t1') + pg_total_relation_size('t2') + ..` over shard lists, especially when rebalancing shards. In some cases this query becomes huge. With this PR, we use a single SUM over all table sizes instead of chaining thousands of pluses.
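A sketch of the before and after query shapes, with hypothetical shard names:
```sql
-- Before: one addition per shard, so the query text grows linearly with
-- the number of shards and can become enormous.
SELECT pg_total_relation_size('t1_102008') + pg_total_relation_size('t1_102009');

-- After: a single SUM over the shard list keeps the query small.
SELECT SUM(pg_total_relation_size(shard_name::regclass))
FROM (VALUES ('t1_102008'), ('t1_102009')) AS shards(shard_name);
```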
Sometimes in CI failure_online_move_shard_placement fails with the
following error:
```diff
SELECT citus.mitmproxy('conn.onQuery(query="^ALTER SUBSCRIPTION .* ENABLE").cancel(' || :pid || ')');
mitmproxy
-----------
(1 row)
SELECT master_move_shard_placement(101, 'localhost', :worker_1_port, 'localhost', :worker_2_proxy_port);
-ERROR: canceling statement due to user request
+ERROR: tuple concurrently updated
+CONTEXT: while executing command on localhost:9060
-- failure on polling subscription state
```
Source: https://app.circleci.com/pipelines/github/citusdata/citus/26441/workflows/dd6e3475-6121-47b3-aea3-4ac92be114f4/jobs/751476/steps
This error is not completely harmless, because based on the logs it means
that our cleanup logic failed, which in turn means that replication
slots are left around:
```
2022-08-24 16:01:29.247 UTC [1219] ERROR: XX000: tuple concurrently updated
2022-08-24 16:01:29.247 UTC [1219] LOCATION: simple_heap_update, heapam.c:4179
2022-08-24 16:01:29.247 UTC [1219] STATEMENT: ALTER SUBSCRIPTION citus_shard_move_subscription_10 DISABLE
```
However, we have other mechanisms that clean up any leftovers when a
cleanup fails, so it's not that big of a problem.
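Such leftovers can be spotted on the worker with the standard pg_replication_slots view (the slot name pattern is inferred from the subscription name in the log above):
```sql
-- A leftover slot from a failed shard move keeps WAL around until it is
-- dropped, so this should normally return zero rows.
SELECT slot_name, active, restart_lsn
FROM pg_replication_slots
WHERE slot_name LIKE 'citus_shard_move%';
```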
The reason we run into this error is arguably a Postgres bug, so I
created a patch for Postgres that fixes it.
While we wait for this (or a similar) patch to be merged, this PR
disables the flaky test. There's still a test that covers a connection
"kill" instead of a "cancel", so I don't think we lose very important
coverage by disabling this one. When trying to reproduce the issue I only
hit it in the cancel case, so I don't see a need to disable the kill case
for now.