* Adjust configure script to allow PG15
* Adds copy of ruleutils_14.c as ruleutils_15.c
* Uses get_namespace_name_or_temp in ruleutils_15.c
Relevant PG commit:
48c5c9068211e0a04fd9553c8714b2821ed3ad17
* Clean up code using "(expr) ? true : false" in ruleutils_15.c
Relevant PG commit:
fd0625c7a9c679c0c1e896014b8f49a489c3a245
* Change varno from Index (unsigned int) to int in ruleutils_15.c
Relevant PG commit:
e3ec3c00d85bd2844ffddee83df2bd67c4f8297f
* Adds find_recursive_union to ruleutils_15.c
Relevant PG commit:
3f50b82639637c9908afa2087de7588450aa866b
* Fix display of SQL-std func's args in INSERT/SELECT in ruleutils_15.c
Relevant PG commit:
a8d8445a7b2f80f6d0bfe97b19f90bd2cbef8759
* Fix ruleutils_15.c's dumping of whole-row Vars in more contexts
Relevant PG commit:
43c2175121c829c8591fc5117b725f1f22bfb670
* Fix assorted missing logic for GroupingFunc nodes in ruleutils_15.c
Relevant PG commit:
2591ee8ec44d8cbc8e1226550337a64c684746e4
* Adds grammar support for SQL/JSON clauses in ruleutils_15.c
Relevant PG commit:
f79b803dcc98d707450e158db3638dc67ff8380b
* Adds SQL/JSON constructors to ruleutils_15.c
Relevant PG commits:
f4fb45d15c59d7add2e1b81a9d477d0119a9691a
cc7401d5ca498a84d9b47fd2e01cebd8e830e558
* Adds support for MERGE in ruleutils_15.c
Relevant PG commit:
7103ebb7aae8ab8076b7e85f335ceb8fe799097c
* Add IS JSON predicate to ruleutils_15.c
Relevant PG commit:
33a377608fc29cdd1f6b63be561eab0aee5c81f0
* Add SQL/JSON query functions to ruleutils_15.c
Relevant PG commit:
1a36bc9dba8eae90963a586d37b6457b32b2fed4
* Adds three different SQL/JSON values to ruleutils_15.c
Relevant PG commits:
606948b058dc16bce494270eea577011a602810e
49082c2cc3d8167cca70cfe697afb064710828ca
* Adds JSON table functions in ruleutils_15.c
Relevant PG commit:
4e34747c88a03ede6e9d731727815e37273d4bc9
* Add PLAN function for JSON table in ruleutils_15.c
Relevant PG commit:
fadb48b00e02ccfd152baa80942de30205ab3c4f
* Remove extra blank lines before block-closing braces ruleutils_15.c
Relevant PG commit:
24d2b2680a8d0e01b30ce8a41c4eb3b47aca5031
* set_deparse_plan: Reuse variable to appease Coverity ruleutils_15.c
Relevant PG commit:
e70813fbc4aaca35ec012d5a426706bd54e4acab
* Mechanical code beautification ruleutils_15.c
Relevant PG commit:
23e7b38bfe396f919fdb66057174d29e17086418
* Rename value_type to item_type in ruleutils_15.c
Relevant PG commit:
3ab9a63cb638a1fd99475668e2da9c237495aeda
* Show 'AS "?column?"' explicitly when it's important in ruleutils_15.c
Relevant PG commit:
c7461fc25558832dd347a9c8150b0f1ed85e36e8
* Fix ruleutils_15.c issues with dropped cols in funcs-returning-composite
Relevant PG commit:
c1d1e8469c77ce6b8e5310955580b4a3eee7fe96
* Change comment regarding functions returning composite in ruleutils_15.c
Relevant PG commit:
c2fa113ddb1117b1f03e91960f65d5d7d8a90270
* Replace int nodes with bool nodes where needed
In PG15, Boolean nodes are added. Pre PG15, internal Boolean values
in Create Role commands were represented by Integer nodes. This
commit replaces int nodes logic with bool nodes logic where needed.
Mostly there are CREATE ROLE logic changes.
Relevant PG commit:
941460fcf731a32e6a90691508d5cfa3d1f8eeaf
* Handle new option colliculocale in CREATE COLLATION logic
In PG15, there is an added option to use ICU as global locale provider.
pg_collation has three locale-related fields: collcollate and collctype,
which are libc-related fields, and a new one colliculocale, which is the
ICU-related field. Only the libc-related fields or the ICU-related field
is set, never both.
Relevant PG commits:
f2553d43060edb210b36c63187d52a632448e1d2
54637508f87bd5f07fb9406bac6b08240283be3b
* Add PG15 tests to CI using test images that have 15beta2 (#6093)
* Change warning message in pg_signal_backend()
Relevant PG commit:
7fa945b857cc1b2964799411f1633468826861ff
* Revert "Add missing ifdef for PG 15"
This reverts commit c7b51025ab.
* Fixes tests for ALTER TRIGGER RENAME consistency for part. tables
Relevant PG commit:
80ba4bb383538a2ee846fece6a7b8da9518b6866
* Prevent creating child triggers on partitions when adding new node
Pre PG15, tgisinternal is true for a "child" trigger on a partition
cloned from the trigger on the parent.
In PG15, tgisinternal is false in that case. However, we don't want to
create this trigger on the partition since it will create a conflict
when we try to attach the partition to the parent table:
ERROR: trigger "..." for relation "{partition_name}" already exists
Relevant PG commit:
f4566345cf40b068368cb5617e61318da60676ec
* Fix tests for generated columns dependency changes
In PG15, For GENERATED columns, all dependencies of the generation
expression are recorded as NORMAL dependencies of the column itself.
This requires CASCADE to drop generated cols with the original col.
PRE PG15, dependencies were recorded as AUTO, with which
generated columns are silently dropped with the original column.
Relevant PG commit:
cb02fcb4c95bae08adaca1202c2081cfc81a28b5
* Explicitly cast catalog "char" column to text before concatenation
Relevant PG commit:
07eee5a0dc642d26f44d65c4e6263304208e8583
* Remove 'AS "?column?"' from test outputs
There were some instances in the following tst outputs
in planning debug outputs where AS "?column?" is added.
We add a normalization rule to remove it as it is not
important.
cte_inline.out
recursive_relation_planning_restriction_pushdown.out
Relevant PG commit:
c7461fc25558832dd347a9c8150b0f1ed85e36e8
* Use pg_backup_stop(PG15) instead of pg_stop_backup(PG<15)
Add an alternative test output because of the change in the
backup modes of Postgres. Specifically here, there is a renaming
issue: pg_stop_backup PRE PG15 vs pg_backup_stop PG15+
The alternative output can be deleted when we drop support for PG14
Relevant PG commit:
39969e2a1e4d7f5a37f3ef37d53bbfe171e7d77a
* Adds citus.mitmfifo GUC
Previously we setting this configuration parameter
in the fly for failure tests schedule.
However, PG15 doesn't allow that anymore: reserved prefixes
like "citus" cannot be used to set non-existing GUCs.
Relevant PG commit:
88103567cb8fa5be46dc9fac3e3b8774951a2be7
* Handles EXPLAIN output diffs in PG15 - Extra result lines
To handle extra "Result" lines in explain outputs, we add explain
method to multi_test_helpers.sql file
- plan_without_result_lines() is added for cases where we want the
whole explain output with only "Result" lines removed
* Handles EXPLAIN output diffs in PG15, Hash Agg/Join leverage
To handle differences in usage of GroupAggregate vs HashAggregate
or Merge Join vs Hash join in cases where this detail doesn't
seem to matter, we use coordinator_plan().
- coordinator_plan() is updated to remove "Result" lines
There are some cases where we have subplans so we add a new
function that prints all Task Count lines as well
- coordinator_plan_with_subplans()
Still not sure of the relevant PG commit
Could be db0d67db2401eb6238ccc04c6407a4fd4f985832
but disabling enable_group_by_reordering didn't help.
* Handles EXPLAIN output diffs in PG15: enable_group_by_reordering
Relevant PG commit
db0d67db2401eb6238ccc04c6407a4fd4f985832
* Normalizes Memory Usage, Buckets, Batches for PG15 explain diffs
We create a new function in multi_test_helpers, which is similar
to explain_merge function in PG15. This explain helper function
normalies Memory Usage, Buckets and Batches, and we use it in the
tests which give a different output for PG15.
* Bump test images to 15beta3 (#6172)
* Omit namespace in post-copy errmsg
Relevant PG commit:
069d33d0c5a021601245e44df77a0423ddd69359
* Handles EXPLAIN output diffs in PG15: extra arrows&result lines
To handle extra "->" arrows resulting from extra Result lines
in explain outputs, we add the following explain method to
multi_test_helpers.sql file
- plan_without_arrows() is added for cases where we want the
whole explain output without arrows and without Result lines
* Alters public schema's owner to pg_database_owner in PG15
In PG15, public schema is owned by pg_database_owner role.
In multi_extension, we drop and recreate the ppublic schema,
hence its owner become the default user in our tests, postgres.
Change that to pg_database_owner for PG15 consistency.
This results in alternative test output for public schema grants
in the following test:
grant_on_schema_propagation.sql
Relevant PG commit: b073c3ccd06e4cb845e121387a43faa8c68a7b62
* Add alternative test outputs for change in Insert Select display
citus_local_tables_queries.sql
coordinator_shouldhaveshards.sql
cte_inline.sql
insert_select_repartition.sql
intermediate_result_pruning.sql
local_shard_execution.sql
local_shard_execution_replicated.sql
multi_deparse_shard_query.sql
multi_insert_select.sql
multi_insert_select_conflict.sql
multi_mx_insert_select_repartition.sql
mx_coordinator_shouldhaveshards.sql
single_node.sql
Relevant PG commit:
a8d8445a7b2f80f6d0bfe97b19f90bd2cbef8759
* Fixes columnar tap tests for PG15
In PG15, Perl test modules have been moved to a new namespace.
Also, postgres node new() and get_new_node() methods have been
unified to one method: new()
We create separate tap tests for PG13/14 and PG15+
and update the Makefiles accordingly.
Relevant PG commits:
201a76183e2056c2217129e12d68c25ec9c559c8
b3b4d8e68ae83f432f43f035c7eb481ef93e1583
* Handles EXPLAIN output diffs in PG15: HashAgg Leverage,alt. output
Still not sure of the relevant PG commit
Could be db0d67db2401eb6238ccc04c6407a4fd4f985832
but disabling enable_group_by_reordering didn't help.
In CI multi_replicate_reference_table would sometimes fail like this:
```diff
-- detects correctly that referecence table doesn't have replica identity
SELECT replicate_reference_tables();
-ERROR: cannot use logical replication to transfer shards of the relation initially_not_replicated_reference_table since it doesn't have a REPLICA IDENTITY or PRIMARY KEY
+ERROR: cannot use logical replication to transfer shards of the relation ref_table since it doesn't have a REPLICA IDENTITY or PRIMARY KEY
DETAIL: UPDATE and DELETE commands on the shard will error out during logical replication unless there is a REPLICA IDENTITY or PRIMARY KEY.
HINT: If you wish to continue without a replica identity set the shard_transfer_mode to 'force_logical' or 'block_writes'.
```
Because `CitusTableTypeIdList` returns tables in heap order so it's
a bit random which one is first in the list. And the test contained
multiple tables that didn't have a primary key or replica identity. So
it made sense that the error could be for either one of these tables.
This PR makes the test output consistent by changing one of the tables
to have a primary key.
Example of failing test: https://app.circleci.com/pipelines/github/citusdata/citus/26387/workflows/fc3196e7-ddf2-4000-a70b-5ac71c836321/jobs/748940
Using binary encoding can save a lot of CPU cycles, both on the sender
and on the receiver. Since the walsender and walreceiver processes are
single threaded, this can matter a lot for the throughput if they are
bottlenecked on CPU.
This feature is only available in PG14, not PG13. It should be safe to
always enable because it's only used for types that support binary
encoding according to the PG docs:
> Even when this option is enabled, only data types that have binary
> send and receive functions will be transferred in binary.
But in case it causes problems, it can still be disabled by setting
`citus.enable_binary_protocol` to `false`.
In CI our failure_connection_establishment sometimes failed randomly
with the following error:
```diff
-- verify a connection attempt was made to the intercepted node, this would have cause the
-- connection to have been delayed and thus caused a timeout
SELECT * FROM citus.dump_network_traffic() WHERE conn=0;
conn | source | message
------+--------+---------
- 0 | coordinator | [initial message]
-(1 row)
+(0 rows)
SELECT citus.mitmproxy('conn.allow()');
```
Source: https://app.circleci.com/pipelines/github/citusdata/citus/26318/workflows/d3354024-9a67-4b01-9416-5cf79aec6bd8/jobs/745558
The way I fixed this was by removing the dump_network_traffic call. This
might sound simple, but doing this while continuing to let the test
serve its intended purpose required quite some more changes.
This dump_network_traffic call was there because we didn't want to show
warnings in the queries above, because the exact warnings were not
reliable. The main reason this error was not reliable was because we
were using round-robin task assignment. We did the same query twice, so
that it would hit the node with the intercepted connection in one of
those connections. Instead of doing that I'm now using the
"first-replica" policy and do the queries only once. This works, because
the first placements by placementid for each of the used tables are on
the second node, so first-replica will cause the first connection to go
there.
This solved most of the flakyness, but when confirming that the
flakyness was fixed I found some additional errors:
```diff
-- show that INSERT failed
SELECT citus.mitmproxy('conn.allow()');
mitmproxy
-----------
(1 row)
SELECT count(*) FROM single_replicatated WHERE key = 100;
- count
----------------------------------------------------------------------
- 0
-(1 row)
-
+ERROR: could not establish any connections to the node localhost:9060 after 400 ms
RESET client_min_messages;
```
Source: https://app.circleci.com/pipelines/github/citusdata/citus/26321/workflows/fd5f4622-400c-465e-8d82-83f5f55a87ec/jobs/745666
I addressed this with a combination of two things:
1. Only change citus.node_connection_timeout for the queries that we
want to test timeout behaviour for. When those queries are done I
reset the value to the default again.
2. Change our mitm framework to only delay the initial connection packet
instead of all packets. I think sometimes a follow on packet of a previous
connection attempt was causing the next connection attempt to be delayed
even if `conn.allow()` was already called. For our tests we only care about
connection timeouts, so there's no reason to delay any other packets than
the initial connection packet.
Then there was some last flakyness in the exact error that was given:
```diff
-- tests for connectivity checks
SELECT name FROM r1 WHERE id = 2;
WARNING: could not establish any connections to the node localhost:9060 after 900 ms
+WARNING: connection to the remote node localhost:9060 failed with the following error:
name
------
bar
(1 row)
```
Source: https://app.circleci.com/pipelines/github/citusdata/citus/26338/workflows/9610941c-4d01-4f62-84dc-b91abc56c252/jobs/746467
I don't have a good explaination for this slight change in error message, but
given that it is missing the actual error message I expected this to be related
to some small difference in timing: e.g. the server responding to the connection
attempt right after the coordinator determined that the connection timed out.
To solve this last flakyness I increased the connection timeouts and made the
difference between the timeout and the delay a bit bigger. With these tweaks
I wasn't able to reproduce this error on CI anymore.
Finally, I made most of the same changes to failure_failover_to_local_execution,
since it was using the `conn.delay()` mitm method too. The only change that
I left out was the timing increase, since it might not be strictly necessary and
increases time it takes to run the test. If this test ever becomes flaky the first
thing we should try is increase its timeout.
The failure_single_select test would sometimes fail with an error that's
similar to this:
```diff
-- cancel after first SELECT; txn should fail and nothing should be marked as invalid
SELECT citus.mitmproxy('conn.onQuery(query="^SELECT").cancel(' || pg_backend_pid() || ')');
- mitmproxy
----------------------------------------------------------------------
-
-(1 row)
-
+ERROR: canceling statement due to user request
+CONTEXT: COPY mitmproxy_result, line 1: ""
+SQL statement "COPY mitmproxy_result FROM '/home/circleci/project/src/test/regress/tmp_check/mitmproxy.fifo'"
+PL/pgSQL function citus.mitmproxy(text) line 11 at EXECUTE
BEGIN;
```
This error looked very to the one from #6217 and indeed the cause turned
out to be similar. Because we were canceling all SELECT queries, we
would actually sometimes cancel our mitmproxy SELECT queries itself.
This puts some additional restrictions on the queries that we cancel,
most importantly it should contain the name of the table that we're
selecting from.
I was able to reproduce the original issue locally pretty reliably. With
the changes in this PR it didn't happen again.
In passing this also changes one other failure test that was cancelling
all selects and puts similar additional restrictions on those
cancellations.
Example of failed test in CI: https://app.circleci.com/pipelines/github/citusdata/citus/26305/workflows/4d942b91-f83c-453c-8d9a-ae22d608e756/jobs/745071
The failure_create_distributed_table_non_empty test would sometimes fail
like this:
```diff
-- in the first test, cancel the first connection we sent from the coordinator
SELECT citus.mitmproxy('conn.cancel(' || pg_backend_pid() || ')');
- mitmproxy
----------------------------------------------------------------------
-
-(1 row)
-
+ERROR: canceling statement due to user request
+CONTEXT: COPY mitmproxy_result, line 1: ""
+SQL statement "COPY mitmproxy_result FROM '/home/circleci/project/src/test/regress/tmp_check/mitmproxy.fifo'"
+PL/pgSQL function citus.mitmproxy(text) line 11 at EXECUTE
SELECT create_distributed_table('test_table', 'id');
```
Because the cancel command had no filter it would actually sometimes
cancel the mitmproxy cancel command itself. This PR addresses that by
filtering on CREATE TABLE, which is one of the command that
create_distributed_table will send to the workers.
Example of failing test: https://app.circleci.com/pipelines/github/citusdata/citus/26252/workflows/1b7e5464-cca4-4ec1-99b3-48ddf25c29fa/jobs/742829
Sometimes in CI the columnar_memory test was using slightly more memory
than expected.
```diff
SELECT CASE WHEN 1.0 * TopMemoryContext / :top_post BETWEEN 0.98 AND 1.02 THEN 1 ELSE 1.0 * TopMemoryContext / :top_post END AS top_growth
FROM columnar_test_helpers.columnar_store_memory_stats();
--[ RECORD 1 ]-
-top_growth | 1
+-[ RECORD 1 ]------------------
+top_growth | 1.0206132116232119
-- before this change, max mem usage while executing inserts was 28MB and
```
This PR changes the expectation to be slightly higher, such that this
random increase in memory usage doesn't cause a flaky test.
Failing test: https://app.circleci.com/pipelines/github/citusdata/citus/26256/workflows/c0870f66-3346-4f8d-a1d3-36dfd7c98289/jobs/743028
In the logical_replication test we test that the cleanup logic at the
start of a shard move works as expected. To do so we create a
subscription and publication slot manually. This changes the test to
make that subscription actually connect to the database that the
publication is in.
Useful for #5987#6085
Sometimes the columnar_memory test fails in CI with the following error:
```diff
SELECT 1.0 * TopMemoryContext / :top_post BETWEEN 0.98 AND 1.02 AS top_growth_ok
FROM columnar_test_helpers.columnar_store_memory_stats();
-[ RECORD 1 ]-+--
-top_growth_ok | t
+top_growth_ok | f
-- before this change, max mem usage while executing inserts was 28MB and
```
This is almost certainly a harmless failure that simply requires bumping
the margin a little bit. However, it's impossible to say with the
current output. I was unable to reproduce this on-demand on my local
machine or even in CI. So this changes the test to include the actual
value difference in the size of TopMemoryContext when it's outside the
expected range. Then next time it fails we at least have some
information about why.
Example of failing test: https://app.circleci.com/pipelines/github/citusdata/citus/25966/workflows/d472a57b-419a-4f33-b8bc-2e174a98d4d6/jobs/730576
Sometimes this multi_utilities would fail with the following error:
```diff
SET citus.log_remote_commands TO ON;
-- should propagate to all workers because no table is specified
ANALYZE;
NOTICE: issuing BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;SELECT assign_distributed_transaction_id(0, 3461, '2022-08-19 01:56:06.35816-07');
DETAIL: on server postgres@localhost:57637 connectionId: 1
NOTICE: issuing BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;SELECT assign_distributed_transaction_id(0, 3461, '2022-08-19 01:56:06.35816-07');
DETAIL: on server postgres@localhost:57638 connectionId: 2
NOTICE: issuing SET citus.enable_ddl_propagation TO 'off'
DETAIL: on server postgres@localhost:57637 connectionId: 1
-NOTICE: issuing SET citus.enable_ddl_propagation TO 'off'
-DETAIL: on server postgres@localhost:xxxxx connectionId: xxxxxxx
NOTICE: issuing ANALYZE
DETAIL: on server postgres@localhost:57637 connectionId: 1
+NOTICE: issuing SET citus.enable_ddl_propagation TO 'off'
+DETAIL: on server postgres@localhost:57638 connectionId: 2
NOTICE: issuing ANALYZE
DETAIL: on server postgres@localhost:57638 connectionId: 2
```
This is simply a harmless change in output due to some timing
differences. This PR makes the test output consistent by only logging
the remote ANALYZE commands, not the SET commands.
This fixes our most commonly randomly failing failure test. The failing
diff is as follows:
```diff
SELECT citus.mitmproxy('conn.onQuery(query="fetch_intermediate_results").kill()');
mitmproxy
-----------
(1 row)
INSERT INTO target_table SELECT * FROM source_table;
-ERROR: connection to the remote node localhost:xxxxx failed with the following error: connection not open
+ERROR: could not open file "base/pgsql_job_cache/10_0_40/repartitioned_results_20770193413_from_4213590_to_1.data": No such file or directory
+CONTEXT: while executing command on localhost:9060
+while executing command on localhost:57637
SELECT * FROM target_table ORDER BY a;
```
As far as I can tell this is the cause of a race condition: After killing
fetch_intermediate_results on worker 9060, the previously created data
file gets cleaned up. The fetch_intermediate_results call that's sent
to worker 57637 will be cancelled and rolled back soon because of the
failure on the other connection. But if that fetch_intermediate_results
call is able to connect to 9060 before it is cancelled, it won't find
the file it's looking for there anymore. So while it's not the error we
expect, it does indicate that we succeeded.
To avoid this issue instead of killing the fetch_intermediate_results
call directly, we kill the COPY command that it uses to do the fetch.
This results in stable output as can be seen here, where 227 runs of
failure_insert_select_repartition succeeded:
https://app.circleci.com/pipelines/github/citusdata/citus/26168/workflows/9c64a3b6-f46c-4725-9fb4-8f6a2d00a023/jobs/739389
To be clear this changes the test to affects the opposite
fetch_intermediate_results call. This kills the fetch_intermediate_results
call of worker 57637, instead of killing the fetch_intermediate_results call
on worker 9060.
Example of failing test: https://app.circleci.com/pipelines/github/citusdata/citus/26147/workflows/780e95ea-264a-4c9f-ad2e-cf11449a795e/jobs/738467
This removes a flaky test that I introduced in #3868 after I fixed the
issue described in #3622. This test is sometimes fails randomly in CI.
The way it fails indicates that there might be some bug: A connection
breaks after rolling back to a savepoint.
I tried reproducing this issue locally, but I wasn't able to. I don't
understand what causes the failure.
Things that I tried were:
1. Running the test with:
```sql
SET citus.force_max_query_parallelization = true;
```
2. Running the test with:
```sql
SET citus.max_adaptive_executor_pool_size = 1;
```
3. Running the test in parallel with the same tests that it is run in
parallel with in multi_schedule.
None of these allowed me to reproduce the issue locally.
So I think it's time to give on fixing this test and simply remove the
test. The regression that this test protects against seems very unlikely
to reappear, since in #3868 I also added a big comment about the need
for the newly added `UnclaimConnection` call. So, I think the need for
the test is quite small, and removing it will make our CI less flaky.
In case the cause of the bug ever gets found, I tracked the bug in #6189
Example of a failing CI run:
https://app.circleci.com/pipelines/github/citusdata/citus/26098/workflows/f84741d9-13b1-4ae7-9155-c21ed3466951/jobs/736424
For reference the unexpected diff is this (so both warnings and an error):
```diff
INSERT INTO t SELECT i FROM generate_series(1, 100) i;
+WARNING: connection to the remote node localhost:57638 failed with the following error:
+WARNING:
+CONTEXT: while executing command on localhost:57638
+ERROR: connection to the remote node localhost:57638 failed with the following error:
ROLLBACK;
```
This test is also mentioned as the most failing regression test in #5975
There are 3 different ways that a sequence can be interacting
with tables. (1) and (2) are already supported. This commit adds
support for (3).
(1) column DEFAULT nextval('seq'):
The dependency is roughly like below,
and ExpandCitusSupportedTypes() is responsible
for finding the depending sequences.
schema <--- table <--- column <---- default value
^ |
|------------------ sequence <--------|
(2) serial columns: Bigserial/small serial etc:
The dependency is roughly like below,
and ExpandCitusSupportedTypes() is responsible
for finding the depending sequences.
schema <--- table <--- column <---- default value
^ |
| |
sequence <--------|
(3) Sequence OWNED BY table.column: Added support for
this type of resolution in this commit.
The dependency is almost like the following, and
ExpandCitusSupportedTypes() is NOT responsible for finding
the dependency.
schema <--- table <--- column
^
|
sequence
**Intro**
This adds support to Citus to change the CPU priority values of
backends. This is created with two main usecases in mind:
1. Users might want to run the logical replication part of the shard moves
or shard splits at a higher speed than they would do by themselves.
This might cause some small loss of DB performance for their regular
queries, but this is often worth it. During high load it's very possible
that the logical replication WAL sender is not able to keep up with the
WAL that is generated. This is especially a big problem when the
machine is close to running out of disk when doing a rebalance.
2. Users might have certain long running queries that they don't impact
their regular workload too much.
**Be very careful!!!**
Using CPU priorities to control scheduling can be helpful in some cases
to control which processes are getting more CPU time than others.
However, due to an issue called "[priority inversion][1]" it's possible that
using CPU priorities together with the many locks that are used within
Postgres cause the exact opposite behavior of what you intended. This
is why this PR only allows the PG superuser to change the CPU priority
of its own processes. Currently it's not recommended to set `citus.cpu_priority`
directly. Currently the only recommended interface for users is the setting
called `citus.cpu_priority_for_logical_replication_senders`. This setting
controls CPU priority for a very limited set of processes (the logical
replication senders). So, the dangers of priority inversion are also limited
with when using it for this usecase.
**Background**
Before reading the rest it's important to understand some basic
background regarding process CPU priorities, because they are a bit
counter intuitive. A lower priority value, means that the process will
be scheduled more and whatever it's doing will thus complete faster. The
default priority for processes is 0. Valid values are from -20 to 19
inclusive. On Linux a larger difference between values of two processes
will result in a bigger difference in percentage of scheduling.
**Handling the usecases**
Usecase 1 can be achieved by setting `citus.cpu_priority_for_logical_replication_senders`
to the priority value that you want it to have. It's necessary to set
this both on the workers and the coordinator. Example:
```
citus.cpu_priority_for_logical_replication_senders = -10
```
Usecase 2 can with this PR be achieved by running the following as
superuser. Note that this is only possible as superuser currently
due to the dangers mentioned in the "Be very carefull!!!" section.
And although this is possible it's **NOT** recommended:
```sql
ALTER USER background_job_user SET citus.cpu_priority = 5;
```
**OS configuration**
To actually make these settings work well it's important to run Postgres
with more a more permissive value for the 'nice' resource limit than
Linux will do by default. By default Linux will not allow a process to
set its priority lower than it currently is, even if it was lower when
the process originally started. This capability is necessary to reset
the CPU priority to its original value after a transaction finishes.
Depending on how you run Postgres this needs to be done in one of two
ways:
If you use systemd to start Postgres all you have to do is add a line
like this to the systemd service file:
```conf
LimitNice=+0 # the + is important, otherwise its interpreted incorrectly as 20
```
If that's not the case you'll have to configure `/etc/security/limits.conf`
like so, assuming that you are running Postgres as the `postgres` OS user:
```
postgres soft nice 0
postgres hard nice 0
```
Finally you'd have add the following line to `/etc/pam.d/common-session`
```
session required pam_limits.so
```
These settings would allow to change the priority back after setting it
to a higher value.
However, to actually allow you to set priorities even lower than the
default priority value you would need to change the values in the
config to something lower than 0. So for example:
```conf
LimitNice=-10
```
or
```
postgres soft nice -10
postgres hard nice -10
```
If you use WSL2 you'll likely have to do another thing. You have to
open a new shell, because when PAM is only used during login, and
WSL2 doesn't actually log you in. You can force a login like this:
```
sudo su $USER --shell /bin/bash
```
Source: https://stackoverflow.com/a/68322992/2570866
[1]: https://en.wikipedia.org/wiki/Priority_inversion
When introducing non-blocking shard split functionality it was based
heavily on the non-blocking shard moves. However, differences between
usage was slightly to big to be able to reuse the existing functions
easily. So, most logical replication code was simply copied to dedicated
shard split functions and modified for that purpose.
This PR tries to create a more generic logical replication
infrastructure that can be used by both shard splits and shard moves.
There's probably more code sharing possible in the future, but I believe
this is at least a good start and addresses the lowest hanging fruit.
This also adds a CreateSimpleHash function that makes creating the
most common type of hashmap common.
This commit is inspired by a commit
dc9c3b0ff21465fa89d71eecf5e6cc956d647eca from PostgreSQL 15 that shares
the same header.
I also removed some gitignore rules so that I can add some files to git
worktree. We used to ignore the generated files, that are no longer
generated after this commit.
--------------------
Below is the commit message from PostgreSQL 15 commit
dc9c3b0ff21465fa89d71eecf5e6cc956d647eca :
"git mv" all the input/*.source and output/*.source files into
the corresponding sql/ and expected/ directories. Then remove
the pg_regress and Makefile infrastructure associated with
dynamic translation.
Discussion: https://postgr.es/m/1655733.1639871614@sss.pgh.pa.us
When using `citus.replicate_reference_tables_on_activate = off`,
reference tables need to be replicated later. This can be done using the
`replicate_reference_tables()` UDF. However, this function only allowed
blocking replication. This changes the function to default to logical
replication instead, and allows choosing any of our existing shard
transfer modes.
DESCRIPTION: Use faster custom copy logic for non-blocking shard moves
Non-blocking shard moves consist of two main phases:
1. Initial data copy
2. Catchup phase
This changes the first of these phases significantly. Previously we used the
copy logic provided by postgres subscriptions. This meant we didn't have
to implement it ourselves, but it came with the downside of little control.
When implementing shard splits we needed more control to even make it
work, so we implemented our own logic for copying data between nodes.
This PR starts using that logic for non-blocking shard moves. Doing so
has four main advantages:
1. It uses COPY in binary format when possible, which is cheaper to encode
and decode. Furthermore it very often results in less data that needs to
be sent over the network.
2. It allows us to create the primary key (or other replica identity) after doing
the initial data copy. This should give some speed up over the total run,
because creating an index is bulk is much faster than incrementally building it.
3. It doesn't require a replication slot per parallel copy. Increasing the maximum
number of replication slots uses resources in postgres, even if they are not used.
So reducing the number of replication slots that shard moves need is nice.
4. Logical replication table_sync workers are slow to start up, so if lots of shards
need to be copied that can make it quite slow. This can happen easily when
combining Postgres partitioning with Citus.