citus


Emel Şimşek e9035f6d32 Send keepalive messages in split decoder periodically to avoid wal receiver timeouts during large shard splits. (#7229)		2023-10-09 22:33:08 +03:00

DESCRIPTION: Send keepalive messages during the logical replication phase of large shard splits to avoid timeouts.

During the logical replication part of the shard split process, the split decoder filters out the WAL records produced by the initial copy. If there are many such WAL records, the split decoder spends a long time processing them before it sends any WAL records through pgoutput. As a result, the WAL receiver may time out and restart repeatedly, which causes the catch-up logic in our split driver code to fail.

Notes:
1. If `wal_receiver_timeout` is set to a very small value, e.g. 600ms, the receiver may time out before it receives the keepalives. My tests show that this code works best when `wal_receiver_timeout` is set to 1 minute, which is the default value.
2. Once a logical replication worker times out, a new one gets launched. The new logical replication worker resets the pg_stat_subscription columns to their initial values; e.g. latest_end_lsn is set to 0. Our driver logic in `WaitForGroupedLogicalRepTargetsToCatchUp` cannot handle the LSN value going backwards. This is the main reason it gets stuck in an infinite loop.
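The keepalive idea in the description can be illustrated with a small C sketch. It models only the timing decision: while records from the initial copy are being filtered out and nothing is forwarded to pgoutput, a keepalive is emitted whenever a fixed interval has passed since the last one. All names here (`KeepaliveThrottle`, `keepalive_throttle_init`, `keepalive_due`, `simulate_filtered_records`) and the millisecond clock are hypothetical; this is not the code from #7229, and a real output plugin would send the keepalive through PostgreSQL's logical decoding machinery instead of merely counting it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical throttle state (not a Citus or PostgreSQL type): remembers when
 * the last keepalive went out so the decoder sends at most one per interval.
 */
typedef struct KeepaliveThrottle
{
	int64_t interval_ms;  /* minimum gap between two keepalives */
	int64_t last_sent_ms; /* time of the last keepalive (or of decoding start) */
} KeepaliveThrottle;

void
keepalive_throttle_init(KeepaliveThrottle *throttle, int64_t interval_ms,
						int64_t start_ms)
{
	assert(interval_ms > 0);
	throttle->interval_ms = interval_ms;
	throttle->last_sent_ms = start_ms;
}

/*
 * Returns true if a keepalive is due at now_ms and, if so, records it as sent.
 * Meant to be called once per decoded record, including filtered-out records.
 */
bool
keepalive_due(KeepaliveThrottle *throttle, int64_t now_ms)
{
	if (now_ms - throttle->last_sent_ms >= throttle->interval_ms)
	{
		throttle->last_sent_ms = now_ms;
		return true;
	}
	return false;
}

/*
 * Simulates decoding n records spaced step_ms apart, all of which belong to
 * the initial copy and are filtered out (nothing reaches pgoutput). Returns
 * how many keepalives the throttle would have sent.
 */
int
simulate_filtered_records(int n, int64_t step_ms, int64_t interval_ms)
{
	KeepaliveThrottle throttle;
	int sent = 0;

	keepalive_throttle_init(&throttle, interval_ms, 0);
	for (int i = 0; i < n; i++)
	{
		int64_t now_ms = (int64_t) i * step_ms;

		/* The record itself is dropped; only the keepalive check runs. */
		if (keepalive_due(&throttle, now_ms))
			sent++;
	}
	return sent;
}
```

For example, with a 1000 ms interval and a filtered record every 100 ms, `simulate_filtered_records(100, 100, 1000)` yields 9 keepalives over the roughly 10-second run, so the receiver is never silent for much longer than one interval. This is also why the interval has to stay well below `wal_receiver_timeout`: if the timeout is shorter than the interval, as in note 1, the receiver can still time out between keepalives.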
backend	Send keepalive messages in split decoder periodically to avoid wal receiver timeouts during large shard splits. (#7229)	2023-10-09 22:33:08 +03:00
include	Fix leaking of memory and memory contexts in Foreign Constraint Graphs (#7236)	2023-10-09 13:05:51 +02:00
test	Take improvement_threshold into account in citus_add_rebalance_strategy() (#7247)	2023-10-09 13:13:08 +03:00