From 3da6e3e74330dd96e25a819840d741d74720a12e Mon Sep 17 00:00:00 2001 From: aykut-bozkurt <51649454+aykut-bozkurt@users.noreply.github.com> Date: Mon, 12 Dec 2022 16:44:36 +0300 Subject: [PATCH] bgworkers with backend connection should handle SIGTERM properly (#6552) Fixes task executor SIGTERM handling. Problem: When task executors are sent SIGTERM, their default handler `bgworker_die`, which is set at worker startup, logs FATAL error. But they do not release locks there before logging the error, which sometimes causes hanging of the monitor. e.g. Monitor waits for the lock forever at pg_stat flush after calling proc_exit. Solution: Because executors have connection to backend, they should handle SIGTERM similar to normal backends. Normal backends uses `die` handler, in which they set ProcDiePending flag and the next CHECK_FOR_INTERRUPTS call handles it gracefully by releasing any lock before termination. --- src/backend/distributed/utils/background_jobs.c | 2 ++ src/test/regress/bin/normalize.sed | 3 +++ .../regress/expected/background_task_queue_monitor.out | 10 +++++----- 3 files changed, 10 insertions(+), 5 deletions(-) diff --git a/src/backend/distributed/utils/background_jobs.c b/src/backend/distributed/utils/background_jobs.c index a502b9219..961acc356 100644 --- a/src/backend/distributed/utils/background_jobs.c +++ b/src/backend/distributed/utils/background_jobs.c @@ -1613,6 +1613,8 @@ CitusBackgroundJobExecutorErrorCallback(void *arg) void CitusBackgroundTaskExecutor(Datum main_arg) { + /* handles SIGTERM similar to backends */ + pqsignal(SIGTERM, die); BackgroundWorkerUnblockSignals(); /* Set up a dynamic shared memory segment. */ diff --git a/src/test/regress/bin/normalize.sed b/src/test/regress/bin/normalize.sed index 3d251156c..204da389e 100644 --- a/src/test/regress/bin/normalize.sed +++ b/src/test/regress/bin/normalize.sed @@ -304,3 +304,6 @@ s/LOG: duration: [0-9].[0-9]+ ms/LOG: duration: xxxx ms/g s/"Total Cost": [0-9].[0-9]+/"Total Cost": xxxx/g s/(NOTICE: issuing SET LOCAL application_name TO 'citus_rebalancer gpid=)[0-9]+/\1xxxxx/g + +# PG13 changes bgworker sigterm message, we can drop that line with PG13 drop +s/(FATAL: terminating).*Citus Background Task Queue Executor.*(due to administrator command)\+/\1 connection \2 \+/g diff --git a/src/test/regress/expected/background_task_queue_monitor.out b/src/test/regress/expected/background_task_queue_monitor.out index 2d528c50c..ebaa2148c 100644 --- a/src/test/regress/expected/background_task_queue_monitor.out +++ b/src/test/regress/expected/background_task_queue_monitor.out @@ -454,13 +454,13 @@ SELECT pg_sleep(2); -- wait enough to show that tasks are terminated SELECT task_id, status, retry_count, message FROM pg_dist_background_task WHERE task_id IN (:task_id1, :task_id2) ORDER BY task_id; -- show that all tasks are runnable by retry policy after termination signal - task_id | status | retry_count | message + task_id | status | retry_count | message --------------------------------------------------------------------- - 21 | runnable | 1 | FATAL: terminating background worker "Citus Background Task Queue Executor: regression/postgres for (13/21)" due to administrator command+ - | | | CONTEXT: Citus Background Task Queue Executor: regression/postgres for (13/21) + + 21 | runnable | 1 | FATAL: terminating connection due to administrator command + + | | | CONTEXT: Citus Background Task Queue Executor: regression/postgres for (13/21)+ | | | - 22 | runnable | 1 | FATAL: terminating background worker "Citus Background Task Queue Executor: regression/postgres for (14/22)" due to administrator command+ - | | | CONTEXT: Citus Background Task Queue Executor: regression/postgres for (14/22) + + 22 | runnable | 1 | FATAL: terminating connection due to administrator command + + | | | CONTEXT: Citus Background Task Queue Executor: regression/postgres for (14/22)+ | | | (2 rows)