PG-1349 Prevent LWLock deadlocks from happening

Instead of trying to fix every case where we could throw an error and
handling that properly we just make sure to disable the error capture
of the hook while our backend holds the lock.

We keep the check for IsSystemOOM() in the hook even though that might
not be relevant anymore because if we are in OOM it is not like there
would be any point to log the error anyway.

This is done via a global variable, similar to the
__pgsm_do_not_capture_error variable that we are replacing, which we
also use in one place to disable recursive calls to the log hook
where we do not hold the lock.

A potential future improvement would be to make this variable a counter,
or have two separate globals, so that we could guard against recursive
calls to the hook running us out of stack and not just prevent the
deadlocks.
This commit is contained in:
Andreas Karlsson
2025-02-20 11:07:55 +01:00
parent fd43b75153
commit 4ebb3d1f36
2 changed files with 63 additions and 65 deletions

View File

@@ -112,31 +112,6 @@
/* Update this if need a enum GUC with more options. */
#define MAX_ENUM_OPTIONS 6
/*
* API for disabling error capture ereport(ERROR,..) by PGSM's error capture hook
* pgsm_emit_log_hook()
*
* Use these macros as follows:
* PGSM_DISABLE_ERROR_CAPUTRE();
* {
* ... code that might throw ereport(ERROR) ...
* }PGSM_END_DISABLE_ERROR_CAPTURE();
*
* These macros can be used to error recursion if the error gets
* thrown from within the function called from pgsm_emit_log_hook()
*/
extern volatile bool __pgsm_do_not_capture_error;
#define PGSM_DISABLE_ERROR_CAPUTRE() \
do { \
__pgsm_do_not_capture_error = true
#define PGSM_END_DISABLE_ERROR_CAPTURE() \
__pgsm_do_not_capture_error = false; \
} while (0)
#define PGSM_ERROR_CAPTURE_ENABLED \
__pgsm_do_not_capture_error == false
/*
* pg_stat_monitor uses the hash structure to store all query statistics
* except the query text, which gets stored out of line in the raw DSA area.