Mitigate segfault in connection statemachine (#4551)

As described in the comment, we have observed crashes in production
due to a segfault caused by the dereference of a NULL pointer in our
connection statemachine.

As a mitigation that prevents system crashes, we now raise an error with
a short explanation of the issue. Unfortunately, we cannot reliably
reproduce the case yet, so no tests are included.

DESCRIPTION: Prevent segfaults when SAVEPOINT handling cannot recover from connection failures
(cherry picked from commit d127516dc8)
pull/4578/head
Nils Dijk 2021-01-25 15:55:04 +01:00 committed by Sait Talha Nisanci
parent 49ce36fe8b
commit 2efeed412a
1 changed file with 19 additions and 0 deletions

@@ -3297,6 +3297,25 @@ TransactionStateMachine(WorkerSession *session)
case REMOTE_TRANS_SENT_COMMAND:
{
    TaskPlacementExecution *placementExecution = session->currentTask;
    if (placementExecution == NULL)
    {
        /*
         * We have seen accounts in production where the placementExecution
         * could inadvertently be not set. Investigation documented on
         * https://github.com/citusdata/citus-enterprise/issues/493
         * (due to sensitive data in the initial report it is not discussed
         * in our community repository)
         *
         * Currently we don't have a reliable way of reproducing this issue.
         * Erroring here seems to be a more desirable approach compared to a
         * SEGFAULT on the dereference of placementExecution, with a possible
         * crash recovery as a result.
         */
        ereport(ERROR, (errmsg(
                            "unable to recover from inconsistent state in "
                            "the connection state machine on coordinator")));
    }
    ShardCommandExecution *shardCommandExecution =
        placementExecution->shardCommandExecution;
    Task *task = shardCommandExecution->task;
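
The guard in the hunk above can be tried out in isolation with the minimal sketch below. It is not the Citus implementation: the structs are hypothetical stand-ins that model only the fields touched in the diff, and the helper names CurrentTaskOrError and ExampleUsage are invented for illustration. Because ereport(ERROR, ...) is only available inside PostgreSQL, the sketch assumes the error can be handed back to the caller through an out-parameter instead.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-ins for the real structs. Only the fields that the
 * diff dereferences are modelled here.
 */
typedef struct Task { int taskId; } Task;
typedef struct ShardCommandExecution { Task *task; } ShardCommandExecution;
typedef struct TaskPlacementExecution
{
    ShardCommandExecution *shardCommandExecution;
} TaskPlacementExecution;
typedef struct WorkerSession { TaskPlacementExecution *currentTask; } WorkerSession;

/*
 * CurrentTaskOrError (hypothetical helper) returns the task behind the
 * session's current placement. When currentTask is unexpectedly NULL it
 * returns NULL and stores a message in *errorMessage, rather than
 * dereferencing the NULL pointer. The real code raises the error with
 * ereport(ERROR, ...) at this point instead of returning.
 */
static Task *
CurrentTaskOrError(const WorkerSession *session, const char **errorMessage)
{
    TaskPlacementExecution *placementExecution = session->currentTask;
    if (placementExecution == NULL)
    {
        *errorMessage = "unable to recover from inconsistent state in "
                        "the connection state machine on coordinator";
        return NULL;
    }

    *errorMessage = NULL;
    return placementExecution->shardCommandExecution->task;
}

/* Exercises both the healthy path and the inconsistent-state path. */
static void
ExampleUsage(void)
{
    Task task = { 42 };
    ShardCommandExecution shardCommandExecution = { &task };
    TaskPlacementExecution placementExecution = { &shardCommandExecution };
    WorkerSession healthy = { &placementExecution };
    WorkerSession broken = { NULL };
    const char *errorMessage = NULL;

    /* Healthy session: the task is reachable and no error is reported. */
    assert(CurrentTaskOrError(&healthy, &errorMessage) == &task);
    assert(errorMessage == NULL);

    /* Inconsistent session: an error is reported instead of a crash. */
    assert(CurrentTaskOrError(&broken, &errorMessage) == NULL);
    assert(errorMessage != NULL);
}
```

The trade-off is the one stated in the commit message: a failed statement with an explanatory message is cheaper than a segfault that sends the whole server through crash recovery.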