mirror of https://github.com/citusdata/citus.git
Mitigate segfault in connection statemachine (#4551)
As described in the code comment, we have observed crashes in production
caused by a NULL pointer dereference (segfault) in our connection state
machine.
As a mitigation that prevents these crashes, we now raise an error with
a short explanation of the issue instead. Unfortunately, the case cannot
yet be reproduced reliably, so no tests could be added.
DESCRIPTION: Prevent segfaults when SAVEPOINT handling cannot recover from connection failures
(cherry picked from commit d127516dc8)
pull/4578/head
parent 49ce36fe8b
commit 2efeed412a
@@ -3297,6 +3297,25 @@ TransactionStateMachine(WorkerSession *session)
         case REMOTE_TRANS_SENT_COMMAND:
         {
             TaskPlacementExecution *placementExecution = session->currentTask;
+            if (placementExecution == NULL)
+            {
+                /*
+                 * We have seen accounts in production where the placementExecution
+                 * could inadvertently be not set. Investigation documented on
+                 * https://github.com/citusdata/citus-enterprise/issues/493
+                 * (due to sensitive data in the initial report it is not discussed
+                 * in our community repository)
+                 *
+                 * Currently we don't have a reliable way of reproducing this issue.
+                 * Erroring here seems to be a more desirable approach compared to a
+                 * SEGFAULT on the dereference of placementExecution, with a possible
+                 * crash recovery as a result.
+                 */
+                ereport(ERROR, (errmsg(
+                                    "unable to recover from inconsistent state in "
+                                    "the connection state machine on coordinator")));
+            }
+
             ShardCommandExecution *shardCommandExecution =
                 placementExecution->shardCommandExecution;
             Task *task = shardCommandExecution->task;
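To illustrate the guard pattern from this hunk in isolation, here is a minimal standalone sketch in C. The struct definitions are hypothetical, heavily simplified stand-ins that keep only the fields the hunk touches, and the function name `HandleSentCommand` is made up for illustration. Because `ereport(ERROR, ...)` only exists inside PostgreSQL (where it aborts the current transaction rather than returning), the sketch instead writes the message into a caller-supplied buffer and returns -1.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified stand-ins for the Citus executor structs. */
typedef struct Task
{
	int taskId;
} Task;

typedef struct ShardCommandExecution
{
	Task *task;
} ShardCommandExecution;

typedef struct TaskPlacementExecution
{
	ShardCommandExecution *shardCommandExecution;
} TaskPlacementExecution;

typedef struct WorkerSession
{
	TaskPlacementExecution *currentTask; /* may unexpectedly be NULL */
} WorkerSession;

/*
 * Hypothetical handler for the "command sent" state. Returns the task id on
 * success. If the session is in an inconsistent state (no current task), it
 * writes an error message to errbuf and returns -1; this stands in for
 * ereport(ERROR, ...), which in PostgreSQL does not return at all.
 */
int
HandleSentCommand(const WorkerSession *session, char *errbuf, size_t errlen)
{
	const TaskPlacementExecution *placementExecution = session->currentTask;

	if (placementExecution == NULL)
	{
		/* Guard: report the inconsistency instead of dereferencing NULL. */
		snprintf(errbuf, errlen, "%s",
				 "unable to recover from inconsistent state in "
				 "the connection state machine on coordinator");
		return -1;
	}

	/* Safe: placementExecution is known to be non-NULL here. */
	const ShardCommandExecution *shardCommandExecution =
		placementExecution->shardCommandExecution;
	const Task *task = shardCommandExecution->task;

	return task->taskId;
}
```

Checking the pointer at the top of the state handler keeps the failure local and explicit: the caller receives a descriptive error instead of a segfault, which in PostgreSQL would take down the backend and can trigger crash recovery affecting every session. Note that the sketch, like the commit, only guards `currentTask`; the nested pointers are still assumed to be valid.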