With this PR we add isolation tests for
COPY to reference table vs. other operations
COPY to partitioned table vs. other operations
Multi row INSERTs vs other operations
INSERT/SELECT vs. other operations
UPSERT vs. other operations
DELETE vs. other operations
TRUNCATE vs. other operations
DROP vs. other operations
DDL vs. other operations
other operations consist of basic SQL operations (like SELECT,
INSERT, DELETE, UPSERT, COPY TRUNCATE, CREATE INDEX) as well
as some Citus functionalities (like master_modify_multiple_shards,
master_apply_delete_command, citus_total_relation_size etc.)
If after the distributed deadlock detection decides to cancel
a backend, the backend has been terminated/killed/cancelled
externally, we might be accessing to a NULL pointer. This commit
prevents that case by ignoring the current distributed deadlock.
This is necessary for multi-row INSERTs for the same reasons we use it
in e.g. UPSERTs: if the range table list has more than one entry, then
PostgreSQL's deparse logic requires that vars be prefixed by the name
of their corresponding range table entry. This of course doesn't affect
single-row INSERTs, but since multi-row INSERTs have a VALUE RTE, they
were affected.
The piece of ruleutils which builds range table names wasn't modified
to handle shard extension; instead UPSERT/INSERT INTO ... SELECT added
an alias to the RTE. When present, this alias is favored. Doing the
same in the multi-row INSERT case fixes RETURNING for such commands.
This change adds support for SAVEPOINT, ROLLBACK TO SAVEPOINT, and RELEASE SAVEPOINT.
When transaction connections are not established yet, savepoints are kept in a stack and sent to the worker when the connection is later established. After establishing connections, savepoint commands are sent as they arrive.
This change fixes#1493 .
We should prevent running the deadlock detection if
there is a major version change. Otherwise, the daemon
may access to obsolete metadata catalog tables.
This change fixes a use-after-free bug while renaming obsolete
`pg_worker_list.conf` file, which causes Citus to crash during upgrade
(or even extension creation) if `pg_worker_list.conf` exists.
We added a new field to the transaction id that is set to true only
for the transactions initialized on the coordinator. This is only
useful for MX in order to distinguish the transaction that started
the distributed transaction on the coordinator where we could
have the same transactions' worker queries on the same node.
We added a new GUC citus.log_distributed_deadlock_detection
which is off by default. When set to on, we log some debug messages
related to the distributed deadlock to the server logs.
With this commit, the maintenance deamon starts to check for
distributed deadlocks.
We also introduced a GUC variable (distributed_deadlock_detection_factor)
whose value is multiplied with Postgres' deadlock_timeout. Setting
it to -1 disables the distributed deadlock detection.
We send SIGINT to a backend that is cancelled due to a deadlock. That
approach ends up being a very confusing error message.
With this commit we intercept the error messages and show a more
meaningful error message to the user.
Now that we already have the necessary infrastructure for detecting
distributed deadlocks. Thus, we don't need enable_deadlock_prevention
which is purely intended for preventing some forms of distributed
deadlocks.
This commit adds all the necessary pieces to do the distributed
deadlock detection.
Each distributed transaction is already assigned with distributed
transaction ids introduced with
3369f3486f. The dependency among the
distributed transactions are gathered with
80ea233ec1.
With this commit, we implement a DFS (depth first seach) on the
dependency graph and search for cycles. Finding a cycle reveals
a distributed deadlock.
Once we find the deadlock, we examine the path that the cycle exists
and cancel the youngest distributed transaction.
Note that, we're not yet enabling the deadlock detection by default
with this commit.