I recently cleaned up our test suite from redundant test outputs: #6111#6140#6214#6140#6434
I had to check many files manually, as they didn't have any
documentation on why the alternative test output existed in the first
place.
Adding a section in our test docs to remind developers to add
alternative test outputs with enough information/keywords.
(Hopefully) Fixes#5000.
If memory allocation done for `SubXactContext *state` in `PushSubXact()`
fails, then `PopSubXact()` might segfault, for example, when grabbing
the
topmost `SubXactContext` from `activeSubXactContexts` if this is the
first
ever subxact within the current xact, with the following stack trace:
```c
citus.so!list_nth_cell(const List * list, int n) (\opt\pgenv\pgsql-14.3\include\server\nodes\pg_list.h:260)
citus.so!PopSubXact(SubTransactionId subId) (\home\onurctirtir\citus\src\backend\distributed\transaction\transaction_management.c:761)
citus.so!CoordinatedSubTransactionCallback(SubXactEvent event, SubTransactionId subId, SubTransactionId parentSubid, void * arg) (\home\onurctirtir\citus\src\backend\distributed\transaction\transaction_management.c:673)
CallSubXactCallbacks(SubXactEvent event, SubTransactionId mySubid, SubTransactionId parentSubid) (\opt\pgenv\src\postgresql-14.3\src\backend\access\transam\xact.c:3644)
AbortSubTransaction() (\opt\pgenv\src\postgresql-14.3\src\backend\access\transam\xact.c:5058)
AbortCurrentTransaction() (\opt\pgenv\src\postgresql-14.3\src\backend\access\transam\xact.c:3366)
PostgresMain(int argc, char ** argv, const char * dbname, const char * username) (\opt\pgenv\src\postgresql-14.3\src\backend\tcop\postgres.c:4250)
BackendRun(Port * port) (\opt\pgenv\src\postgresql-14.3\src\backend\postmaster\postmaster.c:4530)
BackendStartup(Port * port) (\opt\pgenv\src\postgresql-14.3\src\backend\postmaster\postmaster.c:4252)
ServerLoop() (\opt\pgenv\src\postgresql-14.3\src\backend\postmaster\postmaster.c:1745)
PostmasterMain(int argc, char ** argv) (\opt\pgenv\src\postgresql-14.3\src\backend\postmaster\postmaster.c:1417)
main(int argc, char ** argv) (\opt\pgenv\src\postgresql-14.3\src\backend\main\main.c:209)
```
For this reason, to be more defensive against memory-allocation errors
that could happen at `PushSubXact()`, now we use our pre-allocated
memory
context for the objects created in `PushSubXact()`.
This commit also attempts reducing the memory allocations done under
CommitContext to reduce the chances of consuming all the memory
available
to CommitContext.
Note that it's problematic to encounter with such a memory-allocation
error for other objects created in `PushSubXact()` as well, so above is
an **example** scenario that might result in a segfault.
DESCRIPTION: Fixes a bug that might cause segfaults when handling deeply
nested subtransactions
DESCRIPTION: Makes sure to disallow triggers that depend on extensions
We were already doing so for `ALTER trigger DEPENDS ON EXTENSION`
commands. However, we also need to disallow creating Citus tables
having such triggers already, so this PR fixes that.
DESCRIPTION: Improve a query that terminates compeling backends from citus_update_node()
1. Use pg_blocking_pids() function instead of self join on pg_locks. It exists since 9.6 and more accurate than pg_locks.
2. Prefix all function calls with pg_catalog schema to prevent privilege escalation by creating functions with similar names in a public schema.
3. Change logs and update comments to reflect the fact that the pg_terminate_backend() function only sends SIGTERM but not wating for the actual backend termination.
DESCRIPTION: Allow citus_update_node() to work with nodes from different clusters
citus_update_node(), citus_nodename_for_nodeid(), and citus_nodeport_for_nodeid() functions only checked for nodes in their own clusters and hence last two returned NULLs and the first one showed an error is the nodeId was from a different cluster.
Fixes https://github.com/citusdata/citus/issues/6433
increasing logical clock. Clock guarantees to never go back in value after restarts,
and makes best attempt to keep the value close to unix epoch time in milliseconds.
Also, introduces a new GUC "citus.enable_cluster_clock", when true, every
distributed transaction is stamped with logical causal clock and persisted
in a catalog pg_dist_commit_transaction.
Recently a question was posed in the community how to handle security
related reports to Citus. Other Microsoft owned repositories include a
`SECURITY.md` file explaining how security related incidents can be
reported. Thanks @JelteF for finding these.
Looking around in internal systems I found a checklist for opensourcing
repositories where a SECURITY.md template was provided. For now we only
add the `SECURITY.md` file as it was prompted in the community how to
handle these.
DESCRIPTION: Drops GUC defer_drop_after_shard_split
DESCRIPTION: Drops GUC defer_drop_after_shard_move
Drop GUCs and related parts from the code.
Delete tests that specifically added for the GUCs.
Keep tests that can be used without the GUCs.
Update test output changes.
The motivation for this PR is to have an "always deferring" mechanism.
These two GUCs provide an option to not deferring dropping objects
during a shard move/split, and dropping them immediately. With this PR,
we will be always deferring dropping orphaned shards and other types of
objects.
We will have a separate PR to extend the deferred cleanup operation, so
that we would create records for deferred drop, for Subscriptions,
Publications, Replication Slots etc. This will make us be able to keep
track of created objects that needs to be dropped, during a shard
move/split. We will have objects created specifically for the current
operation; and those objects will be dropped at the end.
We have an issue (a draft roadmap) for enabling parallel shard moves.
For details please see: https://github.com/citusdata/citus/issues/6437
Sometimes in CI our failure_split_cleanup test would fail like this:
```diff
CALL pg_catalog.citus_cleanup_orphaned_resources();
-NOTICE: cleaned up 79 orphaned resources
+NOTICE: cleaned up 82 orphaned resources
SELECT operation_id, object_type, object_name, node_group_id, policy_type
```
Source:
https://app.circleci.com/pipelines/github/citusdata/citus/28107/workflows/4ec712c9-98b5-4e90-9806-e02a37d71679/jobs/846107
The reason was that previous tests in the schedule would also create
some orphaned resources. Sometimes some of those would already be
cleaned up by the maintenance daemon, resulting in a different number of
cleaned up resources than expected. This cleans up any previously
created resources at the start of the test without logging how many
exactly were cleaned up. As a bonus this now also allows running this
test using check-failure-base.
This didn't cause any bugs since today we're always calling
UpdateAutoConvertedForConnectedRelations with autoconverted=false, so we
don't need to backport this to anywhere.
Good PR descriptions for flaky tests are quite helpful when reviewing.
Although obviously no PR description is the same, there's a few common
pieces of information that are useful for all PRs that fix flaky tests.
We should not introduce breaking sql changes to upgrade files after they
are released. We did that for worker_fetch_foreign_file in v9.0.0 and
worker_repartition_cleanup in v9.2.0. Later when we try to drop those
udfs, they were missing for some clients unexpectedly due to breaking
change in an old upgrade script. For that case, the fix is to add DROP
IF EXISTS for those 2 udfs in 11.0-4--11.1-1.
This crash happens with recursively planned queries. For such queries,
subplans are explained via the ExplainOnePlan function of postgresql.
This function reconstructs the query description from the plan therefore
it expects the ActiveSnaphot for the query be available. This fix makes
sure that the snapshot is in the stack before calling ExplainOnePlan.
Fixes#2920.
DESCRIPTION: Don't leak search_path to workers on DDL
For DDL we have to set the `search_path` on workers to the same as on
the coordinator for some DDL to work. Previously this search_path would
leak outside of the transaction that was used for the DDL. This fixes
that by using `SET LOCAL` instead of `SET`. The only place where we
still use plain `SET` is for DDL commands that are not allowed within
transactions, such as `CREATE INDEX CONCURRENLTY`.
This fixes this flaky test:
```diff
CONTEXT: SQL statement "SELECT change_id FROM distributed_triggers.data_changes
WHERE shard_key_value = NEW.shard_key_value AND object_id = NEW.object_id
ORDER BY change_id DESC LIMIT 1"
-PL/pgSQL function record_change() line XX at SQL statement
+PL/pgSQL function distributed_triggers.record_change() line 17 at SQL statement
while executing command on localhost:57638
DELETE FROM data_ref_table where shard_key_value = 'hello';
```
Source:
https://app.circleci.com/pipelines/github/citusdata/citus/27849/workflows/75ae5f1a-100b-4b7a-b991-7de069f39ee1/jobs/831429
I had tried to fix this flaky test in #5894 and then I tried
implementing a better fix in #5896, where @marcocitus suggested this
better fix. This change reverts the fix from #5894 and implements the
fix suggested by Marco.
Our multi_mx_alter_distributed_table test actually depended on the old
buggy search_path leaking behavior. After fixing the bug that test would
fail like this:
```diff
CALL proc_0(1.0);
DEBUG: pushing down the procedure
-NOTICE: Res: 3
-DETAIL: from localhost:xxxxx
+ERROR: relation "test_proc_colocation_0" does not exist
+CONTEXT: PL/pgSQL function mx_alter_distributed_table.proc_0(double precision) line 5 at SQL statement
+while executing command on localhost:57637
RESET client_min_messages;
```
I fixed this test by fully qualifying the table names used in the
procedure. I think it's quite unlikely that actual users depend
on this behavior though. Since it would require first doing
DDL before calling a procedure in a session where the
search_path was changed after connecting.
DESCRIPTION: Adds failure test for shard move
DESCRIPTION: Remove function `WaitForAllSubscriptionsToBecomeReady` and
related tests
Adding some failure tests for shard moves.
Dropping the not-needed-anymore function
`WaitForAllSubscriptionsToBecomeReady`, as the subscriptions now start
as ready from the beginning because we don't use logical replication
table sync workers anymore.
fixes: #6260
In CI shard_rebalancer sometimes fails with this error:
```diff
SET citus.node_connection_timeout to 60;
BEGIN;
SET LOCAL citus.shard_replication_factor TO 2;
SET citus.log_remote_commands TO ON;
SET SESSION citus.max_adaptive_executor_pool_size TO 5;
SELECT replicate_table_shards('dist_table_test_2', max_shard_copies := 4, shard_transfer_mode:='block_writes');
+WARNING: could not establish connection after 60 ms
```
Source
https://app.circleci.com/pipelines/github/citusdata/citus/28128/workflows/38eeacc4-4191-4366-87ed-9a628414965a/jobs/847458?invite=true#step-107-21
This PR avoids this issue by increasing
```citus.node_connection_timeout``` to 35s.
I fixed a lot of flaky tests recently and I found some patterns in the
type of issues and type of fixes. This adds a document that lists
these types of issues and explains how to fix them.
To be able to test non-blocking shard moves we take an advisory lock, so
we can pause the shard move at an interesting moment. Originally this
was during the logical replication catch up phase. But when I added
tests for the rebalancer progress I moved this lock before the initial
data copy. This allowed testing of the rebalance progress, but
inadvertently made our non-blocking tests not actually test if we held
unintended locks during logical replication catch up.
This fixes that by creating two types of advisory locks, one before the
copy and one after. This causes the tests to actually test their
intended scenario again.
Furthermore it starts using one of these locks for blocking shard moves
too. Which allowed me to reduce the complexity of the rebalance progress
test suite quite a bit. It also allowed enabling some flaky tests again,
because this stopped them from being flaky. And finally it allowed
testing of rebalance progress for blocking shard copy operations as
well.
In passing it fixes a flaky test during parallel blocking shard moves by
ordering the output.
DESCRIPTION: Adds status column to get_rebalance_progress()
Introduces a new column named `status` for the function
`get_rebalance_progress()`. For each ongoing shard move, this column
will reveal information about that shard move operation's current
status.
For now, candidate status messages could be one of the below.
* Not Started
* Setting Up
* Copying Data
* Catching Up
* Creating Constraints
* Final Catchup
* Creating Foreign Keys
* Completing
* Completed
Deparser function set_relation_column_names() knows that it needs to
re-evaluate column names based on relation's tuple descriptor when
the rte belongs to a relation (RTE_RELATION).
However before this commit, it didn't know about the fact that citus
might wrap such an rte with an rte that points to
citus_extradata_container() placeholder.
And because of this, it was simply taking the column aliases
(e.g., "bar" in "foo AS bar") into the account and this might result in
an incorrectly deparsed query as in below case:
* Say, if we had view based on following query:
```sql
SELECT a FROM table;
```
* And if we rename column "a" to "b", the view query normally becomes:
```sql
SELECT b AS a FROM table;
```
* So before this commit, deparsing a query based on that view was
resulting in such a query due to deparsing based on the column aliases,
which is not correct:
```sql
SELECT a FROM table;
```
Fixes#5932.
DESCRIPTION: Fixes a bug that might cause failing to query the views
based on tables that have renamed columns
PostgreSQL 15 exposes WL_SOCKET_CLOSED in WaitEventSet API, which is
useful for detecting closed remote sockets. In this patch, we use this
new event and try to detect closed remote sockets in the executor.
When a closed socket is detected, the executor now has the ability to
retry the connection establishment. Note that, the executor can retry
connection establishments only for the connection that has not been
used. Basically, this patch is mostly useful for preventing the executor
to fail if a cached connection is closed because of the worker node
restart (or worker failover).
In other words, the executor cannot retry connection establishment if we
are in a distributed transaction AND any command has been sent over the
connection. That requires more sophisticated retry mechanisms. For now,
fixing the above use case is enough.
Fixes#5538
Earlier discussions: #5908, #6259 and #6283
### Summary of the current approach regards to earlier trials
As noted, we explored some alternatives before getting into this.
https://github.com/citusdata/citus/pull/6283 is simple, but lacks an
important property. We should be checking for `WL_SOCKET_CLOSED`
_before_ sending anything over the wire. Otherwise, it becomes very
tricky to understand which connection is actually safe to retry. For
example, in the current patch, we can safely check
`transaction->transactionState == REMOTE_TRANS_NOT_STARTED` before
restarting a connection.
#6259 does what we intent here (e.g., check for sending any command).
However, as @marcocitus noted, it is very tricky to handle
`WaitEventSets` in multiple places. And, the executor is designed such
that it reacts to the events. So, adding anything `pre-executor` seemed
too ugly.
In the end, I converged into this patch. This patch relies on the
simplicity of #6283 and also does a very limited handling of
`WaitEventSets`, just for our purpose. Just before we add any connection
to the execution, we check if the remote session has already closed.
With that, we do a brief interaction of multiple wait event processing,
but with different purposes. The new wait event processing we added does
not even consider cancellations. We let that handled by the main event
processing loop.
Co-authored-by: Marco Slot <marco.slot@gmail.com>
In #6405 I added better improved blocked process detection for isolation
tests. But when cleaning up unnecessary code I cleaned up a bit too
much. This actually includes the new function definition in our
migrations.
In CI multi_partitioning sometimes fails with this error:
```diff
SELECT citus_remove_node('localhost', :master_port);
- citus_remove_node
----------------------------------------------------------------------
-
-(1 row)
-
+ERROR: tuple concurrently deleted
-- d) invalid tables for helper UDFs
```
Source:
https://app.circleci.com/pipelines/github/citusdata/citus/27993/workflows/685e5b20-c923-43e5-8a0d-b932ef4c4914/jobs/839466
This PR avoids this concurrency issue by not running the
multi_partitioning test in parallel with other tests.
If an operation requires having coordinator in pg_dist_node and if that
is not the case, then we automatically add the coordinator into
pg_dist_node if user didn't add any worker nodes yet.
However, if user have already added some worker nodes before, we throw
an error. With this commit, we improve the error thrown in that case.
Closes#6423 based on the discussion made there.
Sometimes our CI randomly fails on a test in a way similar to this:
```diff
step s2-drop:
DROP TABLE cancel_table;
-
+ <waiting ...>
+step s2-drop: <... completed>
starting permutation: s1-timeout s1-begin s1-sleep10000 s1-rollback s1-reset s1-drop
```
Source:
https://app.circleci.com/pipelines/github/citusdata/citus/26524/workflows/5415b84f-13a3-482f-bef9-648314c79a67/jobs/756377
I tried to fix that already in #6252 by disabling the maintenance daemon
during isolation tests. But it seems that hasn't fixed all cases of
these errors. This is another attempt at fixing these issues that seems
to have better results.
What it does is that it starts using the pInterestingPids parameter that
citus_isolation_test_session_is_blocked receives. With this change we
start filter out block-edges that are not caused by any of these pids.
In passing this change also makes it possible to run
`isolation_create_distributed_table_concurrently` with
`check-isolation-base`
PG15 introduced a function called ReplicationSlotName that causes
conflicts with our function with the same name. I solved this issue by
renaming our function to ReplicationSlotNameForNodeAndOwner
Relevant PG commit:
c3b5992b91
DESCRIPTION: Fix bug in global PID assignment for rebalancer
sub-connections
In CI our isolation_shard_rebalancer_progress test would sometimes fail
like this:
```diff
+isolationtester: canceling step s1-rebalance-c1-block-writes after 60 seconds
step s1-rebalance-c1-block-writes:
SELECT rebalance_table_shards('colocated1', shard_transfer_mode:='block_writes');
- <waiting ...>
+
+ERROR: canceling statement due to user request
step s7-get-progress:
```
Source:
https://app.circleci.com/pipelines/github/citusdata/citus/27855/workflows/2a7e335a-f3e8-46ed-b6bd-6920d42f7214/jobs/831710
It turned out this was an actual bug in the way our assigning of global
PIDs interacts with the way we connect to ourselves as the shard
rebalancer. The first command the shard rebalancer sends is a SET
ommand to change the application_name to `citus_rebalancer`. If
`StartupCitusBackend` is called after this command is processed, then it
overwrites the global PID that was extracted from the previous
application_name. This makes sure that we don't do that, and continue to
use the original global PID. While it might seem that we only call
`StartupCitusBackend` once for each query backend, this isn't actually
the case. Whenever pg_dist_partition gets ANALYZEd by autovacuum
we indirectly call `StartupCitusBackend` again, because we invalidate
the cache then.
In passing this fixes two other things as well:
1. It sets `distributedCommandOriginator` correctly in
`AssignGlobalPID`, by using IsExternalClientBackend(). This doesn't
matter much anymore, since AssignGlobalPID effectively becomes a
no-op in this PR for any non-external client backends.
2. It passes the application_name to InitializeBackendData in
StartupCitusBackend, instead of INVALID_CITUS_INTERNAL_BACKEND_GPID
(which effectively got casted to NULL). In practice this doesn't
change the behaviour of the call, since the call is a no-op for every
backend except the maintenance daemon. And the behaviour of the call
is the same for NULL as for the application_name of the maintenance
daemon.
We decrease verbosity level here to avoid the flaky output
https://app.circleci.com/pipelines/github/citusdata/citus/27936/workflows/dc63128a-1570-41a0-8722-08f3e3cfe301/jobs/836153
```diff
select alter_table_set_access_method('ref','heap');
NOTICE: creating a new table for alter_table_set_access_method.ref
NOTICE: moving the data of alter_table_set_access_method.ref
NOTICE: dropping the old alter_table_set_access_method.ref
NOTICE: drop cascades to 2 other objects
-DETAIL: drop cascades to materialized view m_ref
-drop cascades to view v_ref
+DETAIL: drop cascades to view v_ref
+drop cascades to materialized view m_ref
CONTEXT: SQL statement "DROP TABLE alter_table_set_access_method.ref CASCADE"
NOTICE: renaming the new table to alter_table_set_access_method.ref
alter_table_set_access_method
-------------------------------
(1 row)
```