Fixes task executor SIGTERM handling.
Problem:
When task executors are sent SIGTERM, their default handler
`bgworker_die`, which is set at worker startup, logs FATAL error. But
they do not release locks there before logging the error, which
sometimes causes hanging of the monitor. e.g. Monitor waits for the lock
forever at pg_stat flush after calling proc_exit.
Solution:
Because executors have connection to backend, they should handle SIGTERM
similar to normal backends. Normal backends uses `die` handler, in which
they set ProcDiePending flag and the next CHECK_FOR_INTERRUPTS call
handles it gracefully by releasing any lock before termination.
This PR adds a new CI workflow named ```flaky-test``` to run flaky test
detection on newly introduced regression tests.
Co-authored-by: Jelte Fennema <github-tech@jeltef.nl>
Adding a testing function `wait_for_resource_cleanup` which waits until
all records in `pg_dist_cleanup` are cleaned up. The motivation is to
prevent flakiness in our tests, since the `NOTICE: cleaned up X orphaned
resources` message is not consistent in many cases. This PR replaces
`citus_cleanup_orphaned_resources` calls with
`wait_for_resource_cleanup` calls.
DESCRIPTION: Adds support for outer joins having a recurring rel in the
outer side of the join (e.g., \<reference table\> LEFT JOIN
\<distributed table\>)
Closes#6219.
Closes#521
If the outer part of an outer join is a recurring rel (i.e., reference
table
or an intermediate_result injected into the query during the earlier
stages
of the recursive planning), Citus cannot run the join query if the other
side
of the join is not a recurring rel (i.e., distributed table).
See DeferredErrorIfUnsupportedRecurringTuplesJoin for the reasoning.
And to support such joins, now we start recursively planning distributed
side
of such joins so that non-recurring rel becomes an intermediate result
(and
hence a recurring rel) since Citus already knows how to compute an outer
join
between two recurring rels already. In the simplest scenario, this means
to
convert
_"\<reference\> LEFT JOIN \<distributed\>"_ to
_"\<reference\> LEFT JOIN \<intermediate_result\>"_
by wrapping the distributed table into a subquery.
- [x] Add support for outer joins having a recurring rel in the outer
side and a "distributed table" (*) in the inner side of the join
- [x] Expand "distributed table" concept to "distributed rel" in first
item.
This means that;
- [x] Currently RecursivelyPlanNonRecurringJoinNode doesn't know how to
wrap a sub join tree that constitutes a recurring rel, e.g., rhs clause
of the following join: `reference LEFT OUTER <distributed INNER JOIN
distributed>`; fix this.
- [x] Similar to previous item, currently
RecursivelyPlanNonRecurringJoinNode doesn't know how to handle
subqueries constituting a distributed rel, e.g., `SELECT * FROM ref LEFT
JOIN (SELECT * FROM dist_1) u1 ON (ref.a = u1.a);`; fix this.
- [x] Add lateral join checks for now-supported outer joins into
recursive planner
- [x] Fix regressions tests
- [x] Verified each test output file by first un-distributing Citus
tables involved in related queries and re-running the test file.
- [x] Some of the tests --that were not supposed to return any data
before but this PR adds support for-- were likely to get flaky, so added
some "ORDER BY"s to them.
- [x] Continue doing manual testing and start writing a test file for
the join clauses that this PR adds support for --not only rely on
existing tests
See https://github.com/citusdata/citus/issues/6546 for what we could do
further.
DESCRIPTION: Create replication artifacts with unique names
We're creating replication objects with generic names. This disallows us
to enable parallel shard moves, as two operations might use the same
objects. With this PR, we'll create below objects with operation
specific names, by appending OparationId to the names.
* Subscriptions
* Publications
* Replication Slots
* Users created for subscriptions
1) Regular users fail to use clock UDF with permission issue.
2) Clock functions were declared as STABLE, whereas by definition they are VOLATILE. By design, any clock/time
functions will return different results for each call even within a single SQL statement.
Note: UDF citus_get_transaction_clock() is a misnomer as it internally calls the clock tick which always returns
different results for every invocation in the same transaction.
Adds signal handlers for graceful termination, cancellation of
task executors and detecting config updates. Related to PR #6459.
#### How to handle termination signal?
Monitor need to gracefully terminate all running task executors before
terminating. Hence, we have sigterm handler for the monitor.
#### How to handle cancellation signal?
Monitor need to gracefully cancel all running task executors before
terminating. Hence, we have sigint handler for the monitor.
#### How to detect configuration changes?
Monitor has SIGHUP handler to reflect configuration changes while
executing tasks.
Finds core files from correct path on CI. According to default core
pattern on CI, core is generated at the location relative to binary is
executed.
It can be safe to set core pattern before running binary but to change a
kernel param(in our case kernel.core_pattern), you need related
privilege in docker container. Or you have to change it at image build.
But, by default, on CI machines, kernel pattern contains a relative path
to binary + pid + process name, so we do not need to set it explicitly
for now. (Example core file name on CI machine:
`core.2559.!usr!lib!postgresql!14!bin!postgres`)
We are having some flakiness in our test schedule because of the objects
leftover from shard moves/splits. With this commit we prevent logging
cleanup object counts.
fixes: #6534
When using multiline strings, we occasionally forget to add a single
space at the end of the first line. When this line is concatenated with
the next one, the resulting string has a missing space.
With this PR, citus code will be tested in all packaging environments.
Sometimes, there can be compile errors which blocks packaging and in
this case unplanned delays may occur.
By testing the code in packaging environments, I'm aiming to detect any
compilation errors before packaging.
Co-authored-by: Onur Tirtir <onurcantirtir@gmail.com>
Co-authored-by: Hanefi Onaldi <Hanefi.Onaldi@microsoft.com>
DESCRIPTION: Extend cleanup process for replication artifacts
This PR adds new cleanup record types for:
* Subscriptions
* Replication slots
* Publications
* Users created for subscriptions
We add records for these object types, to `pg_dist_cleanup` during
creation phase. Once the operation is done, in case of success or
failure, we iterate those records and drop the objects. With this PR we
will not be dropping any of these objects during the operation. In
short, we will always be deferring the drop.
One thing that's worth mentioning is that we sort cleanup records before
processing (dropping) them, because of dependency relations among those
objects, e.g a subscription might depend on a publication. Therefore, we
always drop subscriptions before publications.
We have some renames in this PR:
* `TryDropOrphanedShards` -> `TryDropOrphanedResources`
* `DropOrphanedShardsForCleanup` -> `DropOrphanedResourcesForCleanup`
* `run_try_drop_marked_shards` -> `run_try_drop_marked_resources`
as these functions now process replication artifacts as well.
This PR drops function `DropAllLogicalReplicationLeftovers` and its all
usages, since now we rely on the deferring drop mechanism.
Improvement on our background task monitoring API (PR #6296) to support
concurrent and nonblocking task execution.
Mainly we have a queue monitor background process which forks task
executors for `Runnable` tasks and then monitors their status by
fetching messages from shared memory queue in nonblocking way.
**Problem**: Currently, we error out if we detect recurring tuples in
one side without checking the other side of the join.
**Solution**: When one side of the full join consists recurring tuples
and the other side consists nonrecurring tuples, we should not pushdown
to prevent duplicate results. Otherwise, safe to pushdown.
This PR changes
```citus.propagate_session_settings_for_loopback_connection``` default
value to off not to expose this feature publicly at this point. See
#6488 for details.
When debugging issues it's quite useful to see the originating gpid in
the application_name of a query on a worker. This already happens for
most queries, but not for queries created by the rebalancer or by
run_command_on_worker. This adds a gpid to those two application_names
too.
Note, that if the GPID of the new application_names is different than
the current GPID of the backend the backend will continue to keep
the old gpid as its actual GPID. This PR is just meant to make sure
that the application_name is as useful as it can be for users to
look at. Updating of gpids will be done in a follow-up PR, and
adding gpids to all internal connections will make this easier.
DESCRIPTION: Fixes a potential dangling pointer issue
Need to backport to 11.0 & 11.1 since we might want to release packages
for debian/bookworm based on those branches in future.
Fixes a bug that causes crash when using auto_explain extension with
ALTER TABLE...ADD FOREIGN KEY... queries.
Those queries trigger a SELECT query on the citus tables as part of the
foreign key constraint validation check. At the explain hook, workers
try to explain this SELECT query as a distributed query causing memory
corruption in the connection data structures. Hence, we will not explain
ALTER TABLE...ADD FOREIGN KEY... and the triggered queries on the
workers.
Fixes#6424.