One of our flakiest and most annoying tests is
multi_cluster_management. It usually fails like this:
```diff
SELECT citus_disable_node('localhost', :worker_2_port);
citus_disable_node
--------------------
(1 row)
SELECT public.wait_until_metadata_sync(60000);
+WARNING: waiting for metadata sync timed out
wait_until_metadata_sync
--------------------------
(1 row)
```
This PR tries to address that by hardening wait_until_metadata_sync. I
believe the reason for this warning is a race condition in
wait_until_metadata_sync: it's possible for the pre-check to fail, then
have the maintenance daemon send a notification, and only then have the
backend start to listen. I tried to fix it in two ways:
1. First run LISTEN, and only then do the pre-check.
2. If we time out, check again to make sure that we did not somehow miss
the notification, and don't show a warning if all metadata turns out to
be synced after the timeout.
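Conceptually, the fixed ordering looks like this at the SQL level (a
sketch only; the real logic lives in the C implementation of the UDF,
and both the notification channel name and the pre-check query below are
assumptions based on the catalog):
```sql
-- 1) Subscribe BEFORE the pre-check, so a notification sent between
--    the check and the wait can no longer be missed.
LISTEN metadata_sync;

-- 2) Pre-check: is metadata already synced on all primary nodes?
SELECT bool_and(metadatasynced)
FROM pg_dist_node
WHERE noderole = 'primary' AND hasmetadata;

-- 3) If not, wait for the notification; on timeout, run the pre-check
--    once more and only warn if metadata is still not synced.
```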
It's hard to know for sure that this fixes it, because the test is not
repeatable and I could not reproduce the failure locally. Let's just
hope for the best.
---------
Co-authored-by: Onur Tirtir <onurcantirtir@gmail.com>
I enhanced the existing code to check whether the relation is an index
belonging to a distributed table.
If so, the shardId is appended to the relation (index) name and the
*_size functions are executed as before.
There is a change in an extern function:
`extern StringInfo GenerateSizeQueryOnMultiplePlacements(...)`
It's possible to create a new function and deprecate this one later if
compatibility is an issue.
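For illustration, a hedged usage sketch (the table and index names are
hypothetical):
```sql
CREATE TABLE dist_table (a int, b text);
SELECT create_distributed_table('dist_table', 'a');
CREATE INDEX dist_table_b_idx ON dist_table (b);

-- with this change, the size UDFs also accept an index of a distributed table
SELECT citus_relation_size('dist_table_b_idx');
SELECT citus_total_relation_size('dist_table_b_idx');
```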
Fixes https://github.com/citusdata/citus/issues/6496.
DESCRIPTION: Allows using Citus size functions on indexes of distributed
tables.
---------
Co-authored-by: Onur Tirtir <onurcantirtir@gmail.com>
DESCRIPTION: This change starts a maintenance daemon at server start if
there is a designated main database.
This is the code flow:
1. User designates a main database:
`ALTER SYSTEM SET citus.main_db = "myadmindb";`
2. When the postmaster starts, in `_PG_init`, Citus calls
`InitializeMaintenanceDaemonForMainDb`.
This function registers a background worker to run
`CitusMaintenanceDaemonMain` with `databaseOid = 0`.
3. `CitusMaintenanceDaemonMain` takes some special actions when
databaseOid is 0:
- Gets the citus.main_db value.
- Connects to the citus.main_db.
- Now that `MyDatabaseId` is available, creates a hash entry for it.
- Then follows the same control flow as for a regular db.
HasDistributionKey & HasDistributionKeyCacheEntry return true when the
corresponding table has a distribution key; the comments state the
opposite, which should be fixed.
Signed-off-by: Zhao Junwang <zhjwpku@gmail.com>
Co-authored-by: Onur Tirtir <onurcantirtir@gmail.com>
DESCRIPTION: Send keepalive messages during the logical replication
phase of large shard splits to avoid timeouts.
During the logical replication part of the shard split process, the
split decoder filters out the WAL records produced by the initial copy.
If the number of WAL records is large, the split decoder ends up
processing for a long time before sending out any WAL records through
pgoutput. Hence the WAL receiver may time out and restart repeatedly,
causing the catch-up logic in our split driver code to fail.
Notes:
1. If `wal_receiver_timeout` is set to a very small value, e.g. 600ms,
the receiver may still time out before receiving the keepalives. My
tests show that this code works best when `wal_receiver_timeout` is set
to 1 minute, which is the default value (see the sketch after these
notes).
2. Once a logical replication worker times out, a new one gets launched.
The new logical replication worker sets the pg_stat_subscription columns
to their initial values, e.g. latest_end_lsn is set to 0. Our driver
logic in `WaitForGroupedLogicalRepTargetsToCatchUp` cannot handle the
LSN value going backwards; this is the main reason it gets stuck in an
infinite loop.
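For reference, a minimal way to keep the subscriber-side timeout at its
default (standard PostgreSQL configuration, shown purely as an
illustration):
```sql
-- wal_receiver_timeout is reloadable; no restart needed
ALTER SYSTEM SET wal_receiver_timeout = '1min';
SELECT pg_reload_conf();
SHOW wal_receiver_timeout;
```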
DESCRIPTION: Fix leaking of memory and memory contexts in Foreign
Constraint Graphs
Previously, every time we (re)created the Foreign Constraint
Relationship Graph, we created a new memory context while losing the
reference to the previous context. That old context could still hold
leftover memory, causing a memory leak.
With this patch we statically have one memory context that we lazily
initialize the first time we create our foreign constraint relationship
graph. On every subsequent creation, besides destroying our previous
hashmap, we also reset our memory context to remove any leftover
references.
This commit aims to add a comprehensive guide that covers all essential
aspects of Citus, including planning, execution, locking mechanisms,
shard moves, 2PC, and many other major components.
Co-authored-by: Marco Slot <marco.slot@gmail.com>
DESCRIPTION: Shard moves/isolate report LSNs in LSN format
While investigating an issue with our catch-up mechanism on certain
Postgres versions, we noticed that we print LSNs in the format of the
native long type. This is an uncommon representation for LSNs in
Postgres logs.
This patch changes the output of our log message from the long type
representation to the native LSN type representation, making it easier
for Postgres users to recognize and compare LSNs with other related
reports.
example of new output:
```
2023-09-25 17:28:47.544 CEST [11345] LOG: The LSN of the target subscriptions on node localhost:9701 have increased from 0/0 to 0/E1ED20F8 at 2023-09-25 17:28:47.544165+02 where the source LSN is 1/415DCAD0
```
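To see the difference between the two representations, you can compare
them in plain SQL (the value is taken from the log line above):
```sql
-- native pg_lsn text form vs. the same value as a plain number
SELECT '0/E1ED20F8'::pg_lsn AS lsn_native,
       '0/E1ED20F8'::pg_lsn - '0/0'::pg_lsn AS lsn_as_long;
```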
When cdc got added, the makefiles hardcoded the `.so` extension instead
of using the platform-specific `$(DLSUFFIX)` variable used by `pgxs.mk`.
Also, don't remove installed cdc artifacts on `make clean`.
DESCRIPTION: Adds support for ALTER DATABASE <db_name> SET .. statement
propagation
SET statements in Postgres have a common structure which is already
being used in the ALTER FUNCTION statement.
In this PR, I added a util file, citus_setutils, and made it usable for
both ALTER DATABASE <db_name> SET .. and ALTER FUNCTION ... SET ...
statements.
With this PR, below statements will be propagated
```sql
ALTER DATABASE name SET configuration_parameter { TO | = } { value | DEFAULT }
ALTER DATABASE name SET configuration_parameter FROM CURRENT
ALTER DATABASE name RESET configuration_parameter
ALTER DATABASE name RESET ALL
```
Additionally, there was a bug in processing float values in the common
code block; I fixed this as well.
Before:
```C
case T_Float:
{
	appendStringInfo(buf, " %s", strVal(value));
	break;
}
```
After:
```C
case T_Float:
{
	appendStringInfo(buf, " %s", nodeToString(value));
	break;
}
```
DESCRIPTION: Adds ALTER DATABASE WITH ... and REFRESH COLLATION VERSION
support
This PR adds propagation support for basic ALTER DATABASE statements.
The statements below are supported:
ALTER DATABASE <database_name> WITH IS_TEMPLATE <true/false>;
ALTER DATABASE <database_name> WITH CONNECTION LIMIT <integer_value>;
ALTER DATABASE <database_name> REFRESH COLLATION VERSION;
---------
Co-authored-by: Jelte Fennema-Nio <jelte.fennema@microsoft.com>
We currently don't support propagating these options in Citus.
Relevant PG commits:
https://github.com/postgres/postgres/commit/e3ce2de
https://github.com/postgres/postgres/commit/3d14e17
Limitation:
We also need to take care of GRANT statements generated by dependencies
when attempting to distribute something else. Specifically, this part of
the code in `GenerateGrantRoleStmtsOfRole`:
```C
grantRoleStmt->admin_opt = membership->admin_option;
```
In PG16, membership also has `inherit_option` and `set_option`, which
need to properly become part of the `grantRoleStmt`. We can skip this
for now since #7164 will take care of it soon, and this is not an
expected use-case.
Add citus_schema_move() that can be used to move tenant tables within a
distributed schema to another node. The function has two variations as
simple wrappers around citus_move_shard_placement() and
citus_move_shard_placement_with_nodeid() respectively. They pick a shard
that belongs to the given tenant schema and resolve the source node that
contains the shards under the given tenant schema. Hence their
signatures are quite similar to the underlying functions:
```sql
-- citus_schema_move(), using target node name and node port
CREATE OR REPLACE FUNCTION pg_catalog.citus_schema_move(
	schema_id regnamespace,
	target_node_name text,
	target_node_port integer,
	shard_transfer_mode citus.shard_transfer_mode default 'auto')
RETURNS void
LANGUAGE C STRICT
AS 'MODULE_PATHNAME', $$citus_schema_move$$;

-- citus_schema_move(), using target node id
CREATE OR REPLACE FUNCTION pg_catalog.citus_schema_move(
	schema_id regnamespace,
	target_node_id integer,
	shard_transfer_mode citus.shard_transfer_mode default 'auto')
RETURNS void
LANGUAGE C STRICT
AS 'MODULE_PATHNAME', $$citus_schema_move_with_nodeid$$;
```
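A hedged usage sketch (the schema name, host, port, and node id are
hypothetical):
```sql
-- move the tenant schema "tenant_a" to the worker at localhost:9702
SELECT citus_schema_move('tenant_a', 'localhost', 9702);

-- or address the target worker by its node id from pg_dist_node
SELECT citus_schema_move('tenant_a', 2);
```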
Since PG16 supports truncate triggers on foreign tables, we add the
citus_truncate_trigger to Citus foreign tables as well, such that the
TRUNCATE command is propagated to the table's single local shard.
Note that the TRUNCATE command was working for foreign tables even
before this commit: see
https://github.com/citusdata/citus/pull/7170#issuecomment-1706240593 for
details.
This commit also adds tests with user-enabled truncate triggers on Citus
foreign tables: both a trigger on the shell table and one on its single
foreign local shard.
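For illustration, a minimal sketch of the setup this affects (the
server, table, and option values are hypothetical, using postgres_fdw):
```sql
CREATE EXTENSION postgres_fdw;
CREATE SERVER loopback FOREIGN DATA WRAPPER postgres_fdw
	OPTIONS (host 'localhost', dbname 'postgres');
CREATE USER MAPPING FOR CURRENT_USER SERVER loopback;
CREATE FOREIGN TABLE foreign_tbl (a int) SERVER loopback
	OPTIONS (table_name 'underlying_tbl');

-- make it a Citus foreign table (a shell table with a single local shard)
SELECT citus_add_local_table_to_metadata('foreign_tbl');

-- on PG16 this now also fires citus_truncate_trigger, so the TRUNCATE
-- is propagated to the single local shard
TRUNCATE foreign_tbl;
```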
Relevant PG commit:
https://github.com/postgres/postgres/commit/3b00a94
**Problem:**
Previously we always used an outside superuser connection to overcome
permission issues for the current user while propagating dependencies.
That has mainly two problems:
1. Visibility issues during dependency propagation (the metadata
connection propagates some objects, like a schema, and the outside
transaction does not see them and tries to create them again).
2. Security issues (it is preferable to use the current user's
connection instead of the extension superuser's).
**Solution (high level):**
Now, we try to make a smarter decision on whether we should use an
outside superuser connection or the current user's metadata connection.
We prefer the current user's connection if any object that was already
propagated in the current transaction is a dependency of a target
object. We do that because we assume that if the current user has
permissions to create the dependency, then they can most probably
propagate the target as well.
Our assumption is expected to hold most of the time, but it can still be
wrong. In those cases the transaction would fail, and the user should
set the GUC `citus.create_object_propagation` to `deferred` to work
around it.
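For reference, the workaround looks like this:
```sql
-- work around the failure by deferring object propagation
SET citus.create_object_propagation TO 'deferred';
```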
**Solution:**
1. We track all objects propagated in the current transaction (we can
handle subtransactions).
2. We propagate dependencies via the current user's metadata connection
if any dependency was created in the current transaction, to address the
issues listed above. Otherwise, we still use an outside superuser
connection.
DESCRIPTION: Fixes some object propagation errors seen with transaction
blocks.
Fixes https://github.com/citusdata/citus/issues/6614
---------
Co-authored-by: Nils Dijk <nils@citusdata.com>
For a database that does not create the citus extension by running
`CREATE EXTENSION citus;`, the `CitusHasBeenLoaded` function ends up
querying the `pg_extension` table every time it is invoked. This is not
an ideal situation for such a database.
The idea in this PR is as follows:
### A new field in MetadataCache.
Add a new variable `extensionCreatedState` of the following type:
```C
typedef enum ExtensionCreatedState
{
	UNKNOWN = 0,
	CREATED = 1,
	NOTCREATED = 2,
} ExtensionCreatedState;
```
When the MetadataCache is invalidated, `ExtensionCreatedState` will be
set to UNKNOWN.
### Invalidate MetadataCache when CREATE/DROP/ALTER EXTENSION citus
commands are run.
- Register a callback function, named
`InvalidateDistRelationCacheCallback`, for relcache invalidation during
the shared library initialization for `citus.so`. This callback function
is invoked in all backends whenever the relcache is invalidated in one
of the backends. (This can be caused by many DDL operations.)
- In the cache invalidation callback,
`InvalidateDistRelationCacheCallback`, invalidate the `MetadataCache` by
zeroing it out.
- In `CitusHasBeenLoaded`, perform the costly citus-is-loaded check only
if the `MetadataCache` is not valid.
### Downsides
Any relcache invalidation (caused by various DDL operations) will cause
the Citus MetadataCache to get invalidated. Most of the time this will
be unnecessary, but we rely on the assumption that DDL operations on
relations will not be too frequent.
When breaking colocation, we need to create a new colocation group
record in pg_dist_colocation for the relation. It is not sufficient to
have a new colocationid value in pg_dist_partition only.
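For context, a hedged example of a user-facing operation that breaks
colocation (the table name is hypothetical):
```sql
-- detach "orders" from its colocation group; with this fix, a fresh
-- pg_dist_colocation record is created for it as well
SELECT update_distributed_table_colocation('orders', colocate_with => 'none');
```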
This patch also fixes a bug when deleting a colocation group if no
tables are left in it. Previously we passed a relation id as a parameter
to the DeleteColocationGroupIfNoTablesBelong function, where we should
have passed a colocation id.
DESCRIPTION: Introduces citus_pause_node UDF enabling pausing by
node_id.
citus_pause_node takes a node_id parameter, fetches all the shards on
that node, and puts an AccessExclusiveLock on each of those shards.
While this lock is held, inserts are disabled until the transaction that
called citus_pause_node is closed.
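A hedged usage sketch (the node id below is hypothetical, and the exact
call pattern is assumed from the description above):
```sql
BEGIN;
-- lock all shards on node 2; writes to them are blocked from here on
SELECT citus_pause_node(2);
-- ... perform the maintenance that required the pause ...
COMMIT;  -- closing the transaction releases the locks
```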
---------
Co-authored-by: Hanefi Onaldi <Hanefi.Onaldi@microsoft.com>
If we're in the middle of a table type conversion (such as from a Citus
local table to a reference table), the table might not have all the
placements that we expect from its table type. For this reason, we
should intersect the placements of the tables at hand when creating
inter-shard DDL tasks.
What we do to collect foreign key constraint commands in
WorkerCreateShardCommandList is quite similar to what we do in
CopyShardForeignConstraintCommandList. Plus, the code that we previously
used in WorkerCreateShardCommandList was not able to properly handle
foreign key constraints between Citus local tables -- when creating a
reference table from the referencing one.
With a few slight modifications made to
CopyShardForeignConstraintCommandList, we can use the same logic in
WorkerCreateShardCommandList too.
DESCRIPTION: Adds grant/revoke propagation support for database
privileges
Following the implementation of support for granting and revoking
database privileges, certain tests that issued grants on worker nodes
experienced failures. Those are fixed in this PR as well.
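For illustration, statements like the following are now propagated to
the worker nodes (the database and role names are hypothetical):
```sql
GRANT CREATE, CONNECT ON DATABASE my_db TO app_user;
REVOKE CONNECT ON DATABASE my_db FROM app_user;
```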
DESCRIPTION: Adds PG16Beta3 support
This is the final commit that adds
PG16 compatibility with Citus's current features.
You can use Citus community with PG16Beta3. This commit:
- Enables PG16 in the configure script.
- Adds PG16 tests to CI using test images that have 16beta3.
- Skips the wal2json cdc test since the wal2json package is not
available for PG16 yet.
- Fixes an isolation test.
Several PG16 compatibility commits have been merged before this final
one; all these subtasks are tracked in
https://github.com/citusdata/citus/issues/7017.
See the list below:
1 - 42d956888d Resolve compilation issues
2 - 0d503dd5ac Ruleutils and successful CREATE EXTENSION
3 - 907d72e60d Some test outputs
4 - 7c6b4ce103 Outer join checks, subscription password, crash fixes
5 - 6056cb2c29 get_relation_info hook to avoid crash from adjusted partitioning
6 - b36c431abb Rework PlannedStmt and Query's Permission Info
7 - ee3153fe50 More test output fixes
8 - 2c50b5f7ff varnullingrels additions
9 - b2291374b4 More test output fixes
10 - a2315fdc67 New options to vacuum and analyze
11 - 9fa72545e2 Fix AM dependency and grant's admin option
12 - 2d6cf8e79a One more outer join check
Stay tuned for PG16 new features in Citus :)
PG16 compatibility - part 11
Check out part 1 42d956888d
part 2 0d503dd5ac
part 3 907d72e60d
part 4 7c6b4ce103
part 5 6056cb2c29
part 6 b36c431abb
part 7 ee3153fe50
part 8 2c50b5f7ff
part 9 b2291374b4
part 10 a2315fdc67
part 11 9fa72545e2
This commit is part of the series of PG16 compatibility commits.
We already took care of the majority of the necessary outer join checks
in part 4 (7c6b4ce103).
However, in RelationInfoContainsOnlyRecurringTuples,
we need to add one more check of whether we are dealing
with an outer join RTE, using the IsRelOptOuterJoin function.
This prevents an outer join crash in the sqlancer_failures.sql test.
We expect one more PG16 compatibility commit covering Citus's current
features and regression test sanity.