Now that we will soon add another table type having DISTRIBUTE_BY_NONE
as distribution method and that we want the code to interpret such
tables mostly as distributed tables, let's make the definition of those
other two table types more strict by removing the
CITUS_TABLE_WITH_NO_DIST_KEY macro.
And instead, use HasDistributionKey() check in the places where the
logic applies to all table types that have / don't have a distribution
key. In future PRs, we might want to convert some of those
HasDistributionKey() checks if logic only applies to Citus local /
reference tables, not the others.
And adding HasDistributionKey() also allows us to consider having
DISTRIBUTE_BY_NONE as the distribution method as a "table attribute"
that can apply to distributed tables too, rather than something that
determines the table type.
DESCRIPTION: Correctly report shard size in citus_shards view
When looking at citus_shards, people are interested in the actual size
that all the data related to the shard takes up on disk.
`pg_total_relation_size` is the function to use for that purpose. The
previously used `pg_relation_size` does not include indexes or TOAST.
Especially the missing TOAST can have an enormous impact on the size of the
shown data.
citus_job_list() lists all background jobs by simply showing the records
in pg_dist_background_job.
citus_job_status(job_id bigint, raw boolean default false) shows the
status of a single background job by appending a jsonb details column to
the associated row from pg_dist_background_job. If the raw argument is
set, machine readable sizes are used instead of human readable
alternatives.
citus_rebalance_status(raw boolean default false) shows the status of
the last rebalance operation. If the raw argument is set, machine
readable sizes are used instead of human readable alternatives.
DESCRIPTION: Drop `SHARD_STATE_TO_DELETE` and use the cleanup records
instead
Drops the shard state that is used to mark shards as orphaned. Now we
insert cleanup records into `pg_dist_cleanup` so "orphaned" shards will
be dropped either by maintenance daemon or internal cleanup calls. With
this PR, we make the "cleanup orphaned shards" functions no-ops, as
they are not needed anymore.
This PR includes some naming changes about placement functions. We don't
need functions that filter orphaned shards, as there will be no orphaned
shards anymore.
We will also be introducing a small script with this PR, for users with
orphaned shards. We'll basically delete the orphaned shard entries from
`pg_dist_placement` and insert cleanup records into `pg_dist_cleanup`
for each one of them, during Citus upgrade.
We also have a lot of flakiness fixes in this PR.
Co-authored-by: Jelte Fennema <github-tech@jeltef.nl>
When using multiline strings, we occasionally forget to add a single
space at the end of the first line. When this line is concatenated with
the next one, the resulting string has a missing space.
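The same pitfall can be reproduced with adjacent string literals in Python, which are joined the same way as in C. This is only an illustrative sketch; the message text below is made up and is not taken from the Citus source.

```python
# Adjacent string literals are joined with nothing in between,
# so the last word of line one runs into the first word of line two.
broken = ("cannot create distributed table because"
          "the relation is not empty")

# A trailing space at the end of the first line keeps the words apart.
fixed = ("cannot create distributed table because "
         "the relation is not empty")

print(broken)
print(fixed)
```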
DESCRIPTION: Makes sure to disallow triggers that depend on extensions
We were already doing so for `ALTER TRIGGER ... DEPENDS ON EXTENSION`
commands. However, we also need to disallow creating Citus tables
from tables that already have such triggers, so this PR fixes that.
DESCRIPTION: Allow citus_update_node() to work with nodes from different clusters
citus_update_node(), citus_nodename_for_nodeid(), and citus_nodeport_for_nodeid() only checked for nodes in their own clusters. Hence the last two returned NULL and the first one raised an error if the nodeId was from a different cluster.
Fixes https://github.com/citusdata/citus/issues/6433
increasing logical clock. The clock guarantees to never go back in value after restarts,
and makes a best attempt to keep the value close to Unix epoch time in milliseconds.
Also introduces a new GUC "citus.enable_cluster_clock". When it is true, every
distributed transaction is stamped with the logical causal clock and persisted
in the catalog pg_dist_commit_transaction.
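A minimal sketch of one way to get both properties (never going backwards, and staying close to epoch milliseconds). This is not the Citus implementation; the function name and the idea of passing in the last persisted value are assumptions made for illustration.

```python
import time

def next_clock_value(last_persisted_ms, now_ms=None):
    """Return a value strictly greater than the last persisted one.

    Hypothetical helper: when the wall clock is ahead, follow it;
    when it is behind (for example after a restart with a skewed
    clock), keep counting up from the persisted value instead.
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return max(last_persisted_ms + 1, now_ms)

# Wall clock behind the persisted value: the clock still moves forward.
print(next_clock_value(1000, now_ms=500))
# Wall clock ahead: the clock follows epoch time.
print(next_clock_value(1000, now_ms=2000))
```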
DESCRIPTION: Fix bug in global PID assignment for rebalancer
sub-connections
In CI our isolation_shard_rebalancer_progress test would sometimes fail
like this:
```diff
+isolationtester: canceling step s1-rebalance-c1-block-writes after 60 seconds
step s1-rebalance-c1-block-writes:
SELECT rebalance_table_shards('colocated1', shard_transfer_mode:='block_writes');
- <waiting ...>
+
+ERROR: canceling statement due to user request
step s7-get-progress:
```
Source:
https://app.circleci.com/pipelines/github/citusdata/citus/27855/workflows/2a7e335a-f3e8-46ed-b6bd-6920d42f7214/jobs/831710
It turned out this was an actual bug in the way our assigning of global
PIDs interacts with the way we connect to ourselves as the shard
rebalancer. The first command the shard rebalancer sends is a SET
command to change the application_name to `citus_rebalancer`. If
`StartupCitusBackend` is called after this command is processed, then it
overwrites the global PID that was extracted from the previous
application_name. This makes sure that we don't do that, and continue to
use the original global PID. While it might seem that we only call
`StartupCitusBackend` once for each query backend, this isn't actually
the case. Whenever pg_dist_partition gets ANALYZEd by autovacuum
we indirectly call `StartupCitusBackend` again, because we invalidate
the cache then.
In passing this fixes two other things as well:
1. It sets `distributedCommandOriginator` correctly in
`AssignGlobalPID`, by using IsExternalClientBackend(). This doesn't
matter much anymore, since AssignGlobalPID effectively becomes a
no-op in this PR for any non-external client backends.
2. It passes the application_name to InitializeBackendData in
StartupCitusBackend, instead of INVALID_CITUS_INTERNAL_BACKEND_GPID
(which effectively got cast to NULL). In practice this doesn't
change the behaviour of the call, since the call is a no-op for every
backend except the maintenance daemon. And the behaviour of the call
is the same for NULL as for the application_name of the maintenance
daemon.
No need for description, fixing issue introduced with new feature for
11.1
Fixes #6333
Due to Postgres' C API being 0-indexed and Postgres' attributes being
1-indexed, we were reading the wrong Datum as the Task owner when
cancelling. Here we add a test to show the error and fix the off-by-one
error.
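The class of bug can be sketched as follows. The column layout and the constant names are hypothetical and only stand in for the real catalog definition; the point is that a 1-indexed attribute number has to be shifted before it is used on a 0-indexed array.

```python
# Hypothetical 1-indexed attribute numbers, in the style of catalog columns.
ATTR_JOB_ID = 1
ATTR_TASK_ID = 2
ATTR_OWNER = 3
ATTR_STATUS = 4

# Values of one row, stored in a 0-indexed array.
row = [42, 7, "alice", "running"]

# Buggy: uses the attribute number directly and reads the next column.
buggy_owner = row[ATTR_OWNER]

# Fixed: subtract one to convert to a 0-indexed position.
owner = row[ATTR_OWNER - 1]

print(buggy_owner, owner)
```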
DESCRIPTION: Add a rebalancer that uses background tasks for its
execution
Based on the background jobs and tasks introduced in #6296 we implement
a new rebalancer on top of the primitives of background execution. This
allows the user to initiate a rebalance and let Citus execute the long
running steps in the background until completion.
Users can invoke the new background rebalancer with `SELECT
citus_rebalance_start();`. It will output information on its job id and
how to track progress. Also it returns its job id for automation
purposes. If you simply want to wait till the rebalance is done you can
use `SELECT citus_rebalance_wait();`
A running rebalance can be cancelled/stopped with `SELECT
citus_rebalance_stop();`.
DESCRIPTION: Add infrastructure to run long running management operations in background
This infrastructure introduces the primitives of jobs and tasks.
A task consists of a sql statement and an owner. Tasks belong to a
Job and can depend on other tasks from the same job.
When there are either runnable or running tasks we would like to
make sure a background task queue monitor process is running. A Task
could be in running state while there is actually no monitor present
due to a database restart or failover. Once the monitor starts it
will reset any running task to its runnable state.
To make sure only one background task queue monitor is ever running
at once it will acquire an advisory lock that self conflicts.
Once a task is done it will find all tasks depending on this task.
After checking that the task doesn't have unmet dependencies it will
transition the task from blocked to runnable state for the task to
be picked up on a subsequent task start.
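The blocked-to-runnable transition described above can be sketched like this. It is an in-memory illustration only: the `Task` class, the status strings and `mark_done` are hypothetical names, and the real implementation works on catalog rows rather than Python objects.

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    task_id: int
    status: str = "blocked"   # blocked | runnable | running | done
    depends_on: set = field(default_factory=set)

def mark_done(tasks, finished_id):
    """Mark a task done and unblock dependents with no unmet dependencies."""
    tasks[finished_id].status = "done"
    for task in tasks.values():
        # Only blocked tasks that depend on the finished task are candidates.
        if task.status != "blocked" or finished_id not in task.depends_on:
            continue
        if all(tasks[dep].status == "done" for dep in task.depends_on):
            task.status = "runnable"

tasks = {
    1: Task(1, status="running"),
    2: Task(2, status="runnable"),
    3: Task(3, depends_on={1, 2}),
}

mark_done(tasks, 1)
after_first = tasks[3].status    # task 2 is not done yet

tasks[2].status = "running"
mark_done(tasks, 2)
after_second = tasks[3].status   # all dependencies are met now

print(after_first, after_second)
```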
Currently only one task can be running at a time. This can be
improved upon in later releases without changes to the higher level
API.
The initial goal for these background tasks is to allow a rebalance
to run in the background. This will be implemented in a subsequent PR.
DESCRIPTION:
This PR adds support for 'Deferred Drop' and robust 'Shard Cleanup' for Splits.
Common Infrastructure
This PR introduces new common infrastructure so that any operation that wants robust cleanup of resources can register with the cleaner and have the resources cleaned appropriately based on a specified policy. 'Shard Split' is the first consumer using this new infrastructure.
Note: We only support adding 'shards' as resources to be cleaned up right now, but the framework will be extended to support other resources in future.
Deferred Drop for Split
Deferred Drop support ensures that shards undergoing a split are not dropped inline as part of the operation, but are dropped later when no active read queries are running on the shard. This helps in two ways:
* It avoids potential deadlock scenarios that can cause a long-running Split operation to roll back.
* It avoids the Split operation blocking writes and then getting blocked itself (due to running queries on the shard) when trying to drop shards.
Deferred drop is the new default behavior going forward.
Shard Cleaner Extension
Shard Cleaner is a background task responsible for deferred drops in case of 'Move' operations.
The cleaner has been extended to ensure robust cleanup of shards (dummy shards and split children) in case of a failure based on the new infrastructure mentioned above. The cleaner also handles deferred drop for 'Splits'.
TESTING:
New test 'citus_split_shard_by_split_points_deferred_drop' to test deferred drop support.
New test 'failure_split_cleanup' to test shard cleanup with failures in different stages.
Update 'isolation_blocking_shard_split' and 'isolation_non_blocking_shard_split' for deferred drop.
Added non-deferred drop versions of existing tests: 'citus_split_shard_no_deferred_drop' and 'citus_non_blocking_splits_no_deferred_drop'
pg_dist_node and pg_dist_colocation have a primary key index, not a replica identity index.
Citus catalog tables are created in the public schema, where the primary key index serves as
the replica identity index by default. Later the Citus catalog tables are moved to the pg_catalog schema.
During pg_upgrade, all tables are recreated, and given that pg_dist_colocation is found in
pg_catalog schema, it is recreated in that schema, and when it is recreated it doesn't
have a replica identity index, because catalog tables have no replica identity.
Further action:
Do we even need to acquire this lock on the primary key index?
Postgres doesn't acquire such locks on indexes before deleting catalog tuples.
Also, catalog tuples don't have replica identities by definition.
Added create_distributed_table_concurrently, which is a nonblocking variant of create_distributed_table.
It is based on the split API, which takes advantage of logical replication to support nonblocking split operations.
Co-authored-by: Marco Slot <marco.slot@gmail.com>
Co-authored-by: aykutbozkurt <aykut.bozkurt1995@gmail.com>
We currently do a `pg_total_relation_size('t1') + pg_total_relation_size('t2') + ..` on shard lists, especially when rebalancing the shards. In some cases this expression becomes huge. With this PR, we use a single SUM over all table sizes instead of chaining thousands of pluses.
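The difference between the two query shapes can be sketched as below. The helper names and shard names are hypothetical, the names are not escaped, and the exact SQL that Citus generates may differ; this only illustrates replacing a long chain of `+` terms with one aggregate over a VALUES list.

```python
def size_query_with_pluses(shard_names):
    """Old shape: one function call per shard, chained with '+'."""
    terms = [f"pg_total_relation_size('{name}')" for name in shard_names]
    return "SELECT " + " + ".join(terms) + ";"

def size_query_with_sum(shard_names):
    """New shape: a single SUM over a VALUES list of shard names."""
    rows = ", ".join(f"('{name}')" for name in shard_names)
    return ("SELECT SUM(pg_total_relation_size(relid::regclass)) "
            f"FROM (VALUES {rows}) AS q(relid);")

shards = ["t1_102008", "t2_102009", "t3_102010"]
print(size_query_with_pluses(shards))
print(size_query_with_sum(shards))
```

With thousands of shards the first form grows into an expression with thousands of operators, while the second keeps a single function call and a flat list of values.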
There are 3 different ways that a sequence can be interacting
with tables. (1) and (2) are already supported. This commit adds
support for (3).
(1) column DEFAULT nextval('seq'):
The dependency is roughly like below,
and ExpandCitusSupportedTypes() is responsible
for finding the depending sequences.
    schema <--- table <--- column <---- default value
      ^                                     |
      |------------------ sequence <--------|
(2) serial columns (bigserial, smallserial, etc.):
The dependency is roughly like below,
and ExpandCitusSupportedTypes() is responsible
for finding the depending sequences.
    schema <--- table <--- column <---- default value
                             ^              |
                             |              |
                          sequence <--------|
(3) Sequence OWNED BY table.column: Added support for
this type of resolution in this commit.
The dependency is almost like the following, and
ExpandCitusSupportedTypes() is NOT responsible for finding
the dependency.
    schema <--- table <--- column
                             ^
                             |
                          sequence
Object type ids have changed in PG15 because of at least two added
objects in the list: OBJECT_PARAMETER_ACL, OBJECT_PUBLICATION_NAMESPACE
To avoid different output between pg versions, let's use the object
name in the error, and put the object id in the error detail.
Relevant PG commits:
a0ffa885e478f5eeacc4e250e35ce25a4740c487
5a2832465fd8984d089e8c44c094e6900d987fcd
This is a refactoring PR that starts using our new hash table creation
helper function. It adds a few more macros for ease of use, because C
doesn't have default arguments. It also adds a macro to check if a
struct contains automatic padding bytes. No struct that is hashed using
tag_hash should have automatic padding bytes, because those bytes are
undefined and thus using them to create a hash will result in undefined
behaviour (usually a random hash).
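The padding problem can be illustrated with `ctypes`, which lays out structures the way a C compiler would. This is a sketch under assumptions: the struct names and the `has_padding` helper are hypothetical and are not the macro added in this PR.

```python
import ctypes

class PaddedKey(ctypes.Structure):
    # A 1-byte field followed by an 8-byte field: the compiler inserts
    # padding bytes after 'flag' to align 'shard_id'.
    _fields_ = [("flag", ctypes.c_uint8),
                ("shard_id", ctypes.c_uint64)]

class ExplicitKey(ctypes.Structure):
    # Same data, but the filler bytes are a named field that can be
    # zeroed explicitly, so every byte of the struct is defined.
    _fields_ = [("shard_id", ctypes.c_uint64),
                ("flag", ctypes.c_uint8),
                ("unused", ctypes.c_uint8 * 7)]

def has_padding(struct_type):
    """True if the struct is larger than the sum of its fields."""
    field_bytes = sum(ctypes.sizeof(ftype) for _, ftype in struct_type._fields_)
    return ctypes.sizeof(struct_type) != field_bytes

print(has_padding(PaddedKey), has_padding(ExplicitKey))
```

Hashing the raw bytes of a struct like `PaddedKey` would include the unnamed padding bytes, which is the situation the new check is meant to rule out.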
use RecurseObjectDependencies api to find if an object is citus depended
make vanilla tests runnable to see if citus_depended function is working correctly
* Remove if conditions with PG_VERSION_NUM < 13
* Remove server_above_twelve(&eleven) checks from tests
* Fix tests
* Remove pg12 and pg11 alternative test output files
* Remove pg12 specific normalization rules
* Some more if conditions in the code
* Change RemoteCollationIdExpression and some pg12/pg13 comments
* Remove some more normalization rules
This PR makes all of the features open source that were previously only
available in Citus Enterprise.
Features that this adds:
1. Non blocking shard moves/shard rebalancer
(`citus.logical_replication_timeout`)
2. Propagation of CREATE/DROP/ALTER ROLE statements
3. Propagation of GRANT statements
4. Propagation of CLUSTER statements
5. Propagation of ALTER DATABASE ... OWNER TO ...
6. Optimization for COPY when loading JSON to avoid double parsing of
the JSON object (`citus.skip_jsonb_validation_in_copy`)
7. Support for row level security
8. Support for `pg_dist_authinfo`, which allows storing different
authentication options for different users, e.g. you can store
passwords or certificates here.
9. Support for `pg_dist_poolinfo`, which allows using connection poolers
in between coordinator and workers
10. Tracking distributed query execution times using
citus_stat_statements (`citus.stat_statements_max`,
`citus.stat_statements_purge_interval`,
`citus.stat_statements_track`). This is disabled by default.
11. Blocking tenant_isolation
12. Support for `sslkey` and `sslcert` in `citus.node_conninfo`
The error occurs because the jsonb datum in pg_dist_metadata_node.metadata is 0 in some scenarios. This is likely due to not copying the data when receiving a datum from a tuple, and pg deciding to deallocate that memory when the table that the tuple came from is closed.
Also fix another place in the code that might have been susceptible to this issue.
I tested on both multi-vg and multi-1-vg and the tests were successful.