Columnar: support relation options with ALTER TABLE.
Use ALTER TABLE ... SET/RESET to specify relation options rather than
alter_columnar_table_set() and alter_columnar_table_reset().
Not only is this more ergonomic, but it also allows better integration
because it can be treated like DDL on a regular table. For instance,
citus can use its own ProcessUtility_hook to distribute the new
settings to the shards.
DESCRIPTION: Columnar: support relation options with ALTER TABLE.
In the past (pre-11), we allowed removing worker nodes
that had active placements for replicated distributed
table, without even checking if there are any other
replicas of the same placement.
However, with #5469, we prevent disabling nodes via a hard
error when there is the last active placement of shard, as we
do for reference tables. Note that otherwise, we'd allow
users to lose data.
As of today, the NOTICE is completely irrelevant.
First worker node has a special meaning for modifications on the replicated tables
It is used to acquire a remote lock, such that the modifications are serialized.
With this commit, we make sure that we do not let any distributed query to see a
different 'first worker node' while first worker node is disabled.
Note that, maybe implicitly mentioned above, when first worker node is disabled,
the first worker node changes, that's why we have to handle the situation.
Before this commit, we had:
```SQL
SELECT citus_disable_node(nodename, nodeport, force boolean DEFAULT false)
```
Where, we allow forcing to disable first worker node with
`force:=true`. However, it entails the risk for losing
data / diverging placement data etc.
With `force` flag, we control disabling the first worker node,
and with `async` flag we control whether the changes are done
via bg worker or immediately.
```SQL
SELECT citus_disable_node(nodename, nodeport, force boolean DEFAULT false, sync boolean DEFAULT false)
```
Where we can achieve all the following:
| Mode | Data loss possibility | Can run in 2PC | Handle multiple node failures | Immediately effective |
| --- |--- |--- |--- |--- |
| force:false, sync: false | false | true | true | false |
| force:false, sync: true | false | false | false | true |
| force:true, sync: false | true | true | true | false |
| force:true, sync: true | false | false | false | true |
There are two problems in this area. First, when there are expressions
on the index name, we should call `transformIndexExpression()` before
generating the index name. That is what Postgres does.
Second, because of 40c24bfef9
PG 13 and PG 14 generates different names for indexes with function calls even for local PG tables.
Assume we have:
```SQL
create table t(id int);
select create_distributed_table('t', 'id');
create index ON t (my_very_boring_function(id));
```
On PG 13, the name of the index is `t_expr_idx`
```SQL
\d t
Table "public.t"
┌────────┬─────────┬───────────┬──────────┬─────────┐
│ Column │ Type │ Collation │ Nullable │ Default │
├────────┼─────────┼───────────┼──────────┼─────────┤
│ id │ integer │ │ │ │
└────────┴─────────┴───────────┴──────────┴─────────┘
Indexes:
"t_expr_idx" btree (my_very_boring_function(id::bigint))
```
On PG 14, the name of the index is `t_my_very_boring_function_idx`
```SQL
\d t
Table "public.t"
┌────────┬─────────┬───────────┬──────────┬─────────┐
│ Column │ Type │ Collation │ Nullable │ Default │
├────────┼─────────┼───────────┼──────────┼─────────┤
│ id │ integer │ │ │ │
└────────┴─────────┴───────────┴──────────┴─────────┘
Indexes:
"t_my_very_boring_function_idx" btree (my_very_boring_function(id::bigint))
```
The second issue is not very critical. The important part is that
we adjust regression tests to drop all the indexes, which ensures
the index names are sane on any version.
Over time we have added significantly improved the support for objects to be propagated by Citus as to make scaling out the database more seamless. It became evident that there was a lot of code duplication that got into the codebase to implement the propagation.
This PR tries to reduce the amount of repeated code that is at most only slightly different. To make things worse, most of the differences were actually oversights instead of correct.
This Patch introduces 3 reusable sets of pre/post processing steps for respectively
- create
- alter
- drop
With the use of the common functionality we should have more coherent behaviour between different supported object by Citus.
Some steps either omit the Pre or Post processing step if they would not make sense to include.
All tests pass, only 1 test needed changing, foreign servers, as the dropping of foreign servers didn't implement support for dropping multiple foreign servers at once. Given the common approach correctly supports dropping of multiple objects, either distributed or not, the test that assumed it wouldn't work was now obsolete.
We have a mechanism which ensures that newly distributed
objects are recorded during `alter extension citus update`.
However, the logic was lacking "view"s. With this commit, we make
sure that existing views are also marked as distributed during
upgrade.
Adds support for propagation ALTER VIEW commands to
- Change owner of view
- SET/RESET option
- Rename view and view's column name
- Change schema of the view
Since PG also supports targeting views with ALTER TABLE
commands, related code also added to direct such ALTER TABLE
commands to ALTER VIEW commands while sending them to workers.
Breaking down #5899 into smaller PR-s
This particular PR changes the way TRUNCATE acquires distributed locks on the relations it is truncating to use the LOCK command instead of lock_relation_if_exists. This has the benefit of using pg's recursive locking logic it implements for the LOCK command instead of us having to resolve relation dependencies and lock them explicitly. While this does not directly affect truncate, it will allow us to generalize this locking logic to then log different relations where the pg recursive locking will become useful (e.g. locking views).
This implementation is a bit more complex that it needs to be due to pg not supporting locking foreign tables. We can however, still lock foreign tables with lock_relation_if_exists. So for a command:
TRUNCATE dist_table_1, dist_table_2, foreign_table_1, foreign_table_2, dist_table_3;
We generate and send the following command to all the workers in metadata:
```sql
SEL citus.enable_ddl_propagation TO FALSE;
LOCK dist_table_1, dist_table_2 IN ACCESS EXCLUSIVE MODE;
SELECT lock_relation_if_exists('foreign_table_1', 'ACCESS EXCLUSIVE');
SELECT lock_relation_if_exists('foreign_table_2', 'ACCESS EXCLUSIVE');
LOCK dist_table_3 IN ACCESS EXCLUSIVE MODE;
SEL citus.enable_ddl_propagation TO TRUE;
```
Note that we need to alternate between the lock command and lock_table_if_exists in order to preserve the TRUNCATE order of relations.
When pg supports locking foreign tables, we will be able to massive simplify this logic and send a single LOCK command.
Adds support for propagating create/drop view commands and views to
worker node while scaling out the cluster. Since views are dropped while
converting the table type, metadata connection will be used while
propagating view commands to not switch to sequential mode.
First, it is not needed. Second, in the past we had issues regarding
this: https://github.com/citusdata/citus/pull/4344
When I create 10k tables, ~120K shards, this saves
40Mb of memory during ALTER EXTENSION citus UPDATE.
Before the change: MetadataCacheMemoryContext: 41943040 ~ 40MB
After the change: MetadataCacheMemoryContext: 8192
In the past, for all modifications on the local execution,
we enabled 2PC (with 6a7ed7b309).
This also required us to enable coordinated transactions
via https://github.com/citusdata/citus/pull/4831 .
However, it does have a very substantial impact on the
distributed deadlock detection. The distributed deadlock
detection is designed to avoid single-statement transactions
because they cannot lead to any actual deadlocks.
The implementation is to skip backends without distributed
transactions are assigned. Now that we assign single
statement local executions in the lock graphs, we are
conflicting with the design of distributed deadlock
detection.
In general, we should fix it. However, one might
think that it is not a big deal, even if the processes
show up in the lock graphs, the deadlock detection
should not be causing any false positives. That is
false, unless https://github.com/citusdata/citus/issues/1803
is fixed. Now that local processes are considered as a single
distributed backend, the lock graphs might find:
local execution 1 [tx id: 1] -> any local process [tx id: 0]
any local process [tx id: 0] -> local execution 2 [tx id: 2]
And, decides that there is a distributed deadlock.
This commit is:
(a) right thing to do, as local execuion should not need any
distributed tx id
(b) Eliminates performance issues that might come up with
deadlock detection does a lot of unncessary checks
(c) After moving local execution after the remote execution
via https://github.com/citusdata/citus/pull/4301, the
vauge requirement for assigning distributed tx ids are
already gone.
* Separate build of citus.so and citus_columnar.so.
Because columnar code is statically-linked to both modules, it doesn't
make sense to load them both at once.
A subsequent commit will make the modules entirely separate and allow
loading them both simultaneously.
Author: Yanwen Jin
* Separate citus and citus_columnar modules.
Now the modules are independent. Columnar can be loaded by itself, or
along with citus.
Co-authored-by: Jeff Davis <jefdavi@microsoft.com>
The aim of hiding shards is to hide shards from client applications.
Certain bg workers (such as pg_cron or Citus maintanince daemon)
should be treated like client applications because users can run
queries from such bg workers. And, these bg workers should follow
the similar application_name checks as client backeends.
Certain other bg workers, such as logical replication or postgres'
parallel workers, should never hide shards. They are internal
operations.
Similarly the other backend types like the walsender or
checkpointer or autovacuum should never hide shards.
We've had custom versions of Postgres its `foreach` macro which with a
hidden ListCell for quite some time now. People like these custom
macros, because they are easier to use and require less boilerplate.
This adds similar custom versions of Postgres its `forboth` macro. Now
you don't need ListCells anymore when looping over two lists at the same
time.