* Fix problews with concurrent calls of DropMarkedShards
When trying to enable `citus.defer_drop_after_shard_move` by default it
turned out that DropMarkedShards was not safe to call concurrently.
This could especially cause big problems when also moving shards at the
same time. During tests it was possible to trigger a state where a shard
that was moved would not be available on any of the nodes anymore after
the move.
Currently DropMarkedShards is only called in production by the
maintenaince deamon. Since this is only a single process triggering such
a race is currently impossible in production settings. In future changes
we will want to call DropMarkedShards from other places too though.
* Add some isolation tests
Co-authored-by: Jelte Fennema <github-tech@jeltef.nl>
(cherry picked from commit 93c2dcf3d2)
The first and main issue was that we were putting absolute pointers into
shared memory for the `steps` field of the `ProgressMonitorData`. This
pointer was being overwritten every time a process requested the monitor
steps, which is the only reason why this even worked in the first place.
To quote a part of a relevant stack overflow answer:
> First of all, putting absolute pointers in shared memory segments is
> terrible terible idea - those pointers would only be valid in the
> process that filled in their values. Shared memory segments are not
> guaranteed to attach at the same virtual address in every process.
> On the contrary - they attach where the system deems it possible when
> `shmaddr == NULL` is specified on call to `shmat()`
Source: https://stackoverflow.com/a/10781921/2570866
In this case a race condition occurred when a second process overwrote
the pointer in between the first process its write and read of the steps
field.
This issue is fixed by not storing the pointer in shared memory anymore.
Instead we now calculate it's position every time we need it.
The second race condition I have not been able to trigger, but I found
it while investigating this. This issue was that we published the handle
of the shared memory segment, before we initialized the data in the
steps. This means that during initialization of the data, a call to
`get_rebalance_progress()` could read partial data in an unsynchronized
manner.
(cherry picked from commit ca00b63272)
DESCRIPTION: introduce `citus.local_hostname` GUC for connections to the current node
Citus once in a while needs to connect to itself for some systems operations. This used to be hardcoded to `localhost`. The hardcoded hostname causes some issues, for example in environments where `sslmode=verify-full` is required. It is not always desirable or even feasible to get `localhost` as an alt name on the certificate.
By introducing a GUC to use when connecting to the current instance the user has more control what network path is used and what hostname is required to be present in the server certificate.
Because setting the flag doesn't necessarily mean that we'll
use 2PC. If connections are read-only, we will not use 2PC.
In other words, we'll use 2PC only for connections that modified
any placements.
(cherry picked from commit e65e72130d)
* Reimplement citus_update_table_statistics
* Update stats for the given table not colocation group
* Add tests for reimplemented citus_update_table_statistics
* Use coordinated transaction, merge with citus_shard_sizes functions
* Update the old master_update_table_statistics as well
(cherry picked from commit 2f30614fe3)
With #4338, the executor is smart enough to failover to
local node if there is not enough space in max_connections
for remote connections.
For COPY, the logic is different. With #4034, we made COPY
work with the adaptive connection management slightly
differently. The cause of the difference is that COPY doesn't
know which placements are going to be accessed hence requires
to get connections up-front.
Similarly, COPY decides to use local execution up-front.
With this commit, we change the logic for COPY on local nodes:
Try to reserve a connection to local host. This logic follows
the same logic (e.g., citus.local_shared_pool_size) as the
executor because COPY also relies on TryToIncrementSharedConnectionCounter().
If reservation to local node fails, switch to local execution
Apart from this, if local execution is disabled, we follow the
exact same logic for multi-node Citus. It means that if we are
out of the connection, we'd give an error.
Because master_add_node(or others) might acquire ExclusiveLock
and their initiated sessions may call CoordinatorAddedAsWorkerNode().
With this we prevent potential deadlocks.
* Replace master_add_node with citus_add_node
* Replace master_activate_node with citus_activate_node
* Replace master_add_inactive_node with citus_add_inactive_node
* Use master udfs in old scripts
* Replace master_add_secondary_node with citus_add_secondary_node
* Replace master_disable_node with citus_disable_node
* Replace master_drain_node with citus_drain_node
* Replace master_remove_node with citus_remove_node
* Replace master_set_node_property with citus_set_node_property
* Replace master_unmark_object_distributed with citus_unmark_object_distributed
* Replace master_update_node with citus_update_node
* Replace master_update_shard_statistics with citus_update_shard_statistics
* Replace master_update_table_statistics with citus_update_table_statistics
* Rename master_conninfo_cache_invalidate to citus_conninfo_cache_invalidate
Rename master_dist_local_group_cache_invalidate to citus_dist_local_group_cache_invalidate
* Replace master_copy_shard_placement with citus_copy_shard_placement
* Replace master_move_shard_placement with citus_move_shard_placement
* Rename master_dist_node_cache_invalidate to citus_dist_node_cache_invalidate
* Rename master_dist_object_cache_invalidate to citus_dist_object_cache_invalidate
* Rename master_dist_partition_cache_invalidate to citus_dist_partition_cache_invalidate
* Rename master_dist_placement_cache_invalidate to citus_dist_placement_cache_invalidate
* Rename master_dist_shard_cache_invalidate to citus_dist_shard_cache_invalidate
* Drop master_modify_multiple_shards
* Rename master_drop_all_shards to citus_drop_all_shards
* Drop master_create_distributed_table
* Drop master_create_worker_shards
* Revert old function definitions
* Add missing revoke statement for citus_disable_node
A utility function is added so that each caller can implement a handler
for each index on a given table. This means that the caller doesn't need
to worry about how to access each index, the only thing that it needs to
do each to implement a function to which each index on the table is
passed iteratively.
When distributing a columnar table, as well as changing options on a distributed columnar table, this patch will forward the settings from the coordinator to the workers.
For propagating options changes on an already distributed table this change is pretty straight forward. Before applying the change in options locally we will create a `DDLJob` that contains a call to `alter_columnar_table_set(...)` for every shard placement with all settings of the current table. This goes both for setting an option as well as resetting. This will reset the values to the defaults configured on the coordinator. Having the effect that the coordinator is authoritative on the settings and makes sure the shards have the same settings set as the table on the coordinator.
When a columnar table is distributed it is using the `TableDDLCommand` infra structure to create a new kind of `TableDDLCommand`. This new type, called a `TableDDLCommandFunction` contains a context and 2 function pointers to execute. One function returns the command as applied on the table, the second function will return the sql command to apply to a shard with a given shard id. The schema name is ignored as it will use the fully qualified name of the shard in the same schema as the base table.
Refactor internals on how Citus creates the SQL commands it sends to recreate shards.
Before Citus collected solely ddl commands as `char *`'s to recreate a table. If they were used to create a shard they were wrapped with `worker_apply_shard_ddl_command` and send to the workers. On the workers the UDF wrapping the ddl command would rewrite the parsetree to replace tables names with their shard name equivalent.
This worked well, but poses an issue when adding columnar. Due to limitations in Postgres on creating custom options on table access methods we need to fall back on a UDF to set columnar specific options. Now, to recreate the table, we can not longer rely on having solely DDL statements to recreate a table.
A prototype was made to run this UDF wrapped in `worker_apply_shard_ddl_command`. This became pretty messy, hard to understand and subsequently hard to maintain.
This PR proposes a refactor of the internal representation of table ddl commands into a `TableDDLCommand` structure. The current implementation only supports a `char *` as its contents. Based on the use of the DDL statement (eg. creating the table -mx- or creating a shard) one of two different functions can be called to get the statement to send to the worker:
- `GetTableDDLCommand(TableDDLCommand *command)`: This function returns that ddl command to create the table. In this implementation it will just return the `char *`. This has the same functionality as getting the old list and not wrapping it.
- `GetShardedTableDDLCommand(TableDDLCommand *command, uint64 shardId, char *schemaName)`: This function returns the ddl command wrapped in `worker_apply_shard_ddl_command` with the `shardId` as an argument. Due to backwards compatibility it also accepts a. `schemaName`. The exact purpose is not directly clear. Ideally new implementations would work with fully qualified statements and ignore the `schemaName`.
A future implementation could accept 2.function pointers and a `void *` for context to let the two pointers work on. This gives greater flexibility in controlling what commands get send in which situations. Also, in a future, we could implement the intermediate step of creating the `parsetree` datastructure of statements based on the contents in the catalog with a corresponding deparser. For sharded queries a mutator could be ran over the parsetree to rewrite the tablenames to the names with the shard identifier. This will completely omit the requirement for `worker_apply_shard_ddl_command`.
* Not allow removing a single node with ref tables
We should not allow removing a node if it is the only node in the
cluster and there is a data on it. We have this check for distributed
tables but we didn't have it for reference tables.
* Update src/test/regress/expected/single_node.out
Co-authored-by: Onur Tirtir <onurcantirtir@gmail.com>
* Update src/test/regress/sql/single_node.sql
Co-authored-by: Onur Tirtir <onurcantirtir@gmail.com>
This commit brings following features:
Foreign key support from citus local tables to reference tables
* Foreign key support from reference tables to citus local tables
(only with RESTRICT & NO ACTION behavior)
* ALTER TABLE ENABLE/DISABLE trigger command support
* CREATE/DROP/ALTER trigger command support
and disallows:
* ALTER TABLE ATTACH/DETACH PARTITION commands
* CREATE TABLE <postgres table> ATTACH PARTITION <citus local table>
commands
* Foreign keys from postgres tables to citus local tables
(the other way was already disallowed)
for citus local tables.
Introduce table entry utility functions
Citus table cache entry utilities are introduced so that we can easily
extend existing functionality with minimum changes, specifically changes
to these functions. For example IsNonDistributedTableCacheEntry can be
extended for citus local tables without the need to scan the whole
codebase and update each relevant part.
* Introduce utility functions to find the type of tables
A table type can be a reference table, a hash/range/append distributed
table. Utility methods are created so that we don't have to worry about
how a table is considered as a reference table etc. This also makes it
easy to extend the table types.
* Add IsCitusTableType utilities
* Rename IsCacheEntryCitusTableType -> IsCitusTableTypeCacheEntry
* Change citus table types in some checks
This commit mostly adds pg_get_triggerdef_command to our ruleutils_13.
This doesn't add anything extra for ruleutils 13 so it is basically a copy
of the change on ruleutils_12
With PG13 heap_* (heap_open, heap_close etc) are replaced with table_*
(table_open, table_close etc).
It is better to use the new table access methods in the codebase and
define the macros for the previous versions as we can easily remove the
macro without having to change the codebase when we drop the support for
the old version.
Commits that introduced this change on Postgres:
f25968c49697db673f6cd2a07b3f7626779f1827
e0c4ec07284db817e1f8d9adfb3fffc952252db0
4b21acf522d751ba5b6679df391d5121b6c4a35f
Command to see relevant commits on Postgres side:
git log --all --grep="heap_open"
Pass the list to lnext API
lnext API now expects the list as well.
The commit on Postgres that introduced the change: 1cff1b95ab6ddae32faa3efe0d95a820dbfdc164
lnext_compat and list_delete_cell_compat macros are introduced so that
we can use these macros in the codebase without having to use #if
directives in the codebase.
Related commit on postgres:
1cff1b95ab6ddae32faa3efe0d95a820dbfdc164
Command to search in postgres:
git log --all --grep="list_delete_cell"
add ListCellAndListWrapper
When iterating a list in separate function calls, we need both the list
and the current cell starting from PG13, therefore
ListCellAndListWrapper is added to store both as a wrapper.
Use ListCellAndListWrapper in foreign key test udfs
As we iterate a list in these udfs using a functionContext, we need to
use the wrapper to be able to access both the list and the current cell.
With this patch, we introduce `locally_reserved_shared_connections.c/h` files
which are responsible for reserving some space in shared memory counters
upfront.
We sometimes need to reserve connections, but not necessarily
establish them. For example:
- COPY command should reserve connections as it cannot know which
connections it needs in which order. COPY establishes connections
as any input data hits the workers. For example, for router COPY
command, it only establishes 1 connection.
As discussed here (https://github.com/citusdata/citus/pull/3849#pullrequestreview-431792473),
COPY needs to reserve connections up-front, otherwise we can end
up with resource starvation/un-detected deadlocks.
* Use CalculateUniformHashRangeIndex in HashPartitionId
INT32_MIN definition can change among different platforms hence it is
possible to get overflow, we would see crashes because of this in debian
distros. We have already solved a similar problem with introducing
CalculateUniformHashRangeIndex method, hence to solve it we can use the
same method, this also removes some duplication and has a single place
to decide that.
* Use PG_INT32_XX instead of INT32_XX to be safer
The executor relies on WorkerPool, and many other places rely on WorkerNode.
With this commit, we make sure that they are sorted via the same function/logic.