We often change result types of functions slightly. Our downgrade tests
wouldn't notice these changes. This change adds them to the description
of these items.
An example of an SQL change that isn't caught without this change and is
caught with the get_rebalance_progress change in this PR:
https://github.com/citusdata/citus/pull/4963
It was possible to block maintenance daemon by taking an SHARE ROW
EXCLUSIVE lock on pg_dist_placement. Until the lock is released
maintenance daemon would be blocked.
We should not block the maintenance daemon under any case hence now we
try to get the pg_dist_placement lock without waiting, if we cannot get
it then we don't try to drop the old placements.
DESCRIPTION: introduce `citus.local_hostname` GUC for connections to the current node
Citus once in a while needs to connect to itself for some systems operations. This used to be hardcoded to `localhost`. The hardcoded hostname causes some issues, for example in environments where `sslmode=verify-full` is required. It is not always desirable or even feasible to get `localhost` as an alt name on the certificate.
By introducing a GUC to use when connecting to the current instance the user has more control what network path is used and what hostname is required to be present in the server certificate.
Every move in the rebalancer algorithm results in an improvement in the
balance. However, even if the improvement in the balance was very small
the move was still chosen. This is especially problematic if the shard
itself is very big and the move will take a long time.
This changes the rebalancer algorithm to take the relative size of the
balance improvement into account when choosing moves. By default a move
will not be chosen if it improves the balance by less than half of the
size of the shard. An extra argument is added to the rebalancer
functions so that the user can decide to lower the default threshold if
the ignored move is wanted anyway.
* Make VACUUM hint for upgrade scenario actually work
* Suggest using VACUUM if metapage doesn't exist
Plus, suggest upgrading sql version as another option.
* Always force read metapage block
* Fix two typos
* Columnar: introduce columnar storage API.
This new API is responsible for the low-level storage details of
columnar; translating large reads and writes into individual block
reads and writes that respect the page headers and emit WAL. It's also
responsible for the columnar metapage, resource reservations (stripe
IDs, row numbers, and data), and truncation.
This new API is not used yet, but will be used in subsequent
forthcoming commits.
* Columnar: add columnar_storage_info() for debugging purposes.
* Columnar: expose ColumnarMetadataNewStorageId().
* Columnar: always initialize metapage at creation time.
This avoids the complexity of dealing with tables where the metapage
has not yet been initialized.
* Columnar: columnar storage upgrade/downgrade UDFs.
Necessary upgrade/downgrade step so that new code doesn't see an old
metapage.
* Columnar: improve metadata.c comment.
* Columnar: make ColumnarMetapage internal to the storage API.
Callers should not have or need direct access to the metapage.
* Columnar: perform resource reservation using storage API.
* Columnar: implement truncate using storage API.
* Columnar: implement read/write paths with storage API.
* Columnar: add storage tests.
* Revert "Columnar: don't include stripe reservation locks in lock graph."
This reverts commit c3dcd6b9f8.
No longer needed because the columnar storage API takes care of
concurrency for resource reservation.
* Columnar: remove unnecessary lock when reserving.
No longer necessary because the columnar storage API takes care of
concurrent resource reservation.
* Add simple upgrade tests for storage/ branch
* fix multi_extension.out
Co-authored-by: Onur Tirtir <onurcantirtir@gmail.com>
* When moving a shard to a new node ensure there is enough space
* Add WairForMiliseconds time utility
* Add more tests and increase readability
* Remove the retry loop and use a single udf for disk stats
* Address review
* address review
Co-authored-by: Jelte Fennema <github-tech@jeltef.nl>
This allows running the following command to update the expected files
with normalized output files for upgrade tests too:
```bash
cp src/test/regress/{results,expected}/upgrade_rebalance_strategy_before.out
```
The comment of DropMarkedShards described the behaviour that after a
failure we would continue trying to drop other shards. However the code
did not do this and would stop after the first failure. Instead of
simply fixing the comment I fixed the code, because the described
behaviour is more useful. Now a single shard that cannot be removed yet
does not block others from being removed.
We decrease memory usage by:
- Freeing temporary buffers
- Using separate memory context for blocks that uses "small" amount of
memory but can be repeated many times such as loops
Recently two new normalization line deletion rules have been added that
don't match the start of a line:
```
/local tables that are added to metadata but not chained with reference tables via foreign keys might be automatically converted back to postgres tables$/d
/Consider setting citus.enable_local_reference_table_foreign_keys to 'off' to disable this behavior$/d
```
Because `diff-filter` used `regex.match` these lines were not removed
when creating a new diff. This could cause some confusing diffs, where
the wrong lines were shown as changed. This fixes that by using
`regex.search` instead of `regex.match`.
As long as the VALUES clause contains constant values, we should not
recursively plan the queries/CTEs.
This is a follow-up work of #1805. So, we can easily apply OUTER join
checks as if VALUES clause is a reference table/immutable function.
* Fix problews with concurrent calls of DropMarkedShards
When trying to enable `citus.defer_drop_after_shard_move` by default it
turned out that DropMarkedShards was not safe to call concurrently.
This could especially cause big problems when also moving shards at the
same time. During tests it was possible to trigger a state where a shard
that was moved would not be available on any of the nodes anymore after
the move.
Currently DropMarkedShards is only called in production by the
maintenaince deamon. Since this is only a single process triggering such
a race is currently impossible in production settings. In future changes
we will want to call DropMarkedShards from other places too though.
* Add some isolation tests
Co-authored-by: Jelte Fennema <github-tech@jeltef.nl>
This commit adds support for long partition names for distributed tables:
- ALTER TABLE dist_table ATTACH PARTITION ..
- CREATE TABLE .. PARTITION OF dist_table ..
Note: create_distributed_table UDF does not support long table and
partition names, and is not covered in this commit
* Introduce 3 partitioned size udfs
* Add tests for new partition size udfs
* Fix type incompatibilities
* Convert UDFs into pure sql functions
* Fix function comment
ConnParams(AuthInfo and PoolInfo) gets a snapshot, which will block the
remote connectinos to localhost. And the release of snapshot will be
blocked by the snapshot. This leads to a deadlock.
We warm up the conn params hash before starting a new transaction so
that the entries will already be there when we start a new transaction.
Hence GetConnParams will not get a snapshot.
* Columnar: use clause Vars for chunk group filtering.
This solves #4780 and also provides a cleaner separation between chunk
group filtering and projection pushdown.
* Columnar: sort and deduplicate Vars pulled from clauses.
* Columnar: cleanup variable names.
* Columnar: remove alternate test output.
* Columnar: do not recurse when looking for whereClauseVars.
Co-authored-by: Jeff Davis <jefdavi@microsoft.com>
comparable to https://github.com/citusdata/tools/pull/88
this patch adds checks to the perl script running the testing harness of citus to start the postgres instances via the fixopen binary when present to work around `Interrupted System` call errors on OSX Big Sur.
Earlier versions of Citus (pre 9.0) had a bug where a user was able to get in a situation where a foreign key between two non-colocated tables was allowed. This was caused by the wrongful scoping together with only setting to on of a boolean variable in a loop, causing the `true` from an earlier iteration to leak into a new iteration.
This was 'by accident' solved in a refactor that was executed in the preparation of the 9.0 release. Only recently we had a user running into this and it was tracked down to this behaviour.
Given the dire situation a user could get them self into when running into this bug we have backported a fix to the latest 8.3 release branch.
To make sure this regression does not happen anymore in the future I propose we add the tests from the backport to our mainline.
For reference: https://github.com/citusdata/citus/pull/4840
With https://github.com/citusdata/citus/pull/4806 we enabled
2PC for any non-read-only local task. However, if the execution
is a single task, enabling 2PC (CoordinatedTransactionShouldUse2PC)
hits an assertion as we are not in a coordinated transaction.
There is no downside of using a coordinated transaction for single
task local queries.
* Columnar: fix misnamed file.
* Columnar: make compression not dependent on columnar.h.
* Columnar: rename columnar_metadata_tables.c to columnar_metadata.c.
* Columnar: make customscan not depend on columnar.h.
Co-authored-by: Jeff Davis <jefdavi@microsoft.com>
Because setting the flag doesn't necessarily mean that we'll
use 2PC. If connections are read-only, we will not use 2PC.
In other words, we'll use 2PC only for connections that modified
any placements.
Before this commit, Citus used 2PC no matter what kind of
local query execution happens.
For example, if the coordinator has shards (and the workers as well),
even a simple SELECT query could start 2PC:
```SQL
WITH cte_1 AS (SELECT * FROM test LIMIT 10) SELECT count(*) FROM cte_1;
```
In this query, the local execution of the shards (and also intermediate
result reads) triggers the 2PC.
To prevent that, Citus now distinguishes local reads and local writes.
And, Citus switches to 2PC only if a modification happens. This may
still lead to unnecessary 2PCs when there is a local modification
and remote SELECTs only. Though, we handle that separately
via #4587.
Postgres keeps AFTER trigger state for each transaction, because we can have deferred AFTER triggers which will be fired at the end of a transaction. Postgres cleans up this state at the end of transaction.
Postgres processes ON COMMIT triggers after cleaning-up the AFTER trigger states. So if we fire any triggers in ON COMMIT, the AFTER trigger state won't be cleaned-up properly and the transaction state will be left in an inconsistent state, which might result in assertion failure.
So with this commit, we remove foreign keys between columnar metadata tables and enforce constraints between them manually when dropping columnar tables.
* Skip 2PC for readonly connections in a transaction
* Use ConnectionModifiedPlacement() function
* Remove the second check of ConnectionModifiedPlacement()
* Add order by to prevent flaky output
* Test using pg_dist_transaction
With this commit, we make sure to prevent infinite recursion for queries
in the format: [subquery with a UNION ALL] JOIN [table or subquery]
Also, fixes a bug where we pushdown UNION ALL below a JOIN even if the
UNION ALL is not safe to pushdown.
* Reimplement citus_update_table_statistics
* Update stats for the given table not colocation group
* Add tests for reimplemented citus_update_table_statistics
* Use coordinated transaction, merge with citus_shard_sizes functions
* Update the old master_update_table_statistics as well
* Use translated vars in postgres 13 as well
Postgres 13 removed translated vars with pg 13 so we had a special logic
for pg 13. However it had some bug, so now we copy the translated vars
before postgres deletes it. This also simplifies the logic.
* fix rtoffset with pg >= 13
/*
* The physical planner assumes that all worker queries would have
* target list entries based on the fact that at least the column
* on the JOINs have to be on the target list. However, there is
* an exception to that if there is a cartesian product join and
* there is no additional target list entries belong to one side
* of the JOIN. Once we support cartesian product join, we should
* remove this error.
*/
When we use PROCESS_UTILITY_TOPLEVEL it causes some problems when
combined with other extensions such as pg_audit. With this commit we use
PROCESS_UTILITY_QUERY in the codebase to fix those problems.
When executing alter_table / undistribute_table udf's, we should not try
to change sequence dependencies on MX workers if new table wouldn't
require syncing metadata.
Previously, we were checking that for input table. But in some cases, the
fact that input table requires syncing metadata doesn't imply the same
for resulting table (e.g when undistributing a Citus table).
Even more, doing that was giving an unexpected error when undistributing
a Citus table so this commit actually fixes that.
It seems that we need to consider only pseudo constants while doing some
shortcuts in planning. For example there could be a false clause but it
can contribute to the result in which case it will not be a pseudo
constant.
We would exclude tables without relationRestriction from conversion
candidates in local-distributed table joins. This could leave a leftover
local table which should have been converted to a subquery.
Ideally I would expect that in each call to CreateDistributedPlan we
would pass a new plan id, but that seems like a bigger change.
/*
* Colocated intermediate results are just files and not required to use
* the same connections with their co-located shards. So, we are free to
* use any connection we can get.
*
* Also, the current connection re-use logic does not know how to handle
* intermediate results as the intermediate results always truncates the
* existing files. That's why, we use one connection per intermediate
* result.
*/
We do not include dummy column if original task didn't return any
columns.
Otherwise, number of columns that original task returned wouldn't
match number of columns returned by worker_save_query_explain_analyze.
When COPY is used for copying into co-located files, it was
not allowed to use local execution. The primary reason was
Citus treating co-located intermediate results as co-located
shards, and COPY into the distributed table was done via
"format result". And, local execution of such COPY commands
was not implemented.
With this change, we implement support for local execution with
"format result". To do that, we use the buffer for every file
on shardState->copyOutState, similar to how local copy on
shards are implemented. In fact, the logic is similar to
local copy on shards, but instead of writing to the shards,
Citus writes the results to a file.
The logic relies on LOCAL_COPY_FLUSH_THRESHOLD, and flushes
only when the size exceeds the threshold. But, unlike local
copy on shards, in this case we write the headers and footers
just once.
* Sort results in citus_shards and give raw size
Sort results so that it is consistent and also similar to citus_tables.
Use raw size in the output so that doing operations on the size is
easier.
* Change column ordering