DESCRIPTION: Fix reference table lock contention
Dropping and creating reference tables unintentionally blocked on each other due to the use of an ExclusiveLock for both the Drop and conditionally copying existing reference tables to (new) nodes.
The patch does the following:
- Lower lock lever for dropping (reference) tables to `ShareLock` so they don't self conflict
- Treat reference tables and distributed tables equally and acquire the colocation lock when dropping any table that is in a colocation group
- Perform the precondition check for copying reference tables twice, first time with a lower lock that doesn't conflict with anything. Could have been a NoLock, however, in preparation for dropping a colocation group, it is an `AccessShareLock`
During normal operation the first check will always pass and we don't have to escalate that lock. Making it that we won't be blocked on adding and remove reference tables. Only after a node addition the first `create_reference_table` will still need to acquire an `ExclusiveLock` on the colocation group to perform the copy.
There are 3 different ways that a sequence can be interacting
with tables. (1) and (2) are already supported. This commit adds
support for (3).
(1) column DEFAULT nextval('seq'):
The dependency is roughly like below,
and ExpandCitusSupportedTypes() is responsible
for finding the depending sequences.
schema <--- table <--- column <---- default value
^ |
|------------------ sequence <--------|
(2) serial columns: Bigserial/small serial etc:
The dependency is roughly like below,
and ExpandCitusSupportedTypes() is responsible
for finding the depending sequences.
schema <--- table <--- column <---- default value
^ |
| |
sequence <--------|
(3) Sequence OWNED BY table.column: Added support for
this type of resolution in this commit.
The dependency is almost like the following, and
ExpandCitusSupportedTypes() is NOT responsible for finding
the dependency.
schema <--- table <--- column
^
|
sequence
(cherry picked from commit 9ec8e627c1)
For some reason search_path is not always set correctly on the worker
when calling a distributed function, this shows up when calling
`insert_document` in our distributed_triggers test. The underlying
reason is currently unknown and warrants deeper investigation.
Currently this test is one of the main causes for random CI failures. So
this change sets the search_path of each function explicitly, to reduce
these failures. So other devs can be more efficient, while I continue
investigating the root cause of this issue.
Also changes explicit `SET citus.enable_unsafe_triggers = false` to
`RESET citus.enable_unsafe_triggers` in passing.
(cherry picked from commit 6d8c5931d6)
Reported bug #5803 shows that we are currently not sending the IN clause to our planner for columnar. This PR fixes it by checking for ScalarArrayOpExpr in ExtractPushdownClause so that we do not skip it. Also added a test case for this new addition.
It turns out that create_distributed_table
and citus_move/copy_shard_placement does not
work well concurrently.
To fix that, we need to acquire a lock, which
sounds like a good use of colocation lock.
However, the current usage of colocation lock is
limited to higher level UDFs like rebalance_table_shards
etc. Those usage of lock is still useful, but
we cannot acquire the same lock on citus_move_shard_placement
etc. because the coordinator connects to itself to acquire
the lock. Hence, the high level UDF blocks itself.
To fix that, we use one more colocation lock, with the placements
are the main objects to consider.
(cherry picked from commit 12fa3aaf6b)
Before this commit, we required multiple copies of the
same stringInfo if we needed to append/prepend data to
the stringInfo. Now, we optionally get prefix/postfix.
For large string operations, this can save up to %10
memory.
(cherry picked from commit 26fdcb68f0)
Previously, CreateFixPartitionShardIndexNames() created all
the relevant query strings for all the shards, and executed
the large query string. And, in terms of the memory consumption,
this huge command (and its ExprContext generated while running
the command) is the main bottleneck/
With this change, we are reducing the total amount of memory
usage to almost 1/shard_count.
On my local machine, a distributed partitioned table with 120 partitions,
each 32 shards, the total memory consumption reduced from ~3GB
to ~0.1GB. And, the total execution time increased from ~28 seconds
to ~30 seconds. This seems like a good trade-off.
(cherry picked from commit b8008999dc)
DESCRIPTION:
Fix Bug #4949 where Blocking shard moves fails if there is a foreign key between partitioned distributed tables (from child to parent). This is because we try to create constraints before attaching child partitions to parent. This causes constraint failure as parent table will be empty. Fix is to reverse the order i.e. attach partitions before we create constraints.
TESTING:
Added a new test 'shard_move_constraints_blocking' inspired for existing 'shard_move_constraints' where we trigger shard move with 'block_writes' instead of 'force_logical' to add coverage for this scenario.
This PR makes all of the features open source that were previously only
available in Citus Enterprise.
Features that this adds:
1. Non blocking shard moves/shard rebalancer
(`citus.logical_replication_timeout`)
2. Propagation of CREATE/DROP/ALTER ROLE statements
3. Propagation of GRANT statements
4. Propagation of CLUSTER statements
5. Propagation of ALTER DATABASE ... OWNER TO ...
6. Optimization for COPY when loading JSON to avoid double parsing of
the JSON object (`citus.skip_jsonb_validation_in_copy`)
7. Support for row level security
8. Support for `pg_dist_authinfo`, which allows storing different
authentication options for different users, e.g. you can store
passwords or certificates here.
9. Support for `pg_dist_poolinfo`, which allows using connection poolers
in between coordinator and workers
10. Tracking distributed query execution times using
citus_stat_statements (`citus.stat_statements_max`,
`citus.stat_statements_purge_interval`,
`citus.stat_statements_track`). This is disabled by default.
11. Blocking tenant_isolation
12. Support for `sslkey` and `sslcert` in `citus.node_conninfo`
We already have tests relying on citus_finalize_upgrade_to_citus11().
Now, adjust those to rely on citus_finish_citus_upgrade() and
always call citus_finish_citus_upgrade().
We remove `<waiting ...>` and `<... completed>` outputs for some CREATE
INDEX CONCURRENTLY commands since they can cause flakiness in some scenarios.
Postgres calls WaitForOlderSnapshots() and this can cause CREATE INDEX
CONCURRENTLY commands for shards to get blocked by each other for brief
periods of time. The extra waits can pop-up, or they can get completed
at different lines in the output files. To remedy that, we rename those
indexes to be captured by the new normalization rule.
(cherry picked from commit 52541c5802)
The error comes due to the datum jsonb in pg_dist_metadata_node.metadata being 0 in some scenarios. This is likely due to not copying the data when receiving a datum from a tuple and pg deciding to deallocate that memory when the table that the tuple was from is closed.
Also fix another place in the code that might have been susceptible to this issue.
I tested on both multi-vg and multi-1-vg and the test were successful.
(cherry picked from commit beef392f5a)
The general rule is:
If the data is used within the bounds of table_open ... table_close > no need to copy
If the data is required for use even after the table is closed > copy
(cherry picked from commit dc9da7630f)