Commit Graph

2976 Commits (789ff7b162acfe0750d3bd04e2287581f8b7c776)

Author SHA1 Message Date
Nitish Upreti 789ff7b162 Validate relation name before logging it 2022-08-29 18:24:38 -07:00
Nitish Upreti 2c50101074 Update sql script 2022-08-28 17:58:20 -07:00
Nitish Upreti 6348faf7d3 Sort GUC 2022-08-28 17:48:30 -07:00
Nitish Upreti 7280b80ef4 Update tests 2022-08-28 00:08:58 -07:00
Nitish Upreti 5e5a2147cd Fix more tests 2022-08-27 22:19:14 -07:00
Nitish Upreti ef2361f091 Run reindent 2022-08-27 21:24:56 -07:00
Nitish Upreti 59aaed3e5c Fix failing tests 2022-08-27 21:23:17 -07:00
Nitish Upreti 21028434ce Add operation name for drop 2022-08-27 20:12:29 -07:00
Nitish Upreti ce3ae8ff81 Downgrade steps 2022-08-27 18:06:24 -07:00
Nitish Upreti f3a14460e8 Permission check causes tenant isolation failure 2022-08-26 16:08:37 -07:00
Nitish Upreti a7ec398f7a Use recordid sequence always 2022-08-26 16:03:14 -07:00
Nitish Upreti fa1456d14f Fix dummy shard logging bug and update test 2022-08-26 13:50:32 -07:00
Nitish Upreti 3d46860fbb Reindent 2022-08-25 18:45:41 -07:00
Nitish Upreti 919e44eab6 Improvements and comments 2022-08-25 18:42:46 -07:00
Nitish Upreti 92b1cdf6c0 Deferred drop Hello World 2022-08-25 13:18:04 -07:00
Nitish Upreti bf61fe3565 Cleaner Improvement 2022-08-25 09:29:19 -07:00
Nitish Upreti 895fe14040 Initial Commit 2022-08-24 23:55:44 -07:00
Jelte Fennema 31faa88a4e
Track rebalance progress at the shard move level (#6187)
We're in the processes of totally changing the shard rebalancer
experience and infrastructure. Soon the shard rebalancer will include
retries, crash recovery and support for running in the background.

These improvements come at a cost though, the way the
get_rebalance_progress UDF currently works is very hard to replicate
with this new structure. This is mostly because the old behaviour
doesn't really make sense anymore with this new infrastructure. A new
and better way to track the progress will be included as part of the new
infrastructure.

This PR is in preparation of the new code rebalancer experience.
It changes the get_rebalance_progress UDF to only display the moves that
are in progress at the moment, not the ones that happened in the past or
that are planned in the future. Another option would have been to
completely remove the current get_rebalance_progress functionality and
point people to the new way of tracking progress. But old blogposts
still reference the old UDF and users might have some automation on top
of it. Showing the progress of the current moves is fairly simple to
achieve, even with the new infrastructure.

So this PR is a kind of compromise: It doesn't have complete feature
parity with the old get_rebalance_progress, but the most common use
cases will still work.

There's also an advantage of the change: You can now see progress of
shard moves that were triggered by calling citus_move_shard_placement
manually. Instead of only being able to see progress of moves that were
initiated using get_rebalance_table_shards.
2022-08-18 18:57:04 +02:00
Onder Kalaci 9ec8e627c1 Support Sequences owned by columns before distributing tables
There are 3 different ways that a sequence can be interacting
with tables. (1) and (2) are already supported. This commit adds
support for (3).

     (1) column DEFAULT nextval('seq'):

	The dependency is roughly like below,
	and ExpandCitusSupportedTypes() is responsible
	for finding the depending sequences.

        schema <--- table <--- column <---- default value
         ^                                     |
         |------------------ sequence <--------|

    (2) serial columns: Bigserial/small serial etc:

	The dependency is roughly like below,
	and ExpandCitusSupportedTypes() is responsible
	for finding the depending sequences.

        schema <--- table <--- column <---- default value
                                 ^             |
				 |             |
          		     sequence <--------|

   (3) Sequence OWNED BY table.column: Added support for
       this type of resolution in this commit.

       The dependency is almost like the following, and
       ExpandCitusSupportedTypes() is NOT responsible for finding
       the dependency.

        schema <--- table <--- column
                                 ^
				 |
          		     sequence
2022-08-18 10:29:40 +02:00
Naisila Puka 69ffdbf0e3
Uses object name in cannot distribute object error (#6186)
Object type ids have changed in PG15 because of at least two added
objects in the list: OBJECT_PARAMETER_ACL, OBJECT_PUBLICATION_NAMESPACE

To avoid different output between pg versions, let's use the object
name in the error, and put the object id in the error detail.

Relevant PG commits:
a0ffa885e478f5eeacc4e250e35ce25a4740c487
5a2832465fd8984d089e8c44c094e6900d987fcd
2022-08-18 11:05:17 +03:00
Ying Xu 91473635db
[Columnar] Check for existence of Citus before creating Citus_Columnar (#6178)
* Added a check to see if Citus has already been loaded before creating citus_columnar

* added tests
2022-08-17 15:12:42 -07:00
Nils Dijk a9d47a96f6
Fix reference table lock contention (#6173)
DESCRIPTION: Fix reference table lock contention

Dropping and creating reference tables unintentionally blocked on each other due to the use of an ExclusiveLock for both the Drop and conditionally copying existing reference tables to (new) nodes.

The patch does the following:
 - Lower lock lever for dropping (reference) tables to `ShareLock` so they don't self conflict
 - Treat reference tables and distributed tables equally and acquire the colocation lock when dropping any table that is in a colocation group
 - Perform the precondition check for copying reference tables twice, first time with a lower lock that doesn't conflict with anything. Could have been a NoLock, however, in preparation for dropping a colocation group, it is an `AccessShareLock`

During normal operation the first check will always pass and we don't have to escalate that lock. Making it that we won't be blocked on adding and remove reference tables. Only after a node addition the first `create_reference_table` will still need to acquire an `ExclusiveLock` on the colocation group to perform the copy.
2022-08-17 18:19:28 +02:00
Ahmet Gedemenli 0631e1998b
Fix upgrade paths for #6100 (#6176)
* Fix upgrade paths for #6100

Co-authored-by: Hanefi Onaldi <Hanefi.Onaldi@microsoft.com>
2022-08-17 18:56:53 +03:00
Jelte Fennema 3f6ce889eb
Use CreateSimpleHash (and variants) whenever possible (#6177)
This is a refactoring PR that starts using our new hash table creation
helper function. It adds a few more macros for ease of use, because C
doesn't have default arguments. It also adds a macro to check if a
struct contains automatic padding bytes. No struct that is hashed using
tag_hash should have automatic padding bytes, because those bytes are
undefined and thus using them to create a hash will result in undefined
behaviour (usually a random hash).
2022-08-17 13:01:59 +03:00
aykut-bozkurt 52efe08642
default mode for shard splitting is set to auto. (#6179) 2022-08-17 12:18:47 +03:00
aykut-bozkurt be06d65721
Nonblocking tenant isolation is supported by using split api. (#6167) 2022-08-17 11:13:07 +03:00
Jelte Fennema 78a5013e24
Support changing CPU priorities for backends and shard moves (#6126)
**Intro**
This adds support to Citus to change the CPU priority values of
backends. This is created with two main usecases in mind:

1. Users might want to run the logical replication part of the shard moves
   or shard splits at a higher speed than they would do by themselves. 
   This might cause some small loss of DB performance for their regular 
   queries, but this is often worth it. During high load it's very possible
   that the logical replication WAL sender is not able to keep up with the
   WAL that is generated. This is especially a big problem when the
   machine is close to running out of disk when doing a rebalance.
2. Users might have certain long running queries that they don't impact
   their regular workload too much.

**Be very careful!!!**
Using CPU priorities to control scheduling can be helpful in some cases
to control which processes are getting more CPU time than others. 
However, due to an issue called "[priority inversion][1]" it's possible that
using CPU priorities together with the many locks that are used within
Postgres cause the exact opposite behavior of what you intended. This
is why this PR only allows the PG superuser to change the CPU priority 
of its own processes. Currently it's not recommended to set `citus.cpu_priority`
directly. Currently the only recommended interface for users is the setting 
called `citus.cpu_priority_for_logical_replication_senders`. This setting
controls CPU priority for a very limited set of processes (the logical 
replication senders). So, the dangers of priority inversion are also limited
with when using it for this usecase.

**Background**
Before reading the rest it's important to understand some basic
background regarding process CPU priorities, because they are a bit
counter intuitive. A lower priority value, means that the process will
be scheduled more and whatever it's doing will thus complete faster. The
default priority for processes is 0. Valid values are from -20 to 19
inclusive. On Linux a larger difference between values of two processes
will result in a bigger difference in percentage of scheduling.

**Handling the usecases**
Usecase 1 can be achieved by setting `citus.cpu_priority_for_logical_replication_senders`
to the priority value that you want it to have. It's necessary to set
this both on the workers and the coordinator. Example:
```
citus.cpu_priority_for_logical_replication_senders = -10
```

Usecase 2 can with this PR be achieved by running the following as
superuser. Note that this is only possible as superuser currently 
due to the dangers mentioned in the "Be very carefull!!!" section. 
And although this is possible it's **NOT** recommended:
```sql
ALTER USER background_job_user SET citus.cpu_priority = 5;
```

**OS configuration**
To actually make these settings work well it's important to run Postgres
with more a more permissive value for the 'nice' resource limit than
Linux will do by default. By default Linux will not allow a process to
set its priority lower than it currently is, even if it was lower when
the process originally started. This capability is necessary to reset
the CPU priority to its original value after a transaction finishes.
Depending on how you run Postgres this needs to be done in one of two
ways:

If you use systemd to start Postgres all you have to do is add  a line
like this to the systemd service file:
```conf
LimitNice=+0 # the + is important, otherwise its interpreted incorrectly as 20
```

If that's not the case you'll have to configure `/etc/security/limits.conf` 
like so, assuming that you are running Postgres as the `postgres` OS user:
```
postgres            soft    nice            0
postgres            hard    nice            0
```
Finally you'd have add the following line to `/etc/pam.d/common-session`
```
session required pam_limits.so
```

These settings would allow to change the priority back after setting it
to a higher value.

However, to actually allow you to set priorities even lower than the
default priority value you would need to change the values in the 
config to something lower than 0. So for example:
```conf
LimitNice=-10
```

or

```
postgres            soft    nice            -10
postgres            hard    nice            -10
```

If you use WSL2 you'll likely have to do another thing. You have to 
open a new shell, because when PAM is only used during login, and 
WSL2 doesn't actually log you in. You can force a login like this:
```
sudo su $USER --shell /bin/bash
```
Source: https://stackoverflow.com/a/68322992/2570866

[1]: https://en.wikipedia.org/wiki/Priority_inversion
2022-08-16 13:07:17 +03:00
Jelte Fennema 1a01c896f0
Fix description of citus.distributed_deadlock_detection_factor (#5860)
The long description of the `citus.distributed_deadlock_detection_factor` 
setting was incorrectly stating that 1000 would disable it. Instead -1 
is the value that disables distributed deadlock detection.
2022-08-16 01:19:49 +03:00
Jelte Fennema 43c2a1e88b
Share more code between splits and moves (#6152)
When introducing non-blocking shard split functionality it was based
heavily on the non-blocking shard moves. However, differences between
usage was slightly to big to be able to reuse the existing functions
easily. So, most logical replication code was simply copied to dedicated
shard split functions and modified for that purpose.

This PR tries to create a more generic logical replication
infrastructure that can be used by both shard splits and shard moves.
There's probably more code sharing possible in the future, but I believe
this is at least a good start and addresses the lowest hanging fruit.

This also adds a CreateSimpleHash function that makes creating the
most common type of hashmap common.
2022-08-15 20:21:51 +03:00
Marco Slot 6c73576606 Fix HTAB memory leaks 2022-08-15 16:10:24 +02:00
Teja Mupparti e962113c63 Remove the GUC mention in the error message as this config is meant for advanced users 2022-08-11 09:43:14 -07:00
aykut-bozkurt 898801504e
sysid should be parsed as int. (#6150) 2022-08-11 10:44:46 +03:00
aykut-bozkurt 166272963a
log NOTICE createdb only if EnableUnsupportedFeatureMessages GUC is enabled. (#6151) 2022-08-09 21:21:22 +03:00
aykut-bozkurt cc694b6bcf
we consider stat object as invalid if it is not owned by current user (#6130) 2022-08-09 20:59:30 +03:00
Hanefi Onaldi a58523f1d8
Remove all references to .source files 2022-08-09 14:15:52 +03:00
Jelte Fennema 8017693b2f
Allow specifying the shard_transfer_mode when replicating reference tables (#6070)
When using `citus.replicate_reference_tables_on_activate = off`,
reference tables need to be replicated later. This can be done using the
`replicate_reference_tables()` UDF. However, this function only allowed
blocking replication. This changes the function to default to logical
replication instead, and allows choosing any of our existing shard
transfer modes.
2022-08-09 13:21:31 +03:00
Marco Slot 3b57ff2867 Fix crash in citus_copy_shard_placement 2022-08-09 09:31:05 +02:00
Jelte Fennema dd548ee3c7
Use faster custom copy logic for non-blocking shard moves (#6119)
DESCRIPTION: Use faster custom copy logic for non-blocking shard moves

Non-blocking shard moves consist of two main phases:
1. Initial data copy
2. Catchup phase

This changes the first of these phases significantly. Previously we used the
copy logic provided by postgres subscriptions. This meant we didn't have
to implement it ourselves, but it came with the downside of little control.
When implementing shard splits we needed more control to even make it
work, so we implemented our own logic for copying data between nodes.

This PR starts using that logic for non-blocking shard moves. Doing so
has four main advantages:
1. It uses COPY in binary format when possible, which is cheaper to encode 
    and decode. Furthermore it very often results in less data that needs to 
    be sent over the network.
2. It allows us to create the primary key (or other replica identity) after doing
    the initial data copy. This should give some speed up over the total run,
    because creating an index is bulk is much faster than incrementally building it.
3. It doesn't require a replication slot per parallel copy. Increasing the maximum
    number of replication slots uses resources in postgres, even if they are not used.
    So reducing the number of replication slots that shard moves need is nice.
4. Logical replication table_sync workers are slow to start up, so if lots of shards
    need to be copied that can make it quite slow. This can happen easily when
    combining Postgres partitioning with Citus.
2022-08-08 17:09:43 +02:00
Marco Slot ead9d28835 Avoid deadlocks on split failure by closing connections 2022-08-08 13:33:23 +02:00
Marco Slot 044dd26e40 Reimplement tenant isolation on top of block shard split 2022-08-08 13:33:23 +02:00
Teja Mupparti 430c201d03 get_current_transaction_id() UDF is not printing the timestamp of the current transaction on the coordinator even when non-null 2022-08-05 10:12:07 -07:00
aykut-bozkurt 4992533e33
support grant statement propagation for aggregates (#6132) 2022-08-05 14:47:33 +03:00
Ahmet Gedemenli 8b68b0b5bb
Fix pg upgrade script for foreign tables (#6100)
Fixes unexpected error for foreign tables when upgrading pg
2022-08-05 13:35:17 +03:00
Sameer Awasekar e236711eea Introduce Non-Blocking Shard Split Workflow 2022-08-04 16:32:38 +02:00
aykut-bozkurt b67abdd28c
we should not log error in preprocess if attached partition is missing. (#6131) 2022-08-04 15:49:14 +03:00
aykut-bozkurt 3ddc089651
stop distributing views with no distributed dependency if GUC DistributeLocalViews is set false. (#6083) 2022-08-04 12:34:40 +03:00
aykut-bozkurt 4ffe436bf9
we validate constraint as well if the statement is alter domain drop constraint (#6125) 2022-08-03 23:06:33 +03:00
aykut-bozkurt a662331668
qualify text dict and conf respect missingok (#6120) 2022-08-03 13:13:53 +03:00
aykutbozkurt 7387c7ed3d address method should take parameter isPostprocess 2022-08-02 21:00:23 +03:00
aykutbozkurt c98a68662a introduces operation type for dist ops 2022-08-02 20:42:32 +03:00