Commit Graph

128 Commits (906583f81151820e81ef25abd7fc2f95c5b120d4)

Author SHA1 Message Date
eaydingol ce9a7b7c79 Run citus tests on latest so, specified sql 2025-11-25 16:16:42 +03:00
Onur Tirtir 2e1de77744
Also use pid in valgrind logfile name (#8150)
Also use the pid in the valgrind logfile name to avoid overwriting the
valgrind logs when memory errors happen concurrently in different
processes:

(from https://valgrind.org/docs/manual/manual-core.html)
```
--log-file=<filename>
Specifies that Valgrind should send all of its messages to the specified file. If the file name is empty, it causes an abort. There are three special format specifiers that can be used in the file name.

%p is replaced with the current process ID. This is very useful for program that invoke multiple processes. WARNING: If you use --trace-children=yes and your program invokes multiple processes OR your program forks without calling exec afterwards, and you don't use this specifier (or the %q specifier below), the Valgrind output from all those processes will go into one file, possibly jumbled up, and possibly incomplete.
```

With this change, valgrind tests will generate many output files under
"src/test/regress" with the same prefix (citus_valgrind_test_log.txt by
default), which looks a bit ugly; but one can use
`cat src/test/regress/citus_valgrind_test_log.txt.[0-9]*` or similar to
combine them into a single valgrind log file later.
2025-08-27 14:01:25 +00:00
Muhammad Usama be6668e440
Snapshot-Based Node Split – Foundation and Core Implementation (#8122)
**DESCRIPTION:**
This pull request introduces the foundation and core logic for the
snapshot-based node split feature in Citus. This feature enables
promoting a streaming replica (referred to as a clone in this feature
and UI) to a primary node and rebalancing shards between the original
and the newly promoted node without requiring a full data copy.

This significantly reduces rebalance times for scale-out operations
where the new node already contains a full copy of the data via
streaming replication.

Key Highlights:
**1. Replica (Clone) Registration & Management Infrastructure**

Introduces a new set of UDFs to register and manage clone nodes:

- citus_add_clone_node()
- citus_add_clone_node_with_nodeid()
- citus_remove_clone_node()
- citus_remove_clone_node_with_nodeid()

These functions allow administrators to register a streaming replica of
an existing worker node as a clone, making it eligible for later
promotion via snapshot-based split.
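
For instance, a hedged sketch of registering a clone (host and port values are
hypothetical; the argument order follows the usage guide below):

```sql
-- Register the streaming replica at 127.0.0.1:5453 as a clone of the
-- worker at 127.0.0.1:5433
SELECT * FROM citus_add_clone_node('127.0.0.1', 5453, '127.0.0.1', 5433);
```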

**2. Snapshot-Based Node Split (Core Implementation)**
New core UDF: 

- citus_promote_clone_and_rebalance()

This function implements the full workflow to promote a clone and
rebalance shards between the old and new primaries. Steps include:

1. Ensuring Exclusivity – Blocks any concurrent placement-changing
operations.
2. Blocking Writes – Temporarily blocks writes on the primary to ensure
consistency.
3. Replica Catch-up – Waits for the replica to be fully in sync.
4. Promotion – Promotes the replica to a primary using pg_promote.
5. Metadata Update – Updates metadata to reflect the newly promoted
primary node.
6. Shard Rebalancing – Redistributes shards between the old and new
primary nodes.


**3. Split Plan Preview**
A new helper UDF get_snapshot_based_node_split_plan() provides a preview
of the shard distribution post-split, without executing the promotion.

**Example:**

```
reb 63796> select * from pg_catalog.get_snapshot_based_node_split_plan('127.0.0.1',5433,'127.0.0.1',5453);
  table_name  | shardid | shard_size | placement_node 
--------------+---------+------------+----------------
 companies    |  102008 |          0 | Primary Node
 campaigns    |  102010 |          0 | Primary Node
 ads          |  102012 |          0 | Primary Node
 mscompanies  |  102014 |          0 | Primary Node
 mscampaigns  |  102016 |          0 | Primary Node
 msads        |  102018 |          0 | Primary Node
 mscompanies2 |  102020 |          0 | Primary Node
 mscampaigns2 |  102022 |          0 | Primary Node
 msads2       |  102024 |          0 | Primary Node
 companies    |  102009 |          0 | Clone Node
 campaigns    |  102011 |          0 | Clone Node
 ads          |  102013 |          0 | Clone Node
 mscompanies  |  102015 |          0 | Clone Node
 mscampaigns  |  102017 |          0 | Clone Node
 msads        |  102019 |          0 | Clone Node
 mscompanies2 |  102021 |          0 | Clone Node
 mscampaigns2 |  102023 |          0 | Clone Node
 msads2       |  102025 |          0 | Clone Node
(18 rows)

```
**4. Test Infrastructure Enhancements**

- Added a new test case scheduler for snapshot-based split scenarios.
- Enhanced pg_regress_multi.pl to support creating node backups with
slightly modified options to simulate real-world backup-based clone
creation.

**5. Usage Guide**
The snapshot-based node split can be performed using the following
workflow:

**- Take a Backup of the Worker Node**
Run pg_basebackup (or an equivalent tool) against the existing worker
node to create a physical backup.

`pg_basebackup -h <primary_worker_host> -p <port> -D /path/to/replica/data --write-recovery-conf`

**- Start the Replica Node**
Start PostgreSQL on the replica using the backup data directory,
ensuring it is configured as a streaming replica of the original worker
node.

**- Register the Backup Node as a Clone**
Register the replica as a clone of its original worker node:

`SELECT * FROM citus_add_clone_node('<clone_host>', <clone_port>, '<primary_host>', <primary_port>);`

**- Promote and Rebalance the Clone**
Promote the clone to a primary and rebalance shards between it and the
original worker:

`SELECT * FROM citus_promote_clone_and_rebalance(<clone_node_id>);`

**- Drop Any Replication Slots from the Original Worker**
After promotion, clean up any unused replication slots from the original
worker:

`SELECT pg_drop_replication_slot('<slot_name>');`
2025-08-19 14:13:55 +03:00
SongYoungUk 743c9bbf87
fix #7715 - add assign hook for CDC library path adjustment (#8025)
DESCRIPTION: Automatically updates dynamic_library_path when CDC is
enabled

fix : #7715 

According to the documentation and `pg_settings`, the context of the
`citus.enable_change_data_capture` parameter is user.

However, changing this parameter — even as a superuser — doesn't work as
expected: while the initial copy phase works correctly, subsequent
change events are not propagated.

This appears to be due to the fact that `dynamic_library_path` is only
updated to `$libdir/citus_decoders:$libdir` when the server is restarted
and the `_PG_init` function is invoked.

To address this, I added an `EnableChangeDataCaptureAssignHook` that
automatically updates `dynamic_library_path` at runtime when
`citus.enable_change_data_capture` is enabled, ensuring that the CDC
decoder libraries are properly loaded.

Note that `dynamic_library_path` is already a `superuser`-context
parameter in base PostgreSQL, so updating it from within the assign hook
should be safe and consistent with PostgreSQL’s configuration model.
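
A minimal sketch of the resulting runtime behavior (the path value in the
comment is the one mentioned above):

```sql
SET citus.enable_change_data_capture = on;
-- dynamic_library_path is now expected to include $libdir/citus_decoders
SHOW dynamic_library_path;
```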

If there’s any reason this approach might be problematic or if there’s a
preferred alternative, I’d appreciate any feedback.




cc. @jy-min

---------

Co-authored-by: Hanefi Onaldi <Hanefi.Onaldi@microsoft.com>
Co-authored-by: ibrahim halatci <ihalatci@gmail.com>
2025-07-18 11:07:17 +03:00
Onur Tirtir 3d61c4dc71
Add citus_stat_counters view and citus_stat_counters_reset() function to reset it (#7917)
DESCRIPTION: Adds citus_stat_counters view that can be used to query
stat counters that Citus collects while the feature is enabled, which is
controlled by citus.enable_stat_counters. citus_stat_counters() can be
used to query the stat counters for the provided database oid and
citus_stat_counters_reset() can be used to reset them for the provided
database oid or for the current database if nothing or 0 is provided.
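
A short sketch of the interface described above (enabling the GUC per-session
is shown only for illustration; calling the reset function with no argument
targets the current database):

```sql
SET citus.enable_stat_counters TO on;
SELECT * FROM citus_stat_counters;     -- view over the collected counters
SELECT citus_stat_counters_reset();    -- reset counters for the current database
```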

Today we don't persist stat counters on server shutdown. In other words,
stat counters are automatically reset in case of a server restart.

Details on the underlying design can be found in header comment of
stat_counters.c and in the technical readme.

-------

Here are the details about what we track as of this PR:

For connection management, we have three statistics about the inter-node
connections initiated by the node itself:

* **connection_establishment_succeeded**
* **connection_establishment_failed**
* **connection_reused**

While the first two are relatively easier to understand, the third one
covers the case where a connection is reused. This can happen when a
connection was already established to the desired node, Citus decided to
cache it for some time (see citus.max_cached_conns_per_worker &
citus.max_cached_connection_lifetime), and then reused it for a new
remote operation. Here are the other important details about these
connection statistics:

1. connection_establishment_failed doesn't care about the connections
that we could establish but are lost later in the transaction. Plus, we
cannot guarantee that the connections that are counted in
connection_establishment_succeeded were not lost later.
2. connection_establishment_failed doesn't care about the optional
connections (see OPTIONAL_CONNECTION flag) that we gave up establishing
because of the connection throttling rules we follow (see
citus.max_shared_pool_size & citus.local_shared_pool_size). The reason
for this is that we didn't even try to establish these connections.
3. For the rest of the cases where a connection failed for some reason,
we always increment connection_establishment_failed even if the caller
was okay with the failure and knows how to recover from it (e.g., the
adaptive executor knows how to fall back to local execution when the target
node is the local node and it cannot establish a connection to the
local node). The reason is that even if it's likely that we can still
serve the operation, we still failed to establish the connection and we
want to track this.
4. Finally, the connection failures that we count in
connection_establishment_failed might be caused by any of the following
reasons and for now we prefer to _not_ further distinguish them for
simplicity:
a. remote node is down or cannot accept any more connections, or
overloaded such that citus.node_connection_timeout is not enough to
establish a connection
b. any internal Citus error that might result in preparing a bad
connection string so that libpq fails when parsing the connection string
even before actually trying to establish a connection via connect() call
c. broken citus.node_conninfo or such Citus configuration that was
incorrectly set by the user can also result in similar outcomes as in b
d. internal waitevent set / poll errors or OOM in local node

We also track two more statistics for query execution:

* **query_execution_single_shard**
* **query_execution_multi_shard**

And more importantly, both query_execution_single_shard and
query_execution_multi_shard are not only tracked for the top-level
queries but also for the subplans etc. The reason is that for some
queries, e.g., the ones that go through recursive planning, after Citus
performs the heavy work as part of subplans, the work that needs to be
done for the top-level query becomes quite straightforward. And for such
query types, it would be deceiving if we only incremented the query stat
counters for the top-level query. Similarly, for non-pushable INSERT ..
SELECT and MERGE queries, we perform separate counter increments for the
SELECT / source part of the query besides the final INSERT / MERGE
query.
2025-04-28 12:23:52 +00:00
eaydingol 117bd1d04f
Disable nonmaindb interface (#7905)
DESCRIPTION: The PR disables the non-main db related features. 

The non-main db related features were introduced in
https://github.com/citusdata/citus/pull/7203.
2025-02-21 13:36:19 +03:00
Jelte Fennema-Nio 8c9de08b76
Fix CI issues after Github Actions networking changes (#7624)
For some reason using localhost in our hba file doesn't have the
intended effect anymore in our Github Actions runners. Probably because
of some networking change (IPv6 maybe) or some change in the
`/etc/hosts` file.

Replacing localhost with the equivalent loopback IPv4 and IPv6 addresses
resolved this issue.
2024-06-14 16:20:23 +02:00
Karina 41d99249d9
Use expecteddir option when running vanilla tests (#7573)
In PostgreSQL 16 a new option expecteddir was introduced to pg_regress.
Together with the fix in
[196eeb6b](https://github.com/postgres/postgres/commit/196eeb6b) it
causes a check-vanilla failure if expecteddir is not specified.

Co-authored-by: Karina Litskevich <litskevichkarina@gmail.com>
2024-04-10 16:08:54 +00:00
Halil Ozan Akgül b877d606c7
Adds 2PC distributed commands from other databases (#7203)
DESCRIPTION: Adds support for 2PC from non-Citus main databases

This PR only adds support for `CREATE USER` queries; other queries need
to be added. But that should be simple because this PR creates the
underlying structure.

The Citus main database is the database where the Citus extension is
created. Non-main databases are all the other databases that are on the
same node as the Citus main database.

When a `CREATE USER` query is run on a non-main database we:

1. Run `start_management_transaction` on the main database. This
function saves the outer transaction's xid (the non-main database
query's transaction id) and marks the current query as main db command.
2. Run `execute_command_on_remote_nodes_as_user("CREATE USER
<username>", <username to run the command>)` on the main database. This
function creates the users in the rest of the cluster by running the
query on the other nodes. The user on the current node is created by the
outer, non-main db query itself, to make sure subsequent commands in the
same transaction can see this user.
3. Run `mark_object_distributed` on the main database. This function
adds the user to `pg_dist_object` in all of the nodes, including the
current one.

This PR also implements transaction recovery for the queries from
non-main databases.
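
From the client's perspective, the triggering statement is just a plain
command on the non-main database (the role name is hypothetical):

```sql
-- Run while connected to a non-main database (one without the Citus extension);
-- per the steps above it is forwarded through the main database and propagated
-- to all nodes with 2PC.
CREATE USER app_user;
```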
2023-12-22 19:19:41 +03:00
Gürkan İndibay 3b556cb5ed
Adds create / drop database propagation support (#7240)
DESCRIPTION: Adds support for propagating `CREATE`/`DROP` database

In this PR, create and drop database support is added.

For CREATE DATABASE:
* "oid" option is not supported
* specifying "strategy" to be different than "wal_log" is not supported
* specifying "template" to be different than "template1" is not
supported

The last two are because those are not saved in `pg_database` and when
activating a node, we cannot assume what parameters were provided when
creating the database.

And "oid" is not supported because whether user specified an arbitrary
oid when creating the database is not saved in pg_database and we want
to avoid from oid collisions that might arise from attempting to use an
auto-assigned oid on workers.

Finally, in case of node activation, GRANTs for the database are also
propagated.
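
Hypothetical statements illustrating the restrictions above (database and
template names are made up):

```sql
CREATE DATABASE analytics;                        -- propagated: default strategy and template
CREATE DATABASE analytics_copy TEMPLATE my_tmpl;  -- rejected: only template1 is supported
```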

---------

Co-authored-by: Jelte Fennema-Nio <github-tech@jeltef.nl>
Co-authored-by: Jelte Fennema-Nio <jelte.fennema@microsoft.com>
Co-authored-by: Onur Tirtir <onurcantirtir@gmail.com>
2023-11-21 16:43:51 +03:00
Japin Li e14e8667cc
Fix redundant variable declaration (#7353)
The `$workerCount` variable was declared twice in
src/test/regress/pg_regress_multi.pl.
2023-11-17 13:01:23 +03:00
Naisila Puka 0d1f18862b
Propagates SECURITY LABEL ON ROLE stmt (#7304)
We propagate `SECURITY LABEL [for provider] ON ROLE rolename IS
labelname` to the worker nodes.
We also make sure to run the relevant `SecLabelStmt` commands on a
newly added node by looking at roles found in `pg_shseclabel`.

See official docs for explanation on how this command works:
https://www.postgresql.org/docs/current/sql-security-label.html
This command stores the role label in the `pg_shseclabel` catalog table.
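
An illustrative statement of the kind that is now propagated (the provider,
role, and label names are hypothetical and require a registered label
provider):

```sql
SECURITY LABEL FOR dummy_provider ON ROLE some_role IS 'classified';
```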

This commit also fixes the regex string in
`check_gucs_are_alphabetically_sorted.sh` script such that it escapes
the dot. Previously it was looking for all strings starting with "citus"
instead of "citus." as it should.

To test this feature, I currently make use of a special GUC to control
label provider registration in PG_init when creating the Citus extension.
2023-11-16 13:12:30 +03:00
Halil Ozan Akgül 2c7beee562
Fix citus.tenant_stats_limit test by setting it to 2 (#6899)
citus.tenant_stats_limit was set to 2 when we were adding tests for it.
Then we changed it to 10, making the tests incorrect.
This PR fixes that without breaking other tests.
2023-05-23 17:44:07 +03:00
Halil Ozan Akgül 52ad2d08c7
Multi tenant monitoring (#6725)
DESCRIPTION: Adds views that monitor statistics on tenant usages

This PR adds `citus_stats_tenants` view that monitors the tenants on the
cluster.

`citus_stats_tenants` shows the node id, colocation id, tenant
attribute, read count in this period and last period, and query count in
this period and last period of the tenant.
The tenant attribute currently is the tenant's distribution column value;
later, when schema based sharding is introduced, this meaning might
change.
A period is a time bucket the queries are counted by. Read and query
counts for this period can increase until the current period ends. After
that those counts are moved to last period's counts, which cannot
change. The period length can be set using 'citus.stats_tenants_period'.

`SELECT` queries are counted as _read_ queries; `INSERT`, `UPDATE` and
`DELETE` queries are counted as _write_ queries. So in the view, read
counts are `SELECT` counts and query counts are `SELECT`, `INSERT`,
`UPDATE` and `DELETE` counts.

The data is stored in shared memory, in a struct named
`MultiTenantMonitor`.

`citus_stats_tenants` shows the data from local tenants.

`citus_stats_tenants` shows up to `citus.stats_tenant_limit` tenants.
The tenants are scored based on the number of queries they run and the
recency of those queries. Every query run increases the score of the
tenant by `ONE_QUERY_SCORE`, and after every period ends the scores are
halved. Halving is done lazily.
To retain information longer, the monitor keeps up to 3 times
`citus.stats_tenant_limit` tenants. When the tenant count hits `3 *
citus.stats_tenant_limit`, the last `citus.stats_tenant_limit` tenants are
removed. To see all stored tenants you can use
`citus_stats_tenants(return_all_tenants := true)`
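
A short sketch of querying the monitor using the names given above:

```sql
SELECT * FROM citus_stats_tenants;                               -- top tenants on this node
SELECT * FROM citus_stats_tenants(return_all_tenants := true);   -- all stored tenants
```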

- [x] Create collector view that gets data from all nodes. #6761 
- [x] Add monitoring log #6762 
- [x] Create enable/disable GUC #6769 
- [x] Parse the annotation string correctly #6796 
- [x] Add local queries and prepared statements #6797
- [x] Rename to citus_stat_statements #6821 
- [x] Run pgbench
- [x] Fix role permissions #6812

---------

Co-authored-by: Gokhan Gulbiz <ggulbiz@gmail.com>
Co-authored-by: Jelte Fennema <github-tech@jeltef.nl>
2023-04-05 17:44:17 +03:00
rajeshkt78 85b8a2c7a1
CDC implementation for Citus using Logical Replication (#6623)
Description:
Implements CDC using Logical Replication, avoiding re-publishing events
multiple times by setting up a replication origin session, which adds
"DoNotReplicateId" to every WAL entry generated during:
   - shard splits
   - shard moves
   - create distributed table
   - undistribute table
   - alter distributed tables (for some cases)
   - reference table operations
   

The citus decoder, which decodes WAL events for CDC clients, ignores any
WAL entry whose replication origin is not zero. It also maps shard names
to distributed table names.
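
As a rough illustration of consuming decoded WAL events through a logical
replication slot (the slot name is hypothetical, and the built-in
`test_decoding` plugin is only a stand-in for the Citus decoder shipped under
`citus_decoders`):

```sql
-- Not the Citus decoder itself: test_decoding stands in to show how a
-- CDC client creates a slot and reads decoded changes from it.
SELECT pg_create_logical_replication_slot('citus_cdc_demo', 'test_decoding');
SELECT * FROM pg_logical_slot_get_changes('citus_cdc_demo', NULL, NULL);
```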
2023-03-28 16:00:21 +05:30
aykut-bozkurt ea3093bdb6
Make workerCount configurable for regression tests (#6764)
Make worker count flexible in our regression tests instead of hardcoding
it to 2 workers.
2023-03-20 12:06:31 +03:00
Jelte Fennema aa9cd16d15
Use correct guc value to disable statistics collection (#6641)
The `citus.enable_statistics_collection` GUC is a boolean, not an integer
one. Setting it to `-1` showed errors in the logs.
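
For example, a sketch of the boolean form (applying it via `ALTER SYSTEM` is
just one way to set it):

```sql
-- Disable statistics collection with a boolean value instead of -1
ALTER SYSTEM SET citus.enable_statistics_collection = off;
SELECT pg_reload_conf();
```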
2023-01-24 15:32:50 +01:00
Jelte Fennema 93fcc5c5d8
Move tablespace directory creation to pg_regress_multi.pl (#6629)
Multiple `check-xxx` targets create tablespaces. If you run
two of these at the same time you would get an error like:

```diff
CREATE TABLESPACE test_tablespace LOCATION :'test_tablespace';
+ERROR:  directory "/home/rajesh/citus/citus/src/test/regress/tmp_check/ts0/PG_14_202107181" already in use as a tablespace
```

This fixes that by moving tablespace directory creation and removal to
pg_regress_multi.pl instead of doing it in the Makefile.
2023-01-20 12:34:33 +00:00
aykut-bozkurt 8be4ce546e
fix vanilla test status on CI (#6555)
- Because of the make command used for vanilla tests, test status is
always shown as success on CI. As a fix, I added `&& false` at the end of
the command that copies the diff file, to make the command fail when
check-vanilla fails.
```make
check-vanilla: all
	$(pg_regress_multi_check) --vanillatest || (cp $(vanilla_diffs_file) $(citus_abs_srcdir)/regression.diffs && false)
```

- I also fixed some vanilla tests that fail due to recently added
clock-related operators showing up in some queries.
2022-12-13 11:15:47 +03:00
aykut-bozkurt 442cdb2ea5
pg_regress needs the option dlpath for postgres tests to find regress.so (#6416)
When you run vanilla tests in your local environment, some of the tests
try to find regress.so, which is not in the default lib path. That is why
we need to specify the regress.so path via the dlpath option.

Example failure:
```
LOAD :'regresslib';
+ERROR:  could not access file "/home/aykutbozkurt/.pgenv/pgsql-15beta4/lib/regress.so": No such file or directory
```

It is actually in
`~/.pgenv/src/postgresql-15beta4/src/test/regress/regress.so` which is
found by `$regresslibdir`.
2022-10-11 14:43:06 +03:00
Jelte Fennema 5c64227223
Hopefully reduce flaky tests by disabling the maintenance daemon (#6252)
Sometimes our CI randomly fails on a test in a way similar to this:
```diff
 step s2-drop:
     DROP TABLE cancel_table;
-
+ <waiting ...>
+step s2-drop: <... completed>

 starting permutation: s1-timeout s1-begin s1-sleep10000 s1-rollback s1-reset s1-drop
```
Source:
https://app.circleci.com/pipelines/github/citusdata/citus/26524/workflows/5415b84f-13a3-482f-bef9-648314c79a67/jobs/756377

Another example of a failure like this:
```diff
 stop_session_level_connection_to_node
 -------------------------------------
                                      
 (1 row)
 
 step s3-display: 
  SELECT * FROM ref_table ORDER BY id, value;
  SELECT * FROM dist_table ORDER BY id, value;
-
+ <waiting ...>
+step s3-display: <... completed>
 id|value
 --+-----
 ```
Source: https://app.circleci.com/pipelines/github/citusdata/citus/26551/workflows/91dca4b2-bb1c-4cae-b2ef-ce3f9c689ce5/jobs/757781

A step that shouldn't be blocked is detected as "waiting..." temporarily
and then gets unblocked automatically immediately after. I'm not
certain of the reason for this, but one explanation is that the
maintenance daemon is doing something that blocks the query. In the
shown case my hunch is that it could be the deferred shard deletion.

This PR disables all the features of the maintenance daemon during
isolation testing to try and prevent processes from randomly being
detected as blocking.

NOTE: I'm not certain that this will actually fix this issue. If the
issue persists even after this change, at least we know that it's not
the maintenance daemon that's blocking it.
2022-10-04 14:33:57 +03:00
Önder Kalacı 1df943e0d5
Use Posix locale in the tests (#6261)
Commit 9653a0065e changed it to C.UTF-8, which fails on macOS
2022-08-29 12:52:03 +02:00
Jelte Fennema ee5af1ab90
Use C.UTF-8 locale in tests (#6242)
I upgraded my OS to Ubuntu 22.04 a while back and since then some tests
order output slightly differently. I think it might be because of the
glibc upgrade that changed ordering for things like underscores and
spaces.

Changing the locale to C.UTF-8 solves this issue.
2022-08-25 13:10:49 +02:00
Jelte Fennema 1dd775fae8
Speed up logical replication tests to fix flakyness (#6229)
The isolation_tenant_isolation_nonblocking test would sometimes randomly
fail in CI, because we have a runtime limit of 2 minutes per test.
```
test isolation_tenant_isolation_nonblocking ... make: *** [Makefile:171: check-enterprise-isolation] Terminated

Too long with no output (exceeded 2m0s): context deadline exceeded
```

One solution would obviously be to increase the timeout, but instead I
spent some time to increase the speed of our tests by tweaking some
timings. On my local machine the time it took to run the
isolation_tenant_isolation_nonblocking test went from 75s to 15s.

So now we should easily stay within the 2 minute per test limit.

I also checked if the new settings improved other logical replication
tests, but the impact differs wildly per test. One other example of a
test that runs much quicker due to the change is
isolation_non_blocking_shard_split_fkey. But the shard move tests I
tried are impacted much less.

Example of failed tests: https://app.circleci.com/pipelines/github/citusdata/citus/26373/workflows/4fa660e4-63c8-4844-bef8-70a7bea902b7/jobs/748199
2022-08-23 17:37:31 +02:00
Jelte Fennema 25e5cf2e50
Fix flakyness in failure_setup (#6205)
In CI sometimes failure_setup will fail with the following error:
```diff
 SELECT master_add_node('localhost', :worker_2_proxy_port);  -- an mitmproxy which forwards to the second worker
- master_add_node
----------------------------------------------------------------------
-               2
-(1 row)
-
+ERROR:  connection to the remote node localhost:9060 failed with the following error: could not connect to server: Connection refused
+	Is the server running on host "localhost" (127.0.0.1) and accepting
+	TCP/IP connections on port 9060?
+could not connect to server: Connection refused
+	Is the server running on host "localhost" (127.0.0.1) and accepting
+	TCP/IP connections on port 9060?
+could not connect to server: Cannot assign requested address
+	Is the server running on host "localhost" (::1) and accepting
+	TCP/IP connections on port 9060?
diff -dU10 -w /home/circleci/project/src/test/regress/expected/failure_online_move_shard_placement.out /home/circleci/project/src/test/regress/results/failure_online_move_shard_placement.out
```

This then breaks all the tests run after it as well, because we're
missing one worker node.

Locally I was able to reproduce this error by sleeping for 10 seconds in
the forked process before actually starting mitmproxy. So I expect that
what's happening in CI is that, due to limited resources, mitmproxy is
not up yet when we try to add its port as a worker node.

This PR fixes this by waiting until mitmproxy is listening on its socket
before actually starting to run our tests. This fixed it locally for me
when I made the forked process sleep for 10 seconds before starting
mitmproxy.

In passing it also improves the detection and errors that we already
had for the case where something was already listening on the 
mitmproxy port.

Because both @gledis69 and I were changing things in our CI images
at the same time this also includes a bump of the style checker tools.
Closes #6200
2022-08-19 13:03:08 +00:00
Jelte Fennema 78a5013e24
Support changing CPU priorities for backends and shard moves (#6126)
**Intro**
This adds support to Citus to change the CPU priority values of
backends. This is created with two main usecases in mind:

1. Users might want to run the logical replication part of the shard moves
   or shard splits at a higher speed than it would run on its own.
   This might cause some small loss of DB performance for their regular 
   queries, but this is often worth it. During high load it's very possible
   that the logical replication WAL sender is not able to keep up with the
   WAL that is generated. This is especially a big problem when the
   machine is close to running out of disk when doing a rebalance.
2. Users might have certain long running queries that they want to
   deprioritize so they don't impact their regular workload too much.

**Be very careful!!!**
Using CPU priorities to control scheduling can be helpful in some cases
to control which processes are getting more CPU time than others. 
However, due to an issue called "[priority inversion][1]" it's possible that
using CPU priorities together with the many locks that are used within
Postgres cause the exact opposite behavior of what you intended. This
is why this PR only allows the PG superuser to change the CPU priority 
of its own processes. Currently it's not recommended to set `citus.cpu_priority`
directly. Currently the only recommended interface for users is the setting 
called `citus.cpu_priority_for_logical_replication_senders`. This setting
controls CPU priority for a very limited set of processes (the logical 
replication senders). So, the dangers of priority inversion are also limited
when using it for this usecase.

**Background**
Before reading the rest it's important to understand some basic
background regarding process CPU priorities, because they are a bit
counter intuitive. A lower priority value, means that the process will
be scheduled more and whatever it's doing will thus complete faster. The
default priority for processes is 0. Valid values are from -20 to 19
inclusive. On Linux a larger difference between values of two processes
will result in a bigger difference in percentage of scheduling.

**Handling the usecases**
Usecase 1 can be achieved by setting `citus.cpu_priority_for_logical_replication_senders`
to the priority value that you want it to have. It's necessary to set
this both on the workers and the coordinator. Example:
```
citus.cpu_priority_for_logical_replication_senders = -10
```

Usecase 2 can with this PR be achieved by running the following as
superuser. Note that this is only possible as superuser currently 
due to the dangers mentioned in the "Be very careful!!!" section.
And although this is possible it's **NOT** recommended:
```sql
ALTER USER background_job_user SET citus.cpu_priority = 5;
```

**OS configuration**
To actually make these settings work well it's important to run Postgres
with a more permissive value for the 'nice' resource limit than
Linux uses by default. By default Linux will not allow a process to
set its priority lower than it currently is, even if it was lower when
the process originally started. This capability is necessary to reset
the CPU priority to its original value after a transaction finishes.
Depending on how you run Postgres this needs to be done in one of two
ways:

If you use systemd to start Postgres, all you have to do is add a line
like this to the systemd service file:
```conf
LimitNice=+0 # the + is important, otherwise its interpreted incorrectly as 20
```

If that's not the case you'll have to configure `/etc/security/limits.conf` 
like so, assuming that you are running Postgres as the `postgres` OS user:
```
postgres            soft    nice            0
postgres            hard    nice            0
```
Finally you'd have to add the following line to `/etc/pam.d/common-session`:
```
session required pam_limits.so
```

These settings would allow you to change the priority back after setting it
to a higher value.

However, to actually allow you to set priorities even lower than the
default priority value you would need to change the values in the 
config to something lower than 0. So for example:
```conf
LimitNice=-10
```

or

```
postgres            soft    nice            -10
postgres            hard    nice            -10
```

If you use WSL2 you'll likely have to do one more thing. You have to
open a new shell, because PAM is only used during login, and
WSL2 doesn't actually log you in. You can force a login like this:
```
sudo su $USER --shell /bin/bash
```
Source: https://stackoverflow.com/a/68322992/2570866

[1]: https://en.wikipedia.org/wiki/Priority_inversion
2022-08-16 13:07:17 +03:00
aykut-bozkurt ccf1e0f584
Pg vanilla tests can be run with citus created. (#6018) 2022-08-11 12:53:22 +03:00
Hanefi Onaldi 4185543910
Pass source directory in env to regression tests
PostgreSQL 15 dropped usage of .source files that are used to generate
.sql and .out files by replacing some placeholders with the actual
values before test runs. Instead, the information is passed from
pg_regress to the .sql and .out files directly via env variables. Those
variables are read via \getenv psql command in relevant test files.

PostgreSQL 15 commit d1029bb5a26cb84b116b0dee4dde312291359f2a introduced
some changes to pg_regress binary that allowed this to happen. However
this change is not backported to earlier versions of PG, and thus we
come up with a similar mechanism in pg_regress_multi that works in all
available PG versions.
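
A small sketch of the mechanism (the table and file names are illustrative;
`PG_ABS_SRCDIR` is one of the variables pg_regress exports):

```sql
-- Read an env variable exported by the test harness and build a data file path
\getenv abs_srcdir PG_ABS_SRCDIR
\set datafile :abs_srcdir '/data/example.data'
COPY example_table FROM :'datafile';  -- example_table and the file are hypothetical
```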
2022-08-09 14:15:51 +03:00
aykut-bozkurt 3ddc089651
stop distributing views with no distributed dependency if GUC DistributeLocalViews is set false. (#6083) 2022-08-04 12:34:40 +03:00
aykut-bozkurt 5f27445b69
enable propagation warnings before postgres vanilla tests (#6081) 2022-07-27 10:34:41 +03:00
aykut-bozkurt 67ac3da2b0
added citus_depended_objects udf and HideCitusDependentObjects GUC to hide citus depended objects from pg meta queries (#6055)
use RecurseObjectDependencies api to find if an object is citus depended

make vanilla tests runnable to see if citus_depended function is working correctly
2022-07-25 16:43:34 +03:00
Jelte Fennema 184c7c0bce
Make enterprise features open source (#6008)
This PR makes all of the features open source that were previously only
available in Citus Enterprise.

Features that this adds:
1. Non blocking shard moves/shard rebalancer
   (`citus.logical_replication_timeout`)
2. Propagation of CREATE/DROP/ALTER ROLE statements
3. Propagation of GRANT statements
4. Propagation of CLUSTER statements
5. Propagation of ALTER DATABASE ... OWNER TO ...
6. Optimization for COPY when loading JSON to avoid double parsing of
   the JSON object (`citus.skip_jsonb_validation_in_copy`)
7. Support for row level security
8. Support for `pg_dist_authinfo`, which allows storing different
   authentication options for different users, e.g. you can store
   passwords or certificates here.
9. Support for `pg_dist_poolinfo`, which allows using connection poolers
   in between coordinator and workers
10. Tracking distributed query execution times using
   citus_stat_statements (`citus.stat_statements_max`,
   `citus.stat_statements_purge_interval`,
   `citus.stat_statements_track`). This is disabled by default.
11. Blocking tenant_isolation
12. Support for `sslkey` and `sslcert` in `citus.node_conninfo`
2022-06-16 00:23:46 -07:00
gledis69 4731630741 Add distributing lock command support 2022-05-20 12:28:07 +03:00
Marco Slot ceb593c9da Convert citus.hide_shards_from_app_name_prefixes to citus.show_shards_for_app_name_prefixes 2022-05-03 14:22:13 +02:00
Jelte Fennema 68bfc8d1c0
Use good initdb options in arbitrary configs tests (#5802)
In `pg_regress_multi.pl` we're running `initdb` with some options that
the `common.py` `initdb` is currently not using. All these flags seem
reasonable, so this brings `common.py` in line with
`pg_regress_multi.pl`.

In passing change the `--nosync` flag to `--no-sync`, since that's what
the PG documentation lists as the official option name (but both work).
2022-03-17 13:22:23 +01:00
Marco Slot 33bfa0b191 Hide shards from application_name's with a specific prefix 2022-01-18 15:20:55 +04:00
Ahmet Gedemenli 042d45b263 Propagate foreign server ops 2021-12-23 17:54:04 +03:00
Marco Slot 78866df13c Remove master_append_table_to_shard UDF 2021-11-08 10:43:24 +01:00
Marco Slot 386d2567d4 Reduce reliance on append tables in regression tests 2021-10-08 21:27:14 +02:00
Jelte Fennema bb5c494104 Enable binary encoding by default on PG14
Since PG14 we can now use binary encoding for arrays and composite types
that contain user defined types. This was fixed in this commit in
Postgres: 670c0a1d47

This change starts using that knowledge, by not necessarily falling back
to text encoding anymore for those types.

While doing this and testing a bit more I found various cases where
binary encoding would fail that our checks didn't cover. This fixes
those cases and adds tests for those. It also fixes EXPLAIN ANALYZE
never using binary encoding, which was a leftover of a workaround that
was no longer necessary.

Finally, it changes the default for both `citus.enable_binary_protocol`
and `citus.binary_worker_copy_format` to `true` for PG14 and up. In our
cloud offering `binary_worker_copy_format` already was true by default.
`enable_binary_protocol` had a bug with MX and user defined types;
this bug was fixed by the above-mentioned fixes.
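
For reference, the new PG14+ defaults described above, written out explicitly
(a sketch only; normally no action is needed since these become the defaults):

```sql
ALTER SYSTEM SET citus.enable_binary_protocol = true;
ALTER SYSTEM SET citus.binary_worker_copy_format = true;
SELECT pg_reload_conf();
```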
2021-09-06 10:27:29 +02:00
Sait Talha Nisanci 2fa1e5ffe3 Use the default max_parallel_workers_per_gather for vanilla 2021-09-03 15:41:28 +03:00
Sait Talha Nisanci d1c0403055 Disable Query Identifier calculation in tests
When queryId is not 0 and verbose is true, the query identifier is
emitted to the explain output. This is breaking Postgres outputs.
We disable the query identifier calculation in the tests.
Commit on PG that introduced the query identifier in the explain output:
4f0b0966c866ae9f0e15d7cc73ccf7ce4e1af84b
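
A sketch of one way to do this on PG14+; whether the tests use exactly this
GUC is an assumption based on the description:

```sql
-- Turn off query identifier computation so EXPLAIN output stays stable
SET compute_query_id = off;
```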
2021-09-03 15:41:28 +03:00
Ahmet Gedemenli 089ef35940 Disable dropping and truncating known shards
Add test for disabling dropping and truncating known shards
2021-06-02 14:30:27 +02:00
SaitTalhaNisanci 82f34a8d88
Enable citus.defer_drop_after_shard_move by default (#4961)
Enable citus.defer_drop_after_shard_move by default
2021-05-21 10:48:32 +03:00
Jelte Fennema 10f06ad753 Fetch shard size on the fly for the rebalance monitor
Without this change the rebalancer progress monitor gets the shard sizes
from the `shardlength` column in `pg_dist_placement`. This column needs to
be updated manually by calling `citus_update_table_statistics`.
However, `citus_update_table_statistics` could lead to distributed
deadlocks while database traffic is on-going (see #4752).

To work around this we don't use the `shardlength` column anymore. Instead,
for every rebalance we now fetch all shard sizes on the fly.

A few additional things this does are:
1. It adds tests for the rebalance progress function.
2. If a shard move cannot be done because a source or target node is
   unreachable, then we error out and stop the rebalance, instead of showing
   a warning and continuing. When using the by_disk_size rebalance
   strategy it's not safe to continue with other moves if a specific
   move failed. It's possible that the failed move would have made space
   for the next move, and because the failed move never happened this
   space does not exist.
3. Adds two new columns to the result of `get_rebalancer_progress` which
   shows the size of the shard on the source and target node.
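
A minimal sketch of inspecting that output (using the function name as given
above; the exact name and columns depend on the Citus version):

```sql
-- Shows per-shard progress, now including shard sizes on the source and target node
SELECT * FROM get_rebalancer_progress();
```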

Fixes #4930
2021-05-20 16:38:17 +02:00
Nils Dijk 1c1999ed7b
incorporate the fixopen fix for osx users on bigsur (#4837)
comparable to https://github.com/citusdata/tools/pull/88

this patch adds checks to the perl script running the Citus testing harness so that it starts the postgres instances via the fixopen binary, when present, to work around `Interrupted System Call` errors on macOS Big Sur.
2021-03-22 16:22:08 +01:00
Hadi Moshayedi 0e0fd6599a Faster logical replication tests.
Logical replication status can take wal_receiver_status_interval
seconds to get updated. Default is 10s, which means tests in
which logical replication is used can take a long time to finish.
We reduce it to 1 second to speed these tests up.

Logical replication apply launcher launches workers every
wal_retrieve_retry_interval, so if we have many shard moves with
logical replication consecutively, they will be throttled by this
parameter. Default is 5s, we reduce it to 1s so we finish tests
faster.
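
Expressed as settings (a sketch; the regression harness applies these through
its own configuration rather than via SQL):

```sql
ALTER SYSTEM SET wal_receiver_status_interval = '1s';
ALTER SYSTEM SET wal_retrieve_retry_interval = '1s';
SELECT pg_reload_conf();
```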
2021-01-19 07:48:47 -08:00
Marco Slot 48caca4084 Improve regression test settings 2020-11-30 20:34:03 +01:00
Onur Tirtir d912d4bc38
Print full file path in valgrind testing (#4299) 2020-11-06 10:26:53 +03:00
Sait Talha Nisanci 078dcae18c Write settings to postgres configuration file directly
In our test structure, we have been passing postgres configurations from
the terminal, which causes problems once it exceeds a certain length:
the server cannot start, and understanding why it failed is not easy
because there isn't a nice error message.

This commit changes this to write the settings directly to the postgres
configuration file. This way we can add as many postgres settings as we
want to without needing to worry about the length problem.
2020-10-05 22:09:08 +03:00