Commit Graph

290 Commits (7a1dcf6666ad333922c7463137e52a250348a3fe)

Author SHA1 Message Date
Colm McHugh 7a1dcf6666 Avoid query deparse and planning of shard query in local execution. 2025-06-24 16:52:37 +00:00
Onur Tirtir ea7aa6712d
Move stat view implementations into a submodule (#7975)
Also move serialize_distributed_ddls into commands submodule, seems like
an oversight from last year (by me).
2025-04-29 14:22:29 +03:00
Onur Tirtir 3d61c4dc71
Add citus_stat_counters view and citus_stat_counters_reset() function to reset it (#7917)
DESCRIPTION: Adds citus_stat_counters view that can be used to query
stat counters that Citus collects while the feature is enabled, which is
controlled by citus.enable_stat_counters. citus_stat_counters() can be
used to query the stat counters for the provided database oid and
citus_stat_counters_reset() can be used to reset them for the provided
database oid or for the current database if nothing or 0 is provided.

Today we don't persist stat counters on server shutdown. In other words,
stat counters are automatically reset in case of a server restart.

Details on the underlying design can be found in header comment of
stat_counters.c and in the technical readme.

-------

Here are the details about what we track as of this PR:

For connection management, we have three statistics about the inter-node
connections initiated by the node itself:

* **connection_establishment_succeeded**
* **connection_establishment_failed**
* **connection_reused**

While the first two are relatively easier to understand, the third one
covers the case where a connection is reused. This can happen when a
connection was already established to the desired node, Citus decided to
cache it for some time (see citus.max_cached_conns_per_worker &
citus.max_cached_connection_lifetime), and then reused it for a new
remote operation. Here are the other important details about these
connection statistics:

1. connection_establishment_failed doesn't care about the connections
that we could establish but are lost later in the transaction. Plus, we
cannot guarantee that the connections that are counted in
connection_establishment_succeeded were not lost later.
2. connection_establishment_failed doesn't care about the optional
connections (see OPTIONAL_CONNECTION flag) that we gave up establishing
because of the connection throttling rules we follow (see
citus.max_shared_pool_size & citus.local_shared_pool_size). The reaason
for this is that we didn't even try to establish these connections.
3. For the rest of the cases where a connection failed for some reason,
we always increment connection_establishment_failed even if the caller
was okay with the failure and know how to recover from it (e.g., the
adaptive executor knows how to fall back local execution when the target
node is the local node and if it cannot establish a connection to the
local node). The reason is that even if it's likely that we can still
serve the operation, we still failed to establish the connection and we
want to track this.
4. Finally, the connection failures that we count in
connection_establishment_failed might be caused by any of the following
reasons and for now we prefer to _not_ further distinguish them for
simplicity:
a. remote node is down or cannot accept any more connections, or
overloaded such that citus.node_connection_timeout is not enough to
establish a connection
b. any internal Citus error that might result in preparing a bad
connection string so that libpq fails when parsing the connection string
even before actually trying to establish a connection via connect() call
c. broken citus.node_conninfo or such Citus configuration that was
incorrectly set by the user can also result in similar outcomes as in b
d. internal waitevent set / poll errors or OOM in local node

We also track two more statistics for query execution:

* **query_execution_single_shard**
* **query_execution_multi_shard**

And more importantly, both query_execution_single_shard and
query_execution_multi_shard are not only tracked for the top-level
queries but also for the subplans etc. The reason is that for some
queries, e.g., the ones that go through recursive planning, after Citus
performs the heavy work as part of subplans, the work that needs to be
done for the top-level query becomes quite straightforward. And for such
query types, it would be deceiving if we only incremented the query stat
counters for the top-level query. Similarly, for non-pushable INSERT ..
SELECT and MERGE queries, we perform separate counter increments for the
SELECT / source part of the query besides the final INSERT / MERGE
query.
2025-04-28 12:23:52 +00:00
Naisila Puka 3b1c082791 Drops PG14 support (#7753)
DESCRIPTION: Drops PG14 support

1. Remove "$version_num" != 'xx' from configure file
2. delete all PG_VERSION_NUM = PG_VERSION_XX references in the code
3. Look at pg_version_compat.h file, remove all _compat functions etc
defined specifically for PGXX differences
4. delete all PG_VERSION_NUM >= PG_VERSION_(XX+1), PG_VERSION_NUM <
PG_VERSION_(XX+1) ifs in the codebase
5. delete ruleutils_xx.c file
6. cleanup normalize.sed file from pg14 specific lines
7. delete all alternative output files for that particular PG version,
server_version_ge variable helps here
2025-03-12 12:43:01 +03:00
Naisila Puka 8940665d17 Allow configuring sslnegotiation using citus.node_conn_info (#7821)
Relevant PG commit:
https://github.com/postgres/postgres/commit/d39a49c1e

PR similar to https://github.com/citusdata/citus/pull/5203
2025-03-12 12:26:06 +03:00
Naisila Puka 6bd3474804 Rename foreach_ macros to foreach_declared_ macros (#7700)
This is prep work for successful compilation with PG17

PG17added foreach_ptr, foreach_int and foreach_oid macros
Relevant PG commit
14dd0f27d7cd56ffae9ecdbe324965073d01a9ff

14dd0f27d7

We already have these macros, but they are different with the
PG17 ones because our macros take a DECLARED variable, whereas
the PG16 macros declare a locally-scoped loop variable themselves.

Hence I am renaming our macros to foreach_declared_

I am separating this into its own PR since it touches many files. The
main compilation PR is https://github.com/citusdata/citus/pull/7699
2025-03-12 11:01:49 +03:00
eaydingol 117bd1d04f
Disable nonmaindb interface (#7905)
DESCRIPTION: The PR disables the non-main db related features. 

The non-main db related features were introduced in
https://github.com/citusdata/citus/pull/7203.
2025-02-21 13:36:19 +03:00
Onur Tirtir 73411915a4
Avoid re-assigning the global pid for client backends and bg workers when the application_name changes (#7791)
DESCRIPTION: Fixes a crash that happens because of unsafe catalog access
when re-assigning the global pid after application_name changes.

When application_name changes, we don't actually need to
try re-assigning the global pid for external client backends because
application_name doesn't affect the global pid for such backends. Plus,
trying to re-assign the global pid for external client backends would
unnecessarily cause performing a catalog access when the cached local
node id is invalidated. However, accessing to the catalog tables is
dangerous in certain situations like when we're not in a transaction
block. And for the other types of backends, i.e., the Citus internal
backends, we need to re-assign the global pid when the application_name
changes because for such backends we simply extract the global pid
inherited from the originating backend from the application_name -that's
specified by originating backend when openning that connection- and this
doesn't require catalog access.
2024-12-23 14:01:53 +00:00
Onur Tirtir 3586aab17a
Allow providing "host" parameter via citus.node_conninfo (#7541)
And when that is the case, directly use it as "host" parameter for the
connections between nodes and use the "hostname" provided in
pg_dist_node / pg_dist_poolinfo as "hostaddr" to avoid host name lookup.

This is to avoid allowing dns resolution (and / or setting up DNS names
for each host in the cluster). This already works currently when using
IPs in the hostname. The only use of setting host is that you can then
use sslmode=verify-full and it will validate that the hostname matches
the certificate provided by the node you're connecting too.

It would be more flexible to make this a per-node setting, but that
requires SQL changes. And we'd like to backport this change, and
backporting such a sql change would be quite hard while backporting this
change would be very easy. And in many setups, a different hostname for
TLS validation is actually not needed. The reason for that is
query-from-any node: With query-from-any-node all nodes usually have a
certificate that is valid for the same "cluster hostname", either using
a wildcard cert or a Subject Alternative Name (SAN). Because if you load
balance across nodes you don't know which node you're connecting to, but
you still want TLS validation to do it's job. So with this change you
can use this same "cluster hostname" for TLS validation within the
cluster. Obviously this means you don't validate that you're connecting
to a particular node, just that you're connecting to one of the nodes in
the cluster, but that should be fine from a security perspective (in
most cases).

Note to self: This change requires updating

https://docs.citusdata.com/en/latest/develop/api_guc.html#citus-node-conninfo-text.

DESCRIPTION: Allows overwriting host name for all inter-node connections
by supporting "host" parameter in citus.node_conninfo
2024-04-15 09:51:11 +00:00
Karina 9ff8436f14
Create directories and files with pg_file_create_mode and pg_dir_create_mode permissions (#7479)
Since Postgres commit da9b580d files and directories are supposed to
be created with pg_file_create_mode and pg_dir_create_mode permissions
when default permissions are expected.

This fixes a failure of one of the postgres tests:
If we create file add.conf containing
```
shared_preload_libraries='citus'
```
and run postgres tests
```
TEMP_CONFIG=/path/to/add.conf make installcheck -C src/bin/pg_ctl/
```
then 001_start_stop.pl fails with
```
.../data/base/pgsql_job_cache mode must be 0750
```
in the log.

In passing this also stops creating directories that we haven't used
since Citus 7.4

This change explicitely doesn't change permissions of certificates/keys
that we create.

---------

Co-authored-by: Karina Litskevich <litskevichkarina@gmail.com>
2024-02-07 12:48:31 +01:00
Halil Ozan Akgül b877d606c7
Adds 2PC distributed commands from other databases (#7203)
DESCRIPTION: Adds support for 2PC from non-Citus main databases

This PR only adds support for `CREATE USER` queries, other queries need
to be added. But it should be simple because this PR creates the
underlying structure.

Citus main database is the database where the Citus extension is
created. A non-main database is all the other databases that are in the
same node with a Citus main database.

When a `CREATE USER` query is run on a non-main database we:

1. Run `start_management_transaction` on the main database. This
function saves the outer transaction's xid (the non-main database
query's transaction id) and marks the current query as main db command.
2. Run `execute_command_on_remote_nodes_as_user("CREATE USER
<username>", <username to run the command>)` on the main database. This
function creates the users in the rest of the cluster by running the
query on the other nodes. The user on the current node is created by the
query on the outer, non-main db, query to make sure consequent commands
in the same transaction can see this user.
3. Run `mark_object_distributed` on the main database. This function
adds the user to `pg_dist_object` in all of the nodes, including the
current one.

This PR also implements transaction recovery for the queries from
non-main databases.
2023-12-22 19:19:41 +03:00
Nils Dijk 0620c8f9a6
Sort includes (#7326)
This change adds a script to programatically group all includes in a
specific order. The script was used as a one time invocation to group
and sort all includes throught our formatted code. The grouping is as
follows:

 - System includes (eg. `#include<...>`)
 - Postgres.h (eg. `#include "postgres.h"`)
- Toplevel imports from postgres, not contained in a directory (eg.
`#include "miscadmin.h"`)
 - General postgres includes (eg . `#include "nodes/..."`)
- Toplevel citus includes, not contained in a directory (eg. `#include
"citus_verion.h"`)
 - Columnar includes (eg. `#include "columnar/..."`)
 - Distributed includes (eg. `#include "distributed/..."`)

Because it is quite hard to understand the difference between toplevel
citus includes and toplevel postgres includes it hardcodes the list of
toplevel citus includes. In the same manner it assumes anything not
prefixed with `columnar/` or `distributed/` as a postgres include.

The sorting/grouping is enforced by CI. Since we do so with our own
script there are not changes required in our uncrustify configuration.
2023-11-23 18:19:54 +01:00
Gürkan İndibay 3b556cb5ed
Adds create / drop database propagation support (#7240)
DESCRIPTION: Adds support for propagating `CREATE`/`DROP` database

In this PR, create and drop database support is added.

For CREATE DATABASE:
* "oid" option is not supported
* specifying "strategy" to be different than "wal_log" is not supported
* specifying "template" to be different than "template1" is not
supported

The last two are because those are not saved in `pg_database` and when
activating a node, we cannot assume what parameters were provided when
creating the database.

And "oid" is not supported because whether user specified an arbitrary
oid when creating the database is not saved in pg_database and we want
to avoid from oid collisions that might arise from attempting to use an
auto-assigned oid on workers.

Finally, in case of node activation, GRANTs for the database are also
propagated.

---------

Co-authored-by: Jelte Fennema-Nio <github-tech@jeltef.nl>
Co-authored-by: Jelte Fennema-Nio <jelte.fennema@microsoft.com>
Co-authored-by: Onur Tirtir <onurcantirtir@gmail.com>
2023-11-21 16:43:51 +03:00
Naisila Puka 0d1f18862b
Propagates SECURITY LABEL ON ROLE stmt (#7304)
We propagate `SECURITY LABEL [for provider] ON ROLE rolename IS
labelname` to the worker nodes.
We also make sure to run the relevant `SecLabelStmt` commands on a
newly added node by looking at roles found in `pg_shseclabel`.

See official docs for explanation on how this command works:
https://www.postgresql.org/docs/current/sql-security-label.html
This command stores the role label in the `pg_shseclabel` catalog table.

This commit also fixes the regex string in
`check_gucs_are_alphabetically_sorted.sh` script such that it escapes
the dot. Previously it was looking for all strings starting with "citus"
instead of "citus." as it should.

To test this feature, I currently make use of a special GUC to control
label provider registration in PG_init when creating the Citus extension.
2023-11-16 13:12:30 +03:00
Emel Şimşek ee8f4bb7e8
Start Maintenance Daemon for Main DB at the server start. (#7254)
DESCRIPTION: This change starts a maintenance deamon at the time of
server start if there is a designated main database.

This is the code flow:

1. User designates a main database:
   `ALTER SYSTEM SET citus.main_db =  "myadmindb";`

2. When postmaster starts, in _PG_Init, citus calls 
    `InitializeMaintenanceDaemonForMainDb`
  
This function registers a background worker to run
`CitusMaintenanceDaemonMain `with `databaseOid = 0 `

3. `CitusMaintenanceDaemonMain ` takes some special actions when
databaseOid is 0:
     - Gets the citus.main_db  value.
     - Connects to the  citus.main_db
     - Now the `MyDatabaseId `is available, creates a hash entry for it.
     - Then follows the same control flow as for a regular db,
2023-10-30 09:44:13 +03:00
Gürkan İndibay 71a4633dad
Fixes typo and renames multi_process_utility (#7259) 2023-10-17 16:39:37 +03:00
Emel Şimşek a849570f3f
Improve the performance of CitusHasBeenLoaded function for a database that does not do CREATE EXTENSION citus but load citus.so. (#7123)
For a database that does not create the citus extension by running

`  CREATE EXTENSION citus;`

`CitusHasBeenLoaded ` function ends up querying the `pg_extension` table
every time it is invoked. This is not an ideal situation for a such a
database.

The idea in this PR is as follows:

### A new field in MetadataCache.
 Add a new variable `extensionCreatedState `of the following type:

```
typedef enum ExtensionCreatedState
{
        UNKNOWN = 0,
        CREATED = 1,
        NOTCREATED = 2,
} ExtensionCreatedState;
```
When the MetadataCache is invalidated, `ExtensionCreatedState` will be
set to UNKNOWN.
     
### Invalidate MetadataCache when CREATE/DROP/ALTER EXTENSION citus
commands are run.

- Register a callback function, named
`InvalidateDistRelationCacheCallback`, for relcache invalidation during
the shared library initialization for `citus.so`. This callback function
is invoked in all the backends whenever the relcache is invalidated in
one of the backends. (This could be caused many DDLs operations).

- In the cache invalidation callback,`
InvalidateDistRelationCacheCallback`, invalidate `MetadataCache` zeroing
it out.
 
- In `CitusHasBeenLoaded`, perform the costly citus is loaded check only
if the `MetadataCache` is not valid.
 
### Downsides

Any relcache invalidation (caused by various DDL operations) will case
Citus MetadataCache to get invalidated. Most of the time it will be
unnecessary. But we rely on that DDL operations on relations will not be
too frequent.
2023-09-05 13:29:35 +03:00
Naisila Puka a17fae36b9
Disable statistics collection (#7162)
Enabled by mistake in

ba40eb363c
2023-08-29 16:09:19 +03:00
Naisila Puka 6056cb2c29
PG16 compatibility - get_relation_info hook to avoid crash from adjusted partitioning (#7099)
PG16 compatibility - Part 5

Check out part 1 42d956888d
part 2 0d503dd5ac
part 3 907d72e60d
part 4 7c6b4ce103

This commit is in the series of PG16 compatibility commits. Find the explanation below:

If we allow to adjust partitioning, we get a crash when accessing
amcostestimate of partitioned indexes, because amcostestimate is NULL
for them. The following PG commit is the culprit:
3c569049b7
3c569049b7b502bb4952483d19ce622ff0af5fd6
Previously, partitioned indexes would just be ignored.
Now, they are added in the list. However get_relation_info expects the
tables which have partitioned indexes to have the inh flag set properly.
AdjustPartitioningForDistributedPlanning plays with that flag, hence we
don't get the desired behaviour.
The hook is simply removing all partitioned indexes from the list.

More PG16 compatibility commits are coming soon ...
2023-08-08 15:51:21 +03:00
Naisila Puka 42d956888d
PG16 compatibility: Resolve compilation issues (#7005)
This PR provides successful compilation against PG16Beta2. It does some
necessary refactoring to prepare for full support of version 16, in
https://github.com/citusdata/citus/pull/6952 .

Change RelFileNode to RelFileNumber or RelFileLocator 
Relevant PG commit
b0a55e43299c4ea2a9a8c757f9c26352407d0ccc

new header for varatt.h 
Relevant PG commit:
d952373a987bad331c0e499463159dd142ced1ef

drop support for Abs, use fabs 
Relevant PG commit
357cfefb09115292cfb98d504199e6df8201c957

tuplesort PGcommit: d37aa3d35832afde94e100c4d2a9618b3eb76472 
Relevant PG commit:
d37aa3d35832afde94e100c4d2a9618b3eb76472

Fix vacuum in columnar 
Relevant PG commit:
4ce3afb82ecfbf64d4f6247e725004e1da30f47c
older one:
b6074846cebc33d752f1d9a66e5a9932f21ad177

Add alloc_flags to pg_clean_ascii 
Relevant PG commit:
45b1a67a0fcb3f1588df596431871de4c93cb76f

Merge GetNumConfigOptions() into get_guc_variables() 
Relevant PG commit:
3057465acfbea2f3dd7a914a1478064022c6eecd

Minor PG refactor PG_FUNCNAME_MACRO __func__ 
Relevant PG commit
320f92b744b44f961e5d56f5f21de003e8027a7f

Pass NULL context to stringToQualifiedNameList, typeStringToTypeName 
The pre-PG16 error behaviour for the following
stringToQualifiedNameList & typeStringToTypeName
was ereport(ERROR, ...)
Now with PG16 we have this context input. We preserve the same behaviour
by passing a NULL context, because of the following:
(copy paste comment from PG16)
If "context" isn't an ErrorSaveContext node, this behaves as
errstart(ERROR, domain), and the errsave() macro ends up acting
exactly like ereport(ERROR, ...).
Relevant PG commit
858e776c84f48841e7e16fba7b690b76e54f3675

Use RangeVarCallbackMaintainsTable instead of RangeVarCallbackOwnsTable 
Relevant PG commit:
60684dd834a222fefedd49b19d1f0a6189c1632e

FIX THIS: Not implemented grant-level control of role inheritance 
see PG commit
e3ce2de09d814f8770b2e3b3c152b7671bcdb83f

Make Scan node abstract 
PG commit:
8c73c11a0d39049de2c1f400d8765a0eb21f5228

Change in Var representations, get_relids_in_jointree 
PG commit
2489d76c4906f4461a364ca8ad7e0751ead8aa0d

Deadlock detection changes because SHM_QUEUE is removed 
Relevant PG Commit:
d137cb52cb7fd44a3f24f3c750fbf7924a4e9532

TU_UpdateIndexes 
Relevant PG commit
19d8e2308bc51ec4ab993ce90077342c915dd116

Use object_ownercheck and object_aclcheck functions 
Relevant PG commits:
afbfc02983f86c4d71825efa6befd547fe81a926
c727f511bd7bf3c58063737bcf7a8f331346f253

Rework Permission Info for successful compilation 
Relevant PG commits:
postgres/postgres@a61b1f7
postgres/postgres@b803b7d
---------

Co-authored-by: onderkalaci <onderkalaci@gmail.com>
2023-07-21 14:32:37 +03:00
Gokhan Gulbiz e0d3476526
Add locking mechanism for tenant monitoring probabilistic approach (#7026)
This PR 
* Addresses a concurrency issue in the probabilistic approach of tenant
monitoring by acquiring a shared lock for tenant existence checks.
* Changes `citus.stat_tenants_sample_rate_for_new_tenants` type to
double
* Renames `citus.stat_tenants_sample_rate_for_new_tenants` to
`citus.stat_tenants_untracked_sample_rate`
2023-07-03 13:08:03 +03:00
Jelte Fennema fd1427de2c
Change by_disk_size rebalance strategy to have a base size (#7035)
One problem with rebalancing by disk size is that shards in newly
created collocation groups are considered extremely small. This can
easily result in bad balances if there are some other collocation groups
that do have some data. One extremely bad example of this is:
1. You have 2 workers
2. Both contain about 100GB of data, but there's a 70MB difference.
3. You create 100 new distributed schemas with a few empty tables in
   them
4. You run the rebalancer
5. Now all new distributed schemas are placed on the node with that had
   70MB less.
6. You start loading some data in these shards and quickly the balance
   is completely off

To address this edge case, this PR changes the by_disk_size rebalance
strategy to add a a base size of 100MB to the actual size of each
shard group. This can still result in a bad balance when shard groups
are empty, but it solves some of the worst cases.
2023-06-27 16:37:09 +02:00
Naisila Puka 69af3e8509
Drop PG13 Support Phase 2 - Remove PG13 specific paths/tests (#7007)
This commit is the second and last phase of dropping PG13 support.

It consists of the following:

- Removes all PG_VERSION_13 & PG_VERSION_14 from codepaths
- Removes pg_version_compat entries and columnar_version_compat entries
specific for PG13
- Removes alternative pg13 test outputs 
- Removes PG13 normalize lines and fix the test outputs based on that

It is a continuation of 5bf163a27d
2023-06-21 14:18:23 +03:00
Onur Tirtir 12a093b456
Allow using generated identity column based on int/smallint when creating a distributed table (#7008)
Allow using generated identity column based on int/smallint when
creating a distributed table so that applications that rely on
those data types don't break.

Inserting into / modifying such columns from workers is not allowed
but it's better than not allowing such columns altogether.
2023-06-16 14:34:23 +03:00
Emel Şimşek 4f793abc4a
Turn on GUC_REPORT flag for search_path to enable reporting back the parameter value upon change. (#6983)
DESCRIPTION: Turns on the GUC_REPORT flag for search_path. This results
in postgres to report the parameter status back in addition to Command
Complete packet.

In response to the following command,

> SET search_path TO client1;

postgres sends back the following packets (shown in pseudo form):

C (Command Complete) SET + **S (Parameter Status) search_path =
client1**
2023-06-14 17:35:52 +03:00
Naisila Puka ba40eb363c
Fix some gucs' initial and boot values, and flag combinations (#6957)
PG16beta1 added some sanity checks for GUCS, find the Relevant PG
commits below:

1- Add check on initial and boot values when loading GUCs

a73952b795
2- Extend check_GUC_init() with checks on flag combinations when loading
GUCs

009f8d1714

I fixed our currently problematic GUCS, we can merge this directly into
main as these make sense for any PG version.

There was a particular NodeConninfo issue:
Previously we would rely on the fact that NodeConninfo initial value
is an empty string. However, with PG16 enforcing same initial and boot
values, we can't use an empty initial value for NodeConninfo anymore.
Therefore we add a new flag to indicate whether we are at boot check.
2023-06-14 11:55:52 +03:00
Gokhan Gulbiz 2c509b712a
Tenant monitoring performance improvements (#6868)
- [x] Use spinlock instead of lwlock per tenant
[b437aa9](b437aa9e52)
- [x] Use hashtable to store tenant stats
[ccd464b](ccd464ba04)
- [x] Introduce a new GUC for specifying the sampling rate of new tenant
entries in the tenant monitor.
[a8d3805](a8d3805bd6)

Below are the pgbench metrics with select-only workloads from my local
machine. Here is the
[script](https://gist.github.com/gokhangulbiz/7a2308470597dc06734ff7c08f87c656)
I used for benchmarking.

| | Connection Count | Initial Implementation (TPS) | On/Off Diff |
Final Implementation -Run#1 (TPS) | On/Off Diff | Final Implementation
-Run#2 (TPS) | On/Off Diff | Final Implementation -Run#3 (TPS) | On/Off
Diff | Avg On/Off Diff |
| --- | ---------------- | ---------------------------- | ----------- |
---------------------------------- | ----------- |
---------------------------------- | ----------- |
---------------------------------- | ----------- | --------------- |
| On | 32 | 37488.69839 | \-17% | 42859.94402 | \-5% | 43379.63121 |
\-2% | 42636.2264 | \-7% | \-5% |
| Off | 32 | 43909.83121 | | 45139.63151 | | 44188.77425 | | 45451.9548
| | |
| On | 300 | 30463.03538 | \-15% | 33265.19957 | \-7% | 34685.87233 |
\-2% | 34682.5214 | \-1% | \-3% |
| Off | 300 | 35105.73594 | | 35637.45423 | | 35331.33447 | | 35113.3214
| | |
2023-06-11 12:17:31 +03:00
Teja Mupparti f6a516dab5 Refactor repartitioning code into generic format 2023-06-05 09:06:05 -07:00
Onur Tirtir 246b054a7d
Add support for schema-based-sharding via a GUC (#6866)
DESCRIPTION: Adds citus.enable_schema_based_sharding GUC that allows
sharding the database based on schemas when enabled.

* Refactor the logic that automatically creates Citus managed tables 

* Refactor CreateSingleShardTable() to allow specifying colocation id
instead

* Add support for schema-based-sharding via a GUC

### What this PR is about:
Add **citus.enable_schema_based_sharding GUC** to enable schema-based
sharding. Each schema created while this GUC is ON will be considered
as a tenant schema. Later on, regardless of whether the GUC is ON or
OFF, any table created in a tenant schema will be converted to a
single shard distributed table (without a shard key). All the tenant
tables that belong to a particular schema will be co-located with each
other and will have a shard count of 1.

We introduce a new metadata table --pg_dist_tenant_schema-- to do the
bookkeeping for tenant schemas:
```sql
psql> \d pg_dist_tenant_schema
          Table "pg_catalog.pg_dist_tenant_schema"
┌───────────────┬─────────┬───────────┬──────────┬─────────┐
│    Column     │  Type   │ Collation │ Nullable │ Default │
├───────────────┼─────────┼───────────┼──────────┼─────────┤
│ schemaid      │ oid     │           │ not null │         │
│ colocationid  │ integer │           │ not null │         │
└───────────────┴─────────┴───────────┴──────────┴─────────┘
Indexes:
    "pg_dist_tenant_schema_pkey" PRIMARY KEY, btree (schemaid)
    "pg_dist_tenant_schema_unique_colocationid_index" UNIQUE, btree (colocationid)

psql> table pg_dist_tenant_schema;
┌───────────┬───────────────┐
│ schemaid  │ colocationid  │
├───────────┼───────────────┤
│     41963 │            91 │
│     41962 │            90 │
└───────────┴───────────────┘
(2 rows)
```

Colocation id column of pg_dist_tenant_schema can never be NULL even
for the tenant schemas that don't have a tenant table yet. This is
because, we assign colocation ids to tenant schemas as soon as they
are created. That way, we can keep associating tenant schemas with
particular colocation groups even if all the tenant tables of a tenant
schema are dropped and recreated later on.

When a tenant schema is dropped, we delete the corresponding row from
pg_dist_tenant_schema. In that case, we delete the corresponding
colocation group from pg_dist_colocation as well.

### Future work for 12.0 release:
We're building schema-based sharding on top of the infrastructure that
adds support for creating distributed tables without a shard key
(https://github.com/citusdata/citus/pull/6867).
However, not all the operations that can be done on distributed tables
without a shard key necessarily make sense (in the same way) in the
context of schema-based sharding. For example, we need to think about
what happens if user attempts altering schema of a tenant table. We
will tackle such scenarios in a future PR.

We will also add a new UDF --citus.schema_tenant_set() or such-- to
allow users to use an existing schema as a tenant schema, and another
one --citus.schema_tenant_unset() or such-- to stop using a schema as
a tenant schema in future PRs.
2023-05-26 10:49:58 +03:00
Onur Tirtir 893ed416f1
Disable citus.enable_non_colocated_router_query_pushdown by default (#6909)
Fixes #6779.

DESCRIPTION: Disables citus.enable_non_colocated_router_query_pushdown
GUC by default to ensure generating a consistent distributed plan for
the queries that reference non-colocated distributed tables

We already have tests for the cases where this GUC is disabled,
so I'm not adding any more tests in this PR.

Also make multi_insert_select_window idempotent.

Related to: #6793
2023-05-15 12:07:50 +03:00
Jelte Fennema 07b8cd2634
Forward to existing emit_log_hook in our log hook (#6877)
DESCRIPTION: Forward to existing emit_log_hook in our log hook

This makes us work better with other extensions installed in Postgres.
Without this change we would overwrite their emit_log_hook, causing it
to never be called.

Fixes #6874
2023-05-09 16:55:56 +02:00
Naisila Puka 84f2d8685a
Adds control for background task executors involving a node (#6771)
DESCRIPTION: Adds control for background task executors involving a node

### Background and motivation

Nonblocking concurrent task execution via background workers was
introduced in [#6459](https://github.com/citusdata/citus/pull/6459), and
concurrent shard moves in the background rebalancer were introduced in
[#6756](https://github.com/citusdata/citus/pull/6756) - with a hard
dependency that limits to 1 shard move per node. As we know, a shard
move consists of a shard moving from a source node to a target node. The
hard dependency was used because the background task runner didn't have
an option to limit the parallel shard moves per node.

With the motivation of controlling the number of concurrent shard
moves that involve a particular node, either as source or target, this
PR introduces a general new GUC
citus.max_background_task_executors_per_node to be used in the
background task runner infrastructure. So, why do we even want to
control and limit the concurrency? Well, it's all about resource
availability: because the moves involve the same nodes, extra
parallelism won’t make the rebalance complete faster if some resource is
already maxed out (usually cpu or disk). Or, if the cluster is being
used in a production setting, the moves might compete for resources with
production queries much more than if they had been executed
sequentially.

### How does it work?

A new column named nodes_involved is added to the catalog table that
keeps track of the scheduled background tasks,
pg_dist_background_task. It is of type integer[] - to store a list
of node ids. It is NULL by default - the column will be filled by the
rebalancer, but we may not care about the nodes involved in other uses
of the background task runner.

Table "pg_catalog.pg_dist_background_task"

     Column     |           Type           
============================================
 job_id         | bigint
 task_id        | bigint
 owner          | regrole
 pid            | integer
 status         | citus_task_status
 command        | text
 retry_count    | integer
 not_before     | timestamp with time zone
 message        | text
+nodes_involved | integer[]

A hashtable named ParallelTasksPerNode keeps track of the number of
parallel running background tasks per node. An entry in the hashtable is
as follows:

ParallelTasksPerNodeEntry
{
	node_id // The node is used as the hash table key 
	counter // Number of concurrent background tasks that involve node node_id
                // The counter limit is citus.max_background_task_executors_per_node
}

When the background task runner assigns a runnable task to a new
executor, it increments the counter for each of the nodes involved with
that runnable task. The limit of each counter is
citus.max_background_task_executors_per_node. If the limit is reached
for any of the nodes involved, this runnable task is skipped. And then,
later, when the running task finishes, the background task runner
decrements the counter for each of the nodes involved with the done
task. The following functions take care of these increment-decrement
steps:

IncrementParallelTaskCountForNodesInvolved(task)
DecrementParallelTaskCountForNodesInvolved(task)

citus.max_background_task_executors_per_node can be changed in the
fly. In the background rebalancer, we simply give {source_node,
target_node} as the nodesInvolved input to the
ScheduleBackgroundTask function. The rest is taken care of by the
general background task runner infrastructure explained above. Check
background_task_queue_monitor.sql and
background_rebalance_parallel.sql tests for detailed examples.

#### Note

This PR also adds a hard node dependency if a node is first being used
as a source for a move, and then later as a target. The reason this
should be a hard dependency is that the first move might make space for
the second move. So, we could run out of disk space (or at least
overload the node) if we move the second shard to it before the first
one is moved away.

Fixes https://github.com/citusdata/citus/issues/6716
2023-04-06 14:12:39 +03:00
Halil Ozan Akgül 52ad2d08c7
Multi tenant monitoring (#6725)
DESCRIPTION: Adds views that monitor statistics on tenant usages

This PR adds `citus_stats_tenants` view that monitors the tenants on the
cluster.

`citus_stats_tenants` shows the node id, colocation id, tenant
attribute, read count in this period and last period, and query count in
this period and last period of the tenant.
Tenant attribute currently is the tenant's distribution column value,
later when schema based sharding is introduced, this meaning might
change.
A period is a time bucket the queries are counted by. Read and query
counts for this period can increase until the current period ends. After
that those counts are moved to last period's counts, which cannot
change. The period length can be set using 'citus.stats_tenants_period'.

`SELECT` queries are counted as _read_ queries, `INSERT`, `UPDATE` and
`DELETE` queries are counted as _write_ queries. So in the view read
counts are `SELECT` counts and query counts are `SELECT`, `INSERT`,
`UPDATE` and `DELETE` count.

The data is stored in shared memory, in a struct named
`MultiTenantMonitor`.

`citus_stats_tenants` shows the data from local tenants.

`citus_stats_tenants` show up to `citus.stats_tenant_limit` number of
tenants.
The tenants are scored based on the number of queries they run and the
recency of those queries. Every query ran increases the score of tenant
by `ONE_QUERY_SCORE`, and after every period ends the scores are halved.
Halving is done lazily.
To retain information a longer the monitor keeps up to 3 times
`citus.stats_tenant_limit` tenants. When the tenant count hits `3 *
citus.stats_tenant_limit`, last `citus.stats_tenant_limit` tenants are
removed. To see all stored tenants you can use
`citus_stats_tenants(return_all_tenants := true)`

- [x] Create collector view that gets data from all nodes. #6761 
- [x] Add monitoring log #6762 
- [x] Create enable/disable GUC #6769 
- [x] Parse the annotation string correctly #6796 
- [x] Add local queries and prepared statements #6797
- [x] Rename to citus_stat_statements #6821 
- [x] Run pgbench
- [x] Fix role permissions #6812

---------

Co-authored-by: Gokhan Gulbiz <ggulbiz@gmail.com>
Co-authored-by: Jelte Fennema <github-tech@jeltef.nl>
2023-04-05 17:44:17 +03:00
rajeshkt78 d5df892394
Make CDC decoder an independent extension (#6810)
DESCRIPTION: 

- The CDC decoder is refacroted into a seperate extension that can be used loaded dynamically without having to reload citus.
- CDC decoder code can be compiled using DECODER flag to work with different decoders like pgoutput and wal2json.
   by default the base decode is "pgoutput".
- the dynamic_library_path config is adjusted dynamically to prefer the decoders in cdc_decoders directory in citus init
  so that the users can use the replication subscription commands without having to make any config changes.
2023-04-03 21:32:15 +05:30
aykutbozkurt 85d50203d1 PR #6728  / commit - 2
- Create MetadataSyncContext api to encapsulate
  both transactional and nontransactional modes,
- Add a GUC to switch between metadata sync transaction modes.
2023-03-30 10:52:46 +03:00
rajeshkt78 85b8a2c7a1
CDC implementation for Citus using Logical Replication (#6623)
Description:
Implementing CDC changes using Logical Replication to avoid
re-publishing events multiple times by setting up replication origin
session, which will add "DoNotReplicateId" to every WAL entry.
   - shard splits
   - shard moves
   - create distributed table
   - undistribute table
   - alter distributed tables (for some cases)
   - reference table operations
   

The citus decoder which will be decoding WAL events for CDC clients, 
ignores any WAL entry with replication origin that is not zero.
It also maps the shard names to distributed table names.
2023-03-28 16:00:21 +05:30
Onur Tirtir 616b5018a0
Add a GUC to disallow planning the queries that reference non-colocated tables via router planner (#6793)
Today we allow planning the queries that reference non-colocated tables
if the shards that query targets are placed on the same node. However,
this may not be the case, e.g., after rebalancing shards because it's
not guaranteed to have those shards on the same node anymore.
This commit adds citus.enable_non_colocated_router_query_pushdown GUC
that can be used to disallow  planning such queries via router planner,
when it's set to false. Note that the default value for this GUC will be
"true" for 11.3, but we will alter it to "false" on 12.0 to not
introduce
a breaking change in a minor release.

Closes #692.

Even more, allowing such queries to go through router planner also
causes
generating an incorrect plan for the DML queries that reference
distributed
tables that are sharded based on different replication factor settings.
For
this reason, #6779 can be closed after altering the default value for
this
GUC to "false", hence not now.

DESCRIPTION: Adds `citus.enable_non_colocated_router_query_pushdown` GUC
to ensure generating a consistent distributed plan for the queries that
reference non-colocated distributed tables (when set to "false", the
default is "true").
2023-03-28 13:10:29 +03:00
Jelte Fennema 92689a8362
Make GPIDs work with pg_dist_poolinfo (#6588)
The original implementation of GPIDs didn't work correctly when using
`pg_dist_poolinfo` together with PgBouncer. The reason is that it
assumed that once a connection was made to a worker, the originating
GPID should stay the same for ever. But when pg_dist_poolinfo is used
this isn't the case, because the same connection on the worker might be
used by different backends of the coordinator.

This fixes that issue by updating the GPID whenever a new application
name is set on a connection. This is the only thing that's needed,
because PgBouncer already sets the application name correctly on the
server connection whenever a client is updated.
2023-01-13 14:39:19 +00:00
Jelte Fennema 34df853bda
Fix bug introduced by #6412 (#6590)
In #6412 I made a change to not re-assign the global PID if it was
already set. This inadvertently introduced a regression where `userId`
and `databaseId` would not be set on the backend data when the global
PID was assigned in the authentication hook.

This fixes it by doing two things:
1. Removing `userId` from `BackendData`, since it's not used anywhere
   anyway.
2. Move assignment of `databaseId` to dedicated
   `SetBackendDataDatabaseId` function, that isn't a no-op when global
   pid is already set.

Since #6412 is not released yet this does not need a description.
2023-01-10 16:21:57 +01:00
Hanefi Onaldi d4394b2e2d
Fix spacing in multiline strings (#6533)
When using multiline strings, we occasionally forget to add a single
space at the end of the first line. When this line is concatenated with
the next one, the resulting string has a missing space.
2022-12-01 23:42:47 +03:00
Hanefi Onaldi 1f29c16262
Fix misleading GUC description (#6532)
citus.skip_advisory_lock_permission_checks skips checks when it is set
to 'on', not 'off'
2022-12-01 15:43:02 +03:00
aykut-bozkurt 1f8675da43
nonblocking concurrent task execution via background workers (#6459)
Improvement on our background task monitoring API (PR #6296) to support
concurrent and nonblocking task execution.

Mainly we have a queue monitor background process which forks task
executors for `Runnable` tasks and then monitors their status by
fetching messages from shared memory queue in nonblocking way.
2022-11-30 14:29:46 +03:00
Gokhan Gulbiz bc118ee551
Change GUC propagation flag's default value to off (#6516)
This PR changes
```citus.propagate_session_settings_for_loopback_connection``` default
value to off not to expose this feature publicly at this point. See
#6488 for details.
2022-11-29 13:25:53 +03:00
Marco Slot 666696c01c
Deprecate citus.replicate_reference_tables_on_activate, make it always off (#6474)
Co-authored-by: Marco Slot <marco.slot@gmail.com>
2022-11-04 16:21:10 +01:00
oohira 3f66f3d9dd
Add missing space to citus.shard_count description (#6464)
DESCRIPTION: Add missing space to citus.shard_count description
2022-10-31 10:37:14 +01:00
Teja Mupparti 01103ce05d This implements a new UDF citus_get_cluster_clock() that returns a monotonically
increasing logical clock. Clock guarantees to never go back in value after restarts,
and makes best attempt to keep the value close to unix epoch time in milliseconds.

Also, introduces a new GUC "citus.enable_cluster_clock", when true, every
distributed transaction is stamped with logical causal clock and persisted
in a catalog pg_dist_commit_transaction.
2022-10-28 10:15:08 -07:00
Ahmet Gedemenli c379ff8614
Drop defer drop gucs (#6447)
DESCRIPTION: Drops GUC defer_drop_after_shard_split
DESCRIPTION: Drops GUC defer_drop_after_shard_move

Drop GUCs and related parts from the code.
Delete tests that specifically added for the GUCs.
Keep tests that can be used without the GUCs.
Update test output changes.

The motivation for this PR is to have an "always deferring" mechanism.
These two GUCs provide an option to not deferring dropping objects
during a shard move/split, and dropping them immediately. With this PR,
we will be always deferring dropping orphaned shards and other types of
objects.

We will have a separate PR to extend the deferred cleanup operation, so
that we would create records for deferred drop, for Subscriptions,
Publications, Replication Slots etc. This will make us be able to keep
track of created objects that needs to be dropped, during a shard
move/split. We will have objects created specifically for the current
operation; and those objects will be dropped at the end.

We have an issue (a draft roadmap) for enabling parallel shard moves.
For details please see: https://github.com/citusdata/citus/issues/6437
2022-10-25 16:48:34 +03:00
Gokhan Gulbiz e87eda6496
Introduce a new GUC to propagate local settings to new connections in rebalancer (#6396)
DESCRIPTION: Introduce
```citus.propagate_session_settings_for_loopback_connection``` GUC to
propagate local settings to new connections.

Fixes: #5289
2022-10-18 12:50:30 +03:00
Jelte Fennema cb34adf7ac
Don't reassign global PID when already assigned (#6412)
DESCRIPTION: Fix bug in global PID assignment for rebalancer
sub-connections

In CI our isolation_shard_rebalancer_progress test would sometimes fail
like this:
```diff
+isolationtester: canceling step s1-rebalance-c1-block-writes after 60 seconds
 step s1-rebalance-c1-block-writes:
  SELECT rebalance_table_shards('colocated1', shard_transfer_mode:='block_writes');
- <waiting ...>
+
+ERROR:  canceling statement due to user request
 step s7-get-progress:
```

Source:
https://app.circleci.com/pipelines/github/citusdata/citus/27855/workflows/2a7e335a-f3e8-46ed-b6bd-6920d42f7214/jobs/831710

It turned out this was an actual bug in the way our assigning of global
PIDs interacts with the way we connect to ourselves as the shard
rebalancer. The first command the shard rebalancer sends is a SET 
ommand to change the application_name to `citus_rebalancer`. If
`StartupCitusBackend` is called after this command is processed, then it
overwrites the global PID that was extracted from the previous
application_name. This makes sure that we don't do that, and continue to
use the original global PID. While it might seem that we only call 
`StartupCitusBackend` once for each query backend, this isn't actually 
the case. Whenever pg_dist_partition gets ANALYZEd by autovacuum
we indirectly call `StartupCitusBackend` again, because we invalidate 
the cache then.

In passing this fixes two other things as well:
1. It sets `distributedCommandOriginator` correctly in
   `AssignGlobalPID`, by using IsExternalClientBackend(). This doesn't
   matter much anymore, since AssignGlobalPID effectively becomes a
   no-op in this PR for any non-external client backends.
2. It passes the application_name to InitializeBackendData in
   StartupCitusBackend, instead of INVALID_CITUS_INTERNAL_BACKEND_GPID
   (which effectively got casted to NULL). In practice this doesn't
   change the behaviour of the call, since the call is a no-op for every
   backend except the maintenance daemon. And the behaviour of the call
   is the same for NULL as for the application_name of the maintenance
   daemon.
2022-10-11 16:41:01 +02:00
Marco Slot ba2fe3e3c4
Remove do_repair option from citus_copy_shard_placement (#6299)
Co-authored-by: Marco Slot <marco.slot@gmail.com>
2022-09-09 15:44:30 +02:00