Commit Graph

56 Commits (97cbfb33f0f14fcdd43cd3435a6d796c894e8f59)

Author SHA1 Message Date
Colm McHugh 97cbfb33f0 POC - avoid deparse - parse - plan for local execution on single shard 2025-06-05 16:10:41 +00:00
Onur Tirtir ea7aa6712d
Move stat view implementations into a submodule (#7975)
Also move serialize_distributed_ddls into commands submodule, seems like
an oversight from last year (by me).
2025-04-29 14:22:29 +03:00
Naisila Puka 6bd3474804 Rename foreach_ macros to foreach_declared_ macros (#7700)
This is prep work for successful compilation with PG17

PG17added foreach_ptr, foreach_int and foreach_oid macros
Relevant PG commit
14dd0f27d7cd56ffae9ecdbe324965073d01a9ff

14dd0f27d7

We already have these macros, but they are different with the
PG17 ones because our macros take a DECLARED variable, whereas
the PG16 macros declare a locally-scoped loop variable themselves.

Hence I am renaming our macros to foreach_declared_

I am separating this into its own PR since it touches many files. The
main compilation PR is https://github.com/citusdata/citus/pull/7699
2025-03-12 11:01:49 +03:00
Nils Dijk 0620c8f9a6
Sort includes (#7326)
This change adds a script to programatically group all includes in a
specific order. The script was used as a one time invocation to group
and sort all includes throught our formatted code. The grouping is as
follows:

 - System includes (eg. `#include<...>`)
 - Postgres.h (eg. `#include "postgres.h"`)
- Toplevel imports from postgres, not contained in a directory (eg.
`#include "miscadmin.h"`)
 - General postgres includes (eg . `#include "nodes/..."`)
- Toplevel citus includes, not contained in a directory (eg. `#include
"citus_verion.h"`)
 - Columnar includes (eg. `#include "columnar/..."`)
 - Distributed includes (eg. `#include "distributed/..."`)

Because it is quite hard to understand the difference between toplevel
citus includes and toplevel postgres includes it hardcodes the list of
toplevel citus includes. In the same manner it assumes anything not
prefixed with `columnar/` or `distributed/` as a postgres include.

The sorting/grouping is enforced by CI. Since we do so with our own
script there are not changes required in our uncrustify configuration.
2023-11-23 18:19:54 +01:00
Naisila Puka b36c431abb
PG16 compatibility - Rework PlannedStmt and Query's Permission Info (#7098)
PG16 compatibility - Part 6

Check out part 1 42d956888d
part 2 0d503dd5ac
part 3 907d72e60d
part 4 7c6b4ce103
part 5 6056cb2c29

This commit is in the series of PG16 compatibility commits.
It handles the Permission Info changes in PG16. See below:

The main issue lies in the following entries of PlannedStmt: {
   rtable
   permInfos
}

Each rtable has an int perminfoindex, and its actual permission info is
obtained through the following:
permInfos[perminfoindex]
We had crashes because perminfoindexes were not updated in the finalized
planned statement after distributed planner hook.
So, basically, everywhere we set a query's or planned statement's rtable
entry, we need to set the rteperminfos/permInfos accordingly.

Relevant PG commits:
a61b1f7482
a61b1f74823c9c4f79c95226a461f1e7a367764b
b803b7d132
b803b7d132e3505ab77c29acf91f3d1caa298f95

More PG16 compatibility commits are coming soon ...
2023-08-09 15:23:00 +03:00
Halil Ozan Akgül 52ad2d08c7
Multi tenant monitoring (#6725)
DESCRIPTION: Adds views that monitor statistics on tenant usages

This PR adds `citus_stats_tenants` view that monitors the tenants on the
cluster.

`citus_stats_tenants` shows the node id, colocation id, tenant
attribute, read count in this period and last period, and query count in
this period and last period of the tenant.
Tenant attribute currently is the tenant's distribution column value,
later when schema based sharding is introduced, this meaning might
change.
A period is a time bucket the queries are counted by. Read and query
counts for this period can increase until the current period ends. After
that those counts are moved to last period's counts, which cannot
change. The period length can be set using 'citus.stats_tenants_period'.

`SELECT` queries are counted as _read_ queries, `INSERT`, `UPDATE` and
`DELETE` queries are counted as _write_ queries. So in the view read
counts are `SELECT` counts and query counts are `SELECT`, `INSERT`,
`UPDATE` and `DELETE` count.

The data is stored in shared memory, in a struct named
`MultiTenantMonitor`.

`citus_stats_tenants` shows the data from local tenants.

`citus_stats_tenants` show up to `citus.stats_tenant_limit` number of
tenants.
The tenants are scored based on the number of queries they run and the
recency of those queries. Every query ran increases the score of tenant
by `ONE_QUERY_SCORE`, and after every period ends the scores are halved.
Halving is done lazily.
To retain information a longer the monitor keeps up to 3 times
`citus.stats_tenant_limit` tenants. When the tenant count hits `3 *
citus.stats_tenant_limit`, last `citus.stats_tenant_limit` tenants are
removed. To see all stored tenants you can use
`citus_stats_tenants(return_all_tenants := true)`

- [x] Create collector view that gets data from all nodes. #6761 
- [x] Add monitoring log #6762 
- [x] Create enable/disable GUC #6769 
- [x] Parse the annotation string correctly #6796 
- [x] Add local queries and prepared statements #6797
- [x] Rename to citus_stat_statements #6821 
- [x] Run pgbench
- [x] Fix role permissions #6812

---------

Co-authored-by: Gokhan Gulbiz <ggulbiz@gmail.com>
Co-authored-by: Jelte Fennema <github-tech@jeltef.nl>
2023-04-05 17:44:17 +03:00
Teja Mupparti 9a9989fc15 Support MERGE Phase – I
All the tables (target, source or any CTE present) in the SQL statement are local i.e. a merge-sql with a combination of Citus local and
Non-Citus tables (regular Postgres tables) should work and give the same result as Postgres MERGE on regular tables. Catch and throw an
exception (not-yet-supported) for all other scenarios during Citus-planning phase.
2022-12-18 20:32:15 -08:00
aykut-bozkurt 69726648ab
verify shards if exists for insert, delete, update (#6280)
Co-authored-by: Marco Slot <marco.slot@gmail.com>
2022-09-06 15:29:14 +02:00
Marco Slot 639588bee0
Remove unused functions (#6220)
Co-authored-by: Marco Slot <marco.slot@gmail.com>
2022-08-22 11:53:25 +03:00
Sait Talha Nisanci ab29c25658 Fix missing from entry 2021-11-04 18:54:52 +03:00
Onder Kalaci 69ca943e58 Deparse/parse the local cached queries
With local query caching, we try to avoid deparse/parse stages as the
operation is too costly.

However, we can do deparse/parse operations once per cached queries, right
before we put the plan into the cache. With that, we avoid edge
cases like (4239) or (5038).

In a sense, we are making the local plan caching behave similar for non-cached
local/remote queries, by forcing to deparse the query once.
2021-06-21 12:24:29 +03:00
Marco Slot 715dce1eea Reduce local insert memory usage during deparsing 2021-05-18 16:11:43 +02:00
Marco Slot 58f85f55c0 Fixes a crash in queries with a modifying CTE and a SELECT without FROM 2021-03-09 10:39:33 +01:00
Ahmet Gedemenli a64dc8a72b Fixes a bug preventing INSERT SELECT .. ON CONFLICT with a constraint name on local shards
Separate search relation shard function

Add tests
2020-11-25 15:10:46 +03:00
Hanefi Önaldı 6d8e83d24f
Replace worker_hash calls with partkey IS NOT NULL filters 2020-10-02 18:16:24 +03:00
Onur Tirtir 0b1cc118a9 Adapt other cache entry changes for citus local tables 2020-09-09 11:50:55 +03:00
SaitTalhaNisanci 366461ccdb
Introduce cache entry/table utilities (#4132)
Introduce table entry utility functions

Citus table cache entry utilities are introduced so that we can easily
extend existing functionality with minimum changes, specifically changes
to these functions. For example IsNonDistributedTableCacheEntry can be
extended for citus local tables without the need to scan the whole
codebase and update each relevant part.

* Introduce utility functions to find the type of tables

A table type can be a reference table, a hash/range/append distributed
table. Utility methods are created so that we don't have to worry about
how a table is considered as a reference table etc. This also makes it
easy to extend the table types.

* Add IsCitusTableType utilities

* Rename IsCacheEntryCitusTableType -> IsCitusTableTypeCacheEntry

* Change citus table types in some checks
2020-09-02 22:26:05 +03:00
Hadi Moshayedi 7b74eca22d Support EXPLAIN EXECUTE ANALYZE. 2020-08-10 13:44:30 -07:00
Sait Talha Nisanci bf831d2e59 Use table_openXXX methods in the codebase
With PG13 heap_* (heap_open, heap_close etc) are replaced with table_*
(table_open, table_close etc).

It is better to use the new table access methods in the codebase and
define the macros for the previous versions as we can easily remove the
macro without having to change the codebase when we drop the support for
the old version.

Commits that introduced this change on Postgres:
f25968c49697db673f6cd2a07b3f7626779f1827
e0c4ec07284db817e1f8d9adfb3fffc952252db0
4b21acf522d751ba5b6679df391d5121b6c4a35f

Command to see relevant commits on Postgres side:
git log --all --grep="heap_open"
2020-08-04 15:10:22 +03:00
Hadi Moshayedi 13003d8d05 Use TupleDestination API for partitioning in insert/select. 2020-07-17 09:43:46 -07:00
Jelte Fennema 759e628dd5
Handle some NULL issues that static analysis found (#4001)
Static analysis found some issues where we used the result from
ExtractResultRelationRTE, without checking that it wasn't NULL. It seems
like in all these cases it can never actually be NULL, since we have checked
before that it isn't a SELECT query. So, this PR is mostly to make static
analysis happy (and protect a bit against future changes of the code).
2020-07-09 15:46:42 +02:00
Marco Slot 24feadc230 Handle joins between local/reference/cte via router planner 2020-06-12 18:36:01 -07:00
Hadi Moshayedi 0bfd39ea52 Implement TupleDestination intereface.
Implements a new `TupleDestination` interface to allow custom tuple processing per task.

This can be specially useful if a task contains multiple queries. An example of this EXPLAIN
ANALYZE, where it needs to add some UDF calls to the query to fetch the explain output
from worker after fetching the actual query results.
2020-06-05 17:47:40 -07:00
SaitTalhaNisanci a9a3be15cc
introduce TASK_QUERY_NULL task type (#3774)
When we call SetTaskQueryString we would set the task type to
TASK_QUERY_TEXT, and some parts of the codebase rely on the fact that if
TASK_QUERY_TEXT is set, the data can be read safely. However if
SetTaskQueryString is called with a NULL taskQueryString this can cause
crashes. In that case taskQueryType will simply be set to
TASK_QUERY_NULL.
2020-04-17 14:59:22 +03:00
Onur Tirtir 13a35c6813 implement GetOnlyShardOidOfReferenceTable and some refactor in shard_uitls 2020-04-03 18:24:13 +03:00
SaitTalhaNisanci b5591b1b28 use taskQuery as a struct to simplify the code 2020-03-31 15:47:55 +03:00
SaitTalhaNisanci 8806c4d697 move queryStringList into taskQuery
Also allocate task query in the memory context of task.
2020-03-31 15:47:55 +03:00
SaitTalhaNisanci c796ac335d add TaskQuery struct to abstract query string related fields
We had many fields in task related to query strings. It was kind of
complex, and only of them could be set at a time. Therefore it makes
more sense to abstract this and use a union so that it is clear that
only of them should be set.

We have three fields that could have query related strings:
- queryForLocation
- queryStringLazy
- perPlacementQueryStrings

Relatively, they can be set with:
- SetTaskQueryString
- SetTaskQueryIfShouldLazyDeparse
- SetTaskPerPlacementQueryStrings

The direct usage of the query related fields are also removed.

Rename queryForLocalExecution

Currently queryForLocalExecution is only used for deparsing purposes,
therefore it makes sense to rename it to what it is doing.
2020-03-31 15:47:55 +03:00
SaitTalhaNisanci 98f95e2a5e add TaskQueryStringForPlacement
TaskQueryStringForPlacement simplifies how the executor gets the query
string for a given placement. Task will use the necessary fields to
return the correct query placement string. Executor doesn't need to know
the details for this.

rename TaskQueryString as TaskQueryStringAllPlacements

TaskQueryString returns the query string that will be the same for all
the placements. In INSERT..SELECT the query string can be different for
each placement. Adaptive executor uses TaskQueryStringForPlacement,
which returns the query string for a placement. It makes sense to rename
TaskQueryString as TaskQueryStringAllPlacements as it is returning the
query string for all placements.

rename SetTaskQuery as SetTaskQueryIfShouldLazyDeparse

SetTaskQuery does not always sets the task query. It can set the query
string as well. So it is more clear to name it
SetTaskQueryIfShouldLazyDeparse, since it will set the query not query
string only when we should deparse the query in a lazy way.
2020-03-31 15:47:55 +03:00
SaitTalhaNisanci 982b5fbabf add SetTaskPerPlacementStrings
It is possible that a task will have different query string for each
placement. This is the case in INSERT..SELECT via repartitioning. When
we are setting task->perPlacementQueryString, we should set
queryStringLazy to NULL. Therefore a method for that purpose is created.
2020-03-31 15:47:55 +03:00
SaitTalhaNisanci dd1a456407
store query command list in task (#3649)
Sometimes we have concatenated query strings for a task. However,
when we want to find each query string, it is not a trivial task.
Therefore, it makes sense to store this in task so that when we need
each query string we can easily get it.
2020-03-26 12:04:08 +03:00
SaitTalhaNisanci 39bbec0f30 add tests for local copy execution 2020-03-18 09:28:59 +03:00
SaitTalhaNisanci f9c4431885 add the support to execute copy locally
A copy will be executed locally if
- Local execution is enabled and current transaction accessed a local placement
- Local execution is enabled and we are inside a transaction block.

So even if local execution is enabled but we are not in a transaction block, the copy will not be run locally.

This will not run locally:
```
COPY distributed_table FROM STDIN;
....
```

This will run locally:
```
SET citus.enable_local_execution to 'on';
BEGIN;
COPY distributed_table FROM STDIN;
COMMIT;
....
```
.
There are 3 ways to do a copy in postgres programmatically:
- from a file
- from a program
- from a callback function

I have chosen to implement it with a callback function, which means that we write the rows of copy from a callback function to the output buffer, which is used to insert tuples into the actual table.

For each shard id, we have a buffer that keeps the current rows to be written, we perform the actual copy operation either when:
- copy buffer for the given shard id reaches to a threshold, which is currently 512KB
- we reach to the end of the copy

The buffer size is debatable(512KB). At a given time, we might allocate (local placement * buffer size) memory at most.

The local copy uses the same copy format as remote copy, which means that we serialize the data in the same format as remote copy and send it locally.

There was also the option to use ExecSimpleRelationInsert to insert
slots one by one, which would avoid the extra
serialization/deserialization but doing some benchmarks it seems that
using buffers are significantly better in terms of the performance.

You can see this comment for more details: https://github.com/citusdata/citus/pull/3557#discussion_r389499054
2020-03-18 09:28:59 +03:00
SaitTalhaNisanci 321d0152c1
add a utility to get shard oid from relation oid and shard id (#3596) 2020-03-09 15:50:29 +03:00
Marco Slot dc4c0c032e Refactor CitusBeginScan into separate DML / SELECT paths 2020-03-05 12:37:22 +01:00
Philip Dubé 08f6842d50 Fix typos
Equivalance -> Equivalence
utillity -> utility
shorted lived one -> shortly lived one
elegible -> eligible
2020-02-18 17:14:40 +00:00
Onder Kalaci 0bf1e81e33 Cache local plans on BeginScan 2020-01-17 16:02:57 +01:00
Jelte Fennema 246435be7e
Lazy query deparsing executable queries (#3350)
Deparsing and parsing a query can be heavy on CPU. When locally executing 
the query we don't need to do this in theory most of the time.

This PR is the first step in allowing to skip deparsing and parsing
the query in these cases, by lazily creating the query string and
storing the query in the task. Future commits will make use of this and
not deparse and parse the query anymore, but use the one from the task
directly.
2020-01-17 11:49:43 +01:00
SaitTalhaNisanci 13204487e9
remove copyright years (#3286) 2019-12-11 21:14:08 +03:00
Jelte Fennema 1d8dde232f
Automatically convert useless declarations using regex replace (#3181)
* Add declaration removal to CI

* Convert declarations
2019-11-21 13:47:29 +01:00
Philip Dubé 018ad1c58e pg12: version_compat.h, tuples, oids, misc 2019-08-22 18:57:23 +00:00
Philip Dubé 064bd66a20 Avoid segfault in logging queries 2019-07-31 15:28:46 +00:00
Philip Dubé 0915027389 DistributedPlan: replace operation with modLevel
This causes no behaviorial changes, only organizes better to implement modifying CTEs

Also rename ExtactInsertRangeTableEntry to ExtractResultRelationRTE,
as the source of this function didn't match the documentation

Remove Task's upsertQuery in favor of ROW_MODIFY_NONCOMMUTATIVE

Split up AcquireExecutorShardLock into more internal functions

Tests: Normalize multi_reference_table multi_create_table_constraints
2019-07-16 13:58:18 -07:00
mehmet furkan şahin ef9f38b68d ApplyLogRedaction noop func is added 2018-08-17 14:48:54 -07:00
velioglu 32bcd610c1 Support modify queries with multiple tables
With this commit we begin to support modify queries with multiple
tables if these queries are pushdownable.
2018-05-02 16:22:26 +03:00
metdos 8c356b2bc8 Don't try to add restrictions for reference tables in insert into select 2017-10-31 19:44:10 +02:00
velioglu 0b5db5d826 Support multi shard update/delete queries 2017-10-25 15:52:38 +03:00
Jason Petersen 6c9b19a954
Add version-compat header
For polyfill macros, etc.
2017-09-25 17:20:23 -07:00
Jason Petersen fbeaa2f9d0
Remove direct access to tupleDesc->attrs
A level of indirection was removed from this field for PostgreSQL 11.
By using the handy provided macro, we can be version agnostic.
2017-09-25 17:20:23 -07:00
Marco Slot c97692f382 Fix multi-row INSERT with RETURNING on reference tables 2017-08-24 10:42:12 +02:00