Compare commits

..

79 Commits

Author SHA1 Message Date
Hanefi Onaldi 0750cc3a8e
Bump Citus version to 9.5.10 2021-11-08 16:52:56 +03:00
Hanefi Onaldi ac8285f9e0
Add changelog entries for 9.5.10
(cherry picked from commit 7b63edfc83)
2021-11-08 16:51:09 +03:00
Hanefi Onaldi 27ef29a981
Bump Citus version to 9.5.9 2021-11-08 15:21:02 +03:00
Hanefi Onaldi 62ea10989c
Add changelog entries for 9.5.9
(cherry picked from commit 3d49cbf9ab)
2021-11-08 14:14:01 +03:00
Nils Dijk 8f6cbb2d6b
reinstate optimization that got unintentionally broken in 366461ccdb (#5418)
DESCRIPTION: Reinstate optimisation for uniform shard interval ranges

During a refactor introduced in #4132 the following change was made, which made the optimisation in `CalculateUniformHashRangeIndex` unreachable: 
366461ccdb (diff-565a339ed3c78bc5a0d4ffeb4e91032150b1dffbeeff59cd3e65981d20b998c7L319-R319)

This PR reinstates the path to the optimisation!
2021-11-05 13:08:42 +01:00
Sait Talha Nisanci 64a9baffbd Adjust tests for 9.5 2021-11-05 13:08:11 +03:00
Sait Talha Nisanci 52631ea0fc Fix missing from entry
(cherry picked from commit a0e0759f73)
2021-11-05 13:08:11 +03:00
Onder Kalaci 53fbe96399 Deparse/parse the local cached queries
With local query caching, we try to avoid deparse/parse stages as the
operation is too costly.

However, we can do deparse/parse operations once per cached queries, right
before we put the plan into the cache. With that, we avoid edge
cases like (4239) or (5038).

In a sense, we are making the local plan caching behave similar for non-cached
local/remote queries, by forcing to deparse the query once.

(cherry picked from commit 69ca943e58)
2021-11-05 13:08:11 +03:00
Marco Slot 70fcddc0cb Fixes a crash in queries with a modifying CTE and a SELECT without FROM
(cherry picked from commit 58f85f55c0)
2021-11-05 13:08:11 +03:00
Marco Slot a875449b60 Run subquery_prepared_statements by itself
(cherry picked from commit ecbc1ab008)
2021-10-27 11:12:25 +02:00
SaitTalhaNisanci f9245510bc Run pg12 and pg13 separately (#4352)
It seems that sometimes we get `too many clients errors` with this set
of parallel tests, hence two of them are separated.

(cherry picked from commit 8c3dd6338e)
2021-10-27 11:12:25 +02:00
Sait Talha Nisanci d7a4dc8aae Replace workerNodeCount -> nodeCount
(cherry picked from commit ff82e85ea2)
2021-10-27 09:54:43 +02:00
Sait Talha Nisanci c908bad0e2 Set previous cell inside a for loop
(cherry picked from commit eb5be579e3)
2021-10-27 09:54:43 +02:00
Sait Talha Nisanci 3aee6aa494 Remove unused method
(cherry picked from commit 9ba3f70420)
2021-10-27 09:54:43 +02:00
Sait Talha Nisanci 2cf172fd27 Consider coordinator in intermediate result optimization
It seems that we were not considering the case where coordinator was
added to the cluster as a worker in the optimization of intermediate
results.

This could lead to errors when coordinator was added as a worker.

(cherry picked from commit 24e60b44a1)
2021-10-27 09:54:43 +02:00
Ahmet Gedemenli 213a5d746d
Fixes a bug preventing INSERT SELECT .. ON CONFLICT with a constraint name on local shards
Separate search relation shard function

Add tests

(cherry picked from commit a64dc8a72b)
2021-10-22 11:38:05 +03:00
Hanefi Onaldi 9ffef77288
Missing v in changelogs 2021-09-17 20:23:00 +03:00
Hanefi Onaldi b3dd39bf82
Bump Citus version to 9.5.8 2021-09-15 19:32:15 +03:00
Hanefi Onaldi a62e046646
Add changelog entries for 9.5.8
(cherry picked from commit 92af115f21)
2021-09-15 19:31:28 +03:00
Marco Slot ad79a5e080 Small fix to PG12 compatibility 2021-09-10 14:37:21 +02:00
Marco Slot be9fc69923 Avoid switch to superuser in worker_merge_files_into_table 2021-09-10 13:29:13 +02:00
Marco Slot e5e962b9b6 Add worker_append_table_to_shard permissions tests 2021-09-10 13:29:13 +02:00
Marco Slot fcd27a48c3 Perform copy command as regular user in worker_append_table_to_shard 2021-09-10 13:29:10 +02:00
Onur Tirtir 963bf8559c Backport missing pieces of 4a2dde4612
Sorry, I forgot to reflect the changes done in alter_table.c to
create_distributed_table.c when backporting
35043c56f1.
2021-09-08 16:33:33 +03:00
Onur Tirtir 355a774087 Not read heaptuple after closing pg_rewrite (#5255)
(cherry picked from commit cc49e63222)
2021-09-08 16:01:16 +03:00
Ahmet Gedemenli 4a2dde4612 Fix dropping materialized views while doing alter table
(cherry picked from commit 35043c56f1)

 Conflicts:
	src/backend/distributed/commands/alter_table.c
	src/test/regress/expected/alter_distributed_table.out
	src/test/regress/expected/alter_distributed_table_0.out
	src/test/regress/expected/alter_table_set_access_method.out
	src/test/regress/sql/alter_distributed_table.sql
	src/test/regress/sql/alter_table_set_access_method.sql
2021-09-08 16:01:09 +03:00
Hanefi Onaldi 209049006c
Bump Citus version to 9.5.7 2021-08-16 17:51:46 +03:00
Hanefi Onaldi a1633ed969
Add changelog entries for 9.5.7
(cherry picked from commit 41b15d8775)
2021-08-16 17:50:34 +03:00
Onder Kalaci d052c19cd8 Guard against hard WaitEvenSet errors
In short, add wrappers around Postgres' AddWaitEventToSet() and
ModifyWaitEvent().

AddWaitEventToSet()/ModifyWaitEvent*() may throw hard errors. For
example, when the underlying socket for a connection is closed by
the remote server and already reflected by the OS, however
Citus hasn't had a chance to get this information. In that case,
if replication factor is >1, Citus can failover to other nodes
for executing the query. Even if replication factor = 1, Citus
can give much nicer errors.

So CitusAddWaitEventSetToSet()/CitusModifyWaitEvent() simply puts
AddWaitEventToSet()/ModifyWaitEvent() into a PG_TRY/PG_CATCH block
in order to catch any hard errors, and returns this information to
the caller.
2021-08-10 09:47:08 +02:00
Onder Kalaci fc2272c6bd Adjust the tests to earlier versions
- Drop PRIMARY KEY for Citus 10 compatibility
- Drop columnar for PG 12
- Do not start/stop metadata sync as stop is not implemented in 10.1
- PG 11 parallel query changes explain outputs
2021-08-06 16:47:11 +02:00
Onder Kalaci aeca7b1868 Dropped columns do not diverge distribution column for partitioned tables
Before this commit, creating a partition after a DROP column
on the parent (position before dist. key) was leading to
partition to have the wrong distribution column.

(cherry picked from commit 32124efd83)
2021-08-06 16:46:52 +02:00
naisila 74a58270db Bump use of new sql function 2021-08-03 16:45:23 +03:00
naisila edfa976cc8 Fix master_update_table_statistics scripts for 9.5 2021-08-03 16:45:23 +03:00
naisila daf85c6923 Fix master_update_table_statistics scripts for 9.4 2021-08-03 16:45:23 +03:00
naisila 7f11a59863 Reimplement master_update_table_statistics to detect dist. deadlocks
(ALL CODE BORROWED from commit 2f30614fe3)
2021-08-03 16:45:23 +03:00
Onur Tirtir b9df44742f Remove fkey graph visited flags & rework GetConnectedListHelper (#4446)
With this commit, we remove visited flags from ForeignConstraintRelationshipNode
struct since keeping local state in global object is both dangerous and
meaningless.

Also to improve readability, this commit also converts needless recursion to
iterative DFS to avoid passing local hash-map as another parameter to
GetConnectedListHelper function.
(cherry picked from commit 0db21bbe14)
2021-08-03 16:45:23 +03:00
Onur Tirtir 928cee6af6 Refactor foreign_key_relationship.c (#4438)
(cherry picked from commit 3f60b08b11)
2021-08-03 16:45:23 +03:00
naisila 27d74f1540 Add OpenConnectionToNodes and functions that generate shard queries
(ALL CODE BORROWED from commit 724d56f949)
2021-08-03 16:45:23 +03:00
Hanefi Onaldi a529866f93
Bump Citus version to 9.5.6 2021-07-08 15:38:42 +03:00
Hanefi Onaldi 2efbd6e21a
Add changelog entry for 9.5.6
(cherry picked from commit a8a632269af6a276c92bc4852a0c87f0e6467fb3)
2021-07-08 15:38:42 +03:00
Nils Dijk 3e2257fe8f fix 9.5-2 upgrade script to adhere to idempotency 2021-07-08 12:25:18 +02:00
Nils Dijk 2e0f9cff59 Add test for idempotency of citus_prepare_pg_upgrade 2021-07-08 12:25:18 +02:00
Hanefi Onaldi 5ba6edaa48
Bump Citus version to 9.5.5 2021-07-08 09:47:22 +03:00
Hanefi Onaldi 39c4b1045c
Add changelog entry for 9.5.5
(cherry picked from commit d3b6651403)
2021-07-08 09:47:09 +03:00
Onur Tirtir 344ac23f69 Fix lower boundary calculation when pruning range dist table shards (#5082)
This happens only when we have a "<" or "<=" filter on distribution
column of a range distributed table and that filter falls in between
two shards.

When the filter falls in between two shards:

  If the filter is ">" or ">=", then UpperShardBoundary was
  returning "upperBoundIndex - 1", where upperBoundIndex is
  exclusive shard index used during binary seach.
  This is expected since upperBoundIndex is an exclusive
  index.

  If the filter is "<" or "<=", then LowerShardBoundary was
  returning "lowerBoundIndex + 1", where lowerBoundIndex is
  inclusive shard index used during binary seach.
  On the other hand, since lowerBoundIndex is an inclusive
  index, we should just return lowerBoundIndex instead of
  doing "+ 1". Before this commit, we were missing leftmost
  shard in such queries.

* Remove useless conditional branches

The branch that we delete from UpperShardBoundary was obviously useless.

The other one in LowerShardBoundary became useless after we remove "+ 1"
from there.

This indeed is another proof of what & how we are fixing with this pr.

* Improve comments and add more

* Add some tests for upper bound calculation too

(cherry picked from commit b118d4188e)

* Also fix a debug message diff for 9.5
2021-07-07 12:49:16 +03:00
Nils Dijk c11d804e56 Bump use of new sql function 2021-07-05 16:18:43 +02:00
Marco Slot 636bdda886 Fix PG upgrade scripts for 9.5 2021-07-05 16:18:43 +02:00
Marco Slot 7dbda5d607 Fix PG upgrade scripts for 9.4 2021-07-05 16:18:43 +02:00
Sait Talha Nisanci 3584fa11b0 Fix a flaky behaviour in shared_connection_stats
With the previous query, we were not pushing down the pg_sleep hence the
number of connections to a worker could be different from run to run.
2021-07-02 18:37:49 +03:00
Hanefi Onaldi 4aff833ca8 Do not use security flags by default (#4770)
(cherry picked from commit 697bbbd3c6)
2021-03-04 00:54:35 +03:00
Hanefi Onaldi bf345ac49b Add security flags in configure scripts (#4760)
(cherry picked from commit f87107eb6b)
2021-03-04 00:52:34 +03:00
Onur Tirtir cd1e706acb Bump version to 9.5.4 2021-02-19 15:08:31 +03:00
Onur Tirtir 3d4d76fdde Update CHANGELOG for 9.5.4
(cherry picked from commit bb14c5267f)

 Conflicts:
	CHANGELOG.md
2021-02-19 15:08:31 +03:00
SaitTalhaNisanci 45671a1caa Use PROCESS_UTILITY_QUERY in utility calls
When we use PROCESS_UTILITY_TOPLEVEL it causes some problems when
combined with other extensions such as pg_audit. With this commit we use
PROCESS_UTILITY_QUERY in the codebase to fix those problems.

(cherry picked from commit dcf54eaf2a)

 Conflicts:
	src/backend/distributed/commands/alter_table.c
	src/backend/distributed/commands/cascade_table_operation_for_connected_relations.c
	src/backend/distributed/executor/local_executor.c
	src/backend/distributed/utils/role.c
	src/backend/distributed/worker/worker_create_or_replace.c
	src/backend/distributed/worker/worker_data_fetch_protocol.c
2021-02-19 15:08:31 +03:00
Onur Tirtir 1eec630640 Bump version to 9.5.3 2021-02-16 15:19:53 +03:00
Onur Tirtir 5dc2fae9d6 Update CHANGELOG for 9.5.3
(cherry picked from commit a0de066996)

 Conflicts:
	CHANGELOG.md
2021-02-16 15:18:35 +03:00
Ahmet Gedemenli 0f498ac26d Fix dropping fkey when distributing table
(cherry picked from commit c8e83d1f26)
2021-02-12 18:33:05 +03:00
Onur Tirtir 44459be1ab Implement GetPgDependTuplesForDependingObjects
(cherry picked from commit 04a4167a8a)
2021-02-12 18:32:16 +03:00
Onur Tirtir 8401acb761 Implement ConstraintWithNameIsOfType (#4451)
(cherry picked from commit e91e745dbc)
2021-02-12 18:28:00 +03:00
Onur Tirtir 26556b2bba Refactor ColumnAppearsInForeignKeyToReferenceTable (#4441)
(cherry picked from commit d1b3eaf767)
2021-02-12 18:26:23 +03:00
Sait Talha Nisanci 7480160f4f Call 6 times not 7 in subquery_prepared_statements 2021-02-11 17:28:30 +01:00
Onder Kalaci 23951c562e Do not connection re-use for intermediate results
/*
 * Colocated intermediate results are just files and not required to use
 * the same connections with their co-located shards. So, we are free to
 * use any connection we can get.
 *
 * Also, the current connection re-use logic does not know how to handle
 * intermediate results as the intermediate results always truncates the
 * existing files. That's why, we use one connection per intermediate
 * result.
 */

(cherry picked from commit 5d5a357487)
2021-02-11 16:51:09 +01:00
Onder Kalaci 51560f9644 When reaches to shared pool size, COPY sets the placement access
It looks like we forgot to set the placement accesses, and
this could lead to self-deadlocks on complex transaction blocks.
2021-02-02 10:18:48 +01:00
Onder Kalaci 9f27e398a9 When reaches to executor pool size, COPY sets the placement access
It looks like we forgot to set the placement accesses, and
this could lead to self-deadlocks on complex transaction blocks.

(cherry picked from commit 36bdeef1bb)
2021-02-02 10:18:39 +01:00
Sait Talha Nisanci 5bb4bb4b5f Bump version to 9.5.2 2021-01-26 16:26:25 +03:00
SaitTalhaNisanci a7ff0c5800
Update CHANGELOG for 9.5.2 (#4578) 2021-01-26 12:27:01 +03:00
Nils Dijk 6703b173a0 rework ci
(cherry picked from commit a748729998)
2021-01-25 15:45:05 +03:00
Nils Dijk 2efeed412a Mitigate segfault in connection statemachine (#4551)
As described in the comment, we have observed crashes in production
due to a segfault caused by the dereference of a NULL pointer in our
connection statemachine.

As a mitigation, preventing system crashes, we provide an error with
a small explanation of the issue. Unfortunately the case is not
reliably reproduced yet, hence the inability to add tests.

DESCRIPTION: Prevent segfaults when SAVEPOINT handling cannot recover from connection failures
(cherry picked from commit d127516dc8)
2021-01-25 15:22:41 +03:00
Hadi Moshayedi 49ce36fe8b Reland #4419
(cherry picked from commit bc01c795a2)
2021-01-25 15:15:59 +03:00
Hadi Moshayedi 043c3356ae Faster logical replication tests.
Logical replication status can take wal_receiver_status_interval
seconds to get updated. Default is 10s, which means tests in
which logical replication is used can take a long time to finish.
We reduce it to 1 second to speed these tests up.

Logical replication apply launcher launches workers every
wal_retrieve_retry_interval, so if we have many shard moves with
logical replication consecutively, they will be throttled by this
parameter. Default is 5s, we reduce it to 1s so we finish tests
faster.

(cherry picked from commit 0e0fd6599a)
2021-01-25 15:13:16 +03:00
Onur Tirtir a603ad9cbf Not consider single shard hash dist. tables as replicated (#4413)
(cherry picked from commit 0eb5701658)
2020-12-17 19:02:58 +03:00
Onur Tirtir 4a1255fd10 Update CHANGELOG for 9.5.1
(cherry picked from commit dd3453ced5)

 Conflicts:
	CHANGELOG.md
2020-12-02 11:20:43 +03:00
Onur Tirtir 67004edf43 Bump version to 9.5.1 2020-12-02 11:20:43 +03:00
Onur Tirtir 789d441296 Handle invalid connection hash entries (#4362)
If MemoryContextAlloc errors out -e.g. during an OOM-, ConnectionHashEntry->connections
stays as NULL.

With this commit, we add isValid flag to ConnectionHashEntry that should be set to true
right after we allocate & initialize ConnectionHashEntry->connections list properly, and we
check it before accesing to ConnectionHashEntry->connections.
(cherry picked from commit 7f3d1182ed)
2020-12-01 11:07:12 +03:00
Önder Kalacı 6d06e9760a Enable parallel query on EXPLAIN ANALYZE (#4325)
It seems that we forgot to pass the revelant
flag to enable Postgres' parallel query
capabilities on the shards when user does
EXPLAIN ANALYZE on a distributed table.
(cherry picked from commit b0ddbbd33a)
2020-12-01 11:07:12 +03:00
Onder Kalaci 74f0dd0c25 Do not execute subplans multiple times with cursors
Before this commit, we let AdaptiveExecutorPreExecutorRun()
to be effective multiple times on every FETCH on cursors.
That does not affect the correctness of the query results,
but adds significant overhead.

(cherry picked from commit c433c66f2b)
2020-12-01 11:07:12 +03:00
Onder Kalaci e777daad22 Do not cache all the distributed table metadata during CitusTableTypeIdList()
CitusTableTypeIdList() function iterates on all the entries of pg_dist_partition
and loads all the metadata in to the cache. This can be quite memory intensive
especially when there are lots of distributed tables.

When partitioned tables are used, it is common to have many distributed tables
given that each partition also becomes a distributed table.

CitusTableTypeIdList() is used on every CREATE TABLE .. PARTITION OF.. command
as well. It means that, anytime a partition is created, Citus loads all the
metadata to the cache. Note that Citus typically only loads the accessed table's
metadata to the cache.

(cherry picked from commit 7accbff3f6)

 Conflicts:
	src/test/regress/bin/normalize.sed
2020-12-01 11:07:06 +03:00
Onur Tirtir 4e373fadd8 Update CHANGELOG for 9.5.0
(cherry picked from commit 52a5ab0751)
2020-11-11 16:02:17 +03:00
Hanefi Önaldı 35703d5e61
Bump Citus to 9.5.0 2020-11-09 13:16:05 +03:00
2646 changed files with 73357 additions and 470335 deletions

544
.circleci/config.yml Normal file
View File

@ -0,0 +1,544 @@
version: 2.1
orbs:
codecov: codecov/codecov@1.1.1
azure-cli: circleci/azure-cli@1.0.0
jobs:
build:
description: Build the citus extension
parameters:
pg_major:
description: postgres major version building citus for
type: integer
image:
description: docker image to use for the build
type: string
default: citus/extbuilder
image_tag:
description: tag to use for the docker image
type: string
docker:
- image: '<< parameters.image >>:<< parameters.image_tag >>'
steps:
- checkout
- run:
name: 'Configure, Build, and Install'
command: |
./ci/build-citus.sh
- persist_to_workspace:
root: .
paths:
- build-<< parameters.pg_major >>/*
- install-<<parameters.pg_major >>.tar
check-style:
docker:
- image: 'citus/stylechecker:latest'
steps:
- checkout
- run:
name: 'Check Style'
command: citus_indent --check
- run:
name: 'Fix whitespace'
command: ci/editorconfig.sh
- run:
name: 'Check if whitespace fixing changed anything, install editorconfig if it did'
command: git diff --exit-code
- run:
name: 'Remove useless declarations'
command: ci/remove_useless_declarations.sh
- run:
name: 'Check if changed'
command: git diff --cached --exit-code
- run:
name: 'Normalize test output'
command: ci/normalize_expected.sh
- run:
name: 'Check if changed'
command: git diff --exit-code
- run:
name: 'Check for C-style comments in migration files'
command: ci/disallow_c_comments_in_migrations.sh
- run:
name: 'Check if changed'
command: git diff --exit-code
- run:
name: 'Check for lengths of changelog entries'
command: ci/disallow_long_changelog_entries.sh
- run:
name: 'Check for banned C API usage'
command: ci/banned.h.sh
- run:
name: 'Check for tests missing in schedules'
command: ci/check_all_tests_are_run.sh
- run:
name: 'Check if all CI scripts are actually run'
command: ci/check_all_ci_scripts_are_run.sh
check-sql-snapshots:
docker:
- image: 'citus/extbuilder:latest'
steps:
- checkout
- run:
name: 'Check Snapshots'
command: ci/check_sql_snapshots.sh
test-pg-upgrade:
description: Runs postgres upgrade tests
parameters:
old_pg_major:
description: 'postgres major version to use before the upgrade'
type: integer
new_pg_major:
description: 'postgres major version to upgrade to'
type: integer
image:
description: 'docker image to use as for the tests'
type: string
default: citus/pgupgradetester
image_tag:
description: 'docker image tag to use'
type: string
default: latest
docker:
- image: '<< parameters.image >>:<< parameters.image_tag >>'
working_directory: /home/circleci/project
steps:
- checkout
- attach_workspace:
at: .
- run:
name: 'Install Extension'
command: |
tar xfv "${CIRCLE_WORKING_DIRECTORY}/install-<< parameters.old_pg_major >>.tar" --directory /
tar xfv "${CIRCLE_WORKING_DIRECTORY}/install-<< parameters.new_pg_major >>.tar" --directory /
- run:
name: 'Configure'
command: |
chown -R circleci .
gosu circleci ./configure
- run:
name: 'Enable core dumps'
command: |
ulimit -c unlimited
- run:
name: 'Install and test postgres upgrade'
command: |
gosu circleci \
make -C src/test/regress \
check-pg-upgrade \
old-bindir=/usr/lib/postgresql/<< parameters.old_pg_major >>/bin \
new-bindir=/usr/lib/postgresql/<< parameters.new_pg_major >>/bin
no_output_timeout: 2m
- run:
name: 'Regressions'
command: |
if [ -f "src/test/regress/regression.diffs" ]; then
cat src/test/regress/regression.diffs
exit 1
fi
when: on_fail
- run:
name: 'Copy coredumps'
command: |
mkdir -p /tmp/core_dumps
if ls core.* 1> /dev/null 2>&1; then
cp core.* /tmp/core_dumps
fi
when: on_fail
- store_artifacts:
name: 'Save regressions'
path: src/test/regress/regression.diffs
when: on_fail
- store_artifacts:
name: 'Save core dumps'
path: /tmp/core_dumps
when: on_fail
- codecov/upload:
flags: 'test_<< parameters.old_pg_major >>_<< parameters.new_pg_major >>,upgrade'
test-citus-upgrade:
description: Runs citus upgrade tests
parameters:
pg_major:
description: "postgres major version"
type: integer
image:
description: 'docker image to use as for the tests'
type: string
default: citus/citusupgradetester
image_tag:
description: 'docker image tag to use'
type: string
docker:
- image: '<< parameters.image >>:<< parameters.image_tag >>'
working_directory: /home/circleci/project
steps:
- checkout
- attach_workspace:
at: .
- run:
name: 'Configure'
command: |
chown -R circleci .
gosu circleci ./configure
- run:
name: 'Enable core dumps'
command: |
ulimit -c unlimited
- run:
name: 'Install and test citus upgrade'
command: |
# run make check-citus-upgrade for all citus versions
# the image has ${CITUS_VERSIONS} set with all verions it contains the binaries of
for citus_version in ${CITUS_VERSIONS}; do \
gosu circleci \
make -C src/test/regress \
check-citus-upgrade \
bindir=/usr/lib/postgresql/${PG_MAJOR}/bin \
citus-pre-tar=/install-pg11-citus${citus_version}.tar \
citus-post-tar=/home/circleci/project/install-$PG_MAJOR.tar; \
done;
# run make check-citus-upgrade-mixed for all citus versions
# the image has ${CITUS_VERSIONS} set with all verions it contains the binaries of
for citus_version in ${CITUS_VERSIONS}; do \
gosu circleci \
make -C src/test/regress \
check-citus-upgrade-mixed \
bindir=/usr/lib/postgresql/${PG_MAJOR}/bin \
citus-pre-tar=/install-pg11-citus${citus_version}.tar \
citus-post-tar=/home/circleci/project/install-$PG_MAJOR.tar; \
done;
no_output_timeout: 2m
- run:
name: 'Regressions'
command: |
if [ -f "src/test/regress/regression.diffs" ]; then
cat src/test/regress/regression.diffs
exit 1
fi
when: on_fail
- run:
name: 'Copy coredumps'
command: |
mkdir -p /tmp/core_dumps
if ls core.* 1> /dev/null 2>&1; then
cp core.* /tmp/core_dumps
fi
when: on_fail
- store_artifacts:
name: 'Save regressions'
path: src/test/regress/regression.diffs
when: on_fail
- store_artifacts:
name: 'Save core dumps'
path: /tmp/core_dumps
when: on_fail
- codecov/upload:
flags: 'test_<< parameters.pg_major >>,upgrade'
test-citus:
description: Runs the common tests of citus
parameters:
pg_major:
description: "postgres major version"
type: integer
image:
description: 'docker image to use as for the tests'
type: string
default: citus/exttester
image_tag:
description: 'docker image tag to use'
type: string
make:
description: "make target"
type: string
docker:
- image: '<< parameters.image >>:<< parameters.image_tag >>'
working_directory: /home/circleci/project
steps:
- checkout
- attach_workspace:
at: .
- run:
name: 'Install Extension'
command: |
tar xfv "${CIRCLE_WORKING_DIRECTORY}/install-${PG_MAJOR}.tar" --directory /
- run:
name: 'Configure'
command: |
chown -R circleci .
gosu circleci ./configure
- run:
name: 'Enable core dumps'
command: |
ulimit -c unlimited
- run:
name: 'Run Test'
command: |
gosu circleci make -C src/test/regress << parameters.make >>
no_output_timeout: 2m
- run:
name: 'Regressions'
command: |
if [ -f "src/test/regress/regression.diffs" ]; then
cat src/test/regress/regression.diffs
exit 1
fi
when: on_fail
- run:
name: 'Copy coredumps'
command: |
mkdir -p /tmp/core_dumps
if ls core.* 1> /dev/null 2>&1; then
cp core.* /tmp/core_dumps
fi
when: on_fail
- store_artifacts:
name: 'Save regressions'
path: src/test/regress/regression.diffs
when: on_fail
- store_artifacts:
name: 'Save core dumps'
path: /tmp/core_dumps
when: on_fail
- codecov/upload:
flags: 'test_<< parameters.pg_major >>,<< parameters.make >>'
when: always
check-merge-to-enterprise:
docker:
- image: citus/extbuilder:13.0
working_directory: /home/circleci/project
steps:
- checkout
- run:
command: |
ci/check_enterprise_merge.sh
ch_benchmark:
docker:
- image: buildpack-deps:stretch
working_directory: /home/circleci/project
steps:
- checkout
- azure-cli/install
- azure-cli/login-with-service-principal
- run:
command: |
cd ./src/test/hammerdb
sh run_hammerdb.sh citusbot_ch_benchmark_rg
name: install dependencies and run ch_benchmark tests
no_output_timeout: 20m
tpcc_benchmark:
docker:
- image: buildpack-deps:stretch
working_directory: /home/circleci/project
steps:
- checkout
- azure-cli/install
- azure-cli/login-with-service-principal
- run:
command: |
cd ./src/test/hammerdb
sh run_hammerdb.sh citusbot_tpcc_benchmark_rg
name: install dependencies and run ch_benchmark tests
no_output_timeout: 20m
workflows:
version: 2
build_and_test:
jobs:
- check-merge-to-enterprise:
filters:
branches:
ignore:
- /release-[0-9]+\.[0-9]+.*/ # match with releaseX.Y.*
- build:
name: build-11
pg_major: 11
image_tag: '11.9'
- build:
name: build-12
pg_major: 12
image_tag: '12.4'
- build:
name: build-13
pg_major: 13
image_tag: '13.0'
- check-style
- check-sql-snapshots
- test-citus:
name: 'test-11_check-multi'
pg_major: 11
image_tag: '11.9'
make: check-multi
requires: [build-11]
- test-citus:
name: 'test-11_check-mx'
pg_major: 11
image_tag: '11.9'
make: check-multi-mx
requires: [build-11]
- test-citus:
name: 'test-11_check-vanilla'
pg_major: 11
image_tag: '11.9'
make: check-vanilla
requires: [build-11]
- test-citus:
name: 'test-11_check-isolation'
pg_major: 11
image_tag: '11.9'
make: check-isolation
requires: [build-11]
- test-citus:
name: 'test-11_check-worker'
pg_major: 11
image_tag: '11.9'
make: check-worker
requires: [build-11]
- test-citus:
name: 'test-11_check-follower-cluster'
pg_major: 11
image_tag: '11.9'
make: check-follower-cluster
requires: [build-11]
- test-citus:
name: 'test-11_check-failure'
pg_major: 11
image: citus/failtester
image_tag: '11.9'
make: check-failure
requires: [build-11]
- test-citus:
name: 'test-12_check-multi'
pg_major: 12
image_tag: '12.4'
make: check-multi
requires: [build-12]
- test-citus:
name: 'test-12_check-mx'
pg_major: 12
image_tag: '12.4'
make: check-multi-mx
requires: [build-12]
- test-citus:
name: 'test-12_check-vanilla'
pg_major: 12
image_tag: '12.4'
make: check-vanilla
requires: [build-12]
- test-citus:
name: 'test-12_check-isolation'
pg_major: 12
image_tag: '12.4'
make: check-isolation
requires: [build-12]
- test-citus:
name: 'test-12_check-worker'
pg_major: 12
image_tag: '12.4'
make: check-worker
requires: [build-12]
- test-citus:
name: 'test-12_check-follower-cluster'
pg_major: 12
image_tag: '12.4'
make: check-follower-cluster
requires: [build-12]
- test-citus:
name: 'test-12_check-failure'
pg_major: 12
image: citus/failtester
image_tag: '12.4'
make: check-failure
requires: [build-12]
- test-citus:
name: 'test-13_check-multi'
pg_major: 13
image_tag: '13.0'
make: check-multi
requires: [build-13]
- test-citus:
name: 'test-13_check-mx'
pg_major: 13
image_tag: '13.0'
make: check-multi-mx
requires: [build-13]
- test-citus:
name: 'test-13_check-vanilla'
pg_major: 13
image_tag: '13.0'
make: check-vanilla
requires: [build-13]
- test-citus:
name: 'test-13_check-isolation'
pg_major: 13
image_tag: '13.0'
make: check-isolation
requires: [build-13]
- test-citus:
name: 'test-13_check-worker'
pg_major: 13
image_tag: '13.0'
make: check-worker
requires: [build-13]
- test-citus:
name: 'test-13_check-follower-cluster'
pg_major: 13
image_tag: '13.0'
make: check-follower-cluster
requires: [build-13]
- test-citus:
name: 'test-13_check-failure'
pg_major: 13
image: citus/failtester
image_tag: '13.0'
make: check-failure
requires: [build-13]
- test-pg-upgrade:
name: 'test-11-12_check-pg-upgrade'
old_pg_major: 11
new_pg_major: 12
image_tag: latest
requires: [build-11,build-12]
- test-pg-upgrade:
name: 'test-12-13_check-pg-upgrade'
old_pg_major: 12
new_pg_major: 13
image_tag: latest
requires: [build-12,build-13]
- test-citus-upgrade:
name: test-11_check-citus-upgrade
pg_major: 11
image_tag: '11.9'
requires: [build-11]
- ch_benchmark:
requires: [build-13]
filters:
branches:
only:
- /ch_benchmark\/.*/ # match with ch_benchmark/ prefix
- tpcc_benchmark:
requires: [build-13]
filters:
branches:
only:
- /tpcc_benchmark\/.*/ # match with tpcc_benchmark/ prefix

View File

@ -1,7 +0,0 @@
exclude_patterns:
- "src/backend/distributed/utils/citus_outfuncs.c"
- "src/backend/distributed/deparser/ruleutils_*.c"
- "src/include/distributed/citus_nodes.h"
- "src/backend/distributed/safeclib"
- "src/backend/columnar/safeclib"
- "**/vendor/"

View File

@ -1,33 +0,0 @@
# gdbpg.py contains scripts to nicely print the postgres datastructures
# while in a gdb session. Since the vscode debugger is based on gdb this
# actually also works when debugging with vscode. Providing nice tools
# to understand the internal datastructures we are working with.
source /root/gdbpg.py
# when debugging postgres it is convenient to _always_ have a breakpoint
# trigger when an error is logged. Because .gdbinit is sourced before gdb
# is fully attached and has the sources loaded. To make sure the breakpoint
# is added when the library is loaded we temporary set the breakpoint pending
# to on. After we have added out breakpoint we revert back to the default
# configuration for breakpoint pending.
# The breakpoint is hard to read, but at entry of the function we don't have
# the level loaded in elevel. Instead we hardcode the location where the
# level of the current error is stored. Also gdb doesn't understand the
# ERROR symbol so we hardcode this to the value of ERROR. It is very unlikely
# this value will ever change in postgres, but if it does we might need to
# find a way to conditionally load the correct breakpoint.
set breakpoint pending on
break elog.c:errfinish if errordata[errordata_stack_depth].elevel == 21
set breakpoint pending auto
echo \n
echo ----------------------------------------------------------------------------------\n
echo when attaching to a postgres backend a breakpoint will be set on elog.c:errfinish \n
echo it will only break on errors being raised in postgres \n
echo \n
echo to disable this breakpoint from vscode run `-exec disable 1` in the debug console \n
echo this assumes it's the first breakpoint loaded as it is loaded from .gdbinit \n
echo this can be verified with `-exec info break`, enabling can be done with \n
echo `-exec enable 1` \n
echo ----------------------------------------------------------------------------------\n
echo \n

View File

@ -1 +0,0 @@
postgresql-*.tar.bz2

View File

@ -1,7 +0,0 @@
\timing on
\pset linestyle unicode
\pset border 2
\setenv PAGER 'pspg --no-mouse -bX --no-commandbar --no-topbar'
\set HISTSIZE 100000
\set PROMPT1 '\n%[%033[1m%]%M %n@%/:%> (PID: %p)%R%[%033[0m%]%# '
\set PROMPT2 ' '

View File

@ -1,12 +0,0 @@
[[source]]
url = "https://pypi.org/simple"
verify_ssl = true
name = "pypi"
[packages]
docopt = "*"
[dev-packages]
[requires]
python_version = "3.9"

28
.devcontainer/.vscode/Pipfile.lock generated vendored
View File

@ -1,28 +0,0 @@
{
"_meta": {
"hash": {
"sha256": "6956a6700ead5804aa56bd597c93bb4a13f208d2d49d3b5399365fd240ca0797"
},
"pipfile-spec": 6,
"requires": {
"python_version": "3.9"
},
"sources": [
{
"name": "pypi",
"url": "https://pypi.org/simple",
"verify_ssl": true
}
]
},
"default": {
"docopt": {
"hashes": [
"sha256:49b3a825280bd66b3aa83585ef59c4a8c82f2c8a522dbe754a8bc8d08c85c491"
],
"index": "pypi",
"version": "==0.6.2"
}
},
"develop": {}
}

View File

@ -1,84 +0,0 @@
#! /usr/bin/env pipenv-shebang
"""Generate C/C++ properties file for VSCode.
Uses pgenv to iterate postgres versions and generate
a C/C++ properties file for VSCode containing the
include paths for the postgres headers.
Usage:
generate_c_cpp_properties-json.py <target_path>
generate_c_cpp_properties-json.py (-h | --help)
generate_c_cpp_properties-json.py --version
Options:
-h --help Show this screen.
--version Show version.
"""
import json
import subprocess
from docopt import docopt
def main(args):
target_path = args['<target_path>']
output = subprocess.check_output(['pgenv', 'versions'])
# typical output is:
# 14.8 pgsql-14.8
# * 15.3 pgsql-15.3
# 16beta2 pgsql-16beta2
# where the line marked with a * is the currently active version
#
# we are only interested in the first word of each line, which is the version number
# thus we strip the whitespace and the * from the line and split it into words
# and take the first word
versions = [line.strip('* ').split()[0] for line in output.decode('utf-8').splitlines()]
# create the list of configurations per version
configurations = []
for version in versions:
configurations.append(generate_configuration(version))
# create the json file
c_cpp_properties = {
"configurations": configurations,
"version": 4
}
# write the c_cpp_properties.json file
with open(target_path, 'w') as f:
json.dump(c_cpp_properties, f, indent=4)
def generate_configuration(version):
"""Returns a configuration for the given postgres version.
>>> generate_configuration('14.8')
{
"name": "Citus Development Configuration - Postgres 14.8",
"includePath": [
"/usr/local/include",
"/home/citus/.pgenv/src/postgresql-14.8/src/**",
"${workspaceFolder}/**",
"${workspaceFolder}/src/include/",
],
"configurationProvider": "ms-vscode.makefile-tools"
}
"""
return {
"name": f"Citus Development Configuration - Postgres {version}",
"includePath": [
"/usr/local/include",
f"/home/citus/.pgenv/src/postgresql-{version}/src/**",
"${workspaceFolder}/**",
"${workspaceFolder}/src/include/",
],
"configurationProvider": "ms-vscode.makefile-tools"
}
if __name__ == '__main__':
arguments = docopt(__doc__, version='0.1.0')
main(arguments)

View File

@ -1,40 +0,0 @@
{
"version": "0.2.0",
"configurations": [
{
"name": "Attach Citus (devcontainer)",
"type": "cppdbg",
"request": "attach",
"processId": "${command:pickProcess}",
"program": "/home/citus/.pgenv/pgsql/bin/postgres",
"additionalSOLibSearchPath": "/home/citus/.pgenv/pgsql/lib",
"setupCommands": [
{
"text": "handle SIGUSR1 noprint nostop pass",
"description": "let gdb not stop when SIGUSR1 is sent to process",
"ignoreFailures": true
}
],
},
{
"name": "Open core file",
"type": "cppdbg",
"request": "launch",
"program": "/home/citus/.pgenv/pgsql/bin/postgres",
"coreDumpPath": "${input:corefile}",
"cwd": "${workspaceFolder}",
"MIMode": "gdb",
}
],
"inputs": [
{
"id": "corefile",
"type": "command",
"command": "extension.commandvariable.file.pickFile",
"args": {
"dialogTitle": "Select core file",
"include": "**/core*",
},
},
],
}

View File

@ -1,222 +0,0 @@
FROM ubuntu:22.04 AS base
# environment is to make python pass an interactive shell, probably not the best timezone given a wide variety of colleagues
ENV TZ=UTC
RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone
# install build tools
RUN apt update && apt install -y \
bison \
bzip2 \
cpanminus \
curl \
docbook-xml \
docbook-xsl \
flex \
gcc \
git \
libcurl4-gnutls-dev \
libicu-dev \
libkrb5-dev \
liblz4-dev \
libpam0g-dev \
libreadline-dev \
libselinux1-dev \
libssl-dev \
libxml2-utils \
libxslt-dev \
libzstd-dev \
locales \
make \
perl \
pkg-config \
python3 \
python3-pip \
software-properties-common \
sudo \
uuid-dev \
valgrind \
xsltproc \
zlib1g-dev \
&& add-apt-repository ppa:deadsnakes/ppa -y \
&& apt install -y \
python3.9-full \
# software properties pulls in pkexec, which makes the debugger unusable in vscode
&& apt purge -y \
software-properties-common \
&& apt autoremove -y \
&& apt clean
RUN sudo pip3 install pipenv pipenv-shebang
RUN cpanm install IPC::Run
RUN locale-gen en_US.UTF-8
# add the citus user to sudoers and allow all sudoers to login without a password prompt
RUN useradd -ms /bin/bash citus \
&& usermod -aG sudo citus \
&& echo '%sudo ALL=(ALL) NOPASSWD:ALL' >> /etc/sudoers
WORKDIR /home/citus
USER citus
# run all make commands with the number of cores available
RUN echo "export MAKEFLAGS=\"-j \$(nproc)\"" >> "/home/citus/.bashrc"
RUN git clone --branch v1.3.2 --depth 1 https://github.com/theory/pgenv.git .pgenv
COPY --chown=citus:citus pgenv/config/ .pgenv/config/
ENV PATH="/home/citus/.pgenv/bin:${PATH}"
ENV PATH="/home/citus/.pgenv/pgsql/bin:${PATH}"
USER citus
# build postgres versions separately for effective parrallelism and caching of already built versions when changing only certain versions
FROM base AS pg15
RUN MAKEFLAGS="-j $(nproc)" pgenv build 15.13
RUN rm .pgenv/src/*.tar*
RUN make -C .pgenv/src/postgresql-*/ clean
RUN make -C .pgenv/src/postgresql-*/src/include install
# create a staging directory with all files we want to copy from our pgenv build
# we will copy the contents of the staged folder into the final image at once
RUN mkdir .pgenv-staging/
RUN cp -r .pgenv/src .pgenv/pgsql-* .pgenv/config .pgenv-staging/
RUN rm .pgenv-staging/config/default.conf
FROM base AS pg16
RUN MAKEFLAGS="-j $(nproc)" pgenv build 16.9
RUN rm .pgenv/src/*.tar*
RUN make -C .pgenv/src/postgresql-*/ clean
RUN make -C .pgenv/src/postgresql-*/src/include install
# create a staging directory with all files we want to copy from our pgenv build
# we will copy the contents of the staged folder into the final image at once
RUN mkdir .pgenv-staging/
RUN cp -r .pgenv/src .pgenv/pgsql-* .pgenv/config .pgenv-staging/
RUN rm .pgenv-staging/config/default.conf
FROM base AS pg17
RUN MAKEFLAGS="-j $(nproc)" pgenv build 17.5
RUN rm .pgenv/src/*.tar*
RUN make -C .pgenv/src/postgresql-*/ clean
RUN make -C .pgenv/src/postgresql-*/src/include install
# create a staging directory with all files we want to copy from our pgenv build
# we will copy the contents of the staged folder into the final image at once
RUN mkdir .pgenv-staging/
RUN cp -r .pgenv/src .pgenv/pgsql-* .pgenv/config .pgenv-staging/
RUN rm .pgenv-staging/config/default.conf
FROM base AS uncrustify-builder
RUN sudo apt update && sudo apt install -y cmake tree
WORKDIR /uncrustify
RUN curl -L https://github.com/uncrustify/uncrustify/archive/uncrustify-0.68.1.tar.gz | tar xz
WORKDIR /uncrustify/uncrustify-uncrustify-0.68.1/
RUN mkdir build
WORKDIR /uncrustify/uncrustify-uncrustify-0.68.1/build/
RUN cmake ..
RUN MAKEFLAGS="-j $(nproc)" make -s
RUN make install DESTDIR=/uncrustify
# builder for all pipenv's to get them contained in a single layer
FROM base AS pipenv
WORKDIR /workspaces/citus/
# tools to sync pgenv with vscode
COPY --chown=citus:citus .vscode/Pipfile .vscode/Pipfile.lock .devcontainer/.vscode/
RUN ( cd .devcontainer/.vscode && pipenv install )
# environment to run our failure tests
COPY --chown=citus:citus src/ src/
RUN ( cd src/test/regress && pipenv install )
# assemble the final container by copying over the artifacts from separately build containers
FROM base AS devcontainer
LABEL org.opencontainers.image.source=https://github.com/citusdata/citus
LABEL org.opencontainers.image.description="Development container for the Citus project"
LABEL org.opencontainers.image.licenses=AGPL-3.0-only
RUN yes | sudo unminimize
# install developer productivity tools
RUN sudo apt update \
&& sudo apt install -y \
autoconf2.69 \
bash-completion \
fswatch \
gdb \
htop \
libdbd-pg-perl \
libdbi-perl \
lsof \
man \
net-tools \
psmisc \
pspg \
tree \
vim \
&& sudo apt clean
# Since gdb will run in the context of the root user when debugging citus we will need to both
# download the gdbpg.py script as the root user, into their home directory, as well as add .gdbinit
# as a file owned by root
# This will make that as soon as the debugger attaches to a postgres backend (or frankly any other process)
# the gdbpg.py script will be sourced and the developer can direcly use it.
RUN sudo curl -o /root/gdbpg.py https://raw.githubusercontent.com/tvesely/gdbpg/6065eee7872457785f830925eac665aa535caf62/gdbpg.py
COPY --chown=root:root .gdbinit /root/
# install developer dependencies in the global environment
RUN --mount=type=bind,source=requirements.txt,target=requirements.txt pip install -r requirements.txt
# for persistent bash history across devcontainers we need to have
# a) a directory to store the history in
# b) a prompt command to append the history to the file
# c) specify the history file to store the history in
# b and c are done in the .bashrc to make it persistent across shells only
RUN sudo install -d -o citus -g citus /commandhistory \
&& echo "export PROMPT_COMMAND='history -a' && export HISTFILE=/commandhistory/.bash_history" >> "/home/citus/.bashrc"
# install citus-dev
RUN git clone --branch develop https://github.com/citusdata/tools.git citus-tools \
&& ( cd citus-tools/citus_dev && pipenv install ) \
&& mkdir -p ~/.local/bin \
&& ln -s /home/citus/citus-tools/citus_dev/citus_dev-pipenv .local/bin/citus_dev \
&& sudo make -C citus-tools/uncrustify install bindir=/usr/local/bin pkgsysconfdir=/usr/local/etc/ \
&& mkdir -p ~/.local/share/bash-completion/completions/ \
&& ln -s ~/citus-tools/citus_dev/bash_completion ~/.local/share/bash-completion/completions/citus_dev
# TODO some LC_ALL errors, possibly solved by locale-gen
RUN git clone https://github.com/so-fancy/diff-so-fancy.git \
&& mkdir -p ~/.local/bin \
&& ln -s /home/citus/diff-so-fancy/diff-so-fancy .local/bin/
COPY --link --from=uncrustify-builder /uncrustify/usr/ /usr/
COPY --link --from=pg15 /home/citus/.pgenv-staging/ /home/citus/.pgenv/
COPY --link --from=pg16 /home/citus/.pgenv-staging/ /home/citus/.pgenv/
COPY --link --from=pg17 /home/citus/.pgenv-staging/ /home/citus/.pgenv/
COPY --link --from=pipenv /home/citus/.local/share/virtualenvs/ /home/citus/.local/share/virtualenvs/
# place to run your cluster with citus_dev
VOLUME /data
RUN sudo mkdir /data \
&& sudo chown citus:citus /data
COPY --chown=citus:citus .psqlrc .
# with the copy linking of layers github actions seem to misbehave with the ownership of the
# directories leading upto the link, hence a small patch layer to have to right ownerships set
RUN sudo chown --from=root:root citus:citus -R ~
# sets default pg version
RUN pgenv switch 17.5
# make connecting to the coordinator easy
ENV PGPORT=9700

View File

@ -1,11 +0,0 @@
init: ../.vscode/c_cpp_properties.json ../.vscode/launch.json
../.vscode:
mkdir -p ../.vscode
../.vscode/launch.json: ../.vscode .vscode/launch.json
cp .vscode/launch.json ../.vscode/launch.json
../.vscode/c_cpp_properties.json: ../.vscode
./.vscode/generate_c_cpp_properties-json.py ../.vscode/c_cpp_properties.json

View File

@ -1,37 +0,0 @@
{
"image": "ghcr.io/citusdata/citus-devcontainer:main",
"runArgs": [
"--cap-add=SYS_PTRACE",
"--ulimit=core=-1",
],
"forwardPorts": [
9700
],
"customizations": {
"vscode": {
"extensions": [
"eamodio.gitlens",
"GitHub.copilot-chat",
"GitHub.copilot",
"github.vscode-github-actions",
"github.vscode-pull-request-github",
"ms-vscode.cpptools-extension-pack",
"ms-vsliveshare.vsliveshare",
"rioj7.command-variable",
],
"settings": {
"files.exclude": {
"**/*.o": true,
"**/.deps/": true,
}
},
}
},
"mounts": [
"type=volume,target=/data",
"source=citus-bashhistory,target=/commandhistory,type=volume",
],
"updateContentCommand": "./configure",
"postCreateCommand": "make -C .devcontainer/",
}

View File

@ -1,15 +0,0 @@
PGENV_MAKE_OPTIONS=(-s)
PGENV_CONFIGURE_OPTIONS=(
--enable-debug
--enable-depend
--enable-cassert
--enable-tap-tests
'CFLAGS=-ggdb -Og -g3 -fno-omit-frame-pointer -DUSE_VALGRIND'
--with-openssl
--with-libxml
--with-libxslt
--with-uuid=e2fs
--with-icu
--with-lz4
)

View File

@ -1,9 +0,0 @@
black==23.11.0
click==8.1.7
isort==5.12.0
mypy-extensions==1.0.0
packaging==23.2
pathspec==0.11.2
platformdirs==4.0.0
tomli==2.0.1
typing_extensions==4.8.0

View File

@ -1,28 +0,0 @@
[[source]]
name = "pypi"
url = "https://pypi.python.org/simple"
verify_ssl = true
[packages]
mitmproxy = {editable = true, ref = "main", git = "https://github.com/citusdata/mitmproxy.git"}
construct = "*"
docopt = "==0.6.2"
cryptography = ">=41.0.4"
pytest = "*"
psycopg = "*"
filelock = "*"
pytest-asyncio = "*"
pytest-timeout = "*"
pytest-xdist = "*"
pytest-repeat = "*"
pyyaml = "*"
werkzeug = "==2.3.7"
[dev-packages]
black = "*"
isort = "*"
flake8 = "*"
flake8-bugbear = "*"
[requires]
python_version = "3.9"

File diff suppressed because it is too large Load Diff

View File

@ -17,7 +17,13 @@ trim_trailing_whitespace = true
insert_final_newline = unset insert_final_newline = unset
trim_trailing_whitespace = unset trim_trailing_whitespace = unset
[*.{sql,sh,py,toml}] # Don't change test/regress/output directory, this needs to be a separate rule
# for some reason
[/src/test/regress/output/**]
insert_final_newline = unset
trim_trailing_whitespace = unset
[*.{sql,sh}]
indent_style = space indent_style = space
indent_size = 4 indent_size = 4
tab_width = 4 tab_width = 4

View File

@ -1,7 +0,0 @@
[flake8]
# E203 is ignored for black
extend-ignore = E203
# black will truncate to 88 characters usually, but long string literals it
# might keep. That's fine in most cases unless it gets really excessive.
max-line-length = 150
exclude = .git,__pycache__,vendor,tmp_*

8
.gitattributes vendored
View File

@ -16,6 +16,7 @@ README.* conflict-marker-size=32
# Test output files that contain extra whitespace # Test output files that contain extra whitespace
*.out -whitespace *.out -whitespace
src/test/regress/output/*.source -whitespace
# These files are maintained or generated elsewhere. We take them as is. # These files are maintained or generated elsewhere. We take them as is.
configure -whitespace configure -whitespace
@ -25,9 +26,10 @@ configure -whitespace
# except these exceptions... # except these exceptions...
src/backend/distributed/utils/citus_outfuncs.c -citus-style src/backend/distributed/utils/citus_outfuncs.c -citus-style
src/backend/distributed/deparser/ruleutils_15.c -citus-style src/backend/distributed/utils/pg11_snprintf.c -citus-style
src/backend/distributed/deparser/ruleutils_16.c -citus-style src/backend/distributed/deparser/ruleutils_11.c -citus-style
src/backend/distributed/deparser/ruleutils_17.c -citus-style src/backend/distributed/deparser/ruleutils_12.c -citus-style
src/backend/distributed/deparser/ruleutils_13.c -citus-style
src/backend/distributed/commands/index_pg_source.c -citus-style src/backend/distributed/commands/index_pg_source.c -citus-style
src/include/distributed/citus_nodes.h -citus-style src/include/distributed/citus_nodes.h -citus-style

View File

@ -1,23 +0,0 @@
name: 'Parallelization matrix'
inputs:
count:
required: false
default: 32
outputs:
json:
value: ${{ steps.generate_matrix.outputs.json }}
runs:
using: "composite"
steps:
- name: Generate parallelization matrix
id: generate_matrix
shell: bash
run: |-
json_array="{\"include\": ["
for ((i = 1; i <= ${{ inputs.count }}; i++)); do
json_array+="{\"id\":\"$i\"},"
done
json_array=${json_array%,}
json_array+=" ]}"
echo "json=$json_array" >> "$GITHUB_OUTPUT"
echo "json=$json_array"

View File

@ -1,38 +0,0 @@
name: save_logs_and_results
inputs:
folder:
required: false
default: "log"
runs:
using: composite
steps:
- uses: actions/upload-artifact@v4.6.0
name: Upload logs
with:
name: ${{ inputs.folder }}
if-no-files-found: ignore
path: |
src/test/**/proxy.output
src/test/**/results/
src/test/**/tmp_check/master/log
src/test/**/tmp_check/worker.57638/log
src/test/**/tmp_check/worker.57637/log
src/test/**/*.diffs
src/test/**/out/ddls.sql
src/test/**/out/queries.sql
src/test/**/logfile_*
/tmp/pg_upgrade_newData_logs
- name: Publish regression.diffs
run: |-
diffs="$(find src/test/regress -name "*.diffs" -exec cat {} \;)"
if ! [ -z "$diffs" ]; then
echo '```diff' >> $GITHUB_STEP_SUMMARY
echo -E "$diffs" >> $GITHUB_STEP_SUMMARY
echo '```' >> $GITHUB_STEP_SUMMARY
echo -E $diffs
fi
shell: bash
- name: Print stack traces
run: "./ci/print_stack_trace.sh"
if: failure()
shell: bash

View File

@ -1,35 +0,0 @@
name: setup_extension
inputs:
pg_major:
required: false
skip_installation:
required: false
default: false
type: boolean
runs:
using: composite
steps:
- name: Expose $PG_MAJOR to Github Env
run: |-
if [ -z "${{ inputs.pg_major }}" ]; then
echo "PG_MAJOR=${PG_MAJOR}" >> $GITHUB_ENV
else
echo "PG_MAJOR=${{ inputs.pg_major }}" >> $GITHUB_ENV
fi
shell: bash
- uses: actions/download-artifact@v4.1.8
with:
name: build-${{ env.PG_MAJOR }}
- name: Install Extension
if: ${{ inputs.skip_installation == 'false' }}
run: tar xfv "install-$PG_MAJOR.tar" --directory /
shell: bash
- name: Configure
run: |-
chown -R circleci .
git config --global --add safe.directory ${GITHUB_WORKSPACE}
gosu circleci ./configure --without-pg-version-check
shell: bash
- name: Enable core dumps
run: ulimit -c unlimited
shell: bash

View File

@ -1,27 +0,0 @@
name: coverage
inputs:
flags:
required: false
codecov_token:
required: true
runs:
using: composite
steps:
- uses: codecov/codecov-action@v3
with:
flags: ${{ inputs.flags }}
token: ${{ inputs.codecov_token }}
verbose: true
gcov: true
- name: Create codeclimate coverage
run: |-
lcov --directory . --capture --output-file lcov.info
lcov --remove lcov.info -o lcov.info '/usr/*'
sed "s=^SF:$PWD/=SF:=g" -i lcov.info # relative pats are required by codeclimate
mkdir -p /tmp/codeclimate
cc-test-reporter format-coverage -t lcov -o /tmp/codeclimate/${{ inputs.flags }}.json lcov.info
shell: bash
- uses: actions/upload-artifact@v4.6.0
with:
path: "/tmp/codeclimate/*.json"
name: codeclimate-${{ inputs.flags }}

View File

@ -1,3 +0,0 @@
base:
- ".* warning: ignoring old recipe for target [`']check'"
- ".* warning: overriding recipe for target [`']check'"

View File

@ -1,51 +0,0 @@
#!/bin/bash
set -ex
# Function to get the OS version
get_rpm_os_version() {
if [[ -f /etc/centos-release ]]; then
cat /etc/centos-release | awk '{print $4}'
elif [[ -f /etc/oracle-release ]]; then
cat /etc/oracle-release | awk '{print $5}'
else
echo "Unknown"
fi
}
package_type=${1}
# Since $HOME is set in GH_Actions as /github/home, pyenv fails to create virtualenvs.
# For this script, we set $HOME to /root and then set it back to /github/home.
GITHUB_HOME="${HOME}"
export HOME="/root"
eval "$(pyenv init -)"
pyenv versions
pyenv virtualenv ${PACKAGING_PYTHON_VERSION} packaging_env
pyenv activate packaging_env
git clone -b v0.8.27 --depth=1 https://github.com/citusdata/tools.git tools
python3 -m pip install -r tools/packaging_automation/requirements.txt
echo "Package type: ${package_type}"
echo "OS version: $(get_rpm_os_version)"
# For RHEL 7, we need to install urllib3<2 due to below execution error
# ImportError: urllib3 v2.0 only supports OpenSSL 1.1.1+, currently the 'ssl'
# module is compiled with 'OpenSSL 1.0.2k-fips 26 Jan 2017'.
# See: https://github.com/urllib3/urllib3/issues/2168
if [[ ${package_type} == "rpm" && $(get_rpm_os_version) == 7* ]]; then
python3 -m pip uninstall -y urllib3
python3 -m pip install 'urllib3<2'
fi
python3 -m tools.packaging_automation.validate_build_output --output_file output.log \
--ignore_file .github/packaging/packaging_ignore.yml \
--package_type ${package_type}
pyenv deactivate
# Set $HOME back to /github/home
export HOME=${GITHUB_HOME}
# Print the output to the console

View File

@ -1,546 +0,0 @@
name: Build & Test
run-name: Build & Test - ${{ github.event.pull_request.title || github.ref_name }}
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
on:
workflow_dispatch:
inputs:
skip_test_flakyness:
required: false
default: false
type: boolean
push:
branches:
- "main"
- "release-*"
pull_request:
types: [opened, reopened,synchronize]
merge_group:
jobs:
# Since GHA does not interpolate env varibles in matrix context, we need to
# define them in a separate job and use them in other jobs.
params:
runs-on: ubuntu-latest
name: Initialize parameters
outputs:
build_image_name: "ghcr.io/citusdata/extbuilder"
test_image_name: "ghcr.io/citusdata/exttester"
citusupgrade_image_name: "ghcr.io/citusdata/citusupgradetester"
fail_test_image_name: "ghcr.io/citusdata/failtester"
pgupgrade_image_name: "ghcr.io/citusdata/pgupgradetester"
style_checker_image_name: "ghcr.io/citusdata/stylechecker"
style_checker_tools_version: "0.8.18"
sql_snapshot_pg_version: "17.5"
image_suffix: "-dev-d28f316"
pg15_version: '{ "major": "15", "full": "15.13" }'
pg16_version: '{ "major": "16", "full": "16.9" }'
pg17_version: '{ "major": "17", "full": "17.5" }'
upgrade_pg_versions: "15.13-16.9-17.5"
steps:
# Since GHA jobs need at least one step we use a noop step here.
- name: Set up parameters
run: echo 'noop'
check-sql-snapshots:
needs: params
runs-on: ubuntu-latest
container:
image: ${{ needs.params.outputs.build_image_name }}:${{ needs.params.outputs.sql_snapshot_pg_version }}${{ needs.params.outputs.image_suffix }}
options: --user root
steps:
- uses: actions/checkout@v4
- name: Check Snapshots
run: |
git config --global --add safe.directory ${GITHUB_WORKSPACE}
ci/check_sql_snapshots.sh
check-style:
needs: params
runs-on: ubuntu-latest
container:
image: ${{ needs.params.outputs.style_checker_image_name }}:${{ needs.params.outputs.style_checker_tools_version }}${{ needs.params.outputs.image_suffix }}
steps:
- name: Check Snapshots
run: |
git config --global --add safe.directory ${GITHUB_WORKSPACE}
- uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Check C Style
run: citus_indent --check
- name: Check Python style
run: black --check .
- name: Check Python import order
run: isort --check .
- name: Check Python lints
run: flake8 .
- name: Fix whitespace
run: ci/editorconfig.sh && git diff --exit-code
- name: Remove useless declarations
run: ci/remove_useless_declarations.sh && git diff --cached --exit-code
- name: Sort and group includes
run: ci/sort_and_group_includes.sh && git diff --exit-code
- name: Normalize test output
run: ci/normalize_expected.sh && git diff --exit-code
- name: Check for C-style comments in migration files
run: ci/disallow_c_comments_in_migrations.sh && git diff --exit-code
- name: 'Check for comment--cached ns that start with # character in spec files'
run: ci/disallow_hash_comments_in_spec_files.sh && git diff --exit-code
- name: Check for gitignore entries .for source files
run: ci/fix_gitignore.sh && git diff --exit-code
- name: Check for lengths of changelog entries
run: ci/disallow_long_changelog_entries.sh
- name: Check for banned C API usage
run: ci/banned.h.sh
- name: Check for tests missing in schedules
run: ci/check_all_tests_are_run.sh
- name: Check if all CI scripts are actually run
run: ci/check_all_ci_scripts_are_run.sh
- name: Check if all GUCs are sorted alphabetically
run: ci/check_gucs_are_alphabetically_sorted.sh
- name: Check for missing downgrade scripts
run: ci/check_migration_files.sh
build:
needs: params
name: Build for PG${{ fromJson(matrix.pg_version).major }}
strategy:
fail-fast: false
matrix:
image_name:
- ${{ needs.params.outputs.build_image_name }}
image_suffix:
- ${{ needs.params.outputs.image_suffix}}
pg_version:
- ${{ needs.params.outputs.pg15_version }}
- ${{ needs.params.outputs.pg16_version }}
- ${{ needs.params.outputs.pg17_version }}
runs-on: ubuntu-latest
container:
image: "${{ matrix.image_name }}:${{ fromJson(matrix.pg_version).full }}${{ matrix.image_suffix }}"
options: --user root
steps:
- uses: actions/checkout@v4
- name: Expose $PG_MAJOR to Github Env
run: echo "PG_MAJOR=${PG_MAJOR}" >> $GITHUB_ENV
shell: bash
- name: Build
run: "./ci/build-citus.sh"
shell: bash
- uses: actions/upload-artifact@v4.6.0
with:
name: build-${{ env.PG_MAJOR }}
path: |-
./build-${{ env.PG_MAJOR }}/*
./install-${{ env.PG_MAJOR }}.tar
test-citus:
name: PG${{ fromJson(matrix.pg_version).major }} - ${{ matrix.make }}
strategy:
fail-fast: false
matrix:
suite:
- regress
image_name:
- ${{ needs.params.outputs.test_image_name }}
pg_version:
- ${{ needs.params.outputs.pg15_version }}
- ${{ needs.params.outputs.pg16_version }}
- ${{ needs.params.outputs.pg17_version }}
make:
- check-split
- check-multi
- check-multi-1
- check-multi-mx
- check-vanilla
- check-isolation
- check-operations
- check-follower-cluster
- check-columnar
- check-columnar-isolation
- check-enterprise
- check-enterprise-isolation
- check-enterprise-isolation-logicalrep-1
- check-enterprise-isolation-logicalrep-2
- check-enterprise-isolation-logicalrep-3
include:
- make: check-failure
pg_version: ${{ needs.params.outputs.pg15_version }}
suite: regress
image_name: ${{ needs.params.outputs.fail_test_image_name }}
- make: check-failure
pg_version: ${{ needs.params.outputs.pg16_version }}
suite: regress
image_name: ${{ needs.params.outputs.fail_test_image_name }}
- make: check-failure
pg_version: ${{ needs.params.outputs.pg17_version }}
suite: regress
image_name: ${{ needs.params.outputs.fail_test_image_name }}
- make: check-enterprise-failure
pg_version: ${{ needs.params.outputs.pg15_version }}
suite: regress
image_name: ${{ needs.params.outputs.fail_test_image_name }}
- make: check-enterprise-failure
pg_version: ${{ needs.params.outputs.pg16_version }}
suite: regress
image_name: ${{ needs.params.outputs.fail_test_image_name }}
- make: check-enterprise-failure
pg_version: ${{ needs.params.outputs.pg17_version }}
suite: regress
image_name: ${{ needs.params.outputs.fail_test_image_name }}
- make: check-pytest
pg_version: ${{ needs.params.outputs.pg15_version }}
suite: regress
image_name: ${{ needs.params.outputs.fail_test_image_name }}
- make: check-pytest
pg_version: ${{ needs.params.outputs.pg16_version }}
suite: regress
image_name: ${{ needs.params.outputs.fail_test_image_name }}
- make: check-pytest
pg_version: ${{ needs.params.outputs.pg17_version }}
suite: regress
image_name: ${{ needs.params.outputs.fail_test_image_name }}
- make: installcheck
suite: cdc
image_name: ${{ needs.params.outputs.test_image_name }}
pg_version: ${{ needs.params.outputs.pg15_version }}
- make: installcheck
suite: cdc
image_name: ${{ needs.params.outputs.test_image_name }}
pg_version: ${{ needs.params.outputs.pg16_version }}
- make: installcheck
suite: cdc
image_name: ${{ needs.params.outputs.test_image_name }}
pg_version: ${{ needs.params.outputs.pg17_version }}
- make: check-query-generator
pg_version: ${{ needs.params.outputs.pg15_version }}
suite: regress
image_name: ${{ needs.params.outputs.fail_test_image_name }}
- make: check-query-generator
pg_version: ${{ needs.params.outputs.pg16_version }}
suite: regress
image_name: ${{ needs.params.outputs.fail_test_image_name }}
- make: check-query-generator
pg_version: ${{ needs.params.outputs.pg17_version }}
suite: regress
image_name: ${{ needs.params.outputs.fail_test_image_name }}
runs-on: ubuntu-latest
container:
image: "${{ matrix.image_name }}:${{ fromJson(matrix.pg_version).full }}${{ needs.params.outputs.image_suffix }}"
options: --user root --dns=8.8.8.8
# Due to Github creates a default network for each job, we need to use
# --dns= to have similar DNS settings as our other CI systems or local
# machines. Otherwise, we may see different results.
needs:
- params
- build
steps:
- uses: actions/checkout@v4
- uses: "./.github/actions/setup_extension"
- name: Run Test
run: gosu circleci make -C src/test/${{ matrix.suite }} ${{ matrix.make }}
timeout-minutes: 20
- uses: "./.github/actions/save_logs_and_results"
if: always()
with:
folder: ${{ fromJson(matrix.pg_version).major }}_${{ matrix.make }}
- uses: "./.github/actions/upload_coverage"
if: always()
with:
flags: ${{ env.PG_MAJOR }}_${{ matrix.suite }}_${{ matrix.make }}
codecov_token: ${{ secrets.CODECOV_TOKEN }}
test-arbitrary-configs:
name: PG${{ fromJson(matrix.pg_version).major }} - check-arbitrary-configs-${{ matrix.parallel }}
runs-on: ["self-hosted", "1ES.Pool=1es-gha-citusdata-pool"]
container:
image: "${{ matrix.image_name }}:${{ fromJson(matrix.pg_version).full }}${{ needs.params.outputs.image_suffix }}"
options: --user root
needs:
- params
- build
strategy:
fail-fast: false
matrix:
image_name:
- ${{ needs.params.outputs.fail_test_image_name }}
pg_version:
- ${{ needs.params.outputs.pg15_version }}
- ${{ needs.params.outputs.pg16_version }}
- ${{ needs.params.outputs.pg17_version }}
parallel: [0,1,2,3,4,5] # workaround for running 6 parallel jobs
steps:
- uses: actions/checkout@v4
- uses: "./.github/actions/setup_extension"
- name: Test arbitrary configs
run: |-
# we use parallel jobs to split the tests into 6 parts and run them in parallel
# the script below extracts the tests for the current job
N=6 # Total number of jobs (see matrix.parallel)
X=${{ matrix.parallel }} # Current job number
TESTS=$(src/test/regress/citus_tests/print_test_names.py |
tr '\n' ',' | awk -v N="$N" -v X="$X" -F, '{
split("", parts)
for (i = 1; i <= NF; i++) {
parts[i % N] = parts[i % N] $i ","
}
print substr(parts[X], 1, length(parts[X])-1)
}')
echo $TESTS
gosu circleci \
make -C src/test/regress \
check-arbitrary-configs parallel=4 CONFIGS=$TESTS
- uses: "./.github/actions/save_logs_and_results"
if: always()
with:
folder: ${{ env.PG_MAJOR }}_arbitrary_configs_${{ matrix.parallel }}
- uses: "./.github/actions/upload_coverage"
if: always()
with:
flags: ${{ env.PG_MAJOR }}_arbitrary_configs_${{ matrix.parallel }}
codecov_token: ${{ secrets.CODECOV_TOKEN }}
test-pg-upgrade:
name: PG${{ matrix.old_pg_major }}-PG${{ matrix.new_pg_major }} - check-pg-upgrade
runs-on: ubuntu-latest
container:
image: "${{ needs.params.outputs.pgupgrade_image_name }}:${{ needs.params.outputs.upgrade_pg_versions }}${{ needs.params.outputs.image_suffix }}"
options: --user root
needs:
- params
- build
strategy:
fail-fast: false
matrix:
include:
- old_pg_major: 15
new_pg_major: 16
- old_pg_major: 16
new_pg_major: 17
- old_pg_major: 15
new_pg_major: 17
env:
old_pg_major: ${{ matrix.old_pg_major }}
new_pg_major: ${{ matrix.new_pg_major }}
steps:
- uses: actions/checkout@v4
- uses: "./.github/actions/setup_extension"
with:
pg_major: "${{ env.old_pg_major }}"
- uses: "./.github/actions/setup_extension"
with:
pg_major: "${{ env.new_pg_major }}"
- name: Install and test postgres upgrade
run: |-
gosu circleci \
make -C src/test/regress \
check-pg-upgrade \
old-bindir=/usr/lib/postgresql/${{ env.old_pg_major }}/bin \
new-bindir=/usr/lib/postgresql/${{ env.new_pg_major }}/bin
- name: Copy pg_upgrade logs for newData dir
run: |-
mkdir -p /tmp/pg_upgrade_newData_logs
if ls src/test/regress/tmp_upgrade/newData/*.log 1> /dev/null 2>&1; then
cp src/test/regress/tmp_upgrade/newData/*.log /tmp/pg_upgrade_newData_logs
fi
if: failure()
- uses: "./.github/actions/save_logs_and_results"
if: always()
with:
folder: ${{ env.old_pg_major }}_${{ env.new_pg_major }}_upgrade
- uses: "./.github/actions/upload_coverage"
if: always()
with:
flags: ${{ env.old_pg_major }}_${{ env.new_pg_major }}_upgrade
codecov_token: ${{ secrets.CODECOV_TOKEN }}
test-citus-upgrade:
name: PG${{ fromJson(needs.params.outputs.pg15_version).major }} - check-citus-upgrade
runs-on: ubuntu-latest
container:
image: "${{ needs.params.outputs.citusupgrade_image_name }}:${{ fromJson(needs.params.outputs.pg15_version).full }}${{ needs.params.outputs.image_suffix }}"
options: --user root
needs:
- params
- build
steps:
- uses: actions/checkout@v4
- uses: "./.github/actions/setup_extension"
with:
skip_installation: true
- name: Install and test citus upgrade
run: |-
# run make check-citus-upgrade for all citus versions
# the image has ${CITUS_VERSIONS} set with all verions it contains the binaries of
for citus_version in ${CITUS_VERSIONS}; do \
gosu circleci \
make -C src/test/regress \
check-citus-upgrade \
bindir=/usr/lib/postgresql/${PG_MAJOR}/bin \
citus-old-version=${citus_version} \
citus-pre-tar=/install-pg${PG_MAJOR}-citus${citus_version}.tar \
citus-post-tar=${GITHUB_WORKSPACE}/install-$PG_MAJOR.tar; \
done;
# run make check-citus-upgrade-mixed for all citus versions
# the image has ${CITUS_VERSIONS} set with all verions it contains the binaries of
for citus_version in ${CITUS_VERSIONS}; do \
gosu circleci \
make -C src/test/regress \
check-citus-upgrade-mixed \
citus-old-version=${citus_version} \
bindir=/usr/lib/postgresql/${PG_MAJOR}/bin \
citus-pre-tar=/install-pg${PG_MAJOR}-citus${citus_version}.tar \
citus-post-tar=${GITHUB_WORKSPACE}/install-$PG_MAJOR.tar; \
done;
- uses: "./.github/actions/save_logs_and_results"
if: always()
with:
folder: ${{ env.PG_MAJOR }}_citus_upgrade
- uses: "./.github/actions/upload_coverage"
if: always()
with:
flags: ${{ env.PG_MAJOR }}_citus_upgrade
codecov_token: ${{ secrets.CODECOV_TOKEN }}
upload-coverage:
# secret below is not available for forks so disabling upload action for them
if: ${{ github.event.pull_request.head.repo.full_name == github.repository || github.event_name != 'pull_request' }}
env:
CC_TEST_REPORTER_ID: ${{ secrets.CC_TEST_REPORTER_ID }}
runs-on: ubuntu-latest
container:
image: ${{ needs.params.outputs.test_image_name }}:${{ fromJson(needs.params.outputs.pg17_version).full }}${{ needs.params.outputs.image_suffix }}
needs:
- params
- test-citus
- test-arbitrary-configs
- test-citus-upgrade
- test-pg-upgrade
steps:
- uses: actions/download-artifact@v4.1.8
with:
pattern: codeclimate*
path: codeclimate
merge-multiple: true
- name: Upload coverage results to Code Climate
run: |-
cc-test-reporter sum-coverage codeclimate/*.json -o total.json
cc-test-reporter upload-coverage -i total.json
ch_benchmark:
name: CH Benchmark
if: startsWith(github.ref, 'refs/heads/ch_benchmark/')
runs-on: ubuntu-latest
needs:
- build
steps:
- uses: actions/checkout@v4
- uses: azure/login@v1
with:
creds: ${{ secrets.AZURE_CREDENTIALS }}
- name: install dependencies and run ch_benchmark tests
uses: azure/CLI@v1
with:
inlineScript: |
cd ./src/test/hammerdb
chmod +x run_hammerdb.sh
run_hammerdb.sh citusbot_ch_benchmark_rg
tpcc_benchmark:
name: TPCC Benchmark
if: startsWith(github.ref, 'refs/heads/tpcc_benchmark/')
runs-on: ubuntu-latest
needs:
- build
steps:
- uses: actions/checkout@v4
- uses: azure/login@v1
with:
creds: ${{ secrets.AZURE_CREDENTIALS }}
- name: install dependencies and run tpcc_benchmark tests
uses: azure/CLI@v1
with:
inlineScript: |
cd ./src/test/hammerdb
chmod +x run_hammerdb.sh
run_hammerdb.sh citusbot_tpcc_benchmark_rg
prepare_parallelization_matrix_32:
name: Prepare parallelization matrix
if: ${{ needs.test-flakyness-pre.outputs.tests != ''}}
needs: test-flakyness-pre
runs-on: ubuntu-latest
outputs:
json: ${{ steps.parallelization.outputs.json }}
steps:
- uses: actions/checkout@v4
- uses: "./.github/actions/parallelization"
id: parallelization
with:
count: 32
test-flakyness-pre:
name: Detect regression tests need to be ran
if: ${{ !inputs.skip_test_flakyness }}}
runs-on: ubuntu-latest
needs: build
outputs:
tests: ${{ steps.detect-regression-tests.outputs.tests }}
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Detect regression tests need to be ran
id: detect-regression-tests
run: |-
detected_changes=$(git diff origin/main... --name-only --diff-filter=AM | (grep 'src/test/regress/sql/.*\.sql\|src/test/regress/spec/.*\.spec\|src/test/regress/citus_tests/test/test_.*\.py' || true))
tests=${detected_changes}
# split the tests to be skipped --today we only skip upgrade tests
skipped_tests=""
not_skipped_tests=""
for test in $tests; do
if [[ $test =~ ^src/test/regress/sql/upgrade_ ]]; then
skipped_tests="$skipped_tests $test"
else
not_skipped_tests="$not_skipped_tests $test"
fi
done
if [ ! -z "$skipped_tests" ]; then
echo "Skipped tests " $skipped_tests
fi
if [ -z "$not_skipped_tests" ]; then
echo "Not detected any tests that flaky test detection should run"
else
echo "Detected tests " $not_skipped_tests
fi
echo 'tests<<EOF' >> $GITHUB_OUTPUT
echo "$not_skipped_tests" >> "$GITHUB_OUTPUT"
echo 'EOF' >> $GITHUB_OUTPUT
test-flakyness:
if: ${{ needs.test-flakyness-pre.outputs.tests != ''}}
name: Test flakyness
runs-on: ubuntu-latest
container:
image: ${{ needs.params.outputs.fail_test_image_name }}:${{ fromJson(needs.params.outputs.pg17_version).full }}${{ needs.params.outputs.image_suffix }}
options: --user root
env:
runs: 8
needs:
- params
- build
- test-flakyness-pre
- prepare_parallelization_matrix_32
strategy:
fail-fast: false
matrix: ${{ fromJson(needs.prepare_parallelization_matrix_32.outputs.json) }}
steps:
- uses: actions/checkout@v4
- uses: actions/download-artifact@v4.1.8
- uses: "./.github/actions/setup_extension"
- name: Run minimal tests
run: |-
tests="${{ needs.test-flakyness-pre.outputs.tests }}"
tests_array=($tests)
for test in "${tests_array[@]}"
do
test_name=$(echo "$test" | sed -r "s/.+\/(.+)\..+/\1/")
gosu circleci src/test/regress/citus_tests/run_test.py $test_name --repeat ${{ env.runs }} --use-whole-schedule-line
done
shell: bash
- uses: "./.github/actions/save_logs_and_results"
if: always()
with:
folder: test_flakyness_parallel_${{ matrix.id }}

View File

@ -1,79 +0,0 @@
name: "CodeQL"
on:
schedule:
- cron: '59 23 * * 6'
workflow_dispatch:
jobs:
analyze:
name: Analyze
runs-on: ubuntu-22.04
permissions:
actions: read
contents: read
security-events: write
strategy:
fail-fast: false
matrix:
language: [ 'cpp', 'python']
steps:
- name: Checkout repository
uses: actions/checkout@v4
- name: Initialize CodeQL
uses: github/codeql-action/init@v3
with:
languages: ${{ matrix.language }}
- name: Install package dependencies
run: |
# Create the file repository configuration:
sudo sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main 15" > /etc/apt/sources.list.d/pgdg.list'
# Import the repository signing key:
wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
sudo apt-get update
sudo apt-get install -y --no-install-recommends \
autotools-dev \
build-essential \
ca-certificates \
curl \
debhelper \
devscripts \
fakeroot \
flex \
libcurl4-openssl-dev \
libdistro-info-perl \
libedit-dev \
libfile-fcntllock-perl \
libicu-dev \
libkrb5-dev \
liblz4-1 \
liblz4-dev \
libpam0g-dev \
libreadline-dev \
libselinux1-dev \
libssl-dev \
libxslt-dev \
libzstd-dev \
libzstd1 \
lintian \
postgresql-server-dev-15 \
postgresql-server-dev-all \
python3-pip \
python3-setuptools \
wget \
zlib1g-dev
- name: Configure, Build and Install Citus
if: matrix.language == 'cpp'
run: |
./configure
make -sj8
sudo make install-all
- name: Perform CodeQL Analysis
uses: github/codeql-action/analyze@v3

View File

@ -1,54 +0,0 @@
name: "Build devcontainer"
# Since building of containers can be quite time consuming, and take up some storage,
# there is no need to finish a build for a tag if new changes are concurrently being made.
# This cancels any previous builds for the same tag, and only the latest one will be kept.
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
on:
push:
paths:
- ".devcontainer/**"
workflow_dispatch:
jobs:
docker:
runs-on: ubuntu-latest
permissions:
contents: read
packages: write
attestations: write
id-token: write
steps:
-
name: Docker meta
id: meta
uses: docker/metadata-action@v5
with:
images: |
ghcr.io/citusdata/citus-devcontainer
tags: |
type=ref,event=branch
type=sha
-
name: Set up Docker Buildx
uses: docker/setup-buildx-action@v2
-
name: 'Login to GitHub Container Registry'
uses: docker/login-action@v3
with:
registry: ghcr.io
username: ${{github.actor}}
password: ${{secrets.GITHUB_TOKEN}}
-
name: Build and push
uses: docker/build-push-action@v5
with:
context: "{{defaultContext}}:.devcontainer"
push: true
tags: ${{ steps.meta.outputs.tags }}
labels: ${{ steps.meta.outputs.labels }}
cache-from: type=gha
cache-to: type=gha,mode=max

View File

@ -1,79 +0,0 @@
name: Flaky test debugging
run-name: Flaky test debugging - ${{ inputs.flaky_test }} (${{ inputs.flaky_test_runs_per_job }}x${{ inputs.flaky_test_parallel_jobs }})
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
on:
workflow_dispatch:
inputs:
flaky_test:
required: true
type: string
description: Test to run
flaky_test_runs_per_job:
required: false
default: 8
type: number
description: Number of times to run the test
flaky_test_parallel_jobs:
required: false
default: 32
type: number
description: Number of parallel jobs to run
jobs:
build:
name: Build Citus
runs-on: ubuntu-latest
container:
image: ${{ vars.build_image_name }}:${{ vars.pg15_version }}${{ vars.image_suffix }}
options: --user root
steps:
- uses: actions/checkout@v4
- name: Configure, Build, and Install
run: |
echo "PG_MAJOR=${PG_MAJOR}" >> $GITHUB_ENV
./ci/build-citus.sh
shell: bash
- uses: actions/upload-artifact@v4.6.0
with:
name: build-${{ env.PG_MAJOR }}
path: |-
./build-${{ env.PG_MAJOR }}/*
./install-${{ env.PG_MAJOR }}.tar
prepare_parallelization_matrix:
name: Prepare parallelization matrix
runs-on: ubuntu-latest
outputs:
json: ${{ steps.parallelization.outputs.json }}
steps:
- uses: actions/checkout@v4
- uses: "./.github/actions/parallelization"
id: parallelization
with:
count: ${{ inputs.flaky_test_parallel_jobs }}
test_flakyness:
name: Test flakyness
runs-on: ubuntu-latest
container:
image: ${{ vars.fail_test_image_name }}:${{ vars.pg15_version }}${{ vars.image_suffix }}
options: --user root
needs:
[build, prepare_parallelization_matrix]
env:
test: "${{ inputs.flaky_test }}"
runs: "${{ inputs.flaky_test_runs_per_job }}"
skip: false
strategy:
fail-fast: false
matrix: ${{ fromJson(needs.prepare_parallelization_matrix.outputs.json) }}
steps:
- uses: actions/checkout@v4
- uses: "./.github/actions/setup_extension"
- name: Run minimal tests
run: |-
gosu circleci src/test/regress/citus_tests/run_test.py ${{ env.test }} --repeat ${{ env.runs }} --use-whole-schedule-line
shell: bash
- uses: "./.github/actions/save_logs_and_results"
if: always()
with:
folder: check_flakyness_parallel_${{ matrix.id }}

View File

@ -1,177 +0,0 @@
name: Build tests in packaging images
on:
pull_request:
types: [opened, reopened,synchronize]
merge_group:
workflow_dispatch:
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
jobs:
get_postgres_versions_from_file:
runs-on: ubuntu-latest
outputs:
pg_versions: ${{ steps.get-postgres-versions.outputs.pg_versions }}
steps:
- name: Checkout
uses: actions/checkout@v4
with:
fetch-depth: 2
- name: Get Postgres Versions
id: get-postgres-versions
run: |
set -euxo pipefail
# Postgres versions are stored in .github/workflows/build_and_test.yml
# file in json strings with major and full keys.
# Below command extracts the versions and get the unique values.
pg_versions=$(cat .github/workflows/build_and_test.yml | grep -oE '"major": "[0-9]+", "full": "[0-9.]+"' | sed -E 's/"major": "([0-9]+)", "full": "([0-9.]+)"/\1/g' | sort | uniq | tr '\n', ',')
pg_versions_array="[ ${pg_versions} ]"
echo "Supported PG Versions: ${pg_versions_array}"
# Below line is needed to set the output variable to be used in the next job
echo "pg_versions=${pg_versions_array}" >> $GITHUB_OUTPUT
shell: bash
rpm_build_tests:
name: rpm_build_tests
needs: get_postgres_versions_from_file
runs-on: ubuntu-latest
strategy:
fail-fast: false
matrix:
# While we use separate images for different Postgres versions in rpm
# based distros
# For this reason, we need to use a "matrix" to generate names of
# rpm images, e.g. citus/packaging:centos-7-pg12
packaging_docker_image:
- oraclelinux-8
- almalinux-8
- almalinux-9
POSTGRES_VERSION: ${{ fromJson(needs.get_postgres_versions_from_file.outputs.pg_versions) }}
container:
image: citus/packaging:${{ matrix.packaging_docker_image }}-pg${{ matrix.POSTGRES_VERSION }}
options: --user root
steps:
- name: Checkout repository
uses: actions/checkout@v4
- name: Set Postgres and python parameters for rpm based distros
run: |
echo "/usr/pgsql-${{ matrix.POSTGRES_VERSION }}/bin" >> $GITHUB_PATH
echo "/root/.pyenv/bin:$PATH" >> $GITHUB_PATH
echo "PACKAGING_PYTHON_VERSION=3.8.16" >> $GITHUB_ENV
- name: Configure
run: |
echo "Current Shell:$0"
echo "GCC Version: $(gcc --version)"
./configure 2>&1 | tee output.log
- name: Make clean
run: |
make clean
- name: Make
run: |
git config --global --add safe.directory ${GITHUB_WORKSPACE}
make CFLAGS="-Wno-missing-braces" -sj$(cat /proc/cpuinfo | grep "core id" | wc -l) 2>&1 | tee -a output.log
# Check the exit code of the make command
make_exit_code=${PIPESTATUS[0]}
# If the make command returned a non-zero exit code, exit with the same code
if [[ $make_exit_code -ne 0 ]]; then
echo "make command failed with exit code $make_exit_code"
exit $make_exit_code
fi
- name: Make install
run: |
make CFLAGS="-Wno-missing-braces" install 2>&1 | tee -a output.log
- name: Validate output
env:
POSTGRES_VERSION: ${{ matrix.POSTGRES_VERSION }}
PACKAGING_DOCKER_IMAGE: ${{ matrix.packaging_docker_image }}
run: |
echo "Postgres version: ${POSTGRES_VERSION}"
./.github/packaging/validate_build_output.sh "rpm"
deb_build_tests:
name: deb_build_tests
needs: get_postgres_versions_from_file
runs-on: ubuntu-latest
strategy:
fail-fast: false
matrix:
# On deb based distros, we use the same docker image for
# builds based on different Postgres versions because deb
# based images include all postgres installations.
# For this reason, we have multiple runs --which is 3 today--
# for each deb based image and we use POSTGRES_VERSION to set
# PG_CONFIG variable in each of those runs.
packaging_docker_image:
- debian-bookworm-all
- debian-bullseye-all
- ubuntu-focal-all
- ubuntu-jammy-all
POSTGRES_VERSION: ${{ fromJson(needs.get_postgres_versions_from_file.outputs.pg_versions) }}
container:
image: citus/packaging:${{ matrix.packaging_docker_image }}
options: --user root
steps:
- name: Checkout repository
uses: actions/checkout@v4
- name: Set pg_config path and python parameters for deb based distros
run: |
echo "PG_CONFIG=/usr/lib/postgresql/${{ matrix.POSTGRES_VERSION }}/bin/pg_config" >> $GITHUB_ENV
echo "/root/.pyenv/bin:$PATH" >> $GITHUB_PATH
echo "PACKAGING_PYTHON_VERSION=3.8.16" >> $GITHUB_ENV
- name: Configure
run: |
echo "Current Shell:$0"
echo "GCC Version: $(gcc --version)"
./configure 2>&1 | tee output.log
- name: Make clean
run: |
make clean
- name: Make
shell: bash
run: |
set -e
git config --global --add safe.directory ${GITHUB_WORKSPACE}
make -sj$(cat /proc/cpuinfo | grep "core id" | wc -l) 2>&1 | tee -a output.log
# Check the exit code of the make command
make_exit_code=${PIPESTATUS[0]}
# If the make command returned a non-zero exit code, exit with the same code
if [[ $make_exit_code -ne 0 ]]; then
echo "make command failed with exit code $make_exit_code"
exit $make_exit_code
fi
- name: Make install
run: |
make install 2>&1 | tee -a output.log
- name: Validate output
env:
POSTGRES_VERSION: ${{ matrix.POSTGRES_VERSION }}
PACKAGING_DOCKER_IMAGE: ${{ matrix.packaging_docker_image }}
run: |
echo "Postgres version: ${POSTGRES_VERSION}"
./.github/packaging/validate_build_output.sh "deb"

14
.gitignore vendored
View File

@ -38,23 +38,9 @@ lib*.pc
/Makefile.global /Makefile.global
/src/Makefile.custom /src/Makefile.custom
/compile_commands.json /compile_commands.json
/src/backend/distributed/cdc/build-cdc-*/*
/src/test/cdc/tmp_check/*
# temporary files vim creates # temporary files vim creates
*.swp *.swp
# vscode # vscode
.vscode/* .vscode/*
# output from diff normalization that shouldn't be commited
*.unmodified
*.modified
# style related temporary outputs
*.uncrustify
.venv
# added output when modifying check_gucs_are_alphabetically_sorted.sh
guc.out

File diff suppressed because it is too large Load Diff

View File

@ -1,9 +0,0 @@
# Microsoft Open Source Code of Conduct
This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
Resources:
- [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/)
- [Microsoft Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/)
- Contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with questions or concerns

View File

@ -11,65 +11,8 @@ sign a Contributor License Agreement (CLA). For an explanation of
why we ask this as well as instructions for how to proceed, see the why we ask this as well as instructions for how to proceed, see the
[Microsoft CLA](https://cla.opensource.microsoft.com/). [Microsoft CLA](https://cla.opensource.microsoft.com/).
### Devcontainer / Github Codespaces
The easiest way to start contributing is via our devcontainer. This container works both locally in visual studio code with docker-desktop/docker-for-mac as well as [Github Codespaces](https://github.com/features/codespaces). To open the project in vscode you will need the [Dev Containers extension](https://marketplace.visualstudio.com/items?itemName=ms-vscode-remote.remote-containers). For codespaces you will need to [create a new codespace](https://codespace.new/citusdata/citus).
With the extension installed you can run the following from the command pallet to get started
```
> Dev Containers: Clone Repository in Container Volume...
```
In the subsequent popup paste the url to the repo and hit enter.
```
https://github.com/citusdata/citus
```
This will create an isolated Workspace in vscode, complete with all tools required to build, test and run the Citus extension. We keep this container up to date with the supported postgres versions as well as the exact versions of tooling we use.
To quickly start we suggest splitting your terminal once to have two shells. The left one in the `/workspaces/citus`, the second one changed to `/data`. The left terminal will be used to interact with the project, the right one with a testing cluster.
To get citus installed from source we run `make install -s` in the first terminal. Once installed you can start a Citus cluster in the second terminal via `citus_dev make citus`. The cluster will run in the background, and can be interacted with via `citus_dev`. To get an overview of the available commands.
With the Citus cluster running you can connect to the coordinator in the first terminal via `psql -p9700`. Because the coordinator is the most common entrypoint the `PGPORT` environment is set accordingly, so a simple `psql` will connect directly to the coordinator.
### Debugging in the VS code
1. Start Debugging: Press F5 in VS Code to start debugging. When prompted, you'll need to attach the debugger to the appropriate PostgreSQL process.
2. Identify the Process: If you're running a psql command, take note of the PID that appears in your psql prompt. For example:
```
[local] citus@citus:9700 (PID: 5436)=#
```
This PID (5436 in this case) indicates the process that you should attach the debugger to.
If you are uncertain about which process to attach, you can list all running PostgreSQL processes using the following command:
```
ps aux | grep postgres
```
Look for the process associated with the PID you noted. For example:
```
citus 5436 0.0 0.0 0 0 ? S 14:00 0:00 postgres: citus citus
```
4. Attach the Debugger: Once you've identified the correct PID, select that process when prompted in VS Code to attach the debugger. You should now be able to debug the PostgreSQL session tied to the psql command.
5. Set Breakpoints and Debug: With the debugger attached, you can set breakpoints within the code. This allows you to step through the code execution, inspect variables, and fully debug the PostgreSQL instance running in your container.
### Getting and building ### Getting and building
[PostgreSQL documentation](https://www.postgresql.org/support/versioning/) has a
section on upgrade policy.
We always recommend that all users run the latest available minor release [for PostgreSQL] for whatever major version is in use.
We expect Citus users to honor this recommendation and use latest available
PostgreSQL minor release. Failure to do so may result in failures in our test
suite. There are some known improvements in PG test architecture such as
[this commit](https://github.com/postgres/postgres/commit/3f323956128ff8589ce4d3a14e8b950837831803)
that are missing in earlier minor versions.
#### Mac #### Mac
1. Install Xcode 1. Install Xcode
@ -87,19 +30,9 @@ that are missing in earlier minor versions.
cd citus cd citus
./configure ./configure
# If you have already installed the project, you need to clean it first
make clean
make make
make install make install
# Optionally, you might instead want to use `make install-all`
# since `multi_extension` regression test would fail due to missing downgrade scripts.
cd src/test/regress cd src/test/regress
pip install pipenv
pipenv --rm
pipenv install
pipenv shell
make check make check
``` ```
@ -114,11 +47,11 @@ that are missing in earlier minor versions.
sudo apt-key add - sudo apt-key add -
sudo apt-get update sudo apt-get update
sudo apt-get install -y postgresql-server-dev-14 postgresql-14 \ sudo apt-get install -y postgresql-server-dev-11 postgresql-11 \
autoconf flex git libcurl4-gnutls-dev libicu-dev \ libreadline-dev libselinux1-dev libxslt-dev \
libkrb5-dev liblz4-dev libpam0g-dev libreadline-dev \ libpam0g-dev git flex make libssl-dev \
libselinux1-dev libssl-dev libxslt1-dev libzstd-dev \ libicu-dev uuid-dev \
make uuid-dev libkrb5-dev libcurl4-gnutls-dev autoconf
``` ```
2. Get, build, and test the code 2. Get, build, and test the code
@ -127,19 +60,9 @@ that are missing in earlier minor versions.
git clone https://github.com/citusdata/citus.git git clone https://github.com/citusdata/citus.git
cd citus cd citus
./configure ./configure
# If you have already installed the project previously, you need to clean it first
make clean
make make
sudo make install sudo make install
# Optionally, you might instead want to use `sudo make install-all`
# since `multi_extension` regression test would fail due to missing downgrade scripts.
cd src/test/regress cd src/test/regress
pip install pipenv
pipenv --rm
pipenv install
pipenv shell
make check make check
``` ```
@ -171,33 +94,59 @@ that are missing in earlier minor versions.
```bash ```bash
sudo yum update -y sudo yum update -y
sudo yum groupinstall -y 'Development Tools' sudo yum groupinstall -y 'Development Tools'
sudo yum install -y postgresql14-devel postgresql14-server \ sudo yum install -y postgresql11-devel postgresql11-server \
git libcurl-devel libxml2-devel libxslt-devel \ libxml2-devel libxslt-devel openssl-devel \
libzstd-devel llvm-toolset-7-clang llvm5.0 lz4-devel \ pam-devel readline-devel git libcurl-devel \
openssl-devel pam-devel readline-devel llvm5.0 llvm-toolset-7-clang
git clone https://github.com/citusdata/citus.git git clone https://github.com/citusdata/citus.git
cd citus cd citus
PG_CONFIG=/usr/pgsql-14/bin/pg_config ./configure PG_CONFIG=/usr/pgsql-11/bin/pg_config ./configure
# If you have already installed the project previously, you need to clean it first
make clean
make make
sudo make install sudo make install
# Optionally, you might instead want to use `sudo make install-all`
# since `multi_extension` regression test would fail due to missing downgrade scripts.
cd src/test/regress cd src/test/regress
pip install pipenv
pipenv --rm
pipenv install
pipenv shell
make check make check
``` ```
### Following our coding conventions ### Following our coding conventions
Our coding conventions are documented in [STYLEGUIDE.md](STYLEGUIDE.md). CircleCI will automatically reject any PRs which do not follow our coding
conventions. The easiest way to ensure your PR adheres to those conventions is
to use the [citus_indent](https://github.com/citusdata/tools/tree/develop/uncrustify)
tool. This tool uses `uncrustify` under the hood.
```bash
# Uncrustify changes the way it formats code every release a bit. To make sure
# everyone formats consistently we use version 0.68.1:
curl -L https://github.com/uncrustify/uncrustify/archive/uncrustify-0.68.1.tar.gz | tar xz
cd uncrustify-uncrustify-0.68.1/
mkdir build
cd build
cmake ..
make -j5
sudo make install
cd ../..
git clone https://github.com/citusdata/tools.git
cd tools
make uncrustify/.install
```
Once you've done that, you can run the `make reindent` command from the top
directory to recursively check and correct the style of any source files in the
current directory. Under the hood, `make reindent` will run `citus_indent` and
some other style corrections for you.
You can also run the following in the directory of this repository to
automatically format all the files that you have changed before committing:
```bash
cat > .git/hooks/pre-commit << __EOF__
#!/bin/bash
citus_indent --check --diff || { citus_indent --diff; exit 1; }
__EOF__
chmod +x .git/hooks/pre-commit
```
### Making SQL changes ### Making SQL changes
@ -234,50 +183,3 @@ style `#include` statements like this:
Any other SQL you can put directly in the main sql file, e.g. Any other SQL you can put directly in the main sql file, e.g.
`src/backend/distributed/sql/citus--8.3-1--9.0-1.sql`. `src/backend/distributed/sql/citus--8.3-1--9.0-1.sql`.
### Backporting a commit to a release branch
1. Check out the release branch that you want to backport to `git checkout release-11.3`
2. Make sure you have the latest changes `git pull`
3. Create a new release branch with a unique name `git checkout -b release-11.3-<yourname>`
4. Cherry-pick the commit that you want to backport `git cherry-pick -x <sha>` (the `-x` is important)
5. Push the branch `git push`
6. Wait for tests to pass
7. If the cherry-pick required non-trivial merge conflicts, create a PR and ask
for a review.
8. After the tests pass on CI, fast-forward the release branch `git push origin release-11.3-<yourname>:release-11.3`
### Running tests
See [`src/test/regress/README.md`](https://github.com/citusdata/citus/blob/master/src/test/regress/README.md)
### Documentation
User-facing documentation is published on [docs.citusdata.com](https://docs.citusdata.com/). When adding a new feature, function, or setting, you can open a pull request or issue against the [Citus docs repo](https://github.com/citusdata/citus_docs/).
Detailed descriptions of the implementation for Citus developers are provided in the [Citus Technical Documentation](src/backend/distributed/README.md). It is currently a single file for ease of searching. Please update the documentation if you make any changes that affect the design or add major new features.
# Making a pull request ready for reviews
Asking for help and asking for reviews are two different things. When you're asking for help, you're asking for someone to help you with something that you're not expected to know.
But when you're asking for a review, you're asking for someone to review your work and provide feedback. So, when you're asking for a review, you're expected to make sure that:
* Your changes don't perform **unnecessary line addition / deletions / style changes on unrelated files / lines**.
* All CI jobs are **passing**, including **style checks** and **flaky test detection jobs**. Note that if you're an external contributor, you don't have to wait CI jobs to run (and finish) because they don't get automatically triggered for external contributors.
* Your PR has necessary amount of **tests** and that they're passing.
* You separated as much as possible work into **separate PRs**, e.g., a prerequisite bugfix, a refactoring etc..
* Your PR doesn't introduce a typo or something that you can easily fix yourself.
* After all CI jobs pass, code-coverage measurement job (CodeCov as of today) then kicks in. That's why it's important to make the **tests passing** first. At that point, you're expected to check **CodeCov annotations** that can be seen in the **Files Changed** tab and expected to make sure that it doesn't complain about any lines that are not covered. For example, it's ok if CodeCov complains about an `ereport()` call that you put for an "unexpected-but-better-than-crashing" case, but it's not ok if it complains about an uncovered `if` branch that you added.
* And finally, perform a **self-review** to make sure that:
* Code and code-comments reflects the idea **without requiring an extra explanation** via a chat message / email / PR comment.
This is important because we don't expect developers to reach out to author / read about the whole discussion in the PR to understand the idea behind a commit merged into `main` branch.
* PR description is clear enough.
* If-and-only-if you're **introducing a user facing change / bugfix**, your PR has a line that starts with `DESCRIPTION: <Present simple tense word that starts with a capital letter, e.g., Adds support for / Fixes / Disallows>`.
* **Commit messages** are clear enough if the commits are doing logically different things.

View File

@ -1,43 +0,0 @@
# Devcontainer
## Coredumps
When postgres/citus crashes, there is the option to create a coredump. This is useful for debugging the issue. Coredumps are enabled in the devcontainer by default. However, not all environments are configured correctly out of the box. The most important configuration that is not standardized is the `core_pattern`. The configuration can be verified from the container, however, you cannot change this setting from inside the container as the filesystem containing this setting is in read only mode while inside the container.
To verify if corefiles are written run the following command in a terminal. This shows the filename pattern with which the corefile will be written.
```bash
cat /proc/sys/kernel/core_pattern
```
This should be configured with a relative path or simply a simple filename, such as `core`. When your environment shows an absolute path you will need to change this setting. How to change this setting depends highly on the underlying system as the setting needs to be changed on the kernel of the host running the container.
You can put any pattern in `/proc/sys/kernel/core_pattern` as you see fit. eg. You can add the PID to the core pattern in one of two ways;
- You either include `%p` in the core_pattern. This gets substituted with the PID of the crashing process.
- Alternatively you could set `/proc/sys/kernel/core_uses_pid` to `1` in the same way as you set `core_pattern`. This will append the PID to the corefile if `%p` is not explicitly contained in the core_pattern.
When a coredump is written you can use the debug/launch configuration `Open core file` which is preconfigured in the devcontainer. This will open a fileprompt that lists all coredumps that are found in your workspace. When you want to debug coredumps from `citus_dev` that are run in your `/data` directory, you can add the data directory to your workspace. In the command pallet of vscode you can run `>Workspace: Add Folder to Workspace...` and select the `/data` directory. This will allow you to open the coredumps from the `/data` directory in the `Open core file` debug configuration.
### Windows (docker desktop)
When running in docker desktop on windows you will most likely need to change this setting. The linux guest in WSL2 that runs your container is the `docker-desktop` environment. The easiest way to get onto the host, where you can change this setting, is to open a powershell window and verify you have the docker-desktop environment listed.
```powershell
wsl --list
```
Among others this should list both `docker-desktop` and `docker-desktop-data`. You can then open a shell in the `docker-desktop` environment.
```powershell
wsl -d docker-desktop
```
Inside this shell you can verify that you have the right environment by running
```bash
cat /proc/sys/kernel/core_pattern
```
This should show the same configuration as the one you see inside the devcontainer. You can then change the setting by running the following command.
This will change the setting for the current session. If you want to make the change permanent you will need to add this to a startup script.
```bash
echo "core" > /proc/sys/kernel/core_pattern
```

View File

@ -13,16 +13,10 @@ include Makefile.global
all: extension all: extension
# build columnar only
columnar:
$(MAKE) -C src/backend/columnar all
# build extension # build extension
extension: $(citus_top_builddir)/src/include/citus_version.h columnar extension: $(citus_top_builddir)/src/include/citus_version.h
$(MAKE) -C src/backend/distributed/ all $(MAKE) -C src/backend/distributed/ all
install-columnar: columnar install-extension: extension
$(MAKE) -C src/backend/columnar install
install-extension: extension install-columnar
$(MAKE) -C src/backend/distributed/ install $(MAKE) -C src/backend/distributed/ install
install-headers: extension install-headers: extension
$(MKDIR_P) '$(DESTDIR)$(includedir_server)/distributed/' $(MKDIR_P) '$(DESTDIR)$(includedir_server)/distributed/'
@ -33,35 +27,27 @@ install-headers: extension
clean-extension: clean-extension:
$(MAKE) -C src/backend/distributed/ clean $(MAKE) -C src/backend/distributed/ clean
$(MAKE) -C src/backend/columnar/ clean
clean-full: clean-full:
$(MAKE) -C src/backend/distributed/ clean-full $(MAKE) -C src/backend/distributed/ clean-full
.PHONY: extension install-extension clean-extension clean-full .PHONY: extension install-extension clean-extension clean-full
# Add to generic targets
install: install-extension install-headers
install-downgrades: install-downgrades:
$(MAKE) -C src/backend/distributed/ install-downgrades $(MAKE) -C src/backend/distributed/ install-downgrades
install-all: install-headers install-all: install-headers
$(MAKE) -C src/backend/columnar/ install-all
$(MAKE) -C src/backend/distributed/ install-all $(MAKE) -C src/backend/distributed/ install-all
# Add to generic targets
install: install-extension install-headers
clean: clean-extension clean: clean-extension
# apply or check style # apply or check style
reindent: reindent:
${citus_abs_top_srcdir}/ci/fix_style.sh ${citus_abs_top_srcdir}/ci/fix_style.sh
check-style: check-style:
black . --check --quiet
isort . --check --quiet
flake8
cd ${citus_abs_top_srcdir} && citus_indent --quiet --check cd ${citus_abs_top_srcdir} && citus_indent --quiet --check
.PHONY: reindent check-style .PHONY: reindent check-style
# depend on install-all so that downgrade scripts are installed as well # depend on install for now
check: all install-all check: all install
# explicetely does not use $(MAKE) to avoid parallelism $(MAKE) -C src/test/regress check-full
make -C src/test/regress check
.PHONY: all check clean install install-downgrades install-all .PHONY: all check clean install install-downgrades install-all

View File

@ -64,8 +64,8 @@ $(citus_top_builddir)/Makefile.global: $(citus_abs_top_srcdir)/configure $(citus
$(citus_top_builddir)/config.status: $(citus_abs_top_srcdir)/configure $(citus_abs_top_srcdir)/src/backend/distributed/citus.control $(citus_top_builddir)/config.status: $(citus_abs_top_srcdir)/configure $(citus_abs_top_srcdir)/src/backend/distributed/citus.control
cd @abs_top_builddir@ && ./config.status --recheck && ./config.status cd @abs_top_builddir@ && ./config.status --recheck && ./config.status
# Regenerate configure if configure.ac changed # Regenerate configure if configure.in changed
$(citus_abs_top_srcdir)/configure: $(citus_abs_top_srcdir)/configure.ac $(citus_abs_top_srcdir)/configure: $(citus_abs_top_srcdir)/configure.in
cd ${citus_abs_top_srcdir} && ./autogen.sh cd ${citus_abs_top_srcdir} && ./autogen.sh
# If specified via configure, replace the default compiler. Normally # If specified via configure, replace the default compiler. Normally

99
NOTICE
View File

@ -1,99 +0,0 @@
NOTICES AND INFORMATION
Do Not Translate or Localize
This software incorporates material from third parties.
Microsoft makes certain open source code available at https://3rdpartysource.microsoft.com,
or you may send a check or money order for US $5.00, including the product name,
the open source component name, platform, and version number, to:
Source Code Compliance Team
Microsoft Corporation
One Microsoft Way
Redmond, WA 98052
USA
Notwithstanding any other terms, you may reverse engineer this software to the extent
required to debug changes to any libraries licensed under the GNU Lesser General Public License.
---------------------------------------------------------
---------------------------------------------------------
intel/safestringlib 245c4b8cff1d2e7338b7f3a82828fc8e72b29549 - MIT
Copyright (c) 2014-2018 Intel Corporation
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
================================================================================
Copyright (C) 2012, 2013 Cisco Systems
All rights reserved.
Permission is hereby granted, free of charge, to any person
obtaining a copy of this software and associated documentation
files (the "Software"), to deal in the Software without
restriction, including without limitation the rights to use,
copy, modify, merge, publish, distribute, sublicense, and/or
sell copies of the Software, and to permit persons to whom the
Software is furnished to do so, subject to the following
conditions:
The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
OTHER DEALINGS IN THE SOFTWARE.
---------------------------------------------------------
postgres/postgres 29be9983a64c011eac0b9ee29895cce71e15ea77
PostgreSQL Database Management System
(formerly known as Postgres, then as Postgres95)
Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
Portions Copyright (c) 1994, The Regents of the University of California
Permission to use, copy, modify, and distribute this software and its
documentation for any purpose, without fee, and without a written agreement
is hereby granted, provided that the above copyright notice and this
paragraph and the following two paragraphs appear in all copies.
IN NO EVENT SHALL THE UNIVERSITY OF CALIFORNIA BE LIABLE TO ANY PARTY FOR
DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING
LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS
DOCUMENTATION, EVEN IF THE UNIVERSITY OF CALIFORNIA HAS BEEN ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.
THE UNIVERSITY OF CALIFORNIA SPECIFICALLY DISCLAIMS ANY WARRANTIES,
INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY
AND FITNESS FOR A PARTICULAR PURPOSE. THE SOFTWARE PROVIDED HEREUNDER IS
ON AN "AS IS" BASIS, AND THE UNIVERSITY OF CALIFORNIA HAS NO OBLIGATIONS TO
PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.
---------------------------------------------------------

638
README.md
View File

@ -1,496 +1,154 @@
| **<br/>The Citus database is 100% open source.<br/><img width=1000/><br/>Learn what's new in the [Citus 13.0 release blog](https://www.citusdata.com/blog/2025/02/06/distribute-postgresql-17-with-citus-13/) and the [Citus Updates page](https://www.citusdata.com/updates/).<br/><br/>**| ![Citus Banner](/github-banner.png)
|---|
<br/>
![Citus Banner](images/citus-readme-banner.png)
[![Slack Status](http://slack.citusdata.com/badge.svg)](https://slack.citusdata.com)
[![Latest Docs](https://img.shields.io/badge/docs-latest-brightgreen.svg)](https://docs.citusdata.com/) [![Latest Docs](https://img.shields.io/badge/docs-latest-brightgreen.svg)](https://docs.citusdata.com/)
[![Stack Overflow](https://img.shields.io/badge/Stack%20Overflow-%20-545353?logo=Stack%20Overflow)](https://stackoverflow.com/questions/tagged/citus) [![Circleci Status](https://circleci.com/gh/citusdata/citus.svg?style=svg)](https://circleci.com/gh/citusdata/citus.svg?style=svg)
[![Slack](https://cituscdn.azureedge.net/images/social/slack-badge.svg)](https://slack.citusdata.com/) [![Code Coverage](https://codecov.io/gh/citusdata/citus/branch/master/graph/badge.svg)](https://codecov.io/gh/citusdata/citus/branch/master/graph/badge.svg)
[![Code Coverage](https://codecov.io/gh/citusdata/citus/branch/master/graph/badge.svg)](https://app.codecov.io/gh/citusdata/citus)
[![Twitter](https://img.shields.io/twitter/follow/citusdata.svg?label=Follow%20@citusdata)](https://twitter.com/intent/follow?screen_name=citusdata) ### What is Citus?
[![Citus Deb Packages](https://img.shields.io/badge/deb-packagecloud.io-844fec.svg)](https://packagecloud.io/app/citusdata/community/search?q=&filter=debs) * **Open-source** PostgreSQL extension (not a fork)
[![Citus Rpm Packages](https://img.shields.io/badge/rpm-packagecloud.io-844fec.svg)](https://packagecloud.io/app/citusdata/community/search?q=&filter=rpms) * **Built to scale out** across multiple nodes
* **Distributed** engine for query parallelization
## What is Citus? * **Database** designed to scale out multi-tenant applications, real-time analytics dashboards, and high-throughput transactional workloads
Citus is a [PostgreSQL extension](https://www.citusdata.com/blog/2017/10/25/what-it-means-to-be-a-postgresql-extension/) that transforms Postgres into a distributed database—so you can achieve high performance at any scale. Citus is an open source extension to Postgres that distributes your data and your queries across multiple nodes. Because Citus is an extension to Postgres, and not a fork, Citus gives developers and enterprises a scale-out database while keeping the power and familiarity of a relational database. As an extension, Citus supports new PostgreSQL releases, and allows you to benefit from new features while maintaining compatibility with existing PostgreSQL tools.
With Citus, you extend your PostgreSQL database with new superpowers: Citus serves many use cases. Three common ones are:
- **Distributed tables** are sharded across a cluster of PostgreSQL nodes to combine their CPU, memory, storage and I/O capacity. 1. [Multi-tenant & SaaS applications](https://www.citusdata.com/blog/2016/10/03/designing-your-saas-database-for-high-scalability):
- **References tables** are replicated to all nodes for joins and foreign keys from distributed tables and maximum read performance. Most B2B applications already have the notion of a tenant /
- **Distributed query engine** routes and parallelizes SELECT, DML, and other operations on distributed tables across the cluster. customer / account built into their data model. Citus allows you to scale out your
- **Columnar storage** compresses data, speeds up scans, and supports fast projections, both on regular and distributed tables. transactional relational database to 100K+ tenants with minimal changes to your
- **Query from any node** enables you to utilize the full capacity of your cluster for distributed queries application.
You can use these Citus superpowers to make your Postgres database scale-out ready on a single Citus node. Or you can build a large cluster capable of handling **high transaction throughputs**, especially in **multi-tenant apps**, run **fast analytical queries**, and process large amounts of **time series** or **IoT data** for **real-time analytics**. When your data size and volume grow, you can easily add more worker nodes to the cluster and rebalance the shards. 2. [Real-time analytics](https://www.citusdata.com/blog/2017/12/27/real-time-analytics-dashboards-with-citus/):
Citus enables ingesting large volumes of data and running
Our [SIGMOD '21](https://2021.sigmod.org/) paper [Citus: Distributed PostgreSQL for Data-Intensive Applications](https://doi.org/10.1145/3448016.3457551) gives a more detailed look into what Citus is, how it works, and why it works that way. analytical queries on that data in human real-time. Example applications include analytic
dashboards with sub-second response times and exploratory queries on unfolding events.
![Citus scales out from a single node](images/citus-scale-out.png)
3. High-throughput transactional workloads:
Since Citus is an extension to Postgres, you can use Citus with the latest Postgres versions. And Citus works seamlessly with the PostgreSQL tools and extensions you are already familiar with. By distributing your workload across a database cluster, Citus ensures low latency and high performance even with a large number of concurrent users and high volumes of transactions.
- [Why Citus?](#why-citus) To learn more, visit [citusdata.com](https://www.citusdata.com) and join
- [Getting Started](#getting-started) the [Citus slack](https://slack.citusdata.com/) to
- [Using Citus](#using-citus) stay on top of the latest developments.
- [Schema-based sharding](#schema-based-sharding)
- [Setting up with High Availability](#setting-up-with-high-availability) ### Getting started with Citus
- [Documentation](#documentation)
- [Architecture](#architecture) The fastest way to get up and running is to deploy Citus in the cloud. You can also setup a local Citus database cluster with Docker.
- [When to Use Citus](#when-to-use-citus)
- [Need Help?](#need-help) #### Hyperscale (Citus) on Azure Database for PostgreSQL
- [Contributing](#contributing)
- [Stay Connected](#stay-connected) Hyperscale (Citus) is a deployment option on Azure Database for PostgreSQL, a fully-managed database as a service. Hyperscale (Citus) employs the Citus open source extension so you can scale out across multiple nodes. To get started with Hyperscale (Citus), [learn more](https://www.citusdata.com/product/hyperscale-citus/) on the Citus website or use the [Hyperscale (Citus) Quickstart](https://docs.microsoft.com/en-us/azure/postgresql/quickstart-create-hyperscale-portal) in the Azure docs.
## Why Citus? #### Citus Cloud
Developers choose Citus for two reasons: Citus Cloud runs on top of AWS as a fully managed database as a service. You can provision a Citus Cloud account at [https://console.citusdata.com](https://console.citusdata.com/users/sign_up) and get started with just a few clicks.
1. Your application is outgrowing a single PostgreSQL node #### Local Citus Cluster
If the size and volume of your data increases over time, you may start seeing any number of performance and scalability problems on a single PostgreSQL node. For example: High CPU utilization and I/O wait times slow down your queries, SQL queries return out of memory errors, autovacuum cannot keep up and increases table bloat, etc. If you're looking to get started locally, you can follow the following steps to get up and running.
With Citus you can distribute and optionally compress your tables to always have enough memory, CPU, and I/O capacity to achieve high performance at scale. The distributed query engine can efficiently route transactions across the cluster, while parallelizing analytical queries and batch operations across all cores. Moreover, you can still use the PostgreSQL features and tools you know and love. 1. Install Docker Community Edition and Docker Compose
* Mac:
2. PostgreSQL can do things other systems cant 1. [Download](https://www.docker.com/community-edition#/download) and install Docker.
2. Start Docker by clicking on the applications icon.
There are many data processing systems that are built to scale out, but few have as many powerful capabilities as PostgreSQL, including: Advanced joins and subqueries, user-defined functions, update/delete/upsert, constraints and foreign keys, powerful extensions (e.g. PostGIS, HyperLogLog), many types of indexes, time-partitioning, and sophisticated JSON support. * Linux:
```bash
Citus makes PostgreSQLs most powerful capabilities work at any scale, allowing you to handle complex data-intensive workloads on a single database system. curl -sSL https://get.docker.com/ | sh
sudo usermod -aG docker $USER && exec sg docker newgrp `id -gn`
## Getting Started sudo systemctl start docker
The quickest way to get started with Citus is to use the [Azure Cosmos DB for PostgreSQL](https://learn.microsoft.com/azure/cosmos-db/postgresql/quickstart-create-portal) managed service in the cloud—or [set up Citus locally](https://docs.citusdata.com/en/stable/installation/single_node.html). sudo curl -sSL https://github.com/docker/compose/releases/download/1.11.2/docker-compose-`uname -s`-`uname -m` -o /usr/local/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose
### Citus Managed Service on Azure ```
The above version of Docker Compose is sufficient for running Citus, or you can install the [latest version](https://github.com/docker/compose/releases/latest).
You can get a fully-managed Citus cluster in minutes through the [Azure Cosmos DB for PostgreSQL portal](https://azure.microsoft.com/products/cosmos-db/). Azure will manage your backups, high availability through auto-failover, software updates, monitoring, and more for all of your servers. To get started Citus on Azure, use the [Azure Cosmos DB for PostgreSQL Quickstart](https://learn.microsoft.com/azure/cosmos-db/postgresql/quickstart-create-portal).
2. Pull and start the Docker images
### Running Citus using Docker ```bash
curl -sSLO https://raw.githubusercontent.com/citusdata/docker/master/docker-compose.yml
The smallest possible Citus cluster is a single PostgreSQL node with the Citus extension, which means you can try out Citus by running a single Docker container. docker-compose -p citus up -d
```
```bash
# run PostgreSQL with Citus on port 5500 3. Connect to the master database
docker run -d --name citus -p 5500:5432 -e POSTGRES_PASSWORD=mypassword citusdata/citus ```bash
docker exec -it citus_master psql -U postgres
# connect using psql within the Docker container ```
docker exec -it citus psql -U postgres
4. Follow the [first tutorial][tutorial] instructions
# or, connect using local psql 5. To shut the cluster down, run
psql -U postgres -d postgres -h localhost -p 5500
``` ```bash
docker-compose -p citus down
### Install Citus locally ```
If you already have a local PostgreSQL installation, the easiest way to install Citus is to use our packaging repo ### Talk to Contributors and Learn More
Install packages on Ubuntu / Debian: <table class="tg">
<col width="45%">
```bash <col width="65%">
curl https://install.citusdata.com/community/deb.sh > add-citus-repo.sh <tr>
sudo bash add-citus-repo.sh <td>Documentation</td>
sudo apt-get -y install postgresql-17-citus-13.0 <td>Try the <a
``` href="https://docs.citusdata.com/en/stable/tutorials/multi-tenant-tutorial.html">Citus
tutorial</a> for a hands-on introduction or <br/>the <a
Install packages on Red Hat: href="https://docs.citusdata.com">documentation</a> for
```bash a more comprehensive reference.</td>
curl https://install.citusdata.com/community/rpm.sh > add-citus-repo.sh </tr>
sudo bash add-citus-repo.sh <tr>
sudo yum install -y citus130_17 <td>Slack</td>
``` <td>Chat with us in our community <a
href="https://slack.citusdata.com">Slack channel</a>.</td>
To add Citus to your local PostgreSQL database, add the following to `postgresql.conf`: </tr>
<tr>
``` <td>Github Issues</td>
shared_preload_libraries = 'citus' <td>We track specific bug reports and feature requests on our <a
``` href="https://github.com/citusdata/citus/issues">project
issues</a>.</td>
After restarting PostgreSQL, connect using `psql` and run: </tr>
<tr>
```sql <td>Twitter</td>
CREATE EXTENSION citus; <td>Follow <a href="https://twitter.com/citusdata">@citusdata</a>
```` for general updates and PostgreSQL scaling tips.</td>
Youre now ready to get started and use Citus tables on a single node. </tr>
<tr>
### Install Citus on multiple nodes <td>Citus Blog</td>
<td>Read our <a href="https://www.citusdata.com/blog/">Citus Data Blog</a>
If you want to set up a multi-node cluster, you can also set up additional PostgreSQL nodes with the Citus extensions and add them to form a Citus cluster: for posts on Postgres, Citus, and scaling your database.</td>
</tr>
```sql </table>
-- before adding the first worker node, tell future worker nodes how to reach the coordinator
SELECT citus_set_coordinator_host('10.0.0.1', 5432); ### Contributing
-- add worker nodes Citus is built on and of open source, and we welcome your contributions.
SELECT citus_add_node('10.0.0.2', 5432); The [CONTRIBUTING.md](CONTRIBUTING.md) file explains how to get started
SELECT citus_add_node('10.0.0.3', 5432); developing the Citus extension itself and our code quality guidelines.
-- rebalance the shards over the new worker nodes ### Who is Using Citus?
SELECT rebalance_table_shards();
``` Citus is deployed in production by many customers, ranging from
technology start-ups to large enterprises. Here are some examples:
For more details, see our [documentation on how to set up a multi-node Citus cluster](https://docs.citusdata.com/en/stable/installation/multi_node.html) on various operating systems.
* [Algolia](https://www.algolia.com/) uses Citus to provide real-time analytics for over 1B searches per day. For faster insights, they also use TopN and HLL extensions. [User Story](https://www.citusdata.com/customers/algolia)
## Using Citus * [Heap](https://heapanalytics.com/) uses Citus to run dynamic
funnel, segmentation, and cohort queries across billions of users
Once you have your Citus cluster, you can start creating distributed tables, reference tables and use columnar storage. and has more than 700B events in their Citus database cluster. [Watch Video](https://www.youtube.com/watch?v=NVl9_6J1G60&list=PLixnExCn6lRpP10ZlpJwx6AuU3XIgNWpL)
* [Pex](https://pex.com/) uses Citus to ingest 80B data points per day and analyze that data in real-time. They use a 20+ node cluster on Google Cloud. [User Story](https://www.citusdata.com/customers/pex)
### Creating Distributed Tables * [MixRank](https://mixrank.com/) uses Citus to efficiently collect
and analyze vast amounts of data to allow inside B2B sales teams
The `create_distributed_table` UDF will transparently shard your table locally or across the worker nodes: to find new customers. [User Story](https://www.citusdata.com/customers/mixrank)
* [Agari](https://www.agari.com/) uses Citus to secure more than
```sql 85 percent of U.S. consumer emails on two 6-8 TB clusters. [User
CREATE TABLE events ( Story](https://www.citusdata.com/customers/agari)
device_id bigint, * [Copper (formerly ProsperWorks)](https://copper.com/) powers a cloud CRM service with Citus. [User Story](https://www.citusdata.com/customers/copper)
event_id bigserial,
event_time timestamptz default now(),
data jsonb not null, You can read more user stories about how they employ Citus to scale Postgres for both multi-tenant SaaS applications as well as real-time analytics dashboards [here](https://www.citusdata.com/customers/).
PRIMARY KEY (device_id, event_id)
);
-- distribute the events table across shards placed locally or on the worker nodes
SELECT create_distributed_table('events', 'device_id');
```
After this operation, queries for a specific device ID will be efficiently routed to a single worker node, while queries across device IDs will be parallelized across the cluster.
```sql
-- insert some events
INSERT INTO events (device_id, data)
SELECT s % 100, ('{"measurement":'||random()||'}')::jsonb FROM generate_series(1,1000000) s;
-- get the last 3 events for device 1, routed to a single node
SELECT * FROM events WHERE device_id = 1 ORDER BY event_time DESC, event_id DESC LIMIT 3;
┌───────────┬──────────┬───────────────────────────────┬───────────────────────────────────────┐
│ device_id │ event_id │ event_time │ data │
├───────────┼──────────┼───────────────────────────────┼───────────────────────────────────────┤
│ 1 │ 1999901 │ 2021-03-04 16:00:31.189963+00 │ {"measurement": 0.88722643925054} │
│ 1 │ 1999801 │ 2021-03-04 16:00:31.189963+00 │ {"measurement": 0.6512231304621992} │
│ 1 │ 1999701 │ 2021-03-04 16:00:31.189963+00 │ {"measurement": 0.019368766051897524} │
└───────────┴──────────┴───────────────────────────────┴───────────────────────────────────────┘
(3 rows)
Time: 4.588 ms
-- explain plan for a query that is parallelized across shards, which shows the plan for
-- a query one of the shards and how the aggregation across shards is done
EXPLAIN (VERBOSE ON) SELECT count(*) FROM events;
┌────────────────────────────────────────────────────────────────────────────────────┐
│ QUERY PLAN │
├────────────────────────────────────────────────────────────────────────────────────┤
│ Aggregate │
│ Output: COALESCE((pg_catalog.sum(remote_scan.count))::bigint, '0'::bigint) │
│ -> Custom Scan (Citus Adaptive) │
│ ... │
│ -> Task │
│ Query: SELECT count(*) AS count FROM events_102008 events WHERE true │
│ Node: host=localhost port=5432 dbname=postgres │
│ -> Aggregate │
│ -> Seq Scan on public.events_102008 events │
└────────────────────────────────────────────────────────────────────────────────────┘
```
### Creating Distributed Tables with Co-location
Distributed tables that have the same distribution column can be co-located to enable high performance distributed joins and foreign keys between distributed tables.
By default, distributed tables will be co-located based on the type of the distribution column, but you define co-location explicitly with the `colocate_with` argument in `create_distributed_table`.
```sql
CREATE TABLE devices (
device_id bigint primary key,
device_name text,
device_type_id int
);
CREATE INDEX ON devices (device_type_id);
-- co-locate the devices table with the events table
SELECT create_distributed_table('devices', 'device_id', colocate_with := 'events');
-- insert device metadata
INSERT INTO devices (device_id, device_name, device_type_id)
SELECT s, 'device-'||s, 55 FROM generate_series(0, 99) s;
-- optionally: make sure the application can only insert events for a known device
ALTER TABLE events ADD CONSTRAINT device_id_fk
FOREIGN KEY (device_id) REFERENCES devices (device_id);
-- get the average measurement across all devices of type 55, parallelized across shards
SELECT avg((data->>'measurement')::double precision)
FROM events JOIN devices USING (device_id)
WHERE device_type_id = 55;
┌────────────────────┐
│ avg │
├────────────────────┤
│ 0.5000191877513974 │
└────────────────────┘
(1 row)
Time: 209.961 ms
```
Co-location also helps you scale [INSERT..SELECT](https://docs.citusdata.com/en/stable/articles/aggregation.html), [stored procedures](https://www.citusdata.com/blog/2020/11/21/making-postgres-stored-procedures-9x-faster-in-citus/), and [distributed transactions](https://www.citusdata.com/blog/2017/06/02/scaling-complex-sql-transactions/).
### Distributing Tables without interrupting the application
Some of you already start with Postgres, and decide to distribute tables later on while your application using the tables. In that case, you want to avoid downtime for both reads and writes. `create_distributed_table` command block writes (e.g., DML commands) on the table until the command is finished. Instead, with `create_distributed_table_concurrently` command, your application can continue to read and write the data even during the command.
```sql
CREATE TABLE device_logs (
device_id bigint primary key,
log text
);
-- insert device logs
INSERT INTO device_logs (device_id, log)
SELECT s, 'device log:'||s FROM generate_series(0, 99) s;
-- convert device_logs into a distributed table without interrupting the application
SELECT create_distributed_table_concurrently('device_logs', 'device_id', colocate_with := 'devices');
-- get the count of the logs, parallelized across shards
SELECT count(*) FROM device_logs;
┌───────┐
│ count │
├───────┤
│ 100 │
└───────┘
(1 row)
Time: 48.734 ms
```
### Creating Reference Tables
When you need fast joins or foreign keys that do not include the distribution column, you can use `create_reference_table` to replicate a table across all nodes in the cluster.
```sql
CREATE TABLE device_types (
device_type_id int primary key,
device_type_name text not null unique
);
-- replicate the table across all nodes to enable foreign keys and joins on any column
SELECT create_reference_table('device_types');
-- insert a device type
INSERT INTO device_types (device_type_id, device_type_name) VALUES (55, 'laptop');
-- optionally: make sure the application can only insert devices with known types
ALTER TABLE devices ADD CONSTRAINT device_type_fk
FOREIGN KEY (device_type_id) REFERENCES device_types (device_type_id);
-- get the last 3 events for devices whose type name starts with laptop, parallelized across shards
SELECT device_id, event_time, data->>'measurement' AS value, device_name, device_type_name
FROM events JOIN devices USING (device_id) JOIN device_types USING (device_type_id)
WHERE device_type_name LIKE 'laptop%' ORDER BY event_time DESC LIMIT 3;
┌───────────┬───────────────────────────────┬─────────────────────┬─────────────┬──────────────────┐
│ device_id │ event_time │ value │ device_name │ device_type_name │
├───────────┼───────────────────────────────┼─────────────────────┼─────────────┼──────────────────┤
│ 60 │ 2021-03-04 16:00:31.189963+00 │ 0.28902084163415864 │ device-60 │ laptop │
│ 8 │ 2021-03-04 16:00:31.189963+00 │ 0.8723803076285073 │ device-8 │ laptop │
│ 20 │ 2021-03-04 16:00:31.189963+00 │ 0.8177634801548557 │ device-20 │ laptop │
└───────────┴───────────────────────────────┴─────────────────────┴─────────────┴──────────────────┘
(3 rows)
Time: 146.063 ms
```
Reference tables enable you to scale out complex data models and take full advantage of relational database features.
### Creating Tables with Columnar Storage
To use columnar storage in your PostgreSQL database, all you need to do is add `USING columnar` to your `CREATE TABLE` statements and your data will be automatically compressed using the columnar access method.
```sql
CREATE TABLE events_columnar (
device_id bigint,
event_id bigserial,
event_time timestamptz default now(),
data jsonb not null
)
USING columnar;
-- insert some data
INSERT INTO events_columnar (device_id, data)
SELECT d, '{"hello":"columnar"}' FROM generate_series(1,10000000) d;
-- create a row-based table to compare
CREATE TABLE events_row AS SELECT * FROM events_columnar;
-- see the huge size difference!
\d+
List of relations
┌────────┬──────────────────────────────┬──────────┬───────┬─────────────┬────────────┬─────────────┐
│ Schema │ Name │ Type │ Owner │ Persistence │ Size │ Description │
├────────┼──────────────────────────────┼──────────┼───────┼─────────────┼────────────┼─────────────┤
│ public │ events_columnar │ table │ marco │ permanent │ 25 MB │ │
│ public │ events_row │ table │ marco │ permanent │ 651 MB │ │
└────────┴──────────────────────────────┴──────────┴───────┴─────────────┴────────────┴─────────────┘
(2 rows)
```
You can use columnar storage by itself, or in a distributed table to combine the benefits of compression and the distributed query engine.
When using columnar storage, you should only load data in batch using `COPY` or `INSERT..SELECT` to achieve good compression. Update, delete, and foreign keys are currently unsupported on columnar tables. However, you can use partitioned tables in which newer partitions use row-based storage, and older partitions are compressed using columnar storage.
To learn more about columnar storage, check out the [columnar storage README](https://github.com/citusdata/citus/blob/master/src/backend/columnar/README.md).
## Schema-based sharding
Available since Citus 12.0, [schema-based sharding](https://docs.citusdata.com/en/stable/get_started/concepts.html#schema-based-sharding) is the shared database, separate schema model, the schema becomes the logical shard within the database. Multi-tenant apps can a use a schema per tenant to easily shard along the tenant dimension. Query changes are not required and the application usually only needs a small modification to set the proper search_path when switching tenants. Schema-based sharding is an ideal solution for microservices, and for ISVs deploying applications that cannot undergo the changes required to onboard row-based sharding.
### Creating distributed schemas
You can turn an existing schema into a distributed schema by calling `citus_schema_distribute`:
```sql
SELECT citus_schema_distribute('user_service');
```
Alternatively, you can set `citus.enable_schema_based_sharding` to have all newly created schemas be automatically converted into distributed schemas:
```sql
SET citus.enable_schema_based_sharding TO ON;
CREATE SCHEMA AUTHORIZATION user_service;
CREATE SCHEMA AUTHORIZATION time_service;
CREATE SCHEMA AUTHORIZATION ping_service;
```
### Running queries
Queries will be properly routed to schemas based on `search_path` or by explicitly using the schema name in the query.
For [microservices](https://docs.citusdata.com/en/stable/get_started/tutorial_microservices.html) you would create a USER per service matching the schema name, hence the default `search_path` would contain the schema name. When connected the user queries would be automatically routed and no changes to the microservice would be required.
```sql
CREATE USER user_service;
CREATE SCHEMA AUTHORIZATION user_service;
```
For typical multi-tenant applications, you would set the search path to the tenant schema name in your application:
```sql
SET search_path = tenant_name, public;
```
## Setting up with High Availability
One of the most popular high availability solutions for PostgreSQL, [Patroni 3.0](https://github.com/zalando/patroni), has [first class support for Citus 10.0 and above](https://patroni.readthedocs.io/en/latest/citus.html#citus), additionally since Citus 11.2 ships with improvements for smoother node switchover in Patroni.
An example of patronictl list output for the Citus cluster:
```bash
postgres@coord1:~$ patronictl list demo
```
```text
+ Citus cluster: demo ----------+--------------+---------+----+-----------+
| Group | Member | Host | Role | State | TL | Lag in MB |
+-------+---------+-------------+--------------+---------+----+-----------+
| 0 | coord1 | 172.27.0.10 | Replica | running | 1 | 0 |
| 0 | coord2 | 172.27.0.6 | Sync Standby | running | 1 | 0 |
| 0 | coord3 | 172.27.0.4 | Leader | running | 1 | |
| 1 | work1-1 | 172.27.0.8 | Sync Standby | running | 1 | 0 |
| 1 | work1-2 | 172.27.0.2 | Leader | running | 1 | |
| 2 | work2-1 | 172.27.0.5 | Sync Standby | running | 1 | 0 |
| 2 | work2-2 | 172.27.0.7 | Leader | running | 1 | |
+-------+---------+-------------+--------------+---------+----+-----------+
```
## Documentation
If youre ready to get started with Citus or want to know more, we recommend reading the [Citus open source documentation](https://docs.citusdata.com/en/stable/). Or, if you are using Citus on Azure, then the [Azure Cosmos DB for PostgreSQL](https://learn.microsoft.com/azure/cosmos-db/postgresql/introduction) is the place to start.
Our Citus docs contain comprehensive use case guides on how to build a [multi-tenant SaaS application](https://docs.citusdata.com/en/stable/use_cases/multi_tenant.html), [real-time analytics dashboard]( https://docs.citusdata.com/en/stable/use_cases/realtime_analytics.html), or work with [time series data](https://docs.citusdata.com/en/stable/use_cases/timeseries.html).
## Architecture
A Citus database cluster grows from a single PostgreSQL node into a cluster by adding worker nodes. In a Citus cluster, the original node to which the application connects is referred to as the coordinator node. The Citus coordinator contains both the metadata of distributed tables and reference tables, as well as regular (local) tables, sequences, and other database objects (e.g. foreign tables).
Data in distributed tables is stored in “shards”, which are actually just regular PostgreSQL tables on the worker nodes. When querying a distributed table on the coordinator node, Citus will send regular SQL queries to the worker nodes. That way, all the usual PostgreSQL optimizations and extensions can automatically be used with Citus.
![Citus architecture](images/citus-architecture.png)
When you send a query in which all (co-located) distributed tables have the same filter on the distribution column, Citus will automatically detect that and send the whole query to the worker node that stores the data. That way, arbitrarily complex queries are supported with minimal routing overhead, which is especially useful for scaling transactional workloads. If queries do not have a specific filter, each shard is queried in parallel, which is especially useful in analytical workloads. The Citus distributed executor is adaptive and is designed to handle both query types at the same time on the same system under high concurrency, which enables large-scale mixed workloads.
The schema and metadata of distributed tables and reference tables are automatically synchronized to all the nodes in the cluster. That way, you can connect to any node to run distributed queries. Schema changes and cluster administration still need to go through the coordinator.
Detailed descriptions of the implementation for Citus developers are provided in the [Citus Technical Documentation](src/backend/distributed/README.md).
## When to use Citus
Citus is uniquely capable of scaling both analytical and transactional workloads with up to petabytes of data. Use cases in which Citus is commonly used:
- **[Customer-facing analytics dashboards](http://docs.citusdata.com/en/stable/use_cases/realtime_analytics.html)**:
Citus enables you to build analytics dashboards that simultaneously ingest and process large amounts of data in the database and give sub-second response times even with a large number of concurrent users.
The advanced parallel, distributed query engine in Citus combined with PostgreSQL features such as [array types](https://www.postgresql.org/docs/current/arrays.html), [JSONB](https://www.postgresql.org/docs/current/datatype-json.html), [lateral joins](https://heap.io/blog/engineering/postgresqls-powerful-new-join-type-lateral), and extensions like [HyperLogLog](https://github.com/citusdata/postgresql-hll) and [TopN](https://github.com/citusdata/postgresql-topn) allow you to build responsive analytics dashboards no matter how many customers or how much data you have.
Example real-time analytics users: [Algolia](https://www.citusdata.com/customers/algolia)
- **[Time series data](http://docs.citusdata.com/en/stable/use_cases/timeseries.html)**:
Citus enables you to process and analyze very large amounts of time series data. The biggest Citus clusters store well over a petabyte of time series data and ingest terabytes per day.
Citus integrates seamlessly with [Postgres table partitioning](https://www.postgresql.org/docs/current/ddl-partitioning.html) and has [built-in functions for partitioning by time](https://www.citusdata.com/blog/2021/10/22/how-to-scale-postgres-for-time-series-data-with-citus/), which can speed up queries and writes on time series tables. You can take advantage of Cituss parallel, distributed query engine for fast analytical queries, and use the built-in *columnar storage* to compress old partitions.
Example users: [MixRank](https://www.citusdata.com/customers/mixrank)
- **[Software-as-a-service (SaaS) applications](http://docs.citusdata.com/en/stable/use_cases/multi_tenant.html)**:
SaaS and other multi-tenant applications need to be able to scale their database as the number of tenants/customers grows. Citus enables you to transparently shard a complex data model by the tenant dimension, so your database can grow along with your business.
By distributing tables along a tenant ID column and co-locating data for the same tenant, Citus can horizontally scale complex (tenant-scoped) queries, transactions, and foreign key graphs. Reference tables and distributed DDL commands make database management a breeze compared to manual sharding. On top of that, you have a built-in distributed query engine for doing cross-tenant analytics inside the database.
Example multi-tenant SaaS users: [Salesloft](https://fivetran.com/case-studies/replicating-sharded-databases-a-case-study-of-salesloft-citus-data-and-fivetran), [ConvertFlow](https://www.citusdata.com/customers/convertflow)
- **[Microservices](https://docs.citusdata.com/en/stable/get_started/tutorial_microservices.html)**: Citus supports schema based sharding, which allows distributing regular database schemas across many machines. This sharding methodology fits nicely with typical Microservices architecture, where storage is fully owned by the service hence cant share the same schema definition with other tenants. Citus allows distributing horizontally scalable state across services, solving one of the [main problems](https://stackoverflow.blog/2020/11/23/the-macro-problem-with-microservices/) of microservices.
- **Geospatial**:
Because of the powerful [PostGIS](https://postgis.net/) extension to Postgres that adds support for geographic objects into Postgres, many people run spatial/GIS applications on top of Postgres. And since spatial location information has become part of our daily life, well, there are more geospatial applications than ever. When your Postgres database needs to scale out to handle an increased workload, Citus is a good fit.
Example geospatial users: [Helsinki Regional Transportation Authority (HSL)](https://customers.microsoft.com/story/845146-transit-authority-improves-traffic-monitoring-with-azure-database-for-postgresql-hyperscale), [MobilityDB](https://www.citusdata.com/blog/2020/11/09/analyzing-gps-trajectories-at-scale-with-postgres-mobilitydb/).
## Need Help?
- **Slack**: Ask questions in our Citus community [Slack channel](https://slack.citusdata.com).
- **GitHub issues**: Please submit issues via [GitHub issues](https://github.com/citusdata/citus/issues).
- **Documentation**: Our [Citus docs](https://docs.citusdata.com ) have a wealth of resources, including sections on [query performance tuning](https://docs.citusdata.com/en/stable/performance/performance_tuning.html), [useful diagnostic queries](https://docs.citusdata.com/en/stable/admin_guide/diagnostic_queries.html), and [common error messages](https://docs.citusdata.com/en/stable/reference/common_errors.html).
- **Docs issues**: You can also submit documentation issues via [GitHub issues for our Citus docs](https://github.com/citusdata/citus_docs/issues).
- **Updates & Release Notes**: Learn about what's new in each Citus version on the [Citus Updates page](https://www.citusdata.com/updates/).
## Contributing
Citus is built on and of open source, and we welcome your contributions. The [CONTRIBUTING.md](CONTRIBUTING.md) file explains how to get started developing the Citus extension itself and our code quality guidelines.
## Code of Conduct
This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or
contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.
## Stay Connected
- **Twitter**: Follow us [@citusdata](https://twitter.com/citusdata) to track the latest posts & updates on whats happening.
- **Citus Blog**: Read our popular [Citus Open Source Blog](https://www.citusdata.com/blog/) for posts about PostgreSQL and Citus.
- **Citus Newsletter**: Subscribe to our monthly technical [Citus Newsletter](https://www.citusdata.com/join-newsletter) to get a curated collection of our favorite posts, videos, docs, talks, & other Postgres goodies.
- **Slack**: Our [Citus Public slack](https://slack.citusdata.com/) is a good way to stay connected, not just with us but with other Citus users.
- **Sister Blog**: Read the PostgreSQL posts on the [Azure Cosmos DB for PostgreSQL blog](https://devblogs.microsoft.com/cosmosdb/category/postgresql/) about our managed service on Azure.
- **Videos**: Check out this [YouTube playlist](https://www.youtube.com/playlist?list=PLixnExCn6lRq261O0iwo4ClYxHpM9qfVy) of some of our favorite Citus videos and demos. If you want to deep dive into how Citus extends PostgreSQL, you might want to check out Marco Slots talk at Carnegie Mellon titled [Citus: Distributed PostgreSQL as an Extension](https://youtu.be/X-aAgXJZRqM) that was part of Andy Pavlos Vaccination Database Talks series at CMUDB.
- **Our other Postgres projects**: Our team also works on other awesome PostgreSQL open source extensions & projects, including: [pg_cron](https://github.com/citusdata/pg_cron), [HyperLogLog](https://github.com/citusdata/postgresql-hll), [TopN](https://github.com/citusdata/postgresql-topn), [pg_auto_failover](https://github.com/citusdata/pg_auto_failover), [activerecord-multi-tenant](https://github.com/citusdata/activerecord-multi-tenant), and [django-multitenant](https://github.com/citusdata/django-multitenant).
___ ___
Copyright © Citus Data, Inc. Copyright © Citus Data, Inc.
[faq]: https://www.citusdata.com/frequently-asked-questions
[tutorial]: https://docs.citusdata.com/en/stable/tutorials/multi-tenant-tutorial.html

View File

@ -1,41 +0,0 @@
<!-- BEGIN MICROSOFT SECURITY.MD V0.0.8 BLOCK -->
## Security
Microsoft takes the security of our software products and services seriously, which includes all source code repositories managed through our GitHub organizations, which include [Microsoft](https://github.com/microsoft), [Azure](https://github.com/Azure), [DotNet](https://github.com/dotnet), [AspNet](https://github.com/aspnet), [Xamarin](https://github.com/xamarin), and [our GitHub organizations](https://opensource.microsoft.com/).
If you believe you have found a security vulnerability in any Microsoft-owned repository that meets [Microsoft's definition of a security vulnerability](https://aka.ms/opensource/security/definition), please report it to us as described below.
## Reporting Security Issues
**Please do not report security vulnerabilities through public GitHub issues.**
Instead, please report them to the Microsoft Security Response Center (MSRC) at [https://msrc.microsoft.com/create-report](https://aka.ms/opensource/security/create-report).
If you prefer to submit without logging in, send email to [secure@microsoft.com](mailto:secure@microsoft.com). If possible, encrypt your message with our PGP key; please download it from the [Microsoft Security Response Center PGP Key page](https://aka.ms/opensource/security/pgpkey).
You should receive a response within 24 hours. If for some reason you do not, please follow up via email to ensure we received your original message. Additional information can be found at [microsoft.com/msrc](https://aka.ms/opensource/security/msrc).
Please include the requested information listed below (as much as you can provide) to help us better understand the nature and scope of the possible issue:
* Type of issue (e.g. buffer overflow, SQL injection, cross-site scripting, etc.)
* Full paths of source file(s) related to the manifestation of the issue
* The location of the affected source code (tag/branch/commit or direct URL)
* Any special configuration required to reproduce the issue
* Step-by-step instructions to reproduce the issue
* Proof-of-concept or exploit code (if possible)
* Impact of the issue, including how an attacker might exploit the issue
This information will help us triage your report more quickly.
If you are reporting for a bug bounty, more complete reports can contribute to a higher bounty award. Please visit our [Microsoft Bug Bounty Program](https://aka.ms/opensource/security/bounty) page for more details about our active programs.
## Preferred Languages
We prefer all communications to be in English.
## Policy
Microsoft follows the principle of [Coordinated Vulnerability Disclosure](https://aka.ms/opensource/security/cvd).
<!-- END MICROSOFT SECURITY.MD BLOCK -->

View File

@ -1,160 +0,0 @@
# Coding style
The existing code-style in our code-base is not super consistent. There are multiple reasons for that. One big reason is because our code-base is relatively old and our standards have changed over time. The second big reason is that our style-guide is different from style-guide of Postgres and some code is copied from Postgres source code and is slightly modified. The below rules are for new code. If you're changing existing code that uses a different style, use your best judgement to decide if you use the rules here or if you match the existing style.
## Using citus_indent
CI pipeline will automatically reject any PRs which do not follow our coding
conventions. The easiest way to ensure your PR adheres to those conventions is
to use the [citus_indent](https://github.com/citusdata/tools/tree/develop/uncrustify)
tool. This tool uses `uncrustify` under the hood.
```bash
# Uncrustify changes the way it formats code every release a bit. To make sure
# everyone formats consistently we use version 0.68.1:
curl -L https://github.com/uncrustify/uncrustify/archive/uncrustify-0.68.1.tar.gz | tar xz
cd uncrustify-uncrustify-0.68.1/
mkdir build
cd build
cmake ..
make -j5
sudo make install
cd ../..
git clone https://github.com/citusdata/tools.git
cd tools
make uncrustify/.install
```
Once you've done that, you can run the `make reindent` command from the top
directory to recursively check and correct the style of any source files in the
current directory. Under the hood, `make reindent` will run `citus_indent` and
some other style corrections for you.
You can also run the following in the directory of this repository to
automatically format all the files that you have changed before committing:
```bash
cat > .git/hooks/pre-commit << __EOF__
#!/bin/bash
citus_indent --check --diff || { citus_indent --diff; exit 1; }
__EOF__
chmod +x .git/hooks/pre-commit
```
## Other rules we follow that citus_indent does not enforce
* We almost always use **CamelCase**, when naming functions, variables etc., **not snake_case**.
* We also have the habits of using a **lowerCamelCase** for some variables named from their type or from their function name, as shown in the examples:
```c
bool IsCitusExtensionLoaded = false;
bool
IsAlterTableRenameStmt(RenameStmt *renameStmt)
{
AlterTableCmd *alterTableCommand = NULL;
..
..
bool isAlterTableRenameStmt = false;
..
}
```
* We **start functions with a comment**:
```c
/*
* MyNiceFunction <something in present simple tense, e.g., processes / returns / checks / takes X as input / does Y> ..
* <some more nice words> ..
* <some more nice words> ..
*/
<static?> <return type>
MyNiceFunction(..)
{
..
..
}
```
* `#includes` needs to be sorted based on below ordering and then alphabetically and we should not include what we don't need in a file:
* System includes (eg. #include<...>)
* Postgres.h (eg. #include "postgres.h")
* Toplevel imports from postgres, not contained in a directory (eg. #include "miscadmin.h")
* General postgres includes (eg . #include "nodes/...")
* Toplevel citus includes, not contained in a directory (eg. #include "citus_verion.h")
* Columnar includes (eg. #include "columnar/...")
* Distributed includes (eg. #include "distributed/...")
* Comments:
```c
/* single line comments start with a lower-case */
/*
* We start multi-line comments with a capital letter
* and keep adding a star to the beginning of each line
* until we close the comment with a star and a slash.
*/
```
* Order of function implementations and their declarations in a file:
We define static functions after the functions that call them. For example:
```c
#include<..>
#include<..>
..
..
typedef struct
{
..
..
} MyNiceStruct;
..
..
PG_FUNCTION_INFO_V1(my_nice_udf1);
PG_FUNCTION_INFO_V1(my_nice_udf2);
..
..
// .. somewhere on top of the file …
static void MyNiceStaticlyDeclaredFunction1(…);
static void MyNiceStaticlyDeclaredFunction2(…);
..
..
void
MyNiceFunctionExternedViaHeaderFile(..)
{
..
..
MyNiceStaticlyDeclaredFunction1(..);
..
..
MyNiceStaticlyDeclaredFunction2(..);
..
}
..
..
// we define this first because it's called by MyNiceFunctionExternedViaHeaderFile()
// before MyNiceStaticlyDeclaredFunction2()
static void
MyNiceStaticlyDeclaredFunction1(…)
{
}
..
..
// then we define this
static void
MyNiceStaticlyDeclaredFunction2(…)
{
}
```

View File

@ -1,6 +1,6 @@
#!/bin/bash #!/bin/bash
# #
# autogen.sh converts configure.ac to configure and creates # autogen.sh converts configure.in to configure and creates
# citus_config.h.in. The resuting resulting files are checked into # citus_config.h.in. The resuting resulting files are checked into
# the SCM, to avoid everyone needing autoconf installed. # the SCM, to avoid everyone needing autoconf installed.

View File

@ -1,47 +0,0 @@
{
"Registrations": [
{
"Component": {
"Type": "git",
"git": {
"RepositoryUrl": "https://github.com/intel/safestringlib",
"CommitHash": "245c4b8cff1d2e7338b7f3a82828fc8e72b29549"
}
},
"DevelopmentDependency": false
},
{
"Component": {
"Type": "git",
"git": {
"RepositoryUrl": "https://github.com/postgres/postgres",
"CommitHash": "29be9983a64c011eac0b9ee29895cce71e15ea77"
}
},
"license": "PostgreSQL",
"licenseDetail": [
"Portions Copyright (c) 1996-2010, The PostgreSQL Global Development Group",
"",
"Portions Copyright (c) 1994, The Regents of the University of California",
"",
"Permission to use, copy, modify, and distribute this software and its documentation for ",
"any purpose, without fee, and without a written agreement is hereby granted, provided ",
"that the above copyright notice and this paragraph and the following two paragraphs appear ",
"in all copies.",
"",
"IN NO EVENT SHALL THE UNIVERSITY OF CALIFORNIA BE LIABLE TO ANY PARTY FOR DIRECT, INDIRECT, SPECIAL, ",
"INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING LOST PROFITS, ARISING OUT OF THE USE OF THIS ",
"SOFTWARE AND ITS DOCUMENTATION, EVEN IF THE UNIVERSITY OF CALIFORNIA HAS BEEN ADVISED OF THE ",
"POSSIBILITY OF SUCH DAMAGE.",
"",
"THE UNIVERSITY OF CALIFORNIA SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT LIMITED TO, ",
"THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE SOFTWARE PROVIDED ",
"HEREUNDER IS ON AN \"AS IS\" BASIS, AND THE UNIVERSITY OF CALIFORNIA HAS NO OBLIGATIONS TO PROVIDE ",
"MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS."
],
"version": "0.0.1",
"DevelopmentDependency": false
}
]
}

View File

@ -283,14 +283,6 @@ actually run in CI. This is most commonly forgotten for newly added CI tests
that the developer only ran locally. It also checks that all CI scripts have a that the developer only ran locally. It also checks that all CI scripts have a
section in this `README.md` file and that they include `ci/ci_helpers.sh`. section in this `README.md` file and that they include `ci/ci_helpers.sh`.
## `check_migration_files.sh`
A branch that touches a set of upgrade scripts is also expected to touch
corresponding downgrade scripts as well. If this script fails, read the output
and make sure you update the downgrade scripts in the printed list. If you
really don't need a downgrade to run any SQL. You can write a comment in the
file explaining why a downgrade step is not necessary.
## `disallow_c_comments_in_migrations.sh` ## `disallow_c_comments_in_migrations.sh`
We do not use C-style comments in migration files as the stripped We do not use C-style comments in migration files as the stripped
@ -301,18 +293,6 @@ Instead use SQL type comments, i.e:
``` ```
See [#3115](https://github.com/citusdata/citus/pull/3115) for more info. See [#3115](https://github.com/citusdata/citus/pull/3115) for more info.
## `disallow_hash_comments_in_spec_files.sh`
We do not use comments starting with # in spec files because it creates errors
from C preprocessor that expects directives after this character.
Instead use C type comments, i.e:
```
// this is a single line comment
/*
* this is a multi line comment
*/
```
## `disallow_long_changelog_entries.sh` ## `disallow_long_changelog_entries.sh`
@ -366,37 +346,3 @@ foo = 2
#endif #endif
``` ```
This was deemed to be error prone and not worth the effort. This was deemed to be error prone and not worth the effort.
## `fix_gitignore.sh`
This script checks and fixes issues with `.gitignore` rules:
1. Makes sure we do not commit any generated files that should be ignored. If there is an
ignored file in the git tree, the user is expected to review the files that are removed
from the git tree and commit them.
## `check_gucs_are_alphabetically_sorted.sh`
This script checks the order of the GUCs defined in `shared_library_init.c`.
To solve this failure, please check `shared_library_init.c` and make sure that the GUC
definitions are in alphabetical order.
## `print_stack_trace.sh`
This script prints stack traces for failed tests, if they left core files.
## `sort_and_group_includes.sh`
This script checks and fixes issues with include grouping and sorting in C files.
Includes are grouped in the following groups:
- System includes (eg. `#include <math>`)
- Postgres.h include (eg. `#include "postgres.h"`)
- Toplevel postgres includes (includes not in a directory eg. `#include "miscadmin.h`)
- Postgres includes in a directory (eg. `#include "catalog/pg_type.h"`)
- Toplevel citus includes (includes not in a directory eg. `#include "pg_version_constants.h"`)
- Columnar includes (eg. `#include "columnar/columnar.h"`)
- Distributed includes (eg. `#include "distributed/maintenanced.h"`)
Within every group the include lines are sorted alphabetically.

View File

@ -15,6 +15,9 @@ PG_MAJOR=${PG_MAJOR:?please provide the postgres major version}
codename=${VERSION#*(} codename=${VERSION#*(}
codename=${codename%)*} codename=${codename%)*}
# get project from argument
project="${CIRCLE_PROJECT_REPONAME}"
# we'll do everything with absolute paths # we'll do everything with absolute paths
basedir="$(pwd)" basedir="$(pwd)"
@ -25,12 +28,12 @@ build_ext() {
pg_major="$1" pg_major="$1"
builddir="${basedir}/build-${pg_major}" builddir="${basedir}/build-${pg_major}"
echo "Beginning build for PostgreSQL ${pg_major}..." >&2 echo "Beginning build of ${project} for PostgreSQL ${pg_major}..." >&2
# do everything in a subdirectory to avoid clutter in current directory # do everything in a subdirectory to avoid clutter in current directory
mkdir -p "${builddir}" && cd "${builddir}" mkdir -p "${builddir}" && cd "${builddir}"
CFLAGS=-Werror "${basedir}/configure" PG_CONFIG="/usr/lib/postgresql/${pg_major}/bin/pg_config" --enable-coverage --with-security-flags CFLAGS=-Werror "${basedir}/configure" PG_CONFIG="/usr/lib/postgresql/${pg_major}/bin/pg_config" --enable-coverage
installdir="${builddir}/install" installdir="${builddir}/install"
make -j$(nproc) && mkdir -p "${installdir}" && { make DESTDIR="${installdir}" install-all || make DESTDIR="${installdir}" install ; } make -j$(nproc) && mkdir -p "${installdir}" && { make DESTDIR="${installdir}" install-all || make DESTDIR="${installdir}" install ; }

View File

@ -14,8 +14,8 @@ ci_scripts=$(
grep -v -E '^(ci_helpers.sh|fix_style.sh)$' grep -v -E '^(ci_helpers.sh|fix_style.sh)$'
) )
for script in $ci_scripts; do for script in $ci_scripts; do
if ! grep "\\bci/$script\\b" -r .github > /dev/null; then if ! grep "\\bci/$script\\b" .circleci/config.yml > /dev/null; then
echo "ERROR: CI script with name \"$script\" is not actually used in .github folder" echo "ERROR: CI script with name \"$script\" is not actually used in .circleci/config.yml"
exit 1 exit 1
fi fi
if ! grep "^## \`$script\`\$" ci/README.md > /dev/null; then if ! grep "^## \`$script\`\$" ci/README.md > /dev/null; then

View File

@ -7,12 +7,13 @@ source ci/ci_helpers.sh
cd src/test/regress cd src/test/regress
# 1. Find all *.sql and *.spec files in the sql, and spec directories # 1. Find all *.sql *.spec and *.source files in the sql, spec and input
# directories
# 2. Strip the extension and the directory # 2. Strip the extension and the directory
# 3. Ignore names that end with .include, those files are meant to be in an C # 3. Ignore names that end with .include, those files are meant to be in an C
# preprocessor #include statement. They should not be in schedules. # preprocessor #include statement. They should not be in schedules.
test_names=$( test_names=$(
find sql spec -iname "*.sql" -o -iname "*.spec" | find sql spec input -iname "*.sql" -o -iname "*.spec" -o -iname "*.source" |
sed -E 's#^\w+/([^/]+)\.[^.]+$#\1#g' | sed -E 's#^\w+/([^/]+)\.[^.]+$#\1#g' |
grep -v '.include$' grep -v '.include$'
) )

88
ci/check_enterprise_merge.sh Executable file
View File

@ -0,0 +1,88 @@
#!/bin/bash
# Testing this script locally requires you to set the following environment
# variables:
# CIRCLE_BRANCH, GIT_USERNAME and GIT_TOKEN
# fail if trying to reference a variable that is not set.
set -u
# exit immediately if a command fails
set -e
# Fail on pipe failures
set -o pipefail
PR_BRANCH="${CIRCLE_BRANCH}"
ENTERPRISE_REMOTE="https://${GIT_USERNAME}:${GIT_TOKEN}@github.com/citusdata/citus-enterprise"
# shellcheck disable=SC1091
source ci/ci_helpers.sh
# List executed commands. This is done so debugging this script is easier when
# it fails. It's explicitly done after git remote add so username and password
# are not shown in CI output (even though it's also filtered out by CircleCI)
set -x
check_compile () {
echo "INFO: checking if merged code can be compiled"
./configure --without-libcurl
make -j10
}
# Clone current git repo (which should be community) to a temporary working
# directory and go there
GIT_DIR_ROOT="$(git rev-parse --show-toplevel)"
TMP_GIT_DIR="$(mktemp --directory -t citus-merge-check.XXXXXXXXX)"
git clone "$GIT_DIR_ROOT" "$TMP_GIT_DIR"
cd "$TMP_GIT_DIR"
# Fails in CI without this
git config user.email "citus-bot@microsoft.com"
git config user.name "citus bot"
# Disable "set -x" temporarily, because $ENTERPRISE_REMOTE contains passwords
{ set +x ; } 2> /dev/null
git remote add enterprise "$ENTERPRISE_REMOTE"
set -x
git remote set-url --push enterprise no-pushing
# Fetch enterprise-master
git fetch enterprise enterprise-master
git checkout "enterprise/enterprise-master"
if git merge --no-commit "origin/$PR_BRANCH"; then
echo "INFO: community PR branch could be merged into enterprise-master"
# check that we can compile after the merge
if check_compile; then
exit 0
fi
echo "WARN: Failed to compile after community PR branch was merged into enterprise"
fi
# undo partial merge
git merge --abort
if ! git fetch enterprise "$PR_BRANCH" ; then
echo "ERROR: enterprise/$PR_BRANCH was not found and community PR branch could not be merged into enterprise-master"
exit 1
fi
# Show the top commit of the enterprise PR branch to make debugging easier
git log -n 1 "enterprise/$PR_BRANCH"
# Check that this branch contains the top commit of the current community PR
# branch. If it does not it means it's not up to date with the current PR, so
# the enterprise branch should be updated.
if ! git merge-base --is-ancestor "origin/$PR_BRANCH" "enterprise/$PR_BRANCH" ; then
echo "ERROR: enterprise/$PR_BRANCH is not up to date with community PR branch"
exit 1
fi
# Now check if we can merge the enterprise PR into enterprise-master without
# issues.
git merge --no-commit "enterprise/$PR_BRANCH"
# check that we can compile after the merge
check_compile

View File

@ -1,25 +0,0 @@
#!/bin/bash
set -euo pipefail
# shellcheck disable=SC1091
source ci/ci_helpers.sh
# Find the line that exactly matches "RegisterCitusConfigVariables(void)" in
# shared_library_init.c. grep command returns something like
# "934:RegisterCitusConfigVariables(void)" and we extract the line number
# with cut.
RegisterCitusConfigVariables_begin_linenumber=$(grep -n "^RegisterCitusConfigVariables(void)$" src/backend/distributed/shared_library_init.c | cut -d: -f1)
# Consider the lines starting from $RegisterCitusConfigVariables_begin_linenumber,
# grep the first line that starts with "}" and extract the line number with cut
# as in the previous step.
RegisterCitusConfigVariables_length=$(tail -n +$RegisterCitusConfigVariables_begin_linenumber src/backend/distributed/shared_library_init.c | grep -n -m 1 "^}$" | cut -d: -f1)
# extract the function definition of RegisterCitusConfigVariables into a temp file
tail -n +$RegisterCitusConfigVariables_begin_linenumber src/backend/distributed/shared_library_init.c | head -n $(($RegisterCitusConfigVariables_length)) > RegisterCitusConfigVariables_func_def.out
# extract citus gucs in the form of <tab><tab>"citus.X"
grep -P "^[\t][\t]\"citus\.[a-zA-Z_0-9]+\"" RegisterCitusConfigVariables_func_def.out > gucs.out
LC_COLLATE=C sort -c gucs.out
rm gucs.out
rm RegisterCitusConfigVariables_func_def.out

View File

@ -1,33 +0,0 @@
#! /bin/bash
set -euo pipefail
# shellcheck disable=SC1091
source ci/ci_helpers.sh
# This file checks for the existence of downgrade scripts for every upgrade script that is changed in the branch.
# create list of migration files for upgrades
upgrade_files=$(git diff --name-only origin/main | { grep "src/backend/distributed/sql/citus--.*sql" || exit 0 ; })
downgrade_files=$(git diff --name-only origin/main | { grep "src/backend/distributed/sql/downgrades/citus--.*sql" || exit 0 ; })
ret_value=0
for file in $upgrade_files
do
# There should always be 2 matches, and no need to avoid splitting here
# shellcheck disable=SC2207
versions=($(grep --only-matching --extended-regexp "[0-9]+\.[0-9]+[-.][0-9]+" <<< "$file"))
from_version=${versions[0]};
to_version=${versions[1]};
downgrade_migration_file="src/backend/distributed/sql/downgrades/citus--$to_version--$from_version.sql"
# check for the existence of migration scripts
if [[ $(grep --line-regexp --count "$downgrade_migration_file" <<< "$downgrade_files") == 0 ]]
then
echo "$file is updated, but $downgrade_migration_file is not updated in branch"
ret_value=1
fi
done
exit $ret_value;

View File

@ -4,7 +4,7 @@ set -euo pipefail
# shellcheck disable=SC1091 # shellcheck disable=SC1091
source ci/ci_helpers.sh source ci/ci_helpers.sh
for udf_dir in src/backend/distributed/sql/udfs/* src/backend/columnar/sql/udfs/*; do for udf_dir in src/backend/distributed/sql/udfs/*; do
# We want to find the last snapshotted sql file, to make sure it's the same # We want to find the last snapshotted sql file, to make sure it's the same
# as "latest.sql". This is done by: # as "latest.sql". This is done by:
# 1. Getting the filenames in the UDF directory (using find instead of ls, to keep shellcheck happy) # 1. Getting the filenames in the UDF directory (using find instead of ls, to keep shellcheck happy)

View File

@ -1,10 +1,6 @@
#! /bin/bash #! /bin/bash
set -euo pipefail set -euo pipefail
# make ** match all directories and subdirectories
shopt -s globstar
# shellcheck disable=SC1091 # shellcheck disable=SC1091
source ci/ci_helpers.sh source ci/ci_helpers.sh
@ -16,17 +12,17 @@ source ci/ci_helpers.sh
# and reusing them if needed. GNU sed unfortunately does not support lookaround assertions. # and reusing them if needed. GNU sed unfortunately does not support lookaround assertions.
# /* -> -- # /* -> --
find src/backend/{distributed,columnar}/sql/**/*.sql -print0 | xargs -0 sed -i 's#/\*#--#g' find src/backend/distributed/sql/*.sql -print0 | xargs -0 sed -i 's#/\*#--#g'
# */ -> `` (empty string) # */ -> `` (empty string)
# remove all whitespaces immediately before the match # remove all whitespaces immediately before the match
find src/backend/{distributed,columnar}/sql/**/*.sql -print0 | xargs -0 sed -i 's#\s*\*/\s*##g' find src/backend/distributed/sql/*.sql -print0 | xargs -0 sed -i 's#\s*\*/\s*##g'
# * -> -- # * -> --
# keep the indentation # keep the indentation
# allow only whitespaces before the match # allow only whitespaces before the match
find src/backend/{distributed,columnar}/sql/**/*.sql -print0 | xargs -0 sed -i 's#^\(\s*\) \*#\1--#g' find src/backend/distributed/sql/*.sql -print0 | xargs -0 sed -i 's#^\(\s*\) \*#\1--#g'
# // -> -- # // -> --
# do not touch http:// or similar by allowing only whitespaces before // # do not touch http:// or similar by allowing only whitespaces before //
find src/backend/{distributed,columnar}/sql/**/*.sql -print0 | xargs -0 sed -i 's#^\(\s*\)//#\1--#g' find src/backend/distributed/sql/*.sql -print0 | xargs -0 sed -i 's#^\(\s*\)//#\1--#g'

View File

@ -1,12 +0,0 @@
#! /bin/bash
set -euo pipefail
# shellcheck disable=SC1091
source ci/ci_helpers.sh
# We do not use comments starting with # in spec files because it creates warnings from
# preprocessor that expects directives after this character.
# `# ` -> `-- `
find src/test/regress/spec/*.spec -print0 | xargs -0 sed -i 's!# !// !g'

View File

@ -1,19 +0,0 @@
#! /bin/bash
set -euo pipefail
# shellcheck disable=SC1091
source ci/ci_helpers.sh
# Remove all the ignored files from git tree, and error out
# find all ignored files in git tree, and use quotation marks to prevent word splitting on filenames with spaces in them
# NOTE: Option --cached is needed to avoid a bug in git ls-files command.
ignored_lines_in_git_tree=$(git ls-files --ignored --cached --exclude-standard | sed 's/.*/"&"/')
if [[ -n $ignored_lines_in_git_tree ]]
then
echo "Ignored files should not be in git tree!"
echo "${ignored_lines_in_git_tree}"
echo "Removing these files from git tree, please review and commit"
echo "$ignored_lines_in_git_tree" | xargs git rm -r --cached
exit 1
fi

View File

@ -9,14 +9,8 @@ cidir="${0%/*}"
cd ${cidir}/.. cd ${cidir}/..
citus_indent . --quiet citus_indent . --quiet
black . --quiet
isort . --quiet
ci/editorconfig.sh ci/editorconfig.sh
ci/remove_useless_declarations.sh ci/remove_useless_declarations.sh
ci/disallow_c_comments_in_migrations.sh ci/disallow_c_comments_in_migrations.sh
ci/disallow_hash_comments_in_spec_files.sh
ci/disallow_long_changelog_entries.sh ci/disallow_long_changelog_entries.sh
ci/normalize_expected.sh ci/normalize_expected.sh
ci/fix_gitignore.sh
ci/print_stack_trace.sh
ci/sort_and_group_includes.sh

View File

@ -1,157 +0,0 @@
#!/usr/bin/env python3
"""
easy command line to run against all citus-style checked files:
$ git ls-files \
| git check-attr --stdin citus-style \
| grep 'citus-style: set' \
| awk '{print $1}' \
| cut -d':' -f1 \
| xargs -n1 ./ci/include_grouping.py
"""
import collections
import os
import sys
def main(args):
if len(args) < 2:
print("Usage: include_grouping.py <file>")
return
file = args[1]
if not os.path.isfile(file):
sys.exit(f"File '{file}' does not exist")
with open(file, "r") as in_file:
with open(file + ".tmp", "w") as out_file:
includes = []
skipped_lines = []
# This calls print_sorted_includes on a set of consecutive #include lines.
# This implicitly keeps separation of any #include lines that are contained in
# an #ifdef, because it will order the #include lines inside and after the
# #ifdef completely separately.
for line in in_file:
# if a line starts with #include we don't want to print it yet, instead we
# want to collect all consecutive #include lines
if line.startswith("#include"):
includes.append(line)
skipped_lines = []
continue
# if we have collected any #include lines, we want to print them sorted
# before printing the current line. However, if the current line is empty
# we want to perform a lookahead to see if the next line is an #include.
# To maintain any separation between #include lines and their subsequent
# lines we keep track of all lines we have skipped inbetween.
if len(includes) > 0:
if len(line.strip()) == 0:
skipped_lines.append(line)
continue
# we have includes that need to be grouped before printing the current
# line.
print_sorted_includes(includes, file=out_file)
includes = []
# print any skipped lines
print("".join(skipped_lines), end="", file=out_file)
skipped_lines = []
print(line, end="", file=out_file)
# move out_file to file
os.rename(file + ".tmp", file)
def print_sorted_includes(includes, file=sys.stdout):
default_group_key = 1
groups = collections.defaultdict(set)
# define the groups that we separate correctly. The matchers are tested in the order
# of their priority field. The first matcher that matches the include is used to
# assign the include to a group.
# The groups are printed in the order of their group_key.
matchers = [
{
"name": "system includes",
"matcher": lambda x: x.startswith("<"),
"group_key": -2,
"priority": 0,
},
{
"name": "toplevel postgres includes",
"matcher": lambda x: "/" not in x,
"group_key": 0,
"priority": 9,
},
{
"name": "postgres.h",
"matcher": lambda x: x.strip() in ['"postgres.h"'],
"group_key": -1,
"priority": -1,
},
{
"name": "toplevel citus inlcudes",
"matcher": lambda x: x.strip()
in [
'"citus_version.h"',
'"pg_version_compat.h"',
'"pg_version_constants.h"',
],
"group_key": 3,
"priority": 0,
},
{
"name": "columnar includes",
"matcher": lambda x: x.startswith('"columnar/'),
"group_key": 4,
"priority": 1,
},
{
"name": "distributed includes",
"matcher": lambda x: x.startswith('"distributed/'),
"group_key": 5,
"priority": 1,
},
]
matchers.sort(key=lambda x: x["priority"])
# throughout our codebase we have some includes where either postgres or citus
# includes are wrongfully included with the syntax for system includes. Before we
# try to match those we will change the <> to "" to make them match our system. This
# will also rewrite the include to the correct syntax.
common_system_include_error_prefixes = ["<nodes/", "<distributed/"]
# assign every include to a group
for include in includes:
# extract the group key from the include
include_content = include.split(" ")[1]
# fix common system includes which are secretly postgres or citus includes
for common_prefix in common_system_include_error_prefixes:
if include_content.startswith(common_prefix):
include_content = '"' + include_content.strip()[1:-1] + '"'
include = include.split(" ")[0] + " " + include_content + "\n"
break
group_key = default_group_key
for matcher in matchers:
if matcher["matcher"](include_content):
group_key = matcher["group_key"]
break
groups[group_key].add(include)
# iterate over all groups in the natural order of its keys
for i, group in enumerate(sorted(groups.items())):
if i > 0:
print(file=file)
includes = group[1]
print("".join(sorted(includes)), end="", file=file)
if __name__ == "__main__":
main(sys.argv)

View File

@ -1,25 +0,0 @@
#!/bin/bash
set -euo pipefail
# shellcheck disable=SC1091
source ci/ci_helpers.sh
# find all core files
core_files=( $(find . -type f -regex .*core.*\d*.*postgres) )
if [ ${#core_files[@]} -gt 0 ]; then
# print stack traces for the core files
for core_file in "${core_files[@]}"
do
# set print frame-arguments all: show all scalars + structures in the frame
# set print pretty on: show structures in indented mode
# set print addr off: do not show pointer address
# thread apply all bt full: show stack traces for all threads
gdb --batch \
-ex "set print frame-arguments all" \
-ex "set print pretty on" \
-ex "set print addr off" \
-ex "thread apply all bt full" \
postgres "${core_file}"
done
fi

View File

@ -1,12 +0,0 @@
#!/bin/bash
set -euo pipefail
# shellcheck disable=SC1091
source ci/ci_helpers.sh
git ls-files \
| git check-attr --stdin citus-style \
| grep 'citus-style: set' \
| awk '{print $1}' \
| cut -d':' -f1 \
| xargs -n1 ./ci/include_grouping.py

View File

@ -10,7 +10,7 @@
# argument (other than "yes/no"), etc. # argument (other than "yes/no"), etc.
# #
# The point of this implementation is to reduce code size and # The point of this implementation is to reduce code size and
# redundancy in configure.ac and to improve robustness and consistency # redundancy in configure.in and to improve robustness and consistency
# in the option evaluation code. # in the option evaluation code.

250
configure vendored
View File

@ -1,6 +1,6 @@
#! /bin/sh #! /bin/sh
# Guess values for system-dependent variables and create Makefiles. # Guess values for system-dependent variables and create Makefiles.
# Generated by GNU Autoconf 2.69 for Citus 13.2devel. # Generated by GNU Autoconf 2.69 for Citus 9.5.10.
# #
# #
# Copyright (C) 1992-1996, 1998-2012 Free Software Foundation, Inc. # Copyright (C) 1992-1996, 1998-2012 Free Software Foundation, Inc.
@ -579,8 +579,8 @@ MAKEFLAGS=
# Identity of this package. # Identity of this package.
PACKAGE_NAME='Citus' PACKAGE_NAME='Citus'
PACKAGE_TARNAME='citus' PACKAGE_TARNAME='citus'
PACKAGE_VERSION='13.2devel' PACKAGE_VERSION='9.5.10'
PACKAGE_STRING='Citus 13.2devel' PACKAGE_STRING='Citus 9.5.10'
PACKAGE_BUGREPORT='' PACKAGE_BUGREPORT=''
PACKAGE_URL='' PACKAGE_URL=''
@ -631,8 +631,6 @@ CITUS_BITCODE_CFLAGS
CITUS_CFLAGS CITUS_CFLAGS
GIT_BIN GIT_BIN
with_security_flags with_security_flags
with_zstd
with_lz4
EGREP EGREP
GREP GREP
CPP CPP
@ -644,7 +642,6 @@ LDFLAGS
CFLAGS CFLAGS
CC CC
vpath_build vpath_build
with_pg_version_check
PATH PATH
PG_CONFIG PG_CONFIG
FLEX FLEX
@ -693,12 +690,9 @@ ac_subst_files=''
ac_user_opts=' ac_user_opts='
enable_option_checking enable_option_checking
with_extra_version with_extra_version
with_pg_version_check
enable_coverage enable_coverage
with_libcurl with_libcurl
with_reports_hostname with_reports_hostname
with_lz4
with_zstd
with_security_flags with_security_flags
' '
ac_precious_vars='build_alias ac_precious_vars='build_alias
@ -1262,7 +1256,7 @@ if test "$ac_init_help" = "long"; then
# Omit some internal or obsolete options to make the list less imposing. # Omit some internal or obsolete options to make the list less imposing.
# This message is too long to be a string in the A/UX 3.1 sh. # This message is too long to be a string in the A/UX 3.1 sh.
cat <<_ACEOF cat <<_ACEOF
\`configure' configures Citus 13.2devel to adapt to many kinds of systems. \`configure' configures Citus 9.5.10 to adapt to many kinds of systems.
Usage: $0 [OPTION]... [VAR=VALUE]... Usage: $0 [OPTION]... [VAR=VALUE]...
@ -1324,7 +1318,7 @@ fi
if test -n "$ac_init_help"; then if test -n "$ac_init_help"; then
case $ac_init_help in case $ac_init_help in
short | recursive ) echo "Configuration of Citus 13.2devel:";; short | recursive ) echo "Configuration of Citus 9.5.10:";;
esac esac
cat <<\_ACEOF cat <<\_ACEOF
@ -1339,15 +1333,11 @@ Optional Packages:
--without-PACKAGE do not use PACKAGE (same as --with-PACKAGE=no) --without-PACKAGE do not use PACKAGE (same as --with-PACKAGE=no)
--with-extra-version=STRING --with-extra-version=STRING
append STRING to version append STRING to version
--without-pg-version-check
do not check postgres version during configure
--without-libcurl do not use libcurl for anonymous statistics --without-libcurl do not use libcurl for anonymous statistics
collection collection
--with-reports-hostname=HOSTNAME --with-reports-hostname=HOSTNAME
Use HOSTNAME as hostname for statistics collection Use HOSTNAME as hostname for statistics collection
and update checks and update checks
--without-lz4 do not use lz4
--without-zstd do not use zstd
--with-security-flags use security flags --with-security-flags use security flags
Some influential environment variables: Some influential environment variables:
@ -1429,7 +1419,7 @@ fi
test -n "$ac_init_help" && exit $ac_status test -n "$ac_init_help" && exit $ac_status
if $ac_init_version; then if $ac_init_version; then
cat <<\_ACEOF cat <<\_ACEOF
Citus configure 13.2devel Citus configure 9.5.10
generated by GNU Autoconf 2.69 generated by GNU Autoconf 2.69
Copyright (C) 2012 Free Software Foundation, Inc. Copyright (C) 2012 Free Software Foundation, Inc.
@ -1912,7 +1902,7 @@ cat >config.log <<_ACEOF
This file contains any messages produced by compilers while This file contains any messages produced by compilers while
running configure, to aid debugging if configure makes a mistake. running configure, to aid debugging if configure makes a mistake.
It was created by Citus $as_me 13.2devel, which was It was created by Citus $as_me 9.5.10, which was
generated by GNU Autoconf 2.69. Invocation command line was generated by GNU Autoconf 2.69. Invocation command line was
$ $0 $@ $ $0 $@
@ -2559,36 +2549,7 @@ if test -z "$version_num"; then
as_fn_error $? "Could not detect PostgreSQL version from pg_config." "$LINENO" 5 as_fn_error $? "Could not detect PostgreSQL version from pg_config." "$LINENO" 5
fi fi
if test "$version_num" != '11' -a "$version_num" != '12' -a "$version_num" != '13'; then
# Check whether --with-pg-version-check was given.
if test "${with_pg_version_check+set}" = set; then :
withval=$with_pg_version_check;
case $withval in
yes)
:
;;
no)
:
;;
*)
as_fn_error $? "no argument expected for --with-pg-version-check option" "$LINENO" 5
;;
esac
else
with_pg_version_check=yes
fi
if test "$with_pg_version_check" = no; then
{ $as_echo "$as_me:${as_lineno-$LINENO}: building against PostgreSQL $version_num (skipped compatibility check)" >&5
$as_echo "$as_me: building against PostgreSQL $version_num (skipped compatibility check)" >&6;}
elif test "$version_num" != '15' -a "$version_num" != '16' -a "$version_num" != '17'; then
as_fn_error $? "Citus is not compatible with the detected PostgreSQL version ${version_num}." "$LINENO" 5 as_fn_error $? "Citus is not compatible with the detected PostgreSQL version ${version_num}." "$LINENO" 5
else else
{ $as_echo "$as_me:${as_lineno-$LINENO}: building against PostgreSQL $version_num" >&5 { $as_echo "$as_me:${as_lineno-$LINENO}: building against PostgreSQL $version_num" >&5
@ -4565,197 +4526,6 @@ cat >>confdefs.h <<_ACEOF
_ACEOF _ACEOF
#
# LZ4
#
# Check whether --with-lz4 was given.
if test "${with_lz4+set}" = set; then :
withval=$with_lz4;
case $withval in
yes)
$as_echo "#define HAVE_CITUS_LIBLZ4 1" >>confdefs.h
;;
no)
:
;;
*)
as_fn_error $? "no argument expected for --with-lz4 option" "$LINENO" 5
;;
esac
else
with_lz4=yes
$as_echo "#define HAVE_CITUS_LIBLZ4 1" >>confdefs.h
fi
if test "$with_lz4" = yes; then
{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for LZ4_compress_default in -llz4" >&5
$as_echo_n "checking for LZ4_compress_default in -llz4... " >&6; }
if ${ac_cv_lib_lz4_LZ4_compress_default+:} false; then :
$as_echo_n "(cached) " >&6
else
ac_check_lib_save_LIBS=$LIBS
LIBS="-llz4 $LIBS"
cat confdefs.h - <<_ACEOF >conftest.$ac_ext
/* end confdefs.h. */
/* Override any GCC internal prototype to avoid an error.
Use char because int might match the return type of a GCC
builtin and then its argument prototype would still apply. */
#ifdef __cplusplus
extern "C"
#endif
char LZ4_compress_default ();
int
main ()
{
return LZ4_compress_default ();
;
return 0;
}
_ACEOF
if ac_fn_c_try_link "$LINENO"; then :
ac_cv_lib_lz4_LZ4_compress_default=yes
else
ac_cv_lib_lz4_LZ4_compress_default=no
fi
rm -f core conftest.err conftest.$ac_objext \
conftest$ac_exeext conftest.$ac_ext
LIBS=$ac_check_lib_save_LIBS
fi
{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_lz4_LZ4_compress_default" >&5
$as_echo "$ac_cv_lib_lz4_LZ4_compress_default" >&6; }
if test "x$ac_cv_lib_lz4_LZ4_compress_default" = xyes; then :
cat >>confdefs.h <<_ACEOF
#define HAVE_LIBLZ4 1
_ACEOF
LIBS="-llz4 $LIBS"
else
as_fn_error $? "lz4 library not found
If you have lz4 installed, see config.log for details on the
failure. It is possible the compiler isn't looking in the proper directory.
Use --without-lz4 to disable lz4 support." "$LINENO" 5
fi
ac_fn_c_check_header_mongrel "$LINENO" "lz4.h" "ac_cv_header_lz4_h" "$ac_includes_default"
if test "x$ac_cv_header_lz4_h" = xyes; then :
else
as_fn_error $? "lz4 header not found
If you have lz4 already installed, see config.log for details on the
failure. It is possible the compiler isn't looking in the proper directory.
Use --without-lz4 to disable lz4 support." "$LINENO" 5
fi
fi
#
# ZSTD
#
# Check whether --with-zstd was given.
if test "${with_zstd+set}" = set; then :
withval=$with_zstd;
case $withval in
yes)
:
;;
no)
:
;;
*)
as_fn_error $? "no argument expected for --with-zstd option" "$LINENO" 5
;;
esac
else
with_zstd=yes
fi
if test "$with_zstd" = yes; then
{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for ZSTD_decompress in -lzstd" >&5
$as_echo_n "checking for ZSTD_decompress in -lzstd... " >&6; }
if ${ac_cv_lib_zstd_ZSTD_decompress+:} false; then :
$as_echo_n "(cached) " >&6
else
ac_check_lib_save_LIBS=$LIBS
LIBS="-lzstd $LIBS"
cat confdefs.h - <<_ACEOF >conftest.$ac_ext
/* end confdefs.h. */
/* Override any GCC internal prototype to avoid an error.
Use char because int might match the return type of a GCC
builtin and then its argument prototype would still apply. */
#ifdef __cplusplus
extern "C"
#endif
char ZSTD_decompress ();
int
main ()
{
return ZSTD_decompress ();
;
return 0;
}
_ACEOF
if ac_fn_c_try_link "$LINENO"; then :
ac_cv_lib_zstd_ZSTD_decompress=yes
else
ac_cv_lib_zstd_ZSTD_decompress=no
fi
rm -f core conftest.err conftest.$ac_objext \
conftest$ac_exeext conftest.$ac_ext
LIBS=$ac_check_lib_save_LIBS
fi
{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_zstd_ZSTD_decompress" >&5
$as_echo "$ac_cv_lib_zstd_ZSTD_decompress" >&6; }
if test "x$ac_cv_lib_zstd_ZSTD_decompress" = xyes; then :
cat >>confdefs.h <<_ACEOF
#define HAVE_LIBZSTD 1
_ACEOF
LIBS="-lzstd $LIBS"
else
as_fn_error $? "zstd library not found
If you have zstd installed, see config.log for details on the
failure. It is possible the compiler isn't looking in the proper directory.
Use --without-zstd to disable zstd support." "$LINENO" 5
fi
ac_fn_c_check_header_mongrel "$LINENO" "zstd.h" "ac_cv_header_zstd_h" "$ac_includes_default"
if test "x$ac_cv_header_zstd_h" = xyes; then :
else
as_fn_error $? "zstd header not found
If you have zstd already installed, see config.log for details on the
failure. It is possible the compiler isn't looking in the proper directory.
Use --without-zstd to disable zstd support." "$LINENO" 5
fi
fi
@ -5393,7 +5163,7 @@ cat >>$CONFIG_STATUS <<\_ACEOF || ac_write_fail=1
# report actual input values of CONFIG_FILES etc. instead of their # report actual input values of CONFIG_FILES etc. instead of their
# values after options handling. # values after options handling.
ac_log=" ac_log="
This file was extended by Citus $as_me 13.2devel, which was This file was extended by Citus $as_me 9.5.10, which was
generated by GNU Autoconf 2.69. Invocation command line was generated by GNU Autoconf 2.69. Invocation command line was
CONFIG_FILES = $CONFIG_FILES CONFIG_FILES = $CONFIG_FILES
@ -5455,7 +5225,7 @@ _ACEOF
cat >>$CONFIG_STATUS <<_ACEOF || ac_write_fail=1 cat >>$CONFIG_STATUS <<_ACEOF || ac_write_fail=1
ac_cs_config="`$as_echo "$ac_configure_args" | sed 's/^ //; s/[\\""\`\$]/\\\\&/g'`" ac_cs_config="`$as_echo "$ac_configure_args" | sed 's/^ //; s/[\\""\`\$]/\\\\&/g'`"
ac_cs_version="\\ ac_cs_version="\\
Citus config.status 13.2devel Citus config.status 9.5.10
configured by $0, generated by GNU Autoconf 2.69, configured by $0, generated by GNU Autoconf 2.69,
with options \\"\$ac_cs_config\\" with options \\"\$ac_cs_config\\"

View File

@ -5,7 +5,7 @@
# everyone needing autoconf installed, the resulting files are checked # everyone needing autoconf installed, the resulting files are checked
# into the SCM. # into the SCM.
AC_INIT([Citus], [13.2devel]) AC_INIT([Citus], [9.5.10])
AC_COPYRIGHT([Copyright (c) Citus Data, Inc.]) AC_COPYRIGHT([Copyright (c) Citus Data, Inc.])
# we'll need sed and awk for some of the version commands # we'll need sed and awk for some of the version commands
@ -74,13 +74,7 @@ if test -z "$version_num"; then
AC_MSG_ERROR([Could not detect PostgreSQL version from pg_config.]) AC_MSG_ERROR([Could not detect PostgreSQL version from pg_config.])
fi fi
PGAC_ARG_BOOL(with, pg-version-check, yes, if test "$version_num" != '11' -a "$version_num" != '12' -a "$version_num" != '13'; then
[do not check postgres version during configure])
AC_SUBST(with_pg_version_check)
if test "$with_pg_version_check" = no; then
AC_MSG_NOTICE([building against PostgreSQL $version_num (skipped compatibility check)])
elif test "$version_num" != '15' -a "$version_num" != '16' -a "$version_num" != '17'; then
AC_MSG_ERROR([Citus is not compatible with the detected PostgreSQL version ${version_num}.]) AC_MSG_ERROR([Citus is not compatible with the detected PostgreSQL version ${version_num}.])
else else
AC_MSG_NOTICE([building against PostgreSQL $version_num]) AC_MSG_NOTICE([building against PostgreSQL $version_num])
@ -222,46 +216,6 @@ PGAC_ARG_REQ(with, reports-hostname, [HOSTNAME],
AC_DEFINE_UNQUOTED(REPORTS_BASE_URL, "$REPORTS_BASE_URL", AC_DEFINE_UNQUOTED(REPORTS_BASE_URL, "$REPORTS_BASE_URL",
[Base URL for statistics collection and update checks]) [Base URL for statistics collection and update checks])
#
# LZ4
#
PGAC_ARG_BOOL(with, lz4, yes,
[do not use lz4],
[AC_DEFINE([HAVE_CITUS_LIBLZ4], 1, [Define to 1 to build with lz4 support. (--with-lz4)])])
AC_SUBST(with_lz4)
if test "$with_lz4" = yes; then
AC_CHECK_LIB(lz4, LZ4_compress_default, [],
[AC_MSG_ERROR([lz4 library not found
If you have lz4 installed, see config.log for details on the
failure. It is possible the compiler isn't looking in the proper directory.
Use --without-lz4 to disable lz4 support.])])
AC_CHECK_HEADER(lz4.h, [], [AC_MSG_ERROR([lz4 header not found
If you have lz4 already installed, see config.log for details on the
failure. It is possible the compiler isn't looking in the proper directory.
Use --without-lz4 to disable lz4 support.])])
fi
#
# ZSTD
#
PGAC_ARG_BOOL(with, zstd, yes,
[do not use zstd])
AC_SUBST(with_zstd)
if test "$with_zstd" = yes; then
AC_CHECK_LIB(zstd, ZSTD_decompress, [],
[AC_MSG_ERROR([zstd library not found
If you have zstd installed, see config.log for details on the
failure. It is possible the compiler isn't looking in the proper directory.
Use --without-zstd to disable zstd support.])])
AC_CHECK_HEADER(zstd.h, [], [AC_MSG_ERROR([zstd header not found
If you have zstd already installed, see config.log for details on the
failure. It is possible the compiler isn't looking in the proper directory.
Use --without-zstd to disable zstd support.])])
fi
PGAC_ARG_BOOL(with, security-flags, no, PGAC_ARG_BOOL(with, security-flags, no,
[use security flags]) [use security flags])
AC_SUBST(with_security_flags) AC_SUBST(with_security_flags)

BIN
github-banner.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 4.0 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 95 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 94 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 22 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 18 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 22 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 102 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 29 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 69 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 111 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 12 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 168 KiB

View File

@ -1,40 +0,0 @@
[tool.isort]
profile = 'black'
[tool.black]
include = '(src/test/regress/bin/diff-filter|\.pyi?|\.ipynb)$'
[tool.pytest.ini_options]
addopts = [
"--import-mode=importlib",
"--showlocals",
"--tb=short",
]
pythonpath = 'src/test/regress/citus_tests'
asyncio_mode = 'auto'
# Make test discovery quicker from the root dir of the repo
testpaths = ['src/test/regress/citus_tests/test']
# Make test discovery quicker from other directories than root directory
norecursedirs = [
'*.egg',
'.*',
'build',
'venv',
'ci',
'vendor',
'backend',
'bin',
'include',
'tmp_*',
'results',
'expected',
'sql',
'spec',
'data',
'__pycache__',
]
# Don't find files with test at the end such as run_test.py
python_files = ['test_*.py']

View File

@ -1,25 +0,0 @@
* whitespace=space-before-tab,trailing-space
*.[chly] whitespace=space-before-tab,trailing-space,indent-with-non-tab,tabwidth=4
*.dsl whitespace=space-before-tab,trailing-space,tab-in-indent
*.patch -whitespace
*.pl whitespace=space-before-tab,trailing-space,tabwidth=4
*.po whitespace=space-before-tab,trailing-space,tab-in-indent,-blank-at-eof
*.sgml whitespace=space-before-tab,trailing-space,tab-in-indent,-blank-at-eol
*.x[ms]l whitespace=space-before-tab,trailing-space,tab-in-indent
# Avoid confusing ASCII underlines with leftover merge conflict markers
README conflict-marker-size=32
README.* conflict-marker-size=32
# Certain data files that contain special whitespace, and other special cases
*.data -whitespace
# Test output files that contain extra whitespace
*.out -whitespace
# These files are maintained or generated elsewhere. We take them as is.
configure -whitespace
# all C files (implementation and header) use our style...
*.[ch] citus-style

View File

@ -1,3 +0,0 @@
# The directory used to store columnar sql files after pre-processing them
# with 'cpp' in build-time, see src/backend/columnar/Makefile.
/build/

View File

@ -1,60 +0,0 @@
citus_subdir = src/backend/columnar
citus_top_builddir = ../../..
safestringlib_srcdir = $(citus_abs_top_srcdir)/vendor/safestringlib
SUBDIRS = . safeclib
SUBDIRS +=
ENSURE_SUBDIRS_EXIST := $(shell mkdir -p $(SUBDIRS))
OBJS += \
$(patsubst $(citus_abs_srcdir)/%.c,%.o,$(foreach dir,$(SUBDIRS), $(sort $(wildcard $(citus_abs_srcdir)/$(dir)/*.c))))
MODULE_big = citus_columnar
EXTENSION = citus_columnar
template_sql_files = $(patsubst $(citus_abs_srcdir)/%,%,$(wildcard $(citus_abs_srcdir)/sql/*.sql))
template_downgrade_sql_files = $(patsubst $(citus_abs_srcdir)/sql/downgrades/%,%,$(wildcard $(citus_abs_srcdir)/sql/downgrades/*.sql))
generated_sql_files = $(patsubst %,$(citus_abs_srcdir)/build/%,$(template_sql_files))
generated_downgrade_sql_files += $(patsubst %,$(citus_abs_srcdir)/build/sql/%,$(template_downgrade_sql_files))
DATA_built = $(generated_sql_files)
PG_CPPFLAGS += -I$(libpq_srcdir) -I$(safestringlib_srcdir)/include
include $(citus_top_builddir)/Makefile.global
SQL_DEPDIR=.deps/sql
SQL_BUILDDIR=build/sql
$(generated_sql_files): $(citus_abs_srcdir)/build/%: %
@mkdir -p $(citus_abs_srcdir)/$(SQL_DEPDIR) $(citus_abs_srcdir)/$(SQL_BUILDDIR)
@# -MF is used to store dependency files(.Po) in another directory for separation
@# -MT is used to change the target of the rule emitted by dependency generation.
@# -P is used to inhibit generation of linemarkers in the output from the preprocessor.
@# -undef is used to not predefine any system-specific or GCC-specific macros.
@# `man cpp` for further information
cd $(citus_abs_srcdir) && cpp -undef -w -P -MMD -MP -MF$(SQL_DEPDIR)/$(*F).Po -MT$@ $< > $@
$(generated_downgrade_sql_files): $(citus_abs_srcdir)/build/sql/%: sql/downgrades/%
@mkdir -p $(citus_abs_srcdir)/$(SQL_DEPDIR) $(citus_abs_srcdir)/$(SQL_BUILDDIR)
@# -MF is used to store dependency files(.Po) in another directory for separation
@# -MT is used to change the target of the rule emitted by dependency generation.
@# -P is used to inhibit generation of linemarkers in the output from the preprocessor.
@# -undef is used to not predefine any system-specific or GCC-specific macros.
@# `man cpp` for further information
cd $(citus_abs_srcdir) && cpp -undef -w -P -MMD -MP -MF$(SQL_DEPDIR)/$(*F).Po -MT$@ $< > $@
.PHONY: install install-downgrades install-all
cleanup-before-install:
rm -f $(DESTDIR)$(datadir)/$(datamoduledir)/citus_columnar.control
rm -f $(DESTDIR)$(datadir)/$(datamoduledir)/columnar--*
rm -f $(DESTDIR)$(datadir)/$(datamoduledir)/citus_columnar--*
install: cleanup-before-install
# install and install-downgrades should be run sequentially
install-all: install
$(MAKE) install-downgrades
install-downgrades: $(generated_downgrade_sql_files)
$(INSTALL_DATA) $(generated_downgrade_sql_files) '$(DESTDIR)$(datadir)/$(datamoduledir)/'

View File

@ -1,321 +0,0 @@
# Introduction
Citus Columnar offers a per-table option for columnar storage to
reduce IO requirements though compression and projection pushdown.
# Design Trade-Offs
Existing PostgreSQL row tables work well for OLTP:
* Support `UPDATE`/`DELETE` efficiently
* Efficient single-tuple lookups
The Citus Columnar tables work best for analytic or DW workloads:
* Compression
* Doesn't read unnecessary columns
* Efficient `VACUUM`
# Next generation of cstore_fdw
Citus Columnar is the next generation of
[cstore_fdw](https://github.com/citusdata/cstore_fdw/).
Benefits of Citus Columnar over cstore_fdw:
* Citus Columnar is based on the [Table Access Method
API](https://www.postgresql.org/docs/current/tableam.html), which
allows it to behave exactly like an ordinary heap (row) table for
most operations.
* Supports Write-Ahead Log (WAL).
* Supports ``ROLLBACK``.
* Supports physical replication.
* Supports recovery, including Point-In-Time Restore (PITR).
* Supports ``pg_dump`` and ``pg_upgrade`` without the need for special
options or extra steps.
* Better user experience; simple ``USING``clause.
* Supports more features that work on ordinary heap (row) tables.
# Limitations
* Append-only (no ``UPDATE``/``DELETE`` support)
* No space reclamation (e.g. rolled-back transactions may still
consume disk space)
* No bitmap index scans
* No tidscans
* No sample scans
* No TOAST support (large values supported inline)
* No support for [``ON
CONFLICT``](https://www.postgresql.org/docs/12/sql-insert.html#SQL-ON-CONFLICT)
statements (except ``DO NOTHING`` actions with no target specified).
* No support for tuple locks (``SELECT ... FOR SHARE``, ``SELECT
... FOR UPDATE``)
* No support for serializable isolation level
* Support for PostgreSQL server versions 12+ only
* No support for foreign keys
* No support for logical decoding
* No support for intra-node parallel scans
* No support for ``AFTER ... FOR EACH ROW`` triggers
* No `UNLOGGED` columnar tables
Future iterations will incrementally lift the limitations listed above.
# User Experience
Create a Columnar table by specifying ``USING columnar`` when creating
the table.
```sql
CREATE TABLE my_columnar_table
(
id INT,
i1 INT,
i2 INT8,
n NUMERIC,
t TEXT
) USING columnar;
```
Insert data into the table and read from it like normal (subject to
the limitations listed above).
To see internal statistics about the table, use ``VACUUM
VERBOSE``. Note that ``VACUUM`` (without ``FULL``) is much faster on a
columnar table, because it scans only the metadata, and not the actual
data.
## Options
Set options using:
```sql
ALTER TABLE my_columnar_table SET
(columnar.compression = none, columnar.stripe_row_limit = 10000);
```
The following options are available:
* **columnar.compression**: `[none|pglz|zstd|lz4|lz4hc]` - set the compression type
for _newly-inserted_ data. Existing data will not be
recompressed/decompressed. The default value is `zstd` (if support
has been compiled in).
* **columnar.compression_level**: ``<integer>`` - Sets compression level. Valid
settings are from 1 through 19. If the compression method does not
support the level chosen, the closest level will be selected
instead.
* **columnar.stripe_row_limit**: ``<integer>`` - the maximum number of rows per
stripe for _newly-inserted_ data. Existing stripes of data will not
be changed and may have more rows than this maximum value. The
default value is `150000`.
* **columnar.chunk_group_row_limit**: ``<integer>`` - the maximum number of rows per
chunk for _newly-inserted_ data. Existing chunks of data will not be
changed and may have more rows than this maximum value. The default
value is `10000`.
View options for all tables with:
```sql
SELECT * FROM columnar.options;
```
You can also adjust options with a `SET` command of one of the
following GUCs:
* `columnar.compression`
* `columnar.compression_level`
* `columnar.stripe_row_limit`
* `columnar.chunk_group_row_limit`
GUCs only affect newly-created *tables*, not any newly-created
*stripes* on an existing table.
## Partitioning
Columnar tables can be used as partitions; and a partitioned table may
be made up of any combination of row and columnar partitions.
```sql
CREATE TABLE parent(ts timestamptz, i int, n numeric, s text)
PARTITION BY RANGE (ts);
-- columnar partition
CREATE TABLE p0 PARTITION OF parent
FOR VALUES FROM ('2020-01-01') TO ('2020-02-01')
USING COLUMNAR;
-- columnar partition
CREATE TABLE p1 PARTITION OF parent
FOR VALUES FROM ('2020-02-01') TO ('2020-03-01')
USING COLUMNAR;
-- row partition
CREATE TABLE p2 PARTITION OF parent
FOR VALUES FROM ('2020-03-01') TO ('2020-04-01');
INSERT INTO parent VALUES ('2020-01-15', 10, 100, 'one thousand'); -- columnar
INSERT INTO parent VALUES ('2020-02-15', 20, 200, 'two thousand'); -- columnar
INSERT INTO parent VALUES ('2020-03-15', 30, 300, 'three thousand'); -- row
```
When performing operations on a partitioned table with a mix of row
and columnar partitions, take note of the following behaviors for
operations that are supported on row tables but not columnar
(e.g. ``UPDATE``, ``DELETE``, tuple locks, etc.):
* If the operation is targeted at a specific row partition
(e.g. ``UPDATE p2 SET i = i + 1``), it will succeed; if targeted at
a specified columnar partition (e.g. ``UPDATE p1 SET i = i + 1``),
it will fail.
* If the operation is targeted at the partitioned table and has a
``WHERE`` clause that excludes all columnar partitions
(e.g. ``UPDATE parent SET i = i + 1 WHERE ts = '2020-03-15'``), it
will succeed.
* If the operation is targeted at the partitioned table, but does not
exclude all columnar partitions, it will fail; even if the actual
data to be updated only affects row tables (e.g. ``UPDATE parent SET
i = i + 1 WHERE n = 300``).
Note that Citus Columnar supports `btree` and `hash `indexes (and
the constraints requiring them) but does not support `gist`, `gin`,
`spgist` and `brin` indexes.
For this reason, if some partitions are columnar and if the index is
not supported by Citus Columnar, then it's impossible to create indexes
on the partitioned (parent) table directly. In that case, you need to
create the index on the individual row partitions. Similarly for the
constraints that require indexes, e.g.:
```sql
CREATE INDEX p2_ts_idx ON p2 (ts);
CREATE UNIQUE INDEX p2_i_unique ON p2 (i);
ALTER TABLE p2 ADD UNIQUE (n);
```
## Converting Between Row and Columnar
Note: ensure that you understand any advanced features that may be
used with the table before converting it (e.g. row-level security,
storage options, constraints, inheritance, etc.), and ensure that they
are reproduced in the new table or partition appropriately. ``LIKE``,
used below, is a shorthand that works only in simple cases.
```sql
CREATE TABLE my_table(i INT8 DEFAULT '7');
INSERT INTO my_table VALUES(1);
-- convert to columnar
SELECT alter_table_set_access_method('my_table', 'columnar');
-- back to row
SELECT alter_table_set_access_method('my_table', 'heap');
```
# Performance Microbenchmark
*Important*: This microbenchmark is not intended to represent any real
workload. Compression ratios, and therefore performance, will depend
heavily on the specific workload. This is only for the purpose of
illustrating a "columnar friendly" contrived workload that showcases
the benefits of columnar.
## Schema
```sql
CREATE TABLE perf_row(
id INT8,
ts TIMESTAMPTZ,
customer_id INT8,
vendor_id INT8,
name TEXT,
description TEXT,
value NUMERIC,
quantity INT4
);
CREATE TABLE perf_columnar(LIKE perf_row) USING COLUMNAR;
```
## Data
```sql
CREATE OR REPLACE FUNCTION random_words(n INT4) RETURNS TEXT LANGUAGE sql AS $$
WITH words(w) AS (
SELECT ARRAY['zero','one','two','three','four','five','six','seven','eight','nine','ten']
),
random (word) AS (
SELECT w[(random()*array_length(w, 1))::int] FROM generate_series(1, $1) AS i, words
)
SELECT string_agg(word, ' ') FROM random;
$$;
```
```sql
INSERT INTO perf_row
SELECT
g, -- id
'2020-01-01'::timestamptz + ('1 minute'::interval * g), -- ts
(random() * 1000000)::INT4, -- customer_id
(random() * 100)::INT4, -- vendor_id
random_words(7), -- name
random_words(100), -- description
(random() * 100000)::INT4/100.0, -- value
(random() * 100)::INT4 -- quantity
FROM generate_series(1,75000000) g;
INSERT INTO perf_columnar SELECT * FROM perf_row;
```
## Compression Ratio
```
=> SELECT pg_total_relation_size('perf_row')::numeric/pg_total_relation_size('perf_columnar') AS compression_ratio;
compression_ratio
--------------------
5.3958044063457513
(1 row)
```
The overall compression ratio of columnar table, versus the same data
stored with row storage, is **5.4X**.
```
=> VACUUM VERBOSE perf_columnar;
INFO: statistics for "perf_columnar":
storage id: 10000000000
total file size: 8761368576, total data size: 8734266196
compression rate: 5.01x
total row count: 75000000, stripe count: 500, average rows per stripe: 150000
chunk count: 60000, containing data for dropped columns: 0, zstd compressed: 60000
```
``VACUUM VERBOSE`` reports a smaller compression ratio, because it
only averages the compression ratio of the individual chunks, and does
not account for the metadata savings of the columnar format.
## System
* Azure VM: Standard D2s v3 (2 vcpus, 8 GiB memory)
* Linux (ubuntu 18.04)
* Data Drive: Standard HDD (512GB, 500 IOPS Max, 60 MB/s Max)
* PostgreSQL 13 (``--with-llvm``, ``--with-python``)
* ``shared_buffers = 128MB``
* ``max_parallel_workers_per_gather = 0``
* ``jit = on``
Note: because this was run on a system with enough physical memory to
hold a substantial fraction of the table, the IO benefits of columnar
won't be entirely realized by the query runtime unless the data size
is substantially increased.
## Query
```sql
-- OFFSET 1000 so that no rows are returned, and we collect only timings
SELECT vendor_id, SUM(quantity) FROM perf_row GROUP BY vendor_id OFFSET 1000;
SELECT vendor_id, SUM(quantity) FROM perf_row GROUP BY vendor_id OFFSET 1000;
SELECT vendor_id, SUM(quantity) FROM perf_row GROUP BY vendor_id OFFSET 1000;
SELECT vendor_id, SUM(quantity) FROM perf_columnar GROUP BY vendor_id OFFSET 1000;
SELECT vendor_id, SUM(quantity) FROM perf_columnar GROUP BY vendor_id OFFSET 1000;
SELECT vendor_id, SUM(quantity) FROM perf_columnar GROUP BY vendor_id OFFSET 1000;
```
Timing (median of three runs):
* row: 436s
* columnar: 16s
* speedup: **27X**

View File

@ -1,6 +0,0 @@
# Columnar extension
comment = 'Citus Columnar extension'
default_version = '12.2-1'
module_pathname = '$libdir/citus_columnar'
relocatable = false
schema = pg_catalog

View File

@ -1,169 +0,0 @@
/*-------------------------------------------------------------------------
*
* columnar.c
*
* This file contains...
*
* Copyright (c) 2016, Citus Data, Inc.
*
* $Id$
*
*-------------------------------------------------------------------------
*/
#include <sys/stat.h>
#include <unistd.h>
#include "postgres.h"
#include "miscadmin.h"
#include "utils/guc.h"
#include "utils/rel.h"
#include "citus_version.h"
#include "columnar/columnar.h"
#include "columnar/columnar_tableam.h"
/* Default values for option parameters */
#define DEFAULT_STRIPE_ROW_COUNT 150000
#define DEFAULT_CHUNK_ROW_COUNT 10000
#if HAVE_LIBZSTD
#define DEFAULT_COMPRESSION_TYPE COMPRESSION_ZSTD
#elif HAVE_CITUS_LIBLZ4
#define DEFAULT_COMPRESSION_TYPE COMPRESSION_LZ4
#else
#define DEFAULT_COMPRESSION_TYPE COMPRESSION_PG_LZ
#endif
int columnar_compression = DEFAULT_COMPRESSION_TYPE;
int columnar_stripe_row_limit = DEFAULT_STRIPE_ROW_COUNT;
int columnar_chunk_group_row_limit = DEFAULT_CHUNK_ROW_COUNT;
int columnar_compression_level = 3;
static const struct config_enum_entry columnar_compression_options[] =
{
{ "none", COMPRESSION_NONE, false },
{ "pglz", COMPRESSION_PG_LZ, false },
#if HAVE_CITUS_LIBLZ4
{ "lz4", COMPRESSION_LZ4, false },
#endif
#if HAVE_LIBZSTD
{ "zstd", COMPRESSION_ZSTD, false },
#endif
{ NULL, 0, false }
};
void
columnar_init(void)
{
columnar_init_gucs();
columnar_tableam_init();
}
void
columnar_init_gucs()
{
DefineCustomEnumVariable("columnar.compression",
"Compression type for columnar.",
NULL,
&columnar_compression,
DEFAULT_COMPRESSION_TYPE,
columnar_compression_options,
PGC_USERSET,
0,
NULL,
NULL,
NULL);
DefineCustomIntVariable("columnar.compression_level",
"Compression level to be used with zstd.",
NULL,
&columnar_compression_level,
3,
COMPRESSION_LEVEL_MIN,
COMPRESSION_LEVEL_MAX,
PGC_USERSET,
0,
NULL,
NULL,
NULL);
DefineCustomIntVariable("columnar.stripe_row_limit",
"Maximum number of tuples per stripe.",
NULL,
&columnar_stripe_row_limit,
DEFAULT_STRIPE_ROW_COUNT,
STRIPE_ROW_COUNT_MINIMUM,
STRIPE_ROW_COUNT_MAXIMUM,
PGC_USERSET,
0,
NULL,
NULL,
NULL);
DefineCustomIntVariable("columnar.chunk_group_row_limit",
"Maximum number of rows per chunk.",
NULL,
&columnar_chunk_group_row_limit,
DEFAULT_CHUNK_ROW_COUNT,
CHUNK_ROW_COUNT_MINIMUM,
CHUNK_ROW_COUNT_MAXIMUM,
PGC_USERSET,
0,
NULL,
NULL,
NULL);
}
/*
* ParseCompressionType converts a string to a compression type.
* For compression algorithms that are invalid or not compiled, it
* returns COMPRESSION_TYPE_INVALID.
*/
CompressionType
ParseCompressionType(const char *compressionTypeString)
{
Assert(compressionTypeString != NULL);
for (int compressionIndex = 0;
columnar_compression_options[compressionIndex].name != NULL;
compressionIndex++)
{
const char *compressionName = columnar_compression_options[compressionIndex].name;
if (strncmp(compressionTypeString, compressionName, NAMEDATALEN) == 0)
{
return columnar_compression_options[compressionIndex].val;
}
}
return COMPRESSION_TYPE_INVALID;
}
/*
* CompressionTypeStr returns string representation of a compression type.
* For compression algorithms that are invalid or not compiled, it
* returns NULL.
*/
const char *
CompressionTypeStr(CompressionType requestedType)
{
for (int compressionIndex = 0;
columnar_compression_options[compressionIndex].name != NULL;
compressionIndex++)
{
CompressionType compressionType =
columnar_compression_options[compressionIndex].val;
if (compressionType == requestedType)
{
return columnar_compression_options[compressionIndex].name;
}
}
return NULL;
}

View File

@ -1,272 +0,0 @@
/*-------------------------------------------------------------------------
*
* columnar_compression.c
*
* This file contains compression/decompression functions definitions
* used for columnar.
*
* Copyright (c) 2016, Citus Data, Inc.
*
* $Id$
*
*-------------------------------------------------------------------------
*/
#include "postgres.h"
#include "common/pg_lzcompress.h"
#include "lib/stringinfo.h"
#include "citus_version.h"
#include "pg_version_constants.h"
#include "columnar/columnar_compression.h"
#if HAVE_CITUS_LIBLZ4
#include <lz4.h>
#endif
#if PG_VERSION_NUM >= PG_VERSION_16
#include "varatt.h"
#endif
#if HAVE_LIBZSTD
#include <zstd.h>
#endif
/*
* The information at the start of the compressed data. This decription is taken
* from pg_lzcompress in pre-9.5 version of PostgreSQL.
*/
typedef struct ColumnarCompressHeader
{
int32 vl_len_; /* varlena header (do not touch directly!) */
int32 rawsize;
} ColumnarCompressHeader;
/*
* Utilities for manipulation of header information for compressed data
*/
#define COLUMNAR_COMPRESS_HDRSZ ((int32) sizeof(ColumnarCompressHeader))
#define COLUMNAR_COMPRESS_RAWSIZE(ptr) (((ColumnarCompressHeader *) (ptr))->rawsize)
#define COLUMNAR_COMPRESS_RAWDATA(ptr) (((char *) (ptr)) + COLUMNAR_COMPRESS_HDRSZ)
#define COLUMNAR_COMPRESS_SET_RAWSIZE(ptr, \
len) (((ColumnarCompressHeader *) (ptr))->rawsize = \
(len))
/*
* CompressBuffer compresses the given buffer with the given compression type
* outputBuffer enlarged to contain compressed data. The function returns true
* if compression is done, returns false if compression is not done.
* outputBuffer is valid only if the function returns true.
*/
bool
CompressBuffer(StringInfo inputBuffer,
StringInfo outputBuffer,
CompressionType compressionType,
int compressionLevel)
{
switch (compressionType)
{
#if HAVE_CITUS_LIBLZ4
case COMPRESSION_LZ4:
{
int maximumLength = LZ4_compressBound(inputBuffer->len);
resetStringInfo(outputBuffer);
enlargeStringInfo(outputBuffer, maximumLength);
int compressedSize = LZ4_compress_default(inputBuffer->data,
outputBuffer->data,
inputBuffer->len, maximumLength);
if (compressedSize <= 0)
{
elog(DEBUG1,
"failure in LZ4_compress_default, input size=%d, output size=%d",
inputBuffer->len, maximumLength);
return false;
}
elog(DEBUG1, "compressed %d bytes to %d bytes", inputBuffer->len,
compressedSize);
outputBuffer->len = compressedSize;
return true;
}
#endif
#if HAVE_LIBZSTD
case COMPRESSION_ZSTD:
{
int maximumLength = ZSTD_compressBound(inputBuffer->len);
resetStringInfo(outputBuffer);
enlargeStringInfo(outputBuffer, maximumLength);
size_t compressedSize = ZSTD_compress(outputBuffer->data,
outputBuffer->maxlen,
inputBuffer->data,
inputBuffer->len,
compressionLevel);
if (ZSTD_isError(compressedSize))
{
ereport(WARNING, (errmsg("zstd compression failed"),
(errdetail("%s", ZSTD_getErrorName(compressedSize)))));
return false;
}
outputBuffer->len = compressedSize;
return true;
}
#endif
case COMPRESSION_PG_LZ:
{
uint64 maximumLength = PGLZ_MAX_OUTPUT(inputBuffer->len) +
COLUMNAR_COMPRESS_HDRSZ;
bool compressionResult = false;
resetStringInfo(outputBuffer);
enlargeStringInfo(outputBuffer, maximumLength);
int32 compressedByteCount = pglz_compress((const char *) inputBuffer->data,
inputBuffer->len,
COLUMNAR_COMPRESS_RAWDATA(
outputBuffer->data),
PGLZ_strategy_always);
if (compressedByteCount >= 0)
{
COLUMNAR_COMPRESS_SET_RAWSIZE(outputBuffer->data, inputBuffer->len);
SET_VARSIZE_COMPRESSED(outputBuffer->data,
compressedByteCount + COLUMNAR_COMPRESS_HDRSZ);
compressionResult = true;
}
if (compressionResult)
{
outputBuffer->len = VARSIZE(outputBuffer->data);
}
return compressionResult;
}
default:
{
return false;
}
}
}
/*
* DecompressBuffer decompresses the given buffer with the given compression
* type. This function returns the buffer as-is when no compression is applied.
*/
StringInfo
DecompressBuffer(StringInfo buffer,
CompressionType compressionType,
uint64 decompressedSize)
{
switch (compressionType)
{
case COMPRESSION_NONE:
{
return buffer;
}
#if HAVE_CITUS_LIBLZ4
case COMPRESSION_LZ4:
{
StringInfo decompressedBuffer = makeStringInfo();
enlargeStringInfo(decompressedBuffer, decompressedSize);
int lz4DecompressSize = LZ4_decompress_safe(buffer->data,
decompressedBuffer->data,
buffer->len,
decompressedSize);
if (lz4DecompressSize != decompressedSize)
{
ereport(ERROR, (errmsg("cannot decompress the buffer"),
errdetail("Expected %lu bytes, but received %d bytes",
decompressedSize, lz4DecompressSize)));
}
decompressedBuffer->len = decompressedSize;
return decompressedBuffer;
}
#endif
#if HAVE_LIBZSTD
case COMPRESSION_ZSTD:
{
StringInfo decompressedBuffer = makeStringInfo();
enlargeStringInfo(decompressedBuffer, decompressedSize);
size_t zstdDecompressSize = ZSTD_decompress(decompressedBuffer->data,
decompressedSize,
buffer->data,
buffer->len);
if (ZSTD_isError(zstdDecompressSize))
{
ereport(ERROR, (errmsg("zstd decompression failed"),
(errdetail("%s", ZSTD_getErrorName(
zstdDecompressSize)))));
}
if (zstdDecompressSize != decompressedSize)
{
ereport(ERROR, (errmsg("unexpected decompressed size"),
errdetail("Expected %ld, received %ld", decompressedSize,
zstdDecompressSize)));
}
decompressedBuffer->len = decompressedSize;
return decompressedBuffer;
}
#endif
case COMPRESSION_PG_LZ:
{
uint32 compressedDataSize = VARSIZE(buffer->data) - COLUMNAR_COMPRESS_HDRSZ;
uint32 decompressedDataSize = COLUMNAR_COMPRESS_RAWSIZE(buffer->data);
if (compressedDataSize + COLUMNAR_COMPRESS_HDRSZ != buffer->len)
{
ereport(ERROR, (errmsg("cannot decompress the buffer"),
errdetail("Expected %u bytes, but received %u bytes",
compressedDataSize, buffer->len)));
}
char *decompressedData = palloc0(decompressedDataSize);
int32 decompressedByteCount = pglz_decompress(COLUMNAR_COMPRESS_RAWDATA(
buffer->data),
compressedDataSize,
decompressedData,
decompressedDataSize, true);
if (decompressedByteCount < 0)
{
ereport(ERROR, (errmsg("cannot decompress the buffer"),
errdetail("compressed data is corrupted")));
}
StringInfo decompressedBuffer = palloc0(sizeof(StringInfoData));
decompressedBuffer->data = decompressedData;
decompressedBuffer->len = decompressedDataSize;
decompressedBuffer->maxlen = decompressedDataSize;
return decompressedBuffer;
}
default:
{
ereport(ERROR, (errmsg("unexpected compression type: %d", compressionType)));
}
}
}

File diff suppressed because it is too large Load Diff

View File

@ -1,165 +0,0 @@
/*-------------------------------------------------------------------------
*
* columnar_debug.c
*
* Helper functions to debug column store.
*
*-------------------------------------------------------------------------
*/
#include "postgres.h"
#include "funcapi.h"
#include "miscadmin.h"
#include "access/nbtree.h"
#include "access/table.h"
#include "catalog/pg_am.h"
#include "catalog/pg_type.h"
#include "storage/fd.h"
#include "storage/smgr.h"
#include "utils/guc.h"
#include "utils/memutils.h"
#include "utils/rel.h"
#include "utils/tuplestore.h"
#include "pg_version_compat.h"
#include "pg_version_constants.h"
#include "columnar/columnar.h"
#include "columnar/columnar_storage.h"
#include "columnar/columnar_version_compat.h"
static void MemoryContextTotals(MemoryContext context, MemoryContextCounters *counters);
PG_FUNCTION_INFO_V1(columnar_store_memory_stats);
PG_FUNCTION_INFO_V1(columnar_storage_info);
/*
* columnar_store_memory_stats returns a record of 3 values: size of
* TopMemoryContext, TopTransactionContext, and Write State context.
*/
Datum
columnar_store_memory_stats(PG_FUNCTION_ARGS)
{
const int resultColumnCount = 3;
TupleDesc tupleDescriptor = CreateTemplateTupleDesc(resultColumnCount);
TupleDescInitEntry(tupleDescriptor, (AttrNumber) 1, "TopMemoryContext",
INT8OID, -1, 0);
TupleDescInitEntry(tupleDescriptor, (AttrNumber) 2, "TopTransactionContext",
INT8OID, -1, 0);
TupleDescInitEntry(tupleDescriptor, (AttrNumber) 3, "WriteStateContext",
INT8OID, -1, 0);
tupleDescriptor = BlessTupleDesc(tupleDescriptor);
MemoryContextCounters transactionCounters = { 0 };
MemoryContextCounters topCounters = { 0 };
MemoryContextCounters writeStateCounters = { 0 };
MemoryContextTotals(TopTransactionContext, &transactionCounters);
MemoryContextTotals(TopMemoryContext, &topCounters);
MemoryContextTotals(GetWriteContextForDebug(), &writeStateCounters);
bool nulls[3] = { false };
Datum values[3] = {
Int64GetDatum(topCounters.totalspace),
Int64GetDatum(transactionCounters.totalspace),
Int64GetDatum(writeStateCounters.totalspace)
};
HeapTuple tuple = heap_form_tuple(tupleDescriptor, values, nulls);
PG_RETURN_DATUM(HeapTupleGetDatum(tuple));
}
/*
* columnar_storage_info - UDF to return internal storage info for a columnar relation.
*
* DDL:
* CREATE OR REPLACE FUNCTION columnar_storage_info(
* rel regclass,
* version_major OUT int4,
* version_minor OUT int4,
* storage_id OUT int8,
* reserved_stripe_id OUT int8,
* reserved_row_number OUT int8,
* reserved_offset OUT int8)
* STRICT
* LANGUAGE c AS 'MODULE_PATHNAME', 'columnar_storage_info';
*/
Datum
columnar_storage_info(PG_FUNCTION_ARGS)
{
#define STORAGE_INFO_NATTS 6
Oid relid = PG_GETARG_OID(0);
TupleDesc tupdesc;
/* Build a tuple descriptor for our result type */
if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
{
elog(ERROR, "return type must be a row type");
}
if (tupdesc->natts != STORAGE_INFO_NATTS)
{
elog(ERROR, "return type must have %d columns", STORAGE_INFO_NATTS);
}
Relation rel = table_open(relid, AccessShareLock);
if (!IsColumnarTableAmTable(relid))
{
ereport(ERROR, (errmsg("table \"%s\" is not a columnar table",
RelationGetRelationName(rel))));
}
Datum values[STORAGE_INFO_NATTS] = { 0 };
bool nulls[STORAGE_INFO_NATTS] = { 0 };
/*
* Pass force = true so that we can inspect metapages that are not the
* current version.
*
* NB: ensure the order and number of attributes correspond to DDL
* declaration.
*/
values[0] = Int32GetDatum(ColumnarStorageGetVersionMajor(rel, true));
values[1] = Int32GetDatum(ColumnarStorageGetVersionMinor(rel, true));
values[2] = Int64GetDatum(ColumnarStorageGetStorageId(rel, true));
values[3] = Int64GetDatum(ColumnarStorageGetReservedStripeId(rel, true));
values[4] = Int64GetDatum(ColumnarStorageGetReservedRowNumber(rel, true));
values[5] = Int64GetDatum(ColumnarStorageGetReservedOffset(rel, true));
/* release lock */
table_close(rel, AccessShareLock);
HeapTuple tuple = heap_form_tuple(tupdesc, values, nulls);
PG_RETURN_DATUM(HeapTupleGetDatum(tuple));
}
/*
* MemoryContextTotals adds stats of the given memory context and its
* subtree to the given counters.
*/
static void
MemoryContextTotals(MemoryContext context, MemoryContextCounters *counters)
{
if (context == NULL)
{
return;
}
MemoryContext child;
for (child = context->firstchild; child != NULL; child = child->nextchild)
{
MemoryContextTotals(child, counters);
}
context->methods->stats(context, NULL, NULL, counters, true);
}

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

View File

@ -1,866 +0,0 @@
/*-------------------------------------------------------------------------
*
* columnar_storage.c
*
* Copyright (c) Citus Data, Inc.
*
* Low-level storage layer for columnar.
* - Translates columnar read/write operations on logical offsets into operations on pages/blocks.
* - Emits WAL.
* - Reads/writes the columnar metapage.
* - Reserves data offsets, stripe numbers, and row offsets.
* - Truncation.
*
* Higher-level columnar operations deal with logical offsets and large
* contiguous buffers of data that need to be stored. But the buffer manager
* and WAL depend on formatted pages with headers, so these large buffers need
* to be written across many pages. This module translates the contiguous
* buffers into individual block reads/writes, and performs WAL when
* necessary.
*
* Storage layout: a metapage in block 0, followed by an empty page in block
* 1, followed by logical data starting at the first byte after the page
* header in block 2 (having logical offset ColumnarFirstLogicalOffset). (XXX:
* Block 1 is left empty for no particular reason. Reconsider?). A columnar
* table should always have at least 2 blocks.
*
* Reservation is done with a relation extension lock, and designed for
* concurrency, so the callers only need an ordinary lock on the
* relation. Initializing the metapage or truncating the relation require that
* the caller holds an AccessExclusiveLock. (XXX: New reservations of data are
* aligned onto a new page for no particular reason. Reconsider?).
*
*-------------------------------------------------------------------------
*/
#include "postgres.h"
#include "miscadmin.h"
#include "safe_lib.h"
#include "access/generic_xlog.h"
#include "catalog/storage.h"
#include "storage/bufmgr.h"
#include "storage/lmgr.h"
#include "pg_version_compat.h"
#include "columnar/columnar.h"
#include "columnar/columnar_storage.h"
/*
* Content of the first page in main fork, which stores metadata at file
* level.
*/
typedef struct ColumnarMetapage
{
/*
* Store version of file format used, so we can detect files from
* previous versions if we change file format.
*/
uint32 versionMajor;
uint32 versionMinor;
/*
* Each of the metadata table rows are identified by a storageId.
* We store it also in the main fork so we can link metadata rows
* with data files.
*/
uint64 storageId;
uint64 reservedStripeId; /* first unused stripe id */
uint64 reservedRowNumber; /* first unused row number */
uint64 reservedOffset; /* first unused byte offset */
/*
* Flag set to true in the init fork. After an unlogged table reset (due
* to a crash), the init fork will be copied over the main fork. When
* trying to read an unlogged table, if this flag is set to true, we must
* clear the metadata for the table (because the actual data is gone,
* too), and clear the flag. We can cross-check that the table is
* UNLOGGED, and that the main fork is at the minimum size (no actual
* data).
*
* XXX: Not used yet; reserved field for later support for UNLOGGED.
*/
bool unloggedReset;
} ColumnarMetapage;
/* represents a "physical" block+offset address */
typedef struct PhysicalAddr
{
BlockNumber blockno;
uint32 offset;
} PhysicalAddr;
#define COLUMNAR_METAPAGE_BLOCKNO 0
#define COLUMNAR_EMPTY_BLOCKNO 1
#define COLUMNAR_INVALID_STRIPE_ID 0
#define COLUMNAR_FIRST_STRIPE_ID 1
#define OLD_METAPAGE_VERSION_HINT "Use \"VACUUM\" to upgrade the columnar table format " \
"version or run \"ALTER EXTENSION citus UPDATE\"."
/* only for testing purposes */
PG_FUNCTION_INFO_V1(test_columnar_storage_write_new_page);
/*
* Map logical offsets to a physical page and offset where the data is kept.
*/
static inline PhysicalAddr
LogicalToPhysical(uint64 logicalOffset)
{
PhysicalAddr addr;
addr.blockno = logicalOffset / COLUMNAR_BYTES_PER_PAGE;
addr.offset = SizeOfPageHeaderData + (logicalOffset % COLUMNAR_BYTES_PER_PAGE);
return addr;
}
/*
* Map a physical page and offset address to a logical address.
*/
static inline uint64
PhysicalToLogical(PhysicalAddr addr)
{
return COLUMNAR_BYTES_PER_PAGE * addr.blockno + addr.offset - SizeOfPageHeaderData;
}
static void ColumnarOverwriteMetapage(Relation relation,
ColumnarMetapage columnarMetapage);
static ColumnarMetapage ColumnarMetapageRead(Relation rel, bool force);
static void ReadFromBlock(Relation rel, BlockNumber blockno, uint32 offset,
char *buf, uint32 len, bool force);
static void WriteToBlock(Relation rel, BlockNumber blockno, uint32 offset,
char *buf, uint32 len, bool clear);
static uint64 AlignReservation(uint64 prevReservation);
static bool ColumnarMetapageIsCurrent(ColumnarMetapage *metapage);
static bool ColumnarMetapageIsOlder(ColumnarMetapage *metapage);
static bool ColumnarMetapageIsNewer(ColumnarMetapage *metapage);
static void ColumnarMetapageCheckVersion(Relation rel, ColumnarMetapage *metapage);
/*
* ColumnarStorageInit - initialize a new metapage in an empty relation
* with the given storageId.
*
* Caller must hold AccessExclusiveLock on the relation.
*/
void
ColumnarStorageInit(SMgrRelation srel, uint64 storageId)
{
BlockNumber nblocks = smgrnblocks(srel, MAIN_FORKNUM);
if (nblocks > 0)
{
elog(ERROR,
"attempted to initialize metapage, but %d pages already exist",
nblocks);
}
/* create two pages */
#if PG_VERSION_NUM >= PG_VERSION_16
PGIOAlignedBlock block;
#else
PGAlignedBlock block;
#endif
Page page = block.data;
/* write metapage */
PageInit(page, BLCKSZ, 0);
PageHeader phdr = (PageHeader) page;
ColumnarMetapage metapage = { 0 };
metapage.storageId = storageId;
metapage.versionMajor = COLUMNAR_VERSION_MAJOR;
metapage.versionMinor = COLUMNAR_VERSION_MINOR;
metapage.reservedStripeId = COLUMNAR_FIRST_STRIPE_ID;
metapage.reservedRowNumber = COLUMNAR_FIRST_ROW_NUMBER;
metapage.reservedOffset = ColumnarFirstLogicalOffset;
metapage.unloggedReset = false;
memcpy_s(page + phdr->pd_lower, phdr->pd_upper - phdr->pd_lower,
(char *) &metapage, sizeof(ColumnarMetapage));
phdr->pd_lower += sizeof(ColumnarMetapage);
log_newpage(RelationPhysicalIdentifierBackend_compat(&srel), MAIN_FORKNUM,
COLUMNAR_METAPAGE_BLOCKNO, page, true);
PageSetChecksumInplace(page, COLUMNAR_METAPAGE_BLOCKNO);
smgrextend(srel, MAIN_FORKNUM, COLUMNAR_METAPAGE_BLOCKNO, page, true);
/* write empty page */
PageInit(page, BLCKSZ, 0);
log_newpage(RelationPhysicalIdentifierBackend_compat(&srel), MAIN_FORKNUM,
COLUMNAR_EMPTY_BLOCKNO, page, true);
PageSetChecksumInplace(page, COLUMNAR_EMPTY_BLOCKNO);
smgrextend(srel, MAIN_FORKNUM, COLUMNAR_EMPTY_BLOCKNO, page, true);
/*
* An immediate sync is required even if we xlog'd the page, because the
* write did not go through shared_buffers and therefore a concurrent
* checkpoint may have moved the redo pointer past our xlog record.
*/
smgrimmedsync(srel, MAIN_FORKNUM);
}
/*
* ColumnarStorageUpdateCurrent - update the metapage to the current
* version. No effect if the version already matches. If 'upgrade' is true,
* throw an error if metapage version is newer; if 'upgrade' is false, it's a
* downgrade, so throw an error if the metapage version is older.
*
* NB: caller must ensure that metapage already exists, which might not be the
* case on 10.0.
*/
void
ColumnarStorageUpdateCurrent(Relation rel, bool upgrade, uint64 reservedStripeId,
uint64 reservedRowNumber, uint64 reservedOffset)
{
LockRelationForExtension(rel, ExclusiveLock);
ColumnarMetapage metapage = ColumnarMetapageRead(rel, true);
if (ColumnarMetapageIsCurrent(&metapage))
{
/* nothing to do */
return;
}
if (upgrade && ColumnarMetapageIsNewer(&metapage))
{
elog(ERROR, "found newer columnar metapage while upgrading");
}
if (!upgrade && ColumnarMetapageIsOlder(&metapage))
{
elog(ERROR, "found older columnar metapage while downgrading");
}
metapage.versionMajor = COLUMNAR_VERSION_MAJOR;
metapage.versionMinor = COLUMNAR_VERSION_MINOR;
/* storageId remains the same */
metapage.reservedStripeId = reservedStripeId;
metapage.reservedRowNumber = reservedRowNumber;
metapage.reservedOffset = reservedOffset;
ColumnarOverwriteMetapage(rel, metapage);
UnlockRelationForExtension(rel, ExclusiveLock);
}
/*
* ColumnarStorageGetVersionMajor - return major version from the metapage.
*
* Throw an error if the metapage is not the current version, unless
* 'force' is true.
*/
uint64
ColumnarStorageGetVersionMajor(Relation rel, bool force)
{
ColumnarMetapage metapage = ColumnarMetapageRead(rel, force);
return metapage.versionMajor;
}
/*
* ColumnarStorageGetVersionMinor - return minor version from the metapage.
*
* Throw an error if the metapage is not the current version, unless
* 'force' is true.
*/
uint64
ColumnarStorageGetVersionMinor(Relation rel, bool force)
{
ColumnarMetapage metapage = ColumnarMetapageRead(rel, force);
return metapage.versionMinor;
}
/*
* ColumnarStorageGetStorageId - return storage ID from the metapage.
*
* Throw an error if the metapage is not the current version, unless
* 'force' is true.
*/
uint64
ColumnarStorageGetStorageId(Relation rel, bool force)
{
ColumnarMetapage metapage = ColumnarMetapageRead(rel, force);
return metapage.storageId;
}
/*
* ColumnarStorageGetReservedStripeId - return reserved stripe ID from the
* metapage.
*
* Throw an error if the metapage is not the current version, unless
* 'force' is true.
*/
uint64
ColumnarStorageGetReservedStripeId(Relation rel, bool force)
{
ColumnarMetapage metapage = ColumnarMetapageRead(rel, force);
return metapage.reservedStripeId;
}
/*
* ColumnarStorageGetReservedRowNumber - return reserved row number from the
* metapage.
*
* Throw an error if the metapage is not the current version, unless
* 'force' is true.
*/
uint64
ColumnarStorageGetReservedRowNumber(Relation rel, bool force)
{
ColumnarMetapage metapage = ColumnarMetapageRead(rel, force);
return metapage.reservedRowNumber;
}
/*
* ColumnarStorageGetReservedOffset - return reserved offset from the metapage.
*
* Throw an error if the metapage is not the current version, unless
* 'force' is true.
*/
uint64
ColumnarStorageGetReservedOffset(Relation rel, bool force)
{
ColumnarMetapage metapage = ColumnarMetapageRead(rel, force);
return metapage.reservedOffset;
}
/*
* ColumnarStorageIsCurrent - return true if metapage exists and is not
* the current version.
*/
bool
ColumnarStorageIsCurrent(Relation rel)
{
BlockNumber nblocks = smgrnblocks(RelationGetSmgr(rel), MAIN_FORKNUM);
if (nblocks < 2)
{
return false;
}
ColumnarMetapage metapage = ColumnarMetapageRead(rel, true);
return ColumnarMetapageIsCurrent(&metapage);
}
/*
* ColumnarStorageReserveRowNumber returns reservedRowNumber and advances
* it for next row number reservation.
*/
uint64
ColumnarStorageReserveRowNumber(Relation rel, uint64 nrows)
{
LockRelationForExtension(rel, ExclusiveLock);
ColumnarMetapage metapage = ColumnarMetapageRead(rel, false);
uint64 firstRowNumber = metapage.reservedRowNumber;
metapage.reservedRowNumber += nrows;
ColumnarOverwriteMetapage(rel, metapage);
UnlockRelationForExtension(rel, ExclusiveLock);
return firstRowNumber;
}
/*
* ColumnarStorageReserveStripeId returns stripeId and advances it for next
* stripeId reservation.
* Note that this function doesn't handle row number reservation.
* See ColumnarStorageReserveRowNumber function.
*/
uint64
ColumnarStorageReserveStripeId(Relation rel)
{
LockRelationForExtension(rel, ExclusiveLock);
ColumnarMetapage metapage = ColumnarMetapageRead(rel, false);
uint64 stripeId = metapage.reservedStripeId;
metapage.reservedStripeId++;
ColumnarOverwriteMetapage(rel, metapage);
UnlockRelationForExtension(rel, ExclusiveLock);
return stripeId;
}
/*
* ColumnarStorageReserveData - reserve logical data offsets for writing.
*/
uint64
ColumnarStorageReserveData(Relation rel, uint64 amount)
{
if (amount == 0)
{
return ColumnarInvalidLogicalOffset;
}
LockRelationForExtension(rel, ExclusiveLock);
ColumnarMetapage metapage = ColumnarMetapageRead(rel, false);
uint64 alignedReservation = AlignReservation(metapage.reservedOffset);
uint64 nextReservation = alignedReservation + amount;
metapage.reservedOffset = nextReservation;
/* write new reservation */
ColumnarOverwriteMetapage(rel, metapage);
/* last used PhysicalAddr of new reservation */
PhysicalAddr final = LogicalToPhysical(nextReservation - 1);
/* extend with new pages */
BlockNumber nblocks = smgrnblocks(RelationGetSmgr(rel), MAIN_FORKNUM);
while (nblocks <= final.blockno)
{
Buffer newBuffer = ReadBuffer(rel, P_NEW);
Assert(BufferGetBlockNumber(newBuffer) == nblocks);
ReleaseBuffer(newBuffer);
nblocks++;
}
UnlockRelationForExtension(rel, ExclusiveLock);
return alignedReservation;
}
/*
* ColumnarStorageRead - map the logical offset to a block and offset, then
* read the buffer from multiple blocks if necessary.
*/
void
ColumnarStorageRead(Relation rel, uint64 logicalOffset, char *data, uint32 amount)
{
/* if there's no work to do, succeed even with invalid offset */
if (amount == 0)
{
return;
}
if (!ColumnarLogicalOffsetIsValid(logicalOffset))
{
elog(ERROR,
"attempted columnar read on relation %d from invalid logical offset: "
UINT64_FORMAT,
rel->rd_id, logicalOffset);
}
uint64 read = 0;
while (read < amount)
{
PhysicalAddr addr = LogicalToPhysical(logicalOffset + read);
uint32 to_read = Min(amount - read, BLCKSZ - addr.offset);
ReadFromBlock(rel, addr.blockno, addr.offset, data + read, to_read,
false);
read += to_read;
}
}
/*
* ColumnarStorageWrite - map the logical offset to a block and offset, then
* write the buffer across multiple blocks if necessary.
*/
void
ColumnarStorageWrite(Relation rel, uint64 logicalOffset, char *data, uint32 amount)
{
/* if there's no work to do, succeed even with invalid offset */
if (amount == 0)
{
return;
}
if (!ColumnarLogicalOffsetIsValid(logicalOffset))
{
elog(ERROR,
"attempted columnar write on relation %d to invalid logical offset: "
UINT64_FORMAT,
rel->rd_id, logicalOffset);
}
uint64 written = 0;
while (written < amount)
{
PhysicalAddr addr = LogicalToPhysical(logicalOffset + written);
uint64 to_write = Min(amount - written, BLCKSZ - addr.offset);
WriteToBlock(rel, addr.blockno, addr.offset, data + written, to_write,
false);
written += to_write;
}
}
/*
* ColumnarStorageTruncate - truncate the columnar storage such that
* newDataReservation will be the first unused logical offset available. Free
* pages at the end of the relation.
*
* Caller must hold AccessExclusiveLock on the relation.
*
* Returns true if pages were truncated; false otherwise.
*/
bool
ColumnarStorageTruncate(Relation rel, uint64 newDataReservation)
{
if (!ColumnarLogicalOffsetIsValid(newDataReservation))
{
elog(ERROR,
"attempted to truncate relation %d to invalid logical offset: " UINT64_FORMAT,
rel->rd_id, newDataReservation);
}
BlockNumber old_rel_pages = smgrnblocks(RelationGetSmgr(rel), MAIN_FORKNUM);
if (old_rel_pages == 0)
{
/* nothing to do */
return false;
}
LockRelationForExtension(rel, ExclusiveLock);
ColumnarMetapage metapage = ColumnarMetapageRead(rel, false);
if (metapage.reservedOffset < newDataReservation)
{
elog(ERROR,
"attempted to truncate relation %d to offset " UINT64_FORMAT \
" which is higher than existing offset " UINT64_FORMAT,
rel->rd_id, newDataReservation, metapage.reservedOffset);
}
if (metapage.reservedOffset == newDataReservation)
{
/* nothing to do */
UnlockRelationForExtension(rel, ExclusiveLock);
return false;
}
metapage.reservedOffset = newDataReservation;
/* write new reservation */
ColumnarOverwriteMetapage(rel, metapage);
UnlockRelationForExtension(rel, ExclusiveLock);
PhysicalAddr final = LogicalToPhysical(newDataReservation - 1);
BlockNumber new_rel_pages = final.blockno + 1;
Assert(new_rel_pages <= old_rel_pages);
/*
* Truncate the storage. Note that RelationTruncate() takes care of
* Write Ahead Logging.
*/
if (new_rel_pages < old_rel_pages)
{
RelationTruncate(rel, new_rel_pages);
return true;
}
return false;
}
/*
* ColumnarOverwriteMetapage writes given columnarMetapage back to metapage
* for given relation.
*/
static void
ColumnarOverwriteMetapage(Relation relation, ColumnarMetapage columnarMetapage)
{
/* clear metapage because we are overwriting */
bool clear = true;
WriteToBlock(relation, COLUMNAR_METAPAGE_BLOCKNO, SizeOfPageHeaderData,
(char *) &columnarMetapage, sizeof(ColumnarMetapage), clear);
}
/*
* ColumnarMetapageRead - read the current contents of the metapage. Error if
* it does not exist. Throw an error if the metapage is not the current
* version, unless 'force' is true.
*
* NB: it's safe to read a different version of a metapage because we
* guarantee that fields will only be added and existing fields will never be
* changed. However, it's important that we don't depend on new fields being
* set properly when we read an old metapage; an old metapage should only be
* read for the purposes of upgrading or error checking.
*/
static ColumnarMetapage
ColumnarMetapageRead(Relation rel, bool force)
{
BlockNumber nblocks = smgrnblocks(RelationGetSmgr(rel), MAIN_FORKNUM);
if (nblocks == 0)
{
/*
* We only expect this to happen when upgrading citus.so. This is because,
* in current version of columnar, we immediately create the metapage
* for columnar tables, i.e right after creating the table.
* However in older versions, we were creating metapages lazily, i.e
* when ingesting data to columnar table.
*/
ereport(ERROR, (errmsg("columnar metapage for relation \"%s\" does not exist",
RelationGetRelationName(rel)),
errhint(OLD_METAPAGE_VERSION_HINT)));
}
/*
* Regardless of "force" parameter, always force read metapage block.
* We will check metapage version in ColumnarMetapageCheckVersion
* depending on "force".
*/
bool forceReadBlock = true;
ColumnarMetapage metapage;
ReadFromBlock(rel, COLUMNAR_METAPAGE_BLOCKNO, SizeOfPageHeaderData,
(char *) &metapage, sizeof(ColumnarMetapage), forceReadBlock);
if (!force)
{
ColumnarMetapageCheckVersion(rel, &metapage);
}
return metapage;
}
/*
* ReadFromBlock - read bytes from a page at the given offset. If 'force' is
* true, don't check pd_lower; useful when reading a metapage of unknown
* version.
*/
static void
ReadFromBlock(Relation rel, BlockNumber blockno, uint32 offset, char *buf,
uint32 len, bool force)
{
Buffer buffer = ReadBuffer(rel, blockno);
LockBuffer(buffer, BUFFER_LOCK_SHARE);
Page page = BufferGetPage(buffer);
PageHeader phdr = (PageHeader) page;
if (BLCKSZ < offset + len || (!force && (phdr->pd_lower < offset + len)))
{
elog(ERROR,
"attempt to read columnar data of length %d from offset %d of block %d of relation %d",
len, offset, blockno, rel->rd_id);
}
memcpy_s(buf, len, page + offset, len);
UnlockReleaseBuffer(buffer);
}
/*
* WriteToBlock - append data to a block, initializing if necessary, and emit
* WAL. If 'clear' is true, always clear the data on the page and reinitialize
* it first, and offset must be SizeOfPageHeaderData. Otherwise, offset must
* be equal to pd_lower and pd_lower will be set to the end of the written
* data.
*/
static void
WriteToBlock(Relation rel, BlockNumber blockno, uint32 offset, char *buf,
uint32 len, bool clear)
{
Buffer buffer = ReadBuffer(rel, blockno);
GenericXLogState *state = GenericXLogStart(rel);
LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
Page page = GenericXLogRegisterBuffer(state, buffer, GENERIC_XLOG_FULL_IMAGE);
PageHeader phdr = (PageHeader) page;
if (PageIsNew(page) || clear)
{
PageInit(page, BLCKSZ, 0);
}
if (phdr->pd_lower < offset || phdr->pd_upper - offset < len)
{
elog(ERROR,
"attempt to write columnar data of length %d to offset %d of block %d of relation %d",
len, offset, blockno, rel->rd_id);
}
/*
* After a transaction has been rolled-back, we might be
* over-writing the rolledback write, so phdr->pd_lower can be
* different from addr.offset.
*
* We reset pd_lower to reset the rolledback write.
*
* Given that we always align page reservation to the next page as of
* 10.2, having such a disk page is only possible if write operaion
* failed in an older version of columnar, but now user attempts writing
* to that table in version >= 10.2.
*/
if (phdr->pd_lower > offset)
{
ereport(DEBUG4, (errmsg("overwriting page %u", blockno),
errdetail("This can happen after a roll-back.")));
phdr->pd_lower = offset;
}
memcpy_s(page + phdr->pd_lower, phdr->pd_upper - phdr->pd_lower, buf, len);
phdr->pd_lower += len;
GenericXLogFinish(state);
UnlockReleaseBuffer(buffer);
}
/*
* AlignReservation - given an unused logical byte offset, align it so that it
* falls at the start of a page.
*
* XXX: Reconsider whether we want/need to do this at all.
*/
static uint64
AlignReservation(uint64 prevReservation)
{
PhysicalAddr prevAddr = LogicalToPhysical(prevReservation);
uint64 alignedReservation = prevReservation;
if (prevAddr.offset != SizeOfPageHeaderData)
{
/* not aligned; align on beginning of next page */
PhysicalAddr initial = { 0 };
initial.blockno = prevAddr.blockno + 1;
initial.offset = SizeOfPageHeaderData;
alignedReservation = PhysicalToLogical(initial);
}
Assert(alignedReservation >= prevReservation);
return alignedReservation;
}
/*
* ColumnarMetapageIsCurrent - is the metapage at the latest version?
*/
static bool
ColumnarMetapageIsCurrent(ColumnarMetapage *metapage)
{
return (metapage->versionMajor == COLUMNAR_VERSION_MAJOR &&
metapage->versionMinor == COLUMNAR_VERSION_MINOR);
}
/*
* ColumnarMetapageIsOlder - is the metapage older than the current version?
*/
static bool
ColumnarMetapageIsOlder(ColumnarMetapage *metapage)
{
return (metapage->versionMajor < COLUMNAR_VERSION_MAJOR ||
(metapage->versionMajor == COLUMNAR_VERSION_MAJOR &&
(int) metapage->versionMinor < (int) COLUMNAR_VERSION_MINOR));
}
/*
* ColumnarMetapageIsNewer - is the metapage newer than the current version?
*/
static bool
ColumnarMetapageIsNewer(ColumnarMetapage *metapage)
{
return (metapage->versionMajor > COLUMNAR_VERSION_MAJOR ||
(metapage->versionMajor == COLUMNAR_VERSION_MAJOR &&
metapage->versionMinor > COLUMNAR_VERSION_MINOR));
}
/*
* ColumnarMetapageCheckVersion - throw an error if accessing old
* version of metapage.
*/
static void
ColumnarMetapageCheckVersion(Relation rel, ColumnarMetapage *metapage)
{
if (!ColumnarMetapageIsCurrent(metapage))
{
ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg(
"attempted to access relation \"%s\", which uses an older columnar format",
RelationGetRelationName(rel)),
errdetail(
"Columnar format version %d.%d is required, \"%s\" has version %d.%d.",
COLUMNAR_VERSION_MAJOR, COLUMNAR_VERSION_MINOR,
RelationGetRelationName(rel),
metapage->versionMajor, metapage->versionMinor),
errhint(OLD_METAPAGE_VERSION_HINT)));
}
}
/*
* test_columnar_storage_write_new_page is a UDF only used for testing
* purposes. It could make more sense to define this in columnar_debug.c,
* but the storage layer doesn't expose ColumnarMetapage to any other files,
* so we define it here.
*/
Datum
test_columnar_storage_write_new_page(PG_FUNCTION_ARGS)
{
Oid relationId = PG_GETARG_OID(0);
Relation relation = relation_open(relationId, AccessShareLock);
/*
* Allocate a new page, write some data to there, and set reserved offset
* to the start of that page. That way, for a subsequent write operation,
* storage layer would try to overwrite the page that we allocated here.
*/
uint64 newPageOffset = ColumnarStorageGetReservedOffset(relation, false);
ColumnarStorageReserveData(relation, 100);
ColumnarStorageWrite(relation, newPageOffset, "foo_bar", 8);
ColumnarMetapage metapage = ColumnarMetapageRead(relation, false);
metapage.reservedOffset = newPageOffset;
ColumnarOverwriteMetapage(relation, metapage);
relation_close(relation, AccessShareLock);
PG_RETURN_VOID();
}

File diff suppressed because it is too large Load Diff

View File

@ -1,775 +0,0 @@
/*-------------------------------------------------------------------------
*
* columnar_writer.c
*
* This file contains function definitions for writing columnar tables. This
* includes the logic for writing file level metadata, writing row stripes,
* and calculating chunk skip nodes.
*
* Copyright (c) 2016, Citus Data, Inc.
*
* $Id$
*
*-------------------------------------------------------------------------
*/
#include "postgres.h"
#include "miscadmin.h"
#include "safe_lib.h"
#include "access/heapam.h"
#include "access/nbtree.h"
#include "catalog/pg_am.h"
#include "storage/fd.h"
#include "storage/smgr.h"
#include "utils/guc.h"
#include "utils/memutils.h"
#include "utils/rel.h"
#include "pg_version_compat.h"
#include "pg_version_constants.h"
#include "columnar/columnar.h"
#include "columnar/columnar_storage.h"
#include "columnar/columnar_version_compat.h"
#if PG_VERSION_NUM >= PG_VERSION_16
#include "storage/relfilelocator.h"
#include "utils/relfilenumbermap.h"
#else
#include "utils/relfilenodemap.h"
#endif
struct ColumnarWriteState
{
TupleDesc tupleDescriptor;
FmgrInfo **comparisonFunctionArray;
RelFileLocator relfilelocator;
MemoryContext stripeWriteContext;
MemoryContext perTupleContext;
StripeBuffers *stripeBuffers;
StripeSkipList *stripeSkipList;
EmptyStripeReservation *emptyStripeReservation;
ColumnarOptions options;
ChunkData *chunkData;
List *chunkGroupRowCounts;
/*
* compressionBuffer buffer is used as temporary storage during
* data value compression operation. It is kept here to minimize
* memory allocations. It lives in stripeWriteContext and gets
* deallocated when memory context is reset.
*/
StringInfo compressionBuffer;
};
static StripeBuffers * CreateEmptyStripeBuffers(uint32 stripeMaxRowCount,
uint32 chunkRowCount,
uint32 columnCount);
static StripeSkipList * CreateEmptyStripeSkipList(uint32 stripeMaxRowCount,
uint32 chunkRowCount,
uint32 columnCount);
static void FlushStripe(ColumnarWriteState *writeState);
static StringInfo SerializeBoolArray(bool *boolArray, uint32 boolArrayLength);
static void SerializeSingleDatum(StringInfo datumBuffer, Datum datum,
bool datumTypeByValue, int datumTypeLength,
char datumTypeAlign);
static void SerializeChunkData(ColumnarWriteState *writeState, uint32 chunkIndex,
uint32 rowCount);
static void UpdateChunkSkipNodeMinMax(ColumnChunkSkipNode *chunkSkipNode,
Datum columnValue, bool columnTypeByValue,
int columnTypeLength, Oid columnCollation,
FmgrInfo *comparisonFunction);
static Datum DatumCopy(Datum datum, bool datumTypeByValue, int datumTypeLength);
static StringInfo CopyStringInfo(StringInfo sourceString);
/*
* ColumnarBeginWrite initializes a columnar data load operation and returns a table
* handle. This handle should be used for adding the row values and finishing the
* data load operation.
*/
ColumnarWriteState *
ColumnarBeginWrite(RelFileLocator relfilelocator,
ColumnarOptions options,
TupleDesc tupleDescriptor)
{
/* get comparison function pointers for each of the columns */
uint32 columnCount = tupleDescriptor->natts;
FmgrInfo **comparisonFunctionArray = palloc0(columnCount * sizeof(FmgrInfo *));
for (uint32 columnIndex = 0; columnIndex < columnCount; columnIndex++)
{
FmgrInfo *comparisonFunction = NULL;
FormData_pg_attribute *attributeForm = TupleDescAttr(tupleDescriptor,
columnIndex);
if (!attributeForm->attisdropped)
{
Oid typeId = attributeForm->atttypid;
comparisonFunction = GetFunctionInfoOrNull(typeId, BTREE_AM_OID,
BTORDER_PROC);
}
comparisonFunctionArray[columnIndex] = comparisonFunction;
}
/*
* We allocate all stripe specific data in the stripeWriteContext, and
* reset this memory context once we have flushed the stripe to the file.
* This is to avoid memory leaks.
*/
MemoryContext stripeWriteContext = AllocSetContextCreate(CurrentMemoryContext,
"Stripe Write Memory Context",
ALLOCSET_DEFAULT_SIZES);
bool *columnMaskArray = palloc(columnCount * sizeof(bool));
memset(columnMaskArray, true, columnCount * sizeof(bool));
ChunkData *chunkData = CreateEmptyChunkData(columnCount, columnMaskArray,
options.chunkRowCount);
ColumnarWriteState *writeState = palloc0(sizeof(ColumnarWriteState));
writeState->relfilelocator = relfilelocator;
writeState->options = options;
writeState->tupleDescriptor = CreateTupleDescCopy(tupleDescriptor);
writeState->comparisonFunctionArray = comparisonFunctionArray;
writeState->stripeBuffers = NULL;
writeState->stripeSkipList = NULL;
writeState->emptyStripeReservation = NULL;
writeState->stripeWriteContext = stripeWriteContext;
writeState->chunkData = chunkData;
writeState->compressionBuffer = NULL;
writeState->perTupleContext = AllocSetContextCreate(CurrentMemoryContext,
"Columnar per tuple context",
ALLOCSET_DEFAULT_SIZES);
return writeState;
}
/*
* ColumnarWriteRow adds a row to the columnar table. If the stripe is not initialized,
* we create structures to hold stripe data and skip list. Then, we serialize and
* append data to serialized value buffer for each of the columns and update
* corresponding skip nodes. Then, whole chunk data is compressed at every
* rowChunkCount insertion. Then, if row count exceeds stripeMaxRowCount, we flush
* the stripe, and add its metadata to the table footer.
*
* Returns the "row number" assigned to written row.
*/
uint64
ColumnarWriteRow(ColumnarWriteState *writeState, Datum *columnValues, bool *columnNulls)
{
uint32 columnIndex = 0;
StripeBuffers *stripeBuffers = writeState->stripeBuffers;
StripeSkipList *stripeSkipList = writeState->stripeSkipList;
uint32 columnCount = writeState->tupleDescriptor->natts;
ColumnarOptions *options = &writeState->options;
const uint32 chunkRowCount = options->chunkRowCount;
ChunkData *chunkData = writeState->chunkData;
MemoryContext oldContext = MemoryContextSwitchTo(writeState->stripeWriteContext);
if (stripeBuffers == NULL)
{
stripeBuffers = CreateEmptyStripeBuffers(options->stripeRowCount,
chunkRowCount, columnCount);
stripeSkipList = CreateEmptyStripeSkipList(options->stripeRowCount,
chunkRowCount, columnCount);
writeState->stripeBuffers = stripeBuffers;
writeState->stripeSkipList = stripeSkipList;
writeState->compressionBuffer = makeStringInfo();
Oid relationId = RelidByRelfilenumber(RelationTablespace_compat(
writeState->relfilelocator),
RelationPhysicalIdentifierNumber_compat(
writeState->relfilelocator));
Relation relation = relation_open(relationId, NoLock);
writeState->emptyStripeReservation =
ReserveEmptyStripe(relation, columnCount, chunkRowCount,
options->stripeRowCount);
relation_close(relation, NoLock);
/*
* serializedValueBuffer lives in stripe write memory context so it needs to be
* initialized when the stripe is created.
*/
for (columnIndex = 0; columnIndex < columnCount; columnIndex++)
{
chunkData->valueBufferArray[columnIndex] = makeStringInfo();
}
}
uint32 chunkIndex = stripeBuffers->rowCount / chunkRowCount;
uint32 chunkRowIndex = stripeBuffers->rowCount % chunkRowCount;
for (columnIndex = 0; columnIndex < columnCount; columnIndex++)
{
ColumnChunkSkipNode **chunkSkipNodeArray = stripeSkipList->chunkSkipNodeArray;
ColumnChunkSkipNode *chunkSkipNode =
&chunkSkipNodeArray[columnIndex][chunkIndex];
if (columnNulls[columnIndex])
{
chunkData->existsArray[columnIndex][chunkRowIndex] = false;
}
else
{
FmgrInfo *comparisonFunction =
writeState->comparisonFunctionArray[columnIndex];
Form_pg_attribute attributeForm =
TupleDescAttr(writeState->tupleDescriptor, columnIndex);
bool columnTypeByValue = attributeForm->attbyval;
int columnTypeLength = attributeForm->attlen;
Oid columnCollation = attributeForm->attcollation;
char columnTypeAlign = attributeForm->attalign;
chunkData->existsArray[columnIndex][chunkRowIndex] = true;
SerializeSingleDatum(chunkData->valueBufferArray[columnIndex],
columnValues[columnIndex], columnTypeByValue,
columnTypeLength, columnTypeAlign);
UpdateChunkSkipNodeMinMax(chunkSkipNode, columnValues[columnIndex],
columnTypeByValue, columnTypeLength,
columnCollation, comparisonFunction);
}
chunkSkipNode->rowCount++;
}
stripeSkipList->chunkCount = chunkIndex + 1;
/* last row of the chunk is inserted serialize the chunk */
if (chunkRowIndex == chunkRowCount - 1)
{
SerializeChunkData(writeState, chunkIndex, chunkRowCount);
}
uint64 writtenRowNumber = writeState->emptyStripeReservation->stripeFirstRowNumber +
stripeBuffers->rowCount;
stripeBuffers->rowCount++;
if (stripeBuffers->rowCount >= options->stripeRowCount)
{
ColumnarFlushPendingWrites(writeState);
}
MemoryContextSwitchTo(oldContext);
return writtenRowNumber;
}
/*
* ColumnarEndWrite finishes a columnar data load operation. If we have an unflushed
* stripe, we flush it.
*/
void
ColumnarEndWrite(ColumnarWriteState *writeState)
{
ColumnarFlushPendingWrites(writeState);
MemoryContextDelete(writeState->stripeWriteContext);
pfree(writeState->comparisonFunctionArray);
FreeChunkData(writeState->chunkData);
pfree(writeState);
}
void
ColumnarFlushPendingWrites(ColumnarWriteState *writeState)
{
StripeBuffers *stripeBuffers = writeState->stripeBuffers;
if (stripeBuffers != NULL)
{
MemoryContext oldContext = MemoryContextSwitchTo(writeState->stripeWriteContext);
FlushStripe(writeState);
MemoryContextReset(writeState->stripeWriteContext);
/* set stripe data and skip list to NULL so they are recreated next time */
writeState->stripeBuffers = NULL;
writeState->stripeSkipList = NULL;
MemoryContextSwitchTo(oldContext);
}
}
/*
* ColumnarWritePerTupleContext
*
* Return per-tuple context for columnar write operation.
*/
MemoryContext
ColumnarWritePerTupleContext(ColumnarWriteState *state)
{
return state->perTupleContext;
}
/*
* CreateEmptyStripeBuffers allocates an empty StripeBuffers structure with the given
* column count.
*/
static StripeBuffers *
CreateEmptyStripeBuffers(uint32 stripeMaxRowCount, uint32 chunkRowCount,
uint32 columnCount)
{
uint32 columnIndex = 0;
uint32 maxChunkCount = (stripeMaxRowCount / chunkRowCount) + 1;
ColumnBuffers **columnBuffersArray = palloc0(columnCount * sizeof(ColumnBuffers *));
for (columnIndex = 0; columnIndex < columnCount; columnIndex++)
{
uint32 chunkIndex = 0;
ColumnChunkBuffers **chunkBuffersArray =
palloc0(maxChunkCount * sizeof(ColumnChunkBuffers *));
for (chunkIndex = 0; chunkIndex < maxChunkCount; chunkIndex++)
{
chunkBuffersArray[chunkIndex] = palloc0(sizeof(ColumnChunkBuffers));
chunkBuffersArray[chunkIndex]->existsBuffer = NULL;
chunkBuffersArray[chunkIndex]->valueBuffer = NULL;
chunkBuffersArray[chunkIndex]->valueCompressionType = COMPRESSION_NONE;
}
columnBuffersArray[columnIndex] = palloc0(sizeof(ColumnBuffers));
columnBuffersArray[columnIndex]->chunkBuffersArray = chunkBuffersArray;
}
StripeBuffers *stripeBuffers = palloc0(sizeof(StripeBuffers));
stripeBuffers->columnBuffersArray = columnBuffersArray;
stripeBuffers->columnCount = columnCount;
stripeBuffers->rowCount = 0;
return stripeBuffers;
}
/*
* CreateEmptyStripeSkipList allocates an empty StripeSkipList structure with
* the given column count. This structure has enough chunks to hold statistics
* for stripeMaxRowCount rows.
*/
static StripeSkipList *
CreateEmptyStripeSkipList(uint32 stripeMaxRowCount, uint32 chunkRowCount,
uint32 columnCount)
{
uint32 columnIndex = 0;
uint32 maxChunkCount = (stripeMaxRowCount / chunkRowCount) + 1;
ColumnChunkSkipNode **chunkSkipNodeArray =
palloc0(columnCount * sizeof(ColumnChunkSkipNode *));
for (columnIndex = 0; columnIndex < columnCount; columnIndex++)
{
chunkSkipNodeArray[columnIndex] =
palloc0(maxChunkCount * sizeof(ColumnChunkSkipNode));
}
StripeSkipList *stripeSkipList = palloc0(sizeof(StripeSkipList));
stripeSkipList->columnCount = columnCount;
stripeSkipList->chunkCount = 0;
stripeSkipList->chunkSkipNodeArray = chunkSkipNodeArray;
return stripeSkipList;
}
/*
* FlushStripe flushes current stripe data into the file. The function first ensures
* the last data chunk for each column is properly serialized and compressed. Then,
* the function creates the skip list and footer buffers. Finally, the function
* flushes the skip list, data, and footer buffers to the file.
*/
static void
FlushStripe(ColumnarWriteState *writeState)
{
uint32 columnIndex = 0;
uint32 chunkIndex = 0;
StripeBuffers *stripeBuffers = writeState->stripeBuffers;
StripeSkipList *stripeSkipList = writeState->stripeSkipList;
ColumnChunkSkipNode **columnSkipNodeArray = stripeSkipList->chunkSkipNodeArray;
TupleDesc tupleDescriptor = writeState->tupleDescriptor;
uint32 columnCount = tupleDescriptor->natts;
uint32 chunkCount = stripeSkipList->chunkCount;
uint32 chunkRowCount = writeState->options.chunkRowCount;
uint32 lastChunkIndex = stripeBuffers->rowCount / chunkRowCount;
uint32 lastChunkRowCount = stripeBuffers->rowCount % chunkRowCount;
uint64 stripeSize = 0;
uint64 stripeRowCount = stripeBuffers->rowCount;
elog(DEBUG1, "Flushing Stripe of size %d", stripeBuffers->rowCount);
Oid relationId = RelidByRelfilenumber(RelationTablespace_compat(
writeState->relfilelocator),
RelationPhysicalIdentifierNumber_compat(
writeState->relfilelocator));
Relation relation = relation_open(relationId, NoLock);
/*
* check if the last chunk needs serialization , the last chunk was not serialized
* if it was not full yet, e.g. (rowCount > 0)
*/
if (lastChunkRowCount > 0)
{
SerializeChunkData(writeState, lastChunkIndex, lastChunkRowCount);
}
/* update buffer sizes in stripe skip list */
for (columnIndex = 0; columnIndex < columnCount; columnIndex++)
{
ColumnChunkSkipNode *chunkSkipNodeArray = columnSkipNodeArray[columnIndex];
ColumnBuffers *columnBuffers = stripeBuffers->columnBuffersArray[columnIndex];
for (chunkIndex = 0; chunkIndex < chunkCount; chunkIndex++)
{
ColumnChunkBuffers *chunkBuffers =
columnBuffers->chunkBuffersArray[chunkIndex];
uint64 existsBufferSize = chunkBuffers->existsBuffer->len;
ColumnChunkSkipNode *chunkSkipNode = &chunkSkipNodeArray[chunkIndex];
chunkSkipNode->existsChunkOffset = stripeSize;
chunkSkipNode->existsLength = existsBufferSize;
stripeSize += existsBufferSize;
}
for (chunkIndex = 0; chunkIndex < chunkCount; chunkIndex++)
{
ColumnChunkBuffers *chunkBuffers =
columnBuffers->chunkBuffersArray[chunkIndex];
uint64 valueBufferSize = chunkBuffers->valueBuffer->len;
CompressionType valueCompressionType = chunkBuffers->valueCompressionType;
ColumnChunkSkipNode *chunkSkipNode = &chunkSkipNodeArray[chunkIndex];
chunkSkipNode->valueChunkOffset = stripeSize;
chunkSkipNode->valueLength = valueBufferSize;
chunkSkipNode->valueCompressionType = valueCompressionType;
chunkSkipNode->valueCompressionLevel = writeState->options.compressionLevel;
chunkSkipNode->decompressedValueSize = chunkBuffers->decompressedValueSize;
stripeSize += valueBufferSize;
}
}
StripeMetadata *stripeMetadata =
CompleteStripeReservation(relation, writeState->emptyStripeReservation->stripeId,
stripeSize, stripeRowCount, chunkCount);
uint64 currentFileOffset = stripeMetadata->fileOffset;
/*
* Each stripe has only one section:
* Data section, in which we store data for each column continuously.
* We store data for each for each column in chunks. For each chunk, we
* store two buffers: "exists" buffer, and "value" buffer. "exists" buffer
* tells which values are not NULL. "value" buffer contains values for
* present values. For each column, we first store all "exists" buffers,
* and then all "value" buffers.
*/
/* flush the data buffers */
for (columnIndex = 0; columnIndex < columnCount; columnIndex++)
{
ColumnBuffers *columnBuffers = stripeBuffers->columnBuffersArray[columnIndex];
for (chunkIndex = 0; chunkIndex < stripeSkipList->chunkCount; chunkIndex++)
{
ColumnChunkBuffers *chunkBuffers =
columnBuffers->chunkBuffersArray[chunkIndex];
StringInfo existsBuffer = chunkBuffers->existsBuffer;
ColumnarStorageWrite(relation, currentFileOffset,
existsBuffer->data, existsBuffer->len);
currentFileOffset += existsBuffer->len;
}
for (chunkIndex = 0; chunkIndex < stripeSkipList->chunkCount; chunkIndex++)
{
ColumnChunkBuffers *chunkBuffers =
columnBuffers->chunkBuffersArray[chunkIndex];
StringInfo valueBuffer = chunkBuffers->valueBuffer;
ColumnarStorageWrite(relation, currentFileOffset,
valueBuffer->data, valueBuffer->len);
currentFileOffset += valueBuffer->len;
}
}
SaveChunkGroups(writeState->relfilelocator,
stripeMetadata->id,
writeState->chunkGroupRowCounts);
SaveStripeSkipList(writeState->relfilelocator,
stripeMetadata->id,
stripeSkipList, tupleDescriptor);
writeState->chunkGroupRowCounts = NIL;
relation_close(relation, NoLock);
}
/*
* SerializeBoolArray serializes the given boolean array and returns the result
* as a StringInfo. This function packs every 8 boolean values into one byte.
*/
static StringInfo
SerializeBoolArray(bool *boolArray, uint32 boolArrayLength)
{
uint32 boolArrayIndex = 0;
uint32 byteCount = ((boolArrayLength * sizeof(bool)) + (8 - sizeof(bool))) / 8;
StringInfo boolArrayBuffer = makeStringInfo();
enlargeStringInfo(boolArrayBuffer, byteCount);
boolArrayBuffer->len = byteCount;
memset(boolArrayBuffer->data, 0, byteCount);
for (boolArrayIndex = 0; boolArrayIndex < boolArrayLength; boolArrayIndex++)
{
if (boolArray[boolArrayIndex])
{
uint32 byteIndex = boolArrayIndex / 8;
uint32 bitIndex = boolArrayIndex % 8;
boolArrayBuffer->data[byteIndex] |= (1 << bitIndex);
}
}
return boolArrayBuffer;
}
/*
* SerializeSingleDatum serializes the given datum value and appends it to the
* provided string info buffer.
*
* Since we don't want to limit datum buffer size to RSIZE_MAX unnecessarily,
* we use memcpy instead of memcpy_s several places in this function.
*/
static void
SerializeSingleDatum(StringInfo datumBuffer, Datum datum, bool datumTypeByValue,
int datumTypeLength, char datumTypeAlign)
{
uint32 datumLength = att_addlength_datum(0, datumTypeLength, datum);
uint32 datumLengthAligned = att_align_nominal(datumLength, datumTypeAlign);
enlargeStringInfo(datumBuffer, datumLengthAligned);
char *currentDatumDataPointer = datumBuffer->data + datumBuffer->len;
memset(currentDatumDataPointer, 0, datumLengthAligned);
if (datumTypeLength > 0)
{
if (datumTypeByValue)
{
store_att_byval(currentDatumDataPointer, datum, datumTypeLength);
}
else
{
memcpy(currentDatumDataPointer, DatumGetPointer(datum), datumTypeLength); /* IGNORE-BANNED */
}
}
else
{
Assert(!datumTypeByValue);
memcpy(currentDatumDataPointer, DatumGetPointer(datum), datumLength); /* IGNORE-BANNED */
}
datumBuffer->len += datumLengthAligned;
}
/*
* SerializeChunkData serializes and compresses chunk data at given chunk index with given
* compression type for every column.
*/
static void
SerializeChunkData(ColumnarWriteState *writeState, uint32 chunkIndex, uint32 rowCount)
{
uint32 columnIndex = 0;
StripeBuffers *stripeBuffers = writeState->stripeBuffers;
ChunkData *chunkData = writeState->chunkData;
CompressionType requestedCompressionType = writeState->options.compressionType;
int compressionLevel = writeState->options.compressionLevel;
const uint32 columnCount = stripeBuffers->columnCount;
StringInfo compressionBuffer = writeState->compressionBuffer;
writeState->chunkGroupRowCounts =
lappend_int(writeState->chunkGroupRowCounts, rowCount);
/* serialize exist values, data values are already serialized */
for (columnIndex = 0; columnIndex < columnCount; columnIndex++)
{
ColumnBuffers *columnBuffers = stripeBuffers->columnBuffersArray[columnIndex];
ColumnChunkBuffers *chunkBuffers = columnBuffers->chunkBuffersArray[chunkIndex];
chunkBuffers->existsBuffer =
SerializeBoolArray(chunkData->existsArray[columnIndex], rowCount);
}
/*
* check and compress value buffers, if a value buffer is not compressable
* then keep it as uncompressed, store compression information.
*/
for (columnIndex = 0; columnIndex < columnCount; columnIndex++)
{
ColumnBuffers *columnBuffers = stripeBuffers->columnBuffersArray[columnIndex];
ColumnChunkBuffers *chunkBuffers = columnBuffers->chunkBuffersArray[chunkIndex];
CompressionType actualCompressionType = COMPRESSION_NONE;
StringInfo serializedValueBuffer = chunkData->valueBufferArray[columnIndex];
Assert(requestedCompressionType >= 0 &&
requestedCompressionType < COMPRESSION_COUNT);
chunkBuffers->decompressedValueSize =
chunkData->valueBufferArray[columnIndex]->len;
/*
* if serializedValueBuffer is be compressed, update serializedValueBuffer
* with compressed data and store compression type.
*/
bool compressed = CompressBuffer(serializedValueBuffer, compressionBuffer,
requestedCompressionType,
compressionLevel);
if (compressed)
{
serializedValueBuffer = compressionBuffer;
actualCompressionType = requestedCompressionType;
}
/* store (compressed) value buffer */
chunkBuffers->valueCompressionType = actualCompressionType;
chunkBuffers->valueBuffer = CopyStringInfo(serializedValueBuffer);
/* valueBuffer needs to be reset for next chunk's data */
resetStringInfo(chunkData->valueBufferArray[columnIndex]);
}
}
/*
* UpdateChunkSkipNodeMinMax takes the given column value, and checks if this
* value falls outside the range of minimum/maximum values of the given column
* chunk skip node. If it does, the function updates the column chunk skip node
* accordingly.
*/
static void
UpdateChunkSkipNodeMinMax(ColumnChunkSkipNode *chunkSkipNode, Datum columnValue,
bool columnTypeByValue, int columnTypeLength,
Oid columnCollation, FmgrInfo *comparisonFunction)
{
bool hasMinMax = chunkSkipNode->hasMinMax;
Datum previousMinimum = chunkSkipNode->minimumValue;
Datum previousMaximum = chunkSkipNode->maximumValue;
Datum currentMinimum = 0;
Datum currentMaximum = 0;
/* if type doesn't have a comparison function, skip min/max values */
if (comparisonFunction == NULL)
{
return;
}
if (!hasMinMax)
{
currentMinimum = DatumCopy(columnValue, columnTypeByValue, columnTypeLength);
currentMaximum = DatumCopy(columnValue, columnTypeByValue, columnTypeLength);
}
else
{
Datum minimumComparisonDatum = FunctionCall2Coll(comparisonFunction,
columnCollation, columnValue,
previousMinimum);
Datum maximumComparisonDatum = FunctionCall2Coll(comparisonFunction,
columnCollation, columnValue,
previousMaximum);
int minimumComparison = DatumGetInt32(minimumComparisonDatum);
int maximumComparison = DatumGetInt32(maximumComparisonDatum);
if (minimumComparison < 0)
{
currentMinimum = DatumCopy(columnValue, columnTypeByValue, columnTypeLength);
}
else
{
currentMinimum = previousMinimum;
}
if (maximumComparison > 0)
{
currentMaximum = DatumCopy(columnValue, columnTypeByValue, columnTypeLength);
}
else
{
currentMaximum = previousMaximum;
}
}
chunkSkipNode->hasMinMax = true;
chunkSkipNode->minimumValue = currentMinimum;
chunkSkipNode->maximumValue = currentMaximum;
}
/* Creates a copy of the given datum. */
static Datum
DatumCopy(Datum datum, bool datumTypeByValue, int datumTypeLength)
{
Datum datumCopy = 0;
if (datumTypeByValue)
{
datumCopy = datum;
}
else
{
uint32 datumLength = att_addlength_datum(0, datumTypeLength, datum);
char *datumData = palloc0(datumLength);
/*
* We use IGNORE-BANNED here since we don't want to limit datum size to
* RSIZE_MAX unnecessarily.
*/
memcpy(datumData, DatumGetPointer(datum), datumLength); /* IGNORE-BANNED */
datumCopy = PointerGetDatum(datumData);
}
return datumCopy;
}
/*
* CopyStringInfo creates a deep copy of given source string allocating only needed
* amount of memory.
*/
static StringInfo
CopyStringInfo(StringInfo sourceString)
{
StringInfo targetString = palloc0(sizeof(StringInfoData));
if (sourceString->len > 0)
{
targetString->data = palloc0(sourceString->len);
targetString->len = sourceString->len;
targetString->maxlen = sourceString->len;
/*
* We use IGNORE-BANNED here since we don't want to limit string
* buffer size to RSIZE_MAX unnecessarily.
*/
memcpy(targetString->data, sourceString->data, sourceString->len); /* IGNORE-BANNED */
}
return targetString;
}
bool
ContainsPendingWrites(ColumnarWriteState *state)
{
return state->stripeBuffers != NULL && state->stripeBuffers->rowCount != 0;
}

View File

@ -1,32 +0,0 @@
/*-------------------------------------------------------------------------
*
* mod.c
*
* This file contains module-level definitions.
*
* Copyright (c) 2016, Citus Data, Inc.
*
* $Id$
*
*-------------------------------------------------------------------------
*/
#include "postgres.h"
#include "fmgr.h"
#include "citus_version.h"
#include "columnar/columnar.h"
#include "columnar/columnar_tableam.h"
PG_MODULE_MAGIC;
void _PG_init(void);
void
_PG_init(void)
{
columnar_init();
}

View File

@ -1 +0,0 @@
../../../vendor/safestringlib/safeclib/

View File

@ -1,32 +0,0 @@
-- add columnar objects back
ALTER EXTENSION citus_columnar ADD SCHEMA columnar;
ALTER EXTENSION citus_columnar ADD SCHEMA columnar_internal;
ALTER EXTENSION citus_columnar ADD SEQUENCE columnar_internal.storageid_seq;
ALTER EXTENSION citus_columnar ADD TABLE columnar_internal.options;
ALTER EXTENSION citus_columnar ADD TABLE columnar_internal.stripe;
ALTER EXTENSION citus_columnar ADD TABLE columnar_internal.chunk_group;
ALTER EXTENSION citus_columnar ADD TABLE columnar_internal.chunk;
ALTER EXTENSION citus_columnar ADD FUNCTION columnar_internal.columnar_handler;
ALTER EXTENSION citus_columnar ADD ACCESS METHOD columnar;
ALTER EXTENSION citus_columnar ADD FUNCTION pg_catalog.alter_columnar_table_set;
ALTER EXTENSION citus_columnar ADD FUNCTION pg_catalog.alter_columnar_table_reset;
ALTER EXTENSION citus_columnar ADD FUNCTION citus_internal.upgrade_columnar_storage;
ALTER EXTENSION citus_columnar ADD FUNCTION citus_internal.downgrade_columnar_storage;
ALTER EXTENSION citus_columnar ADD FUNCTION citus_internal.columnar_ensure_am_depends_catalog;
ALTER EXTENSION citus_columnar ADD FUNCTION columnar.get_storage_id;
ALTER EXTENSION citus_columnar ADD VIEW columnar.storage;
ALTER EXTENSION citus_columnar ADD VIEW columnar.options;
ALTER EXTENSION citus_columnar ADD VIEW columnar.stripe;
ALTER EXTENSION citus_columnar ADD VIEW columnar.chunk_group;
ALTER EXTENSION citus_columnar ADD VIEW columnar.chunk;
-- move citus_internal functions to columnar_internal
ALTER FUNCTION citus_internal.upgrade_columnar_storage(regclass) SET SCHEMA columnar_internal;
ALTER FUNCTION citus_internal.downgrade_columnar_storage(regclass) SET SCHEMA columnar_internal;
ALTER FUNCTION citus_internal.columnar_ensure_am_depends_catalog() SET SCHEMA columnar_internal;

View File

@ -1 +0,0 @@
-- fake sql file 'Y'

View File

@ -1,19 +0,0 @@
-- citus_columnar--11.1-1--11.2-1
#include "udfs/columnar_ensure_am_depends_catalog/11.2-1.sql"
DELETE FROM pg_depend
WHERE classid = 'pg_am'::regclass::oid
AND objid IN (select oid from pg_am where amname = 'columnar')
AND objsubid = 0
AND refclassid = 'pg_class'::regclass::oid
AND refobjid IN (
'columnar_internal.stripe_first_row_number_idx'::regclass::oid,
'columnar_internal.chunk_group_pkey'::regclass::oid,
'columnar_internal.chunk_pkey'::regclass::oid,
'columnar_internal.options_pkey'::regclass::oid,
'columnar_internal.stripe_first_row_number_idx'::regclass::oid,
'columnar_internal.stripe_pkey'::regclass::oid
)
AND refobjsubid = 0
AND deptype = 'n';

View File

@ -1,435 +0,0 @@
-- complain if script is sourced in psql, rather than via CREATE EXTENSION
\echo Use "CREATE EXTENSION citus_columnar" to load this file. \quit
-- columnar--9.5-1--10.0-1.sql
CREATE SCHEMA IF NOT EXISTS columnar;
SET search_path TO columnar;
CREATE SEQUENCE IF NOT EXISTS storageid_seq MINVALUE 10000000000 NO CYCLE;
CREATE TABLE IF NOT EXISTS options (
regclass regclass NOT NULL PRIMARY KEY,
chunk_group_row_limit int NOT NULL,
stripe_row_limit int NOT NULL,
compression_level int NOT NULL,
compression name NOT NULL
) WITH (user_catalog_table = true);
COMMENT ON TABLE options IS 'columnar table specific options, maintained by alter_columnar_table_set';
CREATE TABLE IF NOT EXISTS stripe (
storage_id bigint NOT NULL,
stripe_num bigint NOT NULL,
file_offset bigint NOT NULL,
data_length bigint NOT NULL,
column_count int NOT NULL,
chunk_row_count int NOT NULL,
row_count bigint NOT NULL,
chunk_group_count int NOT NULL,
first_row_number bigint NOT NULL,
PRIMARY KEY (storage_id, stripe_num),
CONSTRAINT stripe_first_row_number_idx UNIQUE (storage_id, first_row_number)
) WITH (user_catalog_table = true);
COMMENT ON TABLE stripe IS 'Columnar per stripe metadata';
CREATE TABLE IF NOT EXISTS chunk_group (
storage_id bigint NOT NULL,
stripe_num bigint NOT NULL,
chunk_group_num int NOT NULL,
row_count bigint NOT NULL,
PRIMARY KEY (storage_id, stripe_num, chunk_group_num)
);
COMMENT ON TABLE chunk_group IS 'Columnar chunk group metadata';
CREATE TABLE IF NOT EXISTS chunk (
storage_id bigint NOT NULL,
stripe_num bigint NOT NULL,
attr_num int NOT NULL,
chunk_group_num int NOT NULL,
minimum_value bytea,
maximum_value bytea,
value_stream_offset bigint NOT NULL,
value_stream_length bigint NOT NULL,
exists_stream_offset bigint NOT NULL,
exists_stream_length bigint NOT NULL,
value_compression_type int NOT NULL,
value_compression_level int NOT NULL,
value_decompressed_length bigint NOT NULL,
value_count bigint NOT NULL,
PRIMARY KEY (storage_id, stripe_num, attr_num, chunk_group_num)
) WITH (user_catalog_table = true);
COMMENT ON TABLE chunk IS 'Columnar per chunk metadata';
DO $proc$
BEGIN
-- from version 12 and up we have support for tableam's if installed on pg11 we can't
-- create the objects here. Instead we rely on citus_finish_pg_upgrade to be called by the
-- user instead to add the missing objects
IF substring(current_Setting('server_version'), '\d+')::int >= 12 THEN
EXECUTE $$
--#include "udfs/columnar_handler/10.0-1.sql"
CREATE OR REPLACE FUNCTION columnar.columnar_handler(internal)
RETURNS table_am_handler
LANGUAGE C
AS 'MODULE_PATHNAME', 'columnar_handler';
COMMENT ON FUNCTION columnar.columnar_handler(internal)
IS 'internal function returning the handler for columnar tables';
-- postgres 11.8 does not support the syntax for table am, also it is seemingly trying
-- to parse the upgrade file and erroring on unknown syntax.
-- normally this section would not execute on postgres 11 anyway. To trick it to pass on
-- 11.8 we wrap the statement in a plpgsql block together with an EXECUTE. This is valid
-- syntax on 11.8 and will execute correctly in 12
DO $create_table_am$
BEGIN
EXECUTE 'CREATE ACCESS METHOD columnar TYPE TABLE HANDLER columnar.columnar_handler';
END $create_table_am$;
--#include "udfs/alter_columnar_table_set/10.0-1.sql"
CREATE OR REPLACE FUNCTION pg_catalog.alter_columnar_table_set(
table_name regclass,
chunk_group_row_limit int DEFAULT NULL,
stripe_row_limit int DEFAULT NULL,
compression name DEFAULT null,
compression_level int DEFAULT NULL)
RETURNS void
LANGUAGE C
AS 'MODULE_PATHNAME', 'alter_columnar_table_set';
COMMENT ON FUNCTION pg_catalog.alter_columnar_table_set(
table_name regclass,
chunk_group_row_limit int,
stripe_row_limit int,
compression name,
compression_level int)
IS 'set one or more options on a columnar table, when set to NULL no change is made';
--#include "udfs/alter_columnar_table_reset/10.0-1.sql"
CREATE OR REPLACE FUNCTION pg_catalog.alter_columnar_table_reset(
table_name regclass,
chunk_group_row_limit bool DEFAULT false,
stripe_row_limit bool DEFAULT false,
compression bool DEFAULT false,
compression_level bool DEFAULT false)
RETURNS void
LANGUAGE C
AS 'MODULE_PATHNAME', 'alter_columnar_table_reset';
COMMENT ON FUNCTION pg_catalog.alter_columnar_table_reset(
table_name regclass,
chunk_group_row_limit bool,
stripe_row_limit bool,
compression bool,
compression_level bool)
IS 'reset on or more options on a columnar table to the system defaults';
$$;
END IF;
END$proc$;
-- (this function being dropped in 10.0.3)->#include "udfs/columnar_ensure_objects_exist/10.0-1.sql"
RESET search_path;
-- columnar--10.0.-1 --10.0.2
GRANT USAGE ON SCHEMA columnar TO PUBLIC;
GRANT SELECT ON ALL tables IN SCHEMA columnar TO PUBLIC ;
-- columnar--10.0-3--10.1-1.sql
-- Drop foreign keys between columnar metadata tables.
-- columnar--10.1-1--10.2-1.sql
-- For a proper mapping between tid & (stripe, row_num), add a new column to
-- columnar.stripe and define a BTREE index on this column.
-- Also include storage_id column for per-relation scans.
-- Populate first_row_number column of columnar.stripe table.
--
-- For simplicity, we calculate MAX(row_count) value across all the stripes
-- of all the columanar tables and then use it to populate first_row_number
-- column. This would introduce some gaps however we are okay with that since
-- it's already the case with regular INSERT/COPY's.
DO $$
DECLARE
max_row_count bigint;
-- this should be equal to columnar_storage.h/COLUMNAR_FIRST_ROW_NUMBER
COLUMNAR_FIRST_ROW_NUMBER constant bigint := 1;
BEGIN
SELECT MAX(row_count) INTO max_row_count FROM columnar.stripe;
UPDATE columnar.stripe SET first_row_number = COLUMNAR_FIRST_ROW_NUMBER +
(stripe_num - 1) * max_row_count;
END;
$$;
-- columnar--10.2-1--10.2-2.sql
-- revoke read access for columnar.chunk from unprivileged
-- user as it contains chunk min/max values
REVOKE SELECT ON columnar.chunk FROM PUBLIC;
-- columnar--10.2-2--10.2-3.sql
-- Since stripe_first_row_number_idx is required to scan a columnar table, we
-- need to make sure that it is created before doing anything with columnar
-- tables during pg upgrades.
--
-- However, a plain btree index is not a dependency of a table, so pg_upgrade
-- cannot guarantee that stripe_first_row_number_idx gets created when
-- creating columnar.stripe, unless we make it a unique "constraint".
--
-- To do that, drop stripe_first_row_number_idx and create a unique
-- constraint with the same name to keep the code change at minimum.
-- columnar--10.2-3--10.2-4.sql
-- columnar--11.0-2--11.1-1.sql
CREATE OR REPLACE FUNCTION pg_catalog.alter_columnar_table_set(
table_name regclass,
chunk_group_row_limit int DEFAULT NULL,
stripe_row_limit int DEFAULT NULL,
compression name DEFAULT null,
compression_level int DEFAULT NULL)
RETURNS void
LANGUAGE plpgsql AS
$alter_columnar_table_set$
declare
noop BOOLEAN := true;
cmd TEXT := 'ALTER TABLE ' || table_name::text || ' SET (';
begin
if (chunk_group_row_limit is not null) then
if (not noop) then cmd := cmd || ', '; end if;
cmd := cmd || 'columnar.chunk_group_row_limit=' || chunk_group_row_limit;
noop := false;
end if;
if (stripe_row_limit is not null) then
if (not noop) then cmd := cmd || ', '; end if;
cmd := cmd || 'columnar.stripe_row_limit=' || stripe_row_limit;
noop := false;
end if;
if (compression is not null) then
if (not noop) then cmd := cmd || ', '; end if;
cmd := cmd || 'columnar.compression=' || compression;
noop := false;
end if;
if (compression_level is not null) then
if (not noop) then cmd := cmd || ', '; end if;
cmd := cmd || 'columnar.compression_level=' || compression_level;
noop := false;
end if;
cmd := cmd || ')';
if (not noop) then
execute cmd;
end if;
return;
end;
$alter_columnar_table_set$;
COMMENT ON FUNCTION pg_catalog.alter_columnar_table_set(
table_name regclass,
chunk_group_row_limit int,
stripe_row_limit int,
compression name,
compression_level int)
IS 'set one or more options on a columnar table, when set to NULL no change is made';
CREATE OR REPLACE FUNCTION pg_catalog.alter_columnar_table_reset(
table_name regclass,
chunk_group_row_limit bool DEFAULT false,
stripe_row_limit bool DEFAULT false,
compression bool DEFAULT false,
compression_level bool DEFAULT false)
RETURNS void
LANGUAGE plpgsql AS
$alter_columnar_table_reset$
declare
noop BOOLEAN := true;
cmd TEXT := 'ALTER TABLE ' || table_name::text || ' RESET (';
begin
if (chunk_group_row_limit) then
if (not noop) then cmd := cmd || ', '; end if;
cmd := cmd || 'columnar.chunk_group_row_limit';
noop := false;
end if;
if (stripe_row_limit) then
if (not noop) then cmd := cmd || ', '; end if;
cmd := cmd || 'columnar.stripe_row_limit';
noop := false;
end if;
if (compression) then
if (not noop) then cmd := cmd || ', '; end if;
cmd := cmd || 'columnar.compression';
noop := false;
end if;
if (compression_level) then
if (not noop) then cmd := cmd || ', '; end if;
cmd := cmd || 'columnar.compression_level';
noop := false;
end if;
cmd := cmd || ')';
if (not noop) then
execute cmd;
end if;
return;
end;
$alter_columnar_table_reset$;
COMMENT ON FUNCTION pg_catalog.alter_columnar_table_reset(
table_name regclass,
chunk_group_row_limit bool,
stripe_row_limit bool,
compression bool,
compression_level bool)
IS 'reset on or more options on a columnar table to the system defaults';
-- rename columnar schema to columnar_internal and tighten security
REVOKE ALL PRIVILEGES ON ALL TABLES IN SCHEMA columnar FROM PUBLIC;
ALTER SCHEMA columnar RENAME TO columnar_internal;
REVOKE ALL PRIVILEGES ON SCHEMA columnar_internal FROM PUBLIC;
-- create columnar schema with public usage privileges
CREATE SCHEMA columnar;
GRANT USAGE ON SCHEMA columnar TO PUBLIC;
--#include "udfs/upgrade_columnar_storage/10.2-1.sql"
CREATE OR REPLACE FUNCTION columnar_internal.upgrade_columnar_storage(rel regclass)
RETURNS VOID
STRICT
LANGUAGE c AS 'MODULE_PATHNAME', $$upgrade_columnar_storage$$;
COMMENT ON FUNCTION columnar_internal.upgrade_columnar_storage(regclass)
IS 'function to upgrade the columnar storage, if necessary';
--#include "udfs/downgrade_columnar_storage/10.2-1.sql"
CREATE OR REPLACE FUNCTION columnar_internal.downgrade_columnar_storage(rel regclass)
RETURNS VOID
STRICT
LANGUAGE c AS 'MODULE_PATHNAME', $$downgrade_columnar_storage$$;
COMMENT ON FUNCTION columnar_internal.downgrade_columnar_storage(regclass)
IS 'function to downgrade the columnar storage, if necessary';
-- update UDF to account for columnar_internal schema
CREATE OR REPLACE FUNCTION columnar_internal.columnar_ensure_am_depends_catalog()
RETURNS void
LANGUAGE plpgsql
SET search_path = pg_catalog
AS $func$
BEGIN
INSERT INTO pg_depend
WITH columnar_schema_members(relid) AS (
SELECT pg_class.oid AS relid FROM pg_class
WHERE relnamespace =
COALESCE(
(SELECT pg_namespace.oid FROM pg_namespace WHERE nspname = 'columnar_internal'),
(SELECT pg_namespace.oid FROM pg_namespace WHERE nspname = 'columnar')
)
AND relname IN ('chunk',
'chunk_group',
'chunk_group_pkey',
'chunk_pkey',
'options',
'options_pkey',
'storageid_seq',
'stripe',
'stripe_first_row_number_idx',
'stripe_pkey')
)
SELECT -- Define a dependency edge from "columnar table access method" ..
'pg_am'::regclass::oid as classid,
(select oid from pg_am where amname = 'columnar') as objid,
0 as objsubid,
-- ... to each object that is registered to pg_class and that lives
-- in "columnar" schema. That contains catalog tables, indexes
-- created on them and the sequences created in "columnar" schema.
--
-- Given the possibility of user might have created their own objects
-- in columnar schema, we explicitly specify list of objects that we
-- are interested in.
'pg_class'::regclass::oid as refclassid,
columnar_schema_members.relid as refobjid,
0 as refobjsubid,
'n' as deptype
FROM columnar_schema_members
-- Avoid inserting duplicate entries into pg_depend.
EXCEPT TABLE pg_depend;
END;
$func$;
COMMENT ON FUNCTION columnar_internal.columnar_ensure_am_depends_catalog()
IS 'internal function responsible for creating dependencies from columnar '
'table access method to the rel objects in columnar schema';
SELECT columnar_internal.columnar_ensure_am_depends_catalog();
-- add utility function
CREATE FUNCTION columnar.get_storage_id(regclass) RETURNS bigint
LANGUAGE C STRICT
AS 'citus_columnar', $$columnar_relation_storageid$$;
-- create views for columnar table information
CREATE VIEW columnar.storage WITH (security_barrier) AS
SELECT c.oid::regclass AS relation,
columnar.get_storage_id(c.oid) AS storage_id
FROM pg_class c, pg_am am
WHERE c.relam = am.oid AND am.amname = 'columnar'
AND pg_has_role(c.relowner, 'USAGE');
COMMENT ON VIEW columnar.storage IS 'Columnar relation ID to storage ID mapping.';
GRANT SELECT ON columnar.storage TO PUBLIC;
CREATE VIEW columnar.options WITH (security_barrier) AS
SELECT regclass AS relation, chunk_group_row_limit,
stripe_row_limit, compression, compression_level
FROM columnar_internal.options o, pg_class c
WHERE o.regclass = c.oid
AND pg_has_role(c.relowner, 'USAGE');
COMMENT ON VIEW columnar.options
IS 'Columnar options for tables on which the current user has ownership privileges.';
GRANT SELECT ON columnar.options TO PUBLIC;
CREATE VIEW columnar.stripe WITH (security_barrier) AS
SELECT relation, storage.storage_id, stripe_num, file_offset, data_length,
column_count, chunk_row_count, row_count, chunk_group_count, first_row_number
FROM columnar_internal.stripe stripe, columnar.storage storage
WHERE stripe.storage_id = storage.storage_id;
COMMENT ON VIEW columnar.stripe
IS 'Columnar stripe information for tables on which the current user has ownership privileges.';
GRANT SELECT ON columnar.stripe TO PUBLIC;
CREATE VIEW columnar.chunk_group WITH (security_barrier) AS
SELECT relation, storage.storage_id, stripe_num, chunk_group_num, row_count
FROM columnar_internal.chunk_group cg, columnar.storage storage
WHERE cg.storage_id = storage.storage_id;
COMMENT ON VIEW columnar.chunk_group
IS 'Columnar chunk group information for tables on which the current user has ownership privileges.';
GRANT SELECT ON columnar.chunk_group TO PUBLIC;
CREATE VIEW columnar.chunk WITH (security_barrier) AS
SELECT relation, storage.storage_id, stripe_num, attr_num, chunk_group_num,
minimum_value, maximum_value, value_stream_offset, value_stream_length,
exists_stream_offset, exists_stream_length, value_compression_type,
value_compression_level, value_decompressed_length, value_count
FROM columnar_internal.chunk chunk, columnar.storage storage
WHERE chunk.storage_id = storage.storage_id;
COMMENT ON VIEW columnar.chunk
IS 'Columnar chunk information for tables on which the current user has ownership privileges.';
GRANT SELECT ON columnar.chunk TO PUBLIC;

View File

@ -1 +0,0 @@
-- citus_columnar--11.2-1--11.3-1

View File

@ -1 +0,0 @@
-- citus_columnar--11.3-1--12.2-1

View File

@ -1,5 +0,0 @@
-- columnar--10.0-1--10.0-2.sql
-- grant read access for columnar metadata tables to unprivileged user
GRANT USAGE ON SCHEMA columnar TO PUBLIC;
GRANT SELECT ON ALL tables IN SCHEMA columnar TO PUBLIC ;

View File

@ -1,22 +0,0 @@
-- columnar--10.0-3--10.1-1.sql
-- Drop foreign keys between columnar metadata tables.
-- Postgres assigns different names to those foreign keys in PG11, so act accordingly.
DO $proc$
BEGIN
IF substring(current_Setting('server_version'), '\d+')::int >= 12 THEN
EXECUTE $$
ALTER TABLE columnar.chunk DROP CONSTRAINT chunk_storage_id_stripe_num_chunk_group_num_fkey;
ALTER TABLE columnar.chunk_group DROP CONSTRAINT chunk_group_storage_id_stripe_num_fkey;
$$;
ELSE
EXECUTE $$
ALTER TABLE columnar.chunk DROP CONSTRAINT chunk_storage_id_fkey;
ALTER TABLE columnar.chunk_group DROP CONSTRAINT chunk_group_storage_id_fkey;
$$;
END IF;
END$proc$;
-- since we dropped pg11 support, we don't need to worry about missing
-- columnar objects when upgrading postgres
DROP FUNCTION citus_internal.columnar_ensure_objects_exist();

View File

@ -1,32 +0,0 @@
-- columnar--10.1-1--10.2-1.sql
-- For a proper mapping between tid & (stripe, row_num), add a new column to
-- columnar.stripe and define a BTREE index on this column.
-- Also include storage_id column for per-relation scans.
ALTER TABLE columnar.stripe ADD COLUMN first_row_number bigint;
CREATE INDEX stripe_first_row_number_idx ON columnar.stripe USING BTREE(storage_id, first_row_number);
-- Populate first_row_number column of columnar.stripe table.
--
-- For simplicity, we calculate MAX(row_count) value across all the stripes
-- of all the columanar tables and then use it to populate first_row_number
-- column. This would introduce some gaps however we are okay with that since
-- it's already the case with regular INSERT/COPY's.
DO $$
DECLARE
max_row_count bigint;
-- this should be equal to columnar_storage.h/COLUMNAR_FIRST_ROW_NUMBER
COLUMNAR_FIRST_ROW_NUMBER constant bigint := 1;
BEGIN
SELECT MAX(row_count) INTO max_row_count FROM columnar.stripe;
UPDATE columnar.stripe SET first_row_number = COLUMNAR_FIRST_ROW_NUMBER +
(stripe_num - 1) * max_row_count;
END;
$$;
#include "udfs/upgrade_columnar_storage/10.2-1.sql"
#include "udfs/downgrade_columnar_storage/10.2-1.sql"
-- upgrade storage for all columnar relations
PERFORM citus_internal.upgrade_columnar_storage(c.oid) FROM pg_class c, pg_am a
WHERE c.relam = a.oid AND amname = 'columnar';

View File

@ -1,5 +0,0 @@
-- columnar--10.2-1--10.2-2.sql
-- revoke read access for columnar.chunk from unprivileged
-- user as it contains chunk min/max values
REVOKE SELECT ON columnar.chunk FROM PUBLIC;

Some files were not shown because too many files have changed in this diff Show More