Set up prettier for formatting

We have an `.editorconfig` file, but it is not enforced. Here comes prettier
to the rescue.
Branch: prettier-setup
Hanefi Onaldi, 2021-12-22 18:39:39 +03:00
parent 479b2da740, commit 90f05f8a0e
17 changed files with 415 additions and 386 deletions
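
For anyone reviewing this locally, the whole commit is mechanical: one prettier run over the repo. A minimal sketch of reproducing it, assuming prettier is installed from npm (the commit itself does not pin a version):

```bash
# Install prettier; any recent release should produce the same kind of churn
npm install --global prettier

# List the files prettier would rewrite, honoring the new .prettierignore
prettier --list-different .

# Rewrite them in place; this is what the new `make prettier` target runs
prettier --write .
```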


@@ -6,10 +6,9 @@ orbs:
 parameters:
   image_suffix:
     type: string
-    default: "-v2021_10_27"
+    default: '-v2021_10_27'
 jobs:
   build:
     description: Build the citus extension
     parameters:
@@ -267,17 +266,16 @@ jobs:
           name: 'Save core dumps'
           path: /tmp/core_dumps
       - store_artifacts:
-          name: "Save logfiles"
+          name: 'Save logfiles'
           path: src/test/regress/tmp_citus_test/logfiles
       - codecov/upload:
           flags: 'test_<< parameters.pg_major >>,upgrade'
   test-citus-upgrade:
     description: Runs citus upgrade tests
     parameters:
       pg_major:
-        description: "postgres major version"
+        description: 'postgres major version'
         type: integer
       image:
         description: 'docker image to use as for the tests'
@@ -360,7 +358,7 @@ jobs:
     description: Runs the common tests of citus
     parameters:
       pg_major:
-        description: "postgres major version"
+        description: 'postgres major version'
         type: integer
       image:
         description: 'docker image to use as for the tests'
@@ -370,7 +368,7 @@ jobs:
         description: 'docker image tag to use'
         type: string
       make:
-        description: "make target"
+        description: 'make target'
         type: string
     docker:
       - image: '<< parameters.image >>:<< parameters.image_tag >><< pipeline.parameters.image_suffix >>'
@@ -436,7 +434,7 @@ jobs:
     description: Runs tap tests for citus
     parameters:
       pg_major:
-        description: "postgres major version"
+        description: 'postgres major version'
         type: integer
       image:
         description: 'docker image to use as for the tests'
@@ -449,7 +447,7 @@ jobs:
         description: 'name of the tap test suite to run'
         type: string
       make:
-        description: "make target"
+        description: 'make target'
         type: string
         default: installcheck
     docker:
@@ -541,7 +539,6 @@ workflows:
   version: 2
   build_and_test:
     jobs:
       - check-merge-to-enterprise:
           filters:
             branches:


@@ -5,14 +5,14 @@ codecov:
 coverage:
   precision: 2
   round: down
-  range: "70...100"
+  range: '70...100'
 ignore:
-  - "src/backend/distributed/utils/citus_outfuncs.c"
-  - "src/backend/distributed/deparser/ruleutils_*.c"
-  - "src/include/distributed/citus_nodes.h"
-  - "src/backend/distributed/safeclib"
-  - "vendor"
+  - 'src/backend/distributed/utils/citus_outfuncs.c'
+  - 'src/backend/distributed/deparser/ruleutils_*.c'
+  - 'src/include/distributed/citus_nodes.h'
+  - 'src/backend/distributed/safeclib'
+  - 'vendor'
 status:
   project:
@@ -35,6 +35,6 @@ parsers:
     macro: no
 comment:
-  layout: "header, diff"
+  layout: 'header, diff'
   behavior: default
   require_changes: no

.prettierignore (new file, 12 lines)

@@ -0,0 +1,12 @@
+# ignore C files that are already linted with citus_indent
+*.h
+*.c
+
+# Generated files that should not be linted
+Pipfile.lock
+
+# Packaging infra requires a strict CHANGELOG format
+CHANGELOG.md
+
+# vendor files that are copied
+vendor/safestringlib/README.md

.prettierrc (new file, 3 lines)

@@ -0,0 +1,3 @@
+{
+  "singleQuote": true
+}
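
The config is deliberately tiny: `singleQuote` is the only deviation from Prettier's defaults, and it is what produces the `"` → `'` churn in the YAML and Markdown hunks elsewhere in this commit. To sanity-check how the config resolves for a given file, a sketch using standard prettier CLI flags:

```bash
# Print which config file governs a path (should print .prettierrc)
prettier --find-config-path codecov.yml

# Preview the formatted output on stdout without touching the file
prettier codecov.yml
```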


@@ -2,9 +2,9 @@
 We're happy you want to contribute! You can help us in different ways:
-* Open an [issue](https://github.com/citusdata/citus/issues) with
+- Open an [issue](https://github.com/citusdata/citus/issues) with
   suggestions for improvements
-* Fork this repository and submit a pull request
+- Fork this repository and submit a pull request
 Before accepting any code contributions we ask that contributors
 sign a Contributor License Agreement (CLA). For an explanation of
@@ -175,6 +175,7 @@ created this stable snapshot of the function definition for your version you
 should use it in your actual sql file, e.g.
 `src/backend/distributed/sql/citus--8.3-1--9.0-1.sql`. You do this by using C
 style `#include` statements like this:
+
 ```
 #include "udfs/myudf/9.0-1.sql"
 ```


@@ -44,7 +44,9 @@ reindent:
 	${citus_abs_top_srcdir}/ci/fix_style.sh
 check-style:
 	cd ${citus_abs_top_srcdir} && citus_indent --quiet --check
-.PHONY: reindent check-style
+prettier:
+	prettier --write .
+.PHONY: reindent check-style prettier
 # depend on install-all so that downgrade scripts are installed as well
 check: all install-all
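
With this target in place, a full local formatting pass is two commands (a sketch; it assumes `prettier` is already on `PATH`, since the Makefile does not install it). Note the division of labor: `.prettierignore` excludes `*.c` and `*.h`, so prettier and `citus_indent` never rewrite the same files.

```bash
make reindent   # C sources, via citus_indent / ci/fix_style.sh
make prettier   # YAML, Markdown, JSON, and friends, via `prettier --write .`
```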


@@ -85,6 +85,7 @@ sudo apt-get -y install postgresql-14-citus-10.2
 ```
 Install packages on CentOS / Fedora / Red Hat:
+
 ```bash
 curl https://install.citusdata.com/community/rpm.sh > add-citus-repo.sh
 sudo bash add-citus-repo.sh
@@ -101,7 +102,8 @@ After restarting PostgreSQL, connect using `psql` and run:
 ```sql
 CREATE EXTENSION citus;
-````
+```
+
 You’re now ready to get started and use Citus tables on a single node.
 ### Install Citus on multiple nodes
@@ -315,7 +317,6 @@ Data in distributed tables is stored in “shards”, which are actually just re
 When you send a query in which all (co-located) distributed tables have the same filter on the distribution column, Citus will automatically detect that and send the whole query to the worker node that stores the data. That way, arbitrarily complex queries are supported with minimal routing overhead, which is especially useful for scaling transactional workloads. If queries do not have a specific filter, each shard is queried in parallel, which is especially useful in analytical workloads. The Citus distributed executor is adaptive and is designed to handle both query types at the same time on the same system under high concurrency, which enables large-scale mixed workloads.
-
 ## When to use Citus
 Citus is uniquely capable of scaling both analytical and transactional workloads with up to petabytes of data. Use cases in which Citus is commonly used:
@@ -330,7 +331,7 @@ Citus is uniquely capable of scaling both analytical and transactional workloads
 - **[Time series data](http://docs.citusdata.com/en/stable/use_cases/timeseries.html)**:
   Citus enables you to process and analyze very large amounts of time series data. The biggest Citus clusters store well over a petabyte of time series data and ingest terabytes per day.
-  Citus integrates seamlessly with [Postgres table partitioning](https://www.postgresql.org/docs/current/ddl-partitioning.html) and [pg_partman](https://www.citusdata.com/blog/2018/01/24/citus-and-pg-partman-creating-a-scalable-time-series-database-on-PostgreSQL/), which can speed up queries and writes on time series tables. You can take advantage of Citus’s parallel, distributed query engine for fast analytical queries, and use the built-in *columnar storage* to compress old partitions.
+  Citus integrates seamlessly with [Postgres table partitioning](https://www.postgresql.org/docs/current/ddl-partitioning.html) and [pg_partman](https://www.citusdata.com/blog/2018/01/24/citus-and-pg-partman-creating-a-scalable-time-series-database-on-PostgreSQL/), which can speed up queries and writes on time series tables. You can take advantage of Citus’s parallel, distributed query engine for fast analytical queries, and use the built-in _columnar storage_ to compress old partitions.
 Example users: [MixRank](https://www.citusdata.com/customers/mixrank), [Windows team](https://techcommunity.microsoft.com/t5/azure-database-for-postgresql/architecting-petabyte-scale-analytics-by-scaling-out-postgres-on/ba-p/969685)
@@ -368,6 +369,6 @@ Citus is built on and of open source, and we welcome your contributions. The [CO
 - **Videos**: Check out this [YouTube playlist](https://www.youtube.com/playlist?list=PLixnExCn6lRq261O0iwo4ClYxHpM9qfVy) of some of our favorite Citus videos and demos. If you want to deep dive into how Citus extends PostgreSQL, you might want to check out Marco Slot’s talk at Carnegie Mellon titled [Citus: Distributed PostgreSQL as an Extension](https://youtu.be/X-aAgXJZRqM) that was part of Andy Pavlo’s Vaccination Database Talks series at CMUDB.
 - **Our other Postgres projects**: Our team also works on other awesome PostgreSQL open source extensions & projects, including: [pg_cron](https://github.com/citusdata/pg_cron), [HyperLogLog](https://github.com/citusdata/postgresql-hll), [TopN](https://github.com/citusdata/postgresql-topn), [pg_auto_failover](https://github.com/citusdata/pg_auto_failover), [activerecord-multi-tenant](https://github.com/citusdata/activerecord-multi-tenant), and [django-multitenant](https://github.com/citusdata/django-multitenant).
-___
+---
 Copyright © Citus Data, Inc.


@@ -42,6 +42,5 @@
       "version": "0.0.1",
       "DevelopmentDependency": false
     }
   ]
 }


@@ -5,6 +5,7 @@ standards. Be sure you have followed the setup in the [Following our coding
 conventions](https://github.com/citusdata/citus/blob/master/CONTRIBUTING.md#following-our-coding-conventions)
 section of `CONTRIBUTING.md`. Once you've done that, most of them should be
 fixed automatically, when running:
+
 ```
 make reindent
 ```
@@ -30,9 +31,11 @@ risk for buffer overflows. This page lists the Microsoft suggested replacements:
 https://liquid.microsoft.com/Web/Object/Read/ms.security/Requirements/Microsoft.Security.SystemsADM.10082#guide
 These replacements are only available on Windows normally. Since we build for
 Linux we make most of them available with this header file:
+
 ```c
 #include "distributed/citus_safe_lib.h"
 ```
+
 This uses https://github.com/intel/safestringlib to provide them.
 However, still not all of them are available. For those cases we provide
@@ -40,6 +43,7 @@ some extra functions in `citus_safe_lib.h`, with similar functionality.
 If none of those replacements match your requirements you have to do one of the
 following:
+
 1. Add a replacement to `citus_safe_lib.{c,h}` that handles the same error cases
    as the `{func_name}_s` function that Microsoft suggests.
 2. Add a `/* IGNORE-BANNED */` comment to the line that complains. Doing this
@@ -76,6 +80,7 @@ follow the below steps.
 Before continuing with the real steps make sure you have done the following
 (this only needs to be done once):
+
 1. You have enabled `git rerere` globally or in your enterprise repo
    ([docs](https://git-scm.com/docs/git-rerere), [very useful blog](https://medium.com/@porteneuve/fix-conflicts-only-once-with-git-rerere-7d116b2cec67#.3vui844dt)):
 ```bash
@@ -88,13 +93,13 @@ Before continuing with the real steps make sure you have done the following
 2. You have set up the `community` remote on your enterprise as
    [described in CONTRIBUTING.md](https://github.com/citusdata/citus-enterprise/blob/enterprise-master/CONTRIBUTING.md#merging-community-changes-onto-enterprise).
 #### Important notes on `git rerere`
 This is very useful as it will make sure git will automatically redo merges that
 you have done before. However, this has a downside too. It will also redo merges
 that you did, but that were incorrect. To work around this you can use these
 commands.
 1. Make `git rerere` forget a merge:
 ```bash
 git rerere forget <badly_merged_file>
@@ -130,6 +135,7 @@ git pull # Make sure your local enterprise-master is up to date
 git fetch community # Fetch your up to date branch name
 git checkout -b "$PR_BRANCH" enterprise-master
 ```
+
 Now you have X in your enterprise repo, which we refer to as
 `enterprise/$PR_BRANCH` (even though in git commands you would reference it as
 `origin/$PR_BRANCH`). This branch is currently the same as `enterprise-master`.
@@ -139,6 +145,7 @@ should apply without any merge conflicts:
 ```bash
 git merge community/master
 ```
+
 Now you need to merge `community/$PR_BRANCH` to `enterprise/$PR_BRANCH`. Solve
 any conflicts and make sure to remove any parts that should not be in enterprise
 even though it doesn't have a conflict, on the enterprise repository:
@@ -169,8 +176,7 @@ The subsequent PRs on community will be able to pass the
 So there's one issue that can occur. Your branch will become outdated with
 master and you have to make it up to date. There are two ways to do this, using
-`git merge` or `git rebase`. As usual, `git merge` is a bit easier than `git
-rebase`, but clutters git history. This section will explain both. If you don't
+`git merge` or `git rebase`. As usual, `git merge` is a bit easier than `git rebase`, but clutters git history. This section will explain both. If you don't
 know which one makes the most sense, start with `git rebase`. It's possible that
 for whatever reason this doesn't work or becomes very complex, for instance when
 new merge conflicts appear. Feel free to fall back to `git merge` in that case,
@@ -204,6 +210,7 @@ Automatic merge might have failed with the above command. However, because of
 `git rerere` it should have re-applied your original merge resolution. If this
 is indeed the case it should show something like this in the output of the
 previous command (note the `Resolved ...` line):
+
 ```
 CONFLICT (content): Merge conflict in <file_path>
 Resolved '<file_path>' using previous resolution.
@@ -213,6 +220,7 @@ Error redoing merge <merge_sha>
 Confirm that the merge conflict is indeed resolved correctly. In that case you
 can do the following:
+
 ```bash
 # Add files that were conflicting
 git add "$(git diff --name-only --diff-filter=U)"
@@ -222,11 +230,13 @@ git rebase --continue
 Before pushing you should do a final check that the commit hash of your final
 non-merge commit matches the commit hash that's on the community repo. If that's
 not the case, you should fall back to the `git merge` approach.
+
 ```bash
 git reset origin/$PR_BRANCH --hard
 ```
 If the commit hashes were as expected, push the branch:
+
 ```bash
 git push origin $PR_BRANCH --force-with-lease
 ```
@@ -236,6 +246,7 @@ git push origin $PR_BRANCH --force-with-lease
 If you are falling back to the `git merge` approach after trying the
 `git rebase` approach, you should first restore the original branch on the
 community repo.
+
 ```bash
 git checkout $PR_BRANCH
 git reset ${PR_BRANCH}-backup --hard
@@ -272,6 +283,7 @@ different.
 A test should always be included in a schedule file, otherwise it will not be
 run in CI. This is most commonly forgotten for newly added tests. In that case
 the dev ran it locally without running a full schedule with something like:
+
 ```bash
 make -C src/test/regress/ check-minimal EXTRA_TESTS='multi_create_table_new_features'
 ```
@@ -288,9 +300,11 @@ section in this `README.md` file and that they include `ci/ci_helpers.sh`.
 We do not use C-style comments in migration files as the stripped
 zero-length migration files cause warnings during packaging.
 Instead use SQL-style comments, i.e.:
+
 ```
 -- this is a comment
 ```
+
 See [#3115](https://github.com/citusdata/citus/pull/3115) for more info.
 ## `disallow_hash_comments_in_spec_files.sh`
@@ -298,6 +312,7 @@ See [#3115](https://github.com/citusdata/citus/pull/3115) for more info.
 We do not use comments starting with # in spec files because they create errors
 from the C preprocessor, which expects directives after this character.
 Instead use C-style comments, i.e.:
+
 ```
 // this is a single line comment
@@ -329,13 +344,16 @@ because we are running the tests in a slightly different configuration.
 This script tries to make sure that we don't add useless declarations to our
 code. What it effectively does is replace this:
+
 ```c
 int a = 0;
 int b = 2;
 Assert(b == 2);
 a = b + b;
 ```
+
 With this equivalent, but shorter version:
+
 ```c
 int b = 2;
 Assert(b == 2);
@@ -349,6 +367,7 @@ definitely possible there's a bug in there. So far no bad ones have been found.
 A known issue is that it does not replace code in a block after an `#ifdef` like
 this.
+
 ```c
 int foo = 0;
 #ifdef SOMETHING
@@ -357,6 +376,7 @@ foo = 1
 #else
 foo = 2
 #endif
 ```
+
 This was deemed to be error prone and not worth the effort.
 ## `fix_gitignore.sh`


@@ -7,14 +7,14 @@ reduce IO requirements through compression and projection pushdown.
 Existing PostgreSQL row tables work well for OLTP:
-* Support `UPDATE`/`DELETE` efficiently
-* Efficient single-tuple lookups
+- Support `UPDATE`/`DELETE` efficiently
+- Efficient single-tuple lookups
 The Citus Columnar tables work best for analytic or DW workloads:
-* Compression
-* Doesn't read unnecessary columns
-* Efficient `VACUUM`
+- Compression
+- Doesn't read unnecessary columns
+- Efficient `VACUUM`
 # Next generation of cstore_fdw
@@ -23,47 +23,45 @@ Citus Columnar is the next generation of
 Benefits of Citus Columnar over cstore_fdw:
-* Citus Columnar is based on the [Table Access Method
+- Citus Columnar is based on the [Table Access Method
   API](https://www.postgresql.org/docs/current/tableam.html), which
   allows it to behave exactly like an ordinary heap (row) table for
   most operations.
-* Supports Write-Ahead Log (WAL).
-* Supports ``ROLLBACK``.
-* Supports physical replication.
-* Supports recovery, including Point-In-Time Restore (PITR).
-* Supports ``pg_dump`` and ``pg_upgrade`` without the need for special
+- Supports Write-Ahead Log (WAL).
+- Supports `ROLLBACK`.
+- Supports physical replication.
+- Supports recovery, including Point-In-Time Restore (PITR).
+- Supports `pg_dump` and `pg_upgrade` without the need for special
   options or extra steps.
-* Better user experience; simple ``USING`` clause.
-* Supports more features that work on ordinary heap (row) tables.
+- Better user experience; simple `USING` clause.
+- Supports more features that work on ordinary heap (row) tables.
 # Limitations
-* Append-only (no ``UPDATE``/``DELETE`` support)
-* No space reclamation (e.g. rolled-back transactions may still
+- Append-only (no `UPDATE`/`DELETE` support)
+- No space reclamation (e.g. rolled-back transactions may still
   consume disk space)
-* No bitmap index scans
-* No tidscans
-* No sample scans
-* No TOAST support (large values supported inline)
-* No support for [``ON
-  CONFLICT``](https://www.postgresql.org/docs/12/sql-insert.html#SQL-ON-CONFLICT)
-  statements (except ``DO NOTHING`` actions with no target specified).
-* No support for tuple locks (``SELECT ... FOR SHARE``, ``SELECT
-  ... FOR UPDATE``)
-* No support for serializable isolation level
-* Support for PostgreSQL server versions 12+ only
-* No support for foreign keys, unique constraints, or exclusion
+- No bitmap index scans
+- No tidscans
+- No sample scans
+- No TOAST support (large values supported inline)
+- No support for [`ON CONFLICT`](https://www.postgresql.org/docs/12/sql-insert.html#SQL-ON-CONFLICT)
+  statements (except `DO NOTHING` actions with no target specified).
+- No support for tuple locks (`SELECT ... FOR SHARE`, `SELECT ... FOR UPDATE`)
+- No support for serializable isolation level
+- Support for PostgreSQL server versions 12+ only
+- No support for foreign keys, unique constraints, or exclusion
   constraints
-* No support for logical decoding
-* No support for intra-node parallel scans
-* No support for ``AFTER ... FOR EACH ROW`` triggers
-* No `UNLOGGED` columnar tables
+- No support for logical decoding
+- No support for intra-node parallel scans
+- No support for `AFTER ... FOR EACH ROW` triggers
+- No `UNLOGGED` columnar tables
 Future iterations will incrementally lift the limitations listed above.
 # User Experience
-Create a Columnar table by specifying ``USING columnar`` when creating
+Create a Columnar table by specifying `USING columnar` when creating
 the table.
 ```sql
@@ -80,8 +78,7 @@ CREATE TABLE my_columnar_table
 Insert data into the table and read from it like normal (subject to
 the limitations listed above).
-To see internal statistics about the table, use ``VACUUM
-VERBOSE``. Note that ``VACUUM`` (without ``FULL``) is much faster on a
+To see internal statistics about the table, use `VACUUM VERBOSE`. Note that `VACUUM` (without `FULL`) is much faster on a
 columnar table, because it scans only the metadata, and not the actual
 data.
@@ -109,19 +106,19 @@ SELECT alter_columnar_table_set(
 The following options are available:
-* **compression**: `[none|pglz|zstd|lz4|lz4hc]` - set the compression type
+- **compression**: `[none|pglz|zstd|lz4|lz4hc]` - set the compression type
   for _newly-inserted_ data. Existing data will not be
   recompressed/decompressed. The default value is `zstd` (if support
   has been compiled in).
-* **compression_level**: ``<integer>`` - Sets compression level. Valid
+- **compression_level**: `<integer>` - Sets compression level. Valid
   settings are from 1 through 19. If the compression method does not
   support the level chosen, the closest level will be selected
   instead.
-* **stripe_row_limit**: ``<integer>`` - the maximum number of rows per
+- **stripe_row_limit**: `<integer>` - the maximum number of rows per
   stripe for _newly-inserted_ data. Existing stripes of data will not
   be changed and may have more rows than this maximum value. The
   default value is `150000`.
-* **chunk_group_row_limit**: ``<integer>`` - the maximum number of rows per
+- **chunk_group_row_limit**: `<integer>` - the maximum number of rows per
   chunk for _newly-inserted_ data. Existing chunks of data will not be
   changed and may have more rows than this maximum value. The default
   value is `10000`.
@@ -135,13 +132,13 @@ SELECT * FROM columnar.options;
 You can also adjust options with a `SET` command of one of the
 following GUCs:
-* `columnar.compression`
-* `columnar.compression_level`
-* `columnar.stripe_row_limit`
-* `columnar.chunk_group_row_limit`
-GUCs only affect newly-created *tables*, not any newly-created
-*stripes* on an existing table.
+- `columnar.compression`
+- `columnar.compression_level`
+- `columnar.stripe_row_limit`
+- `columnar.chunk_group_row_limit`
+GUCs only affect newly-created _tables_, not any newly-created
+_stripes_ on an existing table.
 ## Partitioning
@@ -172,20 +169,19 @@ INSERT INTO parent VALUES ('2020-03-15', 30, 300, 'three thousand'); -- row
 When performing operations on a partitioned table with a mix of row
 and columnar partitions, take note of the following behaviors for
 operations that are supported on row tables but not columnar
-(e.g. ``UPDATE``, ``DELETE``, tuple locks, etc.):
-* If the operation is targeted at a specific row partition
-  (e.g. ``UPDATE p2 SET i = i + 1``), it will succeed; if targeted at
-  a specified columnar partition (e.g. ``UPDATE p1 SET i = i + 1``),
+(e.g. `UPDATE`, `DELETE`, tuple locks, etc.):
+- If the operation is targeted at a specific row partition
+  (e.g. `UPDATE p2 SET i = i + 1`), it will succeed; if targeted at
+  a specified columnar partition (e.g. `UPDATE p1 SET i = i + 1`),
   it will fail.
-* If the operation is targeted at the partitioned table and has a
-  ``WHERE`` clause that excludes all columnar partitions
-  (e.g. ``UPDATE parent SET i = i + 1 WHERE ts = '2020-03-15'``), it
+- If the operation is targeted at the partitioned table and has a
+  `WHERE` clause that excludes all columnar partitions
+  (e.g. `UPDATE parent SET i = i + 1 WHERE ts = '2020-03-15'`), it
   will succeed.
-* If the operation is targeted at the partitioned table, but does not
+- If the operation is targeted at the partitioned table, but does not
   exclude all columnar partitions, it will fail; even if the actual
-  data to be updated only affects row tables (e.g. ``UPDATE parent SET
-  i = i + 1 WHERE n = 300``).
+  data to be updated only affects row tables (e.g. `UPDATE parent SET i = i + 1 WHERE n = 300`).
 Note that Citus Columnar supports `btree` and `hash` indexes (and
 the constraints requiring them) but does not support `gist`, `gin`,
@@ -207,7 +203,7 @@ ALTER TABLE p2 ADD UNIQUE (n);
 Note: ensure that you understand any advanced features that may be
 used with the table before converting it (e.g. row-level security,
 storage options, constraints, inheritance, etc.), and ensure that they
-are reproduced in the new table or partition appropriately. ``LIKE``,
+are reproduced in the new table or partition appropriately. `LIKE`,
 used below, is a shorthand that works only in simple cases.
 ```sql
@@ -221,7 +217,7 @@ SELECT alter_table_set_access_method('my_table', 'heap');
 # Performance Microbenchmark
-*Important*: This microbenchmark is not intended to represent any real
+_Important_: This microbenchmark is not intended to represent any real
 workload. Compression ratios, and therefore performance, will depend
 heavily on the specific workload. This is only for the purpose of
 illustrating a "columnar friendly" contrived workload that showcases
@@ -299,19 +295,19 @@ total row count: 75000000, stripe count: 500, average rows per stripe: 150000
 chunk count: 60000, containing data for dropped columns: 0, zstd compressed: 60000
 ```
-``VACUUM VERBOSE`` reports a smaller compression ratio, because it
+`VACUUM VERBOSE` reports a smaller compression ratio, because it
 only averages the compression ratio of the individual chunks, and does
 not account for the metadata savings of the columnar format.
 ## System
-* Azure VM: Standard D2s v3 (2 vcpus, 8 GiB memory)
-* Linux (ubuntu 18.04)
-* Data Drive: Standard HDD (512GB, 500 IOPS Max, 60 MB/s Max)
-* PostgreSQL 13 (``--with-llvm``, ``--with-python``)
-* ``shared_buffers = 128MB``
-* ``max_parallel_workers_per_gather = 0``
-* ``jit = on``
+- Azure VM: Standard D2s v3 (2 vcpus, 8 GiB memory)
+- Linux (ubuntu 18.04)
+- Data Drive: Standard HDD (512GB, 500 IOPS Max, 60 MB/s Max)
+- PostgreSQL 13 (`--with-llvm`, `--with-python`)
+- `shared_buffers = 128MB`
+- `max_parallel_workers_per_gather = 0`
+- `jit = on`
 Note: because this was run on a system with enough physical memory to
 hold a substantial fraction of the table, the IO benefits of columnar
@@ -332,6 +328,7 @@ SELECT vendor_id, SUM(quantity) FROM perf_columnar GROUP BY vendor_id OFFSET 100
 ```
 Timing (median of three runs):
-* row: 436s
-* columnar: 16s
-* speedup: **27X**
+
+- row: 436s
+- columnar: 16s
+- speedup: **27X**


@@ -8,7 +8,7 @@ heavily focused on distributed tables. Instead of having all commands in `tablec
 they are often moved to files that are named after the command.
 | File | Description |
-|------------------------------|-------------|
+| ---------------------------- | ----------- |
 | `create_distributed_table.c` | Implementation of UDF's for creating distributed tables |
 | `drop_distributed_table.c` | Implementation for dropping metadata for partitions of distributed tables |
 | `extension.c` | Implementation of `CREATE EXTENSION` commands for citus specific checks |


@@ -6,7 +6,6 @@ If the input query is trivial (e.g., no joins, no subqueries/ctes, single table
 Distributed planning (`CreateDistributedPlan`) tries several different methods to plan the query:
-
 1. Fast-path router planner, proceed if the query prunes down to a single shard of a single table
 2. Router planner, proceed if the query prunes down to a single set of co-located shards
 3. Modification planning, proceed if the query is a DML command and all joins are co-located
@@ -19,7 +18,6 @@ By examining the query tree, if we can decide that the query hits only a single
 As the name reveals, this can be considered a sub-item of the Router planner described below. The only difference is that the fast-path planner doesn't rely on `standard_planner()` for collecting restriction information.
-
 ## Router planner
 During the call to `standard_planner`, Postgres calls a hook named `multi_relation_restriction_hook`. We use this hook to determine explicit and implicit filters on (occurrences of) distributed tables. We apply shard pruning to all tables using the filters in `PlanRouterQuery`. If all tables prune down to a single shard and all those shards are on the same node, then the query is router plannable, meaning it can be fully executed by one of the worker nodes.
@@ -85,6 +83,6 @@ If the `INSERT ... SELECT` query can be planned by pushing it down to the worker nodes
 If the query can not be pushed down to the worker nodes, two different approaches can be followed depending on whether ON CONFLICT or RETURNING clauses are used.
-* If `ON CONFLICT` or `RETURNING` are not used, Citus uses the `COPY` command to handle such queries. After planning the `SELECT` part of the `INSERT ... SELECT` query, including subqueries and CTEs, it executes the plan and sends results back to the DestReceiver which is created using the target table info.
-* Since the `COPY` command supports neither `ON CONFLICT` nor `RETURNING` clauses, Citus performs `INSERT ... SELECT` queries with an `ON CONFLICT` or `RETURNING` clause in two phases. First, Citus plans the `SELECT` part of the query, executes the plan and saves the results to an intermediate table which is colocated with the target table of the `INSERT ... SELECT` query. Then, the `INSERT ... SELECT` query is directly run on the worker node using the intermediate table as the source table.
+- If `ON CONFLICT` or `RETURNING` are not used, Citus uses the `COPY` command to handle such queries. After planning the `SELECT` part of the `INSERT ... SELECT` query, including subqueries and CTEs, it executes the plan and sends results back to the DestReceiver which is created using the target table info.
+- Since the `COPY` command supports neither `ON CONFLICT` nor `RETURNING` clauses, Citus performs `INSERT ... SELECT` queries with an `ON CONFLICT` or `RETURNING` clause in two phases. First, Citus plans the `SELECT` part of the query, executes the plan and saves the results to an intermediate table which is colocated with the target table of the `INSERT ... SELECT` query. Then, the `INSERT ... SELECT` query is directly run on the worker node using the intermediate table as the source table.


@@ -1,4 +1,3 @@
 # How to trigger hammerdb benchmark jobs
-
 You can trigger two types of hammerdb benchmark jobs:


@@ -1,11 +1,11 @@
-Contributing
-============
+# Contributing
 For each message we wish to capture, we have a class definition in `structs.py`.
 If there is a new network message that is not yet parsed by our proxy, check the Postgres documentation [here](https://www.postgresql.org/docs/current/protocol-message-formats.html) for the message format and add a new class definition.
 Room for improvement:
 - Anonymize network dumps by removing shard/placement/transaction ids
 - Occasionally changes in our codebase introduce new messages that contain parts that should be anonymized
 - Add missing message format definitions


@@ -1,5 +1,4 @@
-Automated Failure testing
-=========================
+# Automated Failure testing
 Automated Failure Testing works by inserting a network proxy (mitmproxy) between the Citus coordinator and one of the workers (connections to the other worker are left unchanged). The proxy is configurable, and sits on a fifo waiting for commands. When it receives a command over the fifo it reconfigures itself and sends back a response. Regression tests which use automated failure testing communicate with mitmproxy by running special UDFs which talk to said fifo. The tests send commands such as "fail any connection which contains the string `COMMIT`" and then run SQL queries and assert that the coordinator has reasonable behavior when the specified failures occur.
@@ -14,7 +13,6 @@ Automated Failure Testing works by inserting a network proxy (mitmproxy) between
 - [Chaining](#chaining)
 - [Recording Network Traffic](#recording-network-traffic)
-
 ## Getting Started
 First off, to use this you'll need mitmproxy; I recommend version `7.0.4`, and I also recommend running it with `python 3.9`. This script integrates pretty deeply with mitmproxy so other versions might fail to work.
@@ -58,14 +56,17 @@ Again, the specific port numbers depend on your setup.
 ### Using Failure Test Helpers
 In a psql front-end run
+
 ```psql
 # \i src/test/regress/sql/failure_test_helpers.sql
 ```
 > **_NOTE:_** To make the script above work, start psql as follows
+>
 > ```bash
 > psql -p9700 --variable=worker_2_port=9702
 > ```
+>
 > Assuming the coordinator is running on 9700 and worker 2 (which is going to be intercepted) runs on 9702
 The above file creates some UDFs and also disables a few citus features which make connections in the background.
@@ -104,9 +105,8 @@ Command strings specify a pipeline. Each connection is handled individually, and
 There are 5 actions you can take on connections:
 | Action | Description |
-|:--------------------|:------------|
+| :----------------- | :---------- |
 | `conn.allow()` | the default, allows all connections to execute unmodified |
 | `conn.kill()` | kills all connections immediately after the first packet is sent |
 | `conn.reset()` | `kill()` calls `shutdown(SHUT_WR)`, `shutdown(SHUT_RD)`, `close()`. This is a very graceful way to close the socket. `reset()` causes a RST packet to be sent and forces the connection closed in something more resembling an error. |


@@ -1,4 +1,4 @@
-In this folder, all tests which are in the format of '*_add.spec' are organized
+In this folder, all tests which are in the format of '\*\_add.spec' are organized
 according to a specific format.
 You should use `//` in mx files, not `#`. We preprocess mx files with `cpp` to


@@ -8,19 +8,19 @@ In the interest of fostering an open and welcoming environment, we as contributors
 Examples of behavior that contributes to creating a positive environment include:
-* Using welcoming and inclusive language
-* Being respectful of differing viewpoints and experiences
-* Gracefully accepting constructive criticism
-* Focusing on what is best for the community
-* Showing empathy towards other community members
+- Using welcoming and inclusive language
+- Being respectful of differing viewpoints and experiences
+- Gracefully accepting constructive criticism
+- Focusing on what is best for the community
+- Showing empathy towards other community members
 Examples of unacceptable behavior by participants include:
-* The use of sexualized language or imagery and unwelcome sexual attention or advances
-* Trolling, insulting/derogatory comments, and personal or political attacks
-* Public or private harassment
-* Publishing others' private information, such as a physical or electronic address, without explicit permission
-* Other conduct which could reasonably be considered inappropriate in a professional setting
+- The use of sexualized language or imagery and unwelcome sexual attention or advances
+- Trolling, insulting/derogatory comments, and personal or political attacks
+- Public or private harassment
+- Publishing others' private information, such as a physical or electronic address, without explicit permission
+- Other conduct which could reasonably be considered inappropriate in a professional setting
 ## Our Responsibilities