mirror of https://github.com/citusdata/citus.git

Setup prettier for formatting (branch: prettier-setup)

We have an .editorconfig file that is not enforced. Here comes prettier to the rescue.

parent 479b2da740
commit 90f05f8a0e
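
A minimal sketch of the resulting workflow (assuming prettier is installed via npm; the commit itself does not pin an installation method):

```bash
npm install -g prettier   # assumption: any install that puts prettier on PATH works
make prettier             # new Makefile target below; runs `prettier --write .`
prettier --check .        # non-mutating variant, handy as a CI gate
```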

.circleci/config.yml

@@ -6,10 +6,9 @@ orbs:
 parameters:
   image_suffix:
     type: string
-    default: "-v2021_10_27"
+    default: '-v2021_10_27'

 jobs:
-
   build:
     description: Build the citus extension
     parameters:
@@ -267,17 +266,16 @@ jobs:
           name: 'Save core dumps'
           path: /tmp/core_dumps
       - store_artifacts:
-          name: "Save logfiles"
+          name: 'Save logfiles'
           path: src/test/regress/tmp_citus_test/logfiles
       - codecov/upload:
           flags: 'test_<< parameters.pg_major >>,upgrade'

-
   test-citus-upgrade:
     description: Runs citus upgrade tests
     parameters:
       pg_major:
-        description: "postgres major version"
+        description: 'postgres major version'
         type: integer
       image:
         description: 'docker image to use as for the tests'
@@ -360,7 +358,7 @@ jobs:
     description: Runs the common tests of citus
     parameters:
       pg_major:
-        description: "postgres major version"
+        description: 'postgres major version'
         type: integer
       image:
         description: 'docker image to use as for the tests'
@@ -370,7 +368,7 @@ jobs:
         description: 'docker image tag to use'
         type: string
       make:
-        description: "make target"
+        description: 'make target'
         type: string
     docker:
       - image: '<< parameters.image >>:<< parameters.image_tag >><< pipeline.parameters.image_suffix >>'
@@ -436,7 +434,7 @@ jobs:
     description: Runs tap tests for citus
     parameters:
       pg_major:
-        description: "postgres major version"
+        description: 'postgres major version'
         type: integer
       image:
         description: 'docker image to use as for the tests'
@@ -449,7 +447,7 @@ jobs:
         description: 'name of the tap test suite to run'
         type: string
       make:
-        description: "make target"
+        description: 'make target'
         type: string
         default: installcheck
     docker:
@@ -541,7 +539,6 @@ workflows:
   version: 2
   build_and_test:
     jobs:
-
       - check-merge-to-enterprise:
           filters:
             branches:
@@ -807,21 +804,21 @@ workflows:
           old_pg_major: 12
           new_pg_major: 13
           image_tag: '12.8-13.4-14.0'
-          requires: [build-12,build-13]
+          requires: [build-12, build-13]

       - test-pg-upgrade:
           name: 'test-12-14_check-pg-upgrade'
           old_pg_major: 12
           new_pg_major: 14
           image_tag: '12.8-13.4-14.0'
-          requires: [build-12,build-14]
+          requires: [build-12, build-14]

       - test-pg-upgrade:
           name: 'test-13-14_check-pg-upgrade'
           old_pg_major: 13
           new_pg_major: 14
           image_tag: '12.8-13.4-14.0'
-          requires: [build-13,build-14]
+          requires: [build-13, build-14]

       - test-citus-upgrade:
           name: test-12_check-citus-upgrade

.codecov.yml (14 lines changed)

@@ -5,14 +5,14 @@ codecov:
 coverage:
   precision: 2
   round: down
-  range: "70...100"
+  range: '70...100'

 ignore:
-  - "src/backend/distributed/utils/citus_outfuncs.c"
-  - "src/backend/distributed/deparser/ruleutils_*.c"
-  - "src/include/distributed/citus_nodes.h"
-  - "src/backend/distributed/safeclib"
-  - "vendor"
+  - 'src/backend/distributed/utils/citus_outfuncs.c'
+  - 'src/backend/distributed/deparser/ruleutils_*.c'
+  - 'src/include/distributed/citus_nodes.h'
+  - 'src/backend/distributed/safeclib'
+  - 'vendor'

 status:
   project:
@@ -35,6 +35,6 @@ parsers:
     macro: no

 comment:
-  layout: "header, diff"
+  layout: 'header, diff'
   behavior: default
   require_changes: no

.prettierignore (new file)

@@ -0,0 +1,12 @@
+# ignore C files that are already linted with citus_indent
+*.h
+*.c
+
+# Generated files that should not be linted
+Pipfile.lock
+
+# Packaging infra requires a strict CHANGELOG format
+CHANGELOG.md
+
+# vendor files that are copied
+vendor/safestringlib/README.md

.prettierrc (new file)

@@ -0,0 +1,3 @@
+{
+  "singleQuote": true
+}
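
As an illustration (a sketch of prettier's documented `singleQuote` behavior, not something added by the commit): with `"singleQuote": true`, prettier rewrites double-quoted strings to single quotes wherever that introduces no extra escaping, which is exactly the change repeated throughout the YAML hunks above and below.

```yaml
# prettier's default quoting
range: "70...100"
# with { "singleQuote": true } in .prettierrc
range: '70...100'
```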

CONTRIBUTING.md (127 lines changed)

@@ -2,9 +2,9 @@

 We're happy you want to contribute! You can help us in different ways:

-* Open an [issue](https://github.com/citusdata/citus/issues) with
-  suggestions for improvements
-* Fork this repository and submit a pull request
+- Open an [issue](https://github.com/citusdata/citus/issues) with
+  suggestions for improvements
+- Fork this repository and submit a pull request

 Before accepting any code contributions we ask that contributors
 sign a Contributor License Agreement (CLA). For an explanation of
@@ -18,95 +18,95 @@ why we ask this as well as instructions for how to proceed, see the
 1. Install Xcode
 2. Install packages with Homebrew

-  ```bash
-  brew update
-  brew install git postgresql python
-  ```
+   ```bash
+   brew update
+   brew install git postgresql python
+   ```

 3. Get, build, and test the code

-  ```bash
-  git clone https://github.com/citusdata/citus.git
-
-  cd citus
-  ./configure
-  make
-  make install
-  cd src/test/regress
-  make check
-  ```
+   ```bash
+   git clone https://github.com/citusdata/citus.git
+
+   cd citus
+   ./configure
+   make
+   make install
+   cd src/test/regress
+   make check
+   ```

 #### Debian-based Linux (Ubuntu, Debian)

 1. Install build dependencies

-  ```bash
-  echo "deb http://apt.postgresql.org/pub/repos/apt/ $(lsb_release -cs)-pgdg main" | \
-       sudo tee /etc/apt/sources.list.d/pgdg.list
-  wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | \
-       sudo apt-key add -
-  sudo apt-get update
-
-  sudo apt-get install -y postgresql-server-dev-14 postgresql-14 \
-                          autoconf flex git libcurl4-gnutls-dev libicu-dev \
-                          libkrb5-dev liblz4-dev libpam0g-dev libreadline-dev \
-                          libselinux1-dev libssl-dev libxslt1-dev libzstd-dev \
-                          make uuid-dev
-  ```
+   ```bash
+   echo "deb http://apt.postgresql.org/pub/repos/apt/ $(lsb_release -cs)-pgdg main" | \
+        sudo tee /etc/apt/sources.list.d/pgdg.list
+   wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | \
+        sudo apt-key add -
+   sudo apt-get update
+
+   sudo apt-get install -y postgresql-server-dev-14 postgresql-14 \
+                           autoconf flex git libcurl4-gnutls-dev libicu-dev \
+                           libkrb5-dev liblz4-dev libpam0g-dev libreadline-dev \
+                           libselinux1-dev libssl-dev libxslt1-dev libzstd-dev \
+                           make uuid-dev
+   ```

 2. Get, build, and test the code

-  ```bash
-  git clone https://github.com/citusdata/citus.git
-  cd citus
-  ./configure
-  make
-  sudo make install
-  cd src/test/regress
-  make check
-  ```
+   ```bash
+   git clone https://github.com/citusdata/citus.git
+   cd citus
+   ./configure
+   make
+   sudo make install
+   cd src/test/regress
+   make check
+   ```

 #### Red Hat-based Linux (RHEL, CentOS, Fedora)

 1. Find the RPM URL for your repo at [yum.postgresql.org](http://yum.postgresql.org/repopackages.php)
 2. Register its contents with Yum:

-  ```bash
-  sudo yum install -y <url>
-  ```
+   ```bash
+   sudo yum install -y <url>
+   ```

 3. Register EPEL and SCL repositories for your distro.

-  On CentOS:
+   On CentOS:

-  ```bash
-  yum install -y centos-release-scl-rh epel-release
-  ```
+   ```bash
+   yum install -y centos-release-scl-rh epel-release
+   ```

-  On RHEL, see [this RedHat blog post](https://developers.redhat.com/blog/2018/07/07/yum-install-gcc7-clang/) to install set-up SCL first. Then run:
+   On RHEL, see [this RedHat blog post](https://developers.redhat.com/blog/2018/07/07/yum-install-gcc7-clang/) to install set-up SCL first. Then run:

-  ```bash
-  yum install -y epel-release
-  ```
+   ```bash
+   yum install -y epel-release
+   ```

 4. Install build dependencies

-  ```bash
-  sudo yum update -y
-  sudo yum groupinstall -y 'Development Tools'
-  sudo yum install -y postgresql14-devel postgresql14-server \
-                      git libcurl-devel libxml2-devel libxslt-devel \
-                      libzstd-devel llvm-toolset-7-clang llvm5.0 lz4-devel \
-                      openssl-devel pam-devel readline-devel
-
-  git clone https://github.com/citusdata/citus.git
-  cd citus
-  PG_CONFIG=/usr/pgsql-14/bin/pg_config ./configure
-  make
-  sudo make install
-  cd src/test/regress
-  make check
-  ```
+   ```bash
+   sudo yum update -y
+   sudo yum groupinstall -y 'Development Tools'
+   sudo yum install -y postgresql14-devel postgresql14-server \
+                       git libcurl-devel libxml2-devel libxslt-devel \
+                       libzstd-devel llvm-toolset-7-clang llvm5.0 lz4-devel \
+                       openssl-devel pam-devel readline-devel
+
+   git clone https://github.com/citusdata/citus.git
+   cd citus
+   PG_CONFIG=/usr/pgsql-14/bin/pg_config ./configure
+   make
+   sudo make install
+   cd src/test/regress
+   make check
+   ```

 ### Following our coding conventions

@@ -175,6 +175,7 @@ created this stable snapshot of the function definition for your version you
 should use it in your actual sql file, e.g.
 `src/backend/distributed/sql/citus--8.3-1--9.0-1.sql`. You do this by using C
 style `#include` statements like this:
+
 ```
 #include "udfs/myudf/9.0-1.sql"
 ```

Makefile (4 lines changed)

@@ -44,7 +44,9 @@ reindent:
 	${citus_abs_top_srcdir}/ci/fix_style.sh
 check-style:
 	cd ${citus_abs_top_srcdir} && citus_indent --quiet --check
-.PHONY: reindent check-style
+prettier:
+	prettier --write .
+.PHONY: reindent check-style prettier

 # depend on install-all so that downgrade scripts are installed as well
 check: all install-all
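
The new target assumes a `prettier` binary on `PATH`; the commit adds no package manifest for it. A hypothetical check-only counterpart, mirroring the existing `check-style` target, could look like:

```make
# hypothetical companion target, not part of this commit
check-prettier:
	prettier --check .
.PHONY: check-prettier
```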

README.md (93 lines changed)

@@ -10,10 +10,10 @@ Citus is a [PostgreSQL extension](https://www.citusdata.com/blog/2017/10/25/what

 With Citus, you extend your PostgreSQL database with new superpowers:

-- **Distributed tables** are sharded across a cluster of PostgreSQL nodes to combine their CPU, memory, storage and I/O capacity.
-- **References tables** are replicated to all nodes for joins and foreign keys from distributed tables and maximum read performance.
-- **Distributed query engine** routes and parallelizes SELECT, DML, and other operations on distributed tables across the cluster.
-- **Columnar storage** compresses data, speeds up scans, and supports fast projections, both on regular and distributed tables.
+- **Distributed tables** are sharded across a cluster of PostgreSQL nodes to combine their CPU, memory, storage and I/O capacity.
+- **References tables** are replicated to all nodes for joins and foreign keys from distributed tables and maximum read performance.
+- **Distributed query engine** routes and parallelizes SELECT, DML, and other operations on distributed tables across the cluster.
+- **Columnar storage** compresses data, speeds up scans, and supports fast projections, both on regular and distributed tables.

 You can use these Citus superpowers to make your Postgres database scale-out ready on a single Citus node. Or you can build a large cluster capable of handling **high transaction throughputs**, especially in **multi-tenant apps**, run **fast analytical queries**, and process large amounts of **time series** or **IoT data** for **real-time analytics**. When your data size and volume grow, you can easily add more worker nodes to the cluster and rebalance the shards.

@@ -23,15 +23,15 @@ Our [SIGMOD '21](https://2021.sigmod.org/) paper [Citus: Distributed PostgreSQL

 Since Citus is an extension to Postgres, you can use Citus with the latest Postgres versions. And Citus works seamlessly with the PostgreSQL tools and extensions you are already familiar with.

-- [Why Citus?](#why-citus)
-- [Getting Started](#getting-started)
-- [Using Citus](#using-citus)
-- [Documentation](#documentation)
-- [Architecture](#architecture)
-- [When to Use Citus](#when-to-use-citus)
-- [Need Help?](#need-help)
-- [Contributing](#contributing)
-- [Stay Connected](#stay-connected)
+- [Why Citus?](#why-citus)
+- [Getting Started](#getting-started)
+- [Using Citus](#using-citus)
+- [Documentation](#documentation)
+- [Architecture](#architecture)
+- [When to Use Citus](#when-to-use-citus)
+- [Need Help?](#need-help)
+- [Contributing](#contributing)
+- [Stay Connected](#stay-connected)

 ## Why Citus?

@@ -85,6 +85,7 @@ sudo apt-get -y install postgresql-14-citus-10.2
 ```

 Install packages on CentOS / Fedora / Red Hat:
+
 ```bash
 curl https://install.citusdata.com/community/rpm.sh > add-citus-repo.sh
 sudo bash add-citus-repo.sh
@@ -101,7 +102,8 @@ After restarting PostgreSQL, connect using `psql` and run:

 ```sql
 CREATE EXTENSION citus;
-````
+```
+
 You’re now ready to get started and use Citus tables on a single node.

 ### Install Citus on multiple nodes
@@ -220,7 +222,7 @@ WHERE device_type_id = 55;
 Time: 209.961 ms
 ```

-Co-location also helps you scale [INSERT..SELECT]( https://docs.citusdata.com/en/stable/articles/aggregation.html), [stored procedures]( https://www.citusdata.com/blog/2020/11/21/making-postgres-stored-procedures-9x-faster-in-citus/), and [distributed transactions](https://www.citusdata.com/blog/2017/06/02/scaling-complex-sql-transactions/).
+Co-location also helps you scale [INSERT..SELECT](https://docs.citusdata.com/en/stable/articles/aggregation.html), [stored procedures](https://www.citusdata.com/blog/2020/11/21/making-postgres-stored-procedures-9x-faster-in-citus/), and [distributed transactions](https://www.citusdata.com/blog/2017/06/02/scaling-complex-sql-transactions/).

 ### Creating Reference Tables

@@ -295,7 +297,7 @@ CREATE TABLE events_row AS SELECT * FROM events_columnar;

 You can use columnar storage by itself, or in a distributed table to combine the benefits of compression and the distributed query engine.

-When using columnar storage, you should only load data in batch using `COPY` or `INSERT..SELECT` to achieve good compression. Update, delete, and foreign keys are currently unsupported on columnar tables. However, you can use partitioned tables in which newer partitions use row-based storage, and older partitions are compressed using columnar storage.
+When using columnar storage, you should only load data in batch using `COPY` or `INSERT..SELECT` to achieve good compression. Update, delete, and foreign keys are currently unsupported on columnar tables. However, you can use partitioned tables in which newer partitions use row-based storage, and older partitions are compressed using columnar storage.

 To learn more about columnar storage, check out the [columnar storage README](https://github.com/citusdata/citus/blob/master/src/backend/columnar/README.md).

@@ -303,7 +305,7 @@ To learn more about columnar storage, check out the [columnar storage README](ht

 If you’re ready to get started with Citus or want to know more, we recommend reading the [Citus open source documentation](https://docs.citusdata.com/en/stable/). Or, if you are using Citus on Azure, then the [Hyperscale (Citus) documentation](https://docs.microsoft.com/azure/postgresql/hyperscale/) is online and available as part of the Azure Database for PostgreSQL docs.

-Our Citus docs contain comprehensive use case guides on how to build a [multi-tenant SaaS application]( https://docs.citusdata.com/en/stable/use_cases/multi_tenant.html), [real-time analytics dashboard]( https://docs.citusdata.com/en/stable/use_cases/realtime_analytics.html), or work with [time series data]( https://docs.citusdata.com/en/stable/use_cases/timeseries.html).
+Our Citus docs contain comprehensive use case guides on how to build a [multi-tenant SaaS application](https://docs.citusdata.com/en/stable/use_cases/multi_tenant.html), [real-time analytics dashboard](https://docs.citusdata.com/en/stable/use_cases/realtime_analytics.html), or work with [time series data](https://docs.citusdata.com/en/stable/use_cases/timeseries.html).

 ## Architecture

@@ -315,44 +317,43 @@ Data in distributed tables is stored in “shards”, which are actually just re

 When you send a query in which all (co-located) distributed tables have the same filter on the distribution column, Citus will automatically detect that and send the whole query to the worker node that stores the data. That way, arbitrarily complex queries are supported with minimal routing overhead, which is especially useful for scaling transactional workloads. If queries do not have a specific filter, each shard is queried in parallel, which is especially useful in analytical workloads. The Citus distributed executor is adaptive and is designed to handle both query types at the same time on the same system under high concurrency, which enables large-scale mixed workloads.

-
 ## When to use Citus

 Citus is uniquely capable of scaling both analytical and transactional workloads with up to petabytes of data. Use cases in which Citus is commonly used:

-- **[Customer-facing analytics dashboards](http://docs.citusdata.com/en/stable/use_cases/realtime_analytics.html)**:
-  Citus enables you to build analytics dashboards that simultaneously ingest and process large amounts of data in the database and give sub-second response times even with a large number of concurrent users.
+- **[Customer-facing analytics dashboards](http://docs.citusdata.com/en/stable/use_cases/realtime_analytics.html)**:
+  Citus enables you to build analytics dashboards that simultaneously ingest and process large amounts of data in the database and give sub-second response times even with a large number of concurrent users.

-  The advanced parallel, distributed query engine in Citus combined with PostgreSQL features such as [array types](https://www.postgresql.org/docs/current/arrays.html), [JSONB](https://www.postgresql.org/docs/current/datatype-json.html), [lateral joins](https://heap.io/blog/engineering/postgresqls-powerful-new-join-type-lateral), and extensions like [HyperLogLog](https://github.com/citusdata/postgresql-hll) and [TopN](https://github.com/citusdata/postgresql-topn) allow you to build responsive analytics dashboards no matter how many customers or how much data you have.
+  The advanced parallel, distributed query engine in Citus combined with PostgreSQL features such as [array types](https://www.postgresql.org/docs/current/arrays.html), [JSONB](https://www.postgresql.org/docs/current/datatype-json.html), [lateral joins](https://heap.io/blog/engineering/postgresqls-powerful-new-join-type-lateral), and extensions like [HyperLogLog](https://github.com/citusdata/postgresql-hll) and [TopN](https://github.com/citusdata/postgresql-topn) allow you to build responsive analytics dashboards no matter how many customers or how much data you have.

-  Example real-time analytics users: [Algolia](https://www.citusdata.com/customers/algolia), [Heap](https://www.citusdata.com/customers/heap)
+  Example real-time analytics users: [Algolia](https://www.citusdata.com/customers/algolia), [Heap](https://www.citusdata.com/customers/heap)

-- **[Time series data](http://docs.citusdata.com/en/stable/use_cases/timeseries.html)**:
-  Citus enables you to process and analyze very large amounts of time series data. The biggest Citus clusters store well over a petabyte of time series data and ingest terabytes per day.
+- **[Time series data](http://docs.citusdata.com/en/stable/use_cases/timeseries.html)**:
+  Citus enables you to process and analyze very large amounts of time series data. The biggest Citus clusters store well over a petabyte of time series data and ingest terabytes per day.

-  Citus integrates seamlessly with [Postgres table partitioning](https://www.postgresql.org/docs/current/ddl-partitioning.html) and [pg_partman](https://www.citusdata.com/blog/2018/01/24/citus-and-pg-partman-creating-a-scalable-time-series-database-on-PostgreSQL/), which can speed up queries and writes on time series tables. You can take advantage of Citus’s parallel, distributed query engine for fast analytical queries, and use the built-in *columnar storage* to compress old partitions.
+  Citus integrates seamlessly with [Postgres table partitioning](https://www.postgresql.org/docs/current/ddl-partitioning.html) and [pg_partman](https://www.citusdata.com/blog/2018/01/24/citus-and-pg-partman-creating-a-scalable-time-series-database-on-PostgreSQL/), which can speed up queries and writes on time series tables. You can take advantage of Citus’s parallel, distributed query engine for fast analytical queries, and use the built-in _columnar storage_ to compress old partitions.

-  Example users: [MixRank](https://www.citusdata.com/customers/mixrank), [Windows team](https://techcommunity.microsoft.com/t5/azure-database-for-postgresql/architecting-petabyte-scale-analytics-by-scaling-out-postgres-on/ba-p/969685)
+  Example users: [MixRank](https://www.citusdata.com/customers/mixrank), [Windows team](https://techcommunity.microsoft.com/t5/azure-database-for-postgresql/architecting-petabyte-scale-analytics-by-scaling-out-postgres-on/ba-p/969685)

-- **[Software-as-a-service (SaaS) applications](http://docs.citusdata.com/en/stable/use_cases/multi_tenant.html)**:
-  SaaS and other multi-tenant applications need to be able to scale their database as the number of tenants/customers grows. Citus enables you to transparently shard a complex data model by the tenant dimension, so your database can grow along with your business.
+- **[Software-as-a-service (SaaS) applications](http://docs.citusdata.com/en/stable/use_cases/multi_tenant.html)**:
+  SaaS and other multi-tenant applications need to be able to scale their database as the number of tenants/customers grows. Citus enables you to transparently shard a complex data model by the tenant dimension, so your database can grow along with your business.

-  By distributing tables along a tenant ID column and co-locating data for the same tenant, Citus can horizontally scale complex (tenant-scoped) queries, transactions, and foreign key graphs. Reference tables and distributed DDL commands make database management a breeze compared to manual sharding. On top of that, you have a built-in distributed query engine for doing cross-tenant analytics inside the database.
+  By distributing tables along a tenant ID column and co-locating data for the same tenant, Citus can horizontally scale complex (tenant-scoped) queries, transactions, and foreign key graphs. Reference tables and distributed DDL commands make database management a breeze compared to manual sharding. On top of that, you have a built-in distributed query engine for doing cross-tenant analytics inside the database.

-  Example multi-tenant SaaS users: [Copper](https://www.citusdata.com/customers/copper), [Salesloft](https://fivetran.com/case-studies/replicating-sharded-databases-a-case-study-of-salesloft-citus-data-and-fivetran), [ConvertFlow](https://www.citusdata.com/customers/convertflow)
+  Example multi-tenant SaaS users: [Copper](https://www.citusdata.com/customers/copper), [Salesloft](https://fivetran.com/case-studies/replicating-sharded-databases-a-case-study-of-salesloft-citus-data-and-fivetran), [ConvertFlow](https://www.citusdata.com/customers/convertflow)

-- **Geospatial**:
-  Because of the powerful [PostGIS](https://postgis.net/) extension to Postgres that adds support for geographic objects into Postgres, many people run spatial/GIS applications on top of Postgres. And since spatial location information has become part of our daily life, well, there are more geospatial applications than ever. When your Postgres database needs to scale out to handle an increased workload, Citus is a good fit.
+- **Geospatial**:
+  Because of the powerful [PostGIS](https://postgis.net/) extension to Postgres that adds support for geographic objects into Postgres, many people run spatial/GIS applications on top of Postgres. And since spatial location information has become part of our daily life, well, there are more geospatial applications than ever. When your Postgres database needs to scale out to handle an increased workload, Citus is a good fit.

-  Example geospatial users: [Helsinki Regional Transportation Authority (HSL)](https://customers.microsoft.com/en-us/story/845146-transit-authority-improves-traffic-monitoring-with-azure-database-for-postgresql-hyperscale), [MobilityDB]( https://www.citusdata.com/blog/2020/11/09/analyzing-gps-trajectories-at-scale-with-postgres-mobilitydb/).
+  Example geospatial users: [Helsinki Regional Transportation Authority (HSL)](https://customers.microsoft.com/en-us/story/845146-transit-authority-improves-traffic-monitoring-with-azure-database-for-postgresql-hyperscale), [MobilityDB](https://www.citusdata.com/blog/2020/11/09/analyzing-gps-trajectories-at-scale-with-postgres-mobilitydb/).

 ## Need Help?

-- **Slack**: Ask questions in our Citus community [Slack channel](https://slack.citusdata.com).
-- **GitHub issues**: Please submit issues via [GitHub issues](https://github.com/citusdata/citus/issues).
-- **Documentation**: Our [Citus docs](https://docs.citusdata.com ) have a wealth of resources, including sections on [query performance tuning](https://docs.citusdata.com/en/stable/performance/performance_tuning.html), [useful diagnostic queries](https://docs.citusdata.com/en/stable/admin_guide/diagnostic_queries.html), and [common error messages](https://docs.citusdata.com/en/stable/reference/common_errors.html).
-- **Docs issues**: You can also submit documentation issues via [GitHub
-  issues for our Citus docs](https://github.com/citusdata/citus_docs/issues).
+- **Slack**: Ask questions in our Citus community [Slack channel](https://slack.citusdata.com).
+- **GitHub issues**: Please submit issues via [GitHub issues](https://github.com/citusdata/citus/issues).
+- **Documentation**: Our [Citus docs](https://docs.citusdata.com) have a wealth of resources, including sections on [query performance tuning](https://docs.citusdata.com/en/stable/performance/performance_tuning.html), [useful diagnostic queries](https://docs.citusdata.com/en/stable/admin_guide/diagnostic_queries.html), and [common error messages](https://docs.citusdata.com/en/stable/reference/common_errors.html).
+- **Docs issues**: You can also submit documentation issues via [GitHub
+  issues for our Citus docs](https://github.com/citusdata/citus_docs/issues).

 ## Contributing

@@ -360,14 +361,14 @@ Citus is built on and of open source, and we welcome your contributions. The [CO

 ## Stay Connected

-- **Twitter**: Follow us [@citusdata](https://twitter.com/citusdata) to track the latest posts & updates on what’s happening.
-- **Citus Blog**: Read our popular [Citus Blog](https://www.citusdata.com/blog/) for useful & informative posts about PostgreSQL and Citus.
-- **Citus Newsletter**: Subscribe to our monthly technical [Citus Newsletter](https://www.citusdata.com/join-newsletter) to get a curated collection of our favorite posts, videos, docs, talks, & other Postgres goodies.
-- **Slack**: Our [Citus Public slack]( https://slack.citusdata.com/) is a good way to stay connected, not just with us but with other Citus users.
-- **Sister Blog**: Read our Azure Database for PostgreSQL [sister blog on Microsoft TechCommunity](https://techcommunity.microsoft.com/t5/azure-database-for-postgresql/bg-p/ADforPostgreSQL) for posts relating to Postgres (and Citus) on Azure.
-- **Videos**: Check out this [YouTube playlist](https://www.youtube.com/playlist?list=PLixnExCn6lRq261O0iwo4ClYxHpM9qfVy) of some of our favorite Citus videos and demos. If you want to deep dive into how Citus extends PostgreSQL, you might want to check out Marco Slot’s talk at Carnegie Mellon titled [Citus: Distributed PostgreSQL as an Extension](https://youtu.be/X-aAgXJZRqM) that was part of Andy Pavlo’s Vaccination Database Talks series at CMUDB.
-- **Our other Postgres projects**: Our team also works on other awesome PostgreSQL open source extensions & projects, including: [pg_cron]( https://github.com/citusdata/pg_cron), [HyperLogLog](https://github.com/citusdata/postgresql-hll), [TopN](https://github.com/citusdata/postgresql-topn), [pg_auto_failover](https://github.com/citusdata/pg_auto_failover), [activerecord-multi-tenant](https://github.com/citusdata/activerecord-multi-tenant), and [django-multitenant](https://github.com/citusdata/django-multitenant).
+- **Twitter**: Follow us [@citusdata](https://twitter.com/citusdata) to track the latest posts & updates on what’s happening.
+- **Citus Blog**: Read our popular [Citus Blog](https://www.citusdata.com/blog/) for useful & informative posts about PostgreSQL and Citus.
+- **Citus Newsletter**: Subscribe to our monthly technical [Citus Newsletter](https://www.citusdata.com/join-newsletter) to get a curated collection of our favorite posts, videos, docs, talks, & other Postgres goodies.
+- **Slack**: Our [Citus Public slack](https://slack.citusdata.com/) is a good way to stay connected, not just with us but with other Citus users.
+- **Sister Blog**: Read our Azure Database for PostgreSQL [sister blog on Microsoft TechCommunity](https://techcommunity.microsoft.com/t5/azure-database-for-postgresql/bg-p/ADforPostgreSQL) for posts relating to Postgres (and Citus) on Azure.
+- **Videos**: Check out this [YouTube playlist](https://www.youtube.com/playlist?list=PLixnExCn6lRq261O0iwo4ClYxHpM9qfVy) of some of our favorite Citus videos and demos. If you want to deep dive into how Citus extends PostgreSQL, you might want to check out Marco Slot’s talk at Carnegie Mellon titled [Citus: Distributed PostgreSQL as an Extension](https://youtu.be/X-aAgXJZRqM) that was part of Andy Pavlo’s Vaccination Database Talks series at CMUDB.
+- **Our other Postgres projects**: Our team also works on other awesome PostgreSQL open source extensions & projects, including: [pg_cron](https://github.com/citusdata/pg_cron), [HyperLogLog](https://github.com/citusdata/postgresql-hll), [TopN](https://github.com/citusdata/postgresql-topn), [pg_auto_failover](https://github.com/citusdata/pg_auto_failover), [activerecord-multi-tenant](https://github.com/citusdata/activerecord-multi-tenant), and [django-multitenant](https://github.com/citusdata/django-multitenant).

-___
+---

 Copyright © Citus Data, Inc.

cgmanifest.json

@@ -1,47 +1,46 @@
 {
-    "Registrations": [
-        {
-            "Component": {
-                "Type": "git",
-                "git": {
-                    "RepositoryUrl": "https://github.com/intel/safestringlib",
-                    "CommitHash": "245c4b8cff1d2e7338b7f3a82828fc8e72b29549"
-                }
-            },
-            "DevelopmentDependency": false
-        },
-        {
-            "Component": {
-                "Type": "git",
-                "git": {
-                    "RepositoryUrl": "https://github.com/postgres/postgres",
-                    "CommitHash": "29be9983a64c011eac0b9ee29895cce71e15ea77"
-                }
-            },
-            "license": "PostgreSQL",
-            "licenseDetail": [
-                "Portions Copyright (c) 1996-2010, The PostgreSQL Global Development Group",
-                "",
-                "Portions Copyright (c) 1994, The Regents of the University of California",
-                "",
-                "Permission to use, copy, modify, and distribute this software and its documentation for ",
-                "any purpose, without fee, and without a written agreement is hereby granted, provided ",
-                "that the above copyright notice and this paragraph and the following two paragraphs appear ",
-                "in all copies.",
-                "",
-                "IN NO EVENT SHALL THE UNIVERSITY OF CALIFORNIA BE LIABLE TO ANY PARTY FOR DIRECT, INDIRECT, SPECIAL, ",
-                "INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING LOST PROFITS, ARISING OUT OF THE USE OF THIS ",
-                "SOFTWARE AND ITS DOCUMENTATION, EVEN IF THE UNIVERSITY OF CALIFORNIA HAS BEEN ADVISED OF THE ",
-                "POSSIBILITY OF SUCH DAMAGE.",
-                "",
-                "THE UNIVERSITY OF CALIFORNIA SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT LIMITED TO, ",
-                "THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE SOFTWARE PROVIDED ",
-                "HEREUNDER IS ON AN \"AS IS\" BASIS, AND THE UNIVERSITY OF CALIFORNIA HAS NO OBLIGATIONS TO PROVIDE ",
-                "MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS."
-            ],
-            "version": "0.0.1",
-            "DevelopmentDependency": false
-        }
-
-    ]
+  "Registrations": [
+    {
+      "Component": {
+        "Type": "git",
+        "git": {
+          "RepositoryUrl": "https://github.com/intel/safestringlib",
+          "CommitHash": "245c4b8cff1d2e7338b7f3a82828fc8e72b29549"
+        }
+      },
+      "DevelopmentDependency": false
+    },
+    {
+      "Component": {
+        "Type": "git",
+        "git": {
+          "RepositoryUrl": "https://github.com/postgres/postgres",
+          "CommitHash": "29be9983a64c011eac0b9ee29895cce71e15ea77"
+        }
+      },
+      "license": "PostgreSQL",
+      "licenseDetail": [
+        "Portions Copyright (c) 1996-2010, The PostgreSQL Global Development Group",
+        "",
+        "Portions Copyright (c) 1994, The Regents of the University of California",
+        "",
+        "Permission to use, copy, modify, and distribute this software and its documentation for ",
+        "any purpose, without fee, and without a written agreement is hereby granted, provided ",
+        "that the above copyright notice and this paragraph and the following two paragraphs appear ",
+        "in all copies.",
+        "",
+        "IN NO EVENT SHALL THE UNIVERSITY OF CALIFORNIA BE LIABLE TO ANY PARTY FOR DIRECT, INDIRECT, SPECIAL, ",
+        "INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING LOST PROFITS, ARISING OUT OF THE USE OF THIS ",
+        "SOFTWARE AND ITS DOCUMENTATION, EVEN IF THE UNIVERSITY OF CALIFORNIA HAS BEEN ADVISED OF THE ",
+        "POSSIBILITY OF SUCH DAMAGE.",
+        "",
+        "THE UNIVERSITY OF CALIFORNIA SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT LIMITED TO, ",
+        "THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE SOFTWARE PROVIDED ",
+        "HEREUNDER IS ON AN \"AS IS\" BASIS, AND THE UNIVERSITY OF CALIFORNIA HAS NO OBLIGATIONS TO PROVIDE ",
+        "MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS."
+      ],
+      "version": "0.0.1",
+      "DevelopmentDependency": false
+    }
+  ]
 }

ci/README.md (56 lines changed)

@@ -5,6 +5,7 @@ standards. Be sure you have followed the setup in the [Following our coding
 conventions](https://github.com/citusdata/citus/blob/master/CONTRIBUTING.md#following-our-coding-conventions)
 section of `CONTRIBUTING.md`. Once you've done that, most of them should be
 fixed automatically, when running:
+
 ```
 make reindent
 ```
@@ -30,9 +31,11 @@ risk for buffer overflows. This page lists the Microsoft suggested replacements:
 https://liquid.microsoft.com/Web/Object/Read/ms.security/Requirements/Microsoft.Security.SystemsADM.10082#guide
 These replacements are only available on Windows normally. Since we build for
 Linux we make most of them available with this header file:
+
 ```c
 #include "distributed/citus_safe_lib.h"
 ```
+
 This uses https://github.com/intel/safestringlib to provide them.

 However, still not all of them are available. For those cases we provide
@@ -40,6 +43,7 @@ some extra functions in `citus_safe_lib.h`, with similar functionality.

 If none of those replacements match your requirements you have to do one of the
 following:
+
 1. Add a replacement to `citus_safe_lib.{c,h}` that handles the same error cases
    that the `{func_name}_s` function that Microsoft suggests.
 2. Add a `/* IGNORE-BANNED */` comment to the line that complains. Doing this
@@ -65,8 +69,8 @@ There are two conditions in which this check passes:
 1. There are no merge conflicts between your PR branch and `enterprise-master` and after this merge the code compiles.
 2. There are merge conflicts, but there is a branch with the same name in the
    enterprise repo that:
-   1. Contains the last commit of the community branch with the same name.
-   2. Merges cleanly into `enterprise-master`
+   1. Contains the last commit of the community branch with the same name.
+   2. Merges cleanly into `enterprise-master`
    3. After merging, the code can be compiled.

 If the job already passes, you are done, nothing further required! Otherwise
@@ -76,35 +80,36 @@ follow the below steps.

 Before continuing with the real steps make sure you have done the following
 (this only needs to be done once):

 1. You have enabled `git rerere` in globally or in your enterprise repo
    ([docs](https://git-scm.com/docs/git-rerere), [very useful blog](https://medium.com/@porteneuve/fix-conflicts-only-once-with-git-rerere-7d116b2cec67#.3vui844dt)):
-  ```bash
-  # Enables it globally for all repos
-  git config --global rerere.enabled true
-  # Enables it only for the enterprise repo
-  cd <enterprise-repo>
-  git config rerere.enabled true
-  ```
+
+   ```bash
+   # Enables it globally for all repos
+   git config --global rerere.enabled true
+   # Enables it only for the enterprise repo
+   cd <enterprise-repo>
+   git config rerere.enabled true
+   ```
 2. You have set up the `community` remote on your enterprise as
    [described in CONTRIBUTING.md](https://github.com/citusdata/citus-enterprise/blob/enterprise-master/CONTRIBUTING.md#merging-community-changes-onto-enterprise).

 #### Important notes on `git rerere`

 This is very useful as it will make sure git will automatically redo merges that
 you have done before. However, this has a downside too. It will also redo merges
 that you did, but that were incorrect. Two work around this you can use these
 commands.

 1. Make `git rerere` forget a merge:
-  ```bash
-  git rerere forget <badly_merged_file>
-  ```
+   ```bash
+   git rerere forget <badly_merged_file>
+   ```
 2. During conflict resolution where `git rerere` already applied the bad merge,
    simply forgetting it is not enough. Since it is already applied. In that case
    you also have to undo the apply using:
-  ```bash
-  git checkout --conflict=merge <badly_merged_file>
-  ```
+   ```bash
+   git checkout --conflict=merge <badly_merged_file>
+   ```

 ### Actual steps

@@ -130,6 +135,7 @@ git pull # Make sure your local enterprise-master is up to date
 git fetch community # Fetch your up to date branch name
 git checkout -b "$PR_BRANCH" enterprise-master
 ```
+
 Now you have X in your enterprise repo, which we refer to as
 `enterprise/$PR_BRANCH` (even though in git commands you would reference it as
 `origin/$PR_BRANCH`). This branch is currently the same as `enterprise-master`.
@@ -139,6 +145,7 @@ should apply without any merge conflicts:
 ```bash
 git merge community/master
 ```
+
 Now you need to merge `community/$PR_BRANCH` to `enterprise/$PR_BRANCH`. Solve
 any conflicts and make sure to remove any parts that should not be in enterprise
 even though it doesn't have a conflict, on enterprise repository:
@@ -169,8 +176,7 @@ The subsequent PRs on community will be able to pass the

 So there's one issue that can occur. Your branch will become outdated with
 master and you have to make it up to date. There are two ways to do this using
-`git merge` or `git rebase`. As usual, `git merge` is a bit easier than `git
-rebase`, but clutters git history. This section will explain both. If you don't
+`git merge` or `git rebase`. As usual, `git merge` is a bit easier than `git rebase`, but clutters git history. This section will explain both. If you don't
 know which one makes the most sense, start with `git rebase`. It's possible that
 for whatever reason this doesn't work or becomes very complex, for instance when
 new merge conflicts appear. Feel free to fall back to `git merge` in that case,
@@ -204,6 +210,7 @@ Automatic merge might have failed with the above command. However, because of
 `git rerere` it should have re-applied your original merge resolution. If this
 is indeed the case it should show something like this in the output of the
 previous command (note the `Resolved ...` line):
+
 ```
 CONFLICT (content): Merge conflict in <file_path>
 Resolved '<file_path>' using previous resolution.
@@ -213,6 +220,7 @@ Error redoing merge <merge_sha>

 Confirm that the merge conflict is indeed resolved correctly. In that case you
 can do the following:
+
 ```bash
 # Add files that were conflicting
 git add "$(git diff --name-only --diff-filter=U)"
@@ -222,11 +230,13 @@ git rebase --continue
 Before pushing you should do a final check that the commit hash of your final
 non merge commit matches the commit hash that's on the community repo. If that's
 not the case, you should fallback to the `git merge` approach.
+
 ```bash
 git reset origin/$PR_BRANCH --hard
 ```

 If the commit hashes were as expected, push the branch:
+
 ```bash
 git push origin $PR_BRANCH --force-with-lease
 ```
@@ -236,6 +246,7 @@ git push origin $PR_BRANCH --force-with-lease
 If you are falling back to the `git merge` approach after trying the
 `git rebase` approach, you should first restore the original branch on the
 community repo.
+
 ```bash
 git checkout $PR_BRANCH
 git reset ${PR_BRANCH}-backup --hard
@@ -272,6 +283,7 @@ different.
 A test should always be included in a schedule file, otherwise it will not be
 run in CI. This is most commonly forgotten for newly added tests. In that case
 the dev ran it locally without running a full schedule with something like:
+
 ```bash
 make -C src/test/regress/ check-minimal EXTRA_TESTS='multi_create_table_new_features'
 ```
@@ -288,9 +300,11 @@ section in this `README.md` file and that they include `ci/ci_helpers.sh`.
 We do not use C-style comments in migration files as the stripped
 zero-length migration files cause warning during packaging.
 Instead use SQL type comments, i.e:
+
 ```
 -- this is a comment
 ```
+
 See [#3115](https://github.com/citusdata/citus/pull/3115) for more info.

 ## `disallow_hash_comments_in_spec_files.sh`
@@ -298,6 +312,7 @@ See [#3115](https://github.com/citusdata/citus/pull/3115) for more info.
 We do not use comments starting with # in spec files because it creates errors
 from C preprocessor that expects directives after this character.
 Instead use C type comments, i.e:
+
 ```
 // this is a single line comment

@@ -329,13 +344,16 @@ because we are running the tests in a slightly different configuration.

 This script tries to make sure that we don't add useless declarations to our
 code. What it effectively does is replace this:
+
 ```c
 int a = 0;
 int b = 2;
 Assert(b == 2);
 a = b + b;
 ```
+
 With this equivalent, but shorter version:
+
 ```c
 int b = 2;
 Assert(b == 2);
@@ -349,6 +367,7 @@ definitely possible there's a bug in there. So far no bad ones have been found.

 A known issue is that it does not replace code in a block after an `#ifdef` like
 this.
+
 ```c
 int foo = 0;
 #ifdef SOMETHING
@@ -357,6 +376,7 @@ foo = 1
 foo = 2
 #endif
 ```
+
 This was deemed to be error prone and not worth the effort.

 ## `fix_gitignore.sh`
|
|
@ -7,14 +7,14 @@ reduce IO requirements though compression and projection pushdown.
|
|||
|
||||
Existing PostgreSQL row tables work well for OLTP:
|
||||
|
||||
* Support `UPDATE`/`DELETE` efficiently
|
||||
* Efficient single-tuple lookups
|
||||
- Support `UPDATE`/`DELETE` efficiently
|
||||
- Efficient single-tuple lookups
|
||||
|
||||
The Citus Columnar tables work best for analytic or DW workloads:
|
||||
|
||||
* Compression
|
||||
* Doesn't read unnecessary columns
|
||||
* Efficient `VACUUM`
|
||||
- Compression
|
||||
- Doesn't read unnecessary columns
|
||||
- Efficient `VACUUM`
|
||||
|
||||
# Next generation of cstore_fdw
|
||||
|
||||
|
@ -23,47 +23,45 @@ Citus Columnar is the next generation of
|
|||
|
||||
Benefits of Citus Columnar over cstore_fdw:
|
||||
|
||||
* Citus Columnar is based on the [Table Access Method
|
||||
API](https://www.postgresql.org/docs/current/tableam.html), which
|
||||
allows it to behave exactly like an ordinary heap (row) table for
|
||||
most operations.
|
||||
* Supports Write-Ahead Log (WAL).
|
||||
* Supports ``ROLLBACK``.
|
||||
* Supports physical replication.
|
||||
* Supports recovery, including Point-In-Time Restore (PITR).
|
||||
* Supports ``pg_dump`` and ``pg_upgrade`` without the need for special
|
||||
options or extra steps.
|
||||
* Better user experience; simple ``USING``clause.
|
||||
* Supports more features that work on ordinary heap (row) tables.
|
||||
- Citus Columnar is based on the [Table Access Method
|
||||
API](https://www.postgresql.org/docs/current/tableam.html), which
|
||||
allows it to behave exactly like an ordinary heap (row) table for
|
||||
most operations.
|
||||
- Supports Write-Ahead Log (WAL).
|
||||
- Supports `ROLLBACK`.
|
||||
- Supports physical replication.
|
||||
- Supports recovery, including Point-In-Time Restore (PITR).
|
||||
- Supports `pg_dump` and `pg_upgrade` without the need for special
|
||||
options or extra steps.
|
||||
- Better user experience; simple `USING`clause.
|
||||
- Supports more features that work on ordinary heap (row) tables.
|
||||
|
||||
# Limitations
|
||||
|
||||
* Append-only (no ``UPDATE``/``DELETE`` support)
|
||||
* No space reclamation (e.g. rolled-back transactions may still
|
||||
consume disk space)
|
||||
* No bitmap index scans
|
||||
* No tidscans
|
||||
* No sample scans
|
||||
* No TOAST support (large values supported inline)
|
||||
* No support for [``ON
|
||||
CONFLICT``](https://www.postgresql.org/docs/12/sql-insert.html#SQL-ON-CONFLICT)
|
||||
statements (except ``DO NOTHING`` actions with no target specified).
|
||||
* No support for tuple locks (``SELECT ... FOR SHARE``, ``SELECT
|
||||
... FOR UPDATE``)
|
||||
* No support for serializable isolation level
|
||||
* Support for PostgreSQL server versions 12+ only
|
||||
* No support for foreign keys, unique constraints, or exclusion
|
||||
constraints
|
||||
* No support for logical decoding
|
||||
* No support for intra-node parallel scans
|
||||
* No support for ``AFTER ... FOR EACH ROW`` triggers
|
||||
* No `UNLOGGED` columnar tables
|
||||
- Append-only (no `UPDATE`/`DELETE` support)
|
||||
- No space reclamation (e.g. rolled-back transactions may still
|
||||
consume disk space)
|
||||
- No bitmap index scans
|
||||
- No tidscans
|
||||
- No sample scans
|
||||
- No TOAST support (large values supported inline)
|
||||
- No support for [`ON CONFLICT`](https://www.postgresql.org/docs/12/sql-insert.html#SQL-ON-CONFLICT)
|
||||
statements (except `DO NOTHING` actions with no target specified).
|
||||
- No support for tuple locks (`SELECT ... FOR SHARE`, `SELECT ... FOR UPDATE`)
|
||||
- No support for serializable isolation level
|
||||
- Support for PostgreSQL server versions 12+ only
|
||||
- No support for foreign keys, unique constraints, or exclusion
|
||||
constraints
|
||||
- No support for logical decoding
|
||||
- No support for intra-node parallel scans
|
||||
- No support for `AFTER ... FOR EACH ROW` triggers
|
||||
- No `UNLOGGED` columnar tables
|
||||
|
||||
Future iterations will incrementally lift the limitations listed above.
|
||||
|
||||
# User Experience
|
||||
|
||||
Create a Columnar table by specifying ``USING columnar`` when creating
|
||||
Create a Columnar table by specifying `USING columnar` when creating
|
||||
the table.
|
||||
|
||||
```sql
|
||||
|
@ -80,8 +78,7 @@ CREATE TABLE my_columnar_table
|
|||
Insert data into the table and read from it like normal (subject to
|
||||
the limitations listed above).
|
||||
|
||||
To see internal statistics about the table, use ``VACUUM
|
||||
VERBOSE``. Note that ``VACUUM`` (without ``FULL``) is much faster on a
|
||||
To see internal statistics about the table, use `VACUUM VERBOSE`. Note that `VACUUM` (without `FULL`) is much faster on a
|
||||
columnar table, because it scans only the metadata, and not the actual
|
||||
data.
|
||||
|
||||
|
@ -109,22 +106,22 @@ SELECT alter_columnar_table_set(
|
|||
|
||||
The following options are available:
|
||||
|
||||
* **compression**: `[none|pglz|zstd|lz4|lz4hc]` - set the compression type
|
||||
for _newly-inserted_ data. Existing data will not be
|
||||
recompressed/decompressed. The default value is `zstd` (if support
|
||||
has been compiled in).
|
||||
* **compression_level**: ``<integer>`` - Sets compression level. Valid
|
||||
settings are from 1 through 19. If the compression method does not
|
||||
support the level chosen, the closest level will be selected
|
||||
instead.
|
||||
* **stripe_row_limit**: ``<integer>`` - the maximum number of rows per
|
||||
stripe for _newly-inserted_ data. Existing stripes of data will not
|
||||
be changed and may have more rows than this maximum value. The
|
||||
default value is `150000`.
|
||||
* **chunk_group_row_limit**: ``<integer>`` - the maximum number of rows per
|
||||
chunk for _newly-inserted_ data. Existing chunks of data will not be
|
||||
changed and may have more rows than this maximum value. The default
|
||||
value is `10000`.
|
||||
- **compression**: `[none|pglz|zstd|lz4|lz4hc]` - set the compression type
|
||||
for _newly-inserted_ data. Existing data will not be
|
||||
recompressed/decompressed. The default value is `zstd` (if support
|
||||
has been compiled in).
|
||||
- **compression_level**: `<integer>` - Sets compression level. Valid
|
||||
settings are from 1 through 19. If the compression method does not
|
||||
support the level chosen, the closest level will be selected
|
||||
instead.
|
||||
- **stripe_row_limit**: `<integer>` - the maximum number of rows per
|
||||
stripe for _newly-inserted_ data. Existing stripes of data will not
|
||||
be changed and may have more rows than this maximum value. The
|
||||
default value is `150000`.
|
||||
- **chunk_group_row_limit**: `<integer>` - the maximum number of rows per
|
||||
chunk for _newly-inserted_ data. Existing chunks of data will not be
|
||||
changed and may have more rows than this maximum value. The default
|
||||
value is `10000`.
|
||||
|
||||
View options for all tables with:
|
||||
|
||||
|
@ -135,13 +132,13 @@ SELECT * FROM columnar.options;
|
|||
You can also adjust options with a `SET` command of one of the
|
||||
following GUCs:
|
||||
|
||||
* `columnar.compression`
|
||||
* `columnar.compression_level`
|
||||
* `columnar.stripe_row_limit`
|
||||
* `columnar.chunk_group_row_limit`
|
||||
- `columnar.compression`
|
||||
- `columnar.compression_level`
|
||||
- `columnar.stripe_row_limit`
|
||||
- `columnar.chunk_group_row_limit`
|
||||
|
||||
GUCs only affect newly-created *tables*, not any newly-created
|
||||
*stripes* on an existing table.
|
||||
GUCs only affect newly-created _tables_, not any newly-created
|
||||
_stripes_ on an existing table.
|
||||
|
||||
## Partitioning
|
||||
|
||||
|
@ -172,20 +169,19 @@ INSERT INTO parent VALUES ('2020-03-15', 30, 300, 'three thousand'); -- row
|
|||
When performing operations on a partitioned table with a mix of row
|
||||
and columnar partitions, take note of the following behaviors for
|
||||
operations that are supported on row tables but not columnar
|
||||
(e.g. ``UPDATE``, ``DELETE``, tuple locks, etc.):
|
||||
(e.g. `UPDATE`, `DELETE`, tuple locks, etc.):
|
||||
|
||||
* If the operation is targeted at a specific row partition
|
||||
(e.g. ``UPDATE p2 SET i = i + 1``), it will succeed; if targeted at
|
||||
a specified columnar partition (e.g. ``UPDATE p1 SET i = i + 1``),
|
||||
it will fail.
|
||||
* If the operation is targeted at the partitioned table and has a
|
||||
``WHERE`` clause that excludes all columnar partitions
|
||||
(e.g. ``UPDATE parent SET i = i + 1 WHERE ts = '2020-03-15'``), it
|
||||
will succeed.
|
||||
* If the operation is targeted at the partitioned table, but does not
|
||||
exclude all columnar partitions, it will fail; even if the actual
|
||||
data to be updated only affects row tables (e.g. ``UPDATE parent SET
|
||||
i = i + 1 WHERE n = 300``).
|
||||
- If the operation is targeted at a specific row partition
|
||||
(e.g. `UPDATE p2 SET i = i + 1`), it will succeed; if targeted at
|
||||
a specified columnar partition (e.g. `UPDATE p1 SET i = i + 1`),
|
||||
it will fail.
|
||||
- If the operation is targeted at the partitioned table and has a
|
||||
`WHERE` clause that excludes all columnar partitions
|
||||
(e.g. `UPDATE parent SET i = i + 1 WHERE ts = '2020-03-15'`), it
|
||||
will succeed.
|
||||
- If the operation is targeted at the partitioned table, but does not
|
||||
exclude all columnar partitions, it will fail; even if the actual
|
||||
data to be updated only affects row tables (e.g. `UPDATE parent SET i = i + 1 WHERE n = 300`).
|
||||
|
||||
Note that Citus Columnar supports `btree` and `hash `indexes (and
|
||||
the constraints requiring them) but does not support `gist`, `gin`,
|
||||
|
@ -207,7 +203,7 @@ ALTER TABLE p2 ADD UNIQUE (n);
|
|||
Note: ensure that you understand any advanced features that may be
|
||||
used with the table before converting it (e.g. row-level security,
|
||||
storage options, constraints, inheritance, etc.), and ensure that they
|
||||
are reproduced in the new table or partition appropriately. ``LIKE``,
|
||||
are reproduced in the new table or partition appropriately. `LIKE`,
|
||||
used below, is a shorthand that works only in simple cases.
|
||||
|
||||
```sql
|
||||
|
@ -221,11 +217,11 @@ SELECT alter_table_set_access_method('my_table', 'heap');

# Performance Microbenchmark

_Important_: This microbenchmark is not intended to represent any real
workload. Compression ratios, and therefore performance, will depend
heavily on the specific workload. This is only for the purpose of
illustrating a "columnar friendly" contrived workload that showcases
the benefits of columnar.

## Schema

@ -299,19 +295,19 @@ total row count: 75000000, stripe count: 500, average rows per stripe: 150000
chunk count: 60000, containing data for dropped columns: 0, zstd compressed: 60000
```

`VACUUM VERBOSE` reports a smaller compression ratio, because it
only averages the compression ratio of the individual chunks, and does
not account for the metadata savings of the columnar format.
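A report like the one above can be produced with a plain `VACUUM VERBOSE` on the columnar table, e.g. for the `perf_columnar` table used below:

```sql
VACUUM VERBOSE perf_columnar;
```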

## System

- Azure VM: Standard D2s v3 (2 vcpus, 8 GiB memory)
- Linux (ubuntu 18.04)
- Data Drive: Standard HDD (512GB, 500 IOPS Max, 60 MB/s Max)
- PostgreSQL 13 (`--with-llvm`, `--with-python`)
- `shared_buffers = 128MB`
- `max_parallel_workers_per_gather = 0`
- `jit = on`

Note: because this was run on a system with enough physical memory to
hold a substantial fraction of the table, the IO benefits of columnar

@ -332,6 +328,7 @@ SELECT vendor_id, SUM(quantity) FROM perf_columnar GROUP BY vendor_id OFFSET 100
```

Timing (median of three runs):

- row: 436s
- columnar: 16s
- speedup: **27X**

@ -7,24 +7,24 @@ here are somewhat more fine-grained. This is due to the nature of citus commands

heavily focused on distributed tables. Instead of having all commands in `tablecmds.c`
they are often moved to files that are named after the command.

| File                         | Description |
| ---------------------------- | ----------- |
| `create_distributed_table.c` | Implementation of UDFs for creating distributed tables |
| `drop_distributed_table.c`   | Implementation for dropping metadata for partitions of distributed tables |
| `extension.c`                | Implementation of `CREATE EXTENSION` commands for citus specific checks |
| `foreign_constraint.c`       | Implementation of helper functions for foreign key constraints |
| `grant.c`                    | Placeholder for code granting users access to relations, implemented as an enterprise feature |
| `index.c`                    | Implementation of commands specific to indices on distributed tables |
| `multi_copy.c`               | Implementation of the `COPY` command. There are multiple different copy modes, which are described in detail below |
| `policy.c`                   | Implementation of `CREATE/ALTER POLICY` commands. |
| `rename.c`                   | Implementation of `ALTER ... RENAME ...` commands. It implements the renaming of applicable objects, otherwise provides the user with a warning |
| `schema.c`                   | |
| `sequence.c`                 | Implementation of `CREATE/ALTER SEQUENCE` commands. Primarily checks the correctness of sequence statements, as they are not propagated to the worker nodes |
| `table.c`                    | |
| `transmit.c`                 | Implementation of `COPY` commands with `format transmit` set in the options. This format is used to transfer files from one node to another node |
| `truncate.c`                 | Implementation of `TRUNCATE` commands on distributed tables |
| `utility_hook.c`             | This is the entry point from postgres into the commands module of citus. It contains the implementation that gets registered in postgres' `ProcessUtility_hook` callback to extend the functionality of the original ProcessUtility. This code is used to route the incoming commands to their respective implementation in Citus |
| `vacuum.c`                   | Implementation of `VACUUM` commands on distributed tables |

# COPY

@ -6,12 +6,11 @@ If the input query is trivial (e.g., no joins, no subqueries/ctes, single table

Distributed planning (`CreateDistributedPlan`) tries several different methods to plan the query (an example follows the list):

1. Fast-path router planner, proceed if the query prunes down to a single shard of a single table
2. Router planner, proceed if the query prunes down to a single set of co-located shards
3. Modification planning, proceed if the query is a DML command and all joins are co-located
4. Recursive planning, find CTEs and subqueries that cannot be pushed down and go back to 1
5. Logical planner, constructs a multi-relational algebra tree to find a distributed execution plan

## Fast-path router planner

@ -19,7 +18,6 @@ By examining the query tree, if we can decide that the query hits only a single

As the name reveals, this can be considered a sub-item of the Router planner described below. The only difference is that the fast-path planner doesn't rely on `standard_planner()` for collecting restriction information.

## Router planner

During the call to `standard_planner`, Postgres calls a hook named `multi_relation_restriction_hook`. We use this hook to determine explicit and implicit filters on (occurrences of) distributed tables. We apply shard pruning to all tables using the filters in `PlanRouterQuery`. If all tables prune down to a single shard and all those shards are on the same node, then the query is router plannable, meaning it can be fully executed by one of the worker nodes.
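As a sketch (reusing the hypothetical `orders` table plus a co-located `order_items` table distributed on the same column), the following is router plannable because all tables prune to a single set of co-located shards:

```sql
-- hypothetical table, co-located with orders
CREATE TABLE order_items (order_id bigint, product_id bigint);
SELECT create_distributed_table('order_items', 'order_id', colocate_with => 'orders');

SELECT o.order_id, i.product_id
FROM orders o
JOIN order_items i USING (order_id)
WHERE o.order_id = 42;
```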

@ -34,7 +32,7 @@ CTEs and subqueries that cannot be pushed down (checked using `DeferErrorIfCanno

The logical planner constructs a multi-relational algebra tree from the query with operators such as `MultiTable`, `MultiProject`, `MultiJoin` and `MultiCollect`. It first picks a strategy for handling joins in `MultiLogicalPlanCreate` (pushdown planning, or join order planning) and then builds a `MultiNode` tree based on the original query tree. In the initial `MultiNode` tree, each `MultiTable` is wrapped in `MultiCollect`, which effectively means collecting the entire table in one place. The `MultiNode` tree is passed to the logical optimizer, which transforms the tree into one that requires less network traffic by pushing down operators. Finally, the physical planner transforms the `MultiNode` tree into a `DistributedPlan`, which contains the queries to execute on shards and can be passed to the executor.

### Pushdown planning

During the call to `standard_planner`, Postgres calls a hook named `multi_relation_restriction_hook`. We use this hook to determine whether all (occurrences of) distributed tables are joined on their respective distribution columns. When this is the case, we can be somewhat agnostic to the structure of subqueries and other joins. In that case, we treat the whole join tree as a single `MultiTable` and deparse this part of the query as is during physical planning. Pushing down a subquery is only possible when the subquery can be answered without a merge step (checked using `DeferErrorIfCannotPushdownSubquery`). However, you may notice that these subqueries are already replaced by `read_intermediate_result` calls during recursive planning. Only subqueries that have references to the outer query remain at this stage, since those pass through recursive planning untouched, and they can fail the check.
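A sketch of a pushdownable query under these rules (same hypothetical tables; the subquery groups by the distribution column, so it can be answered without a merge step):

```sql
-- the join and the subquery's GROUP BY are both on the distribution
-- column, so the whole join tree can be deparsed and pushed down
SELECT o.order_id, counts.item_count
FROM orders o
JOIN (
  SELECT order_id, count(*) AS item_count
  FROM order_items
  GROUP BY order_id
) counts USING (order_id);
```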

@ -56,10 +54,10 @@ This section needs to be expanded.

In terms of modification planning, we distinguish between several cases:

1. DML planning (`CreateModifyPlan`)
   1.a. UPDATE/DELETE planning
   1.b. INSERT planning
2. INSERT...SELECT planning (`CreateInsertSelectPlan`)

### UPDATE/DELETE planning

@ -85,6 +83,6 @@ If `INSERT ... SELECT` query can be planned by pushing down it to the worker nod

If the query cannot be pushed down to the worker nodes, two different approaches can be followed depending on whether `ON CONFLICT` or `RETURNING` clauses are used (both approaches are sketched after the list).

- If `ON CONFLICT` or `RETURNING` are not used, Citus uses the `COPY` command to handle such queries. After planning the `SELECT` part of the `INSERT ... SELECT` query, including subqueries and CTEs, it executes the plan and sends results back to the DestReceiver, which is created using the target table info.

- Since the `COPY` command supports neither `ON CONFLICT` nor `RETURNING` clauses, Citus performs `INSERT ... SELECT` queries with an `ON CONFLICT` or `RETURNING` clause in two phases. First, Citus plans the `SELECT` part of the query, executes the plan, and saves the results to an intermediate table that is co-located with the target table of the `INSERT ... SELECT` query. Then, the `INSERT ... SELECT` query is run directly on the worker nodes using the intermediate table as the source table.
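A sketch of the two flavors, assuming a hypothetical `order_counts` target table and a `SELECT` that cannot be pushed down:

```sql
-- no ON CONFLICT / RETURNING: results are streamed into the target via COPY
INSERT INTO order_counts
SELECT order_id, count(*) FROM order_items GROUP BY order_id;

-- with ON CONFLICT: two phases, via an intermediate table co-located
-- with the target
INSERT INTO order_counts
SELECT order_id, count(*) FROM order_items GROUP BY order_id
ON CONFLICT (order_id) DO NOTHING;
```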

@ -1,4 +1,3 @@

# How to trigger hammerdb benchmark jobs

You can trigger two types of hammerdb benchmark jobs:

@ -1,19 +1,19 @@

# Contributing

For each message we wish to capture, we have a class definition in `structs.py`.

If there is a new network message that is not yet parsed by our proxy, check the Postgres documentation [here](https://www.postgresql.org/docs/current/protocol-message-formats.html) for the message format and add a new class definition.

Room for improvement:

- Anonymize network dumps by removing shard/placement/transaction ids
- Occasionally, changes in our codebase introduce new messages that contain parts that should be anonymized
- Add missing message format definitions
- Allow failure testing for underprivileged users who are not allowed to write to our fifo file on the database

# Resources at Postgres Docs:

- [Postgres Frontend/Backend Protocol](https://www.postgresql.org/docs/current/protocol.html) is the root directory for message protocols between frontends and backends.
- [Protocol Flow](https://www.postgresql.org/docs/current/protocol-flow.html) explains the lifecycle of a session, and a tentative ordering of messages that will be dispatched
- [Extended Query Protocol](https://www.postgresql.org/docs/current/protocol-flow.html#PROTOCOL-FLOW-EXT-QUERY) uses a more detailed set of messages in the session lifecycle, and these messages are mostly left unparsed.
- [Message Formats](https://www.postgresql.org/docs/current/protocol-message-formats.html) lists formats of all the messages that can be dispatched

@ -1,19 +1,17 @@

# Automated Failure testing

Automated Failure Testing works by inserting a network proxy (mitmproxy) between the Citus coordinator and one of the workers (connections to the other worker are left unchanged). The proxy is configurable, and sits on a fifo waiting for commands. When it receives a command over the fifo it reconfigures itself and sends back a response. Regression tests which use automated failure testing communicate with mitmproxy by running special UDFs which talk to said fifo. The tests send commands such as "fail any connection which contains the string `COMMIT`" and then run SQL queries and assert that the coordinator has reasonable behavior when the specified failures occur.

**Table of Contents**

- [Getting Started](#getting-started)
  - [Running mitmproxy manually](#running-mitmproxy-manually)
  - [Using Failure Test Helpers](#using-failure-test-helpers)
- [`citus.mitmproxy()` command strings](#citusmitmproxy-command-strings)
  - [Actions](#actions)
  - [Filters](#filters)
  - [Chaining](#chaining)
- [Recording Network Traffic](#recording-network-traffic)

## Getting Started

@ -58,14 +56,17 @@ Again, the specific port numbers depend on your setup.

### Using Failure Test Helpers

In a psql front-end run

```psql
# \i src/test/regress/sql/failure_test_helpers.sql
```

> **_NOTE:_** To make the script above work, start psql as follows
>
> ```bash
> psql -p9700 --variable=worker_2_port=9702
> ```
>
> assuming the coordinator is running on 9700 and worker 2 (which is going to be intercepted) runs on 9702

The above file creates some UDFs and also disables a few citus features which make connections in the background.

@ -94,79 +95,78 @@ Command strings specify a pipeline. Each connection is handled individually, and

`conn.onQuery().after(2).kill()` -> kill a connection once three Query packets have been seen

- `onQuery()` is a filter. It only passes Query packets (packets the frontend sends to the backend to specify a query to be run) on to the next step of the pipeline.

- `after(2)` is another filter; it ignores the first two packets which are sent to it, then sends the following packets to the next step of the pipeline.

- `kill()` is an action; when a packet reaches it, the connection containing that packet will be killed (a full invocation is shown after this list).
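In a failure test, such a command string is installed through the `citus.mitmproxy()` UDF, e.g.:

```sql
SELECT citus.mitmproxy('conn.onQuery().after(2).kill()');
```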

### Actions

There are 5 actions you can take on connections:

| Action             | Description |
| :----------------- | :---------- |
| `conn.allow()`     | the default, allows all connections to execute unmodified |
| `conn.kill()`      | kills all connections immediately after the first packet is sent |
| `conn.reset()`     | `kill()` calls `shutdown(SHUT_WR)`, `shutdown(SHUT_RD)`, `close()`. This is a very graceful way to close the socket. `reset()` causes a RST packet to be sent and forces the connection closed in something more resembling an error. |
| `conn.cancel(pid)` | This doesn't cause any changes at the network level. Instead it sends a SIGINT to pid and introduces a short delay, with hopes that the signal will be received before the delay ends. You can use it to write cancellation tests. |
| `conn.killall()`   | the `killall()` command kills this and all subsequent connections. Any packets sent once it triggers will have their connections killed. |

The first 4 actions all work on a per-connection basis. Meaning, each connection is tracked individually. A command such as `conn.onQuery().kill()` will only kill the connection on which the Query packet was seen. A command such as `conn.onQuery().after(2).kill()` will never trigger if each Query is sent on a different connection, even if you send dozens of Query packets.

### Filters

- `conn.onQuery().kill()`
  - kill a connection once a `Query` packet is seen
- `conn.onCopyData().kill()`
  - kill a connection once a `CopyData` packet is seen

The list of supported packets can be found in [structs.py](structs.py), and the list of packets which
could be supported can be found [here](https://www.postgresql.org/docs/current/static/protocol-message-formats.html)

You can also inspect the contents of packets:

- `conn.onQuery(query="COMMIT").kill()`
  - You can look into the actual query which is sent and match on its contents.
  - Note that this is always a regex
- `conn.onQuery(query="^COMMIT").kill()`
  - The query must start with `COMMIT`
- `conn.onQuery(query="pg_table_size\(")`
  - You must escape parens, since you're in a regex
- `after(n)`
  - Matches after the n-th packet has been sent:
  - `conn.after(2).kill()`
    - Kill connections when the third packet is sent down them

There's also a low-level filter which runs a regex against the raw content of the packet:

- `conn.matches(b"^Q").kill()`
  - This is another way of writing `conn.onQuery()`
  - Note the `b`, it's always required.

### Chaining

Filters and actions can be arbitrarily chained:

- `conn.matches(b"^Q").after(2).kill()`
  - kill any connection when the third Query is sent

## Recording Network Traffic

There are also some special commands. This proxy also records every packet and lets you
inspect them:

- `recorder.dump()`
  - Emits a list of captured packets in `COPY` text format
- `recorder.reset()`
  - Empties the data structure containing the captured packets

Both of those calls empty the structure containing the packets; a call to `dump()` will only return the packets which were captured since the last call to `dump()` or `reset()`

Back when you called `\i sql/failure_test_helpers.sql` you created some UDFs which make using these strings easier. Here are some commands you can run from psql, or from inside failure tests (a typical sequence follows the list):

- `citus.clear_network_traffic()`
  - Empties the buffer containing captured packets
- `citus.dump_network_traffic()`
  - Returns a little table and pretty-prints information on all the packets captured since the last call to `clear_network_traffic()` or `dump_network_traffic()`
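A sketch of that sequence from psql (assuming `dump_network_traffic()` is queried as a set-returning function, per the description above):

```sql
SELECT citus.clear_network_traffic();
-- ... run the statements whose traffic you want to inspect ...
SELECT * FROM citus.dump_network_traffic();
```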

@ -1,4 +1,4 @@

In this folder, all tests whose names are in the format `*_add.spec` are organized
according to a specific format.

You should use `//` in mx files, not `#`. We preprocess mx files with `cpp` to

@ -8,19 +8,19 @@ In the interest of fostering an open and welcoming environment, we as contributo

Examples of behavior that contributes to creating a positive environment include:

- Using welcoming and inclusive language
- Being respectful of differing viewpoints and experiences
- Gracefully accepting constructive criticism
- Focusing on what is best for the community
- Showing empathy towards other community members

Examples of unacceptable behavior by participants include:

- The use of sexualized language or imagery and unwelcome sexual attention or advances
- Trolling, insulting/derogatory comments, and personal or political attacks
- Public or private harassment
- Publishing others' private information, such as a physical or electronic address, without explicit permission
- Other conduct which could reasonably be considered inappropriate in a professional setting

## Our Responsibilities