citus

Distributed PostgreSQL as an extension

citus citus-extension database database-cluster distributed-database multi-tenant postgres postgresql relational-database scale sharding sql

Onder Kalaci d83be3a33f Enforce foreign key restrictions inside transaction blocks		2018-07-03 17:05:55 +03:00

When a hash-distributed table has a foreign key to a reference table, there are a few restrictions we have to apply in order to prevent distributed deadlocks or reading wrong results.

The restrictions are needed because of the cascading nature of foreign keys. When a foreign key on a reference table cascades to a distributed table, a single operation over a single connection can acquire locks on multiple shards of the distributed table. Thus, no other operation on that distributed table in the same transaction should open parallel connections to the shards. Otherwise, we'd either end up with a self-distributed deadlock or read wrong results.

We enforce these restrictions by tracking the distributed/reference relation accesses inside transaction blocks and acting accordingly when necessary. The two main rules are as follows:

- Whenever a parallel distributed relation access conflicts with a subsequent reference relation access, Citus errors out.
- Whenever a reference relation access is followed by a conflicting parallel relation access, the execution mode is switched to sequential mode.

There are also some other notes to mention:

- If the user runs SET LOCAL citus.multi_shard_modify_mode TO 'sequential';, all the queries should simply work, using one connection per worker and executing the commands sequentially. That's obviously slower than Citus' usual parallel execution; however, it at least gives us a way to run all commands successfully.
- If an unrelated parallel query was executed on any distributed table earlier in the transaction, we cannot switch to sequential mode. The essence of sequential mode is using one connection per worker, but in the presence of parallel connections, the connection manager picks those connections to execute the commands. That contradicts our purpose, so we error out.
- COPY to a distributed table cannot be executed in sequential mode. Thus, if we switch to sequential mode and COPY is executed, the operation fails, and there is currently no way to implement that. Note that when the local table is not empty and create_distributed_table() is used, Citus uses COPY internally, so in those cases create_distributed_table() will also fail.
- There is a GUC called citus.enforce_foreign_key_restrictions to disable all the checks. We added that GUC because the restrictions we apply are sometimes more restrictive than necessary, and the user might want to relax them. Similarly, if you don't have CASCADEing reference tables, you might consider disabling all the checks.
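The two rules in the commit message behave like a small per-transaction state machine. The following is a hypothetical shell model of those rules, not Citus source code: the function names (txn_begin, access_reference, access_distributed, access_unrelated), the three table roles (a reference table, a distributed table with a foreign key to it, and an unrelated distributed table) and the simplified conflict definition (two accesses conflict unless both are reads) are all assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical model of the foreign key restriction rules; NOT Citus source code.
# Assumptions: one reference table, one distributed table with a foreign key to
# it, one unrelated distributed table; two accesses conflict unless both read.

# txn_begin [parallel|sequential]: reset per-transaction state.
# Passing "sequential" mimics SET LOCAL citus.multi_shard_modify_mode TO 'sequential'.
txn_begin() {
  MODE=${1:-parallel}
  PARALLEL_USED=0        # 1 once any parallel multi-shard access has happened
  DIST_PARALLEL_KINDS="" # kinds (read/modify) of parallel accesses to the FK table
  REF_KINDS=""           # kinds of accesses to the reference table
  RESULT=""              # outcome of the last access: parallel|sequential|ok|error
}

# conflicts A B: assumed rule, two accesses conflict if either one modifies.
conflicts() {
  [ "$1" = modify ] || [ "$2" = modify ]
}

# any_conflict "KINDS" KIND: succeeds if KIND conflicts with any recorded kind.
any_conflict() {
  local k
  for k in $1; do
    if conflicts "$k" "$2"; then return 0; fi
  done
  return 1
}

# Rule 1: a reference access after a conflicting parallel access errors out.
access_reference() {
  if any_conflict "$DIST_PARALLEL_KINDS" "$1"; then
    RESULT=error
    return 1
  fi
  REF_KINDS="$REF_KINDS $1"
  RESULT=ok
}

# Rule 2: a parallel access after a conflicting reference access switches to
# sequential mode, unless parallel connections were already used (then error).
access_distributed() {
  if [ "$MODE" = sequential ]; then
    RESULT=sequential
  elif any_conflict "$REF_KINDS" "$1"; then
    if [ "$PARALLEL_USED" -eq 1 ]; then
      RESULT=error
      return 1
    fi
    MODE=sequential
    RESULT=sequential
  else
    PARALLEL_USED=1
    DIST_PARALLEL_KINDS="$DIST_PARALLEL_KINDS $1"
    RESULT=parallel
  fi
}

# A multi-shard access to an unrelated distributed table still opens parallel
# connections unless the transaction is already in sequential mode.
access_unrelated() {
  if [ "$MODE" = sequential ]; then
    RESULT=sequential
  else
    PARALLEL_USED=1
    RESULT=parallel
  fi
}
```

For example, running txn_begin, then access_reference modify, then access_distributed read leaves both RESULT and MODE set to sequential, which corresponds to the second rule. The state is kept in plain global variables so that it survives between calls in the same shell.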
config	Add citus_version(), analogous to PG's version()	2017-10-16 18:09:29 -06:00
src	Enforce foreign key restrictions inside transaction blocks	2018-07-03 17:05:55 +03:00
windows	Add connparam invalidation trigger creation logic	2018-06-20 14:13:18 -06:00
.codecov.yml	Remove obsolete lines	2017-09-25 11:18:25 -07:00
.editorconfig	Set tab size for GitHub display	2017-03-22 13:03:39 -06:00
.gitattributes	Add ruleutils file for PostgreSQL 11	2017-09-25 17:20:24 -07:00
.gitignore	Add vim swap files to .gitignore	2017-07-12 14:16:23 +02:00
.travis.yml	Bump tools version in .travis.yml	2018-07-02 14:55:23 +03:00
CHANGELOG.md	Add changelog entry for 7.4.1	2018-06-20 11:26:15 +03:00
CONTRIBUTING.md	Two more libs I needed to build citus	2017-08-24 13:04:35 -06:00
LICENSE	Add AGPL-3.0 in LICENSE file	2016-03-23 17:04:58 -06:00
Makefile	Create foreign key relation graph and functions to query on it	2018-07-03 17:05:55 +03:00
Makefile.global.in	Basic usage statistics collection. (#1656)	2017-10-11 09:55:15 -04:00
README.md	Update README.md	2017-03-23 11:00:32 -07:00
aclocal.m4	Basic usage statistics collection. (#1656)	2017-10-11 09:55:15 -04:00
appveyor.yml	Configure appveyor to run regression tests	2018-04-25 18:02:07 -07:00
autogen.sh	Changed product name to citus	2016-02-15 16:04:31 +02:00
citus.control	Update citus_stat_statements view and regression tests	2018-07-03 16:14:13 +03:00
configure	Bump citus version to 7.5devel	2018-05-28 17:25:21 -06:00
configure.in	Bump citus version to 7.5devel	2018-05-28 17:25:21 -06:00
github-banner.png	Readme for 5.0	2016-03-18 13:32:13 -07:00
prep_buildtree	Changed product name to citus	2016-02-15 16:04:31 +02:00

README.md

What is Citus?

Open-source PostgreSQL extension (not a fork)
Scalable across multiple machines through sharding and replication
Distributed engine for query parallelization
Database designed to scale multi-tenant applications

Citus is a distributed database that scales across commodity servers using transparent sharding and replication. Citus extends the underlying database rather than forking it, giving developers and enterprises the power and familiarity of a relational database. As an extension, Citus supports new PostgreSQL releases, and allows you to benefit from new features while maintaining compatibility with existing PostgreSQL tools.

Citus serves many use cases. Two common ones are:

Multi-tenant database: Most B2B applications already have the notion of a tenant / customer / account built into their data model. Citus allows you to scale out your transactional relational database to 100K+ tenants with minimal changes to your application.
Real-time analytics: Citus enables ingesting large volumes of data and running analytical queries on that data in human real-time. Example applications include analytic dashboards with subsecond response times and exploratory queries on unfolding events.

To learn more, visit citusdata.com and join the mailing list to stay on top of the latest developments.

Getting started with Citus

The fastest way to get up and running is to create a Citus Cloud account. You can also set up a local Citus cluster with Docker.

Citus Cloud

Citus Cloud runs on top of AWS as a fully managed database as a service and has development plans available for getting started. You can provision a Citus Cloud account at https://console.citusdata.com and get started with just a few clicks.

Local Citus Cluster

If you're looking to get started locally, follow these steps to get up and running.

Install Docker Community Edition and Docker Compose

Mac:
1. Download and install Docker.
2. Start Docker by clicking on the application’s icon.

Linux:

curl -sSL https://get.docker.com/ | sh
sudo usermod -aG docker $USER && exec sg docker newgrp `id -gn`
sudo systemctl start docker

sudo curl -sSL https://github.com/docker/compose/releases/download/1.11.2/docker-compose-`uname -s`-`uname -m` -o /usr/local/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose

The above version of Docker Compose is sufficient for running Citus, or you can install the latest version.

Pull and start the Docker images

curl -sSLO https://raw.githubusercontent.com/citusdata/docker/master/docker-compose.yml
docker-compose -p citus up -d
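Right after docker-compose -p citus up -d returns, the containers may still be initializing, so an immediate psql connection can fail. The sketch below is a hypothetical helper (wait_for_master is not part of Citus or its Docker setup) that polls the master with pg_isready, assuming the container is named citus_master as in the commands here and that the image ships PostgreSQL's pg_isready client utility.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll the citus_master container until PostgreSQL
# accepts connections. Assumes pg_isready exists inside the container.

# wait_for_master [ATTEMPTS] [DELAY_SECONDS]
wait_for_master() {
  local attempts=${1:-30}  # how many times to probe before giving up
  local delay=${2:-1}      # seconds to sleep between probes
  local i=0
  while [ "$i" -lt "$attempts" ]; do
    i=$((i + 1))
    # pg_isready exits 0 once the server is accepting connections
    if docker exec citus_master pg_isready -U postgres >/dev/null 2>&1; then
      return 0
    fi
    sleep "$delay"
  done
  echo "citus_master not ready after $attempts attempts" >&2
  return 1
}
```

A possible usage is docker-compose -p citus up -d && wait_for_master && docker exec -it citus_master psql -U postgres. Polling with a bounded number of attempts keeps scripts from hanging forever if a container fails to start.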

Connect to the master database

docker exec -it citus_master psql -U postgres
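Before starting the tutorial, it can help to confirm that the master knows about its worker nodes. A minimal sketch, assuming the Citus 7.x function master_get_active_worker_nodes() and the citus_master container name used above; require_workers is a hypothetical helper name, not part of Citus.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the master reports at least MIN active
# worker nodes. Assumes master_get_active_worker_nodes() is available (Citus 7.x).

# require_workers [MIN]
require_workers() {
  local min=${1:-1}
  local n
  # -t: tuples only, -A: unaligned output, so psql prints just the count
  n=$(docker exec citus_master psql -U postgres -tAc \
        "SELECT count(*) FROM master_get_active_worker_nodes();") || return 1
  [ "${n:-0}" -ge "$min" ]
}
```

For example, require_workers 2 || echo "workers not registered yet" reports a problem when fewer than two workers are active.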

Follow the instructions in the first tutorial.
To shut the cluster down, run:

docker-compose -p citus down

Talk to Contributors and Learn More

Documentation	Try the Citus tutorial for a hands-on introduction or the documentation for a more comprehensive reference.
Google Groups	The Citus Google Group is our place for detailed questions and discussions.
Slack	Chat with us in our community Slack channel.
GitHub Issues	We track specific bug reports and feature requests on our project issues.
Twitter	Follow @citusdata for general updates and PostgreSQL scaling tips.

Contributing

Citus is built on and of open source, and we welcome your contributions. The CONTRIBUTING.md file explains how to get started developing the Citus extension itself and our code quality guidelines.

Who is Using Citus?

Citus is deployed in production by many customers, ranging from technology start-ups to large enterprises. Here are some examples:

CloudFlare uses Citus to provide real-time analytics on 100 TB of data from over 4 million customer websites. Case Study
MixRank uses Citus to efficiently collect and analyze vast amounts of data to allow inside B2B sales teams to find new customers. Case Study
Neustar builds and maintains scalable ad-tech infrastructure that counts billions of events per day using Citus and HyperLogLog.
Agari uses Citus to secure more than 85 percent of U.S. consumer emails on two 6-8 TB clusters. Case Study
Heap uses Citus to run dynamic funnel, segmentation, and cohort queries across billions of users and tens of billions of events. Watch Video
