citus

Distributed PostgreSQL as an extension

citus citus-extension database database-cluster distributed-database multi-tenant postgres postgresql relational-database scale sharding sql


Metin Doslu 1f838199f8 Use CustomScan API for query execution	2017-03-14 12:17:51 +02:00

Custom Scan is a node in the planned statement that lets external providers abstract data scans, not just for foreign data wrappers but also for regular relations, so that you can benefit from your own caching or hardware optimizations. This sounds like an abstraction only for the data scan layer, but we can also use it as an abstraction for our distributed queries. All we need to do is find the distributable parts of the query, plan them, and replace them with a Citus Custom Scan. Then, whenever PostgreSQL hits this custom scan node in its Volcano-style execution, it calls our callback functions, which run the distributed plan and provide tuples to the upper node as if it were scanning a regular relation. This means fewer code changes, fewer bugs and more supported features for us!

First, in the distributed query planner phase, we create a Custom Scan that wraps the distributed plan. For the real-time and task-tracker executors, we add this custom plan under the master query plan. For the router executor, we pass the custom plan directly because there is no master query. Then we simply let the PostgreSQL executor run this plan. When it hits the custom scan node, we call the related executor parts for the distributed plan, fill the tuple store in the custom scan, and return results to the PostgreSQL executor in Volcano style, one tuple per XXX_ExecScan() call.

* Modify planner to utilize Custom Scan node.
* Create different scan methods for different executors.
* Use native PostgreSQL Explain for master part of queries.
src	Use CustomScan API for query execution	2017-03-14 12:17:51 +02:00
.codecov.yml	Bump target to 87.5%	2016-12-09 14:06:35 -07:00
.gitattributes	Support PostgreSQL 9.6	2016-10-18 16:23:55 -06:00
.gitignore	Initial commit of Citus 5.0	2016-02-11 04:05:32 +02:00
.travis.yml	Add comment for otherwise opaque secure value	2016-12-06 11:30:22 -07:00
CHANGELOG.md	Add 6.1.0 CHANGELOG entries (#1219)	2017-02-09 17:05:17 -07:00
CONTRIBUTING.md	Proper indentation for code blocks in lists	2016-03-30 15:40:53 -07:00
LICENSE	Add AGPL-3.0 in LICENSE file	2016-03-23 17:04:58 -06:00
Makefile	Remove csql, \stage is no longer needed	2016-08-26 10:41:59 +03:00
Makefile.global.in	Identify build and source directory of postgres we're compiling against.	2016-10-27 00:31:41 -07:00
README.md	Use curl everywhere and prevent nested shell session	2017-03-08 15:43:26 -07:00
autogen.sh	Changed product name to citus	2016-02-15 16:04:31 +02:00
configure	Enable instrumentation of coverage	2016-12-06 11:30:22 -07:00
configure.in	Enable instrumentation of coverage	2016-12-06 11:30:22 -07:00
github-banner.png	Readme for 5.0	2016-03-18 13:32:13 -07:00
prep_buildtree	Changed product name to citus	2016-02-15 16:04:31 +02:00

README.md

What is Citus?

Open-source PostgreSQL extension (not a fork)
Scalable across multiple machines through sharding and replication
Distributed engine for query parallelization
Database designed to scale multi-tenant applications

Citus is a distributed database that scales across commodity servers using transparent sharding and replication. Citus extends the underlying database rather than forking it, giving developers and enterprises the power and familiarity of a relational database. As an extension, Citus supports new PostgreSQL releases, and allows you to benefit from new features while maintaining compatibility with existing PostgreSQL tools.

Citus serves many use cases. Two common ones are:

Multi-tenant database: Most B2B applications already have the notion of a tenant / customer / account built into their data model. Citus allows you to scale out your transactional relational database to 100K+ tenants with minimal changes to your application.
Real-time analytics: Citus enables ingesting large volumes of data and running analytical queries on that data in human real-time. Example applications include analytic dashboards with subsecond response times and exploratory queries on unfolding events.
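
As a rough illustration of the multi-tenant case, the sketch below writes a small SQL script that shards a table by a tenant column and then runs it on the master node. The table `page_views`, its columns, and the file name `tenant_demo.sql` are hypothetical names chosen for this example. It assumes a running cluster whose master container is named `citus_master`, as in the local Docker setup described later in this README, and a Citus release that provides `create_distributed_table`.

```shell
#!/bin/sh
# Hypothetical schema for illustration only; names are not taken from the Citus docs.
cat > tenant_demo.sql <<'SQL'
CREATE TABLE page_views (
    tenant_id bigint NOT NULL,
    page_id   bigint NOT NULL,
    viewed_at timestamptz NOT NULL DEFAULT now()
);
-- Shard the table on tenant_id so that each tenant's rows are stored together.
SELECT create_distributed_table('page_views', 'tenant_id');
INSERT INTO page_views (tenant_id, page_id) VALUES (1, 10), (1, 11), (2, 20);
-- Filtering on the distribution column lets the query target a single tenant's shard.
SELECT count(*) FROM page_views WHERE tenant_id = 1;
SQL

# Run the script on the master container if Docker and the container are available
# (assumption: the container name matches the local Docker setup in this README).
if command -v docker >/dev/null 2>&1 && docker inspect citus_master >/dev/null 2>&1; then
    docker exec -i citus_master psql -U postgres < tenant_demo.sql
else
    echo "citus_master container not found; wrote tenant_demo.sql only"
fi
```

Choosing the tenant identifier as the distribution column is what keeps application changes small in this sketch: queries that already filter by tenant need no rewriting.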

To learn more, visit citusdata.com and join the mailing list to stay on top of the latest developments.

Getting started with Citus

The fastest way to get up and running is to create a Citus Cloud account. You can also set up a local Citus cluster with Docker.

Citus Cloud

Citus Cloud runs on top of AWS as a fully managed database as a service and has development plans available for getting started. You can provision a Citus Cloud account at https://console.citusdata.com and get started with just a few clicks.

Local Citus Cluster

If you're looking to get started locally, follow these steps to get up and running.

Install Docker Community Edition and Docker Compose

Mac:
1. Download and install Docker.
2. Start Docker by clicking on the application’s icon.

Linux:

curl -sSL https://get.docker.com/ | sh
sudo usermod -aG docker $USER && exec sg docker newgrp `id -gn`
sudo systemctl start docker

sudo curl -sSL https://github.com/docker/compose/releases/download/1.11.2/docker-compose-`uname -s`-`uname -m` -o /usr/local/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose

The above version of Docker Compose is sufficient for running Citus, or you can install the latest version.

Pull and start the Docker images

curl -sSLO https://raw.githubusercontent.com/citusdata/docker/master/docker-compose.yml
docker-compose -p citus up -d

Connect to the master database

docker exec -it citus_master psql -U postgres

Follow the first tutorial instructions
To shut the cluster down, run:

docker-compose -p citus down
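
While the cluster is still up (that is, before running the shut-down command above), a quick sanity check is to ask the master which worker nodes it knows about. The sketch below assumes the container name `citus_master` used in the connect step and the `master_get_active_worker_nodes()` function from Citus releases of this era; newer releases may name the function differently. The variable names are illustrative.

```shell
#!/bin/sh
# Query that lists the worker nodes registered with the master
# (assumption: this function name applies to the Citus version in the image).
WORKER_QUERY='SELECT * FROM master_get_active_worker_nodes();'
MASTER_CONTAINER=citus_master

# Only run the query if Docker is installed and the master container exists.
if command -v docker >/dev/null 2>&1 && docker inspect "$MASTER_CONTAINER" >/dev/null 2>&1; then
    docker exec -i "$MASTER_CONTAINER" psql -U postgres -c "$WORKER_QUERY"
else
    echo "container $MASTER_CONTAINER not found; start the cluster first"
fi
```

If the result is empty, the workers may still be starting; rerunning the check after a few seconds is usually enough.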

Talk to Contributors and Learn More

Documentation	Try the Citus tutorial for a hands-on introduction or the documentation for a more comprehensive reference.
Google Groups	The Citus Google Group is our place for detailed questions and discussions.
Slack	Chat with us in our community Slack channel.
GitHub Issues	We track specific bug reports and feature requests in our project issues.
Twitter	Follow @citusdata for general updates and PostgreSQL scaling tips.

Contributing

Citus is built on and of open source, and we welcome your contributions. The CONTRIBUTING.md file explains how to get started developing the Citus extension itself and our code quality guidelines.

Who is Using Citus?

Citus is deployed in production by many customers, ranging from technology start-ups to large enterprises. Here are some examples:

CloudFlare uses Citus to provide real-time analytics on 100 TB of data from over 4 million customer websites. Case Study
MixRank uses Citus to efficiently collect and analyze vast amounts of data to allow inside B2B sales teams to find new customers. Case Study
Neustar builds and maintains scalable ad-tech infrastructure that counts billions of events per day using Citus and HyperLogLog.
Agari uses Citus to secure more than 85 percent of U.S. consumer emails on two 6-8 TB clusters. Case Study
Heap uses Citus to run dynamic funnel, segmentation, and cohort queries across billions of users and tens of billions of events. Watch Video
