citus

Distributed PostgreSQL as an extension

citus citus-extension database database-cluster distributed-database multi-tenant postgres postgresql relational-database scale sharding sql


Andres Freund 3dac0a4d14 Rely less on remote_task_check_interval.		2016-06-02 12:11:16 -06:00

When executing queries with citus.task_executor = 'real-time', query execution could, so far, spend a significant amount of time sleeping. That's because we were
a) sleeping after several phases of query execution, even if we're not waiting for network IO
b) sleeping for a fixed amount of time when waiting for network IO; often a lot longer than actually required.

Just reducing the amount of time slept isn't a real solution, because that just increases CPU usage. Instead have the real-time executor's ManageTaskExecution return whether a task is currently being processed, waiting for reads or writes, or failed. When all tasks are waiting for IO use poll() to wait for IO readiness.

That requires slightly redefining how connection timeouts are handled: before we counted the number of times ManageTaskExecution() was called, and compared that with the timeout divided by the task check interval. That, if processing of tasks took a while, could significantly increase the time till a timeout occurred. Because it was based on ManageTaskExecution() being called on a constant interval, this approach isn't feasible anymore. Instead measure the actual time since connection establishment was started. That could in theory, if task processing takes a very long time, lead to few passes over PQconnectPoll().

The problem of sleeping too much also exists for the 'task-tracker' executor, but is generally less problematic there, as processing the individual tasks usually will take longer. That said, for e.g. the regression tests it'd be helpful to use a similar approach.
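The approach described in the commit message above can be illustrated with a minimal Python sketch. This is not the Citus C implementation: `TaskStatus`, `Task`, `manage`, and `run_tasks` are hypothetical names invented for this example, and the "task" simply reads one reply from a non-blocking socket. It shows the two ideas the message names: each task reports whether it is waiting for reads or writes, and the loop blocks in `poll()` only when every remaining task is waiting on IO; the timeout is checked against elapsed wall-clock time rather than a count of loop iterations.

```python
import enum
import select
import socket
import time


class TaskStatus(enum.Enum):
    PROCESSING = "processing"
    WAITING_READ = "waiting for read"
    WAITING_WRITE = "waiting for write"
    DONE = "done"
    FAILED = "failed"


class Task:
    """Hypothetical task that reads a single reply from a socket."""

    def __init__(self, sock):
        self.sock = sock
        self.sock.setblocking(False)
        self.reply = None

    def manage(self):
        """Advance the task one step and report what it is waiting for."""
        if self.reply is not None:
            return TaskStatus.DONE
        try:
            self.reply = self.sock.recv(1024)
        except BlockingIOError:
            # No data yet: tell the caller we need read readiness.
            return TaskStatus.WAITING_READ
        except OSError:
            return TaskStatus.FAILED
        return TaskStatus.DONE


def run_tasks(tasks, timeout_seconds=5.0):
    """Drive tasks to completion; return how many times poll() was used."""
    started = time.monotonic()
    pending = list(tasks)
    polls = 0
    finished = (TaskStatus.DONE, TaskStatus.FAILED)
    io_wait = (TaskStatus.WAITING_READ, TaskStatus.WAITING_WRITE)
    while pending:
        # Timeout is based on real elapsed time, not on iteration count.
        if time.monotonic() - started > timeout_seconds:
            raise TimeoutError("tasks did not finish in time")
        statuses = [(task, task.manage()) for task in pending]
        pending = [t for t, s in statuses if s not in finished]
        waiting = [(t, s) for t, s in statuses if s in io_wait]
        # Sleep only if every remaining task is blocked on network IO.
        if pending and len(waiting) == len(pending):
            poller = select.poll()
            for task, status in waiting:
                want_read = status is TaskStatus.WAITING_READ
                poller.register(task.sock, select.POLLIN if want_read else select.POLLOUT)
            elapsed = time.monotonic() - started
            remaining_ms = max(0, int((timeout_seconds - elapsed) * 1000))
            poller.poll(remaining_ms)
            polls += 1
    return polls
```

A task that still has work to do (`PROCESSING`) keeps the loop spinning without sleeping, while a set of tasks that are all blocked on IO wakes up as soon as any socket becomes ready instead of after a fixed interval. `select.poll` is not available on every platform; the standard `selectors` module would be a portable substitute.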
src	Rely less on remote_task_check_interval.	2016-06-02 12:11:16 -06:00
.gitattributes	Switch to using git attributes to ignore files	2016-02-15 23:41:51 -07:00
.gitignore	Initial commit of Citus 5.0	2016-02-11 04:05:32 +02:00
.travis.yml	Initial commit of Citus 5.0	2016-02-11 04:05:32 +02:00
CHANGELOG.md	Add CHANGELOG entries for 5.1 release	2016-05-17 10:02:05 -06:00
CONTRIBUTING.md	Proper indentation for code blocks in lists	2016-03-30 15:40:53 -07:00
LICENSE	Add AGPL-3.0 in LICENSE file	2016-03-23 17:04:58 -06:00
Makefile	Fix various build issues	2016-03-11 13:38:47 -07:00
Makefile.global.in	Fix various build issues	2016-03-11 13:38:47 -07:00
README.md	Emphasize our slack room (#464)	2016-04-22 14:54:13 -07:00
autogen.sh	Changed product name to citus	2016-02-15 16:04:31 +02:00
configure	Update copyright dates	2016-03-23 17:14:37 -06:00
configure.in	Update copyright dates	2016-03-23 17:14:37 -06:00
github-banner.png	Readme for 5.0	2016-03-18 13:32:13 -07:00
prep_buildtree	Changed product name to citus	2016-02-15 16:04:31 +02:00

README.md

What is Citus?

Open-source PostgreSQL extension (not a fork)
Scalable across multiple hosts through sharding and replication
Distributed engine for query parallelization
Highly available in the face of host failures

Citus horizontally scales PostgreSQL across commodity servers using sharding and replication. Its query engine parallelizes incoming SQL queries across these servers to enable real-time responses on large datasets.
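As a rough illustration of what sharding and query parallelization mean in general, here is a toy Python sketch. It does not reflect how Citus itself places shards or plans queries: the shard count, the `customer` distribution key, and the functions `shard_for`, `distribute`, and `parallel_count` are all assumptions made up for this example. Rows are routed to a shard by hashing a key, each shard computes a partial count in parallel, and the partial results are combined.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

SHARD_COUNT = 4  # hypothetical number of shards


def shard_for(key: str) -> int:
    """Map a distribution key to a shard using a stable hash."""
    return zlib.crc32(key.encode("utf-8")) % SHARD_COUNT


def distribute(rows):
    """Split rows into per-shard lists based on the 'customer' key."""
    shards = [[] for _ in range(SHARD_COUNT)]
    for row in rows:
        shards[shard_for(row["customer"])].append(row)
    return shards


def parallel_count(shards, predicate):
    """Count matching rows on each shard in parallel, then sum the partials."""
    def count_one(shard):
        return sum(1 for row in shard if predicate(row))

    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        return sum(pool.map(count_one, shards))
```

For example, `parallel_count(distribute(rows), lambda r: r["clicks"] >= 50)` answers the same question as a single-node count, but each shard only scans its own slice of the data.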

Citus extends the underlying database rather than forking it, which gives developers and enterprises the power and familiarity of a traditional relational database. As an extension, Citus supports new PostgreSQL releases, allowing users to benefit from new features while maintaining compatibility with existing PostgreSQL tools. Note that Citus supports many (but not all) SQL commands; see the FAQ for more details.

Common Use-Cases:

Powering real-time analytic dashboards
Exploratory queries on events as they happen
Large dataset archival and reporting
Session analytics (funnels, segmentation, and cohorts)

To learn more, visit citusdata.com and join the mailing list to stay on top of the latest developments.

Quickstart

Local Citus Cluster

Install docker-compose: Mac | Linux
(Mac only) connect to Docker VM
```
eval $(docker-machine env default)
```

Pull and start the docker images

```
wget https://raw.githubusercontent.com/citusdata/docker/master/docker-compose.yml
docker-compose -p citus up -d
```

Connect to the master database

```
docker exec -it citus_master psql -U postgres -d postgres
```

Follow the first tutorial instructions
To shut the cluster down, run
```
docker-compose -p citus down
```

Talk to Contributors and Learn More

Documentation	Try the Citus tutorials for a hands-on introduction or the documentation for a more comprehensive reference.
Google Groups	The Citus Google Group is our place for detailed questions and discussions.
Slack	Chat with us in our community Slack channel.
GitHub Issues	We track specific bug reports and feature requests on our project issues.
Twitter	Follow @citusdata for general updates and PostgreSQL scaling tips.
Training and Support	See our support page for training and dedicated support options.

Contributing

Citus is built on open source and is itself open source. We welcome your contributions and have added a helpwanted label to issues that are accessible to new contributors. The CONTRIBUTING.md file explains how to get started developing the Citus extension itself and describes our code quality guidelines.

Who is Using Citus?

Citus is deployed in production by many customers, ranging from technology start-ups to large enterprises. Here are some examples:

CloudFlare uses Citus to provide real-time analytics on 100 TB of data from over 4 million customer websites. Case Study
MixRank uses Citus to efficiently collect and analyze vast amounts of data to allow inside B2B sales teams to find new customers. Case Study
Neustar builds and maintains scalable ad-tech infrastructure that counts billions of events per day using Citus and HyperLogLog.
Agari uses Citus to secure more than 85 percent of U.S. consumer emails on two 6-8 TB clusters. Case Study
Heap uses Citus to run dynamic funnel, segmentation, and cohort queries across billions of users and tens of billions of events. Watch Video
