Distributed PostgreSQL as an extension
 
 
 
 
 
 
Go to file
Önder Kalacı b74ed3c8e1 Subqueries in where -- updated (#1372)
* Support for subqueries in WHERE clause

This commit enables subqueries in WHERE clause to be pushed down
by the subquery pushdown logic.

The support covers:
  - Correlated subqueries with IN, NOT IN, EXISTS, NOT EXISTS,
    operator expressions such as (>, <, =, ALL, ANY etc.)
  - Non-correlated subqueries with (partition_key) IN (SELECT partition_key ..)
    (partition_key) =ANY (SELECT partition_key ...)

Note that this commit heavily utilizes the attribute equivalence logic introduced
in the 1cb6a34ba8. In general, this commit mostly
adjusts the logical planner not to error out on the subqueries in WHERE clause.

* Improve error checks for subquery pushdown and INSERT ... SELECT

Since we allow subqueries in WHERE clause with the previous commit,
we should apply the same limitations to those subqueries.

With this commit, we do not iterate on each subquery one by one.
Instead, we extract all the subqueries and apply the checks directly
on those subqueries. The aim of this change is to (i) Simplify the
code (ii) Make it close to the checks on INSERT .. SELECT code base.

* Extend checks for unresolved paramaters to include SubLinks

With the presence of subqueries in where clause (i.e., SubPlans on the
query) the existing way for checking unresolved parameters fail. The
reason is that the parameters for SubPlans are kept on the parent plan not
on the query itself (see primnodes.h for the details).

With this commit, instead of checking SubPlans on the modified plans
we start to use originalQuery, where SubLinks represent the subqueries
in where clause. The unresolved parameters can be found on the SubLinks.

* Apply code-review feedback

* Remove unnecessary copying of shard interval list

This commit removes unnecessary copying of shard interval list. Note
that there are no copyObject function implemented for shard intervals.
2017-05-01 17:20:21 +03:00
src Subqueries in where -- updated (#1372) 2017-05-01 17:20:21 +03:00
.codecov.yml Bump target to 87.5% 2016-12-09 14:06:35 -07:00
.editorconfig Set tab size for GitHub display 2017-03-22 13:03:39 -06:00
.gitattributes Support PostgreSQL 9.6 2016-10-18 16:23:55 -06:00
.gitignore Initial commit of Citus 5.0 2016-02-11 04:05:32 +02:00
.travis.yml Add comment for otherwise opaque secure value 2016-12-06 11:30:22 -07:00
CHANGELOG.md Add 6.1.0 CHANGELOG entries (#1219) 2017-02-09 17:05:17 -07:00
CONTRIBUTING.md Proper indentation for code blocks in lists 2016-03-30 15:40:53 -07:00
LICENSE Add AGPL-3.0 in LICENSE file 2016-03-23 17:04:58 -06:00
Makefile Fix VPATH builds broken in 087d8427e3. 2017-04-25 16:04:42 -07:00
Makefile.global.in Fix VPATH builds broken in 087d8427e3. 2017-04-25 16:04:42 -07:00
README.md Update README.md 2017-03-23 11:00:32 -07:00
autogen.sh Changed product name to citus 2016-02-15 16:04:31 +02:00
configure Self-implemented review feedback 2017-04-03 22:55:12 -06:00
configure.in Self-implemented review feedback 2017-04-03 22:55:12 -06:00
github-banner.png Readme for 5.0 2016-03-18 13:32:13 -07:00
prep_buildtree Changed product name to citus 2016-02-15 16:04:31 +02:00

README.md

Citus Banner

Build Status Slack Status Latest Docs

What is Citus?

  • Open-source PostgreSQL extension (not a fork)
  • Scalable across multiple machines through sharding and replication
  • Distributed engine for query parallelization
  • Database designed to scale multi-tenant applications

Citus is a distributed database that scales across commodity servers using transparent sharding and replication. Citus extends the underlying database rather than forking it, giving developers and enterprises the power and familiarity of a relational database. As an extension, Citus supports new PostgreSQL releases, and allows you to benefit from new features while maintaining compatibility with existing PostgreSQL tools.

Citus serves many use cases. Two common ones are:

  1. Multi-tenant database: Most B2B applications already have the notion of a tenant / customer / account built into their data model. Citus allows you to scale out your transactional relational database to 100K+ tenants with minimal changes to your application.

  2. Real-time analytics: Citus enables ingesting large volumes of data and running analytical queries on that data in human real-time. Example applications include analytic dashboards with subsecond response times and exploratory queries on unfolding events.

To learn more, visit citusdata.com and join the mailing list to stay on top of the latest developments.

Getting started with Citus

The fastest way to get up and running is to create a Citus Cloud account. You can also setup a local Citus cluster with Docker.

Citus Cloud

Citus Cloud runs on top of AWS as a fully managed database as a service and has development plans available for getting started. You can provision a Citus Cloud account at https://console.citusdata.com and get started with just a few clicks.

Local Citus Cluster

If you're looking to get started locally, you can follow the following steps to get up and running.

  1. Install Docker Community Edition and Docker Compose
  • Mac:
    1. Download and install Docker.
    2. Start Docker by clicking on the applications icon.
  • Linux:
    curl -sSL https://get.docker.com/ | sh
    sudo usermod -aG docker $USER && exec sg docker newgrp `id -gn`
    sudo systemctl start docker
    
    sudo curl -sSL https://github.com/docker/compose/releases/download/1.11.2/docker-compose-`uname -s`-`uname -m` -o /usr/local/bin/docker-compose
    sudo chmod +x /usr/local/bin/docker-compose
    
    The above version of Docker Compose is sufficient for running Citus, or you can install the latest version.
  1. Pull and start the Docker images
curl -sSLO https://raw.githubusercontent.com/citusdata/docker/master/docker-compose.yml
docker-compose -p citus up -d
  1. Connect to the master database
docker exec -it citus_master psql -U postgres
  1. Follow the first tutorial instructions
  2. To shut the cluster down, run
docker-compose -p citus down

Talk to Contributors and Learn More

Documentation Try the Citus tutorial for a hands-on introduction or
the documentation for a more comprehensive reference.
Google Groups The Citus Google Group is our place for detailed questions and discussions.
Slack Chat with us in our community Slack channel.
Github Issues We track specific bug reports and feature requests on our project issues.
Twitter Follow @citusdata for general updates and PostgreSQL scaling tips.

Contributing

Citus is built on and of open source, and we welcome your contributions. The CONTRIBUTING.md file explains how to get started developing the Citus extension itself and our code quality guidelines.

Who is Using Citus?

Citus is deployed in production by many customers, ranging from technology start-ups to large enterprises. Here are some examples:

  • CloudFlare uses Citus to provide real-time analytics on 100 TBs of data from over 4 million customer websites. Case Study
  • MixRank uses Citus to efficiently collect and analyze vast amounts of data to allow inside B2B sales teams to find new customers. Case Study
  • Neustar builds and maintains scalable ad-tech infrastructure that counts billions of events per day using Citus and HyperLogLog.
  • Agari uses Citus to secure more than 85 percent of U.S. consumer emails on two 6-8 TB clusters. Case Study
  • Heap uses Citus to run dynamic funnel, segmentation, and cohort queries across billions of users and tens of billions of events. Watch Video

Copyright © 20122017 Citus Data, Inc.