Distributed PostgreSQL as an extension
 
 
 
 
 
 
Go to file
Önder Kalacı ad5cd326a4 Subquery pushdown - main branch (#1323)
* Enabling physical planner for subquery pushdown changes

This commit applies the logic that exists in INSERT .. SELECT
planning to the subquery pushdown changes.

The main algorithm is followed as :
   - pick an anchor relation (i.e., target relation)
   - per each target shard interval
       - add the target shard interval's shard range
         as a restriction to the relations (if all relations
         joined on the partition keys)
        - Check whether the query is router plannable per
          target shard interval.
        - If router plannable, create a task

* Add union support within the JOINS

This commit adds support for UNION/UNION ALL subqueries that are
in the following form:

     .... (Q1 UNION Q2 UNION ...) as union_query JOIN (QN) ...

In other words, we currently do NOT support the queries that are
in the following form where union query is not JOINed with
other relations/subqueries :

     .... (Q1 UNION Q2 UNION ...) as union_query ....

* Subquery pushdown planner uses original query

With this commit, we change the input to the logical planner for
subquery pushdown. Before this commit, the planner was relying
on the query tree that is transformed by the postgresql planner.
After this commit, the planner uses the original query. The main
motivation behind this change is the simplify deparsing of
subqueries.

* Enable top level subquery join queries

This work enables
- Top level subquery joins
- Joins between subqueries and relations
- Joins involving more than 2 range table entries

A new regression test file is added to reflect enabled test cases

* Add top level union support

This commit adds support for UNION/UNION ALL subqueries that are
in the following form:

     .... (Q1 UNION Q2 UNION ...) as union_query ....

In other words, Citus supports allow top level
unions being wrapped into aggregations queries
and/or simple projection queries that only selects
some fields from the lower level queries.

* Disallow subqueries without a relation in the range table list for subquery pushdown

This commit disallows subqueries without relation in the range table
list. This commit is only applied for subquery pushdown. In other words,
we do not add this limitation for single table re-partition subqueries.

The reasoning behind this limitation is that if we allow pushing down
such queries, the result would include (shardCount * expectedResults)
where in a non distributed world the result would be (expectedResult)
only.

* Disallow subqueries without a relation in the range table list for INSERT .. SELECT

This commit disallows subqueries without relation in the range table
list. This commit is only applied for INSERT.. SELECT queries.

The reasoning behind this limitation is that if we allow pushing down
such queries, the result would include (shardCount * expectedResults)
where in a non distributed world the result would be (expectedResult)
only.

* Change behaviour of subquery pushdown flag (#1315)

This commit changes the behaviour of the citus.subquery_pushdown flag.
Before this commit, the flag is used to enable subquery pushdown logic. But,
with this commit, that behaviour is enabled by default. In other words, the
flag is now useless. We prefer to keep the flag since we don't want to break
the backward compatibility. Also, we may consider using that flag for other
purposes in the next commits.

* Require subquery_pushdown when limit is used in subquery

Using limit in subqueries may cause returning incorrect
results. Therefore we allow limits in subqueries only
if user explicitly set subquery_pushdown flag.

* Evaluate expressions on the LIMIT clause (#1333)

Subquery pushdown uses orignal query, the LIMIT and OFFSET clauses
are not evaluated. However, logical optimizer expects these expressions
are already evaluated by the standard planner. This commit manually
evaluates the functions on the logical planner for subquery pushdown.

* Better format subquery regression tests (#1340)

* Style fix for subquery pushdown regression tests

With this commit we intented a more consistent style for the
regression tests we've added in the
  - multi_subquery_union.sql
  - multi_subquery_complex_queries.sql
  - multi_subquery_behavioral_analytics.sql

* Enable the tests that are temporarily commented

This commit enables some of the regression tests that were commented
out until all the development is done.

* Fix merge conflicts (#1347)

 - Update regression tests to meet the changes in the regression
   test output.
 - Replace Ifs with Asserts given that the check is already done
 - Update shard pruning outputs

* Add view regression tests for increased subquery coverage (#1348)

- joins between views and tables
- joins between views
- union/union all queries involving views
- views with limit
- explain queries with view

* Improve btree operators for the subquery tests

This commit adds the missing comprasion for subquery composite key
btree comparator.
2017-04-29 04:09:48 +03:00
src Subquery pushdown - main branch (#1323) 2017-04-29 04:09:48 +03:00
.codecov.yml Bump target to 87.5% 2016-12-09 14:06:35 -07:00
.editorconfig Set tab size for GitHub display 2017-03-22 13:03:39 -06:00
.gitattributes Support PostgreSQL 9.6 2016-10-18 16:23:55 -06:00
.gitignore Initial commit of Citus 5.0 2016-02-11 04:05:32 +02:00
.travis.yml Add comment for otherwise opaque secure value 2016-12-06 11:30:22 -07:00
CHANGELOG.md Add 6.1.0 CHANGELOG entries (#1219) 2017-02-09 17:05:17 -07:00
CONTRIBUTING.md Proper indentation for code blocks in lists 2016-03-30 15:40:53 -07:00
LICENSE Add AGPL-3.0 in LICENSE file 2016-03-23 17:04:58 -06:00
Makefile Fix VPATH builds broken in 087d8427e3. 2017-04-25 16:04:42 -07:00
Makefile.global.in Fix VPATH builds broken in 087d8427e3. 2017-04-25 16:04:42 -07:00
README.md Update README.md 2017-03-23 11:00:32 -07:00
autogen.sh Changed product name to citus 2016-02-15 16:04:31 +02:00
configure Self-implemented review feedback 2017-04-03 22:55:12 -06:00
configure.in Self-implemented review feedback 2017-04-03 22:55:12 -06:00
github-banner.png Readme for 5.0 2016-03-18 13:32:13 -07:00
prep_buildtree Changed product name to citus 2016-02-15 16:04:31 +02:00

README.md

Citus Banner

Build Status Slack Status Latest Docs

What is Citus?

  • Open-source PostgreSQL extension (not a fork)
  • Scalable across multiple machines through sharding and replication
  • Distributed engine for query parallelization
  • Database designed to scale multi-tenant applications

Citus is a distributed database that scales across commodity servers using transparent sharding and replication. Citus extends the underlying database rather than forking it, giving developers and enterprises the power and familiarity of a relational database. As an extension, Citus supports new PostgreSQL releases, and allows you to benefit from new features while maintaining compatibility with existing PostgreSQL tools.

Citus serves many use cases. Two common ones are:

  1. Multi-tenant database: Most B2B applications already have the notion of a tenant / customer / account built into their data model. Citus allows you to scale out your transactional relational database to 100K+ tenants with minimal changes to your application.

  2. Real-time analytics: Citus enables ingesting large volumes of data and running analytical queries on that data in human real-time. Example applications include analytic dashboards with subsecond response times and exploratory queries on unfolding events.

To learn more, visit citusdata.com and join the mailing list to stay on top of the latest developments.

Getting started with Citus

The fastest way to get up and running is to create a Citus Cloud account. You can also setup a local Citus cluster with Docker.

Citus Cloud

Citus Cloud runs on top of AWS as a fully managed database as a service and has development plans available for getting started. You can provision a Citus Cloud account at https://console.citusdata.com and get started with just a few clicks.

Local Citus Cluster

If you're looking to get started locally, you can follow the following steps to get up and running.

  1. Install Docker Community Edition and Docker Compose
  • Mac:
    1. Download and install Docker.
    2. Start Docker by clicking on the applications icon.
  • Linux:
    curl -sSL https://get.docker.com/ | sh
    sudo usermod -aG docker $USER && exec sg docker newgrp `id -gn`
    sudo systemctl start docker
    
    sudo curl -sSL https://github.com/docker/compose/releases/download/1.11.2/docker-compose-`uname -s`-`uname -m` -o /usr/local/bin/docker-compose
    sudo chmod +x /usr/local/bin/docker-compose
    
    The above version of Docker Compose is sufficient for running Citus, or you can install the latest version.
  1. Pull and start the Docker images
curl -sSLO https://raw.githubusercontent.com/citusdata/docker/master/docker-compose.yml
docker-compose -p citus up -d
  1. Connect to the master database
docker exec -it citus_master psql -U postgres
  1. Follow the first tutorial instructions
  2. To shut the cluster down, run
docker-compose -p citus down

Talk to Contributors and Learn More

Documentation Try the Citus tutorial for a hands-on introduction or
the documentation for a more comprehensive reference.
Google Groups The Citus Google Group is our place for detailed questions and discussions.
Slack Chat with us in our community Slack channel.
Github Issues We track specific bug reports and feature requests on our project issues.
Twitter Follow @citusdata for general updates and PostgreSQL scaling tips.

Contributing

Citus is built on and of open source, and we welcome your contributions. The CONTRIBUTING.md file explains how to get started developing the Citus extension itself and our code quality guidelines.

Who is Using Citus?

Citus is deployed in production by many customers, ranging from technology start-ups to large enterprises. Here are some examples:

  • CloudFlare uses Citus to provide real-time analytics on 100 TBs of data from over 4 million customer websites. Case Study
  • MixRank uses Citus to efficiently collect and analyze vast amounts of data to allow inside B2B sales teams to find new customers. Case Study
  • Neustar builds and maintains scalable ad-tech infrastructure that counts billions of events per day using Citus and HyperLogLog.
  • Agari uses Citus to secure more than 85 percent of U.S. consumer emails on two 6-8 TB clusters. Case Study
  • Heap uses Citus to run dynamic funnel, segmentation, and cohort queries across billions of users and tens of billions of events. Watch Video

Copyright © 20122017 Citus Data, Inc.