Distributed PostgreSQL as an extension

Eren 3eaff48114 Propagate DDL Commands with 2PC
Fixes #513

This change modifies the DDL Propagation logic so that DDL queries
are propagated via 2-Phase Commit protocol. This way, failures during
the execution of distributed DDL commands will not leave the table in
an intermediate state and the pending prepared transactions can be
committed manually.

DDL commands are not allowed inside other transaction blocks or functions.

DDL commands are performed with 2PC regardless of the value of the
`citus.multi_shard_commit_protocol` parameter.

The workflow of the successful case is this:
1. Open individual connections to all shard placements and send `BEGIN`
2. Send `SELECT worker_apply_shard_ddl_command(<shardId>, <DDL Command>)`
to all connections, one by one, in a serial manner.
3. Send `PREPARE TRANSACTION <transaction_id>` to all connections.
4. Send `COMMIT PREPARED <transaction_id>` to all connections.
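
As a rough illustration of the four steps above, the following Python sketch records the statements a coordinator would send to each placement. It is not the Citus implementation (which is written in C): `FakeConnection`, `propagate_ddl`, the shard IDs, the table name, and the transaction ID format are all hypothetical. The last step is written as `COMMIT PREPARED`, the PostgreSQL statement that completes a transaction previously prepared with `PREPARE TRANSACTION`.

```python
from dataclasses import dataclass, field


@dataclass
class FakeConnection:
    """Hypothetical stand-in for a connection to one shard placement."""
    shard_id: int
    sent: list = field(default_factory=list)

    def execute(self, sql: str) -> None:
        # A real connection would send the statement to the worker node;
        # here we only record it so the ordering can be inspected.
        self.sent.append(sql)


def quote_literal(value: str) -> str:
    """Minimal SQL string-literal quoting (doubles single quotes)."""
    return "'" + value.replace("'", "''") + "'"


def propagate_ddl(connections, ddl_command, transaction_id_for):
    """Send the successful-case statements, one phase at a time."""
    # 1. Open a transaction block on every placement.
    for conn in connections:
        conn.execute("BEGIN")
    # 2. Apply the DDL command to each shard, serially.
    for conn in connections:
        conn.execute(
            "SELECT worker_apply_shard_ddl_command("
            f"{conn.shard_id}, {quote_literal(ddl_command)})"
        )
    # 3. First phase of 2PC: prepare on every placement.
    for conn in connections:
        gid = quote_literal(transaction_id_for(conn))
        conn.execute(f"PREPARE TRANSACTION {gid}")
    # 4. Second phase of 2PC: commit the prepared transactions.
    for conn in connections:
        gid = quote_literal(transaction_id_for(conn))
        conn.execute(f"COMMIT PREPARED {gid}")


# Made-up shard IDs and an assumed transaction ID format, for illustration.
connections = [FakeConnection(102008), FakeConnection(102009)]
propagate_ddl(
    connections,
    "ALTER TABLE events ADD COLUMN note text",
    lambda conn: f"example_ddl_{conn.shard_id}",
)
```

Running each phase across all connections before starting the next is what keeps a failure in an early phase recoverable by rollback on every placement.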

Failure cases:
- If a worker problem occurs before all DDL commands have been sent, then
all changes are rolled back.
- If a worker problem occurs after all DDL commands have been sent but before
all `PREPARE TRANSACTION` commands have finished, then all changes are rolled back.
However, if a worker node has failed, then the prepared transactions on that worker
should be rolled back manually.
- If a worker problem occurs while `COMMIT PREPARED` statements are being sent,
then the prepared transactions on the failed workers should be committed manually.
- If the master fails before the first `PREPARE TRANSACTION` is sent, then nothing is
changed on the workers.
- If the master fails while `PREPARE TRANSACTION` commands are being sent, then the
prepared transactions on the workers should be rolled back manually.
- If the master fails while `COMMIT PREPARED` or `ROLLBACK PREPARED` commands are being
sent, then the remaining prepared transactions on the workers should be handled manually.
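
Handling a prepared transaction manually generally means connecting to the affected worker, listing pending entries in PostgreSQL's `pg_prepared_xacts` view, and issuing `COMMIT PREPARED` or `ROLLBACK PREPARED` for each one. The Python sketch below only builds those statements; the helper name `cleanup_statements` and the transaction IDs are hypothetical, and whether to commit or roll back depends on which failure case above applies.

```python
# Query an operator could run on a worker to list pending prepared
# transactions (pg_prepared_xacts is a standard PostgreSQL system view).
LIST_PREPARED_SQL = "SELECT gid FROM pg_prepared_xacts;"


def quote_literal(value: str) -> str:
    """Minimal SQL string-literal quoting (doubles single quotes)."""
    return "'" + value.replace("'", "''") + "'"


def cleanup_statements(gids, commit):
    """Build one COMMIT PREPARED or ROLLBACK PREPARED statement per gid."""
    verb = "COMMIT PREPARED" if commit else "ROLLBACK PREPARED"
    return [f"{verb} {quote_literal(gid)};" for gid in gids]


# Hypothetical transaction IDs left behind on a worker.
leftover = ["example_ddl_102008", "example_ddl_102009"]

# Failure during the prepare phase: roll back what was prepared.
rollback_sql = cleanup_statements(leftover, commit=False)

# Failure during the commit phase: finish committing what remains.
commit_sql = cleanup_statements(leftover, commit=True)
```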

This change also helps with #480, since failed DDL changes no longer mark
failed placements as inactive.
2016-07-19 10:44:11 +03:00
src Propagate DDL Commands with 2PC 2016-07-19 10:44:11 +03:00
.gitattributes Switch to using git attributes to ignore files 2016-02-15 23:41:51 -07:00
.gitignore Initial commit of Citus 5.0 2016-02-11 04:05:32 +02:00
.travis.yml Omit open- tracking branches from build 2016-06-03 18:01:36 -06:00
CHANGELOG.md Add CHANGELOG entries for 5.1.1 release 2016-06-17 16:03:32 -06:00
CONTRIBUTING.md Proper indentation for code blocks in lists 2016-03-30 15:40:53 -07:00
LICENSE Add AGPL-3.0 in LICENSE file 2016-03-23 17:04:58 -06:00
Makefile Fix various build issues 2016-03-11 13:38:47 -07:00
Makefile.global.in Detect flex in citus configure script, instead of relying postgres'. 2016-06-22 11:03:22 -07:00
README.md Fix the documentation link from Citus 5.0 to Citus 5.1 (#593) 2016-06-16 10:45:36 -07:00
autogen.sh Changed product name to citus 2016-02-15 16:04:31 +02:00
configure Detect flex in citus configure script, instead of relying postgres'. 2016-06-22 11:03:22 -07:00
configure.in Detect flex in citus configure script, instead of relying postgres'. 2016-06-22 11:03:22 -07:00
github-banner.png Readme for 5.0 2016-03-18 13:32:13 -07:00
prep_buildtree Changed product name to citus 2016-02-15 16:04:31 +02:00

README.md

What is Citus?

  • Open-source PostgreSQL extension (not a fork)
  • Scalable across multiple hosts through sharding and replication
  • Distributed engine for query parallelization
  • Highly available in the face of host failures

Citus horizontally scales PostgreSQL across commodity servers using sharding and replication. Its query engine parallelizes incoming SQL queries across these servers to enable real-time responses on large datasets.

Citus extends the underlying database rather than forking it, which gives developers and enterprises the power and familiarity of a traditional relational database. As an extension, Citus supports new PostgreSQL releases, allowing users to benefit from new features while maintaining compatibility with existing PostgreSQL tools. Note that Citus supports many (but not all) SQL commands; see the FAQ for more details.

Common Use-Cases:

  • Powering real-time analytic dashboards
  • Exploratory queries on events as they happen
  • Large dataset archival and reporting
  • Session analytics (funnels, segmentation, and cohorts)

To learn more, visit citusdata.com and join the mailing list to stay on top of the latest developments.

Quickstart

Local Citus Cluster

  • Install docker-compose: Mac | Linux

  • (Mac only) connect to Docker VM

    eval $(docker-machine env default)
    
  • Pull and start the docker images

    wget https://raw.githubusercontent.com/citusdata/docker/master/docker-compose.yml
    docker-compose -p citus up -d
    
  • Connect to the master database

    docker exec -it citus_master psql -U postgres -d postgres
    
  • Follow the first tutorial instructions

  • To shut the cluster down, run

    docker-compose -p citus down
    

Talk to Contributors and Learn More

Documentation: Try the Citus tutorials for a hands-on introduction or the documentation for a more comprehensive reference.
Google Groups: The Citus Google Group is our place for detailed questions and discussions.
Slack: Chat with us in our community Slack channel.
GitHub Issues: We track specific bug reports and feature requests on our project issues.
Twitter: Follow @citusdata for general updates and PostgreSQL scaling tips.
Training and Support: See our support page for training and dedicated support options.

Contributing

Citus is built on and of open source. We welcome your contributions, and have added a helpwanted label to issues which are accessible to new contributors. The CONTRIBUTING.md file explains how to get started developing the Citus extension itself and our code quality guidelines.

Who is Using Citus?

Citus is deployed in production by many customers, ranging from technology start-ups to large enterprises. Here are some examples:

  • CloudFlare uses Citus to provide real-time analytics on 100 TBs of data from over 4 million customer websites. Case Study
  • MixRank uses Citus to efficiently collect and analyze vast amounts of data to allow inside B2B sales teams to find new customers. Case Study
  • Neustar builds and maintains scalable ad-tech infrastructure that counts billions of events per day using Citus and HyperLogLog.
  • Agari uses Citus to secure more than 85 percent of U.S. consumer emails on two 6-8 TB clusters. Case Study
  • Heap uses Citus to run dynamic funnel, segmentation, and cohort queries across billions of users and tens of billions of events. Watch Video

Copyright © 2012–2016 Citus Data, Inc.