Distributed PostgreSQL as an extension
 
 
 
 
 
 
Go to file
Onur Tirtir 5e5ba46793
Merge pull request #4143 from citusdata/single-placement-table/master-cache-entry-rebased
DESCRIPTION: Introduce citus local tables

The commits in this pr are merged from other sub-pr's:

* community/#3852: Brings lazy&fast table creation logic for create_citus_local_table udf
* community/#3995: Brings extended utility command support for citus local tables
* community/#4133: Brings changes in planner and in several places to integrate citus local tables into our distributed execution logic

We are introducing citus local tables, which a new table type to citus.

To be able to create a citus local table, first we need to add coordinator as a worker
node.
Then, we can create a citus local table via SELECT create_citus_local_table(<tableName>).

Calling this udf from coordinator will actually create a single-shard table whose shard
is on the coordinator.

Also, from the citus metadata perspective, for citus local tables:
* partitionMethod is set to DISTRIBUTE_BY_NONE (like reference tables) and
* replicationModel is set to the current value of citus.replication_model, which
  already can't be equal to REPLICATION_MODEL_2PC, which is only used for reference
  tables internally.

Note that currently we support creating citus local tables only from postgres tables
living in the coordinator.
That means, it is not allowed to execute this udf from worker nodes or it is not allowed
to move shard of a citus local table to any other nodes.

Also, run-time complexity of calling create_citus_local_table udf does not depend
on the size of the relation, that means, creating citus local tables is actually a
non-blocking operation.
This is because, instead of copying the data to a new shard, this udf just does the
following:

* convert input postgres table to the single-shard of the citus local table by suffixing
  the shardId to it's name, constraints, indexes and triggers etc.,
* create a shell table for citus local table in coordinator and in mx-worker nodes when
  metadata sycn is enabled.
* create necessary objects on shell table.

Here, we should also note we can execute queries/dml's from mx worker nodes
as citus local tables are already first class citus tables.

Even more, we brought trigger support for citus local tables.
That means, we can define triggers on citus local tables so that users can define trigger
objects to perform execution of custom functions that might even modify other citus tables
and other postgres tables.

Other than trigger support, citus local tables can also be involved in foreign key relationships
with reference tables.
Here the only restriction is, foreign keys from reference tables to citus local tables cannot
have behaviors other than RESTRICT & NO ACTION behavior.
Other than that, foreign keys between citus local tables and reference tables just work fine.

All in all, citus local tables are actually just local tables living in the coordinator, but natively
accessible from other nodes like other first class citus tables and this enables us to set foreign
keys constraints between very big coordinator tables and reference tables without having to
do any data replication to worker nodes for local tables.
2020-09-09 13:02:42 +03:00
.circleci Fail if merge to enterprise causes compilation issues (#4027) 2020-08-31 13:56:15 +03:00
.github Add DESCRIPTION to PR template 2018-12-12 05:35:12 +01:00
ci Fail if merge to enterprise causes compilation issues (#4027) 2020-08-31 13:56:15 +03:00
config Add citus_version(), analogous to PG's version() 2017-10-16 18:09:29 -06:00
src Apply planner changes for citus local tables 2020-09-09 11:51:18 +03:00
vendor Update cherry-pick hash in vendor README 2020-03-19 11:53:05 +01:00
.codecov.yml Ignore safestringlib sourcefiles in coverage (#3632) 2020-03-20 14:26:52 +01:00
.editorconfig Fix editorconfig syntax (#3272) 2019-12-06 17:05:04 +01:00
.gitattributes update cte inline output for pg13 2020-08-04 15:18:27 +03:00
.gitignore Ignore .vscode (#2969) 2019-09-18 17:08:22 +03:00
.ignore Convert unsafe APIs to safe ones 2020-02-25 15:39:27 +01:00
CHANGELOG.md Update CHANGELOG for 9.4.0 2020-07-28 14:40:45 +03:00
CONTRIBUTING.md Add python to OSX CONTRIBUTING requirements (#4083) 2020-08-31 18:14:21 +02:00
LICENSE Strip trailing whitespace and add final newline (#3186) 2019-11-21 14:25:37 +01:00
Makefile Introduce new make targets for downgrade scripts 2020-07-14 13:10:18 +03:00
Makefile.global.in Use exactly matching tag in citus_version output (#3828) 2020-05-13 15:05:07 +03:00
README.md add circleci build status (#3310) (#3309) 2019-12-16 19:25:32 +03:00
aclocal.m4 Basic usage statistics collection. (#1656) 2017-10-11 09:55:15 -04:00
autogen.sh Changed product name to citus 2016-02-15 16:04:31 +02:00
configure Enable postgres 13 in configure 2020-08-04 13:34:13 +03:00
configure.in Add postgres 13 to configure.in (#4088) 2020-08-05 11:36:55 +03:00
github-banner.png Readme for 5.0 2016-03-18 13:32:13 -07:00
prep_buildtree Changed product name to citus 2016-02-15 16:04:31 +02:00

README.md

Citus Banner

Slack Status Latest Docs Circleci Status Code Coverage

What is Citus?

  • Open-source PostgreSQL extension (not a fork)
  • Built to scale out across multiple nodes
  • Distributed engine for query parallelization
  • Database designed to scale out multi-tenant applications, real-time analytics dashboards, and high-throughput transactional workloads

Citus is an open source extension to Postgres that distributes your data and your queries across multiple nodes. Because Citus is an extension to Postgres, and not a fork, Citus gives developers and enterprises a scale-out database while keeping the power and familiarity of a relational database. As an extension, Citus supports new PostgreSQL releases, and allows you to benefit from new features while maintaining compatibility with existing PostgreSQL tools.

Citus serves many use cases. Three common ones are:

  1. Multi-tenant & SaaS applications: Most B2B applications already have the notion of a tenant / customer / account built into their data model. Citus allows you to scale out your transactional relational database to 100K+ tenants with minimal changes to your application.

  2. Real-time analytics: Citus enables ingesting large volumes of data and running analytical queries on that data in human real-time. Example applications include analytic dashboards with sub-second response times and exploratory queries on unfolding events.

  3. High-throughput transactional workloads: By distributing your workload across a database cluster, Citus ensures low latency and high performance even with a large number of concurrent users and high volumes of transactions.

To learn more, visit citusdata.com and join the Citus slack to stay on top of the latest developments.

Getting started with Citus

The fastest way to get up and running is to deploy Citus in the cloud. You can also setup a local Citus database cluster with Docker.

Hyperscale (Citus) on Azure Database for PostgreSQL

Hyperscale (Citus) is a deployment option on Azure Database for PostgreSQL, a fully-managed database as a service. Hyperscale (Citus) employs the Citus open source extension so you can scale out across multiple nodes. To get started with Hyperscale (Citus), learn more on the Citus website or use the Hyperscale (Citus) Quickstart in the Azure docs.

Citus Cloud

Citus Cloud runs on top of AWS as a fully managed database as a service. You can provision a Citus Cloud account at https://console.citusdata.com and get started with just a few clicks.

Local Citus Cluster

If you're looking to get started locally, you can follow the following steps to get up and running.

  1. Install Docker Community Edition and Docker Compose
  • Mac:
    1. Download and install Docker.
    2. Start Docker by clicking on the applications icon.
  • Linux:
    curl -sSL https://get.docker.com/ | sh
    sudo usermod -aG docker $USER && exec sg docker newgrp `id -gn`
    sudo systemctl start docker
    
    sudo curl -sSL https://github.com/docker/compose/releases/download/1.11.2/docker-compose-`uname -s`-`uname -m` -o /usr/local/bin/docker-compose
    sudo chmod +x /usr/local/bin/docker-compose
    
    The above version of Docker Compose is sufficient for running Citus, or you can install the latest version.
  1. Pull and start the Docker images
curl -sSLO https://raw.githubusercontent.com/citusdata/docker/master/docker-compose.yml
docker-compose -p citus up -d
  1. Connect to the master database
docker exec -it citus_master psql -U postgres
  1. Follow the first tutorial instructions
  2. To shut the cluster down, run
docker-compose -p citus down

Talk to Contributors and Learn More

Documentation Try the Citus tutorial for a hands-on introduction or
the documentation for a more comprehensive reference.
Slack Chat with us in our community Slack channel.
Github Issues We track specific bug reports and feature requests on our project issues.
Twitter Follow @citusdata for general updates and PostgreSQL scaling tips.
Citus Blog Read our Citus Data Blog for posts on Postgres, Citus, and scaling your database.

Contributing

Citus is built on and of open source, and we welcome your contributions. The CONTRIBUTING.md file explains how to get started developing the Citus extension itself and our code quality guidelines.

Who is Using Citus?

Citus is deployed in production by many customers, ranging from technology start-ups to large enterprises. Here are some examples:

  • Algolia uses Citus to provide real-time analytics for over 1B searches per day. For faster insights, they also use TopN and HLL extensions. User Story
  • Heap uses Citus to run dynamic funnel, segmentation, and cohort queries across billions of users and has more than 700B events in their Citus database cluster. Watch Video
  • Pex uses Citus to ingest 80B data points per day and analyze that data in real-time. They use a 20+ node cluster on Google Cloud. User Story
  • MixRank uses Citus to efficiently collect and analyze vast amounts of data to allow inside B2B sales teams to find new customers. User Story
  • Agari uses Citus to secure more than 85 percent of U.S. consumer emails on two 6-8 TB clusters. User Story
  • Copper (formerly ProsperWorks) powers a cloud CRM service with Citus. User Story

You can read more user stories about how they employ Citus to scale Postgres for both multi-tenant SaaS applications as well as real-time analytics dashboards here.


Copyright © Citus Data, Inc.