DESCRIPTION: Add functions to help with postgres upgrades

Currently there is [a list of manual steps](https://docs.citusdata.com/en/v8.2/admin_guide/upgrading_citus.html?highlight=upgrade#upgrading-postgresql-version-from-10-to-11) to perform during a postgres upgrade. These steps guarantee that our catalog tables are kept and counter values are maintained across upgrades. Having more than one command in our docs for users to execute manually during upgrades is error prone, both for the user and for our docs. Two catalog tables have already been introduced to citus without being added to the backup steps in our docs (`pg_authinfo` and `pg_dist_poolinfo`). As we add more functionality to citus we run into situations where more steps are required either before or after the upgrade. At the same time, when we move catalog tables to a place where the contents are maintained automatically during upgrades, we could have fewer steps in our docs. This leads to a hard-to-maintain matrix of citus versions and steps to be performed.

Instead we could take ownership of these steps within the extension itself. This PR introduces two new functions for the user to call instead of following long lists of error-prone instructions.

- `citus_prepare_pg_upgrade`
  This function should be called by the user right before shutting down the cluster. It ensures all citus catalog tables are backed up in a location where the information will be retained during an upgrade.
- `citus_finish_pg_upgrade`
  This function should be called right after a pg_upgrade of the cluster. It restores the catalog tables to the state they were in before the upgrade happened.

Both functions need to be executed on the coordinator and on all the workers, in the same fashion as our current documentation instructs.

There are two known problems with these functions in their current form, which are also problems with our docs. We should schedule time in the future to improve on this, but having it automated now is better as we are about to add extra steps to take after upgrades.

- When you install citus in a clean cluster we enable ssl for communication between the coordinator and the workers. If an upgrade to a clean cluster is performed we do not set up ssl on the new cluster, causing the communication to fail.
- There are no automated tests added in this PR to execute an upgrade test during every build. Our current test infrastructure does not allow for 2 versions of postgres to exist in the same environment. We will need to invest time to create a new testing harness that could run the following scenario:
  1. Create cluster
  2. Run extensible scripts to execute arbitrary statements on this cluster
  3. Perform an upgrade by preparing, upgrading and finishing
  4. Run extensible scripts to verify all objects created by earlier scripts exist in correct form in the upgraded cluster

  Given the non-trivial amount of work involved for such a suite I'd like to land this before we have automated testing.

On a side note: as the reviewer noticed, the tables created in the public namespace are not visible in `psql` with `\d`. The backup catalog tables have the same names as the tables in `pg_catalog`. Due to postgres internals `pg_catalog` is first in the search path and therefore the non-qualified name would always resolve to `pg_catalog.pg_dist_*`. Internally this is called a non-visible table, as it would resolve to a different table without a qualified name. Only visible tables are shown with `\d`.
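The flow described above could be scripted roughly as follows. This is a minimal sketch, not the project's own tooling: the host names in `NODES`, the `postgres` user and default database, and the helper names `run_sql` and `on_all_nodes` are all assumptions for illustration, and both functions are assumed to take no arguments. It runs in dry-run mode by default and only prints the `psql` commands it would issue.

```shell
#!/bin/sh
# Sketch of the Citus steps around pg_upgrade. Dry run by default:
# commands are printed, not executed, unless DRY_RUN=0 is set.
DRY_RUN="${DRY_RUN:-1}"
# Hypothetical host names: the coordinator plus every worker.
NODES="${NODES:-coordinator worker-1 worker-2}"

# Run (or, in dry-run mode, print) one SQL statement on one node.
# $1 = host, $2 = SQL statement
run_sql() {
    if [ "$DRY_RUN" = "1" ]; then
        printf 'psql -h %s -U postgres -c "%s"\n' "$1" "$2"
    else
        psql -h "$1" -U postgres -c "$2"
    fi
}

# Call the named SQL function on the coordinator and on all workers,
# stopping at the first node that fails.
on_all_nodes() {
    for node in $NODES; do
        run_sql "$node" "SELECT $1();" || return 1
    done
}

# 1. Right before shutting down the old cluster.
on_all_nodes citus_prepare_pg_upgrade

# 2. Stop the old cluster and run pg_upgrade on every node (not shown).

# 3. Right after pg_upgrade, on the new cluster.
on_all_nodes citus_finish_pg_upgrade
```

Setting `DRY_RUN=0` would execute the statements instead of printing them; in a real cluster the connection options would need to match the database in which the citus extension is installed.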
README.md
What is Citus?
- Open-source PostgreSQL extension (not a fork)
- Built to scale out across multiple nodes
- Distributed engine for query parallelization
- Database designed to scale out multi-tenant applications, real-time analytics dashboards, and high-throughput transactional workloads
Citus is an open source extension to Postgres that distributes your data and your queries across multiple nodes. Because Citus is an extension to Postgres, and not a fork, Citus gives developers and enterprises a scale-out database while keeping the power and familiarity of a relational database. As an extension, Citus supports new PostgreSQL releases, and allows you to benefit from new features while maintaining compatibility with existing PostgreSQL tools.
Citus serves many use cases. Three common ones are:
- Multi-tenant & SaaS applications: Most B2B applications already have the notion of a tenant / customer / account built into their data model. Citus allows you to scale out your transactional relational database to 100K+ tenants with minimal changes to your application.
- Real-time analytics: Citus enables ingesting large volumes of data and running analytical queries on that data in human real-time. Example applications include analytic dashboards with sub-second response times and exploratory queries on unfolding events.
- High-throughput transactional workloads: By distributing your workload across a database cluster, Citus ensures low latency and high performance even with a large number of concurrent users and high volumes of transactions.
To learn more, visit citusdata.com and join the Citus slack to stay on top of the latest developments.
Getting started with Citus
The fastest way to get up and running is to deploy Citus in the cloud. You can also set up a local Citus database cluster with Docker.
Hyperscale (Citus) on Azure Database for PostgreSQL
Hyperscale (Citus) is a deployment option on Azure Database for PostgreSQL, a fully-managed database as a service. Hyperscale (Citus) employs the Citus open source extension so you can scale out across multiple nodes. To get started with Hyperscale (Citus), learn more on the Citus website or use the Hyperscale (Citus) Quickstart in the Azure docs.
Citus Cloud
Citus Cloud runs on top of AWS as a fully managed database as a service. You can provision a Citus Cloud account at https://console.citusdata.com and get started with just a few clicks.
Local Citus Cluster
If you're looking to get started locally, follow the steps below to get up and running.
- Install Docker Community Edition and Docker Compose
- Mac:
- Download and install Docker.
- Start Docker by clicking on the application’s icon.
- Linux:
curl -sSL https://get.docker.com/ | sh
sudo usermod -aG docker $USER && exec sg docker newgrp `id -gn`
sudo systemctl start docker
sudo curl -sSL https://github.com/docker/compose/releases/download/1.11.2/docker-compose-`uname -s`-`uname -m` -o /usr/local/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose
The above version of Docker Compose is sufficient for running Citus, or you can install the latest version.
- Pull and start the Docker images
curl -sSLO https://raw.githubusercontent.com/citusdata/docker/master/docker-compose.yml
docker-compose -p citus up -d
- Connect to the master database
docker exec -it citus_master psql -U postgres
- Follow the first tutorial instructions
- To shut the cluster down, run
docker-compose -p citus down
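When scripting the steps above, the database inside the container may still be starting when `docker-compose up -d` returns, so a connection attempt right away can fail. A small retry helper can bridge that gap. This is a sketch under assumptions: `wait_for` is a hypothetical helper name, and the usage line assumes `pg_isready` is available inside the `citus_master` container.

```shell
#!/bin/sh
# Retry a command until it succeeds or the attempts run out.
# $1 = maximum number of attempts, $2 = delay in seconds between attempts,
# remaining arguments = the command to run.
wait_for() {
    attempts="$1"
    delay="$2"
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        # Success: stop retrying and report it to the caller.
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    # All attempts failed.
    return 1
}

# Example usage (not executed here; assumes the cluster from the steps above):
#   wait_for 30 1 docker exec citus_master pg_isready -U postgres \
#       && docker exec -it citus_master psql -U postgres
```

The helper takes the command as arguments instead of hard-coding it, so the same loop can wait on any other readiness check.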
Talk to Contributors and Learn More
Documentation | Try the Citus tutorial for a hands-on introduction or the documentation for a more comprehensive reference. |
Slack | Chat with us in our community Slack channel. |
Github Issues | We track specific bug reports and feature requests on our project issues. |
Twitter | Follow @citusdata for general updates and PostgreSQL scaling tips. |
Citus Blog | Read our Citus Data Blog for posts on Postgres, Citus, and scaling your database. |
Contributing
Citus is built on and of open source, and we welcome your contributions. The CONTRIBUTING.md file explains how to get started developing the Citus extension itself and our code quality guidelines.
Who is Using Citus?
Citus is deployed in production by many customers, ranging from technology start-ups to large enterprises. Here are some examples:
- Algolia uses Citus to provide real-time analytics for over 1B searches per day. For faster insights, they also use TopN and HLL extensions. User Story
- Heap uses Citus to run dynamic funnel, segmentation, and cohort queries across billions of users and has more than 700B events in their Citus database cluster. Watch Video
- Pex uses Citus to ingest 80B data points per day and analyze that data in real-time. They use a 20+ node cluster on Google Cloud. User Story
- MixRank uses Citus to efficiently collect and analyze vast amounts of data to allow inside B2B sales teams to find new customers. User Story
- Agari uses Citus to secure more than 85 percent of U.S. consumer emails on two 6-8 TB clusters. User Story
- Copper (formerly ProsperWorks) powers a cloud CRM service with Citus. User Story
You can read more user stories about how they employ Citus to scale Postgres for both multi-tenant SaaS applications and real-time analytics dashboards here.
Copyright © 2012–2019 Citus Data, Inc.