Commit Graph

28 Commits (5ca792aef99f4983ae89073ca084fb1a2557d440)

Author SHA1 Message Date
Jelte Fennema a01e45f3df Make enterprise features open source
This PR makes all of the features open source that were previously only
available in Citus Enterprise.

Features that this adds:
1. Non blocking shard moves/shard rebalancer
   (`citus.logical_replication_timeout`)
2. Propagation of CREATE/DROP/ALTER ROLE statements
3. Propagation of GRANT statements
4. Propagation of CLUSTER statements
5. Propagation of ALTER DATABASE ... OWNER TO ...
6. Optimization for COPY when loading JSON to avoid double parsing of
   the JSON object (`citus.skip_jsonb_validation_in_copy`)
7. Support for row level security
8. Support for `pg_dist_authinfo`, which allows storing different
   authentication options for different users, e.g. you can store
   passwords or certificates here.
9. Support for `pg_dist_poolinfo`, which allows using connection poolers
   in between coordinator and workers
10. Tracking distributed query execution times using
   citus_stat_statements (`citus.stat_statements_max`,
   `citus.stat_statements_purge_interval`,
   `citus.stat_statements_track`). This is disabled by default.
11. Blocking tenant_isolation
12. Support for `sslkey` and `sslcert` in `citus.node_conninfo`
2022-06-16 08:09:45 +02:00
Ahmet Gedemenli 4b5f749c23 Propagate dependent views upon distribution (#5950)
(cherry picked from commit 26d927178c)
2022-05-26 18:58:04 +03:00
Onder Kalaci 4b5cb7e2b9 Mark existing views as distributed when upgrade to 11.0+
We have a mechanism which ensures that newly distributed
objects are recorded during `alter extension citus update`.

However, the logic was lacking "view"s. With this commit, we make
sure that existing views are also marked as distributed during
upgrade.

(cherry picked from commit ee45e7bfbf)
2022-05-23 09:22:17 +02:00
Nils Dijk 3801576dfb
Move pg_dist_object to pg_catalog (#5765)
DESCRIPTION: Move pg_dist_object to pg_catalog

Historically `pg_dist_object` had been created in the `citus` schema as an experiment to understand if we could move our catalog tables to a branded schema. We quickly realised that this interfered with the UX on our managed services and other environments, where users connected via a user with the name of `citus`.

By default postgres put the username on the search_path. To be able to read the catalog in the `citus` schema we would need to grant access permissions to the schema. This caused newly created objects like tables etc, to default to this schema for creation. This failed due to the write permissions to that schema.

With this change we move the `pg_dist_object` catalog table to the `pg_catalog` schema, where our other schema's are also located. This makes the catalog table visible and readable by any user, like our other catalog tables, for debugging purposes.

Note: due to the change of schema, we had to disable 1 test that was running into a discrepancy between the schema and binary. Secondly, we needed to make the lookup functions for the `pg_dist_object` relation and their indexes less strict on the fallback of the naming due to an other test that, due to an unfortunate cache invalidation, needed to lookup the relation again. This makes that we won't default to _only_ resolving from `pg_catalog` outside of upgrades.
2022-03-04 17:40:38 +00:00
Nils Dijk ea86f9f94e
Add support for TEXT SEARCH CONFIGURATION objects (#5685)
DESCRIPTION: Implement TEXT SEARCH CONFIGURATION propagation

The change adds support to Citus for propagating TEXT SEARCH CONFIGURATION objects. TSConfig objects cannot always be created in one create statement, and instead require a create statement followed by many alter statements to get turned into the object they should represent.

To support this we add functionality to the worker to create or replace objects based on a list of statements. When the lists of the local object and the remote object correspond 1:1 we skip the creation of the object and simply mark it distributed. This is especially important for TSConfig objects as initdb pre-populates databases with a dozen configurations (for many different languages).

When the user creates a new TSConfig based on the copy of an existing configuration there is no direct link to the object copied from. Since there is no link we can't simply rely on propagating the dependencies to the worker and send a qualified
2022-02-17 13:12:46 +01:00
Burak Velioglu 8ae7577581
Use superuser connection while syncing dependent objects' pg_dist_object tuples 2022-02-07 17:50:45 +03:00
Onder Kalaci ff234fbfd2 Unify old GUCs into a single one
Replaces citus.enable_object_propagation with citus.enable_metadata_sync

Also, within Citus 11 release cycle, we added citus.enable_metadata_sync_by_default,
that is also replaced with citus.enable_metadata_sync.

In essence, when citus.enable_metadata_sync is set to true, all the objects
and the metadata is send to the remote node.

We strongly advice that the users never changes the value of
this GUC.
2022-02-04 10:52:56 +01:00
Teja Mupparti 54862f8c22 (1) Functions will be delegated even when present in the scope of an explicit
BEGIN/COMMIT transaction block or in a UDF calling another UDF.
(2) Prohibit/Limit the delegated function not to do a 2PC (or any work on a
remote connection).
(3) Have a safety net to ensure the (2) i.e. we should block the connections
from the delegated procedure or make sure that no 2PC happens on the node.
(4) Such delegated functions are restricted to use only the distributed argument
value.

Note: To limit the scope of the project we are considering only Functions(not
procedures) for the initial work.

DESCRIPTION: Introduce a new flag "force_delegation" in create_distributed_function(),
which will allow a function to be delegated in an explicit transaction block.

Fixes #3265

Once the function is delegated to the worker, on that node during the planning

distributed_planner()
TryToDelegateFunctionCall()
CheckDelegatedFunctionExecution()
EnableInForceDelegatedFuncExecution()
Save the distribution argument (Constant)
ExecutorStart()
CitusBeginScan()
IsShardKeyValueAllowed()
Ensure to not use non-distribution argument.

ExecutorRun()
AdaptiveExecutor()
StartDistributedExecution()
EnsureNoRemoteExecutionFromWorkers()
Ensure all the shards are local to the node in the remoteTaskList.
NonPushableInsertSelectExecScan()
InitializeCopyShardState()
EnsureNoRemoteExecutionFromWorkers()
Ensure all the shards are local to the node in the placementList.

This also fixes a minor issue: Properly handle expressions+parameters in distribution arguments
2022-01-19 16:43:33 -08:00
Önder Kalacı 0a8b0b06c6
Do not allow distributed functions on non-metadata synced nodes (#5586)
Before this commit, Citus was triggering metadata syncing
in the background when a function is distributed. However,
with Citus 11, we expect all clusters to have metadata synced
enabled. So, we do not expect any nodes not to have the metadata.

This change:
	(a) pro: simplifies the code and opens up possibilities
		 to simplify futher by reducing the scope of
		 bg worker to only sync node metadata
        (b) pro: explicitly asks users to sync the metadata such that
  	    any unforseen impact can be easily detected
        (c) con: For distributed functions without distribution
		 argument, we do not necessarily require the metadata
		 sycned. However, for completeness and simplicity, we
		 do so.
2022-01-04 13:12:57 +01:00
Burak Velioglu ed8e32de5e
Sync pg_dist_object on an update and propagate while syncing to a new node
Before that PR we were updating citus.pg_dist_object metadata, which keeps
the metadata related to objects on Citus, only on the coordinator node. In
order to allow using those object from worker nodes (or erroring out with
proper error message) we've started to propagate that metedata to worker
nodes as well.
2021-12-06 19:25:50 +03:00
Sait Talha Nisanci 0b67fcf81d Fix style 2021-09-03 16:09:59 +03:00
Halil Ozan Akgul 54ee93885a Introduces getObjectTypeDescription_compat and getObjectIdentity_compat macros
getObjectTypeDescription and getObjectIdentity functions now have a new bool missing_ok parameter
These new macros give us the ability to use this new parameter for PG14 and they don't give the parameter for previous versions
Currently all missing_ok parameters are set to false to keep current behavior

Relevant PG commit:
2a10fdc4307a667883f7a3369cb93a721ade9680
2021-09-03 15:27:24 +03:00
SaitTalhaNisanci 03832f353c Drop postgres 11 support 2021-03-25 09:20:28 +03:00
Naisila Puka 5ebd4eac7f
Preserve colocation with procedures in alter_distributed_table (#4743) 2021-02-25 19:52:47 +03:00
Ahmet Gedemenli 436c9d9d79
Remove the word 'master' from Citus UDFs (#4472)
* Replace master_add_node with citus_add_node

* Replace master_activate_node with citus_activate_node

* Replace master_add_inactive_node with citus_add_inactive_node

* Use master udfs in old scripts

* Replace master_add_secondary_node with citus_add_secondary_node

* Replace master_disable_node with citus_disable_node

* Replace master_drain_node with citus_drain_node

* Replace master_remove_node with citus_remove_node

* Replace master_set_node_property with citus_set_node_property

* Replace master_unmark_object_distributed with citus_unmark_object_distributed

* Replace master_update_node with citus_update_node

* Replace master_update_shard_statistics with citus_update_shard_statistics

* Replace master_update_table_statistics with citus_update_table_statistics

* Rename master_conninfo_cache_invalidate to citus_conninfo_cache_invalidate

Rename master_dist_local_group_cache_invalidate to citus_dist_local_group_cache_invalidate

* Replace master_copy_shard_placement with citus_copy_shard_placement

* Replace master_move_shard_placement with citus_move_shard_placement

* Rename master_dist_node_cache_invalidate to citus_dist_node_cache_invalidate

* Rename master_dist_object_cache_invalidate to citus_dist_object_cache_invalidate

* Rename master_dist_partition_cache_invalidate to citus_dist_partition_cache_invalidate

* Rename master_dist_placement_cache_invalidate to citus_dist_placement_cache_invalidate

* Rename master_dist_shard_cache_invalidate to citus_dist_shard_cache_invalidate

* Drop master_modify_multiple_shards

* Rename master_drop_all_shards to citus_drop_all_shards

* Drop master_create_distributed_table

* Drop master_create_worker_shards

* Revert old function definitions

* Add missing revoke statement for citus_disable_node
2021-01-13 12:10:43 +03:00
Sait Talha Nisanci bf831d2e59 Use table_openXXX methods in the codebase
With PG13 heap_* (heap_open, heap_close etc) are replaced with table_*
(table_open, table_close etc).

It is better to use the new table access methods in the codebase and
define the macros for the previous versions as we can easily remove the
macro without having to change the codebase when we drop the support for
the old version.

Commits that introduced this change on Postgres:
f25968c49697db673f6cd2a07b3f7626779f1827
e0c4ec07284db817e1f8d9adfb3fffc952252db0
4b21acf522d751ba5b6679df391d5121b6c4a35f

Command to see relevant commits on Postgres side:
git log --all --grep="heap_open"
2020-08-04 15:10:22 +03:00
SaitTalhaNisanci ba01f3457a
use macros for pg versions instead of hardcoded values (#3694)
3 Macros are defined for removing the hardcoded pg versions.
PG_VERSION_11, PG_VERSION_12 and PG_VERSION_13.
2020-04-01 17:01:52 +03:00
SaitTalhaNisanci 13204487e9
remove copyright years (#3286) 2019-12-11 21:14:08 +03:00
Jelte Fennema 1d8dde232f
Automatically convert useless declarations using regex replace (#3181)
* Add declaration removal to CI

* Convert declarations
2019-11-21 13:47:29 +01:00
Onur TIRTIR 9961297d7b Improve extension command propagation logic and tests
* Improve extension command propagation tests

* patch for hardcoded citus extension name

(cherry picked from commit 0bb3dbac0afabda10e8928f9c17eda048dc4361a)
2019-11-21 11:24:39 +03:00
Önder Kalacı 40fa3862ce
Prevent Citus extension becoming distributed object (#3197)
Prevent Citus extension being distributed

Because that could prevent doing rolling upgrades, where users may
prefer to upgrade the version on the coordinator but not the workers.

There could be some other edge cases, so I'd prefer to keep Citus
extension outside the picture for now.
2019-11-18 16:57:10 +01:00
Philip Dubé 74cb168205 Remove Postgres 10 support 2019-10-11 21:56:56 +00:00
Philip Dubé bc1ad67eb5 Distribute CALL on distributed procedures to metadata workers
Lots taken from https://github.com/citusdata/citus/pull/2829
2019-09-24 17:31:09 +00:00
Onder Kalaci d37745bfc7 Sync metadata to worker nodes after create_distributed_function
Since the distributed functions are useful when the workers have
metadata, we automatically sync it.

Also, after master_add_node(). We do it lazily and let the deamon
sync it. That's mainly because the metadata syncing cannot be done
in transaction blocks, and we don't want to add lots of transactional
limitations to master_add_node() and create_distributed_function().
2019-09-23 18:30:53 +02:00
Onder Kalaci d7e2968120 Add parameters to create_distributed_function()
With this commit, we're changing the API for create_distributed_function()
such that users can provide the distribution argument and the colocation
information.
2019-09-22 21:53:33 +02:00
Nils Dijk 2879689441
Distribute Types to worker nodes (#2893)
DESCRIPTION: Distribute Types to worker nodes

When to propagate
==============

There are two logical moments that types could be distributed to the worker nodes
 - When they get used ( just in time distribution )
 - When they get created ( proactive distribution )

The just in time distribution follows the model used by how schema's get created right before we are going to create a table in that schema, for types this would be when the table uses a type as its column.

The proactive distribution is suitable for situations where it is benificial to have the type on the worker nodes directly. They can later on be used in queries where an intermediate result gets created with a cast to this type.

Just in time creation is always the last resort, you cannot create a distributed table before the type gets created. A good example use case is; you have an existing postgres server that needs to scale out. By adding the citus extension, add some nodes to the cluster, and distribute the table. The type got created before citus existed. There was no moment where citus could have propagated the creation of a type.

Proactive is almost always a good option. Types are not resource intensive objects, there is no performance overhead of having 100's of types. If you want to use them in a query to represent an intermediate result (which happens in our test suite) they just work.

There is however a moment when proactive type distribution is not beneficial; in transactions where the type is used in a distributed table.

Lets assume the following transaction:

```sql
BEGIN;
CREATE TYPE tt1 AS (a int, b int);
CREATE TABLE t1 AS (a int PRIMARY KEY, b tt1);
SELECT create_distributed_table('t1', 'a');
\copy t1 FROM bigdata.csv
```

Types are node scoped objects; meaning the type exists once per worker. Shards however have best performance when they are created over their own connection. For the type to be visible on all connections it needs to be created and committed before we try to create the shards. Here the just in time situation is most beneficial and follows how we create schema's on the workers. Outside of a transaction block we will just use 1 connection to propagate the creation.

How propagation works
=================

Just in time
-----------

Just in time propagation hooks into the infrastructure introduced in #2882. It adds types as a supported object in `SupportedDependencyByCitus`. This will make sure that any object being distributed by citus that depends on types will now cascade into types. When types are depending them self on other objects they will get created first.

Creation later works by getting the ddl commands to create the object by its `ObjectAddress` in `GetDependencyCreateDDLCommands` which will dispatch types to `CreateTypeDDLCommandsIdempotent`.

For the correct walking of the graph we follow array types, when later asked for the ddl commands for array types we return `NIL` (empty list) which makes that the object will not be recorded as distributed, (its an internal type, dependant on the user type).

Proactive distribution
---------------------

When the user creates a type (composite or enum) we will have a hook running in `multi_ProcessUtility` after the command has been applied locally. Running after running locally makes that we already have an `ObjectAddress` for the type. This is required to mark the type as being distributed.

Keeping the type up to date
====================

For types that are recorded in `pg_dist_object` (eg. `IsObjectDistributed` returns true for the `ObjectAddress`) we will intercept the utility commands that alter the type.
 - `AlterTableStmt` with `relkind` set to `OBJECT_TYPE` encapsulate changes to the fields of a composite type.
 - `DropStmt` with removeType set to `OBJECT_TYPE` encapsulate `DROP TYPE`.
 - `AlterEnumStmt` encapsulates changes to enum values.
    Enum types can not be changed transactionally. When the execution on a worker fails a warning will be shown to the user the propagation was incomplete due to worker communication failure. An idempotent command is shown for the user to re-execute when the worker communication is fixed.

Keeping types up to date is done via the executor. Before the statement is executed locally we create a plan on how to apply it on the workers. This plan is executed after we have applied the statement locally.

All changes to types need to be done in the same transaction for types that have already been distributed and will fail with an error if parallel queries have already been executed in the same transaction. Much like foreign keys to reference tables.
2019-09-13 17:46:07 +02:00
Philip Dubé a28b82d67d get_catalog_object_by_oid requires an extra parameter in pg12 2019-09-05 16:38:07 +00:00
Nils Dijk 936d546a3c
Refactor Ensure Schema Exists to Ensure Dependecies Exists (#2882)
DESCRIPTION: Refactor ensure schema exists to dependency exists

Historically we only supported schema's as table dependencies to be created on the workers before a table gets distributed. This PR puts infrastructure in place to walk pg_depend to figure out which dependencies to create on the workers. Currently only schema's are supported as objects to create before creating a table.

We also keep track of dependencies that have been created in the cluster. When we add a new node to the cluster we use this catalog to know which objects need to be created on the worker.

Side effect of knowing which objects are already distributed is that we don't have debug messages anymore when creating schema's that are already created on the workers.
2019-09-04 14:10:20 +02:00