Commit Graph

39 Commits (685b54b3de1eb3d8f6b24e3e8505a73f06c6a98a)

Author SHA1 Message Date
Jelte Fennema 685b54b3de
Semmle: Check for NULL in some places where it might occur (#3509)
Semmle reported quite a few places where we use a value that could be NULL. Most of these are not actually a real issue, but it is better to be on the safe side with these things and make the static analysis happy.
2020-02-27 10:45:29 +01:00
Philip Dubé 4b5d6c3ebe Rename RelayFileState to ShardState
Replace FILE_ prefix with SHARD_STATE_
2020-01-12 05:57:53 +00:00
SaitTalhaNisanci 13204487e9
remove copyright years (#3286) 2019-12-11 21:14:08 +03:00
Philip Dubé fcf2fd819b Add distributioncolumncollation to pg_dist_colocation
Use partition column's collation for range distributed tables
Don't allow non-deterministic collations for hash distributed tables
CoPartitionedTables: don't compare unequal types
2019-12-09 19:51:40 +00:00
Hadi Moshayedi 2268a9cae6 Error for metadata commands if any metadata node is out-of-sync (#3226)
* Error for metadata commands if any metadata node is out-of-sync

* Make the functions have separate APIs for all workers/metadata workers
2019-11-27 09:52:57 +01:00
Jelte Fennema 1d8dde232f
Automatically convert useless declarations using regex replace (#3181)
* Add declaration removal to CI

* Convert declarations
2019-11-21 13:47:29 +01:00
Hadi Moshayedi 15af1637aa Replicate reference tables to coordinator. 2019-11-15 05:50:19 -08:00
Hadi Moshayedi cb011bb30f Propagate isactive to metadata nodes. 2019-11-15 05:48:42 -08:00
Hadi Moshayedi e00d1546f3 Don't maintain replicationfactor of reference tables 2019-11-05 07:23:14 -08:00
Hadi Moshayedi 76f3933b05 Add metadatasynced, and sync on master_update_node()
Co-authored-by: pykello <hadi.moshayedi@microsoft.com>
Co-authored-by: serprex <serprex@users.noreply.github.com>
2019-09-18 09:32:54 -07:00
Philip Dubé 492d1b2cba ActivePrimaryNodeList: add lockMode parameter 2019-09-13 17:44:56 +00:00
Nils Dijk 2879689441
Distribute Types to worker nodes (#2893)
DESCRIPTION: Distribute Types to worker nodes

When to propagate
==============

There are two logical moments at which types could be distributed to the worker nodes:
 - When they get used (just in time distribution)
 - When they get created (proactive distribution)

The just in time distribution follows the model of how schemas get created right before we create a table in that schema; for types this would be when a table uses the type as one of its columns.

The proactive distribution is suitable for situations where it is beneficial to have the type on the worker nodes right away. The type can later be used in queries where an intermediate result gets created with a cast to this type.

Just in time creation is always the last resort; you cannot create a distributed table before the type it uses exists. A good example use case: you have an existing postgres server that needs to scale out. You add the citus extension, add some nodes to the cluster, and distribute the table. The type was created before citus existed, so there was no moment at which citus could have propagated the creation of the type.
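A minimal sketch of that scenario (type, table, and node names are hypothetical):

```sql
-- objects that existed before the citus extension was installed
CREATE TYPE order_status AS ENUM ('new', 'paid', 'shipped');
CREATE TABLE orders (id bigint PRIMARY KEY, status order_status);

-- scale out: add the citus extension, add nodes, distribute the table
CREATE EXTENSION citus;
SELECT master_add_node('worker-1', 5432);
SELECT create_distributed_table('orders', 'id');
-- the pre-existing order_status type is propagated just in time,
-- right before the shards of orders are created
```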

Proactive distribution is almost always a good option. Types are not resource intensive objects; there is no performance overhead in having hundreds of types. If you want to use them in a query to represent an intermediate result (which happens in our test suite), they just work.

There is, however, a moment when proactive type distribution is not beneficial: in transactions where the type is used in a distributed table.

Let's assume the following transaction:

```sql
BEGIN;
CREATE TYPE tt1 AS (a int, b int);
CREATE TABLE t1 (a int PRIMARY KEY, b tt1);
SELECT create_distributed_table('t1', 'a');
\copy t1 FROM bigdata.csv
```

Types are node scoped objects, meaning the type exists once per worker. Shards, however, perform best when they are created over their own connection. For the type to be visible on all connections it needs to be created and committed before we try to create the shards. Here the just in time approach is most beneficial and follows how we create schemas on the workers. Outside of a transaction block we will just use one connection to propagate the creation.

How propagation works
=================

Just in time
-----------

Just in time propagation hooks into the infrastructure introduced in #2882. It adds types as a supported object in `SupportedDependencyByCitus`. This makes sure that any object being distributed by citus that depends on types will now cascade into types. When types themselves depend on other objects, those objects will get created first.

The actual creation works by getting the DDL commands for the object by its `ObjectAddress` in `GetDependencyCreateDDLCommands`, which dispatches types to `CreateTypeDDLCommandsIdempotent`.

To walk the graph correctly we follow array types; when later asked for the DDL commands for an array type we return `NIL` (an empty list), which means the object will not be recorded as distributed (it is an internal type, dependent on the user type).
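A minimal sketch of the cascading behaviour (schema, type, and table names are hypothetical):

```sql
CREATE SCHEMA app;
CREATE TYPE app.money_amount AS (amount numeric, currency text);
CREATE TABLE app.payments (id bigint PRIMARY KEY, total app.money_amount);

-- distributing the table cascades into its dependencies: the app schema is
-- created on the workers first, then app.money_amount, then the shards
SELECT create_distributed_table('app.payments', 'id');
```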

Proactive distribution
---------------------

When the user creates a type (composite or enum), a hook runs in `multi_ProcessUtility` after the command has been applied locally. Running after the local application means we already have an `ObjectAddress` for the type, which is required to mark the type as being distributed.
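A small sketch of proactive distribution, checked via `run_command_on_workers` (the type name is hypothetical):

```sql
-- created outside a transaction block, so the type is propagated right away
CREATE TYPE shipment_status AS ENUM ('pending', 'shipped');

-- verify that the type now exists on every worker
SELECT * FROM run_command_on_workers($$SELECT 'shipment_status'::regtype$$);
```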

Keeping the type up to date
====================

For types that are recorded in `pg_dist_object` (i.e. `IsObjectDistributed` returns true for the `ObjectAddress`) we will intercept the utility commands that alter the type; each is illustrated below.
 - `AlterTableStmt` with `relkind` set to `OBJECT_TYPE` encapsulates changes to the fields of a composite type.
 - `DropStmt` with `removeType` set to `OBJECT_TYPE` encapsulates `DROP TYPE`.
 - `AlterEnumStmt` encapsulates changes to enum values.
    Enum types cannot be changed transactionally. When the execution on a worker fails, a warning is shown to the user that the propagation was incomplete due to a worker communication failure. An idempotent command is shown for the user to re-execute once the worker communication is fixed.
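Concrete statements that map to these parse nodes (reusing the hypothetical types from the sketches above):

```sql
-- AlterTableStmt with relkind OBJECT_TYPE: change the fields of a composite type
ALTER TYPE tt1 ADD ATTRIBUTE c int;

-- AlterEnumStmt: change the values of an enum type
ALTER TYPE shipment_status ADD VALUE 'delivered';

-- DropStmt with removeType OBJECT_TYPE
DROP TYPE tt1;
```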

Keeping types up to date is done via the executor. Before the statement is executed locally we create a plan on how to apply it on the workers. This plan is executed after we have applied the statement locally.

For types that have already been distributed, all changes need to be done in the same transaction and will fail with an error if parallel queries have already been executed in that transaction, much like foreign keys to reference tables.
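A sketch of that failure mode, assuming `t1` and `tt1` from the earlier example are already distributed:

```sql
BEGIN;
-- a query over the distributed table, executed across parallel worker connections
SELECT count(*) FROM t1;
-- altering the already-distributed type in the same transaction is then expected
-- to fail with an error, as described above
ALTER TYPE tt1 ADD ATTRIBUTE c int;
ROLLBACK;
```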
2019-09-13 17:46:07 +02:00
Nils Dijk 936d546a3c
Refactor Ensure Schema Exists to Ensure Dependencies Exist (#2882)
DESCRIPTION: Refactor ensure schema exists to ensure dependencies exist

Historically we only supported schemas as table dependencies to be created on the workers before a table gets distributed. This PR puts infrastructure in place to walk pg_depend to figure out which dependencies to create on the workers. Currently only schemas are supported as objects to create before creating a table.

We also keep track of dependencies that have been created in the cluster. When we add a new node to the cluster we use this catalog to know which objects need to be created on the worker.

A side effect of knowing which objects are already distributed is that we no longer emit debug messages when creating schemas that already exist on the workers.
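A minimal sketch of the tracked-dependency behaviour (schema, table, and node names are hypothetical):

```sql
CREATE SCHEMA sales;
CREATE TABLE sales.orders (order_id bigint PRIMARY KEY);
-- the sales schema is created on the existing workers before the table is distributed
SELECT create_distributed_table('sales.orders', 'order_id');

-- when a new worker is added, the tracked dependencies (here the sales schema)
-- are created on it as well
SELECT master_add_node('worker-3', 5432);
```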
2019-09-04 14:10:20 +02:00
Hadi Moshayedi a5b087c89b Support FKs between reference tables 2019-08-21 16:11:27 -07:00
Philip Dubé cd951fa9ca Avoid multiple pg_dist_colocation records being created for reference tables
master_deactivate_node is updated to decrement the replication factor
Otherwise deactivation could have create_reference_table produce a second record

UpdateColocationGroupReplicationFactor is renamed UpdateColocationGroupReplicationFactorForReferenceTables
& the implementation looks up the record based on distributioncolumntype == InvalidOid, rather than by id
Otherwise the record's replication factor fails to be maintained when there are no reference tables
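The lookup the renamed function performs can be expressed as the equivalent catalog query (a sketch; the actual implementation is in C and assumes the standard pg_dist_colocation columns):

```sql
-- reference tables share a single colocation record; it is identified by
-- distributioncolumntype = 0 (InvalidOid) rather than by a hard-coded id
SELECT colocationid, shardcount, replicationfactor
FROM pg_dist_colocation
WHERE distributioncolumntype = 0;
```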
2019-08-13 17:21:02 +00:00
Hanefi Onaldi 7e8fd49b94 Create Schemas as superuser on all shard/table creation UDFs
- All schema creations on the workers will now be made via superuser connections
- If a shard is being repaired or a shard is replicated, we will create the
  schema only in the relevant worker; and in all the other cases where a schema
  creation is needed, we will block operations until we ensure the schema exists
  in all the workers
2019-06-26 17:12:28 +02:00
Hadi Moshayedi f4d3b94e22
Fix some of the casts for groupId (#2609)
A small change which partially addresses #2608.
2019-03-05 12:06:44 -08:00
Marco Slot aab9f623eb Check table ownership in upgrade_to_reference_table 2018-11-16 23:27:34 +01:00
Marco Slot f383e4f307
Description: Refactor code that handles DDL commands from one file into a module
The file handling the utility functions (DDL) for citus organically grew over time and became unreasonably large. This refactor splits that file's functionality into separate files per command, initially modeled after the directory and file layout found in postgres.

Although the size of the change is quite big, there are barely any code changes. Only two functions have been added, for readability purposes:

- PostProcessIndexStmt which is extracted from PostProcessUtility
- PostProcessAlterTableStmt which is extracted from multi_ProcessUtility

A README.md has been added to `src/backend/distributed/commands` describing the contents of the module and every file in the module.
We need more documentation around the overloading of the COPY command; for now the boilerplate has been added for people with better knowledge to fill out.
2018-11-14 13:36:27 +01:00
Onder Kalaci abc443d7fa Make sure that shard repair considers replication factor 2018-09-21 15:24:49 +03:00
velioglu bd30e3e908 Add support for writing to reference tables from MX nodes 2018-08-27 18:15:04 +03:00
mehmet furkan şahin d1a3b20115 foreign_constraint_utils is created 2018-06-07 18:19:24 +03:00
Murat Tuncer 2d66bf5f16
Fix hard coded formatting strings for 64 bit numbers (#1831)
Postgres provides OS agnostic formatting macros for
formatting 64 bit numbers. Replaced %ld and %lu with
INT64_FORMAT and UINT64_FORMAT respectively.

Also found some incorrect usages of formatting
flags and fixed them.
2017-12-04 14:11:06 +03:00
Marco Slot 868ee6be83 Fix and simplify pg_dist_node locking 2017-08-09 14:09:54 +02:00
Brian Cloutier a3e9bef685 All users of WorkerNodeHash take an AccessShareLock
The metadata cache simulates a SELECT on pg_dist_node. Now the locks it
takes also simulate that SELECT.
2017-08-08 13:12:06 +03:00
Brian Cloutier 94947c0d54 Refactor: ReplicateShardToAllWorkers more explicitly locks pg_dist_node 2017-08-08 13:12:06 +03:00
Brian Cloutier ec99f8f983 Add nodeRole column
- master_add_node enforces that there is only one primary per group (see the sketch after this list)
- there's also a trigger on pg_dist_node to prevent multiple primaries
  per group
- functions in metadata cache only return primary nodes
- Rename ActiveWorkerNodeList -> ActivePrimaryNodeList
- Rename WorkerGetLive{Node->Group}Count()
- Refactor WorkerGetRandomCandidateNode
- master_remove_node only complains about active shard placements if the
  node being removed is a primary.
- master_remove_node only deletes all reference table placements in the
  group if the node being removed is the primary.
- Rename {Node->NodeGroup}HasShardPlacements, this reflects the behavior it
  already had.
- Rename DeleteAllReferenceTablePlacementsFrom{Node->NodeGroup}. This also
  reflects the behavior it already had, but the new signature forces the
  caller to pass in a groupId
- Rename {WorkerGetLiveGroup->ActivePrimaryNode}Count
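A small sketch of the catalog shape after this change (the node name is hypothetical; the column list assumes the standard pg_dist_node definition):

```sql
SELECT master_add_node('worker-1', 5432);

-- each group has exactly one primary; metadata cache functions such as
-- ActivePrimaryNodeList only return rows with noderole = 'primary'
SELECT nodename, nodeport, groupid, noderole FROM pg_dist_node;
```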
2017-07-24 11:57:46 +03:00
Brian Cloutier 74dd5bb281 Fix crash when removing an inactive node 2017-07-20 18:55:40 +03:00
Brian Cloutier 7ad95b53d2 Rename pg_dist_shard_placement -> pg_dist_placement
Comes with a few changes:

- Change the signature of some functions to accept groupid
  - InsertShardPlacementRow
  - DeleteShardPlacementRow
  - UpdateShardPlacementState

- NodeHasActiveShardPlacements returns true if the group the node is a
  part of has any active shard placements

- TupleToShardPlacement now returns ShardPlacements which have NULL
  nodeName and nodePort.

- Populate (nodeName, nodePort) when creating ShardPlacements
- Disallow removing a node if it contains any shard placements

- DeleteAllReferenceTablePlacementsFromNode matches based on group. This
  doesn't change behavior for now (while there is only one node per
  group), but means in the future callers should be careful about
  calling it on a secondary node; it'll delete placements on the primary.

- Create concept of a GroupShardPlacement, which represents an actual
  tuple in pg_dist_placement and is distinct from a ShardPlacement,
  which has been resolved to a specific node. In the future
  ShardPlacement should be renamed to NodeShardPlacement.

- Create some triggers which allow existing code to continue to insert
  into and update pg_dist_shard_placement as if it still existed (sketched below).
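A sketch of the two catalogs after this change (the column lists assume the standard definitions):

```sql
-- placements are now stored per group in pg_dist_placement
SELECT placementid, shardid, shardstate, groupid FROM pg_dist_placement;

-- pg_dist_shard_placement stays usable for existing code (backed by triggers)
-- and resolves each group to a concrete nodename/nodeport
SELECT shardid, shardstate, nodename, nodeport FROM pg_dist_shard_placement;
```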
2017-07-12 14:17:31 +02:00
Onder Kalaci 5f3f1d75a3 Add some utility functions for partitioned tables
This commit is intended to be a base for supporting declarative partitioning
on distributed tables. Here we add the following utility functions and their
unit tests:

  * Very basic functions including differentiating partitioned tables and
    partitions, listing the partitions
  * Generating the PARTITION BY (expr) and adding this to the DDL events
    of partitioned tables
  * Ability to generate text representations of the ranges for partitions
  * Ability to generate the `ALTER TABLE parent_table ATTACH PARTITION
    partition_table FOR VALUES value_range`
  * Ability to add shard ids to the above command using
    `worker_apply_inter_shard_ddl_command()` (see the sketch after this list)
  * Ability to generate `ALTER TABLE parent_table DETACH PARTITION`
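A minimal sketch of the generated command (table names and the value range are hypothetical):

```sql
CREATE TABLE events (event_id bigint, created_at date) PARTITION BY RANGE (created_at);
CREATE TABLE events_2017 (event_id bigint, created_at date);

-- the generated attach command
ALTER TABLE events ATTACH PARTITION events_2017
    FOR VALUES FROM ('2017-01-01') TO ('2018-01-01');
-- worker_apply_inter_shard_ddl_command() can then replay the same command with
-- the parent and partition table names replaced by the corresponding shard names
```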
2017-06-28 09:39:55 +03:00
Burak Yucesoy 9fb15c439c Add version checks to necessary UDFs 2017-05-22 09:53:29 +03:00
Burak Yucesoy e9095e62ec Decouple reference table replication
With this change we add an option to add a node without replicating all reference
tables to that node. If a node is added with this option, we mark the node as
inactive and no queries will be sent to that node.

We also added two new UDFs (usage sketched below):
 - master_activate_node(host, port):
    - marks node as active and replicates all reference tables to that node
 - master_add_inactive_node(host, port):
    - only adds node to pg_dist_node
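A short usage sketch of the two new UDFs (the node name is hypothetical):

```sql
-- add a node to pg_dist_node without replicating reference tables to it
SELECT master_add_inactive_node('worker-2', 5432);

-- later, mark it active and replicate all reference tables to it
SELECT master_activate_node('worker-2', 5432);
```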
2017-04-17 13:33:31 +03:00
Marco Slot ba940a1de9 Use coordinator instead of schema node in terminology 2017-01-25 11:07:23 +01:00
Eren Basak e7c15ecc1f Make `upgrade_to_reference_table` function MX-compatible 2017-01-18 16:49:50 +03:00
Eren Basak 56ca590daa Propagate metadata changes for deleted reference table placements on master_remove_node call 2017-01-18 16:00:07 +03:00
Eren Basak be78769ae4 Propagate new reference table placement metadata on `master_add_node` 2017-01-18 15:59:06 +03:00
Burak Yucesoy 3315ae6142 Remove placement metadata of reference tables after master_remove_node
With this change, we start to delete the placements of reference tables at the given worker node
after a master_remove_node UDF call. We remove the placement metadata at the master node but we do
not drop the actual shards from the worker node. There are two reasons for that decision:
first, it is not critical to DROP the shards on the workers because Citus will ignore them
as long as the node is removed from the cluster, and if we add that node back to the cluster we will
DROP and recreate all reference tables. Second, if the node is unreachable, it becomes
complicated to cover failure cases and have transaction support.
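A sketch of the behaviour described above (the node name is hypothetical):

```sql
SELECT master_remove_node('worker-2', 5432);
-- placement metadata for reference tables on worker-2 is deleted on the master,
-- while the shard relations themselves are left in place on the removed node
```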
2017-01-16 11:24:56 +03:00
Burak Yucesoy 9c9f479e4b Replicate reference tables when new node is added
With this change, we start to replicate all reference tables to the new node when a new node
is added to the cluster with the master_add_node command. We also update the replication factor
of the reference tables' colocation group.
2017-01-05 14:30:41 +03:00
Burak Yucesoy 31cd2357fe Add upgrade_to_reference_table
With this change we introduce a new UDF, upgrade_to_reference_table, which can be used to
upgrade existing broadcast tables to reference tables. For upgrading, we require that the given
table contains only one shard.
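A short usage sketch (the table name is hypothetical; the table must consist of exactly one shard):

```sql
SELECT upgrade_to_reference_table('account_settings');
```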
2017-01-02 17:54:42 +02:00