Commit Graph

662 Commits (df6d56c1ed285b06b41e4ea1493fb69d5990fe9f)

Author SHA1 Message Date
Andres Freund e8b793c454 Support for IN (const, list) and = ANY(const, b, c) pruning. 2017-08-10 08:56:36 +03:00
Onder Kalaci b5ea3ab6a3 Improve locking semantics for backend management
We use the backend shared memory lock for preventing
new backends to be part of a new distributed transaction
or an existing backend to leave a distributed transaction
while we're reading the all backends' data.

The primary goal is to provide consistent view of the
current distributed transactions while doing the
deadlock detection.
2017-08-09 17:17:12 +03:00
Brian Cloutier 2e0916e15a Add master_add_secondary_node() UDF 2017-08-09 17:10:48 +03:00
Marco Slot 08ed6d8269 Prevent pg_dist_node changes during master_create_empty_shard 2017-08-09 14:22:09 +02:00
Marco Slot 3a0571e69b Remove LockMetadataSnapshot 2017-08-09 14:09:54 +02:00
Marco Slot c2f8bafa05 Fix shard creation vs. pg_dist_node change locking 2017-08-09 14:09:54 +02:00
Marco Slot 868ee6be83 Fix and simplify pg_dist_node locking 2017-08-09 14:09:54 +02:00
Burak Yucesoy 8455d1a4ef Ensure we are allowing partitioned tables at all appropriate places 2017-08-09 10:01:35 +03:00
Burak Yucesoy 2eee556738 Add distributed partitioned table support for COPY
For partitioned tables, PostgreSQL opens partition and its partitions
in BeginCopyFrom and it expects its caller to close those relations.
However, we do not have quick access to opened relations and performing
special operations for partitioned tables isn't necessary in coordinator
node. Therefore before calling BeginCopyFrom, we change relkind of those
partitioned tables to RELKIND_RELATION. This prevents PostgreSQL to open
its partitions as well.
2017-08-09 10:01:35 +03:00
Burak Yucesoy 31f3221342 Add distributed partitioned table support to router plannable queries
In standart_planner, PostgreSQL expands partitioned tables to their
partitions and call our restriction hook for each partition. It also,
for some queries, skips the partitioned table itself completely. This
behaviour makes it difficult to prune shards and decide whether query
is router plannable or not. To prevent this behaviour, we change inh
flag of partitioned tables to false in the query tree. In this case,
PostgreSQL treats those partitioned tables as regular relations and
does not expand them.

This behaviour is inline with our expectations, because we do not want
to treat partitioned tables differently on coordinator. Although we are
not entirely comfortable with modifying query tree, other solutions to
this problem is overly complicated.
2017-08-09 10:01:35 +03:00
Burak Yucesoy fddf9b3fcc Add distributed partitioned table support distributed table creation
With this PR, Citus starts to support all possible ways to create
distributed partitioned tables. These are;

- Distributing already created partitioning hierarchy
- CREATE TABLE ... PARTITION OF a distributed_table
- ALTER TABLE distributed_table ATTACH PARTITION non_distributed_table
- ALTER TABLE distributed_table ATTACH PARTITION distributed_table

We also support DETACHing partitions from partitioned tables and propogating
TRUNCATE and DDL commands to distributed partitioned tables.

This PR also refactors some parts of distributed table creation logic.
2017-08-09 10:01:35 +03:00
Metin Doslu b8a9e7c1bf Add support for UPDATE/DELETE with subqueries 2017-08-08 21:35:08 +03:00
Marco Slot d3e9746236 Avoid connections that accessed non-colocated placements in multi-shard commands 2017-08-08 18:32:34 +02:00
Brian Cloutier 7060ade6fe GetNodeTuple returns NULL it node does not exist
It never throws an error.
2017-08-08 13:12:06 +03:00
Brian Cloutier a3e9bef685 All users of WorkerNodeHash take an AccessShareLock
The metadata cache simulates a SELECT on pg_dist_node. Now the locks it
takes also simulate that SELECT.
2017-08-08 13:12:06 +03:00
Brian Cloutier 5914c992e6 cluster management UDFs see nodes in different clusters
- master_activate_node and master_disable_node correctly toggle
  isActive, without crashing
- master_add_node rejects duplicate nodes, even if they're in different
  clusters
- master_remove_node allows removing nodes in different clusters
2017-08-08 13:12:06 +03:00
Brian Cloutier 3151b52a0b Add citus.cluster_name GUC
- Nodes with a nodecluster which does not match citus.cluster_name
  are excluded from the metadata cache and never seen by another part of
  Citus.
2017-08-08 13:12:06 +03:00
Brian Cloutier 94947c0d54 Refactor: ReplicateShardToAllWorkers more explicitly locks pg_dist_node 2017-08-08 13:12:06 +03:00
Brian Cloutier f87fefa323 Refactor: DistributedTableSize more explicitly only locks pg_dist_node 2017-08-08 13:12:06 +03:00
Brian Cloutier 3769381366 Fix inaccurate comment on SetNodeState 2017-08-08 13:12:06 +03:00
Brian Cloutier fbecf48a03 Disallow adding primary nodes to non-default clusters 2017-08-08 11:18:31 +03:00
Brian Cloutier 5618e69386 Add pg_dist_node.nodecluster 2017-08-08 11:18:31 +03:00
Brian Cloutier e7846ba7d1 Allow metadata sync functions on secondaries
{start,stop}_metadata_sync_to_node now toggle the hasMetadata flag when
run on secondaries but don't attempt to actually sync any metadata.
2017-08-07 18:46:51 +03:00
Marco Slot 4cc7c36596 Simplify metadata lock acquisition for DML 2017-08-07 15:36:58 +02:00
Marco Slot aa7ca81548 Execute UPDATE/DELETE statements with 0 shards 2017-08-07 15:36:58 +02:00
Marco Slot bac60bb64f Function evaluation descends into expression trees 2017-08-06 19:53:05 +02:00
Brian Cloutier 37985de85e master_disable_node no longer crashes when given a non-existant node 2017-08-04 11:14:54 +03:00
Hadi Moshayedi 8229a64fe8 Remove distributed tables' dependency on distribution key columns. (#1527)
This change removes distributed tables' dependency on distribution key columns. We already check that we cannot drop distribution key columns in ErrorIfUnsupportedAlterTableStmt() at multi_utility.c, so we don't need to have distributed table to distribution key column dependency to avoid dropping of distribution key column.

Furthermore, having this dependency causes some warnings in pg_dump --schema-only (See #866), which are not desirable.

This change also adds check to disallow drop of distribution keys when citus.enable_ddl_propagation is set to false. Regression tests are updated accordingly.
2017-08-03 10:07:04 -04:00
Murat Tuncer fa18899cf9 Remove serialization/deserialization of multiplan node (#1477)
introduces copy functions for Citus MultiPlan nodes.
uses ExtensibleNode mechanism to store MultiPlan data
drops serialiazation of MultiPlans
2017-08-02 08:24:00 +03:00
Burak Yucesoy 7769f1d012 Refactor distributed table creation logic
This commit is preperation for introducing distributed partitioned
table support. We want to clean and refactor some code in distributed
table creation logic so that we can handle partitioned tables in more
robust way.
2017-07-31 11:11:23 +03:00
Brian Cloutier b20a086a8f master_activate_node UDF also returns noderole 2017-07-28 16:02:43 +03:00
Murat Tuncer 26f020dc6e Make maxTaskStringSize configurable (#1501)
maxTaskStringSize determines the size of worker query string.
It was originally hard coded to a specific value. This has caused
issues at some users. Since it determines initial shared memory
allocation, we did not want to set it to an arbitrary higher number.
Instead made it configurable.

This commit introduces a new GUC variable max_task_string_size

Changes in this variable requires restart to be in effect.
2017-07-27 11:39:12 -07:00
Onder Kalaci 6132d17481 Convert global wait edges to adjacency list
In this commit, we add ability to convert global wait edges
into adjacency list with the following format:
 [transactionId] = [transactionNode->waitsFor {list of waiting transaction nodes}]
2017-07-27 19:53:51 +03:00
Murat Tuncer 8729b7d55a Use cstore_table_size function to determine cstore table size (#1521)
pg_table_size/pg_relation_size variants always return 0 for
cstore tables. We should be using cstore_table_size function
for cstore_tables.
2017-07-27 09:02:07 -07:00
Brian Cloutier 32e16ffe02 Give isolation tester ability to see locks on workers 2017-07-26 18:43:04 +03:00
Eren BaÅŸak a12f1980de Add Progress Tracking Infrastructure
This change adds a general purpose infrastructure to log and monitor
process about long running progresses. It uses
`pg_stat_get_progress_info` infrastructure, introduced with PostgreSQL
9.6 and used for tracking `VACUUM` commands.

This patch only handles the creation of a memory space in dynamic shared
memory, putting its info in `pg_stat_get_progress_info`, fetching the
progress monitors on demand and finalizing the progress tracking.
2017-07-26 14:12:15 +03:00
Marco Slot 80ea233ec1 Add function for dumping global wait edges 2017-07-25 16:52:32 +02:00
Marco Slot 81198a1d02 Add function for dumping local wait edges 2017-07-25 16:52:32 +02:00
Onder Kalaci 58faffa42b Fix bug on error check for assigning distributed transaction id
to a backend that has already been assigned a transaction.
2017-07-25 14:58:07 +03:00
Marco Slot 3d7f79127d Do not release locks in LogTransactionRecord 2017-07-24 20:44:38 +02:00
Brian Cloutier 88702ca58a node_metadata takes out more sane locks
- Never release locks
- AddNodeMetadata takes ShareRowExclusiveLock so it'll conflict with the
  trigger which prevents multiple primary nodes.
- ActivateNode and SetNodeState used to take AccessShareLock, but they
  modify the table so they should take RowExclusiveLock.
- DeleteNodeRow and InsertNodeRow used to take AccessExclusiveLock but
  only need RowExclusiveLock.
2017-07-24 11:57:46 +03:00
Brian Cloutier ec99f8f983 Add nodeRole column
- master_add_node enforces that there is only one primary per group
- there's also a trigger on pg_dist_node to prevent multiple primaries
  per group
- functions in metadata cache only return primary nodes
- Rename ActiveWorkerNodeList -> ActivePrimaryNodeList
- Rename WorkerGetLive{Node->Group}Count()
- Refactor WorkerGetRandomCandidateNode
- master_remove_node only complains about active shard placements if the
  node being removed is a primary.
- master_remove_node only deletes all reference table placements in the
  group if the node being removed is the primary.
- Rename {Node->NodeGroup}HasShardPlacements, this reflects the behavior it
  already had.
- Rename DeleteAllReferenceTablePlacementsFrom{Node->NodeGroup}. This also
  reflects the behavior it already had, but the new signature forces the
  caller to pass in a groupId
- Rename {WorkerGetLiveGroup->ActivePrimaryNode}Count
2017-07-24 11:57:46 +03:00
Brian Cloutier e6c375eb81 Tiny refactor to master_create_empty_shard 2017-07-24 11:57:46 +03:00
Brian Cloutier ee270b65d7 make WorkerGetNodeWithName a static function 2017-07-24 11:57:46 +03:00
Marco Slot 601b17d544 Use distributed transaction number in 2PC identifiers 2017-07-21 17:36:33 +02:00
Marco Slot 18a6e478af Fix typo in GetCurrentDistributedTransctionId 2017-07-21 17:36:33 +02:00
Brian Cloutier 7f1343103e Fix PG 10 build, UNBOUNDED partitions now have different syntax
Update code and tests to match the changes made in pg's d363d42
2017-07-21 14:30:11 +03:00
Brian Cloutier 74dd5bb281 Fix crash when removing an inactive node 2017-07-20 18:55:40 +03:00
Hadi Moshayedi 953df34d22 Explicit switch/case fall-throughs to avoid compiler warnings.
GCC 7 added `-Wimplicit-fallthrough` to warn for not explicitly specified switch/case fall-throughs.

According to https://gcc.gnu.org/gcc-7/changes.html, to suppress that warning we could either use `__attribute__(fallthrough)`, which didn't seem to work for earlier GCC versions, or a `/* fallthrough */` comment just before the following `case`.

Previously Citus code had the fall-through comments inside the brackets, which didn't seem to suppress the warning. Putting a `/* fallthrough */` comment outside the brackets and right before the `case` fixes the problem.
2017-07-19 11:41:59 -04:00
Onder Kalaci 3369f3486f Introduce distributed transaction ids
This commit adds distributed transaction id infrastructure in
the scope of distributed deadlock detection.

In general, the distributed transaction id consists of a tuple
in the form of: `(databaseId, initiatorNodeIdentifier, transactionId,
timestamp)`.

Briefly, we add a shared memory block on each node, which holds some
information per backend (i.e., an array `BackendData backends[MaxBackends]`).
Later, on each coordinated transaction, Citus sends
`SELECT assign_distributed_transaction_id()` right after `BEGIN`.
For that backend on the worker, the distributed transaction id is set to
the values assigned via the function call.

The aim of the above is to correlate the transactions on the coordinator
to the transactions on the worker nodes.
2017-07-18 15:01:42 +03:00
velioglu 6ea15fbb25 Make create_distributed_table transactional 2017-07-18 12:35:40 +03:00
Brian Cloutier 72d8d2429b Add a test for upgrading shard placements 2017-07-12 14:18:27 +02:00
Brian Cloutier ee4edc498f Don't release locks early in metadata functions 2017-07-12 14:18:27 +02:00
Brian Cloutier f40f03270a Fix locking in ReadWorkerNodes() 2017-07-12 14:18:27 +02:00
Brian Cloutier 7ad95b53d2 Rename pg_dist_shard_placement -> pg_dist_placement
Comes with a few changes:

- Change the signature of some functions to accept groupid
  - InsertShardPlacementRow
  - DeleteShardPlacementRow
  - UpdateShardPlacementState

- NodeHasActiveShardPlacements returns true if the group the node is a
  part of has any active shard placements

- TupleToShardPlacement now returns ShardPlacements which have NULL
  nodeName and nodePort.

- Populate (nodeName, nodePort) when creating ShardPlacements
- Disallow removing a node if it contains any shard placements

- DeleteAllReferenceTablePlacementsFromNode matches based on group. This
  doesn't change behavior for now (while there is only one node per
  group), but means in the future callers should be careful about
  calling it on a secondary node, it'll delete placements on the primary.

- Create concept of a GroupShardPlacement, which represents an actual
  tuple in pg_dist_placement and is distinct from a ShardPlacement,
  which has been resolved to a specific node. In the future
  ShardPlacement should be renamed to NodeShardPlacement.

- Create some triggers which allow existing code to continue to insert
  into and update pg_dist_shard_placement as if it still existed.
2017-07-12 14:17:31 +02:00
Brian Cloutier fe53fd4a8e Remove functions created just for unit testing
These functions are holdovers from pg_shard and were created for unit
testing c-level functions (like InsertShardPlacementRow) which our
regression tests already test quite effectively. Removing because it
makes refactoring the signatures of those c-level functions
unnecessarily difficult.

- create_healthy_local_shard_placement_row
- update_shard_placement_row_state
- delete_shard_placement_row
2017-07-12 14:16:24 +02:00
Brian Cloutier 0b64bb1092 Fix typo in comment in CachedRelationLookup 2017-07-12 14:16:24 +02:00
Marco Slot 9f7e4769e2 Clarify placement connection error messages 2017-07-12 11:59:19 +02:00
Marco Slot d3785b97c0 Remove XactModificationLevel distinction between DML and multi-shard 2017-07-12 11:59:19 +02:00
Marco Slot 710fe8666b Use GetPlacementListConnection for router DML 2017-07-12 11:26:23 +02:00
Marco Slot 29f21fea59 Use GetPlacementListConnection for multi-shard commands 2017-07-12 11:26:22 +02:00
Marco Slot 01c9b1f921 Use GetPlacementListConnection for router SELECTs 2017-07-12 11:26:22 +02:00
Marco Slot 63676f5d65 Allow choosing a connection for multiple placements with GetPlacementListConnection 2017-07-12 11:26:22 +02:00
Jason Petersen 9018e698ec
Indentation cleanup
Uncrustify 0.65 appears to have changed some defaults, resulting in
breakages for those of us who have already upgraded; Travis still uses
Uncrustify 0.64, but these changes work with both versions (assuming
appropriately updated config), so this should permit use of either
version for the time being.
2017-07-11 15:59:28 -06:00
Burak Yucesoy c8b9e4011b Remove LockRelationDistributionMetadata function 2017-07-10 15:46:37 +03:00
Burak Yucesoy cb6070c720 Use ShareUpdateExclusiveLock instead ShareLock in VACUUM
Before this change, we used ShareLock to acquire lock on distributed tables while
running VACUUM. This makes VACUUM and INSERT block each other. With this change we
changed lock mode from ShareLock to ShareUpdateExclusiveLock, which does not conflict
with the locks INSERT acquire.
2017-07-10 15:46:19 +03:00
Murat Tuncer 2a4eada150 Replace duplicate code and call check_functions_in_node (#1478)
MasterIrreducibleExpressionWalker has a copied code from
function check_functions_in_node() which was available with
PG 9.6+. Now PG 9.5 support is dropped we can remove
duplicate code and directly call check_functions_in_node().
2017-07-07 10:19:33 +03:00
Marco Slot 31debc96e3 Handle implicit casts in prepared INSERTs 2017-07-06 16:17:35 +02:00
Andres Freund 3461244539 Don't wait for statement completion when aborting coordinated transaction.
Previously we used ForgetResults() in StartRemoteTransactionAbort() -
that's problematic because there might still be an ongoing statement,
and this causes us to wait for its completion.  That e.g. happens when
a statement running on the coordinator is cancelled.
2017-07-04 14:46:03 -07:00
Andres Freund 0d791f6740 Cancel statements when closing connection at transaction end.
That's important because the currently running statement on a worker
might continue to hold locks and consume resources, even after the
connection is closed.  Unfortunately postgres will only notice closed
connections when reading from / writing to the network. That might
only happen much later.
2017-07-04 14:46:03 -07:00
Andres Freund be8677f926 Add NonblockingForgetResults().
This is very similar to ForgetResults() except that no network IO is
performed. Primarily useful in error handling cases.
2017-07-04 14:46:03 -07:00
Andres Freund 24153fae5d Add ShutdownConnection() which cancels statement before closing connection.
That's primarily useful in error cases, where we want to make sure
locks etc held by commands running on workers are released promptly.
2017-07-04 14:46:03 -07:00
Andres Freund 75a7ddea0d Always use connections in non-blocking mode.
Now that there's no blocking libpq callers left, default to using
non-blocking mode in connection_management.c.  This has two
advantages:
1) Blockiness doesn't have to frequently be reset, simplifying code
2) Prevents accidental use of blocking libpq functions, since they'll
   frequently return 'need IO'
2017-07-04 14:46:03 -07:00
Andres Freund 90a2d13a64 Move multi_copy.c to interrupt aware libpq wrappers. 2017-07-04 14:46:03 -07:00
Andres Freund 21c25abbb1 Move multi_client_executor to interrupt aware libpq wrappers. 2017-07-04 12:38:52 -07:00
Andres Freund ddb0651967 Move citus tools to interrupt aware libpq wrappers. 2017-07-04 12:38:52 -07:00
Andres Freund c674bc8640 Add interrupt aware PQputCopy{End,Data} wrappers. 2017-07-04 12:38:52 -07:00
Andres Freund b7f9679ccc Move interrupt handling code from GetRemoteCommandResult to FinishConnectionIO.
Nearby commits will add additional interrupt handling functions, this
way we don't have to duplicate the code.
2017-07-04 12:38:52 -07:00
Andres Freund ec0ed677e3 Fix copy & pasto in WARNING message. 2017-07-04 12:38:52 -07:00
Andres Freund 3dedeadb5e Fix memory leak in RemoteFinalizedShardPlacementList(). 2017-07-04 12:38:52 -07:00
Marco Slot 04fe3f03f6 Change implementation of shard_name UDF to get schema-qualified shard name 2017-07-04 10:49:40 +03:00
Marco Slot da47a03b18 Move INSERT ... SELECT planning logic into one place 2017-06-29 15:03:14 +02:00
Onder Kalaci 5f3f1d75a3 Add some utility functions for partitioned tables
This commit is intended to be a base for supporting declarative partitioning
on distributed tables. Here we add the following utility functions and their
unit tests:

  * Very basic functions including differnentiating partitioned tables and
    partitions, listing the partitions
  * Generating the PARTITION BY (expr) and adding this to the DDL events
    of partitioned tables
  * Ability to generate text representations of the ranges for partitions
  * Ability to generate the `ALTER TABLE parent_table ATTACH PARTITION
    partition_table FOR VALUES value_range`
  * Ability to apply add shard ids to the above command using
    `worker_apply_inter_shard_ddl_command()`
  * Ability to generate `ALTER TABLE parent_table DETACH PARTITION`
2017-06-28 09:39:55 +03:00
Andres Freund dc3997c3b8 Remove 9.5 related node wrappers.
Now that all branches support the extensible node infrastructure, we
don't need our wrappers anymore.
2017-06-26 08:46:32 -07:00
Andres Freund b96ba9b490 Fix code only enabled for 9.5.
There's still supporting wrappers used, a subsequent commit will
remove those.

This also removes the already unused tuplecount_t define.
2017-06-26 08:46:32 -07:00
Andres Freund 60c28ce7a6 Remove 9.5 specific C files. 2017-06-26 08:46:32 -07:00
Jason Petersen 2204da19f0 Support PostgreSQL 10 (#1379)
Adds support for PostgreSQL 10 by copying in the requisite ruleutils
and updating all API usages to conform with changes in PostgreSQL 10.
Most changes are fairly minor but they are numerous. One particular
obstacle was the change in \d behavior in PostgreSQL 10's psql; I had
to add SQL implementations (views, mostly) to mimic the pre-10 output.
2017-06-26 02:35:46 -06:00
Andres Freund c3b7c5dc33 Introduce per-database maintenance process.
This will be used for deadlock detection, prepared transaction
recovery amongst others, but currently is just idling around.
2017-06-23 11:53:39 -07:00
Andres Freund 3483bb99eb Minimal infrastructure for per-backend citus initialization. 2017-06-23 11:20:10 -07:00
Andres Freund 1691f780fd Force cache invalidation machinery to be initialized earlier.
Previously it was not guaranteed that invalidations were registered
after creating the extension, only if the extension was used
afterwards.
2017-06-23 11:20:10 -07:00
Andres Freund f645dca593 Centralized metadata_cache cache variables into one struct, to avoid missing resets.
E.g. extensionOwner was already missed.
2017-06-23 11:20:10 -07:00
Marco Slot a6f42e4948 Clarify error message when copying NULL value into table 2017-06-22 15:48:24 +02:00
Marco Slot 2f8ac82660 Execute INSERT..SELECT via coordinator if it cannot be pushed down
Add a second implementation of INSERT INTO distributed_table SELECT ... that is used if
the query cannot be pushed down. The basic idea is to execute the SELECT query separately
and pass the results into the distributed table using a CopyDestReceiver, which is also
used for COPY and create_distributed_table. When planning the SELECT, we go through
planner hooks again, which means the SELECT can also be a distributed query.

EXPLAIN is supported, but EXPLAIN ANALYZE is not because preventing double execution was
a lot more complicated in this case.
2017-06-22 15:46:30 +02:00
Marco Slot 2cd358ad1a Support quoted column-names in COPY logic 2017-06-22 15:45:57 +02:00
Marco Slot 155db4d913 Simplify router planner call path 2017-06-22 15:45:57 +02:00
Murat Tuncer 0c4bf2d943 Remove fall back to select if poll is not available (#1466)
poll is supported on all relevant systems, there is no
need to have a fall back mechanism to use select()
2017-06-22 12:11:18 +03:00
Jason Petersen 294aeff2ed
Don't call PostProcessUtility for local commands
It is intended only to aid in processing of distributed DDL commands,
but as written could execute during local CREATE INDEX CONCURRENTLY
commands.
2017-06-19 15:56:03 -06:00
velioglu a17ab6408a Delete ExecuteRemoteCommand function 2017-06-15 17:11:19 +03:00
velioglu 173fe137af Convert DropShardsFromWorker to the new connection API 2017-06-15 15:24:06 +03:00
velioglu d7b68e5647 Convert TableDDLCommandList function to the new connection API 2017-06-14 17:29:58 +03:00