citus/src/backend/distributed/commands
Önder Kalacı ef7d1ea91d
Locally execute queries that don't need any data access (#3410)
* Update shardPlacement->nodeId to uint

As the source of the shardPlacement->nodeId is always workerNode->nodeId,
and that is uint32.

We had this hack because of: 0ea4e52df5 (r266421409)

And, that is gone with: 90056f7d3c (diff-c532177d74c72d3f0e7cd10e448ab3c6L1123)

So, we're safe to do it now.

* Relax the restrictions on using the local execution

Previously, whenever any local execution happens, we disabled further
commands to do any remote queries. The basic motivation for doing that
is to prevent any accesses in the same transaction block to access the
same placements over multiple sessions: one is local session the other
is remote session to the same placement.

However, the current implementation does not distinguish local accesses
being to a placement or not. For example, we could have local accesses
that only touches intermediate results. In that case, we should not
implement the same restrictions as they become useless.

So, this is a pre-requisite for executing the intermediate result only
queries locally.

* Update the error messages

As the underlying implementation has changed, reflect it in the error
messages.

* Keep track of connections to local node

With this commit, we're adding infrastructure to track if any connection
to the same local host is done or not.

The main motivation for doing this is that we've previously were more
conservative about not choosing local execution. Simply, we disallowed
local execution if any connection to any remote node is done. However,
if we want to use local execution for intermediate result only queries,
this'd be annoying because we expect all queries to touch remote node
before the final query.

Note that this approach is still limiting in Citus MX case, but for now
we can ignore that.

* Formalize the concept of Local Node

Also some minor refactoring while creating the dummy placement

* Write intermediate results locally when the results are only needed locally

Before this commit, Citus used to always broadcast all the intermediate
results to remote nodes. However, it is possible to skip pushing
the results to remote nodes always.

There are two notable cases for doing that:

   (a) When the query consists of only intermediate results
   (b) When the query is a zero shard query

In both of the above cases, we don't need to access any data on the shards. So,
it is a valuable optimization to skip pushing the results to remote nodes.

The pattern mentioned in (a) is actually a common patterns that Citus users
use in practice. For example, if you have the following query:

WITH cte_1 AS (...), cte_2 AS (....), ... cte_n (...)
SELECT ... FROM cte_1 JOIN cte_2 .... JOIN cte_n ...;

The final query could be operating only on intermediate results. With this patch,
the intermediate results of the ctes are not unnecessarily pushed to remote
nodes.

* Add specific regression tests

As there are edge cases in Citus MX and with round-robin policy,
use the same queries on those cases as well.

* Fix failure tests

By forcing not to use local execution for intermediate results since
all the tests expects the results to be pushed remotely.

* Fix flaky test

* Apply code-review feedback

Mostly style changes

* Limit the max value of pg_dist_node_seq to reserve for internal use
2020-01-23 18:28:34 +01:00
..
README.md Fix misc typos 2019-05-23 17:23:27 -07:00
call.c Lazy query deparsing executable queries (#3350) 2020-01-17 11:49:43 +01:00
cluster.c Introduce GetDistributeObjectOps to organize dispatch of logic dependent on node/object type 2020-01-09 18:24:29 +00:00
collation.c Introduce GetDistributeObjectOps to organize dispatch of logic dependent on node/object type 2020-01-09 18:24:29 +00:00
create_distributed_table.c Introduce GetDistributeObjectOps to organize dispatch of logic dependent on node/object type 2020-01-09 18:24:29 +00:00
dependencies.c Adds propagation for grant on schema commands 2020-01-20 14:51:28 +03:00
distribute_object_ops.c Adds propagation for grant on schema commands 2020-01-20 14:51:28 +03:00
drop_distributed_table.c Error for metadata commands if any metadata node is out-of-sync (#3226) 2019-11-27 09:52:57 +01:00
extension.c Introduce GetDistributeObjectOps to organize dispatch of logic dependent on node/object type 2020-01-09 18:24:29 +00:00
foreign_constraint.c Automatically convert useless declarations using regex replace (#3181) 2019-11-21 13:47:29 +01:00
function.c Fix: distributed function with table reference in declare (#3384) 2020-01-16 14:21:54 +01:00
grant.c Introduce GetDistributeObjectOps to organize dispatch of logic dependent on node/object type 2020-01-09 18:24:29 +00:00
index.c Lazy query deparsing executable queries (#3350) 2020-01-17 11:49:43 +01:00
multi_copy.c Locally execute queries that don't need any data access (#3410) 2020-01-23 18:28:34 +01:00
policy.c Introduce GetDistributeObjectOps to organize dispatch of logic dependent on node/object type 2020-01-09 18:24:29 +00:00
rename.c Introduce GetDistributeObjectOps to organize dispatch of logic dependent on node/object type 2020-01-09 18:24:29 +00:00
role.c Introduce GetDistributeObjectOps to organize dispatch of logic dependent on node/object type 2020-01-09 18:24:29 +00:00
schema.c Adds propagation for grant on schema commands 2020-01-20 14:51:28 +03:00
sequence.c Automatically convert useless declarations using regex replace (#3181) 2019-11-21 13:47:29 +01:00
subscription.c Remove copyright years (#2918) 2019-10-15 17:44:30 +03:00
table.c Lazy query deparsing executable queries (#3350) 2020-01-17 11:49:43 +01:00
transmit.c Automatically convert useless declarations using regex replace (#3181) 2019-11-21 13:47:29 +01:00
truncate.c Introduce GetDistributeObjectOps to organize dispatch of logic dependent on node/object type 2020-01-09 18:24:29 +00:00
type.c Introduce GetDistributeObjectOps to organize dispatch of logic dependent on node/object type 2020-01-09 18:24:29 +00:00
utility_hook.c Locally execute queries that don't need any data access (#3410) 2020-01-23 18:28:34 +01:00
vacuum.c Lazy query deparsing executable queries (#3350) 2020-01-17 11:49:43 +01:00
variableset.c Introduce GetDistributeObjectOps to organize dispatch of logic dependent on node/object type 2020-01-09 18:24:29 +00:00

README.md

Commands

The commands module is modeled after backend/commands from the postgres repository and contains the logic for Citus on how to run these commands on distributed objects. Even though the structure of the directory has some resemblence to its postgres relative, files here are somewhat more fine-grained. This is due to the nature of citus commands that are heavily focused on distributed tables. Instead of having all commands in tablecmds.c they are often moved to files that are named after the command.

File Description
create_distributed_table.c Implementation of UDF's for creating distributed tables
drop_distributed_table.c Implementation for dropping metadata for partitions of distributed tables
extension.c Implementation of CREATE EXTENSION commands for citus specific checks
foreign_constraint.c Implementation of helper functions for foreign key constraints
grant.c Placeholder for code granting users access to relations, implemented as enterprise feature
index.c Implementation of commands specific to indices on distributed tables
multi_copy.c Implementation of COPY command. There are multiple different copy modes which are described in detail below
policy.c Implementation of CREATE\ALTER POLICY commands.
rename.c Implementation of ALTER ... RENAME ... commands. It implements the renaming of applicable objects, otherwise provides the user with a warning
schema.c
sequence.c Implementation of CREATE/ALTER SEQUENCE commands. Primarily checks correctness of sequence statements as they are not propagated to the worker nodes
table.c
transmit.c Implementation of COPY commands with format transmit set in the options. This format is used to transfer files from one node to another node
truncate.c Implementation of TRUNCATE commands on distributed tables
utility_hook.c This is the entry point from postgres into the commands module of citus. It contains the implementation that gets registered in postgres' ProcessUtility_hook callback to extends the functionality of the original ProcessUtility. This code is used to route the incomming commands to their respective implementation in Citus
vacuum.c Implementation of VACUUM commands on distributed tables

COPY

The copy command is overloaded for a couple of use-cases specific to citus. The syntax of the command stays the same, however the implementation might slightly differ from the stock implementation. The overloading is mostly done via extra options that Citus uses to indicate how to operate the copy. The options used are described below.

FORMAT transmit

Implemented in transmit.c

TODO: to be written by someone with enough knowledge to write how, when and why it is used.

FORMAT result

Implemented in multi_copy.c

TODO: to be written by someone with enough knowledge to write how, when and why it is used.

MASTER_HOST host

Implemented in multi_copy.c

Triggered by the MASTER_HOST option being set on the copy command. Also accepts MASTER_PORT

TODO: to be written by someone with enough knowledge to write how, when and why it is used.