With this change, the DropShards function starts to use the new connection API. DropShards
is used by DROP TABLE, master_drop_all_shards and master_apply_delete_command, so all of
these operations now support transactional semantics. In DropShards, if we cannot reach a
node, we mark the shard state of the related placements as FILE_TO_DELETE and continue to
drop the remaining shards; however, if any error occurs after the connection is
established, we ROLLBACK the whole operation.
All router, real-time, task-tracker plannable queries should now have
full prepared statement support (and even use router when possible),
unless they don't go through the custom plan interface (which
basically just affects LANGUAGE SQL (not plpgsql) functions).
This is achieved by forcing postgres' planner to always choose a
custom plan, by assigning very low costs to plans with bound
parameters (i.e. ones where the postgres planner replanned the query
upon EXECUTE with all parameter values provided), instead of the
generic one.
This requires some trickery, because for custom plans to work the
costs for a non-custom plan have to be known, which means we can't
error out when planning the generic plan. Instead we have to return a
"faux" plan that would trigger an error message if executed. But due
to the custom plan logic, that plan will likely not be executed
(unless called by an SQL function, or because we can't support that
query for some reason); instead the custom plan will be chosen.
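As an illustration (the table, column, and parameter names here are hypothetical), a prepared router query of this shape should now plan and execute through Citus on every EXECUTE:

    PREPARE fetch_order (bigint) AS
        SELECT * FROM orders WHERE order_id = $1;
    EXECUTE fetch_order (42);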
So far the router planner has encapsulated different functionality in
MultiRouterPlanCreate. Modifications always go through the router, selects
sometimes do. Modifications always error out if the query is unsupported;
selects return NULL. The error handling in particular is a problem for
the upcoming extension of prepared statement support.
Split MultiRouterPlanCreate into CreateRouterPlan and
CreateModifyPlan, and change them to not throw errors.
Instead, errors are now reported by setting the new
MultiPlan->planningError.
Callers of router planner functionality now have to throw errors
themselves if desired, but can also skip doing so.
This is a pre-requisite for expanding prepared statement support.
While touching all those lines, improve a number of error messages by
getting them closer to the postgres error message guidelines.
This adds a replication_model GUC which is used as the replication
model for any new distributed table that is not a reference table.
With this change, tables with replication factor 1 are no longer
implicitly MX tables.
The GUC is similarly respected during empty shard creation for e.g.
existing append-partitioned tables. If the model is set to streaming
while the replication factor is greater than one, table and shard creation
routines will error out until this invalid combination is corrected.
Changing this parameter requires superuser permissions.
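A minimal sketch (assuming the citus. GUC prefix used by other Citus settings, plus the create_distributed_table UDF; the table name is hypothetical):

    SET citus.replication_model TO 'streaming';
    SELECT create_distributed_table('events', 'tenant_id');

With a replication factor greater than one, the same SET would instead cause subsequent table and shard creation to error out.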
We changed the error message that appears when a user tries to execute an outer join
that requires repartitioning. The old error message mentioned 1-to-1 shard
partitioning, which may not be clear to users.
This enables proper transactional behaviour for COPY and relaxes some
restrictions, like combining COPY with single-row modifications. It
also provides the basis for relaxing restrictions further, and for
optionally allowing connection caching.
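A minimal sketch (hypothetical table and file path) of what is now legal in a single transaction:

    BEGIN;
    COPY items FROM '/tmp/items.csv' WITH (FORMAT csv);
    INSERT INTO items VALUES (1001, 'widget');
    COMMIT;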
This change adds support for serial columns to be used with MX tables.
Prior to this change, sequences of serial columns were created on all
workers (to be able to create shards) but never used. With MX, we
need to set up the sequences so that the sequence on each worker
generates unique values. This is done by setting the MINVALUE, MAXVALUE
and START values of the sequence.
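As an illustration of the idea only (the actual ranges are computed by Citus; the sequence name and numbers below are made up), each worker's copy of the sequence is confined to a disjoint range:

    ALTER SEQUENCE users_id_seq
        MINVALUE 1000000 MAXVALUE 1999999
        START WITH 1000000 RESTART;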
This commit is intended to improve the error messages while planning
INSERT INTO .. SELECT queries. The main motivation for this change is
that we used to map multiple cases into a single message. With this change,
we added explicit error messages for many cases.
With this change, we start to delete the placements of reference tables on the given
worker node after a master_remove_node UDF call. We remove the placement metadata on the
master node but do not drop the actual shards from the worker node. There are two reasons
for this decision. First, it is not critical to DROP the shards on the workers, because
Citus will ignore them as long as the node is removed from the cluster, and if we add that
node back to the cluster we will DROP and recreate all reference tables. Second, if the
node is unreachable, it becomes complicated to cover the failure cases and provide
transaction support.
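For example (hypothetical host name and port), removing a node now also removes the reference table placement metadata for it:

    SELECT master_remove_node('worker-1', 5432);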
Enables the use of views within distributed queries.
Users can create and use a view on distributed tables/queries
as they would with regular queries.
After this change, router queries have full support for views;
INSERT INTO ... SELECT queries support reading from views, but
not writing into them. Outer joins have limited support and
error out in certain cases, such as when a view is on the inner
side of the outer join.
Although PostgreSQL supports writing into views under certain
circumstances, we disallow that for distributed views.
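A minimal example (hypothetical tables) of a view used in a router query:

    CREATE VIEW recent_events AS
        SELECT * FROM events
        WHERE created_at > now() - interval '7 days';

    SELECT count(*) FROM recent_events WHERE tenant_id = 42;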
In tests related to automatic reference table creation and deletion, some tests produced
output whose row order could vary, creating inconsistent test results. With this change,
we add ORDER BY clauses to the related tests to get consistent output.
CloseNodeConnections() is supposed to close connections to a given node.
However, before this commit it failed to actually call PQfinish() on the
connections. Using CloseConnection() handles closing as well as all other
necessary actions.
With this change we start to error out in the router planner on queries that contain
a common table expression with a data-modifying statement. We already did not support
data-modifying statements that use the result of a CTE; now we also error out
if the CTE itself is a data-modifying statement.
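For example, a query of this shape (hypothetical tables) now errors out:

    WITH moved AS (
        DELETE FROM task_queue WHERE consumed RETURNING *
    )
    SELECT count(*) FROM moved;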
Remove the router specific transaction and shard management, and
replace it with the new placement connection API. This mostly leaves
behaviour alone, except that it is now, inside a transaction, legal to
select from a shard to which no pre-existing connection exists.
To simplify the code, the handling of task execution for selects and
modifications has been split in two; the previous coding was starting to
get confusing due to the amount of only conditionally applicable code.
Modification connections & transactions are now always established in
parallel, not just for reference tables.
With this change, we start to replicate all reference tables to the new node when it
is added to the cluster with the master_add_node command. We also update the replication
factor of the reference tables' colocation group.
Since we now replicate reference tables each time we add a node, we need to ensure
that the test space is clean in terms of reference tables before any add-node operation.
For this purpose we had to change the order of the multi_drop_extension test, which
changed some of the colocation ids.
With this change we introduce a new UDF, upgrade_to_reference_table, which can be used to
upgrade existing broadcast tables to reference tables. For upgrading, we require that the
given table contain only one shard.
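For example, given a single-shard broadcast table (hypothetical table name):

    SELECT upgrade_to_reference_table('nation');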
Renamed FindShardIntervalIndex() to ShardIndex() and added binary search
capability. It used to assume that hash-partitioned tables are always
uniformly distributed, which is not true if the upcoming tenant isolation
feature is applied. This commit also reduces code duplication.
The router planner already handles cases where all shards
are pruned out; this adds the missing test cases. Notice that
"column is null" and "column = null" have different shard
pruning behavior.
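The distinction the new tests exercise, on a hypothetical table:

    -- can never be true, so every shard can be pruned away
    SELECT * FROM events WHERE tenant_id = null;

    -- can be true, so pruning must keep any shard that could hold NULLs
    SELECT * FROM events WHERE tenant_id IS NULL;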
We keep one replica of a reference table on each node. Therefore, all problems with
replication factor > 1 also apply to reference tables. As a solution, we do not allow
foreign keys on reference tables: it is not possible to define a foreign key from, to,
or between reference tables.
Previously, we errored out if a non-superuser tried to run SELECT queries on some metadata
tables. It seems that we already GRANT SELECT access to some metadata tables but not
others. With this change, we GRANT SELECT access to all existing Citus metadata tables.
* Add get_distribution_value_shardid UDF
With this UDF users can now map a given distribution value to a shard id. We mostly hide
shard ids from users to prevent unnecessary complexity, but some power users might need
to know which entry/value is stored in which shard for maintenance purposes.
The signature of this UDF is as follows:
bigint get_distribution_value_shardid(table_name regclass, distribution_value anyelement)
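For example (hypothetical table and value):

    SELECT get_distribution_value_shardid('orders'::regclass, 42);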
With this commit, we implemented some basic features of reference tables.
To start with, a reference table is
* a distributed table without a distribution column defined on it
* single sharded
* with its shard replicated to all nodes
Reference tables follow the same code path as single-shard
tables. Thus, broadcast JOINs are applicable to reference tables.
But, since the table is replicated to all nodes, table fetching is
no longer required.
Reference tables support uniqueness constraints on any column.
Reference tables can be used in INSERT INTO .. SELECT queries with
the following rules:
* If a reference table is in the SELECT part of the query, it is
safe to join with another reference table and/or hash-partitioned
tables.
* If a reference table is in the INSERT part of the query, all
other participating tables should be reference tables.
Reference tables follow the regular co-location structure. Since
all reference tables are single sharded and replicated to all nodes,
they are always co-located with each other.
Queries involving only reference tables always follow the router planner
and executor.
Reference tables can have composite typed columns and there is no need
to create/define the necessary support functions.
All modification queries, master_* UDFs, EXPLAIN, DDLs, TRUNCATE,
sequences, transactions, COPY, and schema support work on reference
tables as expected. Plus, all the pre-requisites associated with
distribution columns are dismissed.
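A minimal end-to-end sketch (hypothetical table names; assuming the create_reference_table UDF that accompanies this feature):

    CREATE TABLE countries (code char(2) PRIMARY KEY, name text);
    SELECT create_reference_table('countries');

    -- executed as a broadcast join with a hash-partitioned table
    SELECT o.order_id, c.name
    FROM orders o JOIN countries c ON o.country_code = c.code;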
We used to disable router planner and executor
when task executor is set to task-tracker.
This change enables router planning and execution
at all times regardless of task execution mode.
We are introducing a hidden flag enable_router_execution
to enable/disable router execution. Its default value is
true. User may disable router planning by setting it to false.
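For example (assuming the citus. GUC prefix used by other Citus settings):

    SET citus.enable_router_execution TO false;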
Adds support for VACUUM and ANALYZE commands which target a specific
distributed table. After grabbing the appropriate locks, this
implementation sends VACUUM commands to each placement (using one
connection per placement). These commands are sent in parallel, so users
with large tables will benefit from sharding. Except for VERBOSE, all
VACUUM and ANALYZE options are supported, including the explicit
column list used by ANALYZE.
As with many of our utility commands, the local command also runs. In
the VACUUM/ANALYZE case, the local command is executed before any
remote propagation. Because error handling is managed after local
processing, this can result in a VACUUM completing locally but erroring
out when distributed processing commences: a minor technicality in all
cases, as there isn't really much reason to ever roll back a VACUUM (an
impossibility in any case, as VACUUM cannot run within a transaction).
Remote propagation of targeted VACUUM/ANALYZE is controlled by the
enable_ddl_propagation setting; warnings are emitted if such a command
is attempted when DDL propagation is disabled. Unqualified VACUUM or
ANALYZE is not handled, but a warning message informs the user of this.
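For example (hypothetical table and columns):

    VACUUM (FREEZE) orders;
    ANALYZE orders (customer_id, order_date);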
Implementation note: this commit adds a "BARE" value to
MultiShardCommitProtocol. When active, no BEGIN command is ever sent
to remote nodes, which is useful for commands such as VACUUM/ANALYZE
that must not run in a transaction block. This value is not user-facing
and is reset at transaction end.
This change adds the `start_metadata_sync_to_node` UDF, which copies the metadata about
nodes and MX tables from the master to the specified worker, sets its local group ID, and
sets its hasmetadata column to true to allow it to receive future DDL changes.
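For example (hypothetical host name and port):

    SELECT start_metadata_sync_to_node('worker-1', 5432);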
One less place managing remote transactions. It also makes it fairly
easy to use 2PC for certain modifications (e.g. reference tables). Just
issue a CoordinatedTransactionUse2PC(). If every placement failure
should cause the whole transaction to abort, additionally mark the
relevant transactions as critical.
Note that this is not with the citus *extension* loaded, just the shared
library. The former runs (by adding --use-existing to the flags) but
has a bunch of trivial test differences.
With this PR, we add foreign key support to ALTER TABLE commands. For now,
we only support foreign constraint creation via an ALTER TABLE query if it
is the only subcommand in the ALTER TABLE subcommand list.
We also only allow foreign key creation if the replication factor is 1.
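For example (hypothetical tables, with replication factor 1), the single-subcommand form that is now supported:

    ALTER TABLE orders
        ADD CONSTRAINT orders_customer_fkey
        FOREIGN KEY (customer_id) REFERENCES customers (customer_id);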
On some systems a newer libpq is available than the one we're compiling
against, but until now we used psql in the version we're compiling
against. That's a problem, because (quoting Jason):
With 9.6, libpq's default handling of CONTEXT changed: it is hidden
unless the level is ERROR or higher. We addressed this ourselves using
the SHOW_CONTEXT variable (by setting "always" in pg_regress_multi): in
9.5, this is ignored (and unneeded), in 9.6, it ensures old behavior is
preserved.
For 9.6 we'd already worked around the problem by specifying that
context should always be shown, but < 9.6 psql doesn't know how to do
that.
As there's no csql anymore, which strictly tied us to a specific version
of psql/csql, we can now just use the system's psql if available. We
still fall back to the psql of the installation we're compiling against,
if there's no other psql in PATH.
This commit fixes a bug when the SELECT target list includes a constant
value.
Previous behaviour of target list re-ordering:
* Iterate over the INSERT target list
* If it includes a Var, find the corresponding SELECT entry
and update its resno accordingly
* If it does not include a Var (which we only considered to be
DEFAULTs), generate a new SELECT target entry
* If the processed target entry count in the SELECT target list is less
than that of the original SELECT target list (GROUP BY elements not
included in the SELECT target entry), add them to the SELECT target list
and update the resnos accordingly.
* However, this step could add the CONST SELECT target entries
twice. The reason is that when CONST target list entries appear in the
SELECT target list, the INSERT target list doesn't include a Var; instead,
it includes a CONST, as it does for DEFAULTs (see the example query after
this entry).
New behaviour of target list re-ordering:
* Iterate over the INSERT target list
* If it includes a Var, find the corresponding SELECT entry
and update its resno accordingly
* If it does not include a Var (which we consider to be
DEFAULTs and CONSTs on the SELECT), generate a new SELECT
target entry
* If any resjunk target entries remain on the SELECT target list
(GROUP BY elements not included in the SELECT target entry), keep them
in the SELECT target list by updating their resnos.
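For illustration (hypothetical tables), a query of the shape that previously produced duplicated CONST target entries mixes a constant with grouped columns:

    INSERT INTO daily_totals (day, source, total)
    SELECT day, 'web', count(*)
    FROM events
    GROUP BY day;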
This change allows seeing the names of the columns of `master_add_node`
by using `SELECT * FROM master_add_node(...)`, achieved by specifying
output columns in the UDF definition.
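For example (hypothetical host name and port):

    SELECT * FROM master_add_node('worker-1', 5432);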
Previously, we threw an error when we ran CREATE INDEX IF NOT EXISTS
with an already existing index. This change enables expected behavior by
checking if the statement has IF NOT EXISTS before throwing the error.
We also ensure that we don't execute the command on the workers, if an
index already exists on the master.
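For example (hypothetical index name), this is now a no-op rather than an error when the index already exists:

    CREATE INDEX IF NOT EXISTS orders_customer_idx ON orders (customer_id);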
At the moment, we do not support foreign constraints if the replication factor is greater
than 1. However, foreign constraints can be used in the cloud with the high-availability
option. Therefore, we do not want to create the impression that foreign constraints with
high availability are not supported at all. We call users to action with this error
message.
In ErrorIfShardPlacementsNotColocated(), while checking whether shards are colocated,
error out if matching shard intervals have different numbers of shard placements.