Fixes#302
Since our previous syntax did not allow creating hash partitioned tables,
some of the previous tests manually changed partition method to hash to
be able to test it. With this change we remove unnecessary workaround and
create hash distributed tables instead. Also in some tests metadata was
created manually. With this change we also fixed this issue.
Fixes#475
With this change we prevent addition of ONLY clause to queries prepared for
worker nodes. When we add ONLY clause we may miss the inherited tables in
worker nodes created by users manually.
When executing queries with citus.task_executor = 'real-time', query
execution could, so far, spend a significant amount of time
sleeping. That's because we were
a) sleeping after several phases of query execution, even if we're not
waiting for network IO
b) sleeping for a fixed amount of time when waiting for network IO;
often a lot longer than actually required.
Just reducing the amount of time slept isn't a real solution, because
that just increases CPU usage.
Instead have the real-time executor's ManageTaskExecution return whether
a task is currently being processed, waiting for reads or writes, or
failed. When all tasks are waiting for IO use poll() to wait for IO
readyness.
That requires to slightly redefine how connection timeouts are handled:
before we counted the number of times ManageTaskExecution() was called,
and compared that with the timeout divided by the task check
interval. That, if processing of tasks took a while, could significantly
increase the time till a timeout occurred. Because it was based on the
ManageTaskExecution() being called on a constant interval, this approach
isn't feasible anymore. Instead measure the actual time since
connection establishment was started. That could in theory, if task
processing takes a very long time, lead to few passes over
PQconnectPoll().
The problem of sleeping too much also exists for the 'task-tracker'
executor, but is generally less problematic there, as processing the
individual tasks usually will take longer. That said, for e.g. the
regression tests it'd be helpful to use a similar approach.
Single table repartition subqueries now support count(distinct column)
and count(distinct (case when ...)) expressions. Repartition query
extracts column used in aggregate expression and adds them to target
list and group by list, master query stays the same (count (distinct ...))
but attribute numbers inside the aggregate expression is modified to
reflect changes in repartition query.
Now, master_create_empty_shard() will create shards according to the
value of citus.shard_placement_policy which also makes default round-robin
instead of random.
Fixes#10
This change creates a new UDF: master_modify_multiple_shards
Parameters:
modify_query: A simple DELETE or UPDATE query as a string.
The UDF is similar to the existing master_apply_delete_command UDF.
Basically, given the modify query, it prunes the shard list, re-constructs
the query for each shard and sends the query to the placements.
Depending on the value of citus.multi_shard_commit_protocol, the commit
can be done in one-phase or two-phase manner.
Limitations:
* It cannot be called inside a transaction block
* It only be called with simple operator expressions (like Single Shard Modify)
Sample Usage:
```
SELECT master_modify_multiple_shards(
'DELETE FROM customer_delete_protocol WHERE c_custkey > 500 AND c_custkey < 500');
```
Make's $(wildcard) does not sort the glob result, but returns filenames
in filesystem ordering. This makes the build result vary and hence
unreproducible on the binary level. Fix by adding $(sort).
Spotted by Debian's reproducible builds project.
This commit fixes failures happen during check-full. The change does make
clean seperation of executor types in certain places to keep the outputs
stable.
Now, we can copy to an append-partitioned distributed relation from
any worker node by providing master options such as;
COPY relation_name FROM file_path WITH (delimiter '|', master_host 'localhost', master_port 5432);
where master_port is optional and default is 5432.
Based on Andres' suggestion, I removed SetConnectionStatus, moving its
functionality directly into set_connection_status_bad, which now simply
shuts down the socket underlying a particular connection.
This keeps the functionality as-is while removing our questionable use
of internal libpq headers.
Fixes#477
This change fixes the compile time warning message in BuildMapMergeJob in
multi_physical_planner.c about mixed declarations and code. Basically, the
problematic declaration is moved up so that no expression is before it.
Allow references to columns in UPDATE statements
Queries like "UPDATE tbl SET column = column + 1" are now allowed, so long as you don't use any IMMUTABLE functions.
This change renames the distributed transaction manager parameter from
citus.copy_transaction_manager to citus.multi_shard_commit_protocol.
Distributed transaction manager has been used only by the COPY on hash
partitioned tables but it can be used by upcoming features so, we needed
to rename so that its name do not contain a reference to COPY.
The change also includes renames like transaction_manager_options to
commit_protocol_options and TRANSACTION_MANAGER_1PC to COMMIT_PROTOCOL_1PC.
With this change, declaration of MultiShardCommitProtocol (was
CopyTransactionManager) is moved from multi_copy.c to multi_transaction.c.
Currently that's just COPY FROM. There's other places where we could
check for permissions earlier (to fail less verbosely), but since
there's other pending changes in the whole DDL area, which is affected
by this, I'm just adding a note to those places.
That's important because ownership of relations implies special
privileges. Without this change, a distributed table can be accessible
by a table's owner, but a shard created by another user might not.
Some small parts of citus currently require superuser privileges; which
is obviously not desirable for production scenarios. Run these small
parts under superuser privileges (we use the extension owner) to avoid
that.
This does not yet coordinate grants between master and workers. Thus it
allows to create shards, load data, and run queries as a non-superuser,
but it is not easily possible to allow differentiated accesses to
several users.
\stage so far directly inserted into pg_dist_shard and
pg_dist_shard_placement. That makes it hard to do effective permission
checks. Thus move the inserts into two C functions.
These two new functions aren't the nicest abstraction. But as we are
planning to obsolete \stage, it doesn't seem worthwhile to refactor the
client-side code of \stage to allow the use of
master_create_empty_shard() et al.
Previously several commands, amongst them commands like
master_create_distributed_table(), were allowed for everyone. That's not
good: Even though citus currently requires superuser permissions, we
shouldn't allow non-superusers to perform actions as sensitive as making
a table distributed.
There's no checks on the worker_* functions, as these usually just punt
the action to underlying postgres functionality, which then perform the
necessary checks.
Citus' extension version now has a -$schemaversion appendix. When the
schema is changed, a new schema version has to be added; changes to the
same schema version several commits inside a single pull request are ok.
Schema migration scripts between each schema version have to be
added. To ensure upgrade scripts work correctly a new regression test
ensures that all steps work.
The extension scripts to-be-used for CREATE EXTENSION (i.e. not
extension updates) are generated by concatenating citus.sql and the
relevant migration scripts.
Otherwise the owner of relations and such will depend on the username of
the user running the regression tests. As "postgres" is the most common
username for that purpose, hardcode that in pg_regress_multi.pl.
So far we've always used libpq defaults when connecting to workers; bar
special environment variables being set that'll always be the user that
started the server. That's not desirable because it prevents using
users with fewer privileges.
Thus change the various APIs creating connections to workers to always
use usernames. That means:
1) MultiClientConnect() needs to, optionally, accept a username
2) GetOrEstablishConnection(), including the underlying cache, need to
use the current user as part of the connection cache key. That way
connections for separate users are distinct, and we always use one
with the correct authorization.
3) The task tracker needs to keep track of the username associated with
a task, so it can use it when establishing connections outside the
originating session.
This commit adds a fast shard pruning path for INSERTs on
hash-partitioned tables. The rationale behind this change is
that if there exists a sorted shard interval array, a single
index lookup on the array allows us to find the corresponding
shard interval. As mentioned above, we need a sorted
(wrt shardminvalue) shard interval array. Thus, this commit
updates shardIntervalArray to sortedShardIntervalArray in the
metadata cache. Then uses the low-level API that is defined in
multi_copy to handle the fast shard pruning.
The performance impact of this change is more apparent as more
shards exist for a distributed table. Previous implementation
was relying on linear search through the shard intervals. However,
this commit relies on constant lookup time on shard interval
array. Thus, the shard pruning becomes less dependent on the
shard count.
When we notice that pg_dist_partition is being invalidated we assume
that the citus extension is being dropped and drop state such as
extensionLoaded and the cached oids of all the metadata tables.
This frees the user from needing to reconnect after running DROP
EXTENSION, so we also no longer send a warning message.
- non-router plannable queries can be executed
by router executor if they satisfy the criteria
- router executor is removed from configuration,
now task executor can not be set to router
- removed some tests that error out for router executor
With #426, some new warning messages started to arise, because of
cross assignment of Node and Expr pointers. This change fixes the
warnings with type casts.
Fixes#379
Varchar VAR struct is wrapped in RELABELTYPE struct inside PostgreSQL code and
IsPartitionColumnRecursive function considers only VAR types so returning false
for varchar.
This change adds strip_implicit_coercions() call to the columnExpression in
IsPartitionColumnRecursive function so that we get rid of implicit coercions like
RELABELTYPE are stripped to VAR.
This change fixes the problem with joins with VARCHAR columns. Prior to
this change, when we tried to do large table joins on varchar columns, we got
an error of the form:
ERROR: cannot perform local joins that involve expressions
DETAIL: local joins can be performed between columns only.
This is because we have a check in CheckJoinBetweenColumns() which requires the
join clause to have only 'Var' nodes (i.e. columns). Postgres adds a relabel t
ype cast to cast the varchar to text; hence the type of the node is not T_Var
and the join fails.
The fix involves calling strip_implicit_coercions() to the left and right
arguments so that RELABELTYPE is stripped to VAR.
Fixes#76.
Fixes#375
Prior to this change, shard pruning couldn't be done if:
- Table is hash-distributed
- Partition column of is VARCHAR
- Query to be pruned is a subquery
There were two problems:
- A bug in left-side/right-side checks for the partition column
- We were not considering relabeled types (VARCHAR was relabeled as TEXT)
While reading this code last week, it appeared as though there was no
place we ensured that the partition clause actually used equality ops.
As such, I was worried that we might transform a clause such as id < 5
into a constraint like hash(id) = hash(5) when doing shard pruning. The
relevant code seemed to just ensure:
1. The node is an OpExpr
2. With a related hash function
3. It compares the partition column
4. Against a constant
A superficial reading implied we didn't actually make sure the original
op was equality-related, but it turns out the hash lookup function DOES
ensure that for us. So I added a comment.
Previously (if you're creating the index with the same name on different
tables) we successfully ran the command on the workers before failing it
on the master and leaving no record of the index.
Now we check whether the index exists on the master before sending
commands to the workers.
--
Also make the error better when user attampts to create an index without
a name. Previously those statements returned:
brian=# create index on c (b);
WARNING: could not receive query results from localhost:9700
DETAIL: Client error: cannot extend name for null index name
ERROR: could not execute DDL command on worker node shards
They now return
brian=# create index on c (b);
ERROR: creating index without a name on a distributed table is
currently unsupported
This macro is intended to receive a bare integer literal (no suffix).
It adds a suffix as necessary, depending upon available features. On
e.g. 32-bit platforms, the existing code failed to compile because a
suffix was added to the existing suffix. This fixes that problem.
Fixes#363
This change modifies the error message given when Citus is attempted
to be loaded other than shared_preload_libraries. Explanations have been
extended with that shared_preload_parameters parameter is in
postgresql.conf and citus should be at the beginning.
Prior to this change, performing a SELECT query without a target
list caused backend to crash.
Sample Query: SELECT FROM github_events; (without any * before FROM)
PostgreSQL:
```
--
(39599 rows)
```
Citus:
```
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!>
```
The problem was an unnecessary Assert on column list in
SetRangeTblExtraData(citus_nodefuncs.c)
This change removes the whitelisting check on the WHERE clauses. Note that, before
this change, citus was already allowing all types of nodes with the following
format (i.e., wrap with a boolean test):
* SELECT col FROM table WHERE (ANY EXPRESSION) is TRUE;
Thus, this change is mostly useful for allowing the expressions in the WHERE clause
directly and avoiding "unsupport clause type" errors.
Prior to this change, it was not possible to use UDFs in repartitioned
subqueries. The reason is that we were setting the search path explicitly
and omiting public schema from that path.
This change adds the public schema to the explicitly set search path.
Fixes issue #258
Prior to this change, Citus gives a deceptive NOTICE message when a query
including ANY or ALL on a non-partition column is issued on a hash
partitioned table.
Let the github_events table be hash-distributed on repo_id column. Then,
issuing this query:
SELECT count(*) FROM github_events WHERE event_id = ANY ('{1,2,3}')
Gives this message:
NOTICE: cannot use shard pruning with ANY (array expression)
HINT: Consider rewriting the expression with OR clauses.
Note that since event_id is not the partition column, shard pruning would
not be applied in any case. However, the NOTICE message would be valid
and be given if the ANY clause would have been applied on repo_id column.
Reviewer: Murat Tuncer
WORKER_LENGTH + 1 is too large. Fixing this has no impact on the string
that is ultimately copied, as it's impossible for the source string to
be any larger to begin with.
The previous form of the test, utilizing DEBUG2, included too much
output dependent on the specifc system and version. Reformulate it to
explicitly connect to workers and show the schema there, when necessary.
The only remaining difference in some of the remaining alternate
regression test files was due to an older minor version release
change. Remove those as well.
There already exist tests that locally embed knowledge about port
numbers, and there's more tests requiring that. Instead of copying
\set's to several tests, make these port number variables available to
all tests.
multi_ExecutorStart() replaces the original planned statement with the
master select statement. As that hasn't gone through the parse analysis
hooks, it'll not have a associated queryId. This prevents extensions
pg_stat_statements to show useful data associated with the query.
I came across several places we weren't as flexible or resilient as we
should have been in our build logic. They include:
* Not using `DESTDIR` in the install-header destination
* Allowing callers to specify `VPATH` or `srcdir` (which breaks)
* Using absolute path for SCRIPTS (9.5 prepends srcdir)
* Including libpq-int in a confusing way (extracted this function)
* Having server includes come first during csql build (client must)
In particular, I hit all of these attempting to build with pg_buildext
in Debian. It passes in an explicit VPATH, as well as srcdir (breaking
all recursive make invocations), and also uses DESTDIR during install.
In addition, a PGDG-enabled Debian box will have the latest libpq-dev
headers (e.g. 9.5) even when building against an older server version
(e.g. 9.4). This leads to problems when including e.g. `c.h`, which
is ambiguous. While compiling more client-side code (csql), we need to
ensure the newer libpq headers are included _first_, so I fixed that.
The default staging policy is now round-robin, though tests were still
configured to use local-first. Testing with the shipping default seems
like the best option, correctness-wise, and since local-first has some
issues with OSes where connecting from localhost doesn't always resolve
to 'localhost', just going with the default is a win-win.
Though Citus' Task struct has a shardId field, it doesn't have the same
semantics as the one previously used in pg_shard code. The analogous
field in the Citus Task is anchorShardId. I've also added an argument
check to the relevant locking function to catch future locking attempts
which pass an invalid argument.
- Flexed the check which prevented append operation cstore tables
since its storage type is not SHARD_STORAGE_TABLE.
- Used process utility function to perform copy operation in
worker_append_table_to shard() instead of directly calling
postgresql DoCopy().
- Removed the additional check in master_create_empty_shard() function.
This check was redundant and erroneous since it was called after
CheckDistributedTable() call.
- Modified WorkerTableSize() function to retrieve cstore table shard
size correctly.
After this change, shards and associated metadata are automatically
dropped when running DROP TABLE on a distributed table, which fixes#230.
It also adds schema support for master_apply_delete_command, which
fixes#73.
Dropping the shards happens in the master_drop_all_shards UDF, which is
called from the SQL_DROP trigger. Inside the trigger, the table is no
longer visible and calling master_apply_delete_command directly wouldn't
work and oid <-> name mappings are not available. The
master_drop_all_shards function therefore takes the relation id, schema
name, and table name as parameters, which can be obtained from
pg_event_trigger_dropped_objects() in the SQL_DROP trigger. If the user
calls master_drop_all_shards while the table still exists, the schema
name and table name are ignored.
Author: Marco Slot
Reviewed-By: Andres Freund
I removed two braces to have this function remain more similar to the
original PostgreSQL function and added uncrustify commands to disable
formatting of its contents.
All citusdb references in
- extension, binary names
- file headers
- all configuration name prefixes
- error/warning messages
- some functions names
- regression tests
are changed to be citus.
The postgres_fdw extension has an extern function with an identical
signature, which can cause problems when both extensions are loaded.
A simple rename can fix this for now (this is the only function with)
such a conflict.
Previously we used, for historical reasons, MessageContext.
That is problematic if a single message from the client
causes a lot of statements to be planned. E.g. for the
copy_to_distributed_table script one insert statement
is planned for each row inserted via COPY, and only freed
when COPY has finished.
This entirely removes any restriction on the type of partitioning
during DML planning and execution. Though there aren't actually any
technical limitations preventing DML commands against append- (or even
range-) partitioned tables, we had initially forbidden this, as any
future stage operation could cause shards to overlap, banning all
subsequent DML operations to partition values contained within more
than one shards. This ended up mostly restricting us, so we're now
removing that restriction.
When two data types have the same binary representation, PostgreSQL may
add an implicit coercion between them by wrapping a node in a relabel
type. This wrapper signals that the wrapped value is completely binary
compatible with the designated "final type" of the relabel node. As an
example, the varchar type is often relabeled to text, since functions
provided for use with text (comparisons, hashes, etc.) are completely
compatible with varchar as well.
The hash-partitioned codepath contains functions that verify queries
actually contain an equality constraint on the partition column, but
those functions expect such constraints to be comparison operations
between a Var and Const. The RelabelType wrapper node causes these
functions to always return false, which bypasses shard pruning.