- non-router plannable queries can be executed
by router executor if they satisfy the criteria
- router executor is removed from configuration,
now task executor can not be set to router
- removed some tests that error out for router executor
With #426, some new warning messages started to arise, because of
cross assignment of Node and Expr pointers. This change fixes the
warnings with type casts.
Fixes#379
Varchar VAR struct is wrapped in RELABELTYPE struct inside PostgreSQL code and
IsPartitionColumnRecursive function considers only VAR types so returning false
for varchar.
This change adds strip_implicit_coercions() call to the columnExpression in
IsPartitionColumnRecursive function so that we get rid of implicit coercions like
RELABELTYPE are stripped to VAR.
This change fixes the problem with joins with VARCHAR columns. Prior to
this change, when we tried to do large table joins on varchar columns, we got
an error of the form:
ERROR: cannot perform local joins that involve expressions
DETAIL: local joins can be performed between columns only.
This is because we have a check in CheckJoinBetweenColumns() which requires the
join clause to have only 'Var' nodes (i.e. columns). Postgres adds a relabel t
ype cast to cast the varchar to text; hence the type of the node is not T_Var
and the join fails.
The fix involves calling strip_implicit_coercions() to the left and right
arguments so that RELABELTYPE is stripped to VAR.
Fixes#76.
Fixes#375
Prior to this change, shard pruning couldn't be done if:
- Table is hash-distributed
- Partition column of is VARCHAR
- Query to be pruned is a subquery
There were two problems:
- A bug in left-side/right-side checks for the partition column
- We were not considering relabeled types (VARCHAR was relabeled as TEXT)
While reading this code last week, it appeared as though there was no
place we ensured that the partition clause actually used equality ops.
As such, I was worried that we might transform a clause such as id < 5
into a constraint like hash(id) = hash(5) when doing shard pruning. The
relevant code seemed to just ensure:
1. The node is an OpExpr
2. With a related hash function
3. It compares the partition column
4. Against a constant
A superficial reading implied we didn't actually make sure the original
op was equality-related, but it turns out the hash lookup function DOES
ensure that for us. So I added a comment.
Previously (if you're creating the index with the same name on different
tables) we successfully ran the command on the workers before failing it
on the master and leaving no record of the index.
Now we check whether the index exists on the master before sending
commands to the workers.
--
Also make the error better when user attampts to create an index without
a name. Previously those statements returned:
brian=# create index on c (b);
WARNING: could not receive query results from localhost:9700
DETAIL: Client error: cannot extend name for null index name
ERROR: could not execute DDL command on worker node shards
They now return
brian=# create index on c (b);
ERROR: creating index without a name on a distributed table is
currently unsupported
This macro is intended to receive a bare integer literal (no suffix).
It adds a suffix as necessary, depending upon available features. On
e.g. 32-bit platforms, the existing code failed to compile because a
suffix was added to the existing suffix. This fixes that problem.
Fixes#363
This change modifies the error message given when Citus is attempted
to be loaded other than shared_preload_libraries. Explanations have been
extended with that shared_preload_parameters parameter is in
postgresql.conf and citus should be at the beginning.
Prior to this change, performing a SELECT query without a target
list caused backend to crash.
Sample Query: SELECT FROM github_events; (without any * before FROM)
PostgreSQL:
```
--
(39599 rows)
```
Citus:
```
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!>
```
The problem was an unnecessary Assert on column list in
SetRangeTblExtraData(citus_nodefuncs.c)
This change removes the whitelisting check on the WHERE clauses. Note that, before
this change, citus was already allowing all types of nodes with the following
format (i.e., wrap with a boolean test):
* SELECT col FROM table WHERE (ANY EXPRESSION) is TRUE;
Thus, this change is mostly useful for allowing the expressions in the WHERE clause
directly and avoiding "unsupport clause type" errors.
Prior to this change, it was not possible to use UDFs in repartitioned
subqueries. The reason is that we were setting the search path explicitly
and omiting public schema from that path.
This change adds the public schema to the explicitly set search path.
Fixes issue #258
Prior to this change, Citus gives a deceptive NOTICE message when a query
including ANY or ALL on a non-partition column is issued on a hash
partitioned table.
Let the github_events table be hash-distributed on repo_id column. Then,
issuing this query:
SELECT count(*) FROM github_events WHERE event_id = ANY ('{1,2,3}')
Gives this message:
NOTICE: cannot use shard pruning with ANY (array expression)
HINT: Consider rewriting the expression with OR clauses.
Note that since event_id is not the partition column, shard pruning would
not be applied in any case. However, the NOTICE message would be valid
and be given if the ANY clause would have been applied on repo_id column.
Reviewer: Murat Tuncer
WORKER_LENGTH + 1 is too large. Fixing this has no impact on the string
that is ultimately copied, as it's impossible for the source string to
be any larger to begin with.
The previous form of the test, utilizing DEBUG2, included too much
output dependent on the specifc system and version. Reformulate it to
explicitly connect to workers and show the schema there, when necessary.
The only remaining difference in some of the remaining alternate
regression test files was due to an older minor version release
change. Remove those as well.
There already exist tests that locally embed knowledge about port
numbers, and there's more tests requiring that. Instead of copying
\set's to several tests, make these port number variables available to
all tests.
multi_ExecutorStart() replaces the original planned statement with the
master select statement. As that hasn't gone through the parse analysis
hooks, it'll not have a associated queryId. This prevents extensions
pg_stat_statements to show useful data associated with the query.
I came across several places we weren't as flexible or resilient as we
should have been in our build logic. They include:
* Not using `DESTDIR` in the install-header destination
* Allowing callers to specify `VPATH` or `srcdir` (which breaks)
* Using absolute path for SCRIPTS (9.5 prepends srcdir)
* Including libpq-int in a confusing way (extracted this function)
* Having server includes come first during csql build (client must)
In particular, I hit all of these attempting to build with pg_buildext
in Debian. It passes in an explicit VPATH, as well as srcdir (breaking
all recursive make invocations), and also uses DESTDIR during install.
In addition, a PGDG-enabled Debian box will have the latest libpq-dev
headers (e.g. 9.5) even when building against an older server version
(e.g. 9.4). This leads to problems when including e.g. `c.h`, which
is ambiguous. While compiling more client-side code (csql), we need to
ensure the newer libpq headers are included _first_, so I fixed that.
The default staging policy is now round-robin, though tests were still
configured to use local-first. Testing with the shipping default seems
like the best option, correctness-wise, and since local-first has some
issues with OSes where connecting from localhost doesn't always resolve
to 'localhost', just going with the default is a win-win.
Though Citus' Task struct has a shardId field, it doesn't have the same
semantics as the one previously used in pg_shard code. The analogous
field in the Citus Task is anchorShardId. I've also added an argument
check to the relevant locking function to catch future locking attempts
which pass an invalid argument.
- Flexed the check which prevented append operation cstore tables
since its storage type is not SHARD_STORAGE_TABLE.
- Used process utility function to perform copy operation in
worker_append_table_to shard() instead of directly calling
postgresql DoCopy().
- Removed the additional check in master_create_empty_shard() function.
This check was redundant and erroneous since it was called after
CheckDistributedTable() call.
- Modified WorkerTableSize() function to retrieve cstore table shard
size correctly.
After this change, shards and associated metadata are automatically
dropped when running DROP TABLE on a distributed table, which fixes#230.
It also adds schema support for master_apply_delete_command, which
fixes#73.
Dropping the shards happens in the master_drop_all_shards UDF, which is
called from the SQL_DROP trigger. Inside the trigger, the table is no
longer visible and calling master_apply_delete_command directly wouldn't
work and oid <-> name mappings are not available. The
master_drop_all_shards function therefore takes the relation id, schema
name, and table name as parameters, which can be obtained from
pg_event_trigger_dropped_objects() in the SQL_DROP trigger. If the user
calls master_drop_all_shards while the table still exists, the schema
name and table name are ignored.
Author: Marco Slot
Reviewed-By: Andres Freund
I removed two braces to have this function remain more similar to the
original PostgreSQL function and added uncrustify commands to disable
formatting of its contents.
All citusdb references in
- extension, binary names
- file headers
- all configuration name prefixes
- error/warning messages
- some functions names
- regression tests
are changed to be citus.
The postgres_fdw extension has an extern function with an identical
signature, which can cause problems when both extensions are loaded.
A simple rename can fix this for now (this is the only function with)
such a conflict.
Previously we used, for historical reasons, MessageContext.
That is problematic if a single message from the client
causes a lot of statements to be planned. E.g. for the
copy_to_distributed_table script one insert statement
is planned for each row inserted via COPY, and only freed
when COPY has finished.
This entirely removes any restriction on the type of partitioning
during DML planning and execution. Though there aren't actually any
technical limitations preventing DML commands against append- (or even
range-) partitioned tables, we had initially forbidden this, as any
future stage operation could cause shards to overlap, banning all
subsequent DML operations to partition values contained within more
than one shards. This ended up mostly restricting us, so we're now
removing that restriction.
When two data types have the same binary representation, PostgreSQL may
add an implicit coercion between them by wrapping a node in a relabel
type. This wrapper signals that the wrapped value is completely binary
compatible with the designated "final type" of the relabel node. As an
example, the varchar type is often relabeled to text, since functions
provided for use with text (comparisons, hashes, etc.) are completely
compatible with varchar as well.
The hash-partitioned codepath contains functions that verify queries
actually contain an equality constraint on the partition column, but
those functions expect such constraints to be comparison operations
between a Var and Const. The RelabelType wrapper node causes these
functions to always return false, which bypasses shard pruning.