Fixes #463
OIDs of user-defined types may differ between the master and worker nodes. This causes errors
when sending data between nodes in binary copy format, because the binary copy format adds the
OID of the element type when it appears in an array. The code adding the OID is in PostgreSQL
itself, therefore we cannot change it. Instead, we decided to use the text format when sending
an array of a user-defined type.
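A minimal sketch of the affected case (type, table, and file names are hypothetical):
```
-- An array of a user-defined type on a hash-distributed table.
CREATE TYPE order_status AS ENUM ('new', 'shipped', 'done');
CREATE TABLE orders (order_id bigint, statuses order_status[]);
SELECT master_create_distributed_table('orders', 'order_id', 'hash');
SELECT master_create_worker_shards('orders', 4, 1);
-- Loading data previously used the binary copy format, which embeds the
-- master's OID for order_status inside each array value; if a worker had
-- assigned a different OID to the type, the copy failed. Arrays of
-- user-defined types are now sent in text format instead.
COPY orders FROM '/tmp/orders.csv' WITH (FORMAT csv);
```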
It turns out some tests exercised this behavior, but removing it should
have no ill effects. Besides, both COPY and INSERT disallow NULLs in a
table's partition column.
Fixes a bug where anti-joins on hash-partitioned distributed tables
would incorrectly prune shards early, resulting in incorrect results (test
included).
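For illustration, an anti-join of the affected shape, assuming two hash-distributed tables named orders and lineitem:
```
-- Find orders with no matching lineitem rows; shard pruning for this shape
-- previously happened too early and dropped shards it should have kept.
SELECT o.o_orderkey
FROM orders o
LEFT JOIN lineitem l ON (l.l_orderkey = o.o_orderkey)
WHERE l.l_orderkey IS NULL;
```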
The upcoming RETURNING support would otherwise require too much
duplication. This contains most of the pieces required for RETURNING
support, except removing the planner checks and adjusting regression
test output.
The old targetlist hasn't been used so far, but the upcoming RETURNING
support relies on it.
This also allows us to get rid of some crufty code in
multi_executor.c:multi_ExecutorStart(), which used the worker query's
targetlist instead of the main statement's (which didn't have one up to
now).
The targetlist contains TargetEntry nodes wrapping expressions, not
expressions directly. That didn't matter so far, but with the upcoming
RETURNING support, the targetlist is inspected to build a TupleDesc.
ExecCleanTypeFromTL hits an assert when looking at something that's not
a TargetEntry.
Mark the entry as resjunk, so it's not actually used.
As postgres's generic .l -> .c Makefile rule uses ifdef - which is
evaluated early, not during rule evaluation - we have to override the
rule, in addition to the detection of FLEX in the previous commit.
Fixes: #439
The only way we re-raise an error is if the raiseError flag is true, so
we might as well purge the connection in that block rather than
independently checking errorLevel.
Fixes #78
With this change, it is possible to append a table in any schema to a shard. The
master_append_table_to_shard function now supports schema-qualified table names.
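An illustrative call, with a hypothetical shard id, schema, and worker node:
```
-- Append the contents of a table in a non-public schema to an existing shard.
SELECT master_append_table_to_shard(102010, 'staging.daily_events', 'worker-node-1', 5432);
```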
For CITUS_RTE_RELATION type fragments, reloading shardIntervals from the
database is rather expensive. So store a pointer to the full shard
interval, instead of just the shard id. There are no new memory lifetime
hazards here, because we already passed around a pointer to the
shardInterval's ->shardId field.
The plan time for the query in issue #607 goes from 2889 ms to 106 ms
with this change.
By far the most expensive part of ShardIntervalsOverlap() is computing
the function to use to determine overlap. Luckily we already have that
computed and cached.
The plan time for the query in issue #607 goes from 8764 ms to 2889 ms
with this change.
- Added 2 more schedules for task-tracker and multi-binary
  instead of running multi_schedule 3 times
- Set task-tracker-delay for each long-running schedule
Fixes #550, fixes #545
If a table name contains special characters, it needs to be escaped. However, in some
cases we escaped the table name before appending the shardId, which caused syntax errors
in the queries sent to worker nodes. With this change we now append the shardId before
escaping table names.
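A hypothetical illustration of the ordering:
```
-- A table whose name needs quoting when referenced in worker queries.
CREATE TABLE "local-events" (key int, value text);
-- Escaping first and appending the shard id afterwards produced fragments like
--   "local-events"_102010   -- rejected by the worker's parser
-- Appending the shard id before escaping now yields
--   "local-events_102010"   -- a valid quoted identifier
```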
This checkin removes variant files we needed
due to differences in outputs of pg94 and pg95 runs.
However, the variant file for the multi_upsert test stays,
since that file tests a feature that does not exist in pg94
and the outputs are drastically different.
Now copies all column references in the count distinct aggregate
to the worker target list and group by. The master target list is
also updated to reflect changes in attribute order.
Fixes #569
Fixes #496
Previously we did not check whether a table was foreign while creating empty
shards, and set the storage type to 't' (standard table) or 'c' (columnar table). Now,
if the table is a foreign table (but not a CStore foreign table) we set its storage
type to 'f' (foreign table). If it is a CStore foreign table, we set its storage
type to 'c', i.e. columnar tables take priority over foreign tables.
Please note that 'c' is only used for CStore tables, not for other possible
columnar stores at the moment. A possible improvement could be checking for other
columnar stores, though I am not sure whether there is a way to do that for all
of them.
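A sketch of the resulting metadata, assuming file_fdw and hypothetical object names:
```
-- A plain (non-CStore) foreign table, distributed by append.
CREATE EXTENSION file_fdw;
CREATE SERVER file_server FOREIGN DATA WRAPPER file_fdw;
CREATE FOREIGN TABLE events_fdw (key int, value text)
    SERVER file_server OPTIONS (filename '/tmp/events.csv', format 'csv');
SELECT master_create_distributed_table('events_fdw', 'key', 'append');
SELECT master_create_empty_shard('events_fdw');
-- The new shard's storage type is now recorded as 'f'; CStore foreign tables
-- keep 'c', and regular tables keep 't'.
SELECT shardid, shardstorage FROM pg_dist_shard
WHERE logicalrelid = 'events_fdw'::regclass;
```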
Since we now short-circuit on certain remote errors, we want to ensure
we preserve the old behavior of not modifying any placement states if
a non-short-circuiting error occurs on all placements.
There's not a ton of documentation about what CONTEXT lines should look
like, but this seems to be the dominant pattern. Similarly, users
should expect strings that are lowercase and do not end with a period.
Fixes #271
This change sets ShardIds and JobIds for each test case. Before this change,
when a new test that somehow incremented Job or Shard IDs was added,
the tests after the new test had to be updated.
ShardID and JobID sequences are set at the beginning of each file with the
following commands:
```
ALTER SEQUENCE pg_catalog.pg_dist_shardid_seq RESTART 290000;
ALTER SEQUENCE pg_catalog.pg_dist_jobid_seq RESTART 290000;
```
ShardIds and JobIds are multiples of 10000. Exceptions are:
- multi_large_shardid: shardid and jobid sequences are set to much larger values
- multi_fdw_large_shardid: same as above
- multi_join_pruning: set differently because it would otherwise cause a race
  condition with multi_hash_pruning, since the two are run in parallel.
Fixes #302
Since our previous syntax did not allow creating hash-partitioned tables,
some of the earlier tests manually changed the partition method to hash to
be able to test it. With this change we remove the unnecessary workaround and
create hash-distributed tables instead. In some tests the metadata was also
created manually; this change fixes that as well.
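For example, instead of distributing a table and then flipping the partition method in the metadata by hand, tests can now create hash-distributed tables directly (table name hypothetical):
```
-- Old workaround (sketch):
--   UPDATE pg_dist_partition SET partmethod = 'h'
--   WHERE logicalrelid = 'events'::regclass;
-- New approach:
SELECT master_create_distributed_table('events', 'key', 'hash');
SELECT master_create_worker_shards('events', 16, 2);
```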
Fixes #475
With this change we prevent adding the ONLY clause to queries prepared for
worker nodes. When we add the ONLY clause, we may miss inherited tables that
users created manually on the worker nodes.
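For illustration, assume a shard relation events_102010 on a worker that a user has extended with an inheritance child:
```
-- Run on the worker by the user:
CREATE TABLE events_102010_archive () INHERITS (events_102010);
-- Queries Citus sends for that shard no longer read
--   SELECT ... FROM ONLY events_102010
-- but
--   SELECT ... FROM events_102010
-- so rows from events_102010_archive are included as well.
```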
When executing queries with citus.task_executor = 'real-time', query
execution could, so far, spend a significant amount of time
sleeping. That's because we were
a) sleeping after several phases of query execution, even if we're not
waiting for network IO
b) sleeping for a fixed amount of time when waiting for network IO;
often a lot longer than actually required.
Just reducing the amount of time slept isn't a real solution, because
that just increases CPU usage.
Instead have the real-time executor's ManageTaskExecution return whether
a task is currently being processed, waiting for reads or writes, or
failed. When all tasks are waiting for IO, use poll() to wait for IO
readiness.
That requires slightly redefining how connection timeouts are handled:
before, we counted the number of times ManageTaskExecution() was called
and compared that with the timeout divided by the task check
interval. If processing of tasks took a while, that could significantly
increase the time until a timeout occurred. Because it was based on
ManageTaskExecution() being called at a constant interval, this approach
isn't feasible anymore. Instead, measure the actual time since
connection establishment was started. That could in theory, if task
processing takes a very long time, lead to only a few passes over
PQconnectPoll().
The problem of sleeping too much also exists for the 'task-tracker'
executor, but is generally less problematic there, as processing the
individual tasks usually will take longer. That said, for e.g. the
regression tests it'd be helpful to use a similar approach.
Single table repartition subqueries now support count(distinct column)
and count(distinct (case when ...)) expressions. The repartition query
extracts the columns used in the aggregate expression and adds them to its
target list and group by list; the master query stays the same (count(distinct ...)),
but the attribute numbers inside the aggregate expression are modified to
reflect the changes in the repartition query.
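Illustrative queries of the newly supported shapes, assuming the standard lineitem test table and a plan that goes through repartitioning:
```
-- count(distinct column)
SELECT l_partkey, count(DISTINCT l_orderkey)
FROM lineitem
GROUP BY l_partkey;

-- count(distinct (case when ...))
SELECT l_partkey,
       count(DISTINCT CASE WHEN l_quantity > 10 THEN l_orderkey END)
FROM lineitem
GROUP BY l_partkey;
```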
Now, master_create_empty_shard() will place shards according to the
value of citus.shard_placement_policy; this also makes the default
placement round-robin instead of random.
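An illustrative use, assuming an existing append-distributed table named events:
```
-- Placements for the new shard follow the placement policy; the default is
-- now round-robin rather than random.
SET citus.shard_placement_policy TO 'round-robin';
SELECT master_create_empty_shard('events');
```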
Fixes #10
This change creates a new UDF: master_modify_multiple_shards
Parameters:
modify_query: A simple DELETE or UPDATE query as a string.
The UDF is similar to the existing master_apply_delete_command UDF.
Basically, given the modify query, it prunes the shard list, reconstructs
the query for each shard, and sends the query to the placements.
Depending on the value of citus.multi_shard_commit_protocol, the commit
can be done in a one-phase or two-phase manner.
Limitations:
* It cannot be called inside a transaction block
* It can only be called with simple operator expressions (like Single Shard Modify)
Sample Usage:
```
SELECT master_modify_multiple_shards(
'DELETE FROM customer_delete_protocol WHERE c_custkey > 500 AND c_custkey < 600');
```
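For example, to have the UDF commit with two-phase commit (using the same sample table):
```
-- Use 2PC across the modified shard placements for this call.
SET citus.multi_shard_commit_protocol TO '2pc';
SELECT master_modify_multiple_shards(
'UPDATE customer_delete_protocol SET c_acctbal = 0 WHERE c_custkey <= 500');
```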