Commit Graph

6 Commits (07cca85227f8f162cb03ecfff8d9626f23a9af93)

Author SHA1 Message Date
Onder Kalaci 12e50d96dc Fix tests where hll is not installed 2018-05-11 16:01:47 +03:00
Murat Tuncer e16805215d
Support count(distinct) for non-partition columns (#1692)
Expands count distinct coverage by allowing more cases. We used to support
count distinct only if we can push down distinct aggregate to worker query
i.e. the count distinct clause was on the partition column of the table,
or there was a grouping on the partition column.

Now we can support
- non-partition columns, with or without grouping on partition column
- partition, and non partition column in the same query
- having clause
- single table subqueries
- insert into select queries
- join queries where count distinct is on partition, or non-partition column
- filters on count distinct clauses (extends existing support)

We first try to push down aggregate to worker query (original case), if we
can't then we modify worker query to return distinct columns to coordinator
node. We do that by adding distinct column targets to group by clauses. Then
we perform count distinct operation on the coordinator node.

This work should reduce the cases where HLL is used as it can address anything
that HLL can. However, if we start having performance issues due to very large
number rows, then we can recommend hll use.
2017-10-30 13:12:24 +02:00
Jason Petersen d6cccee5bc
Remove ALTER SEQUENCE from parallel groups
Removing these has no side effect, and in the (current) PostgreSQL 10,
an ERROR is printed during concurrent sequence modification.
2017-05-16 11:05:34 -06:00
Marco Slot f838c83809 Remove redundant pg_dist_jobid_seq restarts in tests 2017-04-18 11:42:32 +02:00
Murat Tuncer 45762006f3 Add support for filters
Ensures filter clauses are stripped from master query, and pushed
down to worker queries.
2016-12-01 08:53:46 +03:00
Jason Petersen bcfc58a7c7
Fix tests and tell Travis to run them all
Two sets of tests are fixed by this change:
  * multi_agg_approximate_distinct
  * those in multi_task_tracker_extra_schedule

The first broke when we renamed stage to load in many files and was
never being run because the HyperLogLog extension wasn't easily
available in Debian. Now it's in our repo, so we install it and run
the test. I removed the distinct HLL target in favor of just always
running it and providing an output variant to handle when the extension
is absent. Basically, if PostgreSQL thinks HLL is available, the test
installs it and runs normally, otherwise the absent variant is used.

The second broke when I removed a test variant, erroneously believing
it to be related to an older Citus version. I've added a line in that
test to clarify why the variant is necessary (a practice we should
widely adopt).
2016-10-07 17:32:54 -06:00