Expands count distinct coverage by allowing more cases. We used to support
count distinct only if we can push down distinct aggregate to worker query
i.e. the count distinct clause was on the partition column of the table,
or there was a grouping on the partition column.
Now we can support
- non-partition columns, with or without grouping on partition column
- partition, and non partition column in the same query
- having clause
- single table subqueries
- insert into select queries
- join queries where count distinct is on partition, or non-partition column
- filters on count distinct clauses (extends existing support)
We first try to push down aggregate to worker query (original case), if we
can't then we modify worker query to return distinct columns to coordinator
node. We do that by adding distinct column targets to group by clauses. Then
we perform count distinct operation on the coordinator node.
This work should reduce the cases where HLL is used as it can address anything
that HLL can. However, if we start having performance issues due to very large
number rows, then we can recommend hll use.
This change introduces the `pg_dist_node_metadata` which has a single jsonb value. When creating
the extension, a random server id is generated and stored in there. Everything in the metadata table
is added as a nested objected to the json payload that is sent to the reports server.
The following scenario can cause an Assert() crash if we don't do this:
- Install Citus v7.0-15
- Restart server & run a query to start maintenanced.
- Install Citus v7.1
- Restart server & run a query. This will tell user to upgrade.
- Type "UPDATE EXTENSION c" & press tab. maintenanced will start and crash
with Assert(CitusHasBeenLoaded() && CheckCitusVersion(WARNING));
This change checks Citus version before calling metadata functions so the
crash doesn't happen.
This will provide the full project name (i.e. Citus/Citus Enterprise),
and the host system, compiler, and architecture word size.
I wanted to limit the number of copied files in 'config', so I added
only config.guess and call it manually, rather than using the macro
AC_CANONICAL_HOST, which requires several other files.
PostgreSQL has it, now we do too!
Example: `./configure --with-extra-version=+git.20171011.a1387f4` would
currently result in: `` reporting the more useful:
SHOW citus.version;
citus.version
-------------------------------
7.1devel+git.20171011.a1387f4
Nice to have for packaging, one-off customer builds, etc. This stuff
is generally already in the package metadata, but it will be nice to
have it directly within a psql session.
Eclipse apparently doesn't scan build output looking for -D flags, so
having the value actually appear in a header is nicer for those of us
using IDEs.
Previously <curl/curl.h> was included even if compiled --without-libcurl.
This can fail when libcurl headers are not there. This commit guards this
include by checks for HAVE_LIBCURL.
Adds ```citus.enable_statistics_collection``` GUC variable, which ```true``` by default, unless built without libcurl. If statistics collection is enabled, sends basic usage data to Citus servers every 24 hours.
The data that is collected consists of:
- Citus version
- OS name & release
- Hardware Id
- Number of tables, rounded to next power of 2
- Size of data, rounded to next power of 2
- Number of workers
This commit provides the support for window functions in subquery and insert
into select queries. Note that our support for window functions is still limited
because it must have a partition by clause on the distribution key. This commit
makes changes in the files insert_select_planner and multi_logical_planner. The
required tests are also added with files multi_subquery_window_functions.out
and multi_insert_select_window.out.