Commit Graph

1716 Commits (1c930c96a3ca32e8ff082f23a55f9b88fa35b690)

Author SHA1 Message Date
Onder Kalaci 1c930c96a3 Support non-co-located joins between subqueries
With #1804 (and related PRs), Citus gained the ability to
plan subqueries that are not safe to push down.

There are two high-level cases in which subqueries cannot be pushed down:

   * Individual subqueries that require a merge step (e.g., GROUP BY
     on a non-distribution key, or LIMIT in the subquery). We've
     handled such subqueries via #1876.

   * Combinations of subqueries that are not joined on distribution keys.
     This commit aims to recursively plan some of these subqueries to make
     the whole query safe to push down.

The main logic behind non-colocated subquery joins is that we pick
an anchor range table entry (RTE) and check for distribution key equality
with each of the other subqueries in the given query. If we cannot find
distribution key equality between a subquery and the anchor RTE, we
recursively plan that subquery.
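
The anchor check described above can be sketched roughly as follows. This is an illustration only, not the Citus implementation: the function name, the data model (RTE names, a map of distribution keys, and attribute equivalence classes as sets of (rte, column) pairs) and the choice of the first entry as the anchor are all assumptions made for the example.

```python
def subqueries_to_plan_recursively(rtes, dist_keys, equivalence_classes):
    """Return the RTEs whose distribution key is not equal to the anchor's.

    rtes: list of RTE names (hypothetical representation).
    dist_keys: dict mapping each RTE name to its distribution column.
    equivalence_classes: list of sets of (rte, column) pairs that are
        known to be equal to each other through join clauses.
    """
    # Assumption: the first RTE is the anchor. The commit message does not
    # say how the anchor is actually chosen.
    anchor = rtes[0]
    anchor_key = (anchor, dist_keys[anchor])

    to_plan = []
    for rte in rtes[1:]:
        key = (rte, dist_keys[rte])
        # The subquery is joined on the distribution key if some equivalence
        # class contains both its key and the anchor's key.
        joined_on_key = any(
            anchor_key in cls and key in cls for cls in equivalence_classes
        )
        if not joined_on_key:
            to_plan.append(rte)
    return to_plan


example_rtes = ["a", "b", "c"]
example_keys = {"a": "user_id", "b": "user_id", "c": "user_id"}
example_classes = [
    {("a", "user_id"), ("b", "user_id")},  # a and b join on the key
    {("a", "value"), ("c", "user_id")},    # c joins a on a non-key column
]
example_result = subqueries_to_plan_recursively(
    example_rtes, example_keys, example_classes
)
```

In this example only "c" lacks distribution key equality with the anchor, so it is the one that would be planned recursively.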

We also use a hacky solution for picking relations as the anchor range
table entries: we wrap them into a subquery. This is only necessary
because some of the attribute equivalence checks are based on
queries rather than range table entries.
2018-02-26 13:50:37 +02:00
Onder Kalaci 7b57e0562a Add infrastructure for detecting non-colocated subqueries 2018-02-26 13:28:25 +02:00
Onder Kalaci cdb8d429a7 Add regression tests for non-colocated leaf subqueries 2018-02-26 13:28:24 +02:00
Onder Kalaci 4d4648aabd Change single shard mx test tables to reference tables 2018-02-26 13:28:24 +02:00
Onder Kalaci 4d70c86645 Leaf level recursive planning for non colocated subqueries
With this commit, we enable recursive planning for the subqueries
that are not joined on the distribution keys.
2018-02-26 13:28:24 +02:00
Onder Kalaci e998703ff8 Enable restriction eq. checks for top level set operations
We used to support pushdownable set operations only inside a
subquery. However, we can easily extend the restriction
checks to cover top level set operations as well.
2018-02-26 13:28:24 +02:00
Onder Kalaci e8aa532a90 Refactor checks for distribution key equality
Change some function names, ensure we stick to Citus'
function order rules etc.
2018-02-26 13:28:24 +02:00
Marco Slot 846b8b1536
Merge pull request #2023 from citusdata/fix_table_size
Do not use new connection in table size functions
2018-02-26 11:04:54 +01:00
Marco Slot 1e9186a3b5 Do not use new connection in table size functions 2018-02-23 07:07:55 +01:00
Marco Slot e2001a332f
Merge pull request #2015 from MarkusSintonen/jsonb-aggregation
Add support for json(b) aggregation
2018-02-21 14:44:07 +01:00
Markus Sintonen 6202e80d06 Implemented jsonb_agg, json_agg, jsonb_object_agg, json_object_agg 2018-02-18 00:19:18 +02:00
Önder Kalacı 62237c40a7
Merge pull request #2007 from citusdata/ref_where_sublinks_v2
Recursively plan subqueries in WHERE clause when FROM recurs
2018-02-14 10:32:20 +03:00
velioglu 195ac948d2 Recursively plan subqueries in WHERE clause when FROM recurs 2018-02-13 19:52:12 +03:00
Marco Slot 6ce4795f1c
Merge pull request #1996 from citusdata/cache_worker_node_array
Cache worker node array for faster iteration
2018-02-12 15:26:48 -08:00
Marco Slot 0cba4ab588 Refactor worker node hash initialisation 2018-02-12 23:36:43 +01:00
Marco Slot 40d715d494 Cache worker node array for faster iteration 2018-02-12 23:36:43 +01:00
Marco Slot 65fca44f4f
Merge pull request #1979 from citusdata/fix_abort_errors
Handle errors that are discovered during abort
2018-02-12 10:04:00 -08:00
Marco Slot d9c5c4a8f1
Merge pull request #2003 from citusdata/no_plan_copy
Only copy distributed plan when modifying it
2018-02-12 09:34:30 -08:00
Önder Kalacı bf1e492011
Merge pull request #1989 from citusdata/refactor_restriction_logic
Some code refactoring and performance improvements for restriction equivalences
2018-02-12 20:01:35 +03:00
Onder Kalaci 94c5ac6ebb Remove duplicate join restrictions
We use PostgreSQL hooks to accumulate the join restrictions
and PostgreSQL gives us all the join paths it tries while
deciding on the join order. Thus, for queries that have many
joins, this function is likely to remove lots of duplicate join
restrictions. This matters for the performance of the Citus query
pushdown checks.
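
As a rough illustration of the idea, duplicate removal can be sketched like this. The `JoinRestriction` type and the rule that two restrictions are duplicates when they share a join type and the same set of clauses are assumptions for the example; the commit message does not describe how Citus compares restrictions.

```python
from collections import namedtuple

# Hypothetical stand-in for a join restriction collected from a planner hook.
JoinRestriction = namedtuple("JoinRestriction", ["join_type", "clauses"])


def remove_duplicate_join_restrictions(restrictions):
    """Keep the first occurrence of each restriction, preserving order."""
    seen = set()
    unique = []
    for restriction in restrictions:
        # Assumption: clause order does not matter, so compare clause sets.
        key = (restriction.join_type, frozenset(restriction.clauses))
        if key not in seen:
            seen.add(key)
            unique.append(restriction)
    return unique


collected = [
    JoinRestriction("inner", ("a.id = b.id",)),
    JoinRestriction("inner", ("a.id = b.id",)),
    JoinRestriction("inner", ("b.id = c.id", "a.id = b.id")),
    JoinRestriction("inner", ("a.id = b.id", "b.id = c.id")),
]
deduplicated = remove_duplicate_join_restrictions(collected)
```

Because the planner reports a restriction for every join path it considers, the collected list grows with the number of join orders tried, while the deduplicated list stays close to the number of distinct joins.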
2018-02-12 18:35:05 +02:00
Onder Kalaci c228d8ff3d Refactor equivalence generation related codes
This commit changes the APIs for restriction generation to make future
changes simpler.
2018-02-12 18:35:04 +02:00
Onder Kalaci 2f2d350924 Refactor relation restriction related codes
This commit moves some of the functions to a more relevant
source file.
2018-02-12 18:35:04 +02:00
Marco Slot 6e79a34c97 Do not check for cancellation in ClearResultsIfReady 2018-02-12 16:45:02 +01:00
Marco Slot 6051aae56e Handle errors that are discovered during abort 2018-02-12 16:45:02 +01:00
Marco Slot ee6a751798 Only copy distributed plan when modifying it 2018-02-12 16:30:55 +01:00
Jason Petersen e75eb17130
Try new Debian URL 2018-02-07 15:06:37 -07:00
Metin Döşlü 7332244c8c
Merge pull request #1999 from citusdata/citus-7.2.1-changelog-1517919981
Bump citus to 7.2.1
2018-02-06 15:41:44 +03:00
Metin Doslu 238defaee0 Add changelog entry for 7.2.1 2018-02-06 14:27:00 +02:00
Burak Yücesoy cf5d258043
Merge pull request #1993 from citusdata/subquery_pushdown_count_distinct
Fix count distinct using field select on top level query
2018-02-06 15:06:54 +03:00
Murat Tuncer 678223224b Update regression test output expectation based on recent PG10 change 2018-02-06 14:44:55 +03:00
Murat Tuncer 901b543e20 Fix count distinct using field select on top level query
We were allowing count distinct queries even if they were
not directly on columns, as long as the query was grouped on the
distribution column.

When performing these checks we were skipping subqueries
because they also perform this check in a more concise manner.
We relied on the OID SUBQUERY_RELATION_ID (10000) to decide whether
a given RTE relation id denotes a subquery. However, we also
use SUBQUERY_PUSHDOWN_RELATION_ID (10001) for some subqueries.

We skip both types of subqueries with this change.
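
A minimal sketch of the corrected check, assuming a helper that tests a relation id against both placeholder OIDs. The two constants and their values come from the message above; the helper name is hypothetical, and the real code is C rather than Python.

```python
# Placeholder relation ids named in the commit message.
SUBQUERY_RELATION_ID = 10000
SUBQUERY_PUSHDOWN_RELATION_ID = 10001


def is_subquery_relation_id(relation_id):
    """Return True if the relation id denotes either kind of subquery."""
    # Before the fix, only SUBQUERY_RELATION_ID was recognised here.
    return relation_id in (SUBQUERY_RELATION_ID, SUBQUERY_PUSHDOWN_RELATION_ID)
```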
2018-02-06 13:16:10 +03:00
Metin Döşlü aba2f47cdf
Merge pull request #1988 from citusdata/respect_enable_hashagg
Respect enable_hashagg in the master planner
2018-02-05 16:27:05 +03:00
metdos 35f864bcaf Respect enable_hashagg in the master planner 2018-02-05 15:06:00 +02:00
metdos 3d540d961c Fix typo in grouping_is_sortable() 2018-02-05 12:10:19 +02:00
Marco Slot 00f9082cd4
Merge pull request #1965 from citusdata/fast_jsonb_copy
Skip JSON validation on coordinator during COPY
2018-02-04 14:56:56 +01:00
Marco Slot 6f7c3bd73b Skip JSON validation on coordinator during COPY 2018-02-02 15:33:27 +01:00
Brian Cloutier 15511f6ba1 Dynamically allocate connection metadata in WaitForAllConnections 2018-02-01 10:30:41 -08:00
Brian Cloutier e6ebfc1f53 Remove VLA from UpdateNodeLocation 2018-02-01 10:30:41 -08:00
Brian Cloutier a2ed45e206 Remove variable length arrays
VLAs aren't supported by Visual Studio.

- Remove all existing instances of VLAs.
- Add a flag, -Werror=vla, which makes gcc refuse to compile if we add
  VLAs in the future.
2018-02-01 10:30:41 -08:00
Brian Cloutier 2efe80ce55 CheckForDistributedDeadlocks no longer uses a VLA
- variable length arrays (VLAs) do not work with Visual Studio
- fix an off-by-one error. We incorrectly assumed there would always be at
  least as many edges as there were nodes.
- refactor: reduce scope of transactionNodeStack by moving it into the
  function which uses it.
- refactor: break up the distinct uses of currentStackDepth into
  separate variables.
2018-02-01 10:30:41 -08:00
Brian Cloutier 097fd15a89 small refactor, CheckDeadlockForTransactionNode builds its own array 2018-02-01 10:30:41 -08:00
Brian Cloutier 457f570b77 Small refactor, we were using incompatible types 2018-01-31 11:05:59 -08:00
Brian Cloutier b864d014ab
GetNextNodeId() incorrectly called PG_RETURN_DATUM
- Also stabilize the output of a multi_router_planner test
2018-01-29 15:32:36 -08:00
Brian Cloutier 61a6b846b9 Refactor: use a temporary timestamp variable
It's against our coding convention to call functions inside parameter
lists; when single-stepping with a debugger it's difficult to determine
what the function returned.

That wouldn't be a good enough reason to change this code, but while
porting Citus to Windows I ran into this line of code:
assign_distributed_transaction_id was called with a weird timestamp and
I wasn't able to find the problem without first making this change.
2018-01-29 11:20:13 -08:00
Marco Slot 0303dfc463
Merge pull request #1981 from citusdata/faster_execute_subplans
Skip call to ActiveReadableNodeList when there are no subplans
2018-01-29 17:20:44 +01:00
Marco Slot bd0ebac865 Skip call to ActiveReadableNodeList when there are no subplans 2018-01-29 16:05:10 +01:00
Hadi Moshayedi ff26bcd5a5
Include sys/stat.h for S_IRUSR and S_IWUSR. (#1977) 2018-01-26 16:21:48 -05:00
Marco Slot ddbcb9fc25
Merge pull request #1944 from citusdata/base_schedule
Add base schedule for only running specific regression tests
2018-01-25 22:12:34 +01:00
Marco Slot 4762503c34 Add base schedule for only running specific regression tests 2018-01-25 18:51:22 +01:00
Burak Velioglu d43e18f398
Merge pull request #1975 from citusdata/add_remaining_changelog
Adds missing item to 7.2 changelog
2018-01-25 17:18:01 +03:00