Commit Graph

4012 Commits (remove_join_rest)

Author SHA1 Message Date
Onder Kalaci 4ef6b6868a remove join restructions 2020-10-21 11:05:54 +02:00
Onder Kalaci d7e1cf9e7b remove join restructions 2020-10-21 11:02:18 +02:00
Onder Kalaci 5c4c9304ba Remove RemoveDuplicateJoinRestrictions() function
RemoveDuplicateJoinRestrictions() function was introduced with the aim of decrasing the overall planning times by eliminating the duplicate JOIN restriction entries (#1989). However, it turns out that the function itself is so CPU intensive with a very high algorithmic complexity, it hurts a lot more than it helps. The function is a clear example of premature optimization.

The table below shows the difference clearly:

"distributed query planning
 time master"	RemoveDuplicateJoinRestrictions() execution time on master	"Remove the function RemoveDuplicateJoinRestrictions()
this PR"
5 table INNER JOIN	9 msec	2msec	7 msec
10 table INNER JOIN	227 msec	194 msec	29  msec
20 table INNER JOIN	1 sec 235 msec	1  sec 139  msec	90 msecs
50 table INNER JOIN	24 seconds	21 seconds	1.5 seconds
100 table INNER JOIN	2 minutes 16 secods	1 minute 53 seconds	23 seconds
250 table INNER JOIN	Bottleneck on JoinClauseList	18 minutes 52 seconds	Bottleneck on JoinClauseList

5 table INNER JOIN in subquery	9 msec	0 msec	6 msec
10 table INNER JOIN subquery	33 msec	10 msec	32 msec
20 table INNER JOIN subquery	132 msec	67 msec	123 msec
50 table INNER JOIN subquery	1.2  seconds	900 msec	500 msec
100 table INNER JOIN subquery	6 seconds	5  seconds	2 seconds
250 table INNER JOIN subquery	54 seconds	37 seconds	20  seconds

5 table LEFT JOIN	5 msec	0 msec	5 msec
10 table LEFT JOIN	11 msec	0 msec	13 msec
20 table LEFT JOIN	26 msec	2 msec	30 msec
50 table LEFT JOIN	150 msec	15 msec	193 msec
100 table LEFT JOIN	757 msec	71 msec	722 msec
250 table LEFT JOIN	8 seconds	600 msec	8 seconds

5 JOINs among 2 table JOINs 	37 msec	11 msec	25 msec
10 JOINs among 2 table JOINs 	536 msec	306 msec	352 msec
20 JOINs among 2 table JOINs 	794 msec	181 msec	640 msec
50 JOINs among 2 table JOINs 	25 seconds	2 seconds	22 seconds
100 JOINs among 2 table JOINs 	Bottleneck on JoinClauseList	9 seconds	Bottleneck on JoinClauseList
150 JOINs among 2 table JOINs 	Bottleneck on JoinClauseList	46 seconds	Bottleneck on JoinClauseList

On top of the performance penalty, the function had a critical bug #4255, and with #4254 we hit one more important bug. It should be fixed by adding the followig check to the ContextCoversJoinRestriction():
```
static bool
JoinRelIdsSame(JoinRestriction *leftRestriction, JoinRestriction *rightRestriction)
{
	Relids leftInnerRelIds = leftRestriction->innerrel->relids;
	Relids rightInnerRelIds = rightRestriction->innerrel->relids;
	if (!bms_equal(leftInnerRelIds, rightInnerRelIds))
	{
		return false;
	}

	Relids leftOuterRelIds = leftRestriction->outerrel->relids;
	Relids rightOuterRelIds = rightRestriction->outerrel->relids;
	if (!bms_equal(leftOuterRelIds, rightOuterRelIds))
	{
		return false;
	}

	return true;
}
```

However, adding this eliminates all the benefits tha RemoveDuplicateJoinRestrictions() brings.

I've used the commands here to generate the JOINs mentioned in the PR: https://gist.github.com/onderkalaci/fe8654f9df5916c7af4c7c5eb892561e#file-gistfile1-txt

Inner and outer JOINs behave roughly the same, to simplify the table only added INNER joins.
2020-10-21 10:29:39 +02:00
Onur Tirtir 790beea59f
Add intermediate result tests with unsupported outer joins (#4262) 2020-10-20 12:11:18 +03:00
SaitTalhaNisanci 0f209377c4
Fix incorrect join related fields (#4242)
* Fix incorrect join related fields

Ruleutils expect to give the original index of join columns hence we
should consider the dropped columns while setting the fields in
SetJoinRelatedFieldsCompat.

* add some more tests for joins

* Move tests to join.sql and create a utility function
2020-10-19 18:28:39 +03:00
Onur Tirtir c49077d594
Disallow outer joins `ON TRUE` with ref & dist tables when ref table is outer relation (#4255)
Disallow `ON TRUE` outer joins with reference & distributed tables
when reference table is outer relation by fixing the logic bug made
when calling `LeftListIsSubset` function.

Also, be more defensive when removing duplicate join restrictions
when join clause is empty for non-inner joins as they might still
contain useful information for non-inner joins.
2020-10-19 16:58:11 +03:00
Onur Tirtir 6e493624af
Merge pull request #4103 from citusdata/remove-unused-functions
Remove unused functions that cppcheck found
2020-10-19 14:58:29 +03:00
Onur Tirtir f80f4839ad Remove unused functions that cppcheck found 2020-10-19 13:50:52 +03:00
Önder Kalacı 25e43a4aa6
Merge pull request #4253 from citusdata/improve_perf_for_queries
Improve the relation restriction counters
2020-10-19 09:22:03 +02:00
Onder Kalaci bbedfca761 Improve the relation restriction counters
It seems like Postgres could call set_rel_pathlist() for
the same relation multiple times. This breaks the logic
where we assume relationCount eqauls to the number of
entries in relationRestrictionList.

In summary, relationRestrictionList may contain duplicate
entries.
2020-10-19 08:51:16 +02:00
Hadi Moshayedi 4708dc04f1
Merge pull request #4257 from citusdata/tableam_hadi
Set explicit transfer_mode in tableam tests
2020-10-16 12:54:52 -07:00
Hadi Moshayedi 663549db33 Set explicit transfer_mode in tableam tests 2020-10-16 12:40:37 -07:00
Hadi Moshayedi db96b9f861
Merge pull request #4250 from citusdata/tableam_hadi
Support "CREATE TABLE ... USING table_access_method" for distributed tables
2020-10-16 12:18:37 -07:00
Nils Dijk caabbf4b84 Table access method support for distributed tables 2020-10-16 12:02:25 -07:00
Onur Tirtir 7cb07c70fa
Move hasSemiJoin to JoinRestrictionContext (#4256) 2020-10-16 18:37:39 +03:00
Marco Slot 3261fc7eef
Merge pull request #4251 from citusdata/fix/ref-view-mod
Support view in reference table modification
2020-10-16 11:37:41 +02:00
Marco Slot 8976f245ab Support reference table view in reference table modification 2020-10-16 11:31:24 +02:00
Onur Tirtir de6f2d3f42
Refactor JoinRestrictionListExistsInContext to improve readability (#4249) 2020-10-16 12:24:56 +03:00
Önder Kalacı 212adfb26f
Merge pull request #4245 from citusdata/add_single_node_tests_more
Add more regression test for single node Citus
2020-10-15 17:49:03 +02:00
Onder Kalaci 596f7bf4a9 Add more regression test for single node Citus
Tests on commands with SCHEMA.
2020-10-15 17:32:32 +02:00
Önder Kalacı 3e5a92d33b
Merge pull request #4236 from citusdata/fix_intermediate_size
Local execution enforces citus.max_intermediate_result_size
2020-10-15 17:26:51 +02:00
Onder Kalaci fe3caf3bc8 Local execution considers intermediate result size limit
With this commit, we make sure that local execution adds the
intermediate result size as the distributed execution adds. Plus,
it enforces the citus.max_intermediate_result_size value.
2020-10-15 17:18:55 +02:00
Marco Slot ded4561661
Merge pull request #4246 from citusdata/fix/table-exists
Check table existence in EnsureRelationKindSupported
2020-10-15 17:18:05 +02:00
Marco Slot 31858c8a29 Check table existence in EnsureRelationKindSupported 2020-10-15 17:05:06 +02:00
SaitTalhaNisanci b5a3526c07
Merge pull request #4233 from citusdata/introduce_get_local_execution_status
Introduce GetCurrentLocalExecutionStatus wrapper
2020-10-15 15:55:46 +03:00
Sait Talha Nisanci ecde6c6eef Introduce GetCurrentLocalExecutionStatus wrapper
We should not access CurrentLocalExecutionStatus directly because that
would mean that we could also set it directly, which we shouldn't
because we have checks to see if the new state is possible, otherwise we
error.
2020-10-15 15:38:19 +03:00
Marco Slot 619b8b7654
Merge pull request #4247 from citusdata/fix/idempotent-upgrade 2020-10-15 14:08:46 +02:00
Simon Kelly 4f94e544b7 create 9.5-1 udfs and update citus--9.4-1--9.5-1.sql 2020-10-15 13:50:36 +02:00
Simon Kelly 2a6c867cb0 Make citus_prepare_pg_upgrade idempotent
https://github.com/citusdata/citus/issues/3527
2020-10-15 13:49:50 +02:00
Önder Kalacı 291154665f
Merge pull request #3597 from citusdata/refactor_outer_join_tests
Refactor outer join checks
2020-10-14 15:23:55 +02:00
Onder Kalaci 15e724c073 Add regression tests for outer/cross JOINs 2020-10-14 15:17:30 +02:00
Onder Kalaci de33079065 Improve outer join checks
Before this commit, the logic was:
    - As long as the outer side of the JOIN is not a JOIN (e.g., relation
      or subquery etc.), we check for the existence of any recurring
      tuples. There were two implications of this decision.

      First, even if a subquery which is on the outer side contains
      distributed table JOIN reference table, Citus would unnecessarily throw
      an error. Note that, the JOIN inside the subquery would already
      be going to be tested recursively. But, as long as that check
      passes, there is no reason for the upper JOIN to fail. An example, which
      used to fail and now works:

	SELECT * FROM (SELECT * FROM dist JOIN ref) as foo LEFT JOIN dist;

      Second, certain JOINs, especially with ON (true) conditions were not
      represented as Citus expects the JOINs to be in the format
      DeferredErrorIfUnsupportedRecurringTuplesJoin().
2020-10-14 15:17:30 +02:00
Onur Tirtir 1a28858c47
Disallow field indirection in INSERT/UPDATE queries (#4241) 2020-10-14 14:11:59 +03:00
Onur Tirtir 8efca3b60a
Fix a crash with inserting domain composite types in coord. evaluation (#4231)
Use short lived per-tuple context in citus_evaluate_expr like
(pg) evaluate_expr does.

We should not use planState->ExprContext when evaluating expressions
as it might lead to freeing the same executor twice (first one happens
in citus_evaluate_expr itself and the other one happens when postgres
doing clean-up for the top level executor state), which in turn might
cause seg.faults.

However, now as we don't have necessary planState info to evaluate
prepared statements, we also add planState->es_param_list_info to
per-tuple ExprContext.
2020-10-13 14:19:59 +03:00
Halil Ozan Akgül df185179c3
Merge pull request #4201 from citusdata/support-with-ties
Adds support for WITH TIES option
2020-10-12 19:44:48 +03:00
Halil Ozan Akgul e2736c25bd Adds support for WITH TIES option 2020-10-12 19:34:18 +03:00
Önder Kalacı 93764a3782
Merge pull request #4230 from citusdata/do_not_copy
Do not copy bit map set unnecessarily
2020-10-09 18:29:26 +02:00
Onder Kalaci e29aa51a87 Do not copy bms 2020-10-09 16:41:36 +02:00
SaitTalhaNisanci 0919d90cf8
Merge pull request #4229 from citusdata/use_pg13.0
Use pg 13.0 in tests
2020-10-09 13:53:54 +03:00
Sait Talha Nisanci b27ed05f1a Use pg 13.0 in tests 2020-10-09 11:51:14 +03:00
SaitTalhaNisanci 58d7c1613a
Merge pull request #4221 from citusdata/fix/vacuum_stuck
Commit transaction for VACUUM on shell table
2020-10-09 11:50:31 +03:00
SaitTalhaNisanci c7ceabc44a
Merge branch 'master' into fix/vacuum_stuck 2020-10-09 11:32:39 +03:00
Jelte Fennema d57bbfd3f9
Add uuid-dev to Ubuntu deps in CONTRIBUTING (#4218)
This is needed to compile postgres with --with-uuid=e2fs.
2020-10-09 10:27:47 +02:00
Sait Talha Nisanci dc40758355 Return early if there is no citus table in VACUUM 2020-10-09 11:10:00 +03:00
Sait Talha Nisanci 99bb79745a Commit transaction for VACUUM on shell table
With postgres 13, there is a global lock that prevents multiple VACUUMs
happening in the current database. This global lock is taken for a short
time but this creates a problem because of the following:

- We execute the VACUUM for the shell table through the standard process
utility. In this step the global lock is taken for the current database.
- If the current node has shard placements then it tries to execute
VACUUM over a connection to localhost with ExecuteUtilityTaskList.
- the VACUUM on shard placements cannot proceed because it is waiting
for the global lock for the current database to be released.
- The acquired lock from the VACUUM for shell table will not be released
until the transaction is committed.
- So there is a deadlock.

As a solution, we commit the current transaction in case of VACUUM after
the VACUUM is executed for the shell table. Executing the VACUUM on a
shell table is not important because the data there will probably be
truncated. PostprocessVacuumStmt takes the necessary locks on the shell
table so we don't need to take any extra locks after we commit the
current transaction.
2020-10-09 10:57:44 +03:00
Marco Slot fd40605745
Merge pull request #4222 from citusdata/fix/multiple-maintenanced 2020-10-08 16:45:39 +02:00
Marco Slot 881e5df780 Fix a bug that could lead to multiple maintenance daemons 2020-10-08 16:18:14 +02:00
Marco Slot 18219843d0 Add maintenance daemon error tests 2020-10-08 16:17:33 +02:00
Marco Slot 2e02a30e37
Merge pull request #4226 from snopoke/patch-2 2020-10-08 13:59:48 +02:00
Simon Kelly 03e007e4eb
Merge branch 'master' into patch-2 2020-10-08 13:00:11 +02:00