The deparser function set_relation_column_names() knows that it needs to
re-evaluate column names based on the relation's tuple descriptor when
the RTE belongs to a relation (RTE_RELATION).
However, before this commit, it did not know that Citus
might wrap such an RTE with an RTE that points to the
citus_extradata_container() placeholder.
Because of this, it simply took the column aliases
(e.g., "bar" in "foo AS bar") into account, which could result in
an incorrectly deparsed query, as in the case below:
* Say we have a view based on the following query:
```sql
SELECT a FROM table;
```
* And if we rename column "a" to "b", the view query normally becomes:
```sql
SELECT b AS a FROM table;
```
* Before this commit, deparsing a query based on that view produced
the following query, because deparsing relied on the column aliases.
This is incorrect, since column "a" no longer exists in the table:
```sql
SELECT a FROM table;
```
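The difference between naming columns from the stored aliases and naming them from the tuple descriptor can be illustrated with a small Python sketch. This is not Citus code: `deparse_select`, `catalog_columns`, and `rte_aliases` are hypothetical names, and the model only covers a toy single-table SELECT list.

```python
def deparse_select(table, catalog_columns, rte_aliases, use_catalog):
    """Deparse a toy single-table SELECT list.

    catalog_columns: current column names, as the relation's tuple
                     descriptor would report them.
    rte_aliases:     column aliases stored with the range table entry when
                     the view was created (stale after a column rename).
    use_catalog:     True models the fixed behaviour, False the old one.
    """
    targets = []
    for current, alias in zip(catalog_columns, rte_aliases):
        source = current if use_catalog else alias
        # Emit "col AS alias" only when the output name differs from the source.
        targets.append(source if source == alias else f"{source} AS {alias}")
    return f"SELECT {', '.join(targets)} FROM {table};"


# Column "a" was renamed to "b" after the view was created.
print(deparse_select("table", ["b"], ["a"], use_catalog=True))
print(deparse_select("table", ["b"], ["a"], use_catalog=False))
```

With `use_catalog=False` the sketch reproduces the stale column name from the bug report; with `use_catalog=True` it yields the `b AS a` form.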
Fixes #5932.
DESCRIPTION: Fixes a bug that might cause queries on views to fail
when the underlying tables have renamed columns
* Adjust configure script to allow PG15
* Adds copy of ruleutils_14.c as ruleutils_15.c
* Uses get_namespace_name_or_temp in ruleutils_15.c
Relevant PG commit:
48c5c9068211e0a04fd9553c8714b2821ed3ad17
* Clean up code using "(expr) ? true : false" in ruleutils_15.c
Relevant PG commit:
fd0625c7a9c679c0c1e896014b8f49a489c3a245
* Change varno from Index (unsigned int) to int in ruleutils_15.c
Relevant PG commit:
e3ec3c00d85bd2844ffddee83df2bd67c4f8297f
* Adds find_recursive_union to ruleutils_15.c
Relevant PG commit:
3f50b82639637c9908afa2087de7588450aa866b
* Fix display of SQL-std func's args in INSERT/SELECT in ruleutils_15.c
Relevant PG commit:
a8d8445a7b2f80f6d0bfe97b19f90bd2cbef8759
* Fix ruleutils_15.c's dumping of whole-row Vars in more contexts
Relevant PG commit:
43c2175121c829c8591fc5117b725f1f22bfb670
* Fix assorted missing logic for GroupingFunc nodes in ruleutils_15.c
Relevant PG commit:
2591ee8ec44d8cbc8e1226550337a64c684746e4
* Adds grammar support for SQL/JSON clauses in ruleutils_15.c
Relevant PG commit:
f79b803dcc98d707450e158db3638dc67ff8380b
* Adds SQL/JSON constructors to ruleutils_15.c
Relevant PG commits:
f4fb45d15c59d7add2e1b81a9d477d0119a9691a
cc7401d5ca498a84d9b47fd2e01cebd8e830e558
* Adds support for MERGE in ruleutils_15.c
Relevant PG commit:
7103ebb7aae8ab8076b7e85f335ceb8fe799097c
* Add IS JSON predicate to ruleutils_15.c
Relevant PG commit:
33a377608fc29cdd1f6b63be561eab0aee5c81f0
* Add SQL/JSON query functions to ruleutils_15.c
Relevant PG commit:
1a36bc9dba8eae90963a586d37b6457b32b2fed4
* Adds three different SQL/JSON values to ruleutils_15.c
Relevant PG commits:
606948b058dc16bce494270eea577011a602810e
49082c2cc3d8167cca70cfe697afb064710828ca
* Adds JSON table functions in ruleutils_15.c
Relevant PG commit:
4e34747c88a03ede6e9d731727815e37273d4bc9
* Add PLAN function for JSON table in ruleutils_15.c
Relevant PG commit:
fadb48b00e02ccfd152baa80942de30205ab3c4f
* Remove extra blank lines before block-closing braces ruleutils_15.c
Relevant PG commit:
24d2b2680a8d0e01b30ce8a41c4eb3b47aca5031
* set_deparse_plan: Reuse variable to appease Coverity ruleutils_15.c
Relevant PG commit:
e70813fbc4aaca35ec012d5a426706bd54e4acab
* Mechanical code beautification ruleutils_15.c
Relevant PG commit:
23e7b38bfe396f919fdb66057174d29e17086418
* Rename value_type to item_type in ruleutils_15.c
Relevant PG commit:
3ab9a63cb638a1fd99475668e2da9c237495aeda
* Show 'AS "?column?"' explicitly when it's important in ruleutils_15.c
Relevant PG commit:
c7461fc25558832dd347a9c8150b0f1ed85e36e8
* Fix ruleutils_15.c issues with dropped cols in funcs-returning-composite
Relevant PG commit:
c1d1e8469c77ce6b8e5310955580b4a3eee7fe96
* Change comment regarding functions returning composite in ruleutils_15.c
Relevant PG commit:
c2fa113ddb1117b1f03e91960f65d5d7d8a90270
* Replace int nodes with bool nodes where needed
In PG15, Boolean nodes are added. Pre PG15, internal Boolean values
in Create Role commands were represented by Integer nodes. This
commit replaces int nodes logic with bool nodes logic where needed.
Mostly there are CREATE ROLE logic changes.
Relevant PG commit:
941460fcf731a32e6a90691508d5cfa3d1f8eeaf
* Handle new option colliculocale in CREATE COLLATION logic
In PG15, there is an added option to use ICU as global locale provider.
pg_collation has three locale-related fields: collcollate and collctype,
which are libc-related fields, and a new one colliculocale, which is the
ICU-related field. Only the libc-related fields or the ICU-related field
is set, never both.
Relevant PG commits:
f2553d43060edb210b36c63187d52a632448e1d2
54637508f87bd5f07fb9406bac6b08240283be3b
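The "either libc or ICU, never both" rule can be sketched in Python as follows. The function name `collation_locale_options` is hypothetical, the output only loosely mirrors CREATE COLLATION options, and locale strings are not escaped; it is an illustration of the branching, not the Citus implementation.

```python
def collation_locale_options(collcollate, collctype, colliculocale):
    """Build a CREATE COLLATION option string from pg_collation-like fields.

    None stands for a NULL catalog value.
    """
    if colliculocale is not None:
        # ICU collation: the libc fields must be unset.
        if collcollate is not None or collctype is not None:
            raise ValueError("libc and ICU locale fields are both set")
        return f"provider = icu, locale = '{colliculocale}'"

    # libc collation: both libc fields must be present.
    if collcollate is None or collctype is None:
        raise ValueError("libc collation needs collcollate and collctype")
    if collcollate == collctype:
        return f"provider = libc, locale = '{collcollate}'"
    return (f"provider = libc, lc_collate = '{collcollate}', "
            f"lc_ctype = '{collctype}'")
```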
* Add PG15 tests to CI using test images that have 15beta2 (#6093)
* Change warning message in pg_signal_backend()
Relevant PG commit:
7fa945b857cc1b2964799411f1633468826861ff
* Revert "Add missing ifdef for PG 15"
This reverts commit c7b51025ab.
* Fixes tests for ALTER TRIGGER RENAME consistency for part. tables
Relevant PG commit:
80ba4bb383538a2ee846fece6a7b8da9518b6866
* Prevent creating child triggers on partitions when adding new node
Pre PG15, tgisinternal is true for a "child" trigger on a partition
cloned from the trigger on the parent.
In PG15, tgisinternal is false in that case. However, we don't want to
create this trigger on the partition since it will create a conflict
when we try to attach the partition to the parent table:
ERROR: trigger "..." for relation "{partition_name}" already exists
Relevant PG commit:
f4566345cf40b068368cb5617e61318da60676ec
* Fix tests for generated columns dependency changes
In PG15, for GENERATED columns, all dependencies of the generation
expression are recorded as NORMAL dependencies of the column itself.
This requires CASCADE to drop generated columns with the original column.
Pre PG15, dependencies were recorded as AUTO, so
generated columns were silently dropped with the original column.
Relevant PG commit:
cb02fcb4c95bae08adaca1202c2081cfc81a28b5
* Explicitly cast catalog "char" column to text before concatenation
Relevant PG commit:
07eee5a0dc642d26f44d65c4e6263304208e8583
* Remove 'AS "?column?"' from test outputs
There were some instances in the following test outputs,
in planning debug output, where AS "?column?" is added.
We add a normalization rule to remove it, as it is not
important.
cte_inline.out
recursive_relation_planning_restriction_pushdown.out
Relevant PG commit:
c7461fc25558832dd347a9c8150b0f1ed85e36e8
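A normalization rule of this kind could look roughly like the Python sketch below. The actual rule lives in the test normalization scripts and may differ; `normalize_line` and the exact pattern are assumptions made for illustration.

```python
import re

# Hypothetical rule: drop the explicit alias that PG15 prints for
# unnamed target entries.
QUESTION_COLUMN_ALIAS = re.compile(r' AS "\?column\?"')


def normalize_line(line):
    """Remove every ' AS "?column?"' occurrence from one line of test output."""
    return QUESTION_COLUMN_ALIAS.sub("", line)
```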
* Use pg_backup_stop(PG15) instead of pg_stop_backup(PG<15)
Add an alternative test output because of the change in the
backup modes of Postgres. Specifically here, there is a renaming
issue: pg_stop_backup PRE PG15 vs pg_backup_stop PG15+
The alternative output can be deleted when we drop support for PG14
Relevant PG commit:
39969e2a1e4d7f5a37f3ef37d53bbfe171e7d77a
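The version split can be summarized with a tiny Python sketch; `backup_stop_function` is a hypothetical helper used only to make the rename explicit.

```python
def backup_stop_function(pg_major_version):
    """Return the name of the backup-stop function for a PostgreSQL major version."""
    # The function was renamed in PG15; older versions keep the old name.
    return "pg_backup_stop" if pg_major_version >= 15 else "pg_stop_backup"
```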
* Adds citus.mitmfifo GUC
Previously, we were setting this configuration parameter
on the fly for the failure test schedule.
However, PG15 doesn't allow that anymore: reserved prefixes
like "citus" cannot be used to set non-existing GUCs.
Relevant PG commit:
88103567cb8fa5be46dc9fac3e3b8774951a2be7
* Handles EXPLAIN output diffs in PG15 - Extra result lines
To handle extra "Result" lines in explain outputs, we add explain
method to multi_test_helpers.sql file
- plan_without_result_lines() is added for cases where we want the
whole explain output with only "Result" lines removed
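The filtering idea can be sketched in Python. The real helper is a function in multi_test_helpers.sql; this version only mirrors the behaviour described above, and the substring match on "Result" is an assumption about how lines are selected.

```python
def plan_without_result_lines(plan_lines):
    """Return the EXPLAIN output lines with every "Result" line removed."""
    return [line for line in plan_lines if "Result" not in line]


plan = [
    "Aggregate",
    "  ->  Result",
    "        ->  Seq Scan on t",
]
print(plan_without_result_lines(plan))
```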
* Handles EXPLAIN output diffs in PG15, Hash Agg/Join leverage
To handle differences in usage of GroupAggregate vs HashAggregate
or Merge Join vs Hash join in cases where this detail doesn't
seem to matter, we use coordinator_plan().
- coordinator_plan() is updated to remove "Result" lines
There are some cases where we have subplans so we add a new
function that prints all Task Count lines as well
- coordinator_plan_with_subplans()
Still not sure of the relevant PG commit
Could be db0d67db2401eb6238ccc04c6407a4fd4f985832
but disabling enable_group_by_reordering didn't help.
* Handles EXPLAIN output diffs in PG15: enable_group_by_reordering
Relevant PG commit
db0d67db2401eb6238ccc04c6407a4fd4f985832
* Normalizes Memory Usage, Buckets, Batches for PG15 explain diffs
We create a new function in multi_test_helpers, which is similar
to explain_merge function in PG15. This explain helper function
normalizes Memory Usage, Buckets and Batches, and we use it in the
tests which give a different output for PG15.
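A rough Python equivalent of such a normalization is shown below. The placeholder text `xxx` and the name `normalize_hash_stats` are assumptions; the actual helper is written in SQL and may mask the values differently.

```python
import re


def normalize_hash_stats(line):
    """Mask version-dependent hash statistics in one EXPLAIN output line."""
    line = re.sub(r"Memory Usage: \d+kB", "Memory Usage: xxx", line)
    line = re.sub(r"Buckets: \d+", "Buckets: xxx", line)
    line = re.sub(r"Batches: \d+", "Batches: xxx", line)
    return line
```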
* Bump test images to 15beta3 (#6172)
* Omit namespace in post-copy errmsg
Relevant PG commit:
069d33d0c5a021601245e44df77a0423ddd69359
* Handles EXPLAIN output diffs in PG15: extra arrows&result lines
To handle extra "->" arrows resulting from extra Result lines
in explain outputs, we add the following explain method to
multi_test_helpers.sql file
- plan_without_arrows() is added for cases where we want the
whole explain output without arrows and without Result lines
* Alters public schema's owner to pg_database_owner in PG15
In PG15, public schema is owned by pg_database_owner role.
In multi_extension, we drop and recreate the public schema,
hence its owner becomes the default user in our tests, postgres.
Change that to pg_database_owner for PG15 consistency.
This results in alternative test output for public schema grants
in the following test:
grant_on_schema_propagation.sql
Relevant PG commit: b073c3ccd06e4cb845e121387a43faa8c68a7b62
* Add alternative test outputs for change in Insert Select display
citus_local_tables_queries.sql
coordinator_shouldhaveshards.sql
cte_inline.sql
insert_select_repartition.sql
intermediate_result_pruning.sql
local_shard_execution.sql
local_shard_execution_replicated.sql
multi_deparse_shard_query.sql
multi_insert_select.sql
multi_insert_select_conflict.sql
multi_mx_insert_select_repartition.sql
mx_coordinator_shouldhaveshards.sql
single_node.sql
Relevant PG commit:
a8d8445a7b2f80f6d0bfe97b19f90bd2cbef8759
* Fixes columnar tap tests for PG15
In PG15, Perl test modules have been moved to a new namespace.
Also, postgres node new() and get_new_node() methods have been
unified to one method: new()
We create separate tap tests for PG13/14 and PG15+
and update the Makefiles accordingly.
Relevant PG commits:
201a76183e2056c2217129e12d68c25ec9c559c8
b3b4d8e68ae83f432f43f035c7eb481ef93e1583
* Handles EXPLAIN output diffs in PG15: HashAgg Leverage,alt. output
Still not sure of the relevant PG commit
Could be db0d67db2401eb6238ccc04c6407a4fd4f985832
but disabling enable_group_by_reordering didn't help.
* use adaptive executor even if task-tracker is set
* Update check-multi-mx tests for adaptive executor
Basically repartition joins are enabled where necessary. For parallel
tests max adaptive executor pool size is decreased to 2, otherwise we
would get too many clients error.
* Update limit_intermediate_size test
It seems that when we use adaptive executor instead of task tracker, we
exceed the intermediate result size limit less often in the test.
Therefore the tests are updated accordingly.
* Update multi_router_planner
It seems that there is one problem with multi_router_planner when we use
adaptive executor, we should fix the following error:
+ERROR: relation "authors_range_840010" does not exist
+CONTEXT: while executing command on localhost:57637
* update repartition join tests for check-multi
* update isolation tests for repartitioning
* Error out if shard_replication_factor > 1 with repartitioning
As we are removing the task tracker, we cannot switch to it if
shard_replication_factor > 1. In that case, we simply error out.
* Remove MULTI_EXECUTOR_TASK_TRACKER
* Remove multi_task_tracker_executor
Some utility methods are moved to task_execution_utils.c.
* Remove task tracker protocol methods
* Remove task_tracker.c methods
* remove unused methods from multi_server_executor
* fix style
* remove task tracker specific tests from worker_schedule
* comment out task tracker udf calls in tests
We were using task tracker udfs to test permissions in
multi_multiuser.sql. We should find some other way to test them, then we
should remove the commented out task tracker calls.
* remove task tracker test from follower schedule
* remove task tracker tests from multi mx schedule
* Remove task-tracker specific functions from worker functions
* remove multi task tracker extra schedule
* Remove unused methods from multi physical planner
* remove task_executor_type related things in tests
* remove LoadTuplesIntoTupleStore
* Do initial cleanup for repartition leftovers
During startup, task tracker would call TrackerCleanupJobDirectories and
TrackerCleanupJobSchemas to clean up leftover directories and job
schemas. With adaptive executor, while doing repartitions it is possible
to leak these things as well. We don't retry cleanups, so it is possible
to have leftovers in case of errors.
TrackerCleanupJobDirectories is renamed as
RepartitionCleanupJobDirectories since it is repartition specific now,
however TrackerCleanupJobSchemas cannot be used currently because it is
task tracker specific. The thing is that this function is a no-op
currently.
We should add cleaning up intermediate schemas to the DoInitialCleanup
method when that problem is solved (we might want to solve it in this PR
as well).
* Revert "remove task tracker tests from multi mx schedule"
This reverts commit 03ecc0a681.
* update multi mx repartition parallel tests
* not error with task_tracker_conninfo_cache_invalidate
* not run 4 repartition queries in parallel
It seems that when we run 4 repartition queries in parallel we get too
many clients error on CI even though we don't get it locally. Our guess
is that, it is because we open/close many connections without doing some
work and postgres has some delay to close the connections. Hence even
though connections are removed from the pg_stat_activity, they might
still not be closed. If the above assumption is correct, it is unlikely
for it to happen in practice because:
- There is some network latency in clusters, so this leaves some time
for connections to be able to close
- Repartition joins return some data and that also leaves some time for
connections to be fully closed.
As we don't get this error locally, we currently assume that it is
not a bug. Ideally this wouldn't happen when we get rid of the
task-tracker repartition methods because they don't do any pruning and
might be opening more connections than necessary.
If this still gives us a "too many clients" error, we can try to increase
max_connections in our test suite (which is 100 by default).
Also there are different places where this error is given in postgres,
but adding some backtrace it seems that we get this from
ProcessStartupPacket. The backtraces can be found in this link:
https://circleci.com/gh/citusdata/citus/138702
* Set distributedPlan->relationIdList when it is needed
It seems that we were setting the distributedPlan->relationIdList after
JobExecutorType is called, which would choose task-tracker if
replication factor > 1 and there is a repartition query. However, it
uses relationIdList to decide if the query has a repartition query, and
since it was not set yet, it would always think it is not a repartition
query and would choose adaptive executor when it should choose
task-tracker.
* use adaptive executor even with shard_replication_factor > 1
It seems that we were already using adaptive executor when
replication_factor > 1. So this commit removes the check.
* remove multi_resowner.c and deprecate some settings
* remove TaskExecution related leftovers
* change deprecated API error message
* not recursively plan single relation repartition subquery
* recursively plan single relation repartition subquery
* test deprecated task tracker functions
* fix overlapping shard intervals in range-distributed test
* fix error message for citus_metadata_container
* drop task-tracker deprecated functions
* put the implementation back to worker_cleanup_job_schema_cache since citus cloud uses it
* drop some functions, add downgrade script
Some deprecated functions are dropped.
Downgrade script is added.
Some gucs are deprecated.
A new guc for repartition joins bucket size is added.
* order by a test to fix flappiness
- changes in ruleutils_11.c are reflected
- vacuum statement api change is handled. We now allow
multi-table vacuum commands.
- some other function header changes are reflected
- api conflicts between PG11 and earlier versions
are handled by adding shims in version_compat.h
- various regression tests are fixed due to output and
functionality changes in PG11
- no change is made to support new features in PG11;
they need to be handled by new commits
With #1804 (and related PRs), Citus gained the ability to
plan subqueries that are not safe to pushdown.
There are two high-level cases in which subqueries cannot be pushed down:
* Individual subqueries that require a merge step (i.e., GROUP BY
on non-distribution key, or LIMIT in the subquery etc). We've
handled such subqueries via #1876.
* Combination of subqueries that are not joined on distribution keys.
This commit aims to recursively plan some of such subqueries to make
the whole query safe to pushdown.
The main logic behind non colocated subquery joins is that we pick
an anchor range table entry and check for distribution key equality
of any other subqueries in the given query. If for a given subquery,
we cannot find distribution key equality with the anchor rte, we
recursively plan that subquery.
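The anchor-based check can be sketched in Python with a small union-find over distribution key equalities. This is an illustration only: `subqueries_to_plan_recursively`, the string names for range table entries, and the list of equality pairs are all hypothetical, and every name in `key_equalities` is assumed to be either the anchor or one of the subqueries.

```python
def find_root(parent, item):
    """Find the representative of item's equivalence class (with path halving)."""
    while parent[item] != item:
        parent[item] = parent[parent[item]]
        item = parent[item]
    return item


def subqueries_to_plan_recursively(anchor, subqueries, key_equalities):
    """Return the subqueries that are not joined with the anchor on distribution keys.

    key_equalities: pairs (left, right) of entries whose distribution keys
                    are compared for equality somewhere in the query.
    """
    parent = {name: name for name in [anchor, *subqueries]}
    for left, right in key_equalities:
        parent[find_root(parent, left)] = find_root(parent, right)

    anchor_root = find_root(parent, anchor)
    # Anything outside the anchor's equivalence class gets planned recursively.
    return [s for s in subqueries if find_root(parent, s) != anchor_root]


print(subqueries_to_plan_recursively(
    "users", ["q1", "q2", "q3"], [("users", "q1"), ("q1", "q2")]))
```

Here `q3` has no distribution key equality with the anchor, directly or transitively, so it is the only candidate for recursive planning.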
We also used a hacky solution for picking relations as the anchor range
table entries. The hack is that we wrap them into a subquery. This is only
necessary since some of the attribute equivalence checks are based on
queries rather than range table entries.
Expands count distinct coverage by allowing more cases. We used to support
count distinct only if we can push down distinct aggregate to worker query
i.e. the count distinct clause was on the partition column of the table,
or there was a grouping on the partition column.
Now we can support
- non-partition columns, with or without grouping on partition column
- partition, and non partition column in the same query
- having clause
- single table subqueries
- insert into select queries
- join queries where count distinct is on partition, or non-partition column
- filters on count distinct clauses (extends existing support)
We first try to push down aggregate to worker query (original case), if we
can't then we modify worker query to return distinct columns to coordinator
node. We do that by adding distinct column targets to group by clauses. Then
we perform count distinct operation on the coordinator node.
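The fallback path can be sketched in Python. Shards are modelled as lists of dictionaries, and `worker_distinct_values` and `coordinator_count_distinct` are hypothetical names standing in for the worker query and the coordinator-side merge.

```python
def worker_distinct_values(shard_rows, column):
    """Model of the worker query: group by the column to return distinct values."""
    # COUNT(DISTINCT ...) ignores NULLs, so None values are dropped here.
    return {row[column] for row in shard_rows if row[column] is not None}


def coordinator_count_distinct(shards, column):
    """Merge the per-shard distinct values and count them on the coordinator."""
    merged = set()
    for shard_rows in shards:
        merged |= worker_distinct_values(shard_rows, column)
    return len(merged)


shards = [
    [{"key": 1, "value": "a"}, {"key": 1, "value": "b"}],
    [{"key": 2, "value": "a"}],
]
print(coordinator_count_distinct(shards, "value"))
```

Summing per-shard distinct counts would double count `"a"` in this example, which is why the distinct values themselves are returned to the coordinator when the column is not the partition column.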
This work should reduce the cases where HLL is used as it can address anything
that HLL can. However, if we start having performance issues due to very large
number of rows, then we can recommend hll use.
Add a second implementation of INSERT INTO distributed_table SELECT ... that is used if
the query cannot be pushed down. The basic idea is to execute the SELECT query separately
and pass the results into the distributed table using a CopyDestReceiver, which is also
used for COPY and create_distributed_table. When planning the SELECT, we go through
planner hooks again, which means the SELECT can also be a distributed query.
EXPLAIN is supported, but EXPLAIN ANALYZE is not because preventing double execution was
a lot more complicated in this case.
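The data flow of this second implementation can be sketched in Python. The sketch assumes the SELECT has already produced `rows`, and it uses a CRC32-based shard choice purely for illustration; Citus routes rows with its own hash ranges, and `copy_rows_into_shards` is a hypothetical stand-in for the CopyDestReceiver.

```python
import zlib


def copy_rows_into_shards(rows, distribution_column, shard_count):
    """Route each SELECT result row to a shard based on its distribution column."""
    shards = [[] for _ in range(shard_count)]
    for row in rows:
        key = repr(row[distribution_column]).encode()
        # Illustrative routing only: equal keys always land in the same shard.
        shard_index = zlib.crc32(key) % shard_count
        shards[shard_index].append(row)
    return shards
```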
* Support for subqueries in WHERE clause
This commit enables subqueries in WHERE clause to be pushed down
by the subquery pushdown logic.
The support covers:
- Correlated subqueries with IN, NOT IN, EXISTS, NOT EXISTS,
operator expressions such as (>, <, =, ALL, ANY etc.)
- Non-correlated subqueries with (partition_key) IN (SELECT partition_key ..)
(partition_key) =ANY (SELECT partition_key ...)
Note that this commit heavily utilizes the attribute equivalence logic introduced
in 1cb6a34ba8. In general, this commit mostly
adjusts the logical planner not to error out on the subqueries in WHERE clause.
* Improve error checks for subquery pushdown and INSERT ... SELECT
Since we allow subqueries in WHERE clause with the previous commit,
we should apply the same limitations to those subqueries.
With this commit, we do not iterate on each subquery one by one.
Instead, we extract all the subqueries and apply the checks directly
on those subqueries. The aim of this change is to (i) Simplify the
code (ii) Make it close to the checks on INSERT .. SELECT code base.
* Extend checks for unresolved parameters to include SubLinks
With the presence of subqueries in where clause (i.e., SubPlans on the
query) the existing way for checking unresolved parameters fails. The
reason is that the parameters for SubPlans are kept on the parent plan not
on the query itself (see primnodes.h for the details).
With this commit, instead of checking SubPlans on the modified plans
we start to use originalQuery, where SubLinks represent the subqueries
in where clause. The unresolved parameters can be found on the SubLinks.
* Apply code-review feedback
* Remove unnecessary copying of shard interval list
This commit removes unnecessary copying of shard interval list. Note
that there are no copyObject function implemented for shard intervals.
* Enabling physical planner for subquery pushdown changes
This commit applies the logic that exists in INSERT .. SELECT
planning to the subquery pushdown changes.
The main algorithm is as follows:
- pick an anchor relation (i.e., target relation)
- per each target shard interval
- add the target shard interval's shard range
as a restriction to the relations (if all relations
joined on the partition keys)
- Check whether the query is router plannable per
target shard interval.
- If router plannable, create a task
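The steps above can be sketched in Python. The router-plannability check is passed in as a callback because its real logic is outside the scope of this message; `create_tasks`, the tuple representation of shard intervals, and the task dictionaries are hypothetical.

```python
def create_tasks(anchor_shard_intervals, all_joined_on_partition_key,
                 is_router_plannable):
    """Create one task per anchor shard interval that is router plannable.

    anchor_shard_intervals: (min_value, max_value) tuples of the anchor relation.
    is_router_plannable:    callback taking the restriction (or None).
    """
    tasks = []
    for interval in anchor_shard_intervals:
        # The shard range is added as a restriction only when all relations
        # are joined on their partition keys.
        restriction = interval if all_joined_on_partition_key else None
        if is_router_plannable(restriction):
            tasks.append({"shard_interval": interval,
                          "restriction": restriction})
    return tasks
```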
* Add union support within the JOINS
This commit adds support for UNION/UNION ALL subqueries that are
in the following form:
.... (Q1 UNION Q2 UNION ...) as union_query JOIN (QN) ...
In other words, we currently do NOT support the queries that are
in the following form where union query is not JOINed with
other relations/subqueries :
.... (Q1 UNION Q2 UNION ...) as union_query ....
* Subquery pushdown planner uses original query
With this commit, we change the input to the logical planner for
subquery pushdown. Before this commit, the planner was relying
on the query tree that is transformed by the postgresql planner.
After this commit, the planner uses the original query. The main
motivation behind this change is to simplify deparsing of
subqueries.
* Enable top level subquery join queries
This work enables
- Top level subquery joins
- Joins between subqueries and relations
- Joins involving more than 2 range table entries
A new regression test file is added to reflect enabled test cases
* Add top level union support
This commit adds support for UNION/UNION ALL subqueries that are
in the following form:
.... (Q1 UNION Q2 UNION ...) as union_query ....
In other words, Citus allows top-level unions
to be wrapped in aggregation queries
and/or simple projection queries that only select
some fields from the lower-level queries.
* Disallow subqueries without a relation in the range table list for subquery pushdown
This commit disallows subqueries without relation in the range table
list. This commit is only applied for subquery pushdown. In other words,
we do not add this limitation for single table re-partition subqueries.
The reasoning behind this limitation is that if we allow pushing down
such queries, the result would include (shardCount * expectedResults)
where in a non distributed world the result would be (expectedResult)
only.
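The multiplication effect can be made concrete with a short Python sketch. `pushed_down_row_count` is a hypothetical model in which every shard task evaluates the relation-free subquery in full.

```python
def pushed_down_row_count(subquery_rows, shard_count):
    """Count the rows returned if a relation-free subquery ran once per shard."""
    results = []
    for _ in range(shard_count):
        # The subquery reads no shard data, so each task returns all of its rows.
        results.extend(subquery_rows)
    return len(results)


# Three expected rows and four shards give twelve rows after pushdown.
print(pushed_down_row_count([1, 2, 3], 4))
```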
* Disallow subqueries without a relation in the range table list for INSERT .. SELECT
This commit disallows subqueries without relation in the range table
list. This commit is only applied for INSERT.. SELECT queries.
The reasoning behind this limitation is that if we allow pushing down
such queries, the result would include (shardCount * expectedResults)
where in a non distributed world the result would be (expectedResult)
only.
* Change behaviour of subquery pushdown flag (#1315)
This commit changes the behaviour of the citus.subquery_pushdown flag.
Before this commit, the flag was used to enable subquery pushdown logic. But,
with this commit, that behaviour is enabled by default. In other words, the
flag is now useless. We prefer to keep the flag since we don't want to break
the backward compatibility. Also, we may consider using that flag for other
purposes in the next commits.
* Require subquery_pushdown when limit is used in subquery
Using limit in subqueries may cause returning incorrect
results. Therefore we allow limits in subqueries only
if user explicitly set subquery_pushdown flag.
* Evaluate expressions on the LIMIT clause (#1333)
Since subquery pushdown uses the original query, the LIMIT and OFFSET clauses
are not evaluated. However, the logical optimizer expects these expressions
to be already evaluated by the standard planner. This commit manually
evaluates the functions on the logical planner for subquery pushdown.
* Better format subquery regression tests (#1340)
* Style fix for subquery pushdown regression tests
With this commit we intended a more consistent style for the
regression tests we've added in
- multi_subquery_union.sql
- multi_subquery_complex_queries.sql
- multi_subquery_behavioral_analytics.sql
* Enable the tests that are temporarily commented
This commit enables some of the regression tests that were commented
out until all the development is done.
* Fix merge conflicts (#1347)
- Update regression tests to meet the changes in the regression
test output.
- Replace Ifs with Asserts given that the check is already done
- Update shard pruning outputs
* Add view regression tests for increased subquery coverage (#1348)
- joins between views and tables
- joins between views
- union/union all queries involving views
- views with limit
- explain queries with view
* Improve btree operators for the subquery tests
This commit adds the missing comparison for the subquery composite key
btree comparator.
Enables the use of views within distributed queries.
Users can create and use a view on distributed tables/queries
as they would with regular queries.
After this change router queries will have full support for views,
insert into select queries will support reading from views, not
writing into. Outer joins would have a limited support, and would
error out at certain cases such as when a view is in the inner side
of the outer join.
Although PostgreSQL supports writing into views under certain circumstances,
we disallow that for distributed views.