Commit Graph

3753 Commits (993a402c730a24bb067204b46d51b447e08af32a)

Author SHA1 Message Date
Halil Ozan Akgul 993a402c73 Fixes create index concurrently bug
(cherry picked from commit 38b72ddd66)
2020-07-27 10:32:08 +03:00
SaitTalhaNisanci 39e63f5a08 Fix int32 overflow and use PG macros for INT32_XX (#4061)
* Use CalculateUniformHashRangeIndex in HashPartitionId

The INT32_MIN definition can change across platforms, which makes overflow possible; we saw crashes because of this on Debian distros. We already solved a similar problem by introducing the CalculateUniformHashRangeIndex method, so we can reuse it here. This also removes some duplication and gives us a single place to make the decision (see the sketch below).

* Use PG_INT32_XX instead of INT32_XX to be safer
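
A minimal sketch of the overflow-safe index computation, in plain C; the function name follows the commit message, while the body and constants are illustrative:

    #include <stdint.h>

    #define PG_INT32_MIN (-0x7FFFFFFF - 1)        /* fixed definition, same on all platforms */
    #define HASH_TOKEN_COUNT ((int64_t) 1 << 32)  /* size of the full 32-bit hash space */

    static int
    CalculateUniformHashRangeIndex(int32_t hashedValue, int rangeCount)
    {
        /* widen first: hashedValue - PG_INT32_MIN overflows in 32-bit arithmetic */
        int64_t normalizedValue = (int64_t) hashedValue - PG_INT32_MIN;
        int64_t rangeSize = HASH_TOKEN_COUNT / rangeCount;
        int rangeIndex = (int) (normalizedValue / rangeSize);

        /* integer division can push the largest hash values past the end; clamp them */
        if (rangeIndex >= rangeCount)
        {
            rangeIndex = rangeCount - 1;
        }
        return rangeIndex;
    }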

(cherry picked from commit ef841115de)
2020-07-27 10:32:08 +03:00
Halil Ozan Akgül 2271e9ded1 Fixes the non-existent table bug (#4058)
(cherry picked from commit e9f89ed651)
2020-07-27 10:32:08 +03:00
Sait Talha Nisanci 4c90dbbd88 improve error message in secondaries
(cherry picked from commit 6f4686c741467b5c8bd6ca15c0788d8db856392a)
2020-07-21 13:55:12 +03:00
Sait Talha Nisanci 388893ce5e add multi follower repartition tests
(cherry picked from commit 6e5598fd58a1c0c6a597ca06539ac5e286cb6914)
2020-07-21 13:55:08 +03:00
Sait Talha Nisanci 4b98f6c5c2 address feedback
(cherry picked from commit 24043a3602abc7b525f2724a35168e4c45442165)
2020-07-21 13:55:04 +03:00
Sait Talha Nisanci 97dda868a0 use ActiveReadableNodeList in JobExecutorType and task tracker
The reason we should use ActiveReadableNodeList instead of ActiveReadableNonCoordinatorNodeList is that if the coordinator is added to the cluster as a worker, it should be counted as well. Otherwise, if the coordinator is the only node in the cluster, the count will be 0 and we get a warning.

In MultiTaskTrackerExecute, we should connect to the coordinator if it is added to the cluster, because it will also be assigned tasks.
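
As a hedged sketch of the distinction (the list function names are from the commit messages; the extern declarations stand in for the real Citus headers, and the wrapper is illustrative):

    #include "postgres.h"
    #include "nodes/pg_list.h"

    extern List * ActiveReadableNodeList(void);               /* includes coordinator-as-worker */
    extern List * ActiveReadableNonCoordinatorNodeList(void); /* always filters it out */

    /* illustrative: a coordinator-only cluster should count 1 node, not 0 */
    static int
    ReadableNodeCount(void)
    {
        List *nodeList = ActiveReadableNodeList();

        return list_length(nodeList);
    }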

(cherry picked from commit ae6180ace2931223c58b87444a9e812f5e9f06e8)
2020-07-21 13:55:00 +03:00
Sait Talha Nisanci 27ef768f36 use ActivePrimaryNodeList to include coordinator
ActiveReadableWorkerNodeList doesn't include the coordinator; however, if
the coordinator is added as a worker, we should also include it while
planning. The current methods are easy to misuse, and a refactoring is
needed to make the distinction between methods that include the
coordinator and those that don't explicit, as the ambiguity can
introduce subtle or major bugs pretty easily.

(cherry picked from commit 86b974e4ceddaf5e2c44799148a8cf485c7d90bf)
2020-07-21 13:54:56 +03:00
Sait Talha Nisanci c238e6c8b0 send schema creation/cleanup to coordinator in repartitions
We were using the ALL_WORKERS TargetWorkerSet while sending temporary schema
creation and cleanup commands. We (well, mostly I) thought that ALL_WORKERS would also include the coordinator when it is added as a worker. It turns out that it was FILTERING OUT the coordinator even when it is added as a worker to the cluster.

So to have some context here: in repartitions, for each jobId we create
(or at least we were supposed to create) a schema on each worker node in the cluster. The job then runs in three steps:

- PARTITION step: each shard table is partitioned into intermediate files, so afterwards each node holds intermediate files containing tuples.
- FETCH step: the partition files are fetched to the worker nodes that need them.
- MERGE step: intermediate tables are created from the files in the temporarily created schemas.

After evaluating the result, we remove the temporary schemas (one for each job ID on each node) and the files.

If node 1 has file1 and node 2 has file2 after the PARTITION step, it is
enough to either move file1 from node1 to node2 or vice versa, so we
prune one of the fetch tasks.

In the MERGE step, if the schema for a given jobID doesn't exist, the
node tries to use the `public` schema if it is a superuser; that
fallback was actually added for testing in the past.

So when we were not sending the schema creation commands for each job ID to
the coordinator (because we were using the ALL_WORKERS flag, which doesn't
include the coordinator), we would basically not have any schemas for
repartitions on the coordinator. The PARTITION step would still be executed on
the coordinator (because the tasks are generated in the planner part)
and it wouldn't give us any error, because it doesn't have anything to do
with the temporary schemas (which we didn't create). But later one of two
things would happen:

- If by chance the fetch is pruned on the coordinator side, the other
nodes would fetch the partitioned files from the coordinator and execute
the query as expected, because it has all the information.
- If the fetch tasks are not pruned on the coordinator, then in the MERGE
step the coordinator would either error out saying that the necessary
schema doesn't exist, or it would try to create the temporary tables
under the public schema (if it is a superuser). But then, if we had the
same task ID with a different jobID, it would fail saying that the table
already exists, which is an error we were getting.

In the first case, the query would work okay, but it would still not do
the cleanup, hence we would leave the partitioned files from the
PARTITION step behind, and ensure_no_intermediate_data_leak would fail.

To make things more explicit and prevent such bugs in the future,
ALL_WORKERS is renamed to ALL_NON_COORD_WORKERS, and a new flag,
ALL_DATA_NODES, is added to return all the active nodes. For the
repartition case we don't actually need the nodes that only hold
reference tables, but this version makes the code simpler and there
shouldn't be any significant performance issue with that.
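
A hedged sketch of the resulting flag set; both enum values come from this commit message, the comments are illustrative:

    typedef enum TargetWorkerSet
    {
        /* formerly ALL_WORKERS: filters out the coordinator even when it is a worker */
        ALL_NON_COORD_WORKERS,

        /* new: all active nodes, including a coordinator added as a worker */
        ALL_DATA_NODES
    } TargetWorkerSet;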

(cherry picked from commit 6532506f4b92b1316eea0812b2bcedb818d3b25c)
2020-07-21 13:54:51 +03:00
Sait Talha Nisanci a04e7b233e rename node/worker utilities
The names were not explicit about what they do, and we had many
misuses in the codebase, so they are renamed to be more explicit.

(cherry picked from commit 09962a7e2ff340705b6b193bbfececa2d48e0855)
2020-07-21 13:54:45 +03:00
Sait Talha Nisanci 0bd4002e5f rename TargetWorkerSet enums
Rename the TargetWorkerSet enums to make them more explicit about what they
mean. Ideally it would be good to treat everything as a node, without the
'worker' concept, because it makes things complicated. Another
improvement could be to rename TargetWorkerSet to TargetNodeSet, but that
would mean renaming many occurrences of Worker, which is probably too big
for this PR.

(cherry picked from commit de4b9569359e4f10d4ebf3fbcf7159ee6e2328db)
2020-07-21 13:54:40 +03:00
Nils Dijk 23f24a9668 fix flaky tests due to nondeterministic order of test output (#4029)
As reported on #4011 https://github.com/citusdata/citus/pull/4011/files#r453804702 some of the tests were flaky due to a nondeterministic order of test outputs.

This PR makes the test output ordered for all tests returning non-zero rows.

Needs to be backported to 9.2, 9.3, 9.4
(cherry picked from commit 23d44eba9f)
2020-07-21 11:01:49 +03:00
Nils Dijk d77e386e92 force aliases in deparsing for queries with anonymous column references (#4011)
DESCRIPTION: Force aliases in deparsing for queries with anonymous column references

Fixes: #3985

The root cause has to do with discrepancies in the query tree we create. I think in the future we should spend some time on categorising all the changes we made to ruleutils and see if we can change the `query` data structure we pass to the deparser so that it holds an actual valid postgres query for the deparser to render.

For now the fix is to keep track, besides changing the names of the entries in the target list, of whether we have a reference to an anonymous column. If there are anonymous columns, we set the `printaliases` flag to true, which forces the deparser to add the aliases.
(cherry picked from commit 449d1f0e91)
2020-07-21 11:01:49 +03:00
Marco Slot b7b960955c Rename master evaluation to coordinator evaluation
(cherry picked from commit b4fec63bc0)
2020-07-21 11:01:49 +03:00
Hadi Moshayedi 49f130fcd3 Fix Subtransaction memory leak
(cherry picked from commit 3651fc64ee)
2020-07-21 11:01:49 +03:00
Jelte Fennema b8c0e5ef1e Make static analysis happier (#4008)
Some small non-functional changes to make static analysis happy.
(cherry picked from commit 4c68ed4c33)
2020-07-21 11:01:49 +03:00
Jelte Fennema 55eed7f2ec Handle some NULL issues that static analysis found (#4001)
Static analysis found some issues where we used the result of
ExtractResultRelationRTE without checking that it wasn't NULL. It seems
like in all these cases it can never actually be NULL, since we have checked
before that it isn't a SELECT query. So this PR is mostly to make static
analysis happy (and to protect a bit against future changes of the code).
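A hedged sketch of the defensive pattern; ExtractResultRelationRTE is named in the commit, while the wrapper around it is illustrative:

    #include "postgres.h"
    #include "nodes/parsenodes.h"

    extern RangeTblEntry * ExtractResultRelationRTE(Query *query);

    static Oid
    ResultRelationOid(Query *query)
    {
        RangeTblEntry *resultRte = ExtractResultRelationRTE(query);

        /* should not happen for non-SELECT queries, but keep static analysis happy */
        if (resultRte == NULL)
        {
            ereport(ERROR, (errmsg("query has no result relation")));
        }
        return resultRte->relid;
    }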
(cherry picked from commit 759e628dd5)
2020-07-21 11:01:48 +03:00
Jelte Fennema 49d23229c4 Fix write queries with const expressions and COLLATE in various places (#3973)
(cherry picked from commit 16242d5264)
2020-07-21 11:01:48 +03:00
Jelte Fennema 48fab6f264 Replace words that have bad associations (#3992)
We had a few words in our codebase that static analysis flagged as having bad
associations.
(cherry picked from commit f6e2f1b1cb)
2020-07-21 11:01:48 +03:00
Jelte Fennema 9a4fddc9c5 Fix crash with single node dummy placement (#3993)
Static analysis found an issue where we could dereference `NULL`, because
`CreateDummyPlacement` could return `NULL` when there were no workers. This
PR changes it so that it never returns `NULL`, which was intended by
@marcocitus when doing this change: https://github.com/citusdata/citus/pull/3887/files#r438136433

While adding tests for citus on a single node I also added some more basic
tests and it turns out we error out on repartition joins. This has been
present since `shouldhaveshards` was introduced and is not trivial to fix.
So I created a separate issue for this: https://github.com/citusdata/citus/issues/3996
(cherry picked from commit ab01571c9e)
2020-07-21 11:01:48 +03:00
Philip Dubé 1d54b8f301 ruleutils: use get_rtable_name for deparsing resultRelation
(cherry picked from commit 444472ffc6)
2020-07-21 11:01:48 +03:00
Hadi Moshayedi 5e648e1a78 Fix task->fetchedExplainAnalyzePlan memory issue.
(cherry picked from commit 23fa421639)
2020-07-21 11:01:48 +03:00
Sait Talha Nisanci fc711af85b Fix explain subplan duration
(cherry picked from commit 4d217819ff)
2020-07-21 11:01:48 +03:00
Hanefi Önaldı 21ca434bef
Accept list of values in a supported ALTER ROLE .. SET statement
Some GUCs support a list of values, which is indicated by the GUC_LIST_INPUT flag.

When an ALTER ROLE .. SET statement is executed, the new configuration
default for the affected users and databases is stored in the
setconfig(text[]) column of a pg_db_role_setting record.

If a GUC that supports a list of values is used in an ALTER ROLE .. SET
statement, we need to split the text into items delimited by commas.
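
A minimal sketch of the splitting step, assuming PostgreSQL's SplitGUCList() helper from utils/varlena.h; the wrapper name is hypothetical:

    #include "postgres.h"
    #include "nodes/pg_list.h"
    #include "utils/varlena.h"

    /* split a GUC_LIST_INPUT value such as "a,b,c" into its items */
    static List *
    SplitSetConfigValue(const char *setConfigValue)
    {
        char *rawValue = pstrdup(setConfigValue);
        List *valueList = NIL;

        if (!SplitGUCList(rawValue, ',', &valueList))
        {
            ereport(ERROR, (errmsg("invalid list syntax in configuration value")));
        }

        return valueList;
    }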

(cherry picked from commit e534dbae4a)
2020-07-21 04:12:39 +03:00
Onur Tirtir 61ab7006d0
Don't run check-merge-to-enterprise for release branches
(cherry picked from commit 1c6439d1af)
2020-07-17 12:54:23 +03:00
Hanefi Önaldı 3de2d2868d
Introduce new make targets for downgrade scripts
Here are the updated make targets:
- install: install everything except downgrade scripts.
- install-downgrades: build and install only the downgrade migration scripts.
- install-all: install everything along with the downgrade migration scripts.

Conflicts:
  src/backend/distributed/Makefile
  src/backend/distributed/sql/downgrades/citus--9.5-1--9.4-1.sql
    - file does not exist on release branch yet, only on master

(cherry picked from commit 315b323d47)
2020-07-17 12:44:16 +03:00
Marco Slot 77b4534c72 Prevent integer overflow in FindShardIntervalIndex 2020-07-16 14:58:53 +02:00
Onder Kalaci 4b493f088b Fix default value of EnableBinaryProtocol
(cherry picked from commit aa8a2866f3)
2020-07-02 14:29:11 +02:00
Onur Tirtir 06c878b348 Bump Citus version to 9.4.0 2020-07-01 11:01:59 +03:00
Hanefi Onaldi 8913d63ae2
Merge pull request #3927 from citusdata/downgrade-paths 2020-07-01 10:48:50 +03:00
Hanefi Önaldı ca2ececb3b
Downgrade path from 9.4 to 9.3 to 9.2 2020-07-01 10:38:11 +03:00
Hadi Moshayedi dd5277418f
Merge pull request #3961 from citusdata/fix/constant-pushdown
Don't push expressions to workers when aggregating without GROUP BY.
2020-06-30 14:06:15 -07:00
Sait Talha Nisanci e5a21f07cb test aggregates with expressions 2020-06-30 11:41:16 -07:00
Marco Slot eeffbde8bd Fix pushdown of constants in aggregate queries 2020-06-30 11:41:16 -07:00
Jelte Fennema 392c5e2c34
Fix wrong cancellation message about distributed deadlocks (#3956) 2020-06-30 14:57:46 +02:00
Marco Slot 634d6cf9d7
Improve performance of metadata cache (#3924)
#3866 removed the shard ID hash in metadata_cache.c to simplify cache management,
but we observed a significant performance regression. The regression had been
masked in our benchmarks by the performance improvement from #3654, but #3654
only applies to specific workloads.

This PR brings back the shard ID cache as it existed before #3866, with some
extra measures to handle invalidation. When we load a table entry, we overwrite
the ShardIdCacheEntry->tableEntry pointers for all the shards in that table.
However, the table might no longer contain an old shard ID, or the table entry
might never be reloaded, which would leave a dangling pointer once the table
entry is freed. To handle that case, when a table entry is freed (at the end of
the transaction or on any call to CitusTableCacheFlushInvalidatedEntries), we
remove all shard ID cache entries that point exactly to that table entry.
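
A hedged sketch of that invalidation pass, using PostgreSQL's dynahash API; the struct and hash names follow the commit message, while the key layout is an assumption:

    #include "postgres.h"
    #include "utils/hsearch.h"

    typedef struct CitusTableCacheEntry CitusTableCacheEntry;

    typedef struct ShardIdCacheEntry
    {
        int64 shardId;                    /* hash key (assumed layout) */
        CitusTableCacheEntry *tableEntry; /* may dangle once the table entry is freed */
    } ShardIdCacheEntry;

    static HTAB *ShardIdCacheHash = NULL;

    /* remove every cache entry that points exactly to the freed table entry */
    static void
    RemoveStaleShardIdCacheEntries(CitusTableCacheEntry *invalidatedEntry)
    {
        HASH_SEQ_STATUS status;
        ShardIdCacheEntry *entry = NULL;

        hash_seq_init(&status, ShardIdCacheHash);

        while ((entry = hash_seq_search(&status)) != NULL)
        {
            /* dynahash allows deleting the element just returned by the scan */
            if (entry->tableEntry == invalidatedEntry)
            {
                hash_search(ShardIdCacheHash, &entry->shardId, HASH_REMOVE, NULL);
            }
        }
    }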

Co-authored-by: SaitTalhaNisanci <s.talhanisanci@gmail.com>
Co-authored-by: Marco Slot <marco.slot@gmail.com>
Co-authored-by: Jelte Fennema <github-tech@jeltef.nl>
2020-06-30 12:10:10 +02:00
Jelte Fennema 02fa942be1
Fix assertion error when rolling back to savepoint (#3868)
It was possible to get an assertion error if a DML command that opened a
connection was cancelled, and "ROLLBACK TO SAVEPOINT" was then used to
continue the transaction. The reason for this was that canceling
the transaction might leave the `claimedExclusively` flag on for (some
of) its connections.

This caused an assertion failure because `CanUseExistingConnection`
would return false and a new connection would be opened, and then there
would be two connections doing DML for the same placement, which is
disallowed. That this situation caused an assertion failure instead of
an error means that, without asserts, it could possibly result in some
visibility bugs, similar to the ones described in
https://github.com/citusdata/citus/issues/3867
2020-06-30 11:31:46 +02:00
SaitTalhaNisanci e28683a025
Upgrade codecov orb in circleci (#3945)
The only reason for this upgrade is to see if it will fix codecov
pushing the coverage report to PRs many times, which is cluttering the PRs.

It is possible that the repeated pushes are related to codecov
internals, so upgrading may help.
2020-06-30 11:33:21 +03:00
Hadi Moshayedi d022f80340
Merge pull request #3943 from citusdata/fix_explain_2
Report correct INSERT/SELECT method in EXPLAIN
2020-06-26 08:21:50 -07:00
Hadi Moshayedi 4ed59d2db3 Move more from insert_select_executor to insert_select_planner 2020-06-26 08:08:26 -07:00
Hadi Moshayedi d34c21890f Rename CoordinatorInsertSelect... to NonPushableInsertSelect 2020-06-25 08:55:48 -07:00
Hadi Moshayedi cd25a27174 Fix crash caused by EXPLAIN EXECUTE INSERT ... SELECT 2020-06-25 08:55:48 -07:00
Hadi Moshayedi 4e8d79998e Save INSERT/SELECT method in DistributedPlan.
This is so we don't need to calculate it twice in
insert_select_executor.c and multi_explain.c, which can
cause a discrepancy if an update in one of them is not
reflected in the other.
2020-06-25 08:55:48 -07:00
Jelte Fennema 64506143e4
Replace flaky repartition analyze test with a non flaky one (#3950)
The flaky test was introduced in #3941. This removes that flaky test and
adds a new one that fails in the same manner when removing the fix in #3941.

An example of a random failure can be found here:
https://app.circleci.com/pipelines/github/citusdata/citus/9558/workflows/de76e7a5-6558-46c9-97e7-8b1dae1f173b/jobs/135876/steps
2020-06-25 15:19:15 +02:00
SaitTalhaNisanci 50e115fe3a
test task tracker repartition with replication >1 (#3944) 2020-06-24 14:54:20 +03:00
SaitTalhaNisanci f458d1fd1c
Fix/task execution (#3941)
* Don't set TaskExecution with the adaptive executor

The adaptive executor uses a utility method from the task tracker for
repartition joins, but the adaptive executor doesn't need taskExecution;
it is only used by the task tracker. This causes a problem when EXPLAIN
ANALYZE is used, because what taskExecution points to might be random.

We solve this by not setting taskExecution from the adaptive executor, so
it stays NULL as set by CreateTask.

* use same memory context as task for taskExecution

Co-authored-by: Jelte Fennema <github-tech@jeltef.nl>
2020-06-24 12:10:00 +03:00
Philip Dubé ac3c646ed5
Merge pull request #3942 from citusdata/fix-default-func-param-evaluation
citus_evaluate_expression: call expand_function_arguments beforehand to avoid segfaulting on implicit parameters
2020-06-23 18:37:40 +00:00
Philip Dubé cd0b2ad5b5 citus_evaluate_expression: call expand_function_arguments beforehand to avoid segfaulting on implicit parameters 2020-06-23 18:06:46 +00:00
Jelte Fennema a98226842d
Use rename to make sure no files are inserted while deleting (#3912)
As suggested by @marcocitus in https://github.com/citusdata/citus/pull/3911#issuecomment-643978531, there was
a regression in #3893: if another backend wrote a file during deletion of
the intermediate results directory, that file would not necessarily be deleted.

The approach used in `CitusRemoveDirectory` is to retry the recursive removal
of the directory if it has failed. That does not work here, because when a file
cannot be removed for other reasons (e.g. `EPERM`) it no longer throws an error,
so we would get into an infinite removal loop. Instead, I now
`rename` the directory before removing it. That way other backends will not
write files to it anymore.
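
A minimal sketch of the rename-then-remove approach; CitusRemoveDirectory is taken from the commit message, while the function signature and error handling here are illustrative:

    #include <stdio.h>

    /* recursive removal helper, as referenced in the commit message */
    extern void CitusRemoveDirectory(const char *path);

    /*
     * Rename the directory first so concurrent backends can no longer create
     * files inside it, then remove the renamed directory recursively.
     */
    static void
    RemoveIntermediateResultsDirectory(const char *path, const char *renamedPath)
    {
        if (rename(path, renamedPath) == 0)
        {
            CitusRemoveDirectory(renamedPath);
        }
    }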
2020-06-23 10:38:44 +02:00
Hanefi Onaldi 0e0695481c
Merge pull request #3935 from citusdata/disallow-long-changelog 2020-06-22 23:55:50 +03:00