Commit Graph

248 Commits (63efc140de3c7d97e77393a36adef7df57b6a93b)

Author SHA1 Message Date
Jelte Fennema 2f29d4e53e Continue to remove shards after first failure in DropMarkedShards
The comment of DropMarkedShards described the behaviour that after a
failure we would continue trying to drop other shards. However the code
did not do this and would stop after the first failure. Instead of
simply fixing the comment I fixed the code, because the described
behaviour is more useful. Now a single shard that cannot be removed yet
does not block others from being removed.
2021-04-30 15:42:09 +03:00
SaitTalhaNisanci 03832f353c Drop postgres 11 support 2021-03-25 09:20:28 +03:00
Marco Slot 6c5d263b7a Remove unnecessary AtEOXact_Files call 2021-03-15 09:34:02 +01:00
Onder Kalaci e65e72130d Rename use -> shouldUse
Because setting the flag doesn't necessarily mean that we'll
use 2PC. If connections are read-only, we will not use 2PC.
In other words, we'll use 2PC only for connections that modified
any placements.
2021-03-12 08:29:43 +00:00
Onder Kalaci 6a7ed7b309 Do not trigger 2PC for reads on local execution
Before this commit, Citus used 2PC no matter what kind of
local query execution happens.

For example, if the coordinator has shards (and the workers as well),
even a simple SELECT query could start 2PC:
```SQL

WITH cte_1 AS (SELECT * FROM test LIMIT 10) SELECT count(*) FROM cte_1;
```

In this query, the local execution of the shards (and also intermediate
result reads) triggers the 2PC.

To prevent that, Citus now distinguishes local reads and local writes.
And, Citus switches to 2PC only if a modification happens. This may
still lead to unnecessary 2PCs when there is a local modification
and remote SELECTs only. Though, we handle that separately
via #4587.
2021-03-12 08:29:43 +00:00
Naisila Puka 196064836c
Skip 2PC for readonly connections in a transaction (#4587)
* Skip 2PC for readonly connections in a transaction

* Use ConnectionModifiedPlacement() function

* Remove the second check of ConnectionModifiedPlacement()

* Add order by to prevent flaky output

* Test using pg_dist_transaction
2021-03-10 20:01:37 +03:00
Philip Dubé 4e22f02997 Fix various typos due to zealous repetition 2021-03-04 19:28:15 +00:00
Hadi Moshayedi c3dcd6b9f8 Columnar: don't include stripe reservation locks in lock graph. 2021-02-10 10:20:20 -08:00
Onur Tirtir cacb76d2c6
Not mention citus local tables in error messages (#4579) 2021-01-27 12:36:53 +03:00
Hadi Moshayedi bc01c795a2 Reland #4419 2021-01-19 07:48:47 -08:00
Marco Slot d9f175532b Revert "Trigger metadata sync at transaction commit"
This reverts commit a2c73bef27.
2021-01-07 10:30:00 +01:00
Onder Kalaci 2fe158961b Remove "WarnAboutLeakedPreparedTransaction" function
We used to need WarnAboutLeakedPreparedTransaction()
as we didn't have auto 2PC recovery. But, we long have
2PC recovery by https://github.com/citusdata/citus/pull/1574

So, we don't need anymore.
2021-01-06 15:48:58 +03:00
Hadi Moshayedi a2c73bef27 Trigger metadata sync at transaction commit 2020-12-24 08:28:38 -08:00
Ahmet Gedemenli 7577821920 Fix transaction name length calculation 2020-12-07 12:34:15 +03:00
Ahmet Gedemenli 936775e8e3 Delete transactions when removing node
With this commit, we delete entries in pg_dist_transaction
for the primary nodes that are removed by `master_remove_node`.
2020-12-07 11:35:20 +03:00
Onder Kalaci 629ecc3dee Add the infrastructure to count the number of client backends
Considering the adaptive connection management
improvements that we plan to roll soon, it makes it
very helpful to know the number of active client
backends.

We are doing this addition to simplify yhe adaptive connection
management for single node Citus. In single node Citus, both the
client backends and Citus parallel queries would compete to get
slots on Postgres' `max_connections` on the same Citus database.

With adaptive connection management, we have the counters for
Citus parallel queries. That helps us to adaptively decide
on the remote executions pool size (e.g., throttle connections
if necessary).

However, we do not have any counters for the total number of
client backends on the database. For single node Citus, we
should consider all the client backends, not only the remote
connections that Citus does.

Of course Postgres internally knows how many client
backends are active. However, to get that number Postgres
iterates over all the backends. For examaple, see [pg_stat_get_db_numbackends](8e90ec5580/src/backend/utils/adt/pgstatfuncs.c (L1240))
where Postgres iterates over all the backends.

For our purpuses, we need this information on every connection
establishment. That's why we cannot affort to do this kind of
iterattion.
2020-11-25 19:19:24 +01:00
Onur Tirtir f80f4839ad Remove unused functions that cppcheck found 2020-10-19 13:50:52 +03:00
SaitTalhaNisanci e7cd1ed0ee
Not take ShareUpdateExlusiveLock on pg_dist_transaction (#4184)
* Not take ShareUpdateExlusiveLock on pg_dist_transaction

We were taking ShareUpdateExlusiveLock on pg_dist_transaction during
recovery to prevent multiple recoveries happening concurrenly. VACUUM(
not FULL) also takes ShareUpdateExclusiveLock, and they can conflict. It
seems that VACUUM will skip the table if there is a conflicting lock
already taken unless it is doing the vacuum to prevent id wraparound, in
which case there can be a deadlock. I guess the deadlock happens if:

- VACUUM takes a lock on pg_dist_transaction and is done for id
wraparound problem
- The transaction in the maintenance tries to take a lock but
cannot as that conflicts with the lock acquired by VACUUM
- The transaction in the maintenance daemon has a very old xid hence
VACUUM cannot proceed.

If we take a row exclusive lock in transaction recovery then it wouldn't
conflict with VACUUM hence it could proceed so the deadlock would be
resolved. To prevent concurrent transaction recoveries happening, an
advisory lock is taken with ShareUpdateExlusiveLock as before.

* Use CITUS_OPERATIONS tag
2020-09-21 15:20:38 +03:00
Onur Tirtir 3a73fba810 Apply planner changes for citus local tables 2020-09-09 11:51:18 +03:00
Onur Tirtir 0b1cc118a9 Adapt other cache entry changes for citus local tables 2020-09-09 11:50:55 +03:00
Onur Tirtir ba208eae4d
Record non-distributed table accesses in local executor (#4139) 2020-09-07 18:19:08 +03:00
SaitTalhaNisanci 366461ccdb
Introduce cache entry/table utilities (#4132)
Introduce table entry utility functions

Citus table cache entry utilities are introduced so that we can easily
extend existing functionality with minimum changes, specifically changes
to these functions. For example IsNonDistributedTableCacheEntry can be
extended for citus local tables without the need to scan the whole
codebase and update each relevant part.

* Introduce utility functions to find the type of tables

A table type can be a reference table, a hash/range/append distributed
table. Utility methods are created so that we don't have to worry about
how a table is considered as a reference table etc. This also makes it
easy to extend the table types.

* Add IsCitusTableType utilities

* Rename IsCacheEntryCitusTableType -> IsCitusTableTypeCacheEntry

* Change citus table types in some checks
2020-09-02 22:26:05 +03:00
Sait Talha Nisanci b641f63bfd Use CMDTAG_SELECT_COMPAT
CMDTAG_SELECT exists in PG12 hence defining a MACRO such as
CMDTAG_SELECT -> "SELECT" is not possible. I chose CMDTAG_SELECT_COMPAT
because with the COMPAT suffix it is explicit that it maps to different
things in different versions and also has a less chance of mapping
something irrevelant. For example if we used SELECT as a macro, then it
would map every SELECT to whatever it is mapping to, which might have
unexpected/undesired behaviour.
2020-08-04 15:38:13 +03:00
Sait Talha Nisanci 1112b254a7 adapt recently added code for pg13
This commit mostly adds pg_get_triggerdef_command to our ruleutils_13.
This doesn't add anything extra for ruleutils 13 so it is basically a copy
of the change on ruleutils_12
2020-08-04 15:18:27 +03:00
Sait Talha Nisanci 01632c56a0 Change utils/hashutils.h to common/hashfn.h for PG >= 13
Commit on postgres side:
05d8449e73694585b59f8b03aaa087f04cc4679a

Command on postgres side:
git log --all --grep="hashutils"

include common/hashfn.h for pg >= 13

tag_hash was moved from hsearch.h to hashutils.h then to hashfn.h

Commits on Postgres side:
9341c783cc42ffae5860c86bdc713bd47d734ffd
2020-08-04 15:10:22 +03:00
Sait Talha Nisanci bf831d2e59 Use table_openXXX methods in the codebase
With PG13 heap_* (heap_open, heap_close etc) are replaced with table_*
(table_open, table_close etc).

It is better to use the new table access methods in the codebase and
define the macros for the previous versions as we can easily remove the
macro without having to change the codebase when we drop the support for
the old version.

Commits that introduced this change on Postgres:
f25968c49697db673f6cd2a07b3f7626779f1827
e0c4ec07284db817e1f8d9adfb3fffc952252db0
4b21acf522d751ba5b6679df391d5121b6c4a35f

Command to see relevant commits on Postgres side:
git log --all --grep="heap_open"
2020-08-04 15:10:22 +03:00
Onder Kalaci eeb8c81de2 Implement shared connection count reservation & enable `citus.max_shared_pool_size` for COPY
With this patch, we introduce `locally_reserved_shared_connections.c/h` files
which are responsible for reserving some space in shared memory counters
upfront.

We sometimes need to reserve connections, but not necessarily
establish them. For example:
-  COPY command should reserve connections as it cannot know which
   connections it needs in which order. COPY establishes connections
   as any input data hits the workers. For example, for router COPY
   command, it only establishes 1 connection.

   As discussed here (https://github.com/citusdata/citus/pull/3849#pullrequestreview-431792473),
   COPY needs to reserve connections up-front, otherwise we can end
   up with resource starvation/un-detected deadlocks.
2020-08-03 18:51:40 +02:00
Marco Slot 5fbb925df1 Remove level asserts in abort handler 2020-07-12 22:54:35 +02:00
Sait Talha Nisanci 510535f558 address feedback 2020-07-13 19:45:02 +03:00
Sait Talha Nisanci db1b78148c send schema creation/cleanup to coordinator in repartitions
We were using ALL_WORKERS TargetWorkerSet while sending temporary schema
creation and cleanup. We(well mostly I) thought that ALL_WORKERS would also include coordinator when it is added as a worker. It turns out that it was FILTERING OUT the coordinator even if it is added as a worker to the cluster.

So to have some context here, in repartitions, for each jobId we create
(at least we were supposed to) a schema in each worker node in the cluster. Then we partition each shard table into some intermediate files, which is called the PARTITION step. So after this partition step each node has some intermediate files having tuples in those nodes. Then we fetch the partition files to necessary worker nodes, which is called the FETCH step. Then from the files we create intermediate tables in the temporarily created schemas, which is called a MERGE step. Then after evaluating the result, we remove the temporary schemas(one for each job ID in each node) and files.

If node 1 has file1, and node 2 has file2 after PARTITION step, it is
enough to either move file1 from node1 to node2 or vice versa. So we
prune one of them.

In the MERGE step, if the schema for a given jobID doesn't exist, the
node tries to use the `public` schema if it is a superuser, which is
actually added for testing in the past.

So when we were not sending schema creation comands for each job ID to
the coordinator(because we were using ALL_WORKERS flag, and it doesn't
include the coordinator), we would basically not have any schemas for
repartitions in the coordinator. The PARTITION step would be executed on
the coordinator (because the tasks are generated in the planner part)
and it wouldn't give us any error because it doesn't have anything to do
with the temporary schemas(that we didn't create). But later two things
would happen:

- If by chance the fetch is pruned on the coordinator side, we the other
nodes would fetch the partitioned files from the coordinator and execute
the query as expected, because it has all the information.
- If the fetch tasks are not pruned in the coordinator, in the MERGE
step, the coordinator would either error out saying that the necessary
schema doesn't exist, or it would try to create the temporary tables
under public schema ( if it is a superuser). But then if we had the same
task ID with different jobID it would fail saying that the table already
exists, which is an error we were getting.

In the first case, the query would work okay, but it would still not do
the cleanup, hence we would leave the partitioned files from the
PARTITION step there. Hence ensure_no_intermediate_data_leak would fail.

To make things more explicit and prevent such bugs in the future,
ALL_WORKERS is named as ALL_NON_COORD_WORKERS. And a new flag to return
all the active nodes is added as ALL_DATA_NODES. For repartition case,
we don't use the only-reference table nodes but this version makes the
code simpler and there shouldn't be any significant performance issue
with that.
2020-07-13 19:20:15 +03:00
SaitTalhaNisanci 15290bc43b
remove unused worker methods (#4017) 2020-07-10 13:45:55 +03:00
SaitTalhaNisanci 3f50165365
rename TargetWorkerSet enums (#4015)
Rename TargetWorkerSet enums to make them more explicit about what they
mean. Ideally it would be good to treat everything as a node without the
'worker' concept because it makes things complicated. Another
improvement could be to rename TargetWorkerSet as TargetNodeSet but it
goes to renaming many occurrences of Worker, which is probably too big
for this PR.
2020-07-10 11:21:27 +03:00
Hadi Moshayedi 3651fc64ee Fix Subtransaction memory leak 2020-07-09 12:33:39 -07:00
SaitTalhaNisanci 96adce77d6
rename node/worker utilities (#4003)
The names were not explicit about what they do, and we have many
misusages in the codebase, so they are renamed to be more explicit.
2020-07-09 15:30:35 +03:00
Jelte Fennema 392c5e2c34
Fix wrong cancellation message about distributed deadlocks (#3956) 2020-06-30 14:57:46 +02:00
Jelte Fennema 02fa942be1
Fix assertion error when rolling back to savepoint (#3868)
It was possible to get an assertion error, if a DML command was
cancelled that opened a connection and then "ROLLBACK TO SAVEPOINT" was
used to continue the transaction. The reason for this was that canceling
the transaction might leave the `claimedExclusively` flag on for (some
of) it's connections.

This caused an assertion failure because `CanUseExistingConnection`
would return false and a new connection would be opened, and then there
would be two connections doing DML for the same placement. Which is
disallowed. That this situation caused an assertion failure instead of
an error, means that without asserts this could possibly result in some
visibility bugs, similar to the ones described
https://github.com/citusdata/citus/issues/3867
2020-06-30 11:31:46 +02:00
Marco Slot d1bab78d79 Remove master from file hierarchy 2020-06-16 17:49:09 +02:00
Jelte Fennema 0e12d045b1
Support use of binary protocol in between nodes (#3877)
This can save a lot of data to be sent in some cases, thus improving
performance for which inter query bandwidth is the bottleneck.
There's some issues with enabling this as default, so that's currently not done.
2020-06-12 15:02:51 +02:00
Jelte Fennema f4791fcb10
Remove SwallowErrors by using PathNameDeleteTemporaryDir (#3893)
This is a different version of #3634. It also removes SwallowErrors, but
instead of modifying our own functions to not throw errors, it uses the
postgres built in `PathNameDeleteTemporaryDir` function. This function
does not throw errors.

Since this change is for a bugfix, I tried to minimize the changes.

PRs with the following changes would be good to do separately from this
PR:
1. Use PathName(Create|Open|Delete)Temporary(File|Dir) to open and
   remove all files/dirs instead of our own custom file functions.
2. Prefix our outmost files/directories with `PG_TEMP_FILE_PREFIX` so
   that they are identified by Postgres as temporary files, which will be
   removed at postmaster start. This way we do not have to do this cleanup
   ourselves.
3. Store the files in the temporary table space if it exists.

Fixes #3634
Fixes #3618
2020-06-10 17:04:07 +02:00
Hadi Moshayedi 5cdfa9f571 Implement EXPLAIN ANALYZE udfs.
Implements worker_save_query_explain_analyze and worker_last_saved_explain_analyze.

worker_save_query_explain_analyze executes and returns results of query while
saving its EXPLAIN ANALYZE to be fetched later.

worker_last_saved_explain_analyze returns the saved EXPLAIN ANALYZE result.
2020-06-09 10:02:05 -07:00
Philip Dubé c0515dcd67 This prepares for routing modifying CTEs, where modLevel should not be used to infer whether a plan is a select or not
SELECT_TASK is renamed to READ_TASK as a SELECT with modifying CTEs will be a MODIFYING_TASK

RouterInsertJob: Assert originalQuery->commandType == CMD_INSERT
CreateModifyPlan: Assert originalQuery->commandType != CMD_SELECT

Remove unused function IsModifyDistributedPlan

DistributedExecution, ExecutionParams, DistributedPlan: Rename hasReturning to expectResults
SELECTs set expectResults to true

Rename CreateSingleTaskRouterPlan to CreateSingleTaskRouterSelectPlan
2020-05-20 17:26:12 +00:00
Jelte Fennema c6f5d5fe88
Add some asserts to pass static analysis (#3805) 2020-04-29 11:19:11 +02:00
Hadi Moshayedi 59b9a4e5a1 Detect deadlocks in replicate_reference_tables() 2020-04-15 11:06:18 -07:00
Marco Slot 8b83306a27 Issue worker messages with the same log level 2020-04-14 21:08:25 +02:00
Onder Kalaci aa6b641828 Throttle connections to the worker nodes
With this commit, we're introducing a new infrastructure to throttle
connections to the worker nodes. This infrastructure is useful for
multi-shard queries, router queries are have not been affected by this.

The goal is to prevent establishing more than citus.max_shared_pool_size
number of connections per worker node in total, across sessions.

To do that, we've introduced a new connection flag OPTIONAL_CONNECTION.
The idea is that some connections are optional such as the second
(and further connections) for the adaptive executor. A single connection
is enough to finish the distributed execution, the others are useful to
execute the query faster. Thus, they can be consider as optional connections.
When an optional connection is not allowed to the adaptive executor, it
simply skips it and continues the execution with the already established
connections. However, it'll keep retrying to establish optional
connections, in case some slots are open again.
2020-04-14 10:27:48 +02:00
SaitTalhaNisanci 3dc7cad754
use an enum for local execution status (#3733)
We have two variables that are related to local execution status.
TransactionAccessedLocalPlacement and
TransactionConnectedToLocalGroup. Only one of these fields should be
set, however we didn't have any check for this contraint and it was
error prone.

What those two variables are used is that we are trying to understand if
we should use local execution, the current session, or if we should be
using a connection to execute the current query, therefore the tasks. In
the enum, now it is more clear what these variables mean.

Also, now we have a method to change the local execution status. The
method will error if we are trying to transition from a state to a wrong
state. This will help us avoid problems.
2020-04-09 19:11:04 +03:00
SaitTalhaNisanci fa88046ce1
test that we don't leak intermediate schemas (#3737)
* test that we don't leak intermediate schemas

We have tests to make sure that we don't intermediate any intermediate
files, tables etc but we don't test if we are leaking schemas. It makes
sense to test this as well.

* remove all repartition schemas in case of error

This solution is not an ideal one but it seems to be doing the job.
We should have a more generic solution for the cleanup but it seems that
putting the cleanup in the abort handler is dangerous and it was
crashing.
2020-04-09 12:17:41 +03:00
Marco Slot 924cd7343a Defer reference table replication to shard creation time 2020-04-08 12:41:36 -07:00
SaitTalhaNisanci ba01f3457a
use macros for pg versions instead of hardcoded values (#3694)
3 Macros are defined for removing the hardcoded pg versions.
PG_VERSION_11, PG_VERSION_12 and PG_VERSION_13.
2020-04-01 17:01:52 +03:00
Onur Tirtir 8ebb8ef31d use PG_USED_FOR_ASSERTS_ONLY 2020-03-25 11:01:33 +03:00