Commit Graph

1413 Commits (fc09e1cfdcb4619544c6f356b14a39f766c8b718)

Author SHA1 Message Date
SaitTalhaNisanci 8c3f85692d
Not consider old placements when disabling or removing a node (#4960)
* Not consider old placements when disabling or removing a node

* update cluster test
2021-05-28 22:38:20 +02:00
SaitTalhaNisanci a4944a2102
Rename CoordinatedTransactionShouldUse2PC (#4995) 2021-05-21 18:57:42 +03:00
Hanefi Onaldi 878513f325
Remove all occurences of replication_model GUC 2021-05-21 16:14:59 +03:00
Jelte Fennema 10f06ad753 Fetch shard size on the fly for the rebalance monitor
Without this change the rebalancer progress monitor gets the shard sizes
from the `shardlength` column in `pg_dist_placement`. This column needs to
be updated manually by calling `citus_update_table_statistics`.
However, `citus_update_table_statistics` could lead to distributed
deadlocks while database traffic is on-going (see #4752).

To work around this we don't use `shardlength` column anymore. Instead
for every rebalance we now fetch all shard sizes on the fly.

Two additional things this does are:
1. It adds tests for the rebalance progress function.
2. If a shard move cannot be done because a source or target node is
   unreachable, then we error in stop the rebalance, instead of showing
   a warning and continuing. When using the by_disk_size rebalance
   strategy it's not safe to continue with other moves if a specific
   move failed. It's possible that the failed move made space for the
   next move, and because the failed move never happened this space now
   does not exist.
3. Adds two new columns to the result of `get_rebalancer_progress` which
   shows the size of the shard on the source and target node.

Fixes #4930
2021-05-20 16:38:17 +02:00
Nils Dijk a6c2d2a4c4
Feature: alter database owner (#4986)
DESCRIPTION: Add support for ALTER DATABASE OWNER

This adds support for changing the database owner. It achieves this by marking the database as a distributed object. By marking the database as a distributed object it will look for its dependencies and order the user creation commands (enterprise only) before the alter of the database owner. This is mostly important when adding new nodes.

By having the database marked as a distributed object it can easily understand for which `ALTER DATABASE ... OWNER TO ...` commands to propagate by resolving the object address of the database and verifying it is a distributed object, and hence should propagate changes of owner ship to all workers.

Given the ownership of the database might have implications on subsequent commands in transactions we force sequential mode for transactions that have a `ALTER DATABASE ... OWNER TO ...` command in them. This will fail the transaction with meaningful help when the transaction already executed parallel statements.

By default the feature is turned off since roles are not automatically propagated, having it turned on would cause hard to understand errors for the user. It can be turned on by the user via setting the `citus.enable_alter_database_owner`.
2021-05-20 13:27:44 +02:00
Onder Kalaci d07db99ea4 Make sure that target node in shard moves is eligable for shard move 2021-05-20 10:51:01 +02:00
Onder Kalaci 926069a859 Wait until all connections are successfully established
Comment from the code:
/*
 * Iterate until all the tasks are finished. Once all the tasks
 * are finished, ensure that that all the connection initializations
 * are also finished. Otherwise, those connections are terminated
 * abruptly before they are established (or failed). Instead, we let
 * the ConnectionStateMachine() to properly handle them.
 *
 * Note that we could have the connections that are not established
 * as a side effect of slow-start algorithm. At the time the algorithm
 * decides to establish new connections, the execution might have tasks
 * to finish. But, the execution might finish before the new connections
 * are established.
 */

 Note that the abruptly terminated connections lead to the following errors:

2020-11-16 21:09:09.800 CET [16633] LOG:  could not accept SSL connection: Connection reset by peer
2020-11-16 21:09:09.872 CET [16657] LOG:  could not accept SSL connection: Undefined error: 0
2020-11-16 21:09:09.894 CET [16667] LOG:  could not accept SSL connection: Connection reset by peer

To easily reproduce the issue:

- Create a single node Citus
- Add the coordinator to the metadata
- Create a distributed table with shards on the coordinator
- f.sql:  select count(*) from test;
- pgbench -f /tmp/f.sql postgres -T 12 -c 40 -P 1  or pgbench -f /tmp/f.sql postgres -T 12 -c 40 -P 1 -C
2021-05-19 15:59:13 +02:00
Onder Kalaci 995adf1a19 Executor takes connection establishment and task execution costs into account
With this commit, the executor becomes smarter about refrain to open
new connections. The very basic example is that, if the connection
establishments take 1000ms and task executions as 5 msecs, the executor
becomes smart enough to not establish new connections.
2021-05-19 15:48:07 +02:00
Marco Slot 644b266dee Only cache local plans when reusing a distributed plan 2021-05-18 16:11:43 +02:00
SaitTalhaNisanci eaa7d2bada
Not block maintenance daemon (#4972)
It was possible to block maintenance daemon by taking an SHARE ROW
EXCLUSIVE lock on pg_dist_placement. Until the lock is released
maintenance daemon would be blocked.

We should not block the maintenance daemon under any case hence now we
try to get the pg_dist_placement lock without waiting, if we cannot get
it then we don't try to drop the old placements.
2021-05-17 03:22:35 -07:00
Nils Dijk c91f8d8a15
Feature: localhost guc (#4836)
DESCRIPTION: introduce `citus.local_hostname` GUC for connections to the current node

Citus once in a while needs to connect to itself for some systems operations. This used to be hardcoded to `localhost`. The hardcoded hostname causes some issues, for example in environments where `sslmode=verify-full` is required. It is not always desirable or even feasible to get `localhost` as an alt name on the certificate.

By introducing a GUC to use when connecting to the current instance the user has more control what network path is used and what hostname is required to be present in the server certificate.
2021-05-12 16:59:44 +02:00
Jelte Fennema cbbd10b974
Implement an improvement threshold in the rebalancer (#4927)
Every move in the rebalancer algorithm results in an improvement in the
balance. However, even if the improvement in the balance was very small
the move was still chosen. This is especially problematic if the shard
itself is very big and the move will take a long time.

This changes the rebalancer algorithm to take the relative size of the
balance improvement into account when choosing moves. By default a move
will not be chosen if it improves the balance by less than half of the
size of the shard. An extra argument is added to the rebalancer
functions so that the user can decide to lower the default threshold if
the ignored move is wanted anyway.
2021-05-11 14:24:59 +02:00
Onder Kalaci a231ff29b0 Get prepared for some improvements for online rebalancer
To see all the changes, see https://github.com/citusdata/citus-enterprise/pull/586/files
2021-05-10 19:54:31 +02:00
Onur Tirtir 4f3c672ebe Re-consider VALID_ITEMPOINTER_OFFSETS wrt bitmap scan logic 2021-05-10 20:16:50 +03:00
Onur Tirtir 0f4c97e0d0 Improve the constants around row number mapping 2021-05-10 20:16:50 +03:00
Onur Tirtir 2e419ea177 Add first_row_number column to columnar.stripe for tid mapping 2021-05-10 20:16:50 +03:00
jeff-davis 7b9aecff21 Columnnar: metapage changes. (#4907)
* Columnar: introduce columnar storage API.

This new API is responsible for the low-level storage details of
columnar; translating large reads and writes into individual block
reads and writes that respect the page headers and emit WAL. It's also
responsible for the columnar metapage, resource reservations (stripe
IDs, row numbers, and data), and truncation.

This new API is not used yet, but will be used in subsequent
forthcoming commits.

* Columnar: add columnar_storage_info() for debugging purposes.

* Columnar: expose ColumnarMetadataNewStorageId().

* Columnar: always initialize metapage at creation time.

This avoids the complexity of dealing with tables where the metapage
has not yet been initialized.

* Columnar: columnar storage upgrade/downgrade UDFs.

Necessary upgrade/downgrade step so that new code doesn't see an old
metapage.

* Columnar: improve metadata.c comment.

* Columnar: make ColumnarMetapage internal to the storage API.

Callers should not have or need direct access to the metapage.

* Columnar: perform resource reservation using storage API.

* Columnar: implement truncate using storage API.

* Columnar: implement read/write paths with storage API.

* Columnar: add storage tests.

* Revert "Columnar: don't include stripe reservation locks in lock graph."

This reverts commit c3dcd6b9f8.

No longer needed because the columnar storage API takes care of
concurrency for resource reservation.

* Columnar: remove unnecessary lock when reserving.

No longer necessary because the columnar storage API takes care of
concurrent resource reservation.

* Add simple upgrade tests for storage/ branch

* fix multi_extension.out

Co-authored-by: Onur Tirtir <onurcantirtir@gmail.com>
2021-05-10 20:16:46 +03:00
SaitTalhaNisanci 6b1904d37a
When moving a shard to a new node ensure there is enough space (#4929)
* When moving a shard to a new node ensure there is enough space

* Add WairForMiliseconds time utility

* Add more tests and increase readability

* Remove the retry loop and use a single udf for disk stats

* Address review

* address review

Co-authored-by: Jelte Fennema <github-tech@jeltef.nl>
2021-05-06 17:28:02 +03:00
Jelte Fennema 2f29d4e53e Continue to remove shards after first failure in DropMarkedShards
The comment of DropMarkedShards described the behaviour that after a
failure we would continue trying to drop other shards. However the code
did not do this and would stop after the first failure. Instead of
simply fixing the comment I fixed the code, because the described
behaviour is more useful. Now a single shard that cannot be removed yet
does not block others from being removed.
2021-04-30 15:42:09 +03:00
Marco Slot 4b49cb112f Fix FROM ONLY queries on partitioned tables 2021-04-27 16:10:07 +02:00
Onder Kalaci 918838e488 Allow constant VALUES clauses in pushdown queries
As long as the VALUES clause contains constant values, we should not
recursively plan the queries/CTEs.

This is a follow-up work of #1805. So, we can easily apply OUTER join
checks as if VALUES clause is a reference table/immutable function.
2021-04-21 14:28:08 +02:00
SaitTalhaNisanci 93c2dcf3d2
Fix data-race with concurrent calls of DropMarkedShards (#4909)
* Fix problews with concurrent calls of DropMarkedShards

When trying to enable `citus.defer_drop_after_shard_move` by default it
turned out that DropMarkedShards was not safe to call concurrently.
This could especially cause big problems when also moving shards at the
same time. During tests it was possible to trigger a state where a shard
that was moved would not be available on any of the nodes anymore after
the move.

Currently DropMarkedShards is only called in production by the
maintenaince deamon. Since this is only a single process triggering such
a race is currently impossible in production settings. In future changes
we will want to call DropMarkedShards from other places too though.

* Add some isolation tests

Co-authored-by: Jelte Fennema <github-tech@jeltef.nl>
2021-04-21 10:59:48 +03:00
Ahmet Gedemenli 33c620f232
Optimize partitioned disk size calculation (#4905)
* Optimize partitioned disk size calculation

* Polish

* Fix test for citus_shard_cost_by_disk_size

Try optimizing if not CSTORE
2021-04-19 13:30:56 +03:00
Onder Kalaci 5482d5822f Keep more statistics about connection establishment times
When DEBUG4 enabled, Citus now prints per connection establishment
time.
2021-04-16 14:56:31 +02:00
Hanefi Onaldi 9919fbe3f8 Switch to sequential mode on long partition names
This commit adds support for long partition names for distributed tables:
- ALTER TABLE dist_table ATTACH PARTITION ..
- CREATE TABLE .. PARTITION OF dist_table ..

Note: create_distributed_table UDF does not support long table and
partition names, and is not covered in this commit
2021-04-14 15:27:50 +03:00
Onur Tirtir fe5c985e1d
Remove HAS_TABLEAM config since we dropped pg11 support (#4862)
* Remove HAS_TABLEAM config

* Drop columnar_ensure_objects_exist

* Not call columnar_ensure_objects_exist in citus_finish_pg_upgrade
2021-04-13 10:51:26 +03:00
Ahmet Gedemenli d74d358a45
Refactor size queries with new enum SizeQueryType (#4898)
* Refactor size queries with new enum SizeQueryType

* Polish
2021-04-12 17:14:29 +03:00
SaitTalhaNisanci b453563e88
Warm up connections params hash (#4872)
ConnParams(AuthInfo and PoolInfo) gets a snapshot, which will block the
remote connectinos to localhost. And the release of snapshot will be
blocked by the snapshot. This leads to a deadlock.

We warm up the conn params hash before starting a new transaction so
that the entries will already be there when we start a new transaction.
Hence GetConnParams will not get a snapshot.
2021-04-12 13:08:38 +03:00
Halil Ozan Akgul a5038046f9 Adds shard_count parameter to create_distributed_table 2021-03-29 16:22:49 +03:00
SaitTalhaNisanci 03832f353c Drop postgres 11 support 2021-03-25 09:20:28 +03:00
Marco Slot fbc2147e11 Replace MAX_PUT_COPY_DATA_BUFFER_SIZE by citus.remote_copy_flush_threshold GUC 2021-03-16 06:00:38 +01:00
Marco Slot 1646fca445 Add GUC to set maximum connection lifetime 2021-03-16 01:57:57 +01:00
jeff-davis 3b12556401
Columnar: cleanup (#4814)
* Columnar: fix misnamed file.

* Columnar: make compression not dependent on columnar.h.

* Columnar: rename columnar_metadata_tables.c to columnar_metadata.c.

* Columnar: make customscan not depend on columnar.h.

Co-authored-by: Jeff Davis <jefdavi@microsoft.com>
2021-03-15 11:34:39 -07:00
Onder Kalaci e65e72130d Rename use -> shouldUse
Because setting the flag doesn't necessarily mean that we'll
use 2PC. If connections are read-only, we will not use 2PC.
In other words, we'll use 2PC only for connections that modified
any placements.
2021-03-12 08:29:43 +00:00
Onder Kalaci 6a7ed7b309 Do not trigger 2PC for reads on local execution
Before this commit, Citus used 2PC no matter what kind of
local query execution happens.

For example, if the coordinator has shards (and the workers as well),
even a simple SELECT query could start 2PC:
```SQL

WITH cte_1 AS (SELECT * FROM test LIMIT 10) SELECT count(*) FROM cte_1;
```

In this query, the local execution of the shards (and also intermediate
result reads) triggers the 2PC.

To prevent that, Citus now distinguishes local reads and local writes.
And, Citus switches to 2PC only if a modification happens. This may
still lead to unnecessary 2PCs when there is a local modification
and remote SELECTs only. Though, we handle that separately
via #4587.
2021-03-12 08:29:43 +00:00
Onder Kalaci d1cd198655 Prevent infinite recursion for queries that involve UNION ALL and JOIN
With this commit, we make sure to prevent infinite recursion for queries
in the format: [subquery with a UNION ALL] JOIN [table or subquery]

Also, fixes a bug where we pushdown UNION ALL below a JOIN even if the
UNION ALL is not safe to pushdown.
2021-03-03 12:27:26 +01:00
Hadi Moshayedi 1a05131331
Use chunk groups to read columnar data (#4768) 2021-03-02 23:53:24 -08:00
Naisila Puka 2f30614fe3
Reimplement citus_update_table_statistics to detect dist. deadlocks (#4752)
* Reimplement citus_update_table_statistics

* Update stats for the given table not colocation group

* Add tests for reimplemented citus_update_table_statistics

* Use coordinated transaction, merge with citus_shard_sizes functions

* Update the old master_update_table_statistics as well
2021-03-03 04:12:30 +03:00
jeff-davis 9da9bd3dfd
Columnar: rename files and tests. (#4751)
* Columnar: rename files and tests.

* Columnar: Rename Table*State to Columnar*State.
2021-03-01 08:34:24 -08:00
SaitTalhaNisanci feee25dfbd
Use translated vars in postgres 13 as well (#4746)
* Use translated vars in postgres 13 as well

Postgres 13 removed translated vars with pg 13 so we had a special logic
for pg 13. However it had some bug, so now we copy the translated vars
before postgres deletes it. This also simplifies the logic.

* fix rtoffset with pg >= 13
2021-02-26 19:41:29 +03:00
Naisila Puka 5ebd4eac7f
Preserve colocation with procedures in alter_distributed_table (#4743) 2021-02-25 19:52:47 +03:00
Hanefi Onaldi 9a792ef841 Remove length limitations for table renames 2021-02-24 03:35:27 +03:00
Onur Tirtir 495096ef5e
Remove useless pg version checks (#4741) 2021-02-23 21:20:18 +03:00
SaitTalhaNisanci bcbd24f8de
Only consider pseudo constants for shortcuts (#4712)
It seems that we need to consider only pseudo constants while doing some
shortcuts in planning. For example there could be a false clause but it
can contribute to the result in which case it will not be a pseudo
constant.
2021-02-15 18:39:37 +03:00
Onder Kalaci f297c96ec5 Add regression tests for COPY into colocated intermediate results
To add the tests without too much data, make the copy switchover
configurable.
2021-02-11 15:41:06 +01:00
Ahmet Gedemenli c8e83d1f26 Fix dropping fkey when distributing table 2021-02-11 15:48:35 +03:00
Hadi Moshayedi c3dcd6b9f8 Columnar: don't include stripe reservation locks in lock graph. 2021-02-10 10:20:20 -08:00
Hadi Moshayedi c8d61a31e2 Columnar: chunk_group metadata table 2021-02-09 14:11:58 -08:00
Onder Kalaci c804c9aa21 Allow local execution for intermediate results in COPY
When COPY is used for copying into co-located files, it was
not allowed to use local execution. The primary reason was
Citus treating co-located intermediate results as co-located
shards, and COPY into the distributed table was done via
"format result". And, local execution of such COPY commands
was not implemented.

With this change, we implement support for local execution with
"format result". To do that, we use the buffer for every file
on shardState->copyOutState, similar to how local copy on
shards are implemented. In fact, the logic is similar to
local copy on shards, but instead of writing to the shards,
Citus writes the results to a file.

The logic relies on LOCAL_COPY_FLUSH_THRESHOLD, and flushes
only when the size exceeds the threshold. But, unlike local
copy on shards, in this case we write the headers and footers
just once.
2021-02-09 15:00:06 +01:00
Jeff Davis 2ea31c899e Columnar: make read and write state private. 2021-02-08 10:11:57 -08:00
Hadi Moshayedi eff8cffaf3
Columnar: improve naming of limit config variables. (#4653)
* Rename chunk_row_count to chunk_group_row_limit

* Rename stripe_row_count to stripe_row_limit

* Undo couple of renames
2021-02-06 09:04:04 -08:00
Hadi Moshayedi 0a9fd91d8f Use 'Chunk Groups' in EXPLAIN ANALYZE of columnar scan 2021-02-05 10:58:01 -08:00
Ahmet Gedemenli 5dd2a3da03 Convert RelabelTypes into CollateExprs in get_rule_expr function 2021-02-05 12:06:46 +03:00
Onder Kalaci fc9a23792c COPY uses adaptive connection management on local node
With #4338, the executor is smart enough to failover to
local node if there is not enough space in max_connections
for remote connections.

For COPY, the logic is different. With #4034, we made COPY
work with the adaptive connection management slightly
differently. The cause of the difference is that COPY doesn't
know which placements are going to be accessed hence requires
to get connections up-front.

Similarly, COPY decides to use local execution up-front.

With this commit, we change the logic for COPY on local nodes:

Try to reserve a connection to local host. This logic follows
the same logic (e.g., citus.local_shared_pool_size) as the
executor because COPY also relies on TryToIncrementSharedConnectionCounter().
If reservation to local node fails, switch to local execution
Apart from this, if local execution is disabled, we follow the
exact same logic for multi-node Citus. It means that if we are
out of the connection, we'd give an error.
2021-02-04 09:45:07 +01:00
Sait Talha Nisanci 9ba3f70420 Remove unused method 2021-02-03 20:02:03 +03:00
Onur Tirtir 3a403090fd
Disallow adding local table with identity column to metadata (#4633)
pg_get_tableschemadef_string doesn't know how to deparse identity
columns so we cannot reflect those columns when creating shell
relation.
For this reason, we don't allow adding local tables -having identity cols-
to metadata.
2021-02-03 19:05:17 +03:00
Onur Tirtir 93c3f30024 Rename ExtractColumnsOwningSequences 2021-02-02 18:17:42 +03:00
Hanefi Önaldı cab17afce9 Introduce UDFs for fixing partitioned table constraint names 2021-01-29 17:32:20 +03:00
SaitTalhaNisanci 738825cc38
Fix partition column index issue (#4591)
* Fix partition column index issue

We send column names to worker_hash/range_partition_table methods, and
in these methods we check the column name index from tuple descriptor.
Then this index is used to decide the bucket that the current row will
be sent for the repartition.

This becomes a problem when there are the same column names in the
tupleDescriptor. Then we can choose the wrong index. Hence the
partitioned data will be put to wrong workers. Then the result could
miss some data because workers might contain different range of data.

An example:
TupleDescriptor contains "trip_id", "car_id", "car_id" for one table.
It contains only "car_id" for the other table. And assuming that the
tables will be partitioned by car_id, it is not certain what should be
used for deciding the bucket number for the first table. Assuming value
2 goes to bucket 2 and value 3 goes to bucket 3, it is not certain which
bucket "1 2 3" (trip_id, car_id, car_id)  row will go to.

As a solution we send the index of partition column in targetList
instead of the column name.

The old API is kept so that if workers upgrade work, it still works
(though it will have the same bug)

* Use the same method so that backporting is easier
2021-01-29 14:40:40 +03:00
Onur Tirtir 2f30be823e Rename create_citus_local_table to citus_add_local_table_to_metadata
For simplicity in downgrade test in multi_extension, didn't
actually remove create_citus_local_table udf.
2021-01-27 15:52:36 +03:00
Onur Tirtir 458a81f93d Add suppressNoticeMessages to TableConversionState 2021-01-27 12:53:58 +03:00
Naisila Puka 94bc2703bc
Make undistribute_table() and citus_create_local_table() work with columnar (#4563)
* Make undistribute_table() and citus_create_local_table() work with columnar

* Rename and use LocallyExecuteUtilityTask for UDF check

* Remove 'local' references in ExecuteUtilityCommand
2021-01-27 01:17:20 +03:00
Halil Ozan Akgul bafa692fc1 Adds error messages with names of indexes that will be dropped 2021-01-26 18:18:26 +03:00
Jeff Davis d62e54dc09 Columnar: optimize write path. 2021-01-25 11:47:21 -08:00
Hadi Moshayedi 639952ffa8 Read chunk row count from catalog tables 2021-01-25 08:53:52 -08:00
Onur Tirtir b5ea033a0b Convert postgres tables to citus local when creating reference table having fkeys 2021-01-25 11:02:50 +03:00
Onur Tirtir 253c19062a
Rename IsCitusInitiatedBackend to IsCitusInitiatedRemoteBackend (#4562) 2021-01-23 01:07:43 +03:00
Jeff Davis 53f7b019d5 Columnar: clean up old references to cstore. 2021-01-22 11:08:36 -08:00
Onur Tirtir 941c8fbf32
Automatically undistribute citus local tables when no more fkeys with reference tables (#4538) 2021-01-22 18:15:41 +03:00
Ahmet Gedemenli 887b67953b
Merge branch 'master' into fix-bug-create-citus-local-table-with-stats 2021-01-22 12:46:47 +03:00
Hadi Moshayedi 222fb4d589 Don't use 'cstore' in function names 2021-01-21 18:32:21 -08:00
jeff-davis 0b5551faaf
Columnar: add explain info for chunk filtering (#4554)
Co-authored-by: Jeff Davis <jefdavi@microsoft.com>
2021-01-21 15:04:42 -08:00
Ahmet Gedemenli 2fa060a32d Fix bug creating citus local table with stats 2021-01-20 17:17:13 +03:00
Onder Kalaci 8df58926c5 Rename CitusProcessUtility -> ProcessUtilityForNode 2021-01-20 15:54:00 +03:00
Hadi Moshayedi bc01c795a2 Reland #4419 2021-01-19 07:48:47 -08:00
Halil Ozan Akgul 27c2bd1599 Moves creation of ALTER INDEX STATISTICS commands next to index commands 2021-01-18 16:55:53 +03:00
Onur Tirtir f1ecbc3a53
Fix segfault when adding/dropping fkey from ref to citus local via remote exec (#4528) 2021-01-17 20:43:33 +03:00
Onder Kalaci c35e22d75d Skip validation for foreign key creation commands
For certaion purposes, we drop and recreate the foreign
keys. As we acquire exclusive locks on the tables in between
drop and re-create, we can safely skip validation phase of
the foreign keys. The reason is purely being performance as
foreign key validation could take a long value.
2021-01-15 18:04:52 +03:00
Onder Kalaci ae0b92233d Rename function 2021-01-15 18:04:52 +03:00
Onder Kalaci 30d0a65f40 Adds citus.enable_local_reference_table_foreign_keys
When enabled any foreign keys between local tables and reference
tables supported by converting the local table to a citus local
table.

When the coordinator is not in the metadata, the logic is disabled
as foreign keys are not allowed in this configuration.
2021-01-15 18:04:52 +03:00
Halil Ozan Akgul 9407965817 Moves struct to the header 2021-01-15 11:50:11 +03:00
Onur Tirtir 36b418982f Add support for ALTER TABLE commands defining foreign keys 2021-01-14 17:12:00 +03:00
Onur Tirtir 05931b8fe2 Pass ProcessUtilityContext to .preprocess 2021-01-14 17:12:00 +03:00
jeff-davis 9cffd41389
Cleanup: use table_open, not heap_open. (#4506)
Co-authored-by: Jeff Davis <jefdavi@microsoft.com>
2021-01-13 12:08:46 -08:00
jeff-davis b49beda4c3
Stronger check for triggers on columnar tables (#4493). (#4494)
* Stronger check for triggers on columnar tables (#4493).

Previously, we used a simple ProcessUtility_hook. Change to use an
object_access_hook instead.

* Replace alter_table_set_access_method test on partition with foreign key

Co-authored-by: Jeff Davis <jefdavi@microsoft.com>
Co-authored-by: Marco Slot <marco.slot@gmail.com>
2021-01-13 10:30:53 -08:00
Onur Tirtir 00da1eed20 Some refactor as a preparation 2021-01-13 16:50:09 +03:00
Halil Ozan Akgul 2be14cce2e Adds alter_distributed_table and alter_table_set_access_method UDFs 2021-01-13 16:02:39 +03:00
SaitTalhaNisanci 724d56f949
Add citus shard helper view (#4361)
With citus shard helper view, we can easily see:
- where each shard is, which node, which port
- what kind of table it belongs to
- its size

With such a view, we can see shards that have a size bigger than some
value, which could be useful. Also debugging can be easier in production
as well with this view.

Fetch shards in one go per node

The previous implementation was slow because it would do a lot of round
trips, one per shard to be exact. Hence it is improved so that we fetch
all the shard_name, shard-size pairs per node in one go.

Construct shards_names, sizes query on coordinator
2021-01-13 13:58:47 +03:00
Ahmet Gedemenli 436c9d9d79
Remove the word 'master' from Citus UDFs (#4472)
* Replace master_add_node with citus_add_node

* Replace master_activate_node with citus_activate_node

* Replace master_add_inactive_node with citus_add_inactive_node

* Use master udfs in old scripts

* Replace master_add_secondary_node with citus_add_secondary_node

* Replace master_disable_node with citus_disable_node

* Replace master_drain_node with citus_drain_node

* Replace master_remove_node with citus_remove_node

* Replace master_set_node_property with citus_set_node_property

* Replace master_unmark_object_distributed with citus_unmark_object_distributed

* Replace master_update_node with citus_update_node

* Replace master_update_shard_statistics with citus_update_shard_statistics

* Replace master_update_table_statistics with citus_update_table_statistics

* Rename master_conninfo_cache_invalidate to citus_conninfo_cache_invalidate

Rename master_dist_local_group_cache_invalidate to citus_dist_local_group_cache_invalidate

* Replace master_copy_shard_placement with citus_copy_shard_placement

* Replace master_move_shard_placement with citus_move_shard_placement

* Rename master_dist_node_cache_invalidate to citus_dist_node_cache_invalidate

* Rename master_dist_object_cache_invalidate to citus_dist_object_cache_invalidate

* Rename master_dist_partition_cache_invalidate to citus_dist_partition_cache_invalidate

* Rename master_dist_placement_cache_invalidate to citus_dist_placement_cache_invalidate

* Rename master_dist_shard_cache_invalidate to citus_dist_shard_cache_invalidate

* Drop master_modify_multiple_shards

* Rename master_drop_all_shards to citus_drop_all_shards

* Drop master_create_distributed_table

* Drop master_create_worker_shards

* Revert old function definitions

* Add missing revoke statement for citus_disable_node
2021-01-13 12:10:43 +03:00
Onur Tirtir dd55ab394e
Disallow cascade_via_foreign_keys if any partition rel has non-inherited fkeys (#4487) 2021-01-11 21:50:09 +03:00
Marco Slot d900a7336e Automatically add placeholder record for coordinator 2021-01-08 15:09:53 +01:00
Marco Slot 597533b1ff Add citus_set_coordinator_host 2021-01-08 13:36:26 +01:00
Onur Tirtir 5289785da4
Add cascade_via_foreign_keys option to create_citus_local_table (#4462) 2021-01-08 15:13:26 +03:00
Marco Slot 011283122b Add the shard rebalancer implementation 2021-01-07 16:51:55 +01:00
Onur Tirtir f3801143fb Add cascade option to undistribute_table 2021-01-07 15:41:49 +03:00
Onur Tirtir 2e3e680ba9 Add infra to cascade citus table functions 2021-01-07 15:41:48 +03:00
Marco Slot 47c1b19174 Revert "Do metadata sync in a separate background worker."
This reverts commit 4df723cf9b.
2021-01-07 10:30:04 +01:00
Marco Slot d9f175532b Revert "Trigger metadata sync at transaction commit"
This reverts commit a2c73bef27.
2021-01-07 10:30:00 +01:00
Marco Slot 5de3337b2f Support local execution for INSERT..SELECT with re-partitioning 2021-01-06 16:15:53 +01:00
Onur Tirtir e91e745dbc
Implement ConstraintWithNameIsOfType (#4451) 2020-12-29 11:53:06 +03:00
Onur Tirtir 04a4167a8a Implement GetPgDependTuplesForDependingObjects 2020-12-25 18:03:28 +03:00
Hadi Moshayedi a2c73bef27 Trigger metadata sync at transaction commit 2020-12-24 08:28:38 -08:00
Hadi Moshayedi 4df723cf9b Do metadata sync in a separate background worker. 2020-12-24 08:25:55 -08:00
Ahmet Gedemenli 48ca1637a4 Propagate alter stats owner 2020-12-24 17:10:12 +03:00
Ahmet Gedemenli f7c70f9a63 Propagate alter stats target 2020-12-24 17:10:12 +03:00
Ahmet Gedemenli 5a1607b6c0 Propagate alter stats schema 2020-12-24 17:10:12 +03:00
Ahmet Gedemenli bdce4a7e67 Propagate rename statistics 2020-12-24 17:10:12 +03:00
Onur Tirtir 5ed9197041
Implement infra to get foreign key connected relations (#4439)
On top of our foreign key graph, implement the infrastructure to get
list of relations that are connected to input relation via a foreign key
graph.
We need this to support cascading create_citus_local_table &
undistribute_table operations.

Also add regression tests to see what our foreign key graph is able to
capture currently.
2020-12-24 16:42:40 +03:00
Halil Ozan Akgül 9fd3f62cb6
Refactor foreign key functions to use table types (#4424)
* Reuses extractReferencing/Referenced variables

* Refactors GetForeignKeyOids function to check table types

* Converts flags to inclusive
2020-12-23 17:05:09 +03:00
Onur Tirtir d1b3eaf767
Refactor ColumnAppearsInForeignKeyToReferenceTable (#4441) 2020-12-23 11:44:02 +03:00
Ahmet Gedemenli 874fa1fc09 Propagate Drop Statistics 2020-12-22 18:34:46 +03:00
Marco Slot f2056e553f
Expose partition column of subqueries in optimizer (#4355)
Co-authored-by: Marco Slot <marco.slot@gmail.com>
2020-12-18 20:32:52 +01:00
Ahmet Gedemenli 770d3da1ca Add dependencies for stat schemas 2020-12-18 17:04:13 +03:00
Ahmet Gedemenli 6c0465566a Propagate create statistics 2020-12-17 20:38:36 +03:00
Marco Slot 100e5d3196 Address review feedback 2020-12-15 15:23:38 +01:00
Sait Talha Nisanci 7951273f74 Refactor WrapRteRelationIntoSubquery 2020-12-15 18:18:36 +03:00
Sait Talha Nisanci 0e53aa5d3b Add more tests 2020-12-15 18:18:36 +03:00
Sait Talha Nisanci f7c1509fed Not check if the query is routable for converting
It seems that there are only very few cases where that is useful, and
for now we prefer not having that check. This means that we might
perform some unnecessary checks, but that should be rare and not
performance critical.
2020-12-15 18:18:36 +03:00
Sait Talha Nisanci 1d82972ff4 Increase the performance with a trick
Instead of sending NULL's over a network, we now convert the subqueries
in the form of:

SELECT t.a, NULL, NULL FROM (SELECT a FROM table)t;

And we recursively plan the inner part so that we don't send the NULL's
over network. We still need the NULLs in the outer subquery because we
currently don't have an easy way of updating all the necessary places in
the query.

Add some documentation for how the conversion is done
2020-12-15 18:18:36 +03:00
Sait Talha Nisanci 3aed6c3ad0 Rename containsOnlyLocalTable as isLocalTableModification
Update error message in Modify View
2020-12-15 18:18:36 +03:00
Sait Talha Nisanci 5618f3a3fc Use BaseRestrictInfo for finding equality columns
Baseinfo also has pushed down filters etc, so it makes more sense to use
BaseRestrictInfo to determine what columns have constant equality
filters.

Also RteIdentity is used for removing conversion candidates instead of
rteIndex.
2020-12-15 18:18:36 +03:00
Sait Talha Nisanci 28c5b6a425 Convert some hard coded errors to deferred errors in router planner 2020-12-15 18:18:36 +03:00
Sait Talha Nisanci 69992d58f9 Add broken local-dist table modifications tests
It seems that most of the updates were broken, we weren't aware of it
because there wasn't any data in the tables. They are broken mostly
because local tables do not have a shard id and some code paths should
be updated with that information, currently when there is an invalid
shard id, it is assumed to be pruned.

Consider local tables in router planner

In case there is a local table, the shard id will not be valid and there
are some checks that rely on shard id, we should skip these in case of
local tables, which is handled with a dummy placement.

Add citus local table dist table join tests

add local-dist table mixed joins tests
2020-12-15 18:18:36 +03:00
Sait Talha Nisanci a34504d7bf Move recursive planning related function to recursive_planning 2020-12-15 18:17:10 +03:00
Sait Talha Nisanci 2a44029aaf Simplify ContainsTableToBeConvertedToSubquery
AllDataLocallyAccessible and ContainsLocalTableSubqueryJoin are removed.
We can possibly remove ModifiesLocalTableWithRemoteCitusLocalTable as
well. Though this removal has a side effect that now when all the data
is locally available, we could still wrap a relation into a subquery, I
guess that should be resolved in the router planner itself.

Add more tests
2020-12-15 18:17:10 +03:00
Sait Talha Nisanci 26d9f0b457 Use auto mode in tests and fix debug message 2020-12-15 18:17:10 +03:00
Sait Talha Nisanci eebcd995b3 Add some more tests 2020-12-15 18:17:10 +03:00
Sait Talha Nisanci 5693cabc41 Not convert an already routable plannable query
We should not recursively plan an already routable plannable query. An
example of this is (SELECT * FROM local JOIN (SELECT * FROM dist) d1
USING(a));

So we let the recursive planner do all of its work and at the end we
convert the final query to to handle unsupported joins. While doing each
conversion, we check if it is router plannable, if so we stop.

Only consider range table entries that are in jointree

If a range table is not in jointree then there is no point in
considering that because we are trying to convert range table entries to
subqueries for join use case.
2020-12-15 18:17:10 +03:00
Sait Talha Nisanci 2ff65f3630 Enable partitioned distributed tables in local-dist table joins 2020-12-15 18:17:10 +03:00
Sait Talha Nisanci 44953579cf Enable citus-local distributed table joins
Check equality in quals

We want to recursively plan distributed tables only if they have an
equality filter on a unique column. So '>' and '<' operators will not
trigger recursive planning of distributed tables in local-distributed
table joins.

Recursively plan distributed table only if the filter is constant

If the filter is not a constant then the join might return multiple rows
and there is a chance that the distributed table will return huge data.
Hence if the filter is not constant we choose to recursively plan the
local table.
2020-12-15 18:17:10 +03:00
Sait Talha Nisanci f3d55448b3 Choose distributed table if it has a unique index in filter
When doing local-distributed table joins we convert one of them to
subquery. The current policy is that we convert distributed tables to
subquery if it has a unique index on a column that has unique
index(primary key also has a unique index).
2020-12-15 18:17:10 +03:00
Onder Kalaci 3f4952cc2b Pushdown projections when relations are recursively planned
This is important to limit the data transfer size.
2020-12-15 18:17:10 +03:00
Onder Kalaci 594e001f3b Add filter pushdown regression tests
Also handle WHERE false
2020-12-15 18:17:10 +03:00
Onder Kalaci 7a4d6b2984 Handle modifications as well 2020-12-15 18:17:10 +03:00
Onder Kalaci 8f8390ed6e Recursively plan local table joins
The logical planner cannot handle joins between local and distributed table.
Instead, we can recursively plan one side of the join and let the logical
planner handle the rest.

Our algorithm is a little smart, trying not to recursively plan distributed
tables, but favors local tables.
2020-12-15 18:17:10 +03:00
Onder Kalaci 7cc25c9125 Add ability to fetch the restrictions per relation
With this commit, we add the ability to add restrictions
per relation. We simply rely on the restrictions that Postgres
keeps per relation.
2020-12-15 18:17:10 +03:00
Marco Slot f2538a456f Support co-located/recurring sublinks in the target list 2020-12-13 15:45:24 +01:00
Hadi Moshayedi 4668fe51a6 Columnar: Make compression level configurable 2020-12-09 08:48:50 -08:00
Hadi Moshayedi f5a4a4bc74 Columnar: Support zstd compression 2020-12-09 08:30:55 -08:00
Hadi Moshayedi 3f81ee26fd Columnar: Support LZ4 compression 2020-12-09 08:29:07 -08:00
Jeff Davis 3758e83850 Rename cstore->columnar in SQL objects and errors. 2020-12-07 13:01:53 -08:00
Ahmet Gedemenli 936775e8e3 Delete transactions when removing node
With this commit, we delete entries in pg_dist_transaction
for the primary nodes that are removed by `master_remove_node`.
2020-12-07 11:35:20 +03:00
Hadi Moshayedi 01da2a1c73 Columnar: track decompressed length in metadata 2020-12-04 09:09:39 -08:00
Hadi Moshayedi 4a9aebaa7b Columnar: rename block to chunk 2020-12-03 08:50:19 -08:00
SaitTalhaNisanci f164575524
Add a utility to process each table index (#4382)
A utility function is added so that each caller can implement a handler
for each index on a given table. This means that the caller doesn't need
to worry about how to access each index, the only thing that it needs to
do each to implement a function to which each index on the table is
passed iteratively.
2020-12-03 16:33:13 +03:00
Onder Kalaci c546ec5e78 Local node connection management
When Citus needs to parallelize queries on the local node (e.g., the node
executing the distributed query and the shards are the same), we need to
be mindful about the connection management. The reason is that the client
backends that are running distributed queries are competing with the client
backends that Citus initiates to parallelize the queries in order to get
a slot on the max_connections.

In that regard, we implemented a "failover" mechanism where if the distributed
queries cannot get a connection, the execution failovers the tasks to the local
execution.

The failover logic is follows:

- As the connection manager if it is OK to get a connection
	- If yes, we are good.
	- If no, we fail the workerPool and the failure triggers
	  the failover of the tasks to local execution queue

The decision of getting a connection is follows:

/*
 * For local nodes, solely relying on citus.max_shared_pool_size or
 * max_connections might not be sufficient. The former gives us
 * a preview of the future (e.g., we let the new connections to establish,
 * but they are not established yet). The latter gives us the close to
 * precise view of the past (e.g., the active number of client backends).
 *
 * Overall, we want to limit both of the metrics. The former limit typically
 * kics in under regular loads, where the load of the database increases in
 * a reasonable pace. The latter limit typically kicks in when the database
 * is issued lots of concurrent sessions at the same time, such as benchmarks.
 */
2020-12-03 14:16:13 +03:00
Hadi Moshayedi c2f60b6422
Columnar: pg_upgrade support (#4354) 2020-12-02 08:46:59 -08:00
Ahmet Gedemenli 514c6a76ac Propagate alter schema rename 2020-12-02 15:18:26 +03:00
Nils Dijk 6f9c040f76
DESCRIPTION: Propagate columnar table settings for distributed tables
When distributing a columnar table, as well as changing options on a distributed columnar table, this patch will forward the settings from the coordinator to the workers.

For propagating options changes on an already distributed table this change is pretty straight forward. Before applying the change in options locally we will create a `DDLJob` that contains a call to `alter_columnar_table_set(...)` for every shard placement with all settings of the current table. This goes both for setting an option as well as resetting. This will reset the values to the defaults configured on the coordinator. Having the effect that the coordinator is authoritative on the settings and makes sure the shards have the same settings set as the table on the coordinator.

When a columnar table is distributed it is using the `TableDDLCommand` infra structure to create a new kind of `TableDDLCommand`. This new type, called a `TableDDLCommandFunction` contains a context and 2 function pointers to execute. One function returns the command as applied on the table, the second function will return the sql command to apply to a shard with a given shard id. The schema name is ignored as it will use the fully qualified name of the shard in the same schema as the base table.
2020-12-02 13:02:42 +01:00
Onder Kalaci f7e1aa3f22 Multi-row INSERTs use local execution when placements are local
Multi-row execution already uses sequential execution. When shards
are local, using local execution is profitable as it avoids
an extra connection establishment to the local node.
2020-12-01 21:37:59 +03:00
Hadi Moshayedi a94e8c9cda
Associate column store metadata with storage id (#4347) 2020-11-30 18:01:43 -08:00
Onur Tirtir 7f3d1182ed
Handle invalid connection hash entries (#4362)
If MemoryContextAlloc errors out -e.g. during an OOM-, ConnectionHashEntry->connections
stays as NULL.

With this commit, we add isValid flag to ConnectionHashEntry that should be set to true
right after we allocate & initialize ConnectionHashEntry->connections list properly, and we
check it before accesing to ConnectionHashEntry->connections.
2020-11-30 19:44:03 +03:00
Nils Dijk 383e334023
refactor options to their own table linked to the regclass (#4346)
Columnar options were by accident linked to the relfilenode instead of the regclass/relation oid. This PR moves everything related to columnar options to their own catalog table.
2020-11-27 11:22:08 -08:00
Nils Dijk 326e6afa53
refactor table ddl events scoped for shards (#4342)
Refactor internals on how Citus creates the SQL commands it sends to recreate shards.

Before Citus collected solely ddl commands as `char *`'s to recreate a table. If they were used to create a shard they were wrapped with `worker_apply_shard_ddl_command` and send to the workers. On the workers the UDF wrapping the ddl command would rewrite the parsetree to replace tables names with their shard name equivalent.

This worked well, but poses an issue when adding columnar. Due to limitations in Postgres on creating custom options on table access methods we need to fall back on a UDF to set columnar specific options. Now, to recreate the table, we can not longer rely on having solely DDL statements to recreate a table.

A prototype was made to run this UDF wrapped in `worker_apply_shard_ddl_command`. This became pretty messy, hard to understand and subsequently hard to maintain.

This PR proposes a refactor of the internal representation of table ddl commands into a `TableDDLCommand` structure. The current implementation only supports a `char *` as its contents. Based on the use of the DDL statement (eg. creating the table -mx- or creating a shard) one of two different functions can be called to get the statement to send to the worker:
 - `GetTableDDLCommand(TableDDLCommand *command)`: This function returns that ddl command to create the table. In this implementation it will just return the `char *`. This has the same functionality as getting the old list and not wrapping it.
 - `GetShardedTableDDLCommand(TableDDLCommand *command, uint64 shardId, char *schemaName)`: This function returns the ddl command wrapped in `worker_apply_shard_ddl_command` with the `shardId` as an argument. Due to backwards compatibility it also accepts a. `schemaName`. The exact purpose is not directly clear. Ideally new implementations would work with fully qualified statements and ignore the `schemaName`.

A future implementation could accept 2.function pointers and a `void *` for context to let the two pointers work on. This gives greater flexibility in controlling what commands get send in which situations. Also, in a future, we could implement the intermediate step of creating the `parsetree` datastructure of statements based on the contents in the catalog with a corresponding deparser. For sharded queries a mutator could be ran over the parsetree to rewrite the tablenames to the names with the shard identifier. This will completely omit the requirement for `worker_apply_shard_ddl_command`.
2020-11-26 13:31:59 +01:00
Onder Kalaci 629ecc3dee Add the infrastructure to count the number of client backends
Considering the adaptive connection management
improvements that we plan to roll soon, it makes it
very helpful to know the number of active client
backends.

We are doing this addition to simplify yhe adaptive connection
management for single node Citus. In single node Citus, both the
client backends and Citus parallel queries would compete to get
slots on Postgres' `max_connections` on the same Citus database.

With adaptive connection management, we have the counters for
Citus parallel queries. That helps us to adaptively decide
on the remote executions pool size (e.g., throttle connections
if necessary).

However, we do not have any counters for the total number of
client backends on the database. For single node Citus, we
should consider all the client backends, not only the remote
connections that Citus does.

Of course Postgres internally knows how many client
backends are active. However, to get that number Postgres
iterates over all the backends. For examaple, see [pg_stat_get_db_numbackends](8e90ec5580/src/backend/utils/adt/pgstatfuncs.c (L1240))
where Postgres iterates over all the backends.

For our purpuses, we need this information on every connection
establishment. That's why we cannot affort to do this kind of
iterattion.
2020-11-25 19:19:24 +01:00
Onur Tirtir 46be63d76b
Refactor PreprocessIndexStmt (#4272) 2020-11-25 12:19:37 +03:00
Hadi Moshayedi 40b52ab757 Fix memory leaks in column store 2020-11-23 11:26:12 -08:00
Jeff Davis 8cee2b092b remove columnar FDW code 2020-11-20 10:03:12 -08:00
Onder Kalaci c433c66f2b Do not execute subplans multiple times with cursors
Before this commit, we let AdaptiveExecutorPreExecutorRun()
to be effective multiple times on every FETCH on cursors.
That does not affect the correctness of the query results,
but adds significant overhead.
2020-11-20 10:43:56 +01:00
Jeff Davis a2b698a766 rename cstore_tableam -> columnar 2020-11-19 12:15:51 -08:00
Hadi Moshayedi 97cba2d5b6 Implements write state management for tuple inserts.
TableAM API doesn't allow us to pass around a state variable along all of the tuple inserts belonging to the same command. We require this in columnar store, since we batch them, and when we have enough rows we flush them as stripes.

To do that, we keep a (relfilenode) -> stack of (subxact id, TableWriteState) global mapping.

**Inserts**

Whenever we want to insert a tuple, we look up for the relation's relfilenode in this mapping. If top of the stack matches current subtransaction, we us the existing TableWriteState. Otherwise, we allocate a new TableWriteState and push it on top of stack.

**(Sub)Transaction Commit/Aborts**

When the subtransaction or transaction is committed, we flush and pop all entries matching current SubTransactionId.

When the subtransaction or transaction is committed, we pop all entries matching current SubTransactionId and discard them without flushing.

**Reads**

Since we might have unwritten rows which needs to be read by a table scan, we flush write states on SELECTs. Since flushing the write state of upper transactions in a subtransaction will cause metadata being written in wrong subtransaction, we ERROR out if any of the upper subtransactions have unflushed rows.

**Table Drops**

We record in which subtransaction the table was dropped. When committing a subtransaction in which table was dropped, we propagate the drop to upper transaction. When aborting a subtransaction in which table was dropped, we mark table as not deleted.
2020-11-17 12:07:16 -08:00
Nils Dijk 725f4a37d0
change configure to not have options 2020-11-17 19:01:54 +01:00
Nils Dijk 213eb93e6d
make columnar compile and functionally working 2020-11-17 18:55:34 +01:00
Nils Dijk 527d3ce0bb
move headers to include directory 2020-11-17 18:55:34 +01:00
Önder Kalacı 0c0fc69f2a
Remove unused field (#4275) 2020-11-17 11:41:57 +01:00
Hanefi Onaldi d3019f1b6d
Introduce foreach_ptr_modify macro (#4303)
If one wishes to iterate through a List and insert list elements in
PG13, it is not safe to use for_each_ptr as the List representation
in PostgreSQL no longer linked lists, but arrays, and it is possible
that the whole array is repalloc'ed if ther is not sufficient space
available.

See postgres commit 1cff1b95ab6ddae32faa3efe0d95a820dbfdc164 for more
information
2020-11-09 12:03:59 +03:00
Onder Kalaci e0d2ac7620 Do not rely on set_rel_pathlist_hook for finding local relations
When a relation is used on an OUTER JOIN with FALSE filters,
set_rel_pathlist_hook may not be called for the table.

There might be other cases as well, so do not rely on the hook
for classification of the tables.
2020-11-06 11:14:30 +01:00
Halil Ozan Akgul 77b3be8b6d Turn RelOptInfos to only used field of them, relids, to be able to copy 2020-10-22 13:42:28 +03:00
Onder Kalaci 5c4c9304ba Remove RemoveDuplicateJoinRestrictions() function
RemoveDuplicateJoinRestrictions() function was introduced with the aim of decrasing the overall planning times by eliminating the duplicate JOIN restriction entries (#1989). However, it turns out that the function itself is so CPU intensive with a very high algorithmic complexity, it hurts a lot more than it helps. The function is a clear example of premature optimization.

The table below shows the difference clearly:

"distributed query planning
 time master"	RemoveDuplicateJoinRestrictions() execution time on master	"Remove the function RemoveDuplicateJoinRestrictions()
this PR"
5 table INNER JOIN	9 msec	2msec	7 msec
10 table INNER JOIN	227 msec	194 msec	29  msec
20 table INNER JOIN	1 sec 235 msec	1  sec 139  msec	90 msecs
50 table INNER JOIN	24 seconds	21 seconds	1.5 seconds
100 table INNER JOIN	2 minutes 16 secods	1 minute 53 seconds	23 seconds
250 table INNER JOIN	Bottleneck on JoinClauseList	18 minutes 52 seconds	Bottleneck on JoinClauseList

5 table INNER JOIN in subquery	9 msec	0 msec	6 msec
10 table INNER JOIN subquery	33 msec	10 msec	32 msec
20 table INNER JOIN subquery	132 msec	67 msec	123 msec
50 table INNER JOIN subquery	1.2  seconds	900 msec	500 msec
100 table INNER JOIN subquery	6 seconds	5  seconds	2 seconds
250 table INNER JOIN subquery	54 seconds	37 seconds	20  seconds

5 table LEFT JOIN	5 msec	0 msec	5 msec
10 table LEFT JOIN	11 msec	0 msec	13 msec
20 table LEFT JOIN	26 msec	2 msec	30 msec
50 table LEFT JOIN	150 msec	15 msec	193 msec
100 table LEFT JOIN	757 msec	71 msec	722 msec
250 table LEFT JOIN	8 seconds	600 msec	8 seconds

5 JOINs among 2 table JOINs 	37 msec	11 msec	25 msec
10 JOINs among 2 table JOINs 	536 msec	306 msec	352 msec
20 JOINs among 2 table JOINs 	794 msec	181 msec	640 msec
50 JOINs among 2 table JOINs 	25 seconds	2 seconds	22 seconds
100 JOINs among 2 table JOINs 	Bottleneck on JoinClauseList	9 seconds	Bottleneck on JoinClauseList
150 JOINs among 2 table JOINs 	Bottleneck on JoinClauseList	46 seconds	Bottleneck on JoinClauseList

On top of the performance penalty, the function had a critical bug #4255, and with #4254 we hit one more important bug. It should be fixed by adding the followig check to the ContextCoversJoinRestriction():
```
static bool
JoinRelIdsSame(JoinRestriction *leftRestriction, JoinRestriction *rightRestriction)
{
	Relids leftInnerRelIds = leftRestriction->innerrel->relids;
	Relids rightInnerRelIds = rightRestriction->innerrel->relids;
	if (!bms_equal(leftInnerRelIds, rightInnerRelIds))
	{
		return false;
	}

	Relids leftOuterRelIds = leftRestriction->outerrel->relids;
	Relids rightOuterRelIds = rightRestriction->outerrel->relids;
	if (!bms_equal(leftOuterRelIds, rightOuterRelIds))
	{
		return false;
	}

	return true;
}
```

However, adding this eliminates all the benefits tha RemoveDuplicateJoinRestrictions() brings.

I've used the commands here to generate the JOINs mentioned in the PR: https://gist.github.com/onderkalaci/fe8654f9df5916c7af4c7c5eb892561e#file-gistfile1-txt

Inner and outer JOINs behave roughly the same, to simplify the table only added INNER joins.
2020-10-21 10:29:39 +02:00
SaitTalhaNisanci 0f209377c4
Fix incorrect join related fields (#4242)
* Fix incorrect join related fields

Ruleutils expect to give the original index of join columns hence we
should consider the dropped columns while setting the fields in
SetJoinRelatedFieldsCompat.

* add some more tests for joins

* Move tests to join.sql and create a utility function
2020-10-19 18:28:39 +03:00
Onur Tirtir c49077d594
Disallow outer joins `ON TRUE` with ref & dist tables when ref table is outer relation (#4255)
Disallow `ON TRUE` outer joins with reference & distributed tables
when reference table is outer relation by fixing the logic bug made
when calling `LeftListIsSubset` function.

Also, be more defensive when removing duplicate join restrictions
when join clause is empty for non-inner joins as they might still
contain useful information for non-inner joins.
2020-10-19 16:58:11 +03:00
Onur Tirtir f80f4839ad Remove unused functions that cppcheck found 2020-10-19 13:50:52 +03:00
Onder Kalaci bbedfca761 Improve the relation restriction counters
It seems like Postgres could call set_rel_pathlist() for
the same relation multiple times. This breaks the logic
where we assume relationCount eqauls to the number of
entries in relationRestrictionList.

In summary, relationRestrictionList may contain duplicate
entries.
2020-10-19 08:51:16 +02:00
Nils Dijk caabbf4b84 Table access method support for distributed tables 2020-10-16 12:02:25 -07:00
Onur Tirtir 7cb07c70fa
Move hasSemiJoin to JoinRestrictionContext (#4256) 2020-10-16 18:37:39 +03:00
Onur Tirtir de6f2d3f42
Refactor JoinRestrictionListExistsInContext to improve readability (#4249) 2020-10-16 12:24:56 +03:00
Onder Kalaci fe3caf3bc8 Local execution considers intermediate result size limit
With this commit, we make sure that local execution adds the
intermediate result size as the distributed execution adds. Plus,
it enforces the citus.max_intermediate_result_size value.
2020-10-15 17:18:55 +02:00
Marco Slot 31858c8a29 Check table existence in EnsureRelationKindSupported 2020-10-15 17:05:06 +02:00
Sait Talha Nisanci ecde6c6eef Introduce GetCurrentLocalExecutionStatus wrapper
We should not access CurrentLocalExecutionStatus directly because that
would mean that we could also set it directly, which we shouldn't
because we have checks to see if the new state is possible, otherwise we
error.
2020-10-15 15:38:19 +03:00
Halil Ozan Akgul e2736c25bd Adds support for WITH TIES option 2020-10-12 19:34:18 +03:00
Marco Slot 73fc054c27 Rename DDL command functions 2020-10-06 11:30:56 +02:00
Marco Slot dbc348b7e0 Create sequence dependency during metadata syncing 2020-10-06 10:57:39 +02:00
Ahmet Gedemenli 81db4dca5c Degrade gracefully when no background workers available 2020-10-05 16:55:00 +03:00
Hanefi Önaldı 6d8e83d24f
Replace worker_hash calls with partkey IS NOT NULL filters 2020-10-02 18:16:24 +03:00
Önder Kalacı df5aa0f0cc
Switch to sequential execution if the index name is long (#4209)
Citus has the logic to truncate the long shard names to prevent
various issues, including self-deadlocks. However, for partitioned
tables, when index is created on the parent table, the index names
on the partitions are auto-generated by Postgres. We use the same
Postgres function to generate the index names on the shards of the
partitions. If the length exceeds the limit, we switch to sequential
execution mode.
2020-10-02 13:39:34 +03:00
Onder Kalaci 56ca256374 Forcefully terminate connections after citus.node_connection_timeout
After the connection timeout, we fail the session/pool. However, the
underlying connection can still be trying to connect. That is dangerous
because the new placement executions have already been in place. The
executor cannot handle the situation where multiple of
EXECUTION_ORDER_ANY task executions succeeds.

Adding a regression test doesn't seem easily doable. To reproduce the issue
- Add 2 worker nodes
- create a reference table
- set citus.node_connection_timeout to 1ms (requires code change)
- Continiously execute `SELECT count(*) FROM ref_table`
- Sometime later, you hit an out-of-array access in
  `ScheduleNextPlacementExecution()` hence crashing.
- The reason for that is sometimes the first connection
  successfully established while the executor is already
  trying to execute the query on the second node.
2020-09-30 18:24:24 +02:00
Marco Slot b905c8043d Fix create index concurrently crash with local execution 2020-09-25 11:49:09 +02:00
Ahmet Gedemenli abfb79bda6 Sort explain analyze output by task time
Add sort method parameter for regression tests

Fix check-style

Change sorting method parameters to enum

Polish

Add task fields to OutTask

Add test into multi_explain

Fix isolation test
2020-09-24 11:38:40 +03:00
SaitTalhaNisanci e7cd1ed0ee
Not take ShareUpdateExlusiveLock on pg_dist_transaction (#4184)
* Not take ShareUpdateExlusiveLock on pg_dist_transaction

We were taking ShareUpdateExlusiveLock on pg_dist_transaction during
recovery to prevent multiple recoveries happening concurrenly. VACUUM(
not FULL) also takes ShareUpdateExclusiveLock, and they can conflict. It
seems that VACUUM will skip the table if there is a conflicting lock
already taken unless it is doing the vacuum to prevent id wraparound, in
which case there can be a deadlock. I guess the deadlock happens if:

- VACUUM takes a lock on pg_dist_transaction and is done for id
wraparound problem
- The transaction in the maintenance tries to take a lock but
cannot as that conflicts with the lock acquired by VACUUM
- The transaction in the maintenance daemon has a very old xid hence
VACUUM cannot proceed.

If we take a row exclusive lock in transaction recovery then it wouldn't
conflict with VACUUM hence it could proceed so the deadlock would be
resolved. To prevent concurrent transaction recoveries happening, an
advisory lock is taken with ShareUpdateExlusiveLock as before.

* Use CITUS_OPERATIONS tag
2020-09-21 15:20:38 +03:00
Onur Tirtir 1b31b22635 Refactor the functions that return OID lists for citus tables 2020-09-18 16:42:46 +03:00
SaitTalhaNisanci dae2c69fd7
Not allow removing a single node with ref tables (#4127)
* Not allow removing a single node with ref tables

We should not allow removing a node if it is the only node in the
cluster and there is a data on it. We have this check for distributed
tables but we didn't have it for reference tables.

* Update src/test/regress/expected/single_node.out

Co-authored-by: Onur Tirtir <onurcantirtir@gmail.com>

* Update src/test/regress/sql/single_node.sql

Co-authored-by: Onur Tirtir <onurcantirtir@gmail.com>
2020-09-18 15:35:59 +03:00
Onur Tirtir 4118560b75
Prevent citus local table creation from a catalog table (#4158) 2020-09-15 14:30:48 +03:00
Marco Slot bd12555b16 Fix distributing tables owned by extensions 2020-09-10 04:46:11 +02:00
Onur Tirtir 3a73fba810 Apply planner changes for citus local tables 2020-09-09 11:51:18 +03:00
Onur Tirtir a58a4395ab Extend citus local table utility command support
This commit brings following features:

Foreign key support from citus local tables to reference tables
* Foreign key support from reference tables to citus local tables
  (only with RESTRICT & NO ACTION behavior)
* ALTER TABLE ENABLE/DISABLE trigger command support
* CREATE/DROP/ALTER trigger command support

and disallows:
* ALTER TABLE ATTACH/DETACH PARTITION commands
* CREATE TABLE <postgres table> ATTACH PARTITION <citus local table>
  commands
* Foreign keys from postgres tables to citus local tables
  (the other way was already disallowed)

for citus local tables.
2020-09-09 11:50:55 +03:00
Onur Tirtir 17cc810372 Implement "citus local table" creation logic 2020-09-09 11:50:48 +03:00
Onur Tirtir ba208eae4d
Record non-distributed table accesses in local executor (#4139) 2020-09-07 18:19:08 +03:00
Nils Dijk bbf42063a7
export LookupShardTransferMode 2020-09-03 16:06:38 +02:00
Nils Dijk 6e4862c57f
expose transfermode for ensure reference table existance 2020-09-03 16:06:37 +02:00
SaitTalhaNisanci 366461ccdb
Introduce cache entry/table utilities (#4132)
Introduce table entry utility functions

Citus table cache entry utilities are introduced so that we can easily
extend existing functionality with minimum changes, specifically changes
to these functions. For example IsNonDistributedTableCacheEntry can be
extended for citus local tables without the need to scan the whole
codebase and update each relevant part.

* Introduce utility functions to find the type of tables

A table type can be a reference table, a hash/range/append distributed
table. Utility methods are created so that we don't have to worry about
how a table is considered as a reference table etc. This also makes it
easy to extend the table types.

* Add IsCitusTableType utilities

* Rename IsCacheEntryCitusTableType -> IsCitusTableTypeCacheEntry

* Change citus table types in some checks
2020-09-02 22:26:05 +03:00
Jelte Fennema 451ea04508 Rename ForceXxx functions to to XxxOrError
This clearer naming was suggested in https://github.com/citusdata/citus/pull/4001
2020-09-01 11:19:17 +02:00
Hanefi Önaldı 024d398cd7
Allow distribution of functions that read from reference tables
create_distributed_function(function_name,
                            distribution_arg_name,
                            colocate_with text)

This UDF did not allow colocate_with parameters when there were no
disttribution_arg_name supplied. This commit changes the behaviour to
allow missing distribution_arg_name parameters when the function should
be colocated with a reference table.
2020-09-01 07:28:34 +03:00
Hanefi Onaldi f47b3a7e7d
Remove unused parameters from round robin reordering and friends (#4120) 2020-08-20 12:45:01 +03:00
SaitTalhaNisanci 679bf0d2b2
Create CanPushdownSubqery wrapper for better readability (#4108) 2020-08-12 17:28:20 +03:00
SaitTalhaNisanci 73ef40886b
Rename FindNodeCheckXXX functions (#4106)
FindNodeCheck is not clear about what the function is doing. They are
renamed to FindNodeMatchingCheckFunctionXXX. Also for choosing elements in these
functions, CheckNodeFunc type is introduced.
2020-08-11 15:01:23 +03:00
Hadi Moshayedi 7b74eca22d Support EXPLAIN EXECUTE ANALYZE. 2020-08-10 13:44:30 -07:00
Halil Ozan Akgul 375310b7f1 Adds support for table undistribution 2020-08-05 14:36:03 +03:00
Sait Talha Nisanci fe4ac51d8c Normalize Output:.. since it changes with pg13
Fix indentation for better readability
2020-08-04 15:38:13 +03:00
Sait Talha Nisanci 63ed126ad4 Set buffer usage with explain
It seems that currently we process even postgres tables in explain
commands. This is because we register a hook for explain and we don't
have any check to see if the query has any citus table.

With this commit, we now send the buffer usage as well to the relevant
API. There is some duplicate in the code but it is because of the
existing structure, we can refactor this separately.
2020-08-04 15:38:13 +03:00
Sait Talha Nisanci fe1e1c9b68 Replace Set_ptr_value as SetListCellPtr to be more explicit
Move header to right place and fix comment style
2020-08-04 15:38:13 +03:00
Sait Talha Nisanci 8e9b52971c Use new var field names in the codebase
The codebase is updated to use varattnosync and varnosyn and we defined
the macros for older versions. This way we can just remove the macros
when we drop an older version.
2020-08-04 15:38:13 +03:00
Sait Talha Nisanci b641f63bfd Use CMDTAG_SELECT_COMPAT
CMDTAG_SELECT exists in PG12 hence defining a MACRO such as
CMDTAG_SELECT -> "SELECT" is not possible. I chose CMDTAG_SELECT_COMPAT
because with the COMPAT suffix it is explicit that it maps to different
things in different versions and also has a less chance of mapping
something irrevelant. For example if we used SELECT as a macro, then it
would map every SELECT to whatever it is mapping to, which might have
unexpected/undesired behaviour.
2020-08-04 15:38:13 +03:00
Sait Talha Nisanci d68bfc5687 Improve error for index operator class parameters
The error message when index has opclassopts is improved and the commit
from postgres side is also included for future reference.

Also some minor style related changes are applied.
2020-08-04 15:38:13 +03:00
Sait Talha Nisanci 1070828465 update cte inline output for pg13
Make some macros in version_compat more robust
Remove commented code in ruleutils
Remove unnecessary variable assignments
2020-08-04 15:18:27 +03:00
Sait Talha Nisanci 1112b254a7 adapt recently added code for pg13
This commit mostly adds pg_get_triggerdef_command to our ruleutils_13.
This doesn't add anything extra for ruleutils 13 so it is basically a copy
of the change on ruleutils_12
2020-08-04 15:18:27 +03:00
Sait Talha Nisanci 38aaf1faba use QueryCompletion struct
Postgres introduced QueryCompletion struct. Hence a compat utility is
added to finish query completion for older versions and pg >= 13.

The commit on Postgres side:
2f9661311b83dc481fc19f6e3bda015392010a40
2020-08-04 15:10:22 +03:00
Sait Talha Nisanci 9f1ec792b3 add queryString to distributed_planner
distributed_planner now takes query string as a parameter.

related commit on PG side:
6aba63ef3e606db71beb596210dd95fa73c44ce2
2020-08-04 15:10:22 +03:00
Sait Talha Nisanci 1a7ccac6ef Add RangeTableEntryFromNSItem macro
addRangeTableEntryXXX methods return a ParseNamespaceItem with pg >= 13.
RangeTableEntryFromNSItem macro is added so that we return the range
table entry from the ParseNamespaceItem in pg>=13 and for pg < 13 rte
would already be returned with addRangeTableEntryXXX methods.

Commit on Postgres side:
5815696bc66b3092f6361f53e0394909647042c8
2020-08-04 15:10:22 +03:00
Sait Talha Nisanci 4ed30a0824 create Set_ptr_value
Since PG13 changed the list, a listcell doesn't contain data anymore.
Therefore Set_ptr_value macro is created, so that depending on the
version it will either use cell->data.ptr_value or cell->ptr_value.

Commit on Postgres side:
1cff1b95ab6ddae32faa3efe0d95a820dbfdc164
2020-08-04 15:10:22 +03:00
Sait Talha Nisanci ab85a8129d map varoattno and varnoold fields in Var
With PG13 varoattno and varnoold fields were renamed as varattnosyn and
varnosyn. A macro is defined for these.

Commit on Postgres side:
9ce77d75c5ab094637cc4a446296dc3be6e3c221

Command on Postgres side:
git log --all --grep="varoattno"
2020-08-04 15:10:22 +03:00
Sait Talha Nisanci 688ab16bba Introduce ExplainOnePlanCompat
Since ExplainOnePlan expects BufferUsage as well with PG >= 13,
ExplainOnePlanCompat is added.

Commit on Postgres side:
ed7a5095716ee498ecc406e1b8d5ab92c7662d10
2020-08-04 15:10:22 +03:00
Sait Talha Nisanci 6314eba5df introduce standard_planner_compat
standard_planner now takes the query string as a parameter as well with
pg >= 13.

Commit on Postgres Side:
66888f7424f7d6c7cea2c26e181054d1455d4e7a
2020-08-04 15:10:22 +03:00
Sait Talha Nisanci 991f49efc9 introduce getOwnedSequencesCompat macro
Commit on Postgres side:
19781729f789f3c6b2540e02b96f8aa500460322
2020-08-04 15:10:22 +03:00
Sait Talha Nisanci 00e7386007 introduce PortalDefineQuerySelectCompat
PortalDefineQuery doesn't accept char* for command tag anymore with PG
>= 13. We are currently only using it with Select, therefore a Portal
define query compat for select is created.

Commit on PG side:
2f9661311b83dc481fc19f6e3bda015392010a40
2020-08-04 15:10:22 +03:00
Sait Talha Nisanci 62879ee8c1 introduce planner_compat and pg_plan_query_compat macros
As the new planner and pg_plan_query_compat methods expect the query
string as well, macros are defined to be compatible in different
versions of postgres.

Relevant commit on Postgres:
6aba63ef3e606db71beb596210dd95fa73c44ce2

Command on Postgres:
git log --all --grep="pg_plan_query"
2020-08-04 15:10:22 +03:00
Sait Talha Nisanci bf831d2e59 Use table_openXXX methods in the codebase
With PG13 heap_* (heap_open, heap_close etc) are replaced with table_*
(table_open, table_close etc).

It is better to use the new table access methods in the codebase and
define the macros for the previous versions as we can easily remove the
macro without having to change the codebase when we drop the support for
the old version.

Commits that introduced this change on Postgres:
f25968c49697db673f6cd2a07b3f7626779f1827
e0c4ec07284db817e1f8d9adfb3fffc952252db0
4b21acf522d751ba5b6679df391d5121b6c4a35f

Command to see relevant commits on Postgres side:
git log --all --grep="heap_open"
2020-08-04 15:10:22 +03:00
Sait Talha Nisanci 0819b79631 introduce list compat macros
Pass the list to lnext API
lnext API now expects the list as well.
The commit on Postgres that introduced the change: 1cff1b95ab6ddae32faa3efe0d95a820dbfdc164

lnext_compat and list_delete_cell_compat macros are introduced so that
we can use these macros in the codebase without having to use #if
directives in the codebase.

Related commit on postgres:
1cff1b95ab6ddae32faa3efe0d95a820dbfdc164

Command to search in postgres:
git log --all --grep="list_delete_cell"

add ListCellAndListWrapper

When iterating a list in separate function calls, we need both the list
and the current cell starting from PG13, therefore
ListCellAndListWrapper is added to store both as a wrapper.

Use ListCellAndListWrapper in foreign key test udfs

As we iterate a list in these udfs using a functionContext, we need to
use the wrapper to be able to access both the list and the current cell.
2020-08-04 15:10:22 +03:00
Sait Talha Nisanci 30549dc0e2 add copy of ruleutils_12 as ruleutils_13 2020-08-04 13:34:13 +03:00
Onder Kalaci eeb8c81de2 Implement shared connection count reservation & enable `citus.max_shared_pool_size` for COPY
With this patch, we introduce `locally_reserved_shared_connections.c/h` files
which are responsible for reserving some space in shared memory counters
upfront.

We sometimes need to reserve connections, but not necessarily
establish them. For example:
-  COPY command should reserve connections as it cannot know which
   connections it needs in which order. COPY establishes connections
   as any input data hits the workers. For example, for router COPY
   command, it only establishes 1 connection.

   As discussed here (https://github.com/citusdata/citus/pull/3849#pullrequestreview-431792473),
   COPY needs to reserve connections up-front, otherwise we can end
   up with resource starvation/un-detected deadlocks.
2020-08-03 18:51:40 +02:00
SaitTalhaNisanci ef841115de
Fix int32 overflow and use PG macros for INT32_XX (#4061)
* Use CalculateUniformHashRangeIndex in HashPartitionId

INT32_MIN definition can change among different platforms hence it is
possible to get overflow, we would see crashes because of this in debian
distros. We have already solved a similar problem with introducing
CalculateUniformHashRangeIndex method, hence to solve it we can use the
same method, this also removes some duplication and has a single place
to decide that.

* Use PG_INT32_XX instead of INT32_XX to be safer
2020-07-23 18:30:08 +03:00
Onder Kalaci cfb633601d Minor refactorings in COPY command execution
1) Rename CONNECTION_PER_PLACEMENT to REQUIRE_CLEAN_CONNECTION. This is
mostly to make things clear as the new name reveals more.

2) We also make sure that mark all the copy connections critical,
even if they are accessed earlier in the transction
2020-07-23 15:36:19 +02:00
Onder Kalaci 52c0fccb08 Move executor specific logic to a function
Because as we're planning to use the same logic, it'd be nice to
use the exact same functions.
2020-07-22 15:09:47 +02:00
Onder Kalaci ff6555299c Unify node sort ordering
The executor relies on WorkerPool, and many other places rely on WorkerNode.
With this commit, we make sure that they are sorted via the same function/logic.
2020-07-22 11:03:25 +02:00
Hanefi Önaldı e534dbae4a
Accept list of values in a supported ALTER ROLE .. SET statement
Some GUCs support a list of values which is indicated by GUC_LIST_INPUT flag.

When an ALTER ROLE .. SET statement is executed, the new configuration
default for affected users and databases are stored in the
setconfig(text[]) column in a pg_db_role_setting record.

If a GUC that supports a list of values is used in an ALTER ROLE .. SET
statement, we need to split the text into items delimited by commas.
2020-07-21 03:49:57 +03:00
Onder Kalaci c25de2cf22 Remove flag from
As it doesn't make any sense anymore
2020-07-20 12:45:05 +02:00
SaitTalhaNisanci b3af63c8ce
Remove task tracker executor (#3850)
* use adaptive executor even if task-tracker is set

* Update check-multi-mx tests for adaptive executor

Basically repartition joins are enabled where necessary. For parallel
tests max adaptive executor pool size is decresed to 2, otherwise we
would get too many clients error.

* Update limit_intermediate_size test

It seems that when we use adaptive executor instead of task tracker, we
exceed the intermediate result size less in the test. Therefore updated
the tests accordingly.

* Update multi_router_planner

It seems that there is one problem with multi_router_planner when we use
adaptive executor, we should fix the following error:
+ERROR:  relation "authors_range_840010" does not exist
+CONTEXT:  while executing command on localhost:57637

* update repartition join tests for check-multi

* update isolation tests for repartitioning

* Error out if shard_replication_factor > 1 with repartitioning

As we are removing the task tracker, we cannot switch to it if
shard_replication_factor > 1. In that case, we simply error out.

* Remove MULTI_EXECUTOR_TASK_TRACKER

* Remove multi_task_tracker_executor

Some utility methods are moved to task_execution_utils.c.

* Remove task tracker protocol methods

* Remove task_tracker.c methods

* remove unused methods from multi_server_executor

* fix style

* remove task tracker specific tests from worker_schedule

* comment out task tracker udf calls in tests

We were using task tracker udfs to test permissions in
multi_multiuser.sql. We should find some other way to test them, then we
should remove the commented out task tracker calls.

* remove task tracker test from follower schedule

* remove task tracker tests from multi mx schedule

* Remove task-tracker specific functions from worker functions

* remove multi task tracker extra schedule

* Remove unused methods from multi physical planner

* remove task_executor_type related things in tests

* remove LoadTuplesIntoTupleStore

* Do initial cleanup for repartition leftovers

During startup, task tracker would call TrackerCleanupJobDirectories and
TrackerCleanupJobSchemas to clean up leftover directories and job
schemas. With adaptive executor, while doing repartitions it is possible
to leak these things as well. We don't retry cleanups, so it is possible
to have leftover in case of errors.

TrackerCleanupJobDirectories is renamed as
RepartitionCleanupJobDirectories since it is repartition specific now,
however TrackerCleanupJobSchemas cannot be used currently because it is
task tracker specific. The thing is that this function is a no-op
currently.

We should add cleaning up intermediate schemas to DoInitialCleanup
method when that problem is solved(We might want to solve it in this PR
as well)

* Revert "remove task tracker tests from multi mx schedule"

This reverts commit 03ecc0a681.

* update multi mx repartition parallel tests

* not error with task_tracker_conninfo_cache_invalidate

* not run 4 repartition queries in parallel

It seems that when we run 4 repartition queries in parallel we get too
many clients error on CI even though we don't get it locally. Our guess
is that, it is because we open/close many connections without doing some
work and postgres has some delay to close the connections. Hence even
though connections are removed from the pg_stat_activity, they might
still not be closed. If the above assumption is correct, it is unlikely
for it to happen in practice because:
- There is some network latency in clusters, so this leaves some times
for connections to be able to close
- Repartition joins return some data and that also leaves some time for
connections to be fully closed.

As we don't get this error in our local, we currently assume that it is
not a bug. Ideally this wouldn't happen when we get rid of the
task-tracker repartition methods because they don't do any pruning and
might be opening more connections than necessary.

If this still gives us "too many clients" error, we can try to increase
the max_connections in our test suite(which is 100 by default).

Also there are different places where this error is given in postgres,
but adding some backtrace it seems that we get this from
ProcessStartupPacket. The backtraces can be found in this link:
https://circleci.com/gh/citusdata/citus/138702

* Set distributePlan->relationIdList when it is needed

It seems that we were setting the distributedPlan->relationIdList after
JobExecutorType is called, which would choose task-tracker if
replication factor > 1 and there is a repartition query. However, it
uses relationIdList to decide if the query has a repartition query, and
since it was not set yet, it would always think it is not a repartition
query and would choose adaptive executor when it should choose
task-tracker.

* use adaptive executor even with shard_replication_factor > 1

It seems that we were already using adaptive executor when
replication_factor > 1. So this commit removes the check.

* remove multi_resowner.c and deprecate some settings

* remove TaskExecution related leftovers

* change deprecated API error message

* not recursively plan single relatition repartition subquery

* recursively plan single relation repartition subquery

* test depreceated task tracker functions

* fix overlapping shard intervals in range-distributed test

* fix error message for citus_metadata_container

* drop task-tracker deprecated functions

* put the implemantation back to worker_cleanup_job_schema_cachesince citus cloud uses it

* drop some functions, add downgrade script

Some deprecated functions are dropped.
Downgrade script is added.
Some gucs are deprecated.
A new guc for repartition joins bucket size is added.

* order by a test to fix flappiness
2020-07-18 13:11:36 +03:00
Hadi Moshayedi 13003d8d05 Use TupleDestination API for partitioning in insert/select. 2020-07-17 09:43:46 -07:00
Nils Dijk d0b6e62c9a
change wording to allowlist and the likes (#3906)
In the same line as #3904

Change wording to better reflect use and remove words that enforce/maintain bias.
2020-07-15 16:24:40 +02:00
Sait Talha Nisanci 510535f558 address feedback 2020-07-13 19:45:02 +03:00
Sait Talha Nisanci db1b78148c send schema creation/cleanup to coordinator in repartitions
We were using ALL_WORKERS TargetWorkerSet while sending temporary schema
creation and cleanup. We(well mostly I) thought that ALL_WORKERS would also include coordinator when it is added as a worker. It turns out that it was FILTERING OUT the coordinator even if it is added as a worker to the cluster.

So to have some context here, in repartitions, for each jobId we create
(at least we were supposed to) a schema in each worker node in the cluster. Then we partition each shard table into some intermediate files, which is called the PARTITION step. So after this partition step each node has some intermediate files having tuples in those nodes. Then we fetch the partition files to necessary worker nodes, which is called the FETCH step. Then from the files we create intermediate tables in the temporarily created schemas, which is called a MERGE step. Then after evaluating the result, we remove the temporary schemas(one for each job ID in each node) and files.

If node 1 has file1, and node 2 has file2 after PARTITION step, it is
enough to either move file1 from node1 to node2 or vice versa. So we
prune one of them.

In the MERGE step, if the schema for a given jobID doesn't exist, the
node tries to use the `public` schema if it is a superuser, which is
actually added for testing in the past.

So when we were not sending schema creation comands for each job ID to
the coordinator(because we were using ALL_WORKERS flag, and it doesn't
include the coordinator), we would basically not have any schemas for
repartitions in the coordinator. The PARTITION step would be executed on
the coordinator (because the tasks are generated in the planner part)
and it wouldn't give us any error because it doesn't have anything to do
with the temporary schemas(that we didn't create). But later two things
would happen:

- If by chance the fetch is pruned on the coordinator side, we the other
nodes would fetch the partitioned files from the coordinator and execute
the query as expected, because it has all the information.
- If the fetch tasks are not pruned in the coordinator, in the MERGE
step, the coordinator would either error out saying that the necessary
schema doesn't exist, or it would try to create the temporary tables
under public schema ( if it is a superuser). But then if we had the same
task ID with different jobID it would fail saying that the table already
exists, which is an error we were getting.

In the first case, the query would work okay, but it would still not do
the cleanup, hence we would leave the partitioned files from the
PARTITION step there. Hence ensure_no_intermediate_data_leak would fail.

To make things more explicit and prevent such bugs in the future,
ALL_WORKERS is named as ALL_NON_COORD_WORKERS. And a new flag to return
all the active nodes is added as ALL_DATA_NODES. For repartition case,
we don't use the only-reference table nodes but this version makes the
code simpler and there shouldn't be any significant performance issue
with that.
2020-07-13 19:20:15 +03:00
SaitTalhaNisanci 15290bc43b
remove unused worker methods (#4017) 2020-07-10 13:45:55 +03:00
SaitTalhaNisanci 3f50165365
rename TargetWorkerSet enums (#4015)
Rename TargetWorkerSet enums to make them more explicit about what they
mean. Ideally it would be good to treat everything as a node without the
'worker' concept because it makes things complicated. Another
improvement could be to rename TargetWorkerSet as TargetNodeSet but it
goes to renaming many occurrences of Worker, which is probably too big
for this PR.
2020-07-10 11:21:27 +03:00
Jelte Fennema 759e628dd5
Handle some NULL issues that static analysis found (#4001)
Static analysis found some issues where we used the result from
ExtractResultRelationRTE, without checking that it wasn't NULL. It seems
like in all these cases it can never actually be NULL, since we have checked
before that it isn't a SELECT query. So, this PR is mostly to make static
analysis happy (and protect a bit against future changes of the code).
2020-07-09 15:46:42 +02:00
SaitTalhaNisanci 96adce77d6
rename node/worker utilities (#4003)
The names were not explicit about what they do, and we have many
misusages in the codebase, so they are renamed to be more explicit.
2020-07-09 15:30:35 +03:00
Jelte Fennema f6e2f1b1cb
Replace words that have bad associations (#3992)
We had a few words in our codebase that static analysis flagged as having bad
associations.
2020-07-08 14:57:48 +02:00
Hadi Moshayedi 23fa421639 Fix task->fetchedExplainAnalyzePlan memory issue. 2020-07-07 07:58:02 -07:00
citus bot f0693e2f75 Remove unused MaxMasterConnectionCount function 2020-07-07 10:37:57 +02:00
citus bot bdfeb380d3 Fix some more master->coordinator comments 2020-07-07 10:37:53 +02:00
Marco Slot b4fec63bc0 Rename master evaluation to coordinator evaluation 2020-07-07 10:37:41 +02:00
Marco Slot eeffbde8bd Fix pushdown of constants in aggregate queries 2020-06-30 11:41:16 -07:00
Jelte Fennema 392c5e2c34
Fix wrong cancellation message about distributed deadlocks (#3956) 2020-06-30 14:57:46 +02:00
Marco Slot 634d6cf9d7
Improve performance of metadata cache (#3924)
#3866 removed the shard ID hash in metadata_cache.c to simplify cache management, 
but we observed a significant performance regression that was being masked by the
performance improvement provided by #3654 in our benchmarks, but #3654 only 
applies to specific workloads.

This PR brings back the shard ID cache as it existed before #3866 with some extra
 measures to handle invalidation. When we load a table entry, we overwrite 
ShardIdCacheEntry->tableEntry pointers for all the shards in that table, though 
it's possible that the table no longer contains the old shard ID or the table 
entry is never reloaded, which would leave a dangling pointer once the table 
entry is freed. To handle that case, we remove all shard ID cache entries that 
point exactly to that table entry when a table is freed (at the end of the 
transaction or any call to CitusTableCacheFlushInvalidatedEntries).

Co-authored-by: SaitTalhaNisanci <s.talhanisanci@gmail.com>
Co-authored-by: Marco Slot <marco.slot@gmail.com>
Co-authored-by: Jelte Fennema <github-tech@jeltef.nl>
2020-06-30 12:10:10 +02:00
Hadi Moshayedi 4ed59d2db3 Move more from insert_select_executor to insert_select_planner 2020-06-26 08:08:26 -07:00
Hadi Moshayedi d34c21890f Rename CoordinatorInsertSelect... to NonPushableInsertSelect 2020-06-25 08:55:48 -07:00
Hadi Moshayedi 4e8d79998e Save INSERT/SELECT method in DistributedPlan.
This is so we don't need to calculate it twice in
insert_select_executor.c and multi_explain.c, which can
cause discrepancy if an update in one of them is not
reflected in the other site.
2020-06-25 08:55:48 -07:00
SaitTalhaNisanci f458d1fd1c
Fix/task execution (#3941)
* Not set TaskExecution with adaptive executor

Adaptive executor is using a utility method from task tracker for
repartition joins, however adaptive executor doesn't need taskExecution.
It is only used by task tracker. This causes a problem when explain
analyze is used because what taskExecution is pointing to might be
random.

We solve this by not setting taskExecution from adaptive executor. So it
will stay NULL as set by CreateTask.

* use same memory context as task for taskExecution

Co-authored-by: Jelte Fennema <github-tech@jeltef.nl>
2020-06-24 12:10:00 +03:00
Marco Slot 2a3234ca26 Rename masterQuery to combineQuery 2020-06-17 14:14:37 +02:00
Jelte Fennema 0259815d3a
Fix EXPLAIN ANALYZE received data counter issues (#3917)
In #3901 the "Data received from worker(s)" sections were added to EXPLAIN
ANALYZE. After merging @pykello posted some review comments. This addresses
those comments as well as fixing a other issues that I found while addressing 
them. The things this does:

1. Fix `EXPLAIN ANALYZE EXECUTE p1` to not increase received data on every
   execution
2. Fix `EXPLAIN ANALYZE EXECUTE p1(1)` to not return 0 bytes as received data
   allways.
3. Move `EXPLAIN ANALYZE` specific logic to `multi_explain.c` from
   `adaptive_executor.c`
4. Change naming of new explain sections to `Tuple data received from node(s)`.
   Firstly because a task can reference the coordinator too, so "worker(s)" was
   incorrect. Secondly to indicate that this is tuple data and not all network
   traffic that was performed.
5. Rename `totalReceivedData` in our codebase to `totalReceivedTupleData` to
   make it clearer that it's a tuple data counter, not all network traffic.
6. Actually add `binary_protocol` test to `multi_schedule` (woops)
7. Fix a randomly failing test in `local_shard_execution.sql`.
2020-06-17 11:33:38 +02:00
Marco Slot d1bab78d79 Remove master from file hierarchy 2020-06-16 17:49:09 +02:00
Philip Dubé 39400319e6 Defer freeing CitusTableCacheEntry, as there were memory safety issues before
Shard id to index mapping stored in cache entry as there may now be multiple entries alive for a given relation

insert_select_executor: revert copying cache entry, which was a hack added to avoid memory safety issues
2020-06-15 16:20:50 +00:00
Jelte Fennema 927de6d187
Show amount of data received in EXPLAIN ANALYZE (#3901)
Sadly this does not actually work yet for binary protocol data, because
when doing EXPLAIN ANALYZE we send two commands at the same time. This
means we cannot use `SendRemoteCommandParams`, and thus cannot use the
binary protocol. This can still be useful though when using the text
protocol, to find out that a lot of data is being sent.
2020-06-15 16:01:05 +02:00
Marco Slot 4f7989ad8e Rename WorkersContainingAllShards to PlacementsForWorkersContainingAllShards 2020-06-12 18:36:02 -07:00
Marco Slot d953f084db Rename FindRouterWorkerList to CreateTaskPlacementListForShardIntervals 2020-06-12 18:36:01 -07:00
Marco Slot 24feadc230 Handle joins between local/reference/cte via router planner 2020-06-12 18:36:01 -07:00
Halil Ozan Akgül 8c5eb6b7ea
Insert Select Into Local Table (#3870)
* Insert select with master query

* Use relid to set custom_scan_tlist varno

* Reviews

* Fixes null check

Co-authored-by: Marco Slot <marco.slot@gmail.com>
2020-06-12 17:06:31 +03:00
Jelte Fennema 0e12d045b1
Support use of binary protocol in between nodes (#3877)
This can save a lot of data to be sent in some cases, thus improving
performance for which inter query bandwidth is the bottleneck.
There's some issues with enabling this as default, so that's currently not done.
2020-06-12 15:02:51 +02:00
Nils Dijk da8f2b0134
Feature: tdigest aggregate (#3897)
DESCRIPTION: Adds support to partially push down tdigest aggregates

tdigest extensions: https://github.com/tvondra/tdigest

This PR implements the partial pushdown of tdigest calculations when possible. The extension adds a tdigest type which can be combined into the same structure. There are several aggregate functions that can be used to get;
 - a quantile
 - a list of quantiles
 - the quantile of a hypothetical value
 - a list of quantiles for a list of hypothetical values

These function can work both on values or tdigest types.

Since we can create tdigest values either by combining them, or based on a group of values we can rewrite the aggregates in such a way that most of the computation gets delegated to the compute on the shards. This both speeds up the percentile calculations because the values don't have to be sorted while at the same time making the transfer size from the shards to the coordinator significantly less.
2020-06-12 13:50:28 +02:00
Philip Dubé 1722d8ac8b Allow routing modifying CTEs
We still recursively plan some cases, eg:
- INSERTs
- SELECT FOR UPDATE when reference tables in query
- Everything must be same single shard & replication model
2020-06-11 15:14:06 +00:00
Hadi Moshayedi 7c52c6edb0 CTE statistics in EXPLAIN ANALYZE 2020-06-11 02:39:59 -07:00
Hadi Moshayedi bb96ef5047 Does the EXPLAIN ANALYZE at the same time as execution, so avoids executing twice.
We wrap worker tasks in worker_save_query_explain_analyze() so we can fetch
their explain output later by a call worker_last_saved_explain_analyze().

Fixes #3519
Fixes #2347
Fixes #2613
Fixes #621
2020-06-11 01:55:57 -07:00
Marco Slot 1243b6a948 Execute shard creation as utility tasks 2020-06-10 11:29:49 +02:00
Onder Kalaci 06461ca55f Coerce types properly for INSERT
Also, unify similar code-paths to rely on more accurate function.
2020-06-10 10:40:28 +02:00
Hadi Moshayedi 5cdfa9f571 Implement EXPLAIN ANALYZE udfs.
Implements worker_save_query_explain_analyze and worker_last_saved_explain_analyze.

worker_save_query_explain_analyze executes and returns results of query while
saving its EXPLAIN ANALYZE to be fetched later.

worker_last_saved_explain_analyze returns the saved EXPLAIN ANALYZE result.
2020-06-09 10:02:05 -07:00
Onur Tirtir a4f1c41391
Implement GetQueryLockMode helper (#3860)
If we want to get necessary lockmode for a relation RangeVar within
a query, we can get the lockmode easily from the RangeVar itself (if
pg version >= 12).

However, if we want to decide the lockmode appropriate for the
"query", we can derive this information by using GetQueryLockMode
according to the code comment from RangeTblEntry->rellockmode.
2020-06-09 13:08:44 +03:00
Hadi Moshayedi 198d5d8b0f typedef TupleDestination once 2020-06-08 20:38:28 -07:00
Hadi Moshayedi 0bfd39ea52 Implement TupleDestination intereface.
Implements a new `TupleDestination` interface to allow custom tuple processing per task.

This can be specially useful if a task contains multiple queries. An example of this EXPLAIN
ANALYZE, where it needs to add some UDF calls to the query to fetch the explain output
from worker after fetching the actual query results.
2020-06-05 17:47:40 -07:00
SaitTalhaNisanci d0f47eb338
Check the removeType in IsDropCitusStmt (#3859)
We should check the remove type in IsDropCitusStmt because if the remove
type is not OBJECT_EXTENSION then the stored objects in
dropStmt->objects may not be of type Value. This was crashing PG-13.

Also rename the method as IsDropCitusExtensionStmt.
2020-06-05 20:49:54 +03:00
Onur Tirtir f7224a12f2
Implement PushOverrideEmptySearchPath (#3874)
To reduce code duplication, implement function that pushes search_path
to be NIL and sets addCatalog to true so that all objects outside of
pg_catalog will be schema-prefixed.
2020-06-05 19:23:59 +03:00
Onur Tirtir f3f711e097 Implement IndexIsImpliedByAConstraint 2020-06-05 15:33:54 +03:00
Onur Tirtir dfcc18468c Error out for unsupported trigger objects
Error out if creating a citus table from a table having triggers.
Error out for CREATE TRIGGER commands that are run on citus tables.
2020-05-31 23:10:01 +03:00
Onur Tirtir 6e6bc155a9 Implement methods to process & recreate triggers on citus tables 2020-05-31 15:28:17 +03:00
Philip Dubé c0515dcd67 This prepares for routing modifying CTEs, where modLevel should not be used to infer whether a plan is a select or not
SELECT_TASK is renamed to READ_TASK as a SELECT with modifying CTEs will be a MODIFYING_TASK

RouterInsertJob: Assert originalQuery->commandType == CMD_INSERT
CreateModifyPlan: Assert originalQuery->commandType != CMD_SELECT

Remove unused function IsModifyDistributedPlan

DistributedExecution, ExecutionParams, DistributedPlan: Rename hasReturning to expectResults
SELECTs set expectResults to true

Rename CreateSingleTaskRouterPlan to CreateSingleTaskRouterSelectPlan
2020-05-20 17:26:12 +00:00
Onur Tirtir 79a688ffe0 Refactor the methods accessing to pg_constraint
Implement internal functions to accces to pg_contraint
and utilize them in existing foreign key checks.
2020-05-20 17:27:17 +03:00
SaitTalhaNisanci 22c903b151
remove ExecuteUtilityTaskListWithoutResults (#3696)
This PR removes ExecuteUtilityTaskListWithoutResults and uses the same
path for local execution via ExecuteTaskListExtended.
ExecuteUtilityTaskList is added. ExecuteLocalTaskListExtended now has a
parameter for utility commands so that it can call the right method. In
order not to change the existing calls,
ExecuteTaskListExtendedInternal is added, which is the main method that
runs the execution, via local and remote execution.
2020-05-07 13:30:50 +03:00
Marco Slot 6ce2803777 Make sure we don't wrap GROUP BY expressions in any_value 2020-05-05 05:12:45 +02:00
Onder Kalaci 0cb7ab2d05 Explicitly mark queries in physical planner for [not] having parameters
Physical planner doesn't support parameters. If the parameters have already
been resolved when the physical planner handling the queries, mark it.
The reason is that the executor is unaware of this, and sends the parameters
along with the worker queries, which fails for composite types.

(See `DissuadePlannerFromUsingPlan()` for the details of paramater resolving)
2020-04-24 12:49:43 +02:00
Onur Tirtir b8dd8f50d1
Fix build issue in GCC 10 (#3790)
As reported in #3787, we were having issues while building citus with "GCC Red Hat 10" (maybe in some other versions of gcc as well).
Fixes "multiple definition of 'CitusNodeTagNames'" error by explicitly specifying storage of CitusNodeTagNames to be extern.
2020-04-22 16:41:34 +03:00
Philip Dubé c0a95a3adb Copy data from CitusTableCacheEntry more often
This copies over fixes from reference counting branch,
all CitusTableCacheEntry data may be freed when a GetCitusTableCacheEntry call occurs for its relationId

This fix is not complete, but reference counting is being deferred until 9.4

CopyShardInterval: remove dest parameter, always return newly allocated object
2020-04-17 14:17:18 +00:00
Önder Kalacı a919f09c96
Remove the entries from the shared connection counter hash when no connections remain (#3775)
We initially considered removing entries just before any change to
pg_dist_node. However, that ended-up being very complex and making
MX even more complex.

Instead, we're switching to a simpler solution, where we remove entries
when the counter gets to 0.

With certain workloads, this may have some performance penalty. But, two
notes on that:
 - When counter == 0, it implies that the cluster is not busy
 - With cached connections, that's not possible
2020-04-17 17:14:58 +03:00
SaitTalhaNisanci a9a3be15cc
introduce TASK_QUERY_NULL task type (#3774)
When we call SetTaskQueryString we would set the task type to
TASK_QUERY_TEXT, and some parts of the codebase rely on the fact that if
TASK_QUERY_TEXT is set, the data can be read safely. However if
SetTaskQueryString is called with a NULL taskQueryString this can cause
crashes. In that case taskQueryType will simply be set to
TASK_QUERY_NULL.
2020-04-17 14:59:22 +03:00
Nils Dijk 1d6ba1d09e
Refactor alter role to work on distributed roles (#3739)
DESCRIPTION: Alter role only works for citus managed roles

Alter role was implemented before we implemented good role management that hooks into the object propagation framework. This is a refactor of all alter role commands that have been implemented to
 - be on by default
 - only work for supported roles
 - make the citus extension owner a supported role

Instead of distributing the alter role commands for roles at the beginning of the node activation role it now _only_ executes the alter role commands for all users in all databases and in the current database.

In preparation of full role support small refactors have been done in the deparser.

Earlier tests targeting other roles than the citus extension owner have been either slightly changed or removed to be put back where we have full role support.

Fixes #2549
2020-04-16 12:23:27 +02:00
Marco Slot 8b83306a27 Issue worker messages with the same log level 2020-04-14 21:08:25 +02:00
SaitTalhaNisanci 132efdbc56
add execution params struct (#3747)
We had 9+ parameters in some of the functions related to execution.
Execution params is created to simplify this a bit so that we can set
only the fields that we are interested in and it is easier to read.
2020-04-14 14:32:40 +03:00
Onder Kalaci aa6b641828 Throttle connections to the worker nodes
With this commit, we're introducing a new infrastructure to throttle
connections to the worker nodes. This infrastructure is useful for
multi-shard queries, router queries are have not been affected by this.

The goal is to prevent establishing more than citus.max_shared_pool_size
number of connections per worker node in total, across sessions.

To do that, we've introduced a new connection flag OPTIONAL_CONNECTION.
The idea is that some connections are optional such as the second
(and further connections) for the adaptive executor. A single connection
is enough to finish the distributed execution, the others are useful to
execute the query faster. Thus, they can be consider as optional connections.
When an optional connection is not allowed to the adaptive executor, it
simply skips it and continues the execution with the already established
connections. However, it'll keep retrying to establish optional
connections, in case some slots are open again.
2020-04-14 10:27:48 +02:00
Onder Kalaci 0dbfbe0c37 Add the necessary shared memory infrastructure
- The hashmap in the shared memory
- The lock to access the hashmap
- The GUC to control the size
2020-04-14 10:03:26 +02:00
Philip Dubé 30f10984e1 Defer get_agg_clause_costs, it happens later & avoids errors 2020-04-10 13:26:05 +00:00
SaitTalhaNisanci 3dc7cad754
use an enum for local execution status (#3733)
We have two variables that are related to local execution status.
TransactionAccessedLocalPlacement and
TransactionConnectedToLocalGroup. Only one of these fields should be
set, however we didn't have any check for this contraint and it was
error prone.

What those two variables are used is that we are trying to understand if
we should use local execution, the current session, or if we should be
using a connection to execute the current query, therefore the tasks. In
the enum, now it is more clear what these variables mean.

Also, now we have a method to change the local execution status. The
method will error if we are trying to transition from a state to a wrong
state. This will help us avoid problems.
2020-04-09 19:11:04 +03:00
Hadi Moshayedi dda53a0bba GUC for replicate reference tables on activate. 2020-04-08 12:42:45 -07:00
Hadi Moshayedi 0758a81287 Prevent reference tables being dropped when replicating reference tables 2020-04-08 12:41:36 -07:00
Marco Slot 924cd7343a Defer reference table replication to shard creation time 2020-04-08 12:41:36 -07:00
Onder Kalaci 9b29a32d7a Remove all references for side channel connections
We don't need any side channel connections. That is actually
problematic in the sense that it creates extra connections.
Say, citus.max_adaptive_executor_pool_size equals to 1, Citus
ends up using one extra connection for the intermediate results.
Thus, not obeying citus.max_adaptive_executor_pool_size.

In this PR, we remove the following entities from the codebase
to allow further commits to implement not requiring extra connection
for the intermediate results:

- The connection flag REQUIRE_SIDECHANNEL
- The function GivePurposeToConnection
- The ConnectionPurpose struct and related fields
2020-04-07 17:06:55 +02:00
Marco Slot 2632343f64 Fix intermediate result pruning for INSERT..SELECT 2020-04-07 11:07:49 +02:00
Marco Slot 84672c3dbd Simplify intermediate result pruning logic 2020-04-07 10:53:29 +02:00
SaitTalhaNisanci a369f9001d
fix incorrect groupid or nodeid (#3710)
For shardplacements, we were setting nodeid, nodename, nodeport and
nodegroup manually. This makes it very error prone, and it seems that we
already forgot to set some of them. This would mean that they would have
their default values, e.g group id would be 0 when its group id is not
0.

So the implication is that we would have inconsistent worker metadata.

A new method is introduced, and we call the method to set those fields
now, so that as long as we call this method, we won't be setting
inconsistent metadata.

It probably makes sense to have a struct for these fields. We already
have NodeMetadata but it doesn't have nodename or nodeport. So that
could be done over another refactor to make things simpler.
2020-04-07 11:14:14 +03:00
Onur Tirtir 4c95ad1579 do not traverse parse tree in distributed planner one more time 2020-04-03 18:24:48 +03:00
Onur Tirtir 13a35c6813 implement GetOnlyShardOidOfReferenceTable and some refactor in shard_uitls 2020-04-03 18:24:13 +03:00
SaitTalhaNisanci 0aebd78ea7 use localExecution in ExecuteTaskListExtended
ExecuteTaskListExtended is the common method for different codepaths,
and instead of writing separate local execution logics in different
codepaths, it makes more sense to have the logic here. We still need to
do some refactoring, this is an initial step.

After this commit, we can run create shard commands locally. There is a
special case with shard creation commands. A create shard command might
have a concatenated query string, however local execution did not know
how to execute a task with multiple query strings. This is also
implemented in this commit. We go over each query in the concatenated
query string and plan/execute them one by one.

A more clean solution to this would be to make sure that each task has a
single query. We currently cannot do that because we need to ensure the
task dependencies. However, it would make sense to do that at some point
and it would simplify the code a lot.
2020-04-01 18:23:16 +03:00
SaitTalhaNisanci ba01f3457a
use macros for pg versions instead of hardcoded values (#3694)
3 Macros are defined for removing the hardcoded pg versions.
PG_VERSION_11, PG_VERSION_12 and PG_VERSION_13.
2020-04-01 17:01:52 +03:00
SaitTalhaNisanci 6cd32b0db1
refactor ExecuteLocalTaskList (#3617)
ExecuteLocalTaskList doesn't need scanState as it only uses
paramListInfo, distributedPlan and tupleStoreState. It is better to pass
only the variables that the function needs, so that we can call this
function from other places when we dont have scanState.
2020-03-31 19:19:54 +03:00
SaitTalhaNisanci b5591b1b28 use taskQuery as a struct to simplify the code 2020-03-31 15:47:55 +03:00
SaitTalhaNisanci 8806c4d697 move queryStringList into taskQuery
Also allocate task query in the memory context of task.
2020-03-31 15:47:55 +03:00
SaitTalhaNisanci c796ac335d add TaskQuery struct to abstract query string related fields
We had many fields in task related to query strings. It was kind of
complex, and only of them could be set at a time. Therefore it makes
more sense to abstract this and use a union so that it is clear that
only of them should be set.

We have three fields that could have query related strings:
- queryForLocation
- queryStringLazy
- perPlacementQueryStrings

Relatively, they can be set with:
- SetTaskQueryString
- SetTaskQueryIfShouldLazyDeparse
- SetTaskPerPlacementQueryStrings

The direct usage of the query related fields are also removed.

Rename queryForLocalExecution

Currently queryForLocalExecution is only used for deparsing purposes,
therefore it makes sense to rename it to what it is doing.
2020-03-31 15:47:55 +03:00
SaitTalhaNisanci 98f95e2a5e add TaskQueryStringForPlacement
TaskQueryStringForPlacement simplifies how the executor gets the query
string for a given placement. Task will use the necessary fields to
return the correct query placement string. Executor doesn't need to know
the details for this.

rename TaskQueryString as TaskQueryStringAllPlacements

TaskQueryString returns the query string that will be the same for all
the placements. In INSERT..SELECT the query string can be different for
each placement. Adaptive executor uses TaskQueryStringForPlacement,
which returns the query string for a placement. It makes sense to rename
TaskQueryString as TaskQueryStringAllPlacements as it is returning the
query string for all placements.

rename SetTaskQuery as SetTaskQueryIfShouldLazyDeparse

SetTaskQuery does not always sets the task query. It can set the query
string as well. So it is more clear to name it
SetTaskQueryIfShouldLazyDeparse, since it will set the query not query
string only when we should deparse the query in a lazy way.
2020-03-31 15:47:55 +03:00
SaitTalhaNisanci 982b5fbabf add SetTaskPerPlacementStrings
It is possible that a task will have different query string for each
placement. This is the case in INSERT..SELECT via repartitioning. When
we are setting task->perPlacementQueryString, we should set
queryStringLazy to NULL. Therefore a method for that purpose is created.
2020-03-31 15:47:55 +03:00
Marco Slot 331b45348c Fix error when using LEFT JOIN with GROUP BY on primary key 2020-03-30 16:42:22 +02:00
SaitTalhaNisanci e1802c5c00
extract local plan cache related methods into a file (#3667) 2020-03-31 11:11:34 +03:00
Philip Dubé 4eb2c33f38 multi_copy.c: remove tableMetadata 2020-03-30 19:26:44 +00:00
Jelte Fennema 3be665269f
Reintroduce ForceSearchShardPlacementInList (#3664)
This was added to silence static analysis errors. It was removed
accidentally in #3591. This reintroduces it again.
2020-03-27 14:28:50 +01:00
Hanefi Onaldi 0e8103b101
Propagate ALTER ROLE .. SET statements
In PostgreSQL, user defaults for config parameters can be changed by
ALTER ROLE .. SET statements. We wish to propagate those defaults
accross the Citus cluster so that the behaviour will be similar in
different workers.

The defaults can either be set in a specific database, or the whole
cluster, similarly they can be set for a single role or all roles.

We propagate the ALTER ROLE .. SET if all the conditions below are met:
- The query affects the current database, or all databases
- The user is already created in worker nodes
2020-03-27 13:02:48 +03:00
SaitTalhaNisanci dd1a456407
store query command list in task (#3649)
Sometimes we have concatenated query strings for a task. However,
when we want to find each query string, it is not a trivial task.
Therefore, it makes sense to store this in task so that when we need
each query string we can easily get it.
2020-03-26 12:04:08 +03:00
Philip Dubé 720525cfda Add support for window functions on coordinator
Some refactoring:
Consolidate expression which decides whether GROUP BY/HAVING are pushed down
Rename early pullUpIntermediateRows to hasNonDistributableAggregates
Create WorkerColumnName to handle formatting WORKER_COLUMN_FORMAT
Ignore NULL StringInfo pointers to SafeToPushdownWindowFunction
Fix bug where SubqueryPushdownMultiNodeTree mutates supplied Query,
	SafeToPushdownWindowFunction requires the original query as it relies on rtable
2020-03-25 15:31:20 +00:00
Onur Tirtir 52fd58d51f move MakeNameListFromRangeVar function to a more appropriate file 2020-03-25 11:01:50 +03:00
Jelte Fennema 2aabe3e2ef
Mark all connections for shutdown when citus.node_conninfo chan… (#3642)
We cache connections between nodes in our connection management code.
This is good for speed. For security this can be a problem though. If
the user changes settings related to TLS encryption they want those to
be applied to future queries. This is especially important when they did
not have TLS enabled before and now they want to enable it. This can
normally be achieved by changing citus.node_conninfo.  However, because
connections are not reopened there will still be old connections that
might not be encrypted at all.

This commit changes that by marking all connections to be shutdown at
the end of their current transaction. This way running transactions will
succeed, even if placement requires connections to be reused for this
transaction. But after this transaction completes any future statements
will use a connection created with the new connection options.

If a connection is requested and a connection is found that is marked
for shutdown, then we don't return this connection. Instead a new one is
created. This is needed to make sure that if there are no running
transactions, then the next statement will not use an old cached
connection, since connections are only actually shutdown at the end of a
transaction.
2020-03-24 15:31:41 +01:00
Marco Slot ede176d849 Implement shard placement copying 2020-03-23 08:33:08 -07:00
SaitTalhaNisanci 3df578010e
add a UDF to update colocation (#3623)
If two tables have the same distribution column type, we implicitly
colocate them. This is useful since colocation has a big performance
impact in most applications.

When a table is rebalanced, all of the colocated tables are also
rebalanced. If table A and table B are colocated and we want to
rebalance table A, table B will also be rebalanced. We need replica
identity so that logical replication can replicate updates and deletes
during rebalancing. If table B does not have a replica identity we
error out.

A solution to this is to introduce a UDF so that colocation can be
updated. The remaining tables in the colocation group will stay
colocated. For example if table A, B and C are colocated and after
updating table B's colocations, table A and table C stay colocated.

The "updating colocation" step does not move any data around, it only
updated pg_dist_partition and pg_dist_colocation tables. Specifically it
creates a new colocation group for the table and updates the entry in
pg_dist_partition while invalidating any cache.
2020-03-23 13:22:24 +03:00
Onder Kalaci 7b4eb9611b Properly terminate connections at the end session
Citus coordinator (or MX nodes) caches `citus.max_cached_conns_per_worker` connections
per node. This means that, those connections are not terminated after each statement.
Instead, cached to avoid the cost of re-establishment. This is crucial for OLTP performance.

The problem with that approach is that, we never properly handle the termnation of
those cached connections. For instance, when a session on the coordinator disconnects,
you'd see the following logs on the workers:

```
2020-03-20 09:13:39.454 CET [64028] LOG:  could not receive data from client: Connection reset by peer
```

With this patch, we're terminating the cached connections properly at the end of the connection.
2020-03-20 17:34:34 +01:00
SaitTalhaNisanci 9d2f3c392a enable local execution in INSERT..SELECT and add more tests
We can use local copy in INSERT..SELECT, so the check that disables
local execution is removed.

Also a test for local copy where the data size >
LOCAL_COPY_FLUSH_THRESHOLD is added.

use local execution with insert..select
2020-03-18 09:34:39 +03:00
SaitTalhaNisanci 42cfc4c0e9 apply review items
log shard id in local copy and add more comments
2020-03-18 09:33:55 +03:00
SaitTalhaNisanci 1df9601e13 not use local copy if current transaction is connected to local group
If current transaction is connected to local group we should not use
local copy, because we might not see some of the changes that are made
over the connection to the local group.
2020-03-18 09:28:59 +03:00
SaitTalhaNisanci f9c4431885 add the support to execute copy locally
A copy will be executed locally if
- Local execution is enabled and current transaction accessed a local placement
- Local execution is enabled and we are inside a transaction block.

So even if local execution is enabled but we are not in a transaction block, the copy will not be run locally.

This will not run locally:
```
COPY distributed_table FROM STDIN;
....
```

This will run locally:
```
SET citus.enable_local_execution to 'on';
BEGIN;
COPY distributed_table FROM STDIN;
COMMIT;
....
```
.
There are 3 ways to do a copy in postgres programmatically:
- from a file
- from a program
- from a callback function

I have chosen to implement it with a callback function, which means that we write the rows of copy from a callback function to the output buffer, which is used to insert tuples into the actual table.

For each shard id, we have a buffer that keeps the current rows to be written, we perform the actual copy operation either when:
- copy buffer for the given shard id reaches to a threshold, which is currently 512KB
- we reach to the end of the copy

The buffer size is debatable(512KB). At a given time, we might allocate (local placement * buffer size) memory at most.

The local copy uses the same copy format as remote copy, which means that we serialize the data in the same format as remote copy and send it locally.

There was also the option to use ExecSimpleRelationInsert to insert
slots one by one, which would avoid the extra
serialization/deserialization but doing some benchmarks it seems that
using buffers are significantly better in terms of the performance.

You can see this comment for more details: https://github.com/citusdata/citus/pull/3557#discussion_r389499054
2020-03-18 09:28:59 +03:00
Onur Tirtir a14739f808
Local execution of ddl/drop/truncate commands (#3514)
* reimplement ExecuteUtilityTaskListWithoutResults for local utility command execution

* introduce new functions for local execution of utility commands

* change ErrorIfTransactionAccessedPlacementsLocally logic for local utility command execution

* enable local execution for TRUNCATE command on distributed & reference tables

* update existing tests for local utility command execution

* enable local execution for DDL commands on distributed & reference tables

* enable local execution for DROP command on distributed & reference tables

* add normalization rules for cascaded commands

* add new tests for local utility command execution
2020-03-13 15:39:32 +03:00
Jelte Fennema c4cc26ed37
Semmle: Ensure stack memory is not leaked through uninitialized… (#3561)
New stack memory can contain anything including passwords/private keys.
In these functions we return structs that can have their padding
bytes uninitialized. By first zeroing out the struct fully, we try to
ensure that any data that is in these padding bytes is at least
overwritten once. It might not be zero anymore after setting the fields,
but at least it shouldn't be private data anymore.
2020-03-11 20:05:36 +01:00
Philip Dubé 81cfa05d3d First phase of addressing HAVING subquery issues
Add failing tests, make changes to avoid crashes at least

Fix HAVING subquery pushdown ignoring reference table only subqueries,
also include HAVING in recursive planning

Given that we have a function IsDistributedTable which includes reference tables,
it seems best to have IsDistributedTableRTE & QueryContainsDistributedTableRTE
reflect that they do not include reference tables in their check

Similarly SublinkList's name should reflect that it only scans WHERE

contain_agg_clause asserts that we don't have SubLinks,
use contain_aggs_of_level as suggested by pg sourcecode
2020-03-09 17:58:30 +00:00
SaitTalhaNisanci 321d0152c1
add a utility to get shard oid from relation oid and shard id (#3596) 2020-03-09 15:50:29 +03:00
Hanefi Onaldi 2595b4864b
Remove all GetWorkerNodeCount() references
As @onderkalaci suggested removing the definition of GetWorkerNodeCount() that can potentially cause misunderstandings.

I can advise using ActiveReadableWorkerNodeCount() that returns the number of active primaries is a safer alternative than GetWorkerNodeCount() that returns the total number of workers containing inactives, primaries, and unavailable nodes. I introduced a bug #3556 and in the bugfix #3564 removed the single usage of said function
2020-03-09 13:35:18 +03:00
Philip Dubé 7cdfa1daab Rename LookupCitusTableCacheEntry to GetCitusTableCacheEntry, LookupLookupCitusTableCacheEntry back to LookupCitusTableCacheEntry 2020-03-08 14:08:23 +00:00
Philip Dubé a7cca1bcde Rename DistTableCacheEntry to CitusTableCacheEntry 2020-03-07 14:08:03 +00:00
Philip Dubé b514ab0f55 Fix typos, rename isDistributedRelation to isCitusRelation 2020-03-06 19:20:34 +00:00
Philip Dubé bec58000d6 Given IsDistributedTableRTE, there's ambiguity in what DistributedTable means
Elsewhere we used DistributedTable to include reference tables
Marco suggested we use CitusTable for distributed & reference tables

So renaming:
- IsDistributedTable -> IsCitusTable
- IsDistributedTableViaCatalog -> IsCitusTableViaCatalog
- DistributedTableCacheEntry -> CitusTableCacheEntry
- DistributedTableList -> CitusTableList
- isDistributedTable -> isCitusTable
- InsertSelectIntoDistributedTable -> InsertSelectIntoCitusTable
- ExtractFirstDistributedTableId -> ExtractFirstCitusTableId
2020-03-06 18:57:55 +00:00
Marco Slot dc4c0c032e Refactor CitusBeginScan into separate DML / SELECT paths 2020-03-05 12:37:22 +01:00
Nils Dijk 268ad741a9
Refactor the deparsing of a CREATE EXTENSION to prevent NULL POINTER dereferences (#3518)
DESCRIPTION: satisfy static analysis tool for a nullptr dereference

During the static analysis project on the codebase this code has been flagged as having the potential for a null pointer dereference. Funnily enough the author had already made a comment of it in the code this was not possible due to us setting the schema name before we pass in the statement. If we want to reuse this code in a later setting this comment might not always apply and we could actually run into null pointer dereference.

This patch changes a bit of the code around to first of all make sure there is no NULL pointer dereference in this code anymore.
Secondly we allow for better deparsing by setting and adhering to the `if_not_exists` flag on the statement.
And finally add support for all syntax described in the documentation of postgres (FROM was missing).
2020-03-04 16:47:07 +01:00
Philip Dubé 20abc4d2b5
Replace foreach with foreach_ptr/foreach_oid (#3544) 2020-02-27 16:54:49 +01:00
Jelte Fennema c48f0ca7e5 Make bad refactors to foreach_xxx error out
Without this commit you could still use varCell in the body of loop.
This makes it easy for bad refactors that still use the ListCell to slip
through unnoticed, because the new ListCell will be named the same as the
one used in the old code. By renaming the ListCell to varCellDoNotUse
this will not happen.
2020-02-27 10:59:45 +01:00
Jelte Fennema 685b54b3de
Semmle: Check for NULL in some places where it might occur (#3509)
Semmle reported quite some places where we use a value that could be NULL. Most of these are not actually a real issue, but better to be on the safe side with these things and make the static analysis happy.
2020-02-27 10:45:29 +01:00
SaitTalhaNisanci 82d22b34fe
create temp schemas in parallel (#3540) 2020-02-26 16:20:08 +03:00
SaitTalhaNisanci d94c3fd43d
send repartition cleanup jobs in parallel to all workers (#3485)
* send repartition cleanup jobs in parallel to all workers

* add review items
2020-02-26 13:44:06 +03:00
Jelte Fennema 8de8b62669 Convert unsafe APIs to safe ones 2020-02-25 15:39:27 +01:00
Nils Dijk a77ed9cd23
Refactor master query to be planned by postgres' planner (#3326)
DESCRIPTION: Replace the query planner for the coordinator part with the postgres planner

Closes #2761 

Citus had a simple rule based planner for the query executed on the query coordinator. This planner grew over time with the addigion of SQL support till it was getting close to the functionality of the postgres planner. Except the code was brittle and its complexity rose which made it hard to add new SQL support.

Given its resemblance with the postgres planner it was a long outstanding wish to replace our hand crafted planner with the well supported postgres planner. This patch replaces our planner with a call to postgres' planner.

Due to the functionality of the postgres planner we needed to support both projections and filters/quals on the citus custom scan node. When a sort operation is planned above the custom scan it might require fields to be reordered in the custom scan before returning the tuple (projection). The postgres planner assumes every custom scan node implements projections. Because we controlled the plan that was created we prevented reordering in the custom scan and never had implemented it before.

A same optimisation applies to having clauses that could have been where clauses. Instead of applying the filter as a having on the aggregate it will push it down into the plan which could reach a custom scan node.

For both filters and projections we have implemented them when tuples are read from the tuple store. If no projections or filters are required it will directly return the tuple from the tuple store. Otherwise it will loop tuples from the tuple store through the filter and projection until a tuple is found and returned.

Besides filters being pushed down a side effect of having quals that could have been a where clause is that a call to read intermediate result could be called before the first tuple is fetched from the custom scan. This failed because the intermediate result would only be pulled to the coordinator on the first tuple fetch. To overcome this problem we do run the distributed subplans now before we run the postgres executor. This ensures the intermediate result is present on the coordinator in time. We do account for total time instrumentation by removing the instrumentation before handing control to the psotgres executor and update the timings our self.

For future SQL support it is enough to create a valid query structure for the part of the query to be executed on the query coordinating node. As a utility we do serialise and print the query at debug level4 for engineers to inspect what kind of query is being planned on the query coordinator.
2020-02-25 14:39:56 +01:00
Onur Tirtir 3c99db40b9 Some small typos & cleanup 2020-02-24 16:37:55 +03:00
Jelte Fennema 2a9fccc7a0
Remove READFUNCs (#3536)
We don't actually use these functions anymore since merging #1477.

Advantages of removing:
1. They add work whenever we add a new node.
2. They contain some usage of stdlib APIs that are banned by Microsoft.
   Removing it means we don't have to replace those with safe ones.
2020-02-24 12:43:28 +01:00
Philip Dubé bcf54c5014 Address a couple issues with maintenace daemon management:
- Stop the daemon when citus extension is dropped
- Bail on maintenance daemon startup if myDbData is started with a non-zero pid
- Stop maintenance daemon from spawning itself
- Don't use postgres die, just wrap proc_exit(0)
- Assert(myDbData->workerPid == MyProcPid)

The two issues were that multiple daemons could be running for a database,
or that a daemon would be leftover after DROP EXTENSION citus
2020-02-21 16:49:01 +00:00
Jelte Fennema 00d667c41d
Semmle: Fix obvious issues (#3502)
Fixes some obvious issues found by the Semmle static analysis tool.
2020-02-21 10:16:00 +01:00
Philip Dubé 52042d4a00 Prefer instr_time to TimestampTz when we want CLOCK_MONOTONIC 2020-02-19 00:34:17 +00:00
Philip Dubé 08f6842d50 Fix typos
Equivalance -> Equivalence
utillity -> utility
shorted lived one -> shortly lived one
elegible -> eligible
2020-02-18 17:14:40 +00:00
Marco Slot 038e5999cb Implement direct COPY table TO stdout 2020-02-17 15:15:10 +01:00
Onder Kalaci 975c4c2264 Do not prune shards if the distribution key is NULL
The root of the problem is that, standard_planner() converts the following qual

```
   {OPEXPR
   :opno 98
   :opfuncid 67
   :opresulttype 16
   :opretset false
   :opcollid 0
   :inputcollid 100
   :args (
      {VAR
      :varno 1
      :varattno 1
      :vartype 25
      :vartypmod -1
      :varcollid 100
      :varlevelsup 0
      :varnoold 1
      :varoattno 1
      :location 45
      }
      {CONST
      :consttype 25
      :consttypmod -1
      :constcollid 100
      :constlen -1
      :constbyval false
      :constisnull true
      :location 51
      :constvalue <>
      }
   )
   :location 49
   }
```

To

```
(
   {CONST
   :consttype 16
   :consttypmod -1
   :constcollid 0
   :constlen 1
   :constbyval true
   :constisnull true
   :location -1
   :constvalue <>
   }
)
```

So, Citus doesn't deal with NULL values in real-time or non-fast path router queries.

And, in the FastPathRouter planner, we check constisnull in DistKeyInSimpleOpExpression().
However, in deferred pruning case, we do not check for isnull for const.

Thus, the fix consists of two parts:
- Let PruneShards() not crash when NULL parameter is passed
- For deferred shard pruning in fast-path queries, explicitly check that we have CONST which is not NULL
2020-02-13 15:00:31 +01:00
Onur Tirtir 39df51e903
Introduce objects to dist. infrastructure when updating Citus (#3477)
Mark existing objects that are not included in distributed object infrastructure
in older versions of Citus (but now should be) as distributed, after updating
Citus successfully.
2020-02-07 18:07:59 +03:00
Nils Dijk d5433400f9
Fix: Unnecessary repartition on joins with more than 4 tables (#3473)
DESCRIPTION: Fix unnecessary repartition on joins with more than 4 tables

In 9.1 we have introduced support for all CH-benCHmark queries by widening our definitions of joins to include joins with expressions in them. This had the undesired side effect of Q5 regressing on its plan by implementing a repartition join.

It turned out this regression was not directly related to widening of the join clause, nor the schema employed by CH-benCHmark. Instead it had to do with 4 or more tables being joined in a chain. A chain meaning:

```sql
SELECT * FROM a,b,c,d WHERE a.part = b.part AND b.part = c.part AND ....
```

Due to how our join order planner was implemented it would only keep track of 1 of the partition columns when comparing if the join could be executed locally. This manifested in a join chain of 4 tables to _always_ be executed as a repartition join. 3 tables joined in a chain would have the middle table shared by the two outer tables causing the local join possibility to be found.

With this patch we keep a  unique list (or set) of all partition columns participating in the join. When a candidate table is checked for a possibility to execute a local join it will check if there is any partition column in that set that matches an equality join clause on the partition column of the candidate table.

By taking into account all partition columns in the left relation it will now find the local join path on >= 4 tables joined in a chain. 

fixes: #3276
2020-02-06 15:07:07 +01:00
Philip Dubé ecad4aa5e6 Fill in jobIdList field of DistributedExecution
Pass down jobIdList from ExecuteTasksInDependencyOrder

Also clean up comment for ExecuteTaskListOutsideTransaction
2020-02-05 17:32:22 +00:00
Philip Dubé c252811884 dont: don't, wont: won't, acylic: acyclic 2020-02-05 17:32:22 +00:00
Hadi Moshayedi bc1a800f70 Use current user for repartition join temp schemas.
Otherwise when using a less privileged user we might get
errors when trying to create the schema.
2020-02-04 09:48:20 -08:00
Onder Kalaci 2f274a4fce Make sure to go deeper into the functions to search for PARAMs
For example, a PARAM might reside inside a function just because
of a casting of a type such as the follows:

```
               {FUNCEXPR
               :funcid 1740
               :funcresulttype 1700
               :funcretset false
               :funcvariadic false
               :funcformat 2
               :funccollid 0
               :inputcollid 0
               :args (
                  {PARAM
                  :paramkind 0
                  :paramid 15
                  :paramtype 23
                  :paramtypmod -1
                  :paramcollid 0
                  :location 356
                  }
               )
```

We should recursively check the expression before bailing out.
2020-02-03 09:36:12 +01:00
Philip Dubé 84a500ffc6 CitusRemoveDirectory: loop when directory is not empty
Sometimes during errors workers will create files while we're deleting intermediate directories

example:
DEBUG:  could not remove file "base/pgsql_job_cache/10_0_431": Directory not empty
DETAIL:  WARNING from localhost:57637
2020-01-30 20:02:08 +00:00
Önder Kalacı 8584cb005b
Do not evaluate functions on the coordinator for SELECT queries (#3440)
Previously, the logic for evaluting the functions and the parameters
were the same. That ended-up evaluting the functions inaccurately
on the coordinator. Instead, split the function evaluation logic
from parameter evalution logic.
2020-01-30 08:47:28 +01:00
Önder Kalacı 4519d3411d
Improve the representation of used sub plans (#3411)
Previously, we've identified the usedSubPlans by only looking
to the subPlanId.

With this commit, we're expanding it to also include information
on the location of the subPlan.

This is useful to distinguish the cases where the subPlan is used
either on only HAVING or both HAVING and any other part of the query.
2020-01-24 10:47:14 +01:00
Philip Dubé 50c5e814c8 CurrentDatabaseName: return const char* as we're borrowing from cache 2020-01-23 22:49:35 +00:00
Hadi Moshayedi 3e1004c232 Change DistributedResultFragment::nodeId to uint32.
This is to match the type of WorkerNode::nodeId.
2020-01-23 09:33:15 -08:00
Önder Kalacı ef7d1ea91d
Locally execute queries that don't need any data access (#3410)
* Update shardPlacement->nodeId to uint

As the source of the shardPlacement->nodeId is always workerNode->nodeId,
and that is uint32.

We had this hack because of: 0ea4e52df5 (r266421409)

And, that is gone with: 90056f7d3c (diff-c532177d74c72d3f0e7cd10e448ab3c6L1123)

So, we're safe to do it now.

* Relax the restrictions on using the local execution

Previously, whenever any local execution happens, we disabled further
commands to do any remote queries. The basic motivation for doing that
is to prevent any accesses in the same transaction block to access the
same placements over multiple sessions: one is local session the other
is remote session to the same placement.

However, the current implementation does not distinguish local accesses
being to a placement or not. For example, we could have local accesses
that only touches intermediate results. In that case, we should not
implement the same restrictions as they become useless.

So, this is a pre-requisite for executing the intermediate result only
queries locally.

* Update the error messages

As the underlying implementation has changed, reflect it in the error
messages.

* Keep track of connections to local node

With this commit, we're adding infrastructure to track if any connection
to the same local host is done or not.

The main motivation for doing this is that we've previously were more
conservative about not choosing local execution. Simply, we disallowed
local execution if any connection to any remote node is done. However,
if we want to use local execution for intermediate result only queries,
this'd be annoying because we expect all queries to touch remote node
before the final query.

Note that this approach is still limiting in Citus MX case, but for now
we can ignore that.

* Formalize the concept of Local Node

Also some minor refactoring while creating the dummy placement

* Write intermediate results locally when the results are only needed locally

Before this commit, Citus used to always broadcast all the intermediate
results to remote nodes. However, it is possible to skip pushing
the results to remote nodes always.

There are two notable cases for doing that:

   (a) When the query consists of only intermediate results
   (b) When the query is a zero shard query

In both of the above cases, we don't need to access any data on the shards. So,
it is a valuable optimization to skip pushing the results to remote nodes.

The pattern mentioned in (a) is actually a common patterns that Citus users
use in practice. For example, if you have the following query:

WITH cte_1 AS (...), cte_2 AS (....), ... cte_n (...)
SELECT ... FROM cte_1 JOIN cte_2 .... JOIN cte_n ...;

The final query could be operating only on intermediate results. With this patch,
the intermediate results of the ctes are not unnecessarily pushed to remote
nodes.

* Add specific regression tests

As there are edge cases in Citus MX and with round-robin policy,
use the same queries on those cases as well.

* Fix failure tests

By forcing not to use local execution for intermediate results since
all the tests expects the results to be pushed remotely.

* Fix flaky test

* Apply code-review feedback

Mostly style changes

* Limit the max value of pg_dist_node_seq to reserve for internal use
2020-01-23 18:28:34 +01:00
Onder Kalaci a0dff301c7 Update shardPlacement->nodeId to uint
As the source of the shardPlacement->nodeId is always workerNode->nodeId,
and that is uint32.

We had this hack because of: 0ea4e52df5 (r266421409)

And, that is gone with: 90056f7d3c (diff-c532177d74c72d3f0e7cd10e448ab3c6L1123)

So, we're safe to do it now.
2020-01-23 13:00:24 +01:00
Halil Ozan Akgul b40f067d05 Adds propagation for grant on schema commands 2020-01-20 14:51:28 +03:00
Onder Kalaci 0bf1e81e33 Cache local plans on BeginScan 2020-01-17 16:02:57 +01:00
Onder Kalaci 08d148d43e Make TaskAccessesLocalNode external function 2020-01-17 16:02:57 +01:00
Onder Kalaci ff12df411b Add LocalPlannedStatement struct 2020-01-17 16:02:57 +01:00
Jelte Fennema 246435be7e
Lazy query deparsing executable queries (#3350)
Deparsing and parsing a query can be heavy on CPU. When locally executing 
the query we don't need to do this in theory most of the time.

This PR is the first step in allowing to skip deparsing and parsing
the query in these cases, by lazily creating the query string and
storing the query in the task. Future commits will make use of this and
not deparse and parse the query anymore, but use the one from the task
directly.
2020-01-17 11:49:43 +01:00
Hadi Moshayedi a079278b0c Repartitioned INSERT/SELECT: Add a GUC to enable/disable it 2020-01-16 23:24:52 -08:00
Hadi Moshayedi 97072c9eb1 INSERT/SELECT: show method in EXPLAIN output 2020-01-16 23:24:52 -08:00
Hadi Moshayedi 44a2aede16 Don't start a coordinated transaction on workers.
Otherwise transaction hooks of Citus kick in and might cause unwanted errors.
2020-01-16 23:24:52 -08:00
Hadi Moshayedi 42c3c03b85 Handle extra columns added in ExpandWorkerTargetEntry() in repartitioned INSERT/SELECT 2020-01-16 23:24:52 -08:00
Hadi Moshayedi b4e5f4b10a Implement INSERT ... SELECT with repartitioning 2020-01-16 23:24:52 -08:00
Onder Kalaci dc17c2658e Defer shard pruning for fast-path router queries to execution
This is purely to enable better performance with prepared statements.
Before this commit, the fast path queries with prepared statements
where the distribution key includes a parameter always went through
distributed planning. After this change, we only go through distributed
planning on the first 5 executions.
2020-01-16 16:59:36 +01:00
Halil Ozan Akgul c5539d20d9 Adds alter table schema propagation 2020-01-16 17:04:16 +03:00
Jelte Fennema e76281500c
Replace shardId lock with lock on colocation+shardIntervalIndex (#3374)
This new locking pattern makes sure that some deadlocks that could
happend during rebalancing cannot occur anymore.
2020-01-16 13:14:01 +01:00
Onder Kalaci 64560b07be Update regression tests-2
In this commit, we're introducing a way to prevent CTE inlining via a GUC.

The GUC is used in all the tests where PG 11 and PG 12 tests would diverge
otherwise.

Note that, in PG 12, the restriction information for CTEs are generated. It
means that for some queries involving CTEs, Citus planner (router planner/
pushdown planner) may behave differently. So, via the GUC, we prevent
tests to diverge on PG 11 vs PG 12.

When we drop PG 11 support, we should get rid of the GUC, and mark
relevant ctes as MATERIALIZED, which does the same thing.
2020-01-16 12:28:15 +01:00
Onder Kalaci 01a5800ee8 Add Citus' CTE inlining functions
With this commit we add the necessary Citus function to inline CTEs
in a queryTree.

You might ask, why do we need to inline CTEs if Postgres is already
going to do it?

Few reasons behind this decision:

- One techinal node here is that Citus does the recursive CTE planning
  by checking the originalQuery which is the query that has not gone
  through the standard_planner().

  CTEs in Citus is super powerful. It is practically key for full SQL
  coverage for multi-shard queries. With CTEs, you can always reduce
  any query multi-shard query into a router query via recursive
  planning (thus full SQL coverage).
  We cannot let CTE inlining break that. The main idea is Citus should
  be able to retry planning if anything goes after CTE inlining.

  So, by taking ownership of CTE inlining on the originalQuery, Citus
  can fallback to recursive planning of CTEs if the planning with the
  inlined query fails. It could have been a lot harder if we had relied
  on standard_planner() to have the inlined CTEs on the original query.

- We want to have this feature in PostgreSQL 11 as well, but Postgres
  only inlines in version 12
2020-01-16 12:28:15 +01:00
Marco Slot f1a0582973 Make ApplyLogRedaction a macro and redefine ereport 2020-01-13 18:24:36 +01:00
Marco Slot 90056f7d3c Remove copy from worker for append-partitioned table 2020-01-13 23:03:40 -08:00
Philip Dubé 4b5d6c3ebe Rename RelayFileState to ShardState
Replace FILE_ prefix with SHARD_STATE_
2020-01-12 05:57:53 +00:00
Philip Dubé e71386af33 Replace ARRAY_OUT_FUNC_ID with postgres's F_ARRAY_OUT
Also use stack allocation for walkerContext in multi_logical_optimizer
2020-01-10 16:54:00 +00:00
Hadi Moshayedi 527d7d41c1 Implement RedistributeTaskListResult 2020-01-09 23:47:25 -08:00
Hadi Moshayedi e1e383cb59 Don't override xact id assigned by coordinator on workers.
We might need to send commands from workers to other workers. In
these cases we shouldn't override the xact id assigned by coordinator,
or otherwise we won't read the consistent set of result files
accross the nodes.
2020-01-09 11:09:11 -08:00
Hadi Moshayedi c7c460e843 PartitionTasklistResults: Use different queries per placement
We need to know which placement succeeded in executing the worker_partition_query_result() call. Otherwise we wouldn't know which node to fetch from. This change allows that by introducing Task::perPlacementQueryStrings.
2020-01-09 10:55:58 -08:00
Hadi Moshayedi f38d0e5b3f Partitioned task list results. 2020-01-09 10:32:58 -08:00
Philip Dubé 73c06fae3b Introduce GetDistributeObjectOps to organize dispatch of logic dependent on node/object type 2020-01-09 18:24:29 +00:00
Philip Dubé 863bf49507 Implement pulling up rows to coordinator when aggregates cannot be pushed down. Enabled by default 2020-01-07 01:16:04 +00:00
Jelte Fennema 5b0baea72c Refactor distributed_planner for better understandability 2020-01-06 14:23:38 +01:00
Onder Kalaci 5a1e752726 Apply feedback - add fastPath field to plan 2020-01-06 12:42:43 +01:00
Onder Kalaci 7f3ab7892d Skip shard pruning when possible
We're already traversing the queryTree and finding the distribution
key value, so pass it to the later stages of the planning.
2020-01-06 12:42:43 +01:00
Onder Kalaci ca293116fa Reduce calls to FastPathRouterQuery()
Before this commit, we called it twice durning planning. Instead,
we save the information and pass it.
2020-01-06 12:42:43 +01:00
Onder Kalaci c8f14c9f6c Make sure to update shard states of partitions on failures
Fixes #3331

In #2389, we've implemented support for partitioned tables with rep > 1.
The implementation is limiting the use of modification queries on the
partitions. In fact, we error out when any partition is modified via
EnsurePartitionTableNotReplicated().

However, we seem to forgot an important case, where the parent table's
partition is marked as INVALID. In that case, at least one of the partition
becomes INVALID. However, we do not mark partitions as INVALID ever.

If the user queries the partition table directly, Citus could happily send
the query to INVALID placements -- which are not marked as INVALID.

This PR fixes it by marking the placements of the partitions as INVALID
as well.

The shard placement repair logic already re-creates all the partitions,
so should be fine in that front.
2020-01-06 12:26:08 +01:00
Önder Kalacı a174eb4f7b
Do not go through standard_planner() for INSERTs (#3348)
That seems unnecessary. We already have the notion of FastPath queries,
simply add it there.
2020-01-03 12:15:22 +00:00
Jelte Fennema 3a042e4611 Allow cartesian products on reference tables 2019-12-27 15:05:51 +01:00
Jelte Fennema 61e2501645 Make any expression with two or more tables a join expression 2019-12-27 15:05:51 +01:00
Hadi Moshayedi d7aea7fa10 Implement partitioned intermediate results. 2019-12-24 03:53:39 -08:00
Jelte Fennema b655c02352
Add the necessary changes for rebalance strategies on enterprise (#3325)
This commit adds the SQL and C changes necessary to support custom rebalance
strategies in the Enterprise version of Citus.
2019-12-19 15:23:08 +01:00
Hadi Moshayedi ef487e0792 Implement fetch_intermediate_results 2019-12-18 10:46:35 -08:00
Hadi Moshayedi 249508d267 Estimate cost of read_intermediate_results() 2019-12-17 13:51:51 -08:00
SaitTalhaNisanci 7ff4ce2169
Add adaptive executor support for repartition joins (#3169)
* WIP

* wip

* add basic logic to run a single job with repartioning joins with adaptive executor

* fix some warnings and return in ExecuteDependedTasks if there is none

* Add the logic to run depended jobs in adaptive executor

The execution of depended tasks logic is changed. With the current
logic:
- All tasks are created from the top level task list.
- At one iteration:
	- CurTasks whose dependencies are executed are found.
	- CurTasks are executed in parallel with adapter executor main
logic.
- The iteration is repeated until all tasks are completed.

* Separate adaptive executor repartioning logic

* Remove duplicate parts

* cleanup directories and schemas

* add basic repartion tests for adaptive executor

* Use the first placement to fetch data

In task tracker, when there are replicas, we try to fetch from a replica
for which a map task is succeeded. TaskExecution is used for this,
however TaskExecution is not used in adaptive executor. So we cannot use
the same thing as task tracker.

Since adaptive executor fails when a map task fails (There is no retry
logic yet). We know that if we try to execute a fetch task, all of its
map tasks already succeeded, so we can just use the first one to fetch
from.

* fix clean directories logic

* do not change the search path while creating a udf

* Enable repartition joins with adaptive executor with only enable_reparitition_joins guc

* Add comments to adaptive_executor_repartition

* dont run adaptive executor repartition test in paralle with other tests

* execute cleanup only in the top level execution

* do cleanup only in the top level ezecution

* not begin a transaction if repartition query is used

* use new connections for repartititon specific queries

New connections are opened to send repartition specific queries. The
opened connections will be closed at the FinishDistributedExecution.

While sending repartition queries no transaction is begun so that
we can see all changes.

* error if a modification was done prior to repartition execution

* not start a transaction if a repartition query and sql task, and clean temporary files and schemas at each subplan level

* fix cleanup logic

* update tests

* add missing function comments

* add test for transaction with DDL before repartition query

* do not close repartition connections in adaptive executor

* rollback instead of commit in repartition join test

* use close connection instead of shutdown connection

* remove unnecesary connection list, ensure schema owner before removing directory

* rename ExecuteTaskListRepartition

* put fetch query string in planner not executor as we currently support only replication factor = 1 with adaptive executor and repartition query and we know the query string in the planner phase in that case

* split adaptive executor repartition to DAG execution logic and repartition logic

* apply review items

* apply review items

* use an enum for remote transaction state and fix cleanup for repartition

* add outside transaction flag to find connections that are unclaimed instead of always opening a new transaction

* fix style

* wip

* rename removejobdir to partition cleanup

* do not close connections at the end of repartition queries

* do repartition cleanup in pg catch

* apply review items

* decide whether to use transaction or not at execution creation

* rename isOutsideTransaction and add missing comment

* not error in pg catch while doing cleanup

* use replication factor of the creation time, not current time to decide if task tracker should be chosen

* apply review items

* apply review items

* apply review item
2019-12-17 19:09:45 +03:00
Marco Slot 2f568ad5a5 Forbid using connections that sent intermediate results for data access and vice versa 2019-12-17 11:49:13 +01:00
Marco Slot f4031dd477 Clean up transaction block usage logic in adaptive executor 2019-12-17 10:48:19 +01:00
Marco Slot 5f656e22db Fix issue in IsMultiStatementTransaction detection 2019-12-16 17:01:43 +01:00
SaitTalhaNisanci 2829c601dd
replace Begin words in coordinated transactions with use (#3293) 2019-12-16 10:40:31 +03:00
SaitTalhaNisanci a0fe8646e0
add IsHoldOffCancellationReceived utility function (#3290) 2019-12-12 17:32:59 +03:00
SaitTalhaNisanci 13204487e9
remove copyright years (#3286) 2019-12-11 21:14:08 +03:00
SaitTalhaNisanci d10f97998c rename REMOTE_TRANS_INVALID to REMOTE_TRANS_NOT_STARTED 2019-12-11 15:24:18 +03:00
Marco Slot 133b8e1e0e Move coordinator insert..select logic into executor 2019-12-10 11:21:35 -08:00
Philip Dubé fcf2fd819b Add distributioncolumncollation to to pg_dist_colocation
Use partition column's collation for range distributed tables
Don't allow non deterministic collations for hash distributed tables
CoPartitionedTables: don't compare unequal types
2019-12-09 19:51:40 +00:00
Philip Dubé d138bb89bf Support creating collations as part of dependency resolution. Propagate ALTER/DROP on distributed collations
Propagate CREATE COLLATION when outside transaction
2019-12-09 04:42:51 +00:00
Hadi Moshayedi d28beb3711 Detect SQL UDF Calls. 2019-12-05 14:31:05 -08:00
Marco Slot bb3bc10f0c Fix segfault in column_to_column_name 2019-12-01 23:57:25 +01:00
Marco Slot 16d1ad3666 Remove distinction between SQL_TASK and ROUTER_TASK 2019-11-29 05:58:29 +01:00
SaitTalhaNisanci aeec3d1544
fix typo in dependent jobs and dependent task (#3244) 2019-11-28 23:47:28 +03:00
Hadi Moshayedi 2268a9cae6 Error for metadata commands if any metadata node is out-of-sync (#3226)
* Error for metadata commands if any metadata node is out-of-sync

* Make the functions have separate APIs for all workers/metadata workers
2019-11-27 09:52:57 +01:00
Önder Kalacı 1cfbeb89ec
Make NodeCanHaveDistTablePlacements() public (#3229)
Since it is required in rebalancer.
2019-11-26 12:15:38 +01:00
Philip Dubé 261a9de42d Fix typos:
VAR_SET_VALUE_KIND -> VAR_SET_VALUE kind
beginnig -> beginning
plannig -> planning
the the -> the
er then -> er than
2019-11-25 23:24:13 +00:00
Hanefi Onaldi d82f3e9406
Introduce intermediate result broadcasting
In plain words, each distributed plan pulls the necessary intermediate
results to the worker nodes that the plan hits. This is primarily useful
in three ways. 

(i) If the distributed plan that uses intermediate
result(s) is a router query, then the intermediate results are only
broadcasted to a single node.

(ii) If a distributed plan consists of only intermediate results, which
is not uncommon, the intermediate results are broadcasted to a single
node only.

(iii) If a distributed query hits a sub-set of the shards in multiple
workers, the intermediate results will be broadcasted to the relevant
node(s).

The final item (iii) becomes crucial for append/range distributed
tables where typically the distributed queries hit a small subset of
shards/workers.

To do this, for each query that Citus creates a distributed plan, we keep
track of the subPlans used in the queryTree, and save it in the distributed
plan. Just before Citus executes each subPlan, Citus first keeps track of
every worker node that the distributed plan hits, and marks every subPlan
should be broadcasted to these nodes. Later, for each subPlan which is a
distributed plan, Citus does this operation recursively since these
distributed plans may access to different subPlans, and those have to be
recorded as well.
2019-11-20 15:26:36 +03:00
Philip Dubé b7fef5c31a Miscellaneous cleanup in prep for collation propagation 2019-11-19 17:28:59 +00:00
Onur TIRTIR 26c306d188
Add extensions to distributed object propagation infrastructure (#3185) 2019-11-19 17:56:28 +03:00
Önder Kalacı 40fa3862ce
Prevent Citus extension becoming distributed object (#3197)
Prevent Citus extension being distributed

Because that could prevent doing rolling upgrades, where users may
prefer to upgrade the version on the coordinator but not the workers.

There could be some other edge cases, so I'd prefer to keep Citus
extension outside the picture for now.
2019-11-18 16:57:10 +01:00
Halil Ozan Akgul 5ae7b219ff Create the ALTER ROLE propagation 2019-11-18 18:31:28 +03:00
Nils Dijk 217890af5f
Feature: Expression in reference join (#3180)
DESCRIPTION: Expression in reference join

Fixed: #2582

This patch allows arbitrary expressions in the join clause when joining to a reference table. An example of such joins could be found in CHbenCHmark queries 7, 8, 9 and 11; `mod((s_w_id * s_i_id),10000) = su_suppkey` and `ascii(substr(c_state,1,1)) = n2.n_nationkey`. Since the join is on a reference table these queries are able to be pushed down to the workers.

To implement these queries we will widen the `IsJoinClause` predicate to not check if the expressions are a type `Var` after stripping the implicit coerciens. Instead we define a join clause when the `Var`'s in a clause come from more than 1 table.

This allows more clauses to pass into the logical planner's `MultiNodeTree(...)` planning function. To compensate for this we tighten down the `LocalJoin`, `SinglePartitionJoin` and `DualPartitionJoin` to check for direct column references when planning. This allows the planner to work with arbitrary join expressions on reference tables.
2019-11-18 16:25:46 +01:00
Hadi Moshayedi d9dcba25e3 Plan reference/local table joins locally 2019-11-15 07:36:50 -08:00
Onder Kalaci 90943a6ce6 Do not include coordinator shards when round-robin is selected
When the user picks "round-robin" policy, the aim is that the load
is distributed across nodes. However, for reference tables on the
coordinator, since local execution kicks in immediately, round-robin
is ignored.

With this change, we're excluding the placement on the coordinator.
Although the approach seems a little bit invasive because of
modifications in the placement list, that sounds acceptable.

We could have done this in some other ways such as:

1) Add a field to "Task->roundRobinPlacement" (or such), which is
updated as the first element after RoundRobinPolicy is applied.
During the execution, if that placement is local to the coordinator,
skip it and try the other remote placements.

2) On TaskAccessesLocalNode()@local_execution.c, check
task_assignment_policy, if round-robin selected and there is local
placement on the coordinator, skip it. However, task assignment is done
on planning, but this decision is happening on the execution, which
could create weird edge cases.
2019-11-15 06:03:32 -08:00
Hadi Moshayedi 15af1637aa Replicate reference tables to coordinator. 2019-11-15 05:50:19 -08:00
SaitTalhaNisanci b9b7fd7660
add IsLoggableLevel utility function (#3149)
* add IsLoggableLevel utility function

* add function comment for IsLoggableLevel

* put ApplyLogRedaction to logutils
2019-11-15 14:59:13 +03:00
Jelte Fennema 1b2c438e69
Rename variables to not shadow globals in RHEL6 (#3194)
Fixes #2839
2019-11-15 12:12:24 +01:00
Philip Dubé 495c0f5117 Phase 1 implementation of custom aggregates
Phase 1 seeks to implement minimal infrastructure, so does not include:
	- dynamic generation of support aggregates to handle multiple arguments
	- configuration methods to direct aggregation strategy,
		or mark an aggregate's serialize/deserialize as safe to operate across nodes

Aggregates can be distributed when:
	- they have a single argument
	- they have a combinefunc
	- their transition type is not a pseudotype
2019-11-14 19:01:24 +00:00
Philip Dubé edc7a2ee38 Improve RECORD support 2019-11-14 18:32:22 +00:00
Philip Dubé eb35743c3f Remove citus.worker_list_file & master_initialize_node_metadata 2019-11-13 00:49:58 +00:00
Jelte Fennema adc6ca6100
Make simple in queries on unique columns work with repartion join (#3171)
This is necassery to support Q20 of the CHbenCHmark: #2582.

To summarize the fix: The subquery is converted into an INNER JOIN on a
table. This fixes the issue, since an INNER JOIN on a table is already
supported by the repartion planner.

The way this replacement is happening.:
1. Postgres replaces `col in (subquery)` with a SEMI JOIN (subquery) on col = subquery_result
2. If this subquery is simple enough Postgres will replace it with a
   regular read from a table
3. If the subquery returns unique results (e.g. a primary key) Postgres
   will convert the SEMI JOIN into an INNER JOIN during the planning. It
   will not change this in the rewritten query though.
4. We check if Postgres sends us any SEMI JOINs during its join order
   planning, if it doesn't we replace all SEMI JOINs in the rewritten
   query with INNER JOIN (which we already support).
2019-11-11 13:44:28 +01:00
Jelte Fennema 9fb897a074
Fix queries with repartition joins and group by unique column (#3157)
Postgres doesn't require you to add all columns that are in the target list to
the GROUP BY when you group by a unique column (or columns). It even actively
removes these group by clauses when you do.

This is normally fine, but for repartition joins it is not. The reason for this
is that the temporary tables don't have these primary key columns. So when the
worker executes the query it will complain that it is missing columns in the
group by.

This PR fixes that by adding an ANY_VALUE aggregate around each variable in
the target list that does is not contained in the group by or in an aggregate.
This is done only for repartition joins.

The ANY_VALUE aggregate chooses the value from an undefined row in the
group.
2019-11-08 15:36:18 +01:00
Önder Kalacı 0b3d4e55d9
Local execution should not change hasReturning for distributed tables (#3160)
It looks like the logic to prevent RETURNING in reference tables to
have duplicate entries that comes from local and remote executions
leads to missing some tuples for distributed tables.

With this PR, we're ensuring to kick in the logic for reference tables
only.
2019-11-08 12:49:56 +01:00
Philip Dubé 2fc45e5897 create_distributed_function: accept aggregates
Adds support for OCLASS_PROC to worker_create_or_replace_object
2019-11-06 18:23:37 +00:00
Hadi Moshayedi e00d1546f3 Don't maintain replicationfactor of reference tables 2019-11-05 07:23:14 -08:00
Önder Kalacı 960cd02c67
Remove real time router executors (#3142)
* Remove unused executor codes

All of the codes of real-time executor. Some functions
in router executor still remains there because there
are common functions. We'll move them to accurate places
in the follow-up commits.

* Move GUCs to transaction mngnt and remove unused struct

* Update test output

* Get rid of references of real-time executor from code

* Warn if real-time executor is picked

* Remove lots of unused connection codes

* Removed unused code for connection restrictions

Real-time and router executors cannot handle re-using of the existing
connections within a transaction block.

Adaptive executor and COPY can re-use the connections. So, there is no
reason to keep the code around for applying the restrictions in the
placement connection logic.
2019-11-05 12:48:10 +01:00
SaitTalhaNisanci 7c410e3cd7
pass CitusCustomState directly to adaptive executor (#3151) 2019-11-01 19:57:32 +03:00
Önder Kalacı ffd89e4e01
Include all relevant relations in the ExtractRangeTableRelationWalker (#3135)
We've changed the logic for pulling RTE_RELATIONs in #3109 and
non-colocated subquery joins and partitioned tables.
@onurctirtir found this steps where I traced back and found the issues.

While looking into it in more detail, we decided to expand the list in a
way that the callers get all the relevant RTE_RELATIONs RELKIND_RELATION,
RELKIND_PARTITIONED_TABLE, RELKIND_FOREIGN_TABLE and RELKIND_MATVIEW.
These are all relation kinds that Citus planner is aware of.
2019-11-01 16:06:58 +01:00
SaitTalhaNisanci dadbe86af1
refactor some of hard coded values in citus gucs (#3137)
* refactor some of hard coded values in citus gucs

* rename GUC_ALLOW_ALL to GUC_STANDARD
2019-10-30 10:35:39 +03:00
SaitTalhaNisanci 29d45bd1b9
Do not assign InvalidOid for local execution while extracting parameters (#3131)
* do not assign InvalidOid for local execution while extracting parameters

* rename functions

* rename parameter and replace function
2019-10-28 14:28:22 +03:00
Jelte Fennema a5010e5b17
Add extra foreach convenience macros (#3117)
This completely hides `ListCell` to the user of the loop

Example usage:
```c
WorkerNode *workerNode = NULL;

foreach_ptr(workerNode, workerNodeList) {
	// Do stuff with workerNode
}
```

Instead of:
```c
ListCell *workerNodeCell = NULL;

foreach(cell, workerNodeList) {
    WorkerNode *workerNode = lfirst(workerNodeCell);
	// Do stuff with workerNode
}
```
2019-10-23 16:49:12 +02:00
Jelte Fennema 78e495e030
Add shouldhaveshards to pg_dist_node (#2960)
This is an improvement over #2512.

This adds the boolean shouldhaveshards column to pg_dist_node. When it's false, create_distributed_table for new collocation groups will not create shards on that node. Reference tables will still be created on nodes where it is false.
2019-10-22 16:47:16 +02:00