Commit Graph

665 Commits (666696c01c307f9ea6a939294d9d2d313790f27c)

Author SHA1 Message Date
Jeff Davis 33ee4877d4 PG15: rename pgstat_initstats() -> pgstat_init_relation().
From PG commits bff258a273 and be902e2651.
2022-05-02 10:12:03 -07:00
Onder Kalaci b0b91bab04 Rename metadata sync to node metadata sync where applicable 2022-04-07 17:51:31 +02:00
Marco Slot 9476f377b5 Remove old re-partitioning functions 2022-04-04 18:11:52 +02:00
Jelte Fennema 3a44fa827a
Add versions of forboth that don't need ListCell (#5856)
We've had custom versions of Postgres its `foreach` macro which with a
hidden ListCell for quite some time now. People like these custom
macros, because they are easier to use and require less boilerplate.
This adds similar custom versions of Postgres its `forboth` macro. Now
you don't need ListCells anymore when looping over two lists at the same
time.
2022-03-23 14:50:36 +03:00
Gledis Zeneli 2cb02bfb56
Fix node adding itself with citus_add_node leading to deadlock (Fix #5720) (#5758)
If a worker node is being added, a command is sent to get the server_id of the worker from the pg_dist_node_metadata table. If the worker's id is the same as the node executing the code, we will know the node is trying to add itself. If the node tries to add itself without specifying `groupid:=0` the operation will result in an error.
2022-03-10 17:46:33 +03:00
Marco Slot 3ba61244b8 Synchronize pg_dist_colocation metadata 2022-03-03 11:01:59 +01:00
Ahmet Gedemenli 2bc6a00408 Refactor CreateDistributedTable to take column name 2022-02-21 12:07:17 +03:00
Ahmet Gedemenli a1c3580c64 Support TRUNCATE for foreign tables 2022-02-17 09:59:53 +03:00
Burak Velioglu f88cc230bf
Handle tables and objects as metadata. Update UDFs accordingly
With this commit we've started to propagate sequences and shell
tables within the object dependency resolution. So, ensuring any
dependencies for any object will consider shell tables and sequences
as well. Separate logics for both shell tables and sequences have
been removed.

Since both shell tables and sequences logic were implemented as a
part of the metadata handling before that logic, we were propagating
them while syncing table metadata. With this commit we've divided
metadata (which means anything except shards thereafter) syncing
logic into multiple parts and implemented it either as a part of
ActivateNode. You can check the functions called in ActivateNode
to check definition of different metadata.

Definitions of start_metadata_sync_to_node and citus_activate_node
have also been updated. citus_activate_node will basically create
an active node with all metadata and reference table shards.
start_metadata_sync_to_node will be same with citus_activate_node
except replicating reference tables. stop_metadata_sync_to_node
will remove all the metadata. All of those UDFs need to be called
by superuser.
2022-01-31 16:20:15 +03:00
Önder Kalacı f68ac4a7cf
Consider foreign keys between reference tables (#5659)
On #5071, we avoid edge cases, but below there are foreign key constraints as well

This commit makes sure we cover those as well
2022-01-28 13:38:14 +01:00
Önder Kalacı 885601c02c
Require superuser while activating a node (#5609)
* Require superuser while activating a node

With this change, we require ActiveNode() (hence citus_add_node(),
citus_activate_node()) explicitly require for a superuser.

Before this commit, these functions were designed to work with
non-superuser roles with the relevent GRANTs given.

However, that is not a widely used way for calling the functions
above.

Due to possibility of non-super user calling the UDFs, they were
designed in a way that some commands were using some additional
short-lived superuser connections. That is:
	(a) breaking transactional behavior (e.g., ROLLBACK
 	    wouldn't fully rollback the whole transaction)
        (b) Making it very complicated to reason about which
	    parts of the node activation goes over which connections,
	    and becoming vulnerable to deadlocks / visibility issues.
2022-01-10 08:30:13 -08:00
Onder Kalaci 9f2d9e1487 Move placement deletion from disable node to activate node
We prefer the background daemon to only sync node metadata. That's
why we move placement metadata changes from disable node to
activate node. With that, we can make sure that disable node
only changes node metadata, whereas activate node syncs all
the metadata changes. In essence, we already expect all
nodes to be up when a node is activated. So, this does not change
the behavior much.
2022-01-07 09:56:03 +01:00
Önder Kalacı c9127f921f
Avoid round trips while fixing index names (#5549)
With this commit, fix_partition_shard_index_names()
works significantly faster.

For example,

32 shards, 365 partitions, 5 indexes drop from ~120 seconds to ~44 seconds
32 shards, 1095 partitions, 5 indexes drop from ~600 seconds to ~265 seconds

`queryStringList` can be really long, because it may contain #partitions * #indexes entries.

Before this change, we were actually going through the executor where each command
in the query string triggers 1 round trip per entry in queryStringList.

The aim of this commit is to avoid the round-trips by creating a single query string.

I first simply tried sending `q1;q2;..;qn` . However, the executor is designed to
handle `q1;q2;..;qn` type of query executions via the infrastructure mentioned
above (e.g., by tracking the query indexes in the list and doing 1 statement
per round trip).

One another option could have been to change the executor such that only track
the query index when `queryStringList` is provided not with queryString
including multiple `;`s . That is (a) more work (b) could cause weird edge
cases with failure handling (c) felt like coding a special case in to the executor
2021-12-27 10:29:37 +01:00
Onder Kalaci 549edcabb6 Allow disabling node(s) when multiple failures happen
As of master branch, Citus does all the modifications to replicated tables
(e.g., reference tables and distributed tables with replication factor > 1),
via 2PC and avoids any shardstate=3. As a side-effect of those changes,
handling node failures for replicated tables change.

With this PR, when one (or multiple) node failures happen, the users would
see query errors on modifications. If the problem is intermitant, that's OK,
once the node failure(s) recover by themselves, the modification queries would
succeed. If the node failure(s) are permenant, the users should call
`SELECT citus_disable_node(...)` to disable the node. As soon as the node is
disabled, modification would start to succeed. However, now the old node gets
behind. It means that, when the node is up again, the placements should be
re-created on the node. First, use `SELECT citus_activate_node()`. Then, use
`SELECT replicate_table_shards(...)` to replicate the missing placements on
the re-activated node.
2021-12-01 10:19:48 +01:00
Onder Kalaci b4931f7345 Do not acquire locks on reference tables when a node is removed/disabled
Before this commit, we acquire the metadata locks on the reference
tables while removing/disabling a node on all the MX nodes.

Although it has some marginal benefits, such as a concurrent
modification during remove/disable node blocks, instead of erroring
out, the drawbacks seems worse. Both citus_remove_node and citus_disable_node
are not tolerant to multiple node failures.

With this commit, we relax the locks. The implication is that while
a node is removed/disabled, users might see query errors. On the
other hand, this change becomes removing/disabling nodes more
tolerant to multiple node failures.
2021-11-26 09:08:25 +01:00
Önder Kalacı 8c0bc94b51
Enable replication factor > 1 in metadata syncing (#5392)
- [x] Add some more regression test coverage
- [x] Make sure returning works fine in case of
     local execution + remote execution
     (task->partiallyLocalOrRemote works as expected, already added tests)
- [x] Implement locking properly (and add isolation tests)
     - [x] We do #shardcount round-trips on `SerializeNonCommutativeWrites`.
           We made it a single round-trip.
- [x] Acquire locks for subselects on the workers & add isolation tests
- [x] Add a GUC to prevent modification from the workers, hence increase the
      coordinator-only throughput
       - The performance slightly drops (~%15), unless
         `citus.allow_modifications_from_workers_to_replicated_tables`
         is set to false
2021-11-15 15:10:18 +03:00
Önder Kalacı 98ca6ba6ca
Allow lock_shard_resources to be called by the users with privileges (#5441)
Before this commit, we required the user to be owner of the shard/table
in order to call lock_shard_resources.

However, that is too restrictive. We can have users with GRANTS
to the table who are not owners of the tables/shards.

With this commit, we allow such patterns.
2021-11-08 15:36:51 +01:00
naisila 385ba94d15 Run fix_partition_shard_index_names after each wrong naming command 2021-11-08 10:43:34 +01:00
Nils Dijk 0e7cf9f0ca
reinstate optimization that got unintentionally broken in 366461ccdb (#5418)
DESCRIPTION: Reinstate optimisation for uniform shard interval ranges

During a refactor introduced in #4132 the following change was made, which made the optimisation in `CalculateUniformHashRangeIndex` unreachable: 
366461ccdb (diff-565a339ed3c78bc5a0d4ffeb4e91032150b1dffbeeff59cd3e65981d20b998c7L319-R319)

This PR reinstates the path to the optimisation!
2021-11-05 13:07:51 +01:00
Ahmet Gedemenli 67dca4363d
Dont auto-undistribute user-added citus local tables (#5314)
* Disable auto-undistribute for user-added citus local tables
2021-10-28 12:10:26 +03:00
Philip Dubé cc50682158 Fix typos. Spurred spotting "connectios" in logs 2021-10-25 13:54:09 +00:00
Önder Kalacı b3299de81c
Drop support for citus.multi_shard_commit_protocol (#5380)
In the past, we allowed users to manually switch to 1PC
(e.g., one phase commit). However, with this commit, we
don't. All multi-shard modifications are done via 2PC.
2021-10-21 14:01:28 +02:00
Ahmet Gedemenli d19793c174 Add partitioning support for citus local tables
Add/fix tests

Fix creating partitions

Add test for mx - partition creating case

Enable cascading to partitioned tables

Fix mx partition adding test

Fix cascading through fkeys

Style

Disable converting with non-inherited fkeys

Fix detach bug

Early return in case of cascade & Add tests

Style

Fix undistribute_table bug & Fix test outputs

Remove RemovePartitionRelationIds

Test with undistribute_table

Add test for mx+convert+undistribute

Remove redundant usage of CreatePartitionedCitusLocalTable

Add some comments

Introduce bulk functions for generating attach/detach partition commands

Fix: Convert partitioned tables after adding fkey

Change the error message for partitions

Introduce function ErrorIfPartitionTableAddedToMetadata

Polish attach/detach command generation functions

Use time_partitions for testing

Move mx tests to citus_local_tables_mx

Add new partitioned table to cascade test

Add test with time series management UDFs

Fix test output

Fix: Assertion fail on relation access tracking

Style

Refactor creating partitioned citus local tables

Remove CreatePartitionedCitusLocalTable

Style

Error out if converting multi-level table

Revert some old tests

Error out adding partitioned partition

Polish

Polish/address

Fix create table partition of case

Use CascadeOperationForRelationIdList if no cascade needed

Fix create partition bug

Revert / Add new tests to mx

Style

Fix dropping fkey bug

Add test with IF NOT EXISTS

Convert to CLT when doing ATTACH PARTITION

Add comments

Add more tests with time series management

Edit the error message for converting the child

Use OR instead of AND in ErrorIfUnsupportedAlterTableStmt

Edit/improve tests

Disable ddl prop when dropping default column definitions

Disable/enable ddl prop just before/after the command

Add comment

Add sequence test

Add trigger test

Remove NeedCascadeViaForeignKeys

Add one more insert to sequence test

Add comment

Style

Fix test output shard ids

Update comments

Disable creating fkey on partitions

Move partition check to CreateCitusLocalTable

Add comment

Add check for  attachingmulti-level  partition

Add test for pg_constraint

Check pg_dist_partition in tests

Add test inserting on the worker
2021-10-11 10:45:07 +03:00
Naisila Puka d0390af72d
Add fix_partition_shard_index_names udf to fix currently broken names (#5291)
* Add udf to include shardId in broken partition shard index names

* Address reviews: rename index such that operations can be done on it

* More comprehensive index tests

* Final touches and formatting
2021-10-07 19:34:52 +03:00
Halil Ozan Akgul 43d5853b6d Fixes function names in comments 2021-10-06 09:24:43 +03:00
Marco Slot 4faa49775b Perform copy command as regular user in worker_append_table_to_shard 2021-09-09 11:00:29 +02:00
Sait Talha Nisanci 0b67fcf81d Fix style 2021-09-03 16:09:59 +03:00
Sait Talha Nisanci 96833e2b8f Use HASH_STRINGS explicitly in hash functions
Postgres expects to set the HASH_STRINGS explicitly in case of the
default behaivor for string hash function.

Postgres Commit
b3817f5f774663d55931dd4fab9c5a94a15ae7ab
2021-09-03 15:27:25 +03:00
Halil Ozan Akgul cb3b76ed24 Introduces get_partition_parent_compat and RelationGetPartitionDesc_compat macros
get_partition_parent and RelationGetPartitionDesc functions now have new parameters to also include detached partitions
Thess new macros give us the ability to use these new parameter for PG14 and they don't give the parameters for previous versions
Existing parameters are set to not accept detached partitions

Relevant PG commit:
71f4c8c6f74ba021e55d35b1128d22fb8c6e1629
2021-09-03 15:27:25 +03:00
Halil Ozan Akgul ebf1b7e23f Introduces macros for functions that now have include_out_arguments argument
New macros: FuncnameGetCandidates_compat and expand_function_arguments_compat

The functions (the ones without _compat) now have a new bool include_out_arguments parameter
These new macros give us the ability to use this new parameter for PG14 and it doesn't give the parameter for previous versions
Existing include_out_arguments parameters are set to 'false' to keep current behavior

Relevant PG commit:
e56bce5d43789cce95d099554ae9593ada92b3b7
2021-09-03 15:27:24 +03:00
Hanefi Onaldi 7e39c7ea83
Replace master with citus in logs and comments (#5210)
I replaced 

- master_add_node,
- master_add_inactive_node
- master_activate_node

with

- citus_add_node,
- citus_add_inactive_node
- citus_activate_node

respectively.
2021-08-26 11:31:17 +03:00
Onder Kalaci 5f02d18ef8 transactional metadata sync for maintanince daemon
As we use the current user to sync the metadata to the nodes
with #5105 (and many other PRs), there is no reason that
prevents us to use the coordinated transaction for metadata syncing.

This commit also renames few functions to reflect their actual
implementation.
2021-08-09 10:34:55 +02:00
Onder Kalaci 35964c6366 Dropped columns do not diverge distribution column for partitioned tables
Before this commit, creating a partition after a DROP column
on the parent (position before dist. key) was leading to
partition to have the wrong distribution column.
2021-08-06 13:36:12 +02:00
Onder Kalaci 482b8096e9 Introduce citus_internal_update_relation_colocation
update_distributed_table_colocation can be called by the relation
owner, and internally it updates pg_dist_partition. With this
commit, update_distributed_table_colocation uses an internal
UDF to access pg_dist_partition.

As a result, this operation can now be done by regular users
on MX.
2021-08-03 11:44:58 +02:00
Onder Kalaci 2c349e6dfd Use current user to sync metadata
Before this commit, we always synced the metadata with superuser.
However, that creates various edge cases such as visibility errors
or self distributed deadlocks or complicates user access checks.

Instead, with this commit, we use the current user to sync the metadata.
Note that, `start_metadata_sync_to_node` still requires super user
because accessing certain metadata (like pg_dist_node) always require
superuser (e.g., the current user should be a superuser).

However, metadata syncing operations regarding the distributed
tables can now be done with regular users, as long as the user
is the owner of the table. A table owner can still insert non-sense
metadata, however it'd only affect its own table. So, we cannot do
anything about that.
2021-07-16 13:25:27 +02:00
Sait Talha Nisanci e7ed16c296 Not include to-be-deleted shards while finding shard placements
Ignore orphaned shards in more places

Only use active shard placements in RouterInsertTaskList

Use IncludingOrphanedPlacements in some more places

Fix comment

Add tests
2021-06-28 13:05:31 +03:00
Jelte Fennema f4a2d99ce9
Harden ReplicateShardToNode to unexpected placements (#5071)
Originally ReplicateShardToNode was meant for
`upgrade_to_reference_table`, which required handling of existing inactive
placements. These days `upgrade_to_reference_table` is deprecated and
cannot be used anymore. Now that we have SHARD_STATE_TO_DELETE too, this
left over code seemed error prone. So this removes support for
activating inactive reference table placemements, since these should not
be possible. If it finds a non active reference table placement anyway
it now errors out.

This also removes a few outdated comments related to `upgrade_to_refeference_table`.
2021-06-24 13:11:02 +03:00
Marco Slot a7e4d6c94a Fix a bug that causes worker_create_or_alter_role to crash with NULL input 2021-06-15 20:07:08 +02:00
Jelte Fennema 1a83628195 Use "orphaned shards" naming in more places
We were not very consistent in how we named these shards.
2021-06-04 11:39:19 +02:00
Hanefi Onaldi fa29d6667a
Accept invalidation before fk graph validity check (#5017)
InvalidateForeignKeyGraph sends an invalidation via shared memory to all
backends, including the current one.

However, we might not call AcceptInvalidationMessages before reading
from the cache below. It would be better to also add a call to
AcceptInvalidationMessages in IsForeignConstraintRelationshipGraphValid.
2021-06-02 14:45:35 +03:00
Jelte Fennema b1cad26ebc Move CheckCitusVersion to the top of each function
Previously this was usually done after argument parsing. This can cause
SEGFAULTs if the number or type of arguments changes in a new version.
By checking that Citus version is correct before doing any argument
parsing we protect against these types of issues. Issues like this have
occurred in pg_auto_failover, so it's not just a theoretical issue.

The main reason why these calls were not at the top of functions is
really just historical. It was because in the past we didn't allow
statements before declarations. Thus having this check before the
argument parsing would have only been possible if we first declared all
variables.

In addition to moving existing CheckCitusVersion calls it also adds
these calls to rebalancer related functions (they were missing there).
2021-06-01 17:43:46 +02:00
SaitTalhaNisanci 82f34a8d88
Enable citus.defer_drop_after_shard_move by default (#4961)
Enable citus.defer_drop_after_shard_move by default
2021-05-21 10:48:32 +03:00
Marco Slot 644b266dee Only cache local plans when reusing a distributed plan 2021-05-18 16:11:43 +02:00
SaitTalhaNisanci eaa7d2bada
Not block maintenance daemon (#4972)
It was possible to block maintenance daemon by taking an SHARE ROW
EXCLUSIVE lock on pg_dist_placement. Until the lock is released
maintenance daemon would be blocked.

We should not block the maintenance daemon under any case hence now we
try to get the pg_dist_placement lock without waiting, if we cannot get
it then we don't try to drop the old placements.
2021-05-17 03:22:35 -07:00
Nils Dijk c91f8d8a15
Feature: localhost guc (#4836)
DESCRIPTION: introduce `citus.local_hostname` GUC for connections to the current node

Citus once in a while needs to connect to itself for some systems operations. This used to be hardcoded to `localhost`. The hardcoded hostname causes some issues, for example in environments where `sslmode=verify-full` is required. It is not always desirable or even feasible to get `localhost` as an alt name on the certificate.

By introducing a GUC to use when connecting to the current instance the user has more control what network path is used and what hostname is required to be present in the server certificate.
2021-05-12 16:59:44 +02:00
Ahmet Gedemenli bc818e76e2 Add notice log message for skipping child tables for optimization 2021-05-06 16:49:37 +03:00
SaitTalhaNisanci 93c2dcf3d2
Fix data-race with concurrent calls of DropMarkedShards (#4909)
* Fix problews with concurrent calls of DropMarkedShards

When trying to enable `citus.defer_drop_after_shard_move` by default it
turned out that DropMarkedShards was not safe to call concurrently.
This could especially cause big problems when also moving shards at the
same time. During tests it was possible to trigger a state where a shard
that was moved would not be available on any of the nodes anymore after
the move.

Currently DropMarkedShards is only called in production by the
maintenaince deamon. Since this is only a single process triggering such
a race is currently impossible in production settings. In future changes
we will want to call DropMarkedShards from other places too though.

* Add some isolation tests

Co-authored-by: Jelte Fennema <github-tech@jeltef.nl>
2021-04-21 10:59:48 +03:00
Ahmet Gedemenli 33c620f232
Optimize partitioned disk size calculation (#4905)
* Optimize partitioned disk size calculation

* Polish

* Fix test for citus_shard_cost_by_disk_size

Try optimizing if not CSTORE
2021-04-19 13:30:56 +03:00
Hanefi Onaldi 9919fbe3f8 Switch to sequential mode on long partition names
This commit adds support for long partition names for distributed tables:
- ALTER TABLE dist_table ATTACH PARTITION ..
- CREATE TABLE .. PARTITION OF dist_table ..

Note: create_distributed_table UDF does not support long table and
partition names, and is not covered in this commit
2021-04-14 15:27:50 +03:00
SaitTalhaNisanci 03832f353c Drop postgres 11 support 2021-03-25 09:20:28 +03:00
Naisila Puka 2f30614fe3
Reimplement citus_update_table_statistics to detect dist. deadlocks (#4752)
* Reimplement citus_update_table_statistics

* Update stats for the given table not colocation group

* Add tests for reimplemented citus_update_table_statistics

* Use coordinated transaction, merge with citus_shard_sizes functions

* Update the old master_update_table_statistics as well
2021-03-03 04:12:30 +03:00
Hanefi Onaldi 9a792ef841 Remove length limitations for table renames 2021-02-24 03:35:27 +03:00
SaitTalhaNisanci dcf54eaf2a Use PROCESS_UTILITY_QUERY in utility calls
When we use PROCESS_UTILITY_TOPLEVEL it causes some problems when
combined with other extensions such as pg_audit. With this commit we use
PROCESS_UTILITY_QUERY in the codebase to fix those problems.
2021-02-19 13:55:59 +03:00
Sait Talha Nisanci bbf6132226 Revert "wip (#4730)"
This reverts commit 62e6d54a4e.
2021-02-19 13:55:59 +03:00
SaitTalhaNisanci 62e6d54a4e
wip (#4730) 2021-02-19 13:42:19 +03:00
Hanefi Onaldi 353b080474
Fix Semmle errors (#4636)
Co-authored-by: Halil Ozan Akgül <hozanakgul@gmail.com>
2021-02-08 18:37:44 +03:00
Hanefi Önaldı cab17afce9 Introduce UDFs for fixing partitioned table constraint names 2021-01-29 17:32:20 +03:00
SaitTalhaNisanci 738825cc38
Fix partition column index issue (#4591)
* Fix partition column index issue

We send column names to worker_hash/range_partition_table methods, and
in these methods we check the column name index from tuple descriptor.
Then this index is used to decide the bucket that the current row will
be sent for the repartition.

This becomes a problem when there are the same column names in the
tupleDescriptor. Then we can choose the wrong index. Hence the
partitioned data will be put to wrong workers. Then the result could
miss some data because workers might contain different range of data.

An example:
TupleDescriptor contains "trip_id", "car_id", "car_id" for one table.
It contains only "car_id" for the other table. And assuming that the
tables will be partitioned by car_id, it is not certain what should be
used for deciding the bucket number for the first table. Assuming value
2 goes to bucket 2 and value 3 goes to bucket 3, it is not certain which
bucket "1 2 3" (trip_id, car_id, car_id)  row will go to.

As a solution we send the index of partition column in targetList
instead of the column name.

The old API is kept so that if workers upgrade work, it still works
(though it will have the same bug)

* Use the same method so that backporting is easier
2021-01-29 14:40:40 +03:00
Onur Tirtir cacb76d2c6
Not mention citus local tables in error messages (#4579) 2021-01-27 12:36:53 +03:00
Ahmet Gedemenli 14bf9d85d6
Merge branch 'master' into fix-maintenance-daemon-crash 2021-01-26 12:52:28 +03:00
Onur Tirtir 941c8fbf32
Automatically undistribute citus local tables when no more fkeys with reference tables (#4538) 2021-01-22 18:15:41 +03:00
Ahmet Gedemenli 5022fc8301 Remove failing assertions 2021-01-22 17:09:24 +03:00
Ahmet Gedemenli 63fab1b7d9
Merge branch 'master' into remove-deprecated-gucs-udfs 2021-01-22 13:29:07 +03:00
SaitTalhaNisanci 3d69ab5576
Choose the smallest colocation id among all matches (#4559)
Currently we choose an arbitrary colocation id from all the matches for
a colocation id. This could mean that 2 distributed tables, which have
the same scheme could go into different colocation groups. This fix
makes sure that the same match will go to the same colocation group.
2021-01-22 13:28:43 +03:00
Ahmet Gedemenli 89a6fe83f7 Replace to update_distributed_table_colocation for tests 2021-01-20 17:30:06 +03:00
Onder Kalaci 8df58926c5 Rename CitusProcessUtility -> ProcessUtilityForNode 2021-01-20 15:54:00 +03:00
Hadi Moshayedi bc01c795a2 Reland #4419 2021-01-19 07:48:47 -08:00
Marco Slot 011283122b Add the shard rebalancer implementation 2021-01-07 16:51:55 +01:00
Marco Slot 47c1b19174 Revert "Do metadata sync in a separate background worker."
This reverts commit 4df723cf9b.
2021-01-07 10:30:04 +01:00
Hadi Moshayedi 4df723cf9b Do metadata sync in a separate background worker. 2020-12-24 08:25:55 -08:00
Onur Tirtir 5ed9197041
Implement infra to get foreign key connected relations (#4439)
On top of our foreign key graph, implement the infrastructure to get
list of relations that are connected to input relation via a foreign key
graph.
We need this to support cascading create_citus_local_table &
undistribute_table operations.

Also add regression tests to see what our foreign key graph is able to
capture currently.
2020-12-24 16:42:40 +03:00
Onur Tirtir 0db21bbe14
Remove fkey graph visited flags & rework GetConnectedListHelper (#4446)
With this commit, we remove visited flags from ForeignConstraintRelationshipNode
struct since keeping local state in global object is both dangerous and
meaningless.

Also to improve readability, this commit also converts needless recursion to
iterative DFS to avoid passing local hash-map as another parameter to
GetConnectedListHelper function.
2020-12-24 12:38:48 +03:00
Marco Slot e3dcc278e0 Remove upgrade_to_reference_table UDF 2020-12-23 00:40:14 +01:00
Onur Tirtir 3f60b08b11
Refactor foreign_key_relationship.c (#4438) 2020-12-22 18:12:02 +03:00
Sait Talha Nisanci 3aed6c3ad0 Rename containsOnlyLocalTable as isLocalTableModification
Update error message in Modify View
2020-12-15 18:18:36 +03:00
Sait Talha Nisanci 5618f3a3fc Use BaseRestrictInfo for finding equality columns
Baseinfo also has pushed down filters etc, so it makes more sense to use
BaseRestrictInfo to determine what columns have constant equality
filters.

Also RteIdentity is used for removing conversion candidates instead of
rteIndex.
2020-12-15 18:18:36 +03:00
Sait Talha Nisanci 69992d58f9 Add broken local-dist table modifications tests
It seems that most of the updates were broken, we weren't aware of it
because there wasn't any data in the tables. They are broken mostly
because local tables do not have a shard id and some code paths should
be updated with that information, currently when there is an invalid
shard id, it is assumed to be pruned.

Consider local tables in router planner

In case there is a local table, the shard id will not be valid and there
are some checks that rely on shard id, we should skip these in case of
local tables, which is handled with a dummy placement.

Add citus local table dist table join tests

add local-dist table mixed joins tests
2020-12-15 18:18:36 +03:00
Onur Tirtir 0eb5701658
Not consider single shard hash dist. tables as replicated (#4413) 2020-12-15 14:33:01 +03:00
Nils Dijk 6f9c040f76
DESCRIPTION: Propagate columnar table settings for distributed tables
When distributing a columnar table, as well as changing options on a distributed columnar table, this patch will forward the settings from the coordinator to the workers.

For propagating options changes on an already distributed table this change is pretty straight forward. Before applying the change in options locally we will create a `DDLJob` that contains a call to `alter_columnar_table_set(...)` for every shard placement with all settings of the current table. This goes both for setting an option as well as resetting. This will reset the values to the defaults configured on the coordinator. Having the effect that the coordinator is authoritative on the settings and makes sure the shards have the same settings set as the table on the coordinator.

When a columnar table is distributed it is using the `TableDDLCommand` infra structure to create a new kind of `TableDDLCommand`. This new type, called a `TableDDLCommandFunction` contains a context and 2 function pointers to execute. One function returns the command as applied on the table, the second function will return the sql command to apply to a shard with a given shard id. The schema name is ignored as it will use the fully qualified name of the shard in the same schema as the base table.
2020-12-02 13:02:42 +01:00
Nils Dijk 326e6afa53
refactor table ddl events scoped for shards (#4342)
Refactor internals on how Citus creates the SQL commands it sends to recreate shards.

Before Citus collected solely ddl commands as `char *`'s to recreate a table. If they were used to create a shard they were wrapped with `worker_apply_shard_ddl_command` and send to the workers. On the workers the UDF wrapping the ddl command would rewrite the parsetree to replace tables names with their shard name equivalent.

This worked well, but poses an issue when adding columnar. Due to limitations in Postgres on creating custom options on table access methods we need to fall back on a UDF to set columnar specific options. Now, to recreate the table, we can not longer rely on having solely DDL statements to recreate a table.

A prototype was made to run this UDF wrapped in `worker_apply_shard_ddl_command`. This became pretty messy, hard to understand and subsequently hard to maintain.

This PR proposes a refactor of the internal representation of table ddl commands into a `TableDDLCommand` structure. The current implementation only supports a `char *` as its contents. Based on the use of the DDL statement (eg. creating the table -mx- or creating a shard) one of two different functions can be called to get the statement to send to the worker:
 - `GetTableDDLCommand(TableDDLCommand *command)`: This function returns that ddl command to create the table. In this implementation it will just return the `char *`. This has the same functionality as getting the old list and not wrapping it.
 - `GetShardedTableDDLCommand(TableDDLCommand *command, uint64 shardId, char *schemaName)`: This function returns the ddl command wrapped in `worker_apply_shard_ddl_command` with the `shardId` as an argument. Due to backwards compatibility it also accepts a. `schemaName`. The exact purpose is not directly clear. Ideally new implementations would work with fully qualified statements and ignore the `schemaName`.

A future implementation could accept 2.function pointers and a `void *` for context to let the two pointers work on. This gives greater flexibility in controlling what commands get send in which situations. Also, in a future, we could implement the intermediate step of creating the `parsetree` datastructure of statements based on the contents in the catalog with a corresponding deparser. For sharded queries a mutator could be ran over the parsetree to rewrite the tablenames to the names with the shard identifier. This will completely omit the requirement for `worker_apply_shard_ddl_command`.
2020-11-26 13:31:59 +01:00
Onur Tirtir 46be63d76b
Refactor PreprocessIndexStmt (#4272) 2020-11-25 12:19:37 +03:00
Onder Kalaci 5c4c9304ba Remove RemoveDuplicateJoinRestrictions() function
RemoveDuplicateJoinRestrictions() function was introduced with the aim of decrasing the overall planning times by eliminating the duplicate JOIN restriction entries (#1989). However, it turns out that the function itself is so CPU intensive with a very high algorithmic complexity, it hurts a lot more than it helps. The function is a clear example of premature optimization.

The table below shows the difference clearly:

"distributed query planning
 time master"	RemoveDuplicateJoinRestrictions() execution time on master	"Remove the function RemoveDuplicateJoinRestrictions()
this PR"
5 table INNER JOIN	9 msec	2msec	7 msec
10 table INNER JOIN	227 msec	194 msec	29  msec
20 table INNER JOIN	1 sec 235 msec	1  sec 139  msec	90 msecs
50 table INNER JOIN	24 seconds	21 seconds	1.5 seconds
100 table INNER JOIN	2 minutes 16 secods	1 minute 53 seconds	23 seconds
250 table INNER JOIN	Bottleneck on JoinClauseList	18 minutes 52 seconds	Bottleneck on JoinClauseList

5 table INNER JOIN in subquery	9 msec	0 msec	6 msec
10 table INNER JOIN subquery	33 msec	10 msec	32 msec
20 table INNER JOIN subquery	132 msec	67 msec	123 msec
50 table INNER JOIN subquery	1.2  seconds	900 msec	500 msec
100 table INNER JOIN subquery	6 seconds	5  seconds	2 seconds
250 table INNER JOIN subquery	54 seconds	37 seconds	20  seconds

5 table LEFT JOIN	5 msec	0 msec	5 msec
10 table LEFT JOIN	11 msec	0 msec	13 msec
20 table LEFT JOIN	26 msec	2 msec	30 msec
50 table LEFT JOIN	150 msec	15 msec	193 msec
100 table LEFT JOIN	757 msec	71 msec	722 msec
250 table LEFT JOIN	8 seconds	600 msec	8 seconds

5 JOINs among 2 table JOINs 	37 msec	11 msec	25 msec
10 JOINs among 2 table JOINs 	536 msec	306 msec	352 msec
20 JOINs among 2 table JOINs 	794 msec	181 msec	640 msec
50 JOINs among 2 table JOINs 	25 seconds	2 seconds	22 seconds
100 JOINs among 2 table JOINs 	Bottleneck on JoinClauseList	9 seconds	Bottleneck on JoinClauseList
150 JOINs among 2 table JOINs 	Bottleneck on JoinClauseList	46 seconds	Bottleneck on JoinClauseList

On top of the performance penalty, the function had a critical bug #4255, and with #4254 we hit one more important bug. It should be fixed by adding the followig check to the ContextCoversJoinRestriction():
```
static bool
JoinRelIdsSame(JoinRestriction *leftRestriction, JoinRestriction *rightRestriction)
{
	Relids leftInnerRelIds = leftRestriction->innerrel->relids;
	Relids rightInnerRelIds = rightRestriction->innerrel->relids;
	if (!bms_equal(leftInnerRelIds, rightInnerRelIds))
	{
		return false;
	}

	Relids leftOuterRelIds = leftRestriction->outerrel->relids;
	Relids rightOuterRelIds = rightRestriction->outerrel->relids;
	if (!bms_equal(leftOuterRelIds, rightOuterRelIds))
	{
		return false;
	}

	return true;
}
```

However, adding this eliminates all the benefits tha RemoveDuplicateJoinRestrictions() brings.

I've used the commands here to generate the JOINs mentioned in the PR: https://gist.github.com/onderkalaci/fe8654f9df5916c7af4c7c5eb892561e#file-gistfile1-txt

Inner and outer JOINs behave roughly the same, to simplify the table only added INNER joins.
2020-10-21 10:29:39 +02:00
SaitTalhaNisanci 0f209377c4
Fix incorrect join related fields (#4242)
* Fix incorrect join related fields

Ruleutils expect to give the original index of join columns hence we
should consider the dropped columns while setting the fields in
SetJoinRelatedFieldsCompat.

* add some more tests for joins

* Move tests to join.sql and create a utility function
2020-10-19 18:28:39 +03:00
Onur Tirtir f80f4839ad Remove unused functions that cppcheck found 2020-10-19 13:50:52 +03:00
Onur Tirtir de6f2d3f42
Refactor JoinRestrictionListExistsInContext to improve readability (#4249) 2020-10-16 12:24:56 +03:00
Onur Tirtir 8efca3b60a
Fix a crash with inserting domain composite types in coord. evaluation (#4231)
Use short lived per-tuple context in citus_evaluate_expr like
(pg) evaluate_expr does.

We should not use planState->ExprContext when evaluating expressions
as it might lead to freeing the same executor twice (first one happens
in citus_evaluate_expr itself and the other one happens when postgres
doing clean-up for the top level executor state), which in turn might
cause seg.faults.

However, now as we don't have necessary planState info to evaluate
prepared statements, we also add planState->es_param_list_info to
per-tuple ExprContext.
2020-10-13 14:19:59 +03:00
Halil Ozan Akgul e2736c25bd Adds support for WITH TIES option 2020-10-12 19:34:18 +03:00
Marco Slot 881e5df780 Fix a bug that could lead to multiple maintenance daemons 2020-10-08 16:18:14 +02:00
Ahmet Gedemenli 81db4dca5c Degrade gracefully when no background workers available 2020-10-05 16:55:00 +03:00
Önder Kalacı df5aa0f0cc
Switch to sequential execution if the index name is long (#4209)
Citus has the logic to truncate the long shard names to prevent
various issues, including self-deadlocks. However, for partitioned
tables, when index is created on the parent table, the index names
on the partitions are auto-generated by Postgres. We use the same
Postgres function to generate the index names on the shards of the
partitions. If the length exceeds the limit, we switch to sequential
execution mode.
2020-10-02 13:39:34 +03:00
Ahmet Gedemenli abfb79bda6 Sort explain analyze output by task time
Add sort method parameter for regression tests

Fix check-style

Change sorting method parameters to enum

Polish

Add task fields to OutTask

Add test into multi_explain

Fix isolation test
2020-09-24 11:38:40 +03:00
SaitTalhaNisanci e7cd1ed0ee
Not take ShareUpdateExlusiveLock on pg_dist_transaction (#4184)
* Not take ShareUpdateExlusiveLock on pg_dist_transaction

We were taking ShareUpdateExlusiveLock on pg_dist_transaction during
recovery to prevent multiple recoveries happening concurrenly. VACUUM(
not FULL) also takes ShareUpdateExclusiveLock, and they can conflict. It
seems that VACUUM will skip the table if there is a conflicting lock
already taken unless it is doing the vacuum to prevent id wraparound, in
which case there can be a deadlock. I guess the deadlock happens if:

- VACUUM takes a lock on pg_dist_transaction and is done for id
wraparound problem
- The transaction in the maintenance tries to take a lock but
cannot as that conflicts with the lock acquired by VACUUM
- The transaction in the maintenance daemon has a very old xid hence
VACUUM cannot proceed.

If we take a row exclusive lock in transaction recovery then it wouldn't
conflict with VACUUM hence it could proceed so the deadlock would be
resolved. To prevent concurrent transaction recoveries happening, an
advisory lock is taken with ShareUpdateExlusiveLock as before.

* Use CITUS_OPERATIONS tag
2020-09-21 15:20:38 +03:00
Onur Tirtir 1b31b22635 Refactor the functions that return OID lists for citus tables 2020-09-18 16:42:46 +03:00
SaitTalhaNisanci 6e316d46a2
Remove unused variable (#4172) 2020-09-18 11:25:07 +03:00
Onur Tirtir 0b1cc118a9 Adapt other cache entry changes for citus local tables 2020-09-09 11:50:55 +03:00
Onur Tirtir a58a4395ab Extend citus local table utility command support
This commit brings following features:

Foreign key support from citus local tables to reference tables
* Foreign key support from reference tables to citus local tables
  (only with RESTRICT & NO ACTION behavior)
* ALTER TABLE ENABLE/DISABLE trigger command support
* CREATE/DROP/ALTER trigger command support

and disallows:
* ALTER TABLE ATTACH/DETACH PARTITION commands
* CREATE TABLE <postgres table> ATTACH PARTITION <citus local table>
  commands
* Foreign keys from postgres tables to citus local tables
  (the other way was already disallowed)

for citus local tables.
2020-09-09 11:50:55 +03:00
Onur Tirtir 17cc810372 Implement "citus local table" creation logic 2020-09-09 11:50:48 +03:00
Nils Dijk 6e4862c57f
expose transfermode for ensure reference table existance 2020-09-03 16:06:37 +02:00
SaitTalhaNisanci 366461ccdb
Introduce cache entry/table utilities (#4132)
Introduce table entry utility functions

Citus table cache entry utilities are introduced so that we can easily
extend existing functionality with minimum changes, specifically changes
to these functions. For example IsNonDistributedTableCacheEntry can be
extended for citus local tables without the need to scan the whole
codebase and update each relevant part.

* Introduce utility functions to find the type of tables

A table type can be a reference table, a hash/range/append distributed
table. Utility methods are created so that we don't have to worry about
how a table is considered as a reference table etc. This also makes it
easy to extend the table types.

* Add IsCitusTableType utilities

* Rename IsCacheEntryCitusTableType -> IsCitusTableTypeCacheEntry

* Change citus table types in some checks
2020-09-02 22:26:05 +03:00
SaitTalhaNisanci 73ef40886b
Rename FindNodeCheckXXX functions (#4106)
FindNodeCheck is not clear about what the function is doing. They are
renamed to FindNodeMatchingCheckFunctionXXX. Also for choosing elements in these
functions, CheckNodeFunc type is introduced.
2020-08-11 15:01:23 +03:00