Commit Graph

337 Commits (3b1c082791592097e2c944ef98ec86e4234882bf)

Author SHA1 Message Date
Jelte Fennema 4c20bf7a36
Remove pg_dist_rebalence_strategy_enterprise_check (#5014)
This is not necessary anymore now that the rebalancer is open source.
2021-06-01 06:16:46 -07:00
Jelte Fennema 10f06ad753 Fetch shard size on the fly for the rebalance monitor
Without this change the rebalancer progress monitor gets the shard sizes
from the `shardlength` column in `pg_dist_placement`. This column needs to
be updated manually by calling `citus_update_table_statistics`.
However, `citus_update_table_statistics` could lead to distributed
deadlocks while database traffic is on-going (see #4752).

To work around this we don't use `shardlength` column anymore. Instead
for every rebalance we now fetch all shard sizes on the fly.

Two additional things this does are:
1. It adds tests for the rebalance progress function.
2. If a shard move cannot be done because a source or target node is
   unreachable, then we error in stop the rebalance, instead of showing
   a warning and continuing. When using the by_disk_size rebalance
   strategy it's not safe to continue with other moves if a specific
   move failed. It's possible that the failed move made space for the
   next move, and because the failed move never happened this space now
   does not exist.
3. Adds two new columns to the result of `get_rebalancer_progress` which
   shows the size of the shard on the source and target node.

Fixes #4930
2021-05-20 16:38:17 +02:00
Jelte Fennema 924959fdb1
Include result type in upgrade diff test (#4987)
We often change result types of functions slightly. Our downgrade tests
wouldn't notice these changes. This change adds them to the description
of these items.

An example of an SQL change that isn't caught without this change and is
caught with the get_rebalance_progress change in this PR:
https://github.com/citusdata/citus/pull/4963
2021-05-18 16:25:39 +02:00
Jelte Fennema cbbd10b974
Implement an improvement threshold in the rebalancer (#4927)
Every move in the rebalancer algorithm results in an improvement in the
balance. However, even if the improvement in the balance was very small
the move was still chosen. This is especially problematic if the shard
itself is very big and the move will take a long time.

This changes the rebalancer algorithm to take the relative size of the
balance improvement into account when choosing moves. By default a move
will not be chosen if it improves the balance by less than half of the
size of the shard. An extra argument is added to the rebalancer
functions so that the user can decide to lower the default threshold if
the ignored move is wanted anyway.
2021-05-11 14:24:59 +02:00
SaitTalhaNisanci 6b1904d37a
When moving a shard to a new node ensure there is enough space (#4929)
* When moving a shard to a new node ensure there is enough space

* Add WairForMiliseconds time utility

* Add more tests and increase readability

* Remove the retry loop and use a single udf for disk stats

* Address review

* address review

Co-authored-by: Jelte Fennema <github-tech@jeltef.nl>
2021-05-06 17:28:02 +03:00
Ahmet Gedemenli 332c5ce4ad
Fix worker partitioned size functions (#4922) 2021-04-26 10:29:46 +03:00
Ahmet Gedemenli e445e3d39c
Introduce 3 partitioned size udfs (#4899)
* Introduce 3 partitioned size udfs

* Add tests for new partition size udfs

* Fix type incompatibilities

* Convert UDFs into pure sql functions

* Fix function comment
2021-04-13 17:36:27 +03:00
Onur Tirtir fe5c985e1d
Remove HAS_TABLEAM config since we dropped pg11 support (#4862)
* Remove HAS_TABLEAM config

* Drop columnar_ensure_objects_exist

* Not call columnar_ensure_objects_exist in citus_finish_pg_upgrade
2021-04-13 10:51:26 +03:00
Halil Ozan Akgul a5038046f9 Adds shard_count parameter to create_distributed_table 2021-03-29 16:22:49 +03:00
Ahmet Gedemenli 5e5db9eefa Add udf citus_get_active_worker_nodes 2021-03-17 13:15:59 +03:00
Onur Tirtir dcc0207605 Add 10.0-2 schema version 2021-02-26 12:31:09 +03:00
Onur Tirtir 676d9a9726 Bump Citus to 10.1devel 2021-02-17 11:54:33 +03:00
Hadi Moshayedi c8d61a31e2 Columnar: chunk_group metadata table 2021-02-09 14:11:58 -08:00
Hanefi Önaldı cab17afce9 Introduce UDFs for fixing partitioned table constraint names 2021-01-29 17:32:20 +03:00
Onur Tirtir dfcdccd0e7 Rename udf in regression tests (as per prev commit) 2021-01-27 15:52:37 +03:00
Onur Tirtir 1a4482a37c Get rid of the sql dir for new udf 2021-01-27 15:52:37 +03:00
Onur Tirtir 2f30be823e Rename create_citus_local_table to citus_add_local_table_to_metadata
For simplicity in downgrade test in multi_extension, didn't
actually remove create_citus_local_table udf.
2021-01-27 15:52:36 +03:00
Onur Tirtir 941c8fbf32
Automatically undistribute citus local tables when no more fkeys with reference tables (#4538) 2021-01-22 18:15:41 +03:00
Ahmet Gedemenli 76354ff563
Merge branch 'master' into remove-deprecated-gucs-udfs 2021-01-22 12:47:06 +03:00
Hadi Moshayedi ff38996645 More meaningful columnar metadata table names 2021-01-21 21:29:07 -08:00
Ahmet Gedemenli ceb6b503c0 Remove unused UDF mark_tables_colocated 2021-01-20 17:29:23 +03:00
Marco Slot b840e97cd6 Add a alter_old_partitions_set_access_method UDF 2021-01-14 10:44:14 +01:00
Halil Ozan Akgul 2be14cce2e Adds alter_distributed_table and alter_table_set_access_method UDFs 2021-01-13 16:02:39 +03:00
SaitTalhaNisanci 724d56f949
Add citus shard helper view (#4361)
With citus shard helper view, we can easily see:
- where each shard is, which node, which port
- what kind of table it belongs to
- its size

With such a view, we can see shards that have a size bigger than some
value, which could be useful. Also debugging can be easier in production
as well with this view.

Fetch shards in one go per node

The previous implementation was slow because it would do a lot of round
trips, one per shard to be exact. Hence it is improved so that we fetch
all the shard_name, shard-size pairs per node in one go.

Construct shards_names, sizes query on coordinator
2021-01-13 13:58:47 +03:00
Ahmet Gedemenli 436c9d9d79
Remove the word 'master' from Citus UDFs (#4472)
* Replace master_add_node with citus_add_node

* Replace master_activate_node with citus_activate_node

* Replace master_add_inactive_node with citus_add_inactive_node

* Use master udfs in old scripts

* Replace master_add_secondary_node with citus_add_secondary_node

* Replace master_disable_node with citus_disable_node

* Replace master_drain_node with citus_drain_node

* Replace master_remove_node with citus_remove_node

* Replace master_set_node_property with citus_set_node_property

* Replace master_unmark_object_distributed with citus_unmark_object_distributed

* Replace master_update_node with citus_update_node

* Replace master_update_shard_statistics with citus_update_shard_statistics

* Replace master_update_table_statistics with citus_update_table_statistics

* Rename master_conninfo_cache_invalidate to citus_conninfo_cache_invalidate

Rename master_dist_local_group_cache_invalidate to citus_dist_local_group_cache_invalidate

* Replace master_copy_shard_placement with citus_copy_shard_placement

* Replace master_move_shard_placement with citus_move_shard_placement

* Rename master_dist_node_cache_invalidate to citus_dist_node_cache_invalidate

* Rename master_dist_object_cache_invalidate to citus_dist_object_cache_invalidate

* Rename master_dist_partition_cache_invalidate to citus_dist_partition_cache_invalidate

* Rename master_dist_placement_cache_invalidate to citus_dist_placement_cache_invalidate

* Rename master_dist_shard_cache_invalidate to citus_dist_shard_cache_invalidate

* Drop master_modify_multiple_shards

* Rename master_drop_all_shards to citus_drop_all_shards

* Drop master_create_distributed_table

* Drop master_create_worker_shards

* Revert old function definitions

* Add missing revoke statement for citus_disable_node
2021-01-13 12:10:43 +03:00
Marco Slot d900a7336e Automatically add placeholder record for coordinator 2021-01-08 15:09:53 +01:00
Marco Slot 597533b1ff Add citus_set_coordinator_host 2021-01-08 13:36:26 +01:00
Marco Slot e7f13978b5 Add a view for simple (time) partitions and their access methods 2021-01-08 11:28:15 +01:00
Onur Tirtir 5289785da4
Add cascade_via_foreign_keys option to create_citus_local_table (#4462) 2021-01-08 15:13:26 +03:00
Onur Tirtir f3801143fb Add cascade option to undistribute_table 2021-01-07 15:41:49 +03:00
Marco Slot e3dcc278e0 Remove upgrade_to_reference_table UDF 2020-12-23 00:40:14 +01:00
Marco Slot 321cc784c7 Collapse Citus 7.* scripts into Citus 8.0-1 2020-12-21 22:55:51 +01:00
Marco Slot 8e8adcd92a Harden citus_tables against node failure 2020-12-13 15:10:40 +01:00
Hadi Moshayedi 4668fe51a6 Columnar: Make compression level configurable 2020-12-09 08:48:50 -08:00
Jeff Davis 5b3c32eb38 fixup tests 2020-12-07 13:18:22 -08:00
Jeff Davis 068af7f38e fixup upgrade tests 2020-12-07 13:11:51 -08:00
Marco Slot c9b658daea Add a public.citus_tables view 2020-12-03 17:31:40 +01:00
Hadi Moshayedi a94e8c9cda
Associate column store metadata with storage id (#4347) 2020-11-30 18:01:43 -08:00
Nils Dijk 383e334023
refactor options to their own table linked to the regclass (#4346)
Columnar options were by accident linked to the relfilenode instead of the regclass/relation oid. This PR moves everything related to columnar options to their own catalog table.
2020-11-27 11:22:08 -08:00
Jeff Davis 8cee2b092b remove columnar FDW code 2020-11-20 10:03:12 -08:00
Jeff Davis cef1d0e915 fixup test output 2020-11-19 12:45:52 -08:00
Jeff Davis 91015deb9d rename UDFs also 2020-11-19 12:27:40 -08:00
Jeff Davis a2b698a766 rename cstore_tableam -> columnar 2020-11-19 12:15:51 -08:00
Nils Dijk 22df8027b0
add extra output for multi_extension targeting pg11 2020-11-17 19:01:54 +01:00
Nils Dijk 7c891a01a9 create missing objects during upgrade path 2020-11-17 19:01:51 +01:00
Nils Dijk d065bb495d
Prepare downgrade script and bump development version to 10.0-1 2020-11-17 18:55:35 +01:00
Onur Tirtir 5e3dc9d707 Bump citus version to 10.0devel 2020-11-09 13:16:54 +03:00
Marco Slot 881e5df780 Fix a bug that could lead to multiple maintenance daemons 2020-10-08 16:18:14 +02:00
Marco Slot 18219843d0 Add maintenance daemon error tests 2020-10-08 16:17:33 +02:00
Marco Slot dbc348b7e0 Create sequence dependency during metadata syncing 2020-10-06 10:57:39 +02:00
Marco Slot 9bba8bb4e8 Remove master_drop_sequences 2020-10-06 10:57:33 +02:00
Onur Tirtir 64d5ac6a10
Do not downgrade if a citus local table exists (#4174)
As the previous versions of Citus don't know how to handle citus local
tables, we should prevent downgrading from 9.5 to older versions if any
citus local tables exists.
2020-09-22 14:19:50 +03:00
Onur Tirtir 17cc810372 Implement "citus local table" creation logic 2020-09-09 11:50:48 +03:00
Halil Ozan Akgul 375310b7f1 Adds support for table undistribution 2020-08-05 14:36:03 +03:00
SaitTalhaNisanci b3af63c8ce
Remove task tracker executor (#3850)
* use adaptive executor even if task-tracker is set

* Update check-multi-mx tests for adaptive executor

Basically repartition joins are enabled where necessary. For parallel
tests max adaptive executor pool size is decresed to 2, otherwise we
would get too many clients error.

* Update limit_intermediate_size test

It seems that when we use adaptive executor instead of task tracker, we
exceed the intermediate result size less in the test. Therefore updated
the tests accordingly.

* Update multi_router_planner

It seems that there is one problem with multi_router_planner when we use
adaptive executor, we should fix the following error:
+ERROR:  relation "authors_range_840010" does not exist
+CONTEXT:  while executing command on localhost:57637

* update repartition join tests for check-multi

* update isolation tests for repartitioning

* Error out if shard_replication_factor > 1 with repartitioning

As we are removing the task tracker, we cannot switch to it if
shard_replication_factor > 1. In that case, we simply error out.

* Remove MULTI_EXECUTOR_TASK_TRACKER

* Remove multi_task_tracker_executor

Some utility methods are moved to task_execution_utils.c.

* Remove task tracker protocol methods

* Remove task_tracker.c methods

* remove unused methods from multi_server_executor

* fix style

* remove task tracker specific tests from worker_schedule

* comment out task tracker udf calls in tests

We were using task tracker udfs to test permissions in
multi_multiuser.sql. We should find some other way to test them, then we
should remove the commented out task tracker calls.

* remove task tracker test from follower schedule

* remove task tracker tests from multi mx schedule

* Remove task-tracker specific functions from worker functions

* remove multi task tracker extra schedule

* Remove unused methods from multi physical planner

* remove task_executor_type related things in tests

* remove LoadTuplesIntoTupleStore

* Do initial cleanup for repartition leftovers

During startup, task tracker would call TrackerCleanupJobDirectories and
TrackerCleanupJobSchemas to clean up leftover directories and job
schemas. With adaptive executor, while doing repartitions it is possible
to leak these things as well. We don't retry cleanups, so it is possible
to have leftover in case of errors.

TrackerCleanupJobDirectories is renamed as
RepartitionCleanupJobDirectories since it is repartition specific now,
however TrackerCleanupJobSchemas cannot be used currently because it is
task tracker specific. The thing is that this function is a no-op
currently.

We should add cleaning up intermediate schemas to DoInitialCleanup
method when that problem is solved(We might want to solve it in this PR
as well)

* Revert "remove task tracker tests from multi mx schedule"

This reverts commit 03ecc0a681.

* update multi mx repartition parallel tests

* not error with task_tracker_conninfo_cache_invalidate

* not run 4 repartition queries in parallel

It seems that when we run 4 repartition queries in parallel we get too
many clients error on CI even though we don't get it locally. Our guess
is that, it is because we open/close many connections without doing some
work and postgres has some delay to close the connections. Hence even
though connections are removed from the pg_stat_activity, they might
still not be closed. If the above assumption is correct, it is unlikely
for it to happen in practice because:
- There is some network latency in clusters, so this leaves some times
for connections to be able to close
- Repartition joins return some data and that also leaves some time for
connections to be fully closed.

As we don't get this error in our local, we currently assume that it is
not a bug. Ideally this wouldn't happen when we get rid of the
task-tracker repartition methods because they don't do any pruning and
might be opening more connections than necessary.

If this still gives us "too many clients" error, we can try to increase
the max_connections in our test suite(which is 100 by default).

Also there are different places where this error is given in postgres,
but adding some backtrace it seems that we get this from
ProcessStartupPacket. The backtraces can be found in this link:
https://circleci.com/gh/citusdata/citus/138702

* Set distributePlan->relationIdList when it is needed

It seems that we were setting the distributedPlan->relationIdList after
JobExecutorType is called, which would choose task-tracker if
replication factor > 1 and there is a repartition query. However, it
uses relationIdList to decide if the query has a repartition query, and
since it was not set yet, it would always think it is not a repartition
query and would choose adaptive executor when it should choose
task-tracker.

* use adaptive executor even with shard_replication_factor > 1

It seems that we were already using adaptive executor when
replication_factor > 1. So this commit removes the check.

* remove multi_resowner.c and deprecate some settings

* remove TaskExecution related leftovers

* change deprecated API error message

* not recursively plan single relatition repartition subquery

* recursively plan single relation repartition subquery

* test depreceated task tracker functions

* fix overlapping shard intervals in range-distributed test

* fix error message for citus_metadata_container

* drop task-tracker deprecated functions

* put the implemantation back to worker_cleanup_job_schema_cachesince citus cloud uses it

* drop some functions, add downgrade script

Some deprecated functions are dropped.
Downgrade script is added.
Some gucs are deprecated.
A new guc for repartition joins bucket size is added.

* order by a test to fix flappiness
2020-07-18 13:11:36 +03:00
Onur Tirtir be17ebb334 Bump citus version to 9.5devel 2020-07-01 14:46:55 +03:00
Hanefi Önaldı ca2ececb3b
Downgrade path from 9.4 to 9.3 to 9.2 2020-07-01 10:38:11 +03:00
Onur Tirtir 2e927bd6b7
Bump Citus to 9.4devel (#3788) 2020-04-22 12:50:00 +03:00
Nils Dijk 1d6ba1d09e
Refactor alter role to work on distributed roles (#3739)
DESCRIPTION: Alter role only works for citus managed roles

Alter role was implemented before we implemented good role management that hooks into the object propagation framework. This is a refactor of all alter role commands that have been implemented to
 - be on by default
 - only work for supported roles
 - make the citus extension owner a supported role

Instead of distributing the alter role commands for roles at the beginning of the node activation role it now _only_ executes the alter role commands for all users in all databases and in the current database.

In preparation of full role support small refactors have been done in the deparser.

Earlier tests targeting other roles than the citus extension owner have been either slightly changed or removed to be put back where we have full role support.

Fixes #2549
2020-04-16 12:23:27 +02:00
SaitTalhaNisanci ebda3eff61
read database name inside the function (#3730) 2020-04-09 13:11:13 +03:00
Hanefi Önaldı d1223bd6cc
Remove migration paths to 9.3-1, introduce 9.3-2 2020-04-03 12:50:45 +03:00
SaitTalhaNisanci 710970407f
not wait forever in multi_extension test (#3702) 2020-04-03 12:21:02 +03:00
Philip Dubé bcf54c5014 Address a couple issues with maintenace daemon management:
- Stop the daemon when citus extension is dropped
- Bail on maintenance daemon startup if myDbData is started with a non-zero pid
- Stop maintenance daemon from spawning itself
- Don't use postgres die, just wrap proc_exit(0)
- Assert(myDbData->workerPid == MyProcPid)

The two issues were that multiple daemons could be running for a database,
or that a daemon would be leftover after DROP EXTENSION citus
2020-02-21 16:49:01 +00:00
Nils Dijk 6ee82c381e
Add missing pieces for version bump of #3482 (#3523) 2020-02-21 12:35:29 +01:00
Onur Tirtir cd8210d516
Bump citus version to 9.3devel (#3482) 2020-02-13 16:22:05 +03:00
Philip Dubé 69dde460de See what flaky multi_extension test is doing with roles 2020-01-23 21:50:40 +00:00
Jelte Fennema 7730bd449c Normalize tests: Remove trailing whitespace 2020-01-06 09:32:03 +01:00
Jelte Fennema 7f3de68b0d Normalize tests: header separator length 2020-01-06 09:32:03 +01:00
Marco Slot 4c8d43c5d0 Bump repo version to 9.2devel 2019-11-29 07:33:39 +01:00
Hanefi Onaldi e3ad4aba94
Bump 9.1devel
* Add Changelog entry for 9.0.1
* Bump citus version to 9.1devel
2019-11-19 10:35:57 +03:00
Jelte Fennema 01da11f264
Change citus truncate trigger to AFTER and add more upgrade tests (#3070)
* Add more upgrade tests

* Fix citus trigger generation after upgrade

citus_truncate_trigger runs before truncate when created by create_distributed_table:
492d1b2cba/src/backend/distributed/commands/create_distributed_table.c (L1163)

* Remove pg_dist_jobid_seq
2019-10-07 16:43:04 +02:00
Nils Dijk 9c2c50d875
Hookup function/procedure deparsing to our utility hook (#3041)
DESCRIPTION: Propagate ALTER FUNCTION statements for distributed functions

Using the implemented deparser for function statements to propagate changes to both functions and procedures that are previously distributed.
2019-09-27 22:06:49 +02:00
Jelte Fennema 4bbf65d913
Change SQL migration build process for easier reviews (#2951)
@thanodnl told me it was a bit of a problem that it's impossible to see
the history of a UDF in git. The only way to do so is by reading all the
sql migration files from new to old. Another problem is that it's also
hard to review the changed UDF during code review, because to find out
what changed you have to do the same. I thought of a IMHO better (but
not perfect) way to handle this.

We keep the definition of a UDF in sql/udfs/{name_of_udf}/latest.sql.
That file we change whenever we need to make a change to the the UDF. On
top of that you also make a snapshot of the file in
sql/udfs/{name_of_udf}/{migration-version}.sql (e.g. 9.0-1.sql) by
copying the contents. This way you can easily view what the actual
changes were by looking at the latest.sql file.

There's still the question on how to use these files then. Sadly
postgres doesn't allow inclusion of other sql files in the migration sql
file (it does in psql using \i). So instead I used the C preprocessor+
make to compile a sql/xxx.sql to a build/sql/xxx.sql file. This final
build/sql/xxx.sql file has every occurence of #include "somefile.sql" in
sql/xxx.sql replaced by the contents of somefile.sql.
2019-09-13 18:44:27 +02:00
Nils Dijk 936d546a3c
Refactor Ensure Schema Exists to Ensure Dependecies Exists (#2882)
DESCRIPTION: Refactor ensure schema exists to dependency exists

Historically we only supported schema's as table dependencies to be created on the workers before a table gets distributed. This PR puts infrastructure in place to walk pg_depend to figure out which dependencies to create on the workers. Currently only schema's are supported as objects to create before creating a table.

We also keep track of dependencies that have been created in the cluster. When we add a new node to the cluster we use this catalog to know which objects need to be created on the worker.

Side effect of knowing which objects are already distributed is that we don't have debug messages anymore when creating schema's that are already created on the workers.
2019-09-04 14:10:20 +02:00
Nils Dijk be6b7bec69
Add UDF citus_(prepare|finish)_pg_upgrade to aid with upgrading citus (#2877)
DESCRIPTION: Add functions to help with postgres upgrades

Currently there is [a list of manual steps](https://docs.citusdata.com/en/v8.2/admin_guide/upgrading_citus.html?highlight=upgrade#upgrading-postgresql-version-from-10-to-11) to perform during a postgres upgrade. These steps guarantee our catalog tables are kept and counter values are maintained across upgrades.

Having more than 1 command in our docs for users to manually execute during upgrades is error prone for both the user, and our docs. There are already 2 catalog tables that have been introduced to citus that have not been added to our docs for backing up during upgrades (`pg_authinfo` and `pg_dist_poolinfo`).

As we add more functionality to citus we run into situations where there are more steps required either before or after the upgrade. At the same time, when we move catalog tables to a place where the contents will be maintained automatically during upgrades we could have less steps in our docs. This will come to a hard to maintain matrix of citus versions and steps to be performed.

Instead we could take ownership of these steps within the extension itself. This PR introduces two new functions for the user to use instead of long lists of error prone instructions to follow.
 - `citus_prepare_pg_upgrade`
    This function should be called by the user right before shutting down the cluster. This will ensure all citus catalog tables are backed up in a location where the information will be retained during an upgrade.
- `citus_finish_pg_upgrade`
    This function should be called right after a pg_upgrade of the cluster. This will restore the catalog tables to the state before the upgrade happend.

Both functions need to be executed both on the coordinator and on all the workers, in the same fashion our current documentation instructs to do.

There are two known problems with this function in its current form, which is also a problem with our docs. We should schedule time in the future to improve on this, but having it automated now is better as we are about to add extra steps to take after upgrades.
 - When you install citus in a clean cluster we do enable ssl for communication between the coordinator and the workers. If an upgrade to a clean cluster is performed we do not setup ssl on the new cluster causing the communication to fail.
 - There are no automated tests added in this PR to execute an upgrade test durning every build. 
    Our current test infrastructure does not allow for 2 versions of postgres to exist in the same environment. We will need to invest time to create a new testing harness that could run the following scenario:
      1. Create cluster
      2. Run extensible scripts to execute arbitrary statements on this cluster
      3. Perform an upgrade by preparing, upgrading and finishing
      4. Run extensible scripts to verify all objects created by earlier scripts exists in correct form in the upgraded cluster

    Given the non trivial amount of work involved for such a suite I'd like to land this before we have 
automated testing.

On a side note; As the reviewer noticed, the tables created in the public namespace are not visible in `psql` with `\d`. The backup catalog tables have the same name as the tables in `pg_catalog`. Due to postgres internals `pg_catalog` is first in the search path and therefore the non-qualified name would alwasy resolve to `pg_catalog.pg_dist_*`. Internally this is called a non-visible table as it would resolve to a different table without a qualified name. Only visible tables are shown with `\d`.
2019-08-13 15:53:10 +02:00
Philip Dubé 50144b75d0 Add check-empty to testing Makefile
Don't create functions multiple times
Move ALTER TABLEs to their declaration
Remove DROP FUNCTIONS IF EXISTS, OR REPLACE
2019-07-24 11:03:54 -07:00
Philip Dubé acbaa38a62 Squash migrations for versions 5/6, don't use WITH OIDS 2019-07-24 11:03:29 -07:00
Marco Slot efbe58eab2 Fix SQL schema version, we skipped 8.3 2019-07-17 16:05:25 +02:00
Hanefi Onaldi 5a6eba6ba9
Bump Citus to 8.4devel 2019-07-10 15:26:10 +03:00
Hadi Moshayedi d233887d68 Fix multi_extension in check-multi-vg 2019-07-04 13:03:46 +02:00
Marco Slot d6c667946c Fix citus_executor_name mapping by reimplementing it in C 2019-06-29 22:38:29 +02:00
Önder Kalacı 40da78c6fd
Introduce the adaptive executor (#2798)
With this commit, we're introducing the Adaptive Executor. 


The commit message consists of two distinct sections. The first part explains
how the executor works. The second part consists of the commit messages of
the individual smaller commits that resulted in this commit. The readers
can search for the each of the smaller commit messages on 
https://github.com/citusdata/citus and can learn more about the history
of the change.

/*-------------------------------------------------------------------------
 *
 * adaptive_executor.c
 *
 * The adaptive executor executes a list of tasks (queries on shards) over
 * a connection pool per worker node. The results of the queries, if any,
 * are written to a tuple store.
 *
 * The concepts in the executor are modelled in a set of structs:
 *
 * - DistributedExecution:
 *     Execution of a Task list over a set of WorkerPools.
 * - WorkerPool
 *     Pool of WorkerSessions for the same worker which opportunistically
 *     executes "unassigned" tasks from a queue.
 * - WorkerSession:
 *     Connection to a worker that is used to execute "assigned" tasks
 *     from a queue and may execute unasssigned tasks from the WorkerPool.
 * - ShardCommandExecution:
 *     Execution of a Task across a list of placements.
 * - TaskPlacementExecution:
 *     Execution of a Task on a specific placement.
 *     Used in the WorkerPool and WorkerSession queues.
 *
 * Every connection pool (WorkerPool) and every connection (WorkerSession)
 * have a queue of tasks that are ready to execute (readyTaskQueue) and a
 * queue/set of pending tasks that may become ready later in the execution
 * (pendingTaskQueue). The tasks are wrapped in a ShardCommandExecution,
 * which keeps track of the state of execution and is referenced from a
 * TaskPlacementExecution, which is the data structure that is actually
 * added to the queues and describes the state of the execution of a task
 * on a particular worker node.
 *
 * When the task list is part of a bigger distributed transaction, the
 * shards that are accessed or modified by the task may have already been
 * accessed earlier in the transaction. We need to make sure we use the
 * same connection since it may hold relevant locks or have uncommitted
 * writes. In that case we "assign" the task to a connection by adding
 * it to the task queue of specific connection (in
 * AssignTasksToConnections). Otherwise we consider the task unassigned
 * and add it to the task queue of a worker pool, which means that it
 * can be executed over any connection in the pool.
 *
 * A task may be executed on multiple placements in case of a reference
 * table or a replicated distributed table. Depending on the type of
 * task, it may not be ready to be executed on a worker node immediately.
 * For instance, INSERTs on a reference table are executed serially across
 * placements to avoid deadlocks when concurrent INSERTs take conflicting
 * locks. At the beginning, only the "first" placement is ready to execute
 * and therefore added to the readyTaskQueue in the pool or connection.
 * The remaining placements are added to the pendingTaskQueue. Once
 * execution on the first placement is done the second placement moves
 * from pendingTaskQueue to readyTaskQueue. The same approach is used to
 * fail over read-only tasks to another placement.
 *
 * Once all the tasks are added to a queue, the main loop in
 * RunDistributedExecution repeatedly does the following:
 *
 * For each pool:
 * - ManageWorkPool evaluates whether to open additional connections
 *   based on the number unassigned tasks that are ready to execute
 *   and the targetPoolSize of the execution.
 *
 * Poll all connections:
 * - We use a WaitEventSet that contains all (non-failed) connections
 *   and is rebuilt whenever the set of active connections or any of
 *   their wait flags change.
 *
 *   We almost always check for WL_SOCKET_READABLE because a session
 *   can emit notices at any time during execution, but it will only
 *   wake up WaitEventSetWait when there are actual bytes to read.
 *
 *   We check for WL_SOCKET_WRITEABLE just after sending bytes in case
 *   there is not enough space in the TCP buffer. Since a socket is
 *   almost always writable we also use WL_SOCKET_WRITEABLE as a
 *   mechanism to wake up WaitEventSetWait for non-I/O events, e.g.
 *   when a task moves from pending to ready.
 *
 * For each connection that is ready:
 * - ConnectionStateMachine handles connection establishment and failure
 *   as well as command execution via TransactionStateMachine.
 *
 * When a connection is ready to execute a new task, it first checks its
 * own readyTaskQueue and otherwise takes a task from the worker pool's
 * readyTaskQueue (on a first-come-first-serve basis).
 *
 * In cases where the tasks finish quickly (e.g. <1ms), a single
 * connection will often be sufficient to finish all tasks. It is
 * therefore not necessary that all connections are established
 * successfully or open a transaction (which may be blocked by an
 * intermediate pgbouncer in transaction pooling mode). It is therefore
 * essential that we take a task from the queue only after opening a
 * transaction block.
 *
 * When a command on a worker finishes or the connection is lost, we call
 * PlacementExecutionDone, which then updates the state of the task
 * based on whether we need to run it on other placements. When a
 * connection fails or all connections to a worker fail, we also call
 * PlacementExecutionDone for all queued tasks to try the next placement
 * and, if necessary, mark shard placements as inactive. If a task fails
 * to execute on all placements, the execution fails and the distributed
 * transaction rolls back.
 *
 * For multi-row INSERTs, tasks are executed sequentially by
 * SequentialRunDistributedExecution instead of in parallel, which allows
 * a high degree of concurrency without high risk of deadlocks.
 * Conversely, multi-row UPDATE/DELETE/DDL commands take aggressive locks
 * which forbids concurrency, but allows parallelism without high risk
 * of deadlocks. Note that this is unrelated to SEQUENTIAL_CONNECTION,
 * which indicates that we should use at most one connection per node, but
 * can run tasks in parallel across nodes. This is used when there are
 * writes to a reference table that has foreign keys from a distributed
 * table.
 *
 * Execution finishes when all tasks are done, the query errors out, or
 * the user cancels the query.
 *
 *-------------------------------------------------------------------------
 */



All the commits involved here:
* Initial unified executor prototype

* Latest changes

* Fix rebase conflicts to master branch

* Add missing variable for assertion

* Ensure that master_modify_multiple_shards() returns the affectedTupleCount

* Adjust intermediate result sizes

The real-time executor uses COPY command to get the results
from the worker nodes. Unified executor avoids that which
results in less data transfer. Simply adjust the tests to lower
sizes.

* Force one connection per placement (or co-located placements) when requested

The existing executors (real-time and router) always open 1 connection per
placement when parallel execution is requested.

That might be useful under certain circumstances:

(a) User wants to utilize as much as CPUs on the workers per
distributed query
(b) User has a transaction block which involves COPY command

Also, lots of regression tests rely on this execution semantics.
So, we'd enable few of the tests with this change as well.

* For parameters to be resolved before using them

For the details, see PostgreSQL's copyParamList()

* Unified executor sorts the returning output

* Ensure that unified executor doesn't ignore sequential execution of DDLJob's

Certain DDL commands, mainly creating foreign keys to reference tables,
should be executed sequentially. Otherwise, we'd end up with a self
distributed deadlock.

To overcome this situaiton, we set a flag `DDLJob->executeSequentially`
and execute it sequentially. Note that we have to do this because
the command might not be called within a transaction block, and
we cannot call `SetLocalMultiShardModifyModeToSequential()`.

This fixes at least two test: multi_insert_select_on_conflit.sql and
multi_foreign_key.sql

Also, I wouldn't mind scattering local `targetPoolSize` variables within
the code. The reason is that we'll soon have a GUC (or a global
variable based on a GUC) that'd set the pool size. In that case, we'd
simply replace `targetPoolSize` with the global variables.

* Fix 2PC conditions for DDL tasks

* Improve closing connections that are not fully established in unified execution

* Support foreign keys to reference tables in unified executor

The idea for supporting foreign keys to reference tables is simple:
Keep track of the relation accesses within a transaction block.
    - If a parallel access happens on a distributed table which
      has a foreign key to a reference table, one cannot modify
      the reference table in the same transaction. Otherwise,
      we're very likely to end-up with a self-distributed deadlock.
    - If an access to a reference table happens, and then a parallel
      access to a distributed table (which has a fkey to the reference
      table) happens, we switch to sequential mode.

Unified executor misses the function calls that marks the relation
accesses during the execution. Thus, simply add the necessary calls
and let the logic kick in.

* Make sure to close the failed connections after the execution

* Improve comments

* Fix savepoints in unified executor.

* Rebuild the WaitEventSet only when necessary

* Unclaim connections on all errors.

* Improve failure handling for unified executor

   - Implement the notion of errorOnAnyFailure. This is similar to
     Critical Connections that the connection managament APIs provide
   - If the nodes inside a modifying transaction expand, activate 2PC
   - Fix few bugs related to wait event sets
   - Mark placement INACTIVE during the execution as much as possible
     as opposed to we do in the COMMIT handler
   - Fix few bugs related to scheduling next placement executions
   - Improve decision on when to use 2PC

Improve the logic to start a transaction block for distributed transactions

- Make sure that only reference table modifications are always
  executed with distributed transactions
- Make sure that stored procedures and functions are executed
  with distributed transactions

* Move waitEventSet to DistributedExecution

This could also be local to RunDistributedExecution(), but in that case
we had to mark it as "volatile" to avoid PG_TRY()/PG_CATCH() issues, and
cast it to non-volatile when doing WaitEventSetFree(). We thought that
would make code a bit harder to read than making this non-local, so we
move it here. See comments for PG_TRY() in postgres/src/include/elog.h
and "man 3 siglongjmp" for more context.

* Fix multi_insert_select test outputs

Two things:
   1) One complex transaction block is now supported. Simply update
      the test output
   2) Due to dynamic nature of the unified executor, the orders of
      the errors coming from the shards might change (e.g., all of
      the queries on the shards would fail, but which one appears
      on the error message?). To fix that, we simply added it to
      our shardId normalization tool which happens just before diff.

* Fix subeury_and_cte test

The error message is updated from:
	failed to execute task
To:
        more than one row returned by a subquery or an expression

which is a lot clearer to the user.

* Fix intermediate_results test outputs

Simply update the error message from:
	could not receive query results
to
	result "squares" does not exist

which makes a lot more sense.

* Fix multi_function_in_join test

The error messages update from:
     Failed to execute task XXX
To:
     function f(..) does not exist

* Fix multi_query_directory_cleanup test

The unified executor does not create any intermediate files.

* Fix with_transactions test

A test case that just started to work fine

* Fix multi_router_planner test outputs

The error message is update from:
	Could not receive query results
To:
	Relation does not exists

which is a lot more clearer for the users

* Fix multi_router_planner_fast_path test

The error message is update from:
	Could not receive query results
To:
	Relation does not exists

which is a lot more clearer for the users

* Fix isolation_copy_placement_vs_modification by disabling select_opens_transaction_block

* Fix ordering in isolation_multi_shard_modify_vs_all

* Add executor locks to unified executor

* Make sure to allocate enought WaitEvents

The previous code was missing the waitEvents for the latch and
postmaster death.

* Fix rebase conflicts for master rebase

* Make sure that TRUNCATE relies on unified executor

* Implement true sequential execution for multi-row INSERTS

Execute the individual tasks executed one by one. Note that this is different than
MultiShardConnectionType == SEQUENTIAL_CONNECTION case (e.g., sequential execution
mode). In that case, running the tasks across the nodes in parallel is acceptable
and implemented in that way.

However, the executions that are qualified here would perform poorly if the
tasks across the workers are executed in parallel. We currently qualify only
one class of distributed queries here, multi-row INSERTs. If we do not enforce
true sequential execution, concurrent multi-row upserts could easily form
a distributed deadlock when the upserts touch the same rows.

* Remove SESSION_LIFESPAN flag in unified_executor

* Apply failure test updates

We've changed the failure behaviour a bit, and also the error messages
that show up to the user. This PR covers majority of the updates.

* Unified executor honors citus.node_connection_timeout

With this commit, unified executor errors out if even
a single connection cannot be established within
citus.node_connection_timeout.

And, as a side effect this fixes failure_connection_establishment
test.

* Properly increment/decrement pool size variables

Before this commit, the idle and active connection
counts were not properly calculated.

* insert_select_executor goes through unified executor.

* Add missing file for task tracker

* Modify ExecuteTaskListExtended()'s signature

* Sort output of INSERT ... SELECT ... RETURNING

* Take partition locks correctly in unified executor

* Alternative implementation for force_max_query_parallelization

* Fix compile warnings in unified executor

* Fix style issues

* Decrement idleConnectionCount when idle connection is lost

* Always rebuild the wait event sets

In the previous implementation, on waitFlag changes, we were only
modifying the wait events. However, we've realized that it might
be an over optimization since (a) we couldn't see any performance
benefits (b) we see some errors on failures and because of (a)
we prefer to disable it now.

* Make sure to allocate enough sized waitEventSet

With multi-row INSERTs, we might have more sessions than
task*workerCount after few calls of RunDistributedExecution()
because the previous sessions would also be alive.

Instead, re-allocate events when the connectino set changes.

* Implement SELECT FOR UPDATE on reference tables

On master branch, we do two extra things on SELECT FOR UPDATE
queries on reference tables:
   - Acquire executor locks
   - Execute the query on all replicas

With this commit, we're implementing the same logic on the
new executor.

* SELECT FOR UPDATE opens transaction block even if SelectOpensTransactionBlock disabled

Otherwise, users would be very confused and their logic is very likely
to break.

* Fix build error

* Fix the newConnectionCount calculation in ManageWorkerPool

* Fix rebase conflicts

* Fix minor test output differences

* Fix citus indent

* Remove duplicate sorts that is added with rebase

* Create distributed table via executor

* Fix wait flags in CheckConnectionReady

* failure_savepoints output for unified executor.

* failure_vacuum output (pg 10) for unified executor.

* Fix WaitEventSetWait timeout in unified executor

* Stabilize failure_truncate test output

* Add an ORDER BY to multi_upsert

* Fix regression test outputs after rebase to master

* Add executor.c comment

* Rename executor.c to adaptive_executor.c

* Do not schedule tasks if the failed placement is not ready to execute

Before the commit, we were blindly scheduling the next placement executions
even if the failed placement is not on the ready queue. Now, we're ensuring
that if failed placement execution is on a failed pool or session where the
execution is on the pendingQueue, we do not schedule the next task. Because
the other placement execution should be already running.

* Implement a proper custom scan node for adaptive executor

- Switch between the executors, add GUC to set the pool size
- Add non-adaptive regression test suites
- Enable CIRCLE CI for non-adaptive tests
- Adjust test output files

* Add slow start interval to the executor

* Expose max_cached_connection_per_worker to user

* Do not start slow when there are cached connections

* Consider ExecutorSlowStartInterval in NextEventTimeout

* Fix memory issues with ReceiveResults().

* Disable executor via TaskExecutorType

* Make sure to execute the tests with the other executor

* Use task_executor_type to enable-disable adaptive executor

* Remove useless code

* Adjust the regression tests

* Add slow start regression test

* Rebase to master

* Fix test failures in adaptive executor.

* Rebase to master - 2

* Improve comments & debug messages

* Set force_max_query_parallelization in isolation_citus_dist_activity

* Force max parallelization for creating shards when asked to use exclusive connection.

* Adjust the default pool size

* Expand description of max_adaptive_executor_pool_size GUC

* Update warnings in FinishRemoteTransactionCommit()

* Improve session clean up at the end of execution

Explicitly list all the states that the execution might end,
otherwise warn.

* Remove MULTI_CONNECTION_WAIT_RETRY which is not used at all

* Add more ORDER BYs to multi_mx_partitioning
2019-06-28 14:04:40 +02:00
Hanefi Onaldi fb497ddad1
Bump 8.2devel on master (#2567) 2018-12-24 13:49:50 +03:00
Murat Tuncer fd868ec268 Fix citus_stat_statements view
Join between pg_stat_statements and citus_query_stats should
include queryid, dbid, userid instead of just queryid.
2018-11-29 14:49:16 +03:00
Marco Slot 5a63deab2e Clean up UDFs and remove unnecessary permissions 2018-11-26 14:40:37 +01:00
Marco Slot 30bad7e66f Add worker_execute_sql_task UDF 2018-11-22 18:15:33 +01:00
Murat Tuncer 4f8042085c Fix drop schema in mx with partitioned tables
Drop schema command fails in mx mode if there
is a partitioned table with active partitions.

This is due to fact that sql drop trigger receives
all the dropped objects including partitions. When
we call drop table on parent partition, it also drops
the partitions on the mx node. This causes the drop
table command on partitions to fail on mx node because
they are already dropped when the partition parent was
dropped.

With this work we did not require the table to exist on
worker_drop_distributed_table.
2018-10-08 17:01:54 -07:00
velioglu d7f75e5b48 Add citus_lock_waits to show locked distributed queries 2018-09-20 14:13:51 +03:00
velioglu d1f005daac Adds UDFs for testing MX functionalities with isolation tests 2018-09-12 07:04:16 +03:00
Onder Kalaci d657759c97 Views to Provide some insight about the distributed transactions on Citus MX
With this commit, we implement two views that are very similar
to pg_stat_activity, but showing queries that are involved in
distributed queries:

    - citus_dist_stat_activity: Shows all the distributed queries
    - citus_worker_stat_activity: Shows all the queries on the shards
                                  that are initiated by distributed queries.

Both views have the same columns in the outputs. In very basic terms, both of the views
are meant to provide some useful insights about the distributed
transactions within the cluster. As the names reveal, both views are similar to pg_stat_activity.
Also note that these views can be pretty useful on Citus MX clusters.

Note that when the views are queried from the worker nodes, they'd not show the distributed
transactions that are initiated from the coordinator node. The reason is that the worker
nodes do not know the host/port of the coordinator. Thus, it is advisable to query the
views from the coordinator.

If we bucket the columns that the views returns, we'd end up with the following:

- Hostnames and ports:
   - query_hostname, query_hostport: The node that the query is running
   - master_query_host_name, master_query_host_port: The node in the cluster
                                                   initiated the query.
    Note that for citus_dist_stat_activity view, the query_hostname-query_hostport
    is always the same with master_query_host_name-master_query_host_port. The
    distinction is mostly relevant for citus_worker_stat_activity. For example,
    on Citus MX, a users starts a transaction on Node-A, which starts worker
    transactions on Node-B and Node-C. In that case, the query hostnames would be
    Node-B and Node-C whereas the master_query_host_name would Node-A.

- Distributed transaction related things:
    This is mostly the process_id, distributed transactionId and distributed transaction
    number.

- pg_stat_activity columns:
    These two views get all the columns from pg_stat_activity. We're basically joining
    pg_stat_activity with get_all_active_transactions on process_id.
2018-09-10 21:33:27 +03:00
Onder Kalaci 5cf8fbe7b6 Add infrastructure to relation if exists 2018-09-07 14:49:36 +03:00
Onder Kalaci 1b3257816e Make sure that table is dropped before shards are dropped
This commit fixes a bug where a concurrent DROP TABLE deadlocks
with SELECT (or DML) when the SELECT is executed from the workers.

The problem was that Citus used to remove the metadata before
droping the table on the workers. That creates a time window
where the SELECT starts running on some of the nodes and DROP
table on some of the other nodes.
2018-09-04 08:57:20 +03:00
Onder Kalaci 974cbf11a5 Hide shard names on MX worker nodes
This commit by default enables hiding shard names on MX workers
by simple replacing `pg_table_is_visible()` calls with
`citus_table_is_visible()` calls on the MX worker nodes. The latter
function filters out tables that are known to be shards.

The main motivation of this change is a better UX. The functionality
can be opted out via a GUC.

We also added two views, namely citus_shards_on_worker and
citus_shard_indexes_on_worker such that users can query
them to see the shards and their corresponding indexes.

We also added debug messages such that the filtered tables can
be interactively seen by setting the level to DEBUG1.
2018-08-07 14:21:45 +03:00
mehmet furkan ÅŸahin bc757845eb Citus versioning fix 2018-07-26 10:56:34 +03:00
mehmet furkan ÅŸahin 887aa8150d Bump citus version to 8.0devel 2018-07-25 12:03:47 +03:00
Jason Petersen 318119910b
Add pg_dist_poolinfo table
For storing nodes' pool host/port overrides.
2018-07-10 09:30:22 -07:00
Murat Tuncer a7277526fd Make citus_stat_statements_reset() super user function 2018-07-10 11:21:20 +03:00
Murat Tuncer 23800f50f1 Update citus_stat_statements view and regression tests 2018-07-03 16:14:13 +03:00
Jason Petersen 7a75c2ed31 Add connparam invalidation trigger creation logic
This needs to live in Community, since we haven't yet added the com-
plication of having divergent upgrade scripts in Enterprise.
2018-06-20 14:13:18 -06:00
velioglu 20acee2cd4
Bump citus version to 7.5devel 2018-05-28 17:25:21 -06:00
Onder Kalaci 317dd02a2f Implement single repartitioning on hash distributed tables
* Change worker_hash_partition_table() such that the
     divergence between Citus planner's hashing and
     worker_hash_partition_table() becomes the same.

   * Rename single partitioning to single range partitioning.

   * Add single hash repartitioning. Basically, logical planner
     treats single hash and range partitioning almost equally.
     Physical planner, on the other hand, treats single hash and
     dual hash repartitioning almost equally (except for JoinPruning).

   * Add a new GUC to enable this feature
2018-05-02 18:50:55 +03:00
velioglu f01daa0c83 Bump citus version to 7.4devel 2018-04-05 20:38:47 +03:00
velioglu 698d585fb5 Remove broadcast join logic
After this change all the logic related to shard data fetch logic
will be removed. Planner won't plan any ShardFetchTask anymore.
Shard fetch related steps in real time executor and task-tracker
executor have been removed.
2018-03-30 11:45:19 +03:00
Onder Kalaci 7dc9589b56 Handle failures during I/O
This commit checks the connection status right after any IO happens
on the socket.

This is necessary since before this commit we didn't pass any information
to the higher level functions whether we're done with the connection
(e.g., no IO required anymore) or an errors happened during the IO.
2018-03-02 08:33:53 +02:00
Markus Sintonen 6202e80d06 Implemented jsonb_agg, json_agg, jsonb_object_agg, json_object_agg 2018-02-18 00:19:18 +02:00
velioglu d357d2fccd Bump citus version to 7.3devel 2018-01-16 11:50:28 +03:00
Marco Slot f8550b8c85 Fix issues with read_intermediate_result signature 2017-12-07 13:47:56 +01:00
Marco Slot 4cdadfcab6 Add intermediate results infrastructure 2017-12-04 14:50:11 +01:00
Marco Slot bbbadd6d1b Bump Citus version to 7.2devel 2017-11-15 10:32:49 +01:00
Marco Slot 89eb833375 Use citus.next_shard_id where practical in regression tests 2017-11-15 10:12:05 +01:00
Marco Slot 7e34348334 Add shard transfer mode parameter to shard copy functions 2017-10-31 13:30:48 +01:00
Brian Cloutier ebcb2b65e9 Add master_move_node function 2017-10-16 10:51:28 -07:00
Murat Tuncer 4676c4f7a5 Prevent crash when remote transaction start fails (#1662)
We sent multiple commands to worker when starting a transaction.
Previously we only checked the result of the first command that
is transaction 'BEGIN' which always succeeds. Any failure on
following commands were not checked.

With this commit, we make sure all command results are checked.
If there is any error we report the first error found.
2017-09-26 17:25:46 -07:00
Burak Yucesoy 273b034720 Bump Citus version 2017-08-28 17:56:39 +03:00
Marco Slot 641420d79f Remove source node argument from dump_local_wait_edges 2017-08-23 13:14:00 +02:00
Onder Kalaci 6532b69873 Kill the maintenance daemon on DROP DATABASE 2017-08-18 16:03:08 +03:00
Onder Kalaci a333c9f16c Add infrastructure for distributed deadlock detection
This commit adds all the necessary pieces to do the distributed
deadlock detection.

Each distributed transaction is already assigned with distributed
transaction ids introduced with
3369f3486f. The dependency among the
distributed transactions are gathered with
80ea233ec1.

With this commit, we implement a DFS (depth first seach) on the
dependency graph and search for cycles. Finding a cycle reveals
a distributed deadlock.

Once we find the deadlock, we examine the path that the cycle exists
and cancel the youngest distributed transaction.

Note that, we're not yet enabling the deadlock detection by default
with this commit.
2017-08-12 13:28:37 +03:00
velioglu 100739f62a Change citus subversion 2017-08-11 11:57:57 +03:00
Marco Slot 0ae265c436 Add citus_create_restore_point for distributed snapshots 2017-08-11 07:36:20 +02:00
Eren BaÅŸak 3061737712 Define Some Utility Functions
This change declares two new functions:

`master_update_table_statistics` updates the statistics of shards belong
to the given table as well as its colocated tables.

`get_colocated_shard_array` returns the ids of colocated shards of a
given shard.
2017-08-10 12:42:46 +03:00
Brian Cloutier 2e0916e15a Add master_add_secondary_node() UDF 2017-08-09 17:10:48 +03:00
Brian Cloutier 5618e69386 Add pg_dist_node.nodecluster 2017-08-08 11:18:31 +03:00
Brian Cloutier b20a086a8f master_activate_node UDF also returns noderole 2017-07-28 16:02:43 +03:00
Brian Cloutier 32e16ffe02 Give isolation tester ability to see locks on workers 2017-07-26 18:43:04 +03:00
Marco Slot 81198a1d02 Add function for dumping local wait edges 2017-07-25 16:52:32 +02:00
Onder Kalaci 3369f3486f Introduce distributed transaction ids
This commit adds distributed transaction id infrastructure in
the scope of distributed deadlock detection.

In general, the distributed transaction id consists of a tuple
in the form of: `(databaseId, initiatorNodeIdentifier, transactionId,
timestamp)`.

Briefly, we add a shared memory block on each node, which holds some
information per backend (i.e., an array `BackendData backends[MaxBackends]`).
Later, on each coordinated transaction, Citus sends
`SELECT assign_distributed_transaction_id()` right after `BEGIN`.
For that backend on the worker, the distributed transaction id is set to
the values assigned via the function call.

The aim of the above is to correlate the transactions on the coordinator
to the transactions on the worker nodes.
2017-07-18 15:01:42 +03:00
Brian Cloutier 7ad95b53d2 Rename pg_dist_shard_placement -> pg_dist_placement
Comes with a few changes:

- Change the signature of some functions to accept groupid
  - InsertShardPlacementRow
  - DeleteShardPlacementRow
  - UpdateShardPlacementState

- NodeHasActiveShardPlacements returns true if the group the node is a
  part of has any active shard placements

- TupleToShardPlacement now returns ShardPlacements which have NULL
  nodeName and nodePort.

- Populate (nodeName, nodePort) when creating ShardPlacements
- Disallow removing a node if it contains any shard placements

- DeleteAllReferenceTablePlacementsFromNode matches based on group. This
  doesn't change behavior for now (while there is only one node per
  group), but means in the future callers should be careful about
  calling it on a secondary node, it'll delete placements on the primary.

- Create concept of a GroupShardPlacement, which represents an actual
  tuple in pg_dist_placement and is distinct from a ShardPlacement,
  which has been resolved to a specific node. In the future
  ShardPlacement should be renamed to NodeShardPlacement.

- Create some triggers which allow existing code to continue to insert
  into and update pg_dist_shard_placement as if it still existed.
2017-07-12 14:17:31 +02:00
Marco Slot 04fe3f03f6 Change implementation of shard_name UDF to get schema-qualified shard name 2017-07-04 10:49:40 +03:00
Andres Freund 4a3b2de4c5 Add some tests checking that maintenance daemon gets started.
The 2nd database one is a bit slow, but also shows something
important, so we might want to keep it?
2017-06-23 11:53:39 -07:00
Andres Freund 1691f780fd Force cache invalidation machinery to be initialized earlier.
Previously it was not guaranteed that invalidations were registered
after creating the extension, only if the extension was used
afterwards.
2017-06-23 11:20:10 -07:00
Burak Yucesoy aff6a3dcc4 Add tests for version check 2017-05-24 17:39:25 +03:00
Burak Yucesoy 7a7c74cc87 Add tests for version checks 2017-05-22 09:53:29 +03:00
Jason Petersen cc45712144 Bump extension and configure PACKAGE versions
Actually getting this done before the next dev cycle begins.
2017-05-17 15:25:30 -06:00
Marco Slot 8edba5f309 Honour enable_ddl_propagation in truncate trigger 2017-04-29 03:32:52 +02:00
Burak Yucesoy e9095e62ec Decouple reference table replication
With this change we add an option to add a node without replicating all reference
tables to that node. If a node is added with this option, we mark the node as
inactive and no queries will sent to that node.

We also added two new UDFs;
 - master_activate_node(host, port):
    - marks node as active and replicates all reference tables to that node
 - master_add_inactive_node(host, port):
    - only adds node to pg_dist_node
2017-04-17 13:33:31 +03:00
Jason Petersen 8b4620ef16
Use RESET for GUC test, not reconnect
More limited in what it does, better test.
2017-04-04 16:40:17 -06:00
Jason Petersen ef81b21a49
Clean up ErrorIfUnstableCreateOrAlterExtensionStmt
Swaps an Assert in for an ereport, and adds details and hints to the
error message to help users with a possibly confusing scenario.
2017-04-04 15:58:57 -06:00
Burak Yucesoy a09614553f Add enable_version_checks GUC and address feedback 2017-04-04 19:11:13 +03:00
Burak Yucesoy 087d8427e3
Error out if binary citus version does not match installed extension
With this change, we start to error out if loaded citus binaries does not match
the available major version or installed citus extension version. In this case
we force user to restart the server or run ALTER EXTENSION depending on the
situation
2017-04-03 17:36:13 -06:00
velioglu e32aff1a26 Size UDFs implemented
citus_table_size, citus_relation_size and citus_total_relation_size UDFs are implemented.
2017-03-16 13:50:30 +03:00
Brian Cloutier 807beb7bc0 Remove master_get_local_first_candidate_nodes 2017-03-07 11:50:59 +03:00
Metin Doslu 7cff8719c2 Add worker_hash() and a stub for isolate_tenant_to_new_shard() 2017-01-20 14:38:01 +02:00
Eren Basak b686d9a025 Add Sequence Support for MX Tables
This change adds support for serial columns to be used with MX tables.
Prior to this change, sequences of serial columns were created in all
workers (for being able to create shards) but never used. With MX, we
need to set the sequences so that sequences in each worker create
unique values. This is done by setting the MINVALUE, MAXVALUE and
START values of the sequence.
2017-01-18 09:43:38 +03:00
Eren Basak b1ce8d61c0 Create Invalidation Trigger for pg_dist_local_group Table Updates 2017-01-18 09:43:38 +03:00
Murat Tuncer b93185d800 Add master_disable_node UDF
We can now remove nodes from cluster regardless of them
having an active shard placement.
2017-01-10 10:54:57 +03:00
Burak Yucesoy 31cd2357fe Add upgrade_to_reference_table
With this change we introduce new UDF, upgrade_to_reference_table, which can be used to
upgrade existing broadcast tables reference tables. For upgrading, we require that given
table contains only one shard.
2017-01-02 17:54:42 +02:00
Eren Basak e43eed0f7a Prevent Deadlock on Dropping MX Tables with Sequences
This change prevents a deadlock situation during DROP TABLE on an
mx table with sequences on workers with metadata.
2016-12-28 16:32:20 +03:00
Burak Yucesoy 0851fd2f0b GRANT SELECT access for metadata tables to public
Previously, we errored out if non-user tries to SELECT query for some metadata tables. It
seems that we already GRANT SELECT access to some metadata tables but not others. With
this change, we GRANT SELECT access to all existing Citus metadata tables.
2016-12-23 16:32:47 +03:00
Eren Basak 31af40cc26 Handle MX tables on workers during drop table commands 2016-12-23 15:43:32 +03:00
Marco Slot 6852f8a951 Add shard locking UDFs 2016-12-22 11:04:34 +01:00
Burak Yücesoy 501a2ecead Add get_distribution_value_shardid UDF (#1048)
* Add get_distribution_value_shardid UDF

With this UDF users can now map given distribution value to shard id. We mostly hide
shardids from users to prevent unnecessary complexity but some power users might need
to know about which entry/value is stored in which shard for maintanence purposes.

Signature of this UDF is as follows;

bigint get_distribution_value_shardid(table_name regclass, distribution_value anyelement)
2016-12-22 12:17:08 +03:00
Onder Kalaci 9f0bd4cb36 Reference Table Support - Phase 1
With this commit, we implemented some basic features of reference tables.

To start with, a reference table is
  * a distributed table whithout a distribution column defined on it
  * the distributed table is single sharded
  * and the shard is replicated to all nodes

Reference tables follows the same code-path with a single sharded
tables. Thus, broadcast JOINs are applicable to reference tables.
But, since the table is replicated to all nodes, table fetching is
not required any more.

Reference tables support the uniqueness constraints for any column.

Reference tables can be used in INSERT INTO .. SELECT queries with
the following rules:
  * If a reference table is in the SELECT part of the query, it is
    safe join with another reference table and/or hash partitioned
    tables.
  * If a reference table is in the INSERT part of the query, all
    other participating tables should be reference tables.

Reference tables follow the regular co-location structure. Since
all reference tables are single sharded and replicated to all nodes,
they are always co-located with each other.

Queries involving only reference tables always follows router planner
and executor.

Reference tables can have composite typed columns and there is no need
to create/define the necessary support functions.

All modification queries, master_* UDFs, EXPLAIN, DDLs, TRUNCATE,
sequences, transactions, COPY, schema support works on reference
tables as expected. Plus, all the pre-requisites associated with
distribution columns are dismissed.
2016-12-20 14:09:35 +02:00
Metin Doslu 86cca54857 Add colocate_with option to create_distributed_table()
With this commit, we support three versions of colocate_with: i.default, ii.none
and iii. a specific table name.
2016-12-16 14:53:35 +02:00
Marco Slot 5714be0da5 Expose the column_to_column_name UDF to make partkey in pg_dist_partition human-readable 2016-12-14 10:46:33 +01:00
Eren Basak afbb5ffb31 Add stop_metadata_sync_to_node UDF 2016-12-14 10:53:12 +03:00
Eren Basak 5e96e4f60e Make truncate triggers propagated on start_metadata_sync_to_node call 2016-12-14 10:53:10 +03:00
Eren Basak 9eff968d1f Add start_metadata_sync_to_node UDF
This change adds `start_metadata_sync_to_node` UDF which copies the metadata about nodes and MX tables
from master to the specified worker, sets its local group ID and marks its hasmetadata to true to
allow it receive future DDL changes.
2016-12-13 10:48:03 +03:00
Eren Basak 444f14d546 Add Column Definition List for Output Columns for master_add_node
This change allows seeing the names of columns of `master_add_node`,
using `SELECT * FROM master_add_node(...)` by specifying output
columns in UDF definition.
2016-11-07 14:08:58 -08:00
Onder Kalaci 9cd549f21f Add stub for Copy shard placement
This commit does not change the current behaviour, but, helps to implement
enterprise feature without any version changes.
2016-10-26 17:57:55 +03:00
Metin Doslu 4e555880b7 Add mark_tables_colocated() to update colocation groups
Added a new UDF, mark_tables_colocated(), to colocate tables with the same
configuration (shard count, shard replication count and distribution column type).
2016-10-26 17:29:03 +03:00
Andres Freund fcd150c7c8 Invalidate relcache after pg_dist_shard_placement changes.
This forces prepared statements to be re-planned after changes of the
placement metadata. There's some locking issues remaining, but that's a
a separate task.

Also add regression tests verifying that invalidations take effect on
prepared statements.
2016-10-26 03:36:35 -07:00
Jason Petersen 73f5b8b05f
Move all funcs to pg_catalog, add test to verify
We'd been relying on a single SET search_path command in an earlier
script, but a subsequent script RESET search_path, causing any further
bare functions to be created in the first schema on the search path.

However, starting with an older extension version and executing ALTER
scripts one at a time DOES avoid putting any functions in the public
namespace, so I wrote an upgrade script resilient to that, especially
because PostgreSQL 9.5 will error out if a function is already in the
schema it's being moved to.
2016-10-25 12:45:53 -06:00
Burak Yucesoy 5a03acf2bf Foreign Constraint Support for create_distributed_table and shard move
With this change, we now push down foreign key constraints created during CREATE TABLE
statements. We also start to send foreign constraints during shard move along with
other DDL statements
2016-10-21 15:38:55 +03:00
Metin Doslu 405335fcee Add create_reference_table()
create_reference_table() creates a hash distributed table with shard count
equals to 1 and replication factor equals to shard_replication_factor
configuration value.
2016-10-20 15:29:30 +03:00
Metin Doslu 40bdafa8d1 Add create_distributed_table()
create_distributed_table() creates a hash distributed table with default values
of shard count and shard replication factor.
2016-10-20 10:58:25 +03:00
Eren Basak cee7b54e7c Add worker transaction and transaction recovery infrastructure 2016-10-18 14:18:14 +03:00
Eren Basak 8f477d18f1 Add pg_dist_local_group Metadata Table
This change adds the pg_dist_local_group metadata table, which indicates
the group id of the current node. It is expected that this table contains
one and only one row, which only contains the group id of the node as an
integer.
2016-10-14 11:41:14 +03:00
Brian Cloutier 6c3d79b4e7 Drop shardalias 2016-10-14 11:03:26 +03:00
Burak Yucesoy 6668d19a3b Make shard transfer functions co-location aware
With this change, master_copy_shard_placement and master_move_shard_placement functions
start to copy/move given shard along with its co-located shards.
2016-10-13 18:16:40 +03:00
Eren Basak ed3af403fd Add Metadata Snapshot Infrastructure
This change adds the required infrastructure about metadata snapshot from MX
codebase into Citus, mainly metadata_sync.c file and master_metadata_snapshot UDF.
2016-10-13 10:40:14 +03:00
Andres Freund 982ad66753 Introduce placement IDs.
So far placements were assigned an Oid, but that was just used to track
insertion order. It also did so incompletely, as it was not preserved
across changes of the shard state. The behaviour around oid wraparound
was also not entirely as intended.

The newly introduced, explicitly assigned, IDs are preserved across
shard-state changes.

The prime goal of this change is not to improve ordering of task
assignment policies, but to make it easier to reference shards.  The
newly introduced UpdateShardPlacementState() makes use of that, and so
will the in-progress connection and transaction management changes.
2016-10-07 11:59:20 -07:00
Brian Cloutier 9d6699b07c Switch from pg_worker_list.conf file to pg_dist_node metadata table.
Related to #786

This change adds the `pg_dist_node` table that contains the information
about the workers in the cluster, replacing the previously used
`pg_worker_list.conf` file (or the one specified with `citus.worker_list_file`).

Upon update, `pg_worker_list.conf` file is read and `pg_dist_node` table is
populated with the file's content. After that, `pg_worker_list.conf` file
is renamed to `pg_worker_list.conf.obsolete`

For adding and removing nodes, the change also includes two new UDFs:
`master_add_node` and `master_remove_node`, which require superuser
permissions.

'citus.worker_list_file' guc is kept for update purposes but not used after the
update is finished.
2016-10-05 13:01:35 +03:00
Marco Slot 32b2bd4ed8 Add replication model column to pg_dist_partition 2016-10-05 01:14:28 +02:00
Marco Slot a4efb60b54 Change logicalrelid type in pg_dist_partition and pg_dist_shard to regclass 2016-10-03 20:27:16 +02:00
Burak Yucesoy 1ee39eb098 Internal co-location API
With this commit we introduce internal API for co-location related operations.
2016-09-29 11:56:53 +03:00
Metin Doslu 3ff1877108
Bump version numbers for 5.2 release 2016-08-01 13:48:24 -07:00
Jason Petersen abe7304898
Support SERIAL/BIGSERIAL non-partition columns
This adds support for SERIAL/BIGSERIAL column types. Because we now can
evaluate functions on the master (during execution), adding this is a
matter of ensuring the table creation step works properly.

To accomplish this, I've added some logic to detect sequences owned by
a table (i.e. those related to its columns). Simply creating a sequence
and using it in a default value is insufficient; users who do so must
ensure the sequence is owned by the column using it.

Fortunately, this is exactly what SERIAL and BIGSERIAL do, which is the
use case we're targeting with this feature. While testing this, I found
that worker_apply_shard_ddl_command actually adds shard identifiers to
sequence names, though I found no places that use or test this path. I
removed that code so that sequence names are not mutated and will match
those used by a SERIAL default value expression.

Our use of the new-to-9.5 CREATE SEQUENCE IF NOT EXISTS syntax means we
are dropping support for 9.4 (which is being done regardless, but makes
this change simpler). I've removed 9.4 from the Travis build matrix.

Some edge cases are possible in ALTER SEQUENCE, COPY FROM (on workers),
and CREATE SEQUENCE OWNED BY. I've added errors for each so that users
understand when and why certain operations are prohibited.
2016-07-28 23:55:40 -06:00
Burak Yucesoy 6f20af9e38 Remove schema name parameter from API functions
We remove schema name parameter from worker_fetch_foreign_file and
worker_fetch_regular_table functions. We now send schema name
concatanated with table name.
2016-07-28 20:41:05 +03:00
Burak Yucesoy a649b47bac Add old version(without schema name parameter) of api functions back
Fixes #676

We added old versions (i.e. without schema name) of worker_apply_shard_ddl_command,
worker_fetch_foreign_file and worker_fetch_regular_table back. During function call
of one of these functions, we set schema name as  public schema and call the newer
version of the functions.
2016-07-28 20:40:38 +03:00
Burak Yucesoy b58872b441
Fix worker_fetch_regular_table with schema
Fixes #504
Fixes #646

We changed signature of worker_fetch_regular_table to accept schema name as parameter to
make it work with schemas.
2016-07-22 00:44:02 -06:00
Burak Yucesoy 2f0158dde1 Change worker_apply_shard_ddl_command to accept schema name as parameter
Fixes #565
Fixes #626

To add schema support to citus, we need to schema-prefix all table names, object names etc.
in the queries sent to worker nodes. However; query deparsing is not available for most of
DDL commands, therefore it is not easy to generate worker query in the master node.

As a solution we are sending schema names along with shard id and query to run to worker
nodes with worker_apply_shard_ddl_command.

To not break \STAGE command we pass public schema as paramater while calling
worker_apply_shard_ddl_command from there. This will not cause problem if user uses \STAGE
in different schema because passes schema name is used only if there is no schema name is
given in the query.
2016-07-21 14:17:26 +03:00
Eren 5512bb359a Set Explicit ShardId/JobId In Regression Tests
Fixes #271

This change sets ShardIds and JobIds for each test case. Before this change,
when a new test that somehow increments Job or Shard IDs is added, then
the tests after the new test should be updated.

ShardID and JobID sequences are set at the beginning of each file with the
following commands:

```
ALTER SEQUENCE pg_catalog.pg_dist_shardid_seq RESTART 290000;
ALTER SEQUENCE pg_catalog.pg_dist_jobid_seq RESTART 290000;
```

ShardIds and JobIds are multiples of 10000. Exceptions are:
- multi_large_shardid: shardid and jobid sequences are set to much larger values
- multi_fdw_large_shardid: same as above
- multi_join_pruning: Causes a race condition with multi_hash_pruning since
they are run in parallel.
2016-06-07 14:32:44 +03:00
Metin Doslu d4c4eaa9ff Move master_update_shard_statistics() to pg_catalog
Fixes #546
2016-06-02 10:52:47 +03:00
eren 132d9212d0 ADD master_modify_multiple_shards UDF
Fixes #10

This change creates a new UDF: master_modify_multiple_shards
Parameters:
  modify_query: A simple DELETE or UPDATE query as a string.

The UDF is similar to the existing master_apply_delete_command UDF.
Basically, given the modify query, it prunes the shard list, re-constructs
the query for each shard and sends the query to the placements.

Depending on the value of citus.multi_shard_commit_protocol, the commit
can be done in one-phase or two-phase manner.

Limitations:
* It cannot be called inside a transaction block
* It only be called with simple operator expressions (like Single Shard Modify)

Sample Usage:
```
SELECT master_modify_multiple_shards(
  'DELETE FROM customer_delete_protocol WHERE c_custkey > 500 AND c_custkey < 500');
```
2016-05-26 17:30:35 +03:00
Andres Freund 5f282dd241 Stamp 5.1 release. 2016-05-04 18:05:41 -07:00
Metin Doslu 866271b765 Add COPY support on worker nodes for append partitioned relations
Now, we can copy to an append-partitioned distributed relation from
any worker node by providing master options such as;

COPY relation_name FROM file_path WITH (delimiter '|', master_host 'localhost', master_port 5432);

where master_port is optional and default is 5432.
2016-05-03 16:00:00 +03:00
Andres Freund 25f919576f Add very basic infrastructure for schema upgrade scripts.
Citus' extension version now has a -$schemaversion appendix.  When the
schema is changed, a new schema version has to be added; changes to the
same schema version several commits inside a single pull request are ok.

Schema migration scripts between each schema version have to be
added. To ensure upgrade scripts work correctly a new regression test
ensures that all steps work.

The extension scripts to-be-used for CREATE EXTENSION (i.e. not
extension updates) are generated by concatenating citus.sql and the
relevant migration scripts.
2016-04-27 10:00:08 -07:00