Commit Graph

183 Commits (609a5465eab079fbec864568abee4dd8db2b8308)

Author SHA1 Message Date
Marco Slot bee202aa39
Fix PG upgrade scripts for 9.4 2021-07-05 13:39:28 +02:00
Halil Ozan Akgul db03afe91e Bump citus version to 10.2devel 2021-06-16 17:44:05 +03:00
Jelte Fennema 924959fdb1
Include result type in upgrade diff test (#4987)
We often change result types of functions slightly. Our downgrade tests
wouldn't notice these changes. This change adds them to the description
of these items.

An example of an SQL change that isn't caught without this change and is
caught with the get_rebalance_progress change in this PR:
https://github.com/citusdata/citus/pull/4963
2021-05-18 16:25:39 +02:00
Ahmet Gedemenli 5e5db9eefa Add udf citus_get_active_worker_nodes 2021-03-17 13:15:59 +03:00
Onur Tirtir dcc0207605 Add 10.0-2 schema version 2021-02-26 12:31:09 +03:00
Onur Tirtir 676d9a9726 Bump Citus to 10.1devel 2021-02-17 11:54:33 +03:00
Marco Slot 321cc784c7 Collapse Citus 7.* scripts into Citus 8.0-1 2020-12-21 22:55:51 +01:00
Jeff Davis c91e5b052b more test fixups 2020-12-07 13:43:27 -08:00
Marco Slot c9b658daea Add a public.citus_tables view 2020-12-03 17:31:40 +01:00
Nils Dijk 22df8027b0
add extra output for multi_extension targeting pg11 2020-11-17 19:01:54 +01:00
Nils Dijk d065bb495d
Prepare downgrade script and bump development version to 10.0-1 2020-11-17 18:55:35 +01:00
Onur Tirtir 5e3dc9d707 Bump citus version to 10.0devel 2020-11-09 13:16:54 +03:00
Marco Slot 881e5df780 Fix a bug that could lead to multiple maintenance daemons 2020-10-08 16:18:14 +02:00
Marco Slot 18219843d0 Add maintenance daemon error tests 2020-10-08 16:17:33 +02:00
Onur Tirtir 64d5ac6a10
Do not downgrade if a citus local table exists (#4174)
As the previous versions of Citus don't know how to handle citus local
tables, we should prevent downgrading from 9.5 to older versions if any
citus local tables exist.
2020-09-22 14:19:50 +03:00
Onur Tirtir be17ebb334 Bump citus version to 9.5devel 2020-07-01 14:46:55 +03:00
Hanefi Önaldı ca2ececb3b
Downgrade path from 9.4 to 9.3 to 9.2 2020-07-01 10:38:11 +03:00
Onur Tirtir 2e927bd6b7
Bump Citus to 9.4devel (#3788) 2020-04-22 12:50:00 +03:00
Nils Dijk 1d6ba1d09e
Refactor alter role to work on distributed roles (#3739)
DESCRIPTION: Alter role only works for citus managed roles

Alter role was implemented before we implemented good role management that hooks into the object propagation framework. This is a refactor of all alter role commands that have been implemented to
 - be on by default
 - only work for supported roles
 - make the citus extension owner a supported role

Instead of distributing the alter role commands for roles at the beginning of node activation, it now _only_ executes the alter role commands for all users in all databases and in the current database.

In preparation for full role support, small refactors have been done in the deparser.

Earlier tests targeting roles other than the citus extension owner have been either slightly changed or removed, to be added back once we have full role support.

Fixes #2549
2020-04-16 12:23:27 +02:00
SaitTalhaNisanci ebda3eff61
read database name inside the function (#3730) 2020-04-09 13:11:13 +03:00
Hanefi Önaldı d1223bd6cc
Remove migration paths to 9.3-1, introduce 9.3-2 2020-04-03 12:50:45 +03:00
SaitTalhaNisanci 710970407f
not wait forever in multi_extension test (#3702) 2020-04-03 12:21:02 +03:00
Philip Dubé bcf54c5014 Address a couple of issues with maintenance daemon management:
- Stop the daemon when citus extension is dropped
- Bail on maintenance daemon startup if myDbData is started with a non-zero pid
- Stop maintenance daemon from spawning itself
- Don't use postgres die, just wrap proc_exit(0)
- Assert(myDbData->workerPid == MyProcPid)

The two issues were that multiple daemons could be running for a database,
or that a daemon would be left over after DROP EXTENSION citus
2020-02-21 16:49:01 +00:00
Nils Dijk 6ee82c381e
Add missing pieces for version bump of #3482 (#3523) 2020-02-21 12:35:29 +01:00
Philip Dubé 69dde460de See what flaky multi_extension test is doing with roles 2020-01-23 21:50:40 +00:00
Marco Slot 4c8d43c5d0 Bump repo version to 9.2devel 2019-11-29 07:33:39 +01:00
Hanefi Onaldi e3ad4aba94
Bump 9.1devel
* Add Changelog entry for 9.0.1
* Bump citus version to 9.1devel
2019-11-19 10:35:57 +03:00
Jelte Fennema 01da11f264
Change citus truncate trigger to AFTER and add more upgrade tests (#3070)
* Add more upgrade tests

* Fix citus trigger generation after upgrade

citus_truncate_trigger runs before truncate when created by create_distributed_table:
492d1b2cba/src/backend/distributed/commands/create_distributed_table.c (L1163)

* Remove pg_dist_jobid_seq
2019-10-07 16:43:04 +02:00
Nils Dijk 9c2c50d875
Hookup function/procedure deparsing to our utility hook (#3041)
DESCRIPTION: Propagate ALTER FUNCTION statements for distributed functions

This uses the implemented deparser for function statements to propagate changes to both functions and procedures that were previously distributed.
2019-09-27 22:06:49 +02:00
Jelte Fennema 4bbf65d913
Change SQL migration build process for easier reviews (#2951)
@thanodnl told me it was a bit of a problem that it's impossible to see
the history of a UDF in git. The only way to do so is by reading all the
sql migration files from new to old. Another problem is that it's also
hard to review the changed UDF during code review, because to find out
what changed you have to do the same. I thought of an IMHO better (but
not perfect) way to handle this.

We keep the definition of a UDF in sql/udfs/{name_of_udf}/latest.sql.
That file we change whenever we need to make a change to the UDF. On
top of that you also make a snapshot of the file in
sql/udfs/{name_of_udf}/{migration-version}.sql (e.g. 9.0-1.sql) by
copying the contents. This way you can easily view what the actual
changes were by looking at the latest.sql file.

There's still the question of how to use these files. Sadly
postgres doesn't allow inclusion of other sql files in the migration sql
file (it does in psql using \i). So instead I used the C preprocessor +
make to compile a sql/xxx.sql to a build/sql/xxx.sql file. This final
build/sql/xxx.sql file has every occurrence of #include "somefile.sql" in
sql/xxx.sql replaced by the contents of somefile.sql.
2019-09-13 18:44:27 +02:00
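A rough sketch of what this build step looks like in practice (the file and UDF names below are hypothetical, not taken from the actual Citus tree): the migration source references UDF snapshots via #include, and the C preprocessor + make pass replaces those lines with the files' contents in the generated build/sql file.

    -- sql/citus--9.0-1--9.0-2.sql (hypothetical migration source)
    -- Each #include below is replaced by the referenced file's contents
    -- when make generates build/sql/citus--9.0-1--9.0-2.sql.
    #include "udfs/citus_prepare_pg_upgrade/9.0-2.sql"
    #include "udfs/citus_finish_pg_upgrade/9.0-2.sql"

    -- Plain SQL statements pass through unchanged.
    GRANT SELECT ON pg_catalog.pg_dist_node TO public;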
Nils Dijk 936d546a3c
Refactor Ensure Schema Exists to Ensure Dependencies Exist (#2882)
DESCRIPTION: Refactor ensure schema exists to ensure dependencies exist

Historically we only supported schemas as table dependencies to be created on the workers before a table gets distributed. This PR puts infrastructure in place to walk pg_depend to figure out which dependencies to create on the workers. Currently only schemas are supported as objects to create before creating a table.

We also keep track of dependencies that have been created in the cluster. When we add a new node to the cluster we use this catalog to know which objects need to be created on the worker.

A side effect of knowing which objects are already distributed is that we no longer emit debug messages when creating schemas that are already created on the workers.
2019-09-04 14:10:20 +02:00
Nils Dijk be6b7bec69
Add UDF citus_(prepare|finish)_pg_upgrade to aid with upgrading citus (#2877)
DESCRIPTION: Add functions to help with postgres upgrades

Currently there is [a list of manual steps](https://docs.citusdata.com/en/v8.2/admin_guide/upgrading_citus.html?highlight=upgrade#upgrading-postgresql-version-from-10-to-11) to perform during a postgres upgrade. These steps guarantee our catalog tables are kept and counter values are maintained across upgrades.

Having more than 1 command in our docs for users to manually execute during upgrades is error prone for both the user and our docs. There are already 2 catalog tables that have been introduced to citus but have not been added to our docs for backing up during upgrades (`pg_authinfo` and `pg_dist_poolinfo`).

As we add more functionality to citus we run into situations where more steps are required either before or after the upgrade. At the same time, when we move catalog tables to a place where the contents will be maintained automatically during upgrades we could have fewer steps in our docs. This would turn into a hard-to-maintain matrix of citus versions and steps to be performed.

Instead we could take ownership of these steps within the extension itself. This PR introduces two new functions for the user to use instead of long lists of error prone instructions to follow.
 - `citus_prepare_pg_upgrade`
    This function should be called by the user right before shutting down the cluster. This will ensure all citus catalog tables are backed up in a location where the information will be retained during an upgrade.
- `citus_finish_pg_upgrade`
    This function should be called right after a pg_upgrade of the cluster. This will restore the catalog tables to the state before the upgrade happened.

Both functions need to be executed on both the coordinator and all the workers, in the same fashion as our current documentation instructs.

There are two known problems with these functions in their current form, which are also problems with our docs. We should schedule time in the future to improve on this, but having it automated now is better as we are about to add extra steps to take after upgrades.
 - When you install citus in a clean cluster we do enable ssl for communication between the coordinator and the workers. If an upgrade to a clean cluster is performed we do not set up ssl on the new cluster, causing the communication to fail.
 - There are no automated tests added in this PR to execute an upgrade test during every build.
    Our current test infrastructure does not allow for 2 versions of postgres to exist in the same environment. We will need to invest time to create a new testing harness that could run the following scenario:
      1. Create cluster
      2. Run extensible scripts to execute arbitrary statements on this cluster
      3. Perform an upgrade by preparing, upgrading and finishing
      4. Run extensible scripts to verify all objects created by earlier scripts exists in correct form in the upgraded cluster

    Given the non-trivial amount of work involved for such a suite, I'd like to land this before we have
automated testing.

On a side note: as the reviewer noticed, the tables created in the public namespace are not visible in `psql` with `\d`. The backup catalog tables have the same name as the tables in `pg_catalog`. Due to postgres internals `pg_catalog` is first in the search path and therefore the non-qualified name would always resolve to `pg_catalog.pg_dist_*`. Internally this is called a non-visible table as it would resolve to a different table without a qualified name. Only visible tables are shown with `\d`.
2019-08-13 15:53:10 +02:00
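A minimal usage sketch of the two UDFs described above, assuming a standard pg_upgrade workflow (the pg_upgrade invocation itself depends on the installation):

    -- On the coordinator and on every worker, right before shutting down the old cluster:
    SELECT citus_prepare_pg_upgrade();

    -- ... run pg_upgrade against each node ...

    -- On the coordinator and on every worker, right after starting the new cluster:
    SELECT citus_finish_pg_upgrade();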
Philip Dubé acbaa38a62 Squash migrations for versions 5/6, don't use WITH OIDS 2019-07-24 11:03:29 -07:00
Marco Slot efbe58eab2 Fix SQL schema version, we skipped 8.3 2019-07-17 16:05:25 +02:00
Hadi Moshayedi d233887d68 Fix multi_extension in check-multi-vg 2019-07-04 13:03:46 +02:00
Marco Slot d6c667946c Fix citus_executor_name mapping by reimplementing it in C 2019-06-29 22:38:29 +02:00
Önder Kalacı 40da78c6fd
Introduce the adaptive executor (#2798)
With this commit, we're introducing the Adaptive Executor. 


The commit message consists of two distinct sections. The first part explains
how the executor works. The second part consists of the commit messages of
the individual smaller commits that resulted in this commit. The readers
can search for each of the smaller commit messages on
https://github.com/citusdata/citus and can learn more about the history
of the change.

/*-------------------------------------------------------------------------
 *
 * adaptive_executor.c
 *
 * The adaptive executor executes a list of tasks (queries on shards) over
 * a connection pool per worker node. The results of the queries, if any,
 * are written to a tuple store.
 *
 * The concepts in the executor are modelled in a set of structs:
 *
 * - DistributedExecution:
 *     Execution of a Task list over a set of WorkerPools.
 * - WorkerPool
 *     Pool of WorkerSessions for the same worker which opportunistically
 *     executes "unassigned" tasks from a queue.
 * - WorkerSession:
 *     Connection to a worker that is used to execute "assigned" tasks
 *     from a queue and may execute unassigned tasks from the WorkerPool.
 * - ShardCommandExecution:
 *     Execution of a Task across a list of placements.
 * - TaskPlacementExecution:
 *     Execution of a Task on a specific placement.
 *     Used in the WorkerPool and WorkerSession queues.
 *
 * Every connection pool (WorkerPool) and every connection (WorkerSession)
 * have a queue of tasks that are ready to execute (readyTaskQueue) and a
 * queue/set of pending tasks that may become ready later in the execution
 * (pendingTaskQueue). The tasks are wrapped in a ShardCommandExecution,
 * which keeps track of the state of execution and is referenced from a
 * TaskPlacementExecution, which is the data structure that is actually
 * added to the queues and describes the state of the execution of a task
 * on a particular worker node.
 *
 * When the task list is part of a bigger distributed transaction, the
 * shards that are accessed or modified by the task may have already been
 * accessed earlier in the transaction. We need to make sure we use the
 * same connection since it may hold relevant locks or have uncommitted
 * writes. In that case we "assign" the task to a connection by adding
 * it to the task queue of a specific connection (in
 * AssignTasksToConnections). Otherwise we consider the task unassigned
 * and add it to the task queue of a worker pool, which means that it
 * can be executed over any connection in the pool.
 *
 * A task may be executed on multiple placements in case of a reference
 * table or a replicated distributed table. Depending on the type of
 * task, it may not be ready to be executed on a worker node immediately.
 * For instance, INSERTs on a reference table are executed serially across
 * placements to avoid deadlocks when concurrent INSERTs take conflicting
 * locks. At the beginning, only the "first" placement is ready to execute
 * and therefore added to the readyTaskQueue in the pool or connection.
 * The remaining placements are added to the pendingTaskQueue. Once
 * execution on the first placement is done the second placement moves
 * from pendingTaskQueue to readyTaskQueue. The same approach is used to
 * fail over read-only tasks to another placement.
 *
 * Once all the tasks are added to a queue, the main loop in
 * RunDistributedExecution repeatedly does the following:
 *
 * For each pool:
 * - ManageWorkerPool evaluates whether to open additional connections
 *   based on the number of unassigned tasks that are ready to execute
 *   and the targetPoolSize of the execution.
 *
 * Poll all connections:
 * - We use a WaitEventSet that contains all (non-failed) connections
 *   and is rebuilt whenever the set of active connections or any of
 *   their wait flags change.
 *
 *   We almost always check for WL_SOCKET_READABLE because a session
 *   can emit notices at any time during execution, but it will only
 *   wake up WaitEventSetWait when there are actual bytes to read.
 *
 *   We check for WL_SOCKET_WRITEABLE just after sending bytes in case
 *   there is not enough space in the TCP buffer. Since a socket is
 *   almost always writable we also use WL_SOCKET_WRITEABLE as a
 *   mechanism to wake up WaitEventSetWait for non-I/O events, e.g.
 *   when a task moves from pending to ready.
 *
 * For each connection that is ready:
 * - ConnectionStateMachine handles connection establishment and failure
 *   as well as command execution via TransactionStateMachine.
 *
 * When a connection is ready to execute a new task, it first checks its
 * own readyTaskQueue and otherwise takes a task from the worker pool's
 * readyTaskQueue (on a first-come-first-serve basis).
 *
 * In cases where the tasks finish quickly (e.g. <1ms), a single
 * connection will often be sufficient to finish all tasks. It is
 * therefore not necessary that all connections are established
 * successfully or open a transaction (which may be blocked by an
 * intermediate pgbouncer in transaction pooling mode). It is therefore
 * essential that we take a task from the queue only after opening a
 * transaction block.
 *
 * When a command on a worker finishes or the connection is lost, we call
 * PlacementExecutionDone, which then updates the state of the task
 * based on whether we need to run it on other placements. When a
 * connection fails or all connections to a worker fail, we also call
 * PlacementExecutionDone for all queued tasks to try the next placement
 * and, if necessary, mark shard placements as inactive. If a task fails
 * to execute on all placements, the execution fails and the distributed
 * transaction rolls back.
 *
 * For multi-row INSERTs, tasks are executed sequentially by
 * SequentialRunDistributedExecution instead of in parallel, which allows
 * a high degree of concurrency without high risk of deadlocks.
 * Conversely, multi-row UPDATE/DELETE/DDL commands take aggressive locks
 * which forbids concurrency, but allows parallelism without high risk
 * of deadlocks. Note that this is unrelated to SEQUENTIAL_CONNECTION,
 * which indicates that we should use at most one connection per node, but
 * can run tasks in parallel across nodes. This is used when there are
 * writes to a reference table that has foreign keys from a distributed
 * table.
 *
 * Execution finishes when all tasks are done, the query errors out, or
 * the user cancels the query.
 *
 *-------------------------------------------------------------------------
 */



All the commits involved here:
* Initial unified executor prototype

* Latest changes

* Fix rebase conflicts to master branch

* Add missing variable for assertion

* Ensure that master_modify_multiple_shards() returns the affectedTupleCount

* Adjust intermediate result sizes

The real-time executor uses the COPY command to get the results
from the worker nodes. The unified executor avoids that, which
results in less data transfer. Simply adjust the tests to lower
sizes.

* Force one connection per placement (or co-located placements) when requested

The existing executors (real-time and router) always open 1 connection per
placement when parallel execution is requested.

That might be useful under certain circumstances:

(a) User wants to utilize as many CPUs as possible on the workers per
distributed query
(b) User has a transaction block which involves COPY command

Also, lots of regression tests rely on this execution semantics.
So, we'd enable a few of the tests with this change as well.

* For parameters to be resolved before using them

For the details, see PostgreSQL's copyParamList()

* Unified executor sorts the returning output

* Ensure that unified executor doesn't ignore sequential execution of DDLJob's

Certain DDL commands, mainly creating foreign keys to reference tables,
should be executed sequentially. Otherwise, we'd end up with a self
distributed deadlock.

To overcome this situation, we set a flag `DDLJob->executeSequentially`
and execute it sequentially. Note that we have to do this because
the command might not be called within a transaction block, and
we cannot call `SetLocalMultiShardModifyModeToSequential()`.

This fixes at least two tests: multi_insert_select_on_conflit.sql and
multi_foreign_key.sql

Also, I wouldn't mind scattering local `targetPoolSize` variables within
the code. The reason is that we'll soon have a GUC (or a global
variable based on a GUC) that'd set the pool size. In that case, we'd
simply replace `targetPoolSize` with the global variables.

* Fix 2PC conditions for DDL tasks

* Improve closing connections that are not fully established in unified execution

* Support foreign keys to reference tables in unified executor

The idea for supporting foreign keys to reference tables is simple:
Keep track of the relation accesses within a transaction block.
    - If a parallel access happens on a distributed table which
      has a foreign key to a reference table, one cannot modify
      the reference table in the same transaction. Otherwise,
      we're very likely to end-up with a self-distributed deadlock.
    - If an access to a reference table happens, and then a parallel
      access to a distributed table (which has a fkey to the reference
      table) happens, we switch to sequential mode.

Unified executor misses the function calls that mark the relation
accesses during the execution. Thus, simply add the necessary calls
and let the logic kick in.

* Make sure to close the failed connections after the execution

* Improve comments

* Fix savepoints in unified executor.

* Rebuild the WaitEventSet only when necessary

* Unclaim connections on all errors.

* Improve failure handling for unified executor

   - Implement the notion of errorOnAnyFailure. This is similar to
     Critical Connections that the connection management APIs provide
   - If the nodes inside a modifying transaction expand, activate 2PC
   - Fix few bugs related to wait event sets
   - Mark placement INACTIVE during the execution as much as possible,
     as opposed to doing it in the COMMIT handler
   - Fix few bugs related to scheduling next placement executions
   - Improve decision on when to use 2PC

Improve the logic to start a transaction block for distributed transactions

- Make sure that only reference table modifications are always
  executed with distributed transactions
- Make sure that stored procedures and functions are executed
  with distributed transactions

* Move waitEventSet to DistributedExecution

This could also be local to RunDistributedExecution(), but in that case
we had to mark it as "volatile" to avoid PG_TRY()/PG_CATCH() issues, and
cast it to non-volatile when doing WaitEventSetFree(). We thought that
would make code a bit harder to read than making this non-local, so we
move it here. See comments for PG_TRY() in postgres/src/include/elog.h
and "man 3 siglongjmp" for more context.

* Fix multi_insert_select test outputs

Two things:
   1) One complex transaction block is now supported. Simply update
      the test output
   2) Due to dynamic nature of the unified executor, the orders of
      the errors coming from the shards might change (e.g., all of
      the queries on the shards would fail, but which one appears
      on the error message?). To fix that, we simply added it to
      our shardId normalization tool which happens just before diff.

* Fix subquery_and_cte test

The error message is updated from:
	failed to execute task
To:
        more than one row returned by a subquery or an expression

which is a lot clearer to the user.

* Fix intermediate_results test outputs

Simply update the error message from:
	could not receive query results
to
	result "squares" does not exist

which makes a lot more sense.

* Fix multi_function_in_join test

The error messages are updated from:
     Failed to execute task XXX
To:
     function f(..) does not exist

* Fix multi_query_directory_cleanup test

The unified executor does not create any intermediate files.

* Fix with_transactions test

A test case that just started to work fine

* Fix multi_router_planner test outputs

The error message is updated from:
	Could not receive query results
To:
	Relation does not exist

which is a lot clearer for the users

* Fix multi_router_planner_fast_path test

The error message is updated from:
	Could not receive query results
To:
	Relation does not exist

which is a lot clearer for the users

* Fix isolation_copy_placement_vs_modification by disabling select_opens_transaction_block

* Fix ordering in isolation_multi_shard_modify_vs_all

* Add executor locks to unified executor

* Make sure to allocate enough WaitEvents

The previous code was missing the waitEvents for the latch and
postmaster death.

* Fix rebase conflicts for master rebase

* Make sure that TRUNCATE relies on unified executor

* Implement true sequential execution for multi-row INSERTS

Execute the individual tasks one by one. Note that this is different than
MultiShardConnectionType == SEQUENTIAL_CONNECTION case (e.g., sequential execution
mode). In that case, running the tasks across the nodes in parallel is acceptable
and implemented in that way.

However, the executions that are qualified here would perform poorly if the
tasks across the workers are executed in parallel. We currently qualify only
one class of distributed queries here, multi-row INSERTs. If we do not enforce
true sequential execution, concurrent multi-row upserts could easily form
a distributed deadlock when the upserts touch the same rows.

* Remove SESSION_LIFESPAN flag in unified_executor

* Apply failure test updates

We've changed the failure behaviour a bit, and also the error messages
that show up to the user. This PR covers the majority of the updates.

* Unified executor honors citus.node_connection_timeout

With this commit, unified executor errors out if even
a single connection cannot be established within
citus.node_connection_timeout.

And, as a side effect this fixes failure_connection_establishment
test.

* Properly increment/decrement pool size variables

Before this commit, the idle and active connection
counts were not properly calculated.

* insert_select_executor goes through unified executor.

* Add missing file for task tracker

* Modify ExecuteTaskListExtended()'s signature

* Sort output of INSERT ... SELECT ... RETURNING

* Take partition locks correctly in unified executor

* Alternative implementation for force_max_query_parallelization

* Fix compile warnings in unified executor

* Fix style issues

* Decrement idleConnectionCount when idle connection is lost

* Always rebuild the wait event sets

In the previous implementation, on waitFlag changes, we were only
modifying the wait events. However, we've realized that it might
be an over-optimization since (a) we couldn't see any performance
benefits and (b) we saw some errors on failures, and because of (a)
we prefer to disable it now.

* Make sure to allocate enough sized waitEventSet

With multi-row INSERTs, we might have more sessions than
task*workerCount after a few calls of RunDistributedExecution()
because the previous sessions would also be alive.

Instead, re-allocate events when the connection set changes.

* Implement SELECT FOR UPDATE on reference tables

On master branch, we do two extra things on SELECT FOR UPDATE
queries on reference tables:
   - Acquire executor locks
   - Execute the query on all replicas

With this commit, we're implementing the same logic on the
new executor.

* SELECT FOR UPDATE opens transaction block even if SelectOpensTransactionBlock disabled

Otherwise, users would be very confused and their logic is very likely
to break.

* Fix build error

* Fix the newConnectionCount calculation in ManageWorkerPool

* Fix rebase conflicts

* Fix minor test output differences

* Fix citus indent

* Remove duplicate sorts that is added with rebase

* Create distributed table via executor

* Fix wait flags in CheckConnectionReady

* failure_savepoints output for unified executor.

* failure_vacuum output (pg 10) for unified executor.

* Fix WaitEventSetWait timeout in unified executor

* Stabilize failure_truncate test output

* Add an ORDER BY to multi_upsert

* Fix regression test outputs after rebase to master

* Add executor.c comment

* Rename executor.c to adaptive_executor.c

* Do not schedule tasks if the failed placement is not ready to execute

Before the commit, we were blindly scheduling the next placement executions
even if the failed placement was not on the ready queue. Now, we're ensuring
that if a failed placement execution is on a failed pool or session and the
execution is on the pendingQueue, we do not schedule the next task, because
the other placement execution should already be running.

* Implement a proper custom scan node for adaptive executor

- Switch between the executors, add GUC to set the pool size
- Add non-adaptive regression test suites
- Enable CIRCLE CI for non-adaptive tests
- Adjust test output files

* Add slow start interval to the executor

* Expose max_cached_connection_per_worker to user

* Do not start slow when there are cached connections

* Consider ExecutorSlowStartInterval in NextEventTimeout

* Fix memory issues with ReceiveResults().

* Disable executor via TaskExecutorType

* Make sure to execute the tests with the other executor

* Use task_executor_type to enable-disable adaptive executor

* Remove useless code

* Adjust the regression tests

* Add slow start regression test

* Rebase to master

* Fix test failures in adaptive executor.

* Rebase to master - 2

* Improve comments & debug messages

* Set force_max_query_parallelization in isolation_citus_dist_activity

* Force max parallelization for creating shards when asked to use exclusive connection.

* Adjust the default pool size

* Expand description of max_adaptive_executor_pool_size GUC

* Update warnings in FinishRemoteTransactionCommit()

* Improve session clean up at the end of execution

Explicitly list all the states in which the execution might end;
otherwise warn.

* Remove MULTI_CONNECTION_WAIT_RETRY which is not used at all

* Add more ORDER BYs to multi_mx_partitioning
2019-06-28 14:04:40 +02:00
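For context on the pool-related settings referenced throughout this commit message, a brief sketch of how they are set (GUC names as they appear above; defaults vary between Citus versions):

    -- Cap the number of connections the adaptive executor opens per worker node.
    SET citus.max_adaptive_executor_pool_size TO 4;

    -- Skip slow start and open a connection per (colocated) placement,
    -- trading connection overhead for maximum per-query parallelism.
    SET citus.force_max_query_parallelization TO on;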
Murat Tuncer fd868ec268 Fix citus_stat_statements view
Join between pg_stat_statements and citus_query_stats should
include queryid, dbid, userid instead of just queryid.
2018-11-29 14:49:16 +03:00
Marco Slot 5a63deab2e Clean up UDFs and remove unnecessary permissions 2018-11-26 14:40:37 +01:00
Marco Slot 30bad7e66f Add worker_execute_sql_task UDF 2018-11-22 18:15:33 +01:00
Murat Tuncer 4f8042085c Fix drop schema in mx with partitioned tables
Drop schema command fails in mx mode if there
is a partitioned table with active partitions.

This is due to the fact that the sql drop trigger receives
all the dropped objects including partitions. When
we call drop table on the parent partition, it also drops
the partitions on the mx node. This causes the drop
table command on partitions to fail on the mx node because
they are already dropped when the partition parent was
dropped.

With this work, worker_drop_distributed_table no longer
requires the table to exist.
2018-10-08 17:01:54 -07:00
velioglu d7f75e5b48 Add citus_lock_waits to show locked distributed queries 2018-09-20 14:13:51 +03:00
velioglu d1f005daac Adds UDFs for testing MX functionalities with isolation tests 2018-09-12 07:04:16 +03:00
Onder Kalaci d657759c97 Views to Provide some insight about the distributed transactions on Citus MX
With this commit, we implement two views that are very similar
to pg_stat_activity, but showing queries that are involved in
distributed queries:

    - citus_dist_stat_activity: Shows all the distributed queries
    - citus_worker_stat_activity: Shows all the queries on the shards
                                  that are initiated by distributed queries.

Both views have the same columns in the outputs. In very basic terms, both of the views
are meant to provide some useful insights about the distributed
transactions within the cluster. As the names reveal, both views are similar to pg_stat_activity.
Also note that these views can be pretty useful on Citus MX clusters.

Note that when the views are queried from the worker nodes, they'd not show the distributed
transactions that are initiated from the coordinator node. The reason is that the worker
nodes do not know the host/port of the coordinator. Thus, it is advisable to query the
views from the coordinator.

If we bucket the columns that the views return, we'd end up with the following:

- Hostnames and ports:
   - query_hostname, query_hostport: The node that the query is running on
   - master_query_host_name, master_query_host_port: The node in the cluster
                                                     that initiated the query.
    Note that for the citus_dist_stat_activity view, the query_hostname-query_hostport
    is always the same as master_query_host_name-master_query_host_port. The
    distinction is mostly relevant for citus_worker_stat_activity. For example,
    on Citus MX, a user starts a transaction on Node-A, which starts worker
    transactions on Node-B and Node-C. In that case, the query hostnames would be
    Node-B and Node-C whereas the master_query_host_name would be Node-A.

- Distributed transaction related things:
    This is mostly the process_id, distributed transactionId and distributed transaction
    number.

- pg_stat_activity columns:
    These two views get all the columns from pg_stat_activity. We're basically joining
    pg_stat_activity with get_all_active_transactions on process_id.
2018-09-10 21:33:27 +03:00
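An illustrative way to query the two views described above (columns as listed in this commit message; querying from the coordinator gives the most complete picture):

    -- Distributed queries currently running in the cluster.
    SELECT master_query_host_name, master_query_host_port, query
    FROM citus_dist_stat_activity
    WHERE state = 'active';

    -- The corresponding shard-level queries running on the workers.
    SELECT query_hostname, query_hostport, query
    FROM citus_worker_stat_activity
    WHERE state = 'active';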
Onder Kalaci 5cf8fbe7b6 Add infrastructure to relation if exists 2018-09-07 14:49:36 +03:00
Onder Kalaci 1b3257816e Make sure that table is dropped before shards are dropped
This commit fixes a bug where a concurrent DROP TABLE deadlocks
with SELECT (or DML) when the SELECT is executed from the workers.

The problem was that Citus used to remove the metadata before
dropping the table on the workers. That creates a time window
where the SELECT starts running on some of the nodes while DROP
TABLE runs on some of the other nodes.
2018-09-04 08:57:20 +03:00
Onder Kalaci 974cbf11a5 Hide shard names on MX worker nodes
This commit by default enables hiding shard names on MX workers
by simply replacing `pg_table_is_visible()` calls with
`citus_table_is_visible()` calls on the MX worker nodes. The latter
function filters out tables that are known to be shards.

The main motivation of this change is a better UX. The functionality
can be opted out of via a GUC.

We also added two views, namely citus_shards_on_worker and
citus_shard_indexes_on_worker such that users can query
them to see the shards and their corresponding indexes.

We also added debug messages such that the filtered tables can
be interactively seen by setting the level to DEBUG1.
2018-08-07 14:21:45 +03:00
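An illustrative way to inspect the shards hidden by this change, using the two views mentioned above (run on an MX worker node):

    -- Shards placed on this worker, normally filtered out of \d.
    SELECT * FROM citus_shards_on_worker;

    -- Indexes belonging to those shards.
    SELECT * FROM citus_shard_indexes_on_worker;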
mehmet furkan şahin bc757845eb Citus versioning fix 2018-07-26 10:56:34 +03:00
Jason Petersen 318119910b
Add pg_dist_poolinfo table
For storing nodes' pool host/port overrides.
2018-07-10 09:30:22 -07:00
Murat Tuncer a7277526fd Make citus_stat_statements_reset() super user function 2018-07-10 11:21:20 +03:00
Murat Tuncer 23800f50f1 Update citus_stat_statements view and regression tests 2018-07-03 16:14:13 +03:00
Jason Petersen 7a75c2ed31 Add connparam invalidation trigger creation logic
This needs to live in Community, since we haven't yet added the
complication of having divergent upgrade scripts in Enterprise.
2018-06-20 14:13:18 -06:00
Onder Kalaci 317dd02a2f Implement single repartitioning on hash distributed tables
* Change worker_hash_partition_table() such that
     Citus planner's hashing and worker_hash_partition_table()
     hash values the same way.

   * Rename single partitioning to single range partitioning.

   * Add single hash repartitioning. Basically, logical planner
     treats single hash and range partitioning almost equally.
     Physical planner, on the other hand, treats single hash and
     dual hash repartitioning almost equally (except for JoinPruning).

   * Add a new GUC to enable this feature
2018-05-02 18:50:55 +03:00
velioglu 698d585fb5 Remove broadcast join logic
After this change all the logic related to shard data fetching
is removed. The planner won't plan any ShardFetchTask anymore.
Shard fetch related steps in real time executor and task-tracker
executor have been removed.
2018-03-30 11:45:19 +03:00
Markus Sintonen 6202e80d06 Implemented jsonb_agg, json_agg, jsonb_object_agg, json_object_agg 2018-02-18 00:19:18 +02:00
Marco Slot f8550b8c85 Fix issues with read_intermediate_result signature 2017-12-07 13:47:56 +01:00
Marco Slot 4cdadfcab6 Add intermediate results infrastructure 2017-12-04 14:50:11 +01:00
Marco Slot 89eb833375 Use citus.next_shard_id where practical in regression tests 2017-11-15 10:12:05 +01:00
Marco Slot 7e34348334 Add shard transfer mode parameter to shard copy functions 2017-10-31 13:30:48 +01:00
Brian Cloutier ebcb2b65e9 Add master_move_node function 2017-10-16 10:51:28 -07:00
Murat Tuncer 4676c4f7a5 Prevent crash when remote transaction start fails (#1662)
We send multiple commands to the worker when starting a transaction.
Previously we only checked the result of the first command, the
transaction 'BEGIN', which always succeeds. Failures of the
following commands were not checked.

With this commit, we make sure all command results are checked.
If there is any error we report the first error found.
2017-09-26 17:25:46 -07:00
Marco Slot 641420d79f Remove source node argument from dump_local_wait_edges 2017-08-23 13:14:00 +02:00
Onder Kalaci 6532b69873 Kill the maintenance daemon on DROP DATABASE 2017-08-18 16:03:08 +03:00
Onder Kalaci a333c9f16c Add infrastructure for distributed deadlock detection
This commit adds all the necessary pieces to do the distributed
deadlock detection.

Each distributed transaction is already assigned with distributed
transaction ids introduced with
3369f3486f. The dependency among the
distributed transactions are gathered with
80ea233ec1.

With this commit, we implement a DFS (depth first search) on the
dependency graph and search for cycles. Finding a cycle reveals
a distributed deadlock.

Once we find the deadlock, we examine the path on which the cycle exists
and cancel the youngest distributed transaction.

Note that, we're not yet enabling the deadlock detection by default
with this commit.
2017-08-12 13:28:37 +03:00
velioglu 100739f62a Change citus subversion 2017-08-11 11:57:57 +03:00
Marco Slot 0ae265c436 Add citus_create_restore_point for distributed snapshots 2017-08-11 07:36:20 +02:00
Eren Başak 3061737712 Define Some Utility Functions
This change declares two new functions:

`master_update_table_statistics` updates the statistics of shards belonging
to the given table as well as its colocated tables.

`get_colocated_shard_array` returns the ids of colocated shards of a
given shard.
2017-08-10 12:42:46 +03:00
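A brief usage sketch of the two functions declared above (the table name and shard id are placeholders):

    -- Update shard statistics for a table and its colocated tables.
    SELECT master_update_table_statistics('github_events'::regclass);

    -- List the ids of shards colocated with shard 102008.
    SELECT get_colocated_shard_array(102008);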
Brian Cloutier 2e0916e15a Add master_add_secondary_node() UDF 2017-08-09 17:10:48 +03:00
Brian Cloutier 5618e69386 Add pg_dist_node.nodecluster 2017-08-08 11:18:31 +03:00
Brian Cloutier b20a086a8f master_activate_node UDF also returns noderole 2017-07-28 16:02:43 +03:00
Brian Cloutier 32e16ffe02 Give isolation tester ability to see locks on workers 2017-07-26 18:43:04 +03:00
Marco Slot 81198a1d02 Add function for dumping local wait edges 2017-07-25 16:52:32 +02:00
Onder Kalaci 3369f3486f Introduce distributed transaction ids
This commit adds distributed transaction id infrastructure in
the scope of distributed deadlock detection.

In general, the distributed transaction id consists of a tuple
in the form of: `(databaseId, initiatorNodeIdentifier, transactionId,
timestamp)`.

Briefly, we add a shared memory block on each node, which holds some
information per backend (i.e., an array `BackendData backends[MaxBackends]`).
Later, on each coordinated transaction, Citus sends
`SELECT assign_distributed_transaction_id()` right after `BEGIN`.
For that backend on the worker, the distributed transaction id is set to
the values assigned via the function call.

The aim of the above is to correlate the transactions on the coordinator
to the transactions on the worker nodes.
2017-07-18 15:01:42 +03:00
Brian Cloutier 7ad95b53d2 Rename pg_dist_shard_placement -> pg_dist_placement
Comes with a few changes:

- Change the signature of some functions to accept groupid
  - InsertShardPlacementRow
  - DeleteShardPlacementRow
  - UpdateShardPlacementState

- NodeHasActiveShardPlacements returns true if the group the node is a
  part of has any active shard placements

- TupleToShardPlacement now returns ShardPlacements which have NULL
  nodeName and nodePort.

- Populate (nodeName, nodePort) when creating ShardPlacements
- Disallow removing a node if it contains any shard placements

- DeleteAllReferenceTablePlacementsFromNode matches based on group. This
  doesn't change behavior for now (while there is only one node per
  group), but means that in the future callers should be careful about
  calling it on a secondary node, as it'll delete placements on the primary.

- Create concept of a GroupShardPlacement, which represents an actual
  tuple in pg_dist_placement and is distinct from a ShardPlacement,
  which has been resolved to a specific node. In the future
  ShardPlacement should be renamed to NodeShardPlacement.

- Create some triggers which allow existing code to continue to insert
  into and update pg_dist_shard_placement as if it still existed.
2017-07-12 14:17:31 +02:00
Marco Slot 04fe3f03f6 Change implementation of shard_name UDF to get schema-qualified shard name 2017-07-04 10:49:40 +03:00
Andres Freund 4a3b2de4c5 Add some tests checking that maintenance daemon gets started.
The 2nd database one is a bit slow, but also shows something
important, so we might want to keep it?
2017-06-23 11:53:39 -07:00
Andres Freund 1691f780fd Force cache invalidation machinery to be initialized earlier.
Previously it was not guaranteed that invalidations were registered
after creating the extension, only if the extension was used
afterwards.
2017-06-23 11:20:10 -07:00
Burak Yucesoy aff6a3dcc4 Add tests for version check 2017-05-24 17:39:25 +03:00
Burak Yucesoy 7a7c74cc87 Add tests for version checks 2017-05-22 09:53:29 +03:00
Jason Petersen cc45712144 Bump extension and configure PACKAGE versions
Actually getting this done before the next dev cycle begins.
2017-05-17 15:25:30 -06:00
Marco Slot 8edba5f309 Honour enable_ddl_propagation in truncate trigger 2017-04-29 03:32:52 +02:00
Burak Yucesoy e9095e62ec Decouple reference table replication
With this change we add an option to add a node without replicating all reference
tables to that node. If a node is added with this option, we mark the node as
inactive and no queries will be sent to that node.

We also added two new UDFs;
 - master_activate_node(host, port):
    - marks node as active and replicates all reference tables to that node
 - master_add_inactive_node(host, port):
    - only adds node to pg_dist_node
2017-04-17 13:33:31 +03:00
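A short usage sketch of the two UDFs introduced here (host and port are placeholders):

    -- Add the node to pg_dist_node without replicating reference tables yet.
    SELECT master_add_inactive_node('worker-1', 5432);

    -- Later: replicate all reference tables to the node and mark it active.
    SELECT master_activate_node('worker-1', 5432);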
Jason Petersen 8b4620ef16
Use RESET for GUC test, not reconnect
More limited in what it does, better test.
2017-04-04 16:40:17 -06:00
Burak Yucesoy a09614553f Add enable_version_checks GUC and address feedback 2017-04-04 19:11:13 +03:00
Burak Yucesoy 087d8427e3
Error out if binary citus version does not match installed extension
With this change, we start to error out if the loaded citus binaries do not match
the available major version or the installed citus extension version. In this case
we force the user to restart the server or run ALTER EXTENSION, depending on the
situation
2017-04-03 17:36:13 -06:00
velioglu e32aff1a26 Size UDFs implemented
citus_table_size, citus_relation_size and citus_total_relation_size UDFs are implemented.
2017-03-16 13:50:30 +03:00
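An illustrative use of the size UDFs listed above (the table name is a placeholder):

    -- Size of the table's shards, excluding indexes (analogous to pg_table_size).
    SELECT pg_size_pretty(citus_table_size('github_events'));

    -- Total size including indexes (analogous to pg_total_relation_size).
    SELECT pg_size_pretty(citus_total_relation_size('github_events'));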
Brian Cloutier 807beb7bc0 Remove master_get_local_first_candidate_nodes 2017-03-07 11:50:59 +03:00
Metin Doslu 7cff8719c2 Add worker_hash() and a stub for isolate_tenant_to_new_shard() 2017-01-20 14:38:01 +02:00
Eren Basak b686d9a025 Add Sequence Support for MX Tables
This change adds support for serial columns to be used with MX tables.
Prior to this change, sequences of serial columns were created in all
workers (for being able to create shards) but never used. With MX, we
need to set the sequences so that sequences in each worker create
unique values. This is done by setting the MINVALUE, MAXVALUE and
START values of the sequence.
2017-01-18 09:43:38 +03:00
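Conceptually, the per-worker setup amounts to something like the following (the sequence name and range values are purely illustrative; Citus derives the actual ranges from each worker's group id):

    -- On one worker, give its copy of the sequence a disjoint value range.
    ALTER SEQUENCE orders_id_seq
      MINVALUE 281474976710657 MAXVALUE 562949953421312
      START WITH 281474976710657 RESTART;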
Eren Basak b1ce8d61c0 Create Invalidation Trigger for pg_dist_local_group Table Updates 2017-01-18 09:43:38 +03:00
Murat Tuncer b93185d800 Add master_disable_node UDF
We can now remove nodes from the cluster regardless of whether they
have an active shard placement.
2017-01-10 10:54:57 +03:00
Burak Yucesoy 31cd2357fe Add upgrade_to_reference_table
With this change we introduce a new UDF, upgrade_to_reference_table, which can be used to
upgrade existing broadcast tables to reference tables. For upgrading, we require that the given
table contains only one shard.
2017-01-02 17:54:42 +02:00
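A minimal sketch of the call (the table name is a placeholder; the table must consist of a single shard):

    -- Convert an existing single-shard broadcast table into a reference table.
    SELECT upgrade_to_reference_table('nation');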
Eren Basak e43eed0f7a Prevent Deadlock on Dropping MX Tables with Sequences
This change prevents a deadlock situation during DROP TABLE on an
mx table with sequences on workers with metadata.
2016-12-28 16:32:20 +03:00
Burak Yucesoy 0851fd2f0b GRANT SELECT access for metadata tables to public
Previously, we errored out if a regular user tried to run a SELECT query on some metadata tables. It
seems that we already GRANT SELECT access to some metadata tables but not others. With
this change, we GRANT SELECT access to all existing Citus metadata tables.
2016-12-23 16:32:47 +03:00
Eren Basak 31af40cc26 Handle MX tables on workers during drop table commands 2016-12-23 15:43:32 +03:00
Marco Slot 6852f8a951 Add shard locking UDFs 2016-12-22 11:04:34 +01:00
Burak Yücesoy 501a2ecead Add get_distribution_value_shardid UDF (#1048)
* Add get_distribution_value_shardid UDF

With this UDF users can now map a given distribution value to a shard id. We mostly hide
shardids from users to prevent unnecessary complexity, but some power users might need
to know which entry/value is stored in which shard for maintenance purposes.

The signature of this UDF is as follows:

bigint get_distribution_value_shardid(table_name regclass, distribution_value anyelement)
2016-12-22 12:17:08 +03:00
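An illustrative call matching the signature given in this commit message (the table name and value are placeholders):

    -- Find which shard would hold rows whose distribution value is 42.
    SELECT get_distribution_value_shardid('github_events'::regclass, 42);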
Onder Kalaci 9f0bd4cb36 Reference Table Support - Phase 1
With this commit, we implemented some basic features of reference tables.

To start with, a reference table is
  * a distributed table without a distribution column defined on it
  * the distributed table is single sharded
  * and the shard is replicated to all nodes

Reference tables follow the same code path as single-sharded
tables. Thus, broadcast JOINs are applicable to reference tables.
But, since the table is replicated to all nodes, table fetching is
not required any more.

Reference tables support uniqueness constraints on any column.

Reference tables can be used in INSERT INTO .. SELECT queries with
the following rules:
  * If a reference table is in the SELECT part of the query, it is
    safe to join with another reference table and/or hash partitioned
    tables.
  * If a reference table is in the INSERT part of the query, all
    other participating tables should be reference tables.

Reference tables follow the regular co-location structure. Since
all reference tables are single sharded and replicated to all nodes,
they are always co-located with each other.

Queries involving only reference tables always follow the router planner
and executor.

Reference tables can have composite typed columns and there is no need
to create/define the necessary support functions.

All modification queries, master_* UDFs, EXPLAIN, DDLs, TRUNCATE,
sequences, transactions, COPY, schema support works on reference
tables as expected. Plus, all the pre-requisites associated with
distribution columns are dismissed.
2016-12-20 14:09:35 +02:00
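For reference, a minimal sketch of creating such a table via the create_reference_table() UDF added as part of this work (the table definition is a placeholder):

    CREATE TABLE nation (key int PRIMARY KEY, name text);
    -- Single-shard table replicated to all nodes.
    SELECT create_reference_table('nation');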
Metin Doslu 86cca54857 Add colocate_with option to create_distributed_table()
With this commit, we support three versions of colocate_with: i. default, ii. none
and iii. a specific table name.
2016-12-16 14:53:35 +02:00
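Illustrative calls for the three colocate_with variants (table and column names are placeholders):

    SELECT create_distributed_table('orders', 'customer_id');  -- colocate_with defaults to 'default'
    SELECT create_distributed_table('events', 'customer_id', colocate_with => 'none');
    SELECT create_distributed_table('invoices', 'customer_id', colocate_with => 'orders');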
Marco Slot 5714be0da5 Expose the column_to_column_name UDF to make partkey in pg_dist_partition human-readable 2016-12-14 10:46:33 +01:00
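A small illustrative query using the UDF mentioned in the last commit above:

    -- Show each distributed table's partition column by name.
    SELECT logicalrelid, column_to_column_name(logicalrelid, partkey)
    FROM pg_dist_partition;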