citus

Commit Graph

Author	SHA1	Message	Date
Onder Kalaci	621ccf3946	Ensure to use initialized MaxBackends PostgreSQL loads shared libraries before calculating MaxBackends. However, Citus relies on MaxBackends being set. Thus, with this commit we use the same steps to calculate MaxBackends while Citus is being loaded (i.e., when _PG_init is called). Note that this is safe since all the elements that are used to calculate MaxBackends are PGC_POSTMASTER GUCs and a constant value.	2018-12-03 13:25:51 +03:00
Onder Kalaci	b6ebd791a6	Sort task list for multi-task explain outputs This is purely for ensuring that regression tests do not randomly fail.	2018-11-30 11:19:37 -07:00
Marco Slot	8893cc141d	Support INSERT...SELECT with ON CONFLICT or RETURNING via coordinator Before this commit, Citus supported INSERT...SELECT queries with ON CONFLICT or RETURNING clauses only for pushdownable ones, since queries supported via coordinator were utilizing COPY infrastructure of PG to send selected tuples to the target worker nodes. After this PR, INSERT...SELECT queries with ON CONFLICT or RETURNING clauses will be performed in two phases via coordinator. In the first phase selected tuples will be saved to the intermediate table which is colocated with target table of the INSERT...SELECT query. Note that, a utility function to save results to the colocated intermediate result also implemented as a part of this commit. In the second phase, INSERT.. SELECT query is directly run on the worker node using the intermediate table as the source table.	2018-11-30 15:29:12 +03:00
Marco Slot	8e93fe5870	Check schema owner in task_tracker_assign_task	2018-11-23 11:05:09 +01:00
Marco Slot	6aa5592e52	Add user ID suffix to intermediate files in re-partition jobs	2018-11-23 08:36:11 +01:00
Marco Slot	a59bf31c76	Use worker_execute_sql_task UDF in task-tracker executor	2018-11-22 18:15:33 +01:00
Marco Slot	caf402d506	COPY to a task file no longer switches to superuser	2018-11-22 18:15:33 +01:00
Onder Kalaci	052ba21b19	Make sure to prevent unauthorized users to drop sequences in Citus MX	2018-11-15 18:08:04 +03:00
Nils Dijk	f9520be011	Round robin queries to reference tables with task_assignment_policy set to `round-robin` (#2472) Description: Support round-robin `task_assignment_policy` for queries to reference tables. This PR allows users to query multiple placements of shards in a round robin fashion. When `citus.task_assignment_policy` is set to `'round-robin'` the planner will use a round robin scheduling feature when multiple shard placements are available. The primary use-case is spreading the load of reference table queries to all the nodes in the cluster instead of hammering only the first placement of the reference table. Since reference tables share the same path for selecting the shards with single shard queries that have multiple placements (`citus.shard_replication_factor > 1`) this setting also allows users to spread the query load on these shards. For modifying queries we do not apply a round-robin strategy. This would be negated by an extra reordering step in the executor for such queries where a `first-replica` strategy is enforced.	2018-11-15 15:11:15 +01:00
Marco Slot	f383e4f307	Description: Refactor code that handles DDL commands from one file into a module The file handling the utility functions (DDL) for citus organically grew over time and became unreasonably large. This refactor takes that file and splits the functionality into separate files per command. Initially modeled after the directory and file layout that can be found in postgres. Although the size of the change is quite big there are barely any code changes. Only two functions have been added for readability purposes: - PostProcessIndexStmt which is extracted from PostProcessUtility - PostProcessAlterTableStmt which is extracted from multi_ProcessUtility A README.md has been added to `src/backend/distributed/commands` describing the contents of the module and every file in the module. We need more documentation around the overloading of the COPY command, for now the boilerplate has been added for people with better knowledge to fill out.	2018-11-14 13:36:27 +01:00
Murat Tuncer	cc401a2616	Create function_utils for pg function call related utilities	2018-11-07 15:29:38 +03:00
Marco Slot	d56baefe3d	Allow simple DML commands from hot standby	2018-10-06 10:54:44 +02:00
Murat Tuncer	4f8042085c	Fix drop schema in mx with partitioned tables Drop schema command fails in mx mode if there is a partitioned table with active partitions. This is due to the fact that sql drop trigger receives all the dropped objects including partitions. When we call drop table on parent partition, it also drops the partitions on the mx node. This causes the drop table command on partitions to fail on mx node because they are already dropped when the partition parent was dropped. With this work we do not require the table to exist on worker_drop_distributed_table.	2018-10-08 17:01:54 -07:00
velioglu	512d23934f	Show router modify,select and real-time queries on MX views	2018-10-02 13:59:38 +03:00
Onder Kalaci	abc443d7fa	Make sure that shard repair considers replication factor	2018-09-21 15:24:49 +03:00
Onder Kalaci	8520a5b432	worker_append_table_to_shard becomes aware of partitioned tables	2018-09-21 14:40:42 +03:00
Onder Kalaci	c1b5a04f6e	Allow partitioned tables with replication factor > 1 With this commit, we allow partitioned distributed tables with replication factor > 1. However, we also have many restrictions. In summary, we disallow all kinds of modifications (including DDLs) on the partition tables. Instead, the user is allowed to run the modifications over the parent table. The necessity for such a restriction has two aspects: - We need to acquire shard resource locks appropriately - We need to handle marking partitions INVALID in case of any failures. Note that, in theory, the parent table should also become INVALID, which is too aggressive.	2018-09-21 14:40:41 +03:00
Murat Tuncer	b6930e3db9	Add distributed locking to truncated mx tables We acquire distributed lock on all mx nodes for truncated tables before actually doing truncate operation. This is needed for distributed serialization of the truncate command without causing a deadlock.	2018-09-21 14:23:19 +03:00
Marco Slot	f34ab55389	Fix bug preventing rollback in stored procedure	2018-08-31 20:49:20 +02:00
Onder Kalaci	41d606b575	Use tree walker instead of mutator in relation visibility This commit uses _walker instead of _mutator for performance reasons. Given that we're only updating a functionId in the tree, the approach seems fine.	2018-09-18 09:33:01 +03:00
Onder Kalaci	a94184fff8	Prevent overflow of memory accesses during deadlock detection In the distributed deadlock detection design, we concluded that prepared transactions cannot be part of a distributed deadlock. The idea is that (a) when the transaction is prepared it already acquires all the locks, so cannot be part of a deadlock (b) even if some other processes blocked on the prepared transaction, prepared transactions would eventually be committed (or rolled back) and the system will continue operating. With the above in mind, we probably had a mistake in terms of memory allocations. For each backend initialized, we keep a `BackendData` struct. The bug we've introduced is that we assumed there would only be `MaxBackends` number of backends. However, `MaxBackends` doesn't include the prepared transactions and auxiliary processes. When you check Postgres' `InitProcGlobal` you'd see that `TotalProcs = MaxBackends + NUM_AUXILIARY_PROCS + max_prepared_xacts;` This commit aligns the number of procs processed with that.	2018-09-17 16:23:29 +03:00
velioglu	d1f005daac	Adds UDFs for testing MX functionalities with isolation tests	2018-09-12 07:04:16 +03:00
Onder Kalaci	d657759c97	Views to Provide some insight about the distributed transactions on Citus MX With this commit, we implement two views that are very similar to pg_stat_activity, but showing queries that are involved in distributed queries: - citus_dist_stat_activity: Shows all the distributed queries - citus_worker_stat_activity: Shows all the queries on the shards that are initiated by distributed queries. Both views have the same columns in the outputs. In very basic terms, both of the views are meant to provide some useful insights about the distributed transactions within the cluster. As the names reveal, both views are similar to pg_stat_activity. Also note that these views can be pretty useful on Citus MX clusters. Note that when the views are queried from the worker nodes, they'd not show the distributed transactions that are initiated from the coordinator node. The reason is that the worker nodes do not know the host/port of the coordinator. Thus, it is advisable to query the views from the coordinator. If we bucket the columns that the views return, we'd end up with the following: - Hostnames and ports: - query_hostname, query_hostport: The node that the query is running on - master_query_host_name, master_query_host_port: The node in the cluster that initiated the query. Note that for citus_dist_stat_activity view, the query_hostname-query_hostport is always the same as master_query_host_name-master_query_host_port. The distinction is mostly relevant for citus_worker_stat_activity. For example, on Citus MX, a user starts a transaction on Node-A, which starts worker transactions on Node-B and Node-C. In that case, the query hostnames would be Node-B and Node-C whereas the master_query_host_name would be Node-A. - Distributed transaction related things: This is mostly the process_id, distributed transactionId and distributed transaction number. - pg_stat_activity columns: These two views get all the columns from pg_stat_activity. We're basically joining pg_stat_activity with get_all_active_transactions on process_id.	2018-09-10 21:33:27 +03:00
Onder Kalaci	76aa6951c2	Properly send commands to other nodes We previously implemented OTHER_WORKERS_WITH_METADATA tag. However, that was wrong. See the related discussion: https://github.com/citusdata/citus/issues/2320 Instead, we switched using OTHER_WORKER_NODES and make the command that we're running optional such that even if the node is not a metadata node, we won't be in trouble.	2018-09-10 16:01:30 +03:00
Onder Kalaci	5cf8fbe7b6	Add infrastructure to relation if exists	2018-09-07 14:49:36 +03:00
Murat Tuncer	65276311f7	Reflect changed index for constraint scans in PG11	2018-09-07 08:07:01 +03:00
Onder Kalaci	26e308bf2a	Support TRUNCATE from the MX worker nodes This commit enables support for TRUNCATE on both distributed table and reference tables. The basic idea is to acquire lock on the relation by sending the TRUNCATE command to all metadata worker nodes. We only skip sending the TRUNCATE command to the node that actually executes the command to prevent a self-distributed-deadlock.	2018-09-03 14:06:31 +03:00
Onder Kalaci	97ba7bf2eb	Add the option to skip the node that is executing the node	2018-09-03 14:01:24 +03:00
velioglu	bd30e3e908	Add support for writing to reference tables from MX nodes	2018-08-27 18:15:04 +03:00
velioglu	2639149bd8	Enterprise functions about metadata/resource locks	2018-08-27 16:32:20 +03:00
Onder Kalaci	b8af8c359b	Make sure that modifying CTEs always use the correct execution mode	2018-08-23 14:53:55 +03:00
mehmet furkan şahin	ef9f38b68d	ApplyLogRedaction noop func is added	2018-08-17 14:48:54 -07:00
Onder Kalaci	974cbf11a5	Hide shard names on MX worker nodes This commit by default enables hiding shard names on MX workers by simply replacing `pg_table_is_visible()` calls with `citus_table_is_visible()` calls on the MX worker nodes. The latter function filters out tables that are known to be shards. The main motivation of this change is a better UX. The functionality can be opted out via a GUC. We also added two views, namely citus_shards_on_worker and citus_shard_indexes_on_worker such that users can query them to see the shards and their corresponding indexes. We also added debug messages such that the filtered tables can be interactively seen by setting the level to DEBUG1.	2018-08-07 14:21:45 +03:00
Onder Kalaci	e13da6a343	Add infrastructure to hide shards on MX worker nodes Add ability to understand whether a table is a known shard on MX workers. Note that this is only useful and applicable for hiding shards on MX worker nodes given that we can have metadata only there.	2018-08-04 09:03:37 +03:00
Nils Dijk	6a15e1c9fc	extract ErrorIfOnConflictNotSupported function for reuse	2018-07-23 12:20:10 +02:00
Jason Petersen	318119910b	Add pg_dist_poolinfo table For storing nodes' pool host/port overrides.	2018-07-10 09:30:22 -07:00
mehmet furkan şahin	3afa7f425d	Topn aggregates are supported	2018-07-10 14:33:42 +03:00
Marco Slot	89870e76ce	Add a select_opens_transaction_block GUC	2018-07-08 03:50:39 +02:00
Onder Kalaci	7fb529aab9	Some stylistic improvements in the foreign keys to reference table changes.	2018-07-05 23:23:34 +03:00
Nils Dijk	c1c8c38dc9	create placeholder for policy ddl	2018-07-05 11:07:01 +02:00
mehmet furkan şahin	06217be326	hll aggregate functions are supported natively	2018-07-04 16:41:09 +03:00
mehmet furkan şahin	f7b901e3fd	CopyShardForeignConstraintCommandList API change for grouped constraints	2018-07-03 17:05:55 +03:00
mehmet furkan şahin	35eac2318d	lock referenced reference table metadata is added For certain operations in enterprise, we need to lock the referenced reference table shard distribution metadata	2018-07-03 17:05:55 +03:00
Onder Kalaci	d83be3a33f	Enforce foreign key restrictions inside transaction blocks When a hash distributed table has a foreign key to a reference table, there are a few restrictions we have to apply in order to prevent distributed deadlocks or reading wrong results. The necessity to apply the restrictions arises from the cascading nature of foreign keys. When a foreign key on a reference table cascades to a distributed table, a single operation over a single connection can acquire locks on multiple shards of the distributed table. Thus, any parallel operation on that distributed table, in the same transaction should not open parallel connections to the shards. Otherwise, we'd either end up with a self-distributed deadlock or read wrong results. As briefly described above, the restrictions are applied by tracking the distributed/reference relation accesses inside transaction blocks, and acting accordingly when necessary. The two main rules are as follows: - Whenever a parallel distributed relation access conflicts with a consecutive reference relation access, Citus errors out - Whenever a reference relation access is followed by a conflicting parallel relation access, the execution mode is switched to sequential mode. There are also some other notes to mention: - If the user does SET LOCAL citus.multi_shard_modify_mode TO 'sequential';, all the queries should simply work with using one connection per worker and sequentially executing the commands. That's obviously a slower approach than Citus' usual parallel execution. However, we at least have a way to run all commands successfully. - If an unrelated parallel query is executed on any distributed table, we cannot switch to sequential mode, because the essence of sequential mode is using one connection per worker. However, in the presence of a parallel connection, the connection manager picks those connections to execute the commands. That contradicts our purpose, thus we error out. - COPY to a distributed table cannot be executed in sequential mode. Thus, if we switch to sequential mode and COPY is executed, the operation fails and there is currently no way of implementing that. Note that, when the local table is not empty and create_distributed_table is used, Citus uses COPY internally. Thus, in those cases, create_distributed_table() will also fail. - There is a GUC called citus.enforce_foreign_key_restrictions to disable all the checks. We added that GUC since the restrictions we apply are sometimes a bit more restrictive than necessary. The user might want to relax those. Similarly, if you don't have CASCADEing reference tables, you might consider disabling all the checks.	2018-07-03 17:05:55 +03:00
velioglu	6be6911ed9	Create foreign key relation graph and functions to query on it	2018-07-03 17:05:55 +03:00
mehmet furkan şahin	4db72c99f6	Specific DDLs are sequentialized when there is FK -[x] drop constraint -[x] drop column -[x] alter column type -[x] truncate are sequentialized if there is a foreign constraint from a distributed table to a reference table on the affected relations by the above commands.	2018-07-03 17:05:55 +03:00
mehmet furkan şahin	2c5d59f3a8	create_distributed_table in transaction is fixed	2018-07-03 17:05:01 +03:00
mehmet furkan şahin	2fa4e38841	FK from dist to ref can be added with alter table	2018-07-03 17:05:01 +03:00
Murat Tuncer	3fc7cdfe6d	Apply master_stage_protocol refactoring changes	2018-06-28 11:24:57 +03:00
Murat Tuncer	4d35b92016	Add groundwork for citus_stat_statements api	2018-06-27 14:20:03 +03:00
Onder Kalaci	8ccb8b679e	Real-time executor marks multi shard relation accesses before opening connections	2018-06-25 18:40:31 +03:00
Onder Kalaci	21038f0d0e	Make sure that inter-shard DDL commands always cover both tables	2018-06-25 18:40:30 +03:00
Onder Kalaci	2f01894589	Track relation accesses using the connection management infrastructure	2018-06-25 18:40:30 +03:00
Onder Kalaci	d5472614df	Use non-data connection for intermediate results Make sure that intermediate results use a connection that is not associated with any placement. That is useful in two ways: - More complex queries can be executed with CTEs - Safely use the same connections when there is a foreign key to reference table from a distributed table, which needs to use the same connection for modifications since the reference table might cascade to the distributed table.	2018-06-21 13:26:13 +03:00
Onder Kalaci	7762d81cba	Move test UDF under test folder	2018-06-21 08:42:44 +03:00
Jason Petersen	7a75c2ed31	Add connparam invalidation trigger creation logic This needs to live in Community, since we haven't yet added the complication of having divergent upgrade scripts in Enterprise.	2018-06-20 14:13:18 -06:00
velioglu	53b2e81d01	Adds SELECT ... FOR UPDATE support for router plannable queries	2018-06-18 13:55:17 +03:00
Marco Slot	0feb1f2eb1	Do not call CheckRemoteTransactionsHealth from commit handler	2018-06-14 23:33:07 +02:00
Marco Slot	4ab8e87090	Always throw errors on failure on critical connection in router executor	2018-06-14 23:33:07 +02:00
Nils Dijk	73efcb22c4	Extract RoleSpecString and resolve role references	2018-06-14 11:38:42 +02:00
Jason Petersen	57b3f253c5	Add node_conninfo GUC and related logic To support more flexible (i.e. not at compile-time) specification of libpq connection parameters, this change adds a new GUC, node_conninfo, which must be a space-separated string of key-value pairs suitable for parsing by libpq's connection establishment methods. To avoid rebuilding and parsing these values at connection time, this change also adds a cache in front of the configuration params to permit immediate use of any previously-calculated parameters.	2018-06-12 20:23:47 -06:00
mehmet furkan şahin	d1a3b20115	foreign_constraint_utils is created	2018-06-07 18:19:24 +03:00
Onder Kalaci	336044f2a8	master_modify_multiple_shards() and TRUNCATE honors multi_shard_modification_mode	2018-06-06 12:29:05 +03:00
Onder Kalaci	df44956dc3	Make sure that sequential DDL opens a single connection to each node After this commit DDL commands honour `citus.multi_shard_modify_mode`. We preferred using the code-path that executes single task router queries (e.g., ExecuteSingleModifyTask()) in order not to invent a new executor that is only applicable for DDL commands that require sequential execution.	2018-06-05 17:52:17 +03:00
Marco Slot	fd4ff29f2f	Add a debug message with distribution column value	2018-06-05 15:09:17 +03:00
Dimitri Fontaine	8b258cbdb0	Lock reads and writes only to the node being updated in master_update_node Rather than locking out all the writes in the cluster, the function now only locks out writes that target shards hosted by the node we're updating.	2018-05-09 15:14:20 +02:00
Murat Tuncer	42a8082721	PG11 compatibility refresh adds a shim for a changed function api	2018-05-03 13:21:15 -06:00
Onder Kalaci	317dd02a2f	Implement single repartitioning on hash distributed tables * Change worker_hash_partition_table() such that the divergence between Citus planner's hashing and worker_hash_partition_table() becomes the same. * Rename single partitioning to single range partitioning. * Add single hash repartitioning. Basically, logical planner treats single hash and range partitioning almost equally. Physical planner, on the other hand, treats single hash and dual hash repartitioning almost equally (except for JoinPruning). * Add a new GUC to enable this feature	2018-05-02 18:50:55 +03:00
velioglu	32bcd610c1	Support modify queries with multiple tables With this commit we begin to support modify queries with multiple tables if these queries are pushdownable.	2018-05-02 16:22:26 +03:00
velioglu	d9fa69c031	Refactor query pushdown related logic	2018-05-02 15:03:09 +03:00
velioglu	121ff39b26	Removes large_table_shard_count GUC	2018-04-29 10:34:50 +02:00
mehmet furkan şahin	a4153c6ab1	notice handler is implemented	2018-04-27 14:37:01 +03:00
Marco Slot	304b3a41ba	Cache the partition column Var	2018-04-26 14:58:16 -06:00
Marco Slot	3d3c19a717	Improve messages for essential connection failures	2018-04-26 12:58:47 -06:00
Murat Tuncer	a6fe5ca183	PG11 compatibility update - changes in ruleutils_11.c are reflected - vacuum statement api change is handled. We now allow multi-table vacuum commands. - some other function header changes are reflected - api conflicts between PG11 and earlier versions are handled by adding shims in version_compat.h - various regression tests are fixed due to output and functionality in PG11 - no change is made to support new features in PG11 they need to be handled by new commit	2018-04-26 11:29:43 +03:00
Onder Kalaci	ee748d9140	Unify extendedOpNode Processing Before this commit, we had a divergence among the creation of master/worker extended op nodes. This commit moves the related parts into a single place and allows the creation of master/extended op nodes to share a common data structure.	2018-04-24 11:56:38 +03:00
velioglu	82b2d21b0c	Convert broadcast join to reference join After this commit large_table_shard_count won't be used to check whether broadcast join, which is renamed as reference join, can be applied. Reference join can only be applied over reference tables.	2018-04-13 12:58:14 +03:00
Marco Slot	ee132c5ead	Prune shards once per relation in subquery pushdown	2018-04-10 20:33:07 +02:00
Burak Yucesoy	0c283fa8a3	Add partitioning support to MX tables Previously, we prevented creation of partitioned tables on Citus MX. We decided to not focus on this feature until there is a need. Since now there are requests for this feature, we are implementing support for partitioned tables on Citus MX.	2018-04-06 12:47:06 +03:00
velioglu	72dfe4a289	Adds colocation check to local join	2018-04-04 22:49:27 +03:00
velioglu	82a864308a	Remove SHARD_STORAGE_RELAY type	2018-03-30 11:45:19 +03:00
velioglu	698d585fb5	Remove broadcast join logic After this change all the logic related to shard data fetch logic will be removed. Planner won't plan any ShardFetchTask anymore. Shard fetch related steps in real time executor and task-tracker executor have been removed.	2018-03-30 11:45:19 +03:00
Murat Tuncer	224b0a8c14	Replace poll with select/poll Windows does not have poll(), so fall back to select()	2018-03-21 20:05:00 -07:00
Metin Doslu	3b7b64a8b6	Remove skip_jsonb_validation_in_copy GUC	2018-03-13 10:33:27 +02:00
Murat Tuncer	1440caeef2	Fix incorrect limit pushdown when distinct clause is not superset of group by (#2035) Pushing down limit and order by into workers may produce wrong output when distinct on() clause has expressions, aggregates, or window functions. This check allows pushing down of limits only if distinct clause is a superset of group by clause, i.e., it contains all clauses in group by.	2018-03-07 13:24:56 +03:00
Murat Tuncer	76f6883d5d	Add support for window functions that can be pushed down to worker (#2008) This is the first of a series of window function work. We can now support window functions that can be pushed down to workers. Window function must have distribution column in the partition clause to be pushed down.	2018-03-01 19:07:07 +03:00
Marco Slot	ef5ff7eb12	Add bit_ and bool_ aggregates to AggregateType	2018-02-27 23:48:25 +01:00
Marco Slot	c723a1fa32	Add support for bool and bit aggregates	2018-02-27 23:48:25 +01:00
Metin Doslu	bcf660475a	Add support for modifying CTEs	2018-02-27 15:08:32 +02:00
Onder Kalaci	1c930c96a3	Support non-co-located joins between subqueries With #1804 (and related PRs), Citus gained the ability to plan subqueries that are not safe to pushdown. There are two high-level requirements for pushing down subqueries: * Individual subqueries that require a merge step (i.e., GROUP BY on non-distribution key, or LIMIT in the subquery etc). We've handled such subqueries via #1876. * Combination of subqueries that are not joined on distribution keys. This commit aims to recursively plan some of such subqueries to make the whole query safe to pushdown. The main logic behind non colocated subquery joins is that we pick an anchor range table entry and check for distribution key equality of any other subqueries in the given query. If for a given subquery, we cannot find distribution key equality with the anchor rte, we recursively plan that subquery. We also used a hacky solution for picking relations as the anchor range table entries. The hack is that we wrap them into a subquery. This is only necessary since some of the attribute equivalence checks are based on queries rather than range table entries.	2018-02-26 13:50:37 +02:00
Onder Kalaci	7b57e0562a	Add infrastructure for detecting non-colocated subqueries	2018-02-26 13:28:25 +02:00
Onder Kalaci	e8aa532a90	Refactor checks for distribution key equality Change some function names, ensure we stick to Citus' function order rules etc.	2018-02-26 13:28:24 +02:00
Markus Sintonen	6202e80d06	Implemented jsonb_agg, json_agg, jsonb_object_agg, json_object_agg	2018-02-18 00:19:18 +02:00
velioglu	195ac948d2	Recursively plan subqueries in WHERE clause when FROM recurs	2018-02-13 19:52:12 +03:00
Marco Slot	6051aae56e	Handle errors that are discovered during abort	2018-02-12 16:45:02 +01:00
Onder Kalaci	94c5ac6ebb	Remove duplicate join restrictions We use PostgreSQL hooks to accumulate the join restrictions and PostgreSQL gives us all the join paths it tries while deciding on the join order. Thus, for queries that have many joins, this function is likely to remove lots of duplicate join restrictions. This becomes relevant for Citus on query pushdown check performance.	2018-02-12 18:35:05 +02:00
Onder Kalaci	c228d8ff3d	Refactor equivalence generation related codes This commit changes the APIs for restriction generation to make future changes simpler.	2018-02-12 18:35:04 +02:00
Onder Kalaci	2f2d350924	Refactor relation restriction related codes This commit moves some of the functions to a more relevant source file.	2018-02-12 18:35:04 +02:00
Marco Slot	6f7c3bd73b	Skip JSON validation on coordinator during COPY	2018-02-02 15:33:27 +01:00
Brian Cloutier	15511f6ba1	Dynamically allocate connection metadata in WaitForAllConnections	2018-02-01 10:30:41 -08:00
Brian Cloutier	a2ed45e206	Remove variable length arrays VLAs aren't supported by Visual Studio. - Remove all existing instances of VLAs. - Add a flag, -Werror=vla, which makes gcc refuse to compile if we add VLAs in the future.	2018-02-01 10:30:41 -08:00
Brian Cloutier	76d1edc3fd	Don't rely on gcc-specific features (#1963) * Don't use expressions inside compound statements * Don't depend on __builtin_constant_p * Remove reliance on S_ISLNK * Replace use of __func__: older MSVC doesn't support this builtin	2018-01-23 17:03:29 -08:00
Marco Slot	3fd65cb91b	Do not raise errors in the real-time executor (#1903)	2018-01-01 22:26:31 -05:00
Marco Slot	09c09f650f	Recursively plan set operations when leaf nodes recur	2017-12-26 13:46:55 +02:00
mehmet furkan şahin	fd546cf322	Intermediate result size limitation This commit introduces a new GUC to limit the intermediate result size which we handle when we use read_intermediate_result function for CTEs and complex subqueries.	2017-12-21 14:26:56 +03:00
Onder Kalaci	0d5a4b9c72	Recursively plan subqueries that are not safe to pushdown With this commit, Citus recursively plans subqueries that are not safe to pushdown, in other words, require a merge step. The algorithm is simple: Recursively traverse the query from bottom up (i.e., bottom meaning the leaf queries). On each level, check whether the query is safe to pushdown (or a single repartition subquery). If the answer is yes, do not touch that subquery. If the answer is no, plan the subquery separately (i.e., create a subPlan for it) and replace the subquery with a call to `read_intermediate_results(planId, subPlanId)`. During the execution, run the subPlans first, and make them available to the next query executions. Some of the queries that this change allows us: * Subqueries with LIMIT * Subqueries with GROUP BY/DISTINCT on non-partition keys * Subqueries involving re-partition joins, router queries * Mixed usage of subqueries and CTEs (i.e., use CTEs in subqueries as well). Nested subqueries as long as we support the subquery inside the nested subquery. * Subqueries with local tables (i.e., those subqueries have the limitation that they have to be leaf subqueries) * VIEWs on the distributed tables just work (i.e., the limitations mentioned below still apply to views) Some of the queries that are still NOT supported: * Correlated subqueries that are not safe to pushdown * Window function on non-partition keys * Recursively planned subqueries or CTEs on the outer side of an outer join * Only recursively planned subqueries and CTEs in the FROM (i.e., not any distributed tables in the FROM) and subqueries in WHERE clause * Subquery joins that are not on the partition columns (i.e., each subquery is individually joined on partition keys but not the upper level subquery.) * Any limitation that logical planner applies such as aggregate distincts (except for count) when GROUP BY is on non-partition key, or array_agg with ORDER BY	2017-12-21 08:37:40 +02:00
Onder Kalaci	e12ea914b9	Refactor ErrorIfQueryNotSupported to defer errors	2017-12-20 09:03:49 +02:00
Onder Kalaci	71ce42b936	Refactor RecursivelyPlanSubqueriesAndCTEs() to make it ready to work with subqueries	2017-12-20 09:03:47 +02:00
Marco Slot	7dab078e67	Set cost estimates for read_intermediate_result	2017-12-18 16:23:44 +01:00
Marco Slot	a64f0060ba	Reduce the frequency of FinishConnectionIO calls during COPY (#1864)	2017-12-14 13:21:59 -05:00
Marco Slot	2e2b4e81fa	Add support for CTEs in distributed queries	2017-12-14 09:32:55 +01:00
Marco Slot	cbbd418af2	Add citus.copy_format OIDs to metadata cache	2017-12-14 09:32:55 +01:00
Marco Slot	66f9f1d6cd	Make some intermediate results functions public	2017-12-14 09:32:55 +01:00
Marco Slot	36ee21c323	Make CanUseBinaryCopyFormatForType public	2017-12-14 09:32:55 +01:00
Marco Slot	7d1191954d	Add DistributedSubPlan node	2017-12-14 09:32:55 +01:00
mehmet furkan şahin	3c941aedf1	adds citus.enable_repartition_joins GUC The new GUC allows Citus to switch between task executors when necessary	2017-12-11 09:36:37 +03:00
Marco Slot	60a1e31671	Allow queries with local tables in NeedsDistributedPlanning	2017-12-07 16:20:23 +01:00
Marco Slot	f8550b8c85	Fix issues with read_intermediate_result signature	2017-12-07 13:47:56 +01:00
Marco Slot	d8fea4efb8	Revert "Allow queries with local tables in NeedsDistributedPlanning" This reverts commit `d2bac081e8`.	2017-12-07 11:19:11 +01:00
Marco Slot	d2bac081e8	Allow queries with local tables in NeedsDistributedPlanning	2017-12-07 11:02:16 +01:00
Marco Slot	7279d42849	Treat read_intermediate_result as recurring tuples	2017-12-04 14:50:11 +01:00
Marco Slot	4cdadfcab6	Add intermediate results infrastructure	2017-12-04 14:50:11 +01:00
Marco Slot	bfcc76df69	Make several COPY-related functions public	2017-12-04 13:12:03 +01:00
Marco Slot	73989b07eb	Refactor query execution functions	2017-12-04 13:12:03 +01:00
Murat Tuncer	2d66bf5f16	Fix hard coded formatting strings for 64 bit numbers (#1831) Postgres provides OS-agnostic formatting macros for formatting 64 bit numbers. Replaced %ld %lu with INT64_FORMAT and UINT64_FORMAT respectively. Also found some incorrect usages of formatting flags and fixed them.	2017-12-04 14:11:06 +03:00
Marco Slot	d6dd0b3a81	Send BEGIN in the real-time executor when in a transaction	2017-11-30 12:59:09 +01:00
Marco Slot	3a4d5f8182	Remove filter checks on leaf queries	2017-11-30 12:25:14 +01:00
Marco Slot	3f03cb6a6a	Support UNION with joins in the subqueries	2017-11-30 10:37:56 +01:00
Marco Slot	a9933deac6	Make real time executor work in transactions	2017-11-30 09:59:32 +03:00
Onder Kalaci	05fb0dd020	Add infrastructure for filtering restriction contexts based on the input query In subquery pushdown, we first ensure that each relation is joined with at least one other relation on the partition keys. That's fine given that the decision is binary: pushdown the query at all or not. With recursive planning, we'd want to check whether any specific part of the query can be pushed down or not. Thus, we need the ability to understand which part(s) of the subquery is safe to pushdown. This commit adds the infrastructure for doing that.	2017-11-28 09:58:21 +02:00
Onder Kalaci	16421f089f	Register citus custom scan nodes	2017-11-23 11:38:33 +02:00
Onder Kalaci	83c1143505	Refactor custom scan related code In this commit, we don't change any code, only create a new file and move the related functions and types there.	2017-11-23 11:38:12 +02:00
Marco Slot	8486f76e15	Auto-recover 2PC transactions	2017-11-22 11:26:58 +01:00
Marco Slot	6ba3f42d23	Rename MultiPlan to DistributedPlan	2017-11-22 09:36:24 +01:00
Marco Slot	0ad39b36fe	Treat immutable table functions and constant subqueries as reference tables	2017-11-21 14:15:22 +01:00
Brian Cloutier	d267e0f9fa	EXEC_BACKEND: don't put pointers to shared hashes into shared memory Store pointers to shared hashes in process-local variables. Previously pointers to shared hashes were put into shared memory. This causes problems on EXEC_BACKEND because everybody calls execve and receives a brand new address space; the shared hash will be in a different place for every backend. (normally we call fork, which gives you a copy of the address space, so these pointers remain constant)	2017-11-20 15:29:51 -08:00
Brian Cloutier	30a2365d81	Rename CreateDirectory to CitusCreateDirectory	2017-11-20 14:38:26 -08:00
Brian Cloutier	aa2ab023a2	Rename RemoveDirectory -> CitusRemoveDirectory	2017-11-20 14:21:52 -08:00
Marco Slot	2410c2e450	Rewrite recover_prepared_transactions to be fast, non-blocking	2017-11-20 11:27:40 +01:00
Marco Slot	d3b634b301	Allow generating placement IDs without using the sequence	2017-11-15 10:12:06 +01:00
Marco Slot	c24a0875a5	Allow generating shard IDs without using the sequence	2017-11-15 10:12:05 +01:00
velioglu	be28ba8e70	Add stub UDF to run pg_upgrade flawlessly	2017-11-13 16:14:45 +02:00
Marco Slot	f71728f634	Add GUC for specifying sslmode in connections to workers	2017-11-08 14:15:58 +01:00
Brian Cloutier	7be1545843	Support implicit casts during INSERT/SELECT It's possible to build INSERT SELECT queries which include implicit casts, currently we attempt to support these by adding explicit casts to the SELECT query, but this sometimes crashes because we don't update all nodes with the new types. (SortClauses, for instance) This commit removes those explicit casts and passes an unmodified SELECT query to the COPY executor (how we implement INSERT SELECT behind the scenes). In lieu of those casts, COPY has been given some extra logic to inspect queries, notice that the types don't line up with the table it's supposed to be inserting into, and "manually" cast every tuple before sending them to workers.	2017-11-03 22:27:15 -07:00
Marco Slot	6883a09cdd	Allow distributed partitioned table creation in Cloud	2017-11-03 10:09:18 +01:00
Hadi Moshayedi	9bfbbf8a04	Make reports hostname configurable and enable stats collection in tests. This patch adds --with-reports-host configure option, which sets the REPORTS_BASE_URL constant. The default is reports.citusdata.com. It also enables stats collection in tests.	2017-10-31 21:51:43 -04:00
Hadi Moshayedi	78a2cd9052	Check for Citus updates. Sends a request to /v1/releases/latest?flavor=$CITUS_EDITION once a day, which returns a response similar to {"version": "7.1.0", "major": 7, "minor": 1, "patch": 0}. Then compares it with current Citus version, and if the latest release is newer, logs a LOG message.	2017-10-31 21:51:43 -04:00
Hadi Moshayedi	34f3ec0961	Call FlushDistTableCache() before stats collection.	2017-10-31 21:51:43 -04:00
Hadi Moshayedi	97d544b75c	Follow the patterns used in Deadlock Detection in Stats Collection. This includes: (1) Wrap everything inside a StartTransactionCommand()/CommitTransactionCommand(). This is so we can access the database. This also switches to a new memory context and releases it, so we don't have to do our own memory management. (2) LockCitusExtension() so the extension cannot be dropped or created concurrently. (3) Check CitusHasBeenLoaded() && CheckCitusVersion() before doing any work. (4) Do not PG_TRY() inside a loop.	2017-10-31 21:51:43 -04:00
Furkan Sahin	2b39c52f0b	Replica identity on create_distributed_table By this commit, citus minds the replica identity of the table when we distribute the table. So the shards of the distributed table have the same replica identity with the local table.	2017-10-31 13:08:36 +03:00
velioglu	0b5db5d826	Support multi shard update/delete queries	2017-10-25 15:52:38 +03:00
Hadi Moshayedi	9a04b78980	Send server_id for statistics reports. (#1698) This change introduces the `pg_dist_node_metadata` which has a single jsonb value. When creating the extension, a random server id is generated and stored in there. Everything in the metadata table is added as a nested object to the json payload that is sent to the reports server.	2017-10-18 21:20:32 -04:00
Jason Petersen	f2c593b25c	Add CITUS_NAME and CITUS_EDITION Unambiguous places to check whether we're running simply Citus or Citus Enterprise, or to check for 'community' or 'enterprise'.	2017-10-16 18:09:29 -06:00
Jason Petersen	8544878c4b	Add citus_version(), analogous to PG's version() This will provide the full project name (i.e. Citus/Citus Enterprise), and the host system, compiler, and architecture word size. I wanted to limit the number of copied files in 'config', so I added only config.guess and call it manually, rather than using the macro AC_CANONICAL_HOST, which requires several other files.	2017-10-16 18:09:29 -06:00
Brian Cloutier	ebcb2b65e9	Add master_move_node function	2017-10-16 10:51:28 -07:00
Hadi Moshayedi	2aec6eda49	Properly use #ifdef HAVE_LIBCURL.	2017-10-13 12:04:36 -06:00
Jason Petersen	01353cb7cb	Use header define rather than -D flag Eclipse apparently doesn't scan build output looking for -D flags, so having the value actually appear in a header is nicer for those of us using IDEs.	2017-10-13 11:00:09 -04:00
Murat Tuncer	f7ab901766	Add select distinct, and distinct on support Distinct, and distinct on() clauses are supported in simple selects, joins, subqueries, and insert into select queries.	2017-10-13 14:59:48 +03:00
Hadi Moshayedi	a1387f4aa8	Basic usage statistics collection. (#1656) Adds the `citus.enable_statistics_collection` GUC variable, which is `true` by default, unless built without libcurl. If statistics collection is enabled, sends basic usage data to Citus servers every 24 hours. The data that is collected consists of: - Citus version - OS name & release - Hardware Id - Number of tables, rounded to next power of 2 - Size of data, rounded to next power of 2 - Number of workers	2017-10-11 09:55:15 -04:00
Onder Kalaci	498ac80d8b	Add window function support for SUBQUERY PUSHDOWN and INSERT INTO SELECT This commit provides the support for window functions in subquery and insert into select queries. Note that our support for window functions is still limited because it must have a partition by clause on the distribution key. This commit makes changes in the files insert_select_planner and multi_logical_planner. The required tests are also added with files multi_subquery_window_functions.out and multi_insert_select_window.out.	2017-10-04 15:33:07 +03:00
Marco Slot	394918f9d0	Invalidate worker and group ID cache in maintenance daemon	2017-10-02 18:14:29 +02:00
Marco Slot	43d5e79eaa	Execute transmit commands as superuser during task-tracker queries	2017-09-28 15:27:25 +02:00
Marco Slot	da6b42a3e2	Use unique constraint index for transaction record deletion	2017-09-28 12:04:56 +02:00
Murat Tuncer	4676c4f7a5	Prevent crash when remote transaction start fails (#1662) We send multiple commands to the worker when starting a transaction. Previously we only checked the result of the first command, the transaction 'BEGIN', which always succeeds. Failures of the following commands were not checked. With this commit, we make sure all command results are checked. If there is any error we report the first error found.	2017-09-26 17:25:46 -07:00
Jason Petersen	b4d53423fa	Add adapter functions for OpenFile changes	2017-09-25 17:20:24 -07:00
Jason Petersen	bbc15e0598	Handle HASHPROC changes PostgreSQL 11 now has "standard" and "extended" (64-bit) versions of hash functions.	2017-09-25 17:20:24 -07:00
Jason Petersen	6c9b19a954	Add version-compat header For polyfill macros, etc.	2017-09-25 17:20:23 -07:00
velioglu	0a56ed910b	Change error message of queries with distributed and local table Citus can handle INSERT INTO ... SELECT queries if the query inserts into local table by reading data from distributed table. The opposite way is not correct. With this commit we warn the user if the latter option is used.	2017-09-22 13:46:19 -07:00
Onder Kalaci	4782f9f98a	Properly copy and trim the error messages that come from pg_conn When a NULL connection is provided to PQerrorMessage(), the returned error message is a static text. Modifying that static text, which is not necessarily in writable memory, is dangerous and might cause a segfault.	2017-09-22 19:43:09 +03:00
Onder Kalaci	6736fd1682	Remove two obsolete functions Namely GetConnectionFromPGconn() and CloseConnectionByPGconn()	2017-09-21 00:36:23 -06:00
Marco Slot	0aadbb1760	Convert multi-row INSERT target list to Vars	2017-08-25 10:55:56 +02:00
Marco Slot	ae00795dab	Allow default columns in multi-row INSERTs	2017-08-25 10:55:56 +02:00
Marco Slot	c97692f382	Fix multi-row INSERT with RETURNING on reference tables	2017-08-24 10:42:12 +02:00
Marco Slot	4d7927b672	Execute multi-row INSERTs sequentially	2017-08-23 10:04:57 +02:00
Marco Slot	cf375d6a66	Consider dropped columns that precede the partition column in COPY	2017-08-22 13:02:35 +02:00
Onder Kalaci	6532b69873	Kill the maintenance daemon on DROP DATABASE	2017-08-18 16:03:08 +03:00
Marco Slot	7523753a73	Clear metadata OID cache prior to deadlock detection	2017-08-18 11:20:24 +02:00
Hadi Moshayedi	e5fbcf37dd	Add Savepoint Support (#1539) This change adds support for SAVEPOINT, ROLLBACK TO SAVEPOINT, and RELEASE SAVEPOINT. When transaction connections are not established yet, savepoints are kept in a stack and sent to the worker when the connection is later established. After establishing connections, savepoint commands are sent as they arrive. This change fixes #1493.	2017-08-15 13:02:28 -04:00
Burak Yucesoy	dfdfb44ebf	Acquire shard resource locks on parent tables while operating on partitions	2017-08-14 14:44:30 +03:00
Burak Yucesoy	a321e750c0	Acquire relation locks on partitions while operating on parent table	2017-08-14 14:44:30 +03:00
Burak Yucesoy	52b9e35d50	Add relationIdList field to the Job struct	2017-08-14 14:06:22 +03:00
Onder Kalaci	5b48de7430	Improve deadlock detection for MX We added a new field to the transaction id that is set to true only for the transactions initialized on the coordinator. This is only useful for MX in order to distinguish the transaction that started the distributed transaction on the coordinator where we could have the same transactions' worker queries on the same node.	2017-08-12 13:28:37 +03:00
Onder Kalaci	e5d5bdff51	Enable distributed deadlock detection on the maintenance daemon With this commit, the maintenance daemon starts to check for distributed deadlocks. We also introduced a GUC variable (distributed_deadlock_detection_factor) whose value is multiplied with Postgres' deadlock_timeout. Setting it to -1 disables the distributed deadlock detection.	2017-08-12 13:28:37 +03:00
Onder Kalaci	a333c9f16c	Add infrastructure for distributed deadlock detection This commit adds all the necessary pieces to do the distributed deadlock detection. Each distributed transaction is already assigned with distributed transaction ids introduced with `3369f3486f`. The dependency among the distributed transactions are gathered with `80ea233ec1`. With this commit, we implement a DFS (depth first search) on the dependency graph and search for cycles. Finding a cycle reveals a distributed deadlock. Once we find the deadlock, we examine the path that the cycle exists and cancel the youngest distributed transaction. Note that, we're not yet enabling the deadlock detection by default with this commit.	2017-08-12 13:28:37 +03:00
velioglu	b0efffae1c	Correct planner and add more tests	2017-08-11 10:16:13 +03:00
velioglu	ceba81ce35	Move physical planner checks to logical planner	2017-08-11 10:09:47 +03:00
velioglu	c4e3b8b5e1	Add planner changes and tests for subquery on reference tables	2017-08-11 10:09:47 +03:00
Marco Slot	0ae265c436	Add citus_create_restore_point for distributed snapshots	2017-08-11 07:36:20 +02:00
Marco Slot	fca986f214	Add API for waiting for multiple connections	2017-08-11 00:03:06 +02:00
Brian Cloutier	9d93fb5551	Create citus.use_secondary_nodes GUC This GUC has two settings, 'always' and 'never'. When it's set to 'never' all behavior stays exactly as it was prior to this commit. When it's set to 'always' only SELECT queries are allowed to run, and only secondary nodes are used when processing those queries. Add some helper functions: - WorkerNodeIsSecondary(), checks the noderole of the worker node - WorkerNodeIsReadable(), returns whether we're currently allowed to read from this node - ActiveReadableNodeList(), some functions (namely, the ones on the SELECT path) don't require working with Primary Nodes. They should call this function instead of ActivePrimaryNodeList(), because the latter will error out in contexts where we're not allowed to write to nodes. - ActiveReadableNodeCount(), like the above, replaces ActivePrimaryNodeCount(). - EnsureModificationsCanRun(), error out if we're not currently allowed to run queries which modify data. (Either we're in read-only mode or use_secondary_nodes is set) Some parts of the code were switched over to use readable nodes instead of primary nodes: - Deadlock detection - DistributedTableSize, - the router, real-time, and task tracker executors - ShardPlacement resolution	2017-08-10 17:37:17 +03:00
Brian Cloutier	3fc87a7a29	Metadata sync also syncs nodes in other clusters	2017-08-10 16:55:55 +03:00
Eren Başak	deb89cb9ce	Delete tesh_helper_functions.h	2017-08-10 14:00:44 +03:00
Eren Başak	3061737712	Define Some Utility Functions This change declares two new functions: `master_update_table_statistics` updates the statistics of shards belonging to the given table as well as its colocated tables. `get_colocated_shard_array` returns the ids of colocated shards of a given shard.	2017-08-10 12:42:46 +03:00
Jason Petersen	6a35c2937c	Enable multi-row INSERTs This is a pretty substantial refactoring of the existing modify path within the router executor and planner. In particular, we now hunt for all VALUES range table entries in INSERT statements and group the rows contained therein by shard identifier. These rows are stashed away for later in "ModifyRoute" elements. During deparse, the appropriate RTE is extracted from the Query and its values list is replaced by these rows before any SQL is generated. In this way, we can create multiple Tasks, but only one per shard, to piecemeal execute a multi-row INSERT. The execution of jobs containing such tasks now exclusively go through the "multi-router executor" which was previously used for e.g. INSERT INTO ... SELECT. By piggybacking onto that executor, we participate in ongoing transactions, get rollback-ability, etc. In short order, the only remaining use of the "single modify" router executor will be for bare single-row INSERT statements (i.e. those not in a transaction). This change appropriately handles deferred pruning as well as master-evaluated functions.	2017-08-10 00:32:46 -07:00
Onder Kalaci	b5ea3ab6a3	Improve locking semantics for backend management We use the backend shared memory lock for preventing new backends to be part of a new distributed transaction or an existing backend to leave a distributed transaction while we're reading all backends' data. The primary goal is to provide consistent view of the current distributed transactions while doing the deadlock detection.	2017-08-09 17:17:12 +03:00
Marco Slot	3a0571e69b	Remove LockMetadataSnapshot	2017-08-09 14:09:54 +02:00
Burak Yucesoy	fddf9b3fcc	Add distributed partitioned table support distributed table creation With this PR, Citus starts to support all possible ways to create distributed partitioned tables. These are: - Distributing already created partitioning hierarchy - CREATE TABLE ... PARTITION OF a distributed_table - ALTER TABLE distributed_table ATTACH PARTITION non_distributed_table - ALTER TABLE distributed_table ATTACH PARTITION distributed_table We also support DETACHing partitions from partitioned tables and propagating TRUNCATE and DDL commands to distributed partitioned tables. This PR also refactors some parts of distributed table creation logic.	2017-08-09 10:01:35 +03:00
Metin Doslu	b8a9e7c1bf	Add support for UPDATE/DELETE with subqueries	2017-08-08 21:35:08 +03:00
Marco Slot	d3e9746236	Avoid connections that accessed non-colocated placements in multi-shard commands	2017-08-08 18:32:34 +02:00
Brian Cloutier	5914c992e6	cluster management UDFs see nodes in different clusters - master_activate_node and master_disable_node correctly toggle isActive, without crashing - master_add_node rejects duplicate nodes, even if they're in different clusters - master_remove_node allows removing nodes in different clusters	2017-08-08 13:12:06 +03:00
Brian Cloutier	3151b52a0b	Add citus.cluster_name GUC - Nodes with a nodecluster which does not match citus.cluster_name are excluded from the metadata cache and never seen by another part of Citus.	2017-08-08 13:12:06 +03:00
Brian Cloutier	fbecf48a03	Disallow adding primary nodes to non-default clusters	2017-08-08 11:18:31 +03:00
Brian Cloutier	5618e69386	Add pg_dist_node.nodecluster	2017-08-08 11:18:31 +03:00
Marco Slot	aa7ca81548	Execute UPDATE/DELETE statements with 0 shards	2017-08-07 15:36:58 +02:00
Murat Tuncer	fa18899cf9	Remove serialization/deserialization of multiplan node (#1477) introduces copy functions for Citus MultiPlan nodes. uses ExtensibleNode mechanism to store MultiPlan data drops serialization of MultiPlans	2017-08-02 08:24:00 +03:00
Burak Yucesoy	7769f1d012	Refactor distributed table creation logic This commit is preparation for introducing distributed partitioned table support. We want to clean and refactor some code in distributed table creation logic so that we can handle partitioned tables in more robust way.	2017-07-31 11:11:23 +03:00
Brian Cloutier	b20a086a8f	master_activate_node UDF also returns noderole	2017-07-28 16:02:43 +03:00
Murat Tuncer	26f020dc6e	Make maxTaskStringSize configurable (#1501) maxTaskStringSize determines the size of worker query string. It was originally hard coded to a specific value. This has caused issues for some users. Since it determines initial shared memory allocation, we did not want to set it to an arbitrary higher number. Instead made it configurable. This commit introduces a new GUC variable max_task_string_size. Changes to this variable require a restart to take effect.	2017-07-27 11:39:12 -07:00
Onder Kalaci	6132d17481	Convert global wait edges to adjacency list In this commit, we add ability to convert global wait edges into adjacency list with the following format: [transactionId] = [transactionNode->waitsFor {list of waiting transaction nodes}]	2017-07-27 19:53:51 +03:00
Murat Tuncer	8729b7d55a	Use cstore_table_size function to determine cstore table size (#1521) pg_table_size/pg_relation_size variants always return 0 for cstore tables. We should be using cstore_table_size function for cstore_tables.	2017-07-27 09:02:07 -07:00
Eren Başak	a12f1980de	Add Progress Tracking Infrastructure This change adds a general purpose infrastructure to log and monitor the progress of long-running processes. It uses `pg_stat_get_progress_info` infrastructure, introduced with PostgreSQL 9.6 and used for tracking `VACUUM` commands. This patch only handles the creation of a memory space in dynamic shared memory, putting its info in `pg_stat_get_progress_info`, fetching the progress monitors on demand and finalizing the progress tracking.	2017-07-26 14:12:15 +03:00
Marco Slot	80ea233ec1	Add function for dumping global wait edges	2017-07-25 16:52:32 +02:00
Marco Slot	81198a1d02	Add function for dumping local wait edges	2017-07-25 16:52:32 +02:00
Brian Cloutier	ec99f8f983	Add nodeRole column - master_add_node enforces that there is only one primary per group - there's also a trigger on pg_dist_node to prevent multiple primaries per group - functions in metadata cache only return primary nodes - Rename ActiveWorkerNodeList -> ActivePrimaryNodeList - Rename WorkerGetLive{Node->Group}Count() - Refactor WorkerGetRandomCandidateNode - master_remove_node only complains about active shard placements if the node being removed is a primary. - master_remove_node only deletes all reference table placements in the group if the node being removed is the primary. - Rename {Node->NodeGroup}HasShardPlacements, this reflects the behavior it already had. - Rename DeleteAllReferenceTablePlacementsFrom{Node->NodeGroup}. This also reflects the behavior it already had, but the new signature forces the caller to pass in a groupId - Rename {WorkerGetLiveGroup->ActivePrimaryNode}Count	2017-07-24 11:57:46 +03:00
Brian Cloutier	ee270b65d7	make WorkerGetNodeWithName a static function	2017-07-24 11:57:46 +03:00
Marco Slot	601b17d544	Use distributed transaction number in 2PC identifiers	2017-07-21 17:36:33 +02:00
Marco Slot	18a6e478af	Fix typo in GetCurrentDistributedTransctionId	2017-07-21 17:36:33 +02:00
Onder Kalaci	3369f3486f	Introduce distributed transaction ids This commit adds distributed transaction id infrastructure in the scope of distributed deadlock detection. In general, the distributed transaction id consists of a tuple in the form of: `(databaseId, initiatorNodeIdentifier, transactionId, timestamp)`. Briefly, we add a shared memory block on each node, which holds some information per backend (i.e., an array `BackendData backends[MaxBackends]`). Later, on each coordinated transaction, Citus sends `SELECT assign_distributed_transaction_id()` right after `BEGIN`. For that backend on the worker, the distributed transaction id is set to the values assigned via the function call. The aim of the above is to correlate the transactions on the coordinator to the transactions on the worker nodes.	2017-07-18 15:01:42 +03:00
velioglu	6ea15fbb25	Make create_distributed_table transactional	2017-07-18 12:35:40 +03:00
Brian Cloutier	7ad95b53d2	Rename pg_dist_shard_placement -> pg_dist_placement Comes with a few changes: - Change the signature of some functions to accept groupid - InsertShardPlacementRow - DeleteShardPlacementRow - UpdateShardPlacementState - NodeHasActiveShardPlacements returns true if the group the node is a part of has any active shard placements - TupleToShardPlacement now returns ShardPlacements which have NULL nodeName and nodePort. - Populate (nodeName, nodePort) when creating ShardPlacements - Disallow removing a node if it contains any shard placements - DeleteAllReferenceTablePlacementsFromNode matches based on group. This doesn't change behavior for now (while there is only one node per group), but means in the future callers should be careful about calling it on a secondary node, it'll delete placements on the primary. - Create concept of a GroupShardPlacement, which represents an actual tuple in pg_dist_placement and is distinct from a ShardPlacement, which has been resolved to a specific node. In the future ShardPlacement should be renamed to NodeShardPlacement. - Create some triggers which allow existing code to continue to insert into and update pg_dist_shard_placement as if it still existed.	2017-07-12 14:17:31 +02:00
Brian Cloutier	fe53fd4a8e	Remove functions created just for unit testing These functions are holdovers from pg_shard and were created for unit testing c-level functions (like InsertShardPlacementRow) which our regression tests already test quite effectively. Removing because it makes refactoring the signatures of those c-level functions unnecessarily difficult. - create_healthy_local_shard_placement_row - update_shard_placement_row_state - delete_shard_placement_row	2017-07-12 14:16:24 +02:00
Marco Slot	29f21fea59	Use GetPlacementListConnection for multi-shard commands	2017-07-12 11:26:22 +02:00
Marco Slot	01c9b1f921	Use GetPlacementListConnection for router SELECTs	2017-07-12 11:26:22 +02:00
Marco Slot	63676f5d65	Allow choosing a connection for multiple placements with GetPlacementListConnection	2017-07-12 11:26:22 +02:00
Burak Yucesoy	c8b9e4011b	Remove LockRelationDistributionMetadata function	2017-07-10 15:46:37 +03:00
Andres Freund	be8677f926	Add NonblockingForgetResults(). This is very similar to ForgetResults() except that no network IO is performed. Primarily useful in error handling cases.	2017-07-04 14:46:03 -07:00
Andres Freund	24153fae5d	Add ShutdownConnection() which cancels statement before closing connection. That's primarily useful in error cases, where we want to make sure locks etc held by commands running on workers are released promptly.	2017-07-04 14:46:03 -07:00
Andres Freund	c674bc8640	Add interrupt aware PQputCopy{End,Data} wrappers.	2017-07-04 12:38:52 -07:00
Marco Slot	da47a03b18	Move INSERT ... SELECT planning logic into one place	2017-06-29 15:03:14 +02:00
Onder Kalaci	5f3f1d75a3	Add some utility functions for partitioned tables This commit is intended to be a base for supporting declarative partitioning on distributed tables. Here we add the following utility functions and their unit tests: * Very basic functions including differentiating partitioned tables and partitions, listing the partitions * Generating the PARTITION BY (expr) and adding this to the DDL events of partitioned tables * Ability to generate text representations of the ranges for partitions * Ability to generate the `ALTER TABLE parent_table ATTACH PARTITION partition_table FOR VALUES value_range` * Ability to add shard ids to the above command using `worker_apply_inter_shard_ddl_command()` * Ability to generate `ALTER TABLE parent_table DETACH PARTITION`	2017-06-28 09:39:55 +03:00
Andres Freund	dc3997c3b8	Remove 9.5 related node wrappers. Now that all branches support the extensible node infrastructure, we don't need our wrappers anymore.	2017-06-26 08:46:32 -07:00
Andres Freund	b96ba9b490	Fix code only enabled for 9.5. There's still supporting wrappers used, a subsequent commit will remove those. This also removes the already unused tuplecount_t define.	2017-06-26 08:46:32 -07:00
Jason Petersen	2204da19f0	Support PostgreSQL 10 (#1379) Adds support for PostgreSQL 10 by copying in the requisite ruleutils and updating all API usages to conform with changes in PostgreSQL 10. Most changes are fairly minor but they are numerous. One particular obstacle was the change in \d behavior in PostgreSQL 10's psql; I had to add SQL implementations (views, mostly) to mimic the pre-10 output.	2017-06-26 02:35:46 -06:00
Andres Freund	c3b7c5dc33	Introduce per-database maintenance process. This will be used for deadlock detection, prepared transaction recovery amongst others, but currently is just idling around.	2017-06-23 11:53:39 -07:00
Andres Freund	3483bb99eb	Minimal infrastructure for per-backend citus initialization.	2017-06-23 11:20:10 -07:00
Marco Slot	2f8ac82660	Execute INSERT..SELECT via coordinator if it cannot be pushed down Add a second implementation of INSERT INTO distributed_table SELECT ... that is used if the query cannot be pushed down. The basic idea is to execute the SELECT query separately and pass the results into the distributed table using a CopyDestReceiver, which is also used for COPY and create_distributed_table. When planning the SELECT, we go through planner hooks again, which means the SELECT can also be a distributed query. EXPLAIN is supported, but EXPLAIN ANALYZE is not because preventing double execution was a lot more complicated in this case.	2017-06-22 15:46:30 +02:00
Marco Slot	155db4d913	Simplify router planner call path	2017-06-22 15:45:57 +02:00
velioglu	a17ab6408a	Delete ExecuteRemoteCommand function	2017-06-15 17:11:19 +03:00
Burak Yucesoy	c7bfa06cb9	Fix incorrect call to CheckInstalledVersion During version update, we indirectly called CheckInstalledVersion via CheckCitusVersions. This obviously fails because during version update it is expected to have version mismatch between installed version and binary version. Thus, we remove that CheckCitusVersions. We now only call CheckAvailableVersion.	2017-05-24 17:39:25 +03:00
Burak Yucesoy	eea8c51e1f	Only error out on distributed queries when there is version mismatch Before this commit, we were erroring out at almost all queries if there is a version mismatch. With this commit, we error out only if the requested operation touches distributed tables. Normally we would need to use distributed cache to understand whether a table is distributed or not. However, it is not safe to read our metadata tables when there is a version mismatch, thus it is not safe to create distributed cache. Therefore for this specific occasion, we directly read from pg_dist_partition table. However, reading from catalog is costly and we should not use this method in other places as much as possible.	2017-05-22 09:53:29 +03:00
Jason Petersen	c9fa11b445	Use library and symbol name for bgw entry PostgreSQL 10 takes away the ability to directly assign a function pointer; the other approach (library and symbol name) is supported by all versions.	2017-05-16 11:05:33 -06:00
Önder Kalacı	3ec502b286	Add support for parametrized execution for subquery pushdown (#1356) Distributed query planning for subquery pushdown is done on the original query. This prevents the usage of external parameters on the execution. To overcome this, we manually replace the parameters on the original query.	2017-05-10 09:38:48 +03:00
Marco Slot	8edba5f309	Honour enable_ddl_propagation in truncate trigger	2017-04-29 03:32:52 +02:00
Önder Kalacı	ad5cd326a4	Subquery pushdown - main branch (#1323) * Enabling physical planner for subquery pushdown changes This commit applies the logic that exists in INSERT .. SELECT planning to the subquery pushdown changes. The main algorithm is as follows: - pick an anchor relation (i.e., target relation) - per each target shard interval - add the target shard interval's shard range as a restriction to the relations (if all relations are joined on the partition keys) - Check whether the query is router plannable per target shard interval. - If router plannable, create a task * Add union support within the JOINS This commit adds support for UNION/UNION ALL subqueries that are in the following form: .... (Q1 UNION Q2 UNION ...) as union_query JOIN (QN) ... In other words, we currently do NOT support the queries that are in the following form where union query is not JOINed with other relations/subqueries: .... (Q1 UNION Q2 UNION ...) as union_query .... * Subquery pushdown planner uses original query With this commit, we change the input to the logical planner for subquery pushdown. Before this commit, the planner was relying on the query tree that is transformed by the postgresql planner. After this commit, the planner uses the original query. The main motivation behind this change is to simplify deparsing of subqueries. * Enable top level subquery join queries This work enables - Top level subquery joins - Joins between subqueries and relations - Joins involving more than 2 range table entries A new regression test file is added to reflect enabled test cases * Add top level union support This commit adds support for UNION/UNION ALL subqueries that are in the following form: .... (Q1 UNION Q2 UNION ...) as union_query .... In other words, Citus allows top level unions to be wrapped into aggregation queries and/or simple projection queries that only select some fields from the lower level queries.
* Disallow subqueries without a relation in the range table list for subquery pushdown This commit disallows subqueries without relation in the range table list. This commit is only applied for subquery pushdown. In other words, we do not add this limitation for single table re-partition subqueries. The reasoning behind this limitation is that if we allow pushing down such queries, the result would include (shardCount * expectedResults) where in a non distributed world the result would be (expectedResult) only. * Disallow subqueries without a relation in the range table list for INSERT .. SELECT This commit disallows subqueries without relation in the range table list. This commit is only applied for INSERT.. SELECT queries. The reasoning behind this limitation is that if we allow pushing down such queries, the result would include (shardCount * expectedResults) where in a non distributed world the result would be (expectedResult) only. * Change behaviour of subquery pushdown flag (#1315) This commit changes the behaviour of the citus.subquery_pushdown flag. Before this commit, the flag was used to enable subquery pushdown logic. But, with this commit, that behaviour is enabled by default. In other words, the flag is now useless. We prefer to keep the flag since we don't want to break the backward compatibility. Also, we may consider using that flag for other purposes in the next commits. * Require subquery_pushdown when limit is used in subquery Using limit in subqueries may cause returning incorrect results. Therefore we allow limits in subqueries only if the user explicitly sets the subquery_pushdown flag. * Evaluate expressions on the LIMIT clause (#1333) Since subquery pushdown uses the original query, the LIMIT and OFFSET clauses are not evaluated. However, the logical optimizer expects these expressions to be already evaluated by the standard planner. This commit manually evaluates the functions on the logical planner for subquery pushdown. 
* Better format subquery regression tests (#1340) * Style fix for subquery pushdown regression tests With this commit we intended a more consistent style for the regression tests we've added in the - multi_subquery_union.sql - multi_subquery_complex_queries.sql - multi_subquery_behavioral_analytics.sql * Enable the tests that are temporarily commented This commit enables some of the regression tests that were commented out until all the development is done. * Fix merge conflicts (#1347) - Update regression tests to meet the changes in the regression test output. - Replace Ifs with Asserts given that the check is already done - Update shard pruning outputs * Add view regression tests for increased subquery coverage (#1348) - joins between views and tables - joins between views - union/union all queries involving views - views with limit - explain queries with view * Improve btree operators for the subquery tests This commit adds the missing comparison for subquery composite key btree comparator.	2017-04-29 04:09:48 +03:00
Burak Yucesoy	6599677902	Fix check-vanilla tests It seems that GEQO optimizations, when it is set to on, create their own memory context and free it when it is no longer necessary. In multi_join_restriction_hook we allocate our variables in the CurrentMemoryContext, which is GEQO's memory context if it is active. To prevent deallocation of our variables when GEQO's memory context is freed, we started to allocate memory for these variables in a separate MemoryContext.	2017-04-29 01:55:18 +02:00
Andres Freund	d399f395f7	Faster shard pruning. So far citus used postgres' predicate proofing logic for shard pruning, except for INSERT and COPY which were already optimized for speed. That turns out to be too slow: * Shard pruning for SELECTs is currently O(#shards), because PruneShardList calls predicate_refuted_by() for every shard. Obviously using an O(N) type algorithm for general pruning isn't good. * predicate_refuted_by() is quite expensive in its own right. That's primarily because it's optimized for doing a single refutation proof, rather than performing the same proof over and over. * predicate_refuted_by() does not keep persistent state (see 2.) for function calls, which means that a lot of syscache lookups will be performed. That's particularly bad if the partitioning key is a composite key, because without a persistent FunctionCallInfo record_cmp() has to repeatedly look-up the type definition of the composite key. That's quite expensive. Thus replace this with custom-code that works in two phases: 1) Search restrictions for constraints that can be pruned upon 2) Use those restrictions to search for matching shards in the most efficient manner available: a) Binary search / Hash Lookup in case of hash partitioned tables b) Binary search for equal clauses in case of range or append tables without overlapping shards. c) Binary search for inequality clauses, searching for both lower and upper boundaries, again in case of range or append tables without overlapping shards. d) exhaustive search testing each ShardInterval My measurements suggest that we are considerably, often orders of magnitude, faster than the previous solution, even if we have to fall back to exhaustive pruning.	2017-04-28 14:40:41 -07:00
Andres Freund	6bd2e3ed30	Add DistTableCacheEntry->hasOverlappingShardInterval. This determines whether it's possible to perform binary search on sortedShardIntervalArray or not. If e.g. two shards have overlapping ranges, that'd be prohibitive. That'll be useful in later commit introducing faster shard pruning.	2017-04-28 14:40:38 -07:00
Andres Freund	105483ec56	Add DistTableCacheEntry->shardValueCompareFunction. That's useful when comparing values a hash-partitioned table is filtered by. The existing shardIntervalCompareFunction is about comparing hashed values, not unhashed ones. The added btree opclass function is so we can get a comparator back. This should be changed much more widely, but is not necessary so far.	2017-04-28 14:40:38 -07:00
Andres Freund	52571c00ad	Build DistTableCacheEntry->shardIntervalCompareFunction even for 0 shards. Previously we, unnecessarily, used the first shard's type information to look up the comparison function. But that information is already available, so use it. That's helpful because we sometimes want to access the comparator function even if there are no shards.	2017-04-28 14:40:38 -07:00
Metin Doslu	b6659bec22	Send explain queries with savepoints With this commit, we started to send explain queries within a savepoint. After running explain query, we rollback to savepoint. This saves us from side effects of EXPLAIN ANALYZE on DML queries.	2017-04-28 12:13:48 -07:00
Jason Petersen	93e3afc25c	Remove FastShardPruning method With the other simplifications, it doesn't make sense to keep around.	2017-04-27 13:32:36 -06:00
Jason Petersen	42ee7c05f5	Refactor FindShardInterval to use cacheEntry All callers fetch a cache entry and extract/compute arguments for the eventual FindShardInterval call, so it makes more sense to refactor into that function itself; this solves the use-after-free bug, too.	2017-04-27 13:32:36 -06:00
Marco Slot	4ed093970a	Support expressions in the partition column in INSERTs	2017-04-21 14:05:52 +02:00
velioglu	24d24db25c	Implement ALTER TABLE ADD CONSTRAINT command	2017-04-20 15:02:33 +03:00
velioglu	8cbef819be	Log message of across shard queries according to the log level	2017-04-20 12:24:46 +03:00
velioglu	2327b63291	Change native hash function with worker_hash	2017-04-19 22:16:55 +03:00
Marco Slot	dfd7d86948	Stop using a sequence to generate unique job IDs	2017-04-18 11:31:51 +02:00
Marco Slot	5e58804d44	Support query parameters in combination with function evaluation	2017-04-17 15:40:55 +02:00
Marco Slot	0bcc227a62	Create indexes after worker_append_table_to_shard during shard repair	2017-04-17 15:17:21 +02:00
Burak Yucesoy	e9095e62ec	Decouple reference table replication With this change we add an option to add a node without replicating all reference tables to that node. If a node is added with this option, we mark the node as inactive and no queries will be sent to that node. We also added two new UDFs; - master_activate_node(host, port): - marks node as active and replicates all reference tables to that node - master_add_inactive_node(host, port): - only adds node to pg_dist_node	2017-04-17 13:33:31 +03:00
Onder Kalaci	1cb6a34ba8	Remove uninstantiated qual logic, use attribute equivalences In this PR, we aim to deduce whether each of the RTE_RELATION is joined with at least one other RTE_RELATION on their partition keys. If each RTE_RELATION follows the above rule, we can conclude that all RTE_RELATIONs are joined on their partition keys. In order to do that, we invented a new equivalence class namely: AttributeEquivalenceClass. In very simple words, an AttributeEquivalenceClass is identified by a unique id and consists of a list of AttributeEquivalenceMembers. Each AttributeEquivalenceMember is designed to identify attributes uniquely within the whole query. The necessity of this arises since varno attributes are defined within a single level of a query. Instead, here we want to identify each RTE_RELATION uniquely and try to find equality among each RTE_RELATION's partition key. Whenever we find an equality clause A = B, where both A and B originate from relation attributes (i.e., not random expressions), we create an AttributeEquivalenceClass to record this knowledge. If we later find another equivalence B = C, we create another AttributeEquivalenceClass. Finally, we can apply transitivity rules and generate a new AttributeEquivalenceClass which includes A, B and C. Note that equality among the members are identified by the varattno and rteIdentity. Each equality among RTE_RELATION is saved using an AttributeEquivalenceClass where each member attribute is identified by an AttributeEquivalenceMember. In the final step, we try to generate a common attribute equivalence class that holds as many AttributeEquivalenceMembers as possible whose attributes are partition keys.	2017-04-13 11:51:26 +03:00
Burak Yucesoy	a09614553f	Add enable_version_checks GUC and address feedback	2017-04-04 19:11:13 +03:00
Jason Petersen	1c2056ec74	Self-implemented review feedback The use of a bare src/ rather than $srcdir caused configure to fail during VPATH builds. With our additional dependency upon AWK, we need to call AC_PROG_AWK, otherwise environments may not have $AWK set. Finally, citus_version.h should be in .gitignore.	2017-04-03 22:55:12 -06:00
Burak Yucesoy	087d8427e3	Error out if binary citus version does not match installed extension With this change, we start to error out if loaded citus binaries do not match the available major version or installed citus extension version. In this case we force the user to restart the server or run ALTER EXTENSION depending on the situation	2017-04-03 17:36:13 -06:00
Jason Petersen	4cdfc3a10f	Address review feedback Should just about do it.	2017-04-03 11:44:57 -06:00
Jason Petersen	cf775c4773	Improve CONCURRENTLY-related error messages Thought this looked slightly nicer than the default behavior. Changed preventTransaction to concurrent to be clearer that this code path presently affects CONCURRENTLY code only.	2017-04-03 11:19:15 -06:00
Jason Petersen	dd9365433e	Update documentation Ensure all functions have comments, etc.	2017-04-03 11:19:15 -06:00
Jason Petersen	d904e96c59	Address MX CONCURRENTLY problems Adds a non-transactional multi-command method to propagate DDLs to all MX/metadata-synced nodes.	2017-04-03 11:19:15 -06:00
Jason Petersen	dea6c44f75	Remove CONCURRENTLY checks, fix tests Still pending failure testing, which broke with my recent changes.	2017-04-03 11:19:15 -06:00
Jason Petersen	95d8d27c4f	Change IndexStmt to generate worker DDL on master Because we can't execute CREATE INDEX CONCURRENTLY during transactions, worker_apply_shard_ddl_command is insufficient.	2017-04-03 11:19:14 -06:00
Marco Slot	0f355a4a48	Batch task_tracker_status calls to reduce task-tracker query times	2017-03-31 11:54:11 +02:00
Jason Petersen	34a62abb7d	Address code review comments	2017-03-22 17:29:17 -06:00
Jason Petersen	d95b5bbad3	Rework ReplicateGrantStmt to use new flow This was the impetus for the previous commit that changed from using a DDLJob * to a List * of them.	2017-03-22 17:29:16 -06:00
Jason Petersen	a02a2a90c7	Refactor ExecuteDistDDLCommand to expect struct Will let us separate out the determination of what to execute from its actual execution.	2017-03-22 17:21:49 -06:00
velioglu	e32aff1a26	Size UDFs implemented citus_table_size, citus_relation_size and citus_total_relation_size UDFs are implemented.	2017-03-16 13:50:30 +03:00
Metin Doslu	1f838199f8	Use CustomScan API for query execution Custom Scan is a node in the planned statement which helps external providers to abstract data scan not just for foreign data wrappers but also for regular relations so you can benefit your version of caching or hardware optimizations. This sounds like only an abstraction on the data scan layer, but we can use it as an abstraction for our distributed queries. The only thing we need to do is to find distributable parts of the query, plan for them and replace them with a Citus Custom Scan. Then, whenever PostgreSQL hits this custom scan node in its Volcano style execution, it will call our callback functions which run the distributed plan and provide tuples to the upper node as if it scanned a regular relation. This means fewer code changes, fewer bugs and more supported features for us! First, in the distributed query planner phase, we create a Custom Scan which wraps the distributed plan. For real-time and task-tracker executors, we add this custom plan under the master query plan. For router executor, we directly pass the custom plan because there is not any master query. Then, we simply let the PostgreSQL executor run this plan. When it hits the custom scan node, we call the related executor parts for distributed plan, fill the tuple store in the custom scan and return results to PostgreSQL executor in Volcano style, a tuple per XXX_ExecScan() call. * Modify planner to utilize Custom Scan node. * Create different scan methods for different executors. * Use native PostgreSQL Explain for master part of queries.	2017-03-14 12:17:51 +02:00
Andres Freund	52358fe891	Initial temp table removal implementation	2017-03-14 12:09:49 +02:00
Jason Petersen	6f4886cd11	Revert "Remove unused SendCommandToWorker" This reverts commit `c8c308c109`.	2017-03-13 15:48:51 -06:00
Brian Cloutier	c8c308c109	Remove unused SendCommandToWorker	2017-03-08 16:30:23 +03:00
Brian Cloutier	95936ff481	Remove unused master_get_round_robin_candidate_nodes	2017-03-07 11:51:24 +03:00
Brian Cloutier	807beb7bc0	Remove master_get_local_first_candidate_nodes	2017-03-07 11:50:59 +03:00
Murat Tuncer	72027f2eba	Remove default clause from shard DDL when sequences are used	2017-03-01 17:32:48 +03:00
Marco Slot	db98c28354	Address review feedback in COPY refactoring	2017-02-28 17:39:45 +01:00
Marco Slot	bf3541cb24	Add CitusCopyDestReceiver infrastructure	2017-02-28 17:24:45 +01:00
Eren Basak	df9cf346ee	Enforce statement based replication on old APIs and non-hash tables This change ignores `citus.replication_model` setting and uses the statement based replication in - Tables distributed via the old `master_create_distributed_table` function - Append and range partitioned tables, even if created via `create_distributed_table` function This seems like the easiest solution to #1191, without changing the existing behavior and harming existing users with custom scripts. This change also prevents RF>1 on streaming replicated tables on `master_create_worker_shards` Prior to this change, `master_create_worker_shards` command was not checking the replication model of the target table, thus allowing RF>1 with streaming replicated tables. With this change, `master_create_worker_shards` errors out on the case.	2017-02-16 10:37:53 -08:00
Brian Cloutier	1173f3f225	Refactor CheckShardPlacements - Break CheckShardPlacements into multiple functions (The most important is MarkFailedShardPlacements), so that we can get rid of the global CoordinatedTransactionUses2PC. - Call MarkFailedShardPlacements in the router executor, so we mark shards as invalid and stop using them while inside transaction blocks.	2017-01-26 13:20:45 +02:00
Marco Slot	ba940a1de9	Use coordinator instead of schema node in terminology	2017-01-25 11:07:23 +01:00
Burak Yucesoy	484cb12cd0	Add LoadShardPlacement UDF This UDF returns a shard placement from cache given shard id and placement id. At the moment it iterates over all shard placements of given shard by ShardPlacementList and searches given placement id in that list, which is not a good solution performance-wise. However, currently, this function will be used only when there is a failed transaction. If a need arises we can optimize this function in the future.	2017-01-23 21:04:57 +03:00
Marco Slot	1585c02322	Use placement connection API for multi-shard transactions	2017-01-23 18:34:50 +01:00
Andres Freund	6939cb8c56	Hack up PREPARE/EXECUTE for nearly all distributed queries. All router, real-time, task-tracker plannable queries should now have full prepared statement support (and even use router when possible), unless they don't go through the custom plan interface (which basically just affects LANGUAGE SQL (not plpgsql) functions). This is achieved by forcing postgres' planner to always choose a custom plan, by assigning very low costs to plans with bound parameters (i.e. ones where the postgres planner replanned the query upon EXECUTE with all parameter values provided), instead of the generic one. This requires some trickery, because for custom plans to work the costs for a non-custom plan have to be known, which means we can't error out when planning the generic plan. Instead we have to return a "faux" plan, that'd trigger an error message if executed. But due to the custom plan logic that plan will likely (unless called by an SQL function, or because we can't support that query for some reason) not be executed; instead the custom plan will be chosen.	2017-01-23 09:23:50 -08:00
Andres Freund	c244b8ef4a	Make router planner error handling more flexible. So far router planner had encapsulated different functionality in MultiRouterPlanCreate. Modifications always go through router, selects sometimes. Modifications always error out if the query is unsupported, selects return NULL. Especially the error handling is a problem for the upcoming extension of prepared statement support. Split MultiRouterPlanCreate into CreateRouterPlan and CreateModifyPlan, and change them to not throw errors. Instead errors are now reported by setting the new MultiPlan->planningError. Callers of router planner functionality now have to throw errors themselves if desired, but also can skip doing so. This is a pre-requisite for expanding prepared statement support. While touching all those lines, improve a number of error messages by getting them closer to the postgres error message guidelines.	2017-01-23 09:23:50 -08:00
Andres Freund	557ccc6fda	Support for deferred error messages. It can be useful, e.g. in the upcoming prepared statement support, to be able to return an error from a function that is not raised immediately, but can later be thrown. That allows e.g. to attempt to plan a statement using different methods and to create good error messages in each planner, but to only error out after all planners have been run. To enable that create support for deferred error messages that can be created (supporting errorcode, message, detail, hint) in one function, and then thrown in a different place.	2017-01-23 09:23:50 -08:00
Jason Petersen	56197dbdba	Add replication_model GUC This adds a replication_model GUC which is used as the replication model for any new distributed table that is not a reference table. With this change, tables with replication factor 1 are no longer implicitly MX tables. The GUC is similarly respected during empty shard creation for e.g. existing append-partitioned tables. If the model is set to streaming while replication factor is greater than one, table and shard creation routines will error until this invalid combination is corrected. Changing this parameter requires superuser permissions.	2017-01-23 09:05:14 -07:00
Brian Cloutier	fe5465aa4e	Port master_append_table_to_shard to new connection API (#1149) If any placements fail it doesn't update shard statistics on those placements. A minor enabling refactor: Make CoordinatedTransactionUses2PC public (it used to be CoordinatedTransactionUse2PC but that symbol already existed, so renamed it as well)	2017-01-23 15:57:44 +02:00
Marco Slot	ea855ddf86	Add an enable_deadlock_prevention flag to allow router transactions to expand to multiple nodes	2017-01-22 17:31:24 +01:00
Andres Freund	78b085106a	Remove connection_cache.[ch].	2017-01-21 09:01:15 -08:00
Andres Freund	6ec34bed84	Remove remnants of commit_protocol.[ch].	2017-01-21 09:01:15 -08:00
Andres Freund	fd717d6da9	Consistently libpq forward declaration in remote_commands.h.	2017-01-21 09:01:14 -08:00
Murat Tuncer	d76f781ae4	Convert multi copy to use new connection api This enables proper transactional behaviour for copy and relaxes some restrictions like combining COPY with single-row modifications. It also provides the basis for relaxing restrictions further, and for optionally allowing connection caching.	2017-01-20 19:15:19 -08:00
Andres Freund	3a36d32c43	Mark some now unnecessarily exposed multi_planner.c functions static.	2017-01-20 12:31:56 -08:00
Andres Freund	0f28a11970	Remove citus.explain_multi_logical/physical_plan. They make fixing explain for prepared statement harder, and they don't really fit into EXPLAIN in the first place. Additionally they're currently not exercised in any tests.	2017-01-20 12:31:19 -08:00
Metin Doslu	2bd8f8f12e	Add a function to delete shard metadata from MX nodes	2017-01-20 14:38:01 +02:00
Metin Doslu	93e626c896	Refactor get_shard_id_for_distribution_column() and other minor changes	2017-01-20 14:38:01 +02:00
Eren Basak	e7c15ecc1f	Make `upgrade_to_reference_table` function MX-compatible	2017-01-18 16:49:50 +03:00
Eren Basak	be78769ae4	Propagate new reference table placement metadata on `master_add_node`	2017-01-18 15:59:06 +03:00
Eren Basak	b686d9a025	Add Sequence Support for MX Tables This change adds support for serial columns to be used with MX tables. Prior to this change, sequences of serial columns were created in all workers (for being able to create shards) but never used. With MX, we need to set the sequences so that sequences in each worker create unique values. This is done by setting the MINVALUE, MAXVALUE and START values of the sequence.	2017-01-18 09:43:38 +03:00
Andres Freund	bdef35ac14	Query placementId in RemoteFinalizedShardPlacementList(). Not having the id in the ShardPlacement struct causes issues while making copy use the placement aware connection management.	2017-01-17 13:27:26 -08:00
Brian Cloutier	b1b2b4fadf	Create ExecuteOptionalRemoteCommand A small refactor which pulls some code out of `RecoverWorkerTransactions` and into `remote_commands.c`. This code block currently only occurs in `RecoverWorkerTransactions` but will be useful to other functions shortly. Unfortunately we couldn't call it `ExecuteRemoteCommand`, that name was already taken.	2017-01-17 17:04:37 +02:00
Andres Freund	6972186652	Add ShardPlacement fields required for colocated placement connection mapping.	2017-01-16 13:42:54 -08:00
Burak Yucesoy	3315ae6142	Remove placement metadata of reference tables after master_remove_node With this change, we start to delete placement of reference tables at given worker node after master_remove_node UDF call. We remove placement metadata at master node but we do not drop actual shard from the worker node. There are two reasons for that decision, first, it is not critical to DROP the shards in the workers because Citus will ignore them as long as node is removed from cluster and if we add that node back to cluster we will DROP and recreate all reference tables. Second, if node is unreachable, it becomes complicated to cover failure cases and have a transaction support.	2017-01-16 11:24:56 +03:00
Murat Tuncer	77f8db6b14	Add view support Enables the use of views within distributed queries. Users can create and use a view on distributed tables/queries as they would with regular queries. After this change router queries will have full support for views, and insert into select queries will support reading from views, but not writing into them. Outer joins have limited support, and error out in certain cases such as when a view is in the inner side of the outer join. Although PostgreSQL supports writing into views under certain circumstances, we disallowed that for distributed views.	2017-01-13 09:39:42 +03:00
Onder Kalaci	aed5f817fa	Refactor CheckShardPlacements() and improve support for node removal This commit refactors CheckShardPlacements() so that it only considers modifyingConnection. Also, it skips nodes which are removed from the cluster.	2017-01-12 20:10:10 +02:00
Andres Freund	b813b39241	Cache ShardPlacements in metadata cache. So far we've reloaded them frequently. Besides avoiding that cost - noticeable for some workloads with large shard counts - it makes it easier to add information to ShardPlacements that help us make placement_connection.c colocation aware.	2017-01-10 18:14:18 -08:00
Andres Freund	7320c17f00	Convert router executor to placement connection management infrastructure. Remove the router specific transaction and shard management, and replace it with the new placement connection API. This mostly leaves behaviour alone, except that it is now, inside a transaction, legal to select from a shard to which no pre-existing connection exists. To simplify code the code handling task executions for select and modify has been split into two - the previous coding was starting to get confusing due to the amount of only conditionally applicable code. Modification connections & transactions are now always established in parallel, not just for reference tables.	2017-01-09 13:13:02 -08:00
Andres Freund	bfa742d794	Centralized shard/placement connection and state management. Currently there are several places in citus that map placements to connections and that manage placement health. Centralize this knowledge. Because of the centralized knowledge about which connection has previously been used for which shard/placement, this also provides the basis for relaxing restrictions around combining various forms of DDL/DML. Connections for a placement can now be acquired using GetPlacementConnection(). If the connection is used for DML or DDL the FOR_DDL/DML flags should be used respectively. If an individual remote transaction fails (but the transaction on the master succeeds) and FOR_DDL/DML have been specified, the placement is marked as invalid, unless that'd mark all placements for a shard as invalid.	2017-01-09 13:13:02 -08:00
Andres Freund	d256f3fca9	Remove unused LogPreparedTransactions() function. This is unused since `92c7567008`.	2017-01-06 09:15:01 -08:00
Burak Yucesoy	9c9f479e4b	Replicate reference tables when new node is added With this change, we start to replicate all reference tables to the new node when new node is added to the cluster with master_add_node command. We also update replication factor of reference table's colocation group.	2017-01-05 14:30:41 +03:00
Onder Kalaci	6d050fd677	Use 2PC for reference table modification With this commit, we ensure that router executor always uses 2PC for reference table modifications and never mark the placements of it as INVALID.	2017-01-04 12:46:35 +02:00
Burak Yucesoy	31cd2357fe	Add upgrade_to_reference_table With this change we introduce a new UDF, upgrade_to_reference_table, which can be used to upgrade existing broadcast tables to reference tables. For upgrading, we require that the given table contains only one shard.	2017-01-02 17:54:42 +02:00
Eren Basak	7e09bd6836	Error on Unsupported Features on Workers This change makes the metadata workers error out on unsupported commands.	2017-01-02 16:03:45 +03:00
Marco Slot	59bc5972fa	Use MultiConnection in multi-shard transactions	2016-12-30 14:43:21 -07:00
Metin Doslu	1ddc70ca55	Add binary search capability to ShardIndex() Renamed FindShardIntervalIndex() to ShardIndex() and added binary search capability. It used to assume that hash partition tables are always uniformly distributed which is not true if upcoming tenant isolation feature is applied. This commit also reduces code duplication.	2016-12-30 18:55:34 +02:00
Marco Slot	6cbc1945f9	Enable transaction recovery in connection API	2016-12-23 16:14:29 +01:00
Marco Slot	92c7567008	Convert worker_transactions to new connection API	2016-12-23 16:14:29 +01:00
Marco Slot	00d55ad957	Add a wrapper for PQsendQuery	2016-12-23 16:14:29 +01:00
Marco Slot	87c62d598e	Connectionapify SendCommandListToWorkerInSingleTransaction	2016-12-23 16:14:29 +01:00
Eren Basak	31af40cc26	Handle MX tables on workers during drop table commands	2016-12-23 15:43:32 +03:00
Eren Basak	bed2e353db	Propagate `mark_tables_colocated` changes in `pg_dist_partition` table to metadata workers.	2016-12-23 15:43:32 +03:00
Eren Basak	71d73ec5ff	Propagate DDL commands to metadata workers for MX tables	2016-12-23 15:43:32 +03:00
Eren Basak	048fddf4da	Propagate MX table and shard metadata on `create_distributed_table` call	2016-12-23 15:43:32 +03:00
Marco Slot	11031bcf55	Enable evaluation of stable functions in INSERT..SELECT	2016-12-23 12:47:21 +01:00
Marco Slot	d745d7bf70	Add explicit RelationShards mapping to tasks	2016-12-23 10:23:43 +01:00
Onder Kalaci	9f0bd4cb36	Reference Table Support - Phase 1 With this commit, we implemented some basic features of reference tables. To start with, a reference table is * a distributed table without a distribution column defined on it * the distributed table is single sharded * and the shard is replicated to all nodes Reference tables follow the same code path as single sharded tables. Thus, broadcast JOINs are applicable to reference tables. But, since the table is replicated to all nodes, table fetching is not required any more. Reference tables support the uniqueness constraints for any column. Reference tables can be used in INSERT INTO .. SELECT queries with the following rules: * If a reference table is in the SELECT part of the query, it is safe to join with another reference table and/or hash partitioned tables. * If a reference table is in the INSERT part of the query, all other participating tables should be reference tables. Reference tables follow the regular co-location structure. Since all reference tables are single sharded and replicated to all nodes, they are always co-located with each other. Queries involving only reference tables always follow router planner and executor. Reference tables can have composite typed columns and there is no need to create/define the necessary support functions. All modification queries, master_* UDFs, EXPLAIN, DDLs, TRUNCATE, sequences, transactions, COPY, schema support work on reference tables as expected. Plus, all the pre-requisites associated with distribution columns are dismissed.	2016-12-20 14:09:35 +02:00
Eren Basak	296e0bd33a	Add citus.node_connection_timeout GUC	2016-12-20 14:11:37 +03:00
Murat Tuncer	c3a60bff70	Make router planner active at all times We used to disable router planner and executor when task executor is set to task-tracker. This change enables router planning and execution at all times regardless of task execution mode. We are introducing a hidden flag enable_router_execution to enable/disable router execution. Its default value is true. User may disable router planning by setting it to false.	2016-12-20 11:24:01 +03:00
Jason Petersen	6f95875191	Add targeted VACUUM/ANALYZE support Adds support for VACUUM and ANALYZE commands which target a specific distributed table. After grabbing the appropriate locks, this implementation sends VACUUM commands to each placement (using one connection per placement). These commands are sent in parallel, so users with large tables will benefit from sharding. Except for VERBOSE, all VACUUM and ANALYZE options are supported, including the explicit column list used by ANALYZE. As with many of our utility commands, the local command also runs. In the VACUUM/ANALYZE case, the local command is executed before any remote propagation. Because error handling is managed after local processing, this can result in a VACUUM completing locally but erroring out when distributed processing commences: a minor technicality in all cases, as there isn't really much reason to ever roll back a VACUUM (an impossibility in any case, as VACUUM cannot run within a transaction). Remote propagation of targeted VACUUM/ANALYZE is controlled by the enable_ddl_propagation setting; warnings are emitted if such a command is attempted when DDL propagation is disabled. Unqualified VACUUM or ANALYZE is not handled, but a warning message informs the user of this. Implementation note: this commit adds a "BARE" value to MultiShardCommitProtocol. When active, no BEGIN command is ever sent to remote nodes, useful for commands such as VACUUM/ANALYZE which must not run in a transaction block. This value is not user-facing and is reset at transaction end.	2016-12-16 16:59:06 -07:00
Metin Doslu	20b8f1feeb	Refactor distribution column type check for colocation	2016-12-16 15:24:45 +02:00
Metin Doslu	e2d0bd38f2	Don't allow tables with different replication models to be colocated	2016-12-16 15:23:49 +02:00
Metin Doslu	86cca54857	Add colocate_with option to create_distributed_table() With this commit, we support three versions of colocate_with: (i) default, (ii) none, and (iii) a specific table name.	2016-12-16 14:53:35 +02:00
Metin Doslu	edbedbd744	Move colocation related functions to colocation_utils.c	2016-12-16 14:52:40 +02:00
Eren Basak	b94647c3bc	Propagate CREATE SCHEMA commands with the correct AUTHORIZATION clause in start_metadata_sync_to_node	2016-12-14 10:53:12 +03:00
Eren Basak	fb08093b00	Make start_metadata_sync_to_node UDF to propagate foreign-key constraints	2016-12-14 10:53:12 +03:00
Eren Basak	5e96e4f60e	Make truncate triggers propagated on start_metadata_sync_to_node call	2016-12-14 10:53:10 +03:00
Eren Basak	9eff968d1f	Add start_metadata_sync_to_node UDF This change adds the `start_metadata_sync_to_node` UDF, which copies the metadata about nodes and MX tables from the master to the specified worker, sets its local group ID, and marks its hasmetadata as true to allow it to receive future DDL changes.	2016-12-13 10:48:03 +03:00
Andres Freund	80b34a5d6b	Integrate router executor into transaction management framework. One less place managing remote transactions. It also makes it fairly easy to use 2PC for certain modifications (e.g. reference tables). Just issue a CoordinatedTransactionUse2PC(). If every placement failure should cause the whole transaction to abort, additionally mark the relevant transactions as critical.	2016-12-12 15:18:12 -08:00
Andres Freund	fa5e202403	Convert multi_shard_transaction.[ch] to new framework.	2016-12-12 15:18:12 -08:00
Andres Freund	fc298ec095	Coordinated remote transaction management.	2016-12-12 15:18:12 -08:00
Andres Freund	6eeb43af15	Add PQgetResult() wrapper handling interrupts. This makes it possible to implement cancelling queries blocked on communication with remote nodes.	2016-12-12 15:18:12 -08:00
Andres Freund	2374905c89	Move multi_client_executor.[ch] on top of connection_management.[ch]. That way connections can be automatically closed after errors and such, and the connection management infrastructure gets wider testing. It also fixes a few issues around connection string building.	2016-12-07 11:44:24 -08:00
Andres Freund	a77cf36778	Use connection_management.c from within connection_cache.c. This is a temporary step towards removing connection_cache.c.	2016-12-07 11:44:24 -08:00

764 Commits (test/adaptive_executor_repartition)