citus

Commit Graph

Author	SHA1	Message	Date
Philip Dubé	863bf49507	Implement pulling up rows to coordinator when aggregates cannot be pushed down. Enabled by default	2020-01-07 01:16:04 +00:00
Philip Dubé	261a9de42d	Fix typos: VAR_SET_VALUE_KIND -> VAR_SET_VALUE kind beginnig -> beginning plannig -> planning the the -> the er then -> er than	2019-11-25 23:24:13 +00:00
Philip Dubé	c563e0825c	Strip trailing whitespace and add final newline (#3186 ) This brings files in line with our editorconfig file	2019-11-21 14:25:37 +01:00
Jelte Fennema	01da11f264	Change citus truncate trigger to AFTER and add more upgrade tests (#3070 ) * Add more upgrade tests * Fix citus trigger generation after upgrade citus_truncate_trigger runs before truncate when created by create_distributed_table: `492d1b2cba/src/backend/distributed/commands/create_distributed_table.c (L1163)` * Remove pg_dist_jobid_seq	2019-10-07 16:43:04 +02:00
Nils Dijk	2879689441	Distribute Types to worker nodes (#2893 ) DESCRIPTION: Distribute Types to worker nodes When to propagate ============== There are two logical moments that types could be distributed to the worker nodes - When they get used ( just in time distribution ) - When they get created ( proactive distribution ) The just in time distribution follows the model used by how schema's get created right before we are going to create a table in that schema, for types this would be when the table uses a type as its column. The proactive distribution is suitable for situations where it is benificial to have the type on the worker nodes directly. They can later on be used in queries where an intermediate result gets created with a cast to this type. Just in time creation is always the last resort, you cannot create a distributed table before the type gets created. A good example use case is; you have an existing postgres server that needs to scale out. By adding the citus extension, add some nodes to the cluster, and distribute the table. The type got created before citus existed. There was no moment where citus could have propagated the creation of a type. Proactive is almost always a good option. Types are not resource intensive objects, there is no performance overhead of having 100's of types. If you want to use them in a query to represent an intermediate result (which happens in our test suite) they just work. There is however a moment when proactive type distribution is not beneficial; in transactions where the type is used in a distributed table. Lets assume the following transaction: ```sql BEGIN; CREATE TYPE tt1 AS (a int, b int); CREATE TABLE t1 AS (a int PRIMARY KEY, b tt1); SELECT create_distributed_table('t1', 'a'); \copy t1 FROM bigdata.csv ``` Types are node scoped objects; meaning the type exists once per worker. Shards however have best performance when they are created over their own connection. For the type to be visible on all connections it needs to be created and committed before we try to create the shards. Here the just in time situation is most beneficial and follows how we create schema's on the workers. Outside of a transaction block we will just use 1 connection to propagate the creation. How propagation works ================= Just in time ----------- Just in time propagation hooks into the infrastructure introduced in #2882. It adds types as a supported object in `SupportedDependencyByCitus`. This will make sure that any object being distributed by citus that depends on types will now cascade into types. When types are depending them self on other objects they will get created first. Creation later works by getting the ddl commands to create the object by its `ObjectAddress` in `GetDependencyCreateDDLCommands` which will dispatch types to `CreateTypeDDLCommandsIdempotent`. For the correct walking of the graph we follow array types, when later asked for the ddl commands for array types we return `NIL` (empty list) which makes that the object will not be recorded as distributed, (its an internal type, dependant on the user type). Proactive distribution --------------------- When the user creates a type (composite or enum) we will have a hook running in `multi_ProcessUtility` after the command has been applied locally. Running after running locally makes that we already have an `ObjectAddress` for the type. This is required to mark the type as being distributed. Keeping the type up to date ==================== For types that are recorded in `pg_dist_object` (eg. `IsObjectDistributed` returns true for the `ObjectAddress`) we will intercept the utility commands that alter the type. - `AlterTableStmt` with `relkind` set to `OBJECT_TYPE` encapsulate changes to the fields of a composite type. - `DropStmt` with removeType set to `OBJECT_TYPE` encapsulate `DROP TYPE`. - `AlterEnumStmt` encapsulates changes to enum values. Enum types can not be changed transactionally. When the execution on a worker fails a warning will be shown to the user the propagation was incomplete due to worker communication failure. An idempotent command is shown for the user to re-execute when the worker communication is fixed. Keeping types up to date is done via the executor. Before the statement is executed locally we create a plan on how to apply it on the workers. This plan is executed after we have applied the statement locally. All changes to types need to be done in the same transaction for types that have already been distributed and will fail with an error if parallel queries have already been executed in the same transaction. Much like foreign keys to reference tables.	2019-09-13 17:46:07 +02:00
Philip Dubé	0e233c63a3	multi_colocation_utils: sort by nodeport, not placementid multi_copy: replace smgr with aclitem, smgr is removed in pg12	2019-07-25 14:33:43 +00:00
Hadi Moshayedi	46608e42f9	Add hyperscale tutorial to the regression tests.	2019-07-10 10:47:55 +02:00
Hadi Moshayedi	4bbae02778	Make COPY compatible with unified executor.	2019-06-20 19:53:40 +02:00
Marco Slot	b3fcf2a48f	Deprecate master_modify_multiple_shards	2019-05-24 15:22:06 +02:00
Onder Kalaci	58e90ad60d	Add order by multi_outer_join	2019-04-09 12:53:57 +03:00
Onder Kalaci	92e87738dd	Make sure that the regression test output is durable to different execution orders Mostly add order bys and suppress worker node ports in the test outputs.	2019-04-08 11:48:08 +03:00
Marco Slot	1656b519c4	Plan outer joins through pushdown planning	2019-01-05 20:55:27 +01:00
Nils Dijk	6cf4516fdb	fix \d change for indexes in pg11	2018-08-15 23:27:31 -06:00
Nils Dijk	2a9d47e1a6	fix pg11 tests	2018-08-15 23:27:31 -06:00
mehmet furkan şahin	6d0fbbace7	ALTER TABLE %s ADD COLUMN constraint check is added	2018-07-24 15:53:05 +03:00
Murat Tuncer	f20258ef10	Expand count distinct support We can now support more complex count distinct operations by pulling necessary columns to coordinator and evalutating the aggreage at coordinator. It supports broad range of expression with the restriction that the expression must contain a column.	2018-07-06 09:44:20 +03:00
Onder Kalaci	ed47e4e6b9	Remove placementId from the ORDER BY to make results consistent	2018-05-11 17:04:50 +03:00
mehmet furkan şahin	785a86ed0a	Tests are updated to use create_distributed_table	2018-05-10 11:18:59 +03:00
mehmet furkan şahin	ef90122cd3	shard count for some of the tests are increased	2018-05-03 10:44:43 +03:00
velioglu	121ff39b26	Removes large_table_shard_count GUC	2018-04-29 10:34:50 +02:00
Brian Cloutier	0104790385	Fix hard-coded temp directory in multi_copy /tmp does not exist on Windows, use :temp_dir instead	2018-04-17 15:01:22 -07:00
velioglu	82b2d21b0c	Convert broadcast join to reference join After this commit large_table_shard_count wont be used to check whether broadcast join, which is renamed as reference join, can be applied. Reference join can only be applied over reference tables.	2018-04-13 12:58:14 +03:00
velioglu	72dfe4a289	Adds colocation check to local join	2018-04-04 22:49:27 +03:00
velioglu	698d585fb5	Remove broadcast join logic After this change all the logic related to shard data fetch logic will be removed. Planner won't plan any ShardFetchTask anymore. Shard fetch related steps in real time executor and task-tracker executor have been removed.	2018-03-30 11:45:19 +03:00
Brian Cloutier	9aff4384a1	Make tests platform independent - Force all platforms to use the same collation - Force all platforms to use the same locale - Use /dev/null or NUL, depending on platform - Use /tmp or %TEMP%, dpeending on platform	2018-03-27 14:18:48 -07:00
Onder Kalaci	4d4648aabd	Change single shard mx test tables to reference tables	2018-02-26 13:28:24 +02:00
Murat Tuncer	901b543e20	Fix count distinct using field select on top level query We were allowing count distict queries even if they were not directly on columns if the query is grouped on distribution column. When performing these checks we were skipping subqueries because they also perform this check in a more concise manner. We relied on oid SUBQUERY_RELATION_ID (10000) to decide if a given RTE relation id denotes a subquery, however, we also use SUBQUERY_PUSHDOWN_RELATION_ID (10001) for some subqueries. We skip both type of subqueries with this change.	2018-02-06 13:16:10 +03:00
Marco Slot	6f7c3bd73b	Skip JSON validation on coordinator during COPY	2018-02-02 15:33:27 +01:00
Dimitri Fontaine	c9760fbb64	Fix CREATE INDEX with storage options on distributed tables. By sharing the implementation of the function AppendOptionListToString on three call sites, we would expand an extra OPTIONS keyword in a create index statement, and omit other bits of the specific syntax here. This patch introduces an AppendStorageParametersToString() function that is very similar to AppendOptionListToString() but handles WITH(a="foo",...) syntax that is used in reloptions (aka Storage Parameters). Fixes #1747.	2018-01-17 21:56:40 +01:00
Dimitri Fontaine	952da72c55	Implement ALTER TABLE\|INDEX ... SET\|RESET (). PostgreSQL implements support for several relation kinds in a single statement, such as in the AlterTableStmt case, which supports both tables and indexes and more (see ATExecSetRelOptions in PostgreSQL source code file src/backend/commands/tablecmds.c for an example of that). As a consequence, this patch implements support for setting and resetting storage parameters on both relation kinds.	2018-01-17 21:56:40 +01:00
Dimitri Fontaine	17266e3301	Implement ALTER INDEX ... RENAME TO ... The command is now distributed among the shards when the table is distributed. To that effect, we fill in the DDLJob's targetRelationId with the OID of the table for which the index is defined, rather than the OID of the index itself.	2018-01-17 21:56:40 +01:00
Dimitri Fontaine	e010238280	Implement ALTER TABLE ... RENAME TO ... The implementation was already mostly in place, but the code was protected by a principled check against the operation. Turns out there's a nasty concurrency bug though with long identifier names, much as in #1664. To prevent deadlocks from happening, we could either review the DDL transaction management in shards and placements, or we can simply reject names with (NAMEDATALEN - 1) chars or more — that's because of the PostgreSQL array types being created with a one-char prefix: '_'.	2018-01-11 13:21:24 +01:00
Marco Slot	a64f0060ba	Reduce the frequency of FinishConnectionIO calls during COPY (#1864 )	2017-12-14 13:21:59 -05:00
Marco Slot	a9933deac6	Make real time executor work in transactions	2017-11-30 09:59:32 +03:00
Marco Slot	f4ceea5a3d	Enable 2PC by default	2017-11-22 11:26:58 +01:00
mehmet furkan şahin	34709c2a16	Regression tests parallelization PART-1	2017-11-20 18:03:37 +03:00
mehmet furkan şahin	314fc09d90	regression test shard_count is changed from 32 to 4	2017-11-20 12:47:49 +03:00
Marco Slot	89eb833375	Use citus.next_shard_id where practical in regression tests	2017-11-15 10:12:05 +01:00
Murat Tuncer	2332678793	Fix intermittent test failures on count distinct tests (#1761 ) Added analyze to test which seem to force planner to use index.	2017-11-06 10:58:46 +02:00
Murat Tuncer	e16805215d	Support count(distinct) for non-partition columns (#1692 ) Expands count distinct coverage by allowing more cases. We used to support count distinct only if we can push down distinct aggregate to worker query i.e. the count distinct clause was on the partition column of the table, or there was a grouping on the partition column. Now we can support - non-partition columns, with or without grouping on partition column - partition, and non partition column in the same query - having clause - single table subqueries - insert into select queries - join queries where count distinct is on partition, or non-partition column - filters on count distinct clauses (extends existing support) We first try to push down aggregate to worker query (original case), if we can't then we modify worker query to return distinct columns to coordinator node. We do that by adding distinct column targets to group by clauses. Then we perform count distinct operation on the coordinator node. This work should reduce the cases where HLL is used as it can address anything that HLL can. However, if we start having performance issues due to very large number rows, then we can recommend hll use.	2017-10-30 13:12:24 +02:00
mehmet furkan şahin	61ae33dc7f	ALTER TABLE .. REPLICA IDENTITY support is implemented	2017-10-26 13:44:28 +03:00
velioglu	0b5db5d826	Support multi shard update/delete queries	2017-10-25 15:52:38 +03:00
Onder Kalaci	498ac80d8b	Add window function support for SUBQUERY PUSHDOWN and INSERT INTO SELECT This commit provides the support for window functions in subquery and insert into select queries. Note that our support for window functions is still limited because it must have a partition by clause on the distribution key. This commit makes changes in the files insert_select_planner and multi_logical_planner. The required tests are also added with files multi_subquery_window_functions.out and multi_insert_select_window.out.	2017-10-04 15:33:07 +03:00
Marco Slot	24915779d1	Remove separate citus.binary_worker_copy_format regression tests	2017-10-03 17:44:50 +02:00
Onder Kalaci	a5b66912d4	Expand reference table support in subquery pushdown With this commit, we relax the restrictions put on the reference tables with subquery pushdown. We did three notable improvements: 1) Relax equi-join restrictions Previously, we always expected that the non-reference tables are equi joined with reference tables on the partition key of the non-reference table. With this commit, we allow any column of non-reference tables joined using non-equi joins as well. 2) Relax OUTER JOIN restrictions Previously Citus errored out if any reference table exists at any point of the outer part of an outer join. For instance, See the below sketch where (h) denotes a hash distributed relation, (r) denotes a reference table, (L) denotes LEFT JOIN and (I) denotes INNER JOIN. (L) / \ (I) h / \ r h Before this commit Citus would error out since a reference table appears on the left most part of an left join. However, that was too restrictive so that we only error out if the reference table is directly below and in the outer part of an outer join. 3) Bug fixes We've done some minor bugfixes in the existing implementation.	2017-09-14 20:59:22 +03:00
Marco Slot	cf375d6a66	Consider dropped columns that precede the partition column in COPY	2017-08-22 13:02:35 +02:00
Hadi Moshayedi	e5fbcf37dd	Add Savepoint Support (#1539 ) This change adds support for SAVEPOINT, ROLLBACK TO SAVEPOINT, and RELEASE SAVEPOINT. When transaction connections are not established yet, savepoints are kept in a stack and sent to the worker when the connection is later established. After establishing connections, savepoint commands are sent as they arrive. This change fixes #1493 .	2017-08-15 13:02:28 -04:00
Marco Slot	3ff46245b3	Make sure we don't use 2PC in copy from worker	2017-08-15 13:44:20 +02:00
velioglu	b0efffae1c	Correct planner and add more tests	2017-08-11 10:16:13 +03:00
Brian Cloutier	74ce4faab5	Make multi_cluster_management test more stable	2017-08-08 11:18:31 +03:00
Hadi Moshayedi	8229a64fe8	Remove distributed tables' dependency on distribution key columns. (#1527 ) This change removes distributed tables' dependency on distribution key columns. We already check that we cannot drop distribution key columns in ErrorIfUnsupportedAlterTableStmt() at multi_utility.c, so we don't need to have distributed table to distribution key column dependency to avoid dropping of distribution key column. Furthermore, having this dependency causes some warnings in pg_dump --schema-only (See #866), which are not desirable. This change also adds check to disallow drop of distribution keys when citus.enable_ddl_propagation is set to false. Regression tests are updated accordingly.	2017-08-03 10:07:04 -04:00
Jason Petersen	2204da19f0	Support PostgreSQL 10 (#1379 ) Adds support for PostgreSQL 10 by copying in the requisite ruleutils and updating all API usages to conform with changes in PostgreSQL 10. Most changes are fairly minor but they are numerous. One particular obstacle was the change in \d behavior in PostgreSQL 10's psql; I had to add SQL implementations (views, mostly) to mimic the pre-10 output.	2017-06-26 02:35:46 -06:00
velioglu	a1ea29ec2b	Use placement connection to drop shards instead of node connection	2017-06-14 14:14:59 +03:00
Onder Kalaci	df494c0403	Improve subquery pushdown regression tests - Use native postgres function for composite key btree functions - Move explain tests to multi_explain.sql (get rid of .out _0.out files) - Get rid of input/output files for multi_subquery.sql by moving table creations - Update some comments	2017-05-30 14:05:15 +03:00
Jason Petersen	97f8302c9c	Change version-sensitive tests to handle '10' Previously assumed period in version; this makes tests future-proof.	2017-05-16 11:05:34 -06:00
Jason Petersen	d6cccee5bc	Remove ALTER SEQUENCE from parallel groups Removing these has no side effect, and in the (current) PostgreSQL 10, an ERROR is printed during concurrent sequence modification.	2017-05-16 11:05:34 -06:00
Jason Petersen	db11324ac7	Add unambiguous ORDER BY clauses to many tests Queries which do not specify an order may arbitrarily change output across PostgreSQL versions.	2017-05-16 11:05:34 -06:00
Jason Petersen	b9bc3fdada	Update header comments to match new test names Just keeping these in sync with the actual file name.	2017-05-16 11:05:33 -06:00
Jason Petersen	9f4a33eee1	Rename very long test files In addition to not actually providing much information, these names can cause problems in PostgreSQL 10.	2017-05-16 11:05:33 -06:00
Önder Kalacı	ad5cd326a4	Subquery pushdown - main branch (#1323 ) * Enabling physical planner for subquery pushdown changes This commit applies the logic that exists in INSERT .. SELECT planning to the subquery pushdown changes. The main algorithm is followed as : - pick an anchor relation (i.e., target relation) - per each target shard interval - add the target shard interval's shard range as a restriction to the relations (if all relations joined on the partition keys) - Check whether the query is router plannable per target shard interval. - If router plannable, create a task * Add union support within the JOINS This commit adds support for UNION/UNION ALL subqueries that are in the following form: .... (Q1 UNION Q2 UNION ...) as union_query JOIN (QN) ... In other words, we currently do NOT support the queries that are in the following form where union query is not JOINed with other relations/subqueries : .... (Q1 UNION Q2 UNION ...) as union_query .... * Subquery pushdown planner uses original query With this commit, we change the input to the logical planner for subquery pushdown. Before this commit, the planner was relying on the query tree that is transformed by the postgresql planner. After this commit, the planner uses the original query. The main motivation behind this change is the simplify deparsing of subqueries. * Enable top level subquery join queries This work enables - Top level subquery joins - Joins between subqueries and relations - Joins involving more than 2 range table entries A new regression test file is added to reflect enabled test cases * Add top level union support This commit adds support for UNION/UNION ALL subqueries that are in the following form: .... (Q1 UNION Q2 UNION ...) as union_query .... In other words, Citus supports allow top level unions being wrapped into aggregations queries and/or simple projection queries that only selects some fields from the lower level queries. * Disallow subqueries without a relation in the range table list for subquery pushdown This commit disallows subqueries without relation in the range table list. This commit is only applied for subquery pushdown. In other words, we do not add this limitation for single table re-partition subqueries. The reasoning behind this limitation is that if we allow pushing down such queries, the result would include (shardCount * expectedResults) where in a non distributed world the result would be (expectedResult) only. * Disallow subqueries without a relation in the range table list for INSERT .. SELECT This commit disallows subqueries without relation in the range table list. This commit is only applied for INSERT.. SELECT queries. The reasoning behind this limitation is that if we allow pushing down such queries, the result would include (shardCount * expectedResults) where in a non distributed world the result would be (expectedResult) only. * Change behaviour of subquery pushdown flag (#1315) This commit changes the behaviour of the citus.subquery_pushdown flag. Before this commit, the flag is used to enable subquery pushdown logic. But, with this commit, that behaviour is enabled by default. In other words, the flag is now useless. We prefer to keep the flag since we don't want to break the backward compatibility. Also, we may consider using that flag for other purposes in the next commits. * Require subquery_pushdown when limit is used in subquery Using limit in subqueries may cause returning incorrect results. Therefore we allow limits in subqueries only if user explicitly set subquery_pushdown flag. * Evaluate expressions on the LIMIT clause (#1333) Subquery pushdown uses orignal query, the LIMIT and OFFSET clauses are not evaluated. However, logical optimizer expects these expressions are already evaluated by the standard planner. This commit manually evaluates the functions on the logical planner for subquery pushdown. * Better format subquery regression tests (#1340) * Style fix for subquery pushdown regression tests With this commit we intented a more consistent style for the regression tests we've added in the - multi_subquery_union.sql - multi_subquery_complex_queries.sql - multi_subquery_behavioral_analytics.sql * Enable the tests that are temporarily commented This commit enables some of the regression tests that were commented out until all the development is done. * Fix merge conflicts (#1347) - Update regression tests to meet the changes in the regression test output. - Replace Ifs with Asserts given that the check is already done - Update shard pruning outputs * Add view regression tests for increased subquery coverage (#1348) - joins between views and tables - joins between views - union/union all queries involving views - views with limit - explain queries with view * Improve btree operators for the subquery tests This commit adds the missing comprasion for subquery composite key btree comparator.	2017-04-29 04:09:48 +03:00
Andres Freund	6bd2e3ed30	Add DistTableCacheEntry->hasOverlappingShardInterval. This determines whether it's possible to perform binary search on sortedShardIntervalArray or not. If e.g. two shards have overlapping ranges, that'd be prohibitive. That'll be useful in later commit introducing faster shard pruning.	2017-04-28 14:40:38 -07:00
Andres Freund	1f93c325fa	Some cleanup in multi_subquery test. Remove trailing whitespace and use of EXPLAIN instead of EXPLAIN (COSTS OFF).	2017-04-26 11:33:56 -07:00
Burak Yucesoy	d6cb88a73a	Stabilize test outputs	2017-04-21 16:08:52 +03:00
Eren Basak	abc84e6b2b	Add support for proper valgrind tests This change allows valgrind tests (`make check-multi-vg`) to be run seamlessly without test output errors and timeout problems.	2017-04-21 16:08:52 +03:00
Jason Petersen	5272c2c44b	Enable distributed ALTER TABLE ... RENAME COLUMN Pretty straightforward. Had some concerns about locking, but due to the fact that all distributed operations use either some level of deparsing or need to enumerate column names, they all block during any concurrent column renames (due to the AccessExclusive lock). In addition, I had some misgivings about permitting renames of the dis- tribution column, but nothing bad comes from just allowing them. Finally, I tried to trigger any sort of error using prepared statements and could not trigger any errors not also exhibited by plain PostgreSQL tables.	2017-04-18 22:47:48 -06:00
Marco Slot	f838c83809	Remove redundant pg_dist_jobid_seq restarts in tests	2017-04-18 11:42:32 +02:00
Marco Slot	40829c2ba9	Set citus.enable_unique_job_ids in tests with job ID in output	2017-04-18 11:42:32 +02:00
Burak Yucesoy	e9095e62ec	Decouple reference table replication With this change we add an option to add a node without replicating all reference tables to that node. If a node is added with this option, we mark the node as inactive and no queries will sent to that node. We also added two new UDFs; - master_activate_node(host, port): - marks node as active and replicates all reference tables to that node - master_add_inactive_node(host, port): - only adds node to pg_dist_node	2017-04-17 13:33:31 +03:00
Onder Kalaci	1cb6a34ba8	Remove uninstantiated qual logic, use attribute equivalences In this PR, we aim to deduce whether each of the RTE_RELATION is joined with at least on another RTE_RELATION on their partition keys. If each RTE_RELATION follows the above rule, we can conclude that all RTE_RELATIONs are joined on their partition keys. In order to do that, we invented a new equivalence class namely: AttributeEquivalenceClass. In very simple words, a AttributeEquivalenceClass is identified by an unique id and consists of a list of AttributeEquivalenceMembers. Each AttributeEquivalenceMember is designed to identify attributes uniquely within the whole query. The necessity of this arise since varno attributes are defined within a single level of a query. Instead, here we want to identify each RTE_RELATION uniquely and try to find equality among each RTE_RELATION's partition key. Whenever we find an equality clause A = B, where both A and B originates from relation attributes (i.e., not random expressions), we create an AttributeEquivalenceClass to record this knowledge. If we later find another equivalence B = C, we create another AttributeEquivalenceClass. Finally, we can apply transitity rules and generate a new AttributeEquivalenceClass which includes A, B and C. Note that equality among the members are identified by the varattno and rteIdentity. Each equality among RTE_RELATION is saved using an AttributeEquivalenceClass where each member attribute is identified by a AttributeEquivalenceMember. In the final step, we try generate a common attribute equivalence class that holds as much as AttributeEquivalenceMembers whose attributes are a partition keys.	2017-04-13 11:51:26 +03:00
velioglu	19d0c66fa5	Change checks with built-in type	2017-04-11 14:41:37 +03:00
velioglu	1fb11c738f	Check binary output function of type.	2017-04-10 16:28:09 +03:00
Metin Doslu	54a277ff01	Add disable/enable trigger all support	2017-03-29 22:00:14 +03:00
Andres Freund	52358fe891	Initial temp table removal implementation	2017-03-14 12:09:49 +02:00
Murat Tuncer	72027f2eba	Remove default clause from shard DDL when sequences are used	2017-03-01 17:32:48 +03:00
Marco Slot	d74fb764b1	Use CitusCopyDestReceiver for regular COPY	2017-02-28 17:24:45 +01:00
Brian Cloutier	e3c763c3f7	Start remote transactions in master_append_table_to_shard Add a call to RemoteTransactionBeginIfNecessary so that BEGIN is actually sent to the remote connections. This means that ROLLBACK and Ctrl-C are respected and don't leave the table in a partial state.	2017-02-01 18:12:19 +02:00
Murat Tuncer	1107439ade	Fix dependent tests	2017-01-25 19:19:39 +03:00
Murat Tuncer	5194111420	Add failure case for regression tests	2017-01-25 19:19:39 +03:00
Eren Basak	88e9a429e1	Add Regression Tests For Querying MX Tables from Workers	2017-01-24 10:36:59 +03:00
Burak Yucesoy	d80e7849a4	Convert DropShards to use new connection API With this change DropShards function started to use new connection API. DropShards function is used by DROP TABLE, master_drop_all_shards and master_apply_delete_command, therefore all of these functions now support transactional operations. In DropShards function, if we cannot reach a node, we mark shard state of related placements as FILE_TO_DELETE and continue to drop remaining shards; however if any error occurs after establishing the connection, we ROLLBACK whole operation.	2017-01-23 21:08:41 +03:00
Murat Tuncer	77f8db6b14	Add view support Enables use views within distributed queries. User can create and use a view on distributed tables/queries as he/she would use with regular queries. After this change router queries will have full support for views, insert into select queries will support reading from views, not writing into. Outer joins would have a limited support, and would error out at certain cases such as when a view is in the inner side of the outer join. Although PostgreSQL supports writing into views under certain circumstances. We disallowed that for distributed views.	2017-01-13 09:39:42 +03:00
Burak Yucesoy	1d18950860	Modify tests to create clean workspace Since we will now replicate reference tables each time we add node, we need to ensure that test space is clean in terms of reference tables before any add node operation. For this purpose we had to change order of multi_drop_extension test which caused change of some of the colocation ids.	2017-01-05 12:22:44 +03:00
Murat Tuncer	2f76b4be99	Add error hint to failing modify query	2016-12-23 19:43:55 +03:00
Onder Kalaci	9f0bd4cb36	Reference Table Support - Phase 1 With this commit, we implemented some basic features of reference tables. To start with, a reference table is * a distributed table whithout a distribution column defined on it * the distributed table is single sharded * and the shard is replicated to all nodes Reference tables follows the same code-path with a single sharded tables. Thus, broadcast JOINs are applicable to reference tables. But, since the table is replicated to all nodes, table fetching is not required any more. Reference tables support the uniqueness constraints for any column. Reference tables can be used in INSERT INTO .. SELECT queries with the following rules: * If a reference table is in the SELECT part of the query, it is safe join with another reference table and/or hash partitioned tables. * If a reference table is in the INSERT part of the query, all other participating tables should be reference tables. Reference tables follow the regular co-location structure. Since all reference tables are single sharded and replicated to all nodes, they are always co-located with each other. Queries involving only reference tables always follows router planner and executor. Reference tables can have composite typed columns and there is no need to create/define the necessary support functions. All modification queries, master_* UDFs, EXPLAIN, DDLs, TRUNCATE, sequences, transactions, COPY, schema support works on reference tables as expected. Plus, all the pre-requisites associated with distribution columns are dismissed.	2016-12-20 14:09:35 +02:00
Murat Tuncer	c3a60bff70	Make router planner active at all times We used to disable router planner and executor when task executor is set to task-tracker. This change enables router planning and execution at all times regardless of task execution mode. We are introducing a hidden flag enable_router_execution to enable/disable router execution. Its default value is true. User may disable router planning by setting it to false.	2016-12-20 11:24:01 +03:00
Murat Tuncer	45762006f3	Add support for filters Ensures filter clauses are stripped from master query, and pushed down to worker queries.	2016-12-01 08:53:46 +03:00
Andres Freund	ac14b2edbc	Support PostgreSQL 9.6 Adds support for PostgreSQL 9.6 by copying in the requisite ruleutils file and refactoring the out/readfuncs code to flexibly support the old-style copy/pasted out/readfuncs (prior to 9.6) or use extensible node APIs (in 9.6 and higher). Most version-specific code within this change is only needed to set new fields in the AggRef nodes we build for aggregations. Version-specific test output files were added in certain cases, though in most they were not necessary. Each such file begins by e.g. printing the major version in order to clarify its purpose. The comment atop citus_nodes.h details how to add support for new nodes for when that becomes necessary.	2016-10-18 16:23:55 -06:00
Metin Doslu	35eceb6cca	Remove pg_toast_* references from regression tests pg_toast_* oids are constantly changing, and this causes regression tests to fail time to time. With this commit, we remove all of the pg_toast_* references from regression test outputs.	2016-09-09 11:31:51 +03:00
Jason Petersen	74f4e0003b	Permit multiple DDL commands in a transaction Three changes here to get to true multi-statement, multi-relation DDL transactions (same functionality pre-5.2, with benefits of atomicity): 1. Changed the multi-shard utility hook to always run (consistency with router executor hook, removes ad-hoc "installed" boolean) 2. Change the global connection list in multi_shard_transaction to instead be a hash; update related functions to operate on global hash instead of local hash/global list 3. Remove check within DDL code to prevent subsequent DDL commands; place unset/reset guard around call to ConnectToNode to permit connecting to additional nodes after DDL transaction has begun In addition, code has been added to raise an error if a ROLLBACK TO SAVEPOINT is attempted (similar to router executor), and comprehensive tests execute all multi-DDL scenarios (full success, user ROLLBACK, any actual errors (say, duplicate index), partial failure (duplicate index on one node but not others), partial COMMIT (one node fails), and 2PC partial PREPARE (one node fails)). Interleavings with other commands (DML, \copy) are similarly all covered.	2016-09-08 22:35:55 -05:00
Metin Doslu	5b50f2c333	Add complex subquery pushdown regression tests	2016-09-02 14:21:51 +03:00
Jason Petersen	850c51947a	Re-permit DDL in transactions, selectively Recent changes to DDL and transaction logic resulted in a "regression" from the viewpoint of users. Previously, DDL commands were allowed in multi-command transaction blocks, though they were not processed in any actual transactional manner. We improved the atomicity of our DDL code, but added a restriction that DDL commands themselves must not occur in any BEGIN/END transaction block. To give users back the original functionality (and improved atomicity) we now keep track of whether a multi-command transaction has modified data (DML) or schema (DDL). Interleaving the two modification types in a single transaction is disallowed. This first step simply permits a single DDL command in such a block, admittedly an incomplete solution, but one which will permit us to add full multi-DDL command support in a subsequent commit.	2016-08-30 20:37:19 -06:00
Metin Doslu	75618fc3fb	Return false in MultiClientQueryResult() on failing query	2016-08-29 17:05:35 +03:00
Brian Cloutier	640bb8863b	Remove check-multi-fdw tests, nobody uses Citus with fdws	2016-08-26 10:41:33 +03:00
Jason Petersen	e54d3f6d32	Rename test files with 'stage' in name Ignored FDW files as those test are being removed entirely, I believe.	2016-08-22 13:32:53 -06:00
Jason Petersen	b391abda3d	Replace verb 'stage' with 'load' in test comments "Staging table" will be the only valid use of 'stage' from now on, we will now say "load" when talking about data ingestion. If creation of shards is its own step, we'll just say "shard creation".	2016-08-22 13:24:18 -06:00
Eren Başak	0322916700	Lowercase \copy to match PostgreSQL's style for local/psql-level functions	2016-08-22 11:31:26 -06:00
Eren Basak	b513f1c911	Replace \stage With \copy on Regression Tests Fixes #547 This change removes all references to \stage in the regression tests and puts \COPY instead. Doing so changed shard counts, min/max values on some test tables (lineitem, orders, etc.).	2016-08-22 11:31:26 -06:00
Marco Slot	9705cbcdf8	Rewrite WorkerShardStats to avoid invalid value bugs	2016-07-29 20:11:18 +02:00
Eren Başak	bb3893d0d8	Set 1PC as the Default Commit Protocol for DDL Commands Fixes #679 This change sets the default commit protocol for distributed DDL commands to '1pc'. If the user issues a distributed DDL command with this default setting, then once in a session, a NOTICE message is shown about using '2pc' being extra safe.	2016-07-29 16:42:55 +03:00
Jason Petersen	abe7304898	Support SERIAL/BIGSERIAL non-partition columns This adds support for SERIAL/BIGSERIAL column types. Because we now can evaluate functions on the master (during execution), adding this is a matter of ensuring the table creation step works properly. To accomplish this, I've added some logic to detect sequences owned by a table (i.e. those related to its columns). Simply creating a sequence and using it in a default value is insufficient; users who do so must ensure the sequence is owned by the column using it. Fortunately, this is exactly what SERIAL and BIGSERIAL do, which is the use case we're targeting with this feature. While testing this, I found that worker_apply_shard_ddl_command actually adds shard identifiers to sequence names, though I found no places that use or test this path. I removed that code so that sequence names are not mutated and will match those used by a SERIAL default value expression. Our use of the new-to-9.5 CREATE SEQUENCE IF NOT EXISTS syntax means we are dropping support for 9.4 (which is being done regardless, but makes this change simpler). I've removed 9.4 from the Travis build matrix. Some edge cases are possible in ALTER SEQUENCE, COPY FROM (on workers), and CREATE SEQUENCE OWNED BY. I've added errors for each so that users understand when and why certain operations are prohibited.	2016-07-28 23:55:40 -06:00

1 2 3 4

181 Commits (cb6d67a9a97d7b0ed1a3fba7e316c275b2a08f84)