citus

Commit Graph

Author	SHA1	Message	Date
velioglu	a1ea29ec2b	Use placement connection to drop shards instead of node connection	2017-06-14 14:14:59 +03:00
Onder Kalaci	df494c0403	Improve subquery pushdown regression tests - Use native postgres function for composite key btree functions - Move explain tests to multi_explain.sql (get rid of .out _0.out files) - Get rid of input/output files for multi_subquery.sql by moving table creations - Update some comments	2017-05-30 14:05:15 +03:00
Jason Petersen	97f8302c9c	Change version-sensitive tests to handle '10' Previously assumed period in version; this makes tests future-proof.	2017-05-16 11:05:34 -06:00
Jason Petersen	d6cccee5bc	Remove ALTER SEQUENCE from parallel groups Removing these has no side effect, and in the (current) PostgreSQL 10, an ERROR is printed during concurrent sequence modification.	2017-05-16 11:05:34 -06:00
Jason Petersen	db11324ac7	Add unambiguous ORDER BY clauses to many tests Queries which do not specify an order may arbitrarily change output across PostgreSQL versions.	2017-05-16 11:05:34 -06:00
Jason Petersen	b9bc3fdada	Update header comments to match new test names Just keeping these in sync with the actual file name.	2017-05-16 11:05:33 -06:00
Jason Petersen	9f4a33eee1	Rename very long test files In addition to not actually providing much information, these names can cause problems in PostgreSQL 10.	2017-05-16 11:05:33 -06:00
Önder Kalacı	ad5cd326a4	Subquery pushdown - main branch (#1323 ) * Enabling physical planner for subquery pushdown changes This commit applies the logic that exists in INSERT .. SELECT planning to the subquery pushdown changes. The main algorithm is followed as : - pick an anchor relation (i.e., target relation) - per each target shard interval - add the target shard interval's shard range as a restriction to the relations (if all relations joined on the partition keys) - Check whether the query is router plannable per target shard interval. - If router plannable, create a task * Add union support within the JOINS This commit adds support for UNION/UNION ALL subqueries that are in the following form: .... (Q1 UNION Q2 UNION ...) as union_query JOIN (QN) ... In other words, we currently do NOT support the queries that are in the following form where union query is not JOINed with other relations/subqueries : .... (Q1 UNION Q2 UNION ...) as union_query .... * Subquery pushdown planner uses original query With this commit, we change the input to the logical planner for subquery pushdown. Before this commit, the planner was relying on the query tree that is transformed by the postgresql planner. After this commit, the planner uses the original query. The main motivation behind this change is the simplify deparsing of subqueries. * Enable top level subquery join queries This work enables - Top level subquery joins - Joins between subqueries and relations - Joins involving more than 2 range table entries A new regression test file is added to reflect enabled test cases * Add top level union support This commit adds support for UNION/UNION ALL subqueries that are in the following form: .... (Q1 UNION Q2 UNION ...) as union_query .... In other words, Citus supports allow top level unions being wrapped into aggregations queries and/or simple projection queries that only selects some fields from the lower level queries. * Disallow subqueries without a relation in the range table list for subquery pushdown This commit disallows subqueries without relation in the range table list. This commit is only applied for subquery pushdown. In other words, we do not add this limitation for single table re-partition subqueries. The reasoning behind this limitation is that if we allow pushing down such queries, the result would include (shardCount * expectedResults) where in a non distributed world the result would be (expectedResult) only. * Disallow subqueries without a relation in the range table list for INSERT .. SELECT This commit disallows subqueries without relation in the range table list. This commit is only applied for INSERT.. SELECT queries. The reasoning behind this limitation is that if we allow pushing down such queries, the result would include (shardCount * expectedResults) where in a non distributed world the result would be (expectedResult) only. * Change behaviour of subquery pushdown flag (#1315) This commit changes the behaviour of the citus.subquery_pushdown flag. Before this commit, the flag is used to enable subquery pushdown logic. But, with this commit, that behaviour is enabled by default. In other words, the flag is now useless. We prefer to keep the flag since we don't want to break the backward compatibility. Also, we may consider using that flag for other purposes in the next commits. * Require subquery_pushdown when limit is used in subquery Using limit in subqueries may cause returning incorrect results. Therefore we allow limits in subqueries only if user explicitly set subquery_pushdown flag. * Evaluate expressions on the LIMIT clause (#1333) Subquery pushdown uses orignal query, the LIMIT and OFFSET clauses are not evaluated. However, logical optimizer expects these expressions are already evaluated by the standard planner. This commit manually evaluates the functions on the logical planner for subquery pushdown. * Better format subquery regression tests (#1340) * Style fix for subquery pushdown regression tests With this commit we intented a more consistent style for the regression tests we've added in the - multi_subquery_union.sql - multi_subquery_complex_queries.sql - multi_subquery_behavioral_analytics.sql * Enable the tests that are temporarily commented This commit enables some of the regression tests that were commented out until all the development is done. * Fix merge conflicts (#1347) - Update regression tests to meet the changes in the regression test output. - Replace Ifs with Asserts given that the check is already done - Update shard pruning outputs * Add view regression tests for increased subquery coverage (#1348) - joins between views and tables - joins between views - union/union all queries involving views - views with limit - explain queries with view * Improve btree operators for the subquery tests This commit adds the missing comprasion for subquery composite key btree comparator.	2017-04-29 04:09:48 +03:00
Andres Freund	6bd2e3ed30	Add DistTableCacheEntry->hasOverlappingShardInterval. This determines whether it's possible to perform binary search on sortedShardIntervalArray or not. If e.g. two shards have overlapping ranges, that'd be prohibitive. That'll be useful in later commit introducing faster shard pruning.	2017-04-28 14:40:38 -07:00
Andres Freund	1f93c325fa	Some cleanup in multi_subquery test. Remove trailing whitespace and use of EXPLAIN instead of EXPLAIN (COSTS OFF).	2017-04-26 11:33:56 -07:00
Burak Yucesoy	d6cb88a73a	Stabilize test outputs	2017-04-21 16:08:52 +03:00
Eren Basak	abc84e6b2b	Add support for proper valgrind tests This change allows valgrind tests (`make check-multi-vg`) to be run seamlessly without test output errors and timeout problems.	2017-04-21 16:08:52 +03:00
Jason Petersen	5272c2c44b	Enable distributed ALTER TABLE ... RENAME COLUMN Pretty straightforward. Had some concerns about locking, but due to the fact that all distributed operations use either some level of deparsing or need to enumerate column names, they all block during any concurrent column renames (due to the AccessExclusive lock). In addition, I had some misgivings about permitting renames of the dis- tribution column, but nothing bad comes from just allowing them. Finally, I tried to trigger any sort of error using prepared statements and could not trigger any errors not also exhibited by plain PostgreSQL tables.	2017-04-18 22:47:48 -06:00
Marco Slot	f838c83809	Remove redundant pg_dist_jobid_seq restarts in tests	2017-04-18 11:42:32 +02:00
Marco Slot	40829c2ba9	Set citus.enable_unique_job_ids in tests with job ID in output	2017-04-18 11:42:32 +02:00
Burak Yucesoy	e9095e62ec	Decouple reference table replication With this change we add an option to add a node without replicating all reference tables to that node. If a node is added with this option, we mark the node as inactive and no queries will sent to that node. We also added two new UDFs; - master_activate_node(host, port): - marks node as active and replicates all reference tables to that node - master_add_inactive_node(host, port): - only adds node to pg_dist_node	2017-04-17 13:33:31 +03:00
Onder Kalaci	1cb6a34ba8	Remove uninstantiated qual logic, use attribute equivalences In this PR, we aim to deduce whether each of the RTE_RELATION is joined with at least on another RTE_RELATION on their partition keys. If each RTE_RELATION follows the above rule, we can conclude that all RTE_RELATIONs are joined on their partition keys. In order to do that, we invented a new equivalence class namely: AttributeEquivalenceClass. In very simple words, a AttributeEquivalenceClass is identified by an unique id and consists of a list of AttributeEquivalenceMembers. Each AttributeEquivalenceMember is designed to identify attributes uniquely within the whole query. The necessity of this arise since varno attributes are defined within a single level of a query. Instead, here we want to identify each RTE_RELATION uniquely and try to find equality among each RTE_RELATION's partition key. Whenever we find an equality clause A = B, where both A and B originates from relation attributes (i.e., not random expressions), we create an AttributeEquivalenceClass to record this knowledge. If we later find another equivalence B = C, we create another AttributeEquivalenceClass. Finally, we can apply transitity rules and generate a new AttributeEquivalenceClass which includes A, B and C. Note that equality among the members are identified by the varattno and rteIdentity. Each equality among RTE_RELATION is saved using an AttributeEquivalenceClass where each member attribute is identified by a AttributeEquivalenceMember. In the final step, we try generate a common attribute equivalence class that holds as much as AttributeEquivalenceMembers whose attributes are a partition keys.	2017-04-13 11:51:26 +03:00
velioglu	19d0c66fa5	Change checks with built-in type	2017-04-11 14:41:37 +03:00
velioglu	1fb11c738f	Check binary output function of type.	2017-04-10 16:28:09 +03:00
Metin Doslu	54a277ff01	Add disable/enable trigger all support	2017-03-29 22:00:14 +03:00
Andres Freund	52358fe891	Initial temp table removal implementation	2017-03-14 12:09:49 +02:00
Murat Tuncer	72027f2eba	Remove default clause from shard DDL when sequences are used	2017-03-01 17:32:48 +03:00
Marco Slot	d74fb764b1	Use CitusCopyDestReceiver for regular COPY	2017-02-28 17:24:45 +01:00
Brian Cloutier	e3c763c3f7	Start remote transactions in master_append_table_to_shard Add a call to RemoteTransactionBeginIfNecessary so that BEGIN is actually sent to the remote connections. This means that ROLLBACK and Ctrl-C are respected and don't leave the table in a partial state.	2017-02-01 18:12:19 +02:00
Murat Tuncer	1107439ade	Fix dependent tests	2017-01-25 19:19:39 +03:00
Murat Tuncer	5194111420	Add failure case for regression tests	2017-01-25 19:19:39 +03:00
Eren Basak	88e9a429e1	Add Regression Tests For Querying MX Tables from Workers	2017-01-24 10:36:59 +03:00
Burak Yucesoy	d80e7849a4	Convert DropShards to use new connection API With this change DropShards function started to use new connection API. DropShards function is used by DROP TABLE, master_drop_all_shards and master_apply_delete_command, therefore all of these functions now support transactional operations. In DropShards function, if we cannot reach a node, we mark shard state of related placements as FILE_TO_DELETE and continue to drop remaining shards; however if any error occurs after establishing the connection, we ROLLBACK whole operation.	2017-01-23 21:08:41 +03:00
Murat Tuncer	77f8db6b14	Add view support Enables use views within distributed queries. User can create and use a view on distributed tables/queries as he/she would use with regular queries. After this change router queries will have full support for views, insert into select queries will support reading from views, not writing into. Outer joins would have a limited support, and would error out at certain cases such as when a view is in the inner side of the outer join. Although PostgreSQL supports writing into views under certain circumstances. We disallowed that for distributed views.	2017-01-13 09:39:42 +03:00
Burak Yucesoy	1d18950860	Modify tests to create clean workspace Since we will now replicate reference tables each time we add node, we need to ensure that test space is clean in terms of reference tables before any add node operation. For this purpose we had to change order of multi_drop_extension test which caused change of some of the colocation ids.	2017-01-05 12:22:44 +03:00
Murat Tuncer	2f76b4be99	Add error hint to failing modify query	2016-12-23 19:43:55 +03:00
Onder Kalaci	9f0bd4cb36	Reference Table Support - Phase 1 With this commit, we implemented some basic features of reference tables. To start with, a reference table is * a distributed table whithout a distribution column defined on it * the distributed table is single sharded * and the shard is replicated to all nodes Reference tables follows the same code-path with a single sharded tables. Thus, broadcast JOINs are applicable to reference tables. But, since the table is replicated to all nodes, table fetching is not required any more. Reference tables support the uniqueness constraints for any column. Reference tables can be used in INSERT INTO .. SELECT queries with the following rules: * If a reference table is in the SELECT part of the query, it is safe join with another reference table and/or hash partitioned tables. * If a reference table is in the INSERT part of the query, all other participating tables should be reference tables. Reference tables follow the regular co-location structure. Since all reference tables are single sharded and replicated to all nodes, they are always co-located with each other. Queries involving only reference tables always follows router planner and executor. Reference tables can have composite typed columns and there is no need to create/define the necessary support functions. All modification queries, master_* UDFs, EXPLAIN, DDLs, TRUNCATE, sequences, transactions, COPY, schema support works on reference tables as expected. Plus, all the pre-requisites associated with distribution columns are dismissed.	2016-12-20 14:09:35 +02:00
Murat Tuncer	c3a60bff70	Make router planner active at all times We used to disable router planner and executor when task executor is set to task-tracker. This change enables router planning and execution at all times regardless of task execution mode. We are introducing a hidden flag enable_router_execution to enable/disable router execution. Its default value is true. User may disable router planning by setting it to false.	2016-12-20 11:24:01 +03:00
Murat Tuncer	45762006f3	Add support for filters Ensures filter clauses are stripped from master query, and pushed down to worker queries.	2016-12-01 08:53:46 +03:00
Andres Freund	ac14b2edbc	Support PostgreSQL 9.6 Adds support for PostgreSQL 9.6 by copying in the requisite ruleutils file and refactoring the out/readfuncs code to flexibly support the old-style copy/pasted out/readfuncs (prior to 9.6) or use extensible node APIs (in 9.6 and higher). Most version-specific code within this change is only needed to set new fields in the AggRef nodes we build for aggregations. Version-specific test output files were added in certain cases, though in most they were not necessary. Each such file begins by e.g. printing the major version in order to clarify its purpose. The comment atop citus_nodes.h details how to add support for new nodes for when that becomes necessary.	2016-10-18 16:23:55 -06:00
Metin Doslu	35eceb6cca	Remove pg_toast_* references from regression tests pg_toast_* oids are constantly changing, and this causes regression tests to fail time to time. With this commit, we remove all of the pg_toast_* references from regression test outputs.	2016-09-09 11:31:51 +03:00
Jason Petersen	74f4e0003b	Permit multiple DDL commands in a transaction Three changes here to get to true multi-statement, multi-relation DDL transactions (same functionality pre-5.2, with benefits of atomicity): 1. Changed the multi-shard utility hook to always run (consistency with router executor hook, removes ad-hoc "installed" boolean) 2. Change the global connection list in multi_shard_transaction to instead be a hash; update related functions to operate on global hash instead of local hash/global list 3. Remove check within DDL code to prevent subsequent DDL commands; place unset/reset guard around call to ConnectToNode to permit connecting to additional nodes after DDL transaction has begun In addition, code has been added to raise an error if a ROLLBACK TO SAVEPOINT is attempted (similar to router executor), and comprehensive tests execute all multi-DDL scenarios (full success, user ROLLBACK, any actual errors (say, duplicate index), partial failure (duplicate index on one node but not others), partial COMMIT (one node fails), and 2PC partial PREPARE (one node fails)). Interleavings with other commands (DML, \copy) are similarly all covered.	2016-09-08 22:35:55 -05:00
Metin Doslu	5b50f2c333	Add complex subquery pushdown regression tests	2016-09-02 14:21:51 +03:00
Jason Petersen	850c51947a	Re-permit DDL in transactions, selectively Recent changes to DDL and transaction logic resulted in a "regression" from the viewpoint of users. Previously, DDL commands were allowed in multi-command transaction blocks, though they were not processed in any actual transactional manner. We improved the atomicity of our DDL code, but added a restriction that DDL commands themselves must not occur in any BEGIN/END transaction block. To give users back the original functionality (and improved atomicity) we now keep track of whether a multi-command transaction has modified data (DML) or schema (DDL). Interleaving the two modification types in a single transaction is disallowed. This first step simply permits a single DDL command in such a block, admittedly an incomplete solution, but one which will permit us to add full multi-DDL command support in a subsequent commit.	2016-08-30 20:37:19 -06:00
Metin Doslu	75618fc3fb	Return false in MultiClientQueryResult() on failing query	2016-08-29 17:05:35 +03:00
Brian Cloutier	640bb8863b	Remove check-multi-fdw tests, nobody uses Citus with fdws	2016-08-26 10:41:33 +03:00
Jason Petersen	e54d3f6d32	Rename test files with 'stage' in name Ignored FDW files as those test are being removed entirely, I believe.	2016-08-22 13:32:53 -06:00
Jason Petersen	b391abda3d	Replace verb 'stage' with 'load' in test comments "Staging table" will be the only valid use of 'stage' from now on, we will now say "load" when talking about data ingestion. If creation of shards is its own step, we'll just say "shard creation".	2016-08-22 13:24:18 -06:00
Eren Başak	0322916700	Lowercase \copy to match PostgreSQL's style for local/psql-level functions	2016-08-22 11:31:26 -06:00
Eren Basak	b513f1c911	Replace \stage With \copy on Regression Tests Fixes #547 This change removes all references to \stage in the regression tests and puts \COPY instead. Doing so changed shard counts, min/max values on some test tables (lineitem, orders, etc.).	2016-08-22 11:31:26 -06:00
Marco Slot	9705cbcdf8	Rewrite WorkerShardStats to avoid invalid value bugs	2016-07-29 20:11:18 +02:00
Eren Başak	bb3893d0d8	Set 1PC as the Default Commit Protocol for DDL Commands Fixes #679 This change sets the default commit protocol for distributed DDL commands to '1pc'. If the user issues a distributed DDL command with this default setting, then once in a session, a NOTICE message is shown about using '2pc' being extra safe.	2016-07-29 16:42:55 +03:00
Jason Petersen	abe7304898	Support SERIAL/BIGSERIAL non-partition columns This adds support for SERIAL/BIGSERIAL column types. Because we now can evaluate functions on the master (during execution), adding this is a matter of ensuring the table creation step works properly. To accomplish this, I've added some logic to detect sequences owned by a table (i.e. those related to its columns). Simply creating a sequence and using it in a default value is insufficient; users who do so must ensure the sequence is owned by the column using it. Fortunately, this is exactly what SERIAL and BIGSERIAL do, which is the use case we're targeting with this feature. While testing this, I found that worker_apply_shard_ddl_command actually adds shard identifiers to sequence names, though I found no places that use or test this path. I removed that code so that sequence names are not mutated and will match those used by a SERIAL default value expression. Our use of the new-to-9.5 CREATE SEQUENCE IF NOT EXISTS syntax means we are dropping support for 9.4 (which is being done regardless, but makes this change simpler). I've removed 9.4 from the Travis build matrix. Some edge cases are possible in ALTER SEQUENCE, COPY FROM (on workers), and CREATE SEQUENCE OWNED BY. I've added errors for each so that users understand when and why certain operations are prohibited.	2016-07-28 23:55:40 -06:00
Murat Tuncer	cc33a450c4	Expand router planner coverage We can now support richer set of queries in router planner. This allow us to support CTEs, joins, window function, subqueries if they are known to be executed at a single worker with a single task (all tables are filtered down to a single shard and a single worker contains all table shards referenced in the query). Fixes : #501	2016-07-27 23:35:38 +03:00
Burak Yucesoy	bdff72ed75	Fix ALTER TABLE SET SCHEMA Fixes #132 We hook into ALTER ... SET SCHEMA and warn out if user tries to change schema of a distributed table. We also hook into ALTER TABLE ALL IN TABLE SPACE statements and warn out if citus has been loaded.	2016-07-22 17:52:40 +03:00

1 2

79 Commits (296f41dfd043ee3d31880f1ec54d5eba006646d9)