citus

Commit Graph

Author	SHA1	Message	Date
Brian Cloutier	7ad95b53d2	Rename pg_dist_shard_placement -> pg_dist_placement Comes with a few changes: - Change the signature of some functions to accept groupid - InsertShardPlacementRow - DeleteShardPlacementRow - UpdateShardPlacementState - NodeHasActiveShardPlacements returns true if the group the node is a part of has any active shard placements - TupleToShardPlacement now returns ShardPlacements which have NULL nodeName and nodePort. - Populate (nodeName, nodePort) when creating ShardPlacements - Disallow removing a node if it contains any shard placements - DeleteAllReferenceTablePlacementsFromNode matches based on group. This doesn't change behavior for now (while there is only one node per group), but means in the future callers should be careful about calling it on a secondary node, it'll delete placements on the primary. - Create concept of a GroupShardPlacement, which represents an actual tuple in pg_dist_placement and is distinct from a ShardPlacement, which has been resolved to a specific node. In the future ShardPlacement should be renamed to NodeShardPlacement. - Create some triggers which allow existing code to continue to insert into and update pg_dist_shard_placement as if it still existed.	2017-07-12 14:17:31 +02:00
Jason Petersen	9018e698ec	Indentation cleanup Uncrustify 0.65 appears to have changed some defaults, resulting in breakages for those of us who have already upgraded; Travis still uses Uncrustify 0.64, but these changes work with both versions (assuming appropriately updated config), so this should permit use of either version for the time being.	2017-07-11 15:59:28 -06:00
Murat Tuncer	2a4eada150	Replace duplicate code and call check_functions_in_node (#1478 ) MasterIrreducibleExpressionWalker has a copied code from function check_functions_in_node() which was available with PG 9.6+. Now PG 9.5 support is dropped we can remove duplicate code and directly call check_functions_in_node().	2017-07-07 10:19:33 +03:00
Marco Slot	da47a03b18	Move INSERT ... SELECT planning logic into one place	2017-06-29 15:03:14 +02:00
Andres Freund	dc3997c3b8	Remove 9.5 related node wrappers. Now that all branches support the extensible node infrastructure, we don't need our wrappers anymore.	2017-06-26 08:46:32 -07:00
Andres Freund	b96ba9b490	Fix code only enabled for 9.5. There's still supporting wrappers used, a subsequent commit will remove those. This also removes the already unused tuplecount_t define.	2017-06-26 08:46:32 -07:00
Jason Petersen	2204da19f0	Support PostgreSQL 10 (#1379 ) Adds support for PostgreSQL 10 by copying in the requisite ruleutils and updating all API usages to conform with changes in PostgreSQL 10. Most changes are fairly minor but they are numerous. One particular obstacle was the change in \d behavior in PostgreSQL 10's psql; I had to add SQL implementations (views, mostly) to mimic the pre-10 output.	2017-06-26 02:35:46 -06:00
Marco Slot	2f8ac82660	Execute INSERT..SELECT via coordinator if it cannot be pushed down Add a second implementation of INSERT INTO distributed_table SELECT ... that is used if the query cannot be pushed down. The basic idea is to execute the SELECT query separately and pass the results into the distributed table using a CopyDestReceiver, which is also used for COPY and create_distributed_table. When planning the SELECT, we go through planner hooks again, which means the SELECT can also be a distributed query. EXPLAIN is supported, but EXPLAIN ANALYZE is not because preventing double execution was a lot more complicated in this case.	2017-06-22 15:46:30 +02:00
Marco Slot	155db4d913	Simplify router planner call path	2017-06-22 15:45:57 +02:00
jmunsch	1647d17a14	Clarify error message for local and distributed query plans.	2017-06-01 11:52:49 -07:00
Jason Petersen	f86920f9d6	Add includes for missing standard headers We use symbols from each of these and were relying on them being included by other headers.	2017-05-16 11:05:33 -06:00
Jason Petersen	82b03d5cb6	Add explicit cast for argument to copyObject PostgreSQL 10 adds a call to typeof, if supported.	2017-05-16 11:05:33 -06:00
Önder Kalacı	3ec502b286	Add support for parametrized execution for subquery pushdown (#1356 ) Distributed query planning for subquery pushdown is done on the original query. This prevents the usage of external parameters on the execution. To overcome this, we manually replace the parameters on the original query.	2017-05-10 09:38:48 +03:00
Önder Kalacı	ef6d3587b6	Skip exhaustive test in CoPartitionedTables() if declared colocated (#1376 ) That's considerably cheaper.	2017-05-02 03:33:21 +03:00
Önder Kalacı	b74ed3c8e1	Subqueries in where -- updated (#1372 ) * Support for subqueries in WHERE clause This commit enables subqueries in WHERE clause to be pushed down by the subquery pushdown logic. The support covers: - Correlated subqueries with IN, NOT IN, EXISTS, NOT EXISTS, operator expressions such as (>, <, =, ALL, ANY etc.) - Non-correlated subqueries with (partition_key) IN (SELECT partition_key ..) (partition_key) =ANY (SELECT partition_key ...) Note that this commit heavily utilizes the attribute equivalence logic introduced in the `1cb6a34ba8`. In general, this commit mostly adjusts the logical planner not to error out on the subqueries in WHERE clause. * Improve error checks for subquery pushdown and INSERT ... SELECT Since we allow subqueries in WHERE clause with the previous commit, we should apply the same limitations to those subqueries. With this commit, we do not iterate on each subquery one by one. Instead, we extract all the subqueries and apply the checks directly on those subqueries. The aim of this change is to (i) Simplify the code (ii) Make it close to the checks on INSERT .. SELECT code base. * Extend checks for unresolved paramaters to include SubLinks With the presence of subqueries in where clause (i.e., SubPlans on the query) the existing way for checking unresolved parameters fail. The reason is that the parameters for SubPlans are kept on the parent plan not on the query itself (see primnodes.h for the details). With this commit, instead of checking SubPlans on the modified plans we start to use originalQuery, where SubLinks represent the subqueries in where clause. The unresolved parameters can be found on the SubLinks. * Apply code-review feedback * Remove unnecessary copying of shard interval list This commit removes unnecessary copying of shard interval list. Note that there are no copyObject function implemented for shard intervals.	2017-05-01 17:20:21 +03:00
Önder Kalacı	ad5cd326a4	Subquery pushdown - main branch (#1323 ) * Enabling physical planner for subquery pushdown changes This commit applies the logic that exists in INSERT .. SELECT planning to the subquery pushdown changes. The main algorithm is followed as : - pick an anchor relation (i.e., target relation) - per each target shard interval - add the target shard interval's shard range as a restriction to the relations (if all relations joined on the partition keys) - Check whether the query is router plannable per target shard interval. - If router plannable, create a task * Add union support within the JOINS This commit adds support for UNION/UNION ALL subqueries that are in the following form: .... (Q1 UNION Q2 UNION ...) as union_query JOIN (QN) ... In other words, we currently do NOT support the queries that are in the following form where union query is not JOINed with other relations/subqueries : .... (Q1 UNION Q2 UNION ...) as union_query .... * Subquery pushdown planner uses original query With this commit, we change the input to the logical planner for subquery pushdown. Before this commit, the planner was relying on the query tree that is transformed by the postgresql planner. After this commit, the planner uses the original query. The main motivation behind this change is the simplify deparsing of subqueries. * Enable top level subquery join queries This work enables - Top level subquery joins - Joins between subqueries and relations - Joins involving more than 2 range table entries A new regression test file is added to reflect enabled test cases * Add top level union support This commit adds support for UNION/UNION ALL subqueries that are in the following form: .... (Q1 UNION Q2 UNION ...) as union_query .... In other words, Citus supports allow top level unions being wrapped into aggregations queries and/or simple projection queries that only selects some fields from the lower level queries. * Disallow subqueries without a relation in the range table list for subquery pushdown This commit disallows subqueries without relation in the range table list. This commit is only applied for subquery pushdown. In other words, we do not add this limitation for single table re-partition subqueries. The reasoning behind this limitation is that if we allow pushing down such queries, the result would include (shardCount * expectedResults) where in a non distributed world the result would be (expectedResult) only. * Disallow subqueries without a relation in the range table list for INSERT .. SELECT This commit disallows subqueries without relation in the range table list. This commit is only applied for INSERT.. SELECT queries. The reasoning behind this limitation is that if we allow pushing down such queries, the result would include (shardCount * expectedResults) where in a non distributed world the result would be (expectedResult) only. * Change behaviour of subquery pushdown flag (#1315) This commit changes the behaviour of the citus.subquery_pushdown flag. Before this commit, the flag is used to enable subquery pushdown logic. But, with this commit, that behaviour is enabled by default. In other words, the flag is now useless. We prefer to keep the flag since we don't want to break the backward compatibility. Also, we may consider using that flag for other purposes in the next commits. * Require subquery_pushdown when limit is used in subquery Using limit in subqueries may cause returning incorrect results. Therefore we allow limits in subqueries only if user explicitly set subquery_pushdown flag. * Evaluate expressions on the LIMIT clause (#1333) Subquery pushdown uses orignal query, the LIMIT and OFFSET clauses are not evaluated. However, logical optimizer expects these expressions are already evaluated by the standard planner. This commit manually evaluates the functions on the logical planner for subquery pushdown. * Better format subquery regression tests (#1340) * Style fix for subquery pushdown regression tests With this commit we intented a more consistent style for the regression tests we've added in the - multi_subquery_union.sql - multi_subquery_complex_queries.sql - multi_subquery_behavioral_analytics.sql * Enable the tests that are temporarily commented This commit enables some of the regression tests that were commented out until all the development is done. * Fix merge conflicts (#1347) - Update regression tests to meet the changes in the regression test output. - Replace Ifs with Asserts given that the check is already done - Update shard pruning outputs * Add view regression tests for increased subquery coverage (#1348) - joins between views and tables - joins between views - union/union all queries involving views - views with limit - explain queries with view * Improve btree operators for the subquery tests This commit adds the missing comprasion for subquery composite key btree comparator.	2017-04-29 04:09:48 +03:00
Andres Freund	90b211267d	Perform range based pruning if equality pruning has survivor. We previously dismissed this as unimportant, but it turns out to be very useful for the upcoming subquery pushdown, where a user might specify an equality constraint in a subquery, and the subquery pushdown machinery adds >= and <= restrictions on the shard boundary. Previously the latter restriction was ignored.	2017-04-28 17:35:18 -07:00
Andres Freund	6c08fe72f9	Use stricter qual for pruning if both >/< and >=/<= are present. Previously, if both =< and < (>= and < respectively) were specified, we always used the latter restriction. Instead use the stricter one.	2017-04-28 17:35:18 -07:00
Burak Yucesoy	6599677902	Fix check-vanilla tests It semms that GEQO optimizations, when it is set to on, create their own memory context and free it after when it is no longer necessary. In join multi_join_restriction_hook we allocate our variables in the CurrentMemoryContext, which is GEQO's memory context if it is active. To prevent deallocation of our variables when GEQO's memory context is freed, we started to allocate memory fo these variables in separate MemoryContext.	2017-04-29 01:55:18 +02:00
Andres Freund	d399f395f7	Faster shard pruning. So far citus used postgres' predicate proofing logic for shard pruning, except for INSERT and COPY which were already optimized for speed. That turns out to be too slow: * Shard pruning for SELECTs is currently O(#shards), because PruneShardList calls predicate_refuted_by() for every shard. Obviously using an O(N) type algorithm for general pruning isn't good. * predicate_refuted_by() is quite expensive on its own right. That's primarily because it's optimized for doing a single refutation proof, rather than performing the same proof over and over. * predicate_refuted_by() does not keep persistent state (see 2.) for function calls, which means that a lot of syscache lookups will be performed. That's particularly bad if the partitioning key is a composite key, because without a persistent FunctionCallInfo record_cmp() has to repeatedly look-up the type definition of the composite key. That's quite expensive. Thus replace this with custom-code that works in two phases: 1) Search restrictions for constraints that can be pruned upon 2) Use those restrictions to search for matching shards in the most efficient manner available: a) Binary search / Hash Lookup in case of hash partitioned tables b) Binary search for equal clauses in case of range or append tables without overlapping shards. c) Binary search for inequality clauses, searching for both lower and upper boundaries, again in case of range or append tables without overlapping shards. d) exhaustive search testing each ShardInterval My measurements suggest that we are considerably, often orders of magnitude, faster than the previous solution, even if we have to fall back to exhaustive pruning.	2017-04-28 14:40:41 -07:00
Metin Doslu	b6659bec22	Send explain queries with savepoints With this commit, we started to send explain queries within a savepoint. After running explain query, we rollback to savepoint. This saves us from side effects of EXPLAIN ANALYZE on DML queries.	2017-04-28 12:13:48 -07:00
Jason Petersen	93e3afc25c	Remove FastShardPruning method With the other simplifications, it doesn't make sense to keep around.	2017-04-27 13:32:36 -06:00
Jason Petersen	42ee7c05f5	Refactor FindShardInterval to use cacheEntry All callers fetch a cache entry and extract/compute arguments for the eventual FindShardInterval call, so it makes more sense to refactor into that function itself; this solves the use-after-free bug, too.	2017-04-27 13:32:36 -06:00
Andres Freund	b7dfeb0bec	Boring regression test output adjustments. Soon shard pruning will be optimized not to generally work linearly anymore. Thus we can't print the pruned shard intervals as currently done anymore. The current printing of shard ids also prevents us from running tests in parallel, as otherwise shard ids aren't linearly numbered.	2017-04-26 11:33:56 -07:00
Andres Freund	71a7f39b05	Skip exhaustive test in CoPartitionedTables() if declared colocated. That's considerably cheaper.	2017-04-26 11:19:17 -07:00
Marco Slot	4ed093970a	Support expressions in the partition column in INSERTs	2017-04-21 14:05:52 +02:00
velioglu	8cbef819be	Log message of across shard queries according to the log level	2017-04-20 12:24:46 +03:00
velioglu	2327b63291	Change native hash function with worker_hash	2017-04-19 22:16:55 +03:00
Marco Slot	dfd7d86948	Stop using a sequence to generate unique job IDs	2017-04-18 11:31:51 +02:00
Marco Slot	af0e462409	Support UPDATE/DELETE with parameterised partition column qual	2017-04-17 16:17:30 +02:00
Burak Yucesoy	e9095e62ec	Decouple reference table replication With this change we add an option to add a node without replicating all reference tables to that node. If a node is added with this option, we mark the node as inactive and no queries will sent to that node. We also added two new UDFs; - master_activate_node(host, port): - marks node as active and replicates all reference tables to that node - master_add_inactive_node(host, port): - only adds node to pg_dist_node	2017-04-17 13:33:31 +03:00
Burak Yucesoy	7cfcb7d2f8	Error out on parameterized SQL functions Before this commit, we were erroring out for queries containing parameterized SQL functions like 'SELECT parameterized_sql_query(value)' as we should, however we were returning wrong results for queries like 'SELECT * FROM parameterized_sql_query(value)'. With this commit we started to error out on such queries too.	2017-04-13 16:36:24 +03:00
Onder Kalaci	1cb6a34ba8	Remove uninstantiated qual logic, use attribute equivalences In this PR, we aim to deduce whether each of the RTE_RELATION is joined with at least on another RTE_RELATION on their partition keys. If each RTE_RELATION follows the above rule, we can conclude that all RTE_RELATIONs are joined on their partition keys. In order to do that, we invented a new equivalence class namely: AttributeEquivalenceClass. In very simple words, a AttributeEquivalenceClass is identified by an unique id and consists of a list of AttributeEquivalenceMembers. Each AttributeEquivalenceMember is designed to identify attributes uniquely within the whole query. The necessity of this arise since varno attributes are defined within a single level of a query. Instead, here we want to identify each RTE_RELATION uniquely and try to find equality among each RTE_RELATION's partition key. Whenever we find an equality clause A = B, where both A and B originates from relation attributes (i.e., not random expressions), we create an AttributeEquivalenceClass to record this knowledge. If we later find another equivalence B = C, we create another AttributeEquivalenceClass. Finally, we can apply transitity rules and generate a new AttributeEquivalenceClass which includes A, B and C. Note that equality among the members are identified by the varattno and rteIdentity. Each equality among RTE_RELATION is saved using an AttributeEquivalenceClass where each member attribute is identified by a AttributeEquivalenceMember. In the final step, we try generate a common attribute equivalence class that holds as much as AttributeEquivalenceMembers whose attributes are a partition keys.	2017-04-13 11:51:26 +03:00
Onder Kalaci	11665dbe3c	Fix pushing down wrong queries for INSERT ... SELECT queries Before this commit, in certain cases router planner allowed pushing down JOINs that are not on the partition keys. With @anarazel's suggestion, we change the logic to use uninstantiated parameter. Previously, the planner was traversing on the restriction information and once it finds the parameter, it was replacing it with the shard range. With this commit, instead of traversing the restrict infos, the planner explicitly checks for the equivalence of the relation partition key with the uninstantiated parameter. If finds an equivalence, it adds the restrictions. In this way, we have more control over the queries that are pushed down.	2017-03-24 11:37:35 +02:00
Metin Doslu	b1ee7ec93e	Fix access permission checks for distributed relations With this commit, we add the range table list of the original query to our custom plan. Therefore, PostgreSQL can check relations in the original query for access permissions and error out if the proper access is not granted.	2017-03-22 15:25:00 -06:00
Murat Tuncer	c4734d7d94	Rephrase router modify errors generic "distributed modifications must target exactly one shard" message is replaced by more context aware error messages.	2017-03-16 15:09:10 +03:00
Metin Doslu	1f838199f8	Use CustomScan API for query execution Custom Scan is a node in the planned statement which helps external providers to abstract data scan not just for foreign data wrappers but also for regular relations so you can benefit your version of caching or hardware optimizations. This sounds like only an abstraction on the data scan layer, but we can use it as an abstraction for our distributed queries. The only thing we need to do is to find distributable parts of the query, plan for them and replace them with a Citus Custom Scan. Then, whenever PostgreSQL hits this custom scan node in its Vulcano style execution, it will call our callback functions which run distributed plan and provides tuples to the upper node as it scans a regular relation. This means fewer code changes, fewer bugs and more supported features for us! First, in the distributed query planner phase, we create a Custom Scan which wraps the distributed plan. For real-time and task-tracker executors, we add this custom plan under the master query plan. For router executor, we directly pass the custom plan because there is not any master query. Then, we simply let the PostgreSQL executor run this plan. When it hits the custom scan node, we call the related executor parts for distributed plan, fill the tuple store in the custom scan and return results to PostgreSQL executor in Vulcano style, a tuple per XXX_ExecScan() call. * Modify planner to utilize Custom Scan node. * Create different scan methods for different executors. * Use native PostgreSQL Explain for master part of queries.	2017-03-14 12:17:51 +02:00
Andres Freund	52358fe891	Initial temp table removal implementation	2017-03-14 12:09:49 +02:00
Murat Tuncer	f657a744d5	Enable router planner for queries on range partitioned tables Router planner now supports queries using range partitioned tables. Queries on append partitioned tables are still not supported.	2017-03-09 16:39:15 +03:00
Metin Doslu	ee425871ee	Get reproducible costs between different PostgreSQL versions	2017-02-22 15:40:02 +02:00
Andres Freund	9721e80901	Use DEBUG2 instead of DEBUG4 in INSERT SELECT tests & debug message. During later work the transaction debug output will change (as it will in postgres 10), which makes it hard to see actual changes in the INSERT ... SELECT ... test. Reduce to DEBUG2 after changing a debug message to that log level.	2017-02-20 12:56:16 +02:00
Marco Slot	ba940a1de9	Use coordinator instead of schema node in terminology	2017-01-25 11:07:23 +01:00
Andres Freund	6939cb8c56	Hack up PREPARE/EXECUTE for nearly all distributed queries. All router, real-time, task-tracker plannable queries should now have full prepared statement support (and even use router when possible), unless they don't go through the custom plan interface (which basically just affects LANGUAGE SQL (not plpgsql) functions). This is achieved by forcing postgres' planner to always choose a custom plan, by assigning very low costs to plans with bound parameters (i.e. ones were the postgres planner replanned the query upon EXECUTE with all parameter values provided), instead of the generic one. This requires some trickery, because for custom plans to work the costs for a non-custom plan have to be known, which means we can't error out when planning the generic plan. Instead we have to return a "faux" plan, that'd trigger an error message if executed. But due to the custom plan logic that plan will likely (unless called by an SQL function, or because we can't support that query for some reason) not be executed; instead the custom plan will be chosen.	2017-01-23 09:23:50 -08:00
Andres Freund	c244b8ef4a	Make router planner error handling more flexible. So far router planner had encapsulated different functionality in MultiRouterPlanCreate. Modifications always go through router, selects sometimes. Modifications always error out if the query is unsupported, selects return NULL. Especially the error handling is a problem for the upcoming extension of prepared statement support. Split MultiRouterPlanCreate into CreateRouterPlan and CreateModifyPlan, and change them to not throw errors. Instead errors are now reported by setting the new MultiPlan->plannigError. Callers of router planner functionality now have to throw errors themselves if desired, but also can skip doing so. This is a pre-requisite for expanding prepared statement support. While touching all those lines, improve a number of error messages by getting them closer to the postgres error message guidelines.	2017-01-23 09:23:50 -08:00
Andres Freund	7681f6ab9d	Centralize more of distributed planning into CreateDistributedPlan(). The name CreatePhysicalPlan() hasn't been accurate for a while, and the split of work between multi_planner() and CreatePhysicalPlan() doesn't seem perfect. So rename to CreateDistributedPlan() and move a bit more logic in there.	2017-01-23 09:23:50 -08:00
Andres Freund	9a82e8f06b	Make usage of static a bit more consistent in multi_planner.c.	2017-01-23 09:23:50 -08:00
Jason Petersen	56197dbdba	Add replication_model GUC This adds a replication_model GUC which is used as the replication model for any new distributed table that is not a reference table. With this change, tables with replication factor 1 are no longer implicitly MX tables. The GUC is similarly respected during empty shard creation for e.g. existing append-partitioned tables. If the model is set to streaming while replication factor is greater than one, table and shard creation routines will error until this invalid combination is corrected. Changing this parameter requires superuser permissions.	2017-01-23 09:05:14 -07:00
Burak Yucesoy	2e1df4c910	Reword error message for outer joins requiring repartition We changed error message which appears when user tries to execute outer join command and that command requires repartitioning. Old error message mentioned about 1-to-1 shard partitioning which may not be clear to user.	2017-01-23 10:42:36 +03:00
Marco Slot	87ae26aef3	Ensure job IDs are unique across workers	2017-01-22 16:55:14 +01:00
Andres Freund	3a36d32c43	Mark some now unnecessarily exposed multi_planner.c functions static.	2017-01-20 12:31:56 -08:00
Andres Freund	608bed0387	Don't duplicate planning logic in citus' explain hook. Instead use pg_plan_query() like the normal explain does, and use that to explain the query. That's important because it allows to remove the duplicated planner logic from multi_explain - and that logic is about to get more complicated.	2017-01-20 12:31:28 -08:00
Andres Freund	0f28a11970	Remove citus.explain_multi_logical/physical_plan. They make fixing explain for prepared statement harder, and they don't really fit into EXPLAIN in the first place. Additionally they're currently not exercised in any tests.	2017-01-20 12:31:19 -08:00
Metin Doslu	93e626c896	Refactor get_shard_id_for_distribution_column() and other minor changes	2017-01-20 14:38:01 +02:00
Onder Kalaci	a7ed49c16e	Improve error messages for INSERT INTO .. SELECT This commit is intended to improve the error messages while planning INSERT INTO .. SELECT queries. The main motivation for this change is that we used to map multiple cases into a single message. With this change, we added explicit error messages for many cases.	2017-01-16 12:16:14 -07:00
Murat Tuncer	e7935a3be4	Report error when original range table id is not found in NewTableId()	2017-01-13 09:39:43 +03:00
Murat Tuncer	77f8db6b14	Add view support Enables use views within distributed queries. User can create and use a view on distributed tables/queries as he/she would use with regular queries. After this change router queries will have full support for views, insert into select queries will support reading from views, not writing into. Outer joins would have a limited support, and would error out at certain cases such as when a view is in the inner side of the outer join. Although PostgreSQL supports writing into views under certain circumstances. We disallowed that for distributed views.	2017-01-13 09:39:42 +03:00
Murat Tuncer	cb1dfd0a17	Add hint to errored real time queries	2017-01-12 11:33:35 +03:00
Burak Yucesoy	59d3d05bc4	Error out on CTEs with data modifying statement With this change we start to error out on router planner queries where a common table expression with data-modifying statement is present. We already do not support if there is a data-modifying statement using result of the CTE, now we also error out if CTE itself is data-modifying statement.	2017-01-10 10:30:09 +02:00
Onder Kalaci	6d050fd677	Use 2PC for reference table modification With this commit, we ensure that router executor always uses 2PC for reference table modifications and never mark the placements of it as INVALID.	2017-01-04 12:46:35 +02:00
Eren Basak	7e09bd6836	Error on Unsupported Features on Workers This change makes the metadata workers error out on unsupported commands.	2017-01-02 16:03:45 +03:00
Murat Tuncer	2f76b4be99	Add error hint to failing modify query	2016-12-23 19:43:55 +03:00
Marco Slot	11031bcf55	Enable evaluation of stable functions in INSERT..SELECT	2016-12-23 12:47:21 +01:00
Marco Slot	d745d7bf70	Add explicit RelationShards mapping to tasks	2016-12-23 10:23:43 +01:00
Onder Kalaci	9f0bd4cb36	Reference Table Support - Phase 1 With this commit, we implemented some basic features of reference tables. To start with, a reference table is * a distributed table whithout a distribution column defined on it * the distributed table is single sharded * and the shard is replicated to all nodes Reference tables follows the same code-path with a single sharded tables. Thus, broadcast JOINs are applicable to reference tables. But, since the table is replicated to all nodes, table fetching is not required any more. Reference tables support the uniqueness constraints for any column. Reference tables can be used in INSERT INTO .. SELECT queries with the following rules: * If a reference table is in the SELECT part of the query, it is safe join with another reference table and/or hash partitioned tables. * If a reference table is in the INSERT part of the query, all other participating tables should be reference tables. Reference tables follow the regular co-location structure. Since all reference tables are single sharded and replicated to all nodes, they are always co-located with each other. Queries involving only reference tables always follows router planner and executor. Reference tables can have composite typed columns and there is no need to create/define the necessary support functions. All modification queries, master_* UDFs, EXPLAIN, DDLs, TRUNCATE, sequences, transactions, COPY, schema support works on reference tables as expected. Plus, all the pre-requisites associated with distribution columns are dismissed.	2016-12-20 14:09:35 +02:00
Murat Tuncer	c3a60bff70	Make router planner active at all times We used to disable router planner and executor when task executor is set to task-tracker. This change enables router planning and execution at all times regardless of task execution mode. We are introducing a hidden flag enable_router_execution to enable/disable router execution. Its default value is true. User may disable router planning by setting it to false.	2016-12-20 11:24:01 +03:00
Onder Kalaci	df974e15b8	Bugfix for deparsing INSERT..SELECT queries which involve constant values This commit fixes a bug when the SELECT target list includes a constant value. Previous behaviour of target list re-ordering: * Iterate over the INSERT target list * If it includes a Var, find the corresponding SELECT entry and update its resno accordingly * If it does not include a Var (which we only considered to be DEFAULTs), generate a new SELECT target entry * If the processed target entry count in SELECT target list is less than the original SELECT target list (GROUP BY elements not included in the SELECT target entry), add them in the SELECT target list and update the resnos accordingly. * However, this step was leading to add the CONST SELECT target entries twice. The reason is that when CONST target list entries appear in the SELECT target list, the INSERT target list doesn't include a Var. Instead, it includes CONST as it does for DEFAULTs. New behaviour of target list re-ordering: * Iterate over the INSERT target list * If it includes a Var, find the corresponding SELECT entry and update its resno accordingly * If it does not include a Var (which we consider to be DEFAULTs and CONSTs on the SELECT), generate a new SELECT target entry * If any target entry remains on the SELECT target list which are resjunk, (GROUP BY elements not included in the SELECT target entry), keep them in the SELECT target list by updating the resnos.	2016-12-01 10:41:56 +02:00
Murat Tuncer	45762006f3	Add support for filters Ensures filter clauses are stripped from master query, and pushed down to worker queries.	2016-12-01 08:53:46 +03:00
Onder Kalaci	a43e3bad56	Improve error semantics for INSERT..SELECT With this commit, we error out if a worker query cannot be executed on all placements of a target insert shard interval.	2016-10-27 14:09:05 +03:00
Brian Cloutier	1e6d1ef67e	Fix segfault during EXPLAIN EXECUTE Fix citusdata/citus#886 The way postgres' explain hook is designed means that our hook is never called during EXPLAIN EXECUTE. So, we special-case EXPLAIN EXECUTE by catching it in the utility hook. We then replace the EXECUTE with the original query and pass it back to Citus.	2016-10-26 15:18:42 +03:00
Onder Kalaci	1673ea937c	Feature: INSERT INTO ... SELECT This commit adds INSERT INTO ... SELECT feature for distributed tables. We implement INSERT INTO ... SELECT by pushing down the SELECT to each shard. To compute that we use the router planner, by adding an "uninstantiated" constraint that the partition column be equal to a certain value. standard_planner() distributes that constraint to all the tables where it knows how to push the restriction safely. An example is that the tables that are connected via equi joins. The router planner then iterates over the target table's shards, for each we replace the "uninstantiated" restriction, with one that PruneShardList() handles. Do so by replacing the partitioning qual parameter added in multi_planner() with the current shard's actual boundary values. Also, add the current shard's boundary values to the top level subquery to ensure that even if the partitioning qual is not distributed to all the tables, we never run the queries on the shards that don't match with the current shard boundaries. Finally, perform the normal shard pruning to decide on whether to push the query to the current shard or not. We do not support certain SQLs on the subquery, which are described/commented on ErrorIfInsertSelectQueryNotSupported(). We also added some locking on the router executor. When an INSERT/SELECT command runs on a distributed table with replication factor >1, we need to ensure that it sees the same result on each placement of a shard. So we added the ability such that router executor takes exclusive locks on shards from which the SELECT in an INSERT/SELECT reads in order to prevent concurrent changes. This is not a very optimal solution, but it's simple and correct. The citus.all_modifications_commutative can be used to avoid aggressive locking. An INSERT/SELECT whose filters are known to exclude any ongoing writes can be marked as commutative. See RequiresConsistentSnapshot() for the details. We also moved the decison of whether the multiPlan should be executed on the router executor or not to the planning phase. This allowed us to integrate multi task router executor tasks to the router executor smoothly.	2016-10-26 10:01:00 +03:00
Onder Kalaci	e0d83d65af	Add ability to reorder target list for INSERT/SELECT queries The necessity for this functionality comes from the fact that ruleutils.c is not supposed to be used on "rewritten" queries (i.e. ones that have been passed through QueryRewrite()). Query rewriting is the process in which views and such are expanded, and, INSERT/UPDATE targetlists are reordered to match the physical order, defaults etc. For the details of reordeing, see transformInsertRow().	2016-10-26 10:00:03 +03:00
Marco Slot	02d2b86e68	Re-disable master evaluation for SELECT	2016-10-21 10:51:47 +02:00
Marco Slot	9d98acfb6d	Move requiresMasterEvaluation from Task to Job	2016-10-19 08:23:06 +02:00
Andres Freund	ac14b2edbc	Support PostgreSQL 9.6 Adds support for PostgreSQL 9.6 by copying in the requisite ruleutils file and refactoring the out/readfuncs code to flexibly support the old-style copy/pasted out/readfuncs (prior to 9.6) or use extensible node APIs (in 9.6 and higher). Most version-specific code within this change is only needed to set new fields in the AggRef nodes we build for aggregations. Version-specific test output files were added in certain cases, though in most they were not necessary. Each such file begins by e.g. printing the major version in order to clarify its purpose. The comment atop citus_nodes.h details how to add support for new nodes for when that becomes necessary.	2016-10-18 16:23:55 -06:00
Metin Doslu	d03a2af778	Add HAVING support This commit completes having support in Citus by adding having support for real-time and task-tracker executors. Multiple tests are added to regression tests to cover new supported queries with having support.	2016-10-13 15:47:53 +03:00
Andres Freund	982ad66753	Introduce placement IDs. So far placements were assigned an Oid, but that was just used to track insertion order. It also did so incompletely, as it was not preserved across changes of the shard state. The behaviour around oid wraparound was also not entirely as intended. The newly introduced, explicitly assigned, IDs are preserved across shard-state changes. The prime goal of this change is not to improve ordering of task assignment policies, but to make it easier to reference shards. The newly introduced UpdateShardPlacementState() makes use of that, and so will the in-progress connection and transaction management changes.	2016-10-07 11:59:20 -07:00
Brian Cloutier	9d6699b07c	Switch from pg_worker_list.conf file to pg_dist_node metadata table. Related to #786 This change adds the `pg_dist_node` table that contains the information about the workers in the cluster, replacing the previously used `pg_worker_list.conf` file (or the one specified with `citus.worker_list_file`). Upon update, `pg_worker_list.conf` file is read and `pg_dist_node` table is populated with the file's content. After that, `pg_worker_list.conf` file is renamed to `pg_worker_list.conf.obsolete` For adding and removing nodes, the change also includes two new UDFs: `master_add_node` and `master_remove_node`, which require superuser permissions. 'citus.worker_list_file' guc is kept for update purposes but not used after the update is finished.	2016-10-05 13:01:35 +03:00
Andres Freund	6d050bc9f8	Initialize count_agg_clauses argument to 0. count_agg_clause adds the cost of the aggregates to the state variable, it doesn't reinitialize it. That is intentional, as it is used to incrementally add costs in some places.	2016-10-03 13:07:43 -07:00
Robin Thomas	c507a0df1c	During repartitions, the partitionColumnType argument sent to workers is now a `::regtype` using the qualified name of the column type, not the column type OID which may differ between master/worker nodes. Test coverage of a hash reparitition using a UDT as the join column. Note that the UDFs `worker_hash_partition_table` and `worker_range_partition_table` are unchanged, and rightly expect an OID for the column type; but the planner code building the commands now allows for `::regtype` casting to do its magic. Fixes citusdata/citus#111.	2016-10-03 13:41:20 -04:00
Onder Kalaci	a533b8e7c1	Differentiate worker and master job temporary folders This commit enables to create different worker and master temporary folders. This change is important for citus-mx on task-tracker execution. In simple words, on citus-mx, the worker could actually be reponsible for the master tasks as well. Prior to this change, both master and worker logic on task-tracker executor was accessing and using the same files for different purposes which was dangerous on certain cases (i.e., when task_tracker_delay is low).	2016-10-03 14:24:08 +03:00
Marco Slot	c4bc0742a7	Make count return 0 if all shards are pruned away Before this change, count on a distributed returned NULL if all shards were pruned away, because on the master we replace with count(..) call with a sum(..) call to sum the counts from the shards. However, sum returns NULL when there are no rows, whereas count is expected to return 0.	2016-09-29 20:27:26 +02:00
Murat Tuncer	5b42318ac4	Make where false queries router plannable	2016-09-28 18:49:26 +03:00
Marco Slot	3318288d75	Fix segmentation fault in case of joins with WHERE 1=0	2016-09-26 15:12:29 +02:00
Marco Slot	6f6cb1a0d6	Allow noop updates of the partition column	2016-09-07 14:22:41 +02:00
Metin Doslu	7d212b847f	Add outer join clause list extraction for subquery pushdown logic In subquery pushdown, we allow outer joins if the join condition is on the partition columns. WhereClauseList() used to return all join conditions including outer joins. However, this has been changed with a commit related to outer join support on regular queries. With this commit, we refactored ExtractFromExpressionWalker() to return two lists of qualifiers. The first list is for inner join and filter clauses and the second list is for outer join clauses. Therefore, we can also use outer join clauses to check subquery pushdown prerequisites.	2016-09-02 11:54:44 +03:00
Robin Thomas	010cbf16fc	Remove all usage of pg_dist_shard.shardalias in extension code. (#739 ) Remove regression test of non-null shardalias.	2016-08-19 17:06:22 +03:00
Burak Yucesoy	6f20af9e38	Remove schema name parameter from API functions We remove schema name parameter from worker_fetch_foreign_file and worker_fetch_regular_table functions. We now send schema name concatanated with table name.	2016-07-28 20:41:05 +03:00
Burak Yucesoy	a649b47bac	Add old version(without schema name parameter) of api functions back Fixes #676 We added old versions (i.e. without schema name) of worker_apply_shard_ddl_command, worker_fetch_foreign_file and worker_fetch_regular_table back. During function call of one of these functions, we set schema name as public schema and call the newer version of the functions.	2016-07-28 20:40:38 +03:00
Murat Tuncer	cc33a450c4	Expand router planner coverage We can now support richer set of queries in router planner. This allow us to support CTEs, joins, window function, subqueries if they are known to be executed at a single worker with a single task (all tables are filtered down to a single shard and a single worker contains all table shards referenced in the query). Fixes : #501	2016-07-27 23:35:38 +03:00
Murat Tuncer	c20080992d	Remove PostgreSQL 9.4 support	2016-07-26 20:16:09 +03:00
Murat Tuncer	5d996a6891	Fix outer join crash when subquery is flatten	2016-07-22 17:01:19 +03:00
Burak Yucesoy	b58872b441	Fix worker_fetch_regular_table with schema Fixes #504 Fixes #646 We changed signature of worker_fetch_regular_table to accept schema name as parameter to make it work with schemas.	2016-07-22 00:44:02 -06:00
Burak Yucesoy	20debfc0ee	Fix COUNT DISTINCT approximation with schema Fixes #555 Before this change, we were resolving HLL function and type Oid without qualified name. Now we find the schema name where HLL objects are stored and generate qualified names for each objects. Similar fix is also applied for cstore_table_size function call.	2016-07-21 17:29:18 +03:00
Murat Tuncer	4d992c8143	Make router planner use original query	2016-07-18 18:23:04 +03:00
Eren	5b54e28f93	Add LIMIT/OFFSET Support Fixes #394 This change adds LIMIT/OFFSET support for non router-plannable distributed queries. In cases that we can push the LIMIT down, we add the OFFSET value to that LIMIT in the worker queries. When a query with LIMIT x OFFSET y is issued, the query is propagated to the workers as LIMIT (x+y) OFFSET 0, and on the master table, the original LIMIT and OFFSET values are used. With this change, we can use OFFSET wherever we can use LIMIT.	2016-07-18 12:00:24 +03:00
Andres Freund	4cf0a4e48e	citus_indent fixups	2016-07-13 11:45:51 -07:00
Brian Cloutier	0cad3b22cc	Simplify code and fix include guards in citus_clauses	2016-07-13 11:45:51 -07:00
Brian Cloutier	08384ddc71	cosmetic changes	2016-07-13 11:45:51 -07:00
Brian Cloutier	af9515f669	Only reparse queries if the planner flags them for reparsing	2016-07-13 11:45:51 -07:00
Brian Cloutier	4820366a6f	citus_indent and some renaming	2016-07-13 11:45:51 -07:00

1 2 3 4

200 Commits (3f03cb6a6a666720de8f121fde4e765239f00d80)