citus

Commit Graph

Author	SHA1	Message	Date
Mehmet Yilmaz	bcb5b5e952	pg18 compile work cont	2025-07-01 13:03:51 +00:00
Naisila Puka	6bd3474804	Rename foreach_ macros to foreach_declared_ macros (#7700 ) This is prep work for successful compilation with PG17 PG17added foreach_ptr, foreach_int and foreach_oid macros Relevant PG commit 14dd0f27d7cd56ffae9ecdbe324965073d01a9ff `14dd0f27d7` We already have these macros, but they are different with the PG17 ones because our macros take a DECLARED variable, whereas the PG16 macros declare a locally-scoped loop variable themselves. Hence I am renaming our macros to foreach_declared_ I am separating this into its own PR since it touches many files. The main compilation PR is https://github.com/citusdata/citus/pull/7699	2025-03-12 11:01:49 +03:00
Nils Dijk	0620c8f9a6	Sort includes (#7326 ) This change adds a script to programatically group all includes in a specific order. The script was used as a one time invocation to group and sort all includes throught our formatted code. The grouping is as follows: - System includes (eg. `#include<...>`) - Postgres.h (eg. `#include "postgres.h"`) - Toplevel imports from postgres, not contained in a directory (eg. `#include "miscadmin.h"`) - General postgres includes (eg . `#include "nodes/..."`) - Toplevel citus includes, not contained in a directory (eg. `#include "citus_verion.h"`) - Columnar includes (eg. `#include "columnar/..."`) - Distributed includes (eg. `#include "distributed/..."`) Because it is quite hard to understand the difference between toplevel citus includes and toplevel postgres includes it hardcodes the list of toplevel citus includes. In the same manner it assumes anything not prefixed with `columnar/` or `distributed/` as a postgres include. The sorting/grouping is enforced by CI. Since we do so with our own script there are not changes required in our uncrustify configuration.	2023-11-23 18:19:54 +01:00
Nils Dijk	0dac63afc0	move pg_version_constants.h to toplevel include (#7335 ) In preparation of sorting and grouping all includes we wanted to move this file to the toplevel includes for good grouping/sorting.	2023-11-09 15:09:39 +00:00
Onur Tirtir	20a5f3af2b	Replace CITUS_TABLE_WITH_NO_DIST_KEY checks with HasDistributionKey() (#6743 ) Now that we will soon add another table type having DISTRIBUTE_BY_NONE as distribution method and that we want the code to interpret such tables mostly as distributed tables, let's make the definition of those other two table types more strict by removing CITUS_TABLE_WITH_NO_DIST_KEY macro. And instead, use HasDistributionKey() check in the places where the logic applies to all table types that have / don't have a distribution key. In future PRs, we might want to convert some of those HasDistributionKey() checks if logic only applies to Citus local / reference tables, not the others. And adding HasDistributionKey() also allows us to consider having DISTRIBUTE_BY_NONE as the distribution method as a "table attribute" that can apply to distributed tables too, rather something that determines the table type.	2023-03-10 13:55:52 +03:00
Onur Tirtir	81af605e07	Fix typo: "no sharding pruning constraints" -> "no shard pruning constraints" (#5490 )	2021-11-25 21:00:44 +01:00
Onur Tirtir	b118d4188e	Fix lower boundary calculation when pruning range dist table shards (#5082 ) This happens only when we have a "<" or "<=" filter on distribution column of a range distributed table and that filter falls in between two shards. When the filter falls in between two shards: If the filter is ">" or ">=", then UpperShardBoundary was returning "upperBoundIndex - 1", where upperBoundIndex is exclusive shard index used during binary seach. This is expected since upperBoundIndex is an exclusive index. If the filter is "<" or "<=", then LowerShardBoundary was returning "lowerBoundIndex + 1", where lowerBoundIndex is inclusive shard index used during binary seach. On the other hand, since lowerBoundIndex is an inclusive index, we should just return lowerBoundIndex instead of doing "+ 1". Before this commit, we were missing leftmost shard in such queries. * Remove useless conditional branches The branch that we delete from UpperShardBoundary was obviously useless. The other one in LowerShardBoundary became useless after we remove "+ 1" from there. This indeed is another proof of what & how we are fixing with this pr. * Improve comments and add more * Add some tests for upper bound calculation too	2021-07-02 14:48:21 +03:00
SaitTalhaNisanci	03832f353c	Drop postgres 11 support	2021-03-25 09:20:28 +03:00
Sait Talha Nisanci	eebcd995b3	Add some more tests	2020-12-15 18:17:10 +03:00
Sait Talha Nisanci	44953579cf	Enable citus-local distributed table joins Check equality in quals We want to recursively plan distributed tables only if they have an equality filter on a unique column. So '>' and '<' operators will not trigger recursive planning of distributed tables in local-distributed table joins. Recursively plan distributed table only if the filter is constant If the filter is not a constant then the join might return multiple rows and there is a chance that the distributed table will return huge data. Hence if the filter is not constant we choose to recursively plan the local table.	2020-12-15 18:17:10 +03:00
SaitTalhaNisanci	180195b445	Remove unused parameter from VarConstOpExprClause (#4348 )	2020-11-25 21:00:22 +03:00
SaitTalhaNisanci	9c44911226	Improve error messages in shard pruning (#4324 )	2020-11-18 17:16:06 +03:00
SaitTalhaNisanci	366461ccdb	Introduce cache entry/table utilities (#4132 ) Introduce table entry utility functions Citus table cache entry utilities are introduced so that we can easily extend existing functionality with minimum changes, specifically changes to these functions. For example IsNonDistributedTableCacheEntry can be extended for citus local tables without the need to scan the whole codebase and update each relevant part. * Introduce utility functions to find the type of tables A table type can be a reference table, a hash/range/append distributed table. Utility methods are created so that we don't have to worry about how a table is considered as a reference table etc. This also makes it easy to extend the table types. * Add IsCitusTableType utilities * Rename IsCacheEntryCitusTableType -> IsCitusTableTypeCacheEntry * Change citus table types in some checks	2020-09-02 22:26:05 +03:00
Onder Kalaci	06461ca55f	Coerce types properly for INSERT Also, unify similar code-paths to rely on more accurate function.	2020-06-10 10:40:28 +02:00
Philip Dubé	c0a95a3adb	Copy data from CitusTableCacheEntry more often This copies over fixes from reference counting branch, all CitusTableCacheEntry data may be freed when a GetCitusTableCacheEntry call occurs for its relationId This fix is not complete, but reference counting is being deferred until 9.4 CopyShardInterval: remove dest parameter, always return newly allocated object	2020-04-17 14:17:18 +00:00
SaitTalhaNisanci	ba01f3457a	use macros for pg versions instead of hardcoded values (#3694 ) 3 Macros are defined for removing the hardcoded pg versions. PG_VERSION_11, PG_VERSION_12 and PG_VERSION_13.	2020-04-01 17:01:52 +03:00
Jelte Fennema	56863e8f0b	Really ignore -Wgnu-variable-sized-type-not-at-end (#3627 )	2020-03-20 11:53:28 +01:00
Philip Dubé	7cdfa1daab	Rename LookupCitusTableCacheEntry to GetCitusTableCacheEntry, LookupLookupCitusTableCacheEntry back to LookupCitusTableCacheEntry	2020-03-08 14:08:23 +00:00
Philip Dubé	a7cca1bcde	Rename DistTableCacheEntry to CitusTableCacheEntry	2020-03-07 14:08:03 +00:00
Philip Dubé	bec58000d6	Given IsDistributedTableRTE, there's ambiguity in what DistributedTable means Elsewhere we used DistributedTable to include reference tables Marco suggested we use CitusTable for distributed & reference tables So renaming: - IsDistributedTable -> IsCitusTable - IsDistributedTableViaCatalog -> IsCitusTableViaCatalog - DistributedTableCacheEntry -> CitusTableCacheEntry - DistributedTableList -> CitusTableList - isDistributedTable -> isCitusTable - InsertSelectIntoDistributedTable -> InsertSelectIntoCitusTable - ExtractFirstDistributedTableId -> ExtractFirstCitusTableId	2020-03-06 18:57:55 +00:00
Jelte Fennema	685b54b3de	Semmle: Check for NULL in some places where it might occur (#3509 ) Semmle reported quite some places where we use a value that could be NULL. Most of these are not actually a real issue, but better to be on the safe side with these things and make the static analysis happy.	2020-02-27 10:45:29 +01:00
Hadi Moshayedi	e7cce40e6e	Address pykello's feedback	2020-02-26 07:17:32 -08:00
Hadi Moshayedi	1b3e58f0c3	Merge branch 'improve-shard-pruning' of https://github.com/MarkusSintonen/citus into MarkusSintonen-improve-shard-pruning	2020-02-26 07:13:33 -08:00
Jelte Fennema	8de8b62669	Convert unsafe APIs to safe ones	2020-02-25 15:39:27 +01:00
Philip Dubé	08f6842d50	Fix typos Equivalance -> Equivalence utillity -> utility shorted lived one -> shortly lived one elegible -> eligible	2020-02-18 17:14:40 +00:00
Markus Sintonen	cf8319b992	Add comment, add subquery NOT tests	2020-02-16 01:21:10 +02:00
Markus Sintonen	3d3d615040	Add comment about NOT_EXPR. Treat it as invalid constraint for safety.	2020-02-15 16:54:38 +02:00
Philip Dubé	7382c8be00	Clean up from code review Only change to behavior is: - don't ignore array const's constcollid in SAORestrictions - don't end lines with commas in DebugLogPruningInstance	2020-02-14 17:58:23 +00:00
Markus Sintonen	cdedb98c54	Improve shard pruning logic to understand OR-conditions. Previously a limitation in the shard pruning logic caused multi distribution value queries to always go into all the shards/workers whenever query also used OR conditions in WHERE clause. Related to https://github.com/citusdata/citus/issues/2593 and https://github.com/citusdata/citus/issues/1537 There was no good workaround for this limitation. The limitation caused quite a bit of overhead with simple queries being sent to all workers/shards (especially with setups having lot of workers/shards). An example of a previous plan which was inadequately pruned: ``` EXPLAIN SELECT count() FROM orders_hash_partitioned WHERE (o_orderkey IN (1,2)) AND (o_custkey = 11 OR o_custkey = 22); QUERY PLAN --------------------------------------------------------------------- Aggregate (cost=0.00..0.00 rows=0 width=0) -> Custom Scan (Citus Adaptive) (cost=0.00..0.00 rows=0 width=0) Task Count: 4 Tasks Shown: One of 4 -> Task Node: host=localhost port=xxxxx dbname=regression -> Aggregate (cost=13.68..13.69 rows=1 width=8) -> Seq Scan on orders_hash_partitioned_630000 orders_hash_partitioned (cost=0.00..13.68 rows=1 width=0) Filter: ((o_orderkey = ANY ('{1,2}'::integer[])) AND ((o_custkey = 11) OR (o_custkey = 22))) (9 rows) ``` After this commit the task count is what one would expect from the query defining multiple distinct values for the distribution column: ``` EXPLAIN SELECT count() FROM orders_hash_partitioned WHERE (o_orderkey IN (1,2)) AND (o_custkey = 11 OR o_custkey = 22); QUERY PLAN --------------------------------------------------------------------- Aggregate (cost=0.00..0.00 rows=0 width=0) -> Custom Scan (Citus Adaptive) (cost=0.00..0.00 rows=0 width=0) Task Count: 2 Tasks Shown: One of 2 -> Task Node: host=localhost port=xxxxx dbname=regression -> Aggregate (cost=13.68..13.69 rows=1 width=8) -> Seq Scan on orders_hash_partitioned_630000 orders_hash_partitioned (cost=0.00..13.68 rows=1 width=0) Filter: ((o_orderkey = ANY ('{1,2}'::integer[])) AND ((o_custkey = 11) OR (o_custkey = 22))) (9 rows) ``` "Core" of the pruning logic works as previously where it uses `PrunableInstances` to queue ORable valid constraints for shard pruning. The difference is that now we build a compact internal representation of the query expression tree with PruningTreeNodes before actual shard pruning is run. Pruning tree nodes represent boolean operators and the associated constraints of it. This internal format allows us to have compact representation of the query WHERE clauses which allows "core" pruning logic to work with OR-clauses correctly. For example query having `WHERE (o_orderkey IN (1,2)) AND (o_custkey=11 OR (o_shippriority > 1 AND o_shippriority < 10))` gets transformed into: 1. AND(o_orderkey IN (1,2), OR(X, AND(X, X))) 2. AND(o_orderkey IN (1,2), OR(X, X)) 3. AND(o_orderkey IN (1,2), X) Here X is any set of unknown condition(s) for shard pruning. This allow the final shard pruning to correctly recognize that shard pruning is done with the valid condition of `o_orderkey IN (1,2)`. Another example with unprunable condition in query `WHERE (o_orderkey IN (1,2)) OR (o_custkey=11 AND o_custkey=22)` gets transformed into: 1. OR(o_orderkey IN (1,2), AND(X, X)) 2. OR(o_orderkey IN (1,2), X) Which is recognized as unprunable due to the OR condition between distribution column and unknown constraint -> goes to all shards. Issue https://github.com/citusdata/citus/issues/1537 originally suggested transforming the query conditions into a full disjunctive normal form (DNF), but this process of transforming into DNF is quite a heavy operation. It may "blow up" into a really large DNF form with complex queries having non trivial `WHERE` clauses. I think the logic for shard pruning could be simplified further but I decided to leave the "core" of the shard pruning untouched.	2020-02-14 17:58:13 +00:00
Onder Kalaci	975c4c2264	Do not prune shards if the distribution key is NULL The root of the problem is that, standard_planner() converts the following qual ``` {OPEXPR :opno 98 :opfuncid 67 :opresulttype 16 :opretset false :opcollid 0 :inputcollid 100 :args ( {VAR :varno 1 :varattno 1 :vartype 25 :vartypmod -1 :varcollid 100 :varlevelsup 0 :varnoold 1 :varoattno 1 :location 45 } {CONST :consttype 25 :consttypmod -1 :constcollid 100 :constlen -1 :constbyval false :constisnull true :location 51 :constvalue <> } ) :location 49 } ``` To ``` ( {CONST :consttype 16 :consttypmod -1 :constcollid 0 :constlen 1 :constbyval true :constisnull true :location -1 :constvalue <> } ) ``` So, Citus doesn't deal with NULL values in real-time or non-fast path router queries. And, in the FastPathRouter planner, we check constisnull in DistKeyInSimpleOpExpression(). However, in deferred pruning case, we do not check for isnull for const. Thus, the fix consists of two parts: - Let PruneShards() not crash when NULL parameter is passed - For deferred shard pruning in fast-path queries, explicitly check that we have CONST which is not NULL	2020-02-13 15:00:31 +01:00
Marco Slot	1633123d78	Fix crash in IN (NULL) queries	2019-12-13 08:35:54 +01:00
SaitTalhaNisanci	13204487e9	remove copyright years (#3286 )	2019-12-11 21:14:08 +03:00
Philip Dubé	fcf2fd819b	Add distributioncolumncollation to to pg_dist_colocation Use partition column's collation for range distributed tables Don't allow non deterministic collations for hash distributed tables CoPartitionedTables: don't compare unequal types	2019-12-09 19:51:40 +00:00
Philip Dubé	261a9de42d	Fix typos: VAR_SET_VALUE_KIND -> VAR_SET_VALUE kind beginnig -> beginning plannig -> planning the the -> the er then -> er than	2019-11-25 23:24:13 +00:00
Jelte Fennema	1d8dde232f	Automatically convert useless declarations using regex replace (#3181 ) * Add declaration removal to CI * Convert declarations	2019-11-21 13:47:29 +01:00
Jelte Fennema	f0c35ad134	Include fmgr.h, don't duplicate FunctionCallInfo typedef	2019-11-04 17:10:33 +00:00
Philip Dubé	e5cd298a98	pg12 revised layout of FunctionCallInfoData See `a9c35cf85c` clang raises a warning due to FunctionCall2InfoData technically being variable sized This is fine, as the struct is the size we want it to be. So silence the warning	2019-08-22 19:02:35 +00:00
Jason Petersen	4c7f78bd7e	Code review feedback	2019-03-25 22:07:27 -05:00
Jason Petersen	6a0dc7756e	Formatting fixes Noticed a lot of weird lines wrapped at 80; our standard is 90.	2019-03-22 20:32:19 -06:00
Jason Petersen	6acf52660c	Always coerce RHS of pruning op to part. key type Our assumption that strip_implicit_coercions would leave us with a bi- nary-compatible type to that of the partition key was wrong. Instead, we should ensure the RHS of the comparison we perform is proactively coerced into a compatible type (at least binary compatible).	2019-03-22 20:32:19 -06:00
Jason Petersen	5baa257c91	Add second assert to guard against future changes This isn't entirely necessary but I feel safer with it here.	2019-03-22 20:32:19 -06:00
Jason Petersen	69adb627c3	Add Assert that will crash before coercion fix is in	2019-03-22 20:32:19 -06:00
Marco Slot	fd4ff29f2f	Add a debug message with distribution column value	2018-06-05 15:09:17 +03:00
Marco Slot	6ba3f42d23	Rename MultiPlan to DistributedPlan	2017-11-22 09:36:24 +01:00
Metin Doslu	0d052e9864	Fix a crash on zero-shard tables	2017-08-18 13:53:59 +03:00
velioglu	7e436c0277	Add bool expression to pruning instance with a function	2017-08-10 08:56:36 +03:00
Andres Freund	e8b793c454	Support for IN (const, list) and = ANY(const, b, c) pruning.	2017-08-10 08:56:36 +03:00
Jason Petersen	9018e698ec	Indentation cleanup Uncrustify 0.65 appears to have changed some defaults, resulting in breakages for those of us who have already upgraded; Travis still uses Uncrustify 0.64, but these changes work with both versions (assuming appropriately updated config), so this should permit use of either version for the time being.	2017-07-11 15:59:28 -06:00
Andres Freund	90b211267d	Perform range based pruning if equality pruning has survivor. We previously dismissed this as unimportant, but it turns out to be very useful for the upcoming subquery pushdown, where a user might specify an equality constraint in a subquery, and the subquery pushdown machinery adds >= and <= restrictions on the shard boundary. Previously the latter restriction was ignored.	2017-04-28 17:35:18 -07:00
Andres Freund	6c08fe72f9	Use stricter qual for pruning if both >/< and >=/<= are present. Previously, if both =< and < (>= and < respectively) were specified, we always used the latter restriction. Instead use the stricter one.	2017-04-28 17:35:18 -07:00

1 2

51 Commits (bcb5b5e952c0077f6d8a8974efea7dbbd196ea12)