citus

Commit Graph

Author	SHA1	Message	Date
Naisila Puka	35b4ddc355	Pg15 support (#6085 ) * Adjust configure script to allow PG15 * Adds copy of ruleutils_14.c as ruleutils_15.c * Uses get_namespace_name_or_temp in ruleutils_15.c Relevant PG commit: 48c5c9068211e0a04fd9553c8714b2821ed3ad17 * Clean up code using "(expr) ? true : false" in ruleutils_15.c Relevant PG commit: fd0625c7a9c679c0c1e896014b8f49a489c3a245 * Change varno from Index (unsigned int) to int in ruleutils_15.c Relevant PG commit: e3ec3c00d85bd2844ffddee83df2bd67c4f8297f * Adds find_recursive_union to ruleutils_15.c Relevant PG commit: 3f50b82639637c9908afa2087de7588450aa866b * Fix display of SQL-std func's args in INSERT/SELECT in ruleutils_15.c Relevant PG commit: a8d8445a7b2f80f6d0bfe97b19f90bd2cbef8759 * Fix ruleutils_15.c's dumping of whole-row Vars in more contexts Relevant PG commit: 43c2175121c829c8591fc5117b725f1f22bfb670 * Fix assorted missing logic for GroupingFunc nodes in ruleutils_15.c Relevant PG commit: 2591ee8ec44d8cbc8e1226550337a64c684746e4 * Adds grammar support for SQL/JSON clauses in ruleutils_15.c Relevant PG commit: f79b803dcc98d707450e158db3638dc67ff8380b * Adds SQL/JSON constructors to ruleutils_15.c Relevant PG commits: f4fb45d15c59d7add2e1b81a9d477d0119a9691a cc7401d5ca498a84d9b47fd2e01cebd8e830e558 * Adds support for MERGE in ruleutils_15.c Relevant PG commit: 7103ebb7aae8ab8076b7e85f335ceb8fe799097c * Add IS JSON predicate to ruleutils_15.c Relevant PG commit: 33a377608fc29cdd1f6b63be561eab0aee5c81f0 * Add SQL/JSON query functions to ruleutils_15.c Relevant PG commit: 1a36bc9dba8eae90963a586d37b6457b32b2fed4 * Adds three different SQL/JSON values to ruleutils_15.c Relevant PG commits: 606948b058dc16bce494270eea577011a602810e 49082c2cc3d8167cca70cfe697afb064710828ca * Adds JSON table functions in ruleutils_15.c Relevant PG commit: 4e34747c88a03ede6e9d731727815e37273d4bc9 * Add PLAN function for JSON table in ruleutils_15.c Relevant PG commit: fadb48b00e02ccfd152baa80942de30205ab3c4f * Remove extra blank lines 
before block-closing braces ruleutils_15.c Relevant PG commit: 24d2b2680a8d0e01b30ce8a41c4eb3b47aca5031 * set_deparse_plan: Reuse variable to appease Coverity ruleutils_15.c Relevant PG commit: e70813fbc4aaca35ec012d5a426706bd54e4acab * Mechanical code beautification ruleutils_15.c Relevant PG commit: 23e7b38bfe396f919fdb66057174d29e17086418 * Rename value_type to item_type in ruleutils_15.c Relevant PG commit: 3ab9a63cb638a1fd99475668e2da9c237495aeda * Show 'AS "?column?"' explicitly when it's important in ruleutils_15.c Relevant PG commit: c7461fc25558832dd347a9c8150b0f1ed85e36e8 * Fix ruleutils_15.c issues with dropped cols in funcs-returning-composite Relevant PG commit: c1d1e8469c77ce6b8e5310955580b4a3eee7fe96 * Change comment regarding functions returning composite in ruleutils_15.c Relevant PG commit: c2fa113ddb1117b1f03e91960f65d5d7d8a90270 * Replace int nodes with bool nodes where needed In PG15, Boolean nodes are added. Pre PG15, internal Boolean values in Create Role commands were represented by Integer nodes. This commit replaces int nodes logic with bool nodes logic where needed. Mostly there are CREATE ROLE logic changes. Relevant PG commit: 941460fcf731a32e6a90691508d5cfa3d1f8eeaf * Handle new option colliculocale in CREATE COLLATION logic In PG15, there is an added option to use ICU as global locale provider. pg_collation has three locale-related fields: collcollate and collctype, which are libc-related fields, and a new one colliculocale, which is the ICU-related field. Only the libc-related fields or the ICU-related field is set, never both. Relevant PG commits: f2553d43060edb210b36c63187d52a632448e1d2 54637508f87bd5f07fb9406bac6b08240283be3b * Add PG15 tests to CI using test images that have 15beta2 (#6093) * Change warning message in pg_signal_backend() Relevant PG commit: 7fa945b857cc1b2964799411f1633468826861ff * Revert "Add missing ifdef for PG 15" This reverts commit `c7b51025ab`. * Fixes tests for ALTER TRIGGER RENAME consistency for part. 
tables Relevant PG commit: 80ba4bb383538a2ee846fece6a7b8da9518b6866 * Prevent creating child triggers on partitions when adding new node Pre PG15, tgisinternal is true for a "child" trigger on a partition cloned from the trigger on the parent. In PG15, tgisinternal is false in that case. However, we don't want to create this trigger on the partition since it will create a conflict when we try to attach the partition to the parent table: ERROR: trigger "..." for relation "{partition_name}" already exists Relevant PG commit: f4566345cf40b068368cb5617e61318da60676ec * Fix tests for generated columns dependency changes In PG15, for GENERATED columns, all dependencies of the generation expression are recorded as NORMAL dependencies of the column itself. This requires CASCADE to drop generated cols with the original col. Pre PG15, dependencies were recorded as AUTO, with which generated columns are silently dropped with the original column. Relevant PG commit: cb02fcb4c95bae08adaca1202c2081cfc81a28b5 * Explicitly cast catalog "char" column to text before concatenation Relevant PG commit: 07eee5a0dc642d26f44d65c4e6263304208e8583 * Remove 'AS "?column?"' from test outputs There were some instances in the following test outputs, in planning debug outputs, where AS "?column?" is added. We add a normalization rule to remove it as it is not important. cte_inline.out recursive_relation_planning_restriction_pushdown.out Relevant PG commit: c7461fc25558832dd347a9c8150b0f1ed85e36e8 * Use pg_backup_stop(PG15) instead of pg_stop_backup(PG<15) Add an alternative test output because of the change in the backup modes of Postgres. Specifically here, there is a renaming issue: pg_stop_backup PRE PG15 vs pg_backup_stop PG15+ The alternative output can be deleted when we drop support for PG14 Relevant PG commit: 39969e2a1e4d7f5a37f3ef37d53bbfe171e7d77a * Adds citus.mitmfifo GUC Previously we were setting this configuration parameter on the fly for the failure tests schedule. 
However, PG15 doesn't allow that anymore: reserved prefixes like "citus" cannot be used to set non-existing GUCs. Relevant PG commit: 88103567cb8fa5be46dc9fac3e3b8774951a2be7 * Handles EXPLAIN output diffs in PG15 - Extra result lines To handle extra "Result" lines in explain outputs, we add an explain method to multi_test_helpers.sql file - plan_without_result_lines() is added for cases where we want the whole explain output with only "Result" lines removed * Handles EXPLAIN output diffs in PG15, Hash Agg/Join leverage To handle differences in usage of GroupAggregate vs HashAggregate or Merge Join vs Hash join in cases where this detail doesn't seem to matter, we use coordinator_plan(). - coordinator_plan() is updated to remove "Result" lines There are some cases where we have subplans so we add a new function that prints all Task Count lines as well - coordinator_plan_with_subplans() Still not sure of the relevant PG commit Could be db0d67db2401eb6238ccc04c6407a4fd4f985832 but disabling enable_group_by_reordering didn't help. * Handles EXPLAIN output diffs in PG15: enable_group_by_reordering Relevant PG commit db0d67db2401eb6238ccc04c6407a4fd4f985832 * Normalizes Memory Usage, Buckets, Batches for PG15 explain diffs We create a new function in multi_test_helpers, which is similar to explain_merge function in PG15. This explain helper function normalizes Memory Usage, Buckets and Batches, and we use it in the tests which give a different output for PG15. 
* Bump test images to 15beta3 (#6172) * Omit namespace in post-copy errmsg Relevant PG commit: 069d33d0c5a021601245e44df77a0423ddd69359 * Handles EXPLAIN output diffs in PG15: extra arrows&result lines To handle extra "->" arrows resulting from extra Result lines in explain outputs, we add the following explain method to multi_test_helpers.sql file - plan_without_arrows() is added for cases where we want the whole explain output without arrows and without Result lines * Alters public schema's owner to pg_database_owner in PG15 In PG15, public schema is owned by pg_database_owner role. In multi_extension, we drop and recreate the public schema, hence its owner becomes the default user in our tests, postgres. Change that to pg_database_owner for PG15 consistency. This results in alternative test output for public schema grants in the following test: grant_on_schema_propagation.sql Relevant PG commit: b073c3ccd06e4cb845e121387a43faa8c68a7b62 * Add alternative test outputs for change in Insert Select display citus_local_tables_queries.sql coordinator_shouldhaveshards.sql cte_inline.sql insert_select_repartition.sql intermediate_result_pruning.sql local_shard_execution.sql local_shard_execution_replicated.sql multi_deparse_shard_query.sql multi_insert_select.sql multi_insert_select_conflict.sql multi_mx_insert_select_repartition.sql mx_coordinator_shouldhaveshards.sql single_node.sql Relevant PG commit: a8d8445a7b2f80f6d0bfe97b19f90bd2cbef8759 * Fixes columnar tap tests for PG15 In PG15, Perl test modules have been moved to a new namespace. Also, postgres node new() and get_new_node() methods have been unified to one method: new() We create separate tap tests for PG13/14 and PG15+ and update the Makefiles accordingly. Relevant PG commits: 201a76183e2056c2217129e12d68c25ec9c559c8 b3b4d8e68ae83f432f43f035c7eb481ef93e1583 * Handles EXPLAIN output diffs in PG15: HashAgg Leverage,alt. 
output Still not sure of the relevant PG commit Could be db0d67db2401eb6238ccc04c6407a4fd4f985832 but disabling enable_group_by_reordering didn't help.	2022-08-24 17:59:17 +02:00
Marco Slot	72d8fde28b	Use intermediate results for re-partition joins	2022-02-23 19:40:21 +01:00
Burak Velioglu	fa6866ed36	Start to propagate functions to worker nodes with CREATE FUNCTION command together with its dependencies. If the function depends on any nondistributable object, the function will be created only locally. Parameterless version of create_distributed_function becomes obsolete with this change; it will be deprecated from the code with a subsequent PR.	2022-02-18 13:56:51 +03:00
Halil Ozan Akgul	87a1c760d9	Fix tests in multi-1-schedule that fail with metadata syncing	2021-11-26 12:09:53 +03:00
SaitTalhaNisanci	03832f353c	Drop postgres 11 support	2021-03-25 09:20:28 +03:00
Onder Kalaci	e65e72130d	Rename use -> shouldUse Because setting the flag doesn't necessarily mean that we'll use 2PC. If connections are read-only, we will not use 2PC. In other words, we'll use 2PC only for connections that modified any placements.	2021-03-12 08:29:43 +00:00
Onder Kalaci	6a7ed7b309	Do not trigger 2PC for reads on local execution Before this commit, Citus used 2PC no matter what kind of local query execution happens. For example, if the coordinator has shards (and the workers as well), even a simple SELECT query could start 2PC: ```SQL WITH cte_1 AS (SELECT * FROM test LIMIT 10) SELECT count(*) FROM cte_1; ``` In this query, the local execution of the shards (and also intermediate result reads) triggers the 2PC. To prevent that, Citus now distinguishes local reads and local writes. And, Citus switches to 2PC only if a modification happens. This may still lead to unnecessary 2PCs when there is a local modification and remote SELECTs only. Though, we handle that separately via #4587.	2021-03-12 08:29:43 +00:00
Onder Kalaci	c804c9aa21	Allow local execution for intermediate results in COPY When COPY is used for copying into co-located files, it was not allowed to use local execution. The primary reason was Citus treating co-located intermediate results as co-located shards, and COPY into the distributed table was done via "format result". And, local execution of such COPY commands was not implemented. With this change, we implement support for local execution with "format result". To do that, we use the buffer for every file on shardState->copyOutState, similar to how local copy on shards is implemented. In fact, the logic is similar to local copy on shards, but instead of writing to the shards, Citus writes the results to a file. The logic relies on LOCAL_COPY_FLUSH_THRESHOLD, and flushes only when the size exceeds the threshold. But, unlike local copy on shards, in this case we write the headers and footers just once.	2021-02-09 15:00:06 +01:00
Sait Talha Nisanci	24e60b44a1	Consider coordinator in intermediate result optimization It seems that we were not considering the case where coordinator was added to the cluster as a worker in the optimization of intermediate results. This could lead to errors when coordinator was added as a worker.	2021-02-03 20:02:03 +03:00
Marco Slot	03328e9679	Rename citus_tables column names to be query-friendly	2021-01-21 18:58:30 +01:00
Halil Ozan Akgul	2be14cce2e	Adds alter_distributed_table and alter_table_set_access_method UDFs	2021-01-13 16:02:39 +03:00
Marco Slot	5de3337b2f	Support local execution for INSERT..SELECT with re-partitioning	2021-01-06 16:15:53 +01:00
Ahmet Gedemenli	1f36ff7c17	Prevent deadlock for long named partitioned index creation on single node (#4461) * Prevent deadlock for long named partitioned index creation on single node * Create IsSingleNodeCluster function * Use both local and sequential execution	2021-01-05 13:39:13 +03:00
naisila	59a81491e8	Add test for master_create_empty_shard on coordinator	2020-12-24 17:59:40 +03:00
Onder Kalaci	82a4830c7d	Adjust the existing regression tests	2020-12-15 18:17:10 +03:00
Önder Kalacı	e7079d1384	Add orderbys to some tests (#4162)	2020-09-14 16:59:22 +02:00
Onder Kalaci	c25de2cf22	Remove flag from As it doesn't make any sense anymore	2020-07-20 12:45:05 +02:00
SaitTalhaNisanci	ab5be77709	test coordinator reference-distributed table join (#3698)	2020-07-14 11:43:03 +03:00
Sait Talha Nisanci	d97d03ec65	use ActivePrimaryNodeList to include coordinator ActiveReadableWorkerNodeList doesn't include coordinator, however if coordinator is added as a worker, we should also include that while planning. The current methods are very easily misusable and this requires a refactoring to make the distinction between methods that include coordinator and that don't very explicit as they can introduce subtle/major bugs pretty easily.	2020-07-13 19:20:15 +03:00
Sait Talha Nisanci	db1b78148c	send schema creation/cleanup to coordinator in repartitions We were using ALL_WORKERS TargetWorkerSet while sending temporary schema creation and cleanup. We (well mostly I) thought that ALL_WORKERS would also include coordinator when it is added as a worker. It turns out that it was FILTERING OUT the coordinator even if it is added as a worker to the cluster. So to have some context here, in repartitions, for each jobId we create (at least we were supposed to) a schema in each worker node in the cluster. Then we partition each shard table into some intermediate files, which is called the PARTITION step. So after this partition step each node has some intermediate files having tuples in those nodes. Then we fetch the partition files to necessary worker nodes, which is called the FETCH step. Then from the files we create intermediate tables in the temporarily created schemas, which is called a MERGE step. Then after evaluating the result, we remove the temporary schemas (one for each job ID in each node) and files. If node 1 has file1, and node 2 has file2 after PARTITION step, it is enough to either move file1 from node1 to node2 or vice versa. So we prune one of them. In the MERGE step, if the schema for a given jobID doesn't exist, the node tries to use the `public` schema if it is a superuser, which was actually added for testing in the past. So when we were not sending schema creation commands for each job ID to the coordinator (because we were using ALL_WORKERS flag, and it doesn't include the coordinator), we would basically not have any schemas for repartitions in the coordinator. The PARTITION step would be executed on the coordinator (because the tasks are generated in the planner part) and it wouldn't give us any error because it doesn't have anything to do with the temporary schemas (that we didn't create). 
But later two things would happen: - If by chance the fetch is pruned on the coordinator side, the other nodes would fetch the partitioned files from the coordinator and execute the query as expected, because it has all the information. - If the fetch tasks are not pruned in the coordinator, in the MERGE step, the coordinator would either error out saying that the necessary schema doesn't exist, or it would try to create the temporary tables under public schema (if it is a superuser). But then if we had the same task ID with different jobID it would fail saying that the table already exists, which is an error we were getting. In the first case, the query would work okay, but it would still not do the cleanup, hence we would leave the partitioned files from the PARTITION step there. Hence ensure_no_intermediate_data_leak would fail. To make things more explicit and prevent such bugs in the future, ALL_WORKERS is renamed to ALL_NON_COORD_WORKERS. And a new flag to return all the active nodes is added as ALL_DATA_NODES. For repartition case, we don't use the only-reference table nodes but this version makes the code simpler and there shouldn't be any significant performance issue with that.	2020-07-13 19:20:15 +03:00
Jelte Fennema	ab01571c9e	Fix crash with single node dummy placement (#3993) Static analysis found an issue where we could dereference `NULL`, because `CreateDummyPlacement` could return `NULL` when there were no workers. This PR changes it so that it never returns `NULL`, which was intended by @marcocitus when doing this change: https://github.com/citusdata/citus/pull/3887/files#r438136433 While adding tests for citus on a single node I also added some more basic tests and it turns out we error out on repartition joins. This has been present since `shouldhaveshards` was introduced and is not trivial to fix. So I created a separate issue for this: https://github.com/citusdata/citus/issues/3996	2020-07-08 17:11:25 +02:00
Marco Slot	24feadc230	Handle joins between local/reference/cte via router planner	2020-06-12 18:36:01 -07:00
SaitTalhaNisanci	cbda951395	Fix task copy and appending empty task in ExtractLocalAndRemoteTasks (#3802) * Not append empty task in ExtractLocalAndRemoteTasks ExtractLocalAndRemoteTasks extracts the local and remote tasks. If we do not have a local task the localTaskPlacementList will be NIL, in this case we should not append anything to local tasks. Previously we would first check if a task contains a single placement or not, now we first check if there is any local task before doing anything. * fix copy of node task Task node has task query, which might contain a list of strings in its fields. We were using postgres copyObject for these lists. Postgres assumes that each element of list will be a node type. If it is not a node type it will error. As a solution to that, a new macro is introduced to copy a list of strings.	2020-04-29 11:05:34 +03:00
SaitTalhaNisanci	24dcb02bca	enable local table join with reference table (#3697) * enable local table join with reference table * test different cases with local table and reference join	2020-04-09 15:25:54 +03:00
SaitTalhaNisanci	a710b3cdc5	fix null tupleStoreState case in ExecuteLocalTaskListExtended (#3711) In case we don't care about the tupleStoreState in ExecuteLocalTaskListExtended, it could be passed as null. In that case we will get a segmentation fault. This changes it so that a dummy tuple store will be created when it is null. Do not use local execution in ExecuteTaskListOutsideTransaction. As we are going to run the tasks outside transaction, we shouldn't use local execution. However, there is some problem when using local execution related to repartition joins; when we solve that problem, we can execute the tasks coming to this path with local execution. Also logging the local command is simplified. normalize job id in worker_hash_partition_table in test outputs.	2020-04-07 11:47:09 +03:00
SaitTalhaNisanci	df88ab71b6	normalize assign_distributed_transaction_id in tests	2020-04-01 18:23:16 +03:00
SaitTalhaNisanci	0aebd78ea7	use localExecution in ExecuteTaskListExtended ExecuteTaskListExtended is the common method for different codepaths, and instead of writing separate local execution logic in different codepaths, it makes more sense to have the logic here. We still need to do some refactoring, this is an initial step. After this commit, we can run create shard commands locally. There is a special case with shard creation commands. A create shard command might have a concatenated query string, however local execution did not know how to execute a task with multiple query strings. This is also implemented in this commit. We go over each query in the concatenated query string and plan/execute them one by one. A cleaner solution to this would be to make sure that each task has a single query. We currently cannot do that because we need to ensure the task dependencies. However, it would make sense to do that at some point and it would simplify the code a lot.	2020-04-01 18:23:16 +03:00
Jelte Fennema	246435be7e	Lazy query deparsing executable queries (#3350) Deparsing and parsing a query can be heavy on CPU. When locally executing the query we don't need to do this in theory most of the time. This PR is the first step in allowing to skip deparsing and parsing the query in these cases, by lazily creating the query string and storing the query in the task. Future commits will make use of this and not deparse and parse the query anymore, but use the one from the task directly.	2020-01-17 11:49:43 +01:00
Marco Slot	b37ef0e394	Fix error in distributed queries when shards are on the coordinator	2019-12-24 06:36:43 +01:00
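
The rename described in commit db1b78148c (ALL_WORKERS becoming ALL_NON_COORD_WORKERS, plus the new ALL_DATA_NODES) can be illustrated with a small sketch. This is not Citus code: the `Node` class, its fields, the `target_nodes` helper, and the sample cluster are hypothetical and exist only for illustration. Only the two set names come from the commit message, and treating ALL_DATA_NODES as "every active node" is an assumption based on that message.

```python
from dataclasses import dataclass
from enum import Enum, auto


class TargetWorkerSet(Enum):
    # Names taken from the commit message; the values are arbitrary.
    ALL_NON_COORD_WORKERS = auto()  # formerly ALL_WORKERS
    ALL_DATA_NODES = auto()


@dataclass(frozen=True)
class Node:
    # Hypothetical stand-in for a cluster node entry.
    name: str
    is_coordinator: bool = False
    is_active: bool = True


def target_nodes(nodes, target):
    """Return the nodes a command would be sent to for the given target set."""
    active = [node for node in nodes if node.is_active]
    if target is TargetWorkerSet.ALL_NON_COORD_WORKERS:
        # The coordinator is filtered out even when it was added as a worker.
        return [node for node in active if not node.is_coordinator]
    # ALL_DATA_NODES: assumed here to mean every active node.
    return active


# Hypothetical cluster in which the coordinator was also added as a worker.
cluster = [
    Node("coordinator", is_coordinator=True),
    Node("worker-1"),
    Node("worker-2"),
]
```

Under these assumptions, sending the temporary schema creation to `ALL_NON_COORD_WORKERS` would skip the coordinator, which is the situation the commit message describes, while `ALL_DATA_NODES` would include it.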

29 Commits (17149b92b2e65fd038732a6ea57e2c3eb41bc5e7)