citus

Commit Graph

Author	SHA1	Message	Date
Marco Slot	7c0589abb8	Do not override combinefunc of custom aggregates with common names (#6805 ) DESCRIPTION: Fix an issue that caused some queries with custom aggregates to fail While playing around with https://github.com/pgvector/pgvector I noticed that the AVG query was broken. That's because we treat it as any other AVG by breaking it down in SUM and COUNT, but there are no SUM/COUNT functions in this case, but there is a perfectly usable combinefunc. This PR changes our aggregate logic to prefer custom aggregates with a combinefunc even if they have a common name. Co-authored-by: Marco Slot <marco.slot@gmail.com>	2023-04-03 19:43:09 +02:00
Jelte Fennema	184c7c0bce	Make enterprise features open source (#6008 ) This PR makes all of the features open source that were previously only available in Citus Enterprise. Features that this adds: 1. Non blocking shard moves/shard rebalancer (`citus.logical_replication_timeout`) 2. Propagation of CREATE/DROP/ALTER ROLE statements 3. Propagation of GRANT statements 4. Propagation of CLUSTER statements 5. Propagation of ALTER DATABASE ... OWNER TO ... 6. Optimization for COPY when loading JSON to avoid double parsing of the JSON object (`citus.skip_jsonb_validation_in_copy`) 7. Support for row level security 8. Support for `pg_dist_authinfo`, which allows storing different authentication options for different users, e.g. you can store passwords or certificates here. 9. Support for `pg_dist_poolinfo`, which allows using connection poolers in between coordinator and workers 10. Tracking distributed query execution times using citus_stat_statements (`citus.stat_statements_max`, `citus.stat_statements_purge_interval`, `citus.stat_statements_track`). This is disabled by default. 11. Blocking tenant_isolation 12. Support for `sslkey` and `sslcert` in `citus.node_conninfo`	2022-06-16 00:23:46 -07:00
Ahmet Gedemenli	b5448e43e3	Fix aggregate signature bug (#5854 )	2022-03-23 13:42:03 +03:00
Burak Velioglu	d4625ec6a1	Add support for zero-argument polymorphic aggregates	2022-03-21 16:10:40 +03:00
Burak Velioglu	333c73a53c	Drop distributed table on worker with ProcessUtilityParseTree	2022-03-15 17:42:01 +03:00
Marco Slot	7559ad12ba	Change create_object_propagation default to immediate	2022-03-09 17:40:50 +01:00
Nils Dijk	3801576dfb	Move pg_dist_object to pg_catalog (#5765 ) DESCRIPTION: Move pg_dist_object to pg_catalog Historically `pg_dist_object` had been created in the `citus` schema as an experiment to understand if we could move our catalog tables to a branded schema. We quickly realised that this interfered with the UX on our managed services and other environments, where users connected via a user with the name of `citus`. By default postgres put the username on the search_path. To be able to read the catalog in the `citus` schema we would need to grant access permissions to the schema. This caused newly created objects like tables etc, to default to this schema for creation. This failed due to the write permissions to that schema. With this change we move the `pg_dist_object` catalog table to the `pg_catalog` schema, where our other schema's are also located. This makes the catalog table visible and readable by any user, like our other catalog tables, for debugging purposes. Note: due to the change of schema, we had to disable 1 test that was running into a discrepancy between the schema and binary. Secondly, we needed to make the lookup functions for the `pg_dist_object` relation and their indexes less strict on the fallback of the naming due to an other test that, due to an unfortunate cache invalidation, needed to lookup the relation again. This makes that we won't default to _only_ resolving from `pg_catalog` outside of upgrades.	2022-03-04 17:40:38 +00:00
Ahmet Gedemenli	e1809af376	Propagate CREATE AGGREGATE commands	2022-03-02 10:52:43 +03:00
Burak Velioglu	fa6866ed36	Start to propagate functions to worker nodes with CREATE FUNCTION command together with it's dependencies. If the function depends on any nondistributable object, function will be created only locally. Parameterless version of create_distributed_function becomes obsolete with this change, it will deprecated from the code with a subsequent PR.	2022-02-18 13:56:51 +03:00
Onur Tirtir	79442df1b7	Fix coordinator/worker query targetlists for agg. that we cannot push-down (#5679 ) Previously, we were wrapping targetlist nodes with Vars that reference to the result of the worker query, if the node itself is not `Const` or not a `Param`. Indeed, we should not do that unless the node itself is a `Var` node or contains a `Var` within it (e.g.: `OpExpr(Var(column_a) > 2)`). Otherwise, when worker query returns empty result set, then combine query exec would crash since the `Var` would be pointing to an empty tuple slot, which is not desirable for the node-executor methods.	2022-02-04 05:37:25 -08:00
Burak Velioglu	ed8e32de5e	Sync pg_dist_object on an update and propagate while syncing to a new node Before that PR we were updating citus.pg_dist_object metadata, which keeps the metadata related to objects on Citus, only on the coordinator node. In order to allow using those object from worker nodes (or erroring out with proper error message) we've started to propagate that metedata to worker nodes as well.	2021-12-06 19:25:50 +03:00
Benjamin Satzger	a35a15a513	Distribute custom aggregates with multiple arguments (#4047 ) Enable custom aggregates with multiple parameters to be executed on workers. #2921 introduces distributed execution of custom aggregates. One of the limitations of this feature is that only aggregate functions with a single aggregation parameter can be pushed to worker nodes. Aim of this change is to remove that limitation and support handling of multi-parameter aggregates. Resolves: #3997 See also: #2921	2020-07-24 15:16:00 -07:00
SaitTalhaNisanci	b3af63c8ce	Remove task tracker executor (#3850 ) * use adaptive executor even if task-tracker is set * Update check-multi-mx tests for adaptive executor Basically repartition joins are enabled where necessary. For parallel tests max adaptive executor pool size is decresed to 2, otherwise we would get too many clients error. * Update limit_intermediate_size test It seems that when we use adaptive executor instead of task tracker, we exceed the intermediate result size less in the test. Therefore updated the tests accordingly. * Update multi_router_planner It seems that there is one problem with multi_router_planner when we use adaptive executor, we should fix the following error: +ERROR: relation "authors_range_840010" does not exist +CONTEXT: while executing command on localhost:57637 * update repartition join tests for check-multi * update isolation tests for repartitioning * Error out if shard_replication_factor > 1 with repartitioning As we are removing the task tracker, we cannot switch to it if shard_replication_factor > 1. In that case, we simply error out. * Remove MULTI_EXECUTOR_TASK_TRACKER * Remove multi_task_tracker_executor Some utility methods are moved to task_execution_utils.c. * Remove task tracker protocol methods * Remove task_tracker.c methods * remove unused methods from multi_server_executor * fix style * remove task tracker specific tests from worker_schedule * comment out task tracker udf calls in tests We were using task tracker udfs to test permissions in multi_multiuser.sql. We should find some other way to test them, then we should remove the commented out task tracker calls. * remove task tracker test from follower schedule * remove task tracker tests from multi mx schedule * Remove task-tracker specific functions from worker functions * remove multi task tracker extra schedule * Remove unused methods from multi physical planner * remove task_executor_type related things in tests * remove LoadTuplesIntoTupleStore * Do initial cleanup for repartition leftovers During startup, task tracker would call TrackerCleanupJobDirectories and TrackerCleanupJobSchemas to clean up leftover directories and job schemas. With adaptive executor, while doing repartitions it is possible to leak these things as well. We don't retry cleanups, so it is possible to have leftover in case of errors. TrackerCleanupJobDirectories is renamed as RepartitionCleanupJobDirectories since it is repartition specific now, however TrackerCleanupJobSchemas cannot be used currently because it is task tracker specific. The thing is that this function is a no-op currently. We should add cleaning up intermediate schemas to DoInitialCleanup method when that problem is solved(We might want to solve it in this PR as well) * Revert "remove task tracker tests from multi mx schedule" This reverts commit `03ecc0a681`. * update multi mx repartition parallel tests * not error with task_tracker_conninfo_cache_invalidate * not run 4 repartition queries in parallel It seems that when we run 4 repartition queries in parallel we get too many clients error on CI even though we don't get it locally. Our guess is that, it is because we open/close many connections without doing some work and postgres has some delay to close the connections. Hence even though connections are removed from the pg_stat_activity, they might still not be closed. If the above assumption is correct, it is unlikely for it to happen in practice because: - There is some network latency in clusters, so this leaves some times for connections to be able to close - Repartition joins return some data and that also leaves some time for connections to be fully closed. As we don't get this error in our local, we currently assume that it is not a bug. Ideally this wouldn't happen when we get rid of the task-tracker repartition methods because they don't do any pruning and might be opening more connections than necessary. If this still gives us "too many clients" error, we can try to increase the max_connections in our test suite(which is 100 by default). Also there are different places where this error is given in postgres, but adding some backtrace it seems that we get this from ProcessStartupPacket. The backtraces can be found in this link: https://circleci.com/gh/citusdata/citus/138702 * Set distributePlan->relationIdList when it is needed It seems that we were setting the distributedPlan->relationIdList after JobExecutorType is called, which would choose task-tracker if replication factor > 1 and there is a repartition query. However, it uses relationIdList to decide if the query has a repartition query, and since it was not set yet, it would always think it is not a repartition query and would choose adaptive executor when it should choose task-tracker. * use adaptive executor even with shard_replication_factor > 1 It seems that we were already using adaptive executor when replication_factor > 1. So this commit removes the check. * remove multi_resowner.c and deprecate some settings * remove TaskExecution related leftovers * change deprecated API error message * not recursively plan single relatition repartition subquery * recursively plan single relation repartition subquery * test depreceated task tracker functions * fix overlapping shard intervals in range-distributed test * fix error message for citus_metadata_container * drop task-tracker deprecated functions * put the implemantation back to worker_cleanup_job_schema_cachesince citus cloud uses it * drop some functions, add downgrade script Some deprecated functions are dropped. Downgrade script is added. Some gucs are deprecated. A new guc for repartition joins bucket size is added. * order by a test to fix flappiness	2020-07-18 13:11:36 +03:00
Sait Talha Nisanci	e5a21f07cb	test aggregates with expressions	2020-06-30 11:41:16 -07:00
Philip Dubé	d155149c18	tests: remove stale comment, fix typo	2020-03-31 20:13:51 +00:00
Philip Dubé	917cb6ae93	Don't segfault on queries using GROUPING GROUPING will always return 0 outside of GROUPING SETS, CUBE, or ROLLUP Since we don't support those, it makes sense to reject GROUPING in queries	2020-03-25 15:46:43 +00:00
Philip Dubé	11b968bc30	Add runtime type checking to AGGREGATE_CUSTOM_COMBINE helper functions	2020-03-11 17:20:30 +00:00
Philip Dubé	d43c80d4d8	pullUpIntermediateRows should not be true when groupedByDisjointPartitionColumn is true This was causing 'SELECT id, stdev(y_int) FROM tbl GROUP BY id' to push down stddev without group by	2020-01-30 21:18:08 +00:00
Philip Dubé	281aacce9b	Fix row-gather for subqueries being handled by task-tracker task-tracker has specific logic for MultiPartition when GROUP BY is missing We were ending up in this code path because row-gather removes GROUP BY	2020-01-10 01:51:37 +00:00
Philip Dubé	863bf49507	Implement pulling up rows to coordinator when aggregates cannot be pushed down. Enabled by default	2020-01-07 01:16:04 +00:00
Marco Slot	b21b6905ae	Do not repeat GROUP BY distribution_column on coordinator Allow arbitrary aggregates to be pushed down in these scenarios	2019-12-25 01:33:41 +00:00
Philip Dubé	e9bbdb8f31	Fix handling of empty intermediate results when distributing custom aggregates	2019-12-23 17:27:52 +00:00
Philip Dubé	1597fbb369	aggregate_support test: test DISTINCT, ORDER BY, FILTER, & no intermediate results Previously, - we'd push down ORDER BY, but this doesn't order intermediate results between workers - we'd keep FILTER on master aggregate, which would raise an error about unexpected cstrings	2019-12-03 15:46:01 +00:00
Marco Slot	2329157406	Swap aggregate_support tests to simplify enterprise merge	2019-11-26 13:39:18 +01:00
Philip Dubé	a81e6a81ab	Fix distributed aggregation for non superuser roles Moves support functions to pg_catalog for now. We'd prefer a different solution for when we're creating these support functions dynamically	2019-11-25 20:46:25 +00:00
Philip Dubé	495c0f5117	Phase 1 implementation of custom aggregates Phase 1 seeks to implement minimal infrastructure, so does not include: - dynamic generation of support aggregates to handle multiple arguments - configuration methods to direct aggregation strategy, or mark an aggregate's serialize/deserialize as safe to operate across nodes Aggregates can be distributed when: - they have a single argument - they have a combinefunc - their transition type is not a pseudotype	2019-11-14 19:01:24 +00:00

26 Commits (002a88ae7fbd1d4cb41c168b075ac3a6aeb368d6)