citus

Commit Graph

Author	SHA1	Message	Date
Marco Slot	03328e9679	Rename citus_tables column names to be query-friendly	2021-01-21 18:58:30 +01:00
Halil Ozan Akgul	2be14cce2e	Adds alter_distributed_table and alter_table_set_access_method UDFs	2021-01-13 16:02:39 +03:00
Marco Slot	d900a7336e	Automatically add placeholder record for coordinator	2021-01-08 15:09:53 +01:00
Marco Slot	597533b1ff	Add citus_set_coordinator_host	2021-01-08 13:36:26 +01:00
Onur Tirtir	5289785da4	Add cascade_via_foreign_keys option to create_citus_local_table (#4462 )	2021-01-08 15:13:26 +03:00
Onur Tirtir	f3801143fb	Add cascade option to undistribute_table	2021-01-07 15:41:49 +03:00
Marco Slot	5de3337b2f	Support local execution for INSERT..SELECT with re-partitioning	2021-01-06 16:15:53 +01:00
Ahmet Gedemenli	1f36ff7c17	Prevent deadlock for long named partitioned index creation on single node (#4461 ) * Prevent deadlock for long named partitioned index creation on single node * Create IsSingleNodeCluster function * Use both local and sequential execution	2021-01-05 13:39:13 +03:00
Onder Kalaci	c546ec5e78	Local node connection management When Citus needs to parallelize queries on the local node (e.g., the node executing the distributed query and the shards are the same), we need to be mindful about the connection management. The reason is that the client backends that are running distributed queries are competing with the client backends that Citus initiates to parallelize the queries in order to get a slot on the max_connections. In that regard, we implemented a "failover" mechanism where if the distributed queries cannot get a connection, the execution failovers the tasks to the local execution. The failover logic is follows: - As the connection manager if it is OK to get a connection - If yes, we are good. - If no, we fail the workerPool and the failure triggers the failover of the tasks to local execution queue The decision of getting a connection is follows: /* * For local nodes, solely relying on citus.max_shared_pool_size or * max_connections might not be sufficient. The former gives us * a preview of the future (e.g., we let the new connections to establish, * but they are not established yet). The latter gives us the close to * precise view of the past (e.g., the active number of client backends). * * Overall, we want to limit both of the metrics. The former limit typically * kics in under regular loads, where the load of the database increases in * a reasonable pace. The latter limit typically kicks in when the database * is issued lots of concurrent sessions at the same time, such as benchmarks. */	2020-12-03 14:16:13 +03:00
Ahmet Gedemenli	67761897ab	Add test for citus table size func in transaction with modification Add test for citus_relation_size	2020-12-01 10:38:15 +03:00
Onder Kalaci	629ecc3dee	Add the infrastructure to count the number of client backends Considering the adaptive connection management improvements that we plan to roll soon, it makes it very helpful to know the number of active client backends. We are doing this addition to simplify yhe adaptive connection management for single node Citus. In single node Citus, both the client backends and Citus parallel queries would compete to get slots on Postgres' `max_connections` on the same Citus database. With adaptive connection management, we have the counters for Citus parallel queries. That helps us to adaptively decide on the remote executions pool size (e.g., throttle connections if necessary). However, we do not have any counters for the total number of client backends on the database. For single node Citus, we should consider all the client backends, not only the remote connections that Citus does. Of course Postgres internally knows how many client backends are active. However, to get that number Postgres iterates over all the backends. For examaple, see [pg_stat_get_db_numbackends](`8e90ec5580/src/backend/utils/adt/pgstatfuncs.c (L1240)`) where Postgres iterates over all the backends. For our purpuses, we need this information on every connection establishment. That's why we cannot affort to do this kind of iterattion.	2020-11-25 19:19:24 +01:00
Ahmet Gedemenli	a64dc8a72b	Fixes a bug preventing INSERT SELECT .. ON CONFLICT with a constraint name on local shards Separate search relation shard function Add tests	2020-11-25 15:10:46 +03:00
Onder Kalaci	596f7bf4a9	Add more regression test for single node Citus Tests on commands with SCHEMA.	2020-10-15 17:32:32 +02:00
Onder Kalaci	fe3caf3bc8	Local execution considers intermediate result size limit With this commit, we make sure that local execution adds the intermediate result size as the distributed execution adds. Plus, it enforces the citus.max_intermediate_result_size value.	2020-10-15 17:18:55 +02:00
Sait Talha Nisanci	dc40758355	Return early if there is no citus table in VACUUM	2020-10-09 11:10:00 +03:00
Sait Talha Nisanci	99bb79745a	Commit transaction for VACUUM on shell table With postgres 13, there is a global lock that prevents multiple VACUUMs happening in the current database. This global lock is taken for a short time but this creates a problem because of the following: - We execute the VACUUM for the shell table through the standard process utility. In this step the global lock is taken for the current database. - If the current node has shard placements then it tries to execute VACUUM over a connection to localhost with ExecuteUtilityTaskList. - the VACUUM on shard placements cannot proceed because it is waiting for the global lock for the current database to be released. - The acquired lock from the VACUUM for shell table will not be released until the transaction is committed. - So there is a deadlock. As a solution, we commit the current transaction in case of VACUUM after the VACUUM is executed for the shell table. Executing the VACUUM on a shell table is not important because the data there will probably be truncated. PostprocessVacuumStmt takes the necessary locks on the shell table so we don't need to take any extra locks after we commit the current transaction.	2020-10-09 10:57:44 +03:00
Onder Kalaci	5d017cd123	Improve node matedata when coordinator is added Coordinator should always be always active, hasmetadata and metadasynced. Prevent changing those fields.	2020-09-21 14:53:41 +02:00
Onder Kalaci	6fc1dea85c	Improve the robustness of function call delegation Pushing down the CALLs to the node that the CALL is executed is dangerous and could lead to infinite recursion. When the coordinator added as worker, Citus was by chance preventing this. The coordinator was marked as "not metadatasynced" node in pg_dist_node, which prevented CALL/function delegation to happen. With this commit, we do the following: - Fix metadatasynced column for the coordinator on pg_dist_node - Prevent pushdown of function/procedure to the same node that the function/procedure is being executed. Today, we do not sync pg_dist_object (e.g., distributed functions metadata) to the worker nodes. But, even if we do it now, the function call delegation would prevent the infinite recursion.	2020-09-21 14:53:30 +02:00
SaitTalhaNisanci	dae2c69fd7	Not allow removing a single node with ref tables (#4127 ) * Not allow removing a single node with ref tables We should not allow removing a node if it is the only node in the cluster and there is a data on it. We have this check for distributed tables but we didn't have it for reference tables. * Update src/test/regress/expected/single_node.out Co-authored-by: Onur Tirtir <onurcantirtir@gmail.com> * Update src/test/regress/sql/single_node.sql Co-authored-by: Onur Tirtir <onurcantirtir@gmail.com>	2020-09-18 15:35:59 +03:00
Önder Kalacı	8d3f353746	Add more tests for single node citus - distributetd tables (#4166 )	2020-09-17 17:50:35 +02:00
Sait Talha Nisanci	4308d867d9	remove task-tracker in comments, documentation	2020-07-21 16:21:01 +03:00
Sait Talha Nisanci	510535f558	address feedback	2020-07-13 19:45:02 +03:00
Sait Talha Nisanci	41ec76a6ad	use ActiveReadableNodeList in JobExecutorType and task tracker The reason we should use ActiveReadableNodeList instead of ActiveReadableNonCoordinatorNodeList is that if coordinator is added to cluster as a worker, it should be counted as well. Otherwise if there is only coordinator in the cluster, the count will be 0, hence we get a warning. In MultiTaskTrackerExecute, we should connect to coordinator if it is added to the cluster because it will also be assigned tasks.	2020-07-13 19:45:02 +03:00
Sait Talha Nisanci	d97d03ec65	use ActivePrimaryNodeList to include coordinator ActiveReadableWorkerNodeList doesn't include coordinator, however if coordinator is added as a worker, we should also include that while planning. The current methods are very easily misusable and this requires a refactoring to make the distinction between methods that include coordinator and that don't very explicit as they can introduce subtle/major bugs pretty easily.	2020-07-13 19:20:15 +03:00
Jelte Fennema	ab01571c9e	Fix crash with single node dummy placement (#3993 ) Static analysis found an issue where we could dereference `NULL`, because `CreateDummyPlacement` could return `NULL` when there were no workers. This PR changes it so that it never returns `NULL`, which was intended by @marcocitus when doing this change: https://github.com/citusdata/citus/pull/3887/files#r438136433 While adding tests for citus on a single node I also added some more basic tests and it turns out we error out on repartition joins. This has been present since `shouldhaveshards` was introduced and is not trivial to fix. So I created a separate issue for this: https://github.com/citusdata/citus/issues/3996	2020-07-08 17:11:25 +02:00

25 Commits (03328e9679e08685f8e24f77a577b17a4b9176b9)