citus

Commit Graph

Author	SHA1	Message	Date
Onur Tirtir	cc4c83b1e5	HAVE_LZ4 -> HAVE_CITUS_LZ4 (#5541)	2021-12-16 16:21:52 +03:00
Burak Velioglu	ed8e32de5e	Sync pg_dist_object on an update and propagate while syncing to a new node Before that PR we were updating citus.pg_dist_object metadata, which keeps the metadata related to objects on Citus, only on the coordinator node. In order to allow using those objects from worker nodes (or erroring out with a proper error message) we've started to propagate that metadata to worker nodes as well.	2021-12-06 19:25:50 +03:00
Onder Kalaci	549edcabb6	Allow disabling node(s) when multiple failures happen As of master branch, Citus does all the modifications to replicated tables (e.g., reference tables and distributed tables with replication factor > 1), via 2PC and avoids any shardstate=3. As a side-effect of those changes, handling node failures for replicated tables changes. With this PR, when one (or multiple) node failures happen, the users would see query errors on modifications. If the problem is intermittent, that's OK, once the node failure(s) recover by themselves, the modification queries would succeed. If the node failure(s) are permanent, the users should call `SELECT citus_disable_node(...)` to disable the node. As soon as the node is disabled, modifications would start to succeed. However, now the old node gets behind. It means that, when the node is up again, the placements should be re-created on the node. First, use `SELECT citus_activate_node()`. Then, use `SELECT replicate_table_shards(...)` to replicate the missing placements on the re-activated node.	2021-12-01 10:19:48 +01:00
Onder Kalaci	d405993b57	Make sure to use a dedicated metadata connection With this commit, we make sure to use a dedicated connection per node for all the metadata operations within the same transaction. This is needed because the same metadata (e.g., metadata includes the distributed table on the workers) can be modified across multiple connections. With this connection we guarantee that there is a single metadata connection. But note that this connection can be used for any other operation. In other words, this connection is not only reserved for metadata operations.	2021-11-26 14:36:28 +01:00
Onder Kalaci	38b08ebde9	Generalize the error checks while removing node The checks for preventing the removal of a node are very much reference table centric. We are soon going to add the same checks for replicated tables. So, make the checks generic such that: (a) replicated tables fit naturally (b) we can use the same checks in `citus_disable_node`.	2021-11-26 14:25:29 +01:00
Marco Slot	56eae48daf	Stop updating shard range in citus_update_shard_statistics	2021-11-19 10:51:15 +01:00
Hanefi Onaldi	c0d43d4905	Prevent cache usage on citus_drop_trigger codepaths	2021-11-18 20:24:51 +03:00
Marco Slot	9e6ca23286	Remove cstore_fdw-related logic	2021-11-16 13:59:03 +01:00
Önder Kalacı	8c0bc94b51	Enable replication factor > 1 in metadata syncing (#5392) - [x] Add some more regression test coverage - [x] Make sure returning works fine in case of local execution + remote execution (task->partiallyLocalOrRemote works as expected, already added tests) - [x] Implement locking properly (and add isolation tests) - [x] We do #shardcount round-trips on `SerializeNonCommutativeWrites`. We made it a single round-trip. - [x] Acquire locks for subselects on the workers & add isolation tests - [x] Add a GUC to prevent modification from the workers, hence increase the coordinator-only throughput - The performance slightly drops (~15%), unless `citus.allow_modifications_from_workers_to_replicated_tables` is set to false	2021-11-15 15:10:18 +03:00
Ahmet Gedemenli	14a33d4e8e	Introduce GUC citus.use_citus_managed_tables	2021-11-11 14:09:06 +03:00
Onder Kalaci	d5e89b1132	Unify distributed execution logic for single replicated tables Citus does not acquire any executor locks for shard replication == 1. With this commit, we unify this decision and exit early.	2021-11-08 13:52:20 +01:00
Önder Kalacı	d5b371b2e0	Merge branch 'master' into naisila/fix-partitioned-index	2021-11-08 10:53:16 +01:00
naisila	385ba94d15	Run fix_partition_shard_index_names after each wrong naming command	2021-11-08 10:43:34 +01:00
Marco Slot	78866df13c	Remove master_append_table_to_shard UDF	2021-11-08 10:43:24 +01:00
Marco Slot	fba93df4b0	Remove copy into new append shard logic	2021-11-07 21:01:40 +01:00
Sait Talha Nisanci	ab29c25658	Fix missing from entry	2021-11-04 18:54:52 +03:00
naisila	796d56a7b1	Rename ddlJob->commandString to ddlJob->metadataSyncCommand	2021-10-29 23:45:43 +03:00
Ahmet Gedemenli	67dca4363d	Dont auto-undistribute user-added citus local tables (#5314 ) * Disable auto-undistribute for user-added citus local tables	2021-10-28 12:10:26 +03:00
Philip Dubé	cc50682158	Fix typos. Spurred spotting "connectios" in logs	2021-10-25 13:54:09 +00:00
Onder Kalaci	575bb6dde9	Drop support for Inactive Shard placements Given that we do all operations via 2PC, there is no way for any placement to be marked as INACTIVE.	2021-10-22 18:03:35 +02:00
Önder Kalacı	b3299de81c	Drop support for citus.multi_shard_commit_protocol (#5380) In the past, we allowed users to manually switch to 1PC (i.e., one phase commit). However, with this commit, we don't. All multi-shard modifications are done via 2PC.	2021-10-21 14:01:28 +02:00
Önder Kalacı	3f726c72e0	When replication factor > 1, all modifications are done via 2PC (#5379 ) With Citus 9.0, we introduced `citus.single_shard_commit_protocol` which defaults to 2PC. With this commit, we prevent any user to set it to 1PC and drop support for `citus.single_shard_commit_protocol`. Although this might add some overhead for users, it is already the default behaviour (so less likely) and marking placements as INVALID is much worse.	2021-10-20 01:39:03 -07:00
Marco Slot	096660d61d	Remove master_apply_delete_command	2021-10-18 22:29:37 +02:00
Önder Kalacı	31c8f279ac	Add helper UDFs to inspect object dependencies (#5293) - citus_get_all_dependencies_for_object: emulate what Citus would qualify as dependency when adding a new node - citus_get_dependencies_for_object: emulate what Citus would qualify as dependency when creating an object Example use: ```SQL -- find all the dependencies of table test SELECT pg_identify_object(t.classid, t.objid, t.objsubid) FROM (SELECT * FROM pg_get_object_address('table', '{test}', '{}')) as addr JOIN LATERAL citus_get_all_dependencies_for_object(addr.classid, addr.objid, addr.objsubid) as t(classid oid, objid oid, objsubid int) ON TRUE ORDER BY 1; ```	2021-10-18 14:46:49 +03:00
Teja Mupparti	a8348047c5	Pushdown procedures with OUT parameters (#5348)	2021-10-11 23:14:36 -07:00
Onur Tirtir	f7f4a93073	Remove get_relation_trigger_oid_compat	2021-10-11 11:53:00 +03:00
Ahmet Gedemenli	d19793c174	Add partitioning support for citus local tables Add/fix tests Fix creating partitions Add test for mx - partition creating case Enable cascading to partitioned tables Fix mx partition adding test Fix cascading through fkeys Style Disable converting with non-inherited fkeys Fix detach bug Early return in case of cascade & Add tests Style Fix undistribute_table bug & Fix test outputs Remove RemovePartitionRelationIds Test with undistribute_table Add test for mx+convert+undistribute Remove redundant usage of CreatePartitionedCitusLocalTable Add some comments Introduce bulk functions for generating attach/detach partition commands Fix: Convert partitioned tables after adding fkey Change the error message for partitions Introduce function ErrorIfPartitionTableAddedToMetadata Polish attach/detach command generation functions Use time_partitions for testing Move mx tests to citus_local_tables_mx Add new partitioned table to cascade test Add test with time series management UDFs Fix test output Fix: Assertion fail on relation access tracking Style Refactor creating partitioned citus local tables Remove CreatePartitionedCitusLocalTable Style Error out if converting multi-level table Revert some old tests Error out adding partitioned partition Polish Polish/address Fix create table partition of case Use CascadeOperationForRelationIdList if no cascade needed Fix create partition bug Revert / Add new tests to mx Style Fix dropping fkey bug Add test with IF NOT EXISTS Convert to CLT when doing ATTACH PARTITION Add comments Add more tests with time series management Edit the error message for converting the child Use OR instead of AND in ErrorIfUnsupportedAlterTableStmt Edit/improve tests Disable ddl prop when dropping default column definitions Disable/enable ddl prop just before/after the command Add comment Add sequence test Add trigger test Remove NeedCascadeViaForeignKeys Add one more insert to sequence test Add comment Style Fix test output shard ids Update comments Disable creating fkey on partitions Move partition check to CreateCitusLocalTable Add comment Add check for attaching multi-level partition Add test for pg_constraint Check pg_dist_partition in tests Add test inserting on the worker	2021-10-11 10:45:07 +03:00
Halil Ozan Akgul	9c9d4b5eeb	Turn MX on by default	2021-10-08 18:17:21 +03:00
Onur Tirtir	ea61efb63a	Not flush writes until need to read them when doing index-scan on columnar (#5247 ) Not flush pending writes if given tid belongs to a "flushed" or "aborted" stripe write, or to an "in-progress" stripe write of another backend. That way, we would reduce the cases where we flush single-tuple stripes during index scan. To do that, we follow below steps for index look-up's: - Do not flush any pending writes and do stripe metadata look-up for given tid. If tuple with tid is found, then no need to do another look-up since we already found the tuple without needing to flush pending writes. - If tuple is not found without flushing pending writes, then we have two scenarios: - If given tid belongs to a pending write of my backend, then do stripe metadata look-up for given tid. But this time first flush any pending writes. - Otherwise, just return false from `index_fetch_tuple` since flushing pending writes wouldn't help.	2021-09-13 18:41:20 +02:00
Naisila Puka	a69abe3be0	Fixes bug about int and smallint sequences on MX (#5254 ) * Introduce worker_nextval udf for int&smallint column defaults * Fix current tests and add new ones for worker_nextval	2021-09-09 23:41:07 +03:00
Nils Dijk	80a44a7b93	prevent double inclusion of columnar_tableam.h (#5266 ) Recently there are some warnings during the compilation of Citus. Part of the warnings come due to the `columnar_tableam.h` header not being properly guarded with defines and ifndef's. This PR fixes these warnings.	2021-09-09 17:37:58 +02:00
Marco Slot	4faa49775b	Perform copy command as regular user in worker_append_table_to_shard	2021-09-09 11:00:29 +02:00
Onur Tirtir	5825c44d5f	Handle aborted writes properly when scanning a columnar table (#5244 ) If it is certain that we will not use any `parallel_worker`s for a columnar table, then stripe entries inserted by aborted transactions become visible to `SnapshotAny` and that causes `REINDEX` to fail by throwing a duplicate key error. To fix that: * consider three states for a stripe write operation: "flushed", "aborted", or "in-progress", * make sure to have a clear separation between them, and * act according to those three states when reading from a columnar table	2021-09-08 13:26:11 +03:00
Sait Talha Nisanci	3ad3bbba84	Apply latest version compat without conflicts	2021-09-03 16:09:59 +03:00
Halil Ozan Akgul	ca0d4c3bde	Includes pg_version_constants.h in columnar_version_compat.h	2021-09-03 15:41:28 +03:00
Halil Ozan Akgul	7823e49219	Introduces pg_get_statisticsobj_worker_compat macro Relevant PG commit: a4d75c86bf15220df22de0a92c819ecef9db3849	2021-09-03 15:41:28 +03:00
Halil Ozan Akgul	f16d5e1833	Introduces make_simple_restrictinfo_compat and pull_varnos_compat macros make_simple_restrictinfo and pull_varnos functions now have a new parameter These new macros give us the ability to use this new parameter for PG14 and they don't give the parameter for previous versions Relevant PG commit: 55dc86eca70b1dc18a79c141b3567efed910329d	2021-09-03 15:41:28 +03:00
Sait Talha Nisanci	a1bfb4f31b	Fix unlimited copy size variable's value	2021-09-03 15:41:28 +03:00
Halil Ozan Akgul	b21a00e775	Introduces index_insert_compat macro index_insert function now has a new parameter, indexUnchanged This new macro gives us the ability to use this new parameter for PG14 and it doesn't give the parameter for previous versions Existing parameter is set to false Relevant PG commit: 9dc718bdf2b1a574481a45624d42b674332e2903	2021-09-03 15:27:25 +03:00
Halil Ozan Akgul	fd2ca2825b	Introduces ExecSimpleRelationInsert_compat and modifyStateResultRelInfo macros es_result_relation_info is removed from EState. In this commit we make some changes to handle that. resultRelationInfo field is added to ModifyState to support the removed field. Relevant PG commits: 1375422c7826a2bf387be29895e961614f69de4b a04daa97a4339c38e304cd6164d37da540d665a8	2021-09-03 15:27:25 +03:00
Halil Ozan Akgul	b644ac55c6	Introduces GetOldestNonRemovableTransactionId_compat macro GetOldestXmin function is removed so we use GetOldestNonRemovableTransactionId functions instead GetOldestNonRemovableTransactionId_compat picks the appropriate one Relevant PG commit: dc7420c2c9274a283779ec19718d2d16323640c0	2021-09-03 15:27:25 +03:00
Halil Ozan Akgul	cb3b76ed24	Introduces get_partition_parent_compat and RelationGetPartitionDesc_compat macros get_partition_parent and RelationGetPartitionDesc functions now have new parameters to also include detached partitions These new macros give us the ability to use these new parameters for PG14 and they don't give the parameters for previous versions Existing parameters are set to not accept detached partitions Relevant PG commit: 71f4c8c6f74ba021e55d35b1128d22fb8c6e1629	2021-09-03 15:27:25 +03:00
Halil Ozan Akgul	898d3bb8d3	Introduces proc_statusflags_compat macro In two commits vacuumFlags in PGXACT is moved and then renamed to status flags This macro uses the appropriate version of the flag Relevant PG commits: 5788e258bb26495fab65ff3aa486268d1c50b123 cd9c1b3e197a9b53b840dcc87eb41b04d601a5f9	2021-09-03 15:27:25 +03:00
Halil Ozan Akgul	287706b717	Introduces SetTuplestoreDestReceiverParams_compat macro SetTuplestoreDestReceiverParams function now has two new parameters This new macro gives us the ability to use these new parameters for PG14 and it doesn't give the parameters for previous versions Existing parameters are set to NULL to keep previous behavior Relevant PG commit: 2f48ede080f42b97b594fb14102c82ca1001b80c	2021-09-03 15:27:25 +03:00
Halil Ozan Akgul	c3f0528607	Extends statistics on expressions in ruleutils_14.c Relevant PG commit: a4d75c86bf15220df22de0a92c819ecef9db3849	2021-09-03 15:27:25 +03:00
Halil Ozan Akgul	1d5053b652	Removes support for old protocols in Copy functions from PG14 Some Copy related functions copied from Postgres had support for both old and new protocols Postgres removed support for old version so we remove it too Relevant PG commit: 3174d69fb96a66173224e60ec7053b988d5ed4d9	2021-09-03 15:27:24 +03:00
Halil Ozan Akgul	82858ca8fe	Introduces ProcessUtility macros for readOnlyTree parameter New macros: standard_ProcessUtility_compat, ProcessUtility_compat, ColumnarProcessUtility_compat, PrevProcessUtilityHook_compat The functions now have a new bool parameter: readOnlyTree These new macros give us the ability to use this new parameter for PG14 and it doesn't give the parameter for previous versions In multi_ProcessUtility and ColumnarProcessUtility, before doing anything else, we check if readOnlyTree parameter is true and create a copy of pstmt Existing readOnlyTree parameters are set to false since we already handle the read only case at multi_ProcessUtility and ColumnarProcessUtility Relevant PG commit: 7c337b6b527b7052e6a751f966d5734c56f668b5	2021-09-03 15:27:24 +03:00
Halil Ozan Akgul	db2d9af863	Introduces BeginCopyFrom_compat macro BeginCopyFrom function now has a new whereClause parameter. In the function this parameter is assigned to the whereClause field of the CopyFromState returned Currently in Postgres there is only one place where this argument isn't NULL, and in previous PG version the whereClause argument of copy state is set right after the function call Since we don't have such example all current whereClause parameters are set to NULL Relevant PG commit: c532d15dddff14b01fe9ef1d465013cb8ef186df	2021-09-03 15:27:24 +03:00
Halil Ozan Akgul	35cfa5d7b9	Introduces CopyFromState_compat macro CopyState struct is divided into parts and one of them is CopyFromState This macro uses the appropriate one for PG versions Relevant PG commit: c532d15dddff14b01fe9ef1d465013cb8ef186df	2021-09-03 15:27:24 +03:00
Halil Ozan Akgul	8f34f84ce6	Introduces IsReindexWithParam_compat macro In ReindexStmt concurrent field is moved to options and then options are converted to params list. This macro uses previous fields for previous versions and the new params list with a new function named IsReindexWithParam for PG14 Relevant PG commits: 844c05abc3f1c1703bf17cf44ab66351ed9711d2 b5913f6120792465f4394b93c15c2e2ac0c08376	2021-09-03 15:27:24 +03:00
Halil Ozan Akgul	37ae22ce3e	Introduces macros for vacuum options VacOptTernaryValue enum is renamed to VacOptValue. In the enum there were three values, VACOPT_TERNARY_DEFAULT, VACOPT_TERNARY_DISABLED, and VACOPT_TERNARY_ENABLED Now there are four values VACOPTVALUE_UNSPECIFIED, VACOPTVALUE_AUTO, VACOPTVALUE_DISABLED, and VACOPTVALUE_ENABLED New macros are VacOptValue_compat, VACOPTVALUE_UNSPECIFIED_COMPAT, VACOPTVALUE_DISABLED_COMPAT, and VACOPTVALUE_ENABLED_COMPAT The VACOPTVALUE_UNSPECIFIED_COMPAT matches VACOPT_TERNARY_DEFAULT and VACOPTVALUE_UNSPECIFIED. There is no macro for VACOPTVALUE_AUTO. Relevant PG commit: 3499df0dee8c4ea51d264a674df5b5e31991319a	2021-09-03 15:27:24 +03:00
Halil Ozan Akgul	ebf1b7e23f	Introduces macros for functions that now have include_out_arguments argument New macros: FuncnameGetCandidates_compat and expand_function_arguments_compat The functions (the ones without _compat) now have a new bool include_out_arguments parameter These new macros give us the ability to use this new parameter for PG14 and it doesn't give the parameter for previous versions Existing include_out_arguments parameters are set to 'false' to keep current behavior Relevant PG commit: e56bce5d43789cce95d099554ae9593ada92b3b7	2021-09-03 15:27:24 +03:00
Halil Ozan Akgul	347ae2928f	Introduces stats_compat macro for MemoryContextMethods->stats stats function now has a new bool print_to_stderr parameter This new macro gives us the ability to use this new parameter for PG14 and it doesn't give the parameter for previous versions Existing print_to_stderr parameter is set to true to keep current behavior Relevant PG commit: 43620e328617c1f41a2a54c8cee01723064e3ffa	2021-09-03 15:27:24 +03:00
Halil Ozan Akgul	54ee93885a	Introduces getObjectTypeDescription_compat and getObjectIdentity_compat macros getObjectTypeDescription and getObjectIdentity functions now have a new bool missing_ok parameter These new macros give us the ability to use this new parameter for PG14 and they don't give the parameter for previous versions Currently all missing_ok parameters are set to false to keep current behavior Relevant PG commit: 2a10fdc4307a667883f7a3369cb93a721ade9680	2021-09-03 15:27:24 +03:00
Halil Ozan Akgul	f8d3e50f25	Introduces STATUS_WAITING_COMPAT macro The STATUS_WAITING define is removed and an enum with PROC_WAIT_STATUS_WAITING is added instead This macro uses appropriate one Relevant PG commit: a513f1dfbf2c29a51b0f7cbd5913ce2d2ee452c5	2021-09-03 15:27:24 +03:00
Halil Ozan Akgul	3c10e0f568	Introduces ROLE_MONITOR_COMPAT macro DEFAULT_ROLE_MONITOR is renamed to ROLE_PG_MONITOR This macro uses appropriate one Relevant PG commit: c9c41c7a337d3e2deb0b2a193e9ecfb865d8f52b	2021-09-03 15:27:24 +03:00
Halil Ozan Akgul	b790ecf180	Introduces F_NEXTVAL_COMPAT macro Name of F_NEXTVAL_OID is changed to F_NEXTVAL Relevant PG commit: 8e1f37c07aafd4bb7aa6e1e1982010af11f8b5c7	2021-09-03 15:27:24 +03:00
Halil Ozan Akgul	63cdb4b70a	Adds AlterTableStmtObjType macro AlterTableStmt's relkind field is changed into objtype New AlterTableStmtObjType macro uses the appropriate one Relevant PG commit: cc35d8933a211d9965eb1c1d2749a903d5735db2	2021-09-03 15:27:24 +03:00
Halil Ozan Akgul	1b6c8348fb	Adds PG14 to version_compat.h and columnar_version_compat.h files	2021-09-03 15:27:24 +03:00
Halil Ozan Akgul	7a27d7cee3	Adds copy of ruleutils_13.c as ruleutils_14.c	2021-09-03 15:27:24 +03:00
jeff-davis	4718b6bcdf	Generate parameterized paths for columnar scans. (#5172 ) Allow ColumnarScans to push down join quals by generating parameterized paths. This significantly expands the utility of chunk group filtering, making a ColumnarScan behave similar to an index when on the inner of a nested loop join. Also, evaluate all parameters on beginscan/rescan, which also works for external parameters. Fixes #4488.	2021-09-02 22:22:48 -07:00
Onur Tirtir	9cb5ef5007	Pass ColumnarScanDesc to ColumnarScanChunkGroupsFiltered	2021-09-02 13:20:11 +03:00
Naisila Puka	4fb05efabb	Distributes partition-to-be table before ProcessUtility (#5191 ) * Skip ALTER TABLE constraint checks while planning * Revert previous commit's solution, keep tests * Distribute partition-to-be table before ProcessUtility * Acquire locks in PreprocessAlterTableStmtAttachPartition	2021-09-02 13:07:42 +03:00
Onur Tirtir	889a2731cb	Split columnar stripe reservation into two phases (#5188) Previously, we were doing `first_row_number` reservation for the first row written to current `WriteState` but were doing `stripe_id` reservation when flushing the `WriteState` and were inserting the related record to `columnar.stripe` at that time as well. However, inserting `columnar.stripe` record at flush-time is problematic. This is because, as told in #5160, if relation has any index-based constraints and if there are two concurrent writes that are inserting conflicting key values for that constraint, then postgres relies on `tableAM->fetch_index_tuple` (=`columnar_fetch_index_tuple`) callback to return `true` when indexAM is checking against possible constraint violations. However, pending writes of other backends are not visible to concurrent sessions in columnar since we were not inserting the stripe metadata record until flushing the stripe. With this commit, we split stripe reservation into two phases: i) Reserve `stripe_id` and insert a "dummy" record to `columnar.stripe` at the very same time we reserve `first_row_number`, i.e. when writing the first row to the current `WriteState`. ii) At flush time, do the storage level allocation and complete the missing fields of the dummy record inserted into `columnar.stripe` during i). That way, any concurrent writes would be able to check against possible constraint violations by using `SnapshotDirty` when scanning `columnar.stripe`. Note that `columnar_fetch_index_tuple` still wouldn't be able to fill the output tupleslot for the requested tid but it would at least return `true` for such index look-up's and we believe this should be sufficient for the caller indexAM callback to make the concurrent writer block on prior one. That is how we fix #5160. Only downside of reserving `stripe_id` at the same time we reserve `first_row_number` is that now any aborted writes would also waste some amount of `stripe_id` as in the case of `first_row_number` but we are just wasting them one-by-one. Considering the fact that we waste `first_row_number` by the amount stripe row limit (=150k by default) in such cases, this shouldn't be important at all.	2021-09-02 11:49:14 +03:00
Onur Tirtir	bf4dfad6f7	Update curcid of given snapshot if it is MVCC Before starting to scan a columnar table, we always flush the pending writes to disk. However, we increment command counter after modifying metadata tables. On the other hand, now that we _don't always use_ xact snapshot to scan a columnar table, writes that we just flushed might not be visible to the query that just flushed pending writes to disk since curcid of provided snapshot would become smaller than the command id being used when modifying metadata tables. To give an example, before this change, below was a possible scenario due to the changes that we made to use the correct snapshot. ```sql CREATE TABLE t(a int, b int) USING columnar; BEGIN; INSERT INTO t VALUES (5, 10); SELECT * FROM t; ┌───┬───┐ │ a │ b │ ├───┼───┤ └───┴───┘ (0 rows) SELECT * FROM t; ┌───┬────┐ │ a │ b │ ├───┼────┤ │ 5 │ 10 │ └───┴────┘ (1 row) ```	2021-09-02 11:11:59 +03:00
Onur Tirtir	0b4ed075b5	Use correct snapshot when reading a columnar table Instead of using xact snapshot, use the snapshot provided to columnarAM when scanning table.	2021-09-02 11:10:11 +03:00
Hanefi Onaldi	7e39c7ea83	Replace master with citus in logs and comments (#5210 ) I replaced - master_add_node, - master_add_inactive_node - master_activate_node with - citus_add_node, - citus_add_inactive_node - citus_activate_node respectively.	2021-08-26 11:31:17 +03:00
Onur Tirtir	4b03195c06	Use RelationGetStatExtList instead of GetExplicitStatisticsIdList	2021-08-18 17:50:57 +03:00
Onur Tirtir	68f46c5dc9	Use scan context for intermediate mem allocs too	2021-08-16 11:06:03 +03:00
Ahmet Gedemenli	9e90894f21	Synchronize hasmetadata flag on mx workers (#5086 ) * Synchronize hasmetadata flag on mx workers * Switch to sequential execution * Add test * Use SetWorkerColumn * Add test for stop_sync * Remove usage of UpdateHasmetadataOnWorkersWithMetadata * Remove MarkNodeMetadataSynced * Fix test for metadatasynced * Remove MarkNodeMetadataSynced * Style * Remove MarkNodeHasMetadata * Remove UpdateDistNodeBoolAttr * Refactor SetWorkerColumn * Use SetWorkerColumnLocalOnly when setting up dependencies * Use SetWorkerColumnLocalOnly in TriggerSyncMetadataToPrimaryNodes * Style * Make update command generator functions static * Set metadatasynced before syncing * Call SetWorkerColumn only if the sync is successful * Try to sync all nodes * Fix indexno * Update metadatasynced locally first * Break if a node fails to sync metadata * Send worker commands optional * Style & Rebase * Add raiseOnError param to SetWorkerColumn * Style * Set metadatasynced for all metadata nodes * Style * Introduce SetWorkerColumnOptional * Polish * Style * Dont send set command to not synced metadata nodes * Style * Polish * Add test for stop_sync * Add test for shouldhaveshards * Add test for isactive flag * Sort by placementid in the function verify_metadata * Cover edge cases for failing nodes * Add comments * Add nodeport to isactive test * Add warning if metadata out of sync * Update warning message	2021-08-12 14:16:18 +03:00
Onder Kalaci	5f02d18ef8	transactional metadata sync for maintenance daemon As we use the current user to sync the metadata to the nodes with #5105 (and many other PRs), there is no reason that prevents us from using the coordinated transaction for metadata syncing. This commit also renames a few functions to reflect their actual implementation.	2021-08-09 10:34:55 +02:00
Onder Kalaci	35964c6366	Dropped columns do not diverge distribution column for partitioned tables Before this commit, creating a partition after a DROP column on the parent (position before dist. key) was leading to partition to have the wrong distribution column.	2021-08-06 13:36:12 +02:00
Onder Kalaci	482b8096e9	Introduce citus_internal_update_relation_colocation update_distributed_table_colocation can be called by the relation owner, and internally it updates pg_dist_partition. With this commit, update_distributed_table_colocation uses an internal UDF to access pg_dist_partition. As a result, this operation can now be done by regular users on MX.	2021-08-03 11:44:58 +02:00
Onur Tirtir	83f5d42365	Use long-lasting mem cxt & optimize correlated index scan	2021-08-02 11:00:12 +03:00
Onur Tirtir	eeecbd2324	Introduce ColumnarSupportsIndexAM	2021-07-30 16:40:27 +03:00
SaitTalhaNisanci	4559d02c41	Fix union pushdown issue (#5079 ) * Fix UNION not being pushdown Postgres optimizes column fields that are not needed in the output. We were relying on these fields to understand if it is safe to push down a union query. This fix looks at the parse query, which has the original column fields to detect if it is safe to push down a union query. * Add more tests * Simplify code and make it more robust * Process varlevelsup > 0 in FindReferencedTableColumn * Only look for outers vars in union path * Add more comments * Remove UNION ALL specific logic for pulling up childvars	2021-07-29 13:52:55 +03:00
Jelte Fennema	7d0b6dc9be	Include data_type and cache in sequence definition on workers These two options were not included when creating the sequences on the workers as part of metadata syncing. The missing `data_type` part of the definition made finding the cause of #5126 harder than necessary, because of confusing errors.	2021-07-22 11:49:06 +02:00
Onder Kalaci	2c349e6dfd	Use current user to sync metadata Before this commit, we always synced the metadata with superuser. However, that creates various edge cases such as visibility errors or self distributed deadlocks or complicates user access checks. Instead, with this commit, we use the current user to sync the metadata. Note that, `start_metadata_sync_to_node` still requires super user because accessing certain metadata (like pg_dist_node) always requires superuser (e.g., the current user should be a superuser). However, metadata syncing operations regarding the distributed tables can now be done with regular users, as long as the user is the owner of the table. A table owner can still insert nonsense metadata, however it'd only affect its own table. So, we cannot do anything about that.	2021-07-16 13:25:27 +02:00
Onur Tirtir	7bfd84bc70	Introduce StripeGetHighestRowNumber	2021-07-07 11:01:39 +03:00
Onur Tirtir	8942086506	Remove stripeList & currentStripe from ColumnarReadState	2021-07-07 11:01:39 +03:00
Sait Talha Nisanci	e7ed16c296	Not include to-be-deleted shards while finding shard placements Ignore orphaned shards in more places Only use active shard placements in RouterInsertTaskList Use IncludingOrphanedPlacements in some more places Fix comment Add tests	2021-06-28 13:05:31 +03:00
Naisila Puka	fe5907ad2d	Adds propagation of ALTER SEQUENCE and other improvements (#5061 ) * Alter seq type when we first use the seq in a dist table * Don't allow type changes when seq is used in dist table * ALTER SEQUENCE propagation * Tests for ALTER SEQUENCE propagation * Relocate AlterSequenceType and ensure dependencies for sequence * Support for citus local tables, and other fixes * Final formatting	2021-06-24 21:23:25 +03:00
Jelte Fennema	d1d386a904	Only allow moves of shards of distributed tables (#5072 ) Moving shards of reference tables was possible in at least one case: ```sql select citus_disable_node('localhost', 9702); create table r(x int); select create_reference_table('r'); set citus.replicate_reference_tables_on_activate = off; select citus_activate_node('localhost', 9702); select citus_move_shard_placement(102008, 'localhost', 9701, 'localhost', 9702); ``` This would then remove the reference table shard on the source, causing all kinds of issues. This fixes that by disallowing all shard moves except for shards of distributed tables. Co-authored-by: Onur Tirtir <onurcantirtir@gmail.com>	2021-06-23 16:25:46 +02:00
Jelte Fennema	ca00b63272	Avoid two race conditions in the rebalance progress monitor (#5050) The first and main issue was that we were putting absolute pointers into shared memory for the `steps` field of the `ProgressMonitorData`. This pointer was being overwritten every time a process requested the monitor steps, which is the only reason why this even worked in the first place. To quote a part of a relevant stack overflow answer: > First of all, putting absolute pointers in shared memory segments is > terrible terible idea - those pointers would only be valid in the > process that filled in their values. Shared memory segments are not > guaranteed to attach at the same virtual address in every process. > On the contrary - they attach where the system deems it possible when > `shmaddr == NULL` is specified on call to `shmat()` Source: https://stackoverflow.com/a/10781921/2570866 In this case a race condition occurred when a second process overwrote the pointer in between the first process's write and read of the steps field. This issue is fixed by not storing the pointer in shared memory anymore. Instead we now calculate its position every time we need it. The second race condition I have not been able to trigger, but I found it while investigating this. This issue was that we published the handle of the shared memory segment, before we initialized the data in the steps. This means that during initialization of the data, a call to `get_rebalance_progress()` could read partial data in an unsynchronized manner.	2021-06-21 14:03:42 +00:00
Onder Kalaci	69ca943e58	Deparse/parse the local cached queries With local query caching, we try to avoid deparse/parse stages as the operation is too costly. However, we can do deparse/parse operations once per cached queries, right before we put the plan into the cache. With that, we avoid edge cases like (4239) or (5038). In a sense, we are making the local plan caching behave similar for non-cached local/remote queries, by forcing to deparse the query once.	2021-06-21 12:24:29 +03:00
Onur Tirtir	6215a3aa93	Merge remote-tracking branch 'origin/master' into columnar-index	2021-06-17 14:31:12 +03:00
Onder Kalaci	bc09288651	Get ready for Improve index backed constraint creation for online rebalancer See: https://github.com/citusdata/citus-enterprise/issues/616	2021-06-17 13:05:56 +03:00
Onur Tirtir	3d11c0f9ef	Merge remote-tracking branch 'origin/master' into columnar-index Conflicts: src/test/regress/expected/columnar_empty.out src/test/regress/expected/multi_extension.out	2021-06-16 20:23:50 +03:00
Onur Tirtir	5adab2a3ac	Report progress when building index on columnar tables	2021-06-16 20:06:33 +03:00
Onur Tirtir	10a762aa88	Implement columnar index support functions	2021-06-16 19:59:32 +03:00
Hanefi Onaldi	5c6069a74a	Do not rely on fk cache when truncating local data (#5018 )	2021-06-07 11:56:48 +03:00
Marco Slot	e81d25a7be	Refactor RelationIsAKnownShard to remove onlySearchPath argument	2021-06-02 14:30:27 +02:00
Ahmet Gedemenli	089ef35940	Disable dropping and truncating known shards Add test for disabling dropping and truncating known shards	2021-06-02 14:30:27 +02:00
Jelte Fennema	1a83628195	Use "orphaned shards" naming in more places We were not very consistent in how we named these shards.	2021-06-04 11:39:19 +02:00
Jelte Fennema	3f60e4f394	Add ExecuteCriticalCommandInDifferentTransaction function We use this pattern multiple times throughout the codebase now. Seems like a good moment to abstract it away.	2021-06-04 11:30:27 +02:00
Jelte Fennema	280b9ae018	Cleanup orphaned shards at the start of a rebalance In case the background daemon hasn't cleaned up shards yet, we do this manually at the start of a rebalance.	2021-06-04 11:23:07 +02:00
Naisila Puka	0f37ab5f85	Fixes column default coming from a sequence (#4914 ) * Add user-defined sequence support for MX * Remove default part when propagating to workers * Fix ALTER TABLE with sequences for mx tables * Clean up and add tests * Propagate DROP SEQUENCE * Removing function parts * Propagate ALTER SEQUENCE * Change sequence type before propagation & cleanup * Revert "Propagate ALTER SEQUENCE" This reverts commit 2bef64c5a29f4e7224a7f43b43b88e0133c65159. * Ensure sequence is not used in a different column with different type * Insert select tests * Propagate rename sequence stmt * Fix issue with group ID cache invalidation * Add ALTER TABLE ALTER COLUMN TYPE .. precaution * Fix attnum inconsistency and add various tests * Add ALTER SEQUENCE precaution * Remove Citus hook * More tests Co-authored-by: Marco Slot <marco.slot@gmail.com>	2021-06-03 23:02:09 +03:00
Hanefi Onaldi	fa29d6667a	Accept invalidation before fk graph validity check (#5017 ) InvalidateForeignKeyGraph sends an invalidation via shared memory to all backends, including the current one. However, we might not call AcceptInvalidationMessages before reading from the cache below. It would be better to also add a call to AcceptInvalidationMessages in IsForeignConstraintRelationshipGraphValid.	2021-06-02 14:45:35 +03:00
Onur Tirtir	94f30a0428	Refactor index check in ColumnarProcessUtility	2021-06-01 11:12:28 +03:00
Jelte Fennema	3271f1bd13	Fix data race in get_rebalance_progress (#5008 ) To be able to report progress of the rebalancer, the rebalancer updates the state of a shard move in a shared memory segment. To then fetch the progress, `get_rebalance_progress` can be called which reads this shared memory. Without this change it did so without using any synchronization primitives, allowing for data races. This fixes that by using atomic operations to update and read from the parts of the shared memory that can be changed after initialization.	2021-05-31 15:27:32 +02:00
SaitTalhaNisanci	8c3f85692d	Not consider old placements when disabling or removing a node (#4960 ) * Not consider old placements when disabling or removing a node * update cluster test	2021-05-28 22:38:20 +02:00
SaitTalhaNisanci	a4944a2102	Rename CoordinatedTransactionShouldUse2PC (#4995 )	2021-05-21 18:57:42 +03:00
Hanefi Onaldi	878513f325	Remove all occurrences of replication_model GUC	2021-05-21 16:14:59 +03:00
Jelte Fennema	10f06ad753	Fetch shard size on the fly for the rebalance monitor Without this change the rebalancer progress monitor gets the shard sizes from the `shardlength` column in `pg_dist_placement`. This column needs to be updated manually by calling `citus_update_table_statistics`. However, `citus_update_table_statistics` could lead to distributed deadlocks while database traffic is on-going (see #4752). To work around this we don't use the `shardlength` column anymore. Instead for every rebalance we now fetch all shard sizes on the fly. Three additional things this does are: 1. It adds tests for the rebalance progress function. 2. If a shard move cannot be done because a source or target node is unreachable, then we error and stop the rebalance, instead of showing a warning and continuing. When using the by_disk_size rebalance strategy it's not safe to continue with other moves if a specific move failed. It's possible that the failed move made space for the next move, and because the failed move never happened this space now does not exist. 3. Adds two new columns to the result of `get_rebalance_progress` which show the size of the shard on the source and target node. Fixes #4930	2021-05-20 16:38:17 +02:00
Nils Dijk	a6c2d2a4c4	Feature: alter database owner (#4986 ) DESCRIPTION: Add support for ALTER DATABASE OWNER This adds support for changing the database owner. It achieves this by marking the database as a distributed object. By marking the database as a distributed object it will look for its dependencies and order the user creation commands (enterprise only) before the alter of the database owner. This is mostly important when adding new nodes. By having the database marked as a distributed object it can easily determine which `ALTER DATABASE ... OWNER TO ...` commands to propagate by resolving the object address of the database and verifying it is a distributed object, and hence should propagate changes of ownership to all workers. Given the ownership of the database might have implications on subsequent commands in transactions we force sequential mode for transactions that have an `ALTER DATABASE ... OWNER TO ...` command in them. This will fail the transaction with meaningful help when the transaction already executed parallel statements. By default the feature is turned off since roles are not automatically propagated, having it turned on would cause hard to understand errors for the user. It can be turned on by the user via setting the `citus.enable_alter_database_owner`.	2021-05-20 13:27:44 +02:00
Onder Kalaci	d07db99ea4	Make sure that target node in shard moves is eligible for shard move	2021-05-20 10:51:01 +02:00
Onder Kalaci	926069a859	Wait until all connections are successfully established Comment from the code: /* * Iterate until all the tasks are finished. Once all the tasks * are finished, ensure that all the connection initializations * are also finished. Otherwise, those connections are terminated * abruptly before they are established (or failed). Instead, we let * the ConnectionStateMachine() to properly handle them. * * Note that we could have the connections that are not established * as a side effect of slow-start algorithm. At the time the algorithm * decides to establish new connections, the execution might have tasks * to finish. But, the execution might finish before the new connections * are established. */ Note that the abruptly terminated connections lead to the following errors: 2020-11-16 21:09:09.800 CET [16633] LOG: could not accept SSL connection: Connection reset by peer 2020-11-16 21:09:09.872 CET [16657] LOG: could not accept SSL connection: Undefined error: 0 2020-11-16 21:09:09.894 CET [16667] LOG: could not accept SSL connection: Connection reset by peer To easily reproduce the issue: - Create a single node Citus - Add the coordinator to the metadata - Create a distributed table with shards on the coordinator - f.sql: select count(*) from test; - pgbench -f /tmp/f.sql postgres -T 12 -c 40 -P 1 or pgbench -f /tmp/f.sql postgres -T 12 -c 40 -P 1 -C	2021-05-19 15:59:13 +02:00
Onder Kalaci	995adf1a19	Executor takes connection establishment and task execution costs into account With this commit, the executor becomes smarter about refraining from opening new connections. The very basic example is that, if connection establishment takes 1000 ms and task execution takes 5 ms, the executor becomes smart enough to not establish new connections.	2021-05-19 15:48:07 +02:00
Marco Slot	644b266dee	Only cache local plans when reusing a distributed plan	2021-05-18 16:11:43 +02:00
SaitTalhaNisanci	eaa7d2bada	Not block maintenance daemon (#4972 ) It was possible to block maintenance daemon by taking a SHARE ROW EXCLUSIVE lock on pg_dist_placement. Until the lock is released maintenance daemon would be blocked. We should not block the maintenance daemon under any circumstances, hence now we try to get the pg_dist_placement lock without waiting; if we cannot get it then we don't try to drop the old placements.	2021-05-17 03:22:35 -07:00
Nils Dijk	c91f8d8a15	Feature: localhost guc (#4836 ) DESCRIPTION: introduce `citus.local_hostname` GUC for connections to the current node Citus once in a while needs to connect to itself for some systems operations. This used to be hardcoded to `localhost`. The hardcoded hostname causes some issues, for example in environments where `sslmode=verify-full` is required. It is not always desirable or even feasible to get `localhost` as an alt name on the certificate. By introducing a GUC to use when connecting to the current instance the user has more control over what network path is used and what hostname is required to be present in the server certificate.	2021-05-12 16:59:44 +02:00
Jelte Fennema	cbbd10b974	Implement an improvement threshold in the rebalancer (#4927 ) Every move in the rebalancer algorithm results in an improvement in the balance. However, even if the improvement in the balance was very small the move was still chosen. This is especially problematic if the shard itself is very big and the move will take a long time. This changes the rebalancer algorithm to take the relative size of the balance improvement into account when choosing moves. By default a move will not be chosen if it improves the balance by less than half of the size of the shard. An extra argument is added to the rebalancer functions so that the user can decide to lower the default threshold if the ignored move is wanted anyway.	2021-05-11 14:24:59 +02:00
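Note on cbbd10b974: the threshold rule described above can be sketched as a single comparison. This is a hedged illustration, not the rebalancer's code: `MoveIsWorthwhile` and its parameters are hypothetical names, and the sketch assumes the balance improvement is already expressed in the same unit as the shard size. A threshold of 0.5 corresponds to the default described in the message ("less than half of the size of the shard").

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Returns true when a candidate move improves the balance by at least
 * improvementThreshold times the size of the shard being moved.
 * All names and the exact comparison are assumptions for illustration.
 */
bool
MoveIsWorthwhile(double balanceImprovement, double shardSize,
				 double improvementThreshold)
{
	assert(shardSize > 0);
	assert(improvementThreshold >= 0);
	return balanceImprovement >= improvementThreshold * shardSize;
}
```

With a threshold of 0.5, a move of a shard of size 10 that improves the balance by 4 would be skipped; lowering the threshold to 0.1 would allow the same move.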
Onder Kalaci	a231ff29b0	Get prepared for some improvements for online rebalancer To see all the changes, see https://github.com/citusdata/citus-enterprise/pull/586/files	2021-05-10 19:54:31 +02:00
Onur Tirtir	4f3c672ebe	Re-consider VALID_ITEMPOINTER_OFFSETS wrt bitmap scan logic	2021-05-10 20:16:50 +03:00
Onur Tirtir	0f4c97e0d0	Improve the constants around row number mapping	2021-05-10 20:16:50 +03:00
Onur Tirtir	2e419ea177	Add first_row_number column to columnar.stripe for tid mapping	2021-05-10 20:16:50 +03:00
jeff-davis	7b9aecff21	Columnar: metapage changes. (#4907 ) * Columnar: introduce columnar storage API. This new API is responsible for the low-level storage details of columnar; translating large reads and writes into individual block reads and writes that respect the page headers and emit WAL. It's also responsible for the columnar metapage, resource reservations (stripe IDs, row numbers, and data), and truncation. This new API is not used yet, but will be used in subsequent commits. * Columnar: add columnar_storage_info() for debugging purposes. * Columnar: expose ColumnarMetadataNewStorageId(). * Columnar: always initialize metapage at creation time. This avoids the complexity of dealing with tables where the metapage has not yet been initialized. * Columnar: columnar storage upgrade/downgrade UDFs. Necessary upgrade/downgrade step so that new code doesn't see an old metapage. * Columnar: improve metadata.c comment. * Columnar: make ColumnarMetapage internal to the storage API. Callers should not have or need direct access to the metapage. * Columnar: perform resource reservation using storage API. * Columnar: implement truncate using storage API. * Columnar: implement read/write paths with storage API. * Columnar: add storage tests. * Revert "Columnar: don't include stripe reservation locks in lock graph." This reverts commit `c3dcd6b9f8`. No longer needed because the columnar storage API takes care of concurrency for resource reservation. * Columnar: remove unnecessary lock when reserving. No longer necessary because the columnar storage API takes care of concurrent resource reservation. * Add simple upgrade tests for storage/ branch * fix multi_extension.out Co-authored-by: Onur Tirtir <onurcantirtir@gmail.com>	2021-05-10 20:16:46 +03:00
SaitTalhaNisanci	6b1904d37a	When moving a shard to a new node ensure there is enough space (#4929 ) * When moving a shard to a new node ensure there is enough space * Add WaitForMiliseconds time utility * Add more tests and increase readability * Remove the retry loop and use a single udf for disk stats * Address review * address review Co-authored-by: Jelte Fennema <github-tech@jeltef.nl>	2021-05-06 17:28:02 +03:00
Jelte Fennema	2f29d4e53e	Continue to remove shards after first failure in DropMarkedShards The comment of DropMarkedShards described the behaviour that after a failure we would continue trying to drop other shards. However the code did not do this and would stop after the first failure. Instead of simply fixing the comment I fixed the code, because the described behaviour is more useful. Now a single shard that cannot be removed yet does not block others from being removed.	2021-04-30 15:42:09 +03:00
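Note on 2f29d4e53e: the "keep going after a failure" behaviour described above can be sketched as a loop that skips a shard that cannot be dropped yet instead of stopping. This is a minimal sketch, not the `DropMarkedShards` implementation: `DropAllPossible`, `DropShardFn` and the stub `StubDropExceptShardTwo` are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback: returns true when the shard was dropped. */
typedef bool (*DropShardFn)(long shardId);

/*
 * Try to drop every shard in the list. A failure for one shard does not
 * stop the loop; that shard is simply left for a later attempt.
 * Returns the number of shards that were dropped.
 */
int
DropAllPossible(const long *shardIds, size_t count, DropShardFn tryDrop)
{
	assert(tryDrop != NULL);
	int droppedCount = 0;
	for (size_t i = 0; i < count; i++)
	{
		if (!tryDrop(shardIds[i]))
		{
			continue; /* cannot be removed yet, keep going with the rest */
		}
		droppedCount++;
	}
	return droppedCount;
}

/* Illustrative stub: pretend shard 2 is still in use and cannot be dropped. */
bool
StubDropExceptShardTwo(long shardId)
{
	return shardId != 2;
}
```

For the list `{1, 2, 3}` and the stub above, shards 1 and 3 are dropped even though shard 2 fails in between.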
Marco Slot	4b49cb112f	Fix FROM ONLY queries on partitioned tables	2021-04-27 16:10:07 +02:00
Onder Kalaci	918838e488	Allow constant VALUES clauses in pushdown queries As long as the VALUES clause contains constant values, we should not recursively plan the queries/CTEs. This is a follow-up work of #1805. So, we can easily apply OUTER join checks as if VALUES clause is a reference table/immutable function.	2021-04-21 14:28:08 +02:00
SaitTalhaNisanci	93c2dcf3d2	Fix data-race with concurrent calls of DropMarkedShards (#4909 ) * Fix problems with concurrent calls of DropMarkedShards When trying to enable `citus.defer_drop_after_shard_move` by default it turned out that DropMarkedShards was not safe to call concurrently. This could especially cause big problems when also moving shards at the same time. During tests it was possible to trigger a state where a shard that was moved would not be available on any of the nodes anymore after the move. Currently DropMarkedShards is only called in production by the maintenance daemon. Since this is only a single process, triggering such a race is currently impossible in production settings. In future changes we will want to call DropMarkedShards from other places too though. * Add some isolation tests Co-authored-by: Jelte Fennema <github-tech@jeltef.nl>	2021-04-21 10:59:48 +03:00
Ahmet Gedemenli	33c620f232	Optimize partitioned disk size calculation (#4905 ) * Optimize partitioned disk size calculation * Polish * Fix test for citus_shard_cost_by_disk_size Try optimizing if not CSTORE	2021-04-19 13:30:56 +03:00
Onder Kalaci	5482d5822f	Keep more statistics about connection establishment times When DEBUG4 enabled, Citus now prints per connection establishment time.	2021-04-16 14:56:31 +02:00
Hanefi Onaldi	9919fbe3f8	Switch to sequential mode on long partition names This commit adds support for long partition names for distributed tables: - ALTER TABLE dist_table ATTACH PARTITION .. - CREATE TABLE .. PARTITION OF dist_table .. Note: create_distributed_table UDF does not support long table and partition names, and is not covered in this commit	2021-04-14 15:27:50 +03:00
Onur Tirtir	fe5c985e1d	Remove HAS_TABLEAM config since we dropped pg11 support (#4862 ) * Remove HAS_TABLEAM config * Drop columnar_ensure_objects_exist * Not call columnar_ensure_objects_exist in citus_finish_pg_upgrade	2021-04-13 10:51:26 +03:00
Ahmet Gedemenli	d74d358a45	Refactor size queries with new enum SizeQueryType (#4898 ) * Refactor size queries with new enum SizeQueryType * Polish	2021-04-12 17:14:29 +03:00
SaitTalhaNisanci	b453563e88	Warm up connections params hash (#4872 ) ConnParams(AuthInfo and PoolInfo) gets a snapshot, which will block the remote connections to localhost. And the release of snapshot will be blocked by the snapshot. This leads to a deadlock. We warm up the conn params hash before starting a new transaction so that the entries will already be there when we start a new transaction. Hence GetConnParams will not get a snapshot.	2021-04-12 13:08:38 +03:00
Halil Ozan Akgul	a5038046f9	Adds shard_count parameter to create_distributed_table	2021-03-29 16:22:49 +03:00
SaitTalhaNisanci	03832f353c	Drop postgres 11 support	2021-03-25 09:20:28 +03:00
Marco Slot	fbc2147e11	Replace MAX_PUT_COPY_DATA_BUFFER_SIZE by citus.remote_copy_flush_threshold GUC	2021-03-16 06:00:38 +01:00
Marco Slot	1646fca445	Add GUC to set maximum connection lifetime	2021-03-16 01:57:57 +01:00
jeff-davis	3b12556401	Columnar: cleanup (#4814 ) * Columnar: fix misnamed file. * Columnar: make compression not dependent on columnar.h. * Columnar: rename columnar_metadata_tables.c to columnar_metadata.c. * Columnar: make customscan not depend on columnar.h. Co-authored-by: Jeff Davis <jefdavi@microsoft.com>	2021-03-15 11:34:39 -07:00
Onder Kalaci	e65e72130d	Rename use -> shouldUse Because setting the flag doesn't necessarily mean that we'll use 2PC. If connections are read-only, we will not use 2PC. In other words, we'll use 2PC only for connections that modified any placements.	2021-03-12 08:29:43 +00:00
Onder Kalaci	6a7ed7b309	Do not trigger 2PC for reads on local execution Before this commit, Citus used 2PC no matter what kind of local query execution happens. For example, if the coordinator has shards (and the workers as well), even a simple SELECT query could start 2PC: ```SQL WITH cte_1 AS (SELECT * FROM test LIMIT 10) SELECT count(*) FROM cte_1; ``` In this query, the local execution of the shards (and also intermediate result reads) triggers the 2PC. To prevent that, Citus now distinguishes local reads and local writes. And, Citus switches to 2PC only if a modification happens. This may still lead to unnecessary 2PCs when there is a local modification and remote SELECTs only. Though, we handle that separately via #4587.	2021-03-12 08:29:43 +00:00
Onder Kalaci	d1cd198655	Prevent infinite recursion for queries that involve UNION ALL and JOIN With this commit, we make sure to prevent infinite recursion for queries in the format: [subquery with a UNION ALL] JOIN [table or subquery] Also, fixes a bug where we pushdown UNION ALL below a JOIN even if the UNION ALL is not safe to pushdown.	2021-03-03 12:27:26 +01:00
Hadi Moshayedi	1a05131331	Use chunk groups to read columnar data (#4768 )	2021-03-02 23:53:24 -08:00
Naisila Puka	2f30614fe3	Reimplement citus_update_table_statistics to detect dist. deadlocks (#4752 ) * Reimplement citus_update_table_statistics * Update stats for the given table not colocation group * Add tests for reimplemented citus_update_table_statistics * Use coordinated transaction, merge with citus_shard_sizes functions * Update the old master_update_table_statistics as well	2021-03-03 04:12:30 +03:00
jeff-davis	9da9bd3dfd	Columnar: rename files and tests. (#4751 ) * Columnar: rename files and tests. * Columnar: Rename TableState to ColumnarState.	2021-03-01 08:34:24 -08:00
SaitTalhaNisanci	feee25dfbd	Use translated vars in postgres 13 as well (#4746 ) * Use translated vars in postgres 13 as well Postgres 13 removed translated vars, so we had special logic for pg 13. However, it had a bug, so now we copy the translated vars before postgres deletes them. This also simplifies the logic. * fix rtoffset with pg >= 13	2021-02-26 19:41:29 +03:00
Naisila Puka	5ebd4eac7f	Preserve colocation with procedures in alter_distributed_table (#4743 )	2021-02-25 19:52:47 +03:00
Hanefi Onaldi	9a792ef841	Remove length limitations for table renames	2021-02-24 03:35:27 +03:00
Onur Tirtir	495096ef5e	Remove useless pg version checks (#4741 )	2021-02-23 21:20:18 +03:00
SaitTalhaNisanci	bcbd24f8de	Only consider pseudo constants for shortcuts (#4712 ) It seems that we need to consider only pseudo constants while doing some shortcuts in planning. For example there could be a false clause but it can contribute to the result in which case it will not be a pseudo constant.	2021-02-15 18:39:37 +03:00
Onder Kalaci	f297c96ec5	Add regression tests for COPY into colocated intermediate results To add the tests without too much data, make the copy switchover configurable.	2021-02-11 15:41:06 +01:00
Ahmet Gedemenli	c8e83d1f26	Fix dropping fkey when distributing table	2021-02-11 15:48:35 +03:00
Hadi Moshayedi	c3dcd6b9f8	Columnar: don't include stripe reservation locks in lock graph.	2021-02-10 10:20:20 -08:00
Hadi Moshayedi	c8d61a31e2	Columnar: chunk_group metadata table	2021-02-09 14:11:58 -08:00
Onder Kalaci	c804c9aa21	Allow local execution for intermediate results in COPY When COPY is used for copying into co-located files, it was not allowed to use local execution. The primary reason was Citus treating co-located intermediate results as co-located shards, and COPY into the distributed table was done via "format result". And, local execution of such COPY commands was not implemented. With this change, we implement support for local execution with "format result". To do that, we use the buffer for every file on shardState->copyOutState, similar to how local copy on shards are implemented. In fact, the logic is similar to local copy on shards, but instead of writing to the shards, Citus writes the results to a file. The logic relies on LOCAL_COPY_FLUSH_THRESHOLD, and flushes only when the size exceeds the threshold. But, unlike local copy on shards, in this case we write the headers and footers just once.	2021-02-09 15:00:06 +01:00
Jeff Davis	2ea31c899e	Columnar: make read and write state private.	2021-02-08 10:11:57 -08:00
Hadi Moshayedi	eff8cffaf3	Columnar: improve naming of limit config variables. (#4653 ) * Rename chunk_row_count to chunk_group_row_limit * Rename stripe_row_count to stripe_row_limit * Undo couple of renames	2021-02-06 09:04:04 -08:00
Hadi Moshayedi	0a9fd91d8f	Use 'Chunk Groups' in EXPLAIN ANALYZE of columnar scan	2021-02-05 10:58:01 -08:00
Ahmet Gedemenli	5dd2a3da03	Convert RelabelTypes into CollateExprs in get_rule_expr function	2021-02-05 12:06:46 +03:00
Onder Kalaci	fc9a23792c	COPY uses adaptive connection management on local node With #4338, the executor is smart enough to fail over to local node if there is not enough space in max_connections for remote connections. For COPY, the logic is different. With #4034, we made COPY work with the adaptive connection management slightly differently. The cause of the difference is that COPY doesn't know which placements are going to be accessed hence requires to get connections up-front. Similarly, COPY decides to use local execution up-front. With this commit, we change the logic for COPY on local nodes: - Try to reserve a connection to local host. This logic follows the same logic (e.g., citus.local_shared_pool_size) as the executor because COPY also relies on TryToIncrementSharedConnectionCounter(). - If reservation to local node fails, switch to local execution. Apart from this, if local execution is disabled, we follow the exact same logic as for multi-node Citus. It means that if we are out of connections, we'd give an error.	2021-02-04 09:45:07 +01:00
Sait Talha Nisanci	9ba3f70420	Remove unused method	2021-02-03 20:02:03 +03:00
Onur Tirtir	3a403090fd	Disallow adding local table with identity column to metadata (#4633 ) pg_get_tableschemadef_string doesn't know how to deparse identity columns so we cannot reflect those columns when creating shell relation. For this reason, we don't allow adding local tables -having identity cols- to metadata.	2021-02-03 19:05:17 +03:00
Onur Tirtir	93c3f30024	Rename ExtractColumnsOwningSequences	2021-02-02 18:17:42 +03:00
Hanefi Önaldı	cab17afce9	Introduce UDFs for fixing partitioned table constraint names	2021-01-29 17:32:20 +03:00
SaitTalhaNisanci	738825cc38	Fix partition column index issue (#4591 ) * Fix partition column index issue We send column names to worker_hash/range_partition_table methods, and in these methods we check the column name index from tuple descriptor. Then this index is used to decide the bucket that the current row will be sent for the repartition. This becomes a problem when there are the same column names in the tupleDescriptor. Then we can choose the wrong index. Hence the partitioned data will be put to wrong workers. Then the result could miss some data because workers might contain different range of data. An example: TupleDescriptor contains "trip_id", "car_id", "car_id" for one table. It contains only "car_id" for the other table. And assuming that the tables will be partitioned by car_id, it is not certain what should be used for deciding the bucket number for the first table. Assuming value 2 goes to bucket 2 and value 3 goes to bucket 3, it is not certain which bucket "1 2 3" (trip_id, car_id, car_id) row will go to. As a solution we send the index of partition column in targetList instead of the column name. The old API is kept so that if workers upgrade work, it still works (though it will have the same bug) * Use the same method so that backporting is easier	2021-01-29 14:40:40 +03:00
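Note on 738825cc38: the ambiguity described above (duplicate column names in the tuple descriptor) can be shown with a minimal sketch. It is not the worker partitioning code: `FirstColumnIndexByName` is a hypothetical helper that resolves a name to the first matching position, which is the assumption made here about how a name-based lookup behaves.

```c
#include <assert.h>
#include <string.h>

/*
 * Resolve a column name to an index by scanning from the left.
 * Returns the first match, or -1 when the name is not present.
 * With duplicate names, any later column with the same name can
 * never be selected this way.
 */
int
FirstColumnIndexByName(const char *const *columnNames, int columnCount,
					   const char *name)
{
	assert(columnNames != NULL);
	assert(name != NULL);
	for (int i = 0; i < columnCount; i++)
	{
		if (strcmp(columnNames[i], name) == 0)
		{
			return i;
		}
	}
	return -1;
}
```

For the descriptor `("trip_id", "car_id", "car_id")` a lookup of `car_id` always yields index 1, even when the partition column is the one at index 2. Passing the index directly, as the commit describes, removes that ambiguity.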
Onur Tirtir	2f30be823e	Rename create_citus_local_table to citus_add_local_table_to_metadata For simplicity in downgrade test in multi_extension, didn't actually remove create_citus_local_table udf.	2021-01-27 15:52:36 +03:00
Onur Tirtir	458a81f93d	Add suppressNoticeMessages to TableConversionState	2021-01-27 12:53:58 +03:00
Naisila Puka	94bc2703bc	Make undistribute_table() and citus_create_local_table() work with columnar (#4563 ) * Make undistribute_table() and citus_create_local_table() work with columnar * Rename and use LocallyExecuteUtilityTask for UDF check * Remove 'local' references in ExecuteUtilityCommand	2021-01-27 01:17:20 +03:00
Halil Ozan Akgul	bafa692fc1	Adds error messages with names of indexes that will be dropped	2021-01-26 18:18:26 +03:00
Jeff Davis	d62e54dc09	Columnar: optimize write path.	2021-01-25 11:47:21 -08:00
Hadi Moshayedi	639952ffa8	Read chunk row count from catalog tables	2021-01-25 08:53:52 -08:00
Onur Tirtir	b5ea033a0b	Convert postgres tables to citus local when creating reference table having fkeys	2021-01-25 11:02:50 +03:00
Onur Tirtir	253c19062a	Rename IsCitusInitiatedBackend to IsCitusInitiatedRemoteBackend (#4562 )	2021-01-23 01:07:43 +03:00
Jeff Davis	53f7b019d5	Columnar: clean up old references to cstore.	2021-01-22 11:08:36 -08:00
Onur Tirtir	941c8fbf32	Automatically undistribute citus local tables when no more fkeys with reference tables (#4538 )	2021-01-22 18:15:41 +03:00
Ahmet Gedemenli	887b67953b	Merge branch 'master' into fix-bug-create-citus-local-table-with-stats	2021-01-22 12:46:47 +03:00
Hadi Moshayedi	222fb4d589	Don't use 'cstore' in function names	2021-01-21 18:32:21 -08:00
jeff-davis	0b5551faaf	Columnar: add explain info for chunk filtering (#4554 ) Co-authored-by: Jeff Davis <jefdavi@microsoft.com>	2021-01-21 15:04:42 -08:00
Ahmet Gedemenli	2fa060a32d	Fix bug creating citus local table with stats	2021-01-20 17:17:13 +03:00
Onder Kalaci	8df58926c5	Rename CitusProcessUtility -> ProcessUtilityForNode	2021-01-20 15:54:00 +03:00
Hadi Moshayedi	bc01c795a2	Reland #4419	2021-01-19 07:48:47 -08:00
Halil Ozan Akgul	27c2bd1599	Moves creation of ALTER INDEX STATISTICS commands next to index commands	2021-01-18 16:55:53 +03:00
Onur Tirtir	f1ecbc3a53	Fix segfault when adding/dropping fkey from ref to citus local via remote exec (#4528 )	2021-01-17 20:43:33 +03:00
Onder Kalaci	c35e22d75d	Skip validation for foreign key creation commands For certain purposes, we drop and recreate the foreign keys. As we acquire exclusive locks on the tables in between drop and re-create, we can safely skip validation phase of the foreign keys. The reason is purely performance, as foreign key validation could take a long time.	2021-01-15 18:04:52 +03:00
Onder Kalaci	ae0b92233d	Rename function	2021-01-15 18:04:52 +03:00
Onder Kalaci	30d0a65f40	Adds citus.enable_local_reference_table_foreign_keys When enabled, any foreign keys between local tables and reference tables are supported by converting the local table to a citus local table. When the coordinator is not in the metadata, the logic is disabled as foreign keys are not allowed in this configuration.	2021-01-15 18:04:52 +03:00
Halil Ozan Akgul	9407965817	Moves struct to the header	2021-01-15 11:50:11 +03:00
Onur Tirtir	36b418982f	Add support for ALTER TABLE commands defining foreign keys	2021-01-14 17:12:00 +03:00
Onur Tirtir	05931b8fe2	Pass ProcessUtilityContext to .preprocess	2021-01-14 17:12:00 +03:00
jeff-davis	9cffd41389	Cleanup: use table_open, not heap_open. (#4506 ) Co-authored-by: Jeff Davis <jefdavi@microsoft.com>	2021-01-13 12:08:46 -08:00
jeff-davis	b49beda4c3	Stronger check for triggers on columnar tables (#4493 ). (#4494 ) * Stronger check for triggers on columnar tables (#4493). Previously, we used a simple ProcessUtility_hook. Change to use an object_access_hook instead. * Replace alter_table_set_access_method test on partition with foreign key Co-authored-by: Jeff Davis <jefdavi@microsoft.com> Co-authored-by: Marco Slot <marco.slot@gmail.com>	2021-01-13 10:30:53 -08:00
Onur Tirtir	00da1eed20	Some refactor as a preparation	2021-01-13 16:50:09 +03:00
Halil Ozan Akgul	2be14cce2e	Adds alter_distributed_table and alter_table_set_access_method UDFs	2021-01-13 16:02:39 +03:00
SaitTalhaNisanci	724d56f949	Add citus shard helper view (#4361 ) With citus shard helper view, we can easily see: - where each shard is, which node, which port - what kind of table it belongs to - its size With such a view, we can see shards that have a size bigger than some value, which could be useful. Also debugging can be easier in production as well with this view. Fetch shards in one go per node The previous implementation was slow because it would do a lot of round trips, one per shard to be exact. Hence it is improved so that we fetch all the shard_name, shard-size pairs per node in one go. Construct shards_names, sizes query on coordinator	2021-01-13 13:58:47 +03:00
Ahmet Gedemenli	436c9d9d79	Remove the word 'master' from Citus UDFs (#4472 ) * Replace master_add_node with citus_add_node * Replace master_activate_node with citus_activate_node * Replace master_add_inactive_node with citus_add_inactive_node * Use master udfs in old scripts * Replace master_add_secondary_node with citus_add_secondary_node * Replace master_disable_node with citus_disable_node * Replace master_drain_node with citus_drain_node * Replace master_remove_node with citus_remove_node * Replace master_set_node_property with citus_set_node_property * Replace master_unmark_object_distributed with citus_unmark_object_distributed * Replace master_update_node with citus_update_node * Replace master_update_shard_statistics with citus_update_shard_statistics * Replace master_update_table_statistics with citus_update_table_statistics * Rename master_conninfo_cache_invalidate to citus_conninfo_cache_invalidate Rename master_dist_local_group_cache_invalidate to citus_dist_local_group_cache_invalidate * Replace master_copy_shard_placement with citus_copy_shard_placement * Replace master_move_shard_placement with citus_move_shard_placement * Rename master_dist_node_cache_invalidate to citus_dist_node_cache_invalidate * Rename master_dist_object_cache_invalidate to citus_dist_object_cache_invalidate * Rename master_dist_partition_cache_invalidate to citus_dist_partition_cache_invalidate * Rename master_dist_placement_cache_invalidate to citus_dist_placement_cache_invalidate * Rename master_dist_shard_cache_invalidate to citus_dist_shard_cache_invalidate * Drop master_modify_multiple_shards * Rename master_drop_all_shards to citus_drop_all_shards * Drop master_create_distributed_table * Drop master_create_worker_shards * Revert old function definitions * Add missing revoke statement for citus_disable_node	2021-01-13 12:10:43 +03:00
Onur Tirtir	dd55ab394e	Disallow cascade_via_foreign_keys if any partition rel has non-inherited fkeys (#4487 )	2021-01-11 21:50:09 +03:00
Marco Slot	d900a7336e	Automatically add placeholder record for coordinator	2021-01-08 15:09:53 +01:00
Marco Slot	597533b1ff	Add citus_set_coordinator_host	2021-01-08 13:36:26 +01:00
Onur Tirtir	5289785da4	Add cascade_via_foreign_keys option to create_citus_local_table (#4462 )	2021-01-08 15:13:26 +03:00
Marco Slot	011283122b	Add the shard rebalancer implementation	2021-01-07 16:51:55 +01:00
Onur Tirtir	f3801143fb	Add cascade option to undistribute_table	2021-01-07 15:41:49 +03:00
Onur Tirtir	2e3e680ba9	Add infra to cascade citus table functions	2021-01-07 15:41:48 +03:00
Marco Slot	47c1b19174	Revert "Do metadata sync in a separate background worker." This reverts commit `4df723cf9b`.	2021-01-07 10:30:04 +01:00
Marco Slot	d9f175532b	Revert "Trigger metadata sync at transaction commit" This reverts commit `a2c73bef27`.	2021-01-07 10:30:00 +01:00
Marco Slot	5de3337b2f	Support local execution for INSERT..SELECT with re-partitioning	2021-01-06 16:15:53 +01:00
Onur Tirtir	e91e745dbc	Implement ConstraintWithNameIsOfType (#4451 )	2020-12-29 11:53:06 +03:00
Onur Tirtir	04a4167a8a	Implement GetPgDependTuplesForDependingObjects	2020-12-25 18:03:28 +03:00
Hadi Moshayedi	a2c73bef27	Trigger metadata sync at transaction commit	2020-12-24 08:28:38 -08:00
Hadi Moshayedi	4df723cf9b	Do metadata sync in a separate background worker.	2020-12-24 08:25:55 -08:00
Ahmet Gedemenli	48ca1637a4	Propagate alter stats owner	2020-12-24 17:10:12 +03:00
Ahmet Gedemenli	f7c70f9a63	Propagate alter stats target	2020-12-24 17:10:12 +03:00
Ahmet Gedemenli	5a1607b6c0	Propagate alter stats schema	2020-12-24 17:10:12 +03:00
Ahmet Gedemenli	bdce4a7e67	Propagate rename statistics	2020-12-24 17:10:12 +03:00
Onur Tirtir	5ed9197041	Implement infra to get foreign key connected relations (#4439 ) On top of our foreign key graph, implement the infrastructure to get list of relations that are connected to input relation via a foreign key graph. We need this to support cascading create_citus_local_table & undistribute_table operations. Also add regression tests to see what our foreign key graph is able to capture currently.	2020-12-24 16:42:40 +03:00
Halil Ozan Akgül	9fd3f62cb6	Refactor foreign key functions to use table types (#4424 ) * Reuses extractReferencing/Referenced variables * Refactors GetForeignKeyOids function to check table types * Converts flags to inclusive	2020-12-23 17:05:09 +03:00
Onur Tirtir	d1b3eaf767	Refactor ColumnAppearsInForeignKeyToReferenceTable (#4441 )	2020-12-23 11:44:02 +03:00
Ahmet Gedemenli	874fa1fc09	Propagate Drop Statistics	2020-12-22 18:34:46 +03:00
Marco Slot	f2056e553f	Expose partition column of subqueries in optimizer (#4355 ) Co-authored-by: Marco Slot <marco.slot@gmail.com>	2020-12-18 20:32:52 +01:00
Ahmet Gedemenli	770d3da1ca	Add dependencies for stat schemas	2020-12-18 17:04:13 +03:00
Ahmet Gedemenli	6c0465566a	Propagate create statistics	2020-12-17 20:38:36 +03:00
Marco Slot	100e5d3196	Address review feedback	2020-12-15 15:23:38 +01:00
Sait Talha Nisanci	7951273f74	Refactor WrapRteRelationIntoSubquery	2020-12-15 18:18:36 +03:00
Sait Talha Nisanci	0e53aa5d3b	Add more tests	2020-12-15 18:18:36 +03:00
Sait Talha Nisanci	f7c1509fed	Not check if the query is routable for converting It seems that there are only very few cases where that is useful, and for now we prefer not having that check. This means that we might perform some unnecessary checks, but that should be rare and not performance critical.	2020-12-15 18:18:36 +03:00
Sait Talha Nisanci	1d82972ff4	Increase the performance with a trick Instead of sending NULL's over a network, we now convert the subqueries in the form of: SELECT t.a, NULL, NULL FROM (SELECT a FROM table)t; And we recursively plan the inner part so that we don't send the NULL's over network. We still need the NULLs in the outer subquery because we currently don't have an easy way of updating all the necessary places in the query. Add some documentation for how the conversion is done	2020-12-15 18:18:36 +03:00
Sait Talha Nisanci	3aed6c3ad0	Rename containsOnlyLocalTable as isLocalTableModification Update error message in Modify View	2020-12-15 18:18:36 +03:00
Sait Talha Nisanci	5618f3a3fc	Use BaseRestrictInfo for finding equality columns Baseinfo also has pushed down filters etc, so it makes more sense to use BaseRestrictInfo to determine what columns have constant equality filters. Also RteIdentity is used for removing conversion candidates instead of rteIndex.	2020-12-15 18:18:36 +03:00
Sait Talha Nisanci	28c5b6a425	Convert some hard coded errors to deferred errors in router planner	2020-12-15 18:18:36 +03:00
Sait Talha Nisanci	69992d58f9	Add broken local-dist table modifications tests It seems that most of the updates were broken, we weren't aware of it because there wasn't any data in the tables. They are broken mostly because local tables do not have a shard id and some code paths should be updated with that information, currently when there is an invalid shard id, it is assumed to be pruned. Consider local tables in router planner In case there is a local table, the shard id will not be valid and there are some checks that rely on shard id, we should skip these in case of local tables, which is handled with a dummy placement. Add citus local table dist table join tests add local-dist table mixed joins tests	2020-12-15 18:18:36 +03:00
Sait Talha Nisanci	a34504d7bf	Move recursive planning related function to recursive_planning	2020-12-15 18:17:10 +03:00
Sait Talha Nisanci	2a44029aaf	Simplify ContainsTableToBeConvertedToSubquery AllDataLocallyAccessible and ContainsLocalTableSubqueryJoin are removed. We can possibly remove ModifiesLocalTableWithRemoteCitusLocalTable as well. Though this removal has a side effect that now when all the data is locally available, we could still wrap a relation into a subquery, I guess that should be resolved in the router planner itself. Add more tests	2020-12-15 18:17:10 +03:00
Sait Talha Nisanci	26d9f0b457	Use auto mode in tests and fix debug message	2020-12-15 18:17:10 +03:00
Sait Talha Nisanci	eebcd995b3	Add some more tests	2020-12-15 18:17:10 +03:00
Sait Talha Nisanci	5693cabc41	Not convert an already routable plannable query We should not recursively plan an already routable plannable query. An example of this is (SELECT * FROM local JOIN (SELECT * FROM dist) d1 USING(a)); So we let the recursive planner do all of its work and at the end we convert the final query to handle unsupported joins. While doing each conversion, we check if it is router plannable, if so we stop. Only consider range table entries that are in jointree If a range table entry is not in the jointree then there is no point in considering it, because we are trying to convert range table entries to subqueries for the join use case.	2020-12-15 18:17:10 +03:00
Sait Talha Nisanci	2ff65f3630	Enable partitioned distributed tables in local-dist table joins	2020-12-15 18:17:10 +03:00
Sait Talha Nisanci	44953579cf	Enable citus-local distributed table joins Check equality in quals We want to recursively plan distributed tables only if they have an equality filter on a unique column. So '>' and '<' operators will not trigger recursive planning of distributed tables in local-distributed table joins. Recursively plan distributed table only if the filter is constant If the filter is not a constant then the join might return multiple rows and there is a chance that the distributed table will return huge data. Hence if the filter is not constant we choose to recursively plan the local table.	2020-12-15 18:17:10 +03:00
Sait Talha Nisanci	f3d55448b3	Choose distributed table if it has a unique index in filter When doing local-distributed table joins we convert one of them to a subquery. The current policy is that we convert the distributed table to a subquery if it has a filter on a column that has a unique index (a primary key also has a unique index).	2020-12-15 18:17:10 +03:00
Onder Kalaci	3f4952cc2b	Pushdown projections when relations are recursively planned This is important to limit the data transfer size.	2020-12-15 18:17:10 +03:00
Onder Kalaci	594e001f3b	Add filter pushdown regression tests Also handle WHERE false	2020-12-15 18:17:10 +03:00
Onder Kalaci	7a4d6b2984	Handle modifications as well	2020-12-15 18:17:10 +03:00
Onder Kalaci	8f8390ed6e	Recursively plan local table joins The logical planner cannot handle joins between local and distributed table. Instead, we can recursively plan one side of the join and let the logical planner handle the rest. Our algorithm is a little smart, trying not to recursively plan distributed tables, but favors local tables.	2020-12-15 18:17:10 +03:00
Onder Kalaci	7cc25c9125	Add ability to fetch the restrictions per relation With this commit, we add the ability to add restrictions per relation. We simply rely on the restrictions that Postgres keeps per relation.	2020-12-15 18:17:10 +03:00
Marco Slot	f2538a456f	Support co-located/recurring sublinks in the target list	2020-12-13 15:45:24 +01:00
Hadi Moshayedi	4668fe51a6	Columnar: Make compression level configurable	2020-12-09 08:48:50 -08:00
Hadi Moshayedi	f5a4a4bc74	Columnar: Support zstd compression	2020-12-09 08:30:55 -08:00
Hadi Moshayedi	3f81ee26fd	Columnar: Support LZ4 compression	2020-12-09 08:29:07 -08:00
Jeff Davis	3758e83850	Rename cstore->columnar in SQL objects and errors.	2020-12-07 13:01:53 -08:00
Ahmet Gedemenli	936775e8e3	Delete transactions when removing node With this commit, we delete entries in pg_dist_transaction for the primary nodes that are removed by `master_remove_node`.	2020-12-07 11:35:20 +03:00
Hadi Moshayedi	01da2a1c73	Columnar: track decompressed length in metadata	2020-12-04 09:09:39 -08:00
Hadi Moshayedi	4a9aebaa7b	Columnar: rename block to chunk	2020-12-03 08:50:19 -08:00
SaitTalhaNisanci	f164575524	Add a utility to process each table index (#4382) A utility function is added so that each caller can implement a handler for each index on a given table. This means that the caller doesn't need to worry about how to access each index; the only thing it needs to do is to implement a function to which each index on the table is passed iteratively.	2020-12-03 16:33:13 +03:00
Onder Kalaci	c546ec5e78	Local node connection management When Citus needs to parallelize queries on the local node (e.g., the node executing the distributed query and the shards are the same), we need to be mindful about the connection management. The reason is that the client backends that are running distributed queries are competing with the client backends that Citus initiates to parallelize the queries in order to get a slot on the max_connections. In that regard, we implemented a "failover" mechanism where if the distributed queries cannot get a connection, the execution fails over the tasks to the local execution. The failover logic is as follows: - Ask the connection manager if it is OK to get a connection - If yes, we are good. - If no, we fail the workerPool and the failure triggers the failover of the tasks to local execution queue The decision of getting a connection is as follows: /* * For local nodes, solely relying on citus.max_shared_pool_size or * max_connections might not be sufficient. The former gives us * a preview of the future (e.g., we let the new connections to establish, * but they are not established yet). The latter gives us the close to * precise view of the past (e.g., the active number of client backends). * * Overall, we want to limit both of the metrics. The former limit typically * kicks in under regular loads, where the load of the database increases in * a reasonable pace. The latter limit typically kicks in when the database * is issued lots of concurrent sessions at the same time, such as benchmarks. */	2020-12-03 14:16:13 +03:00
Hadi Moshayedi	c2f60b6422	Columnar: pg_upgrade support (#4354 )	2020-12-02 08:46:59 -08:00
Ahmet Gedemenli	514c6a76ac	Propagate alter schema rename	2020-12-02 15:18:26 +03:00
Nils Dijk	6f9c040f76	DESCRIPTION: Propagate columnar table settings for distributed tables When distributing a columnar table, as well as changing options on a distributed columnar table, this patch will forward the settings from the coordinator to the workers. For propagating option changes on an already distributed table this change is pretty straightforward. Before applying the change in options locally we will create a `DDLJob` that contains a call to `alter_columnar_table_set(...)` for every shard placement with all settings of the current table. This goes both for setting an option as well as resetting. This will reset the values to the defaults configured on the coordinator. This has the effect that the coordinator is authoritative on the settings and makes sure the shards have the same settings set as the table on the coordinator. When a columnar table is distributed it is using the `TableDDLCommand` infrastructure to create a new kind of `TableDDLCommand`. This new type, called a `TableDDLCommandFunction`, contains a context and 2 function pointers to execute. One function returns the command as applied on the table, the second function will return the sql command to apply to a shard with a given shard id. The schema name is ignored as it will use the fully qualified name of the shard in the same schema as the base table.	2020-12-02 13:02:42 +01:00
Onder Kalaci	f7e1aa3f22	Multi-row INSERTs use local execution when placements are local Multi-row execution already uses sequential execution. When shards are local, using local execution is profitable as it avoids an extra connection establishment to the local node.	2020-12-01 21:37:59 +03:00
Hadi Moshayedi	a94e8c9cda	Associate column store metadata with storage id (#4347 )	2020-11-30 18:01:43 -08:00
Onur Tirtir	7f3d1182ed	Handle invalid connection hash entries (#4362) If MemoryContextAlloc errors out (e.g. during an OOM), ConnectionHashEntry->connections stays as NULL. With this commit, we add an isValid flag to ConnectionHashEntry that should be set to true right after we allocate & initialize the ConnectionHashEntry->connections list properly, and we check it before accessing ConnectionHashEntry->connections.	2020-11-30 19:44:03 +03:00
Nils Dijk	383e334023	refactor options to their own table linked to the regclass (#4346 ) Columnar options were by accident linked to the relfilenode instead of the regclass/relation oid. This PR moves everything related to columnar options to their own catalog table.	2020-11-27 11:22:08 -08:00
Nils Dijk	326e6afa53	refactor table ddl events scoped for shards (#4342) Refactor internals on how Citus creates the SQL commands it sends to recreate shards. Before, Citus collected solely ddl commands as `char *`s to recreate a table. If they were used to create a shard they were wrapped with `worker_apply_shard_ddl_command` and sent to the workers. On the workers the UDF wrapping the ddl command would rewrite the parsetree to replace table names with their shard name equivalent. This worked well, but poses an issue when adding columnar. Due to limitations in Postgres on creating custom options on table access methods we need to fall back on a UDF to set columnar specific options. Now, to recreate the table, we can no longer rely on having solely DDL statements to recreate a table. A prototype was made to run this UDF wrapped in `worker_apply_shard_ddl_command`. This became pretty messy, hard to understand and subsequently hard to maintain. This PR proposes a refactor of the internal representation of table ddl commands into a `TableDDLCommand` structure. The current implementation only supports a `char *` as its contents. Based on the use of the DDL statement (e.g. creating the table -mx- or creating a shard) one of two different functions can be called to get the statement to send to the worker: - `GetTableDDLCommand(TableDDLCommand *command)`: This function returns the ddl command to create the table. In this implementation it will just return the `char *`. This has the same functionality as getting the old list and not wrapping it. - `GetShardedTableDDLCommand(TableDDLCommand *command, uint64 shardId, char *schemaName)`: This function returns the ddl command wrapped in `worker_apply_shard_ddl_command` with the `shardId` as an argument. Due to backwards compatibility it also accepts a `schemaName`. The exact purpose is not directly clear. Ideally new implementations would work with fully qualified statements and ignore the `schemaName`. 
A future implementation could accept 2 function pointers and a `void *` for context to let the two pointers work on. This gives greater flexibility in controlling what commands get sent in which situations. Also, in the future, we could implement the intermediate step of creating the `parsetree` datastructure of statements based on the contents in the catalog with a corresponding deparser. For sharded queries a mutator could be run over the parsetree to rewrite the table names to the names with the shard identifier. This will completely omit the requirement for `worker_apply_shard_ddl_command`.	2020-11-26 13:31:59 +01:00
Onder Kalaci	629ecc3dee	Add the infrastructure to count the number of client backends Considering the adaptive connection management improvements that we plan to roll out soon, it is very helpful to know the number of active client backends. We are doing this addition to simplify the adaptive connection management for single node Citus. In single node Citus, both the client backends and Citus parallel queries would compete to get slots on Postgres' `max_connections` on the same Citus database. With adaptive connection management, we have the counters for Citus parallel queries. That helps us to adaptively decide on the remote executions pool size (e.g., throttle connections if necessary). However, we do not have any counters for the total number of client backends on the database. For single node Citus, we should consider all the client backends, not only the remote connections that Citus does. Of course Postgres internally knows how many client backends are active. However, to get that number Postgres iterates over all the backends. For example, see [pg_stat_get_db_numbackends](`8e90ec5580/src/backend/utils/adt/pgstatfuncs.c (L1240)`) where Postgres iterates over all the backends. For our purposes, we need this information on every connection establishment. That's why we cannot afford to do this kind of iteration.	2020-11-25 19:19:24 +01:00
Onur Tirtir	46be63d76b	Refactor PreprocessIndexStmt (#4272 )	2020-11-25 12:19:37 +03:00
Hadi Moshayedi	40b52ab757	Fix memory leaks in column store	2020-11-23 11:26:12 -08:00
Jeff Davis	8cee2b092b	remove columnar FDW code	2020-11-20 10:03:12 -08:00
Onder Kalaci	c433c66f2b	Do not execute subplans multiple times with cursors Before this commit, we let AdaptiveExecutorPreExecutorRun() to be effective multiple times on every FETCH on cursors. That does not affect the correctness of the query results, but adds significant overhead.	2020-11-20 10:43:56 +01:00
Jeff Davis	a2b698a766	rename cstore_tableam -> columnar	2020-11-19 12:15:51 -08:00
Hadi Moshayedi	97cba2d5b6	Implements write state management for tuple inserts. TableAM API doesn't allow us to pass around a state variable along all of the tuple inserts belonging to the same command. We require this in columnar store, since we batch them, and when we have enough rows we flush them as stripes. To do that, we keep a (relfilenode) -> stack of (subxact id, TableWriteState) global mapping. Inserts: Whenever we want to insert a tuple, we look up the relation's relfilenode in this mapping. If the top of the stack matches the current subtransaction, we use the existing TableWriteState. Otherwise, we allocate a new TableWriteState and push it on top of the stack. (Sub)Transaction Commit/Aborts: When the subtransaction or transaction is committed, we flush and pop all entries matching the current SubTransactionId. When the subtransaction or transaction is aborted, we pop all entries matching the current SubTransactionId and discard them without flushing. Reads: Since we might have unwritten rows which need to be read by a table scan, we flush write states on SELECTs. Since flushing the write state of upper transactions in a subtransaction will cause metadata to be written in the wrong subtransaction, we ERROR out if any of the upper subtransactions have unflushed rows. Table Drops: We record in which subtransaction the table was dropped. When committing a subtransaction in which the table was dropped, we propagate the drop to the upper transaction. When aborting a subtransaction in which the table was dropped, we mark the table as not deleted.	2020-11-17 12:07:16 -08:00
Nils Dijk	725f4a37d0	change configure to not have options	2020-11-17 19:01:54 +01:00
Nils Dijk	213eb93e6d	make columnar compile and functionally working	2020-11-17 18:55:34 +01:00
Nils Dijk	527d3ce0bb	move headers to include directory	2020-11-17 18:55:34 +01:00
Önder Kalacı	0c0fc69f2a	Remove unused field (#4275 )	2020-11-17 11:41:57 +01:00
Hanefi Onaldi	d3019f1b6d	Introduce foreach_ptr_modify macro (#4303) If one wishes to iterate through a List and insert list elements in PG13, it is not safe to use for_each_ptr, as the List representation in PostgreSQL is no longer a linked list but an array, and it is possible that the whole array is repalloc'ed if there is not sufficient space available. See postgres commit 1cff1b95ab6ddae32faa3efe0d95a820dbfdc164 for more information	2020-11-09 12:03:59 +03:00
Onder Kalaci	e0d2ac7620	Do not rely on set_rel_pathlist_hook for finding local relations When a relation is used on an OUTER JOIN with FALSE filters, set_rel_pathlist_hook may not be called for the table. There might be other cases as well, so do not rely on the hook for classification of the tables.	2020-11-06 11:14:30 +01:00
Halil Ozan Akgul	77b3be8b6d	Replace RelOptInfos with the only field of them that is used, relids, to be able to copy	2020-10-22 13:42:28 +03:00
Onder Kalaci	5c4c9304ba	Remove RemoveDuplicateJoinRestrictions() function RemoveDuplicateJoinRestrictions() function was introduced with the aim of decreasing the overall planning times by eliminating the duplicate JOIN restriction entries (#1989). However, it turns out that the function itself is so CPU intensive, with a very high algorithmic complexity, that it hurts a lot more than it helps. The function is a clear example of premature optimization. The table below shows the difference clearly:
| | "distributed query planning time master" | RemoveDuplicateJoinRestrictions() execution time on master | "Remove the function RemoveDuplicateJoinRestrictions() this PR" |
|---|---|---|---|
| 5 table INNER JOIN | 9 msec | 2 msec | 7 msec |
| 10 table INNER JOIN | 227 msec | 194 msec | 29 msec |
| 20 table INNER JOIN | 1 sec 235 msec | 1 sec 139 msec | 90 msecs |
| 50 table INNER JOIN | 24 seconds | 21 seconds | 1.5 seconds |
| 100 table INNER JOIN | 2 minutes 16 seconds | 1 minute 53 seconds | 23 seconds |
| 250 table INNER JOIN | Bottleneck on JoinClauseList | 18 minutes 52 seconds | Bottleneck on JoinClauseList |
| 5 table INNER JOIN in subquery | 9 msec | 0 msec | 6 msec |
| 10 table INNER JOIN subquery | 33 msec | 10 msec | 32 msec |
| 20 table INNER JOIN subquery | 132 msec | 67 msec | 123 msec |
| 50 table INNER JOIN subquery | 1.2 seconds | 900 msec | 500 msec |
| 100 table INNER JOIN subquery | 6 seconds | 5 seconds | 2 seconds |
| 250 table INNER JOIN subquery | 54 seconds | 37 seconds | 20 seconds |
| 5 table LEFT JOIN | 5 msec | 0 msec | 5 msec |
| 10 table LEFT JOIN | 11 msec | 0 msec | 13 msec |
| 20 table LEFT JOIN | 26 msec | 2 msec | 30 msec |
| 50 table LEFT JOIN | 150 msec | 15 msec | 193 msec |
| 100 table LEFT JOIN | 757 msec | 71 msec | 722 msec |
| 250 table LEFT JOIN | 8 seconds | 600 msec | 8 seconds |
| 5 JOINs among 2 table JOINs | 37 msec | 11 msec | 25 msec |
| 10 JOINs among 2 table JOINs | 536 msec | 306 msec | 352 msec |
| 20 JOINs among 2 table JOINs | 794 msec | 181 msec | 640 msec |
| 50 JOINs among 2 table JOINs | 25 seconds | 2 seconds | 22 seconds |
| 100 JOINs among 2 table JOINs | Bottleneck on JoinClauseList | 9 seconds | Bottleneck on JoinClauseList |
| 150 JOINs among 2 table JOINs | Bottleneck on JoinClauseList | 46 seconds | Bottleneck on JoinClauseList |
On top of the performance penalty, the function had a critical bug #4255, and with #4254 we hit one more important bug. It should be fixed by adding the following check to the ContextCoversJoinRestriction():
```
static bool
JoinRelIdsSame(JoinRestriction *leftRestriction, JoinRestriction *rightRestriction)
{
    Relids leftInnerRelIds = leftRestriction->innerrel->relids;
    Relids rightInnerRelIds = rightRestriction->innerrel->relids;
    if (!bms_equal(leftInnerRelIds, rightInnerRelIds))
    {
        return false;
    }

    Relids leftOuterRelIds = leftRestriction->outerrel->relids;
    Relids rightOuterRelIds = rightRestriction->outerrel->relids;
    if (!bms_equal(leftOuterRelIds, rightOuterRelIds))
    {
        return false;
    }

    return true;
}
```
However, adding this eliminates all the benefits that RemoveDuplicateJoinRestrictions() brings. I've used the commands here to generate the JOINs mentioned in the PR: https://gist.github.com/onderkalaci/fe8654f9df5916c7af4c7c5eb892561e#file-gistfile1-txt Inner and outer JOINs behave roughly the same, to simplify the table only added INNER joins.	2020-10-21 10:29:39 +02:00
SaitTalhaNisanci	0f209377c4	Fix incorrect join related fields (#4242) * Fix incorrect join related fields Ruleutils expects to be given the original index of join columns, hence we should consider the dropped columns while setting the fields in SetJoinRelatedFieldsCompat. * add some more tests for joins * Move tests to join.sql and create a utility function	2020-10-19 18:28:39 +03:00
Onur Tirtir	c49077d594	Disallow outer joins `ON TRUE` with ref & dist tables when ref table is outer relation (#4255 ) Disallow `ON TRUE` outer joins with reference & distributed tables when reference table is outer relation by fixing the logic bug made when calling `LeftListIsSubset` function. Also, be more defensive when removing duplicate join restrictions when join clause is empty for non-inner joins as they might still contain useful information for non-inner joins.	2020-10-19 16:58:11 +03:00
Onur Tirtir	f80f4839ad	Remove unused functions that cppcheck found	2020-10-19 13:50:52 +03:00
Onder Kalaci	bbedfca761	Improve the relation restriction counters It seems like Postgres could call set_rel_pathlist() for the same relation multiple times. This breaks the logic where we assume relationCount equals the number of entries in relationRestrictionList. In summary, relationRestrictionList may contain duplicate entries.	2020-10-19 08:51:16 +02:00
Nils Dijk	caabbf4b84	Table access method support for distributed tables	2020-10-16 12:02:25 -07:00
Onur Tirtir	7cb07c70fa	Move hasSemiJoin to JoinRestrictionContext (#4256 )	2020-10-16 18:37:39 +03:00
Onur Tirtir	de6f2d3f42	Refactor JoinRestrictionListExistsInContext to improve readability (#4249 )	2020-10-16 12:24:56 +03:00
Onder Kalaci	fe3caf3bc8	Local execution considers intermediate result size limit With this commit, we make sure that local execution adds the intermediate result size as the distributed execution adds. Plus, it enforces the citus.max_intermediate_result_size value.	2020-10-15 17:18:55 +02:00
Marco Slot	31858c8a29	Check table existence in EnsureRelationKindSupported	2020-10-15 17:05:06 +02:00
Sait Talha Nisanci	ecde6c6eef	Introduce GetCurrentLocalExecutionStatus wrapper We should not access CurrentLocalExecutionStatus directly because that would mean that we could also set it directly, which we shouldn't because we have checks to see if the new state is possible, otherwise we error.	2020-10-15 15:38:19 +03:00
Halil Ozan Akgul	e2736c25bd	Adds support for WITH TIES option	2020-10-12 19:34:18 +03:00
Marco Slot	73fc054c27	Rename DDL command functions	2020-10-06 11:30:56 +02:00
Marco Slot	dbc348b7e0	Create sequence dependency during metadata syncing	2020-10-06 10:57:39 +02:00
Ahmet Gedemenli	81db4dca5c	Degrade gracefully when no background workers available	2020-10-05 16:55:00 +03:00
Hanefi Önaldı	6d8e83d24f	Replace worker_hash calls with partkey IS NOT NULL filters	2020-10-02 18:16:24 +03:00
Önder Kalacı	df5aa0f0cc	Switch to sequential execution if the index name is long (#4209 ) Citus has the logic to truncate the long shard names to prevent various issues, including self-deadlocks. However, for partitioned tables, when index is created on the parent table, the index names on the partitions are auto-generated by Postgres. We use the same Postgres function to generate the index names on the shards of the partitions. If the length exceeds the limit, we switch to sequential execution mode.	2020-10-02 13:39:34 +03:00
Onder Kalaci	56ca256374	Forcefully terminate connections after citus.node_connection_timeout After the connection timeout, we fail the session/pool. However, the underlying connection can still be trying to connect. That is dangerous because the new placement executions have already been put in place. The executor cannot handle the situation where multiple EXECUTION_ORDER_ANY task executions succeed. Adding a regression test doesn't seem easily doable. To reproduce the issue - Add 2 worker nodes - create a reference table - set citus.node_connection_timeout to 1ms (requires code change) - Continuously execute `SELECT count(*) FROM ref_table` - Sometime later, you hit an out-of-array access in `ScheduleNextPlacementExecution()` hence crashing. - The reason for that is sometimes the first connection is successfully established while the executor is already trying to execute the query on the second node.	2020-09-30 18:24:24 +02:00
Marco Slot	b905c8043d	Fix create index concurrently crash with local execution	2020-09-25 11:49:09 +02:00
Ahmet Gedemenli	abfb79bda6	Sort explain analyze output by task time Add sort method parameter for regression tests Fix check-style Change sorting method parameters to enum Polish Add task fields to OutTask Add test into multi_explain Fix isolation test	2020-09-24 11:38:40 +03:00
SaitTalhaNisanci	e7cd1ed0ee	Not take ShareUpdateExclusiveLock on pg_dist_transaction (#4184) * Not take ShareUpdateExclusiveLock on pg_dist_transaction We were taking ShareUpdateExclusiveLock on pg_dist_transaction during recovery to prevent multiple recoveries happening concurrently. VACUUM (not FULL) also takes ShareUpdateExclusiveLock, and they can conflict. It seems that VACUUM will skip the table if there is a conflicting lock already taken unless it is doing the vacuum to prevent id wraparound, in which case there can be a deadlock. I guess the deadlock happens if: - VACUUM takes a lock on pg_dist_transaction and is done for id wraparound problem - The transaction in the maintenance daemon tries to take a lock but cannot as that conflicts with the lock acquired by VACUUM - The transaction in the maintenance daemon has a very old xid hence VACUUM cannot proceed. If we take a row exclusive lock in transaction recovery then it wouldn't conflict with VACUUM hence it could proceed so the deadlock would be resolved. To prevent concurrent transaction recoveries happening, an advisory lock is taken with ShareUpdateExclusiveLock as before. * Use CITUS_OPERATIONS tag	2020-09-21 15:20:38 +03:00
Onur Tirtir	1b31b22635	Refactor the functions that return OID lists for citus tables	2020-09-18 16:42:46 +03:00
SaitTalhaNisanci	dae2c69fd7	Not allow removing a single node with ref tables (#4127) * Not allow removing a single node with ref tables We should not allow removing a node if it is the only node in the cluster and there is data on it. We have this check for distributed tables but we didn't have it for reference tables. * Update src/test/regress/expected/single_node.out Co-authored-by: Onur Tirtir <onurcantirtir@gmail.com> * Update src/test/regress/sql/single_node.sql Co-authored-by: Onur Tirtir <onurcantirtir@gmail.com>	2020-09-18 15:35:59 +03:00
Onur Tirtir	4118560b75	Prevent citus local table creation from a catalog table (#4158 )	2020-09-15 14:30:48 +03:00
Marco Slot	bd12555b16	Fix distributing tables owned by extensions	2020-09-10 04:46:11 +02:00
Onur Tirtir	3a73fba810	Apply planner changes for citus local tables	2020-09-09 11:51:18 +03:00
Onur Tirtir	a58a4395ab	Extend citus local table utility command support This commit brings following features: Foreign key support from citus local tables to reference tables * Foreign key support from reference tables to citus local tables (only with RESTRICT & NO ACTION behavior) * ALTER TABLE ENABLE/DISABLE trigger command support * CREATE/DROP/ALTER trigger command support and disallows: * ALTER TABLE ATTACH/DETACH PARTITION commands * CREATE TABLE <postgres table> ATTACH PARTITION <citus local table> commands * Foreign keys from postgres tables to citus local tables (the other way was already disallowed) for citus local tables.	2020-09-09 11:50:55 +03:00
Onur Tirtir	17cc810372	Implement "citus local table" creation logic	2020-09-09 11:50:48 +03:00
Onur Tirtir	ba208eae4d	Record non-distributed table accesses in local executor (#4139 )	2020-09-07 18:19:08 +03:00
Nils Dijk	bbf42063a7	export LookupShardTransferMode	2020-09-03 16:06:38 +02:00
Nils Dijk	6e4862c57f	expose transfermode for ensure reference table existence	2020-09-03 16:06:37 +02:00
SaitTalhaNisanci	366461ccdb	Introduce cache entry/table utilities (#4132 ) Introduce table entry utility functions Citus table cache entry utilities are introduced so that we can easily extend existing functionality with minimum changes, specifically changes to these functions. For example IsNonDistributedTableCacheEntry can be extended for citus local tables without the need to scan the whole codebase and update each relevant part. * Introduce utility functions to find the type of tables A table type can be a reference table, a hash/range/append distributed table. Utility methods are created so that we don't have to worry about how a table is considered as a reference table etc. This also makes it easy to extend the table types. * Add IsCitusTableType utilities * Rename IsCacheEntryCitusTableType -> IsCitusTableTypeCacheEntry * Change citus table types in some checks	2020-09-02 22:26:05 +03:00
Jelte Fennema	451ea04508	Rename ForceXxx functions to XxxOrError This clearer naming was suggested in https://github.com/citusdata/citus/pull/4001	2020-09-01 11:19:17 +02:00
Hanefi Önaldı	024d398cd7	Allow distribution of functions that read from reference tables create_distributed_function(function_name, distribution_arg_name, colocate_with text) This UDF did not allow colocate_with parameters when no distribution_arg_name was supplied. This commit changes the behaviour to allow missing distribution_arg_name parameters when the function should be colocated with a reference table.	2020-09-01 07:28:34 +03:00
Hanefi Onaldi	f47b3a7e7d	Remove unused parameters from round robin reordering and friends (#4120 )	2020-08-20 12:45:01 +03:00
SaitTalhaNisanci	679bf0d2b2	Create CanPushdownSubquery wrapper for better readability (#4108 )	2020-08-12 17:28:20 +03:00
SaitTalhaNisanci	73ef40886b	Rename FindNodeCheckXXX functions (#4106 ) FindNodeCheck is not clear about what the function is doing. They are renamed to FindNodeMatchingCheckFunctionXXX. Also for choosing elements in these functions, CheckNodeFunc type is introduced.	2020-08-11 15:01:23 +03:00
Hadi Moshayedi	7b74eca22d	Support EXPLAIN EXECUTE ANALYZE.	2020-08-10 13:44:30 -07:00
Halil Ozan Akgul	375310b7f1	Adds support for table undistribution	2020-08-05 14:36:03 +03:00
Sait Talha Nisanci	fe4ac51d8c	Normalize Output:.. since it changes with pg13 Fix indentation for better readability	2020-08-04 15:38:13 +03:00
Sait Talha Nisanci	63ed126ad4	Set buffer usage with explain It seems that currently we process even postgres tables in explain commands. This is because we register a hook for explain and we don't have any check to see if the query has any citus table. With this commit, we now send the buffer usage as well to the relevant API. There is some duplicate in the code but it is because of the existing structure, we can refactor this separately.	2020-08-04 15:38:13 +03:00
Sait Talha Nisanci	fe1e1c9b68	Replace Set_ptr_value as SetListCellPtr to be more explicit Move header to right place and fix comment style	2020-08-04 15:38:13 +03:00
Sait Talha Nisanci	8e9b52971c	Use new var field names in the codebase The codebase is updated to use varattnosyn and varnosyn and we defined the macros for older versions. This way we can just remove the macros when we drop an older version.	2020-08-04 15:38:13 +03:00
Sait Talha Nisanci	b641f63bfd	Use CMDTAG_SELECT_COMPAT CMDTAG_SELECT exists in PG12 hence defining a MACRO such as CMDTAG_SELECT -> "SELECT" is not possible. I chose CMDTAG_SELECT_COMPAT because with the COMPAT suffix it is explicit that it maps to different things in different versions and also has less chance of mapping something irrelevant. For example if we used SELECT as a macro, then it would map every SELECT to whatever it is mapping to, which might have unexpected/undesired behaviour.	2020-08-04 15:38:13 +03:00
Sait Talha Nisanci	d68bfc5687	Improve error for index operator class parameters The error message when index has opclassopts is improved and the commit from postgres side is also included for future reference. Also some minor style related changes are applied.	2020-08-04 15:38:13 +03:00
Sait Talha Nisanci	1070828465	update cte inline output for pg13 Make some macros in version_compat more robust Remove commented code in ruleutils Remove unnecessary variable assignments	2020-08-04 15:18:27 +03:00
Sait Talha Nisanci	1112b254a7	adapt recently added code for pg13 This commit mostly adds pg_get_triggerdef_command to our ruleutils_13. This doesn't add anything extra for ruleutils 13 so it is basically a copy of the change on ruleutils_12	2020-08-04 15:18:27 +03:00
Sait Talha Nisanci	38aaf1faba	use QueryCompletion struct Postgres introduced QueryCompletion struct. Hence a compat utility is added to finish query completion for older versions and pg >= 13. The commit on Postgres side: 2f9661311b83dc481fc19f6e3bda015392010a40	2020-08-04 15:10:22 +03:00
Sait Talha Nisanci	9f1ec792b3	add queryString to distributed_planner distributed_planner now takes query string as a parameter. related commit on PG side: 6aba63ef3e606db71beb596210dd95fa73c44ce2	2020-08-04 15:10:22 +03:00
Sait Talha Nisanci	1a7ccac6ef	Add RangeTableEntryFromNSItem macro addRangeTableEntryXXX methods return a ParseNamespaceItem with pg >= 13. RangeTableEntryFromNSItem macro is added so that we return the range table entry from the ParseNamespaceItem in pg>=13 and for pg < 13 rte would already be returned with addRangeTableEntryXXX methods. Commit on Postgres side: 5815696bc66b3092f6361f53e0394909647042c8	2020-08-04 15:10:22 +03:00
Sait Talha Nisanci	4ed30a0824	create Set_ptr_value Since PG13 changed the list, a listcell doesn't contain data anymore. Therefore Set_ptr_value macro is created, so that depending on the version it will either use cell->data.ptr_value or cell->ptr_value. Commit on Postgres side: 1cff1b95ab6ddae32faa3efe0d95a820dbfdc164	2020-08-04 15:10:22 +03:00
Sait Talha Nisanci	ab85a8129d	map varoattno and varnoold fields in Var With PG13 varoattno and varnoold fields were renamed as varattnosyn and varnosyn. A macro is defined for these. Commit on Postgres side: 9ce77d75c5ab094637cc4a446296dc3be6e3c221 Command on Postgres side: git log --all --grep="varoattno"	2020-08-04 15:10:22 +03:00
Sait Talha Nisanci	688ab16bba	Introduce ExplainOnePlanCompat Since ExplainOnePlan expects BufferUsage as well with PG >= 13, ExplainOnePlanCompat is added. Commit on Postgres side: ed7a5095716ee498ecc406e1b8d5ab92c7662d10	2020-08-04 15:10:22 +03:00
Sait Talha Nisanci	6314eba5df	introduce standard_planner_compat standard_planner now takes the query string as a parameter as well with pg >= 13. Commit on Postgres Side: 66888f7424f7d6c7cea2c26e181054d1455d4e7a	2020-08-04 15:10:22 +03:00
Sait Talha Nisanci	991f49efc9	introduce getOwnedSequencesCompat macro Commit on Postgres side: 19781729f789f3c6b2540e02b96f8aa500460322	2020-08-04 15:10:22 +03:00
Sait Talha Nisanci	00e7386007	introduce PortalDefineQuerySelectCompat PortalDefineQuery doesn't accept char* for command tag anymore with PG >= 13. We are currently only using it with Select, therefore a Portal define query compat for select is created. Commit on PG side: 2f9661311b83dc481fc19f6e3bda015392010a40	2020-08-04 15:10:22 +03:00
Sait Talha Nisanci	62879ee8c1	introduce planner_compat and pg_plan_query_compat macros As the new planner and pg_plan_query_compat methods expect the query string as well, macros are defined to be compatible in different versions of postgres. Relevant commit on Postgres: 6aba63ef3e606db71beb596210dd95fa73c44ce2 Command on Postgres: git log --all --grep="pg_plan_query"	2020-08-04 15:10:22 +03:00
Sait Talha Nisanci	bf831d2e59	Use table_openXXX methods in the codebase With PG13 heap_* (heap_open, heap_close etc) are replaced with table_* (table_open, table_close etc). It is better to use the new table access methods in the codebase and define the macros for the previous versions as we can easily remove the macro without having to change the codebase when we drop the support for the old version. Commits that introduced this change on Postgres: f25968c49697db673f6cd2a07b3f7626779f1827 e0c4ec07284db817e1f8d9adfb3fffc952252db0 4b21acf522d751ba5b6679df391d5121b6c4a35f Command to see relevant commits on Postgres side: git log --all --grep="heap_open"	2020-08-04 15:10:22 +03:00
Sait Talha Nisanci	0819b79631	introduce list compat macros Pass the list to lnext API lnext API now expects the list as well. The commit on Postgres that introduced the change: 1cff1b95ab6ddae32faa3efe0d95a820dbfdc164 lnext_compat and list_delete_cell_compat macros are introduced so that we can use these macros in the codebase without having to use #if directives in the codebase. Related commit on postgres: 1cff1b95ab6ddae32faa3efe0d95a820dbfdc164 Command to search in postgres: git log --all --grep="list_delete_cell" add ListCellAndListWrapper When iterating a list in separate function calls, we need both the list and the current cell starting from PG13, therefore ListCellAndListWrapper is added to store both as a wrapper. Use ListCellAndListWrapper in foreign key test udfs As we iterate a list in these udfs using a functionContext, we need to use the wrapper to be able to access both the list and the current cell.	2020-08-04 15:10:22 +03:00
Sait Talha Nisanci	30549dc0e2	add copy of ruleutils_12 as ruleutils_13	2020-08-04 13:34:13 +03:00
Onder Kalaci	eeb8c81de2	Implement shared connection count reservation & enable `citus.max_shared_pool_size` for COPY With this patch, we introduce `locally_reserved_shared_connections.c/h` files which are responsible for reserving some space in shared memory counters upfront. We sometimes need to reserve connections, but not necessarily establish them. For example: - COPY command should reserve connections as it cannot know which connections it needs in which order. COPY establishes connections as any input data hits the workers. For example, for router COPY command, it only establishes 1 connection. As discussed here (https://github.com/citusdata/citus/pull/3849#pullrequestreview-431792473), COPY needs to reserve connections up-front, otherwise we can end up with resource starvation/un-detected deadlocks.	2020-08-03 18:51:40 +02:00
SaitTalhaNisanci	ef841115de	Fix int32 overflow and use PG macros for INT32_XX (#4061 ) * Use CalculateUniformHashRangeIndex in HashPartitionId INT32_MIN definition can change among different platforms hence it is possible to get overflow, we would see crashes because of this in debian distros. We have already solved a similar problem with introducing CalculateUniformHashRangeIndex method, hence to solve it we can use the same method, this also removes some duplication and has a single place to decide that. * Use PG_INT32_XX instead of INT32_XX to be safer	2020-07-23 18:30:08 +03:00
Onder Kalaci	cfb633601d	Minor refactorings in COPY command execution 1) Rename CONNECTION_PER_PLACEMENT to REQUIRE_CLEAN_CONNECTION. This is mostly to make things clear as the new name reveals more. 2) We also make sure to mark all the copy connections critical, even if they are accessed earlier in the transaction	2020-07-23 15:36:19 +02:00
Onder Kalaci	52c0fccb08	Move executor specific logic to a function Because as we're planning to use the same logic, it'd be nice to use the exact same functions.	2020-07-22 15:09:47 +02:00
Onder Kalaci	ff6555299c	Unify node sort ordering The executor relies on WorkerPool, and many other places rely on WorkerNode. With this commit, we make sure that they are sorted via the same function/logic.	2020-07-22 11:03:25 +02:00
Hanefi Önaldı	e534dbae4a	Accept list of values in a supported ALTER ROLE .. SET statement Some GUCs support a list of values which is indicated by GUC_LIST_INPUT flag. When an ALTER ROLE .. SET statement is executed, the new configuration default for affected users and databases are stored in the setconfig(text[]) column in a pg_db_role_setting record. If a GUC that supports a list of values is used in an ALTER ROLE .. SET statement, we need to split the text into items delimited by commas.	2020-07-21 03:49:57 +03:00
Onder Kalaci	c25de2cf22	Remove flag from As it doesn't make any sense anymore	2020-07-20 12:45:05 +02:00
SaitTalhaNisanci	b3af63c8ce	Remove task tracker executor (#3850 ) * use adaptive executor even if task-tracker is set * Update check-multi-mx tests for adaptive executor Basically repartition joins are enabled where necessary. For parallel tests max adaptive executor pool size is decreased to 2, otherwise we would get too many clients error. * Update limit_intermediate_size test It seems that when we use adaptive executor instead of task tracker, we exceed the intermediate result size less in the test. Therefore updated the tests accordingly. * Update multi_router_planner It seems that there is one problem with multi_router_planner when we use adaptive executor, we should fix the following error: +ERROR: relation "authors_range_840010" does not exist +CONTEXT: while executing command on localhost:57637 * update repartition join tests for check-multi * update isolation tests for repartitioning * Error out if shard_replication_factor > 1 with repartitioning As we are removing the task tracker, we cannot switch to it if shard_replication_factor > 1. In that case, we simply error out. * Remove MULTI_EXECUTOR_TASK_TRACKER * Remove multi_task_tracker_executor Some utility methods are moved to task_execution_utils.c. * Remove task tracker protocol methods * Remove task_tracker.c methods * remove unused methods from multi_server_executor * fix style * remove task tracker specific tests from worker_schedule * comment out task tracker udf calls in tests We were using task tracker udfs to test permissions in multi_multiuser.sql. We should find some other way to test them, then we should remove the commented out task tracker calls. 
* remove task tracker test from follower schedule * remove task tracker tests from multi mx schedule * Remove task-tracker specific functions from worker functions * remove multi task tracker extra schedule * Remove unused methods from multi physical planner * remove task_executor_type related things in tests * remove LoadTuplesIntoTupleStore * Do initial cleanup for repartition leftovers During startup, task tracker would call TrackerCleanupJobDirectories and TrackerCleanupJobSchemas to clean up leftover directories and job schemas. With adaptive executor, while doing repartitions it is possible to leak these things as well. We don't retry cleanups, so it is possible to have leftover in case of errors. TrackerCleanupJobDirectories is renamed as RepartitionCleanupJobDirectories since it is repartition specific now, however TrackerCleanupJobSchemas cannot be used currently because it is task tracker specific. The thing is that this function is a no-op currently. We should add cleaning up intermediate schemas to DoInitialCleanup method when that problem is solved(We might want to solve it in this PR as well) * Revert "remove task tracker tests from multi mx schedule" This reverts commit `03ecc0a681`. * update multi mx repartition parallel tests * not error with task_tracker_conninfo_cache_invalidate * not run 4 repartition queries in parallel It seems that when we run 4 repartition queries in parallel we get too many clients error on CI even though we don't get it locally. Our guess is that, it is because we open/close many connections without doing some work and postgres has some delay to close the connections. Hence even though connections are removed from the pg_stat_activity, they might still not be closed. 
If the above assumption is correct, it is unlikely for it to happen in practice because: - There is some network latency in clusters, so this leaves some times for connections to be able to close - Repartition joins return some data and that also leaves some time for connections to be fully closed. As we don't get this error in our local, we currently assume that it is not a bug. Ideally this wouldn't happen when we get rid of the task-tracker repartition methods because they don't do any pruning and might be opening more connections than necessary. If this still gives us "too many clients" error, we can try to increase the max_connections in our test suite(which is 100 by default). Also there are different places where this error is given in postgres, but adding some backtrace it seems that we get this from ProcessStartupPacket. The backtraces can be found in this link: https://circleci.com/gh/citusdata/citus/138702 * Set distributePlan->relationIdList when it is needed It seems that we were setting the distributedPlan->relationIdList after JobExecutorType is called, which would choose task-tracker if replication factor > 1 and there is a repartition query. However, it uses relationIdList to decide if the query has a repartition query, and since it was not set yet, it would always think it is not a repartition query and would choose adaptive executor when it should choose task-tracker. * use adaptive executor even with shard_replication_factor > 1 It seems that we were already using adaptive executor when replication_factor > 1. So this commit removes the check. 
* remove multi_resowner.c and deprecate some settings * remove TaskExecution related leftovers * change deprecated API error message * not recursively plan single relation repartition subquery * recursively plan single relation repartition subquery * test deprecated task tracker functions * fix overlapping shard intervals in range-distributed test * fix error message for citus_metadata_container * drop task-tracker deprecated functions * put the implementation back to worker_cleanup_job_schema_cache since citus cloud uses it * drop some functions, add downgrade script Some deprecated functions are dropped. Downgrade script is added. Some gucs are deprecated. A new guc for repartition joins bucket size is added. * order by a test to fix flappiness	2020-07-18 13:11:36 +03:00
Hadi Moshayedi	13003d8d05	Use TupleDestination API for partitioning in insert/select.	2020-07-17 09:43:46 -07:00
Nils Dijk	d0b6e62c9a	change wording to allowlist and the likes (#3906 ) In the same line as #3904 Change wording to better reflect use and remove words that enforce/maintain bias.	2020-07-15 16:24:40 +02:00
Sait Talha Nisanci	510535f558	address feedback	2020-07-13 19:45:02 +03:00
Sait Talha Nisanci	db1b78148c	send schema creation/cleanup to coordinator in repartitions We were using ALL_WORKERS TargetWorkerSet while sending temporary schema creation and cleanup. We (well mostly I) thought that ALL_WORKERS would also include coordinator when it is added as a worker. It turns out that it was FILTERING OUT the coordinator even if it is added as a worker to the cluster. So to have some context here, in repartitions, for each jobId we create (at least we were supposed to) a schema in each worker node in the cluster. Then we partition each shard table into some intermediate files, which is called the PARTITION step. So after this partition step each node has some intermediate files having tuples in those nodes. Then we fetch the partition files to necessary worker nodes, which is called the FETCH step. Then from the files we create intermediate tables in the temporarily created schemas, which is called a MERGE step. Then after evaluating the result, we remove the temporary schemas (one for each job ID in each node) and files. If node 1 has file1, and node 2 has file2 after PARTITION step, it is enough to either move file1 from node1 to node2 or vice versa. So we prune one of them. In the MERGE step, if the schema for a given jobID doesn't exist, the node tries to use the `public` schema if it is a superuser, which is actually added for testing in the past. So when we were not sending schema creation commands for each job ID to the coordinator (because we were using ALL_WORKERS flag, and it doesn't include the coordinator), we would basically not have any schemas for repartitions in the coordinator. The PARTITION step would be executed on the coordinator (because the tasks are generated in the planner part) and it wouldn't give us any error because it doesn't have anything to do with the temporary schemas (that we didn't create). 
But later two things would happen: - If by chance the fetch is pruned on the coordinator side, the other nodes would fetch the partitioned files from the coordinator and execute the query as expected, because it has all the information. - If the fetch tasks are not pruned in the coordinator, in the MERGE step, the coordinator would either error out saying that the necessary schema doesn't exist, or it would try to create the temporary tables under public schema (if it is a superuser). But then if we had the same task ID with different jobID it would fail saying that the table already exists, which is an error we were getting. In the first case, the query would work okay, but it would still not do the cleanup, hence we would leave the partitioned files from the PARTITION step there. Hence ensure_no_intermediate_data_leak would fail. To make things more explicit and prevent such bugs in the future, ALL_WORKERS is named as ALL_NON_COORD_WORKERS. And a new flag to return all the active nodes is added as ALL_DATA_NODES. For repartition case, we don't use the only-reference table nodes but this version makes the code simpler and there shouldn't be any significant performance issue with that.	2020-07-13 19:20:15 +03:00
SaitTalhaNisanci	15290bc43b	remove unused worker methods (#4017 )	2020-07-10 13:45:55 +03:00
SaitTalhaNisanci	3f50165365	rename TargetWorkerSet enums (#4015 ) Rename TargetWorkerSet enums to make them more explicit about what they mean. Ideally it would be good to treat everything as a node without the 'worker' concept because it makes things complicated. Another improvement could be to rename TargetWorkerSet as TargetNodeSet but it goes to renaming many occurrences of Worker, which is probably too big for this PR.	2020-07-10 11:21:27 +03:00
Jelte Fennema	759e628dd5	Handle some NULL issues that static analysis found (#4001 ) Static analysis found some issues where we used the result from ExtractResultRelationRTE, without checking that it wasn't NULL. It seems like in all these cases it can never actually be NULL, since we have checked before that it isn't a SELECT query. So, this PR is mostly to make static analysis happy (and protect a bit against future changes of the code).	2020-07-09 15:46:42 +02:00
SaitTalhaNisanci	96adce77d6	rename node/worker utilities (#4003 ) The names were not explicit about what they do, and we have many misusages in the codebase, so they are renamed to be more explicit.	2020-07-09 15:30:35 +03:00
Jelte Fennema	f6e2f1b1cb	Replace words that have bad associations (#3992 ) We had a few words in our codebase that static analysis flagged as having bad associations.	2020-07-08 14:57:48 +02:00
Hadi Moshayedi	23fa421639	Fix task->fetchedExplainAnalyzePlan memory issue.	2020-07-07 07:58:02 -07:00
citus bot	f0693e2f75	Remove unused MaxMasterConnectionCount function	2020-07-07 10:37:57 +02:00
citus bot	bdfeb380d3	Fix some more master->coordinator comments	2020-07-07 10:37:53 +02:00
Marco Slot	b4fec63bc0	Rename master evaluation to coordinator evaluation	2020-07-07 10:37:41 +02:00
Marco Slot	eeffbde8bd	Fix pushdown of constants in aggregate queries	2020-06-30 11:41:16 -07:00

1413 Commits (fc09e1cfdcb4619544c6f356b14a39f766c8b718)