citus

Commit Graph

Author	SHA1	Message	Date
Onur Tirtir	6c26c67ea0	Flush write state when initializing read state In the next commit, we will adjust curcid of the snapshot being used when scanning the columnar table. However, for index scans, the snapshot is provided not when beginning the scan but within the fetch-tuple call. For this reason, start flushing pending writes in init_columnar_read_state since this seems to be a prerequisite step that needs to be done before scanning a columnar table regardless of the scan method being used.	2021-09-02 11:10:11 +03:00
Onur Tirtir	db0e4ce889	Increment command counter in FinishModifyRelation instead Seems that we always increment the command counter right after finishing metadata table modification. For this reason, it makes sense to call CommandCounterIncrement within FinishModifyRelation.	2021-09-02 11:10:11 +03:00
Onur Tirtir	0b4ed075b5	Use correct snapshot when reading a columnar table Instead of using xact snapshot, use the snapshot provided to columnarAM when scanning table.	2021-09-02 11:10:11 +03:00
Naisila Puka	bd91df298f	Fixes ConnectionModifiedPlacement output for a failed transaction (#5198 )	2021-08-31 18:58:46 +03:00
Naisila Puka	7755d5ed3a	Fixes order of citus_drop_all_shards arguments (#5200 )	2021-08-31 18:25:38 +03:00
Naisila Puka	acb5ae6ab6	Skip dropping shards when we know it's a partition (#5176 )	2021-08-31 17:41:37 +03:00
SaitTalhaNisanci	5ae01303d4	Use get_attnum to find the attribute number of target entry (#5220 ) * Use get_attnum to find the attribute number of target entry	2021-08-31 16:47:19 +03:00
Jelte Fennema	481f8be084	Fix crash in shard rebalancer when no distributed tables exist (#5205 ) The logging of the amount of ignored moves crashed when no distributed tables existed in a cluster. This also fixes in passing that the logging of ignored moves logs the correct number of ignored moves if there exist multiple colocation groups and all are rebalanced at the same time.	2021-08-31 14:15:24 +02:00
SaitTalhaNisanci	d50830d4cc	Update failure tests README (#5197 ) * Update failure tests README I keep finding this page when trying to run failure tests, so updating the README that way: https://github.com/pypa/pipenv/issues/3363#issuecomment-452171564 Co-authored-by: Hanefi Onaldi <Hanefi.Onaldi@microsoft.com>	2021-08-26 12:35:06 +03:00
Hanefi Onaldi	7e39c7ea83	Replace master with citus in logs and comments (#5210 ) I replaced - master_add_node, - master_add_inactive_node - master_activate_node with - citus_add_node, - citus_add_inactive_node - citus_activate_node respectively.	2021-08-26 11:31:17 +03:00
SaitTalhaNisanci	b923d51fc6	Bump pg12 and pg13 images to pg12.8 and pg13.8 (#5208 ) In our testing infrastructure, even though we use pinned versions of postgres, the auxiliary libraries might pull in newer versions. This is for example the case for libpq, which will now use the libpq libraries from 14beta3. The changes in this PR are largely due to the libpq changes. We have also changed the citus version that is used as a base for the citus upgrades, from 10.0 to 10.1. This caused columnar to enforce some extra limits on the settings, which conflicted with our upgrade tests. The changes in failure tests are due to the libpq changes. There are also a lot of changes in isolation test outputs, hence we updated all of them. Co-authored-by: Nils Dijk <nils@citusdata.com>	2021-08-25 16:04:57 +03:00
SaitTalhaNisanci	c8326df8c0	Fix missing comma in connection options (#5206 )	2021-08-25 13:40:42 +03:00
Jelte Fennema	a31429aae5	Allow configuring tcp_user_timeout using citus.node_conn_info (#5203 ) `tcp_user_timeout` is the awesome relatively unknown big brother of the TCP keepalive related options. Instead of depending on keepalives being sent, this determines that a socket is dead by waiting at most N seconds for an ack of data that it has sent. It's exposed in libpq starting from PG12.	2021-08-24 11:48:40 +03:00
Onur Tirtir	5af839ada0	Not print metapage.reserved_offset in regression tests (#5168 ) * We were anyway not testing reserved_offset in any of those tests but other fields. * This only happens with compressed columnar tables and is because the libzstd/liblz4 versions that we have on exttester ci image might be different than what we might have on our local environments.	2021-08-23 11:07:10 +03:00
Onur Tirtir	7dcd9380e7	Update index support section of columnar README	2021-08-23 10:35:11 +03:00
Onur Tirtir	3acd3ebae2	Remove temp table limitation from columnar README	2021-08-23 10:35:11 +03:00
Onur Tirtir	4e1201a333	Use RelationGetStatExtList instead of scanning pg_stats_ext	2021-08-18 17:50:58 +03:00
Onur Tirtir	4b03195c06	Use RelationGetStatExtList instead of GetExplicitStatisticsIdList	2021-08-18 17:50:57 +03:00
Onur Tirtir	91544d0191	Use PGIndexProcessor infra to find explicitly created indexes	2021-08-18 17:50:57 +03:00
Onur Tirtir	549ca4de6d	Use RelationGetIndexList instead of scanning pg_index	2021-08-18 17:50:57 +03:00
Onur Tirtir	fa9933daf3	Use get_am_name to find indexAM name	2021-08-18 00:44:37 +03:00
Nils Dijk	dfc950ce1e	Fix a segfault caused by use after free in ConnectionsPlacementHash (#5170 ) DESCRIPTION: Fix a segfault caused by use after free in ConnectionsPlacementHash Fix a segfault caused by retaining data in any of the hashmaps making up the Placement Connection Management. We have seen production systems segfault due to random data referenced from ConnectionPlacementHash. On investigation we found that the backends segfaulting on this had OOM errors closely prior to the segfault. It has shown there are at least 15 places where an allocation can OOM that would cause ConnectionPlacementHash to retain pointers to memory from contexts that are subsequently freed. This would reproduce the segfault we have observed in production. Conditions for these allocations are: - allocated after first call to `AssociatePlacementWithShard`: https://github.com/citusdata/citus/blob/v10.0.3/src/backend/distributed/connection/placement_connection.c#L880-L881 - allocated before `StartNodeUserDatabaseConnection`: https://github.com/citusdata/citus/blob/v10.0.3/src/backend/distributed/connection/connection_management.c#L291 At least 15 points of memory allocation (which could fail) are between the callsites of both in a primary key lookup on a reference table - where we have seen an OOM cause a segfault moments later. Instead of leaving any references in ConnectionPlacementHash, ConnectionShardHash and ColocatedPlacementsHash that could retain any pointers that are freed due to the TopTransactionContext being reset we clear all these hashes regardless of the state of CurrentCoordinatedTransactionState. Downside is that on any transaction abort we will now iterate through 4 hashmaps and clear their contents. Given that they are either already empty, which should cause a quick iteration, or non-empty, causing segfaults in subsequent executions, this overhead seems reasonable. A better solution would be to move the creation of these hashmaps so they would live in the TopTransactionContext themselves, assuming their contents would never outlive a transaction. This needs more investigation and is an involved refactor. Hence fixing this quickly here.	2021-08-17 17:42:35 +02:00
jeff-davis	4f213f293e	Columnar: use generate_series for test rather than load. (#5181 )	2021-08-16 16:12:06 -07:00
Onur Tirtir	68f46c5dc9	Use scan context for intermediate mem allocs too	2021-08-16 11:06:03 +03:00
Onur Tirtir	b3d9fc91f8	Always use right mem cxt when creating ColumnarReadState All the callers except columnar_relation_copy_for_cluster were already switching to right memory context when creating ColumnarReadState. With this commit, we embed that logic into init_columnar_read_state to avoid further such bugs. That way, we start using the right memory context for columnar_relation_copy_for_cluster too.	2021-08-16 11:06:03 +03:00
Onur Tirtir	7fcecde203	Use init_columnar_read_state instead of lower level func Functionally, this doesn't change anything. This is just a preparation before next commit.	2021-08-16 11:06:03 +03:00
Burak Velioglu	4355ba0a38	Add CREATE INDEX ... ON ONLY and ALTER INDEX ... ATTACH PARTITION (#4938 #4980 ) - Add support for CREATE INDEX ... ON ONLY: Before that commit we were not sending "ONLY" option to the worker nodes at all. With this commit, "ONLY" parameter will be sent to the worker nodes if it is necessary. (#4938) - Add support for ALTER INDEX ... ATTACH PARTITION: Attach child_index to parent_index by creating same inheritance on shard level in addition to table level. (#4980)	2021-08-13 13:12:45 +03:00
SaitTalhaNisanci	2ec4e37e45	Fix assert failure in FindReferencedTableColumn (#5175 )	2021-08-12 18:21:45 +03:00
Ahmet Gedemenli	9e90894f21	Synchronize hasmetadata flag on mx workers (#5086 ) * Synchronize hasmetadata flag on mx workers * Switch to sequential execution * Add test * Use SetWorkerColumn * Add test for stop_sync * Remove usage of UpdateHasmetadataOnWorkersWithMetadata * Remove MarkNodeMetadataSynced * Fix test for metadatasynced * Remove MarkNodeMetadataSynced * Style * Remove MarkNodeHasMetadata * Remove UpdateDistNodeBoolAttr * Refactor SetWorkerColumn * Use SetWorkerColumnLocalOnly when setting up dependencies * Use SetWorkerColumnLocalOnly in TriggerSyncMetadataToPrimaryNodes * Style * Make update command generator functions static * Set metadatasynced before syncing * Call SetWorkerColumn only if the sync is successful * Try to sync all nodes * Fix indexno * Update metadatasynced locally first * Break if a node fails to sync metadata * Send worker commands optional * Style & Rebase * Add raiseOnError param to SetWorkerColumn * Style * Set metadatasynced for all metadata nodes * Style * Introduce SetWorkerColumnOptional * Polish * Style * Dont send set command to not synced metadata nodes * Style * Polish * Add test for stop_sync * Add test for shouldhaveshards * Add test for isactive flag * Sort by placementid in the function verify_metadata * Cover edge cases for failing nodes * Add comments * Add nodeport to isactive test * Add warning if metadata out of sync * Update warning message	2021-08-12 14:16:18 +03:00
Naisila Puka	e5b32b2c3c	Acquire AccessShareLock before updating table statistics (#5155 )	2021-08-12 13:58:15 +03:00
Onder Kalaci	d4368ff2b3	Make sure that shouldhaveshards is synced to workers	2021-08-11 15:53:31 +02:00
Onder Kalaci	86bd28b92c	Guard against hard WaitEventSet errors In short, add wrappers around Postgres' AddWaitEventToSet() and ModifyWaitEvent(). AddWaitEventToSet()/ModifyWaitEvent() may throw hard errors. For example, when the underlying socket for a connection is closed by the remote server and already reflected by the OS, however Citus hasn't had a chance to get this information. In that case, if replication factor is >1, Citus can fail over to other nodes for executing the query. Even if replication factor = 1, Citus can give much nicer errors. So CitusAddWaitEventSetToSet()/CitusModifyWaitEvent() simply put AddWaitEventToSet()/ModifyWaitEvent() into a PG_TRY/PG_CATCH block in order to catch any hard errors, and return this information to the caller.	2021-08-10 09:35:03 +02:00
Onder Kalaci	5f02d18ef8	transactional metadata sync for maintenance daemon As we use the current user to sync the metadata to the nodes with #5105 (and many other PRs), there is no reason that prevents us from using the coordinated transaction for metadata syncing. This commit also renames a few functions to reflect their actual implementation.	2021-08-09 10:34:55 +02:00
Onder Kalaci	35964c6366	Dropped columns do not diverge distribution column for partitioned tables Before this commit, creating a partition after a DROP column on the parent (position before dist. key) was leading the partition to have the wrong distribution column.	2021-08-06 13:36:12 +02:00
jeff-davis	deb7ec605b	Columnar: fix misleading comments and useless types. (#5162 ) CustomScan and CustomPath structures cannot be extended with additional fields. Fix comments and type structure that implied that they can.	2021-08-05 09:22:21 -07:00
Ahmet Gedemenli	51d410bb7b	Add check for alphabetically sorted gucs Move to a separate script Add the new script to readme	2021-08-05 16:37:49 +03:00
naisila	798a7902bf	Fix master_update_table_statistics scripts for 9.5	2021-08-03 18:15:56 +03:00
naisila	f9fa5a3d69	Fix master_update_table_statistics scripts for 9.4	2021-08-03 18:15:56 +03:00
Onder Kalaci	482b8096e9	Introduce citus_internal_update_relation_colocation update_distributed_table_colocation can be called by the relation owner, and internally it updates pg_dist_partition. With this commit, update_distributed_table_colocation uses an internal UDF to access pg_dist_partition. As a result, this operation can now be done by regular users on MX.	2021-08-03 11:44:58 +02:00
Onur Tirtir	93ebbb0607	Re-cost SeqPath's as well for columnar tables	2021-08-02 11:32:25 +03:00
Onur Tirtir	453ac40725	Comment why we still remove non IndexPath's when custom scan is off	2021-08-02 11:25:18 +03:00
Onur Tirtir	a87405b6ba	Not adjust IndexPath cost if indexscan is off	2021-08-02 11:25:18 +03:00
Onur Tirtir	51691a8994	Rename RecostColumnarIndexPaths to RecostColumnarPaths	2021-08-02 11:25:18 +03:00
Onur Tirtir	297f59a70e	Re-cost columnar table index paths	2021-08-02 11:16:37 +03:00
Onur Tirtir	8adcf2096b	Multiply ColumnarCustomScan cost by tblspace.seqpage cost	2021-08-02 11:16:37 +03:00
Onur Tirtir	dba8421453	Refactor ColumnarScanCost into ColumnarPerChunkGroupScanCost	2021-08-02 11:16:37 +03:00
Onur Tirtir	d8f92697f2	Free memory used for last stripe read when re-scanning a columnar table (#5143 ) Instead of setting stripeReadState to NULL, call ColumnarResetRead before re-scanning a columnar table since this function is already designed for doing the necessary clean up when finishing a stripe read. Note that this change shouldn't have a great effect on memory usage since AdvanceStripe was already doing the clean-up for all the stripes except the last one.	2021-08-02 11:16:01 +03:00
Onur Tirtir	73058d35cc	Not free (stripe) chunk buffers after de-serializing Previously, we were only using chunk group reader for sequential scan. However, to support index scans on columnar tables, now we use the very same low level functions for index scan too. Since those low-level functions were only used for sequential scan, it was guaranteed that we would never read the same chunk group more than once, so we were freeing chunk buffers after deserializing them into a separate buffer. Now that we use those low level functions for index scan, we cannot free chunk buffers since it's possible to read the same chunk group again, such that: - read chunk group 1 of stripe 5 - read chunk group 2 of stripe 5 - read chunk group 1 of stripe 5 again Here, when we decide to read chunk group 1 for a second time, chunk group 1 is not cached. Plus, before this commit, we were freeing the chunk buffers for chunk group 1 after the first read and then we were getting segfaults or errors from low-level de-compression APIs.	2021-08-02 11:00:12 +03:00
Onur Tirtir	327ae43b83	Get rid of EndStripeRead, since we anyway reset mem cxt	2021-08-02 11:00:12 +03:00
Onur Tirtir	83f5d42365	Use long-lasting mem cxt & optimize correlated index scan	2021-08-02 11:00:12 +03:00

3169 Commits (6c26c67ea09db4e95a93f3eaefea72d20e54b20c)