citus

Commit Graph

Author	SHA1	Message	Date
Onur Tirtir	889a2731cb	Split columnar stripe reservation into two phases (#5188 ) Previously, we were doing `first_row_number` reservation for the first row written to current `WriteState` but were doing `stripe_id` reservation when flushing the `WriteState` and were inserting the related record to `columnar.stripe` at that time as well. However, inserting `columnar.stripe` record at flush-time is problematic. This is because, as told in #5160, if relation has any index-based constraints and if there are two concurrent writes that are inserting conflicting key values for that constraint, then postgres relies on `tableAM->fetch_index_tuple` (=`columnar_fetch_index_tuple`) callback to return `true` when indexAM is checking against possible constraint violations. However, pending writes of other backends are not visible to concurrent sessions in columnar since we were not inserting the stripe metadata record until flushing the stripe. With this commit, we split stripe reservation into two phases: i) Reserve `stripe_id` and insert a "dummy" record to `columnar.stripe` at the very same time we reserve `first_row_number`, i.e. when writing the first row to the current `WriteState`. ii) At flush time, do the storage level allocation and complete the missing fields of the dummy record inserted into `columnar.stripe` during i). That way, any concurrent writes would be able to check against possible constraint violations by using `SnapshotDirty` when scanning `columnar.stripe`. Note that `columnar_fetch_index_tuple` still wouldn't be able to fill the output tupleslot for the requested tid but it would at least return `true` for such index look-up's and we believe this should be sufficient for the caller indexAM callback to make the concurrent writer block on prior one. That is how we fix #5160. 
The only downside of reserving `stripe_id` at the same time we reserve `first_row_number` is that any aborted write now also wastes a `stripe_id`, as already happens with `first_row_number`, but these are wasted only one at a time. Considering that we waste `first_row_number` values by the stripe row limit (=150k by default) in such cases, this shouldn't be important at all.	2021-09-02 11:49:14 +03:00
Onur Tirtir	bf4dfad6f7	Update curcid of given snapshot if it is MVCC Before starting to scan a columnar table, we always flush the pending writes to disk. However, we increment command counter after modifying metadata tables. On the other hand, now that we _don't always use_ xact snapshot to scan a columnar table, writes that we just flushed might not be visible to the query that just flushed pending writes to disk since curcid of provided snapshot would become smaller than the command id being used when modifying metadata tables. To give an example, before this change, below was a possible scenario due to the changes that we made to use the correct snapshot. ```sql CREATE TABLE t(a int, b int) USING columnar; BEGIN; INSERT INTO t VALUES (5, 10); SELECT * FROM t; ┌───┬───┐ │ a │ b │ ├───┼───┤ └───┴───┘ (0 rows) SELECT * FROM t; ┌───┬────┐ │ a │ b │ ├───┼────┤ │ 5 │ 10 │ └───┴────┘ (1 row) ```	2021-09-02 11:11:59 +03:00
Onur Tirtir	6c26c67ea0	Flush write state when initializing read state In next commit, we will adjust curcid of the snapshot being used when scanning the columnar table. However, for index scan, snapshot is provided not when beginning scan but within fetch-tuple call. For this reason, start flushing pending writes in init_columnar_read_state since this seems to be a prerequisite step that needs to be done before scanning a columnar table regardless of the scan method being used.	2021-09-02 11:10:11 +03:00
Onur Tirtir	db0e4ce889	Increment command counter in FinishModifyRelation instead Seems that we always increment the command counter right after finishing metadata table modification. For this reason, it makes sense to call CommandCounterIncrement within FinishModifyRelation.	2021-09-02 11:10:11 +03:00
Onur Tirtir	0b4ed075b5	Use correct snapshot when reading a columnar table Instead of using xact snapshot, use the snapshot provided to columnarAM when scanning table.	2021-09-02 11:10:11 +03:00
Onur Tirtir	7dcd9380e7	Update index support section of columnar README	2021-08-23 10:35:11 +03:00
Onur Tirtir	3acd3ebae2	Remove temp table limitation from columnar README	2021-08-23 10:35:11 +03:00
Onur Tirtir	68f46c5dc9	Use scan context for intermediate mem allocs too	2021-08-16 11:06:03 +03:00
Onur Tirtir	b3d9fc91f8	Always use right mem cxt when creating ColumnarReadState All the callers except columnar_relation_copy_for_cluster were already switching to right memory context when creating ColumnarReadState. With this commit, we embed that logic into init_columnar_read_state to avoid further such bugs. That way, we start using the right memory context for columnar_relation_copy_for_cluster too.	2021-08-16 11:06:03 +03:00
Onur Tirtir	7fcecde203	Use init_columnar_read_state instead of lower level func Functionally, this doesn't change anything. This is just a preparation before next commit.	2021-08-16 11:06:03 +03:00
jeff-davis	deb7ec605b	Columnar: fix misleading comments and useless types. (#5162 ) CustomScan and CustomPath structures cannot be extended with additional fields. Fix comments and type structure that implied that they can.	2021-08-05 09:22:21 -07:00
Onur Tirtir	93ebbb0607	Re-cost SeqPath's as well for columnar tables	2021-08-02 11:32:25 +03:00
Onur Tirtir	453ac40725	Comment why we still remove non IndexPath's when custom scan is off	2021-08-02 11:25:18 +03:00
Onur Tirtir	a87405b6ba	Not adjust IndexPath cost if indexscan is off	2021-08-02 11:25:18 +03:00
Onur Tirtir	51691a8994	Rename RecostColumnarIndexPaths to RecostColumnarPaths	2021-08-02 11:25:18 +03:00
Onur Tirtir	297f59a70e	Re-cost columnar table index paths	2021-08-02 11:16:37 +03:00
Onur Tirtir	8adcf2096b	Multiply ColumnarCustomScan cost by tblspace.seqpage cost	2021-08-02 11:16:37 +03:00
Onur Tirtir	dba8421453	Refactor ColumnarScanCost into ColumnarPerChunkGroupScanCost	2021-08-02 11:16:37 +03:00
Onur Tirtir	d8f92697f2	Free memory used for last stripe read when re-scanning a columnar table (#5143 ) Instead of setting stripeReadState to NULL, call ColumnarResetRead before re-scanning a columnar table since this function is already designed for doing the necessary clean up when finishing a stripe read. Note that this change shouldn't have a great effect on memory usage since AdvanceStripe was already doing the clean-up for all the stripes except the last one.	2021-08-02 11:16:01 +03:00
Onur Tirtir	73058d35cc	Not free (stripe) chunk buffers after de-serializing Previously, we were only using chunk group reader for sequential scan. However, to support index scans on columnar tables, now we use very same low level functions for index scan too. Since those low-level functions were only used for sequential scan, it was guaranteed that we would never read the same chunk group more than once, so we were freeing chunk buffers after deserializing them into a separate buffer. Now that we use those low level functions for index scan, we cannot free chunk buffers since it's possible to read the same chunk group again, such that: - read chunk group 1 of stripe 5 - read chunk group 2 of stripe 5 - read chunk group 1 of stripe 5 again Here, when we decide to read chunk group 1 for a second time, chunk group 1 is not cached. Plus, before this commit, we were freeing the chunk buffers for chunk group 1 after the first read and then we were getting segfault or errors from low-level de-compression APIs.	2021-08-02 11:00:12 +03:00
Onur Tirtir	327ae43b83	Get rid of EndStripeRead, since we anyway reset mem cxt	2021-08-02 11:00:12 +03:00
Onur Tirtir	83f5d42365	Use long-lasting mem cxt & optimize correlated index scan	2021-08-02 11:00:12 +03:00
Onur Tirtir	c021b82a43	Introduce CreateColumnarScanMemoryContext	2021-08-02 11:00:12 +03:00
Onur Tirtir	84a49cc221	Improve error message for indexAMs not supported by columnar	2021-07-30 16:41:53 +03:00
Onur Tirtir	eeecbd2324	Introduce ColumnarSupportsIndexAM	2021-07-30 16:40:27 +03:00
Onur Tirtir	f00c63c33d	Support columnar table index builds with CONCURRENTLY option (#5032 ) With this commit, we add (`CREATE INDEX` / `REINDEX`) `CONCURRENTLY` support for columnar tables. For that, we implement `columnar_index_validate_scan` callback. The reasoning behind the implementation is as follows: * Postgres function `validate_index` provides all the TIDs that are currently in the index to `columnar_index_validate_scan` callback via a `tupleSort` object. * We start scanning the table by using `columnar_getnextslot` as usual. Before moving forward, note that `columnar_getnextslot` guarantees to return tuples in the order of their TIDs. * For us to use during table scan, postgres provides a snapshot guaranteeing that any tuples that are valid according to that snapshot but are not in the index must be added to the index. * Then for each tuple that we read from our table, we continue iterating given `tupleSort` to find the first TID that is greater than or equal to our tuple's TID. If both TIDs are equal to each other, then we skip the tuple since it's already indexed. If the TID that we read from tupleSort is greater than our tuple's TID, then we decide to insert this tuple into index.	2021-07-09 13:44:58 +03:00
Onur Tirtir	ea5fe022a4	Be more explicit when doing ordered scan on columnar cat. tables (#5026 ) systable_getnext already uses ForwardScanDirection if relation has any open indexes, but let's be more explicit doing ordered scan on columnar catalog tables.	2021-07-09 13:24:27 +03:00
Onur Tirtir	7bfd84bc70	Introduce StripeGetHighestRowNumber	2021-07-07 11:01:39 +03:00
Onur Tirtir	8942086506	Remove stripeList & currentStripe from ColumnarReadState	2021-07-07 11:01:39 +03:00
Onur Tirtir	16dee73b10	Refactor FindStripeByRowNumber into StripeMetadataLookupRowNumber Push most of the logic in FindStripeByRowNumber down to a helper function to re-use it in next commit.	2021-07-07 11:01:38 +03:00
Onur Tirtir	18fe0311c0	Move rest of the schema changes to 10.2-1	2021-06-16 20:43:41 +03:00
Onur Tirtir	07117b0454	Move sql files for upgrade/downgrade_columnar_storage to 10.2-1	2021-06-16 20:40:26 +03:00
Onur Tirtir	3d11c0f9ef	Merge remote-tracking branch 'origin/master' into columnar-index Conflicts: src/test/regress/expected/columnar_empty.out src/test/regress/expected/multi_extension.out	2021-06-16 20:23:50 +03:00
Onur Tirtir	b6b969971a	Error out for CLUSTER commands on columnar tables	2021-06-16 20:06:33 +03:00
Onur Tirtir	5adab2a3ac	Report progress when building index on columnar tables	2021-06-16 20:06:33 +03:00
Onur Tirtir	9b4dc2f804	Prevent using parallel scan for columnar index builds	2021-06-16 19:59:32 +03:00
Onur Tirtir	82ea1b5daf	Not remove all paths, keep IndexPath's	2021-06-16 19:59:32 +03:00
Onur Tirtir	1af50e98b3	Fix a comment in ColumnarMetapageRead	2021-06-16 19:59:32 +03:00
Onur Tirtir	10a762aa88	Implement columnar index support functions	2021-06-16 19:59:32 +03:00
Onur Tirtir	a209999618	Enforce table opt constraints when using alter_columnar_table_set (#5029 )	2021-06-08 17:39:16 +03:00
Onur Tirtir	94f30a0428	Refactor index check in ColumnarProcessUtility	2021-06-01 11:12:28 +03:00
Onur Tirtir	181848cc80	Implement ErrorIfInvalidRowNumber To use the same logic when mapping tid's to row number's	2021-05-10 20:16:50 +03:00
Onur Tirtir	7ae90b7f96	Rename ColumnarStripeIndexRelationId to ColumnarStripePKeyIndexRelationId Since now we have another index on columnar.stripe	2021-05-10 20:16:50 +03:00
Onur Tirtir	f846c16514	Implement BuildStripeMetadata	2021-05-10 20:16:50 +03:00
Onur Tirtir	2552aee404	Handle old versioned columnar metapage after binary upgrade (#4956 ) * Make VACUUM hint for upgrade scenario actually work * Suggest using VACUUM if metapage doesn't exist Plus, suggest upgrading sql version as another option. * Always force read metapage block * Fix two typos	2021-05-10 20:16:50 +03:00
Onur Tirtir	2e419ea177	Add first_row_number column to columnar.stripe for tid mapping	2021-05-10 20:16:50 +03:00
Onur Tirtir	9c1ac3127f	Implement ColumnarOverwriteMetapage	2021-05-10 20:16:50 +03:00
jeff-davis	7b9aecff21	Columnar: metapage changes. (#4907 ) * Columnar: introduce columnar storage API. This new API is responsible for the low-level storage details of columnar; translating large reads and writes into individual block reads and writes that respect the page headers and emit WAL. It's also responsible for the columnar metapage, resource reservations (stripe IDs, row numbers, and data), and truncation. This new API is not used yet, but will be used in subsequent commits. * Columnar: add columnar_storage_info() for debugging purposes. * Columnar: expose ColumnarMetadataNewStorageId(). * Columnar: always initialize metapage at creation time. This avoids the complexity of dealing with tables where the metapage has not yet been initialized. * Columnar: columnar storage upgrade/downgrade UDFs. Necessary upgrade/downgrade step so that new code doesn't see an old metapage. * Columnar: improve metadata.c comment. * Columnar: make ColumnarMetapage internal to the storage API. Callers should not have or need direct access to the metapage. * Columnar: perform resource reservation using storage API. * Columnar: implement truncate using storage API. * Columnar: implement read/write paths with storage API. * Columnar: add storage tests. * Revert "Columnar: don't include stripe reservation locks in lock graph." This reverts commit `c3dcd6b9f8`. No longer needed because the columnar storage API takes care of concurrency for resource reservation. * Columnar: remove unnecessary lock when reserving. No longer necessary because the columnar storage API takes care of concurrent resource reservation. * Add simple upgrade tests for storage/ branch * fix multi_extension.out Co-authored-by: Onur Tirtir <onurcantirtir@gmail.com>	2021-05-10 20:16:46 +03:00
Onur Tirtir	7def297a3b	Move the logic that builds relation col list into a function (#4964 )	2021-05-10 20:01:28 +03:00
Onur Tirtir	59fea712e2	Implement a helper to create memory cxt for stripe read (#4965 )	2021-05-10 19:55:47 +03:00
Onur Tirtir	96278822d9	Move columnar test helpers to a separate file (#4908 ) * Move columnar test helpers to another file * Rename column_store_memory_stats to columnar_store_memory_stats	2021-04-16 18:56:21 +03:00
jeff-davis	9ed56928d3	Columnar: fix use-after-free. (#4906 ) Co-authored-by: Jeff Davis <jefdavi@microsoft.com>	2021-04-15 01:00:00 -07:00
Onur Tirtir	fe5c985e1d	Remove HAS_TABLEAM config since we dropped pg11 support (#4862 ) * Remove HAS_TABLEAM config * Drop columnar_ensure_objects_exist * Not call columnar_ensure_objects_exist in citus_finish_pg_upgrade	2021-04-13 10:51:26 +03:00
Onur Tirtir	716cc629f1	Refactor ColumnarReadNextRow for better readability (#4823 )	2021-04-13 10:44:00 +03:00
jeff-davis	3efdfdd791	Columnar: make projectedColumnList an integer list. (#4869 ) Co-authored-by: Jeff Davis <jefdavi@microsoft.com>	2021-04-12 19:07:21 -07:00
jeff-davis	063e673038	Columnar: use clause Vars for chunk group filtering. (#4856 ) * Columnar: use clause Vars for chunk group filtering. This solves #4780 and also provides a cleaner separation between chunk group filtering and projection pushdown. * Columnar: sort and deduplicate Vars pulled from clauses. * Columnar: cleanup variable names. * Columnar: remove alternate test output. * Columnar: do not recurse when looking for whereClauseVars. Co-authored-by: Jeff Davis <jefdavi@microsoft.com>	2021-04-01 12:27:28 -07:00
SaitTalhaNisanci	03832f353c	Drop postgres 11 support	2021-03-25 09:20:28 +03:00
jeff-davis	248c6cb91a	Columnar: do not bother building unnecessary RestrictInfo. (#4852 ) Co-authored-by: Jeff Davis <jefdavi@microsoft.com>	2021-03-24 16:05:08 -07:00
Onur Tirtir	c01507a91b	Remove columnar/.gitignore (#4825 )	2021-03-24 13:04:14 +03:00
jeff-davis	3b12556401	Columnar: cleanup (#4814 ) * Columnar: fix misnamed file. * Columnar: make compression not dependent on columnar.h. * Columnar: rename columnar_metadata_tables.c to columnar_metadata.c. * Columnar: make customscan not depend on columnar.h. Co-authored-by: Jeff Davis <jefdavi@microsoft.com>	2021-03-15 11:34:39 -07:00
Onur Tirtir	1d3e075e62	Support temporary columnar tables (#4766 )	2021-03-12 12:01:36 +03:00
Onur Tirtir	874d5fd962	Remove foreign keys between columnar metadata tables (#4791 ) Postgres keeps AFTER trigger state for each transaction, because we can have deferred AFTER triggers which will be fired at the end of a transaction. Postgres cleans up this state at the end of transaction. Postgres processes ON COMMIT triggers after cleaning-up the AFTER trigger states. So if we fire any triggers in ON COMMIT, the AFTER trigger state won't be cleaned-up properly and the transaction state will be left in an inconsistent state, which might result in assertion failure. So with this commit, we remove foreign keys between columnar metadata tables and enforce constraints between them manually when dropping columnar tables.	2021-03-12 11:28:17 +03:00
Hadi Moshayedi	1a05131331	Use chunk groups to read columnar data (#4768 )	2021-03-02 23:53:24 -08:00
jeff-davis	9da9bd3dfd	Columnar: rename files and tests. (#4751 ) * Columnar: rename files and tests. * Columnar: Rename TableState to ColumnarState.	2021-03-01 08:34:24 -08:00
Onur Tirtir	54ac924bef	Grant read access for columnar metadata tables to unprivileged user	2021-02-26 12:31:09 +03:00
Onur Tirtir	5ed954844c	Ensure table owner when using alter_columnar_table_set/alter_columnar_table_reset (#4748 )	2021-02-26 12:27:51 +03:00
jeff-davis	fbeb747006	Columnar: refactor read path and fix zero-column tables. (#4668 ) Co-authored-by: Jeff Davis <jefdavi@microsoft.com>	2021-02-25 09:04:54 -08:00
Onur Tirtir	495096ef5e	Remove useless pg version checks (#4741 )	2021-02-23 21:20:18 +03:00
Hadi Moshayedi	2fca5ff3b5	Fix alignment issue in DatumToBytea	2021-02-22 16:04:30 -08:00
jeff-davis	0227317002	Columnar: better specification for microbenchmark. (#4711 ) Co-authored-by: Jeff Davis <jefdavi@microsoft.com>	2021-02-16 15:28:25 -08:00
Hadi Moshayedi	e690d8b79b	Move stripe.chunk_count to last position	2021-02-11 17:00:44 -08:00
Jeff Davis	b96673de69	Columnar: update README to compare with cstore_fdw.	2021-02-11 10:47:27 -08:00
Jeff Davis	1f1c3c362b	Columnar: rename chunk_num -> chunk_group_num.	2021-02-11 09:27:00 -08:00
Hadi Moshayedi	c3dcd6b9f8	Columnar: don't include stripe reservation locks in lock graph.	2021-02-10 10:20:20 -08:00
Hadi Moshayedi	841d25bae9	Release metadata locks early	2021-02-10 10:20:12 -08:00
Hadi Moshayedi	52297804ae	Fix zero column tables	2021-02-09 23:05:11 -08:00
Hadi Moshayedi	2d09c76b76	Rename storageid to storage_id	2021-02-09 19:57:04 -08:00
Hadi Moshayedi	8270b598b6	Rename stripeid, chunkid, and attnum	2021-02-09 19:50:50 -08:00
Hadi Moshayedi	9114fd4050	Move chunk.value_count to last position	2021-02-09 19:43:34 -08:00
Hadi Moshayedi	be90c20457	Fix write path for zero column tables	2021-02-09 14:14:06 -08:00
Hadi Moshayedi	c8d61a31e2	Columnar: chunk_group metadata table	2021-02-09 14:11:58 -08:00
Jeff Davis	2ea31c899e	Columnar: make read and write state private.	2021-02-08 10:11:57 -08:00
Hadi Moshayedi	eff8cffaf3	Columnar: improve naming of limit config variables. (#4653 ) * Rename chunk_row_count to chunk_group_row_limit * Rename stripe_row_count to stripe_row_limit * Undo couple of renames	2021-02-06 09:04:04 -08:00
Jeff Davis	b1882d4400	Columnar: Call nextval_internal instead of DirectFunctionCall.	2021-02-06 01:45:30 -08:00
Hadi Moshayedi	0a9fd91d8f	Use 'Chunk Groups' in EXPLAIN ANALYZE of columnar scan	2021-02-05 10:58:01 -08:00
Hadi Moshayedi	1d311b0709	Columnar: don't double count chunks filtered	2021-02-05 10:58:01 -08:00
Hadi Moshayedi	5fde617229	Columnar: disallow CREATE INDEX CONCURRENTLY	2021-02-03 12:10:00 -08:00
Jeff Davis	4043731c41	Columnar: fix inheritance planning.	2021-02-03 10:41:21 -08:00
jeff-davis	e03246dd45	Columnar: mark custom scan path parallel_safe. (#4619 ) Enables an overall plan to be parallel (e.g. over a partition hierarchy), even though an individual ColumnarScan is not parallel-aware. Co-authored-by: Jeff Davis <jefdavi@microsoft.com>	2021-02-02 11:56:00 -08:00
jeff-davis	e195af7e72	Columnar: always disable parallel paths. (#4617 ) Previously, if columnar.enable_custom_scan was false, parallel paths could remain, leading to an unexpected error. Also, ensure that cheapest_parameterized_paths is cleared if a custom scan is used. Co-authored-by: Jeff Davis <jefdavi@microsoft.com>	2021-02-02 11:37:42 -08:00
Jeff Davis	f417510a7f	Columnar: properly initialize rowNumber.	2021-02-01 21:15:14 -08:00
Hadi Moshayedi	bcb162976f	Fix #4608	2021-02-01 16:23:16 -08:00
Hadi Moshayedi	f5b1e49b79	Columnar: Fix lateral joins	2021-02-01 11:59:36 -08:00
Hadi Moshayedi	ef927688fa	Columnar: Fix ALTER TABLE ... ADD COLUMN.	2021-02-01 11:40:17 -08:00
jeff-davis	15297cab49	Columnar: add GUC to control qual pushdown. (#4586 )	2021-01-27 09:57:40 -08:00
jeff-davis	62e0383150	Columnar readme. (#4585 ) Co-authored-by: Jeff Davis <jefdavi@microsoft.com>	2021-01-27 09:33:35 -08:00
Nils Dijk	07d3b4fd04	fix NaN cost estimate on empty columnar tables (#4593 ) Fixing a division by zero in the cost calculations for scanning a columnar table. Due to how the columns in a columnar table are counted an empty table would result in a division by zero. Instead this patch keeps the column selection ratio on zero when this happens, resulting in an accurate cost of zero pages to scan a columnar table. fixes #4589	2021-01-27 17:32:17 +01:00
Nils Dijk	07cf037b13	fix parse error on pg11.8 for extension creation (#4582 ) In pg11.8 it seemingly tries to parse the full sql file creating the extension, since we use syntax introduced in postgres 12 this fails. This patch rewrites the statement not recognized by pg11.8 to be dynamically executed from a string literal via `EXECUTE`.	2021-01-27 17:00:29 +01:00
Jeff Davis	d62e54dc09	Columnar: optimize write path.	2021-01-25 11:47:21 -08:00
Hadi Moshayedi	639952ffa8	Read chunk row count from catalog tables	2021-01-25 08:53:52 -08:00
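
Note on commit f00c63c33d (index builds with CONCURRENTLY): the validate-scan reasoning in that message amounts to a merge over two TID-ordered streams. Below is a minimal Python sketch of that idea only, not the actual C implementation. The function name `find_unindexed_tids` is hypothetical, and TIDs are modeled as plain integers for illustration, whereas real TIDs are (block, offset) pairs.

```python
from typing import Iterable, Iterator, List, Optional


def find_unindexed_tids(table_tids: Iterable[int],
                        indexed_tids: Iterable[int]) -> List[int]:
    """Return the table TIDs that are missing from the index.

    Assumption: both inputs are sorted in ascending order, mirroring the
    ordering guarantees described in the commit message (the table scan
    returns tuples in TID order; the index TIDs arrive sorted).
    """
    index_iter: Iterator[int] = iter(indexed_tids)
    current: Optional[int] = next(index_iter, None)
    missing: List[int] = []

    for tid in table_tids:
        # Advance to the first indexed TID that is >= the table tuple's TID.
        while current is not None and current < tid:
            current = next(index_iter, None)

        if current is not None and current == tid:
            # Already indexed: skip this tuple.
            continue

        # Indexed TID is greater, or the indexed stream is exhausted:
        # this tuple would have to be inserted into the index.
        missing.append(tid)

    return missing


if __name__ == "__main__":
    print(find_unindexed_tids([1, 2, 3, 5, 8], [1, 3, 4, 8]))
```

Because both streams are consumed exactly once and never rewound, the pass is linear in the combined number of TIDs, which is why the ordering guarantee of the table scan matters in the commit's reasoning.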

195 Commits (13911614b873eda07a51cb4e47e704985ecc6af9)