Columnar: better specification for microbenchmark. (#4711)

Co-authored-by: Jeff Davis <jefdavi@microsoft.com>
pull/4719/head
jeff-davis 2021-02-16 15:28:25 -08:00 committed by GitHub
parent 530a284e51
commit 0227317002
GPG Key ID: 4AEE18F83AFDEB23
1 changed file with 20 additions and 30 deletions

@@ -279,7 +279,7 @@ INSERT INTO perf_columnar SELECT * FROM perf_row;
 => SELECT pg_total_relation_size('perf_row')::numeric/pg_total_relation_size('perf_columnar') AS compression_ratio;
 compression_ratio
 --------------------
-5.4080768380134124
+5.3958044063457513
 (1 row)
 ```
@@ -287,32 +287,12 @@ The overall compression ratio of columnar table, versus the same data
 stored with row storage, is **5.4X**.
 ```
-=> VACUUM VERBOSE perf_row;
-INFO: vacuuming "public.perf_row"
-INFO: "perf_row": found 0 removable, 10 nonremovable row versions in 1 out of 5769231 pages
-DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 3110
-There were 0 unused item identifiers.
-Skipped 0 pages due to buffer pins, 5769230 frozen pages.
-0 pages are entirely empty.
-CPU: user: 0.10 s, system: 0.05 s, elapsed: 0.26 s.
-INFO: vacuuming "pg_toast.pg_toast_17133"
-INFO: index "pg_toast_17133_index" now contains 0 row versions in 1 pages
-DETAIL: 0 index row versions were removed.
-0 index pages have been deleted, 0 are currently reusable.
-CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
-INFO: "pg_toast_17133": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages
-DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 3110
-There were 0 unused item identifiers.
-Skipped 0 pages due to buffer pins, 0 frozen pages.
-0 pages are entirely empty.
-CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
 => VACUUM VERBOSE perf_columnar;
 INFO: statistics for "perf_columnar":
-storage id: 10000000020
-total file size: 8741486592, total data size: 8714771176
-compression rate: 4.96x
-total row count: 75000000, stripe count: 501, average rows per stripe: 149700
+storage id: 10000000000
+total file size: 8761368576, total data size: 8734266196
+compression rate: 5.01x
+total row count: 75000000, stripe count: 500, average rows per stripe: 150000
 chunk count: 60000, containing data for dropped columns: 0, zstd compressed: 60000
 ```
@@ -322,8 +302,13 @@ not account for the metadata savings of the columnar format.
 ## System
-* 16GB physical memory
-* 128MB PG shared buffers
+* Azure VM: Standard D2s v3 (2 vcpus, 8 GiB memory)
+* Linux (ubuntu 18.04)
+* Data Drive: Standard HDD (512GB, 500 IOPS Max, 60 MB/s Max)
+* PostgreSQL 13 (``--with-llvm``, ``--with-python``)
+* ``shared_buffers = 128MB``
+* ``max_parallel_workers_per_gather = 0``
+* ``jit = on``
 Note: because this was run on a system with enough physical memory to
 hold a substantial fraction of the table, the IO benefits of columnar
@@ -334,11 +319,16 @@ is substantially increased.
 ```sql
 -- OFFSET 1000 so that no rows are returned, and we collect only timings
 SELECT vendor_id, SUM(quantity) FROM perf_row GROUP BY vendor_id OFFSET 1000;
+SELECT vendor_id, SUM(quantity) FROM perf_row GROUP BY vendor_id OFFSET 1000;
+SELECT vendor_id, SUM(quantity) FROM perf_row GROUP BY vendor_id OFFSET 1000;
+SELECT vendor_id, SUM(quantity) FROM perf_columnar GROUP BY vendor_id OFFSET 1000;
+SELECT vendor_id, SUM(quantity) FROM perf_columnar GROUP BY vendor_id OFFSET 1000;
 SELECT vendor_id, SUM(quantity) FROM perf_columnar GROUP BY vendor_id OFFSET 1000;
 ```
 Timing (median of three runs):
-* row: 201700ms
-* columnar: 14202ms
-* speedup: **14X**
+* row: 436s
+* columnar: 16s
+* speedup: **27X**
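
The new "median of three runs" figures can be sanity-checked with a short calculation. The sketch below is illustrative only and is not part of the commit. The per-run timings are hypothetical, chosen so that their medians match the reported 436s and 16s; only those two medians and the 27X speedup come from the diff above. The helper name `speedup` is likewise an assumption.

```python
from statistics import median


def speedup(row_runs_s, columnar_runs_s):
    """Return (row median, columnar median, ratio) for two lists of run times in seconds."""
    row_med = median(row_runs_s)
    col_med = median(columnar_runs_s)
    return row_med, col_med, row_med / col_med


# Hypothetical per-run timings. Only the medians (436 s and 16 s) are
# taken from the benchmark text; the individual runs were not published.
row_runs = [440.0, 436.0, 431.0]
columnar_runs = [16.0, 17.0, 15.5]

row_med, col_med, ratio = speedup(row_runs, columnar_runs)
print(f"row: {row_med:.0f}s, columnar: {col_med:.0f}s, speedup: {ratio:.0f}X")
```

With these inputs the ratio is 436 / 16 = 27.25, which rounds to the 27X stated in the updated README text.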