5041 Commits (5aace9bb37b907c221f3375f8a9d3c50192f24b9)

Author SHA1 Message Date
Onder Kalaci d4368ff2b3 Make sure that shouldhaveshards is synced to workers 2021-08-11 15:53:31 +02:00
Hanefi Onaldi dc67dbaa01
Merge pull request #5171 from citusdata/changelog-updates
Add changelog entries for 9.4.6
2021-08-11 11:32:46 +03:00
Hanefi Onaldi c6e428896a
Add changelog entries for 9.4.6 2021-08-11 10:53:31 +03:00
Önder Kalacı 272f4d7ce5
Merge pull request #5158 from citusdata/heap_on_master
Guard against hard WaitEventSet errors
2021-08-10 09:41:05 +02:00
Onder Kalaci 86bd28b92c Guard against hard WaitEventSet errors
In short, add wrappers around Postgres' AddWaitEventToSet() and
ModifyWaitEvent().

AddWaitEventToSet()/ModifyWaitEvent() may throw hard errors. This
happens, for example, when the underlying socket for a connection
has been closed by the remote server and the OS already reflects
that, but Citus hasn't had a chance to learn about it yet. In that
case, if the replication factor is >1, Citus can fail over to other
nodes to execute the query. Even if the replication factor is 1,
Citus can give much nicer errors.

So CitusAddWaitEventSetToSet()/CitusModifyWaitEvent() simply wrap
AddWaitEventToSet()/ModifyWaitEvent() in a PG_TRY/PG_CATCH block
in order to catch any hard errors, and return this information to
the caller.
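
The wrapper idea can be pictured with a small Python analogue. This is only a sketch of the catch-and-report pattern, not the Postgres or Citus API: `add_wait_event_to_set`, `WaitEventSetError` and `WAIT_EVENT_SET_INDEX_FAILED` are hypothetical stand-ins chosen for illustration.

```python
# Python analogue of the wrapper pattern; NOT the Postgres or Citus API.
# All names below are hypothetical stand-ins.

WAIT_EVENT_SET_INDEX_FAILED = -1  # hypothetical sentinel returned on failure


class WaitEventSetError(Exception):
    """Stand-in for a hard error raised by the underlying call."""


def add_wait_event_to_set(event_set, socket_is_open):
    """Stand-in for the raw call: raises when the socket is already closed."""
    if not socket_is_open:
        raise WaitEventSetError("socket closed by remote server")
    event_set.append("socket")
    return len(event_set) - 1  # position of the new event in the set


def guarded_add_wait_event_to_set(event_set, socket_is_open):
    """Catch the hard error and report it through the return value instead."""
    try:
        return add_wait_event_to_set(event_set, socket_is_open)
    except WaitEventSetError:
        return WAIT_EVENT_SET_INDEX_FAILED
```

A caller can then check the return value and decide whether to fail over to another node or raise a friendlier error.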
2021-08-10 09:35:03 +02:00
Önder Kalacı 2ac3cc07eb
Merge pull request #5144 from citusdata/transactional_start_metadata_node
Make start/stop_metadata_sync_to_node transactional
2021-08-09 10:55:57 +02:00
Onder Kalaci 5f02d18ef8 transactional metadata sync for maintenance daemon
As we use the current user to sync the metadata to the nodes
with #5105 (and many other PRs), nothing prevents us from using
the coordinated transaction for metadata syncing.

This commit also renames a few functions to reflect their actual
implementation.
2021-08-09 10:34:55 +02:00
Önder Kalacı 999a236540
Merge pull request #5131 from citusdata/fix_drop_column_part
Dropped columns do not diverge distribution column for partitioned tables
2021-08-06 13:47:50 +02:00
Onder Kalaci 35964c6366 Dropped columns do not diverge distribution column for partitioned tables
Before this commit, creating a partition after a DROP COLUMN
on the parent (at a position before the dist. key) led to the
partition having the wrong distribution column.
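
One way to picture the problem, as a sketch only: a dropped column keeps its attribute slot on the parent, while a partition created afterwards has no such slot, so the same attribute number can point at different columns. The table layout and helper names below are hypothetical, and the commit message does not describe how the fix resolves the column; looking it up by name is an assumption made for illustration.

```python
def column_at(columns, attnum):
    """Return the column stored at a 1-based attribute number."""
    return columns[attnum - 1]


def attnum_of(columns, name):
    """Return the 1-based attribute number of a live column, by name."""
    return columns.index(name) + 1


# Hypothetical parent: the first column was dropped (None marks the dead slot).
parent_columns = [None, "tenant_id", "value"]
# Partition created after the DROP COLUMN: it never had the dropped slot.
partition_columns = ["tenant_id", "value"]

parent_dist_attnum = attnum_of(parent_columns, "tenant_id")

# Reusing the parent's attribute number on the partition picks the wrong column.
wrong_column = column_at(partition_columns, parent_dist_attnum)

# Resolving by name yields the partition's own attribute number.
partition_dist_attnum = attnum_of(partition_columns, "tenant_id")
```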
2021-08-06 13:36:12 +02:00
Hanefi Onaldi 0722ec95bc
Merge pull request #5163 from citusdata/changelog-updates 2021-08-05 19:31:10 +03:00
Hanefi Onaldi 998b9ffcaa
Merge branch 'master' into changelog-updates 2021-08-05 19:27:13 +03:00
jeff-davis deb7ec605b
Columnar: fix misleading comments and useless types. (#5162)
CustomScan and CustomPath structures cannot be extended with
additional fields. Fix comments and type structure that implied that
they can.
2021-08-05 09:22:21 -07:00
Hanefi Onaldi bc5553b5d1
Add changelog entries for 10.1.1 2021-08-05 17:32:31 +03:00
Ahmet Gedemenli 07ca8784cd
Merge pull request #5161 from citusdata/add-check-for-gucs-order
Add check for alphabetically sorted gucs
2021-08-05 17:09:38 +03:00
Ahmet Gedemenli 51d410bb7b Add check for alphabetically sorted gucs
Move to a separate script

Add the new script to readme
2021-08-05 16:37:49 +03:00
naisila 798a7902bf Fix master_update_table_statistics scripts for 9.5 2021-08-03 18:15:56 +03:00
naisila f9fa5a3d69 Fix master_update_table_statistics scripts for 9.4 2021-08-03 18:15:56 +03:00
Önder Kalacı 0d2f49fbce
Merge pull request #5130 from citusdata/get_ready_update_dist_table_colocation
Introduce citus_internal_update_relation_colocation
2021-08-03 11:53:30 +02:00
Onder Kalaci 482b8096e9 Introduce citus_internal_update_relation_colocation
update_distributed_table_colocation can be called by the relation
owner, and internally it updates pg_dist_partition. With this
commit, update_distributed_table_colocation uses an internal
UDF to access pg_dist_partition.

As a result, this operation can now be done by regular users
on MX.
2021-08-03 11:44:58 +02:00
Onur Tirtir ef6a8604ba
Merge pull request #5140 from citusdata/col/seq-path-costing
Re-cost columnar table sequential scan paths 

With the changes in this PR, we adjust the cost estimates that Postgres makes for sequential scan paths on columnar tables.

We want to make better decisions when columnar custom scan is disabled too: there are cases where an index scan is preferable to a sequential scan for heapAM but not for columnarAM.
For this reason, we want to choose better between index scan and sequential scan when columnar custom scan is **disabled**.
So with this PR, we re-estimate costs for sequential scan paths in a way that is quite similar to what we do for columnar custom scan.

The idea is that columnar custom scan uses projection pushdown, so its cost is directly proportional to column selectivity. For sequential scan, however, we re-estimate the cost considering **all** the columns, since projection pushdown is not supported for plain sequential scan.
One thing to note here is that we still don't consider chunk group filtering when estimating the cost for columnar custom scan. For this reason, we calculate the same cost for sequential scan and columnar custom scan if the query reads all columns, regardless of the filters in the `where` clause.
To avoid mistakenly choosing sequential scan in such cases, we still remove non-`IndexPath`s if columnar custom scan is enabled.
That way, even when we calculate the same cost for sequential scan and columnar custom scan, we remove the sequential one anyway and guarantee that we choose either columnar custom scan or index scan.
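
The comparison between the two paths can be sketched as follows, assuming the simple model "cost of one column of one stripe, times columns read, times stripes". The function and parameter names are hypothetical and this is not the actual planner code.

```python
def columnar_read_cost(per_column_stripe_cost, column_count, stripe_count):
    """Cost of reading `column_count` columns from every stripe."""
    return per_column_stripe_cost * column_count * stripe_count


def custom_scan_cost(per_column_stripe_cost, projected_columns, stripe_count):
    # Projection pushdown: only the columns the query needs are read.
    return columnar_read_cost(per_column_stripe_cost, projected_columns, stripe_count)


def seq_scan_cost(per_column_stripe_cost, relation_columns, stripe_count):
    # No projection pushdown: every column of the relation is read.
    return columnar_read_cost(per_column_stripe_cost, relation_columns, stripe_count)
```

When the query reads all columns, both functions return the same value, which is the tie that removing the non-`IndexPath`s is meant to break.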
2021-08-02 11:38:11 +03:00
Onur Tirtir 93ebbb0607 Re-cost SeqPath's as well for columnar tables 2021-08-02 11:32:25 +03:00
Onur Tirtir 453ac40725 Comment why we still remove non IndexPath's when custom scan is off 2021-08-02 11:25:18 +03:00
Onur Tirtir a87405b6ba Not adjust IndexPath cost if indexscan is off 2021-08-02 11:25:18 +03:00
Onur Tirtir 51691a8994 Rename RecostColumnarIndexPaths to RecostColumnarPaths 2021-08-02 11:25:18 +03:00
Onur Tirtir 734fa22272
Merge pull request #5090 from citusdata/col/path-costing
Re-cost columnar table index scan paths

With the changes in this PR, we adjust the cost estimate done by indexAM for `IndexPath` when the index is on a columnar table.
This is because the way indexAM estimates the cost is not appropriate for indexes on columnar tables.
The most basic reason is that indexAM assumes we only need to read a single page to access a single tuple of the table.
For columnar tables, on the other hand, we read the whole stripe from disk even for a single tuple, regardless of the optimization done in #5058.

Note that we don't simply assign startup / total costs but we add the cost estimated by us to the cost estimated by indexAM.
This is because we need to take "the cost due to index data-structure traversal" into account too.

Before explaining the logic that we follow for `IndexPath`, let's first summarize what we were / are doing for `ColumnarCustomScan`:
```math
X <- cost for reading single column of single stripe // 1
cost = X * (number of columns after projection pushdown) // 2
cost = cost * (number of stripes that relation has) // 3
```

The logic that we follow to calculate the additional cost for index scan is as follows:
```math
X <- cost for reading single column of single stripe // same as 1 above
cost = X * (number of columns that relation has) // index scan cannot do projection pushdown, so different than 2 above
cost = cost * (estimated number of stripes that we need to read)
```

where we calculate `estimated number of stripes that we need to read` as follows:

```math
indexCorrelation, indexSelectivity <- calculate by using amcostestimate_function
estimatedReadRows = (relation row count) * indexSelectivity

minEstimateStripeReads = estimatedReadRows / (average stripe row count) // full correlation, we will not do any redundant stripe reads
maxEstimateStripeReads = estimatedReadRows // no correlation, we will read a different stripe for each tuple

complementCorrelation = 1 - abs(indexCorrelation)
estimatedStripeCount = minEstimateStripeReads +
                       complementCorrelation  * (maxEstimateStripeReads  - minEstimateStripeReads)
```
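
The index scan formulas above can be written as a small runnable sketch. The function and parameter names are hypothetical; in Postgres the selectivity and correlation would come from the index AM's cost estimate function, and here they are plain arguments.

```python
def estimated_stripe_reads(relation_row_count, avg_stripe_row_count,
                           index_selectivity, index_correlation):
    """Interpolate between the best and worst case number of stripe reads."""
    estimated_read_rows = relation_row_count * index_selectivity

    # Full correlation: no redundant stripe reads.
    min_stripe_reads = estimated_read_rows / avg_stripe_row_count
    # No correlation: a different stripe is read for each tuple.
    max_stripe_reads = estimated_read_rows

    complement_correlation = 1 - abs(index_correlation)
    return min_stripe_reads + complement_correlation * (max_stripe_reads - min_stripe_reads)


def additional_index_scan_cost(per_column_stripe_cost, relation_columns, stripe_reads):
    """Extra cost added on top of the indexAM estimate (no projection pushdown)."""
    return per_column_stripe_cost * relation_columns * stripe_reads
```

For example, with 40,000 rows, 1,000 rows per stripe and a selectivity of 0.25, a perfectly correlated index is estimated at 10 stripe reads, while an uncorrelated one is estimated at 10,000.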
2021-08-02 11:23:20 +03:00
Onur Tirtir 297f59a70e Re-cost columnar table index paths 2021-08-02 11:16:37 +03:00
Onur Tirtir 8adcf2096b Multiply ColumnarCustomScan cost by tblspace.seqpage cost 2021-08-02 11:16:37 +03:00
Onur Tirtir dba8421453 Refactor ColumnarScanCost into ColumnarPerChunkGroupScanCost 2021-08-02 11:16:37 +03:00
Onur Tirtir d8f92697f2
Free memory used for last stripe read when re-scanning a columnar table (#5143)
Instead of setting stripeReadState to NULL, call ColumnarResetRead
before re-scanning a columnar table since this function is already
designed for doing the necessary clean up when finishing a stripe
read.

Note that this change shouldn't have a great effect on memory usage
since AdvanceStripe was already doing the clean-up for all the
stripes except the last one.
2021-08-02 11:16:01 +03:00
Onur Tirtir 38940ed2a6
Merge pull request #5058 from citusdata/col/optimize-index-read
Use long-lasting mem cxt during columnar index scan & optimize correlated ones
2021-08-02 11:06:57 +03:00
Onur Tirtir 73058d35cc Not free (stripe) chunk buffers after de-serializing
Previously, we were only using chunk group reader for sequential scan.
However, to support index scans on columnar tables, we now use the very
same low-level functions for index scan too.

Since those low-level functions were only used for sequential scan, it
was guaranteed that we would never read the same chunk group more than
once, so we were freeing chunk buffers after deserializing them into a
separate buffer.

Now that we use those low-level functions for index scan, we cannot
free chunk buffers since it's possible to read the same chunk group
again, for example:

- read chunk group 1 of stripe 5
- read chunk group 2 of stripe 5
- read chunk group 1 of stripe 5 again

Here, when we decide to read chunk group 1 for a second time,
chunk group 1 is not cached. Before this commit, we were
freeing the chunk buffers for chunk group 1 after the first
read, so the second read was hitting a segfault or errors from
the low-level decompression APIs.
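
The access pattern above can be reproduced with a toy reader. This is a Python sketch under assumptions, not the columnar reader itself: `StripeReader` and its `free_after_read` flag are hypothetical, and a `KeyError` stands in for the segfault or decompression error seen in the C code.

```python
import zlib


class StripeReader:
    """Toy reader holding the compressed chunk group buffers of one stripe."""

    def __init__(self, chunk_groups, free_after_read=False):
        # Compressed buffers, as if they were loaded from disk for one stripe.
        self._buffers = {gid: zlib.compress(data) for gid, data in chunk_groups.items()}
        self._free_after_read = free_after_read

    def read_chunk_group(self, group_id):
        buffer = self._buffers[group_id]  # KeyError if the buffer was freed
        data = zlib.decompress(buffer)
        if self._free_after_read:
            # Old behavior: safe only if no chunk group is ever read twice.
            del self._buffers[group_id]
        return data


def second_read_fails(reader, group_id):
    """Read the same chunk group twice and report whether the re-read failed."""
    reader.read_chunk_group(group_id)
    try:
        reader.read_chunk_group(group_id)
    except KeyError:
        return True
    return False
```

With `free_after_read=True` the second read of a chunk group fails, while keeping the buffers lets an index scan revisit chunk group 1 after chunk group 2.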
2021-08-02 11:00:12 +03:00
Onur Tirtir 327ae43b83 Get rid of EndStripeRead, since we anyway reset mem cxt 2021-08-02 11:00:12 +03:00
Onur Tirtir 83f5d42365 Use long-lasting mem cxt & optimize correlated index scan 2021-08-02 11:00:12 +03:00
Onur Tirtir c021b82a43 Introduce CreateColumnarScanMemoryContext 2021-08-02 11:00:12 +03:00
Onur Tirtir a25d89e4cb
Merge pull request #5103 from citusdata/at-set-columnar-index
Keep supported indexes when converting table to columnar.

Previously, as indexes were not supported by columnar tables, we were ignoring
all the indexes & index-based constraints of the table when converting it to a
columnar table.

However, now that we support `btree` & `hash` indexAMs for columnar tables,
we only ignore indexAMs other than those two.

However, the way we ignore the unsupported indexes is now a bit different
than before.
Previously, we were just _not creating_ any indexes after converting the table
to columnar, as we didn't support any index types.
Now that we support `btree` & `hash` indexAMs for columnar tables, we
actually drop the unsupported indexes, since re-creating the remaining ones
is easier than adding code that creates only the supported indexes.
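
The selection rule can be sketched in a few lines. This is an illustration only: the helper names and the `(index name, access method)` tuples are hypothetical and do not mirror the actual conversion code.

```python
SUPPORTED_INDEX_AMS = {"btree", "hash"}  # the two AMs named above


def columnar_supports_index_am(index_am):
    """Return True if indexes using this access method can be kept."""
    return index_am in SUPPORTED_INDEX_AMS


def split_indexes(indexes):
    """Split (index name, access method) pairs into kept and dropped names."""
    kept = [name for name, am in indexes if columnar_supports_index_am(am)]
    dropped = [name for name, am in indexes if not columnar_supports_index_am(am)]
    return kept, dropped
```

For instance, a table with a `btree` primary key, a `hash` index and a `gin` index would keep the first two and drop the third.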
2021-07-30 17:01:30 +03:00
Onur Tirtir 84a49cc221 Improve error message for indexAMs not supported by columnar 2021-07-30 16:41:53 +03:00
Onur Tirtir 90e856d6bc Keep supported indexes when converting table to columnar 2021-07-30 16:41:01 +03:00
Onur Tirtir eeecbd2324 Introduce ColumnarSupportsIndexAM 2021-07-30 16:40:27 +03:00
Halil Ozan Akgül d140ca1b0e
Merge pull request #5146 from citusdata/fix_ruleutils_13_endif_comment
Corrects the ruleutils_13.c endif comment
2021-07-29 17:27:01 +03:00
Halil Ozan Akgul 286b0fe0e8 Corrects the endif comment 2021-07-29 17:22:31 +03:00
SaitTalhaNisanci 4559d02c41
Fix union pushdown issue (#5079)
* Fix UNION not being pushed down

Postgres optimizes away column fields that are not needed in the output. We
were relying on these fields to understand if it is safe to push down a
union query.

This fix looks at the parse query, which has the original column fields,
to detect if it is safe to push down a union query.

* Add more tests

* Simplify code and make it more robust

* Process varlevelsup > 0 in FindReferencedTableColumn

* Only look for outer vars in union path

* Add more comments

* Remove UNION ALL specific logic for pulling up childvars
2021-07-29 13:52:55 +03:00
Jelte Fennema 2aa67421a7
Fix showing target shard size in the rebalance progress monitor (#5136)
The progress monitor wouldn't actually update the size of the shard on
the target node when using "block_writes" as the `shard_transfer_mode`.
The reason for this is that the CREATE TABLE part of the shard creation
would only be committed once all data was moved as well. This caused
our size calculation to always return 0, since the table did not exist
yet in the session that the progress monitor used.

This is fixed by first committing creation of the table, and only then
starting the actual data copy.

The test output changes slightly. Apparently, splitting this up into two
transactions instead of one increases the table size after the copy by
about 40kB. The additional size doesn't grow with the amount of data
in the table (it stays ~40kB per shard), so this small change in test
output is not considered an actual problem.
2021-07-23 16:37:00 +02:00
Jelte Fennema 4c1066e463
Merge pull request #5133 from citusdata/add-cache-to-sequence-def-mx
Include data_type and cache in sequence definition on workers
2021-07-22 11:57:03 +02:00
Jelte Fennema 7d0b6dc9be Include data_type and cache in sequence definition on workers
These two options were not included when creating the sequences on the
workers as part of metadata syncing.

The missing `data_type` part of the definition made finding the cause
of #5126 harder than necessary, because of confusing errors.
2021-07-22 11:49:06 +02:00
Önder Kalacı f52db0abab
Merge pull request #5127 from citusdata/get_ready_tenant_isolation
Introduce citus_internal_delete_shard_metadata
2021-07-19 14:43:47 +02:00
Onder Kalaci 903489c763 Improve wording of an error message 2021-07-19 14:38:52 +02:00
Onder Kalaci c8368e7929 Introduce citus_internal_delete_shard_metadata
With this function, the owner of the table is allowed to remove
shard metadata. This is going to be useful for tenant-isolation.
2021-07-19 13:25:05 +02:00
Önder Kalacı 87a51ae552
CLUSTER ON deparser should consider schemas (#5122) 2021-07-16 19:13:18 +03:00
Hanefi Onaldi 38c139ba59
Merge pull request #5114 from citusdata/changelog-updates 2021-07-16 17:53:36 +03:00
Hanefi Onaldi 6b4996f47e
Add changelog entries for 10.1.0
This patch also moves the section to the top of the changelog
2021-07-16 16:51:12 +03:00