Commit Graph

5014 Commits (marcocitus/reentrant-executor)

Author SHA1 Message Date
Marco Slot f33de24877 Fetch tuples in small batches in adaptive executor where possible 2021-08-20 11:13:58 +02:00
Onur Tirtir 262f89359e
Merge pull request #5192 from citusdata/use-pg-functions
Use relcache utils instead of scanning catalog tables for indexes and ext stats
2021-08-18 17:56:56 +03:00
Onur Tirtir 4e1201a333 Use RelationGetStatExtList instead of scanning pg_stats_ext 2021-08-18 17:50:58 +03:00
Onur Tirtir 4b03195c06 Use RelationGetStatExtList instead of GetExplicitStatisticsIdList 2021-08-18 17:50:57 +03:00
Onur Tirtir 91544d0191 Use PGIndexProcessor infra to find explicitly created indexes 2021-08-18 17:50:57 +03:00
Onur Tirtir 549ca4de6d Use RelationGetIndexList instead of scanning pg_index 2021-08-18 17:50:57 +03:00
Onur Tirtir fa9933daf3 Use get_am_name to find indexAM name 2021-08-18 00:44:37 +03:00
Nils Dijk dfc950ce1e
Fix a segfault caused by use after free in ConnectionsPlacementHash (#5170)
DESCRIPTION: Fix a segfault caused by use after free in ConnectionsPlacementHash

Fix a segfault caused by retaining data in any of the hashmaps making up the Placement Connection Management.

We have seen production systems segfault due to random data referenced from ConnectionPlacementHash.
On investigation we found that the backends segfaulting on this had hit OOM errors shortly before the segfault.
The investigation showed at least 15 places where an allocation can fail with OOM and leave ConnectionPlacementHash retaining pointers to memory from contexts that are subsequently freed. This would reproduce the segfault we have observed in production.

Conditions for these allocations are:
 - allocated after first call to `AssociatePlacementWithShard`: https://github.com/citusdata/citus/blob/v10.0.3/src/backend/distributed/connection/placement_connection.c#L880-L881
 - allocated before `StartNodeUserDatabaseConnection`: https://github.com/citusdata/citus/blob/v10.0.3/src/backend/distributed/connection/connection_management.c#L291

At least 15 points of memory allocation (which could fail) are between the callsites of both in a primary key lookup on a reference table - where we have seen an OOM cause a segfault moments later.

Instead of leaving any references in ConnectionPlacementHash, ConnectionShardHash and ColocatedPlacementsHash that could retain pointers that are freed when the TopTransactionContext is reset, we clear all these hashes regardless of the state of CurrentCoordinatedTransactionState.

Downside is that on any transaction abort we will now iterate through 4 hashmaps and clear their contents. Given that they are either already empty, which should cause a quick iteration, or non-empty, causing segfaults in subsequent executions, this overhead seems reasonable.

A better solution would be to move the creation of these hashmaps so they would live in the TopTransactionContext themselves, assuming their contents would never outlive a transaction. This needs more investigation and is an involved refactor, hence the quick fix here.
2021-08-17 17:42:35 +02:00
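The cleanup described in this commit message can be sketched in a few lines. This is a minimal Python model, not the Citus C code: the class, the `reset_on_abort` function and its flag are hypothetical names, and only the three hash tables the message names are modelled.

```python
class PlacementConnectionState:
    """Stand-in for the hash tables named in the commit message."""

    def __init__(self):
        # Hypothetical Python dicts standing in for the C hash tables.
        self.connection_placement_hash = {}
        self.connection_shard_hash = {}
        self.colocated_placements_hash = {}

    def tables(self):
        return (
            self.connection_placement_hash,
            self.connection_shard_hash,
            self.colocated_placements_hash,
        )


def reset_on_abort(state, coordinated_transaction_active):
    """Clear every table on abort, whatever the transaction state is."""
    # The flag is accepted only to show that it no longer gates the cleanup.
    _ = coordinated_transaction_active
    for table in state.tables():
        table.clear()


state = PlacementConnectionState()
state.connection_placement_hash[42] = "entry pointing into a freed context"
reset_on_abort(state, coordinated_transaction_active=False)
```

Clearing an already empty table is cheap, which matches the message's argument that the extra work on every abort is acceptable.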
jeff-davis 4f213f293e
Columnar: use generate_series for test rather than load. (#5181) 2021-08-16 16:12:06 -07:00
Hanefi Onaldi 49be45ed00
Merge pull request #5178 from citusdata/changelog-updates 2021-08-16 17:22:50 +03:00
Hanefi Onaldi 41b15d8775
Add changelog entries for 9.5.7 2021-08-16 17:11:32 +03:00
Hanefi Onaldi 167a023770
Add changelog entries for 10.0.5 2021-08-16 17:11:32 +03:00
Hanefi Onaldi da29a57837
Add changelog entries for 10.1.2 2021-08-16 17:11:31 +03:00
Onur Tirtir e0ecec80a1
Merge pull request #5165 from citusdata/col/rewrite-mem-reset-bug
Use right mem cxt for read state when re-writing a columnar table
2021-08-16 11:14:26 +03:00
Onur Tirtir 68f46c5dc9 Use scan context for intermediate mem allocs too 2021-08-16 11:06:03 +03:00
Onur Tirtir b3d9fc91f8 Always use right mem cxt when creating ColumnarReadState
All the callers except columnar_relation_copy_for_cluster were already
switching to right memory context when creating ColumnarReadState.

With this commit, we embed that logic into init_columnar_read_state
to avoid further such bugs.

That way, we start using the right memory context for
columnar_relation_copy_for_cluster too.
2021-08-16 11:06:03 +03:00
Onur Tirtir 7fcecde203 Use init_columnar_read_state instead of lower level func
Functionally, this doesn't change anything. This is just a preparation
before next commit.
2021-08-16 11:06:03 +03:00
Burak Velioglu a8435620c4
Merge pull request #5115 from citusdata/velioglu/partition_fixes
Support for CREATE INDEX ONLY and ALTER INDEX ATTACH PARTITION
2021-08-13 13:18:36 +03:00
Burak Velioglu 4355ba0a38
Add CREATE INDEX ... ON ONLY and ALTER INDEX ... ATTACH PARTITION (#4938 #4980)
- Add support for CREATE INDEX ... ON ONLY: Before this commit we were not sending the "ONLY" option to the worker nodes at all. With this commit, the "ONLY" parameter is sent to the worker nodes when it is necessary. (#4938)

- Add support for ALTER INDEX ... ATTACH PARTITION: Attach child_index to parent_index by creating the same inheritance on shard level in addition to table level. (#4980)
2021-08-13 13:12:45 +03:00
SaitTalhaNisanci 2ec4e37e45
Fix assert failure in FindReferencedTableColumn (#5175) 2021-08-12 18:21:45 +03:00
Ahmet Gedemenli 9e90894f21
Synchronize hasmetadata flag on mx workers (#5086)
* Synchronize hasmetadata flag on mx workers

* Switch to sequential execution

* Add test

* Use SetWorkerColumn

* Add test for stop_sync

* Remove usage of UpdateHasmetadataOnWorkersWithMetadata

* Remove MarkNodeMetadataSynced

* Fix test for metadatasynced

* Remove MarkNodeMetadataSynced

* Style

* Remove MarkNodeHasMetadata

* Remove UpdateDistNodeBoolAttr

* Refactor SetWorkerColumn

* Use SetWorkerColumnLocalOnly when setting up dependencies

* Use SetWorkerColumnLocalOnly in TriggerSyncMetadataToPrimaryNodes

* Style

* Make update command generator functions static

* Set metadatasynced before syncing

* Call SetWorkerColumn only if the sync is successful

* Try to sync all nodes

* Fix indexno

* Update metadatasynced locally first

* Break if a node fails to sync metadata

* Send worker commands optional

* Style & Rebase

* Add raiseOnError param to SetWorkerColumn

* Style

* Set metadatasynced for all metadata nodes

* Style

* Introduce SetWorkerColumnOptional

* Polish

* Style

* Dont send set command to not synced metadata nodes

* Style

* Polish

* Add test for stop_sync

* Add test for shouldhaveshards

* Add test for isactive flag

* Sort by placementid in the function verify_metadata

* Cover edge cases for failing nodes

* Add comments

* Add nodeport to isactive test

* Add warning if metadata out of sync

* Update warning message
2021-08-12 14:16:18 +03:00
Naisila Puka e5b32b2c3c
Acquire AccessShareLock before updating table statistics (#5155) 2021-08-12 13:58:15 +03:00
Ahmet Gedemenli 6bb4c5e94f
Merge pull request #5173 from citusdata/missing_shouldhave_shards
Make sure that shouldhaveshards is synced to workers
2021-08-11 17:15:14 +03:00
Onder Kalaci d4368ff2b3 Make sure that shouldhaveshards is synced to workers 2021-08-11 15:53:31 +02:00
Hanefi Onaldi dc67dbaa01
Merge pull request #5171 from citusdata/changelog-updates
Add changelog entries for 9.4.6
2021-08-11 11:32:46 +03:00
Hanefi Onaldi c6e428896a
Add changelog entries for 9.4.6 2021-08-11 10:53:31 +03:00
Önder Kalacı 272f4d7ce5
Merge pull request #5158 from citusdata/heap_on_master
Guard against hard WaitEventSet errors
2021-08-10 09:41:05 +02:00
Onder Kalaci 86bd28b92c Guard against hard WaitEventSet errors
In short, add wrappers around Postgres' AddWaitEventToSet() and
ModifyWaitEvent().

AddWaitEventToSet()/ModifyWaitEvent() may throw hard errors, for
example when the underlying socket for a connection has been closed
by the remote server and the OS already reflects this, but Citus
has not yet had a chance to learn about it. In that case, if the
replication factor is >1, Citus can fail over to other nodes to
execute the query. Even if the replication factor is 1, Citus can
give much nicer errors.

So CitusAddWaitEventSetToSet()/CitusModifyWaitEvent() simply put
AddWaitEventToSet()/ModifyWaitEvent() into a PG_TRY/PG_CATCH block
in order to catch any hard errors, and return this information to
the caller.
2021-08-10 09:35:03 +02:00
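The wrapper pattern described above, catching a hard error and handing the caller a failure value so it can try another node, can be sketched as follows. This is an illustration in Python, not the Citus implementation: `add_wait_event`, `SocketClosedError`, `WAIT_EVENT_FAILED` and the connection dictionaries are all hypothetical, and `try`/`except` stands in for PG_TRY/PG_CATCH.

```python
WAIT_EVENT_FAILED = -1  # hypothetical sentinel returned instead of raising


class SocketClosedError(Exception):
    """Hypothetical stand-in for a hard error raised by the wrapped call."""


def add_wait_event(wait_set, connection):
    """Hypothetical stand-in for a call that may raise a hard error."""
    if connection["closed"]:
        raise SocketClosedError(connection["node"])
    wait_set.append(connection["node"])
    return len(wait_set) - 1


def safe_add_wait_event(wait_set, connection):
    """Catch the hard error and report failure to the caller instead."""
    try:
        return add_wait_event(wait_set, connection)
    except SocketClosedError:
        return WAIT_EVENT_FAILED


def pick_placement(placements):
    """Return the first node whose connection could be registered."""
    wait_set = []
    for connection in placements:
        if safe_add_wait_event(wait_set, connection) != WAIT_EVENT_FAILED:
            return connection["node"]
    return None  # no usable placement; the caller can raise a clear error


placements = [
    {"node": "worker-1", "closed": True},
    {"node": "worker-2", "closed": False},
]
chosen = pick_placement(placements)
```

With a single placement, `pick_placement` returns `None`, which is the point where a caller could report a readable error instead of propagating the low-level one.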
Önder Kalacı 2ac3cc07eb
Merge pull request #5144 from citusdata/transactional_start_metadata_node
Make start/stop_metadata_sync_to_node transactional
2021-08-09 10:55:57 +02:00
Onder Kalaci 5f02d18ef8 transactional metadata sync for maintenance daemon
As we use the current user to sync the metadata to the nodes
with #5105 (and many other PRs), there is no reason that
prevents us from using the coordinated transaction for metadata syncing.

This commit also renames a few functions to reflect their actual
implementation.
2021-08-09 10:34:55 +02:00
Önder Kalacı 999a236540
Merge pull request #5131 from citusdata/fix_drop_column_part
Dropped columns do not diverge distribution column for partitioned tables
2021-08-06 13:47:50 +02:00
Onder Kalaci 35964c6366 Dropped columns do not diverge distribution column for partitioned tables
Before this commit, creating a partition after a DROP COLUMN
on the parent (at a position before the dist. key) caused the
partition to have the wrong distribution column.
2021-08-06 13:36:12 +02:00
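The failure mode described above can be illustrated with a small model. The sketch assumes that a newly created partition does not carry the parent's dropped column, so attribute positions differ between parent and partition. The column names and the `attribute_number` helper are hypothetical, and resolving the column by name is only one plausible remedy, not necessarily what the commit does.

```python
def attribute_number(columns, name):
    """Return the 1-based position of a live column, looked up by name."""
    for attnum, (column_name, dropped) in enumerate(columns, start=1):
        if not dropped and column_name == name:
            return attnum
    raise KeyError(name)


# (name, dropped) pairs; the parent still has a slot for the dropped column.
parent_columns = [("note", True), ("tenant_id", False), ("payload", False)]
# Assumption: the new partition has no slot for the dropped column.
partition_columns = [("tenant_id", False), ("payload", False)]

parent_attnum = attribute_number(parent_columns, "tenant_id")

# Reusing the parent's position on the partition selects the wrong column.
naive_column = partition_columns[parent_attnum - 1][0]

# Looking the column up by name on the partition gives the right position.
partition_attnum = attribute_number(partition_columns, "tenant_id")
```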
Hanefi Onaldi 0722ec95bc
Merge pull request #5163 from citusdata/changelog-updates 2021-08-05 19:31:10 +03:00
Hanefi Onaldi 998b9ffcaa
Merge branch 'master' into changelog-updates 2021-08-05 19:27:13 +03:00
jeff-davis deb7ec605b
Columnar: fix misleading comments and useless types. (#5162)
CustomScan and CustomPath structures cannot be extended with
additional fields. Fix comments and type structure that implied that
they can.
2021-08-05 09:22:21 -07:00
Hanefi Onaldi bc5553b5d1
Add changelog entries for 10.1.1 2021-08-05 17:32:31 +03:00
Ahmet Gedemenli 07ca8784cd
Merge pull request #5161 from citusdata/add-check-for-gucs-order
Add check for alphabetically sorted gucs
2021-08-05 17:09:38 +03:00
Ahmet Gedemenli 51d410bb7b Add check for alphabetically sorted gucs
Move to a separate script

Add the new script to readme
2021-08-05 16:37:49 +03:00
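A check of this kind can be sketched briefly. The script added by this commit is not shown here, so the following Python is only an assumption about the general approach: it assumes GUC names appear in the source as quoted `citus.*` string literals, and `find_unsorted_gucs` is a hypothetical helper.

```python
import re

# Assumption: each GUC is registered with a quoted "citus.<name>" literal.
GUC_NAME = re.compile(r'"(citus\.[a-z0-9_]+)"')


def find_unsorted_gucs(source_text):
    """Return adjacent (previous, current) pairs that break sorted order."""
    names = GUC_NAME.findall(source_text)
    return [(prev, cur) for prev, cur in zip(names, names[1:]) if prev > cur]


sample = '''
DefineCustomIntVariable("citus.shard_count", ...);
DefineCustomBoolVariable("citus.enable_repartition_joins", ...);
DefineCustomIntVariable("citus.shard_replication_factor", ...);
'''
violations = find_unsorted_gucs(sample)
```

An empty result means the names are in order; a CI step could fail the build whenever the list is non-empty.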
naisila 798a7902bf Fix master_update_table_statistics scripts for 9.5 2021-08-03 18:15:56 +03:00
naisila f9fa5a3d69 Fix master_update_table_statistics scripts for 9.4 2021-08-03 18:15:56 +03:00
Önder Kalacı 0d2f49fbce
Merge pull request #5130 from citusdata/get_ready_update_dist_table_colocation
Introduce citus_internal_update_relation_colocation
2021-08-03 11:53:30 +02:00
Onder Kalaci 482b8096e9 Introduce citus_internal_update_relation_colocation
update_distributed_table_colocation can be called by the relation
owner, and internally it updates pg_dist_partition. With this
commit, update_distributed_table_colocation uses an internal
UDF to access pg_dist_partition.

As a result, this operation can now be done by regular users
on MX.
2021-08-03 11:44:58 +02:00
Onur Tirtir ef6a8604ba
Merge pull request #5140 from citusdata/col/seq-path-costing
Re-cost columnar table sequential scan paths 

With the changes in this pr, we adjust the cost estimates done by postgres for sequential scan paths for columnar tables.

We want to make better decisions when columnar custom scan is disabled too. There are cases where an index scan is preferable to a sequential scan for heapAM but not for columnarAM.
For this reason, we want to make better decisions about whether to choose index scan or sequential scan when columnar custom scan is **disabled**.
So with this pr, we re-estimate costs for sequential scan paths in a way that is quite similar to what we do for columnar custom scan.

The idea is that columnar custom scan uses projection pushdown so the cost is directly proportional to column selectivity. However, for sequential scan, we re-estimate the cost considering **all** the columns since projection pushdown is not supported for plain sequential scan.
One thing to note here is that we still don't consider chunk group filtering when estimating the cost for columnar custom scan. For this reason, we calculate the same costs for sequential scan & columnar custom scan if the query reads all columns, regardless of the filters in the `where` clause.
To avoid mistakenly choosing sequential scan in such cases, we still remove non `IndexPath`s if columnar custom scan is enabled.
That way, even when we calculate the same cost for sequential scan and columnar custom scan, we remove the sequential one anyway and guarantee that we choose either columnar custom scan or index scan.
2021-08-02 11:38:11 +03:00
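The relationship between the two estimates described above can be written out as a small sketch. It uses the same "cost of reading a single column of a single stripe" unit that the index-scan costing entry in this log uses. The function names and the numbers are illustrative assumptions, not the planner code.

```python
def columnar_custom_scan_cost(per_column_stripe_cost, projected_columns, stripe_count):
    """Custom scan pushes projection down, so only projected columns count."""
    return per_column_stripe_cost * projected_columns * stripe_count


def columnar_seq_scan_cost(per_column_stripe_cost, relation_columns, stripe_count):
    """Plain sequential scan has no projection pushdown: all columns count."""
    return per_column_stripe_cost * relation_columns * stripe_count


# Illustrative numbers: 10-column table, 5 stripes, unit cost 2.0.
narrow_custom = columnar_custom_scan_cost(2.0, projected_columns=3, stripe_count=5)
full_custom = columnar_custom_scan_cost(2.0, projected_columns=10, stripe_count=5)
seq = columnar_seq_scan_cost(2.0, relation_columns=10, stripe_count=5)
```

When the query reads every column, `full_custom` and `seq` come out equal, which is the tie the message resolves by still removing non-`IndexPath`s while columnar custom scan is enabled.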
Onur Tirtir 93ebbb0607 Re-cost SeqPath's as well for columnar tables 2021-08-02 11:32:25 +03:00
Onur Tirtir 453ac40725 Comment why we still remove non IndexPath's when custom scan is off 2021-08-02 11:25:18 +03:00
Onur Tirtir a87405b6ba Not adjust IndexPath cost if indexscan is off 2021-08-02 11:25:18 +03:00
Onur Tirtir 51691a8994 Rename RecostColumnarIndexPaths to RecostColumnarPaths 2021-08-02 11:25:18 +03:00
Onur Tirtir 734fa22272
Merge pull request #5090 from citusdata/col/path-costing
Re-cost columnar table index scan paths

With the changes in this pr, we adjust the cost estimate done by indexAM for `IndexPath` when the index is on a columnar table.
This is because the way indexAM estimates the cost is not appropriate for indexes on columnar tables.
The most basic reason is that indexAM assumes we only need to read a single page to access a single tuple of the table.
For columnar tables, on the other hand, we read the whole stripe from disk even for a single tuple, regardless of the optimization done in #5058.

Note that we don't simply assign startup / total costs but we add the cost estimated by us to the cost estimated by indexAM.
This is because we need to take "the cost due to index data-structure traversal" into account too.

Before explaining the logic that we follow for `IndexPath`, let's first summarize what we were / are doing for `ColumnarCustomScan`:
```math
X <- cost for reading single column of single stripe // 1
cost = X * (number of columns after projection pushdown) // 2
cost = cost * (number of stripes that relation has) // 3
```

The logic that we follow to calculate the additional cost for index scan is as follows:
```math
X <- cost for reading single column of single stripe // same as 1 above
cost = X * (number of columns that relation has) // index scan cannot do projection pushdown, so different than 2 above
cost = cost * (estimated number of stripes that we need to read)
```

where we calculate `estimated number of stripes that we need to read` as follows:

```math
indexCorrelation, indexSelectivity <- calculate by using amcostestimate_function
estimatedReadRows = (relation row count) * indexSelectivity

minEstimateStripeReads = estimatedReadRows / (average stripe row count) // full correlation, we will not do any redundant stripe reads
maxEstimateStripeReads = estimatedReadRows // no correlation, we will read a different stripe for each tuple

complementCorrelation = 1 - abs(indexCorrelation)
estimatedStripeCount = minEstimateStripeReads +
                       complementCorrelation  * (maxEstimateStripeReads  - minEstimateStripeReads)
```
2021-08-02 11:23:20 +03:00
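The pseudocode in this PR description translates directly into a few lines of Python. This is a sketch of the stated formulas only: the function and parameter names are hypothetical, and the numbers are made up so the arithmetic is easy to follow.

```python
def estimated_stripe_reads(relation_rows, index_selectivity,
                           index_correlation, avg_stripe_rows):
    """Interpolate between the best and worst case number of stripe reads."""
    estimated_read_rows = relation_rows * index_selectivity
    # Full correlation: matching rows are adjacent, no redundant stripe reads.
    min_reads = estimated_read_rows / avg_stripe_rows
    # No correlation: every tuple lives in a different stripe.
    max_reads = estimated_read_rows
    complement_correlation = 1 - abs(index_correlation)
    return min_reads + complement_correlation * (max_reads - min_reads)


def additional_index_scan_cost(per_column_stripe_cost, relation_columns,
                               stripe_reads):
    """Index scan cannot push projection down, so all columns are read."""
    return per_column_stripe_cost * relation_columns * stripe_reads


# 8000 rows, 25% selectivity -> 2000 rows read; 100 rows per stripe.
correlated = estimated_stripe_reads(8000, 0.25, 1.0, 100)
uncorrelated = estimated_stripe_reads(8000, 0.25, 0.0, 100)
halfway = estimated_stripe_reads(8000, 0.25, -0.5, 100)
extra_cost = additional_index_scan_cost(2.0, 10, correlated)
```

As the description notes, the result is added to the cost already estimated by indexAM rather than replacing it, so the index traversal cost is still accounted for.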
Onur Tirtir 297f59a70e Re-cost columnar table index paths 2021-08-02 11:16:37 +03:00
Onur Tirtir 8adcf2096b Multiply ColumnarCustomScan cost by tblspace.seqpage cost 2021-08-02 11:16:37 +03:00