mirror of https://github.com/citusdata/citus.git
One problem with rebalancing by disk size is that shards in newly created collocation groups are considered extremely small. This can easily result in bad balances if there are some other collocation groups that do have some data. One extremely bad example of this is: 1. You have 2 workers 2. Both contain about 100GB of data, but there's a 70MB difference. 3. You create 100 new distributed schemas with a few empty tables in them 4. You run the rebalancer 5. Now all new distributed schemas are placed on the node with that had 70MB less. 6. You start loading some data in these shards and quickly the balance is completely off To address this edge case, this PR changes the by_disk_size rebalance strategy to add a a base size of 100MB to the actual size of each shard group. This can still result in a bad balance when shard groups are empty, but it solves some of the worst cases. |
||
---|---|---|
.. | ||
backend | ||
bin/pg_send_cancellation | ||
include | ||
test |