pybind/mgr/pg_autoscaler: Use bytes_used for actual_raw_used 53534/head
author Kamoltat <ksirivad@redhat.com>
Fri, 2 Jun 2023 20:06:52 +0000 (20:06 +0000)
committer Kamoltat <ksirivad@redhat.com>
Tue, 19 Sep 2023 20:11:01 +0000 (20:11 +0000)
commit c3cc098236b43be9fc4f217ab79c726bcbb7f623
tree 77b3564bdbe6497ffbae5fd7a0ddde504cd8fe71
parent 733a975b9bd6cb569eab259a848f717e2ce1998a
pybind/mgr/pg_autoscaler: Use bytes_used for actual_raw_used

Problem

We realized that `store` is not the correct value
to represent `actual_raw_used` for pools that have
`compression` enabled.

https://github.com/ceph/ceph/pull/29986
is the PR that introduced the issue: it changed
`bytes_used` to `store` in order to get a per-pool
value of bytes used without factoring in replication.
However, it went unnoticed that this change also
caused pools with compression to inherit an incorrect
value for `actual_raw_used`.

This also yields an incorrect `capacity_ratio`, which
matters because the autoscaler scales PGs according to
the `capacity_ratio` of each pool. The existing issue
gives pools with compression a higher `capacity_ratio`
than they should have, when in reality their actual
utilization is lower than that of non-compressed pools,
assuming the same I/O workload is applied evenly to
each pool.
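To make the effect concrete, here is a minimal sketch with made-up numbers. It assumes the earlier calculation derived raw usage from the logical stored size multiplied by the replication factor, and that `bytes_used` reflects on-disk usage after replication and compression. The helper `capacity_ratio()` and all figures are hypothetical and are not taken from `module.py`.

```python
GIB = 2 ** 30


def capacity_ratio(raw_used: float, capacity: float) -> float:
    """Fraction of the raw capacity that a pool is considered to occupy."""
    return raw_used / capacity


# Hypothetical figures, chosen only for illustration.
capacity = 1000 * GIB       # raw capacity available to the pools
raw_used_rate = 3.0         # 3x replication
logical_bytes = 100 * GIB   # same logical data written to each pool
compress_factor = 0.5       # assumption: compression halves on-disk size

# Logical-size estimate (the `store`-based approach): compression is invisible,
# so the plain and the compressed pool get the same ratio.
store_based_plain = capacity_ratio(logical_bytes * raw_used_rate, capacity)
store_based_compressed = capacity_ratio(logical_bytes * raw_used_rate, capacity)

# On-disk estimate (the `bytes_used`-based approach), assumed here to already
# include both replication and compression savings.
bytes_used_plain = logical_bytes * raw_used_rate
bytes_used_compressed = logical_bytes * compress_factor * raw_used_rate
used_based_plain = capacity_ratio(bytes_used_plain, capacity)
used_based_compressed = capacity_ratio(bytes_used_compressed, capacity)

print(f"store-based:      plain={store_based_plain:.2f} "
      f"compressed={store_based_compressed:.2f}")
print(f"bytes_used-based: plain={used_based_plain:.2f} "
      f"compressed={used_based_compressed:.2f}")
```

Under these assumptions the compressed pool is credited with the same ratio as the uncompressed one when the logical size is used, although it occupies only half the raw space.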

Solution

Use `bytes_used` instead of `store` when fetching
`actual_raw_used`, and calculate `pool_raw_used` as
`max(actual_raw_used, target_bytes * raw_used_rate)`.
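The calculation described in the solution can be sketched as follows. This is an illustrative stand-alone function, not the code in `module.py`. It assumes `actual_raw_used` already includes replication (as `bytes_used` is taken to do), that `target_bytes` is an expected logical pool size that may be zero, and that `raw_used_rate` is the replication or erasure-coding overhead factor. The function wrapper and the sample numbers are hypothetical.

```python
def pool_raw_used(actual_raw_used: float,
                  target_bytes: float,
                  raw_used_rate: float) -> float:
    """Return the raw usage the autoscaler should plan for.

    actual_raw_used is compared directly against the target because it is
    assumed to be a raw (post-replication) figure already; only the logical
    target needs to be scaled by raw_used_rate.
    """
    return max(actual_raw_used, target_bytes * raw_used_rate)


# Hypothetical values in arbitrary byte units.
no_target = pool_raw_used(actual_raw_used=150, target_bytes=0, raw_used_rate=3.0)
small_target = pool_raw_used(actual_raw_used=150, target_bytes=40, raw_used_rate=3.0)
large_target = pool_raw_used(actual_raw_used=150, target_bytes=100, raw_used_rate=3.0)

print(no_target, small_target, large_target)
```

Scaling only `target_bytes` avoids multiplying the on-disk figure by the replication factor a second time, which is the reason the `max()` is taken over raw quantities in this sketch.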

Fixes: https://tracker.ceph.com/issues/54136

Signed-off-by: Kamoltat <ksirivad@redhat.com>
(cherry picked from commit 3d8ac80f61cd332b53aea1fa5799f8ccd3b01b66)
src/pybind/mgr/pg_autoscaler/module.py