From c8e8be7e5f0965731f8934c47543e92853ef980d Mon Sep 17 00:00:00 2001 From: Anthony D'Atri Date: Tue, 27 May 2025 15:00:59 -0400 Subject: [PATCH] doc/rados/operations: Add settings advice to balancer.rst Signed-off-by: Anthony D'Atri (cherry picked from commit f188003530e55f6ac46ab3d56c53fab65cdb7e7c) --- doc/rados/operations/balancer.rst | 41 +++++++++++++++++++++++++++---- 1 file changed, 36 insertions(+), 5 deletions(-) diff --git a/doc/rados/operations/balancer.rst b/doc/rados/operations/balancer.rst index a0189f06dc9ac..494abad9916d9 100644 --- a/doc/rados/operations/balancer.rst +++ b/doc/rados/operations/balancer.rst @@ -46,14 +46,45 @@ If the cluster is degraded (that is, if an OSD has failed and the system hasn't healed itself yet), then the balancer will not make any adjustments to the PG distribution. -When the cluster is healthy, the balancer will incrementally move a small -fraction of unbalanced PGs in order to improve distribution. This fraction -will not exceed a certain threshold that defaults to 5%. To adjust this -``target_max_misplaced_ratio`` threshold setting, run the following command: +When the cluster is healthy, the balancer will remap +unbalanced PGs in phases to incrementally improve the uniformity +of PG distribution. The maximum percentage of PGs to remap (move) in +a single phase defaults to 5%. To adjust this +``target_max_misplaced_ratio`` threshold setting, run a command +of the following form: .. prompt:: bash $ - ceph config set mgr target_max_misplaced_ratio .07 # 7% + ceph config set mgr target_max_misplaced_ratio .03 # 3% + +A larger value may increase the speed of cluster balancing / convergence +at the potential cost of greater impact on client operations. + +There is a separate setting for how uniform the distribution of PGs +must be for the module to consider adequately balanced. +At the time of writing this value defaults to ``5``, which means that +if a given OSD's PG replicas vary by five or fewer above or below the +cluster's average, it will be considered sufficiently balanced. + +This value of PG replicas / shards (as distinct from logical PGs) is reported +by the ``ceph osd df`` command under the ``PGS`` column and the variance +above or below the average under the ``VAR`` column. It may seem desirable +to specify a perfect or nearly perfect distribution by setting a very low +value, but in practice this is not advised, especially when a cluster or +individual pools have fewer PGs configured than is ideal. An excessively +low value for this setting may result in the balancer shuffling data +forever as it endeavors to meet an impossible expectation. + +That said, clusters with multiple CRUSH device classes and / or OSDs that +differ in capacity will benefit from a smaller value. In this situation +run a command of the following form: + + .. prompt:: bash $ + + ceph config set mgr/balancer/upmap_max_deviation 1 + +This value is reasonable and safe for most clusters. Note that this is +an absolute integer number of PGs, not a percentage. The balancer sleeps between runs. To set the number of seconds for this interval of sleep, run the following command: -- 2.39.5