From c8e8be7e5f0965731f8934c47543e92853ef980d Mon Sep 17 00:00:00 2001
From: Anthony D'Atri <anthonyeleven@users.noreply.github.com>
Date: Tue, 27 May 2025 15:00:59 -0400
Subject: [PATCH] doc/rados/operations: Add settings advice to balancer.rst

Signed-off-by: Anthony D'Atri <anthonyeleven@users.noreply.github.com>
(cherry picked from commit f188003530e55f6ac46ab3d56c53fab65cdb7e7c)
---
 doc/rados/operations/balancer.rst | 41 +++++++++++++++++++++++++++----
 1 file changed, 36 insertions(+), 5 deletions(-)

diff --git a/doc/rados/operations/balancer.rst b/doc/rados/operations/balancer.rst
index a0189f06dc9ac..494abad9916d9 100644
--- a/doc/rados/operations/balancer.rst
+++ b/doc/rados/operations/balancer.rst
@@ -46,14 +46,45 @@ If the cluster is degraded (that is, if an OSD has failed and the system hasn't
 healed itself yet), then the balancer will not make any adjustments to the PG
 distribution.
 
-When the cluster is healthy, the balancer will incrementally move a small
-fraction of unbalanced PGs in order to improve distribution.  This fraction
-will not exceed a certain threshold that defaults to 5%. To adjust this
-``target_max_misplaced_ratio`` threshold setting, run the following command:
+When the cluster is healthy, the balancer will remap
+unbalanced PGs in phases to incrementally improve the uniformity
+of PG distribution.  The maximum percentage of PGs to remap (move) in
+a single phase defaults to 5%. To adjust this
+``target_max_misplaced_ratio`` threshold setting, run a command
+of the following form:
 
    .. prompt:: bash $
 
-      ceph config set mgr target_max_misplaced_ratio .07   # 7%
+      ceph config set mgr target_max_misplaced_ratio .03   # 3%
+
+A larger value may increase the speed of cluster balancing / convergence
+at the potential cost of greater impact on client operations.
+
+A separate setting, ``upmap_max_deviation``, controls how uniform the
+distribution of PGs must be for the module to consider the cluster adequately
+balanced.  At the time of writing, this value defaults to ``5``, which means
+that an OSD whose number of PG replicas is within five of the cluster's
+average, above or below, is considered sufficiently balanced.
+
+The number of PG replicas / shards (as distinct from logical PGs) on each
+OSD is reported by ``ceph osd df`` under the ``PGS`` column; the ``VAR``
+column shows OSD utilization relative to the average.  It may seem desirable
+to specify a perfect or nearly perfect distribution by setting a very low
+value, but in practice this is not advised, especially when a cluster or
+individual pools have fewer PGs configured than is ideal.  An excessively
+low value for this setting may result in the balancer shuffling data
+forever as it endeavors to meet an impossible expectation.
+
+That said, clusters with multiple CRUSH device classes and / or OSDs that
+differ in capacity will benefit from a smaller value.  In this situation,
+run a command of the following form:
+
+   .. prompt:: bash $
+
+      ceph config set mgr mgr/balancer/upmap_max_deviation 1
+
+A value of ``1`` is reasonable and safe for most clusters.  Note that this
+is an absolute integer number of PGs, not a percentage.
 
 The balancer sleeps between runs. To set the number of seconds for this
 interval of sleep, run the following command:
-- 
2.39.5