doc/rados/operations: Add settings advice to balancer.rst

author Anthony D'Atri <anthonyeleven@users.noreply.github.com>

Tue, 27 May 2025 19:00:59 +0000 (15:00 -0400)

committer Zac Dover <zac.dover@proton.me>

Wed, 28 May 2025 04:46:10 +0000 (14:46 +1000)
diff --git a/doc/rados/operations/balancer.rst b/doc/rados/operations/balancer.rst

index a0189f06dc9acb49c992f33b98b724df279f4de5..494abad9916d9dea2cf6b2d0151c566f80d04469 100644
--- a/doc/rados/operations/balancer.rst
+++ b/doc/rados/operations/balancer.rst
@@ -46,14 +46,45 @@ If the cluster is degraded (that is, if an OSD has failed and the system hasn't
  healed itself yet), then the balancer will not make any adjustments to the PG
  distribution.
  
-When the cluster is healthy, the balancer will incrementally move a small
-fraction of unbalanced PGs in order to improve distribution.  This fraction
-will not exceed a certain threshold that defaults to 5%. To adjust this
-``target_max_misplaced_ratio`` threshold setting, run the following command:
+When the cluster is healthy, the balancer will remap
+unbalanced PGs in phases to incrementally improve the uniformity
+of PG distribution.  The maximum percentage of PGs to remap (move) in
+a single phase defaults to 5%. To adjust this
+``target_max_misplaced_ratio`` threshold setting, run a command
+of the following form:
  
     .. prompt:: bash $
  
-      ceph config set mgr target_max_misplaced_ratio .07   # 7%
+      ceph config set mgr target_max_misplaced_ratio .03   # 3%
+
+A larger value may increase the speed of cluster balancing / convergence
+at the potential cost of greater impact on client operations.
+
+There is a separate setting for how uniform the distribution of PGs
+must be for the module to consider the cluster adequately balanced.
+At the time of writing, this value defaults to ``5``, which means that
+if the number of PG replicas on a given OSD is within five of the
+cluster's average, that OSD will be considered sufficiently balanced.
+
+The number of PG replicas / shards (as distinct from logical PGs) on each OSD
+is reported by the ``ceph osd df`` command under the ``PGS`` column and the variance
+above or below the average under the ``VAR`` column.  It may seem desirable
+to specify a perfect or nearly perfect distribution by setting a very low
+value, but in practice this is not advised, especially when a cluster or
+individual pools have fewer PGs configured than is ideal.  An excessively
+low value for this setting may result in the balancer shuffling data
+forever as it endeavors to meet an impossible expectation.
+
+That said, clusters with multiple CRUSH device classes and / or OSDs that
+differ in capacity will benefit from a smaller value.  In this situation
+run a command of the following form:
+
+  .. prompt:: bash $
+
+     ceph config set mgr mgr/balancer/upmap_max_deviation 1
+
+This value is reasonable and safe for most clusters.  Note that this is
+an absolute integer number of PGs, not a percentage.
  
  The balancer sleeps between runs. To set the number of seconds for this
  interval of sleep, run the following command: