From: Prashant D Date: Fri, 26 Sep 2025 05:16:29 +0000 (-0400) Subject: pybind/mgr/pg_autoscaler: Introduce dynamic threshold to improve scaling sensitivity X-Git-Url: http://git-server-git.apps.pok.os.sepia.ceph.com/?a=commitdiff_plain;h=b907e0b2ebf001b67f0e7d1851d1415576d46a07;p=ceph.git pybind/mgr/pg_autoscaler: Introduce dynamic threshold to improve scaling sensitivity The scaling threshold is now dynamically adjusted within _get_pool_pg_targets() based on the calculated ideal PG count (final_pg_target). This change allows the autoscaler to be more aggressive when adjusting smaller pools, and less aggressive when adjusting very large pools. Also improves logging to clarify why scaling decisions are made or skipped. Fixes: https://tracker.ceph.com/issues/73272 Signed-off-by: Prashant D (cherry picked from commit e4d515d300437fc2b227c5fb8ac693d85bb7d291) --- diff --git a/doc/rados/operations/placement-groups.rst b/doc/rados/operations/placement-groups.rst index 40b6a12b2f5..8177476b230 100644 --- a/doc/rados/operations/placement-groups.rst +++ b/doc/rados/operations/placement-groups.rst @@ -153,7 +153,11 @@ The output will resemble the following:: - **NEW PG_NUM** (if present) is the value that the system recommends that the ``pg_num`` of the pool should be. It is always a power of two, and it is present only if the recommended value varies from the current value by - more than the default factor of ``3``. + more than the scaling threshold. This threshold defaults to the configured + factor of ``3``. While scaling down uses only the configured factor, the + threshold is dynamically reduced when scaling up: it is set to 1.0 if the + recommended NEW PG_NUM is 512 or 1024, and to 2.0 if the recommended + NEW PG_NUM is 2048. To adjust this multiple (in the following example, it is changed to ``2``), run a command of the following form: @@ -201,8 +205,10 @@ automatically scale each pool's ``pg_num`` in accordance with usage. 
Ceph considers the total available storage, the target number of PG replicas for each OSD, and how much data is stored in each pool, then apportions PGs accordingly. The system is conservative with its approach, making changes to a pool only -when the current number of PGs (``pg_num``) varies by more than a factor of 3 -from the recommended number. +when the current number of PGs (``pg_num``) varies by more than the scaling threshold +from the recommended number. When scaling down, only this configured factor is used. +However, when scaling up, the threshold is dynamically reduced: it is automatically +set to 1.0 when the recommended NEW PG_NUM is 512 or 1024, and to 2.0 when it is 2048. The target number of PGs per OSD is determined by the ``mon_target_pg_per_osd`` parameter (default: 100), which can be adjusted by running the following diff --git a/src/pybind/mgr/pg_autoscaler/module.py b/src/pybind/mgr/pg_autoscaler/module.py index 451bcf8568c..575de1484fe 100644 --- a/src/pybind/mgr/pg_autoscaler/module.py +++ b/src/pybind/mgr/pg_autoscaler/module.py @@ -517,6 +517,17 @@ class PgAutoscaler(MgrModule): )) return final_ratio, pool_pg_target, final_pg_target + def get_dynamic_threshold( + self, + final_pg_num: int, + default_threshold: float, + ) -> float: + if final_pg_num in (512, 1024): + return 1.0 + elif final_pg_num == 2048: + return 2.0 + return default_threshold + def _get_pool_pg_targets( self, osdmap: OSDMap, @@ -605,12 +616,27 @@ class PgAutoscaler(MgrModule): continue adjust = False - if (final_pg_target > p['pg_num_target'] * threshold or - final_pg_target < p['pg_num_target'] / threshold) and \ - final_ratio >= 0.0 and \ - final_ratio <= 1.0 and \ - p['pg_autoscale_mode'] == 'on': - adjust = True + + # Dynamic threshold only applies to scaling UP, otherwise use the default threshold. 
+ if final_pg_target is not None and \ + final_pg_target > p['pg_num_target']: + dynamic_threshold = self.get_dynamic_threshold(final_pg_target, threshold) + adjust = final_pg_target > p['pg_num_target'] * dynamic_threshold + else: + adjust = final_pg_target < p['pg_num_target'] / threshold + + if adjust and \ + final_ratio >= 0.0 and \ + final_ratio <= 1.0 and \ + p['pg_autoscale_mode'] == 'on': + adjust = True + else: + if final_pg_target != p['pg_num_target']: + self.log.warning("pool %s won't scale because recommended PG_NUM target" + " value varies from current PG_NUM value by" + " more than '%f' scaling threshold", + pool_name, + dynamic_threshold if final_pg_target > p['pg_num_target'] else threshold) assert pool_pg_target is not None ret.append({