--- /dev/null
+============================
+Balancing in Ceph
+============================
+
+Introduction
+============
+
+In a distributed storage system like Ceph, balancing both write and read requests is important for optimal performance. Write balancing ensures that data is
+stored and replicated quickly across the cluster, while read balancing ensures that data can be accessed and retrieved quickly. Each type of balancing matters
+for different reasons, as described below.
+
+Capacity (Upmap) Balancing
+==========================
+
+Importance in a Cluster
+-----------------------
+
+Capacity balancing is a functional requirement. A system like Ceph is only as full as its fullest device: when one device is full, the system can no longer
+serve write requests, and Ceph loses its function. To avoid filling up devices, capacity must be balanced across the devices in a fair way: each device should
+hold capacity proportional to its size, so that all devices reach the same fullness level. From a performance perspective, capacity balancing creates
+fair-share workloads on the OSDs for write requests.
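+
+As a quick way to gauge how evenly capacity is currently distributed across your OSDs, you can inspect per-OSD utilization and PG counts, for example:
+
+.. prompt:: bash $
+
+   ceph osd df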
+
+Capacity balancing is expensive. The operation (changing the mapping of pgs) requires data movement by definition, which takes time. During this time, the
+performance of the system is reduced.
+
+In Ceph, we can balance the write performance if all devices are homogeneous (same size and performance).
+
+How to Balance Capacity in Ceph
+-------------------------------
+
+See :ref:`upmap` for more information.
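+
+In most clusters, capacity balancing is handled automatically by the balancer manager module in ``upmap`` mode (see :ref:`balancer`). For example, to make sure the balancer is enabled in ``upmap`` mode and to check what it is doing:
+
+.. prompt:: bash $
+
+   ceph balancer mode upmap
+   ceph balancer on
+   ceph balancer status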
+
+Read Balancing
+==============
+
+Unlike capacity balancing, read balancing is not a strict requirement for Ceph’s functionality. Instead, it is a performance requirement: it helps the system
+perform better. The overall goal is to ensure that each OSD gets its fair share of PGs in which it is primary, so that read requests are distributed evenly
+across the OSDs in the cluster. Unbalanced read requests lead to poor performance because overall cluster bandwidth is reduced.
+
+Read balancing is cheap. Unlike capacity balancing, it involves no data movement. It is just a metadata operation, in which the OSDMap is updated to change
+which of a PG's participating OSDs acts as primary. This operation is fast and has no negative impact on cluster performance; the improved read performance
+takes effect almost immediately, as soon as the operation completes.
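+
+Concretely, the change is recorded as a ``pg-upmap-primary`` entry in the OSDMap; the offline optimizer (see :ref:`read_balancer`) emits these changes as ordinary CLI commands of the following form (the PG ID and OSD ID shown here are placeholders):
+
+.. prompt:: bash $
+
+   ceph osd pg-upmap-primary <pgid> <osd-id>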
+
+In Ceph, we can balance the read performance if all devices are homogeneous (same size and performance). In future versions, the read balancer may be improved
+to optimize overall cluster performance in heterogeneous systems as well.
+
+How to Balance Reads in Ceph
+----------------------------
+
+See :ref:`read_balancer` for more information.
+
+Also, see the Cephalocon 2023 talk `New Read Balancer in Ceph <https://www.youtube.com/watch?v=AT_cKYaQzcU/>`_ for a demonstration of the offline version
+of the read balancer.
+
+Plans for the Next Version
+--------------------------
+
+1. Improve behavior for heterogeneous OSDs in a pool
+2. Offer read balancing as an online option to the balancer manager module
+++ /dev/null
-
-This document describes the requirements and high-level design of the primary
-balancer for Ceph.
-
-Introduction
-============
-
-In a distributed storage system such as Ceph, there are some requirements to keep the system balanced in order to make it perform well:
-
-#. Balance the capacity - This is a functional requirement, a system like Ceph is "as full as its fullest device". When one device is full the system can not serve write requests anymore. In order to do this we want to balance the capacity across the devices in a fair way - that each device gets capacity proportionally to its size and therefore all the devices have the same fullness level. This is a functional requirement. From performance perspective, capacity balancing creates fair share workloads on the OSDs for *write* requests.
-
-#. Balance the workload - This is a performance requirement, we want to make sure that all the devices will receive a workload according to their performance. Assuming all the devices in a pool use the same technology and have the same bandwidth (a strong recommendation for a well configured system), and all devices in a pool have the same capacity, this means that for each pool, each device gets its fair share of primary OSDs so that the *read* requests are distributed evenly across the OSDs in the cluster. Managing workload balancing for devices with different capacities is discussed in the future enhancements section.
-
-Requirements
-============
-
-- For each pool, each OSD should have its fair share of PGs in which it is primary. For replicated pools, this would be the number of PGs mapped to this OSD divided by the replica size.
- - This may be improved in future releases. (see below)
-- Improve the existing capacity balancer code to improve its maintainability
-- Primary balancing is performed without data movement (data is moved only when balancing the capacity)
-- Fix the global +/-1 balancing issue that happens since the current balancer works on a single pool at a time (this is a stretch goal for the first version)
-
- - Problem description: In a perfectly balanced system, for each pool, each OSD has a number of PGs that ideally would have mapped to it to create a perfect capacity balancing. This number is usually not an integer, so some OSDs get a bit more PGs mapped and some a bit less. If you have many pools and you balance on a pool-by-pool basis, it is possible that some OSDs always get the "a bit more" side. When this happens, even to a single OSD, the result is non-balanced system where one OSD is more full than the others. This may happen with the current capacity balancer.
-
-First release (Quincy) assumptions
-----------------------------------
-
-- Optional - In the first version the feature will be optional and by default will be disabled
-- CLI only - In the first version we will probably give access to the primary balancer only by ``osdmaptool`` CLI and will not enable it in the online balancer; this way, the use of the feature is more controlled for early adopters
-- No data movement
-
-Future possible enhancements
-----------------------------
-
-- Improve the behavior for non identical OSDs in a pool
-- Improve the capacity balancing behavior in extreme cases
-- Add workload balancing to the online balancer
-- A more futuristic feature can be to improve workload balancing based on real load statistics of the OSDs.
-
-High Level Design
-=================
-
-- The capacity balancing code will remain in one function ``OSDMap::calc_pg_upmaps`` (the signature might be changed)
-- The workload (a.k.a primary) balancer will be implemented in a different function
-- The workload balancer will do its best based on the current status of the system
-
- - When called on a balanced system (capacity-wise) with pools with identical devices, it will create a near optimal workload split among the OSDs
- - Calling the workload balancer on an unbalanced system (capacity-wise) may yield non optimal results, and in some cases may give worse performance than before the call
-
-Helper functionality
---------------------
-
-- Set a seed for random generation in ``osdmaptool`` (For regression tests)
write modified osdmap with upmap or crush-adjust changes
+.. option:: --read <file>
+
+ calculate pg upmap entries to balance pg primaries
+
+.. option:: --read-pool <poolname>
+
+ specify which pool the read balancer should adjust
+
+.. option:: --vstart
+
+ prefix upmap and read output with './bin/'
+
Example
=======
osd.20 pgs 42
Total time elapsed 0.0167765 secs, 5 rounds
+To simulate the active balancer in read mode, first make sure capacity is balanced
+by running the balancer in upmap mode. Then, balance the reads on a replicated pool with::
+
+      osdmaptool om --read read.out --read-pool <pool name>
+
+ ./bin/osdmaptool: osdmap file 'om'
+ writing upmap command output to: read.out
+
+ ---------- BEFORE ------------
+ osd.0 | primary affinity: 1 | number of prims: 3
+ osd.1 | primary affinity: 1 | number of prims: 10
+ osd.2 | primary affinity: 1 | number of prims: 3
+
+ read_balance_score of 'cephfs.a.meta': 1.88
+
+
+ ---------- AFTER ------------
+ osd.0 | primary affinity: 1 | number of prims: 5
+ osd.1 | primary affinity: 1 | number of prims: 5
+ osd.2 | primary affinity: 1 | number of prims: 6
+
+ read_balance_score of 'cephfs.a.meta': 1.13
+
+
+ num changes: 5
Availability
============
--- /dev/null
+.. _balancer:
+
+Balancer Module
+===============
+
+The *balancer* can optimize the allocation of placement groups (PGs) across
+OSDs in order to achieve a balanced distribution. The balancer can operate
+either automatically or in a supervised fashion.
+
+
+Status
+------
+
+To check the current status of the balancer, run the following command:
+
+ .. prompt:: bash $
+
+ ceph balancer status
+
+
+Automatic balancing
+-------------------
+
+When the balancer is in ``upmap`` mode, the automatic balancing feature is
+enabled by default. For more details, see :ref:`upmap`. To disable the
+balancer, run the following command:
+
+ .. prompt:: bash $
+
+ ceph balancer off
+
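+To re-enable automatic balancing, run the following command:
+
+   .. prompt:: bash $
+
+      ceph balancer on
+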
+The balancer mode can be changed from ``upmap`` mode to ``crush-compat`` mode.
+``crush-compat`` mode is backward compatible with older clients. In
+``crush-compat`` mode, the balancer automatically makes small changes to the
+data distribution in order to ensure that OSDs are utilized equally.
+
+
+Throttling
+----------
+
+If the cluster is degraded (that is, if an OSD has failed and the system hasn't
+healed itself yet), then the balancer will not make any adjustments to the PG
+distribution.
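+
+To check whether the cluster is currently healthy or degraded, you can inspect the
+overall cluster status, for example:
+
+   .. prompt:: bash $
+
+      ceph -s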
+
+When the cluster is healthy, the balancer will incrementally move a small
+fraction of unbalanced PGs in order to improve distribution. This fraction
+will not exceed a certain threshold that defaults to 5%. To adjust this
+``target_max_misplaced_ratio`` threshold setting, run the following command:
+
+ .. prompt:: bash $
+
+ ceph config set mgr target_max_misplaced_ratio .07 # 7%
+
+The balancer sleeps between runs. To set the number of seconds for this
+interval of sleep, run the following command:
+
+ .. prompt:: bash $
+
+ ceph config set mgr mgr/balancer/sleep_interval 60
+
+To set the time of day (in HHMM format) at which automatic balancing begins,
+run the following command:
+
+ .. prompt:: bash $
+
+ ceph config set mgr mgr/balancer/begin_time 0000
+
+To set the time of day (in HHMM format) at which automatic balancing ends, run
+the following command:
+
+ .. prompt:: bash $
+
+ ceph config set mgr mgr/balancer/end_time 2359
+
+Automatic balancing can be restricted to certain days of the week. To restrict
+it to a specific day of the week or later (as with crontab, ``0`` is Sunday,
+``1`` is Monday, and so on), run the following command:
+
+ .. prompt:: bash $
+
+ ceph config set mgr mgr/balancer/begin_weekday 0
+
+To restrict automatic balancing to a specific day of the week or earlier
+(again, ``0`` is Sunday, ``1`` is Monday, and so on), run the following
+command:
+
+ .. prompt:: bash $
+
+ ceph config set mgr mgr/balancer/end_weekday 6
+
+Automatic balancing can be restricted to certain pools. By default, the value
+of this setting is an empty string, so that all pools are automatically
+balanced. To restrict automatic balancing to specific pools, retrieve their
+numeric pool IDs (by running the :command:`ceph osd pool ls detail` command),
+and then run the following command:
+
+ .. prompt:: bash $
+
+ ceph config set mgr mgr/balancer/pool_ids 1,2,3
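+
+To confirm the current value of any of these settings, you can read it back, for
+example:
+
+   .. prompt:: bash $
+
+      ceph config get mgr mgr/balancer/sleep_interval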
+
+
+Modes
+-----
+
+There are two supported balancer modes:
+
+#. **crush-compat**. This mode uses the compat weight-set feature (introduced
+ in Luminous) to manage an alternative set of weights for devices in the
+ CRUSH hierarchy. When the balancer is operating in this mode, the normal
+ weights should remain set to the size of the device in order to reflect the
+ target amount of data intended to be stored on the device. The balancer will
+ then optimize the weight-set values, adjusting them up or down in small
+ increments, in order to achieve a distribution that matches the target
+ distribution as closely as possible. (Because PG placement is a pseudorandom
+ process, it is subject to a natural amount of variation; optimizing the
+ weights serves to counteract that natural variation.)
+
+ Note that this mode is *fully backward compatible* with older clients: when
+ an OSD Map and CRUSH map are shared with older clients, Ceph presents the
+ optimized weights as the "real" weights.
+
+ The primary limitation of this mode is that the balancer cannot handle
+ multiple CRUSH hierarchies with different placement rules if the subtrees of
+ the hierarchy share any OSDs. (Such sharing of OSDs is not typical and,
+ because of the difficulty of managing the space utilization on the shared
+ OSDs, is generally not recommended.)
+
+#. **upmap**. In Luminous and later releases, the OSDMap can store explicit
+ mappings for individual OSDs as exceptions to the normal CRUSH placement
+ calculation. These ``upmap`` entries provide fine-grained control over the
+ PG mapping. This balancer mode optimizes the placement of individual PGs in
+ order to achieve a balanced distribution. In most cases, the resulting
+ distribution is nearly perfect: that is, there is an equal number of PGs on
+ each OSD (±1 PG, since the total number might not divide evenly).
+
+ To use ``upmap``, all clients must be Luminous or newer.
+
+The default mode is ``upmap``. The mode can be changed to ``crush-compat`` by
+running the following command:
+
+ .. prompt:: bash $
+
+ ceph balancer mode crush-compat
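+
+To switch back to ``upmap`` mode (remember that this requires all clients to be
+Luminous or newer), run the following command:
+
+   .. prompt:: bash $
+
+      ceph balancer mode upmap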
+
+Supervised optimization
+-----------------------
+
+Supervised use of the balancer can be understood in terms of three distinct
+phases:
+
+#. building a plan
+#. evaluating the quality of the data distribution, either for the current PG
+ distribution or for the PG distribution that would result after executing a
+ plan
+#. executing the plan
+
+To evaluate the current distribution, run the following command:
+
+ .. prompt:: bash $
+
+ ceph balancer eval
+
+To evaluate the distribution for a single pool, run the following command:
+
+ .. prompt:: bash $
+
+ ceph balancer eval <pool-name>
+
+To see the evaluation in greater detail, run the following command:
+
+ .. prompt:: bash $
+
+ ceph balancer eval-verbose ...
+
+To instruct the balancer to generate a plan (using the currently configured
+mode), make up a name (any useful identifying string) for the plan, and run the
+following command:
+
+ .. prompt:: bash $
+
+ ceph balancer optimize <plan-name>
+
+To see the contents of a plan, run the following command:
+
+ .. prompt:: bash $
+
+ ceph balancer show <plan-name>
+
+To display all plans, run the following command:
+
+ .. prompt:: bash $
+
+ ceph balancer ls
+
+To discard an old plan, run the following command:
+
+ .. prompt:: bash $
+
+ ceph balancer rm <plan-name>
+
+To see currently recorded plans, examine the output of the following status
+command:
+
+ .. prompt:: bash $
+
+ ceph balancer status
+
+To evaluate the distribution that would result from executing a specific plan,
+run the following command:
+
+ .. prompt:: bash $
+
+ ceph balancer eval <plan-name>
+
+If a plan is expected to improve the distribution (that is, the plan's score is
+lower than the current cluster state's score), you can execute that plan by
+running the following command:
+
+ .. prompt:: bash $
+
+ ceph balancer execute <plan-name>
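+
+Putting these steps together, a typical supervised session might look like the
+following (``myplan`` is just an example plan name; execute the plan only if its
+score is lower than the current score):
+
+   .. prompt:: bash $
+
+      ceph balancer eval
+      ceph balancer optimize myplan
+      ceph balancer show myplan
+      ceph balancer eval myplan
+      ceph balancer execute myplan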
+++ /dev/null
-.. _balancer:
-
-Balancer
-========
-
-The *balancer* can optimize the allocation of placement groups (PGs) across
-OSDs in order to achieve a balanced distribution. The balancer can operate
-either automatically or in a supervised fashion.
-
-
-Status
-------
-
-To check the current status of the balancer, run the following command:
-
- .. prompt:: bash $
-
- ceph balancer status
-
-
-Automatic balancing
--------------------
-
-When the balancer is in ``upmap`` mode, the automatic balancing feature is
-enabled by default. For more details, see :ref:`upmap`. To disable the
-balancer, run the following command:
-
- .. prompt:: bash $
-
- ceph balancer off
-
-The balancer mode can be changed from ``upmap`` mode to ``crush-compat`` mode.
-``crush-compat`` mode is backward compatible with older clients. In
-``crush-compat`` mode, the balancer automatically makes small changes to the
-data distribution in order to ensure that OSDs are utilized equally.
-
-
-Throttling
-----------
-
-If the cluster is degraded (that is, if an OSD has failed and the system hasn't
-healed itself yet), then the balancer will not make any adjustments to the PG
-distribution.
-
-When the cluster is healthy, the balancer will incrementally move a small
-fraction of unbalanced PGs in order to improve distribution. This fraction
-will not exceed a certain threshold that defaults to 5%. To adjust this
-``target_max_misplaced_ratio`` threshold setting, run the following command:
-
- .. prompt:: bash $
-
- ceph config set mgr target_max_misplaced_ratio .07 # 7%
-
-The balancer sleeps between runs. To set the number of seconds for this
-interval of sleep, run the following command:
-
- .. prompt:: bash $
-
- ceph config set mgr mgr/balancer/sleep_interval 60
-
-To set the time of day (in HHMM format) at which automatic balancing begins,
-run the following command:
-
- .. prompt:: bash $
-
- ceph config set mgr mgr/balancer/begin_time 0000
-
-To set the time of day (in HHMM format) at which automatic balancing ends, run
-the following command:
-
- .. prompt:: bash $
-
- ceph config set mgr mgr/balancer/end_time 2359
-
-Automatic balancing can be restricted to certain days of the week. To restrict
-it to a specific day of the week or later (as with crontab, ``0`` is Sunday,
-``1`` is Monday, and so on), run the following command:
-
- .. prompt:: bash $
-
- ceph config set mgr mgr/balancer/begin_weekday 0
-
-To restrict automatic balancing to a specific day of the week or earlier
-(again, ``0`` is Sunday, ``1`` is Monday, and so on), run the following
-command:
-
- .. prompt:: bash $
-
- ceph config set mgr mgr/balancer/end_weekday 6
-
-Automatic balancing can be restricted to certain pools. By default, the value
-of this setting is an empty string, so that all pools are automatically
-balanced. To restrict automatic balancing to specific pools, retrieve their
-numeric pool IDs (by running the :command:`ceph osd pool ls detail` command),
-and then run the following command:
-
- .. prompt:: bash $
-
- ceph config set mgr mgr/balancer/pool_ids 1,2,3
-
-
-Modes
------
-
-There are two supported balancer modes:
-
-#. **crush-compat**. This mode uses the compat weight-set feature (introduced
- in Luminous) to manage an alternative set of weights for devices in the
- CRUSH hierarchy. When the balancer is operating in this mode, the normal
- weights should remain set to the size of the device in order to reflect the
- target amount of data intended to be stored on the device. The balancer will
- then optimize the weight-set values, adjusting them up or down in small
- increments, in order to achieve a distribution that matches the target
- distribution as closely as possible. (Because PG placement is a pseudorandom
- process, it is subject to a natural amount of variation; optimizing the
- weights serves to counteract that natural variation.)
-
- Note that this mode is *fully backward compatible* with older clients: when
- an OSD Map and CRUSH map are shared with older clients, Ceph presents the
- optimized weights as the "real" weights.
-
- The primary limitation of this mode is that the balancer cannot handle
- multiple CRUSH hierarchies with different placement rules if the subtrees of
- the hierarchy share any OSDs. (Such sharing of OSDs is not typical and,
- because of the difficulty of managing the space utilization on the shared
- OSDs, is generally not recommended.)
-
-#. **upmap**. In Luminous and later releases, the OSDMap can store explicit
- mappings for individual OSDs as exceptions to the normal CRUSH placement
- calculation. These ``upmap`` entries provide fine-grained control over the
- PG mapping. This balancer mode optimizes the placement of individual PGs in
- order to achieve a balanced distribution. In most cases, the resulting
- distribution is nearly perfect: that is, there is an equal number of PGs on
- each OSD (±1 PG, since the total number might not divide evenly).
-
- To use ``upmap``, all clients must be Luminous or newer.
-
-The default mode is ``upmap``. The mode can be changed to ``crush-compat`` by
-running the following command:
-
- .. prompt:: bash $
-
- ceph balancer mode crush-compat
-
-Supervised optimization
------------------------
-
-Supervised use of the balancer can be understood in terms of three distinct
-phases:
-
-#. building a plan
-#. evaluating the quality of the data distribution, either for the current PG
- distribution or for the PG distribution that would result after executing a
- plan
-#. executing the plan
-
-To evaluate the current distribution, run the following command:
-
- .. prompt:: bash $
-
- ceph balancer eval
-
-To evaluate the distribution for a single pool, run the following command:
-
- .. prompt:: bash $
-
- ceph balancer eval <pool-name>
-
-To see the evaluation in greater detail, run the following command:
-
- .. prompt:: bash $
-
- ceph balancer eval-verbose ...
-
-To instruct the balancer to generate a plan (using the currently configured
-mode), make up a name (any useful identifying string) for the plan, and run the
-following command:
-
- .. prompt:: bash $
-
- ceph balancer optimize <plan-name>
-
-To see the contents of a plan, run the following command:
-
- .. prompt:: bash $
-
- ceph balancer show <plan-name>
-
-To display all plans, run the following command:
-
- .. prompt:: bash $
-
- ceph balancer ls
-
-To discard an old plan, run the following command:
-
- .. prompt:: bash $
-
- ceph balancer rm <plan-name>
-
-To see currently recorded plans, examine the output of the following status
-command:
-
- .. prompt:: bash $
-
- ceph balancer status
-
-To evaluate the distribution that would result from executing a specific plan,
-run the following command:
-
- .. prompt:: bash $
-
- ceph balancer eval <plan-name>
-
-If a plan is expected to improve the distribution (that is, the plan's score is
-lower than the current cluster state's score), you can execute that plan by
-running the following command:
-
- .. prompt:: bash $
-
- ceph balancer execute <plan-name>
erasure-code
cache-tiering
placement-groups
- balancer
upmap
+ read-balancer
+ balancer-module
crush-map
crush-map-edits
stretch-mode
List Pools
==========
-To list your cluster's pools, run the following command:
+There are multiple ways to get the list of pools in your cluster.
+
+To list just your cluster's pool names (good for scripting), execute:
+
+.. prompt:: bash $
+
+ ceph osd pool ls
+
+::
+
+ .rgw.root
+ default.rgw.log
+ default.rgw.control
+ default.rgw.meta
+
+To list your cluster's pools with the pool number, run the following command:
.. prompt:: bash $
ceph osd lspools
+::
+
+ 1 .rgw.root
+ 2 default.rgw.log
+ 3 default.rgw.control
+ 4 default.rgw.meta
+
+To list your cluster's pools with additional information, execute:
+
+.. prompt:: bash $
+
+ ceph osd pool ls detail
+
+::
+
+ pool 1 '.rgw.root' replicated size 3 min_size 1 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 19 flags hashpspool stripe_width 0 application rgw read_balance_score 4.00
+ pool 2 'default.rgw.log' replicated size 3 min_size 1 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 21 flags hashpspool stripe_width 0 application rgw read_balance_score 4.00
+ pool 3 'default.rgw.control' replicated size 3 min_size 1 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 23 flags hashpspool stripe_width 0 application rgw read_balance_score 4.00
+ pool 4 'default.rgw.meta' replicated size 3 min_size 1 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 25 flags hashpspool stripe_width 0 pg_autoscale_bias 4 application rgw read_balance_score 4.00
+
+To get even more information, you can execute this command with the ``--format`` (or ``-f``) option and a value of ``json``, ``json-pretty``, ``xml``, or ``xml-pretty``.
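+
+For example:
+
+.. prompt:: bash $
+
+   ceph osd pool ls detail --format=json-pretty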
+
.. _createpool:
Creating a Pool
--- /dev/null
+.. _read_balancer:
+
+=======================================
+Operating the Read (Primary) Balancer
+=======================================
+
+You might be wondering: How can I improve performance in my Ceph cluster?
+One important data point you can check is the ``read_balance_score`` on each
+of your replicated pools.
+
+This metric, available via ``ceph osd pool ls detail`` (see :ref:`rados_pools`
+for more details), indicates read performance, or how balanced the primaries are
+for each replicated pool. In most cases, a ``read_balance_score`` above 1
+(for instance, 1.5) means that your pool has unbalanced primaries and that
+you may be able to improve read performance by running the read balancer.
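+
+For example, you can quickly locate the scores in the detailed pool listing:
+
+.. prompt:: bash $
+
+   ceph osd pool ls detail | grep read_balance_score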
+
+Online Optimization
+===================
+
+At present, there is no online option for the read balancer. However, we plan to add
+the read balancer as an option to the :ref:`balancer` in the next Ceph version
+so it can be enabled to run automatically in the background like the upmap balancer.
+
+Offline Optimization
+====================
+
+Primaries are updated with an offline optimizer that is built into the
+:ref:`osdmaptool`.
+
+#. Grab the latest copy of your osdmap:
+
+ .. prompt:: bash $
+
+ ceph osd getmap -o om
+
+#. Run the optimizer:
+
+ .. prompt:: bash $
+
+ osdmaptool om --read out.txt --read-pool <pool name> [--vstart]
+
+   It is highly recommended that you run the capacity balancer before running the
+   read balancer to ensure optimal results. See :ref:`upmap` for details on how to
+   balance capacity in a cluster.
+
+#. Apply the changes:
+
+ .. prompt:: bash $
+
+ source out.txt
+
+ In the above example, the proposed changes are written to the output file
+ ``out.txt``. The commands in this procedure are normal Ceph CLI commands
+ that can be run in order to apply the changes to the cluster.
+
+   If you are working in a vstart cluster, you may pass the ``--vstart`` parameter
+   as shown above so that the CLI commands are formatted with the ``./bin/`` prefix.
+
+   Note that any time the number of PGs changes (for instance, if the PG autoscaler
+   kicks in; see :ref:`pg-autoscaler`), you should consider rechecking the scores and
+   rerunning the read balancer if needed.
+
+To see some details about what the tool is doing, pass ``--debug-osd 10`` to
+``osdmaptool``; for even more detail, pass ``--debug-osd 20``.
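+
+For example:
+
+.. prompt:: bash $
+
+   osdmaptool om --read out.txt --read-pool <pool name> --debug-osd 10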
.. _upmap:
-Using pg-upmap
-==============
+=======================================
+Operating the Capacity (Upmap) Balancer
+=======================================
In Luminous v12.2.z and later releases, there is a *pg-upmap* exception table
in the OSDMap that allows the cluster to explicitly map specific PGs to
However, there is an important caveat when it comes to this new feature: it
requires all clients to understand the new *pg-upmap* structure in the OSDMap.
+Online Optimization
+===================
+
Enabling
--------
ceph features
-Balancer module
+Balancer Module
---------------
The `balancer` module for ``ceph-mgr`` will automatically balance the number of
PGs per OSD. See :ref:`balancer`
-Offline optimization
---------------------
+Offline Optimization
+====================
-Upmap entries are updated with an offline optimizer that is built into
-``osdmaptool``.
+Upmap entries are updated with an offline optimizer that is built into the
+:ref:`osdmaptool`.
#. Grab the latest copy of your osdmap: