From: Ville Ojamo <14869000+bluikko@users.noreply.github.com> Date: Thu, 15 Jan 2026 07:22:29 +0000 (+0700) Subject: doc/rados: improve troubleshooting-pg.rst X-Git-Url: http://git-server-git.apps.pok.os.sepia.ceph.com/?a=commitdiff_plain;h=586f64c867302b7f59a48ea44924d9e41bc8e2bf;p=ceph.git doc/rados: improve troubleshooting-pg.rst Note that a link to a walkthrough uses deprecated Filestore. Reported in doc bugs pad. Fix capitalization, use OSD instead of ceph-osd. Improve language in a list. Remove escaping from slashes in PG query output, tested on Quincy. Don't use spaces in states like active+remapped consistently. Add label for incoming links and change them to refs. Use privileged prompt for CLI commands, don't highlight in console output. Use double backticks consistently. Improve markup. Remove spaces at the end of lines. Signed-off-by: Ville Ojamo <14869000+bluikko@users.noreply.github.com> --- diff --git a/doc/rados/operations/health-checks.rst b/doc/rados/operations/health-checks.rst index 770c7b9a73c94..aece617143681 100644 --- a/doc/rados/operations/health-checks.rst +++ b/doc/rados/operations/health-checks.rst @@ -1150,7 +1150,7 @@ or ``snaptrim_error`` flag set, which indicates that an earlier data scrub operation found a problem, or (2) have the *repair* flag set, which means that a repair for such an inconsistency is currently in progress. -For more information, see :doc:`../troubleshooting/troubleshooting-pg`. +For more information, see :ref:`rados_operations_monitoring_osd_pg`. OSD_SCRUB_ERRORS ________________ @@ -1158,7 +1158,7 @@ ________________ Recent OSD scrubs have discovered inconsistencies. This alert is generally paired with *PG_DAMAGED* (see above). -For more information, see :doc:`../troubleshooting/troubleshooting-pg`. +For more information, see :ref:`rados_operations_monitoring_osd_pg`. OSD_TOO_MANY_REPAIRS ____________________ diff --git a/doc/rados/operations/monitoring-osd-pg.rst b/doc/rados/operations/monitoring-osd-pg.rst index ba8805941890b..f5d86f3fb191d 100644 --- a/doc/rados/operations/monitoring-osd-pg.rst +++ b/doc/rados/operations/monitoring-osd-pg.rst @@ -197,8 +197,7 @@ the following diagram, we assume a pool with three replicas of the PG: | Peering | The OSDs also report their status to the monitor. For details, see `Configuring Monitor/OSD -Interaction`_. To troubleshoot peering issues, see `Peering -Failure`_. +Interaction`_. To troubleshoot peering issues, see :ref:`failures-osd-peering`. Monitoring PG States @@ -487,7 +486,7 @@ To identify stuck PGs, run the following command: ceph pg dump_stuck [unclean|inactive|stale|undersized|degraded] For more detail, see `Placement Group Subsystem`_. To troubleshoot stuck PGs, -see `Troubleshooting PG Errors`_. +see :ref:`failures-pg-stuck`. Finding an Object Location @@ -554,8 +553,6 @@ performing the migration. For details, see the `Architecture`_ section. .. _mClock backfill: ../../configuration/mclock-config-ref#recovery-backfill-options .. _Architecture: ../../../architecture .. _OSD Not Running: ../../troubleshooting/troubleshooting-osd#osd-not-running -.. _Troubleshooting PG Errors: ../../troubleshooting/troubleshooting-pg#troubleshooting-pg-errors -.. _Peering Failure: ../../troubleshooting/troubleshooting-pg#failures-osd-peering .. _CRUSH map: ../crush-map .. _Configuring Monitor/OSD Interaction: ../../configuration/mon-osd-interaction/ .. 
_Placement Group Subsystem: ../control#placement-group-subsystem diff --git a/doc/rados/troubleshooting/troubleshooting-pg.rst b/doc/rados/troubleshooting/troubleshooting-pg.rst index f1355b02c735f..bf60e83999bcf 100644 --- a/doc/rados/troubleshooting/troubleshooting-pg.rst +++ b/doc/rados/troubleshooting/troubleshooting-pg.rst @@ -8,10 +8,10 @@ Placement Groups Never Get Clean Placement Groups (PGs) that remain in the ``active`` status, the ``active+remapped`` status or the ``active+degraded`` status and never achieve an ``active+clean`` status might indicate a problem with the configuration of -the Ceph cluster. +the Ceph cluster. -In such a situation, review the settings in the `Pool, PG and CRUSH Config -Reference`_ and make appropriate adjustments. +In such a situation, review the settings in the :ref:`rados_config_pool_pg_crush_ref` +and make appropriate adjustments. As a general rule, run your cluster with more than one OSD and a pool size of greater than two object replicas. @@ -29,11 +29,11 @@ VMs are used as clients). You can experiment with Ceph in a one-node configuration, in spite of the limitations as described herein. To create a cluster on a single node, you must change the -``osd_crush_chooseleaf_type`` setting from the default of ``1`` (meaning +:confval:`osd_crush_chooseleaf_type` setting from the default of ``1`` (meaning ``host`` or ``node``) to ``0`` (meaning ``osd``) in your Ceph configuration -file before you create your monitors and OSDs. This tells Ceph that an OSD is +file before you create Monitors and OSDs. This tells Ceph that an OSD is permitted to place another OSD on the same host. If you are trying to set up a -single-node cluster and ``osd_crush_chooseleaf_type`` is greater than ``0``, +single-node cluster and :confval:`osd_crush_chooseleaf_type` is greater than ``0``, Ceph will attempt to place the PGs of one OSD with the PGs of another OSD on another node, chassis, rack, row, or datacenter depending on the setting. @@ -48,16 +48,16 @@ directories for the data first. Fewer OSDs than Replicas ------------------------ -If two OSDs are in an ``up`` and ``in`` state, but the placement gropus are not -in an ``active + clean`` state, you may have an ``osd_pool_default_size`` set -to greater than ``2``. +If a number of OSDs are in an ``up`` and ``in`` state, but the placement groups are not +in an ``active+clean`` state, you may have an :confval:`osd_pool_default_size` set +to greater than the number of ``up`` and ``in`` state OSDs. -There are a few ways to address this situation. If you want to operate your -cluster in an ``active + degraded`` state with two replicas, you can set the -``osd_pool_default_min_size`` to ``2`` so that you can write objects in an -``active + degraded`` state. You may also set the ``osd_pool_default_size`` +There are a few ways to address this situation. For example, if you want to operate your +cluster with :confval:`osd_pool_default_size` set to ``3`` in an ``active+degraded`` state with two replicas, you can set the +:confval:`osd_pool_default_min_size` to ``2`` so that you can write objects in an +``active+degraded`` state. You may also set the :confval:`osd_pool_default_size` setting to ``2`` so that you have only two stored replicas (the original and -one replica). In such a case, the cluster should achieve an ``active + clean`` +one replica). In such a case, the cluster should achieve an ``active+clean`` state. .. note:: You can make the changes while the cluster is running. If you make @@ -68,7 +68,7 @@ state. 
Pool Size = 1 ------------- -If you have ``osd_pool_default_size`` set to ``1``, you will have only one copy +If you have :confval:`osd_pool_default_size` set to ``1``, you will have only one copy of the object. OSDs rely on other OSDs to tell them which objects they should have. If one OSD has a copy of an object and there is no second copy, then there is no second OSD to tell the first OSD that it should have that copy. For @@ -76,7 +76,7 @@ each placement group mapped to the first OSD (see ``ceph pg dump``), you can force the first OSD to notice the placement groups it needs by running a command of the following form: -.. prompt:: bash +.. prompt:: bash # ceph osd force-create-pg @@ -84,40 +84,40 @@ command of the following form: CRUSH Map Errors ---------------- -If any placement groups in your cluster are unclean, then there might be errors +If any placement groups in your cluster are ``unclean``, then there might be errors in your CRUSH map. +.. _failures-pg-stuck: Stuck Placement Groups ====================== -It is normal for placement groups to enter "degraded" or "peering" states after +It is normal for placement groups to enter ``degraded`` or ``peering`` states after a component failure. Normally, these states reflect the expected progression through the failure recovery process. However, a placement group that stays in one of these states for a long time might be an indication of a larger problem. For this reason, the Ceph Monitors will warn when placement groups get "stuck" in a non-optimal state. Specifically, we check for: -* ``inactive`` - The placement group has not been ``active`` for too long (that +* ``inactive`` The placement group has not been ``active`` for too long (that is, it hasn't been able to service read/write requests). -* ``unclean`` - The placement group has not been ``clean`` for too long (that +* ``unclean`` The placement group has not been ``clean`` for too long (that is, it hasn't been able to completely recover from a previous failure). -* ``stale`` - The placement group status has not been updated by a - ``ceph-osd``. This indicates that all nodes storing this placement group may - be ``down``. +* ``stale`` The placement group status has not been updated by an OSD. + This indicates that all nodes storing this placement group may be ``down``. List stuck placement groups by running one of the following commands: -.. prompt:: bash +.. prompt:: bash # ceph pg dump_stuck stale ceph pg dump_stuck inactive ceph pg dump_stuck unclean -- Stuck ``stale`` placement groups usually indicate that key ``ceph-osd`` - daemons are not running. +- Stuck ``stale`` placement groups usually indicate that key OSDs are + not running. - Stuck ``inactive`` placement groups usually indicate a peering problem (see :ref:`failures-osd-peering`). - Stuck ``unclean`` placement groups usually indicate that something is @@ -125,21 +125,20 @@ List stuck placement groups by running one of the following commands: :ref:`failures-osd-unfound`); - .. _failures-osd-peering: Placement Group Down - Peering Failure ====================================== -In certain cases, the ``ceph-osd`` `peering` process can run into problems, +In certain cases, the OSD `peering` process can run into problems, which can prevent a PG from becoming active and usable. In such a case, running the command ``ceph health detail`` will report something similar to the following: -.. prompt:: bash +.. prompt:: bash # ceph health detail -:: +.. 
code-block:: none HEALTH_ERR 7 pgs degraded; 12 pgs down; 12 pgs peering; 1 pgs recovering; 6 pgs stuck unclean; 114/3300 degraded (3.455%); 1/3 in osds are down ... @@ -150,7 +149,7 @@ the command ``ceph health detail`` will report something similar to the followin Query the cluster to determine exactly why the PG is marked ``down`` by running a command of the following form: -.. prompt:: bash +.. prompt:: bash # ceph pg 0.5 query @@ -159,10 +158,10 @@ Query the cluster to determine exactly why the PG is marked ``down`` by running { "state": "down+peering", ... "recovery_state": [ - { "name": "Started\/Primary\/Peering\/GetInfo", + { "name": "Started/Primary/Peering/GetInfo", "enter_time": "2012-03-06 14:40:16.169679", "requested_info_from": []}, - { "name": "Started\/Primary\/Peering", + { "name": "Started/Primary/Peering", "enter_time": "2012-03-06 14:40:16.169659", "probing_osds": [ 0, @@ -180,8 +179,8 @@ Query the cluster to determine exactly why the PG is marked ``down`` by running } The ``recovery_state`` section tells us that peering is blocked due to down -``ceph-osd`` daemons, specifically ``osd.1``. In this case, we can start that -particular ``ceph-osd`` and recovery will proceed. +OSDs, specifically ``osd.1``. In this case, we can start that +particular OSD and recovery will proceed. Alternatively, if there is a catastrophic failure of ``osd.1`` (for example, if there has been a disk failure), the cluster can be informed that the OSD is @@ -194,7 +193,7 @@ there has been a disk failure), the cluster can be informed that the OSD is To report an OSD ``lost`` and to instruct Ceph to continue to attempt recovery anyway, run a command of the following form: -.. prompt:: bash +.. prompt:: bash # ceph osd lost 1 @@ -209,28 +208,28 @@ Unfound Objects Under certain combinations of failures, Ceph may complain about ``unfound`` objects, as in this example: -.. prompt:: bash +.. prompt:: bash # ceph health detail -:: +.. code-block:: none HEALTH_WARN 1 pgs degraded; 78/3778 unfound (2.065%) pg 2.4 is active+degraded, 78 unfound This means that the storage cluster knows that some objects (or newer copies of existing objects) exist, but it hasn't found copies of them. Here is an -example of how this might come about for a PG whose data is on two OSDS, which +example of how this might come about for a PG whose data is on two OSDs, which we will call "1" and "2": -* 1 goes down -* 2 handles some writes, alone -* 1 comes up +* 1 goes down. +* 2 handles some writes, alone. +* 1 comes up. * 1 and 2 re-peer, and the objects missing on 1 are queued for recovery. * Before the new objects are copied, 2 goes down. -At this point, 1 knows that these objects exist, but there is no live -``ceph-osd`` that has a copy of the objects. In this case, IO to those objects +At this point, 1 knows that these objects exist, but there is no live OSD +that has a copy of the objects. In this case, IO to those objects will block, and the cluster will hope that the failed node comes back soon. This is assumed to be preferable to returning an IO error to the user. @@ -240,11 +239,11 @@ This is assumed to be preferable to returning an IO error to the user. Identify which objects are unfound by running a command of the following form: -.. prompt:: bash +.. prompt:: bash # ceph pg 2.4 list_unfound [starting offset, in json] -.. code-block:: javascript +.. code-block:: json { "num_missing": 1, @@ -296,18 +295,18 @@ OSDs that have the status of ``already probed`` are ignored. Use of ``query``: -.. prompt:: bash +.. 
prompt:: bash # ceph pg 2.4 query -.. code-block:: javascript +.. code-block:: json "recovery_state": [ - { "name": "Started\/Primary\/Active", + { "name": "Started/Primary/Active", "enter_time": "2012-03-06 15:15:46.713212", "might_have_unfound": [ { "osd": 1, - "status": "osd is down"}]}, + "status": "osd is down"}]}] In this case, the cluster knows that ``osd.1`` might have data, but it is ``down``. Here is the full range of possible states: @@ -332,7 +331,7 @@ combinations of failures have occurred that allow the cluster to learn about writes that were performed before the writes themselves have been recovered. To mark the "unfound" objects as "lost", run a command of the following form: -.. prompt:: bash +.. prompt:: bash # ceph pg 2.5 mark_unfound_lost revert|delete @@ -346,6 +345,7 @@ either roll back to a previous version of the object or (if it was a new object) forget about the object entirely. Use ``revert`` with caution, as it may confuse applications that expect the object to exist. + Homeless Placement Groups ========================= @@ -355,22 +355,22 @@ placement groups becomes unavailable and the monitor will receive no status updates for those placement groups. The monitor marks as ``stale`` any placement group whose primary OSD has failed. For example: -.. prompt:: bash +.. prompt:: bash # ceph health -:: +.. code-block:: none HEALTH_WARN 24 pgs stale; 3/300 in osds are down Identify which placement groups are ``stale`` and which were the last OSDs to store the ``stale`` placement groups by running the following command: -.. prompt:: bash +.. prompt:: bash # ceph health detail -:: +.. code-block:: none HEALTH_WARN 24 pgs stale; 3/300 in osds are down ... @@ -380,7 +380,7 @@ store the ``stale`` placement groups by running the following command: osd.11 is down since epoch 13, last address 192.168.106.220:6803/11539 osd.12 is down since epoch 24, last address 192.168.106.220:6806/11861 -This output indicates that placement group 2.5 (``pg 2.5``) was last managed by +This output indicates that placement group ``2.5`` (``pg 2.5``) was last managed by ``osd.0`` and ``osd.2``. Restart those OSDs to allow the cluster to recover that placement group. @@ -395,7 +395,7 @@ OSDs in an operation involving dividing the number of placement groups in the cluster by the number of OSDs in the cluster, a small number of placement groups (the remainder, in this operation) are sometimes not distributed across the cluster. In situations like this, create a pool with a placement group -count that is a multiple of the number of OSDs. See `Placement Groups`_ for +count that is a multiple of the number of OSDs. See :ref:`placement groups` for details. See the :ref:`Pool, PG, and CRUSH Config Reference ` for instructions on changing the default values used to determine how many placement groups are assigned to each pool. @@ -408,23 +408,23 @@ If the cluster is up, but some OSDs are down and you cannot write data, make sure that you have the minimum number of OSDs running in the pool. If you don't have the minimum number of OSDs running in the pool, Ceph will not allow you to write data to it because there is no guarantee that Ceph can replicate your -data. See ``osd_pool_default_min_size`` in the :ref:`Pool, PG, and CRUSH +data. See :confval:`osd_pool_default_min_size` in the :ref:`Pool, PG, and CRUSH Config Reference ` for details. 
PGs Inconsistent ================ -If the command ``ceph health detail`` returns an ``active + clean + -inconsistent`` state, this might indicate an error during scrubbing. Identify +If the command ``ceph health detail`` returns an ``active+clean+inconsistent`` +state, this might indicate an error during scrubbing. Identify the inconsistent placement group or placement groups by running the following command: -.. prompt:: bash +.. prompt:: bash # ceph health detail -:: +.. code-block:: none HEALTH_ERR 1 pgs inconsistent; 2 scrub errors pg 0.6 is active+clean+inconsistent, acting [0,1,2] @@ -433,11 +433,11 @@ command: Alternatively, run this command if you prefer to inspect the output in a programmatic way: -.. prompt:: bash +.. prompt:: bash # rados list-inconsistent-pg rbd -:: +.. code-block:: none ["0.6"] @@ -446,11 +446,11 @@ different inconsistencies in multiple perspectives found in more than one object. If an object named ``foo`` in PG ``0.6`` is truncated, the output of ``rados list-inconsistent-pg rbd`` will look something like this: -.. prompt:: bash +.. prompt:: bash # rados list-inconsistent-obj 0.6 --format=json-pretty -.. code-block:: javascript +.. code-block:: json { "epoch": 14, @@ -508,40 +508,40 @@ In this case, the output indicates the following: inconsistencies. * The inconsistencies fall into two categories: - #. ``errors``: these errors indicate inconsistencies between shards, without + #. ``errors``: These errors indicate inconsistencies between shards, without an indication of which shard(s) are bad. Check for the ``errors`` in the ``shards`` array, if available, to pinpoint the problem. - * ``data_digest_mismatch``: the digest of the replica read from ``OSD.2`` + * ``data_digest_mismatch``: The digest of the replica read from ``OSD.2`` is different from the digests of the replica reads of ``OSD.0`` and - ``OSD.1`` - * ``size_mismatch``: the size of the replica read from ``OSD.2`` is ``0``, + ``OSD.1``. + * ``size_mismatch``: The size of the replica read from ``OSD.2`` is ``0``, but the size reported by ``OSD.0`` and ``OSD.1`` is ``968``. - #. ``union_shard_errors``: the union of all shard-specific ``errors`` in the + #. ``union_shard_errors``: The union of all shard-specific ``errors`` in the ``shards`` array. The ``errors`` are set for the shard with the problem. These errors include ``read_error`` and other similar errors. The ``errors`` ending in ``oi`` indicate a comparison with ``selected_object_info``. Examine the ``shards`` array to determine which shard has which error or errors. - * ``data_digest_mismatch_info``: the digest stored in the ``object-info`` + * ``data_digest_mismatch_info``: The digest stored in the ``object-info`` is not ``0xffffffff``, which is calculated from the shard read from - ``OSD.2`` - * ``size_mismatch_info``: the size stored in the ``object-info`` is + ``OSD.2``. + * ``size_mismatch_info``: The size stored in the ``object-info`` is different from the size read from ``OSD.2``. The latter is ``0``. .. warning:: If ``read_error`` is listed in a shard's ``errors`` attribute, the inconsistency is likely due to physical storage errors. In cases like this, - check the storage used by that OSD. - + check the storage used by that OSD. + Examine the output of ``dmesg`` and ``smartctl`` before attempting a drive repair. To repair the inconsistent placement group, run a command of the following form: -.. prompt:: bash +.. prompt:: bash # ceph pg repair {placement-group-ID} @@ -550,7 +550,7 @@ For example: .. 
prompt:: bash # ceph pg repair 1.4 - + .. warning:: This command overwrites the "bad" copies with "authoritative" copies. In most cases, Ceph is able to choose authoritative copies from all the available replicas by using some predefined criteria. This, however, @@ -564,14 +564,16 @@ For example: command ``ceph osd dump | grep pool`` return a list of pool numbers. -If you receive ``active + clean + inconsistent`` states periodically due to +If you receive ``active+clean+inconsistent`` states periodically due to clock skew, consider configuring the `NTP `_ daemons on your monitor hosts to act as peers. See `The Network Time Protocol `_ and Ceph :ref:`Clock Settings ` for more information. + More Information on PG Repair ----------------------------- + Ceph stores and updates the checksums of objects stored in the cluster. When a scrub is performed on a PG, the lead OSD attempts to choose an authoritative copy from among its replicas. Only one of the possible cases is consistent. @@ -583,7 +585,7 @@ any mismatch between the checksum of any replica of an object and the checksum of the authoritative copy means that there is an inconsistency. The discovery of these inconsistencies cause a PG's state to be set to ``inconsistent``. -The ``pg repair`` command attempts to fix inconsistencies of various kinds. When +The ``pg repair`` command attempts to fix inconsistencies of various kinds. When ``pg repair`` finds an inconsistent PG, it attempts to overwrite the digest of the inconsistent copy with the digest of the authoritative copy. When ``pg repair`` finds an inconsistent copy in a replicated pool, it marks the @@ -591,8 +593,8 @@ inconsistent copy as missing. In the case of replicated pools, recovery is beyond the scope of ``pg repair``. In the case of erasure-coded and BlueStore pools, Ceph will automatically -perform repairs if ``osd_scrub_auto_repair`` (default ``false``) is set to -``true`` and if no more than ``osd_scrub_auto_repair_num_errors`` (default +perform repairs if :confval:`osd_scrub_auto_repair` (default ``false``) is set to +``true`` and if no more than :confval:`osd_scrub_auto_repair_num_errors` (default ``5``) errors are found. The ``pg repair`` command will not solve every problem. Ceph does not @@ -615,36 +617,41 @@ might not be the uncorrupted replica. Because of this uncertainty, human intervention is necessary when an inconsistency is discovered. This intervention sometimes involves use of ``ceph-objectstore-tool``. + PG Repair Walkthrough --------------------- + https://ceph.io/geen-categorie/ceph-manually-repair-object/ - This page -contains a walkthrough of the repair of a PG. It is recommended reading if you -want to repair a PG but have never done so. +contains a walkthrough of the repair of a PG on the deprecated Filestore OSD back end. It is recommended reading if you +want to repair a PG on a Filestore OSD but have never done so. The walkthrough does not +apply to modern BlueStore OSDs. -Erasure Coded PGs are not active+clean -====================================== + +Erasure Coded PGs are not ``active+clean`` +========================================== If CRUSH fails to find enough OSDs to map to a PG, it will show as a ``2147483647`` which is ``ITEM_NONE`` or ``no OSD found``. For example:: [2,1,6,0,5,8,2147483647,7,4] + Not enough OSDs --------------- If the Ceph cluster has only eight OSDs and an erasure coded pool needs nine -OSDs, the cluster will show "Not enough OSDs". 
In this case, you either create -another erasure coded pool that requires fewer OSDs, by running commands of the +OSDs, the cluster will show ``Not enough OSDs``. In this case, either add new +OSDs that the PG will then use automatically, or create +another erasure coded pool that requires fewer OSDs by running commands of the following form: -.. prompt:: bash +.. prompt:: bash # ceph osd erasure-code-profile set myprofile k=5 m=3 ceph osd pool create erasurepool erasure myprofile -or add new OSDs, and the PG will automatically use them. -CRUSH constraints cannot be satisfied +CRUSH Constraints cannot be Satisfied ------------------------------------- If the cluster has enough OSDs, it is possible that the CRUSH rule is imposing @@ -653,16 +660,22 @@ the CRUSH rule requires that no two OSDs from the same host are used in the same PG, the mapping may fail because only two OSDs will be found. Check the constraint by displaying ("dumping") the rule, as shown here: -.. prompt:: bash +.. prompt:: bash # ceph osd crush rule ls -:: +.. code-block:: json [ "replicated_rule", "erasurepool"] - $ ceph osd crush rule dump erasurepool + +.. prompt:: bash # + + ceph osd crush rule dump erasurepool + +.. code-block:: json + { "rule_id": 1, "rule_name": "erasurepool", "type": 3, @@ -679,39 +692,44 @@ constraint by displaying ("dumping") the rule, as shown here: Resolve this problem by creating a new pool in which PGs are allowed to have OSDs residing on the same host by running the following commands: -.. prompt:: bash +.. prompt:: bash # ceph osd erasure-code-profile set myprofile crush-failure-domain=osd ceph osd pool create erasurepool erasure myprofile -CRUSH gives up too soon + +CRUSH Gives up too Soon ----------------------- If the Ceph cluster has just enough OSDs to map the PG (for instance a cluster with a total of nine OSDs and an erasure coded pool that requires nine OSDs per -PG), it is possible that CRUSH gives up before finding a mapping. This problem -can be resolved by: +PG), it is possible that CRUSH gives up before finding a mapping. To resolve +this problem, either: -* lowering the erasure coded pool requirements to use fewer OSDs per PG (this +* Lower the erasure coded pool requirements to use fewer OSDs per PG (this requires the creation of another pool, because erasure code profiles cannot be modified dynamically). -* adding more OSDs to the cluster (this does not require the erasure coded pool - to be modified, because it will become clean automatically) +* Add more OSDs to the cluster (this does not require the erasure coded pool + to be modified, because it will become clean automatically). -* using a handmade CRUSH rule that tries more times to find a good mapping. +* Use a handmade CRUSH rule that tries more times to find a good mapping. This can be modified for an existing CRUSH rule by setting - ``set_choose_tries`` to a value greater than the default. + ``set_choose_tries`` to a value greater than the default. For more + information, see :ref:`rados-crush-map-edits`. + +* Use a multi-step retry (MSR) CRUSH rule (Squid or later releases). For more + information, see :ref:`rados-crush-msr-rules`. First, verify the problem by using ``crushtool`` after extracting the crushmap from the cluster. This ensures that your experiments do not modify the Ceph cluster and that they operate only on local files: -.. prompt:: bash +.. prompt:: bash # ceph osd crush rule dump erasurepool -:: +.. 
code-block:: json { "rule_id": 1, "rule_name": "erasurepool", @@ -724,12 +742,24 @@ cluster and that they operate only on local files: "num": 0, "type": "host"}, { "op": "emit"}]} - $ ceph osd getcrushmap > crush.map + +.. prompt:: bash # + + ceph osd getcrushmap > crush.map + +.. code-block:: none + got crush map from osdmap epoch 13 - $ crushtool -i crush.map --test --show-bad-mappings \ + +.. prompt:: bash # + + crushtool -i crush.map --test --show-bad-mappings \ --rule 1 \ --num-rep 9 \ --min-x 1 --max-x $((1024 * 1024)) + +.. code-block:: none + bad mapping rule 8 x 43 num_rep 9 result [3,2,7,1,2147483647,8,5,6,0] bad mapping rule 8 x 79 num_rep 9 result [6,0,2,1,4,7,2147483647,5,8] bad mapping rule 8 x 173 num_rep 9 result [0,4,6,8,2,1,3,7,2147483647] @@ -747,25 +777,23 @@ considered bad, the CRUSH rule can be configured to search longer for a viable placement. -Changing the value of set_choose_tries -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Changing the Value of ``set_choose_tries`` +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ #. Decompile the CRUSH map to edit the CRUSH rule by running the following command: - .. prompt:: bash + .. prompt:: bash # crushtool --decompile crush.map > crush.txt For illustrative purposes a simplified CRUSH map will be used in this - example, simulating a single host with four disks of sizes 3×1TiB and - 1×200GiB. The settings below are chosen specifically for this example and + example, simulating a single host with four disks of sizes 3×1 TiB and + 1×200 GiB. The settings below are chosen specifically for this example and will diverge from the :ref:`CRUSH Map Tunables ` generally found in production clusters. As defaults may change, please refer to the correct version of the documentation for your release of Ceph. -_ - :: tunable choose_local_tries 0 @@ -832,7 +860,7 @@ _ step set_choose_tries 100 If the line does exist already, as in this example, only modify the value. - Ensure that the rule in this ``crush.txt`` does resemble this after the + Ensure that the rule in your ``crush.txt`` does resemble this after the change:: rule ec { @@ -847,7 +875,7 @@ _ #. Recompile and retest the CRUSH rule: - .. prompt:: bash + .. prompt:: bash # crushtool --compile crush.txt -o better-crush.map @@ -856,7 +884,7 @@ _ ``--show-choose-tries`` option of the ``crushtool`` command, as in the following example: - .. prompt:: bash + .. prompt:: bash # crushtool -i better-crush.map --test --show-bad-mappings \ --show-choose-tries \ @@ -864,7 +892,7 @@ _ --num-rep 3 \ --min-x 1 --max-x 10 - :: + .. code-block:: none 0: 0 1: 0 @@ -908,6 +936,3 @@ placements in practice, however if a lower value is desired then the lower value can be used at the chance of potentially hitting one of the rare cases in which placement fails, requiring manual intervention. -.. _check: ../../operations/placement-groups#get-the-number-of-placement-groups -.. _Placement Groups: ../../operations/placement-groups -.. _Pool, PG and CRUSH Config Reference: ../../configuration/pool-pg-config-ref
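A minimal end-to-end sketch of the PG triage steps described above, assuming an
admin keyring on the host and the optional ``jq`` utility; the PG ID ``2.4`` and
the pool name ``rbd`` are placeholders taken from the examples and must be
replaced with values from your own cluster:

.. code-block:: bash

    #!/usr/bin/env bash
    # Triage sketch: stuck, unfound and inconsistent PGs.
    set -u

    pgid=2.4        # replace with a PG ID reported by "ceph health detail"
    pool=rbd        # replace with the affected pool name

    # Overall health, with per-PG detail.
    ceph health detail

    # PGs stuck in non-optimal states.
    ceph pg dump_stuck stale
    ceph pg dump_stuck inactive
    ceph pg dump_stuck unclean

    # Why is this PG not active+clean? The recovery_state section usually
    # names the OSDs that block peering or recovery.
    ceph pg "$pgid" query | jq '.recovery_state'

    # Objects the cluster knows about but cannot currently find a copy of.
    ceph pg "$pgid" list_unfound

    # Scrub inconsistencies, per pool and per object.
    rados list-inconsistent-pg "$pool"
    rados list-inconsistent-obj "$pgid" --format=json-pretty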
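For the "Fewer OSDs than Replicas" and "Pool Size = 1" cases, the defaults
discussed above can also be adjusted at runtime. A rough sketch, assuming a test
pool named ``mypool``; the per-pool ``ceph osd pool set`` commands are used here
as the runtime counterparts of the ``osd_pool_default_*`` options, and a pool
size of ``2`` is only sensible for the small test clusters described above:

.. code-block:: bash

    # Cluster-wide defaults, applied to newly created pools.
    ceph config set global osd_pool_default_size 2
    ceph config set global osd_pool_default_min_size 2

    # The same settings for an existing pool ("mypool" is a placeholder).
    ceph osd pool set mypool size 2
    ceph osd pool set mypool min_size 2

    # Verify the result.
    ceph osd pool get mypool size
    ceph osd pool get mypool min_size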
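The offline test loop for the "CRUSH gives up too soon" case can be scripted
roughly as follows. Rule number ``1`` and ``--num-rep 9`` are taken from the
erasure-coded example above and stand in for your own rule ID and pool size;
the final ``ceph osd setcrushmap`` step is left commented out because it
changes the live cluster:

.. code-block:: bash

    #!/usr/bin/env bash
    # Offline CRUSH rule test loop: extract, edit, recompile, test.
    set -euo pipefail

    # Extract and decompile the current CRUSH map; the cluster is not modified.
    ceph osd getcrushmap > crush.map
    crushtool --decompile crush.map > crush.txt

    # Add "step set_choose_tries 100" to the affected rule, or raise the
    # value if the line is already present.
    "${EDITOR:-vi}" crush.txt

    # Recompile and check for bad mappings locally.
    crushtool --compile crush.txt -o better-crush.map
    crushtool -i better-crush.map --test --show-bad-mappings \
        --rule 1 \
        --num-rep 9 \
        --min-x 1 --max-x $((1024 * 1024))

    # Only once no bad mappings are reported, inject the new map:
    # ceph osd setcrushmap -i better-crush.map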