=========================
High availability and high reliability require a fault-tolerant approach to
managing hardware and software issues. Ceph has no single point of failure and
it can service requests for data even when in a "degraded" mode. Ceph's `data
placement`_ introduces a layer of indirection to ensure that data doesn't bind
directly to specific OSDs. For this reason, tracking system faults requires
finding the `placement group`_ (PG) and the underlying OSDs at the root of the
problem.

.. tip:: A fault in one part of the cluster might prevent you from accessing a
   particular object, but that doesn't mean that you are prevented from
   accessing other objects. When you run into a fault, don't panic. Just follow
   the steps for monitoring your OSDs and placement groups, and then begin
   troubleshooting.

Ceph is generally self-repairing. However, when problems persist and you want
to find out what exactly is going wrong, it can be helpful to monitor OSDs and
PGs.

Monitoring OSDs
===============

An OSD has two dimensions of status: it is either in the cluster (``in``) or
out of the cluster (``out``), and it is either up and running (``up``) or down
and not running (``down``). If an OSD is ``up``, it can be either ``in`` the
cluster (if so, you can read and write data) or ``out`` of the cluster. If the
OSD was previously ``in`` the cluster but was recently moved ``out`` of the
cluster, Ceph will migrate its PGs to other OSDs. If an OSD is ``out`` of the
cluster, CRUSH will not assign any PGs to that OSD. If an OSD is ``down``, it
should also be ``out``.

.. note:: If an OSD is ``down`` and ``in``, there is a problem and the cluster
   is not in a healthy state.

.. ditaa::

           +----------------+        +----------------+
           |                |        |                |
           |   OSD #n In    |        |   OSD #n Up    |
           |                |        |                |
           +----------------+        +----------------+
                   ^                         ^
                   |                         |
                   |                         |
                   v                         v
           +----------------+        +----------------+
           |                |        |                |
           |   OSD #n Out   |        |   OSD #n Down  |
           |                |        |                |
           +----------------+        +----------------+
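
To quickly list only the OSDs that are ``down`` (so that you can check whether
any of them are unexpectedly still ``in``), you can filter the per-OSD lines of
``ceph osd dump``. This is a minimal sketch; the exact fields in the output
vary between releases:

.. prompt:: bash $

   ceph osd dump | grep '^osd' | grep -w down

Each matching line also shows the OSD's ``in``/``out`` flag, which makes a
``down`` and ``in`` OSD easy to spot.
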

If you run the commands ``ceph health``, ``ceph -s``, or ``ceph -w``, you
might notice that the cluster does not always show ``HEALTH OK``. Don't panic.
There are certain circumstances in which it is expected and normal that the
cluster will **NOT** show ``HEALTH OK``:

#. You haven't started the cluster yet.
#. You have just started or restarted the cluster and it's not ready to show
   health statuses yet, because the PGs are in the process of being created
   and the OSDs are in the process of peering.
#. You have just added or removed an OSD.
#. You have just modified your cluster map.

Checking to see if OSDs are ``up`` and running is an important aspect of
monitoring them: whenever the cluster is up and running, every OSD that is
``in`` the cluster should also be ``up`` and running. To see if all of the
cluster's OSDs are running, run the following command:

.. prompt:: bash $

   ceph osd stat

The output provides the following information: the total number of OSDs (x),
how many OSDs are ``up`` (y), how many OSDs are ``in`` (z), and the map epoch
(eNNNN). ::

   x osds: y up, z in; epoch: eNNNN

If the number of OSDs that are ``in`` the cluster is greater than the number
of OSDs that are ``up``, run the following command to identify the
``ceph-osd`` daemons that are not running:

.. prompt:: bash $

   ceph osd tree

::

   #ID  CLASS  WEIGHT   TYPE NAME              STATUS  REWEIGHT  PRI-AFF
    -1         2.00000  pool openstack
    -3         2.00000  rack dell-2950-rack-A
    -2         2.00000  host dell-2950-A1
     0  ssd    1.00000       osd.0             up      1.00000   1.00000
     1  ssd    1.00000       osd.1             down    1.00000   1.00000

.. tip:: Searching through a well-designed CRUSH hierarchy to identify the
   physical locations of particular OSDs might help you troubleshoot your
   cluster.

If an OSD is ``down``, start it by running the following command:

.. prompt:: bash $

   sudo systemctl start ceph-osd@1

For problems associated with OSDs that have stopped or won't restart, see
`OSD Not Running`_.

PG Sets
=======

When CRUSH assigns a PG to OSDs, it takes note of how many replicas of the PG
are required by the pool and then assigns each replica to a different OSD. For
example, if the pool requires three replicas of a PG, CRUSH might assign them
individually to ``osd.1``, ``osd.2``, and ``osd.3``. CRUSH seeks a
pseudo-random placement that takes into account the failure domains that you
have set in your `CRUSH map`_; for this reason, PGs are rarely assigned to
immediately adjacent OSDs in a large cluster.

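
If you want to see which CRUSH rule a pool uses and which failure domain that
rule separates replicas across, you can inspect the rule directly. This is a
minimal sketch; the pool name ``mypool`` is a placeholder for one of your own
pools, and the rule name in the second command should be whatever the first
command reports:

.. prompt:: bash $

   ceph osd pool get mypool crush_rule
   ceph osd crush rule dump replicated_rule

In the rule dump, the ``type`` field of the ``chooseleaf`` step (for example,
``host`` or ``rack``) is the failure domain across which CRUSH separates the
PG's replicas.
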
Ceph processes a client request using the **Acting Set**, which is the set of
OSDs that will actually handle the requests since they have a full and working
version of a PG shard. The set of OSDs that should contain the shards of a
particular PG is known as the **Up Set**, i.e. where data is moved or copied
to (or planned to be).

Sometimes an OSD in the Acting Set is ``down`` or otherwise unable to service
requests for objects in the PG. When this kind of situation arises, don't
panic. Common examples of such a situation include:

- You added or removed an OSD, CRUSH reassigned the PG to other OSDs, and this
  reassignment changed the composition of the Acting Set and triggered the
  migration of data by means of a "backfill" process.
- An OSD was ``down``, was restarted, and is now ``recovering``.
- An OSD in the Acting Set is ``down`` or unable to service requests, and
  another OSD has temporarily assumed its duties.

Typically, the Up Set and the Acting Set are identical. When they are not, it
might indicate that Ceph is migrating the PG (in other words, that the PG has
been remapped), that an OSD is recovering, or that there is a problem with the
cluster (in such scenarios, Ceph usually shows a "HEALTH WARN" state with a
"stuck stale" message).

To retrieve a list of PGs, run the following command:

.. prompt:: bash $

   ceph pg dump

To see which OSDs are within the Acting Set and the Up Set for a specific PG,
run the following command:

.. prompt:: bash $

   ceph pg map {pg-num}

The output provides the following information: the osdmap epoch (eNNN), the PG
number ({pg-num}), the OSDs in the Up Set (up[]), and the OSDs in the Acting
Set (acting[])::

   osdmap eNNN pg {raw-pg-num} ({pg-num}) -> up [0,1,2] acting [0,1,2]

.. note:: If the Up Set and the Acting Set do not match, this might indicate
   that the cluster is rebalancing itself or that there is a problem with the
   cluster.

Peering
=======

Before you can write data to a PG, it must be in an ``active`` state and it
will preferably be in a ``clean`` state. For Ceph to determine the current
state of a PG, peering must take place. That is, the primary OSD of the PG
(that is, the first OSD in the Acting Set) must peer with the secondary and
tertiary OSDs so that consensus on the current state of the PG can be
established. In the following diagram, we assume a pool with three replicas of
the PG:

.. ditaa::

           +---------+     +---------+     +-------+
           |  OSD 1  |     |  OSD 2  |     | OSD 3 |
           +---------+     +---------+     +-------+
                |               |              |
                |  Request To   |              |
                |     Peer      |              |
                |-------------->|              |
                |<--------------|              |
                |    Peering                   |
                |                              |
                |         Request To           |
                |            Peer              |
                |----------------------------->|
                |<-----------------------------|
                |          Peering             |

The OSDs also report their status to the monitor. For details, see
`Configuring Monitor/OSD Interaction`_. To troubleshoot peering issues, see
`Peering Failure`_.
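
When you suspect a peering problem, a quick first step is to ask for detailed
health information, which lists the specific PGs that are affected (for
example, PGs reported as ``peering`` or ``stale``). The output of this command
varies with the cluster's state, so treat it as a starting point rather than a
definitive diagnosis:

.. prompt:: bash $

   ceph health detail
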

Monitoring PG States
====================

If you run the commands ``ceph health``, ``ceph -s``, or ``ceph -w``, you
might notice that the cluster does not always show ``HEALTH OK``. After first
checking to see if the OSDs are running, you should also check PG states.
There are certain PG-peering-related circumstances in which it is expected and
normal that the cluster will **NOT** show ``HEALTH OK``:

#. You have just created a pool and the PGs haven't peered yet.
#. The PGs are recovering.
#. You have just added an OSD to or removed an OSD from the cluster.
#. You have just modified your CRUSH map and your PGs are migrating.
#. There is inconsistent data in different replicas of a PG.
#. Ceph is scrubbing a PG's replicas.
#. Ceph doesn't have enough storage capacity to complete backfilling operations.

If one of these circumstances causes Ceph to show ``HEALTH WARN``, don't
panic. In many cases, the cluster will recover on its own. In some cases,
however, you might need to take action. An important aspect of monitoring PGs
is to ensure that, when the cluster is up and running, all PGs are ``active``
and (preferably) ``clean``. To see the status of every PG, run the following
command:

.. prompt:: bash $

   ceph pg stat

The output provides the following information: the total number of PGs (x),
how many PGs are in a particular state such as ``active+clean`` (y), and the
amount of data stored (z). ::

   x pgs: y active+clean; z bytes data, aa MB used, bb GB / cc GB avail

.. note:: It is common for Ceph to report multiple states for PGs (for
   example, ``active+clean``, ``active+clean+remapped``,
   ``active+clean+scrubbing``).

Here Ceph shows not only the PG states, but also the storage capacity used
(aa), the amount of storage capacity remaining (bb), and the total storage
capacity of the cluster (cc). These values can be important in a few cases
(see the commands after this list):

- The cluster is reaching its ``near full ratio`` or ``full ratio``.
- Data is not being distributed across the cluster due to an error in the
  CRUSH configuration.
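
To check how close the cluster is to its ``near full ratio`` and how evenly
data is distributed across OSDs, you can inspect overall and per-OSD
utilization. This is a minimal sketch; the exact columns of the output vary
between releases:

.. prompt:: bash $

   ceph df
   ceph osd df

A large spread in the ``%USE`` column of ``ceph osd df`` can be a sign that
data is not being distributed evenly across the cluster.
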

.. topic:: Placement Group IDs

   PG IDs consist of the pool number (not the pool name) followed by a period
   (.) and the PG ID (a hexadecimal number). You can view pool numbers and
   their names in the output of ``ceph osd lspools``. For example, the first
   pool that was created corresponds to pool number ``1``. A fully qualified
   PG ID has the following form::

      {pool-num}.{pg-id}

   It typically resembles the following::

      1.1701b

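
If you are unsure of a pool's number, you can list the pools together with
their numbers. The sample output below is illustrative only; your pool names
and numbers will differ:

.. prompt:: bash $

   ceph osd lspools

::

   1 .mgr
   2 mypool
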
To retrieve a list of PGs, run the following command:

.. prompt:: bash $

   ceph pg dump

To format the output as JSON and save it to a file, run the following command:

.. prompt:: bash $

   ceph pg dump -o {filename} --format=json
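
If ``jq`` is installed, you can then pull specific fields out of the saved
file. This is a minimal sketch; it assumes a release whose JSON output nests
the per-PG records under ``pg_map.pg_stats`` (the exact layout varies between
releases):

.. prompt:: bash $

   jq -r '.pg_map.pg_stats[] | "\(.pgid) \(.state)"' {filename}
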

To query a specific PG, run the following command:

.. prompt:: bash $

   ceph pg {poolnum}.{pg-id} query

Ceph will output the query in JSON format.
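
To pick out just a few fields of interest from that JSON (for example, the
PG's current state and its Up and Acting Sets), you can filter the output with
``jq``. This is a minimal sketch and assumes that ``jq`` is installed; the
field names shown are the ones commonly present in the query output:

.. prompt:: bash $

   ceph pg {poolnum}.{pg-id} query | jq '{state, up, acting}'
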

The following subsections describe the most common PG states in detail.

Creating
--------