From: Zac Dover
Date: Fri, 11 Aug 2023 15:25:32 +0000 (+1000)
Subject: doc/rados: update monitoring-osd-pg.rst
X-Git-Tag: v17.2.7~201^2
X-Git-Url: http://git.apps.os.sepia.ceph.com/?a=commitdiff_plain;h=2b6fbc81daa21aa652b78a5ea9bfc7d4c5aa589a;p=ceph.git

doc/rados: update monitoring-osd-pg.rst

Ingest Anthony D'Atri's notes from
https://github.com/ceph/ceph/pull/50856#discussion_r1289532902
which should have been included earlier.

Signed-off-by: Zac Dover
(cherry picked from commit 73503d4e2665bee9c49a055c11be1ffe2db253a1)
---

diff --git a/doc/rados/operations/monitoring-osd-pg.rst b/doc/rados/operations/monitoring-osd-pg.rst
index 3ddf1a5f6551d..510096ed50d9a 100644
--- a/doc/rados/operations/monitoring-osd-pg.rst
+++ b/doc/rados/operations/monitoring-osd-pg.rst
@@ -10,10 +10,11 @@
 directly to specific OSDs. For this reason, tracking system faults requires
 finding the `placement group`_ (PG) and the underlying OSDs at the root of the
 problem.
 
-.. tip:: A fault in one part of the cluster might prevent you from accessing a
-   particular object, but that doesn't mean that you are prevented from accessing other objects.
-   When you run into a fault, don't panic. Just follow the steps for monitoring
-   your OSDs and placement groups, and then begin troubleshooting.
+.. tip:: A fault in one part of the cluster might prevent you from accessing a
+   particular object, but that doesn't mean that you are prevented from
+   accessing other objects. When you run into a fault, don't panic. Just
+   follow the steps for monitoring your OSDs and placement groups, and then
+   begin troubleshooting.
 
 Ceph is self-repairing. However, when problems persist, monitoring OSDs and
 placement groups will help you identify the problem.
@@ -22,17 +23,18 @@
 Monitoring OSDs
 ===============
 
-An OSD's status is as follows: it is either in the cluster (``in``) or out of the cluster
-(``out``); likewise, it is either up and running (``up``) or down and not
-running (``down``). If an OSD is ``up``, it can be either ``in`` the cluster
-(if so, you can read and write data) or ``out`` of the cluster. If the OSD was previously
-``in`` the cluster but was recently moved ``out`` of the cluster, Ceph will migrate its
-PGs to other OSDs. If an OSD is ``out`` of the cluster, CRUSH will
-not assign any PGs to that OSD. If an OSD is ``down``, it should also be
-``out``.
-
-.. note:: If an OSD is ``down`` and ``in``, then there is a problem and the cluster
-   is not in a healthy state.
+An OSD is either *in* service (``in``) or *out* of service (``out``). An OSD is
+either running and reachable (``up``), or it is not running and not
+reachable (``down``).
+
+If an OSD is ``up``, it may be either ``in`` service (clients can read and
+write data) or ``out`` of service. If the OSD was ``in`` but was then set to the ``out`` state due to a failure or a manual action, Ceph will migrate its placement groups to other OSDs to maintain the configured redundancy.
+
+If an OSD is ``out`` of service, CRUSH will not assign placement groups to it.
+If an OSD is ``down``, it will also be ``out``.
+
+.. note:: If an OSD is ``down`` and ``in``, there is a problem and this
+   indicates that the cluster is not in a healthy state.
 
 .. ditaa::