From: Zac Dover
Date: Fri, 11 Aug 2023 15:25:32 +0000 (+1000)
Subject: doc/rados: update monitoring-osd-pg.rst
X-Git-Tag: v17.2.7~201^2
X-Git-Url: http://git.apps.os.sepia.ceph.com/?a=commitdiff_plain;h=2b6fbc81daa21aa652b78a5ea9bfc7d4c5aa589a;p=ceph.git

doc/rados: update monitoring-osd-pg.rst

Ingest Anthony D'Atri's notes from
https://github.com/ceph/ceph/pull/50856#discussion_r1289532902
which should have been included earlier.

Signed-off-by: Zac Dover
(cherry picked from commit 73503d4e2665bee9c49a055c11be1ffe2db253a1)
---

diff --git a/doc/rados/operations/monitoring-osd-pg.rst b/doc/rados/operations/monitoring-osd-pg.rst
index 3ddf1a5f6551d..510096ed50d9a 100644
--- a/doc/rados/operations/monitoring-osd-pg.rst
+++ b/doc/rados/operations/monitoring-osd-pg.rst
@@ -10,10 +10,11 @@
 directly to specific OSDs. For this reason, tracking system faults requires
 finding the `placement group`_ (PG) and the underlying OSDs at the root of the
 problem.
 
-.. tip:: A fault in one part of the cluster might prevent you from accessing a
-   particular object, but that doesn't mean that you are prevented from accessing other objects.
-   When you run into a fault, don't panic. Just follow the steps for monitoring
-   your OSDs and placement groups, and then begin troubleshooting.
+.. tip:: A fault in one part of the cluster might prevent you from accessing a
+   particular object, but that doesn't mean that you are prevented from
+   accessing other objects. When you run into a fault, don't panic. Just
+   follow the steps for monitoring your OSDs and placement groups, and then
+   begin troubleshooting.
 
 Ceph is self-repairing. However, when problems persist, monitoring OSDs and
 placement groups will help you identify the problem.
@@ -22,17 +23,18 @@
 Monitoring OSDs
 ===============
 
-An OSD's status is as follows: it is either in the cluster (``in``) or out of the cluster
-(``out``); likewise, it is either up and running (``up``) or down and not
-running (``down``). If an OSD is ``up``, it can be either ``in`` the cluster
-(if so, you can read and write data) or ``out`` of the cluster. If the OSD was previously
-``in`` the cluster but was recently moved ``out`` of the cluster, Ceph will migrate its
-PGs to other OSDs. If an OSD is ``out`` of the cluster, CRUSH will
-not assign any PGs to that OSD. If an OSD is ``down``, it should also be
-``out``.
-
-.. note:: If an OSD is ``down`` and ``in``, then there is a problem and the cluster
-   is not in a healthy state.
+An OSD is either *in* service (``in``) or *out* of service (``out``). An OSD is
+either running and reachable (``up``), or it is not running and not
+reachable (``down``).
+
+If an OSD is ``up``, it may be either ``in`` service (clients can read and
+write data) or ``out`` of service. If the OSD was ``in`` but was then set to the ``out`` state due to a failure or a manual action, Ceph will migrate its placement groups to other OSDs to maintain the configured redundancy.
+
+If an OSD is ``out`` of service, CRUSH will not assign placement groups to it.
+If an OSD is ``down``, it will also be ``out``.
+
+.. note:: If an OSD is ``down`` and ``in``, there is a problem and this
+   indicates that the cluster is not in a healthy state.
 
 .. ditaa::