From bef6cdd1de1715cf645556e1d28748578f5399b3 Mon Sep 17 00:00:00 2001
From: Shraddha Agrawal
Date: Tue, 20 May 2025 15:49:19 +0530
Subject: [PATCH] doc: address review comments

Signed-off-by: Shraddha Agrawal
(cherry picked from commit 22200d6cdf5c0bebe710ce8c1895f85fcf99ecd6)
---
 doc/rados/operations/monitoring.rst | 39 +++++++++++++++++++----------
 1 file changed, 26 insertions(+), 13 deletions(-)

diff --git a/doc/rados/operations/monitoring.rst b/doc/rados/operations/monitoring.rst
index 47251327b2427..64901eedba614 100644
--- a/doc/rados/operations/monitoring.rst
+++ b/doc/rados/operations/monitoring.rst
@@ -756,19 +756,32 @@ Example output:
 .. prompt:: bash $
 
    POOL           UPTIME  DOWNTIME  NUMFAILURES  MTBF  MTTR  SCORE     AVAILABLE
-   rbd 2m 21s 1 2m 21s 0.888889 1
-   .mgr 86s 0s 0 0s 0s 1 1
-   cephfs.a.meta 77s 0s 0 0s 0s 1 1
-   cephfs.a.data 76s 0s 0 0s 0s 1 1
+   rbd            2m      21s       1            2m    21s   0.888889  1
+   .mgr           86s     0s        0            0s    0s    1         1
+   cephfs.a.meta  77s     0s        0            0s    0s    1         1
+   cephfs.a.data  76s     0s        0            0s    0s    1         1
 
 A pool is considered ``unavailable`` when at least one PG in the pool
 becomes inactive or there is at least one unfound object in the pool.
-Otherwise the pool is considered ``available``.
-
-We first calculate the Mean Time Between Failures (MTBF) and
-Mean Time To Recover (MTTR) from the uptime and downtime recorded
-for each pool and arrive at the availability score
-by finding ratio of MTBF to total time (ie MTTR + MTBF).
-
-The score is updated every 5 seconds. This interval is currently
-not configurable.
\ No newline at end of file
+Otherwise the pool is considered ``available``. Depending on the
+current and previous state of the pool, we update the ``uptime`` and
+``downtime`` values:
+
+================ =============== =============== =================
+ Previous State   Current State   Uptime Update   Downtime Update
+================ =============== =============== =================
+ Available        Available       +diff time      no update
+ Available        Unavailable     +diff time      no update
+ Unavailable      Available       +diff time      no update
+ Unavailable      Unavailable     no update       +diff time
+================ =============== =============== =================
+
+From the updated ``uptime`` and ``downtime`` values, we calculate
+the Mean Time Between Failures (MTBF) and Mean Time To Recover (MTTR)
+for each pool. The availability score is then calculated by finding
+the ratio of MTBF to the total time (MTBF + MTTR).
+
+The score is updated every five seconds. This interval is currently
+not configurable. Any intermittent changes to the pools that
+occur within this interval but revert before we recheck the pool
+status will not be captured by this feature.
\ No newline at end of file
-- 
2.39.5
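The update table and the MTBF/MTTR score described in the patch can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Ceph's implementation. The class and method names are hypothetical. Pools are assumed to start ``available``. A failure is assumed to be counted on each available-to-unavailable transition. MTBF and MTTR are assumed to be ``uptime`` and ``downtime`` divided by the failure count, reported as 0 when there are no failures, as the example output shows.

```python
from dataclasses import dataclass


@dataclass
class PoolAvailability:
    """Hypothetical per-pool availability record (illustrative names only)."""

    uptime: float = 0.0      # seconds credited as available
    downtime: float = 0.0    # seconds credited as unavailable
    num_failures: int = 0    # available -> unavailable transitions (assumed)
    available: bool = True   # state seen at the previous check (assumed initial state)

    def update(self, now_available: bool, diff: float) -> None:
        """Apply one status check made `diff` seconds after the previous one."""
        if self.available or now_available:
            # Per the table, every transition except
            # unavailable -> unavailable adds the elapsed time to uptime.
            self.uptime += diff
        else:
            # Unavailable -> unavailable: the elapsed time is downtime.
            self.downtime += diff
        if self.available and not now_available:
            # Assumption: a failure is counted when the pool goes down.
            self.num_failures += 1
        self.available = now_available

    def mtbf(self) -> float:
        # Assumed definition: mean uptime per failure; 0 with no failures.
        return self.uptime / self.num_failures if self.num_failures else 0.0

    def mttr(self) -> float:
        # Assumed definition: mean downtime per failure; 0 with no failures.
        return self.downtime / self.num_failures if self.num_failures else 0.0

    def score(self) -> float:
        # Ratio of MTBF to total time (MTBF + MTTR); 1.0 when both are 0,
        # matching the example output for pools with no failures.
        total = self.mtbf() + self.mttr()
        return self.mtbf() / total if total else 1.0


pool = PoolAvailability()
# One check every five seconds: up, up, down, down, up.
for state in [True, True, False, False, True]:
    pool.update(state, 5.0)
print(pool.uptime, pool.downtime, pool.num_failures, pool.score())
# prints: 20.0 5.0 1 0.8
```

With this check sequence, the sketch credits 20 s of uptime and 5 s of downtime over one failure, giving a score of 20 / (20 + 5) = 0.8. The transition into the unavailable state still counts as uptime, exactly as the table specifies.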