From 78a5d1f4157f6c4211ac2e89f2450f7ef6d98485 Mon Sep 17 00:00:00 2001 From: Ville Ojamo <14869000+bluikko@users.noreply.github.com> Date: Tue, 6 Jan 2026 16:22:29 +0700 Subject: [PATCH] doc/dev: Improvements to health-reports.rst (1 of 2) Try to improve the language of the document. Completely rewrite sections where this is possible without confirming nuances from the source. Use "Monitor" and "Manager"; add articles, fix typos and missing letters, etc. This is part 1 of 2; the second part should add back the diagrams and the text about them, and reflow the text. Signed-off-by: Ville Ojamo <14869000+bluikko@users.noreply.github.com> --- doc/dev/health-reports.rst | 86 +++++++++++++++++++------------------- 1 file changed, 42 insertions(+), 44 deletions(-) diff --git a/doc/dev/health-reports.rst b/doc/dev/health-reports.rst index 7769c6d8cea8..69b953f94fd8 100644 --- a/doc/dev/health-reports.rst +++ b/doc/dev/health-reports.rst @@ -9,61 +9,59 @@ How to Get Reports In general, there are two channels to retrieve the health reports: ceph (CLI) - which sends ``health`` mon command for retrieving the health status of the cluster -mgr module - which calls ``mgr.get('health')`` for the same report in the form of a JSON encoded string - -The following diagrams outline the involved parties and how the interact when the clients -query for the reports: + The ``ceph`` CLI command sends the ``health`` Monitor command to retrieve + the health status of the cluster. +Manager module + A Manager module calls the ``mgr.get('health')`` method to retrieve the same + report as a JSON-encoded string.
Where are the Reports Generated =============================== -Aggregator of Aggregators -------------------------- +Monitor: Aggregator of Aggregators +---------------------------------- -Health reports are aggregated from multiple Paxos services: +The Monitor aggregates health reports from multiple Paxos services: -- AuthMonitor -- HealthMonitor -- MDSMonitor -- MgrMonitor -- MgrStatMonitor -- MonmapMonitor -- OSDMonitor +- ``AuthMonitor`` +- ``HealthMonitor`` +- ``MDSMonitor`` +- ``MgrMonitor`` +- ``MgrStatMonitor`` +- ``MonmapMonitor`` +- ``OSDMonitor`` -When persisting the pending changes in their own domain, each of them identifies the -health related issues and store them into the monstore with the prefix of ``health`` -using the same transaction. For instance, ``OSDMonitor`` checks a pending new osdmap -for possible issues, like down OSDs and missing scrub flag in a pool, and then stores -the encoded form of the health reports along with the new osdmap. These reports are -later loaded and decoded, so they can be collected on demand. When it comes to -``MDSMonitor``, it persists the health metrics in the beacon sent by the MDS daemons, -and prepares health reports when storing the pending changes. +When each of these Paxos services persists the pending changes in its own domain, it +identifies health-related issues and stores them in the monstore with the prefix ``health`` in the same transaction. For instance: + +- ``OSDMonitor`` checks a pending osdmap for possible issues, such as + ``down`` OSDs or a missing scrub flag in a pool, and then stores + the encoded form of the health reports along with the new osdmap. These reports are + later loaded and decoded so that they can be collected on demand. +- ``MDSMonitor`` persists the health metrics contained in the beacons sent by the MDS daemons + and prepares health reports when storing the pending changes.
-So, if we want to add a new warning related to cephfs, probably the best place to +To add a new warning related to CephFS, for example, a good place to start is ``MDSMonitor::encode_pending()``, where health reports are collected from the latest ``FSMap`` and the health metrics reported by MDS daemons. -But it's noteworthy that ``MgrStatMonitor`` does *not* prepare the reports by itself, -it just stores whatever the health reports received from mgr! - -ceph-mgr -- A Delegate Aggregator ---------------------------------- - -In Ceph, mgr is created to share the burden of monitor, which is used to establish -the consensus of information which is critical to keep the cluster function. -Apparently, osdmap, mdsmap and monmap fall into this category. But what about the -aggregated statistics of the cluster? They are crucial for the administrator to -understand the status of the cluster, but they might not be that important to keep -the cluster running. To address this scalability issue, we offloaded the work of -collecting and aggregating the metrics to mgr. - -Now, mgr is responsible for receiving and processing the ``MPGStats`` messages from -OSDs. And we also developed a protocol allowing a daemon to periodically report its -metrics and status to mgr using ``MMgrReport``. On the mgr side, it periodically sends -an aggregated report to the ``MgrStatMonitor`` service on mon. As explained earlier, -this service just persists the health reports in the aggregated report to the monstore. +Note that ``MgrStatMonitor`` does not prepare health reports itself. It +receives aggregated reports from the Manager and persists them to the monstore. + + +Manager: a Delegate Aggregator +------------------------------ + +The Monitor establishes consensus on the information that is critical for the +cluster to function, such as the osdmap, mdsmap and monmap.
Aggregated statistics of the cluster +are crucial for the administrator to understand the status of the cluster, but +they are not critical for the cluster to function. For scalability reasons, they are +offloaded to the Manager, which collects and aggregates the metrics. + +The Manager receives and processes ``MPGStats`` messages from OSDs. Daemons also +periodically report their metrics and status to the Manager using ``MMgrReport``. +The Manager then periodically sends an aggregated report to the ``MgrStatMonitor`` +service on the Monitor, which persists the data to the monstore. -- 2.47.3
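The following is a minimal Python sketch of the Manager-module channel described in "How to Get Reports". It assumes that ``get('health')`` returns a dict whose ``'json'`` key holds the JSON-encoded report, which appears to be what in-tree modules such as ``prometheus`` rely on. It also assumes that the report carries an overall ``status`` and a ``checks`` map keyed by check code. ``parse_health`` and ``FakeModule`` are hypothetical names. ``FakeModule`` only exists so that the sketch runs without a cluster.

```python
import json
from typing import Any, Dict, List, Tuple


def parse_health(raw: Dict[str, str]) -> Tuple[str, List[str]]:
    """Decode the value returned by a Manager module's get('health') call.

    Assumption: raw['json'] holds the JSON-encoded health report, with an
    overall 'status' and a 'checks' map keyed by health check code.
    """
    report: Dict[str, Any] = json.loads(raw["json"])
    # Fall back to HEALTH_OK if the status field is absent (an assumption).
    status = report.get("status", "HEALTH_OK")
    # Collect the codes of all active health checks, e.g. OSD_DOWN.
    codes = sorted(report.get("checks", {}))
    return status, codes


class FakeModule:
    """Hypothetical stand-in for a MgrModule so the sketch runs without Ceph."""

    def get(self, data_name: str) -> Dict[str, str]:
        if data_name != "health":
            raise KeyError(data_name)
        # Illustrative report; the field layout is assumed, not captured from a cluster.
        report = {
            "status": "HEALTH_WARN",
            "checks": {
                "OSD_DOWN": {
                    "severity": "HEALTH_WARN",
                    "summary": {"message": "1 osds down"},
                },
            },
        }
        return {"json": json.dumps(report)}


if __name__ == "__main__":
    status, codes = parse_health(FakeModule().get("health"))
    print(status, codes)  # HEALTH_WARN ['OSD_DOWN']
```

In a real module, the same decoding would presumably be applied to the module's own ``self.get('health')`` result. Keeping the parsing in a plain function makes it testable without a running Manager.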