doc/rados/operations/health-checks: document MGR_DOWN

author Sage Weil <sage@redhat.com>

Wed, 31 Jul 2019 09:57:49 +0000 (04:57 -0500)

committer Sage Weil <sage@redhat.com>

Thu, 15 Aug 2019 01:40:08 +0000 (20:40 -0500)
diff --git a/doc/rados/operations/health-checks.rst b/doc/rados/operations/health-checks.rst
index f6ca463bf0e10d53897129e6e3c4e597910e9db4..0668aa41845e3a03c9b7dcbf60978f8b58bc0371 100644
--- a/doc/rados/operations/health-checks.rst
+++ b/doc/rados/operations/health-checks.rst
@@ -75,6 +75,24 @@ If a monitor is configured to listen for v1 connections on a non-standard port (
  Manager
  -------
  
+MGR_DOWN
+________
+
+All manager daemons are currently down.  The cluster should normally
+have at least one running manager (``ceph-mgr``) daemon.  If no
+manager daemon is running, the cluster's ability to monitor itself will
+be compromised, and parts of the management API will become
+unavailable (for example, the dashboard will not work, and most CLI
+commands that report metrics or runtime state will block).  However,
+the cluster will still be able to perform all IO operations and
+recover from failures.
+
+The down manager daemon should generally be restarted as soon as
+possible to ensure that the cluster can be monitored (e.g., so that
+the ``ceph -s`` information is up to date, and/or metrics can be
+scraped by Prometheus).
+
+
  MGR_MODULE_DEPENDENCY
  _____________________
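
Reviewer note (not part of the patch): the new MGR_DOWN text says the down manager should be restarted promptly, so an operator may want to detect the check programmatically. The following is a minimal sketch, assuming the health report is obtained as JSON (for example from `ceph health --format json`) and that it carries a top-level "checks" object keyed by health check code. The function name `mgr_down` and the sample report are hypothetical and only illustrate the shape; verify the actual output on your release before relying on it.

```python
import json


def mgr_down(health_json: str) -> bool:
    """Return True if a JSON health report lists the MGR_DOWN check.

    Assumption: the report has a top-level "checks" mapping whose keys
    are health check codes such as "MGR_DOWN".
    """
    report = json.loads(health_json)
    return "MGR_DOWN" in report.get("checks", {})


# Illustrative sample only; not captured from a real cluster.
sample = json.dumps({
    "status": "HEALTH_WARN",
    "checks": {
        "MGR_DOWN": {
            "severity": "HEALTH_WARN",
            "summary": {"message": "no active mgr"},
        }
    },
})

if __name__ == "__main__":
    if mgr_down(sample):
        print("MGR_DOWN raised: restart a ceph-mgr daemon")
```

Because the patch notes that many CLI commands reporting runtime state may block while no manager is running, a caller that shells out to collect the report would probably want a timeout around that call.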