From 57c4795c006c1dcac26b645e36c0ec81f820a500 Mon Sep 17 00:00:00 2001
From: Sage Weil <sage@redhat.com>
Date: Tue, 8 Jan 2019 21:38:42 -0600
Subject: [PATCH] doc/rados/operations/health-checks: document MON_* health
 warnings

Signed-off-by: Sage Weil <sage@redhat.com>
---
 doc/rados/operations/health-checks.rst | 31 ++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/doc/rados/operations/health-checks.rst b/doc/rados/operations/health-checks.rst
index d43941df8de..d93fc6d2965 100644
--- a/doc/rados/operations/health-checks.rst
+++ b/doc/rados/operations/health-checks.rst
@@ -21,6 +21,37 @@ that are defined by ceph-mgr python modules.
 Definitions
 ===========
 
+Monitor
+-------
+
+MON_DOWN
+________
+
+One or more monitor daemons are currently down.  The cluster requires a
+majority (more than 1/2) of the monitors in order to function.  When
+one or more monitors are down, clients may have a harder time forming
+their initial connection to the cluster as they may need to try more
+addresses before they reach an operating monitor.
+
+The down monitor daemon should generally be restarted as soon as
+possible to reduce the risk of a subsequent monitor failure leading to
+a service outage.
+
+MON_CLOCK_SKEW
+______________
+
+The clocks on the hosts running the ceph-mon monitor daemons are not
+sufficiently well synchronized.  This health alert is raised if the
+cluster detects a clock skew greater than ``mon_clock_drift_allowed``.
+
+This is best resolved by synchronizing the clocks using a tool like
+``ntpd`` or ``chrony``.
+
+If it is impractical to keep the clocks closely synchronized, the
+``mon_clock_drift_allowed`` threshold can also be increased, but this
+value must stay significantly below the ``mon_lease`` interval in
+order for the monitor cluster to function properly.
+
 Manager
 -------
 
-- 
2.39.5