From 9db84be4b76c566cb4b5bd17662cffd9d2f0bd88 Mon Sep 17 00:00:00 2001
From: John Wilkins <john.wilkins@inktank.com>
Date: Tue, 4 Sep 2012 16:33:47 -0700
Subject: [PATCH] doc: Added monitor failure recovery. Will be re-factored
 again soon.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
---
 doc/cluster-ops/troubleshooting-mon.rst | 31 +++++++++++++++++++++++++
 1 file changed, 31 insertions(+)
 create mode 100644 doc/cluster-ops/troubleshooting-mon.rst

diff --git a/doc/cluster-ops/troubleshooting-mon.rst b/doc/cluster-ops/troubleshooting-mon.rst
new file mode 100644
index 0000000000000..e2b834acffbf2
--- /dev/null
+++ b/doc/cluster-ops/troubleshooting-mon.rst
@@ -0,0 +1,31 @@
+==================================
+ Recovering from Monitor Failures
+==================================
+
+In production clusters, we recommend running a minimum of three
+monitors. The failure of a single monitor should not take down the
+entire monitor cluster, provided that a majority of the monitors
+remain available. As long as a majority is available, the remaining
+monitors can form a quorum.
+
+When you check your cluster's health, you may notice that a monitor
+has failed. For example::
+
+	ceph health
+	HEALTH_WARN 1 mons down, quorum 0,2
+
+For additional detail, you may check the cluster status::
+
+	ceph status
+	HEALTH_WARN 1 mons down, quorum 0,2
+	mon.b (rank 1) addr 192.168.106.220:6790/0 is down (out of quorum)
+
+In most cases, you can simply restart the affected monitor.
+For example::
+
+	service ceph -a restart {failed-mon}
+
+If there are not enough monitors to form a quorum, the ``ceph``
+command will block while trying to reach the cluster. In this situation,
+you need to get enough ``ceph-mon`` daemons running to form a quorum
+before doing anything else with the cluster.
\ No newline at end of file
-- 
2.39.5