From: Sage Weil <sage@redhat.com>
Date: Mon, 24 Sep 2018 16:12:00 +0000 (-0500)
Subject: doc/rados/troubleshooting-mon: update mondb recovery script
X-Git-Tag: v14.0.1~53^2
X-Git-Url: http://git-server-git.apps.pok.os.sepia.ceph.com/?a=commitdiff_plain;h=refs%2Fpull%2F24249%2Fhead;p=ceph.git

doc/rados/troubleshooting-mon: update mondb recovery script

- some cleanup (e.g., use $ms throughout)
- behave if the local host is in the $hosts list (use $ms.remote)
- be clear about updating all mons
- mon.0 -> mon.foo

Signed-off-by: Sage Weil <sage@redhat.com>
---

diff --git a/doc/rados/troubleshooting/troubleshooting-mon.rst b/doc/rados/troubleshooting/troubleshooting-mon.rst
index dd4c8fdeaa47..58d5bc1470ef 100644
--- a/doc/rados/troubleshooting/troubleshooting-mon.rst
+++ b/doc/rados/troubleshooting/troubleshooting-mon.rst
@@ -397,7 +397,7 @@ might be found in the monitor log::
 
 or::
 
-  Corruption: 1 missing files; e.g.: /var/lib/ceph/mon/mon.0/store.db/1234567.ldb
+  Corruption: 1 missing files; e.g.: /var/lib/ceph/mon/mon.foo/store.db/1234567.ldb
 
 Recovery using healthy monitor(s)
 ---------------------------------
@@ -410,45 +410,51 @@ Recovery using OSDs
 -------------------
 
 But what if all monitors fail at the same time? Since users are encouraged to
-deploy at least three monitors in a Ceph cluster, the chance of simultaneous
+deploy at least three (and preferably five) monitors in a Ceph cluster, the chance of simultaneous
 failure is rare. But unplanned power-downs in a data center with improperly
 configured disk/fs settings could fail the underlying filesystem, and hence
 kill all the monitors. In this case, we can recover the monitor store with the
 information stored in OSDs.::
 
-  ms=/tmp/mon-store
+  ms=/root/mon-store
   mkdir $ms
+
   # collect the cluster map from OSDs
   for host in $hosts; do
-    rsync -avz $ms user@host:$ms
+    rsync -avz $ms/. user@host:$ms.remote
     rm -rf $ms
     ssh user@host <
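
The diff is cut off above at the start of the ssh heredoc. As a rough, non-authoritative sketch of the collect-and-rebuild loop the updated script is aiming at — the heredoc body, the ceph-objectstore-tool invocation, the final ceph-monstore-tool rebuild step, and the /path/to/admin.keyring placeholder are assumptions here, not text recovered from the truncated diff::

  ms=/root/mon-store
  mkdir $ms

  # walk every OSD host, folding each OSD's copy of the cluster maps into
  # the store being accumulated in $ms
  for host in $hosts; do
    # push whatever has been collected so far to the remote side
    rsync -avz $ms/. user@host:$ms.remote
    rm -rf $ms
    # update the remote copy of the store from every OSD on that host
    # (the EOF terminator must sit at the start of a line when this runs)
    ssh user@host <<EOF
      for osd in /var/lib/ceph/osd/ceph-*; do
        ceph-objectstore-tool --data-path \$osd --op update-mon-db --mon-store-path $ms.remote
      done
EOF
    # pull the accumulated store back before moving on to the next host
    rsync -avz user@host:$ms.remote/. $ms
  done

  # rebuild a monitor store from the collected maps; with cephx enabled,
  # pass a keyring holding the mon. and client.admin keys and caps
  ceph-monstore-tool $ms rebuild -- --keyring /path/to/admin.keyring

The idea is that every OSD carries a recent copy of the cluster maps, so replaying them into a fresh store and rebuilding yields a usable monitor database; the rebuilt store.db is then put in place under /var/lib/ceph/mon/mon.foo/store.db on every monitor (after backing up the corrupted copy), which is what the "be clear about updating all mons" and "mon.0 -> mon.foo" bullets refer to.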