From: xie xingguo
Date: Mon, 10 Sep 2018 07:15:17 +0000 (+0800)
Subject: mon/OSDMonitor: share new maps with even non-active osds
X-Git-Tag: v14.0.1~341^2~1
X-Git-Url: http://git.apps.os.sepia.ceph.com/?a=commitdiff_plain;h=79f480442f5219b5cad3b7979446fa79cff4986b;p=ceph.git

mon/OSDMonitor: share new maps with even non-active osds

OSDs may be unaware that they have been marked down and remain stuck
at an obsolete map in which they are still marked up:

```
host     osd  down_at  stuck_at
ceph-03    9  e712     e711
ceph-03   13  e700     e699
ceph-03   28  e697     e696
ceph-03   48  e697     e696
ceph-03   52  e707     e704
ceph-03   61  e710     e708
ceph-03   73  e712     e710
ceph-03   77  e708     e707
ceph-05   12  e711     e710
ceph-05   21  e703     e702
ceph-05   24  e700     e699
ceph-05   29  e703     e699
ceph-05   41  e711     e710
ceph-05   53  e711     e710
ceph-05   72  e712     e711
```

Since https://github.com/ceph/ceph/pull/23958, an OSD pings the monitor
periodically if it is stuck at __wait_for_healthy__. In the case above,
however, the OSDs still consider themselves __active__ and therefore
miss that fix.

Since these OSDs can probably still contact the monitors (otherwise
there would be no way for them to be marked up again) and keep sending
beacons, we can get them out of the trap simply by sharing some new
maps with them.

Signed-off-by: xie xingguo
Signed-off-by: runsisi
---

diff --git a/src/mon/OSDMonitor.cc b/src/mon/OSDMonitor.cc
index dc7aefd493ed6..0cc2206cafee1 100644
--- a/src/mon/OSDMonitor.cc
+++ b/src/mon/OSDMonitor.cc
@@ -3572,6 +3572,11 @@ bool OSDMonitor::prepare_beacon(MonOpRequestRef op)
   if (!src.is_osd() ||
       !osdmap.is_up(from) ||
       beacon->get_orig_source_addrs() != osdmap.get_addrs(from)) {
+    if (src.is_osd() && !osdmap.is_up(from)) {
+      // share some new maps with this guy in case it may not be
+      // aware of its own deadness...
+      send_latest(op, beacon->version+1);
+    }
     dout(1) << " ignoring beacon from non-active osd." << from << dendl;
     return false;
   }
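
The decision made in `prepare_beacon` can be illustrated with a small standalone model. This is only a sketch, not Ceph code: `ToyOsdMap`, `Beacon`, `BeaconResult` and `handle_beacon` are hypothetical names, every sender is assumed to be an OSD, the address comparison is reduced to a boolean, and the function only reports the first epoch that would be passed to `send_latest` rather than modelling what `send_latest` does with it.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Hypothetical, simplified stand-ins for monitor state. None of these
// types exist in Ceph under these names.
using epoch_t = std::uint32_t;

struct ToyOsdMap {
  epoch_t epoch = 0;   // latest epoch known to the monitor
  std::set<int> up;    // ids of OSDs currently marked up
  bool is_up(int osd) const { return up.count(osd) != 0; }
};

struct Beacon {
  int from;            // id of the sending OSD
  epoch_t version;     // newest map epoch the sender reports having
  bool addrs_match;    // stands in for the address comparison in the patch
};

struct BeaconResult {
  bool accepted;       // false: the beacon is ignored
  epoch_t share_from;  // first epoch to share with the sender, 0 if none
};

// Mirrors the patched branch: a beacon from an OSD that is not up is
// still ignored, but the sender is offered maps newer than the epoch it
// reported, so it can learn that it was marked down.
BeaconResult handle_beacon(const ToyOsdMap& map, const Beacon& b) {
  assert(b.from >= 0);  // OSD ids are non-negative in this model
  if (!map.is_up(b.from) || !b.addrs_match) {
    epoch_t share_from = 0;
    if (!map.is_up(b.from)) {
      share_from = b.version + 1;
    }
    return {false, share_from};
  }
  return {true, 0};
}
```

Taking osd 9 from the table above (marked down at e712, stuck at e711): with a map in which osd 9 is not up, `handle_beacon(map, Beacon{9, 711, true})` leaves the beacon ignored and reports e712 as the first epoch to share. A beacon from an OSD that is up, or one that is up but reports mismatched addresses, triggers no map sharing in this model.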