OSDs may be unaware that they have been marked down and can remain
stuck at an obsolete map in which they are still marked up:
```
host     osd  down_at  stuck_at
ceph-03    9  e712     e711
ceph-03   13  e700     e699
ceph-03   28  e697     e696
ceph-03   48  e697     e696
ceph-03   52  e707     e704
ceph-03   61  e710     e708
ceph-03   73  e712     e710
ceph-03   77  e708     e707
ceph-05   12  e711     e710
ceph-05   21  e703     e702
ceph-05   24  e700     e699
ceph-05   29  e703     e699
ceph-05   41  e711     e710
ceph-05   53  e711     e710
ceph-05   72  e712     e711
```
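Every row above has stuck_at lower than down_at, which is why the OSD
cannot know it was marked down. A minimal sketch of that check follows.
It assumes down_at is the epoch of the map that marked the OSD down and
stuck_at is the newest epoch the OSD holds. OsdRow, unaware_of_down,
epochs_behind and sample_rows are hypothetical names used only for
illustration, not Ceph APIs:
```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical record mirroring one row of the table above.
struct OsdRow {
  std::string host;
  int osd;
  uint32_t down_at;   // assumed: epoch of the map that marked the OSD down
  uint32_t stuck_at;  // assumed: newest map epoch the OSD itself holds
};

// The OSD cannot know it is down if its newest map predates the map
// that marked it down.
inline bool unaware_of_down(const OsdRow& r) {
  return r.stuck_at < r.down_at;
}

// How many epochs the OSD is missing before it reaches the down_at map.
inline uint32_t epochs_behind(const OsdRow& r) {
  return unaware_of_down(r) ? r.down_at - r.stuck_at : 0;
}

// Three rows copied from the table above.
inline std::vector<OsdRow> sample_rows() {
  return {
    {"ceph-03", 9, 712, 711},
    {"ceph-03", 52, 707, 704},
    {"ceph-05", 29, 703, 699},
  };
}
```
For example, osd 29 on ceph-05 is four epochs short of the map that
marked it down.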
Since https://github.com/ceph/ceph/pull/23958, an OSD pings the monitor
periodically if it is stuck at __wait_for_healthy__. In the above
case, however, the OSDs still consider themselves __active__ and
hence are not covered by that fix.
Since these OSDs may still be able to contact the monitors
(otherwise there would be no way for them to be marked up again) and
keep sending beacons, we can simply get them out of the trap by
sharing some newer maps with them.
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
Signed-off-by: runsisi <runsisi@zte.com.cn>
   if (!src.is_osd() ||
       !osdmap.is_up(from) ||
       beacon->get_orig_source_addrs() != osdmap.get_addrs(from)) {
+    if (src.is_osd() && !osdmap.is_up(from)) {
+      // share some new maps with this guy in case it may not be
+      // aware of its own deadness...
+      send_latest(op, beacon->version+1);
+    }
     dout(1) << " ignoring beacon from non-active osd." << from << dendl;
     return false;
   }
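
The decision made by the hunk above can be modelled in isolation. The
following is a simplified sketch, not the monitor code itself: Beacon,
Decision and handle_beacon are hypothetical stand-ins, and it assumes
beacon->version carries the newest map epoch the sending OSD holds, so
that sharing starts at the next epoch:
```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, flattened view of what the real check reads from the
// beacon message and the current OSDMap.
struct Beacon {
  bool from_osd;      // stands in for src.is_osd()
  bool addrs_match;   // stands in for the address comparison
  uint32_t version;   // assumed: newest map epoch the sender holds
};

struct Decision {
  bool ignored;         // beacon dropped as coming from a non-active OSD
  bool share_maps;      // whether newer maps are sent back
  uint32_t share_from;  // first epoch to share; meaningful if share_maps
};

// marked_up stands in for osdmap.is_up(from).
inline Decision handle_beacon(const Beacon& b, bool marked_up) {
  if (!b.from_osd || !marked_up || !b.addrs_match) {
    Decision d{true, false, 0};
    if (b.from_osd && !marked_up) {
      // The sender is an OSD that the map considers down: it may not
      // know that yet, so share maps starting after its own epoch.
      d.share_maps = true;
      d.share_from = b.version + 1;
    }
    return d;
  }
  return Decision{false, false, 0};
}
```
With the first row of the table, an OSD reporting e711 while marked
down would have its beacon ignored as before, but would now also be
sent maps starting at e712, the map that marked it down.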