When sharing OSD maps with a peer (e.g. during heartbeat in
maybe_share_map), the requested range or newest_map may no longer be
readable (e.g. because of a trim race or a store read failure). In that
case the fallback path tried to send newest_map; if that could not be
loaded either, the code called ceph_abort() and crashed the OSD.

Log the failure and return an empty MOSDMap instead of aborting. The
receiver drops such messages (last <= superblock.get_newest_map()) and
can re-request the maps from the mon.
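
For illustration only, the fallback order and the receiver-side drop
check can be modelled with the standalone sketch below. It is not the
Ceph code: MapStore, MapMessage, build_fallback, last_epoch and
receiver_should_drop are hypothetical stand-ins, and the sketch assumes
that an empty message reports a last epoch of 0.

```cpp
#include <cstdint>
#include <iostream>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-ins for illustration; these are not the real Ceph types.
using epoch_t = std::uint32_t;
using Blob = std::string;

struct MapStore {
  std::map<epoch_t, Blob> incrementals;  // epoch -> encoded incremental map
  std::map<epoch_t, Blob> fulls;         // epoch -> encoded full map

  // Both lookups return false when the epoch is missing (e.g. trimmed).
  bool get_inc(epoch_t e, Blob& out) const {
    auto it = incrementals.find(e);
    if (it == incrementals.end()) return false;
    out = it->second;
    return true;
  }
  bool get_full(epoch_t e, Blob& out) const {
    auto it = fulls.find(e);
    if (it == fulls.end()) return false;
    out = it->second;
    return true;
  }
};

struct MapMessage {
  epoch_t newest_map = 0;
  std::map<epoch_t, Blob> incremental_maps;
  std::map<epoch_t, Blob> maps;
  bool empty() const { return incremental_maps.empty() && maps.empty(); }
};

// Fallback order: incremental map, then full map, then an empty message.
// The last branch logs instead of aborting.
MapMessage build_fallback(const MapStore& store, epoch_t newest) {
  MapMessage m;
  m.newest_map = newest;
  Blob bl;
  if (store.get_inc(newest, bl)) {
    m.incremental_maps[newest] = std::move(bl);
  } else if (store.get_full(newest, bl)) {
    m.maps[newest] = std::move(bl);
  } else {
    std::cerr << "unable to load latest map " << newest
              << ", sending empty map message\n";
  }
  return m;
}

// Highest epoch carried by the message; 0 when it carries none (assumption).
epoch_t last_epoch(const MapMessage& m) {
  epoch_t e = 0;
  if (!m.maps.empty()) e = m.maps.rbegin()->first;
  if (!m.incremental_maps.empty() && m.incremental_maps.rbegin()->first > e)
    e = m.incremental_maps.rbegin()->first;
  return e;
}

// Receiver-side check mirroring "last <= newest known epoch".
bool receiver_should_drop(const MapMessage& m, epoch_t receiver_newest) {
  return last_epoch(m) <= receiver_newest;
}
```

Under these assumptions an empty message always satisfies the drop
check, so the receiver ignores it rather than acting on it.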

Fixes: https://tracker.ceph.com/issues/74800
Signed-off-by: Vinayak Tiwari <tiwarivinayak10@gmail.com>
// send what we have so far
return m;
}
- // send something
+ // send something if we can
bufferlist bl;
if (get_inc_map_bl(m->newest_map, bl)) {
m->incremental_maps[m->newest_map] = std::move(bl);
- } else {
- derr << __func__ << " unable to load latest map " << m->newest_map << dendl;
- if (!get_map_bl(m->newest_map, bl)) {
- derr << __func__ << " unable to load latest full map " << m->newest_map
- << dendl;
- ceph_abort();
- }
+ } else if (get_map_bl(m->newest_map, bl)) {
m->maps[m->newest_map] = std::move(bl);
+ } else {
+ derr << __func__ << " unable to load latest map " << m->newest_map
+ << ", sending empty map message (peer will drop or re-request from mon)"
+ << dendl;
}
return m;
}