crimson/osd: fix crash in committed_osd_maps when an OSD is removed
OSDMap::is_down(osd) is defined as !is_up(osd), and is_up() gates on
exists(osd). This means is_down() returns true for OSDs that have
been *removed* from the map (EXISTS flag cleared), not just marked
down.
committed_osd_maps() iterates over epochs [first, last], and for each
epoch over all OSDs in old_map, calling get_cluster_addrs() for any
OSD that was up in old_map and is_down() in the current epoch.
get_cluster_addrs() asserts exists(osd), so when that OSD has been
removed the assertion fires.
# Two rapid OSDMap changes; the monitor batches them into one message.
ceph osd down 2
ceph osd purge 2 --yes-i-really-mean-it
# osd.0 and osd.1 call committed_osd_maps(N, N+1). Before this fix
# old_map is set once before the loop and never updated, so in
# iteration N+1 the comparison is still old_map(N-1) vs osdmap(N+1):
#
# old_map->is_up(2)=true (osd.2 was up at N-1)
# osdmap->exists(2)=false (purged in N+1)
# osdmap->is_down(2)=true (!is_up, since !exists -> true)
# -> get_cluster_addrs(2) asserts -> crash
#
# OSDMap.h: ceph_assert(exists(osd)) [in get_cluster_addrs()]
# Signal 6 (SIGABRT)
Note: 'ceph osd destroy' does NOT clear the EXISTS flag; it only sets
CEPH_OSD_DESTROYED. The EXISTS flag is cleared by 'osd rm', which
'osd purge' calls internally after 'osd destroy'.
Fix: advance old_map at the end of each iteration so the comparison
is pairwise (N-1 vs N, then N vs N+1, ...), matching classic
OSD::advance_map at src/osd/OSD.cc:8615. In the reproducer,
iteration N marks osd.2 down using osdmap(N) (where osd.2 still
exists), then sets old_map = osdmap(N). Iteration N+1 starts with
old_map(N)->is_up(2)=false (osd.2 was DOWN in N), so the condition
short-circuits and get_cluster_addrs() is never called on the new
map.
No explicit !exists branch is needed. The monitor produces a
separate epoch for each of 'osd down' / 'osd destroy' / 'osd rm', so
an OSD can only transition UP -> REMOVED through at least one
intermediate DOWN epoch in any batched MOSDMap, and the pairwise
comparison short-circuits before the assert can fire.