We hit a couple of bugs because discover_all_missing() sends a pg_query
to an OSD that was marked stray and has already been purged. This crashes
the state machine on the purged OSD. Fix this by skipping any purged
peers.
Fixes: https://tracker.ceph.com/issues/41317
Fixes: https://tracker.ceph.com/issues/40963
Signed-off-by: Neha Ojha <nojha@redhat.com>
(cherry picked from commit 4960f579a234f9984d73767fde073c419e884c17)
Conflicts:
	src/osd/PeeringState.cc
	- file does not exist in mimic; made the changes manually in
	  src/osd/PG.cc
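
For illustration only, the skip described above can be sketched outside of Ceph. This is a minimal standalone sketch, not the code in src/osd/PG.cc: PeerId and peers_to_query() are hypothetical stand-ins (the real code iterates pg_shard_t values and checks peer_purged), and std::cout stands in for dout().

```cpp
#include <iostream>
#include <set>
#include <vector>

// Hypothetical stand-in for pg_shard_t; only an OSD id is modelled here.
using PeerId = int;

// Returns the candidates that would still be queried, i.e. every
// candidate that is not in the purged set. Order is preserved.
std::vector<PeerId> peers_to_query(const std::vector<PeerId>& candidates,
                                   const std::set<PeerId>& purged) {
  std::vector<PeerId> to_query;
  for (PeerId peer : candidates) {
    if (purged.count(peer)) {
      // Mirrors the intent of the fix: never query a purged peer.
      std::cout << "skipping purged osd." << peer << "\n";
      continue;
    }
    to_query.push_back(peer);
  }
  return to_query;
}
```

With candidates {1, 2, 3} and purged {2}, only peers 1 and 3 remain in the result, so no query would be sent to the purged peer.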
       continue;
     }
 
+    if (peer_purged.count(peer)) {
+      dout(20) << __func__ << " skipping purged osd." << peer << dendl;
+      continue;
+    }
+
     map<pg_shard_t, pg_info_t>::const_iterator iter = peer_info.find(peer);
     if (iter != peer_info.end() &&
         (iter->second.is_empty() || iter->second.dne())) {