Reverts the change from https://github.com/ceph/ceph/pull/53993
and instead directly clears the daemon health metrics of OSDs that
are both down and out. The former approach of removing down/out OSDs
from the daemon state had undesirable consequences for stats output,
including the prometheus exporter.

Fixes: https://tracker.ceph.com/issues/66168
Signed-off-by: Cory Snyder <csnyder@1111systems.com>
(cherry picked from commit 282558cf40274366360bb3b1ec0fa102fbb592a6)

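The idea behind the change can be illustrated with a small, self-contained sketch: keep every OSD's entry in the daemon index, and only clear its health metrics, under the per-daemon lock, when the OSD is both down and out. The types `DaemonStateSketch` and `OsdMapSketch` and the function `clear_down_out_health` are hypothetical stand-ins invented for illustration. They are not Ceph's real `DaemonState`, `DaemonStateIndex`, or `OSDMap` APIs.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a per-daemon record (NOT Ceph's DaemonState);
// it holds only the fields this sketch needs.
struct DaemonStateSketch {
  std::mutex lock;                                // guards the fields below
  std::vector<std::string> daemon_health_metrics; // e.g. slow-op reports
  std::map<std::string, double> perf_counters;    // still exported as stats
};
using DaemonPtr = std::shared_ptr<DaemonStateSketch>;

// Hypothetical, minimal OSD map view: the sets of OSDs that are up / in.
struct OsdMapSketch {
  std::set<int> up;
  std::set<int> in;
  bool is_down(int id) const { return up.count(id) == 0; }
  bool is_out(int id) const { return in.count(id) == 0; }
};

// Clear health metrics for OSDs that are both down and out, but leave the
// entries in the index so stats consumers still see those daemons.
void clear_down_out_health(const OsdMapSketch& osd_map,
                           std::map<int, DaemonPtr>& daemons) {
  for (auto& [osd_id, daemon] : daemons) {
    if (daemon && osd_map.is_out(osd_id) && osd_map.is_down(osd_id)) {
      std::lock_guard<std::mutex> l(daemon->lock);
      daemon->daemon_health_metrics.clear();
    }
  }
}
```

In this sketch an OSD that is down but still in keeps its health metrics, and a down-and-out OSD keeps its entry and perf counters. Only its health metrics are dropped, which is the trade-off the commit message describes compared with removing the daemon from the state entirely.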
if (daemon_state.is_updating(k)) {
continue;
}
+
+ DaemonStatePtr daemon = daemon_state.get(k);
+
+ if (daemon && osd_map.is_out(osd_id) && osd_map.is_down(osd_id)) {
+ std::lock_guard l(daemon->lock);
+ daemon->daemon_health_metrics.clear();
+ }
bool update_meta = false;
- if (daemon_state.exists(k)) {
+ if (daemon) {
if (osd_map.get_up_from(osd_id) == osd_map.get_epoch()) {
dout(4) << "Mgr::handle_osd_map: osd." << osd_id
<< " joined cluster at " << "e" << osd_map.get_epoch()