From: Kefu Chai
Date: Tue, 25 May 2021 06:17:34 +0000 (+0800)
Subject: mon/OSDMonitor: drop stale failure_info even if can_mark_down()
X-Git-Tag: v17.1.0~1824^2
X-Git-Url: http://git-server-git.apps.pok.os.sepia.ceph.com/?a=commitdiff_plain;h=df6916a56841f89d66fd211729a0a7adc13042cf;p=ceph.git

mon/OSDMonitor: drop stale failure_info even if can_mark_down()

in a124ee85b03e15f4ea371358008ecac65f9f4e50, we added a check to drop
stale failure_info reports. but if the osdmap does not prohibit us from
marking the osd in question down, the branch checking for stale info is
never executed. since marking an osd down is allowed in general, the fix
in a124ee85b03e15f4ea371358008ecac65f9f4e50 does not take effect.

in this change, we check for a stale failure report of the osd in
question as long as that osd is not marked down in the same function.

this should address the slow ops caused by failure reports.

Fixes: https://tracker.ceph.com/issues/50964
Signed-off-by: Kefu Chai
---

diff --git a/src/mon/OSDMonitor.cc b/src/mon/OSDMonitor.cc
index 7e8c3450bc60..601ddedf859a 100644
--- a/src/mon/OSDMonitor.cc
+++ b/src/mon/OSDMonitor.cc
@@ -3181,8 +3181,9 @@ bool OSDMonitor::check_failures(utime_t now)
   auto p = failure_info.begin();
   while (p != failure_info.end()) {
     auto& [target_osd, fi] = *p;
-    if (can_mark_down(target_osd)) {
-      found_failure |= check_failure(now, target_osd, fi);
+    if (can_mark_down(target_osd) &&
+        check_failure(now, target_osd, fi)) {
+      found_failure = true;
       ++p;
     } else if (is_failure_stale(now, fi)) {
       dout(10) << " dropping stale failure_info for osd." << target_osd
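To illustrate why the old loop never reached the staleness check, here is a minimal, self-contained C++17 sketch of the loop before and after this change. Only the loop shape follows the hunk. `FailureInfo`, `kStaleAfter`, and the stub bodies of `can_mark_down()`, `check_failure()` and `is_failure_stale()` are hypothetical stand-ins, not Ceph's real types or logic. The `erase()` in the stale branch and the trailing `else { ++p; }` are also assumptions, since the hunk ends at the `dout` line.

```cpp
#include <cassert>
#include <map>

// Hypothetical, heavily simplified stand-in for the monitor's failure state.
// The real OSDMonitor uses utime_t, failure_info_t and the OSDMap instead.
struct FailureInfo {
  double last_report;   // time of the most recent failure report
  bool enough_reports;  // stand-in for "check_failure() would mark it down"
};

constexpr double kStaleAfter = 10.0;  // hypothetical fixed staleness window

// The common case: the osdmap does not forbid marking this osd down.
bool can_mark_down(int /*osd*/) { return true; }

// Returns true if the osd was marked down in this pass (stub).
bool check_failure(double /*now*/, int /*osd*/, const FailureInfo& fi) {
  return fi.enough_reports;
}

bool is_failure_stale(double now, const FailureInfo& fi) {
  return now - fi.last_report > kStaleAfter;
}

// Loop shape before the fix: whenever can_mark_down() is true, the
// stale branch is unreachable, so stale entries are never dropped.
bool check_failures_old(double now, std::map<int, FailureInfo>& failure_info) {
  bool found_failure = false;
  auto p = failure_info.begin();
  while (p != failure_info.end()) {
    auto& [target_osd, fi] = *p;
    if (can_mark_down(target_osd)) {
      found_failure |= check_failure(now, target_osd, fi);
      ++p;
    } else if (is_failure_stale(now, fi)) {
      p = failure_info.erase(p);  // assumed cleanup for the stale entry
    } else {
      ++p;
    }
  }
  return found_failure;
}

// Loop shape after the fix: the stale branch runs whenever the osd was
// not marked down in this pass.
bool check_failures_new(double now, std::map<int, FailureInfo>& failure_info) {
  bool found_failure = false;
  auto p = failure_info.begin();
  while (p != failure_info.end()) {
    auto& [target_osd, fi] = *p;
    if (can_mark_down(target_osd) && check_failure(now, target_osd, fi)) {
      found_failure = true;
      ++p;
    } else if (is_failure_stale(now, fi)) {
      p = failure_info.erase(p);  // assumed cleanup for the stale entry
    } else {
      ++p;
    }
  }
  return found_failure;
}
```

With a single stale report that is not enough to mark the osd down, `check_failures_old()` keeps the entry indefinitely while `check_failures_new()` drops it on the next pass. An osd that is actually marked down still takes the first branch and keeps its entry, as before.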