From: Sage Weil
Date: Mon, 24 Jun 2013 23:37:29 +0000 (-0700)
Subject: osd: tolerate racing threads starting recovery ops
X-Git-Tag: v0.61.8~8
X-Git-Url: http://git.apps.os.sepia.ceph.com/?a=commitdiff_plain;h=4433f9ad8b338b6a55e205602434b307287bfaa3;p=ceph.git

osd: tolerate racing threads starting recovery ops

We sample the (max - active) recovery ops to know how many to start, but
do not hold the lock over the full duration, such that it is possible to
start too many ops.  This isn't problematic except that our condition
checks for being == max but not beyond it, and we will continue to start
recovery ops when we shouldn't.  Fix this by adjusting the conditional
to be <=.

Reported-by: Stefan Priebe
Signed-off-by: Sage Weil
Reviewed-by: David Zafman
(cherry picked from commit 3791a1e55828ba541f9d3e8e3df0da8e79c375f9)
---

diff --git a/src/osd/OSD.cc b/src/osd/OSD.cc
index 679f0e143bdd2..7d0a0e3e5e633 100644
--- a/src/osd/OSD.cc
+++ b/src/osd/OSD.cc
@@ -6057,7 +6057,7 @@ void OSD::do_recovery(PG *pg)
   recovery_wq.lock();
   int max = g_conf->osd_recovery_max_active - recovery_ops_active;
   recovery_wq.unlock();
-  if (max == 0) {
+  if (max <= 0) {
     dout(10) << "do_recovery raced and failed to start anything; requeuing " << *pg << dendl;
     recovery_wq.queue(pg);
   } else {
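
The effect of the one-character change can be illustrated in isolation. The following is a minimal sketch, not the OSD code: `RecoveryState`, `recovery_budget`, `should_requeue_old`, and `should_requeue_fixed` are hypothetical names introduced here, standing in for `g_conf->osd_recovery_max_active`, `recovery_ops_active`, and the two versions of the conditional. It assumes racing threads have already pushed the active count past the limit, so the sampled budget is negative.

```cpp
#include <cassert>

// Hypothetical stand-in for the two values sampled under recovery_wq's lock.
struct RecoveryState {
  int max_active;  // plays the role of g_conf->osd_recovery_max_active
  int ops_active;  // plays the role of recovery_ops_active
};

// Remaining number of recovery ops that may be started.
// Negative when racing threads have overshot the limit.
int recovery_budget(const RecoveryState &s) {
  return s.max_active - s.ops_active;
}

// Old check: only an exactly exhausted budget triggers a requeue.
bool should_requeue_old(int budget) { return budget == 0; }

// Fixed check: an exhausted or overshot budget triggers a requeue.
bool should_requeue_fixed(int budget) { return budget <= 0; }
```

With `max_active = 3` and `ops_active = 5` the budget is -2: the old predicate returns false, so more ops would be started, while the fixed predicate returns true and the PG would be requeued instead.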