From: xie xingguo
Date: Sat, 31 Aug 2019 02:17:57 +0000 (+0800)
Subject: osd/PG: fix _finish_recovery vs repair race
X-Git-Tag: v12.2.13~8^2~1
X-Git-Url: http://git-server-git.apps.pok.os.sepia.ceph.com/?a=commitdiff_plain;h=7a41371c0943f8c3ba55d9d7900b48c4966899ca;p=ceph.git

osd/PG: fix _finish_recovery vs repair race

On detecting a corrupted object, primary may automatically repair
that object by leveraging the existing recovery procedure, which
turned out to be racy with a previous unfinished _finish_recovery
callback - the problem would then be that _finish_recovery might
continue to purge some strays that we still want to pull data from.

Fix by re-checking if there are any newly added missing objects
when executing _finish_recovery.

Note that before https://github.com/ceph/ceph/pull/29756 we might
instead have to call needs_recovery to catch the race condition
since we did not evict pg from clean state when triggering an
auto-repair..

Signed-off-by: xie xingguo
(manual backport of d96e53285b4e748eacda314bf0958b87cfa42130)

Conflicts:
	src/osd/PG.cc
	- adjusted if conditional for luminous
	- did not add the comment nor state_clear(PG_STATE_REPAIR);.
	  Those lines were moved but don't exist in luminous.
---

diff --git a/src/osd/PG.cc b/src/osd/PG.cc
index a9611f31ca9..afbcac17ebc 100644
--- a/src/osd/PG.cc
+++ b/src/osd/PG.cc
@@ -2321,7 +2321,8 @@ void PG::finish_recovery(list& tfin)
 void PG::_finish_recovery(Context *c)
 {
   lock();
-  if (deleting) {
+  if (deleting || !is_clean()) {
+    dout(10) << __func__ << " raced with delete or repair" << dendl;
     unlock();
     return;
   }