osd/PG: fix _finish_recovery vs repair race
author xie xingguo <xie.xingguo@zte.com.cn>
Sat, 31 Aug 2019 02:17:57 +0000 (10:17 +0800)
committer Nathan Cutler <ncutler@suse.com>
Sun, 15 Dec 2019 15:57:49 +0000 (16:57 +0100)
On detecting a corrupted object, the primary may automatically
repair that object by leveraging the existing recovery procedure.
This turned out to race with a previous, still-unfinished
_finish_recovery callback: _finish_recovery might go on to purge
some strays that we still want to pull data from.

Fix by re-checking if there are any newly added missing objects when
executing _finish_recovery.

Note that before https://github.com/ceph/ceph/pull/29756 we might
instead have had to call needs_recovery to catch the race condition,
since we did not evict the pg from the clean state when triggering an
auto-repair.

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
(manual backport of d96e53285b4e748eacda314bf0958b87cfa42130)

Conflicts:
        src/osd/PG.cc
- adjusted if conditional for luminous
- did not add the comment nor state_clear(PG_STATE_REPAIR);. Those lines were
  moved but don't exist in luminous.

src/osd/PG.cc

index a9611f31ca91bbc8161cd7fa104af9a396adba7f..afbcac17ebc0b4110375484409d3269fbe899f34 100644
@@ -2321,7 +2321,8 @@ void PG::finish_recovery(list<Context*>& tfin)
 void PG::_finish_recovery(Context *c)
 {
   lock();
-  if (deleting) {
+  if (deleting || !is_clean()) {
+    dout(10) << __func__ << " raced with delete or repair" << dendl;
     unlock();
     return;
   }
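
The guard added in the hunk above can be illustrated with a small standalone model. This is only a sketch, not Ceph code: ToyPG, start_auto_repair and finish_recovery_cb are hypothetical names, and the "clean" flag merely stands in for is_clean(). It assumes, as the commit message describes, that an auto-repair takes the pg out of the clean state before the deferred callback runs.

```cpp
#include <mutex>
#include <set>
#include <string>

// Toy stand-in for a placement group. NOT the real Ceph PG class;
// every name here is hypothetical and exists only for illustration.
struct ToyPG {
  std::mutex lock;
  bool deleting = false;
  bool clean = true;               // models the result of is_clean()
  std::set<std::string> strays;    // stray copies we may still pull from
  std::set<std::string> missing;   // objects waiting to be recovered

  bool is_clean() const { return clean; }

  // Models an auto-repair: the corrupted object becomes missing and
  // the pg leaves the clean state (assumed behaviour, see lead-in).
  void start_auto_repair(const std::string& oid) {
    std::lock_guard<std::mutex> l(lock);
    missing.insert(oid);
    clean = false;
  }

  // Models the deferred _finish_recovery callback.
  // Returns true if the strays were purged, false if it bailed out.
  bool finish_recovery_cb() {
    std::lock_guard<std::mutex> l(lock);
    if (deleting || !is_clean()) {
      return false;                // raced with delete or repair
    }
    strays.clear();                // safe: nothing left to pull
    return true;
  }
};
```

If start_auto_repair runs between queuing the callback and executing it, finish_recovery_cb returns false and the strays are kept; without the !is_clean() check it would have cleared them while an object was still missing.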