]> git-server-git.apps.pok.os.sepia.ceph.com Git - ceph.git/commitdiff
osd/PG: fix _finish_recovery vs repair race 30059/head
authorxie xingguo <xie.xingguo@zte.com.cn>
Sat, 31 Aug 2019 02:17:57 +0000 (10:17 +0800)
committerxie xingguo <xie.xingguo@zte.com.cn>
Mon, 2 Sep 2019 00:44:20 +0000 (08:44 +0800)
On detecting a corrupted object, primary may automatically
repair that object by leveraging the existing recovery procedure,
which turned out to be racy with a previous unfinished _finish_recovery
callback - the problem would then be that _finish_recovery might
continue to purge some strays that we still want to pull data from.

Fix by re-checking if there are any newly added missing objects when
executing _finish_recovery.

Note that before https://github.com/ceph/ceph/pull/29756 we might
instead have to call needs_recovery to catch the race condition
since we did not evict pg from clean state when triggering an auto-repair..

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
src/osd/PG.cc

index 88d86957ad9ebcef113f71af7cccb39f5a0c5174..6556d13301e176ee84ada9c47dd6f78a431e212b 100644 (file)
@@ -494,11 +494,12 @@ Context *PG::finish_recovery()
 void PG::_finish_recovery(Context *c)
 {
   std::scoped_lock locker{*this};
-  // When recovery is initiated by a repair, that flag is left on
-  state_clear(PG_STATE_REPAIR);
-  if (recovery_state.is_deleting()) {
+  if (recovery_state.is_deleting() || !is_clean()) {
+    dout(10) << __func__ << " raced with delete or repair" << dendl;
     return;
   }
+  // When recovery is initiated by a repair, that flag is left on
+  state_clear(PG_STATE_REPAIR);
   if (c == finish_sync_event) {
     dout(10) << "_finish_recovery" << dendl;
     finish_sync_event = 0;