git.apps.os.sepia.ceph.com Git - ceph.git/commit
osd/PG: fix _finish_recovery vs repair race
author xie xingguo <xie.xingguo@zte.com.cn>
Sat, 31 Aug 2019 02:17:57 +0000 (10:17 +0800)
committer Nathan Cutler <ncutler@suse.com>
Sun, 15 Dec 2019 15:57:49 +0000 (16:57 +0100)
commit 7a41371c0943f8c3ba55d9d7900b48c4966899ca
tree a8ad283ee8e50a55109ca78af307c9bebe80730c
parent 8ba6679b72bf16d9be9bf87849a9179e79cb35f4

On detecting a corrupted object, the primary may automatically
repair it by reusing the existing recovery procedure. This turned
out to race with a previous, not yet finished _finish_recovery
callback: that callback might go on to purge strays that we still
want to pull data from.

Fix by re-checking if there are any newly added missing objects when
executing _finish_recovery.
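
The re-check described above can be illustrated with a minimal,
self-contained sketch. This is not the actual src/osd/PG.cc change:
ToyPG, mark_for_repair, finish_recovery, and the missing/strays
members are hypothetical names made up for illustration, and the real
PG class tracks this state quite differently.

```cpp
#include <set>
#include <string>

// Toy stand-in for a placement group. Every name here is hypothetical;
// none of this is the real Ceph PG class.
struct ToyPG {
  std::set<std::string> missing;  // objects the primary still has to recover
  std::set<int> strays;           // ids of OSDs holding leftover copies

  // Assumption for this sketch: auto-repair marks a corrupted object as
  // missing again, which hands it back to the recovery machinery.
  void mark_for_repair(const std::string& oid) { missing.insert(oid); }

  // Deferred completion callback of an earlier recovery round.
  // Returns true if the strays were purged, false if the purge was skipped.
  bool finish_recovery() {
    if (!missing.empty()) {
      // A repair added missing objects after this callback was queued.
      // The strays may still be needed as a source, so keep them.
      return false;
    }
    strays.clear();  // nothing left to recover: safe to drop the strays
    return true;
  }
};
```

In this sketch, calling mark_for_repair() before a queued
finish_recovery() runs makes the callback return false and leave the
strays in place; once the missing set is empty again, the next call
purges them.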

Note that before https://github.com/ceph/ceph/pull/29756 we might
instead have had to call needs_recovery to catch the race condition,
since we did not evict the PG from the clean state when triggering
an auto-repair.

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
(manual backport of d96e53285b4e748eacda314bf0958b87cfa42130)

Conflicts:
        src/osd/PG.cc
- adjusted if conditional for luminous
- did not add the comment or the state_clear(PG_STATE_REPAIR) call. Those
  lines were moved but don't exist in luminous.
src/osd/PG.cc