git.apps.os.sepia.ceph.com Git - ceph-ci.git/commit
osd/PG: fix DeferRecovery vs AllReplicasRecovered race
author Sage Weil <sage@redhat.com>
Fri, 27 Apr 2018 20:00:58 +0000 (15:00 -0500)
committer Sage Weil <sage@redhat.com>
Sun, 29 Apr 2018 21:00:41 +0000 (16:00 -0500)
commit cfe59cf20c4b09aa7b25c3f9a724a01380699744
tree 056c92ddd611609f41849b2b0f47ad8e3b40fdc5
parent 049e9097a9fb0158a0a9cf52b1c04caa5b8c96a8

- A DeferRecovery event is queued by AsyncReserver due to a preemption
  event.  We are in Recovering state with the RECOVERING bit set.
- We finish recovery, clear the RECOVERING state bit, and queue
  AllReplicasRecovered from PrimaryLogPG::start_recovery_ops().
- The DeferRecovery event arrives, moving us from Recovering -> NotRecovering.
- The AllReplicasRecovered event arrives, crashing us.

This is all hard to deal with because the events are queued and may
arrive later.  Solve the problem here by tolerating a delayed
DeferRecovery event: if the RECOVERING pg state bit isn't set, ignore
it (it's old).  The async reserver cancel events are unpredictable.
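
The race and the guard can be illustrated with a small standalone model.
This is only a sketch under stated assumptions, not the code in
src/osd/PG.cc: ToyPG, State, Event, STATE_RECOVERING, and run_race are
hypothetical names, the real state machine is far richer, and a "crash"
is modeled here as handle() returning false.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// Hypothetical, simplified model of the scenario; not the Ceph implementation.
enum class State { Recovering, NotRecovering, Recovered };
enum class Event { DeferRecovery, AllReplicasRecovered };

// Stand-in for the RECOVERING pg state bit mentioned in the commit message.
constexpr uint32_t STATE_RECOVERING = 1u << 0;

struct ToyPG {
  State state = State::Recovering;
  uint32_t state_bits = STATE_RECOVERING;
  std::deque<Event> queue;      // events are queued and delivered later
  bool tolerate_stale_defer;    // true = apply the guard described above

  explicit ToyPG(bool tolerate) : tolerate_stale_defer(tolerate) {}

  // Recovery completes: clear the bit and queue AllReplicasRecovered.
  void finish_recovery() {
    assert(state_bits & STATE_RECOVERING);
    state_bits &= ~STATE_RECOVERING;
    queue.push_back(Event::AllReplicasRecovered);
  }

  // Returns false when an event reaches a state that cannot handle it
  // (the model's stand-in for a crash).
  bool handle(Event e) {
    switch (e) {
    case Event::DeferRecovery:
      if (state != State::Recovering)
        return false;
      // The guard: the bit is already cleared, so this event is old.
      if (tolerate_stale_defer && !(state_bits & STATE_RECOVERING))
        return true;  // discard, stay in Recovering
      state = State::NotRecovering;
      return true;
    case Event::AllReplicasRecovered:
      if (state != State::Recovering)
        return false;
      state = State::Recovered;
      return true;
    }
    return false;
  }

  // Deliver queued events in order; stop at the first unhandled one.
  bool drain() {
    while (!queue.empty()) {
      Event e = queue.front();
      queue.pop_front();
      if (!handle(e))
        return false;
    }
    return true;
  }
};

// Replays the four steps listed above; returns true if no "crash" occurs.
bool run_race(bool tolerate) {
  ToyPG pg(tolerate);
  pg.queue.push_back(Event::DeferRecovery);  // queued by the reserver
  pg.finish_recovery();                      // clears bit, queues second event
  return pg.drain();                         // both events arrive late
}
```

In this model, run_race(false) reproduces the failure because the late
DeferRecovery moves the PG out of Recovering before AllReplicasRecovered
arrives, while run_race(true) drops the stale event and completes.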

Fixes: http://tracker.ceph.com/issues/23860
Signed-off-by: Sage Weil <sage@redhat.com>
src/osd/PG.cc