git.apps.os.sepia.ceph.com Git - ceph.git/commit
osd/PG: fix DeferRecovery vs AllReplicasRecovered race 21964/head
author Sage Weil <sage@redhat.com>
Fri, 27 Apr 2018 20:00:58 +0000 (15:00 -0500)
committer Prashant D <pdhange@redhat.com>
Mon, 14 May 2018 02:01:10 +0000 (22:01 -0400)
commit 82d5010e04fca6b0ea00bda298cb07f235d885a6
tree 7aae6af8a49a2e7a9747b59943f89f2c0b0b8a6e
parent 07b0d0ace717990b36e358e4ecdc17665f1e9045
osd/PG: fix DeferRecovery vs AllReplicasRecovered race

- A DeferRecovery event is queued by the AsyncReserver because of a
  preemption.  We are in the Recovering state with the RECOVERING bit set.
- We finish recovery, clear the RECOVERING state bit, and queue
  AllReplicasRecovered from PrimaryLogPG::start_recovery_ops().
- The DeferRecovery event arrives, moving us from Recovering -> NotRecovering.
- The AllReplicasRecovered event arrives while we are in NotRecovering,
  crashing us.

This is hard to handle in general because the events are queued and may
arrive late, and the async reserver's cancel events are unpredictable.
Solve the problem here by tolerating a delayed DeferRecovery event: if
the RECOVERING pg state bit isn't set, the event is stale, so ignore it.
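
The sequence above can be replayed in a small standalone model.  This is
only a sketch, not the code in src/osd/PG.cc: ToyPG, replay_race, the
tolerate_stale_defer flag and the Recovered state are hypothetical names
made up for illustration, and a thrown exception stands in for the crash.

```cpp
#include <cassert>
#include <deque>
#include <stdexcept>

// Toy model only: state and event names mirror the commit message,
// everything else is an assumption for illustration.
enum class State { Recovering, NotRecovering, Recovered };
enum class Event { DeferRecovery, AllReplicasRecovered };

struct ToyPG {
  State state = State::Recovering;
  bool recovering_bit = true;   // stands in for the RECOVERING pg state bit
  bool tolerate_stale_defer;    // true models the behaviour described above
  std::deque<Event> queue;      // events are queued and delivered later

  explicit ToyPG(bool tolerate) : tolerate_stale_defer(tolerate) {}

  // Recovery completes: clear the bit and queue AllReplicasRecovered.
  void finish_recovery() {
    assert(state == State::Recovering);
    recovering_bit = false;
    queue.push_back(Event::AllReplicasRecovered);
  }

  void handle(Event e) {
    switch (e) {
    case Event::DeferRecovery:
      if (tolerate_stale_defer && !recovering_bit)
        return;  // stale event: recovery already finished, ignore it
      if (state == State::Recovering)
        state = State::NotRecovering;
      break;
    case Event::AllReplicasRecovered:
      if (state != State::Recovering)
        throw std::logic_error("AllReplicasRecovered outside Recovering");
      state = State::Recovered;
      break;
    }
  }

  // Deliver queued events in arrival order.
  void drain() {
    while (!queue.empty()) {
      Event e = queue.front();
      queue.pop_front();
      handle(e);
    }
  }
};

// Replays the four steps; returns true if the model ends up Recovered
// and false if the late AllReplicasRecovered hits the wrong state.
bool replay_race(bool tolerate) {
  ToyPG pg(tolerate);
  pg.queue.push_back(Event::DeferRecovery);  // queued by the reserver
  pg.finish_recovery();                      // bit cleared, event queued
  try {
    pg.drain();
  } catch (const std::logic_error&) {
    return false;
  }
  return pg.state == State::Recovered;
}
```

In this model replay_race(false) reproduces the failure, while
replay_race(true) drops the stale DeferRecovery and reaches Recovered.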

Fixes: http://tracker.ceph.com/issues/23860
Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit cfe59cf20c4b09aa7b25c3f9a724a01380699744)
src/osd/PG.cc