osd/PG: fix DeferRecovery vs AllReplicasRecovered race
- A DeferRecovery event is queued by AsyncReserver due to a preemption
event. We are in the Recovering state with the RECOVERING bit set.
- We finish recovery, clear RECOVERING state bit, and queue
AllReplicasRecovered from PrimaryLogPG::start_recovery_ops()
- DeferRecovery event arrives, moving us from Recovering -> NotRecovering
- AllReplicasRecovered event arrives, crashing us.
This is all hard to deal with because the events are queued and may
arrive later, and the async reserver cancel events are unpredictable.
Solve the problem here by tolerating a delayed DeferRecovery event: if
the RECOVERING pg state bit isn't set, the event is stale, so ignore it.
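
A minimal sketch of the guard described above, not the actual patch.
ToyPG, State, STATE_RECOVERING and the on_*() handlers are hypothetical
stand-ins for the real PG state machine; only the idea of checking the
RECOVERING bit before acting on DeferRecovery comes from this change.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the RECOVERING pg state bit.
constexpr uint32_t STATE_RECOVERING = 1u << 0;

// Hypothetical, simplified set of state machine states.
enum class State { Recovering, NotRecovering, Recovered };

struct ToyPG {
  uint32_t state_bits = STATE_RECOVERING;
  State state = State::Recovering;

  bool state_test(uint32_t bit) const { return (state_bits & bit) != 0; }
  void state_clear(uint32_t bit) { state_bits &= ~bit; }

  // Recovery work is done: clear the bit. In the scenario above,
  // AllReplicasRecovered is queued at this point and delivered later.
  void finish_recovery() { state_clear(STATE_RECOVERING); }

  // Returns true if the event caused a transition, false if discarded.
  bool on_defer_recovery() {
    if (state != State::Recovering)
      return false;
    if (!state_test(STATE_RECOVERING)) {
      // Recovery already finished, so this DeferRecovery is stale.
      // Ignore it and stay in Recovering for AllReplicasRecovered.
      return false;
    }
    state_clear(STATE_RECOVERING);
    state = State::NotRecovering;
    return true;
  }

  // Returns true if handled. A false return models the event arriving
  // in a state that cannot handle it (the crash described above).
  bool on_all_replicas_recovered() {
    if (state != State::Recovering)
      return false;
    state = State::Recovered;
    return true;
  }
};
```

With the guard, calling finish_recovery(), then on_defer_recovery(),
then on_all_replicas_recovered() leaves the toy PG in Recovered instead
of delivering AllReplicasRecovered to NotRecovering.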
Fixes: http://tracker.ceph.com/issues/23860
Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit cfe59cf20c4b09aa7b25c3f9a724a01380699744)