git-server-git.apps.pok.os.sepia.ceph.com Git - ceph.git/commit
crimson/osd: cancel ongoing pglog-based recoveries on recovery defering 59066/head
author     Xuehan Xu <xuxuehan@qianxin.com>
           Wed, 7 Aug 2024 02:52:49 +0000 (10:52 +0800)
committer  Xuehan Xu <xuxuehan@qianxin.com>
           Wed, 28 Aug 2024 08:56:05 +0000 (16:56 +0800)
commit     c43542f7b9dcbf52cde4f041256a356d83ac86e9
tree       10a5e34b91216ebaec632fddf4e804ffd0ffc21f
parent     601fcfa91885fadd1028aff101cb731fa6aa6a38
crimson/osd: cancel ongoing pglog-based recoveries on recovery defering

Previously, we relied on checking `PG::is_recovering()` in
`PGRecovery::start_recovery_ops()` to determine whether it is still valid
to proceed with the recovery. This turns out to be insufficient, for
example:

1. PG `P` is under recovery, and `PGRecovery::start_recovery_ops()` is
   called;
2. PG `P`'s recovery is deferred while that
   `PGRecovery::start_recovery_ops()` call is blocked waiting for
   external replies;
3. Before the external replies arrive, PG `P` resumes recovery and
   a new round of `PGRecovery::start_recovery_ops()` starts;
4. The external replies arrive and the old
   `PGRecovery::start_recovery_ops()` continues, because
   `PG::is_recovering()` returns true again.

In the above case, we end up with two duplicate recovery ops running
for the same PG, which is incorrect.
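
The four steps above can be replayed in a small standalone model. This is
only an illustrative sketch, not the code touched by this commit: `ToyPG`,
`round`, `replay()` and the other names are hypothetical, and the round
counter is a simplified stand-in for cancelling the in-flight recovery when
it is deferred. It shows why a plain state check cannot tell the old round
from the new one, while a per-round token can.

```cpp
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Toy stand-in for a PG. None of these names are Ceph's actual API.
struct ToyPG {
  bool recovering = false;   // what a state check such as is_recovering() observes
  std::uint64_t round = 0;   // hypothetical token, bumped whenever recovery is deferred
  int active_ops = 0;        // recovery ops that proceeded after their wait
  // Continuations parked while "waiting for external replies".
  std::vector<std::function<void()>> pending_replies;

  void start_recovery() { recovering = true; }
  void defer_recovery() { recovering = false; ++round; }

  // Variant A: after the wait, only the PG state flag is re-checked.
  void start_ops_state_check() {
    pending_replies.push_back([this] {
      if (recovering) ++active_ops;
    });
  }

  // Variant B: after the wait, the op also checks that the round it was
  // started in is still the current one; a deferral invalidates it.
  void start_ops_with_token() {
    const std::uint64_t my_round = round;
    pending_replies.push_back([this, my_round] {
      if (recovering && my_round == round) ++active_ops;
    });
  }

  // The external replies arrive: resume every parked continuation.
  void deliver_replies() {
    auto replies = std::move(pending_replies);
    pending_replies.clear();
    for (auto& resume : replies) resume();
  }
};

// Replays steps 1-4 and returns how many recovery ops ended up proceeding.
int replay(bool use_token) {
  ToyPG pg;
  auto start_ops = [&pg, use_token] {
    if (use_token) {
      pg.start_ops_with_token();
    } else {
      pg.start_ops_state_check();
    }
  };
  pg.start_recovery();   // step 1: recovery starts,
  start_ops();           //         first round blocks on replies
  pg.defer_recovery();   // step 2: recovery is deferred
  pg.start_recovery();   // step 3: recovery resumes,
  start_ops();           //         second round blocks on replies
  pg.deliver_replies();  // step 4: replies arrive, both continuations run
  return pg.active_ops;
}
```

In this model `replay(false)` lets both rounds proceed, reproducing the
duplicate ops, while `replay(true)` lets only the round started after the
resume proceed.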

Fixes: https://tracker.ceph.com/issues/67380
Signed-off-by: Xuehan Xu <xuxuehan@qianxin.com>
src/crimson/osd/osd_operations/background_recovery.cc
src/crimson/osd/osd_operations/background_recovery.h
src/crimson/osd/pg.cc
src/crimson/osd/pg.h
src/crimson/osd/pg_recovery.cc
src/crimson/osd/pg_recovery.h
src/crimson/osd/pg_recovery_listener.h
src/osd/PG.h
src/osd/PeeringState.cc
src/osd/PeeringState.h