crimson/osd/pg_recovery: call MOSDPGRecoveryDelete instead of MOSDPGBackfillRemove 67925/head
author Shraddha Agrawal <shraddha.agrawal000@gmail.com>
Thu, 19 Mar 2026 08:01:28 +0000 (13:31 +0530)
committer Shraddha Agrawal <shraddha.agrawal000@gmail.com>
Tue, 31 Mar 2026 09:53:54 +0000 (15:23 +0530)
commit 011f9d6c9c37ed31c0618a0321d1961a92cabc1e
tree 4e1cd6d64bf6cddac740fddb15eecf9dfe8009b4
parent 75c21181f02ad4a00786cceec181cb7b0d1cec3a
crimson/osd/pg_recovery: call MOSDPGRecoveryDelete instead of MOSDPGBackfillRemove

This commit fixes the abort in Recovered::Recovered.

There is a race to acquire the OBC lock between backfill and
client delete for the same object.

When the lock is acquired first by the backfill, the object is
recovered first, and then deleted by the client delete request.
When recovering the object, the corresponding peer_missing entry
is cleared and we are able to transition to Recovered state
successfully.

When the lock is acquired first by the client delete request, the
object is deleted. Then backfill tries to recover the object,
finds it deleted and exits early. The stale peer_missing
entry is not cleared. In Recovered::Recovered, needs_recovery()
sees this stale peer_missing entry and calls abort.

The issue is fixed by sending MOSDPGRecoveryDelete from the client
path to peers and waiting for MOSDPGRecoveryDeleteReply in
recover_object.
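
The two lock orderings described above can be illustrated with a toy
model. This is only a sketch and not the crimson code: ToyPG,
delete_sent_to_peers, clear_missing and ends_clean are hypothetical
names, and the MOSDPGRecoveryDelete / MOSDPGRecoveryDeleteReply round
trip is reduced to a set lookup. Only peer_missing, needs_recovery()
and recover_object are names taken from the description.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical, simplified stand-in for the primary's view of one PG.
struct ToyPG {
  std::set<std::string> objects;                      // objects present on the primary
  std::map<int, std::set<std::string>> peer_missing;  // peer id -> objects the peer lacks
  std::set<std::string> delete_sent_to_peers;         // models MOSDPGRecoveryDelete sent

  // True while any peer still has a missing entry.
  bool needs_recovery() const {
    for (const auto& entry : peer_missing) {
      if (!entry.second.empty()) return true;
    }
    return false;
  }

  // Drop oid from every peer's missing set.
  void clear_missing(const std::string& oid) {
    for (auto& entry : peer_missing) entry.second.erase(oid);
  }

  // Client delete; with_fix models the client path notifying peers.
  void client_delete(const std::string& oid, bool with_fix) {
    objects.erase(oid);
    if (with_fix) delete_sent_to_peers.insert(oid);
  }

  // Backfill recovering a single object.
  void recover_object(const std::string& oid) {
    if (objects.count(oid) == 0) {
      // Object is already gone. Without the fix we exit early and the
      // peer_missing entry stays stale. With the fix we model waiting
      // for the delete reply, after which the entry can be cleared.
      if (delete_sent_to_peers.count(oid) != 0) clear_missing(oid);
      return;
    }
    clear_missing(oid);  // object pushed to peers
  }
};

// Runs one interleaving and reports whether the PG ends with no
// outstanding peer_missing entries (i.e. the Recovered check would pass).
bool ends_clean(bool client_delete_first, bool with_fix) {
  ToyPG pg;
  pg.objects.insert("obj");
  pg.peer_missing[1].insert("obj");
  assert(pg.needs_recovery());  // the peer starts out missing the object
  if (client_delete_first) {
    pg.client_delete("obj", with_fix);
    pg.recover_object("obj");
  } else {
    pg.recover_object("obj");
    pg.client_delete("obj", with_fix);
  }
  return !pg.needs_recovery();
}
```

In this model, ends_clean(true, false) is the failing interleaving:
the client delete wins the race, recover_object exits early, and the
stale entry keeps needs_recovery() true. The other three combinations
end with an empty peer_missing.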

Fixes: https://tracker.ceph.com/issues/70501
Signed-off-by: Shraddha Agrawal <shraddha.agrawal000@gmail.com>
src/crimson/osd/backfill_state.cc
src/crimson/osd/backfill_state.h
src/crimson/osd/pg_recovery.cc
src/crimson/osd/pg_recovery.h
src/crimson/osd/recovery_backend.h
src/crimson/osd/replicated_recovery_backend.cc
src/test/crimson/test_backfill.cc