
crimson/osd/pg_recovery: call MOSDPGRecoveryDelete instead of MOSDPGBackfillRemove

This commit fixes the abort in Recovered::Recovered.

There is a race to acquire the OBC lock between backfill and
client delete for the same object.

When the lock is acquired first by the backfill, the object is
recovered first, and then deleted by the client delete request.
When recovering the object, the corresponding peer_missing entry
is cleared and we are able to transition to Recovered state
successfully.

When the lock is acquired first by the client delete request, the
object is deleted. Backfill then tries to recover the object,
finds it deleted, and exits early. The stale peer_missing
entry is not cleared. In Recovered::Recovered, needs_recovery()
sees this stale peer_missing entry and calls abort.

The issue is fixed by sending MOSDPGRecoveryDelete from the client
path to peers and waiting for MOSDPGRecoveryDeleteReply in
recover_object.
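
The two lock orderings described above can be illustrated with a toy
model. This is only a sketch, not the Crimson code: ToyPG,
stale_entry_remains and the notify_peers flag are hypothetical names,
and the model simply assumes that an acknowledged delete clears the
peer_missing entry, which the commit message does not spell out.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical stand-in for a PG; not a Crimson class.
struct ToyPG {
  std::set<std::string> store;                        // objects on the primary
  std::map<int, std::set<std::string>> peer_missing;  // per-peer objects to recover
  bool notify_peers = false;                          // models the fix being enabled

  // Toy version of needs_recovery(): true if any peer_missing entry remains.
  bool needs_recovery() const {
    for (const auto& entry : peer_missing) {
      if (!entry.second.empty()) return true;
    }
    return false;
  }

  void clear_missing(const std::string& oid) {
    for (auto& entry : peer_missing) entry.second.erase(oid);
  }

  // Client delete path: removes the object locally.
  void client_delete(const std::string& oid) { store.erase(oid); }

  // Backfill path: recovers the object if it still exists.
  void recover_object(const std::string& oid) {
    if (store.count(oid) == 0) {
      // Object was already deleted by the client.
      // Assumption: with the fix, the delete has been sent to the peers and
      // the reply is awaited here, after which the entry can be cleared.
      if (notify_peers) clear_missing(oid);
      return;  // early exit; without the fix the entry stays behind
    }
    clear_missing(oid);  // normal recovery clears the entry
  }
};

// Runs one ordering of the race and reports whether a stale entry is left.
bool stale_entry_remains(bool backfill_first, bool with_fix) {
  ToyPG pg;
  pg.notify_peers = with_fix;
  pg.store.insert("obj");
  pg.peer_missing[1].insert("obj");
  assert(pg.needs_recovery());  // sanity check: recovery is pending at the start

  if (backfill_first) {
    pg.recover_object("obj");
    pg.client_delete("obj");
  } else {
    pg.client_delete("obj");
    pg.recover_object("obj");
  }
  return pg.needs_recovery();
}
```

In this model only the "client delete first, no fix" ordering leaves a
stale entry, which corresponds to the case where needs_recovery()
triggers the abort in Recovered::Recovered.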

Fixes: https://tracker.ceph.com/issues/70501
Signed-off-by: Shraddha Agrawal <shraddha.agrawal000@gmail.com>

author	Shraddha Agrawal <shraddha.agrawal000@gmail.com>
	Thu, 19 Mar 2026 08:01:28 +0000 (13:31 +0530)
committer	Shraddha Agrawal <shraddha.agrawal000@gmail.com>
	Tue, 31 Mar 2026 09:53:54 +0000 (15:23 +0530)
commit	011f9d6c9c37ed31c0618a0321d1961a92cabc1e
tree	4e1cd6d64bf6cddac740fddb15eecf9dfe8009b4
parent	75c21181f02ad4a00786cceec181cb7b0d1cec3a

src/crimson/osd/backfill_state.cc
src/crimson/osd/backfill_state.h
src/crimson/osd/pg_recovery.cc
src/crimson/osd/pg_recovery.h
src/crimson/osd/recovery_backend.h
src/crimson/osd/replicated_recovery_backend.cc
src/test/crimson/test_backfill.cc