We observed that in Seastore, deleting a large batch (default
osd_target_transaction_size=30) can take a significant amount of time.
Because this happens inside the peering_pp.process stage, it blocks the PG's peering pipeline.
During this block, any incoming OSDMap updates (PGAdvanceMap) are stalled behind the deletion work.
This eventually causes an OSD-wide map progression hang because
the OSD cannot advance past an epoch until all PGs have processed it.
As a workaround, reduce osd_target_transaction_size to 5 to lower
conflict rates and allow deletion transactions to complete.
Fixes: https://tracker.ceph.com/issues/73791
Signed-off-by: Aishwarya Mathuria <amathuri@redhat.com>
- objects unfound and apparently lost
conf:
osd:
+ # osd_target_transaction_size is lowered to
+ # avoid https://tracker.ceph.com/issues/73791.
+ # We should return to the default value of
+ # 30 eventually (https://tracker.ceph.com/issues/74507).
+ osd target transaction size: 5
osd debug reject backfill probability: .3
osd scrub min interval: 60
osd scrub max interval: 120