From dc2cf00e890463e6702a15fda56ed7c699287fe9 Mon Sep 17 00:00:00 2001
From: Shai Fultheim
Date: Sun, 24 May 2026 14:19:56 +0300
Subject: [PATCH] crimson/os/seastore: enforce capacity in
 RBMCleaner::try_reserve_projected_usage

RBMCleaner::try_reserve_projected_usage always returned true and just
incremented stats.projected_used_bytes. The EPM BackgroundProcess relies
on the return value to block IO when the device is full, so this
effectively disabled backpressure for the RANDOM_BLOCK_SSD backend:
concurrent transactions could each reserve unbounded amounts, and the
over-commit surfaced downstream as `unexpected enospc` asserts in the
data path (object_data_handler.cc and friends, where ENOSPC is treated
as crimson::ct_error::enospc::assert_failure because the existing
infrastructure assumes ENOSPC is impossible). The OSD aborted under
sustained random-write workloads that exceeded RBM capacity.

Compute the device's data capacity as total - journal, subtract a 5%
headroom (for metadata writes and fragmentation slack the AVL allocator
cannot pack into), and reject reservations that would push
used + projected over the line. The existing EPM blocking-IO path
(extent_placement_manager.cc:726) already queues the IO until
release_projected_usage wakes it, so no caller-side changes are needed.

This is the minimal fix to keep the OSD alive under sustained random
writes. It converts a crash into a stall: once the device fills and the
cleaner has nothing to free (RBMCleaner::clean_space is still a TODO),
new writes block indefinitely instead of crashing.

Verified against an 8-job 1MB random-write fio (--size 63g, 90GB RBM,
3GB journal): 68 GB user-written, host WAF 1.696, OSD survives, watchdog
kills fio after slow-ops timeout. Without this patch the same workload
asserts in the data path.

The headroom is intentionally generous (5%) because there is no GC yet;
once RBMCleaner::clean_space() exists, the headroom can shrink.

Fixes: https://tracker.ceph.com/issues/75598
Signed-off-by: Shai Fultheim
---
 src/crimson/os/seastore/async_cleaner.cc | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/src/crimson/os/seastore/async_cleaner.cc b/src/crimson/os/seastore/async_cleaner.cc
index f0e36f82da1..58d52b2ba05 100644
--- a/src/crimson/os/seastore/async_cleaner.cc
+++ b/src/crimson/os/seastore/async_cleaner.cc
@@ -2031,6 +2031,27 @@ void RBMCleaner::commit_space_used(paddr_t addr, extent_len_t len)
 bool RBMCleaner::try_reserve_projected_usage(std::size_t projected_usage)
 {
   assert(background_callback->is_ready());
+
+  // Capacity check. Without this, concurrent transactions over-commit the
+  // RBM device: each reserves but the cleaner has no clean_space() yet, so
+  // a write that physically can't be served reaches the allocator and
+  // surfaces as `unexpected enospc` asserts in the data path
+  // (object_data_handler.cc et al.). Return false so the EPM BackgroundProcess
+  // blocks the IO until committed transactions release space.
+  //
+  // Headroom carves out room for metadata writes (LBA btree, backref) and
+  // for fragmentation slack the allocator can't pack into. 5% is a starting
+  // point; until RBMCleaner::clean_space() exists we cannot reclaim from
+  // fragmented free space, so headroom doubles as a fragmentation guard.
+  assert(get_total_bytes() > get_journal_bytes());
+  auto data_capacity = get_total_bytes() - get_journal_bytes();
+  auto headroom = data_capacity / 20;
+  auto committed_and_projected = stats.used_bytes
+                               + stats.projected_used_bytes
+                               + projected_usage;
+  if (committed_and_projected + headroom > data_capacity) {
+    return false;
+  }
   stats.projected_used_bytes += projected_usage;
   return true;
 }
-- 
2.47.3
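
Note on the arithmetic: the capacity check in the hunk above can be exercised
in isolation with a small standalone model. This is only a sketch, not SeaStore
code: CapacityModel, try_reserve and release are hypothetical names, and the
counters are plain integers rather than the cleaner's real stats. It assumes
the same rule as the patch (data capacity = total - journal, headroom = 1/20 of
data capacity, reject when used + projected + request + headroom exceeds data
capacity).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the RBM cleaner's accounting; the field and
// method names are illustrative and do not exist in the Ceph tree.
struct CapacityModel {
  std::uint64_t total_bytes;
  std::uint64_t journal_bytes;
  std::uint64_t used_bytes = 0;            // committed usage
  std::uint64_t projected_used_bytes = 0;  // reserved by in-flight transactions

  // Same rule as the patch: refuse a reservation that would eat into the
  // 5% headroom of the data capacity.
  bool try_reserve(std::uint64_t projected_usage) {
    assert(total_bytes > journal_bytes);
    const std::uint64_t data_capacity = total_bytes - journal_bytes;
    const std::uint64_t headroom = data_capacity / 20;  // 5%, rounded down
    const std::uint64_t committed_and_projected =
        used_bytes + projected_used_bytes + projected_usage;
    if (committed_and_projected + headroom > data_capacity) {
      return false;  // caller is expected to block and retry later
    }
    projected_used_bytes += projected_usage;
    return true;
  }

  // Give back a reservation so a blocked caller can be retried.
  void release(std::uint64_t projected_usage) {
    assert(projected_used_bytes >= projected_usage);
    projected_used_bytes -= projected_usage;
  }
};
```

With total_bytes = 1000 and journal_bytes = 100, data capacity is 900 and
headroom is 45, so reservations succeed up to a combined 855 bytes and the
next byte is refused until something is released.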