TransactionManager::get_extents_if_live() declared an inner
std::list<CachedExtentRef> res inside the "extent is cached" branch
that shadowed the outer res returned by the coroutine. When the
queried extent was found in the cache, it was moved into the inner
list, which was destroyed at the end of the branch, and the still-empty
outer list was returned to the caller.

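A minimal sketch of the bug pattern, assuming a hypothetical ExtentRef
alias (std::shared_ptr<std::string>) in place of CachedExtentRef and
plain functions in place of the coroutine; only the scoping matches the
real code:

```cpp
#include <cassert>
#include <list>
#include <memory>
#include <string>

// Hypothetical stand-in for CachedExtentRef.
using ExtentRef = std::shared_ptr<std::string>;

// Buggy shape: the inner `res` shadows the outer one.
std::list<ExtentRef> lookup_buggy(ExtentRef cached) {
  std::list<ExtentRef> res;
  if (cached) {
    std::list<ExtentRef> res;             // shadows the outer res
    res.emplace_back(std::move(cached));  // extent goes into the inner list
  }                                       // inner list destroyed here
  return res;                             // outer list is still empty
}

// Fixed shape: append to the outer list, as the diff does.
std::list<ExtentRef> lookup_fixed(ExtentRef cached) {
  std::list<ExtentRef> res;
  if (cached) {
    res.emplace_back(std::move(cached));
  }
  return res;
}
```

For a non-null e, lookup_buggy(e) returns an empty list while
lookup_fixed(e) returns one element. Building with -Wshadow (GCC or
Clang) flags the inner declaration, which would catch this class of
bug at compile time.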
The async cleaner uses this result to decide whether to rewrite an
extent or treat it as dead. For recently allocated LBA tree internal
nodes, which are still hot in cache, the shadowed return caused the
cleaner to skip them as dead, so mark_space_free() was never called to
balance the earlier mark_space_used(). Each affected reclaim leaked
exactly one extent's worth of space (4 KiB for LADDR_INTERNAL),
tripping the live_bytes != 0 assertion in
SegmentCleaner::clean_space() (async_cleaner.cc:1441) once a victim
segment with such a leftover was selected.

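The accounting consequence can be modelled with a toy sketch. Segment,
ToyExtent, reclaim() and the two free functions below are hypothetical;
they only borrow the names mark_space_used()/mark_space_free() and are
not SeaStore's real API. The report_live callback stands in for
get_extents_if_live():

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical toy types, not SeaStore's SegmentCleaner structures.
struct Segment {
  uint64_t live_bytes = 0;
};

struct ToyExtent {
  uint64_t len;
  bool live;  // ground truth: is the extent still referenced?
};

void mark_space_used(Segment& s, uint64_t len) { s.live_bytes += len; }

void mark_space_free(Segment& s, uint64_t len) {
  assert(s.live_bytes >= len);  // guard against underflow
  s.live_bytes -= len;
}

// Reclaims `victim`: extents reported live are rewritten into `target`;
// extents reported dead are skipped, on the assumption that their space
// was already freed when they died.
void reclaim(Segment& victim, Segment& target,
             const std::vector<ToyExtent>& extents,
             const std::function<bool(const ToyExtent&)>& report_live) {
  for (const auto& e : extents) {
    if (!report_live(e)) {
      continue;  // treated as dead: no accounting change
    }
    mark_space_used(target, e.len);  // new copy in the target segment
    mark_space_free(victim, e.len);  // pairs with the original allocation
  }
}
```

With a reporter that returns each extent's true liveness, a victim
holding one live 4 KiB extent drops to live_bytes == 0. With a reporter
that wrongly returns false, as the shadowed res effectively did for
cached extents, 4096 bytes stay in the victim's live_bytes, which is
the leftover the clean_space() assertion catches.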
Before this fix, the reproducer (run at ~70% full) deterministically
aborted within ~3 minutes; with the fix, the OSDs run cleanly past the
trigger point.

Fixes: 87a5984b3ae ("crimson/.../transaction_manager: convert get_extents_if_live to coroutine")
Signed-off-by: Shai Fultheim <shai.fultheim@gmail.com>
DEBUGT("{} {}~0x{:x} {} is cached and alive -- {}",
t, type, laddr, len, paddr, *extent);
assert(extent->get_length() == len);
- std::list<CachedExtentRef> res;
res.emplace_back(std::move(extent));
} else if (is_logical_type(type)) {
auto pin_list = co_await lba_manager->get_cursors(