Fix memory leaks in librbd persistent write log (PWL) cache discard
operations by properly completing and deleting request objects.
ASan reported the following leaks in unittest_librbd:
Direct leak of 240 byte(s) in 1 object(s) allocated from:
#0 operator new(unsigned long)
#1 librbd::cache::pwl::AbstractWriteLog<librbd::MockImageCtx>::discard(...)
/ceph/src/librbd/cache/pwl/AbstractWriteLog.cc:935:5
#2 TestMockCacheReplicatedWriteLog_discard_Test::TestBody()
/ceph/src/test/librbd/cache/pwl/test_mock_ReplicatedWriteLog.cc:534:7
Direct leak of 240 byte(s) in 1 object(s) allocated from:
#0 operator new(unsigned long)
#1 librbd::cache::pwl::AbstractWriteLog<librbd::MockImageCtx>::discard(...)
/ceph/src/librbd/cache/pwl/AbstractWriteLog.cc:935:5
#2 TestMockCacheSSDWriteLog_discard_Test::TestBody()
/ceph/src/test/librbd/cache/pwl/test_mock_SSDWriteLog.cc:533:7
Plus multiple indirect leaks totaling 2,076 bytes:
- DiscardLogOperation (224 bytes x 2)
- SyncPoint (184 bytes)
- SyncPointLogEntry (168 bytes)
- DiscardLogEntry (152 bytes x 2)
- Various string and vector allocations
SUMMARY: AddressSanitizer: 2316 byte(s) leaked in 15 allocation(s).
Root cause analysis:
1. Missing request completion: C_DiscardRequest objects were never deleted
because their complete() method was never called. The on_write_persist
callback in setup_log_operations() called complete_user_request() to
invoke the user's callback, but didn't call complete() on the request
itself to trigger self-deletion via Context::complete().
Write requests use WriteLogOperationSet which takes the request as its
on_finish callback, ensuring complete() is eventually called. Discard
requests don't use WriteLogOperationSet and must explicitly call
complete() in their on_write_persist callback.
2. Sync point not reset: m_current_sync_point was never explicitly reset
during shutdown, keeping the sync point alive even after m_log_entries
was cleared. This prevented proper cleanup of the sync point chain.
3. Indirect leaks: C_DiscardRequest holds shared_ptr<DiscardLogOperation>,
which holds shared_ptr<SyncPoint>, which holds shared_ptr to log entries.
When the request leaked, all referenced objects also leaked through the
shared_ptr reference chain.
Solution:
1. Add discard_req->complete(r) in the on_write_persist callback
(Request.cc:445) to properly delete the request object after completing
the user callback and releasing the cell. This follows the same pattern
as WriteLogOperationSet for write requests.
2. Add m_current_sync_point.reset() in shut_down() (AbstractWriteLog.cc:682)
after m_log_entries.clear() to explicitly release the sync point and
break the reference chain.
These fixes ensure proper cleanup of all allocated objects in both
production and test environments.
Test results:
- Before: 2,316 bytes leaked in 15 allocations
- After: 0 bytes leaked
- unittest_librbd discard tests pass successfully with ASan
Signed-off-by: Kefu Chai <k.chai@proxmox.com>
check_image_cache_state_clean();
m_wake_up_enabled = false;
m_log_entries.clear();
+ m_current_sync_point.reset();
m_cache_state->clean = true;
m_cache_state->empty = true;
remove_pool_file();
ceph_assert(discard_req->get_cell());
discard_req->complete_user_request(r);
discard_req->release_cell();
+ discard_req->complete(r);
});
op->init_op(current_sync_gen, persist_on_flush, pwl.get_last_op_sequence_num(),
on_write_persist, on_write_append);