From: Kefu Chai Date: Sat, 10 Jan 2026 01:22:31 +0000 (+0800) Subject: librbd/pwl: fix memory leaks in discard operations X-Git-Url: http://git-server-git.apps.pok.os.sepia.ceph.com/?a=commitdiff_plain;h=e2873e3073e9f74ac63e5a2316c339a0f221fc9b;p=ceph-ci.git librbd/pwl: fix memory leaks in discard operations Fix memory leaks in librbd persistent write log (PWL) cache discard operations by properly completing and deleting request objects. ASan reported the following leaks in unittest_librbd: Direct leak of 240 byte(s) in 1 object(s) allocated from: #0 operator new(unsigned long) #1 librbd::cache::pwl::AbstractWriteLog::discard(...) /ceph/src/librbd/cache/pwl/AbstractWriteLog.cc:935:5 #2 TestMockCacheReplicatedWriteLog_discard_Test::TestBody() /ceph/src/test/librbd/cache/pwl/test_mock_ReplicatedWriteLog.cc:534:7 Direct leak of 240 byte(s) in 1 object(s) allocated from: #0 operator new(unsigned long) #1 librbd::cache::pwl::AbstractWriteLog::discard(...) /ceph/src/librbd/cache/pwl/AbstractWriteLog.cc:935:5 #2 TestMockCacheSSDWriteLog_discard_Test::TestBody() /ceph/src/test/librbd/cache/pwl/test_mock_SSDWriteLog.cc:533:7 Plus multiple indirect leaks totaling 2,076 bytes: - DiscardLogOperation (224 bytes x 2) - SyncPoint (184 bytes) - SyncPointLogEntry (168 bytes) - DiscardLogEntry (152 bytes x 2) - Various string and vector allocations SUMMARY: AddressSanitizer: 2316 byte(s) leaked in 15 allocation(s). Root cause analysis: 1. Missing request completion: C_DiscardRequest objects were never deleted because their complete() method was never called. The on_write_persist callback in setup_log_operations() called complete_user_request() to invoke the user's callback, but didn't call complete() on the request itself to trigger self-deletion via Context::complete(). Write requests use WriteLogOperationSet which takes the request as its on_finish callback, ensuring complete() is eventually called. Discard requests don't use WriteLogOperationSet and must explicitly call complete() in their on_write_persist callback. 2. Sync point not reset: m_current_sync_point was never explicitly reset during shutdown, keeping the sync point alive even after m_log_entries was cleared. This prevented proper cleanup of the sync point chain. 3. Indirect leaks: C_DiscardRequest holds shared_ptr, which holds shared_ptr, which holds shared_ptr to log entries. When the request leaked, all referenced objects also leaked through the shared_ptr reference chain. Solution: 1. Add discard_req->complete(r) in the on_write_persist callback (Request.cc:445) to properly delete the request object after completing the user callback and releasing the cell. This follows the same pattern as WriteLogOperationSet for write requests. 2. Add m_current_sync_point.reset() in shut_down() (AbstractWriteLog.cc:682) after m_log_entries.clear() to explicitly release the sync point and break the reference chain. These fixes ensure proper cleanup of all allocated objects in both production and test environments. Test results: - Before: 2,316 bytes leaked in 15 allocations - After: 0 bytes leaked - unittest_librbd discard tests pass successfully with ASan Signed-off-by: Kefu Chai --- diff --git a/src/librbd/cache/pwl/AbstractWriteLog.cc b/src/librbd/cache/pwl/AbstractWriteLog.cc index 70d8d7418a1..9fb14e5c090 100644 --- a/src/librbd/cache/pwl/AbstractWriteLog.cc +++ b/src/librbd/cache/pwl/AbstractWriteLog.cc @@ -679,6 +679,7 @@ void AbstractWriteLog::shut_down(Context *on_finish) { check_image_cache_state_clean(); m_wake_up_enabled = false; m_log_entries.clear(); + m_current_sync_point.reset(); m_cache_state->clean = true; m_cache_state->empty = true; remove_pool_file(); diff --git a/src/librbd/cache/pwl/Request.cc b/src/librbd/cache/pwl/Request.cc index e424c2285a1..678aaeb8c7a 100644 --- a/src/librbd/cache/pwl/Request.cc +++ b/src/librbd/cache/pwl/Request.cc @@ -442,6 +442,7 @@ void C_DiscardRequest::setup_log_operations() { ceph_assert(discard_req->get_cell()); discard_req->complete_user_request(r); discard_req->release_cell(); + discard_req->complete(r); }); op->init_op(current_sync_gen, persist_on_flush, pwl.get_last_op_sequence_num(), on_write_persist, on_write_append);