]> git-server-git.apps.pok.os.sepia.ceph.com Git - ceph-ci.git/commitdiff
librbd/pwl: fix memory leaks in discard operations
authorKefu Chai <k.chai@proxmox.com>
Sat, 10 Jan 2026 01:22:31 +0000 (09:22 +0800)
committerKefu Chai <k.chai@proxmox.com>
Sat, 10 Jan 2026 01:32:10 +0000 (09:32 +0800)
Fix memory leaks in librbd persistent write log (PWL) cache discard
operations by properly completing and deleting request objects.

ASan reported the following leaks in unittest_librbd:

  Direct leak of 240 byte(s) in 1 object(s) allocated from:
    #0 operator new(unsigned long)
    #1 librbd::cache::pwl::AbstractWriteLog<librbd::MockImageCtx>::discard(...)
       /ceph/src/librbd/cache/pwl/AbstractWriteLog.cc:935:5
    #2 TestMockCacheReplicatedWriteLog_discard_Test::TestBody()
       /ceph/src/test/librbd/cache/pwl/test_mock_ReplicatedWriteLog.cc:534:7

  Direct leak of 240 byte(s) in 1 object(s) allocated from:
    #0 operator new(unsigned long)
    #1 librbd::cache::pwl::AbstractWriteLog<librbd::MockImageCtx>::discard(...)
       /ceph/src/librbd/cache/pwl/AbstractWriteLog.cc:935:5
    #2 TestMockCacheSSDWriteLog_discard_Test::TestBody()
       /ceph/src/test/librbd/cache/pwl/test_mock_SSDWriteLog.cc:533:7

  Plus multiple indirect leaks totaling 2,076 bytes:
    - DiscardLogOperation (224 bytes x 2)
    - SyncPoint (184 bytes)
    - SyncPointLogEntry (168 bytes)
    - DiscardLogEntry (152 bytes x 2)
    - Various string and vector allocations

  SUMMARY: AddressSanitizer: 2316 byte(s) leaked in 15 allocation(s).

Root cause analysis:

1. Missing request completion: C_DiscardRequest objects were never deleted
   because their complete() method was never called. The on_write_persist
   callback in setup_log_operations() called complete_user_request() to
   invoke the user's callback, but didn't call complete() on the request
   itself to trigger self-deletion via Context::complete().

   Write requests use WriteLogOperationSet which takes the request as its
   on_finish callback, ensuring complete() is eventually called. Discard
   requests don't use WriteLogOperationSet and must explicitly call
   complete() in their on_write_persist callback.

2. Sync point not reset: m_current_sync_point was never explicitly reset
   during shutdown, keeping the sync point alive even after m_log_entries
   was cleared. This prevented proper cleanup of the sync point chain.

3. Indirect leaks: C_DiscardRequest holds shared_ptr<DiscardLogOperation>,
   which holds shared_ptr<SyncPoint>, which holds shared_ptr to log entries.
   When the request leaked, all referenced objects also leaked through the
   shared_ptr reference chain.

Solution:

1. Add discard_req->complete(r) in the on_write_persist callback
   (Request.cc:445) to properly delete the request object after completing
   the user callback and releasing the cell. This follows the same pattern
   as WriteLogOperationSet for write requests.

2. Add m_current_sync_point.reset() in shut_down() (AbstractWriteLog.cc:682)
   after m_log_entries.clear() to explicitly release the sync point and
   break the reference chain.

These fixes ensure proper cleanup of all allocated objects in both
production and test environments.

Test results:
- Before: 2,316 bytes leaked in 15 allocations
- After: 0 bytes leaked
- unittest_librbd discard tests pass successfully with ASan

Signed-off-by: Kefu Chai <k.chai@proxmox.com>
src/librbd/cache/pwl/AbstractWriteLog.cc
src/librbd/cache/pwl/Request.cc

index 70d8d7418a13f5e55e784c6d87fcfb9fe858b511..9fb14e5c090f45974359dd444ec8d276f9351779 100644 (file)
@@ -679,6 +679,7 @@ void AbstractWriteLog<I>::shut_down(Context *on_finish) {
       check_image_cache_state_clean();
       m_wake_up_enabled = false;
       m_log_entries.clear();
+      m_current_sync_point.reset();
       m_cache_state->clean = true;
       m_cache_state->empty = true;
       remove_pool_file();
index e424c2285a1b143358bcaad2f729b0669b2b591c..678aaeb8c7ac1990417e2d950f8b386e7e584a8b 100644 (file)
@@ -442,6 +442,7 @@ void C_DiscardRequest<T>::setup_log_operations() {
       ceph_assert(discard_req->get_cell());
       discard_req->complete_user_request(r);
       discard_req->release_cell();
+      discard_req->complete(r);
     });
   op->init_op(current_sync_gen, persist_on_flush, pwl.get_last_op_sequence_num(),
               on_write_persist, on_write_append);