From: Nitzan Mordechai
Date: Tue, 22 Apr 2025 16:23:16 +0000 (+0000)
Subject: msg: drain stack before stopping processors to avoid shutdown hang
X-Git-Url: http://git-server-git.apps.pok.os.sepia.ceph.com/?a=commitdiff_plain;h=8af12ef9d1876e19b753e0a162f69fb04e56b1df;p=ceph.git

msg: drain stack before stopping processors to avoid shutdown hang

`AsyncMessenger::shutdown()` called `WorkerProcessor::stop()` first, killing
the worker threads, and only then queued a `C_drain` callback via
`stack->drain()`. A worker that had already exited its event loop never
processed the callback, so `drain.wait()` blocked forever and monitor
shutdown hung for minutes.

Move `stack->drain()` ahead of the loop that calls `p->stop()` on each
processor. With the new order the workers are still running and can
acknowledge the drain.

Fixes: https://tracker.ceph.com/issues/71303
Signed-off-by: Nitzan Mordechai
(cherry picked from commit 5fbb9c5e464e3a2227f0c4729b2e6a1bc2f6f9d6)
---

diff --git a/src/msg/async/AsyncMessenger.cc b/src/msg/async/AsyncMessenger.cc
index 6b3a8c3f6dcd..eedf22d53ddb 100644
--- a/src/msg/async/AsyncMessenger.cc
+++ b/src/msg/async/AsyncMessenger.cc
@@ -341,6 +341,7 @@ int AsyncMessenger::shutdown()
 {
   ldout(cct,10) << __func__ << " " << get_myaddrs() << dendl;
 
+  stack->drain();
   // done! clean up.
   for (auto &&p : processors)
     p->stop();
@@ -353,7 +354,7 @@ int AsyncMessenger::shutdown()
   stop_cond.notify_all();
   stopped = true;
   lock.unlock();
-  stack->drain();
+
   return 0;
 }
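A minimal C++ sketch of why the ordering matters, under a simplified model. `Worker`, `Drain`, `drain_workers()` and `simulate_shutdown()` are hypothetical stand-ins, not Ceph's `NetworkStack`, `C_drain` or processor API. Each worker runs an event loop that executes queued callbacks. The drain posts one acknowledging callback per worker and waits until all of them have run. The sketch waits with a timeout only so that the old order reports a failure instead of blocking forever as `drain.wait()` did.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical worker: one thread running an event loop that executes
// externally dispatched callbacks until stop() is called.
class Worker {
 public:
  Worker() : thread_([this] { loop(); }) {}
  ~Worker() { stop(); }

  // Queue a callback for the event loop thread.
  void dispatch(std::function<void()> cb) {
    std::lock_guard<std::mutex> l(mu_);
    queue_.push_back(std::move(cb));
    cv_.notify_one();
  }

  // Make the loop exit and join the thread. Callbacks still queued, or
  // queued after this point, are never executed.
  void stop() {
    {
      std::lock_guard<std::mutex> l(mu_);
      done_ = true;
    }
    cv_.notify_one();
    if (thread_.joinable()) thread_.join();
  }

 private:
  void loop() {
    std::unique_lock<std::mutex> l(mu_);
    for (;;) {
      cv_.wait(l, [this] { return done_ || !queue_.empty(); });
      if (done_) return;  // exit without running pending callbacks
      auto cb = std::move(queue_.front());
      queue_.pop_front();
      l.unlock();
      cb();
      l.lock();
    }
  }

  std::mutex mu_;
  std::condition_variable cv_;
  std::deque<std::function<void()>> queue_;
  bool done_ = false;
  std::thread thread_;  // declared last: starts after the members above exist
};

// Hypothetical drain barrier (loosely analogous to C_drain): counts the
// workers that have not yet acknowledged.
class Drain {
 public:
  explicit Drain(std::size_t n) : remaining_(n) {}
  void ack() {
    std::lock_guard<std::mutex> l(mu_);
    assert(remaining_ > 0);
    if (--remaining_ == 0) cv_.notify_all();
  }
  // Returns false on timeout; an untimed wait would block forever if some
  // worker never runs its callback.
  bool wait_for(std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> l(mu_);
    return cv_.wait_for(l, timeout, [this] { return remaining_ == 0; });
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  std::size_t remaining_;
};

// Post one acknowledging callback to every worker, then wait for all of them.
bool drain_workers(std::vector<std::unique_ptr<Worker>>& workers,
                   std::chrono::milliseconds timeout) {
  auto d = std::make_shared<Drain>(workers.size());
  for (auto& w : workers) w->dispatch([d] { d->ack(); });
  return d->wait_for(timeout);
}

// drain_first == false mimics the old order (stop, then drain);
// drain_first == true mimics the patched order (drain, then stop).
bool simulate_shutdown(bool drain_first) {
  const auto timeout = std::chrono::milliseconds(1000);
  std::vector<std::unique_ptr<Worker>> workers;
  for (int i = 0; i < 3; ++i) workers.push_back(std::make_unique<Worker>());
  bool drained = false;
  if (drain_first) drained = drain_workers(workers, timeout);
  for (auto& w : workers) w->stop();
  if (!drain_first) drained = drain_workers(workers, timeout);
  return drained;
}
```

With the old order, `simulate_shutdown(false)` returns false once the one-second timeout expires, because the stopped workers never run the queued callbacks. With the patched order, `simulate_shutdown(true)` returns true, since every loop is still alive to acknowledge the drain before it is stopped.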