From: ethanwu
Date: Thu, 11 Sep 2025 07:40:09 +0000 (+0800)
Subject: mds: fix rank 0 marked damaged if stopping fails after Elid flush and log trimmed
X-Git-Url: http://git.apps.os.sepia.ceph.com/?a=commitdiff_plain;h=adb448b4f4e421f75275874f5a67c3a2ceb0214c;p=ceph.git

mds: fix rank 0 marked damaged if stopping fails after Elid flush and log trimmed

Steps to reproduce:

  ../src/vstart.sh --debug --new -x --localhost --bluestore
  ./bin/ceph tell mds. config set mds_kill_shutdown_at 10
  ./bin/ceph fs set down true

Wait a few seconds. The take-over MDS then logs the following, and
rank 0 is marked damaged:

2025-09-11T16:47:24.591+0800 785dabeaa6c0 -1 log_channel(cluster) log [ERR] : No subtrees found for root MDS rank!
2025-09-11T16:47:24.591+0800 785dabeaa6c0  5 mds.beacon.b set_want_state: up:rejoin -> down:damaged

During shutdown_pass, once the ELid event has been submitted and the
mdlog trimmed, the MDS log contains only the ELid event, which does
nothing at replay. As a result, no subtree is found after replay.

Fix this by checking whether the MDLog contains only one event. If so,
skip the subtree check for rank 0 and allow it to request STATE_STOPPED,
just like the other ranks.

Fixes: https://tracker.ceph.com/issues/72983
Signed-off-by: ethanwu
---

diff --git a/src/mds/MDSRank.cc b/src/mds/MDSRank.cc
index 32e9a8cdfc40d..068edbe265cba 100644
--- a/src/mds/MDSRank.cc
+++ b/src/mds/MDSRank.cc
@@ -2032,8 +2032,9 @@ void MDSRank::rejoin_done()
 
   // funny case: is our cache empty? no subtrees?
   if (!mdcache->is_subtrees()) {
-    if (whoami == 0) {
-      // The root should always have a subtree!
+    if (whoami == 0 && mdlog->get_num_events() > 1) {
+      // The root should always have a subtree except when
+      // the mdlog contains only the ELid event
       clog->error() << "No subtrees found for root MDS rank!";
       damaged();
       ceph_assert(mdcache->is_subtrees());