From: ethanwu
Date: Sun, 24 Mar 2024 08:17:49 +0000 (+0800)
Subject: mds/FSMap: go back to STARTING state when rank doesn't make it pass STARTING
X-Git-Tag: v20.0.0~1669^2~2
X-Git-Url: http://git-server-git.apps.pok.os.sepia.ceph.com/?a=commitdiff_plain;h=767494d68c8c3130097f3cf794aadccc5bd98806;p=ceph.git

mds/FSMap: go back to STARTING state when rank doesn't make it pass STARTING

Just like STATE_CREATING, an mds could fail or be stopped anywhere in
STATE_STARTING, so make sure the subsequent take-over mds starts
from STATE_STARTING. Otherwise, we end up with an empty journal
(no ESubtreeMap): the subsequent take-over mds fails because no
subtrees are found, and the rank is marked damaged.

Quick way to reproduce this:
  ./bin/ceph fs set a down true  # take down all ranks in filesystem a
  # wait for the fs to stop all ranks
  ./bin/ceph fs set a down false; pidof ceph-mds | xargs kill
  # quickly kill all mds soon after they enter the starting state
  ./bin/ceph-mds -i a -c ./ceph.conf
  # start all mds

Then we'll find that the mds rank is reported damaged, with the
following log:
  -1 log_channel(cluster) log [ERR] : No subtrees found for root MDS rank!
  5 mds.beacon.a set_want_state: up:rejoin -> down:damaged

Fixes: https://tracker.ceph.com/issues/65094
Signed-off-by: ethanwu
---

diff --git a/src/mds/FSMap.cc b/src/mds/FSMap.cc
index a266ad253afb..1501547e8505 100644
--- a/src/mds/FSMap.cc
+++ b/src/mds/FSMap.cc
@@ -1006,6 +1006,12 @@ void FSMap::erase(mds_gid_t who, epoch_t blocklist_epoch)
         // the rank ever existed so that next time it's handed out
         // to a gid it'll go back into CREATING.
         fs.mds_map.in.erase(info.rank);
+      } else if (info.state == MDSMap::STATE_STARTING) {
+        // If this gid didn't make it past STARTING, then forget
+        // the rank ever existed so that next time it's handed out
+        // to a gid it'll go back into STARTING.
+        fs.mds_map.in.erase(info.rank);
+        fs.mds_map.stopped.insert(info.rank);
       } else {
         // Put this rank into the failed list so that the next available
         // STANDBY will pick it up.
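
The effect of the new branch can be illustrated with a small standalone model. This is only a sketch, not the Ceph implementation: `MiniMap`, `drop_rank`, `assign_rank` and `demo` are hypothetical names, the `failed.insert` in the last branch is inferred from the comment in the hunk above, and the rule `assign_rank` uses to pick the next state (stopped rank -> STARTING, unknown rank -> CREATING, otherwise REPLAY) is an assumption about the take-over path that this patch does not show.

```cpp
#include <cassert>
#include <set>

// Hypothetical, simplified stand-ins for the MDSMap state touched by the patch.
enum class State { Creating, Starting, Replay, Active };

struct MiniMap {
  std::set<int> in;       // ranks that exist in the filesystem
  std::set<int> stopped;  // ranks that were stopped and can be started again
  std::set<int> failed;   // ranks waiting for a standby to take over
};

// Mirrors the three branches of the patched FSMap::erase() for one rank.
void drop_rank(MiniMap& m, int rank, State state) {
  if (state == State::Creating) {
    m.in.erase(rank);          // forget the rank: next holder re-creates it
  } else if (state == State::Starting) {
    m.in.erase(rank);          // new branch: forget the rank ...
    m.stopped.insert(rank);    // ... but remember it as stopped
  } else {
    m.failed.insert(rank);     // inferred from the "failed list" comment
  }
}

// ASSUMPTION: how a take-over daemon's initial state is chosen.
State assign_rank(MiniMap& m, int rank) {
  if (m.stopped.erase(rank)) {  // previously stopped rank: start it again
    m.in.insert(rank);
    return State::Starting;
  }
  if (!m.in.count(rank)) {      // rank never existed: create it
    m.in.insert(rank);
    return State::Creating;
  }
  m.failed.erase(rank);         // existing rank: replay its journal
  return State::Replay;
}

// Scenario from the commit message: the daemon dies while still in STARTING.
void demo() {
  MiniMap m;
  m.in.insert(0);
  drop_rank(m, 0, State::Starting);
  assert(assign_rank(m, 0) == State::Starting);
}
```

Under these assumptions, a rank dropped in STARTING is handed out again in STARTING. Without the new branch it would land in the failed set and the take-over daemon would try to replay a journal that has no ESubtreeMap yet, which matches the "No subtrees found" error in the log above.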