From 4a8ee645c90cded9eca9f9353402d022ed881abb Mon Sep 17 00:00:00 2001
From: ethanwu
Date: Tue, 23 Sep 2025 09:45:36 +0800
Subject: [PATCH] mds/FSMap: fix join_fscid being incorrectly reset for active MDS during filesystem removal

Fix a bug where active MDS daemons in the remaining filesystems have their
join_fscid incorrectly cleared to FS_CLUSTER_ID_NONE whenever any other
filesystem is removed.

The issue was caused by variable name shadowing in erase_filesystem(): the
loop variable 'fscid' shadowed the function parameter 'fscid'. Inside the
loop,

    if (info.join_fscid == fscid)

compared against the loop variable (the ID of a remaining filesystem)
instead of the parameter (the ID of the removed filesystem). As a result,
every daemon in a remaining filesystem whose join_fscid matched that
filesystem's own ID was reset.

Renamed the loop variable to 'remaining_fscid' to eliminate the shadowing
and ensure the comparison uses the ID of the removed filesystem.

Reproducer:

    ../src/vstart.sh --new -x --localhost --bluestore
    FS=b
    ./bin/ceph osd pool create cephfs.${FS}.meta 64 64 replicated
    ./bin/ceph osd pool create cephfs.${FS}.data 64 64 replicated
    ./bin/ceph fs new ${FS} cephfs.${FS}.meta cephfs.${FS}.data
    ./bin/ceph config set mds.a mds_join_fs a
    ./bin/ceph config set mds.b mds_join_fs a
    ./bin/ceph fs fail ${FS}
    ./bin/ceph fs rm ${FS} --yes-i-really-mean-it

Then, in the output of ./bin/ceph fs dump, join_fscid is reset for all
active MDS daemons of filesystem 'a'. Because there are standby MDS daemons
with join_fscid=1, MDSMonitor considers them a better affinity match and
triggers a switchover.
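The shadowing can be reproduced in isolation with the minimal C++17 sketch
below. It is a simplified model, not Ceph's FSMap API: the types MdsInfo,
Filesystem and Filesystems and the helpers erase_filesystem_buggy(),
erase_filesystem_fixed(), make_cluster() and count_with_join() are
hypothetical, each filesystem is reduced to a map from gid to a struct
holding only join_fscid, and join_fscid is cleared by direct assignment
instead of through modify_daemon().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical stand-ins for Ceph's types; only join_fscid is modelled.
using fs_cluster_id_t = std::int32_t;
constexpr fs_cluster_id_t FS_CLUSTER_ID_NONE = -1;

struct MdsInfo {
  fs_cluster_id_t join_fscid = FS_CLUSTER_ID_NONE;
};

struct Filesystem {
  std::map<std::uint64_t, MdsInfo> mds_info;  // gid -> daemon info
};

using Filesystems = std::map<fs_cluster_id_t, Filesystem>;

// Buggy: the structured binding 'fscid' shadows the parameter, so each
// daemon is compared against the ID of the filesystem being iterated.
void erase_filesystem_buggy(Filesystems& filesystems, fs_cluster_id_t fscid) {
  assert(filesystems.count(fscid) == 1);  // the filesystem must exist
  filesystems.erase(fscid);
  for ([[maybe_unused]] auto& [fscid, fs] : filesystems) {
    for (auto& [gid, info] : fs.mds_info) {
      if (info.join_fscid == fscid) {  // refers to the loop variable
        info.join_fscid = FS_CLUSTER_ID_NONE;
      }
    }
  }
}

// Fixed: with the loop variable renamed, 'fscid' refers to the parameter,
// i.e. the ID of the removed filesystem.
void erase_filesystem_fixed(Filesystems& filesystems, fs_cluster_id_t fscid) {
  assert(filesystems.count(fscid) == 1);
  filesystems.erase(fscid);
  for ([[maybe_unused]] auto& [remaining_fscid, fs] : filesystems) {
    for (auto& [gid, info] : fs.mds_info) {
      if (info.join_fscid == fscid) {
        info.join_fscid = FS_CLUSTER_ID_NONE;
      }
    }
  }
}

// Filesystem 1 ('a') has two daemons with join_fscid = 1;
// filesystem 2 ('b') is empty and is the one to be removed.
Filesystems make_cluster() {
  Filesystems filesystems;
  filesystems[1].mds_info[100].join_fscid = 1;
  filesystems[1].mds_info[101].join_fscid = 1;
  filesystems.try_emplace(2);
  return filesystems;
}

// Counts daemons across all filesystems whose join_fscid is still set.
std::size_t count_with_join(const Filesystems& filesystems) {
  std::size_t n = 0;
  for (const auto& [id, fs] : filesystems) {
    for (const auto& [gid, info] : fs.mds_info) {
      if (info.join_fscid != FS_CLUSTER_ID_NONE) ++n;
    }
  }
  return n;
}
```

Removing filesystem 2 from make_cluster() with the buggy variant leaves
count_with_join() at 0, because both daemons of filesystem 1 are compared
against 1 and reset; the fixed variant compares against 2 and leaves both
join_fscid values intact, so the count stays at 2.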
Fixes: https://tracker.ceph.com/issues/73183
Signed-off-by: ethanwu
(cherry picked from commit cfecf7c867d20d7d05ab3f341844c7c2b9b733d0)
---
 src/mds/FSMap.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mds/FSMap.cc b/src/mds/FSMap.cc
index a133545c3ac..2129cbd576f 100644
--- a/src/mds/FSMap.cc
+++ b/src/mds/FSMap.cc
@@ -1234,7 +1234,7 @@ void FSMap::erase_filesystem(fs_cluster_id_t fscid)
       });
     }
   }
-  for ([[maybe_unused]] auto& [fscid, fs] : filesystems) {
+  for ([[maybe_unused]] auto& [remaining_fscid, fs] : filesystems) {
     for (auto& [gid, info] : fs.mds_map.get_mds_info()) {
       if (info.join_fscid == fscid) {
         modify_daemon(gid, [](auto& info) {
-- 
2.39.5