git.apps.os.sepia.ceph.com Git - ceph-ci.git/commit

author	ethanwu <ethanwu@synology.com>
	Tue, 23 Sep 2025 01:45:36 +0000 (09:45 +0800)
committer	ethanwu <ethanwu@synology.com>
	Tue, 23 Sep 2025 02:23:52 +0000 (10:23 +0800)
commit	cfecf7c867d20d7d05ab3f341844c7c2b9b733d0
tree	eb3249eecce451e5d54881e6e6ca3e292fb0d867
parent	31eb5e7fbf2bd0d8ef1265d6922ab2b7be98fa6c

mds/FSMap: fix join_fscid being incorrectly reset for active MDS during filesystem removal

Fix bug where active MDS daemons in remaining filesystems incorrectly
have their join_fscid cleared to FS_CLUSTER_ID_NONE when any other
filesystem is removed.

The issue was caused by variable shadowing in erase_filesystem(): the
loop variable 'fscid' shadowed the function parameter 'fscid'. As a
result, the check 'if (info.join_fscid == fscid)' inside the loop
compared against the loop variable (the ID of a remaining filesystem)
instead of the parameter (the ID of the removed filesystem).

Renamed the loop variable to 'remaining_fscid' to eliminate the shadowing
and ensure the comparison uses the correct filesystem ID.
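
For illustration only, a minimal C++ sketch of the shadowing pattern
described above. This is not the FSMap code: the types MdsInfo and
Filesystems, both function names, and the value -1 used for
FS_CLUSTER_ID_NONE are assumptions made for the sketch.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Simplified stand-ins for the real types (assumptions, not Ceph code).
using fs_cluster_id_t = std::int32_t;
constexpr fs_cluster_id_t FS_CLUSTER_ID_NONE = -1;  // assumed sentinel value

struct MdsInfo {
  std::string name;
  fs_cluster_id_t join_fscid;  // filesystem this daemon prefers to join
};

// Hypothetical model: filesystem ID -> daemons currently serving it.
using Filesystems = std::map<fs_cluster_id_t, std::vector<MdsInfo>>;

// Buggy shape: the local 'fscid' hides the parameter 'fscid', so the
// comparison uses the ID of the remaining filesystem being visited.
void erase_filesystem_shadowed(Filesystems& filesystems, fs_cluster_id_t fscid) {
  filesystems.erase(fscid);
  for (auto& entry : filesystems) {
    fs_cluster_id_t fscid = entry.first;  // shadows the parameter
    for (auto& info : entry.second) {
      if (info.join_fscid == fscid) {     // compares against the wrong ID
        info.join_fscid = FS_CLUSTER_ID_NONE;
      }
    }
  }
}

// Fixed shape: a distinct loop variable name leaves the parameter visible,
// so only daemons that preferred the removed filesystem are reset.
void erase_filesystem_fixed(Filesystems& filesystems, fs_cluster_id_t fscid) {
  filesystems.erase(fscid);
  for (auto& entry : filesystems) {
    fs_cluster_id_t remaining_fscid = entry.first;
    (void)remaining_fscid;  // unused in this sketch; kept to mirror the rename
    for (auto& info : entry.second) {
      if (info.join_fscid == fscid) {     // compares against the removed ID
        info.join_fscid = FS_CLUSTER_ID_NONE;
      }
    }
  }
}
```

Shadowing of this kind is legal C++ because the inner declaration lives
in a nested scope. Building with -Wshadow (GCC or Clang) should report
it.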

Reproducer:
../src/vstart.sh --new -x --localhost --bluestore
FS=b
./bin/ceph osd pool create cephfs.${FS}.meta 64 64 replicated
./bin/ceph osd pool create cephfs.${FS}.data 64 64 replicated
./bin/ceph fs new ${FS} cephfs.${FS}.meta cephfs.${FS}.data
./bin/ceph config set mds.a mds_join_fs a
./bin/ceph config set mds.b mds_join_fs a
./bin/ceph fs fail ${FS}
./bin/ceph fs rm ${FS} --yes-i-really-mean-it

Then, in the output of ./bin/ceph fs dump, the join_fscid of every
active MDS in filesystem 'a' has been reset. Because there are standby
MDS daemons with join_fscid=1, MDSMonitor considers them to have better
affinity and triggers a switchover.

Fixes: https://tracker.ceph.com/issues/73183
Signed-off-by: ethanwu <ethanwu@synology.com>