From: John Mulligan
Date: Sun, 8 Sep 2024 14:42:36 +0000 (-0400)
Subject: mgr/smb: stop trying to clean external store during cluster sync
X-Git-Tag: v20.0.0~1093^2
X-Git-Url: http://git-server-git.apps.pok.os.sepia.ceph.com/?a=commitdiff_plain;h=refs%2Fpull%2F59658%2Fhead;p=ceph.git

mgr/smb: stop trying to clean external store during cluster sync

It was found during testing that a sequence of commands like:
```
ceph smb cluster create slow1 user --define-user-pass=user1%badf00d --clustering=always --placement=3
sleep 0.5
ceph smb share create slow1 share1 cephfs --subvolume=g1/sv1 --path=/
sleep 0.5
ceph smb share create slow1 share2 cephfs --subvolume=g1/sv2 --path=/
```
would create a CTDB-enabled cluster that failed to start up correctly.

The issue was due to the call to `external.rm_other_in_ns` during the
cluster sync operation. In CTDB-enabled mode, objects are written to the
pool outside of the smb mgr module's direct control, in particular
`cluster.meta.json`, and this function, intended to keep the pool &
namespace tidy, was removing objects needed by CTDB-enabled mode. The
failure is somewhat timing-sensitive, depending on whether the CTDB
enablement sidecars come up before or after the object is deleted.

Remove this function call so that these objects stop getting deleted at
inopportune times. While we could have tried making this function
"smarter", so that it only deletes specific unexpected objects, in this
case I feel that keeping it simple is better. If we find this pool
getting cluttered in the future we can add a smarter pool-tidying
function later.

Fixes: https://tracker.ceph.com/issues/67946
Signed-off-by: John Mulligan
---

diff --git a/src/pybind/mgr/smb/handler.py b/src/pybind/mgr/smb/handler.py
index b2285eef5753..efce904bff6a 100644
--- a/src/pybind/mgr/smb/handler.py
+++ b/src/pybind/mgr/smb/handler.py
@@ -620,11 +620,6 @@ class ClusterConfigHandler:
             change_group.cluster.cluster_id,
             set(change_group.cache),
         )
-        external.rm_other_in_ns(
-            self.public_store,
-            change_group.cluster.cluster_id,
-            set(change_group.cache),
-        )
         # ensure a entity exists with access to the volumes
         for volume in vols:
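
For readers outside the mgr/smb codebase, the sketch below (not part of the patch) illustrates the pattern the removed call implements: a cleanup that deletes any object in a cluster's namespace that the current sync pass did not itself write. The store class and its `contents`/`write`/`remove` methods are hypothetical stand-ins, not the actual mgr/smb store API, and the object names are illustrative; the point is only to show why such a sweep races with objects the CTDB sidecars own.

```
# Illustrative sketch only -- assumes a simple namespaced key/value store;
# these class and method names are hypothetical, not the mgr/smb API.
from typing import Iterable, Set, Tuple


class FakeStore:
    """Hypothetical stand-in for the module's public object store."""

    def __init__(self) -> None:
        self._objects: Set[Tuple[str, str]] = set()

    def contents(self, ns: str) -> Iterable[str]:
        # List every object key stored under the given namespace.
        return [key for (n, key) in self._objects if n == ns]

    def write(self, ns: str, key: str) -> None:
        self._objects.add((ns, key))

    def remove(self, ns: str, key: str) -> None:
        self._objects.discard((ns, key))


def rm_other_in_ns_sketch(store: FakeStore, ns: str, expected: Set[str]) -> None:
    # Delete every object in the cluster's namespace that the current sync
    # pass did not itself produce. In CTDB mode this also removes objects
    # the sidecars own, such as cluster.meta.json.
    for key in list(store.contents(ns)):
        if key not in expected:
            store.remove(ns, key)


store = FakeStore()
store.write("slow1", "config.smb")          # written by the mgr module
store.write("slow1", "cluster.meta.json")   # written by a CTDB sidecar
rm_other_in_ns_sketch(store, "slow1", {"config.smb"})
assert "cluster.meta.json" not in store.contents("slow1")  # sidecar data lost
```

Whether the sidecar data survives depends on whether the sidecar writes it before or after the sweep runs, which is why the real failure was timing-sensitive.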