From: John Mulligan
Date: Sun, 8 Sep 2024 14:42:36 +0000 (-0400)
Subject: mgr/smb: stop trying to clean external store during cluster sync
X-Git-Tag: v20.0.0~1093^2
X-Git-Url: http://git-server-git.apps.pok.os.sepia.ceph.com/?a=commitdiff_plain;h=refs%2Fpull%2F59658%2Fhead;p=ceph.git

mgr/smb: stop trying to clean external store during cluster sync

It was found during testing that a sequence of commands like:
```
ceph smb cluster create slow1 user --define-user-pass=user1%badf00d --clustering=always --placement=3
sleep 0.5
ceph smb share create slow1 share1 cephfs --subvolume=g1/sv1 --path=/
sleep 0.5
ceph smb share create slow1 share2 cephfs --subvolume=g1/sv2 --path=/
```
would create a CTDB-enabled cluster that failed to start up correctly.

The issue was due to the call to `external.rm_other_in_ns` during the
cluster sync operation. In CTDB-enabled mode, objects are written to the
pool outside of the smb mgr module's direct control, in particular
`cluster.meta.json`, and this function, intended to keep the pool &
namespace tidy, was removing objects needed by CTDB-enabled mode. The
failure is somewhat timing-sensitive, depending on whether the CTDB
enablement sidecars come up before or after the object is deleted.

Remove this function call so that these objects stop getting deleted at
inopportune times. While we could have tried making this function
"smarter", so that it only deletes specific unexpected objects, in this
case I feel that keeping it simple is better. If we find this pool
getting cluttered in the future we can add a smarter pool-tidying
function later.

Fixes: https://tracker.ceph.com/issues/67946
Signed-off-by: John Mulligan
---

diff --git a/src/pybind/mgr/smb/handler.py b/src/pybind/mgr/smb/handler.py
index b2285eef5753..efce904bff6a 100644
--- a/src/pybind/mgr/smb/handler.py
+++ b/src/pybind/mgr/smb/handler.py
@@ -620,11 +620,6 @@ class ClusterConfigHandler:
             change_group.cluster.cluster_id,
             set(change_group.cache),
         )
-        external.rm_other_in_ns(
-            self.public_store,
-            change_group.cluster.cluster_id,
-            set(change_group.cache),
-        )
         # ensure a entity exists with access to the volumes
         for volume in vols:
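
For readers outside the mgr/smb codebase, the sketch below (not part of the patch) illustrates the pattern the removed call implements: a cleanup that deletes any object in a cluster's namespace that the current sync pass did not itself write. The store class and its `contents`/`write`/`remove` methods are hypothetical stand-ins, not the actual mgr/smb store API, and the object names are illustrative; the point is only to show why such a sweep races with objects the CTDB sidecars own.

```
# Illustrative sketch only -- assumes a simple namespaced key/value store;
# these class and method names are hypothetical, not the mgr/smb API.
from typing import Iterable, Set, Tuple


class FakeStore:
    """Hypothetical stand-in for the module's public object store."""

    def __init__(self) -> None:
        self._objects: Set[Tuple[str, str]] = set()

    def contents(self, ns: str) -> Iterable[str]:
        # List every object key stored under the given namespace.
        return [key for (n, key) in self._objects if n == ns]

    def write(self, ns: str, key: str) -> None:
        self._objects.add((ns, key))

    def remove(self, ns: str, key: str) -> None:
        self._objects.discard((ns, key))


def rm_other_in_ns_sketch(store: FakeStore, ns: str, expected: Set[str]) -> None:
    # Delete every object in the cluster's namespace that the current sync
    # pass did not itself produce. In CTDB mode this also removes objects
    # the sidecars own, such as cluster.meta.json.
    for key in list(store.contents(ns)):
        if key not in expected:
            store.remove(ns, key)


store = FakeStore()
store.write("slow1", "config.smb")          # written by the mgr module
store.write("slow1", "cluster.meta.json")   # written by a CTDB sidecar
rm_other_in_ns_sketch(store, "slow1", {"config.smb"})
assert "cluster.meta.json" not in store.contents("slow1")  # sidecar data lost
```

Whether the sidecar data survives depends on whether the sidecar writes it before or after the sweep runs, which is why the real failure was timing-sensitive.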