From: Kotresh HR
Date: Thu, 7 Apr 2022 13:28:28 +0000 (+0530)
Subject: mgr/volumes: Fix clone hang issue
X-Git-Tag: v18.0.0~1050^2~1
X-Git-Url: http://git.apps.os.sepia.ceph.com/?a=commitdiff_plain;h=cfd8d63e0158b1d336f5e080a1e83dbf6fc60079;p=ceph.git

mgr/volumes: Fix clone hang issue

The following sequence of operations leads to a deadlock:

1. Create a subvolume
2. Write some I/O to the subvolume
3. Create a snapshot of the subvolume
4. Create a clone of the snapshot
5. Delete the snapshot from the back end (don't use the subvolume
   interface) before the clone completes
6. Delete the clone with force
7. Delete the subvolume
8. Delete the fs and its associated pools
9. Create a new fs
10. Create a new subvolume
11. Write some I/O to the subvolume
12. Create a snapshot of the subvolume
13. Create a clone of the snapshot  <--- THIS OPERATION HANGS

Root Cause:
Since the snapshot is deleted from the back end, the clone fails. But
it also fails to remove the clone index at '/volumes/_index/clone'.
The cloner thread goes into an infinite loop of starting the clone and
failing. Each iteration takes 'self.async_job.lock()', reads the clone
index to get the job, and registers that job. While the cloner thread
is in this loop, the fs is destroyed. The cloner threads, which live
for as long as mgr/volumes is enabled in the mgr, take
'self.async_job.lock()' and hang while reading the clone index. Any
further clone operation, which also requires this lock, hangs as well.

Fix:
Remove the clone index even if the snapshot is not present.

Fixes: https://tracker.ceph.com/issues/55217
Signed-off-by: Kotresh HR
---

diff --git a/src/pybind/mgr/volumes/fs/operations/versions/subvolume_v1.py b/src/pybind/mgr/volumes/fs/operations/versions/subvolume_v1.py
index 0e63b85878da9..542c860e897d0 100644
--- a/src/pybind/mgr/volumes/fs/operations/versions/subvolume_v1.py
+++ b/src/pybind/mgr/volumes/fs/operations/versions/subvolume_v1.py
@@ -796,8 +796,6 @@ class SubvolumeV1(SubvolumeBase, SubvolumeTemplate):
             raise VolumeException(-errno.EINVAL, "error cloning subvolume")
 
     def detach_snapshot(self, snapname, track_id):
-        if not snapname.encode('utf-8') in self.list_snapshots():
-            raise VolumeException(-errno.ENOENT, "snapshot '{0}' does not exist".format(snapname))
         try:
             with open_clone_index(self.fs, self.vol_spec) as index:
                 index.untrack(track_id)
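
The retry loop described under "Root Cause" can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not the mgr/volumes code: `FakeSubvolume`, `run_cloner`, the dict-based clone index, and the `max_rounds` cap are all hypothetical stand-ins, and the real cloner loop has no such cap. It shows that when the snapshot check raises before the index is touched, the entry survives and the same job is picked up again, whereas untracking unconditionally drains the index after one round.

```python
import errno


class VolumeException(Exception):
    """Simplified stand-in for the mgr/volumes exception type."""

    def __init__(self, error, message):
        super().__init__(message)
        self.errno = error


class FakeSubvolume:
    """Hypothetical model: a set of snapshots plus an in-memory clone index."""

    def __init__(self, snapshots, clone_index):
        self.snapshots = set(snapshots)
        self.clone_index = clone_index  # track_id -> clone name

    def detach_snapshot_old(self, snapname, track_id):
        # Pre-fix shape: bail out before the index entry is removed.
        if snapname not in self.snapshots:
            raise VolumeException(
                -errno.ENOENT,
                "snapshot '{0}' does not exist".format(snapname))
        self.clone_index.pop(track_id, None)

    def detach_snapshot_new(self, snapname, track_id):
        # Post-fix shape: always untrack, even if the snapshot is gone.
        self.clone_index.pop(track_id, None)


def run_cloner(subvol, detach, max_rounds=5):
    """Count how often a job is picked up before the index drains.

    max_rounds is an artificial cap so the model terminates.
    """
    rounds = 0
    while subvol.clone_index and rounds < max_rounds:
        track_id = next(iter(subvol.clone_index))
        rounds += 1
        try:
            # The clone has failed (snapshot missing); try to clean up.
            detach(subvol, "snap1", track_id)
        except VolumeException:
            pass  # entry stays in the index, so the job is retried
    return rounds


# The snapshot was deleted behind the interface, so the set is empty.
old = FakeSubvolume(snapshots=[], clone_index={"track-1": "clone1"})
new = FakeSubvolume(snapshots=[], clone_index={"track-1": "clone1"})
old_rounds = run_cloner(old, FakeSubvolume.detach_snapshot_old)
new_rounds = run_cloner(new, FakeSubvolume.detach_snapshot_new)
print("old:", old_rounds, old.clone_index)
print("new:", new_rounds, new.clone_index)
```

In this model the pre-fix variant only stops because of the cap and leaves the stale entry behind, while the post-fix variant needs a single round.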