From cfd8d63e0158b1d336f5e080a1e83dbf6fc60079 Mon Sep 17 00:00:00 2001
From: Kotresh HR
Date: Thu, 7 Apr 2022 18:58:28 +0530
Subject: [PATCH] mgr/volumes: Fix clone hang issue

The following sequence of operations leads to a deadlock:

1. Create a subvolume
2. Write some I/O to the subvolume
3. Create a snapshot of the subvolume
4. Create a clone of the snapshot
5. Delete the snapshot from the back end (not through the subvolume
   interface) before the clone completes
6. Delete the clone with force
7. Delete the subvolume
8. Delete the fs and its associated pools
9. Create a new fs
10. Create a new subvolume
11. Write some I/O to the subvolume
12. Create a snapshot of the subvolume
13. Create a clone of the snapshot <--- THIS OPERATION HANGS

Root Cause:
Because the snapshot was deleted from the back end, the clone fails. The
failed clone's entry is then not removed from the clone index at
'/volumes/_index/clone', because detach_snapshot() raises ENOENT for the
missing snapshot before it untracks the entry. As a result, the cloner
thread loops forever, starting the clone and failing it. Each iteration
takes 'self.async_job.lock()', reads the clone index to get the job and
registers that job.

While the cloner thread is in this loop, the fs is destroyed. The cloner
threads, which live for as long as mgr/volumes is enabled in the mgr,
then take 'self.async_job.lock()' and hang while reading the clone
index. Any further clone operation, which also requires this lock,
hangs as well.

Fix:
Remove the clone index entry even if the snapshot is not present, by
dropping the snapshot existence check from detach_snapshot().
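The retry loop described above can be modelled in a few lines of plain Python. This is a sketch under stated assumptions, not the mgr/volumes code: CloneIndex, cloner_pass, run and the two detach_snapshot_* helpers are hypothetical stand-ins, the clone index is an in-memory dict instead of '/volumes/_index/clone', and the clone failure is reduced to the detach step. It does not model the fs deletion or the hang itself, only why the stale index entry keeps the cloner picking up the same job.

```python
import errno
import threading


class VolumeException(Exception):
    # Simplified stand-in for mgr/volumes' VolumeException (assumption).
    def __init__(self, error_code, error_str):
        super().__init__(error_str)
        self.errno = error_code


class CloneIndex:
    # Hypothetical in-memory model of the clone index: track_id -> snapname.
    def __init__(self):
        self.entries = {}

    def track(self, track_id, snapname):
        self.entries[track_id] = snapname

    def untrack(self, track_id):
        self.entries.pop(track_id, None)


def detach_snapshot_before_fix(index, snapshots, snapname, track_id):
    # Mirrors the removed check: raise before the index entry is untracked.
    if snapname.encode('utf-8') not in snapshots:
        raise VolumeException(-errno.ENOENT,
                              "snapshot '{0}' does not exist".format(snapname))
    index.untrack(track_id)


def detach_snapshot_after_fix(index, snapshots, snapname, track_id):
    # Mirrors the patched code: always untrack, snapshot present or not.
    index.untrack(track_id)


def cloner_pass(lock, index, snapshots, detach):
    # One iteration of a hypothetical cloner loop; the lock is held while the
    # index is read, as with 'self.async_job.lock()' in the commit message.
    with lock:
        if not index.entries:
            return None
        track_id, snapname = next(iter(index.entries.items()))
    # The clone fails because its snapshot is gone; clean up via detach.
    try:
        detach(index, snapshots, snapname, track_id)
    except VolumeException:
        pass  # the entry stays in the index and is picked up again next pass
    return track_id


def run(detach, passes=5):
    lock = threading.Lock()
    index = CloneIndex()
    index.track("clone-1", "snap1")
    snapshots = []  # 'snap1' was removed behind the subvolume interface
    picked = [cloner_pass(lock, index, snapshots, detach) for _ in range(passes)]
    return picked, dict(index.entries)
```

With the old check, run(detach_snapshot_before_fix) picks "clone-1" on every one of the five passes and the entry is never removed; with the fix, run(detach_snapshot_after_fix) picks it once, untracks it, and the later passes find an empty index.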
Fixes: https://tracker.ceph.com/issues/55217
Signed-off-by: Kotresh HR
---
 src/pybind/mgr/volumes/fs/operations/versions/subvolume_v1.py | 2 --
 1 file changed, 2 deletions(-)

diff --git a/src/pybind/mgr/volumes/fs/operations/versions/subvolume_v1.py b/src/pybind/mgr/volumes/fs/operations/versions/subvolume_v1.py
index 0e63b85878da9..542c860e897d0 100644
--- a/src/pybind/mgr/volumes/fs/operations/versions/subvolume_v1.py
+++ b/src/pybind/mgr/volumes/fs/operations/versions/subvolume_v1.py
@@ -796,8 +796,6 @@ class SubvolumeV1(SubvolumeBase, SubvolumeTemplate):
             raise VolumeException(-errno.EINVAL, "error cloning subvolume")
 
     def detach_snapshot(self, snapname, track_id):
-        if not snapname.encode('utf-8') in self.list_snapshots():
-            raise VolumeException(-errno.ENOENT, "snapshot '{0}' does not exist".format(snapname))
         try:
             with open_clone_index(self.fs, self.vol_spec) as index:
                 index.untrack(track_id)
-- 
2.39.5