author	Kotresh HR <khiremat@redhat.com>
	Fri, 16 Jul 2021 10:28:37 +0000 (15:58 +0530)
committer	Kotresh HR <khiremat@redhat.com>
	Mon, 24 Jan 2022 11:08:50 +0000 (16:38 +0530)
commit	ef373d8aea3133a15bb22ae62d97886b75eaca0c
tree	a7aaf04050c76db9b8d4f4d0285f1558c628c8c3
parent	6016a9173221b550c7523d88a931c66b77a437f8

mgr/volumes: Fix a race during clone cancel

Issue:
The race is that a cancelled clone can still go ahead and sync the data to the cloned subvolume.

Here is the sequence in which this can happen:
1. A subvolume clone is created and is in the PENDING state (the initial state for a clone).
2. The clone job is picked up by a cloner thread, which starts the state machine (i.e. start_clone_sm)
and queries the clone state, which is PENDING. The thread now holds a local copy of that state.
3. 'clone cancel' is called. Because the state is still PENDING, it just removes the tracker from the
index. This moves the clone state from PENDING to CANCEL.
4. The cloner thread proceeds from its local copy of the state (PENDING) to IN-PROGRESS and goes on
to sync the data, even though the clone was cancelled.
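The interleaving above can be reproduced deterministically with a toy model. This is a minimal sketch, not the mgr/volumes code: `ToyClone`, `cloner_pick_up`, `cancel_pending` and `cloner_advance` are hypothetical names, and the four steps are called in order instead of running real threads.

```python
# Hypothetical toy model of the race; NOT the actual mgr/volumes implementation.
PENDING, IN_PROGRESS, CANCEL = "PENDING", "IN-PROGRESS", "CANCEL"


class ToyClone:
    def __init__(self):
        self.state = PENDING   # step 1: a new clone starts out PENDING
        self.tracked = True    # the clone's tracker is present in the index


def cloner_pick_up(clone):
    # step 2: the cloner thread reads the state and keeps a local copy
    return clone.state


def cancel_pending(clone):
    # step 3: 'clone cancel' sees PENDING, so it only drops the tracker and
    # marks the clone as cancelled; it cannot tell that a thread holds the job
    if clone.state == PENDING:
        clone.tracked = False
        clone.state = CANCEL


def cloner_advance(clone, local_state):
    # step 4: the thread acts on its stale local copy, not on clone.state
    if local_state == PENDING:
        clone.state = IN_PROGRESS  # ...and would then go on to sync the data


clone = ToyClone()
local = cloner_pick_up(clone)
cancel_pending(clone)
cloner_advance(clone, local)
print(clone.state)  # IN-PROGRESS, although the clone was cancelled
```

The cancellation is silently overwritten because step 4 decides based on a value read before step 3 ran.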

Fix:
In addition to checking for the PENDING state, also check, within the lock, whether the job has already
been picked up by a cloner thread. Together, these checks guarantee that no cloner thread has picked
the job up for processing, so plain removal of the index entry is sufficient.
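The fix can be illustrated with a simplified model in which a cloner thread records that it owns a job while holding the same lock that cancel takes. This is a hypothetical sketch: `ToyCloner`, `picked`, `cancel_requested`, `pick_up` and `advance` are illustrative names, not the real classes in async_cloner.py or async_job.py, and the "signalled" branch for an already picked-up job is a stand-in for whatever the cloner does for in-progress cancellation, which this commit does not describe.

```python
import threading

# Hypothetical sketch of the fix; names do not come from mgr/volumes.
PENDING, IN_PROGRESS, CANCEL = "PENDING", "IN-PROGRESS", "CANCEL"


class ToyCloner:
    def __init__(self):
        self.lock = threading.Lock()
        self.index = {}              # clone name -> state (the tracking index)
        self.picked = set()          # clones already picked up by a cloner thread
        self.cancel_requested = set()

    def submit(self, name):
        with self.lock:
            self.index[name] = PENDING

    def pick_up(self, name):
        # a cloner thread marks the job as its own while holding the lock
        with self.lock:
            if name not in self.index:
                return None          # cancelled before pick-up: nothing to do
            self.picked.add(name)
            return self.index[name]

    def cancel(self, name):
        with self.lock:
            state = self.index.get(name)
            if state == PENDING and name not in self.picked:
                # no thread owns the job, so plain index removal is enough
                del self.index[name]
                return "removed"
            # a thread already owns the job: ask it to stop instead
            self.cancel_requested.add(name)
            return "signalled"

    def advance(self, name):
        # the owning thread re-checks for a cancel request under the lock
        with self.lock:
            if name in self.cancel_requested:
                self.index[name] = CANCEL
                return CANCEL
            self.index[name] = IN_PROGRESS
            return IN_PROGRESS


c = ToyCloner()
c.submit("a")
c.submit("b")
print(c.cancel("a"))   # removed   (not yet picked up)
print(c.pick_up("a"))  # None      (nothing left to clone)
c.pick_up("b")
print(c.cancel("b"))   # signalled (a thread already owns it)
print(c.advance("b"))  # CANCEL
```

Because both the pick-up and the PENDING check happen under one lock, cancel can no longer take the plain-removal path for a job that a thread is about to advance.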

Fixes: https://tracker.ceph.com/issues/51805
Signed-off-by: Kotresh HR <khiremat@redhat.com>
(cherry picked from commit e1ccf367dff0b9d57bd5c50477cfa60c76c64762)

src/pybind/mgr/volumes/fs/async_cloner.py
src/pybind/mgr/volumes/fs/async_job.py