author	Ilya Dryomov <idryomov@gmail.com>
	Wed, 24 Aug 2022 10:56:31 +0000 (12:56 +0200)
committer	Ilya Dryomov <idryomov@gmail.com>
	Thu, 1 Sep 2022 18:24:09 +0000 (20:24 +0200)
commit	9a4030e36cca053d874aca0c3354241375476999
tree	10fa4d284de596301a1c5eb187d5189c62d03a23
parent	2b20fd6d102ba28c4cab3f70b7ca97e6cc0b5d16

rbd-mirror: resume pending shutdown on error in snapshot replayer

If a shutdown is requested, e.g. by update_pool_replayers() because the
remote RADOS instance got blocklisted, and Replayer::shut_down() pends
it on completion of the current snapshot sync, it gets stuck if the
replayer encounters an error in the interim. This is particularly likely
in the blocklist case: a higher layer may detect that the client got
blocklisted and request a shutdown first, and then when the replayer
sees EBLOCKLISTED in turn, it calls handle_replay_complete() -- which
does not resume a pending shutdown. Because update_pool_replayers()
blocks on shutdown with Mirror::m_lock held, eventually the entire
daemon hangs indefinitely.
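
The failure mode described above can be illustrated with a small,
single-threaded toy model. This is only a sketch under stated
assumptions, not the actual Replayer code: ToyReplayer, start_sync(),
handle_sync_complete(), resume_pending_shut_down(),
shutdown_completes() and kToyBlocklistedError are hypothetical names
invented for illustration, and the real class is asynchronous and
lock-protected. The resume_on_error flag switches between the behaviour
described as buggy (error path ignores the pending shutdown) and the
behaviour the commit title describes (error path resumes it).

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Placeholder error code for the toy model; NOT the real EBLOCKLISTED value.
constexpr int kToyBlocklistedError = -1;

// Hypothetical, simplified stand-in for a snapshot replayer.
class ToyReplayer {
public:
  explicit ToyReplayer(bool resume_on_error)
    : m_resume_on_error(resume_on_error) {}

  // Mark a snapshot sync as in flight.
  void start_sync() { m_sync_in_progress = true; }

  // If a sync is in flight, defer the shutdown until the sync finishes;
  // otherwise complete it immediately.
  void shut_down(std::function<void(int)> on_finish) {
    assert(on_finish);
    if (m_sync_in_progress) {
      m_pending_shut_down = std::move(on_finish);
      return;
    }
    on_finish(0);
  }

  // Normal path: the sync finished, so any deferred shutdown is resumed.
  void handle_sync_complete() {
    m_sync_in_progress = false;
    resume_pending_shut_down();
  }

  // Error path: the sync is abandoned. Without the resume call, a deferred
  // shutdown callback is never invoked and its waiter blocks forever.
  void handle_replay_complete(int r) {
    m_error = r;
    m_sync_in_progress = false;
    if (m_resume_on_error) {
      resume_pending_shut_down();
    }
  }

private:
  void resume_pending_shut_down() {
    if (!m_pending_shut_down) {
      return;
    }
    auto on_finish = std::move(m_pending_shut_down);
    m_pending_shut_down = nullptr;  // moved-from state is unspecified
    on_finish(0);
  }

  bool m_resume_on_error;
  bool m_sync_in_progress = false;
  int m_error = 0;
  std::function<void(int)> m_pending_shut_down;
};

// Replays the sequence from the commit message: shutdown is requested
// while a sync is in flight, then the replayer hits an error.
// Returns whether the shutdown callback was ever invoked.
bool shutdown_completes(bool resume_on_error) {
  ToyReplayer replayer(resume_on_error);
  bool done = false;
  replayer.start_sync();
  replayer.shut_down([&done](int) { done = true; });
  replayer.handle_replay_complete(kToyBlocklistedError);
  return done;
}
```

In this model, shutdown_completes(false) leaves the callback uninvoked,
which corresponds to the caller blocking forever while holding its lock;
shutdown_completes(true) invokes it as soon as the error is handled.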

Fixes: https://tracker.ceph.com/issues/56154
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit fc4cc575bc53f62f88ee3faf0daba8906bc1c6c1)

src/test/rbd_mirror/image_replayer/snapshot/test_mock_Replayer.cc
src/tools/rbd_mirror/image_replayer/snapshot/Replayer.cc