git.apps.os.sepia.ceph.com Git - ceph.git/commit
rbd-mirror: resume pending shutdown on error in snapshot replayer
author Ilya Dryomov <idryomov@gmail.com>
Wed, 24 Aug 2022 10:56:31 +0000 (12:56 +0200)
committer Ilya Dryomov <idryomov@gmail.com>
Thu, 1 Sep 2022 18:24:09 +0000 (20:24 +0200)
commit 9a4030e36cca053d874aca0c3354241375476999
tree 10fa4d284de596301a1c5eb187d5189c62d03a23
parent 2b20fd6d102ba28c4cab3f70b7ca97e6cc0b5d16

If a shutdown is requested, e.g. by update_pool_replayers() because the
remote RADOS instance got blocklisted, and Replayer::shut_down() defers
it until the current snapshot sync completes, the shutdown gets stuck if
the replayer encounters an error in the interim.  This is particularly
likely in the blocklist case: a higher layer may detect that the client
got blocklisted and request a shutdown first, and when the replayer then
sees EBLOCKLISTED itself, it calls handle_replay_complete(), which does
not resume a pending shutdown.  Because update_pool_replayers() blocks
on shutdown with Mirror::m_lock held, the entire daemon eventually hangs
indefinitely.
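
The failure mode described above can be illustrated with a minimal,
self-contained sketch.  This is a toy model under stated assumptions,
not the actual Replayer code: ToyReplayer, start_sync(),
handle_replay_error(), shutdown_pending() and the resume_pending flag
are hypothetical names introduced only for illustration.  It shows a
shutdown callback that is parked while a sync is in progress, and an
error path that either forgets the parked callback (the stuck case) or
resumes it.

```cpp
#include <functional>
#include <mutex>

// Hypothetical, simplified model of a replayer; NOT the Ceph class.
class ToyReplayer {
public:
  // Mark a snapshot sync as being in progress.
  void start_sync() {
    std::lock_guard<std::mutex> locker(m_lock);
    m_sync_in_progress = true;
  }

  // Request a shutdown.  If a sync is in progress, the callback is
  // parked until the sync completes or fails; otherwise it runs now.
  void shut_down(std::function<void(int)> on_finish) {
    {
      std::lock_guard<std::mutex> locker(m_lock);
      if (m_sync_in_progress) {
        m_on_shut_down = std::move(on_finish);
        return;
      }
    }
    on_finish(0);
  }

  // Error path taken when the sync fails.  With resume_pending == false
  // the parked callback is never invoked, which models the hang; with
  // resume_pending == true the parked shutdown is resumed.
  void handle_replay_error(int r, bool resume_pending) {
    std::function<void(int)> on_shut_down;
    {
      std::lock_guard<std::mutex> locker(m_lock);
      m_sync_in_progress = false;
      m_error = r;
      if (resume_pending) {
        // Take ownership of the parked callback, leaving the member empty.
        on_shut_down.swap(m_on_shut_down);
      }
    }
    // Invoke outside the lock so the callback may re-enter the object.
    if (on_shut_down) {
      on_shut_down(0);
    }
  }

  // True while a shutdown callback is parked and has not been invoked.
  bool shutdown_pending() const {
    std::lock_guard<std::mutex> locker(m_lock);
    return static_cast<bool>(m_on_shut_down);
  }

private:
  mutable std::mutex m_lock;
  bool m_sync_in_progress = false;
  int m_error = 0;
  std::function<void(int)> m_on_shut_down;
};
```

In this sketch, a caller that blocks on the shutdown callback while
holding its own lock never makes progress when resume_pending is false,
which mirrors the hang described above.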

Fixes: https://tracker.ceph.com/issues/56154
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit fc4cc575bc53f62f88ee3faf0daba8906bc1c6c1)
src/test/rbd_mirror/image_replayer/snapshot/test_mock_Replayer.cc
src/tools/rbd_mirror/image_replayer/snapshot/Replayer.cc