git-server-git.apps.pok.os.sepia.ceph.com Git - ceph.git/commit
rbd-mirror: resume pending shutdown on error in snapshot replayer
author Ilya Dryomov <idryomov@gmail.com>
Wed, 24 Aug 2022 10:56:31 +0000 (12:56 +0200)
committer Ilya Dryomov <idryomov@gmail.com>
Mon, 29 Aug 2022 18:14:17 +0000 (20:14 +0200)
commit fc4cc575bc53f62f88ee3faf0daba8906bc1c6c1
tree 53e924b70d3668b8da81775415a443b4d4f95af5
parent 75d4ce7169efac7895e25e140e7a14387224022e

If a shutdown is requested, e.g. by update_pool_replayers() because the
remote RADOS instance got blocklisted, and Replayer::shut_down() defers
it until the current snapshot sync completes, the shutdown gets stuck if
the replayer encounters an error in the interim.  This is particularly
likely in the blocklist case: a higher layer may detect that the client
got blocklisted and request a shutdown first, and then when the replayer
sees EBLOCKLISTED in turn, it calls handle_replay_complete() -- which
does not resume a pending shutdown.  Because update_pool_replayers()
blocks on shutdown with Mirror::m_lock held, the entire daemon
eventually hangs indefinitely.
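
The failure mode described above can be sketched with a toy model. This
is not the code from Replayer.cc: ToyReplayer, its members, and
shutdown_completes() are hypothetical names invented for illustration,
and the real replayer's locking and state machine are omitted. The
sketch only assumes that a shutdown callback is parked while a sync is
in flight and that the error path may or may not resume it.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Toy model only: ToyReplayer and its members are hypothetical and do not
// mirror the real rbd-mirror snapshot Replayer class.
class ToyReplayer {
public:
  explicit ToyReplayer(bool resume_on_error)
    : m_resume_on_error(resume_on_error) {}

  void start_sync() { m_sync_in_progress = true; }

  // A shutdown requested during a sync is parked until the sync ends.
  void shut_down(std::function<void()> on_finish) {
    m_on_shut_down = std::move(on_finish);
    if (!m_sync_in_progress) {
      finish_shut_down();
    }
  }

  // Normal path: the sync finished, so resume any parked shutdown.
  void handle_sync_complete() {
    m_sync_in_progress = false;
    if (m_on_shut_down) {
      finish_shut_down();
    }
  }

  // Error path: with resume_on_error == false the parked shutdown is
  // never resumed, which models the hang described in the message.
  void handle_replay_error() {
    m_sync_in_progress = false;
    if (m_resume_on_error && m_on_shut_down) {
      finish_shut_down();
    }
  }

private:
  void finish_shut_down() {
    assert(m_on_shut_down);
    auto on_finish = std::move(m_on_shut_down);
    m_on_shut_down = nullptr;
    on_finish();
  }

  bool m_resume_on_error;
  bool m_sync_in_progress = false;
  std::function<void()> m_on_shut_down;
};

// Requests a shutdown mid-sync, then fails the sync; returns whether the
// shutdown callback ever fired.
bool shutdown_completes(bool resume_on_error) {
  bool done = false;
  ToyReplayer replayer(resume_on_error);
  replayer.start_sync();
  replayer.shut_down([&done] { done = true; });
  replayer.handle_replay_error();
  return done;
}
```

In this model shutdown_completes(false) leaves the callback parked
forever, while shutdown_completes(true) fires it from the error path. A
caller that blocks on the callback while holding a lock would hang in
the first case.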

Fixes: https://tracker.ceph.com/issues/56154
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
src/test/rbd_mirror/image_replayer/snapshot/test_mock_Replayer.cc
src/tools/rbd_mirror/image_replayer/snapshot/Replayer.cc