librbd: fix a deadlock on image_lock caused by Mirror::image_disable()
With Mirror::image_disable() taking image_lock for write and calling
list_children() under it, the following deadlock is possible:
1. Mirror::image_disable() takes image_lock for write and calls
list_children()
2. AbstractWriteLog::periodic_stats() timer fires (it runs every
5 seconds) and ImageCacheState::write_image_cache_state() is called
under a global timer_lock
3. ImageCacheState::write_image_cache_state() successfully takes
owner_lock and blocks attempting to take image_lock for read because
it's already held for write by Mirror::image_disable()
4. list_children() blocks inside of a call to ImageState::close() on
a descendant image
5. The descendant image close can't proceed because TokenBucketThrottle
requires a global timer_lock to complete QosImageDispatch shutdown
6. safe_timer thread which is holding timer_lock can't proceed because
ImageCacheState::write_image_cache_state() is effectively blocked on
the descendant image close through Mirror::image_disable()
Until commit
281a64acf920 ("librbd: remove snapshot mirror image-meta
when disabling"), Mirror::image_disable() was taking image_lock only for
read meaning that this deadlock wasn't possible. The only other change
that commit
281a64acf920 made to the code block protected by image_lock
was using child_mirror_image_internal for cls_client::mirror_image_get()
call on descendant images instead of mirror_image_internal to preserve
the value of mirror_image_internal for later. Both are local variables
that have nothing to do with image_lock, so I'm going back and making
Mirror::image_disable() take image_lock only for read again.
Fixes: https://tracker.ceph.com/issues/70190
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit
ff9aa20bc358775bf372052787322b1452886c13)