thread A:
- _committed_osd_maps starts
- checks is_stopping(), false
thread B:
- calls shutdown()
- takes osd_lock
- drains/clears peering_wq, etc.
thread A:
- finally gets osd_lock
- queues new peering_wq events
Eventually the dtor on the peering wq/tp asserts out.
We still need the check before taking the lock, since it resolves a
deadlock with msgr shutdown; see
aa8f2f138cc633292376dd1f449c7ebfc8a37910.
Fix by checking is_stopping() both outside and inside of osd_lock,
so that the inner check is ordered wrt the shutdown() call. This
matches (almost) every other site that checks is_stopping()
directly after taking osd_lock or heartbeat_lock.
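
For illustration only, a minimal standalone sketch of the check-outside-then-inside
pattern described above. It is not the Ceph code: Service, committed_maps(),
work_queue and the use of std::mutex/std::atomic are hypothetical stand-ins for
the OSD, _committed_osd_maps, peering_wq and osd_lock, and it assumes shutdown()
sets the stopping flag while holding the lock.

```cpp
#include <atomic>
#include <mutex>
#include <vector>

// Hypothetical stand-in for the OSD; none of these names are Ceph APIs.
struct Service {
  std::mutex lock;                    // plays the role of osd_lock
  std::atomic<bool> stopping{false};  // plays the role of is_stopping()
  std::vector<int> work_queue;        // plays the role of peering_wq

  // Returns true if the event was queued, false if we bailed out.
  bool committed_maps(int event) {
    // Outer check: do not block on the lock if shutdown has already begun.
    if (stopping.load()) {
      return false;
    }
    std::lock_guard<std::mutex> l(lock);
    // Inner check: shutdown() may have run while we waited for the lock,
    // so re-check now that we are ordered with respect to it.
    if (stopping.load()) {
      return false;
    }
    work_queue.push_back(event);
    return true;
  }

  // Assumed to set the flag and drain the queue under the same lock.
  void shutdown() {
    std::lock_guard<std::mutex> l(lock);
    stopping.store(true);
    work_queue.clear();
  }
};
```

With only the outer check, a caller that passed it and then blocked on the lock
could still push onto work_queue after shutdown() had drained it, which is the
interleaving listed for thread A and thread B.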
Fixes: http://tracker.ceph.com/issues/20273
Signed-off-by: Sage Weil <sage@redhat.com>
return;
}
Mutex::Locker l(osd_lock);
+ if (is_stopping()) {
+ dout(10) << __func__ << " bailing, we are shutting down" << dendl;
+ return;
+ }
map_lock.get_write();
bool do_shutdown = false;