From: Sage Weil
Date: Fri, 16 Jun 2017 02:15:37 +0000 (-0400)
Subject: osd: bail from _committed_osd_maps inside osd_lock
X-Git-Tag: v12.1.0~60^2
X-Git-Url: http://git-server-git.apps.pok.os.sepia.ceph.com/?a=commitdiff_plain;h=0267110d3d4ba1cf80507fd741bb016981f7089b;p=ceph.git

osd: bail from _committed_osd_maps inside osd_lock

thread A:
- _committed_osd_maps starts
- checks is_stopping(), false
thread B:
- calls shutdown()
- takes osd_lock
- drains/clears peering_wq, etc.
thread A:
- finally gets osd_lock
- queues new peering_wq events

Eventually the dtor on the peering wq/tp asserts out.

We still need to check before taking the lock to resolve a deadlock
with msgr shutdown; see aa8f2f138cc633292376dd1f449c7ebfc8a37910.

Fix by checking both outside and inside of osd_lock so that the check
is ordered wrt the shutdown() call. This matches (almost) every other
site checking is_stopping() directly after taking osd_lock or
heartbeat_lock.

Fixes: http://tracker.ceph.com/issues/20273
Signed-off-by: Sage Weil
---

diff --git a/src/osd/OSD.cc b/src/osd/OSD.cc
index 233104a8bc94..5e8a3b0e5891 100644
--- a/src/osd/OSD.cc
+++ b/src/osd/OSD.cc
@@ -7563,6 +7563,10 @@ void OSD::_committed_osd_maps(epoch_t first, epoch_t last, MOSDMap *m)
     return;
   }
   Mutex::Locker l(osd_lock);
+  if (is_stopping()) {
+    dout(10) << __func__ << " bailing, we are shutting down" << dendl;
+    return;
+  }
   map_lock.get_write();
 
   bool do_shutdown = false;
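
The check-outside, check-inside pattern described in the commit message can
be sketched in isolation as follows. This is a minimal illustration, not the
Ceph code: Service, handle_event, queued, stopping_, lock_ and queue_ are
hypothetical names, std::mutex stands in for Ceph's Mutex, and the sketch
assumes the stopping flag is set while the lock is held.

```cpp
#include <atomic>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical stand-in for the OSD; these are not Ceph's real types.
class Service {
public:
  // Returns true if the event was queued, false if we bailed out.
  bool handle_event(int ev) {
    // First check, outside the lock: lets us bail without blocking on
    // the lock once shutdown has begun.
    if (stopping_.load()) {
      return false;
    }
    std::lock_guard<std::mutex> l(lock_);
    // Second check, inside the lock: shutdown() may have run while we
    // were waiting for the lock, so the earlier result may be stale.
    if (stopping_.load()) {
      return false;
    }
    queue_.push_back(ev);
    return true;
  }

  // Assumption: the flag is set and the queue drained under the same lock.
  void shutdown() {
    std::lock_guard<std::mutex> l(lock_);
    stopping_.store(true);
    queue_.clear();
  }

  // Number of events currently queued.
  std::size_t queued() {
    std::lock_guard<std::mutex> l(lock_);
    return queue_.size();
  }

private:
  std::mutex lock_;
  std::atomic<bool> stopping_{false};
  std::vector<int> queue_;
};
```

With only the first check, a caller that passes it and then blocks on the
lock could still push onto queue_ after shutdown() has drained it. The
second check closes that window because it runs under the same lock that
shutdown() holds.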