git-server-git.apps.pok.os.sepia.ceph.com Git - ceph.git/commitdiff
osd: bail from _committed_osd_maps inside osd_lock 15710/head
author Sage Weil <sage@redhat.com>
Fri, 16 Jun 2017 02:15:37 +0000 (22:15 -0400)
committer Sage Weil <sage@redhat.com>
Fri, 16 Jun 2017 02:15:37 +0000 (22:15 -0400)
thread A:
 - _committed_osd_maps starts
 - checks is_stopping(), false
thread B:
 - calls shutdown()
 - takes osd_lock
 - drains/clear peering_wq, etc.
thread A:
 - finally gets osd_lock
 - queues new peering_wq events

Eventually the dtor on the peering wq/tp asserts out.

We still need to check before taking the lock to resolve a deadlock
with msgr shutdown; see aa8f2f138cc633292376dd1f449c7ebfc8a37910.

Fix by checking both outside and inside of osd_lock
so that it is ordered wrt the shutdown() call.  This
matches (almost) every other site checking is_stopping()
directly after taking osd_lock or heartbeat_lock.
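
The check-outside-then-inside pattern described above can be sketched in
isolation as follows. This is a minimal illustration, not the Ceph code:
ToyOSD, committed_maps() and the vector standing in for peering_wq are
hypothetical names, std::mutex replaces Ceph's Mutex, and the sketch assumes
the stopping flag is set while shutdown() holds osd_lock.

```cpp
#include <atomic>
#include <mutex>
#include <vector>

// Hypothetical stand-in for the OSD; none of this is the real Ceph class.
struct ToyOSD {
  std::mutex osd_lock;
  std::atomic<bool> stopping{false};
  std::vector<int> peering_wq;  // stand-in for the peering work queue

  bool is_stopping() const { return stopping.load(); }

  // Returns true if an event was queued, false if we bailed out.
  bool committed_maps(int epoch) {
    // First check, before the lock: lets us return without ever
    // blocking on osd_lock once shutdown has begun.
    if (is_stopping())
      return false;

    std::lock_guard<std::mutex> l(osd_lock);

    // Second check, under the lock: shutdown() may have run while we
    // were waiting for osd_lock, so the earlier answer can be stale.
    if (is_stopping())
      return false;

    peering_wq.push_back(epoch);
    return true;
  }

  void shutdown() {
    std::lock_guard<std::mutex> l(osd_lock);
    stopping = true;     // assumption: flag is set under osd_lock
    peering_wq.clear();  // drain the queue; nothing may be added after this
  }
};
```

With only the first check, a caller that passed it and then waited on
osd_lock could still queue an event after shutdown() drained the queue.
The second check closes that window because it is serialized with
shutdown() by the same lock.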

Fixes: http://tracker.ceph.com/issues/20273
Signed-off-by: Sage Weil <sage@redhat.com>
src/osd/OSD.cc

index 233104a8bc940cb6941e96887bfa0d444bc8220c..5e8a3b0e58918194dbd567d05250889526b23a04 100644 (file)
@@ -7563,6 +7563,10 @@ void OSD::_committed_osd_maps(epoch_t first, epoch_t last, MOSDMap *m)
     return;
   }
   Mutex::Locker l(osd_lock);
+  if (is_stopping()) {
+    dout(10) << __func__ << " bailing, we are shutting down" << dendl;
+    return;
+  }
   map_lock.get_write();
 
   bool do_shutdown = false;