Avoid this deadlock:
- a fault
- delay thread entry gets a fast dispatch message
  - drops delay_lock
  - calls into fast_dispatch
- reaper tries to reap the pipe while holding msgr->lock
  - pipe->join()
  - delay_thread->join()
  - blocks waiting for delay_thread to exit
- delay thread / fast dispatch blocks on msgr->lock trying to mark_down
The solution is to drop the msgr lock while joining the thread. This will
allow the join() to complete. Adjust the reaper thread to recheck the
exit condition (reaper_stop) after reaper() returns, since the lock may
have been dropped in the meantime. The other two callers of reaper() do
not care.
Fixes: #8891
Signed-off-by: Sage Weil <sage@redhat.com>
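
The pattern described above can be sketched in isolation. This is a minimal illustration only, assuming standard C++11 threading primitives rather than Ceph's own Mutex/Cond/Thread classes; the Messenger struct, its fields (pipe_pending, reaping, marked_down) and run_demo() are hypothetical names invented for the example and are not part of the patch.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the messenger; not Ceph's actual classes.
struct Messenger {
  std::mutex lock;               // plays the role of msgr->lock
  std::condition_variable cond;  // plays the role of reaper_cond
  bool reaper_stop = false;
  bool pipe_pending = true;      // one pipe is queued for reaping
  bool reaping = false;          // set once the reaper has started on the pipe
  bool marked_down = false;
  std::thread delay_thread;      // plays the role of the pipe's delay thread

  // Called with lk held; may drop and retake it.
  void reaper(std::unique_lock<std::mutex>& lk) {
    while (pipe_pending) {
      pipe_pending = false;
      reaping = true;
      cond.notify_all();
      lk.unlock();               // let the delay thread take the lock and exit
      delay_thread.join();
      lk.lock();
    }
  }

  void reaper_entry() {
    std::unique_lock<std::mutex> lk(lock);
    while (!reaper_stop) {
      reaper(lk);
      if (reaper_stop)           // stop may have been requested while unlocked
        break;
      cond.wait(lk);
    }
  }
};

// Returns true if the pipe was reaped and the reaper thread shut down cleanly.
bool run_demo() {
  Messenger m;
  // The delay thread needs the messenger lock before it can exit,
  // loosely mirroring fast dispatch calling mark_down.
  m.delay_thread = std::thread([&m] {
    std::lock_guard<std::mutex> g(m.lock);
    m.marked_down = true;
  });
  std::thread reaper_thread([&m] { m.reaper_entry(); });
  {
    std::unique_lock<std::mutex> lk(m.lock);
    m.cond.wait(lk, [&m] { return m.reaping; });
    m.reaper_stop = true;        // request shutdown, possibly mid-join
  }
  m.cond.notify_all();
  reaper_thread.join();
  return m.marked_down && !m.delay_thread.joinable();
}
```

If join() were called with the lock still held, the delay thread in this sketch could never acquire it and the join would not return. If the recheck of reaper_stop were removed, a stop request that arrives while the lock is dropped could be missed and the reaper would wait on the condition variable indefinitely.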
ldout(cct,10) << "reaper_entry start" << dendl;
lock.Lock();
while (!reaper_stop) {
- reaper();
+ reaper(); // may drop and retake the lock
+ if (reaper_stop)
+ break;
reaper_cond.Wait(lock);
}
lock.Unlock();
p->unregister_pipe();
assert(pipes.count(p));
pipes.erase(p);
+
+ // drop msgr lock while joining thread; the delay thread could be
+ // trying to fast dispatch, preventing it from joining without
+ // blocking and deadlocking.
+ lock.Unlock();
p->join();
+ lock.Lock();
+
if (p->sd >= 0)
::close(p->sd);
ldout(cct,10) << "reaper reaped pipe " << p << " " << p->get_peer_addr() << dendl;