When we open a heartbeat connection, there is a short window before we
attach the session to it. If a fault happens within that window,
heartbeat_reset() finds no session and does nothing, so the reset is
lost and we persistently fail to send OSD pings.
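
Take heartbeat_lock before checking for the session, so that the reset
handler is serialized with the code that attaches the session under the
same lock and can no longer run inside that window. Connections without
a session should rarely reach this path anyway (only when this specific
race happens), so taking the lock in that case, where we previously did
not, should have no negative impact.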

Fixes: http://tracker.ceph.com/issues/36602
Signed-off-by: Sage Weil <sage@redhat.com>

 bool OSD::heartbeat_reset(Connection *con)
 {
+  std::lock_guard l(heartbeat_lock);
   auto s = con->get_priv();
   if (s) {
-    heartbeat_lock.Lock();
     if (is_stopping()) {
-      heartbeat_lock.Unlock();
       return true;
     } else {
       dout(10) << "heartbeat_reset closing (old) failed hb con " << con << dendl;
     }
-    heartbeat_lock.Unlock();
   }
   return true;
 }