Calling _finish_hunting() clears the bool hunting flag, which means we no
longer retry periodically by connecting to another mon. Instead, we send
keepalives every 10s. But, since we aren't yet in state HAVE_SESSION, we
don't check that the keepalives are getting responses. This means that an
ill-timed connection reset (say, after we get a MonMap, but before we
finish authenticating) can drop the monc into a black hole that does not
retry.
Instead, we should *only* call _finish_hunting() when we complete the
authentication handshake.
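
The failure mode described above can be sketched with a toy model. This is
not the real MonClient: ToyMonClient, peer_alive, reopen_count and
retries_after_reset() are hypothetical names invented for illustration, and
tick() is a simplified guess at the periodic logic (retry while hunting,
otherwise only act on a dead peer once in HAVE_SESSION).

```cpp
// Toy model only; every name below is hypothetical, not Ceph's API.
enum class State { NONE, NEGOTIATING, HAVE_SESSION };

struct ToyMonClient {
  State state = State::NONE;
  bool hunting = true;     // while set, tick() keeps trying another mon
  bool peer_alive = true;  // false after a silent connection reset
  int reopen_count = 0;    // how many times we (re)opened a session

  void reopen_session() {
    state = State::NEGOTIATING;
    hunting = true;
    ++reopen_count;
  }

  void finish_hunting() { hunting = false; }

  // finish_on_monmap == true models the old behaviour, where receiving
  // a MonMap was enough to stop hunting.
  void handle_monmap(bool finish_on_monmap) {
    if (finish_on_monmap)
      finish_hunting();
  }

  // Completing authentication is the only place the fixed code stops hunting.
  void handle_auth_done() {
    state = State::HAVE_SESSION;
    finish_hunting();
  }

  void tick() {
    if (hunting) {
      reopen_session();  // periodic retry against another mon
      return;
    }
    // Not hunting: a keepalive would be sent here, but its response is
    // only checked once we have a session.
    if (state == State::HAVE_SESSION && !peer_alive)
      reopen_session();
  }
};

// Counts retries after a reset that lands between the MonMap and the
// end of authentication.
int retries_after_reset(bool finish_on_monmap, int ticks) {
  ToyMonClient c;
  c.reopen_session();                 // initial connection attempt
  c.handle_monmap(finish_on_monmap);  // MonMap arrives
  c.peer_alive = false;               // ill-timed connection reset
  const int before = c.reopen_count;
  for (int i = 0; i < ticks; ++i)
    c.tick();
  return c.reopen_count - before;
}
```

In this model, retries_after_reset(true, 3) yields no retries (the black
hole), while retries_after_reset(false, 3) retries on every tick.
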
Fixes: #8278
Backport: firefly, dumpling
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
(cherry picked from commit 77a6f0aefebebf057f02bfb95c088a30ed93c53f)
if (!monmap.get_addr_name(cur_con->get_peer_addr(), cur_mon)) {
ldout(cct, 10) << "mon." << cur_mon << " went away" << dendl;
_reopen_session(); // can't find the mon we were talking to (above)
- } else {
- _finish_hunting();
}
map_cond.Signal();
void MonClient::handle_subscribe_ack(MMonSubscribeAck *m)
{
- _finish_hunting();
-
if (sub_renew_sent != utime_t()) {
sub_renew_after = sub_renew_sent;
sub_renew_after += m->interval / 2.0;