We used to assert whenever a message's timecheck epoch (which maps to
the election epoch) was older than the current epoch. However, if a
monitor lags just enough to miss an election entirely, it may
eventually reply to a timecheck from an older epoch, which makes the
leader assert. Instead, drop such messages and log a warning.
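The intended behaviour can be sketched in isolation as follows. This is
a minimal illustration, not the monitor code: PongAction and
classify_pong are hypothetical names, and the epochs are assumed to be
plain unsigned integers. A stale epoch is discarded, while any other
mismatch is still treated as a bug.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names for illustration only; not part of the monitor code.
enum class PongAction { Discard, Process };

// Decide what to do with a PONG given its epoch and the current one.
// A stale epoch is dropped; a newer epoch still trips the assert.
inline PongAction classify_pong(std::uint64_t msg_epoch,
                                std::uint64_t current_epoch) {
  if (msg_epoch < current_epoch) {
    return PongAction::Discard;  // severely lagged peer: ignore the reply
  }
  assert(msg_epoch == current_epoch);  // anything newer is unexpected
  return PongAction::Process;
}
```

The stale-epoch check has to come before the equality assert; otherwise
a lagged reply would still bring down the leader.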
Fixes: #3835
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
dout(10) << __func__ << " " << *m << dendl;
/* handles PONG's */
assert(m->op == MTimeCheck::OP_PONG);
- assert(m->epoch == timecheck_epoch);
entity_inst_t other = m->get_source_inst();
+ if (m->epoch < timecheck_epoch) {
+ dout(1) << __func__ << " got old timecheck epoch " << m->epoch
+ << " from " << other
+ << " curr " << timecheck_epoch
+ << " -- severely lagged? discard" << dendl;
+ return;
+ }
+ assert(m->epoch == timecheck_epoch);
if (m->round < timecheck_round) {
dout(1) << __func__ << " got old round " << m->round