We clear out an osd's last_osd_report entry whenever that osd goes up or
down. Thus, if the entry is missing for an up osd, we should start the
timer instead of marking the osd down. Otherwise we get behavior like this
2012-04-24 13:22:47.888291 7fa5bc587700 mon.peon5752@0(leader).osd e21633 OSDMonitor::handle_osd_timeouts: never got MOSDPGStat info from osd 521. Marking down!
2012-04-24 13:22:50.076394 7fa5bcd88700 log [INF] : osd.521 [2607:f298:4:2243::7088]:6806/53217 boot
2012-04-24 13:22:52.903558 7fa5bc587700 mon.peon5752@0(leader).osd e21638 OSDMonitor::handle_osd_timeouts: never got MOSDPGStat info from osd 521. Marking down!
2012-04-24 13:23:15.144532 7fa5bcd88700 log [INF] : osd.521 [2607:f298:4:2243::7088]:6806/53217 boot
2012-04-24 13:23:17.967118 7fa5bc587700 mon.peon5752@0(leader).osd e21663 OSDMonitor::handle_osd_timeouts: never got MOSDPGStat info from osd 521. Marking down!
2012-04-24 13:23:22.173778 7fa5bcd88700 log [INF] : osd.521 [2607:f298:4:2243::7088]:6806/53217 boot
2012-04-24 13:23:22.981556 7fa5bc587700 mon.peon5752@0(leader).osd e21668 OSDMonitor::handle_osd_timeouts: never got MOSDPGStat info from osd 521. Marking down!
2012-04-24 13:23:45.245380 7fa5bcd88700 log [INF] : osd.521 [2607:f298:4:2243::7088]:6806/53217 boot
when the pg stats message (MOSDPGStat) doesn't arrive quickly enough.
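The intended policy can be sketched as a standalone C++ model. This is a
simplified illustration, not the monitor's code: ReportMap, clear_report and
find_timed_out are hypothetical names, times are plain seconds rather than
utime_t, "up" is a std::set, and the timer is started at the `now` argument,
whereas the patch uses ceph_clock_now(g_ceph_context).

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

// Hypothetical, simplified model of the monitor's bookkeeping: times are
// plain seconds (double), not Ceph's utime_t.
using ReportMap = std::map<int, double>;  // osd id -> time of last MOSDPGStat

// Called when an osd goes up or down: forget its last report time.
void clear_report(int osd, ReportMap& last_report) {
  last_report.erase(osd);
}

// Returns the up osds whose last report is older than `timeout` at `now`.
// An up osd with no entry gets its timer started instead of being marked
// down, which is the behavior this patch introduces.
std::vector<int> find_timed_out(double now, double timeout,
                                const std::set<int>& up_osds,
                                ReportMap& last_report) {
  assert(timeout > 0);
  std::vector<int> to_mark_down;
  for (int osd : up_osds) {
    auto it = last_report.find(osd);
    if (it == last_report.end()) {
      // No report since the osd came up: start the timer now.
      last_report[osd] = now;
    } else if (now - it->second > timeout) {
      to_mark_down.push_back(osd);
    }
  }
  return to_mark_down;
}
```

With a hypothetical timeout of 900 s, a freshly booted osd.521 passes the
check at t=0 (its timer starts there) and is only reported at t=901 if no
stats arrive in between. The pre-patch policy instead marked any up osd with
a missing entry down on the very next check, which produces the boot /
mark-down loop in the log above.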
Fixes: #2341
Signed-off-by: Sage Weil <sage@newdream.net>
Reviewed-by: Greg Farnum <gregory.farnum@dreamhost.com>
void OSDMonitor::handle_osd_timeouts(const utime_t &now,
- const std::map<int,utime_t> &last_osd_report)
+ std::map<int,utime_t> &last_osd_report)
{
utime_t timeo(g_conf->mon_osd_report_timeout, 0);
int max_osd = osdmap.get_max_osd();
continue;
const std::map<int,utime_t>::const_iterator t = last_osd_report.find(i);
if (t == last_osd_report.end()) {
- derr << "OSDMonitor::handle_osd_timeouts: never got MOSDPGStat "
- << "info from osd " << i << ". Marking down!" << dendl;
- pending_inc.new_state[i] = CEPH_OSD_UP;
- new_down = true;
- }
- else {
+ // it wasn't in the map; start the timer.
+ last_osd_report[i] = ceph_clock_now(g_ceph_context);
+ } else {
utime_t diff(now);
diff -= t->second;
if (diff > timeo) {
bool prepare_command(MMonCommand *m);
void handle_osd_timeouts(const utime_t &now,
- const std::map<int,utime_t> &last_osd_report);
+ std::map<int,utime_t> &last_osd_report);
void mark_all_down();
void send_latest(PaxosServiceMessage *m, epoch_t start=0);