During teuthology tests, the initial cluster bootstrap often starts up
the mons but doesn't include all of them in the initial quorum, because
mon startup is not aligned and is subject to random delays. Provide a
short grace period after cluster creation during which we will not raise
a MON_DOWN alert even though the quorum is not complete.
Fixes: https://tracker.ceph.com/issues/43584
Signed-off-by: Sage Weil <sage@newdream.net>
(cherry picked from commit eee041f2f070b88b01d45c04624872681dd158be)
Conflicts:
src/common/options/mon.yaml.in
In Pacific, options are in `src/common/options.cc`.
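
The grace-period check described above can be sketched in isolation as follows. This is a minimal illustration only, not the Ceph code: `should_report_mon_down`, its parameters, and the `Clock` alias are hypothetical names, and `std::chrono::system_clock` stands in for `ceph::real_clock`. The `mkfs_grace` argument plays the role of the `mon_down_mkfs_grace` option (default 60 seconds).

```cpp
#include <chrono>

// Hypothetical standalone helper; these names are illustrative and are not
// part of Ceph. system_clock stands in for ceph::real_clock.
using Clock = std::chrono::system_clock;

bool should_report_mon_down(int quorum_size, int monmap_size,
                            Clock::time_point monmap_created,
                            Clock::time_point now,
                            std::chrono::seconds mkfs_grace) {
  // A complete quorum never triggers the alert.
  if (quorum_size >= monmap_size) {
    return false;
  }
  // With an incomplete quorum, report only once the grace period that
  // starts at monmap creation has fully elapsed (strictly later than
  // created + grace, matching the `now > ...` comparison in the patch).
  return now > monmap_created + mkfs_grace;
}
```

With a 60-second grace, a cluster with 2 of 3 mons in quorum stays quiet 30 seconds after creation and is reported from 61 seconds onward.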
.set_description("Period in seconds between monitor-to-manager "
"health/status updates"),
+ Option("mon_down_mkfs_grace", Option::TYPE_SECS, Option::LEVEL_ADVANCED)
+ .set_default(60)
+ .add_service("mon")
+ .set_description("Period in seconds that the cluster may have a mon down after cluster creation"),
+
Option("mon_mgr_beacon_grace", Option::TYPE_SECS, Option::LEVEL_ADVANCED)
.set_default(30)
.add_service("mon")
{
int max = mon.monmap->size();
int actual = mon.get_quorum().size();
- if (actual < max) {
+ const auto now = ceph::real_clock::now();
+ if (actual < max &&
+ now > mon.monmap->created.to_real_time() + g_conf().get_val<std::chrono::seconds>("mon_down_mkfs_grace")) {
ostringstream ss;
ss << (max-actual) << "/" << max << " mons down, quorum "
<< mon.get_quorum_names();