During teuthology tests, the initial cluster bootstrap often starts the
mons but does not include all of them in the initial quorum, because mon
startup is not synchronized and is subject to random delays. Provide a
short grace period after cluster creation (mon_down_mkfs_grace, default
1 minute) during which we do not raise a MON_DOWN alert even though the
quorum is incomplete.

Fixes: https://tracker.ceph.com/issues/43584
Signed-off-by: Sage Weil <sage@newdream.net>
default: 5
services:
- mon
+- name: mon_down_mkfs_grace
+ type: secs
+ level: advanced
+ desc: Period in seconds that the cluster may have a mon down after cluster creation
+ default: 1_min
+ services:
+ - mon
- name: mon_mgr_beacon_grace
type: secs
level: advanced
{
int max = mon.monmap->size();
int actual = mon.get_quorum().size();
- if (actual < max) {
+ const auto now = ceph::real_clock::now();
+ if (actual < max &&
+ now > mon.monmap->created.to_real_time() + g_conf().get_val<std::chrono::seconds>("mon_down_mkfs_grace")) {
ostringstream ss;
ss << (max-actual) << "/" << max << " mons down, quorum "
<< mon.get_quorum_names();
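
The condition added above can be sketched in isolation as follows. This is a minimal illustration, not the Ceph code: should_report_mon_down and its parameters are hypothetical names, std::chrono::system_clock stands in for ceph::real_clock, and the monmap creation time and grace period are passed in directly instead of being read from mon.monmap->created and g_conf().

```cpp
#include <cassert>
#include <chrono>

using Clock = std::chrono::system_clock;  // assumption: stand-in for ceph::real_clock

// Hypothetical helper mirroring the patched condition: report MON_DOWN only
// when the quorum is incomplete AND the grace period since cluster creation
// has fully elapsed (strictly greater than, as in the diff).
bool should_report_mon_down(int monmap_size, int quorum_size,
                            Clock::time_point created,
                            Clock::time_point now,
                            std::chrono::seconds grace) {
  assert(quorum_size <= monmap_size);  // a quorum cannot exceed the monmap
  return quorum_size < monmap_size && now > created + grace;
}
```

With a 60 second grace, a cluster that is 30 seconds old with 2 of 3 mons in quorum is not reported, while the same cluster at 61 seconds is. Because the comparison is strict, the alert is still suppressed at exactly created + grace.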