From: Kamoltat Date: Wed, 11 Oct 2023 21:12:03 +0000 (+0000) Subject: src/mon/Monitor: Fix set_elector_disallowed_leaders X-Git-Tag: v19.0.0~283^2 X-Git-Url: http://git.apps.os.sepia.ceph.com/?a=commitdiff_plain;h=5d1b5da21591c57cb0cbbbc8775b6ea0ced953a4;p=ceph.git src/mon/Monitor: Fix set_elector_disallowed_leaders Problem: In the monitors we hold 2 copies of disallowed_leader ... 1. MonMap class 2. Elector class. When computing the ConnectivityScore for the monitors during the election, we use the `disallowed_leader` from Elector class to determine which monitors we shouldn't allow to lead. Now, we rely on the function `set_elector_disallowed_leaders` to set the `disallowed_leader` of the Elector class, MonMap class copy of the `disallowed_leader` contains the `tiebreaker_monitor` so we inherit that plus we also add the monitors that are dead due to a zone failure. Hence, the `adding dead monitors` phase is only allowed if we can enter stretch_mode. However, there is a problem when failing over a stretch cluster zone and reviving the entire zone back up, the revived monitors couldn't enter stretch_mode when they are at the state of "probing" since PaxosServices like osdmon becomes unreadable (this is expected) Solution: We unconditionally add monitors that are in `monmap->stretch_marked_down_mons` to the `disallowed_leaders` list in `Monitor::set_elector_disallowed_leaders` since if the monitors are in `monmap->stretch_marked_down_mons` we know that they probably belong in a marked down zone and is not fit for lead. This will fix the problem of newly revived monitors having different disallowed_leaders set and getting stuck in election. Fixes: https://tracker.ceph.com/issues/63183 Signed-off-by: Kamoltat --- diff --git a/src/mon/Monitor.cc b/src/mon/Monitor.cc index 6866536d0654f..27151e60b2202 100644 --- a/src/mon/Monitor.cc +++ b/src/mon/Monitor.cc @@ -6658,14 +6658,16 @@ void Monitor::notify_new_monmap(bool can_change_external_state, bool remove_rank void Monitor::set_elector_disallowed_leaders(bool allow_election) { set dl; + // inherit dl from monmap for (auto name : monmap->disallowed_leaders) { dl.insert(monmap->get_rank(name)); - } - if (is_stretch_mode()) { - for (auto name : monmap->stretch_marked_down_mons) { - dl.insert(monmap->get_rank(name)); - } - dl.insert(monmap->get_rank(monmap->tiebreaker_mon)); + } // unconditionally add stretch_marked_down_mons to the new dl copy + for (auto name : monmap->stretch_marked_down_mons) { + dl.insert(monmap->get_rank(name)); + } // add the tiebreaker_mon incase it is not in monmap->disallowed_leaders + if (!monmap->tiebreaker_mon.empty() && + monmap->contains(monmap->tiebreaker_mon)) { + dl.insert(monmap->get_rank(monmap->tiebreaker_mon)); } bool disallowed_changed = elector.set_disallowed_leaders(dl);