git.apps.os.sepia.ceph.com Git - ceph-ci.git/commit
src/mon/Monitor: Fix set_elector_disallowed_leaders
author Kamoltat <ksirivad@redhat.com>
Wed, 11 Oct 2023 21:12:03 +0000 (21:12 +0000)
committer Kamoltat <ksirivad@redhat.com>
Thu, 12 Oct 2023 20:07:12 +0000 (20:07 +0000)
commit 5d1b5da21591c57cb0cbbbc8775b6ea0ced953a4
tree cac172d466287cebc90a3ac05dc7333b0d677970
parent 17bd0c6aca24a8d10b1f346462d6df2e29afd80e

Problem:

The monitors hold two copies of `disallowed_leaders`: one in the
MonMap class and one in the Elector class.
When computing the ConnectivityScore for the monitors during
an election, we use the `disallowed_leaders` from the Elector
class to determine which monitors must not be allowed to lead.

We rely on the function `set_elector_disallowed_leaders`
to set the `disallowed_leaders` of the Elector class. The MonMap
copy of `disallowed_leaders` contains the
`tiebreaker_monitor`, so we inherit that and also add the
monitors that are dead due to a zone failure.

The `adding dead monitors` phase is only allowed if we can
enter stretch_mode. This causes a problem when a stretch cluster
zone fails over and the entire zone is then revived: the revived
monitors cannot enter stretch_mode while they are in the "probing"
state, because PaxosServices such as osdmon become unreadable
(this is expected). The revived monitors therefore skip that phase.

Solution:

We unconditionally add the monitors that are in
`monmap->stretch_marked_down_mons` to the
`disallowed_leaders` list in
`Monitor::set_elector_disallowed_leaders`, since
if the monitors are in `monmap->stretch_marked_down_mons`
we know that they probably belong to a marked-down
zone and are not fit to lead.

This fixes the problem of newly revived monitors
having a different `disallowed_leaders` set from the
rest of the quorum and getting stuck in election.
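
The difference between the old and new behaviour can be sketched as
follows. This is a minimal illustration, not the code in
`src/mon/Monitor.cc`: `MonMapSketch`, `compute_disallowed_old` and
`compute_disallowed_new` are hypothetical names, monitors are identified
by plain strings, and the `can_enter_stretch_mode` flag stands in for
whatever check the real function performs.

```cpp
#include <set>
#include <string>

// Hypothetical, simplified stand-in for the two MonMap fields discussed
// above. The real MonMap class has many more members.
struct MonMapSketch {
  std::set<std::string> disallowed_leaders;        // contains the tiebreaker
  std::set<std::string> stretch_marked_down_mons;  // mons in a failed zone
};

// Old behaviour as described in "Problem": the marked-down monitors are
// added only when the monitor is able to enter stretch mode.
std::set<std::string> compute_disallowed_old(const MonMapSketch& m,
                                             bool can_enter_stretch_mode) {
  std::set<std::string> out = m.disallowed_leaders;
  if (can_enter_stretch_mode) {
    out.insert(m.stretch_marked_down_mons.begin(),
               m.stretch_marked_down_mons.end());
  }
  return out;
}

// New behaviour as described in "Solution": the marked-down monitors are
// always added, so a probing monitor computes the same set as its peers.
std::set<std::string> compute_disallowed_new(const MonMapSketch& m) {
  std::set<std::string> out = m.disallowed_leaders;
  out.insert(m.stretch_marked_down_mons.begin(),
             m.stretch_marked_down_mons.end());
  return out;
}
```

With a tiebreaker `e` and marked-down monitors `c` and `d`, the old
function yields `{c, d, e}` on a monitor that can enter stretch mode but
only `{e}` on a probing monitor, which is the mismatch described above.
The new function yields `{c, d, e}` on both.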

Fixes: https://tracker.ceph.com/issues/63183
Signed-off-by: Kamoltat <ksirivad@redhat.com>
src/mon/Monitor.cc