git.apps.os.sepia.ceph.com Git - ceph.git/commit
src/mon/Monitor: Fix set_elector_disallowed_leaders 54004/head
author Kamoltat <ksirivad@redhat.com>
Wed, 11 Oct 2023 21:12:03 +0000 (21:12 +0000)
committer Kamoltat <ksirivad@redhat.com>
Fri, 13 Oct 2023 16:10:17 +0000 (16:10 +0000)
commit 59b79e49742227f44f3fe002265b1c0663d641ce
tree 795953bfb1f2a857618ec1f4478fb926aa173435
parent 00e4cf2970094858ea50fc0e76fcfec6cb97c92c
src/mon/Monitor: Fix set_elector_disallowed_leaders

Problem:

The monitors hold two copies of `disallowed_leaders`:
1. in the MonMap class
2. in the Elector class
When computing the ConnectivityScore for the monitors during
an election, we use the `disallowed_leaders` copy from the Elector
class to determine which monitors we shouldn't allow to lead.

We rely on the function `set_elector_disallowed_leaders`
to set the `disallowed_leaders` of the Elector class. The MonMap
copy of `disallowed_leaders` contains the
`tiebreaker_monitor`, so we inherit that and also add the
monitors that are dead due to a zone failure.

However, the `adding dead monitors` phase is only allowed if we can
enter stretch_mode. This is a problem when failing over a stretch
cluster zone and then reviving the entire zone: the revived monitors
cannot enter stretch_mode while they are in the "probing" state,
since PaxosServices like osdmon become unreadable (this is expected).

Solution:

We unconditionally add the monitors in
`monmap->stretch_marked_down_mons` to the
`disallowed_leaders` list in
`Monitor::set_elector_disallowed_leaders`, since
monitors in `monmap->stretch_marked_down_mons`
probably belong to a marked-down
zone and are not fit to lead.

This fixes the problem of newly revived monitors
having a different `disallowed_leaders` set
and getting stuck in election.
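
The difference between the old and new behaviour can be sketched as
follows. This is a minimal illustration, not the code in
src/mon/Monitor.cc: `MonMapSketch`, `elector_disallowed_before` and
`elector_disallowed_after` are hypothetical names, monitors are
modelled as plain strings, and the "can enter stretch_mode" check is
reduced to a boolean argument.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical stand-in for the two MonMap fields named in this commit.
struct MonMapSketch {
  std::set<std::string> disallowed_leaders;        // contains the tiebreaker monitor
  std::set<std::string> stretch_marked_down_mons;  // monitors in a marked-down zone
};

// Behaviour described under "Problem": dead monitors are only added
// when the monitor is able to enter stretch_mode.
std::set<std::string> elector_disallowed_before(const MonMapSketch& m,
                                                bool can_enter_stretch_mode) {
  std::set<std::string> out = m.disallowed_leaders;
  if (can_enter_stretch_mode) {
    out.insert(m.stretch_marked_down_mons.begin(),
               m.stretch_marked_down_mons.end());
  }
  return out;
}

// Behaviour described under "Solution": marked-down monitors are
// always added, regardless of the monitor's current state.
std::set<std::string> elector_disallowed_after(const MonMapSketch& m) {
  std::set<std::string> out = m.disallowed_leaders;
  out.insert(m.stretch_marked_down_mons.begin(),
             m.stretch_marked_down_mons.end());
  return out;
}
```

In this sketch, a probing monitor (flag false) and an established
monitor (flag true) compute different sets with the old function,
while the new function yields the same set for both.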

Fixes: https://tracker.ceph.com/issues/63183
Signed-off-by: Kamoltat <ksirivad@redhat.com>
(cherry picked from commit 5d1b5da21591c57cb0cbbbc8775b6ea0ced953a4)
src/mon/Monitor.cc