git-server-git.apps.pok.os.sepia.ceph.com Git - ceph.git/commit
mon: Track and process pending pings after election 62924/head
author Kamoltat Sirivadhna <ksirivad@redhat.com>
Thu, 20 Mar 2025 14:44:28 +0000 (14:44 +0000)
committer Kamoltat Sirivadhna <ksirivad@redhat.com>
Wed, 23 Apr 2025 04:24:05 +0000 (04:24 +0000)
commit 33bb15bbb36d0eabd7aac35ff8d618e52e24e801
tree c463bf47bfc0c27ba07ee5849490367da43a0d9a
parent 6f0aa19f125fac8d3ad9ed7e56d9b18425ada4fb
mon: Track and process pending pings after election

Problem:

Monitors stop pinging each other when the quorum_mon_feature
flag is empty. This happens whenever a monitor starts up:
either it is freshly started and has never formed a quorum,
or the monitors in the cluster have been restarted.

This problem can be reproduced reliably every time.

Steps to reproduce:
1. Start 3 MONs with the `connectivity` election strategy.
2. Fail 1 MON.
3. Restart all the monitors (including the down monitor).
4. Observe that the connection scores of each monitor
report that not all monitors are alive, which is not
true because all 3 monitors are in quorum.

During monitor startup, quorum_mon_feature is empty.
Although all monitors participate in the election,
when they reach begin_peer_ping, some (if not all)
of them do not send pings because of the empty
quorum_mon_feature flag. As a result, the monitors
hold wrong connection scores after the election.
This resolves itself in the next election: by then
quorum_mon_feature is populated, so the monitors
start pinging each other again and the connectivity
scores become correct.
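
The failure mode described above can be illustrated with a toy model.
This is only a sketch, not the Ceph code: `DroppingElector`,
`features_known`, `pinged`, and `replay_startup_scenario` are
hypothetical names standing in for the Elector, a populated
quorum_mon_feature flag, and the set of peers being pinged.

```cpp
#include <cassert>
#include <set>

// Toy model of the pre-fix behavior (hypothetical names, not the Ceph API).
struct DroppingElector {
  bool features_known = false;  // stands in for a non-empty quorum_mon_feature
  std::set<int> pinged;         // ranks of the peers we are actively pinging

  // Pre-fix behavior: give up when the feature flag is empty.
  // Returns true only if pinging of the peer was actually started.
  bool begin_peer_ping(int peer) {
    if (!features_known) {
      return false;             // the ping is dropped and never retried
    }
    pinged.insert(peer);
    return true;
  }
};

// Replays the startup scenario: the first election drops every ping,
// and only a later election (with the flag populated) starts them.
inline void replay_startup_scenario() {
  DroppingElector e;

  // First election after startup: the flag is still empty.
  e.begin_peer_ping(1);
  e.begin_peer_ping(2);
  assert(e.pinged.empty());     // no peer is pinged, so scores go stale

  // A quorum has formed and the flag is populated; the next election
  // calls begin_peer_ping again and pinging finally starts.
  e.features_known = true;
  e.begin_peer_ping(1);
  e.begin_peer_ping(2);
  assert(e.pinged.size() == 2);
}
```

In this model nothing remembers the dropped peers, which is why the
scores stay wrong until another election happens to call
begin_peer_ping again.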

Solution:
In begin_peer_ping, instead of just returning from
the function when quorum_mon_feature is empty, we
keep track of the peers that we should ping once
the election is finished. In Monitor::win_election
and Monitor::lose_election, we process the pending
pings by calling begin_peer_ping on each of those
peers (on both peons and leader).
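
A minimal sketch of this approach, assuming a simple set of pending
peer ranks. All names here (`QueueingElector`, `ToyMonitor`,
`pending_pings`, `process_pending_pings`, `finish_election`) are
hypothetical and chosen for illustration; the actual change lives in
src/mon/Elector.cc, src/mon/Elector.h, and src/mon/Monitor.cc and may
be structured differently.

```cpp
#include <cassert>
#include <set>

// Toy model of the fix (hypothetical names, not the Ceph API).
struct QueueingElector {
  bool features_known = false;    // stands in for a non-empty quorum_mon_feature
  std::set<int> pinged;           // ranks of the peers we are actively pinging
  std::set<int> pending_pings;    // peers to ping once the election is over

  void begin_peer_ping(int peer) {
    if (!features_known) {
      pending_pings.insert(peer); // remember the peer instead of dropping it
      return;
    }
    pinged.insert(peer);
  }

  // Retry every peer that was deferred during the election.
  void process_pending_pings() {
    std::set<int> todo;
    todo.swap(pending_pings);     // drain first so re-queued peers are kept
    for (int peer : todo) {
      begin_peer_ping(peer);
    }
  }
};

// Stands in for the end-of-election handling on both leader and peons.
struct ToyMonitor {
  QueueingElector elector;

  void finish_election(bool won) {
    (void)won;                     // same handling whether we won or lost
    elector.features_known = true; // quorum features are known after election
    elector.process_pending_pings();
  }
};
```

Draining the pending set into a local copy before retrying means a
peer that is deferred again during the retry is queued for the next
round rather than lost or revisited in the same loop.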

Additionally:
Improved logging in the Elector class to make
debugging the ping process easier.

Fixes: https://tracker.ceph.com/issues/70587
Signed-off-by: Kamoltat Sirivadhna <ksirivad@redhat.com>
(cherry picked from commit eadcb8ced35d9aa011828e8a0b69d7b2e168f934)
src/mon/Elector.cc
src/mon/Elector.h
src/mon/Monitor.cc