Problem:
In RHCS the user can choose to manually remove a monitor rank
before shutting the monitor down. Causing inconsistency in monmap.
for example we remove mon.a from the monmap, there is a short period
where mon.a is still operational and will try to remove itself from
monmap but we will run into an assertion in
ConnectionTracker::notify_ranks_removed().
Solution:
In Monitor::notify_new_monmap() we prevent the func
from going into removing our own rank, or
ranks that doesn't exists in monmap.
FYI: this is an RHCS problem only, in ODF,
we never remove a monitor from monmap
before shutting it down.
Fixes: https://tracker.ceph.com/issues/58049
Signed-off-by: Kamoltat <ksirivad@redhat.com>
(cherry picked from commit
924e7ec92bbaa6efd0ef816c1cb101ff7972616c)
dout(10) << __func__ << " we have " << monmap->removed_ranks.size() << " removed ranks" << dendl;
for (auto i = monmap->removed_ranks.rbegin();
i != monmap->removed_ranks.rend(); ++i) {
- int rank = *i;
- dout(10) << __func__ << " removing rank " << rank << dendl;
+ int remove_rank = *i;
+ dout(10) << __func__ << " removing rank " << remove_rank << dendl;
+ if (rank == remove_rank) {
+ dout(5) << "We are removing our own rank, probably we"
+ << " are removed from monmap before we shutdown ... dropping." << dendl;
+ continue;
+ }
int new_rank = monmap->get_rank(messenger->get_myaddrs());
- elector.notify_rank_removed(rank, new_rank);
+ if (new_rank == -1) {
+ dout(5) << "We no longer exists in the monmap! ... dropping." << dendl;
+ continue;
+ }
+ elector.notify_rank_removed(remove_rank, new_rank);
}
}