git-server-git.apps.pok.os.sepia.ceph.com Git - ceph.git/commit
mon/OSDMonitor: Reset grace period if failure interval exceeds a threshold. 35798/head
author Sridhar Seshasayee <sseshasa@redhat.com>
Mon, 8 Jun 2020 15:28:43 +0000 (20:58 +0530)
committer Sridhar Seshasayee <sseshasa@redhat.com>
Fri, 26 Jun 2020 12:13:26 +0000 (17:43 +0530)
commit 41f8343762fd60ceaf335f659f2e4d83f02a5921
tree 8911c9696a267d65bbb318b0e09fd3eb72fa54ae
parent 6b4475f35fc95ac56ce93ab1ad82f7782dd03e20
mon/OSDMonitor: Reset grace period if failure interval exceeds a threshold.

Reset the heartbeat grace period if there have been no failures for
longer than the set threshold (48 hours). The threshold is calculated
from the mon_osd_laggy_halflife value.
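
As a rough illustration of the arithmetic only: assuming
mon_osd_laggy_halflife is at its default of 3600 seconds and the halflife
is scaled by a factor of 48 (the factor is an assumption inferred from the
48-hour figure, and the names below are hypothetical), the threshold works
out as follows:

```cpp
#include <cstdint>

// Assumed default of mon_osd_laggy_halflife, in seconds (1 hour).
constexpr std::int64_t kLaggyHalflifeSec = 3600;
// Hypothetical scale factor, inferred from the 48-hour threshold.
constexpr std::int64_t kGraceThresholdFactor = 48;

// Threshold after which the laggy parameters are reset, in seconds.
constexpr std::int64_t grace_interval_threshold(std::int64_t halflife_sec) {
  return halflife_sec * kGraceThresholdFactor;
}

// 3600 s * 48 = 172800 s = 48 hours.
static_assert(grace_interval_threshold(kLaggyHalflifeSec) == 48 * 60 * 60,
              "default halflife should give a 48-hour threshold");
```

Under this assumption, changing mon_osd_laggy_halflife scales the
threshold proportionally.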

Three helper functions do the following:
 - get_grace_interval_threshold():
    Calculates and returns the grace interval threshold value.
 - grace_interval_threshold_exceeded(int):
    Checks whether the grace interval threshold has been exceeded,
    based on the last down stamp.
 - set_default_laggy_params(int):
    Resets the laggy_probability and laggy_interval in the new_xinfo
    structure maintained within pending_inc, to be applied eventually
    as part of an update from paxos.

The threshold is checked and the laggy parameters are reset at the
following point:
 - encode_pending() - when an existing OSD fails after an interval that
   exceeds the failure threshold period.
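
The helpers and the check described above can be sketched roughly as
follows. This is a simplified, self-contained model and not the code in
src/mon/OSDMonitor.cc: the XInfo and MonitorSketch types, the
on_osd_failure() entry point, the explicit now_sec argument, the factor of
48 and the 3600-second halflife default are all assumptions made for
illustration.

```cpp
#include <cstdint>
#include <map>

// Simplified stand-in for the per-OSD extended info; only the fields
// mentioned in the commit message are modelled.
struct XInfo {
  std::int64_t down_stamp_sec = 0;  // when the OSD was last marked down
  double laggy_probability = 0.0;
  std::int64_t laggy_interval = 0;
};

// Hypothetical, heavily reduced monitor state.
struct MonitorSketch {
  std::int64_t laggy_halflife_sec = 3600;  // assumed halflife default
  std::map<int, XInfo> osd_xinfo;          // committed map state
  std::map<int, XInfo> pending_new_xinfo;  // stands in for pending_inc.new_xinfo

  // Threshold = halflife scaled by an assumed factor of 48.
  std::int64_t get_grace_interval_threshold() const {
    return laggy_halflife_sec * 48;
  }

  // True if the last down stamp is at least one threshold in the past.
  bool grace_interval_threshold_exceeded(int osd, std::int64_t now_sec) const {
    auto it = osd_xinfo.find(osd);
    if (it == osd_xinfo.end() || it->second.down_stamp_sec == 0) {
      return false;  // no recorded failure to measure from
    }
    return now_sec - it->second.down_stamp_sec >=
           get_grace_interval_threshold();
  }

  // Stage a reset of the laggy parameters in the pending update.
  void set_default_laggy_params(int osd) {
    if (pending_new_xinfo.count(osd) == 0) {
      pending_new_xinfo[osd] = osd_xinfo[osd];  // start from committed state
    }
    XInfo& xi = pending_new_xinfo[osd];
    xi.laggy_probability = 0.0;
    xi.laggy_interval = 0;
  }

  // Call-site sketch: run when an existing OSD is found to be failing.
  void on_osd_failure(int osd, std::int64_t now_sec) {
    if (grace_interval_threshold_exceeded(osd, now_sec)) {
      set_default_laggy_params(osd);
    }
  }
};
```

In this model, a failure that arrives less than one threshold after the
previous down stamp leaves the laggy history untouched, while a later one
stages zeroed values that take effect once the pending update is applied.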

Fixes: https://tracker.ceph.com/issues/45943
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
(cherry picked from commit 9f1d4c1a9cddd942c9ea804dff8dc8068efc06b8)
src/mon/OSDMonitor.cc
src/mon/OSDMonitor.h