git.apps.os.sepia.ceph.com Git - ceph.git/commit
mon/OSDMonitor: Reset grace period if failure interval exceeds a threshold. 35799/head
author Sridhar Seshasayee <sseshasa@redhat.com>
Mon, 8 Jun 2020 15:28:43 +0000 (20:58 +0530)
committer Sridhar Seshasayee <sseshasa@redhat.com>
Fri, 26 Jun 2020 12:17:12 +0000 (17:47 +0530)
commit 3bd4e2394b70555fe7e8a737156ce340da88369f
tree 1b1d87fbac3a3ac0d66c00cf9d10a09c181cdf61
parent ec03ea106bfc42135c54e7f8444c5b207d5120d9
mon/OSDMonitor: Reset grace period if failure interval exceeds a threshold.

Reset the heartbeat grace period if there have been no failures for
longer than the threshold interval (48 hours). The
mon_osd_laggy_halflife value is used to calculate the threshold.

A couple of helper functions do the following:
 - get_grace_interval_threshold():
    Calculates and returns the grace interval threshold value.
 - grace_interval_threshold_exceeded(int):
    Checks if grace interval threshold is exceeded based on the last
    down stamp.
 - set_default_laggy_params(int):
    Resets the laggy_probability and laggy_interval in the
    new_xinfo structure maintained within pending_inc, to be applied
    eventually as part of the update from paxos.
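
As a rough illustration of how the three helpers could fit together,
here is a minimal standalone sketch. It is not the code from this
commit: XInfo and PendingInc are simplified stand-ins, the state is
passed explicitly instead of living in OSDMonitor, and the one-hour
halflife and the factor of 48 are assumptions chosen only so that the
result matches the 48-hour threshold stated above.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Simplified stand-in for the per-OSD extended info. Only the fields
// named in the commit message are modelled; times are in seconds.
struct XInfo {
  int64_t down_stamp = 0;          // when the OSD was last marked down
  double laggy_probability = 0.0;  // decayed estimate of laggy behaviour
  int64_t laggy_interval = 0;      // decayed estimate of laggy duration
};

// Stand-in for the pending incremental map (hypothetical shape).
struct PendingInc {
  std::map<int, XInfo> new_xinfo;
};

// Assumed values: a one-hour halflife scaled by 48 gives 48 hours.
constexpr int64_t kLaggyHalflifeSecs = 3600;  // assumed mon_osd_laggy_halflife
constexpr int64_t kThresholdFactor = 48;      // assumed scaling factor

// Threshold, in seconds, after which the laggy history is considered stale.
int64_t get_grace_interval_threshold() {
  return kLaggyHalflifeSecs * kThresholdFactor;
}

// True if more than the threshold has elapsed since the last down stamp.
bool grace_interval_threshold_exceeded(const XInfo& xi, int64_t now) {
  assert(now >= xi.down_stamp);
  return (now - xi.down_stamp) > get_grace_interval_threshold();
}

// Stage a reset of the laggy parameters for one OSD in the pending map.
// The committed entry is copied first so other fields are preserved.
void set_default_laggy_params(PendingInc& pending,
                              const std::map<int, XInfo>& committed,
                              int osd) {
  auto it = pending.new_xinfo.find(osd);
  if (it == pending.new_xinfo.end()) {
    auto src = committed.find(osd);
    XInfo base = (src != committed.end()) ? src->second : XInfo{};
    it = pending.new_xinfo.emplace(osd, base).first;
  }
  it->second.laggy_probability = 0.0;
  it->second.laggy_interval = 0;
}
```

In this sketch the reset is only staged in the pending structure; the
committed entry is left untouched, mirroring the statement above that
the change is applied later as part of the paxos update.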

The threshold value is checked and the laggy parameters are reset at
the following point:
 - encode_pending() - If an existing osd experiences a failure
   after an interval exceeding the failure threshold period.
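
The check described above might look roughly like the following
standalone sketch. The names LaggyState and reset_stale_laggy_state
are hypothetical, the 48-hour threshold is hard-coded for brevity, and
the real encode_pending() works on the monitor's own map structures
rather than on a plain std::map.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical per-OSD laggy bookkeeping; times are in seconds.
struct LaggyState {
  int64_t down_stamp;        // when the OSD was last marked down
  double laggy_probability;
  int64_t laggy_interval;
};

constexpr int64_t kThresholdSecs = 48 * 3600;  // 48 hours

// For every OSD that is failing now, clear its laggy history if the
// previous failure is older than the threshold. Returns the ids of the
// OSDs whose parameters were reset.
std::vector<int> reset_stale_laggy_state(const std::vector<int>& newly_failed,
                                         std::map<int, LaggyState>& xinfo,
                                         int64_t now) {
  std::vector<int> reset;
  for (int osd : newly_failed) {
    auto it = xinfo.find(osd);
    if (it == xinfo.end()) {
      continue;  // not an existing OSD: nothing to reset
    }
    LaggyState& xi = it->second;
    const bool has_history =
        xi.laggy_probability > 0.0 || xi.laggy_interval > 0;
    if (has_history && (now - xi.down_stamp) > kThresholdSecs) {
      xi.laggy_probability = 0.0;
      xi.laggy_interval = 0;
      reset.push_back(osd);
    }
  }
  return reset;
}
```

An OSD that failed recently keeps its laggy history, so its grace
period stays extended; only an OSD whose last failure is older than
the threshold starts again from the default values.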

Fixes: https://tracker.ceph.com/issues/45943
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
(cherry picked from commit 9f1d4c1a9cddd942c9ea804dff8dc8068efc06b8)
src/mon/OSDMonitor.cc
src/mon/OSDMonitor.h