mon/OSDMonitor: Reset grace period if failure interval exceeds a threshold.
Reset the heartbeat grace period if an OSD has had no failures for
longer than a set threshold (48 hours). The threshold is derived from
the mon_osd_laggy_halflife value.
Three helper functions do the following:
- get_grace_interval_threshold():
  Calculates and returns the grace interval threshold value.
- grace_interval_threshold_exceeded(int):
  Checks whether the grace interval threshold is exceeded, based on
  the last down stamp.
- set_default_laggy_params(int):
  Resets the laggy_probability and laggy_interval in the new_xinfo
  structure maintained within pending_inc, to be applied eventually
  as part of the update from paxos.
The threshold is checked and the laggy parameters are reset at the
following point:
- encode_pending(): if an existing OSD fails after an interval that
  exceeds the failure threshold period.
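
For illustration only, the check-and-reset flow described above can be
sketched roughly as follows. This is a standalone approximation, not
the code in OSDMonitor: the XInfo struct, the constants, the function
signatures and on_osd_failure() are hypothetical stand-ins, and the
1 hour halflife with a scale factor of 48 is an assumption chosen to
match the 48 hour threshold mentioned above.

```cpp
#include <cstdint>

// Hypothetical stand-in for the per-OSD extended info; not the Ceph type.
struct XInfo {
  std::int64_t down_stamp = 0;      // seconds; when the OSD was last marked down
  double laggy_probability = 0.0;   // decayed estimate that the OSD is laggy
  std::uint32_t laggy_interval = 0; // decayed estimate of the laggy duration (s)
};

// Assumed values: a mon_osd_laggy_halflife of 1 hour scaled by 48
// gives the 48 hour threshold.
constexpr std::int64_t kLaggyHalflifeSecs = 3600;
constexpr std::int64_t kThresholdFactor = 48;

// Returns the grace interval threshold in seconds.
std::int64_t get_grace_interval_threshold() {
  return kLaggyHalflifeSecs * kThresholdFactor;
}

// True if more than the threshold has elapsed since the last down stamp.
bool grace_interval_threshold_exceeded(const XInfo& xi, std::int64_t now) {
  return (now - xi.down_stamp) > get_grace_interval_threshold();
}

// Resets the laggy parameters to their defaults.
void set_default_laggy_params(XInfo& xi) {
  xi.laggy_probability = 0.0;
  xi.laggy_interval = 0;
}

// Hypothetical hook for the point where a failure of an existing OSD is
// processed: stale laggy history is dropped before it can stretch the grace.
void on_osd_failure(XInfo& pending, std::int64_t now) {
  if (grace_interval_threshold_exceeded(pending, now)) {
    set_default_laggy_params(pending);
  }
}
```

In this sketch an OSD that fails again within the threshold keeps its
laggy history, while one that has been healthy for longer than the
threshold starts from the default values.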
Fixes: https://tracker.ceph.com/issues/45943
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
(cherry picked from commit 9f1d4c1a9cddd942c9ea804dff8dc8068efc06b8)