author	Sridhar Seshasayee <sseshasa@redhat.com>
	Mon, 8 Jun 2020 15:28:43 +0000 (20:58 +0530)
committer	Sridhar Seshasayee <sseshasa@redhat.com>
	Fri, 26 Jun 2020 12:13:26 +0000 (17:43 +0530)
commit	41f8343762fd60ceaf335f659f2e4d83f02a5921
tree	8911c9696a267d65bbb318b0e09fd3eb72fa54ae
parent	6b4475f35fc95ac56ce93ab1ad82f7782dd03e20

mon/OSDMonitor: Reset grace period if failure interval exceeds a threshold.

Reset the heartbeat grace period if an OSD has had no failures for longer
than the threshold interval (48 hours). The threshold is derived from the
mon_osd_laggy_halflife value.

A couple of helper functions do the following:
- get_grace_interval_threshold():
    Calculates and returns the grace interval threshold value.
- grace_interval_threshold_exceeded(int):
    Checks whether the grace interval threshold has been exceeded, based
    on the last down stamp.
- set_default_laggy_params(int):
    Resets laggy_probability and laggy_interval in the new_xinfo
    structure maintained within pending_inc, to be applied eventually
    as part of the update from paxos.

The threshold is checked and the laggy parameters are reset at the
following point:
- encode_pending() - when an existing OSD fails after an interval that
   exceeds the failure threshold.
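
The helpers and the encode_pending() check described above can be
sketched roughly as follows. This is a minimal, standalone illustration
and not the code from src/mon/OSDMonitor.cc: the LaggyInfo struct, the
maybe_reset_on_failure() wrapper, the plain integer timestamps, and the
simplified signatures (the real helpers take an OSD id) are all
hypothetical. The halflife of 3600 seconds and the scale factor of 48 are
assumed values chosen so that the threshold comes out at the 48 hours
stated in the message.

```cpp
#include <cstdint>

// Hypothetical stand-in for the laggy fields kept per OSD in new_xinfo.
struct LaggyInfo {
  double laggy_probability = 0.0;
  std::uint32_t laggy_interval = 0;  // seconds
};

// Assumed values: a one-hour halflife scaled by 48 gives 48 hours.
constexpr std::int64_t kLaggyHalflifeSec = 3600;
constexpr std::int64_t kGraceThresholdFactor = 48;

// Calculates and returns the grace interval threshold, in seconds.
std::int64_t get_grace_interval_threshold() {
  return kLaggyHalflifeSec * kGraceThresholdFactor;
}

// Checks whether the time since the last down stamp exceeds the threshold.
bool grace_interval_threshold_exceeded(std::int64_t last_down_sec,
                                       std::int64_t now_sec) {
  return (now_sec - last_down_sec) > get_grace_interval_threshold();
}

// Resets the laggy parameters to their defaults.
void set_default_laggy_params(LaggyInfo& xinfo) {
  xinfo.laggy_probability = 0.0;
  xinfo.laggy_interval = 0;
}

// Sketch of the check made when an existing OSD is seen failing:
// if its previous failure is older than the threshold, forget the
// accumulated laggy history. Returns true if a reset happened.
bool maybe_reset_on_failure(LaggyInfo& new_xinfo,
                            std::int64_t last_down_sec,
                            std::int64_t now_sec) {
  if (grace_interval_threshold_exceeded(last_down_sec, now_sec)) {
    set_default_laggy_params(new_xinfo);
    return true;
  }
  return false;
}
```

Under these assumptions, an OSD that last went down more than 172800
seconds before its current failure has its laggy history cleared, while
one that failed more recently keeps its accumulated values.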

Fixes: https://tracker.ceph.com/issues/45943
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
(cherry picked from commit 9f1d4c1a9cddd942c9ea804dff8dc8068efc06b8)

src/mon/OSDMonitor.cc
src/mon/OSDMonitor.h