author	Sridhar Seshasayee <sseshasa@redhat.com>
	Mon, 8 Jun 2020 15:28:43 +0000 (20:58 +0530)
committer	Sridhar Seshasayee <sseshasa@redhat.com>
	Fri, 26 Jun 2020 12:17:12 +0000 (17:47 +0530)
commit	3bd4e2394b70555fe7e8a737156ce340da88369f
tree	1b1d87fbac3a3ac0d66c00cf9d10a09c181cdf61
parent	ec03ea106bfc42135c54e7f8444c5b207d5120d9

mon/OSDMonitor: Reset grace period if failure interval exceeds a threshold.

Reset the heartbeat grace period if there have been no failures for
longer than the set threshold value (48 hours). The threshold is
calculated from the mon_osd_laggy_halflife value.

A couple of helper functions do the following:
- get_grace_interval_threshold():
    Calculates and returns the grace interval threshold value.
- grace_interval_threshold_exceeded(int):
    Checks if grace interval threshold is exceeded based on the last
    down stamp.
- set_default_laggy_params(int):
    Resets the laggy_probability and laggy_interval in the
    new_xinfo structure maintained within pending_inc, to be applied
    eventually as part of the update from paxos.

The threshold value is checked and the laggy parameters are reset at the
following point:
- encode_pending() - If an existing OSD experiences a failure
   after an interval exceeding the failure threshold period.
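
The helpers and the check described above can be sketched roughly as
follows. This is a minimal, self-contained illustration, not the code
in src/mon/OSDMonitor.cc: the XInfo and MonitorSketch types, the
on_osd_failure() wrapper, and the explicit "now" argument are
hypothetical stand-ins. The halflife of 3600 seconds and the scaling
factor of 48 are assumptions chosen so that the threshold matches the
48 hours mentioned above.

```cpp
#include <cstdint>
#include <map>

// Hypothetical stand-in for the per-OSD extended info.
struct XInfo {
  int64_t down_stamp = 0;          // seconds; when the OSD last went down
  double laggy_probability = 0.0;
  uint32_t laggy_interval = 0;     // seconds
};

// Hypothetical stand-in for the monitor state touched by this change.
struct MonitorSketch {
  int64_t laggy_halflife = 3600;           // assumed mon_osd_laggy_halflife (s)
  std::map<int, XInfo> xinfo;              // committed per-OSD state
  std::map<int, XInfo> pending_new_xinfo;  // stands in for pending_inc.new_xinfo

  // Threshold = halflife scaled by a factor. The factor 48 is an
  // assumption that yields 48 hours for a one-hour halflife.
  int64_t get_grace_interval_threshold() const {
    const int64_t factor = 48;
    return laggy_halflife * factor;
  }

  // True if more than the threshold has elapsed since the last down stamp.
  bool grace_interval_threshold_exceeded(int osd, int64_t now) const {
    auto it = xinfo.find(osd);
    if (it == xinfo.end()) return false;
    return (now - it->second.down_stamp) > get_grace_interval_threshold();
  }

  // Reset the laggy parameters in the pending entry, if one exists.
  void set_default_laggy_params(int osd) {
    auto it = pending_new_xinfo.find(osd);
    if (it == pending_new_xinfo.end()) return;
    it->second.laggy_probability = 0.0;
    it->second.laggy_interval = 0;
  }

  // Mirrors the check described for encode_pending(): an OSD that fails
  // after a long quiet period gets its laggy parameters reset.
  void on_osd_failure(int osd, int64_t now) {
    if (grace_interval_threshold_exceeded(osd, now))
      set_default_laggy_params(osd);
  }
};
```

In this sketch, an OSD whose last down stamp is more than 172800
seconds old has its pending laggy_probability and laggy_interval set
back to zero, while a more recent failure leaves them untouched.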

Fixes: https://tracker.ceph.com/issues/45943
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
(cherry picked from commit 9f1d4c1a9cddd942c9ea804dff8dc8068efc06b8)

src/mon/OSDMonitor.cc
src/mon/OSDMonitor.h