author	Sage Weil <sage@inktank.com>
	Tue, 4 Sep 2012 20:39:23 +0000 (13:39 -0700)
committer	Sage Weil <sage@inktank.com>
	Tue, 18 Sep 2012 21:39:00 +0000 (14:39 -0700)
commit	adf0fe6a10ece6c2e48ecf6c66e849dfddf95656
tree	94f4f61e2fae02b05d3553441fd8c59daa3175fa
parent	3f51d31639eb5af4e907fa316f1643b02ddb8f27

mon: scale heartbeat grace based on laggy probability, interval

If, based on historical behavior, an observed OSD failure is likely to be
due to unresponsiveness rather than the daemon stopping, scale the
heartbeat grace period accordingly:

grace' = grace + laggy_probability * laggy_interval

This will avoid fruitlessly marking OSDs down and generating additional
map update overhead when the cluster is overloaded and potentially
struggling to keep up with map updates. See #3045.
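
As a rough illustration of the scaling rule above, here is a minimal
standalone sketch in Python. It is not the monitor code from this commit;
the function name scaled_grace, the range checks, and the sample values
are assumptions made only for the example.

```python
def scaled_grace(grace, laggy_probability, laggy_interval):
    """Return grace' = grace + laggy_probability * laggy_interval.

    grace             -- base heartbeat grace period, in seconds
    laggy_probability -- estimated chance (0.0 to 1.0) that a reported
                         failure is lag rather than a stopped daemon
    laggy_interval    -- estimated length of a laggy episode, in seconds

    Hypothetical helper: the range checks below are an assumption of this
    sketch, not something the commit message specifies.
    """
    if not 0.0 <= laggy_probability <= 1.0:
        raise ValueError("laggy_probability must be between 0.0 and 1.0")
    if grace < 0 or laggy_interval < 0:
        raise ValueError("grace and laggy_interval must be non-negative")
    return grace + laggy_probability * laggy_interval


if __name__ == "__main__":
    # Illustrative values only: a 20 s base grace, a 50% laggy probability,
    # and a 60 s laggy interval.
    print(scaled_grace(20.0, 0.5, 60.0))  # 50.0
    # An OSD with no history of lag keeps the unscaled grace period.
    print(scaled_grace(20.0, 0.0, 60.0))  # 20.0
```

With a laggy probability of zero the grace period is unchanged, so only
OSDs with a history of lag get extra time before being marked down.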

Signed-off-by: Sage Weil <sage@inktank.com>