osd/OSD: auto mark heartbeat sessions as stale and tear them down

The primary benefit is that the OSD no longer has to keep a flood of
blocked heartbeat messages in memory.

This prevents OSDs from accumulating heartbeat messages behind a
broken switch until they exhaust the whole node's memory:

    Jun 11 04:19:26 host-192-168-9-12 kernel: [409881.137077] Out of memory: Kill process 1471476 (ceph-osd) score 47 or sacrifice child
    Jun 11 04:19:26 host-192-168-9-12 kernel: [409881.146054] Killed process 1471476 (ceph-osd) total-vm:4822548kB, anon-rss:3097860kB, file-rss:2556kB, shmem-rss:0kB

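For illustration only, the idea can be sketched as below. This is not
the code in src/osd/OSD.cc: PeerSession, reap_stale_sessions and the
stale_after threshold are hypothetical names, and the sketch simply
erases a stale session where the real OSD may reset and re-establish
its heartbeat connections instead.

```cpp
#include <chrono>
#include <map>
#include <vector>

using Clock = std::chrono::steady_clock;

// Hypothetical per-peer heartbeat state; not Ceph's actual structure.
struct PeerSession {
  Clock::time_point last_reply;    // last time the peer answered a ping
  std::vector<int> pending_pings;  // stand-in for queued, unanswered messages
};

// Tear down every session whose peer has been silent for longer than
// stale_after, releasing its queued pings. Returns the ids torn down.
std::vector<int> reap_stale_sessions(std::map<int, PeerSession>& sessions,
                                     Clock::time_point now,
                                     std::chrono::seconds stale_after) {
  std::vector<int> reaped;
  for (auto it = sessions.begin(); it != sessions.end();) {
    if (now - it->second.last_reply > stale_after) {
      reaped.push_back(it->first);
      it = sessions.erase(it);  // frees the pending pings with the session
    } else {
      ++it;
    }
  }
  return reaped;
}
```

Calling such a function from the periodic heartbeat tick bounds the
memory held per peer by what can queue up within stale_after.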
Fixes: http://tracker.ceph.com/issues/40586
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
(cherry picked from commit 6cc90f363b8096d2d5fad30e57426d0cea9e3478)
Conflicts:
	src/osd/OSD.cc (no boot_finisher.stop() and no lock_guard)
	src/osd/OSD.h (trivial)
Fixed get_val() call in reset_heartbeat_peers()