osd/OSD: auto mark heartbeat sessions as stale and tear them down 30225/head
author xie xingguo <xie.xingguo@zte.com.cn>
Wed, 26 Jun 2019 06:24:08 +0000 (14:24 +0800)
committer David Zafman <dzafman@redhat.com>
Thu, 24 Oct 2019 22:06:07 +0000 (15:06 -0700)
commit 1fa1103edee33a8a4abe9226194d3dc608c014d0
tree 8ceeffc141b0714f0db4af8550c895424d145232
parent 6320b210fef515f4246f316801f7338a306d1b0d
osd/OSD: auto mark heartbeat sessions as stale and tear them down

The primary benefit is that the OSD no longer has to keep a flood of
blocked heartbeat messages in memory.
This prevents OSDs behind a broken switch from accumulating heartbeat
messages until they exhaust the whole node's memory:

Jun 11 04:19:26 host-192-168-9-12 kernel: [409881.137077] Out of memory:
Kill process 1471476 (ceph-osd) score 47 or sacrifice child
Jun 11 04:19:26 host-192-168-9-12 kernel: [409881.146054] Killed process
1471476 (ceph-osd) total-vm:4822548kB, anon-rss:3097860kB,
file-rss:2556kB, shmem-rss:0kB
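
The idea can be illustrated with a minimal sketch. This is not the code in
src/osd/OSD.cc; the type HeartbeatSession, the function
reset_stale_sessions() and the stale_after parameter are hypothetical names
chosen for illustration. It assumes each session remembers when its peer last
replied, and that dropping a session releases the pings queued on it.

```cpp
#include <cassert>
#include <deque>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified model of one heartbeat session to a peer OSD.
struct HeartbeatSession {
  double last_reply = 0.0;          // time of the last reply from the peer (seconds)
  std::deque<std::string> pending;  // pings queued but not yet acknowledged
};

// Tear down every session whose peer has been silent for at least
// `stale_after` seconds. Erasing the session also frees its pending queue,
// so blocked messages cannot pile up without bound.
// Returns the ids of the peers whose sessions were torn down.
std::vector<int> reset_stale_sessions(std::map<int, HeartbeatSession>& peers,
                                      double now, double stale_after) {
  assert(stale_after > 0.0);  // a non-positive threshold would reset everything
  std::vector<int> reset;
  for (auto it = peers.begin(); it != peers.end();) {
    if (now - it->second.last_reply >= stale_after) {
      reset.push_back(it->first);
      it = peers.erase(it);  // erase() returns the next valid iterator
    } else {
      ++it;
    }
  }
  return reset;
}
```

A caller would run this from its periodic heartbeat tick and open fresh
sessions for the returned peers if they are still expected to be up.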

Fixes: http://tracker.ceph.com/issues/40586
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
(cherry picked from commit 6cc90f363b8096d2d5fad30e57426d0cea9e3478)

Conflicts:
    src/osd/OSD.cc (no boot_finisher.stop() and no lock_guard)
    src/osd/OSD.h (trivial)

Fixed get_val() call in reset_heartbeat_peers()
src/common/options.cc
src/osd/OSD.cc
src/osd/OSD.h