From: xie xingguo
Date: Wed, 26 Jun 2019 06:24:08 +0000 (+0800)
Subject: osd/OSD: auto mark heartbeat sessions as stale and tear them down
X-Git-Tag: v13.2.7~26^2
X-Git-Url: http://git-server-git.apps.pok.os.sepia.ceph.com/?a=commitdiff_plain;h=refs%2Fpull%2F30225%2Fhead;p=ceph.git

osd/OSD: auto mark heartbeat sessions as stale and tear them down

The primary benefit is that the OSD doesn't need to keep a flood of
blocked heartbeat messages around in memory.

This prevents OSDs from accumulating heartbeat messages due to a broken
switch and then exhausting the whole node's memory:

Jun 11 04:19:26 host-192-168-9-12 kernel: [409881.137077] Out of memory: Kill process 1471476 (ceph-osd) score 47 or sacrifice child
Jun 11 04:19:26 host-192-168-9-12 kernel: [409881.146054] Killed process 1471476 (ceph-osd) total-vm:4822548kB, anon-rss:3097860kB, file-rss:2556kB, shmem-rss:0kB

Fixes: http://tracker.ceph.com/issues/40586
Signed-off-by: xie xingguo
(cherry picked from commit 6cc90f363b8096d2d5fad30e57426d0cea9e3478)

Conflicts:
	src/osd/OSD.cc (no boot_finisher.stop() and no lock_guard)
	src/osd/OSD.h (trivial)

Fixed get_val() call in reset_heartbeat_peers()
---

diff --git a/src/common/options.cc b/src/common/options.cc
index e52b5d533642b..42c9f73fbddb1 100644
--- a/src/common/options.cc
+++ b/src/common/options.cc
@@ -2941,6 +2941,13 @@ std::vector