From 0ba7113f1310961299e49d78242ed8d33d0982fd Mon Sep 17 00:00:00 2001
From: xie xingguo
Date: Wed, 26 Jun 2019 14:24:08 +0800
Subject: [PATCH] osd/OSD: auto mark heartbeat sessions as stale and tear them down

The primary benefit is that the OSD doesn't need to keep a flood of
blocked heartbeat messages around in memory. This prevents OSDs from
accumulating heartbeat messages due to a broken switch and then
exhausting the whole node's memory:

Jun 11 04:19:26 host-192-168-9-12 kernel: [409881.137077] Out of memory: Kill process 1471476 (ceph-osd) score 47 or sacrifice child
Jun 11 04:19:26 host-192-168-9-12 kernel: [409881.146054] Killed process 1471476 (ceph-osd) total-vm:4822548kB, anon-rss:3097860kB, file-rss:2556kB, shmem-rss:0kB

Fixes: http://tracker.ceph.com/issues/40586
Signed-off-by: xie xingguo
(cherry picked from commit 6cc90f363b8096d2d5fad30e57426d0cea9e3478)
---
 src/common/options.cc |  7 +++++++
 src/osd/OSD.cc        | 33 ++++++++++++++++++++-------------
 src/osd/OSD.h         | 10 +++++++++-
 3 files changed, 36 insertions(+), 14 deletions(-)

diff --git a/src/common/options.cc b/src/common/options.cc
index 9c79197cf93d6..b3ab211cbf877 100644
--- a/src/common/options.cc
+++ b/src/common/options.cc
@@ -3323,6 +3323,13 @@ std::vector