From 85f6e8d29cd8d0d30b3f07b26974357d875b6908 Mon Sep 17 00:00:00 2001
From: Dan Hill
Date: Wed, 15 Apr 2020 14:54:09 -0700
Subject: [PATCH] rados: prevent ShardedOpWQ suicide_grace drop when waiting for work.

The Sharded OpWQ will opportunistically wait for more work when processing an
empty queue. While waiting, the default work queue heartbeat timeout and
suicide_grace values are modified. The `threadpool_default_timeout` grace is
applied and suicide_grace is disabled.

If this op hangs, the heartbeat watchdog will not trigger an OSD suicide
recovery.

The default work queue values for grace and suicide_grace are re-applied after
finding work. This keeps the heartbeat timeouts consistent with the values
applied on _process() entry.

Fixes: https://tracker.ceph.com/issues/45076
Signed-off-by: Dan Hill
---
 src/osd/OSD.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/osd/OSD.cc b/src/osd/OSD.cc
index 133ebb984b25..7f63bbf5ced0 100644
--- a/src/osd/OSD.cc
+++ b/src/osd/OSD.cc
@@ -10448,8 +10448,9 @@ void OSD::ShardedOpWQ::_process(uint32_t thread_index, heartbeat_handle_d *hb)
         sdata->shard_lock.unlock();
         return;
       }
+      // found a work item; reapply default wq timeouts
       osd->cct->get_heartbeat_map()->reset_timeout(hb,
-        osd->cct->_conf->threadpool_default_timeout, 0);
+        timeout_interval, suicide_interval);
     } else {
       dout(20) << __func__ << " need return immediately" << dendl;
       wait_lock.unlock();
-- 
2.47.3