From: Dan Hill
Date: Wed, 15 Apr 2020 21:54:09 +0000 (-0700)
Subject: rados: prevent ShardedOpWQ suicide_grace drop when waiting for work.
X-Git-Tag: v16.1.0~2448^2
X-Git-Url: http://git-server-git.apps.pok.os.sepia.ceph.com/?a=commitdiff_plain;h=85f6e8d29cd8d0d30b3f07b26974357d875b6908;p=ceph.git

rados: prevent ShardedOpWQ suicide_grace drop when waiting for work.

The Sharded OpWQ will opportunistically wait for more work when
processing an empty queue. While waiting, the default work queue
heartbeat timeout and suicide_grace values are modified. The
`threadpool_default_timeout` grace is applied and suicide_grace is
disabled.

If this op hangs, the heartbeat watchdog will not trigger an OSD
suicide recovery.

The default work queue values for grace and suicide_grace are
re-applied after finding work. This keeps the heartbeat timeouts
consistent with the values applied on _process() entry.

Fixes: https://tracker.ceph.com/issues/45076
Signed-off-by: Dan Hill
---

diff --git a/src/osd/OSD.cc b/src/osd/OSD.cc
index 133ebb984b25..7f63bbf5ced0 100644
--- a/src/osd/OSD.cc
+++ b/src/osd/OSD.cc
@@ -10448,8 +10448,9 @@ void OSD::ShardedOpWQ::_process(uint32_t thread_index, heartbeat_handle_d *hb)
           sdata->shard_lock.unlock();
           return;
         }
+        // found a work item; reapply default wq timeouts
         osd->cct->get_heartbeat_map()->reset_timeout(hb,
-          osd->cct->_conf->threadpool_default_timeout, 0);
+          timeout_interval, suicide_interval);
       } else {
         dout(20) << __func__ << " need return immediately" << dendl;
         wait_lock.unlock();
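
Illustrative note (not part of the patch): the effect described in the commit message can be sketched with a toy model. Everything below is hypothetical and simplified. `ToyHandle`, `check`, and `hang_after_wakeup` are invented names, time is a plain integer, and the numeric values are arbitrary rather than Ceph defaults. The only behavior assumed is what the commit message states: passing 0 as the suicide grace disables the suicide deadline, so an op that hangs afterwards is reported unhealthy but never triggers suicide recovery.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a heartbeat handle; NOT Ceph's heartbeat_handle_d.
struct ToyHandle {
  int64_t timeout = 0;          // deadline for "unhealthy" (0 = not armed)
  int64_t suicide_timeout = 0;  // deadline for suicide recovery (0 = disabled)
};

// Toy version of reset_timeout(hb, grace, suicide_grace).
// Assumption taken from the commit message: suicide_grace == 0 disables
// the suicide deadline instead of arming it.
void reset_timeout(ToyHandle& h, int64_t now, int64_t grace,
                   int64_t suicide_grace) {
  h.timeout = now + grace;
  h.suicide_timeout = suicide_grace > 0 ? now + suicide_grace : 0;
}

enum class Verdict { Healthy, Unhealthy, Suicide };

// Toy watchdog check: the suicide deadline is only honored when armed.
Verdict check(const ToyHandle& h, int64_t now) {
  if (h.suicide_timeout != 0 && h.suicide_timeout < now)
    return Verdict::Suicide;
  if (h.timeout != 0 && h.timeout < now)
    return Verdict::Unhealthy;
  return Verdict::Healthy;
}

// A worker wakes up with work at wake_time, resets its timeouts with the
// given values, and then the op hangs until the watchdog runs at check_time.
Verdict hang_after_wakeup(int64_t wake_time, int64_t grace,
                          int64_t suicide_grace, int64_t check_time) {
  ToyHandle h;
  reset_timeout(h, wake_time, grace, suicide_grace);
  return check(h, check_time);
}
```

In this model, `hang_after_wakeup(100, 60, 0, 100000)` corresponds to the old call (suicide grace of 0) and yields `Verdict::Unhealthy` no matter how long the op hangs. `hang_after_wakeup(100, 15, 150, 100000)` corresponds to re-applying non-zero values such as `timeout_interval` and `suicide_interval`, and yields `Verdict::Suicide`.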