From: Sridhar Seshasayee
Date: Tue, 25 May 2021 14:09:33 +0000 (+0530)
Subject: osd: Disable heartbeat timeout until a non-future workitem can be processed
X-Git-Tag: v16.2.7~110^2~10
X-Git-Url: http://git-server-git.apps.pok.os.sepia.ceph.com/?a=commitdiff_plain;h=9f3937d98150f6d3afef04fe95b3914428e91e3f;p=ceph.git

osd: Disable heartbeat timeout until a non-future workitem can be processed

When the mclock scheduler is in use, there can be rare instances where a
worker thread for a shard does not get an immediate work item to process.
Such items are designated as future work items. In such cases, the
_process() loop waits until the time indicated by the scheduler before
attempting another dequeue from the scheduler queue. If there are multiple
threads per shard, a thread may not get an immediate item for a long time.
This time can exceed the heartbeat timeout for the thread and result in
heartbeat timeouts being reported for the osd in question.

To prevent this, the heartbeat timeout for the thread is disabled before
waiting for an item and re-enabled once the wait period is over.

Signed-off-by: Sridhar Seshasayee
(cherry picked from commit 9a95492b66341f7351e80f0386b4439f713debc6)
---

diff --git a/src/osd/OSD.cc b/src/osd/OSD.cc
index 525304726b2..f16195bb38e 100644
--- a/src/osd/OSD.cc
+++ b/src/osd/OSD.cc
@@ -10842,12 +10842,17 @@ void OSD::ShardedOpWQ::_process(uint32_t thread_index, heartbeat_handle_d *hb)
       std::unique_lock wait_lock{sdata->sdata_wait_lock};
       auto future_time = ceph::real_clock::from_double(*when_ready);
       dout(10) << __func__ << " dequeue future request at " << future_time << dendl;
+      // Disable heartbeat timeout until we find a non-future work item to process.
+      osd->cct->get_heartbeat_map()->clear_timeout(hb);
       sdata->shard_lock.unlock();
       ++sdata->waiting_threads;
       sdata->sdata_cond.wait_until(wait_lock, future_time);
       --sdata->waiting_threads;
       wait_lock.unlock();
       sdata->shard_lock.lock();
+      // Reapply default wq timeouts
+      osd->cct->get_heartbeat_map()->reset_timeout(hb,
+        timeout_interval, suicide_interval);
     }
   } // while
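
The pattern in the hunk above (clear the heartbeat timeout, wait until the
scheduler's ready time, then reapply the timeout) can be sketched outside of
Ceph as follows. This is a minimal standalone illustration, not the OSD code:
Watchdog, wait_for_future_item() and the grace parameter are hypothetical
stand-ins for heartbeat_handle_d, the wait inside _process() and
timeout_interval, and the sketch omits the suicide timeout and the shard lock
handling.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

using Clock = std::chrono::steady_clock;

// Hypothetical per-thread watchdog; a stand-in for a heartbeat handle,
// not Ceph's heartbeat_handle_d.
struct Watchdog {
  // time_point::max() is used here to mean "disarmed".
  Clock::time_point deadline = Clock::time_point::max();

  // Arm (or re-arm) the watchdog to expire `grace` from now.
  void reset_timeout(std::chrono::milliseconds grace) {
    assert(grace.count() > 0);
    deadline = Clock::now() + grace;
  }

  // Disarm the watchdog so a long wait is not reported as a stall.
  void clear_timeout() { deadline = Clock::time_point::max(); }

  bool armed() const { return deadline != Clock::time_point::max(); }
  bool expired() const { return armed() && Clock::now() > deadline; }
};

// Sleep until `ready_at` (the time at which a "future" item becomes
// eligible), keeping the watchdog disarmed for the duration of the wait.
// Returns whether the watchdog was disarmed when the wait began.
bool wait_for_future_item(Watchdog& wd, std::mutex& m,
                          std::condition_variable& cv,
                          Clock::time_point ready_at,
                          std::chrono::milliseconds grace) {
  std::unique_lock<std::mutex> lock(m);
  wd.clear_timeout();             // disable before waiting
  const bool disarmed = !wd.armed();
  cv.wait_until(lock, ready_at);  // may return early on notify or spurious wake
  lock.unlock();
  wd.reset_timeout(grace);        // reapply the timeout once the wait is over
  return disarmed;
}
```

In this sketch a caller would loop: attempt a dequeue, and if only a future
item is available, call wait_for_future_item() with the item's ready time and
retry. Because the watchdog is re-armed only after the wait returns, a wait
longer than the grace period is not counted as a stalled thread.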