git-server-git.apps.pok.os.sepia.ceph.com Git - ceph.git/commitdiff
rados: prevent ShardedOpWQ suicide_grace drop when waiting for work. 34575/head
author Dan Hill <daniel.hill@canonical.com>
Wed, 15 Apr 2020 21:54:09 +0000 (14:54 -0700)
committer Dan Hill <daniel.hill@canonical.com>
Wed, 15 Apr 2020 21:54:09 +0000 (14:54 -0700)
The Sharded OpWQ will opportunistically wait for more work when
processing an empty queue. While waiting, the default work queue
heartbeat timeout and suicide_grace values are modified. The
`threadpool_default_timeout` grace is applied and suicide_grace is
disabled. If this op hangs, the heartbeat watchdog will not trigger an
OSD suicide recovery.

The default work queue values for grace and suicide_grace are re-applied
after finding work. This keeps the heartbeat timeouts consistent with
the values applied on _process() entry.

Fixes: https://tracker.ceph.com/issues/45076
Signed-off-by: Dan Hill <daniel.hill@canonical.com>
src/osd/OSD.cc

index 133ebb984b2574d5f993c6035af81644daa80a39..7f63bbf5ced0c02ad6b2cf1c6a0aed5243e41f3e 100644
@@ -10448,8 +10448,9 @@ void OSD::ShardedOpWQ::_process(uint32_t thread_index, heartbeat_handle_d *hb)
        sdata->shard_lock.unlock();
        return;
       }
+      // found a work item; reapply default wq timeouts
       osd->cct->get_heartbeat_map()->reset_timeout(hb,
-         osd->cct->_conf->threadpool_default_timeout, 0);
+        timeout_interval, suicide_interval);
     } else {
       dout(20) << __func__ << " need return immediately" << dendl;
       wait_lock.unlock();
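
The effect of the hunk above can be illustrated with a small standalone model. This is only a sketch, not Ceph code: MockHandle, reset_timeout(), would_suicide() and hang_detected() are hypothetical stand-ins written for this example, and the interval values are illustrative assumptions rather than values taken from the commit. The sketch assumes, as the commit message describes, that a suicide_grace of 0 disables the suicide check.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a heartbeat handle; NOT Ceph's heartbeat_handle_d.
struct MockHandle {
  std::int64_t timeout = 0;          // absolute "unhealthy" deadline, 0 = disabled
  std::int64_t suicide_timeout = 0;  // absolute suicide deadline, 0 = disabled
};

// Re-arm both deadlines relative to `now`. A grace of 0 disables that check
// (assumed semantics, following the commit message).
inline void reset_timeout(MockHandle& h, std::int64_t now,
                          std::int64_t grace, std::int64_t suicide_grace) {
  assert(grace >= 0 && suicide_grace >= 0);
  h.timeout = grace ? now + grace : 0;
  h.suicide_timeout = suicide_grace ? now + suicide_grace : 0;
}

// True when the watchdog would trigger a suicide at time `now`.
inline bool would_suicide(const MockHandle& h, std::int64_t now) {
  return h.suicide_timeout != 0 && now > h.suicide_timeout;
}

// Models a worker that waits for work at t=0, finds an item, and then hangs
// until `now`. `reapply_defaults` selects the patched behaviour (true) or the
// pre-patch behaviour (false).
inline bool hang_detected(bool reapply_defaults, std::int64_t now) {
  // Illustrative values only (assumptions, in seconds).
  const std::int64_t timeout_interval = 15;
  const std::int64_t suicide_interval = 150;
  const std::int64_t threadpool_default_timeout = 60;

  MockHandle hb;
  // Defaults applied on entry to the processing function.
  reset_timeout(hb, 0, timeout_interval, suicide_interval);
  // Queue is empty: relax the grace and disable suicide while waiting.
  reset_timeout(hb, 0, threadpool_default_timeout, 0);
  if (reapply_defaults) {
    // Patched: restore the work queue defaults once work is found.
    reset_timeout(hb, 0, timeout_interval, suicide_interval);
  } else {
    // Pre-patch: suicide_grace stays disabled after work is found.
    reset_timeout(hb, 0, threadpool_default_timeout, 0);
  }
  return would_suicide(hb, now);
}
```

In this model, a hang that outlasts suicide_interval is caught only when the defaults are reapplied. With the pre-patch call the suicide deadline stays at 0, so the check never fires however long the op hangs.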