mgr/devicehealth: fix telemetry stops sending device reports after 48 hours 33346/head
author Yaarit Hatuka <yaarit@redhat.com>
Mon, 27 Jan 2020 13:57:55 +0000 (08:57 -0500)
committer Yaarit Hatuka <yaarit@redhat.com>
Sat, 15 Feb 2020 01:29:14 +0000 (20:29 -0500)
The telemetry module fetches device metrics that were scraped in the last
"telemetry interval"*2 (=48 hours by default) by calling
_get_device_metrics() with min_sample. _get_device_metrics() fetches the
metrics from omap and breaks on the first one that is older than
min_sample. But because the metrics are fetched in ascending order (from
oldest to newest), the loop broke on the very first one it received
whenever that one was older than the interval above, so no metrics were
returned at all. We need to pass min_sample to get_omap_vals() so that it
starts fetching from that value.

Fixes: https://tracker.ceph.com/issues/43837
Signed-off-by: Yaarit Hatuka <yaarit@redhat.com>
(cherry picked from commit 5f7e4a980a73e8cacb2c9bde47d822a32fb8c440)
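
The failure mode described above can be reproduced without a cluster. The following is a minimal sketch, not the module's actual code: get_omap_vals() here is a hypothetical in-memory stand-in for the librados call, assumed to return keys in ascending order, strictly after start_after. The sample keys and the MAX_SAMPLES value are made up for illustration.

```python
from bisect import bisect_right

MAX_SAMPLES = 500  # illustrative value, not taken from the module


def get_omap_vals(omap, start_after, filter_prefix, max_return):
    """Hypothetical stand-in for an omap read.

    Assumption: keys are returned in ascending order, beginning strictly
    after start_after, limited to those starting with filter_prefix.
    """
    keys = sorted(omap)
    begin = bisect_right(keys, start_after)
    matches = [(k, omap[k]) for k in keys[begin:] if k.startswith(filter_prefix)]
    return matches[:max_return]


def fetch_before_fix(omap, min_sample):
    """Old behaviour: always start from the oldest key."""
    res = {}
    for key, value in get_omap_vals(omap, "", "", MAX_SAMPLES):
        if min_sample and key < min_sample:
            break  # the oldest key arrives first, so this fires immediately
        res[key] = value
    return res


def fetch_after_fix(omap, min_sample):
    """New behaviour: let the read itself skip everything up to min_sample."""
    res = {}
    for key, value in get_omap_vals(omap, min_sample or "", "", MAX_SAMPLES):
        if min_sample and key < min_sample:
            break  # can no longer trigger, since older keys were skipped
        res[key] = value
    return res


# Made-up samples keyed by scrape timestamp; string order matches time order.
SAMPLES = {
    "20200101-000000": "old",
    "20200126-000000": "recent",
    "20200127-000000": "newest",
}
MIN_SAMPLE = "20200125-000000"

print(fetch_before_fix(SAMPLES, MIN_SAMPLE))
print(fetch_after_fix(SAMPLES, MIN_SAMPLE))
```

With these inputs the pre-fix loop returns an empty dict as soon as one stored sample is older than min_sample, while the post-fix loop returns the two recent samples. One consequence of the "strictly after" assumption is that a sample whose key equals min_sample exactly would be skipped in this model.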

src/pybind/mgr/devicehealth/module.py

index ed352bfb07eeaa04a7485ad4b4874dbad0ffb63f..53826d094de28ec92834c4420230d93c1deffd20 100644 (file)
@@ -429,7 +429,7 @@ class Module(MgrModule):
             return {}
         with ioctx:
             with rados.ReadOpCtx() as op:
-                omap_iter, ret = ioctx.get_omap_vals(op, "", sample or '',
+                omap_iter, ret = ioctx.get_omap_vals(op, min_sample or '', sample or '',
                                                      MAX_SAMPLES)  # fixme
                 assert ret == 0
                 try: