mgr/prometheus: expose daemon health metrics 49519/head
author Pere Diaz Bou <pdiazbou@redhat.com>
Fri, 11 Nov 2022 09:43:01 +0000 (10:43 +0100)
committer Pere Diaz Bou <pdiazbou@redhat.com>
Tue, 20 Dec 2022 11:34:56 +0000 (12:34 +0100)
commit 6a851ba3beee3cca058fc3a0932d2b4878ff335d
tree eee3b95d99477d2d08b582c5619e5f04ea61dc23
parent 4143d952fbf1efbef0488cc0426d82f273ddf661
mgr/prometheus: expose daemon health metrics

Until now, daemon health metrics were stored but never used. One of the
most helpful of these is SLOW_OPS for OSDs and MONs. This commit exposes
it as a fine-grained, per-daemon metric so that troublesome OSDs can be
found, instead of relying on a single cluster-wide slow ops health
check.

Signed-off-by: Pere Diaz Bou <pdiazbou@redhat.com>
(cherry picked from commit 5a2b7c25b68f2c955356640041e4c7ed72416d4e)
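
To illustrate the idea in the message above, here is a minimal sketch of
how per-daemon health metrics could be rendered in the Prometheus text
exposition format, with one sample per daemon rather than one for the
whole cluster. This is not the code from this commit: the function name,
the input shape, and the metric and label names used here
(ceph_daemon_health_metrics, type, ceph_daemon) are assumptions chosen
for illustration.

```python
from typing import Dict, List

# Hypothetical input shape (an assumption, not the mgr API):
# daemon name -> list of {"type": <metric type>, "value": <number>}.
DaemonMetrics = Dict[str, List[Dict[str, object]]]


def format_daemon_health_metrics(
    metrics: DaemonMetrics,
    name: str = "ceph_daemon_health_metrics",  # assumed metric name
) -> str:
    """Render one Prometheus gauge sample per (daemon, metric type) pair."""
    lines = [
        f"# HELP {name} Health metrics reported per Ceph daemon",
        f"# TYPE {name} gauge",
    ]
    # Sort daemons so the output is stable between scrapes.
    for daemon in sorted(metrics):
        for metric in metrics[daemon]:
            metric_type = metric["type"]
            value = metric["value"]
            lines.append(
                f'{name}{{type="{metric_type}",ceph_daemon="{daemon}"}} {value}'
            )
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = {
        "osd.1": [{"type": "SLOW_OPS", "value": 3}],
        "mon.a": [{"type": "SLOW_OPS", "value": 0}],
    }
    print(format_daemon_health_metrics(sample), end="")
```

With samples labeled by daemon, a query or alert can single out the
daemons whose SLOW_OPS value is non-zero, which a single cluster-wide
health check cannot do.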
16 files changed:
doc/mgr/modules.rst
monitoring/ceph-mixin/dashboards/host.libsonnet
monitoring/ceph-mixin/dashboards/osd.libsonnet
monitoring/ceph-mixin/dashboards_out/host-details.json
monitoring/ceph-mixin/dashboards_out/osds-overview.json
monitoring/ceph-mixin/prometheus_alerts.libsonnet
monitoring/ceph-mixin/prometheus_alerts.yml
monitoring/ceph-mixin/tests_alerts/test_alerts.yml
src/mgr/ActivePyModules.cc
src/mgr/ActivePyModules.h
src/mgr/BaseMgrModule.cc
src/mgr/DaemonHealthMetric.h
src/mgr/DaemonServer.cc
src/pybind/mgr/ceph_module.pyi
src/pybind/mgr/mgr_module.py
src/pybind/mgr/prometheus/module.py