git.apps.os.sepia.ceph.com Git - ceph.git/commit
mgr/prometheus: expose daemon health metrics 49520/head
author Pere Diaz Bou <pdiazbou@redhat.com>
Fri, 11 Nov 2022 09:43:01 +0000 (10:43 +0100)
committer Pere Diaz Bou <pdiazbou@redhat.com>
Tue, 20 Dec 2022 11:36:30 +0000 (12:36 +0100)
commit 2b32cc3c3d6fac572f5e5c0183f519527011d9ce
tree 222ae9d5884013be7a99d7b4621240bea915037b
parent acb9dba651af6d9e56c71f9be7bdc57773fe3880
mgr/prometheus: expose daemon health metrics

Until now, daemon health metrics were stored but never used. One of the
most helpful of these is SLOW_OPS, which is reported for OSDs and MONs.
This commit exposes it as a fine-grained, per-daemon metric, so that
troublesome OSDs can be found directly instead of relying on a single
cluster-wide slow ops health check.

Signed-off-by: Pere Diaz Bou <pdiazbou@redhat.com>
(cherry picked from commit 5a2b7c25b68f2c955356640041e4c7ed72416d4e)
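
For illustration only, here is a minimal sketch of the idea in the commit message: taking per-daemon health metrics such as SLOW_OPS and rendering them in the Prometheus text exposition format, one sample per daemon. It is not the code from src/pybind/mgr/prometheus/module.py. The record type, the function names, the metric name "daemon_health_metrics" and the label names are all assumptions made for this example.

```python
from typing import Dict, List, NamedTuple


class DaemonHealthMetric(NamedTuple):
    """Hypothetical record: one health metric reported by one daemon."""
    daemon: str  # daemon name, e.g. "osd.3" or "mon.a"
    type: str    # metric type, e.g. "SLOW_OPS"
    value: int   # current value reported by the daemon


def render_prometheus(metrics: List[DaemonHealthMetric],
                      name: str = "daemon_health_metrics") -> str:
    """Render the metrics as Prometheus text exposition lines.

    The metric name and the label names are assumptions for this sketch.
    """
    lines = [
        f"# HELP {name} Health metrics reported per daemon",
        f"# TYPE {name} gauge",
    ]
    # Sort so the output is stable regardless of input order.
    for m in sorted(metrics):
        lines.append(
            f'{name}{{ceph_daemon="{m.daemon}",type="{m.type}"}} {m.value}'
        )
    return "\n".join(lines) + "\n"


def daemons_with_slow_ops(metrics: List[DaemonHealthMetric],
                          threshold: int = 0) -> Dict[str, int]:
    """Return the daemons whose SLOW_OPS value exceeds the threshold."""
    return {
        m.daemon: m.value
        for m in metrics
        if m.type == "SLOW_OPS" and m.value > threshold
    }


if __name__ == "__main__":
    sample = [
        DaemonHealthMetric("osd.1", "SLOW_OPS", 0),
        DaemonHealthMetric("osd.3", "SLOW_OPS", 12),
        DaemonHealthMetric("mon.a", "SLOW_OPS", 2),
    ]
    print(render_prometheus(sample), end="")
    print(daemons_with_slow_ops(sample))
```

With one sample per daemon, a dashboard or alert rule can single out the daemons that report slow ops, which a single cluster-wide health check cannot do.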
16 files changed:
doc/mgr/modules.rst
monitoring/ceph-mixin/dashboards/host.libsonnet
monitoring/ceph-mixin/dashboards/osd.libsonnet
monitoring/ceph-mixin/dashboards_out/host-details.json
monitoring/ceph-mixin/dashboards_out/osds-overview.json
monitoring/ceph-mixin/prometheus_alerts.libsonnet
monitoring/ceph-mixin/prometheus_alerts.yml
monitoring/ceph-mixin/tests_alerts/test_alerts.yml
src/mgr/ActivePyModules.cc
src/mgr/ActivePyModules.h
src/mgr/BaseMgrModule.cc
src/mgr/DaemonHealthMetric.h
src/mgr/DaemonServer.cc
src/pybind/mgr/ceph_module.pyi
src/pybind/mgr/mgr_module.py
src/pybind/mgr/prometheus/module.py