git.apps.os.sepia.ceph.com Git - ceph.git/commit
mgr/prometheus: expose daemon health metrics 48843/head
author Pere Diaz Bou <pdiazbou@redhat.com>
Fri, 11 Nov 2022 09:43:01 +0000 (10:43 +0100)
committer Pere Diaz Bou <pdiazbou@redhat.com>
Tue, 20 Dec 2022 08:44:49 +0000 (09:44 +0100)
commit 5a2b7c25b68f2c955356640041e4c7ed72416d4e
tree fd5fe902c1181ae64f2c13ed10c49b28fc67a9a7
parent e464ce9c6e03afddc5b142720f36215996818c76
mgr/prometheus: expose daemon health metrics

Until now, daemon health metrics were stored but never used. One of the
most helpful of these metrics is SLOW_OPS, which is reported for OSDs and
MONs. This commit exposes it as a fine-grained, per-daemon metric so that
troublesome OSDs can be found directly, instead of relying on a single
cluster-wide slow ops health check.

Signed-off-by: Pere Diaz Bou <pdiazbou@redhat.com>
16 files changed:
doc/mgr/modules.rst
monitoring/ceph-mixin/dashboards/host.libsonnet
monitoring/ceph-mixin/dashboards/osd.libsonnet
monitoring/ceph-mixin/dashboards_out/host-details.json
monitoring/ceph-mixin/dashboards_out/osds-overview.json
monitoring/ceph-mixin/prometheus_alerts.libsonnet
monitoring/ceph-mixin/prometheus_alerts.yml
monitoring/ceph-mixin/tests_alerts/test_alerts.yml
src/mgr/ActivePyModules.cc
src/mgr/ActivePyModules.h
src/mgr/BaseMgrModule.cc
src/mgr/DaemonHealthMetric.h
src/mgr/DaemonServer.cc
src/pybind/mgr/ceph_module.pyi
src/pybind/mgr/mgr_module.py
src/pybind/mgr/prometheus/module.py
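
The idea in the commit message, replacing one cluster-wide slow ops health check with a per-daemon metric, can be illustrated with a minimal sketch. This is not the code from src/pybind/mgr/prometheus/module.py. The input shape, the function names, and the metric name `ceph_daemon_health_metrics` with its `type` and `ceph_daemon` labels are assumptions made for illustration only.

```python
from typing import Dict, List, Tuple

# Hypothetical input shape: daemon name -> list of {"type": ..., "value": ...}.
# The real structure returned by the mgr is not shown in this commit page.
DaemonHealth = Dict[str, List[Dict[str, object]]]


def render_daemon_health_metrics(
    health: DaemonHealth,
    metric_name: str = "ceph_daemon_health_metrics",  # assumed name
) -> str:
    """Render per-daemon health metrics in Prometheus text exposition format."""
    lines = [
        f"# HELP {metric_name} Health metrics for Ceph daemons",
        f"# TYPE {metric_name} gauge",
    ]
    # One sample per (daemon, metric type), so each daemon can be queried alone.
    for daemon in sorted(health):
        for metric in health[daemon]:
            labels = f'type="{metric["type"]}",ceph_daemon="{daemon}"'
            lines.append(f"{metric_name}{{{labels}}} {metric['value']}")
    return "\n".join(lines) + "\n"


def daemons_over_threshold(
    health: DaemonHealth,
    metric_type: str = "SLOW_OPS",
    threshold: int = 0,
) -> List[Tuple[str, int]]:
    """Return (daemon, value) pairs above the threshold, worst first."""
    hits = [
        (daemon, int(metric["value"]))
        for daemon, metrics in health.items()
        for metric in metrics
        if metric["type"] == metric_type and int(metric["value"]) > threshold
    ]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Made-up sample values, used only to exercise the sketch.
    sample: DaemonHealth = {
        "osd.0": [{"type": "SLOW_OPS", "value": 0}],
        "osd.1": [{"type": "SLOW_OPS", "value": 12}],
        "mon.a": [{"type": "SLOW_OPS", "value": 3}],
    }
    print(render_daemon_health_metrics(sample), end="")
    print(daemons_over_threshold(sample))
```

With one sample per daemon, an alert or dashboard panel can point at the specific OSD or MON that reports slow ops, which a single cluster-wide health check cannot do.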