git-server-git.apps.pok.os.sepia.ceph.com Git - ceph.git/commit
ceph-mixin: Add Prometheus Alert for Degraded Bond 48538/head
author Christian Kugler <syphdias+git@gmail.com>
Sun, 16 Oct 2022 17:21:01 +0000 (19:21 +0200)
committer Christian Kugler <syphdias+git@gmail.com>
Wed, 2 Nov 2022 13:48:57 +0000 (14:48 +0100)
commit 4aecdad350890c0d18773e05a63e75966b7956e0
tree 081a6a405b5128d4c01496c5220681aa248e77bd
parent 8892cc0ace34be7f6e6b42021ea321d3d64f02e6
ceph-mixin: Add Prometheus Alert for Degraded Bond

Currently there is no alert for a misconfigured or failed network interface
card that is part of a network bond.

This could lead to redundancy and performance being degraded unnoticed.

To solve this, I use node exporter metrics to compare the total number of peers
in the bond with the number that are active. If the numbers differ, the bond is
degraded and should be looked at.
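
The comparison described above can be sketched in a few lines of Python. This
is only an illustration of the logic, not the rule added by this commit: the
`BondStatus` container, the `degraded_bonds` helper, and the sample bond names
are hypothetical, and the mapping to the node exporter metrics
`node_bonding_slaves` and `node_bonding_active` is an assumption about which
metrics supply the two counts.

```python
from typing import Dict, List, NamedTuple


class BondStatus(NamedTuple):
    """Peer counts for one bond (hypothetical container, not part of the commit)."""
    total_peers: int   # assumed source: node_bonding_slaves
    active_peers: int  # assumed source: node_bonding_active


def degraded_bonds(bonds: Dict[str, BondStatus]) -> List[str]:
    """Return the sorted names of bonds whose active peer count differs from the total."""
    return sorted(
        name
        for name, status in bonds.items()
        if status.total_peers != status.active_peers
    )


# Made-up sample data: bond0 is healthy, bond1 has lost one peer.
sample = {
    "bond0": BondStatus(total_peers=2, active_peers=2),
    "bond1": BondStatus(total_peers=2, active_peers=1),
}
print(degraded_bonds(sample))  # prints ['bond1']
```

In an alerting rule the same check would be written as a query over the two
metrics rather than as application code; the sketch only shows that any
difference between the two counts marks the bond as degraded.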

Fixes: https://tracker.ceph.com/issues/57962
Signed-off-by: Christian Kugler <syphdias+git@gmail.com>
monitoring/ceph-mixin/prometheus_alerts.libsonnet
monitoring/ceph-mixin/prometheus_alerts.yml
monitoring/ceph-mixin/tests_alerts/test_alerts.yml