From: Patrick Seidensal
Date: Wed, 23 Mar 2022 13:53:58 +0000 (+0100)
Subject: mgr/dashboard: Compare values of MTU alert by device
X-Git-Tag: v17.2.0~6^2~8
X-Git-Url: http://git-server-git.apps.pok.os.sepia.ceph.com/?a=commitdiff_plain;h=9d0e500f2aa2b763c5e48c9344a0049deb59fdbd;p=ceph.git

mgr/dashboard: Compare values of MTU alert by device

Fixes: https://tracker.ceph.com/issues/55004
Signed-off-by: Patrick Seidensal
(cherry picked from commit 3821548a37373f87109ab0dac7f3ee2d8f3ead99)
---

diff --git a/monitoring/ceph-mixin/prometheus_alerts.yml b/monitoring/ceph-mixin/prometheus_alerts.yml
index fc38678f99dd..578596f4af0b 100644
--- a/monitoring/ceph-mixin/prometheus_alerts.yml
+++ b/monitoring/ceph-mixin/prometheus_alerts.yml
@@ -704,7 +704,18 @@ groups:
             rate of the past 48 hours.

       - alert: CephNodeInconsistentMTU
-        expr: node_network_mtu_bytes{device!="lo"} * (node_network_up{device!="lo"} > 0) != on() group_left() (quantile(0.5, node_network_mtu_bytes{device!="lo"}))
+        expr: |
+          node_network_mtu_bytes * (node_network_up{device!="lo"} > 0) ==
+            scalar(
+              max by (device) (node_network_mtu_bytes * (node_network_up{device!="lo"} > 0)) !=
+                quantile by (device) (.5, node_network_mtu_bytes * (node_network_up{device!="lo"} > 0))
+            )
+          or
+          node_network_mtu_bytes * (node_network_up{device!="lo"} > 0) ==
+            scalar(
+              min by (device) (node_network_mtu_bytes * (node_network_up{device!="lo"} > 0)) !=
+                quantile by (device) (.5, node_network_mtu_bytes * (node_network_up{device!="lo"} > 0))
+            )
         labels:
           severity: warning
           type: ceph_default
@@ -712,7 +723,7 @@ groups:
           summary: MTU settings across Ceph hosts are inconsistent
           description: >
             Node {{ $labels.instance }} has a different MTU size ({{ $value }})
-            than the median value on device {{ $labels.device }}.
+            than the median of devices named {{ $labels.device }}.

   - name: pools
     rules:
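
The intent stated in the new alert description (compare each interface's MTU against the median of interfaces with the same device name, rather than against one median over all devices) can be sketched in Python. This is an illustration only, not part of the commit: the function name `inconsistent_mtu` and the sample tuples are hypothetical, and the sketch does not model PromQL evaluation details such as `scalar()` or label matching.

```python
from collections import defaultdict
from statistics import median


def inconsistent_mtu(samples):
    """Return (instance, device, mtu) for interfaces whose MTU differs
    from the median MTU of all "up" interfaces with the same device name.

    `samples` is an iterable of (instance, device, mtu, up) tuples, a
    hypothetical stand-in for node_network_mtu_bytes / node_network_up.
    """
    by_device = defaultdict(list)
    for instance, device, mtu, up in samples:
        # Mirror the alert's filters: skip loopback and interfaces that are down.
        if device != "lo" and up > 0:
            by_device[device].append((instance, mtu))

    flagged = []
    for device, entries in by_device.items():
        mid = median(mtu for _, mtu in entries)
        flagged.extend(
            (instance, device, mtu) for instance, mtu in entries if mtu != mid
        )
    return flagged


# Hypothetical cluster: eth0 is meant to run at 1500, eth1 at 9000.
samples = [
    ("node1", "eth0", 1500, 1),
    ("node2", "eth0", 1500, 1),
    ("node3", "eth0", 9000, 1),   # the odd one out among eth0 interfaces
    ("node1", "eth1", 9000, 1),
    ("node2", "eth1", 9000, 1),
    ("node3", "eth1", 9000, 1),
    ("node1", "lo", 65536, 1),    # loopback is ignored
    ("node4", "eth0", 9000, 0),   # down interfaces are ignored
]

print(inconsistent_mtu(samples))
```

With these made-up samples only node3's eth0 is reported. A single median over all devices would be 9000 here and would instead flag the two eth0 interfaces at 1500, which is the kind of result that grouping by device avoids.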