We were asked to update our alerting definitions to include a health
check for slow OSD ops.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1951664
Signed-off-by: Boris Ranto <branto@redhat.com>
(cherry picked from commit 2491d4e004c7b162216bc17e2288f05d0b049a87)
         annotations:
           summary: "OSD(s) with High PG Count"
           description: "This indicates there are some OSDs with high PG count (275+)."
+      - alert: Slow OSD Ops
+        expr: ceph_healthcheck_slow_ops > 0
+        for: 1m
+        labels:
+          severity: page
+        annotations:
+          summary: "Slow OSD Ops"
+          description: "OSD requests are taking too long to process (osd_op_complaint_time exceeded)"
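
One way to exercise the new rule is a `promtool test rules` unit test. The following is a minimal sketch, not part of this change: the rule file name `ceph_dashboard.yml` and the `job="ceph"` label on the input series are assumptions made for illustration. It feeds a series that turns non-zero at the 2m mark and expects the alert to be firing at 5m, after the `for: 1m` window has elapsed.

```yaml
# Hypothetical promtool unit test; run with: promtool test rules slow_ops_test.yml
rule_files:
  - ceph_dashboard.yml  # assumed name of the file that holds the alert rules

evaluation_interval: 1m

tests:
  - interval: 1m
    input_series:
      # Samples at 0m..5m: zero for two samples, then one slow op reported.
      - series: 'ceph_healthcheck_slow_ops{job="ceph"}'  # job label is an assumption
        values: '0 0 1 1 1 1'
    alert_rule_test:
      - eval_time: 5m
        alertname: Slow OSD Ops
        exp_alerts:
          - exp_labels:
              severity: page
              job: ceph
            exp_annotations:
              summary: "Slow OSD Ops"
              description: "OSD requests are taking too long to process (osd_op_complaint_time exceeded)"
```

Because the expression is a plain threshold on the health check gauge, a single non-zero sample is not enough to page; the condition has to hold across the full `for: 1m` window.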