ceph config set mgr mgr/prometheus/cache false
+ If you are using the prometheus module behind a reverse proxy or load
+ balancer, you can simplify discovery of the active instance by switching
+ the standby behaviour to ``error`` mode::
+
+ ceph config set mgr mgr/prometheus/standby_behaviour error
+
+ If set, the prometheus module will respond with an HTTP error when ``/`` is
+ requested from a standby instance. The default error code is 500, but you can
+ configure the HTTP response code with::
+
+ ceph config set mgr mgr/prometheus/standby_error_status_code 503
+
+ Valid error codes range from 400 to 599.
+
+ To switch back to the default behaviour, set the config key to ``default``::
+
+ ceph config set mgr mgr/prometheus/standby_behaviour default
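+
+ As a rough sketch of how a health probe would see this, the loop below requests
+ ``/`` from each manager and prints the HTTP status code. The host names are
+ placeholders, and port 9283 assumes the module's default port has not been
+ changed. With ``standby_behaviour`` set to ``error``, only the active instance
+ is expected to answer with a 2xx status::
+
+     # mgr-a.example.com and mgr-b.example.com are hypothetical host names
+     for host in mgr-a.example.com mgr-b.example.com; do
+         code=$(curl -s -o /dev/null -w '%{http_code}' "http://${host}:9283/")
+         echo "${host}: HTTP ${code}"
+     done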
+
.. _prometheus-rbd-io-statistics:
+Ceph Health Checks
+------------------
+
+The mgr/prometheus module also tracks and maintains a history of Ceph health checks,
+exposing them to the Prometheus server as discrete metrics. This allows Prometheus
+alert rules to be configured for specific health check events.
+
+The metrics take the following form:
+
+::
+
+ # HELP ceph_health_detail healthcheck status by type (0=inactive, 1=active)
+ # TYPE ceph_health_detail gauge
+ ceph_health_detail{name="OSDMAP_FLAGS",severity="HEALTH_WARN"} 0.0
+ ceph_health_detail{name="OSD_DOWN",severity="HEALTH_WARN"} 1.0
+ ceph_health_detail{name="PG_DEGRADED",severity="HEALTH_WARN"} 1.0
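+
+As an illustration, the currently active health checks can be listed by querying
+the Prometheus HTTP API for series whose value is 1. The Prometheus address below
+is a placeholder, and the same expression could serve as the basis of an alert
+rule's ``expr``; treat it as a sketch rather than a recommended rule::
+
+    # prometheus.example.com:9090 is a hypothetical Prometheus server address
+    curl -sG 'http://prometheus.example.com:9090/api/v1/query' \
+        --data-urlencode 'query=ceph_health_detail == 1'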
+
+The health check history is made available through the following commands:
+
+::
+
+ ceph healthcheck history ls [--format {plain|json|json-pretty}]
+ ceph healthcheck history clear
+
+The ``ls`` command provides an overview of the health checks that the cluster has
+encountered, or those encountered since the last ``clear`` command was issued.
+For example:
+
+::
+
+ [ceph: root@c8-node1 /]# ceph healthcheck history ls
+ Healthcheck Name          First Seen (UTC)      Last seen (UTC)       Count  Active
+ OSDMAP_FLAGS              2021/09/16 03:17:47   2021/09/16 22:07:40       2  No
+ OSD_DOWN                  2021/09/17 00:11:59   2021/09/17 00:11:59       1  Yes
+ PG_DEGRADED               2021/09/17 00:11:59   2021/09/17 00:11:59       1  Yes
+ 3 health check(s) listed
+
+
RBD IO statistics
-----------------