Merge pull request #43897 from k0ste/wip-53234-pacific

author Ernesto Puerta <37327689+epuertat@users.noreply.github.com>

Fri, 4 Feb 2022 16:39:05 +0000 (17:39 +0100)

committer GitHub <noreply@github.com>

Fri, 4 Feb 2022 16:39:05 +0000 (17:39 +0100)
diff --cc doc/mgr/prometheus.rst

index 5510b8cd76a3a2c9a52ce0a2b14fdae56ec965c9,04a10c3cd2f6b8d87c95f833a588e0c259510161..d4a9fd911e171590c76c63135ad9c1b04a097e5b
--- 1/doc/mgr/prometheus.rst
--- 2/doc/mgr/prometheus.rst
+++ b/doc/mgr/prometheus.rst
@@@ -87,45 -88,26 +87,63 @@@ If you are confident that you don't req
   
       ceph config set mgr mgr/prometheus/cache false
   
+ If you are using the prometheus module behind some kind of reverse proxy or
+ load balancer, you can simplify discovering the active instance by switching
+ to ``error``-mode::
+ 
+     ceph config set mgr mgr/prometheus/standby_behaviour error
+ 
+ If set, the prometheus module will respond with an HTTP error when requesting ``/``
+ from the standby instance. The default error code is 500, but you can configure
+ the HTTP response code with::
+ 
+     ceph config set mgr mgr/prometheus/standby_error_status_code 503
+ 
+ Valid error codes are in the range 400-599.
+ 
+ To switch back to the default behaviour, simply set the config key to ``default``::
+ 
+     ceph config set mgr mgr/prometheus/standby_behaviour default
+ 
   .. _prometheus-rbd-io-statistics:
   
+ +Ceph Health Checks
+ +------------------
+ +
+ +The mgr/prometheus module also tracks and maintains a history of Ceph health checks,
+ +exposing them to the Prometheus server as discrete metrics. This allows Prometheus
+ +alert rules to be configured for specific health check events.
+ +
+ +The metrics take the following form:
+ +
+ +::
+ +
+ +    # HELP ceph_health_detail healthcheck status by type (0=inactive, 1=active)
+ +    # TYPE ceph_health_detail gauge
+ +    ceph_health_detail{name="OSDMAP_FLAGS",severity="HEALTH_WARN"} 0.0
+ +    ceph_health_detail{name="OSD_DOWN",severity="HEALTH_WARN"} 1.0
+ +    ceph_health_detail{name="PG_DEGRADED",severity="HEALTH_WARN"} 1.0
+ +
+ +The health check history is made available through the following commands:
+ +
+ +::
+ +
+ +    ceph healthcheck history ls [--format {plain|json|json-pretty}]
+ +    ceph healthcheck history clear
+ +
+ +The ``ls`` command provides an overview of the health checks that the cluster has
+ +encountered, counted from the last ``clear`` command if one was issued. For example:
+ +
+ +::
+ +
+ +    [ceph: root@c8-node1 /]# ceph healthcheck history ls
+ +    Healthcheck Name          First Seen (UTC)      Last seen (UTC)       Count  Active
+ +    OSDMAP_FLAGS              2021/09/16 03:17:47   2021/09/16 22:07:40       2    No
+ +    OSD_DOWN                  2021/09/17 00:11:59   2021/09/17 00:11:59       1   Yes
+ +    PG_DEGRADED               2021/09/17 00:11:59   2021/09/17 00:11:59       1   Yes
+ +    3 health check(s) listed
+ +
+ +
   RBD IO statistics
   -----------------
   
diff --cc src/pybind/mgr/prometheus/module.py
Simple merge
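
Reviewer note on the ``standby_behaviour`` text above: with ``error`` mode enabled, a standby instance answers ``/`` with an error status (500 by default), so a client can locate the active instance by status code alone. The following is a minimal sketch of that idea, not the module's own logic. The host names and port in the usage note are hypothetical, and the helper names ``probe_status`` and ``find_active`` are invented for illustration.

```python
import urllib.error
import urllib.request


def probe_status(url, timeout=5.0):
    """Return the HTTP status code for a GET on url, including 4xx/5xx codes."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises on error statuses; the code is still what we want.
        return err.code


def find_active(urls, probe=probe_status):
    """Return the first URL whose status is below 400, or None if none qualifies.

    Assumes every standby runs with standby_behaviour set to ``error``,
    so a standby always answers with a status in the 400-599 range.
    """
    for url in urls:
        try:
            status = probe(url)
        except OSError:
            # Connection refused, DNS failure, timeout: treat as not active.
            continue
        if status < 400:
            return url
    return None
```

A call such as ``find_active(["http://mgr-a:9283/", "http://mgr-b:9283/"])`` would return the URL of whichever instance does not answer with an error. The ``probe`` parameter exists so the selection logic can be exercised without a network.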
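
Reviewer note on the ``ceph_health_detail`` metrics documented in the diff: each sample is a gauge where 1 means the named health check is active and 0 means it is inactive. A short sketch of reading those samples from exposition text follows. It assumes the label order shown in the documented sample (``name`` then ``severity``) and no additional labels; a real scrape may differ. The function name ``active_health_checks`` is invented for illustration.

```python
import re

# Matches sample lines in the form documented above, for example:
#   ceph_health_detail{name="OSD_DOWN",severity="HEALTH_WARN"} 1.0
_SAMPLE = re.compile(
    r'^ceph_health_detail\{name="(?P<name>[^"]+)",'
    r'severity="(?P<severity>[^"]+)"\}\s+(?P<value>\S+)$'
)


def active_health_checks(exposition_text):
    """Return (name, severity) pairs for every health check whose gauge is 1."""
    active = []
    for line in exposition_text.splitlines():
        match = _SAMPLE.match(line.strip())
        # HELP/TYPE comment lines and other metrics do not match and are skipped.
        if match and float(match.group("value")) == 1.0:
            active.append((match.group("name"), match.group("severity")))
    return active


# The sample text is copied from the documentation in the diff.
SAMPLE_TEXT = """\
# HELP ceph_health_detail healthcheck status by type (0=inactive, 1=active)
# TYPE ceph_health_detail gauge
ceph_health_detail{name="OSDMAP_FLAGS",severity="HEALTH_WARN"} 0.0
ceph_health_detail{name="OSD_DOWN",severity="HEALTH_WARN"} 1.0
ceph_health_detail{name="PG_DEGRADED",severity="HEALTH_WARN"} 1.0
"""

if __name__ == "__main__":
    for name, severity in active_health_checks(SAMPLE_TEXT):
        print(name, severity)
```

In practice an alert rule on the Prometheus side would express the same condition; the sketch is only meant to make the 0/1 gauge semantics concrete.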