From: Ernesto Puerta <37327689+epuertat@users.noreply.github.com>
Date: Thu, 11 Nov 2021 16:36:30 +0000 (+0100)
Subject: Merge pull request #43464 from rsommer/wip-prometheus-standby-behaviour
X-Git-Tag: v17.1.0~457
X-Git-Url: http://git-server-git.apps.pok.os.sepia.ceph.com/?a=commitdiff_plain;h=45eb9dd328c53c252f7c2a17cf96e3c50739f19f;p=ceph.git

Merge pull request #43464 from rsommer/wip-prometheus-standby-behaviour

mgr/prometheus: Make prometheus standby behaviour configurable

Reviewed-by: Ernesto Puerta
Reviewed-by: Pere Diaz Bou
---

45eb9dd328c53c252f7c2a17cf96e3c50739f19f
diff --cc doc/mgr/prometheus.rst
index 733c4bfdb4f3,a8774ff33234..ef138477608c
--- a/doc/mgr/prometheus.rst
+++ b/doc/mgr/prometheus.rst
@@@ -96,45 -98,26 +98,63 @@@ If you are confident that you don't req
 
      ceph config set mgr mgr/prometheus/cache false
 
+ If you are running the prometheus module behind some kind of reverse proxy or
+ load balancer, you can simplify discovering the active instance by switching
+ to ``error``-mode::
+ 
+     ceph config set mgr mgr/prometheus/standby_behaviour error
+ 
+ If set, the prometheus module will respond with an HTTP error when ``/`` is
+ requested from the standby instance. The default error code is 500, but you
+ can configure the HTTP response code with::
+ 
+     ceph config set mgr mgr/prometheus/standby_error_status_code 503
+ 
+ Valid error codes are between 400 and 599.
+ 
+ To switch back to the default behaviour, simply set the config key to
+ ``default``::
+ 
+     ceph config set mgr mgr/prometheus/standby_behaviour default
+ 
 .. _prometheus-rbd-io-statistics:
 
+Ceph Health Checks
+------------------
+
+The mgr/prometheus module also tracks and maintains a history of Ceph health
+checks, exposing them to the Prometheus server as discrete metrics. This
+allows Prometheus alert rules to be configured for specific health check
+events.
+
+The metrics take the following form::
+
+    # HELP ceph_health_detail healthcheck status by type (0=inactive, 1=active)
+    # TYPE ceph_health_detail gauge
+    ceph_health_detail{name="OSDMAP_FLAGS",severity="HEALTH_WARN"} 0.0
+    ceph_health_detail{name="OSD_DOWN",severity="HEALTH_WARN"} 1.0
+    ceph_health_detail{name="PG_DEGRADED",severity="HEALTH_WARN"} 1.0
+
+The health check history is made available through the following commands::
+
+    healthcheck history ls [--format {plain|json|json-pretty}]
+    healthcheck history clear
+
+The ``ls`` command provides an overview of the health checks that the cluster
+has encountered, either over its lifetime or since the last ``clear`` command
+was issued. For example::
+
+    [ceph: root@c8-node1 /]# ceph healthcheck history ls
+    Healthcheck Name    First Seen (UTC)     Last seen (UTC)      Count  Active
+    OSDMAP_FLAGS        2021/09/16 03:17:47  2021/09/16 22:07:40  2      No
+    OSD_DOWN            2021/09/17 00:11:59  2021/09/17 00:11:59  1      Yes
+    PG_DEGRADED         2021/09/17 00:11:59  2021/09/17 00:11:59  1      Yes
+    3 health check(s) listed
+
 RBD IO statistics
 -----------------
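As an aside (not part of the patch itself): since the added documentation shows the exact exposition format of ``ceph_health_detail``, a consumer can extract the currently active health checks from a scrape payload with a few lines of parsing. The sketch below is illustrative only; the helper names (``parse_health_detail``) are made up for this example and the sample text is copied from the documented output above.

```python
import re

# Matches one ceph_health_detail sample in the exposition format shown above,
# capturing the health check name, its severity label, and the gauge value.
METRIC_RE = re.compile(
    r'^ceph_health_detail\{name="(?P<name>[^"]+)",severity="(?P<severity>[^"]+)"\}\s+(?P<value>[\d.]+)$'
)

def parse_health_detail(exposition_text):
    """Map each health check name to (severity, active), skipping # comments."""
    checks = {}
    for line in exposition_text.splitlines():
        m = METRIC_RE.match(line.strip())
        if m:
            checks[m.group("name")] = (m.group("severity"),
                                       float(m.group("value")) == 1.0)
    return checks

# Sample payload taken verbatim from the documentation above.
sample = """\
# HELP ceph_health_detail healthcheck status by type (0=inactive, 1=active)
# TYPE ceph_health_detail gauge
ceph_health_detail{name="OSDMAP_FLAGS",severity="HEALTH_WARN"} 0.0
ceph_health_detail{name="OSD_DOWN",severity="HEALTH_WARN"} 1.0
"""

active = [name for name, (_, is_active) in parse_health_detail(sample).items()
          if is_active]
print(active)  # ['OSD_DOWN']
```

A Prometheus alert rule comparing ``ceph_health_detail == 1`` per ``name`` label achieves the same thing server-side; the snippet is only meant to make the metric shape concrete.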