From: Zac Dover
Date: Fri, 30 May 2025 18:01:04 +0000 (+1000)
Subject: Merge branch 'squid' into wip-doc-2025-05-29-backport-63471-to-squid
X-Git-Tag: testing/wip-vshankar-testing-20250604.145504-squid-debug~16^2
X-Git-Url: http://git-server-git.apps.pok.os.sepia.ceph.com/?a=commitdiff_plain;h=fbb5a13a343828b1642893ebad92495f79f6f034;p=ceph-ci.git

Merge branch 'squid' into wip-doc-2025-05-29-backport-63471-to-squid

Signed-off-by: Zac Dover
---

fbb5a13a343828b1642893ebad92495f79f6f034
diff --cc doc/mgr/prometheus.rst
index 9eecff3c90a,e61b7beb4c3..63a11158520
--- a/doc/mgr/prometheus.rst
+++ b/doc/mgr/prometheus.rst
@@@ -4,23 -4,23 +4,23 @@@ Prometheus Module
  =================
- Provides a Prometheus exporter to pass on Ceph performance counters
- from the collection point in ceph-mgr. Ceph-mgr receives MMgrReport
- messages from all MgrClient processes (mons and OSDs, for instance)
- with performance counter schema data and actual counter data, and keeps
- a circular buffer of the last N samples. This module creates an HTTP
- endpoint (like all Prometheus exporters) and retrieves the latest sample
- of every counter when polled (or "scraped" in Prometheus terminology).
- The HTTP path and query parameters are ignored; all extant counters
- for all reporting entities are returned in text exposition format.
- (See the Prometheus `documentation `_.)
- 
- Enabling prometheus output
+ The Manager ``prometheus`` module implements a Prometheus exporter to expose
+ Ceph performance counters from the collection point in the Manager. The
+ Manager receives ``MMgrReport`` messages from all ``MgrClient`` processes
+ (including mons and OSDs) with performance counter schema data and counter
+ data, and maintains a circular buffer of the latest samples. This module
+ listens on an HTTP endpoint and retrieves the latest sample of every counter
+ when scraped. The HTTP path and query parameters are ignored. All extant
+ counters for all reporting entities are returned in the Prometheus exposition
+ format. (See the Prometheus `documentation
+ `_.)
+ 
+ Enabling Prometheus output
  ==========================
- The *prometheus* module is enabled with:
+ Enable the ``prometheus`` module by running the below command :
 -.. prompt:: bash $
 +.. prompt:: bash #
     ceph mgr module enable prometheus
@@@ -91,15 -90,16 +90,16 @@@ This behavior can be configured. By def
  code (service unavailable). You can set other options using the ``ceph config set`` commands.
- To tell the module to respond with possibly stale data, set it to ``return``:
+ To configure the module to respond with possibly stale data, set
+ the cache strategy to ``return``:
 -.. prompt:: bash $
 +.. prompt:: bash #
-     ceph config set mgr mgr/prometheus/stale_cache_strategy return
+    ceph config set mgr mgr/prometheus/stale_cache_strategy return
- To tell the module to respond with "service unavailable", set it to ``fail``:
+ To configure the module to respond with "service unavailable", set it to ``fail``:
 -.. prompt:: bash $
 +.. prompt:: bash #
     ceph config set mgr mgr/prometheus/stale_cache_strategy fail
@@@ -109,11 -109,11 +109,11 @@@ If you are confident that you don't req
     ceph config set mgr mgr/prometheus/cache false
- If you are using the prometheus module behind some kind of reverse proxy or
- loadbalancer, you can simplify discovering the active instance by switching
+ If you are using the ``prometheus`` module behind a reverse proxy or
+ load balancer, you can simplify discovery of the active instance by switching
  to ``error``-mode:
 -.. prompt:: bash $
 +.. prompt:: bash #
     ceph config set mgr mgr/prometheus/standby_behaviour error
@@@ -152,22 -152,19 +152,23 @@@ The metrics take the following form
      ceph_health_detail{name="OSD_DOWN",severity="HEALTH_WARN"} 1.0
      ceph_health_detail{name="PG_DEGRADED",severity="HEALTH_WARN"} 1.0
- The health check history is made available through the following commands;
+ The health check history may be retrieved and cleared by running the following commands:
 -::
 +.. prompt:: bash #
 +
 +   ceph healthcheck history ls [--format {plain|json|json-pretty}]
 +   ceph healthcheck history clear
- The ``ls`` command provides an overview of the health checks that the cluster has
- encountered, or since the last ``clear`` command was issued. The example below;
-     ceph healthcheck history ls [--format {plain|json|json-pretty}]
-     ceph healthcheck history clear
+ 
+ The ``ceph healthcheck ls`` command provides an overview of the health checks that the cluster has
+ encountered since the last ``clear`` command was issued:
 +.. prompt:: bash #
 +
 +   ceph healthcheck history ls
 +
  ::
-     [ceph: root@c8-node1 /]# ceph healthcheck history ls
      Healthcheck Name  First Seen (UTC)     Last seen (UTC)      Count  Active
      OSDMAP_FLAGS      2021/09/16 03:17:47  2021/09/16 22:07:40      2  No
      OSD_DOWN          2021/09/17 00:11:59  2021/09/17 00:11:59      1  Yes
@@@ -178,22 -175,22 +179,22 @@@ RBD IO statistics
  -----------------
- The module can optionally collect RBD per-image IO statistics by enabling
- dynamic OSD performance counters. The statistics are gathered for all images
- in the pools that are specified in the ``mgr/prometheus/rbd_stats_pools``
+ The ``prometheus`` module can optionally collect RBD per-image IO statistics by enabling
+ dynamic OSD performance counters. Statistics are gathered for all images
+ in the pools that are specified by the ``mgr/prometheus/rbd_stats_pools``
  configuration parameter. The parameter is a comma or space separated list
- of ``pool[/namespace]`` entries. If the namespace is not specified the
+ of ``pool[/namespace]`` entries. If the RBD namespace is not specified,
  statistics are collected for all namespaces in the pool.
- Example to activate the RBD-enabled pools ``pool1``, ``pool2`` and ``poolN``:
+ To enable collection of stats for RBD pools named ``pool1``, ``pool2`` and ``poolN``:
 -.. prompt:: bash $
 +.. prompt:: bash #
     ceph config set mgr mgr/prometheus/rbd_stats_pools "pool1,pool2,poolN"
- The wildcard can be used to indicate all pools or namespaces:
+ A wildcard can be used to indicate all pools or namespaces:
 -.. prompt:: bash $
 +.. prompt:: bash #
     ceph config set mgr mgr/prometheus/rbd_stats_pools "*"
@@@ -204,20 -201,20 +205,20 @@@ parameter, which defaults to 300 second
  force refresh earlier if it detects statistics from a previously unknown RBD image.
- Example to turn up the sync interval to 10 minutes:
+ To set the sync interval to 10 minutes run the following command:
 -.. prompt:: bash $
 +.. prompt:: bash #
     ceph config set mgr mgr/prometheus/rbd_stats_pools_refresh_interval 600
  Ceph daemon performance counters metrics
  -----------------------------------------
- With the introduction of ``ceph-exporter`` daemon, the prometheus module will no longer export Ceph daemon
- perf counters as prometheus metrics by default. However, one may re-enable exporting these metrics by setting
+ With the introduction of the ``ceph-exporter`` daemon, the ``prometheus`` module will no longer export Ceph daemon
+ perf counters as Prometheus metrics by default. However, one may re-enable exporting these metrics by setting
  the module option ``exclude_perf_counters`` to ``false``:
 -.. prompt:: bash $
 +.. prompt:: bash #
     ceph config set mgr mgr/prometheus/exclude_perf_counters false
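
Review note on the ``rbd_stats_pools`` hunk: the changed text describes the value as a comma or space separated list of ``pool[/namespace]`` entries, where a missing namespace means all namespaces in the pool. The sketch below shows one way such a value could be interpreted. It is an illustration only, not the module's actual parser: the function name ``parse_rbd_stats_pools`` and the use of ``None`` to mean "all namespaces" are assumptions made for this example.

```python
import re


def parse_rbd_stats_pools(value):
    """Split a comma- or space-separated list of pool[/namespace] entries.

    Hypothetical helper for illustration; not taken from the Ceph source.
    Returns a dict mapping pool name -> set of namespaces, where None
    stands for "all namespaces in the pool" (an assumed convention).
    """
    pools = {}
    for entry in re.split(r"[\s,]+", value.strip()):
        if not entry:
            # Empty input or stray separators produce empty strings.
            continue
        pool, sep, namespace = entry.partition("/")
        # No "/namespace" part means every namespace in the pool.
        pools.setdefault(pool, set()).add(namespace if sep else None)
    return pools


if __name__ == "__main__":
    print(parse_rbd_stats_pools("pool1,pool2,poolN"))
    print(parse_rbd_stats_pools("pool1/ns1 pool1/ns2, pool2"))
```

The wildcard value ``"*"`` is returned as a literal ``"*"`` key here; how a caller expands it to all pools is left out of the sketch.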