From: Zac Dover
Date: Wed, 23 Apr 2025 09:15:19 +0000 (+1000)
Subject: Merge branch 'main' into mgr-prom
X-Git-Tag: v20.3.0~34^2
X-Git-Url: http://git-server-git.apps.pok.os.sepia.ceph.com/?a=commitdiff_plain;h=4843cd7f522882ec021f2777a220692f424c4df9;p=ceph.git

Merge branch 'main' into mgr-prom

Signed-off-by: Zac Dover
---

4843cd7f522882ec021f2777a220692f424c4df9
diff --cc doc/mgr/prometheus.rst
index 9c2d341b753f,a3ea3f1e81ce..e61b7beb4c33
--- a/doc/mgr/prometheus.rst
+++ b/doc/mgr/prometheus.rst
@@@ -4,23 -4,23 +4,23 @@@
  Prometheus Module
  =================
  
 -Provides a Prometheus exporter to pass on Ceph performance counters
 -from the collection point in ceph-mgr. Ceph-mgr receives MMgrReport
 -messages from all MgrClient processes (mons and OSDs, for instance)
 -with performance counter schema data and actual counter data, and keeps
 -a circular buffer of the last N samples. This module creates an HTTP
 -endpoint (like all Prometheus exporters) and retrieves the latest sample
 -of every counter when polled (or "scraped" in Prometheus terminology).
 -The HTTP path and query parameters are ignored; all extant counters
 -for all reporting entities are returned in text exposition format.
 -(See the Prometheus `documentation `_.)
 -
 -Enabling prometheus output
 +The Manager ``prometheus`` module implements a Prometheus exporter to expose
 +Ceph performance counters from the collection point in the Manager. The
 +Manager receives ``MMgrReport`` messages from all ``MgrClient`` processes
 +(including mons and OSDs) with performance counter schema data and counter
 +data, and maintains a circular buffer of the latest samples. This module
 +listens on an HTTP endpoint and retrieves the latest sample of every counter
 +when scraped. The HTTP path and query parameters are ignored. All extant
 +counters for all reporting entities are returned in the Prometheus exposition
 +format. (See the Prometheus `documentation
 +`_.)
 +
 +Enabling Prometheus output
  ==========================
  
 -The *prometheus* module is enabled with:
 +Enable the ``prometheus`` module by running the following command:
  
- .. prompt:: bash #
+ .. prompt:: bash $
  
     ceph mgr module enable prometheus
  
@@@ -90,16 -91,15 +90,16 @@@ This behavior can be configured. By def
  code (service unavailable). You can set other options using the
  ``ceph config set`` commands.
  
 -To tell the module to respond with possibly stale data, set it to ``return``:
 +To configure the module to respond with possibly stale data, set
 +the cache strategy to ``return``:
  
- .. prompt:: bash #
+ .. prompt:: bash $
  
     ceph config set mgr mgr/prometheus/stale_cache_strategy return
  
 -To tell the module to respond with "service unavailable", set it to ``fail``:
 +To configure the module to respond with "service unavailable", set it to ``fail``:
  
- .. prompt:: bash #
+ .. prompt:: bash $
  
     ceph config set mgr mgr/prometheus/stale_cache_strategy fail
  
@@@ -109,11 -109,11 +109,11 @@@ If you are confident that you don't req
  
     ceph config set mgr mgr/prometheus/cache false
  
 -If you are using the prometheus module behind some kind of reverse proxy or
 -loadbalancer, you can simplify discovering the active instance by switching
 +If you are using the ``prometheus`` module behind a reverse proxy or
 +load balancer, you can simplify discovery of the active instance by switching
  to ``error``-mode:
  
- .. prompt:: bash #
+ .. prompt:: bash $
  
     ceph config set mgr mgr/prometheus/standby_behaviour error
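A quick way to confirm which manager is currently serving metrics is to scrape
the endpoint by hand and check the HTTP status. This is only a sketch: it
assumes the module's default port of 9283 and a placeholder host name
``mgr-host``. With ``error`` mode set, the active manager returns the metric
payload while a standby answers with an error status::

   # Fetch the response headers and the start of the body from one manager.
   # The exporter ignores the HTTP path, so /metrics is just the conventional choice.
   curl -si http://mgr-host:9283/metrics | head -n 20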
@@@ -152,22 -152,19 +152,19 @@@ The metrics take the following form
      ceph_health_detail{name="OSD_DOWN",severity="HEALTH_WARN"} 1.0
      ceph_health_detail{name="PG_DEGRADED",severity="HEALTH_WARN"} 1.0
  
 -The health check history is made available through the following commands;
 +The health check history may be retrieved and cleared by running the following commands:
  
- .. prompt:: bash #
+ ::
  
-    healthcheck history ls [--format {plain|json|json-pretty}]
-    healthcheck history clear
+    ceph healthcheck history ls [--format {plain|json|json-pretty}]
+    ceph healthcheck history clear
  
 -The ``ls`` command provides an overview of the health checks that the cluster has
 -encountered, or since the last ``clear`` command was issued. The example below;
 +The ``ceph healthcheck history ls`` command provides an overview of the health checks that the cluster has
 +encountered since the last ``clear`` command was issued:
  
- .. prompt:: bash #
- 
-    ceph healthcheck history ls
- 
  ::
  
+    [ceph: root@c8-node1 /]# ceph healthcheck history ls
     Healthcheck Name          First Seen (UTC)      Last seen (UTC)      Count  Active
     OSDMAP_FLAGS              2021/09/16 03:17:47   2021/09/16 22:07:40  2      No
     OSD_DOWN                  2021/09/17 00:11:59   2021/09/17 00:11:59  1      Yes
  
@@@ -178,22 -175,22 +175,22 @@@ RBD IO statistics
  -----------------
  
 -The module can optionally collect RBD per-image IO statistics by enabling
 -dynamic OSD performance counters. The statistics are gathered for all images
 -in the pools that are specified in the ``mgr/prometheus/rbd_stats_pools``
 +The ``prometheus`` module can optionally collect RBD per-image IO statistics by enabling
 +dynamic OSD performance counters. Statistics are gathered for all images
 +in the pools that are specified by the ``mgr/prometheus/rbd_stats_pools``
  configuration parameter. The parameter is a comma or space separated list
 -of ``pool[/namespace]`` entries. If the namespace is not specified the
 +of ``pool[/namespace]`` entries. If the RBD namespace is not specified,
  statistics are collected for all namespaces in the pool.
  
 -Example to activate the RBD-enabled pools ``pool1``, ``pool2`` and ``poolN``:
 +To enable collection of stats for RBD pools named ``pool1``, ``pool2`` and ``poolN``:
  
- .. prompt:: bash #
+ .. prompt:: bash $
  
     ceph config set mgr mgr/prometheus/rbd_stats_pools "pool1,pool2,poolN"
  
 -The wildcard can be used to indicate all pools or namespaces:
 +A wildcard can be used to indicate all pools or namespaces:
  
- .. prompt:: bash #
+ .. prompt:: bash $
  
     ceph config set mgr mgr/prometheus/rbd_stats_pools "*"
  
@@@ -204,20 -201,20 +201,20 @@@ parameter, which defaults to 300 second
  force refresh earlier if it detects statistics from a previously unknown
  RBD image.
  
 -Example to turn up the sync interval to 10 minutes:
 +To set the sync interval to 10 minutes, run the following command:
  
- .. prompt:: bash #
+ .. prompt:: bash $
  
     ceph config set mgr mgr/prometheus/rbd_stats_pools_refresh_interval 600
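Returning to the ``mgr/prometheus/rbd_stats_pools`` parameter described above:
a ``pool/namespace`` entry restricts collection to a single RBD namespace, and
entries can be mixed in one list. A sketch with placeholder names, where
``rbd`` is a pool, ``ns1`` is a namespace inside it, and ``images`` is a second
pool collected in full:

.. prompt:: bash $

   ceph config set mgr mgr/prometheus/rbd_stats_pools "rbd/ns1 images"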
  
  Ceph daemon performance counters metrics
  -----------------------------------------
  
 -With the introduction of ``ceph-exporter`` daemon, the prometheus module will no longer export Ceph daemon
 -perf counters as prometheus metrics by default. However, one may re-enable exporting these metrics by setting
 +With the introduction of the ``ceph-exporter`` daemon, the ``prometheus`` module will no longer export Ceph daemon
 +perf counters as Prometheus metrics by default. However, you can re-enable exporting these metrics by setting
  the module option ``exclude_perf_counters`` to ``false``:
  
- .. prompt:: bash #
+ .. prompt:: bash $
  
     ceph config set mgr mgr/prometheus/exclude_perf_counters false
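Once the option is set to ``false``, daemon perf counters should appear in the
scrape output again. A rough check, sketched under the same assumptions as the
earlier ``curl`` example (default port 9283, placeholder host ``mgr-host``),
with ``ceph_osd_op`` taken only as one example of a perf-counter-derived
metric family::

   # Count exported lines whose metric name starts with ceph_osd_op;
   # a non-zero count indicates daemon perf counters are being exported.
   curl -s http://mgr-host:9283/metrics | grep -c '^ceph_osd_op'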