Prometheus Module
=================
-Provides a Prometheus exporter to pass on Ceph performance counters
-from the collection point in ceph-mgr. Ceph-mgr receives MMgrReport
-messages from all MgrClient processes (mons and OSDs, for instance)
-with performance counter schema data and actual counter data, and keeps
-a circular buffer of the last N samples. This module creates an HTTP
-endpoint (like all Prometheus exporters) and retrieves the latest sample
-of every counter when polled (or "scraped" in Prometheus terminology).
-The HTTP path and query parameters are ignored; all extant counters
-for all reporting entities are returned in text exposition format.
-(See the Prometheus `documentation <https://prometheus.io/docs/instrumenting/exposition_formats/#text-format-details>`_.)
-
-Enabling prometheus output
+The Manager ``prometheus`` module implements a Prometheus exporter to expose
+Ceph performance counters from the collection point in the Manager. The
+Manager receives ``MMgrReport`` messages from all ``MgrClient`` processes
+(including mons and OSDs) with performance counter schema data and counter
+data, and maintains a circular buffer of the latest samples. This module
+listens on an HTTP endpoint and retrieves the latest sample of every counter
+when scraped. The HTTP path and query parameters are ignored. All extant
+counters for all reporting entities are returned in the Prometheus exposition
+format. (See the Prometheus `documentation
+<https://prometheus.io/docs/instrumenting/exposition_formats/#text-format-details>`_.)
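+
+For a quick look at what the exporter serves, the endpoint can be scraped
+manually. This is a minimal sketch that assumes the module is already enabled
+and listening on its default port (9283) on the active Manager host; adjust
+the host and port for your environment:
+
+ .. prompt:: bash $
+
+    curl http://localhost:9283/metrics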
+
+Enabling Prometheus output
==========================
-The *prometheus* module is enabled with:
+Enable the ``prometheus`` module by running the following command:
- .. prompt:: bash #
+ .. prompt:: bash $
ceph mgr module enable prometheus
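+
+To confirm that the module is now enabled, list the Manager modules:
+
+ .. prompt:: bash $
+
+    ceph mgr module ls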
code (service unavailable). You can set other options using the ``ceph config
set`` commands.
-To tell the module to respond with possibly stale data, set it to ``return``:
+To configure the module to respond with possibly stale data, set
+the cache strategy to ``return``:
- .. prompt:: bash #
+ .. prompt:: bash $
ceph config set mgr mgr/prometheus/stale_cache_strategy return
-To tell the module to respond with "service unavailable", set it to ``fail``:
+To configure the module to respond with "service unavailable", set it to ``fail``:
- .. prompt:: bash #
+ .. prompt:: bash $
ceph config set mgr mgr/prometheus/stale_cache_strategy fail
ceph config set mgr mgr/prometheus/cache false
-If you are using the prometheus module behind some kind of reverse proxy or
-loadbalancer, you can simplify discovering the active instance by switching
+If you are using the ``prometheus`` module behind a reverse proxy or
+load balancer, you can simplify discovery of the active instance by switching
to ``error``-mode:
- .. prompt:: bash #
+ .. prompt:: bash $
ceph config set mgr mgr/prometheus/standby_behaviour error
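+
+In ``error``-mode a standby Manager instance answers scrapes with an HTTP
+error status instead of a normal response, so a load balancer health check
+pointed at the endpoint will route traffic only to the active instance. As a
+manual spot check (a sketch; ``standby-mgr-host`` is a placeholder hostname
+and 9283 is the module's default port):
+
+ .. prompt:: bash $
+
+    curl -i http://standby-mgr-host:9283/metrics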
ceph_health_detail{name="OSD_DOWN",severity="HEALTH_WARN"} 1.0
ceph_health_detail{name="PG_DEGRADED",severity="HEALTH_WARN"} 1.0
-The health check history is made available through the following commands;
+The health check history may be retrieved and cleared by running the following commands:
- .. prompt:: bash #
+ ::
- healthcheck history ls [--format {plain|json|json-pretty}]
- healthcheck history clear
+ ceph healthcheck history ls [--format {plain|json|json-pretty}]
+ ceph healthcheck history clear
-The ``ls`` command provides an overview of the health checks that the cluster has
-encountered, or since the last ``clear`` command was issued. The example below;
+The ``ceph healthcheck history ls`` command provides an overview of the health checks that the
+cluster has encountered, or of those encountered since the last ``clear`` command was issued:
- .. prompt:: bash #
-
- ceph healthcheck history ls
-
::
+ [ceph: root@c8-node1 /]# ceph healthcheck history ls
Healthcheck Name First Seen (UTC) Last seen (UTC) Count Active
OSDMAP_FLAGS 2021/09/16 03:17:47 2021/09/16 22:07:40 2 No
OSD_DOWN 2021/09/17 00:11:59 2021/09/17 00:11:59 1 Yes
RBD IO statistics
-----------------
-The module can optionally collect RBD per-image IO statistics by enabling
-dynamic OSD performance counters. The statistics are gathered for all images
-in the pools that are specified in the ``mgr/prometheus/rbd_stats_pools``
+The ``prometheus`` module can optionally collect RBD per-image IO statistics by enabling
+dynamic OSD performance counters. Statistics are gathered for all images
+in the pools that are specified by the ``mgr/prometheus/rbd_stats_pools``
configuration parameter. The parameter is a comma or space separated list
-of ``pool[/namespace]`` entries. If the namespace is not specified the
+of ``pool[/namespace]`` entries. If the RBD namespace is not specified,
statistics are collected for all namespaces in the pool.
-Example to activate the RBD-enabled pools ``pool1``, ``pool2`` and ``poolN``:
+To enable collection of stats for the RBD pools ``pool1``, ``pool2``, and
+``poolN``, run the following command:
- .. prompt:: bash #
+ .. prompt:: bash $
ceph config set mgr mgr/prometheus/rbd_stats_pools "pool1,pool2,poolN"
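+
+A single RBD namespace can be selected using the ``pool[/namespace]`` form
+described above (``pool1`` and ``ns1`` are placeholder names):
+
+ .. prompt:: bash $
+
+    ceph config set mgr mgr/prometheus/rbd_stats_pools "pool1/ns1"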
-The wildcard can be used to indicate all pools or namespaces:
+A wildcard can be used to indicate all pools or namespaces:
- .. prompt:: bash #
+ .. prompt:: bash $
ceph config set mgr mgr/prometheus/rbd_stats_pools "*"
force refresh earlier if it detects statistics from a previously unknown
RBD image.
-Example to turn up the sync interval to 10 minutes:
+To set the sync interval to 10 minutes, run the following command:
- .. prompt:: bash #
+ .. prompt:: bash $
ceph config set mgr mgr/prometheus/rbd_stats_pools_refresh_interval 600
Ceph daemon performance counters metrics
-----------------------------------------
-With the introduction of ``ceph-exporter`` daemon, the prometheus module will no longer export Ceph daemon
-perf counters as prometheus metrics by default. However, one may re-enable exporting these metrics by setting
+With the introduction of the ``ceph-exporter`` daemon, the ``prometheus`` module no longer exports Ceph daemon
+perf counters as Prometheus metrics by default. However, you can re-enable exporting these metrics by setting
the module option ``exclude_perf_counters`` to ``false``:
- .. prompt:: bash #
+ .. prompt:: bash $
ceph config set mgr mgr/prometheus/exclude_perf_counters false
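+
+After changing this option, daemon perf counter metrics are included in the
+scrape output again. One rough way to see the effect (a sketch; 9283 is the
+module's default port and the module's metrics are prefixed with ``ceph_``)
+is to count the exported samples before and after the change:
+
+ .. prompt:: bash $
+
+    curl -s http://localhost:9283/metrics | grep -c '^ceph_'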