ceph config set mgr mgr/prometheus/scrape_interval 20
On large clusters (>1000 OSDs), the time to fetch the metrics may become
-significant. Without the cache, the Prometheus manager module could,
-especially in conjunction with multiple Prometheus instances, overload the
-manager and lead to unresponsive or crashing Ceph manager instances. Hence,
-the cache is enabled by default and cannot be disabled. This means that there
-is a possibility that the cache becomes stale. The cache is considered stale
-when the time to fetch the metrics from Ceph exceeds the configured
-``scrape_interval``.
+significant. Without the cache, the Prometheus manager module could, especially
+in conjunction with multiple Prometheus instances, overload the manager and lead
+to unresponsive or crashing Ceph manager instances. Hence, the cache is enabled
+by default. As a result, the cache can become stale. The cache is considered
+stale when the time to fetch the metrics from Ceph exceeds the configured
+:confval:`mgr/prometheus/scrape_interval`.
If that is the case, **a warning will be logged** and the module will either
ceph config set mgr mgr/prometheus/stale_cache_strategy fail
+If you are confident that you don't require the cache, you can disable it::
+
+ ceph config set mgr mgr/prometheus/cache false
+
.. _prometheus-rbd-io-statistics:
RBD IO statistics
'stale_cache_strategy',
default='log'
),
+ Option(
+ 'cache',
+ type='bool',
+ default=True,
+ ),
Option(
'rbd_stats_pools',
default=''
self.collect_lock = threading.Lock()
self.collect_time = 0.0
self.scrape_interval: float = 15.0
+ self.cache = True
self.stale_cache_strategy: str = self.STALE_CACHE_FAIL
self.collect_cache: Optional[str] = None
self.rbd_stats = {
@staticmethod
def _metrics(instance: 'Module') -> Optional[str]:
+ if not instance.cache:
+ instance.log.debug('Cache disabled, collecting and returning without cache')
+ cherrypy.response.headers['Content-Type'] = 'text/plain'
+ return instance.collect()
+
# Return cached data if available
if not instance.collect_cache:
raise cherrypy.HTTPError(503, 'No cached data available yet')
(server_addr, server_port)
)
- self.metrics_thread.start()
+ self.cache = cast(bool, self.get_localized_module_option('cache', True))
+ if self.cache:
+ self.log.info('Cache enabled')
+ self.metrics_thread.start()
+ else:
+ self.log.info('Cache disabled')
cherrypy.config.update({
'server.socket_host': server_addr,