space available from BlueStore (in other words, ``bluestore max free`` has
a low value), consider reducing the BlueFS allocation unit size. To simulate
available space when the allocation unit is different, run the following
command:
.. prompt:: bash #
BLOCK_DEVICE_STALLED_READ_ALERT
_______________________________
There are BlueStore log messages that reveal storage drive issues
that can cause performance degradation and potentially data unavailability or
loss. These may indicate a storage drive that is failing and should be
evaluated and possibly removed and replaced.
ceph config set class:ssd bluestore_slow_ops_warn_lifetime 300
ceph config set class:ssd bluestore_slow_ops_warn_threshold 5
Device Health
-------------
DEVICE_HEALTH
_____________
One or more OSD devices are expected to fail soon, where the warning threshold
is determined by the :confval:`mgr/devicehealth/warn_threshold` config option.
Because this alert applies only to OSDs that are currently marked ``in``, the
appropriate response to this expected failure is (1) to mark the OSD ``out`` so
that data is migrated off of the OSD, and then (2) to remove the hardware from
the system. Note that this marking ``out`` is normally done automatically if
:confval:`mgr/devicehealth/self_heal` is enabled (as determined by
:confval:`mgr/devicehealth/mark_out_threshold`). If an OSD device is compromised but
the OSD(s) on that device are still ``up``, recovery can be degraded. In such
cases it may be advantageous to forcibly stop the OSD daemon(s) in question so
that recovery can proceed from surviving healthy OSDs. This must be
done with extreme care and attention to failure domains so that data
availability is not compromised.
To check device health, run a command of the following form:
.. prompt:: bash #
ceph device info <device-id>
Device life expectancy is set either by a prediction model that the Ceph Manager
runs or by an external tool that runs a command of the following form:
.. prompt:: bash #
ceph device set-life-expectancy <device-id> <from> <to>
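As an illustration (the device name and dates here are hypothetical), an
external SMART-analysis tool might record a predicted failure window as
follows:

.. prompt:: bash #

   ceph device set-life-expectancy SEAGATE_ST12000NM0027_ZJV0XXXX 2024-10-01 2024-11-01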
One or more devices (that is, OSDs) are expected to fail soon and have been
marked ``out`` of the cluster (as controlled by
:confval:`mgr/devicehealth/mark_out_threshold`), but they are still participating in
one or more Placement Groups. This might be because the OSD(s) were marked
``out`` only recently and data is still migrating, or because data cannot be
migrated off of the OSD(s) for some reason (for example, the cluster is nearly
full, or there is no other suitable OSD to migrate the data to).
This message can be silenced by disabling self-heal behavior (that is, setting
:confval:`mgr/devicehealth/self_heal` to ``false``), by adjusting
:confval:`mgr/devicehealth/mark_out_threshold`, or by addressing whichever condition
is preventing data from being migrated off of the ailing OSD(s).
.. _rados_health_checks_device_health_toomany:
DEVICE_HEALTH_TOOMANY
_____________________
Too many devices (that is, OSDs) are expected to fail soon, and because
:confval:`mgr/devicehealth/self_heal` behavior is enabled, marking ``out`` all of the
ailing OSDs would exceed the cluster's :confval:`mon_osd_min_in_ratio` ratio. This
ratio prevents a cascade of too many OSDs from being automatically marked
``out``.
To resolve this condition, add new OSDs to the cluster and then
incrementally replace the failing OSDs.
Alternatively, you can silence this health check by adjusting options including
:confval:`mon_osd_min_in_ratio` or :confval:`mgr/devicehealth/mark_out_threshold`. Be
warned, however, that this will increase the likelihood of unrecoverable data
loss.
Data Health (Pools & Placement Groups)
--------------------------------------
PG_AVAILABILITY
_______________
Data availability is reduced. In other words, the cluster is unable to service
potential read or write requests for at least some data in the cluster. More
precisely, one or more Placement Groups (PGs) are in a state that does not
allow I/O requests to be serviced. Any of the following PG states are
problematic if they do not clear quickly: ``peering``, ``stale``, ``incomplete``, and
the lack of ``active``.
For detailed information about which PGs are affected, run the following
command:
.. prompt:: bash #
ceph health detail
In most cases, the root cause of this issue is that one or more OSDs are
currently ``down``: see `OSD_DOWN`_ above.
To see the state of a specific problematic PG, run a command of the following form:
.. prompt:: bash #
ceph tell <pgid> query
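In addition to querying a single PG, you can list all PGs stuck in a
problematic state (``inactive`` below is one of several supported states,
alongside ``unclean``, ``stale``, and ``undersized``):

.. prompt:: bash #

   ceph pg dump_stuck inactive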
Data redundancy is reduced for some data: in other words, the cluster does not
have the desired number of replicas for all data (in the case of replicated
pools) or erasure code fragments (in the case of erasure-coded pools). More
precisely, one or more Placement Groups (PGs):
* have the ``degraded`` or ``undersized`` flag set, which means that there are not
enough instances of that PG in the cluster; or
* have not had the ``clean`` state set for a long time.
For detailed information about which PGs are affected, run the following
command:
.. prompt:: bash #
ceph health detail
In most cases, the root cause of this issue is that one or more OSDs are
currently ``down``: see `OSD_DOWN`_ above.
To see the state of a specific problematic PG, run a command of the following form:
.. prompt:: bash #
ceph tell <pgid> query
Data redundancy might be reduced or even put at risk for some data due to a
lack of free space in the cluster. More precisely, one or more Placement Groups
have the ``recovery_toofull`` flag set, which means that the cluster is unable to
migrate or recover data because one or more OSDs are above the ``full``
threshold.
For steps to resolve this condition, see `OSD_FULL`_ above.
PG_BACKFILL_FULL
________________
Data redundancy might be reduced or even put at risk for some data due to a
lack of free space in the cluster. More precisely, one or more Placement Groups
have the ``backfill_toofull`` flag set, which means that the cluster is unable to
migrate or recover data because one or more OSDs are above the ``backfillfull``
threshold.
For steps to resolve this condition, see `OSD_BACKFILLFULL`_ above.
PG_DAMAGED
__________
Data scrubbing has discovered problems with data consistency in the cluster.
More precisely, one or more Placement Groups either (1) have the ``inconsistent``
or ``snaptrim_error`` flag set, which indicates that an earlier data scrub
operation found a problem, or (2) have the ``repair`` flag set, which means that
a repair for such an inconsistency is currently in progress.
For more information, see :ref:`rados_operations_monitoring_osd_pg`.
OSD_SCRUB_ERRORS
________________
Recent OSD scrubs have discovered inconsistencies. This alert is generally
paired with `PG_DAMAGED`_ (see above).
For more information, see :ref:`rados_operations_monitoring_osd_pg`.
OSD_TOO_MANY_REPAIRS
____________________
The count of read repairs has exceeded the config value threshold
:confval:`mon_osd_warn_num_repaired` (default: ``10``). Because scrub handles errors
only for data at rest, and because any read error that occurs when another
replica is available is repaired immediately so that the client can get
the object data, there might exist failing disks that are not registering any
scrub errors.
In order to allow clearing of the warning, a new command
``ceph tell osd.# clear_shards_repaired [count]`` has been added.
By default it will set the repair count to 0. A ``count`` value can be passed
to the command. Thus, the administrator has the option to re-enable the warning
by passing the value of :confval:`mon_osd_warn_num_repaired` (or above) to the command.
An alternative to using ``clear_shards_repaired`` is to :ref:`mute <rados-monitoring-muting-health-checks>` the
``OSD_TOO_MANY_REPAIRS`` alert with ``ceph health mute``.
LARGE_OMAP_OBJECTS
__________________
One or more pools contain large omap objects, as determined by
:confval:`osd_deep_scrub_large_omap_object_key_threshold` (the threshold for the
number of keys to determine what is considered a large omap object) or
:confval:`osd_deep_scrub_large_omap_object_value_sum_threshold` (the threshold for the
summed size in bytes of all key values to determine what is considered a large
omap object) or both. To find more information on object name, key count, and
size in bytes, search the cluster log for ``Large omap object found``. This issue
can be caused by RGW bucket index objects that do not have automatic resharding
enabled. For more information on resharding, see :ref:`RGW Dynamic Bucket Index
Resharding <rgw_dynamic_bucket_index_resharding>`.
To adjust the thresholds mentioned above, run commands of the following form:
.. prompt:: bash #
ceph config set osd osd_deep_scrub_large_omap_object_key_threshold <keys>
ceph config set osd osd_deep_scrub_large_omap_object_value_sum_threshold <bytes>
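For example (the value here is illustrative, not a recommendation), to raise
the key-count threshold, whose default is 200000 keys at the time of writing,
to 400000 keys:

.. prompt:: bash #

   ceph config set osd osd_deep_scrub_large_omap_object_key_threshold 400000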
To adjust the cache pool's target size, run a command of the following form:
.. prompt:: bash #
ceph osd pool set <cache-pool-name> target_max_bytes <bytes>
ceph osd pool set <cache-pool-name> target_max_objects <objects>
TOO_FEW_PGS
___________
The number of Placement Groups (PGs) that are in use in the cluster is below
the configurable threshold of :confval:`mon_pg_warn_min_per_osd` PGs per OSD. This can
lead to suboptimal distribution and suboptimal balance of data across the OSDs
in the cluster, and a reduction of overall performance.
If data pools have not yet been created, this condition is expected.
To address this issue, you can increase the PG count for existing pools or
create new pools. For more information, see
:ref:`choosing-number-of-placement-groups`.
POOL_PG_NUM_NOT_POWER_OF_TWO
____________________________
One or more pools have a ``pg_num`` value that is not a power of two. Although
this is not fatal, it does lead to a less balanced distribution of
data because some placement groups will comprise much more data than others.
This is easily corrected by setting the ``pg_num`` value for the affected
pool(s) to a nearby power of two. Enable the PG Autoscaler or run a command of the following form:
.. prompt:: bash #
ceph osd pool set <pool-name> pg_num <value>
To disable this health check, run the following command:
.. prompt:: bash #
ceph config set global mon_warn_on_pool_pg_num_not_power_of_two false
To disable the alert, entirely disable auto-scaling of PGs for the pool by
running the following command:
.. prompt:: bash #
ceph osd pool set <pool-name> pg_autoscale_mode off
To allow the cluster to automatically adjust the number of PGs for the pool,
run a command of the following form:
.. prompt:: bash #
ceph osd pool set <pool-name> pg_autoscale_mode on
Alternatively, to manually set the number of PGs for the pool to the
recommended amount, run a command of the following form:
.. prompt:: bash #
ceph osd pool set <pool-name> pg_num <new-pg-num>
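To see the autoscaler's current recommendation for each pool before or after
making such a change, you can run:

.. prompt:: bash #

   ceph osd pool autoscale-status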
TOO_MANY_PGS
____________
The number of Placement Groups (PGs) in use in the cluster is above the
configurable threshold of :confval:`mon_max_pg_per_osd` PGs per OSD. If this threshold
is exceeded, the cluster will not allow new pools to be created, pool ``pg_num``
to be increased, or pool replication to be increased (any of which, if allowed,
would lead to more PGs in the cluster). A large number of PGs can lead to
higher memory utilization for OSD daemons, slower peering after cluster state
changes, and higher load on the Manager and Monitor daemons. Adding OSDs to
the cluster is the primary remedy; marking ``out`` OSDs ``in`` (if there are
any ``out`` OSDs available) can also help. To do so, run a command of the
following form:
.. prompt:: bash #
ceph osd in <osd id(s)>
To disable the alert, entirely disable auto-scaling of PGs for the pool by
running the following command:
.. prompt:: bash #
ceph osd pool set <pool-name> pg_autoscale_mode off
To allow the cluster to automatically adjust the number of PGs for the pool,
run a command of the following form:
.. prompt:: bash #
ceph osd pool set <pool-name> pg_autoscale_mode on
Alternatively, to manually set the number of PGs for the pool to the
recommended amount, run a command of the following form:
.. prompt:: bash #
ceph osd pool set <pool-name> pg_num <new-pg-num>
This health check indicates that the ``target_size_bytes`` value of the pool
is too large and should be reduced or set to zero. To reduce the
``target_size_bytes`` value or set it to zero, run a command of the following form:
.. prompt:: bash #
ceph osd pool set <pool-name> target_size_bytes 0
POOL_HAS_TARGET_SIZE_BYTES_AND_RATIO
____________________________________
One or more pools have both ``target_size_bytes`` and ``target_size_ratio`` set
in order to estimate the expected size of the pool. Only one of these
properties should be non-zero. If both are set to a non-zero value, then
``target_size_ratio`` takes precedence and ``target_size_bytes`` is ignored.
To reset ``target_size_bytes`` to zero, run a command of the following form:
.. prompt:: bash #
ceph osd pool set <pool-name> target_size_bytes 0
TOO_FEW_OSDS
____________
The number of OSDs in the cluster is below the configurable threshold of
:confval:`osd_pool_default_size`. This means that some or all data may not be able to
satisfy the data protection policy specified in CRUSH rules and pool settings.
SMALLER_PGP_NUM
_______________
This issue is normally resolved by setting ``pgp_num`` to match ``pg_num``, so
as to trigger the data migration, by running a command of the following form:
.. prompt:: bash #
ceph osd pool set <pool> pgp_num <pg-num-value>
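To compare the two values before making the change (``mypool`` here is a
placeholder pool name), query them with:

.. prompt:: bash #

   ceph osd pool get mypool pg_num
   ceph osd pool get mypool pgp_num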
One or more pools have an average number of objects per Placement Group (PG)
that is significantly higher than the overall cluster average. The specific
threshold is determined by the :confval:`mon_pg_warn_max_object_skew` configuration
value.
This alert is usually an indication that the pool(s) that contain most of the
data in the cluster have too few PGs, or that other pools that contain less
data have too many PGs. See `TOO_MANY_PGS`_ above.
To silence the health check, raise the threshold by adjusting the
:confval:`mon_pg_warn_max_object_skew` config option on the managers.
The health check is silenced for a specific pool only if
``pg_autoscale_mode`` is set to ``on``.
To resolve this issue, tag the pool for use by an application. For
example, if the pool is used by RBD, run a command of the following form:
.. prompt:: bash #
rbd pool init <poolname>
Alternatively, if the pool is being used by a custom application (here ``foo``),
you can label the pool by running a low-level command of the following form:
.. prompt:: bash #
ceph osd pool application enable <poolname> foo
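To verify which applications are associated with a pool, you can run a command
of the following form:

.. prompt:: bash #

   ceph osd pool application get <poolname>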
One or more pools have reached (or are very close to reaching) their quota. The
threshold to raise this health check is determined by the
:confval:`mon_pool_quota_crit_threshold` configuration option.
Pool quotas can be adjusted up or down (or removed) by running commands of the following
forms:
.. prompt:: bash #
ceph osd pool set-quota <pool> max_bytes <bytes>
ceph osd pool set-quota <pool> max_objects <objects>
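To inspect a pool's current quota settings before changing them, you can run a
command of the following form:

.. prompt:: bash #

   ceph osd pool get-quota <pool>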
One or more pools are approaching a configured fullness threshold.
One of the several thresholds that can raise this health check is determined by
the :confval:`mon_pool_quota_warn_threshold` configuration option.
Pool quotas can be adjusted up or down (or removed) by running commands of the following
forms:
.. prompt:: bash #
ceph osd pool set-quota <pool> max_bytes <bytes>
ceph osd pool set-quota <pool> max_objects <objects>
To disable a quota, set the quota value to ``0``.
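For example (``mypool`` here is a placeholder pool name), to remove both
quotas from a pool entirely:

.. prompt:: bash #

   ceph osd pool set-quota mypool max_bytes 0
   ceph osd pool set-quota mypool max_objects 0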
Other thresholds that can raise the two health checks above are
:confval:`mon_osd_nearfull_ratio` and :confval:`mon_osd_full_ratio`. For details and
resolution, see :ref:`storage-capacity` and :ref:`no-free-drive-space`.
OBJECT_MISPLACED
Read or write requests to unfound objects will block.
Ideally, a ``down`` OSD that has a more recent copy of the unfound object can be
brought back online. To identify candidate OSDs, check the peering state of the
PG(s) responsible for the unfound object. To see the peering state, run a command
of the following form:
.. prompt:: bash #
ceph tell <pgid> query
To query the request queue for the daemon that is causing the slowdown, run the
following command from the daemon's host:
.. prompt:: bash #
ceph daemon osd.<id> ops
To see a summary of the slowest recent requests, run a command of the following form:
.. prompt:: bash #
ceph daemon osd.<id> dump_historic_ops
To see the location of a specific OSD, run a command of the following form:
.. prompt:: bash #
ceph osd find osd.<id>
:confval:`osd_scrub_max_interval` globally. This interval can be overridden on
per-pool basis by changing the value of the variable
:confval:`scrub_max_interval`. This health check is raised if a certain
percentage (determined by :confval:`mon_warn_pg_not_scrubbed_ratio`) of the interval
has elapsed after the time the scrub was scheduled and no scrub has been
performed.
PGs are scrubbed only if they are flagged as ``clean`` (which means that
they are to be cleaned, and not that they have been examined and found to be
clean). Misplaced or degraded PGs will not be flagged as ``clean`` (see
`PG_AVAILABILITY`_ and `PG_DEGRADED`_ above).
To manually initiate a scrub of a clean PG, run a command of the following form:
.. prompt:: bash #
ceph pg scrub <pgid>
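To review when each PG was last scrubbed (the exact output columns may vary by
release), you can dump per-PG statistics, which include the last-scrub
timestamps:

.. prompt:: bash #

   ceph pg dump pgs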
PGs will receive a deep scrub only if they are flagged as ``clean`` (which
means that they are to be cleaned, and not that they have been examined and
found to be clean). Misplaced or degraded PGs might not be flagged as ``clean``
(see `PG_AVAILABILITY`_ and `PG_DEGRADED`_ above).
This document offers two methods of setting the value of
:confval:`osd_deep_scrub_interval`. The first method listed here changes the
value of :confval:`osd_deep_scrub_interval` globally. The second method listed
here changes the value of :confval:`osd_deep_scrub_interval` for OSDs and for
the Manager daemon.
First Method
To manually initiate a deep scrub of a clean PG, run a command of the following form:
.. prompt:: bash #
ceph pg deep-scrub <pgid>
Under certain conditions, the warning ``pgs not deep-scrubbed in time``
appears. This might be because the cluster contains many large PGs, which take
longer to deep-scrub. To remedy this situation, you must change the value of
:confval:`osd_deep_scrub_interval` globally.
#. Confirm that ``ceph health detail`` returns a ``pgs not deep-scrubbed in
   time`` warning:

   .. prompt:: bash #

      ceph health detail

   .. code-block:: console

      HEALTH_WARN 1161 pgs not deep-scrubbed in time
      [WRN] PG_NOT_DEEP_SCRUBBED: 1161 pgs not deep-scrubbed in time
      pg 86.fff not deep-scrubbed since 2024-08-21T02:35:25.733187+0000
#. Change :confval:`osd_deep_scrub_interval` globally:
.. prompt:: bash #
ceph config set global osd_deep_scrub_interval 1209600
To manually initiate a deep scrub of a clean PG, run a command of the following form:
.. prompt:: bash #
ceph pg deep-scrub <pgid>
Under certain conditions, the warning ``pgs not deep-scrubbed in time``
appears. This might be because the cluster contains many large PGs, which take
longer to deep-scrub. To remedy this situation, you must change the value of
:confval:`osd_deep_scrub_interval` for OSDs and for the Manager daemon.
#. Confirm that ``ceph health detail`` returns a ``pgs not deep-scrubbed in
   time`` warning:

   .. prompt:: bash #

      ceph health detail

   .. code-block:: console

      HEALTH_WARN 1161 pgs not deep-scrubbed in time
      [WRN] PG_NOT_DEEP_SCRUBBED: 1161 pgs not deep-scrubbed in time
      pg 86.fff not deep-scrubbed since 2024-08-21T02:35:25.733187+0000
#. Change the :confval:`osd_deep_scrub_interval` for OSDs:
.. prompt:: bash #
ceph config set osd osd_deep_scrub_interval 1209600
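The value ``1209600`` is expressed in seconds (14 days). To read the setting
back and confirm the change, you can run:

.. prompt:: bash #

   ceph config get osd osd_deep_scrub_interval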
#. Change the :confval:`osd_deep_scrub_interval` for Managers:
.. prompt:: bash #