From: Ville Ojamo <14869000+bluikko@users.noreply.github.com>
Date: Thu, 18 Dec 2025 08:27:36 +0000 (+0700)
Subject: doc/rados: Fix minor issues, improvements in health-checks.rst (3 of x)
X-Git-Url: http://git-server-git.apps.pok.os.sepia.ceph.com/?a=commitdiff_plain;h=3afa69911f6b4385df6f5c3915efd6ba6df6e391;p=ceph.git

doc/rados: Fix minor issues, improvements in health-checks.rst (3 of x)

Also changed in monitoring.rst:
- Add label for link.
- Linkify mention of health checks.

Fix "bluestore bluefs" admin-socket commands after #38437.
Use ":confval:" instead of simple inline literal text for config values.
Use a privileged prompt for CLI command examples requiring privileges.
Trim section title syntax underline.
Use admonition instead of emphasis on the whole paragraph.
Attempt to improve language in a few sentences.
Use double backticks consistently for literals.
Linkify, adding a label in the destination if necessary:
- Muting health checks
- BlueStore migration
Capitalize Ceph, BlueStore, RADOS.
Add paragraph linking to more info on the "ceph df" command.
Use DB or DB device instead of literal db.
Add unordered list formatting to what was being rendered as single paragraph.
Use ":program:" for linking to ceph-bluestore-tool.
Change few last double spaces after full stop to a single space.
Use code-block paragraphs instead of inline literals where appropriate.
Add missing word "is".

Signed-off-by: Ville Ojamo <14869000+bluikko@users.noreply.github.com>
---
diff --git a/doc/rados/operations/health-checks.rst b/doc/rados/operations/health-checks.rst
index cf7649d6120..a8f51352410 100644
--- a/doc/rados/operations/health-checks.rst
+++ b/doc/rados/operations/health-checks.rst
@@ -508,7 +508,7 @@ __________________
 The CRUSH map is using very old settings and should be updated.
The oldest set of tunables that can be used (that is, the oldest client version that can connect to the cluster) without raising this health check is determined by the -``mon_crush_min_required_version`` config option. For more information, see +:confval:`mon_crush_min_required_version` config option. For more information, see :ref:`crush-map-tunables`. OLD_CRUSH_STRAW_CALC_VERSION @@ -529,7 +529,7 @@ objects that are to be flushed and evicted from the cache. To configure hit sets on the cache pool, run the following commands: -.. prompt:: bash $ +.. prompt:: bash # ceph osd pool set hit_set_type ceph osd pool set hit_set_period @@ -545,35 +545,34 @@ been set. The ``sortbitwise`` flag must be set in order for OSDs running Luminous v12.y.z or newer to start. To safely set the flag, run the following command: -.. prompt:: bash $ +.. prompt:: bash # ceph osd set sortbitwise OSD_FILESTORE -__________________ +_____________ -Warn if OSDs are running the old Filestore back end. The Filestore OSD back end -is deprecated; the BlueStore back end has been the default object store since +OSDs are running the old Filestore back end. The Filestore OSD back end +is deprecated and the BlueStore back end has been the default object store since the Ceph Luminous release. -The 'mclock_scheduler' is not supported for Filestore OSDs. For this reason, -the default 'osd_op_queue' is set to 'wpq' for Filestore OSDs and is enforced +The mClock scheduler is not supported for Filestore OSDs. For this reason, +the default ``osd_op_queue`` is set to ``wpq`` for Filestore OSDs and is enforced even if the user attempts to change it. - - -.. prompt:: bash $ +.. prompt:: bash # ceph report | jq -c '."osd_metadata" | .[] | select(.osd_objectstore | contains("filestore")) | {id, osd_objectstore}' -**In order to upgrade to Reef or a later release, you must first migrate any -Filestore OSDs to BlueStore.** +.. 
important:: In order to upgrade to Reef or a later release, you must first migrate any + Filestore OSDs to BlueStore. If you are upgrading a pre-Reef release to Reef or later, but it is not -feasible to migrate Filestore OSDs to BlueStore immediately, you can -temporarily silence this alert by running the following command: +feasible to :ref:`migrate Filestore OSDs to BlueStore ` immediately, you can +temporarily :ref:`silence ` this alert +by running the following command: -.. prompt:: bash $ +.. prompt:: bash # ceph health mute OSD_FILESTORE @@ -585,10 +584,10 @@ OSD_UNREACHABLE _______________ The registered v1/v2 public address or addresses of one or more OSD(s) is or -are out of the defined `public_network` subnet, which prevents these -unreachable OSDs from communicating with ceph clients properly. +are out of the defined :confval:`public_network` subnet, which prevents these +unreachable OSDs from communicating with Ceph clients properly. -Even though these unreachable OSDs are in up state, rados clients +Even though these unreachable OSDs are in ``up`` state, RADOS clients will hang till TCP timeout before erroring out due to this inconsistency. POOL_FULL @@ -598,13 +597,16 @@ One or more pools have reached quota and no longer allow writes. To see pool quotas and utilization, run the following command: -.. prompt:: bash $ +.. prompt:: bash # ceph df detail +For more details on the ``ceph df`` command, +see :ref:`rados-monitoring-pool-usage`. + If you opt to raise the pool quota, run the following commands: -.. prompt:: bash $ +.. prompt:: bash # ceph osd pool set-quota max_objects ceph osd pool set-quota max_bytes @@ -614,7 +616,7 @@ If not, delete some existing data to reduce utilization. 
BLUEFS_SPILLOVER ________________ -One or more OSDs that use the BlueStore back end have been allocated `db` +One or more OSDs that use the BlueStore back end have been allocated DB device partitions (that is, storage space for metadata, normally on a faster device), but because that space has been filled, metadata has "spilled over" onto the slow device. This is not necessarily an error condition or even unexpected @@ -624,25 +626,25 @@ that not enough space was provided. To disable this alert on all OSDs, run the following command: -.. prompt:: bash $ +.. prompt:: bash # ceph config set osd bluestore_warn_on_bluefs_spillover false Alternatively, to disable the alert on a specific OSD, run the following command: -.. prompt:: bash $ +.. prompt:: bash # ceph config set osd.123 bluestore_warn_on_bluefs_spillover false To secure more metadata space, you can destroy and reprovision the OSD in question. This process involves data migration and recovery. -It might also be possible to expand the LVM logical volume that backs the `db` +It might also be possible to expand the LVM logical volume that backs the DB storage. If the underlying LV has been expanded, you must stop the OSD daemon and inform BlueFS of the device-size change by running the following command: -.. prompt:: bash $ +.. prompt:: bash # ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-$ID @@ -651,31 +653,31 @@ ______________________ To see how much space is free for BlueFS, run the following command: -.. prompt:: bash $ +.. prompt:: bash # - ceph daemon osd.123 bluestore bluefs available + ceph daemon osd.123 bluestore bluefs device info This will output up to three values: ``BDEV_DB free``, ``BDEV_SLOW free``, and -``available_from_bluestore``. ``BDEV_DB`` and ``BDEV_SLOW`` report the amount +``bluestore max free``. ``BDEV_DB`` and ``BDEV_SLOW`` report the amount of space that has been acquired by BlueFS and is now considered free. 
The value -``available_from_bluestore`` indicates the ability of BlueStore to relinquish -more space to BlueFS. It is normal for this value to differ from the amount of +``bluestore max free`` indicates the ability of BlueStore to relinquish +more space to BlueFS. It is normal for this value to differ from the amount of BlueStore free space, because the BlueFS allocation unit is typically larger -than the BlueStore allocation unit. This means that only part of the BlueStore +than the BlueStore allocation unit. This means that only part of the BlueStore free space will be available for BlueFS. BLUEFS_LOW_SPACE _________________ If BlueFS is running low on available free space and there is not much free -space available from BlueStore (in other words, `available_from_bluestore` has +space available from BlueStore (in other words, ``bluestore max free`` has a low value), consider reducing the BlueFS allocation unit size. To simulate available space when the allocation unit is different, run the following command: -.. prompt:: bash $ +.. prompt:: bash # - ceph daemon osd.123 bluestore bluefs available + ceph daemon osd.123 bluestore bluefs device info BLUESTORE_FRAGMENTATION _______________________ @@ -685,33 +687,34 @@ BlueStore has become fragmented. This is normal and unavoidable, but excessive fragmentation causes slowdown. To inspect BlueStore fragmentation, run the following command: -.. prompt:: bash $ +.. prompt:: bash # ceph daemon osd.123 bluestore allocator score block -The fragmentation score is given in a [0-1] range. -[0.0 .. 0.4] tiny fragmentation -[0.4 .. 0.7] small, acceptable fragmentation -[0.7 .. 0.9] considerable, but safe fragmentation -[0.9 .. 1.0] severe fragmentation, might impact BlueFS's ability to get space from BlueStore +The fragmentation score is given in a [0-1] range: + +- [0.0 .. 0.4] tiny fragmentation +- [0.4 .. 0.7] small, acceptable fragmentation +- [0.7 .. 0.9] considerable, but safe fragmentation +- [0.9 .. 
1.0] severe fragmentation, might impact BlueFS' ability to get space from BlueStore To see a detailed report of free fragments, run the following command: -.. prompt:: bash $ +.. prompt:: bash # ceph daemon osd.123 bluestore allocator dump block For OSD processes that are not currently running, fragmentation can be -inspected with `ceph-bluestore-tool`. To see the fragmentation score, run the +inspected with :program:`ceph-bluestore-tool`. To see the fragmentation score, run the following command: -.. prompt:: bash $ +.. prompt:: bash # ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-123 --allocator block free-score To dump detailed free chunks, run the following command: -.. prompt:: bash $ +.. prompt:: bash # ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-123 --allocator block free-dump @@ -730,7 +733,7 @@ The old OSDs can be updated to use the new usage-tracking scheme by stopping each OSD, running a repair operation, and then restarting the OSD. For example, to update ``osd.123``, run the following commands: -.. prompt:: bash $ +.. prompt:: bash # systemctl stop ceph-osd@123 ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-123 @@ -738,7 +741,7 @@ to update ``osd.123``, run the following commands: To disable this alert, run the following command: -.. prompt:: bash $ +.. prompt:: bash # ceph config set global bluestore_warn_on_legacy_statfs false @@ -757,7 +760,7 @@ The OSDs can be updated to track by pool by stopping each OSD, running a repair operation, and then restarting the OSD. For example, to update ``osd.123``, run the following commands: -.. prompt:: bash $ +.. prompt:: bash # systemctl stop ceph-osd@123 ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-123 @@ -765,15 +768,15 @@ the following commands: To disable this alert, run the following command: -.. prompt:: bash $ +.. 
prompt:: bash # ceph config set global bluestore_warn_on_no_per_pool_omap false BLUESTORE_NO_PER_PG_OMAP __________________________ -One or more OSDs have volumes that were created prior to Pacific. (In Pacific -and later releases Bluestore tracks omap space utilitzation by Placement Group +One or more OSDs have volumes that were created prior to Pacific. (In Pacific +and later releases BlueStore tracks omap space utilitzation by Placement Group (PG).) Per-PG omap allows faster PG removal when PGs migrate. @@ -782,7 +785,7 @@ The older OSDs can be updated to track by PG by stopping each OSD, running a repair operation, and then restarting the OSD. For example, to update ``osd.123``, run the following commands: -.. prompt:: bash $ +.. prompt:: bash # systemctl stop ceph-osd@123 ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-123 @@ -790,7 +793,7 @@ repair operation, and then restarting the OSD. For example, to update To disable this alert, run the following command: -.. prompt:: bash $ +.. prompt:: bash # ceph config set global bluestore_warn_on_no_per_pg_omap false @@ -807,7 +810,7 @@ very careful to execute this procedure on only one OSD at a time, so as to minimize the risk of losing any data. To execute this procedure, where ``$N`` is the OSD that has the inconsistency, run the following commands: -.. prompt:: bash $ +.. prompt:: bash # ceph osd out osd.$N while ! ceph osd safe-to-destroy osd.$N ; do sleep 1m ; done @@ -837,10 +840,10 @@ BLUESTORE_SPURIOUS_READ_ERRORS ______________________________ One (or more) BlueStore OSDs detects read errors on the main device. -BlueStore has recovered from these errors by retrying disk reads. This alert +BlueStore has recovered from these errors by retrying disk reads. This alert might indicate issues with underlying hardware, issues with the I/O subsystem, -or something similar. Such issues can cause permanent data -corruption. 
Some observations on the root cause of spurious read errors can be +or something similar. Such issues can cause permanent data +corruption. Some observations on the root cause of spurious read errors can be found here: https://tracker.ceph.com/issues/22464 This alert does not require an immediate response, but the affected host might @@ -849,13 +852,13 @@ OS/kernel versions and implementing hardware-resource-utilization monitoring. To disable this alert on all OSDs, run the following command: -.. prompt:: bash $ +.. prompt:: bash # ceph config set osd bluestore_warn_on_spurious_read_errors false Or, to disable this alert on a specific OSD, run the following command: -.. prompt:: bash $ +.. prompt:: bash # ceph config set osd.123 bluestore_warn_on_spurious_read_errors false @@ -867,17 +870,19 @@ that can cause performance degradation and potentially data unavailability or loss. These may indicate a storage drive that is failing and should be evaluated and possibly removed and replaced. -``read stalled read 0x29f40370000~100000 (buffered) since 63410177.290546s, timeout is 5.000000s`` +.. code-block:: console -However, this is difficult to spot because there no discernible warning (a + read stalled read 0x29f40370000~100000 (buffered) since 63410177.290546s, timeout is 5.000000s + +However, this is difficult to spot because there is no discernible warning (a health warning or info in ``ceph health detail`` for example). More observations can be found here: https://tracker.ceph.com/issues/62500 Also because there can be false positive ``stalled read`` instances, a mechanism -has been added to increase accuracy. If in the last ``bdev_stalled_read_warn_lifetime`` +has been added to increase accuracy. 
If in the last :confval:`bdev_stalled_read_warn_lifetime` seconds the number of ``stalled read`` events is found to be greater than or equal to -``bdev_stalled_read_warn_threshold`` for a given BlueStore block device, this -warning will be reported in ``ceph health detail``. The warning state will be +:confval:`bdev_stalled_read_warn_threshold` for a given BlueStore block device, this +warning will be reported in ``ceph health detail``. The warning state will be removed when the condition clears. The defaults for :confval:`bdev_stalled_read_warn_lifetime` @@ -886,7 +891,7 @@ specific OSDs. To change this, run the following command: -.. prompt:: bash $ +.. prompt:: bash # ceph config set global bdev_stalled_read_warn_lifetime 10 ceph config set global bdev_stalled_read_warn_threshold 5 @@ -894,7 +899,7 @@ To change this, run the following command: This may be done for specific OSDs or a given mask. For example, to apply only to SSD OSDs: -.. prompt:: bash $ +.. prompt:: bash # ceph config set osd.123 bdev_stalled_read_warn_lifetime 10 ceph config set osd.123 bdev_stalled_read_warn_threshold 5 @@ -927,14 +932,16 @@ to performance degradation and data unavailability or loss. These indicate that the storage drive may be failing and should be investigated and potentially replaced. -``log_latency_fn slow operation observed for _txc_committed_kv, latency = 12.028621219s, txc = 0x55a107c30f00`` -``log_latency_fn slow operation observed for upper_bound, latency = 6.25955s`` -``log_latency slow operation observed for submit_transaction..`` +.. code-block:: console + + log_latency_fn slow operation observed for _txc_committed_kv, latency = 12.028621219s, txc = 0x55a107c30f00 + log_latency_fn slow operation observed for upper_bound, latency = 6.25955s + log_latency slow operation observed for submit_transaction.. This may also be reflected by the ``BLUESTORE_SLOW_OP_ALERT`` cluster health flag. 
As there can be false positive ``slow ops`` instances, a mechanism has -been added for more reliability. If in the last ``bluestore_slow_ops_warn_lifetime`` +been added for more reliability. If in the last :confval:`bluestore_slow_ops_warn_lifetime` seconds the number of ``slow ops`` indications are found greater than or equal to :confval:`bluestore_slow_ops_warn_threshold` for a given BlueStore OSD, this warning will be reported in ``ceph health detail``. The warning state is @@ -946,14 +953,14 @@ specific OSDs. To change this, run a command of the following form: -.. prompt:: bash $ +.. prompt:: bash # ceph config set global bluestore_slow_ops_warn_lifetime 300 ceph config set global bluestore_slow_ops_warn_threshold 5 this may be done for specific OSDs or a given mask, for example: -.. prompt:: bash $ +.. prompt:: bash # ceph config set osd.123 bluestore_slow_ops_warn_lifetime 300 ceph config set osd.123 bluestore_slow_ops_warn_threshold 5 diff --git a/doc/rados/operations/monitoring.rst b/doc/rados/operations/monitoring.rst index df258f0de50..14a349dcc48 100644 --- a/doc/rados/operations/monitoring.rst +++ b/doc/rados/operations/monitoring.rst @@ -147,7 +147,7 @@ lines from the cluster log. Monitoring Health Checks ======================== -Ceph continuously runs various *health checks*. When +Ceph continuously runs various :ref:`health-checks`. When a health check fails, this failure is reflected in the output of ``ceph status`` and ``ceph health``. The cluster log receives messages that indicate when a check has failed and when the cluster has recovered. @@ -375,6 +375,8 @@ For example, suppose that there is one OSD down and the health check is muted. I one or more additional OSDs go down, then the health mute disappears. This behavior occurs in any health check with a threshold value. +.. _rados-monitoring-pool-usage: + Checking a Cluster's Usage Stats ================================
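
Note on the stalled-read and slow-ops hunks in health-checks.rst: the patched text says a warning is raised when the number of events seen in the last ``*_warn_lifetime`` seconds is greater than or equal to ``*_warn_threshold``, and that it is removed when the condition clears. The following is a minimal Python sketch of that counting rule, for illustration only. It is not Ceph's implementation. The class name ``WindowedEventCounter`` and its methods are hypothetical, and counting an event that is exactly ``lifetime`` seconds old as still inside the window is an assumption. The values 10 and 5 mirror the ``ceph config set`` examples in the diff and are not claimed to be the defaults.

```python
from collections import deque


class WindowedEventCounter:
    """Hypothetical helper: count events seen in the last `lifetime` seconds."""

    def __init__(self, lifetime, threshold):
        self.lifetime = lifetime    # window length in seconds
        self.threshold = threshold  # warn at this many events or more
        self._events = deque()      # event timestamps, oldest first

    def record(self, now):
        """Register one event (for example, one stalled read) at time `now`."""
        self._events.append(now)

    def should_warn(self, now):
        """Return True if enough events fall inside the window ending at `now`."""
        # Discard events that have aged out of the window.
        # Assumption: an event exactly `lifetime` seconds old still counts.
        while self._events and now - self._events[0] > self.lifetime:
            self._events.popleft()
        return len(self._events) >= self.threshold


# Usage: values taken from the example commands in the diff, not from defaults.
counter = WindowedEventCounter(lifetime=10, threshold=5)
for t in range(5):
    counter.record(t)
warn_now = counter.should_warn(4)      # five events inside the window
warn_later = counter.should_warn(100)  # all events have expired
```

Discarding expired timestamps on every check means the warning state drops as soon as old events age out, which matches the "removed when the condition clears" wording in the patched text.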