From 12df4d6c27c7b0ee853aec3c69da0e21af72f97e Mon Sep 17 00:00:00 2001
From: Zac Dover
Date: Sun, 4 Dec 2022 02:33:42 +1000
Subject: [PATCH] doc/rados: add prompts to health-checks (2 of 5)

Add unselectable prompts to doc/rados/operations/health-checks.rst,
second 300 lines.

https://tracker.ceph.com/issues/57108

Signed-off-by: Zac Dover
(cherry picked from commit c850569e52e97fe58d366e32784e011db80a027b)
---
 doc/rados/operations/health-checks.rst | 169 +++++++++++++++++--------
 1 file changed, 116 insertions(+), 53 deletions(-)

diff --git a/doc/rados/operations/health-checks.rst b/doc/rados/operations/health-checks.rst
index fa9e55f14ccc0..7efe1c7a1cdfb 100644
--- a/doc/rados/operations/health-checks.rst
+++ b/doc/rados/operations/health-checks.rst
@@ -281,9 +281,11 @@ __________
 
 An OSD is referenced in the CRUSH map hierarchy but does not exist.
 
-The OSD can be removed from the CRUSH hierarchy with::
+The OSD can be removed from the CRUSH hierarchy with:
 
-    ceph osd crush rm osd.<id>
+.. prompt:: bash $
+
+   ceph osd crush rm osd.<id>
 
 OSD_OUT_OF_ORDER_FULL
 _____________________
@@ -293,11 +295,13 @@ and/or `failsafe_full` are not ascending.  In particular, we expect
 `nearfull < backfillfull`, `backfillfull < full`, and `full <
 failsafe_full`.
 
-The thresholds can be adjusted with::
+The thresholds can be adjusted with:
+
+.. prompt:: bash $
 
-    ceph osd set-nearfull-ratio <ratio>
-    ceph osd set-backfillfull-ratio <ratio>
-    ceph osd set-full-ratio <ratio>
+   ceph osd set-nearfull-ratio <ratio>
+   ceph osd set-backfillfull-ratio <ratio>
+   ceph osd set-full-ratio <ratio>
 
 
 OSD_FULL
@@ -306,18 +310,24 @@ ________
 One or more OSDs has exceeded the `full` threshold and is preventing the
 cluster from servicing writes.
 
-Utilization by pool can be checked with::
+Utilization by pool can be checked with:
+
+.. prompt:: bash $
+
+   ceph df
 
-    ceph df
+The currently defined `full` ratio can be seen with:
 
-The currently defined `full` ratio can be seen with::
+.. prompt:: bash $
 
-    ceph osd dump | grep full_ratio
+   ceph osd dump | grep full_ratio
 
 A short-term workaround to restore write availability is to raise the full
-threshold by a small amount::
+threshold by a small amount:
 
-    ceph osd set-full-ratio <ratio>
+.. prompt:: bash $
+
+   ceph osd set-full-ratio <ratio>
 
 New storage should be added to the cluster by deploying more OSDs or
 existing data should be deleted in order to free up space.
@@ -330,9 +340,11 @@ prevent data from being allowed to rebalance to this device.  This is an
 early warning that rebalancing may not be able to complete and that the
 cluster is approaching full.
 
-Utilization by pool can be checked with::
+Utilization by pool can be checked with:
+
+.. prompt:: bash $
 
-    ceph df
+   ceph df
 
 OSD_NEARFULL
 ____________
@@ -340,9 +352,11 @@ ____________
 One or more OSDs has exceeded the `nearfull` threshold.  This is an early
 warning that the cluster is approaching full.
 
-Utilization by pool can be checked with::
+Utilization by pool can be checked with:
 
-    ceph df
+.. prompt:: bash $
+
+   ceph df
 
 OSDMAP_FLAGS
 ____________
@@ -363,10 +377,12 @@ One or more cluster flags of interest has been set.  These flags include:
 * *noscrub*, *nodeep_scrub* - scrubbing is disabled
 * *notieragent* - cache tiering activity is suspended
 
-With the exception of *full*, these flags can be set or cleared with::
+With the exception of *full*, these flags can be set or cleared with:
+
+.. prompt:: bash $
 
-    ceph osd set <flag>
-    ceph osd unset <flag>
+   ceph osd set <flag>
+   ceph osd unset <flag>
 
 OSD_FLAGS
 _________
@@ -381,19 +397,23 @@ These flags include:
 * *noout*: if these OSDs are down they will not automatically be marked `out`
   after the configured interval
 
-These flags can be set and cleared in batch with::
+These flags can be set and cleared in batch with:
 
-    ceph osd set-group <flags> <who>
-    ceph osd unset-group <flags> <who>
+.. prompt:: bash $
 
-For example, ::
+   ceph osd set-group <flags> <who>
+   ceph osd unset-group <flags> <who>
 
-    ceph osd set-group noup,noout osd.0 osd.1
-    ceph osd unset-group noup,noout osd.0 osd.1
-    ceph osd set-group noup,noout host-foo
-    ceph osd unset-group noup,noout host-foo
-    ceph osd set-group noup,noout class-hdd
-    ceph osd unset-group noup,noout class-hdd
+For example:
+
+.. prompt:: bash $
+
+   ceph osd set-group noup,noout osd.0 osd.1
+   ceph osd unset-group noup,noout osd.0 osd.1
+   ceph osd set-group noup,noout host-foo
+   ceph osd unset-group noup,noout host-foo
+   ceph osd set-group noup,noout class-hdd
+   ceph osd unset-group noup,noout class-hdd
 
 OLD_CRUSH_TUNABLES
 __________________
@@ -421,12 +441,14 @@ One or more cache pools is not configured with a *hit set* to track
 utilization, which will prevent the tiering agent from identifying cold
 objects to flush and evict from the cache.
 
-Hit sets can be configured on the cache pool with::
+Hit sets can be configured on the cache pool with:
+
+.. prompt:: bash $
 
-    ceph osd pool set <poolname> hit_set_type <type>
-    ceph osd pool set <poolname> hit_set_period <period-in-seconds>
-    ceph osd pool set <poolname> hit_set_count <number-of-hitsets>
-    ceph osd pool set <poolname> hit_set_fpp <target-false-positive-rate>
+   ceph osd pool set <poolname> hit_set_type <type>
+   ceph osd pool set <poolname> hit_set_period <period-in-seconds>
+   ceph osd pool set <poolname> hit_set_count <number-of-hitsets>
+   ceph osd pool set <poolname> hit_set_fpp <target-false-positive-rate>
 
 OSD_NO_SORTBITWISE
 __________________
@@ -435,23 +457,52 @@ No pre-luminous v12.y.z OSDs are running but the ``sortbitwise`` flag has not
 been set.
 
 The ``sortbitwise`` flag must be set before luminous v12.y.z or newer
-OSDs can start. You can safely set the flag with::
+OSDs can start. You can safely set the flag with:
+
+.. prompt:: bash $
+
+   ceph osd set sortbitwise
+
+OSD_FILESTORE
+__________________
+
+Filestore has been deprecated, considering that Bluestore has been the default
+objectstore for quite some time. Warn if OSDs are running Filestore.
 
-    ceph osd set sortbitwise
+The 'mclock_scheduler' is not supported for filestore OSDs. Therefore, the
+default 'osd_op_queue' is set to 'wpq' for filestore OSDs and is enforced
+even if the user attempts to change it.
+
+Filestore OSDs can be listed with:
+
+.. prompt:: bash $
+
+   ceph report | jq -c '."osd_metadata" | .[] | select(.osd_objectstore | contains("filestore")) | {id, osd_objectstore}'
+
+If it is not feasible to migrate Filestore OSDs to Bluestore immediately, you
+can silence this warning temporarily with:
+
+.. prompt:: bash $
+
+   ceph health mute OSD_FILESTORE
 
 POOL_FULL
 _________
 
 One or more pools has reached its quota and is no longer allowing writes.
 
-Pool quotas and utilization can be seen with::
+Pool quotas and utilization can be seen with:
+
+.. prompt:: bash $
+
+   ceph df detail
 
-    ceph df detail
+You can either raise the pool quota with:
 
-You can either raise the pool quota with::
+.. prompt:: bash $
 
-    ceph osd pool set-quota <poolname> max_objects <num>
-    ceph osd pool set-quota <poolname> max_bytes <num>
+   ceph osd pool set-quota <poolname> max_objects <num>
+   ceph osd pool set-quota <poolname> max_bytes <num>
 
 or delete some existing data to reduce utilization.
 
@@ -466,29 +517,37 @@ condition or even unexpected, but if the administrator's expectation was that
 all metadata would fit on the faster device, it indicates that not enough
 space was provided.
 
-This warning can be disabled on all OSDs with::
+This warning can be disabled on all OSDs with:
 
-    ceph config set osd bluestore_warn_on_bluefs_spillover false
+.. prompt:: bash $
 
-Alternatively, it can be disabled on a specific OSD with::
+   ceph config set osd bluestore_warn_on_bluefs_spillover false
+
+Alternatively, it can be disabled on a specific OSD with:
+
+.. prompt:: bash $
 
-    ceph config set osd.123 bluestore_warn_on_bluefs_spillover false
+   ceph config set osd.123 bluestore_warn_on_bluefs_spillover false
 
 To provide more metadata space, the OSD in question could be destroyed and
 reprovisioned.  This will involve data migration and recovery.
 
 It may also be possible to expand the LVM logical volume backing the
 `db` storage.  If the underlying LV has been expanded, the OSD daemon
-needs to be stopped and BlueFS informed of the device size change with::
+needs to be stopped and BlueFS informed of the device size change with:
 
-    ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-$ID
+.. prompt:: bash $
+
+   ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-$ID
 
 BLUEFS_AVAILABLE_SPACE
 ______________________
 
-To check how much space is free for BlueFS do::
+To check how much space is free for BlueFS do:
+
+.. prompt:: bash $
 
-    ceph daemon osd.123 bluestore bluefs available
+   ceph daemon osd.123 bluestore bluefs available
 
 This will output up to 3 values: `BDEV_DB free`, `BDEV_SLOW free` and
 `available_from_bluestore`. `BDEV_DB` and `BDEV_SLOW` report amount of space that
@@ -503,18 +562,22 @@ _________________
 If BlueFS is running low on available free space and there is little
 `available_from_bluestore` one can consider reducing BlueFS allocation unit size.
 
-To simulate available space when allocation unit is different do::
+To simulate available space when allocation unit is different do:
 
-    ceph daemon osd.123 bluestore bluefs available <alloc-unit-size>
+.. prompt:: bash $
+
+   ceph daemon osd.123 bluestore bluefs available <alloc-unit-size>
 
 BLUESTORE_FRAGMENTATION
 _______________________
 As BlueStore works free space on underlying storage will get fragmented.
 This is normal and unavoidable but excessive fragmentation will cause slowdown.
 
-To inspect BlueStore fragmentation one can do::
+To inspect BlueStore fragmentation one can do:
+
+.. prompt:: bash $
 
-    ceph daemon osd.123 bluestore allocator score block
+   ceph daemon osd.123 bluestore allocator score block
 
 Score is given in [0-1] range.
 [0.0 .. 0.4] tiny fragmentation
-- 
2.39.5
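
For reference, every hunk in this patch applies the same conversion: a Sphinx ``::`` literal block is replaced by an ``.. prompt:: bash $`` directive, which renders the ``$`` as an unselectable prompt so readers can copy the command without it. A minimal sketch of the pattern, using ``ceph df`` (one of the commands converted above) as the stand-in example:

Old markup::

    Utilization by pool can be checked with::

        ceph df

New markup::

    Utilization by pool can be checked with:

    .. prompt:: bash $

       ceph df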