From d3efcf6e307de9d1449251db31d4feb2a9620688 Mon Sep 17 00:00:00 2001 From: Zac Dover Date: Sun, 2 Apr 2023 06:25:42 +1000 Subject: [PATCH] doc/rados/ops: edit health-checks.rst (4 of x) Edit docs/rados/operations/health-checks.rst (4 of x). Follows https://github.com/ceph/ceph/pull/50826. https://tracker.ceph.com/issues/58485 Co-authored-by: Anthony D'Atri Signed-off-by: Zac Dover (cherry picked from commit 31df75356f2e49e0da6e0f4a6ec498e3609bd39e) --- doc/rados/operations/health-checks.rst | 332 +++++++++++++------------ 1 file changed, 167 insertions(+), 165 deletions(-) diff --git a/doc/rados/operations/health-checks.rst b/doc/rados/operations/health-checks.rst index 31a93a9a0313c..cefd9fe424218 100644 --- a/doc/rados/operations/health-checks.rst +++ b/doc/rados/operations/health-checks.rst @@ -781,68 +781,68 @@ Device health DEVICE_HEALTH _____________ -One or more devices is expected to fail soon, where the warning -threshold is controlled by the ``mgr/devicehealth/warn_threshold`` -config option. +One or more OSD devices are expected to fail soon, where the warning threshold +is determined by the ``mgr/devicehealth/warn_threshold`` config option. -This warning only applies to OSDs that are currently marked "in", so -the expected response to this failure is to mark the device "out" so -that data is migrated off of the device, and then to remove the -hardware from the system. Note that the marking out is normally done -automatically if ``mgr/devicehealth/self_heal`` is enabled based on -the ``mgr/devicehealth/mark_out_threshold``. +Because this alert applies only to OSDs that are currently marked ``in``, the +appropriate response to this expected failure is (1) to mark the OSD ``out`` so +that data is migrated off of the OSD, and then (2) to remove the hardware from +the system. Note that this marking ``out`` is normally done automatically if +``mgr/devicehealth/self_heal`` is enabled (as determined by +``mgr/devicehealth/mark_out_threshold``). -Device health can be checked with: +To check device health, run the following command: .. prompt:: bash $ ceph device info -Device life expectancy is set by a prediction model run by -the mgr or an by external tool via the command: +Device life expectancy is set either by a prediction model that the mgr runs or +by an external tool that is activated by running the following command: .. prompt:: bash $ ceph device set-life-expectancy -You can change the stored life expectancy manually, but that usually -doesn't accomplish anything as whatever tool originally set it will -probably set it again, and changing the stored value does not affect -the actual health of the hardware device. +You can change the stored life expectancy manually, but such a change usually +doesn't accomplish anything. The reason for this is that whichever tool +originally set the stored life expectancy will probably undo your change by +setting it again, and a change to the stored value does not affect the actual +health of the hardware device. DEVICE_HEALTH_IN_USE ____________________ -One or more devices is expected to fail soon and has been marked "out" -of the cluster based on ``mgr/devicehealth/mark_out_threshold``, but it -is still participating in one more PGs. This may be because it was -only recently marked "out" and data is still migrating, or because data -cannot be migrated off for some reason (e.g., the cluster is nearly -full, or the CRUSH hierarchy is such that there isn't another suitable -OSD to migrate the data too). 
+One or more devices (that is, OSDs) are expected to fail soon and have been +marked ``out`` of the cluster (as controlled by +``mgr/devicehealth/mark_out_threshold``), but they are still participating in +one or more Placement Groups. This might be because the OSD(s) were marked +``out`` only recently and data is still migrating, or because data cannot be +migrated off of the OSD(s) for some reason (for example, the cluster is nearly +full, or the CRUSH hierarchy is structured so that there isn't another suitable +OSD to migrate the data to). -This message can be silenced by disabling the self heal behavior -(setting ``mgr/devicehealth/self_heal`` to false), by adjusting the -``mgr/devicehealth/mark_out_threshold``, or by addressing what is -preventing data from being migrated off of the ailing device. +This message can be silenced by disabling self-heal behavior (that is, setting +``mgr/devicehealth/self_heal`` to ``false``), by adjusting +``mgr/devicehealth/mark_out_threshold``, or by addressing whichever condition +is preventing data from being migrated off of the ailing OSD(s). DEVICE_HEALTH_TOOMANY _____________________ -Too many devices is expected to fail soon and the -``mgr/devicehealth/self_heal`` behavior is enabled, such that marking -out all of the ailing devices would exceed the clusters -``mon_osd_min_in_ratio`` ratio that prevents too many OSDs from being -automatically marked "out". +Too many devices (that is, OSDs) are expected to fail soon, and because +``mgr/devicehealth/self_heal`` behavior is enabled, marking ``out`` all of the +ailing OSDs would exceed the cluster's ``mon_osd_min_in_ratio`` ratio. This +ratio prevents a cascade of too many OSDs from being automatically marked +``out``. -This generally indicates that too many devices in your cluster are -expected to fail soon and you should take action to add newer -(healthier) devices before too many devices fail and data is lost. +You should promptly add new OSDs to the cluster to prevent data loss, or +incrementally replace the failing OSDs. -The health message can also be silenced by adjusting parameters like -``mon_osd_min_in_ratio`` or ``mgr/devicehealth/mark_out_threshold``, -but be warned that this will increase the likelihood of unrecoverable -data loss in the cluster. +Alternatively, you can silence this health check by adjusting options including +``mon_osd_min_in_ratio`` or ``mgr/devicehealth/mark_out_threshold``. Be +warned, however, that this will increase the likelihood of unrecoverable data +loss. Data health (pools & placement groups) @@ -851,23 +851,24 @@ Data health (pools & placement groups) PG_AVAILABILITY _______________ -Data availability is reduced, meaning that the cluster is unable to -service potential read or write requests for some data in the cluster. -Specifically, one or more PGs is in a state that does not allow IO -requests to be serviced. Problematic PG states include *peering*, -*stale*, *incomplete*, and the lack of *active* (if those conditions do not clear -quickly). +Data availability is reduced. In other words, the cluster is unable to service +potential read or write requests for at least some data in the cluster. More +precisely, one or more Placement Groups (PGs) are in a state that does not +allow I/O requests to be serviced. Any of the following PG states are +problematic if they do not clear quickly: *peering*, *stale*, *incomplete*, and +the lack of *active*. 
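+As a quick way to triage this alert, the ``ceph pg dump_stuck`` command can
+list PGs that have been stuck in a given state for a while. The ``inactive``
+and ``stale`` filters below are only examples; substitute whichever
+problematic states ``ceph health detail`` reports:
+
+.. prompt:: bash $
+
+   ceph pg dump_stuck inactive   # example filters; also try: unclean, stale, undersized, degraded
+   ceph pg dump_stuck stale
+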
-Detailed information about which PGs are affected is available from: +For detailed information about which PGs are affected, run the following +command: .. prompt:: bash $ ceph health detail -In most cases the root cause is that one or more OSDs is currently -down; see the discussion for ``OSD_DOWN`` above. +In most cases, the root cause of this issue is that one or more OSDs are +currently ``down``: see ``OSD_DOWN`` above. -The state of specific problematic PGs can be queried with: +To see the state of a specific problematic PG, run the following command: .. prompt:: bash $ @@ -876,25 +877,26 @@ The state of specific problematic PGs can be queried with: PG_DEGRADED ___________ -Data redundancy is reduced for some data, meaning the cluster does not -have the desired number of replicas for all data (for replicated -pools) or erasure code fragments (for erasure coded pools). -Specifically, one or more PGs: +Data redundancy is reduced for some data: in other words, the cluster does not +have the desired number of replicas for all data (in the case of replicated +pools) or erasure code fragments (in the case of erasure-coded pools). More +precisely, one or more Placement Groups (PGs): -* has the *degraded* or *undersized* flag set, meaning there are not - enough instances of that placement group in the cluster; -* has not had the *clean* flag set for some time. +* have the *degraded* or *undersized* flag set, which means that there are not + enough instances of that PG in the cluster; or +* have not had the *clean* state set for a long time. -Detailed information about which PGs are affected is available from: +For detailed information about which PGs are affected, run the following +command: .. prompt:: bash $ ceph health detail -In most cases the root cause is that one or more OSDs is currently -down; see the discussion for ``OSD_DOWN`` above. +In most cases, the root cause of this issue is that one or more OSDs are +currently ``down``: see ``OSD_DOWN`` above. -The state of specific problematic PGs can be queried with: +To see the state of a specific problematic PG, run the following command: .. prompt:: bash $ @@ -904,72 +906,71 @@ The state of specific problematic PGs can be queried with: PG_RECOVERY_FULL ________________ -Data redundancy may be reduced or at risk for some data due to a lack -of free space in the cluster. Specifically, one or more PGs has the -*recovery_toofull* flag set, meaning that the -cluster is unable to migrate or recover data because one or more OSDs -is above the *full* threshold. +Data redundancy might be reduced or even put at risk for some data due to a +lack of free space in the cluster. More precisely, one or more Placement Groups +have the *recovery_toofull* flag set, which means that the cluster is unable to +migrate or recover data because one or more OSDs are above the ``full`` +threshold. -See the discussion for *OSD_FULL* above for steps to resolve this condition. +For steps to resolve this condition, see *OSD_FULL* above. PG_BACKFILL_FULL ________________ -Data redundancy may be reduced or at risk for some data due to a lack -of free space in the cluster. Specifically, one or more PGs has the -*backfill_toofull* flag set, meaning that the -cluster is unable to migrate or recover data because one or more OSDs -is above the *backfillfull* threshold. +Data redundancy might be reduced or even put at risk for some data due to a +lack of free space in the cluster.
More precisely, one or more Placement Groups +have the *backfill_toofull* flag set, which means that the cluster is unable to +migrate or recover data because one or more OSDs are above the ``backfillfull`` +threshold. -See the discussion for *OSD_BACKFILLFULL* above for -steps to resolve this condition. - -.. _rados-operations-health-checks-pg-damaged: +For steps to resolve this condition, see *OSD_BACKFILLFULL* above. PG_DAMAGED __________ -Data scrubbing has discovered some problems with data consistency in -the cluster. Specifically, one or more PGs has the *inconsistent* or -*snaptrim_error* flag is set, indicating an earlier scrub operation -found a problem, or that the *repair* flag is set, meaning a repair -for such an inconsistency is currently in progress. +Data scrubbing has discovered problems with data consistency in the cluster. +More precisely, one or more Placement Groups either (1) have the *inconsistent* +or ``snaptrim_error`` flag set, which indicates that an earlier data scrub +operation found a problem, or (2) have the *repair* flag set, which means that +a repair for such an inconsistency is currently in progress. -See :doc:`pg-repair` for more information. +For more information, see :doc:`pg-repair`. OSD_SCRUB_ERRORS ________________ -Recent OSD scrubs have uncovered inconsistencies. This error is generally -paired with *PG_DAMAGED* (see :ref:`PG_DAMAGE `). +Recent OSD scrubs have discovered inconsistencies. This alert is generally +paired with *PG_DAMAGED* (see above). -See :doc:`pg-repair` for more information. +For more information, see :doc:`pg-repair`. OSD_TOO_MANY_REPAIRS ____________________ -When a read error occurs and another replica is available it is used to repair -the error immediately, so that the client can get the object data. Scrub -handles errors for data at rest. In order to identify possible failing disks -that aren't seeing scrub errors, a count of read repairs is maintained. If -it exceeds a config value threshold *mon_osd_warn_num_repaired* default 10, -this health warning is generated. +The count of read repairs has exceeded the config value threshold +``mon_osd_warn_num_repaired`` (default: ``10``). Because scrub handles errors +only for data at rest, and because any read error that occurs when another +replica is available will be repaired immediately so that the client can get +the object data, there might exist failing disks that are not registering any +scrub errors. This repair count is maintained as a way of identifying any such +failing disks. + LARGE_OMAP_OBJECTS __________________ -One or more pools contain large omap objects as determined by -``osd_deep_scrub_large_omap_object_key_threshold`` (threshold for number of keys -to determine a large omap object) or -``osd_deep_scrub_large_omap_object_value_sum_threshold`` (the threshold for -summed size (bytes) of all key values to determine a large omap object) or both. -More information on the object name, key count, and size in bytes can be found -by searching the cluster log for 'Large omap object found'. Large omap objects -can be caused by RGW bucket index objects that do not have automatic resharding -enabled. Please see :ref:`RGW Dynamic Bucket Index Resharding -` for more information on resharding. 
+One or more pools contain large omap objects, as determined by +``osd_deep_scrub_large_omap_object_key_threshold`` (threshold for the number of +keys to determine what is considered a large omap object) or +``osd_deep_scrub_large_omap_object_value_sum_threshold`` (the threshold for the +summed size in bytes of all key values to determine what is considered a large +omap object) or both. To find more information on object name, key count, and +size in bytes, search the cluster log for 'Large omap object found'. This issue +can be caused by RGW-bucket index objects that do not have automatic resharding +enabled. For more information on resharding, see :ref:`RGW Dynamic Bucket Index +Resharding `. -The thresholds can be adjusted with: +To adjust the thresholds mentioned above, run the following commands: .. prompt:: bash $ @@ -979,54 +980,53 @@ The thresholds can be adjusted with: CACHE_POOL_NEAR_FULL ____________________ -A cache tier pool is nearly full. Full in this context is determined -by the ``target_max_bytes`` and ``target_max_objects`` properties on -the cache pool. Once the pool reaches the target threshold, write -requests to the pool may block while data is flushed and evicted -from the cache, a state that normally leads to very high latencies and -poor performance. +A cache-tier pool is nearly full, as determined by the ``target_max_bytes`` and +``target_max_objects`` properties of the cache pool. Once the pool reaches the +target threshold, write requests to the pool might block while data is flushed +and evicted from the cache. This state normally leads to very high latencies +and poor performance. -The cache pool target size can be adjusted with: +To adjust the cache pool's target size, run the following commands: .. prompt:: bash $ ceph osd pool set target_max_bytes ceph osd pool set target_max_objects -Normal cache flush and evict activity may also be throttled due to reduced -availability or performance of the base tier, or overall cluster load. +There might be other reasons that normal cache flush and evict activity are +throttled: for example, reduced availability of the base tier, reduced +performance of the base tier, or overall cluster load. TOO_FEW_PGS ___________ -The number of PGs in use in the cluster is below the configurable -threshold of ``mon_pg_warn_min_per_osd`` PGs per OSD. This can lead -to suboptimal distribution and balance of data across the OSDs in -the cluster, and similarly reduce overall performance. +The number of Placement Groups (PGs) that are in use in the cluster is below +the configurable threshold of ``mon_pg_warn_min_per_osd`` PGs per OSD. This can +lead to suboptimal distribution and suboptimal balance of data across the OSDs +in the cluster, and a reduction of overall performance. -This may be an expected condition if data pools have not yet been -created. +If data pools have not yet been created, this condition is expected. -The PG count for existing pools can be increased or new pools can be created. -Please refer to :ref:`choosing-number-of-placement-groups` for more -information. +To address this issue, you can increase the PG count for existing pools or +create new pools. For more information, see +:ref:`choosing-number-of-placement-groups`. POOL_PG_NUM_NOT_POWER_OF_TWO ____________________________ -One or more pools has a ``pg_num`` value that is not a power of two. -Although this is not strictly incorrect, it does lead to a less -balanced distribution of data because some PGs have roughly twice as -much data as others. 
+One or more pools have a ``pg_num`` value that is not a power of two. Although +this is not strictly incorrect, it does lead to a less balanced distribution of +data because some Placement Groups will have roughly twice as much data as +others have. -This is easily corrected by setting the ``pg_num`` value for the -affected pool(s) to a nearby power of two: +This is easily corrected by setting the ``pg_num`` value for the affected +pool(s) to a nearby power of two. To do so, run the following command: .. prompt:: bash $ ceph osd pool set pg_num -This health warning can be disabled with: +To disable this health check, run the following command: .. prompt:: bash $ @@ -1035,92 +1035,94 @@ This health warning can be disabled with: POOL_TOO_FEW_PGS ________________ -One or more pools should probably have more PGs, based on the amount -of data that is currently stored in the pool. This can lead to -suboptimal distribution and balance of data across the OSDs in the -cluster, and similarly reduce overall performance. This warning is -generated if the ``pg_autoscale_mode`` property on the pool is set to -``warn``. +One or more pools should probably have more Placement Groups (PGs), given the +amount of data that is currently stored in the pool. This issue can lead to +suboptimal distribution and suboptimal balance of data across the OSDs in the +cluster, and a reduction of overall performance. This alert is raised only if +the ``pg_autoscale_mode`` property on the pool is set to ``warn``. -To disable the warning, you can disable auto-scaling of PGs for the -pool entirely with: +To disable the alert, entirely disable auto-scaling of PGs for the pool by +running the following command: .. prompt:: bash $ ceph osd pool set pg_autoscale_mode off -To allow the cluster to automatically adjust the number of PGs,: +To allow the cluster to automatically adjust the number of PGs for the pool, +run the following command: .. prompt:: bash $ ceph osd pool set pg_autoscale_mode on -You can also manually set the number of PGs for the pool to the -recommended amount with: +Alternatively, to manually set the number of PGs for the pool to the +recommended amount, run the following command: .. prompt:: bash $ ceph osd pool set pg_num -Please refer to :ref:`choosing-number-of-placement-groups` and -:ref:`pg-autoscaler` for more information. +For more information, see :ref:`choosing-number-of-placement-groups` and +:ref:`pg-autoscaler`. TOO_MANY_PGS ____________ -The number of PGs in use in the cluster is above the configurable -threshold of ``mon_max_pg_per_osd`` PGs per OSD. If this threshold is -exceed the cluster will not allow new pools to be created, pool `pg_num` to -be increased, or pool replication to be increased (any of which would lead to -more PGs in the cluster). A large number of PGs can lead -to higher memory utilization for OSD daemons, slower peering after -cluster state changes (like OSD restarts, additions, or removals), and -higher load on the Manager and Monitor daemons. +The number of Placement Groups (PGs) in use in the cluster is above the +configurable threshold of ``mon_max_pg_per_osd`` PGs per OSD. If this threshold +is exceeded, the cluster will not allow new pools to be created, pool `pg_num` +to be increased, or pool replication to be increased (any of which, if allowed, +would lead to more PGs in the cluster). 
A large number of PGs can lead to +higher memory utilization for OSD daemons, slower peering after cluster state +changes (for example, OSD restarts, additions, or removals), and higher load on +the Manager and Monitor daemons. -The simplest way to mitigate the problem is to increase the number of -OSDs in the cluster by adding more hardware. Note that the OSD count -used for the purposes of this health check is the number of "in" OSDs, -so marking "out" OSDs "in" (if there are any) can also help: +The simplest way to mitigate the problem is to increase the number of OSDs in +the cluster by adding more hardware. Note that, because the OSD count that is +used for the purposes of this health check is the number of ``in`` OSDs, +marking ``out`` OSDs ``in`` (if there are any ``out`` OSDs available) can also +help. To do so, run the following command: .. prompt:: bash $ ceph osd in -Please refer to :ref:`choosing-number-of-placement-groups` for more -information. +For more information, see :ref:`choosing-number-of-placement-groups`. POOL_TOO_MANY_PGS _________________ -One or more pools should probably have more PGs, based on the amount -of data that is currently stored in the pool. This can lead to higher -memory utilization for OSD daemons, slower peering after cluster state -changes (like OSD restarts, additions, or removals), and higher load -on the Manager and Monitor daemons. This warning is generated if the +One or more pools should probably have fewer Placement Groups (PGs), given the +amount of data that is currently stored in the pool. This issue can lead to +higher memory utilization for OSD daemons, slower peering after cluster state +changes (for example, OSD restarts, additions, or removals), and higher load on +the Manager and Monitor daemons. This alert is raised only if the ``pg_autoscale_mode`` property on the pool is set to ``warn``. -To disable the warning, you can disable auto-scaling of PGs for the -pool entirely with: +To disable the alert, entirely disable auto-scaling of PGs for the pool by +running the following command: .. prompt:: bash $ ceph osd pool set pg_autoscale_mode off -To allow the cluster to automatically adjust the number of PGs,: +To allow the cluster to automatically adjust the number of PGs for the pool, +run the following command: .. prompt:: bash $ ceph osd pool set pg_autoscale_mode on -You can also manually set the number of PGs for the pool to the -recommended amount with: +Alternatively, to manually set the number of PGs for the pool to the +recommended amount, run the following command: .. prompt:: bash $ ceph osd pool set pg_num -Please refer to :ref:`choosing-number-of-placement-groups` and -:ref:`pg-autoscaler` for more information. +For more information, see :ref:`choosing-number-of-placement-groups` and +:ref:`pg-autoscaler`. + POOL_TARGET_SIZE_BYTES_OVERCOMMITTED ____________________________________ -- 2.39.5
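As a sketch of the ``pg_num`` adjustments described in the ``TOO_MANY_PGS`` and
``POOL_TOO_MANY_PGS`` sections above, the commands below first ask the
autoscaler for its per-pool recommendations and then apply a new PG count to
one pool by hand. The pool name ``testpool`` and the value ``128`` are
placeholders, not recommendations:

.. prompt:: bash $

   ceph osd pool autoscale-status
   ceph osd pool set testpool pg_num 128   # "testpool" and 128 are examples only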
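Similarly, for the ``DEVICE_HEALTH`` checks described earlier, a minimal sketch
of inspecting device health and tuning the self-heal behavior might look like
the following. The threshold value (four weeks, expressed in seconds) is
illustrative only:

.. prompt:: bash $

   ceph device ls
   ceph config set mgr mgr/devicehealth/mark_out_threshold 2419200   # example value
   ceph config set mgr mgr/devicehealth/self_heal false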