From dbb1dd33e6cc7b0443aee15602f89da1bb1e0d02 Mon Sep 17 00:00:00 2001
From: Sage Weil
Date: Tue, 1 Aug 2017 09:25:27 -0400
Subject: [PATCH] doc/rados/operations/health-checks: add PG health check commentary

Include a link to pg-repair.rst, although there is no content there yet.

Signed-off-by: Sage Weil
---
 doc/rados/operations/health-checks.rst | 214 +++++++++++++++++++++++++
 doc/rados/operations/pg-repair.rst     |   4 +
 2 files changed, 218 insertions(+)
 create mode 100644 doc/rados/operations/pg-repair.rst

diff --git a/doc/rados/operations/health-checks.rst b/doc/rados/operations/health-checks.rst
index e5156fbe3af..b612995081e 100644
--- a/doc/rados/operations/health-checks.rst
+++ b/doc/rados/operations/health-checks.rst
@@ -218,79 +218,293 @@ You can either raise the pool quota with::
 or delete some existing data to reduce utilization.
 
+
 Data health (pools & placement groups)
 ------------------------------
 
 PG_AVAILABILITY
 _______________
 
+Data availability is reduced, meaning that the cluster is unable to
+service potential read or write requests for some data in the cluster.
+Specifically, one or more PGs is in a state that does not allow IO
+requests to be serviced.  Problematic PG states include *peering*,
+*stale*, *incomplete*, and the lack of *active* (if those conditions
+do not clear quickly).
+
+Detailed information about which PGs are affected is available from::
+
+  ceph health detail
+
+In most cases the root cause is that one or more OSDs is currently
+down; see the discussion for ``OSD_DOWN`` above.
+
+The state of specific problematic PGs can be queried with::
+
+  ceph tell <pgid> query
 
 PG_DEGRADED
 ___________
 
+Data redundancy is reduced for some data, meaning the cluster does not
+have the desired number of replicas for all data (for replicated
+pools) or erasure code fragments (for erasure coded pools).
+Specifically, one or more PGs:
+
+* has the *degraded* or *undersized* flag set, meaning there are not
+  enough instances of that placement group in the cluster; or
+* has not had the *clean* flag set for some time.
+
+Detailed information about which PGs are affected is available from::
+
+  ceph health detail
+
+In most cases the root cause is that one or more OSDs is currently
+down; see the discussion for ``OSD_DOWN`` above.
+
+The state of specific problematic PGs can be queried with::
+
+  ceph tell <pgid> query
+
 
 PG_DEGRADED_FULL
 ________________
 
+Data redundancy may be reduced or at risk for some data due to a lack
+of free space in the cluster.  Specifically, one or more PGs has the
+*backfill_toofull* or *recovery_toofull* flag set, meaning that the
+cluster is unable to migrate or recover data because one or more OSDs
+is above the *backfillfull* threshold.
+
+See the discussion for *OSD_BACKFILLFULL* or *OSD_FULL* above for
+steps to resolve this condition.
 
 PG_DAMAGED
 __________
 
+Data scrubbing has discovered some problems with data consistency in
+the cluster.  Specifically, one or more PGs has the *inconsistent* or
+*snaptrim_error* flag set, indicating that an earlier scrub operation
+found a problem, or has the *repair* flag set, meaning that a repair
+for such an inconsistency is currently in progress.
+
+See :doc:`pg-repair` for more information.
+
 
 OSD_SCRUB_ERRORS
 ________________
 
+Recent OSD scrubs have uncovered inconsistencies.  This error is generally
+paired with *PG_DAMAGED* (see above).
+
+See :doc:`pg-repair` for more information.
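+
+As a rough illustration (a sketch only, not a complete repair procedure;
+``<pool-name>`` and ``<pgid>`` are placeholders for the affected pool and
+PG), inconsistent PGs and objects can usually be located, and in many
+cases repaired, with a sequence along these lines::
+
+  ceph health detail                                # which PGs are inconsistent
+  rados list-inconsistent-pg <pool-name>            # inconsistent PGs in one pool
+  rados list-inconsistent-obj <pgid> --format=json-pretty   # per-object detail
+  ceph pg repair <pgid>                             # ask the primary OSD to repair
+
+``ceph pg repair`` should be used with some care: for certain classes of
+inconsistency it may overwrite a good copy with a bad one, so it is worth
+reviewing the reported errors first.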
 
 CACHE_POOL_NEAR_FULL
 ____________________
 
+A cache tier pool is nearly full.  Full in this context is determined
+by the ``target_max_bytes`` and ``target_max_objects`` properties on
+the cache pool.  Once the pool reaches the target threshold, write
+requests to the pool may block while data is flushed and evicted
+from the cache, a state that normally leads to very high latencies and
+poor performance.
+
+The cache pool target size can be adjusted with::
+
+  ceph osd pool set <cache-pool-name> target_max_bytes <bytes>
+  ceph osd pool set <cache-pool-name> target_max_objects <objects>
+
+Normal cache flush and evict activity may also be throttled due to reduced
+availability or performance of the base tier, or overall cluster load.
 
 TOO_FEW_PGS
 ___________
 
+The number of PGs in use in the cluster is below the configurable
+threshold of ``mon_pg_warn_min_per_osd`` PGs per OSD.  This can lead
+to suboptimal distribution and balance of data across the OSDs in
+the cluster, and similarly reduce overall performance.
+
+This may be an expected condition if data pools have not yet been
+created.
+
+The PG count for existing pools can be increased or new pools can be
+created.  Please refer to
+:doc:`placement-groups#Choosing-the-number-of-Placement-Groups` for
+more information.
 
 TOO_MANY_PGS
 ____________
 
+The number of PGs in use in the cluster is above the configurable
+threshold of ``mon_pg_warn_max_per_osd`` PGs per OSD.  This can lead
+to higher memory utilization for OSD daemons, slower peering after
+cluster state changes (like OSD restarts, additions, or removals), and
+higher load on the Manager and Monitor daemons.
+
+The ``pg_num`` value for existing pools cannot currently be reduced.
+However, the ``pgp_num`` value can, which effectively collocates some
+PGs on the same sets of OSDs, mitigating some of the negative impacts
+described above.  The ``pgp_num`` value can be adjusted with::
+
+  ceph osd pool set <pool-name> pgp_num <value>
+
+Please refer to
+:doc:`placement-groups#Choosing-the-number-of-Placement-Groups` for
+more information.
 
 SMALLER_PGP_NUM
 _______________
 
+One or more pools has a ``pgp_num`` value less than ``pg_num``.  This
+is normally an indication that the PG count was increased without
+also increasing ``pgp_num``, which controls the placement behavior.
+
+This is sometimes done deliberately to separate out the *split* step
+when the PG count is adjusted from the data migration that is needed
+when ``pgp_num`` is changed.
+
+This is normally resolved by setting ``pgp_num`` to match ``pg_num``,
+triggering the data migration, with::
+
+  ceph osd pool set <pool-name> pgp_num <pg-num-value>
+
 
 MANY_OBJECTS_PER_PG
 ___________________
 
+One or more pools has an average number of objects per PG that is
+significantly higher than the overall cluster average.  The specific
+threshold is controlled by the ``mon_pg_warn_max_object_skew``
+configuration value.
+
+This is usually an indication that the pool(s) containing most of the
+data in the cluster have too few PGs, and/or that other pools that do
+not contain as much data have too many PGs.  See the discussion of
+*TOO_MANY_PGS* above.
+
+The threshold can be raised to silence the health warning by adjusting
+the ``mon_pg_warn_max_object_skew`` config option on the monitors.
 
 POOL_FULL
 _________
 
+One or more pools has reached (or is very close to reaching) its
+quota.  The threshold to trigger this error condition is controlled by
+the ``mon_pool_quota_crit_threshold`` configuration option.
+
+Pool quotas can be adjusted up or down (or removed) with::
+
+  ceph osd pool set-quota <pool-name> max_bytes <bytes>
+  ceph osd pool set-quota <pool-name> max_objects <objects>
+
+Setting the quota value to 0 will disable the quota.
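+
+For example, the current utilization of a pool can be compared against its
+configured quota (a sketch; ``<pool-name>`` is a placeholder for the pool in
+question) with::
+
+  ceph df detail
+  ceph osd pool get-quota <pool-name>
+
+which makes it easier to decide whether to raise the quota, remove it, or
+delete data from the pool.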
 
 POOL_NEAR_FULL
 ______________
 
+One or more pools is approaching its quota.  The threshold to trigger
+this warning condition is controlled by the
+``mon_pool_quota_warn_threshold`` configuration option.
+
+Pool quotas can be adjusted up or down (or removed) with::
+
+  ceph osd pool set-quota <pool-name> max_bytes <bytes>
+  ceph osd pool set-quota <pool-name> max_objects <objects>
+
+Setting the quota value to 0 will disable the quota.
 
 OBJECT_MISPLACED
 ________________
 
+One or more objects in the cluster is not stored on the node the
+cluster would like it to be stored on.  This is an indication that
+data migration due to some recent cluster change has not yet completed.
+
+Misplaced data is not a dangerous condition in and of itself; data
+consistency is never at risk, and old copies of objects are never
+removed until the desired number of new copies (in the desired
+locations) are present.
 
 OBJECT_UNFOUND
 ______________
 
+One or more objects in the cluster cannot be found.  Specifically, the
+OSDs know that a new or updated copy of an object should exist, but a
+copy of that version of the object has not been found on OSDs that are
+currently online.
+
+Read or write requests to unfound objects will block.
+
+Ideally, a down OSD that has the more recent copy of the unfound
+object can be brought back online.  Candidate OSDs can be identified from
+the peering state for the PG(s) responsible for the unfound object::
+
+  ceph tell <pgid> query
+
+If the latest copy of the object is not available, the cluster can be
+told to roll back to a previous version of the object.  See
+:doc:`troubleshooting-pg#Unfound-objects` for more information.
 
 REQUEST_SLOW
 ____________
 
+One or more OSD requests is taking a long time to process.  This can
+be an indication of extreme load, a slow storage device, or a software
+bug.
+
+The request queue on the OSD(s) in question can be queried with the
+following command, executed from the OSD host::
+
+  ceph daemon osd.<id> ops
+
+A summary of the slowest recent requests can be seen with::
+
+  ceph daemon osd.<id> dump_historic_ops
+
+The location of an OSD can be found with::
+
+  ceph osd find osd.<id>
 
 REQUEST_STUCK
 _____________
 
+One or more OSD requests has been blocked for an extremely long time.
+This is an indication that either the cluster has been unhealthy for
+an extended period of time (e.g., not enough running OSDs) or there is
+some internal problem with the OSD.  See the discussion of
+*REQUEST_SLOW* above.
 
 PG_NOT_SCRUBBED
 _______________
 
+One or more PGs has not been scrubbed recently.  PGs are normally
+scrubbed every ``mon_scrub_interval`` seconds, and this warning
+triggers when ``mon_warn_not_scrubbed`` such intervals have elapsed
+without a scrub.
+
+PGs will not scrub if they are not flagged as *clean*, which may
+happen if they are misplaced or degraded (see *PG_AVAILABILITY* and
+*PG_DEGRADED* above).
+
+You can manually initiate a scrub of a clean PG with::
+
+  ceph pg scrub <pgid>
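+
+The timestamps of the most recent scrub and deep scrub are included in
+each PG's statistics, so (as a rough sketch; the exact output format
+varies by release) the scrub history of a particular PG can be inspected
+with::
+
+  ceph tell <pgid> query | grep scrub_stamp
+
+It is also possible to ask a single OSD to scrub the PGs for which it is
+the primary (``ceph osd deep-scrub <id>`` does the same for deep scrubs)::
+
+  ceph osd scrub <id>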
 
 PG_NOT_DEEP_SCRUBBED
 ____________________
 
+One or more PGs has not been deep scrubbed recently.  PGs are normally
+deep scrubbed every ``osd_deep_scrub_interval`` seconds, and this warning
+triggers when ``mon_warn_not_deep_scrubbed`` such intervals have elapsed
+without a scrub.
+
+PGs will not (deep) scrub if they are not flagged as *clean*, which may
+happen if they are misplaced or degraded (see *PG_AVAILABILITY* and
+*PG_DEGRADED* above).
+
+You can manually initiate a deep scrub of a clean PG with::
+
+  ceph pg deep-scrub <pgid>
 
 
 CephFS
 ------
diff --git a/doc/rados/operations/pg-repair.rst b/doc/rados/operations/pg-repair.rst
new file mode 100644
index 00000000000..0d6692a35e9
--- /dev/null
+++ b/doc/rados/operations/pg-repair.rst
@@ -0,0 +1,4 @@
+Repairing PG inconsistencies
+============================
+
+
-- 
2.39.5