From c59bc152ea746a8bfc5795dbeb68b2ef6e511b1f Mon Sep 17 00:00:00 2001
From: Zac Dover
Date: Sun, 2 Apr 2023 06:39:32 +1000
Subject: [PATCH] doc/rados/ops: edit health-checks.rst (6 of x)

Edit doc/rados/operations/health-checks.rst (6 of x).

Follows https://github.com/ceph/ceph/pull/50828.

https://tracker.ceph.com/issues/58485

Co-authored-by: Anthony D'Atri
Signed-off-by: Zac Dover
---
 doc/rados/operations/health-checks.rst | 134 ++++++++++++-------------
 1 file changed, 66 insertions(+), 68 deletions(-)

diff --git a/doc/rados/operations/health-checks.rst b/doc/rados/operations/health-checks.rst
index 5db0a0bb74867..4058907d1306f 100644
--- a/doc/rados/operations/health-checks.rst
+++ b/doc/rados/operations/health-checks.rst
@@ -1410,44 +1410,43 @@ Miscellaneous
 RECENT_CRASH
 ____________
 
-One or more Ceph daemons has crashed recently, and the crash has not
-yet been archived (acknowledged) by the administrator. This may
-indicate a software bug, a hardware problem (e.g., a failing disk), or
-some other problem.
+One or more Ceph daemons have crashed recently, and the crash(es) have not yet
+been acknowledged and archived by the administrator. This alert might indicate
+a software bug, a hardware problem (for example, a failing disk), or some other
+problem.
 
-New crashes can be listed with:
+To list recent crashes, run the following command:
 
 .. prompt:: bash $
 
    ceph crash ls-new
 
-Information about a specific crash can be examined with:
+To examine information about a specific crash, run the following command:
 
 .. prompt:: bash $
 
    ceph crash info
 
-This warning can be silenced by "archiving" the crash (perhaps after
-being examined by an administrator) so that it does not generate this
-warning:
+To silence this alert, you can archive the crash (perhaps after the crash
+has been examined by an administrator) by running the following command:
 
 .. prompt:: bash $
 
    ceph crash archive
 
-Similarly, all new crashes can be archived with:
+Similarly, to archive all recent crashes, run the following command:
 
 .. prompt:: bash $
 
    ceph crash archive-all
 
-Archived crashes will still be visible via ``ceph crash ls`` but not
-``ceph crash ls-new``.
+Archived crashes will still be visible by running the command ``ceph crash
+ls``, but not by running the command ``ceph crash ls-new``.
 
-The time period for what "recent" means is controlled by the option
+The time period that is considered recent is determined by the option
 ``mgr/crash/warn_recent_interval`` (default: two weeks).
 
-These warnings can be disabled entirely with:
+To entirely disable this alert, run the following command:
 
 .. prompt:: bash $
 
@@ -1456,40 +1455,40 @@ These warnings can be disabled entirely with:
 RECENT_MGR_MODULE_CRASH
 _______________________
 
-One or more ceph-mgr modules has crashed recently, and the crash as
-not yet been archived (acknowledged) by the administrator. This
-generally indicates a software bug in one of the software modules run
-inside the ceph-mgr daemon. Although the module that experienced the
-problem maybe be disabled as a result, the function of other modules
-is normally unaffected.
+One or more ``ceph-mgr`` modules have crashed recently, and the crash(es) have
+not yet been acknowledged and archived by the administrator. This alert
+usually indicates a software bug in one of the software modules that are
+running inside the ``ceph-mgr`` daemon. The module that experienced the problem
+might be disabled as a result, but other modules are normally unaffected and
+continue to function as expected.
 
-As with the *RECENT_CRASH* health alert, the crash can be inspected with:
+As with the *RECENT_CRASH* health check, a specific crash can be inspected by
+running the following command:
 
 .. prompt:: bash $
 
    ceph crash info
 
-This warning can be silenced by "archiving" the crash (perhaps after
-being examined by an administrator) so that it does not generate this
-warning:
+To silence this alert, you can archive the crash (perhaps after the crash has
+been examined by an administrator) by running the following command:
 
 .. prompt:: bash $
 
    ceph crash archive
 
-Similarly, all new crashes can be archived with:
+Similarly, to archive all recent crashes, run the following command:
 
 .. prompt:: bash $
 
    ceph crash archive-all
 
-Archived crashes will still be visible via ``ceph crash ls`` but not
-``ceph crash ls-new``.
+Archived crashes will still be visible by running the command ``ceph crash ls``
+but not by running the command ``ceph crash ls-new``.
 
-The time period for what "recent" means is controlled by the option
+The time period that is considered recent is determined by the option
 ``mgr/crash/warn_recent_interval`` (default: two weeks).
 
-These warnings can be disabled entirely with:
+To entirely disable this alert, run the following command:
 
 .. prompt:: bash $
 
@@ -1498,33 +1497,32 @@ These warnings can be disabled entirely with:
 TELEMETRY_CHANGED
 _________________
 
-Telemetry has been enabled, but the contents of the telemetry report
-have changed since that time, so telemetry reports will not be sent.
+Telemetry has been enabled, but because the contents of the telemetry report
+have changed in the meantime, telemetry reports will not be sent.
 
-The Ceph developers periodically revise the telemetry feature to
-include new and useful information, or to remove information found to
-be useless or sensitive. If any new information is included in the
-report, Ceph will require the administrator to re-enable telemetry to
-ensure they have an opportunity to (re)review what information will be
+Ceph developers occasionally revise the telemetry feature to include new and
+useful information, or to remove information found to be useless or sensitive.
+If any new information is included in the report, Ceph requires the
+administrator to re-enable telemetry. This requirement ensures that the
+administrator has an opportunity to (re)review the information that will be
 shared.
 
-To review the contents of the telemetry report:
+To review the contents of the telemetry report, run the following command:
 
 .. prompt:: bash $
 
    ceph telemetry show
 
-Note that the telemetry report consists of several optional channels
-that may be independently enabled or disabled. For more information, see
-:ref:`telemetry`.
+Note that the telemetry report consists of several channels that may be
+independently enabled or disabled. For more information, see :ref:`telemetry`.
 
-To re-enable telemetry (and make this warning go away):
+To re-enable telemetry (and silence the alert), run the following command:
 
 .. prompt:: bash $
 
    ceph telemetry on
 
-To disable telemetry (and make this warning go away):
+To disable telemetry (and silence the alert), run the following command:
 
 .. prompt:: bash $
 
@@ -1533,25 +1531,26 @@ To disable telemetry (and make this warning go away):
 AUTH_BAD_CAPS
 _____________
 
-One or more auth users has capabilities that cannot be parsed by the
-monitor. This generally indicates that the user will not be
-authorized to perform any action with one or more daemon types.
+One or more auth users have capabilities that cannot be parsed by the monitors.
+As a general rule, this alert indicates that the user is not authorized to
+perform any action with one or more daemon types.
 
-This error is mostly likely to occur after an upgrade if the
-capabilities were set with an older version of Ceph that did not
-properly validate their syntax, or if the syntax of the capabilities
-has changed.
+This alert is most likely to be raised after an upgrade if (1) the capabilities
+were set with an older version of Ceph that did not properly validate the
+syntax of those capabilities, or (2) the syntax of the capabilities has
+changed.
 
-The user in question can be removed with:
+To remove the user(s) in question, run the following command:
 
 .. prompt:: bash $
 
    ceph auth rm
 
-(This will resolve the health alert, but obviously clients will not be
-able to authenticate as that user.)
+(This resolves the health check, but it prevents clients from being able to
+authenticate as the removed user.)
 
-Alternatively, the capabilities for the user can be updated with:
+Alternatively, to update the capabilities for the user(s), run the following
+command:
 
 .. prompt:: bash $
 
@@ -1562,17 +1561,17 @@ For more information about auth capabilities, see :ref:`user-management`.
 OSD_NO_DOWN_OUT_INTERVAL
 ________________________
 
-The ``mon_osd_down_out_interval`` option is set to zero, which means
-that the system will not automatically perform any repair or healing
-operations after an OSD fails. Instead, an administrator (or some
-other external entity) will need to manually mark down OSDs as 'out'
-(i.e., via ``ceph osd out ``) in order to trigger recovery.
+The ``mon_osd_down_out_interval`` option is set to zero, which means that the
+system does not automatically perform any repair or healing operations when an
+OSD fails. Instead, an administrator or an external orchestrator must manually
+mark "down" OSDs as ``out`` (by running ``ceph osd out ``) in order to
+trigger recovery.
 
-This option is normally set to five or ten minutes--enough time for a
-host to power-cycle or reboot.
+This option is normally set to five or ten minutes, which should be enough time
+for a host to power-cycle or reboot.
 
-This warning can silenced by setting the
-``mon_warn_on_osd_down_out_interval_zero`` to false:
+To silence this alert, set ``mon_warn_on_osd_down_out_interval_zero`` to
+``false`` by running the following command:
 
 .. prompt:: bash $
 
@@ -1581,13 +1580,12 @@ This warning can silenced by setting the
 DASHBOARD_DEBUG
 _______________
 
-The Dashboard debug mode is enabled. This means, if there is an error
-while processing a REST API request, the HTTP error response contains
-a Python traceback. This behaviour should be disabled in production
-environments because such a traceback might contain and expose sensible
-information.
+The Dashboard debug mode is enabled. This means that if there is an error while
+processing a REST API request, the HTTP error response will contain a Python
+traceback. This mode should be disabled in production environments because such
+a traceback might contain and expose sensitive information.
 
-The debug mode can be disabled with:
+To disable the debug mode, run the following command:
 
 .. prompt:: bash $
 
-- 
2.39.5