From: Zac Dover
Date: Fri, 2 Jun 2023 05:43:36 +0000 (+1000)
Subject: doc/rados: edit log-and-debug.rst (1 of x)
X-Git-Tag: v17.2.7~342^2
X-Git-Url: http://git.apps.os.sepia.ceph.com/?a=commitdiff_plain;h=7dfe5dd122d119d54d4c0d6454d8a741784ef46f;p=ceph.git

doc/rados: edit log-and-debug.rst (1 of x)

Edit roughly the first half of doc/rados/troubleshooting/log-and-debug.rst.

https://tracker.ceph.com/issues/58485

Co-authored-by: Anthony D'Atri
Signed-off-by: Zac Dover
(cherry picked from commit 0d74e6a0ddaa9131e8e0570d14b530dadf2db485)
---

diff --git a/doc/rados/troubleshooting/log-and-debug.rst b/doc/rados/troubleshooting/log-and-debug.rst
index c5328090c9e18..152ecdfe5e529 100644
--- a/doc/rados/troubleshooting/log-and-debug.rst
+++ b/doc/rados/troubleshooting/log-and-debug.rst
@@ -2,177 +2,206 @@
 Logging and Debugging
 =======================

-Typically, when you add debugging to your Ceph configuration, you do so at
-runtime. You can also add Ceph debug logging to your Ceph configuration file if
-you are encountering issues when starting your cluster. You may view Ceph log
-files under ``/var/log/ceph`` (the default location).
-
-.. tip:: When debug output slows down your system, the latency can hide
- race conditions.
-
-Logging is resource intensive. If you are encountering a problem in a specific
-area of your cluster, enable logging for that area of the cluster. For example,
-if your OSDs are running fine, but your metadata servers are not, you should
-start by enabling debug logging for the specific metadata server instance(s)
-giving you trouble. Enable logging for each subsystem as needed.
-
-.. important:: Verbose logging can generate over 1GB of data per hour. If your
- OS disk reaches its capacity, the node will stop working.
-
-If you enable or increase the rate of Ceph logging, ensure that you have
-sufficient disk space on your OS disk. See `Accelerating Log Rotation`_ for
-details on rotating log files.
-When your system is running well, remove
-unnecessary debugging settings to ensure your cluster runs optimally. Logging
-debug output messages is relatively slow, and a waste of resources when
-operating your cluster.
-
-See `Subsystem, Log and Debug Settings`_ for details on available settings.
+Ceph component debug log levels can be adjusted at runtime, while services are
+running. In some circumstances you might want to adjust debug log levels in
+``ceph.conf`` or in the central config store. Increased debug logging can be
+useful if you are encountering issues when operating your cluster. By default,
+Ceph log files are in ``/var/log/ceph``.
+
+.. tip:: Remember that debug output can slow down your system, and that this
+ latency sometimes hides race conditions.
+
+Debug logging is resource intensive. If you encounter a problem in a specific
+component of your cluster, begin troubleshooting by enabling logging for only
+that component of the cluster. For example, if your OSDs are running without
+errors, but your metadata servers are not, enable logging for any specific
+metadata server instances that are having problems. Continue by enabling
+logging for each subsystem only as needed.
+
+.. important:: Verbose logging sometimes generates over 1 GB of data per hour.
+ If the disk that your operating system runs on (your "OS disk") reaches its
+ capacity, the node associated with that disk will stop working.
+
+Whenever you enable or increase the rate of debug logging, make sure that you
+have ample capacity for log files, as this may dramatically increase their
+size. For details on rotating log files, see `Accelerating Log Rotation`_.
+When your system is running well again, remove unnecessary debugging settings
+in order to ensure that your cluster runs optimally. Logging debug-output
+messages is a slow process and a potential waste of your cluster's resources.
+
+For details on available settings, see `Subsystem, Log and Debug Settings`_.
 Runtime
 =======

-If you would like to see the configuration settings at runtime, you must log
-in to a host with a running daemon and execute the following::
+To see the configuration settings at runtime, log in to a host that has a
+running daemon and run a command of the following form:

- ceph daemon {daemon-name} config show | less
+.. prompt:: bash $

-For example,::
+ ceph daemon {daemon-name} config show | less

- ceph daemon osd.0 config show | less
+For example:

-To activate Ceph's debugging output (*i.e.*, ``dout()``) at runtime, use the
-``ceph tell`` command to inject arguments into the runtime configuration::
+.. prompt:: bash $

- ceph tell {daemon-type}.{daemon id or *} config set {name} {value}
-
-Replace ``{daemon-type}`` with one of ``osd``, ``mon`` or ``mds``. You may apply
-the runtime setting to all daemons of a particular type with ``*``, or specify
-a specific daemon's ID. For example, to increase
-debug logging for a ``ceph-osd`` daemon named ``osd.0``, execute the following::
+ ceph daemon osd.0 config show | less

- ceph tell osd.0 config set debug_osd 0/5
+To activate Ceph's debugging output (that is, the ``dout()`` logging function)
+at runtime, inject arguments into the runtime configuration by running a ``ceph
+tell`` command of the following form:

-The ``ceph tell`` command goes through the monitors. If you cannot bind to the
-monitor, you can still make the change by logging into the host of the daemon
-whose configuration you'd like to change using ``ceph daemon``.
-For example::
+.. prompt:: bash $

- sudo ceph daemon osd.0 config set debug_osd 0/5
+ ceph tell {daemon-type}.{daemon id or *} config set {name} {value}

-See `Subsystem, Log and Debug Settings`_ for details on available settings.
+Here ``{daemon-type}`` is ``osd``, ``mon``, or ``mds``. Apply the runtime
+setting either to a specific daemon (by specifying its ID) or to all daemons of
+a particular type (by using the ``*`` wildcard).
+For example, to increase
+debug logging for a specific ``ceph-osd`` daemon named ``osd.0``, run the
+following command:
+
+.. prompt:: bash $
+
+ ceph tell osd.0 config set debug_osd 0/5
+
+The ``ceph tell`` command goes through the monitors. However, if you are unable
+to bind to the monitor, you can still activate Ceph's debugging output: log in
+to the host of the specific daemon and use the ``ceph daemon`` command, which
+communicates with the daemon through its local admin socket, to change the
+daemon's configuration. For example:
+
+.. prompt:: bash $
+
+ sudo ceph daemon osd.0 config set debug_osd 0/5
+
+For details on available settings, see `Subsystem, Log and Debug Settings`_.

 Boot Time
 =========

-To activate Ceph's debugging output (*i.e.*, ``dout()``) at boot time, you must
-add settings to your Ceph configuration file. Subsystems common to each daemon
-may be set under ``[global]`` in your configuration file. Subsystems for
-particular daemons are set under the daemon section in your configuration file
-(*e.g.*, ``[mon]``, ``[osd]``, ``[mds]``). For example
+To activate Ceph's debugging output (that is, the ``dout()`` logging function)
+at boot time, you must add settings to your Ceph configuration file.
+Subsystems that are common to all daemons are set under ``[global]`` in the
+configuration file. Subsystems for a specific daemon are set under the relevant
+daemon section in the configuration file (for example, ``[mon]``, ``[osd]``,
+``[mds]``). Here is an example that shows possible debugging settings in a Ceph
+configuration file:

 .. code-block:: ini

- [global]
- debug ms = 1/5
-
- [mon]
- debug mon = 20
- debug paxos = 1/5
- debug auth = 2
-
- [osd]
- debug osd = 1/5
- debug filestore = 1/5
- debug journal = 1
- debug monc = 5/20
-
- [mds]
- debug mds = 1
- debug mds balancer = 1
+ [global]
+ debug_ms = 1/5
+
+ [mon]
+ debug_mon = 20
+ debug_paxos = 1/5
+ debug_auth = 2
+
+ [osd]
+ debug_osd = 1/5
+ debug_filestore = 1/5
+ debug_journal = 1
+ debug_monc = 5/20
+
+ [mds]
+ debug_mds = 1
+ debug_mds_balancer = 1

-See `Subsystem, Log and Debug Settings`_ for details.
+For details, see `Subsystem, Log and Debug Settings`_.

 Accelerating Log Rotation
 =========================

-If your OS disk is relatively full, you can accelerate log rotation by modifying
-the Ceph log rotation file at ``/etc/logrotate.d/ceph``. Add a size setting
-after the rotation frequency to accelerate log rotation (via cronjob) if your
-logs exceed the size setting. For example, the default setting looks like
-this::
-
- rotate 7
- weekly
- compress
- sharedscripts
-
-Modify it by adding a ``size`` setting. ::
-
- rotate 7
- weekly
- size 500M
- compress
- sharedscripts
-
-Then, start the crontab editor for your user space. ::
-
- crontab -e
-
-Finally, add an entry to check the ``etc/logrotate.d/ceph`` file. ::
-
- 30 * * * * /usr/sbin/logrotate /etc/logrotate.d/ceph >/dev/null 2>&1
-
-The preceding example checks the ``etc/logrotate.d/ceph`` file every 30 minutes.
+If your log filesystem is nearly full, you can accelerate log rotation by
+modifying the Ceph log rotation file at ``/etc/logrotate.d/ceph``. To increase
+the frequency of log rotation (which will guard against a filesystem reaching
+capacity), add a ``size`` directive after the ``weekly`` frequency directive.
+To smooth out volume spikes, consider changing ``weekly`` to ``daily`` and
+``rotate`` to ``30``. The procedure for adding the size setting is shown
+immediately below.
+
+#. Note the default settings of the ``/etc/logrotate.d/ceph`` file::
+
+ rotate 7
+ weekly
+ compress
+ sharedscripts
+
+#. Modify them by adding a ``size`` setting::
+
+ rotate 7
+ weekly
+ size 500M
+ compress
+ sharedscripts
+
+#. Start the crontab editor for your user space:
+
+ .. prompt:: bash $
+
+ crontab -e
+
+#. Add an entry to crontab that instructs cron to check the
+ ``/etc/logrotate.d/ceph`` file::
+
+ 30 * * * * /usr/sbin/logrotate /etc/logrotate.d/ceph >/dev/null 2>&1
+
+In this example, the ``/etc/logrotate.d/ceph`` file will be checked once an
+hour, at 30 minutes past the hour.

 Valgrind
 ========

-Debugging may also require you to track down memory and threading issues.
-You can run a single daemon, a type of daemon, or the whole cluster with
-Valgrind. You should only use Valgrind when developing or debugging Ceph.
-Valgrind is computationally expensive, and will slow down your system otherwise.
-Valgrind messages are logged to ``stderr``.
+When you are debugging your cluster, you might find it necessary to track down
+memory and threading issues. The Valgrind tool suite can be used to detect
+problems in a specific daemon, in a particular type of daemon, or in the entire
+cluster. Because Valgrind is computationally expensive and slows down your
+system, use it only when developing or debugging Ceph. Valgrind messages are
+logged to ``stderr``.

 Subsystem, Log and Debug Settings
 =================================

-In most cases, you will enable debug logging output via subsystems.
+Debug logging output is typically enabled via subsystems.

 Ceph Subsystems
 ---------------

-Each subsystem has a logging level for its output logs, and for its logs
-in-memory. You may set different values for each of these subsystems by setting
-a log file level and a memory level for debug logging. Ceph's logging levels
-operate on a scale of ``1`` to ``20``, where ``1`` is terse and ``20`` is
-verbose [#]_ .
-In general, the logs in-memory are not sent to the output log unless:
+For each subsystem, there is a logging level for its output logs (a so-called
+"log level") and a logging level for its in-memory logs (a so-called "memory
+level"). Different values may be set for these two logging levels in each
+subsystem. Ceph's logging levels operate on a scale of ``1`` to ``20``, where
+``1`` is terse and ``20`` is verbose [#f1]_. As a general rule, the in-memory
+logs are not sent to the output log unless one or more of the following
+conditions obtain:

 - a fatal signal is raised or
 - an ``assert`` in source code is triggered or
-- upon requested. Please consult `document on admin socket `_ for more details.
-
-A debug logging setting can take a single value for the log level and the
-memory level, which sets them both as the same value. For example, if you
-specify ``debug ms = 5``, Ceph will treat it as a log level and a memory level
-of ``5``. You may also specify them separately. The first setting is the log
-level, and the second setting is the memory level. You must separate them with
-a forward slash (/). For example, if you want to set the ``ms`` subsystem's
-debug logging level to ``1`` and its memory level to ``5``, you would specify it
-as ``debug ms = 1/5``. For example:
-
-
+- upon request. Please consult `document on admin socket
+ `_ for more details.
+
+.. warning ::
+ .. [#f1] In certain rare cases, there are logging levels that can take a value greater than 20. The resulting logs are extremely verbose.
+
+Log levels and memory levels can be set either together or separately. If a
+subsystem is assigned a single value, then that value determines both the log
+level and the memory level. For example, ``debug ms = 5`` will give the ``ms``
+subsystem a log level of ``5`` and a memory level of ``5``.
+On the other hand,
+if a subsystem is assigned two values that are separated by a forward slash
+(/), then the first value determines the log level and the second value
+determines the memory level. For example, ``debug ms = 1/5`` will give the
+``ms`` subsystem a log level of ``1`` and a memory level of ``5``. See the
+following:

 .. code-block:: ini

- debug {subsystem} = {log-level}/{memory-level}
- #for example
- debug mds balancer = 1/20
-
+ debug {subsystem} = {log-level}/{memory-level}
+ #for example
+ debug mds balancer = 1/20

 The following table provides a list of Ceph subsystems and their default log
 and memory levels. Once you complete your logging efforts, restore the subsystems
@@ -352,4 +381,3 @@ RADOS Gateway
 - :confval:`rgw_usage_log_flush_threshold`
 - :confval:`rgw_usage_log_tick_interval`

-.. [#] there are levels >20 in some rare cases and that they are extremely verbose.
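The rule for single-value and slash-separated debug settings (one value sets both the log level and the memory level; ``log/memory`` sets them separately) can be illustrated with a short POSIX shell sketch. The function ``parse_debug_level`` is hypothetical and is not part of Ceph or of any Ceph tool; it only mirrors the documented rule and does not validate the values it is given.

```shell
# Hypothetical helper (not part of Ceph): split the value of a debug_<subsystem>
# setting into its log level and its memory level.
#   "5"   -> log level 5, memory level 5 (a single value sets both)
#   "1/5" -> log level 1, memory level 5 (log level first, memory level second)
# The values themselves are not validated.
parse_debug_level() {
    value=$1
    case "$value" in
        */*)
            # Text before the first "/" is the log level; text after it is the
            # memory level.
            log_level=${value%%/*}
            mem_level=${value#*/}
            ;;
        *)
            # A single value applies to both levels.
            log_level=$value
            mem_level=$value
            ;;
    esac
    printf 'log=%s mem=%s\n' "$log_level" "$mem_level"
}

parse_debug_level 5       # prints: log=5 mem=5
parse_debug_level 1/5     # prints: log=1 mem=5
parse_debug_level 0/5     # prints: log=0 mem=5
```

A ``case`` pattern is used instead of the Bash-only ``[[ ]]`` test so that the sketch runs unchanged under any POSIX shell.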