From 0fc663d793fe34e42e19b251fcc79b5af52581c6 Mon Sep 17 00:00:00 2001 From: Zac Dover Date: Wed, 11 Oct 2023 20:38:55 +1000 Subject: [PATCH] doc/rados: edit troubleshooting-osd (1 of x) Edit doc/rados/troubleshooting/troubleshooting-osd. Co-authored-by: Anthony D'Atri Signed-off-by: Zac Dover --- .../troubleshooting/troubleshooting-osd.rst | 334 +++++++++++------- 1 file changed, 200 insertions(+), 134 deletions(-) diff --git a/doc/rados/troubleshooting/troubleshooting-osd.rst b/doc/rados/troubleshooting/troubleshooting-osd.rst index 5d0185b9b686b..27c9751f24f67 100644 --- a/doc/rados/troubleshooting/troubleshooting-osd.rst +++ b/doc/rados/troubleshooting/troubleshooting-osd.rst @@ -2,202 +2,267 @@ Troubleshooting OSDs ====================== -Before troubleshooting your OSDs, first check your monitors and network. If -you execute ``ceph health`` or ``ceph -s`` on the command line and Ceph shows -``HEALTH_OK``, it means that the monitors have a quorum. -If you don't have a monitor quorum or if there are errors with the monitor -status, `address the monitor issues first <../troubleshooting-mon>`_. -Check your networks to ensure they -are running properly, because networks may have a significant impact on OSD -operation and performance. Look for dropped packets on the host side -and CRC errors on the switch side. +Before troubleshooting the cluster's OSDs, check the monitors +and the network. + +First, determine whether the monitors have a quorum. Run the ``ceph health`` +command or the ``ceph -s`` command and if Ceph shows ``HEALTH_OK`` then there +is a monitor quorum. + +If the monitors don't have a quorum or if there are errors with the monitor +status, address the monitor issues before proceeding by consulting the material +in `Troubleshooting Monitors <../troubleshooting-mon>`_. + +Next, check your networks to make sure that they are running properly. Networks +can have a significant impact on OSD operation and performance. Look for +dropped packets on the host side and CRC errors on the switch side. + Obtaining Data About OSDs ========================= -A good first step in troubleshooting your OSDs is to obtain topology information in -addition to the information you collected while `monitoring your OSDs`_ -(e.g., ``ceph osd tree``). +When troubleshooting OSDs, it is useful to collect different kinds of +information about the OSDs. Some information comes from the practice of +`monitoring OSDs`_ (for example, by running the ``ceph osd tree`` command). +Additional information concerns the topology of your cluster, and is discussed +in the following sections. Ceph Logs --------- -If you haven't changed the default path, you can find Ceph log files at -``/var/log/ceph``:: +Ceph log files are stored under ``/var/log/ceph``. Unless the path has been +changed (or you are in a containerized environment that stores logs in a +different location), the log files can be listed by running the following +command: - ls /var/log/ceph +.. prompt:: bash + + ls /var/log/ceph + +If there is not enough log detail, change the logging level. To ensure that +Ceph performs adequately under high logging volume, see `Logging and +Debugging`_. -If you don't see enough log detail you can change your logging level. See -`Logging and Debugging`_ for details to ensure that Ceph performs adequately -under high logging volume. Admin Socket ------------ -Use the admin socket tool to retrieve runtime information. 
For details, list -the sockets for your Ceph daemons:: +Use the admin socket tool to retrieve runtime information. First, list the +sockets of Ceph's daemons by running the following command: - ls /var/run/ceph +.. prompt:: bash -Then, execute the following, replacing ``{daemon-name}`` with an actual -daemon (e.g., ``osd.0``):: + ls /var/run/ceph - ceph daemon osd.0 help +Next, run a command of the following form (replacing ``{daemon-name}`` with the +name of a specific daemon: for example, ``osd.0``): -Alternatively, you can specify a ``{socket-file}`` (e.g., something in ``/var/run/ceph``):: +.. prompt:: bash - ceph daemon {socket-file} help + ceph daemon {daemon-name} help -The admin socket, among other things, allows you to: +Alternatively, run the command with a ``{socket-file}`` specified (a "socket +file" is a specific file in ``/var/run/ceph``): -- List your configuration at runtime -- Dump historic operations -- Dump the operation priority queue state -- Dump operations in flight -- Dump perfcounters +.. prompt:: bash -Display Freespace ------------------ + ceph daemon {socket-file} help -Filesystem issues may arise. To display your file system's free space, execute -``df``. :: +The admin socket makes many tasks possible, including: - df -h +- Listing Ceph configuration at runtime +- Dumping historic operations +- Dumping the operation priority queue state +- Dumping operations in flight +- Dumping perfcounters -Execute ``df --help`` for additional usage. +Display Free Space +------------------ + +Filesystem issues may arise. To display your filesystems' free space, run the +following command: + +.. prompt:: bash + + df -h + +To see this command's supported syntax and options, run ``df --help``. I/O Statistics -------------- -Use `iostat`_ to identify I/O-related issues. :: +The `iostat`_ tool can be used to identify I/O-related issues. Run the +following command: + +.. prompt:: bash + + iostat -x - iostat -x Diagnostic Messages ------------------- -To retrieve diagnostic messages from the kernel, use ``dmesg`` with ``less``, ``more``, ``grep`` -or ``tail``. For example:: +To retrieve diagnostic messages from the kernel, run the ``dmesg`` command and +specify the output with ``less``, ``more``, ``grep``, or ``tail``. For +example: - dmesg | grep scsi +.. prompt:: bash -Stopping w/out Rebalancing -========================== + dmesg | grep scsi + +Stopping without Rebalancing +============================ + +It might be occasionally necessary to perform maintenance on a subset of your +cluster or to resolve a problem that affects a failure domain (for example, a +rack). However, when you stop OSDs for maintenance, you might want to prevent +CRUSH from automatically rebalancing the cluster. To avert this rebalancing +behavior, set the cluster to ``noout`` by running the following command: + +.. prompt:: bash + + ceph osd set noout -Periodically, you may need to perform maintenance on a subset of your cluster, -or resolve a problem that affects a failure domain (e.g., a rack). If you do not -want CRUSH to automatically rebalance the cluster as you stop OSDs for -maintenance, set the cluster to ``noout`` first:: +.. warning:: This is more a thought exercise offered for the purpose of giving + the reader a sense of failure domains and CRUSH behavior than a suggestion + that anyone in the post-Luminous world run ``ceph osd set noout``. When the + OSDs return to an ``up`` state, rebalancing will resume and the change + introduced by the ``ceph osd set noout`` command will be reverted. 
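+
+To confirm which flags are currently set (for example, before and after a
+maintenance window), inspect the flags line of the OSD map. The commands below
+are only a quick check; the exact set of flags reported will vary from cluster
+to cluster:
+
+.. prompt:: bash
+
+   ceph osd dump | grep flags
+   ceph osd stat
+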
- ceph osd set noout +In Luminous and later releases, however, it is a safer approach to flag only +affected OSDs. To add or remove a ``noout`` flag to a specific OSD, run a +command like the following: -On Luminous or newer releases it is safer to set the flag only on affected OSDs. -You can do this individually :: +.. prompt:: bash - ceph osd add-noout osd.0 - ceph osd rm-noout osd.0 + ceph osd add-noout osd.0 + ceph osd rm-noout osd.0 -Or an entire CRUSH bucket at a time. Say you're going to take down -``prod-ceph-data1701`` to add RAM :: +It is also possible to flag an entire CRUSH bucket. For example, if you plan to +take down ``prod-ceph-data1701`` in order to add RAM, you might run the +following command: - ceph osd set-group noout prod-ceph-data1701 +.. prompt:: bash -Once the flag is set you can stop the OSDs and any other colocated Ceph -services within the failure domain that requires maintenance work. :: + ceph osd set-group noout prod-ceph-data1701 - systemctl stop ceph\*.service ceph\*.target +After the flag is set, stop the OSDs and any other colocated +Ceph services within the failure domain that requires maintenance work:: -.. note:: Placement groups within the OSDs you stop will become ``degraded`` - while you are addressing issues with within the failure domain. + systemctl stop ceph\*.service ceph\*.target -Once you have completed your maintenance, restart the OSDs and any other -daemons. If you rebooted the host as part of the maintenance, these should -come back on their own without intervention. :: +.. note:: When an OSD is stopped, any placement groups within the OSD are + marked as ``degraded``. - sudo systemctl start ceph.target +After the maintenance is complete, it will be necessary to restart the OSDs +and any other daemons that have stopped. However, if the host was rebooted as +part of the maintenance, they do not need to be restarted and will come back up +automatically. To restart OSDs or other daemons, use a command of the following +form: -Finally, you must unset the cluster-wide``noout`` flag:: +.. prompt:: bash - ceph osd unset noout - ceph osd unset-group noout prod-ceph-data1701 + sudo systemctl start ceph.target + +Finally, unset the ``noout`` flag as needed by running commands like the +following: + +.. prompt:: bash + + ceph osd unset noout + ceph osd unset-group noout prod-ceph-data1701 + +Many contemporary Linux distributions employ ``systemd`` for service +management. However, for certain operating systems (especially older ones) it +might be necessary to issue equivalent ``service`` or ``start``/``stop`` +commands. -Note that most Linux distributions that Ceph supports today employ ``systemd`` -for service management. For other or older operating systems you may need -to issue equivalent ``service`` or ``start``/``stop`` commands. .. _osd-not-running: OSD Not Running =============== -Under normal circumstances, simply restarting the ``ceph-osd`` daemon will -allow it to rejoin the cluster and recover. +Under normal conditions, restarting a ``ceph-osd`` daemon will allow it to +rejoin the cluster and recover. + An OSD Won't Start ------------------ -If you start your cluster and an OSD won't start, check the following: - -- **Configuration File:** If you were not able to get OSDs running from - a new installation, check your configuration file to ensure it conforms - (e.g., ``host`` not ``hostname``, etc.). - -- **Check Paths:** Check the paths in your configuration, and the actual - paths themselves for data and metadata (journals, WAL, DB). 
If you separate the OSD data from - the metadata and there are errors in your configuration file or in the - actual mounts, you may have trouble starting OSDs. If you want to store the - metadata on a separate block device, you should partition or LVM your - drive and assign one partition per OSD. - -- **Check Max Threadcount:** If you have a node with a lot of OSDs, you may be - hitting the default maximum number of threads (e.g., usually 32k), especially - during recovery. You can increase the number of threads using ``sysctl`` to - see if increasing the maximum number of threads to the maximum possible - number of threads allowed (i.e., 4194303) will help. For example:: - - sysctl -w kernel.pid_max=4194303 - - If increasing the maximum thread count resolves the issue, you can make it - permanent by including a ``kernel.pid_max`` setting in a file under ``/etc/sysctl.d`` or - within the master ``/etc/sysctl.conf`` file. For example:: - - kernel.pid_max = 4194303 - -- **Check ``nf_conntrack``:** This connection tracking and limiting system - is the bane of many production Ceph clusters, and can be insidious in that - everything is fine at first. As cluster topology and client workload - grow, mysterious and intermittent connection failures and performance - glitches manifest, becoming worse over time and at certain times of day. - Check ``syslog`` history for table fillage events. You can mitigate this - bother by raising ``nf_conntrack_max`` to a much higher value via ``sysctl``. - Be sure to raise ``nf_conntrack_buckets`` accordingly to - ``nf_conntrack_max / 4``, which may require action outside of ``sysctl`` e.g. - ``"echo 131072 > /sys/module/nf_conntrack/parameters/hashsize`` - More interdictive but fussier is to blacklist the associated kernel modules - to disable processing altogether. This is fragile in that the modules - vary among kernel versions, as does the order in which they must be listed. - Even when blacklisted there are situations in which ``iptables`` or ``docker`` - may activate connection tracking anyway, so a "set and forget" strategy for - the tunables is advised. On modern systems this will not consume appreciable - resources. - -- **Kernel Version:** Identify the kernel version and distribution you - are using. Ceph uses some third party tools by default, which may be - buggy or may conflict with certain distributions and/or kernel - versions (e.g., Google ``gperftools`` and ``TCMalloc``). Check the - `OS recommendations`_ and the release notes for each Ceph version - to ensure you have addressed any issues related to your kernel. - -- **Segment Fault:** If there is a segment fault, increase log levels - and start the problematic daemon(s) again. If segment faults recur, - search the Ceph bug tracker `https://tracker.ceph/com/projects/ceph `_ - and the ``dev`` and ``ceph-users`` mailing list archives `https://ceph.io/resources `_. - If this is truly a new and unique - failure, post to the ``dev`` email list and provide the specific Ceph - release being run, ``ceph.conf`` (with secrets XXX'd out), - your monitor status output and excerpts from your log file(s). +If the cluster has started but an OSD isn't starting, check the following: + +- **Configuration File:** If you were not able to get OSDs running from a new + installation, check your configuration file to ensure it conforms to the + standard (for example, make sure that it says ``host`` and not ``hostname``, + etc.). 
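+
+  As a quick cross-check of the values that a given OSD has actually
+  registered (its hostname and device information, among other things), dump
+  the OSD's metadata. The example below assumes that the OSD under
+  investigation is ``osd.0``; substitute the appropriate OSD id:
+
+  .. prompt:: bash
+
+     ceph osd metadata 0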
+
+- **Check Paths:** Ensure that the paths specified in the configuration
+  correspond to the paths for data and metadata that actually exist (for
+  example, the paths to the journals, the WAL, and the DB). Separate the OSD
+  data from the metadata in order to see whether there are errors in the
+  configuration file and in the actual mounts. If so, these errors might
+  explain why OSDs are not starting. To store the metadata on a separate block
+  device, partition or LVM the drive and assign one partition per OSD.
+
+- **Check Max Threadcount:** If the cluster has a node with an especially high
+  number of OSDs, it might be hitting the default maximum number of threads
+  (usually 32768). This is especially likely to happen during recovery.
+  Increasing the maximum number of threads to the maximum possible number of
+  threads allowed (4194303) might help with the problem. To increase the number
+  of threads to the maximum, run the following command:
+
+  .. prompt:: bash
+
+     sysctl -w kernel.pid_max=4194303
+
+  If this increase resolves the issue, you must make the increase permanent by
+  including a ``kernel.pid_max`` setting either in a file under
+  ``/etc/sysctl.d`` or within the master ``/etc/sysctl.conf`` file. For
+  example::
+
+    kernel.pid_max = 4194303
+
+- **Check ``nf_conntrack``:** This connection-tracking and connection-limiting
+  system causes problems for many production Ceph clusters. The problems often
+  emerge slowly and subtly. As cluster topology and client workload grow,
+  mysterious and intermittent connection failures and performance glitches
+  occur more and more, especially at certain times of the day. To begin taking
+  the measure of your problem, check the ``syslog`` history for "table full"
+  events. One way to address this kind of problem is as follows: First, use the
+  ``sysctl`` utility to assign ``nf_conntrack_max`` a much higher value. Next,
+  raise the value of ``nf_conntrack_buckets`` so that ``nf_conntrack_buckets``
+  × 8 = ``nf_conntrack_max``; this action might require running commands
+  outside of ``sysctl`` (for example, ``echo 131072 >
+  /sys/module/nf_conntrack/parameters/hashsize``). Another way to address the
+  problem is to blacklist the associated kernel modules in order to disable
+  processing altogether. This approach is powerful, but fragile. The modules
+  and the order in which the modules must be listed can vary among kernel
+  versions. Even when blacklisted, ``iptables`` and ``docker`` might sometimes
+  activate connection tracking anyway, so we advise a "set and forget" strategy
+  for the tunables. On modern systems, this approach will not consume
+  appreciable resources.
+
+- **Kernel Version:** Identify the kernel version and distribution that are in
+  use. By default, Ceph uses third-party tools that might be buggy or come into
+  conflict with certain distributions or kernel versions (for example, Google's
+  ``gperftools`` and ``TCMalloc``). Check the `OS recommendations`_ and the
+  release notes for each Ceph version in order to make sure that you have
+  addressed any issues related to your kernel.
+
+- **Segment Fault:** If there is a segment fault, increase log levels and
+  restart the problematic daemon(s). If segment faults recur, search the Ceph
+  bug tracker `https://tracker.ceph.com/projects/ceph
+  <https://tracker.ceph.com/projects/ceph>`_ and the ``dev`` and
+  ``ceph-users`` mailing list archives `https://ceph.io/resources
+  <https://ceph.io/resources>`_ to see if others have experienced and reported
+  these issues.
If this truly is a new and unique failure, post to the ``dev`` + email list and provide the following information: the specific Ceph release + being run, ``ceph.conf`` (with secrets XXX'd out), your monitor status + output, and excerpts from your log file(s). + An OSD Failed ------------- @@ -612,6 +677,7 @@ from eventually being marked ``out`` (regardless of what the current value for .. _Monitor/OSD Interaction: ../../configuration/mon-osd-interaction .. _Monitor Config Reference: ../../configuration/mon-config-ref .. _monitoring your OSDs: ../../operations/monitoring-osd-pg +.. _monitoring OSDs: ../../operations/monitoring-osd-pg/#monitoring-osds .. _subscribe to the ceph-devel email list: mailto:majordomo@vger.kernel.org?body=subscribe+ceph-devel .. _unsubscribe from the ceph-devel email list: mailto:majordomo@vger.kernel.org?body=unsubscribe+ceph-devel .. _subscribe to the ceph-users email list: mailto:ceph-users-join@lists.ceph.com -- 2.39.5