Troubleshooting OSDs
======================
Before troubleshooting the cluster's OSDs, check the monitors and the network.

First, determine whether the monitors have a quorum: run the ``ceph health``
command or the ``ceph -s`` command. If Ceph shows ``HEALTH_OK``, then the
monitors have a quorum.

If the monitors don't have a quorum, or if there are errors with the monitor
status, address the monitor issues before proceeding by consulting the material
in `Troubleshooting Monitors <../troubleshooting-mon>`_.

Next, check your networks to make sure that they are running properly. Networks
can have a significant impact on OSD operation and performance. Look for
dropped packets on the host side and CRC errors on the switch side.

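A quick way to check for such problems on an OSD host is to inspect the
network interface counters. The following is only a sketch: the interface name
``eth0`` is an assumption, and the counters reported by ``ethtool`` vary by
driver:

.. prompt:: bash

   ip -s link show eth0
   ethtool -S eth0 | grep -i -e drop -e error
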
Obtaining Data About OSDs
=========================

When troubleshooting OSDs, it is useful to collect different kinds of
information about the OSDs. Some information comes from the practice of
`monitoring OSDs`_ (for example, by running the ``ceph osd tree`` command).
Additional information concerns the topology of your cluster, and is discussed
in the following sections.

Ceph Logs
---------

Ceph log files are stored under ``/var/log/ceph``. Unless the path has been
changed (or you are in a containerized environment that stores logs in a
different location), the log files can be listed by running the following
command:

.. prompt:: bash

   ls /var/log/ceph

If there is not enough log detail, change the logging level. To ensure that
Ceph performs adequately under high logging volume, see `Logging and
Debugging`_.

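For example, one way to raise an OSD's debug level at runtime (the daemon name
``osd.0`` and the level ``20`` are merely illustrative) is to run the following
command:

.. prompt:: bash

   ceph tell osd.0 config set debug_osd 20
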
Admin Socket
------------

Use the admin socket tool to retrieve runtime information. First, list the
sockets of Ceph's daemons by running the following command:

.. prompt:: bash

   ls /var/run/ceph

Next, run a command of the following form (replacing ``{daemon-name}`` with the
name of a specific daemon: for example, ``osd.0``):

.. prompt:: bash

   ceph daemon {daemon-name} help

Alternatively, run the command with a ``{socket-file}`` specified (a "socket
file" is a specific file in ``/var/run/ceph``):

.. prompt:: bash

   ceph daemon {socket-file} help

The admin socket makes many tasks possible, including:

- Listing Ceph configuration at runtime
- Dumping historic operations
- Dumping the operation priority queue state
- Dumping operations in flight
- Dumping perfcounters

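For example, the following commands perform some of these tasks against a
daemon named ``osd.0``. Admin socket commands must be run on the host where the
daemon is running:

.. prompt:: bash

   ceph daemon osd.0 config show
   ceph daemon osd.0 dump_ops_in_flight
   ceph daemon osd.0 dump_historic_ops
   ceph daemon osd.0 perf dump
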
Display Free Space
------------------

Filesystem issues may arise. To display your filesystems' free space, run the
following command:

.. prompt:: bash

   df -h

To see this command's supported syntax and options, run ``df --help``.

I/O Statistics
--------------

The `iostat`_ tool can be used to identify I/O-related issues. Run the
following command:

.. prompt:: bash

   iostat -x

Diagnostic Messages
-------------------

To retrieve diagnostic messages from the kernel, run the ``dmesg`` command and
filter its output with ``less``, ``more``, ``grep``, or ``tail``. For example:

.. prompt:: bash

   dmesg | grep scsi

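Human-readable timestamps can help correlate kernel events with entries in the
Ceph logs. The flag and the search terms below are only one possible
combination:

.. prompt:: bash

   dmesg -T | grep -i -e 'i/o error' -e 'blocked for more than'
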
Stopping without Rebalancing
============================

It might occasionally be necessary to perform maintenance on a subset of your
cluster or to resolve a problem that affects a failure domain (for example, a
rack). However, when you stop OSDs for maintenance, you might want to prevent
CRUSH from automatically rebalancing the cluster. To avert this rebalancing
behavior, set the cluster to ``noout`` by running the following command:

.. prompt:: bash

   ceph osd set noout

.. warning:: This is more a thought exercise offered for the purpose of giving
   the reader a sense of failure domains and CRUSH behavior than a suggestion
   that anyone in the post-Luminous world run ``ceph osd set noout``. When the
   OSDs return to an ``up`` state, rebalancing will resume and the change
   introduced by the ``ceph osd set noout`` command will be reverted.

In Luminous and later releases, however, it is safer to flag only the affected
OSDs. To add or remove a ``noout`` flag for a specific OSD, run a command of
the following form:

.. prompt:: bash

   ceph osd add-noout osd.0
   ceph osd rm-noout osd.0

It is also possible to flag an entire CRUSH bucket. For example, if you plan to
take down ``prod-ceph-data1701`` in order to add RAM, you might run the
following command:

.. prompt:: bash

   ceph osd set-group noout prod-ceph-data1701

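Whichever method is used, it can be helpful to confirm which flags are
currently in effect. The flags appear in the OSD map and in the cluster's
health output (the exact formatting varies between releases):

.. prompt:: bash

   ceph osd dump | grep flags
   ceph -s
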
After the flag is set, stop the OSDs and any other colocated Ceph services
within the failure domain that requires maintenance work::

   systemctl stop ceph\*.service ceph\*.target

.. note:: When an OSD is stopped, any placement groups within the OSD are
   marked as ``degraded``.

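While the OSDs are down, the degraded placement groups and the ``noout`` flags
can be watched from the cluster status. For example:

.. prompt:: bash

   ceph -s
   ceph pg stat
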
After the maintenance is complete, restart the OSDs and any other daemons that
were stopped. However, if the host was rebooted as part of the maintenance,
these daemons will come back up automatically and do not need to be restarted
manually. To restart OSDs or other daemons, use a command of the following
form:

.. prompt:: bash

   sudo systemctl start ceph.target

Finally, unset the ``noout`` flag as needed by running commands like the
following:

.. prompt:: bash

   ceph osd unset noout
   ceph osd unset-group noout prod-ceph-data1701

Many contemporary Linux distributions employ ``systemd`` for service
management. However, for certain operating systems (especially older ones) it
might be necessary to issue equivalent ``service`` or ``start``/``stop``
commands.

.. _osd-not-running:

OSD Not Running
===============

Under normal conditions, restarting a ``ceph-osd`` daemon will allow it to
rejoin the cluster and recover.

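On a host that manages OSDs with ``systemd`` and uses the conventional
``ceph-osd@<id>`` unit names (containerized deployments name their units
differently), restarting a single OSD might look like the following, with
``osd.0`` used as an example:

.. prompt:: bash

   sudo systemctl restart ceph-osd@0
   ceph osd tree

The second command is simply a quick way to confirm that the OSD has been
marked ``up`` again.
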
An OSD Won't Start
------------------

If the cluster has started but an OSD isn't starting, check the following:

- **Configuration File:** If you were not able to get OSDs running from a new
  installation, check your configuration file to ensure it conforms to the
  standard (for example, make sure that it says ``host`` and not ``hostname``,
  etc.).

- **Check Paths:** Ensure that the paths specified in the configuration
  correspond to the paths for data and metadata that actually exist (for
  example, the paths to the journals, the WAL, and the DB). If the OSD data has
  been separated from its metadata and there are errors in the configuration
  file or in the actual mounts, it might be difficult to start the OSDs. To
  store the metadata on a separate block device, partition or LVM the drive and
  assign one partition per OSD.

- **Check Max Threadcount:** If the cluster has a node with an especially high
  number of OSDs, it might be hitting the default maximum number of threads
  (usually 32,000). This is especially likely to happen during recovery.
  Increasing the maximum number of threads to the maximum possible number of
  threads allowed (4194303) might help with the problem. To increase the number
  of threads to the maximum, run the following command:

  .. prompt:: bash

     sysctl -w kernel.pid_max=4194303

  If this increase resolves the issue, you must make the increase permanent by
  including a ``kernel.pid_max`` setting either in a file under
  ``/etc/sysctl.d`` or within the master ``/etc/sysctl.conf`` file. For
  example::

     kernel.pid_max = 4194303

- **Check ``nf_conntrack``:** This connection-tracking and connection-limiting
  system causes problems for many production Ceph clusters. The problems often
  emerge slowly and subtly. As cluster topology and client workload grow,
  mysterious and intermittent connection failures and performance glitches
  occur more and more, especially at certain times of the day. To begin taking
  the measure of your problem, check the ``syslog`` history for "table full"
  events. One way to address this kind of problem is as follows: First, use the
  ``sysctl`` utility to assign ``nf_conntrack_max`` a much higher value. Next,
  raise the value of ``nf_conntrack_buckets`` so that ``nf_conntrack_buckets``
  × 8 = ``nf_conntrack_max``; this action might require running commands
  outside of ``sysctl`` (for example, ``echo 131072 >
  /sys/module/nf_conntrack/parameters/hashsize``). A sketch of these settings
  appears after this list. Another way to address the problem is to blacklist
  the associated kernel modules in order to disable processing altogether. This
  approach is powerful, but fragile. The modules and the order in which the
  modules must be listed can vary among kernel versions. Even when blacklisted,
  ``iptables`` and ``docker`` might sometimes activate connection tracking
  anyway, so we advise a "set and forget" strategy for the tunables. On modern
  systems, this approach will not consume appreciable resources.

- **Kernel Version:** Identify the kernel version and distribution that are in
  use. By default, Ceph uses third-party tools that might be buggy or come into
  conflict with certain distributions or kernel versions (for example, Google's
  ``gperftools`` and ``TCMalloc``). Check the `OS recommendations`_ and the
  release notes for each Ceph version in order to make sure that you have
  addressed any issues related to your kernel.

- **Segment Fault:** If there is a segment fault, increase log levels and
  restart the problematic daemon(s). If segment faults recur, search the Ceph
  bug tracker `https://tracker.ceph.com/projects/ceph
  <https://tracker.ceph.com/projects/ceph/>`_ and the ``dev`` and
  ``ceph-users`` mailing list archives `https://ceph.io/resources
  <https://ceph.io/resources>`_ to see if others have experienced and reported
  these issues. If this truly is a new and unique failure, post to the ``dev``
  email list and provide the following information: the specific Ceph release
  being run, ``ceph.conf`` (with secrets XXX'd out), your monitor status
  output, and excerpts from your log file(s).

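As mentioned in the ``nf_conntrack`` item above, the runtime changes might look
like the following sketch. The values are only illustrative (they extend the
``131072`` hashsize example so that the bucket count × 8 equals
``nf_conntrack_max``); choose values appropriate to your own workload:

.. prompt:: bash

   sysctl -w net.netfilter.nf_conntrack_max=1048576
   echo 131072 > /sys/module/nf_conntrack/parameters/hashsize

To make the ``nf_conntrack_max`` setting persistent, add it to a file under
``/etc/sysctl.d``, just as with the ``kernel.pid_max`` example above.
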
An OSD Failed
-------------
.. _Monitor/OSD Interaction: ../../configuration/mon-osd-interaction
.. _Monitor Config Reference: ../../configuration/mon-config-ref
.. _monitoring your OSDs: ../../operations/monitoring-osd-pg
.. _monitoring OSDs: ../../operations/monitoring-osd-pg/#monitoring-osds
.. _subscribe to the ceph-devel email list: mailto:majordomo@vger.kernel.org?body=subscribe+ceph-devel
.. _unsubscribe from the ceph-devel email list: mailto:majordomo@vger.kernel.org?body=unsubscribe+ceph-devel
.. _subscribe to the ceph-users email list: mailto:ceph-users-join@lists.ceph.com