Troubleshooting OSDs
======================
Before troubleshooting the cluster's OSDs, check the monitors and the network.

First, determine whether the monitors have a quorum: run the ``ceph health``
command or the ``ceph -s`` command. If Ceph shows ``HEALTH_OK``, then the
monitors have a quorum.

If the monitors don't have a quorum, or if there are errors with the monitor
status, address the monitor issues before proceeding by consulting the material
in `Troubleshooting Monitors <../troubleshooting-mon>`_.

Next, check your networks to make sure that they are running properly. Networks
can have a significant impact on OSD operation and performance. Look for
dropped packets on the host side and CRC errors on the switch side.

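A quick way to check for such problems on an OSD host is to inspect the
network interface counters. The following is only a sketch: the interface name
``eth0`` is an assumption, and the counters reported by ``ethtool`` vary by
driver:

.. prompt:: bash

   ip -s link show eth0
   ethtool -S eth0 | grep -i -e drop -e error
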
Obtaining Data About OSDs
=========================

When troubleshooting OSDs, it is useful to collect different kinds of
information about the OSDs. Some information comes from the practice of
`monitoring OSDs`_ (for example, by running the ``ceph osd tree`` command).
Additional information concerns the topology of your cluster, and is discussed
in the following sections.

Ceph Logs
---------

Ceph log files are stored under ``/var/log/ceph``. Unless the path has been
changed (or you are in a containerized environment that stores logs in a
different location), the log files can be listed by running the following
command:

.. prompt:: bash

   ls /var/log/ceph

If there is not enough log detail, change the logging level. To ensure that
Ceph performs adequately under high logging volume, see `Logging and
Debugging`_.

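For example, one way to raise an OSD's debug level at runtime (the daemon name
``osd.0`` and the level ``20`` are merely illustrative) is to run the following
command:

.. prompt:: bash

   ceph tell osd.0 config set debug_osd 20
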
Admin Socket
------------

Use the admin socket tool to retrieve runtime information. First, list the
sockets of Ceph's daemons by running the following command:

.. prompt:: bash

   ls /var/run/ceph

Next, run a command of the following form (replacing ``{daemon-name}`` with the
name of a specific daemon: for example, ``osd.0``):

.. prompt:: bash

   ceph daemon {daemon-name} help

Alternatively, run the command with a ``{socket-file}`` specified (a "socket
file" is a specific file in ``/var/run/ceph``):

.. prompt:: bash

   ceph daemon {socket-file} help

The admin socket makes many tasks possible, including:

- Listing Ceph configuration at runtime
- Dumping historic operations
- Dumping the operation priority queue state
- Dumping operations in flight
- Dumping perfcounters

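For example, the following commands perform some of these tasks against a
daemon named ``osd.0``. Admin socket commands must be run on the host where the
daemon is running:

.. prompt:: bash

   ceph daemon osd.0 config show
   ceph daemon osd.0 dump_ops_in_flight
   ceph daemon osd.0 dump_historic_ops
   ceph daemon osd.0 perf dump
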
Display Free Space
------------------

Filesystem issues may arise. To display your filesystems' free space, run the
following command:

.. prompt:: bash

   df -h

To see this command's supported syntax and options, run ``df --help``.

I/O Statistics
--------------

The `iostat`_ tool can be used to identify I/O-related issues. Run the
following command:

.. prompt:: bash

   iostat -x

Diagnostic Messages
-------------------

To retrieve diagnostic messages from the kernel, run the ``dmesg`` command and
filter its output with ``less``, ``more``, ``grep``, or ``tail``. For example:

.. prompt:: bash

   dmesg | grep scsi

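Human-readable timestamps can help correlate kernel events with entries in the
Ceph logs. The flag and the search terms below are only one possible
combination:

.. prompt:: bash

   dmesg -T | grep -i -e 'i/o error' -e 'blocked for more than'
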
Stopping without Rebalancing
============================

It might occasionally be necessary to perform maintenance on a subset of your
cluster or to resolve a problem that affects a failure domain (for example, a
rack). However, when you stop OSDs for maintenance, you might want to prevent
CRUSH from automatically rebalancing the cluster. To avert this rebalancing
behavior, set the cluster to ``noout`` by running the following command:

.. prompt:: bash

   ceph osd set noout

.. warning:: This is more a thought exercise offered for the purpose of giving
   the reader a sense of failure domains and CRUSH behavior than a suggestion
   that anyone in the post-Luminous world run ``ceph osd set noout``. When the
   OSDs return to an ``up`` state, rebalancing will resume and the change
   introduced by the ``ceph osd set noout`` command will be reverted.

In Luminous and later releases, however, it is safer to flag only the affected
OSDs. To add or remove a ``noout`` flag for a specific OSD, run a command of
the following form:

.. prompt:: bash

   ceph osd add-noout osd.0
   ceph osd rm-noout osd.0

It is also possible to flag an entire CRUSH bucket. For example, if you plan to
take down ``prod-ceph-data1701`` in order to add RAM, you might run the
following command:

.. prompt:: bash

   ceph osd set-group noout prod-ceph-data1701

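Whichever method is used, it can be helpful to confirm which flags are
currently in effect. The flags appear in the OSD map and in the cluster's
health output (the exact formatting varies between releases):

.. prompt:: bash

   ceph osd dump | grep flags
   ceph -s
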
After the flag is set, stop the OSDs and any other colocated Ceph services
within the failure domain that requires maintenance work::

   systemctl stop ceph\*.service ceph\*.target

.. note:: When an OSD is stopped, any placement groups within the OSD are
   marked as ``degraded``.

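While the OSDs are down, the degraded placement groups and the ``noout`` flags
can be watched from the cluster status. For example:

.. prompt:: bash

   ceph -s
   ceph pg stat
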
After the maintenance is complete, restart the OSDs and any other daemons that
were stopped. However, if the host was rebooted as part of the maintenance,
these daemons will come back up automatically and do not need to be restarted
manually. To restart OSDs or other daemons, use a command of the following
form:

.. prompt:: bash

   sudo systemctl start ceph.target

Finally, unset the ``noout`` flag as needed by running commands like the
following:

.. prompt:: bash

   ceph osd unset noout
   ceph osd unset-group noout prod-ceph-data1701

Many contemporary Linux distributions employ ``systemd`` for service
management. However, for certain operating systems (especially older ones) it
might be necessary to issue equivalent ``service`` or ``start``/``stop``
commands.

.. _osd-not-running:

OSD Not Running
===============

Under normal conditions, restarting a ``ceph-osd`` daemon will allow it to
rejoin the cluster and recover.

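On a host that manages OSDs with ``systemd`` and uses the conventional
``ceph-osd@<id>`` unit names (containerized deployments name their units
differently), restarting a single OSD might look like the following, with
``osd.0`` used as an example:

.. prompt:: bash

   sudo systemctl restart ceph-osd@0
   ceph osd tree

The second command is simply a quick way to confirm that the OSD has been
marked ``up`` again.
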
An OSD Won't Start
------------------

If the cluster has started but an OSD isn't starting, check the following:

- **Configuration File:** If you were not able to get OSDs running from a new
  installation, check your configuration file to ensure it conforms to the
  standard (for example, make sure that it says ``host`` and not ``hostname``,
  etc.).

- **Check Paths:** Ensure that the paths specified in the configuration
  correspond to the paths for data and metadata that actually exist (for
  example, the paths to the journals, the WAL, and the DB). If the OSD data has
  been separated from its metadata and there are errors in the configuration
  file or in the actual mounts, it might be difficult to start the OSDs. To
  store the metadata on a separate block device, partition or LVM the drive and
  assign one partition per OSD.

- **Check Max Threadcount:** If the cluster has a node with an especially high
  number of OSDs, it might be hitting the default maximum number of threads
  (usually 32,000). This is especially likely to happen during recovery.
  Increasing the maximum number of threads to the maximum possible number of
  threads allowed (4194303) might help with the problem. To increase the number
  of threads to the maximum, run the following command:

  .. prompt:: bash

     sysctl -w kernel.pid_max=4194303

  If this increase resolves the issue, you must make the increase permanent by
  including a ``kernel.pid_max`` setting either in a file under
  ``/etc/sysctl.d`` or within the master ``/etc/sysctl.conf`` file. For
  example::

     kernel.pid_max = 4194303

- **Check ``nf_conntrack``:** This connection-tracking and connection-limiting
  system causes problems for many production Ceph clusters. The problems often
  emerge slowly and subtly. As cluster topology and client workload grow,
  mysterious and intermittent connection failures and performance glitches
  occur more and more, especially at certain times of the day. To begin taking
  the measure of your problem, check the ``syslog`` history for "table full"
  events. One way to address this kind of problem is as follows: First, use the
  ``sysctl`` utility to assign ``nf_conntrack_max`` a much higher value. Next,
  raise the value of ``nf_conntrack_buckets`` so that ``nf_conntrack_buckets``
  × 8 = ``nf_conntrack_max``; this action might require running commands
  outside of ``sysctl`` (for example, ``echo 131072 >
  /sys/module/nf_conntrack/parameters/hashsize``). A sketch of these settings
  appears after this list. Another way to address the problem is to blacklist
  the associated kernel modules in order to disable processing altogether. This
  approach is powerful, but fragile. The modules and the order in which the
  modules must be listed can vary among kernel versions. Even when blacklisted,
  ``iptables`` and ``docker`` might sometimes activate connection tracking
  anyway, so we advise a "set and forget" strategy for the tunables. On modern
  systems, this approach will not consume appreciable resources.

- **Kernel Version:** Identify the kernel version and distribution that are in
  use. By default, Ceph uses third-party tools that might be buggy or come into
  conflict with certain distributions or kernel versions (for example, Google's
  ``gperftools`` and ``TCMalloc``). Check the `OS recommendations`_ and the
  release notes for each Ceph version in order to make sure that you have
  addressed any issues related to your kernel.

- **Segment Fault:** If there is a segment fault, increase log levels and
  restart the problematic daemon(s). If segment faults recur, search the Ceph
  bug tracker `https://tracker.ceph.com/projects/ceph
  <https://tracker.ceph.com/projects/ceph/>`_ and the ``dev`` and
  ``ceph-users`` mailing list archives `https://ceph.io/resources
  <https://ceph.io/resources>`_ to see if others have experienced and reported
  these issues. If this truly is a new and unique failure, post to the ``dev``
  email list and provide the following information: the specific Ceph release
  being run, ``ceph.conf`` (with secrets XXX'd out), your monitor status
  output, and excerpts from your log file(s).

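As mentioned in the ``nf_conntrack`` item above, the runtime changes might look
like the following sketch. The values are only illustrative (they extend the
``131072`` hashsize example so that the bucket count × 8 equals
``nf_conntrack_max``); choose values appropriate to your own workload:

.. prompt:: bash

   sysctl -w net.netfilter.nf_conntrack_max=1048576
   echo 131072 > /sys/module/nf_conntrack/parameters/hashsize

To make the ``nf_conntrack_max`` setting persistent, add it to a file under
``/etc/sysctl.d``, just as with the ``kernel.pid_max`` example above.
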
An OSD Failed
-------------
.. _Monitor/OSD Interaction: ../../configuration/mon-osd-interaction
.. _Monitor Config Reference: ../../configuration/mon-config-ref
.. _monitoring your OSDs: ../../operations/monitoring-osd-pg
.. _monitoring OSDs: ../../operations/monitoring-osd-pg/#monitoring-osds
.. _subscribe to the ceph-devel email list: mailto:majordomo@vger.kernel.org?body=subscribe+ceph-devel
.. _unsubscribe from the ceph-devel email list: mailto:majordomo@vger.kernel.org?body=unsubscribe+ceph-devel
.. _subscribe to the ceph-users email list: mailto:ceph-users-join@lists.ceph.com