From: Zac Dover
Date: Sun, 10 Aug 2025 08:32:03 +0000 (+1000)
Subject: doc/cephfs: edit troubleshooting.rst
X-Git-Tag: testing/wip-vshankar-testing-20250821.112602-debug~29^2
X-Git-Url: http://git.apps.os.sepia.ceph.com/?a=commitdiff_plain;h=a17fd3f41b3e4a7ea5160bcdfe7a8b43b4b0f2e3;p=ceph-ci.git

doc/cephfs: edit troubleshooting.rst

Edit the section "The MDS" in the file
doc/cephfs/troubleshooting.rst.

Signed-off-by: Zac Dover
---

diff --git a/doc/cephfs/troubleshooting.rst b/doc/cephfs/troubleshooting.rst
index 8d47f4e4a68..29d5788dba7 100644
--- a/doc/cephfs/troubleshooting.rst
+++ b/doc/cephfs/troubleshooting.rst
@@ -222,20 +222,27 @@ problems first (:doc:`../../rados/troubleshooting/index`).
 The MDS
 =======
 
-If an operation is hung inside the MDS, it will eventually show up in ``ceph health``,
-identifying "slow requests are blocked". It may also identify clients as
-"failing to respond" or misbehaving in other ways. If the MDS identifies
-specific clients as misbehaving, you should investigate why they are doing so.
+Run the ``ceph health`` command. Any operation that is hung in the MDS is
+indicated by the ``slow requests are blocked`` message.
 
-Generally it will be the result of
+Messages that read ``failing to respond`` indicate that a client is failing to
+respond.
 
-#. Overloading the system (if you have extra RAM, increase the
-   "mds cache memory limit" config from its default 1GiB; having a larger active
-   file set than your MDS cache is the #1 cause of this!).
+The following list details potential causes of hung operations:
 
-#. Running an older (misbehaving) client.
+#. The system is overloaded. The most likely cause of system overload is an
+   active file set that is larger than the MDS cache.
+
+   If you have extra RAM, increase the ``mds_cache_memory_limit``. The specific
+   tunable ``mds_cache_memory_limit`` is discussed in the :ref:`MDS Cache
+   Size`. Read the :ref:`MDS
+   Cache Configuration` section in full before
+   making any alterations to the ``mds_cache_memory_limit`` tunable.
 
-#. Underlying RADOS issues.
+#. There is an older (misbehaving) client.
+
+#. There are underlying RADOS issues. See :ref:`The RADOS troubleshooting
+   documentation`.
 
 Otherwise, you have probably discovered a new bug and should report it to the
 developers!
diff --git a/doc/rados/troubleshooting/index.rst b/doc/rados/troubleshooting/index.rst
index b481ee1dc9c..8b937d8d19d 100644
--- a/doc/rados/troubleshooting/index.rst
+++ b/doc/rados/troubleshooting/index.rst
@@ -1,3 +1,5 @@
+.. _rados_troubleshooting:
+
 =================
 Troubleshooting
 =================