From: Zac Dover
Date: Sun, 10 Aug 2025 08:32:03 +0000 (+1000)
Subject: doc/cephfs: edit troubleshooting.rst
X-Git-Tag: v20.1.1~104^2
X-Git-Url: http://git-server-git.apps.pok.os.sepia.ceph.com/?a=commitdiff_plain;h=refs%2Fpull%2F65089%2Fhead;p=ceph.git

doc/cephfs: edit troubleshooting.rst

Edit the section "The MDS" in the file doc/cephfs/troubleshooting.rst.

Signed-off-by: Zac Dover
(cherry picked from commit a17fd3f41b3e4a7ea5160bcdfe7a8b43b4b0f2e3)
---

diff --git a/doc/cephfs/troubleshooting.rst b/doc/cephfs/troubleshooting.rst
index 79e6682d1abed..6df90f87ba110 100644
--- a/doc/cephfs/troubleshooting.rst
+++ b/doc/cephfs/troubleshooting.rst
@@ -238,20 +238,27 @@ See the :ref:`RADOS troubleshooting documentation`.
 The MDS
 =======

-If an operation is hung inside the MDS, it will eventually show up in ``ceph health``,
-identifying "slow requests are blocked". It may also identify clients as
-"failing to respond" or misbehaving in other ways. If the MDS identifies
-specific clients as misbehaving, you should investigate why they are doing so.
+Run the ``ceph health`` command. Any operation that is hung in the MDS is
+indicated by the ``slow requests are blocked`` message.

-Generally it will be the result of
+Messages that read ``failing to respond`` indicate that a client is failing to
+respond.

-#. Overloading the system (if you have extra RAM, increase the
-   "mds cache memory limit" config from its default 1GiB; having a larger active
-   file set than your MDS cache is the #1 cause of this!).
+The following list details potential causes of hung operations:

-#. Running an older (misbehaving) client.
+#. The system is overloaded. The most likely cause of system overload is an
+   active file set that is larger than the MDS cache.
+
+   If you have extra RAM, increase the ``mds_cache_memory_limit``. The specific
+   tunable ``mds_cache_memory_limit`` is discussed in the :ref:`MDS Cache
+   Size`. Read the :ref:`MDS
+   Cache Configuration` section in full before
+   making any alterations to the ``mds_cache_memory_limit`` tunable.

-#. Underlying RADOS issues.
+#. There is an older (misbehaving) client.
+
+#. There are underlying RADOS issues. See :ref:`The RADOS troubleshooting
+   documentation`.

 Otherwise, you have probably discovered a new bug and should report it to the developers!
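
The health check that the patched text describes (run ``ceph health`` and look for ``slow requests are blocked`` and ``failing to respond``) can be scripted. The following is a minimal sketch, not part of the commit. It assumes that ``ceph health detail --format json`` returns a ``checks`` object whose entries carry a ``summary.message`` string and a ``detail`` list of ``{"message": ...}`` objects, and that the two message substrings quoted in the patch appear verbatim in that output. The helper names ``ceph_health_json`` and ``scan_health`` are hypothetical.

```python
import json
import subprocess

# Substrings quoted in the patched troubleshooting text. Whether a given
# Ceph release emits exactly this wording is an assumption.
SLOW_REQUESTS = "slow requests are blocked"
FAILING_TO_RESPOND = "failing to respond"


def ceph_health_json() -> str:
    """Hypothetical helper: fetch health detail as JSON via the ceph CLI.

    Requires the ``ceph`` binary and a usable client keyring on this host.
    """
    return subprocess.run(
        ["ceph", "health", "detail", "--format", "json"],
        capture_output=True, text=True, check=True,
    ).stdout


def scan_health(health_json: str) -> dict:
    """Hypothetical helper: group health messages by matching substring.

    Assumed input layout:
    {"checks": {CODE: {"summary": {"message": ...},
                       "detail": [{"message": ...}, ...]}}}
    """
    report = json.loads(health_json)
    found = {SLOW_REQUESTS: [], FAILING_TO_RESPOND: []}
    for check in report.get("checks", {}).values():
        # Scan both the one-line summary and every per-daemon detail line.
        messages = [check.get("summary", {}).get("message", "")]
        messages += [d.get("message", "") for d in check.get("detail", [])]
        for msg in messages:
            for pattern, hits in found.items():
                if pattern in msg:
                    hits.append(msg)
    return found
```

On a host with admin access, ``scan_health(ceph_health_json())`` returns the matching messages. An empty ``slow requests are blocked`` list suggests that no MDS operation is currently reported as hung. Matching on message text rather than on health check codes keeps the sketch tied to the wording in the documentation, at the cost of breaking if that wording changes.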