From e5f197fc06b69412e94ee0d956aaffad9ee1a757 Mon Sep 17 00:00:00 2001
From: Zac Dover
Date: Sun, 10 Aug 2025 18:32:03 +1000
Subject: [PATCH] doc/cephfs: edit troubleshooting.rst

Edit the section "The MDS" in the file doc/cephfs/troubleshooting.rst.

Signed-off-by: Zac Dover
(cherry picked from commit a17fd3f41b3e4a7ea5160bcdfe7a8b43b4b0f2e3)
---
 doc/cephfs/troubleshooting.rst | 27 +++++++++++++++++----------
 1 file changed, 17 insertions(+), 10 deletions(-)

diff --git a/doc/cephfs/troubleshooting.rst b/doc/cephfs/troubleshooting.rst
index c1aa34bddc7..4ac8e997b6f 100644
--- a/doc/cephfs/troubleshooting.rst
+++ b/doc/cephfs/troubleshooting.rst
@@ -248,20 +248,27 @@ See the :ref:`RADOS troubleshooting documentation`.
 The MDS
 =======
 
-If an operation is hung inside the MDS, it will eventually show up in ``ceph health``,
-identifying "slow requests are blocked". It may also identify clients as
-"failing to respond" or misbehaving in other ways. If the MDS identifies
-specific clients as misbehaving, you should investigate why they are doing so.
+Run the ``ceph health`` command. Any operation that is hung in the MDS is
+indicated by the ``slow requests are blocked`` message.
 
-Generally it will be the result of
+Messages that read ``failing to respond`` indicate that a client is failing to
+respond.
 
-#. Overloading the system (if you have extra RAM, increase the
-   "mds cache memory limit" config from its default 1GiB; having a larger active
-   file set than your MDS cache is the #1 cause of this!).
+The following list details potential causes of hung operations:
 
-#. Running an older (misbehaving) client.
+#. The system is overloaded. The most likely cause of system overload is an
+   active file set that is larger than the MDS cache.
+
+   If you have extra RAM, increase the ``mds_cache_memory_limit``. The specific
+   tunable ``mds_cache_memory_limit`` is discussed in the :ref:`MDS Cache
+   Size`. Read the :ref:`MDS
+   Cache Configuration` section in full before
+   making any alterations to the ``mds_cache_memory_limit`` tunable.
 
-#. Underlying RADOS issues.
+#. There is an older (misbehaving) client.
+
+#. There are underlying RADOS issues. See :ref:`The RADOS troubleshooting
+   documentation`.
 
 Otherwise, you have probably discovered a new bug and should report it to
 the developers!
-- 
2.39.5
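
The health check and cache-limit steps described in the patched text can be sketched as shell commands. This is a minimal sketch and not part of the patch. It assumes a node with the `ceph` CLI and admin credentials. The 8 GiB target and the helper `gib_to_bytes` are hypothetical and chosen only for illustration. The `ceph config set` line is left commented out so that nothing changes until the MDS cache configuration documentation has been read.

```shell
#!/bin/sh
# Hypothetical helper: convert whole GiB to bytes, because the
# mds_cache_memory_limit value is passed here as a byte count.
gib_to_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

# Only talk to the cluster if the ceph CLI is present on this node.
if command -v ceph >/dev/null 2>&1; then
    # Look for "slow requests are blocked" or "failing to respond".
    ceph health detail

    # Show the current limit before changing anything.
    ceph config get mds mds_cache_memory_limit

    # Assumed target of 8 GiB; size this to the RAM actually available.
    # ceph config set mds mds_cache_memory_limit "$(gib_to_bytes 8)"
fi
```

Reading the current value first makes it easy to restore the old limit if the larger cache does not clear the slow-request warnings.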