From a17fd3f41b3e4a7ea5160bcdfe7a8b43b4b0f2e3 Mon Sep 17 00:00:00 2001
From: Zac Dover
Date: Sun, 10 Aug 2025 18:32:03 +1000
Subject: [PATCH] doc/cephfs: edit troubleshooting.rst

Edit the section "The MDS" in the file doc/cephfs/troubleshooting.rst.

Signed-off-by: Zac Dover
---
 doc/cephfs/troubleshooting.rst      | 27 +++++++++++++++++----------
 doc/rados/troubleshooting/index.rst |  2 ++
 2 files changed, 19 insertions(+), 10 deletions(-)

diff --git a/doc/cephfs/troubleshooting.rst b/doc/cephfs/troubleshooting.rst
index 8d47f4e4a68..29d5788dba7 100644
--- a/doc/cephfs/troubleshooting.rst
+++ b/doc/cephfs/troubleshooting.rst
@@ -222,20 +222,27 @@ problems first (:doc:`../../rados/troubleshooting/index`).

 The MDS
 =======
-If an operation is hung inside the MDS, it will eventually show up in ``ceph health``,
-identifying "slow requests are blocked". It may also identify clients as
-"failing to respond" or misbehaving in other ways. If the MDS identifies
-specific clients as misbehaving, you should investigate why they are doing so.
+Run the ``ceph health`` command. Any operation that is hung in the MDS is
+indicated by the ``slow requests are blocked`` message.

-Generally it will be the result of
+Messages that read ``failing to respond`` indicate that a client is failing to
+respond.

-#. Overloading the system (if you have extra RAM, increase the
-   "mds cache memory limit" config from its default 1GiB; having a larger active
-   file set than your MDS cache is the #1 cause of this!).
+The following list details potential causes of hung operations:

-#. Running an older (misbehaving) client.
+#. The system is overloaded. The most likely cause of system overload is an
+   active file set that is larger than the MDS cache.
+
+   If you have extra RAM, increase the ``mds_cache_memory_limit``. The specific
+   tunable ``mds_cache_memory_limit`` is discussed in the :ref:`MDS Cache
+   Size`. Read the :ref:`MDS
+   Cache Configuration` section in full before
+   making any alterations to the ``mds_cache_memory_limit`` tunable.

-#. Underlying RADOS issues.
+#. There is an older (misbehaving) client.
+
+#. There are underlying RADOS issues. See :ref:`The RADOS troubleshooting
+   documentation`.

 Otherwise, you have probably discovered a new bug and should report it to
 the developers!
diff --git a/doc/rados/troubleshooting/index.rst b/doc/rados/troubleshooting/index.rst
index b481ee1dc9c..8b937d8d19d 100644
--- a/doc/rados/troubleshooting/index.rst
+++ b/doc/rados/troubleshooting/index.rst
@@ -1,3 +1,5 @@
+.. _rados_troubleshooting:
+
 =================
 Troubleshooting
 =================
-- 
2.39.5
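
Reviewer note: the edited section tells readers to run ``ceph health`` and, if RAM allows, to raise ``mds_cache_memory_limit``. The sketch below is not part of the patch. It only builds the command line and scans health text, and it never calls the cluster. The helper names `cache_limit_command` and `health_has_slow_requests` are hypothetical. It assumes that the option is set with `ceph config set mds ...` and takes a value in bytes. The matched string is copied from the patch text, and the exact health wording may differ between Ceph releases.

```python
"""Sketch only: build, but do not run, the CLI call discussed in the patch."""

GIB = 1024 ** 3  # bytes per GiB


def cache_limit_command(gib):
    """Return the argv list that would set mds_cache_memory_limit to `gib` GiB.

    Assumption: the option is given in bytes, so the GiB value is converted.
    """
    if gib <= 0:
        raise ValueError("cache limit must be a positive number of GiB")
    return ["ceph", "config", "set", "mds",
            "mds_cache_memory_limit", str(gib * GIB)]


def health_has_slow_requests(health_output):
    """Return True if the text contains the message quoted in the patch."""
    return "slow requests are blocked" in health_output


if __name__ == "__main__":
    # Print the command for an 8 GiB cache instead of executing it.
    print(" ".join(cache_limit_command(8)))
```

Printing the command rather than executing it leaves room to read the MDS cache configuration section first, as the patched text advises.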