From bddde34f1ae887c501f006e3ff45e1c993f38596 Mon Sep 17 00:00:00 2001
From: Zac Dover
Date: Sat, 9 Aug 2025 10:25:31 +1000
Subject: [PATCH] doc/cephfs: edit troubleshooting.rst

Edit the section "Slow Requests (MDS)" in doc/cephfs/troubleshooting.rst.

Signed-off-by: Zac Dover
(cherry picked from commit edb3d2be60fd38a1957878bdfa9a9d9d415cc94c)
---
 doc/cephfs/troubleshooting.rst | 36 ++++++++++++++++++++--------------
 1 file changed, 21 insertions(+), 15 deletions(-)

diff --git a/doc/cephfs/troubleshooting.rst b/doc/cephfs/troubleshooting.rst
index 79e6682d1abe..e04cf389ea3a 100644
--- a/doc/cephfs/troubleshooting.rst
+++ b/doc/cephfs/troubleshooting.rst
@@ -260,24 +260,30 @@ the developers!
 Slow requests (MDS)
 -------------------
 
-You can list current operations via the admin socket by running::
+List current operations via the admin socket by running the following command
+from the MDS host:
 
-  ceph daemon mds. dump_ops_in_flight
+.. prompt:: bash #
+
+   ceph daemon mds. dump_ops_in_flight
 
-from the MDS host. Identify the stuck commands and examine why they are stuck.
+Identify the stuck commands and examine why they are stuck.
 Usually the last "event" will have been an attempt to gather locks, or sending
-the operation off to the MDS log. If it is waiting on the OSDs, fix them. If
-operations are stuck on a specific inode, you probably have a client holding
-caps which prevent others from using it, either because the client is trying
-to flush out dirty data or because you have encountered a bug in CephFS'
-distributed file lock code (the file "capabilities" ["caps"] system).
-
-If it's a result of a bug in the capabilities code, restarting the MDS
-is likely to resolve the problem.
-
-If there are no slow requests reported on the MDS, and it is not reporting
-that clients are misbehaving, either the client has a problem or its
-requests are not reaching the MDS.
+the operation off to the MDS log. If it is waiting on the OSDs, fix them.
+
+If operations are stuck on a specific inode, then a client is likely holding
+capabilities, preventing its use by other clients. This situation can be caused
+by a client trying to flush dirty data, but it might be caused because you have
+encountered a bug in the distributed file lock code (the file "capabilities"
+["caps"] system) of CephFS.
+
+If you have determined that the commands are stuck because of a bug in the
+capabilities code, restart the MDS. Restarting the MDS is likely to resolve the
+problem.
+
+If there are no slow requests reported on the MDS, and there is no indication
+that clients are misbehaving, then either there is a problem with the client
+or the client's requests are not reaching the MDS.
 
 .. _ceph_fuse_debugging:
 
-- 
2.47.3
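
The "identify the stuck commands" step described in the patched section can be
sketched as a small script. This is a minimal illustration and not part of the
patch. It assumes the JSON printed by dump_ops_in_flight has a top-level "ops"
list whose entries carry "description", "age", and a "type_data" object with an
"events" list of {"time", "event"} records. That layout is an assumption and
may differ between Ceph releases. The helper name summarize_ops and the sample
data are hypothetical.

```python
import json


def summarize_ops(dump):
    """Return (age, description, last_event) tuples, oldest operation first.

    Assumes the layout described above; missing fields fall back to
    placeholders instead of raising.
    """
    rows = []
    for op in dump.get("ops", []):
        type_data = op.get("type_data")
        events = type_data.get("events", []) if isinstance(type_data, dict) else []
        # The last recorded event is usually the most useful hint about
        # where the operation is stuck (for example, waiting on a lock).
        last_event = events[-1].get("event", "?") if events else "?"
        rows.append((float(op.get("age", 0.0)), op.get("description", "?"), last_event))
    return sorted(rows, key=lambda row: row[0], reverse=True)


# Hypothetical sample, shaped like the assumed output; not real MDS data.
SAMPLE = json.loads("""
{
  "ops": [
    {"description": "client_request(example getattr)", "age": 2.5,
     "type_data": {"events": [{"time": "t0", "event": "initiated"}]}},
    {"description": "client_request(example setattr)", "age": 120.0,
     "type_data": {"events": [{"time": "t0", "event": "initiated"},
                              {"time": "t1", "event": "failed to xlock, waiting"}]}}
  ],
  "num_ops": 2
}
""")

for age, description, last_event in summarize_ops(SAMPLE):
    print(f"{age:8.1f}s  {last_event:<28}  {description}")
```

To use this against a live daemon, one option is to save the command output to
a file and pass the result of json.load() to summarize_ops(). Operations with a
large age whose last event mentions a lock are the ones to examine first.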