From: Zac Dover
Date: Sat, 9 Aug 2025 00:25:31 +0000 (+1000)
Subject: doc/cephfs: edit troubleshooting.rst
X-Git-Tag: v20.1.1~103^2
X-Git-Url: http://git-server-git.apps.pok.os.sepia.ceph.com/?a=commitdiff_plain;h=refs%2Fpull%2F65086%2Fhead;p=ceph.git

doc/cephfs: edit troubleshooting.rst

Edit the section "Slow Requests (MDS)" in doc/cephfs/troubleshooting.rst.

Signed-off-by: Zac Dover
(cherry picked from commit edb3d2be60fd38a1957878bdfa9a9d9d415cc94c)
---

diff --git a/doc/cephfs/troubleshooting.rst b/doc/cephfs/troubleshooting.rst
index 79e6682d1abe..e04cf389ea3a 100644
--- a/doc/cephfs/troubleshooting.rst
+++ b/doc/cephfs/troubleshooting.rst
@@ -260,24 +260,30 @@ the developers!
 
 Slow requests (MDS)
 -------------------
-You can list current operations via the admin socket by running::
+List current operations via the admin socket by running the following command
+from the MDS host:
 
-  ceph daemon mds. dump_ops_in_flight
+.. prompt:: bash #
+
+   ceph daemon mds. dump_ops_in_flight
 
-from the MDS host. Identify the stuck commands and examine why they are stuck.
+Identify the stuck commands and examine why they are stuck.
 Usually the last "event" will have been an attempt to gather locks, or sending
-the operation off to the MDS log. If it is waiting on the OSDs, fix them. If
-operations are stuck on a specific inode, you probably have a client holding
-caps which prevent others from using it, either because the client is trying
-to flush out dirty data or because you have encountered a bug in CephFS'
-distributed file lock code (the file "capabilities" ["caps"] system).
-
-If it's a result of a bug in the capabilities code, restarting the MDS
-is likely to resolve the problem.
-
-If there are no slow requests reported on the MDS, and it is not reporting
-that clients are misbehaving, either the client has a problem or its
-requests are not reaching the MDS.
+the operation off to the MDS log. If it is waiting on the OSDs, fix them.
+
+If operations are stuck on a specific inode, then a client is likely holding
+capabilities, preventing its use by other clients. This situation can be caused
+by a client trying to flush dirty data, but it might be caused because you have
+encountered a bug in the distributed file lock code (the file "capabilities"
+["caps"] system) of CephFS.
+
+If you have determined that the commands are stuck because of a bug in the
+capabilities code, restart the MDS. Restarting the MDS is likely to resolve the
+problem.
+
+If there are no slow requests reported on the MDS, and there is no indication
+that clients are misbehaving, then either there is a problem with the client
+or the client's requests are not reaching the MDS.
 
 .. _ceph_fuse_debugging:
 
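
Note on the edited section: the triage it describes (list the operations in flight, pick out the stuck ones, then read the last "event" of each) could be scripted roughly as below. This is only a sketch, not part of the commit. It assumes that the JSON printed by ``dump_ops_in_flight`` contains an ``ops`` list whose entries carry ``description``, ``age``, and ``type_data.events`` fields; check these names against the output of your Ceph release. The helper name ``summarize_stuck_ops`` and the 30-second threshold are hypothetical.

```python
def summarize_stuck_ops(dump, min_age=30.0):
    """Return (age, description, last_event) tuples for long-running ops.

    `dump` is the already-parsed JSON output of `dump_ops_in_flight`.
    The field names used here are assumptions about that output, so every
    lookup falls back to a default instead of raising KeyError.
    """
    stuck = []
    for op in dump.get("ops", []):
        age = float(op.get("age", 0.0))
        if age < min_age:
            # Younger than the (hypothetical) threshold: not treated as stuck.
            continue
        events = op.get("type_data", {}).get("events", [])
        # The last recorded event usually hints at what the op is waiting on.
        last_event = events[-1].get("event", "unknown") if events else "unknown"
        stuck.append((age, op.get("description", ""), last_event))
    # Oldest operations first.
    stuck.sort(key=lambda item: item[0], reverse=True)
    return stuck
```

To try it, save the command output to a file and pass the result of ``json.load`` on that file to ``summarize_stuck_ops``. A last event that mentions waiting on locks points toward the capabilities case described in the section, while one that mentions the MDS log or the OSDs points toward storage.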