Most of the time this guide will work, but sometimes all MDSs lock up and you
cannot actually see them spill. It is much better to run this on a cluster.
As a prerequisite, we assume you have installed `mdtest
<https://sourceforge.net/projects/mdtest/>`_ or pulled the `Docker image
<https://hub.docker.com/r/michaelsevilla/mdtest/>`_. We use mdtest because we
need to generate enough load to get over the MIN_OFFLOAD threshold that is
set in the MDBalancer.
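For example, metadata load can be generated by starting a few mdtest clients
from the Docker image. This is only a sketch: the mount point `/ceph`, the
container names, and the mdtest parameters are illustrative, and we assume the
image runs mdtest as its entrypoint:

::

    # Start three containerized mdtest clients against a CephFS mount at /ceph.
    # -F: operate on files only; -C: create phase only; -n: items per process.
    for i in 0 1 2; do
      docker run -d --name=client$i \
        -v /ceph:/ceph \
        michaelsevilla/mdtest \
        -F -C -n 100000 -d "/ceph/client-test$i"
    done
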
6. When you are done, you can kill all the clients with:

   ::
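
      # A sketch, assuming the load clients were started as Docker containers
      # named client0, client1 and client2 (adjust names and count to your setup):
      for i in 0 1 2; do docker rm -f client$i; done
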
We do not want the error propagating up the call chain. The
cls_lua class wants to handle the error itself because it must fail gracefully.
For Mantle, we do not care if a Lua error crashes our balancer -- in that
case, we will fall back to the original balancer.
The performance improvement of using `lua_call` over `lua_pcall` would not be
leveraged here because the balancer is invoked every 10 seconds by default.
RADOS Health
============
If part of the CephFS metadata or data pools is unavailable and CephFS is not
responding, it is probably because RADOS itself is unhealthy. Resolve those
problems first (:doc:`../../rados/troubleshooting/index`).
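A quick way to confirm this is to check the overall cluster health and the
pools backing CephFS. This is only a sketch; the pool names below are common
defaults and may differ on your cluster:

::

    ceph status
    ceph health detail
    # Inspect the pools backing CephFS (names vary; these are common defaults):
    ceph osd pool stats cephfs_metadata
    ceph osd pool stats cephfs_data
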
the operation off to the MDS log. If it is waiting on the OSDs, fix them. If
operations are stuck on a specific inode, you probably have a client holding
caps which prevent others from using it, either because the client is trying
to flush out dirty data or because you have encountered a bug in CephFS'
distributed file lock code (the file "capabilities" ["caps"] system).
If it is a result of a bug in the capabilities code, restarting the MDS
is likely to resolve the problem.
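To see where an operation is stuck, the MDS admin socket can dump the
operations currently in flight and the requests the MDS itself has outstanding
to the OSDs. The following is a sketch: it assumes an MDS named `a` and must
be run on the node hosting that daemon:

::

    # Operations the MDS is currently processing:
    ceph daemon mds.a dump_ops_in_flight
    # Requests the MDS has outstanding to the OSDs (e.g. journal writes):
    ceph daemon mds.a objecter_requests
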
If there are no slow requests reported on the MDS, and it is not reporting
that clients are misbehaving, either the client has a problem or its
requests are not reaching the MDS.
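To tell these apart, it can help to confirm from the MDS side that the client
actually holds a session. A sketch, again assuming an MDS named `a`:

::

    # List the client sessions known to the MDS:
    ceph daemon mds.a session ls
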
ceph-fuse debugging
===================
* osdc: Dumps the current ops in-flight to OSDs (ie, file data IO)
* osdmap: Dumps the current OSDMap epoch, pools, and OSDs
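For the kernel client, these dumps are exposed through debugfs and can be read
directly (debugfs is normally only accessible by root). A sketch:

::

    # One directory per kernel-client mount, named after the cluster fsid and
    # the client's global id:
    ls /sys/kernel/debug/ceph/
    cat /sys/kernel/debug/ceph/*/osdc
    cat /sys/kernel/debug/ceph/*/osdmap
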
If there are no stuck requests but you have file IO which is not progressing,
you might have a...

Disconnected+Remounted FS
=========================
Because CephFS has a "consistent cache", if your network connection is
disrupted for a long enough time, the client will be forcibly
disconnected from the system. At this point, the kernel client is in
a bind: it cannot safely write back dirty data, and many applications
do not handle IO errors correctly on close().
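One way to confirm that the client has been forcibly disconnected is to check
whether its address appears on the cluster's blocklist. The commands below are
only a sketch; the mount point is illustrative:

::

    # Check whether this client's address has been blocklisted by the cluster
    # (older releases call this a "blacklist"):
    ceph osd blocklist ls

    # If the stale mount is stuck, a forced or lazy unmount may release it:
    umount -f /mnt/cephfs || umount -l /mnt/cephfs
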
At the moment, the kernel client will remount the FS, but outstanding filesystem
IO may or may not be satisfied. In these cases, you may need to reboot your