From e966aaf6fba0bb4c6f4ce31a3ac5c60b13370fb0 Mon Sep 17 00:00:00 2001
From: Xiubo Li
Date: Fri, 29 Oct 2021 11:46:52 +0800
Subject: [PATCH] doc: update the capabilities doc for cephfs

Add more detail about the caps and explain more for the caps in filelock.

Signed-off-by: Xiubo Li
---
 doc/cephfs/capabilities.rst | 70 +++++++++++++++++++++++++++++++++++--
 1 file changed, 68 insertions(+), 2 deletions(-)

diff --git a/doc/cephfs/capabilities.rst b/doc/cephfs/capabilities.rst
index ac47bb12ba7ad..0db40e51d56e3 100644
--- a/doc/cephfs/capabilities.rst
+++ b/doc/cephfs/capabilities.rst
@@ -16,8 +16,8 @@ the capability grants.
 ::
 
     /* generic cap bits */
-    #define CEPH_CAP_GSHARED     1  /* client can reads (s) */
-    #define CEPH_CAP_GEXCL       2  /* client can read and update (x) */
+    #define CEPH_CAP_GSHARED     1  /* (metadata) client can reads (s) */
+    #define CEPH_CAP_GEXCL       2  /* (metadata) client can read and update (x) */
     #define CEPH_CAP_GCACHE      4  /* (file) client can cache reads (c) */
     #define CEPH_CAP_GRD         8  /* (file) client can read (r) */
     #define CEPH_CAP_GWR        16  /* (file) client can write (w) */
@@ -110,3 +110,69 @@ capabilities. For example:
 The 'p' represents the pin. Each capital letter corresponds to the shift
 values, and the lowercase letters after each shift are for the actual
 capabilities granted in each shift.
+
+The relation between the lock states and the capabilities
+---------------------------------------------------------
+In the MDS there are four different locks for each inode: simplelock,
+scatterlock, filelock and locallock. Each lock has several different lock
+states, and the MDS will issue capabilities to clients based on the lock
+state.
+
+In each state the MDS Locker will always try to issue all the allowed capabilities
+to the clients, even if some capabilities are not needed or wanted by the clients,
+as pre-issuing capabilities can reduce latency in some cases.
+
+If there is only one client, usually it will be the loner client for all the inodes.
+In the multiple-client case, the MDS will try to calculate a loner client for
+each inode depending on the capabilities the clients need or want (needed | wanted),
+but usually it will fail. The loner client will always get all the capabilities.
+
+The filelock controls access permissions for part of a file's metadata and for the
+file contents. This metadata includes **mtime**, **atime**, **size**, etc.
+
+**Fs**: Once a client has it, all other clients are denied **Fw**.
+
+**Fx**: Only the loner client is allowed this capability. Once the lock state transitions
+        to LOCK_EXCL, the loner client is granted this along with all other file capabilities
+        except **Fl**.
+
+**Fr**: Once a client has it, the **Fb** capability will already have been revoked from all
+        the other clients.
+
+        If clients only request to read the file, the lock state will transition
+        directly to the stable LOCK_SYNC state. All the clients can be granted **Fscrl**
+        capabilities from the auth MDS and **Fscr** capabilities from the replica MDSes.
+
+        If multiple clients read from and write to the same file, then the lock state
+        will eventually transition to the stable LOCK_MIX state and all the clients can
+        have the **Frwl** capabilities from the auth MDS, and **Fr** from the replica
+        MDSes. The **Fcb** capabilities won't be granted to any of the clients and the
+        clients will do sync read/write.
+
+**Fw**: If there is no loner client, then once a client has this capability, the **Fsxcb**
+        capabilities won't be granted to other clients.
+
+        If multiple clients read from and write to the same file, then the lock state
+        will eventually transition to the stable LOCK_MIX state and all the clients can
+        have the **Frwl** capabilities from the auth MDS, and **Fr** from the replica
+        MDSes. The **Fcb** capabilities won't be granted to any of the clients and the
+        clients will do sync read/write.
+
+**Fc**: This capability means the clients can cache file reads. It should be issued
+        together with the **Fr** capability, as it only makes sense in that use case.
+        In practice, in some stable or interim transitional states **Fc** tends to stay
+        allowed even when the **Fr** capability isn't granted, as this avoids
+        forcing clients to drop full caches, for example on a simple file size extension
+        or truncation.
+
+**Fb**: This capability means the clients can buffer file writes. It should be issued
+        together with the **Fw** capability, as it only makes sense in that use case.
+        In practice, in some stable or interim transitional states **Fb** tends to stay
+        allowed even when the **Fw** capability isn't granted, as this avoids
+        forcing clients to drop dirty buffers, for example on a simple file size extension
+        or truncation.
+
+**Fl**: This capability means the clients can perform lazy I/O. LazyIO relaxes POSIX
+        semantics. Buffered reads/writes are allowed even when a file is opened by multiple
+        applications on multiple clients. Applications are responsible for managing cache
+        coherency themselves.
-- 
2.39.5
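
Note on the patch: the filelock rules in the added section can be summarized as a small lookup from stable lock state to the capability letters the text says may be issued. The C sketch below is illustrative only and is not the MDS Locker code. The enum names, `filelock_allowed_caps`, `caps_to_letters`, and the bit values used for `b` and `l` are assumptions made for this example; only the `s`, `x`, `c`, `r` and `w` values appear in the `#define` list quoted in the first hunk. Returning no caps to a non-loner client in LOCK_EXCL is also an assumption, since the text only describes the loner.

```c
#include <stddef.h>

/* Generic cap bits. s/x/c/r/w follow the #define list quoted in the patch;
 * the values for b and l are assumed for this sketch. */
enum {
    CAP_S = 1,   /* shared */
    CAP_X = 2,   /* exclusive */
    CAP_C = 4,   /* cache reads */
    CAP_R = 8,   /* read */
    CAP_W = 16,  /* write */
    CAP_B = 32,  /* buffer writes (assumed value) */
    CAP_L = 128  /* lazy I/O (assumed value) */
};

/* Hypothetical names for the three stable filelock states in the text. */
enum filelock_state { FL_SYNC, FL_MIX, FL_EXCL };

/* Return the file caps the text says may be issued in a stable state.
 * is_auth:  nonzero when asking the auth MDS, zero for a replica MDS.
 * is_loner: nonzero when the client is the loner for this inode. */
static unsigned filelock_allowed_caps(enum filelock_state st,
                                      int is_auth, int is_loner)
{
    switch (st) {
    case FL_SYNC: /* readers only: Fscrl from auth, Fscr from replicas */
        return CAP_S | CAP_C | CAP_R | (is_auth ? CAP_L : 0);
    case FL_MIX:  /* mixed readers and writers: Frwl from auth, Fr from replicas */
        return is_auth ? (CAP_R | CAP_W | CAP_L) : CAP_R;
    case FL_EXCL: /* loner gets every file cap except Fl; others assumed empty */
        return is_loner ? (CAP_S | CAP_X | CAP_C | CAP_R | CAP_W | CAP_B) : 0;
    }
    return 0;
}

/* Render a cap mask as the lowercase letters used in the doc, e.g. "scr". */
static void caps_to_letters(unsigned caps, char *out, size_t len)
{
    static const struct { unsigned bit; char letter; } map[] = {
        { CAP_S, 's' }, { CAP_X, 'x' }, { CAP_C, 'c' }, { CAP_R, 'r' },
        { CAP_W, 'w' }, { CAP_B, 'b' }, { CAP_L, 'l' },
    };
    size_t n = 0;

    if (len == 0)
        return;
    for (size_t i = 0; i < sizeof(map) / sizeof(map[0]); i++) {
        if ((caps & map[i].bit) && n + 1 < len)
            out[n++] = map[i].letter;
    }
    out[n] = '\0';
}
```

Calling `caps_to_letters(filelock_allowed_caps(FL_MIX, 1, 0), buf, sizeof(buf))` fills `buf` with the letters for a client talking to the auth MDS in LOCK_MIX. The lookup covers only the stable states the text names; interim transitional states, where **Fc** or **Fb** may stay allowed without **Fr** or **Fw**, are deliberately left out.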