From: Kotresh HR Date: Thu, 24 Jul 2025 09:33:06 +0000 (+0000) Subject: mds: Fix readdir when osd is full. X-Git-Url: http://git.apps.os.sepia.ceph.com/?a=commitdiff_plain;h=844c21b9a5baab60b0f014ea5fe491317ad97b09;p=ceph-ci.git mds: Fix readdir when osd is full. Problem: The readdir wouldn't list all the entries in the directory when the osd is full with rstats enabled. Cause: The issue happens only in multi-mds cephfs cluster. If rstats is enabled, the readdir would request 'Fa' cap on every dentry, basically to fetch the size of the directories. Note that 'Fa' is CEPH_CAP_GWREXTEND which maps to CEPH_CAP_FILE_WREXTEND and is used by CEPH_STAT_RSTAT. The request for the cap is a getattr call and it need not go to the auth mds. If rstats is enabled, the getattr would go with the mask CEPH_STAT_RSTAT which mandates the requirement for auth-mds in 'handle_client_getattr', so that the request gets forwarded to auth mds if it's not the auth. But if the osd is full, the indode is fetched in the 'dispatch_client_request' before calling the handler function of respective op, to check the FULL cap access for certain metadata write operations. If the inode doesn't exist, ESTALE is returned. This is wrong for the operations like getattr, where the inode might not be in memory on the non-auth mds and returning ESTALE is confusing and client wouldn't retry. This is introduced by the commit 6db81d8479b539d which fixes subvolume deletion when osd is full. Fix: Fetch the inode required for the FULL cap access check for the relevant operations in osd full scenario. This makes sense because all the operations would mostly be preceded with lookup and load the inode in memory or they would handle ESTALE gracefully. Fixes: https://tracker.ceph.com/issues/72260 Introduced-by: 6db81d8479b539d3ca6b98dc244c525e71a36437 Signed-off-by: Kotresh HR (cherry picked from commit 1ca8f334f944ff78ba12894f385ffb8c1932901c) --- diff --git a/src/mds/Server.cc b/src/mds/Server.cc index 6dd9b330972..d0f799425d7 100644 --- a/src/mds/Server.cc +++ b/src/mds/Server.cc @@ -2765,11 +2765,6 @@ void Server::dispatch_client_request(const MDRequestRef& mdr) } if (is_full) { - CInode *cur = try_get_auth_inode(mdr, req->get_filepath().get_ino()); - if (!cur) { - // the request is already responded to - return; - } if (req->get_op() == CEPH_MDS_OP_SETLAYOUT || req->get_op() == CEPH_MDS_OP_SETDIRLAYOUT || req->get_op() == CEPH_MDS_OP_SETLAYOUT || @@ -2782,7 +2777,18 @@ void Server::dispatch_client_request(const MDRequestRef& mdr) req->get_op() == CEPH_MDS_OP_RENAME) && (!mdr->has_more() || mdr->more()->witnessed.empty())) // haven't started peer request ) { - + /* + * The inode fetch below is specific to the operations above and the inode is + * expected to be in memory as these operations are likely preceded by lookup. + * Doing this generically outside the condition was incorrect as the ops like + * getattr might not have the inode in memory as this could be a non-auth mds + * and fails with ESTALE confusing the client without forwarding to the auth mds. + */ + CInode *cur = try_get_auth_inode(mdr, req->get_filepath().get_ino()); + if (!cur) { + // the request is already responded to + return; + } if (check_access(mdr, cur, MAY_FULL)) { dout(20) << __func__ << ": full, has FULL caps, permitting op " << ceph_mds_op_name(req->get_op()) << dendl; } else {