mds: include encoded stray inode when sending dentry unlink message to replicas 46184/head
author Venky Shankar <vshankar@redhat.com>
Wed, 16 Mar 2022 06:08:55 +0000 (02:08 -0400)
committer Venky Shankar <vshankar@redhat.com>
Fri, 6 May 2022 04:14:54 +0000 (09:44 +0530)
commit 805a3674779d0150c06ba45c8caf84e17e131ee5
tree 4381a377fb9c8d1927b845e7e2c03036ec58c403
parent d204189761c9cf64d5ff5976c86ae9b6f566e711
mds: include encoded stray inode when sending dentry unlink message to replicas

The series of events that leads to inaccessible dentries is as follows (requires
some hardlinked files, directory pinning and path-restricted caps).

Assume 3 hardlinked files:

       d0/f0 <-- primary link
       d1/h1
       d1/h2

with multiple active MDSs -- d0 pinned to rank-0 and d1 to rank-1. Reproducing
this requires deleting a non-primary link first, followed by deleting the primary
link and lastly the other non-primary link. If one is unlucky, the last delete
fails with a "Permission denied" error. On the MDS side, this is what happens:

Unlinking the first non-primary link discovers the remote inode and links
it in the dentry as part of lookup. Unlink then sends a remote unlink operation
(_link_remote(op-unlink)) to the auth mds of the remote inode. Now the other
unlinks:

               rank0               |                 rank1
                                   |
                x-----<-----unlink: #unlink(d0/f0)
                |                  |
                v                  |
         _unlink_local()           |
                |                  |
                v                  |
                |                  |
   x ---send_dentry_unlink()       |
   |   reply_client_request()      |
   |                               |
   |                       unlink: #lookup(d1/h2)----x
   |                               |                 |
   |                               |                 v
   |                               |       MDCache::path_traverse()
   v                               |       (uses linked remote inode,
   |                               |       which is not a stray inode)-----x
   |                               |                                       |
   x-------------->----------------|---------->------x                     v
         (dentry_unlink msg)       |                 |                     |
                                   |                 v                     |
                                   |       handle_dentry_unlink()          |
                                   |    (relinks inode to stray dentry)    |
                                   |                                       |
                                   |                 x---------------------x
                                   |                 |
                                   |                 v
                                   |       dispatch_client_request()
                                   |        (uses linked inode under
                                   |  stray dentry, no reintegration check)
                                   |                 |
                                   |                 v
                                   |      SessionMap::check_access()
                                   |      (parent is a stray dir,
                                   |    but stray_prior_path is empty)
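
The last step of the diagram can be illustrated with a toy model. This is
only a sketch under stated assumptions: ToyInode, toy_path_allowed() and
toy_check_access() are hypothetical stand-ins, not the real
SessionMap::check_access() code, and the "vol" prefix is an invented
path-restricted cap. It shows why an empty stray_prior_path on an inode
that sits under a stray directory ends in a denied request.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified stand-in for an MDS inode (not the real CInode).
struct ToyInode {
  bool under_stray;              // primary dentry lives in a stray directory
  std::string path;              // regular path, used when not under stray
  std::string stray_prior_path;  // path before unlink; empty on a stale replica
};

// Path-restricted cap check: allow only paths at or below cap_prefix.
bool toy_path_allowed(const std::string& cap_prefix, const std::string& path) {
  if (cap_prefix.empty())
    return true;  // unrestricted cap
  if (path.compare(0, cap_prefix.size(), cap_prefix) != 0)
    return false;
  // Require a component boundary so "vol" does not match "volume".
  return path.size() == cap_prefix.size() || path[cap_prefix.size()] == '/';
}

// For an inode under stray, the only usable path is stray_prior_path.
bool toy_check_access(const ToyInode& in, const std::string& cap_prefix) {
  const std::string& p = in.under_stray ? in.stray_prior_path : in.path;
  return toy_path_allowed(cap_prefix, p);
}
```

In this model, an inode with under_stray set and an empty stray_prior_path
is rejected for any non-empty cap prefix, which corresponds to the
"Permission denied" seen by the client.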

One possible fix could be to change SessionMap::check_access() to build the
inode path for an inode under a stray directory. However, the inode path
would be of the form "#<inode>" and there does not seem to be a good way
to figure out what the original path was (as seen in the inode dump).

Sending the (encoded) inode as part of `dentry_unlink' message seems to
get handled well -- decode_replica_inode() handles the case where the
inode already existed (in MDCache::inode_map) and the auth mds sending
the stray inode to its replicas doesn't seem to cause trouble. (I think
it didn't do this since the replicas have the inode anyway, but in this
case this inode is a _bit_ out of sync with what exists in the auth).
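
The approach described above can be sketched with a toy replica cache. All
names here (ToyInodeState, ToyDentryUnlinkMsg, ToyReplicaCache and its
methods) are hypothetical and only model the idea; they are not the actual
MDCache code or the real message layout. The sketch assumes the unlink
message optionally carries the inode state and that decoding is an
add-or-update on the replica's inode map.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for the encoded inode state sent by the auth MDS.
struct ToyInodeState {
  uint64_t ino;
  std::string stray_prior_path;
};

// Hypothetical dentry unlink message that may carry the stray inode.
struct ToyDentryUnlinkMsg {
  std::string dentry_name;
  bool has_stray_inode;
  ToyInodeState stray_inode;
};

struct ToyReplicaCache {
  std::map<uint64_t, ToyInodeState> inode_map;

  // Add the inode if unknown, otherwise refresh the existing entry.
  ToyInodeState& toy_decode_replica_inode(const ToyInodeState& in) {
    auto it = inode_map.find(in.ino);
    if (it == inode_map.end())
      it = inode_map.emplace(in.ino, in).first;
    else
      it->second = in;  // replica copy was out of sync with the auth
    return it->second;
  }

  // On unlink, bring the replica's copy in line with the auth's stray inode.
  void toy_handle_dentry_unlink(const ToyDentryUnlinkMsg& m) {
    if (m.has_stray_inode)
      toy_decode_replica_inode(m.stray_inode);
  }
};
```

With the inode included in the message, the replica's copy carries a
non-empty stray_prior_path by the time the request is dispatched; without
it, the replica keeps whatever (possibly stale) state it already had.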

Fixes: http://tracker.ceph.com/issues/54046
Signed-off-by: Venky Shankar <vshankar@redhat.com>
(cherry picked from commit 59e01da10695c2ccff2623c09e78a9d572e64235)
src/mds/MDCache.cc
src/mds/MDCache.h
src/mds/Server.cc