From: Yan, Zheng
Date: Fri, 20 Apr 2018 05:49:17 +0000 (+0800)
Subject: mds: update dev document of cephfs snapshot
X-Git-Tag: v13.1.0~2^2~3
X-Git-Url: http://git-server-git.apps.pok.os.sepia.ceph.com/?a=commitdiff_plain;h=5643909b36aab8ef6d0297cbf876c3daf6ad1527;p=ceph.git

mds: update dev document of cephfs snapshot

Signed-off-by: "Yan, Zheng"
Fixes: http://tracker.ceph.com/issues/23583
---
diff --git a/doc/dev/cephfs-snapshots.rst b/doc/dev/cephfs-snapshots.rst
index 6069ce57c309..70d07bc0c5af 100644
--- a/doc/dev/cephfs-snapshots.rst
+++ b/doc/dev/cephfs-snapshots.rst
@@ -20,55 +20,60 @@ features that make CephFS snapshots different from what you might expect:
 Important Data Structures
 -------------------------
 * SnapRealm: A `SnapRealm` is created whenever you create a snapshot at a new
-  point in the hierarchy (or, when a snapshotted inode is moved outside of its
-  parent snapshot). SnapRealms contain an `sr_t srnode`, links to `past_parents`
-  and `past_children`, and all `inodes_with_caps` that are part of the snapshot.
-  Clients also have a SnapRealm concept that maintains less data but is used to
-  associate a `SnapContext` with each open file for writing.
+  point in the hierarchy (or, when a snapshotted inode is moved outside of its
+  parent snapshot). SnapRealms contain an `sr_t srnode` and `inodes_with_caps`
+  that are part of the snapshot. Clients also have a SnapRealm concept that
+  maintains less data but is used to associate a `SnapContext` with each open
+  file for writing.
 * sr_t: An `sr_t` is the on-disk snapshot metadata. It is part of the containing
   directory and contains sequence counters, timestamps, the list of associated
-  snapshot IDs, and `past_parents`.
-* snaplink_t: `past_parents` et al are stored on-disk as a `snaplink_t`, holding
-  the inode number and first `snapid` of the inode/snapshot referenced.
+  snapshot IDs, and `past_parent_snaps`.
+* SnapServer: SnapServer manages snapshot ID allocation, snapshot deletion, and
+  tracks the list of effective snapshots in the filesystem. A filesystem has
+  only one SnapServer instance.
+* SnapClient: SnapClient is used to communicate with the SnapServer; each MDS
+  rank has its own SnapClient instance. SnapClient also caches the effective
+  snapshots locally.

 Creating a snapshot
 -------------------
-Because CephFS snapshot currently is an experimental feature, we are supposed
-to enable it explicitly by the command below before testing.
+The CephFS snapshot feature is enabled by default on new filesystems. To enable
+it on existing filesystems, use the command below.

 .. code::

-       $ ceph fs set <fs_name> allow_new_snaps true --yes-i-really-mean-it
+       $ ceph fs set <fs_name> allow_new_snaps true

-To make a snapshot on directory "/1/2/3/foo", the client invokes "mkdir" on
-"/1/2/3/foo/.snap" directory. This is transmitted to the MDS Server as a
+To make a snapshot on directory "/1/2/3/", the client invokes "mkdir" on the
+"/1/2/3/.snap" directory. This is transmitted to the MDS Server as a
 CEPH_MDS_OP_MKSNAP-tagged `MClientRequest`, and initially handled in
 Server::handle_client_mksnap(). It allocates a `snapid` from the `SnapServer`,
 projects a new inode with the new SnapRealm, and commits it to the MDLog as
 usual. When committed, it invokes
-`MDCache::do_realm_invalidate_and_update_notify()`, which triggers most of the
-real work of the snapshot.
+`MDCache::do_realm_invalidate_and_update_notify()`, which notifies all clients
+with caps on files under "/1/2/3/" about the new SnapRealm. When clients get
+the notifications, they update their client-side SnapRealm hierarchy, link
+files under "/1/2/3/" to the new SnapRealm, and generate a `SnapContext` for
+the new SnapRealm.

-If there were already snapshots above directory "foo" (rooted at "/1", say),
-the new SnapRealm adds its most immediate ancestor as a `past_parent` on
-creation. After committing to the MDLog, all clients with caps on files in
-"/1/2/3/foo/" are notified (MDCache::send_snaps()) of the new SnapRealm, and
-update the `SnapContext` they are using with that data. Note that this
-*is not* a synchronous part of the snapshot creation!
+Note that this client notification *is not* a synchronous part of snapshot creation!

 Updating a snapshot
 -------------------
-If you delete a snapshot, or move data out of the parent snapshot's hierarchy,
-a similar process is followed. Extra code paths check to see if we can break
-the `past_parent` links between SnapRealms, or eliminate them entirely.
+If you delete a snapshot, a similar process is followed. If you move an inode
+out of its parent SnapRealm, the rename code creates a new SnapRealm for the
+renamed inode (if it does not already have one), saves the IDs of snapshots
+that are effective on the original parent SnapRealm into `past_parent_snaps`
+of the new SnapRealm, then follows a process similar to creating a snapshot.

 Generating a SnapContext
 ------------------------
 A RADOS `SnapContext` consists of a snapshot sequence ID (`snapid`) and all
 the snapshot IDs that an object is already part of. To generate that list, we
-generate a list of all `snapids` associated with the SnapRealm and all its
-`past_parents`.
+combine the `snapids` associated with the SnapRealm and all valid `snapids` in
+`past_parent_snaps`. Stale `snapids` are filtered out using SnapClient's cached
+list of effective snapshots.

 Storing snapshot data
 ---------------------
@@ -106,11 +111,10 @@ out again.

 Hard links
 ----------
-Hard links do not interact well with snapshots. A file is snapshotted when its
-primary link is part of a SnapRealm; other links *will not* preserve data.
-Generally the location where a file was first created will be its primary link,
-but if the original link has been deleted it is not easy (nor always
-determnistic) to find which link is now the primary.
+An inode with multiple hard links is moved to a dummy global SnapRealm. The
+dummy SnapRealm covers all snapshots in the filesystem, so the inode's data
+will be preserved for any new snapshot. The preserved data will cover
+snapshots taken on any of the inode's links.

 Multi-FS
 ---------