From: John Spray Date: Mon, 5 Jan 2015 15:46:38 +0000 (+0000) Subject: doc: add cephfs disaster recovery guidance X-Git-Tag: v0.93~138^2 X-Git-Url: http://git-server-git.apps.pok.os.sepia.ceph.com/?a=commitdiff_plain;h=refs%2Fpull%2F3533%2Fhead;p=ceph.git doc: add cephfs disaster recovery guidance This is a place to put some useful notes about the new offline recovery tooling. Signed-off-by: John Spray --- diff --git a/doc/cephfs/disaster-recovery.rst b/doc/cephfs/disaster-recovery.rst new file mode 100644 index 000000000000..0b88948cd673 --- /dev/null +++ b/doc/cephfs/disaster-recovery.rst @@ -0,0 +1,110 @@ + +Disaster recovery +================= + +.. danger:: + + The notes in this section are aimed at experts, making a best effort + to recovery what they can from damaged filesystems. These steps + have the potential to make things worse as well as better. If you + are unsure, do not proceed. + + +Journal export +-------------- + +Before attempting dangerous operations, make a copy of the journal like so: + +:: + + cephfs-journal-tool journal export backup.bin + +Note that this command may not always work if the journal is badly corrupted, +in which case a RADOS-level copy should be made (http://tracker.ceph.com/issues/9902). + + +Dentry recovery from journal +---------------------------- + +If a journal is damaged or for any reason an MDS is incapable of replaying it, +attempt to recover what file metadata we can like so: + +:: + + cephfs-journal-tool event recover_dentries summary + +This command by default acts on MDS rank 0, pass --rank= to operate on other ranks. + +This command will write any inodes/dentries recoverable from the journal +into the backing store, if these inodes/dentries are higher-versioned +than the previous contents of the backing store. If any regions of the journal +are missing/damaged, they will be skipped. + +Note that in addition to writing out dentries and inodes, this command will update +the InoTables of each 'in' MDS rank, to indicate that any written inodes' numbers +are now in use. In simple cases, this will result in an entirely valid backing +store state. + +.. warning:: + + The resulting state of the backing store is not guaranteed to be self-consistent, + and an online MDS scrub will be required afterwards. The journal contents + will not be modified by this command, you should truncate the journal + separately after recovering what you can. + +Journal truncation +------------------ + +If the journal is corrupt or MDSs cannot replay it for any reason, you can +truncate it like so: + +:: + + cephfs-journal-tool journal reset + +.. warning:: + + Resetting the journal *will* lose metadata unless you have extracted + it by other means such as ``recover_dentries``. It is likely to leave + some orphaned objects in the data pool. It may result in re-allocation + of already-written inodes, such that permissions rules could be violated. + +MDS table wipes +--------------- + +After the journal has been reset, it may no longer be consistent with respect +to the contents of the MDS tables (InoTable, SessionMap, SnapServer). + +To reset the SessionMap (erase all sessions), use: + +:: + + cephfs-table-tool all reset session + +This command acts on the tables of all 'in' MDS ranks. Replace 'all' with an MDS +rank to operate on that rank only. + +The session table is the table most likely to need resetting, but if you know you +also need to reset the other tables then replace 'session' with 'snap' or 'inode'. + +MDS map reset +------------- + +Once the in-RADOS state of the filesystem (i.e. contents of the metadata pool) +is somewhat recovered, it may be necessary to update the MDS map to reflect +the contents of the metadata pool. Use the following command to reset the MDS +map to a single MDS: + +:: + + ceph fs reset --yes-i-really-mean-it + +Once this is run, any in-RADOS state for MDS ranks other than 0 will be ignored: +as a result it is possible for this to result in data loss. + +One might wonder what the difference is between 'fs reset' and 'fs remove; fs new'. The +key distinction is that doing a remove/new will leave rank 0 in 'creating' state, such +that it would overwrite any existing root inode on disk and orphan any existing files. In +contrast, the 'reset' command will leave rank 0 in 'active' state such that the next MDS +daemon to claim the rank will go ahead and use the existing in-RADOS metadata. + diff --git a/doc/cephfs/index.rst b/doc/cephfs/index.rst index be55ecfa6f9d..eb3f31049d2b 100644 --- a/doc/cephfs/index.rst +++ b/doc/cephfs/index.rst @@ -84,6 +84,7 @@ authentication keyring. Client eviction Handling full filesystems Troubleshooting + Disaster recovery .. raw:: html