doc: add cephfs disaster recovery guidance

author John Spray <john.spray@redhat.com>

Mon, 5 Jan 2015 15:46:38 +0000 (15:46 +0000)

committer John Spray <john.spray@redhat.com>

Mon, 2 Feb 2015 08:56:14 +0000 (09:56 +0100)
author John Spray <john.spray@redhat.com>
Mon, 5 Jan 2015 15:46:38 +0000 (15:46 +0000)
committer John Spray <john.spray@redhat.com>
Mon, 2 Feb 2015 08:56:14 +0000 (09:56 +0100)
diff --git a/doc/cephfs/disaster-recovery.rst b/doc/cephfs/disaster-recovery.rst

new file mode 100644 (file)

index 0000000..0b88948
--- /dev/null
+++ b/doc/cephfs/disaster-recovery.rst
@@ -0,0 +1,110 @@
+
+Disaster recovery
+=================
+
+.. danger::
+
+    The notes in this section are aimed at experts, making a best effort
+    to recovery what they can from damaged filesystems.  These steps
+    have the potential to make things worse as well as better.  If you
+    are unsure, do not proceed.
+
+
+Journal export
+--------------
+
+Before attempting dangerous operations, make a copy of the journal like so:
+
+::
+
+    cephfs-journal-tool journal export backup.bin
+
+Note that this command may not always work if the journal is badly corrupted,
+in which case a RADOS-level copy should be made (http://tracker.ceph.com/issues/9902).
+
+
+Dentry recovery from journal
+----------------------------
+
+If a journal is damaged or for any reason an MDS is incapable of replaying it,
+attempt to recover what file metadata we can like so:
+
+::
+
+    cephfs-journal-tool event recover_dentries summary
+
+This command by default acts on MDS rank 0, pass --rank=<n> to operate on other ranks.
+
+This command will write any inodes/dentries recoverable from the journal
+into the backing store, if these inodes/dentries are higher-versioned
+than the previous contents of the backing store.  If any regions of the journal
+are missing/damaged, they will be skipped.
+
+Note that in addition to writing out dentries and inodes, this command will update
+the InoTables of each 'in' MDS rank, to indicate that any written inodes' numbers
+are now in use.  In simple cases, this will result in an entirely valid backing
+store state.
+
+.. warning::
+
+    The resulting state of the backing store is not guaranteed to be self-consistent,
+    and an online MDS scrub will be required afterwards.  The journal contents
+    will not be modified by this command, you should truncate the journal
+    separately after recovering what you can.
+
+Journal truncation
+------------------
+
+If the journal is corrupt or MDSs cannot replay it for any reason, you can
+truncate it like so:
+
+::
+
+    cephfs-journal-tool journal reset
+
+.. warning::
+
+    Resetting the journal *will* lose metadata unless you have extracted
+    it by other means such as ``recover_dentries``.  It is likely to leave
+    some orphaned objects in the data pool.  It may result in re-allocation
+    of already-written inodes, such that permissions rules could be violated.
+
+MDS table wipes
+---------------
+
+After the journal has been reset, it may no longer be consistent with respect
+to the contents of the MDS tables (InoTable, SessionMap, SnapServer).
+
+To reset the SessionMap (erase all sessions), use:
+
+::
+
+    cephfs-table-tool all reset session
+
+This command acts on the tables of all 'in' MDS ranks.  Replace 'all' with an MDS
+rank to operate on that rank only.
+
+The session table is the table most likely to need resetting, but if you know you
+also need to reset the other tables then replace 'session' with 'snap' or 'inode'.
+
+MDS map reset
+-------------
+
+Once the in-RADOS state of the filesystem (i.e. contents of the metadata pool)
+is somewhat recovered, it may be necessary to update the MDS map to reflect
+the contents of the metadata pool.  Use the following command to reset the MDS
+map to a single MDS:
+
+::
+
+    ceph fs reset <fs name> --yes-i-really-mean-it
+
+Once this is run, any in-RADOS state for MDS ranks other than 0 will be ignored:
+as a result it is possible for this to result in data loss.
+
+One might wonder what the difference is between 'fs reset' and 'fs remove; fs new'.  The
+key distinction is that doing a remove/new will leave rank 0 in 'creating' state, such
+that it would overwrite any existing root inode on disk and orphan any existing files.  In
+contrast, the 'reset' command will leave rank 0 in 'active' state such that the next MDS
+daemon to claim the rank will go ahead and use the existing in-RADOS metadata.
+
diff --git a/doc/cephfs/index.rst b/doc/cephfs/index.rst

index be55ecfa6f9d8b70cc04f001d4d34512f763686e..eb3f31049d2bbf81a1d7c633e8e310fa247fd069 100644 (file)
--- a/doc/cephfs/index.rst
+++ b/doc/cephfs/index.rst
@@ -84,6 +84,7 @@ authentication keyring.
         Client eviction <eviction>
         Handling full filesystems <full>
         Troubleshooting <troubleshooting>
+       Disaster recovery <disaster-recovery>
  
  .. raw:: html
author	John Spray <john.spray@redhat.com>
	Mon, 5 Jan 2015 15:46:38 +0000 (15:46 +0000)
committer	John Spray <john.spray@redhat.com>
	Mon, 2 Feb 2015 08:56:14 +0000 (09:56 +0100)
doc/cephfs/disaster-recovery.rst	[new file with mode: 0644]	patch \| blob
doc/cephfs/index.rst		patch \| blob \| history