From 0603aa681a3621eb960ba2488c895d1b630a7d2f Mon Sep 17 00:00:00 2001
From: Patrick Donnelly
Date: Wed, 24 Jun 2020 15:32:42 -0700
Subject: [PATCH] doc: update cephfs metadata pool recovery procedure

This is mostly clarifying a few things after personally experimenting with
this.

Signed-off-by: Patrick Donnelly
---
 doc/cephfs/disaster-recovery-experts.rst | 112 +++++++++++++++++------
 1 file changed, 82 insertions(+), 30 deletions(-)

diff --git a/doc/cephfs/disaster-recovery-experts.rst b/doc/cephfs/disaster-recovery-experts.rst
index 3be5cf3e7a0..11df16e3813 100644
--- a/doc/cephfs/disaster-recovery-experts.rst
+++ b/doc/cephfs/disaster-recovery-experts.rst
@@ -204,63 +204,115 @@ Using an alternate metadata pool for recovery
    undertaken with great care.
 
 If an existing file system is damaged and inoperative, it is possible to create
-a fresh metadata pool and attempt to reconstruct the file system metadata
-into this new pool, leaving the old metadata in place. This could be used to
-make a safer attempt at recovery since the existing metadata pool would not be
-overwritten.
+a fresh metadata pool and attempt to reconstruct the file system metadata into
+this new pool, leaving the old metadata in place. This could be used to make a
+safer attempt at recovery since the existing metadata pool would not be
+modified.
 
 .. caution::
 
    During this process, multiple metadata pools will contain data referring to
    the same data pool. Extreme caution must be exercised to avoid changing the
    data pool contents while this is the case. Once recovery is complete, the
-   damaged metadata pool should be deleted.
+   damaged metadata pool should be archived or deleted.
 
-To begin this process, first create the fresh metadata pool and initialize
-it with empty file system data structures:
+To begin, the existing file system should be taken down, if not done already,
+to prevent further modification of the data pool. Unmount all clients and then
+mark the file system failed:
+
+::
+
+    ceph fs fail <fs_name>
+
+Next, create a recovery file system in which we will populate a new metadata
+pool backed by the original data pool.
 
 ::
 
     ceph fs flag set enable_multiple true --yes-i-really-mean-it
-    ceph osd pool create recovery replicated
-    ceph fs new recovery-fs recovery --allow-dangerous-metadata-overlay
-    cephfs-data-scan init --force-init --filesystem recovery-fs --alternate-pool recovery
-    ceph fs reset recovery-fs --yes-i-really-mean-it
-    cephfs-table-tool recovery-fs:all reset session
-    cephfs-table-tool recovery-fs:all reset snap
-    cephfs-table-tool recovery-fs:all reset inode
+    ceph osd pool create cephfs_recovery_meta
+    ceph fs new cephfs_recovery cephfs_recovery_meta <data_pool> --allow-dangerous-metadata-overlay
+
+The recovery file system starts with an MDS rank that will initialize the new
+metadata pool with some metadata. This is necessary to bootstrap recovery.
+However, now we will take the MDS down as we do not want it interacting with
+the metadata pool further.
+
+::
+
+    ceph fs fail cephfs_recovery
 
-Next, run the recovery toolset using the --alternate-pool argument to output
-results to the alternate pool:
+Next, we will reset the initial metadata the MDS created:
 
 ::
 
-    cephfs-data-scan scan_extents --alternate-pool recovery --filesystem <fs_name>
-    cephfs-data-scan scan_inodes --alternate-pool recovery --filesystem <fs_name> --force-corrupt --force-init
-    cephfs-data-scan scan_links --filesystem recovery-fs
+    cephfs-table-tool cephfs_recovery:all reset session
+    cephfs-table-tool cephfs_recovery:all reset snap
+    cephfs-table-tool cephfs_recovery:all reset inode
+
+Now perform the recovery of the metadata pool from the data pool:
+
+::
+
+    cephfs-data-scan init --force-init --filesystem cephfs_recovery --alternate-pool cephfs_recovery_meta
+    cephfs-data-scan scan_extents --alternate-pool cephfs_recovery_meta --filesystem <fs_name>
+    cephfs-data-scan scan_inodes --alternate-pool cephfs_recovery_meta --filesystem <fs_name> --force-corrupt
+    cephfs-data-scan scan_links --filesystem cephfs_recovery
+
+.. note::
+
+   Each scan procedure above goes through the entire data pool. This may take a
+   significant amount of time. See the previous section on how to distribute
+   this task among workers.
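+
+As a sketch only (assuming four workers; adjust the count to your environment),
+the ``scan_extents`` phase could be split up using the same ``worker_n`` and
+``worker_m`` options described in that section, with each command run by a
+separate worker. The same pattern applies to ``scan_inodes``:
+
+::
+
+    # Each worker covers a disjoint part of the data pool; all workers must
+    # finish scan_extents before any worker starts scan_inodes.
+    cephfs-data-scan scan_extents --worker_n 0 --worker_m 4 --alternate-pool cephfs_recovery_meta --filesystem <fs_name>
+    cephfs-data-scan scan_extents --worker_n 1 --worker_m 4 --alternate-pool cephfs_recovery_meta --filesystem <fs_name>
+    cephfs-data-scan scan_extents --worker_n 2 --worker_m 4 --alternate-pool cephfs_recovery_meta --filesystem <fs_name>
+    cephfs-data-scan scan_extents --worker_n 3 --worker_m 4 --alternate-pool cephfs_recovery_meta --filesystem <fs_name>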
 
 If the damaged file system contains dirty journal data, it may be recovered next
 with:
 
 ::
 
-    cephfs-journal-tool --rank=<fs_name>:0 event recover_dentries list --alternate-pool recovery
-    cephfs-journal-tool --rank recovery-fs:0 journal reset --force
+    cephfs-journal-tool --rank=<fs_name>:0 event recover_dentries list --alternate-pool cephfs_recovery_meta
+    cephfs-journal-tool --rank cephfs_recovery:0 journal reset --force
 
 After recovery, some recovered directories will have incorrect statistics.
-Ensure the parameters mds_verify_scatter and mds_debug_scatterstat are set
-to false (the default) to prevent the MDS from checking the statistics, then
-run a forward :doc:`scrub </cephfs/scrub>` to repair them. Ensure you have an
-MDS running and issue:
+Ensure the parameters ``mds_verify_scatter`` and ``mds_debug_scatterstat`` are
+set to false (the default) to prevent the MDS from checking the statistics:
+
+::
+
+    ceph config rm mds mds_verify_scatter
+    ceph config rm mds mds_debug_scatterstat
+
+(Note, the config may also have been set globally or via a ceph.conf file.)
+Now, allow an MDS to join the recovery file system:
+
+::
+
+    ceph fs set cephfs_recovery joinable true
+
+Finally, run a forward :doc:`scrub </cephfs/scrub>` to repair the statistics.
+Ensure you have an MDS running and issue:
 
 ::
 
-    ceph tell mds.a scrub start / recursive repair
+    ceph fs status # get active MDS
+    ceph tell mds.<mds_name> scrub start / recursive repair
+
+.. note::
+
+   Symbolic links are recovered as empty regular files. `Symbolic link
+   recovery`_ is scheduled to be supported in Pacific.
+
+It is recommended to migrate any data from the recovery file system as soon as
+possible. Do not restore the old file system while the recovery file system is
+operational.
 
 .. note::
 
-   In Nautilus and above versions, tell interface scrub command is preferred
-   than scrub_path. For older versions only scrub_path asok command is
-   supported. Example::
+   If the data pool is also corrupt, some files may not be restored because
+   backtrace information is lost. If any data objects are missing (due to
+   issues like lost Placement Groups on the data pool), the recovered files
+   will contain holes in place of the missing data.
 
-      ceph daemon mds.a scrub_path / recursive repair
+.. _Symbolic link recovery: https://tracker.ceph.com/issues/46166
-- 
2.39.5