doc: explain how unfound objects happen

author Sage Weil <sage@newdream.net>

Thu, 8 Mar 2012 22:55:21 +0000 (14:55 -0800)

committer Sage Weil <sage@newdream.net>

Thu, 8 Mar 2012 22:55:21 +0000 (14:55 -0800)
diff --git a/doc/ops/manage/failures/osd.rst b/doc/ops/manage/failures/osd.rst

index 179f5516c32157589708be60c7024579efc2ae42..4d2196406af13bdaf57c04c72222c6d70050f42a 100644 (file)
--- a/doc/ops/manage/failures/osd.rst
+++ b/doc/ops/manage/failures/osd.rst
@@ -186,6 +186,19 @@ Under certain combinations of failures Ceph may complain about
  
  This means that the storage cluster knows that some objects (or newer
  copies of existing objects) exist, but it hasn't found copies of them.
+One example of how this might come about for a PG whose data is on ceph-osds
+A and B:
+
+* A goes down
+* B handles some writes, alone
+* A comes up
+* A and B repeer, and the objects missing on A are queued for recovery.
+* Before the new objects are copied, B goes down.
+
+Now A knows that these objects exist, but there is no live ceph-osd that
+has a copy.  In this case, IO to those objects will block, and the
+cluster will hope that the failed node comes back soon; this is
+assumed to be preferable to returning an IO error to the user.
  
  First, you can identify which objects are unfound with::
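
The failure sequence added by this change (A goes down, B takes writes alone, A returns, B goes down before recovery finishes) can be traced with a toy model. This is only an illustrative sketch, not how Ceph implements peering or recovery: the `Osd` class, the `write` and `unfound` helpers, and the object names `obj1` and `obj2` are all hypothetical, and "the PG knows the object exists" is reduced to a plain set.

```python
from dataclasses import dataclass, field


@dataclass
class Osd:
    """Hypothetical stand-in for a ceph-osd: a name, an up flag, its object copies."""
    name: str
    up: bool = True
    objects: set = field(default_factory=set)


def write(osds, known, obj):
    """Store obj on every OSD that is up and record that the PG knows about it."""
    for osd in osds:
        if osd.up:
            osd.objects.add(obj)
    known.add(obj)


def unfound(osds, known):
    """Objects the PG knows exist but that no live OSD holds a copy of."""
    live = set()
    for osd in osds:
        if osd.up:
            live |= osd.objects
    return known - live


a, b = Osd("A"), Osd("B")
osds = [a, b]
known = set()

write(osds, known, "obj1")   # both up: A and B each hold obj1
a.up = False                 # A goes down
write(osds, known, "obj2")   # B handles some writes, alone
a.up = True                  # A comes up and learns (via repeering) that obj2 exists
recovery_queue = sorted(b.objects - a.objects)  # objects missing on A, queued for recovery
b.up = False                 # B goes down before the new objects are copied

print(unfound(osds, known))  # prints {'obj2'}
```

In this model, setting `b.up = True` again makes `unfound(osds, known)` return an empty set, which mirrors why the cluster blocks IO and waits for the failed node instead of returning an error.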