From: John Wilkins <john.wilkins@inktank.com>
Date: Fri, 18 Jan 2013 07:31:47 +0000 (-0800)
Subject: doc: Added a section describing mon/osd interaction.
X-Git-Tag: v0.57~172
X-Git-Url: http://git-server-git.apps.pok.os.sepia.ceph.com/?a=commitdiff_plain;h=d6fc92dfaed03a897e4d702af6647df245e7f27f;p=ceph.git

doc: Added a section describing mon/osd interaction.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
---

diff --git a/doc/rados/configuration/mon-osd-interaction.rst b/doc/rados/configuration/mon-osd-interaction.rst
new file mode 100644
index 000000000000..f3f06648626f
--- /dev/null
+++ b/doc/rados/configuration/mon-osd-interaction.rst
@@ -0,0 +1,165 @@
+=====================================
+ Configuring Monitor/OSD Interaction
+=====================================
+
+After you have completed your initial Ceph configuration, you may deploy and run
+Ceph. When you execute a command such as ``ceph health`` or ``ceph -s``, the
+monitor reports on the current state of the cluster. The monitor knows about the
+cluster by requiring reports from each OSD, and by receiving reports from OSDs
+about the status of their neighboring OSDs. If the monitor doesn't receive
+reports, or if it receives reports of changes in the cluster, the monitor
+updates the status of the cluster.
+
+Ceph provides reasonable default settings for monitor/OSD interaction. However,
+you may override the defaults. The following sections describe how Ceph monitors
+and OSDs interact for the purposes of monitoring.
+
+
+OSDs Check Heartbeats
+=====================
+
+Each OSD checks the heartbeat of other OSDs every 6 seconds. You can change the
+heartbeat interval by adding an ``osd heartbeat interval`` setting under the
+``[osd]`` section of your Ceph configuration file, or by setting the value at
+runtime. If an OSD doesn't show a heartbeat within a 20-second grace period, the
+cluster may consider the OSD ``down``. You may change this grace period by
+adding an ``osd heartbeat grace`` setting under the ``[osd]`` section of your
+Ceph configuration file, or by setting the value at runtime.
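+
+As a minimal sketch, the two heartbeat settings might be written in the Ceph
+configuration file as shown below. The values simply restate the defaults
+described above (in seconds) and are illustrative rather than tuning
+recommendations::
+
+    [osd]
+        ; seconds between heartbeat checks of other OSDs
+        osd heartbeat interval = 6
+        ; seconds without a heartbeat before an OSD may be considered down
+        osd heartbeat grace = 20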
+
+
+.. ditaa:: +---------+          +---------+
+           |  OSD 1  |          |  OSD 2  |
+           +---------+          +---------+
+                |                    |
+                |----+ Heartbeat     |
+                |    | Interval      |
+                |<---+ Exceeded      |
+                |                    |
+                |       Check        |
+                |     Heartbeat      |
+                |------------------->|
+                |                    |
+                |<-------------------|
+                |   Heart Beating    |
+                |                    |
+                |----+ Heartbeat     |
+                |    | Interval      |
+                |<---+ Exceeded      |
+                |                    |
+                |       Check        |
+                |     Heartbeat      |
+                |------------------->|
+                |                    |
+                |----+ Grace         |
+                |    | Period        |
+                |<---+ Exceeded      |
+                |                    |
+                |----+ Mark          |
+                |    | OSD 2         |
+                |<---+ Down          |
+                
+
+
+OSDs Report Down OSDs
+=====================
+
+By default, an OSD must report to the monitors that another OSD is ``down``
+three times before the monitors acknowledge that the reported OSD is ``down``.
+You can change the minimum number of ``osd down`` reports by adding an ``osd min
+down reports`` setting under the ``[osd]`` section of your Ceph configuration
+file, or by setting the value at runtime. By default, only one OSD is required
+to report another OSD down. You can change the number of OSDs required to report
+another OSD down by adding an ``osd min down reporters`` setting under the
+``[osd]`` section of your Ceph configuration file, or by setting the value at
+runtime.
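+
+A hedged example of how these two settings could look in the Ceph
+configuration file follows. The values restate the defaults given in this
+section and are for illustration only::
+
+    [osd]
+        ; number of down reports required before the monitors act
+        osd min down reports = 3
+        ; number of distinct OSDs that must report another OSD down
+        osd min down reporters = 1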
+
+
+.. ditaa:: +---------+     +---------+
+           |  OSD 1  |     | Monitor |
+           +---------+     +---------+
+                |               |             
+                | OSD 2 Is Down |
+                |-------------->|
+                |               |             
+                | OSD 2 Is Down |
+                |-------------->|
+                |               |             
+                | OSD 2 Is Down |
+                |-------------->|
+                |               |             
+                |               |----------+ Mark
+                |               |          | OSD 2                
+                |               |<---------+ Down
+
+
+OSDs Report Peering Failure
+===========================
+
+If an OSD cannot peer with any of the OSDs defined in its Ceph configuration
+file, it will ping the monitor for the most recent copy of the cluster map every
+30 seconds. You can change the monitor heartbeat interval by adding an ``osd mon
+heartbeat interval`` setting under the ``[osd]`` section of your Ceph
+configuration file, or by setting the value at runtime.
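+
+For illustration, assuming the 30 second default described above, the setting
+might be written in the Ceph configuration file like this::
+
+    [osd]
+        ; seconds between requests to the monitor for a new cluster map
+        osd mon heartbeat interval = 30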
+
+.. ditaa:: +---------+     +---------+     +-------+     +---------+
+           |  OSD 1  |     |  OSD 2  |     | OSD 3 |     | Monitor |
+           +---------+     +---------+     +-------+     +---------+
+                |               |              |              |
+                |  Request To   |              |              |
+                |     Peer      |              |              |               
+                |-------------->|              |              |
+                |<--------------|              |              |
+                |    Peering                   |              |
+                |                              |              |
+                |  Request To                  |              |
+                |     Peer                     |              |               
+                |----------------------------->|              |
+                |                                             |
+                |----+ OSD Monitor                            |
+                |    | Heartbeat                              |
+                |<---+ Interval Exceeded                      |
+                |                                             |
+                |         Failed to Peer with OSD 3           |
+                |-------------------------------------------->|
+                |<--------------------------------------------|
+                |          Receive New Cluster Map            |
+ 
+
+OSDs Report Their Status
+========================
+
+If an OSD doesn't report to the monitor at least once every 120 seconds, the
+monitor will consider the OSD ``down``. You can change the monitor report
+interval by adding an ``osd mon report interval max`` setting under the
+``[osd]`` section of your Ceph configuration file, or by setting the value at
+runtime. The OSD attempts to report on its status every 30 seconds. You can
+change the OSD report interval by adding an ``osd mon report interval min``
+setting under the ``[osd]`` section of your Ceph configuration file, or by
+setting the value at runtime.
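+
+As a sketch, the two report intervals could be set together in the Ceph
+configuration file as follows. The values restate the defaults described
+above (in seconds) and are not recommendations::
+
+    [osd]
+        ; the OSD attempts to report its status this often
+        osd mon report interval min = 30
+        ; the monitor considers the OSD down after this long without a report
+        osd mon report interval max = 120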
+
+
+.. ditaa:: +---------+          +---------+
+           |  OSD 1  |          | Monitor |
+           +---------+          +---------+
+                |                    |
+                |----+ Report Min    |
+                |    | Interval      |
+                |<---+ Exceeded      |
+                |                    |
+                |     Report To      |
+                |      Monitor       |
+                |------------------->|
+                |                    |
+                |----+ Report Min    |
+                |    | Interval      |
+                |<---+ Exceeded      |
+                |                    |
+                | No Report          |
+                                     +----+ Report Max
+                                     |    | Interval
+                                     |<---+ Exceeded
+                                     |
+                                     +----+ Mark
+                                     |    | OSD 1
+                                     |<---+ Down
+