From: Noah Watkins <noahwatkins@gmail.com>
Date: Sat, 23 Feb 2013 01:58:25 +0000 (-0800)
Subject: doc: Hadoop clarifications
X-Git-Tag: v0.67-rc1~81^2~3
X-Git-Url: http://git-server-git.apps.pok.os.sepia.ceph.com/?a=commitdiff_plain;h=743c528754ee64a5db5149810c1425f5b3469cdc;p=ceph.git

doc: Hadoop clarifications

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
---

diff --git a/doc/cephfs/hadoop.rst b/doc/cephfs/hadoop.rst
index 625d46a0eecd..ddfa07a88daf 100644
--- a/doc/cephfs/hadoop.rst
+++ b/doc/cephfs/hadoop.rst
@@ -50,17 +50,21 @@ These options are intended to be set in the Hadoop configuration file
 Support For Per-file Custom Replication
 ---------------------------------------
 
-Hadoop users may specify a custom replication factor (e.g. 3 copies of each
-block) when creating a file. However, object replication factors are
-controlled on a per-pool basis in Ceph, and by default a Ceph file system will
-contain a pre-configured pool. In order to support per-file replication Hadoop
-can be configured to select from alternative pools when creating new files.
+The Hadoop file system interface allows users to specify a custom replication
+factor (e.g. 3 copies of each block) when creating a file. However, object
+replication factors in the Ceph file system are controlled on a per-pool
+basis, and by default a Ceph file system will contain only a single
+pre-configured pool. Thus, in order to support per-file replication with
+Hadoop over Ceph, additional storage pools with non-default replication
+factors must be created, and Hadoop must be configured to choose from these
+additional pools.
 
 Additional data pools can be specified using the ``ceph.data.pools``
 configuration option. The value of the option is a comma separated list of
 pool names. The default Ceph pool will be used automatically if this
 configuration option is omitted or the value is empty. For example, the
-following configuration setting will consider the three pools listed. ::
+following configuration setting will consider the pools ``pool1``, ``pool2``, and
+``pool5`` when selecting a target pool to store a file. ::
 
 	<property>
 	  <name>ceph.data.pools</name>
@@ -76,7 +80,7 @@ documentation`_.
 .. _RADOS Pool documentation: ../../rados/operations/pools
 
 Once a pool has been created and configured the metadata service must be told
-that the new pool may be used to store file data. A pool can be made available
+that the new pool may be used to store file data. A pool is made available
 for storing file system data using the ``ceph mds add_data_pool`` command.
 
 First, create the pool. In this example we create the ``hadoop1`` pool with
@@ -85,8 +89,9 @@ replication factor 1. ::
     ceph osd pool create hadoop1 100
     ceph osd pool set hadoop1 size 1
 
-Next, determine the pool id. This can be done using the ``ceph osd dump``
-command. For example, we can look for the newly created ``hadoop1`` pool. ::
+Next, determine the pool id. This can be done by examining the output of the
+``ceph osd dump`` command. For example, we can look for the newly created
+``hadoop1`` pool. ::
 
     ceph osd dump | grep hadoop1
 
@@ -107,11 +112,11 @@ selecting the target pool for new files. ::
 		<value>hadoop1</value>
 	</property>
 
-Pool Selection Semantics
-~~~~~~~~~~~~~~~~~~~~~~~~
+Pool Selection Rules
+~~~~~~~~~~~~~~~~~~~~
 
-The following semantics describe the rules by which Hadoop will choose a pool
-given a desired replication factor and the set of pools specified using the
+The following rules describe how Hadoop chooses a pool given a desired
+replication factor and the set of pools specified using the
 ``ceph.data.pools`` configuration option.
 
 1. When no custom pools are specified the default Ceph data pool is used.