doc: Hadoop clarifications

author Noah Watkins <noahwatkins@gmail.com>

Sat, 23 Feb 2013 01:58:25 +0000 (17:58 -0800)

committer Noah Watkins <noahwatkins@gmail.com>

Thu, 30 May 2013 20:29:42 +0000 (13:29 -0700)
diff --git a/doc/cephfs/hadoop.rst b/doc/cephfs/hadoop.rst

index 625d46a0eecd108c9b2e0fc94758a3e86772d6b6..ddfa07a88daf429f1954d4541ad7c6c717642acd 100644
--- a/doc/cephfs/hadoop.rst
+++ b/doc/cephfs/hadoop.rst
@@ -50,17 +50,21 @@ These options are intended to be set in the Hadoop configuration file
  Support For Per-file Custom Replication
  ---------------------------------------
  
-Hadoop users may specify a custom replication factor (e.g. 3 copies of each
-block) when creating a file. However, object replication factors are
-controlled on a per-pool basis in Ceph, and by default a Ceph file system will
-contain a pre-configured pool. In order to support per-file replication Hadoop
-can be configured to select from alternative pools when creating new files.
+The Hadoop file system interface allows users to specify a custom replication
+factor (e.g. 3 copies of each block) when creating a file. However, object
+replication factors in the Ceph file system are controlled on a per-pool
+basis, and by default a Ceph file system will contain only a single
+pre-configured pool. Thus, in order to support per-file replication with
+Hadoop over Ceph, additional storage pools with non-default replication
+factors must be created, and Hadoop must be configured to choose from these
+additional pools.
  
  Additional data pools can be specified using the ``ceph.data.pools``
  configuration option. The value of the option is a comma separated list of
  pool names. The default Ceph pool will be used automatically if this
  configuration option is omitted or the value is empty. For example, the
-following configuration setting will consider the three pools listed. ::
+following configuration setting will consider the pools ``pool1``, ``pool2``, and
+``pool5`` when selecting a target pool to store a file. ::
  
         <property>
           <name>ceph.data.pools</name>
@@ -76,7 +80,7 @@ documentation`_.
  .. _RADOS Pool documentation: ../../rados/operations/pools
  
  Once a pool has been created and configured the metadata service must be told
-that the new pool may be used to store file data. A pool can be made available
+that the new pool may be used to store file data. A pool is made available
  for storing file system data using the ``ceph mds add_data_pool`` command.
  
  First, create the pool. In this example we create the ``hadoop1`` pool with
@@ -85,8 +89,9 @@ replication factor 1. ::
      ceph osd pool create hadoop1 100
      ceph osd pool set hadoop1 size 1
  
-Next, determine the pool id. This can be done using the ``ceph osd dump``
-command. For example, we can look for the newly created ``hadoop1`` pool. ::
+Next, determine the pool id. This can be done by examining the output of the
+``ceph osd dump`` command. For example, we can look for the newly created
+``hadoop1`` pool. ::
  
      ceph osd dump | grep hadoop1
  
@@ -107,11 +112,11 @@ selecting the target pool for new files. ::
                 <value>hadoop1</value>
         </property>
  
-Pool Selection Semantics
-~~~~~~~~~~~~~~~~~~~~~~~~
+Pool Selection Rules
+~~~~~~~~~~~~~~~~~~~~
  
-The following semantics describe the rules by which Hadoop will choose a pool
-given a desired replication factor and the set of pools specified using the
+The following rules describe how Hadoop chooses a pool given a desired
+replication factor and the set of pools specified using the
  ``ceph.data.pools`` configuration option.
  
  1. When no custom pools are specified the default Ceph data pool is used.
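The documented handling of the ``ceph.data.pools`` value (a comma separated list of pool names, with the default pool used when the option is omitted or empty) could be sketched roughly as follows. This is an illustration only, not the Hadoop bindings' actual code: the function name ``parse_data_pools``, the placeholder default pool name ``data``, and the trimming of whitespace around names are all assumptions.

```python
def parse_data_pools(value, default_pool="data"):
    """Return the candidate pool names for a ceph.data.pools value.

    Hypothetical helper: `default_pool` is a placeholder name, and
    stripping whitespace around each entry is an assumption.
    """
    # Option omitted entirely: fall back to the default pool.
    if value is None:
        return [default_pool]

    # Split the comma separated list and drop empty entries.
    names = [name.strip() for name in value.split(",")]
    names = [name for name in names if name]

    # An empty value also means "use the default pool".
    return names or [default_pool]


if __name__ == "__main__":
    print(parse_data_pools("pool1,pool2,pool5"))
    print(parse_data_pools(""))
```

Only the fallback to the default pool is covered here. Choosing among several candidate pools for a requested replication factor is left out, because it depends on the remaining selection rules.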