Support For Per-file Custom Replication
---------------------------------------
-Hadoop users may specify a custom replication factor (e.g. 3 copies of each
-block) when creating a file. However, object replication factors are
-controlled on a per-pool basis in Ceph, and by default a Ceph file system will
-contain a pre-configured pool. In order to support per-file replication Hadoop
-can be configured to select from alternative pools when creating new files.
+The Hadoop file system interface allows users to specify a custom replication
+factor (e.g. 3 copies of each block) when creating a file. However, object
+replication factors in the Ceph file system are controlled on a per-pool
+basis, and by default a Ceph file system will contain only a single
+pre-configured pool. Thus, in order to support per-file replication with
+Hadoop over Ceph, additional storage pools with non-default replication
+factors must be created, and Hadoop must be configured to choose from these
+additional pools.
Additional data pools can be specified using the ``ceph.data.pools``
configuration option. The value of the option is a comma separated list of
pool names. The default Ceph pool will be used automatically if this
configuration option is omitted or the value is empty. For example, the
-following configuration setting will consider the three pools listed. ::
+following configuration setting will consider the pools ``pool1``, ``pool2``, and
+``pool5`` when selecting a target pool to store a file. ::
<property>
<name>ceph.data.pools</name>
.. _RADOS Pool documentation: ../../rados/operations/pools
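The handling of the ``ceph.data.pools`` value described above can be sketched in Python as follows. This is a minimal illustration, not the Hadoop binding's actual code. The helper name ``configured_pools`` is hypothetical, and trimming spaces around each name and skipping empty entries are assumptions. An empty result stands for "no custom pools", in which case the default Ceph data pool is used.

```python
from typing import List, Optional


def configured_pools(value: Optional[str]) -> List[str]:
    """Split a ``ceph.data.pools`` value into pool names.

    Hypothetical helper. An empty result means no custom pools were
    configured, so the default Ceph data pool is used. Trimming spaces
    around names and skipping empty entries are assumptions.
    """
    if value is None:  # option omitted entirely
        return []
    # Comma-separated list; drop entries that are empty after trimming.
    return [name.strip() for name in value.split(",") if name.strip()]


print(configured_pools("pool1,pool2,pool5"))  # ['pool1', 'pool2', 'pool5']
print(configured_pools(""))                   # []
```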
Once a pool has been created and configured the metadata service must be told
-that the new pool may be used to store file data. A pool can be made available
+that the new pool may be used to store file data. A pool is made available
for storing file system data using the ``ceph mds add_data_pool`` command.
First, create the pool. In this example we create the ``hadoop1`` pool with
ceph osd pool create hadoop1 100
ceph osd pool set hadoop1 size 1
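When several replication factors are needed, the two commands above are repeated once per pool. The sketch below only builds the argument lists and does not run ``ceph``. The helper ``pool_setup``, the ``hadoop<N>`` naming scheme (extending the ``hadoop1`` example), and the fixed placement group count of 100 are assumptions for illustration. It also returns a matching value for ``ceph.data.pools``.

```python
from typing import List, Tuple


def pool_setup(replication_factors: List[int],
               pg_num: int = 100) -> Tuple[List[List[str]], str]:
    """Build ``ceph`` command lines for one pool per replication factor.

    Hypothetical helper: pools are named ``hadoop<N>`` after the
    ``hadoop1`` example, and pg_num defaults to the example's 100.
    Returns the commands and a comma-separated ``ceph.data.pools`` value.
    """
    commands: List[List[str]] = []
    names: List[str] = []
    for size in replication_factors:
        name = f"hadoop{size}"
        names.append(name)
        # Create the pool, then set its replication factor ("size").
        commands.append(["ceph", "osd", "pool", "create", name, str(pg_num)])
        commands.append(["ceph", "osd", "pool", "set", name, "size", str(size)])
    return commands, ",".join(names)


cmds, pools_value = pool_setup([1, 2])
for argv in cmds:
    print(" ".join(argv))
print(pools_value)  # hadoop1,hadoop2
```

Each argument list could be passed to ``subprocess.run(argv, check=True)`` on a host where the ``ceph`` CLI is configured; the new pools still have to be registered with the metadata service before Hadoop can use them.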
-Next, determine the pool id. This can be done using the ``ceph osd dump``
-command. For example, we can look for the newly created ``hadoop1`` pool. ::
+Next, determine the pool id. This can be done by examining the output of the
+``ceph osd dump`` command. For example, we can look for the newly created
+``hadoop1`` pool. ::
ceph osd dump | grep hadoop1
<value>hadoop1</value>
</property>
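The pool id lookup done with ``ceph osd dump`` can also be scripted. The sketch below assumes that each pool line in the plain-text output begins with ``pool <id> '<name>'``; the sample line is illustrative rather than captured from a real cluster, and the exact format may differ between Ceph releases. The helpers ``find_pool_id`` and ``add_data_pool_argv`` are hypothetical, and the second one assumes that ``ceph mds add_data_pool`` takes the numeric pool id.

```python
import re
from typing import List, Optional

# Assumed line format, e.g.: pool 3 'hadoop1' replicated size 1 ...
POOL_LINE = re.compile(r"^pool (\d+) '([^']+)'")


def find_pool_id(osd_dump: str, pool_name: str) -> Optional[int]:
    """Hypothetical helper: return the id of pool_name, or None if absent."""
    for line in osd_dump.splitlines():
        match = POOL_LINE.match(line.strip())
        if match and match.group(2) == pool_name:
            return int(match.group(1))
    return None


def add_data_pool_argv(pool_id: int) -> List[str]:
    # Hypothetical helper; assumes the command accepts the numeric pool id.
    return ["ceph", "mds", "add_data_pool", str(pool_id)]


sample = "pool 3 'hadoop1' replicated size 1 min_size 1 pg_num 100"  # illustrative
pool_id = find_pool_id(sample, "hadoop1")
print(pool_id)                      # 3
print(add_data_pool_argv(pool_id))  # ['ceph', 'mds', 'add_data_pool', '3']
```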
-Pool Selection Semantics
-~~~~~~~~~~~~~~~~~~~~~~~~
+Pool Selection Rules
+~~~~~~~~~~~~~~~~~~~~
-The following semantics describe the rules by which Hadoop will choose a pool
-given a desired replication factor and the set of pools specified using the
+The following rules describe how Hadoop chooses a pool given a desired
+replication factor and the set of pools specified using the
``ceph.data.pools`` configuration option.
1. When no custom pools are specified the default Ceph data pool is used.