From: Noah Watkins
Date: Sat, 23 Feb 2013 01:58:25 +0000 (-0800)
Subject: doc: Hadoop clarifications
X-Git-Tag: v0.67-rc1~81^2~3
X-Git-Url: http://git-server-git.apps.pok.os.sepia.ceph.com/?a=commitdiff_plain;h=743c528754ee64a5db5149810c1425f5b3469cdc;p=ceph.git

doc: Hadoop clarifications

Signed-off-by: Noah Watkins
---

diff --git a/doc/cephfs/hadoop.rst b/doc/cephfs/hadoop.rst
index 625d46a0eecd..ddfa07a88daf 100644
--- a/doc/cephfs/hadoop.rst
+++ b/doc/cephfs/hadoop.rst
@@ -50,17 +50,21 @@ These options are intended to be set in the Hadoop configuration file
 Support For Per-file Custom Replication
 ---------------------------------------
 
-Hadoop users may specify a custom replication factor (e.g. 3 copies of each
-block) when creating a file. However, object replication factors are
-controlled on a per-pool basis in Ceph, and by default a Ceph file system will
-contain a pre-configured pool. In order to support per-file replication Hadoop
-can be configured to select from alternative pools when creating new files.
+The Hadoop file system interface allows users to specify a custom replication
+factor (e.g. 3 copies of each block) when creating a file. However, object
+replication factors in the Ceph file system are controlled on a per-pool
+basis, and by default a Ceph file system will contain only a single
+pre-configured pool. Thus, in order to support per-file replication with
+Hadoop over Ceph, additional storage pools with non-default replications
+factors must be created, and Hadoop must be configured to choose from these
+additional pools.
 
 Additional data pools can be specified using the ``ceph.data.pools``
 configuration option. The value of the option is a comma separated list of pool
 names. The default Ceph pool will be used automatically if this configuration
 option is omitted or the value is empty. For example, the
-following configuration setting will consider the three pools listed. ::
+following configuration setting will consider the pools ``pool1``, ``pool2``, and
+``pool5`` when selecting a target pool to store a file. ::
 
     ceph.data.pools
@@ -76,7 +80,7 @@ documentation`_.
 .. _RADOS Pool documentation: ../../rados/operations/pools
 
 Once a pool has been created and configured the metadata service must be told
-that the new pool may be used to store file data. A pool can be made available
+that the new pool may be used to store file data. A pool is be made available
 for storing file system data using the ``ceph mds add_data_pool`` command.
 
 First, create the pool. In this example we create the ``hadoop1`` pool with
@@ -85,8 +89,9 @@ replication factor 1. ::
     ceph osd pool create hadoop1 100
     ceph osd pool set hadoop1 size 1
 
-Next, determine the pool id. This can be done using the ``ceph osd dump``
-command. For example, we can look for the newly created ``hadoop1`` pool. ::
+Next, determine the pool id. This can be done by examining the output of the
+``ceph osd dump`` command. For example, we can look for the newly created
+``hadoop1`` pool. ::
 
     ceph osd dump | grep hadoop1
 
@@ -107,11 +112,11 @@ selecting the target pool for new files. ::
     hadoop1
 
-Pool Selection Semantics
-~~~~~~~~~~~~~~~~~~~~~~~~
+Pool Selection Rules
+~~~~~~~~~~~~~~~~~~~~
 
-The following semantics describe the rules by which Hadoop will choose a pool
-given a desired replication factor and the set of pools specified using the
+The following rules describe how Hadoop chooses a pool given a desired
+replication factor and the set of pools specified using the
 ``ceph.data.pools`` configuration option.
 
 1. When no custom pools are specified the default Ceph data pool is used.