doc/rados: line-edit erasure-code.rst

author Zac Dover <zac.dover@proton.me>

Tue, 21 Mar 2023 12:27:15 +0000 (22:27 +1000)

committer Zac Dover <zac.dover@proton.me>

Tue, 21 Mar 2023 22:10:22 +0000 (08:10 +1000)
diff --git a/doc/rados/operations/erasure-code.rst b/doc/rados/operations/erasure-code.rst
index e4e404574f6ae9048eccb884ff876bc82e76f7b5..d6321af3cacababb42faf171c16c42426e7227a1 100644
--- a/doc/rados/operations/erasure-code.rst
+++ b/doc/rados/operations/erasure-code.rst
@@ -1,14 +1,14 @@
  .. _ecpool:
  
-=============
+==============
   Erasure code
-=============
+==============
  
  By default, Ceph `pools <../pools>`_ are created with the type "replicated". In
-replicated-type pools, every object is copied to multiple disks (this
-multiple copying is the "replication").
+replicated-type pools, every object is copied to multiple disks. This
+multiple copying is the method of data protection known as "replication".
  
-In contrast, `erasure-coded <https://en.wikipedia.org/wiki/Erasure_code>`_
+By contrast, `erasure-coded <https://en.wikipedia.org/wiki/Erasure_code>`_
  pools use a method of data protection that is different from replication. In
  erasure coding, data is broken into fragments of two kinds: data blocks and
  parity blocks. If a drive fails or becomes corrupted, the parity blocks are
@@ -23,10 +23,10 @@ first forward error correction code was developed in 1950 by Richard
  Hamming at Bell Laboratories.
  
  
-Creating a sample erasure coded pool
+Creating a sample erasure-coded pool
  ------------------------------------
  
-The simplest erasure coded pool is equivalent to `RAID5
+The simplest erasure-coded pool is similar to `RAID5
  <https://en.wikipedia.org/wiki/Standard_RAID_levels#RAID_5>`_ and
  requires at least three hosts:
  
@@ -47,12 +47,13 @@ requires at least three hosts:
  
     ABCDEFGHI
  
-Erasure code profiles
+Erasure-code profiles
  ---------------------
  
-The default erasure code profile can sustain the loss of two OSDs. This erasure
-code profile is equivalent to a replicated pool of size three, but requires
-2TB to store 1TB of data instead of 3TB to store 1TB of data. The default
+The default erasure-code profile can sustain the overlapping loss of two OSDs
+without losing data. This erasure-code profile is equivalent to a replicated
+pool of size three, but with different storage requirements: instead of
+requiring 3TB to store 1TB, it requires only 2TB to store 1TB. The default
  profile can be displayed with this command:
  
  .. prompt:: bash $
@@ -68,26 +69,27 @@ profile can be displayed with this command:
     technique=reed_sol_van
  
  .. note::
-   The default erasure-coded pool, the profile of which is displayed here, is
-   not the same as the simplest erasure-coded pool. 
-   
-   The default erasure-coded pool has two data chunks (k) and two coding chunks
-   (m). The profile of the default erasure-coded pool is "k=2 m=2".
+  The profile just displayed is for the *default* erasure-coded pool, not the
+  *simplest* erasure-coded pool. These two pools are not the same:
+
+   The default erasure-coded pool has two data chunks (K) and two coding chunks
+   (M). The profile of the default erasure-coded pool is "k=2 m=2".
  
-   The simplest erasure-coded pool has two data chunks (k) and one coding chunk
-   (m). The profile of the simplest erasure-coded pool is "k=2 m=1".
+   The simplest erasure-coded pool has two data chunks (K) and one coding chunk
+   (M). The profile of the simplest erasure-coded pool is "k=2 m=1".
  
  Choosing the right profile is important because the profile cannot be modified
  after the pool is created. If you find that you need an erasure-coded pool with
  a profile different than the one you have created, you must create a new pool
-with a different (and presumably more carefully-considered) profile. When the
-new pool is created, all objects from the wrongly-configured pool must be moved
-to the newly-created pool. There is no way to alter the profile of a pool after its creation.
+with a different (and presumably more carefully considered) profile. When the
+new pool is created, all objects from the wrongly configured pool must be moved
+to the newly created pool. There is no way to alter the profile of a pool after
+the pool has been created.
  
-The most important parameters of the profile are *K*, *M* and
+The most important parameters of the profile are *K*, *M*, and
  *crush-failure-domain* because they define the storage overhead and
  the data durability. For example, if the desired architecture must
-sustain the loss of two racks with a storage overhead of 67% overhead,
+sustain the loss of two racks with a storage overhead of 67%,
  the following profile can be defined:
  
  .. prompt:: bash $
@@ -106,7 +108,7 @@ the following profile can be defined:
  
  The *NYAN* object will be divided in three (*K=3*) and two additional
  *chunks* will be created (*M=2*). The value of *M* defines how many
-OSD can be lost simultaneously without losing any data. The
+OSDs can be lost simultaneously without losing any data. The
  *crush-failure-domain=rack* will create a CRUSH rule that ensures
  no two *chunks* are stored in the same rack.
  
@@ -155,19 +157,19 @@ no two *chunks* are stored in the same rack.
                                   +------+
  
   
-More information can be found in the `erasure code profiles
+More information can be found in the `erasure-code profiles
  <../erasure-code-profile>`_ documentation.
  
  
  Erasure Coding with Overwrites
  ------------------------------
  
-By default, erasure coded pools only work with uses like RGW that
-perform full object writes and appends.
+By default, erasure-coded pools work only with operations that
+perform full object writes and appends (for example, RGW).
  
-Since Luminous, partial writes for an erasure coded pool may be
+Since Luminous, partial writes for an erasure-coded pool may be
  enabled with a per-pool setting. This lets RBD and CephFS store their
-data in an erasure coded pool:
+data in an erasure-coded pool:
  
  .. prompt:: bash $
  
@@ -175,31 +177,33 @@ data in an erasure coded pool:
  
  This can be enabled only on a pool residing on BlueStore OSDs, since
  BlueStore's checksumming is used during deep scrubs to detect bitrot
-or other corruption. In addition to being unsafe, using Filestore with
-EC overwrites results in lower performance compared to BlueStore.
+or other corruption. Using Filestore with EC overwrites is not only
+unsafe, but it also results in lower performance compared to BlueStore.
  
-Erasure coded pools do not support omap, so to use them with RBD and
-CephFS you must instruct them to store their data in an EC pool, and
+Erasure-coded pools do not support omap, so to use them with RBD and
+CephFS you must instruct them to store their data in an EC pool and
  their metadata in a replicated pool. For RBD, this means using the
-erasure coded pool as the ``--data-pool`` during image creation:
+erasure-coded pool as the ``--data-pool`` during image creation:
  
  .. prompt:: bash $
  
      rbd create --size 1G --data-pool ec_pool replicated_pool/image_name
  
-For CephFS, an erasure coded pool can be set as the default data pool during
+For CephFS, an erasure-coded pool can be set as the default data pool during
  file system creation or via `file layouts <../../../cephfs/file-layouts>`_.
  
  
-Erasure coded pool and cache tiering
-------------------------------------
+Erasure-coded pools and cache tiering
+-------------------------------------
  
-Erasure coded pools require more resources than replicated pools and
-lack some functionality such as omap. To overcome these
-limitations, one can set up a `cache tier <../cache-tiering>`_
-before the erasure coded pool.
+Erasure-coded pools require more resources than replicated pools and
+lack some of the functionality supported by replicated pools (for example, omap).
+To overcome these limitations, one can set up a `cache tier <../cache-tiering>`_
+before setting up the erasure-coded pool.
  
-For instance, if the pool *hot-storage* is made of fast storage:
+For example, if the pool *hot-storage* is made of fast storage, the following commands
+will place the *hot-storage* pool as a tier of *ecpool* in *writeback*
+mode:
  
  .. prompt:: bash $
  
@@ -207,58 +211,60 @@ For instance, if the pool *hot-storage* is made of fast storage:
     ceph osd tier cache-mode hot-storage writeback
     ceph osd tier set-overlay ecpool hot-storage
  
-will place the *hot-storage* pool as tier of *ecpool* in *writeback*
-mode so that every write and read to the *ecpool* are actually using
-the *hot-storage* and benefit from its flexibility and speed.
+The result is that every write and read to the *ecpool* actually uses
+the *hot-storage* pool and benefits from its flexibility and speed.
  
  More information can be found in the `cache tiering
-<../cache-tiering>`_ documentation.  Note however that cache tiering
+<../cache-tiering>`_ documentation. Note, however, that cache tiering
  is deprecated and may be removed completely in a future release.
  
-Erasure coded pool recovery
+Erasure-coded pool recovery
  ---------------------------
-If an erasure coded pool loses some data shards, it must recover them from others.
-This involves reading from the remaining shards, reconstructing the data, and
+If an erasure-coded pool loses any data shards, it must recover them from others.
+This recovery involves reading from the remaining shards, reconstructing the data, and
  writing new shards.
+
  In Octopus and later releases, erasure-coded pools can recover as long as there are at least *K* shards
  available. (With fewer than *K* shards, you have actually lost data!)
  
-Prior to Octopus, erasure coded pools required at least ``min_size`` shards to be
-available, even if ``min_size`` is greater than ``K``. We recommend ``min_size``
-be ``K+2`` or more to prevent loss of writes and data.
-This conservative decision was made out of an abundance of caution when
-designing the new pool mode.  As a result pools with lost OSDs but without
-complete loss of any data were unable to recover and go active
-without manual intervention to temporarily change the ``min_size`` setting.
+Prior to Octopus, erasure-coded pools required that at least ``min_size`` shards be
+available, even if ``min_size`` was greater than ``K``. This was a conservative
+decision made out of an abundance of caution when designing the new pool
+mode. As a result, however, pools with lost OSDs but without complete data loss were
+unable to recover and go active without manual intervention to temporarily change
+the ``min_size`` setting.
+
+We recommend that ``min_size`` be ``K+2`` or greater to prevent loss of writes and
+loss of data.
+
+
  
  Glossary
  --------
  
  *chunk*
-   when the encoding function is called, it returns chunks of the same
-   size. Data chunks which can be concatenated to reconstruct the original
-   object and coding chunks which can be used to rebuild a lost chunk.
+   When the encoding function is called, it returns chunks of the same size as each other. There are two
+   kinds of chunks: (1) *data chunks*, which can be concatenated to reconstruct the original object, and
+   (2) *coding chunks*, which can be used to rebuild a lost chunk.
  
  *K*
-   the number of data *chunks*, i.e. the number of *chunks* in which the
-   original object is divided. For instance if *K* = 2 a 10KB object
-   will be divided into *K* objects of 5KB each.
+   The number of data chunks into which an object is divided. For example, if *K* = 2, then a 10KB object
+   is divided into two objects of 5KB each.
  
  *M*
-   the number of coding *chunks*, i.e. the number of additional *chunks*
-   computed by the encoding functions. If there are 2 coding *chunks*,
-   it means 2 OSDs can be out without losing data.
-
+   The number of coding chunks computed by the encoding function. *M* is equal to the number of OSDs that can
+   be missing from the cluster without the cluster suffering data loss. For example, if there are two coding
+   chunks, then two OSDs can be missing without data loss.
  
-Table of content
-----------------
+Table of contents
+-----------------
  
  .. toctree::
-       :maxdepth: 1
-
-       erasure-code-profile
-       erasure-code-jerasure
-       erasure-code-isa
-       erasure-code-lrc
-       erasure-code-shec
-       erasure-code-clay
+    :maxdepth: 1
+
+    erasure-code-profile
+    erasure-code-jerasure
+    erasure-code-isa
+    erasure-code-lrc
+    erasure-code-shec
+    erasure-code-clay
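
Review note: the storage figures quoted in the revised text (2TB of raw capacity per 1TB of data for the default "k=2 m=2" profile, and 67% overhead for the K=3, M=2 example) can be sanity-checked with a few lines of Python. This is only a sketch of the arithmetic. The function name `ec_storage_ratio` is hypothetical and is not part of Ceph, and the calculation assumes that raw usage is simply (K+M)/K, ignoring allocation and metadata overhead.

```python
def ec_storage_ratio(k: int, m: int) -> float:
    """Raw bytes written per byte of user data, assuming (k + m) / k.

    Hypothetical helper for illustration only; not a Ceph API.
    """
    if k < 1 or m < 0:
        raise ValueError("k must be >= 1 and m must be >= 0")
    return (k + m) / k


# Profiles mentioned in the documentation text being edited.
profiles = {
    "default (k=2 m=2)": (2, 2),
    "simplest (k=2 m=1)": (2, 1),
    "example (k=3 m=2)": (3, 2),
}

for name, (k, m) in profiles.items():
    ratio = ec_storage_ratio(k, m)
    overhead_pct = 100 * (ratio - 1)
    print(f"{name}: {ratio:.2f}x raw capacity, "
          f"{overhead_pct:.0f}% overhead, tolerates {m} lost OSDs")

# For comparison, a replicated pool of size 3 stores every object three
# times, so its ratio is 3.0 (3TB of raw capacity per 1TB of data).
replicated_size_3_ratio = 3.0
```

Under these assumptions, the default profile needs twice the raw capacity of the stored data, compared with three times for a replicated pool of size three, and the K=3, M=2 profile rounds to the 67% overhead stated in the text.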