From aa88dfac4e6618cf5d71b6f9983572b13f2c4e12 Mon Sep 17 00:00:00 2001
From: Samuel Just
Date: Mon, 11 Dec 2023 13:06:42 -0800
Subject: [PATCH] doc/rados/operations: add CRUSH MSR documentation

Signed-off-by: Samuel Just
---
 doc/rados/operations/crush-map-edits.rst | 31 +++++++++++++++++++++---
 doc/rados/operations/crush-map.rst       | 22 +++++++++++++++++
 2 files changed, 50 insertions(+), 3 deletions(-)

diff --git a/doc/rados/operations/crush-map-edits.rst b/doc/rados/operations/crush-map-edits.rst
index 46a4a4f74e873..84fd85dc2c01b 100644
--- a/doc/rados/operations/crush-map-edits.rst
+++ b/doc/rados/operations/crush-map-edits.rst
@@ -419,7 +419,7 @@ centers for three-way replication, and yet another rule for erasure coding acros
 six storage devices. For a detailed discussion of CRUSH rules, see **Section 3.2**
 of `CRUSH - Controlled, Scalable, Decentralized Placement of Replicated Data`_.
 
-A rule takes the following form::
+A normal CRUSH rule takes the following form::
 
     rule {
 
@@ -430,6 +430,19 @@ A rule takes the following form::
         step emit
     }
 
+CRUSH MSR (Multi-Step Retry) rules are a distinct type of CRUSH rule that
+can retry steps and therefore better supports configurations that
+require multiple OSDs within each failure domain. MSR rules take the following
+form::
+
+    rule {
+
+        id [a unique integer ID]
+        type [msr_indep|msr_firstn]
+        step take [class ]
+        step choosemsr type
+        step emit
+    }
 
 ``id``
    :Description: A unique integer that identifies the rule.
@@ -441,12 +454,14 @@ A rule takes the following form::
 
 ``type``
    :Description: Denotes the type of replication strategy to be enforced by the
-                 rule.
+                 rule. ``msr_firstn`` and ``msr_indep`` use a distinct descent
+                 algorithm that can retry prior steps within the rule and can
+                 therefore place multiple OSDs in each failure domain.
    :Purpose: A component of the rule mask.
    :Type: String
    :Required: Yes
    :Default: ``replicated``
-   :Valid Values: ``replicated`` or ``erasure``
+   :Valid Values: ``replicated``, ``erasure``, ``msr_firstn``, ``msr_indep``
 
 
 ``step take [class ]``
@@ -525,6 +540,16 @@ A rule takes the following form::
                final CRUSH mapping transformation is therefore 1, 2, 3, 4, 5 → 1, 2, 6, 4, 5.
 
 
+``step choosemsr {num} type {bucket-type}``
+   :Description: Selects ``{num}`` buckets of type ``{bucket-type}``. ``msr_firstn``
+                 and ``msr_indep`` rules must use ``choosemsr`` rather than ``choose`` or ``chooseleaf``.
+
+                 - If ``{num} == 0``, choose ``pool-num-replicas`` buckets (as many buckets as are available).
+                 - If ``pool-num-replicas > {num} > 0``, choose that many buckets.
+   :Purpose: The choose step required by ``msr_firstn`` and ``msr_indep`` rules.
+   :Prerequisite: Follows ``step take`` and precedes ``step emit``.
+   :Example: ``step choosemsr 3 type host``
+
 .. _crush-reclassify:
 
 Migrating from a legacy SSD rule to device classes
diff --git a/doc/rados/operations/crush-map.rst b/doc/rados/operations/crush-map.rst
index 39151e6d4a766..8f833d28dacb6 100644
--- a/doc/rados/operations/crush-map.rst
+++ b/doc/rados/operations/crush-map.rst
@@ -709,6 +709,13 @@ The relevant erasure-code profile properties are as follows:
      [default: ``default``].
    * **crush-failure-domain**: the CRUSH bucket type used in the distribution of
      erasure-coded shards [default: ``host``].
+   * **crush-osds-per-failure-domain**: the maximum number of OSDs to place in
+     each failure domain [default: ``1``]. A value greater than one causes a
+     CRUSH MSR rule to be created (see below). Must be specified if
+     ``crush-num-failure-domains`` is specified.
+   * **crush-num-failure-domains**: the number of failure domains to map. Must
+     be specified if ``crush-osds-per-failure-domain`` is specified. Causes a
+     CRUSH MSR rule to be created.
    * **crush-device-class**: the device class on which to place data [default:
      none, which means that all devices are used].
    * **k** and **m** (and, for the ``lrc`` plugin, **l**): these determine the
@@ -726,6 +733,21 @@ The relevant erasure-code profile properties are as follows:
    argument is omitted, then Ceph will create the CRUSH rule
    automatically.
 
+CRUSH MSR Rules
+---------------
+
+Creating an erasure-code profile with a ``crush-osds-per-failure-domain``
+value greater than one causes a CRUSH MSR rule to be created instead of a
+normal CRUSH rule. Normal CRUSH rules cannot retry prior steps when an out
+OSD is encountered; instead, they rely on CHOOSELEAF steps so that a
+replacement OSD can be chosen from a different host. However, CHOOSELEAF
+steps do not support placing more than one OSD in each failure domain. MSR
+rules, new in Squid, support multiple OSDs per failure domain by retrying
+all prior steps when an out OSD is encountered. Using MSR rules requires
+that all OSDs and clients support the CRUSH_MSR feature bit (that is, that
+they run Squid or newer).
+
+
 
 Deleting rules
 --------------
-- 
2.39.5
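The relationship between the erasure-code profile properties documented in this patch (``crush-osds-per-failure-domain``, ``crush-num-failure-domains``) and the resulting MSR rule can be sketched in Python. This is a hypothetical helper, not Ceph code: the function names ``uses_msr_rule`` and ``render_msr_rule`` are invented for illustration, and three details are assumptions rather than facts taken from the patch: that Ceph generates an ``msr_indep`` rule for erasure-coded pools, that the rule runs one ``choosemsr`` step over the failure-domain type followed by one over ``osd``, and that the failure domains times OSDs per domain must cover all ``k+m`` shards.

```python
"""Hypothetical sketch: map erasure-code profile properties to a CRUSH rule.

Not Ceph code. Assumes the generated MSR rule takes the root (and optional
device class), runs one choosemsr step over the failure-domain type, then
one choosemsr step over OSDs, and uses the msr_indep rule type.
"""

from typing import Dict


def uses_msr_rule(profile: Dict[str, str]) -> bool:
    """Return True if the profile's properties call for a CRUSH MSR rule."""
    has_osds = "crush-osds-per-failure-domain" in profile
    has_domains = "crush-num-failure-domains" in profile
    # Each of the two properties must be specified if the other one is.
    if has_osds != has_domains:
        raise ValueError(
            "crush-osds-per-failure-domain and crush-num-failure-domains "
            "must be specified together"
        )
    # crush-osds-per-failure-domain defaults to 1; more than one means MSR.
    return int(profile.get("crush-osds-per-failure-domain", "1")) > 1


def render_msr_rule(name: str, rule_id: int, profile: Dict[str, str]) -> str:
    """Render the text of an msr_indep rule for an MSR-eligible profile."""
    if not uses_msr_rule(profile):
        raise ValueError("profile does not call for a CRUSH MSR rule")
    root = profile.get("crush-root", "default")
    domain_type = profile.get("crush-failure-domain", "host")
    num_domains = int(profile["crush-num-failure-domains"])
    osds_per_domain = int(profile["crush-osds-per-failure-domain"])
    shards = int(profile["k"]) + int(profile["m"])
    # Assumption: each of the k+m shards needs its own OSD slot.
    if num_domains * osds_per_domain < shards:
        raise ValueError("too few OSD slots for k+m shards")

    take = f"step take {root}"
    if "crush-device-class" in profile:
        take += f" class {profile['crush-device-class']}"
    lines = [
        f"id {rule_id}",
        "type msr_indep",
        take,
        f"step choosemsr {num_domains} type {domain_type}",
        f"step choosemsr {osds_per_domain} type osd",
        "step emit",
    ]
    body = "\n".join(f"    {line}" for line in lines)
    return f"rule {name} {{\n{body}\n}}"


if __name__ == "__main__":
    # A 4+2 profile spread over 3 hosts with at most 2 OSDs per host.
    example = {
        "k": "4",
        "m": "2",
        "crush-failure-domain": "host",
        "crush-num-failure-domains": "3",
        "crush-osds-per-failure-domain": "2",
    }
    print(render_msr_rule("ec42", 2, example))
```

Run as a script, the example prints a rule containing ``step choosemsr 3 type host`` followed by ``step choosemsr 2 type osd``. Because the rule shape is assumed, compare it with the rule Ceph actually creates (for example, as reported by ``ceph osd crush rule dump``) before relying on it.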