From 17d6e4315b7c04fe12234c08d959b3c06a84d32c Mon Sep 17 00:00:00 2001
From: Zac Dover
Date: Tue, 18 Jun 2024 15:22:41 +1000
Subject: [PATCH] doc/rados: add stretch_rule workaround

Add a method for defining a CRUSH rule that returns the actual value of
the total available size.

Fixes: https://tracker.ceph.com/issues/56650
Signed-off-by: Zac Dover
(cherry picked from commit 007385a3ef05bd92e006fc7d6aba3fbb51792ef7)
---
 doc/rados/operations/stretch-mode.rst | 51 +++++++++++++++++++++++++++
 1 file changed, 51 insertions(+)

diff --git a/doc/rados/operations/stretch-mode.rst b/doc/rados/operations/stretch-mode.rst
index 787e8cb4d9309..5c7269b2e9a44 100644
--- a/doc/rados/operations/stretch-mode.rst
+++ b/doc/rados/operations/stretch-mode.rst
@@ -130,6 +130,57 @@ your CRUSH map. This procedure shows how to do this.
           step emit
       }
 
+   .. warning:: If a CRUSH rule is defined for a stretch mode cluster and the
+      rule has multiple "takes" in it, then ``MAX AVAIL`` for the pools
+      associated with the CRUSH rule will report that the available size is all
+      of the available space from the datacenter, not the available space for
+      the pools associated with the CRUSH rule.
+
+      For example, consider a cluster with two CRUSH rules, ``stretch_rule`` and
+      ``stretch_replicated_rule``::
+
+         rule stretch_rule {
+             id 1
+             type replicated
+             step take DC1
+             step chooseleaf firstn 2 type host
+             step emit
+             step take DC2
+             step chooseleaf firstn 2 type host
+             step emit
+         }
+
+         rule stretch_replicated_rule {
+             id 2
+             type replicated
+             step take default
+             step choose firstn 0 type datacenter
+             step chooseleaf firstn 2 type host
+             step emit
+         }
+
+      In the above example, ``stretch_rule`` will report an incorrect value for
+      ``MAX AVAIL``. ``stretch_replicated_rule`` will report the correct value.
+      This is because ``stretch_rule`` is defined in such a way that
+      ``PGMap::get_rule_avail`` considers only the available size of a single
+      datacenter, and not (as would be correct) the total available size from
+      both datacenters.
+
+      Here is a workaround. Instead of defining the stretch rule as defined in
+      the ``stretch_rule`` example above, define it as follows::
+
+         rule stretch_rule {
+             id 2
+             type replicated
+             step take default
+             step choose firstn 0 type datacenter
+             step chooseleaf firstn 2 type host
+             step emit
+         }
+
+      See https://tracker.ceph.com/issues/56650 for more detail on this workaround.
+
+
 #. Inject the CRUSH map to make the rule available to the cluster:
 
    .. prompt:: bash $
-- 
2.39.5