From e6ae37f56894462b86fe4eb26e9f1b1278bf0e9c Mon Sep 17 00:00:00 2001
From: Kamoltat Sirivadhna
Date: Tue, 7 Jan 2025 09:36:03 +0000
Subject: [PATCH] doc/rados/operations/stretch-mode: Improve doc

Added more content and rewrite some sections

Signed-off-by: Kamoltat Sirivadhna
(cherry picked from commit 8cc7fdbd29e1bf936b33256c74d48e23d75eaf96)
---
 doc/rados/operations/stretch-mode.rst | 56 ++++++++++++++++++++++++---
 1 file changed, 50 insertions(+), 6 deletions(-)

diff --git a/doc/rados/operations/stretch-mode.rst b/doc/rados/operations/stretch-mode.rst
index 37c1de14b83..da003569295 100644
--- a/doc/rados/operations/stretch-mode.rst
+++ b/doc/rados/operations/stretch-mode.rst
@@ -84,13 +84,29 @@ situation is surprisingly difficult to avoid using only standard CRUSH rules.
 
 Stretch Mode
 ============
-Stretch mode is designed to handle deployments in which you cannot guarantee the
-replication of data across two data centers. This kind of situation can arise
-when the cluster's CRUSH rule specifies that three copies are to be made, but
-then a copy is placed in each data center with a ``min_size`` of 2. Under such
-conditions, a placement group can become active with two copies in the first
-data center and no copies in the second data center.
 
+Stretch mode is designed to handle netsplit scenarios between two data zones as well
+as the loss of one data zone. It handles the netsplit scenario by choosing the surviving zone
+that has the better connection to the ``tiebreaker monitor``. It handles the loss of one zone by
+reducing the ``size`` to ``2`` and ``min_size`` to ``1``, allowing the cluster to continue operating
+with the remaining zone. When the lost zone comes back, the cluster will recover the lost data
+and return to normal operation.
+
+Connectivity Monitor Election Strategy
+---------------------------------------
+When using stretch mode, the monitor election strategy must be set to ``connectivity``.
+This strategy tracks network connectivity between the monitors and is
+used to determine which zone should be favored when the cluster is in a netsplit scenario.
+
+See `Changing Monitor Elections`_
+
+Stretch Peering Rule
+--------------------
+One critical behavior of stretch mode is its ability to prevent a PG from going active if the acting set
+contains only replicas from a single zone. This safeguard is crucial for mitigating the risk of data
+loss during site failures because if a PG were allowed to go active with replicas only in a single site,
+writes could be acknowledged despite a lack of redundancy. In the event of a site failure, all data in the
+affected PG would be lost.
 
 Entering Stretch Mode
 ---------------------
@@ -235,6 +251,34 @@ possible, if needed).
 
 .. _Changing Monitor elections: ../change-mon-elections
 
+Exiting Stretch Mode
+--------------------
+To exit stretch mode, run the following command:
+
+.. prompt:: bash $
+
+   ceph mon disable_stretch_mode [{crush_rule}] --yes-i-really-mean-it
+
+
+.. describe:: {crush_rule}
+
+   The CRUSH rule that the user wants all pools to move back to. If this
+   is not specified, the pools will move back to the default CRUSH rule.
+
+   :Type: String
+   :Required: No.
+
+The command will move the cluster back to normal mode,
+and the cluster will no longer be in stretch mode.
+All pools will move its ``size`` and ``min_size``
+back to the default values it started with.
+At this point the user is responsible for scaling down the cluster
+to the desired number of OSDs if they choose to operate with less number of OSDs.
+
+Please note that the command will not execute when the cluster is in
+``recovery stretch mode``. The command will only execute when the cluster
+is in ``degraded stretch mode`` or ``healthy stretch mode``.
+
 Limitations of Stretch Mode
 ===========================
 When using stretch mode, OSDs must be located at exactly two sites.
-- 
2.39.5
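
Illustrative note (not part of the patch): the "Stretch Peering Rule" text added by this patch says that a PG must not go active when its acting set contains replicas from only a single zone. The Python sketch below models just that one condition so the rule is easy to reason about. The `Replica` type, the zone labels `dc1` and `dc2`, and the function `may_go_active` are hypothetical names invented for this note; they are not Ceph APIs, and Ceph's actual peering logic is considerably more involved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    """One member of a PG's acting set (hypothetical model, not a Ceph type)."""
    osd_id: int
    zone: str  # assumed label for the site/CRUSH bucket that holds this OSD


def may_go_active(acting_set, min_zones=2):
    """Return True only if the acting set spans at least `min_zones` zones.

    This mirrors the rule described in the patch text: replicas that all
    live in a single zone are not enough for the PG to go active.
    """
    zones = {replica.zone for replica in acting_set}
    return len(zones) >= min_zones


# Acting set with replicas in both zones: allowed to go active.
spans_both_zones = [Replica(0, "dc1"), Replica(1, "dc1"), Replica(4, "dc2")]

# Acting set confined to one zone: must stay inactive under the rule.
single_zone_only = [Replica(0, "dc1"), Replica(1, "dc1")]

print(may_go_active(spans_both_zones))
print(may_go_active(single_zone_only))
```

The second acting set shows the case the rule is meant to stop: two healthy replicas that would otherwise look sufficient, but which would all be lost together if their one site failed.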