From: Zac Dover
Date: Mon, 1 May 2023 02:29:07 +0000 (+1000)
Subject: doc/rados: edit stretch-mode.rst
X-Git-Tag: v17.2.7~435^2
X-Git-Url: http://git.apps.os.sepia.ceph.com/?a=commitdiff_plain;h=68cb4e7040584ece5b76fa5a114574d4cee849f8;p=ceph.git

doc/rados: edit stretch-mode.rst

Refine and supplement the introductory and explanatory text at the top
of the /doc/rados/operations/stretch-mode.rst file.

Co-authored-by: Josh Durgin
Co-authored-by: Anthony D'Atri
Signed-off-by: Zac Dover
(cherry picked from commit b642900abc57302e62a5064dba510c3cc5743ac0)
---

diff --git a/doc/rados/operations/stretch-mode.rst b/doc/rados/operations/stretch-mode.rst
index 7b7364049db2f..a540d819d5e01 100644
--- a/doc/rados/operations/stretch-mode.rst
+++ b/doc/rados/operations/stretch-mode.rst
@@ -7,33 +7,44 @@ Stretch Clusters
 Stretch Clusters
 ================
 
-Ceph generally expects all parts of its network and overall cluster to be
-equally reliable, with failures randomly distributed across the CRUSH map.
-So you may lose a switch that knocks out a number of OSDs, but we expect
-the remaining OSDs and monitors to route around that.
-
-This is usually a good choice, but may not work well in some
-stretched cluster configurations where a significant part of your cluster
-is stuck behind a single network component. For instance, a single
-cluster which is located in multiple data centers, and you want to
-sustain the loss of a full DC.
-
-There are two standard configurations we've seen deployed, with either
-two or three data centers (or, in clouds, availability zones). With two
-zones, we expect each site to hold a copy of the data, and for a third
-site to have a tiebreaker monitor (this can be a VM or high-latency compared
-to the main sites) to pick a winner if the network connection fails and both
-DCs remain alive. For three sites, we expect a copy of the data and an equal
-number of monitors in each site.
-
-Note that the standard Ceph configuration will survive MANY failures of the
-network or data centers and it will never compromise data consistency. If you
-bring back enough Ceph servers following a failure, it will recover. If you
-lose a data center, but can still form a quorum of monitors and have all the data
-available (with enough copies to satisfy pools' ``min_size``, or CRUSH rules
-that will re-replicate to meet it), Ceph will maintain availability.
-
-What can't it handle?
+
+A stretch cluster is a cluster that has servers in geographically separated
+data centers, distributed over a WAN. Stretch clusters have LAN-like high-speed
+and low-latency connections, but a limited number of links between the sites.
+Stretch clusters have a higher likelihood of (possibly asymmetric) network
+splits, and a higher likelihood of temporary or complete loss of an entire data
+center (which can represent one-third to one-half of the total cluster).
+
+Ceph is designed with the expectation that all parts of its network and cluster
+will be reliable and that failures will be distributed randomly across the
+CRUSH map. Even if a switch goes down and causes the loss of many OSDs, Ceph is
+designed so that the remaining OSDs and monitors will route around such a loss.
+
+Sometimes this cannot be relied upon. If you have a "stretched-cluster"
+deployment in which much of your cluster is behind a single network component,
+you might need to use **stretch mode** to ensure data integrity.
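+
+As an illustrative sketch only (the full procedure is described later in this
+document), a two-site cluster with a tiebreaker monitor might enter stretch
+mode with commands along these lines, where the monitor names (``a`` through
+``e``), the data center names (``site1``, ``site2``, ``site3``), and the
+pre-existing CRUSH rule ``stretch_rule`` are placeholders::
+
+    # Stretch mode uses the connectivity monitor election strategy.
+    ceph mon set election_strategy connectivity
+    # Assign each monitor to a CRUSH datacenter bucket.
+    ceph mon set_location a datacenter=site1
+    ceph mon set_location b datacenter=site1
+    ceph mon set_location c datacenter=site2
+    ceph mon set_location d datacenter=site2
+    ceph mon set_location e datacenter=site3
+    # Enter stretch mode: tiebreaker monitor, new CRUSH rule, dividing bucket.
+    ceph mon enable_stretch_mode e stretch_rule datacenter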
+
+We will here consider two standard configurations: a configuration with two
+data centers (or, in clouds, two availability zones), and a configuration with
+three data centers (or, in clouds, three availability zones).
+
+In the two-site configuration, Ceph expects each of the sites to hold a copy of
+the data, and Ceph also expects there to be a third site that has a tiebreaker
+monitor. This tiebreaker monitor picks a winner if the network connection fails
+and both data centers remain alive.
+
+The tiebreaker monitor can be a VM. It can also have high latency relative to
+the two main sites.
+
+The standard Ceph configuration is able to survive MANY network failures or
+data-center failures without ever compromising data availability. If enough
+Ceph servers are brought back following a failure, the cluster *will* recover.
+If you lose a data center but are still able to form a quorum of monitors and
+still have all the data available, Ceph will maintain availability. (This
+assumes that the cluster has enough copies to satisfy the pools' ``min_size``
+configuration option, or (failing that) that the cluster has CRUSH rules in
+place that will cause the cluster to re-replicate the data until the
+``min_size`` configuration option has been met.)
 
 Stretch Cluster Issues
 ======================