zones, we expect each site to hold a copy of the data, and for a third
site to have a tiebreaker monitor (this can be a VM, and it may have higher
latency than the main sites) to pick a winner if the network connection fails and both
-DCs remain alive. For three sites, we expect a a copy of the data and an equal
+DCs remain alive. For three sites, we expect a copy of the data and an equal
number of monitors in each site.
-Note, the standard Ceph configuration will survive MANY failures of
-the network or Data Centers, if you have configured it correctly, and it will
-never compromise data consistency -- if you bring back enough of the Ceph servers
-following a failure, it will recover. If you lose
-a data center and can still form a quorum of monitors and have all the data
-available (with enough copies to satisfy min_size, or CRUSH rules that will
-re-replicate to meet it), Ceph will maintain availability.
+Note that the standard Ceph configuration will survive MANY failures of the
+network or data centers and it will never compromise data consistency. If you
+bring back enough Ceph servers following a failure, it will recover. If you
+lose a data center, but can still form a quorum of monitors and have all the data
+available (with enough copies to satisfy pools' ``min_size``, or CRUSH rules
+that will re-replicate to meet it), Ceph will maintain availability.
What can't it handle?
The second important category of failures is when you think you have data replicated
across data centers, but the constraints aren't sufficient to guarantee this.
For instance, you might have data centers A and B, and your CRUSH rule targets 3 copies
-and places a copy in each data center with a min_size of 2. The PG may go active with
+and places a copy in each data center with a ``min_size`` of 2. The PG may go active with
2 copies in site A and no copies in site B, which means that if you then lose site A you
have lost data and Ceph can't operate on it. This situation is surprisingly difficult
to avoid with standard CRUSH rules.
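The failure described above can be illustrated with a small Python sketch. It is a toy model, not Ceph code: the OSD names and the two-plus-one layout across sites A and B are hypothetical, chosen only to show which groups of ``min_size`` copies sit entirely inside one site.

```python
from itertools import combinations

# Hypothetical layout: a 3-copy rule across two sites necessarily
# puts two copies in one site and one copy in the other.
REPLICAS = [("osd.0", "A"), ("osd.1", "A"), ("osd.2", "B")]
MIN_SIZE = 2


def single_site_active_sets(replicas, min_size):
    """Return every group of min_size replicas that lives in a single site."""
    risky = []
    for subset in combinations(replicas, min_size):
        sites = {site for _, site in subset}
        if len(sites) == 1:
            risky.append(tuple(name for name, _ in subset))
    return risky


# Any set listed here satisfies min_size while leaving the other site empty.
print(single_site_active_sets(REPLICAS, MIN_SIZE))
```

If a PG goes active on one of the listed sets and that site is then lost, no surviving copy holds the latest writes.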
Stretch Mode
============
-The new stretch mode is designed to handle the 2-site case. (3 sites are
-just as susceptible to netsplit issues, but much more resilient to surprising
-data availability ones than 2-site clusters are.)
+The new stretch mode is designed to handle the 2-site case. Three sites are
+just as susceptible to netsplit issues, but are much more tolerant of
+component availability outages than 2-site clusters are.
To enter stretch mode, you must set the location of each monitor, matching
-your CRUSH map. For instance, to place mon.a in your first data center ::
+your CRUSH map. For instance, to place ``mon.a`` in your first data center ::
$ ceph mon set_location a datacenter=site1
Next, generate a CRUSH rule which will place 2 copies in each data center. This
-will require editing the crush map directly::
+will require editing the CRUSH map directly::
$ ceph osd getcrushmap > crush.map.bin
$ crushtool -d crush.map.bin -o crush.map.txt
-Then edit the crush.map.txt file to add a new rule. Here
-there is only one other rule, so this is id 1, but you may need
-to use a different rule id. We also have two data center buckets
-named site1 and site2::
+Now edit the ``crush.map.txt`` file to add a new rule. Here
+there is only one other rule, so this is ID 1, but you may need
+to use a different rule ID. We also have two datacenter buckets
+named ``site1`` and ``site2``::
rule stretch_rule {
id 1
step emit
}
-Finally, inject the crushmap to make the rule available to the cluster::
+Finally, inject the CRUSH map to make the rule available to the cluster::
$ crushtool -c crush.map.txt -o crush2.map.bin
$ ceph osd setcrushmap -i crush2.map.bin
.. _Changing Monitor elections: ../change-mon-elections
+And lastly, tell the cluster to enter stretch mode. Here, ``mon.e`` is the
+tiebreaker and we are splitting across data centers ::
+
-And last, tell the cluster to enter stretch mode. Here, mon.e is the
-tiebreaker and we are splitting across datacenters ::
-
- $ ceph mon enable_stretch_mode e stretch_rule datacenter
+ $ ceph mon enable_stretch_mode e stretch_rule datacenter
When stretch mode is enabled, the OSDs will only take PGs active when
-they peer across datacenters (or whatever other CRUSH bucket type
+they peer across data centers (or whatever other CRUSH bucket type
you specified), assuming both are alive. Pools will increase in size
from the default 3 to 4, expecting 2 copies in each site. OSDs will only
be allowed to connect to monitors in the same data center.
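The peering condition in the paragraph above can be written down as a toy predicate. This is a simplified sketch, not Ceph's peering implementation: the function name and arguments are hypothetical, and the single-site branch assumes that one surviving site may serve data on its own while the cluster is degraded.

```python
def may_go_active(acting_sites, live_sites):
    """Toy model of the stretch-mode peering rule; not Ceph's implementation.

    acting_sites: sites holding the OSDs that would serve the PG.
    live_sites:   data sites that are currently up.
    """
    acting, live = set(acting_sites), set(live_sites)
    if len(live) >= 2:
        # Both sites alive: require a copy in every live site.
        return live <= acting
    # Assumed degraded case: the surviving site must hold a copy.
    return bool(acting & live)


print(may_go_active(["site1", "site1"], ["site1", "site2"]))
print(may_go_active(["site1", "site2"], ["site1", "site2"]))
print(may_go_active(["site1"], ["site1"]))
```

The first call models the case stretch mode is meant to rule out: both sites are up, but every copy sits in ``site1``.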
You must create your own CRUSH rule which provides 2 copies in each site, and
you must use 4 total copies with 2 in each site. If you have existing pools
with non-default size/min_size, Ceph will object when you attempt to
-enable_stretch_mode.
+enable stretch mode.
-Because it runs with min_size 1 when degraded, you should only use stretch mode
-with all-flash OSDs.
+Because it runs with ``min_size 1`` when degraded, you should only use stretch
+mode with all-flash OSDs. This minimizes the time needed to recover once
+connectivity is restored, and thus minimizes the potential for data loss.
Hopefully, future development will extend this feature to support EC pools and
running with more than 2 full sites.
This command should not be necessary; it is included to deal with
unanticipated situations. But you might wish to invoke it to remove
-the HEALTH_WARN state which recovery mode generates.
+the ``HEALTH_WARN`` state which recovery mode generates.