From 77a69405f92f9912f75b8e947fc9df8e1e92099c Mon Sep 17 00:00:00 2001
From: Patrick Donnelly
Date: Thu, 21 Feb 2019 20:23:13 -0800
Subject: [PATCH] doc: update documentation for standby-replay

Signed-off-by: Patrick Donnelly
---
 doc/cephfs/standby.rst    | 160 +++++---------------------------------
 doc/releases/nautilus.rst |   6 ++
 2 files changed, 25 insertions(+), 141 deletions(-)

diff --git a/doc/cephfs/standby.rst b/doc/cephfs/standby.rst
index 0aaab9b76c7..d4be67d0a3b 100644
--- a/doc/cephfs/standby.rst
+++ b/doc/cephfs/standby.rst
@@ -58,9 +58,10 @@ forms of the 'fail' command:
 Managing failover
 -----------------
 
-If an MDS daemon stops communicating with the monitor, the monitor will
-wait ``mds_beacon_grace`` seconds (default 15 seconds) before marking
-the daemon as *laggy*.
+If an MDS daemon stops communicating with the monitor, the monitor will wait
+``mds_beacon_grace`` seconds (default 15 seconds) before marking the daemon as
+*laggy*. If a standby is available, the monitor will immediately replace the
+laggy daemon.
 
 Each file system may specify a number of standby daemons to be considered
 healthy. This number includes daemons in standby-replay waiting for a rank to
@@ -76,148 +77,25 @@ Each file system may set the number of standby daemons wanted using:
 Setting ``count`` to 0 will disable the health check.
 
 
-Configuring standby daemons
----------------------------
+Configuring standby-replay
+--------------------------
 
-There are four configuration settings that control how a daemon
-will behave while in standby:
+Each CephFS file system may be configured to add standby-replay daemons. These
+standby daemons follow the active MDS's metadata journal to reduce failover
+time in the event the active MDS becomes unavailable. Each active MDS may have
+only one standby-replay daemon following it.
 
-::
-
-    mds_standby_replay
-    mds_standby_for_name
-    mds_standby_for_rank
-    mds_standby_for_fscid
-
-These may be set in the ceph.conf on the host where the MDS daemon
-runs (as opposed to on the monitor). The daemon loads these settings
-when it starts, and sends them to the monitor.
-
-By default, if none of these settings are used, all MDS daemons
-which do not hold a rank will be used as standbys for any rank.
-
-The settings which associate a standby daemon with a particular
-name or rank do not guarantee that the daemon will *only* be used
-for that rank. They mean that when several standbys are available,
-the associated standby daemon will be used. If a rank is failed,
-and a standby is available, it will be used even if it is associated
-with a different rank or named daemon.
-
-mds_standby_replay
-~~~~~~~~~~~~~~~~~~
-
-If this is set to true, then the standby daemon will continuously read
-the metadata journal of an up rank. This will give it
-a warm metadata cache, and speed up the process of failing over
-if the daemon serving the rank fails.
-
-An up rank may only have one standby replay daemon assigned to it,
-if two daemons are both set to be standby replay then one of them
-will arbitrarily win, and the other will become a normal non-replay
-standby.
-
-Once a daemon has entered the standby replay state, it will only be
-used as a standby for the rank that it is following. If another rank
-fails, this standby replay daemon will not be used as a replacement,
-even if no other standbys are available.
-
-*Historical note:* In Ceph prior to v10.2.1, this setting (when ``false``) is
-always true when ``mds_standby_for_*`` is also set.
-
-mds_standby_for_name
-~~~~~~~~~~~~~~~~~~~~
-
-Set this to make the standby daemon only take over a failed rank
-if the last daemon to hold it matches this name.
-
-mds_standby_for_rank
-~~~~~~~~~~~~~~~~~~~~
-
-Set this to make the standby daemon only take over the specified
-rank.  If another rank fails, this daemon will not be used to
-replace it.
-
-Use in conjunction with ``mds_standby_for_fscid`` to be specific
-about which filesystem's rank you are targeting, if you have
-multiple filesystems.
-
-mds_standby_for_fscid
-~~~~~~~~~~~~~~~~~~~~~
-
-If ``mds_standby_for_rank`` is set, this is simply a qualifier to
-say which filesystem's rank is referred to.
-
-If ``mds_standby_for_rank`` is not set, then setting FSCID will
-cause this daemon to target any rank in the specified FSCID. Use
-this if you have a daemon that you want to use for any rank, but
-only within a particular filesystem.
-
-mon_force_standby_active
-~~~~~~~~~~~~~~~~~~~~~~~~
-
-This setting is used on monitor hosts. It defaults to true.
-
-If it is false, then daemons configured with mds_standby_replay=true
-will **only** become active if the rank/name that they have
-been configured to follow fails. On the other hand, if this
-setting is true, then a daemon configured with mds_standby_replay=true
-may be assigned some other rank.
-
-Examples
---------
-
-These are example ceph.conf snippets. In practice you can either
-copy a ceph.conf with all daemons' configuration to all your servers,
-or you can have a different file on each server that contains just
-that server's daemons' configuration.
-
-Simple pair
-~~~~~~~~~~~
-
-Two MDS daemons 'a' and 'b' acting as a pair, where whichever one is not
-currently assigned a rank will be the standby replay follower
-of the other.
-
-::
-
-    [mds.a]
-    mds standby replay = true
-    mds standby for rank = 0
-
-    [mds.b]
-    mds standby replay = true
-    mds standby for rank = 0
-
-Floating standby
-~~~~~~~~~~~~~~~~
-
-Three MDS daemons 'a', 'b' and 'c', in a filesystem that has
-``max_mds`` set to 2.
+Configuring standby-replay on a file system is done using:
 
 ::
-
-    # No explicit configuration required: whichever daemon is
-    # not assigned a rank will go into 'standby' and take over
-    # for whichever other daemon fails.
-
-Two MDS clusters
-~~~~~~~~~~~~~~~~
-
-With two filesystems, I have four MDS daemons, and I want two
-to act as a pair for one filesystem and two to act as a pair
-for the other filesystem.
-
-::
-
-    [mds.a]
-    mds standby for fscid = 1
-
-    [mds.b]
-    mds standby for fscid = 1
 
-    [mds.c]
-    mds standby for fscid = 2
+    ceph fs set <fs name> allow_standby_replay <bool>
 
-    [mds.d]
-    mds standby for fscid = 2
+Once set, the monitors will assign available standby daemons to follow the
+active MDSs in that file system.
 
+Once an MDS has entered the standby-replay state, it will only be used as a
+standby for the rank that it is following. If another rank fails, this
+standby-replay daemon will not be used as a replacement, even if no other
+standbys are available. For this reason, it is advised that if standby-replay
+is used then every active MDS should have a standby-replay daemon.
diff --git a/doc/releases/nautilus.rst b/doc/releases/nautilus.rst
index e55315517da..b77830c73e1 100644
--- a/doc/releases/nautilus.rst
+++ b/doc/releases/nautilus.rst
@@ -428,6 +428,12 @@ These changes occurred between the Mimic and Nautilus releases.
   ``mds_recall_warning_decay_rate`` (default: 60s) sets the threshold for this
   warning.
 
+* The MDS mds_standby_for_*, mon_force_standby_active, and mds_standby_replay
+  configuration options have been obsoleted. Instead, the operator may now set
+  the new "allow_standby_replay" flag on the CephFS file system. This setting
+  causes standbys to become standby-replay for any available rank in the file
+  system.
+
 * The Telegraf module for the Manager allows for sending statistics to
   an Telegraf Agent over TCP, UDP or a UNIX Socket. Telegraf can then
   send the statistics to databases like InfluxDB, ElasticSearch, Graphite
-- 
2.39.5
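
Reviewer note: the new "Configuring standby-replay" text can be exercised with a small wrapper like the one below. This is a minimal sketch rather than part of the patch. It assumes the `ceph` CLI is on the path and that the two `ceph fs set` settings take a file system name followed by a value. The function names, the `CEPH` override variable, and the file system name `cephfs` are hypothetical and exist only for illustration.

```shell
#!/bin/sh
# Sketch only: thin wrappers around the `ceph fs set` calls described in the patch.
# CEPH may be overridden (for example CEPH=echo) to print the arguments instead
# of contacting a cluster. This override is an illustrative assumption.
CEPH="${CEPH:-ceph}"

# Turn on standby-replay for one file system.
# $1: file system name (hypothetical example: "cephfs")
enable_standby_replay() {
    "$CEPH" fs set "$1" allow_standby_replay true
}

# Set how many standby daemons the file system wants before a health
# warning is raised; 0 disables the check.
# $1: file system name, $2: wanted standby count
set_standby_count_wanted() {
    "$CEPH" fs set "$1" standby_count_wanted "$2"
}
```

On a test cluster, `enable_standby_replay cephfs` followed by `ceph fs status` should show which daemons the monitors moved into standby-replay. Because a standby-replay daemon only covers the rank it follows, it seems sensible to keep at least as many spare daemons as there are active ranks before enabling the flag.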