This page lists the health checks raised by MDS daemons. For the checks from
other daemons, please see :ref:`health-checks`.
-* ``MDS_TRIM``
+``MDS_TRIM``
+------------
Message
"Behind on trimming..."
too slowly, or a software bug is preventing trimming, then this health
message may appear. The threshold for this message to appear is controlled by
the config option ``mds_log_warn_factor`` (default 2.0).
-* ``MDS_HEALTH_CLIENT_LATE_RELEASE``, ``MDS_HEALTH_CLIENT_LATE_RELEASE_MANY``
+
+``MDS_HEALTH_CLIENT_LATE_RELEASE``, ``MDS_HEALTH_CLIENT_LATE_RELEASE_MANY``
+---------------------------------------------------------------------------
Message
"Client *name* failing to respond to capability release"
is unresponsive or buggy, it might fail to do so promptly or fail to do
so at all. This message appears if a client has taken longer than
``session_timeout`` (default 60s) to comply.
-* ``MDS_CLIENT_RECALL``, ``MDS_HEALTH_CLIENT_RECALL_MANY``
+
+``MDS_CLIENT_RECALL``, ``MDS_HEALTH_CLIENT_RECALL_MANY``
+--------------------------------------------------------
Message
"Client *name* failing to respond to cache pressure"
``mds_recall_warning_threshold`` capabilities (decaying with a half-life of
``mds_recall_max_decay_rate``) within the last
``mds_recall_warning_decay_rate`` seconds.
-* ``MDS_CLIENT_OLDEST_TID``, ``MDS_CLIENT_OLDEST_TID_MANY``
+
+``MDS_CLIENT_OLDEST_TID``, ``MDS_CLIENT_OLDEST_TID_MANY``
+---------------------------------------------------------
Message
"Client *name* failing to advance its oldest client/flush tid"
appears if a client seems to have more than ``max_completed_requests``
(default 100000) requests that are complete on the MDS side but haven't
yet been accounted for in the client's *oldest tid* value.
-* ``MDS_DAMAGE``
+
+``MDS_DAMAGE``
+--------------
Message
"Metadata damage detected"
client accesses to the damaged subtree will return IO errors. Use
the ``damage ls`` admin socket command to get more detail on the damage.
This message appears as soon as any damage is encountered.
-* ``MDS_HEALTH_READ_ONLY``
+
+``MDS_HEALTH_READ_ONLY``
+------------------------
Message
"MDS in read-only mode"
MDS will go into readonly mode if it encounters a write error while
writing to the metadata pool, or if forced to by an administrator using
the ``force_readonly`` admin socket command.
-* ``MDS_SLOW_REQUEST``
+
+``MDS_SLOW_REQUEST``
+--------------------
Message
"*N* slow requests are blocked"
Use the ``ops`` admin socket command to list outstanding metadata operations.
This message appears if any client requests have taken longer than
``mds_op_complaint_time`` (default 30s).
-* ``MDS_CACHE_OVERSIZED``
+
+``MDS_CACHE_OVERSIZED``
+-----------------------
Message
"Too many inodes in cache"
the actual cache size (in memory) is at least 50% greater than
``mds_cache_memory_limit`` (default 1GB). Modify ``mds_health_cache_threshold``
to set the warning ratio.
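
The threshold described above can be illustrated with a minimal sketch. This is not Ceph's implementation: the function name ``cache_oversized`` is hypothetical, the 1.5 ratio is inferred from the "50% greater" wording, and reading the "1GB" default as 2**30 bytes is an assumption.

```python
GIB = 1 << 30  # assumption: the "1GB" default is read as 2**30 bytes


def cache_oversized(cache_bytes, memory_limit=GIB, health_threshold=1.5):
    """Return True when the cache is at least health_threshold times the limit.

    memory_limit stands in for mds_cache_memory_limit and health_threshold
    for mds_health_cache_threshold; both defaults are illustrative.
    """
    return cache_bytes >= memory_limit * health_threshold


if __name__ == "__main__":
    # A cache of 1200 MiB is below 1.5 x 1 GiB; 1700 MiB is above it.
    print(cache_oversized(1200 * (1 << 20)))
    print(cache_oversized(1700 * (1 << 20)))
```

Raising ``health_threshold`` in this sketch delays the warning, which mirrors the effect the text attributes to ``mds_health_cache_threshold``.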
+
+``FS_WITH_FAILED_MDS``
+----------------------
+
+ Message
+ "Some MDS ranks do not have standby replacements"
+
+ Description
+ Normally, a failed MDS rank will be replaced by a standby MDS. This situation
+ is transient and is not considered critical. However, if there are no standby
+ MDSs available to replace an active MDS rank, this health warning is generated.
+
+``MDS_INSUFFICIENT_STANDBY``
+----------------------------
+
+ Message
+ "Insufficient number of available standby(-replay) MDS daemons than configured"
+
+ Description
+ The minimum number of standby(-replay) MDS daemons can be configured by setting
+ the ``standby_count_wanted`` configuration variable. This health warning is generated
+ when fewer standby(-replay) MDS daemons are available than the configured
+ value.
+
+``FS_DEGRADED``
+---------------
+
+ Message
+ "Some MDS ranks have been marked failed or damaged"
+
+ Description
+ One or more MDS ranks have ended up in a failed or damaged state due to
+ an unrecoverable error. The file system may be partially or fully
+ unavailable while one or more ranks are offline.
+
+``MDS_UP_LESS_THAN_MAX``
+------------------------
+
+ Message
+ "Number of active ranks are less than configured number of maximum MDSs"
+
+ Description
+ The maximum number of MDS ranks can be configured by setting the ``max_mds``
+ configuration variable. This health warning is generated when the number
+ of MDS ranks falls below this configured value.
+
+``MDS_ALL_DOWN``
+----------------
+
+ Message
+ "None of the MDS ranks are available (file system offline)"
+
+ Description
+ All MDS ranks are unavailable, leaving the file system completely
+ offline.
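
As a rough illustration of the count-based checks described above, the following sketch maps daemon counts to health check names. It is not how Ceph evaluates these checks: ``rank_health_checks`` and its parameter names are hypothetical, the conditions are read literally from the descriptions, and it assumes the standby warning fires when fewer standbys are available than wanted. Whether Ceph suppresses one check when another is raised is not covered here.

```python
def rank_health_checks(active_ranks, max_mds, standbys_available,
                       standby_count_wanted):
    """Return the names of the count-based checks that would apply.

    All parameters are plain integers supplied by the caller; nothing
    here queries a real cluster.
    """
    checks = []
    if active_ranks == 0:
        # No rank is available, so the file system is offline.
        checks.append("MDS_ALL_DOWN")
    if active_ranks < max_mds:
        # Fewer ranks are active than max_mds asks for.
        checks.append("MDS_UP_LESS_THAN_MAX")
    if standbys_available < standby_count_wanted:
        # Assumption: "insufficient" means strictly fewer than wanted.
        checks.append("MDS_INSUFFICIENT_STANDBY")
    return checks


if __name__ == "__main__":
    print(rank_health_checks(active_ranks=1, max_mds=2,
                             standbys_available=0, standby_count_wanted=1))
```

The checks are evaluated independently in this sketch, so a single set of counts can yield more than one name.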