or delete some existing data to reduce utilization.
+Device health
+-------------
+
+DEVICE_HEALTH
+_____________
+
+One or more devices are expected to fail soon. The warning
+threshold is controlled by the ``mgr/devicehealth/warn_threshold``
+config option.
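+
+For example, the warning window can be adjusted with a command of the
+following form (the value here is in seconds and is illustrative;
+accepted formats may vary by release)::
+
+ ceph config set mgr mgr/devicehealth/warn_threshold 7257600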
+
+This warning applies only to OSDs that are currently marked "in", so
+the expected response to this warning is to mark the device "out" so
+that data is migrated off of it, and then to remove the hardware from
+the system. Note that this marking out is normally done automatically
+if ``mgr/devicehealth/self_heal`` is enabled, based on
+``mgr/devicehealth/mark_out_threshold``.
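+
+If self healing is disabled, the ailing OSD can be marked "out"
+manually (``osd.5`` here is an illustrative OSD id)::
+
+ ceph osd out osd.5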
+
+Device health can be checked with::
+
+ ceph device info <device-id>
+
+Device life expectancy is set by a prediction model run by
+the mgr or by an external tool via the command::
+
+ ceph device set-life-expectancy <device-id> <from> <to>
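+
+For example (the device id and date range are illustrative)::
+
+ ceph device set-life-expectancy SEAGATE_ST1000NM0023_Z1W4CDEF 2024-06-01 2024-07-01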
+
+You can change the stored life expectancy manually, but that usually
+accomplishes nothing: whatever tool originally set it will probably
+set it again, and changing the stored value does not affect the
+actual health of the hardware device.
+
+DEVICE_HEALTH_IN_USE
+____________________
+
+One or more devices are expected to fail soon and have been marked "out"
+of the cluster based on ``mgr/devicehealth/mark_out_threshold``, but they
+are still participating in one or more PGs. This may be because the
+devices were only recently marked "out" and data is still migrating, or
+because data cannot be migrated off of them for some reason (e.g., the
+cluster is nearly full, or the CRUSH hierarchy is such that there isn't
+another suitable OSD to migrate the data to).
+
+This message can be silenced by disabling the self heal behavior
+(setting ``mgr/devicehealth/self_heal`` to false), by adjusting the
+``mgr/devicehealth/mark_out_threshold``, or by addressing what is
+preventing data from being migrated off of the ailing device.
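+
+For example, either of the following will quiet the warning (the
+threshold value is in seconds and is illustrative)::
+
+ ceph config set mgr mgr/devicehealth/self_heal false
+ ceph config set mgr mgr/devicehealth/mark_out_threshold 2419200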
+
+DEVICE_HEALTH_TOOMANY
+_____________________
+
+Too many devices are expected to fail soon and the
+``mgr/devicehealth/self_heal`` behavior is enabled, such that marking
+out all of the ailing devices would push the fraction of "in" OSDs
+below the cluster's ``mon_osd_min_in_ratio`` ratio, which prevents
+too many OSDs from being automatically marked "out".
+
+This generally indicates that too many devices in your cluster are
+expected to fail soon and you should take action to add newer
+(healthier) devices before too many devices fail and data is lost.
+
+The health message can also be silenced by adjusting parameters like
+``mon_osd_min_in_ratio`` or ``mgr/devicehealth/mark_out_threshold``,
+but be warned that this will increase the likelihood of unrecoverable
+data loss in the cluster.
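+
+For example, the ratio can be lowered as follows (the default is
+typically 0.75; the value here is illustrative)::
+
+ ceph config set mon mon_osd_min_in_ratio 0.5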
+
+
Data health (pools & placement groups)
--------------------------------------