From f09a87f9025fca332697f66be4a3c889fe80ccc9 Mon Sep 17 00:00:00 2001
From: Sage Weil
Date: Tue, 31 Jul 2018 09:38:39 -0500
Subject: [PATCH] doc/mgr/devicehealth: document devicehealth module

Signed-off-by: Sage Weil
---
 doc/mgr/devicehealth.rst | 52 ++++++++++++++++++++++++++++++++++++++++
 doc/mgr/index.rst        |  1 +
 2 files changed, 53 insertions(+)
 create mode 100644 doc/mgr/devicehealth.rst

diff --git a/doc/mgr/devicehealth.rst b/doc/mgr/devicehealth.rst
new file mode 100644
index 0000000000000..5e0d0012192b8
--- /dev/null
+++ b/doc/mgr/devicehealth.rst
@@ -0,0 +1,52 @@
+Devicehealth plugin
+===================
+
+The *devicehealth* plugin includes code to manage physical devices
+that back Ceph daemons (e.g., OSDs). This includes scraping health
+metrics (e.g., SMART) and responding to health metrics by migrating
+data away from failing devices.
+
+Enabling
+--------
+
+The *devicehealth* module is enabled with::
+
+  ceph mgr module enable devicehealth
+
+(It is enabled by default.)
+
+Scraping
+--------
+
+Health metrics can be scraped from all devices with::
+
+  ceph device scrape-health-metrics
+
+A single device can be scraped with::
+
+  ceph device scrape-health-metrics <devid>
+
+Or a single daemon's devices can be scraped with::
+
+  ceph device scrape-daemon-health-metrics <who>
+
+
+Health monitoring
+-----------------
+
+By default, the devicehealth module wakes up periodically and checks
+the health of all devices in the system. This will raise health
+alerts if devices are expected to fail soon. This can be disabled by
+turning off the ``mgr/devicehealth/enable_monitoring`` option.
+
+The ``mgr/devicehealth/warn_threshold`` controls how soon an expected
+device failure must be before we generate a health warning.
+
+If the ``mgr/devicehealth/self_heal`` option is enabled (it is by
+default), then for devices that are expected to fail soon the module
+will automatically migrate data away from them by marking the devices
+"out".
+
+The ``mgr/devicehealth/mark_out_threshold`` controls how soon an
+expected device failure must be before we automatically mark an osd
+"out".
diff --git a/doc/mgr/index.rst b/doc/mgr/index.rst
index ea8c9d48ca1bf..e640292f40f4d 100644
--- a/doc/mgr/index.rst
+++ b/doc/mgr/index.rst
@@ -39,3 +39,4 @@ sensible.
     Telemetry plugin
     Iostat plugin
     Crash plugin
+    Devicehealth plugin
-- 
2.39.5