From 7e9ba0a1c12090466a4fa1ca25e77b43e4d944b4 Mon Sep 17 00:00:00 2001
From: Sage Weil
Date: Wed, 31 Jul 2019 05:04:37 -0500
Subject: [PATCH] doc/rados/operations/health-checks: document PG_SLOW_SNAP_TRIMMING

The mitigation steps are weak, but it's not clear what concrete
guidance to provide.

Signed-off-by: Sage Weil
---
 doc/rados/operations/health-checks.rst | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/doc/rados/operations/health-checks.rst b/doc/rados/operations/health-checks.rst
index 0668aa41845..4238abb9378 100644
--- a/doc/rados/operations/health-checks.rst
+++ b/doc/rados/operations/health-checks.rst
@@ -965,6 +965,28 @@ You can manually initiate a scrub of a clean PG with::
 
   ceph pg deep-scrub
 
+PG_SLOW_SNAP_TRIMMING
+_____________________
+
+The snapshot trim queue for one or more PGs has exceeded the
+configured warning threshold. This indicates that either an extremely
+large number of snapshots were recently deleted, or that the OSDs are
+unable to trim snapshots quickly enough to keep up with the rate of
+new snapshot deletions.
+
+The warning threshold is controlled by the
+``mon_osd_snap_trim_queue_warn_on`` option (default: 32768).
+
+This warning may trigger if OSDs are under excessive load and unable
+to keep up with their background work, or if the OSDs' internal
+metadata database is heavily fragmented and unable to perform well. It
+may also indicate some other performance issue with the OSDs.
+
+The exact size of the snapshot trim queue is reported by the
+``snaptrimq_len`` field of ``ceph pg ls -f json-detail``.
+
+
+
 Miscellaneous
 -------------
 
-- 
2.39.5
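
The new section says the queue length is reported by the ``snaptrimq_len`` field of ``ceph pg ls -f json-detail`` and is compared against ``mon_osd_snap_trim_queue_warn_on`` (default: 32768). As a rough illustration of how an operator might find the affected PGs, here is a minimal Python sketch. It is not part of the patch. The function name ``slow_snap_trim_pgs`` and the sample data are hypothetical, and the JSON layout (a bare list of PG records, or a list under a ``pg_stats`` key) is an assumption that may differ between Ceph releases.

```python
import json

# Default of mon_osd_snap_trim_queue_warn_on, as stated in the patch.
DEFAULT_WARN_THRESHOLD = 32768


def slow_snap_trim_pgs(pg_ls_json, threshold=DEFAULT_WARN_THRESHOLD):
    """Return (pgid, snaptrimq_len) pairs above threshold, longest queue first."""
    data = json.loads(pg_ls_json)
    # Assumption: PG records are either the top-level list or stored
    # under a "pg_stats" key, depending on the Ceph release.
    pgs = data.get("pg_stats", []) if isinstance(data, dict) else data
    over = [
        (pg["pgid"], pg["snaptrimq_len"])
        for pg in pgs
        if pg.get("snaptrimq_len", 0) > threshold
    ]
    return sorted(over, key=lambda item: item[1], reverse=True)


# Hypothetical sample shaped like the assumed command output.
sample = json.dumps({"pg_stats": [
    {"pgid": "1.0", "snaptrimq_len": 0},
    {"pgid": "1.1", "snaptrimq_len": 40000},
    {"pgid": "2.3", "snaptrimq_len": 32768},
    {"pgid": "2.4", "snaptrimq_len": 90000},
]})
print(slow_snap_trim_pgs(sample))
```

On a live cluster, the text passed to the function would be the captured output of ``ceph pg ls -f json-detail``. A PG whose queue equals the threshold is not listed, since the warning is described as firing only when the threshold is exceeded.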