From: Sage Weil
Date: Wed, 31 Jul 2019 10:04:37 +0000 (-0500)
Subject: doc/rados/operations/health-checks: document PG_SLOW_SNAP_TRIMMING
X-Git-Tag: v15.1.0~1877^2~6
X-Git-Url: http://git.apps.os.sepia.ceph.com/?a=commitdiff_plain;h=7e9ba0a1c12090466a4fa1ca25e77b43e4d944b4;p=ceph-ci.git

doc/rados/operations/health-checks: document PG_SLOW_SNAP_TRIMMING

The mitigation steps are weak, but it's not clear what concrete
guidance to provide.

Signed-off-by: Sage Weil
---

diff --git a/doc/rados/operations/health-checks.rst b/doc/rados/operations/health-checks.rst
index 0668aa41845..4238abb9378 100644
--- a/doc/rados/operations/health-checks.rst
+++ b/doc/rados/operations/health-checks.rst
@@ -965,6 +965,28 @@ You can manually initiate a scrub of a clean PG with::
 
   ceph pg deep-scrub <pgid>
 
+PG_SLOW_SNAP_TRIMMING
+_____________________
+
+The snapshot trim queue for one or more PGs has exceeded the
+configured warning threshold. This indicates that either an extremely
+large number of snapshots were recently deleted, or that the OSDs are
+unable to trim snapshots quickly enough to keep up with the rate of
+new snapshot deletions.
+
+The warning threshold is controlled by the
+``mon_osd_snap_trim_queue_warn_on`` option (default: 32768).
+
+This warning may trigger if OSDs are under excessive load and unable
+to keep up with their background work, or if the OSDs' internal
+metadata database is heavily fragmented and performing poorly. It may
+also indicate some other performance issue with the OSDs.
+
+The exact size of the snapshot trim queue is reported by the
+``snaptrimq_len`` field of ``ceph pg ls -f json-detail``.
+
+
+
 
 Miscellaneous
 -------------
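
The warning threshold described above can be inspected or adjusted at
runtime. A minimal sketch using the centralized config interface
available in Nautilus and later (the value 65536 is an arbitrary
example for illustration, not a recommendation)::

  # Show the current threshold (default: 32768)
  ceph config get mon mon_osd_snap_trim_queue_warn_on

  # Raise the threshold, e.g. on a cluster where large batch snapshot
  # deletions are routine and the warning is expected noise
  ceph config set mon mon_osd_snap_trim_queue_warn_on 65536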
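
To see which PGs are contributing to the warning, the
``snaptrimq_len`` field can be pulled out of the JSON output with
``jq``. A sketch, assuming the Nautilus-era layout in which ``ceph pg
ls`` wraps the per-PG records in a ``pg_stats`` array (the exact JSON
layout varies across releases)::

  # List PGs with a non-empty snapshot trim queue, largest queue first
  ceph pg ls -f json-detail |
      jq '[.pg_stats[] | select(.snaptrimq_len > 0)
           | {pgid, snaptrimq_len}] | sort_by(-.snaptrimq_len)'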