From: Sage Weil
Date: Fri, 15 Feb 2013 01:33:22 +0000 (-0800)
Subject: doc: update rados troubleshooting for slow requests
X-Git-Tag: v0.58~61
X-Git-Url: http://git-server-git.apps.pok.os.sepia.ceph.com/?a=commitdiff_plain;h=2c42bfc00c819ddd9e81fb46879df4ef6412fe58;p=ceph.git

doc: update rados troubleshooting for slow requests

The example was out of date. Adding a note about how to look at the
request queue on the OSD.

Reported-by: Chris Dunlop
Signed-off-by: Sage Weil
---

diff --git a/doc/rados/operations/troubleshooting-osd.rst b/doc/rados/operations/troubleshooting-osd.rst
index ba5655d9e25b..1dffa02bb42d 100644
--- a/doc/rados/operations/troubleshooting-osd.rst
+++ b/doc/rados/operations/troubleshooting-osd.rst
@@ -298,7 +298,7 @@ long. The warning threshold defaults to 30 seconds, and is configurable via
 the ``osd op complaint time`` option. When this happens, the cluster log will
 receive messages like::
 
-  osd.0 192.168.106.220:6800/18813 312 : [WRN] old request osd_op(client.5099.0:790 fatty_26485_object789 [write 0~4096] 2.5e54f643) v4 received at 2012-03-06 15:42:56.054801 currently waiting for sub ops
+  slow request 30.383883 seconds old, received at 2013-02-12 16:27:15.508374: osd_op(client.9821.0:122242 rb.0.209f.74b0dc51.000000000120 [write 921600~4096] 2.981cf6bc) v4 currently no flag points reached
 
 Possible causes include:
 
@@ -307,6 +307,16 @@ Possible causes include:
 * overloaded cluster (check system load, iostat, etc.)
 * ceph-osd bug
 
+Pay particular attention to the ``currently`` part, as that will give
+some clue as to what the request is waiting for. You can further look
+at exactly which requests the slow OSD is working on, and what
+state(s) they are in, with::
+
+  ceph --admin-daemon /var/run/ceph/ceph-osd.{ID}.asok dump_ops_in_flight
+
+These are sorted oldest to newest, and the dump includes an ``age``
+indicating how long the request has been in the queue.
+
 Flapping OSDs
 =============
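
As a minimal sketch of how the admin socket command added above might be
used in practice, the loop below dumps the in-flight requests of every OSD
daemon on the local host. It assumes only the default socket path shown in
the patch (``/var/run/ceph/ceph-osd.{ID}.asok``); adjust the glob if your
sockets live elsewhere:

    # Dump in-flight ops for each ceph-osd admin socket on this host.
    # Assumes the default /var/run/ceph/ socket location used above.
    for sock in /var/run/ceph/ceph-osd.*.asok; do
        echo "== ${sock} =="
        ceph --admin-daemon "${sock}" dump_ops_in_flight
    done

Since the dump is sorted oldest to newest and each entry carries an
``age``, the first entries reported by the slow OSD are the ones worth
inspecting.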