git.apps.os.sepia.ceph.com Git - ceph.git/commit
OSD: suspend tp timeout while taking pg lock in OpWQ
author     Samuel Just <sam.just@inktank.com>
           Mon, 29 Jul 2013 16:36:04 +0000 (09:36 -0700)
committer  Samuel Just <sam.just@inktank.com>
           Mon, 29 Jul 2013 19:49:16 +0000 (12:49 -0700)
commit     1f13d8ac5b879134942cac2f5aca00669f24581f
tree       370f3e24dff968157948c69d04e3239581389379
parent     f1bd4e5bdf4fc9473c1762533aab61ac2dbe64d5

If N op_tp threads are configured and recovery_max_active
is set high enough, all N op_tp threads might dequeue an
MOSDPGPush op for the same PG and then serialize on that
PG's lock.  The last thread to acquire the lock will have
waited about N*time_to_handle_push before completing its
item and pinging (resetting) its heartbeat timeout.  If that
time exceeds the timeout and enough ops are waiting, each
thread in turn exceeds the timeout before completing an
item, and the OSD can stop heartbeating indefinitely.
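The arithmetic behind the starvation can be sketched as a small C++ model. It is a simplified illustration, not code from this commit: it assumes all N threads ping their timeout and dequeue at the same instant, and that every push takes the same time. The names `time_since_ping` and `last_thread_times_out` are hypothetical.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical model of N op_tp threads serializing on one PG lock.
// Assumes every thread pings its timeout at the same instant, then waits
// for the lock, and that each push takes exactly time_to_handle_push.
using Seconds = std::chrono::duration<double>;

// Time the k-th thread (1-based) to get the PG lock spends between its last
// ping and finishing its item: (k - 1) pushes handled by earlier lock
// holders, plus its own push.
Seconds time_since_ping(int k, Seconds time_to_handle_push) {
    assert(k >= 1);
    return k * time_to_handle_push;
}

// True if the last of n_threads threads exceeds the threadpool timeout
// before it can ping again, i.e. n_threads * time_to_handle_push > timeout.
bool last_thread_times_out(int n_threads, Seconds time_to_handle_push,
                           Seconds timeout) {
    assert(n_threads >= 1);
    return time_since_ping(n_threads, time_to_handle_push) > timeout;
}
```

With illustrative numbers of 8 threads, 2 s per push and a 15 s timeout, the last thread needs 16 s between pings, so it times out even though every thread is making progress; with 4 threads it needs only 8 s and stays within the limit.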

We prevent this by suspending the timeout while a thread
waits for the PG lock.  Even if a thread does block for an
excessive period waiting for the lock, the thread holding
the lock should, if it is genuinely stuck, still cause the
threadpool to time out, so a real hang is still detected.
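The fix can be sketched in C++ as below. This is a minimal, self-contained illustration under stated assumptions, not the code from this commit or Ceph's ThreadPool API: `HeartbeatHandle`, `reset_timeout`, `suspend_timeout`, `is_overdue` and `process_op` are hypothetical names, and a `std::mutex` stands in for the PG lock.

```cpp
#include <cassert>
#include <chrono>
#include <mutex>
#include <optional>

// Hypothetical stand-in for a per-thread heartbeat handle; Ceph's real
// ThreadPool/HeartbeatMap types differ. A thread is "overdue" when it has
// an armed deadline that has already passed.
class HeartbeatHandle {
public:
    using Clock = std::chrono::steady_clock;

    explicit HeartbeatHandle(Clock::duration grace) : grace_(grace) {
        assert(grace > Clock::duration::zero());
    }

    // Re-arm the deadline: the thread promises to ping again within grace_.
    void reset_timeout(Clock::time_point now = Clock::now()) {
        deadline_ = now + grace_;
    }

    // Disarm the deadline: time spent from here on is not counted.
    void suspend_timeout() { deadline_.reset(); }

    bool is_overdue(Clock::time_point now = Clock::now()) const {
        return deadline_.has_value() && now > *deadline_;
    }

private:
    Clock::duration grace_;
    std::optional<Clock::time_point> deadline_;
};

// Hypothetical op handler: suspend the timeout while waiting for the PG
// lock, then re-arm it once the lock is held, so only the work is timed.
template <typename Work>
void process_op(HeartbeatHandle& hb, std::mutex& pg_lock, Work&& work) {
    hb.suspend_timeout();  // queuing behind other threads is not a hang
    std::lock_guard<std::mutex> guard(pg_lock);
    hb.reset_timeout();    // the clock starts once we own the PG
    work();                // a holder that wedges here still trips the timeout
}
```

The design mirrors the commit's reasoning: time spent waiting for the lock is excluded, but time spent holding it is still measured, so a thread stuck inside the PG keeps the threadpool timeout meaningful. The sketch is single-threaded for simplicity; a watchdog polling the handle from another thread would need synchronized or atomic state.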

Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
src/common/WorkQueue.cc
src/common/WorkQueue.h
src/osd/OSD.cc
src/osd/OSD.h
src/osd/PG.cc
src/osd/PG.h