From: xie xingguo
Date: Wed, 6 Apr 2016 01:45:21 +0000 (+0800)
Subject: osd: reset tp handle when search for boundary of chunky-scrub
X-Git-Tag: v10.1.2~16^2~1
X-Git-Url: http://git-server-git.apps.pok.os.sepia.ceph.com/?a=commitdiff_plain;h=4d3aef75ec44abf0cb2b59e419c1089d8db8b01f;p=ceph.git

osd: reset tp handle when search for boundary of chunky-scrub

One of our tests in our local testbed shows that if the number of
snapshots becomes extremely large, chunky_scrub() may encounter a
heartbeat failure. This is because, in this case, it takes a very long
time to traverse the objects and determine the boundary for a single
run of chunk scrub.

This PR tries to solve the above problem by resetting the tp handle
that is passed in once in a while (after a certain number of loops,
64 by default), since the search can become very time-consuming.

Furthermore, the later BUILD_MAP stage can encounter the same problem,
but it has already been fixed in the same way. Therefore, although the
test case is rare, this change is defensive, makes our code more robust,
and should be considered worthwhile.

Fixes: tracker.ceph.com/issues/12892
Signed-off-by: xie xingguo
---
diff --git a/src/osd/PG.cc b/src/osd/PG.cc
index 2804c73163a2..09b9ef132ccc 100644
--- a/src/osd/PG.cc
+++ b/src/osd/PG.cc
@@ -4108,6 +4108,7 @@ void PG::chunky_scrub(ThreadPool::TPHandle &handle)
 
         bool boundary_found = false;
         hobject_t start = scrubber.start;
+        unsigned loop = 0;
         while (!boundary_found) {
           vector objects;
           ret = get_pgbackend()->objects_list_partial(
@@ -4137,6 +4138,12 @@ void PG::chunky_scrub(ThreadPool::TPHandle &handle)
               boundary_found = true;
             }
           }
+
+          // reset handle once in a while, the search maybe takes long.
+          if (++loop >= g_conf->osd_loop_before_reset_tphandle) {
+            handle.reset_tp_timeout();
+            loop = 0;
+          }
         }
 
         if (!_range_available_for_scrub(scrubber.start, candidate_end)) {
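The loop-counter pattern in this patch can be illustrated outside of Ceph. The following is a minimal, self-contained C++ sketch, not Ceph's code: `FakeTPHandle`, `SearchResult` and `find_chunk_end()` are hypothetical names, the object store is simulated as a sorted vector of per-object hashes, and the `loop_before_reset` parameter stands in for `osd_loop_before_reset_tphandle`. It assumes that a chunk may only end where the object hash changes, so a long run of objects sharing one hash (for example a head object and many snapshot clones) forces many listing passes before a boundary turns up.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for ThreadPool::TPHandle: instead of pushing a real
// watchdog deadline forward, it only counts how often it was reset.
struct FakeTPHandle {
  unsigned resets = 0;
  void reset_tp_timeout() { ++resets; }
};

// Hypothetical result type: where the chunk ends and how many passes it took.
struct SearchResult {
  std::size_t end;   // exclusive end index of the chunk
  unsigned passes;   // number of simulated listing passes
};

// Hypothetical boundary search over a simulated, sorted object store where
// hashes[i] is the hash of object i. A chunk may only end at an index b where
// hashes[b - 1] != hashes[b], or at the end of the store.
SearchResult find_chunk_end(const std::vector<std::uint32_t>& hashes,
                            std::size_t begin, std::size_t max_per_list,
                            FakeTPHandle& handle, unsigned loop_before_reset) {
  assert(max_per_list > 0 && "each listing pass must make progress");
  std::size_t start = begin;
  std::size_t chunk_end = hashes.size();
  bool boundary_found = false;
  unsigned loop = 0;
  unsigned passes = 0;
  while (!boundary_found) {
    ++passes;
    // Simulated partial listing: at most max_per_list objects from `start`.
    std::size_t candidate_end = std::min(hashes.size(), start + max_per_list);
    if (candidate_end == start) {
      break;  // nothing left to list: the end of the store is the boundary
    }
    // Search backward from candidate_end for a hash change.
    for (std::size_t b = candidate_end; b > start && !boundary_found; --b) {
      if (b == hashes.size() || hashes[b - 1] != hashes[b]) {
        chunk_end = b;
        boundary_found = true;
      }
    }
    // If no boundary was found, the next pass lists the following objects.
    start = candidate_end;

    // The pattern from the patch: every loop_before_reset passes, reset the
    // handle so a long search is not mistaken for a hung thread.
    if (++loop >= loop_before_reset) {
      handle.reset_tp_timeout();
      loop = 0;
    }
  }
  return {chunk_end, passes};
}
```

For instance, with 10000 objects sharing one hash followed by one object with a different hash, a listing size of 10 and `loop_before_reset` set to 64, the search takes 1000 passes and resets the handle 15 times, so a watchdog would never see more than 64 passes between resets. Counting passes instead of reading a clock keeps the per-pass cost to an increment and a compare, at the price of assuming that each pass takes a bounded amount of time.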