osd: vary tick interval +/- 5% to avoid scrub livelocks 24396/head
author Sage Weil <sage@redhat.com>
Thu, 9 Aug 2018 13:33:42 +0000 (08:33 -0500)
committer Nathan Cutler <ncutler@suse.com>
Fri, 5 Oct 2018 20:23:34 +0000 (22:23 +0200)
commit 897588003345cb553216351813ae17aa1048f055
tree 3b08d443ba3b62667d627bc7217209ef0ded97b8
parent 8138b1d4f484fc685cb66a9bb85fef23ae65ead4
osd: vary tick interval +/- 5% to avoid scrub livelocks

If you have two pgs that need to scrub on two OSDs, each the primary
for one pg and the replica for the other, you can end up in a livelock:

- both osds locally reserve a scrub slot
- both osds send a scrub schedule request
- both scrub requests are rejected
- both osds wait exactly 1 second
- repeat

Seems a bit unlikely, but I've seen test cases where it goes on for more
than an hour.
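
To illustrate why a fixed retry period keeps the two OSDs in lock-step
while a +/- 5% variation lets them drift apart, here is a minimal sketch.
It is not the code from this commit: jittered_interval() and
time_of_retry() are hypothetical helpers, and the 1 second base interval
is taken from the description above.

```cpp
#include <cassert>
#include <random>

// Hypothetical helper (not the Ceph implementation): scale base_seconds
// by a factor drawn uniformly from the range 0.95 .. 1.05.
double jittered_interval(double base_seconds, std::mt19937& rng) {
  assert(base_seconds > 0.0);
  constexpr double delta = 0.05;
  std::uniform_real_distribution<double> factor(1.0 - delta, 1.0 + delta);
  return base_seconds * factor(rng);
}

// Hypothetical helper: time at which the last of `ticks` retries fires,
// i.e. the sum of `ticks` consecutive waits. `seed` stands in for
// whatever makes one OSD's random series differ from another's.
double time_of_retry(int ticks, double base_seconds, bool jitter,
                     unsigned seed) {
  std::mt19937 rng(seed);
  double t = 0.0;
  for (int i = 0; i < ticks; ++i) {
    t += jitter ? jittered_interval(base_seconds, rng) : base_seconds;
  }
  return t;
}
```

With jitter disabled, two OSDs always retry at the same instants, so the
mutual rejection can repeat indefinitely. With jitter enabled and
different seeds, their retry times diverge after the first wait, so one
request should eventually arrive while the other OSD has a free slot.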

Fixes: http://tracker.ceph.com/issues/26890
Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 2011377c379c9d53a3a0a693a7874fc330278898)

Conflicts:
src/osd/OSD.cc
- luminous does not have src/include/random.h; use #include <random>
  instead, seeding with whoami so each OSD gets a different series
  of pseudo-random numbers
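
A rough sketch of what seeding a <random> engine with the OSD id could
look like. This is an assumption-laden illustration, not the backported
code: the TickJitter type, its members, and the choice of std::mt19937
are hypothetical; only the idea of seeding with whoami and the +/- 5%
range come from this commit message.

```cpp
#include <cassert>
#include <random>

// Hypothetical wrapper: one engine per OSD, seeded with its id so that
// each OSD draws a different series of tick intervals.
struct TickJitter {
  std::mt19937 rng;
  std::uniform_real_distribution<double> factor{0.95, 1.05};

  explicit TickJitter(int whoami) : rng(static_cast<unsigned>(whoami)) {
    assert(whoami >= 0);
  }

  // Next tick interval in seconds, varied around base_seconds.
  double next(double base_seconds) { return base_seconds * factor(rng); }
};
```

Seeding from the id keeps each OSD's series reproducible across restarts
while still differing between OSDs, which is the property the livelock
fix relies on.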
src/osd/OSD.cc
src/osd/OSD.h