git.apps.os.sepia.ceph.com Git - ceph.git/commit
osd/scrub: report replicas slow to respond to scrub requests
author Ronen Friedman <rfriedma@redhat.com>
Mon, 24 Jan 2022 13:19:01 +0000 (13:19 +0000)
committer Ronen Friedman <rfriedma@redhat.com>
Sun, 11 Dec 2022 15:27:42 +0000 (17:27 +0200)
commit 411ea10084033ad69b44f0a04c9e18ab19d43639
tree 1e84d3a45ec2f3f6214bb07795fe4dbe35a0f7e4
parent 0fb0add0a5c9e2ce1b6d85163c0255ffa13422ff

Implemented timeouts:

1: Slow-Secondary Warning:

Once at least half of the replicas have accepted the reservation, we
start reporting any secondary that takes too long to respond to the
reservation request, i.e. more than <conf> milliseconds after the
previous response was received.
(Why? Because we have encountered real-life situations where a specific
OSD was systematically very slow to respond to the reservation requests
(e.g. 5 seconds in one case), slowing the scrub process to a crawl.)
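
The slow-secondary rule can be sketched roughly as follows. This is an
illustration only, not the code in pg_scrubber.cc: the class
SlowReplicaDetector, its members and the injected time points are all
hypothetical, and "at least half have accepted" is interpreted here as
"at least half had granted before the current response arrived".

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <optional>

using Clock = std::chrono::steady_clock;
using std::chrono::milliseconds;

// Hypothetical helper (not part of Ceph): tracks reservation grants and
// flags responses that arrive too long after the previous one.
class SlowReplicaDetector {
 public:
  SlowReplicaDetector(std::size_t replicas,
                      milliseconds slow_threshold,
                      Clock::time_point start)
      : replicas_{replicas},
        threshold_{slow_threshold},
        last_response_{start}
  {
    assert(replicas > 0);
  }

  // Record a grant that arrived at 'now'. Returns the delay since the
  // previous response if this replica should be reported as slow, or
  // std::nullopt otherwise.
  std::optional<milliseconds> on_grant(Clock::time_point now)
  {
    const auto delay =
        std::chrono::duration_cast<milliseconds>(now - last_response_);
    // Assumption: warnings are armed only once at least half of the
    // replicas had already granted before this response.
    const bool armed = 2 * granted_ >= replicas_;
    ++granted_;
    last_response_ = now;
    if (armed && delay > threshold_) {
      return delay;
    }
    return std::nullopt;
  }

 private:
  std::size_t replicas_;
  milliseconds threshold_;
  Clock::time_point last_response_;
  std::size_t granted_{0};
};
```

Passing the time points in, rather than reading the clock inside the
helper, keeps the sketch deterministic and easy to exercise in a unit
test. The threshold stands in for the <conf> value mentioned above.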

2: Reservation Process Timeout:

We now limit the total time the primary waits for the replicas to
respond to the reservation request. If we do not get all the responses
(either Grant or Reject) within <conf> milliseconds, we give up and
release all the reservations we have acquired so far.
(Why? Because we have encountered instances where a reservation request
was lost, either due to a bug or due to a network issue.)
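
A minimal sketch of the overall reservation timeout, under the same
caveats: ReservationRound, ReservationState and check_deadline() are
hypothetical names made up for this illustration, replicas are
identified by plain integers, and Reject handling is left out to keep
the example short.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

using Clock = std::chrono::steady_clock;
using std::chrono::milliseconds;

enum class ReservationState { waiting, reserved, timed_out };

// Hypothetical model (not part of Ceph) of one reservation round as
// seen by the primary.
class ReservationRound {
 public:
  ReservationRound(std::size_t replicas,
                   milliseconds timeout,
                   Clock::time_point start)
      : pending_{replicas}, deadline_{start + timeout}
  {
    assert(replicas > 0);
  }

  // A replica granted the reservation. Ignored once the round is over.
  void on_grant(int replica_id)
  {
    if (state_ != ReservationState::waiting) {
      return;
    }
    granted_.push_back(replica_id);
    if (--pending_ == 0) {
      state_ = ReservationState::reserved;
    }
  }

  // Periodic check. If the deadline has passed while responses are
  // still missing, give up and return the replicas whose reservations
  // must be released; otherwise return an empty list.
  std::vector<int> check_deadline(Clock::time_point now)
  {
    if (state_ != ReservationState::waiting || now < deadline_) {
      return {};
    }
    state_ = ReservationState::timed_out;
    std::vector<int> to_release;
    to_release.swap(granted_);
    return to_release;
  }

  ReservationState state() const { return state_; }

 private:
  std::size_t pending_;
  Clock::time_point deadline_;
  std::vector<int> granted_;
  ReservationState state_{ReservationState::waiting};
};
```

In this sketch the caller is expected to send a release message to each
replica returned by check_deadline(); a late grant that arrives after
the timeout is simply ignored.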

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
src/common/options/osd.yaml.in
src/osd/scrubber/pg_scrubber.cc
src/osd/scrubber/pg_scrubber.h