osd/scrub: report replicas slow to repond to scrub requests
Implemented timeouts:
1: Slow-Secondary Warning:
Once at least half of the replicas have accepted the reservation, we
start reporting any secondary that takes too long (more than <conf>
milliseconds after the previous response received) to respond to the reservation
request.
(Why? because we have encountered real-life situations where a specific
OSD was systematically very slow to respond (e.g. 5 seconds in one case) to
the reservation requests, slowing the scrub process to a crawl).
2: Reservation Process Timeout:
We now limit the total time the primary waits for the replicas to
respond to the reservation request. If we do not get all the responses
(either Grant or Reject) within <conf> milliseconds, we give up and release all the
reservations we have acquired so far.
(Why? because we have encountered instances where a reservation request
was lost - either due to a bug or due to a network issue.)