Increase the two timeouts associated with replica
responses to scrub requests.
Previously, when a cluster event triggered some form of
repeering (e.g. an OSD in the active set going down), a
request could time out before the new interval was
established. That scenario does not lead to any real data
loss or crashes, but it does result in log warnings (and
failed tests).
Fixes: https://tracker.ceph.com/issues/68698
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
desc: Duration before issuing a cluster-log warning
long_desc: Waiting too long for a replica to respond (after at least half of the
replicas have responded).
- default: 2200
+ default: 22000
min: 500
see_also:
- osd_scrub_reservation_timeout
desc: Duration before aborting the scrub session
long_desc: Waiting too long for some replicas to respond to
scrub reservation requests.
- default: 5000
+ default: 50000
min: 2000
see_also:
- osd_scrub_slow_reservation_response
scrbr->get_clog()->warn()
<< "osd." << scrbr->get_whoami()
<< " PgScrubber: " << scrbr->get_spgid()
- << " timeout on reserving replicsa (since " << entered_at
+ << " timeout on reserving replicas (since " << entered_at
<< ")";
scrbr->on_replica_reservation_timeout();
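
The relationship between the two thresholds changed above can be sketched as follows. This is a simplified illustration only, not the scrubber's actual state machine: the names `slow_response_threshold`, `reservation_timeout`, `ReservationAction` and `on_wait` are hypothetical, the values are assumed to be in milliseconds, and the "at least half of the replicas have responded" precondition for the warning is left out.

```cpp
#include <cassert>
#include <chrono>

using ms = std::chrono::milliseconds;

// Hypothetical stand-ins for the two options touched by this change.
// Values mirror the new defaults and are ASSUMED to be milliseconds.
constexpr ms slow_response_threshold{22000};  // warn in the cluster log
constexpr ms reservation_timeout{50000};      // abort the scrub session

enum class ReservationAction { keep_waiting, warn, abort_scrub };

// Decide what to do after waiting `waited` for replica reservation replies.
// The abort check comes first because it is the larger threshold.
constexpr ReservationAction on_wait(ms waited) {
  if (waited >= reservation_timeout) {
    return ReservationAction::abort_scrub;
  }
  if (waited >= slow_response_threshold) {
    return ReservationAction::warn;
  }
  return ReservationAction::keep_waiting;
}
```

Under these assumptions, a 6-second repeering delay exceeds both of the old defaults (2200 and 5000) but neither of the new ones, so `on_wait(ms{6000})` yields `keep_waiting`.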