From: Ronen Friedman
Date: Mon, 23 Oct 2023 15:38:18 +0000 (+0300)
Subject: osd/scrub: extend scrub reservation timeout
X-Git-Tag: v19.0.0~197^2~1
X-Git-Url: http://git-server-git.apps.pok.os.sepia.ceph.com/?a=commitdiff_plain;h=9999249876d4a4c5ab8f4c8a204442df2902fc36;p=ceph.git

osd/scrub: extend scrub reservation timeout

As replicas are now reserved sequentially, the scrub reservation
timeout should be extended.
Taking into account the low priority of scrub-related messages,
modify both 'osd_scrub_reservation_timeout' and
'osd_scrub_slow_reservation_response'.

Signed-off-by: Ronen Friedman
---

diff --git a/src/common/options/osd.yaml.in b/src/common/options/osd.yaml.in
index 5d8d40cf12d1..198224423585 100644
--- a/src/common/options/osd.yaml.in
+++ b/src/common/options/osd.yaml.in
@@ -511,9 +511,9 @@ options:
   type: millisecs
   level: advanced
   desc: Maximum wait (milliseconds) for scrub reservations before issuing a cluster-log warning
-  long_desc: Waiting too long for a replica to respond to scrub resource reservation request
-    (after at least half of the replicas have responded). Disable by setting to a very large value.
-  default: 2200
+  long_desc: Waiting too long for a replica to respond to scrub resource reservation request.
+    Disable by setting to a very large value.
+  default: 30000
   min: 500
   see_also:
   - osd_scrub_reservation_timeout
@@ -525,7 +525,7 @@ options:
   long_desc: Maximum wait (milliseconds) for all replicas to respond to
     scrub reservation requests, before the scrub session is aborted. Disable
     by setting to a very large value.
-  default: 5000
+  default: 300000
   min: 2000
   see_also:
   - osd_scrub_slow_reservation_response
diff --git a/src/osd/scrubber/scrub_machine.h b/src/osd/scrubber/scrub_machine.h
index 2f73cbbefb5d..cbce07fe183c 100644
--- a/src/osd/scrubber/scrub_machine.h
+++ b/src/osd/scrubber/scrub_machine.h
@@ -164,9 +164,19 @@ MEV(SchedReplica)
 /// that is in-flight to the local ObjectStore
 MEV(ReplicaPushesUpd)
 
-/// a new interval has dawned.
-/// For a Primary: Discards replica reservations, so that the FullReset that would
-/// follow it would not attempt to release them.
+/**
+ * IntervalChanged
+ *
+ * This event notifies the ScrubMachine that it is no longer responsible for
+ * releasing replica state. It will generally be submitted upon a PG interval
+ * change.
+ *
+ * This event is distinct from FullReset because replicas are always responsible
+ * for releasing any interval specific state (including but certainly not limited to
+ * scrub reservations) upon interval change, without coordination from the
+ * Primary. This event notifies the ScrubMachine that it can forget about
+ * such remote state.
+ */
 MEV(IntervalChanged)
 
 /// guarantee that the FSM is in the quiescent state (i.e. NotActive)