From: Ronen Friedman <rfriedma@redhat.com>
Date: Mon, 23 Oct 2023 15:38:18 +0000 (+0300)
Subject: osd/scrub: extend scrub reservation timeout
X-Git-Tag: v19.0.0~197^2~1
X-Git-Url: http://git-server-git.apps.pok.os.sepia.ceph.com/?a=commitdiff_plain;h=9999249876d4a4c5ab8f4c8a204442df2902fc36;p=ceph.git

osd/scrub: extend scrub reservation timeout

As replicas are now reserved sequentially, the scrub reservation
timeout should be extended.

Taking into account the low priority of scrub-related messages,
raise the defaults of both 'osd_scrub_reservation_timeout'
(from 5000 ms to 300000 ms) and 'osd_scrub_slow_reservation_response'
(from 2200 ms to 30000 ms).

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
---
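
Note: the two options act as a warning threshold and an abort threshold on
the same wait. The following is a minimal sketch of that relationship, not
the scrubber's actual code. The enum, the constants' names and
classify_wait() are hypothetical; only the two default values and the
warn/abort semantics come from the option descriptions in the hunks below.
Whether the real comparison is inclusive is an assumption.

```cpp
#include <chrono>

using std::chrono::milliseconds;

// Hypothetical outcome of waiting on replica scrub reservations.
enum class ReservationWait { ok, slow_warning, aborted };

// New defaults set by this patch (values taken from osd.yaml.in).
constexpr milliseconds slow_response_default{30000};  // osd_scrub_slow_reservation_response
constexpr milliseconds timeout_default{300000};       // osd_scrub_reservation_timeout

// Classify how long the primary has been waiting for reservations.
// Assumption: both thresholds are inclusive; the real code may differ.
constexpr ReservationWait classify_wait(
    milliseconds waited,
    milliseconds slow_response = slow_response_default,
    milliseconds timeout = timeout_default)
{
  if (waited >= timeout) {
    return ReservationWait::aborted;       // scrub session is aborted
  }
  if (waited >= slow_response) {
    return ReservationWait::slow_warning;  // cluster-log warning is issued
  }
  return ReservationWait::ok;
}

// A 6-second wait exceeded the old 5000 ms timeout, but is unremarkable
// under the new defaults.
static_assert(classify_wait(milliseconds{6000}, milliseconds{2200},
                            milliseconds{5000}) == ReservationWait::aborted,
              "old defaults abort after 5 s");
static_assert(classify_wait(milliseconds{6000}) == ReservationWait::ok,
              "new defaults tolerate a 6 s wait");
```

With sequential reservation the waits for individual replicas add up, which
is why a wait that was once exceptional can now be routine.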

diff --git a/src/common/options/osd.yaml.in b/src/common/options/osd.yaml.in
index 5d8d40cf12d1..198224423585 100644
--- a/src/common/options/osd.yaml.in
+++ b/src/common/options/osd.yaml.in
@@ -511,9 +511,9 @@ options:
   type: millisecs
   level: advanced
   desc: Maximum wait (milliseconds) for scrub reservations before issuing a cluster-log warning
-  long_desc: Waiting too long for a replica to respond to scrub resource reservation request
-   (after at least half of the replicas have responded). Disable by setting to a very large value.
-  default: 2200
+  long_desc: Waiting too long for a replica to respond to scrub resource reservation request.
+    Disable by setting to a very large value.
+  default: 30000
   min: 500
   see_also:
   - osd_scrub_reservation_timeout
@@ -525,7 +525,7 @@ options:
   long_desc: Maximum wait (milliseconds) for all replicas to respond to
     scrub reservation requests, before the scrub session is aborted. Disable by setting
     to a very large value.
-  default: 5000
+  default: 300000
   min: 2000
   see_also:
   - osd_scrub_slow_reservation_response
diff --git a/src/osd/scrubber/scrub_machine.h b/src/osd/scrubber/scrub_machine.h
index 2f73cbbefb5d..cbce07fe183c 100644
--- a/src/osd/scrubber/scrub_machine.h
+++ b/src/osd/scrubber/scrub_machine.h
@@ -164,9 +164,19 @@ MEV(SchedReplica)
 /// that is in-flight to the local ObjectStore
 MEV(ReplicaPushesUpd)
 
-/// a new interval has dawned.
-/// For a Primary: Discards replica reservations, so that the FullReset that would
-/// follow it would not attempt to release them.
+/**
+ * IntervalChanged
+ *
+ * This event notifies the ScrubMachine that it is no longer responsible for
+ * releasing replica state.  It will generally be submitted upon a PG interval
+ * change.
+ *
+ * This event is distinct from FullReset because replicas are always responsible
+ * for releasing any interval specific state (including but certainly not limited to
+ * scrub reservations) upon interval change, without coordination from the
+ * Primary.  This event notifies the ScrubMachine that it can forget about
+ * such remote state.
+ */
 MEV(IntervalChanged)
 
 /// guarantee that the FSM is in the quiescent state (i.e. NotActive)