and physical extents are updated accordingly. The SegmmentCleaner is also responisble for throttling GC work
in order to avoid abrupt pauses and maintain smooth IO latenices.
+.. _cleaner-gc-autotune:
+
+**Cleaner GC autotune**:
+
+ ``SegmentCleaner::get_next_reclaim_segment()`` chooses the next segment to
+ reclaim using one of three configurable formulas selected by
+ ``seastore_segment_cleaner_gc_formula``: ``GREEDY`` (lowest utilization
+ wins), ``COST_BENEFIT`` (``(1-u) * age / (2u)``), or ``BENEFIT``
+ (age-weighted quadratic). ``COST_BENEFIT`` is the default and the right
+ call for journaling / LIFO workloads where age predicts future
+ dead-byte accumulation.
+
+ That assumption breaks under random-write at high cluster fill. Dead
+ bytes spread uniformly across segments regardless of age, so age stops
+ predicting future deadness, and ``(1-u)/(2u)`` becomes the only term that
+ distinguishes candidates. With every segment in the 0.7-0.94 utilization
+ band, ``(1-u)/(2u)`` ranges from 0.227 to 0.032 -- a 7x spread the
+ formula can easily lose to a 7x age difference. A 0.94-util old segment
+ then outscores a 0.68-util young one, even though reclaiming the 0.68
+ segment would free 5x more space.
+
+ The autotune override detects this mis-selection at runtime. In the
+ same pass that scores segments by the configured formula, it also
+ tracks the lowest-utilization candidate (what ``GREEDY`` would pick).
+ After the pass, if greedy's free-fraction (``1 - util``) is at least
+ ``seastore_segment_cleaner_gc_autotune_ratio`` times the formula's
+ pick's free-fraction (default 2.0), the override swaps the formula's
+ pick for greedy. Since all segments share the same size, comparing
+ free-fractions is equivalent to comparing freed bytes.
+
+ Behaviour by regime:
+
+ - **Low alive_ratio**: many low-util candidates exist; the formula's
+ age-preferred pick is typically within ~30% of greedy in
+ free-fraction. The override does not fire and age weighting is
+ preserved.
+ - **High alive_ratio with non-uniform utilisation** (hot/cold mix):
+ greedy and the formula converge on the same segment in most cases;
+ when they differ, the formula's choice is usually within 2x. The
+ override rarely fires.
+ - **High alive_ratio with uniform utilisation** (the failure regime
+ the autotune targets): greedy's pick exceeds the formula's by 3-5x
+ routinely. The override fires reliably; net free per reclaim jumps
+ from 4-6 MB to 14-22 MB.
+
+ Configurable:
+
+ - ``seastore_segment_cleaner_gc_autotune`` (bool, default true):
+ operators can disable the override unconditionally.
+ - ``seastore_segment_cleaner_gc_autotune_ratio`` (float, default 2.0,
+ min 1.0): operators can tune the threshold; higher is more
+ conservative (preserves age weighting more aggressively).
+
+ A safety guard skips the override when the formula's pick has
+ free-fraction below ``1/1024`` of a segment, because the ratio
+ comparison is meaningless against a near-zero denominator. On
+ override the formula's score for the chosen segment is recomputed
+ so the value logged after selection stays consistent.
+
**Tiering**:
.. note::
- greedy
- cost_benefit
- benefit
+- name: seastore_segment_cleaner_gc_autotune
+ type: bool
+ level: advanced
+ desc: When the configured gc formula (cost_benefit or benefit) picks a segment
+ whose free-space fraction (1 - utilization) is at least
+ seastore_segment_cleaner_gc_autotune_ratio times smaller than the
+ lowest-utilization candidate, override the pick with the greedy choice.
+ long_desc: COST_BENEFIT and BENEFIT weight segment age, which is the right
+ call when age predicts dead-byte accumulation (journaling / LIFO
+ workloads). Under random-write at high alive_ratio dead bytes
+ spread uniformly across segments, age stops predicting deadness,
+ and the formula can pick a high-util old segment whose reclaim
+ frees several times less space than the lowest-util candidate.
+ When this option is enabled the cleaner detects the mis-selection
+ at runtime and overrides the formula's pick with the greedy
+ choice. Disable to honor the configured formula unconditionally.
+ Ignored when seastore_segment_cleaner_gc_formula = greedy.
+ default: true
+- name: seastore_segment_cleaner_gc_autotune_ratio
+ type: float
+ level: advanced
+ desc: Override threshold for the gc auto-tune. The configured formula's
+ pick is overridden with the greedy candidate when greedy's free
+ fraction is at least this ratio times the formula's pick's free
+ fraction.
+ long_desc: Higher is more conservative (override fires less often, the
+ configured formula's age weighting is preserved more
+ aggressively). Lower is more aggressive (override fires more
+ often, behaviour converges toward pure greedy). The default
+ (2.0) captures the random-write failure regime while staying
+ clear of normal-operation fluctuations.
+ default: 2.0
+ min: 1.0
- name: seastore_data_delta_based_overwrite
type: size
level: dev
} else {
bound_time = NULL_TIME;
}
+ // Track the configured formula's best-scoring candidate alongside the
+ // greedy choice (lowest utilization / highest free fraction).
+ // See doc/dev/crimson/seastore.rst#cleaner-gc-autotune.
+ segment_id_t greedy_id = NULL_SEG_ID;
+ double greedy_min_util = 1.0;
for (auto& [_id, segment_info] : segments) {
if (segment_info.is_closed() &&
(trimmer == nullptr ||
!segment_info.is_in_journal(trimmer->get_journal_tail()))) {
+ // Track the configured formula's best-scoring reclaim candidate.
double benefit_cost = calc_gc_benefit_cost(_id, now_time, bound_time);
if (benefit_cost > max_benefit_cost) {
id = _id;
max_benefit_cost = benefit_cost;
}
+ // Track the greedy candidate (lowest utilization / highest free fraction).
+ double util = calc_utilization(_id);
+ if (util < greedy_min_util) {
+ greedy_id = _id;
+ greedy_min_util = util;
+ }
+ }
+ }
+ // Autotune override: prefer greedy when its pick would free far more.
+ // See doc/dev/crimson/seastore.rst#cleaner-gc-autotune.
+ const bool autotune_enabled =
+ crimson::common::get_conf<bool>(
+ "seastore_segment_cleaner_gc_autotune");
+ if (autotune_enabled &&
+ gc_formula != gc_formula_t::GREEDY &&
+ id != NULL_SEG_ID && greedy_id != NULL_SEG_ID && id != greedy_id) {
+ double picked_util = calc_utilization(id);
+ double picked_free = 1.0 - picked_util;
+ double greedy_free = 1.0 - greedy_min_util;
+ const double ratio = crimson::common::get_conf<double>(
+ "seastore_segment_cleaner_gc_autotune_ratio");
+ if (should_override_to_greedy(picked_free, greedy_free, ratio)) {
+ DEBUG("auto-tune: formula picked seg {} (util {:.3f}, free {:.3f}),"
+ " overriding with greedy seg {} (util {:.3f}, free {:.3f})",
+ id, picked_util, picked_free,
+ greedy_id, greedy_min_util, greedy_free);
+ id = greedy_id;
+ // Recompute the formula score for the chosen segment so the
+ // value logged below stays semantically consistent.
+ max_benefit_cost =
+ calc_gc_benefit_cost(greedy_id, now_time, bound_time);
}
}
if (id != NULL_SEG_ID) {
clean_space_ret clean_space() final;
+ // Predicate for the autotune override: returns true when greedy's pick frees
+ // significantly more space than the formula's pick.
+ // See doc/dev/crimson/seastore.rst#cleaner-gc-autotune.
+ static bool should_override_to_greedy(
+ double picked_free, double greedy_free, double ratio) {
+ // Guard against picked_free near zero (1/1024 of a segment): the ratio
+ // comparison is meaningless against a near-zero denominator.
+ constexpr double kMinPickedFreeForRatio = 1.0 / 1024.0;
+ return picked_free >= kMinPickedFreeForRatio &&
+ greedy_free >= ratio * picked_free;
+ }
+
const std::set<device_id_t>& get_device_ids() const final {
return sm_group->get_device_ids();
}