From: Sridhar Seshasayee
Date: Mon, 25 May 2026 12:14:54 +0000 (+0530)
Subject: mclock_common: adjust mClock profile parameters to prevent backfill starvation
X-Git-Url: http://git-server-git.apps.pok.os.sepia.ceph.com/?a=commitdiff_plain;h=77421c83665fdfd3343c487793e37e003d8c1891;p=ceph.git

mclock_common: adjust mClock profile parameters to prevent backfill starvation

Adjust the 'background_best_effort' queue parameters across the three
standard mClock profiles (high_client_ops, balanced, and high_recovery_ops)
so that best-effort ops are not starved.

Previously, the 'background_best_effort' queue had a 0% (MIN) reservation
and a weight of 1 under these profiles. Under dense concurrent client
traffic, the zero reservation can starve best-effort ops. For example, it
completely starves backfill sub-ops (MSG_OSD_EC_READ) on pools with
'allow_ec_optimizations' set to false. This starvation forces the primary
OSD to hold internal BlueStore transactions and PG object locks for extended
periods, severely inflating median (50th percentile) client latency.

To prevent background starvation and undo the effects of lock retention on
the primary, the profiles are tuned so that low-cost sub-ops clear out of
peer queues quickly and the primary releases its locks sooner. This improves
client completion latency as well as tail latency (95th, 99th and 99.5th
percentiles). The changes are as follows:

1. high_client_ops profile:
   - Grant 'background_best_effort' a safe 5% minimum reservation.
   - Scale the queue weight to 4.

2. balanced profile:
   - Grant 'background_best_effort' a 5% minimum reservation.
   - Set the queue weight to 2.

3. high_recovery_ops profile:
   - Grant 'background_best_effort' a 5% minimum reservation.
   - Set the queue weight to 2.

4. Update the mClock config reference documentation to reflect the tuning
   changes to the best-effort QoS parameters across the profiles.
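The weight changes listed above can be made concrete with a small C++ sketch. It encodes each profile's { reservation, weight, limit } triples before and after this change. It then computes the share of capacity the best-effort class would get if capacity were split by weight alone.

This is a simplification. It assumes all three classes are backlogged and ignores reservations and limits entirely. The Allocation struct, the k* constants and weight_share() are hypothetical names for illustration; they are not part of mclock_common.h.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical mirror of one { reservation, weight, limit } entry of a
// profile_t in this patch; field names are illustrative only.
struct Allocation {
  double reservation;  // fraction of OSD capacity, e.g. 0.05 == 5%
  double weight;       // relative share of capacity beyond reservations
  double limit;        // fraction of capacity; 0 means no limit (MAX)
};

// Entry order follows the patch: client, background recovery,
// background best-effort.
using Profile = std::array<Allocation, 3>;
constexpr std::size_t kBestEffort = 2;

// Profile values before this change.
constexpr Profile kHighClientOpsOld{{{.6, 2, 0}, {.4, 1, 0}, {0, 1, .7}}};
constexpr Profile kBalancedOld{{{.5, 1, 0}, {.5, 1, 0}, {0, 1, .9}}};
constexpr Profile kHighRecoveryOpsOld{{{.3, 1, 0}, {.7, 2, 0}, {0, 1, 0}}};

// Profile values after this change.
constexpr Profile kHighClientOpsNew{{{.6, 2, 0}, {.4, 1, 0}, {.05, 4, .7}}};
constexpr Profile kBalancedNew{{{.5, 1, 0}, {.5, 1, 0}, {.05, 2, .9}}};
constexpr Profile kHighRecoveryOpsNew{{{.3, 1, 0}, {.7, 2, 0}, {.05, 2, 0}}};

// Simplified weight-only share for class `idx`: the class's weight divided
// by the sum of all weights in the profile. Reservations and limits are
// deliberately ignored here.
double weight_share(const Profile& p, std::size_t idx) {
  double total = 0.0;
  for (const Allocation& a : p) total += a.weight;
  assert(total > 0.0 && idx < p.size());
  return p[idx].weight / total;
}
```

Under this simplified model, the best-effort share changes as follows:

- high_client_ops: rises from 25% to about 57%.
- balanced: rises from about 33% to 50%.
- high_recovery_ops: rises from 25% to 40%.

The real scheduler also honors each class's reservation and the 70% and 90% best-effort limits, so actual shares will differ.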
Note on Proportional Scaling Compatibility:

With these changes, total reservations add up to 105% in every profile
(e.g., 50% client + 50% recovery + 5% best-effort under the balanced
profile). Under heavy concurrent saturation, mClock's internal controls
resolve this gracefully via proportional down-scaling, preserving the
underlying device bandwidth limits for the different classes of clients.
For example, instead of being allocated 50% of the bandwidth, the client
receives a slightly lower reservation, and the difference goes to the
best-effort queue. This minor shift is virtually unnoticeable to client
applications, but it prevents the internal queue deadlocks.

Signed-off-by: Sridhar Seshasayee
---
diff --git a/doc/rados/configuration/mclock-config-ref.rst b/doc/rados/configuration/mclock-config-ref.rst
index 95d6e52c91e..316e1faa5e1 100644
--- a/doc/rados/configuration/mclock-config-ref.rst
+++ b/doc/rados/configuration/mclock-config-ref.rst
@@ -128,7 +128,7 @@ built-in profiles may be enabled by following the steps mentioned in next sectio
 +------------------------+-------------+--------+-------+
 | background recovery    | 50%         | 1      | MAX   |
 +------------------------+-------------+--------+-------+
-| background best-effort | MIN         | 1      | 90%   |
+| background best-effort | 5%          | 2      | 90%   |
 +------------------------+-------------+--------+-------+
 
 high_client_ops
@@ -147,7 +147,7 @@ the resource control parameters set by the profile:
 +------------------------+-------------+--------+-------+
 | background recovery    | 40%         | 1      | MAX   |
 +------------------------+-------------+--------+-------+
-| background best-effort | MIN         | 1      | 70%   |
+| background best-effort | 5%          | 4      | 70%   |
 +------------------------+-------------+--------+-------+
 
 high_recovery_ops
@@ -165,7 +165,7 @@ parameters set by the profile:
 +------------------------+-------------+--------+-------+
 | background recovery    | 70%         | 2      | MAX   |
 +------------------------+-------------+--------+-------+
-| background best-effort | MIN         | 1      | MAX   |
+| background best-effort | 5%          | 2      | MAX   |
 +------------------------+-------------+--------+-------+
 
 .. note:: Across the built-in profiles, internal background best-effort clients
diff --git a/src/common/mclock_common.h b/src/common/mclock_common.h
index b736308159d..498da12ca5a 100644
--- a/src/common/mclock_common.h
+++ b/src/common/mclock_common.h
@@ -128,12 +128,12 @@ struct profile_t {
  * Background Recovery Allocation:
  * reservation: 40% | weight: 1 | limit: 0 (max) |
  * Background Best Effort Allocation:
- * reservation: 0 (min) | weight: 1 | limit: 70% |
+ * reservation: 5% | weight: 4 | limit: 70% |
  */
 constexpr profile_t HIGH_CLIENT_OPS{
-  { .6, 2, 0 },
-  { .4, 1, 0 },
-  { 0, 1, .7 }
+  { .6, 2, 0 },
+  { .4, 1, 0 },
+  { .05, 4, .7 }
 };
 
 /**
@@ -144,12 +144,12 @@ constexpr profile_t HIGH_CLIENT_OPS{
  * Background Recovery Allocation:
  * reservation: 70% | weight: 2 | limit: 0 (max) |
  * Background Best Effort Allocation:
- * reservation: 0 (min) | weight: 1 | limit: 0 (max) |
+ * reservation: 5% | weight: 2 | limit: 0 (max) |
  */
 constexpr profile_t HIGH_RECOVERY_OPS{
-  { .3, 1, 0 },
-  { .7, 2, 0 },
-  { 0, 1, 0 }
+  { .3, 1, 0 },
+  { .7, 2, 0 },
+  { .05, 2, 0 }
 };
 
 /**
@@ -160,12 +160,12 @@ constexpr profile_t HIGH_RECOVERY_OPS{
  * Background Recovery Allocation:
  * reservation: 50% | weight: 1 | limit: 0 (max) |
  * Background Best Effort Allocation:
- * reservation: 0 (min) | weight: 1 | limit: 90% |
+ * reservation: 5% | weight: 2 | limit: 90% |
  */
 constexpr profile_t BALANCED{
-  { .5, 1, 0 },
-  { .5, 1, 0 },
-  { 0, 1, .9 }
+  { .5, 1, 0 },
+  { .5, 1, 0 },
+  { .05, 2, .9 }
 };
 
 struct client_profile_id_t {
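The proportional scaling note attributes the handling of the 105% reservation total to proportional down-scaling. The following C++ sketch models the simplest reading of that behavior: whenever the total exceeds 100%, every reservation is multiplied by 1/total, so the ratios between classes are preserved. It applies this to the three tuned profiles.

scale_reservations() and the k* constants are hypothetical names. This is not the dmclock or OSD implementation, which may distribute the shortfall differently.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <numeric>

// Reservations in the patch's entry order: client, background recovery,
// background best-effort. Each is a fraction of OSD capacity.
using Reservations = std::array<double, 3>;

// Reservation values of the tuned profiles from this patch.
constexpr Reservations kHighClientOps{.6, .4, .05};
constexpr Reservations kBalanced{.5, .5, .05};
constexpr Reservations kHighRecoveryOps{.3, .7, .05};

// Hypothetical helper (not part of Ceph). When the reservations
// over-subscribe capacity (sum > 1.0), it divides each one by the sum so
// the total becomes 1.0 while the ratios stay the same. Otherwise it
// returns them unchanged.
Reservations scale_reservations(const Reservations& r) {
  const double total = std::accumulate(r.begin(), r.end(), 0.0);
  assert(total > 0.0);
  if (total <= 1.0) return r;
  Reservations scaled{};
  for (std::size_t i = 0; i < r.size(); ++i) scaled[i] = r[i] / total;
  return scaled;
}
```

With this model, the effective reservations (client / recovery / best-effort) become:

- balanced: about 47.6% / 47.6% / 4.8% (from 50% / 50% / 5%).
- high_client_ops: roughly 57.1% / 38.1% / 4.8%.
- high_recovery_ops: roughly 28.6% / 66.7% / 4.8%.

The pre-patch profiles, whose reservations summed to exactly 100%, pass through unchanged.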