From 7cdb7edad4a45cb8761233a883a2b7f0e00f4aae Mon Sep 17 00:00:00 2001
From: "Matthew N. Heler"
Date: Fri, 15 May 2026 06:11:35 -0500
Subject: [PATCH] doc/rados/configuration: recommend wpq for EC clusters seeing slow ops

On large EC clusters, mClock currently routes recovery EC sub-reads
through the immediate queue, skipping throttling. When many OSDs read
from one source during recovery, that source's high-priority queue
saturates and starves client work, producing slow ops. Recommend falling
back to wpq in the mClock config reference until the scheduler treats
those reads as background.

Signed-off-by: Matthew N. Heler
---
 doc/rados/configuration/mclock-config-ref.rst | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/doc/rados/configuration/mclock-config-ref.rst b/doc/rados/configuration/mclock-config-ref.rst
index c205ec14affd..95d6e52c91e3 100644
--- a/doc/rados/configuration/mclock-config-ref.rst
+++ b/doc/rados/configuration/mclock-config-ref.rst
@@ -4,6 +4,22 @@
 
 .. index:: mclock; configuration
 
+.. warning:: On large clusters with erasure-coded pools, operators may
+   observe slow ops during recovery or backfill (for example, when an
+   OSD is drained out). Under mClock, EC sub-operation reads issued
+   during recovery are currently routed through the ``immediate``
+   high-priority queue and bypass mClock throttling. When many OSDs
+   read concurrently from a single source OSD, this can saturate that
+   OSD's high-priority queue and starve client and background work.
+   As an interim measure, such deployments are advised to switch to
+   the ``WeightedPriorityQueue`` (``wpq``) scheduler. The change can
+   be applied cluster-wide and takes effect after each OSD is
+   restarted:
+
+   .. prompt:: bash #
+
+      ceph config set osd osd_op_queue wpq
+
 QoS support in Ceph is implemented using a queuing scheduler
 based on `the dmClock algorithm`_. See :ref:`dmclock-qos` section for more details.
 
-- 
2.47.3