doc/rados/configuration: recommend wpq for EC clusters seeing slow ops

author Matthew N. Heler <matthew.heler@hotmail.com>

Fri, 15 May 2026 11:11:35 +0000 (06:11 -0500)

committer Matthew N. Heler <matthew.heler@hotmail.com>

Fri, 15 May 2026 11:33:00 +0000 (06:33 -0500)
diff --git a/doc/rados/configuration/mclock-config-ref.rst b/doc/rados/configuration/mclock-config-ref.rst

index c205ec14affd88f8bef982968c3fbbac4408a3a7..95d6e52c91e39f2d45757336a694d01fab77b474 100644
--- a/doc/rados/configuration/mclock-config-ref.rst
+++ b/doc/rados/configuration/mclock-config-ref.rst
@@ -4,6 +4,22 @@
  
  .. index:: mclock; configuration
  
+.. warning:: On large clusters with erasure-coded pools, operators may
+   observe slow ops during recovery or backfill (for example, when an
+   OSD is drained out). Under mClock, EC sub-operation reads issued
+   during recovery are currently routed through the ``immediate``
+   high-priority queue and bypass mClock throttling. When many OSDs
+   read concurrently from a single source OSD, this can saturate that
+   OSD's high-priority queue and starve client and background work.
+   As an interim measure, such deployments are advised to switch to
+   the ``WeightedPriorityQueue`` (``wpq``) scheduler. The change can
+   be applied cluster-wide and takes effect after each OSD is
+   restarted:
+
+   .. prompt:: bash #
+
+     ceph config set osd osd_op_queue wpq
+
  QoS support in Ceph is implemented using a queuing scheduler based on `the
  dmClock algorithm`_. See :ref:`dmclock-qos` section for more details.
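
The warning added by this patch notes that the new scheduler only takes effect once each OSD has been restarted, so it can help to check which OSDs are still running the old one. The following is a minimal sketch rather than part of the patch: the helper names ``set_wpq`` and ``osds_pending_restart`` are hypothetical, and it assumes the ``ceph`` CLI is on ``PATH`` with admin credentials.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are illustrative and not part of Ceph.
# Assumes the `ceph` CLI is available with admin credentials.

# Set the cluster-wide OSD scheduler to wpq (the command from the warning).
set_wpq() {
    ceph config set osd osd_op_queue wpq
}

# Print the id of every OSD whose *running* scheduler is not yet wpq,
# i.e. the OSDs that still need a restart for the change to apply.
osds_pending_restart() {
    local id running
    for id in $(ceph osd ls); do
        # `ceph config show` reports the value the running daemon uses.
        running=$(ceph config show "osd.${id}" osd_op_queue)
        if [ "$running" != "wpq" ]; then
            echo "$id"
        fi
    done
}
```

A possible workflow is to call ``set_wpq`` once, restart OSDs one failure domain at a time using whatever the deployment provides (for example ``ceph orch daemon restart osd.<id>`` under cephadm, or the ``ceph-osd@<id>`` systemd unit on package-based installs), and rerun ``osds_pending_restart`` until it prints nothing.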