who continue to either use the codebase or modify it according to their needs.
DmClock exists in its own repository_. Before the Ceph *Pacific* release,
-mClock could be enabled by setting the ``osd_op_queue`` Ceph option to
+mClock could be enabled by setting the :confval:`osd_op_queue` Ceph option to
"mclock_scheduler". Additional mClock parameters like *reservation*, *weight*
and *limit* for each service type could be set using Ceph options.
For example, ``osd_mclock_scheduler_client_[res,wgt,lim]`` is one such option.
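+
+For reference, the following is a minimal sketch (not taken from the study) of
+how this pre-*Pacific* style of manual tuning could look with the ``ceph config``
+CLI. The values are placeholders, and changing :confval:`osd_op_queue` only
+takes effect after an OSD restart:
+
+.. code-block:: bash
+
+   # Select the mClock scheduler for all OSDs (an OSD restart is needed
+   # for the queue change to take effect).
+   ceph config set osd osd_op_queue mclock_scheduler
+
+   # Illustrative placeholder values only; the units, ranges and defaults
+   # of these options (and of the matching _lim option) differ between
+   # Ceph releases.
+   ceph config set osd osd_mclock_scheduler_client_res 1
+   ceph config set osd osd_mclock_scheduler_client_wgt 2
+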
| **HDD (without bluestore WAL & DB)** | 315 IOPS (1.23 MiB/s) |
+--------------------------------------+-------------------------------------------+
-.. note:: The ``bluestore_throttle_bytes`` and
- ``bluestore_throttle_deferred_bytes`` for SSDs were determined to be
+.. note:: The :confval:`bluestore_throttle_bytes` and
+ :confval:`bluestore_throttle_deferred_bytes` for SSDs were determined to be
   256 KiB. For HDDs, it was 40 MiB. The above throughput was obtained
   by running 4 KiB random writes at a queue depth of 64 for 300 seconds.
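+
+The workload itself is straightforward to reproduce. The sketch below uses
+``fio`` with its RBD engine (this is not necessarily the tool or the exact
+invocation used in the study, and the pool and image names are placeholders):
+
+.. code-block:: bash
+
+   # 4 KiB random writes at a queue depth of 64 for 300 seconds against
+   # an RBD image. Requires fio built with librbd support; the pool and
+   # image names are placeholders.
+   fio --name=mclock-baseline \
+       --ioengine=rbd --clientname=admin \
+       --pool=testpool --rbdname=testimage \
+       --rw=randwrite --bs=4k --iodepth=64 \
+       --numjobs=1 --direct=1 \
+       --time_based --runtime=300
+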
following set of Ceph recovery-related options was modified for tests with both
the WPQ and mClock schedulers; a sketch of how to apply them follows the list.
-- ``osd_max_backfills`` = 1000
-- ``osd_recovery_max_active`` = 1000
-- ``osd_async_recovery_min_cost`` = 1
+- :confval:`osd_max_backfills` = 1000
+- :confval:`osd_recovery_max_active` = 1000
+- :confval:`osd_async_recovery_min_cost` = 1
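+
+These limits can be applied at runtime with the ``ceph config`` CLI; the values
+below are the study's test settings, not general recommendations, and (as noted
+later in this document) the mClock built-in profiles override some of them
+internally:
+
+.. code-block:: bash
+
+   # Effectively remove the per-OSD throttles on concurrent backfill and
+   # recovery operations (test settings only, not recommendations).
+   ceph config set osd osd_max_backfills 1000
+   ceph config set osd osd_recovery_max_active 1000
+   ceph config set osd osd_async_recovery_min_cost 1
+
+   # Read back one of the values from the monitor configuration database.
+   ceph config get osd osd_max_backfills
+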
The above options set a high limit on the number of concurrent local and
remote backfill operations per OSD. Under these conditions the capability of the
WPQ(def) in the chart shows the average client throughput obtained
using the WPQ scheduler with all other Ceph configuration settings set to
-default values. The default setting for ``osd_max_backfills`` limits the number
+default values. The default setting for :confval:`osd_max_backfills` limits the number
of concurrent local and remote backfills or recoveries per OSD to 1. As a
result, the average client throughput obtained is impressive at just over 18000
IOPS when compared to the baseline value of 21500 IOPS.
rates under these conditions.
With the non-default options, the same test was executed with mClock and with
-the default profile(*high_client_ops*) enabled. As per the profile allocation,
+the default profile (*high_client_ops*) enabled. As per the profile allocation,
the reservation goal of 50% (10750 IOPS) is being met with an average throughput
of 11209 IOPS during the course of recovery operations. This is more than 4x
the throughput obtained with WPQ(BST).
.. image:: ../../images/mclock_wpq_study/Recovery_Rate_Comparison_NVMe_SSD_WPQ_vs_mClock.png
-Intuitively, the *high_client_ops* should impact recovery operations the most
+Intuitively, the *high_client_ops* profile should impact recovery operations the most
and this is indeed the case, as it took an average of 966 seconds for the
recovery to complete at 80 Objects/sec. The recovery bandwidth, as expected,
was the lowest at an average of ~320 MiB/s.
the max OSD capacity provided beforehand. As a result, the following mclock
config parameters cannot be modified when using any of the built-in profiles
(see the sketch after the list):
-- ``osd_mclock_scheduler_client_res``
-- ``osd_mclock_scheduler_client_wgt``
-- ``osd_mclock_scheduler_client_lim``
-- ``osd_mclock_scheduler_background_recovery_res``
-- ``osd_mclock_scheduler_background_recovery_wgt``
-- ``osd_mclock_scheduler_background_recovery_lim``
-- ``osd_mclock_scheduler_background_best_effort_res``
-- ``osd_mclock_scheduler_background_best_effort_wgt``
-- ``osd_mclock_scheduler_background_best_effort_lim``
+- :confval:`osd_mclock_scheduler_client_res`
+- :confval:`osd_mclock_scheduler_client_wgt`
+- :confval:`osd_mclock_scheduler_client_lim`
+- :confval:`osd_mclock_scheduler_background_recovery_res`
+- :confval:`osd_mclock_scheduler_background_recovery_wgt`
+- :confval:`osd_mclock_scheduler_background_recovery_lim`
+- :confval:`osd_mclock_scheduler_background_best_effort_res`
+- :confval:`osd_mclock_scheduler_background_best_effort_wgt`
+- :confval:`osd_mclock_scheduler_background_best_effort_lim`
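+
+If these parameters do need to be tuned by hand, one approach (assuming a
+release that provides the *custom* mClock profile, which lifts this
+restriction) is sketched below:
+
+.. code-block:: bash
+
+   # Switch from a built-in profile to the custom profile; the mClock
+   # parameters listed above then become user-modifiable again and can
+   # be set with "ceph config set" as shown earlier in this document.
+   ceph config set osd osd_mclock_profile custom
+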
The following Ceph options will not be modifiable by the user:
-- ``osd_max_backfills``
-- ``osd_recovery_max_active``
+- :confval:`osd_max_backfills`
+- :confval:`osd_recovery_max_active`
This is because the above options are internally modified by the mclock
scheduler in order to maximize the impact of the set profile.
If a built-in profile is active, the following Ceph config sleep options will
be disabled:
-- ``osd_recovery_sleep``
-- ``osd_recovery_sleep_hdd``
-- ``osd_recovery_sleep_ssd``
-- ``osd_recovery_sleep_hybrid``
-- ``osd_scrub_sleep``
-- ``osd_delete_sleep``
-- ``osd_delete_sleep_hdd``
-- ``osd_delete_sleep_ssd``
-- ``osd_delete_sleep_hybrid``
-- ``osd_snap_trim_sleep``
-- ``osd_snap_trim_sleep_hdd``
-- ``osd_snap_trim_sleep_ssd``
-- ``osd_snap_trim_sleep_hybrid``
+- :confval:`osd_recovery_sleep`
+- :confval:`osd_recovery_sleep_hdd`
+- :confval:`osd_recovery_sleep_ssd`
+- :confval:`osd_recovery_sleep_hybrid`
+- :confval:`osd_scrub_sleep`
+- :confval:`osd_delete_sleep`
+- :confval:`osd_delete_sleep_hdd`
+- :confval:`osd_delete_sleep_ssd`
+- :confval:`osd_delete_sleep_hybrid`
+- :confval:`osd_snap_trim_sleep`
+- :confval:`osd_snap_trim_sleep_hdd`
+- :confval:`osd_snap_trim_sleep_ssd`
+- :confval:`osd_snap_trim_sleep_hybrid`
The above sleep options are disabled to ensure that the mclock scheduler is able to
determine when to pick the next op from its operation queue and transfer it to
to the code base. We hope you'll share your experiences with your
mClock and dmClock experiments on the ``ceph-devel`` mailing list.
-
+.. confval:: osd_async_recovery_min_cost
.. confval:: osd_push_per_object_cost
.. confval:: osd_mclock_scheduler_client_res
.. confval:: osd_mclock_scheduler_client_wgt