+------------------------+-------------+--------+-------+
| background recovery | 25% | 1 | 100% |
+------------------------+-------------+--------+-------+
-| background best effort | 25% | 1 | MAX |
+| background best effort | 25% | 2 | MAX |
+------------------------+-------------+--------+-------+
balanced
+------------------------+-------------+--------+-------+
| background recovery | 40% | 1 | 150% |
+------------------------+-------------+--------+-------+
-| background best effort | 20% | 1 | MAX |
+| background best effort | 20% | 2 | MAX |
+------------------------+-------------+--------+-------+
high_recovery_ops
+------------------------+-------------+--------+-------+
| background recovery | 60% | 2 | 200% |
+------------------------+-------------+--------+-------+
-| background best effort | 1 (MIN) | 1 | MAX |
+| background best effort | 1 (MIN) | 2 | MAX |
+------------------------+-------------+--------+-------+
custom
+------------------------+-------------+--------+-------+
| background recovery | 25% | 1 | 100% |
+------------------------+-------------+--------+-------+
-| background best-effort | 25% | 1 | MAX |
+| background best-effort | 25% | 2 | MAX |
+------------------------+-------------+--------+-------+
high_recovery_ops
+------------------------+-------------+--------+-------+
| background recovery | 60% | 2 | 200% |
+------------------------+-------------+--------+-------+
-| background best-effort | 1 (MIN) | 1 | MAX |
+| background best-effort | 1 (MIN) | 2 | MAX |
+------------------------+-------------+--------+-------+
balanced
+------------------------+-------------+--------+-------+
| background recovery | 40% | 1 | 150% |
+------------------------+-------------+--------+-------+
-| background best-effort | 20% | 1 | MAX |
+| background best-effort | 20% | 2 | MAX |
+------------------------+-------------+--------+-------+
.. note:: Across the built-in profiles, the QoS requirements of internal
   background best-effort clients are being met.
+Switching Between Built-in and Custom Profiles
+==============================================
+
+There may be situations requiring switching from a built-in profile to the
+*custom* profile and vice-versa. The following sections outline the steps to
+accomplish this.
+
+Steps to Switch From a Built-in to the Custom Profile
+-----------------------------------------------------
+
+To switch to the *custom* profile, set the :confval:`osd_mclock_profile`
+option accordingly. For example, to change the profile to *custom* on all
+OSDs, run the following command:
+
+ .. prompt:: bash #
+
+ ceph config set osd osd_mclock_profile custom
+
+After switching to the *custom* profile, the desired mClock configuration
+option may be modified. For example, to change the client reservation IOPS
+allocation for a specific OSD (say osd.0), the following command can be used:
+
+ .. prompt:: bash #
+
+ ceph config set osd.0 osd_mclock_scheduler_client_res 3000
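+
+Once the OSD has picked up the change, it may optionally be verified against
+the daemon's running configuration, for example:
+
+ .. prompt:: bash #
+
+ ceph config show osd.0 osd_mclock_scheduler_client_res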
+
+.. important:: Care must be taken to adjust the reservations of other services,
+ such as recovery and background best effort, so that the sum of all
+ reservations does not exceed the maximum IOPS capacity of the OSD.
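+
+As an illustration (the IOPS figures below are assumed for the example and are
+not defaults), if the OSD's maximum IOPS capacity is 5000 and the client
+reservation is set to 3000 IOPS, the reservations for background recovery and
+background best effort should together stay within the remaining 2000 IOPS:
+
+ .. prompt:: bash #
+
+ ceph config set osd.0 osd_mclock_scheduler_background_recovery_res 1500
+ ceph config set osd.0 osd_mclock_scheduler_background_best_effort_res 500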
+
+.. tip:: The reservation and limit parameter allocations are per-shard based on
+ the type of backing device (HDD/SSD) under the OSD. See
+ :confval:`osd_op_num_shards_hdd` and :confval:`osd_op_num_shards_ssd` for
+ more details.
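+
+For example, the shard count in effect on a given OSD (osd.0 here, with an SSD
+backing device assumed) can be inspected with:
+
+ .. prompt:: bash #
+
+ ceph config show osd.0 osd_op_num_shards_ssd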
+
+Steps to Switch From the Custom Profile to a Built-in Profile
+-------------------------------------------------------------
+
+Switching from the *custom* profile to a built-in profile requires an
+intermediate step of removing the custom settings from the central config
+database for the changes to take effect.
+
+The following sequence of commands can be used to switch to a built-in profile:
+
+#. Set the desired built-in profile using:
+
+ .. prompt:: bash #
+
+ ceph config set osd osd_mclock_profile <built-in profile>
+
+ For example, to set the built-in profile to ``high_client_ops`` on all
+ OSDs, run the following command:
+
+ .. prompt:: bash #
+
+ ceph config set osd osd_mclock_profile high_client_ops
+#. Determine the existing custom mClock configuration settings in the central
+ config database using the following command:
+
+ .. prompt:: bash #
+
+ ceph config dump
+#. Remove the custom mClock configuration settings determined in the previous
+ step from the central config database:
+
+ .. prompt:: bash #
+
+ ceph config rm osd <mClock Configuration Option>
+
+ For example, to remove the configuration option
+ :confval:`osd_mclock_scheduler_client_res` that was set on all OSDs, run the
+ following command:
+
+ .. prompt:: bash #
+
+ ceph config rm osd osd_mclock_scheduler_client_res
+#. After all existing custom mClock configuration settings have been removed
+ from the central config database, the configuration settings pertaining to
+ ``high_client_ops`` come into effect. For example, to verify the settings
+ on osd.0, use the command below (a filtered variant is shown after this
+ procedure):
+
+ .. prompt:: bash #
+
+ ceph config show osd.0
+
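+Since ``ceph config show`` lists all the options in effect on the daemon, the
+output may be narrowed down to the mClock-related options with ``grep``; the
+following is only an illustration:
+
+ .. prompt:: bash #
+
+ ceph config show osd.0 | grep osd_mclock
+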
+Switch Temporarily Between mClock Profiles
+------------------------------------------
+
+To switch between mClock profiles on a temporary basis, the following commands
+may be used to override the settings:
+
+.. warning:: This section is intended for advanced users or for experimental
+ testing. It is recommended not to use the commands below on a running
+ cluster, as they could have unexpected outcomes.
+
+.. note:: Configuration changes made on an OSD using the commands below are
+ ephemeral and are lost when the OSD restarts. Note also that config options
+ overridden using these commands cannot be modified further using the
+ *ceph config set osd.N ...* command; such changes will not take effect until
+ the OSD is restarted. This is intentional, as per the config subsystem
+ design. However, further modifications can still be made ephemerally using
+ the commands mentioned below.
+
+#. Run the *injectargs* command as shown below to override the mClock settings:
+
+ .. prompt:: bash #
+
+ ceph tell osd.N injectargs '--<mClock Configuration Option>=<value>'
+
+ For example, the following command overrides the
+ :confval:`osd_mclock_profile` option on osd.0:
+
+ .. prompt:: bash #
+
+ ceph tell osd.0 injectargs '--osd_mclock_profile=high_recovery_ops'
+
+
+#. An alternate command that can be used is:
+
+ .. prompt:: bash #
+
+ ceph daemon osd.N config set <mClock Configuration Option> <value>
+
+ For example, the following command overrides the
+ :confval:`osd_mclock_profile` option on osd.0:
+
+ .. prompt:: bash #
+
+ ceph daemon osd.0 config set osd_mclock_profile high_recovery_ops
+
+The individual QoS-related config options for the *custom* profile can also be
+modified ephemerally using the above commands.
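+
+For example (assuming the *custom* profile is active on osd.0), the client
+reservation could be overridden ephemerally as shown below; the value used
+here is purely illustrative:
+
+ .. prompt:: bash #
+
+ ceph tell osd.0 injectargs '--osd_mclock_scheduler_client_res=3000'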
+
+
OSD Capacity Determination (Automated)
======================================
--- /dev/null
+#!/usr/bin/env bash
+#
+# Copyright (C) 2022 Red Hat <contact@redhat.com>
+#
+# Author: Sridhar Seshasayee <sseshasa@redhat.com>
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU Library Public License as published by
+# the Free Software Foundation; either version 2, or (at your option)
+# any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU Library Public License for more details.
+#
+
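+# This standalone test exercises switching between the built-in and custom
+# mClock profiles on OSDs running the mclock_scheduler. It verifies that an
+# overridden QoS option (osd_mclock_scheduler_client_res) is applied, preserved
+# across a switch back to a built-in profile, and restored to its original
+# value once the override is removed from the central config database.
+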
+source $CEPH_ROOT/qa/standalone/ceph-helpers.sh
+
+function run() {
+ local dir=$1
+ shift
+
+ export CEPH_MON="127.0.0.1:7124" # git grep '\<7124\>' : there must be only one
+ export CEPH_ARGS
+ CEPH_ARGS+="--fsid=$(uuidgen) --auth-supported=none "
+ CEPH_ARGS+="--mon-host=$CEPH_MON "
+ CEPH_ARGS+="--debug-bluestore 20 "
+
+ local funcs=${@:-$(set | sed -n -e 's/^\(TEST_[0-9a-z_]*\) .*/\1/p')}
+ for func in $funcs ; do
+ setup $dir || return 1
+ $func $dir || return 1
+ teardown $dir || return 1
+ done
+}
+
+function TEST_profile_builtin_to_custom() {
+ local dir=$1
+ local OSDS=3
+
+ setup $dir || return 1
+ run_mon $dir a --osd_pool_default_size=$OSDS || return 1
+ run_mgr $dir x || return 1
+ for osd in $(seq 0 $(expr $OSDS - 1))
+ do
+ run_osd $dir $osd --osd_op_queue=mclock_scheduler || return 1
+ done
+
+ # Verify that the default mclock profile is set on the OSDs
+ for id in $(seq 0 $(expr $OSDS - 1))
+ do
+ local mclock_profile=$(ceph config get osd.$id osd_mclock_profile)
+ test "$mclock_profile" = "high_client_ops" || return 1
+ done
+
+ # Change the mclock profile to 'custom'
+ ceph config set osd osd_mclock_profile custom || return 1
+
+ # Verify that the mclock profile is set to 'custom' on the OSDs
+ for id in $(seq 0 $(expr $OSDS - 1))
+ do
+ local mclock_profile=$(ceph config get osd.$id osd_mclock_profile)
+ test "$mclock_profile" = "custom" || return 1
+ done
+
+ # Change a mclock config param and confirm the change
+ local client_res=$(CEPH_ARGS='' ceph --format=json daemon $(get_asok_path \
+ osd.$id) config get osd_mclock_scheduler_client_res | \
+ jq .osd_mclock_scheduler_client_res | bc)
+ echo "client_res = $client_res"
+ local client_res_new=$(expr $client_res + 10)
+ echo "client_res_new = $client_res_new"
+ ceph config set osd osd_mclock_scheduler_client_res \
+ $client_res_new || return 1
+ for id in $(seq 0 $(expr $OSDS - 1))
+ do
+ # Check value in config monitor db
+ local res=$(ceph config get osd.$id \
+ osd_mclock_scheduler_client_res) || return 1
+ test $res -eq $client_res_new || return 1
+ # Check value in the in-memory 'values' map
+ res=$(CEPH_ARGS='' ceph --format=json daemon $(get_asok_path \
+ osd.$id) config get osd_mclock_scheduler_client_res | \
+ jq .osd_mclock_scheduler_client_res | bc)
+ test $res -eq $client_res_new || return 1
+ done
+
+ teardown $dir || return 1
+}
+
+function TEST_profile_custom_to_builtin() {
+ local dir=$1
+ local OSDS=3
+
+ setup $dir || return 1
+ run_mon $dir a --osd_pool_default_size=$OSDS || return 1
+ run_mgr $dir x || return 1
+ for osd in $(seq 0 $(expr $OSDS - 1))
+ do
+ run_osd $dir $osd --osd_op_queue=mclock_scheduler || return 1
+ done
+
+ # Verify that the default mclock profile is set on the OSDs
+ for id in $(seq 0 $(expr $OSDS - 1))
+ do
+ local mclock_profile=$(ceph config get osd.$id osd_mclock_profile)
+ test "$mclock_profile" = "high_client_ops" || return 1
+ done
+
+ # Change the mclock profile to 'custom'
+ ceph config set osd osd_mclock_profile custom || return 1
+
+ # Verify that the mclock profile is set to 'custom' on the OSDs
+ for id in $(seq 0 $(expr $OSDS - 1))
+ do
+ local mclock_profile=$(ceph config get osd.$id osd_mclock_profile)
+ test "$mclock_profile" = "custom" || return 1
+ done
+
+ # Save the original client reservations allocated to the OSDs
+ local client_res=()
+ for id in $(seq 0 $(expr $OSDS - 1))
+ do
+ client_res+=( $(CEPH_ARGS='' ceph --format=json daemon $(get_asok_path \
+ osd.$id) config get osd_mclock_scheduler_client_res | \
+ jq .osd_mclock_scheduler_client_res | bc) )
+ echo "Original client_res for osd.$id = ${client_res[$id]}"
+ done
+
+ # Change a mclock config param and confirm the change
+ local client_res_new=$(expr ${client_res[0]} + 10)
+ echo "client_res_new = $client_res_new"
+ ceph config set osd osd_mclock_scheduler_client_res \
+ $client_res_new || return 1
+ for id in $(seq 0 $(expr $OSDS - 1))
+ do
+ # Check value in config monitor db
+ local res=$(ceph config get osd.$id \
+ osd_mclock_scheduler_client_res) || return 1
+ test $res -eq $client_res_new || return 1
+ # Check value in the in-memory 'values' map
+ res=$(CEPH_ARGS='' ceph --format=json daemon $(get_asok_path \
+ osd.$id) config get osd_mclock_scheduler_client_res | \
+ jq .osd_mclock_scheduler_client_res | bc)
+ test $res -eq $client_res_new || return 1
+ done
+
+ # Switch the mclock profile back to the original built-in profile.
+ # The config subsystem prevents the overwrite of the changed QoS config
+ # option above i.e. osd_mclock_scheduler_client_res. This fact is verified
+ # before proceeding to remove the entry from the config monitor db. After
+ # the config entry is removed, the original value for the config option is
+ # restored and is verified.
+ ceph config set osd osd_mclock_profile high_client_ops || return 1
+ # Verify that the mclock profile is set to 'high_client_ops' on the OSDs
+ for id in $(seq 0 $(expr $OSDS - 1))
+ do
+ local mclock_profile=$(ceph config get osd.$id osd_mclock_profile)
+ test "$mclock_profile" = "high_client_ops" || return 1
+ done
+
+ # Verify that the new value is still in effect
+ for id in $(seq 0 $(expr $OSDS - 1))
+ do
+ # Check value in config monitor db
+ local res=$(ceph config get osd.$id \
+ osd_mclock_scheduler_client_res) || return 1
+ test $res -eq $client_res_new || return 1
+ # Check value in the in-memory 'values' map
+ res=$(CEPH_ARGS='' ceph --format=json daemon $(get_asok_path \
+ osd.$id) config get osd_mclock_scheduler_client_res | \
+ jq .osd_mclock_scheduler_client_res | bc)
+ test $res -eq $client_res_new || return 1
+ done
+
+ # Remove the changed QoS config option from monitor db
+ ceph config rm osd osd_mclock_scheduler_client_res || return 1
+
+ # Verify that the original values are now restored
+ for id in $(seq 0 $(expr $OSDS - 1))
+ do
+ # Check value in the in-memory 'values' map
+ res=$(CEPH_ARGS='' ceph --format=json daemon $(get_asok_path \
+ osd.$id) config get osd_mclock_scheduler_client_res | \
+ jq .osd_mclock_scheduler_client_res | bc)
+ test $res -eq ${client_res[$id]} || return 1
+ done
+
+ teardown $dir || return 1
+}
+
+main test-mclock-profile-switch "$@"
+
+# Local Variables:
+# compile-command: "cd build ; make -j4 && \
+# ../qa/run-standalone.sh test-mclock-profile-switch.sh"
+# End:
static_cast<size_t>(op_scheduler_class::background_best_effort)];
// Set external client params
- cct->_conf.set_val("osd_mclock_scheduler_client_res",
+ cct->_conf.set_val_default("osd_mclock_scheduler_client_res",
std::to_string(client.res));
- cct->_conf.set_val("osd_mclock_scheduler_client_wgt",
+ cct->_conf.set_val_default("osd_mclock_scheduler_client_wgt",
std::to_string(client.wgt));
- cct->_conf.set_val("osd_mclock_scheduler_client_lim",
+ cct->_conf.set_val_default("osd_mclock_scheduler_client_lim",
std::to_string(client.lim));
dout(10) << __func__ << " client QoS params: " << "["
<< client.res << "," << client.wgt << "," << client.lim
<< "]" << dendl;
// Set background recovery client params
- cct->_conf.set_val("osd_mclock_scheduler_background_recovery_res",
+ cct->_conf.set_val_default("osd_mclock_scheduler_background_recovery_res",
std::to_string(rec.res));
- cct->_conf.set_val("osd_mclock_scheduler_background_recovery_wgt",
+ cct->_conf.set_val_default("osd_mclock_scheduler_background_recovery_wgt",
std::to_string(rec.wgt));
- cct->_conf.set_val("osd_mclock_scheduler_background_recovery_lim",
+ cct->_conf.set_val_default("osd_mclock_scheduler_background_recovery_lim",
std::to_string(rec.lim));
dout(10) << __func__ << " Recovery QoS params: " << "["
<< rec.res << "," << rec.wgt << "," << rec.lim
<< "]" << dendl;
// Set background best effort client params
- cct->_conf.set_val("osd_mclock_scheduler_background_best_effort_res",
+ cct->_conf.set_val_default("osd_mclock_scheduler_background_best_effort_res",
std::to_string(best_effort.res));
- cct->_conf.set_val("osd_mclock_scheduler_background_best_effort_wgt",
+ cct->_conf.set_val_default("osd_mclock_scheduler_background_best_effort_wgt",
std::to_string(best_effort.wgt));
- cct->_conf.set_val("osd_mclock_scheduler_background_best_effort_lim",
+ cct->_conf.set_val_default("osd_mclock_scheduler_background_best_effort_lim",
std::to_string(best_effort.lim));
dout(10) << __func__ << " Best effort QoS params: " << "["
<< best_effort.res << "," << best_effort.wgt << "," << best_effort.lim