From: Sage Weil
Date: Wed, 16 Jan 2013 22:09:53 +0000 (-0800)
Subject: ceph: adjust crush tunables via 'ceph osd crush tunables <profile>'
X-Git-Tag: v0.57~186
X-Git-Url: http://git-server-git.apps.pok.os.sepia.ceph.com/?a=commitdiff_plain;h=19ee23111585f15a39ee2907fa79e2db2bf523f0;p=ceph.git

ceph: adjust crush tunables via 'ceph osd crush tunables <profile>'

Make it easy to adjust crush tunables.  Create profiles:

 legacy: the legacy values
 argonaut: the argonaut defaults, and what is supported.. legacy! (*)
 bobtail: best that bobtail supports
 optimal: the current optimal values
 default: the current default values

 * In actuality, argonaut supports some of the tunables, but it
   doesn't say so via the feature bits.

Signed-off-by: Sage Weil
Reviewed-by: Samuel Just
Reviewed-by: Dan Mick
---
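The following usage sketch is illustrative and not part of the patch: selecting a profile is a single monitor command, the profile name here is an arbitrary choice, and the confirmation line is the string built by the OSDMonitor change below.

    $ ceph osd crush tunables bobtail
    adjusted tunables profile to bobtail

Changing profiles in either direction can trigger data movement, so it is worth doing when the cluster can absorb some rebalancing.
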
diff --git a/doc/rados/operations/crush-map.rst b/doc/rados/operations/crush-map.rst
index be2d8532cdad..d89860c05d55 100644
--- a/doc/rados/operations/crush-map.rst
+++ b/doc/rados/operations/crush-map.rst
@@ -788,12 +788,12 @@ the v0.48 argonaut series) allow the values to be adjusted or tuned.
 Clusters running recent Ceph releases support using the tunable
 values in the CRUSH maps.  However, older clients and daemons will
 not correctly interact with clusters using the "tuned" CRUSH maps.  To detect this situation,
-there is now a feature bit ``CRUSH_TUNABLES`` (value 0x40000) to
+there are now feature bits ``CRUSH_TUNABLES`` (value 0x40000) and ``CRUSH_TUNABLES2`` to
 reflect support for tunables.
 
 If the OSDMap currently used by the ``ceph-mon`` or ``ceph-osd``
-daemon has non-legacy values, it will require the ``CRUSH_TUNABLES``
-feature bit from clients and daemons who connect to it.  This means
+daemon has non-legacy values, it will require the ``CRUSH_TUNABLES`` or ``CRUSH_TUNABLES2``
+feature bits from clients and daemons who connect to it.  This means
 that old clients will not be able to connect.
 
 At some future point in time, newly created clusters will have
@@ -818,13 +818,41 @@ The legacy values result in several misbehaviors:
 * When some OSDs are marked out, the data tends to get redistributed
   to nearby OSDs instead of across the entire hierarchy.
 
-Which client versions support tunables
---------------------------------------
+CRUSH_TUNABLES
+--------------
+
+ * ``choose_local_tries``: Number of local retries.  Legacy value is
+   2, optimal value is 0.
+
+ * ``choose_local_fallback_tries``: Legacy value is 5, optimal value
+   is 0.
+
+ * ``choose_total_tries``: Total number of attempts to choose an item.
+   Legacy value was 19, subsequent testing indicates that a value of
+   50 is more appropriate for typical clusters.  For extremely large
+   clusters, a larger value might be necessary.
+
+CRUSH_TUNABLES2
+---------------
+
+ * ``chooseleaf_descend_once``: Whether a recursive chooseleaf attempt
+   will retry, or only try once and allow the original placement to
+   retry.  Legacy default is 0, optimal value is 1.
+
+
+Which client versions support CRUSH_TUNABLES
+--------------------------------------------
 
 * argonaut series, v0.48.1 or later
 * v0.49 or later
 * Linux kernel version v3.5 or later (for the file system and RBD
   kernel clients)
 
+Which client versions support CRUSH_TUNABLES2
+---------------------------------------------
+
+ * v0.55 or later, including bobtail series (v0.56.x)
+ * Linux kernel version v3.9 or later (for the file system and RBD kernel clients)
+
 A few important points
 ----------------------
@@ -832,7 +860,7 @@ A few important points
   storage nodes.  If the Ceph cluster is already storing a lot of
   data, be prepared for some fraction of the data to move.
 * The ``ceph-osd`` and ``ceph-mon`` daemons will start requiring the
-  ``CRUSH_TUNABLES`` feature of new connections as soon as they get
+  feature bits of new connections as soon as they get
   the updated map.  However, already-connected clients are
   effectively grandfathered in, and will misbehave if they do not
   support the new feature.
@@ -840,7 +868,7 @@ A few important points
   changed back to the default values, ``ceph-osd`` daemons will not be
   required to support the feature.  However, the OSD peering process
   requires examining and understanding old maps.  Therefore, you
-  should not run old (pre-v0.48) versions of the ``ceph-osd`` daemon
+  should not run old versions of the ``ceph-osd`` daemon
   if the cluster has previously used non-legacy CRUSH values, even if
   the latest version of the map has been switched back to using the
   legacy defaults.
@@ -848,6 +876,28 @@ A few important points
 Tuning CRUSH
 ------------
 
+The simplest way to adjust the crush tunables is by changing to a known
+profile.  Those are:
+
+ * ``legacy``: the legacy behavior from argonaut and earlier.
+ * ``argonaut``: the legacy values supported by the original argonaut release
+ * ``bobtail``: the values supported by the bobtail release
+ * ``optimal``: the current best values
+ * ``default``: the current default values for a new cluster
+
+Currently, ``legacy``, ``default``, and ``argonaut`` are the same, and
+``bobtail`` and ``optimal`` include ``CRUSH_TUNABLES`` and ``CRUSH_TUNABLES2``.
+
+You can select a profile on a running cluster with the command::
+
+    ceph osd crush tunables {PROFILE}
+
+Note that this may result in some data movement.
+
+
+Tuning CRUSH, the hard way
+--------------------------
+
 If you can ensure that all clients are running recent code, you can
 adjust the tunables by extracting the CRUSH map, modifying the values,
 and reinjecting it into the cluster.
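The "hard way" section added above stops at extract/modify/reinject; here is a sketch of that cycle, illustrative and not part of the patch. It assumes the crushtool --set-* and --enable-unsafe-tunables options described in the argonaut-era tunables notes, uses arbitrary /tmp paths, and leaves out chooseleaf_descend_once, which older crushtool builds may not be able to set.

    $ ceph osd getcrushmap -o /tmp/crush
    $ crushtool -i /tmp/crush --set-choose-local-tries 0 \
          --set-choose-local-fallback-tries 0 --set-choose-total-tries 50 \
          -o /tmp/crush.new
    $ ceph osd setcrushmap -i /tmp/crush.new

Going back to the legacy values by the same route additionally requires passing --enable-unsafe-tunables to crushtool.
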
diff --git a/src/crush/CrushWrapper.h b/src/crush/CrushWrapper.h
index f77991cc314c..56bcb598ff30 100644
--- a/src/crush/CrushWrapper.h
+++ b/src/crush/CrushWrapper.h
@@ -97,6 +97,28 @@ public:
   }
 
   // tunables
+  void set_tunables_legacy() {
+    crush->choose_local_tries = 2;
+    crush->choose_local_fallback_tries = 5;
+    crush->choose_total_tries = 19;
+    crush->chooseleaf_descend_once = 0;
+  }
+  void set_tunables_optimal() {
+    crush->choose_local_tries = 0;
+    crush->choose_local_fallback_tries = 0;
+    crush->choose_total_tries = 50;
+    crush->chooseleaf_descend_once = 1;
+  }
+  void set_tunables_argonaut() {
+    set_tunables_legacy();
+  }
+  void set_tunables_bobtail() {
+    set_tunables_optimal();
+  }
+  void set_tunables_default() {
+    set_tunables_legacy();
+  }
+
   int get_choose_local_tries() const {
     return crush->choose_local_tries;
   }
diff --git a/src/mon/OSDMonitor.cc b/src/mon/OSDMonitor.cc
index e279e8540ce3..9845552fc00c 100644
--- a/src/mon/OSDMonitor.cc
+++ b/src/mon/OSDMonitor.cc
@@ -2331,6 +2331,39 @@ bool OSDMonitor::prepare_command(MMonCommand *m)
       }
     } while (false);
   }
+  else if (m->cmd.size() == 4 && m->cmd[1] == "crush" && m->cmd[2] == "tunables") {
+    bufferlist bl;
+    if (pending_inc.crush.length())
+      bl = pending_inc.crush;
+    else
+      osdmap.crush->encode(bl);
+
+    CrushWrapper newcrush;
+    bufferlist::iterator p = bl.begin();
+    newcrush.decode(p);
+
+    err = 0;
+    if (m->cmd[3] == "legacy" || m->cmd[3] == "argonaut") {
+      newcrush.set_tunables_legacy();
+    } else if (m->cmd[3] == "bobtail") {
+      newcrush.set_tunables_bobtail();
+    } else if (m->cmd[3] == "optimal") {
+      newcrush.set_tunables_optimal();
+    } else if (m->cmd[3] == "default") {
+      newcrush.set_tunables_default();
+    } else {
+      err = -EINVAL;
+      ss << "unknown tunables profile '" << m->cmd[3] << "'; allowed values are legacy, argonaut, bobtail, optimal, or default";
+    }
+    if (err == 0) {
+      pending_inc.crush.clear();
+      newcrush.encode(pending_inc.crush);
+      ss << "adjusted tunables profile to " << m->cmd[3];
+      getline(ss, rs);
+      paxos->wait_for_commit(new Monitor::C_Command(mon, m, 0, rs, paxos->get_version()));
+      return true;
+    }
+  }
   else if (m->cmd[1] == "setmaxosd" && m->cmd.size() > 2) {
     int newmax = parse_pos_long(m->cmd[2].c_str(), &ss);
     if (newmax < 0) {
diff --git a/src/test/cli/ceph/help.t b/src/test/cli/ceph/help.t
index c8d6fa7556cc..2cd3deda5e26 100644
--- a/src/test/cli/ceph/help.t
+++ b/src/test/cli/ceph/help.t
@@ -46,6 +46,7 @@
   ceph osd crush move <name> <loc1> [<loc2> ...]
   ceph osd crush create-or-move <id> <initial-weight> <loc1> [<loc2> ...]
   ceph osd crush reweight <name> <weight>
+  ceph osd crush tunables <legacy|argonaut|bobtail|optimal|default>
   ceph osd create [<uuid>]
   ceph osd rm <osd-id> [<osd-id>...]
   ceph osd lost <osd-id> [--yes-i-really-mean-it]
diff --git a/src/tools/ceph.cc b/src/tools/ceph.cc
index d5300e69bdd3..7582ac96ab27 100644
--- a/src/tools/ceph.cc
+++ b/src/tools/ceph.cc
@@ -89,6 +89,7 @@ static void usage()
   cout << "  ceph osd crush move <name> <loc1> [<loc2> ...]\n";
   cout << "  ceph osd crush create-or-move <id> <initial-weight> <loc1> [<loc2> ...]\n";
   cout << "  ceph osd crush reweight <name> <weight>\n";
+  cout << "  ceph osd crush tunables <legacy|argonaut|bobtail|optimal|default>\n";
   cout << "  ceph osd create [<uuid>]\n";
   cout << "  ceph osd rm <osd-id> [<osd-id>...]\n";
   cout << "  ceph osd lost <osd-id> [--yes-i-really-mean-it]\n";
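
To round out the behavior above, a sketch of the error and revert paths, illustrative and not part of the patch. The message bodies are the strings built in the OSDMonitor change; the "Error EINVAL:" prefix is an assumption about how the ceph tool renders a failed monitor command.

    $ ceph osd crush tunables foo
    Error EINVAL: unknown tunables profile 'foo'; allowed values are legacy, argonaut, bobtail, optimal, or default
    $ ceph osd crush tunables legacy
    adjusted tunables profile to legacy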