Sage Weil [Wed, 28 Nov 2018 03:32:04 +0000 (21:32 -0600)]
Merge PR #24502 into master
* refs/pull/24502/head:
crushtool: implement --rebuild-class-roots command
crushtool: make --reweight re-sum choose_args weight-sets too
crushtool: --reweight should only reweight nonshadow roots
crush/CrushWrapper: reclassify: use default parent for created buckets
crush/CrushWrapper: reclassify: handle to-be-created buckets that we need twice
test/cli/crushtool/reclassify: add second gabe test case
crushtool: add --set-subtree-class; do not set class via --reclassify-root
test/cli/crushtool/reclassify: add reclassify test cases
doc/rados/operations/crush*: document reclassify
doc/rados/operations/crush: remove instructions for separate crush trees for ssd
crushtool: add --compare command
crushtool: implement --reclassify
crush/CrushCompiler: fix id scan to include class ids
This simply rebuilds the class roots. Normally this should create no
change in the map since whatever was making changes to the map before
should have rebuilt the shadow roots at that point.
Sage Weil [Fri, 26 Oct 2018 14:32:27 +0000 (09:32 -0500)]
crushtool: make --reweight re-sum choose_args weight-sets too
This ensures that the weights add up for each weight-set (and each
position). Note that since we don't have anything that actually
creates positional weight-sets, the behavior here might not be what we
want in the end, but for the compat weight-sets (no position), we *do*
keep the weights as a properly summing tree.
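As an illustrative sketch (hypothetical names, not the actual CrushWrapper code), re-summing a weight tree so that each bucket's weight is the sum of its children, and applying the same pass per weight-set position, might look like:

```python
# Hypothetical sketch of re-summing a CRUSH-like weight tree; this is
# NOT the real crushtool implementation. Each node is a dict with a
# plain "weight", a positional "weight_set" list, and "children".

def resum_weights(node):
    """Recompute each bucket's weight as the sum of its children.
    Leaves (devices) keep their own weight."""
    children = node.get("children", [])
    if not children:
        return node["weight"]
    node["weight"] = sum(resum_weights(c) for c in children)
    return node["weight"]

def resum_weight_sets(node, positions):
    """Apply the same bottom-up re-summing to each weight-set position."""
    for pos in range(positions):
        def resum(n):
            children = n.get("children", [])
            if not children:
                return n["weight_set"][pos]
            n["weight_set"][pos] = sum(resum(c) for c in children)
            return n["weight_set"][pos]
        resum(node)
```

For the compat (position-less) weight-set the second pass degenerates to the first, which matches the "properly summing tree" behavior described above.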
Sage Weil [Sun, 14 Oct 2018 19:57:59 +0000 (14:57 -0500)]
crushtool: add --set-subtree-class; do not set class via --reclassify-root
Sometimes we don't want the --reclassify-root to set the class of every
device because a small number of them are (correctly) a different class.
Allow both behaviors by adding a new, separate command to set the class
of all devices beneath a point in the hierarchy and do not implicitly do
that relabeling as part of --reclassify-root.
Zack Cerza [Mon, 29 Oct 2018 22:07:27 +0000 (16:07 -0600)]
mgr/dashboard: Replace dashboard service
This splits out the collection of health and log data from the
/api/dashboard/health controller into /api/health/{full,minimal} and
/api/logs/all.
/health/full contains all the data (minus logs) that /dashboard/health
did, whereas /health/minimal contains only what is needed for the health
component to function. /logs/all contains exactly what the logs portion
of /dashboard/health did.
By using /health/minimal, on a vstart cluster we pull ~1.4KB of data
every 5s, where we used to pull ~6KB; those numbers would get larger
with larger clusters. Once we split out log data, that will drop to
~0.4KB.
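The shape of that split can be sketched as follows; the endpoint names come from the commit, but the key selection here is invented purely for illustration:

```python
# Hypothetical sketch of serving a "minimal" subset of the health
# payload. The /health/full vs /health/minimal split mirrors the
# commit; MINIMAL_KEYS is an invented example, not the real key list.

MINIMAL_KEYS = {"health", "mon_status", "osd_map"}  # hypothetical

def health_full(data):
    """Everything /dashboard/health returned, minus the logs."""
    return {k: v for k, v in data.items() if k != "logs"}

def health_minimal(data):
    """Only the keys the health component actually needs."""
    return {k: v for k, v in data.items() if k in MINIMAL_KEYS}
```

The payload savings quoted above come from trimming the dict this way: the minimal view omits the bulky keys entirely rather than sending them and discarding them client-side.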
John Spray [Tue, 7 Aug 2018 14:19:41 +0000 (10:19 -0400)]
mgr: create `volumes` module
This encapsulates and extends ceph_volume_client, providing
similar functionality as a service from ceph-mgr.
We used to call CephFS namespaces "filesystems", and the
ceph_volume_client-created directories "volumes". That
terminology changes in this module: namespaces are now "volumes",
and the directory-based entity is a "subvolume".
External systems can use librados to access the command
interface provided by this module, instead of using
ceph_volume_client directly.
Kefu Chai [Tue, 27 Nov 2018 04:08:28 +0000 (12:08 +0800)]
denc: only shallow_copy large-enough chunk for decoding
if the bl being decoded is not contiguous, shallow_copy() will do a deep
copy under the hood. this introduces internal fragmentation when
decoding small objects in a large non-contiguous bufferlist.
to alleviate this problem, we try to copy less if the object being
decoded is bounded.
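The idea can be sketched like this (hypothetical helper, not the denc code): when the encoded length is known up front, copy only that many bytes out of the fragmented buffer instead of flattening the whole remainder:

```python
# Sketch of the bounded-copy idea described above. Hypothetical code:
# the real implementation works on buffer::list, not byte chunks.

def copy_bounded(chunks, offset, length):
    """Return `length` bytes starting at `offset` from a list of
    non-contiguous byte chunks, copying only what is needed."""
    out = bytearray()
    pos = 0
    for chunk in chunks:
        if len(out) == length:
            break
        start = max(offset - pos, 0)
        if start < len(chunk):
            take = min(len(chunk) - start, length - len(out))
            out += chunk[start:start + take]
        pos += len(chunk)
    return bytes(out)
```

Copying exactly `length` bytes keeps each small decoded object in a right-sized allocation instead of pinning a large flattened copy of the tail.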
Kefu Chai [Tue, 27 Nov 2018 02:47:07 +0000 (10:47 +0800)]
denc: add non-contiguous decode_nohead() for bl,string
bufferlist's denc traits claim need_contiguous=false, so
it should implement all the functions needed to work with
buffer::list::const_iterator. we already have decode(); the missing
piece of the puzzle is decode_nohead().
in this change, decode_nohead(size_t len, bufferlist& v,
buffer::list::const_iterator& p) is implemented.
same applies to basic_string.
ideally, we should allow decoding from buffer::list::iterator as well, but
let's leave that for a future change when it's needed.
Kefu Chai [Mon, 26 Nov 2018 17:47:07 +0000 (01:47 +0800)]
buffer: fix the traits of list::iterator
before this change, the constness of the value, pointer, etc. traits of
list::iterator was wrong. also, because std::iterator is deprecated
in C++17, we need to define the traits manually. so, do this change.
these traits could be potentially used anywhere in the source tree.
but a noteworthy user is is_const_iterator<> in denc.h.
Kefu Chai [Mon, 26 Nov 2018 16:00:43 +0000 (00:00 +0800)]
denc: do not always copy before decoding
before this change, if the caller calls into ::decode_nohead(),
we will try to copy the memory chunk to be decoded before decoding it,
but if the buffer in the buffer list is not the last one, we will *deep*
copy the remaining part in the buffer list into a new contiguous memory
chunk and decode it instead. if we are decoding a map<int, buffer::ptr>
with lots of items in it, and the buffer::ptrs in it are very small,
we will end up suffering from serious internal fragmentation.
we could use the same strategy as decode(), where we compare the
size of the remaining part with CEPH_PAGE_SIZE and only copy it if
it's small enough. this requires that the decoded type support
both variants of decoding: contiguous and non-contiguous.
quite a few MDS types do not support both of them. for instance,
snapid_t only supports contiguous decoding.
so, instead of conditioning on size, in this change, we condition
on the traits::need_contiguous. probably we can condition on both
of them in a follow-up change.
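That trait-based dispatch can be sketched as follows (hypothetical dict-based traits; the real code keys off the denc traits of the decoded type):

```python
# Sketch of conditioning the copy on a need_contiguous trait. Types
# that only support contiguous decoding get one flattened copy; the
# rest decode straight from the fragmented buffer. Hypothetical names.

def decode_nohead(traits, chunks):
    if traits.get("need_contiguous", True):
        flat = b"".join(chunks)            # one deliberate deep copy
        return traits["decode_contig"](flat)
    return traits["decode_chunks"](chunks)  # no copy needed
```

Conditioning on the trait rather than on size keeps types like snapid_t, which only support contiguous decoding, working unchanged.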
but we don't have the templatized version of it. so, if we
plug in a different allocator to basic_string<>, the new string won't
decode with buffer::list::const_iterator. this decode variant is used if
the caller only has a (probably non-contiguous) buffer::list in hand.
in this change, copy(unsigned, char*) is used as an alternative.
Sort through and batch bucket instances so that multiple calls to read the
current bucket info and take the lock can be avoided. For the most trivial
case, when the bucket is already deleted, we exit early with all the stale
instances. When a bucket reshard is in progress we only process the stale
entries with status done; if the bucket is available for locking, we lock
it down and mark the other instances as well.
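A hypothetical sketch of that batching logic (not the real RGW code): group the stale-instance entries per bucket so bucket info is fetched, and the lock taken, once per bucket rather than once per entry:

```python
# Hypothetical illustration of batching stale bucket instances; the
# real code works on RGW bucket instance metadata, not tuples.
from collections import defaultdict

def process_stale_instances(entries, bucket_exists, reshard_in_progress):
    """entries: list of (bucket, instance_id, reshard_status) tuples."""
    by_bucket = defaultdict(list)
    for bucket, instance, status in entries:
        by_bucket[bucket].append((instance, status))

    stale = []
    for bucket, insts in by_bucket.items():
        if not bucket_exists(bucket):
            # trivial case: bucket already deleted, everything is stale
            stale += [i for i, _ in insts]
        elif reshard_in_progress(bucket):
            # only entries whose reshard status is "done" are stale
            stale += [i for i, s in insts if s == "done"]
        else:
            # bucket available for locking: lock once (elided here),
            # then mark the remaining instances
            stale += [i for i, _ in insts]
    return stale
```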
Kefu Chai [Tue, 27 Nov 2018 11:00:15 +0000 (19:00 +0800)]
ceph.in: write bytes to stdout in raw_write()
in python3, sys.stdout.buffer is an io.BufferedWriter, while in python2
`sys.__stdout__` is a plain file. the former only accepts bytes, so if
we send it a str, it complains:
TypeError: a bytes-like object is required, not 'str'
it happens when quitting from the interactive mode of ceph CLI. in that
case, `new_style_command()` returns a tuple of `0, '\n', ''`, where the
second element is a str.
in this change, we always send `bytes` to raw_stdout.
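The fix amounts to normalizing to bytes before touching the raw stream; a minimal sketch (the helper names here are illustrative, not the exact ceph.in code):

```python
# Sketch of always sending bytes to the raw stdout stream, so str
# return values (like the '\n' from new_style_command) don't raise
# TypeError on python3. Helper names are hypothetical.
import sys

def to_bytes(buf):
    """Return `buf` as bytes, encoding str with UTF-8."""
    if isinstance(buf, str):
        return buf.encode('utf-8')
    return bytes(buf)

def raw_write(buf):
    # python3: sys.stdout.buffer (io.BufferedWriter, bytes only);
    # python2: fall back to the plain file object
    stdout = getattr(sys.stdout, 'buffer', sys.stdout)
    stdout.write(to_bytes(buf))
    stdout.flush()
```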
John Spray [Tue, 7 Aug 2018 13:48:56 +0000 (09:48 -0400)]
mgr: cleaner constructor for CommandResult
This is just to reduce the visual noise of all the
CommandResult("") calls. Nothing really uses the optional
tag part, but it's conceivably useful, so let's not
rip it out just yet.