James Page [Thu, 29 Nov 2018 09:47:07 +0000 (09:47 +0000)]
Correct usage of collections.abc
Some classes, such as OrderedDict, should still be imported directly
from collections; only Iterable and Callable (in the context of the
ceph codebase) are found in collections.abc.
The current code works due to the fallback support for Python 2.
Li Wang [Thu, 29 Nov 2018 09:12:10 +0000 (09:12 +0000)]
tools/rados: allow reuse object for write test
Currently, the rados bench write test always creates new objects
for testing. Object creation incurs non-negligible metadata
overhead, which especially hurts small-write performance.
This patch allows objects to be reused for write testing.
Signed-off-by: Li Wang <laurence.liwang@gmail.com>
Kefu Chai [Thu, 29 Nov 2018 07:44:55 +0000 (15:44 +0800)]
mgr: don't write to output if EOPNOTSUPP
If process_pg_map_command() fails to fulfill the request, we should keep
odata intact and let the module take care of it.
Before this change, we always wrote to odata even if
process_pg_map_command() returned -EOPNOTSUPP, which left unnecessary
leftovers in the output, like pg_info and pg_ready.
After this change, we don't touch odata if process_pg_map_command()
returns -EOPNOTSUPP, and odata will be filled with whatever the python
module returns.
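A minimal sketch of the pattern, with stand-in names (bufferlist_stub and
process_pg_map_command_stub are hypothetical; the real code uses
ceph::bufferlist and the actual process_pg_map_command()): stage the output
locally and only publish it when the command was actually handled.

    #include <cerrno>
    #include <string>
    #include <utility>

    // stand-in for ceph::bufferlist, just to keep the sketch self-contained
    struct bufferlist_stub { std::string data; };

    // hypothetical handler: returns -EOPNOTSUPP for commands it does not implement
    int process_pg_map_command_stub(const std::string& prefix, bufferlist_stub* out)
    {
      if (prefix == "pg stat") { out->data = "42 pgs: ..."; return 0; }
      return -EOPNOTSUPP;   // unknown command: *out is left untouched
    }

    int handle_command(const std::string& prefix, bufferlist_stub& odata)
    {
      bufferlist_stub staged;
      int r = process_pg_map_command_stub(prefix, &staged);
      if (r == -EOPNOTSUPP) {
        return r;                  // odata untouched; a python module fills it later
      }
      odata = std::move(staged);   // publish output only for handled commands
      return r;
    }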
Kefu Chai [Thu, 29 Nov 2018 06:35:01 +0000 (14:35 +0800)]
qa/suites/rados/upgrade: set require-osd-release to nautilus
* add qa/releases/nautilus.yaml so it can be reused.
* use releases/nautilus.yaml in luminous-x upgrade test, so
test_librbd_python.sh is able to use the feature introduced in
nautilus.
Casey Bodley [Wed, 28 Nov 2018 18:45:54 +0000 (13:45 -0500)]
rgw: data sync accepts ERR_PRECONDITION_FAILED on remove_object()
Sync of deletes uses an If-Unmodified-Since precondition, but does not
handle the corresponding ERR_PRECONDITION_FAILED error. Treating this as
a failure means that we keep retrying a delete which will never
succeed. Break this loop by treating ERR_PRECONDITION_FAILED as a
success.
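A minimal sketch of the shape of the fix, using hypothetical stand-ins
(remove_object_stub, ERR_PRECONDITION_FAILED_STUB) rather than rgw's real
error plumbing and data-sync code:

    // hypothetical stand-ins; the real code uses rgw's ERR_PRECONDITION_FAILED
    // error code and the data-sync remove_object() path
    static constexpr int ERR_PRECONDITION_FAILED_STUB = 412;

    int remove_object_stub(bool overwritten_since_delete_was_logged)
    {
      // the If-Unmodified-Since delete is rejected once the object has been
      // overwritten after the delete was logged
      return overwritten_since_delete_was_logged ? -ERR_PRECONDITION_FAILED_STUB : 0;
    }

    int sync_delete(bool overwritten_since_delete_was_logged)
    {
      int r = remove_object_stub(overwritten_since_delete_was_logged);
      if (r == -ERR_PRECONDITION_FAILED_STUB) {
        r = 0;   // retrying can never succeed, so record the delete as complete
      }
      return r;
    }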
Sage Weil [Wed, 28 Nov 2018 03:32:04 +0000 (21:32 -0600)]
Merge PR #24502 into master
* refs/pull/24502/head:
crushtool: implement --rebuild-class-roots command
crushtool: make --reweight re-sum choose_args weight-sets too
crushtool: --reweight should only reweight nonshadow roots
crush/CrushWrapper: reclassify: use default parent for created buckets
crush/CrushWrapper: reclassify: handle to-be-created buckets that we need twice
test/cli/crushtool/reclassify: add second gabe test case
crushtool: add --set-subtree-class; do not set class via --reclassify-root
test/cli/crushtool/reclassify: add reclassify test cases
doc/rados/operations/crush*: document reclassify
doc/rados/operations/crush: remove instructions for separate crush trees for ssd
crushtool: add --compare command
crushtool: implement --reclassify
crush/CrushCompiler: fix id scan to include class ids
This simply rebuilds the class roots. Normally this should create no
change in the map, since whatever was making changes to the map before
should have rebuilt the shadow roots at that point.
Sage Weil [Fri, 26 Oct 2018 14:32:27 +0000 (09:32 -0500)]
crushtool: make --reweight re-sum choose_args weight-sets too
This ensures that the weights add up for each weight-set (and each
position). Note that since we don't have anything that actually
creates positional weight-sets, the behavior here might not be what we
want in the end, but for the compat weight-sets (no position), we *do*
keep the weights as a properly summing tree.
Sage Weil [Sun, 14 Oct 2018 19:57:59 +0000 (14:57 -0500)]
crushtool: add --set-subtree-class; do not set class via --reclassify-root
Sometimes we don't want --reclassify-root to set the class of every
device, because a small number of them are (correctly) of a different class.
Allow both behaviors by adding a new, separate command (--set-subtree-class)
that sets the class of all devices beneath a point in the hierarchy, and no
longer do that relabeling implicitly as part of --reclassify-root.
Zack Cerza [Mon, 29 Oct 2018 22:07:27 +0000 (16:07 -0600)]
mgr/dashboard: Replace dashboard service
This splits out the collection of health and log data from the
/api/dashboard/health controller into /api/health/{full,minimal} and
/api/logs/all.
/health/full contains all the data (minus logs) that /dashboard/health
did, whereas /health/minimal contains only what is needed for the health
component to function. /logs/all contains exactly what the logs portion
of /dashboard/health did.
By using /health/minimal, on a vstart cluster we pull ~1.4KB of data
every 5s, whereas we used to pull ~6KB; those numbers would get larger
with larger clusters. Once we split out log data, that will drop to
~0.4KB.
John Spray [Tue, 7 Aug 2018 14:19:41 +0000 (10:19 -0400)]
mgr: create `volumes` module
This encapsulates and extends ceph_volume_client, providing
similar functionality as a service from ceph-mgr.
We used to call CephFS namespaces "filesystems", and the
ceph_volume_client-created directories "volumes". That
terminology changes in this module: namespaces are now "volumes",
and the directory-based entity is a "subvolume".
External systems can use librados to access the command
interface provided by this module, instead of using
ceph_volume_client directly.
Casey Bodley [Thu, 30 Nov 2017 23:40:06 +0000 (18:40 -0500)]
rgw: add rgw_rados_operate() to wrap optionally-async operate
Calls IoCtx::operate() when given an empty optional_yield, or
librados::async_operate() when non-empty. Calling async_operate()
with a yield_context behaves just like a synchronous call to
IoCtx::operate(), except that the stackful coroutine is suspended and
resumed on completion instead of blocking the calling thread.
Casey Bodley [Sun, 12 Nov 2017 20:49:45 +0000 (15:49 -0500)]
common: add optional_yield wrapper
Adds a wrapper type that may or may not contain a yield_context
representing a boost::asio stackful coroutine, along with a
'null_yield' token to designate an empty one.
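A minimal sketch of the idea, assuming boost::asio stackful coroutines;
optional_yield_sketch and operate_maybe_async() are illustrative names, not
Ceph's actual definitions:

    #include <boost/asio/spawn.hpp>   // boost::asio::yield_context

    // Either wraps a yield_context from a stackful coroutine, or is empty,
    // in which case callers fall back to blocking calls.
    class optional_yield_sketch {
      boost::asio::yield_context* y = nullptr;
     public:
      optional_yield_sketch() = default;                       // empty: synchronous path
      explicit optional_yield_sketch(boost::asio::yield_context& yc) : y(&yc) {}
      explicit operator bool() const { return y != nullptr; }
      boost::asio::yield_context& get_yield_context() const { return *y; }
    };

    inline const optional_yield_sketch null_yield_sketch{};    // the empty token

    // In the spirit of rgw_rados_operate(): suspend the coroutine when a yield
    // context is available, otherwise block the calling thread.
    template <typename SyncOp, typename AsyncOp>
    int operate_maybe_async(optional_yield_sketch y, SyncOp sync_op, AsyncOp async_op)
    {
      if (y) {
        return async_op(y.get_yield_context());  // e.g. librados::async_operate(...)
      }
      return sync_op();                           // e.g. IoCtx::operate(...)
    }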
Kefu Chai [Tue, 27 Nov 2018 04:08:28 +0000 (12:08 +0800)]
denc: only shallow_copy large-enough chunk for decoding
If the bl being decoded is not contiguous, shallow_copy() will do a deep
copy under the hood. This introduces internal fragmentation when
decoding small objects in a large non-contiguous bufferlist.
To alleviate this problem, we try to copy less if the object being
decoded is bounded.
Kefu Chai [Tue, 27 Nov 2018 02:47:07 +0000 (10:47 +0800)]
denc: add non-contiguous decode_nohead() for bl,string
bufferlist's denc traits claim need_contiguous=false, so
bufferlist should implement all the functions needed to work with
buffer::list::const_iterator. We already have decode(); the missing
piece is decode_nohead().
In this change, decode_nohead(size_t len, bufferlist& v,
buffer::list::const_iterator& p) is implemented.
The same applies to basic_string.
Ideally, we should allow decoding from buffer::list::iterator as well,
but let's leave that for another change in the future, when it's needed.
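A rough sketch of the shape of the non-contiguous overload, assuming
buffer::list::const_iterator::copy(len, list&) and a build inside the ceph
tree; this is illustrative, not the actual denc.h code:

    #include <cstddef>
    #include "include/buffer.h"   // ceph::buffer::list

    // Consume exactly `len` bytes from a possibly non-contiguous const_iterator
    // into `v`, without flattening the remainder of the source list first.
    inline void decode_nohead_sketch(size_t len, ceph::buffer::list& v,
                                     ceph::buffer::list::const_iterator& p)
    {
      v.clear();
      p.copy(static_cast<unsigned>(len), v);   // appends the next len bytes to v
    }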
Kefu Chai [Mon, 26 Nov 2018 17:47:07 +0000 (01:47 +0800)]
buffer: fix the traits of list::iterator
Before this change, the constness of the value, pointer, etc. traits of
list::iterator was wrong. Also, because std::iterator is deprecated
in C++17, we need to define the traits manually; hence this change.
These traits could potentially be used anywhere in the source tree,
but a noteworthy user is is_const_iterator<> in denc.h.
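For illustration, the general C++17 pattern involved, shown on a generic
iterator (this is not Ceph's buffer.h): spell out the member typedefs instead
of deriving from the deprecated std::iterator, keeping value_type non-const
and putting the constness on pointer and reference:

    #include <cstddef>
    #include <iterator>
    #include <type_traits>

    template <bool is_const>
    class char_span_iterator {
     public:
      using iterator_category = std::forward_iterator_tag;
      using value_type        = char;   // value_type itself is never const
      using difference_type   = std::ptrdiff_t;
      using pointer           = std::conditional_t<is_const, const char*, char*>;
      using reference         = std::conditional_t<is_const, const char&, char&>;

      explicit char_span_iterator(pointer p = nullptr) : p(p) {}
      reference operator*() const { return *p; }
      char_span_iterator& operator++() { ++p; return *this; }
      char_span_iterator operator++(int) { auto t = *this; ++p; return t; }
      bool operator==(const char_span_iterator& o) const { return p == o.p; }
      bool operator!=(const char_span_iterator& o) const { return p != o.p; }

     private:
      pointer p;
    };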
Kefu Chai [Mon, 26 Nov 2018 16:00:43 +0000 (00:00 +0800)]
denc: do not always copy before decoding
Before this change, if the caller called into ::decode_nohead(),
we would try to copy the memory chunk to be decoded before decoding it;
but if the buffer in the buffer list was not the last one, we would *deep*
copy the remaining part of the buffer list into a new contiguous memory
chunk and decode that instead. If we are decoding a map<int, buffer::ptr>
with lots of items in it, and the buffer::ptrs in it are very small,
we end up suffering from serious internal fragmentation.
We could use the same strategy as decode(), where we compare the
size of the remaining part with CEPH_PAGE_SIZE and only copy it if
it's small enough. This requires that the decoded type supports
both variants of decoding: contiguous and non-contiguous.
Quite a few MDS types do not support both of them; for instance,
snapid_t only supports contiguous decoding.
So, instead of conditioning on size, in this change we condition
on traits::need_contiguous. We can probably condition on both
of them in a follow-up change.
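A simplified sketch of the dispatch, with made-up trait and helper names
(traits_sketch, segmented_input, flatten, and the decode hooks are all
hypothetical); Ceph's real denc_traits and bufferlist machinery are richer:

    #include <cstddef>
    #include <type_traits>

    template <typename T>
    struct traits_sketch { static constexpr bool need_contiguous = true; };

    struct contiguous_view { const char* data = nullptr; std::size_t len = 0; };
    struct segmented_input { /* wraps a non-contiguous chain of buffers */ };

    // hypothetical decode hooks a type would provide
    template <typename T> void decode_from_contiguous(T&, const contiguous_view&) {}
    template <typename T> void decode_from_segments(T&, const segmented_input&) {}

    // hypothetical helper that deep-copies the chain into one flat buffer
    inline contiguous_view flatten(const segmented_input&) { return {}; }

    // Dispatch on the trait rather than on the remaining size: only types that
    // truly need contiguous memory (e.g. snapid_t) pay for the flattening copy;
    // everything else decodes straight from the segmented input.
    template <typename T>
    void decode_nohead_sketch(T& value, const segmented_input& in)
    {
      if constexpr (traits_sketch<T>::need_contiguous) {
        const contiguous_view flat = flatten(in);  // deep copy only when required
        decode_from_contiguous(value, flat);
      } else {
        decode_from_segments(value, in);           // stays shallow, no fragmentation
      }
    }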
But we don't have the templatized version of it. So, if we plug in a
different allocator to basic_string<>, the new string won't decode with
buffer::list::const_iterator. This decode variant is used if the caller
only has a (probably non-contiguous) buffer::list in hand.
In this change, copy(unsigned, char*) is used as an alternative.
Sort through and batch bucket instances so that multiple calls to read the
current bucket info and take the bucket lock can be avoided. For the most
trivial case, when the bucket is already deleted, we exit early with all the
stale instances. When a bucket reshard is in progress, we only process the
stale entries with status done; if the bucket is available for locking, then
we lock it and mark the other instances as well.