Yan, Zheng [Thu, 13 Nov 2014 05:38:35 +0000 (13:38 +0800)]
mds: don't overwrite reply's snapbl
set_trace_dist() already updates the reply's snapbl, so don't overwrite it.
For a MKSNAP request we only need to set mdr->tracei; set_trace_dist() will
then set the reply's snapbl.
Reviewed-by: Sage Weil <sage@redhat.com>
Signed-off-by: Yan, Zheng <zyan@redhat.com>
Sage Weil [Thu, 13 Nov 2014 18:59:22 +0000 (10:59 -0800)]
crush/CrushWrapper: fix detach_bucket
In commit 9850227d2f0ca2f692a154de2c14a0a08e751f08 we replaced the call that
changed the weight of all instances of the item with one that explicitly
changes it in the parent bucket, but parent_id may not be valid at the
call site. Move the call into the conditional block to fix this.
Fixes: #10095
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Thu, 13 Nov 2014 01:11:10 +0000 (17:11 -0800)]
osd/OSD: use OSDMap helper to determine if we are correct op target
Use the new helper. This fixes our behavior for EC pools, where targeting
a different shard is not correct, while for replicated pools it may be. In
the EC case, the mistargeted op was left hanging indefinitely in the
OpTracker because the pgid exists, but as a different shard.
Fixes: #9835
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Thu, 13 Nov 2014 01:04:35 +0000 (17:04 -0800)]
osd/OSDMap: add osd_is_valid_op_target()
Helper to check whether an osd is a given op target for a pg. This
assumes that for EC we always send ops to the primary, while for
replicated we may target any replica.
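The rule described above can be sketched as follows. This is a simplified illustration, not the actual OSDMap code: the function name is_valid_op_target, its parameters, and the use of a plain vector for the acting set are all assumptions made for the example.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified stand-in for the helper described above.
// For an erasure-coded PG only the primary is a valid op target;
// for a replicated PG any member of the acting set is accepted.
bool is_valid_op_target(bool is_erasure_coded, int primary,
                        const std::vector<int>& acting, int osd) {
  if (osd == primary)
    return true;
  if (is_erasure_coded)
    return false;  // EC: a non-primary shard must not accept the op
  for (int member : acting) {
    if (member == osd)
      return true;  // replicated: any replica may be targeted
  }
  return false;
}
```

With an acting set of {1, 2, 3} and primary 1, osd 2 is accepted for a replicated pool but rejected for an EC pool.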
Josh Durgin [Wed, 12 Nov 2014 02:16:02 +0000 (18:16 -0800)]
qa: allow small allocation diffs for exported rbds
The local filesystem may behave slightly differently. This isn't
foolproof, but seems to be reliable enough on rhel7 rootfs, where
exact comparison was failing.
The check for 'nextkey < last_disk_key' does not make much sense, since
last_disk_key is an empty string that is never set beforehand. Comparing a
decoded string to be less than an empty string will never be true.
Since this if() isn't part of a loop, last_disk_key is only set
once and there is no other consumer: revert this dead code.
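A minimal sketch of why such a check is dead code, assuming the keys are std::string values compared with operator<. The helper name sorts_before is hypothetical and only wraps the comparison.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: true if `key` sorts before `bound` in plain
// lexicographic order, which is what std::string's operator< implements.
bool sorts_before(const std::string& key, const std::string& bound) {
  return key < bound;
}
```

The empty string is the smallest value in this ordering, so sorts_before(key, "") is false for every key, including the empty one.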
Danny Al-Gaaf [Thu, 30 Oct 2014 02:14:41 +0000 (03:14 +0100)]
rados_sync.cc: fix xattr_diff() for the only_in_b checks
In the checks that build up only_in_b, the wrong const_iterator x is
built. It should compare rhs->xattrs with the xattrs entries, not
rhs->xattrs with itself.
Fix for:
CID 716957 (#1 of 1): Invalid iterator comparison (MISMATCHED_ITERATOR)
mismatched_comparison: Comparing x from rhs->xattrs to this->xattrs.end()
from this->xattrs.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
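The pattern Coverity expects can be sketched like this. It is an illustration only, not the rados_sync.cc code: the XattrMap alias and the free function only_in_b are assumptions. The point is that the iterator and the end() it is compared against come from the same container.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for an xattr collection: name -> value.
using XattrMap = std::map<std::string, std::string>;

// Returns the names that are present in `b` but missing from `a`.
std::vector<std::string> only_in_b(const XattrMap& a, const XattrMap& b) {
  std::vector<std::string> result;
  for (const auto& entry : b) {
    // Look the name up in `a` and compare against a.end(): the iterator
    // and the end() sentinel must belong to the same container.
    XattrMap::const_iterator x = a.find(entry.first);
    if (x == a.end())
      result.push_back(entry.first);
  }
  return result;
}
```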
CID 717177 (#2-1 of 3): Uncaught exception (UNCAUGHT_EXCEPT)
root_function: In function main(int, char const **) an exception of
type ceph::FailedAssertion is thrown and never caught.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Rongze Zhu [Fri, 10 Oct 2014 11:18:00 +0000 (19:18 +0800)]
crush: fix incorrect use of adjust_item_weight method
The adjust_item_weight method adjusts every bucket that contains the
item. If osd.0 is in both host=fake01 and host=fake02 and we execute
"ceph osd crush osd.0 10 host=fake01", it adjusts not only fake01's
weight but also fake02's weight.
This patch adds the adjust_item_weightf_in_loc method and fixes the
remove_item, _remove_item_under, update_item, insert_item, and
detach_bucket methods.
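The difference between the two behaviors can be shown with a toy model. This is not the CrushWrapper API: the Buckets type and the functions adjust_everywhere and adjust_in_bucket are hypothetical names for the sketch, and real CRUSH buckets are considerably more involved.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical toy model: bucket name -> (item name -> weight).
using Buckets = std::map<std::string, std::map<std::string, int>>;

// Adjusts the item in every bucket that contains it (the behavior the
// commit describes as incorrect here). Returns the number of buckets changed.
int adjust_everywhere(Buckets& buckets, const std::string& item, int weight) {
  int changed = 0;
  for (auto& bucket : buckets) {
    auto it = bucket.second.find(item);
    if (it != bucket.second.end()) {
      it->second = weight;
      ++changed;
    }
  }
  return changed;
}

// Adjusts the item only in the named bucket (the location-scoped behavior).
// Returns 1 if the item was found there, 0 otherwise.
int adjust_in_bucket(Buckets& buckets, const std::string& bucket_name,
                     const std::string& item, int weight) {
  auto b = buckets.find(bucket_name);
  if (b == buckets.end())
    return 0;
  auto it = b->second.find(item);
  if (it == b->second.end())
    return 0;
  it->second = weight;
  return 1;
}
```

With osd.0 present in both fake01 and fake02, adjust_in_bucket(buckets, "fake01", "osd.0", 10) leaves the weight under fake02 untouched, while adjust_everywhere changes both.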
Introduce ceph_erasure_code_non_regression to check and compare how an
erasure code plugin encodes and decodes content with a given set of
parameters. For instance:
Will create an encoded object (--create) and store it into a directory
along with the chunks, one chunk per file. The directory name is derived
from the parameters. The content of the object is a random pattern of 31
bytes repeated to fill the object size specified with --stripe-width.
The check function (--check) reads the object back from the file,
encodes it and compares the result with the content of the chunks read
from the files. It also attempts to recover from one or two erasures.
Chunks encoded by a given version of Ceph are expected to be encoded
exactly in the same way by all Ceph versions going forward.
Loic Dachary [Sun, 9 Nov 2014 02:23:06 +0000 (03:23 +0100)]
erasure-code: document pool operations
A short introduction for the first-time user of an erasure coded pool.
It includes a reminder of how it relates to cache tiering and links to
define new profiles with an example.
There were examples in the developer documentation, but the operator
expects to find such a guide in the rados operations chapter.
Loic Dachary [Wed, 22 Oct 2014 03:05:45 +0000 (20:05 -0700)]
tests: use kill -0 to check process existence
When killing a daemon, instead of using kill -9 to check that the process
was terminated, use kill -0. Should the pid of the process be reused
immediately after, it would be wrong to kill the new process. In the worst
case, the kill_daemon function returns before the process is
confirmed to be killed, but this is not treated as an error and is
unlikely to cause any problem.
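For reference, the same existence check is available from C++ through the POSIX kill() call with signal 0, which performs error checking without sending a signal. The sketch below is an illustration only: the function name process_exists is hypothetical, and the test scripts themselves use the shell form.

```cpp
#include <cassert>
#include <cerrno>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: true if a process with this pid currently exists.
// Signal 0 makes kill() run its existence and permission checks without
// delivering anything to the target process.
bool process_exists(pid_t pid) {
  if (kill(pid, 0) == 0)
    return true;
  // EPERM means the process exists but we are not allowed to signal it;
  // ESRCH means there is no such process.
  return errno == EPERM;
}
```

As with the shell form, a true result only says that some process holds that pid at the moment of the call, so a reused pid is still reported as existing.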
Loic Dachary [Sat, 18 Oct 2014 22:41:40 +0000 (15:41 -0700)]
tests: remove vstart_wrapped_tests.sh
Listing tests to be run in a single script does not take advantage of
parallel runs in make.
The vstart_wrapper.sh script is reworked to be less specialized: it lets
the caller decide which daemons to run via CEPH_START and does not
enforce the number of daemons of each type. It no longer uses stop.sh, to
avoid killing the osd/mon/mds that are unrelated to the tests.
John Spray [Fri, 7 Nov 2014 14:20:04 +0000 (14:20 +0000)]
tools: error handling on journal import/export
Actually propagate nonzero return codes! Also
add checks on return values of I/O functions so
that someone doesn't think they've successfully
exported a journal if they haven't, and some
validation of the header pointers during import
so that people find out with a nice error
instead of an assertion if something is up.
Signed-off-by: John Spray <john.spray@redhat.com>
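The general pattern of checking I/O return values and passing the error up can be sketched as below. This is not the journal tool's code: write_all and export_blob are hypothetical names, and the sketch assumes a raw POSIX file descriptor and negative errno values as error codes.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <unistd.h>

// Writes the whole buffer to fd. Returns 0 on success or a negative
// errno value on failure, so the caller can tell that the data is incomplete.
int write_all(int fd, const char* buf, size_t len) {
  while (len > 0) {
    ssize_t n = ::write(fd, buf, len);
    if (n < 0) {
      if (errno == EINTR)
        continue;  // interrupted before writing anything: retry
      return -errno;
    }
    if (n == 0)
      return -EIO;  // no progress: report an error instead of looping
    buf += n;
    len -= static_cast<size_t>(n);
  }
  return 0;
}

// Hypothetical export step: stops at the first failure and propagates
// the error code instead of reporting success.
int export_blob(int fd, const char* header, size_t header_len,
                const char* body, size_t body_len) {
  int r = write_all(fd, header, header_len);
  if (r < 0)
    return r;
  return write_all(fd, body, body_len);
}
```

A caller would typically turn a nonzero result from export_blob into a nonzero process exit status.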
Fix compile issue in the position value cout.
Greg Farnum [Thu, 6 Nov 2014 19:10:29 +0000 (11:10 -0800)]
MDS: clean up internal MDRequests the standard way
All cleanup is now routed through respond_to_request(),
which invokes the internal_op_finish Context*, then does
mdcache->request_finish(). This is easier to reason about,
and indeed fixes a bug (I was not cleaning up locks
following flush). Use the MDSContinuation to facilitate
this in scrub's case.
Greg Farnum [Fri, 29 Aug 2014 06:03:59 +0000 (23:03 -0700)]
MDCache: make scrub_dentry schedulable and reentrant
Rather than assuming that any necessary inodes are in the cache, split up
MDCache::scrub_dentry into setup and work phases. Add an internal_op_finisher()
to MDRequest. Dispatch any CEPH_MDS_OP_VALIDATE internal operations to
scrub_dentry_work(). Taken together, these make everything work properly when
path_traverse() (by way of rdlock_path_pin_ref()) needs to go to disk before
satisfying the lookup.
Greg Farnum [Wed, 27 Aug 2014 21:11:26 +0000 (14:11 -0700)]
MDCache: "handle" request_forward on internal ops
For now, just return -EXDEV ("Cross-device link") on internal ops that
require forwarding, as forwarding internal ops will require a great deal more
infrastructure. But push the issue down to this level instead of worrying
about it in path_traverse, and consider the possibility that the MDRequest
might not have a client_request that it's wrapped around.
Greg Farnum [Thu, 21 Aug 2014 03:12:00 +0000 (20:12 -0700)]
Server: rename reply_request -> reply_client_request; make it private
The generic reply_request(MDRequest, int) is now the only caller. It's still
just building an MClientRequest to pass along, but we can change it a lot more
easily now to support responding to non-client requests.