Blaine Gardner [Mon, 17 Nov 2014 23:17:15 +0000 (17:17 -0600)]
Fix bug #10096 (ceph-disk umount race condition)
Bug: http://tracker.ceph.com/issues/10096
Brief: Unmounting the temporary mount point failed because a file was 'busy'.
A race condition exists, but its root cause could not easily be determined
because debugging attempts altered the timing.
Solution: Implement a retry with incremental backoff as a workaround. This
workaround is acceptable because (1) finding the root cause would take a
significant amount of time and effort, and (2) the workaround is a more
general fix for any process that might cause the same behavior.
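A minimal sketch of the retry-with-incremental-backoff idea, in Python since ceph-disk is a Python tool. The function name `unmount_with_retry`, the `UnmountError` class, the retry count and the delays are illustrative assumptions, not ceph-disk's actual code. `run` and `sleep` are injectable so the backoff can be exercised without a real mount point.

```python
import subprocess
import time


class UnmountError(Exception):
    """Hypothetical error raised when the mount point stays busy after every retry."""


def unmount_with_retry(path, retries=3, base_delay=0.5, step=1.0,
                       run=subprocess.check_call, sleep=time.sleep):
    """Unmount `path`, sleeping a little longer after each failed attempt.

    Returns the number of retries that were needed before umount succeeded.
    """
    for attempt in range(retries + 1):
        try:
            run(['umount', '--', path])
            return attempt
        except subprocess.CalledProcessError as e:
            if attempt == retries:
                raise UnmountError('unable to unmount %s: %s' % (path, e))
            # Incremental backoff: wait 0.5s, then 1.5s, then 2.5s, ...
            sleep(base_delay + attempt * step)
```

Retrying a bounded number of times keeps a genuinely stuck mount from hanging ceph-disk forever, while still riding out a transient "busy" window.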
Adam Spiers [Sun, 16 Nov 2014 20:52:36 +0000 (15:52 -0500)]
doc: fix typos in diagram for incomplete write
In this example, where the write of v2 of the object is interrupted, OSD2
would never have any version of the D1 chunk; it only has the old v1
version of the D2 chunk.
Loic Dachary [Fri, 14 Nov 2014 00:16:10 +0000 (01:16 +0100)]
common: do not omit shard when ghobject NO_GEN is set
Do not suppress the display of shard_id when generation is NO_GEN.
The JSON representation of erasure-coded objects used by
ceph_objectstore_tool needs the shard_id to find the file containing the
chunk. Minimal testing is added to ceph_objectstore_tool.py.
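The change amounts to checking generation and shard_id independently instead of hiding both behind the generation check. A Python sketch of that dump logic follows; the real code is C++, and the dict layout, the function name and the sentinel values standing in for NO_GEN and "no shard" are assumptions.

```python
# Assumed sentinels standing in for the C++ NO_GEN and "no shard" values.
NO_GEN = 2**64 - 1
NO_SHARD = -1


def dump_ghobject(oid, generation=NO_GEN, shard_id=NO_SHARD):
    """Build a JSON-ready dict for an object.

    generation and shard_id are checked independently, so an erasure-coded
    chunk keeps its shard_id even when generation is NO_GEN.
    """
    d = {"oid": oid}
    if generation != NO_GEN:
        d["generation"] = generation
    if shard_id != NO_SHARD:
        d["shard_id"] = shard_id
    return d
```

For example, `dump_ghobject("obj1", shard_id=2)` yields `{"oid": "obj1", "shard_id": 2}`, which still tells ceph_objectstore_tool which shard file to open.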
Loic Dachary [Thu, 13 Nov 2014 16:32:14 +0000 (17:32 +0100)]
tests: ceph_objectstore_tool.py replace stop.sh with init-ceph
stop.sh stops all ceph-* processes. Use the init-ceph script instead to
selectively kill only the daemons run by the vstart.sh cluster used for
ceph_objectstore_tool.
Loic Dachary [Thu, 13 Nov 2014 16:27:01 +0000 (17:27 +0100)]
tests: ceph_objectstore_tool.py run faster by default
By default, use only a small number of objects to speed up the tests. If
the argument "big" is given, use a large number of objects, as that may
help find more problems.
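A minimal sketch of the "small by default, big on request" switch. The function name and both object counts are illustrative assumptions; the commit does not state the script's actual values.

```python
import sys

# Illustrative counts only, not the values used by ceph_objectstore_tool.py.
SMALL_OBJECT_COUNT = 10
BIG_OBJECT_COUNT = 1000


def object_count(argv):
    """Return how many objects the test run should create.

    A small count keeps the default run fast; passing "big" as the first
    argument trades speed for broader coverage.
    """
    if len(argv) > 1 and argv[1] == "big":
        return BIG_OBJECT_COUNT
    return SMALL_OBJECT_COUNT


if __name__ == "__main__":
    print(object_count(sys.argv))
```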
Loic Dachary [Thu, 13 Nov 2014 16:21:48 +0000 (17:21 +0100)]
tests: ceph_objectstore_tool.py run mon and osd on specific port
By default vstart.sh runs MDS daemons, but they are not needed for the
tests, so run only mon and osd instead. Instead of using the default
vstart.sh port, which may conflict with an already running vstart.sh, set
CEPH_PORT=7400, which is not used by any other test run by make check.
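A sketch of launching the test cluster with a pinned port. Only CEPH_PORT=7400 comes from the commit message; the vstart.sh options (`-n`, `-l`, and the `mon osd` arguments that skip MDS) are assumptions to check against vstart.sh's usage text, and the helper names are hypothetical.

```python
import os
import subprocess

# Fixed port so the test cluster does not collide with a developer's own
# vstart.sh cluster (value taken from the commit message).
CEPH_PORT = "7400"


def vstart_command():
    # Assumed invocation: new cluster (-n) on localhost (-l), starting only
    # mon and osd daemons. Verify these options before relying on them.
    return ["./vstart.sh", "-n", "-l", "mon", "osd"]


def vstart_env(base=None):
    """Copy the environment and pin CEPH_PORT for the test cluster."""
    env = dict(os.environ if base is None else base)
    env["CEPH_PORT"] = CEPH_PORT
    return env


def start_test_cluster(run=subprocess.call):
    """Start the cluster; `run` is injectable for testing."""
    return run(vstart_command(), env=vstart_env())
```

Passing the port through the child's environment, rather than exporting it globally, leaves the caller's own shell settings untouched.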
Sébastien Han [Thu, 13 Nov 2014 18:11:36 +0000 (19:11 +0100)]
Improve readability of the exception
The error messages were not very clear from a non-programmer's
perspective. In the context of OpenStack, all the drivers fall back to
the exceptions provided by the rados library, so clearer error messages
will help when debugging a misconfigured environment.
Signed-off-by: Sébastien Han <sebastien.han@enovance.com>
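A sketch of what a more readable error can look like: state what was attempted, then the plain-language errno description and its symbolic name. The class names, the mapping and `readable_error` are hypothetical illustrations, not the rados Python binding's actual API.

```python
import errno
import os


class RadosError(Exception):
    """Hypothetical base class for readable rados errors."""


class ObjectNotFound(RadosError):
    pass


class PermissionDenied(RadosError):
    pass


# Hypothetical mapping; only a couple of errno values are shown.
_EXCEPTIONS = {
    errno.ENOENT: ObjectNotFound,
    errno.EPERM: PermissionDenied,
}


def readable_error(ret, action):
    """Turn a negative errno return code into an exception whose message
    says what was attempted and what went wrong in plain words."""
    err = abs(ret)
    name = errno.errorcode.get(err, "errno %d" % err)
    cls = _EXCEPTIONS.get(err, RadosError)
    return cls("%s failed: %s [%s]" % (action, os.strerror(err), name))
```

For example, `readable_error(-errno.ENOENT, "Opening pool 'volumes'")` produces an `ObjectNotFound` whose message begins with "Opening pool 'volumes' failed" and ends with "[ENOENT]", which an operator can act on without reading the driver code.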
Yan, Zheng [Thu, 13 Nov 2014 05:38:35 +0000 (13:38 +0800)]
mds: don't overwrite reply's snapbl
set_trace_dist() updates the reply's snapbl, so don't overwrite it. For an
MKSNAP request we only need to set mdr->tracei; set_trace_dist() will then
set the reply's snapbl.
Reviewed-by: Sage Weil <sage@redhat.com>
Signed-off-by: Yan, Zheng <zyan@redhat.com>
Sage Weil [Thu, 13 Nov 2014 18:59:22 +0000 (10:59 -0800)]
crush/CrushWrapper: fix detach_bucket
In commit 9850227d2f0ca2f692a154de2c14a0a08e751f08 we replaced the call
that changed the weight of all instances of an item with one that
explicitly changes it in the parent bucket, but parent_id may not be valid
at the call site. Move the call into the conditional block to fix this.
Fixes: #10095
Signed-off-by: Sage Weil <sage@redhat.com>
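The shape of the fix, touching the parent bucket only after confirming a parent was actually found, can be sketched with a toy model in which buckets are dicts of child id to weight. The data model and `detach_item` are hypothetical stand-ins for CrushWrapper's C++ structures, and the sketch only removes the item from its direct parent.

```python
def detach_item(buckets, item):
    """Remove `item` from whichever bucket holds it.

    `buckets` maps bucket id -> {child id: weight}. Returns
    (parent_id, weight), or (None, 0) when the item has no parent.
    """
    parent_id = None
    for bucket_id, children in buckets.items():
        if item in children:
            parent_id = bucket_id
            break

    if parent_id is not None:
        # Only modify the parent once we know parent_id is valid; doing
        # this outside the check is the kind of bug described above.
        weight = buckets[parent_id].pop(item)
        return parent_id, weight
    return None, 0
```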
Sage Weil [Thu, 13 Nov 2014 01:11:10 +0000 (17:11 -0800)]
osd/OSD: use OSDMap helper to determine if we are correct op target
Use the new helper. This fixes our behavior for EC pools, where targeting
a different shard is not correct (for replicated pools it may be). In the
EC case, the op was left hanging indefinitely in the OpTracker because the
pgid exists, but as a different shard.
Fixes: #9835
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Thu, 13 Nov 2014 01:04:35 +0000 (17:04 -0800)]
osd/OSDMap: add osd_is_valid_op_target()
Helper to check whether an osd is a valid op target for a pg. This
assumes that for EC pools we always send ops to the primary, while for
replicated pools we may target any replica.
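The rule described above, as a Python sketch. The real helper is C++ on OSDMap and takes a pg; here the acting set, the primary and the pool type are passed in directly, which is an assumption made to keep the example self-contained.

```python
def osd_is_valid_op_target(acting, primary, osd, is_erasure):
    """Return True if `osd` may receive ops for a pg.

    acting:     list of OSD ids in the pg's acting set
    primary:    id of the acting primary
    is_erasure: True for an erasure-coded pool
    """
    if osd == primary:
        return True
    if is_erasure:
        # EC ops always go to the primary; any other shard is the wrong target.
        return False
    # Replicated pools may send ops to any replica in the acting set.
    return osd in acting
```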
David Zafman [Wed, 12 Nov 2014 23:22:04 +0000 (15:22 -0800)]
ceph_objectstore_tool: Fixes to make import work again
is_pg() now returns true even for pgs pending removal, so fix the broken
finish_remove_pgs() by removing the is_pg() check.
Also add create_collection() to the initial transaction on import.
Fixes: #10090
Signed-off-by: David Zafman <dzafman@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Loic Dachary [Wed, 12 Nov 2014 17:49:54 +0000 (18:49 +0100)]
qa: handle CEPH_CLI_TEST_DUP_COMMAND on ceph osd create
If CEPH_CLI_TEST_DUP_COMMAND is set when ceph osd create is called, two
OSDs are created. Both must be cleaned up afterwards instead of assuming
that only one was created.
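One robust way to clean up is to diff the OSD ids listed before and after the create, rather than trusting the single id the command prints. A sketch with hypothetical helper names; the actual qa fix may handle this differently. `ceph osd rm <id>` is the standard command for removing an OSD id.

```python
def created_osds(before, after):
    """Return, sorted, the OSD ids that appeared between two listings.

    This also catches the extra OSD created when the command is sent twice.
    """
    return sorted(set(after) - set(before))


def cleanup_commands(ids):
    """Build one `ceph osd rm` command line per newly created OSD."""
    return [["ceph", "osd", "rm", str(i)] for i in ids]
```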
Sébastien Han [Mon, 10 Nov 2014 14:06:20 +0000 (15:06 +0100)]
doc: enable RBD cache and socket on OpenStack deployments
Enabling the RBD cache improves sequential I/O, and the admin socket helps
a lot when troubleshooting. These two settings are considered best
practice for OpenStack deployments with Ceph.
Signed-off-by: Sébastien Han <sebastien.han@enovance.com>
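The two settings live in the `[client]` section of ceph.conf. This sketch renders that section with Python's configparser; `rbd cache` and `admin socket` are Ceph option names, but the socket path passed in below is an assumed example, so use the location your deployment documentation recommends.

```python
import configparser
import io


def client_section(socket_path):
    """Render a [client] section with the RBD cache and admin socket enabled."""
    cfg = configparser.ConfigParser(interpolation=None)
    cfg["client"] = {
        "rbd cache": "true",
        "admin socket": socket_path,
    }
    out = io.StringIO()
    cfg.write(out)
    return out.getvalue()


# Assumed socket path built from Ceph's $cluster/$type/$id/$pid metavariables.
print(client_section("/var/run/ceph/$cluster-$type.$id.$pid.asok"))
```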
Josh Durgin [Wed, 12 Nov 2014 02:16:02 +0000 (18:16 -0800)]
qa: allow small allocation diffs for exported rbds
The local filesystem may allocate space slightly differently. This isn't
foolproof, but it seems reliable enough on a rhel7 rootfs, where exact
comparison was failing.
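The idea is to replace an exact equality check with a bounded difference. A sketch assuming the compared quantity is a total allocated size in bytes; the function name and the tolerance value are hypothetical, and the commit does not say exactly what the qa script measures.

```python
def allocation_close(expected_bytes, actual_bytes, tolerance_bytes):
    """True when two allocation sizes differ by at most tolerance_bytes.

    An exact == comparison is too strict when the local filesystem rounds
    or preallocates blocks a little differently.
    """
    return abs(expected_bytes - actual_bytes) <= tolerance_bytes
```

For example, with a hypothetical 8 KiB tolerance, `allocation_close(1048576, 1052672, 8192)` accepts a one-block difference but still rejects a doubled allocation.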
Rongze Zhu [Mon, 10 Nov 2014 16:13:42 +0000 (00:13 +0800)]
crush: fix tree bucket functions
Node weights were computed incorrectly when constructing a tree bucket,
and the tree bucket did not store item ids in its items array, so it would
not work correctly. This patch fixes both bugs and adds a simple test for
tree buckets.
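A tree bucket keeps its weights in an implicit binary tree. As we understand the CRUSH builder, leaf i sits at node index 2i+1, a node's height is its number of trailing zero bits, and each leaf's weight must be added to every ancestor up to the root at num_nodes/2. This Python sketch shows both steps the commit describes (storing the item id and summing weights up the tree); the function names are hypothetical and this is not the C implementation.

```python
def calc_depth(size):
    """Number of tree levels needed for `size` leaves."""
    depth = 1
    t = size - 1
    while t > 0:
        t >>= 1
        depth += 1
    return depth


def leaf_node(i):
    """Node index of the i-th item: leaves occupy the odd indices."""
    return ((i + 1) << 1) - 1


def node_height(n):
    """Count trailing zero bits; leaves (odd indices) have height 0."""
    h = 0
    while (n & 1) == 0:
        h += 1
        n >>= 1
    return h


def parent(n):
    h = node_height(n)
    if n & (1 << (h + 1)):  # n is a right child
        return n - (1 << h)
    return n + (1 << h)


def make_tree_bucket(items, weights):
    depth = calc_depth(len(items))
    node_weights = [0] * (1 << depth)
    bucket_items = []
    for i, (item, weight) in enumerate(zip(items, weights)):
        bucket_items.append(item)  # keep the item id for this leaf
        node = leaf_node(i)
        node_weights[node] = weight
        for _ in range(1, depth):  # add the weight to every ancestor
            node = parent(node)
            node_weights[node] += weight
    return bucket_items, node_weights
```

With three items weighted 1, 2 and 3, the root (node 4 of 8) ends up holding the total weight 6, while the intermediate nodes 2 and 6 hold 3 each.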
The check for 'nextkey < last_disk_key' makes little sense, since
last_disk_key is an empty string that has not been set before. A decoded
string can never compare less than an empty string. Since this if() isn't
part of a loop, last_disk_key is only set once and there is no other
consumer: revert this dead code.
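The argument rests on lexicographic ordering, in which the empty string sorts before every other string. Python's `str` comparison behaves like `std::string`'s `operator<` in this respect, so a quick check illustrates why the branch could never be taken. `is_before_marker` is a hypothetical name for the removed comparison.

```python
def is_before_marker(nextkey, last_disk_key):
    """Lexicographic comparison, mirroring 'nextkey < last_disk_key'."""
    return nextkey < last_disk_key


# No key, not even the empty string or one starting with NUL, sorts before "".
for key in ["", "\x00", "a", "obj_0001"]:
    assert not is_before_marker(key, "")
```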