Erwan Velu [Fri, 31 Mar 2017 12:54:33 +0000 (14:54 +0200)]
ceph-disk: Adding retry loop in get_partition_dev()
There are very rare cases where get_partition_dev() is called before the actual partition is available in /sys/block/<device>.
It appears that waiting a very short time is usually enough for the partition to be populated.
Analysis:
update_partition() is supposed to be enough to avoid any race between events sent by parted/sgdisk/partprobe and
the actual creation of the /sys/block/<device>/* entry.
On our CI that race occurs pretty often, but it has never been possible to reproduce it locally.
This patch is more a workaround than a fix for the real problem.
It retries after a very short wait to give the device a chance to appear.
This approach has been successful on the CI.
Note this patch does not change the timing when the device is created on time; it only adds delays of 1/5th of a second, up to 2 seconds, when the bug occurs.
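A minimal Python sketch of the retry idea (illustrative only: the function, its matching logic and the retry parameters are assumptions, not the actual ceph-disk code):

    import os
    import time

    def wait_for_partition(dev_name, pnum, tries=10, delay=0.2):
        """Poll /sys/block/<dev_name> until partition <pnum> shows up.

        dev_name is a base device name such as 'sda'; returns the partition
        name (e.g. 'sda2') or None after roughly tries * delay seconds.
        The real ceph-disk code also handles names like 'nvme0n1p2'.
        """
        sysfs_dir = os.path.join('/sys/block', dev_name)
        candidate = dev_name + str(pnum)          # e.g. 'sda' + 2 -> 'sda2'
        for _ in range(tries):
            if os.path.isdir(os.path.join(sysfs_dir, candidate)):
                return candidate                  # the sysfs entry appeared
            # not there yet: wait 1/5th of a second and look again
            time.sleep(delay)
        return None

The retry only happens when the sysfs entry is missing, so a device that is created on time is returned on the first pass without any sleep.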
A typical output from a build running on the CI with this code:
command_check_call: Running command: /usr/bin/udevadm settle --timeout=600
get_dm_uuid: get_dm_uuid /dev/sda uuid path is /sys/dev/block/8:0/dm/uuid
get_partition_dev: Try 1/10 : partition 2 for /dev/sda does not in /sys/block/sda
get_partition_dev: Found partition 2 for /dev/sda after 1 tries
get_dm_uuid: get_dm_uuid /dev/sda uuid path is /sys/dev/block/8:0/dm/uuid
get_dm_uuid: get_dm_uuid /dev/sda2 uuid path is /sys/dev/block/8:2/dm/uuid
Erwan Velu [Wed, 22 Mar 2017 09:11:44 +0000 (10:11 +0100)]
ceph-disk: Reporting /sys directory in get_partition_dev()
When get_partition_dev() fails, it reports the following message:
ceph_disk.main.Error: Error: partition 2 for /dev/sdb does not appear to exist
The code searches for a directory inside /sys/block/get_dev_name(os.path.realpath(dev)).
The issue here is that the error message doesn't report that path on failure, even though it might be involved in the problem.
This patch reports where the code was looking when trying to determine whether the partition was available.
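The improvement amounts to including the searched sysfs directory in the exception, along the lines of this sketch (names are illustrative, not the exact ceph-disk code):

    def report_missing_partition(dev, pnum, sysfs_dir):
        # Include the directory that was searched so the operator can see
        # exactly where the partition was expected to appear.
        raise RuntimeError('partition %d for %s does not appear to exist in %s'
                           % (pnum, dev, sysfs_dir))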
Loic Dachary [Tue, 9 May 2017 10:32:51 +0000 (12:32 +0200)]
ceph-disk: separate ceph-osd --check-needs-* logs
It uses OSD id zero but has nothing to do with OSD zero, which is
confusing to the user. The logs themselves do not need to be kept
around; they are stored in the run directory so that they can be disposed
of after reboot.
Samuel Just [Tue, 14 Feb 2017 20:47:37 +0000 (12:47 -0800)]
ReplicatedBackend: don't queue Context outside of ObjectStore with obc
We only flush the ObjectStore callbacks, not everything else. Thus,
there isn't a guarantee that the obc held by pull_complete_info will
be cleaned up before the Flush callback is triggered. Instead, just
defer clearing the pull state until the callback (it'll be cleaned up
during the interval change) and remove the ObjectContext from
pull_complete_info.
Sage Weil [Thu, 9 Mar 2017 21:51:21 +0000 (16:51 -0500)]
os/bluestore/BlueFS: fix flush_bdev placement
We need to flush any new writes on any fsync(). Notably, this includes
the rocksdb log. However, previously _fsync was only doing a bdev flush if
we also had a dirty bluefs journal and called into _sync_and_flush_log.
If we didn't, we weren't doing a flush() at all, which could lead to
corrupted data.
Fix this by moving the first flush_bdev *out* of _sync_and_flush_log. (The
second one is there to flush the bluefs journal; the first one was to
ensure prior writes are stable.) Instead, flush prior writes in all of the
callers prior to calling _sync_and_flush_log. This includes _fsync (and
fixes the bug by covering the non-journal-flush path) as well as several
other callers.
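The ordering requirement behind the first flush_bdev can be illustrated with a generic write-ahead sketch in Python (pure illustration of the principle, not BlueFS code):

    import os

    def append_with_journal(data_fd, journal_fd, payload, journal_entry):
        # Generic write-ahead ordering: make the data durable *before* the
        # journal record that references it is committed. If the journal were
        # flushed first and the machine crashed in between, replay would point
        # at data that never reached the device.
        os.write(data_fd, payload)
        os.fsync(data_fd)                    # flush prior writes first
        os.write(journal_fd, journal_entry)
        os.fsync(journal_fd)                 # then make the journal entry durable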
Sage Weil [Thu, 9 Mar 2017 21:51:05 +0000 (16:51 -0500)]
os/bluestore/KernelDevice: make flush() thread safe
flush() may be called from multiple racing threads (notably, rocksdb can call fsync via
bluefs at any time), and we need to make sure that if one thread sees the io_since_flush
command and does an actual flush, that other racing threads also wait until that flush is
complete. This is accomplished with a simple mutex!
Also, set the flag on IO *completion*, since flush is only a promise about
completed IOs, not submitted IOs.
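The pattern can be shown with a short Python sketch (the actual change is C++ inside KernelDevice; the class and member names here are made up):

    import os
    import threading

    class FlushGuard(object):
        """Racing flush() callers serialize on a mutex, so a caller that
        returns has seen any in-progress flush complete."""

        def __init__(self, fd):
            self.fd = fd
            self.lock = threading.Lock()
            self.io_since_flush = False   # set on IO *completion*, not submission

        def io_completed(self):
            self.io_since_flush = True

        def flush(self):
            with self.lock:
                if not self.io_since_flush:
                    return                # nothing completed since the last flush
                self.io_since_flush = False
                os.fsync(self.fd)         # the actual device flush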
osd: Return correct osd_objectstore in OSD metadata
Do not simply read the configuration value, as it might have changed
during OSD startup; instead, read the type from disk.
Fixes: http://tracker.ceph.com/issues/18638
Signed-off-by: Wido den Hollander <wido@42on.com>
(cherry picked from commit 8fe6a0303b02ac1033f5bfced9f94350fe3e33de)
Conflicts:
src/osd/OSD.cc
- g_conf->osd_objectstore was changed to cct->_conf->osd_objectstore by 1d5e967a05ddbcceb10efe3b57e242b3b6b7eb8c which is not in kraken
liuchang0812 [Mon, 27 Mar 2017 05:08:12 +0000 (13:08 +0800)]
rgw/lifecycle: do not send lifecycle rules when GetLifeCycle failed
Currently, RGW sends two HTTP responses when GetLifeCycle fails. The first one is
an error response such as 404, and the second is the lifecycle rules. This breaks the S3 SDK
and S3 utilities.
Fixes: http://tracker.ceph.com/issues/19363
Signed-off-by: liuchang0812 <liuchang0812@gmail.com>
(cherry picked from commit c3c0c828da5a64ca896475c1b0c369fde1bbd76a)
The response headers of the Swift API returned by radosgw do not contain
"x-openstack-request-id", but Swift returns it. Enhance the
compatibility of radosgw.
Fixes: http://tracker.ceph.com/issues/19443
Signed-off-by: tone-zhang <tone.zhang@linaro.org>
(cherry picked from commit e96db213079ab5e026156ab4b38418d1d4c23d27)
Conflicts:
src/librbd/AioImageRequestWQ.h:
- in master this file has morphed into src/librbd/io/ImageRequestWQ.h
- kraken has AioImageRequest<ImageCtx> instead of ImageRequest<ImageCtx>
src/librbd/image/RefreshRequest.cc:
- rename image context element to "aio_work_queue" (from "io_work_queue")
because kraken doesn't have de95d862f57b56738e04d77f2351622f83f17f4a
src/test/librbd/image/test_mock_RefreshRequest.cc:
- rename image context element to "aio_work_queue" (from "io_work_queue")
because kraken doesn't have de95d862f57b56738e04d77f2351622f83f17f4a
Samuel Just [Wed, 18 Jan 2017 18:24:13 +0000 (10:24 -0800)]
PrimaryLogPG::try_lock_for_read: give up if missing
The only users, calc_*_subsets, might try to read_lock an object which is
missing on the primary. Returning false in those cases is perfectly
reasonable and avoids the problem.
Samuel Just [Wed, 23 Nov 2016 23:41:13 +0000 (15:41 -0800)]
ReplicatedBackend: take read locks for clone sources during recovery
Otherwise, we run the risk of a clone source which hasn't actually
come into existence yet being used if we grab a clone which *just*
got added to the ssc, but has not yet actually had time to be
created (can't rely on message ordering here since recovery messages
don't necessarily order with client IO!).
Sage Weil [Fri, 31 Mar 2017 14:06:42 +0000 (10:06 -0400)]
ceph_test_librados_api_misc: fix stupid LibRadosMiscConnectFailure.ConnectFailure test
Sometimes the cond doesn't time out and it wakes up instead. Just repeat
the test many times to ensure that at least once it times out (it usually
does time out; the spurious wakeup is pretty infrequent).
Merge pull request #16069 from smithfarm/wip-20345-kraken
kraken: make check fails with Error EIO: load dlopen(build/lib/libec_FAKE.so): build/lib/libec_FAKE.so: cannot open shared object file: No such file or directory
qa/workunits/ceph-helpers: do not error out if is_clean
It would be a race otherwise, because we cannot be sure whether the cluster
PGs are all clean when run_osd() returns, but we can be sure
that they are expected to be active+clean after a while. That's what
wait_for_clean() does.
kraken: osd: unlock sdata_op_ordering_lock with sdata_lock hold to avoid missing wakeup signal
Based on commit bc683385819146f3f6f096ceec97e1226a3cd237. The OSD code has
been refactored a lot since Kraken, hence cherry-picking that patch introduces
a lot of unrelated changes, and is much more difficult than reusing the idea.
Nathan Cutler [Fri, 23 Jun 2017 06:27:42 +0000 (08:27 +0200)]
tests: move swift.py task to qa/tasks
In preparation for moving this task from ceph/teuthology.git into ceph/ceph.git.
The move is necessary because jewel-specific changes are needed, yet teuthology
does not maintain a separate branch for jewel. Also, swift.py is a
Ceph-specific task so it makes more sense to have it in Ceph.