Josh Durgin [Sun, 31 Mar 2013 00:25:18 +0000 (17:25 -0700)]
librbd: change diff_iterate interface to be more C-friendly
Use int instead of bool for the callback, and make it represent
whether the data exists, rather than the opposite, since callers
are likely to test for whether it's data instead of whether its zeroes.
Change the return value to 0, since an int64_t will wrap around
for large reads, and there's no value in reporting the length
read when it will always be the length requested clipped to the
size of the image.
Josh Durgin [Fri, 29 Mar 2013 23:48:02 +0000 (16:48 -0700)]
rbd: fail import-diff if we reach the end of the stream sooner than expected
safe_read() just protects against EINTR, and may return less data than
requested if it reaches the end of the file. Use safe_read_exact() to
make sure we get the right amount of data.
We were using the internal CEPH_NOSNAP and CEPH_SNAPDIR constants, and
defining a clone_info_t::HEAD (with a different value). The docs were
referrring to the internal constant names.
Instead, define librados constants (C and C++) with the same values as the
internal types.
Note that this changes the clone_info_t::HEAD value from -1 to -2 so that
it now matches the internal type.
Sage Weil [Thu, 28 Mar 2013 21:13:03 +0000 (14:13 -0700)]
librbd: fix diff_iterate arithmetic for non-standard striping
This code is confusing because we are moving back and forth between
image offsets, "buffer" offsets (image offsets relative to off), and
object offsets. Fix the math.
Sage Weil [Thu, 28 Mar 2013 16:26:11 +0000 (09:26 -0700)]
librbd: handle diff from clone
If we have a parent image, and the reference is from snap 0 (beginning of
time) we need to look at the diff on the parent from the beginning of time
and report that when we get an ENOENT.
Sage Weil [Wed, 27 Mar 2013 06:16:54 +0000 (23:16 -0700)]
qa: add rbd/diff_continuous.sh stress test
Stress test that does io on an image while we are mirroring a diff from
earlier snaps to a second copy. At the end, verify that all snaps have
matching content.
Sage Weil [Mon, 25 Mar 2013 21:14:50 +0000 (14:14 -0700)]
librbd: implement diff_iterate
Implement a diff_iterate() method that will iterate over an image and
report which extents vary between two snapshots (or a snapshot and the
head). The callback gets an extent and a flag indicating whether it is
full of data or is known to be zero in the ending snapshot.
Sage Weil [Tue, 26 Mar 2013 15:53:00 +0000 (08:53 -0700)]
osd: fix clone snap list for list-snaps
We need to return the list of snaps that each clone is defined for, not
the list of snaps we know may or may not exist globally over a similar
interval. This requires looking at the clone's obc, unfortunately.
Sage Weil [Tue, 26 Mar 2013 17:31:19 +0000 (10:31 -0700)]
osd: direct reads on SNAPDIR to either head or snapdir
The list_snaps operation needs to look at the SnapSet, and is logically
querying all revisions of the object. Make requests to SNAPDIR be
read-only, and grab the head or snapdir obc transparently (whichever one
exists). This allows us to list snaps when, say, the head does not
exist, but there are in fact snaps.
Sage Weil [Tue, 26 Mar 2013 04:18:47 +0000 (21:18 -0700)]
osd: do not include snaps with head on list_snaps()
If there is a sequence of snaps 1, 2, 3, 4, 5, and we have a clone
2 with [1,2], and the head reflects content at snap times [3,4,5], then
the snap_list should return
clone 2 snaps [1,2]
head snaps
seq 2
because it never saw a write after snap 2, and therefor has the same
content currently as it did in snaps 3,4,5. If the SnapSet on the
object lists snaps 3,4,5, and the head exists, it actually means the
object was deleted between 2 and 3, and was recreated after 5:
clone 2 snaps [1,2]
head snaps []
seq 5
The key to telling the two situations apart is the seq number on the
SnapSet (now included in the list_snaps reply) that tells us when the
last update was.
Loic Dachary [Sat, 30 Mar 2013 10:26:12 +0000 (11:26 +0100)]
fix null character in object name triggering segfault
Parsing \n in lfn_parse_object_name is implemented with
out->append('\0');
which segfaults when using libstdc++ and g++ version 4.6.3 on Debian
GNU/Linux. It is replaced with
(*out) += '\0';
to avoid the bugous implicit conversion. There is no append(charT)
method in C++98 or C++11, which means it relies on an implicit
conversion that is bugous. It would be better to rely on the
basic_string& operator+=(charT c); method as defined in ISO 14882-1998
(page 385) thru ISO 14882-2012 (page 640)
A set of tests is added to generate and parse object names. They need
access to the private function lfn_parse_object_name because there is
no convenient protected method to exercise it. The tests contain a
LFNIndex derived class, TestWrapLFNIndex which is made a friend of
LFNIndex to gain access to the private methods.
Loic Dachary [Thu, 28 Mar 2013 12:38:09 +0000 (13:38 +0100)]
unit test LFNIndex::remove_object and LFNIndex::lfn_unlink
When the object name is short, check that the corresponding file is
::unlink()ed. When the object name is long, there may be multiple files
with the same name, modulo the anti-collision number showing just before
the FILENAME_COOKIE. The following scenarii are tested:
* there only is one file
* there are multiple files and the last one is removed
* there are multiple files and the last one is moved in place of the
file that is to be removed
lfn_unlink and remove_object are tested together because
lfn_unlink is a private function and remove_object is a protected function
that does very little beside calling lfn_unlink
mon: ConfigKeyService: stash config keys on the monitor
Building up on the Single-Paxos and our existing k/v store that backs
the monitor, we now introduce a simple service so that the monitors
act as a generic k/v store available to the cluster, in which a user
can stash (and later obtain) configuration keys at his own discretion.
Users can put, get, delete, list and check for values using the
following commands:
- ceph config-key put <key> [<value>]
or
- ceph config-key put <key> [-i <in-file>]
with 'value' and 'in-file' being optional; if these are not specified,
'put' will act as 'touch' if 'key' does not exist, or will overwrite
the value of 'key' with a zero byte value (i.e., truncates the
contents of the value to zero)
- ceph config-key get <key>
or
- ceph config-key get <key> -o <out-file>
- ceph config-key delete <key>
- ceph config-key list [-o <out-file]
- ceph config-key exists <key>
Fixes: #4313 Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Gary Lowell [Thu, 28 Mar 2013 23:12:33 +0000 (16:12 -0700)]
ceph.spec.in: Move four scripts from sbin to usr/bin
The ceph-create-keys, ceph-disk, ceph-disk-activate, and
ceph-disk-prepare scripts are built in sbin, but debian installs
them into usr/bin, and several utilities look for them there.
This commit changes the RPM to install them in /usr/bin. (Bug #3921)
Signed-off-by: Gary Lowell <gary.lowell@inktank.com>
ceph: propagate do_command()'s return value to user space
We were returning '1' regardless of what do_command() returned in case
of error. This would make building tools relying on command error codes
short of useless, and forced them to rely instead on error messages.
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com> Reviewed-by: Dan Mick <dan.mick@inktank.com>
(cherry picked from commit e91405d540ce11b9996e4977212553bd33afb3ed)
ceph: propagate do_command()'s return value to user space
We were returning '1' regardless of what do_command() returned in case
of error. This would make building tools relying on command error codes
short of useless, and forced them to rely instead on error messages.
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com> Reviewed-by: Dan Mick <dan.mick@inktank.com>
Josh Durgin [Thu, 21 Mar 2013 23:04:10 +0000 (16:04 -0700)]
librbd: add an async flush
At this point it's a simple wrapper around the ObjectCacher or
librados.
This is needed for QEMU so that its main thread can continue while a
flush is occurring. Since this will be backported, don't update the
librbd version yet, just add a #define that QEMU and others can use to
detect the presence of aio_flush().
Josh Durgin [Wed, 27 Mar 2013 22:42:10 +0000 (15:42 -0700)]
librbd: use the same IoCtx for each request
Before we were duplicating the IoCtx for each new request since they
could have a different snapshot context or read from a different
snapshot id. Since librados now supports setting these explicitly
for a given request, do that instead.
Since librados tracks outstanding requests on a per-IoCtx basis, this
also fixes a bug that causes flush() without caching to ignore
all the outstanding requests, since they were to separate,
duplicate IoCtxs.
Josh Durgin [Wed, 27 Mar 2013 22:32:29 +0000 (15:32 -0700)]
librados: add versions of a couple functions taking explicit snap args
Usually the snapid to read from or the snapcontext to send with a write
are determined implicitly by the IoCtx the operations are done on.
This makes it difficult to have multiple ops in flight to the same
IoCtx using different snapcontexts or reading from different snapshots,
particularly when more than one operation may be needed past the initial
scheduling.
Add versions of aio_read, aio_sparse_read, and aio_operate
that don't depend on the snap id or snapcontext stored in the IoCtx,
but get them from the caller. Specifying this information for each
operation can be a more useful interface in general, but for now just
add it for the methods used by librbd.