Sage Weil [Fri, 9 Nov 2012 14:48:10 +0000 (06:48 -0800)]
client: do not gratuitously drop FILE_CACHE ref in _read()
get_caps() has a confusing out-arg called "got" that is really the set of
caps we *have*; it only takes a ref on the *need* cap. We should only
put that one explicitly (CEPH_CAP_FILE_RD). The _write() method already
does this properly, but _read() did not.
Fixes: #3470
Signed-off-by: Sage Weil <sage@inktank.com>
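The ref-counting rule above can be modelled with a small toy, given here only as an illustration. The Inode class, the bit values, and the put_cap_ref() helper are hypothetical stand-ins and not the client's real interfaces; only the idea (a ref is taken on the *need* cap alone, so only that cap is put) comes from the message above.

```python
from collections import Counter

# Hypothetical bit values, for illustration only.
FILE_RD = 1
FILE_CACHE = 2


class Inode:
    """Toy model: tracks issued caps and per-cap reference counts."""

    def __init__(self, issued):
        self.issued = issued
        self.refs = Counter()

    def get_caps(self, need, want):
        # Takes a ref on `need` only; the return value is what we *have*.
        assert self.issued & need == need, "needed cap not issued"
        self.refs[need] += 1
        return self.issued & (need | want)

    def put_cap_ref(self, cap):
        assert self.refs[cap] > 0, "dropping a ref that was never taken"
        self.refs[cap] -= 1


def read(inode):
    have = inode.get_caps(FILE_RD, FILE_CACHE)
    use_cache = bool(have & FILE_CACHE)  # may use the cache, but holds no ref on it
    inode.put_cap_ref(FILE_RD)           # put only the cap we took a ref on
    return use_cache
```

In this model, also calling put_cap_ref(FILE_CACHE) after the read would trip the assertion, which is the kind of gratuitous drop the commit removes.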
Dan Mick [Tue, 6 Nov 2012 23:28:10 +0000 (15:28 -0800)]
cls_rbd: send proper format of key to "last_read" for dir_list
rbd ls of format-2 images was looping on the first 64 (when more than 64
were present). The key name passed to the omap layer needs to always
contain the prefix, and the "inside-the-loop next-chunk" statement
was missing the "add the prefix" call.
Also, add a test for listing 100 images, format 1 and 2.
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
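A minimal sketch of the paging loop, to show why the continuation key needs the prefix. This is not the cls_rbd code: the "name_" prefix, key_for(), and list_chunk() are assumptions made for illustration, with list_chunk() standing in for a sorted key-value read that returns keys strictly after a given key.

```python
import bisect

PREFIX = "name_"  # assumed key prefix, illustrative only


def key_for(name):
    """Build the stored key for an image name."""
    return PREFIX + name


def list_chunk(sorted_keys, start_after, max_return):
    """Hypothetical stand-in for a sorted key read: keys strictly after start_after."""
    i = bisect.bisect_right(sorted_keys, start_after)
    return sorted_keys[i:i + max_return]


def dir_list(sorted_keys, max_return=64):
    """List all image names, reading max_return keys per round trip."""
    names = []
    last_read = key_for("")  # sorts just before the first prefixed key
    while True:
        chunk = list_chunk(sorted_keys, last_read, max_return)
        for key in chunk:
            names.append(key[len(PREFIX):])
        if len(chunk) < max_return:
            return names
        # The fix: the next-chunk key must carry the prefix as well.
        # Passing the bare name here would sort before every prefixed key
        # and return the first chunk again, forever.
        last_read = key_for(names[-1])
```

With 100 keys this reads one chunk of 64 and one of 36, matching the case the added test covers.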
Yehuda Sadeh [Wed, 24 Oct 2012 19:56:59 +0000 (12:56 -0700)]
rgw: don't reset multipart parts when updating their metadata
Fixes: #3401
The problem was that put_obj_meta() assumed the object was always going
to be reset, so it reset the object unconditionally. This is not
true when dealing with the immutable multipart upload parts.
Dan Mick [Tue, 6 Nov 2012 00:13:19 +0000 (16:13 -0800)]
rbd: allow removal of image even if rbd_children deletion fails
Users have been seeing failures where rbd rm is half-done; this could be
because of outstanding watches on the rbd_header object. The state
is that rbd_children no longer contains the child, but other pieces
remain; remove considers this a failure.
Fix: test for ENOENT from remove_child, and treat that as an ignorable
error and drive on. Simulate this in copy.sh by removing the
rbd_children object altogether, which also results in ENOENT return
from remove_child.
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
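The error handling described above amounts to the following pattern. This is a sketch only: remove_image() and its two callables are hypothetical, and return values follow the negative-errno convention.

```python
import errno


def remove_image(remove_child, remove_rest):
    """Return 0 on success or a negative errno.

    remove_child and remove_rest are hypothetical callables that each
    return 0 or a negative errno.
    """
    r = remove_child()
    if r < 0 and r != -errno.ENOENT:
        return r  # a real failure: stop here
    # ENOENT means the child entry is already gone; drive on.
    return remove_rest()
```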
Samuel Just [Fri, 2 Nov 2012 20:02:15 +0000 (13:02 -0700)]
PG: use remove_object_with_snap_hardlinks for divergent objects
Otherwise, we end up leaving snap hardlinks in the snapshot
index directories. This eventually results in an EEXIST error
when we attempt to re-link the clone into place during
recovery.
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Sage Weil [Tue, 30 Oct 2012 21:17:56 +0000 (14:17 -0700)]
ceph-disk-activate: avoid duplicating mounts if already activated
If the given device is already mounted at the target location, do not
mount --move it again and create a bunch of dup entries in the /etc/mtab
and kernel mount table.
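One way to express the "already mounted at the target" check is to scan a mount table in the usual "device mountpoint fstype ..." layout. This is a sketch under that assumption, not the script's actual logic; is_mounted_at() and the sample paths in the usage note are illustrative.

```python
def is_mounted_at(mounts_text, dev, path):
    """Return True if dev is listed as mounted at path in mounts_text.

    mounts_text is expected in the whitespace-separated layout used by
    /proc/mounts: device, mount point, then further fields.
    """
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == dev and fields[1] == path:
            return True
    return False
```

A caller would read the mount table once, then skip the mount --move step when is_mounted_at() returns True for the device and target directory.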
Sage Weil [Fri, 26 Oct 2012 04:21:18 +0000 (21:21 -0700)]
ceph-disk-prepare: poke kernel into refreshing partition tables
Prod the kernel to refresh the partition table after we create one. The
partprobe program is packaged with parted, which we already use, so this
introduces no new dependency.
Sage Weil [Mon, 29 Oct 2012 18:03:46 +0000 (11:03 -0700)]
osd: make pool_snap_info_t encoding backward compatible
Way back in fc869dee1e8a1c90c93cb7e678563772fb1c51fb (v0.42) when we redid
the osd type encoding we forgot to make this conditionally encode the old
format for old clients. In particular, this means that kernel clients
will fail to decode the osdmap if there is a rados pool with a pool-level
snapshot defined.
Fixes: #3290
Signed-off-by: Sage Weil <sage@inktank.com>
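Conditional encoding for old clients generally looks like the sketch below: check the peer's feature bits and fall back to the legacy layout when the new one is not supported. The feature bit, both byte layouts, and the function names are invented for illustration and do not describe the real pool_snap_info_t wire format.

```python
import struct

FEATURE_NEW_ENCODING = 1 << 0  # hypothetical feature bit


def encode_legacy(snapid, name):
    """Assumed legacy layout: u64 snapid, u32 name length, name bytes."""
    raw = name.encode("utf-8")
    return struct.pack("<QI", snapid, len(raw)) + raw


def encode_snap_info(snapid, name, peer_features):
    """Pick the layout based on what the peer advertises."""
    body = encode_legacy(snapid, name)
    if not (peer_features & FEATURE_NEW_ENCODING):
        return body  # old client: emit the old format unchanged
    # Assumed new layout: version, compat version, payload length, payload.
    return struct.pack("<BBI", 2, 2, len(body)) + body
```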
Sage Weil [Thu, 25 Oct 2012 00:00:01 +0000 (17:00 -0700)]
osd: fix populate_obc_watchers() assert
There is one case where populate_obc_watchers gets called when the object
is missing: during a revert. And in that case we *should* do the populate,
since all that is getting reverted is the object version.
Fixes: #3405
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Sam Just <sam.just@inktank.com>
Sage Weil [Mon, 22 Oct 2012 17:45:36 +0000 (10:45 -0700)]
osd: drop conditional check in populate_obc_watchers
Turn these into asserts. The only two callers are create_object_context()
and get_object_context(), and they only get called when the object is no
longer missing.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
Sage Weil [Mon, 22 Oct 2012 17:45:20 +0000 (10:45 -0700)]
osd: populate obc watchers even when degraded
Bug #3142 appears to be caused by the following sequence:
- object X missing on primary and replica
- [assert-ver,watch], notify, unwatch requests come in, get deferred
- object is recovered on primary, !missing, create_object_context
- populate_obc_watchers() does nothing, since still degraded
- notify happens now (odd but ok?)
- replica recovered, !degraded
- watch skips bc of bad assert
- unwatch trips up on an assert because populate_obc_watchers never
ran
Fix this by populating the obc watcher when !missing, not when
!degraded. This conditional dates back to Sam's original watch/notify
cleanup in October 2011.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
Sage Weil [Thu, 18 Oct 2012 00:44:12 +0000 (17:44 -0700)]
addr_parsing: make , and ; and ' ' all delimiters
Instead of just ','. Currently "foo.com, bar.com" will fail because of the
space after the comma. This patch fixes that, and makes all delimiter
chars interchangeable.
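The intended splitting behaviour can be sketched as follows. split_addrs() is a hypothetical helper, not the function in addr_parsing; it only shows ',', ';' and ' ' being treated as interchangeable delimiters, with runs of them collapsed.

```python
import re


def split_addrs(s):
    """Split an address list on any run of ',', ';' or ' '."""
    return [tok for tok in re.split(r"[,; ]+", s) if tok]
```

With this rule, "foo.com, bar.com" yields the two host names rather than failing on the space after the comma.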
Sage Weil [Tue, 16 Oct 2012 20:03:53 +0000 (13:03 -0700)]
mds: explicitly queue messages for unconnected clients
Previously, the messenger would queue messages for a destination that
didn't exist when you were a server; that changed a while back with the
wip-msgr merge (circa v0.52). The result is that when we force open
client sessions and queue messages, they are dropped on the floor and the
client--when it does connect--gets confusing stuff from the MDS.
Instead, explicitly queue and send these messages. Also, *always* send
via the Connection* instead of the inst.
Fixes: #2681
Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Mon, 15 Oct 2012 21:20:51 +0000 (14:20 -0700)]
client: fix shadowing in inode ctor
CID 728080 (#1 of 1): Incorrect sizeof expression (BAD_SIZEOF)
Taking the size of pointer parameter "layout" is suspicious.
At (2): Non-static class member field "layout.fl_stripe_unit" is not initialized in this constructor nor in any functions that it calls.
At (4): Non-static class member field "layout.fl_stripe_count" is not initialized in this constructor nor in any functions that it calls.
At (6): Non-static class member field "layout.fl_object_size" is not initialized in this constructor nor in any functions that it calls.
At (8): Non-static class member field "layout.fl_cas_hash" is not initialized in this constructor nor in any functions that it calls.
At (10): Non-static class member field "layout.fl_object_stripe_unit" is not initialized in this constructor nor in any functions that it calls.
At (12): Non-static class member field "layout.fl_unused" is not initialized in this constructor nor in any functions that it calls.
CID 717206 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
At (14): Non-static class member field "layout.fl_pg_pool" is not initialized in this constructor nor in any functions that it calls.
Sage Weil [Mon, 15 Oct 2012 21:19:10 +0000 (14:19 -0700)]
client: init readdir fields
At (2): Non-static class member "readdir_offset" is not initialized in this constructor nor in any functions that it calls.
At (4): Non-static class member "readdir_end" is not initialized in this constructor nor in any functions that it calls.
At (6): Non-static class member "readdir_num" is not initialized in this constructor nor in any functions that it calls.
CID 717207 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
At (8): Non-static class member "tid" is not initialized in this constructor nor in any functions that it calls.
Sage Weil [Mon, 15 Oct 2012 21:14:28 +0000 (14:14 -0700)]
cls_rgw: init var in ctor
CID 727992 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
At (2): Non-static class member "tag_timeout" is not initialized in this constructor nor in any functions that it calls.
Tommi Virtanen [Fri, 5 Oct 2012 17:57:42 +0000 (10:57 -0700)]
ceph-disk-prepare, debian/control: Support external journals.
Previously, ceph-disk-* would only let you use a journal that was a
file inside the OSD data directory. With this, you can do:
ceph-disk-prepare /dev/sdb /dev/sdb
to put the journal as a second partition on the same disk as the OSD
data (might save some file system overhead), or, more interestingly:
ceph-disk-prepare /dev/sdb /dev/sdc
which makes it create a new partition on /dev/sdc to use as the
journal. Size of the partition is decided by $osd_journal_size.
/dev/sdc must be a GPT-format disk. Multiple OSDs may share the same
journal disk (using separate partitions); this way, a single fast SSD
can serve as journal for multiple spinning disks.
The second use case currently requires parted, so a Recommends: for
parted has been added to Debian packaging.
Closes: #3078
Closes: #3079
Signed-off-by: Tommi Virtanen <tv@inktank.com>
Dan Mick [Wed, 10 Oct 2012 17:41:05 +0000 (10:41 -0700)]
rbd: don't issue usage on errors
Change bare calls to usage() to an informative targeted error message
Remove all calls to usage() except when requested with -h/--help
Regularize all errors to start with rbd:
Remove a few commented cerrs, wrap cerr calls at 80 cols
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Sam Lang [Tue, 9 Oct 2012 22:34:48 +0000 (17:34 -0500)]
client: Reset cache_name pos on dirp
Reset the at_cache_name field on the directory
stream pointer for rewinddir.
This fixes a bug where getdents after readdir at
the end of the stream would return invalid
results after rewinddir had been called.
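A toy model of the failure, assuming a stream that resumes from the name it last returned. The DirStream class is hypothetical and much simpler than the client's directory state; only the at_cache_name field name comes from the message above. It also assumes entry names are non-empty, so that "" sorts before all of them.

```python
import bisect


class DirStream:
    """Toy directory stream that resumes after the last name it returned."""

    def __init__(self, names):
        self.names = sorted(names)
        self.at_cache_name = ""  # "" sorts before every non-empty name

    def readdir(self):
        i = bisect.bisect_right(self.names, self.at_cache_name)
        if i >= len(self.names):
            return None  # end of stream
        self.at_cache_name = self.names[i]
        return self.at_cache_name

    def rewinddir(self):
        # Without this reset, a read after rewinding would still resume
        # from the old position and report end of stream.
        self.at_cache_name = ""
```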
Sage Weil [Tue, 9 Oct 2012 21:10:48 +0000 (14:10 -0700)]
ceph-debugpack: updates
- avoid copying data around; tar things directly into the tgz
- 'ceph report' instead of all the little bits
- unrotated logs only
- ensure target doesn't already exist