Sage Weil [Wed, 24 Oct 2012 21:41:38 +0000 (14:41 -0700)]
osdc/ObjectCacher: set complete flag when we observe ENOENT
If we observe an ENOENT on a read, set the complete flag. Any dirty
buffers we have will still be in memory, even if the writes are in flight,
because the TX state remains pinned until the writes commit. Writes cannot
proceed faster than reads, even though reads may proceed faster than
writes.
Sage Weil [Wed, 24 Oct 2012 19:48:02 +0000 (12:48 -0700)]
osdc/ObjectCacher: refresh iterator in read apply loop
The p iterator points to the next bh, but try_merge_bh() at the end of the
loop might merge that into our result and invalidate the iterator. Fix
this by repeating the lookup on each pass through the loop.
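
The re-lookup pattern can be sketched as follows. This is a minimal illustration, not the ObjectCacher code: Extent, ExtentMap, try_merge_right() and coalesce_range() are hypothetical names, and a std::map keyed by offset stands in for the buffer-head map. The point is only that no iterator is carried across a call that may erase elements.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>

// Hypothetical stand-in for a buffer head: a byte range keyed by start offset.
struct Extent {
  uint64_t length;
};

using ExtentMap = std::map<uint64_t, Extent>;

// Merge the extent at `start` with its right-hand neighbor when the two are
// contiguous. The neighbor is erased, so any iterator that pointed at it is
// invalid afterwards. Returns true if a merge happened.
bool try_merge_right(ExtentMap& m, uint64_t start) {
  auto cur = m.find(start);
  if (cur == m.end()) return false;
  auto next = std::next(cur);
  if (next == m.end() || cur->first + cur->second.length != next->first)
    return false;
  cur->second.length += next->second.length;
  m.erase(next);
  return true;
}

// Walk [off, off+len) and repeat the lookup on every pass instead of holding
// an iterator to the "next" element across the merge.
void coalesce_range(ExtentMap& m, uint64_t off, uint64_t len) {
  uint64_t pos = off;
  const uint64_t end = off + len;
  while (pos < end) {
    auto p = m.lower_bound(pos);              // fresh lookup each pass
    if (p == m.end() || p->first >= end) break;
    const uint64_t start = p->first;
    if (try_merge_right(m, start)) continue;  // map changed: look up again
    pos = start + p->second.length;           // nothing erased, p still valid
  }
}
```

Each pass costs an extra O(log n) lookup, which is the price of not having to reason about which iterators a merge may have invalidated.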
Sage Weil [Wed, 24 Oct 2012 19:44:25 +0000 (12:44 -0700)]
osdc/ObjectCacher: do read completions after assimilating read result
Wait until we have applied the entire read result to the cache before we
trigger any read completion events. This is a cleaner and safer approach
since we can be sure that the callback won't get blocked again on data we
have but haven't applied yet. It also fixes a crash I just observed where
the completion did a read, called trim(), and invalidated/destroyed the
iterator/bh p was referencing.
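
A minimal sketch of the "apply everything, then complete" ordering, assuming a toy cache. MiniCache and its members are hypothetical and only illustrate the ordering described above; they do not mirror the ObjectCacher interfaces.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical miniature cache: offset -> data, plus waiters per offset.
struct MiniCache {
  std::map<int, std::string> data;
  std::map<int, std::vector<std::function<void()>>> waiters;

  // Apply every chunk of the read result first, collecting the waiters that
  // became runnable; run them only after the whole result is in the cache.
  void apply_read(const std::map<int, std::string>& result) {
    std::vector<std::function<void()>> ready;
    for (const auto& kv : result) {
      data[kv.first] = kv.second;
      auto w = waiters.find(kv.first);
      if (w != waiters.end()) {
        for (auto& cb : w->second) ready.push_back(std::move(cb));
        waiters.erase(w);
      }
    }
    // No iterators into data or waiters are live at this point, so a
    // callback may read, erase, or register new waiters safely.
    for (auto& cb : ready) cb();
  }
};
```

A callback registered for the first chunk can rely on later chunks of the same result already being present when it runs.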
Sage Weil [Tue, 23 Oct 2012 16:18:04 +0000 (09:18 -0700)]
osdc/ObjectCacher: check lru_is_expireable() in can_close()
We assert that if can_close(), the Object isn't pinned in the LRU. This
assumes we did our get/put refcounting properly, such that the pins are
at least as restrictive as can_close().
Sage Weil [Fri, 26 Oct 2012 18:30:06 +0000 (11:30 -0700)]
librbd: fix race in AioCompletions that are still being built
When caching is enabled, it is possible for the io completion to happen
faster than we call ->finish_adding_requests() (e.g., on cache read).
When that happens, the final read request completion doesn't see a
pending_count == 0 and thus doesn't do all the final buffer construction
that is necessary to return correct data. In particular, users will see
zeroed buffers. test_librbd_fsx is turning this up consistently after
several thousand ops with an image size of ~100MB and cloning disabled.
This was introduced with the extra logic added here with striping.
Fix this by making a separate flag to indicate the completion is under
construction, and make sure we call complete() when both pending_count==0
and building==false.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
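
The two-condition check can be sketched as below. This is a simplified, single-threaded model under stated assumptions: the Completion struct and its method bodies are hypothetical, and the locking a real completion object needs is omitted.

```cpp
#include <cassert>

// Hypothetical model of a completion that is assembled incrementally.
struct Completion {
  int pending_count = 0;
  bool building = true;    // stays true until every request has been added
  bool completed = false;

  void add_request() { ++pending_count; }

  // Called once by the submitter after the last add_request().
  void finish_adding_requests() {
    building = false;
    if (pending_count == 0) complete();
  }

  // Called when one request finishes; this may happen before the submitter
  // has finished adding requests (for example on a cache hit).
  void complete_request() {
    --pending_count;
    if (pending_count == 0 && !building) complete();
  }

  void complete() { completed = true; }
};
```

With only the pending_count check, the first early completion in this model would already fire complete() while later requests were still being added.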
Noah Watkins [Thu, 25 Oct 2012 19:04:00 +0000 (12:04 -0700)]
client: double mount returns -EISCONN
Change error code from -EDOM to -EISCONN when mounting an already
mounted ceph_mount_info instance. The current convention is to return
-ENOTCONN when using the libcephfs interface in an unmounted state.
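
The convention can be illustrated with a toy handle. MountHandle, mount() and stat_root() are hypothetical and are not the libcephfs API; only the errno constants are real.

```cpp
#include <cassert>
#include <cerrno>

// Hypothetical handle, not the real ceph_mount_info.
struct MountHandle {
  bool mounted = false;

  int mount() {
    if (mounted) return -EISCONN;   // already mounted
    mounted = true;
    return 0;
  }

  int stat_root() const {
    if (!mounted) return -ENOTCONN; // used before mounting
    return 0;
  }
};
```

The two codes are symmetric: -ENOTCONN for use before mounting, -EISCONN for mounting twice.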
Sage Weil [Sun, 21 Oct 2012 21:54:23 +0000 (14:54 -0700)]
client: do not reset session state on reopened sessions
We can have a sequence on the MDS like:
- queue REQUEST_CLOSE to journal
- force_open, queue open to journal
- request_close acked, do nothing
- force_open acked, send OPEN
In this case, the MDS never actually closed the session, and all of the
state remained valid. The client, however, gets a spurious OPEN
message and resets the session state.
Fix this by not resetting that state.
A nicer fix might be to not send the second OPEN at all, but that would
require a REOPENING state on the MDS which is more complicated; this is
good enough. Also, that approach might not give the client an
appropriate opportunity to say "um, no..." and resend the
REQUEST_CLOSE.
Sage Weil [Sun, 21 Oct 2012 21:22:51 +0000 (14:22 -0700)]
mds: fix handling of cache_expire export
During export, between the warning stage and the final notify, we may
get cache expire messages because the replicas are sending to both us
and the new auth. This check should look for >= WARNING so that it
includes the EXPORTING states as well as the portion of WARNING after
we heard from that replica. This aligns the conditional with the
following assert such that they are properly mutually exclusive.
Fixes: #1527
Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Thu, 25 Oct 2012 00:00:01 +0000 (17:00 -0700)]
osd: fix populate_obc_watchers() assert
There is one case where populate_obc_watchers gets called when the object
is missing: during a revert. And in that case we *should* do the populate,
since all that is getting reverted is the object version.
Fixes: #3405
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Sam Just <sam.just@inktank.com>
Sam Lang [Tue, 23 Oct 2012 21:21:02 +0000 (16:21 -0500)]
vstart.sh: Use ./init-ceph instead of CEPH_BIN
This effectively reverts faddb80c4230acad2b4a17aa6cbf0c30ae8d24a9
which prevented vstart.sh from being used in an environment where
CEPH_BIN pointed to a make install target.
Sage Weil [Tue, 23 Oct 2012 00:57:08 +0000 (17:57 -0700)]
librbd: use assert_exists() to simplify copyup check
Previously we would explicitly STAT the object to see if it exists before
sending the write to the OSD. Instead, send the write optimistically, and
assert that the object already exists. This avoids an extra round trip in
the optimistic/common case, and makes the existence check in the initial
first-write case more expensive because we send the data payload along.
Sage Weil [Tue, 23 Oct 2012 00:51:11 +0000 (17:51 -0700)]
librados: add assert_exists guard operation
Add a guard operation for writes that asserts that the object already
exists. To avoid requiring new functionality on the OSD side, implement
this by including a STAT operation, and discard the results on the
client side.
Sage Weil [Mon, 22 Oct 2012 21:14:09 +0000 (14:14 -0700)]
msg/Pipe: fix tight reconnect loop on connect failure
The fault() call in connect should not set onread=true since connect is
effectively a write path. This was forcing the writer() into a tight
loop that repeatedly would call connect(); not very polite.
Changing that, we want to avoid treating this as a normal fault (with the
failure callback) and instead back off.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
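
The message does not say which backoff policy is used. As an illustration only, a capped exponential backoff could look like the sketch below; the Backoff class, its parameters, and the doubling policy are all assumptions and are not taken from msg/Pipe.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>

// Hypothetical policy: double the delay after each failed connect, up to a
// cap, and reset it after a successful connect.
class Backoff {
 public:
  Backoff(std::chrono::milliseconds initial, std::chrono::milliseconds max)
      : initial_(initial), max_(max), current_(0) {}

  // Returns how long to wait before the next connect attempt.
  std::chrono::milliseconds on_failure() {
    if (current_.count() == 0) {
      current_ = initial_;
    } else {
      current_ = std::min<std::chrono::milliseconds>(current_ * 2, max_);
    }
    return current_;
  }

  void on_success() { current_ = std::chrono::milliseconds(0); }

 private:
  std::chrono::milliseconds initial_;
  std::chrono::milliseconds max_;
  std::chrono::milliseconds current_;
};
```

A writer loop would sleep for the returned duration before calling connect() again, rather than retrying immediately.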
Sam Lang [Mon, 22 Oct 2012 20:24:09 +0000 (15:24 -0500)]
test: Don't check initial permissions
We can't check the initial permissions of the
file because the umask may be set to something
other than 0022. The check isn't needed to verify
chmod correctness anyway.
Yehuda Sadeh [Mon, 22 Oct 2012 19:41:30 +0000 (12:41 -0700)]
rgw: check client write status on swift get_obj
Fixes: #3381
We check the return code of the cio->write() operation
when doing get_obj(). This makes sure that we don't
continue processing the request if the client has disconnected.
This commit complements another commit that does the same
for the specific s3 operation.
Sage Weil [Mon, 22 Oct 2012 17:45:36 +0000 (10:45 -0700)]
osd: drop conditional check in populate_obc_watchers
Turn these into asserts. The only two callers are create_object_context()
and get_object_context(), and they only get called when the object is no
longer missing.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
Sage Weil [Mon, 22 Oct 2012 17:45:20 +0000 (10:45 -0700)]
osd: populate obc watchers even when degraded
Bug #3142 appears to be caused by the following sequence:
- object X missing on primary and replica
- [assert-ver,watch], notify, unwatch requests come in, get deferred
- object is recovered on primary, !missing, create_object_context
- populate_obc_watchers() does nothing, since still degraded
- notify happens now (odd but ok?)
- replica recovered, !degraded
- watch skips bc of bad assert
- unwatch trips up on an assert because populate_obc_watchers never
ran
Fix this by populating the obc watcher when !missing, not when
!degraded. This conditional dates back to Sam's original watch/notify
cleanup in October 2011.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
Sage Weil [Mon, 22 Oct 2012 04:07:12 +0000 (21:07 -0700)]
objecter: move map checks to helper
This makes coverity happier because check_op_pool_dne() may free
the Op (or LingerOp) structure(s), but the callers in the submit_*
paths dereference after calling. This is actually safe because they
never free new ops, but is confusing. Explicitly push this into a
separate helper.
CID 739607 (#1-2 of 2): Read from pointer after free (USE_AFTER_FREE)
At (9): Dereferencing freed pointer "o".
CID 739606 (#1 of 1): Read from pointer after free (USE_AFTER_FREE)
At (28): Dereferencing freed pointer "op".
Sage Weil [Mon, 22 Oct 2012 03:56:25 +0000 (20:56 -0700)]
librbd: init layout in ImageCtx ctor
At (6): Non-static class member field "layout.fl_stripe_unit" is not initialized in this constructor nor in any functions that it calls.
At (8): Non-static class member field "layout.fl_stripe_count" is not initialized in this constructor nor in any functions that it calls.
At (10): Non-static class member field "layout.fl_object_size" is not initialized in this constructor nor in any functions that it calls.
At (12): Non-static class member field "layout.fl_cas_hash" is not initialized in this constructor nor in any functions that it calls.
At (14): Non-static class member field "layout.fl_object_stripe_unit" is not initialized in this constructor nor in any functions that it calls.
At (16): Non-static class member field "layout.fl_unused" is not initialized in this constructor nor in any functions that it calls.
CID 717224 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
At (18): Non-static class member field "layout.fl_pg_pool" is not initialized in this constructor nor in any functions that it calls.
Sage Weil [Mon, 22 Oct 2012 03:55:41 +0000 (20:55 -0700)]
librbd: init vars in AioRequest ctor
At (2): Non-static class member "m_object_no" is not initialized in this constructor nor in any functions that it calls.
At (4): Non-static class member "m_object_off" is not initialized in this constructor nor in any functions that it calls.
CID 717222 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
At (6): Non-static class member "m_object_len" is not initialized in this constructor nor in any functions that it calls.
CID 717223 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
At (2): Non-static class member "m_parent_overlap" is not initialized in this constructor nor in any functions that it calls.
Sage Weil [Wed, 17 Oct 2012 22:44:09 +0000 (15:44 -0700)]
client: release import caps we don't have
If we don't have the inode, release the caps. There is no point in adding
it to our cache, and that is problematic anyway because it ends up with a
ref count of zero and no dentry that will get trimmed by trim_cache(),
leaving it stuck there on shutdown.
This also aligns us with the kernel client behavior.
Greg Farnum [Fri, 19 Oct 2012 21:27:38 +0000 (14:27 -0700)]
mds: deal with the case where you have a Session close event without a Session.
This case shouldn't ever happen, but we've seen it, so there's a bug
somewhere. Handling a Session close when the Session is already closed
is easy, though -- we don't need to do anything!
Adds --enable-cephfs-java and --with-jdk to build
the libcephfs Java bindings and specify the default
JDK directory, respectively.
Also adds default JDK paths to avoid --with-jdk in
the common case. Currently setup for the default
provided by Debian's default-jdk package, but other
default search paths can easily be added.
Sage Weil [Fri, 19 Oct 2012 16:09:53 +0000 (09:09 -0700)]
mds: fix coverity warnings on NULL deref
Add asserts...
At (5): Function "MDCache::get_dirfrag(dirfrag_t)" returns null (checked 33 out of 39 times).
At (6): Assigning: "dir" = null return value from "MDCache::get_dirfrag(dirfrag_t)".
CID 717007 (#1 of 1): Dereference null return value (NULL_RETURNS)
At (7): Dereferencing a pointer that might be null "dir" when calling "MDCache::adjust_bounded_subtree_auth(CDir *, std::vector<dirfrag_t, std::allocator<dirfrag_t> > &, std::pair<int, int>)".
CID 717006 (#1 of 1): Dereference null return value (NULL_RETURNS)
At (5): Dereferencing a pointer that might be null "dir" when calling "MDCache::adjust_bounded_subtree_auth(CDir *, std::vector<dirfrag_t, std::allocator<dirfrag_t> > &, std::pair<int, int>)".
CID 717005 (#2 of 2): Dereference null return value (NULL_RETURNS)
At (22): Dereferencing a pointer that might be null "dir" when calling "MDCache::adjust_bounded_subtree_auth(CDir *, std::vector<dirfrag_t, std::allocator<dirfrag_t> > &, int)".
Sage Weil [Fri, 19 Oct 2012 16:07:31 +0000 (09:07 -0700)]
mds: fix possible inode_t::get_layout_size_increment() overflow
CID 717015 (#1 of 1): Unintentional integer overflow (OVERFLOW_BEFORE_WIDEN)
Potentially overflowing expression "this->layout.fl_object_size.operator unsigned int() * this->layout.fl_stripe_count.operator unsigned int()" with type "unsigned int" (32 bits, unsigned) is evaluated using 32-bit arithmetic before being used in a context which expects an expression of type "uint64_t" (64 bits, unsigned). To avoid overflow, cast either operand to "uint64_t" before performing the multiplication.
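
The warning can be reproduced with a small standalone example. The function names are hypothetical, and plain uint32_t parameters stand in for the layout fields.

```cpp
#include <cassert>
#include <cstdint>

// Both operands are 32-bit, so the product wraps modulo 2^32 before it is
// widened to the 64-bit return type.
uint64_t size_increment_narrow(uint32_t object_size, uint32_t stripe_count) {
  return object_size * stripe_count;
}

// Casting one operand first makes the multiplication itself 64-bit.
uint64_t size_increment_wide(uint32_t object_size, uint32_t stripe_count) {
  return static_cast<uint64_t>(object_size) * stripe_count;
}
```

With a 4 MiB object size and a stripe count of 1024 the true product is 2^32, which the narrow version wraps to zero.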
Sage Weil [Fri, 19 Oct 2012 16:06:39 +0000 (09:06 -0700)]
mds: init cap_reconnect_t::flock_len
CID 717256 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
At (2): Non-static class member field "capinfo.flock_len" is not initialized in this constructor nor in any functions that it calls.
Sage Weil [Fri, 19 Oct 2012 16:06:06 +0000 (09:06 -0700)]
mds: init in cap_reconnect_t ctor
At (2): Non-static class member field "capinfo.cap_id" is not initialized in this constructor nor in any functions that it calls.
At (4): Non-static class member field "capinfo.wanted" is not initialized in this constructor nor in any functions that it calls.
At (6): Non-static class member field "capinfo.issued" is not initialized in this constructor nor in any functions that it calls.
At (8): Non-static class member field "capinfo.snaprealm" is not initialized in this constructor nor in any functions that it calls.
At (10): Non-static class member field "capinfo.pathbase" is not initialized in this constructor nor in any functions that it calls.
CID 717257 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
At (12): Non-static class member field "capinfo.flock_len" is not initialized in this constructor nor in any functions that it calls.
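
One common way to silence UNINIT_CTOR on a plain-data member is to zero it in the constructor. The sketch below assumes hypothetical CapInfo and ReconnectRecord types with a reduced set of fields; it is not the actual cap_reconnect_t definition, and the message above does not say which form of initialization the fix uses.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical plain-data struct standing in for an on-wire record.
struct CapInfo {
  uint64_t cap_id;
  uint32_t wanted;
  uint32_t issued;
  uint32_t flock_len;
};

struct ReconnectRecord {
  CapInfo capinfo;

  // Zero the whole record so that no scalar field is left indeterminate.
  ReconnectRecord() { std::memset(&capinfo, 0, sizeof(capinfo)); }
};
```

Zeroing the whole struct also covers fields added to it later, which per-field initializers would not.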
Sage Weil [Fri, 19 Oct 2012 16:04:10 +0000 (09:04 -0700)]
mds: init inode_t::dir_layout
At (2): Non-static class member field "dir_layout.dl_dir_hash" is not initialized in this constructor nor in any functions that it calls.
At (4): Non-static class member field "dir_layout.dl_unused1" is not initialized in this constructor nor in any functions that it calls.
At (6): Non-static class member field "dir_layout.dl_unused2" is not initialized in this constructor nor in any functions that it calls.
CID 717258 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)