Greg Farnum [Fri, 7 Aug 2009 20:21:34 +0000 (13:21 -0700)]
Hadoop: CephFSInterface cleanup:
better error catching on JNI function use;
removed a few unused functions;
now takes advantage of implicit typecasting for clearer code
(no more manually creating j_ints and storing results to make a 1-line
function into 3+ lines; no more if statements returning JNI_TRUE/FALSE);
modified the getDir function as it wasn't very robust
about checking for . and .. entries and the Java code does that;
modified the Java code listPaths as it wasn't necessarily filling
the array and didn't compact it!
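A minimal sketch of the listPaths fix described above, written in Python rather than the Java of the actual change. The function name and arguments are hypothetical; the only behaviour taken from the message is that . and .. are skipped and the result array is compacted so it holds no unfilled slots.

```python
def list_paths(raw_entries, parent):
    """Return full paths for raw directory entries, skipping . and ..

    Hypothetical model of the behaviour described in the commit message;
    it does not reproduce the real Java listPaths.
    """
    # Allocate one slot per raw entry, as a fixed-size array would.
    paths = [None] * len(raw_entries)
    filled = 0
    for name in raw_entries:
        if name in (".", ".."):
            continue  # these entries never reach the caller
        paths[filled] = parent.rstrip("/") + "/" + name
        filled += 1
    # Compact: return only the slots that were actually filled.
    return paths[:filled]
```

Without the final slice, a caller would see trailing empty slots whenever . or .. were skipped.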
Greg Farnum [Thu, 6 Aug 2009 23:29:50 +0000 (16:29 -0700)]
Hadoop: Throws IOException -> return null; interface requirement.
setReplicationRaw is an old method name, and it doesn't need to be overridden.
getReplication was deprecated and no longer exists.
Greg Farnum [Thu, 6 Aug 2009 23:04:25 +0000 (16:04 -0700)]
Hadoop: libhadoopcephfs now links against libceph
rather than incorporating it; the Java code accounts for this and
loads based on a configuration setting -- no more worries
about java.library.path.
Sage Weil [Wed, 5 Aug 2009 18:38:32 +0000 (11:38 -0700)]
kclient: revamp fsync
Be smarter about when we write back caps on fsync, and when we
wait. Also, wait only for those caps to write back, not for all
caps to be clean, avoiding starvation.
Sage Weil [Tue, 4 Aug 2009 23:08:32 +0000 (16:08 -0700)]
mds: wait for rejoin_gather_finish to complete before finishing
To do that we add ourselves to the rejoin_ack_gather. Otherwise
we end up in up:active before we've even finished our
parallel_fetch or finished up our caps!
Sage Weil [Tue, 4 Aug 2009 23:07:24 +0000 (16:07 -0700)]
mds: set primary lock state to LOCK from strong replica when appropriate
This is needed only because we identify_files_to_recover() before
sending the rejoin acks, and that may twiddle the lock state, so
we need to be in a compatible state.
Sage Weil [Tue, 4 Aug 2009 04:34:15 +0000 (21:34 -0700)]
mds: start resolve with root as UNKNOWN (if it's not ours)
Anything that's not ours should be unknown, including the root dir frag,
which normally starts out as mds0.
If we leave it as 0, then when mds0 claims a subset of /, its bounds are
left as 0 as well instead of being set to unknown, which leads to
incorrect resolve stage results.
Sage Weil [Mon, 3 Aug 2009 18:39:27 +0000 (11:39 -0700)]
kclient: use caps, fragtree only to choose mds (not hierarchy)
Since we require caps for all inodes in our cache, no need to consider
parents when identifying where to send a request. Just look at fragtree
(for fragmented dirs) or caps.
We need to make sure that the Fw bit is only cleaned after ack 2,
and Ax after ack 3. A single tid for the inode isn't sufficient,
since that would e.g. ignore ack 2... we need a tid per cap bit so
we can pipeline writeback of different caps.
Note that we can't simply write back dirty | flushing caps every
time, since the write may also be releasing the cap. And it would
gum up the MDS locking.
Move the last_tid to the inode, and only pay attention to 16 bits
per cap bit. That's 17*2 bytes, vs the old 16. Could be worse.
An 8 bit tid is probably also sufficient (that's 256 pipelined
writes) if we're concerned about inode size down the road.
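The per-cap-bit tid idea can be modelled roughly as follows. This is a Python sketch under assumptions, not the kernel code: the class, its method names, and the use of a dict are all hypothetical. It assumes each flush takes a new tid from a per-inode counter, that each cap bit remembers the 16-bit tid of the latest flush carrying it, and that an ack cleans a bit only when the tids match.

```python
class InodeCaps:
    """Toy model of per-cap-bit flush tids (all names are hypothetical)."""

    def __init__(self):
        self.last_tid = 0    # per-inode counter, truncated to 16 bits
        self.dirty = set()   # cap bits dirtied since the last flush
        self.flushing = {}   # cap bit -> tid of the latest flush carrying it

    def mark_dirty(self, bit):
        self.dirty.add(bit)

    def start_flush(self):
        """Send all dirty bits under a new tid and return that tid."""
        self.last_tid = (self.last_tid + 1) & 0xFFFF
        for bit in self.dirty:
            # A bit re-dirtied while still flushing takes the newer tid.
            self.flushing[bit] = self.last_tid
        self.dirty.clear()
        return self.last_tid

    def handle_ack(self, tid, bits):
        """Clean only the acked bits whose latest flush used this tid."""
        cleaned = set()
        for bit in bits:
            if self.flushing.get(bit) == tid:
                del self.flushing[bit]
                cleaned.add(bit)
        return cleaned
```

In this model, flushing Fw under tids 1 and 2 and then Ax under tid 3 means that ack 1 cleans nothing, ack 2 cleans Fw, and ack 3 cleans Ax. A single tid for the whole inode would have stored 3 and ignored ack 2.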
Sage Weil [Wed, 29 Jul 2009 22:51:54 +0000 (15:51 -0700)]
kclient: fix queue_cap_snap refs, calls in handle_snap
Call on correct ci. Skip past dropping inodes without dropping
our spinlock. Hold ref on prior inode until we traverse to the
next one. (We can't iput while holding our spinlock.)
Sage Weil [Wed, 29 Jul 2009 20:28:23 +0000 (13:28 -0700)]
kclient: nofail mode for osd writes
If nofail is specified, allocate request from mempool, and do not
return error on message send failure. Instead, mark the request,
and periodically retry.
This isn't perfect: we can still starve indefinitely trying to
send the write, but it'll do until we have a better way to reserve
resources for writeback messages.
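A rough Python model of the nofail behaviour described above. The classes, the -1 error value, and the retry_tick name are hypothetical, and the mempool allocation is not modelled. The sketch shows only the control flow: a failed send returns an error for ordinary requests, but marks a nofail request and retries it later.

```python
class Request:
    def __init__(self, payload, nofail=False):
        self.payload = payload
        self.nofail = nofail
        self.needs_resend = False  # the "mark" set when a send fails


class Client:
    """Toy model; `send` is any callable returning True on success."""

    def __init__(self, send):
        self.send = send
        self.pending = []  # marked nofail requests awaiting a retry

    def submit(self, req):
        if self.send(req.payload):
            return 0
        if not req.nofail:
            return -1  # ordinary requests report the send failure
        # nofail: swallow the error, mark the request, retry later
        req.needs_resend = True
        self.pending.append(req)
        return 0

    def retry_tick(self):
        """Called periodically; resend every marked request."""
        still_pending = []
        for req in self.pending:
            if self.send(req.payload):
                req.needs_resend = False
            else:
                still_pending.append(req)
        self.pending = still_pending
```

As the message notes, nothing here bounds how long a request can sit in the pending list if sends keep failing.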