Build tests (that check if there are unresolved symbols in libraries)
can slow down the build a lot. We should only enable them when
developers need them.
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
Sage Weil [Wed, 13 Jul 2011 00:27:41 +0000 (17:27 -0700)]
mds: move to MIX state if writer wanted and no wanted loner
We can just look at the target loner here, which also takes any caps wanted
by other replicas on other MDSs into account. Otherwise we need to
duplicate the CInode::calc_ideal_loner() logic.
This assumes the loner field is accurate.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil [Wed, 13 Jul 2011 20:18:38 +0000 (13:18 -0700)]
mds: migrate loner_cap state
It is tedious to infer what the old loner_cap was pre-migration. Just send
it over the wire and set it explicitly. Usually when we eval() we would
have come to the same conclusion, but when we didn't, we got into
inconsistent/impossible states where the issued caps don't match the loner
state (Asx issued but no loner_cap set). That meant the next issue was
a revocation with no lock state change, which led to yet more problems
down the line.
Sage Weil [Tue, 12 Jul 2011 23:49:25 +0000 (16:49 -0700)]
mds: verify deferred messages aren't stale
We may defer processing of some messages because we are laggy (in hearing
from the monitor). When we eventually get to those messages, make sure
they haven't since become stale (i.e., the source mds isn't now down).
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil [Mon, 11 Jul 2011 18:35:15 +0000 (11:35 -0700)]
mds: only issue xlocker_caps if we are also the loner
We cannot issue caps to a client purely because they have something
xlocked, because we do not revoke caps when we drop the xlock. However,
if they are a loner AND have the object xlocked, we can; this is why the
xlock release code moves to either the LOCK or EXCL state.
Remember, the goal here is to issue caps when we do operations on objects
that xlock them (e.g. setattr, mknod) and move directly to the EXCL state
afterward. That only works (or makes sense) when we are the lone client
with caps.
Fix the get_caps_allowed_by_type() helper to do this properly.
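The rule above can be sketched as a tiny predicate (a hypothetical illustration with made-up names, not the actual get_caps_allowed_by_type() code):

```cpp
// Hypothetical sketch of the rule described above: caps may be issued
// via the xlocker path only when the client is also the loner.
// Not the real Ceph get_caps_allowed_by_type() implementation.
bool allow_xlocker_caps(bool client_is_xlocker, bool client_is_loner) {
  // Holding an xlock alone is not enough, because caps are not
  // revoked when the xlock is dropped.
  return client_is_xlocker && client_is_loner;
}
```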
Sage Weil [Sun, 10 Jul 2011 21:15:38 +0000 (14:15 -0700)]
mds: rely on master to do anchor locks for slave_link_prep
The replica can't take all these locks without confusing things, since it
may need to unlock/relock, may screw up auth_pins, and worse. The master
can take the locks.
The only problem is that the master may not know if the inode has already
been anchored if the lock hasn't cycled since then. In that case, we take
more locks than we need to.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil [Sun, 10 Jul 2011 21:05:10 +0000 (14:05 -0700)]
mds: defer lock eval if freezing or frozen
We were only deferring if frozen, but we need to defer when freezing too,
because of the way cap messages are deferred. We defer cap messages if
- inode is frozen
- inode is freezing and locks are stable (to avoid starvation)
So if we are in a stable freezing state and start deferring caps, we can't
twiddle locks further or else we can
- potentially starve (okay, in rare cases)
- get stuck because we already started deferring cap messages
We would also screw up the cap message ordering if we became unstable again
and were allowed to start processing cap messages while others were still
deferred.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
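The two deferral conditions above can be sketched as a small predicate (illustrative names only, not the actual MDS code):

```cpp
// Illustrative sketch of the cap-message deferral rule described
// above; the names are hypothetical, not real Ceph MDS fields.
bool defer_cap_message(bool frozen, bool freezing, bool locks_stable) {
  if (frozen)
    return true;                 // inode is frozen: always defer
  if (freezing && locks_stable)
    return true;                 // freezing + stable locks: defer
  return false;                  // otherwise process normally
}
```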
Sage Weil [Fri, 8 Jul 2011 16:32:04 +0000 (09:32 -0700)]
mds: take a remote_wrlock on srcdir for cross-mds rename
This ensures that we hold a wrlock on the srcdn auth when the slave
makes its changes to the src directory, and prevents us from corrupting
the scatterlock state.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil [Fri, 8 Jul 2011 16:30:29 +0000 (09:30 -0700)]
mds: implement remote_wrlock
For the rename code to behave, we need to hold a wrlock on the slave node
to ensure that any racing gather (mix->lock) is not sent prior to the
_rename_prepare() running; otherwise we violate the locking rules and
corrupt rstats.
Implement a remote_wrlock that will be used by rename. The wrlock is held
on a remote node instead of the local node, and is set up similarly to
remote_xlocks.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil [Wed, 6 Jul 2011 20:48:35 +0000 (13:48 -0700)]
mds: add mix->lock(2) state
There is a problem with the wrlocks and cross-mds renames:
- master (dest auth, srci auth, srcdir replica) takes wrlock on srcdiri
- something triggers a srcdiri lock, putting inest/ifile lock in mix->lock
state
- slave (srcdir auth) sends LOCKACK
- master sends prepare_rename
- slave (srcdir auth) does rename prepare, which modifies srcdir
Even though the master holds a wrlock on the srcdiri, the gather starts
immediately and the slave sends the LOCKACK before the master's wrlock is
released.
To fix this, we add a new mix->lock(2) state, and we do not start the
mix->lock gather from replicas until the local gather completes, _after_
the auth's wrlock is released. This makes the master's wrlock sufficient
to ensure the prepare_rename on the slave is safe.
This also works when the slave is the srci auth, since the gather won't
complete until the master releases its wrlock. BUT, it does NOT work if a
third MDS is the srcdiri auth, since it can still gather from the slave
prior to the master releasing its wrlock.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil [Thu, 7 Jul 2011 21:13:14 +0000 (14:13 -0700)]
mon: fix up pending_inc pool op mess
You can't look at pending_inc in preprocess methods, or return an error
based on pending_inc before it commits. Fix up the snap-related error
checking.
Sage Weil [Thu, 7 Jul 2011 20:35:32 +0000 (13:35 -0700)]
mds: set old and new dentry lease bits
Recent kernels got the new CEPH_LOCK_DN definition but we were still
setting the old bit. Set both so we work with both classes of clients. In
the meantime, update the kernel to ignore this field so that eventually we
can drop/reuse it.
qa: mds rename: account for not being in the ceph root dir
We need to know the Ceph absolute path. We can't actually
derive that for sure (if we aren't mounted into the root), but this
at least lets us deal with being in our own subdirectories.
qa: mds rename: Rework so it will function in teuthology as a workunit:
- work in the current directory, not a hardcoded mnt path
- use the CEPH_TOOL variable rather than a hardcoded local executable
- pass CEPH_ARGS to scripts so you don't need to export it into the environment
Sage Weil [Tue, 5 Jul 2011 21:22:24 +0000 (14:22 -0700)]
mds: always clear_flushed() after finish_flush()
The scatter_writebehind_finish() is always followed up by an eval_gather(),
which does the clear_flushed(). For everyone else (replicas!), we need to
clear it immediately to avoid confusing things later.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil [Tue, 5 Jul 2011 20:33:40 +0000 (13:33 -0700)]
context: implement complete()
finish() requires the caller to delete. complete() does that for you by
calling finish() and then doing delete this. Unless you overload it and
do something else. This will allow us to make Contexts that are reusable,
for example by overloading complete() instead of finish() and managing
the lifecycle in some other way.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
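The finish()/complete() split described above can be sketched like this (a minimal illustration; the real Ceph Context class differs in detail):

```cpp
// Minimal sketch of the finish()/complete() pattern described above.
struct Context {
  virtual ~Context() {}
  virtual void finish(int r) = 0;
  // complete() runs finish() and then deletes the object, so the
  // caller no longer has to.  A reusable Context could override
  // complete() instead and manage its lifecycle some other way.
  virtual void complete(int r) {
    finish(r);
    delete this;
  }
};

// Toy Context that records the result code it was finished with.
struct Recorder : public Context {
  int *slot;
  explicit Recorder(int *s) : slot(s) {}
  void finish(int r) override { *slot = r; }
};

int demo() {
  int result = -1;
  (new Recorder(&result))->complete(42);  // no manual delete needed
  return result;
}
```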
Sage Weil [Tue, 5 Jul 2011 15:58:26 +0000 (08:58 -0700)]
mds: fix file_excl assert
If we are in XSYN state and want to move to anything else, we must go via
EXCL, but we may not be loner anymore. Weaken the file_excl() assert so we
don't crash.
Reported-by: Fyodor Ustinov <ufm@ufm.su>
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Fri, 1 Jul 2011 06:17:39 +0000 (23:17 -0700)]
mon: add 'osd create [id]' command
If the id is specified, mark a non-existent osd rank as existent. The id
must fall within the current [0,max) range. This is the counterpart of
'osd rm <id>'.
If the id is not specified, allocate an unused osd id and set the EXISTS
flag. Increase max_osd as needed.
Closes: #1244
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Fri, 1 Jul 2011 05:04:42 +0000 (22:04 -0700)]
mds: fix off-by-one in cow_inode vs snap flushes
We need to wait for the client to flush snapped caps if the client has
not already flushed for the given snap. If the client has already flushed
caps through the last snapid for the old inode, we do not need to set up
the snapped inode's locks to wait for that.
This fixes an occasional hang on the snaps/snaptest-multiple-capsnaps.sh
workunit.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil [Thu, 30 Jun 2011 20:44:24 +0000 (13:44 -0700)]
client: only send one flushsnap per mds session
This mirrors a kclient change a while back (e835124).
We only want to send one flushsnap cap message per MDS session:
- it's a waste to send multiples
- the mds will only reply to the first one
If the mds restarts we need to resend.
This fixes a hang where we send multiples, the first (and only) reply is
ignored (due to tid mismatch), and we are left with dangling references to
the inode and hang on umount. (Reliably reproduced by running the full
snaps/ workunit directory.)
Fixes: #1239
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Add an activate() function that must be called before we call the
onfinish callback. This is especially important in multi-threaded
contexts, since otherwise if completions come in in the wrong order, we
may delete the C_Gather object right before calling new_sub on it!
Also delete rm_subs because it is redundant with sub_finish.
Finally, num_subs_created, num_subs_remaining are now methods on
C_GatherBuilder rather than C_Gather.
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
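The activate() idea above can be sketched as follows (hypothetical names that mirror the commit message; this is not the actual C_Gather/C_GatherBuilder API):

```cpp
#include <functional>

// Sketch of the activate() gate described above: the final callback
// may only fire after construction is marked done, so a sub that
// completes early cannot destroy the gather while the builder is
// still calling new_sub().
struct Gather {
  int remaining = 0;
  bool activated = false;
  std::function<void()> onfinish;

  void new_sub()    { ++remaining; }
  void sub_finish() { --remaining; maybe_finish(); }

  // Construction is done; the callback is now allowed to fire.
  void activate()   { activated = true; maybe_finish(); }

private:
  void maybe_finish() {
    if (activated && remaining == 0 && onfinish) {
      auto cb = std::move(onfinish);
      onfinish = nullptr;
      cb();
    }
  }
};

bool demo() {
  bool fired = false;
  Gather g;
  g.onfinish = [&] { fired = true; };
  g.new_sub();
  g.sub_finish();       // all subs done, but not yet activated
  bool before = fired;  // callback has not fired yet
  g.activate();         // now it fires
  return !before && fired;
}
```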