Sage Weil [Thu, 9 Jun 2011 18:37:18 +0000 (11:37 -0700)]
mds: set or issue caps on lock state changes
Set pneed_issue (or issue ourselves) whenever we jump directly to the
target lock state. Make sure we only do it if there are caps (cap shift)
for this particular lock.
Sage Weil [Thu, 9 Jun 2011 17:43:13 +0000 (10:43 -0700)]
mds: make issue_caps from file_update_finish smarter
We do one funky thing in file_update_finish that only issues caps on a
single cap when max_size changes. This is the more common case we see. However,
if a lock changes state and we need to issue on the whole inode (for all
clients), avoid doing the cap-specific issue by checking the issue set.
Sage Weil [Thu, 9 Jun 2011 17:41:53 +0000 (10:41 -0700)]
mds: issue caps from drop_locks
In drop_locks, build a set of inodes we need to issue caps on. Then do it
all at once. This does two things:
- it fixes the fact that currently a dropped lock leading to an eval and
lock state change will not issue caps _at_all_
- it ensures we only issue_caps once for each inode, even when we are
dropping multiple locks on it.
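The collect-then-issue pattern described above can be sketched roughly as follows. This is an illustration only, not the MDS code: `Inode`, `Lock`, `issue_count`, `state_changed_on_drop`, and `drop_locks_sketch` are hypothetical stand-ins, and incrementing a counter takes the place of the real issue_caps call.

```cpp
#include <set>
#include <vector>

// Hypothetical stand-ins for illustration; these are not the real MDS types.
struct Inode {
  int ino = 0;
  int issue_count = 0;          // how many times caps were issued on this inode
};

struct Lock {
  Inode* parent;                // inode this lock belongs to
  bool state_changed_on_drop;   // assumed: dropping it caused a lock state change
};

// Drop every lock, remember which inodes need caps issued,
// then issue exactly once per inode.
inline void drop_locks_sketch(const std::vector<Lock>& locks) {
  std::set<Inode*> need_issue;
  for (const Lock& l : locks) {
    if (l.state_changed_on_drop)
      need_issue.insert(l.parent);  // the set de-duplicates repeated inodes
  }
  for (Inode* in : need_issue)
    ++in->issue_count;              // stands in for issuing caps on the inode
}
```

With two locks on the same inode both changing state, the inode still ends up in the set once, so the counter is bumped a single time.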
Sage Weil [Thu, 9 Jun 2011 17:03:08 +0000 (10:03 -0700)]
mds: pass pissue_caps through *lock_finish()
This allows *lock_finish() callers to handle the issue_caps themselves.
None of them do yet (this arg is still optional) so this patch has no
functional change (yet!).
Greg Farnum [Mon, 6 Jun 2011 20:43:37 +0000 (13:43 -0700)]
mds: xlock_finish should only do_issue in certain cases.
We accidentally (we think) initialized this variable to true when
we want it to be false: we should only do_issue if there aren't
any remaining locks, not in all cases.
De-globalize CephToolContext. It's important to do this now because the
constructor for CephToolContext references the configuration (via
CephContext.)
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
Greg Farnum [Wed, 8 Jun 2011 21:13:28 +0000 (14:13 -0700)]
mds: rename: remove illicit assert.
We actually do want witnesses who aren't auth for anything
to do journaling in some cases, so kill the assert.
That also negates the need for the not_journaling check.
Sage Weil [Wed, 8 Jun 2011 20:29:21 +0000 (13:29 -0700)]
mds: try_trim_non_auth_subtree if we rename a dir away from a non-auth subtree
It's possible we have non-auth metadata only because we have a subtree
nested beneath. If we rename a directory out of a non-auth subtree, we
should try to trim any non-auth content from that subtree that may now
be possible due to the child subtrees being linked elsewhere.
Fixes: #1146
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil [Wed, 8 Jun 2011 20:18:07 +0000 (13:18 -0700)]
mds: remove unlinked metadata from cache on replay
If we replay a metablob that unlinks something, throw it out immediately.
Recursively. This comes up when:
- we rename a file from one mds to another, and we replay the event on
the source mds. the inode gets thrown out.
- we rename a directory from one mds to another, and when journaled, the
source mds had no nested metadata. same thing: we throw it out. we
may have something in our cache nested beneath that, though, that was
since committed and such, but the fact that we didn't journal it being
reattached elsewhere implies that it was clean and gone when our event
was journaled, and we can throw it all out. recursively.
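As a rough illustration of "throw it out, recursively", the sketch below models the cache as a toy tree in which removing an entry drops everything nested beneath it. `Node`, `count_nodes`, and `remove_unlinked` are hypothetical names for this example and do not correspond to the MDS cache structures.

```cpp
#include <map>
#include <memory>
#include <string>

// Toy cache tree (hypothetical): each node owns its children by name.
struct Node {
  std::map<std::string, std::unique_ptr<Node>> children;
};

// Count a node plus everything nested beneath it.
inline int count_nodes(const Node& n) {
  int total = 1;
  for (const auto& kv : n.children)
    total += count_nodes(*kv.second);
  return total;
}

// Remove the child called `name` and its whole subtree.
// Returns how many nodes were dropped (0 if the child is not cached).
inline int remove_unlinked(Node& parent, const std::string& name) {
  auto it = parent.children.find(name);
  if (it == parent.children.end())
    return 0;
  int dropped = count_nodes(*it->second);
  parent.children.erase(it);  // unique_ptr ownership frees the subtree recursively
  return dropped;
}
```

Ownership through `std::unique_ptr` is a design choice of the sketch: erasing one map entry is enough to release all nested entries.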
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
libcommon uses symbols from the crypto libraries, so they must appear on
the link line whenever libcommon appears. Later, we may want to revisit
this dependency; however, right now, having unit tests that build
consistently is pretty important.
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
Sage Weil [Wed, 8 Jun 2011 03:48:52 +0000 (20:48 -0700)]
mds: open renamed import child frags during journal replay
Open up any child frags of the imported renamed inode that are noted in
the journal event. (Note we blindly open up that list here; it's up to the
journaler to only populate it when appropriate.) If the listed frags are
not already open, open them up and set the dir_auth to unknown; presumably
they belong to the rename source/exporter. If we already had them open,
then the adjust_subtree_after_rename call above will have caught them and
already done the necessary subtree adjustment.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil [Wed, 8 Jun 2011 03:46:42 +0000 (20:46 -0700)]
mds: journal open srci frags on srci import (master)
If we are importing the renamed inode, and it is a directory, journal a
list of all open dirfrags (currently, this is actually all frags) so that
we can open them up during journal replay.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil [Wed, 8 Jun 2011 03:43:29 +0000 (20:43 -0700)]
mds: journal renames on witnesses if we have nested subtrees
If a rename witness has any subtrees that are nested beneath the renamed
directory, we need to journal the rename event so that our cache is
properly updated on journal replay.
Further, if we are exporting srci, we also need to journal the dest
(even if we aren't auth for destdn) if we have any open dirfrags because
those will turn into nested subtrees shortly.
We still need to ensure that the cache is properly trimmed during replay.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil [Tue, 7 Jun 2011 16:41:56 +0000 (09:41 -0700)]
mds: fix/clean up xlock import/export
- create xlock import/export helpers
- fix/simplify checks: we want to export/import only xlocks on the inode
that is being migrated, unless they are locallock.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
MonClient should contain a KeyRing and a RotatingKeyRing. All the
MonClient users, except possibly csyn, don't want to manage those
objects themselves.
Don't chdir until after we have opened the KeyRing. If the KeyRing is at
a relative path, a chdir may make it inaccessible. Separate the chdir
function from the daemonize function.
Refactor the cmds argument parsing a little bit. Separate the special
actions from the normal operations of the daemon.
This should allow librados and libceph to support CephX finally! yay!
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
This commit is just erroneous. It adds checks on a pipe write
for the result and an abort if the write failed. But that's broken
in the desired case where we succeed, block on ceph_fuse_ll_main(),
and the parent process is long-gone by the time we get to this code!
Greg Farnum [Fri, 3 Jun 2011 18:53:10 +0000 (11:53 -0700)]
rados_bencher: re-add written objects constraint to read benchmark.
Somehow, in the last major change, the constraints that kept the
bencher from trying to read non-existent objects got removed. Put
a check back in the main bench loop to fix that.
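The kind of constraint being restored might look like the sketch below, which caps the read plan at the number of objects the write phase produced. `plan_reads` and its parameters are hypothetical and are not taken from the rados bencher source.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: return the object indices a read benchmark may touch,
// never going past the objects that were actually written.
inline std::vector<int> plan_reads(int requested_reads, int objects_written) {
  std::vector<int> plan;
  int limit = std::min(requested_reads, objects_written);
  for (int i = 0; i < limit; ++i)
    plan.push_back(i);          // only indices of existing objects
  return plan;
}
```

If nothing was written, the plan is empty and the read loop has no non-existent object to request.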
Greg Farnum [Fri, 3 Jun 2011 16:53:20 +0000 (09:53 -0700)]
mds: Clean up _rename_prepare journaling
This has been broken for a while in terms of journaling
things the MDS isn't auth for. This patch should fix that, and
adds a few asserts to that effect.
Also adds a new not_journaling flag to _rename_prepare
for those cases which call the function and then discard
the bufferlist results.
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Greg Farnum [Thu, 2 Jun 2011 21:27:23 +0000 (14:27 -0700)]
uclient: reset flushing_caps on (mds) cap import.
Previously, we could get stuck thinking that we'd flushed caps
(that went to the original MDS, waited on freeze for export,
and then were dropped) without ever telling the auth MDS that we
wanted to do so. This caused hung shutdowns:
1) during shutdown we drop all our caps
2) we get stuck and notice that we have a flushing cap
3) we send cap flush
4) MDS ignores it (I think because actual data already got updated?
and now we don't have the proper caps either)
Greg Farnum [Thu, 2 Jun 2011 18:43:05 +0000 (11:43 -0700)]
uclient: don't use racy check for uncommitted data.
Previously we used a check for if there were CEPH_CAP_FILE_BUFFER refs,
but that was racy if we had other threads (they could hold caps for
sync writes or something). Instead, see if we have any in-flight
writes or uncommitted objects.
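A minimal sketch of the difference between the two checks, under the assumption that the client tracks counters along these lines. `InodeState`, its fields, and `has_uncommitted_data` are hypothetical names, not the uclient's actual members.

```cpp
// Hypothetical per-inode bookkeeping for illustration.
struct InodeState {
  int buffer_cap_refs = 0;      // refs on the buffer cap; other threads may hold these
  int inflight_writes = 0;      // writes sent but not yet acknowledged
  int uncommitted_objects = 0;  // objects written but not yet committed
};

// Decide based on the data itself rather than on who holds cap references.
inline bool has_uncommitted_data(const InodeState& s) {
  return s.inflight_writes > 0 || s.uncommitted_objects > 0;
}
```

In this model a thread holding a cap reference for a sync write does not make the inode look dirty, which is the race the commit message describes.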