Sage Weil [Tue, 3 May 2011 19:34:54 +0000 (12:34 -0700)]
osdmap: allow incremental to represent osd deletion
Convert new_down to new_state, with values xored onto the old state. We
preserve compatibility with old incrementals because they were (virtually)
always 0, and we can special case that to mean toggle CEPH_OSD_UP. We
don't really care if clients get new values right; if they don't clear
the EXISTS flag that doesn't really hurt them. It's only important that
the monitor get it right.
To ensure that, we rev the monitor internal protocol.
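
The xor scheme described above can be sketched roughly as follows. This is a minimal illustration, not the actual OSDMap code: the flag values, the dict-based state, and the function name apply_new_state are assumptions made for the example.

```python
# Assumed flag values for illustration; only the names come from the commit message.
CEPH_OSD_EXISTS = 1 << 0
CEPH_OSD_UP = 1 << 1


def apply_new_state(osd_state, new_state):
    """Apply an incremental's new_state (osd id -> xor mask) to osd_state.

    Returns a new dict and leaves the input untouched.
    """
    result = dict(osd_state)
    for osd, mask in new_state.items():
        if mask == 0:
            # Old incrementals carried new_down entries that were (virtually)
            # always 0, so 0 is special-cased to mean "toggle CEPH_OSD_UP".
            mask = CEPH_OSD_UP
        result[osd] = result.get(osd, 0) ^ mask
    return result
```

Under these assumptions, deleting an osd that is currently down is just an incremental whose mask is CEPH_OSD_EXISTS: the xor clears the flag without needing a separate "removed" list.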
Sage Weil [Tue, 3 May 2011 01:19:32 +0000 (18:19 -0700)]
cfuse: encode/decode dev_t properly
The fuse layer passes through "encoded" dev_t values (probably for
compatibility reasons or something). I copied the encode/decode methods
from the kernel and encode/decode the st_rdev values where appropriate
(where struct stat is exposed directly or via the fuse_entry_param
struct).
Fixes: #1031
Signed-off-by: Sage Weil <sage@newdream.net>
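
For reference, the kind of dev_t packing involved can be sketched as below. This follows the layout of the Linux kernel's new_encode_dev/new_decode_dev helpers as best I recall them (12-bit major, 20-bit minor); the Python function names are hypothetical, and the code actually copied into cfuse may differ in detail.

```python
def encode_dev(major, minor):
    """Pack (major, minor) into the 32-bit "encoded" form.

    Low 8 bits of minor go in bits 0-7, major in bits 8-19,
    and the remaining minor bits in bits 20-31.
    """
    return (minor & 0xff) | (major << 8) | ((minor & ~0xff) << 12)


def decode_dev(dev):
    """Inverse of encode_dev: return (major, minor)."""
    major = (dev & 0xfff00) >> 8
    minor = (dev & 0xff) | ((dev >> 12) & 0xfff00)
    return major, minor
```

A round trip such as decode_dev(encode_dev(8, 1)) gives back (8, 1), which is the property the st_rdev handling relies on.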
MDS: fix handle_client_rename use of path_traverse.
It was using the MDS_TRAVERSE_DISCOVERXLOCK flag, which allows
path_traverse to return success if it encounters a NULL dentry. When
we're looking for a source inode, though, that doesn't work out! We
want MDS_TRAVERSE_DISCOVER, which will go away and look for the dentry
on other inodes but requires a linked dentry, not a NULL one.
Sage Weil [Sat, 30 Apr 2011 00:30:45 +0000 (17:30 -0700)]
mds: trim non-auth swallowed subtrees during resolve
Consider:
- peer auth for /foo
- ambiguous import /foo/bar
- peer claims /foo, swallows /foo/bar.
- disambiguate_imports sees we didn't get /foo/bar, cancels ambiguous
import.
-> we are left with /foo/bar (and content) in cache, even though it is
non-auth.
Fix by pulling the try_trim_non_auth_subtree() back out of
cancel_ambiguous_import, and trimming the containing subtree in the
disambiguate (resolve completion) case. (For the journal replay case the
subtree structure is deterministic and no such check is needed.)
Sage Weil [Thu, 28 Apr 2011 22:17:18 +0000 (15:17 -0700)]
mds: ignore fragment_notify when dft state doesn't match
In particular, if there is a resolve in there somewhere, we may have found
out about this refragment from the src because they send resolve messages
to all nodes (to resolve ambiguous migrations). If that's the case we
can ignore the message.
Sage Weil [Thu, 28 Apr 2011 20:44:55 +0000 (13:44 -0700)]
mds: try_trim_non_auth_subtree on any canceled import (including resolve)
We were trimming on journal replay of an import failure, but not on a
canceled ambiguous import during resolve. Fix that by moving the call into
the helper (and passing a CDir* instead of a dirfrag_t).
Sage Weil [Thu, 28 Apr 2011 20:22:30 +0000 (13:22 -0700)]
mds: fix steal_dentry dir_auth_pins adjustment
Pass down the correct value for dir_auth_pins (dh->auth_pins plus the
inode's auth_pins, but nothing nested beneath the inode). The CDentry
doesn't track dir auth pins independently, and doesn't really need to.
Sage Weil [Thu, 28 Apr 2011 20:00:44 +0000 (13:00 -0700)]
mds: fix export_prep trace format
The prep message includes a spanning tree in the interior of the subtree
that includes all parent inodes of bounding dirfrags. That used to look
like
df dentry inode (dir dentry inode)*
The code to generate those traces was stopping if the df->ino had already
been included. The problem was that we may have done that inode on a
different dirfrag.
Fix the format so that we can start with a dentry (already had the dirfrag, same check
as before) or a dirfrag (already had the inode, the new case), or a '-'
(nothing at all). A single byte is used to indicate which it is and how
to start decoding.
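
A decoder for the leading byte might dispatch along these lines. This is only a sketch: the marker values 'd' and 'f' and the function name are assumptions for illustration ('-' is the only marker the message names), and the real trace payload is an encoded bufferlist rather than Python bytes.

```python
# Hypothetical one-byte start markers; only '-' is named in the message above.
START_DENTRY = b"d"    # receiver already has the dirfrag
START_DIRFRAG = b"f"   # receiver already has the inode
START_NOTHING = b"-"   # nothing at all


def first_record(trace):
    """Return which record kind the decoder reads first, or None for '-'."""
    start = trace[0:1]
    if start == START_DENTRY:
        return "dentry"
    if start == START_DIRFRAG:
        return "dirfrag"
    if start == START_NOTHING:
        return None
    raise ValueError("unknown trace start marker: %r" % start)
```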
Sage Weil [Thu, 28 Apr 2011 00:07:54 +0000 (17:07 -0700)]
mds: handle freeze completion delayed by frozen inode
We can't complete a freeze_tree if we are not a subtree and the parent
inode is frozen. If that's the case, we were just doing nothing on the
auth_unpin, but that means the freeze_tree would never complete.
Instead, retake an auth_pin (on behalf of the parent) and release it when
the parent inode unfreezes.
Sage Weil [Wed, 27 Apr 2011 23:33:21 +0000 (16:33 -0700)]
mds: fix nested_auth_pin accounting on refragment
The diri gets an auth_pin on the first frag pin when it is not a subtree
root. When we are moving dentries between frags during refragment, make
sure we use the adjust_nested_auth_pins method to have one such pin per
fragment.
Carry an auth_pin on the old fragment for the duration to ensure that the
pinning/unpinning has no side-effects.
Sage Weil [Wed, 27 Apr 2011 22:52:26 +0000 (15:52 -0700)]
mds: maintain dn pinning invariants during freezing for refragmenting
fragment_mark_and_complete aims to complete the in-cache directory,
mark+pin every dentry, then drop a final auth_pin so that the whole thing
freezes. The problem is we may not be holding the final auth_pin, and
other dentries may get added (or removed?) between the mark and freeze
stages.
Use the DNPINNEDFRAG dir state bit to maintain the invariant that that
bit is set IFF all dentries are similarly pinned and marked. Update the
add_*_dentry and remove_dentry methods to do that.
Fix the success path to assert this was true and to clean up(!). Also
fix the unwind/failure path to assert.
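
The invariant can be modelled in a few lines. The sketch below is a toy model, not the CDir code: the class FragDir, its methods, and the per-dentry boolean are all hypothetical stand-ins for the real pin and state-bit bookkeeping.

```python
class FragDir:
    """Toy directory that keeps a DNPINNEDFRAG-style bit consistent."""

    def __init__(self):
        self.pinned = {}           # dentry name -> is it pinned and marked?
        self.dnpinnedfrag = False  # stand-in for the DNPINNEDFRAG state bit

    def mark_and_pin_all(self):
        for name in self.pinned:
            self.pinned[name] = True
        self.dnpinnedfrag = True

    def unmark_all(self):
        # Cleanup on both the success and the unwind/failure path.
        for name in self.pinned:
            self.pinned[name] = False
        self.dnpinnedfrag = False

    def add_dentry(self, name):
        # A dentry added between mark and freeze inherits the pinned state.
        self.pinned[name] = self.dnpinnedfrag

    def remove_dentry(self, name):
        # Removing a dentry drops its pin along with it.
        del self.pinned[name]

    def invariant_holds(self):
        # Bit set: every dentry pinned. Bit clear: no dentry pinned.
        if self.dnpinnedfrag:
            return all(self.pinned.values())
        return not any(self.pinned.values())
```

Routing every add and remove through the same bit is what lets the success and failure paths assert the invariant instead of re-scanning and hoping.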
Sage Weil [Wed, 27 Apr 2011 22:09:35 +0000 (15:09 -0700)]
mds: freeze fragments during split/merge
Freeze the target fragment(s) before unfreezing the old fragment(s) to
avoid any weird events going off when the unfreeze unauth_pins the dir
inode (in certain cases). This makes the whole process cleaner and more
symmetrical.
Tommi Virtanen [Wed, 27 Apr 2011 19:16:52 +0000 (12:16 -0700)]
automake: Make debug targets known but not built by default in non-debug builds.
With this, "./configure --without-debug && make -C src testceph" will work.
Before this, it would use make builtin rules, and fail to compile in a
confusing manner.
Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com>
Sage Weil [Wed, 27 Apr 2011 17:59:40 +0000 (10:59 -0700)]
mds: handle discovers that race with refragmenting
Consider:
- send discover on frag X
- X refragments
- we take the waiter and rediscover on frag Y
- we get the reply for the X discover
The auth mds will correctly delay sending the reply until the refragment
completes and it unfreezes, but the reply was getting the original frag_t,
not the new one.
Jim Schutt [Tue, 26 Apr 2011 22:06:46 +0000 (15:06 -0700)]
configure.ac: check for supported compiler flags
Ancient versions of gcc, such as the gcc 4.1.2 in RHEL 5.5, don't
support some -W flags that newer versions do. Fix up configure.ac
and Makefile.am to use them if you have them.
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
Sage Weil [Tue, 26 Apr 2011 21:43:24 +0000 (14:43 -0700)]
mon: fix standby-replay assignment logic
Assign a standby-replay at any time based on rank, name, or no preference.
Previously this could only happen when the MDS first started, and we would
fail if the target MDS wasn't followable at that point in time.
Implement RadosStore, a storage backend which accesses librados
directly, without going through RGW (Rados GateWay).
This version is still very preliminary because ACLs aren't supported.
We need ACLs even to do things like properly create buckets.
Instead, this version has ACL_HACK, which is just for testing purposes.
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
Sage Weil [Tue, 26 Apr 2011 19:10:12 +0000 (12:10 -0700)]
osd: add RWORDERED osd op flag
Order this op wrt reads the same way a read-modify-write would be.
(Otherwise we may get a fast/stale read result on a not-yet-complete
write.)
This fixes a problem where the Filer was marking a probe stat as a write
to get this same effect, but the OSD would EINVAL if it was a snapped
object (which happens in certain cases where the MDS is recovering the
file size of a snapped file).
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
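
The effect of the flag can be illustrated with a small model. Everything here is assumed for the example (the flag values, the function name, and the three outcomes); it only mirrors the behaviour described in the message, not the OSD's actual op handling.

```python
# Hypothetical flag values; only the RWORDERED name comes from the commit message.
FLAG_READ = 1 << 0
FLAG_WRITE = 1 << 1
FLAG_RWORDERED = 1 << 2


def classify_op(flags, snapped_object):
    """Return how this toy OSD treats an op with the given flags."""
    if (flags & FLAG_WRITE) and snapped_object:
        # Writes to snapped objects are rejected, which is what broke the
        # Filer's trick of marking its probe stat as a write.
        return "EINVAL"
    if flags & (FLAG_WRITE | FLAG_RWORDERED):
        # Ordered behind not-yet-complete writes, like a read-modify-write.
        return "ordered"
    # A plain read may be answered early and so may see a stale result.
    return "unordered"
```

In this model a probe stat sent as FLAG_READ | FLAG_RWORDERED gets the ordering it needs on a snapped object without tripping the write check.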