Sage Weil [Fri, 1 Oct 2010 05:00:06 +0000 (22:00 -0700)]
osd: fix recovery_primary loop on local clone
When we take the clone branch, we update the missing map. This invalidates
our current iterator, which can cause badness. Instead, increment the
iterator near the top of the loop so we don't have to worry about it.
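The pattern described above can be illustrated with a minimal sketch. It is not the actual recovery_primary code: `MissingMap`, `recover_local`, and the even-version "can clone locally" test are hypothetical stand-ins, and the only assumption carried over from the commit message is that the loop body may modify the container being iterated.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the missing map: object name -> needed version.
using MissingMap = std::map<std::string, int>;

// Walk the map, removing entries that can be recovered locally.
// Returns the names that were removed.
std::vector<std::string> recover_local(MissingMap& missing) {
  std::vector<std::string> recovered;
  auto p = missing.begin();
  while (p != missing.end()) {
    // Copy what we need and advance first, so nothing below depends on
    // the iterator that pointed at the current entry.
    const std::string name = p->first;
    const int version = p->second;
    ++p;

    if (version % 2 == 0) {   // hypothetical "can clone locally" condition
      missing.erase(name);    // invalidates only the erased element
      recovered.push_back(name);
    }
  }
  return recovered;
}
```

Because std::map::erase invalidates only iterators to the erased element, the already-advanced iterator stays valid for the next pass.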
Sage Weil [Tue, 21 Sep 2010 20:44:02 +0000 (13:44 -0700)]
mds: do full pre_dirty()/mark_dirty() on cowed dentries
The dir commit/fetch and LogSegment::try_to_expire() rely on any new or
modified items in the directory getting new versions that correspond to a bump in
the dirfrag version. This must include dentries/inodes that are created
by the cow process, or else we have problems during dir commit/fetch or
segment expire.
Change the dirty list in the Mutation to include the pv so that we can
properly mark them dirty later.
Leave the inode one alone. We could theoretically do the same for the
dirty inodes, but this way we avoid projecting them and copying stuff
around. Any dirty cowed inode will also have a dirty dentry, so it will
still get saved regardless.
Sage Weil [Tue, 21 Sep 2010 20:54:00 +0000 (13:54 -0700)]
mds: only return pdnvec for full path_traverse
We should only return the pdnvec for a full traverse, i.e., either a
success, or a failure in which we instantiate a null dn for the trailing
entry. This makes pdnvec well defined, and allows callers like
rdlock_path_pin_ref() to reply with a null lease when appropriate.
Sage Weil [Tue, 21 Sep 2010 02:59:00 +0000 (19:59 -0700)]
mds: don't instantiate null dentries for snapped namespace
The dentry needs a [first,last] range and we don't know what first is when
we miss a lookup. And part of the point of instantiating null dentries is
to issue leases against them, which we don't do. The client will cache
the null result.
Sage Weil [Tue, 17 Aug 2010 19:16:02 +0000 (12:16 -0700)]
mds: drop x/wrlocks before, rdlocks after sending reply
This lets us issue the most leases/caps possible. It also ensures we can
issue caps in the snapped namespace when we are still on the head inode
(previously, releasing the rdlock twiddled the state, the client didn't
get, say, Frc, and hung indefinitely).
Sage Weil [Fri, 17 Sep 2010 16:46:29 +0000 (09:46 -0700)]
mds: touch missed dentry when fetching dir on path traverse
We can get into a loop when doing a path traverse if we miss on a large
directory and then end up trimming the result we need before handling the
original request. To avoid this, we simply put the wanted dentry at the
top of the LRU (instead of midpoint).
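A toy model of why insertion position matters here, assuming a two-segment LRU in which trimming evicts midpoint-inserted items before top-inserted ones. `ToyLRU` and its method names are hypothetical and do not mirror the MDS cache API.

```cpp
#include <cstddef>
#include <list>
#include <string>

// Toy two-segment LRU: "top" holds hot items, "bot" holds items that were
// inserted at the midpoint. Trimming evicts from the tail of bot first.
class ToyLRU {
 public:
  void insert_top(const std::string& k) { top_.push_front(k); }
  void insert_mid(const std::string& k) { bot_.push_front(k); }

  // Evict up to n items, coldest first; returns how many were evicted.
  std::size_t trim(std::size_t n) {
    std::size_t evicted = 0;
    while (evicted < n && !bot_.empty()) { bot_.pop_back(); ++evicted; }
    while (evicted < n && !top_.empty()) { top_.pop_back(); ++evicted; }
    return evicted;
  }

  bool contains(const std::string& k) const {
    for (const auto& x : top_) if (x == k) return true;
    for (const auto& x : bot_) if (x == k) return true;
    return false;
  }

 private:
  std::list<std::string> top_;
  std::list<std::string> bot_;
};
```

In this model, a wanted dentry inserted at the top survives a trim that removes the rest of a freshly fetched directory, so the retried traverse can find it.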
Greg Farnum [Thu, 26 Aug 2010 21:04:45 +0000 (14:04 -0700)]
client: Make truncation work properly
The previous if block didn't work because inode->size was usually
changed well before handle_cap_trunc was ever invoked, so it never
did the truncation in the objectcacher! This was okay if you just truncated
a file and then closed it, but if you wrote a file, truncated part of it out,
and then wrote past the (new) end you would get reads that returned
the previously-truncated data out of what should have been a hole.
Now, we do the actual objectcacher truncation in update_inode_file_bits,
because all methods of truncation will move through there and this maintains
proper ordering.
Sage Weil [Thu, 16 Sep 2010 23:15:30 +0000 (16:15 -0700)]
osd: copy truncate_seq et al to clone oi
These fields are logically object attributes that should be preserved
across the clone COW process. (Not copying truncate_seq in particular
corrupts snapshot file data, depending on the order of arrival of racing
trimtrunc and writes.)
Sage Weil [Thu, 16 Sep 2010 22:50:50 +0000 (15:50 -0700)]
osd: fix is_pool_snaps_mode() for empty pools
The data pool in particular has seq 0 and (initially) no removed snaps. We
must not return true for that case, or else the OSD will use an empty
pool snap context and not the user/mds provided one.
Sage Weil [Thu, 9 Sep 2010 22:37:09 +0000 (15:37 -0700)]
osdmap: allow blacklist of an entire ip
We can blacklist either a specific instance (1.2.3.4:1234/5678) or an
entire IP, in which case the table has something like "1.2.3.4:0/0" (a port
and nonce of 0).
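A minimal sketch of the two-step lookup this implies: match the exact instance first, then fall back to the whole-IP entry with port and nonce 0. The `Addr` struct and `is_blacklisted` function are hypothetical simplifications, not the OSDMap types.

```cpp
#include <cstdint>
#include <set>
#include <string>
#include <tuple>

// Hypothetical simplified address: ip:port/nonce.
struct Addr {
  std::string ip;
  std::uint16_t port;
  std::uint32_t nonce;
  bool operator<(const Addr& o) const {
    return std::tie(ip, port, nonce) < std::tie(o.ip, o.port, o.nonce);
  }
};

// True if the exact instance is listed, or if the whole IP is listed
// via an entry with port 0 and nonce 0.
bool is_blacklisted(const std::set<Addr>& blacklist, const Addr& a) {
  if (blacklist.count(a) > 0) return true;
  return blacklist.count(Addr{a.ip, 0, 0}) > 0;
}
```

With this layout a single table can hold both kinds of entry, and blacklisting an IP does not require enumerating its instances.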
Sage Weil [Thu, 9 Sep 2010 17:56:22 +0000 (10:56 -0700)]
mon: handle subscribe to osdmap=1
We would send an incremental for anything >1, or the latest map, but not
osdmap e1 itself. Fix the condition, and make send_incremental() smart
about starting with the full map at 1 as needed.
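One way to picture the fixed behavior, as a sketch only: a subscriber starting at epoch 1 gets the full map for e1 followed by incrementals, while a later start gets incrementals alone. `MapMsg` and `plan_updates` are hypothetical names, and the assumption that e1 is only available as a full map is taken from the wording above rather than from the monitor code.

```cpp
#include <vector>

// Hypothetical description of one map message to send.
struct MapMsg {
  bool full;       // true: full map, false: incremental
  unsigned epoch;
};

// Plan the messages needed to bring a subscriber from `first` to `latest`.
std::vector<MapMsg> plan_updates(unsigned first, unsigned latest) {
  std::vector<MapMsg> out;
  if (first == 0 || first > latest) return out;  // nothing to send
  unsigned e = first;
  if (e == 1) {
    // Assumed: epoch 1 must be sent as a full map.
    out.push_back({true, 1});
    ++e;
  }
  for (; e <= latest; ++e) out.push_back({false, e});
  return out;
}
```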
Sage Weil [Thu, 9 Sep 2010 17:01:24 +0000 (10:01 -0700)]
mds: fix journal replay of session close->open after reconnect
If the client reconnects, the journal 'close' replay doesn't remove the
session, which leaves the session state intact. It needs to reset it in
that case, or else we get problems if the session is reopened and the
state doesn't match up.
Reported-by: Nat N <phenisha@gmail.com>
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Thu, 26 Aug 2010 18:28:25 +0000 (11:28 -0700)]
osd: always mark down old hb peers; send map update via cluster link
If we don't mark down the hb link immediately, we'll forget about it
because it won't be in the from or to set anymore, and if it does go down
later we'll end up with garbage in the logs.
Instead, always mark it down. Since we want to share our map with old
peers that are still up, do that via the cluster link instead, which is
reliably marked down if/when the peer goes down.
Sage Weil [Fri, 20 Aug 2010 16:26:34 +0000 (09:26 -0700)]
crush: return error instead of BUGing on bad forcefed mapping
The forcefed mapping relies on a parent map. However, the current
implementation assumes that the parent mapping is unique for all rules. If
that is not the case (i.e., some osd exists in multiple hierarchies) then
we cannot assert that the TAKE matches the calculated force_context.
For now, we can just fail the mapping in that case (we don't use forcefed
mappings yet). The real solution is probably to define parent maps for
all possible hierarchies (i.e., starting at each unique TAKE starting
point).
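The change from asserting to failing can be sketched as follows, assuming a single child-to-parent map and a check that the TAKE item lies on the forced item's ancestor chain. `ParentMap`, `check_force`, and the choice of -EINVAL are hypothetical; the commit message does not say which error the crush code returns.

```cpp
#include <cerrno>
#include <cstddef>
#include <map>

// Hypothetical parent map: item id -> parent id (one parent per item).
using ParentMap = std::map<int, int>;

// Returns 0 if `take` is the forced item or one of its ancestors,
// -EINVAL otherwise, instead of asserting on a mismatch.
int check_force(const ParentMap& parent, int forced, int take) {
  int cur = forced;
  // Bound the walk by the map size so a malformed (cyclic) map terminates.
  for (std::size_t steps = 0; steps <= parent.size(); ++steps) {
    if (cur == take) return 0;
    auto it = parent.find(cur);
    if (it == parent.end()) break;
    cur = it->second;
  }
  return -EINVAL;
}
```

A caller can then fail that one mapping and carry on, which is the behavior described above for an osd that appears in a hierarchy the parent map does not cover.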
Sage Weil [Fri, 20 Aug 2010 04:47:19 +0000 (21:47 -0700)]
mds: fix ENOTEMPTY checking on rmdir/rename
We can't trust the inode rstat size without holding the locks. We can,
however, look at our auth frags without fear of a false positive
ENOTEMPTY.
Rename the function, introduce a helper for the locked check, update
comments, etc.