Sage Weil [Tue, 7 Sep 2010 21:57:49 +0000 (14:57 -0700)]
mds: fix replica state for mix->sync
Should be mix->sync(2): the same as a replica who already got the first
SYNC message and is waiting in mix->sync(2) for the final SYNC to indicate
the gather is completed.
Sage Weil [Tue, 7 Sep 2010 20:59:37 +0000 (13:59 -0700)]
mds: lock path, parent dir scatterlocks _after_ freezing
This fixes an ABBA deadlock between
acquire_locks(): auth_pins items, then locks in order
export_dir: locks paths, then freezes.
Instead, we check for lockability (but don't lock), do the freeze, and then
try to take the locks after. If we can't do so atomically, we currently
just fail. In theory this could wait for the distributed locks, but it's
probably not worth the complexity at this stage; export_dir is currently
still opportunistic and can bail out for a variety of reasons.
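The check, freeze, then try-lock ordering described above can be sketched roughly as follows. This is a minimal illustration, not the MDS code: `Subtree`, `try_export`, and the use of `threading.Lock` as a stand-in for the path and scatterlocks are all assumptions made for the example.

```python
import threading

class Subtree:
    """Hypothetical stand-in for a directory subtree being exported."""
    def __init__(self, path_locks):
        self.path_locks = path_locks  # stand-ins for path/parent-dir locks
        self.frozen = False
        self.held = []

def try_export(subtree):
    # 1. Check lockability only; take nothing yet.
    if any(lock.locked() for lock in subtree.path_locks):
        return False
    # 2. Freeze first, so we never hold locks while waiting on the freeze.
    subtree.frozen = True
    # 3. Try to take every lock without blocking; bail out on any failure.
    taken = []
    for lock in subtree.path_locks:
        if lock.acquire(blocking=False):
            taken.append(lock)
        else:
            for held in reversed(taken):
                held.release()
            subtree.frozen = False  # thaw and give up (opportunistic)
            return False
    subtree.held = taken
    return True
```

Because step 3 never blocks, the exporter cannot end up waiting on a lock while another path holds that lock and waits on the freeze.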
Sage Weil [Mon, 23 Aug 2010 22:36:14 +0000 (15:36 -0700)]
mds: can next state lockability checks in eval_gather
The can_* fields need to be ANY or AUTH, not REQ or XCL (at least not
without trickier checks). Otherwise we progress to the next state too
early and violate the locking rules.
Sage Weil [Tue, 7 Sep 2010 17:01:58 +0000 (10:01 -0700)]
osd: log error instead of crashing on failed pull attempt
If peering screws up and the primary mistakenly tries to pull an object
that we don't have from us, log an error instead of crashing. This will still
throw off recovery (it will hang), but that's better than crashing
outright.
Greg Farnum [Thu, 26 Aug 2010 21:04:45 +0000 (14:04 -0700)]
client: Make truncation work properly
The previous if block didn't work because inode->size was usually
changed well before handle_cap_trunc was ever invoked, so it never
did the truncation in the objectcacher! This was okay if you just truncated
a file and then closed it, but if you wrote a file, truncated part of it out,
and then wrote past the (new) end you would get reads that returned
the previously-truncated data out of what should have been a hole.
Now, we do the actual objectcacher truncation in update_inode_file_bits,
because all methods of truncation will move through there and this maintains
proper ordering.
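A toy model of the failure and the fix, assuming a single in-memory buffer as a stand-in for the objectcacher. `ToyFile` and its methods are hypothetical names for illustration; only the idea of truncating the cache at the one point every size change passes through comes from the commit message.

```python
class ToyFile:
    """Hypothetical file with a cached data buffer."""
    def __init__(self):
        self.size = 0
        self.cache = bytearray()  # stand-in for cached object data

    def update_size(self, new_size):
        # Single choke point: whenever the size shrinks, drop the cached
        # bytes past the new end so they cannot resurface later.
        if new_size < len(self.cache):
            del self.cache[new_size:]
        self.size = new_size

    def write(self, offset, data):
        end = offset + len(data)
        if len(self.cache) < end:
            # Zero-fill any gap; this is the hole a later read should see.
            self.cache.extend(b"\0" * (end - len(self.cache)))
        self.cache[offset:end] = data
        if end > self.size:
            self.update_size(end)

    def read(self, offset, length):
        return bytes(self.cache[offset:min(offset + length, self.size)])
```

Writing `abcdefgh`, truncating to 4 bytes, and then writing at offset 6 leaves zeros at offsets 4 and 5 rather than the stale `ef`.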
Sage Weil [Thu, 26 Aug 2010 18:28:25 +0000 (11:28 -0700)]
osd: always mark down old hb peers; send map update via cluster link
If we don't mark down the hb link immediately, we'll forget about it
because it won't be in the from or to set anymore, and if it does go down
later we'll end up with garbage in the logs.
Instead, always mark it down. Since we want to share our map with old
peers that are still up, do that via the cluster link instead, which is
reliably marked down if/when the peer goes down.
Greg Farnum [Wed, 25 Aug 2010 19:04:53 +0000 (12:04 -0700)]
mds: fix invalid comparison.
We just want the code in this if block to execute if the previous if block did.
But the previous if block unlinked destdnl, so the comparison always fails! Use
a bool and set it appropriately to fix.
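The pattern, in a hedged sketch: record the condition in a bool before the first block mutates the state it tested. `rename_over` and the dict-based linkage are hypothetical stand-ins, not the MDS data structures.

```python
def rename_over(dest_linkage):
    """dest_linkage: hypothetical dict whose 'inode' entry may be None."""
    steps = []
    # Remember the condition before the first block changes the state.
    had_dest = dest_linkage.get("inode") is not None
    if had_dest:
        steps.append("unlink")
        dest_linkage["inode"] = None  # the first block unlinks the target
    # Re-testing dest_linkage["inode"] here would always fail; use the bool.
    if had_dest:
        steps.append("followup")
    return steps
```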
Sage Weil [Tue, 24 Aug 2010 17:38:21 +0000 (10:38 -0700)]
mds: expose projected values for all locks to loner (not just filelock)
There is nothing special about filelock in this case. If the client is
the loner, we should use the projected values. This is important when
we are looking at a snapid on the head (see snaptest-authwb.sh).
Sage Weil [Thu, 19 Aug 2010 22:19:33 +0000 (15:19 -0700)]
mds: add wait on auth change machinery
Special wait mask is passed through lock wait mask to parent object.
Caller adds item to a list on the subtree root.
Removal of wait item automatically removes from said list.
Subtree topology changes adjust authchange wait lists.
Migrator auth change updates wake waiters. Import/export should be
protected by freeze/thaw or the blanket wakeups.
Greg Farnum [Thu, 19 Aug 2010 19:01:11 +0000 (12:01 -0700)]
backtrace: fix segfault in tcmalloc.
The print function is only called when we're about to crash anyway,
and the data member 'foo' is allocated by ptmalloc, not tcmalloc. Freeing
it via tcmalloc causes its own crash which pollutes our debugging and
incorrectly sticks tcmalloc into the stack. So, just don't free.
Sage Weil [Fri, 20 Aug 2010 16:26:34 +0000 (09:26 -0700)]
crush: return error instead of BUGing on bad forcefed mapping
The forcefed mapping relies on a parent map. However, the current
implementation assumes that the parent mapping is unique for all rules. If
that is not the case (i.e., some osd exists in multiple hierarchies) then
we cannot assert that the TAKE matches the calculated force_context.
For now, we can just fail the mapping in that case (we don't use forcefed
mappings yet). The real solution is probably to define parent maps for
all possible hierarchies (i.e., one for each unique TAKE starting
point).
Sage Weil [Fri, 20 Aug 2010 04:47:19 +0000 (21:47 -0700)]
mds: fix ENOTEMPTY checking on rmdir/rename
We can't trust the inode rstat size without holding the locks. We can,
however, look at our auth frags without fear of a false positive
ENOTEMPTY.
Rename the function, introduce a helper for the locked check, update
comments, etc.