Sage Weil [Tue, 30 Aug 2011 14:09:06 +0000 (07:09 -0700)]
client: fix readdir result merge
When merging readdir results into the cache, we want to remove any names
_preceding_ the current item before updating it. Then, at the end, we
clean up the trailing items.
This fixes a cfuse crash on workunits/snaps/snaptest-2.sh.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
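The merge order described above can be sketched as follows. This is only an illustration under assumptions, not the client code: the cache is modeled as a std::map from name to inode number, the listing is assumed to be sorted and to cover the whole directory, and Cache and merge_readdir() are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the dentry cache: name -> inode number.
using Cache = std::map<std::string, int>;

// Merge a complete, sorted directory listing into the cache.
void merge_readdir(Cache& cache,
                   const std::vector<std::pair<std::string, int>>& listing) {
  auto pos = cache.begin();
  for (const auto& item : listing) {
    // Remove stale cached names that sort before the current item.
    while (pos != cache.end() && pos->first < item.first)
      pos = cache.erase(pos);
    if (pos != cache.end() && pos->first == item.first) {
      pos->second = item.second;  // update in place; pos stays valid
      ++pos;
    } else {
      // Insert just before pos; std::map inserts do not invalidate pos.
      cache.emplace_hint(pos, item.first, item.second);
    }
  }
  // Clean up trailing names that the listing did not mention.
  cache.erase(pos, cache.end());
}
```

Starting from a cache holding a, b, d, z and merging a listing of b, c, d leaves exactly b, c, d: a is dropped before b is updated, and z is dropped in the final cleanup.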
Sage Weil [Fri, 26 Aug 2011 16:47:03 +0000 (09:47 -0700)]
objectcacher: only wait for commit
There was some old, weird stuff going on here where we would wait for the
ACK and COMMIT separately. This is just wrong. Writeback does not
complete until the data is committed on disk.
Simplify by waiting only for commit, removing all the 'ack' code, and
going back to a single callback (flush_set).
- we would dirty some buffers on an object
- bump dirty_tx count
- flush()
- this adds the Object to ObjectSet::uncommitted
- truncate
- client clears FILE_BUFFER cap_ref
- Object::purge()
- clear dirty_tx count
- client puts last inode
- Object::uncommitted is not empty in ~ObjectSet
(This was triggered after several runs of workunits/suites/blogbench.sh
on sepia.)
It turns out the uncommitted xlist<> is pretty useless, though: the same
information is captured in the dirty_tx counter. We add a separate
counter to the Object itself (for the benefit of Object::can_close()).
We also clean up Object::purge() to call truncate(0), a small
simplification.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
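A rough sketch of why a pair of counters is enough, assuming one count per object plus an aggregate on the set. The struct layout and the names buffer_dirtied(), buffer_committed() and the dirty_or_tx fields are hypothetical and chosen for illustration; only purge() and can_close() are named in the message above, and their bodies here are guesses.

```cpp
#include <cassert>

struct ObjectSet {
  int dirty_or_tx = 0;  // buffers dirty or in flight, across all objects
};

struct Object {
  ObjectSet* oset;
  int dirty_or_tx = 0;  // the same count, for this object only

  explicit Object(ObjectSet* s) : oset(s) {}

  void buffer_dirtied() { ++dirty_or_tx; ++oset->dirty_or_tx; }

  void buffer_committed() {
    if (dirty_or_tx == 0) return;  // buffer was already purged; ignore
    --dirty_or_tx;
    --oset->dirty_or_tx;
  }

  // Discard everything, including writes still in flight. Both counters
  // are adjusted together, so no stale list entry can be left behind.
  void purge() {
    oset->dirty_or_tx -= dirty_or_tx;
    dirty_or_tx = 0;
  }

  bool can_close() const { return dirty_or_tx == 0; }
};
```

With a separate list of uncommitted objects, purge() also has to remember to unlink the object from that list; with counters there is nothing extra to forget.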
Sage Weil [Wed, 24 Aug 2011 23:51:15 +0000 (16:51 -0700)]
client: be careful about replacing dentries during readdir assimilation
When we assimilate readdir results into our cache, we need to be more
careful about replacing existing dentries. We were calling
insert_dentry_inode(), which would replace a name if it already exists,
which might include pd->first, an active iterator.
Move the dentry link/relink into the caller (where we already have an
iterator pointing to the existing item, if any). Then update the dentry
lease information separately.
Fixes: #1391
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Wed, 24 Aug 2011 16:31:22 +0000 (09:31 -0700)]
client: only link directories into hierarchy once
Directories can only link into the hierarchy once. We assert as much
in readdir_r_cb(). Fix link() so that it unlinks the directory from the
old location when relinking somewhere new. Be careful to do this after
we take inode refs to avoid any unpleasantness.
Fixes: #1429
Reported-by: Sam Lang <samlang@gmail.com>
Signed-off-by: Sage Weil <sage@newdream.net>
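The ordering concern (take the inode reference first, then unlink the old location) can be illustrated with a small sketch. It assumes reference counting via std::shared_ptr; the Inode and Dentry structs and the link_dentry()/unlink_dentry() helpers are hypothetical simplifications, not the client's real types.

```cpp
#include <cassert>
#include <memory>
#include <string>

struct Dentry;

struct Inode {
  bool is_dir = false;
  Dentry* parent_dn = nullptr;  // a directory has at most one parent dentry
};

struct Dentry {
  std::string name;
  std::shared_ptr<Inode> inode;
};

void unlink_dentry(Dentry& dn) {
  if (!dn.inode) return;
  if (dn.inode->parent_dn == &dn) dn.inode->parent_dn = nullptr;
  dn.inode.reset();  // may drop what was the last reference
}

void link_dentry(Dentry& dn, const std::shared_ptr<Inode>& in) {
  // Take our own reference before touching the old location; otherwise
  // unlinking the old dentry could free the inode underneath us.
  std::shared_ptr<Inode> ref = in;
  if (dn.inode && dn.inode != ref) unlink_dentry(dn);
  if (ref->is_dir && ref->parent_dn && ref->parent_dn != &dn)
    unlink_dentry(*ref->parent_dn);
  dn.inode = ref;
  ref->parent_dn = &dn;
}
```

If the old dentry holds the only reference and the caller passes that very pointer in, unlinking first would destroy the inode; copying the reference first keeps it alive across the relink.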
mon/Paxos.cc: In function 'void Paxos::handle_begin(MMonPaxos*)', in thread '0x7fc74d11f700'
mon/Paxos.cc: 393: FAILED assert(begin->last_committed == last_committed)
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil [Mon, 22 Aug 2011 20:58:39 +0000 (13:58 -0700)]
mds: open+pin stray dirfrags on startup
This ensures that the stray dirfrags are always open, which in turn ensures
that whenever we add straydn items the rstats/fragstats will get updated
properly. This is a better solution than d3d767a.
Now we can assert the stray dirfrag is open in
get_or_create_stray_dentry() instead of calling get_or_open_dirfrag().
Samuel Just [Mon, 22 Aug 2011 17:09:06 +0000 (10:09 -0700)]
PG: Move reset_last_warm_restart to Initial::exit
Previously, reset_last_warm_restart was only invoked when handle_create
was used. This misses cases where the pg is initialized via a Notify,
Log, or Info message. reset_last_warm_restart will now be called from
the Initial state exit handler in order to handle the other cases.
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Sage Weil [Sun, 21 Aug 2011 03:42:30 +0000 (20:42 -0700)]
mds: do not complain/assert about stray inode rstat/fragstat consistency
We instantiate the stray dirfrags without reading the fragstat off of disk
because it's faster, we know the dentry is unique, and we don't care about
the stats. It can lead to inconsistency between the dirfrag frag/rstat
and the inodes, though. Silently clean it up when we hit it; that's
simpler than not maintaining it at all for those directories.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil [Fri, 19 Aug 2011 23:16:26 +0000 (16:16 -0700)]
mds: xlocker_caps are supplemental to caps
Like loner_caps, xlocker_caps are granted in addition to the "any" caps.
In practical terms, this only affects (currently) the LOCK_XLOCKDONE state
for the filelock, where it's less work than making sure what is in the any
column is also |'d onto the xlocker column. Easier to read :)
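As a minimal sketch of the "supplemental" rule, assuming each lock state carries three bitmasks (any, loner, xlocker): the caps a client may hold are the "any" mask OR'd with whichever supplemental masks apply to that client. The struct, the constant values and allowed_caps() are hypothetical and do not reflect the MDS's actual lock tables.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical cap bits, for illustration only.
constexpr uint32_t CAP_RD = 1;
constexpr uint32_t CAP_WR = 2;
constexpr uint32_t CAP_CACHE = 4;

struct LockStateCaps {
  uint32_t any;      // caps any client may hold in this state
  uint32_t loner;    // extra caps for the loner client
  uint32_t xlocker;  // extra caps for the client holding the xlock
};

uint32_t allowed_caps(const LockStateCaps& s, bool is_loner, bool is_xlocker) {
  uint32_t caps = s.any;
  if (is_loner) caps |= s.loner;      // supplemental, not a replacement
  if (is_xlocker) caps |= s.xlocker;  // likewise
  return caps;
}
```

Because the masks are combined at lookup time, the xlocker column only needs to list what the xlocker gets beyond the "any" column.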
Sage Weil [Fri, 19 Aug 2011 23:15:06 +0000 (16:15 -0700)]
mds: only client holding xlock in xlockdone can change lock state
If we are in xlockdone, only the client holding that xlock can adjust the
lock state (e.g., relock). Other clients have to wait until the xlock
cycle unwinds completely.
Fixes: #1417
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Fri, 19 Aug 2011 16:47:29 +0000 (09:47 -0700)]
paxos: use MonitorStore::put_bl_sn_map() to commit batches of values
This allows us to (safely) do fsync vs sync optimizations. The old code
would write values to the final names and then sync(2), but a crash in
between could leave 0-length (non-temporary) files and crash.
Fixes: #1414
Signed-off-by: Sage Weil <sage@newdream.net>
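The general technique for avoiding 0-length files under final names is to write each value to a temporary name, fsync it, and only then rename it into place. The sketch below shows that pattern with POSIX calls. It is an assumption about the approach, not the body of MonitorStore::put_bl_sn_map(); put_batch(), write_and_sync() and the ".new" suffix are hypothetical.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <fcntl.h>
#include <map>
#include <string>
#include <unistd.h>

// Write data to path and fsync it. Returns 0 on success, -1 on error.
static int write_and_sync(const std::string& path, const std::string& data) {
  int fd = ::open(path.c_str(), O_WRONLY | O_CREAT | O_TRUNC, 0644);
  if (fd < 0) return -1;
  const char* p = data.data();
  size_t left = data.size();
  while (left > 0) {
    ssize_t n = ::write(fd, p, left);
    if (n < 0) {
      if (errno == EINTR) continue;  // interrupted; retry the write
      ::close(fd);
      return -1;
    }
    p += n;
    left -= static_cast<size_t>(n);
  }
  int r = ::fsync(fd);
  if (::close(fd) < 0) r = -1;
  return r;
}

// Commit a batch of name -> value pairs under dir.
int put_batch(const std::string& dir,
              const std::map<std::string, std::string>& values) {
  // Phase 1: write and sync every value under a temporary name.
  for (const auto& kv : values)
    if (write_and_sync(dir + "/" + kv.first + ".new", kv.second) < 0)
      return -1;
  // Phase 2: rename into place. A final name never refers to a file
  // whose contents have not been synced.
  for (const auto& kv : values) {
    const std::string final_name = dir + "/" + kv.first;
    if (std::rename((final_name + ".new").c_str(), final_name.c_str()) != 0)
      return -1;
  }
  return 0;
}
```

A crash during phase 1 leaves only ".new" files behind, which a reader can ignore. For full durability the containing directory would also need an fsync after the renames; that step is omitted here.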
Sage Weil [Wed, 17 Aug 2011 22:40:47 +0000 (15:40 -0700)]
mds: avoid explicit passing of projected_xattrs
No need to pass this in explicitly; we can look in the projected inode for
it. This actually fixes a race where a journaled inode following a
setxattr will not journal the projected xattrs.
Sage Weil [Wed, 17 Aug 2011 21:11:15 +0000 (14:11 -0700)]
mds: handle O_TRUNC when size is already 0
We always want to go through the truncation path. Two reasons:
- even if the size is already 0, we still need to update ctime/mtime
- we may not have the correct size (client could have dirty data), in
which case we still want to bump truncate_seq etc.
Basically, we can't trust pi->size here because we are holding a wrlock,
not an xlock. And that's fine, since we need to update the inode
unconditionally anyway.
This broke cfuse pjd open/00.t tests when we added the fuse option
atomic_o_trunc. libceph has always been broken in this regard.
Fixes: #1393
Signed-off-by: Sage Weil <sage@newdream.net>
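In sketch form, the fix amounts to dropping any "size is already 0" shortcut so that the truncation path always runs. The ProjectedInode struct and open_with_trunc() below are hypothetical simplifications for illustration, not the MDS code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified view of the projected inode.
struct ProjectedInode {
  uint64_t size;
  uint64_t truncate_seq;
  int64_t mtime;
  int64_t ctime;
};

void open_with_trunc(ProjectedInode& pi, int64_t now) {
  // Deliberately no "if (pi.size == 0) return;" shortcut: under a wrlock
  // the recorded size may be stale (a client can hold dirty data), and
  // the timestamps must be updated regardless.
  pi.truncate_seq++;
  pi.size = 0;
  pi.mtime = now;
  pi.ctime = now;
}
```

Even when the recorded size is 0, the call still bumps truncate_seq and refreshes mtime and ctime.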