Sage Weil [Thu, 7 Oct 2010 14:52:02 +0000 (07:52 -0700)]
debug: always append to log
We were truncating if we were in log_per_instance mode. But normally those
logs don't exist. And if they do, we probably don't want to truncate
them. This is particularly true if we respawn ourselves (e.g. after being
marked down) and restart with the same pid.
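For illustration, a minimal sketch of append-only log opening using standard C++ streams. This is not the actual Ceph logging code; the helper names append_line and read_lines are hypothetical. Opening with std::ios::app means an existing log (for example, one left by a respawned daemon with the same pid) is extended instead of truncated.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: append one line to a log file.
// std::ios::app positions every write at the end of the file and
// never truncates existing content; the file is created if missing.
bool append_line(const std::string& path, const std::string& line) {
    std::ofstream out(path, std::ios::out | std::ios::app);
    if (!out)
        return false;
    out << line << '\n';
    return static_cast<bool>(out);
}

// Hypothetical helper: read the whole log back, one entry per line.
std::vector<std::string> read_lines(const std::string& path) {
    std::ifstream in(path);
    std::vector<std::string> lines;
    for (std::string l; std::getline(in, l);)
        lines.push_back(l);
    return lines;
}
```

Calling append_line twice on the same path leaves both lines in the file, which is the behavior the commit asks for.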
Greg Farnum [Wed, 6 Oct 2010 23:35:14 +0000 (16:35 -0700)]
mds: Check the lock state, not the inode state!
This was causing a lot of slowdowns.
Additionally, pin the inode when exporting caps -- otherwise it could
disappear out from under a cap ack. This was probably just exposed
by fixing the lock check.
Sage Weil [Tue, 7 Sep 2010 17:01:58 +0000 (10:01 -0700)]
osd: log error instead of crashing on failed pull attempt
If peering screws up and the primary mistakenly tries to pull an object
from us that we don't have, log an error instead of crashing. This will still
throw off recovery (it will hang), but that's better than crashing
outright.
Sage Weil [Fri, 24 Sep 2010 18:43:37 +0000 (11:43 -0700)]
osd: make sparse data/clone push behave with partial object push
We can't error out if we don't get everything we want in one go now that
we support pushing objects in pieces. Remove this check entirely, since
we don't have a good error handling case anyway.
Sage Weil [Tue, 5 Oct 2010 22:41:40 +0000 (15:41 -0700)]
osd: cancel deletion on pg change
If the primary changes, cancel deletion so that the new primary has the
benefit of considering whether they need anything we have. Before, we were
only canceling if our role changed, but that makes little sense.
Greg Farnum [Tue, 5 Oct 2010 16:25:38 +0000 (09:25 -0700)]
client: Fix truncate_seq/truncate_length initialization.
Initializing to 0 was causing file_to_extents to get called on every inode
since the MDS initializes truncate_seq to 1 and truncate_length to -1.
This revealed itself as a crash on directory inodes, which have their
layouts zeroed since merging the file_layouts branch.
To make this clearer, assert that anything being truncated is a file inode.
Previously we unconditionally encoded the standard layout, which
on a directory inode is meaningless. So, use that spot to fill
in the default dir layout, if it exists. Otherwise, zero-fill.
This lets us display default directory layouts without changing
the protocol, which is good.
Always throw exceptions by value rather than as pointers. Always catch
exceptions as const references to avoid unnecessary copying. This fixes a
few minor memory leaks and should simplify handling exceptions in the
future.
Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
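A minimal sketch of the throw-by-value, catch-by-const-reference convention described above. The functions parse_positive and try_parse are hypothetical and exist only for this illustration; they are not part of the tree.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical function used only for illustration.
int parse_positive(int value) {
    if (value <= 0) {
        // Throw by value: the runtime owns the exception object, so
        // there is nothing for the handler to delete (and nothing to
        // leak, unlike "throw new ...").
        throw std::invalid_argument("value must be positive");
    }
    return value;
}

// Returns "ok" on success, otherwise the exception message.
std::string try_parse(int value) {
    try {
        parse_positive(value);
        return "ok";
    } catch (const std::exception& e) {
        // Catch by const reference: no copy is made and the dynamic
        // type is preserved (no slicing), so what() is the derived
        // class's message.
        return e.what();
    }
}
```

With pointer exceptions, every handler has to remember to delete the object; the by-value form removes that obligation.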
Sage Weil [Fri, 1 Oct 2010 22:54:56 +0000 (15:54 -0700)]
mon: add 'mds fail N' command
Manually mark an mds rank as failed. The daemon should kill itself when
it finds out.
Note that this doesn't do any sanity checks, so it can also be used to
adjust state in an otherwise inconsistent mdsmap due to other bugs (one
where, say, an mds in up but has no info, or not up but not in the failed
set.)
Sage Weil [Fri, 1 Oct 2010 19:43:20 +0000 (12:43 -0700)]
mds: fix stray replica push on _rename_prepare_witness()
We need to push all parents of the straydn to the target. This changed
a while back with the mdsdir stuff but this bit of code wasn't updated.
Updated to mirror send_dentry_unlink().
This fixes a crash like:
mds/MDCache.cc: In function 'void MDCache::adjust_subtree_auth(CDir*, std::pair<int, int>, bool)':
mds/MDCache.cc:644: FAILED assert(root)
ceph version 0.22~rc (0e67718a365b42969e785f544ea3b4258bb2407f)
1: (MDCache::add_replica_dir(ceph::buffer::list::iterator&, CInode*, int, std::list<Context*, std::allocator<Context*> >&)+0x1c1) [0x536a91]
2: (MDCache::add_replica_stray(ceph::buffer::list&, int)+0xdb) [0x536fab]
3: (Server::handle_slave_rename_prep(MDRequest*)+0x1113) [0x4d5c33]
4: (Server::dispatch_slave_request(MDRequest*)+0x21b) [0x4de80b]
5: (Server::handle_slave_request(MMDSSlaveRequest*)+0x145) [0x4e1955]
6: (MDS::_dispatch(Message*)+0x2598) [0x49e038]
...
Sage Weil [Fri, 1 Oct 2010 19:32:59 +0000 (12:32 -0700)]
osd: revamp forgetting lost objects
The old code for forgetting lost objects rewrote history in the PG log, which
is asking for all kinds of trouble. Instead, add new log events to indicate that
an object is LOST (deleted) or LOST_REVERTed (reverted to an older
version).
The LOST_REVERT case means we may need to recover the old version from
another node and rewrite the version number. This isn't implemented yet;
for now we just assert.
Sage Weil [Fri, 1 Oct 2010 05:00:06 +0000 (22:00 -0700)]
osd: fix recovery_primary loop on local clone
When we take the clone branch, we update the missing map. This invalidates
our current iterator, which can cause badness. Instead, increment the
iterator near the top of the loop so we don't have to worry about it.
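The pattern can be sketched as follows, assuming a std::map stands in for the missing map. The function drain_clones and the string values are hypothetical; the real loop lives in the OSD recovery code and does considerably more work per entry.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the recovery loop: entries whose value is
// "clone" are handled locally and removed from the map while iterating.
std::vector<int> drain_clones(std::map<int, std::string>& missing) {
    std::vector<int> handled;
    auto p = missing.begin();
    while (p != missing.end()) {
        auto cur = p;
        ++p;  // advance near the top, before the map can be modified
        if (cur->second == "clone") {
            handled.push_back(cur->first);
            // Erasing invalidates only 'cur'; 'p' already points at
            // the next element and remains valid for std::map.
            missing.erase(cur);
        }
    }
    return handled;
}
```

Because the loop iterator has moved on before any erase, no branch in the body has to worry about invalidating it.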
coll_t is now a string. META_COLL and TEMP_COLL are just constants now.
Now there is a constructor that takes pgid_t and snapid_t, rather than
factory methods. It's clear what that constructor does, so wrapping it
in factory methods should be unnecessary.
Bump coll_t serialization version to 3. Implement decoding for the old
versions.
Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
Colin McCabe [Wed, 29 Sep 2010 02:00:28 +0000 (19:00 -0700)]
interval_set: hide data members
This change makes interval_set::m and interval_set::_size private data
members in interval_set, instead of public. This change also creates a
non-const iterator. Using this iterator, users can modify the length of
an interval. So now, all users can use the iterators rather than
interacting with the class internals directly.
mon: Fix issue first addressed in 2c5a3d99aa3be5ce114072e84f73a0a6426e63fd.
We were properly falling out of the while loop when we reached end(), but
not checking for it in the following if-else. Now we do!
Reported-by: Henry C Chang <henry_c_chang@tcloudcomputing.com>
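A minimal sketch of this class of bug, not the actual monitor code. The function first_nonempty_from and the map contents are hypothetical: the while loop can legitimately stop at end(), so the code that follows must test for end() before dereferencing the iterator.

```cpp
#include <map>
#include <string>

// Hypothetical example: return the first non-empty value at or after
// key 'start', or an empty string if there is none.
std::string first_nonempty_from(const std::map<int, std::string>& m,
                                int start) {
    auto p = m.lower_bound(start);
    // The loop exits either on a match or when p reaches end().
    while (p != m.end() && p->second.empty())
        ++p;
    // The check that is easy to forget: p may be end() here, and
    // dereferencing end() is undefined behavior.
    if (p == m.end())
        return std::string();
    return p->second;
}
```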
The setup-chroot.sh script is very handy for building the server in a
chroot environment. I thought I would share it here in case anyone else
finds it useful.
Sage Weil [Mon, 27 Sep 2010 15:31:34 +0000 (08:31 -0700)]
mds: don't block request on freezing if we're already auth_pinned.
If we are already auth_pinned, we're past the gates; don't stop on freezable.
This screws up xlock: the lock moves to PREXLOCK state, but the request
that would normally xlock it gets deferred because of a racing freezing
of the tree. Then the PREXLOCK gather kicks in and badness happens.
Sage Weil [Sat, 25 Sep 2010 03:10:08 +0000 (20:10 -0700)]
osd: add coll_t::is_pg() method
This makes the interface a bit more adaptable for a situation where it has
a simple string representation instead of the strict structure it has now.
Eventually this function can simply attempt a pg_t parse.