Sage Weil [Thu, 20 Nov 2008 19:54:09 +0000 (11:54 -0800)]
osd: adjust merge_log
An object should only be marked missing if the new entry is newer. If
they are the same, it may or may not be missing (depending on
whether it was missing before merge_log).
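The rule above can be sketched roughly as follows. This is a minimal illustration, not the actual merge_log code: `Version`, `MissingSet`, `merge_entry`, and the map of local versions are hypothetical stand-ins for the real PG log structures.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical simplified types; not the real Ceph PG log structures.
using Version = std::uint64_t;

struct MissingSet {
    std::map<std::string, Version> need;  // object -> version we still need
};

// Apply one incoming log entry.
// local_versions holds the newest version we have logged per object.
void merge_entry(std::map<std::string, Version>& local_versions,
                 MissingSet& missing,
                 const std::string& oid,
                 Version incoming) {
    auto it = local_versions.find(oid);
    const bool known = (it != local_versions.end());
    if (!known || incoming > it->second) {
        // Strictly newer than anything we logged: we cannot have it yet.
        missing.need[oid] = incoming;
        local_versions[oid] = incoming;
    }
    // Same version: leave the missing set alone. Whether the object is
    // missing depends on its state before the merge, which is preserved.
}
```

An entry with an equal version neither adds nor removes the object from the missing set, so whatever was recorded before the merge stays in effect.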
Sage Weil [Thu, 20 Nov 2008 18:36:19 +0000 (10:36 -0800)]
msgr: reference count messenger
We want an explicit destroy() method, because the SimpleMessenger
needs to join the dispatch thread, and that can't happen just on
the last reference drop because that may happen in the dispatch
thread itself.
Sage Weil [Tue, 18 Nov 2008 23:52:54 +0000 (15:52 -0800)]
msgr: fix reconnect after error
Items in sent queue weren't being moved back to out queue, and
in_seq/out_seq weren't being set properly after an incoming
connection replaced an existing connection.
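A rough sketch of the queue half of this fix, assuming a simplified pipe with one queue of unsent messages and one of sent-but-unacknowledged messages. `Pipe`, `send_one`, and `requeue_sent` are illustrative names, messages are reduced to integers, and the in_seq/out_seq handling is not modeled here.

```cpp
#include <cassert>
#include <deque>

// Hypothetical simplified pipe; messages are plain ints for illustration.
struct Pipe {
    std::deque<int> out_q;   // queued, not yet written to the socket
    std::deque<int> sent_q;  // written, but not yet acknowledged by the peer

    // Pretend to write the next message: it moves from out_q to sent_q.
    void send_one() {
        if (out_q.empty()) return;
        sent_q.push_back(out_q.front());
        out_q.pop_front();
    }

    // On reconnect after an error, unacked messages must be resent.
    // Walk sent_q from newest to oldest and push each onto the front of
    // out_q, so the original send order is preserved.
    void requeue_sent() {
        while (!sent_q.empty()) {
            out_q.push_front(sent_q.back());
            sent_q.pop_back();
        }
    }
};
```

After `requeue_sent()` the out queue holds every unacknowledged message ahead of the ones that were never sent, in their original order.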
Sage Weil [Tue, 18 Nov 2008 04:37:05 +0000 (20:37 -0800)]
mds: add new directory to new_dirfrags list
This ensures the directory gets committed before the mkdir event
is trimmed from the journal. Fixes the failed assertion in
CDir::_fetched seen on mds recovery (due to a missing directory object).
Sage Weil [Tue, 18 Nov 2008 00:46:38 +0000 (16:46 -0800)]
mds: unqueue recovery on purging inodes
If an inode is queued for file size recovery when it is purged,
unqueue it. This catches the log replay case where client
reconnect queues up the inode.
Also, in eval_stray, skip inodes that are queued. This should
avoid a recovery running concurrently with the purge (which could
be problematic, as it would carry a pointer to *in).
Sage Weil [Mon, 17 Nov 2008 21:23:03 +0000 (13:23 -0800)]
mds: use last_sent (not last_open) to untangle cap release races
If we use last_open, the client has to be smart about ignoring
MDS revocations after it sends a release request. (Or, the MDS has
to somehow know the ack is for an old cap.) Instead, just
serialize release over all cap messages sent to the client. It may
make for a slightly chattier cap release in some cases, but those
cases should be very rare, and this is simpler.
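One way to read the last_sent approach, as a minimal sketch: the MDS numbers every cap message it sends to the client, and a release is honored only if it acknowledges the most recent one. The `Cap` struct and its methods are hypothetical and are not the real MDS Capability class.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-client capability state on the MDS side.
struct Cap {
    std::uint64_t last_sent = 0;  // seq of the last cap message sent
    bool held = true;             // does the client still hold the cap?

    // Any cap message to the client (grant, revoke, ...) bumps the seq.
    std::uint64_t send_message() { return ++last_sent; }

    // The client's release carries the seq of the last message it saw.
    // If the MDS has sent something since, the release raced with that
    // message and is ignored; the client will release again later.
    bool handle_release(std::uint64_t client_seq) {
        if (client_seq != last_sent) return false;
        held = false;
        return true;
    }
};
```

In this sketch a release that crosses a newer MDS message on the wire is dropped, which costs an extra round trip but needs no special-case logic on the client.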
Sage Weil [Mon, 17 Nov 2008 17:03:47 +0000 (09:03 -0800)]
osd: remember past intervals instead of recalculating each time
This _vastly_ improves the speed of build_prior (and thus activate_map).
There is no need to recalculate this information each time as it is fully
dependent on _old_ OSDMaps, not current cluster state.
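The idea can be sketched as a cache that is extended only for epochs it has not yet seen. Everything here is a simplified assumption: `History` stands in for the sequence of old OSDMaps (epoch to acting set for one PG), epochs are assumed to start at 1, and `PastIntervals` is an illustrative name rather than the real data structure.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

using Epoch = std::uint64_t;
// Hypothetical stand-in for old OSDMaps: epoch -> acting set for one PG.
using History = std::map<Epoch, std::vector<int>>;

struct Interval {
    Epoch first;
    Epoch last;
    std::vector<int> acting;
};

struct PastIntervals {
    std::vector<Interval> intervals;  // remembered across calls
    Epoch scanned_to = 0;             // newest epoch already folded in
    int maps_examined = 0;            // counts work done, for illustration

    // Fold in only the epochs after scanned_to, up to and including upto.
    void update(const History& history, Epoch upto) {
        for (auto it = history.upper_bound(scanned_to);
             it != history.end() && it->first <= upto; ++it) {
            ++maps_examined;
            if (intervals.empty() || intervals.back().acting != it->second) {
                // Acting set changed: a new interval starts here.
                intervals.push_back({it->first, it->first, it->second});
            } else {
                // Same acting set: the current interval just grows.
                intervals.back().last = it->first;
            }
            scanned_to = it->first;
        }
    }
};
```

Because old maps never change, a second call examines only the maps added since the first, rather than rescanning the whole history.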
Sage Weil [Fri, 14 Nov 2008 23:31:07 +0000 (15:31 -0800)]
mds: adjust purge_stray sequence; include explicit ino destroy
First purge the inode content. Don't bother journaling our intent,
as that's implied by the fact that it's an unused stray.
Once purged, journal an event that destroys the inode and unlinks
the dentry. Don't remove null dentry itself, as we still need to
update the stray dir... it will get removed when that is committed.
Sage Weil [Fri, 14 Nov 2008 21:48:30 +0000 (13:48 -0800)]
mon: commit large numbers of state values quickly
Write them all, then sync once at the end.
Also include some infrastructure for using the latest stashed value
to recover. Don't use it yet, though. The interaction with
keeping last_committed and latest stashed values in sync wrt a
failure between the two is a bit tricky.
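The write-all-then-sync-once pattern, as a minimal sketch against a hypothetical in-memory `Store`. The real monitor store writes to disk; here `sync()` just moves pending values into a "durable" map and counts how often it was called.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical store: put() buffers a value, sync() makes buffered
// values durable. Not the real monitor store interface.
class Store {
public:
    int syncs = 0;
    std::map<std::uint64_t, std::string> durable;

    void put(std::uint64_t version, const std::string& blob) {
        pending_[version] = blob;
    }

    void sync() {
        for (const auto& kv : pending_) durable[kv.first] = kv.second;
        pending_.clear();
        ++syncs;
    }

private:
    std::map<std::uint64_t, std::string> pending_;
};

// Commit many state values: write them all, then sync once at the end,
// instead of paying for one sync per value.
void commit_batch(Store& store,
                  const std::map<std::uint64_t, std::string>& values) {
    for (const auto& kv : values) store.put(kv.first, kv.second);
    store.sync();
}
```

The trade-off is that nothing in the batch is durable until the single sync completes, so a crash mid-batch loses the whole batch rather than a suffix of it.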
Sage Weil [Thu, 13 Nov 2008 21:30:26 +0000 (13:30 -0800)]
mds: treat open requests as non-idempotent
The problem is that the reply contains a capability, and as such
is stateful and can't be lost. Forwards by the MDS on behalf of
the client, however, introduce the possibility of multiple copies
of a request in flight if one of the MDSs fails, and the client
will drop any duplicate replies it receives.
Alternatively, the client could _also_ parse duplicate responses
(i.e. call fill_trace). I'm not sure if that's a good idea. In
any case, MDS forwarded requests are only really important for
dealing with flash flood scenarios on extremely large clusters,
so let's just set this aside for now.