Sage Weil [Thu, 24 Feb 2011 13:49:29 +0000 (05:49 -0800)]
Makefile: fix libatomic_ops linking
LDADD seems to have no effect on the final link command. Switching this
back to AM_LDFLAGS. This was changed as in 1c7d8f1ac2c, although it's not
clear that the change was intentional...
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil [Thu, 24 Feb 2011 08:34:37 +0000 (00:34 -0800)]
mds: remove "N stopped" from short mdsmap summary
It's confusing because it sounds like we're talking about daemons, when we
really just mean there are some ranks that created some ondisk state but
aren't currently part of the running cluster.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil [Wed, 23 Feb 2011 23:08:58 +0000 (15:08 -0800)]
mds: strengthen assertions in rejoin ack
The ACK only contains items we asked for with a WEAK request. Assert as
much. (The old continue bits were from ~2007, when this was originally
written.)
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil [Wed, 23 Feb 2011 22:25:06 +0000 (14:25 -0800)]
mds: fix export cancellation vs nested freezes
Prevent freezes from completing while we are canceling exports. Otherwise
if we are freezing /a/b and /a, and cancel /a/b, we may inadvertently
complete the freeze on /a (synchronously) and confuse ourselves. Pin
all freezes beforehand so that when we cancel each one we do not cause
any others to prematurely complete.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Samuel Just [Wed, 23 Feb 2011 21:55:43 +0000 (13:55 -0800)]
FileStore: fix OpSequencer::flush error
In writeahead mode, an op will disappear from jq without immediately
reappearing in q. Thus, q can be empty before seq is requeued and
finished. last_thru_q and last_thru_jq will now be tracked explicitly.
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Sage Weil [Wed, 23 Feb 2011 21:34:01 +0000 (13:34 -0800)]
mon: fix dup mds takeover
Allow a standby to take over for a single MDS only by consistently looking
at the pending_mdsmap and not mdsmap. Mixing the two leads to all kinds
of confusion.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil [Wed, 23 Feb 2011 21:01:08 +0000 (13:01 -0800)]
mds: refragment dirs when inode dirfragtree updates from journal
Force dir fragmentation specified by dirfragtree when replayed from
the journal.
Example:
mds0 is auth for /foo, mds1 is auth for /foo/bar.
mds1 fragments /foo/bar. journals etc.
mds0 gets fragment notify and the in-memory inode's dirfragtree changes.
mds0 journals the /foo/bar inode for some random reason.
mds0 imports /foo/bar.
On replay, mds0 refragments upon first mention of the new fragtree in the
journal, so that the dirfragtree <-> dir frags always match. Confusion is
avoided when we, say, import /foo/bar.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil [Tue, 22 Feb 2011 20:45:21 +0000 (12:45 -0800)]
osd: fix recovery pointer when pulling head before snapid
If recovery wants to pull a snapped object and needs the head first, pull()
does that, but the caller doesn't ++skipped and incorrectly bumps the
recovery pointer, preventing us from going back and re-pulling the snapped
object later.
Return a tristate enum from pull so we can tell what it did and update our
recovery state appropriately.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
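A minimal sketch of the tristate return described above, not the actual OSD code. The names PullResult, RecoveryState, pull_object and advance are hypothetical, and the recovery state is reduced to three counters.

```cpp
#include <cassert>

// Hypothetical names throughout; this only illustrates the tristate idea.
enum class PullResult {
  NONE,   // nothing was pulled
  OTHER,  // pulled a prerequisite (e.g. the head) instead of the requested object
  YES     // pulled the requested object
};

struct RecoveryState {
  int started = 0;  // pulls started in this pass
  int skipped = 0;  // objects we must come back to
  int pointer = 0;  // index of the next object to recover
};

// Assumed stand-in for pull(): a snapped object needs its head first.
PullResult pull_object(bool is_snap, bool have_head, bool source_available) {
  if (!source_available) return PullResult::NONE;
  if (is_snap && !have_head) return PullResult::OTHER;
  return PullResult::YES;
}

// Only a YES result is allowed to move the recovery pointer forward.
void advance(RecoveryState& st, PullResult r) {
  switch (r) {
  case PullResult::YES:
    ++st.started;
    ++st.pointer;
    break;
  case PullResult::OTHER:
    ++st.started;
    ++st.skipped;  // pointer stays put so the snapped object is retried
    break;
  case PullResult::NONE:
    ++st.skipped;
    break;
  }
}
```

With a boolean return the OTHER case is indistinguishable from YES, which is how the pointer could be bumped past the snapped object.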
Sage Weil [Tue, 22 Feb 2011 20:20:40 +0000 (12:20 -0800)]
osd: verify object version during push
Fail to push if the ondisk version doesn't match the version we want to
send.
This isn't supposed to happen. If it does it means we have a bug somewhere
else. Log something to the error log and don't push. This is better than
the current behavior, which goes into a loop (repeatedly pulling the object
and retrying when it's not the right version).
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
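A rough sketch of the check described above, under simplifying assumptions: versions are plain integers, and error_log and try_push are hypothetical stand-ins rather than the real OSD interfaces.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for the error log.
std::vector<std::string> error_log;

// Returns true if the push may proceed. On a mismatch, log once and refuse,
// rather than pulling the object again and retrying.
bool try_push(std::uint64_t ondisk_version, std::uint64_t wanted_version) {
  if (ondisk_version != wanted_version) {
    error_log.push_back("push: ondisk version " +
                        std::to_string(ondisk_version) +
                        " != wanted version " +
                        std::to_string(wanted_version));
    return false;
  }
  return true;  // the real code would send the object here
}
```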
Sage Weil [Tue, 22 Feb 2011 17:40:47 +0000 (09:40 -0800)]
osd: improve up_thru request behavior
There is some epoch the OSD wants for up_thru, based on when the PG mapping
last changed. However, once the monitor gets to the point where it must
update the map, it should set up_thru to the most recent epoch the OSD has
seen (i.e. the epoch it is known to be "up thru"!). This will hopefully/
frequently avoid any subsequent up_thru requests.
MOSDAlive already has a separate field (in PaxosServiceMessage) to hold the
latest epoch; just fix the constructor to set it properly, and make the
monitor use it. No protocol change, yay!
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
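A minimal sketch of the monitor-side decision described above. AliveRequest and next_up_thru are hypothetical names, and epochs are assumed to be plain unsigned integers; the real message and monitor code differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

using epoch_t = std::uint32_t;  // assumption: epochs modelled as plain integers

struct AliveRequest {
  epoch_t want;    // epoch the OSD needs up_thru to reach
  epoch_t latest;  // newest map epoch the OSD has seen
};

// Returns the up_thru value the monitor would record for this OSD.
epoch_t next_up_thru(epoch_t current, const AliveRequest& req) {
  if (current >= req.want)
    return current;  // already satisfied, no map update needed
  // The map must be updated anyway, so advance as far as the OSD has seen.
  return std::max(req.want, req.latest);
}
```

Recording the latest epoch instead of the requested one means a later request for any epoch up to that point is already satisfied.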
Greg Farnum [Mon, 14 Feb 2011 21:24:40 +0000 (13:24 -0800)]
OSD: convert waiting_for_pg from hash_map to map.
This doesn't need to be a hash_map; there will only be an entry
for each PG that gets a message request while it's not active.
Shouldn't be too many PGs that that happens to, right?
Greg Farnum [Sat, 12 Feb 2011 01:25:14 +0000 (17:25 -0800)]
PG: convert hash_maps to maps, remove unused.
waiting_for_[missing|degraded]_object don't need to be
hash_maps, and we don't use stat_object_temp_rd at all.
Swap to map and remove to reduce per-PG memory consumption!
Shouldn't need to include DoutStreambuf.h; that's all implementation.
Don't include Mutex.h, since we don't use it.
*Do* include config.h, since we need it.
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
Josh Durgin [Fri, 18 Feb 2011 01:30:19 +0000 (17:30 -0800)]
librbd: hold image context lock minimally
Holding the image context lock during snapshot removal prevented the
client from responding to a notify, causing a deadlock. This could be
triggered by removing a snapshot while concurrently adding more to the
same image.
Greg Farnum [Tue, 15 Feb 2011 16:58:48 +0000 (08:58 -0800)]
Journaler: call set_layout after init_headers.
set_layout modifies last_committed, but then init_headers
uses operator= and overwrites those changes. In this case
it doesn't matter as they're both writing the same changes,
but make the ordering explicit for the future.
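A small sketch of why the call order matters. Header and MiniJournaler are hypothetical, heavily simplified stand-ins for the real Journaler types, with the layout reduced to a single int.

```cpp
#include <cassert>

// Hypothetical, simplified stand-ins; not the real Journaler types.
struct Header {
  int layout = 0;  // stand-in for the file layout
  long write_pos = 0;
};

struct MiniJournaler {
  Header last_written;
  Header last_committed;

  void set_layout(int l) {
    last_written.layout = l;
    last_committed.layout = l;
  }
  // Whole-struct assignment: replaces every field, including layout.
  void init_headers(const Header& h) {
    last_written = last_committed = h;
  }
};

// Old order: the layout is set first and then overwritten by h.
int layout_after_old_order(int layout, const Header& h) {
  MiniJournaler j;
  j.set_layout(layout);
  j.init_headers(h);
  return j.last_committed.layout;
}

// New order: headers first, then the layout, which survives.
int layout_after_new_order(int layout, const Header& h) {
  MiniJournaler j;
  j.init_headers(h);
  j.set_layout(layout);
  return j.last_committed.layout;
}
```

When both calls carry the same layout the two orders give the same result, which matches the "doesn't matter in this case" remark above.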
We should handle the situation where we assert() while already holding
the dout() lock. At the same time, we want to get the dout lock if we
can, because it makes the logs look nicer. pthread_mutex_trylock solves
the dilemma.
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
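A minimal sketch of the trylock approach, assuming a POSIX system. log_lock and report_failure are hypothetical names, not the actual dout or assert code.

```cpp
#include <cassert>
#include <cstdio>
#include <pthread.h>

// Statically initialized, so it does not depend on global constructor order.
static pthread_mutex_t log_lock = PTHREAD_MUTEX_INITIALIZER;

// Print a failure message, taking the log lock only if it is free.
// Returns true if the lock was taken, false if we printed without it.
bool report_failure(const char* msg) {
  // trylock never blocks: it fails with EBUSY if the lock is held,
  // even when the holder is the calling thread.
  bool locked = (pthread_mutex_trylock(&log_lock) == 0);
  std::fprintf(stderr, "%s\n", msg);
  if (locked)
    pthread_mutex_unlock(&log_lock);
  return locked;
}
```

A blocking pthread_mutex_lock in the same place would deadlock whenever the failing thread already held the lock.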
Convert _dout_lock to plain pthread_mutex_t. This way, we don't have to
depend on the order of global constructor initialization. It should also
be slightly more efficient. The dout_lock was never subject to lockdep
anyway, so that's not an issue.
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
Using ELF TLS via the __thread keyword is much faster than using
pthread_getspecific and pthread_setspecific. It's also much nicer
looking syntactically. Finally, the __thread keyword is going to be
standardized in C++0x. So there's no reason to have an infrastructure
dependent on pthread_getspecific.
There were no users so this shouldn't affect anything negatively.
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
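A short sketch of per-thread storage with __thread, assuming GCC or Clang, where the keyword is available as an extension. The names are hypothetical. For reference, the keyword that C++11 ended up standardizing for this purpose is thread_local.

```cpp
#include <cassert>
#include <thread>

// Each thread gets its own copy, initialized to 0. No key creation and no
// pthread_getspecific/pthread_setspecific calls are needed.
static __thread int per_thread_counter = 0;

// Increment and return the calling thread's counter.
int bump() { return ++per_thread_counter; }

// Run bump() once in a fresh thread and return what that thread saw.
int bump_in_new_thread() {
  int seen = -1;
  std::thread t([&seen] { seen = bump(); });
  t.join();
  return seen;
}
```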
Sage Weil [Sat, 12 Feb 2011 06:47:51 +0000 (22:47 -0800)]
debian: add python, python-dev build-deps
Might be overkill? The error I see from pbuilder is
checking for a Python interpreter with version >= 2.4... none
error: configure: in `/tmp/buildd/ceph-0.24.3-676-gcde53e9':
error: configure: Failed to find Python 2.4 or newer
...but I'm guessing python-dev is needed too?
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>