Sage Weil [Tue, 22 Feb 2011 20:45:21 +0000 (12:45 -0800)]
osd: fix recovery pointer when pulling head before snapid
If recovery wants to pull a snapped object and needs the head first, pull()
does that, but the caller doesn't ++skipped and incorrectly bumps the
recovery pointer, preventing us from going back and re-pulling the snapped
object later.
Return a tristate enum from pull so we can tell what it did and update our
recovery state appropriately.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
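A minimal sketch of the tristate idea described in the commit above. All
names here (PullResult, RecoveryState, advance_after_pull) are hypothetical
and chosen for illustration; they are not the identifiers used in the tree,
and the exact bookkeeping in the real code may differ.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names for illustration only; not the actual Ceph identifiers.
enum class PullResult {
  None,        // nothing was pulled
  PulledObj,   // the requested object itself was pulled
  PulledOther  // something else (e.g. the head) was pulled first
};

struct RecoveryState {
  int started = 0;       // pulls started during this pass
  int skipped = 0;       // objects that must be revisited later
  uint64_t pointer = 0;  // index of the next object to consider
};

// Update recovery bookkeeping according to what pull() actually did.
// The pointer only advances when the requested object itself was pulled.
inline void advance_after_pull(RecoveryState& st, PullResult r) {
  switch (r) {
  case PullResult::PulledObj:
    ++st.started;
    ++st.pointer;  // finished with this object, move on
    break;
  case PullResult::PulledOther:
    ++st.started;
    ++st.skipped;  // keep the pointer: the snapped object must be re-pulled
    break;
  case PullResult::None:
    ++st.skipped;  // nothing happened, so revisit this object later
    break;
  }
}
```

With a plain bool return the caller cannot distinguish "pulled the head
instead" from "pulled what was asked for", which is the case that bumped the
pointer incorrectly.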
Sage Weil [Tue, 22 Feb 2011 20:20:40 +0000 (12:20 -0800)]
osd: verify object version during push
Fail to push if the ondisk version doesn't match the version we want to
send.
This isn't supposed to happen. If it does it means we have a bug somewhere
else. Log something to the error log and don't push. This is better than
the current behavior, which goes into a loop (repeatedly pulling the object
and retrying when it's not the right version).
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
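The check described in the commit above could look roughly like the sketch
below. ObjVersion and can_push are hypothetical names, and writing to
std::cerr stands in for whatever error log the OSD actually uses.

```cpp
#include <cassert>
#include <cstdint>
#include <iostream>

// Hypothetical version type for illustration: an (epoch, version) pair.
struct ObjVersion {
  uint64_t epoch;
  uint64_t version;
};

inline bool operator==(const ObjVersion& a, const ObjVersion& b) {
  return a.epoch == b.epoch && a.version == b.version;
}

// Return true only if the on-disk version is the one we intend to send.
// On mismatch, log an error and refuse to push instead of retrying.
bool can_push(const ObjVersion& ondisk, const ObjVersion& wanted) {
  if (!(ondisk == wanted)) {
    std::cerr << "push: ondisk version " << ondisk.epoch << "'"
              << ondisk.version << " != wanted " << wanted.epoch << "'"
              << wanted.version << "; not pushing" << std::endl;
    return false;
  }
  return true;
}
```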
Sage Weil [Tue, 22 Feb 2011 17:40:47 +0000 (09:40 -0800)]
osd: improve up_thru request behavior
There is some epoch the OSD wants for up_thru, based on when the PG mapping
last changed. However, once the monitor gets to the point where it must
update the map, it should set up_thru to the most recent epoch the OSD has
seen (i.e. the epoch it is known to be "up thru"!). This will hopefully
(and frequently) avoid any subsequent up_thru requests.
MOSDAlive already has a separate field (in PaxosServiceMessage) to hold the
latest epoch; just fix the constructor to set it properly, and make the
monitor use it. No protocol change, yay!
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
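A rough sketch of the monitor-side decision described in the commit above.
The type and function names (AliveRequest, choose_up_thru) are hypothetical;
only the idea of recording the latest epoch the OSD has seen, rather than
the minimum it asked for, comes from the commit message.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

typedef uint32_t epoch_t;  // assumed representation of a map epoch

// Hypothetical request: the epoch the OSD needs, plus the newest map
// epoch it has seen at the time of sending.
struct AliveRequest {
  epoch_t want;
  epoch_t latest_seen;
};

// Return the up_thru value the monitor should record.
epoch_t choose_up_thru(epoch_t current_up_thru, const AliveRequest& req) {
  if (current_up_thru >= req.want)
    return current_up_thru;  // already satisfied, no map update needed
  // We have to update the map anyway, so record the newest epoch the OSD
  // has seen; this may make later requests unnecessary.
  return std::max(req.want, req.latest_seen);
}
```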
Greg Farnum [Mon, 14 Feb 2011 21:24:40 +0000 (13:24 -0800)]
OSD: convert waiting_for_pg from hash_map to map.
This doesn't need to be a hash_map; there will only be an entry
for each PG that gets a message request while it's not active.
Shouldn't be too many PGs that that happens to, right?
Greg Farnum [Sat, 12 Feb 2011 01:25:14 +0000 (17:25 -0800)]
PG: convert hash_maps to maps, remove unused.
waiting_for_[missing|degraded]_object don't need to be
hash_maps, and we don't use stat_object_temp_rd at all.
Swap to map and remove to reduce per-PG memory consumption!
Shouldn't need to include DoutStreambuf.h; that's all implementation.
Don't include Mutex.h, since we don't use it.
*Do* include config.h, since we need it.
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
Josh Durgin [Fri, 18 Feb 2011 01:30:19 +0000 (17:30 -0800)]
librbd: hold image context lock minimally
Holding the image context lock during snapshot removal prevented the
client from responding to a notify, causing a deadlock. This could be
triggered by removing a snapshot while concurrently adding more to the
same image.
Greg Farnum [Tue, 15 Feb 2011 16:58:48 +0000 (08:58 -0800)]
Journaler: call set_layout after init_headers.
set_layout modifies last_committed, but then init_headers
uses operator= and overwrites those changes. In this case
it doesn't matter as they're both writing the same changes,
but make the ordering explicit for the future.
We should handle the situation where we assert() while already holding
the dout() lock. At the same time, we want to get the dout lock if we
can, because it makes the logs look nicer. pthread_mutex_trylock solves
the dilemma.
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
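A small sketch of the trylock approach described in the commit above.
log_lock and report_assert_failure are hypothetical stand-ins for the dout
lock and the assert handler; a real handler would abort after printing
rather than return.

```cpp
#include <cassert>
#include <cstdio>
#include <pthread.h>

// Hypothetical stand-in for the dout lock.
static pthread_mutex_t log_lock = PTHREAD_MUTEX_INITIALIZER;

// Print an assertion failure, taking the log lock only if it is free.
// Returns true if the lock was obtained, false if it was already held
// (possibly by this very thread, where a blocking lock would deadlock).
bool report_assert_failure(const char* msg) {
  // pthread_mutex_trylock returns 0 on success and EBUSY if locked.
  bool got_lock = (pthread_mutex_trylock(&log_lock) == 0);
  std::fprintf(stderr, "assert failed: %s\n", msg);
  if (got_lock)
    pthread_mutex_unlock(&log_lock);
  return got_lock;
}
```

When the lock is free the message is serialized with other log output; when
it is already held the message is still printed, just without that
guarantee.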
Convert _dout_lock to plain pthread_mutex_t. This way, we don't have to
depend on the order of global constructor initialization. It should also
be slightly more efficient. The dout_lock was never subject to lockdep
anyway, so that's not an issue.
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
Using ELF TLS via the __thread keyword is much faster than using
pthread_getspecific and pthread_setspecific. It's also much nicer
looking syntactically. Finally, thread-local storage is going to be
standardized in C++0x (as the thread_local keyword). So there's no reason
to have an infrastructure
dependent on pthread_getspecific.
There were no users so this shouldn't affect anything negatively.
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
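For comparison, the two styles mentioned in the commit above can be
sketched side by side. The counter functions are hypothetical examples, not
code from the tree; __thread is a GCC/Clang extension.

```cpp
#include <cassert>
#include <cstdint>
#include <pthread.h>

// ELF TLS: each thread gets its own copy of the variable.
static __thread int tls_counter = 0;

int bump_tls() {
  return ++tls_counter;
}

// pthread-key version of the same per-thread counter.
static pthread_key_t counter_key;
static pthread_once_t counter_key_once = PTHREAD_ONCE_INIT;

static void make_counter_key() {
  pthread_key_create(&counter_key, nullptr);
}

int bump_key() {
  pthread_once(&counter_key_once, make_counter_key);
  // The value is kept directly in the pointer-sized slot; an unset slot
  // reads back as a null pointer, i.e. zero.
  intptr_t v = reinterpret_cast<intptr_t>(pthread_getspecific(counter_key));
  ++v;
  pthread_setspecific(counter_key, reinterpret_cast<void*>(v));
  return static_cast<int>(v);
}
```

The __thread version needs no key setup and no library call per access,
which is where both the speed and the syntactic difference come from.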
Sage Weil [Sat, 12 Feb 2011 06:47:51 +0000 (22:47 -0800)]
debian: add python, python-dev build-deps
Might be overkill? The error I see from pbuilder is
checking for a Python interpreter with version >= 2.4... none
error: configure: in `/tmp/buildd/ceph-0.24.3-676-gcde53e9':
error: configure: Failed to find Python 2.4 or newer
...but I'm guessing python-dev is needed too?
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Greg Farnum [Fri, 11 Feb 2011 23:54:18 +0000 (15:54 -0800)]
MDCache: add max_dir_commit_size.
Configured by setting mds_dir_max_commit_size in conf, or else
by looking at osd_max_write_size. This should lead to sane
max commits even if the user doesn't specify anything.
This will be used in the next commit or two by CDir.
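The fallback described above might be sketched as follows. The Config
struct is hypothetical; its fields only mirror the option names from the
commit message. Treating 0 as "not set" and using osd_max_write_size
unscaled are assumptions, and the real code may convert units or leave
headroom.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical config holder for illustration.
struct Config {
  uint64_t mds_dir_max_commit_size;  // assumed: 0 means "not set"
  uint64_t osd_max_write_size;       // assumed: same units as above
};

// Pick the maximum directory commit size: the explicit setting if
// present, otherwise a value derived from the OSD write limit.
uint64_t max_dir_commit_size(const Config& c) {
  if (c.mds_dir_max_commit_size != 0)
    return c.mds_dir_max_commit_size;
  return c.osd_max_write_size;
}
```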
Josh Durgin [Fri, 11 Feb 2011 21:21:05 +0000 (13:21 -0800)]
objecter: set linger op target pg when a linger is resent
send_linger always creates a new Op, but op_submit does not fill in
the target pg if an existing session is passed in, so when a linger
was resent, it had the wrong pg set.
This caused a crash in cosd with debugging turned on when running
testlibrbd twice. This occurred because the object context for the
linger in the wrong pg had no object name set.
Currently, we haven't read the configuration at the time we initialize
these locks. So we can't know whether lockdep has been enabled, or what
verbosity it is supposed to have. So just disable it on these locks.
Potentially ExportControl's initialization could be moved to after
g_conf.lockdep and g_conf.debug_lockdep have been read from the
configuration, if lockdep is needed for this component.
ConfFile probably doesn't need a lock at all, but that's another story.
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>