Sage Weil [Tue, 4 Jan 2011 18:20:18 +0000 (10:20 -0800)]
client: fix frag selection code
Calling fragtree_t::contains() on a non-frag_t is nonsense and will crash.
And a fragtree is a complete partition of the space. What we really want
to check is if we know where to find the specific frag_t we need.
osd: Make g_conf.osd_max_notify_timeout a uint32_t
Make g_conf.osd_max_notify_timeout a uint32_t. Squashes an annoying
compiler warning and avoids the awkward issue of users specifying
negative timeouts.
Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
Sage Weil [Mon, 3 Jan 2011 22:32:48 +0000 (14:32 -0800)]
mds: load root inode on replay if auth
If we are auth for the root inode, load it's initial value off of disk. We
may not see it in the log if it has not been modified. If it has, this
is useless but fast/harmless. This only occurs for brand-new filesystems
where the mds is immediately restarted.
It seems that we have not been zeroing
PG::Info::History:last_epoch_clean when the History structure is
created. This led to some very interesting log output (and bugs!)
Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
Make g_conf.keyring a plain old string rather than an array of strings.
Don't do substitution using the user's HOME variable-- this could lead
to security holes for setuid processes.
Get rid of AuthMonitor::read_keyfile because there is already a Keyring
member function, Keyring::load, that does the same thing.
qa/rbd/common.sh: we can now use cconf to figure out what the keyring
is.
Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
cconf: add a better usage() message, with examples. Give more helpful
error messages when the usage is wrong. Put different actions into
different functions. Eliminate unecessary globals.
Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
It seems that we have not been zeroing
PG::Info::History:last_epoch_clean when the History structure is
created. This led to some very interesting log output (and bugs!)
Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
Sage Weil [Mon, 20 Dec 2010 21:22:49 +0000 (13:22 -0800)]
osd: compensate for replicas with tail > last_complete
Normally we shouldn't ever have a last_complete < log.tail (&& !backlog).
But maybe we do (old bugs, whatever; see #590). In that case, the primary
can compensate by sending more log info to the replica.
Greg Farnum [Mon, 20 Dec 2010 19:34:46 +0000 (11:34 -0800)]
objectcacher: Fix erroneous reference to "lock" with "flock."
This looks to be an old bug introduced years ago in 267679abc7e29e73655da7367d87e22a0a0d2375, and left
undiscovered due to code unuse.
Discovered by inspection while searching for clues to other issues.
Sage Weil [Sat, 18 Dec 2010 05:02:58 +0000 (21:02 -0800)]
mds: make nested scatterlock state change check more robust
The predirty_journal_parents() calls wrlock_start() with nowait=true
because it has a journal entry open and we don't want to trigger a nested
scatterlock change that needs to journal something again (either
via scatter_writebehind or scatter_start). (MDLog can only handle a single
log entry open at once because building multiple at once would require very
very very careful ordering of predirty() calls and versions.)
We were already check for the simple_lock() case (which may call
writebehind); fix up the check to also cover the scatter_mix() (which may
call scatter_start) case.
Sage Weil [Sat, 18 Dec 2010 00:33:15 +0000 (16:33 -0800)]
mds: set a writeable client range on regular files created via MKNOD
If the client reexports ceph via nfs, file creations come through as
a MKNOD followed by OPEN. If it's a MKNOD on a normal file, assume that
the client will probably write to it and set them up with the caps and
client_range to do so without asking us again first.
Sage Weil [Fri, 17 Dec 2010 23:12:17 +0000 (15:12 -0800)]
filestore: make OpSequencer::flush() work for writeahead journaling items
It was only waiting for items in the op_queue to complete. The goal is
to wait for anything we've called queue_transactions(&osr,...) on. If we
do writeahead journaling, though, there might be new ops that are still
journaling but not yet submitted to the fs that are missed.
This adds a journal queue to the OpSequencer, and uses it in the writeahead
case only.
Sage Weil [Fri, 17 Dec 2010 20:54:38 +0000 (12:54 -0800)]
osd: flush pg writes to disk before starting scrub scan
This avoids two races:
- we just completed recovery by pushing objects to the replica, and the
replica starts scanning before those writes reach the fs.
- we just trimmed to something after last_update_applied.
Eliminate calls to dout that use non-existent log levels, like negative
levels less than -1. Also trigger a compiler error in the future if they
get re-added. -1 is the highest priority dout level; putting lower
priorities into the output buffer will just cause errors.
Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
DoutStreambuf: primitive_log: just write to the stdout fd rather than cerr
assert: don't write output to stderr manually (dout will do that
automatically). Do _dout_lock.Lock rather than TryLock. Failing to wait
for the lock is potentially buggy and provides no benefit in that code.
Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
Re-introduce derr as a special log level (level -1) which will show up
in all logs, and on stderr. These messages are the only messages which
will show up on stderr.
Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
Calling messenger->add_dispatcher_head() has the side-effect of starting
the messenger thread. So we must not do it before calling
messenger->start(), which sometimes calls daemonize.
Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
Sage Weil [Wed, 15 Dec 2010 19:01:25 +0000 (11:01 -0800)]
objecter: check for pg mapping changes in each incremental; refactor misc resubmission code
We need to detect when a pg mapping changes but the primary stays the same.
That means we can't just look at the final osdmap and see what is says; we
have to look at each intervening map and check each request to see if
something switched and the osd has thrown our request out.
Also refactor and clean up the linger vs normal op stuff some more.
Sage Weil [Tue, 14 Dec 2010 17:26:12 +0000 (09:26 -0800)]
mon: trim pgmap less aggressively
This will make observer crashes due to missed states (#648) much harder to
hit. Eventually the pgmap state trim problem will go away when the
monitor/paxos code is restructured (#647).
Use Mutex::Locker to make logging exception-safe. That is, if you are
doing "dout() << foo() << dendl;" and foo throws an exception, the dout
mutex will not be left in a locked state.
Signed-off-by: Colin McCabe <colinm@hq.newdream.net>