Greg Farnum [Mon, 20 Dec 2010 19:34:46 +0000 (11:34 -0800)]
objectcacher: Fix erroneous reference to "lock" with "flock."
This looks to be an old bug introduced years ago in 267679abc7e29e73655da7367d87e22a0a0d2375, and left
undiscovered because the affected code is unused.
Discovered by inspection while searching for clues to other issues.
Sage Weil [Sat, 18 Dec 2010 00:33:15 +0000 (16:33 -0800)]
mds: set a writeable client range on regular files created via MKNOD
If the client reexports ceph via nfs, file creations come through as
a MKNOD followed by OPEN. If it's a MKNOD on a normal file, assume that
the client will probably write to it and set them up with the caps and
client_range to do so without asking us again first.
Eliminate calls to dout that use non-existent log levels, like negative
levels less than -1. Also trigger a compiler error in the future if they
get re-added. -1 is the highest priority dout level; putting lower
priorities into the output buffer will just cause errors.
Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
DoutStreambuf: primitive_log: just write to the stdout fd rather than cerr
assert: don't write output to stderr manually (dout will do that
automatically). Do _dout_lock.Lock rather than TryLock. Failing to wait
for the lock is potentially buggy and provides no benefit in that code.
Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
Re-introduce derr as a special log level (level -1) which will show up
in all logs, and on stderr. These messages are the only messages which
will show up on stderr.
Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
Calling messenger->add_dispatcher_head() has the side-effect of starting
the messenger thread. So we must not do it before calling
messenger->start(), which sometimes calls daemonize.
Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
Sage Weil [Wed, 15 Dec 2010 19:01:25 +0000 (11:01 -0800)]
objecter: check for pg mapping changes in each incremental; refactor misc resubmission code
We need to detect when a pg mapping changes but the primary stays the same.
That means we can't just look at the final osdmap and see what it says; we
have to look at each intervening map and check each request to see if
something switched and the osd has thrown our request out.
Also refactor and clean up the linger vs normal op stuff some more.
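The point about intervening maps can be illustrated with a minimal Python sketch. It is not the objecter code: the map layout (a dict from pg id to an acting-set list with the primary first), the function names, and the choice to treat any acting-set change as a "mapping change" are all assumptions made for illustration.

```python
def needs_resubmit(pg, maps):
    """Return True if pg's acting set differs between any two consecutive maps."""
    for prev, cur in zip(maps, maps[1:]):
        if prev[pg] != cur[pg]:
            return True
    return False


def needs_resubmit_final_only(pg, maps):
    """Naive check: compare only the primary in the first and the last map."""
    return maps[0][pg][0] != maps[-1][pg][0]


# Hypothetical sequence of osdmap epochs for one pg; primary is the first entry.
maps = [
    {"1.0": [3, 5]},
    {"1.0": [3, 7]},  # replica changed, primary is still osd 3
    {"1.0": [3, 5]},  # mapping returns to the original acting set
]
```

With this data the per-epoch check reports a change while the final-map check does not, which is the case the commit describes.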
Use Mutex::Locker to make logging exception-safe. That is, if you are
doing "dout() << foo() << dendl;" and foo throws an exception, the dout
mutex will not be left in a locked state.
Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
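The exception-safety argument can be shown with a rough Python analogue, where a `with` block plays the role of a scoped locker such as Mutex::Locker. The names `dout_lock`, `foo`, `log_unsafe`, `log_safe`, and `lock_left_held` are hypothetical stand-ins, not Ceph identifiers.

```python
import threading

dout_lock = threading.Lock()  # hypothetical stand-in for the dout mutex


def foo():
    # Stand-in for an expression in the log statement that throws.
    raise RuntimeError("foo failed")


def log_unsafe():
    dout_lock.acquire()
    message = "value: " + foo()  # raises, so the release below never runs
    dout_lock.release()


def log_safe():
    with dout_lock:  # released on scope exit, even when foo() raises
        message = "value: " + foo()


def lock_left_held(fn):
    """Run fn, swallow its exception, and report whether the lock stayed held."""
    try:
        fn()
    except RuntimeError:
        pass
    held = dout_lock.locked()
    if held:
        dout_lock.release()  # clean up so the next check starts unlocked
    return held
```

Here `lock_left_held(log_unsafe)` reports that the lock was left held, while `lock_left_held(log_safe)` reports that it was released.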
Vangelis Koukis [Thu, 9 Dec 2010 18:53:22 +0000 (20:53 +0200)]
Fix overflow in FileJournal::_open_file()
Running the unstable branch, mkcephfs fails when trying to create
a 3GB journal file on the OSDs.
Relevant messages from the osd logfile:
2010-12-09 19:03:54.419737 7fdde4d51720 journal _open_file: unable to extend journal to 18446744072560312320 bytes
2010-12-09 19:03:54.419789 7fdde4d51720 filestore(/osd) mkjournal error creating journal on /osd/journal
The problem is that the calculation of the journal size in bytes
overflows, in FileJournal::_open_file().
Signed-off-by: Vangelis Koukis <vkoukis@cslab.ece.ntua.gr>
Signed-off-by: Sage Weil <sage@newdream.net>
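The arithmetic behind the logged size can be reproduced with a small Python sketch that emulates 32-bit signed wraparound. This is not the FileJournal code: the assumption is that the size is configured in megabytes and converted with a 20-bit shift in 32-bit arithmetic, and the value of 3000 MB is inferred only because it reproduces the number in the log above.

```python
MB_SHIFT = 20  # assumed conversion from megabytes to bytes


def wrap_int32(x):
    """Emulate storing x in a 32-bit signed integer (two's complement)."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def as_uint64(x):
    """Reinterpret a (possibly negative) integer as a 64-bit unsigned value."""
    return x & 0xFFFFFFFFFFFFFFFF


def journal_bytes_32bit(size_mb):
    # Overflowing variant: the product wraps negative, then gets sign-extended.
    return as_uint64(wrap_int32(size_mb << MB_SHIFT))


def journal_bytes_64bit(size_mb):
    # Widened variant: the product is computed without truncation.
    return size_mb << MB_SHIFT
```

Under these assumptions `journal_bytes_32bit(3000)` yields 18446744072560312320, the size in the log message, whereas `journal_bytes_64bit(3000)` yields 3145728000.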
Sage Weil [Wed, 8 Dec 2010 23:53:13 +0000 (15:53 -0800)]
filejournal: reset last_committed_seq if we find journal to be invalid
If we read an event that's later than our expected entry, we set read_pos
to -1 and discard the journal. If that happens we also need to reset
last_committed_seq to avoid a crash like
Sage Weil [Tue, 7 Dec 2010 21:31:01 +0000 (13:31 -0800)]
mds: sync->mix replica state is sync->mix(2)
When auth first moves to sync->mix,
- auth sends AC_MIX to replicas
- replicas go to sync->mix
- replicas finish gather, send AC_SYNCACK, move to sync->mix(2)
- auth gets all acks, sends AC_MIX again
- replica moves to MIX
So any new replica should just get sync->mix(2), so that it is not confused
by the second AC_MIX.
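The replica side of the sequence above can be sketched as a toy state machine in Python. It is only an illustration of the listed steps: the state strings mirror the commit text, `gather_done` is a hypothetical event name for the replica finishing its gather, and none of this is the MDS locker code.

```python
# Replica-side transitions, keyed by (current state, event).
TRANSITIONS = {
    ("sync", "AC_MIX"): "sync->mix",
    ("sync->mix", "gather_done"): "sync->mix(2)",  # replica sends AC_SYNCACK here
    ("sync->mix(2)", "AC_MIX"): "mix",
}


def run(state, events):
    """Apply events in order and return the final state; reject unknown steps."""
    for event in events:
        key = (state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"unexpected {event} in state {state}")
        state = TRANSITIONS[key]
    return state
```

An existing replica goes through `run("sync", ["AC_MIX", "gather_done", "AC_MIX"])`. A new replica that starts in `sync->mix(2)` handles the second AC_MIX cleanly, while one started in `sync->mix` would hit the `ValueError` branch of this sketch.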
Sage Weil [Tue, 7 Dec 2010 19:15:56 +0000 (11:15 -0800)]
mds: open undef dirfrags during rejoin
Any invented dirfrags have a version of 0. This will cause problems later
if we pre_dirty() anything in that dir because the dir version won't be
in sync (it'll be way too small). Also, we can do that at any point,
e.g. when flushing dirty caps, and aren't allowed to delay, so we need to
load those dirfrags now.
In theory we could read only the fnode and not all the dentries, but we
may as well. We should be more careful about memory than this patch is,
though.
Sage Weil [Tue, 7 Dec 2010 17:06:47 +0000 (09:06 -0800)]
mds: send LOCKFLUSHED to trigger finish_flush on replicas
Since f741766a we have triggered start_flush and finish_flush on replicas.
The problem is that the finish_flush didn't always happen for the mix->lock
case: we would start_flush when we sent the AC_LOCKACK, but could only
finish_flush if/when we got another SYNC or MIX. If the primary stayed in
the LOCK state, we would keep our flushing flag. That in turn causes
problems later when we try to eval_gather() (esp if we are auth at that
point?).
Fix this by sending an explicit AC_LOCKFLUSHED message to replicas after
we do a scatter_writebehind. The replica will only set flushing if it
flushed dirty data, which forces scatter_writebehind, so we will always
get the LOCKFLUSHED to match. Replicas that didn't flush will also get
it, but oh well. We'd need to keep track of which ones sent dirty data to
do that properly, though.
TODO: still need to verify that this is correct for rejoin.
Sage Weil [Tue, 7 Dec 2010 15:58:01 +0000 (07:58 -0800)]
mds: clear EXPORTINGCAPS on export_reverse
We need to reverse the effects of encode_export_inode_caps(), which is just
the pin and state bit.
The original problem can be reproduced with
- ceph tell mds 0 injectargs '--mds-kill-import-at 5'
- restart mds
- recovery completes successfully
- wait for the subtree to be reexported
- fail with bad EXPORTINGCAPS get in encode_export_inode_caps