Sage Weil [Sat, 26 Jun 2010 17:28:38 +0000 (10:28 -0700)]
msgr: fix throttle deadlock
Take the msgr throttle after the peer policy throttle. The msgr (dispatch)
throttle is short-lived and won't deadlock (unless dispatch blocks), so it's
safe to take last. In contrast, the policy throttle is held for the lifetime
of the message, and may block until replication (or other long-running work)
completes.
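The ordering described above can be illustrated with a small sketch. It is not the messenger's code: the Throttle class, receive_message() and the dispatch callback are hypothetical names invented for this example, which assumes a simple counting throttle.

```python
import threading


class Throttle:
    """Hypothetical counting throttle: get() blocks until `count` units fit."""

    def __init__(self, limit):
        self._limit = limit
        self._used = 0
        self._cond = threading.Condition()

    def get(self, count):
        with self._cond:
            # Wait until there is room for `count` more units.
            self._cond.wait_for(lambda: self._used + count <= self._limit)
            self._used += count

    def put(self, count):
        with self._cond:
            self._used -= count
            self._cond.notify_all()

    @property
    def used(self):
        return self._used


def receive_message(size, policy_throttle, dispatch_throttle, dispatch):
    # Long-lived throttle first: it may block for a long time, and nothing
    # else is held while we wait for it.
    policy_throttle.get(size)
    # Short-lived throttle last: it is only held across dispatch.
    dispatch_throttle.get(size)
    try:
        dispatch(size)
    finally:
        dispatch_throttle.put(size)
    # The policy throttle is NOT released here; in this sketch the caller
    # releases it when the message is finally destroyed.
```

Taking the dispatch throttle last means a reader never sits on dispatch budget while it waits, possibly for a long time, on the policy throttle.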
Sage Weil [Sat, 26 Jun 2010 04:46:23 +0000 (21:46 -0700)]
crushwrapper: gracefully handle crush error
crush_do_rule can return <0 in certain error cases (e.g., a force-fed device
that does not exist in the crush map). We should treat that as an empty []
result instead of crashing.
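A minimal sketch of that handling, assuming a hypothetical raw_do_rule callable that fills a result buffer and returns either the number of entries written or a negative error code. The wrapper name and signature are invented for illustration and are not the CrushWrapper API.

```python
def do_rule(raw_do_rule, ruleno, x, maxout):
    """Return the mapped items, or an empty list if the raw call fails."""
    result = [0] * maxout
    numrep = raw_do_rule(ruleno, x, result, maxout)
    if numrep < 0:
        # Error from the underlying call: report "no mapping" rather than
        # using a negative count as a length.
        return []
    return result[:numrep]
```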
Sage Weil [Thu, 24 Jun 2010 23:49:12 +0000 (16:49 -0700)]
mds: keep cap follows above in->first in FLUSHSNAP
The client has a follows of 0 initially, which is correct (it does follow
0, and there are no prior snaps). But the inode has ->first of 2, which
is also fine. The follows here needs to be kept above the inode's first,
though, or the cap cloning goes wrong.
Sage Weil [Thu, 24 Jun 2010 22:50:47 +0000 (15:50 -0700)]
mds: fix client cap condition
In 551a12f52e36 we fixed a bug with cow_inode() where the
cap->client_follows didn't match last precisely. Instead, we compare
to first. But == is too strict: a cap whose follows is equal to _or_older_
than the clone's first should be copied to the clone inode.
This fixes the simple test case
$ echo asdf > bar ; mkdir .snap/bar ; rm bar ; cat .snap/bar/bar
asdf
(Previously we would get nothing unless we waited for the cap to flush on
its own.)
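The relaxed condition can be written as a one-line predicate. This is only a reading of the message above, assuming snap ids are integers where a smaller value means older; the function name is hypothetical.

```python
def cap_goes_to_clone(client_follows, clone_first):
    # The earlier check was `client_follows == clone_first`, which skipped
    # caps whose follows is older (smaller) than the clone's first.
    return client_follows <= clone_first
```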
Sage Weil [Thu, 24 Jun 2010 17:40:14 +0000 (10:40 -0700)]
crush: make CHOOSE_LEAF behave when leaf type is encountered
We may not want to recursively call crush_choose() if we start out with a
leaf. If that happens, we need to fill out the out2[] vector with
our result immediately.
Sage Weil [Wed, 23 Jun 2010 21:08:39 +0000 (14:08 -0700)]
crush: behave when chooseleaf is given leaf type
Fill in the out2 choose_leaf vector if it's defined. This is necessary
because we may not recursively call choose on out2 if the item we're on is
not a bucket (e.g., when chooseleaf is given the leaf type 0).
Thomas Mueller [Mon, 21 Jun 2010 10:32:26 +0000 (10:32 +0000)]
add helptext for option "snapdirname" to manpage of mount.ceph
Inspired by the addition to
http://ceph.newdream.net/wiki/Snapshots about the snapdirname
option, I've created a patch for the mount.ceph manpage.
Sage Weil [Sun, 20 Jun 2010 21:41:19 +0000 (14:41 -0700)]
journal: initialize applied_seq during journal replay
This should avoid the following assertion failure:
#0 0x00007f41b1a18a75 in raise () from /lib/libc.so.6
#1 0x00007f41b1a1c5c0 in abort () from /lib/libc.so.6
#2 0x00007f41b22cd8e5 in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/libstdc++.so.6
#3 0x00007f41b22cbd16 in ?? () from /usr/lib/libstdc++.so.6
#4 0x00007f41b22cbd43 in std::terminate() () from /usr/lib/libstdc++.so.6
#5 0x00007f41b22cbe3e in __cxa_throw () from /usr/lib/libstdc++.so.6
#6 0x00000000005b39f8 in ceph::__ceph_assert_fail (assertion=0x5ec3b2 "seq >= last_committed_seq", file=<value optimized out>, line=711, func=<value optimized out>) at common/assert.cc:30
#7 0x00000000005649e1 in FileJournal::committed_thru (this=0x1116310, seq=0) at os/FileJournal.cc:711
#8 0x000000000055d265 in JournalingObjectStore::commit_finish (this=0x1125740) at os/JournalingObjectStore.cc:186
#9 0x00000000005543f3 in FileStore::sync_entry (this=0x1125740) at os/FileStore.cc:1714
#10 0x00000000004ef93d in FileStore::SyncThread::entry() ()
#11 0x0000000000469a4a in Thread::_entry_func (arg=0x6315) at ./common/Thread.h:39
#12 0x00007f41b28ab9ca in start_thread () from /lib/libpthread.so.0
#13 0x00007f41b1acb6cd in clone () from /lib/libc.so.6
#14 0x0000000000000000 in ?? ()
Sage Weil [Sat, 19 Jun 2010 04:26:41 +0000 (21:26 -0700)]
initscript: remove class loading for now
- only need to do it once, by connecting to a random monitor, not for
each monitor
- not sure we should try it every time we start the monitor for all time,
as opposed to once after mkfs, or whenever the admin chooses to load
new classes
Sage Weil [Fri, 18 Jun 2010 22:59:36 +0000 (15:59 -0700)]
filestore: op_start when op is _queued_, so that q is drained on commit
We need the store in a consistent state on commit, which means flushing
transactions such that we have all ops <= a given seq applied. That is
handled by the commit_start()/commit_started() pair, but will only include
ops in the FileStore queue if we op_start when it is initially queued.
This is exactly what we want: the queue can reorder things, so stopping
only the currently-being-applied updates would keep transactions atomic
but not ordered.
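A toy model may make the difference concrete. It is not FileStore: the Store class and its methods are hypothetical, and the model is single-threaded so the reordering can be shown deterministically.

```python
class Store:
    """Toy model: ops get a seq when queued and may be applied out of order."""

    def __init__(self, start_when_queued):
        self.start_when_queued = start_when_queued
        self.queue = []         # seqs waiting to be applied
        self.started = set()    # seqs that op_start'ed and have not finished
        self.applied = set()
        self.next_seq = 1

    def queue_op(self):
        seq = self.next_seq
        self.next_seq += 1
        self.queue.append(seq)
        if self.start_when_queued:
            self.started.add(seq)   # op_start at queue time
        return seq

    def apply(self, seq):
        self.queue.remove(seq)
        self.started.add(seq)       # op_start at apply time (no-op if queued)
        self.applied.add(seq)
        self.started.discard(seq)   # op finished

    def can_commit(self):
        # Commit may proceed only once every started op has finished.
        return not self.started

    def applied_is_prefix(self):
        # Ordered state: exactly ops 1..k are applied, with no gaps.
        return self.applied == set(range(1, len(self.applied) + 1))


def run(start_when_queued):
    store = Store(start_when_queued)
    for _ in range(3):
        store.queue_op()
    store.apply(2)                  # the queue reorders: op 2 goes first
    return store
```

With run(False) the store reports that it can commit while only op 2 is applied, leaving a gap at op 1. With run(True) the commit has to wait until ops 1 and 3 are applied as well.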
Sage Weil [Thu, 17 Jun 2010 20:37:34 +0000 (13:37 -0700)]
msgr: ref count Pipe to avoid use after free
The Connection has a Pipe pointer to facilitate
send_message(Message, Connection)
but the reaper() clears that pointer when tearing down old pipes. This
leads to a race in which submit_message dereferences the old Pipe pointer.
Instead, make Pipe ref counted, and only submit_message() if we get a
valid Pipe reference. This fixes races between send_message() and
reaper() (as well as any use of the Connection after the pipe is closed).
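The pattern can be sketched as follows. This is an illustration only: the classes and methods are simplified stand-ins with hypothetical names, and the `freed` flag merely simulates deallocation, which Python would otherwise handle itself.

```python
import threading


class Pipe:
    def __init__(self):
        self._lock = threading.Lock()
        self._refs = 1          # reference owned by the Connection
        self.freed = False
        self.sent = []

    def get(self):
        with self._lock:
            self._refs += 1
        return self

    def put(self):
        with self._lock:
            self._refs -= 1
            if self._refs == 0:
                self.freed = True   # stands in for `delete this`

    def send(self, msg):
        assert not self.freed, "use after free"
        self.sent.append(msg)


class Connection:
    def __init__(self, pipe):
        self._lock = threading.Lock()
        self._pipe = pipe

    def get_pipe(self):
        # Take a new reference under the lock, or None if already reaped.
        with self._lock:
            return self._pipe.get() if self._pipe is not None else None

    def clear_pipe(self):
        # Called by the reaper: detach the pipe and drop the owner's ref.
        with self._lock:
            pipe, self._pipe = self._pipe, None
        if pipe is not None:
            pipe.put()


def submit_message(conn, msg):
    pipe = conn.get_pipe()
    if pipe is None:
        return False                # pipe already torn down; nothing to use
    try:
        pipe.send(msg)
        return True
    finally:
        pipe.put()
```

A sender that obtained its reference before the reaper ran keeps the pipe alive until it calls put(); a sender that arrives afterwards gets None and never touches the old pointer.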