Thomas Mueller [Fri, 2 Jul 2010 08:13:38 +0000 (08:13 +0000)]
ceph.spec.in: some fixes
Encountered the following errors while building an RPM package with
ceph.spec.in:
RPM build errors:
File not found: /var/tmp/ceph-0.21~rc-4el5.elefant-root-mockbuild/usr/bin/mkmonfs
File must begin with "/": %{_initddir}/ceph
Installed (but unpackaged) file(s) found:
/usr/bin/dumpjournal
/usr/bin/dupstore
/usr/bin/psim
/usr/bin/radosacl
/usr/bin/streamtest
/usr/bin/test_ioctls
/usr/bin/test_trans
/usr/bin/testceph
/usr/bin/testcrypto
/usr/bin/testkeys
/usr/bin/testmsgr
/usr/bin/testrados
/usr/bin/testradospp
* mkmonfs: vanished. Isn't it used anymore? If so, man/mkmonfs.8 can
also be removed.
* the initddir error can be ignored, as it is CentOS/RHEL specific (there
the macro is called "initrddir")
* added CXXFLAGS to make
Reported-by: Thomas Mueller <thomas@chaschperli.ch>
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Thu, 1 Jul 2010 19:45:16 +0000 (12:45 -0700)]
msgr: fix dispatch throttler release
We need to release the same amount back to the throttler as we originally
reserved. Store that amount in the Message, and catch all the error
paths. This fixes the case where messages get fed back into dispatch
locally (i.e. not via read_message()).
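A minimal C++ sketch of the idea, assuming a simple byte-count throttler: the message records how many bytes were reserved when it was read, and every release path returns exactly that recorded amount instead of recomputing it from the (possibly changed) payload. `Throttle`, `Message::dispatch_throttle_size`, `reserve_for()` and `release_for()` are illustrative names, not necessarily the messenger's actual API.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>

// Hypothetical byte-count throttler: take() blocks until the bytes fit
// under the limit (or nothing else is held); put() returns them.
class Throttle {
 public:
  explicit Throttle(uint64_t max) : max_(max) {}
  void take(uint64_t n) {
    std::unique_lock<std::mutex> l(lock_);
    cond_.wait(l, [&] { return count_ + n <= max_ || count_ == 0; });
    count_ += n;
  }
  void put(uint64_t n) {
    std::lock_guard<std::mutex> l(lock_);
    assert(n <= count_);  // releasing more than was reserved is a bug
    count_ -= n;
    cond_.notify_all();
  }
  uint64_t current() {
    std::lock_guard<std::mutex> l(lock_);
    return count_;
  }

 private:
  std::mutex lock_;
  std::condition_variable cond_;
  uint64_t max_;
  uint64_t count_ = 0;
};

// Hypothetical message that remembers its own reservation.
struct Message {
  uint64_t payload_len = 0;
  uint64_t dispatch_throttle_size = 0;  // bytes reserved at receive time
};

void reserve_for(Throttle& t, Message& m) {
  m.dispatch_throttle_size = m.payload_len;
  t.take(m.dispatch_throttle_size);
}

// Every error or completion path calls this; it returns exactly what
// was reserved, regardless of what the payload looks like now.
void release_for(Throttle& t, Message& m) {
  t.put(m.dispatch_throttle_size);
  m.dispatch_throttle_size = 0;  // a second release becomes a no-op
}
```

For example, if a message reserved 40 bytes and its payload later shrinks to 10 before release, the throttler still gets all 40 back. Zeroing the recorded size after release keeps a second release on another error path harmless.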
Sage Weil [Thu, 1 Jul 2010 15:28:24 +0000 (08:28 -0700)]
debug: revamp debug/logging
- By default, append to $type.$name.log.
- Get old $hostname.$pid + $type.$name symlink behavior only with
g_conf.log_per_instance
- Add new g_conf.log_file option to force a particular file.
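A sketch of the file-selection precedence described above, assuming three inputs: an explicit log_file wins, log_per_instance restores the old per-process file, and otherwise output is appended to $type.$name.log. `LogConf`, the `/var/log/ceph` default directory and `choose_log_path()` are assumptions for illustration; the sketch does not create the $type.$name symlink that the per-instance mode also sets up.

```cpp
#include <cassert>
#include <string>

// Hypothetical subset of the logging options; names follow the commit text.
struct LogConf {
  std::string log_dir = "/var/log/ceph";  // assumed default directory
  std::string log_file;                   // if non-empty, forces this file
  bool log_per_instance = false;
};

// Returns the path that log output should be appended to.
std::string choose_log_path(const LogConf& c, const std::string& type,
                            const std::string& name, const std::string& host,
                            int pid) {
  assert(!type.empty() && !name.empty());
  if (!c.log_file.empty())
    return c.log_file;  // explicit override wins
  if (c.log_per_instance)  // old behavior: one file per process
    return c.log_dir + "/" + host + "." + std::to_string(pid);
  return c.log_dir + "/" + type + "." + name + ".log";  // new default
}
```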
Greg Farnum [Wed, 30 Jun 2010 18:34:45 +0000 (11:34 -0700)]
client: set MetaRequest::inode to the inode or directory inode whenever possible.
This provides a link to the caps, which is useful when requests come back
with ESTALE or a similar error.
Sage Weil [Wed, 30 Jun 2010 21:40:20 +0000 (14:40 -0700)]
osd: fix, cleanup ack/disk reply logic
There was a bug where we would get no reply if we could send ondisk but
the client didn't want it. This simplifies and cleans up the checks
to make more sense, removing the can_* helpers that were hiding which
checks were being done.
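One way to express the simplified rule, as a hedged sketch: once the write is committed, reply ondisk if the client asked for it, and otherwise fall back to an ack so the client is never left without a reply. `OpState`, `Reply` and `choose_reply()` are hypothetical; the real OSD code tracks more state than this.

```cpp
#include <cassert>

enum class Reply { None, Ack, OnDisk };

// Hypothetical per-op bookkeeping: what the client wants, what was sent.
struct OpState {
  bool want_ack;
  bool want_ondisk;
  bool sent_ack;
  bool sent_ondisk;
};

// Decide which reply (if any) to send now that the op has been
// applied and/or committed.
Reply choose_reply(OpState& op, bool applied, bool committed) {
  if (committed && !op.sent_ondisk) {
    if (op.want_ondisk) {
      op.sent_ondisk = true;
      return Reply::OnDisk;
    }
    // Client didn't ask for ondisk: still make sure it gets its ack.
    if (op.want_ack && !op.sent_ack) {
      op.sent_ack = true;
      return Reply::Ack;
    }
  }
  if (applied && op.want_ack && !op.sent_ack) {
    op.sent_ack = true;
    return Reply::Ack;
  }
  return Reply::None;
}
```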
Sage Weil [Wed, 30 Jun 2010 19:08:18 +0000 (12:08 -0700)]
msgr: release bytes reserved from throttlers in failure paths
If we don't release those bytes, the throttler count eventually fills up
with bytes we were going to read but didn't (due to socket errors, etc.)
until we can't read anything.
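The commit patches each failure path explicitly. A different technique that enforces the same invariant is an RAII guard that returns the reserved bytes unless ownership is handed to the finished message; the sketch below uses that approach, with an exception standing in for the socket-error returns. `Counter`, `ThrottleGuard` and this `read_message()` are simplified stand-ins, not the messenger's code.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Stand-in for a throttler's byte count.
struct Counter {
  uint64_t count = 0;
  void take(uint64_t n) { count += n; }
  void put(uint64_t n) {
    assert(n <= count);
    count -= n;
  }
};

// Hypothetical guard: gives the reservation back on scope exit unless
// ownership was handed off to a fully received message.
class ThrottleGuard {
 public:
  ThrottleGuard(Counter& c, uint64_t n) : c_(c), n_(n) { c_.take(n_); }
  ~ThrottleGuard() {
    if (n_) c_.put(n_);
  }
  uint64_t release_ownership() {
    uint64_t n = n_;
    n_ = 0;
    return n;
  }
  ThrottleGuard(const ThrottleGuard&) = delete;
  ThrottleGuard& operator=(const ThrottleGuard&) = delete;

 private:
  Counter& c_;
  uint64_t n_;
};

// Simulated read: reserve up front, possibly fail partway through.
uint64_t read_message(Counter& c, uint64_t len, bool socket_error) {
  ThrottleGuard g(c, len);
  if (socket_error)
    throw std::runtime_error("socket error");  // guard returns the bytes
  return g.release_ownership();  // the message now owns the reservation
}
```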
Sage Weil [Mon, 28 Jun 2010 21:15:59 +0000 (14:15 -0700)]
msgr: use dedicated reaper thread
We were calling the reaper from the wait() loop. The problem is that
the OSD has two messengers, and only the first was in wait(); the second
messenger's wait() was only called after the first terminated (i.e., when
the OSD was shutting down).
Instead, launch a separate reaper thread when we bind, and close it out
on shutdown right after the accepter.
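A minimal sketch of a dedicated reaper thread built on the standard library: dead connections are queued, a background thread drains the queue, and shutdown stops the thread only after everything queued has been reaped. `Reaper`, `queue()` and the integer pipe ids are hypothetical; a real reaper would join the pipe's threads and free it where this sketch only counts.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>

// Hypothetical reaper, independent of any messenger's wait() loop.
class Reaper {
 public:
  void start() { thread_ = std::thread([this] { entry(); }); }
  void queue(int pipe_id) {
    std::lock_guard<std::mutex> l(lock_);
    dead_.push_back(pipe_id);
    cond_.notify_one();
  }
  // Called on shutdown, e.g. right after the accepter is stopped.
  void stop() {
    {
      std::lock_guard<std::mutex> l(lock_);
      stopping_ = true;
    }
    cond_.notify_one();
    thread_.join();
  }
  int reaped() {
    std::lock_guard<std::mutex> l(lock_);
    return reaped_;
  }

 private:
  void entry() {
    std::unique_lock<std::mutex> l(lock_);
    while (true) {
      cond_.wait(l, [this] { return stopping_ || !dead_.empty(); });
      while (!dead_.empty()) {  // drain everything, even when stopping
        dead_.pop_front();      // a real reaper would join/delete here
        ++reaped_;
      }
      if (stopping_) return;
    }
  }

  std::mutex lock_;
  std::condition_variable cond_;
  std::deque<int> dead_;
  bool stopping_ = false;
  int reaped_ = 0;
  std::thread thread_;
};
```

Because the thread is started at bind time, each messenger reaps its own dead connections whether or not anyone is blocked in its wait().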
Sage Weil [Tue, 29 Jun 2010 21:32:28 +0000 (14:32 -0700)]
osd: always use original Connection when replying
...even when the op came from another OSD. Not that that should happen
anyway, since we don't forward messages currently. (And can't, since the
OSD doesn't initiate connections to the client!)
Sage Weil [Mon, 28 Jun 2010 18:44:26 +0000 (11:44 -0700)]
journal: set max journal write to 10MB
If we take too big a bite of data to write in a single writev(2), we can
end up making performance worse, because everyone waits for the full write
to complete. Bigger writes mean better throughput but higher latency.
So, balance the two by placing an upper limit on the size of a single write.
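A sketch of how such a cap might be applied when batching queued journal entries into one writev(2), assuming entry sizes are known up front. `next_batch()` and `kMaxWriteBytes` are illustrative names; the sketch always takes at least one entry so that a single oversized entry still gets written.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed cap, matching the commit's 10MB figure.
constexpr std::size_t kMaxWriteBytes = 10 << 20;

// Returns how many queued entries go into the next write: stop before
// exceeding max_bytes, but never return 0 for a non-empty queue.
std::size_t next_batch(const std::vector<std::size_t>& sizes,
                       std::size_t max_bytes) {
  std::size_t total = 0, n = 0;
  for (std::size_t s : sizes) {
    if (n > 0 && total + s > max_bytes) break;
    total += s;
    ++n;
  }
  return n;
}
```

With three queued 4MB entries, the first batch takes two (8MB) and leaves the third for the next write.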
Sage Weil [Mon, 28 Jun 2010 18:34:29 +0000 (11:34 -0700)]
filejournal: fix journal write_pos advance
This was broken by bd4188a02abff9efffb87a0a2031efe51c1b4d9a. @pos needs to
be advanced (it is passed by reference), or else we just overwrite the same
bytes at the journal start over and over again.
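A simplified illustration of why the by-reference position matters: each call must leave `pos` just past the bytes it wrote, so consecutive writes land one after another instead of on top of each other. `write_bl()` and the plain-string ring are hypothetical, and wrapping to offset 0 ignores the journal header that a real journal would skip.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Writes `data` into a ring buffer at `pos` and advances `pos`
// (taken by reference) past what was written, wrapping at the end.
void write_bl(std::string& ring, uint64_t& pos, const std::string& data) {
  for (char ch : data) {
    ring[pos] = ch;
    pos = (pos + 1) % ring.size();
  }
}
```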
Sage Weil [Sat, 26 Jun 2010 17:28:38 +0000 (10:28 -0700)]
msgr: fix throttle deadlock
Do msgr throttle after peer policy throttle. The msgr (dispatch) throttle
is short-lived and won't deadlock (unless dispatch blocks), so it's safe to
take last. In contrast, the policy throttle carries over the lifetime of
the message, and may block until replication completes or whatever else.
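The ordering rule can be sketched with a hypothetical `Throttle` that only records the order of acquisitions: the long-lived, per-peer policy throttle is taken first and the short-lived dispatch throttle last. Presumably this keeps a reader that is blocked on the policy throttle (possibly waiting for replication) from holding dispatch-throttle bytes that the messages it is waiting on would need. `Throttle` and `reserve()` are illustrative names only.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical throttle that just logs acquisitions, so the order is visible.
struct Throttle {
  std::string name;
  std::vector<std::string>* trace;
  void take(uint64_t) { trace->push_back("take " + name); }
};

// Reserve for an incoming message of `len` bytes.
void reserve(Throttle& policy, Throttle& dispatch, uint64_t len) {
  policy.take(len);    // may block for a long time (message lifetime)
  dispatch.take(len);  // short-lived: released once dispatch returns
}
```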
Sage Weil [Sat, 26 Jun 2010 04:46:23 +0000 (21:46 -0700)]
crushwrapper: gracefully handle crush error
crush_do_rule can return <0 in certain error cases (e.g., a force-fed device
does not exist in the crush map). We should take that to mean an empty []
result instead of crashing.
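A hedged sketch of the wrapper-side handling: a negative return from rule evaluation is treated as an empty mapping. `fake_do_rule()` stands in for crush_do_rule() with a simplified, hypothetical signature and a made-up 4-device map, and `do_rule()` is an illustrative wrapper.

```cpp
#include <cassert>
#include <vector>

// Stand-in for crush_do_rule(): writes up to `max` results to `out` and
// returns how many, or a negative error if the forced device is unknown.
int fake_do_rule(int forced_device, int* out, int max) {
  if (forced_device >= 4) return -2;  // e.g. device not in the map
  int n = 0;
  for (int d = 0; d < 4 && n < max; ++d) out[n++] = d;
  return n;
}

// Wrapper: any error means "no mapping" rather than a crash.
std::vector<int> do_rule(int forced_device, int maxout) {
  std::vector<int> rawout(maxout);
  int numrep = fake_do_rule(forced_device, rawout.data(), maxout);
  if (numrep < 0)
    numrep = 0;  // error: return an empty result
  rawout.resize(numrep);
  return rawout;
}
```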
Sage Weil [Thu, 24 Jun 2010 23:49:12 +0000 (16:49 -0700)]
mds: keep cap follows above in->first in FLUSHSNAP
The client has a follows of 0 initially, which is correct (it does follow
0, and there are no prior snaps). But the inode has ->first of 2, which
is also fine. The follows value here needs to be kept above the inode's
first, though, or the cap cloning goes wrong.
Sage Weil [Thu, 24 Jun 2010 22:50:47 +0000 (15:50 -0700)]
mds: fix client cap condition
In 551a12f52e36 we fixed a bug with cow_inode() where the
cap->client_follows didn't match last precisely. Instead, we compare
to first. But the == is too strict: a cap whose follows is equal to _or
older_ than the clone's first should be copied to the clone inode.
This fixes the simple test case
$ echo asdf > bar ; mkdir .snap/bar ; rm bar ; cat .snap/bar/bar
asdf
(Previously we would get nothing unless we waited for the cap to flush on
its own.)
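The relaxed condition amounts to a one-line predicate. This sketch assumes a plain integer `snapid_t` and a hypothetical helper name; it encodes only the comparison described above, not the rest of cow_inode().

```cpp
#include <cassert>
#include <cstdint>

using snapid_t = uint64_t;  // assumed alias for illustration

// Should a cap be copied to a clone whose range starts at clone_first?
// Previously: client_follows == clone_first. Now: equal or older.
bool cap_belongs_to_clone(snapid_t client_follows, snapid_t clone_first) {
  return client_follows <= clone_first;
}
```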
Sage Weil [Thu, 24 Jun 2010 17:40:14 +0000 (10:40 -0700)]
crush: make CHOOSE_LEAF behave when the leaf type is encountered
We may not want to recursively call crush_choose() if we start out with a
leaf. If that happens, we need to fill out the out2[] vector with
our result immediately.
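A heavily simplified sketch of the behavior, assuming devices (ids >= 0) are the leaf type and buckets (ids < 0) always contain at least one item: when a chosen item is already a leaf, it goes into out2 directly, and only buckets are descended. `Map`, `descend_to_leaf()` and `choose_leaf()` are hypothetical and skip CRUSH's hashing, weights and retries.

```cpp
#include <cassert>
#include <vector>

// Hypothetical map: bucket id b (< 0) has items children[-(b + 1)].
struct Map {
  std::vector<std::vector<int>> children;
};

// Stand-in for the recursive crush_choose(): follow first items to a leaf.
int descend_to_leaf(const Map& m, int item) {
  while (item < 0) {
    const std::vector<int>& kids = m.children[-(item + 1)];
    assert(!kids.empty());
    item = kids.front();
  }
  return item;
}

// CHOOSE_LEAF step: out gets the chosen items, out2 the leaf for each.
void choose_leaf(const Map& m, const std::vector<int>& chosen,
                 std::vector<int>& out, std::vector<int>& out2) {
  for (int item : chosen) {
    out.push_back(item);
    if (item >= 0)
      out2.push_back(item);  // already a leaf: record it immediately
    else
      out2.push_back(descend_to_leaf(m, item));  // recurse only for buckets
  }
}
```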