Samuel Just [Tue, 26 Mar 2013 22:10:37 +0000 (15:10 -0700)]
ReplicatedPG: send entire stats on OP_BACKFILL_FINISH
Otherwise, we update the stat.stat structure, but not the
stat.invalid_stats part. This will result in a recently
split primary propagating the invalid stats but not the
invalid marker. Sending the whole pg_stat_t structure
also mirrors MOSDSubOp.
Fixes: #4557
Backport: bobtail
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Joe Buck [Tue, 26 Mar 2013 21:17:14 +0000 (14:17 -0700)]
testing: fix hadoop-internal-test
Remove now superfluous directory changes
that are causing tests to fail.
This code should have been removed when we transitioned
from running tests with Ant to using Java to run the tests.
Signed-off-by: Joe Buck <jbbuck@gmail.com>
Reviewed-by: Noah Watkins <noahwatkins@gmail.com>
Sam Lang [Mon, 25 Mar 2013 19:55:20 +0000 (14:55 -0500)]
client: Don't signal requests already handled
The assertion failure reported in #4530 is triggered
by the following:
1. client sends request
2. mds sends unsafe reply
3. before request gets journaled, mds is killed
4. mds restarts
5. client receives session close (from close request before restart)
6. session close does kick_requests()
7. kick_requests tries to signal caller that doesn't exist.
This fix avoids signaling a caller if the unsafe reply
has been received and the make_request() function has completed.
We do this by setting the caller_cond to null once the caller
is woken up, and only signal the caller in kick_requests if
caller_cond is non-null. This avoids trying to resend requests that are
listed in mds_requests but have already received unsafe replies.
The unsafe requests are handled by resend_unsafe_requests() code,
so skipping those requests is allowable.
Fixes #4530.
Signed-off-by: Sam Lang <sam.lang@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
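The caller_cond guard described in this commit can be sketched roughly as
follows. This is a minimal Python model, not the Ceph client code: the
MetaRequest class shown here, its kick field, and the caller_woken() helper
are hypothetical simplifications; only the names caller_cond, kick_requests
and mds_requests come from the commit message.

```python
import threading


class MetaRequest:
    """Hypothetical, simplified stand-in for the client's MetaRequest."""

    def __init__(self, tid):
        self.tid = tid
        self.kick = False
        # Non-None only while a caller is blocked waiting for a reply.
        self.caller_cond = threading.Condition()


def caller_woken(request):
    """Model the caller having been woken up: nobody is waiting anymore."""
    request.caller_cond = None


def kick_requests(mds_requests):
    """Signal only requests whose caller is still waiting.

    Returns the tids that were signaled, purely for illustration.
    """
    signaled = []
    for tid, request in sorted(mds_requests.items()):
        cond = request.caller_cond
        if cond is None:
            # The caller already returned (unsafe reply received); skip it.
            continue
        with cond:
            request.kick = True
            cond.notify_all()
        signaled.append(tid)
    return signaled


mds_requests = {1: MetaRequest(1), 2: MetaRequest(2)}
caller_woken(mds_requests[1])   # request 1 already got its unsafe reply
signaled_tids = kick_requests(mds_requests)
```

In this model only request 2 is signaled; request 1 is left for whatever
path resends unsafe requests.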
Gary Lowell [Tue, 26 Mar 2013 18:31:16 +0000 (11:31 -0700)]
ceph-disk: udevadm settle before partprobe
After changing the partition table, allow the udev event to be
processed before calling partprobe. This helps prevent partprobe
from getting a resource busy error on some platforms.
Signed-off-by: Gary Lowell <gary.lowell@inktank.com>
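The ordering this commit introduces can be sketched as below. This is an
illustration only, not the ceph-disk source: the function name
reread_partition_table, the injectable run parameter, and the /dev/sdX
placeholder are assumptions made for the example.

```python
import subprocess


def reread_partition_table(dev, run=subprocess.check_call):
    """Let udev finish processing events, then re-read the partition table.

    `run` is injectable so the ordering can be checked without touching
    real devices.
    """
    commands = [
        ["udevadm", "settle"],  # wait for pending udev events first
        ["partprobe", dev],     # then ask the kernel to re-read the table
    ]
    for cmd in commands:
        run(cmd)
    return commands


# Dry run: record the commands instead of executing them.
recorded = []
reread_partition_table("/dev/sdX", run=recorded.append)
```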
Sam Lang [Tue, 26 Mar 2013 13:55:40 +0000 (08:55 -0500)]
mds: CInode::build_backtrace() always incr iter
Always increment the iterator when adding old pools
to the backtrace. This fixes a bug on files where
the layout had been set to a different pool and then
back to the same pool, causing continuous looping in
the build_backtrace() function.
Fixes #4537.
Signed-off-by: Sam Lang <sam.lang@inktank.com>
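The class of bug fixed here can be illustrated with a small Python loop.
This is a sketch under assumptions: the function name build_old_pools and
the rule of skipping entries equal to the current pool are hypothetical
and are not taken from CInode::build_backtrace() itself.

```python
def build_old_pools(old_pools, current_pool):
    """Collect old pools that differ from the current one.

    The index is advanced on every pass, including when an entry is
    skipped. Advancing it only on the append path would spin forever
    on an entry equal to current_pool.
    """
    result = []
    i = 0
    while i < len(old_pools):
        pool = old_pools[i]
        if pool != current_pool:
            result.append(pool)
        i += 1  # always increment, whether or not the pool was added
    return result


# Layout moved from pool 0 to pool 3 and then back to pool 0.
backtrace_pools = build_old_pools([0, 3], current_pool=0)
```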
Sam Lang [Mon, 25 Mar 2013 17:58:13 +0000 (12:58 -0500)]
client: Handle duplicate safe replies
If the mds sends a duplicate safe reply, the mds_requests
map won't contain a matching request id (tid). Instead of
assert failing, we log a message that we saw a reply without
a matching request.
Also remove redundant mds_requests->erase(tid) line.
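A minimal sketch of the tolerant lookup, written in Python rather than the
client's C++. The function name handle_safe_reply, the logger name, and the
use of a plain dict for mds_requests are assumptions made for illustration.

```python
import logging

log = logging.getLogger("client")


def handle_safe_reply(mds_requests, tid):
    """Return the finished request, or None for an unmatched reply."""
    request = mds_requests.pop(tid, None)
    if request is None:
        # Duplicate (or otherwise unmatched) reply: log it, do not assert.
        log.warning("got reply for tid %d with no matching request", tid)
        return None
    return request


pending = {7: "lookup /foo"}
first = handle_safe_reply(pending, 7)
second = handle_safe_reply(pending, 7)  # duplicate safe reply
```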
Sam Lang [Mon, 25 Mar 2013 16:43:54 +0000 (11:43 -0500)]
client: Always cleanup request after safe
The client MetaRequest should always be cleaned up
and removed from the mds_requests map once the client
gets a safe reply. This patch avoids a leak where the
mds does not send back an unsafe reply and the request
is never cleaned up.
Sam Lang [Mon, 25 Mar 2013 16:39:19 +0000 (11:39 -0500)]
client: Remove got_safe from MetaRequest
Once a safe reply is received, we remove the
request from the mds_requests map, so checking that
it might be a duplicate won't succeed. This patch
removes the got_safe checks in the reply handling code
and the got_safe field on the MetaRequest to avoid confusion.
Yehuda Sadeh [Mon, 25 Mar 2013 16:50:33 +0000 (09:50 -0700)]
rgw: bucket index ops on system buckets shouldn't do anything
Fixes: #4508
Backport: bobtail
On certain bucket index operations we didn't check whether
the bucket was a system bucket, which caused the operations
to fail. This triggered an error message on bucket removal
operations.
Noah Watkins [Fri, 22 Mar 2013 19:42:47 +0000 (12:42 -0700)]
java: support ceph_get_osd_addr
Adds a few JNI utilities from the Android project (license: Apache) to
help with IP address conversions. These functions are also updated to
work in our environment (use Ceph exception utilities, edit header
paths).
Sage Weil [Fri, 22 Mar 2013 19:32:15 +0000 (12:32 -0700)]
crush, mon: unlink vs remove
Make an 'unlink' mode of remove that will remove a link to a bucket but
not remove the bucket itself. This refactors remove_item[_under] and moves
some of the checks into common helpers where they are not duplicated. Fix
callers to pass the extra arg.
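The difference between the two modes can be sketched on a toy hierarchy.
This is not the CRUSH implementation: the dict-of-sets representation and
the unlink_only parameter name are assumptions for the example.

```python
def remove_item(buckets, item, unlink_only=False):
    """Detach `item` from every parent; optionally keep the bucket itself.

    `buckets` maps a bucket name to the set of its child names.
    """
    for children in buckets.values():
        children.discard(item)  # drop the link from each parent
    if not unlink_only:
        buckets.pop(item, None)  # full remove also deletes the bucket


unlinked = {"root": {"rack1"}, "rack1": {"osd.0"}}
remove_item(unlinked, "rack1", unlink_only=True)

removed = {"root": {"rack1"}, "rack1": set()}
remove_item(removed, "rack1")
```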
Sage Weil [Thu, 21 Mar 2013 18:04:59 +0000 (11:04 -0700)]
mon: add 'osd crush add-bucket <name> <type>'
This is (I think) the last missing piece to let you construct an entire
map via the CLI. The add/set commands will construct intervening ancestor
nodes provided there is an existing ancestor to stick them under, but this
is needed to create the initial root node.
Sage Weil [Tue, 19 Mar 2013 21:26:16 +0000 (14:26 -0700)]
os/FileJournal: fix aio self-throttling deadlock
This block of code tries to limit the number of aios in flight by waiting
for the amount of data to be written to grow relative to a function of the
number of aios. Strictly speaking, the condition we are waiting for is a
function of both aio_num and the write queue, but we are only woken by
changes in aio_num, and were (in rare cases) waiting when aio_num == 0 and
there was no possibility of being woken.
Fix this by verifying that aio_num > 0, and restructuring the loop to
recheck that condition on each wakeup.
Fixes: #4079
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
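The shape of the fix can be modelled with a condition variable in Python.
This is a sketch, not the FileJournal code: the AioThrottle class, the
min_bytes_per_aio threshold, and the linear waiting rule are assumptions.
Only the aio_num name and the two ideas (never wait when aio_num == 0,
recheck the condition on each wakeup) come from the commit message.

```python
import threading


class AioThrottle:
    """Hypothetical model of the self-throttling wait."""

    def __init__(self, min_bytes_per_aio):
        self.cond = threading.Condition()
        self.aio_num = 0
        self.min_bytes_per_aio = min_bytes_per_aio

    def _should_wait(self, queued_bytes):
        # Waiting is only safe while an aio is in flight, because only an
        # aio completion notifies the condition.
        return (self.aio_num > 0 and
                queued_bytes < self.aio_num * self.min_bytes_per_aio)

    def wait_for_room(self, queued_bytes):
        with self.cond:
            # Rechecked on every wakeup, not just once.
            while self._should_wait(queued_bytes):
                self.cond.wait()

    def aio_started(self):
        with self.cond:
            self.aio_num += 1

    def aio_completed(self):
        with self.cond:
            self.aio_num -= 1
            self.cond.notify_all()


throttle = AioThrottle(min_bytes_per_aio=4096)

# No aio in flight: must return immediately instead of blocking forever.
throttle.wait_for_room(queued_bytes=0)

# One aio in flight: blocks until its completion wakes the waiter.
throttle.aio_started()
timer = threading.Timer(0.05, throttle.aio_completed)
timer.start()
throttle.wait_for_room(queued_bytes=0)
timer.join()
```

Because the predicate is evaluated while holding the lock and completions
notify under the same lock, a completion cannot slip in unnoticed between
the check and the wait.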