This commit is just erroneous. It adds checks on a pipe write
for the result and an abort if the write failed. But that's broken
in the desired case where we succeed, block on ceph_fuse_ll_main(),
and the parent process is long-gone by the time we get to this code!
Greg Farnum [Fri, 3 Jun 2011 18:53:10 +0000 (11:53 -0700)]
rados_bencher: re-add written objects constraint to read benchmark.
Somehow, in the last major change, the constraints that kept the
bencher from trying to read non-existent objects got removed. Put
a check back in the main bench loop to fix that.
Greg Farnum [Fri, 3 Jun 2011 16:53:20 +0000 (09:53 -0700)]
mds: Clean up _rename_prepare journaling
This has been broken for a while in terms of journaling
things the MDS isn't auth for. This patch should fix that, and
adds a few asserts to that effect.
Also adds a new not_journaling flag to _rename_prepare
for those cases which call the function and then discard
the bufferlist results. Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Greg Farnum [Thu, 2 Jun 2011 21:27:23 +0000 (14:27 -0700)]
uclient: reset flushing_caps on (mds) cap import.
Previously, we could get stuck thinking that we'd flushed caps
(that went to the original MDS, waited on freeze for export,
and then were dropped) without ever telling the auth MDS that we
wanted to do so. This caused hung shutdowns:
1) during shutdown we drop all our caps
2) we get stuck and notice that we have a flushing cap
3) we send cap flush
4) MDS ignores it (I think because actual data already got updated?
and now we don't have the proper caps either)
Greg Farnum [Thu, 2 Jun 2011 18:43:05 +0000 (11:43 -0700)]
uclient: don't use racy check for uncommitted data.
Previously we used a check for if there were CEPH_CAP_FILE_BUFFER refs,
but that was racy if we had other threads (they could hold caps for
sync writes or something). Instead, see if we have any in-flight
writes or uncommitted objects.
Download files to a 'temporary' name and then rename them when they are
complete. If the download gets aborted halfway through, this prevents
rados sync from getting confused next time when it sees the downloaded
file. This is similar to how rsync operates.
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
Sage Weil [Thu, 28 Apr 2011 04:28:26 +0000 (21:28 -0700)]
librados: implement aio_flush
Implement a per-ioctx flush that blocks until all previously submitted
aio operations on the ioctx are safe. Each aio gets a sequence number and
is put on a linked list attached to the ioctx. The flush operation waits
for it to drain to the watermark set when flush is first called.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil [Wed, 25 May 2011 23:22:06 +0000 (16:22 -0700)]
osd: ignore old/stale heartbeat messages
If we get heartbeat messages from old epochs from peers that are not
current, drop them and mark the connection down. Even if they are peers
we _should_ have (because we haven't gotten a notify yet to learn about
a pg we should have but don't yet) we have a newer map epoch and will learn
about them shortly, reopening the connection.
Fixes: #1107 Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Wed, 25 May 2011 21:54:15 +0000 (14:54 -0700)]
mds: fix canceled lock attempt
If client tries to lock a file, has to wait, and then cancels the attempt,
the client will send an unlock request to unwind its state.
- the unlock now removes the waiting lock attempt from the wait list
- when the lock request retries and finds it is no longer on the wait
list it will fail.
Samuel Just [Wed, 25 May 2011 17:54:27 +0000 (10:54 -0700)]
PG: fix race in _activate_committed
Previously, _activate_committed would access the osdmap epoch racing
with handle_osd_map's osdmap update. This would allow a message to be
sent from a replica to the primary tagged with the same epoch as
last_warm_restart, though the event actually occured before
last_warm_restart. Thus the primary would fail to ignore the event and
transition to crashed.
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Sage Weil [Wed, 18 May 2011 04:29:33 +0000 (21:29 -0700)]
mds: do not shift to EXCL or MIX while rdlocked
There was an old change in file_eval() that was allowing us to switch from
SYNC to MIX or EXCL while there were rdlocks, which either caused lots of
lock thrashing or could (I think) hang things up completely. This was
from ea10a672, an ancient fix for something related that appears to have
taken out the rdlocked check by accident.
In my tests (one writer, one stat-er), this took things from long stalls
(up to 20 seconds) to very responsive stats. Yay!
Fixes: #791 Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
If "rados put" uses write instead of write_full, the resulting object on
the server may be a mismash of old and new objects, if the old object
was longer than the new one. This is fairly counterintuitive behavior
for radostool, so remove it.
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
Sage Weil [Tue, 24 May 2011 16:47:06 +0000 (09:47 -0700)]
osd: add ability to explicitly mark unfound as lost
Instead of automatically marking unfound objects lost (once we've tried
every location we can think of), do it when the administator explicitly
says to. This avoids marking things wrong incorrectly when there are
peering issues, and also allows the administrator to decide whether there
may be offline osds that are worth bringing online.
Sage Weil [Tue, 24 May 2011 16:42:39 +0000 (09:42 -0700)]
osd: make automatically marking of unfound as lost optional
We may not want to do this automatically until we have more confidense in
the recovery code. Even then, possible not. In particular, the OSDs may
believe they have contact all possible homes for the data even though there
is some long-lost OSD that has the data on disk that if offline.
For now, we make the marking process explicit so that the administrator can
make the call.