Sage Weil [Fri, 22 Oct 2010 22:55:49 +0000 (15:55 -0700)]
client: fix dup entries in multifrag readdir
We need a next_offset of 0 for non-leftmost frags. Otherwise we set
our dentry offsets incorrectly and the next_offset we return to the readdir
callback doesn't line up. This was causing the first readdir on a large
multifrag directory to duplicate the last two items.
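A minimal sketch of the offset scheme being fixed, assuming readdir offsets encode a (frag, position-within-frag) pair and that the leftmost frag reserves positions 0 and 1 for "." and ".."; the names and encoding below are illustrative, not the actual client code:

    #include <cstdint>
    #include <cstdio>
    #include <string>
    #include <utility>
    #include <vector>

    using frag_t = uint32_t;

    // Pack a frag id and an in-frag position into a single readdir offset.
    static uint64_t make_dir_offset(frag_t frag, uint32_t pos) {
      return (static_cast<uint64_t>(frag) << 32) | pos;
    }

    int main() {
      // Two frags of a hypothetical directory, as listed by the MDS.
      std::vector<std::pair<frag_t, std::vector<std::string>>> frags = {
          {0x0, {"a", "b", "c"}},
          {0x1, {"d", "e"}},
      };

      bool leftmost = true;
      for (const auto& [frag, names] : frags) {
        // The fix: a non-leftmost frag starts counting at 0, not wherever
        // the previous frag (or ".", "..") left off.
        uint32_t next_offset = leftmost ? 2 : 0;
        for (const auto& name : names) {
          std::printf("%s -> offset 0x%llx\n", name.c_str(),
                      (unsigned long long)make_dir_offset(frag, next_offset));
          ++next_offset;
        }
        leftmost = false;
      }
      return 0;
    }

With the reset, each frag's offsets start from that frag's base, so the values handed back to the readdir callback line up with what the next listing expects.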
Greg Farnum [Fri, 22 Oct 2010 20:36:30 +0000 (13:36 -0700)]
Revert "messenger: Make sure to unlock existing->pipe_lock. There are a few cases in the "open" section where we can go to fail_unlocked while still holding existing->pipe_lock. So unlock it."
Greg Farnum [Fri, 22 Oct 2010 18:16:24 +0000 (11:16 -0700)]
messenger: If we error out of accept() but have messages in our queue, save Pipe.
This can occur if we're replacing another Pipe and hit an error
in the process.
Greg Farnum [Fri, 22 Oct 2010 18:15:22 +0000 (11:15 -0700)]
messenger: If we're replacing an existing Pipe, steal queue when we kill it!
Previously we could fail out after killing existing but before
splicing its queue into our own, which lost messages.
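The two messenger fixes above share one idea: do not tear the existing Pipe down until its queued messages have been moved somewhere safe. A hedged sketch of that ordering, using made-up Pipe/Message types rather than the real SimpleMessenger code:

    #include <deque>
    #include <memory>
    #include <mutex>
    #include <string>
    #include <utility>

    struct Message {
      std::string payload;
    };

    struct Pipe {
      std::mutex pipe_lock;
      std::deque<std::shared_ptr<Message>> out_q;  // messages waiting to go out

      // Move everything queued on 'victim' onto the tail of our own queue.
      void steal_queue(Pipe& victim) {
        std::scoped_lock l(pipe_lock, victim.pipe_lock);
        while (!victim.out_q.empty()) {
          out_q.push_back(std::move(victim.out_q.front()));
          victim.out_q.pop_front();
        }
      }

      void stop() { /* close the socket, mark the Pipe dead, etc. */ }
    };

    // Replacement path: steal first, then kill.  Failing out between the two
    // steps in the opposite order is exactly how messages were being lost.
    void replace_pipe(Pipe& existing, Pipe& replacement) {
      replacement.steal_queue(existing);
      existing.stop();
    }

    int main() {
      Pipe oldp, newp;
      oldp.out_q.push_back(std::make_shared<Message>(Message{"hello"}));
      replace_pipe(oldp, newp);
      return newp.out_q.size() == 1 ? 0 : 1;
    }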
osd: PG::prior_set_affected: fix lost OSD detection
When looking for newly-lost OSDs, we should check prior_set_lost rather
than prior_set. Down OSDs often are in PG::prior_set_down and NOT in
PG::prior_set.
Also update comment for prior_set_down. Sometimes OSDs are in both
PG::prior_set and PG::prior_set_down.
Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
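A sketch of the distinction the fix relies on; the real containers live in osd/PG.h and differ in detail, but the point is that newly-lost OSDs must be detected against prior_set_lost, since a down OSD may appear only in prior_set_down:

    #include <map>
    #include <set>

    using osd_t = int;
    using epoch_t = unsigned;

    struct PriorSets {
      std::set<osd_t> prior_set;                // OSDs we may need for peering
      std::set<osd_t> prior_set_down;           // OSDs in the interval that are down
      std::map<osd_t, epoch_t> prior_set_lost;  // OSD -> lost_at seen when built
    };

    // True if the new map marks any OSD lost at a newer epoch than the one we
    // recorded when the prior set was built, i.e. the prior set is affected.
    bool prior_set_affected_by_lost(const PriorSets& prior,
                                    const std::map<osd_t, epoch_t>& new_lost_at) {
      for (const auto& [osd, recorded] : prior.prior_set_lost) {
        auto it = new_lost_at.find(osd);
        if (it != new_lost_at.end() && it->second > recorded)
          return true;  // an OSD we track was newly declared lost
      }
      return false;
    }

    int main() {
      PriorSets p;
      p.prior_set_down.insert(3);                // osd3 is down, not in prior_set
      p.prior_set_lost[3] = 10;                  // lost_at recorded at build time
      std::map<osd_t, epoch_t> now = {{3, 12}};  // osd3 marked lost again, later
      return prior_set_affected_by_lost(p, now) ? 0 : 1;
    }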
If the journal is a raw block device, the user shouldn't need to give a
journal size argument most of the time-- it should default to using the
entire block device. This was the old default but it got changed
erroneously by commit ad12d5d5be41ce.
Along the way, I split the FileJournal::_open code into multiple
functions, and added some additional checks. We no longer try block
device ioctls on regular files. We now check for block devices that are
too small.
Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
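A hedged sketch of the resulting behavior, not the actual FileJournal::_open code; it assumes Linux's BLKGETSIZE64 ioctl, and the 4 MB minimum here is an arbitrary placeholder:

    #include <fcntl.h>
    #include <linux/fs.h>     // BLKGETSIZE64
    #include <sys/ioctl.h>
    #include <sys/stat.h>
    #include <unistd.h>

    #include <cstdint>
    #include <cstdio>

    static const uint64_t MIN_JOURNAL_BYTES = 4ull << 20;  // arbitrary placeholder

    // Returns the journal size to use, or 0 on error.  requested == 0 means
    // "no size was given on the command line".
    uint64_t open_journal_size(const char* path, uint64_t requested) {
      int fd = ::open(path, O_RDONLY);
      if (fd < 0) {
        std::perror("open");
        return 0;
      }
      struct stat st;
      if (::fstat(fd, &st) < 0) {
        std::perror("fstat");
        ::close(fd);
        return 0;
      }

      uint64_t size = requested;
      if (S_ISBLK(st.st_mode)) {
        // Block device: only here do we try block-device ioctls.
        uint64_t dev_bytes = 0;
        if (::ioctl(fd, BLKGETSIZE64, &dev_bytes) < 0) {
          std::perror("ioctl(BLKGETSIZE64)");
          ::close(fd);
          return 0;
        }
        if (dev_bytes < MIN_JOURNAL_BYTES) {
          std::fprintf(stderr, "block device %s is too small for a journal\n", path);
          ::close(fd);
          return 0;
        }
        // Default to the whole device when no size was given.
        if (size == 0 || size > dev_bytes)
          size = dev_bytes;
      } else {
        // Regular file: never issue block-device ioctls here.
        if (size == 0)
          size = static_cast<uint64_t>(st.st_size);
      }
      ::close(fd);
      return size;
    }

    int main(int argc, char** argv) {
      if (argc < 2)
        return 1;
      uint64_t sz = open_journal_size(argv[1], 0);
      std::printf("journal size: %llu bytes\n", (unsigned long long)sz);
      return sz ? 0 : 1;
    }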
In the placement group code, track prior_set_lost. This fixes a bug
where a new OSDMap updates an OSD's lost_at time, but the PG code does
not update the PG data structures.
When clearing the peering state, call clear_prior() rather than manually
clearing every prior set.
Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
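A tiny sketch of the design choice, with made-up member types; the real PG keeps more state than this:

    #include <map>
    #include <set>

    // Putting every prior-set container behind one clear_prior() helper means
    // resetting the peering state cannot forget a newly added member such as
    // prior_set_lost.
    struct PG {
      std::set<int> prior_set;
      std::set<int> prior_set_down;
      std::map<int, unsigned> prior_set_lost;

      void clear_prior() {
        prior_set.clear();
        prior_set_down.clear();
        prior_set_lost.clear();
      }
    };

    int main() {
      PG pg;
      pg.prior_set_lost[3] = 10;
      pg.clear_prior();   // one call wipes all of them
      return pg.prior_set_lost.empty() ? 0 : 1;
    }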
Greg Farnum [Mon, 18 Oct 2010 18:23:50 +0000 (11:23 -0700)]
messenger: Make sure to unlock existing->pipe_lock. There are a few cases in the "open" section where we can go to fail_unlocked while still holding existing->pipe_lock. So unlock it.
Sage Weil [Thu, 21 Oct 2010 23:15:03 +0000 (16:15 -0700)]
client: fix dcache removal during multiple frags
We remove unexpected dentries from our cache while processing mds results.
Results are ordered within a frag, but not between them. Since we can
have multiple frags, only prune cached dentries that fall within the current
frag, to avoid removing items that belong to earlier frags.
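A hedged sketch of the pruning rule, not the actual Client readdir/trace code; the frag mapping here is a stand-in for the MDS frag tree:

    #include <functional>
    #include <map>
    #include <set>
    #include <string>

    using frag_t = unsigned;

    // Placeholder for the directory-fragment mapping; the real one is the
    // MDS's frag tree, not std::hash.
    frag_t frag_of(const std::string& name, unsigned nfrags) {
      return std::hash<std::string>{}(name) % nfrags;
    }

    void prune_stale_dentries(std::map<std::string, int>& dcache,   // name -> dentry
                              const std::set<std::string>& listed,  // names in this reply
                              frag_t current_frag, unsigned nfrags) {
      for (auto it = dcache.begin(); it != dcache.end();) {
        bool in_this_frag = frag_of(it->first, nfrags) == current_frag;
        bool seen = listed.count(it->first) > 0;
        if (in_this_frag && !seen)
          it = dcache.erase(it);  // stale entry within the frag we just listed
        else
          ++it;                   // other frags: leave them alone
      }
    }

    int main() {
      std::map<std::string, int> dcache{{"a", 1}, {"b", 2}, {"c", 3}};
      prune_stale_dentries(dcache, /*listed=*/{"a"}, /*current_frag=*/0, /*nfrags=*/1);
      return dcache.size() == 1 ? 0 : 1;
    }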
Sage Weil [Thu, 21 Oct 2010 18:37:45 +0000 (11:37 -0700)]
objecter: reconnect on osd disconnect
If the connection to an OSD closes, we need to reconnect and resubmit our
ops; otherwise we just hang. Transient errors are especially problematic,
since without this we only retry when the OSDMap reflects a change, and that
never happens for transient network/socket errors and such.
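A sketch of the resulting behavior; the names are illustrative, not the Objecter's real data structures:

    #include <map>

    struct Op { int tid; int target_osd; };

    struct Objecter {
      std::map<int, Op> in_flight;  // tid -> op

      void send_op(const Op& op) { /* (re)queue the op on the wire */ }
      void open_session(int osd)  { /* reconnect to the OSD */ }

      // Called when the connection to 'osd' is torn down.
      void handle_osd_reset(int osd) {
        open_session(osd);
        for (auto& [tid, op] : in_flight) {
          (void)tid;  // only the op itself matters here
          if (op.target_osd == osd)
            send_op(op);  // resubmit; otherwise the op would hang forever
        }
      }
    };

    int main() {
      Objecter o;
      o.in_flight[1] = {1, 0};
      o.handle_osd_reset(0);
      return 0;
    }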
Jim Schutt [Thu, 21 Oct 2010 16:32:09 +0000 (10:32 -0600)]
init-ceph: Make sure daemon_is_running() checks the correct instance
When starting multiple instances of a daemon on a single host,
for unknown reasons /var/run/ceph/$type.$id.pid can hold a pid
for which /proc/$pid/cmdline identifies the right type of daemon,
but the wrong instance. When this happens, not all the configured
instances of a daemon are running, but repeated invocations
of "init-ceph start" do not start the missing instances.
So, check for the correct daemon instance id as well as type when
testing if the daemon is up.
Signed-off-by: Jim Schutt <jaschut@sandia.gov>
Signed-off-by: Sage Weil <sage@newdream.net>
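The actual check is a few lines of shell in init-ceph; the sketch below just renders the idea in C++ for concreteness, trusting the pid file only when /proc/<pid>/cmdline names both the daemon type and the expected "-i <id>" argument:

    #include <fstream>
    #include <sstream>
    #include <string>

    bool daemon_is_running(const std::string& type, const std::string& id) {
      std::ifstream pidfile("/var/run/ceph/" + type + "." + id + ".pid");
      long pid = 0;
      if (!(pidfile >> pid) || pid <= 0)
        return false;

      // /proc/<pid>/cmdline is NUL-separated; read it whole and replace NULs
      // with spaces so it can be searched as one string.
      std::ifstream cmdfile("/proc/" + std::to_string(pid) + "/cmdline");
      std::stringstream ss;
      ss << cmdfile.rdbuf();
      std::string cmdline = ss.str();
      for (char& c : cmdline)
        if (c == '\0') c = ' ';

      // Require both the right daemon type *and* the right instance id, so a
      // stale or mismatched pid does not mask a stopped instance.
      return cmdline.find(type) != std::string::npos &&
             cmdline.find("-i " + id) != std::string::npos;
    }

    int main() { return daemon_is_running("cosd", "0") ? 0 : 1; }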
Objecter::shutdown() needs to call Timer::join() to ensure that
concurrently executing events in other threads get flushed before the
Objecter and its Timer are destroyed.
Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
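A generic sketch of the shutdown ordering issue, not Ceph's actual Timer/SafeTimer API: cancel pending events, then join the thread that runs callbacks, before destroying the object those callbacks touch.

    #include <chrono>
    #include <condition_variable>
    #include <functional>
    #include <mutex>
    #include <thread>

    class TinyTimer {
      std::mutex lock;
      std::condition_variable cond;
      std::function<void()> event;     // at most one pending event, for brevity
      bool stopping = false;
      std::thread thread;

      void run() {
        std::unique_lock l(lock);
        while (!stopping) {
          cond.wait_for(l, std::chrono::milliseconds(10));
          if (event) {
            auto e = std::move(event);
            event = nullptr;
            l.unlock();
            e();          // callback runs outside the lock
            l.lock();
          }
        }
      }

     public:
      TinyTimer() : thread([this] { run(); }) {}

      void schedule(std::function<void()> e) {
        std::lock_guard g(lock);
        event = std::move(e);
        cond.notify_one();
      }

      // The important part: shutdown() must join before the owner is
      // destroyed, otherwise a callback may still be running against freed
      // state.
      void shutdown() {
        {
          std::lock_guard g(lock);
          stopping = true;
          event = nullptr;   // drop anything still pending
          cond.notify_one();
        }
        if (thread.joinable())
          thread.join();
      }

      ~TinyTimer() { shutdown(); }
    };

    int main() {
      TinyTimer t;
      t.schedule([] { /* would touch Objecter state */ });
      t.shutdown();   // flush and join before the owner goes away
      return 0;
    }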
Greg Farnum [Tue, 19 Oct 2010 15:57:22 +0000 (08:57 -0700)]
Revert "Revert "messenger: introduce a "halt_delivery" flag, checked by queue_delivery.""
This reverts commit d44267c2d6a77d4a3cda1e44ec7c58a19be51cc4.
The problem with this code was that it's possible for the Pipe
to be reused after calling discard_queue(), and we didn't
account for that. So, along with this revert, halt_delivery is now set back
to false at the end of discard_queue(), leaving the Pipe ready for continued use.
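A sketch of the flag pattern this commit restores; the names mirror the commit message, not the real Pipe internals:

    #include <deque>
    #include <mutex>

    struct FakePipe {
      std::mutex lock;
      std::deque<int> in_q;        // stand-in for queued messages
      bool halt_delivery = false;

      void queue_delivery(int m) {
        std::lock_guard g(lock);
        if (halt_delivery)
          return;                  // being discarded: do not requeue
        in_q.push_back(m);
      }

      void discard_queue() {
        std::unique_lock l(lock);
        halt_delivery = true;      // block concurrent queue_delivery()
        std::deque<int> doomed;
        doomed.swap(in_q);
        l.unlock();
        // In the real code the drained messages are disposed of outside the
        // lock; halt_delivery keeps queue_delivery() from repopulating in_q.
        doomed.clear();
        l.lock();
        halt_delivery = false;     // the Pipe may be reused afterwards
      }
    };

    int main() {
      FakePipe p;
      p.queue_delivery(1);
      p.discard_queue();
      p.queue_delivery(2);         // delivery works again after discard
      return p.in_q.size() == 1 ? 0 : 1;
    }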
Sage Weil [Mon, 18 Oct 2010 20:28:42 +0000 (13:28 -0700)]
filestore: deliberate crash on ENOSPC or EIO
Neither of these is handled, so crash when we hit them. This ensures we
don't blindly continue on with a partially applied transaction and corrupt
our store any further.
Signed-off-by: Sage Weil <sage@newdream.net>
Conflicts:
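A short sketch of the fail-fast policy described above, not the actual FileStore code:

    #include <unistd.h>

    #include <cerrno>
    #include <cstdio>
    #include <cstdlib>

    ssize_t checked_write(int fd, const void* buf, size_t len) {
      ssize_t r = ::write(fd, buf, len);
      if (r < 0 && (errno == ENOSPC || errno == EIO)) {
        // Neither condition is handled; continuing could leave a partially
        // applied transaction behind and corrupt the store further.
        std::perror("filestore write");
        std::abort();
      }
      return r;
    }

    int main() {
      const char msg[] = "journal record\n";
      checked_write(STDOUT_FILENO, msg, sizeof(msg) - 1);
      return 0;
    }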