Samuel Just [Tue, 26 Nov 2013 21:20:21 +0000 (13:20 -0800)]
PG: retry GetLog() each time we get a notify in Incomplete
If for some reason there are no up OSDs in the history which
happen to have usable copies of the pg, it's possible that
there is a usable copy elsewhere on the cluster which will
become known to the primary if it waits.
Fixes: #6909
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Yan, Zheng [Tue, 26 Nov 2013 06:41:00 +0000 (14:41 +0800)]
mds: remove superfluous warning of releasing lease
When receiving the lease release message, it's possible that the lease
has already expired and the corresponding dentry has been trimmed from
the cache.
David Zafman [Mon, 25 Nov 2013 20:57:19 +0000 (12:57 -0800)]
osd: Remove bogus assert(active == acting.size())
We saw this assert because active is not correctly computed.
Remove assert and incorrectly computed active count.
We already use acting.size() to determine whether to set PG_STATE_DEGRADED.
Fixes: #6896
Signed-off-by: David Zafman <david.zafman@inktank.com>
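The check that remains after this change can be sketched as follows. This is a minimal illustration, not the code in the tree: the helper name is_degraded() and the use of a plain std::vector<int> for the acting set are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not Ceph's API: a PG counts as degraded when its
// acting set holds fewer OSDs than the pool's target replica count.
bool is_degraded(const std::vector<int>& acting, std::size_t pool_size) {
  assert(pool_size > 0);
  return acting.size() < pool_size;
}
```

With a pool size of 3, an acting set of two OSDs is degraded and an acting set of three is not, with no separate "active" counter involved.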
Josh Durgin [Fri, 18 Oct 2013 15:23:40 +0000 (08:23 -0700)]
buffer: enable tracking of calls to c_str()
Track buffer::ptr::c_str() to catch internal calls that use it, like
buffer::ptr::cmp(). buffer::list::c_str() will be captured by this as
well, since it will do a final buffer::ptr::c_str() and possibly
several more if it needs to rebuild into a single raw buffer.
Josh Durgin [Fri, 18 Oct 2013 14:46:34 +0000 (07:46 -0700)]
buffer: attempt to size raw_pipe buffers
Make sure the requested length is below the maximum pipe size for now,
since we're only using one pipe and splicing once into and out of
it. The default max is 1MB on recent kernels, so this isn't such a
terrible limitation.
To get around this we could use multiple pipes, or keep both source and
destination fds open at the same time and call splice many times. That
is the more usual way to use splice, but it would require a lot more work
to restructure the filestore and messenger to handle it.
Josh Durgin [Mon, 21 Oct 2013 19:40:30 +0000 (12:40 -0700)]
buffer: add methods to read and write using zero copy
Create explicit methods for testing. Make buffer::list::write_fd() use
zero-copy if all the buffers support it. Don't automatically handle
reads yet, since we need better detection of read length first.
Josh Durgin [Mon, 21 Oct 2013 15:58:56 +0000 (08:58 -0700)]
buffer: create raw pipe-based buffer
This uses a pipe to reference kernel memory so we can use splice(2) to
avoid extra data copies. Take an fd in the factory to create it, since
that's the only way to use it efficiently, which is its whole purpose.
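The pattern described here (splice once into a pipe, then splice out of it) might look roughly like the Linux-only sketch below. The function name splice_through_pipe() is hypothetical and this is not the raw_pipe class from the commit; it only shows the two splice(2) calls around a temporary pipe.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // splice(2) is a Linux extension
#endif
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper, not Ceph's raw_pipe: moves up to len bytes from in_fd
// to out_fd through a temporary pipe. Returns bytes moved, or -1 on error.
ssize_t splice_through_pipe(int in_fd, int out_fd, std::size_t len) {
  assert(in_fd >= 0 && out_fd >= 0);
  int p[2];
  if (pipe(p) < 0)
    return -1;
  // Splice once into the pipe. The kernel may move fewer than len bytes,
  // for example when len exceeds the pipe's capacity.
  ssize_t in = splice(in_fd, nullptr, p[1], nullptr, len, SPLICE_F_MOVE);
  ssize_t moved = in < 0 ? -1 : 0;
  // Drain exactly what entered the pipe.
  while (moved >= 0 && moved < in) {
    ssize_t out = splice(p[0], nullptr, out_fd, nullptr,
                         static_cast<std::size_t>(in - moved), SPLICE_F_MOVE);
    if (out <= 0) {
      moved = -1;
      break;
    }
    moved += out;
  }
  close(p[0]);
  close(p[1]);
  return moved;
}
```

The data never passes through a user-space buffer; the pipe only holds references to kernel pages between the two calls.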
Josh Durgin [Wed, 16 Oct 2013 23:23:36 +0000 (16:23 -0700)]
buffer: abstract raw data related methods
Create a virtual function that returns the raw data instead of
accessing it directly, so raw buffers backed by pipes can be used as
buffer::ptrs. Make raw::is_page_aligned() virtual so it will not need
to look at the raw data for a pipe-based buffer.
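As a rough illustration of the abstraction, the sketch below routes all data access through a virtual get_data(). The class names raw_sketch, raw_heap_sketch and raw_lazy_sketch, the 4096-byte page size, and the lazily filled buffer standing in for a pipe-backed one are all assumptions for the example, not the buffer.cc implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified stand-in for buffer::raw: callers use get_data() instead of
// reading a data member directly.
class raw_sketch {
public:
  virtual ~raw_sketch() = default;
  virtual char* get_data() = 0;
  // Virtual so a subclass can answer without materializing its data.
  virtual bool is_page_aligned() = 0;
};

// Heap-backed buffer: the bytes are always in user space.
class raw_heap_sketch : public raw_sketch {
public:
  explicit raw_heap_sketch(std::size_t len) : data_(len) {}
  char* get_data() override { return data_.data(); }
  bool is_page_aligned() override {
    // 4096 is an assumed page size for this sketch.
    return reinterpret_cast<std::uintptr_t>(data_.data()) % 4096 == 0;
  }
private:
  std::vector<char> data_;
};

// Stand-in for a pipe-backed buffer: bytes are produced only on first access.
class raw_lazy_sketch : public raw_sketch {
public:
  raw_lazy_sketch(std::size_t len, char fill) : len_(len), fill_(fill) {}
  char* get_data() override {
    if (!loaded_) {
      data_.assign(len_, fill_);  // a real pipe buffer would read from the pipe
      loaded_ = true;
      ++loads_;
    }
    return data_.data();
  }
  // Answers without touching the data, so nothing is copied just to ask.
  bool is_page_aligned() override { return false; }
  int loads() const { return loads_; }
private:
  std::size_t len_;
  char fill_;
  bool loaded_ = false;
  int loads_ = 0;
  std::vector<char> data_;
};
```

Code that holds a raw_sketch reference works with either subclass, and asking the lazy variant about alignment does not force its data into user space.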
Josh Durgin [Thu, 21 Nov 2013 02:35:34 +0000 (18:35 -0800)]
test: use older names for module setup/teardown
setUp and tearDown require nosetests 0.11, but 0.10.4 is the latest on
centos. Rename to use the older aliases, which still work with newer
versions of nosetests as well.
Fixes: #6368
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
David Zafman [Fri, 11 Oct 2013 22:53:49 +0000 (15:53 -0700)]
osd: Backfill peers should not be included in the acting set
Create actingbackfill in choose_acting()
Use first backfill target as previously
Add asserts to catch inappropriate use of actingbackfill
Use is_acting() in proc_replica_info() because it runs before actingbackfill is set
Remove backfill_targets from stray_set to prevent purge_strays from removing collection
Can't check is_replica() anymore for backfill operations since a backfill peer
is no longer a replica after the acting set change.
Fixes: #5855
Signed-off-by: David Zafman <david.zafman@inktank.com>
Samuel Just [Wed, 30 Oct 2013 18:21:56 +0000 (11:21 -0700)]
ReplicatedPG/PGBackend: block all ops other than Pull prior to active
Previously, it was guaranteed that prior to activation, flushed would
be false on a replica. Now, there may be a period where flushed is true
due to the flush in Stray completing prior to activation and flushed
being false again. This is necessary since shortly it won't be possible
to determine from the osdmap whether a stray will be activated in a
particular interval.
rallred [Tue, 12 Nov 2013 15:29:19 +0000 (08:29 -0700)]
RBD Documentation and Example fixes for --image-format
- RBD Documentation, --image-format wrongly specified as --format in examples
- RBD Documentation, better describe image format, to differentiate from --format
Josh Durgin [Mon, 18 Nov 2013 22:39:12 +0000 (14:39 -0800)]
osd: fix bench block size
The command was declared to take 'size' in dumpling, but was trying to
read 'bsize' instead, so it always used the default of 4MiB. Change
the bench command to read 'size', so it matches what existing clients
are sending.
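The mismatch can be illustrated with a small sketch. The cmdmap_t alias, get_or_default() and the two bench_block_size_* functions are hypothetical names for the example; only the key names 'size' and 'bsize' and the 4MiB default come from the commit message.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for a parsed command: argument name -> value.
using cmdmap_t = std::map<std::string, std::int64_t>;

const std::int64_t kDefaultBlockSize = 4 << 20;  // 4 MiB

std::int64_t get_or_default(const cmdmap_t& cmd, const std::string& key,
                            std::int64_t dflt) {
  auto it = cmd.find(key);
  return it == cmd.end() ? dflt : it->second;
}

// Before the fix: looks up a key that clients never send.
std::int64_t bench_block_size_old(const cmdmap_t& cmd) {
  return get_or_default(cmd, "bsize", kDefaultBlockSize);
}

// After the fix: reads the key the command actually declares.
std::int64_t bench_block_size_fixed(const cmdmap_t& cmd) {
  return get_or_default(cmd, "size", kDefaultBlockSize);
}
```

A client sending size=65536 gets the 4MiB default from the old lookup and 65536 from the fixed one.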
Shipping an object_info_t to a replica with the dirty
flag set would cause the replica to interpret that
object as being lost. Instead, we always encode
lost into the slot where dumpling expects to find
it and add another field at the end of the encoding.
Backport: emperor
Fixes: #6761
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
Samuel Just [Tue, 12 Nov 2013 23:15:26 +0000 (15:15 -0800)]
ReplicatedPG: test for missing head before find_object_context
find_object_context doesn't return EAGAIN for a missing head.
I chose not to change that behavior since it might hide bugs
in the future. All other callers check for missing on head
before calling into find_object_context because we potentially
need head or snapdir to map a snapid onto a clone.
Backport: emperor
Fixes: 6758
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: David Zafman <david.zafman@inktank.com>
Samuel Just [Tue, 12 Nov 2013 21:39:04 +0000 (13:39 -0800)]
JournalingObjectStore: journal->committed_thru after replay
It's possible that the osd stopped between when the filestore
op_seq file was updated and when the journal was trimmed. In
that case, it's possible that on boot the journal might be
full, and yet not be trimmed because commit_start assumes
there is no work to do. Calling committed_thru on the journal
ensures that the journal matches committed_seq.
Backport: emperor dumpling
Fixes: 6756
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: David Zafman <david.zafman@inktank.com>
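The failure mode can be sketched with a toy journal. ToyJournal and its fixed entry capacity are assumptions made for illustration; only the committed_thru name comes from the commit, and the real journal tracks far more state.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Toy journal, not Ceph's implementation: holds op sequence numbers in
// increasing order, up to a fixed capacity.
class ToyJournal {
public:
  explicit ToyJournal(std::size_t capacity) : capacity_(capacity) {}

  // Returns false when the journal is full and the entry cannot be queued.
  bool append(std::uint64_t seq) {
    if (full())
      return false;
    assert(entries_.empty() || seq > entries_.back());
    entries_.push_back(seq);
    return true;
  }

  // Trim every entry the backing store has already committed.
  void committed_thru(std::uint64_t seq) {
    while (!entries_.empty() && entries_.front() <= seq)
      entries_.pop_front();
  }

  bool full() const { return entries_.size() >= capacity_; }

private:
  std::size_t capacity_;
  std::deque<std::uint64_t> entries_;
};
```

If the store recorded committed_seq 4 but the process stopped before trimming, a journal holding entries 1 through 4 comes back full and rejects new entries. Calling committed_thru(4) after replay brings it back in line with the store and frees the space.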
Danny Al-Gaaf [Tue, 5 Nov 2013 18:46:09 +0000 (19:46 +0100)]
osd/osd_types.cc: use !p.tiers.empty() instead of size()
Prefer empty() because the standard guarantees it constant time
complexity regardless of the container type. The same is not
guaranteed for size().
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
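A minimal sketch of the pattern, assuming a trimmed-down struct in place of the real pool type; pool_sketch and has_tiers() are hypothetical names, and only the tiers member and the !p.tiers.empty() test come from the commit title.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Hypothetical, trimmed-down pool description; only tiers matters here.
struct pool_sketch {
  std::set<std::uint64_t> tiers;
};

// Asks "is there at least one tier?" without counting the elements.
bool has_tiers(const pool_sketch& p) {
  return !p.tiers.empty();  // rather than p.tiers.size() > 0
}
```

Before C++11, std::list::size() was allowed to be linear, which is the case the commit message guards against. C++11 made size() constant time for the standard containers that provide it, but empty() still states the intent more directly.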