Sage Weil [Tue, 11 Sep 2012 15:48:34 +0000 (08:48 -0700)]
mon: make redundant osd.NNN argument optional
Instead of 'osd crush set NNN osd.NNN weight loc...', make the second
osd.NNN option optional, and allow either NNN or osd.NNN to specify the
osd id. This makes the usage much more sane, but maintains backward
compatibility.
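A minimal sketch of the argument handling described above, not the monitor's actual code. The helper names (parse_osd_id, parse_crush_set_args) are hypothetical, and the key=value form of the loc arguments is an assumption.

```python
def parse_osd_id(token):
    """Accept either 'NNN' or 'osd.NNN' and return the integer osd id.

    Hypothetical helper for illustration only.
    """
    prefix = "osd."
    digits = token[len(prefix):] if token.startswith(prefix) else token
    if not digits.isdigit():
        raise ValueError(f"invalid osd id: {token!r}")
    return int(digits)


def parse_crush_set_args(args):
    """Split 'osd crush set' arguments into (id, weight, loc).

    The redundant second 'osd.NNN' name is skipped if present, so both the
    old and the new argument forms are accepted.
    """
    osd_id = parse_osd_id(args[0])
    rest = args[1:]
    # Old form: the id is followed by a redundant 'osd.NNN' name.
    if rest and rest[0].startswith("osd."):
        if parse_osd_id(rest[0]) != osd_id:
            raise ValueError("osd name does not match id")
        rest = rest[1:]
    weight = float(rest[0])
    # Assumption: location arguments are key=value pairs.
    loc = dict(kv.split("=", 1) for kv in rest[1:])
    return osd_id, weight, loc
```

Both "12 osd.12 1.0 host=a" and "osd.12 1.0 host=a" parse to the same result under this sketch.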
Sage Weil [Tue, 4 Sep 2012 22:25:20 +0000 (15:25 -0700)]
osd: fill in user log entry last after snapdir tran
Reorder the snapdir logic and ctx->at_version adjustments prior to filling
in the object_info_t and user_versions and all that stuff. Adjust
at_version after appending the log entry (so that it points to the next
position/version we will write at, culminating in the actual user
event).
The user log entry contains the request id, which will be used
by replay ops to put themselves in the correct place in the
waiting_for_commit/ack maps. Thus, the repop needs to be tagged
with the same version as the log entry with the request id.
Thus, the request id bearing log entry should be the last in
the log entry vector.
This should fix #3072, wherein a replay which should wait on
the repop tagged as version '36 will instead wait on '35.
Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Samuel Just <sam.just@inktank.com>
Instead of keeping only flat usage info per bucket, we
now maintain a list of categories in which request
usage is aggregated. Ops are put in categories based
on their names.
Samuel Just [Wed, 5 Sep 2012 22:56:25 +0000 (15:56 -0700)]
PG: clear want_acting in choose_acting if want == acting
Otherwise, a pg_temp from a previous peering sequence
(but not a different peering_interval) might leak through
into Active and incorrectly trip the
Active::react(AdvMap&) asserts regarding want_acting.
Those asserts assume that want_acting is either empty or is
a result of recovery completion. In the latter case, the
want_acting set must consist only of elements of up and
acting.
Mike Ryan [Mon, 27 Aug 2012 18:16:17 +0000 (11:16 -0700)]
osd: deep scrub, read file contents from disk and compare digest
Deep scrub reads the contents of every file from the store and computes
a crc32 digest. The primary compares the digest of all replicas and will
mark the PG inconsistent if any don't match.
OSDs that do not support deep scrub simply perform an ordinary chunky
scrub. Any subset of OSDs that do support deep scrub will have their
digests compared.
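A rough sketch of the digest comparison described above, not the OSD's implementation. zlib.crc32 stands in for whatever crc32 variant the OSD uses, and the function names and the use of None for replicas without deep scrub support are assumptions.

```python
import zlib


def object_digest(data):
    """crc32 of an object's contents, as an unsigned 32-bit value."""
    return zlib.crc32(data) & 0xFFFFFFFF


def replicas_consistent(replica_digests):
    """Compare the digests reported by the replicas of one object.

    replica_digests maps osd id -> digest, or None for an OSD that does not
    support deep scrub (assumption: such OSDs simply report no digest).
    Only the subset that reported a digest is compared.
    """
    reported = {d for d in replica_digests.values() if d is not None}
    return len(reported) <= 1
```

A primary following this sketch would mark the PG inconsistent whenever replicas_consistent() returns False for any object.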
Mike Ryan [Mon, 16 Jul 2012 22:58:26 +0000 (15:58 -0700)]
osd: chunky scrub, scrub PGs a chunk of objects at a time
Chunky scrub is a more efficient scrub. It blocks writes on a subset of
objects and scrubs those, allowing writes through to the rest of the PG.
The scrub takes longer to complete than a classic scrub, but improves
overall write throughput.
This feature is backward-compatible with classic scrub. If the primary
detects that any replica does not have the chunky scrub feature, it
falls back to the less efficient classic scrub.
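A simplified sketch of the chunking and fallback described above. The feature name "chunky_scrub", the chunk size, and the function names are hypothetical; the real scrub also has to coordinate with replicas and in-flight writes.

```python
def chunks(objects, chunk_size):
    """Yield consecutive slices of at most chunk_size objects."""
    for start in range(0, len(objects), chunk_size):
        yield objects[start:start + chunk_size]


def scrub_steps(objects, replica_features, chunk_size=2):
    """Return the set of objects blocked for writes at each scrub step.

    replica_features is a list of feature sets, one per replica. If every
    replica advertises the (hypothetical) 'chunky_scrub' feature, only one
    chunk is blocked at a time; otherwise fall back to classic scrub, which
    blocks the whole PG in a single step.
    """
    if all("chunky_scrub" in features for features in replica_features):
        return [set(chunk) for chunk in chunks(objects, chunk_size)]
    return [set(objects)]
```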
Samuel Just [Tue, 4 Sep 2012 20:55:09 +0000 (13:55 -0700)]
OSD::handle_pg_stats_ack: grab pg refcount while processing pg
If the queue refcount is the last one for the pg, the pg->put()
in the loop will destroy the pg while the lock is still held,
leading to #3071. Thus, grab refcount in case we need to drop
it.
Samuel Just [Tue, 4 Sep 2012 20:32:58 +0000 (13:32 -0700)]
ReplicatedPG: fill in user log entry last after snapdir tran
The user log entry contains the request id, which will be used
by replay ops to put themselves in the correct place in the
waiting_for_commit/ack maps. Thus, the repop needs to be tagged
with the same version as the log entry with the request id.
Thus, the request id bearing log entry should be the last in
the log entry vector.
This should fix #3072, wherein a replay which should wait on
the repop tagged as version '36 will instead wait on '35.
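An illustrative sketch of the ordering rule above, using plain dicts rather than the real log entry types. The function name, the field names, and the request id string are made up for the example.

```python
def build_log_entries(at_version, needs_snapdir, reqid):
    """Build the log entry vector for one op and pick the repop version.

    Any snapdir entry is appended first; the user entry, which carries the
    request id, is always appended last. The repop is tagged with the
    version of that last entry.
    """
    entries = []
    if needs_snapdir:
        entries.append({"version": at_version, "op": "snapdir", "reqid": None})
        at_version += 1
    entries.append({"version": at_version, "op": "user", "reqid": reqid})
    repop_version = entries[-1]["version"]
    return entries, repop_version
```

Starting at version 35 with a snapdir entry, the user entry and the repop both land on 36, so a replay looking up the request id waits on the same version.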
Samuel Just [Thu, 23 Aug 2012 18:10:25 +0000 (11:10 -0700)]
PG: In Active, don't transition to WaitActingChange
want_acting is filled in during recovery completion in
order to move the newly backfilled osd into its correct
place. In this case, however, want_acting must contain
only members of acting and up. Thus, we can be sure that
if any of them go down, we would restart peering anyway.
Thus, we need not transition to WaitActingChange, which
does not reflect that we continue to serve client operations
in the interim.
Sage Weil [Tue, 4 Sep 2012 18:29:21 +0000 (11:29 -0700)]
objecter: fix osdmap wait
When we get a pool_op_reply, we find out which osdmap we need to wait for.
The wait_for_new_map() code was feeding that epoch into
maybe_request_map(), which was feeding it to the monitor with the subscribe
request. However, that epoch is the *start* epoch, not what we want. Fix
this code to always subscribe to what we have (+1), and ensure we keep
asking for more until we catch up to what we know we should eventually
get.
Bug: #3075 Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
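A sketch of the corrected wait loop, under stated assumptions: fetch_maps is a hypothetical callback that subscribes starting at the given epoch and returns the newest epoch delivered; it is not an Objecter API.

```python
def wait_for_map(have_epoch, target_epoch, fetch_maps):
    """Keep subscribing from what we have (+1) until we reach target_epoch.

    Returns the final epoch and the list of start epochs that were requested.
    """
    requested = []
    while have_epoch < target_epoch:
        start = have_epoch + 1  # always subscribe to what we have (+1)
        requested.append(start)
        newest = fetch_maps(start)
        if newest < start:
            # Guard for the sketch only: avoid spinning if nothing arrives.
            raise RuntimeError("no new map delivered")
        have_epoch = newest
    return have_epoch, requested
```

The target epoch is used only as the stopping condition; it is never sent as the subscription start.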
Tommi Virtanen [Tue, 4 Sep 2012 15:20:57 +0000 (08:20 -0700)]
doc: Fix leftover "localhost" mention.
Commit dd011aba90831bade3b67e99268429be10635dce changed
the conf file sample to say {hostname}, but changed the
prose only from ``localhost`` to ``{localhost}``.
Sage Weil [Mon, 3 Sep 2012 21:00:09 +0000 (14:00 -0700)]
msg/Pipe: do not special-case failure during connect
Do not special-case failure during connect. In particular, we may be
reconnecting when we experience a second fault; special-casing it would
wipe out our session (e.g., between the fs client and the mds) and destroy
important session state.
This logic dates back to the original patch in '08 when the standby
state was introduced.
Bug: #3070 Signed-off-by: Sage Weil <sage@inktank.com>
Eleanor Cawthon [Fri, 8 Jun 2012 18:05:20 +0000 (11:05 -0700)]
test, key_value_store: added distributed flat btree key-value store
Uses one index object and many sub objects to store key-value pairs. The pairs
are stored in the omaps of librados objects. The index contains keys
corresponding to the highest key in an object, and values that contain the
name of the object where the key range is stored. The tree guarantees that
the number of pairs in an object will be > k and < 2k for a user-specified k.
KvStoreBench contains benchmarking tests.
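An in-memory sketch of the index lookup described above. It is not the librados-backed implementation; the class and method names are hypothetical, and sorted Python lists stand in for the index object's omap.

```python
import bisect


class FlatIndex:
    """Maps the highest key of each sub object to that object's name."""

    def __init__(self):
        self.high_keys = []  # sorted highest key of each sub object
        self.names = []      # object names, parallel to high_keys

    def add_object(self, high_key, name):
        i = bisect.bisect_left(self.high_keys, high_key)
        self.high_keys.insert(i, high_key)
        self.names.insert(i, name)

    def locate(self, key):
        """Return the name of the object whose key range covers key."""
        i = bisect.bisect_left(self.high_keys, key)
        if i == len(self.high_keys):
            return None  # key is above every stored range
        return self.names[i]


def size_ok(num_pairs, k):
    """The tree's size guarantee: more than k and fewer than 2k pairs."""
    return k < num_pairs < 2 * k
```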
Sage Weil [Fri, 31 Aug 2012 23:31:01 +0000 (16:31 -0700)]
osd: defer backfill when NOBACKFILL osdmap flag is set
If we encounter nobackfill, let ourselves fall out of the recovery
queue. If we encounter a map that does not have the flag set and we
are not clean, requeue ourselves. This is a big hammer, but simple.
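The rule above can be sketched as follows. This is a toy model: the recovery queue is a plain set, and the function name and the flag string are assumptions.

```python
def on_new_map(recovery_queue, pg, map_flags, is_clean):
    """Apply the nobackfill rule for one PG when a new osdmap arrives.

    recovery_queue is a set of queued PG ids (a stand-in for the real queue).
    """
    if "nobackfill" in map_flags:
        # Flag set: fall out of the recovery queue.
        recovery_queue.discard(pg)
    elif not is_clean:
        # Flag not set and we are not clean: requeue ourselves.
        recovery_queue.add(pg)
```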
Dan Mick [Fri, 31 Aug 2012 22:18:53 +0000 (15:18 -0700)]
Clarify CodingStyle with respect to tab compression of space runs
Signed-off-by: Dan Mick <dan.mick@inktank.com> Reviewed-by: Greg Farnum <gregory.farnum@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Dan Mick [Fri, 31 Aug 2012 21:41:29 +0000 (14:41 -0700)]
Fix rados put from '-' (stdin)
Signed-off-by: Dan Mick <dan.mick@inktank.com> Reviewed-by: Mike Ryan <mike.ryan@inktank.com> Reviewed-by: Greg Farnum <gregory.farnum@inktank.com> Fixes: #3068