Mike Ryan [Fri, 14 Sep 2012 17:30:17 +0000 (10:30 -0700)]
timer: add unsafe callbacks option
Using unsafe callbacks drops the lock between invocations of event
callbacks. It is useful under some circumstances, but the user must
exercise caution. See the comment in Timer.h for full details.
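The difference between the two modes can be sketched as follows. This is an illustrative model only, not the code in Timer.h: `run_events` and its `safe_callbacks` parameter are hypothetical names, and a plain `threading.Lock` stands in for the timer's lock.

```python
import threading

def run_events(lock, callbacks, safe_callbacks=True):
    """Invoke callbacks in order (hypothetical model, not Ceph's timer).

    Returns, for each callback, whether the lock was held while it ran.
    """
    held_during = []
    with lock:
        for cb in callbacks:
            if not safe_callbacks:
                lock.release()          # unsafe mode: drop the lock for the call
            try:
                held_during.append(lock.locked())
                cb()
            finally:
                if not safe_callbacks:
                    lock.acquire()      # retake it before touching the queue again
    return held_during
```

In the unsafe mode a callback may take the lock itself without deadlocking, but any state the lock protects can change while the callback runs, which is why the caller has to be careful.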
Samuel Just [Fri, 7 Sep 2012 01:02:08 +0000 (18:02 -0700)]
osd/: add PG_STATE_BACKFILLING
PG_STATE_BACKFILLING is set when the pg enters the Backfilling state.
That is, +backfilling indicates that the pg has obtained its
reservations and is now actively backfilling.
Samuel Just [Thu, 6 Sep 2012 22:11:57 +0000 (15:11 -0700)]
osd/: add backfill reservations
Previously, a new osd would be bombarded by backfills from many osds
simultaneously, resulting in excessively high load. Instead, we
want to limit the number of backfills coming into and going out
from a single osd.
To that end, each OSDService now has two AsyncReserver instances: one
for backfills going from the osd (local_reserver) and one for backfills
going to the osd (remote_reserver). For a primary to initiate a
backfill, it must first obtain a reservation from its own
local_reserver. Then, it must obtain a reservation from the backfill
target's remote_reserver via a MBackfillReserve message. This process is
managed by substates of Active and ReplicaActive (see the changes in
PG.h). The reservations are dropped either on the Backfilled event
(which is sent on the primary before calling recovery_complete and on the
replica on receipt of the BackfillComplete progress message), or upon
leaving Active or ReplicaActive.
It's important that we always grab the local reservation before the
remote reservation in order to prevent a circular dependency.
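The ordering rule above can be sketched with a toy reserver. This is a simplified model under stated assumptions, not the AsyncReserver interface: the `Reserver` class, its `request`/`cancel` methods and `start_backfill` are hypothetical, and the remote request is a direct call here rather than an MBackfillReserve message.

```python
from collections import deque

class Reserver:
    """Toy reserver: grants at most max_allowed reservations, FIFO."""
    def __init__(self, max_allowed):
        self.max_allowed = max_allowed
        self.granted = set()
        self.waiting = deque()          # (item, on_reserved) pairs

    def request(self, item, on_reserved):
        self.waiting.append((item, on_reserved))
        self._grant()

    def cancel(self, item):
        self.granted.discard(item)
        self.waiting = deque(w for w in self.waiting if w[0] != item)
        self._grant()

    def _grant(self):
        while self.waiting and len(self.granted) < self.max_allowed:
            item, on_reserved = self.waiting.popleft()
            self.granted.add(item)
            on_reserved()


def start_backfill(pg, local, remote, on_ready):
    """Take the primary's local slot first, and only then ask the target."""
    local.request(pg, lambda: remote.request(pg, on_ready))
```

Because a pg never asks for a remote slot until it holds its local one, a pg waiting in a local queue cannot be holding a remote slot that another pg needs.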
Samuel Just [Wed, 5 Sep 2012 22:56:25 +0000 (15:56 -0700)]
PG: clear want_acting in choose_acting if want == acting
Otherwise, a pg_temp from a previous peering sequence
(but not a different peering_interval) might leak through
into Active and incorrectly trip the
Active::react(AdvMap&) asserts regarding want_acting.
Those asserts assume that want_acting is either empty or is
a result of recovery completion. In the latter case, the
want_acting set must consist only of elements of up and
acting.
Mike Ryan [Mon, 27 Aug 2012 18:16:17 +0000 (11:16 -0700)]
osd: deep scrub, read file contents from disk and compare digest
Deep scrub reads the contents of every file from the store and computes
a crc32 digest. The primary compares the digest of all replicas and will
mark the PG inconsistent if any don't match.
OSDs that do not support deep scrub simply perform an ordinary chunky
scrub. Any subset of OSDs that do support deep scrub will have their
digests compared.
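A minimal sketch of the comparison step, assuming the primary has already collected object contents from each deep-scrub-capable replica. The function name and data layout are hypothetical, and `zlib.crc32` is only a stand-in, since the commit message does not say which CRC-32 variant is used.

```python
import zlib

def scrub_compare(replica_objects):
    """replica_objects: {osd_id: {object_name: bytes}}.

    Returns the sorted names of objects whose digests disagree across
    replicas. Objects missing from a replica are not handled here.
    """
    digests = {}
    for osd, objects in replica_objects.items():
        for name, data in objects.items():
            digests.setdefault(name, set()).add(zlib.crc32(data))
    return sorted(name for name, seen in digests.items() if len(seen) > 1)
```

A non-empty result corresponds to the case where the PG would be marked inconsistent.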
Mike Ryan [Mon, 16 Jul 2012 22:58:26 +0000 (15:58 -0700)]
osd: chunky scrub, scrub PGs a chunk of objects at a time
Chunky scrub is a more efficient scrub. It blocks writes on a subset of
objects and scrubs those, allowing writes through to the rest of the PG.
The scrub takes longer to complete than a classic scrub, but improves
overall write throughput.
This feature is backward-compatible with classic scrub. If the primary
detects that any replica does not have the chunky scrub feature, it
falls back to the less efficient classic scrub.
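The chunking and the fallback decision can be illustrated roughly as below. All three helpers are hypothetical names for illustration, and the chunk size and sort order are assumptions rather than what the OSD actually uses.

```python
def scrub_chunks(object_names, chunk_size):
    """Yield successive chunks of the PG's objects to scrub."""
    ordered = sorted(object_names)
    for start in range(0, len(ordered), chunk_size):
        yield ordered[start:start + chunk_size]

def write_blocked(object_name, current_chunk):
    """Only writes to objects in the chunk being scrubbed must wait."""
    return object_name in current_chunk

def scrub_mode(replica_has_chunky_feature):
    """Fall back to classic scrub unless every replica supports chunky scrub."""
    return "chunky" if all(replica_has_chunky_feature) else "classic"
```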
Samuel Just [Tue, 4 Sep 2012 20:55:09 +0000 (13:55 -0700)]
OSD::handle_pg_stats_ack: grab pg refcount while processing pg
If the queue refcount is the last one for the pg, the pg->put()
in the loop will destroy the pg while the lock is still held
leading to #3071. Thus, grab refcount in case we need to drop
it.
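The hazard and the fix can be modeled with a toy reference count. This is an illustration only: `RefCounted` and `process` are hypothetical, and the `locked` flag stands in for the pg lock.

```python
class RefCounted:
    def __init__(self):
        self.refs = 1                    # the queue's reference
        self.locked = False
        self.freed_while_locked = False

    def get(self):
        self.refs += 1

    def put(self):
        self.refs -= 1
        if self.refs == 0 and self.locked:
            self.freed_while_locked = True   # destroyed under its own lock

def process(pg, pin):
    """Return True if the pg was destroyed while its lock was held."""
    if pin:
        pg.get()                         # extra reference for the duration
    pg.locked = True
    pg.put()                             # drops the queue's reference in the loop
    pg.locked = False
    if pin:
        pg.put()                         # safe: the lock is no longer held
    return pg.freed_while_locked
```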
Samuel Just [Tue, 4 Sep 2012 20:32:58 +0000 (13:32 -0700)]
ReplicatedPG: fill in user log entry last after snapdir tran
The user log entry contains the request id, which will be used
by replay ops to put themselves in the correct place in the
waiting_for_commit/ack maps. Thus, the repop needs to be tagged
with the same version as the log entry with the request id.
Therefore, the log entry bearing the request id should be the last
in the log entry vector.
This should fix #3072, wherein a replay which should wait on
the repop tagged as version '36 will instead wait on '35.
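The mismatch can be reproduced in a small model. It assumes, as the description above implies, that the repop is tagged with the version of the final log entry; the entry layout, function names and request id format are hypothetical.

```python
def build_log(first_version, reqid, user_entry_last):
    """Return (log_entries, repop_version) for an op that also touches a snapdir.

    Assumption: the repop is tagged with the version of the final log entry.
    """
    user = {"what": "user", "reqid": reqid}
    snapdir = {"what": "snapdir", "reqid": None}
    ordered = [snapdir, user] if user_entry_last else [user, snapdir]
    for offset, entry in enumerate(ordered):
        entry["version"] = first_version + offset
    return ordered, ordered[-1]["version"]

def replay_waits_on(log_entries, reqid):
    """A replayed op finds its request id in the log and waits on that version."""
    return next(e["version"] for e in log_entries if e["reqid"] == reqid)
```

With the user entry first, the replay waits on version 35 while the repop is tagged 36; with the user entry last, both agree on 36.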
Samuel Just [Thu, 23 Aug 2012 18:10:25 +0000 (11:10 -0700)]
PG: In Active, don't transition to WaitActingChange
want_acting is filled in during recovery completion in
order to move the newly backfilled osd into its correct
place. In this case, however, want_acting must contain
only members of acting and up. Thus, we can be sure that
if any of them go down, we would restart peering anyway.
Thus, we need not transition to WaitActingChange, which
does not reflect that we continue to serve client operations
in the interim.
Sage Weil [Tue, 4 Sep 2012 18:29:21 +0000 (11:29 -0700)]
objecter: fix osdmap wait
When we get a pool_op_reply, we find out which osdmap we need to wait for.
The wait_for_new_map() code was feeding that epoch into
maybe_request_map(), which was feeding it to the monitor with the subscribe
request. However, that epoch is the *start* epoch, not what we want. Fix
this code to always subscribe to what we have (+1), and ensure we keep
asking for more until we catch up to what we know we should eventually
get.
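The corrected behavior amounts to the loop below. This is a sketch under assumptions, not the Objecter code: `wait_for_map` and `fetch` are hypothetical, and `fetch(start)` is assumed to return the newest epoch received and to always make progress, where a real client would block waiting for the monitor.

```python
def next_subscribe_epoch(have_epoch):
    """Always subscribe starting from the epoch after the one we hold."""
    return have_epoch + 1

def wait_for_map(have_epoch, want_epoch, fetch):
    """Keep asking until we hold at least want_epoch.

    Returns the final epoch and the list of start epochs requested.
    """
    requests = []
    while have_epoch < want_epoch:
        start = next_subscribe_epoch(have_epoch)
        requests.append(start)
        have_epoch = max(have_epoch, fetch(start))
    return have_epoch, requests
```

The target epoch only decides when to stop asking; it is never sent as the subscription start.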
Bug: #3075
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Tommi Virtanen [Tue, 4 Sep 2012 15:20:57 +0000 (08:20 -0700)]
doc: Fix leftover "localhost" mention.
Commit dd011aba90831bade3b67e99268429be10635dce changed
the conf file sample to say {hostname}, but changed the
prose only from ``localhost`` to ``{localhost}``.
Sage Weil [Mon, 3 Sep 2012 21:00:09 +0000 (14:00 -0700)]
msg/Pipe: do not special-case failure during connect
Do not special case failure during connect. In particular, we may be
reconnecting and experience a second fault, and wipe out our session
(e.g., between the fs client and the mds) and destroy important session
state.
This logic dates back to the original patch in '08 when the standby
state was introduced.
Bug: #3070
Signed-off-by: Sage Weil <sage@inktank.com>
Eleanor Cawthon [Fri, 8 Jun 2012 18:05:20 +0000 (11:05 -0700)]
test, key_value_store: added distributed flat btree key-value store
Uses one index object and many sub objects to store key-value pairs. The pairs
are stored in the omaps of librados objects. The index contains keys
corresponding to the highest key in an object, and values that contain the
name of the object where the key range is stored. The tree guarantees that
the number of pairs in an object will be > k and < 2k for a user-specified k.
KvStoreBench contains benchmarking tests.
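The index lookup described above can be sketched with a sorted list and a binary search. The in-memory list stands in for the index object's omap, and `find_object` and `size_ok` are hypothetical helpers, not part of key_value_store.

```python
import bisect

def find_object(index, key):
    """index: sorted list of (highest_key, object_name) pairs.

    Returns the object whose key range covers key, or None if key is
    above every range in the index.
    """
    highs = [high for high, _ in index]
    pos = bisect.bisect_left(highs, key)
    return index[pos][1] if pos < len(index) else None

def size_ok(num_pairs, k):
    """The stated invariant: more than k and fewer than 2k pairs per object."""
    return k < num_pairs < 2 * k
```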
Sage Weil [Fri, 31 Aug 2012 23:31:01 +0000 (16:31 -0700)]
osd: defer backfill when NOBACKFILL osdmap flag is set
If we encounter nobackfill, let ourselves fall out of the recovery
queue. If we encounter a map that does not have the flag set and we
are not clean, requeue ourselves. This is a big hammer, but simple.
Dan Mick [Fri, 31 Aug 2012 22:18:53 +0000 (15:18 -0700)]
Clarify CodingStyle with respect to tab compression of space runs
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Greg Farnum <gregory.farnum@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Dan Mick [Fri, 31 Aug 2012 21:41:29 +0000 (14:41 -0700)]
Fix rados put from '-' (stdin)
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Mike Ryan <mike.ryan@inktank.com>
Reviewed-by: Greg Farnum <gregory.farnum@inktank.com>
Fixes: #3068
Samuel Just [Fri, 31 Aug 2012 21:01:47 +0000 (14:01 -0700)]
PG: do not update stats in ReplicaActive from info
Bug #2954
Consider the following case:
1) Primary calls share_pg_info()
2) Primary processes client op and sends off sub_op to replica
3) Replica processes sub_op
4) Replica processes info, reverting stat to before 2)
Similarly:
1) Primary processes client op
2) Primary calls share_pg_info()
3) Replica processes info
[4) Replica processes sub_op]
If 4) is interrupted by a map change, we can end up in a case where
the replica's info has a stat which reflects a log entry which
is not there. If that log ends up authoritative, the most recent
op will be replayed and end up double counted in the log.
There should actually be no cases where the stats change after the
replica goes active except for as part of a sub_op_modify. Thus,
ReplicaActive::MInfoRec should not update the stats.
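Both orderings can be replayed in a toy model where the stat is a single counter that should always match the number of log entries on the replica. The event encoding and `replica_state` are hypothetical and greatly simplified.

```python
def replica_state(events, info_updates_stats):
    """Replay events on a replica; returns (stat, log_length).

    Hypothetical model: stat should always equal the number of log entries.
    """
    stat, log_len = 0, 0
    for kind, value in events:
        if kind == "sub_op":             # applies the op: log entry and stat together
            log_len += 1
            stat = value
        elif kind == "info" and info_updates_stats:
            stat = value                 # old behavior: trust the primary's snapshot
    return stat, log_len
```

In the first case the events are a sub_op carrying stat 1 followed by an info carrying the older stat 0; in the second, an info carrying stat 1 arrives and the sub_op never does. Only when info is not allowed to update the stats do the stat and the log agree in both cases.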
CID 716882: Copy-paste error (COPY_PASTE_ERROR)
At (2): "last_epoch_started" in "other.last_epoch_started" looks like a
copy-paste error. Should it say "last_epoch_split" instead?
From what I can tell, this really should be checking other.last_epoch_split
rather than other.last_epoch_started.
Samuel Just [Wed, 29 Aug 2012 23:43:02 +0000 (16:43 -0700)]
osd/Watch.h: uninit var in ctor Watch
CID 717345: Uninitialized pointer field (UNINIT_CTOR)
At (8): Non-static class member "obc" is not initialized in this constructor
nor in any functions that it calls.
At (2): Non-static class member "id" is not initialized in this constructor nor
in any functions that it calls.
At (4): Non-static class member "reply" is not initialized in this constructor
nor in any functions that it calls.
At (6): Non-static class member "timeout" is not initialized in this
constructor nor in any functions that it calls.
Samuel Just [Wed, 29 Aug 2012 23:39:25 +0000 (16:39 -0700)]
osd/ReplicatedPG.h: uninit var in ctor RepModify
CID 717344: Uninitialized scalar field (UNINIT_CTOR)
At (2): Non-static class member "epoch_started" is not initialized in this
constructor nor in any functions that it calls.
Samuel Just [Wed, 29 Aug 2012 23:37:50 +0000 (16:37 -0700)]
osd/ReplicatedPG.h: uninit var in ctor OpContext
CID 717343: Uninitialized pointer field (UNINIT_CTOR)
At (3): Non-static class member "snapset" is not initialized in this
constructor nor in any functions that it calls.
Samuel Just [Wed, 29 Aug 2012 23:32:28 +0000 (16:32 -0700)]
osd/PG.h: uninit var in ctor NamedState
CID 717340: Uninitialized pointer field (UNINIT_CTOR)
At (2): Non-static class member "state_name" is not initialized in this
constructor nor in any functions that it calls.
Samuel Just [Wed, 29 Aug 2012 23:31:20 +0000 (16:31 -0700)]
osd/PG.h: uninit var in ctor OndiskLog
CID 717342: Uninitialized scalar field (UNINIT_CTOR)
At (2): Non-static class member "has_checksums" is not initialized in this
constructor nor in any functions that it calls.
Samuel Just [Wed, 29 Aug 2012 23:30:07 +0000 (16:30 -0700)]
osd/PG.h: uninit var in ctor IndexedLog
CID 717339: Uninitialized scalar field (UNINIT_CTOR)
At (2): Non-static class member "last_requested" is not initialized in this
constructor nor in any functions that it calls.
Samuel Just [Wed, 29 Aug 2012 23:25:20 +0000 (16:25 -0700)]
osd/OpRequest.h: uninit vars in ctor OpRequest
At (2): Non-static class member "hit_flag_points" is not initialized in this
constructor nor in any functions that it calls.
CID 717338: Uninitialized scalar field (UNINIT_CTOR)
At (4): Non-static class member "latest_flag_point" is not initialized in this
constructor nor in any functions that it calls.