Josh Durgin [Fri, 6 May 2011 19:23:45 +0000 (12:23 -0700)]
PG: choose_acting needs the value of the osd, not its index
This caused two osds to keep flipping the acting set between [2] and
[0,2] when osd.0 was far behind and needed a backlog. This is visible
as toggling between peering and peering+degraded.
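A minimal sketch of the class of bug described above, not Ceph's actual choose_acting code. The helper names pick_by_value and pick_by_index are hypothetical. When a vector holds osd ids, a membership test has to use the stored value up[i], not the loop index i:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: keep the osd ids from `up` that also appear in `want`.
std::vector<int> pick_by_value(const std::vector<int>& up,
                               const std::vector<int>& want) {
  std::vector<int> out;
  for (std::size_t i = 0; i < up.size(); ++i) {
    int osd = up[i];  // the osd id, i.e. the value stored at position i
    if (std::find(want.begin(), want.end(), osd) != want.end())
      out.push_back(osd);
  }
  return out;
}

// Buggy variant kept for comparison: it tests the position, not the osd id.
std::vector<int> pick_by_index(const std::vector<int>& up,
                               const std::vector<int>& want) {
  std::vector<int> out;
  for (std::size_t i = 0; i < up.size(); ++i) {
    // BUG: `i` is an index into `up`, not an osd id.
    if (std::find(want.begin(), want.end(), static_cast<int>(i)) != want.end())
      out.push_back(up[i]);
  }
  return out;
}
```

With up = [0,2] and want = [2], the value-based version selects osd 2 while the index-based version selects nothing, because positions 0 and 1 are compared against the id 2.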
Tommi Virtanen [Fri, 6 May 2011 18:10:16 +0000 (11:10 -0700)]
stop.sh: Avoid bashisms.
I have a habit of running "sh -x stop.sh" whenever it seems
to fail, and that runs it with dash, not bash. Since it
doesn't actually need the bashisms, remove them.
Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com>
Sage Weil [Fri, 6 May 2011 16:26:32 +0000 (09:26 -0700)]
osd: rearrange #includes to get our assert
Make sure we include boost statechart headers before our common/assert.h so that
ours clobbers theirs. Otherwise the generic one will clobber ours and our
assert output won't get logged or be as pretty.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil [Thu, 5 May 2011 23:08:58 +0000 (16:08 -0700)]
osdmap: fix temp osd pg mapping
If you feed in a raw pg (full precision) you should get the same mapping
out as when you plug in the effective/reduced precision pg. The
raw_to_temp_osds() wasn't doing that, which gave you results like
Sage Weil [Thu, 5 May 2011 22:15:03 +0000 (15:15 -0700)]
mon: do not stop mds0 unless all other nodes have also stopped
If we are the root node or the tableserver, we have to shut down last.
(And even then, if we have client sessions, we can't fully shut down, we
can only kill ourselves!)
Fixes: #1048 (sorta)
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil [Thu, 5 May 2011 18:20:58 +0000 (11:20 -0700)]
osd: fix GetInfo down check
The PgPriorSet::down set can have lots of stuff in it without it affecting
peering completion. We just need to look at the some_down flag to tell us
if any nodes in the _cur_ set are down, which indicates no progress is
possible (unless/until someone starts up again or last_epoch_started moves
forward in time).
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil [Thu, 5 May 2011 15:54:23 +0000 (08:54 -0700)]
osd: handle notify+info explicitly in GetInfo state
This fixes a few things:
- do not proceed past GetInfo if there are down osds. ever.
- if we get a new info that moves last_epoch_started forward,
rebuild prior, because we may have eliminated said down osds.
- if we get dup info, do nothing
- if we get new info, see if we can proceed to GetLog
This is all simpler/cleaner by handling Notify/Info (they're the same)
explicitly in the GetInfo state and not falling back to the parent
state handler.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
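The four rules in the commit above can be sketched as a single decision function. This is an illustration only: the Action enum, the PeerInfo struct, the on_info function and the order in which the checks are applied are all assumptions, not the actual GetInfo handler.

```cpp
#include <cassert>

// Hypothetical outcomes for one incoming Notify/Info message.
enum class Action { Ignore, RebuildPrior, Wait, TryGetLog };

// Hypothetical, reduced view of a peer's info.
struct PeerInfo {
  int last_epoch_started;
};

// Decide what to do with an info message while in GetInfo.
//   is_dup     : we already hold this exact info
//   known_les  : the newest last_epoch_started seen so far
//   some_down  : an osd in the current prior set is down
Action on_info(bool is_dup, int known_les, const PeerInfo& info, bool some_down) {
  if (is_dup)
    return Action::Ignore;        // dup info: do nothing
  if (info.last_epoch_started > known_les)
    return Action::RebuildPrior;  // may eliminate the down osds
  if (some_down)
    return Action::Wait;          // never proceed past GetInfo with down osds
  return Action::TryGetLog;       // new info: see if we can move on
}
```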
Sage Weil [Thu, 5 May 2011 15:12:24 +0000 (08:12 -0700)]
osd: fix GetInfo querying
Don't query for info we already have, or have already requested. Remove
unneeded helper so that this is simpler and we have access to the info
we need.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
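A rough sketch of the querying rule in the commit above, assuming a hypothetical InfoTracker with two sets (peers whose info we hold, and peers we have already asked). The names are invented for illustration and do not come from the Ceph source.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical bookkeeping for the GetInfo phase.
struct InfoTracker {
  std::set<int> have_info;  // peers whose info we already hold
  std::set<int> requested;  // peers we have already queried

  // Return the peers in `probe` that still need a query, and record
  // them as requested so a later call does not ask them again.
  std::vector<int> peers_to_query(const std::set<int>& probe) {
    std::vector<int> out;
    for (int osd : probe) {
      if (have_info.count(osd) || requested.count(osd))
        continue;  // already known or already asked
      requested.insert(osd);
      out.push_back(osd);
    }
    return out;
  }
};
```

Calling peers_to_query twice with the same probe set yields queries only on the first call, which is the behavior the commit message asks for.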
Sage Weil [Thu, 5 May 2011 15:11:41 +0000 (08:11 -0700)]
osd: handle event notify/info/log from Initial
We shouldn't post a creation event and jump into peering/stray based on
pg creation when we are about to process more information or else we will
send out unnecessary queries. Instead, handle those from Initial and jump
to the appropriate state.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
XML parsing functions in RGW now return a bool, indicating whether they
were able to extract the fields they needed from the XML.
If extracting any field returns false, the parsing is deemed to have failed.
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
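The bool-returning pattern described above might look roughly like this. It is a sketch under stated assumptions: a std::map stands in for a parsed XML element, and get_field, Owner and parse_owner are hypothetical names, not the RGW API.

```cpp
#include <cassert>
#include <map>
#include <string>

// Stand-in for a parsed XML element: child tag name -> text content.
using Fields = std::map<std::string, std::string>;

// Hypothetical field getter: returns false if the field is absent.
bool get_field(const Fields& xml, const std::string& name, std::string& out) {
  auto it = xml.find(name);
  if (it == xml.end())
    return false;
  out = it->second;
  return true;
}

// Hypothetical record built from the element.
struct Owner {
  std::string id;
  std::string display_name;
};

// Parsing fails as soon as any required field cannot be read.
bool parse_owner(const Fields& xml, Owner& owner) {
  return get_field(xml, "ID", owner.id) &&
         get_field(xml, "DisplayName", owner.display_name);
}
```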
Sage Weil [Wed, 4 May 2011 20:05:09 +0000 (13:05 -0700)]
osd: move directly to Reset state on pg load
Add Initial -> Reset transition on pg load. This avoids doing any
activation-type stuff (like sending messages) before we are ready. In
particular, we want to advance through any new OSDMaps and only
send out queries/notifies/whatever when we get to the activate_map
stage.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Samuel Just [Wed, 4 May 2011 17:21:54 +0000 (10:21 -0700)]
PG: ReplicaActive must respond to requests from discover_all_missing
If the peer does not yet have the pg during GetMissing, there won't be
a peer_missing entry for that peer. In that case, discover_all_missing
can legitimately request a missing set after the pg has gone active.
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Greg Farnum [Wed, 4 May 2011 17:47:43 +0000 (10:47 -0700)]
uclient: only try to update caps on the auth MDS.
Previously we would send updates on things like the max_size we
wanted to the first MDS in our list, which was bad if the auth mds
had a higher number. Now, only send them (and update bookkeeping)
for the auth MDS.
Josh Durgin [Wed, 4 May 2011 16:10:00 +0000 (09:10 -0700)]
PG: use a state_name member instead of overriding get_state_name
Also add debugging to each state constructor. Since dout uses
the recovery machine context, anything using it in the constructor
must be a state, not a simple_state.
Sage Weil [Tue, 3 May 2011 22:31:28 +0000 (15:31 -0700)]
osd: feed new pg mapping into state machine
instead of recalculating it. Also pass the last map into warm_restart,
while we're at it. Drop the Reset state constructor and instead repost
the AdvMap event before transitioning.
If all other aspects of two objects match, we should make sure the ACLs
match before deciding that there's nothing for us to do.
Restructure LocalCopy so that it no longer contains the ACL. Create
LocalAcl to represent a cached local copy of the ACL. Add get_acl
methods to all stores.
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
Sage Weil [Tue, 3 May 2011 20:08:35 +0000 (13:08 -0700)]
osd: fix pg log entry types to not always be delete
This was broken by the osd_trans work merged in 01f3526b62. We need to
use the obs reference to new_obs. This caused objects to be deleted during
pg recovery.
Sage Weil [Tue, 3 May 2011 19:34:54 +0000 (12:34 -0700)]
osdmap: allow incremental to represent osd deletion
Convert new_down to new_state, with values xored onto the old state. We
preserve compatibility with old incrementals because they were (virtually)
always 0, and we can special case that to mean toggle CEPH_OSD_UP. We
don't really care if clients get new values right; if they don't clear
the EXISTS flag, that doesn't really hurt them. It's only important that
the monitor get it right.
To ensure that, we rev the monitor internal protocol.
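A minimal sketch of the xor encoding described above. The flag values and the name apply_new_state are assumptions made for illustration. The only behavior taken from the commit message is that the new value is xored onto the old state and that 0 is special-cased to mean "toggle UP".

```cpp
#include <cassert>
#include <cstdint>

// Flag values assumed for this sketch.
constexpr std::uint32_t OSD_EXISTS = 1;
constexpr std::uint32_t OSD_UP = 2;

// Apply one incremental entry to an osd's state word.
// Old incrementals carried (virtually) always 0, so 0 is read as
// "toggle UP" to stay compatible with them.
std::uint32_t apply_new_state(std::uint32_t old_state, std::uint32_t xor_bits) {
  if (xor_bits == 0)
    xor_bits = OSD_UP;
  return old_state ^ xor_bits;
}
```

Under these assumptions, marking an up osd down is apply_new_state(OSD_EXISTS | OSD_UP, 0), and deleting it is apply_new_state(OSD_EXISTS | OSD_UP, OSD_EXISTS | OSD_UP), which clears both flags.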
Samuel Just [Tue, 3 May 2011 00:03:56 +0000 (17:03 -0700)]
OSD,PG: Peering refactor
Previously, peering was handled by a de facto state machine in do_peer
and related methods. Peering state will now be encapsulated in
RecoveryState, which uses boost::statechart internally to enforce an
explicit state machine abstraction. OSD::handle_pg_* pass off to
PG::handle_*, which pass messages to the state machine.
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Samuel Just [Fri, 22 Apr 2011 00:42:51 +0000 (17:42 -0700)]
OSD,PG: Move pg reset code from OSD::advance_map to PG
OSD::advance_map previously handled resetting the PG for peering. Now,
PG::acting_up_affected returns true if peering needs to be restarted and
PG::warm_restart takes care of resetting the pg.