Tommi Virtanen [Wed, 23 Nov 2011 01:48:40 +0000 (17:48 -0800)]
common/pick_address: Fix IP address stringification.
Different sockaddr_* have the actual address (sin_addr, sin6_addr)
at different offsets, and sockaddr->sa_data just isn't enough.
inet_ntop conspires by taking a void*. I could figure out the right
offset with a switch (found->sa_family), but let's go for the
supposedly write-once-run-with-any-AF solution, getnameinfo.
Which, naturally, takes an extra length argument that is AF-specific,
and not provided anywhere nicely by getifaddrs. Huzzah!
Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com>
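A minimal sketch of the getnameinfo approach described above. It is not the actual pick_address code. The helpers sockaddr_len() and addr_to_string() are hypothetical names. The point is that the AF-specific length has to be derived from sa_family, because getifaddrs does not supply it:

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <sys/socket.h>

// getnameinfo() must be told the real size of the concrete sockaddr_*
// struct; getifaddrs() does not report it, so derive it from sa_family.
static socklen_t sockaddr_len(const struct sockaddr *sa) {
  switch (sa->sa_family) {
  case AF_INET:
    return sizeof(struct sockaddr_in);
  case AF_INET6:
    return sizeof(struct sockaddr_in6);
  default:
    return 0;  // unsupported family
  }
}

// Numeric string for an AF_INET/AF_INET6 address, or "" on failure.
static std::string addr_to_string(const struct sockaddr *sa) {
  socklen_t len = sockaddr_len(sa);
  if (len == 0)
    return "";
  char host[NI_MAXHOST];
  // NI_NUMERICHOST: print the address itself, no reverse DNS lookup.
  if (getnameinfo(sa, len, host, sizeof(host), nullptr, 0, NI_NUMERICHOST) != 0)
    return "";
  return std::string(host);
}
```

When walking interfaces, each non-null ifa->ifa_addr returned by getifaddrs() can be passed straight to addr_to_string(). No per-family offset into sin_addr/sin6_addr is needed.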
Sage Weil [Tue, 22 Nov 2011 18:09:41 +0000 (10:09 -0800)]
mon: mark down all connections when rank changes
The election and some other stuff depend on msg->get_source().num() to get
the peer rank, and that is part of the connection state. If it changes,
we need to close old connections and open new ones so that we aren't
taken for someone else (like mon.-1).
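A hedged sketch of the idea, not the monitor's real messenger code. Conn and MonMessenger are hypothetical stand-ins. Each connection remembers the rank we announced when it was opened. A rank change marks all of them down so the next send reconnects with the new identity:

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical connection: records the rank we announced when it opened.
struct Conn {
  int announced_rank;
  bool open = true;
};

struct MonMessenger {
  int my_rank = -1;
  std::map<std::string, Conn> conns;  // keyed by peer address

  // Reuse an open connection, or (re)open one carrying our current rank.
  Conn &connect(const std::string &peer) {
    auto it = conns.find(peer);
    if (it == conns.end() || !it->second.open)
      it = conns.insert_or_assign(peer, Conn{my_rank}).first;
    return it->second;
  }

  // Existing connections still carry the old rank, so peers would keep
  // identifying us by it; mark them all down when the rank changes.
  void set_rank(int r) {
    if (r == my_rank)
      return;
    my_rank = r;
    for (auto &kv : conns)
      kv.second.open = false;
  }
};
```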
Sage Weil [Mon, 21 Nov 2011 21:23:59 +0000 (13:23 -0800)]
osd: protect handle_osd_map requeueing with queue lock
pending_ops was protected by osd_lock, but it tracks something in the
queue, which has its own lock. Messy. It is also useless, since
wait_for_no_ops had a single caller, in shutdown(), and op_wq.drain() can
do that job for us.
Rip it out, and track queue size under the queue lock.
Fixes: #1727
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
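A minimal sketch of "track queue size under the queue lock", assuming a simple single-mutex work queue. WorkQueue, process_one() and drain() are hypothetical here and are not the OSD's op_wq. The count of outstanding work lives beside the queue, under the queue's mutex, and drain() waits on it, so no separate osd_lock-protected counter is needed:

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>

class WorkQueue {
  std::mutex lock_;
  std::condition_variable cond_;
  std::deque<std::function<void()>> items_;
  size_t in_flight_ = 0;  // dequeued but not yet finished

public:
  void queue(std::function<void()> f) {
    std::lock_guard<std::mutex> l(lock_);
    items_.push_back(std::move(f));
  }

  // Queued plus in-flight work, read under the queue's own lock.
  size_t size() {
    std::lock_guard<std::mutex> l(lock_);
    return items_.size() + in_flight_;
  }

  // Run one item (a worker thread would loop on this); false if empty.
  bool process_one() {
    std::function<void()> f;
    {
      std::lock_guard<std::mutex> l(lock_);
      if (items_.empty())
        return false;
      f = std::move(items_.front());
      items_.pop_front();
      ++in_flight_;
    }
    f();  // run without holding the queue lock
    std::lock_guard<std::mutex> l(lock_);
    --in_flight_;
    cond_.notify_all();
    return true;
  }

  // Block until nothing is queued or running; covers what a separate
  // wait_for_no_ops()-style helper would do at shutdown.
  void drain() {
    std::unique_lock<std::mutex> l(lock_);
    cond_.wait(l, [this] { return items_.empty() && in_flight_ == 0; });
  }
};
```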
Sage Weil [Sun, 20 Nov 2011 22:26:09 +0000 (14:26 -0800)]
paxos: fix sharing of learned commits during collect/last
We can learn either an uncommitted or committed value during the
collect/last recovery phase. For the committed values, we need to remember
each peer's first/last_committed and share only at the end to avoid a
situation like:
- mon.1 has same last_committed as us
- mon.2 has newer last_committed, we save it
- mon.3 has same last_committed as mon.1, we share new value
- done... but mon.1 never got mon.2's newer commit.
Instead, save the commit sharing until the collect process completes, so
we know that any committed value learned from anyone is shared with
everyone who needs it.
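A hedged sketch of deferring the share decision until collect finishes. Collector and its methods are hypothetical, and "learning" a commit is modelled as just advancing a counter. Replying peers' last_committed values are recorded. Only once every LAST is in do we pick everyone who is behind, which catches the mon.1 case from the list above:

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <vector>

using version_t = unsigned long;

struct Collector {
  version_t my_last_committed;
  std::map<int, version_t> peer_last_committed;  // rank -> last_committed

  // Per LAST reply: remember the peer's position and learn anything newer,
  // but do not share yet.
  void handle_last(int rank, version_t peer_lc) {
    peer_last_committed[rank] = peer_lc;
    my_last_committed = std::max(my_last_committed, peer_lc);
  }

  // After collect completes: every peer behind the newest commit we now
  // know about needs it, including peers that replied before we learned it.
  std::vector<int> peers_to_share_with() const {
    std::vector<int> out;
    for (const auto &[rank, lc] : peer_last_committed)
      if (lc < my_last_committed)
        out.push_back(rank);
    return out;
  }
};
```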
Greg Farnum [Fri, 18 Nov 2011 23:56:35 +0000 (15:56 -0800)]
osdmon: set the maps-to-keep floor to be at least epoch 0
Looks like this conditional was just set backwards by mistake. There
have been a number of issues with OSDMap versions that are probably
related to this...
(Thanks to some smarts in trim_to, we at least did not trim ALL of
our maps. But on every tick prior to epoch 500 [that's the default]
the leader was trimming all old maps off the system.)
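A sketch of the intended rule only: the floor never goes below epoch 0. It does not reproduce the original, backwards conditional. It assumes an unsigned epoch type (epoch_t here is a hypothetical alias) and uses a hypothetical helper trim_floor(). With unsigned arithmetic, current - keep would wrap when current < keep, so the clamp has to happen before the subtraction:

```cpp
#include <cassert>

using epoch_t = unsigned int;

// Oldest epoch to keep when retaining `keep` maps behind `current`.
// Below `keep` epochs of history there is nothing to trim: keep them all.
epoch_t trim_floor(epoch_t current, epoch_t keep) {
  if (current <= keep)
    return 0;
  return current - keep;
}
```

With the default of 500, trim_floor(100, 500) is 0 (trim nothing) rather than a wrapped value.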
Calling osr.flush() is not quite enough since the onreadable callbacks
may not have been called (thus, last_update_applied may still lag behind
the tail of the log).
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Greg Farnum [Fri, 18 Nov 2011 16:47:09 +0000 (08:47 -0800)]
objecter: trigger oncommit acks if the request returns an error code.
Many users only set oncommit acks, so if they get an error code
(which comes only as a CEPH_OSD_OP_ACK right now) the request
disappears into the ether.
(And remove stupid debug statements while we're at it.)
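A hedged sketch of the fix. Op and handle_ack_reply() are hypothetical, loosely modelled on per-request ack/commit callbacks, and are not the Objecter's real types. When an ACK carries an error, no commit will follow, so oncommit is completed too:

```cpp
#include <cassert>
#include <functional>
#include <utility>

struct Op {
  std::function<void(int)> onack;
  std::function<void(int)> oncommit;
};

// Handle an ACK reply with result r. Each callback fires at most once.
void handle_ack_reply(Op &op, int r) {
  if (op.onack) {
    auto cb = std::move(op.onack);
    op.onack = nullptr;
    cb(r);
  }
  // Errors arrive only as an ACK, so a caller that registered only
  // oncommit would otherwise never hear back.
  if (r < 0 && op.oncommit) {
    auto cb = std::move(op.oncommit);
    op.oncommit = nullptr;
    cb(r);
  }
}
```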
Sage Weil [Fri, 18 Nov 2011 04:45:54 +0000 (20:45 -0800)]
mon: don't propose new state from update_from_paxos
Proposing a new state from within update_from_paxos() confuses some callers,
like PaxosService::_active(). Instead, do it in the on_active() callback.
This also lets us collapse the check_osd_map() caller into on_active(),
and makes it happen on leaders and peons alike, which ought to avoid some
of the pg creation lag we see sometimes (presumably when the osds have
sessions with peons instead of the leader).
Fixes: #1708
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil [Thu, 17 Nov 2011 19:56:37 +0000 (11:56 -0800)]
objecter: set skipped_map if we skip a map
This ensures that we resend _all_ requests, since we aren't sure which
may have mapped to a different primary and then back. This was missed in
the original implementation in 4fe9cca5dd63a1924be2b5cb18f542fb4b97a768.
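A minimal sketch of the skipped_map rule. MiniObjecter and handle_map() are hypothetical. The caller is assumed to supply which requests changed target between the previous map and this one. If epochs were skipped, that comparison cannot see a move to another primary and back, so everything in flight is resent:

```cpp
#include <cassert>
#include <set>

using epoch_t = unsigned int;

struct MiniObjecter {
  epoch_t last_epoch = 0;
  std::set<int> inflight;  // tids of outstanding requests

  // Apply a new map; return the tids to resend. `remapped` holds tids
  // whose target differs between the previous map and this one.
  std::set<int> handle_map(epoch_t new_epoch, const std::set<int> &remapped) {
    bool skipped_map = new_epoch > last_epoch + 1;
    last_epoch = new_epoch;
    if (!skipped_map)
      return remapped;
    // Intermediate maps were never seen: resend all requests.
    return inflight;
  }
};
```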
Sage Weil [Thu, 17 Nov 2011 19:39:36 +0000 (11:39 -0800)]
objecter: send slow osd MPing via Connection*
This may address #1732 indirectly because we have a Connection* reference
here. However, it's still not clear how we ended up with an OSDSession*
for an osd that doesn't exist. :/
Sage Weil [Wed, 16 Nov 2011 18:54:59 +0000 (10:54 -0800)]
mon: always load stashed version when version doesn't match
The slurp process can happen after the monitor has started and has some
in-memory version of the state, and that process may wipe out old
incrementals and change the stashed version. That means that in
update_from_paxos, we need to pull the stashed version if it doesn't
match what we currently have or else we may not have the incrementals we
need to get up to date.
This simplifies and cleans up that code a bit so it is not specific to
monitor startup.
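A hedged sketch of the rule above, with a toy integer state standing in for a real map. Store, Service and their fields are hypothetical. If the in-memory version differs from the stashed one (for example after a slurp replaced it and trimmed old incrementals), the stash is reloaded before replaying incrementals. This sketch reloads whenever they differ, the simplest reading of "doesn't match":

```cpp
#include <cassert>
#include <map>

using version_t = unsigned long;

// Hypothetical on-disk view: a stashed full state plus the incrementals
// after it (older incrementals may have been trimmed).
struct Store {
  version_t stashed_version = 0;
  int stashed_state = 0;
  std::map<version_t, int> incrementals;  // version -> delta
  version_t last_committed = 0;
};

struct Service {
  version_t my_version = 0;
  int state = 0;

  // Catch up to store.last_committed; false if an incremental is missing.
  bool update_from_paxos(const Store &store) {
    if (my_version != store.stashed_version) {
      my_version = store.stashed_version;
      state = store.stashed_state;
    }
    while (my_version < store.last_committed) {
      auto it = store.incrementals.find(my_version + 1);
      if (it == store.incrementals.end())
        return false;
      state += it->second;
      ++my_version;
    }
    return true;
  }
};
```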
Josh Pieper [Fri, 11 Nov 2011 13:19:55 +0000 (08:19 -0500)]
rgw: Fix some merge problems uncovered by gcc warnings:
* a refactor in e2100bce left the mod_ptr and unmod_ptr members set
incorrectly in RGWCopyObj::init_common
* a fix in 6752babd aggregated error returns, but then failed to do
anything with them
Signed-off-by: Josh Pieper <jjp@pobox.com>
Signed-off-by: Sage Weil <sage@newdream.net>
Josh Pieper [Fri, 11 Nov 2011 13:19:02 +0000 (08:19 -0500)]
Resolve gcc warnings.
These should have no functional changes:
* Check errors from functions that currently cannot return any
* Initialize variables that gcc can't determine will be initialized
in a following function call
* Remove unused variables
Signed-off-by: Josh Pieper <jjp@pobox.com>
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Sat, 12 Nov 2011 05:03:09 +0000 (21:03 -0800)]
osd: fix warnings
osd/ReplicatedPG.cc: In member function 'virtual void ReplicatedPG::remove_watchers_and_notifies()':
osd/ReplicatedPG.cc:1167: warning: suggest a space before ';' or explicit braces around empty body in 'for' statement
osd/ReplicatedPG.cc:1176: warning: suggest a space before ';' or explicit braces around empty body in 'for' statement
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
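An illustration of the kind of change that silences this warning. It is not the actual ReplicatedPG code, and find_first_at_least() is a hypothetical helper. An intentionally empty loop body is written with explicit braces instead of a bare trailing semicolon:

```cpp
#include <cassert>
#include <list>

// Advance to the first element >= v. `for (...);` would draw the gcc
// warning above; explicit braces mark the empty body as deliberate.
std::list<int>::const_iterator find_first_at_least(const std::list<int> &l, int v) {
  std::list<int>::const_iterator p = l.begin();
  for (; p != l.end() && *p < v; ++p) {
    // intentionally empty: the loop header does all the work
  }
  return p;
}
```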
Sage Weil [Fri, 11 Nov 2011 22:52:14 +0000 (14:52 -0800)]
mon: allow monitor to automagically join cluster
If a monitor starts up with the correct fsid and auth keys, it will now
add itself to the monmap (and subsequently try to join the quorum) if it
is not already in the monmap.
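A much-simplified sketch of the join decision. MonMap and maybe_add_self() are hypothetical, and auth keys are not modelled. A monitor only adds itself when the fsid matches and it is not already listed. In a real cluster the monmap change has to be agreed by the monitors; this sketch only edits a local copy:

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical, simplified monmap: cluster fsid plus name -> address.
struct MonMap {
  std::string fsid;
  std::map<std::string, std::string> mons;
};

// Returns true if we added ourselves (local copy only).
bool maybe_add_self(MonMap &m, const std::string &my_fsid,
                    const std::string &name, const std::string &addr) {
  if (m.fsid != my_fsid)
    return false;  // different cluster: never join
  if (m.mons.count(name))
    return false;  // already a member
  m.mons[name] = addr;
  return true;
}
```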
Sage Weil [Fri, 11 Nov 2011 20:02:52 +0000 (12:02 -0800)]
mon: properly process monmaps even when i have the latest
We may get the latest monmap when we are doing our probing, but we still
need to process it in update_from_paxos(). Consider get_latest_version()
in addition to the active map.