Sage Weil [Mon, 28 Nov 2011 00:10:46 +0000 (16:10 -0800)]
mon: search for local ip during mkfs
If an address isn't explicitly specified during mkfs, look for unnamed
monitors in the (generated) monmap and see if any of their addresses is
configured on the local machine. If so, assume it's us, and name ourselves
in the seed monmap.
Samuel Just [Tue, 22 Nov 2011 17:30:35 +0000 (09:30 -0800)]
ReplicatedPG: Account for clone space usage in make_writeable
Previously, we accounted for clone space usage inconsistently in
write_update_size_and_usage etc when walking through the operations.
make_writeable may change the most recent clone overlap, however, so we
can't handle it until then.
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Sage Weil [Wed, 23 Nov 2011 15:02:41 +0000 (07:02 -0800)]
ceph: fix shutdown race
Shut down MonClient before messenger, to avoid race with MonClient::tick()
and MonClient::shutdown().
Fixes
#0 __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
#1 0x00007f44475e2849 in _L_lock_953 () from /lib/libpthread.so.0
#2 0x00007f44475e266b in __pthread_mutex_lock (mutex=0x14d8dc8) at pthread_mutex_lock.c:61
#3 0x00000000005ae090 in Mutex::Lock (this=0x14d8db8, no_lockdep=false) at ./common/Mutex.h:108
#4 0x000000000068440e in MonClient::shutdown (this=0x14d8c30) at mon/MonClient.cc:386
#5 0x00000000005b2653 in ceph_tool_common_shutdown (ctx=0x14d84c0) at tools/common.cc:661
#6 0x00000000005ada29 in main (argc=7, argv=0x7fff8a2394c8) at tools/ceph.cc:304
vs
#0 0x00007f44475e8a0b in raise (sig=<value optimized out>) at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:42
#1 0x00000000005eff6b in reraise_fatal (signum=11) at global/signal_handler.cc:59
#2 0x00000000005f0165 in handle_fatal_signal (signum=11) at global/signal_handler.cc:106
#3 <signal handler called>
#4 0x0000000000000000 in ?? ()
#5 0x000000000068661a in MonClient::tick (this=0x14d8c30) at mon/MonClient.cc:621
#6 0x0000000000689e3b in MonClient::C_Tick::finish(int) ()
#7 0x000000000061b3c5 in SafeTimer::timer_thread (this=0x14d8df8) at common/Timer.cc:102
#8 0x000000000061c6f0 in SafeTimerThread::entry() ()
#9 0x00000000005f1219 in Thread::_entry_func (arg=0x14e1a00) at common/Thread.cc:41
#10 0x00007f44475e0971 in start_thread (arg=<value optimized out>) at pthread_create.c:304
#11 0x00007f4445ead92d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#12 0x0000000000000000 in ?? ()
Tommi Virtanen [Wed, 23 Nov 2011 01:48:40 +0000 (17:48 -0800)]
common/pick_address: Fix IP address stringification.
Different sockaddr_* have the actual address (sin_addr, sin6_addr)
at different offsets, and sockaddr->sa_data just isn't enough.
inet_ntop conspires by taking a void*. I could figure out the right
offset with a switch (found->sa_family), but let's go for the
supposedly write-once-run-with-any-AF solution, getnameinfo.
Which, naturally, takes an extra length argument that is AF-specific,
and not provided anywhere nicely by getifaddrs. Huzzah!
Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com>
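A minimal sketch of the getnameinfo approach described above, not the code from this commit. The helper name addr_to_string is hypothetical, and it assumes a POSIX/Linux system. getnameinfo accepts any address family, but the caller still has to supply the size of the concrete sockaddr_* struct, which this sketch derives from sa_family:

```cpp
#include <arpa/inet.h>
#include <cassert>
#include <netdb.h>
#include <netinet/in.h>
#include <string>
#include <sys/socket.h>

// Hypothetical helper (not from the Ceph tree): format a socket address
// numerically for any supported address family.
std::string addr_to_string(const sockaddr* sa) {
  assert(sa != nullptr);

  // getnameinfo() needs the size of the concrete sockaddr_* struct, which
  // getifaddrs() does not report, so derive it from sa_family.
  socklen_t len;
  switch (sa->sa_family) {
    case AF_INET:  len = sizeof(sockaddr_in);  break;
    case AF_INET6: len = sizeof(sockaddr_in6); break;
    default:       return std::string();  // unsupported family
  }

  char buf[NI_MAXHOST];
  // NI_NUMERICHOST: return the numeric address and never do a DNS lookup.
  int r = getnameinfo(sa, len, buf, sizeof(buf), nullptr, 0, NI_NUMERICHOST);
  return r == 0 ? std::string(buf) : std::string();
}
```

The same call site then handles both IPv4 and IPv6 entries returned by getifaddrs without computing the offset of sin_addr or sin6_addr by hand.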
Sage Weil [Tue, 22 Nov 2011 18:09:41 +0000 (10:09 -0800)]
mon: mark down all connections when rank changes
The election code and a few other paths depend on msg->get_source().num() to get
the peer rank, and that is part of the connection state. If it changes,
we need to close old connections and open new ones so that we aren't
taken for someone else (like mon.-1).
Samuel Just [Mon, 21 Nov 2011 23:06:35 +0000 (15:06 -0800)]
PG: it's not necessary to call build_inc_scrub_map in build_scrub_map
Because we have called osr.flush(), it's safe to tag map.valid_through
as last_update. We will still have to catch up once we have stopped
writes and allowed the filestore to catch up anyway.
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Sage Weil [Mon, 21 Nov 2011 21:23:59 +0000 (13:23 -0800)]
osd: protect handle_osd_map requeueing with queue lock
pending_ops was protected by osd_lock, but it tracks something in the
queue, which has its own lock. Messy. Also, useless, since
wait_for_no_ops had a single caller in shutdown(), and op_wq.drain() can
do that job for us.
Rip it out, and track queue size under the queue lock.
Fixes: #1727
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
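The idea of tracking queue size under the queue's own lock, and letting a drain call replace a separate wait-for-no-ops helper, can be sketched as follows. This is an illustration only. OpQueue and its members are hypothetical names, not the OSD's actual work queue:

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>

// Hypothetical stand-in for a work queue; names are illustrative only.
class OpQueue {
  std::mutex lock_;              // the queue's own lock
  std::condition_variable cond_;
  std::deque<int> ops_;          // queued ops (ints stand in for messages)
  std::size_t in_flight_ = 0;    // dequeued but not yet finished

public:
  void enqueue(int op) {
    std::lock_guard<std::mutex> l(lock_);
    ops_.push_back(op);
  }

  // Take the next op, if any, and count it as in flight.
  bool dequeue(int* op) {
    std::lock_guard<std::mutex> l(lock_);
    if (ops_.empty()) return false;
    *op = ops_.front();
    ops_.pop_front();
    ++in_flight_;
    return true;
  }

  // Mark one dequeued op as done and wake anyone waiting in drain().
  void finish() {
    std::lock_guard<std::mutex> l(lock_);
    assert(in_flight_ > 0);
    --in_flight_;
    cond_.notify_all();
  }

  // Queued plus in-flight ops, read under the queue lock.
  std::size_t pending() {
    std::lock_guard<std::mutex> l(lock_);
    return ops_.size() + in_flight_;
  }

  // Block until nothing is queued or in flight.
  void drain() {
    std::unique_lock<std::mutex> l(lock_);
    cond_.wait(l, [this] { return ops_.empty() && in_flight_ == 0; });
  }
};
```

Because the counter and the queue contents change under the same lock, there is no window in which the count disagrees with the queue.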
Sage Weil [Sun, 20 Nov 2011 22:26:09 +0000 (14:26 -0800)]
paxos: fix sharing of learned commits during collect/last
We can learn either an uncommitted or committed value during the
collect/last recovery phase. For the committed values, we need to remember
each peer's first/last_committed and share only at the end to avoid a
situation like:
- mon.1 has same last_committed as us
- mon.2 has newer last_committed, we save it
- mon.3 has same last_committed as mon.1, we share new value
- done... but mon.1 never got mon.2's newer commit.
Instead, save the commit sharing until the collect process completes, so
we know that any committed value learned from anyone is shared with
everyone who needs it.
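A toy model of the deferred sharing, assuming each peer reports a single last_committed version. CollectState, handle_last and peers_to_share_with are hypothetical names for illustration, not the Paxos class in the tree:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical model of the collect/last phase: peers report their
// last_committed version, and sharing is deferred until all have replied.
struct CollectState {
  uint64_t last_committed;                      // our own version
  std::map<int, uint64_t> peer_last_committed;  // rank -> reported version

  explicit CollectState(uint64_t ours) : last_committed(ours) {}

  // Record a peer's reply and adopt its commit if newer. Nothing is
  // shared with anyone at this point.
  void handle_last(int rank, uint64_t peer_lc) {
    assert(rank >= 0);
    peer_last_committed[rank] = peer_lc;
    if (peer_lc > last_committed) last_committed = peer_lc;
  }

  // Called once collect completes: every peer that is behind our final
  // last_committed needs the newer commits, in rank order.
  std::vector<int> peers_to_share_with() const {
    std::vector<int> out;
    for (const auto& p : peer_last_committed)
      if (p.second < last_committed) out.push_back(p.first);
    return out;
  }
};
```

Replaying the scenario from the list above (we and mon.1 at version 5, mon.2 at 7, mon.3 at 5) yields mon.1 and mon.3 as the peers to share with, so mon.1 is no longer skipped.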
Greg Farnum [Fri, 18 Nov 2011 23:56:35 +0000 (15:56 -0800)]
osdmon: set the maps-to-keep floor to be at least epoch 0
Looks like this conditional was just set backwards by mistake. There
have been a number of issues with OSDMap versions that are probably
related to this...
(Thanks to some smarts in trim_to, we at least did not trim ALL of
our maps. But on every tick prior to epoch 500 [that's the default]
the leader was trimming all old maps off the system.)
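For illustration, a clamped floor computation could look like the sketch below. The function name and signature are hypothetical, and the actual conditional in the monitor may be shaped differently. The point is only that the floor stays at epoch 0 until more than maps_to_keep epochs exist:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: lowest epoch to keep, given the newest epoch and
// how many maps to retain. Names and signature are illustrative.
uint64_t map_trim_floor(uint64_t last_epoch, uint64_t maps_to_keep) {
  assert(maps_to_keep > 0);  // assumed precondition for this sketch
  // Clamp at 0 so that early epochs do not produce a bogus floor.
  return last_epoch > maps_to_keep ? last_epoch - maps_to_keep : 0;
}
```

With the default of 500 mentioned above, map_trim_floor(100, 500) is 0 and map_trim_floor(800, 500) is 300.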
Calling osr.flush() is not quite enough since the onreadable callbacks
may not have been called (thus, last_update_applied may still lag behind
the tail of the log).
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Greg Farnum [Fri, 18 Nov 2011 16:47:09 +0000 (08:47 -0800)]
objecter: trigger oncommit acks if the request returns an error code.
Many users only set oncommit acks, so if they get an error code
(which comes only as a CEPH_OSD_OP_ACK right now) the request
disappears into the ether.
(And remove stupid debug statements while we're at it.)
Sage Weil [Fri, 18 Nov 2011 04:45:54 +0000 (20:45 -0800)]
mon: don't propose new state from update_from_paxos
Proposing a new state from within update_from_paxos() confuses some callers,
like PaxosService::_active(). Instead, do it in the on_active() callback.
This also lets us collapse the check_osd_map() caller into on_active(),
and makes it happen on leaders and peons alike, which ought to avoid some
of the pg creation lag we see sometimes (presumably when the osds have
sessions with peons instead of the leader).
Fixes: #1708
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil [Thu, 17 Nov 2011 19:56:37 +0000 (11:56 -0800)]
objecter: set skipped_map if we skip a map
This ensures that we resend _all_ requests, since we aren't sure which
may have mapped to a different primary and then back. This was missed in
the original implementation in 4fe9cca5dd63a1924be2b5cb18f542fb4b97a768.
Sage Weil [Thu, 17 Nov 2011 19:39:36 +0000 (11:39 -0800)]
objecter: send slow osd MPing via Connection*
This may address #1732 indirectly because we have a Connection* reference
here. However, it's still not clear how we ended up with an OSDSession*
for an osd that doesn't exist. :/
Sage Weil [Wed, 16 Nov 2011 18:54:59 +0000 (10:54 -0800)]
mon: always load stashed version when version doesn't match
The slurp process can happen after the monitor has started and has some
in-memory version of the state, and that process may wipe out old
incrementals and change the stashed version. That means that in
update_from_paxos, we need to pull the stashed version if it doesn't
match what we currently have or else we may not have the incrementals we
need to get up to date.
This simplifies and cleans up that code a bit so it is not specific to
monitor startup.
Josh Pieper [Fri, 11 Nov 2011 13:19:55 +0000 (08:19 -0500)]
rgw: Fix some merge problems uncovered by gcc warnings:
* a refactor in e2100bce left the mod_ptr and unmod_ptr members set
incorrectly in RGWCopyObj::init_common
* a fix in 6752babd aggregated error returns, but then failed to do
anything with them
Signed-off-by: Josh Pieper <jjp@pobox.com>
Signed-off-by: Sage Weil <sage@newdream.net>
Josh Pieper [Fri, 11 Nov 2011 13:19:02 +0000 (08:19 -0500)]
Resolve gcc warnings.
These should have no functional changes:
* Check errors from functions that currently cannot return any
* Initialize variables that gcc can't determine will be initialized
in a following function call
* Remove unused variables
Signed-off-by: Josh Pieper <jjp@pobox.com>
Signed-off-by: Sage Weil <sage@newdream.net>