Split common_init_daemonize from common_init_finish
Split off common_init_daemonize from common_init_finish. cfuse is a
daemon that calls common_init_finish, but handles daemonization itself.
This fixes cfuse.
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
Get rid of the initialize-then-shutdown-crypto hack. We just initialize
crypto once, after it is safe to do so. There is now a single callback,
common_init_finish, which does the final stage of initialization,
including starting crypto and daemonization (if required).
common_init_finish needs to be done before messenger::start().
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
Sage Weil [Fri, 20 May 2011 21:45:36 +0000 (14:45 -0700)]
osd: more heartbeat rework
A few things:
- track Connection* instead of entity_inst_t for hb peers
- we can only send maps over the cluster_messenger
- if peer is still alive, do that
- if peer is not, send dying MOSDPing ping with YOU_DIED flag
Sage Weil [Fri, 20 May 2011 19:55:29 +0000 (12:55 -0700)]
osd: rework peer map epoch caching
We try to keep track of which epochs our peers have so that we can be
semi-intelligent about which map incrementals we send preceding any
messages. Since this is useful from the heartbeat and cluster channels/
threads, protect the data with an inner lock and clean up the callers.
Be smarter about when we forget.
Make note of peer epoch when we receive a ping.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
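The locking scheme described above can be sketched roughly as follows. This is an illustration only, not the code from the commit: the class name PeerEpochCache, its method names, the use of int for the peer id, and the exact "forget" rule are all assumptions made for the example.

```cpp
#include <cstdint>
#include <map>
#include <mutex>

using epoch_t = uint32_t;  // assumption: simplified stand-in for the map epoch type

// Hypothetical cache of the newest map epoch each peer is known to have.
// An inner mutex lets both heartbeat and cluster threads use it safely.
class PeerEpochCache {
public:
  // Return the cached epoch for a peer, or 0 if nothing is known.
  epoch_t get(int peer) const {
    std::lock_guard<std::mutex> l(lock_);
    auto it = epochs_.find(peer);
    return it == epochs_.end() ? 0 : it->second;
  }

  // Note that a peer has at least epoch e (for example, on receiving a
  // ping). The cached value never moves backwards. Returns the new value.
  epoch_t note(int peer, epoch_t e) {
    std::lock_guard<std::mutex> l(lock_);
    epoch_t& cur = epochs_[peer];
    if (e > cur)
      cur = e;
    return cur;
  }

  // Forget a peer, but only if what we know is no newer than as_of
  // (assumed rule; the commit only says "be smarter about when we forget").
  void forget(int peer, epoch_t as_of) {
    std::lock_guard<std::mutex> l(lock_);
    auto it = epochs_.find(peer);
    if (it != epochs_.end() && it->second <= as_of)
      epochs_.erase(it);
  }

private:
  mutable std::mutex lock_;          // the "inner lock" protecting epochs_
  std::map<int, epoch_t> epochs_;    // peer id -> newest known epoch
};
```

A caller would invoke note() from the ping handler and get() before deciding which map incrementals to send ahead of a message.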
Sage Weil [Fri, 20 May 2011 17:42:16 +0000 (10:42 -0700)]
osd: do not clobber explicitly requested heartbeat_to target address
Consider peer P.
- P goes down in, say, epoch 60, and comes back up in epoch 70
- P requests a heartbeat, as_of 70
- We update to map 50, and coincidentally add the same peer as a target
- We set the heartbeat_to[P] = 50 and start sending to the _old_ address
- P marks us down because we stop sending to the new addr
- We eventually get map 70, but it's too late!
Make sure we preserve any _to targets _and_ their epoch+inst.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
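A minimal sketch of the "do not clobber" rule from the scenario above, assuming a simplified target table. The HeartbeatTarget struct, the update_heartbeat_to helper, and the string address are hypothetical names for illustration and are not taken from the OSD code.

```cpp
#include <cstdint>
#include <map>
#include <string>

using epoch_t = uint32_t;  // assumption: simplified stand-in for the map epoch type

// Hypothetical heartbeat target: the epoch an address was learned from,
// plus the address itself (a stand-in for the peer's inst).
struct HeartbeatTarget {
  epoch_t epoch;
  std::string addr;
};

// Add or update the target for `peer` using information from map epoch `e`.
// An existing entry from the same or a newer epoch is preserved, so an
// explicitly requested target (as_of 70) is not overwritten by an older
// map (epoch 50). Returns true if the entry was written.
inline bool update_heartbeat_to(std::map<int, HeartbeatTarget>& heartbeat_to,
                                int peer, epoch_t e, const std::string& addr) {
  auto it = heartbeat_to.find(peer);
  if (it != heartbeat_to.end() && it->second.epoch >= e)
    return false;  // keep the newer epoch+addr
  heartbeat_to[peer] = HeartbeatTarget{e, addr};
  return true;
}
```

With this rule, processing map 50 after P's request as_of 70 leaves heartbeat_to[P] pointing at the new address.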
Sage Weil [Fri, 20 May 2011 16:29:10 +0000 (09:29 -0700)]
osd: request proper log extent for missing
We can't blindly ask for everything since last_epoch_started because that
may mean we get some fragment of a backlog. Look at the peer's log
ranges and request the correct thing. Also, in fulfill_log, infer what
the primary should have asked for if they make a bad request.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil [Fri, 20 May 2011 07:27:00 +0000 (00:27 -0700)]
osd: take remote log when it is clearly superior
I'm hitting a case where the primary is compensating for a replica's
last_complete < log.tail by sending a log+backlog, but the replica
isn't smart enough to take advantage. In this case,
replica: log(781'26629,781'26631]
from primary: log(781'26629,781'26631]+backlog
result: log(781'26629,781'26631]
Doh!
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil [Fri, 20 May 2011 07:14:24 +0000 (00:14 -0700)]
osd: fix compensation for bad last_complete
If the peer has a last_complete below their tail, we can get by with our
log (without backlog) if our tail is _before_ their last_complete, not
after. Otherwise, we need a backlog!
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
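The corrected condition can be written as a one-line predicate. This sketch assumes versions reduced to plain integers and a hypothetical helper name; treating "before" as strictly less than is also an assumption, since the commit message does not spell out the boundary case.

```cpp
#include <cstdint>

using version_t = uint64_t;  // assumption: versions simplified to integers

// For a peer whose last_complete is below its own log tail: our log alone
// (no backlog) is enough only if our tail is before the peer's
// last_complete. Otherwise the peer needs a backlog.
inline bool log_alone_suffices(version_t our_tail,
                               version_t peer_last_complete) {
  return our_tail < peer_last_complete;
}
```

For example, with our tail at 10 and the peer's last_complete at 20, the log alone covers the gap; with our tail at 30, it does not.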
Sage Weil [Fri, 20 May 2011 06:40:12 +0000 (23:40 -0700)]
osd: include past acting osds if they were up
This fixes a bug where we were excluding up (but not acting) nodes from
past intervals, which in turn was triggering a nasty choose_acting loop
(because we _do_ already include acting but !up from the current
interval).
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil [Thu, 19 May 2011 22:03:13 +0000 (15:03 -0700)]
client: hold FILE_BUFFER ref while waiting for dirty throttle
We may block in the write path because we've reached our dirty data limit.
Hold a reference to the FILE_BUFFER cap during that interval so we don't
lose the cap and put new dirty buffers into the objectcacher out of turn.
(We could also recheck our ability to take the ref after blocking, but I
think this is cleaner.)
Sage Weil [Thu, 19 May 2011 22:00:34 +0000 (15:00 -0700)]
client: assert(in) on _flush
We should never arrive in _flush() and not have a reference to the inode
in question, because the presence of dirty buffers pins the inode. This
condition was introduced forever ago; clean it out.
Josh Durgin [Thu, 19 May 2011 21:31:30 +0000 (14:31 -0700)]
PG: choose_log_location: prefer OSDs with a backlog
Without preferring an OSD with a backlog, PGs would get stuck in the
active state when acting != up and the backlog was on an OSD with the
same last_update but a lower number or log_tail.
Josh Durgin [Wed, 18 May 2011 22:54:06 +0000 (15:54 -0700)]
PG: choose acting set and newest_update_osd based on a map of all osds
newest_update osd should be stable when the primary changes, to
prevent cycles of acting set choices. For the same reason, we should
not treat the primary as a special case in choose_acting.
Also remove the magic -1 from representing the current primary.
Josh Durgin [Wed, 18 May 2011 23:15:28 +0000 (16:15 -0700)]
PG: GetLog: don't fail if we get an outdated log
If we request a log from one osd, and then another member of our prior
set comes up with a later last_update, we should not fail when we
receive the first log.
Samuel Just [Tue, 17 May 2011 22:59:32 +0000 (15:59 -0700)]
PG: make choose_acting a bit smarter
This change allows old strays that don't need backlogs
to stay acting until current members of the up set are caught up.
This allows the up set to maintain its full size during peering.
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
I'm not having any problems linking. I suspect this was some automake
failure and that a 'make clean' is all that's needed to put everything
straight...
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil [Wed, 18 May 2011 17:09:34 +0000 (10:09 -0700)]
logclient: get rid of send_log; simplify monitor special casing
Change the SYNC flag to MON and send the Mlog synchronously in the do_log
call. This eliminates the send_log vestiges completely. Either we are
a monitor and queue for ourselves immediately, or log sending is handled
by MonClient.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil [Wed, 18 May 2011 16:58:46 +0000 (09:58 -0700)]
logclient: send entries once per mon session
We have a lossless session with the monitor! Only send log entries once.
Otherwise, if the mon is down or something, we end up building up a HUGE
backlog of requests by resending the same messages over and over again.
To do this:
- keep track of which entries we've sent.
- reset when the session resets
- let the MonClient control when log entries are sent, and reset
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
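The three steps above can be sketched as a small bookkeeping class. This is not the LogClient code: the LogSender name, its methods, and the sequence-number scheme are hypothetical, and the ack handling is an added assumption so the example has a way to drop entries.

```cpp
#include <cstdint>
#include <deque>
#include <string>
#include <vector>

// Hypothetical log entry with a monotonically increasing sequence number.
struct LogEntry {
  uint64_t seq;
  std::string msg;
};

// Hypothetical sender-side bookkeeping: each entry goes out at most once
// per session, and a session reset makes unacked entries eligible again.
class LogSender {
public:
  void queue(const std::string& msg) {
    entries_.push_back(LogEntry{++last_seq_, msg});
  }

  // Called by whoever owns the session (MonClient in the commit message)
  // when it decides to send. Returns only entries not yet sent on this
  // session and marks them as sent.
  std::vector<LogEntry> get_unsent() {
    std::vector<LogEntry> out;
    for (const auto& e : entries_)
      if (e.seq > last_sent_)
        out.push_back(e);
    if (!out.empty())
      last_sent_ = out.back().seq;
    return out;
  }

  // Assumed ack path: drop entries the monitor has confirmed.
  void handle_ack(uint64_t seq) {
    if (seq > last_acked_)
      last_acked_ = seq;
    while (!entries_.empty() && entries_.front().seq <= last_acked_)
      entries_.pop_front();
  }

  // On session reset, forget what was sent; unacked entries go out again.
  void reset_session() { last_sent_ = last_acked_; }

private:
  std::deque<LogEntry> entries_;
  uint64_t last_seq_ = 0;
  uint64_t last_sent_ = 0;
  uint64_t last_acked_ = 0;
};
```

While a session stays up, repeated get_unsent() calls return nothing new, so a slow or down monitor does not cause the same messages to pile up.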