Sage Weil [Mon, 15 Dec 2008 05:39:54 +0000 (21:39 -0800)]
osd: generate_backlog asynchronously in a work queue; simplify peering a bit
We do all backlog creation in a thread pool. Break it down into the
disk scan and log integration steps, and drop the PG lock as much as possible.
We only worry about pg acting changes; backlogs are only generated when the
pg is inactive.
We also simplify the activation code a bit by observing that replicas only
generate backlogs when their logs are discontiguous with the primary; in
such cases, we pull the backlog during peering and no generate_backlog
(or equivalent) is needed for activation.
Sage Weil [Sat, 13 Dec 2008 04:01:14 +0000 (20:01 -0800)]
osd: for remaining peers, pull either log or backlog, but not both.
Pull as far back as peer's last_epoch_started (if they have that much).
This ensures we will pull any divergent entries, if there are any, so
that we can update our peer_missing map accordingly.
Sage Weil [Fri, 12 Dec 2008 23:00:34 +0000 (15:00 -0800)]
osd: simplify master log recreation; fix up Log::copy_after
Pull log from a given point from peer with the largest last_update. Do
not worry about divergence on the peer; that is handled by the new
primary. Simplifies PG::Query struct.
Fix copy_after to set an accurate .bottom, and to behave correctly if the split
point given is divergent (i.e. doesn't actually appear in the log).
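A minimal sketch of the copy_after behavior described above, not the actual Log::copy_after code. It assumes versions are plain integers (the real eversion_t also carries an epoch) and that a log covers the range (bottom, top]. The Log class and its field names here are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Log:
    bottom: int = 0  # newest version NOT held; the log covers (bottom, top]
    top: int = 0     # version of the newest entry
    entries: list = field(default_factory=list)  # versions, ascending


def copy_after(other, split):
    """Copy the entries of `other` that are newer than `split` (hypothetical model)."""
    kept = [v for v in other.entries if v > split]
    # The split point may be divergent, i.e. not present in `other` at all.
    # Set bottom to the newest version at or below the split that `other`
    # actually holds, falling back to other.bottom if there is none.
    older = [v for v in other.entries if v <= split]
    bottom = older[-1] if older else other.bottom
    return Log(bottom=bottom, top=other.top, entries=kept)
```

With entries 3, 5, 7, 9 and a split point of 6 (which does not appear in the log), the copy holds 7 and 9 and reports a bottom of 5 rather than 6.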
Sage Weil [Fri, 12 Dec 2008 05:10:43 +0000 (21:10 -0800)]
osd: rewrite proc_replica_log
After we have the master log, our only real purpose with other peer/stray
logs is to update replica missing maps and to find any missing objects.
Rewrite the log handling to clearly do that, with some comments.
Sage Weil [Fri, 12 Dec 2008 00:06:16 +0000 (16:06 -0800)]
osd: small peer cleanup
Make sure we check peer_log_requested and peer_summary_requested
independently, depending on which we want. Move 'since'
calculation to where it is needed.
Sage Weil [Wed, 10 Dec 2008 00:34:54 +0000 (16:34 -0800)]
mon: mark unresponsive mds laggy instead of failed until we can replace it
This way we flag laggy mds's, but hold out until they come back
online or we have a standby cmds to replace them. Should make
things much more tolerable.
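The policy above can be sketched roughly as follows. This is an illustration only, not the monitor code: the state strings, the dict-based map, and the handle_unresponsive helper are all hypothetical.

```python
def handle_unresponsive(mds_state, name, standbys):
    """Decide what to do with an MDS that has stopped responding.

    mds_state: dict mapping mds name -> state string (hypothetical model)
    standbys:  list of standby mds names, consumed in order
    Returns the name of the replacement, or None if the mds was only flagged.
    """
    if standbys:
        # A standby is available, so fail the old mds and replace it.
        replacement = standbys.pop(0)
        mds_state[name] = "failed"
        mds_state[replacement] = "active"
        return replacement
    # No standby: flag the mds as laggy and wait for it to come back.
    mds_state[name] = "laggy"
    return None
```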
Sage Weil [Wed, 10 Dec 2008 00:00:27 +0000 (16:00 -0800)]
osd: make sure hb peers get marked down
We mark_down on osdmap update when we see an osd has gone down, but the
heartbeats are sent in a different thread without map_lock using
heartbeat_inst. So, make sure heartbeat_inst entries are removed.
Also, we add hb peers at peers' request. When removing such entries in
update_heartbeat_peers, mark_down then, too. (We may mark_down a failed
peer, and then receive the hb request late. So we mark it down again the next
time we update the heartbeat maps.)
Sage Weil [Mon, 8 Dec 2008 21:50:46 +0000 (13:50 -0800)]
mds: stay loner if client has B and no other reason to switch state
If the client has dirty data, and there is no other reason to
toggle the lock state, leave it as LONER. The client will write
out at its leisure, and we'll avoid an unstable lock state that
is waiting on a potentially slow writeout.
Sage Weil [Mon, 8 Dec 2008 17:54:26 +0000 (09:54 -0800)]
osd: pause scrub wq async
The scrub _process() worker may be waiting on a message from a replica, so
we can't pause it synchronously. Instead, pause_new() to just prevent
new workers from starting.
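To illustrate the difference, here is a single-threaded model of a work queue with a non-blocking pause_new(). It is a sketch under simplifying assumptions (no real threads or locks), and apart from the pause_new name taken from the message above, the class and method names are hypothetical.

```python
class WorkQueue:
    """Toy model: tracks pending items and how many workers are in flight."""

    def __init__(self, items):
        self.pending = list(items)
        self.in_flight = 0
        self.paused = False

    def pause_new(self):
        # Returns immediately. Workers already running are left alone;
        # a synchronous pause would instead wait until in_flight == 0.
        self.paused = True

    def try_start(self):
        # Hand out the next item unless paused or empty.
        if self.paused or not self.pending:
            return None
        self.in_flight += 1
        return self.pending.pop(0)

    def finish(self):
        self.in_flight -= 1

    def drained(self):
        return self.in_flight == 0
```

A worker that is blocked waiting on a replica keeps in_flight above zero, so a pause that waited for drained() could stall indefinitely; pause_new() avoids that by only refusing new starts.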
Sage Weil [Fri, 5 Dec 2008 19:46:55 +0000 (11:46 -0800)]
osd: fix merge_log divergent item detection
An item in our log isn't divergent if it is below the bottom of
olog. Using the last_kept item isn't helpful here because
last_kept is in olog, and may be below that log's bottom.
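A rough sketch of the corrected check, not the real merge_log code. It assumes integer versions and that olog covers only versions above its bottom; the function and parameter names are hypothetical.

```python
def divergent_entries(our_versions, olog_bottom, olog_versions):
    """Return our entries that olog should cover but does not contain.

    Entries at or below olog's bottom are outside the range olog can
    speak for, so they are never reported as divergent.
    """
    known = set(olog_versions)
    return [v for v in our_versions if v > olog_bottom and v not in known]
```

For example, with our log holding 3, 4, 5, 6, 8 and olog holding 5, 6, 7 above a bottom of 4, only 8 is reported; 3 and 4 fall below olog's bottom and are left alone.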