git.apps.os.sepia.ceph.com Git

mon: mark unresponsive mds laggy instead of failed until we can replace it

This way we flag laggy mds's, but hold out until they come back
online or we have a standby cmds to replace them. Should make
things much more tolerable.
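
A minimal sketch of the policy described above, written in Python rather than the monitor's C++. The names `MdsInfo`, `tick`, and `grace` are hypothetical and chosen for illustration; they are not taken from the Ceph source.

```python
from dataclasses import dataclass

@dataclass
class MdsInfo:
    name: str
    last_beacon: float      # time we last heard from this mds
    state: str = "active"   # "active", "laggy", or "failed"

def tick(active, standbys, now, grace):
    """Update mds states in place; return (failed, replacement) pairs."""
    replaced = []
    for mds in active:
        if mds.state == "failed":
            continue                  # already replaced earlier
        if now - mds.last_beacon <= grace:
            mds.state = "active"      # responsive, possibly back online
            continue
        if standbys:
            # Only give up on the mds once a standby can take over.
            mds.state = "failed"
            replaced.append((mds.name, standbys.pop(0)))
        else:
            mds.state = "laggy"       # flag it, but hold out
    return replaced
```

With no standby available, an unresponsive mds stays laggy and returns to active as soon as a fresh beacon arrives.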

Sage Weil [Tue, 9 Dec 2008 23:06:48 +0000 (15:06 -0800)]

cobserver: simplify headers

Sage Weil [Wed, 10 Dec 2008 00:00:27 +0000 (16:00 -0800)]

osd: make sure hb peers get marked down

We mark_down on osdmap update when we see an osd has gone down, but the
heartbeats are sent in a different thread without map_lock using
heartbeat_inst.  So, make sure heartbeat_inst entries are removed.

Also, we add hb peers at peers' request.  When removing such entries in
update_heartbeat_peers, mark_down then, too.  (We may mark_down a failed
peer, and then receive the hb request late.  So we mark that down next
time we update the heartbeat maps.)

Sage Weil [Tue, 9 Dec 2008 23:06:54 +0000 (15:06 -0800)]

osd: update_stat during recover_replicas()

Sage Weil [Tue, 9 Dec 2008 22:57:43 +0000 (14:57 -0800)]

dstart: --nostop option

to avoid ./dstop.sh

Sage Weil [Tue, 9 Dec 2008 22:57:19 +0000 (14:57 -0800)]

osd: drive primary recovery via missing map, not log

Sage Weil [Tue, 9 Dec 2008 22:57:01 +0000 (14:57 -0800)]

mon: osdmon cleanup

Sage Weil [Tue, 9 Dec 2008 21:34:27 +0000 (13:34 -0800)]

dstart: keep old cosd binaries around for a bit

Sage Weil [Tue, 9 Dec 2008 21:33:33 +0000 (13:33 -0800)]

osd: 'pg repair <pgid>' to repair an inconsistent pg using replicas

Sage Weil [Tue, 9 Dec 2008 20:27:31 +0000 (12:27 -0800)]

osd: don't read file content during _scrub

Sage Weil [Tue, 9 Dec 2008 19:56:51 +0000 (11:56 -0800)]

msgr: be noisier about mark_down calls

Sage Weil [Tue, 9 Dec 2008 19:00:04 +0000 (11:00 -0800)]

osd: avoid needless calls to peer(), build_prior()

Introduces PEERING pg state. Also is smarter about when build_prior and
peer are actually called.

Sage Weil [Tue, 9 Dec 2008 18:40:01 +0000 (10:40 -0800)]

osd: make prior_set_affected() slightly smarter

Only return true if an osd goes down that we didn't already know was
down (prior_set may contain down osds if the PG is marked DOWN).
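
The check described above might look roughly like the following Python sketch. The argument names `known_down` and `new_up` are assumptions made for illustration; the real function works against the osdmap rather than plain sets.

```python
def prior_set_affected(prior_set, known_down, new_up):
    """Return True if a prior_set osd went down and we did not know it.

    prior_set  -- osd ids this pg's peering depends on
    known_down -- prior_set members already down when the set was built
    new_up     -- osd ids that are up in the new map
    """
    for osd in prior_set:
        if osd in new_up:
            continue      # still up, nothing changed
        if osd in known_down:
            continue      # down, but we already accounted for that
        return True       # newly down: the prior set is affected
    return False
```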

Sage Weil [Tue, 9 Dec 2008 22:57:48 +0000 (14:57 -0800)]

cobserver: cleanups

Sage Weil [Tue, 9 Dec 2008 22:55:00 +0000 (14:55 -0800)]

mon: use 'latest' for latest osd, mds maps

Mainly for benefit of PaxosObserver, but it also cleans things up
a bit.

Sage Weil [Tue, 9 Dec 2008 22:44:58 +0000 (14:44 -0800)]

cobserver: cleanup; print map summaries w/ each new state

Sage Weil [Tue, 9 Dec 2008 22:44:41 +0000 (14:44 -0800)]

mon: refactor map print_summary/operator<< methods

Yehuda Sadeh [Tue, 9 Dec 2008 22:23:38 +0000 (14:23 -0800)]

cobserver: accidentally removed a line

Yehuda Sadeh [Tue, 9 Dec 2008 22:20:44 +0000 (14:20 -0800)]

kclient: missing files

Yehuda Sadeh [Tue, 9 Dec 2008 22:19:27 +0000 (14:19 -0800)]

whitespaces

Yehuda Sadeh [Tue, 9 Dec 2008 22:11:09 +0000 (14:11 -0800)]

mon: factor ClientMap class out

Yehuda Sadeh [Tue, 9 Dec 2008 21:39:10 +0000 (13:39 -0800)]

cobserver: utility, observe changes in different maps

Sage Weil [Tue, 9 Dec 2008 19:47:06 +0000 (11:47 -0800)]

osd: use push() to push clone op

Also fixes missing updates to peer_missing[peer] and pushing
map.

Sage Weil [Tue, 9 Dec 2008 17:58:16 +0000 (09:58 -0800)]

mon: factor out osdmap print, print_summary

Sage Weil [Tue, 9 Dec 2008 17:58:08 +0000 (09:58 -0800)]

mon: factor out mds print, print_summary

Sage Weil [Mon, 8 Dec 2008 21:50:46 +0000 (13:50 -0800)]

mds: stay loner if client has B and no other reason to switch state

If the client has dirty data, and there is no other reason to
toggle the lock state, leave it as LONER. The client will write
out at its leisure, and we'll avoid an unstable lock state that
is waiting on a potentially slow writeout.

Sage Weil [Tue, 9 Dec 2008 17:50:26 +0000 (09:50 -0800)]

osd: missing last_mon_heartbeat declaration

Sage Weil [Tue, 9 Dec 2008 16:48:03 +0000 (08:48 -0800)]

msgr: make sure nonce matches too when connecting to peer

Otherwise the predictable port numbers cause problems.
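
To illustrate why the nonce matters, here is a small Python sketch. `EntityAddr` and `is_same_peer` are hypothetical names, and the address values are made up: two processes that bind the same ip and port in succession differ only in their nonce.

```python
from typing import NamedTuple

class EntityAddr(NamedTuple):
    ip: str
    port: int
    nonce: int   # distinguishes successive processes on the same ip:port

def is_same_peer(expected: EntityAddr, actual: EntityAddr) -> bool:
    """Match on ip, port, and nonce, not on ip:port alone."""
    return (expected.ip == actual.ip
            and expected.port == actual.port
            and expected.nonce == actual.nonce)

# Example values only: a restarted daemon reuses the port with a new nonce.
old = EntityAddr("10.0.0.1", 6800, 1234)
restarted = EntityAddr("10.0.0.1", 6800, 5678)
```

Here `is_same_peer(old, restarted)` is false, so a connection meant for the old process is not mistaken for one to its successor.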

Sage Weil [Tue, 9 Dec 2008 16:43:38 +0000 (08:43 -0800)]

msgr: print error when message type is unrecognized

Sage Weil [Tue, 9 Dec 2008 16:42:28 +0000 (08:42 -0800)]

osd: ping mon less frequently when peerless

Every second is too much. Make it tunable.

Sage Weil [Mon, 8 Dec 2008 22:03:46 +0000 (14:03 -0800)]

mon: typo in pg dump output

Sage Weil [Mon, 8 Dec 2008 19:44:21 +0000 (11:44 -0800)]

ceph: new default mon port; try to bind to port in known range

New monitor port in unused region (according to nmap-services).

Try to bind to a port in a known range, so that tools can easily
identify the protocol in use.

Remove some old .sh cruft.
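
Binding to a port in a known range can be sketched as below, in Python for brevity. The function name `bind_in_range` is hypothetical, and the commit message does not state the actual range, so the caller supplies the bounds.

```python
import socket

def bind_in_range(host, first_port, last_port):
    """Bind a TCP socket to the first free port in [first_port, last_port]."""
    for port in range(first_port, last_port + 1):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
            return sock, port
        except OSError:
            sock.close()    # port busy or not permitted; try the next one
    raise OSError(f"no free port in {first_port}-{last_port}")
```

Because every daemon ends up somewhere inside the same range, a tool can recognize the protocol from the port number alone.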

Sage Weil [Mon, 8 Dec 2008 19:15:10 +0000 (11:15 -0800)]

mon: 'pg map <pgid>' command

To see current pg -> osd mapping

Sage Weil [Mon, 8 Dec 2008 18:11:12 +0000 (10:11 -0800)]

mon: 'osd dump' command; refactor sstream->bufferlist code a bit

Sage Weil [Mon, 8 Dec 2008 18:01:37 +0000 (10:01 -0800)]

osdmap: use print method from osdmaptool

Sage Weil [Mon, 8 Dec 2008 17:54:26 +0000 (09:54 -0800)]

osd: pause scrub wq async

The scrub _process() worker may be waiting on a message from a replica, so
we can't pause it synchronously. Instead, pause_new() to just prevent
new workers from starting.
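
The difference between a synchronous pause and pause_new() can be modelled with a single-threaded Python sketch. Only the name `pause_new` comes from the commit message; the class and its other methods are hypothetical, and the real work queue uses locks and worker threads.

```python
class WorkQueue:
    def __init__(self):
        self.items = []
        self.in_flight = 0
        self.paused = False

    def pause_new(self):
        """Stop new work from starting; do not wait for in-flight workers."""
        self.paused = True

    def unpause(self):
        self.paused = False

    def try_dequeue(self):
        """Hand out the next item, or None if paused or empty."""
        if self.paused or not self.items:
            return None
        self.in_flight += 1
        return self.items.pop(0)

    def done(self):
        """Called by a worker when it finishes an item."""
        self.in_flight -= 1
```

A synchronous pause would also have to wait for `in_flight` to reach zero, which is exactly what cannot be done while a worker is blocked on a replica's reply.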

Sage Weil [Fri, 5 Dec 2008 23:57:43 +0000 (15:57 -0800)]

osd: lock pg before calling on_shutdown

Sage Weil [Fri, 5 Dec 2008 23:57:28 +0000 (15:57 -0800)]

osd: fix degraded figure calculation typo

Sage Weil [Fri, 5 Dec 2008 23:45:08 +0000 (15:45 -0800)]

cmonctl: resend command if monitor is not responsive

Sage Weil [Fri, 5 Dec 2008 23:35:22 +0000 (15:35 -0800)]

cmonctl: interactive mode using libedit

Sage Weil [Fri, 5 Dec 2008 22:22:17 +0000 (14:22 -0800)]

osd: log scrub ok

Sage Weil [Fri, 5 Dec 2008 22:22:01 +0000 (14:22 -0800)]

mon: rename out to log, log.type files

Sage Weil [Fri, 5 Dec 2008 22:00:59 +0000 (14:00 -0800)]

mon: notify PaxosService of any paxos state changes

Sage Weil [Fri, 5 Dec 2008 22:29:17 +0000 (14:29 -0800)]

mon: dump full pgmap on each state change (for debugging)

Sage Weil [Fri, 5 Dec 2008 22:27:25 +0000 (14:27 -0800)]

osd: don't die on stray sub op acks

If a replica drops out of the pg, we force an ack in on_change(), but
may still get it later. Don't freak out.

Sage Weil [Fri, 5 Dec 2008 19:48:27 +0000 (11:48 -0800)]

osd: generate_backlog sanity check

If item is on disk and log, then log entry shouldn't be a delete.

Sage Weil [Fri, 5 Dec 2008 19:46:55 +0000 (11:46 -0800)]

osd: fix merge_log divergent item detection

An item in our log isn't divergent if it is below the bottom of
olog. Using the last_kept item isn't helpful here because
last_kept is in olog, and may be below that log's bottom.

Sage Weil [Fri, 5 Dec 2008 19:45:05 +0000 (11:45 -0800)]

osd: clean up ondisklog a bit

Express as extent, not interval.

Sage Weil [Fri, 5 Dec 2008 19:18:25 +0000 (11:18 -0800)]

osd: make read_log output a bit more informative

Sage Weil [Fri, 5 Dec 2008 19:08:53 +0000 (11:08 -0800)]

vstart: only sudo if -e dev/sudo

Sage Weil [Fri, 5 Dec 2008 19:00:47 +0000 (11:00 -0800)]

osd: revise missing map adjustment

Rewrite helpers in terms of how they are actually used.

Sage Weil [Fri, 5 Dec 2008 18:59:23 +0000 (10:59 -0800)]

osd: mark backlog events as BACKLOG

This is purely to make the logs easier to read.

Sage Weil [Fri, 5 Dec 2008 18:01:28 +0000 (10:01 -0800)]

osd: generate_backlog fixes

Generate backlog records even if the object appears in the log if
the existing entry's prior_version is non-zero and isn't also
in the log. This allows us to accurately generate the .have field
when we are building the missing map.

Sage Weil [Fri, 5 Dec 2008 04:49:11 +0000 (20:49 -0800)]

crush: add include

Yehuda Sadeh [Fri, 5 Dec 2008 00:50:50 +0000 (16:50 -0800)]

kclient: reduce stack usage

Sage Weil [Fri, 5 Dec 2008 00:32:53 +0000 (16:32 -0800)]

mon: 'osd scrub *' to scrub all osds

Sage Weil [Fri, 5 Dec 2008 00:28:33 +0000 (16:28 -0800)]

filestore: fix up listxattr buffer management a bit

Sage Weil [Fri, 5 Dec 2008 00:27:28 +0000 (16:27 -0800)]

dstart: put debug output on local disk

Sage Weil [Fri, 5 Dec 2008 00:27:09 +0000 (16:27 -0800)]

debug: allow output and output symlinks to go in different directories

Yehuda Sadeh [Fri, 5 Dec 2008 00:14:55 +0000 (16:14 -0800)]

logmonitor: append all notifications in a single file

Sage Weil [Thu, 4 Dec 2008 22:58:58 +0000 (14:58 -0800)]

set/check subprotocol versions

Sage Weil [Thu, 4 Dec 2008 22:30:11 +0000 (14:30 -0800)]

mon: clean up paxos service registration a bit. rev disk format.

Yehuda Sadeh [Thu, 4 Dec 2008 22:08:39 +0000 (14:08 -0800)]

cleanup, whitespace

Yehuda Sadeh [Thu, 4 Dec 2008 21:59:34 +0000 (13:59 -0800)]

log: use of cascading dispatcher for log messages

Yehuda Sadeh [Thu, 4 Dec 2008 21:31:00 +0000 (13:31 -0800)]

dispatcher: cascading dispatch infrastructure

Sage Weil [Thu, 4 Dec 2008 21:46:19 +0000 (13:46 -0800)]

mon: keep pgmap consistent

We were cutting corners and updating the live map before it
committed to paxos, since pg stats aren't system critical. This
can lead to problems due to the way "latest" is saved out, though,
and it can be confusing to see things jump backward in time.

Sage Weil [Thu, 4 Dec 2008 21:41:29 +0000 (13:41 -0800)]

osd: make replica scrub_map generation a subop

This puts build_scrub_map in a worker thread, _and_ ensures it is
serialized wrt any in-progress writes.

Sage Weil [Thu, 4 Dec 2008 21:40:30 +0000 (13:40 -0800)]

logclient: always print log messages to debug output

Sage Weil [Thu, 4 Dec 2008 21:01:28 +0000 (13:01 -0800)]

osd: fix up scrub error log formatting

Sage Weil [Thu, 4 Dec 2008 21:01:19 +0000 (13:01 -0800)]

logclient: optionally take a stringstream

Sage Weil [Thu, 4 Dec 2008 20:18:07 +0000 (12:18 -0800)]

osd: some scrub fixes

Don't drop locks just yet; atm this leaves the dout() prefix
exposed to concurrent modifications of pg state.

Don't requeue for scrub if already scrubbing.

Fix missing object detection bugs.

Sage Weil [Thu, 4 Dec 2008 19:57:18 +0000 (11:57 -0800)]

osd: fix pg_stats.reported value

Sage Weil [Thu, 4 Dec 2008 19:17:58 +0000 (11:17 -0800)]

osd: drop lock during most of scrub; only disallow concurrent writes

Make the PG go read-only during a scrub. Only take the pg lock
when absolutely necessary. Wait for any pending writes to
complete before starting the scrub.

Sage Weil [Thu, 4 Dec 2008 19:16:25 +0000 (11:16 -0800)]

osd: ignore dup scrub maps

Sage Weil [Thu, 4 Dec 2008 19:15:46 +0000 (11:15 -0800)]

osd: take pg ref on scrub_wq

Sage Weil [Thu, 4 Dec 2008 18:58:23 +0000 (10:58 -0800)]

osd: make scrub verify replica object attrs match

Sage Weil [Thu, 4 Dec 2008 18:57:13 +0000 (10:57 -0800)]

osd: fix pg stat acking in osd

Yehuda Sadeh [Thu, 4 Dec 2008 18:47:35 +0000 (10:47 -0800)]

log: logclient uses log types instead of log level

Sage Weil [Thu, 4 Dec 2008 18:08:22 +0000 (10:08 -0800)]

osd: check for missing clones in pick_read_snap

We may need to wait from op_read if we are missing a specific
clone.

Sage Weil [Thu, 4 Dec 2008 18:07:43 +0000 (10:07 -0800)]

osd: clear waiting_for_head when we pull the head; set skipped if we do

Need to ++skipped if we skip because we're waiting for the head
or else we'll incorrectly advance requested_to.

Clear waiting_for_head entry when we pull a head we're waiting for.

Sage Weil [Thu, 4 Dec 2008 18:06:09 +0000 (10:06 -0800)]

osd: fix missing.add_event

We should only set have to prior_version if we aren't missing the
prior_version too!

Yehuda Sadeh [Thu, 4 Dec 2008 17:46:24 +0000 (09:46 -0800)]

kclient: fix NULL dereferencing oops

Sage Weil [Thu, 4 Dec 2008 05:12:11 +0000 (21:12 -0800)]

osd: version pg_stats_t with <epoch,version> pair; clean up pgmon a bit

Sage Weil [Thu, 4 Dec 2008 03:46:08 +0000 (19:46 -0800)]

osd: keep projected info on in-progress object modifications in memory

Since the primary delays its writes until after replicas ack, we need to
keep projected object info in memory for the duration, because the
semantics very much depend on whether the object exists and what its size
is (well, mainly the pg_stats do).

This can avoid re-parsing SnapSet et al for certain workloads hitting
the same objects repeatedly (e.g., mds journal objects).

Sage Weil [Thu, 4 Dec 2008 00:42:38 +0000 (16:42 -0800)]

osd: fix problems with propagation of info.stats during recovery

merge_log() is called on replicas, so don't use peer_info (which
is primary-only)!

Sage Weil [Thu, 4 Dec 2008 00:41:53 +0000 (16:41 -0800)]

osd: keep tabs on total object copies vs missing/degraded

Define degraded as an object copy that is not present in the
proper location.
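
Under that definition, the two figures for a single pg could be computed roughly as follows. This is a Python sketch with hypothetical names (`copy_stats`, `acting`, `missing`); it counts only copies missing from osds that are present and ignores replica slots with no osd assigned.

```python
def copy_stats(num_objects, acting, missing):
    """Return (total_copies, degraded_copies) for one pg.

    acting  -- osd ids that should each hold every object
    missing -- dict: osd id -> set of object names that osd lacks
    """
    total = num_objects * len(acting)
    degraded = sum(len(missing.get(osd, ())) for osd in acting)
    return total, degraded
```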

Sage Weil [Thu, 4 Dec 2008 00:26:18 +0000 (16:26 -0800)]

osd: fix uninitialized var use

Yehuda Sadeh [Thu, 4 Dec 2008 00:53:38 +0000 (16:53 -0800)]

add missing header file declaration