git.apps.os.sepia.ceph.com Git - ceph.git/log
16 years agoMerge branch 'diskformat' into unstable
Sage Weil [Wed, 3 Dec 2008 23:06:25 +0000 (15:06 -0800)]
Merge branch 'diskformat' into unstable

16 years agomon: include last_scrub info in pg dump
Sage Weil [Wed, 3 Dec 2008 22:57:23 +0000 (14:57 -0800)]
mon: include last_scrub info in pg dump

16 years agoosd: remove pg from recovery_wq with clear_primary_state
Sage Weil [Wed, 3 Dec 2008 22:57:09 +0000 (14:57 -0800)]
osd: remove pg from recovery_wq with clear_primary_state

16 years agoosd: make pg refcounting vs work queues consistent
Sage Weil [Wed, 3 Dec 2008 22:51:26 +0000 (14:51 -0800)]
osd: make pg refcounting vs work queues consistent

Either refcount items in queue, or don't.

16 years agoosd: do clone scrub based on our generated scrub map
Sage Weil [Wed, 3 Dec 2008 22:37:10 +0000 (14:37 -0800)]
osd: do clone scrub based on our generated scrub map

16 years agoosd: scrub info in pg_stat_t. scrub states.
Sage Weil [Wed, 3 Dec 2008 22:17:54 +0000 (14:17 -0800)]
osd: scrub info in pg_stat_t.  scrub states.

16 years agoosd: fix small quirk read_log missing generation
Sage Weil [Wed, 3 Dec 2008 21:55:48 +0000 (13:55 -0800)]
osd: fix small quirk read_log missing generation

The missing entry .have field was probably wrong due to the use
of missing.add_event (which assumes missing is up to date wrt
the previous log entry).  Use the prior version we just pulled off
disk instead.

Also, be a bit more verbose.

16 years agomon: always discard pending on election completion
Sage Weil [Wed, 3 Dec 2008 21:41:12 +0000 (13:41 -0800)]
mon: always discard pending on election completion

Previously we tried to save the pending if we were still the
leader.  The problem is that while we were not leader, we may have
missed out on some updates, in which case the pending may no longer
be based on the current state.

In the future, we could make the commit waiters smart about callback
return codes so that they try to reapply.  For now, don't worry
about it.

16 years agomds: update segment on ETableServer replay
Sage Weil [Wed, 3 Dec 2008 20:26:24 +0000 (12:26 -0800)]
mds: update segment on ETableServer replay

Otherwise we may forget to flush table changes to disk before
trimming.

Also, clean up code a bit to use update_segment() whenever
possible (instead of duplicating the specific LogSegment update).

16 years agomds: print table version loaded during log replay
Sage Weil [Wed, 3 Dec 2008 20:20:35 +0000 (12:20 -0800)]
mds: print table version loaded during log replay

16 years agoosd: default 2x pg only for now
Sage Weil [Wed, 3 Dec 2008 20:20:22 +0000 (12:20 -0800)]
osd: default 2x pg only for now

16 years agoosd: distributed scrub compares primary vs replica contents
Sage Weil [Wed, 3 Dec 2008 20:19:59 +0000 (12:19 -0800)]
osd: distributed scrub compares primary vs replica contents

The checks are still pretty trivial at this point.

16 years agoosd: do not clear ops vector to indicate noop (protocol change)
Sage Weil [Wed, 3 Dec 2008 18:35:10 +0000 (10:35 -0800)]
osd: do not clear ops vector to indicate noop (protocol change)

The reply needs to include the full ops vector.  Use a separate
flag to indicate a noop.

16 years agoosd: rewrite pg_stats queueing
Sage Weil [Wed, 3 Dec 2008 00:02:43 +0000 (16:02 -0800)]
osd: rewrite pg_stats queueing

Use an xlist instead of a separate map.  Avoid inefficient
requeueing and external map overhead.

16 years agoosd: remove useless raid4pg from build
Sage Weil [Tue, 2 Dec 2008 22:38:12 +0000 (14:38 -0800)]
osd: remove useless raid4pg from build

16 years agokclient: fix oops in case written size doesn't match request
Yehuda Sadeh [Wed, 3 Dec 2008 18:10:55 +0000 (10:10 -0800)]
kclient: fix oops in case written size doesn't match request

16 years agokclient: some logs revision
Yehuda Sadeh [Wed, 3 Dec 2008 00:16:41 +0000 (16:16 -0800)]
kclient: some logs revision

16 years agoosd: don't forget about skipped clones during recover_primary
Sage Weil [Tue, 2 Dec 2008 22:27:30 +0000 (14:27 -0800)]
osd: don't forget about skipped clones during recover_primary

Only advance requested_to if we haven't skipped any items.
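
The rule above can be illustrated with a small C++ sketch. This is not the actual recover_primary code; the function name, the vector of "ready" flags, and the index-based pointer are assumptions made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: ready[i] == false means item i had to be skipped
// (e.g. a clone that cannot be recovered yet).  Returns the new
// requested_to position.
std::size_t advance_requested_to(const std::vector<bool>& ready,
                                 std::size_t requested_to) {
  assert(requested_to <= ready.size());
  bool skipped = false;
  for (std::size_t i = requested_to; i < ready.size(); ++i) {
    if (!ready[i]) {
      skipped = true;  // remember that we left something behind
      continue;
    }
    // ... a real implementation would request recovery of item i here ...
    if (!skipped)
      requested_to = i + 1;  // safe to advance: nothing earlier was skipped
  }
  return requested_to;
}
```

With ready = {true, true, false, true} and a starting position of 0, the pointer stops at 2, so the skipped item is revisited on the next pass.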

16 years agoalways print snapids in hex
Sage Weil [Tue, 2 Dec 2008 22:26:06 +0000 (14:26 -0800)]
always print snapids in hex

16 years agoosd: log scrub errors to system log
Sage Weil [Tue, 2 Dec 2008 22:25:54 +0000 (14:25 -0800)]
osd: log scrub errors to system log

16 years agomon: system-wide log
Sage Weil [Tue, 2 Dec 2008 22:28:30 +0000 (14:28 -0800)]
mon: system-wide log

Pretty rudimentary still.

16 years agomds: pick_inode_snap should consider follows==0 valid
Sage Weil [Tue, 2 Dec 2008 20:03:15 +0000 (12:03 -0800)]
mds: pick_inode_snap should consider follows==0 valid

I suspect a larger audit of 'follows' semantics may be necessary,
but this fixes the bug I was seeing with:

echo asdf > a
mkdir .snap/1
echo qwer > b
mkdir .snap/2
echo zxcv > a
mkdir .snap/3
sync
cat .snap/1/a   # empty
ll .snap/*      # hangs

16 years agokclient: initialize some protocol fields
Yehuda Sadeh [Tue, 2 Dec 2008 21:37:02 +0000 (13:37 -0800)]
kclient: initialize some protocol fields

16 years agoosd: log stats for push and pull bytes
Sage Weil [Tue, 2 Dec 2008 03:00:58 +0000 (19:00 -0800)]
osd: log stats for push and pull bytes

16 years agomds: avoid overlapping release attempts
Sage Weil [Tue, 2 Dec 2008 19:45:37 +0000 (11:45 -0800)]
mds: avoid overlapping release attempts

If the first release attempt is waiting for the log to flush, we
should avoid sending any RELEASED ack until all releases have
flushed.  That is, only the last release will ack.  Keep a counter
in Capability to do this.

Otherwise, we may close out a capability from under a release
that is flushing, and our seq # will be meaningless later in
_finish_release_cap() when we're trying to decide what to do.
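
A minimal sketch of the counter idea, assuming a simplified capability object. The struct and member names below are hypothetical and do not come from the MDS source.

```cpp
#include <cassert>

// Hypothetical capability that tracks releases still waiting on a log flush.
struct Cap {
  int releasing = 0;  // releases in flight
  int acks_sent = 0;  // RELEASED acks sent (kept only for illustration)

  // A release attempt starts and must wait for the log to flush.
  void start_release() { ++releasing; }

  // Called when the flush for one release completes.
  // Returns true if this call sent the ack, i.e. it was the last release.
  bool finish_release() {
    assert(releasing > 0);
    if (--releasing > 0)
      return false;  // an overlapping release is still flushing
    ++acks_sent;     // only the last release acks
    return true;
  }
};
```

If two releases overlap, the first finish_release() returns false and the second sends the single ack.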

16 years agokclient: avoid queueing cap_snap when nothing is dirty or writing
Sage Weil [Tue, 2 Dec 2008 19:09:16 +0000 (11:09 -0800)]
kclient: avoid queueing cap_snap when nothing is dirty or writing

Save ourselves the trouble when there is nothing to flush.

16 years agoosd: send old_version to replicas
Sage Weil [Tue, 2 Dec 2008 18:49:22 +0000 (10:49 -0800)]
osd: send old_version to replicas

Otherwise CLONE entries in the PG log on replicas have 0'0 for
prior_version, and everything goes to hell.

16 years agoosd: always get old_version; include in debug output
Sage Weil [Tue, 2 Dec 2008 18:43:36 +0000 (10:43 -0800)]
osd: always get old_version; include in debug output

16 years agoosd: show prior_version in log dump, output recovery_primary debug output
Sage Weil [Tue, 2 Dec 2008 18:00:18 +0000 (10:00 -0800)]
osd: show prior_version in log dump, output recovery_primary debug output

16 years agomds: suspend instead of suicide on beacon timeout
Sage Weil [Tue, 2 Dec 2008 05:13:34 +0000 (21:13 -0800)]
mds: suspend instead of suicide on beacon timeout

If we don't hear from the monitor, suspend doing any useful work instead
of just committing suicide.  If the monitor comes back and hasn't killed
us off, then we're fine.  If we've been marked as failed, we will shut
down as before.

16 years agofilestore: show return codes in debug output
Sage Weil [Tue, 2 Dec 2008 00:55:40 +0000 (16:55 -0800)]
filestore: show return codes in debug output

16 years agoobject: print snapid in hex
Sage Weil [Tue, 2 Dec 2008 00:53:16 +0000 (16:53 -0800)]
object: print snapid in hex

16 years agoosd: allow admin to mark osd lost to kickstart recovery (disk format change)
Sage Weil [Wed, 26 Nov 2008 22:48:52 +0000 (14:48 -0800)]
osd: allow admin to mark osd lost to kickstart recovery (disk format change)

This is important when an osd (or osds) may contain modifications
but is offline.  If the data is truly lost, we can kickstart
recovery.

Note that if the osd was storing metadata, this could be
especially dangerous!

16 years agoosd: clean up pg_stat, osd_stat summation, fields a bit
Sage Weil [Tue, 2 Dec 2008 00:31:16 +0000 (16:31 -0800)]
osd: clean up pg_stat, osd_stat summation, fields a bit

16 years agokclient: no page cache for write without wrbuf cap
Yehuda Sadeh [Tue, 2 Dec 2008 00:00:35 +0000 (16:00 -0800)]
kclient: no page cache for write without wrbuf cap

16 years agoMerge branch 'unstable' of ssh://ceph.newdream.net/git/ceph into unstable
Yehuda Sadeh [Mon, 1 Dec 2008 23:51:58 +0000 (15:51 -0800)]
Merge branch 'unstable' of ssh://ceph.newdream.net/git/ceph into unstable

16 years agokclient: sync writes use page cache
Yehuda Sadeh [Mon, 1 Dec 2008 23:50:36 +0000 (15:50 -0800)]
kclient: sync writes use page cache

16 years agokclient: clean out old BACKOFF bit flag, comments
Sage Weil [Mon, 1 Dec 2008 23:29:56 +0000 (15:29 -0800)]
kclient: clean out old BACKOFF bit flag, comments

16 years agoosdmaptool: print new osd_info fields
Sage Weil [Mon, 1 Dec 2008 22:24:35 +0000 (14:24 -0800)]
osdmaptool: print new osd_info fields

down_at, last_clean interval, etc.

16 years agoosd: make pg log dump obey debug levels
Sage Weil [Mon, 1 Dec 2008 23:16:24 +0000 (15:16 -0800)]
osd: make pg log dump obey debug levels

16 years agoosd: rebuild past intervals when needed; tolerate partial info
Sage Weil [Mon, 1 Dec 2008 23:13:25 +0000 (15:13 -0800)]
osd: rebuild past intervals when needed; tolerate partial info

Tolerate missing past_intervals attr.

Also, if only some past intervals are missing, rebuild them all; don't
assume that if any are there then all are there.

16 years agoosd: send and process heartbeats in separate thread, channel
Sage Weil [Mon, 1 Dec 2008 19:03:52 +0000 (11:03 -0800)]
osd: send and process heartbeats in separate thread, channel

Use a separate dispatch thread to process heartbeats.  Use a
separate thread to send them.  This ensures something slow
(e.g. a map update) does not make an osd appear to be down.

This also means a separate entity_addr for heartbeats, which puts
them over a separate TCP stream.

16 years agomds: do not purge until leases expire
Sage Weil [Mon, 1 Dec 2008 22:01:45 +0000 (14:01 -0800)]
mds: do not purge until leases expire

16 years agolockdep: lockdep_dump_locks()
Sage Weil [Mon, 1 Dec 2008 18:41:29 +0000 (10:41 -0800)]
lockdep: lockdep_dump_locks()

Handy for dumping held locks in gdb w/ 'p lockdep_dump_locks()'

16 years agorwlock: try_get_read, try_get_write
Sage Weil [Mon, 1 Dec 2008 18:13:00 +0000 (10:13 -0800)]
rwlock: try_get_read, try_get_write

16 years agoosd: optionally avoid zeroing trimmed log on disk
Sage Weil [Mon, 1 Dec 2008 15:13:30 +0000 (07:13 -0800)]
osd: optionally avoid zeroing trimmed log on disk

This is a half-hearted attempt to keep old PG log content around.  It'll
still be lost if a PG moves to another node or the entire log is written
to disk for some other reason.

16 years agokclient: fix bad check
Yehuda Sadeh [Mon, 1 Dec 2008 20:36:37 +0000 (12:36 -0800)]
kclient: fix bad check

16 years agokclient: quiet down some log messages
Yehuda Sadeh [Mon, 1 Dec 2008 19:39:00 +0000 (11:39 -0800)]
kclient: quiet down some log messages

16 years agoosd: skip peer_info on down osds
Sage Weil [Mon, 1 Dec 2008 04:42:44 +0000 (20:42 -0800)]
osd: skip peer_info on down osds

We don't clean old/down OSDs out of peer_info map, since we may not
restart peering when strays go up/down.  That's fine... just make sure
we ignore them later.

16 years agoosd: remove bad PG::put() assertion
Sage Weil [Mon, 1 Dec 2008 04:41:50 +0000 (20:41 -0800)]
osd: remove bad PG::put() assertion

16 years agoosd: fix lock inversion on workqueue shutdown
Sage Weil [Sun, 30 Nov 2008 21:06:37 +0000 (13:06 -0800)]
osd: fix lock inversion on workqueue shutdown

Simplify PG get/put vs lock/unlock.  Since we are reference counting with
an atomic_t, we don't need to re-use PG::lock to protect the reference
count.
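
As a rough illustration of why an atomic counter needs no extra lock, here is a sketch that uses std::atomic in place of the atomic_t mentioned above. The class is a hypothetical stand-in, not the real PG.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical reference-counted object; get()/put() take no lock.
class RefCounted {
  std::atomic<int> nref{1};  // the creator holds the first reference
public:
  static int live;           // number of live objects, for illustration only

  RefCounted() { ++live; }
  ~RefCounted() { --live; }

  void get() { nref.fetch_add(1, std::memory_order_relaxed); }

  void put() {
    // fetch_sub returns the value before the decrement
    int prev = nref.fetch_sub(1, std::memory_order_acq_rel);
    assert(prev > 0);
    if (prev == 1)
      delete this;  // last reference dropped
  }

  int refs() const { return nref.load(); }
};

int RefCounted::live = 0;
```

Because get() and put() never touch a mutex, they can be called while other locks are held or during shutdown without creating a lock-ordering dependency.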

16 years agotodo
Sage Weil [Thu, 27 Nov 2008 16:34:52 +0000 (08:34 -0800)]
todo

16 years agoworkqueue: deliberately leak string with lock name
Sage Weil [Thu, 27 Nov 2008 16:34:43 +0000 (08:34 -0800)]
workqueue: deliberately leak string with lock name

Lockdep assumes strings are statically allocated.

16 years agoMerge branch 'unstable' of ssh://yehudasa@ceph.newdream.net/git/ceph into unstable
Ariela [Thu, 27 Nov 2008 01:17:05 +0000 (17:17 -0800)]
Merge branch 'unstable' of ssh://yehudasa@ceph.newdream.net/git/ceph into unstable

16 years agowireshark: update for win32
Ariela [Thu, 27 Nov 2008 01:16:43 +0000 (17:16 -0800)]
wireshark: update for win32

16 years agoosd: fix bad sub_op_push assertion; only write data we need
Sage Weil [Thu, 27 Nov 2008 00:20:48 +0000 (16:20 -0800)]
osd: fix bad sub_op_push assertion; only write data we need

We adjust the pushed buffer so that we only write the portions
of it that we can't clone.

16 years agoobjectcacher: only call flushed callback if there are also no dirty buffers
Sage Weil [Thu, 27 Nov 2008 00:42:01 +0000 (16:42 -0800)]
objectcacher: only call flushed callback if there are also no dirty buffers

Otherwise we call the flushed_callback too soon.

16 years agomds: fix up loner_cap whenever we manually change filelock state
Sage Weil [Thu, 27 Nov 2008 00:32:29 +0000 (16:32 -0800)]
mds: fix up loner_cap whenever we manually change filelock state

This is kind of a pain.

16 years agoosd: clean up touch() calls to use exists bool
Sage Weil [Wed, 26 Nov 2008 22:35:56 +0000 (14:35 -0800)]
osd: clean up touch() calls to use exists bool

We only need to touch if the head object itself doesn't exist.
Do so for all ops.  This works cleanly despite any op munging
above.

16 years agoobjectstore: fix touch()
Sage Weil [Wed, 26 Nov 2008 22:35:02 +0000 (14:35 -0800)]
objectstore: fix touch()

Wrong opcode meant skewed args, and all kinds of badness.

16 years agomon: instruct individual pgs to scrub
Sage Weil [Wed, 26 Nov 2008 21:50:33 +0000 (13:50 -0800)]
mon: instruct individual pgs to scrub

Factor out pg parsing into pg_t.

16 years agoosd: debug write_info a bit, clean up Transaction cruft
Sage Weil [Wed, 26 Nov 2008 21:42:17 +0000 (13:42 -0800)]
osd: debug write_info a bit, clean up Transaction cruft

16 years agoosd: initiate scrub via monitor message
Sage Weil [Wed, 26 Nov 2008 21:41:14 +0000 (13:41 -0800)]
osd: initiate scrub via monitor message

16 years agoobjecter: scan_pgs even on original full map
Sage Weil [Wed, 26 Nov 2008 20:39:36 +0000 (12:39 -0800)]
objecter: scan_pgs even on original full map

We may have queued ops before getting _any_ map; those still need to
be kicked.

16 years agoosd: ensure target osd is still up when sending MPGRemoves
Sage Weil [Wed, 26 Nov 2008 20:18:10 +0000 (12:18 -0800)]
osd: ensure target osd is still up when sending MPGRemoves

16 years agoosd: move stats into PG::Info (disk format change)
Sage Weil [Wed, 26 Nov 2008 19:23:15 +0000 (11:23 -0800)]
osd: move stats into PG::Info (disk format change)

We want the pg stats to propagate along with last_update.  Do so
in merge_log.

Also, stop doing delayed stats update on primary; we always update
the in-core copy of Info, and only delay applying the transaction
to disk.  At least currently.

16 years agoosd: (re)set degraded flag on activate
Sage Weil [Wed, 26 Nov 2008 18:55:54 +0000 (10:55 -0800)]
osd: (re)set degraded flag on activate

This ensures the bit is properly set on newly created PGs.

16 years agotimer: discard unfired events on shutdown
Sage Weil [Wed, 26 Nov 2008 18:55:14 +0000 (10:55 -0800)]
timer: discard unfired events on shutdown

Mostly this just cleans up valgrind leak check output.

16 years agoclient: fix use-after-free
Sage Weil [Wed, 26 Nov 2008 18:44:48 +0000 (10:44 -0800)]
client: fix use-after-free

put_node at the end.

16 years agomsgr: discard queued messages when closing pipe
Sage Weil [Wed, 26 Nov 2008 18:35:23 +0000 (10:35 -0800)]
msgr: discard queued messages when closing pipe

These were leaking whenever we, say, did mark_down().

16 years agoobjectcacher: do callbacks _last_ to avoid a use-after-free
Sage Weil [Wed, 26 Nov 2008 18:19:08 +0000 (10:19 -0800)]
objectcacher: do callbacks _last_ to avoid a use-after-free

The callback may call back into release_set, so do not assume any pointers
will remain valid after the callback.
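
The pattern can be sketched as follows: finish all internal bookkeeping first, then run the callbacks from a local copy. The Cache class and its methods are hypothetical and only loosely inspired by the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <utility>
#include <vector>

// Hypothetical cache that keeps one flush callback per object id.
class Cache {
  std::map<int, std::function<void()>> waiters;
public:
  void wait_for(int id, std::function<void()> cb) {
    waiters[id] = std::move(cb);
  }

  // May be called from inside a callback (compare release_set above).
  void release(int id) { waiters.erase(id); }

  std::size_t size() const { return waiters.size(); }

  void flushed_all() {
    // 1. Move the callbacks out and finish every internal state change.
    std::vector<std::function<void()>> ready;
    for (auto& w : waiters)
      ready.push_back(std::move(w.second));
    waiters.clear();
    assert(waiters.empty());

    // 2. Run callbacks last.  They may re-enter this object, so nothing
    //    below this point relies on iterators or pointers into `waiters`.
    for (auto& cb : ready)
      cb();
  }
};
```

A callback that calls release() on another id is harmless here, because the container it modifies is no longer being iterated.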

16 years agomds: initialize CInode::loner_cap when twiddling filelock states after reconnect
Sage Weil [Wed, 26 Nov 2008 17:58:32 +0000 (09:58 -0800)]
mds: initialize CInode::loner_cap when twiddling filelock states after reconnect

loner_cap must either be defined or not defined to match the lock states.

16 years agoobjecter: use full osdmap to get started
Sage Weil [Wed, 26 Nov 2008 17:57:35 +0000 (09:57 -0800)]
objecter: use full osdmap to get started

Previously we requested and then decoded the full history of incrementals,
a big waste of time during startup.

16 years agoosd: ensure 0-byte object created by push
Sage Weil [Wed, 26 Nov 2008 18:07:54 +0000 (10:07 -0800)]
osd: ensure 0-byte object created by push

Specifically, a zero-length HEAD object whose snapset we care
about.

16 years agoosd: mark repops aborted in on_shutdown
Sage Weil [Wed, 26 Nov 2008 16:28:11 +0000 (08:28 -0800)]
osd: mark repops aborted in on_shutdown

Otherwise the op_modify_ondisk() completion gets confused.

16 years agodstart: show filestore ops
Sage Weil [Wed, 26 Nov 2008 01:31:34 +0000 (17:31 -0800)]
dstart: show filestore ops

16 years agoosd: clear out workqueue queues on shutdown
Sage Weil [Wed, 26 Nov 2008 01:30:15 +0000 (17:30 -0800)]
osd: clear out workqueue queues on shutdown

16 years agomon: show both user data and actual disk space used
Sage Weil [Wed, 26 Nov 2008 01:28:00 +0000 (17:28 -0800)]
mon: show both user data and actual disk space used

16 years agodebug: rotate old courtesy symlinks to .0, .1, etc.
Sage Weil [Wed, 26 Nov 2008 01:26:56 +0000 (17:26 -0800)]
debug: rotate old courtesy symlinks to .0, .1, etc.

This will make it much easier to find the next-oldest instantiation of
osd foo.

16 years agoosd: remove bad assertion in op_modify_ondisk
Sage Weil [Tue, 25 Nov 2008 22:43:56 +0000 (14:43 -0800)]
osd: remove bad assertion in op_modify_ondisk

We may not be in waitfor_disk if we are marked failed and
on_osd_failure acked and removed us from the set.

16 years agoosd: fix clone push when head is correct old version
Sage Weil [Tue, 25 Nov 2008 22:00:55 +0000 (14:00 -0800)]
osd: fix clone push when head is correct old version

If the replica's head is the clone source version, our push should instruct
the replica to clone its contents.

16 years agomds: remove anchors when destroying stray inodes
Sage Weil [Tue, 25 Nov 2008 21:19:16 +0000 (13:19 -0800)]
mds: remove anchors when destroying stray inodes

purge_stray() is now the sole caller of anchor_destroy, so we can still
avoid any locking considerations.

16 years agomds: clean up request write condition, can_forward logic a bit
Sage Weil [Tue, 25 Nov 2008 20:50:00 +0000 (12:50 -0800)]
mds: clean up request write condition, can_forward logic a bit

16 years agoosd: fix rare memory leak
Sage Weil [Tue, 25 Nov 2008 20:21:20 +0000 (12:21 -0800)]
osd: fix rare memory leak

16 years agocmonctl: print summary every 20 lines when in watch mode
Sage Weil [Tue, 25 Nov 2008 20:14:26 +0000 (12:14 -0800)]
cmonctl: print summary every 20 lines when in watch mode

16 years agoosd: fix up recovery pointers a bit
Sage Weil [Tue, 25 Nov 2008 19:22:14 +0000 (11:22 -0800)]
osd: fix up recovery pointers a bit

16 years agowireshark: wireshark ceph plugin patch
Yehuda Sadeh [Tue, 25 Nov 2008 18:57:37 +0000 (10:57 -0800)]
wireshark: wireshark ceph plugin patch

16 years agoosd: be a bit more verbose in push_to_replica
Sage Weil [Tue, 25 Nov 2008 18:42:36 +0000 (10:42 -0800)]
osd: be a bit more verbose in push_to_replica

16 years agoosd: lots of fixes
Sage Weil [Tue, 25 Nov 2008 18:02:48 +0000 (10:02 -0800)]
osd: lots of fixes

16 years agotodos
Sage Weil [Tue, 25 Nov 2008 00:41:54 +0000 (16:41 -0800)]
todos

16 years agoosd: clean up repop code
Sage Weil [Tue, 25 Nov 2008 14:45:48 +0000 (06:45 -0800)]
osd: clean up repop code

16 years agoosd: fix up repop_ack
Sage Weil [Tue, 25 Nov 2008 04:46:16 +0000 (20:46 -0800)]
osd: fix up repop_ack

16 years agoosd: ack type in osd sub ops
Sage Weil [Tue, 25 Nov 2008 00:52:23 +0000 (16:52 -0800)]
osd: ack type in osd sub ops

16 years agoosd: infrastructure for ack vs nvram vs disk osd_op ack types
Sage Weil [Tue, 25 Nov 2008 00:35:43 +0000 (16:35 -0800)]
osd: infrastructure for ack vs nvram vs disk osd_op ack types

NVRAM ack not generated, yet.  The completion callbacks from the store
need some work first.

16 years agoos: separate onjournal, ondisk callbacks for apply_transaction
Sage Weil [Mon, 24 Nov 2008 22:56:46 +0000 (14:56 -0800)]
os: separate onjournal, ondisk callbacks for apply_transaction

16 years agocontext: allow C_Gather to OR instead of AND subs.
Sage Weil [Mon, 24 Nov 2008 22:06:49 +0000 (14:06 -0800)]
context: allow C_Gather to OR instead of AND subs.

That is, call completion when first sub completes.
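
To make the difference concrete, here is a small sketch of a gather helper with both modes. It is not the real C_Gather; the class name, the bool mode flag, and the method names are assumptions for illustration.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical gather: fires one completion for a set of sub-operations.
class Gather {
  std::function<void()> on_done;
  int pending = 0;
  bool any;            // true: OR (first sub fires); false: AND (last sub fires)
  bool fired = false;
public:
  Gather(std::function<void()> cb, bool any_mode)
    : on_done(std::move(cb)), any(any_mode) {}

  // Register one more sub-operation.
  void add_sub() { ++pending; }

  // Called once for each sub-operation that finishes.
  void sub_complete() {
    assert(pending > 0);
    --pending;
    if (fired)
      return;  // completion already delivered (OR mode)
    if (any || pending == 0) {
      fired = true;
      on_done();
    }
  }
};
```

In OR mode with two subs, the completion runs after the first sub_complete() and the second is ignored. In AND mode it runs only after both have finished.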

16 years agoosd: remove snap collection after it is trimmed
Sage Weil [Mon, 24 Nov 2008 21:49:17 +0000 (13:49 -0800)]
osd: remove snap collection after it is trimmed

We can still end up with empty collections for existing snaps but no
local objects.  However, they'll eventually go away when the snap is
deleted, so who cares.

16 years agoos: add collection_empty method
Sage Weil [Mon, 24 Nov 2008 21:47:54 +0000 (13:47 -0800)]
os: add collection_empty method

16 years agotodos
Sage Weil [Mon, 24 Nov 2008 20:06:18 +0000 (12:06 -0800)]
todos

16 years agoosd: reply with EBLACKLISTED if sender is blacklisted
Sage Weil [Mon, 24 Nov 2008 20:06:08 +0000 (12:06 -0800)]
osd: reply with EBLACKLISTED if sender is blacklisted

Move reply_op_error helper into OSD.cc.