git.apps.os.sepia.ceph.com Git - ceph.git/log
ceph.git
14 years agoosdmap: don't include blacklist info in summary
Sage Weil [Wed, 17 Nov 2010 18:24:21 +0000 (10:24 -0800)]
osdmap: don't include blacklist info in summary

It's confusing users and isn't that important.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoconfig: added max_mds
Samuel Just [Wed, 17 Nov 2010 00:07:47 +0000 (16:07 -0800)]
config: added max_mds
MDSMonitor: create_new_fs adapted to use the max_mds parameter

max_mds is now a configurable value and create_new_fs will initialize
max_mds to the specified value.

Signed-off-by: Samuel Just <samuelj@hq.newdream.net>
14 years agomds: add timestamp to LogEvents
Sage Weil [Tue, 16 Nov 2010 20:07:38 +0000 (12:07 -0800)]
mds: add timestamp to LogEvents

This just gives us a bit of useful info when debugging problems.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoosd: fix trailing + in pg state string rendering
Sage Weil [Tue, 16 Nov 2010 18:32:19 +0000 (10:32 -0800)]
osd: fix trailing + in pg state string rendering

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoMerge remote branch 'origin/unfound' into unstable
Sage Weil [Tue, 16 Nov 2010 18:10:43 +0000 (10:10 -0800)]
Merge remote branch 'origin/unfound' into unstable

14 years agomds: be less noisy about cap imports
Sage Weil [Tue, 16 Nov 2010 18:06:05 +0000 (10:06 -0800)]
mds: be less noisy about cap imports

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoMerge branch 'mds_dir_hash' into unstable
Sage Weil [Tue, 16 Nov 2010 18:01:15 +0000 (10:01 -0800)]
Merge branch 'mds_dir_hash' into unstable

14 years agomds/client: pass dir hash over the wire
Sage Weil [Tue, 16 Nov 2010 17:50:55 +0000 (09:50 -0800)]
mds/client: pass dir hash over the wire

Add a feature bit DIRLAYOUTHASH.

Also fix client request routing for lookups (we were only hashing when
a Dentry pointer was provided, not when a relative path was).

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: set dir hash on root inode
Sage Weil [Tue, 16 Nov 2010 17:47:34 +0000 (09:47 -0800)]
mds: set dir hash on root inode

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: set mode before all the file type dependent inode initialization!
Sage Weil [Tue, 16 Nov 2010 17:42:51 +0000 (09:42 -0800)]
mds: set mode before all the file type dependent inode initialization!

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: add DIRLAYOUTHASH feature bit
Sage Weil [Tue, 26 Oct 2010 03:41:24 +0000 (20:41 -0700)]
mds: add DIRLAYOUTHASH feature bit

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: make dentry hash a dir layout property
Sage Weil [Mon, 25 Oct 2010 23:46:01 +0000 (16:46 -0700)]
mds: make dentry hash a dir layout property

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoRadosClient::shutdown: call monclient::shutdown
Colin Patrick McCabe [Tue, 16 Nov 2010 02:33:01 +0000 (18:33 -0800)]
RadosClient::shutdown: call monclient::shutdown

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoosd: don't stop recovery when there are unfound
Colin Patrick McCabe [Tue, 16 Nov 2010 00:19:27 +0000 (16:19 -0800)]
osd: don't stop recovery when there are unfound

There are two phases in recovery: one where we get all the right objects
on to the primary, and another where we push all those objects out to
the replicas. Formerly, we would not start the second phase until there
were no missing objects at all on the primary.

This change modifies that so that we will start the second phase even if
there are unfound objects. However, we will still wait for all findable
missing objects to be brought to us, of course.

Get rid of uptodate_set. We can find the same information by looking at
the missing and missing_loc sets directly. Keeping the uptodate_set...
er... up-to-date would be very difficult in the presence of all the things
that can modify the missing and missing_loc sets.
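
A minimal sketch of how the same information could be derived from the
missing and missing_loc sets directly. The type and member names here
(MissingState, num_unfound, can_recover_replicas) are hypothetical
stand-ins for illustration, not the actual PG code.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>

// Hypothetical stand-ins for the PG's bookkeeping; not the real Ceph types.
using ObjectId = std::string;
using OsdId = int;

struct MissingState {
  std::set<ObjectId> missing;                       // objects the primary lacks
  std::map<ObjectId, std::set<OsdId>> missing_loc;  // known sources per object

  // An object is "unfound" when it is missing and no source is known for it.
  std::size_t num_unfound() const {
    std::size_t n = 0;
    for (const auto& oid : missing) {
      auto it = missing_loc.find(oid);
      if (it == missing_loc.end() || it->second.empty())
        ++n;
    }
    return n;
  }

  // The second phase may start once every findable missing object has
  // arrived, i.e. everything still missing is unfound.
  bool can_recover_replicas() const {
    assert(num_unfound() <= missing.size());
    return missing.size() == num_unfound();
  }
};
```

With this shape there is no separate set to keep in sync: any change to
missing or missing_loc is reflected the next time the predicate is evaluated.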

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agodumpjournal.cc: fix compile
Colin Patrick McCabe [Tue, 16 Nov 2010 01:01:19 +0000 (17:01 -0800)]
dumpjournal.cc: fix compile

dumpjournal needs to create its own SafeTimers and pass them in to some
constructors.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agorbd: fix rbd snap rm class handling
Yehuda Sadeh [Tue, 16 Nov 2010 00:43:25 +0000 (16:43 -0800)]
rbd: fix rbd snap rm class handling

14 years agoMerge remote branch 'origin/unfound_last_epoch_clean' into unstable
Sage Weil [Mon, 15 Nov 2010 22:59:46 +0000 (14:59 -0800)]
Merge remote branch 'origin/unfound_last_epoch_clean' into unstable

14 years agoAdd ./ceph osd tell <osd-num> dump_missing <out>
Colin Patrick McCabe [Mon, 15 Nov 2010 22:47:44 +0000 (14:47 -0800)]
Add ./ceph osd tell <osd-num> dump_missing <out>

Add a command that tells the OSD to dump its missing set for all PGs to
a file. This should be useful for debugging multi-OSD scenarios.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agosearch_for_missing:recalc stats if unfound changed
Colin Patrick McCabe [Mon, 15 Nov 2010 22:38:36 +0000 (14:38 -0800)]
search_for_missing:recalc stats if unfound changed

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agotimer: make init/shutdown explicit
Sage Weil [Mon, 15 Nov 2010 21:23:42 +0000 (13:23 -0800)]
timer: make init/shutdown explicit

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agotest_unfound.sh: start recovery at end of test
Colin Patrick McCabe [Mon, 15 Nov 2010 20:39:56 +0000 (12:39 -0800)]
test_unfound.sh: start recovery at end of test

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agotest_common.sh: add dump_osd_store
Colin Patrick McCabe [Mon, 15 Nov 2010 20:31:43 +0000 (12:31 -0800)]
test_common.sh: add dump_osd_store

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agotest_unfound.sh: fix return codes
Colin Patrick McCabe [Mon, 15 Nov 2010 19:27:44 +0000 (11:27 -0800)]
test_unfound.sh: fix return codes

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agostray_test:don't use up/down. timeout extension
Colin Patrick McCabe [Mon, 15 Nov 2010 19:05:47 +0000 (11:05 -0800)]
stray_test:don't use up/down. timeout extension

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoosd: add discover_all_missing
Colin Patrick McCabe [Sun, 14 Nov 2010 22:50:30 +0000 (14:50 -0800)]
osd: add discover_all_missing

Add discover_all_missing. This function makes sure that we have messages
en route to any OSD that we think might have information that could help
us discover where our unfound objects lie.

We call this function:
* In do_peer, right after activating the PG
* In _process_pg_info, when we're the primary of an active PG
* From handle_pg_notify, when we're the primary of an active PG

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoFix bugs in search_for_missing, _process_pg_info
Colin Patrick McCabe [Sun, 14 Nov 2010 21:54:55 +0000 (13:54 -0800)]
Fix bugs in search_for_missing, _process_pg_info

PG::search_for_missing: fix a bug with the handling of MSG_OSD_PG_INFO
messages. Formerly, when processing these messages, we erroneously
assumed that there was nothing missing on the peer at all even in cases
where there were missing objects.

PG::merge_log: drop unused Missing parameter.

_process_pg_info: Don't assume that, just because we requested a Log
message at some point, that is the message we're processing.
Correctly handle cases where we didn't get the peer's Missing set or
Log.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoAdd stray_test to test_unfound.sh
Colin Patrick McCabe [Sat, 13 Nov 2010 07:44:50 +0000 (23:44 -0800)]
Add stray_test to test_unfound.sh

This test is designed to produce a stray that nonetheless has some
useful objects. The primary should be able to find those objects and ask
for them.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoPG::finish_recovery: set info.last_epoch_clean
Colin Patrick McCabe [Sun, 14 Nov 2010 21:32:51 +0000 (13:32 -0800)]
PG::finish_recovery: set info.last_epoch_clean

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoAdd MOSDPGMissing
Colin Patrick McCabe [Fri, 12 Nov 2010 22:55:40 +0000 (14:55 -0800)]
Add MOSDPGMissing

Add MOSDPGMissing, a message which just contains the missing objects
information for a PG. We will request messages like this one in order to
locate all of our unfound objects.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoosd: add incompat feature LEC for last_epoch_clean
Colin Patrick McCabe [Thu, 11 Nov 2010 01:12:49 +0000 (17:12 -0800)]
osd: add incompat feature LEC for last_epoch_clean

So an old binary will fail to mount a store with new Info encoding.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoosd: add last_epoch_clean to PG::Info
Sage Weil [Tue, 19 Oct 2010 18:40:45 +0000 (11:40 -0700)]
osd: add last_epoch_clean to PG::Info

This changes the encoding in a non-backwards compatible way.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agotest_common.sh: remove messenger debug for now
Colin Patrick McCabe [Sun, 14 Nov 2010 19:40:33 +0000 (11:40 -0800)]
test_common.sh: remove messenger debug for now

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoosd: skip unfound in recover_replicas
Sage Weil [Mon, 15 Nov 2010 20:06:09 +0000 (12:06 -0800)]
osd: skip unfound in recover_replicas

This is currently moot, since we don't start recovering replicas
until the primary is complete.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoosd: skip unfound objects in recover_primary()
Sage Weil [Mon, 15 Nov 2010 20:04:57 +0000 (12:04 -0800)]
osd: skip unfound objects in recover_primary()

We also need to make sure we come back later when they are found.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoosdmap: make printing a bit easier to read
Sage Weil [Mon, 15 Nov 2010 19:57:57 +0000 (11:57 -0800)]
osdmap: make printing a bit easier to read

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoobjecter: don't dereference null op->outbl
Sage Weil [Mon, 15 Nov 2010 19:50:53 +0000 (11:50 -0800)]
objecter: don't dereference null op->outbl

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoMerge remote branch 'origin/testing' into unstable
Sage Weil [Mon, 15 Nov 2010 19:25:58 +0000 (11:25 -0800)]
Merge remote branch 'origin/testing' into unstable

14 years agoosd: unreg scrub when removing pg
Sage Weil [Mon, 15 Nov 2010 19:25:49 +0000 (11:25 -0800)]
osd: unreg scrub when removing pg

This fixes this crash:

    osd/OSD.cc: In function 'PG* OSD::_lookup_lock_pg(pg_t)':
    osd/OSD.cc:956: FAILED assert(pg_map.count(pgid))
    ceph version 0.24~rc (7201497f2feef6a2bbd0baf89e3a14b8a880e79f)
    1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x34) [0x81ee4e]
    2: (OSD::_lookup_lock_pg(pg_t)+0x6f) [0x6d3d57]
    3: (OSD::sched_scrub()+0x2e9) [0x6e4445]
    4: (OSD::tick()+0x204) [0x6f168e]
    5: (OSD::C_Tick::finish(int)+0x1c) [0x7613bc]
    6: (SafeTimer::timer_thread()+0x189) [0x81bcf5]
    7: (SafeTimerThread::entry()+0x19) [0x81dd73]
    8: (Thread::_entry_func(void*)+0x20) [0x66496a]
    9: (()+0x68ba) [0x7fb807d118ba]
    10: (clone()+0x6d) [0x7fb806a7002d]
    NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Reported-by: Colin McCabe <colinm@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoMerge branch 'msgr_zerocopy_read' into unstable
Sage Weil [Mon, 15 Nov 2010 04:48:43 +0000 (20:48 -0800)]
Merge branch 'msgr_zerocopy_read' into unstable

14 years agolibrados: pass provided buffer to objecter on rados_read
Sage Weil [Mon, 15 Nov 2010 04:29:40 +0000 (20:29 -0800)]
librados: pass provided buffer to objecter on rados_read

This allows us to avoid the data copy if the objecter and msgr manage
to use it.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoobjecter: post rx buffer to msgr if target bufferlist is present
Sage Weil [Mon, 15 Nov 2010 04:28:44 +0000 (20:28 -0800)]
objecter: post rx buffer to msgr if target bufferlist is present

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomsgr: use provided rx buffer if present
Sage Weil [Mon, 15 Nov 2010 04:26:52 +0000 (20:26 -0800)]
msgr: use provided rx buffer if present

This changes the read path so that we hold the Connection::lock mutex while
reading data off the socket.  This ensures that we are reading into a
buffer we are allowed to use, and allows users to revoke a previously
posted buffer.  If that happens, switch over to a newly allocated buffer.

Note that currently the final result bufferlist may contain part of the
provided buffer and part of a newly allocated buffer.  This is okay as long
as we will always read the same data into the buffer.  And in practice, if
the rx buffer is revoked then the message itself will be thrown out anyway.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomsgr: add Connection rx buffer interface
Sage Weil [Mon, 15 Nov 2010 04:23:52 +0000 (20:23 -0800)]
msgr: add Connection rx buffer interface

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomsgr: implement get_connection()
Sage Weil [Mon, 15 Nov 2010 04:23:10 +0000 (20:23 -0800)]
msgr: implement get_connection()

Get a Connection* for the given destination.  This mirrors submit_message,
but does not actually queue a message.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agobuffer: implement list::iterator::get_current_ptr()
Sage Weil [Mon, 15 Nov 2010 04:21:05 +0000 (20:21 -0800)]
buffer: implement list::iterator::get_current_ptr()

Return a buffer::ptr for the ptr at the current position/offset, with the
length set to the remaining space in the current buffer.
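
A rough sketch of the semantics described above, using simplified stand-in
types. SketchPtr, SketchList and SketchIterator are hypothetical; the real
buffer::ptr and buffer::list are more involved.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for buffer::ptr and buffer::list.
struct SketchPtr {
  const char* data;
  std::size_t length;
};

struct SketchList {
  std::vector<std::string> buffers;  // each string plays the role of one buffer
};

struct SketchIterator {
  const SketchList* list;
  std::size_t index;   // which buffer the iterator is in
  std::size_t offset;  // position inside that buffer

  // Pointer at the current position, with the length set to the space
  // remaining in the current buffer only (not the rest of the list).
  SketchPtr get_current_ptr() const {
    assert(index < list->buffers.size());
    const std::string& cur = list->buffers[index];
    assert(offset <= cur.size());
    return SketchPtr{cur.data() + offset, cur.size() - offset};
  }
};
```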

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoObjecter::shutdown: shut down timer.
Colin Patrick McCabe [Sun, 14 Nov 2010 19:29:29 +0000 (11:29 -0800)]
Objecter::shutdown: shut down timer.

We have to explicitly shut down the timer in Objecter::shutdown.
Otherwise, we are relying on the destructor of SafeTimer to do it.
Unfortunately, that destructor gets called after the mutex the timer is
using has already been destroyed.
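
The pattern can be sketched as follows. This is an illustration under
assumptions, not the Objecter code: SketchTimer and SketchOwner are
hypothetical names, and the timer here takes the lock itself for brevity.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical timer that borrows a lock owned by someone else.
class SketchTimer {
  std::mutex& lock_;
  bool running_ = false;

 public:
  explicit SketchTimer(std::mutex& l) : lock_(l) {}
  void init() { running_ = true; }
  void shutdown() {
    std::lock_guard<std::mutex> g(lock_);  // needs the lock to still exist
    running_ = false;
  }
  bool running() const { return running_; }
  // The destructor never touches lock_; it only checks that shutdown ran.
  ~SketchTimer() { assert(!running_); }
};

// Hypothetical owner: shutdown() stops the timer while the lock is alive,
// instead of leaving that work to the timer's destructor.
class SketchOwner {
  std::mutex lock_;
  SketchTimer timer_;

 public:
  SketchOwner() : timer_(lock_) { timer_.init(); }
  void shutdown() { timer_.shutdown(); }
  bool timer_running() const { return timer_.running(); }
};
```

The point of the design is that teardown no longer depends on the order in
which the mutex and the timer happen to be destroyed.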

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoMerge remote branch 'origin/msgr' into testing
Sage Weil [Sat, 13 Nov 2010 04:43:30 +0000 (20:43 -0800)]
Merge remote branch 'origin/msgr' into testing

14 years agodebug: don't print thread id twice
Sage Weil [Sat, 13 Nov 2010 00:00:12 +0000 (16:00 -0800)]
debug: don't print thread id twice

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomsgr: cleanup: make queue_received non-inline; some helpful debug
Sage Weil [Fri, 12 Nov 2010 23:59:50 +0000 (15:59 -0800)]
msgr: cleanup: make queue_received non-inline; some helpful debug

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomsgr: do not clear halt_delivery
Sage Weil [Fri, 12 Nov 2010 23:56:54 +0000 (15:56 -0800)]
msgr: do not clear halt_delivery

We need to keep the halt_delivery plug set on failure/shutdown in order to
prevent a racing reader from queuing new messages.  The only time we clear
it is when we discard messages due to a session reset.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomsgr: close enqueue/discard race
Sage Weil [Fri, 12 Nov 2010 22:41:53 +0000 (14:41 -0800)]
msgr: close enqueue/discard race

We need to re-check halt_delivery after dropping and retaking pipe_lock.
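
A minimal sketch of the re-check, assuming a simplified pipe with one lock
and one flag. SketchPipe, enqueue, halt and the while_unlocked hook are
hypothetical and exist only to make the race reproducible.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <mutex>

struct SketchPipe {
  std::mutex pipe_lock;
  bool halt_delivery = false;
  std::deque<int> queue;
  std::function<void()> while_unlocked;  // hypothetical hook: runs in the gap

  // Returns true if the message was queued.
  bool enqueue(int msg) {
    std::unique_lock<std::mutex> l(pipe_lock);
    if (halt_delivery)
      return false;
    l.unlock();          // lock dropped, e.g. to take another lock first
    if (while_unlocked)
      while_unlocked();  // another thread may halt delivery here
    l.lock();
    if (halt_delivery)   // re-check: the flag may have changed in the gap
      return false;
    queue.push_back(msg);
    return true;
  }

  // Failure/shutdown path (assumed shape): set the plug, drop what is queued.
  void halt() {
    std::lock_guard<std::mutex> g(pipe_lock);
    halt_delivery = true;
    queue.clear();
  }
};
```

Without the second check, a message could be queued after the queue was
already discarded.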

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomsgr: protect pipe queue_item map with pipe_lock AND dispatch_queue lock
Sage Weil [Fri, 12 Nov 2010 22:05:56 +0000 (14:05 -0800)]
msgr: protect pipe queue_item map with pipe_lock AND dispatch_queue lock

Close a few different races here.

Also, assert that queue_items are not queued in ~Pipe().

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomsgr: add 'ms inject socket failures = foo'
Sage Weil [Fri, 12 Nov 2010 21:53:49 +0000 (13:53 -0800)]
msgr: add 'ms inject socket failures = foo'

Where we fail roughly every foo'th socket operation.
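
One way "roughly every foo'th" could be realized is a 1-in-foo random draw
per socket operation. This is a sketch under that assumption; the class
name, the seed parameter, and treating 0 as "disabled" are all hypothetical.

```cpp
#include <cassert>
#include <random>

// Hypothetical injector: fails about one in every `every` operations.
class SketchFailureInjector {
  unsigned every_;
  std::mt19937 rng_;

 public:
  SketchFailureInjector(unsigned every, unsigned seed)
      : every_(every), rng_(seed) {
    assert(every_ < 1000000000u);  // keep the sketch's range sane
  }

  // Called before each socket operation; true means "pretend it failed".
  bool should_fail() {
    if (every_ == 0)
      return false;  // assumed: 0 disables injection
    std::uniform_int_distribution<unsigned> dist(0, every_ - 1);
    return dist(rng_) == 0;
  }
};
```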

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomsgr: only close socket on reconnect or shutdown
Sage Weil [Fri, 12 Nov 2010 21:09:24 +0000 (13:09 -0800)]
msgr: only close socket on reconnect or shutdown

We can't modify 'sd' or (more importantly) close sd while any other thread
might be using it, or else we might race with an open and they might end
up using someone else's fd.

Take care to _only_ close(sd) in connect(), when the reader thread is
stopped, or when reaping the connection.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomsgr: protect pipe queuing with _both_ pipe and dispatch_queue locks
Sage Weil [Fri, 12 Nov 2010 21:41:14 +0000 (13:41 -0800)]
msgr: protect pipe queuing with _both_ pipe and dispatch_queue locks

We want to make sure the pipe's queue item doesn't go away.

Also, make queue_received() require pipe_lock to be held.  This avoids some
useless unlocking/locking, since (in the case where the pipe is already
queued) we then don't need to drop the pipe_lock at all.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoTestTimers: don't test (nonexistent) Timer
Colin Patrick McCabe [Fri, 12 Nov 2010 22:49:25 +0000 (14:49 -0800)]
TestTimers: don't test (nonexistent) Timer

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoRename PG::peer to PG::do_peer
Colin Patrick McCabe [Fri, 12 Nov 2010 22:45:36 +0000 (14:45 -0800)]
Rename PG::peer to PG::do_peer

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoMerge branch 'testing' into unstable
Sage Weil [Fri, 12 Nov 2010 15:59:05 +0000 (07:59 -0800)]
Merge branch 'testing' into unstable

14 years agouclient: insert lssnap results under snapdir, not live dir
Sage Weil [Fri, 12 Nov 2010 15:55:41 +0000 (07:55 -0800)]
uclient: insert lssnap results under snapdir, not live dir

Put the readdir results (list of snapshots) in the right place in the
hierarchy; we were putting them in the parent dir (as if they were real
directories).

This bug manifested itself as a snaptest-2.sh failure.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomsg: fix buffer size for IPv6 address parsing
Wido den Hollander [Fri, 12 Nov 2010 15:36:00 +0000 (07:36 -0800)]
msg: fix buffer size for IPv6 address parsing

Signed-off-by: Wido den Hollander <wido@widodh.nl>
14 years agotimer: rewrite mostly from scratch
Sage Weil [Fri, 12 Nov 2010 00:38:02 +0000 (16:38 -0800)]
timer: rewrite mostly from scratch

Just use the provided lock.  This _vastly_ reduces the complexity because
we don't have to worry about races between our thread trying to fire off a
timer that is being canceled.

The old Timer class isn't used anywhere anymore.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: hit inode created via CREATE
Sage Weil [Thu, 11 Nov 2010 23:31:42 +0000 (15:31 -0800)]
mds: hit inode created via CREATE

We missed this path!

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoMerge branch 'rc' into unstable
Sage Weil [Thu, 11 Nov 2010 22:28:18 +0000 (14:28 -0800)]
Merge branch 'rc' into unstable

Conflicts:
configure.ac
src/Makefile.am

14 years agov0.23 v0.23
Sage Weil [Thu, 11 Nov 2010 00:34:17 +0000 (16:34 -0800)]
v0.23

14 years agomds: fix null_snapflush with multiple intervening snaps
Sage Weil [Thu, 11 Nov 2010 04:58:49 +0000 (20:58 -0800)]
mds: fix null_snapflush with multiple intervening snaps

The client is allowed to not send a snapflush if there is no dirty metadata
to write for a given snap.  However, the mds can only look up inodes by
the last snapid in the interval.  So, when doing a null_snapflush (filling
in for snapflushes the client didn't send), we have to walk forward through
intervening snaps until we find the right inode.

Note that this means we will call _do_snap_update multiple times on the
same inode, but with different snapids.
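
The walk can be sketched roughly like this, assuming inodes are indexed
only by the last snapid of the interval they cover. SketchSnapInode and
find_by_walking_forward are hypothetical names, not MDS code.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>

// Hypothetical snapped inode covering the snapid interval [first, last].
struct SketchSnapInode {
  std::uint64_t first;
  std::uint64_t last;
};

// Lookup is only possible by exact `last` snapid, so starting at `snapid`
// we try each existing snap in increasing order until one hits an inode
// whose interval covers `snapid`.
const SketchSnapInode* find_by_walking_forward(
    const std::map<std::uint64_t, SketchSnapInode>& by_last,
    const std::set<std::uint64_t>& snaps,
    std::uint64_t snapid) {
  for (auto it = snaps.lower_bound(snapid); it != snaps.end(); ++it) {
    auto in = by_last.find(*it);  // exact-match lookup by last snapid
    if (in != by_last.end() && in->second.first <= snapid) {
      assert(in->second.last >= snapid);
      return &in->second;
    }
  }
  return nullptr;  // no inode covers this snapid
}
```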

Add unit test to check this.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoMerge remote branch 'origin/unfound' into unstable
Sage Weil [Thu, 11 Nov 2010 00:36:18 +0000 (16:36 -0800)]
Merge remote branch 'origin/unfound' into unstable

14 years agoosd: scrub least recently scrubbed pgs first; once a day
Sage Weil [Thu, 11 Nov 2010 00:31:26 +0000 (16:31 -0800)]
osd: scrub least recently scrubbed pgs first; once a day

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoosd: don't scrub something we just scrubbed
Sage Weil [Wed, 10 Nov 2010 23:43:37 +0000 (15:43 -0800)]
osd: don't scrub something we just scrubbed

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoosd: call sched_scrub on reserve reply
Sage Weil [Wed, 10 Nov 2010 23:33:31 +0000 (15:33 -0800)]
osd: call sched_scrub on reserve reply

Otherwise we have to wait until the next time it's called by the timer, and
during that period we have a reservation locally, and any other peers can't
reserve a scrub from us, and nobody makes any progress.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoosd: fix sched_scrub
Sage Weil [Wed, 10 Nov 2010 23:28:39 +0000 (15:28 -0800)]
osd: fix sched_scrub

Insert whoami into reserved set on primary, not 0!  Also more cleanup of
sched state helpers.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoosd: do scrub schedule state changes inside scrub()
Sage Weil [Wed, 10 Nov 2010 22:58:34 +0000 (14:58 -0800)]
osd: do scrub schedule state changes inside scrub()

Update these values under protection of pg lock iff we start scrubbing,
otherwise back out.

On scrub completion, unreserve replicas.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoosd: track last_scrubbed in PG::Info::History
Sage Weil [Wed, 10 Nov 2010 22:55:57 +0000 (14:55 -0800)]
osd: track last_scrubbed in PG::Info::History

Share with peers and write to disk on scrub completion.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoosd: scrub: change cancel behavior
Sage Weil [Wed, 10 Nov 2010 22:15:41 +0000 (14:15 -0800)]
osd: scrub: change cancel behavior

Use explicit flag, so that scrub_reserved always indicates whether the
osd count includes us or not.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agopg_state_string: use an ostringstream
Colin Patrick McCabe [Wed, 10 Nov 2010 22:43:26 +0000 (14:43 -0800)]
pg_state_string: use an ostringstream

Use an ostringstream for efficiency's sake.
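
A minimal sketch of building a '+'-joined state string with an
ostringstream. The flag names and bit values are made up for illustration
and do not match the real PG state bits.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical state bits; the real PG state flags differ.
enum : unsigned {
  SK_ACTIVE = 1u << 0,
  SK_CLEAN = 1u << 1,
  SK_DEGRADED = 1u << 2,
};

std::string sketch_state_string(unsigned state) {
  assert((state & ~(SK_ACTIVE | SK_CLEAN | SK_DEGRADED)) == 0);
  std::ostringstream oss;
  const char* sep = "";  // empty before the first name, "+" afterwards
  auto add = [&](unsigned bit, const char* name) {
    if (state & bit) {
      oss << sep << name;
      sep = "+";
    }
  };
  add(SK_ACTIVE, "active");
  add(SK_CLEAN, "clean");
  add(SK_DEGRADED, "degraded");
  return oss.str();
}
```

Writing the separator before each name, rather than after it, means the
result never ends in a stray '+'.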

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agovstart: stop logging to /tmp/foo
Sage Weil [Wed, 10 Nov 2010 21:49:22 +0000 (13:49 -0800)]
vstart: stop logging to /tmp/foo

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoosd: fix scrub reserved state when starting scrub
Sage Weil [Wed, 10 Nov 2010 21:39:51 +0000 (13:39 -0800)]
osd: fix scrub reserved state when starting scrub

Also document scrub scheduling/pending/active states.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agovstart: turn down msgr debugging
Sage Weil [Wed, 10 Nov 2010 21:16:34 +0000 (13:16 -0800)]
vstart: turn down msgr debugging

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomonc: cancel timer events with lock held
Sage Weil [Wed, 10 Nov 2010 21:13:38 +0000 (13:13 -0800)]
monc: cancel timer events with lock held

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoWake up clients waiting for now-found objects
Colin Patrick McCabe [Tue, 9 Nov 2010 06:15:14 +0000 (22:15 -0800)]
Wake up clients waiting for now-found objects

PG::search_for_missing: when we find a previously unfound object, check
to see if there is an entry in waiting_for_missing_object representing a
client waiting for this object.

PG::repair_object: assert that waiting_for_missing_object is empty
before messing with missing_loc. It definitely should be during a scrub.

ReplicatedPG role change logic: always take_object_waiters on the wait
queues when the PG acting set changes.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agotest_unfound.sh: test reading an unfound object.
Colin Patrick McCabe [Mon, 8 Nov 2010 20:30:26 +0000 (12:30 -0800)]
test_unfound.sh: test reading an unfound object.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agotest_unfound.sh: verify that we have unfound objs
Colin Patrick McCabe [Fri, 5 Nov 2010 00:28:39 +0000 (17:28 -0700)]
test_unfound.sh: verify that we have unfound objs

test_unfound.sh: verify that we have unfound objs.
Then, when we bring up the other OSD, verify that those unfound objects
are found (on that OSD).

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoAdd num_objects_unfound to struct pg_stat_t
Colin Patrick McCabe [Thu, 4 Nov 2010 21:40:22 +0000 (14:40 -0700)]
Add num_objects_unfound to struct pg_stat_t

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agotest_unfound.sh: shorter test
Colin Patrick McCabe [Wed, 3 Nov 2010 00:58:52 +0000 (17:58 -0700)]
test_unfound.sh: shorter test

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoPG::recover_master_log: rename a local variable
Colin Patrick McCabe [Wed, 3 Nov 2010 00:56:00 +0000 (17:56 -0700)]
PG::recover_master_log: rename a local variable

PG::recover_master_log: rename a local variable to avoid using the
overloaded term "missing".

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoOSD::_process_pg_info:search_for_missing sometimes
Colin Patrick McCabe [Wed, 3 Nov 2010 00:51:02 +0000 (17:51 -0700)]
OSD::_process_pg_info:search_for_missing sometimes

OSD::_process_pg_info: If we're the primary for this active PG, and we
have missing objects, call search_for_missing. This should ensure that
we know where to find our missing objects.

The reason why this wasn't there before is that previously, we kept the
PG from going active until all the missing objects were found.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoAdd PG::Missing::have_missing()
Colin Patrick McCabe [Wed, 3 Nov 2010 00:50:28 +0000 (17:50 -0700)]
Add PG::Missing::have_missing()

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoPG::search_for_missing: minor refactoring, comment
Colin Patrick McCabe [Wed, 3 Nov 2010 00:49:37 +0000 (17:49 -0700)]
PG::search_for_missing: minor refactoring, comment

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoPG::peer: don't block if objects are unfound
Colin Patrick McCabe [Wed, 3 Nov 2010 00:42:48 +0000 (17:42 -0700)]
PG::peer: don't block if objects are unfound

Erase the code in PG::peer that used to keep us from becoming active
when objects were still unfound. Print out the number of missing and
unfound objects at the end of PG::peer.

Erase PG::check_for_lost_objects and PG::forget_lost_objects.

14 years agoPG::peer: count/find cleanup
Colin Patrick McCabe [Wed, 3 Nov 2010 00:38:16 +0000 (17:38 -0700)]
PG::peer: count/find cleanup

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoPG.h erase deadcode
Colin Patrick McCabe [Wed, 3 Nov 2010 00:36:11 +0000 (17:36 -0700)]
PG.h erase deadcode

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoPG: nomenclature change: talk about unfound objs
Colin Patrick McCabe [Wed, 3 Nov 2010 00:34:07 +0000 (17:34 -0700)]
PG: nomenclature change: talk about unfound objs

Describe objects as "unfound" when we don't know what OSD has them.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoPG: move ostream operator to .cpp file
Colin Patrick McCabe [Wed, 3 Nov 2010 00:31:20 +0000 (17:31 -0700)]
PG: move ostream operator to .cpp file

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agomds: fix inode->frag rstat projected with snaps
Sage Weil [Wed, 10 Nov 2010 17:43:56 +0000 (09:43 -0800)]
mds: fix inode->frag rstat projected with snaps

The snapid 'first' value needs to be >= inode->first; move that into
the helper.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoosdmap: break up asserts for easier debugging
Sage Weil [Wed, 10 Nov 2010 17:04:31 +0000 (09:04 -0800)]
osdmap: break up asserts for easier debugging

If we fail one of these it's helpful to know which one.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoobjecter: throttle before looking at lock protected state
Sage Weil [Wed, 10 Nov 2010 17:03:37 +0000 (09:03 -0800)]
objecter: throttle before looking at lock protected state

The take_op_budget() may drop our lock if we are in keep_balanced_budget
mode, so we need to do that _before_ we take references to internal state
that may change out from under us during that time.
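
The ordering can be sketched as below. SketchObjecter, its osd_up map and
the while_waiting hook are hypothetical; only the order of operations
(take the budget first, read protected state afterwards) reflects the
description above.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <mutex>

struct SketchObjecter {
  std::mutex lock;
  std::map<int, bool> osd_up;           // lock-protected state
  std::function<void()> while_waiting;  // hypothetical hook: runs unlocked

  // May drop and retake the lock while waiting for budget.
  void take_op_budget(std::unique_lock<std::mutex>& l) {
    assert(l.owns_lock());
    l.unlock();
    if (while_waiting)
      while_waiting();  // state may change out from under us here
    l.lock();
  }

  // Returns true if the op can be sent to `osd`.
  bool op_submit(int osd) {
    std::unique_lock<std::mutex> l(lock);
    take_op_budget(l);           // throttle first ...
    auto it = osd_up.find(osd);  // ... then look at protected state
    return it != osd_up.end() && it->second;
  }
};
```

Had the lookup happened before take_op_budget(), the result could be stale
by the time it was used.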

This fixes a crash like

./osd/OSDMap.h: In function 'entity_inst_t OSDMap::get_inst(int)':
./osd/OSDMap.h:460: FAILED assert(exists(osd) && is_up(osd))
ceph version 0.22.1 (commit:c6f403a6f441184956e00659ce713eaee7014279)
1: (Objecter::op_submit(Objecter::Op*)+0x6c2) [0x38658854c2]
2: /usr/lib64/librados.so.1() [0x3865855dc9]
3: (RadosClient::aio_write(RadosClient::PoolCtx&, object_t, long,
ceph::buffer::list const&, unsigned long,
RadosClient::AioCompletion*)+0x24b) [0x386585724b]
4: (rados_aio_write()+0x9a) [0x386585741a]
5: /usr/bin/qemu-kvm() [0x45a305]
6: /usr/bin/qemu-kvm() [0x45a430]
7: /usr/bin/qemu-kvm() [0x43bb73]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.
terminate called after throwing an instance of 'ceph::FailedAssertion'
*** Caught signal (ABRT) ***
ceph version 0.22.1 (commit:c6f403a6f441184956e00659ce713eaee7014279)
1: (sigabrt_handler(int)+0x91) [0x3865922b91]
2: /lib64/libc.so.6() [0x3c0c032a30]
3: (gsignal()+0x35) [0x3c0c0329b5]
4: (abort()+0x175) [0x3c0c034195]
5: (__gnu_cxx::__verbose_terminate_handler()+0x12d) [0x3c110beaad]

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomon: drop unnecessary state checks
Sage Weil [Wed, 10 Nov 2010 16:50:25 +0000 (08:50 -0800)]
mon: drop unnecessary state checks

We want to ignore all beacons from the mds regardless of what state they
are in.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agodebian: don't explicitly depend on libgoogle-perftools0
Sage Weil [Wed, 10 Nov 2010 16:45:36 +0000 (08:45 -0800)]
debian: don't explicitly depend on libgoogle-perftools0

dpkg-buildpackage will autodetect the dependency.  Except on lenny, where
it doesn't exist and we don't use it!

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: Enable --journal_check mode.
Greg Farnum [Wed, 10 Nov 2010 16:11:23 +0000 (08:11 -0800)]
mds: Enable --journal_check mode.

This replaces the old --shadow option, which didn't work.
It starts up the MDS daemon, then replays the journal for
another MDS, and then shuts down.

Also minimally modifies the MDSMonitor to enable this
behavior, since it requires shared state.

14 years agoosdc: Fix bad assert in ~ObjectCacher.
Greg Farnum [Tue, 9 Nov 2010 18:48:00 +0000 (10:48 -0800)]
osdc: Fix bad assert in ~ObjectCacher.

The objects data member is never empty on shutdown since it now consists
of a vector of pools. Instead, check each pool map for emptiness.
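
A sketch of the corrected check, assuming the objects member is a vector
with one map per pool. PoolObjects and all_pools_empty are hypothetical
names, and the map's value type is a placeholder.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical per-pool object map; the value type is a placeholder.
using PoolObjects = std::map<std::string, int>;

// The vector itself is non-empty once pools exist, so test each pool map.
bool all_pools_empty(const std::vector<PoolObjects>& objects) {
  return std::all_of(objects.begin(), objects.end(),
                     [](const PoolObjects& pool) { return pool.empty(); });
}
```

A destructor would then assert on all_pools_empty(objects) rather than on
objects.empty().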

14 years agouclient: only update inode if version increased
Sage Weil [Wed, 10 Nov 2010 15:42:29 +0000 (07:42 -0800)]
uclient: only update inode if version increased

This realigns the code with the kernel version, fixing a number of
problems when you have multiple MDSs returning info on the same inode.
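
The rule can be sketched as a simple version guard. SketchInode and
update_inode are hypothetical, and only one field (size) is shown.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical cached inode with a version stamp.
struct SketchInode {
  std::uint64_t version = 0;
  std::uint64_t size = 0;
};

// Apply the update only if it is strictly newer than what is cached, so
// stale info from another MDS cannot overwrite fresher state.
// Returns true if the inode was updated.
bool update_inode(SketchInode& in, std::uint64_t version, std::uint64_t size) {
  if (version <= in.version)
    return false;
  in.version = version;
  in.size = size;
  assert(in.version == version);
  return true;
}
```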

Signed-off-by: Sage Weil <sage@newdream.net>