git.apps.os.sepia.ceph.com Git - ceph.git/log
ceph.git
14 years agomds: set dir hash on root inode
Sage Weil [Tue, 16 Nov 2010 17:47:34 +0000 (09:47 -0800)]
mds: set dir hash on root inode

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: set mode before all the file type dependent inode initialization!
Sage Weil [Tue, 16 Nov 2010 17:42:51 +0000 (09:42 -0800)]
mds: set mode before all the file type dependent inode initialization!

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: add DIRLAYOUTHASH feature bit
Sage Weil [Tue, 26 Oct 2010 03:41:24 +0000 (20:41 -0700)]
mds: add DIRLAYOUTHASH feature bit

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: make dentry hash a dir layout property
Sage Weil [Mon, 25 Oct 2010 23:46:01 +0000 (16:46 -0700)]
mds: make dentry hash a dir layout property

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoMerge remote branch 'origin/unfound_last_epoch_clean' into unstable
Sage Weil [Mon, 15 Nov 2010 22:59:46 +0000 (14:59 -0800)]
Merge remote branch 'origin/unfound_last_epoch_clean' into unstable

14 years agoAdd ./ceph osd tell <osd-num> dump_missing <out>
Colin Patrick McCabe [Mon, 15 Nov 2010 22:47:44 +0000 (14:47 -0800)]
Add ./ceph osd tell <osd-num> dump_missing <out>

Add a command that tells the OSD to dump its missing set for all PGs to
a file. This should be useful for debugging multi-OSD scenarios.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agosearch_for_missing:recalc stats if unfound changed
Colin Patrick McCabe [Mon, 15 Nov 2010 22:38:36 +0000 (14:38 -0800)]
search_for_missing:recalc stats if unfound changed

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agotimer: make init/shutdown explicit
Sage Weil [Mon, 15 Nov 2010 21:23:42 +0000 (13:23 -0800)]
timer: make init/shutdown explicit

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agotest_unfound.sh: start recovery at end of test
Colin Patrick McCabe [Mon, 15 Nov 2010 20:39:56 +0000 (12:39 -0800)]
test_unfound.sh: start recovery at end of test

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agotest_common.sh: add dump_osd_store
Colin Patrick McCabe [Mon, 15 Nov 2010 20:31:43 +0000 (12:31 -0800)]
test_common.sh: add dump_osd_store

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agotest_unfound.sh: fix return codes
Colin Patrick McCabe [Mon, 15 Nov 2010 19:27:44 +0000 (11:27 -0800)]
test_unfound.sh: fix return codes

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agostray_test:don't use up/down. timeout extension
Colin Patrick McCabe [Mon, 15 Nov 2010 19:05:47 +0000 (11:05 -0800)]
stray_test:don't use up/down. timeout extension

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoosd: add discover_all_missing
Colin Patrick McCabe [Sun, 14 Nov 2010 22:50:30 +0000 (14:50 -0800)]
osd: add discover_all_missing

Add discover_all_missing. This function makes sure that we have messages
en route to any OSD that we think might have information that could help
us discover where our unfound objects lie.

We call this function:
* In do_peer, right after activating the PG
* In _process_pg_info, when we're the primary of an active PG
* From handle_pg_notify, when we're the primary of an active PG

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoFix bugs in search_for_missing, _process_pg_info
Colin Patrick McCabe [Sun, 14 Nov 2010 21:54:55 +0000 (13:54 -0800)]
Fix bugs in search_for_missing, _process_pg_info

PG::search_for_missing: fix a bug with the handling of MSG_OSD_PG_INFO
messages. Formerly, when processing these messages, we erroneously
assumed that there was nothing missing on the peer at all even in cases
where there were missing objects.

PG::merge_log: drop unused Missing parameter.

_process_pg_info: Don't assume that just because we requested a Log
message at some point, that that is the message we're processing.
Correctly handle cases where we didn't get the peer's Missing set or
Log.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoAdd stray_test to test_unfound.sh
Colin Patrick McCabe [Sat, 13 Nov 2010 07:44:50 +0000 (23:44 -0800)]
Add stray_test to test_unfound.sh

This test is designed to produce a stray that nonetheless has some
useful objects. The primary should be able to find those objects and ask
for them.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoPG::finish_recovery: set info.last_epoch_clean
Colin Patrick McCabe [Sun, 14 Nov 2010 21:32:51 +0000 (13:32 -0800)]
PG::finish_recovery: set info.last_epoch_clean

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoAdd MOSDPGMissing
Colin Patrick McCabe [Fri, 12 Nov 2010 22:55:40 +0000 (14:55 -0800)]
Add MOSDPGMissing

Add MOSDPGMissing, a message which just contains the missing objects
information for a PG. We will request messages like this one in order to
locate all of our unfound objects.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoosd: add incompat feature LEC for last_epoch_clean
Colin Patrick McCabe [Thu, 11 Nov 2010 01:12:49 +0000 (17:12 -0800)]
osd: add incompat feature LEC for last_epoch_clean

So an old binary will fail to mount a store with new Info encoding.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoosd: add last_epoch_clean to PG::Info
Sage Weil [Tue, 19 Oct 2010 18:40:45 +0000 (11:40 -0700)]
osd: add last_epoch_clean to PG::Info

This changes the encoding in a non-backwards compatible way.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agotest_common.sh: remove messenger debug for now
Colin Patrick McCabe [Sun, 14 Nov 2010 19:40:33 +0000 (11:40 -0800)]
test_common.sh: remove messenger debug for now

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoosd: skip unfound in recover_replicas
Sage Weil [Mon, 15 Nov 2010 20:06:09 +0000 (12:06 -0800)]
osd: skip unfound in recover_replicas

This is currently moot, since we don't start recovering replicas until
the primary is complete.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoosd: skip unfound objects in recover_primary()
Sage Weil [Mon, 15 Nov 2010 20:04:57 +0000 (12:04 -0800)]
osd: skip unfound objects in recover_primary()

We also need to make sure we come back later when they are found.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoosdmap: make printing a bit easier to read
Sage Weil [Mon, 15 Nov 2010 19:57:57 +0000 (11:57 -0800)]
osdmap: make printing a bit easier to read

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoobjecter: don't dereference null op->outbl
Sage Weil [Mon, 15 Nov 2010 19:50:53 +0000 (11:50 -0800)]
objecter: don't dereference null op->outbl

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoMerge remote branch 'origin/testing' into unstable
Sage Weil [Mon, 15 Nov 2010 19:25:58 +0000 (11:25 -0800)]
Merge remote branch 'origin/testing' into unstable

14 years agoosd: unreg scrub when removing pg
Sage Weil [Mon, 15 Nov 2010 19:25:49 +0000 (11:25 -0800)]
osd: unreg scrub when removing pg

This fixes this crash:

    osd/OSD.cc: In function 'PG* OSD::_lookup_lock_pg(pg_t)':
    osd/OSD.cc:956: FAILED assert(pg_map.count(pgid))
    ceph version 0.24~rc (7201497f2feef6a2bbd0baf89e3a14b8a880e79f)
    1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x34) [0x81ee4e]
    2: (OSD::_lookup_lock_pg(pg_t)+0x6f) [0x6d3d57]
    3: (OSD::sched_scrub()+0x2e9) [0x6e4445]
    4: (OSD::tick()+0x204) [0x6f168e]
    5: (OSD::C_Tick::finish(int)+0x1c) [0x7613bc]
    6: (SafeTimer::timer_thread()+0x189) [0x81bcf5]
    7: (SafeTimerThread::entry()+0x19) [0x81dd73]
    8: (Thread::_entry_func(void*)+0x20) [0x66496a]
    9: (()+0x68ba) [0x7fb807d118ba]
    10: (clone()+0x6d) [0x7fb806a7002d]
    NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Reported-by: Colin McCabe <colinm@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoMerge branch 'msgr_zerocopy_read' into unstable
Sage Weil [Mon, 15 Nov 2010 04:48:43 +0000 (20:48 -0800)]
Merge branch 'msgr_zerocopy_read' into unstable

14 years agolibrados: pass provided buffer to objecter on rados_read
Sage Weil [Mon, 15 Nov 2010 04:29:40 +0000 (20:29 -0800)]
librados: pass provided buffer to objecter on rados_read

This allows us to avoid the data copy if the objecter and msgr manage
to use it.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoobjecter: post rx buffer to msgr if target bufferlist is present
Sage Weil [Mon, 15 Nov 2010 04:28:44 +0000 (20:28 -0800)]
objecter: post rx buffer to msgr if target bufferlist is present

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomsgr: use provided rx buffer if present
Sage Weil [Mon, 15 Nov 2010 04:26:52 +0000 (20:26 -0800)]
msgr: use provided rx buffer if present

This changes the read path so that we hold the Connection::lock mutex while
reading data off the socket.  This ensures that we are reading into a
buffer we are allowed to use, and allows users to revoke a previously
posted buffer.  If that happens, switch over to a newly allocated buffer.

Note that currently the final result bufferlist may contain part of the
provided buffer and part of a newly allocated buffer.  This is okay as long
as we will always read the same data into the buffer.  And in practice, if
the rx buffer is revoked then the message itself will be thrown out anyway.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomsgr: add Connection rx buffer interface
Sage Weil [Mon, 15 Nov 2010 04:23:52 +0000 (20:23 -0800)]
msgr: add Connection rx buffer interface

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomsgr: implement get_connection()
Sage Weil [Mon, 15 Nov 2010 04:23:10 +0000 (20:23 -0800)]
msgr: implement get_connection()

Get a Connection* for the given destination.  This mirrors submit_message,
but does not actually queue a message.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agobuffer: implement list::iterator::get_current_ptr()
Sage Weil [Mon, 15 Nov 2010 04:21:05 +0000 (20:21 -0800)]
buffer: implement list::iterator::get_current_ptr()

Return a buffer::ptr for the ptr at the current position/offset, with the
length set to the remaining space in the current buffer.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoObjecter::shutdown: shut down timer.
Colin Patrick McCabe [Sun, 14 Nov 2010 19:29:29 +0000 (11:29 -0800)]
Objecter::shutdown: shut down timer.

We have to explicitly shut down the timer in Objecter::shutdown.
Otherwise, we are relying on the destructor of SafeTimer to do it.
Unfortunately, that destructor gets called after the mutex the timer is
using has already been destroyed.
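
The ordering problem described above can be sketched as follows. This is a toy model only: ToyTimer and ToyClient are hypothetical names, not the real SafeTimer or Objecter interfaces, and the timer here has no thread. It only shows the timer being stopped explicitly while the borrowed mutex is known to be alive.

```cpp
#include <mutex>

// Toy timer that borrows its owner's lock (hypothetical; not SafeTimer).
class ToyTimer {
  std::mutex& lock_;
  bool stopped_ = false;
public:
  explicit ToyTimer(std::mutex& l) : lock_(l) {}

  // Must run while the borrowed mutex still exists.
  void shutdown() {
    std::lock_guard<std::mutex> g(lock_);
    stopped_ = true;
  }
  bool stopped() const { return stopped_; }
};

// Toy owner of both the lock and the timer (hypothetical; not Objecter).
class ToyClient {
  std::mutex lock_;
  ToyTimer timer_{lock_};
public:
  // Explicit teardown: stop the timer here, instead of relying on the
  // order in which destructors happen to run later.
  void shutdown() { timer_.shutdown(); }
  bool timer_stopped() const { return timer_.stopped(); }
};
```

Calling shutdown() on the owner before it is destroyed means the timer never touches the lock during destruction.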

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoMerge remote branch 'origin/msgr' into testing
Sage Weil [Sat, 13 Nov 2010 04:43:30 +0000 (20:43 -0800)]
Merge remote branch 'origin/msgr' into testing

14 years agodebug: don't print thread id twice
Sage Weil [Sat, 13 Nov 2010 00:00:12 +0000 (16:00 -0800)]
debug: don't print thread id twice

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomsgr: cleanup: make queue_received non-inline; some helpful debug
Sage Weil [Fri, 12 Nov 2010 23:59:50 +0000 (15:59 -0800)]
msgr: cleanup: make queue_received non-inline; some helpful debug

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomsgr: do not clear halt_delivery
Sage Weil [Fri, 12 Nov 2010 23:56:54 +0000 (15:56 -0800)]
msgr: do not clear halt_delivery

We need to keep the halt_delivery plug set on failure/shutdown in order to
prevent a racing reader from queuing new messages.  The only time we clear
it is when we discard messages due to a session reset.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomsgr: close enqueue/discard race
Sage Weil [Fri, 12 Nov 2010 22:41:53 +0000 (14:41 -0800)]
msgr: close enqueue/discard race

We need to re-check halt_delivery after dropping and retaking pipe_lock.
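
A minimal sketch of the re-check pattern, assuming a simplified pipe with a single lock. ToyPipe, enqueue and unlocked_work are hypothetical names for illustration; the real messenger code is structured differently.

```cpp
#include <functional>
#include <mutex>
#include <queue>

// Hypothetical, simplified pipe; not the real Pipe class.
struct ToyPipe {
  std::mutex pipe_lock;
  bool halt_delivery = false;
  std::queue<int> in_q;

  // The caller passes in a unique_lock that currently holds pipe_lock.
  // unlocked_work stands in for whatever has to run with the lock dropped.
  bool enqueue(std::unique_lock<std::mutex>& l, int msg,
               const std::function<void()>& unlocked_work) {
    if (halt_delivery)
      return false;
    l.unlock();
    unlocked_work();   // another thread may set halt_delivery in this window
    l.lock();
    if (halt_delivery) // re-check after retaking the lock
      return false;
    in_q.push(msg);
    return true;
  }
};
```

Without the second check, a message could be queued after a concurrent discard had already decided the queue should stay empty.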

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomsgr: protect pipe queue_item map with pipe_lock AND dispatch_queue lock
Sage Weil [Fri, 12 Nov 2010 22:05:56 +0000 (14:05 -0800)]
msgr: protect pipe queue_item map with pipe_lock AND dispatch_queue lock

Close a few different races here.

Also, assert that queue_items are not queued in ~Pipe().

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomsgr: add 'ms inject socket failures = foo'
Sage Weil [Fri, 12 Nov 2010 21:53:49 +0000 (13:53 -0800)]
msgr: add 'ms inject socket failures = foo'

Where we fail roughly every foo'th socket operation.
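
One way to read "roughly every foo'th" is a random failure with probability 1/foo per operation. The sketch below assumes that reading; should_inject_failure is a hypothetical helper, not the messenger's implementation, and treating 0 as "disabled" is also an assumption.

```cpp
#include <random>

// Hypothetical helper: returns true about once per `every` calls.
// every == 0 is treated as "injection disabled" (an assumption).
bool should_inject_failure(unsigned every, std::mt19937& rng) {
  if (every == 0)
    return false;
  std::uniform_int_distribution<unsigned> dist(0, every - 1);
  return dist(rng) == 0;
}
```

A caller would consult this before each socket operation and simulate an error when it returns true.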

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomsgr: only close socket on reconnect or shutdown
Sage Weil [Fri, 12 Nov 2010 21:09:24 +0000 (13:09 -0800)]
msgr: only close socket on reconnect or shutdown

We can't modify 'sd' or (more importantly) close sd while any other thread
might be using it, or else we might race with an open and they might end
up using someone else's fd.

Take care to _only_ close(sd) in connect(), when the reader thread is
stopped, or when reaping the connection.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomsgr: protect pipe queuing with _both_ pipe and dispatch_queue locks
Sage Weil [Fri, 12 Nov 2010 21:41:14 +0000 (13:41 -0800)]
msgr: protect pipe queuing with _both_ pipe and dispatch_queue locks

We want to make sure the pipe's queue item doesn't go away.

Also, make queue_received() require pipe_lock to be held.  This avoids some
useless unlocking/locking, since (in the case where the pipe is already
queued) we then don't need to drop the pipe_lock at all.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoTestTimers: don't test (nonexistent) Timer
Colin Patrick McCabe [Fri, 12 Nov 2010 22:49:25 +0000 (14:49 -0800)]
TestTimers: don't test (nonexistent) Timer

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoRename PG::peer to PG::do_peer
Colin Patrick McCabe [Fri, 12 Nov 2010 22:45:36 +0000 (14:45 -0800)]
Rename PG::peer to PG::do_peer

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoMerge branch 'testing' into unstable
Sage Weil [Fri, 12 Nov 2010 15:59:05 +0000 (07:59 -0800)]
Merge branch 'testing' into unstable

14 years agouclient: insert lssnap results under snapdir, not live dir
Sage Weil [Fri, 12 Nov 2010 15:55:41 +0000 (07:55 -0800)]
uclient: insert lssnap results under snapdir, not live dir

Put the readdir results (list of snapshots) in the right place in the
hierarchy; we were putting them in the parent dir (as if they were real
directories).

This bug manifested itself as a snaptest-2.sh failure.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomsg: fix buffer size for IPv6 address parsing
Wido den Hollander [Fri, 12 Nov 2010 15:36:00 +0000 (07:36 -0800)]
msg: fix buffer size for IPv6 address parsing

Signed-off-by: Wido den Hollander <wido@widodh.nl>
14 years agotimer: rewrite mostly from scratch
Sage Weil [Fri, 12 Nov 2010 00:38:02 +0000 (16:38 -0800)]
timer: rewrite mostly from scratch

Just use the provided lock.  This _vastly_ reduces the complexity because
we don't have to worry about races between our thread trying to fire off a
timer that is being canceled.

The old Timer class isn't used anywhere anymore.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: hit inode created via CREATE
Sage Weil [Thu, 11 Nov 2010 23:31:42 +0000 (15:31 -0800)]
mds: hit inode created via CREATE

We missed this path!

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoMerge branch 'rc' into unstable
Sage Weil [Thu, 11 Nov 2010 22:28:18 +0000 (14:28 -0800)]
Merge branch 'rc' into unstable

Conflicts:
configure.ac
src/Makefile.am

14 years agov0.23 v0.23
Sage Weil [Thu, 11 Nov 2010 00:34:17 +0000 (16:34 -0800)]
v0.23

14 years agomds: fix null_snapflush with multiple intervening snaps
Sage Weil [Thu, 11 Nov 2010 04:58:49 +0000 (20:58 -0800)]
mds: fix null_snapflush with multiple intervening snaps

The client is allowed to not send a snapflush if there is no dirty metadata
to write for a given snap.  However, the mds can only look up inodes by
the last snapid in the interval.  So, when doing a null_snapflush (filling
in for snapflushes the client didn't send), we have to walk forward through
intervening snaps until we find the right inode.

Note that this means we will call _do_snap_update multiple times on the
same inode, but with different snapids.
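
The forward walk can be illustrated with a small model, assuming inodes are stored in a map keyed by the last snapid of the interval they cover. The types and the name find_snapped_inode are hypothetical and are not taken from the MDS code.

```cpp
#include <map>
#include <set>
#include <string>

using SnapId = unsigned long;

// Hypothetical model: by_last maps the last snapid of an interval to the
// inode covering it; snaps is the set of existing snapids. Starting at
// `start`, step forward through the snapids until one is a key in by_last.
// Returns nullptr if no inode is found.
const std::string* find_snapped_inode(
    const std::map<SnapId, std::string>& by_last,
    const std::set<SnapId>& snaps, SnapId start) {
  for (auto it = snaps.lower_bound(start); it != snaps.end(); ++it) {
    auto hit = by_last.find(*it);
    if (hit != by_last.end())
      return &hit->second;
  }
  return nullptr;
}
```

With snaps 3, 4 and 5 and a single inode keyed by 5, a lookup starting at 3 misses on 3 and 4 and then finds the inode at 5.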

Add unit test to check this.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoMerge remote branch 'origin/unfound' into unstable
Sage Weil [Thu, 11 Nov 2010 00:36:18 +0000 (16:36 -0800)]
Merge remote branch 'origin/unfound' into unstable

14 years agoosd: scrub least recently scrubbed pgs first; once a day
Sage Weil [Thu, 11 Nov 2010 00:31:26 +0000 (16:31 -0800)]
osd: scrub least recently scrubbed pgs first; once a day

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoosd: don't scrub something we just scrubbed
Sage Weil [Wed, 10 Nov 2010 23:43:37 +0000 (15:43 -0800)]
osd: don't scrub something we just scrubbed

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoosd: call sched_scrub on reserve reply
Sage Weil [Wed, 10 Nov 2010 23:33:31 +0000 (15:33 -0800)]
osd: call sched_scrub on reserve reply

Otherwise we have to wait until the next time it's called by the timer, and
during that period we have a reservation locally, and any other peers can't
reserve a scrub from us, and nobody makes any progress.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoosd: fix sched_scrub
Sage Weil [Wed, 10 Nov 2010 23:28:39 +0000 (15:28 -0800)]
osd: fix sched_scrub

Insert whoami into reserved set on primary, not 0!  Also more cleanup of
sched state helpers.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoosd: do scrub schedule state changes inside scrub()
Sage Weil [Wed, 10 Nov 2010 22:58:34 +0000 (14:58 -0800)]
osd: do scrub schedule state changes inside scrub()

Update these values under protection of pg lock iff we start scrubbing,
otherwise back out.

On scrub completion, unreserve replicas.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoosd: track last_scrubbed in PG::Info::History
Sage Weil [Wed, 10 Nov 2010 22:55:57 +0000 (14:55 -0800)]
osd: track last_scrubbed in PG::Info::History

Share with peers and write to disk on scrub completion.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoosd: scrub: change cancel behavior
Sage Weil [Wed, 10 Nov 2010 22:15:41 +0000 (14:15 -0800)]
osd: scrub: change cancel behavior

Use explicit flag, so that scrub_reserved always indicates whether the
osd count includes us or not.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agopg_state_string: use an ostringstream
Colin Patrick McCabe [Wed, 10 Nov 2010 22:43:26 +0000 (14:43 -0800)]
pg_state_string: use an ostringstream

Use an ostringstream for efficiency's sake.
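
A sketch of the approach, with made-up flag values: the ST_* constants, the "+" separator and the name state_string are assumptions for illustration, not Ceph's actual PG state definitions.

```cpp
#include <sstream>
#include <string>

// Hypothetical state bits for illustration only.
enum : unsigned {
  ST_ACTIVE = 1u << 0,
  ST_CLEAN = 1u << 1,
  ST_SCRUBBING = 1u << 2
};

// Build the state string in a single stream instead of repeatedly
// concatenating temporary strings.
std::string state_string(unsigned state) {
  std::ostringstream oss;
  const char* sep = "";
  auto add = [&](unsigned bit, const char* name) {
    if (state & bit) {
      oss << sep << name;
      sep = "+";
    }
  };
  add(ST_ACTIVE, "active");
  add(ST_CLEAN, "clean");
  add(ST_SCRUBBING, "scrubbing");
  return oss.str();
}
```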

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agovstart: stop logging to /tmp/foo
Sage Weil [Wed, 10 Nov 2010 21:49:22 +0000 (13:49 -0800)]
vstart: stop logging to /tmp/foo

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoosd: fix scrub reserved state when starting scrub
Sage Weil [Wed, 10 Nov 2010 21:39:51 +0000 (13:39 -0800)]
osd: fix scrub reserved state when starting scrub

Also document scrub scheduling/pending/active states.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agovstart: turn down msgr debugging
Sage Weil [Wed, 10 Nov 2010 21:16:34 +0000 (13:16 -0800)]
vstart: turn down msgr debugging

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomonc: cancel timer events with lock held
Sage Weil [Wed, 10 Nov 2010 21:13:38 +0000 (13:13 -0800)]
monc: cancel timer events with lock held

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoWake up clients waiting for now-found objects
Colin Patrick McCabe [Tue, 9 Nov 2010 06:15:14 +0000 (22:15 -0800)]
Wake up clients waiting for now-found objects

PG::search_for_missing: when we find a previously unfound object, check
to see if there is an entry in waiting_for_missing_object representing a
client waiting for this object.

PG::repair_object: assert that waiting_for_missing_object is empty
before messing with missing_loc. It definitely should be during a scrub.

ReplicatedPG role change logic: always take_object_waiters on the wait
queues when the PG acting set changes.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agotest_unfound.sh: test reading an unfound object.
Colin Patrick McCabe [Mon, 8 Nov 2010 20:30:26 +0000 (12:30 -0800)]
test_unfound.sh: test reading an unfound object.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agotest_unfound.sh: verify that we have unfound objs
Colin Patrick McCabe [Fri, 5 Nov 2010 00:28:39 +0000 (17:28 -0700)]
test_unfound.sh: verify that we have unfound objs

test_unfound.sh: verify that we have unfound objs.
Then, when we bring up the other OSD, verify that those unfound objects
are found (on that OSD).

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoAdd num_objects_unfound to struct pg_stat_t
Colin Patrick McCabe [Thu, 4 Nov 2010 21:40:22 +0000 (14:40 -0700)]
Add num_objects_unfound to struct pg_stat_t

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agotest_unfound.sh: shorter test
Colin Patrick McCabe [Wed, 3 Nov 2010 00:58:52 +0000 (17:58 -0700)]
test_unfound.sh: shorter test

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoPG::recover_master_log: rename a local variable
Colin Patrick McCabe [Wed, 3 Nov 2010 00:56:00 +0000 (17:56 -0700)]
PG::recover_master_log: rename a local variable

PG::recover_master_log: rename a local variable to avoid using the
overloaded term "missing".

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoOSD::_process_pg_info:search_for_missing sometimes
Colin Patrick McCabe [Wed, 3 Nov 2010 00:51:02 +0000 (17:51 -0700)]
OSD::_process_pg_info:search_for_missing sometimes

OSD::_process_pg_info: If we're the primary for this active PG, and we
have missing objects, call search_for_missing. This should ensure that
we know where to find our missing objects.

The reason why this wasn't there before is that previously, we kept the
PG from going active until all the missing objects were found.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoAdd PG::Missing::have_missing()
Colin Patrick McCabe [Wed, 3 Nov 2010 00:50:28 +0000 (17:50 -0700)]
Add PG::Missing::have_missing()

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoPG::search_for_missing: minor refactoring, comment
Colin Patrick McCabe [Wed, 3 Nov 2010 00:49:37 +0000 (17:49 -0700)]
PG::search_for_missing: minor refactoring, comment

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoPG::peer: don't block if objects are unfound
Colin Patrick McCabe [Wed, 3 Nov 2010 00:42:48 +0000 (17:42 -0700)]
PG::peer: don't block if objects are unfound

Erase the code in PG::peer that used to keep us from becoming active
when objects were still unfound. Print out the number of missing and
unfound objects at the end of PG::peer.

Erase PG::check_for_lost_objects and PG::forget_lost_objects.

14 years agoPG::peer: count/find cleanup
Colin Patrick McCabe [Wed, 3 Nov 2010 00:38:16 +0000 (17:38 -0700)]
PG::peer: count/find cleanup

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoPG.h erase deadcode
Colin Patrick McCabe [Wed, 3 Nov 2010 00:36:11 +0000 (17:36 -0700)]
PG.h erase deadcode

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoPG: nomenclature change: talk about unfound objs
Colin Patrick McCabe [Wed, 3 Nov 2010 00:34:07 +0000 (17:34 -0700)]
PG: nomenclature change: talk about unfound objs

Describe objects as "unfound" when we don't know what OSD has them.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoPG: move ostream operator to .cpp file
Colin Patrick McCabe [Wed, 3 Nov 2010 00:31:20 +0000 (17:31 -0700)]
PG: move ostream operator to .cpp file

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agomds: fix inode->frag rstat projected with snaps
Sage Weil [Wed, 10 Nov 2010 17:43:56 +0000 (09:43 -0800)]
mds: fix inode->frag rstat projected with snaps

The snapid 'first' value needs to be >= inode->first; move that into
the helper.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoosdmap: break up asserts for easier debugging
Sage Weil [Wed, 10 Nov 2010 17:04:31 +0000 (09:04 -0800)]
osdmap: break up asserts for easier debugging

If we fail one of these it's helpful to know which one.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoobjecter: throttle before looking at lock protected state
Sage Weil [Wed, 10 Nov 2010 17:03:37 +0000 (09:03 -0800)]
objecter: throttle before looking at lock protected state

The take_op_budget() may drop our lock if we are in keep_balanced_budget
mode, so we need to do that _before_ we take references to internal state
that may change out from under us during that time.

This fixes a crash like

./osd/OSDMap.h: In function 'entity_inst_t OSDMap::get_inst(int)':
./osd/OSDMap.h:460: FAILED assert(exists(osd) && is_up(osd))
ceph version 0.22.1 (commit:c6f403a6f441184956e00659ce713eaee7014279)
1: (Objecter::op_submit(Objecter::Op*)+0x6c2) [0x38658854c2]
2: /usr/lib64/librados.so.1() [0x3865855dc9]
3: (RadosClient::aio_write(RadosClient::PoolCtx&, object_t, long,
ceph::buffer::list const&, unsigned long,
RadosClient::AioCompletion*)+0x24b) [0x386585724b]
4: (rados_aio_write()+0x9a) [0x386585741a]
5: /usr/bin/qemu-kvm() [0x45a305]
6: /usr/bin/qemu-kvm() [0x45a430]
7: /usr/bin/qemu-kvm() [0x43bb73]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.
terminate called after throwing an instance of 'ceph::FailedAssertion'
*** Caught signal (ABRT) ***
ceph version 0.22.1 (commit:c6f403a6f441184956e00659ce713eaee7014279)
1: (sigabrt_handler(int)+0x91) [0x3865922b91]
2: /lib64/libc.so.6() [0x3c0c032a30]
3: (gsignal()+0x35) [0x3c0c0329b5]
4: (abort()+0x175) [0x3c0c034195]
5: (__gnu_cxx::__verbose_terminate_handler()+0x12d) [0x3c110beaad]

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomon: drop unnecessary state checks
Sage Weil [Wed, 10 Nov 2010 16:50:25 +0000 (08:50 -0800)]
mon: drop unnecessary state checks

We want to ignore all beacons from the mds regardless of what state they
are in.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agodebian: don't explicitly depend on libgoogle-perftools0
Sage Weil [Wed, 10 Nov 2010 16:45:36 +0000 (08:45 -0800)]
debian: don't explicitly depend on libgoogle-perftools0

dpkg-buildpackage will autodetect the dependency.  Except on lenny, where
it doesn't exist and we don't use it!

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: Enable --journal_check mode.
Greg Farnum [Wed, 10 Nov 2010 16:11:23 +0000 (08:11 -0800)]
mds: Enable --journal_check mode.

This replaces the old --shadow option, which didn't work.
It starts up the MDS daemon, then replays the journal for
another MDS, and then shuts down.

Also minimally modifies the MDSMonitor to enable this
behavior, since it requires shared state.

14 years agoosdc: Fix bad assert in ~ObjectCacher.
Greg Farnum [Tue, 9 Nov 2010 18:48:00 +0000 (10:48 -0800)]
osdc: Fix bad assert in ~ObjectCacher.

The objects data member is never empty on shutdown since it now consists
of a vector of pools. Instead, check each pool map for emptiness.
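
The corrected check amounts to something like the following sketch. PoolObjects and all_pools_empty are hypothetical names, and the map's key and value types are placeholders rather than the ObjectCacher's real ones.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical shape: one object map per pool, indexed by pool id.
using PoolObjects = std::map<std::string, int>;

// True only when every per-pool map is empty. Testing empty() on the outer
// vector would be wrong, because the vector keeps its (empty) pool slots.
bool all_pools_empty(const std::vector<PoolObjects>& objects) {
  for (const auto& pool : objects)
    if (!pool.empty())
      return false;
  return true;
}
```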

14 years agouclient: only update inode if version increased
Sage Weil [Wed, 10 Nov 2010 15:42:29 +0000 (07:42 -0800)]
uclient: only update inode if version increased

This realigns the code with the kernel version, fixing a number of
problems when you have multiple MDSs returning info on the same inode.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agodecompile_crush_bucket: fix depth-first decomp
Colin Patrick McCabe [Wed, 10 Nov 2010 07:59:06 +0000 (23:59 -0800)]
decompile_crush_bucket: fix depth-first decomp

We need to ensure that buckets are output after their dependencies. The
best way to do this is a depth-first traversal of the bucket directed
acyclic graph. The previous solution was incorrect because in some
cases it didn't traverse the graph in the right order.
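
The ordering requirement can be sketched as a post-order depth-first walk. BucketGraph, visit and emit_order are hypothetical names, and the graph here is a plain id-to-children map rather than the real CRUSH structures.

```cpp
#include <map>
#include <set>
#include <vector>

// Hypothetical stand-in for the bucket DAG: bucket id -> ids of the child
// buckets it references.
using BucketGraph = std::map<int, std::vector<int>>;

// Post-order visit: emit every dependency before the bucket itself.
static void visit(const BucketGraph& g, int id, std::set<int>& done,
                  std::vector<int>& out) {
  if (!done.insert(id).second)
    return;  // already handled via another parent
  auto it = g.find(id);
  if (it != g.end())
    for (int child : it->second)
      visit(g, child, done, out);
  out.push_back(id);
}

// Returns bucket ids in an order where children precede their parents.
std::vector<int> emit_order(const BucketGraph& g) {
  std::set<int> done;
  std::vector<int> out;
  for (const auto& entry : g)
    visit(g, entry.first, done, out);
  return out;
}
```

For a graph where bucket 1 contains 2 and 3, and both of those contain 4, the walk emits 4 first and 1 last.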

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoCrushWrapper:get_bucket: ret ENOENT for no bucket
Colin Patrick McCabe [Wed, 10 Nov 2010 07:48:01 +0000 (23:48 -0800)]
CrushWrapper:get_bucket: ret ENOENT for no bucket

All the callers of CrushWrapper::get_bucket() check for error codes, but
not for NULL returns. So if there is no bucket (i.e., a NULL pointer) at
crush->bucket[i], just return the error code ENOENT. This is consistent
with how we handle other out-of-bounds requests.

Also, don't allow the caller to get us to try to access negative indices
in crush->bucket.
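
A minimal sketch of the bounds and NULL handling described above. Bucket and BucketTable are simplified, hypothetical stand-ins for the real CRUSH types, and returning a negative errno through an int with an out-parameter is an assumption about the calling convention.

```cpp
#include <cerrno>
#include <cstddef>
#include <vector>

// Hypothetical, simplified types; the real crush_map differs.
struct Bucket { int id; };

struct BucketTable {
  std::vector<Bucket*> buckets;  // slots may be null

  // Returns 0 and sets *out on success. Returns -ENOENT if idx is
  // negative, out of range, or refers to an empty slot.
  int get_bucket(int idx, Bucket** out) const {
    if (idx < 0 || static_cast<std::size_t>(idx) >= buckets.size())
      return -ENOENT;
    if (buckets[idx] == nullptr)
      return -ENOENT;
    *out = buckets[idx];
    return 0;
  }
};
```

Callers that already check the error code then need no separate NULL test.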

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoMerge branch 'sched_scrub' into unstable
Sage Weil [Tue, 9 Nov 2010 23:56:20 +0000 (15:56 -0800)]
Merge branch 'sched_scrub' into unstable

Conflicts:
src/osd/PG.cc
src/osd/PG.h

14 years agoosd: small cleanup
Sage Weil [Tue, 9 Nov 2010 23:50:48 +0000 (15:50 -0800)]
osd: small cleanup

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoosd: scrub: list objects without lock held
Sage Weil [Tue, 9 Nov 2010 23:08:15 +0000 (15:08 -0800)]
osd: scrub: list objects without lock held

We'll go back to get anything we missed later.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoMerge branch 'scrub_no_lock' into unstable
Sage Weil [Tue, 9 Nov 2010 23:46:54 +0000 (15:46 -0800)]
Merge branch 'scrub_no_lock' into unstable

14 years agops-ceph.pl: don't show self
Colin Patrick McCabe [Tue, 9 Nov 2010 23:34:52 +0000 (15:34 -0800)]
ps-ceph.pl: don't show self

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agogui: add missing #include
Sage Weil [Tue, 9 Nov 2010 23:04:10 +0000 (15:04 -0800)]
gui: add missing #include

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoMerge branch 'rbd-fiemap' into unstable
Sage Weil [Tue, 9 Nov 2010 22:50:24 +0000 (14:50 -0800)]
Merge branch 'rbd-fiemap' into unstable

14 years agoobjecter: set READ flag on new objecter mapext/read_sparse ops
Sage Weil [Tue, 9 Nov 2010 22:49:47 +0000 (14:49 -0800)]
objecter: set READ flag on new objecter mapext/read_sparse ops

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoobjecter: fix balancer for ops with length < 0
Sage Weil [Tue, 9 Nov 2010 22:48:52 +0000 (14:48 -0800)]
objecter: fix balancer for ops with length < 0

Notably, mapext.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agofilestore: autodetect presence of FIEMAP ioctl
Sage Weil [Tue, 9 Nov 2010 22:36:02 +0000 (14:36 -0800)]
filestore: autodetect presence of FIEMAP ioctl

If it's not there, assume the whole object is allocated.

Signed-off-by: Sage Weil <sage@newdream.net>