git.apps.os.sepia.ceph.com Git - ceph.git/log
ceph.git
14 years agoosd_resurrection_1_impl: turn on recovery at end
Colin Patrick McCabe [Thu, 18 Nov 2010 18:09:14 +0000 (10:09 -0800)]
osd_resurrection_1_impl: turn on recovery at end

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoMakefile: fix builddir weirdness
Jim Schutt [Thu, 18 Nov 2010 00:52:19 +0000 (16:52 -0800)]
Makefile: fix builddir weirdness

Signed-off-by: Jim Schutt <jaschut@sandia.gov>
14 years agoosd: rev PG::Info encoding for last_epoch_clean change
Sage Weil [Wed, 17 Nov 2010 22:37:38 +0000 (14:37 -0800)]
osd: rev PG::Info encoding for last_epoch_clean change

This was missed by 184fbf582b27c10b47101735a4495fe8c73ad186, so any fs
created between now and then won't decode properly.  It's more important
to make an fs prior to that work, though, so that the upgrade path from
the last stable version works.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoMerge branch 'mds_frags' into unstable
Sage Weil [Wed, 17 Nov 2010 21:06:14 +0000 (13:06 -0800)]
Merge branch 'mds_frags' into unstable

14 years agomds: adjust dir_auth_pins on steal_dentry
Sage Weil [Wed, 17 Nov 2010 20:31:18 +0000 (12:31 -0800)]
mds: adjust dir_auth_pins on steal_dentry

dir_auth_pins is a counter of dentry auth_pins in the current dir; those
need to be added in when stealing.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: wrlock scatterlocks to prevent a gather racing with split/merge logging
Sage Weil [Wed, 17 Nov 2010 19:39:24 +0000 (11:39 -0800)]
mds: wrlock scatterlocks to prevent a gather racing with split/merge logging

We have the dirs split in our cache for some time while journaling it to
disk, before the fragment_notify goes out.  Make sure we don't do a
scatterlock gather during that time that will confuse the inode auth (who
has their dirfrags fragmented differently).

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: fix subtree map update on dirfrag merge
Sage Weil [Wed, 17 Nov 2010 19:38:00 +0000 (11:38 -0800)]
mds: fix subtree map update on dirfrag merge

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: clear PIN_SUBTREE on split/merge in purge_strays
Sage Weil [Wed, 17 Nov 2010 19:23:15 +0000 (11:23 -0800)]
mds: clear PIN_SUBTREE on split/merge in purge_strays

This makes the helper work for merge as well as split.  Remove the special
fixups in the caller that were making split work before.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: don't complete freeze while parent inode is frozen
Sage Weil [Wed, 17 Nov 2010 17:20:15 +0000 (09:20 -0800)]
mds: don't complete freeze while parent inode is frozen

This makes the maybe_finish_freeze() conditions match those of
is_freezeable() and avoids an assert.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: move dirty rstat inodes to new dir on refragment
Sage Weil [Wed, 17 Nov 2010 17:19:39 +0000 (09:19 -0800)]
mds: move dirty rstat inodes to new dir on refragment

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: flush log on fragment
Sage Weil [Wed, 17 Nov 2010 16:42:44 +0000 (08:42 -0800)]
mds: flush log on fragment

This makes request lock auth_pins expire, so the fragment moves along.
Otherwise we can end up waiting for the log flush timer to go off.

This isn't a complete solution; in-progress requests won't know to flush.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: initialize PIN_SUBTREE on split
Sage Weil [Wed, 17 Nov 2010 16:33:08 +0000 (08:33 -0800)]
mds: initialize PIN_SUBTREE on split

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: fix discover requests, tracking wrt fragments
Sage Weil [Mon, 15 Nov 2010 22:16:43 +0000 (14:16 -0800)]
mds: fix discover requests, tracking wrt fragments

Track discover requests by tid.  The old system of tracking outstanding
discovers was kludgey and somewhat broken.  Also there is a possibility
of getting dup replies if someone does kick_requests().

There is still room for improvement with the logic determining when a
discover is sent: we may want to discover multiple dirfrags in parallel,
but the current code will only do one at a time.

Signed-off-by: Sage Weil <sage@newdream.net>
comment
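
A minimal sketch of tracking requests by tid, as described in the commit above. This is not the MDS code: the class and method names are hypothetical, and the assumption that a kick resends each request under a fresh tid is made here only for illustration. The point is that a reply whose tid is no longer outstanding can be recognized as a duplicate and dropped.

```python
import itertools


class DiscoverTracker:
    """Hypothetical tracker: one entry per outstanding discover, keyed by tid."""

    def __init__(self):
        self._next_tid = itertools.count(1)
        self.outstanding = {}  # tid -> what was asked for

    def send(self, target):
        # Allocate a fresh transaction id and remember the request.
        tid = next(self._next_tid)
        self.outstanding[tid] = target
        return tid

    def handle_reply(self, tid):
        # Return the target for a live request, or None for a stale or
        # duplicate reply (its tid is no longer outstanding).
        return self.outstanding.pop(tid, None)

    def kick_requests(self):
        # Assumption: a kick resends everything under new tids, so replies
        # to the old tids become stale and are ignored by handle_reply().
        targets = list(self.outstanding.values())
        self.outstanding.clear()
        return [self.send(t) for t in targets]
```

With this shape, a second reply for the same tid simply returns None instead of being processed twice.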

14 years agomds: fix EFragment replay
Sage Weil [Tue, 16 Nov 2010 23:59:48 +0000 (15:59 -0800)]
mds: fix EFragment replay

If the inode already exists in our cache, adjust our (existing) fragments.
But it might not.  In that case, we just replay the metablob.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: don't fragment mdsdir or .ceph
Sage Weil [Tue, 16 Nov 2010 19:48:58 +0000 (11:48 -0800)]
mds: don't fragment mdsdir or .ceph

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoDetect broken system linux/fiemap.h
Jim Schutt [Wed, 17 Nov 2010 20:39:52 +0000 (13:39 -0700)]
Detect broken system linux/fiemap.h

RedHat 5.5 has a /usr/include/linux/fiemap.h, but it is
broken because it does not itself include linux/types.h.
As a result, __u64 and friends are not defined.

We have a Ceph-local copy of fiemap.h, so use it
if the system version is broken.

While we're at it, fix up the configure message to
note we're using a local copy.

Signed-off-by: Jim Schutt <jaschut@sandia.gov>
Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoosdmap: don't include blacklist info in summary
Sage Weil [Wed, 17 Nov 2010 18:24:21 +0000 (10:24 -0800)]
osdmap: don't include blacklist info in summary

It's confusing users and isn't that important.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoconfig: added max_mds
Samuel Just [Wed, 17 Nov 2010 00:07:47 +0000 (16:07 -0800)]
config: added max_mds
MDSMonitor: create_new_fs adapted to use the max_mds parameter

max_mds is now a configurable value and create_new_fs will initialize
max_mds to the specified value.

Signed-off-by: Samuel Just <samuelj@hq.newdream.net>
14 years agomds: allow frag merge on subtree root
Sage Weil [Mon, 15 Nov 2010 21:43:56 +0000 (13:43 -0800)]
mds: allow frag merge on subtree root

Fix purge_stolen and adjust_dir_fragments.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: make dirfrag thrashing join and split
Sage Weil [Mon, 15 Nov 2010 21:24:24 +0000 (13:24 -0800)]
mds: make dirfrag thrashing join and split

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: add timestamp to LogEvents
Sage Weil [Tue, 16 Nov 2010 20:07:38 +0000 (12:07 -0800)]
mds: add timestamp to LogEvents

This just gives us a bit of useful info when debugging problems.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoosd: fix trailing + in pg state string rendering
Sage Weil [Tue, 16 Nov 2010 18:32:19 +0000 (10:32 -0800)]
osd: fix trailing + in pg state string rendering

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoMerge remote branch 'origin/unfound' into unstable
Sage Weil [Tue, 16 Nov 2010 18:10:43 +0000 (10:10 -0800)]
Merge remote branch 'origin/unfound' into unstable

14 years agomds: be less noisy about cap imports
Sage Weil [Tue, 16 Nov 2010 18:06:05 +0000 (10:06 -0800)]
mds: be less noisy about cap imports

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoMerge branch 'mds_dir_hash' into unstable
Sage Weil [Tue, 16 Nov 2010 18:01:15 +0000 (10:01 -0800)]
Merge branch 'mds_dir_hash' into unstable

14 years agomds/client: pass dir hash over the wire
Sage Weil [Tue, 16 Nov 2010 17:50:55 +0000 (09:50 -0800)]
mds/client: pass dir hash over the wire

Add a feature bit DIRLAYOUTHASH.

Also fix client request routing for lookups (we were only hashing when
a Dentry pointer was provided, not when a relative path was).

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: set dir hash on root inode
Sage Weil [Tue, 16 Nov 2010 17:47:34 +0000 (09:47 -0800)]
mds: set dir hash on root inode

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: set mode before all the file type dependent inode initialization!
Sage Weil [Tue, 16 Nov 2010 17:42:51 +0000 (09:42 -0800)]
mds: set mode before all the file type dependent inode initialization!

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: add DIRLAYOUTHASH feature bit
Sage Weil [Tue, 26 Oct 2010 03:41:24 +0000 (20:41 -0700)]
mds: add DIRLAYOUTHASH feature bit

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: make dentry hash a dir layout property
Sage Weil [Mon, 25 Oct 2010 23:46:01 +0000 (16:46 -0700)]
mds: make dentry hash a dir layout property

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoRadosClient::shutdown: call monclient::shutdown
Colin Patrick McCabe [Tue, 16 Nov 2010 02:33:01 +0000 (18:33 -0800)]
RadosClient::shutdown: call monclient::shutdown

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoosd: don't stop recovery when there are unfound
Colin Patrick McCabe [Tue, 16 Nov 2010 00:19:27 +0000 (16:19 -0800)]
osd: don't stop recovery when there are unfound

There are two phases in recovery: one where we get all the right objects
on to the primary, and another where we push all those objects out to
the replicas. Formerly, we would not start the second phase until there
were no missing objects at all on the primary.

This change modifies that so that we will start the second phase even if
there are unfound objects. However, we will still wait for all findable
missing objects to be brought to us, of course.

Get rid of uptodate_set. We can find the same information by looking at
the missing and missing_loc sets directly. Keeping the uptodate_set...
er... up-to-date would be very difficult in the presence of all the things
that can modify the missing and missing_loc sets.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
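
The rule described above can be sketched as follows. This is an illustration only: it models `missing` as a set of object names and `missing_loc` as a mapping from object name to the set of OSDs known to hold it, which is an assumption about shape rather than the actual OSD data structures. The function names are hypothetical.

```python
def unfound(missing, missing_loc):
    """Objects the primary needs but has no known source for."""
    return {oid for oid in missing if not missing_loc.get(oid)}


def can_start_replica_push(missing, missing_loc):
    """Phase two may begin once every findable missing object has arrived,
    that is, once everything still missing on the primary is unfound."""
    return all(not missing_loc.get(oid) for oid in missing)
```

Because both answers are derived from the two sets on demand, there is no separate piece of state that has to be kept in sync when either set changes.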
14 years agodumpjournal.cc: fix compile
Colin Patrick McCabe [Tue, 16 Nov 2010 01:01:19 +0000 (17:01 -0800)]
dumpjournal.cc: fix compile

dumpjournal needs to create its own SafeTimers and pass them in to some
constructors.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agorbd: fix rbd snap rm class handling
Yehuda Sadeh [Tue, 16 Nov 2010 00:43:25 +0000 (16:43 -0800)]
rbd: fix rbd snap rm class handling

14 years agoMerge remote branch 'origin/unfound_last_epoch_clean' into unstable
Sage Weil [Mon, 15 Nov 2010 22:59:46 +0000 (14:59 -0800)]
Merge remote branch 'origin/unfound_last_epoch_clean' into unstable

14 years agoAdd ./ceph osd tell <osd-num> dump_missing <out>
Colin Patrick McCabe [Mon, 15 Nov 2010 22:47:44 +0000 (14:47 -0800)]
Add ./ceph osd tell <osd-num> dump_missing <out>

Add a command that tells the OSD to dump its missing set for all PGs to
a file. This should be useful for debugging multi-OSD scenarios.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agosearch_for_missing:recalc stats if unfound changed
Colin Patrick McCabe [Mon, 15 Nov 2010 22:38:36 +0000 (14:38 -0800)]
search_for_missing:recalc stats if unfound changed

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agotimer: make init/shutdown explicit
Sage Weil [Mon, 15 Nov 2010 21:23:42 +0000 (13:23 -0800)]
timer: make init/shutdown explicit

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agotest_unfound.sh: start recovery at end of test
Colin Patrick McCabe [Mon, 15 Nov 2010 20:39:56 +0000 (12:39 -0800)]
test_unfound.sh: start recovery at end of test

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agotest_common.sh: add dump_osd_store
Colin Patrick McCabe [Mon, 15 Nov 2010 20:31:43 +0000 (12:31 -0800)]
test_common.sh: add dump_osd_store

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agotest_unfound.sh: fix return codes
Colin Patrick McCabe [Mon, 15 Nov 2010 19:27:44 +0000 (11:27 -0800)]
test_unfound.sh: fix return codes

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agostray_test:don't use up/down. timeout extension
Colin Patrick McCabe [Mon, 15 Nov 2010 19:05:47 +0000 (11:05 -0800)]
stray_test:don't use up/down. timeout extension

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoosd: add discover_all_missing
Colin Patrick McCabe [Sun, 14 Nov 2010 22:50:30 +0000 (14:50 -0800)]
osd: add discover_all_missing

Add discover_all_missing. This function makes sure that we have messages
en route to any OSD that we think might have information that could help
us discover where our unfound objects lie.

We call this function:
* In do_peer, right after activating the PG
* In _process_pg_info, when we're the primary of an active PG
* From handle_pg_notify, when we're the primary of an active PG

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoFix bugs in search_for_missing, _process_pg_info
Colin Patrick McCabe [Sun, 14 Nov 2010 21:54:55 +0000 (13:54 -0800)]
Fix bugs in search_for_missing, _process_pg_info

PG::search_for_missing: fix a bug with the handling of MSG_OSD_PG_INFO
messages. Formerly, when processing these messages, we erroneously
assumed that there was nothing missing on the peer at all even in cases
where there were missing objects.

PG::merge_log: drop unused Missing parameter.

_process_pg_info: Don't assume that just because we requested a Log
message at some point, that that is the message we're processing.
Correctly handle cases where we didn't get the peer's Missing set or
Log.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoAdd stray_test to test_unfound.sh
Colin Patrick McCabe [Sat, 13 Nov 2010 07:44:50 +0000 (23:44 -0800)]
Add stray_test to test_unfound.sh

This test is designed to produce a stray that nonetheless has some
useful objects. The primary should be able to find those objects and ask
for them.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoPG::finish_recovery: set info.last_epoch_clean
Colin Patrick McCabe [Sun, 14 Nov 2010 21:32:51 +0000 (13:32 -0800)]
PG::finish_recovery: set info.last_epoch_clean

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoAdd MOSDPGMissing
Colin Patrick McCabe [Fri, 12 Nov 2010 22:55:40 +0000 (14:55 -0800)]
Add MOSDPGMissing

Add MOSDPGMissing, a message which just contains the missing objects
information for a PG. We will request messages like this one in order to
locate all of our unfound objects.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoosd: add incompat feature LEC for last_epoch_clean
Colin Patrick McCabe [Thu, 11 Nov 2010 01:12:49 +0000 (17:12 -0800)]
osd: add incompat feature LEC for last_epoch_clean

So an old binary will fail to mount a store with new Info encoding.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoosd: add last_epoch_clean to PG::Info
Sage Weil [Tue, 19 Oct 2010 18:40:45 +0000 (11:40 -0700)]
osd: add last_epoch_clean to PG::Info

This changes the encoding in a non-backwards compatible way.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agotest_common.sh: remove messenger debug for now
Colin Patrick McCabe [Sun, 14 Nov 2010 19:40:33 +0000 (11:40 -0800)]
test_common.sh: remove messenger debug for now

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoosd: skip unfound in recover_replicas
Sage Weil [Mon, 15 Nov 2010 20:06:09 +0000 (12:06 -0800)]
osd: skip unfound in recover_replicas

This is currently moot, since we don't start recovering replicas until the
primary is complete.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoosd: skip unfound objects in recover_primary()
Sage Weil [Mon, 15 Nov 2010 20:04:57 +0000 (12:04 -0800)]
osd: skip unfound objects in recover_primary()

We also need to make sure we come back later when they are found.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoosdmap: make printing a bit easier to read
Sage Weil [Mon, 15 Nov 2010 19:57:57 +0000 (11:57 -0800)]
osdmap: make printing a bit easier to read

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoobjecter: don't dereference null op->outbl
Sage Weil [Mon, 15 Nov 2010 19:50:53 +0000 (11:50 -0800)]
objecter: don't dereference null op->outbl

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoMerge remote branch 'origin/testing' into unstable
Sage Weil [Mon, 15 Nov 2010 19:25:58 +0000 (11:25 -0800)]
Merge remote branch 'origin/testing' into unstable

14 years agoosd: unreg scrub when removing pg
Sage Weil [Mon, 15 Nov 2010 19:25:49 +0000 (11:25 -0800)]
osd: unreg scrub when removing pg

This fixes this crash:

    osd/OSD.cc: In function 'PG* OSD::_lookup_lock_pg(pg_t)':
    osd/OSD.cc:956: FAILED assert(pg_map.count(pgid))
    ceph version 0.24~rc (7201497f2feef6a2bbd0baf89e3a14b8a880e79f)
    1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x34) [0x81ee4e]
    2: (OSD::_lookup_lock_pg(pg_t)+0x6f) [0x6d3d57]
    3: (OSD::sched_scrub()+0x2e9) [0x6e4445]
    4: (OSD::tick()+0x204) [0x6f168e]
    5: (OSD::C_Tick::finish(int)+0x1c) [0x7613bc]
    6: (SafeTimer::timer_thread()+0x189) [0x81bcf5]
    7: (SafeTimerThread::entry()+0x19) [0x81dd73]
    8: (Thread::_entry_func(void*)+0x20) [0x66496a]
    9: (()+0x68ba) [0x7fb807d118ba]
    10: (clone()+0x6d) [0x7fb806a7002d]
    NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Reported-by: Colin McCabe <colinm@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoMerge branch 'msgr_zerocopy_read' into unstable
Sage Weil [Mon, 15 Nov 2010 04:48:43 +0000 (20:48 -0800)]
Merge branch 'msgr_zerocopy_read' into unstable

14 years agolibrados: pass provided buffer to objecter on rados_read
Sage Weil [Mon, 15 Nov 2010 04:29:40 +0000 (20:29 -0800)]
librados: pass provided buffer to objecter on rados_read

This allows us to avoid the data copy if the objecter and msgr manage
to use it.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoobjecter: post rx buffer to msgr if target bufferlist is present
Sage Weil [Mon, 15 Nov 2010 04:28:44 +0000 (20:28 -0800)]
objecter: post rx buffer to msgr if target bufferlist is present

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomsgr: use provided rx buffer if present
Sage Weil [Mon, 15 Nov 2010 04:26:52 +0000 (20:26 -0800)]
msgr: use provided rx buffer if present

This changes the read path so that we hold the Connection::lock mutex while
reading data off the socket.  This ensures that we are reading into a
buffer we are allowed to use, and allows users to revoke a previously
posted buffer.  If that happens, switch over to a newly allocated buffer.

Note that currently the final result bufferlist may contain part of the
provided buffer and part of a newly allocated buffer.  This is okay as long
as we will always read the same data into the buffer.  And in practice, if
the rx buffer is revoked then the message itself will be thrown out anyway.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomsgr: add Connection rx buffer interface
Sage Weil [Mon, 15 Nov 2010 04:23:52 +0000 (20:23 -0800)]
msgr: add Connection rx buffer interface

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomsgr: implement get_connection()
Sage Weil [Mon, 15 Nov 2010 04:23:10 +0000 (20:23 -0800)]
msgr: implement get_connection()

Get a Connection* for the given destination.  This mirrors submit_message,
but does not actually queue a message.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agobuffer: implement list::iterator::get_current_ptr()
Sage Weil [Mon, 15 Nov 2010 04:21:05 +0000 (20:21 -0800)]
buffer: implement list::iterator::get_current_ptr()

Return a buffer::ptr for the ptr at the current position/offset, with the
length set to the remaining space in the current buffer.

Signed-off-by: Sage Weil <sage@newdream.net>
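
A rough model of what the commit above describes, assuming a buffer list is a sequence of contiguous segments. The function below is a hypothetical stand-in, not the buffer::list API: it returns the rest of the segment that contains a given logical offset. Slicing `bytes` copies the data, which is a simplification of this sketch.

```python
def get_current_ptr(segments, offset):
    """segments: list of bytes objects forming one logical buffer.
    Return the tail of the segment containing `offset`, starting at
    `offset`, so its length is the space remaining in that segment."""
    for seg in segments:
        if offset < len(seg):
            return seg[offset:]
        offset -= len(seg)  # skip this segment entirely
    return b""  # offset is at or past the end of the list
```

For example, with segments `b"abcd"` and `b"efg"`, offset 1 falls in the first segment and offset 4 is the start of the second.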
14 years agoObjecter::shutdown: shut down timer.
Colin Patrick McCabe [Sun, 14 Nov 2010 19:29:29 +0000 (11:29 -0800)]
Objecter::shutdown: shut down timer.

We have to explicitly shut down the timer in Objecter::shutdown.
Otherwise, we are relying on the destructor of SafeTimer to do it.
Unfortunately, that destructor gets called after the mutex the timer is
using has already been destroyed.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoMerge remote branch 'origin/msgr' into testing
Sage Weil [Sat, 13 Nov 2010 04:43:30 +0000 (20:43 -0800)]
Merge remote branch 'origin/msgr' into testing

14 years agodebug: don't print thread id twice
Sage Weil [Sat, 13 Nov 2010 00:00:12 +0000 (16:00 -0800)]
debug: don't print thread id twice

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomsgr: cleanup: make queue_received non-inline; some helpful debug
Sage Weil [Fri, 12 Nov 2010 23:59:50 +0000 (15:59 -0800)]
msgr: cleanup: make queue_received non-inline; some helpful debug

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomsgr: do not clear halt_delivery
Sage Weil [Fri, 12 Nov 2010 23:56:54 +0000 (15:56 -0800)]
msgr: do not clear halt_delivery

We need to keep the halt_delivery plug set on failure/shutdown in order to
prevent a racing reader from queuing new messages.  The only time we clear
it is when we discard messages due to a session reset.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomsgr: close enqueue/discard race
Sage Weil [Fri, 12 Nov 2010 22:41:53 +0000 (14:41 -0800)]
msgr: close enqueue/discard race

We need to re-check halt_delivery after dropping and retaking pipe_lock.

Signed-off-by: Sage Weil <sage@newdream.net>
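
The re-check pattern named in the commit above can be sketched like this. The class is a hypothetical simplification, not the messenger's Pipe: it only shows why a flag tested before releasing a lock must be tested again after the lock is retaken. The `prepare` hook is an invention of this sketch that stands in for whatever work happens while the lock is dropped.

```python
import threading


class PipeSketch:
    def __init__(self):
        self.pipe_lock = threading.Lock()
        self.halt_delivery = False
        self.queue = []

    def enqueue(self, msg, prepare=lambda: None):
        with self.pipe_lock:
            if self.halt_delivery:
                return False
        prepare()  # lock is not held here, so state may change under us
        with self.pipe_lock:
            if self.halt_delivery:  # re-check: a discard may have raced in
                return False
            self.queue.append(msg)
            return True

    def discard(self):
        # Set the plug and drop anything already queued.
        with self.pipe_lock:
            self.halt_delivery = True
            self.queue.clear()
```

Without the second check, a message could be queued after a discard that ran in the unlocked window.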
14 years agomsgr: protect pipe queue_item map with pipe_lock AND dispatch_queue lock
Sage Weil [Fri, 12 Nov 2010 22:05:56 +0000 (14:05 -0800)]
msgr: protect pipe queue_item map with pipe_lock AND dispatch_queue lock

Close a few different races here.

Also, assert that queue_items are not queued in ~Pipe().

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomsgr: add 'ms inject socket failures = foo'
Sage Weil [Fri, 12 Nov 2010 21:53:49 +0000 (13:53 -0800)]
msgr: add 'ms inject socket failures = foo'

Where we fail roughly every foo'th socket operation.

Signed-off-by: Sage Weil <sage@newdream.net>
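
One way to get "roughly every foo'th operation" is a random draw with probability 1/foo per operation. The helper below is a hypothetical sketch of that idea, not the messenger code, and treating 0 as "disabled" is an assumption made here.

```python
import random


def should_inject_failure(every_n, rng=random):
    """Return True for roughly one in every_n calls.
    Assumption of this sketch: every_n == 0 disables injection."""
    return every_n > 0 and rng.randrange(every_n) == 0
```

A caller would check this before each socket operation and simulate a failure when it returns True.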
14 years agomsgr: only close socket on reconnect or shutdown
Sage Weil [Fri, 12 Nov 2010 21:09:24 +0000 (13:09 -0800)]
msgr: only close socket on reconnect or shutdown

We can't modify 'sd' or (more importantly) close sd while any other thread
might be using it, or else we might race with an open and they might end
up using someone else's fd.

Take care to _only_ close(sd) in connect(), when the reader thread is
stopped, or when reaping the connection.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomsgr: protect pipe queuing with _both_ pipe and dispatch_queue locks
Sage Weil [Fri, 12 Nov 2010 21:41:14 +0000 (13:41 -0800)]
msgr: protect pipe queuing with _both_ pipe and dispatch_queue locks

We want to make sure the pipe's queue item doesn't go away.

Also, make queue_received() require pipe_lock to be held.  This avoids some
useless unlocking/locking, since (in the case where the pipe is already
queued) we then don't need to drop the pipe_lock at all.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoTestTimers: don't test (nonexistent) Timer
Colin Patrick McCabe [Fri, 12 Nov 2010 22:49:25 +0000 (14:49 -0800)]
TestTimers: don't test (nonexistent) Timer

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoRename PG::peer to PG::do_peer
Colin Patrick McCabe [Fri, 12 Nov 2010 22:45:36 +0000 (14:45 -0800)]
Rename PG::peer to PG::do_peer

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoMerge branch 'testing' into unstable
Sage Weil [Fri, 12 Nov 2010 15:59:05 +0000 (07:59 -0800)]
Merge branch 'testing' into unstable

14 years agouclient: insert lssnap results under snapdir, not live dir
Sage Weil [Fri, 12 Nov 2010 15:55:41 +0000 (07:55 -0800)]
uclient: insert lssnap results under snapdir, not live dir

Put the readdir results (list of snapshots) in the right place in the
hierarchy; we were putting them in the parent dir (as if they were real
directories).

This bug manifested itself as a snaptest-2.sh failure.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomsg: fix buffer size for IPv6 address parsing
Wido den Hollander [Fri, 12 Nov 2010 15:36:00 +0000 (07:36 -0800)]
msg: fix buffer size for IPv6 address parsing

Signed-off-by: Wido den Hollander <wido@widodh.nl>
14 years agotimer: rewrite mostly from scratch
Sage Weil [Fri, 12 Nov 2010 00:38:02 +0000 (16:38 -0800)]
timer: rewrite mostly from scratch

Just use the provided lock.  This _vastly_ reduces the complexity because
we don't have to worry about races between our thread trying to fire off a
timer that is being canceled.

The old Timer class isn't used anywhere anymore.

Signed-off-by: Sage Weil <sage@newdream.net>
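
A simplified sketch of the "just use the provided lock" design, under the assumption that the caller holds the lock around every call, including the one that fires due events. The class and method names are illustrative, and there is no timer thread here; the sketch only shows why cancel and fire cannot race when both run under the same lock.

```python
import threading


class SafeTimerSketch:
    def __init__(self, lock):
        self.lock = lock   # caller-provided; caller must hold it for every call
        self.events = {}   # event id -> (when, callback)
        self._next_id = 0

    def add_event_at(self, when, callback):
        self._next_id += 1
        self.events[self._next_id] = (when, callback)
        return self._next_id

    def cancel_event(self, event_id):
        # True if the event was still pending and has now been removed.
        return self.events.pop(event_id, None) is not None

    def fire_due(self, now):
        # Fire pending events in time order. An event cancelled by an
        # earlier callback in the same pass is skipped.
        due = sorted((when, eid) for eid, (when, _) in self.events.items()
                     if when <= now)
        for _, eid in due:
            entry = self.events.pop(eid, None)
            if entry is not None:
                entry[1]()
```

Usage would look like `with lock: timer.cancel_event(eid)`, with the firing path taking the same lock before calling `fire_due`.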
14 years agomds: hit inode created via CREATE
Sage Weil [Thu, 11 Nov 2010 23:31:42 +0000 (15:31 -0800)]
mds: hit inode created via CREATE

We missed this path!

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoMerge branch 'rc' into unstable
Sage Weil [Thu, 11 Nov 2010 22:28:18 +0000 (14:28 -0800)]
Merge branch 'rc' into unstable

Conflicts:
configure.ac
src/Makefile.am

14 years agov0.23 v0.23
Sage Weil [Thu, 11 Nov 2010 00:34:17 +0000 (16:34 -0800)]
v0.23

14 years agomds: fix null_snapflush with multiple intervening snaps
Sage Weil [Thu, 11 Nov 2010 04:58:49 +0000 (20:58 -0800)]
mds: fix null_snapflush with multiple intervening snaps

The client is allowed to not send a snapflush if there is no dirty metadata
to write for a given snap.  However, the mds can only look up inodes by
the last snapid in the interval.  So, when doing a null_snapflush (filling
in for snapflushes the client didn't send), we have to walk forward through
intervening snaps until we find the right inode.

Note that this means we will call _do_snap_update multiple times on the
same inode, but with different snapids.

Add unit test to check this.

Signed-off-by: Sage Weil <sage@newdream.net>
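
The forward walk described above can be sketched as follows. This is a heavily simplified model with hypothetical names: inode versions are stored in a dict keyed by the last snapid of the interval they cover, and `snaps` is the sorted list of existing snapids.

```python
def find_inode_for_snap(inodes_by_last, snaps, snapid):
    """Walk forward from snapid through the existing snaps until one of
    them is the last snapid of a stored interval. Return (last, inode),
    or None if no stored interval ends at or after snapid."""
    for s in snaps:
        if s < snapid:
            continue
        if s in inodes_by_last:
            return s, inodes_by_last[s]
    return None
```

For example, if one inode version covers snaps 2 through 5, a lookup for snap 3 has to step over the intervening snap ids until it reaches 5.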
14 years agoMerge remote branch 'origin/unfound' into unstable
Sage Weil [Thu, 11 Nov 2010 00:36:18 +0000 (16:36 -0800)]
Merge remote branch 'origin/unfound' into unstable

14 years agoosd: scrub least recently scrubbed pgs first; once a day
Sage Weil [Thu, 11 Nov 2010 00:31:26 +0000 (16:31 -0800)]
osd: scrub least recently scrubbed pgs first; once a day

Signed-off-by: Sage Weil <sage@newdream.net>
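
The policy in the commit title above could look roughly like this. It is a hypothetical sketch, not the OSD scheduler: it assumes a mapping from PG id to last-scrub timestamp in seconds and a fixed one-day minimum interval.

```python
DAY = 24 * 60 * 60  # seconds


def scrub_order(last_scrubbed, now, min_interval=DAY):
    """Return the PGs due for a scrub, least recently scrubbed first.
    last_scrubbed maps pg id -> timestamp of its last scrub."""
    due = [(stamp, pg) for pg, stamp in last_scrubbed.items()
           if now - stamp >= min_interval]
    return [pg for _, pg in sorted(due)]
```

PGs scrubbed within the last day are left out entirely, so the scheduler never picks something it just scrubbed.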
14 years agoosd: don't scrub something we just scrubbed
Sage Weil [Wed, 10 Nov 2010 23:43:37 +0000 (15:43 -0800)]
osd: don't scrub something we just scrubbed

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoosd: call sched_scrub on reserve reply
Sage Weil [Wed, 10 Nov 2010 23:33:31 +0000 (15:33 -0800)]
osd: call sched_scrub on reserve reply

Otherwise we have to wait until the next time it's called by the timer, and
during that period we have a reservation locally, and any other peers can't
reserve a scrub from us, and nobody makes any progress.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoosd: fix sched_scrub
Sage Weil [Wed, 10 Nov 2010 23:28:39 +0000 (15:28 -0800)]
osd: fix sched_scrub

Insert whoami into reserved set on primary, not 0!  Also more cleanup of
sched state helpers.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoosd: do scrub schedule state changes inside scrub()
Sage Weil [Wed, 10 Nov 2010 22:58:34 +0000 (14:58 -0800)]
osd: do scrub schedule state changes inside scrub()

Update these values under protection of pg lock iff we start scrubbing,
otherwise back out.

On scrub completion, unreserve replicas.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoosd: track last_scrubbed in PG::Info::History
Sage Weil [Wed, 10 Nov 2010 22:55:57 +0000 (14:55 -0800)]
osd: track last_scrubbed in PG::Info::History

Share with peers and write to disk on scrub completion.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoosd: scrub: change cancel behavior
Sage Weil [Wed, 10 Nov 2010 22:15:41 +0000 (14:15 -0800)]
osd: scrub: change cancel behavior

Use explicit flag, so that scrub_reserved always indicates whether the
osd count includes us or not.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agopg_state_string: use an ostringstream
Colin Patrick McCabe [Wed, 10 Nov 2010 22:43:26 +0000 (14:43 -0800)]
pg_state_string: use an ostringstream

Use an ostringstream for efficiency's sake.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agovstart: stop logging to /tmp/foo
Sage Weil [Wed, 10 Nov 2010 21:49:22 +0000 (13:49 -0800)]
vstart: stop logging to /tmp/foo

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoosd: fix scrub reserved state when starting scrub
Sage Weil [Wed, 10 Nov 2010 21:39:51 +0000 (13:39 -0800)]
osd: fix scrub reserved state when starting scrub

Also document scrub scheduling/pending/active states.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agovstart: turn down msgr debugging
Sage Weil [Wed, 10 Nov 2010 21:16:34 +0000 (13:16 -0800)]
vstart: turn down msgr debugging

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomonc: cancel timer events with lock held
Sage Weil [Wed, 10 Nov 2010 21:13:38 +0000 (13:13 -0800)]
monc: cancel timer events with lock held

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoWake up clients waiting for now-found objects
Colin Patrick McCabe [Tue, 9 Nov 2010 06:15:14 +0000 (22:15 -0800)]
Wake up clients waiting for now-found objects

PG::search_for_missing: when we find a previously unfound object, check
to see if there is an entry in waiting_for_missing_object representing a
client waiting for this object.

PG::repair_object: assert that waiting_for_missing_object is empty
before messing with missing_loc. It definitely should be during a scrub.

ReplicatedPG role change logic: always take_object_waiters on the wait
queues when the PG acting set changes.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agotest_unfound.sh: test reading an unfound object.
Colin Patrick McCabe [Mon, 8 Nov 2010 20:30:26 +0000 (12:30 -0800)]
test_unfound.sh: test reading an unfound object.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agotest_unfound.sh: verify that we have unfound objs
Colin Patrick McCabe [Fri, 5 Nov 2010 00:28:39 +0000 (17:28 -0700)]
test_unfound.sh: verify that we have unfound objs

test_unfound.sh: verify that we have unfound objs.
Then, when we bring up the other OSD, verify that those unfound objects
are found (on that OSD).

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoAdd num_objects_unfound to struct pg_stat_t
Colin Patrick McCabe [Thu, 4 Nov 2010 21:40:22 +0000 (14:40 -0700)]
Add num_objects_unfound to struct pg_stat_t

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>