git.apps.os.sepia.ceph.com Git - ceph.git/log
ceph.git
14 years agofilestore: only warn about disk write cache on kernels <2.6.33
Sage Weil [Mon, 22 Nov 2010 00:16:43 +0000 (16:16 -0800)]
filestore: only warn about disk write cache on kernels <2.6.33

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoosd: fix search_for_missing: old last_update implies object not present
Sage Weil [Mon, 22 Nov 2010 00:15:25 +0000 (16:15 -0800)]
osd: fix search_for_missing: old last_update implies object not present

For example, if an osd sends an empty PG::Info (last_update = 0'0) and
empty missing, we should not conclude that the object is there.

Signed-off-by: Sage Weil <sage@newdream.net>
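
The rule in the search_for_missing commit above can be sketched in isolation. This is an illustration only, not the Ceph code: `eversion` and `peer_may_have` are hypothetical names, and the version is modeled as a plain (epoch, version) pair so that 0'0 compares lower than any real update.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>

// (epoch, version), written epoch'version in the log; 0'0 is the empty value.
// Hypothetical stand-in for the real version type.
typedef std::pair<unsigned, unsigned long> eversion;

// Returns true only if the peer could plausibly hold 'oid' at version 'need'.
// A peer whose last_update is older than 'need' cannot have the object, even
// if its missing set is empty.
bool peer_may_have(const std::string& oid,
                   const eversion& need,
                   const eversion& peer_last_update,
                   const std::set<std::string>& peer_missing) {
  if (peer_last_update < need)
    return false;                       // too old: an empty missing set proves nothing
  return peer_missing.count(oid) == 0;  // recent enough and not listed as missing
}
```

With an empty Info (last_update = 0'0) and an empty missing set, the check reports that the peer does not have the object, which is the behavior the commit asks for.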
14 years agoinit-ceph: fix cleanlogs for no log_sym_dir case
Sage Weil [Mon, 22 Nov 2010 00:09:13 +0000 (16:09 -0800)]
init-ceph: fix cleanlogs for no log_sym_dir case

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoOSDMap: const cleanup
Colin Patrick McCabe [Sat, 20 Nov 2010 03:15:11 +0000 (19:15 -0800)]
OSDMap: const cleanup

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agomds-dumper: Define Dumper::~Dumper()
Colin Patrick McCabe [Sat, 20 Nov 2010 03:14:29 +0000 (19:14 -0800)]
mds-dumper: Define Dumper::~Dumper()

To fix compile error.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoReplicatedPG::pull: fix test for unfound
Colin Patrick McCabe [Fri, 19 Nov 2010 22:21:00 +0000 (14:21 -0800)]
ReplicatedPG::pull: fix test for unfound

The test for unfound objects was reversed, leading us to try to pull
unfound objects and refrain from pulling objects that we knew how to
get. Should fix bug #585.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoosdmap: fix printing, again
Sage Weil [Fri, 19 Nov 2010 21:41:58 +0000 (13:41 -0800)]
osdmap: fix printing, again

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoMerge remote branch 'origin/mds' into unstable
Sage Weil [Fri, 19 Nov 2010 18:17:58 +0000 (10:17 -0800)]
Merge remote branch 'origin/mds' into unstable

14 years agomulti-dump.sh: add diff mode
Colin Patrick McCabe [Fri, 19 Nov 2010 05:13:02 +0000 (21:13 -0800)]
multi-dump.sh: add diff mode

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoAdd multi-dump.sh
Colin Patrick McCabe [Fri, 19 Nov 2010 04:57:15 +0000 (20:57 -0800)]
Add multi-dump.sh

This is a debug tool that can dump out Ceph information at various
epochs. For instance, it can show how the OSDmap changed over time.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoReplicatedPG::get_object_context: fix broken calls
Colin Patrick McCabe [Thu, 18 Nov 2010 22:51:18 +0000 (14:51 -0800)]
ReplicatedPG::get_object_context: fix broken calls

ReplicatedPG::get_object_context takes three parameters.  The last two
are "const object_locator_t& oloc" and "bool can_create".
Unfortunately, booleans can degrade to ints, and ints can be used to
initialize objects of type object_locator_t.

So when you make a call like:
> ctx->snapset_obc = get_object_context(snapoid, true);

What happens is that you actually call:
> get_object_context(snapoid, object_locator(1), false);

So you pass an invalid and *not* blank object_locator_t, and pass false
for can_create. This is not what the caller wanted. This change gets rid
of the default parameters and fixes the callers.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
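
The pitfall described in the get_object_context commit above can be reproduced in a few lines. This is a minimal sketch, not the Ceph code: `locator_t`, `result_t`, `lookup_old`, and `lookup_fixed` are hypothetical stand-ins, and the sketch assumes only what the message states, namely that the locator type can be initialized from an int.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for object_locator_t. The non-explicit int
// constructor lets a bool convert silently: bool -> int -> locator_t.
struct locator_t {
  int pool;
  locator_t() : pool(-1) {}
  locator_t(int p) : pool(p) {}  // not 'explicit': this is the trap
};

// What the callee actually received, so the sketch can be checked.
struct result_t {
  int pool;
  bool can_create;
};

// Old-style signature with default parameters (assumed shape).
result_t lookup_old(const std::string& oid,
                    const locator_t& oloc = locator_t(),
                    bool can_create = false) {
  (void)oid;
  return result_t{oloc.pool, can_create};
}

// Fixed signature: no defaults, so callers must spell out both arguments.
result_t lookup_fixed(const std::string& oid,
                      const locator_t& oloc,
                      bool can_create) {
  (void)oid;
  return result_t{oloc.pool, can_create};
}
```

Calling `lookup_old("snapoid", true)` compiles without complaint but yields a locator with pool 1 and `can_create == false`. With `lookup_fixed`, a two-argument call no longer compiles, so the mistake is caught at build time. Marking the int constructor `explicit` would be another way to block the conversion; the commit itself chose to drop the default parameters.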
14 years agoReplicatedPG: call finish_recovery when needed
Colin Patrick McCabe [Thu, 18 Nov 2010 20:37:49 +0000 (12:37 -0800)]
ReplicatedPG: call finish_recovery when needed

Don't loop in ReplicatedPG::start_recovery_ops. Both recover_replicas and
recover_primary already contain a loop that tries to do as many recovery
ops as it can, so there is no need to repeat it. Also, the former loop
provably would never execute more than once because of the way the code
was structured.

If there are no more recovery operations to do, and PG::is_all_uptodate
is true at the end of ReplicatedPG::start_recovery_ops, call
finish_recovery.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoosd_resurrection_1_impl: turn on recovery at end
Colin Patrick McCabe [Thu, 18 Nov 2010 18:09:14 +0000 (10:09 -0800)]
osd_resurrection_1_impl: turn on recovery at end

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoMakefile: fix builddir weirdness
Jim Schutt [Thu, 18 Nov 2010 00:52:19 +0000 (16:52 -0800)]
Makefile: fix builddir weirdness

Signed-off-by: Jim Schutt <jaschut@sandia.gov>
14 years agoosd: rev PG::Info encoding for last_epoch_clean change
Sage Weil [Wed, 17 Nov 2010 22:37:38 +0000 (14:37 -0800)]
osd: rev PG::Info encoding for last_epoch_clean change

This was missed by 184fbf582b27c10b47101735a4495fe8c73ad186, so any fs
created between now and then won't decode properly.  It's more important
to make an fs prior to that work, though, so that the upgrade path from
the last stable version works.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoMerge branch 'mds_frags' into unstable
Sage Weil [Wed, 17 Nov 2010 21:06:14 +0000 (13:06 -0800)]
Merge branch 'mds_frags' into unstable

14 years agomds: adjust dir_auth_pins on steal_dentry
Sage Weil [Wed, 17 Nov 2010 20:31:18 +0000 (12:31 -0800)]
mds: adjust dir_auth_pins on steal_dentry

dir_auth_pins is a counter of dentry auth_pins in the current dir; those
need to be added in when stealing.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: wrlock scatterlocks to prevent a gather racing with split/merge logging
Sage Weil [Wed, 17 Nov 2010 19:39:24 +0000 (11:39 -0800)]
mds: wrlock scatterlocks to prevent a gather racing with split/merge logging

We have the dirs split in our cache for some time while journaling it to
disk, before the fragment_notify goes out.  Make sure we don't do a
scatterlock gather during that time that will confuse the inode auth (who
has their dirfrags fragmented differently).

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: fix subtree map update on dirfrag merge
Sage Weil [Wed, 17 Nov 2010 19:38:00 +0000 (11:38 -0800)]
mds: fix subtree map update on dirfrag merge

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: clear PIN_SUBTREE on split/merge in purge_strays
Sage Weil [Wed, 17 Nov 2010 19:23:15 +0000 (11:23 -0800)]
mds: clear PIN_SUBTREE on split/merge in purge_strays

This makes the helper work for merge as well as split.  Remove the special
fixups in the caller that were making split work before.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: don't complete freeze while parent inode is frozen
Sage Weil [Wed, 17 Nov 2010 17:20:15 +0000 (09:20 -0800)]
mds: don't complete freeze while parent inode is frozen

This makes maybe_finish_freeze() conditions match that of is_freezeable()
and avoids an assert.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: move dirty rstat inodes to new dir on refragment
Sage Weil [Wed, 17 Nov 2010 17:19:39 +0000 (09:19 -0800)]
mds: move dirty rstat inodes to new dir on refragment

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: flush log on fragment
Sage Weil [Wed, 17 Nov 2010 16:42:44 +0000 (08:42 -0800)]
mds: flush log on fragment

This makes request lock auth_pins expire, so the fragment moves along.
Otherwise we can end up waiting for the log flush timer to go off.

This isn't a complete solution; in-progress requests won't know to flush.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: initialize PIN_SUBTREE on split
Sage Weil [Wed, 17 Nov 2010 16:33:08 +0000 (08:33 -0800)]
mds: initialize PIN_SUBTREE on split

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: fix discover requests, tracking wrt fragments
Sage Weil [Mon, 15 Nov 2010 22:16:43 +0000 (14:16 -0800)]
mds: fix discover requests, tracking wrt fragments

Track discover requests by tid.  The old system of tracking outstanding
discovers was kludgey and somewhat broken.  Also there is a possibility
of getting dup replies if someone does kick_requests().

There is still room for improvement with the logic determining when a
discover is sent: we may want to discover multiple dirfrags in parallel,
but the current code will only do one at a time.

Signed-off-by: Sage Weil <sage@newdream.net>
comment

14 years agomds: fix EFragment replay
Sage Weil [Tue, 16 Nov 2010 23:59:48 +0000 (15:59 -0800)]
mds: fix EFragment replay

If the inode already exists in our cache, adjust our (existing) fragments.
But it might not.  In that case, we just replay the metablob.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: don't fragment mdsdir or .ceph
Sage Weil [Tue, 16 Nov 2010 19:48:58 +0000 (11:48 -0800)]
mds: don't fragment mdsdir or .ceph

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoDetect broken system linux/fiemap.h
Jim Schutt [Wed, 17 Nov 2010 20:39:52 +0000 (13:39 -0700)]
Detect broken system linux/fiemap.h

RedHat 5.5 has a /usr/include/linux/fiemap.h, but it is
broken because it does not itself include linux/types.h.
As a result, __u64 and friends are not defined.

We have a Ceph-local copy of fiemap.h, so use it
if the system version is broken.

While we're at it, fix up the configure message to
note we're using a local copy.

Signed-off-by: Jim Schutt <jaschut@sandia.gov>
Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoosdmap: don't include blacklist info in summary
Sage Weil [Wed, 17 Nov 2010 18:24:21 +0000 (10:24 -0800)]
osdmap: don't include blacklist info in summary

It's confusing users and isn't that important.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoconfig: added max_mds
Samuel Just [Wed, 17 Nov 2010 00:07:47 +0000 (16:07 -0800)]
config: added max_mds
MDSMonitor: create_new_fs adapted to use the max_mds parameter

max_mds is now a configurable value and create_new_fs will initialize
max_mds to the specified value.

Signed-off-by: Samuel Just <samuelj@hq.newdream.net>
14 years agomds: allow frag merge on subtree root
Sage Weil [Mon, 15 Nov 2010 21:43:56 +0000 (13:43 -0800)]
mds: allow frag merge on subtree root

Fix purge_stolen and adjust_dir_fragments.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: make dirfrag thrashing join and split
Sage Weil [Mon, 15 Nov 2010 21:24:24 +0000 (13:24 -0800)]
mds: make dirfrag thrashing join and split

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: add timestamp to LogEvents
Sage Weil [Tue, 16 Nov 2010 20:07:38 +0000 (12:07 -0800)]
mds: add timestamp to LogEvents

This just gives us a bit of useful info when debugging problems.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoosd: fix trailing + in pg state string rendering
Sage Weil [Tue, 16 Nov 2010 18:32:19 +0000 (10:32 -0800)]
osd: fix trailing + in pg state string rendering

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoMerge remote branch 'origin/unfound' into unstable
Sage Weil [Tue, 16 Nov 2010 18:10:43 +0000 (10:10 -0800)]
Merge remote branch 'origin/unfound' into unstable

14 years agomds: be less noisy about cap imports
Sage Weil [Tue, 16 Nov 2010 18:06:05 +0000 (10:06 -0800)]
mds: be less noisy about cap imports

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoMerge branch 'mds_dir_hash' into unstable
Sage Weil [Tue, 16 Nov 2010 18:01:15 +0000 (10:01 -0800)]
Merge branch 'mds_dir_hash' into unstable

14 years agomds/client: pass dir hash over the wire
Sage Weil [Tue, 16 Nov 2010 17:50:55 +0000 (09:50 -0800)]
mds/client: pass dir hash over the wire

Add a feature bit DIRLAYOUTHASH.

Also fix client request routing for lookups (we were only hashing when
a Dentry pointer was provided, not when a relative path was).

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: set dir hash on root inode
Sage Weil [Tue, 16 Nov 2010 17:47:34 +0000 (09:47 -0800)]
mds: set dir hash on root inode

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: set mode before all the file type dependent inode initialization!
Sage Weil [Tue, 16 Nov 2010 17:42:51 +0000 (09:42 -0800)]
mds: set mode before all the file type dependent inode initialization!

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: add DIRLAYOUTHASH feature bit
Sage Weil [Tue, 26 Oct 2010 03:41:24 +0000 (20:41 -0700)]
mds: add DIRLAYOUTHASH feature bit

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: make dentry hash a dir layout property
Sage Weil [Mon, 25 Oct 2010 23:46:01 +0000 (16:46 -0700)]
mds: make dentry hash a dir layout property

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoRadosClient::shutdown: call monclient::shutdown
Colin Patrick McCabe [Tue, 16 Nov 2010 02:33:01 +0000 (18:33 -0800)]
RadosClient::shutdown: call monclient::shutdown

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoosd: don't stop recovery when there are unfound
Colin Patrick McCabe [Tue, 16 Nov 2010 00:19:27 +0000 (16:19 -0800)]
osd: don't stop recovery when there are unfound

There are two phases in recovery: one where we get all the right objects
on to the primary, and another where we push all those objects out to
the replicas. Formerly, we would not start the second phase until there
were no missing objects at all on the primary.

This change modifies that so that we will start the second phase even if
there are unfound objects. However, we will still wait for all findable
missing objects to be brought to us, of course.

Get rid of uptodate_set. We can find the same information by looking at
the missing and missing_loc sets directly. Keeping the uptodate_set...
er... up-to-date would be very difficult in the presence of all the things
that can modify the missing and missing_loc sets.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agodumpjournal.cc: fix compile
Colin Patrick McCabe [Tue, 16 Nov 2010 01:01:19 +0000 (17:01 -0800)]
dumpjournal.cc: fix compile

dumpjournal needs to create its own SafeTimers and pass them in to some
constructors.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agorbd: fix rbd snap rm class handling
Yehuda Sadeh [Tue, 16 Nov 2010 00:43:25 +0000 (16:43 -0800)]
rbd: fix rbd snap rm class handling

14 years agoMerge remote branch 'origin/unfound_last_epoch_clean' into unstable
Sage Weil [Mon, 15 Nov 2010 22:59:46 +0000 (14:59 -0800)]
Merge remote branch 'origin/unfound_last_epoch_clean' into unstable

14 years agoAdd ./ceph osd tell <osd-num> dump_missing <out>
Colin Patrick McCabe [Mon, 15 Nov 2010 22:47:44 +0000 (14:47 -0800)]
Add ./ceph osd tell <osd-num> dump_missing <out>

Add a command that tells the OSD to dump its missing set for all PGs to
a file. This should be useful for debugging multi-OSD scenarios.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agosearch_for_missing:recalc stats if unfound changed
Colin Patrick McCabe [Mon, 15 Nov 2010 22:38:36 +0000 (14:38 -0800)]
search_for_missing:recalc stats if unfound changed

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agomds: Use CDir bloom filter as appropriate.
Greg Farnum [Mon, 15 Nov 2010 21:00:14 +0000 (13:00 -0800)]
mds: Use CDir bloom filter as appropriate.

Add items to the bloom filter when trimming, and look for them
in the filter in the few places where a simple existence
check suffices for our needs.

Signed-off-by: Greg Farnum <gregf@hq.newdream.net>
14 years agomds: Add bloom filter to CDir.
Greg Farnum [Mon, 15 Nov 2010 20:59:29 +0000 (12:59 -0800)]
mds: Add bloom filter to CDir.

You can now add items to a bloom filter and check for their existence.
This is intended to be used when trimming items out of the cache; the
filter is cleared when you mark_complete and is not transferred between
nodes. Neither does it change how you set or remove the STATE_COMPLETE flag.
You must explicitly check the bloom filter as appropriate; likewise, if
you start to fill it in you must always continue filling it in until
you delete the current instance of the filter.

Signed-off-by: Greg Farnum <gregf@hq.newdream.net>
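
For readers unfamiliar with the data structure, the add/check/clear contract described in the CDir bloom filter commit above can be sketched as follows. This is not the library that was added under include/; `tiny_bloom`, its size, and its two salted hashes are assumptions chosen to keep the example short.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical, deliberately small bloom filter.
class tiny_bloom {
  static const std::size_t NBITS = 1024;
  std::bitset<NBITS> bits;

  // Derive a bit position from the key plus a one-character salt.
  static std::size_t slot(const std::string& key, char salt) {
    return std::hash<std::string>()(key + salt) % NBITS;
  }

public:
  // Record a key, for example a dentry name trimmed from the cache.
  void add(const std::string& key) {
    bits.set(slot(key, 'a'));
    bits.set(slot(key, 'b'));
  }

  // false: the key was definitely never added since the last clear().
  // true:  the key may have been added (false positives are possible).
  bool maybe_contains(const std::string& key) const {
    return bits.test(slot(key, 'a')) && bits.test(slot(key, 'b'));
  }

  // Drop all recorded keys, as the commit does on mark_complete.
  void clear() { bits.reset(); }
};
```

Because a filter can only answer "definitely not" or "maybe", it is useful exactly where the commit says: in places where a simple existence check suffices. It also explains the warning about filling it in consistently, since a key that was trimmed but never added would be reported as absent.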
14 years agotimer: make init/shutdown explicit
Sage Weil [Mon, 15 Nov 2010 21:23:42 +0000 (13:23 -0800)]
timer: make init/shutdown explicit

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agotest_unfound.sh: start recovery at end of test
Colin Patrick McCabe [Mon, 15 Nov 2010 20:39:56 +0000 (12:39 -0800)]
test_unfound.sh: start recovery at end of test

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agotest_common.sh: add dump_osd_store
Colin Patrick McCabe [Mon, 15 Nov 2010 20:31:43 +0000 (12:31 -0800)]
test_common.sh: add dump_osd_store

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agotest_unfound.sh: fix return codes
Colin Patrick McCabe [Mon, 15 Nov 2010 19:27:44 +0000 (11:27 -0800)]
test_unfound.sh: fix return codes

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agostray_test:don't use up/down. timeout extension
Colin Patrick McCabe [Mon, 15 Nov 2010 19:05:47 +0000 (11:05 -0800)]
stray_test:don't use up/down. timeout extension

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoosd: add discover_all_missing
Colin Patrick McCabe [Sun, 14 Nov 2010 22:50:30 +0000 (14:50 -0800)]
osd: add discover_all_missing

Add discover_all_missing. This function makes sure that we have messages
en route to any OSD that we think might have information that could help
us discover where our unfound objects lie.

We call this function:
* In do_peer, right after activating the PG
* In _process_pg_info, when we're the primary of an active PG
* From handle_pg_notify, when we're the primary of an active PG

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoFix bugs in search_for_missing, _process_pg_info
Colin Patrick McCabe [Sun, 14 Nov 2010 21:54:55 +0000 (13:54 -0800)]
Fix bugs in search_for_missing, _process_pg_info

PG::search_for_missing: fix a bug with the handling of MSG_OSD_PG_INFO
messages. Formerly, when processing these messages, we erroneously
assumed that there was nothing missing on the peer at all even in cases
where there were missing objects.

PG::merge_log: drop unused Missing parameter.

_process_pg_info: Don't assume that just because we requested a Log
message at some point, that that is the message we're processing.
Correctly handle cases where we didn't get the peer's Missing set or
Log.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoAdd stray_test to test_unfound.sh
Colin Patrick McCabe [Sat, 13 Nov 2010 07:44:50 +0000 (23:44 -0800)]
Add stray_test to test_unfound.sh

This test is designed to produce a stray that nonetheless has some
useful objects. The primary should be able to find those objects and ask
for them.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoPG::finish_recovery: set info.last_epoch_clean
Colin Patrick McCabe [Sun, 14 Nov 2010 21:32:51 +0000 (13:32 -0800)]
PG::finish_recovery: set info.last_epoch_clean

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoAdd MOSDPGMissing
Colin Patrick McCabe [Fri, 12 Nov 2010 22:55:40 +0000 (14:55 -0800)]
Add MOSDPGMissing

Add MOSDPGMissing, a message which just contains the missing objects
information for a PG. We will request messages like this one in order to
locate all of our unfound objects.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoosd: add incompat feature LEC for last_epoch_clean
Colin Patrick McCabe [Thu, 11 Nov 2010 01:12:49 +0000 (17:12 -0800)]
osd: add incompat feature LEC for last_epoch_clean

So an old binary will fail to mount a store with new Info encoding.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoosd: add last_epoch_clean to PG::Info
Sage Weil [Tue, 19 Oct 2010 18:40:45 +0000 (11:40 -0700)]
osd: add last_epoch_clean to PG::Info

This changes the encoding in a non-backwards compatible way.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agotest_common.sh: remove messenger debug for now
Colin Patrick McCabe [Sun, 14 Nov 2010 19:40:33 +0000 (11:40 -0800)]
test_common.sh: remove messenger debug for now

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoosd: skip unfound in recover_replicas
Sage Weil [Mon, 15 Nov 2010 20:06:09 +0000 (12:06 -0800)]
osd: skip unfound in recover_replicas

This is moot currently, since we don't currently start recovering replicas
until the primary is complete.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoosd: skip unfound objects in recover_primary()
Sage Weil [Mon, 15 Nov 2010 20:04:57 +0000 (12:04 -0800)]
osd: skip unfound objects in recover_primary()

We also need to make sure we come back later when they are found.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoosdmap: make printing a bit easier to read
Sage Weil [Mon, 15 Nov 2010 19:57:57 +0000 (11:57 -0800)]
osdmap: make printing a bit easier to read

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoobjecter: don't dereference null op->outbl
Sage Weil [Mon, 15 Nov 2010 19:50:53 +0000 (11:50 -0800)]
objecter: don't dereference null op->outbl

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoinclude: Add bloom filter library to include/
Greg Farnum [Mon, 15 Nov 2010 19:36:51 +0000 (11:36 -0800)]
include: Add bloom filter library to include/

Signed-off-by: Greg Farnum <gregf@hq.newdream.net>
14 years agoMerge remote branch 'origin/testing' into unstable
Sage Weil [Mon, 15 Nov 2010 19:25:58 +0000 (11:25 -0800)]
Merge remote branch 'origin/testing' into unstable

14 years agoosd: unreg scrub when removing pg
Sage Weil [Mon, 15 Nov 2010 19:25:49 +0000 (11:25 -0800)]
osd: unreg scrub when removing pg

This fixes this crash:

    osd/OSD.cc: In function 'PG* OSD::_lookup_lock_pg(pg_t)':
    osd/OSD.cc:956: FAILED assert(pg_map.count(pgid))
    ceph version 0.24~rc (7201497f2feef6a2bbd0baf89e3a14b8a880e79f)
    1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x34) [0x81ee4e]
    2: (OSD::_lookup_lock_pg(pg_t)+0x6f) [0x6d3d57]
    3: (OSD::sched_scrub()+0x2e9) [0x6e4445]
    4: (OSD::tick()+0x204) [0x6f168e]
    5: (OSD::C_Tick::finish(int)+0x1c) [0x7613bc]
    6: (SafeTimer::timer_thread()+0x189) [0x81bcf5]
    7: (SafeTimerThread::entry()+0x19) [0x81dd73]
    8: (Thread::_entry_func(void*)+0x20) [0x66496a]
    9: (()+0x68ba) [0x7fb807d118ba]
    10: (clone()+0x6d) [0x7fb806a7002d]
    NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Reported-by: Colin McCabe <colinm@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoMerge branch 'msgr_zerocopy_read' into unstable
Sage Weil [Mon, 15 Nov 2010 04:48:43 +0000 (20:48 -0800)]
Merge branch 'msgr_zerocopy_read' into unstable

14 years agolibrados: pass provided buffer to objecter on rados_read
Sage Weil [Mon, 15 Nov 2010 04:29:40 +0000 (20:29 -0800)]
librados: pass provided buffer to objecter on rados_read

This allows us to avoid the data copy if the objecter and msgr manage
to use it.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoobjecter: post rx buffer to msgr if target bufferlist is present
Sage Weil [Mon, 15 Nov 2010 04:28:44 +0000 (20:28 -0800)]
objecter: post rx buffer to msgr if target bufferlist is present

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomsgr: use provided rx buffer if present
Sage Weil [Mon, 15 Nov 2010 04:26:52 +0000 (20:26 -0800)]
msgr: use provided rx buffer if present

This changes the read path so that we hold the Connection::lock mutex while
reading data off the socket.  This ensures that we are reading into a
buffer we are allowed to use, and allows users to revoke a previously
posted buffer.  If that happens, switch over to a newly allocated buffer.

Note that currently the final result bufferlist may contain part of the
provided buffer and part of a newly allocated buffer.  This is okay as long
as we will always read the same data into the buffer.  And in practice, if
the rx buffer is revoked then the message itself will be thrown out anyway.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomsgr: add Connection rx buffer interface
Sage Weil [Mon, 15 Nov 2010 04:23:52 +0000 (20:23 -0800)]
msgr: add Connection rx buffer interface

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomsgr: implement get_connection()
Sage Weil [Mon, 15 Nov 2010 04:23:10 +0000 (20:23 -0800)]
msgr: implement get_connection()

Get a Connection* for the given destination.  This mirrors submit_message,
but does not actually queue a message.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agobuffer: implement list::iterator::get_current_ptr()
Sage Weil [Mon, 15 Nov 2010 04:21:05 +0000 (20:21 -0800)]
buffer: implement list::iterator::get_current_ptr()

Return a buffer::ptr for the ptr at the current position/offset, with the
length set to the remaining space in the current buffer.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoObjecter::shutdown: shut down timer.
Colin Patrick McCabe [Sun, 14 Nov 2010 19:29:29 +0000 (11:29 -0800)]
Objecter::shutdown: shut down timer.

We have to explicitly shut down the timer in Objecter::shutdown.
Otherwise, we are relying on the destructor of SafeTimer to do it.
Unfortunately, that destructor gets called after the mutex the timer is
using has already been destroyed.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
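
One way the ordering problem in the Objecter::shutdown commit above can arise is through C++ member destruction order, which is the reverse of declaration order. The sketch below assumes that layout purely for illustration; `fake_mutex`, `fake_timer`, `fake_objecter`, and the `events` log are hypothetical and say nothing about how the real classes are laid out.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Records what happened and in which order, so the sketch can be checked.
static std::vector<std::string> events;

struct fake_mutex {
  ~fake_mutex() { events.push_back("mutex destroyed"); }
};

struct fake_timer {
  fake_mutex& lock;  // the timer uses a lock it does not own
  bool stopped;
  explicit fake_timer(fake_mutex& l) : lock(l), stopped(false) {}

  void shutdown() {
    if (!stopped) {
      events.push_back("timer shut down");
      stopped = true;
    }
  }

  // Relying on the destructor alone is too late if 'lock' is already gone.
  ~fake_timer() { shutdown(); }
};

// Members are destroyed in reverse declaration order: 'lock' is declared
// after 'timer', so 'lock' is destroyed first.
struct fake_objecter {
  fake_timer timer;
  fake_mutex lock;
  fake_objecter() : timer(lock) {}

  // The explicit shutdown corresponding to the fix.
  void shutdown() { timer.shutdown(); }
};
```

Destroying a `fake_objecter` without calling `shutdown()` logs "mutex destroyed" before "timer shut down"; calling `shutdown()` first reverses the order, so the timer stops while its lock still exists.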
14 years agoMerge remote branch 'origin/msgr' into testing
Sage Weil [Sat, 13 Nov 2010 04:43:30 +0000 (20:43 -0800)]
Merge remote branch 'origin/msgr' into testing

14 years agodebug: don't print thread id twice
Sage Weil [Sat, 13 Nov 2010 00:00:12 +0000 (16:00 -0800)]
debug: don't print thread id twice

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomsgr: cleanup: make queue_received non-inline; some helpful debug
Sage Weil [Fri, 12 Nov 2010 23:59:50 +0000 (15:59 -0800)]
msgr: cleanup: make queue_received non-inline; some helpful debug

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomsgr: do not clear halt_delivery
Sage Weil [Fri, 12 Nov 2010 23:56:54 +0000 (15:56 -0800)]
msgr: do not clear halt_delivery

We need to keep the halt_delivery plug set on failure/shutdown in order to
prevent a racing reader from queuing new messages.  The only time we clear
it is when we discard messages due to a session reset.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomsgr: close enqueue/discard race
Sage Weil [Fri, 12 Nov 2010 22:41:53 +0000 (14:41 -0800)]
msgr: close enqueue/discard race

We need to re-check halt_delivery after dropping and retaking pipe_lock.

Signed-off-by: Sage Weil <sage@newdream.net>
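
The pattern behind the enqueue/discard commit above is a general one: a flag checked under a lock must be checked again if the lock was dropped and retaken in between. A minimal sketch using the standard library follows; `fake_pipe` and its `while_unlocked` hook are hypothetical, and the hook exists only so the race can be triggered deterministically.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <mutex>

struct fake_pipe {
  std::mutex pipe_lock;
  bool halt_delivery = false;
  std::deque<int> queue;
  std::function<void()> while_unlocked;  // hook standing in for a racing thread

  // Stop delivery and throw away anything already queued.
  void discard() {
    std::lock_guard<std::mutex> l(pipe_lock);
    halt_delivery = true;
    queue.clear();
  }

  // Returns true if the message was queued.
  bool enqueue(int msg) {
    std::unique_lock<std::mutex> l(pipe_lock);
    if (halt_delivery)
      return false;

    l.unlock();  // lock dropped, for example to take another lock first
    if (while_unlocked)
      while_unlocked();
    l.lock();    // lock retaken: earlier checks are now stale

    if (halt_delivery)  // the re-check
      return false;
    queue.push_back(msg);
    return true;
  }
};
```

Without the second check, a message could be queued after `discard()` had already emptied the queue and set the flag.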
14 years agomsgr: protect pipe queue_item map with pipe_lock AND dispatch_queue lock
Sage Weil [Fri, 12 Nov 2010 22:05:56 +0000 (14:05 -0800)]
msgr: protect pipe queue_item map with pipe_lock AND dispatch_queue lock

Close a few different races here.

Also, assert that queue_items are not queued in ~Pipe().

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomsgr: add 'ms inject socket failures = foo'
Sage Weil [Fri, 12 Nov 2010 21:53:49 +0000 (13:53 -0800)]
msgr: add 'ms inject socket failures = foo'

Where we fail roughly every foo'th socket operation.

Signed-off-by: Sage Weil <sage@newdream.net>
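
"Roughly every foo'th operation" in the commit above suggests a random check rather than a counter. A sketch of that idea, under the assumption that a value of 0 disables injection; `inject_failure` is a hypothetical helper, not the msgr code.

```cpp
#include <cassert>
#include <random>

// Returns true on roughly one call in 'every'. A value of 0 is assumed to
// mean "never inject".
bool inject_failure(unsigned every, std::mt19937& rng) {
  if (every == 0)
    return false;
  return rng() % every == 0;
}
```

A caller would consult this before each socket operation and simulate an error when it returns true. Using a random draw instead of a fixed counter avoids always failing at the same point in a protocol exchange.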
14 years agomsgr: only close socket on reconnect or shutdown
Sage Weil [Fri, 12 Nov 2010 21:09:24 +0000 (13:09 -0800)]
msgr: only close socket on reconnect or shutdown

We can't modify 'sd' or (more importantly) close sd while any other thread
might be using it, or else we might race with an open and they might end
up using someone else's fd.

Take care to _only_ close(sd) in connect(), when the reader thread is
stopped, or when reaping the connection.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomsgr: protect pipe queuing with _both_ pipe and dispatch_queue locks
Sage Weil [Fri, 12 Nov 2010 21:41:14 +0000 (13:41 -0800)]
msgr: protect pipe queuing with _both_ pipe and dispatch_queue locks

We want to make sure the pipe's queue item doesn't go away.

Also, make queue_received() require pipe_lock to be held.  This avoids some
useless unlocking/locking, since (in the case where the pipe is already
queued) we then don't need to drop the pipe_lock at all.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoTestTimers: don't test (nonexistent) Timer
Colin Patrick McCabe [Fri, 12 Nov 2010 22:49:25 +0000 (14:49 -0800)]
TestTimers: don't test (nonexistent) Timer

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoRename PG::peer to PG::do_peer
Colin Patrick McCabe [Fri, 12 Nov 2010 22:45:36 +0000 (14:45 -0800)]
Rename PG::peer to PG::do_peer

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoMerge branch 'testing' into unstable
Sage Weil [Fri, 12 Nov 2010 15:59:05 +0000 (07:59 -0800)]
Merge branch 'testing' into unstable

14 years agouclient: insert lssnap results under snapdir, not live dir
Sage Weil [Fri, 12 Nov 2010 15:55:41 +0000 (07:55 -0800)]
uclient: insert lssnap results under snapdir, not live dir

Put the readdir results (list of snapshots) in the right place in the
hierarchy; we were putting them in the parent dir (as if they were real
directories).

This bug manifested itself as a snaptest-2.sh failure.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomsg: fix buffer size for IPv6 address parsing
Wido den Hollander [Fri, 12 Nov 2010 15:36:00 +0000 (07:36 -0800)]
msg: fix buffer size for IPv6 address parsing

Signed-off-by: Wido den Hollander <wido@widodh.nl>
14 years agotimer: rewrite mostly from scratch
Sage Weil [Fri, 12 Nov 2010 00:38:02 +0000 (16:38 -0800)]
timer: rewrite mostly from scratch

Just use the provided lock.  This _vastly_ reduces the complexity because
we don't have to worry about races between our thread trying to fire off a
timer that is being canceled.

The old Timer class isn't used anywhere anymore.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: hit inode created via CREATE
Sage Weil [Thu, 11 Nov 2010 23:31:42 +0000 (15:31 -0800)]
mds: hit inode created via CREATE

We missed this path!

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoMerge branch 'rc' into unstable
Sage Weil [Thu, 11 Nov 2010 22:28:18 +0000 (14:28 -0800)]
Merge branch 'rc' into unstable

Conflicts:
configure.ac
src/Makefile.am

14 years agov0.23 v0.23
Sage Weil [Thu, 11 Nov 2010 00:34:17 +0000 (16:34 -0800)]
v0.23

14 years agomds: fix null_snapflush with multiple intervening snaps
Sage Weil [Thu, 11 Nov 2010 04:58:49 +0000 (20:58 -0800)]
mds: fix null_snapflush with multiple intervening snaps

The client is allowed to not send a snapflush if there is no dirty metadata
to write for a given snap.  However, the mds can only look up inodes by
the last snapid in the interval.  So, when doing a null_snapflush (filling
in for snapflushes the client didn't send), we have to walk forward through
intervening snaps until we find the right inode.

Note that this means we will call _do_snap_update multiple times on the
same inode, but with different snapids.

Add unit test to check this.

Signed-off-by: Sage Weil <sage@newdream.net>
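
The lookup constraint in the null_snapflush commit above can be illustrated with a small model. This is one reading of the message, not the MDS code: `snapid_t`, `find_by_last`, and `find_covering` are hypothetical, and inodes are reduced to strings keyed by the last snapid of the interval they cover.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

typedef unsigned long snapid_t;

// Model of the constraint: an inode can only be found by the LAST snapid
// of its interval, so lookups by any earlier snapid miss.
const std::string* find_by_last(const std::map<snapid_t, std::string>& by_last,
                                snapid_t last) {
  std::map<snapid_t, std::string>::const_iterator it = by_last.find(last);
  return it == by_last.end() ? nullptr : &it->second;
}

// Walk forward through the known snaps, starting at 'snapid', until an
// exact lookup succeeds.
const std::string* find_covering(const std::map<snapid_t, std::string>& by_last,
                                 const std::set<snapid_t>& snaps,
                                 snapid_t snapid) {
  for (std::set<snapid_t>::const_iterator s = snaps.lower_bound(snapid);
       s != snaps.end(); ++s) {
    if (const std::string* in = find_by_last(by_last, *s))
      return in;
  }
  return nullptr;
}
```

With snaps 2, 4, and 6 and a single inode keyed by 6, a direct lookup for snap 2 misses, while walking forward tries 2, then 4, then 6 and finds it.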
14 years agoMerge remote branch 'origin/unfound' into unstable
Sage Weil [Thu, 11 Nov 2010 00:36:18 +0000 (16:36 -0800)]
Merge remote branch 'origin/unfound' into unstable

14 years agoosd: scrub least recently scrubbed pgs first; once a day
Sage Weil [Thu, 11 Nov 2010 00:31:26 +0000 (16:31 -0800)]
osd: scrub least recently scrubbed pgs first; once a day

Signed-off-by: Sage Weil <sage@newdream.net>
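
The scheduling policy named in the scrub commit title above amounts to an ordering plus a minimum interval. A sketch under stated assumptions: `pg_scrub_info`, `scrub_order`, timestamps in seconds, and the default of 24 hours are all hypothetical choices for illustration, not the OSD implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

struct pg_scrub_info {
  std::string pgid;
  long last_scrub;  // seconds since epoch (assumed unit)
};

// Returns the PGs that are due for a scrub, least recently scrubbed first.
std::vector<std::string> scrub_order(std::vector<pg_scrub_info> pgs,
                                     long now,
                                     long min_interval = 24 * 60 * 60) {
  std::sort(pgs.begin(), pgs.end(),
            [](const pg_scrub_info& a, const pg_scrub_info& b) {
              return a.last_scrub < b.last_scrub;
            });
  std::vector<std::string> due;
  for (const pg_scrub_info& pg : pgs) {
    if (now - pg.last_scrub >= min_interval)
      due.push_back(pg.pgid);
  }
  return due;
}
```

PGs scrubbed within the last interval are skipped entirely, and among the rest the oldest stamp is served first.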