git.apps.os.sepia.ceph.com Git - ceph.git/log
Sage Weil [Wed, 10 Nov 2010 21:39:51 +0000 (13:39 -0800)]
osd: fix scrub reserved state when starting scrub

Also document scrub scheduling/pending/active states.

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Wed, 10 Nov 2010 21:16:34 +0000 (13:16 -0800)]
vstart: turn down msgr debugging

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Wed, 10 Nov 2010 21:13:38 +0000 (13:13 -0800)]
monc: cancel timer events with lock held

Signed-off-by: Sage Weil <sage@newdream.net>
Colin Patrick McCabe [Wed, 10 Nov 2010 07:59:06 +0000 (23:59 -0800)]
decompile_crush_bucket: fix depth-first decomp

We need to ensure that buckets are output after their dependencies. The
best way to do this is a depth-first traversal of the bucket directed
acyclic graph. The previous solution was incorrect because in some
cases it didn't traverse the graph in the right order.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
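A minimal C++ sketch of the depth-first ordering described above. It assumes a simplified, hypothetical model (`BucketMap`, `bucket_output_order` are illustrative names, not the decompile_crush_bucket code) in which each bucket id maps to the items it contains and, as in CRUSH, negative ids are buckets while non-negative ids are devices. Emitting a bucket only after all of its child buckets (post-order) means a recompiler always sees each dependency first.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

// Hypothetical model: bucket id -> items it contains.
// Negative ids are buckets, non-negative ids are devices.
using BucketMap = std::map<int, std::vector<int>>;

static void visit(const BucketMap& buckets, int id, std::set<int>& done,
                  std::vector<int>& order) {
  if (done.count(id)) return;  // already emitted via another path in the DAG
  done.insert(id);
  auto it = buckets.find(id);
  assert(it != buckets.end());
  for (int item : it->second) {
    if (item < 0) visit(buckets, item, done, order);  // child buckets first
  }
  order.push_back(id);  // post-order: after all of its dependencies
}

// Returns bucket ids in an order where every bucket follows its children.
std::vector<int> bucket_output_order(const BucketMap& buckets) {
  std::set<int> done;
  std::vector<int> order;
  for (const auto& kv : buckets) visit(buckets, kv.first, done, order);
  return order;
}
```

For buckets {-1: [-2, -3], -2: [0, 1], -3: [-2, 2]} this yields -2, -3, -1. The visited set matters because a DAG can reach the same bucket (-2 here) along more than one path.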
Colin Patrick McCabe [Wed, 10 Nov 2010 07:48:01 +0000 (23:48 -0800)]
CrushWrapper::get_bucket: ret ENOENT for no bucket

All the callers of CrushWrapper::get_bucket() check for error codes, but
not for NULL returns. So if there is no bucket (i.e., a NULL pointer) at
crush->bucket[i], just return the error code ENOENT. This is consistent
with how we handle other out-of-bounds requests.

Also, don't allow the caller to get us to try to access negative indices
in crush->bucket.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
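A hedged sketch of the bounds-checked lookup described above. It assumes, as in CRUSH, that bucket ids are negative and bucket `id` lives at index `-1 - id`; `Bucket` and `get_bucket_checked` are hypothetical stand-ins, and the error is returned as an int rather than through whatever convention CrushWrapper actually uses.

```cpp
#include <cassert>
#include <cerrno>
#include <vector>

struct Bucket { int id; };  // hypothetical stand-in for crush_bucket

// Hypothetical lookup: bucket id -> index -1 - id in the table.
// Returns 0 and sets *out on success, -ENOENT otherwise.
int get_bucket_checked(const std::vector<Bucket*>& buckets, int id, Bucket** out) {
  int pos = -1 - id;  // device ids (>= 0) map to negative positions
  if (pos < 0 || pos >= static_cast<int>(buckets.size()))
    return -ENOENT;   // never index the table with a negative/too-large position
  if (buckets[pos] == nullptr)
    return -ENOENT;   // a hole in the table is reported, not returned as NULL
  *out = buckets[pos];
  return 0;
}
```

Callers then only need to check the return code, which matches the commit's observation that they already check for errors but not for NULL.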
Sage Weil [Tue, 9 Nov 2010 23:56:20 +0000 (15:56 -0800)]
Merge branch 'sched_scrub' into unstable

Conflicts:
src/osd/PG.cc
src/osd/PG.h

Sage Weil [Tue, 9 Nov 2010 23:50:48 +0000 (15:50 -0800)]
osd: small cleanup

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Tue, 9 Nov 2010 23:08:15 +0000 (15:08 -0800)]
osd: scrub: list objects without lock held

We'll go back to get anything we missed later.

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Tue, 9 Nov 2010 23:46:54 +0000 (15:46 -0800)]
Merge branch 'scrub_no_lock' into unstable

Colin Patrick McCabe [Tue, 9 Nov 2010 23:34:52 +0000 (15:34 -0800)]
ps-ceph.pl: don't show self

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
Sage Weil [Tue, 9 Nov 2010 22:50:24 +0000 (14:50 -0800)]
Merge branch 'rbd-fiemap' into unstable

Sage Weil [Tue, 9 Nov 2010 22:49:47 +0000 (14:49 -0800)]
objecter: set READ flag on new objecter mapext/read_sparse ops

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Tue, 9 Nov 2010 22:48:52 +0000 (14:48 -0800)]
objecter: fix balancer for ops with length < 0

Notably, mapext.

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Tue, 9 Nov 2010 22:36:02 +0000 (14:36 -0800)]
filestore: autodetect presence of FIEMAP ioctl

If it's not there, assume the whole object is allocated.

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Tue, 9 Nov 2010 22:35:33 +0000 (14:35 -0800)]
fiemap: include linux fiemap.h header; unconditionally compile helper

If the system doesn't have the header, use our copy.

Signed-off-by: Sage Weil <sage@newdream.net>
Colin Patrick McCabe [Tue, 9 Nov 2010 22:32:49 +0000 (14:32 -0800)]
ps-ceph.pl: display Ceph tests

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
Sage Weil [Tue, 9 Nov 2010 22:23:12 +0000 (14:23 -0800)]
Merge remote branch 'origin/rbd-fiemap' into unstable

Colin Patrick McCabe [Tue, 9 Nov 2010 22:06:42 +0000 (14:06 -0800)]
Fix example config file

We need to specify a journal size for the file-based journal we set up
in the example config file.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
Colin Patrick McCabe [Tue, 9 Nov 2010 21:57:17 +0000 (13:57 -0800)]
TimerThread: don't call pop_front before iter deref

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
Colin Patrick McCabe [Tue, 9 Nov 2010 20:04:47 +0000 (12:04 -0800)]
Objecter: initialize timer in Objecter::init

Just in case future users of Objecter want to create one before calling
Messenger::start as a daemon.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
Colin Patrick McCabe [Tue, 9 Nov 2010 18:13:46 +0000 (10:13 -0800)]
Add test_crushtool.sh

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
Sage Weil [Tue, 9 Nov 2010 18:06:10 +0000 (10:06 -0800)]
mds: turn on mds_bal_frag (dir fragmentation) by default

Let the fun begin!

Signed-off-by: Sage Weil <sage@newdream.net>
Colin Patrick McCabe [Tue, 9 Nov 2010 17:57:15 +0000 (09:57 -0800)]
CrushWrapper::get_bucket_item: bounds check

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
Colin Patrick McCabe [Tue, 9 Nov 2010 17:55:44 +0000 (09:55 -0800)]
crushtool: don't create a dump we can't recompile

In crushtool, dump buckets in tree order. Buckets which reference other
buckets must be dumped after their dependencies, or else re-compilation
will fail.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
Sage Weil [Tue, 9 Nov 2010 17:56:05 +0000 (09:56 -0800)]
osdmap: cleanup: add parens

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Thu, 14 Oct 2010 21:40:54 +0000 (14:40 -0700)]
mds: wipe out client sessions on startup

For disaster recovery and such.

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Thu, 14 Oct 2010 20:55:15 +0000 (13:55 -0700)]
mon: implement 'mds newfs <metapool> <datapool>' command

Create a new fs (by creating a new MDSMap) using the given pools.

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Thu, 14 Oct 2010 20:53:47 +0000 (13:53 -0700)]
mds: use mdsmap data pool for root inode default layout

The MDSMap may specify any random pool as the data pool; use that.

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Thu, 14 Oct 2010 20:37:59 +0000 (13:37 -0700)]
mds: add mds_skip_ino and mds_wipe_ino_prealloc options

These are last-ditch recovery tools.  Not particularly effective ones,
though.

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Sat, 6 Nov 2010 19:12:38 +0000 (12:12 -0700)]
mds: add missing Dumper.[h,cc]

Andrew Farmer [Mon, 8 Nov 2010 17:41:06 +0000 (09:41 -0800)]
Replace ps-ceph.sh shell script with perl script

A much faster version of ps-ceph.sh.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
Sage Weil [Sun, 7 Nov 2010 17:56:42 +0000 (09:56 -0800)]
Merge remote branch 'origin/object_locator' into unstable

Conflicts:
src/osd/OSD.cc
src/osd/ReplicatedPG.cc
src/osd/ReplicatedPG.h
src/osd/osd_types.h

Sage Weil [Sun, 7 Nov 2010 17:45:09 +0000 (09:45 -0800)]
Merge remote branch 'origin/timer-fixes' into unstable

Sage Weil [Sun, 7 Nov 2010 17:44:04 +0000 (09:44 -0800)]
v0.24~rc

Sage Weil [Sun, 7 Nov 2010 17:42:51 +0000 (09:42 -0800)]
Merge remote branch 'origin/testing' into unstable

Sage Weil [Sun, 7 Nov 2010 15:49:59 +0000 (07:49 -0800)]
mds: eval: put scatter in MIX if replicated, otherwise LOCK

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Sun, 7 Nov 2010 15:45:52 +0000 (07:45 -0800)]
mds: do not scatter_writebehind in MIX state

Replicas might come in while we're flushing and get a MIX state with
the old state.

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Sun, 7 Nov 2010 04:05:11 +0000 (21:05 -0700)]
Merge branch 'unstable' into mix_stale

Sage Weil [Sat, 6 Nov 2010 18:35:54 +0000 (11:35 -0700)]
mds: remove MIX_STALE

Yay, we don't need it!

If we can't update the frag on scatter, fine.  The staleness of the frag
is implicit in the frag's scatter stat version not matching the inode's.
If/when we do want to update it, the frag will clearly be writable, and
we can bring it back in sync then.

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Sat, 6 Nov 2010 18:18:53 +0000 (11:18 -0700)]
mds: don't fuss with versions when taking frag/rstat from frag; it's never stale here

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Sat, 6 Nov 2010 18:18:13 +0000 (11:18 -0700)]
mds: introduce/use helpers to resync stale fragstat/rstat; update version

Simplifies code.

Also, update the version when we resync!

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Sun, 7 Nov 2010 03:55:12 +0000 (20:55 -0700)]
mds: ignore done_locking on slave requests' acquire_locks()

Slave requests ask for each xlock one at a time.  Don't bail out based on
the done_locking flag.

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Sun, 7 Nov 2010 03:17:32 +0000 (20:17 -0700)]
mds: don't use helper for rename srcdn

The rdlock_path_xlock_dentry helper works for _auth_ dentries that we
create locally in an auth dirfrag.  For the srcdn, we need to discover an
_existing_ dentry that is not necessarily auth.

Call path_traverse ourselves, but be careful to take the appropriate locks
on the resulting dn, dir, and ancestors.

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Sat, 6 Nov 2010 18:02:13 +0000 (11:02 -0700)]
mds: never complete a gather on a flushing lock

The scatter_writebehind() takes a wrlock, but that may still allow the lock
to complete a gather to LOCK and even move to, say, MIX before the data is
committed.  Bad news!

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Sat, 6 Nov 2010 16:38:15 +0000 (09:38 -0700)]
mds: update version when bringing stale rstat back up to date

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Sat, 6 Nov 2010 14:58:32 +0000 (07:58 -0700)]
mds: simplify stale semantics a bit

is_stale() => next MIX is MIX_STALE. Stale flag is then cleared.  Then we
special case the import to preserve stale-ness.

TODO: add_replica_inode likely has this same problem.

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Sat, 6 Nov 2010 04:52:28 +0000 (21:52 -0700)]
mds: preserve stale state on import; some cleanup

Our new invariant is that MIX_STALE always implies is_stale().  And on
import, if is_stale(), MIX becomes MIX_STALE.  This ensures that a replica
that we put into MIX_STALE doesn't turn back into MIX if we import it
and take the auth's state in CInode::decode_import().

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Sat, 6 Nov 2010 00:08:10 +0000 (17:08 -0700)]
Merge branch 'mix_stale' into unstable

Sage Weil [Sat, 6 Nov 2010 00:06:10 +0000 (17:06 -0700)]
mds: add more verify_scatter asserts

For catching fragstat errors sooner.

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Fri, 5 Nov 2010 22:24:53 +0000 (15:24 -0700)]
mds: fix version check on resyncing stale rstat in predirty_journal_parents

We're resyncing rstat, so check the rstat version (not fragstat!)

Signed-off-by: Sage Weil <sage@newdream.net>
Greg Farnum [Fri, 5 Nov 2010 19:45:06 +0000 (12:45 -0700)]
mds: Fix bad inode deref.

Accidentally trying to print out the CInode after removing it in trim_non_auth!
Move the print to before it's been unlinked/removed/etc.

Colin Patrick McCabe [Fri, 5 Nov 2010 19:17:40 +0000 (12:17 -0700)]
Revisit std::multimap decoder

Previously I changed the std::multimap decoder to minimize the number of
constructor invocations. However, it could be much more expensive to
copy an initialized (decoded) val_t than to copy an empty one, for
example when decoding a std::multimap < int, std::set <int> >. So
change the code to insert a non-decoded val_t again.

However, this still saves two constructor invocations over the original.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
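A minimal sketch of the "insert an empty val_t, then decode in place" pattern. The flat integer encoding (`[n, key, m, v1..vm, ...]`) and the names `Stream`, `decode_set`, and `decode_multimap` are hypothetical; Ceph's real decoder works on bufferlists. The point is that only an empty std::set is copied into the container, and the expensive contents are decoded directly into the stored element.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <utility>
#include <vector>

// Hypothetical flat encoding: [n, key, m, v1..vm, key, m, ...].
using Stream = std::vector<int>;

static void decode_set(const Stream& s, std::size_t& p, std::set<int>& out) {
  int m = s.at(p++);
  for (int i = 0; i < m; ++i) out.insert(s.at(p++));
}

std::multimap<int, std::set<int>> decode_multimap(const Stream& s) {
  std::multimap<int, std::set<int>> result;
  std::size_t p = 0;
  int n = s.at(p++);
  for (int i = 0; i < n; ++i) {
    int key = s.at(p++);
    // Insert an *empty* value (cheap to copy), then decode into it in place,
    // rather than decoding into a temporary and copying the full set.
    auto it = result.insert(std::make_pair(key, std::set<int>()));
    decode_set(s, p, it->second);
  }
  return result;
}
```

Because std::multimap::insert returns an iterator to the new element (duplicates are kept in insertion order), the decoded value lands in exactly the entry that was just added.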
Colin Patrick McCabe [Fri, 5 Nov 2010 18:34:11 +0000 (11:34 -0700)]
autogen.sh: check for pkg-config

To avoid seeing confusing errors later in the configure process, in
autogen.sh, check to make sure the pkg-config program is installed.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
Samuel Just [Thu, 21 Oct 2010 23:54:01 +0000 (16:54 -0700)]
PG.cc: build_scrub_map now drops the PG lock while scanning the PG
       build_inc_scrub_map scans all files modified since the given
           version number and creates an incremental scrub map to
           be merged with a scrub map created with build_scrub_map.
           This scan is done while holding the pg lock.
       ScrubMap.objects is now represented as a map rather than as
           a vector.

PG.h:  Added last_update_applied and finalizing_scrub members to
           PG.

ReplicatedPG.cc:
       calc_trim_to will not trim the log during a scrub (since
           replicas need the log to construct incremental maps)
       sub_op_modify_applied and op_applied maintain a
           last_update_applied PG member to be used for determining
           how far back a replica needs to go to construct an
           incremental scrub map.

osd_types.h:
       Added merge_incr method for combining a scrub map with
           a subsequent incremental scrub map.
       ScrubMap.objects is now a map from sobject_t to object.

PG scrubs will now drop the PG lock while initially scanning the PG
collection allowing writes to continue.  The scrub map will be tagged
with the most recent version applied.  After halting writes, the
primary will request an incremental map from any replicas whose map
versions do not match log.head.

Signed-off-by: Samuel Just <samuelj@hq.newdream.net>
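A simplified sketch of what combining a scrub map with a later incremental map could look like. `SimpleScrubMap`, `ScrubObject`, `valid_through`, and the `removed` flag are all hypothetical (the real osd_types.h merge_incr operates on sobject_t-keyed ScrubMap entries); the sketch only illustrates the idea that entries from the incremental scan supersede those from the lock-free initial scan, and that the merged map is then tagged with the newer version.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical per-object scrub entry; `removed` marks objects deleted
// after the initial scan.
struct ScrubObject {
  uint64_t size = 0;
  bool removed = false;
};

struct SimpleScrubMap {
  std::map<std::string, ScrubObject> objects;  // keyed by object name
  uint64_t valid_through = 0;                  // last version applied at scan time

  // Fold in an incremental map built from objects modified after
  // valid_through: newer entries replace older ones, removals drop them.
  void merge_incr(const SimpleScrubMap& incr) {
    assert(incr.valid_through >= valid_through);
    for (const auto& kv : incr.objects) {
      if (kv.second.removed)
        objects.erase(kv.first);
      else
        objects[kv.first] = kv.second;
    }
    valid_through = incr.valid_through;
  }
};
```

Using a map keyed by object (rather than a vector) is what makes this merge a simple per-key overwrite.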
Sage Weil [Fri, 5 Nov 2010 17:38:35 +0000 (10:38 -0700)]
mds: preserve version when recovering rstat from dirfrag in predirty_journal_parents

We don't want to screw up the version here.  This aligns the code with
other instances of this check.

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Fri, 5 Nov 2010 06:20:33 +0000 (23:20 -0700)]
mds: restructure finish_scatter_gather_update()

Separate behavior into two dimensions: whether or not we are updating
the dirfrag, and whether or not the dirfrag is stale.

Change the various helpers to NOT implicitly update accounted_*, as the
caller doesn't always want that, notably when we are non-stale but frozen.

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Fri, 5 Nov 2010 06:15:06 +0000 (23:15 -0700)]
mds: do not bump scatter stat lock in predirty_journal_parents

If we're in the MIX state, we clearly can't touch this without screwing up
the delicate scatter/gather behavior.  If we're in, say, LOCK, there is
still no reason to update it.  One frag at least is local and auth if we
are in this code, but there may be other frags on other nodes.  This would
just make them appear stale when they are not.

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Fri, 5 Nov 2010 05:48:09 +0000 (22:48 -0700)]
mds: mark scatterlock stale on import of stale frag scatter stat

When the lock scattered, if we didn't have an auth frag that was frozen,
we go into MIX state.  Later, we may import a stale dirfrag.  We need to
move to MIX_STALE at that point, and/or mark the lock stale so that any
subsequent transition does so.

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Fri, 5 Nov 2010 05:44:01 +0000 (22:44 -0700)]
mds: match bottom half of assimilate_dirty_rstat_inodes with a dir flag

We only do the assimilate_dirty_rstat_inodes if we do an update AND the
frag rstat was non-stale, but the bottom half (_finish) doesn't have the
same info to know whether we did it because the top half updates the
fragstat version.  Use a flag to indicate we've updated the dirfrag so
the bottom half will only run when needed.

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Fri, 5 Nov 2010 05:19:53 +0000 (22:19 -0700)]
mds: fix inode version used for inest in decode_lock_state

We need to pass the inode rstat's version into finish_scatter_update, not
the shadowed local variable.  Otherwise we don't update the dirfrag when
we should.

Signed-off-by: Sage Weil <sage@newdream.net>
Colin Patrick McCabe [Thu, 4 Nov 2010 22:46:55 +0000 (15:46 -0700)]
PGMonitor::update_from_paxos: check for bad input

Be more robust against bad data coming in from the network.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
Colin Patrick McCabe [Thu, 4 Nov 2010 21:33:48 +0000 (14:33 -0700)]
Replace sprintf with snprintf

Replace sprintf with snprintf. This is especially critical when the
format string includes "%s".

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
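A small illustration of why snprintf is safer than sprintf when the format includes "%s": the string argument's length is not known in advance. `format_entity_name` is a hypothetical helper; it relies only on the standard snprintf contract that at most `buflen` bytes (including the terminating NUL) are written and the untruncated length is returned.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical helper: format "<name>.<id>" into a caller-supplied buffer.
// Returns false if the output was truncated (or snprintf failed).
bool format_entity_name(char* buf, std::size_t buflen, const char* name, int id) {
  int n = std::snprintf(buf, buflen, "%s.%d", name, id);
  return n >= 0 && static_cast<std::size_t>(n) < buflen;
}
```

With an 8-byte buffer, "osd.12" fits, while a long name is cut to 7 characters plus the NUL instead of overrunning the buffer as sprintf would.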
Colin Patrick McCabe [Thu, 4 Nov 2010 21:26:08 +0000 (14:26 -0700)]
start_profiler/enable_profiler_options: fix memleak

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
Colin Patrick McCabe [Thu, 4 Nov 2010 21:11:41 +0000 (14:11 -0700)]
Set HEAP_PROFILE_INUSE_INTERVAL based on conf

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
Colin Patrick McCabe [Thu, 4 Nov 2010 21:06:09 +0000 (14:06 -0700)]
CInode::make_path_string: don't coerce ino

CInode::make_path_string: don't coerce the inode number to 32 bits.
Everyone else is treating it as 64 bits; this function should too.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
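A sketch of formatting a 64-bit inode number without narrowing it. The `#<hex>` layout and the name `ino_path_component` are assumptions for illustration, not necessarily what make_path_string emits; the relevant part is using `uint64_t` with the `PRIx64` format macro instead of casting to a 32-bit type.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: render an inode number as "#<hex>" using all 64 bits.
std::string ino_path_component(uint64_t ino) {
  char buf[32];  // "#" + up to 16 hex digits + NUL fits easily
  std::snprintf(buf, sizeof(buf), "#%" PRIx64, ino);
  return std::string(buf);
}
```

An inode number such as 0x10000000001 prints in full; casting it to a 32-bit unsigned first would silently turn it into 1.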
Sage Weil [Thu, 4 Nov 2010 20:17:01 +0000 (13:17 -0700)]
mds: mds debug scatterstat to print out projected rstat/fragstat

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Thu, 4 Nov 2010 20:04:47 +0000 (13:04 -0700)]
mds: verify single frag rstat on projection too

Currently we do a sanity check on gather; do the same check in
project_rstat_frag_to_inode().

Signed-off-by: Sage Weil <sage@newdream.net>
Greg Farnum [Thu, 4 Nov 2010 18:58:30 +0000 (11:58 -0700)]
Merge branch 'dumpjournal' into unstable

Greg Farnum [Thu, 4 Nov 2010 18:30:59 +0000 (11:30 -0700)]
cmds: Include journal dumper functionality.

Greg Farnum [Thu, 4 Nov 2010 18:30:38 +0000 (11:30 -0700)]
dumper: Add new Dumper class.

This lets you dump an MDS journal to a file.

Sage Weil [Thu, 4 Nov 2010 18:33:49 +0000 (11:33 -0700)]
mds: fix optional frag asserts

We want these to trigger when mds_verify_scatter is true.  Only one !.

Signed-off-by: Sage Weil <sage@newdream.net>
Greg Farnum [Thu, 4 Nov 2010 18:28:52 +0000 (11:28 -0700)]
objecter: add new wait_for_osd_map function.

Sage Weil [Thu, 4 Nov 2010 18:13:14 +0000 (11:13 -0700)]
osd: clean up active <-> booting state transitions

Among other things, get rid of the 'wrongly marked down' log message on
normal startup.

Signed-off-by: Sage Weil <sage@newdream.net>
Colin Patrick McCabe [Thu, 4 Nov 2010 17:24:52 +0000 (10:24 -0700)]
TestEncoding: count number of ctor invocations

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
Sage Weil [Thu, 4 Nov 2010 04:30:11 +0000 (21:30 -0700)]
mds: dump corrupt events; optionally skip them

If we encounter a bad event in the journal, dump it to the log.

Optionally skip it, if 'mds log skip corrupt events = true'.

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Thu, 4 Nov 2010 05:22:54 +0000 (22:22 -0700)]
mds: wait for last_failure_osd_epoch before starting journal replay

This is extremely important, and it forces the MDS to get the osdmap that
includes the blacklist entry for its predecessor.  This in turn means that
any OSD we contact trying to read the journal will be forced to get that
osdmap (or newer) before handling our read request, which means that
anything we read cannot be overwritten by a racing request from our
predecessor.  This prevents two MDSs writing to the journal at the same
time.

This change fixes potential (and observed!) journal corruption.

Signed-off-by: Sage Weil <sage@newdream.net>
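A minimal sketch of the ordering constraint described above: journal replay is deferred until the MDS has seen an osdmap at least as new as mdsmap.last_failure_osd_epoch. `ReplayGate`, `start_replay_when_ready`, and `handle_osd_map` are hypothetical names, not the MDS's actual interfaces, and a real implementation would also have to request the newer map rather than just wait for it.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

using epoch_t = uint32_t;

// Hypothetical gate: replay runs only once the local osdmap epoch is >= the
// epoch that blacklists the failed predecessor.
class ReplayGate {
 public:
  explicit ReplayGate(epoch_t required) : required_(required) {}

  void start_replay_when_ready(std::function<void()> replay) {
    if (have_ >= required_)
      replay();                             // map already new enough
    else
      waiters_.push_back(std::move(replay));  // defer until the map arrives
  }

  void handle_osd_map(epoch_t e) {
    if (e <= have_) return;  // ignore old or duplicate maps
    have_ = e;
    if (have_ < required_) return;
    auto ready = std::move(waiters_);
    waiters_.clear();
    for (auto& w : ready) w();
  }

 private:
  epoch_t required_;
  epoch_t have_ = 0;
  std::vector<std::function<void()>> waiters_;
};
```

Because any OSD contacted afterwards must be at least at that epoch before serving the read, the predecessor's in-flight writes can no longer race with replay.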
Sage Weil [Thu, 4 Nov 2010 05:20:25 +0000 (22:20 -0700)]
mon: blacklist and update last_failure_osd_epoch in all failure paths

This includes the pure failure in do_stop(), and the explicit admin
fail command.

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Thu, 4 Nov 2010 05:28:54 +0000 (22:28 -0700)]
mon: update mdsmap.last_failure_osd_epoch when blacklisting

We need to note the osdmap epoch the taking-over mds needs in the mdsmap.

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Thu, 4 Nov 2010 05:10:46 +0000 (22:10 -0700)]
mds: add last_failure_osd_epoch to extended section of mdsmap

Signed-off-by: Sage Weil <sage@newdream.net>
Colin Patrick McCabe [Thu, 4 Nov 2010 05:00:31 +0000 (22:00 -0700)]
MonClient: start SafeTimer in MonClient::init()

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
Colin Patrick McCabe [Thu, 4 Nov 2010 04:55:40 +0000 (21:55 -0700)]
cosd: start SafeTimer in OSD::init()

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
Colin Patrick McCabe [Wed, 3 Nov 2010 23:31:13 +0000 (16:31 -0700)]
Monitor: start timer thread in init(), not ctor

Don't start the SafeTimer when class Monitor is created. We want to hold off on
starting the thread until SimpleMessenger has fork()ed the process.  Instead,
start the timer thread in Timer::init().

Use an auto_ptr to store the SafeTimer.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
Colin Patrick McCabe [Wed, 3 Nov 2010 22:36:09 +0000 (15:36 -0700)]
Timer: add verbose debugging when debug timer = 20

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
Colin Patrick McCabe [Wed, 3 Nov 2010 22:35:34 +0000 (15:35 -0700)]
TestTimers: add test for out-of-order timer insert

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
Colin Patrick McCabe [Tue, 2 Nov 2010 22:15:51 +0000 (15:15 -0700)]
SafeTimer: delete contexts under the event_lock

SafeTimer: delete contexts under the event_lock.
Also add more debug printouts and create two convenience functions.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
Colin Patrick McCabe [Tue, 2 Nov 2010 02:33:15 +0000 (19:33 -0700)]
vstart.sh: turn on MDS debugging

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
Colin Patrick McCabe [Tue, 2 Nov 2010 20:20:39 +0000 (13:20 -0700)]
cephtool: fix timer init/destruction

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
Colin Patrick McCabe [Tue, 2 Nov 2010 20:01:18 +0000 (13:01 -0700)]
Logger.cc: avoid creating SafeTimer in global-ctor

Don't create a SafeTimer at global constructor time. Timers
contain a Thread, and the library stuff may not have been initialized at
global constructor time. Instead, just create the timer when we need it,
in flush_all_loggers.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
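A sketch of deferring timer construction past global-constructor time. The commit creates the timer in flush_all_loggers; this sketch swaps in a different technique, a function-local static, which C++11 initializes on first call in a thread-safe way. `LoggerTimer` and `logger_timer` are hypothetical, and the instance counter exists only to make the construction point observable.

```cpp
#include <cassert>

// Hypothetical stand-in for a timer that would own a Thread.
struct LoggerTimer {
  static int instances;
  LoggerTimer() { ++instances; }
};
int LoggerTimer::instances = 0;

// Nothing is constructed during static initialization; the timer is
// created on the first call and reused afterwards.
LoggerTimer& logger_timer() {
  static LoggerTimer timer;
  return timer;
}
```

The trade-off is that a function-local static is destroyed at exit in reverse construction order, so a timer owning a thread must still be shut down explicitly before that point.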
Colin Patrick McCabe [Tue, 2 Nov 2010 20:00:58 +0000 (13:00 -0700)]
SafeTimer: clean up copy constructor declaration

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
Colin Patrick McCabe [Tue, 2 Nov 2010 19:37:10 +0000 (12:37 -0700)]
Timer.cc: clean up debug printouts

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
Colin Patrick McCabe [Tue, 2 Nov 2010 19:36:52 +0000 (12:36 -0700)]
TestTimers: test cancelling single events

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
Colin Patrick McCabe [Tue, 2 Nov 2010 18:37:58 +0000 (11:37 -0700)]
TestTimers: call common_init and parse argv

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
Colin Patrick McCabe [Fri, 22 Oct 2010 01:37:25 +0000 (18:37 -0700)]
Timer: fix timer shutdown, efficiency issues

Rework Timer and SafeTimer to be more efficient and to handle shutdown
correctly. Document the API, especially which locks need to be held where.

The destructor for both Timer and SafeTimer now joins the timer thread
safely. The shutdown() function is available to callers who want to join
it before the Timer is destroyed.

To make things more efficient, don't create a new std::set every time we
insert a Context. Use multimap instead. Don't signal the condition
variable unless the event we have inserted comes before all the other
events in the scheduled map. Don't allocate an extra Context in
SafeTimer.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
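A minimal sketch of the two efficiency points above, not Ceph's Timer/SafeTimer: scheduled events live in a std::multimap keyed by deadline (equal deadlines coexist without a per-time std::set), and the condition variable is signaled only when the new event becomes the earliest one, since only then does a waiting timer thread's deadline change. `EventQueue`, `add_event_at`, and `run_due` are hypothetical; the timer thread that would wait on `cond_` is not shown, and deadlines are plain integer ticks.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <map>
#include <mutex>
#include <utility>

class EventQueue {
 public:
  // Returns true if the (not shown) timer thread had to be woken.
  bool add_event_at(long when, std::function<void()> cb) {
    std::lock_guard<std::mutex> l(lock_);
    auto it = schedule_.insert(std::make_pair(when, std::move(cb)));
    bool earliest = (it == schedule_.begin());
    if (earliest) cond_.notify_one();  // the head deadline changed
    return earliest;
  }

  // Runs every event whose deadline is <= now; returns how many ran.
  int run_due(long now) {
    std::unique_lock<std::mutex> l(lock_);
    int n = 0;
    while (!schedule_.empty() && schedule_.begin()->first <= now) {
      // Take the callback out *before* erasing the entry the iterator points to.
      auto cb = std::move(schedule_.begin()->second);
      schedule_.erase(schedule_.begin());
      l.unlock();  // callbacks may schedule new events
      cb();
      l.lock();
      ++n;
    }
    return n;
  }

 private:
  std::mutex lock_;
  std::condition_variable cond_;
  std::multimap<long, std::function<void()>> schedule_;
};
```

A multimap inserts equal keys after existing ones, so adding a second event at the current head's deadline correctly skips the wakeup.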
Sage Weil [Wed, 3 Nov 2010 23:41:29 +0000 (16:41 -0700)]
client: print useful max_size waiting message

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Wed, 3 Nov 2010 23:40:19 +0000 (16:40 -0700)]
Merge branch 'mix_stale' into unstable

Sage Weil [Wed, 3 Nov 2010 16:44:22 +0000 (09:44 -0700)]
debian: add gtk build-depends

For ceph -g.

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Wed, 3 Nov 2010 21:02:30 +0000 (14:02 -0700)]
mds: add 'mds verify scatter' and re-add some scatter asserts

Check on ifile and inest gather that stats match single-frag dirs.

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Wed, 3 Nov 2010 20:51:07 +0000 (13:51 -0700)]
mds: fix put_xlock() assert for slave masters

If we are a master of a slave, the state will be LOCK.

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Wed, 3 Nov 2010 20:16:06 +0000 (13:16 -0700)]
mds: rename 'mix stale' => 'mix_stale'

For unambiguous debug output.

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Wed, 3 Nov 2010 20:15:43 +0000 (13:15 -0700)]
mds: request unscatter when MIX_STALE on replica

This means implementing REQUNSCATTER.

Eventually this should use TEMPSYNC, but that isn't fully implemented yet.

Signed-off-by: Sage Weil <sage@newdream.net>