git.apps.os.sepia.ceph.com Git - ceph.git/log
ceph.git
14 years agoceph.spec.in: include gui files
Sage Weil [Tue, 30 Nov 2010 17:22:42 +0000 (09:22 -0800)]
ceph.spec.in: include gui files

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agodebian: many many cleanups
Sage Weil [Tue, 30 Nov 2010 17:13:54 +0000 (09:13 -0800)]
debian: many many cleanups

Signed-off-by: Laszlo Boszormenyi <gcs@debian.hu>
14 years agofilejournal: fix throttle vs FULL behavior
Sage Weil [Tue, 30 Nov 2010 16:55:29 +0000 (08:55 -0800)]
filejournal: fix throttle vs FULL behavior

We don't want to add to the throttler if we aren't going to queue the
write, or else we'll never take it off again.

Signed-off-by: Sage Weil <sage@newdream.net>
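
The ordering described in the filejournal fix above can be sketched roughly as follows. This is a minimal illustration, not the FileJournal code: `Throttle`, `Journal`, `submit` and `committed` are hypothetical names, and the capacity check stands in for the real FULL condition.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical byte throttle: tracks how much queued data is outstanding.
struct Throttle {
  uint64_t current = 0;
  void take(uint64_t n) { current += n; }
  void put(uint64_t n) { assert(current >= n); current -= n; }
};

// Hypothetical journal with a fixed capacity.
struct Journal {
  uint64_t capacity;
  uint64_t used = 0;
  Throttle throttle;

  explicit Journal(uint64_t cap) : capacity(cap) {}

  // Returns true if the write was queued. The throttle is charged only
  // once we know the write will be queued, so a rejected write never
  // leaves a charge behind that nobody will release.
  bool submit(uint64_t len) {
    if (used + len > capacity)
      return false;  // FULL: bail out before touching the throttle
    throttle.take(len);
    used += len;
    return true;
  }

  // Called when a queued write has been committed; releases its charge.
  void committed(uint64_t len) { throttle.put(len); }
};
```

Charging the throttle before the FULL check would leave `current` raised for a write that is never queued and therefore never completes.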
14 years agomdcache: in trim_non_auth, only print out path if it has a parent dentry.
Greg Farnum [Tue, 23 Nov 2010 22:40:54 +0000 (14:40 -0800)]
mdcache: in trim_non_auth, only print out path if it has a parent dentry.

This should only occur with the root inode, but caused a segfault for
anybody running more than one MDS who restarted.

Signed-off-by: Greg Farnum <gregf@hq.newdream.net>
14 years agomds: Reply checking_lock while reading filelock
Herb Shiu [Tue, 23 Nov 2010 07:31:50 +0000 (15:31 +0800)]
mds: Reply checking_lock while reading filelock

Use checking_lock to replace lock_state in the extra buffer list so that the client gets the correct file lock reply.

14 years agov0.23.1 v0.23.1
Sage Weil [Sun, 21 Nov 2010 23:23:29 +0000 (15:23 -0800)]
v0.23.1

14 years agoclient: only encode_cap_releases once per request.
Greg Farnum [Mon, 22 Nov 2010 16:50:32 +0000 (08:50 -0800)]
client: only encode_cap_releases once per request.

Accomplish this by making a list of cap releases in the (permanent)
MetaRequest, and then copying that into the (potentially-temporary)
MClientRequest.

14 years agoclient: Remove the I_COMPLETE flag from the parent directory in relink_inode.
Greg Farnum [Wed, 17 Nov 2010 17:58:38 +0000 (09:58 -0800)]
client: Remove the I_COMPLETE flag from the parent directory in relink_inode.

This papers over issues arising from the client's lack of proper support
for hard links, and lets it pass the snaptest-upchildrealms test.

14 years agoMerge remote branch 'origin/msgr' into testing
Sage Weil [Sat, 13 Nov 2010 04:43:30 +0000 (20:43 -0800)]
Merge remote branch 'origin/msgr' into testing

14 years agodebug: don't print thread id twice
Sage Weil [Sat, 13 Nov 2010 00:00:12 +0000 (16:00 -0800)]
debug: don't print thread id twice

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomsgr: cleanup: make queue_received non-inline; some helpful debug
Sage Weil [Fri, 12 Nov 2010 23:59:50 +0000 (15:59 -0800)]
msgr: cleanup: make queue_received non-inline; some helpful debug

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomsgr: do not clear halt_delivery
Sage Weil [Fri, 12 Nov 2010 23:56:54 +0000 (15:56 -0800)]
msgr: do not clear halt_delivery

We need to keep the halt_delivery plug set on failure/shutdown in order to
prevent a racing reader from queuing new messages.  The only time we clear
it is when we discard messages due to a session reset.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomsgr: close enqueue/discard race
Sage Weil [Fri, 12 Nov 2010 22:41:53 +0000 (14:41 -0800)]
msgr: close enqueue/discard race

We need to re-check halt_delivery after dropping and retaking pipe_lock.

Signed-off-by: Sage Weil <sage@newdream.net>
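
The re-check pattern this commit describes might look like the sketch below. The `Pipe` struct here is a hypothetical stand-in, not the messenger's class, and the `while_unlocked` callback exists only so the race window can be exercised deterministically.

```cpp
#include <cassert>
#include <deque>
#include <mutex>

// Hypothetical stand-in for a messenger pipe; names are illustrative.
struct Pipe {
  std::mutex pipe_lock;
  bool halt_delivery = false;
  std::deque<int> queue;

  // Caller must hold pipe_lock through `l`. Returns true if msg was queued.
  // `while_unlocked` runs in the window where the lock is dropped.
  template <typename F>
  bool enqueue(std::unique_lock<std::mutex>& l, int msg, F while_unlocked) {
    assert(l.owns_lock());
    if (halt_delivery)
      return false;

    // Drop the lock, e.g. to take another lock in the right order.
    l.unlock();
    while_unlocked();
    l.lock();

    // Re-check: the flag may have been set while the lock was dropped.
    if (halt_delivery)
      return false;

    queue.push_back(msg);
    return true;
  }
};
```

Without the second check, a message could be queued after another thread had already set the flag and discarded the queue.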
14 years agomsgr: protect pipe queue_item map with pipe_lock AND dispatch_queue lock
Sage Weil [Fri, 12 Nov 2010 22:05:56 +0000 (14:05 -0800)]
msgr: protect pipe queue_item map with pipe_lock AND dispatch_queue lock

Close a few different races here.

Also, assert that queue_items are not queued in ~Pipe().

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomsgr: add 'ms inject socket failures = foo'
Sage Weil [Fri, 12 Nov 2010 21:53:49 +0000 (13:53 -0800)]
msgr: add 'ms inject socket failures = foo'

Where we fail roughly every foo'th socket operation.

Signed-off-by: Sage Weil <sage@newdream.net>
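
One way to fail "roughly every foo'th" operation is to fail each operation with probability 1/foo. The sketch below assumes that interpretation; `FailureInjector` is a hypothetical name, and treating a value of 0 as "disabled" is also an assumption.

```cpp
#include <cassert>
#include <random>

// Hypothetical failure injector: with n > 0, each call fails with
// probability 1/n, i.e. roughly every n'th operation. n == 0 disables it.
class FailureInjector {
  std::mt19937 rng;
  unsigned n;

public:
  FailureInjector(unsigned every_nth, unsigned seed)
      : rng(seed), n(every_nth) {}

  bool should_fail() {
    if (n == 0)
      return false;
    std::uniform_int_distribution<unsigned> dist(0, n - 1);
    return dist(rng) == 0;
  }
};
```

A caller would check `should_fail()` before each socket operation and treat a hit as if the operation had returned an error.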
14 years agomsgr: only close socket on reconnect or shutdown
Sage Weil [Fri, 12 Nov 2010 21:09:24 +0000 (13:09 -0800)]
msgr: only close socket on reconnect or shutdown

We can't modify 'sd' or (more importantly) close sd while any other thread
might be using it, or else we might race with an open and they might end
up using someone else's fd.

Take care to _only_ close(sd) in connect(), when the reader thread is
stopped, or when reaping the connection.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomsgr: protect pipe queuing with _both_ pipe and dispatch_queue locks
Sage Weil [Fri, 12 Nov 2010 21:41:14 +0000 (13:41 -0800)]
msgr: protect pipe queuing with _both_ pipe and dispatch_queue locks

We want to make sure the pipe's queue item doesn't go away.

Also, make queue_received() require pipe_lock to be held.  This avoids some
useless unlocking/locking, since (in the case where the pipe is already
queued) we then don't need to drop the pipe_lock at all.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agouclient: insert lssnap results under snapdir, not live dir
Sage Weil [Fri, 12 Nov 2010 15:55:41 +0000 (07:55 -0800)]
uclient: insert lssnap results under snapdir, not live dir

Put the readdir results (list of snapshots) in the right place in the
hierarchy; we were putting them in the parent dir (as if they were real
directories).

This bug manifested itself as a snaptest-2.sh failure.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomsg: fix buffer size for IPv6 address parsing
Wido den Hollander [Fri, 12 Nov 2010 15:36:00 +0000 (07:36 -0800)]
msg: fix buffer size for IPv6 address parsing

Signed-off-by: Wido den Hollander <wido@widodh.nl>
14 years agov0.23 v0.23
Sage Weil [Thu, 11 Nov 2010 00:34:17 +0000 (16:34 -0800)]
v0.23

14 years agomds: fix null_snapflush with multiple intervening snaps
Sage Weil [Thu, 11 Nov 2010 04:58:49 +0000 (20:58 -0800)]
mds: fix null_snapflush with multiple intervening snaps

The client is allowed to not send a snapflush if there is no dirty metadata
to write for a given snap.  However, the mds can only look up inodes by
the last snapid in the interval.  So, when doing a null_snapflush (filling
in for snapflushes the client didn't send), we have to walk forward through
intervening snaps until we find the right inode.

Note that this means we will call _do_snap_update multiple times on the
same inode, but with different snapids.

Add unit test to check this.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: fix inode->frag rstat projected with snaps
Sage Weil [Wed, 10 Nov 2010 17:43:56 +0000 (09:43 -0800)]
mds: fix inode->frag rstat projected with snaps

The snapid 'first' value needs to be >= inode->first; move that into
the helper.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoosdmap: break up asserts for easier debugging
Sage Weil [Wed, 10 Nov 2010 17:04:31 +0000 (09:04 -0800)]
osdmap: break up asserts for easier debugging

If we fail one of these it's helpful to know which one.

Signed-off-by: Sage Weil <sage@newdream.net>
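
As an illustration of the idea, a combined check such as assert(exists(osd) && is_up(osd)) can be split so that the failure message names the condition that did not hold. `MiniOSDMap` below is a hypothetical, much-reduced stand-in, not the real OSDMap.

```cpp
#include <cassert>
#include <set>

// Hypothetical, much-reduced stand-in for an OSD map.
struct MiniOSDMap {
  std::set<int> existing;
  std::set<int> up;

  bool exists(int osd) const { return existing.count(osd) > 0; }
  bool is_up(int osd) const { return up.count(osd) > 0; }

  int get_inst(int osd) const {
    // One condition per assert: a failure then reports exactly which
    // check was violated, instead of the whole conjunction.
    assert(exists(osd));
    assert(is_up(osd));
    return osd;
  }
};
```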
14 years agoobjecter: throttle before looking at lock protected state
Sage Weil [Wed, 10 Nov 2010 17:03:37 +0000 (09:03 -0800)]
objecter: throttle before looking at lock protected state

The take_op_budget() may drop our lock if we are in keep_balanced_budget
mode, so we need to do that _before_ we take references to internal state
that may change out from under us during that time.

This fixes a crash like

./osd/OSDMap.h: In function 'entity_inst_t OSDMap::get_inst(int)':
./osd/OSDMap.h:460: FAILED assert(exists(osd) && is_up(osd))
ceph version 0.22.1 (commit:c6f403a6f441184956e00659ce713eaee7014279)
1: (Objecter::op_submit(Objecter::Op*)+0x6c2) [0x38658854c2]
2: /usr/lib64/librados.so.1() [0x3865855dc9]
3: (RadosClient::aio_write(RadosClient::PoolCtx&, object_t, long,
ceph::buffer::list const&, unsigned long,
RadosClient::AioCompletion*)+0x24b) [0x386585724b]
4: (rados_aio_write()+0x9a) [0x386585741a]
5: /usr/bin/qemu-kvm() [0x45a305]
6: /usr/bin/qemu-kvm() [0x45a430]
7: /usr/bin/qemu-kvm() [0x43bb73]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.
terminate called after throwing an instance of 'ceph::FailedAssertion'
*** Caught signal (ABRT) ***
ceph version 0.22.1 (commit:c6f403a6f441184956e00659ce713eaee7014279)
1: (sigabrt_handler(int)+0x91) [0x3865922b91]
2: /lib64/libc.so.6() [0x3c0c032a30]
3: (gsignal()+0x35) [0x3c0c0329b5]
4: (abort()+0x175) [0x3c0c034195]
5: (__gnu_cxx::__verbose_terminate_handler()+0x12d) [0x3c110beaad]

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomon: drop unnecessary state checks
Sage Weil [Wed, 10 Nov 2010 16:50:25 +0000 (08:50 -0800)]
mon: drop unnecessary state checks

We want to ignore all beacons from the mds regardless of what state they
are in.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agodebian: don't explicitly depend on libgoogle-perftools0
Sage Weil [Wed, 10 Nov 2010 16:45:36 +0000 (08:45 -0800)]
debian: don't explicitly depend on libgoogle-perftools0

dpkg-buildpackage will autodetect the dependency.  Except on lenny, where
it doesn't exist and we don't use it!

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: Enable --journal_check mode.
Greg Farnum [Wed, 10 Nov 2010 16:11:23 +0000 (08:11 -0800)]
mds: Enable --journal_check mode.

This replaces the old --shadow option, which didn't work.
It starts up the MDS daemon, then replays the journal for
another MDS, and then shuts down.

Also minimally modifies the MDSMonitor to enable this
behavior, since it requires shared state.

14 years agoosdc: Fix bad assert in ~ObjectCacher.
Greg Farnum [Tue, 9 Nov 2010 18:48:00 +0000 (10:48 -0800)]
osdc: Fix bad assert in ~ObjectCacher.

The objects data member is never empty on shutdown since it now consists
of a vector of pools. Instead, check each pool map for emptiness.
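
A per-pool emptiness check of the kind described could be sketched as follows. The container layout (`std::vector` of maps) follows the commit message; the `PoolObjects` alias, its key and value types, and the function name are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical layout: one object map per pool, indexed by pool id.
using PoolObjects = std::map<std::string, int>;

// True when no pool holds any object. The outer vector can be non-empty
// even then (it keeps one slot per pool), so testing pools.empty()
// would be the wrong check.
bool all_pools_empty(const std::vector<PoolObjects>& pools) {
  return std::all_of(pools.begin(), pools.end(),
                     [](const PoolObjects& p) { return p.empty(); });
}
```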

14 years agouclient: only update inode if version increased
Sage Weil [Wed, 10 Nov 2010 15:42:29 +0000 (07:42 -0800)]
uclient: only update inode if version increased

This realigns the code with the kernel version, fixing a number of
problems when you have multiple MDSs returning info on the same inode.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agogui: add missing #include
Sage Weil [Tue, 9 Nov 2010 23:04:10 +0000 (15:04 -0800)]
gui: add missing #include

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoMakefile: use openssl module check
Kacper Kowalik [Tue, 9 Nov 2010 21:30:15 +0000 (13:30 -0800)]
Makefile: use openssl module check

This allows ceph to build with --as-needed.

Signed-off-by: Kacper Kowalik <xarthisius@gentoo.org>
14 years agoosd: shut down if we do not exist
Sage Weil [Tue, 9 Nov 2010 21:17:25 +0000 (13:17 -0800)]
osd: shut down if we do not exist

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoosd: handle osds that no longer exist in prior_set_affected
Sage Weil [Tue, 9 Nov 2010 21:08:56 +0000 (13:08 -0800)]
osd: handle osds that no longer exist in prior_set_affected

Consider no-longer-existent OSDs lost.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: fix inode freeze auth pin allowance
Sage Weil [Tue, 9 Nov 2010 17:55:14 +0000 (09:55 -0800)]
mds: fix inode freeze auth pin allowance

When we're renaming across nodes, we need to freeze the inode.  This
requires that we allow for the auth_pins that _we_ hold, which include
one because of the linklock xlock, and one by the MDRequest.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoosd: handle osds that no longer exist in build_prior
Sage Weil [Tue, 9 Nov 2010 17:43:25 +0000 (09:43 -0800)]
osd: handle osds that no longer exist in build_prior

Fix build_prior to handle OSDs that no longer exist in the current map.
Consider them lost.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoceph.spec.in: don't strip rados classes
Christian Brunner [Tue, 9 Nov 2010 06:03:02 +0000 (22:03 -0800)]
ceph.spec.in: don't strip rados classes

Signed-off-by: Christian Brunner <christian@brunner-muc.de>
14 years agomds: add missing Dumper.[h,cc]
Sage Weil [Sat, 6 Nov 2010 19:12:38 +0000 (12:12 -0700)]
mds: add missing Dumper.[h,cc]

14 years agomds: tolerate/fix negative dir size counts
Sage Weil [Mon, 8 Nov 2010 21:18:31 +0000 (13:18 -0800)]
mds: tolerate/fix negative dir size counts

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoMerge remote branch 'origin/testing' into unstable
Sage Weil [Sun, 7 Nov 2010 17:42:51 +0000 (09:42 -0800)]
Merge remote branch 'origin/testing' into unstable

14 years agomds: eval: put scatter in MIX if replicated, otherwise LOCK
Sage Weil [Sun, 7 Nov 2010 15:49:59 +0000 (07:49 -0800)]
mds: eval: put scatter in MIX if replicated, otherwise LOCK

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: do not scatter_writebehind in MIX state
Sage Weil [Sun, 7 Nov 2010 15:45:52 +0000 (07:45 -0800)]
mds: do not scatter_writebehind in MIX state

Replicas might come in while we're flushing and get a MIX state with
the old state.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoMerge branch 'unstable' into mix_stale
Sage Weil [Sun, 7 Nov 2010 04:05:11 +0000 (21:05 -0700)]
Merge branch 'unstable' into mix_stale

14 years agomds: remove MIX_STALE
Sage Weil [Sat, 6 Nov 2010 18:35:54 +0000 (11:35 -0700)]
mds: remove MIX_STALE

Yay, we don't need it!

If we can't update the frag on scatter, fine.  The staleness of the frag
is implicit in the frag's scatter stat version not matching the inode's.
If/when we do want to update it, the frag will clearly be writable, and
we can bring it back in sync then.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: don't fuss with versions when taking frag/rstat from frag; it's never stale...
Sage Weil [Sat, 6 Nov 2010 18:18:53 +0000 (11:18 -0700)]
mds: don't fuss with versions when taking frag/rstat from frag; it's never stale here

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: introduce/use helpers to resync stale fragstat/rstat; update version
Sage Weil [Sat, 6 Nov 2010 18:18:13 +0000 (11:18 -0700)]
mds: introduce/use helpers to resync stale fragstat/rstat; update version

Simplifies code.

Also, update the version when we resync!

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: ignore done_locking on slave requests' acquire_locks()
Sage Weil [Sun, 7 Nov 2010 03:55:12 +0000 (20:55 -0700)]
mds: ignore done_locking on slave requests' acquire_locks()

Slave requests ask for each xlock one at a time.  Don't bail out based on
the done_locking flag.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: don't use helper for rename srcdn
Sage Weil [Sun, 7 Nov 2010 03:17:32 +0000 (20:17 -0700)]
mds: don't use helper for rename srcdn

The rdlock_path_xlock_dentry helper works for _auth_ dentries that we
create locally in an auth dirfrag.  For the srcdn, we need to discover an
_existing_ dentry that is not necessarily auth.

Call path_traverse ourselves, but be careful to take the appropriate locks
on the resulting dn, dir, and ancestors.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: never complete a gather on a flushing lock
Sage Weil [Sat, 6 Nov 2010 18:02:13 +0000 (11:02 -0700)]
mds: never complete a gather on a flushing lock

The scatter_writebehind() takes a wrlock, but that may still allow the lock
to complete a gather to LOCK and even move to say MIX before the data is
committed.  Bad news!

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: update version when bringing stale rstat back up to date
Sage Weil [Sat, 6 Nov 2010 16:38:15 +0000 (09:38 -0700)]
mds: update version when bringing stale rstat back up to date

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: simplify stale semantics a bit
Sage Weil [Sat, 6 Nov 2010 14:58:32 +0000 (07:58 -0700)]
mds: simplify stale semantics a bit

is_stale() => next MIX is MIX_STALE. Stale flag is then cleared.  Then we
special case the import to preserve stale-ness.

TODO: add_replica_inode likely has this same problem.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: preserve stale state on import; some cleanup
Sage Weil [Sat, 6 Nov 2010 04:52:28 +0000 (21:52 -0700)]
mds: preserve stale state on import; some cleanup

Our new invariant is that MIX_STALE always implies is_stale().  And on
import, if is_stale(), MIX becomes MIX_STALE.  This ensures that a replica
that we put into MIX_STALE doesn't turn back into MIX if we import it
and take the auth's state in CInode::decode_import().

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoMerge branch 'mix_stale' into unstable
Sage Weil [Sat, 6 Nov 2010 00:08:10 +0000 (17:08 -0700)]
Merge branch 'mix_stale' into unstable

14 years agomds: add more verify_scatter asserts
Sage Weil [Sat, 6 Nov 2010 00:06:10 +0000 (17:06 -0700)]
mds: add more verify_scatter asserts

For catching fragstat errors sooner.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: fix version check on resyncing stale rstat in predirty_journal_parents
Sage Weil [Fri, 5 Nov 2010 22:24:53 +0000 (15:24 -0700)]
mds: fix version check on resyncing stale rstat in predirty_journal_parents

We're resyncing rstat, so check the rstat version (not fragstat!)

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: Fix bad inode deref.
Greg Farnum [Fri, 5 Nov 2010 19:45:06 +0000 (12:45 -0700)]
mds: Fix bad inode deref.

Accidentally trying to print out the CInode after removing it in trim_non_auth!
Move the print to before it's been unlinked/removed/etc.

14 years agoRevisit std::multimap decoder
Colin Patrick McCabe [Fri, 5 Nov 2010 19:17:40 +0000 (12:17 -0700)]
Revisit std::multimap decoder

Previously I changed the std::multimap decoder to minimize the number of
constructor invocations. However, it can be much more expensive to
copy an initialized (decoded) val_t than to copy an empty one, for
example when decoding a std::multimap<int, std::set<int>>. So
change the code to insert a non-decoded val_t again.

However, this still saves two constructor invocations over the original.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
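
The "insert an empty value, then decode into it" approach can be sketched like this. The wire format (a flat vector of ints) and the function names are assumptions made for the example; the real decoder works on Ceph buffer lists.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <utility>
#include <vector>

// Toy wire format (an assumption for this sketch): a flat vector of ints
// laid out as  count, then per entry: key, set_size, set elements...
using Buf = std::vector<int>;

void decode_set(std::set<int>& s, const Buf& b, std::size_t& pos) {
  int n = b[pos++];
  for (int i = 0; i < n; ++i)
    s.insert(b[pos++]);
}

void decode_multimap(std::multimap<int, std::set<int>>& m,
                     const Buf& b, std::size_t& pos) {
  m.clear();
  int n = b[pos++];
  for (int i = 0; i < n; ++i) {
    int key = b[pos++];
    // Insert an empty value first (cheap to copy), then decode straight
    // into the element that now lives inside the map.
    auto it = m.insert(std::make_pair(key, std::set<int>()));
    decode_set(it->second, b, pos);
  }
}
```

Decoding into a temporary and inserting it afterwards would copy a fully populated set for every entry, which is the cost the commit message is concerned about.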
14 years agoautogen.sh: check for pkg-config
Colin Patrick McCabe [Fri, 5 Nov 2010 18:34:11 +0000 (11:34 -0700)]
autogen.sh: check for pkg-config

To avoid seeing confusing errors later in the configure process, in
autogen.sh, check to make sure the pkg-config program is installed.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agomds: preserve version when recovering rstat from dirfrag in predirty_journal_parents
Sage Weil [Fri, 5 Nov 2010 17:38:35 +0000 (10:38 -0700)]
mds: preserve version when recovering rstat from dirfrag in predirty_journal_parents

We don't want to screw up the version here.  This aligns the code with
other instances of this check.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: restructure finish_scatter_gather_update()
Sage Weil [Fri, 5 Nov 2010 06:20:33 +0000 (23:20 -0700)]
mds: restructure finish_scatter_gather_update()

Separate behavior into two dimensions: whether or not we are updating
the dirfrag, and whether or not the dirfrag is stale.

Change the various helpers to NOT implicitly update accounted_*, as the
caller doesn't always want that, notably when we are non-stale but frozen.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: do not bump scatter stat lock in predirty_journal_parents
Sage Weil [Fri, 5 Nov 2010 06:15:06 +0000 (23:15 -0700)]
mds: do not bump scatter stat lock in predirty_journal_parents

If we're in the MIX state, we clearly can't touch this without screwing up
the delicate scatter/gather behavior.  If we're in, say, LOCK, there is
still no reason to update it.  One frag at least is local and auth if we
are in this code, but there may be other frags on other nodes.  This would
just make them appear stale when they are not.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: mark scatterlock stale on import of stale frag scatter stat
Sage Weil [Fri, 5 Nov 2010 05:48:09 +0000 (22:48 -0700)]
mds: mark scatterlock stale on import of stale frag scatter stat

When the lock scattered, if we didn't have an auth frag that was frozen,
we go into MIX state.  Later, we may import a stale dirfrag.  We need to
move to MIX_STALE at that point, and/or mark the lock stale so that any
subsequent transition does so.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: match bottom half of assimilate_dirty_rstat_inodes with a dir flag
Sage Weil [Fri, 5 Nov 2010 05:44:01 +0000 (22:44 -0700)]
mds: match bottom half of assimilate_dirty_rstat_inodes with a dir flag

We only do the assimilate_dirty_rstat_inodes if we do an update AND the
frag rstat was non-stale, but the bottom half (_finish) doesn't have the
same info to know whether we did it because the top half updates the
fragstat version.  Use a flag to indicate we've updated the dirfrag so
the bottom half will only run when needed.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: fix inode version used for inest in decode_lock_state
Sage Weil [Fri, 5 Nov 2010 05:19:53 +0000 (22:19 -0700)]
mds: fix inode version used for inest in decode_lock_state

We need to pass the inode rstat's version into finish_scatter_update, not
the shadowed local variable.  Otherwise we don't update the dirfrag when
we should.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoPGMonitor::update_from_paxos: check for bad input
Colin Patrick McCabe [Thu, 4 Nov 2010 22:46:55 +0000 (15:46 -0700)]
PGMonitor::update_from_paxos: check for bad input

Be more robust against bad data coming in from the network.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoReplace sprintf with snprintf
Colin Patrick McCabe [Thu, 4 Nov 2010 21:33:48 +0000 (14:33 -0700)]
Replace sprintf with snprintf

Replace sprintf with snprintf. This is especially critical when the
format string includes "%s".

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
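
As a small illustration of why the bounded variant matters with "%s", the sketch below formats a caller-supplied string into a fixed buffer and reports truncation. `format_name` and the "osd." prefix are hypothetical and chosen only for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Formats "osd.<name>" into buf. Returns true if the whole string fit.
// snprintf writes at most `size` bytes including the terminating NUL,
// and returns the length the full string would have had.
bool format_name(char* buf, std::size_t size, const char* name) {
  int n = std::snprintf(buf, size, "osd.%s", name);
  return n >= 0 && static_cast<std::size_t>(n) < size;
}
```

With sprintf the same call would write past the end of `buf` whenever `name` is longer than expected; here the output is cut short and the caller can detect it from the return value.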
14 years agostart_profiler/enable_profiler_options:fix memleak
Colin Patrick McCabe [Thu, 4 Nov 2010 21:26:08 +0000 (14:26 -0700)]
start_profiler/enable_profiler_options:fix memleak

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoSet HEAP_PROFILE_INUSE_INTERVAL based on conf
Colin Patrick McCabe [Thu, 4 Nov 2010 21:11:41 +0000 (14:11 -0700)]
Set HEAP_PROFILE_INUSE_INTERVAL based on conf

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoCInode::make_path_string: don't coerce ino
Colin Patrick McCabe [Thu, 4 Nov 2010 21:06:09 +0000 (14:06 -0700)]
CInode::make_path_string: don't coerce ino

CInode::make_path_string: don't coerce the inode number to 32-bits.
Everyone else is treating it as 64 bits; this function should too.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
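
A sketch of formatting an inode number at its full width is shown below. The helper name and the "#<hex>" output format are assumptions for illustration, not the actual behavior of CInode::make_path_string.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: render an inode number as "#<hex>". The value is
// passed and printed as a full uint64_t; casting it to a 32-bit type
// first would silently drop the high bits.
std::string ino_to_string(uint64_t ino) {
  char buf[32];
  std::snprintf(buf, sizeof(buf), "#%" PRIx64, ino);
  return std::string(buf);
}
```

Using the PRIx64 macro keeps the format specifier matched to uint64_t regardless of whether the platform defines it as long or long long.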
14 years agomds: mds debug scatterstat to print out projected rstat/fragstat
Sage Weil [Thu, 4 Nov 2010 20:17:01 +0000 (13:17 -0700)]
mds: mds debug scatterstat to print out projected rstat/fragstat

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: verify single frag rstat on projection too
Sage Weil [Thu, 4 Nov 2010 20:04:47 +0000 (13:04 -0700)]
mds: verify single frag rstat on projection too

Currently we do a sanity check on gather; do the same check in
project_rstat_frag_to_inode().

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoMerge branch 'dumpjournal' into unstable
Greg Farnum [Thu, 4 Nov 2010 18:58:30 +0000 (11:58 -0700)]
Merge branch 'dumpjournal' into unstable

14 years agocmds: Include journal dumper functionality.
Greg Farnum [Thu, 4 Nov 2010 18:30:59 +0000 (11:30 -0700)]
cmds: Include journal dumper functionality.

14 years agodumper: Add new Dumper class.
Greg Farnum [Thu, 4 Nov 2010 18:30:38 +0000 (11:30 -0700)]
dumper: Add new Dumper class.

This lets you dump an MDS journal to a file.

14 years agomds: fix optional frag asserts
Sage Weil [Thu, 4 Nov 2010 18:33:49 +0000 (11:33 -0700)]
mds: fix optional frag asserts

We want these to trigger when mds_verify_scatter is true.  Only one !.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoobjecter: add new wait_for_osd_map function.
Greg Farnum [Thu, 4 Nov 2010 18:28:52 +0000 (11:28 -0700)]
objecter: add new wait_for_osd_map function.

14 years agoosd: clean up active <-> booting state transitions
Sage Weil [Thu, 4 Nov 2010 18:13:14 +0000 (11:13 -0700)]
osd: clean up active <-> booting state transitions

Among other things, get rid of the 'wrongly marked down' log message on
normal startup.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoTestEncoding: count number of ctor invocations
Colin Patrick McCabe [Thu, 4 Nov 2010 17:24:52 +0000 (10:24 -0700)]
TestEncoding: count number of ctor invocations

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agomds: dump corrupt events; optionally skip them
Sage Weil [Thu, 4 Nov 2010 04:30:11 +0000 (21:30 -0700)]
mds: dump corrupt events; optionally skip them

If we encounter a bad event in the journal, dump it to the log.

Optionally skip it, if 'mds log skip corrupt events = true'.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: wait for last_failure_osd_epoch before starting journal replay
Sage Weil [Thu, 4 Nov 2010 05:22:54 +0000 (22:22 -0700)]
mds: wait for last_failure_osd_epoch before starting journal replay

This is extremely important, and it forces the MDS to get the osdmap that
includes the blacklist entry for its predecessor.  This in turn means that
any OSD we contact trying to read the journal will be forced to get that
osdmap (or newer) before handling our read request, which means that
anything we read cannot be overwritten by a racing request from our
predecessor.  This prevents two MDSs writing to the journal at the same
time.

This change fixes potential (and observed!) journal corruption.

Signed-off-by: Sage Weil <sage@newdream.net>
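
The gating condition described above can be sketched as a simple epoch comparison. `ReplayGate` and its members are hypothetical names; only `last_failure_osd_epoch` comes from the commit message, and the real MDS waits asynchronously rather than polling a predicate.

```cpp
#include <cassert>
#include <cstdint>

using epoch_t = uint32_t;

// Hypothetical boot-time gate for an MDS taking over a failed rank.
struct ReplayGate {
  epoch_t last_failure_osd_epoch;  // taken from the mdsmap
  epoch_t have_osd_epoch = 0;      // newest osdmap epoch we hold

  explicit ReplayGate(epoch_t need) : last_failure_osd_epoch(need) {}

  // Record a received osdmap; older maps never move us backwards.
  void got_osdmap(epoch_t e) {
    if (e > have_osd_epoch)
      have_osd_epoch = e;
  }

  // Journal reads may start only once our osdmap is at least as new as
  // the one that blacklisted the predecessor.
  bool can_start_replay() const {
    return have_osd_epoch >= last_failure_osd_epoch;
  }
};
```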
14 years agomon: blacklist and update last_failure_osd_epoch in all failure paths
Sage Weil [Thu, 4 Nov 2010 05:20:25 +0000 (22:20 -0700)]
mon: blacklist and update last_failure_osd_epoch in all failure paths

This includes the pure failure in do_stop(), and the explicit admin
fail command.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomon: update mdsmap.last_failure_osd_epoch when blacklisting
Sage Weil [Thu, 4 Nov 2010 05:28:54 +0000 (22:28 -0700)]
mon: update mdsmap.last_failure_osd_epoch when blacklisting

We need to note the osdmap epoch the taking-over mds needs in the mdsmap.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: add last_failure_osd_epoch to extended section of mdsmap
Sage Weil [Thu, 4 Nov 2010 05:10:46 +0000 (22:10 -0700)]
mds: add last_failure_osd_epoch to extended section of mdsmap

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoclient: print useful max_size waiting message
Sage Weil [Wed, 3 Nov 2010 23:41:29 +0000 (16:41 -0700)]
client: print useful max_size waiting message

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoMerge branch 'mix_stale' into unstable
Sage Weil [Wed, 3 Nov 2010 23:40:19 +0000 (16:40 -0700)]
Merge branch 'mix_stale' into unstable

14 years agodebian: add gtk build-depends
Sage Weil [Wed, 3 Nov 2010 16:44:22 +0000 (09:44 -0700)]
debian: add gtk build-depends

For ceph -g.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: add 'mds verify scatter' and re-add some scatter asserts
Sage Weil [Wed, 3 Nov 2010 21:02:30 +0000 (14:02 -0700)]
mds: add 'mds verify scatter' and re-add some scatter asserts

Check on ifile and inest gather that stats match single-frag dirs.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: fix put_xlock() assert for slave masters
Sage Weil [Wed, 3 Nov 2010 20:51:07 +0000 (13:51 -0700)]
mds: fix put_xlock() assert for slave masters

If we are a master of a slave, the state will be LOCK.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: rename 'mix stale' => 'mix_stale'
Sage Weil [Wed, 3 Nov 2010 20:16:06 +0000 (13:16 -0700)]
mds: rename 'mix stale' => 'mix_stale'

For unambiguous debug output.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: request unscatter when MIX_STALE on replica
Sage Weil [Wed, 3 Nov 2010 20:15:43 +0000 (13:15 -0700)]
mds: request unscatter when MIX_STALE on replica

This means implementing REQUNSCATTER.

Eventually this should use TEMPSYNC, but that isn't fully implemented yet.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: disable tempsync
Sage Weil [Wed, 3 Nov 2010 20:09:18 +0000 (13:09 -0700)]
mds: disable tempsync

Tempsync is not implemented in the filelock state machine.  Never use it,
at least for now!

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: finish_scatter_update on auth dirfrags too
Sage Weil [Wed, 3 Nov 2010 21:31:19 +0000 (14:31 -0700)]
mds: finish_scatter_update on auth dirfrags too

We can update the dirfrag accounted on auth dirfrags at scatter time too.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: use helper for scatter dirfrag update; use on local dirfrags
Sage Weil [Wed, 3 Nov 2010 20:08:06 +0000 (13:08 -0700)]
mds: use helper for scatter dirfrag update; use on local dirfrags

Any time we scatter is an opportunity to update the dirfrag with the
accounted scatter stat if it is out of date.  We should use that
opportunity even when the dirfrag is on the same node as the inode (i.e.,
not just through decode_lock_state).

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoAdd the ps-ceph.sh tool
Colin Patrick McCabe [Wed, 3 Nov 2010 19:52:49 +0000 (12:52 -0700)]
Add the ps-ceph.sh tool

This allows you to see at a glance which ceph programs and tools you
have running.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoencoding.h: fix compiler warning
Colin Patrick McCabe [Wed, 3 Nov 2010 19:15:20 +0000 (12:15 -0700)]
encoding.h: fix compiler warning

Fix a compiler warning about an uninitialized variable. Basically, we
used to insert uninitialized values into a std::multimap and then fix
them later. Rather than doing that, just insert the value we want
directly into the map.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoTestEncoding: add templated encode-then-decode fn
Colin Patrick McCabe [Wed, 3 Nov 2010 19:14:21 +0000 (12:14 -0700)]
TestEncoding: add templated encode-then-decode fn

TestEncoding: add a templated encode-then-decode fn that can be used to
test encoding followed by decoding of any type. Test encoding and
decoding of a std::multimap.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
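
A templated encode-then-decode helper of the kind this commit mentions might be shaped like the sketch below. The toy byte buffer, the `encode`/`decode` overloads and the name `round_trip` are all assumptions; the real test uses Ceph's own encoding functions and buffer lists. The int codec assumes a 32-bit int.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Toy byte buffer and little-endian int codec (assumptions for this sketch).
using Buf = std::vector<unsigned char>;

void encode(int v, Buf& b) {
  unsigned u = static_cast<unsigned>(v);
  for (int i = 0; i < 4; ++i)
    b.push_back(static_cast<unsigned char>((u >> (8 * i)) & 0xff));
}

void decode(int& v, const Buf& b, std::size_t& pos) {
  unsigned u = 0;
  for (int i = 0; i < 4; ++i)
    u |= static_cast<unsigned>(b[pos++]) << (8 * i);
  v = static_cast<int>(u);
}

void encode(const std::multimap<int, int>& m, Buf& b) {
  encode(static_cast<int>(m.size()), b);
  for (const auto& kv : m) {
    encode(kv.first, b);
    encode(kv.second, b);
  }
}

void decode(std::multimap<int, int>& m, const Buf& b, std::size_t& pos) {
  m.clear();
  int n = 0;
  decode(n, b, pos);
  for (int i = 0; i < n; ++i) {
    int k = 0, v = 0;
    decode(k, b, pos);
    decode(v, b, pos);
    m.insert({k, v});
  }
}

// Encode `in`, decode the bytes into a fresh object, and report whether
// the result equals the input and the whole buffer was consumed.
template <typename T>
bool round_trip(const T& in) {
  Buf b;
  encode(in, b);
  T out{};
  std::size_t pos = 0;
  decode(out, b, pos);
  return pos == b.size() && out == in;
}
```

Because the helper only relies on `encode`, `decode` and `operator==`, any type that provides those three can be tested with the same call.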
14 years agoCreate TestEncoding to test serialization code
Colin Patrick McCabe [Wed, 3 Nov 2010 19:03:32 +0000 (12:03 -0700)]
Create TestEncoding to test serialization code

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agomds: add some scatterlock notes
Sage Weil [Wed, 3 Nov 2010 18:07:40 +0000 (11:07 -0700)]
mds: add some scatterlock notes

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoceph: remove bad assert for old frag stat
Sage Weil [Wed, 3 Nov 2010 18:03:37 +0000 (11:03 -0700)]
ceph: remove bad assert for old frag stat

It's normal for old fragstat info to be mismatched (stat !=
accounted_stat).

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: match conditions in finish_scatter_gather_update_accounted
Sage Weil [Wed, 3 Nov 2010 17:51:42 +0000 (10:51 -0700)]
mds: match conditions in finish_scatter_gather_update_accounted

This needs to match the frozen check in finish_scatter_gather_update.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: handle MIX_STALE on auth too
Sage Weil [Wed, 3 Nov 2010 17:12:35 +0000 (10:12 -0700)]
mds: handle MIX_STALE on auth too

Signed-off-by: Sage Weil <sage@newdream.net>