git.apps.os.sepia.ceph.com Git - ceph.git/log
ceph.git
14 years agoMerge branch 'rc' into unstable
Sage Weil [Thu, 11 Nov 2010 22:28:18 +0000 (14:28 -0800)]
Merge branch 'rc' into unstable

Conflicts:
configure.ac
src/Makefile.am

14 years agov0.23 v0.23
Sage Weil [Thu, 11 Nov 2010 00:34:17 +0000 (16:34 -0800)]
v0.23

14 years agomds: fix null_snapflush with multiple intervening snaps
Sage Weil [Thu, 11 Nov 2010 04:58:49 +0000 (20:58 -0800)]
mds: fix null_snapflush with multiple intervening snaps

The client is allowed to not send a snapflush if there is no dirty metadata
to write for a given snap.  However, the mds can only look up inodes by
the last snapid in the interval.  So, when doing a null_snapflush (filling
in for snapflushes the client didn't send), we have to walk forward through
intervening snaps until we find the right inode.

Note that this means we will call _do_snap_update multiple times on the
same inode, but with different snapids.

Add unit test to check this.

Signed-off-by: Sage Weil <sage@newdream.net>
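
The forward walk described in the message above can be sketched as follows. This is a toy model, not the MDS code: SnappedInode, InodeIndex, and find_for_null_snapflush are hypothetical names, and the only property taken from the message is that an inode can be looked up solely by the last snapid of its interval.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>

using snapid_t = std::uint64_t;

// Hypothetical stand-in for a snapped inode covering snapids [first, last].
struct SnappedInode {
  snapid_t first;
  snapid_t last;
};

// Hypothetical index: an inode can only be found by the last snapid it covers.
using InodeIndex = std::map<snapid_t, SnappedInode>;

// Starting at 'snapid', step forward through the existing snaps until one of
// them is the 'last' key of an inode, then check that inode covers 'snapid'.
const SnappedInode* find_for_null_snapflush(const InodeIndex& by_last,
                                            const std::set<snapid_t>& snaps,
                                            snapid_t snapid) {
  for (auto it = snaps.lower_bound(snapid); it != snaps.end(); ++it) {
    auto found = by_last.find(*it);
    if (found == by_last.end())
      continue;  // intervening snap with no inode of its own; keep walking
    return found->second.first <= snapid ? &found->second : nullptr;
  }
  return nullptr;
}
```

With snaps {2, 4, 6} and a single inode covering [3, 6], a lookup for snap 4 walks past 4 and lands on the inode keyed by 6, while a lookup for snap 2 finds nothing that covers it.
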
14 years agoMerge remote branch 'origin/unfound' into unstable
Sage Weil [Thu, 11 Nov 2010 00:36:18 +0000 (16:36 -0800)]
Merge remote branch 'origin/unfound' into unstable

14 years agoosd: scrub least recently scrubbed pgs first; once a day
Sage Weil [Thu, 11 Nov 2010 00:31:26 +0000 (16:31 -0800)]
osd: scrub least recently scrubbed pgs first; once a day

Signed-off-by: Sage Weil <sage@newdream.net>
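
A minimal sketch of the ordering named in the title above, assuming each PG carries a last-scrub timestamp in seconds. PgScrubInfo and scrub_order are hypothetical names for illustration; the real scheduler also has to reserve replicas, which is not modeled here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical record: a PG name plus the time (seconds) of its last scrub.
struct PgScrubInfo {
  std::string pgid;
  std::int64_t last_scrub;
};

// Return the PGs whose last scrub is at least 'min_interval' seconds old,
// ordered so the least recently scrubbed PG comes first.
std::vector<std::string> scrub_order(std::vector<PgScrubInfo> pgs,
                                     std::int64_t now,
                                     std::int64_t min_interval = 24 * 60 * 60) {
  std::sort(pgs.begin(), pgs.end(),
            [](const PgScrubInfo& a, const PgScrubInfo& b) {
              return a.last_scrub < b.last_scrub;
            });
  std::vector<std::string> due;
  for (const auto& pg : pgs)
    if (now - pg.last_scrub >= min_interval)
      due.push_back(pg.pgid);
  return due;
}
```
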
14 years agoosd: don't scrub something we just scrubbed
Sage Weil [Wed, 10 Nov 2010 23:43:37 +0000 (15:43 -0800)]
osd: don't scrub something we just scrubbed

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoosd: call sched_scrub on reserve reply
Sage Weil [Wed, 10 Nov 2010 23:33:31 +0000 (15:33 -0800)]
osd: call sched_scrub on reserve reply

Otherwise we have to wait until the next time it's called by the timer, and
during that period we have a reservation locally, and any other peers can't
reserve a scrub from us, and nobody makes any progress.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoosd: fix sched_scrub
Sage Weil [Wed, 10 Nov 2010 23:28:39 +0000 (15:28 -0800)]
osd: fix sched_scrub

Insert whoami into reserved set on primary, not 0!  Also more cleanup of
sched state helpers.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoosd: do scrub schedule state changes inside scrub()
Sage Weil [Wed, 10 Nov 2010 22:58:34 +0000 (14:58 -0800)]
osd: do scrub schedule state changes inside scrub()

Update these values under protection of pg lock iff we start scrubbing,
otherwise back out.

On scrub completion, unreserve replicas.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoosd: track last_scrubbed in PG::Info::History
Sage Weil [Wed, 10 Nov 2010 22:55:57 +0000 (14:55 -0800)]
osd: track last_scrubbed in PG::Info::History

Share with peers and write to disk on scrub completion.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoosd: scrub: change cancel behavior
Sage Weil [Wed, 10 Nov 2010 22:15:41 +0000 (14:15 -0800)]
osd: scrub: change cancel behavior

Use explicit flag, so that scrub_reserved always indicates whether the
osd count includes us or not.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agopg_state_string: use an ostringstream
Colin Patrick McCabe [Wed, 10 Nov 2010 22:43:26 +0000 (14:43 -0800)]
pg_state_string: use an ostringstream

Use an ostringstream for efficiency's sake.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
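
For illustration, building a state string with an ostringstream might look like the sketch below. The flag names and the "inactive" fallback are assumptions made for the example, not the actual PG state bits.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical state bits, for illustration only.
enum : unsigned {
  STATE_ACTIVE = 1u << 0,
  STATE_CLEAN = 1u << 1,
  STATE_SCRUBBING = 1u << 2,
};

// Append each set flag to one stream instead of concatenating temporaries.
std::string state_string(unsigned state) {
  std::ostringstream oss;
  if (state & STATE_ACTIVE) oss << "active+";
  if (state & STATE_CLEAN) oss << "clean+";
  if (state & STATE_SCRUBBING) oss << "scrubbing+";
  std::string ret = oss.str();
  if (ret.empty())
    return "inactive";
  ret.pop_back();  // drop the trailing '+'
  return ret;
}
```
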
14 years agovstart: stop logging to /tmp/foo
Sage Weil [Wed, 10 Nov 2010 21:49:22 +0000 (13:49 -0800)]
vstart: stop logging to /tmp/foo

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoosd: fix scrub reserved state when starting scrub
Sage Weil [Wed, 10 Nov 2010 21:39:51 +0000 (13:39 -0800)]
osd: fix scrub reserved state when starting scrub

Also document scrub scheduling/pending/active states.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agovstart: turn down msgr debugging
Sage Weil [Wed, 10 Nov 2010 21:16:34 +0000 (13:16 -0800)]
vstart: turn down msgr debugging

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomonc: cancel timer events with lock held
Sage Weil [Wed, 10 Nov 2010 21:13:38 +0000 (13:13 -0800)]
monc: cancel timer events with lock held

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoWake up clients waiting for now-found objects
Colin Patrick McCabe [Tue, 9 Nov 2010 06:15:14 +0000 (22:15 -0800)]
Wake up clients waiting for now-found objects

PG::search_for_missing: when we find a previously unfound object, check
to see if there is an entry in waiting_for_missing_object representing a
client waiting for this object.

PG::repair_object: assert that waiting_for_missing_object is empty
before messing with missing_loc. It definitely should be during a scrub.

ReplicatedPG role change logic: always take_object_waiters on the wait
queues when the PG acting set changes.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
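
The wake-up step for a newly found object can be sketched roughly as below. Waiters and requeue_waiters are hypothetical stand-ins for waiting_for_missing_object and the requeue path; ops are reduced to integer ids.

```cpp
#include <cassert>
#include <list>
#include <map>
#include <string>

// Hypothetical wait queue: object name -> ids of client ops blocked on it.
using Waiters = std::map<std::string, std::list<int>>;

// When 'oid' is found, move every op waiting on it to the runnable queue
// and drop the now-empty wait entry.
void requeue_waiters(Waiters& waiting, const std::string& oid,
                     std::list<int>& runnable) {
  auto it = waiting.find(oid);
  if (it == waiting.end())
    return;  // nobody was waiting for this object
  runnable.splice(runnable.end(), it->second);
  waiting.erase(it);
}
```
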
14 years agotest_unfound.sh: test reading an unfound object.
Colin Patrick McCabe [Mon, 8 Nov 2010 20:30:26 +0000 (12:30 -0800)]
test_unfound.sh: test reading an unfound object.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agotest_unfound.sh: verify that we have unfound objs
Colin Patrick McCabe [Fri, 5 Nov 2010 00:28:39 +0000 (17:28 -0700)]
test_unfound.sh: verify that we have unfound objs

test_unfound.sh: verify that we have unfound objs.
Then, when we bring up the other OSD, verify that those unfound objects
are found (on that OSD).

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoAdd num_objects_unfound to struct pg_stat_t
Colin Patrick McCabe [Thu, 4 Nov 2010 21:40:22 +0000 (14:40 -0700)]
Add num_objects_unfound to struct pg_stat_t

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agotest_unfound.sh: shorter test
Colin Patrick McCabe [Wed, 3 Nov 2010 00:58:52 +0000 (17:58 -0700)]
test_unfound.sh: shorter test

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoPG::recover_master_log: rename a local variable
Colin Patrick McCabe [Wed, 3 Nov 2010 00:56:00 +0000 (17:56 -0700)]
PG::recover_master_log: rename a local variable

PG::recover_master_log: rename a local variable to avoid using the
overloaded term "missing".

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoOSD::_process_pg_info:search_for_missing sometimes
Colin Patrick McCabe [Wed, 3 Nov 2010 00:51:02 +0000 (17:51 -0700)]
OSD::_process_pg_info:search_for_missing sometimes

OSD::_process_pg_info: If we're the primary for this active PG, and we
have missing objects, call search_for_missing. This should ensure that
we know where to find our missing objects.

The reason why this wasn't there before is that previously, we kept the
PG from going active until all the missing objects were found.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoAdd PG::Missing::have_missing()
Colin Patrick McCabe [Wed, 3 Nov 2010 00:50:28 +0000 (17:50 -0700)]
Add PG::Missing::have_missing()

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoPG::search_for_missing: minor refactoring, comment
Colin Patrick McCabe [Wed, 3 Nov 2010 00:49:37 +0000 (17:49 -0700)]
PG::search_for_missing: minor refactoring, comment

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoPG::peer: don't block if objects are unfound
Colin Patrick McCabe [Wed, 3 Nov 2010 00:42:48 +0000 (17:42 -0700)]
PG::peer: don't block if objects are unfound

Erase the code in PG::peer that used to keep us from becoming active
when objects were still unfound. Print out the number of missing and
unfound objects at the end of PG::peer.

Erase PG::check_for_lost_objects and PG::forget_lost_objects.

14 years agoPG::peer: count/find cleanup
Colin Patrick McCabe [Wed, 3 Nov 2010 00:38:16 +0000 (17:38 -0700)]
PG::peer: count/find cleanup

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoPG.h erase deadcode
Colin Patrick McCabe [Wed, 3 Nov 2010 00:36:11 +0000 (17:36 -0700)]
PG.h erase deadcode

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoPG: nomenclature change: talk about unfound objs
Colin Patrick McCabe [Wed, 3 Nov 2010 00:34:07 +0000 (17:34 -0700)]
PG: nomenclature change: talk about unfound objs

Describe objects as "unfound" when we don't know what OSD has them.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoPG: move ostream operator to .cpp file
Colin Patrick McCabe [Wed, 3 Nov 2010 00:31:20 +0000 (17:31 -0700)]
PG: move ostream operator to .cpp file

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agomds: fix inode->frag rstat projected with snaps
Sage Weil [Wed, 10 Nov 2010 17:43:56 +0000 (09:43 -0800)]
mds: fix inode->frag rstat projected with snaps

The snapid 'first' value needs to be >= inode->first; move that into
the helper.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoosdmap: break up asserts for easier debugging
Sage Weil [Wed, 10 Nov 2010 17:04:31 +0000 (09:04 -0800)]
osdmap: break up asserts for easier debugging

If we fail one of these it's helpful to know which one.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoobjecter: throttle before looking at lock protected state
Sage Weil [Wed, 10 Nov 2010 17:03:37 +0000 (09:03 -0800)]
objecter: throttle before looking at lock protected state

The take_op_budget() may drop our lock if we are in keep_balanced_budget
mode, so we need to do that _before_ we take references to internal state
that may change out from under us during that time.

This fixes a crash like

./osd/OSDMap.h: In function 'entity_inst_t OSDMap::get_inst(int)':
./osd/OSDMap.h:460: FAILED assert(exists(osd) && is_up(osd))
ceph version 0.22.1 (commit:c6f403a6f441184956e00659ce713eaee7014279)
1: (Objecter::op_submit(Objecter::Op*)+0x6c2) [0x38658854c2]
2: /usr/lib64/librados.so.1() [0x3865855dc9]
3: (RadosClient::aio_write(RadosClient::PoolCtx&, object_t, long,
ceph::buffer::list const&, unsigned long,
RadosClient::AioCompletion*)+0x24b) [0x386585724b]
4: (rados_aio_write()+0x9a) [0x386585741a]
5: /usr/bin/qemu-kvm() [0x45a305]
6: /usr/bin/qemu-kvm() [0x45a430]
7: /usr/bin/qemu-kvm() [0x43bb73]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.
terminate called after throwing an instance of 'ceph::FailedAssertion'
*** Caught signal (ABRT) ***
ceph version 0.22.1 (commit:c6f403a6f441184956e00659ce713eaee7014279)
1: (sigabrt_handler(int)+0x91) [0x3865922b91]
2: /lib64/libc.so.6() [0x3c0c032a30]
3: (gsignal()+0x35) [0x3c0c0329b5]
4: (abort()+0x175) [0x3c0c034195]
5: (__gnu_cxx::__verbose_terminate_handler()+0x12d) [0x3c110beaad]

Signed-off-by: Sage Weil <sage@newdream.net>
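
The ordering fix in the message above (throttle first, then touch lock-protected state) can be sketched like this. MiniObjecter, take_budget, and submit are hypothetical names; the throttle here only drops and retakes the lock to show where state could change.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <string>

struct MiniObjecter {
  std::mutex lock;
  std::map<int, std::string> osd_addrs;  // lock-protected state
  int budget = 0;

  // Stand-in for a throttle that may drop and retake the lock while waiting.
  void take_budget(std::unique_lock<std::mutex>& l, int cost) {
    l.unlock();
    // A real throttle would block here; osd_addrs may change meanwhile.
    l.lock();
    budget += cost;
  }

  bool submit(int osd, int cost, std::string* addr_out) {
    std::unique_lock<std::mutex> l(lock);
    take_budget(l, cost);           // 1. throttle before anything else
    auto it = osd_addrs.find(osd);  // 2. only now look at protected state
    if (it == osd_addrs.end())
      return false;
    *addr_out = it->second;
    return true;
  }
};
```

Taking the iterator before calling take_budget would leave it pointing into a map that another thread may have modified while the lock was dropped.
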
14 years agomon: drop unnecessary state checks
Sage Weil [Wed, 10 Nov 2010 16:50:25 +0000 (08:50 -0800)]
mon: drop unnecessary state checks

We want to ignore all beacons from the mds regardless of what state they
are in.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agodebian: don't explicitly depend on libgoogle-perftools0
Sage Weil [Wed, 10 Nov 2010 16:45:36 +0000 (08:45 -0800)]
debian: don't explicitly depend on libgoogle-perftools0

dpkg-buildpackage will autodetect the dependency.  Except on lenny, where
it doesn't exist and we don't use it!

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: Enable --journal_check mode.
Greg Farnum [Wed, 10 Nov 2010 16:11:23 +0000 (08:11 -0800)]
mds: Enable --journal_check mode.

This replaces the old --shadow option, which didn't work.
It starts up the MDS daemon, then replays the journal for
another MDS, and then shuts down.

Also minimally modifies the MDSMonitor to enable this
behavior, since it requires shared state.

14 years agoosdc: Fix bad assert in ~ObjectCacher.
Greg Farnum [Tue, 9 Nov 2010 18:48:00 +0000 (10:48 -0800)]
osdc: Fix bad assert in ~ObjectCacher.

The objects data member is never empty on shutdown since it now consists
of a vector of pools. Instead, check each pool map for emptiness.
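
A rough sketch of the per-pool emptiness check, assuming the container is a vector with one object map per pool. PoolObjects and all_pools_empty are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical shape: one object map per pool, as the message describes.
using PoolObjects = std::map<std::string, int>;

// The outer vector is never empty once pools exist, so test each pool map.
bool all_pools_empty(const std::vector<PoolObjects>& objects) {
  return std::all_of(objects.begin(), objects.end(),
                     [](const PoolObjects& pool) { return pool.empty(); });
}
```
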

14 years agouclient: only update inode if version increased
Sage Weil [Wed, 10 Nov 2010 15:42:29 +0000 (07:42 -0800)]
uclient: only update inode if version increased

This realigns the code with the kernel version, fixing a number of
problems when you have multiple MDSs returning info on the same inode.

Signed-off-by: Sage Weil <sage@newdream.net>
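
In its simplest form, the version check could look like the sketch below. InodeInfo and maybe_update are hypothetical, and the real client logic may consider more than the version number.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical cached inode metadata.
struct InodeInfo {
  std::uint64_t version;
  std::uint64_t size;
};

// Accept incoming metadata only if it is strictly newer than the cache, so a
// stale reply from another MDS cannot overwrite fresher state.
bool maybe_update(InodeInfo& cached, const InodeInfo& incoming) {
  if (incoming.version <= cached.version)
    return false;
  cached = incoming;
  return true;
}
```
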
14 years agodecompile_crush_bucket: fix depth-first decomp
Colin Patrick McCabe [Wed, 10 Nov 2010 07:59:06 +0000 (23:59 -0800)]
decompile_crush_bucket: fix depth-first decomp

We need to ensure that buckets are output after their dependencies. The
best way to do this is a depth-first traversal of the bucket directed
acyclic graph. The previous solution was incorrect because in some
cases it didn't traverse the graph in the right order.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
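
A post-order depth-first traversal is one way to get the ordering described above. The sketch below uses a hypothetical BucketGraph (bucket id to child bucket ids) rather than the crush structures, and assumes the graph is acyclic.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

// Hypothetical graph: bucket id -> ids of the buckets it references.
using BucketGraph = std::map<int, std::vector<int>>;

// Emit every child before the bucket that references it (post-order).
static void visit_bucket(const BucketGraph& g, int id, std::set<int>& done,
                         std::vector<int>& out) {
  if (!done.insert(id).second)
    return;  // already emitted (or in progress; the graph is acyclic)
  auto it = g.find(id);
  if (it != g.end())
    for (int child : it->second)
      visit_bucket(g, child, done, out);
  out.push_back(id);
}

// Order in which buckets can be dumped so that re-compilation sees each
// dependency before its first use.
std::vector<int> dump_order(const BucketGraph& g) {
  std::set<int> done;
  std::vector<int> out;
  for (const auto& kv : g)
    visit_bucket(g, kv.first, done, out);
  return out;
}
```
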
14 years agoCrushWrapper:get_bucket: ret ENOENT for no bucket
Colin Patrick McCabe [Wed, 10 Nov 2010 07:48:01 +0000 (23:48 -0800)]
CrushWrapper:get_bucket: ret ENOENT for no bucket

All the callers of CrushWrapper::get_bucket() check for error codes, but
not for NULL returns. So if there is no bucket (i.e., a NULL pointer) at
crush->bucket[i], just return the error code ENOENT. This is consistent
with how we handle other out-of-bounds requests.

Also, don't allow the caller to get us to try to access negative indices
in crush->bucket.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
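
The two checks in the message above (no negative index, no NULL bucket) might be sketched as follows. MiniCrush is a hypothetical miniature; the id-to-index mapping (id -1 at index 0) and the out-parameter signature are assumptions made for the example, not the CrushWrapper interface.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <vector>

struct Bucket {
  int id;
};

// Hypothetical miniature of a crush map: bucket id -1 lives at index 0,
// id -2 at index 1, and so on; empty slots are null.
struct MiniCrush {
  std::vector<Bucket*> buckets;
};

// Returns 0 and sets *out on success, -ENOENT otherwise.
int get_bucket(const MiniCrush& crush, int id, Bucket** out) {
  long pos = -1L - id;
  if (pos < 0 || static_cast<std::size_t>(pos) >= crush.buckets.size())
    return -ENOENT;  // out of bounds, including would-be negative indices
  Bucket* b = crush.buckets[static_cast<std::size_t>(pos)];
  if (b == nullptr)
    return -ENOENT;  // slot exists but holds no bucket
  *out = b;
  return 0;
}
```
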
14 years agoMerge branch 'sched_scrub' into unstable
Sage Weil [Tue, 9 Nov 2010 23:56:20 +0000 (15:56 -0800)]
Merge branch 'sched_scrub' into unstable

Conflicts:
src/osd/PG.cc
src/osd/PG.h

14 years agoosd: small cleanup
Sage Weil [Tue, 9 Nov 2010 23:50:48 +0000 (15:50 -0800)]
osd: small cleanup

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoosd: scrub: list objects without lock held
Sage Weil [Tue, 9 Nov 2010 23:08:15 +0000 (15:08 -0800)]
osd: scrub: list objects without lock held

We'll go back to get anything we missed later.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoMerge branch 'scrub_no_lock' into unstable
Sage Weil [Tue, 9 Nov 2010 23:46:54 +0000 (15:46 -0800)]
Merge branch 'scrub_no_lock' into unstable

14 years agops-ceph.pl: don't show self
Colin Patrick McCabe [Tue, 9 Nov 2010 23:34:52 +0000 (15:34 -0800)]
ps-ceph.pl: don't show self

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agogui: add missing #include
Sage Weil [Tue, 9 Nov 2010 23:04:10 +0000 (15:04 -0800)]
gui: add missing #include

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoMerge branch 'rbd-fiemap' into unstable
Sage Weil [Tue, 9 Nov 2010 22:50:24 +0000 (14:50 -0800)]
Merge branch 'rbd-fiemap' into unstable

14 years agoobjecter: set READ flag on new objecter mapext/read_sparse ops
Sage Weil [Tue, 9 Nov 2010 22:49:47 +0000 (14:49 -0800)]
objecter: set READ flag on new objecter mapext/read_sparse ops

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoobjecter: fix balancer for ops with length < 0
Sage Weil [Tue, 9 Nov 2010 22:48:52 +0000 (14:48 -0800)]
objecter: fix balancer for ops with length < 0

Notably, mapext.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agofilestore: autodetect presence of FIEMAP ioctl
Sage Weil [Tue, 9 Nov 2010 22:36:02 +0000 (14:36 -0800)]
filestore: autodetect presence of FIEMAP ioctl

If it's not there, assume the whole object is allocated.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agofiemap: include linux fiemap.h header; unconditionally compile helper
Sage Weil [Tue, 9 Nov 2010 22:35:33 +0000 (14:35 -0800)]
fiemap: include linux fiemap.h header; unconditionally compile helper

If the system doesn't have the header, use our copy.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agops-ceph.pl: display Ceph tests
Colin Patrick McCabe [Tue, 9 Nov 2010 22:32:49 +0000 (14:32 -0800)]
ps-ceph.pl: display Ceph tests

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoMerge remote branch 'origin/rbd-fiemap' into unstable
Sage Weil [Tue, 9 Nov 2010 22:23:12 +0000 (14:23 -0800)]
Merge remote branch 'origin/rbd-fiemap' into unstable

14 years agoFix example config file
Colin Patrick McCabe [Tue, 9 Nov 2010 22:06:42 +0000 (14:06 -0800)]
Fix example config file

We need to specify a journal size for the file-based journal we set up
in the example config file.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoTimerThread:don't call pop_front before iter deref
Colin Patrick McCabe [Tue, 9 Nov 2010 21:57:17 +0000 (13:57 -0800)]
TimerThread:don't call pop_front before iter deref

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
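
The bug class named in the title above is reading through an iterator after pop_front has invalidated it. A minimal sketch of the safe order, using a hypothetical take_first helper over a std::list of strings:

```cpp
#include <cassert>
#include <list>
#include <string>

// Remove and return the first queued event.
std::string take_first(std::list<std::string>& q) {
  assert(!q.empty());
  auto it = q.begin();
  std::string ev = *it;  // read through the iterator first...
  q.pop_front();         // ...then erase; this invalidates 'it'
  return ev;
}
```
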
14 years agoMakefile: use openssl module check
Kacper Kowalik [Tue, 9 Nov 2010 21:30:15 +0000 (13:30 -0800)]
Makefile: use openssl module check

This allows ceph to build with --as-needed.

Signed-off-by: Kacper Kowalik <xarthisius@gentoo.org>
14 years agoosd: shut down if we do not exist
Sage Weil [Tue, 9 Nov 2010 21:17:25 +0000 (13:17 -0800)]
osd: shut down if we do not exist

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoosd: handle osds that no longer exist in prior_set_affected
Sage Weil [Tue, 9 Nov 2010 21:08:56 +0000 (13:08 -0800)]
osd: handle osds that no longer exist in prior_set_affected

Consider no-longer-existent OSDs lost.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoObjecter: initialize timer in Objecter::init
Colin Patrick McCabe [Tue, 9 Nov 2010 20:04:47 +0000 (12:04 -0800)]
Objecter: initialize timer in Objecter::init

Just in case future users of Objecter want to create one before calling
Messenger::start as a daemon.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoAdd test_crushtool.sh
Colin Patrick McCabe [Tue, 9 Nov 2010 18:13:46 +0000 (10:13 -0800)]
Add test_crushtool.sh

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agomds: turn on mds_bal_frag (dir fragmentation) by default
Sage Weil [Tue, 9 Nov 2010 18:06:10 +0000 (10:06 -0800)]
mds: turn on mds_bal_frag (dir fragmentation) by default

Let the fun begin!

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: fix inode freeze auth pin allowance
Sage Weil [Tue, 9 Nov 2010 17:55:14 +0000 (09:55 -0800)]
mds: fix inode freeze auth pin allowance

When we're renaming across nodes, we need to freeze the inode.  This
requires that we allow for the auth_pins that _we_ hold, which include
one because of the linklock xlock, and one by the MDRequest.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoosd: handle osds that no longer exist in build_prior
Sage Weil [Tue, 9 Nov 2010 17:43:25 +0000 (09:43 -0800)]
osd: handle osds that no longer exist in build_prior

Fix build_prior to handle OSDs that no longer exist in the current map.
Consider them lost.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoCrushWrapper::get_bucket_item: bounds check
Colin Patrick McCabe [Tue, 9 Nov 2010 17:57:15 +0000 (09:57 -0800)]
CrushWrapper::get_bucket_item: bounds check

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agocrushtool: don't create a dump we can't recompile
Colin Patrick McCabe [Tue, 9 Nov 2010 17:55:44 +0000 (09:55 -0800)]
crushtool: don't create a dump we can't recompile

In crushtool, dump buckets in tree order. Buckets which reference other
buckets must be dumped after their dependencies, or else re-compilation
will fail.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoosdmap: cleanup: add parens
Sage Weil [Tue, 9 Nov 2010 17:56:05 +0000 (09:56 -0800)]
osdmap: cleanup: add parens

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: wipe out client sessions on startup
Sage Weil [Thu, 14 Oct 2010 21:40:54 +0000 (14:40 -0700)]
mds: wipe out client sessions on startup

For disaster recovery and such.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomon: implement 'mds newfs <metapool> <datapool>' command
Sage Weil [Thu, 14 Oct 2010 20:55:15 +0000 (13:55 -0700)]
mon: implement 'mds newfs <metapool> <datapool>' command

Create a new fs (by creating a new MDSMap) using the given pools.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: use mdsmap data pool for root inode default layout
Sage Weil [Thu, 14 Oct 2010 20:53:47 +0000 (13:53 -0700)]
mds: use mdsmap data pool for root inode default layout

The MDSMap may specify any random pool as the data pool; use that.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: add mds_skip_ino and mds_wipe_ino_prealloc options
Sage Weil [Thu, 14 Oct 2010 20:37:59 +0000 (13:37 -0700)]
mds: add mds_skip_ino and mds_wipe_ino_prealloc options

These are last-ditch recovery tools.  Not particularly effective ones,
though.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoceph.spec.in: don't strip rados classes
Christian Brunner [Tue, 9 Nov 2010 06:03:02 +0000 (22:03 -0800)]
ceph.spec.in: don't strip rados classes

Signed-off-by: Christian Brunner <christian@brunner-muc.de>
14 years agomds: add missing Dumper.[h,cc]
Sage Weil [Sat, 6 Nov 2010 19:12:38 +0000 (12:12 -0700)]
mds: add missing Dumper.[h,cc]

14 years agomds: tolerate/fix negative dir size counts
Sage Weil [Mon, 8 Nov 2010 21:18:31 +0000 (13:18 -0800)]
mds: tolerate/fix negative dir size counts

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: add missing Dumper.[h,cc]
Sage Weil [Sat, 6 Nov 2010 19:12:38 +0000 (12:12 -0700)]
mds: add missing Dumper.[h,cc]

14 years agoReplace ps-ceph.sh shell script with perl script
Andrew Farmer [Mon, 8 Nov 2010 17:41:06 +0000 (09:41 -0800)]
Replace ps-ceph.sh shell script with perl script

A much faster version of ps-ceph.sh.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoMerge remote branch 'origin/object_locator' into unstable
Sage Weil [Sun, 7 Nov 2010 17:56:42 +0000 (09:56 -0800)]
Merge remote branch 'origin/object_locator' into unstable

Conflicts:
src/osd/OSD.cc
src/osd/ReplicatedPG.cc
src/osd/ReplicatedPG.h
src/osd/osd_types.h

14 years agoMerge remote branch 'origin/timer-fixes' into unstable
Sage Weil [Sun, 7 Nov 2010 17:45:09 +0000 (09:45 -0800)]
Merge remote branch 'origin/timer-fixes' into unstable

14 years agov0.24~rc
Sage Weil [Sun, 7 Nov 2010 17:44:04 +0000 (09:44 -0800)]
v0.24~rc

14 years agoMerge remote branch 'origin/testing' into unstable
Sage Weil [Sun, 7 Nov 2010 17:42:51 +0000 (09:42 -0800)]
Merge remote branch 'origin/testing' into unstable

14 years agomds: eval: put scatter in MIX if replicated, otherwise LOCK
Sage Weil [Sun, 7 Nov 2010 15:49:59 +0000 (07:49 -0800)]
mds: eval: put scatter in MIX if replicated, otherwise LOCK

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: do not scatter_writebehind in MIX state
Sage Weil [Sun, 7 Nov 2010 15:45:52 +0000 (07:45 -0800)]
mds: do not scatter_writebehind in MIX state

Replicas might come in while we're flushing and get a MIX state with
the old state.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoMerge branch 'unstable' into mix_stale
Sage Weil [Sun, 7 Nov 2010 04:05:11 +0000 (21:05 -0700)]
Merge branch 'unstable' into mix_stale

14 years agomds: remove MIX_STALE
Sage Weil [Sat, 6 Nov 2010 18:35:54 +0000 (11:35 -0700)]
mds: remove MIX_STALE

Yay, we don't need it!

If we can't update the frag on scatter, fine.  The staleness of the frag
is implicit in the frag's scatter stat version not matching the inode's.
If/when we do want to update it, the frag will clearly be writable, and
we can bring it back in sync then.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: don't fuss with versions when taking frag/rstat from frag; it's never stale...
Sage Weil [Sat, 6 Nov 2010 18:18:53 +0000 (11:18 -0700)]
mds: don't fuss with versions when taking frag/rstat from frag; it's never stale here

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: introduce/use helpers to resync stale fragstat/rstat; update version
Sage Weil [Sat, 6 Nov 2010 18:18:13 +0000 (11:18 -0700)]
mds: introduce/use helpers to resync stale fragstat/rstat; update version

Simplifies code.

Also, update the version when we resync!

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: ignore done_locking on slave requests' acquire_locks()
Sage Weil [Sun, 7 Nov 2010 03:55:12 +0000 (20:55 -0700)]
mds: ignore done_locking on slave requests' acquire_locks()

Slave requests ask for each xlock one at a time.  Don't bail out based on
the done_locking flag.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: don't use helper for rename srcdn
Sage Weil [Sun, 7 Nov 2010 03:17:32 +0000 (20:17 -0700)]
mds: don't use helper for rename srcdn

The rdlock_path_xlock_dentry helper works for _auth_ dentries that we
create locally in an auth dirfrag.  For the srcdn, we need to discover an
_existing_ dentry that is not necessarily auth.

Call path_traverse ourselves, but be careful to take the appropriate locks
on the resulting dn, dir, and ancestors.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: never complete a gather on a flushing lock
Sage Weil [Sat, 6 Nov 2010 18:02:13 +0000 (11:02 -0700)]
mds: never complete a gather on a flushing lock

The scatter_writebehind() takes a wrlock, but that may still allow the lock
to complete a gather to LOCK and even move to say MIX before the data is
committed.  Bad news!

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: update version when bring stale rstat back up to date
Sage Weil [Sat, 6 Nov 2010 16:38:15 +0000 (09:38 -0700)]
mds: update version when bring stale rstat back up to date

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: simplify stale semantics a bit
Sage Weil [Sat, 6 Nov 2010 14:58:32 +0000 (07:58 -0700)]
mds: simplify stale semantics a bit

is_stale() => next MIX is MIX_STALE. Stale flag is then cleared.  Then we
special case the import to preserve stale-ness.

TODO: add_replica_inode likely has this same problem.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: preserve stale state on import; some cleanup
Sage Weil [Sat, 6 Nov 2010 04:52:28 +0000 (21:52 -0700)]
mds: preserve stale state on import; some cleanup

Our new invariant is that MIX_STALE always implies is_stale().  And on
import, if is_stale(), MIX becomes MIX_STALE.  This ensures that a replica
that we put into MIX_STALE doesn't turn back into MIX if we import it
and take the auth's state in CInode::decode_import().

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoMerge branch 'mix_stale' into unstable
Sage Weil [Sat, 6 Nov 2010 00:08:10 +0000 (17:08 -0700)]
Merge branch 'mix_stale' into unstable

14 years agomds: add more verify_scatter asserts
Sage Weil [Sat, 6 Nov 2010 00:06:10 +0000 (17:06 -0700)]
mds: add more verify_scatter asserts

For catching fragstat errors sooner.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: fix version check on resyncing stale rstat in predirty_journal_parents
Sage Weil [Fri, 5 Nov 2010 22:24:53 +0000 (15:24 -0700)]
mds: fix version check on resyncing stale rstat in predirty_journal_parents

We're resyncing rstat, so check the rstat version (not fragstat!)

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: Fix bad inode deref.
Greg Farnum [Fri, 5 Nov 2010 19:45:06 +0000 (12:45 -0700)]
mds: Fix bad inode deref.

Accidentally trying to print out the CInode after removing it in trim_non_auth!
Move the print to before it's been unlinked/removed/etc.

14 years agoRevisit std::multimap decoder
Colin Patrick McCabe [Fri, 5 Nov 2010 19:17:40 +0000 (12:17 -0700)]
Revisit std::multimap decoder

Previously I changed the std::multimap decoder to minimize the number of
constructor invocations. However, it could be much more expensive to
copy an initialized (decoded) val_t than to copy an empty one, for
example when decoding a std::multimap<int, std::set<int>>. So
change the code to insert a non-decoded val_t again.

However, this still saves two constructor invocations over the original.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
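
The insert-then-decode pattern from the message above can be sketched with a toy text format. The decode overloads here are stand-ins written for this example, not the encoding.h templates; the point is only that an empty set is what gets copied into the multimap.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <iterator>
#include <map>
#include <set>
#include <sstream>
#include <utility>

// Toy text decoders standing in for real wire decoding (assumption).
inline void decode(int& v, std::istream& in) { in >> v; }

inline void decode(std::set<int>& s, std::istream& in) {
  std::size_t n = 0;
  in >> n;
  s.clear();
  for (std::size_t i = 0; i < n; ++i) {
    int v = 0;
    decode(v, in);
    s.insert(v);
  }
}

inline void decode(std::multimap<int, std::set<int>>& m, std::istream& in) {
  std::size_t n = 0;
  in >> n;
  m.clear();
  for (std::size_t i = 0; i < n; ++i) {
    int key = 0;
    decode(key, in);
    // Insert an empty value (cheap to copy), then decode into it in place.
    auto it = m.insert(std::make_pair(key, std::set<int>()));
    decode(it->second, in);
  }
}
```
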
14 years agoautogen.sh: check for pkg-config
Colin Patrick McCabe [Fri, 5 Nov 2010 18:34:11 +0000 (11:34 -0700)]
autogen.sh: check for pkg-config

To avoid seeing confusing errors later in the configure process, in
autogen.sh, check to make sure the pkg-config program is installed.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoPG.cc: build_scrub_map now drops the PG lock while scanning the PG
Samuel Just [Thu, 21 Oct 2010 23:54:01 +0000 (16:54 -0700)]
PG.cc: build_scrub_map now drops the PG lock while scanning the PG
       build_inc_scrub_map scans all files modified since the given
           version number and creates an incremental scrub map to
           be merged with a scrub map created with build_scrub_map.
           This scan is done while holding the pg lock.
       ScrubMap.objects is now represented as a map rather than as
           a vector.

PG.h:  Added last_update_applied and finalizing_scrub members to
           PG.

ReplicatedPG.cc:
       calc_trim_to will not trim the log during a scrub (since
           replicas need the log to construct incremental maps)
       sub_op_modify_applied and op_applied maintain a
           last_update_applied PG member to be used for determining
           how far back a replica need go to construct an
           incremental scrub map.

osd_types.h:
       Added merge_incr method for combining a scrub map with
           a subsequent incremental scrub map.
       ScrubMap.objects is now a map from sobject_t to object.

PG scrubs will now drop the PG lock while initially scanning the PG
collection allowing writes to continue.  The scrub map will be tagged
with the most recent version applied.  After halting writes, the
primary will request an incremental map from any replicas whose map
versions do not match log.head.

Signed-off-by: Samuel Just <samuelj@hq.newdream.net>
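
Merging an incremental scrub map into a full one could look roughly like the sketch below. MiniScrubMap is a hypothetical miniature keyed by std::string instead of sobject_t, and the 'negative' flag for objects deleted since the first scan is an assumption of this example.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical per-object scrub record. 'negative' marks an object that the
// incremental scan found to be deleted; that flag is an assumption here.
struct ScrubObject {
  std::uint64_t size = 0;
  bool negative = false;
};

struct MiniScrubMap {
  std::map<std::string, ScrubObject> objects;
  std::uint64_t valid_through = 0;  // most recent version reflected in the map

  // Fold a later incremental map into this one.
  void merge_incr(const MiniScrubMap& incr) {
    for (const auto& kv : incr.objects) {
      if (kv.second.negative)
        objects.erase(kv.first);         // deleted since the first scan
      else
        objects[kv.first] = kv.second;   // created or modified since then
    }
    valid_through = incr.valid_through;
  }
};
```

Keeping the objects in a map rather than a vector is what makes the per-object overwrite and erase cheap during the merge.
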
14 years agomds: preserve version when recovering rstat from dirfrag in predirty_journal_parents
Sage Weil [Fri, 5 Nov 2010 17:38:35 +0000 (10:38 -0700)]
mds: preserve version when recovering rstat from dirfrag in predirty_journal_parents

We don't want to screw up the version here.  This aligns the code with
other instances of this check.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: restructure finish_scatter_gather_update()
Sage Weil [Fri, 5 Nov 2010 06:20:33 +0000 (23:20 -0700)]
mds: restructure finish_scatter_gather_update()

Separate behavior into two dimensions: whether or not we are updating
the dirfrag, and whether or not the dirfrag is stale.

Change the various helpers to NOT implicitly update accounted_*, as the
caller doesn't always want that, notably when we are non-stale but frozen.

Signed-off-by: Sage Weil <sage@newdream.net>