git.apps.os.sepia.ceph.com Git - ceph.git/log
ceph.git
13 years agoAdded LevelDBStore
Samuel Just [Wed, 29 Feb 2012 02:02:34 +0000 (18:02 -0800)]
Added LevelDBStore

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
13 years agoAdded leveldb submodule
Samuel Just [Wed, 29 Feb 2012 02:03:18 +0000 (18:03 -0800)]
Added leveldb submodule

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
13 years agoMakefile: make check-local relative to $(srcdir)
Samuel Just [Thu, 1 Mar 2012 04:28:05 +0000 (20:28 -0800)]
Makefile: make check-local relative to $(srcdir)

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
13 years agorgw: don't check for ECANCELED in the _impl() functions
Yehuda Sadeh [Wed, 29 Feb 2012 21:51:45 +0000 (13:51 -0800)]
rgw: don't check for ECANCELED in the _impl() functions

We already check it in the outer functions.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
13 years agorgw: don't retry certain operations if we raced
Yehuda Sadeh [Wed, 29 Feb 2012 19:34:33 +0000 (11:34 -0800)]
rgw: don't retry certain operations if we raced

The atomic get/put scheme was retrying writes in cases where it lost
races (head object was rewritten by another client). Instead we can
just back off and return success.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
13 years agomsgr: fix race in learned_addr()
Sage Weil [Wed, 29 Feb 2012 21:22:34 +0000 (13:22 -0800)]
msgr: fix race in learned_addr()

- two connect() threads
- both hit if (need_addr) check
- one takes lock, sets addr, need_addr = false, unlocks
- continues to ::encode(ms_addr, ...);
- meanwhile, second thread set ms_addr _again_, but copies peer port into
  place before adjusting it.  racing ::encode() sees bad port and sends it
  to the peer.

Fix this two ways:

- don't copy bad port into place; set it first
- re-check need_addr after taking lock

Fixes: #1747
Signed-off-by: Sage Weil <sage@newdream.net>
Reviewed-by: Greg Farnum <gregory.farnum@dreamhost.com>
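
The two-part fix described in the learned_addr() commit above can be sketched roughly as follows. This is a minimal illustration, not the messenger's actual code: the Addr struct, the Messenger class, and the listen_port_ member are hypothetical stand-ins, and the real code works on Ceph's own address types. The sketch builds the corrected address before publishing it (so the peer's port is never stored) and re-checks need_addr after taking the lock.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>

// Hypothetical stand-in for the messenger's address type.
struct Addr {
    std::uint32_t ip;
    std::uint16_t port;
};

// Hypothetical messenger fragment; names are assumptions for illustration.
class Messenger {
public:
    explicit Messenger(std::uint16_t listen_port) : listen_port_(listen_port) {
        assert(listen_port != 0);
    }

    // Called by each connecting thread with the address the peer saw for us.
    void learned_addr(const Addr& peer_view) {
        if (!need_addr_.load(std::memory_order_acquire))
            return;  // fast path: address already learned

        // Fix 1: build the complete value first, so the peer's port is
        // never copied into the shared address, even briefly.
        Addr fixed = peer_view;
        fixed.port = listen_port_;

        std::lock_guard<std::mutex> guard(lock_);
        // Fix 2: re-check under the lock; another thread may have won.
        if (!need_addr_.load(std::memory_order_relaxed))
            return;
        my_addr_ = fixed;
        need_addr_.store(false, std::memory_order_release);
    }

    Addr addr() const {
        std::lock_guard<std::mutex> guard(lock_);
        return my_addr_;
    }

private:
    mutable std::mutex lock_;
    std::atomic<bool> need_addr_{true};
    Addr my_addr_{0, 0};
    const std::uint16_t listen_port_;
};
```

With this shape, a second caller that passes the unlocked check still cannot overwrite the address, and a concurrent reader never observes the peer's port.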
13 years agomsgr: print existing->state before failing assert
Sage Weil [Wed, 29 Feb 2012 20:28:19 +0000 (12:28 -0800)]
msgr: print existing->state before failing assert

May help with #1378.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoMerge remote-tracking branch 'gh/wip-2121'
Sage Weil [Wed, 29 Feb 2012 19:07:03 +0000 (11:07 -0800)]
Merge remote-tracking branch 'gh/wip-2121'

Reviewed-by: Yehuda Sadeh <yehuda.sadeh@dreamhost.com>
13 years agoosd: unregister signal handlers on shutdown
Sage Weil [Wed, 29 Feb 2012 17:46:13 +0000 (09:46 -0800)]
osd: unregister signal handlers on shutdown

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agomon: unregister signal handlers on shutdown
Sage Weil [Wed, 29 Feb 2012 17:46:06 +0000 (09:46 -0800)]
mon: unregister signal handlers on shutdown

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agomds: unregister SIGHUP too
Sage Weil [Wed, 29 Feb 2012 17:45:56 +0000 (09:45 -0800)]
mds: unregister SIGHUP too

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agoradosgw: handle SIGHUP
Sage Weil [Wed, 29 Feb 2012 17:45:46 +0000 (09:45 -0800)]
radosgw: handle SIGHUP

Fixes: #2121
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agoinit-radosgw: add 'reload' command to send SIGHUP
Sage Weil [Wed, 29 Feb 2012 17:23:22 +0000 (09:23 -0800)]
init-radosgw: add 'reload' command to send SIGHUP

Fixes: #2121
Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: fix typo is recovery_state query dump
Sage Weil [Wed, 29 Feb 2012 17:21:22 +0000 (09:21 -0800)]
osd: fix typo is recovery_state query dump

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: add missing space to scrub error
Sage Weil [Wed, 29 Feb 2012 17:17:07 +0000 (09:17 -0800)]
osd: add missing space to scrub error

[ERR] 18.5 osd.3: soid 8a5e37ad/rb.0.0.000000002b99/headextra attr _, extra attr snapset

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agomsgr: discard the local_pipe's queue on shutdown.
Greg Farnum [Wed, 29 Feb 2012 01:30:23 +0000 (17:30 -0800)]
msgr: discard the local_pipe's queue on shutdown.

To facilitate this, we do two things:
1) actually identify the number of special code values we pass around
2) use that to prevent trying to put() those non-pointer values in
Pipe::discard_queue().
Then we just call local_pipe.discard_queue() in wait() like happens
(indirectly, via reaping) with all the normal Pipes in rank_pipe.

But this does make me think that we may be approaching the point
where it's appropriate to create a subclass LocalPipe (against a
RemotePipe like our current Pipe implementation is mostly intended
to be).

Should fix #2086.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Reviewed-by: Sage Weil <sage@newdream.net>
13 years agoosd: remove down OSDs from peer_info on reset
Sage Weil [Wed, 29 Feb 2012 17:10:57 +0000 (09:10 -0800)]
osd: remove down OSDs from peer_info on reset

If an OSD goes down, remove it from peer_info. In particular, I saw

2012-02-28 11:04:25.851038 12e53700 osd.5 3602 pg[1.15( empty n=0 ec=1 les/c 0/3587 3598/3598/3598) [5,3] r=0 lpr=3599 mlcod 0'0 peering] state<Started/Primary/Peering>: Peering advmap
2012-02-28 11:04:25.851491 12e53700 osd.5 3602 pg[1.15( empty n=0 ec=1 les/c 0/3587 3598/3598/3598) [5,3] r=0 lpr=3599 mlcod 0'0 peering]  PriorSet: affected_by_map osd.1 now down
...
2012-02-28 11:04:25.998186 12e53700 osd.5 3602 pg[1.15( empty n=0 ec=1 les/c 0/3587 3598/3598/3598) [5,3] r=0 lpr=3602 mlcod 0'0 peering]  PriorSet: build_prior interval(3587-3597 [3,1]/[3,1] maybe_went_rw)
2012-02-28 11:04:25.998636 12e53700 osd.5 3602 pg[1.15( empty n=0 ec=1 les/c 0/3587 3598/3598/3598) [5,3] r=0 lpr=3602 mlcod 0'0 peering]  PriorSet: build_prior  prior osd.1 is down
2012-02-28 11:04:25.999106 12e53700 osd.5 3602 pg[1.15( empty n=0 ec=1 les/c 0/3587 3598/3598/3598) [5,3] r=0 lpr=3602 mlcod 0'0 peering]  PriorSet: build_prior final: probe 3,5 down 1 blocked_by {}
...
2012-02-28 11:04:26.001723 12e53700 osd.5 3602 pg[1.15( empty n=0 ec=1 les/c 0/3587 3598/3598/3598) [5,3] r=0 lpr=3602 mlcod 0'0 peering] enter Started/Primary/Peering/GetLog
2012-02-28 11:04:26.002428 12e53700 osd.5 3602 pg[1.15( empty n=0 ec=1 les/c 0/3587 3598/3598/3598) [5,3] r=0 lpr=3602 mlcod 0'0 peering] calc_acting osd.1 1.15( v 10'1 (0'0,10'1] n=1 ec=1 les/c 0/3587 3598/3598/3598)
2012-02-28 11:04:26.003000 12e53700 osd.5 3602 pg[1.15( empty n=0 ec=1 les/c 0/3587 3598/3598/3598) [5,3] r=0 lpr=3602 mlcod 0'0 peering] calc_acting osd.3 1.15( v 10'1 (0'0,10'1] n=1 ec=1 les/c 0/3587 3598/3598/3598)
2012-02-28 11:04:26.003528 12e53700 osd.5 3602 pg[1.15( empty n=0 ec=1 les/c 0/3587 3598/3598/3598) [5,3] r=0 lpr=3602 mlcod 0'0 peering] calc_acting osd.5 1.15( empty n=0 ec=1 les/c 0/3587 3598/3598/3598)
2012-02-28 11:04:26.004109 12e53700 osd.5 3602 pg[1.15( empty n=0 ec=1 les/c 0/3587 3598/3598/3598) [5,3] r=0 lpr=3602 mlcod 0'0 peering] calc_acting newest update on osd.1 with 1.15( v 10'1 (0'0,10'1] n=1 ec=1 les/c 0/3587 3598/3598/3598)

Any time an osd goes down we want to ensure we remove it from peer_info.
Handling this in Reset and Started states captures all of the nested
states, which forward the event (or re-post transit to Reset).  We can
also drop the Primary reaction, which is now superfluous.

Signed-off-by: Sage Weil <sage@newdream.net>
Reviewed-by: Samuel Just <samuel.just@dreamhost.com>
Reviewed-by: Josh Durgin <josh.durgin@dreamhost.com>
13 years agoMerge branch 'next'
Sage Weil [Wed, 29 Feb 2012 01:04:55 +0000 (17:04 -0800)]
Merge branch 'next'

13 years agomon: report pgs stuck inactive/unclean/stale in health check
Josh Durgin [Tue, 28 Feb 2012 01:49:13 +0000 (17:49 -0800)]
mon: report pgs stuck inactive/unclean/stale in health check

Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
Reviewed-by: Sage Weil <sage.weil@dreamhost.com>
13 years agomon: fix slurp_latest to fill in any missing incrementals
Greg Farnum [Tue, 28 Feb 2012 20:28:47 +0000 (12:28 -0800)]
mon: fix slurp_latest to fill in any missing incrementals

Fixes #1789.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
13 years agotest_osd_types: fix unit test for new pg_t::is_split() prototype
Sage Weil [Tue, 28 Feb 2012 17:33:18 +0000 (09:33 -0800)]
test_osd_types: fix unit test for new pg_t::is_split() prototype

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoMakefile: drop separate libjson_spirit.la
Sage Weil [Tue, 28 Feb 2012 17:30:38 +0000 (09:30 -0800)]
Makefile: drop separate libjson_spirit.la

automake seems to have difficulty with the .la dependency on another .la.
Since libjson_spirit.la is only used by libcommon.la anyway, just build it
directly into that.  Sigh.

...
CXXLD libjson_spirit.la
AR libmds.a
CXXLD libcls_rbd.la
CXXLD libcls_rgw.la
CXXLD cephfs
CCLD test_ioctls
CC libcommon_la-ceph_ver.lo
CXX libcommon_la-version.lo
CXX ceph_dencoder.o
CCLD mount.ceph
CC ceph_ver.o
CXX test_libhadoopcephfs_build-version.o
CXXLD test_libhadoopcephfs_build
CXXLD libcommon.la
libtool: link: cannot find the library `libjson_spirit.la' or unhandled argument `libjson_spirit.la'

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: drop useless ENOMEM check
Sage Weil [Tue, 28 Feb 2012 17:26:04 +0000 (09:26 -0800)]
osd: drop useless ENOMEM check

new throws exception; doesn't return NULL.

Signed-off-by: Sage Weil <sage@newdream.net>
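
To illustrate the point of the ENOMEM commit above: in standard C++ a plain new expression signals allocation failure by throwing std::bad_alloc, so testing its result against NULL is dead code. The helper names below are hypothetical and only show the three possible shapes; they are not taken from the OSD source.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <new>

// Plain new: throws std::bad_alloc on failure, never returns nullptr.
char* alloc_throwing(std::size_t n) {
    return new char[n];
}

// Only the nothrow form returns nullptr, so only here is a check meaningful.
char* alloc_nothrow(std::size_t n) {
    return new (std::nothrow) char[n];
}

// If an errno-style code is really wanted, catch the exception instead.
int try_alloc(std::size_t n, char** out) {
    assert(out != nullptr);
    try {
        *out = new char[n];
        return 0;
    } catch (const std::bad_alloc&) {
        *out = nullptr;
        return -ENOMEM;
    }
}
```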
13 years agoceph-osd: clarify error messages
Sage Weil [Tue, 28 Feb 2012 17:11:59 +0000 (09:11 -0800)]
ceph-osd: clarify error messages

So we know where the error came from.  And use real error codes in init().

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoinit: Actually do start the daemons when 'service ceph start <type>' is specified
Wido den Hollander [Tue, 28 Feb 2012 11:41:42 +0000 (12:41 +0100)]
init: Actually do start the daemons when 'service ceph start <type>' is specified

A bug in my previous patch prevented any daemon with auto_start set to false from starting.

This patch allows:
* /etc/init.d/ceph start osd|mds|mon
* service ceph start osd|mds|mon

It however does not start daemons if auto_start is disabled when you invoke:
* /etc/init.d/ceph start
* service ceph start

Signed-off-by: Wido den Hollander <wido@widodh.nl>
13 years agodoc: beginnings of documentation of stuck pgs and pg states
Sage Weil [Mon, 27 Feb 2012 23:41:57 +0000 (15:41 -0800)]
doc: beginnings of documentation of stuck pgs and pg states

Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
Reviewed-by: Sage Weil <sage@newdream.net>
13 years agofilestore: make less noise on ENOENT
Sage Weil [Mon, 27 Feb 2012 23:13:13 +0000 (15:13 -0800)]
filestore: make less noise on ENOENT

Don't generate high-level log spam on every open error.

Signed-off-by: Sage Weil <sage@newdream.net>
Reviewed-by: Samuel Just <samuel.just@dreamhost.com>
13 years agopg: use get_cluster_inst instead of get_inst in activate
Greg Farnum [Mon, 27 Feb 2012 22:49:18 +0000 (14:49 -0800)]
pg: use get_cluster_inst instead of get_inst in activate

This was mistakenly broken in 4b3bb5ab37a05fa001d59f24da7d9c30d650321b

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Reviewed-by: Sam Just <sam.just@dreamhost.com>
13 years agoMerge branch 'wip-split2'
Sage Weil [Mon, 27 Feb 2012 22:37:41 +0000 (14:37 -0800)]
Merge branch 'wip-split2'

Reviewed-by: Samuel Just <samuel.just@dreamhost.com>
13 years agoosd: pg_t::is_split(): make children out param a pointer, and optional
Sage Weil [Mon, 27 Feb 2012 22:35:21 +0000 (14:35 -0800)]
osd: pg_t::is_split(): make children out param a pointer, and optional

Also unit test it.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: bypass split code
Sage Weil [Mon, 27 Feb 2012 22:18:21 +0000 (14:18 -0800)]
osd: bypass split code

Until it is fully implemented.  It's also disabled in the monitor
currently, but just in case it gets into the OSDMap, do nothing for now.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: fix pg locking flags
Sage Weil [Tue, 21 Feb 2012 00:46:03 +0000 (16:46 -0800)]
osd: fix pg locking flags

Two things we need to handle:

 - callers who already hold map_lock (split_pg())
 - callers who already hold another pg->lock, and want to skip the lockdep
   check for this one.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: partially refactor pg split
Sage Weil [Mon, 27 Feb 2012 22:04:22 +0000 (14:04 -0800)]
osd: partially refactor pg split

This partially refactors the OSD split code to do the split synchronously
when processing a new OSDMap.  It is incomplete in that it does not yet
do anything useful for the PG.  The full solution needs to:

- Do the split synchronously when applying the map update.
- Reset the parent pg so that it repeers.  This will cause problems until
  we consistently consider this a new interval when looking backwards in
  time; this needs to be fixed.  Anybody doing generate_past_intervals()
  or similar will need to consider a split/merge event as an interval
  boundary.
- The recovery state machine should trigger appropriately when this
  happens.
- The old PG that was split should probably be handled identically to the
  new children.  That means deleting the old PG instance and creating a new
  PG object for the newly-split child.  Ditto for merge.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: implement pg_t::is_split()
Sage Weil [Mon, 20 Feb 2012 23:59:00 +0000 (15:59 -0800)]
osd: implement pg_t::is_split()

Test to determine if a pg has split between two pool sizes, and if so,
what its children are.

Signed-off-by: Sage Weil <sage@newdream.net>
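
A brute-force sketch of the split test described in the is_split() commit above, under stated assumptions: PGs are identified by a bare unsigned seed (the real pg_t also carries a pool id), and placement uses a stable modulo modelled on my understanding of Ceph's ceph_stable_mod(). A new seed counts as a child of the parent if it would have mapped to the parent under the old pg count. The optional out-pointer mirrors the prototype change mentioned elsewhere in this log. The actual implementation may enumerate children differently.

```cpp
#include <cassert>
#include <set>

// Smallest (2^k - 1) mask that covers seeds 0..b-1.
static unsigned containing_mask(unsigned b) {
    unsigned p = 1;
    while (p < b) p <<= 1;
    return p - 1;
}

// Stable modulo: maps x into [0, b) even when b is not a power of two.
static unsigned stable_mod(unsigned x, unsigned b, unsigned bmask) {
    return ((x & bmask) < b) ? (x & bmask) : (x & (bmask >> 1));
}

// Returns true if `seed` gains children when the pool grows from
// old_pg_num to new_pg_num; optionally collects the child seeds.
bool is_split(unsigned seed, unsigned old_pg_num, unsigned new_pg_num,
              std::set<unsigned>* children = nullptr) {
    assert(seed < old_pg_num);
    if (new_pg_num <= old_pg_num)
        return false;  // no growth (or a merge), so no split
    const unsigned old_mask = containing_mask(old_pg_num);
    bool split = false;
    for (unsigned s = old_pg_num; s < new_pg_num; ++s) {
        // s is a child if its objects lived in `seed` under the old count.
        if (stable_mod(s, old_pg_num, old_mask) == seed) {
            split = true;
            if (children)
                children->insert(s);
        }
    }
    return split;
}
```

For example, growing a pool from 4 to 8 PGs makes seed 5 the only child of seed 1, while growing from 3 to 6 leaves seed 2 with no children at all.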
13 years agoosd: factor hobject key into child pgid calc during split
Sage Weil [Mon, 20 Feb 2012 22:12:16 +0000 (14:12 -0800)]
osd: factor hobject key into child pgid calc during split

When we calculate the object's new pg, take the locator key into
consideration, to avoid a crash like

osd/OSD.cc: In function 'void OSD::split_pg(PG*, std::map<pg_t, PG*>&,ObjectStore::Transaction&)' thread 7fe3df8c4700 time 2012-02-20 18:22:19.900886
osd/OSD.cc: 4066: FAILED assert(child)

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agojournaler: log on unexpected objecter error
Sage Weil [Mon, 27 Feb 2012 19:39:53 +0000 (11:39 -0800)]
journaler: log on unexpected objecter error

This will help with #2110, #1796, #1640.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: fix recursive map_lock via check_replay_queue()
Sage Weil [Mon, 27 Feb 2012 17:56:21 +0000 (09:56 -0800)]
osd: fix recursive map_lock via check_replay_queue()

Also drop activate_pg() helper while we're at it, so it's clear that we
are the only user.

recursive lock of OSD::map_lock (33)
 ceph version 0.42-146-g7ad35ce (commit:7ad35ce489cc5f9169eb838e1196fa2ca4d6e985)
2012-02-24 12:30:16.541416 1: (PG::lock(bool)+0x2a) [0xa09348]
2012-02-24 12:30:16.541424 2: (OSD::_lookup_lock_pg(pg_t)+0xbd) [0x84b8df]
2012-02-24 12:30:16.541431 3: (OSD::activate_pg(pg_t, utime_t)+0x9f) [0x87463b]
2012-02-24 12:30:16.541442 4: (OSD::check_replay_queue()+0x12f) [0x87452d]
2012-02-24 12:30:16.541450 5: (OSD::tick()+0x23c) [0x8535ea]
2012-02-24 12:30:16.541456 6: (OSD::C_Tick::finish(int)+0x1f) [0x881671]
2012-02-24 12:30:16.541462 7: (SafeTimer::timer_thread()+0x2d5) [0x8f8211]
2012-02-24 12:30:16.541468 8: (SafeTimerThread::entry()+0x1c) [0x8f923c]
2012-02-24 12:30:16.541475 9: (Thread::_entry_func(void*)+0x23) [0x9c8109]
2012-02-24 12:30:16.541485 10: (()+0x68ba) [0x7f9dbed838ba]
2012-02-24 12:30:16.541491 11: (clone()+0x6d) [0x7f9dbd66f02d]
2012-02-24 12:30:16.541495 common/lockdep.cc: In function 'int lockdep_will_lock(const char*, int)' thread 7f9db9d98700 time 2012-02-24 12:30:16.541504

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Reviewed-by: Sam Just <samuel.just@dreamhost.com>
13 years agoinit-ceph: stick with /var/run for the time being
Sage Weil [Mon, 27 Feb 2012 04:56:05 +0000 (20:56 -0800)]
init-ceph: stick with /var/run for the time being

/run isn't present on older systems.  Stick with the old location until it
is more pervasive, or we add an autoconf option to control it.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agodebian: /var/run/ceph -> /run/ceph
Laszlo Boszormenyi [Mon, 27 Feb 2012 04:47:53 +0000 (20:47 -0800)]
debian: /var/run/ceph -> /run/ceph

/run/ceph should exist for creating UNIX domain sockets.
ceph uses UNIX domain sockets for internal communication. Create their
directory on startup as /run is on a virtual filesystem.

Last-Update: <2012-02-26>
Bug-Debian: http://bugs.debian.org/660238
Forwarded: <ceph-devel@vger.kernel.org>
Signed-off-by: Laszlo Boszormenyi (GCS) <gcs@debian.hu>
13 years agodebian: build-{indep,arch}
Laszlo Boszormenyi [Mon, 27 Feb 2012 04:45:52 +0000 (20:45 -0800)]
debian: build-{indep,arch}

Signed-off-by: Laszlo Boszormenyi <gcs@debian.hu>
13 years agodebian: sdparm|hdparm, new standards version
Laszlo Boszormenyi [Mon, 27 Feb 2012 04:45:06 +0000 (20:45 -0800)]
debian: sdparm|hdparm, new standards version

Signed-off-by: Laszlo Boszormenyi <gcs@debian.hu>
13 years agorgw: initialize bucket_id in bucket structure
Yehuda Sadeh [Sat, 25 Feb 2012 01:00:35 +0000 (17:00 -0800)]
rgw: initialize bucket_id in bucket structure

might make valgrind a little bit less noisy.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
13 years agorgw: _exit(0) on SIGTERM
Sage Weil [Fri, 24 Feb 2012 23:23:44 +0000 (15:23 -0800)]
rgw: _exit(0) on SIGTERM

We need to do something a bit smarter to get coverage information, but this
is a start.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoMerge remote branch 'gh/wip-crush-adjust'
Sage Weil [Fri, 24 Feb 2012 21:52:32 +0000 (13:52 -0800)]
Merge remote branch 'gh/wip-crush-adjust'

Reviewed-by: Tommi Virtanen <tommi.virtanen@dreamhost.com>
13 years agoMerge remote branch 'gh/wip-mds-resetter'
Sage Weil [Fri, 24 Feb 2012 21:48:06 +0000 (13:48 -0800)]
Merge remote branch 'gh/wip-mds-resetter'

Reviewed-by: Greg Farnum <gregory.farnum@dreamhost.com>
13 years agoMerge branch 'wip-pg-query'
Sage Weil [Fri, 24 Feb 2012 21:43:43 +0000 (13:43 -0800)]
Merge branch 'wip-pg-query'

Reviewed-by: Samuel Just <samuel.just@dreamhost.com>
13 years agoMerge branch 'stable'
Sage Weil [Fri, 24 Feb 2012 21:22:49 +0000 (13:22 -0800)]
Merge branch 'stable'

13 years agov0.42.2 v0.42.2
Sage Weil [Fri, 24 Feb 2012 20:59:53 +0000 (12:59 -0800)]
v0.42.2

13 years agoMerge remote-tracking branch 'gh/stable' into stable
Sage Weil [Fri, 24 Feb 2012 21:00:33 +0000 (13:00 -0800)]
Merge remote-tracking branch 'gh/stable' into stable

13 years agoMerge branch 'stable'
Sage Weil [Fri, 24 Feb 2012 20:54:41 +0000 (12:54 -0800)]
Merge branch 'stable'

13 years agoosd: fix array index
Sage Weil [Fri, 24 Feb 2012 20:40:34 +0000 (12:40 -0800)]
osd: fix array index

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agolockdep: don't make noise on startup
Sage Weil [Fri, 24 Feb 2012 20:39:44 +0000 (12:39 -0800)]
lockdep: don't make noise on startup

Who cares!

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agoformatter: fix trailing dump_stream()
Sage Weil [Fri, 24 Feb 2012 20:38:13 +0000 (12:38 -0800)]
formatter: fix trailing dump_stream()

Flush a previous dump_stream() if it was the last thing prior to a
close_section().

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agoosd: include timestamps in state json dumps
Sage Weil [Fri, 24 Feb 2012 20:04:29 +0000 (12:04 -0800)]
osd: include timestamps in state json dumps

Include the time we entered this state in the dump.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agoMerge branch 'wip-2007'
Sage Weil [Fri, 24 Feb 2012 20:00:00 +0000 (12:00 -0800)]
Merge branch 'wip-2007'

Reviewed-by: Josh Durgin <josh.durgin@dreamhost.com>
13 years agoosd: use blocks for readability in list_missing
Sage Weil [Fri, 24 Feb 2012 19:59:20 +0000 (11:59 -0800)]
osd: use blocks for readability in list_missing

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: dump recovery_state states in json
Sage Weil [Fri, 24 Feb 2012 19:33:48 +0000 (11:33 -0800)]
osd: dump recovery_state states in json

Use a formatter.  Present a vector of states, inner to outer.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agoosd: query Peering substates
Sage Weil [Fri, 24 Feb 2012 00:30:42 +0000 (16:30 -0800)]
osd: query Peering substates

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: query recovery state machine
Sage Weil [Fri, 24 Feb 2012 00:22:08 +0000 (16:22 -0800)]
osd: query recovery state machine

For now, just append this to the end of the pg <pgid> query json dump.
We definitely want to do something smarter here, but I'm not sure whether
json or plaintext is the way to go.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: add tunable for number of records in osd command replies
Sage Weil [Fri, 24 Feb 2012 17:27:49 +0000 (09:27 -0800)]
osd: add tunable for number of records in osd command replies

e.g., 'pg <pgid> list_missing [offset]'.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: 'pg <pgid> list_missing <json hobject_t offset>'
Sage Weil [Fri, 24 Feb 2012 04:30:44 +0000 (20:30 -0800)]
osd: 'pg <pgid> list_missing <json hobject_t offset>'

Dump missing objects in json.  If more key is non-zero, user should ask for
more by passing the last object as the offset for the next request.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
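
The paging contract described in the list_missing commit above (a "more" flag, with the last returned object passed back as the next offset) can be sketched like this. Everything here is an assumption made for illustration: plain strings stand in for hobject_t, the Page struct and function signature are hypothetical, the offset is treated as exclusive, and max_records plays the role of the tunable record limit.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical reply shape: one page of results plus a "more" flag.
struct Page {
    std::vector<std::string> objects;
    bool more;  // true if entries remain after the last one returned
};

// Return up to max_records entries strictly after `offset`
// (an empty offset means "start from the beginning").
Page list_missing(const std::set<std::string>& missing,
                  const std::string& offset, std::size_t max_records) {
    assert(max_records > 0);
    Page page{{}, false};
    auto it = offset.empty() ? missing.begin() : missing.upper_bound(offset);
    for (; it != missing.end(); ++it) {
        if (page.objects.size() == max_records) {
            page.more = true;  // caller should ask again with the last object
            break;
        }
        page.objects.push_back(*it);
    }
    return page;
}
```

A caller loops while more is set, passing objects.back() from the previous page as the next offset.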
13 years agohobject_t: decode json
Sage Weil [Fri, 24 Feb 2012 14:07:40 +0000 (06:07 -0800)]
hobject_t: decode json

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agoadd libjson_spirit.la
Sage Weil [Fri, 24 Feb 2012 14:04:14 +0000 (06:04 -0800)]
add libjson_spirit.la

This is lightweight and relies on boost spirit, which we already use, so
there are no new dependencies.

There were some other libraries that also looked good, but they weren't
already packaged for existing Debian distros like squeeze or even wheezy.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agoosd: pass in data to do_command
Sage Weil [Fri, 24 Feb 2012 04:16:05 +0000 (20:16 -0800)]
osd: pass in data to do_command

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agoosd: 'tell osd.N mark_unfound_lost revert' -> 'pg <pgid> mark_unfound_lost revert'
Sage Weil [Fri, 24 Feb 2012 19:24:04 +0000 (11:24 -0800)]
osd: 'tell osd.N mark_unfound_lost revert' -> 'pg <pgid> mark_unfound_lost revert'

More consistent interface.

Fixes: #2030
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Reviewed-by: Josh Durgin <josh.durgin@dreamhost.com>
13 years agolockdep: warn on stderr (via derr), not stdout
Sage Weil [Fri, 24 Feb 2012 15:06:51 +0000 (07:06 -0800)]
lockdep: warn on stderr (via derr), not stdout

Otherwise we screw up ceph-conf output and the like.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agodo_autogen.sh: -T for --without-tcmalloc
Sage Weil [Thu, 23 Feb 2012 17:44:05 +0000 (09:44 -0800)]
do_autogen.sh: -T for --without-tcmalloc

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoceph: fix help.t
Sage Weil [Fri, 24 Feb 2012 02:58:35 +0000 (18:58 -0800)]
ceph: fix help.t

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agov0.42.1 v0.42.1
Sage Weil [Fri, 24 Feb 2012 02:46:30 +0000 (18:46 -0800)]
v0.42.1

13 years agodebian: add ceph-dencoder
Sage Weil [Tue, 21 Feb 2012 19:12:37 +0000 (11:12 -0800)]
debian: add ceph-dencoder

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoceph.spec.in: add ceph-dencoder
Sage Weil [Tue, 21 Feb 2012 19:12:30 +0000 (11:12 -0800)]
ceph.spec.in: add ceph-dencoder

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoceph-dencoder: man page
Sage Weil [Tue, 21 Feb 2012 19:12:13 +0000 (11:12 -0800)]
ceph-dencoder: man page

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoceph-tool: remove reference to "stop" command
Greg Farnum [Fri, 24 Feb 2012 02:13:29 +0000 (18:13 -0800)]
ceph-tool: remove reference to "stop" command

This doesn't exist any more, and I don't think it
ever "cleanly shut down the filesystem" -- certainly not
within my recent lifetime!

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Reviewed-by: Dan Mick <dan.mick@dreamhost.com>
13 years agomds: remove unused MDBalancer dump_pop_map() function.
Greg Farnum [Thu, 23 Feb 2012 23:40:20 +0000 (15:40 -0800)]
mds: remove unused MDBalancer dump_pop_map() function.

Commenting it out is not the right answer. ;)

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Reviewed-by: Dan Mick <dan.mick@dreamhost.com>
13 years agomds: clean up useless block
Sage Weil [Fri, 24 Feb 2012 00:35:22 +0000 (16:35 -0800)]
mds: clean up useless block

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agomds: fix Resetter locking
Sage Weil [Thu, 23 Feb 2012 05:15:19 +0000 (21:15 -0800)]
mds: fix Resetter locking

We need to hold the lock for ms_dispatch, esp calls into objecter.  We
should only drop it when blocking; use distinct naming for the on-stack
mutex used for that.

Reported-by: Alexandre Oliva <oliva@lsd.ic.unicamp.br>
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agoMerge remote branch 'origin/wip-mds-old-inodes'
Greg Farnum [Thu, 23 Feb 2012 23:33:39 +0000 (15:33 -0800)]
Merge remote branch 'origin/wip-mds-old-inodes'

Reviewed-by: Greg Farnum <gregory.farnum@dreamhost.com>
13 years agoMerge remote branch 'origin/wip-dencoder'
Greg Farnum [Thu, 23 Feb 2012 23:06:32 +0000 (15:06 -0800)]
Merge remote branch 'origin/wip-dencoder'

Reviewed-by: Greg Farnum <gregory.farnum@dreamhost.com>
13 years agoMerge remote branch 'origin/wip-1820'
Greg Farnum [Thu, 23 Feb 2012 23:06:15 +0000 (15:06 -0800)]
Merge remote branch 'origin/wip-1820'

Reviewed-by: Greg Farnum <gregory.farnum@dreamhost.com>
13 years agoosd: only set CLEAN when we are not remapped (up == acting)
Sage Weil [Thu, 23 Feb 2012 23:05:46 +0000 (15:05 -0800)]
osd: only set CLEAN when we are not remapped (up == acting)

If we have a temporary mapping for this PG, consider that unclean.  This
makes CLEAN and REMAPPED mutually exclusive.  For example, a 2 node cluster
with 2x replication and one osd marked out will make the pgs all
active+remapped, not active+clean+remapped.

Fixes: #2094
Signed-off-by: Sage Weil <sage@newdream.net>
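
The rule in the CLEAN commit above (clean only when up == acting) can be expressed as a small sketch. The pg_state() helper and its recovered flag are hypothetical; the real OSD tracks many more state bits. The point is only that CLEAN and REMAPPED become mutually exclusive.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: builds a state string for an active PG from its
// up set, acting set, and whether recovery has finished.
std::string pg_state(const std::vector<int>& up,
                     const std::vector<int>& acting,
                     bool recovered) {
    assert(!acting.empty());
    std::string s = "active";
    const bool remapped = (up != acting);  // a temporary mapping is in effect
    if (recovered && !remapped)
        s += "+clean";                     // clean only when up == acting
    if (remapped)
        s += "+remapped";
    return s;
}
```

With up = [0] and acting = [0, 1] (an illustrative guess at the two-node case from the commit message), the result is active+remapped rather than active+clean+remapped.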
13 years agoMerge remote-tracking branch 'gh/wip-pg-query'
Sage Weil [Thu, 23 Feb 2012 22:56:54 +0000 (14:56 -0800)]
Merge remote-tracking branch 'gh/wip-pg-query'

Reviewed-by: Josh Durgin <josh.durgin@dreamhost.com>
13 years agoosd: conditionally encode old pg_pool_t when no CEPH_FEATURE_OSDENC
Greg Farnum [Thu, 23 Feb 2012 22:55:48 +0000 (14:55 -0800)]
osd: conditionally encode old pg_pool_t when no CEPH_FEATURE_OSDENC

This fixes OSDMap compatibility between v0.42 and <v0.42.

For MOSDMap, reencode maps if OSDENC feature is missing.  Also rev the
message version.  We don't use COMPAT version here because v3 can't be
understood by v2 (that's why we're checking feature bits).  (It will be
possible to do that later when our constituent types can be decoded by
multiple versions.)

Fixes: #2095
Signed-off-by: Sage Weil <sage@newdream.net>
Reviewed-by: Greg Farnum <gregory.farnum@dreamhost.com>
13 years agoMerge remote-tracking branch 'gh/wip-dump-ops-in-flight'
Sage Weil [Thu, 23 Feb 2012 22:38:03 +0000 (14:38 -0800)]
Merge remote-tracking branch 'gh/wip-dump-ops-in-flight'

Reviewed-by: Sage Weil <sage@newdream.net>
13 years agomon: use pending_mdsmap for deactivate
Sage Weil [Thu, 23 Feb 2012 22:24:12 +0000 (14:24 -0800)]
mon: use pending_mdsmap for deactivate

We should always look at the proposed map to avoid weird races.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agodoc: 'deactivate mds' instead of 'stop mds'
Sage Weil [Thu, 23 Feb 2012 22:27:34 +0000 (14:27 -0800)]
doc: 'deactivate mds' instead of 'stop mds'

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agomon: mds "stop" -> "deactivate"
Sage Weil [Thu, 23 Feb 2012 20:16:59 +0000 (12:16 -0800)]
mon: mds "stop" -> "deactivate"

See #1820.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agotest: add basic test for the OSD's dump_ops_in_flight adminsocket command
Greg Farnum [Thu, 23 Feb 2012 20:11:27 +0000 (12:11 -0800)]
test: add basic test for the OSD's dump_ops_in_flight adminsocket command

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
13 years agoosd: add "dump_ops_in_flight" to the AdminSocket.
Greg Farnum [Wed, 15 Feb 2012 02:53:49 +0000 (18:53 -0800)]
osd: add "dump_ops_in_flight" to the AdminSocket.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
13 years agomon: refuse to stop mds if max_mds will make it rejoin
Sage Weil [Thu, 23 Feb 2012 20:08:52 +0000 (12:08 -0800)]
mon: refuse to stop mds if max_mds will make it rejoin

Otherwise the MDS will leave the cluster and immediately rejoin, which is
useless and confusing to users.  See #1820.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agocrushtool: add --reweight-item cli tests
Sage Weil [Thu, 23 Feb 2012 19:42:08 +0000 (11:42 -0800)]
crushtool: add --reweight-item cli tests

Test list, tree, and straw buckets.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agocrush: fix weight adjust for list, tree buckets
Sage Weil [Thu, 23 Feb 2012 19:03:44 +0000 (11:03 -0800)]
crush: fix weight adjust for list, tree buckets

Fix the typo.  Code now matches that for straw buckets.

Reported-by: ZhuRongze <zrz4ceph@gmail.com>
Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoMerge branch 'wip-2090'
Sage Weil [Thu, 23 Feb 2012 19:16:17 +0000 (11:16 -0800)]
Merge branch 'wip-2090'

Reviewed-by: Greg Farnum <gregory.farnum@dreamhost.com>
13 years agomon: unlock mon before msgr shutdown
Sage Weil [Thu, 23 Feb 2012 04:49:04 +0000 (20:49 -0800)]
mon: unlock mon before msgr shutdown

The ceph_mon.cc main() will delete mon when the msgr dispatch thread
completes.  Make sure we unlock before we shut down the messenger, and
avoid touching this after messenger->shutdown().

Fixes: #2090
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agomon: deprecate mon 'stop' command
Sage Weil [Thu, 23 Feb 2012 04:43:20 +0000 (20:43 -0800)]
mon: deprecate mon 'stop' command

Send SIGTERM.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agomsgr: join dispatch_thread after it completes
Sage Weil [Thu, 23 Feb 2012 04:37:40 +0000 (20:37 -0800)]
msgr: join dispatch_thread after it completes

This is just for completeness.  No change in behavior, since we don't
get here until the thread has signaled it is done.

Drop the destroy() overload, since we join earlier.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agoMerge remote-tracking branch 'gh/wip-stop'
Sage Weil [Thu, 23 Feb 2012 19:04:30 +0000 (11:04 -0800)]
Merge remote-tracking branch 'gh/wip-stop'

13 years agofilestore: use IOC_CLONERANGE intead of IOC_CLONE ioctl
Sage Weil [Thu, 23 Feb 2012 17:51:31 +0000 (09:51 -0800)]
filestore: use IOC_CLONERANGE intead of IOC_CLONE ioctl

This is functionally equivalent, except that valgrind doesn't complain
about a bad pointer passed to an ioctl.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Reviewed-by: Samuel Just <samuel.just@dreamhost.com>
13 years agoosd: drop "stop" command
Sage Weil [Thu, 23 Feb 2012 17:43:03 +0000 (09:43 -0800)]
osd: drop "stop" command

Send SIGINT.

Fixes: #1820
Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: drop unused "stop" check
Sage Weil [Thu, 23 Feb 2012 17:42:11 +0000 (09:42 -0800)]
osd: drop unused "stop" check

This is never reached: both callers handle "stop" explicitly.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: don't complete recovery if unfound
Sage Weil [Thu, 23 Feb 2012 17:39:50 +0000 (09:39 -0800)]
osd: don't complete recovery if unfound

Otherwise we fail the !needs_recovery() assert.  Because we aren't
recovered.  For example,

2012-02-21 16:16:13.104665 1685c700 osd.5 1217 pg[0.16( v 1215'337 lc 19'2 (0'0,1215'337] n=25 ec=1 les/c 0/1061 1210/1210/1210) [5,3] r=0 lpr=1210 mlcod 0'0 active m=23 u=23 snaptrimq=[1~99,9b~e,aa~72,11d~3d,15b~e,16a~f,17a~5,180~4,185~1a,1a0~a,1ac~10,1bd~4,1c2~8,1cb~1,1cd~1,1cf~1a,1ea~10,1fb~6,202~2,205~2,209~2,20c~8,215~2,218~5,21e~1,220~1,222~9,22c~4,231~3,235~2,238~3,23e~2,241~4,246~1,248~1,24b~1,24d~9,257~6,25e~1,263~1,265~2,268~3,26e~1,273~1,275~5,27e~1,280~2]] needs_recovery osd.3 has 23 missing
osd/PG.cc: In function 'boost::statechart::result PG::RecoveryState::Active::react(const PG::RecoveryState::RecoveryComplete&)' thread 1685c700 time 2012-02-21 16:16:13.108923
osd/PG.cc: 4070: FAILED assert(!pg->needs_recovery())
 ceph version 0.42-70-g0e4367a (commit:0e4367aaac88b99c36386b6ce5e8d816fdd4ada0)
 1: (PG::RecoveryState::Active::react(PG::RecoveryState::RecoveryComplete const&)+0x1b3) [0x6a1173]
 2: (boost::statechart::simple_state<PG::RecoveryState::Active, PG::RecoveryState::Primary, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0x121) [0x6c7301]
 3: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&)+0x6b) [0x6bfc6b]
 4: (PG::RecoveryState::handle_recovery_complete(PG::RecoveryCtx*)+0x10c) [0x67c03c]
 5: (ReplicatedPG::start_recovery_ops(int, PG::RecoveryCtx*)+0x241) [0x4f83c1]
 6: (OSD::do_recovery(PG*)+0x345) [0x54b3e5]
 7: (ThreadPool::worker()+0xa26) [0x619e66]
 8: (ThreadPool::WorkThread::entry()+0xd) [0x57ad5d]
 9: (()+0x7971) [0x5037971]
 10: (clone()+0x6d) [0x679f92d]

Signed-off-by: Sage Weil <sage@newdream.net>
Reviewed-by: Samuel Just <samuel.just@dreamhost.com>