git.apps.os.sepia.ceph.com Git - ceph.git/log
13 years agomon: include unfound count in health detail
Sage Weil [Tue, 6 Mar 2012 23:17:33 +0000 (15:17 -0800)]
mon: include unfound count in health detail

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agomon: refactor health, include optional detail
Sage Weil [Wed, 7 Mar 2012 01:05:22 +0000 (17:05 -0800)]
mon: refactor health, include optional detail

Use 'ceph health' to get the usual summary, and 'ceph health detail' to
additionally get a comprehensive list of the problems found.

Eventually we can format this as yaml, json, whatever, too.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoMerge branch 'wip_omap'
Samuel Just [Tue, 6 Mar 2012 19:46:24 +0000 (11:46 -0800)]
Merge branch 'wip_omap'

Reviewed-by: Sage Weil <sage.weil@dreamhost.com>
13 years agotest_rados_api_aio: add omap
Samuel Just [Tue, 6 Mar 2012 19:32:04 +0000 (11:32 -0800)]
test_rados_api_aio: add omap

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
13 years agoosd: testing for tmap auto upgrade
Samuel Just [Tue, 6 Mar 2012 18:35:24 +0000 (10:35 -0800)]
osd: testing for tmap auto upgrade

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
13 years agoReplicatedPG: transparently upgrade TMAP
Samuel Just [Fri, 2 Mar 2012 17:25:13 +0000 (09:25 -0800)]
ReplicatedPG: transparently upgrade TMAP

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
13 years agoRadosModel: Add omap operations to RadosModel
Samuel Just [Tue, 7 Feb 2012 16:57:19 +0000 (08:57 -0800)]
RadosModel: Add omap operations to RadosModel

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
13 years agoReplicatedPG: Add omap ops to ReplicatedPG
Samuel Just [Fri, 2 Mar 2012 00:22:27 +0000 (16:22 -0800)]
ReplicatedPG: Add omap ops to ReplicatedPG

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
13 years agolibrados: Added omap operations to librados
Samuel Just [Thu, 1 Mar 2012 22:52:20 +0000 (14:52 -0800)]
librados: Added omap operations to librados

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
13 years agoosdc: Add omap operation stubs to Objecter::ObjectOperation
Samuel Just [Thu, 1 Mar 2012 20:33:33 +0000 (12:33 -0800)]
osdc: Add omap operation stubs to Objecter::ObjectOperation

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
13 years agoReplicatedPG: add omap_header to recovery
Samuel Just [Fri, 2 Mar 2012 19:12:56 +0000 (11:12 -0800)]
ReplicatedPG: add omap_header to recovery

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
13 years agolibrados: add tmap_put to ObjectWriteOperation
Samuel Just [Tue, 6 Mar 2012 18:34:21 +0000 (10:34 -0800)]
librados: add tmap_put to ObjectWriteOperation

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
13 years agoMerge branch 'wip-1796'
Sage Weil [Tue, 6 Mar 2012 19:03:01 +0000 (11:03 -0800)]
Merge branch 'wip-1796'

Reviewed-by: Greg Farnum <gregory.farnum@dreamhost.com>
13 years agomds: respawn when blacklisted
Sage Weil [Sat, 3 Mar 2012 22:28:21 +0000 (14:28 -0800)]
mds: respawn when blacklisted

If we are blacklisted by the OSD cluster, it's because we were too slow
and were replaced by another ceph-mds.  Respawn and re-register as a
standby.

If we get some other write error, shut down.
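
The policy above can be sketched roughly as follows. This is an
illustration only, not the MDS code: WriteErrorAction,
classify_write_error and kBlacklisted are hypothetical names, and the
errno value used to mean "blacklisted" is an assumption.

```cpp
#include <cassert>
#include <cerrno>

// Assumed stand-in for the error a blacklisted client gets back from the
// OSD cluster; the real constant is not shown in this log.
constexpr int kBlacklisted = ESHUTDOWN;

enum class WriteErrorAction { Respawn, Shutdown };

// Decide what the daemon does after a journal write fails.
// err is a negative errno-style code.
inline WriteErrorAction classify_write_error(int err) {
  if (err == -kBlacklisted) {
    // We were too slow and got replaced: come back as a standby.
    return WriteErrorAction::Respawn;
  }
  // Any other write error is treated as fatal.
  return WriteErrorAction::Shutdown;
}
```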

Fixes: #1796
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agojournaler: add generic write error handler
Sage Weil [Sat, 3 Mar 2012 22:25:25 +0000 (14:25 -0800)]
journaler: add generic write error handler

Specify a generic callback for any write error the journaler encounters.
This is more helpful than passing up write errors to specific callers
because

 - there are several of them
 - journaler initiates writes on its own (like the head)
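
A minimal sketch of the idea, assuming a callback registered once on the
journal object. JournalSketch, set_write_error_handler, append_entry,
write_head and count_errors are hypothetical names for illustration and
are not taken from the Journaler source.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical journal writer with one error callback shared by every
// write path, including writes it starts on its own.
class JournalSketch {
 public:
  using ErrorHandler = std::function<void(int)>;

  void set_write_error_handler(ErrorHandler h) { on_error_ = std::move(h); }

  // A caller-initiated write; `result` simulates the completion code.
  void append_entry(int result) { complete_write(result); }

  // A write the journal starts by itself (for example, the head).
  void write_head(int result) { complete_write(result); }

 private:
  void complete_write(int result) {
    if (result < 0 && on_error_) {
      on_error_(result);  // one place sees every failure
    }
  }

  ErrorHandler on_error_;
};

// Helper: run both write paths and count how often the handler fired.
inline int count_errors(int append_result, int head_result) {
  int errors = 0;
  JournalSketch j;
  j.set_write_error_handler([&errors](int) { ++errors; });
  j.append_entry(append_result);
  j.write_head(head_result);
  return errors;
}
```

With a single handler, a failure in the self-initiated head write is
reported the same way as a failure in a caller's write.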

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agoMerge remote-tracking branch 'gh/wip-2105'
Sage Weil [Tue, 6 Mar 2012 18:49:18 +0000 (10:49 -0800)]
Merge remote-tracking branch 'gh/wip-2105'

Reviewed-by: Samuel Just <samuel.just@dreamhost.com>
13 years ago.gitignore: src/ocf/rbd
Sage Weil [Tue, 6 Mar 2012 18:24:04 +0000 (10:24 -0800)]
.gitignore: src/ocf/rbd

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agofilestore: create snap_0 on mkfs
Sage Weil [Tue, 6 Mar 2012 17:19:32 +0000 (09:19 -0800)]
filestore: create snap_0 on mkfs

If we create a new filestore, apply one transaction, and then crash, we
want to make sure we roll back to a consistent reference point--empty.  The
simplest solution is to create that snap_0 during mkfs.  This avoids
strangeness like

2012-02-27 00:42:00.336703 7fb1381ef780 filestore(/ceph/osd.0) mkfs in /ceph/osd.0
2012-02-27 00:42:00.341399 7fb1381ef780 journal _open /ceph/osd.0.journal fd 10: 1048576000 bytes, block size 4096 bytes, directio = 1, aio = 0
2012-02-27 00:42:00.349705 7fb1381ef780 filestore(/ceph/osd.0) mkjournal created journal on /ceph/osd.0.journal
2012-02-27 00:42:00.349728 7fb1381ef780 filestore(/ceph/osd.0) mkfs done in /ceph/osd.0
2012-02-27 00:42:00.349787 7fb1381ef780 filestore(/ceph/osd.0) mount FIEMAP ioctl is NOT supported
2012-02-27 00:42:00.349800 7fb1381ef780 filestore(/ceph/osd.0) mount detected btrfs
2012-02-27 00:42:00.349813 7fb1381ef780 filestore(/ceph/osd.0) mount btrfs CLONE_RANGE ioctl is supported
2012-02-27 00:42:00.357023 7fb1381ef780 filestore(/ceph/osd.0) mount btrfs SNAP_CREATE is supported
2012-02-27 00:42:00.405174 7fb1381ef780 filestore(/ceph/osd.0) mount btrfs SNAP_DESTROY is supported
2012-02-27 00:42:00.405214 7fb1381ef780 filestore(/ceph/osd.0) mount btrfs START_SYNC got (25) Inappropriate ioctl for device
2012-02-27 00:42:00.405228 7fb1381ef780 filestore(/ceph/osd.0) mount btrfs START_SYNC is NOT supported: (25) Inappropriate ioctl for device
2012-02-27 00:42:00.405235 7fb1381ef780 filestore(/ceph/osd.0) mount WARNING: btrfs snaps enabled, but no SNAP_CREATE_V2 ioctl (from kernel 2.6.37+)
2012-02-27 00:42:00.405561 7fb1381ef780 filestore(/ceph/osd.0) mount found snaps <>
2012-02-27 00:42:00.405576 7fb1381ef780 filestore(/ceph/osd.0) mount WARNING: no consistent snaps found, store may be in inconsistent state

and subsequent badness if we fail before a proper commit is made.

Fixes: #2105
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agofilestore: drop useless read_op_seq() arg
Sage Weil [Tue, 6 Mar 2012 17:19:16 +0000 (09:19 -0800)]
filestore: drop useless read_op_seq() arg

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoMerge pull request #9 from fghaas/ocf-ra
Sage Weil [Tue, 6 Mar 2012 17:14:25 +0000 (09:14 -0800)]
Merge pull request #9 from fghaas/ocf-ra

OCF resource agents: add rbd

Reviewed-by: Sage Weil <sage@newdream.net>
Reviewed-by: Tommi Virtanen <tommi.virtanen@dreamhost.com>
13 years agorbd OCF RA: fix whitespace inconsistency 9/head
Florian Haas [Tue, 6 Mar 2012 08:58:42 +0000 (09:58 +0100)]
rbd OCF RA: fix whitespace inconsistency

Signed-off-by: Florian Haas <florian@hastexo.com>
13 years agoMerge remote branch 'gh/wip-msgr-interface'
Sage Weil [Tue, 6 Mar 2012 06:48:07 +0000 (22:48 -0800)]
Merge remote branch 'gh/wip-msgr-interface'

Reviewed-by: Sage Weil <sage@newdream.net>
13 years agoMerge remote branch 'gh/wip-swift-acls'
Sage Weil [Mon, 5 Mar 2012 22:35:30 +0000 (14:35 -0800)]
Merge remote branch 'gh/wip-swift-acls'

Lightly-reviewed-by: Sage Weil <sage@newdream.net>
13 years agoosd: delay non-replayed ops during replay
Sage Weil [Mon, 5 Mar 2012 22:21:31 +0000 (14:21 -0800)]
osd: delay non-replayed ops during replay

If we get new (non-replayed) ops during replay, those need to wait until
after the replayed ops are ordered and applied.  Otherwise we break the op
ordering completely, particularly with something like

 - pg not active
 - get op 1, put on waiting_for_active
 - pg enters replay
 - get op 2, apply immediately
 - finish replay, requeue op 1
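
The ordering rule can be sketched with a toy model. PgSketch and
run_scenario are hypothetical; ops are plain integers, and the replayed
ops themselves are left out for brevity.

```cpp
#include <cassert>
#include <deque>
#include <vector>

// Toy PG: `applied` records the order in which ops take effect.
struct PgSketch {
  bool active = false;
  bool in_replay = false;
  std::deque<int> waiting;  // ops that arrived before we could apply them
  std::vector<int> applied;

  void handle_op(int op) {
    // While inactive or replaying, new ops queue behind older ones.
    if (!active || in_replay) {
      waiting.push_back(op);
      return;
    }
    applied.push_back(op);
  }

  void enter_replay() { in_replay = true; }

  void finish_replay() {
    in_replay = false;
    active = true;
    // Requeue in arrival order so op 1 still precedes op 2.
    while (!waiting.empty()) {
      applied.push_back(waiting.front());
      waiting.pop_front();
    }
  }
};

// Replays the sequence from the list above.
inline std::vector<int> run_scenario() {
  PgSketch pg;
  pg.handle_op(1);  // pg not active: queued
  pg.enter_replay();
  pg.handle_op(2);  // arrives during replay: queued, not applied
  pg.finish_replay();
  return pg.applied;
}
```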

Fixes: #2082
Signed-off-by: Sage Weil <sage@newdream.net>
Reviewed-by: Samuel Just <samuel.just@dreamhost.com>
13 years agolibrados: close narrow shutdown race
Sage Weil [Mon, 5 Mar 2012 22:21:12 +0000 (14:21 -0800)]
librados: close narrow shutdown race

timer.shutdown() will drop and retake the lock, so set DISCONNECTED first
to avoid a message slipping in and reaching the objecter like so:

INFO:teuthology.task.rados.rados.0.err:osdc/Objecter.cc: In function 'void Objecter::handle_osd_op_reply(MOSDOpReply*)' thread 7f0bc2b1b700 time 2012-03-03 18:35:25.302135
INFO:teuthology.task.rados.rados.0.err:osdc/Objecter.cc: 1151: FAILED assert(initialized)
INFO:teuthology.task.rados.rados.0.err: ceph version 0.43-46-g2e57997 (commit:2e57997894944696fcc737aae9b57e30b6bb5bdc)
INFO:teuthology.task.rados.rados.0.err: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0xb3) [0x7f0bc59bd66f]
INFO:teuthology.task.rados.rados.0.err: 2: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0x82) [0x7f0bc58e885e]
INFO:teuthology.task.rados.rados.0.err: 3: (librados::RadosClient::_dispatch(Message*)+0x66) [0x7f0bc58a2674]
INFO:teuthology.task.rados.rados.0.err: 4: (librados::RadosClient::ms_dispatch(Message*)+0x130) [0x7f0bc58a246e]
INFO:teuthology.task.rados.rados.0.err: 5: (Messenger::ms_deliver_dispatch(Message*)+0x8b) [0x7f0bc5a4e859]
INFO:teuthology.task.rados.rados.0.err: 6: (SimpleMessenger::dispatch_entry()+0x7c2) [0x7f0bc5a377fc]
INFO:teuthology.task.rados.rados.0.err: 7: (SimpleMessenger::DispatchThread::entry()+0x2c) [0x7f0bc58b5512]
INFO:teuthology.task.rados.rados.0.err: 8: (Thread::_entry_func(void*)+0x23) [0x7f0bc5ac4c75]
INFO:teuthology.task.rados.rados.0.err: 9: (()+0x7971) [0x7f0bc5110971]
INFO:teuthology.task.rados.rados.0.err: 10: (clone()+0x6d) [0x7f0bc495092d]
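
A rough sketch of the fix, under the assumption that dispatch checks the
client state under the same lock. ClientSketch and
late_message_delivered are hypothetical, and the callback stands in for
a message arriving while timer.shutdown() has the lock released.

```cpp
#include <cassert>
#include <functional>
#include <mutex>

enum class State { Connected, Disconnected };

class ClientSketch {
 public:
  // Returns true if the message reached the (still initialized) objecter.
  bool dispatch() {
    std::lock_guard<std::mutex> l(lock_);
    if (state_ != State::Connected) return false;  // drop late messages
    assert(objecter_initialized_);
    return true;
  }

  // `during_timer_shutdown` runs while the lock is released.
  void shutdown(const std::function<void()>& during_timer_shutdown) {
    std::unique_lock<std::mutex> l(lock_);
    state_ = State::Disconnected;  // set first, before the lock is dropped
    l.unlock();                    // models timer.shutdown() dropping the lock
    during_timer_shutdown();
    l.lock();                      // ...and retaking it
    objecter_initialized_ = false;
  }

 private:
  std::mutex lock_;
  State state_ = State::Connected;
  bool objecter_initialized_ = true;
};

// True if a message that arrives in the window gets delivered.
inline bool late_message_delivered() {
  ClientSketch c;
  bool delivered = false;
  c.shutdown([&] { delivered = c.dispatch(); });
  return delivered;
}
```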

Fixes: #2135
Signed-off-by: Sage Weil <sage@newdream.net>
Reviewed-by: Josh Durgin <josh.durgin@dreamhost.com>
13 years agoosd: don't trust pusher's data_complete
Sage Weil [Mon, 5 Mar 2012 22:21:00 +0000 (14:21 -0800)]
osd: don't trust pusher's data_complete

The pusher doesn't know what clone_overlap we'll see, so it has no idea
if we are data_complete from our perspective, making this check useless.
In particular, we screw up if we race with a recalculation of
clone_overlap.

Fixes: #2133
Signed-off-by: Sage Weil <sage@newdream.net>
Reviewed-by: Samuel Just <samuel.just@dreamhost.com>
13 years agoosd: warn if recovery still has missing at end
Sage Weil [Mon, 5 Mar 2012 22:20:48 +0000 (14:20 -0800)]
osd: warn if recovery still has missing at end

We shouldn't get to this point.  If we do, recover_primary didn't do what
it needed to.  Dump the remaining missing set and hope we can debug.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoOCF resource agents: add rbd
Florian Haas [Sat, 3 Mar 2012 23:40:55 +0000 (00:40 +0100)]
OCF resource agents: add rbd

Add a resource agent for mapping, unmapping and monitoring RBD devices.

Maps an RBD on start, unmaps it on stop. Checks "rbd showmapped"
output for monitoring whether the device is mapped, thus does not
rely on the ceph-rbdnamer udev magic to be enabled.

This RA is cloneable and essentially allows people to use RBD devices
as a drop-in replacement for
- iSCSI devices,
- host-based mirrored devices using md RAID-1,
- DRBD devices
in Pacemaker clusters.

13 years agoDBObjectMap: remove stray ;
Sage Weil [Sun, 4 Mar 2012 05:01:45 +0000 (21:01 -0800)]
DBObjectMap: remove stray ;

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agoLevelDBStore: #include types.h
Sage Weil [Sat, 3 Mar 2012 22:28:55 +0000 (14:28 -0800)]
LevelDBStore: #include types.h

This fixes some compile errors on one of my boxes (squeeze).

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years ago.gitignore: *.tar.bz2
Sage Weil [Fri, 2 Mar 2012 22:59:51 +0000 (14:59 -0800)]
.gitignore: *.tar.bz2

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agomsgr: start re-ordering functions into a better order
Greg Farnum [Fri, 2 Mar 2012 22:46:06 +0000 (14:46 -0800)]
msgr: start re-ordering functions into a better order

This is the start of making the SimpleMessenger interface legible
to users. In addition to moving the configuration and accessor
functions to the top of the file, it adds virtual to the functions
which are part of the defined Messenger interface.
You can tell from some of the comments that work remains.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
13 years agoMerge branch 'stable'
Sage Weil [Fri, 2 Mar 2012 21:45:03 +0000 (13:45 -0800)]
Merge branch 'stable'

13 years agomsgr: remove refcounting of Messengers.
Greg Farnum [Fri, 2 Mar 2012 19:08:51 +0000 (11:08 -0800)]
msgr: remove refcounting of Messengers.

This was pretty pointless since each Messenger has a well-defined
exit point and shutdown process.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
13 years agomsgr: make nonce a required part of the SimpleMessenger constructor.
Greg Farnum [Fri, 2 Mar 2012 02:48:46 +0000 (18:48 -0800)]
msgr: make nonce a required part of the SimpleMessenger constructor.

With that, remove the set_nonce function and the gratuitous passing
of nonce around through layers of functions.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
13 years agomsgr: Require that init functions are called before bind() and start().
Greg Farnum [Fri, 2 Mar 2012 02:31:49 +0000 (18:31 -0800)]
msgr: Require that init functions are called before bind() and start().

Fix up callers to handle these constraints.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
13 years agolibrados: remove gratuitous call to add_dispatcher_head.
Greg Farnum [Fri, 2 Mar 2012 02:23:51 +0000 (18:23 -0800)]
librados: remove gratuitous call to add_dispatcher_head.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
13 years agomsgr: promote the started bool to Messenger.
Greg Farnum [Fri, 2 Mar 2012 01:52:45 +0000 (17:52 -0800)]
msgr: promote the started bool to Messenger.

Make it a protected member of Messenger instead of a public part of
SimpleMessenger.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
13 years agomsgr: Remove the SimpleMessenger::bind() nonce parameter.
Greg Farnum [Fri, 2 Mar 2012 01:19:22 +0000 (17:19 -0800)]
msgr: Remove the SimpleMessenger::bind() nonce parameter.

Instead, use the just-established nonce value.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
13 years agomsgr: Remove the SimpleMessenger start/start_with_nonce distinction.
Greg Farnum [Fri, 2 Mar 2012 01:12:28 +0000 (17:12 -0800)]
msgr: Remove the SimpleMessenger start/start_with_nonce distinction.

Instead, have a settable nonce value that you can fill in any time
after construction and that it uses during regular start().

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
13 years agomsgr: Remove SimpleMessenger::register_entity
Greg Farnum [Thu, 1 Mar 2012 23:14:52 +0000 (15:14 -0800)]
msgr: Remove SimpleMessenger::register_entity

This function has been vestigial for a long time. Remove it and move
its remaining functionality into the constructor.
Update users to the new interface (this is remarkably easy and
simplifies the code).

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
13 years agomsgr: add start() and wait() stubs to the Messenger interface
Greg Farnum [Thu, 1 Mar 2012 22:06:42 +0000 (14:06 -0800)]
msgr: add start() and wait() stubs to the Messenger interface

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
13 years agogithub.com/NewDreamNetwork -> github.com/ceph
Sage Weil [Fri, 2 Mar 2012 19:00:08 +0000 (11:00 -0800)]
github.com/NewDreamNetwork -> github.com/ceph

13 years agofilestore: fix rollback safety check
Sage Weil [Fri, 2 Mar 2012 17:44:04 +0000 (09:44 -0800)]
filestore: fix rollback safety check

There is a window in the old check between when current/commit_op_seq is
written and the snapshot is taken.  If ceph-osd crashes, we'll be unable to
start because we'll believe current/ was in use without proper checkpoints.

Instead, make the snapped/not snapped state of current/ explicit.

Fixes: #2118
Signed-off-by: Sage Weil <sage@newdream.net>
Reviewed-by: Samuel Just <samuel.just@dreamhost.com>
Reviewed-by: Yehuda Sadeh <yehuda.sadeh@dreamhost.com>
13 years agoMerge remote branch 'gh/wip_fs_omap'
Sage Weil [Fri, 2 Mar 2012 17:35:11 +0000 (09:35 -0800)]
Merge remote branch 'gh/wip_fs_omap'

Reviewed-by: Sage Weil <sage.weil@dreamhost.com>
13 years agov0.43 v0.43
Sage Weil [Fri, 2 Mar 2012 16:53:30 +0000 (08:53 -0800)]
v0.43

13 years agoRadosModel: separate initialization and construction
Josh Durgin [Tue, 7 Feb 2012 01:37:55 +0000 (17:37 -0800)]
RadosModel: separate initialization and construction

Several error codes needed to be checked.

Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
Reviewed-by: Samuel Just <samuel.just@dreamhost.com>
13 years agoMerge branch 'next'
Josh Durgin [Fri, 2 Mar 2012 01:17:38 +0000 (17:17 -0800)]
Merge branch 'next'

13 years agolibrados: only shutdown objecter after it's initialized
Josh Durgin [Tue, 7 Feb 2012 01:59:36 +0000 (17:59 -0800)]
librados: only shutdown objecter after it's initialized

The objecter is only initialized once the RadosClient state is
CONNECTED from the perspective of a RadosClient::shutdown()
caller. Error paths in RadosClient::connect() may call shutdown while
still in the CONNECTING state.

Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
Reviewed-by: Samuel Just <samuel.just@dreamhost.com>
13 years agoMakefile: add headers for distcheck
Samuel Just [Thu, 1 Mar 2012 04:44:30 +0000 (20:44 -0800)]
Makefile: add headers for distcheck

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
13 years agoReplicatedPG: Add omap to recovery
Samuel Just [Mon, 13 Feb 2012 01:58:50 +0000 (17:58 -0800)]
ReplicatedPG: Add omap to recovery

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
13 years agoMOSDSubOp: Add entry for omap recovery
Samuel Just [Mon, 13 Feb 2012 01:28:02 +0000 (17:28 -0800)]
MOSDSubOp: Add entry for omap recovery

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
13 years agotest: Add KeyValueDB atomicity checker
Samuel Just [Thu, 23 Feb 2012 04:07:02 +0000 (20:07 -0800)]
test: Add KeyValueDB atomicity checker

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
13 years agoos/: DBObjectMap and KeyValueDB interface with tests
Samuel Just [Fri, 3 Feb 2012 17:16:09 +0000 (09:16 -0800)]
os/: DBObjectMap and KeyValueDB interface with tests

DBObjectMap is an implementation of ObjectMap in terms of KeyValueDB.

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
13 years agoObjectStore.h: Initial ObjectStore omap interfaces
Samuel Just [Fri, 3 Feb 2012 17:13:18 +0000 (09:13 -0800)]
ObjectStore.h: Initial ObjectStore omap interfaces

ObjectMap.h defines the interface which will be implemented by
leveldb.  store_test now tests basic omap operations.

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
13 years agoos/CollectionIndex: Add debugging constructor and Path::coll()
Samuel Just [Fri, 3 Feb 2012 16:56:04 +0000 (08:56 -0800)]
os/CollectionIndex: Add debugging constructor and Path::coll()

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
13 years agoAdded LevelDBStore
Samuel Just [Wed, 29 Feb 2012 02:02:34 +0000 (18:02 -0800)]
Added LevelDBStore

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
13 years agoAdded leveldb submodule
Samuel Just [Wed, 29 Feb 2012 02:03:18 +0000 (18:03 -0800)]
Added leveldb submodule

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
13 years agoMakefile: make check-local relative to $(srcdir)
Samuel Just [Thu, 1 Mar 2012 04:28:05 +0000 (20:28 -0800)]
Makefile: make check-local relative to $(srcdir)

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
13 years agoMakefile: add json_spirit headers to tarball
Sage Weil [Thu, 1 Mar 2012 00:21:15 +0000 (16:21 -0800)]
Makefile: add json_spirit headers to tarball

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agorgw: don't check for ECANCELED in the _impl() functions
Yehuda Sadeh [Wed, 29 Feb 2012 21:51:45 +0000 (13:51 -0800)]
rgw: don't check for ECANCELED in the _impl() functions

We already check it in the outer functions.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
13 years agorgw: don't retry certain operations if we raced
Yehuda Sadeh [Wed, 29 Feb 2012 19:34:33 +0000 (11:34 -0800)]
rgw: don't retry certain operations if we raced

The atomic get/put scheme was retrying writes in cases where it lost
races (head object was rewritten by another client). Instead we can
just back off and return success.
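
As a sketch of the back-off behaviour, assuming the conditional write
reports a lost race as -ECANCELED: put_without_retry and write_once are
hypothetical names, not rgw functions.

```cpp
#include <cassert>
#include <cerrno>
#include <functional>

// `write_once` attempts one conditional write and returns 0 on success
// or a negative errno; -ECANCELED means another client rewrote the head
// object first.
inline int put_without_retry(const std::function<int()>& write_once) {
  int r = write_once();
  if (r == -ECANCELED) {
    // We lost the race; the other writer's data stands. Back off.
    return 0;
  }
  return r;  // success or a real error, passed through unchanged
}
```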

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
13 years agomsgr: fix race in learned_addr()
Sage Weil [Wed, 29 Feb 2012 21:22:34 +0000 (13:22 -0800)]
msgr: fix race in learned_addr()

- two connect() threads
- both hit if (need_addr) check
- one takes lock, sets addr, need_addr = false, unlocks
- continues to ::encode(ms_addr, ...);
- meanwhile, second thread sets ms_addr _again_, but copies peer port into
  place before adjusting it.  racing ::encode() sees bad port and sends it
  to the peer.

Fix this two ways:

- don't copy bad port into place; set it first
- re-check need_addr after taking lock
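
Both fixes together might look like the sketch below. MessengerSketch
and AddrSketch are hypothetical types, and the atomic flag is an
assumption made so that the unlocked first check is well defined in this
standalone example.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

struct AddrSketch {
  unsigned ip = 0;
  unsigned port = 0;
};

class MessengerSketch {
 public:
  explicit MessengerSketch(unsigned bound_port) { addr_.port = bound_port; }

  // Called by a connecting thread with the address the peer saw for us.
  // Returns true if this call set the address.
  bool learned_addr(const AddrSketch& peer_view) {
    if (!need_addr_.load()) return false;  // cheap unlocked check
    std::lock_guard<std::mutex> l(lock_);
    if (!need_addr_.load()) return false;  // re-check after taking the lock
    AddrSketch fixed = peer_view;
    fixed.port = addr_.port;  // correct the port on the copy first
    addr_ = fixed;            // publish only the finished value
    need_addr_.store(false);
    return true;
  }

  AddrSketch addr() {
    std::lock_guard<std::mutex> l(lock_);
    return addr_;
  }

 private:
  std::mutex lock_;
  std::atomic<bool> need_addr_{true};
  AddrSketch addr_;
};
```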

Fixes: #1747
Signed-off-by: Sage Weil <sage@newdream.net>
Reviewed-by: Greg Farnum <gregory.farnum@dreamhost.com>
13 years agomsgr: print existing->state before failing assert
Sage Weil [Wed, 29 Feb 2012 20:28:19 +0000 (12:28 -0800)]
msgr: print existing->state before failing assert

May help with #1378.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoMerge remote-tracking branch 'gh/wip-2121'
Sage Weil [Wed, 29 Feb 2012 19:07:03 +0000 (11:07 -0800)]
Merge remote-tracking branch 'gh/wip-2121'

Reviewed-by: Yehuda Sadeh <yehuda.sadeh@dreamhost.com>
13 years agoosd: unregister signal handlers on shutdown
Sage Weil [Wed, 29 Feb 2012 17:46:13 +0000 (09:46 -0800)]
osd: unregister signal handlers on shutdown

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agomon: unregister signal handlers on shutdown
Sage Weil [Wed, 29 Feb 2012 17:46:06 +0000 (09:46 -0800)]
mon: unregister signal handlers on shutdown

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agomds: unregister SIGHUP too
Sage Weil [Wed, 29 Feb 2012 17:45:56 +0000 (09:45 -0800)]
mds: unregister SIGHUP too

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agoradosgw: handle SIGHUP
Sage Weil [Wed, 29 Feb 2012 17:45:46 +0000 (09:45 -0800)]
radosgw: handle SIGHUP

Fixes: #2121
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agoinit-radosgw: add 'reload' command to send SIGHUP
Sage Weil [Wed, 29 Feb 2012 17:23:22 +0000 (09:23 -0800)]
init-radosgw: add 'reload' command to send SIGHUP

Fixes: #2121
Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: fix typo in recovery_state query dump
Sage Weil [Wed, 29 Feb 2012 17:21:22 +0000 (09:21 -0800)]
osd: fix typo in recovery_state query dump

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: add missing space to scrub error
Sage Weil [Wed, 29 Feb 2012 17:17:07 +0000 (09:17 -0800)]
osd: add missing space to scrub error

[ERR] 18.5 osd.3: soid 8a5e37ad/rb.0.0.000000002b99/headextra attr _, extra attr snapset

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agomsgr: discard the local_pipe's queue on shutdown.
Greg Farnum [Wed, 29 Feb 2012 01:30:23 +0000 (17:30 -0800)]
msgr: discard the local_pipe's queue on shutdown.

To facilitate this, we do two things:
1) actually identify the number of special code values we pass around
2) use that to prevent trying to put() those non-pointer values in
Pipe::discard_queue().
Then we just call local_pipe.discard_queue() in wait() like happens
(indirectly, via reaping) with all the normal Pipes in rank_pipe.
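
A sketch of steps 1 and 2, assuming the special codes are small integers
stored in the queue as if they were pointers. MsgSketch, kNumCodes,
make_code and is_code are hypothetical; the real count and encoding are
not shown in this log.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

struct MsgSketch {
  int refs = 1;
  void put() { --refs; }  // drop one reference
};

// Assumed: integer codes 1..kNumCodes travel through the queue disguised
// as pointers and must never be dereferenced.
constexpr std::uintptr_t kNumCodes = 4;

inline MsgSketch* make_code(std::uintptr_t code) {
  return reinterpret_cast<MsgSketch*>(code);
}

inline bool is_code(const MsgSketch* m) {
  return reinterpret_cast<std::uintptr_t>(m) <= kNumCodes;
}

// Empty the queue, releasing real messages and skipping code values.
// Returns how many real messages were released.
inline int discard_queue(std::deque<MsgSketch*>& q) {
  int released = 0;
  while (!q.empty()) {
    MsgSketch* m = q.front();
    q.pop_front();
    if (is_code(m)) continue;  // not a pointer: nothing to put()
    m->put();
    ++released;
  }
  return released;
}
```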

But this does make me think that we may be approaching the point
where it's appropriate to create a subclass LocalPipe (against a
RemotePipe like our current Pipe implementation is mostly intended
to be).

Should fix #2086.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Reviewed-by: Sage Weil <sage@newdream.net>
13 years agoosd: remove down OSDs from peer_info on reset
Sage Weil [Wed, 29 Feb 2012 17:10:57 +0000 (09:10 -0800)]
osd: remove down OSDs from peer_info on reset

If an OSD goes down, remove it from peer_info. In particular, I saw

2012-02-28 11:04:25.851038 12e53700 osd.5 3602 pg[1.15( empty n=0 ec=1 les/c 0/3587 3598/3598/3598) [5,3] r=0 lpr=3599 mlcod 0'0 peering] state<Started/Primary/Peering>: Peering advmap
2012-02-28 11:04:25.851491 12e53700 osd.5 3602 pg[1.15( empty n=0 ec=1 les/c 0/3587 3598/3598/3598) [5,3] r=0 lpr=3599 mlcod 0'0 peering]  PriorSet: affected_by_map osd.1 now down
...
2012-02-28 11:04:25.998186 12e53700 osd.5 3602 pg[1.15( empty n=0 ec=1 les/c 0/3587 3598/3598/3598) [5,3] r=0 lpr=3602 mlcod 0'0 peering]  PriorSet: build_prior interval(3587-3597 [3,1]/[3,1] maybe_went_rw)
2012-02-28 11:04:25.998636 12e53700 osd.5 3602 pg[1.15( empty n=0 ec=1 les/c 0/3587 3598/3598/3598) [5,3] r=0 lpr=3602 mlcod 0'0 peering]  PriorSet: build_prior  prior osd.1 is down
2012-02-28 11:04:25.999106 12e53700 osd.5 3602 pg[1.15( empty n=0 ec=1 les/c 0/3587 3598/3598/3598) [5,3] r=0 lpr=3602 mlcod 0'0 peering]  PriorSet: build_prior final: probe 3,5 down 1 blocked_by {}
...
2012-02-28 11:04:26.001723 12e53700 osd.5 3602 pg[1.15( empty n=0 ec=1 les/c 0/3587 3598/3598/3598) [5,3] r=0 lpr=3602 mlcod 0'0 peering] enter Started/Primary/Peering/GetLog
2012-02-28 11:04:26.002428 12e53700 osd.5 3602 pg[1.15( empty n=0 ec=1 les/c 0/3587 3598/3598/3598) [5,3] r=0 lpr=3602 mlcod 0'0 peering] calc_acting osd.1 1.15( v 10'1 (0'0,10'1] n=1 ec=1 les/c 0/3587 3598/3598/3598)
2012-02-28 11:04:26.003000 12e53700 osd.5 3602 pg[1.15( empty n=0 ec=1 les/c 0/3587 3598/3598/3598) [5,3] r=0 lpr=3602 mlcod 0'0 peering] calc_acting osd.3 1.15( v 10'1 (0'0,10'1] n=1 ec=1 les/c 0/3587 3598/3598/3598)
2012-02-28 11:04:26.003528 12e53700 osd.5 3602 pg[1.15( empty n=0 ec=1 les/c 0/3587 3598/3598/3598) [5,3] r=0 lpr=3602 mlcod 0'0 peering] calc_acting osd.5 1.15( empty n=0 ec=1 les/c 0/3587 3598/3598/3598)
2012-02-28 11:04:26.004109 12e53700 osd.5 3602 pg[1.15( empty n=0 ec=1 les/c 0/3587 3598/3598/3598) [5,3] r=0 lpr=3602 mlcod 0'0 peering] calc_acting newest update on osd.1 with 1.15( v 10'1 (0'0,10'1] n=1 ec=1 les/c 0/3587 3598/3598/3598)

Any time an osd goes down we want to ensure we remove it from peer_info.
Handling this in Reset and Started states captures all of the nested
states, which forward the event (or re-post transit to Reset).  We can
also drop the Primary reaction, which is now superfluous.
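
The pruning step itself can be sketched as below. prune_down_peers is a
hypothetical helper, and peer_info is modelled as a plain map from OSD id
to an opaque string rather than the real per-peer structure.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Drop every peer_info entry whose OSD is not in the up set.
// Returns the number of entries removed.
inline int prune_down_peers(std::map<int, std::string>& peer_info,
                            const std::set<int>& up_osds) {
  int removed = 0;
  for (auto it = peer_info.begin(); it != peer_info.end();) {
    if (up_osds.count(it->first) == 0) {
      it = peer_info.erase(it);  // erase returns the next valid iterator
      ++removed;
    } else {
      ++it;
    }
  }
  return removed;
}
```

In the log excerpt above, this would keep the stale info for osd.1 out of
the later calc_acting pass.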

Signed-off-by: Sage Weil <sage@newdream.net>
Reviewed-by: Samuel Just <samuel.just@dreamhost.com>
Reviewed-by: Josh Durgin <josh.durgin@dreamhost.com>
13 years agoMerge branch 'next'
Sage Weil [Wed, 29 Feb 2012 01:04:55 +0000 (17:04 -0800)]
Merge branch 'next'

13 years agorgw: check for bucket swift permissions only if failed
Yehuda Sadeh [Tue, 28 Feb 2012 22:05:52 +0000 (14:05 -0800)]
rgw: check for bucket swift permissions only if failed

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
13 years agomon: report pgs stuck inactive/unclean/stale in health check
Josh Durgin [Tue, 28 Feb 2012 01:49:13 +0000 (17:49 -0800)]
mon: report pgs stuck inactive/unclean/stale in health check

Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
Reviewed-by: Sage Weil <sage.weil@dreamhost.com>
13 years agoMerge branch 'master' into wip-swift-acls
Yehuda Sadeh [Tue, 28 Feb 2012 21:31:09 +0000 (13:31 -0800)]
Merge branch 'master' into wip-swift-acls

13 years agorgw: fix swift bucket acl verification
Yehuda Sadeh [Tue, 28 Feb 2012 21:29:30 +0000 (13:29 -0800)]
rgw: fix swift bucket acl verification

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
13 years agorgw: implement swift public group
Yehuda Sadeh [Tue, 28 Feb 2012 20:37:27 +0000 (12:37 -0800)]
rgw: implement swift public group

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
13 years agomon: fix slurp_latest to fill in any missing incrementals
Greg Farnum [Tue, 28 Feb 2012 20:28:47 +0000 (12:28 -0800)]
mon: fix slurp_latest to fill in any missing incrementals

Fixes #1789.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
13 years agotest_osd_types: fix unit test for new pg_t::is_split() prototype
Sage Weil [Tue, 28 Feb 2012 17:33:18 +0000 (09:33 -0800)]
test_osd_types: fix unit test for new pg_t::is_split() prototype

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoMakefile: drop separate libjson_spirit.la
Sage Weil [Tue, 28 Feb 2012 17:30:38 +0000 (09:30 -0800)]
Makefile: drop separate libjson_spirit.la

automake seems to have difficulty with the .la dependency on another .la.
Since libjson_spirit.la is only used by libcommon.la anyway, just build it
directly into that.  Sigh.

...
CXXLD libjson_spirit.la
AR libmds.a
CXXLD libcls_rbd.la
CXXLD libcls_rgw.la
CXXLD cephfs
CCLD test_ioctls
CC libcommon_la-ceph_ver.lo
CXX libcommon_la-version.lo
CXX ceph_dencoder.o
CCLD mount.ceph
CC ceph_ver.o
CXX test_libhadoopcephfs_build-version.o
CXXLD test_libhadoopcephfs_build
CXXLD libcommon.la
libtool: link: cannot find the library `libjson_spirit.la' or unhandled argument `libjson_spirit.la'

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: drop useless ENOMEM check
Sage Weil [Tue, 28 Feb 2012 17:26:04 +0000 (09:26 -0800)]
osd: drop useless ENOMEM check

new throws an exception on failure; it doesn't return NULL.
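
For reference, a small standalone illustration of the language rule (not
the OSD code; alloc_throwing and alloc_nothrow are hypothetical helper
names):

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Plain `new` reports failure by throwing std::bad_alloc, so a NULL
// check on its result can never fire.
inline bool alloc_throwing(std::size_t n) {
  try {
    char* p = new char[n];
    delete[] p;
    return true;
  } catch (const std::bad_alloc&) {
    return false;
  }
}

// Only the nothrow form returns a null pointer on failure.
inline bool alloc_nothrow(std::size_t n) {
  char* p = new (std::nothrow) char[n];
  if (p == nullptr) return false;
  delete[] p;
  return true;
}
```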

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoceph-osd: clarify error messages
Sage Weil [Tue, 28 Feb 2012 17:11:59 +0000 (09:11 -0800)]
ceph-osd: clarify error messages

So we know where the error came from.  And use real error codes in init().

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoinit: Actually do start the daemons when 'service ceph start <type>' is specified
Wido den Hollander [Tue, 28 Feb 2012 11:41:42 +0000 (12:41 +0100)]
init: Actually do start the daemons when 'service ceph start <type>' is specified

A bug in my previous patch prevented any daemon with auto_start set to false from starting.

This patch allows:
* /etc/init.d/ceph start osd|mds|mon
* service ceph start osd|mds|mon

It however does not start daemons if auto_start is disabled when you invoke:
* /etc/init.d/ceph start
* service ceph start
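
The resulting rule, written out as a one-line predicate. should_start
and its parameters are hypothetical; the init script itself is a shell
script and is not reproduced here.

```cpp
#include <cassert>

// A daemon starts if its type was named on the command line, or if no
// type was named and auto_start is enabled for it.
inline bool should_start(bool auto_start, bool type_given) {
  return type_given || auto_start;
}
```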

Signed-off-by: Wido den Hollander <wido@widodh.nl>
13 years agodoc: beginnings of documentation of stuck pgs and pg states
Sage Weil [Mon, 27 Feb 2012 23:41:57 +0000 (15:41 -0800)]
doc: beginnings of documentation of stuck pgs and pg states

Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
Reviewed-by: Sage Weil <sage@newdream.net>
13 years agofilestore: make less noise on ENOENT
Sage Weil [Mon, 27 Feb 2012 23:13:13 +0000 (15:13 -0800)]
filestore: make less noise on ENOENT

Don't generate high-level log spam on every open error.

Signed-off-by: Sage Weil <sage@newdream.net>
Reviewed-by: Samuel Just <samuel.just@dreamhost.com>
13 years agopg: use get_cluster_inst instead of get_inst in activate
Greg Farnum [Mon, 27 Feb 2012 22:49:18 +0000 (14:49 -0800)]
pg: use get_cluster_inst instead of get_inst in activate

This was mistakenly broken in 4b3bb5ab37a05fa001d59f24da7d9c30d650321b

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Reviewed-by: Sam Just <sam.just@dreamhost.com>
13 years agoMerge branch 'wip-split2'
Sage Weil [Mon, 27 Feb 2012 22:37:41 +0000 (14:37 -0800)]
Merge branch 'wip-split2'

Reviewed-by: Samuel Just <samuel.just@dreamhost.com>
13 years agoosd: pg_t::is_split(): make children out param a pointer, and optional
Sage Weil [Mon, 27 Feb 2012 22:35:21 +0000 (14:35 -0800)]
osd: pg_t::is_split(): make children out param a pointer, and optional

Also unit test it.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: bypass split code
Sage Weil [Mon, 27 Feb 2012 22:18:21 +0000 (14:18 -0800)]
osd: bypass split code

Until it is fully implemented.  It's also disabled in the monitor
currently, but just in case it gets into the OSDMap, do nothing for now.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: fix pg locking flags
Sage Weil [Tue, 21 Feb 2012 00:46:03 +0000 (16:46 -0800)]
osd: fix pg locking flags

Two things we need to handle:

 - callers who already hold map_lock (split_pg())
 - callers who already hold another pg->lock, and want to skip the lockdep
   check for this one.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: partially refactor pg split
Sage Weil [Mon, 27 Feb 2012 22:04:22 +0000 (14:04 -0800)]
osd: partially refactor pg split

This partially refactors the OSD split code to do the split synchronously
when processing a new OSDMap.  It is incomplete in that it does not yet
do anything useful for the PG.  The full solution needs to:

- Do the split synchronously when applying the map update.
- Reset the parent pg so that it repeers.  This will cause problems until
  we consistently consider this a new interval when looking backwards in
  time; this needs to be fixed.  Anybody doing generate_past_intervals()
  or similar will need to consider a split/merge event as an interval
  boundary.
- The recovery state machine should trigger appropriately when this
  happens.
- The old PG that was split should probably be handled identically to the
  new children.  That means deleting the old PG instance and creating a new
  PG object for the newly-split child.  Ditto for merge.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: implement pg_t::is_split()
Sage Weil [Mon, 20 Feb 2012 23:59:00 +0000 (15:59 -0800)]
osd: implement pg_t::is_split()

Test to determine if a pg has split between two pool sizes, and if so,
what its children are.
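
As a rough illustration of the idea (not the code from this commit), the
check can be sketched as follows. The names is_split_sketch, stable_mod and
mask_for are hypothetical; the sketch assumes PG seeds are mapped with a
"stable mod" of the kind Ceph uses for placement, and that a child is any
new seed that used to fold onto the parent under the old pg count. The
children out parameter is a pointer and may be null.

```cpp
#include <cstdint>
#include <set>

// Hypothetical helper: "stable mod" mapping of x into [0, b), where
// bmask is (next power of two >= b) - 1.
static uint32_t stable_mod(uint32_t x, uint32_t b, uint32_t bmask) {
  return ((x & bmask) < b) ? (x & bmask) : (x & (bmask >> 1));
}

// Hypothetical helper: smallest (2^k - 1) mask that covers n seeds.
static uint32_t mask_for(uint32_t n) {
  uint32_t m = 1;
  while (m < n)
    m <<= 1;
  return m - 1;
}

// Sketch only: returns true if the PG with the given seed gains children
// when the pool grows from old_pg_num to new_pg_num. If children is
// non-null, the child seeds are added to it.
bool is_split_sketch(uint32_t seed, uint32_t old_pg_num, uint32_t new_pg_num,
                     std::set<uint32_t>* children) {
  if (seed >= old_pg_num || new_pg_num <= old_pg_num)
    return false;
  const uint32_t old_mask = mask_for(old_pg_num);
  bool split = false;
  for (uint32_t c = old_pg_num; c < new_pg_num; ++c) {
    // A new seed is a child if it mapped onto this parent before the split.
    if (stable_mod(c, old_pg_num, old_mask) == seed) {
      split = true;
      if (children)
        children->insert(c);
    }
  }
  return split;
}
```

Passing a null pointer answers only the yes/no question, which is handy for
callers that do not need the list of children.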

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: factor hobject key into child pgid calc during split
Sage Weil [Mon, 20 Feb 2012 22:12:16 +0000 (14:12 -0800)]
osd: factor hobject key into child pgid calc during split

When we calculate the object's new pg, take the locator key into
consideration, to avoid a crash like

osd/OSD.cc: In function 'void OSD::split_pg(PG*, std::map<pg_t, PG*>&,ObjectStore::Transaction&)' thread 7fe3df8c4700 time 2012-02-20 18:22:19.900886
osd/OSD.cc: 4066: FAILED assert(child)

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agojournaler: log on unexpected objecter error
Sage Weil [Mon, 27 Feb 2012 19:39:53 +0000 (11:39 -0800)]
journaler: log on unexpected objecter error

This will help with #2110, #1796, #1640.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: fix recursive map_lock via check_replay_queue()
Sage Weil [Mon, 27 Feb 2012 17:56:21 +0000 (09:56 -0800)]
osd: fix recursive map_lock via check_replay_queue()

Also drop activate_pg() helper while we're at it, so it's clear that we
are the only user.

recursive lock of OSD::map_lock (33)
 ceph version 0.42-146-g7ad35ce (commit:7ad35ce489cc5f9169eb838e1196fa2ca4d6e985)
2012-02-24 12:30:16.541416 1: (PG::lock(bool)+0x2a) [0xa09348]
2012-02-24 12:30:16.541424 2: (OSD::_lookup_lock_pg(pg_t)+0xbd) [0x84b8df]
2012-02-24 12:30:16.541431 3: (OSD::activate_pg(pg_t, utime_t)+0x9f) [0x87463b]
2012-02-24 12:30:16.541442 4: (OSD::check_replay_queue()+0x12f) [0x87452d]
2012-02-24 12:30:16.541450 5: (OSD::tick()+0x23c) [0x8535ea]
2012-02-24 12:30:16.541456 6: (OSD::C_Tick::finish(int)+0x1f) [0x881671]
2012-02-24 12:30:16.541462 7: (SafeTimer::timer_thread()+0x2d5) [0x8f8211]
2012-02-24 12:30:16.541468 8: (SafeTimerThread::entry()+0x1c) [0x8f923c]
2012-02-24 12:30:16.541475 9: (Thread::_entry_func(void*)+0x23) [0x9c8109]
2012-02-24 12:30:16.541485 10: (()+0x68ba) [0x7f9dbed838ba]
2012-02-24 12:30:16.541491 11: (clone()+0x6d) [0x7f9dbd66f02d]
2012-02-24 12:30:16.541495 common/lockdep.cc: In function 'int lockdep_will_lock(const char*, int)' thread 7f9db9d98700 time 2012-02-24 12:30:16.541504
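
For readers unfamiliar with this kind of failure, the following is a
deliberately simplified sketch of a recursive-lock check. It is not Ceph's
lockdep implementation; the LockTracker class and its methods are
hypothetical, and a real checker would track held locks per thread and also
check lock ordering.

```cpp
#include <set>
#include <stdexcept>
#include <string>

// Hypothetical, single-threaded illustration: remembers which named locks
// are currently held and refuses to take the same one twice.
class LockTracker {
  std::set<std::string> held_;

public:
  // Call before acquiring a lock; throws if it is already held.
  void will_lock(const std::string& name) {
    if (held_.count(name))
      throw std::logic_error("recursive lock of " + name);
    held_.insert(name);
  }

  // Call after releasing a lock.
  void unlocked(const std::string& name) { held_.erase(name); }
};
```

In terms of the trace above, a caller that already holds OSD::map_lock and
then reaches a path that takes it again would trip the check on the second
will_lock() call.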

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Reviewed-by: Sam Just <samuel.just@dreamhost.com>
13 years agoinit-ceph: stick with /var/run for the time being
Sage Weil [Mon, 27 Feb 2012 04:56:05 +0000 (20:56 -0800)]
init-ceph: stick with /var/run for the time being

/run isn't present on older systems.  Stick with the old location until it
is more pervasive, or we add an autoconf option to control it.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agodebian: /var/run/ceph -> /run/ceph
Laszlo Boszormenyi [Mon, 27 Feb 2012 04:47:53 +0000 (20:47 -0800)]
debian: /var/run/ceph -> /run/ceph

/run/ceph should exist for creating UNIX domain sockets.
ceph uses UNIX domain sockets for internal communication. Create their
directory on startup as /run is on a virtual filesystem.

Last-Update: <2012-02-26>
Bug-Debian: http://bugs.debian.org/660238
Forwarded: <ceph-devel@vger.kernel.org>
Signed-off-by: Laszlo Boszormenyi (GCS) <gcs@debian.hu>