git.apps.os.sepia.ceph.com Git - ceph.git/log
ceph.git
12 years agoOSD: handle stray snap collections from upgrade bug
Samuel Just [Tue, 7 May 2013 23:41:22 +0000 (16:41 -0700)]
OSD: handle stray snap collections from upgrade bug

Previously, we failed to clear snap_collections, which caused split to
spawn a bunch of snap collections.  In load_pgs, we now clear any such
snap collections and then the snap_collections field on the PG itself.

Related: #4927
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 8e89db89cb36a217fd97cbc1f24fd643b62400dc)

12 years agoPG: clear snap_collections on upgrade
Samuel Just [Tue, 7 May 2013 23:35:57 +0000 (16:35 -0700)]
PG: clear snap_collections on upgrade

Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 252d71a81ef4536830a74897c84a7015ae6ec9fe)

12 years agoOSD: snap collections can be ignored on split
Samuel Just [Tue, 7 May 2013 23:34:57 +0000 (16:34 -0700)]
OSD: snap collections can be ignored on split

Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 438d9aa152e546b2008ec355b481df71aa1c51a5)

12 years agoceph: return error code when failing to get result from admin socket
Sage Weil [Wed, 8 May 2013 18:05:29 +0000 (11:05 -0700)]
ceph: return error code when failing to get result from admin socket

Make sure we return a non-zero result code when we fail to read something
from the admin socket.

Backport: cuttlefish, bobtail
Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 393c9372f82ef37fc6497dd46fc453507a463d42)

12 years agov0.61 v0.61
Gary Lowell [Mon, 6 May 2013 20:18:56 +0000 (13:18 -0700)]
v0.61

12 years agoos/: default to dio for non-block journals
Samuel Just [Mon, 6 May 2013 17:56:50 +0000 (10:56 -0700)]
os/: default to dio for non-block journals

Workaround: #4910
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
12 years agoceph-disk: use separate lock files for prepare, activate
Sage Weil [Mon, 6 May 2013 18:40:52 +0000 (11:40 -0700)]
ceph-disk: use separate lock files for prepare, activate

Use a separate lock file for prepare and activate to avoid deadlock.  This
didn't seem to trigger on all machines, but in many cases, the prepare
process would take the file lock and later trigger a udev event and the
activate would then block on the same lock, either when we explicitly call
'udevadm settle --timeout=10' or when partprobe does it on our behalf
(without a timeout!).   Avoid this by using separate locks for prepare
and activate.  We only care if multiple activates race; it is
okay for a prepare to be in progress and for an activate to be kicked
off.

Signed-off-by: Sage Weil <sage@inktank.com>
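
The locking change described above can be illustrated with a minimal sketch. It is not the ceph-disk code: the lock file names, the FileLock class and the try_lock helper are hypothetical, and it assumes a Unix host where flock() is available. It shows that a second attempt on the same lock file is refused while prepare holds it, whereas a separate activate lock stays free.

```python
import fcntl
import os
import tempfile


class FileLock:
    """Context manager holding an exclusive flock() on a lock file."""

    def __init__(self, path):
        self.path = path
        self.fd = None

    def __enter__(self):
        self.fd = os.open(self.path, os.O_WRONLY | os.O_CREAT, 0o600)
        fcntl.flock(self.fd, fcntl.LOCK_EX)
        return self

    def __exit__(self, exc_type, exc, tb):
        fcntl.flock(self.fd, fcntl.LOCK_UN)
        os.close(self.fd)
        self.fd = None


def try_lock(path):
    """Return True if the lock at `path` could be taken without blocking."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        return False
    else:
        fcntl.flock(fd, fcntl.LOCK_UN)
        return True
    finally:
        os.close(fd)


# Hypothetical lock file names, created in a scratch directory.
lock_dir = tempfile.mkdtemp()
prepare_lock = os.path.join(lock_dir, "prepare.lock")
activate_lock = os.path.join(lock_dir, "activate.lock")

with FileLock(prepare_lock):
    # A shared lock file would make activate wait on prepare.
    shared_blocked = not try_lock(prepare_lock)
    # With its own lock file, activate can proceed while prepare runs.
    activate_free = try_lock(activate_lock)

prepare_released = try_lock(prepare_lock)
```

Two activates would still serialize on activate.lock, which is the only race the commit message says matters.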
12 years agoceph-test.install: add ceph-monstore-tool and ceph-osdomap-tool
Danny Al-Gaaf [Mon, 6 May 2013 13:42:57 +0000 (15:42 +0200)]
ceph-test.install: add ceph-monstore-tool and ceph-osdomap-tool

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
12 years agoceph.spec.in: remove twice listed ceph-coverage
Danny Al-Gaaf [Mon, 6 May 2013 13:21:56 +0000 (15:21 +0200)]
ceph.spec.in: remove twice listed ceph-coverage

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
12 years agoceph.spec: add some files to ceph
Danny Al-Gaaf [Mon, 6 May 2013 13:09:32 +0000 (15:09 +0200)]
ceph.spec: add some files to ceph

Add installed, but not packaged files to ceph-test (ceph-monstore-tool,
ceph-osdomap-tool) rpm file section.

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
12 years agomon: fix init sequence when not daemonizing
Sage Weil [Fri, 3 May 2013 23:20:26 +0000 (16:20 -0700)]
mon: fix init sequence when not daemonizing

We made the common_init_finish and chdir conditional on daemonize in commit
2e0dd5ae6c8751e33d456b2b06c1204b63db959a, breaking init (asok at least)
when -f is specified (as with upstart).

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
12 years agomon: avoid null deref in Monitor::_mon_status()
Sage Weil [Fri, 3 May 2013 23:04:31 +0000 (16:04 -0700)]
mon: avoid null deref in Monitor::_mon_status()

mikedawson reports:

*** Caught signal (Segmentation fault) **
 in thread 7f40ce270700

 ceph version 0.60-801-g7ec0151 (7ec01513970b5a977bdbdf60052b6f6e257d267e)
 1: /usr/bin/ceph-mon() [0x59d550]
 2: (()+0xfbd0) [0x7f40d3e38bd0]
 3: (operator<<(std::ostream&, entity_name_t const&)+0x16) [0x4d7c46]
 4: (operator<<(std::ostream&, entity_inst_t const&)+0x1b) [0x4d837b]
 5: (Monitor::_mon_status(std::ostream&)+0x2ce) [0x4d284e]
 6: (Monitor::do_admin_command(std::string, std::string, std::ostream&)+0x4f) [0x4d652f]
 7: (AdminHook::call(std::string, std::string, ceph::buffer::list&)+0x68) [0x4efa38]
 8: (AdminSocket::do_accept()+0x451) [0x64ab81]
 9: (AdminSocket::entry()+0x398) [0x64c528]
 10: (()+0x7f8e) [0x7f40d3e30f8e]
 11: (clone()+0x6d) [0x7f40d237ae1d]

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
12 years agoceph.spec: require xfsprogs
Sage Weil [Fri, 3 May 2013 20:28:24 +0000 (13:28 -0700)]
ceph.spec: require xfsprogs

This is needed when creating new OSDs (via ceph-disk).  At least for most
people.  Eventually we'll want to include btrfs here.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoinit-ceph: update osd crush map position on start
Sage Weil [Fri, 3 May 2013 00:18:27 +0000 (17:18 -0700)]
init-ceph: update osd crush map position on start

This is what the upstart ceph-osd.conf does; we need to do the same so that
new OSDs (e.g., that ceph-deploy creates) get added to the crush map.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agomon: fork early to avoid leveldb static env state
Sage Weil [Fri, 3 May 2013 18:29:24 +0000 (11:29 -0700)]
mon: fork early to avoid leveldb static env state

leveldb has static state that prevents it from recreating its worker thread
after our fork(), even when we close and reopen the database (tsk tsk!).
Avoid this by forking early, before we touch leveldb.

Hide the details in a Preforker class.  This is modeled after what
ceph-fuse already does; we should convert it later.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
12 years agoMerge remote-tracking branch 'gh/wip-mon-rank' into next
Sage Weil [Thu, 2 May 2013 20:32:41 +0000 (13:32 -0700)]
Merge remote-tracking branch 'gh/wip-mon-rank' into next

Reviewed-by: Greg Farnum <greg@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
12 years agotools/: add paranoid option to ceph-osdomap-tool
Samuel Just [Thu, 2 May 2013 19:49:34 +0000 (12:49 -0700)]
tools/: add paranoid option to ceph-osdomap-tool

Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
12 years agoosd: default 'osd leveldb paranoid = false'
Sage Weil [Thu, 2 May 2013 19:47:24 +0000 (12:47 -0700)]
osd: default 'osd leveldb paranoid = false'

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agolibrados,client: bump mount timeout to 5 min
Sage Weil [Thu, 2 May 2013 19:31:38 +0000 (12:31 -0700)]
librados,client: bump mount timeout to 5 min

30 seconds is pretty short.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
12 years agoOSD: also walk maps individually for start_split in consume_map()
Samuel Just [Thu, 2 May 2013 17:47:55 +0000 (10:47 -0700)]
OSD: also walk maps individually for start_split in consume_map()

We need to go map-by-map to get the parents right in consume_map()
just as we must in load_pgs().

Fixes: 4884
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
12 years agorgw: increase startup timeout to 5 min
Sage Weil [Thu, 2 May 2013 18:06:22 +0000 (11:06 -0700)]
rgw: increase startup timeout to 5 min

30s is too short.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoMerge branch 'wip-paranoid' into next
Sage Weil [Thu, 2 May 2013 17:18:39 +0000 (10:18 -0700)]
Merge branch 'wip-paranoid' into next

12 years agoMerge remote-tracking branch 'gh/wip-doc-cuttlefish' into next
Sage Weil [Thu, 2 May 2013 00:24:40 +0000 (17:24 -0700)]
Merge remote-tracking branch 'gh/wip-doc-cuttlefish' into next

12 years agoMerge remote-tracking branch 'upstream/wip_4884' into next
Samuel Just [Wed, 1 May 2013 23:11:47 +0000 (16:11 -0700)]
Merge remote-tracking branch 'upstream/wip_4884' into next

Fixes: #4884
Reviewed-by: Greg Farnum <greg@inktank.com>
12 years agoMakefile,gitignore: ceph-monstore-tool, not ceph_monstore_tool
Samuel Just [Wed, 1 May 2013 01:11:05 +0000 (18:11 -0700)]
Makefile,gitignore: ceph-monstore-tool, not ceph_monstore_tool

Signed-off-by: Samuel Just <sam.just@inktank.com>
12 years agoMakefile: put ceph_monstore_tool in bin_DEBUGPROGRAMS
Samuel Just [Wed, 1 May 2013 00:57:56 +0000 (17:57 -0700)]
Makefile: put ceph_monstore_tool in bin_DEBUGPROGRAMS

Signed-off-by: Samuel Just <sam.just@inktank.com>
12 years agotools: ceph-osdomap-tool.cc
Samuel Just [Tue, 30 Apr 2013 16:31:26 +0000 (09:31 -0700)]
tools: ceph-osdomap-tool.cc

Add tool for dumping info from osd omap.

Signed-off-by: Samuel Just <sam.just@inktank.com>
12 years agoOSD: load_pgs() should fill in start_split honestly
Samuel Just [Wed, 1 May 2013 21:59:08 +0000 (14:59 -0700)]
OSD: load_pgs() should fill in start_split honestly

In load_pgs(), we previously assigned all children created between the
loaded pg's stored epoch and the current osdmap to have that pg as their
parent.  This is not correct: some of the children may have been split in
subsequent epochs from children split in earlier epochs.  Instead, process
each map individually.

Signed-off-by: Samuel Just <sam.just@inktank.com>
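
A toy model of why the maps have to be walked one at a time. This is a sketch under simplifying assumptions, not the OSD code: pgs are plain integers, pg_num only takes power-of-two values, and a child's direct parent is taken to be `child % old_pg_num`. The function names are hypothetical.

```python
def direct_parent(child, old_pg_num):
    """Parent of a newly created pg, under the power-of-two assumption."""
    return child % old_pg_num


def walk_splits(loaded_pg, pg_nums):
    """Map each descendant of loaded_pg to its parent, one pg_num step at a time.

    pg_nums lists the pg_num value at each consecutive osdmap epoch.
    """
    parents = {}
    known = {loaded_pg}
    for old, new in zip(pg_nums, pg_nums[1:]):
        for child in range(old, new):
            parent = direct_parent(child, old)
            if parent in known:
                parents[child] = parent
                known.add(child)
    return parents


# Jumping straight from the stored epoch to the current map: every
# descendant is attributed to the loaded pg.
one_shot = walk_splits(0, [2, 8])

# Walking each map individually: pg 6 is a child of pg 2, which was itself
# split from pg 0 in the earlier epoch.
per_map = walk_splits(0, [2, 4, 8])
```

In this model the two walks agree on pgs 2 and 4 but disagree on pg 6, which is the kind of wrong parent assignment the commit message describes.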
12 years agoOSD: cancel_pending_splits needs to cancel all descendants
Samuel Just [Wed, 1 May 2013 21:56:25 +0000 (14:56 -0700)]
OSD: cancel_pending_splits needs to cancel all descendants

expand_pg_num() and load_pgs() may result in a pg with children
in pending_splits which also have children in pending_splits (etc).

Signed-off-by: Samuel Just <sam.just@inktank.com>
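
The need to cancel all descendants can be sketched as a transitive walk. This is an illustration only: pending_splits is modeled here as a child -> parent dictionary, which is an assumption, and the pg ids are made up.

```python
def cancel_pending_splits(pending_splits, parent):
    """Remove every pending descendant of `parent`; return the removed pgs."""
    cancelled = set()
    stack = [parent]
    while stack:
        current = stack.pop()
        children = [c for c, p in pending_splits.items() if p == current]
        for child in children:
            del pending_splits[child]
            cancelled.add(child)
            # The child may itself have pending children, so visit it too.
            stack.append(child)
    return cancelled


pending = {
    "1.2": "1.0",  # child of 1.0
    "1.6": "1.2",  # grandchild of 1.0
    "1.3": "1.1",  # unrelated parent, must survive
}
cancelled = cancel_pending_splits(pending, "1.0")
```

Removing only the direct children would leave "1.6" behind with a parent that is no longer pending.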
12 years agoosd: add --osd-leveldb-paranoid flag
Sage Weil [Wed, 1 May 2013 21:40:33 +0000 (14:40 -0700)]
osd: add --osd-leveldb-paranoid flag

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agomon: add --mon-leveldb-paranoid flag
Sage Weil [Wed, 1 May 2013 21:38:59 +0000 (14:38 -0700)]
mon: add --mon-leveldb-paranoid flag

This is sort of equivalent to an fsck.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agodumper: fix Objecter locking
Greg Farnum [Wed, 1 May 2013 21:10:31 +0000 (14:10 -0700)]
dumper: fix Objecter locking

Locking expectations changed at some point, and the Dumper wasn't
updated to comply:
1) We need to take the lock for Objecter, as it
doesn't do so on its own any more.
2) We need to drop the lock in several places so that Objecter
can take delivery of messages

Signed-off-by: Greg Farnum <greg@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
12 years agoRevert "PaxosService: use get and put for version_t"
Sage Weil [Wed, 1 May 2013 05:48:52 +0000 (22:48 -0700)]
Revert "PaxosService: use get and put for version_t"

This reverts commit e725c3e210b244e090d70c77d937c94f4f63a2be.

This inadvertently got rid of the prefix portion of the key, which
led to overwriting the wrong keys.

Fixes: #4872
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
12 years agomon/Paxos: update first_committed when we trim
Sage Weil [Wed, 1 May 2013 17:57:35 +0000 (10:57 -0700)]
mon/Paxos: update first_committed when we trim

The Paxos::trim() -> ::trim_to() path trims old states but does not
update first_committed.  This misinforms later paxos rounds such that
peers think they can participate and end up with COMMIT messages
following the COLLECT/LAST exchange that are for future commits they
can't do anything with and then crash out when they get the BEGIN:

mon/Paxos.cc: 557: FAILED assert(begin->last_committed == last_committed)

Fixes: #4879
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
12 years agomon/Paxos: don't ignore peer first_committed
Sage Weil [Wed, 1 May 2013 04:16:16 +0000 (21:16 -0700)]
mon/Paxos: don't ignore peer first_committed

We go to the effort of keeping a map of the peer's first/last committed
so that we can send the right commits during the first phase of paxos,
but we forgot to record the first value.  This appears to simply be an
oversight.  It is mostly harmless; it just means we send extra states
that the peer already has.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
12 years agomon: Monitor: fix bug on _pick_random_mon() that would choose an invalid rank
Joao Eduardo Luis [Tue, 30 Apr 2013 16:12:05 +0000 (17:12 +0100)]
mon: Monitor: fix bug on _pick_random_mon() that would choose an invalid rank

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
12 years agomon: Monitor: use rank instead of name when randomly picking monitors
Joao Eduardo Luis [Tue, 30 Apr 2013 15:28:42 +0000 (16:28 +0100)]
mon: Monitor: use rank instead of name when randomly picking monitors

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
12 years agoOSD: clean up in progress split state on pg removal
Samuel Just [Tue, 30 Apr 2013 22:48:10 +0000 (15:48 -0700)]
OSD: clean up in progress split state on pg removal

There are two cases: 1) The parent pg has not yet initiated the split 2) The
parent pg has initiated the split.

Previously in case 1), _remove_pg left the entry for its children in the
in_progress_splits map blocking subsequent peering attempts.

In case 1), we need to unblock requests on the child pgs for the parent on
parent removal.  We don't need to bother waking requests since any requests
received prior to the remove_pg request are necessarily obsolete.

In case 2), we don't need to do anything: the child will complete the split on
its own anyway.

Thus, we now track pending_splits vs in_progress_splits.  Children in
pending_splits are in state 1), in_progress_splits in state 2).  split_pgs
bumps pgs from pending_splits to in_progress_splits atomically with respect to
_remove_pg since the parent pg lock is held in both places.

Fixes: #4813
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
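
The two cases above can be modeled with a small sketch. It is not the OSD implementation: the SplitTracker class is hypothetical, a single lock stands in for the parent pg lock, and the pg ids are made up. It shows that removal only touches children whose split has not started.

```python
import threading


class SplitTracker:
    """Toy model of pending_splits vs in_progress_splits."""

    def __init__(self):
        # Stands in for the parent pg lock held by split_pgs and _remove_pg.
        self.parent_lock = threading.Lock()
        self.pending_splits = {}         # child -> parent (case 1)
        self.in_progress_splits = set()  # children being split (case 2)

    def add_pending(self, child, parent):
        with self.parent_lock:
            self.pending_splits[child] = parent

    def split_pgs(self, parent):
        """Move the parent's children from pending to in-progress in one step."""
        with self.parent_lock:
            children = [c for c, p in self.pending_splits.items() if p == parent]
            for child in children:
                del self.pending_splits[child]
                self.in_progress_splits.add(child)
            return children

    def remove_pg(self, parent):
        """Case 1: drop pending children. Case 2: leave in-progress ones alone."""
        with self.parent_lock:
            stale = [c for c, p in self.pending_splits.items() if p == parent]
            for child in stale:
                del self.pending_splits[child]
            return stale

    def is_blocked(self, child):
        with self.parent_lock:
            return child in self.pending_splits or child in self.in_progress_splits


tracker = SplitTracker()
tracker.add_pending("1.4", "1.0")   # parent 1.0 has not started its split
tracker.add_pending("1.5", "1.1")
started = tracker.split_pgs("1.1")  # parent 1.1 starts its split
removed_case1 = tracker.remove_pg("1.0")
removed_case2 = tracker.remove_pg("1.1")
```

Because both methods take the same lock, a child is never observed halfway between the two collections.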
12 years agomon: communicate the quorum_features properly when declaring victory.
Greg Farnum [Wed, 1 May 2013 01:12:10 +0000 (18:12 -0700)]
mon: communicate the quorum_features properly when declaring victory.

Fixes #4747.

Signed-off-by: Greg Farnum <greg@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
12 years agodoc: Incorporating Tamil's feedback.
John Wilkins [Wed, 1 May 2013 01:04:46 +0000 (18:04 -0700)]
doc: Incorporating Tamil's feedback.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
12 years agodoc: Reordered header levels for visual clarity.
John Wilkins [Wed, 1 May 2013 00:48:05 +0000 (17:48 -0700)]
doc: Reordered header levels for visual clarity.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
12 years agodoc: Fixed a few typos.
John Wilkins [Wed, 1 May 2013 00:39:50 +0000 (17:39 -0700)]
doc: Fixed a few typos.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
12 years agodoc: Updated the upgrade guide for Argonaut and Bobtail to Cuttlefish.
John Wilkins [Wed, 1 May 2013 00:32:15 +0000 (17:32 -0700)]
doc: Updated the upgrade guide for Argonaut and Bobtail to Cuttlefish.

fixes: #4874

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
12 years agoMerge branch 'wip-4837-election-syncing' into next
Greg Farnum [Tue, 30 Apr 2013 22:39:21 +0000 (15:39 -0700)]
Merge branch 'wip-4837-election-syncing' into next

Reviewed-by: Sage Weil <sage@inktank.com>
12 years agoceph-disk: tolerate /sbin/service or /usr/sbin/service
Sage Weil [Tue, 30 Apr 2013 21:16:04 +0000 (14:16 -0700)]
ceph-disk: tolerate /sbin/service or /usr/sbin/service

CentOS/RH has it in /sbin, others in /usr/sbin.

Backport: bobtail
Signed-off-by: Sage Weil <sage@inktank.com>
12 years agomon: Monitor: disregard paxos_max_join_drift when deciding whether to sync
Joao Eduardo Luis [Tue, 30 Apr 2013 17:45:22 +0000 (18:45 +0100)]
mon: Monitor: disregard paxos_max_join_drift when deciding whether to sync

We should only rely on whether our paxos versions overlap with whatever
they have -- we'll catch up later with them.

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
12 years agomon: if we get our own sync_start back, drop it on the floor.
Greg Farnum [Tue, 30 Apr 2013 20:35:53 +0000 (13:35 -0700)]
mon: if we get our own sync_start back, drop it on the floor.

We have timeouts that will clean everything up, and this can happen
in some cases that we've decided are legitimate. Hopefully we'll
be able to do something else later.

Signed-off-by: Greg Farnum <greg@inktank.com>
12 years agoRevert "mon: update assert for looser requirements"
Greg Farnum [Tue, 30 Apr 2013 20:21:28 +0000 (13:21 -0700)]
Revert "mon: update assert for looser requirements"

We reverted the gating by paxos sequences, so now we don't
need to look at them at all.

This reverts commit 1e6f02b337767012aeb387da9582cd7ad5a03084.
Signed-off-by: Greg Farnum <greg@inktank.com>
12 years agoRevert "mon: when electing, be sure acked leaders have new enough stores to lead"
Greg Farnum [Tue, 30 Apr 2013 19:02:20 +0000 (12:02 -0700)]
Revert "mon: when electing, be sure acked leaders have new enough stores to lead"

This was somehow broken -- out-of-date leaders were being elected -- and
we've decided smaller band-aids are more appropriate. We don't completely
revert the MMonElection changes, though -- there have been user clusters
running the code which includes these messages so we can't pretend it
never happened. We can make them clearly unused in the code, though.

This reverts commit fcaabf1a22723c571c10d402464071c6405607c0.

Signed-off-by: Greg Farnum <greg@inktank.com>
12 years agoObjectCacher: wait for all reads when stopping flusher
Josh Durgin [Mon, 29 Apr 2013 20:58:07 +0000 (13:58 -0700)]
ObjectCacher: wait for all reads when stopping flusher

Stopping the flusher is essentially the shutdown step for the
ObjectCacher - the next thing is actually destroying it.

If we leave any reads outstanding, when they complete they will
attempt to use the now-destroyed ObjectCacher. This is particularly a
problem with rbd images, since an -ENOENT can instantly complete many
readers, so the upper layers don't wait for the other rados-level
reads of that object to finish before trying to shutdown the cache.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
12 years agoMerge branch 'wip-mon-compact' into next
Sage Weil [Tue, 30 Apr 2013 18:49:31 +0000 (11:49 -0700)]
Merge branch 'wip-mon-compact' into next

Reviewed-by: Samuel Just <sam.just@inktank.com>
12 years agoelector: trigger a mon reset whenever we bump the epoch
Greg Farnum [Tue, 30 Apr 2013 18:01:54 +0000 (11:01 -0700)]
elector: trigger a mon reset whenever we bump the epoch

We need to call reset during every election cycle; luckily we
can call it more than once. bump_epoch is (by definition!) only called
once per cycle, and it's called at the beginning, so we put it there.

Fixes #4858.

Signed-off-by: Greg Farnum <greg@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
12 years agoMerge branch 'wip-2209' into next
David Zafman [Tue, 30 Apr 2013 17:52:23 +0000 (10:52 -0700)]
Merge branch 'wip-2209' into next

Reviewed-by: Samuel Just <sam.just@inktank.com>
12 years agomon: change leveldb block size to 64K
Sage Weil [Tue, 30 Apr 2013 17:26:24 +0000 (10:26 -0700)]
mon: change leveldb block size to 64K

#leveldb on freenode says > 2MB is nonsense (it might explain the weird
behavior we saw).  Riak tuning guide suggests 256KB for large data block
environments.  Default is 8KB.  64KB seems sane for us.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agodoc: Fix typo.
John Wilkins [Tue, 30 Apr 2013 01:57:05 +0000 (18:57 -0700)]
doc: Fix typo.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
12 years agodoc: Added reference to transition from mkcephfs to ceph-deploy.
John Wilkins [Tue, 30 Apr 2013 01:54:04 +0000 (18:54 -0700)]
doc: Added reference to transition from mkcephfs to ceph-deploy.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
12 years agodoc: Updated index for new pages. Added inner table.
John Wilkins [Tue, 30 Apr 2013 01:53:37 +0000 (18:53 -0700)]
doc: Updated index for new pages. Added inner table.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
12 years agodoc: Added transition from mkcephfs to ceph-deploy page.
John Wilkins [Tue, 30 Apr 2013 01:53:12 +0000 (18:53 -0700)]
doc: Added transition from mkcephfs to ceph-deploy page.

fixes: #4756

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
12 years agodoc: Added purge page to ceph-deploy.
John Wilkins [Tue, 30 Apr 2013 01:52:16 +0000 (18:52 -0700)]
doc: Added purge page to ceph-deploy.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
12 years agodoc: Added OSD page to ceph-deploy.
John Wilkins [Tue, 30 Apr 2013 01:51:46 +0000 (18:51 -0700)]
doc: Added OSD page to ceph-deploy.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
12 years agodoc: Added mds page for ceph-deploy.
John Wilkins [Tue, 30 Apr 2013 01:51:23 +0000 (18:51 -0700)]
doc: Added mds page for ceph-deploy.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
12 years agodoc: Added admin tasks page for ceph-deploy.
John Wilkins [Tue, 30 Apr 2013 01:51:05 +0000 (18:51 -0700)]
doc: Added admin tasks page for ceph-deploy.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
12 years agoSet num_rd, num_wr_kb and num_wr in various places that needed it
David Zafman [Sat, 27 Apr 2013 01:07:10 +0000 (18:07 -0700)]
Set num_rd, num_wr_kb and num_wr in various places that needed it

Signed-off-by: David Zafman <david.zafman@inktank.com>
12 years agoosd: read kb stats not tracked?
David Zafman [Sat, 27 Apr 2013 01:05:18 +0000 (18:05 -0700)]
osd: read kb stats not tracked?

In read cases track stats in PG::unstable_stats
Include unstable_stats in write_info() and publish_stats_to_osd()
For now this information may not get persisted

fixes: #2209

Signed-off-by: David Zafman <david.zafman@inktank.com>
12 years agoosd: Rename members and methods related to stat publish
David Zafman [Mon, 29 Apr 2013 21:36:18 +0000 (14:36 -0700)]
osd: Rename members and methods related to stat publish

pg_stats_lock to pg_stats_publish_lock
pg_stats_valid to pg_stats_publish_valid
pg_stats_stable to pg_stats_publish
update_stats() to publish_stats_to_osd()
clear_stats() to clear_publish_stats()

Signed-off-by: David Zafman <david.zafman@inktank.com>
12 years agomon: enable 'mon compact on trim' by default; trim in larger increments
Sage Weil [Tue, 30 Apr 2013 00:20:39 +0000 (17:20 -0700)]
mon: enable 'mon compact on trim' by default; trim in larger increments

This resolves the leveldb growth-without-bound problem observed by
mikedawson, and all the badness that stems from it.  Enable this by
default until we figure out why leveldb is not behaving better.

While we are at it, trim more states at a time.  This will make
compaction less frequent, which should help given that there is some
overhead unrelated to the amount of deleted data.

Fixes: #4815
Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoMerge pull request #249 from ceph/wip-cuttle-man
Sage Weil [Tue, 30 Apr 2013 00:09:37 +0000 (17:09 -0700)]
Merge pull request #249 from ceph/wip-cuttle-man

man page updates

Reviewed-by: Sage Weil <sage@inktank.com>
12 years agomon: share extra probe peers with debug log, mon_status
Sage Weil [Mon, 29 Apr 2013 23:31:05 +0000 (16:31 -0700)]
mon: share extra probe peers with debug log, mon_status

This is useful when debugging initial quorum formation.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agodebian: only start/stop upstart jobs if upstart is present
Sage Weil [Tue, 30 Apr 2013 00:01:55 +0000 (17:01 -0700)]
debian: only start/stop upstart jobs if upstart is present

This avoids errors on non-upstart distros (like wheezy).

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoMerge remote-tracking branch 'gh/wip-up' into next
Sage Weil [Mon, 29 Apr 2013 23:57:13 +0000 (16:57 -0700)]
Merge remote-tracking branch 'gh/wip-up' into next

Reviewed-by: Sam Lang <sam.lang@inktank.com>
12 years agoMerge pull request #248 from ctrlaltdel/next
Sage Weil [Mon, 29 Apr 2013 23:46:52 +0000 (16:46 -0700)]
Merge pull request #248 from ctrlaltdel/next

Fix a README typo

12 years agoman: update remaining copyright notices 249/head
Josh Durgin [Mon, 29 Apr 2013 23:01:38 +0000 (16:01 -0700)]
man: update remaining copyright notices

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
12 years agoman: refresh content from rst
Josh Durgin [Mon, 29 Apr 2013 23:01:03 +0000 (16:01 -0700)]
man: refresh content from rst

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
12 years agoMerge branch 'wip_4860' into next
Samuel Just [Mon, 29 Apr 2013 22:57:26 +0000 (15:57 -0700)]
Merge branch 'wip_4860' into next

Reviewed-by: Sage Weil <sage@inktank.com>
Reviewed-by: David Zafman <david.zafman@inktank.com>
12 years agoPG,OSD: _remove_pg must remove pg keys
Samuel Just [Mon, 29 Apr 2013 16:07:19 +0000 (09:07 -0700)]
PG,OSD: _remove_pg must remove pg keys

Instead of doing this in OSD::_remove_pg, pass a transaction
to on_removal and do it in PG.

Signed-off-by: Samuel Just <sam.just@inktank.com>
12 years agoOSD: no need to remove snapdirs on _remove_pg()
Samuel Just [Mon, 29 Apr 2013 16:03:12 +0000 (09:03 -0700)]
OSD: no need to remove snapdirs on _remove_pg()

The snapmapper patches removed snapdirs altogether.

Signed-off-by: Samuel Just <sam.just@inktank.com>
12 years agomon/Paxos: compact on trim
Sage Weil [Mon, 29 Apr 2013 22:05:01 +0000 (15:05 -0700)]
mon/Paxos: compact on trim

Compact the paxos keys when we trim old paxos states.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agomon: compact PaxosService prefix on trim
Sage Weil [Mon, 29 Apr 2013 22:04:09 +0000 (15:04 -0700)]
mon: compact PaxosService prefix on trim

Each time we trim a PaxosService, have leveldb compact so that the
space from removed states is reclaimed.

This is probably not optimal if leveldb's heuristics are doing the right
thing, but it currently appears as if they are not.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agomon: add compact_prefix transaction operation
Sage Weil [Mon, 29 Apr 2013 22:01:45 +0000 (15:01 -0700)]
mon: add compact_prefix transaction operation

Add a prefix compaction operation to the transaction that will be
performed after the transaction applies.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoleveldb: add compact_prefix method
Sage Weil [Mon, 29 Apr 2013 22:01:05 +0000 (15:01 -0700)]
leveldb: add compact_prefix method

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agomon: compact leveldb on bootstrap
Sage Weil [Mon, 29 Apr 2013 17:51:00 +0000 (10:51 -0700)]
mon: compact leveldb on bootstrap

This is an opportunistic time to optimize our local data since we are
out of quorum.  It serves as a safety net for cases where leveldb's
automatic compaction doesn't work quite right and lets things get out
of hand.

Anecdotally we have seen stores in excess of 30GB compact down to a few
hundred KB.  And a 9GB store compact down to 900MB in only 1 minute.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agomon: --compact argument, config option to compact the store on start
Sage Weil [Mon, 29 Apr 2013 22:44:58 +0000 (15:44 -0700)]
mon: --compact argument, config option to compact the store on start

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoleveldb: add compact() method
Sage Weil [Mon, 29 Apr 2013 16:43:17 +0000 (09:43 -0700)]
leveldb: add compact() method

This will compact the entire store; it will be slow!

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agodoc: update rbd man page for new options
Josh Durgin [Mon, 29 Apr 2013 22:37:06 +0000 (15:37 -0700)]
doc: update rbd man page for new options

--no-progress and --allow-shrink were added recently.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
12 years agogitignore: add ceph_monstore_tool
Samuel Just [Mon, 29 Apr 2013 21:58:24 +0000 (14:58 -0700)]
gitignore: add ceph_monstore_tool

Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
12 years agoMakefile: fix java build warning
Sage Weil [Mon, 29 Apr 2013 21:50:31 +0000 (14:50 -0700)]
Makefile: fix java build warning

This is a workaround that makes the warning go away.  Not certain there
isn't something we should be changing...

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Joe Buck <joe.buck@inktank.com>
12 years agoMerge branch 'wip-mon-pg' into next
Sage Weil [Mon, 29 Apr 2013 18:27:22 +0000 (11:27 -0700)]
Merge branch 'wip-mon-pg' into next

Reviewed-by: Samuel Just <sam.just@inktank.com>
12 years agomon: remap creating pgs on startup
Sage Weil [Mon, 29 Apr 2013 18:06:36 +0000 (11:06 -0700)]
mon: remap creating pgs on startup

After Monitor::init_paxos() has loaded all of the PaxosService state,
we should then map creating pgs to osds.  This ensures we do so after the
osdmap has been loaded and the pgs actually map somewhere meaningful.

Fixes: #4675
Signed-off-by: Sage Weil <sage@inktank.com>
12 years agomon: only map/send pg creations if osdmap is defined
Sage Weil [Mon, 29 Apr 2013 18:11:24 +0000 (11:11 -0700)]
mon: only map/send pg creations if osdmap is defined

This avoids calculating new pg creation mappings if the osdmap isn't
loaded yet, which currently happens during Monitor::paxos_init()
on startup.  Assuming osdmap epoch is nonzero, it should always be
safe to do this (although possibly unnecessary).

More cleanup here is certainly possible, but this is one step toward fixing
the bad behavior for #4675.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agomon: factor map_pg_creates() out of send_pg_creates()
Sage Weil [Sun, 7 Apr 2013 15:48:22 +0000 (08:48 -0700)]
mon: factor map_pg_creates() out of send_pg_creates()

Factor out the portion of the function that remaps creating pgs to osds
from the part that sends those pending creates out.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoclient: make dup reply a louder error
Sage Weil [Mon, 29 Apr 2013 17:45:31 +0000 (10:45 -0700)]
client: make dup reply a louder error

If we get a dup reply something is probably wrong!  We should make sure
it appears more loudly in the log.  In particular, it can lead to out
of sync cap state; see #4853.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoclient: fix session open vs mdsmap race with request kicking
Sage Weil [Mon, 29 Apr 2013 17:44:28 +0000 (10:44 -0700)]
client: fix session open vs mdsmap race with request kicking

A sequence like:

 - ceph-fuse starts, make_request on getattr
 - waits for mds to be active
 - tries to open a session
 - mds restarts, recovers
 - eventually gets session open reply
 - sends first getattr (even tho mds is in reconnect state)
 - gets mdsmap update that mds is now active
 - kicks request, resends getattr
 - get first reply
 - ignore second reply, caps get out of sync

The bug is that we send the first request when the MDS is still in
the reconnect state.  The fix is to loop in make_request so that we
ensure all conditions are satisfied before sending the request.  Any
time we wait, we loop, so that we know all conditions (still) pass if
we make it to the end.

Fixes: #4853
Signed-off-by: Sage Weil <sage@inktank.com>
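
The fix described above (loop until every condition holds, re-checking after each wait) can be sketched as follows. This is a simplified model, not the client code: FakeClient and its methods are hypothetical and only simulate the MDS restarting while the session open is in flight.

```python
class FakeClient:
    """Simulates the race: the MDS restarts while we wait for the session."""

    def __init__(self):
        self.mds_state = "active"
        self.session = False
        self.events = []

    def mds_active(self):
        return self.mds_state == "active"

    def session_open(self):
        return self.session

    def wait_for_session(self):
        # By the time the open reply arrives the MDS is in reconnect.
        self.mds_state = "reconnect"
        self.session = True
        self.events.append("session opened")

    def wait_for_mdsmap(self):
        self.mds_state = "active"
        self.events.append("mdsmap: active")


def make_request(client):
    """Send only once all conditions pass with no wait in between."""
    while True:
        if not client.mds_active():
            client.wait_for_mdsmap()
            continue  # we waited, so re-check everything from the top
        if not client.session_open():
            client.wait_for_session()
            continue  # the MDS state may have changed while we waited
        break
    client.events.append("sent getattr while " + client.mds_state)
    return client.mds_state


client = FakeClient()
state_at_send = make_request(client)
```

Without the `continue` after wait_for_session, this model would send the request while the MDS is still in reconnect, which is the bug the commit message describes.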
12 years agoMerge branch 'wip_4836' into next
Samuel Just [Mon, 29 Apr 2013 17:45:20 +0000 (10:45 -0700)]
Merge branch 'wip_4836' into next

Fixes: #4836
Reviewed-by: Sage Weil <sage@inktank.com>
12 years agoFix a README typo 248/head
Francois Deppierraz [Thu, 25 Apr 2013 19:10:41 +0000 (21:10 +0200)]
Fix a README typo

Signed-off-by: François Deppierraz <francois@ctrlaltdel.ch>
12 years agomon: Fix leak of context
Yan, Zheng [Sat, 27 Apr 2013 03:04:38 +0000 (11:04 +0800)]
mon: Fix leak of context

Use Context::complete() to finish the context; it frees the context
after executing Context::finish().

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
12 years agodoc: Removed extra whitespace.
John Wilkins [Sun, 28 Apr 2013 22:01:44 +0000 (15:01 -0700)]
doc: Removed extra whitespace.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
12 years agodoc: Added rbd-fuse to TOC.
John Wilkins [Sun, 28 Apr 2013 22:01:12 +0000 (15:01 -0700)]
doc: Added rbd-fuse to TOC.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
12 years agoAdded commentary and removed fourth column for now.
John Wilkins [Sun, 28 Apr 2013 22:00:51 +0000 (15:00 -0700)]
Added commentary and removed fourth column for now.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
12 years agodoc: Removed. Redundant information now.
John Wilkins [Sun, 28 Apr 2013 22:00:10 +0000 (15:00 -0700)]
doc: Removed. Redundant information now.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>