git.apps.os.sepia.ceph.com Git - ceph.git/log
ceph.git
12 years ago mon: destroy MonitorDBStore before g_ceph_context
Sage Weil [Fri, 31 May 2013 04:43:50 +0000 (21:43 -0700)]
mon: destroy MonitorDBStore before g_ceph_context

Put it on the heap so that we can destroy it before the g_ceph_context
cct that it references.  This fixes a crash like

*** Caught signal (Segmentation fault) **
in thread 4034a80
ceph version 0.63-204-gcf9aa7a (cf9aa7a0037e56eada8b3c1bb59d59d0bfe7bba5)
1: ceph-mon() [0x59932a]
2: (()+0xfcb0) [0x4e41cb0]
3: (Mutex::Lock(bool)+0x1b) [0x6235bb]
4: (PerfCountersCollection::remove(PerfCounters*)+0x27) [0x6a0877]
5: (LevelDBStore::~LevelDBStore()+0x1b) [0x582b2b]
6: (LevelDBStore::~LevelDBStore()+0x9) [0x582da9]
7: (main()+0x1386) [0x48db16]
8: (__libc_start_main()+0xed) [0x658076d]
9: ceph-mon() [0x4909ad]

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit df2d06db6f3f7e858bdadcc8cd2b0ade432df413)
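The ordering problem described above can be sketched in isolation. This is a minimal illustration, not the ceph-mon code: `Context`, `Store`, and `teardown_order` are hypothetical stand-ins for g_ceph_context, MonitorDBStore, and main(). Holding the store on the heap lets the caller destroy it explicitly while the context it references is still valid.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for the global context the store points at.
struct Context {
  std::vector<std::string>& log;
  explicit Context(std::vector<std::string>& l) : log(l) {}
  ~Context() { log.push_back("context destroyed"); }
};

// Hypothetical stand-in for a store whose destructor still uses the context
// (for example, to unregister perf counters).
struct Store {
  Context* cct;
  explicit Store(Context* c) : cct(c) {}
  ~Store() { cct->log.push_back("store destroyed"); }
};

// Returns the order in which the two objects were torn down.
std::vector<std::string> teardown_order() {
  std::vector<std::string> log;
  auto cct = std::make_unique<Context>(log);
  auto store = std::make_unique<Store>(cct.get());
  store.reset();  // destroy the store first, while cct is still valid
  cct.reset();    // only now release the context
  return log;
}
```

Calling `teardown_order()` in this sketch yields the store's entry before the context's entry; reversing the two `reset()` calls would make the store's destructor touch a dead object.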

12 years ago mon: fix leak of health_monitor and config_key_service
Sage Weil [Thu, 30 May 2013 18:07:06 +0000 (11:07 -0700)]
mon: fix leak of health_monitor and config_key_service

Switch to using regular pointers here.  The lifecycle of these services is
very simple such that refcounting is overkill.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit c888d1d3f1b77e62d1a8796992e918d12a009b9d)

12 years ago mon: return instead of exit(3) via preforker
Sage Weil [Thu, 30 May 2013 00:54:17 +0000 (17:54 -0700)]
mon: return instead of exit(3) via preforker

This lets us run all the locally-scoped dtors so that leak checking will
work.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 3c5706163b72245768958155d767abf561e6d96d)

12 years ago os/LevelDBStore: add perfcounters
Sage Weil [Thu, 30 May 2013 21:57:42 +0000 (14:57 -0700)]
os/LevelDBStore: add perfcounters

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 7802292e0a49be607d7ba139b44d5ea1f98e07e6)

12 years ago mon: make compaction bounds overlap
Sage Weil [Thu, 30 May 2013 21:36:41 +0000 (14:36 -0700)]
mon: make compaction bounds overlap

When we trim items N to M, compact over range (N-1) to M so that the
items in the queue will share bounds and get merged.  There is no harm in
compacting over a larger range here when the lower bound is a key that
doesn't exist anyway.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit a47ca583980523ee0108774b466718b303bd3f46)

12 years ago os/LevelDBStore: merge adjacent ranges in compactionqueue
Sage Weil [Thu, 30 May 2013 21:26:42 +0000 (14:26 -0700)]
os/LevelDBStore: merge adjacent ranges in compactionqueue

If we get behind and multiple adjacent ranges end up in the queue, merge
them so that we fire off compaction on larger ranges.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit f628dd0e4a5ace079568773edfab29d9f764d4f0)
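A minimal sketch of the merging idea, assuming a queue of inclusive [start, end] key bounds. `Range` and `enqueue_range` are hypothetical names, and the actual LevelDBStore logic may cover fewer or different overlap cases. It also shows why trimming N to M but compacting (N-1) to M helps: consecutive requests then share a bound and collapse into one.

```cpp
#include <algorithm>
#include <list>
#include <string>
#include <utility>

using Range = std::pair<std::string, std::string>;  // inclusive key bounds

// Add a range to the queue, merging it with every queued range it touches
// or overlaps. This is O(n), on the assumption that the queue stays short.
// It relies on the queued ranges being pairwise disjoint, which holds as
// long as every range is added through this function.
void enqueue_range(std::list<Range>& queue, Range r) {
  for (auto it = queue.begin(); it != queue.end();) {
    bool touches = !(r.second < it->first) && !(it->second < r.first);
    if (touches) {
      r.first = std::min(r.first, it->first);
      r.second = std::max(r.second, it->second);
      it = queue.erase(it);  // absorbed into r
    } else {
      ++it;
    }
  }
  queue.push_back(std::move(r));
}
```

With this sketch, queuing ("10", "20") and then ("20", "30") leaves a single entry ("10", "30"), so the worker issues one larger compaction instead of two small ones.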

12 years ago mon: compact trimmed range, not entire prefix
Sage Weil [Wed, 29 May 2013 15:40:32 +0000 (08:40 -0700)]
mon: compact trimmed range, not entire prefix

This will reduce the work that leveldb is asked to do by only triggering
compaction of the keys that were just trimmed.

We may want to further reduce the work by compacting less frequently, but
this is at least a step in that direction.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 6da4b20ca53fc8161485c8a99a6b333e23ace30e)

12 years ago mon/MonitorDBStore: allow compaction of ranges
Sage Weil [Wed, 29 May 2013 15:35:44 +0000 (08:35 -0700)]
mon/MonitorDBStore: allow compaction of ranges

Allow a transaction to describe the compaction of a range of keys.  Do this
in a backward compatible way, such that older code will interpret the
compaction of a prefix + range as compaction of the entire prefix.  This
allows us to avoid introducing any new feature bits.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit ab09f1e5c1305a64482ebbb5a6156a0bb12a63a4)

Conflicts:

src/mon/MonitorDBStore.h

12 years ago os/LevelDBStore: allow compaction of key ranges
Sage Weil [Wed, 29 May 2013 15:34:13 +0000 (08:34 -0700)]
os/LevelDBStore: allow compaction of key ranges

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit e20c9a3f79ccfeb816ed634ca25de29fc5975ea8)

12 years ago os/LevelDBStore: do compact_prefix() work asynchronously
Sage Weil [Tue, 28 May 2013 23:35:55 +0000 (16:35 -0700)]
os/LevelDBStore: do compact_prefix() work asynchronously

We generally do not want to block while compacting a range of leveldb.
Push the blocking+waiting off to a separate thread.  (leveldb will do what
it can to avoid blocking internally; no reason for us to wait explicitly.)

This addresses part of #5176.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 4af917d4478ec07734a69447420280880d775fa2)
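The hand-off to a separate thread can be sketched as follows. This is an illustration under assumptions, not the LevelDBStore implementation: `AsyncCompactor` is a hypothetical class, and the `compact` callback stands in for whatever blocking call performs the compaction.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>

// Hypothetical worker that runs blocking compactions off the caller's thread.
class AsyncCompactor {
 public:
  using Range = std::pair<std::string, std::string>;

  explicit AsyncCompactor(std::function<void(const Range&)> compact)
      : compact_(std::move(compact)), worker_([this] { loop(); }) {}

  // Drains anything still queued, then joins the worker.
  ~AsyncCompactor() {
    {
      std::lock_guard<std::mutex> l(lock_);
      stopping_ = true;
    }
    cond_.notify_one();
    worker_.join();
  }

  // Returns immediately; the compaction happens on the worker thread.
  void submit(Range r) {
    {
      std::lock_guard<std::mutex> l(lock_);
      queue_.push(std::move(r));
    }
    cond_.notify_one();
  }

 private:
  void loop() {
    std::unique_lock<std::mutex> l(lock_);
    for (;;) {
      cond_.wait(l, [this] { return stopping_ || !queue_.empty(); });
      if (queue_.empty()) return;  // stopping, and nothing left to do
      Range r = std::move(queue_.front());
      queue_.pop();
      l.unlock();
      compact_(r);  // the blocking work runs without holding the lock
      l.lock();
    }
  }

  std::function<void(const Range&)> compact_;
  std::mutex lock_;
  std::condition_variable cond_;
  std::queue<Range> queue_;
  bool stopping_ = false;
  std::thread worker_;  // declared last so the other members exist first
};
```

The caller only pays for a lock and a queue push in `submit()`; the cost of waiting on the compaction itself moves to the worker.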

12 years ago qa: rsync test: exclude /usr/local
Sage Weil [Sun, 12 May 2013 00:36:13 +0000 (17:36 -0700)]
qa: rsync test: exclude /usr/local

Some plana have non-world-readable crap in /usr/local/samba.  Avoid
/usr/local entirely for that and any similar landmines.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 82211f2197241c4f3d3135fd5d7f0aa776eaeeb6)

12 years ago mon: fix uninitialized fields in MMonHealth
Sage Weil [Sat, 1 Jun 2013 04:16:54 +0000 (21:16 -0700)]
mon: fix uninitialized fields in MMonHealth

Backport: cuttlefish
Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit d7e2ab1451e284cd4273cca47eec75e1d323f113)

12 years ago PGLog: only add entry to caller_ops in add() if reqid_is_indexed()
Samuel Just [Fri, 31 May 2013 20:44:39 +0000 (13:44 -0700)]
PGLog: only add entry to caller_ops in add() if reqid_is_indexed()

Fixes: #5216
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

12 years ago PG: don't write out pg map epoch every handle_activate_map
Samuel Just [Mon, 15 Apr 2013 23:33:48 +0000 (16:33 -0700)]
PG: don't write out pg map epoch every handle_activate_map

We don't actually need to write out the pg map epoch on every
activate_map as long as:
a) the osd does not trim past the oldest pg map persisted
b) the pg does update the persisted map epoch from time
to time.

To that end, we now keep a reference to the last map persisted.
The OSD already does not trim past the oldest live OSDMapRef.
Second, handle_activate_map will trim if the difference between
the current map and the last_persisted_map is large enough.

Fixes: #4731
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
(cherry picked from commit 2c5a9f0e178843e7ed514708bab137def840ab89)

Conflicts:

src/common/config_opts.h
src/osd/PG.cc
- last_persisted_osdmap_ref gets set in the non-static
  PG::write_info

12 years ago upstart: handle upper case in cluster name and id
Alexandre Marangone [Fri, 31 May 2013 19:33:11 +0000 (12:33 -0700)]
upstart: handle upper case in cluster name and id

Signed-off-by: Alexandre Marangone <alexandre.marangone@inktank.com>
(cherry picked from commit 851619ab6645967e5d7659d9b0eea63d5c402b15)

12 years ago OSDMonitor: skip new pools in update_pools_status() and get_pools_health()
Samuel Just [Tue, 21 May 2013 22:22:56 +0000 (15:22 -0700)]
OSDMonitor: skip new pools in update_pools_status() and get_pools_health()

New pools won't be full.  Accessing mon->pgmon()->pg_map.pg_pool_sum[poolid]
will implicitly create an entry for poolid, causing register_new_pgs() to
assume that the newly created pgs in the new pool are in fact the result of a
split, which prevents MOSDPGCreate messages from being sent out.

Fixes: #4813
Backport: cuttlefish
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 0289c445be0269157fa46bbf187c92639a13db46)
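The implicit insertion mentioned above is standard behavior for `operator[]` on C++ associative containers. The sketch below uses a plain `std::map` and a hypothetical `PoolStats` type, not the monitor's actual data structures, and shows a lookup with `find()` that leaves the map untouched. The commit itself takes a different route: it skips new pools rather than changing the lookup.

```cpp
#include <cstdint>
#include <map>

// Hypothetical per-pool summary.
struct PoolStats {
  uint64_t num_objects = 0;
};

// Read-only lookup that never inserts: returns nullptr for unknown pools.
const PoolStats* lookup(const std::map<int64_t, PoolStats>& sums,
                        int64_t poolid) {
  auto it = sums.find(poolid);
  return it == sums.end() ? nullptr : &it->second;
}
```

By contrast, evaluating `sums[poolid]` on a non-const map creates a default-constructed entry for a missing key, which is how a merely inspected pool can end up looking like one that already has state.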

12 years ago rgw: only append prefetched data if reading from head
Yehuda Sadeh [Thu, 30 May 2013 19:58:11 +0000 (12:58 -0700)]
rgw: only append prefetched data if reading from head

Fixes: #5209
Backport: bobtail, cuttlefish
If the head object wrongfully contains data, but according to the
manifest we don't read from the head, we shouldn't copy the prefetched
data. Also fix the length calculation for that data.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
(cherry picked from commit c5fc52ae0fc851444226abd54a202af227d7cf17)

12 years ago rgw: don't copy object idtag when copying object
Yehuda Sadeh [Thu, 30 May 2013 16:34:21 +0000 (09:34 -0700)]
rgw: don't copy object idtag when copying object

Fixes: #5204
When copying object we ended up also copying the original
object idtag which overrode the newly generated one. When
refcount put is called with the wrong idtag the count
doesn't go down.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
(cherry picked from commit b1312f94edc016e604f1d05ccfe2c788677f51d1)

12 years ago debian: sync up postinst and prerm with latest
Sage Weil [Thu, 30 May 2013 15:53:22 +0000 (08:53 -0700)]
debian: sync up postinst and prerm with latest

- do not use invoke-rc.d for upstart
- do not stop daemons on upgrade
- misc other cleanups

This corresponds to the state of master as of cf9aa7a.

Signed-off-by: Sage Weil <sage@inktank.com>

12 years ago mon: Monitor: backup monmap using all ceph features instead of quorum's
Joao Eduardo Luis [Thu, 30 May 2013 17:17:28 +0000 (18:17 +0100)]
mon: Monitor: backup monmap using all ceph features instead of quorum's

When a monitor is freshly created and for some reason its initial sync is
aborted, it will end up with an incorrect backup monmap.  This monmap is
incorrect in the sense that it will not contain the monitor's names as
it will expect on the next run.

This results from our using the quorum features to encode the monmap
when backing it up, instead of CEPH_FEATURES_ALL.

Fixes: #5203
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
(cherry picked from commit 626de387e617db457d6d431c16327c275b0e8a34)

12 years ago osd: do not assume head obc object exists when getting snapdir
Sage Weil [Wed, 29 May 2013 16:49:11 +0000 (09:49 -0700)]
osd: do not assume head obc object exists when getting snapdir

For a list-snaps operation on the snapdir, do not assume that the obc for the
head means the object exists.  This fixes a race between a head deletion and
a list-snaps that wrongly returns ENOENT, triggered by the DiffIterateStress
test when thrashing OSDs.

Fixes: #5183
Backport: cuttlefish
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
(cherry picked from commit 29e4e7e316fe3f3028e6930bb5987cfe3a5e59ab)

12 years ago osd: initialize new_state field when we use it
Sage Weil [Wed, 29 May 2013 23:50:04 +0000 (16:50 -0700)]
osd: initialize new_state field when we use it

If we use operator[] on a new int field its value is undefined; avoid
reading it or using |= et al until we initialize it.

Fixes: #4967
Backport: cuttlefish, bobtail
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: David Zafman <david.zafman@inktank.com>
(cherry picked from commit 50ac8917f175d1b107c18ecb025af1a7b103d634)

12 years ago HashIndex: sync top directory during start_split,merge,col_split
Samuel Just [Tue, 28 May 2013 18:10:05 +0000 (11:10 -0700)]
HashIndex: sync top directory during start_split,merge,col_split

Otherwise, the links might be ordered after the in progress
operation tag write.  We need the in progress operation tag to
correctly recover from an interrupted merge, split, or col_split.

Fixes: #5180
Backport: cuttlefish, bobtail
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 5bca9c38ef5187c7a97916970a7fa73b342755ac)

12 years ago mon: Paxos: get rid of the 'prepare_bootstrap()' mechanism
Joao Eduardo Luis [Wed, 22 May 2013 12:59:08 +0000 (13:59 +0100)]
mon: Paxos: get rid of the 'prepare_bootstrap()' mechanism

We don't need it after all.  If we are in the middle of some proposal,
then we guarantee that said proposal is likely to be retried.  If we
haven't yet proposed, then it's forever more likely that a client will
eventually retry the message that triggered this proposal.

Basically, this mechanism attempted to fix a non-problem, and was in
fact triggering some unforeseen issues that would have required increasing
the code complexity for no good reason.

Fixes: #5102
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
(cherry picked from commit e15d29094503f279d444eda246fc45c09f5535c9)

12 years ago mon: Paxos: finish queued proposals instead of clearing the list
Joao Eduardo Luis [Wed, 22 May 2013 12:51:13 +0000 (13:51 +0100)]
mon: Paxos: finish queued proposals instead of clearing the list

By finishing these Contexts, we make sure the Contexts they enclose (to be
called once the proposal goes through) will behave as they were initially
planned:  for instance, a C_Command() may retry the command if a -EAGAIN
is passed to 'finish_contexts', while a C_Trimmed() will simply set
'going_to_trim' to false.

This aims to fix at least one bug, in which Paxos will stop trimming if an
election is triggered while a trim is queued but not yet finished.  This
happens because it is the C_Trimmed() context that is responsible for
resetting 'going_to_trim' back to false.  By clearing all the contexts on
the proposal list instead of finishing them, we stay forever unable to
trim Paxos again as 'going_to_trim' will stay True till the end of time as
we know it.

Fixes: #4895
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
(cherry picked from commit 586e8c2075f721456fbd40f738dab8ccfa657aa8)

12 years ago mon: Paxos: finish_proposal() when we're finished recovering
Joao Eduardo Luis [Fri, 17 May 2013 17:23:36 +0000 (18:23 +0100)]
mon: Paxos: finish_proposal() when we're finished recovering

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
(cherry picked from commit 2ff23fe784245f3b86bc98e0434b21a5318e0a7b)

12 years ago Merge branch 'wip_scrub_tphandle' into cuttlefish
Samuel Just [Fri, 24 May 2013 03:09:29 +0000 (20:09 -0700)]
Merge branch 'wip_scrub_tphandle' into cuttlefish

Fixes: #5159
Reviewed-by: Sage Weil <sage@inktank.com>

12 years ago PG: ping tphandle during omap loop as well
Samuel Just [Fri, 24 May 2013 00:40:44 +0000 (17:40 -0700)]
PG: ping tphandle during omap loop as well

Signed-off-by: Samuel Just <sam.just@inktank.com>

12 years ago PG: reset timeout in _scan_list for each object, read chunk
Samuel Just [Thu, 23 May 2013 22:24:39 +0000 (15:24 -0700)]
PG: reset timeout in _scan_list for each object, read chunk

Signed-off-by: Samuel Just <sam.just@inktank.com>

12 years ago OSD,PG: pass tphandle down to _scan_list
Samuel Just [Thu, 23 May 2013 22:23:05 +0000 (15:23 -0700)]
OSD,PG: pass tphandle down to _scan_list

Signed-off-by: Samuel Just <sam.just@inktank.com>

12 years ago rgw: iterate usage entries from correct entry
Yehuda Sadeh [Thu, 23 May 2013 04:34:52 +0000 (21:34 -0700)]
rgw: iterate usage entries from correct entry

Fixes: #5152
When iterating through usage entries, and when user id was
provided, we started at the user's first entry and not from
the entry indexed by the request start time.
This commit fixes the issue.

Backport: bobtail

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
(cherry picked from commit 8b3a04dec8be13559716667d4b16cde9e9543feb)

12 years ago sysvinit: fix enumeration of local daemons when specifying type only
Sage Weil [Fri, 17 May 2013 03:37:05 +0000 (20:37 -0700)]
sysvinit: fix enumeration of local daemons when specifying type only

- prepend $local to the $allconf list at the top
- remove $local special case for all case
- fix the type prefix checks to explicitly check for prefixes

Fugly bash, but works!

Backport: cuttlefish, bobtail
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
(cherry picked from commit c80c6a032c8112eab4f80a01ea18e1fa2c7aa6ed)

12 years ago sysvinit: fix osd weight calculation on remote hosts
Sage Weil [Wed, 22 May 2013 16:47:29 +0000 (09:47 -0700)]
sysvinit: fix osd weight calculation on remote hosts

We need to do df on the remote host, not locally.

Similarly, the ceph command uses the osd key, which exists remotely; run it there.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit d81d0ea5c442699570bd93a90bea0d97a288a1e9)

12 years ago sysvinit: use known hostname $host instead of (incorrectly) recalculating
Sage Weil [Wed, 22 May 2013 16:47:03 +0000 (09:47 -0700)]
sysvinit: use known hostname $host instead of (incorrectly) recalculating

We would need to do hostname -s on the remote node, not the local one.
But we already have $host; use it!

Reported-by: Xiaoxi Chen <xiaoxi.chen@intel.com>
Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit caa15a34cb5d918c0c8b052cd012ec8a12fca150)

12 years ago mon: be a bit more verbose about osd mark down events
Sage Weil [Mon, 20 May 2013 19:41:30 +0000 (12:41 -0700)]
mon: be a bit more verbose about osd mark down events

Put these in the cluster log; they are interesting.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 87767fb1fb9a52d11b11f0b641cebbd9998f089e)

12 years ago PG: subset_last_update must be at least log.tail
Samuel Just [Mon, 13 May 2013 21:23:00 +0000 (14:23 -0700)]
PG: subset_last_update must be at least log.tail

Fixes: #5020
Backport: bobtail, cuttlefish
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: David Zafman <david.zafman@inktank.com>
(cherry picked from commit 72bf5f4813c273210b5ced7f7793bc1bf813690c)

12 years ago FileJournal: adjust write_pos prior to unlocking write_lock
Samuel Just [Tue, 14 May 2013 23:35:48 +0000 (16:35 -0700)]
FileJournal: adjust write_pos prior to unlocking write_lock

In committed_thru, we use write_pos to reset the header.start value in cases
where seq is past the end of our journalq.  It is therefore important that the
journalq be updated atomically with write_pos (that is, under the write_lock).

The call to align_bl() is moved into do_write in order to ensure that write_pos
is adjusted correctly prior to write_bl().

Also, we adjust pos at the end of write_bl() such that pos \in [get_top(),
header.max_size) after write_bl().

Fixes: #5020
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
(cherry picked from commit eaf3abf3f9a7b13b81736aa558c9084a8f07fdbe)
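The invariant in the last paragraph, pos in [get_top(), header.max_size), amounts to wrapping a position back past the header when it runs off the end of the journal. A minimal sketch of that arithmetic follows, with a hypothetical free function `wrap_pos` whose `top` and `max_size` parameters stand in for get_top() and header.max_size. It assumes the position has overshot by less than one full lap.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: fold a write position back into the ring once it
// reaches or passes the end. `top` is the first usable offset after the
// header. For top <= pos < 2 * max_size - top the result lies in
// [top, max_size).
uint64_t wrap_pos(uint64_t pos, uint64_t top, uint64_t max_size) {
  assert(top < max_size);
  if (pos >= max_size)
    pos = pos - max_size + top;
  return pos;
}
```

For example, with `top = 10` and `max_size = 100`, a position of exactly 100 wraps to 10 rather than staying at the one-past-the-end offset.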

12 years ago mon: implement --extract-monmap <filename>
Sage Weil [Tue, 21 May 2013 21:36:11 +0000 (14:36 -0700)]
mon: implement --extract-monmap <filename>

This will make for a simpler process for
  http://ceph.com/docs/master/rados/operations/add-or-rm-mons/#removing-monitors-from-an-unhealthy-cluster

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit c0268e27497a4d8228ef54da9d4ca12f3ac1f1bf)

12 years ago librbd: make image creation defaults configurable
Josh Durgin [Thu, 16 May 2013 22:28:40 +0000 (15:28 -0700)]
librbd: make image creation defaults configurable

Programs using older versions of the image creation functions can't
set newer parameters like image format and fancier striping.

Setting these options lets them use all the new functionality without
being patched and recompiled to use e.g. rbd_create3().
This is particularly useful for things like qemu-img, which does not
know how to create format 2 images yet.

Refs: #5067
backport: cuttlefish, bobtail
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
(cherry picked from commit aacc9adc4e9ca90bbe73ac153cc754a3a5b2c0a1)

12 years ago rbd.py: fix stripe_unit() and stripe_count()
Josh Durgin [Thu, 16 May 2013 22:21:24 +0000 (15:21 -0700)]
rbd.py: fix stripe_unit() and stripe_count()

These matched older versions of the functions, but would segfault
using the current versions.

backport: cuttlefish, bobtail
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
(cherry picked from commit 53ee6f965e8f06c7256848210ad3c4f89d0cb5a0)

12 years ago cls_rbd: make sure stripe_unit is not larger than object size
Josh Durgin [Thu, 16 May 2013 22:19:46 +0000 (15:19 -0700)]
cls_rbd: make sure stripe_unit is not larger than object size

Test a few other cases too.

backport: cuttlefish, bobtail
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
(cherry picked from commit 810306a2a76eec1c232fd28ec9c351e827fa3031)

12 years ago rgw: protect ops log socket formatter
Yehuda Sadeh [Fri, 3 May 2013 19:57:00 +0000 (12:57 -0700)]
rgw: protect ops log socket formatter

Fixes: #4905
Ops log (through the unix domain socket) uses a formatter, which wasn't
protected.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
(cherry picked from commit d48f1edb07a4d8727ac956f70e663c1b4e33e1dd)

12 years ago Makefile: force char to be signed
Sage Weil [Thu, 16 May 2013 06:02:10 +0000 (23:02 -0700)]
Makefile: force char to be signed

On an armv7l build, we see errors like

 warning: rgw/rgw_common.cc:626:16: comparison is always false due to limited range of data type [-Wtype-limits]

from code

      char c1 = hex_to_num(*src++);
...
      if (c1 < 0)

Force char to be signed (regardless of any weird architecture's default)
to avoid risk of this leading to misbehavior.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
(cherry picked from commit 769a16d6674122f3b537f03e17514ad974bf2a2f)
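The warning arises because plain `char` is unsigned by default on some targets, so a negative error value stored in a `char` can never compare below zero. The commit addresses this at build time by forcing `char` to be signed (GCC and Clang offer `-fsigned-char` for that purpose; the exact flag used is not shown here). As a source-level alternative, and not what the commit does, the sketch below keeps the result in an `int`. The body of `hex_to_num` is a hypothetical reimplementation for illustration only.

```cpp
// Hypothetical decoder: returns 0..15 for a hex digit and -1 otherwise.
int hex_to_num(char c) {
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

// Keeping the result in an int preserves the -1 sentinel whether or not
// plain char is signed. Storing it in a plain char on an unsigned-char
// target would turn -1 into a positive value (255 with 8-bit chars).
bool is_hex_digit(char c) {
  int v = hex_to_num(c);
  return v >= 0;
}
```

Either approach makes the error check meaningful again; the compiler flag has the advantage of covering every similar pattern in the tree without touching the code.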

12 years ago debian: stop sysvinit on ceph.prerm
Sage Weil [Mon, 20 May 2013 20:34:27 +0000 (13:34 -0700)]
debian: stop sysvinit on ceph.prerm

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 2f193fb931ed09d921e6fa5a985ab87aa4874589)

12 years ago ceph df: fix si units for 'global' stats
Mike Kelly [Thu, 16 May 2013 16:29:50 +0000 (12:29 -0400)]
ceph df: fix si units for 'global' stats

si_t expects bytes, but it was being given kilobytes.

Signed-off-by: Mike Kelly <pioto@pioto.org>
(cherry picked from commit 0c2b738d8d07994fee4c73dd076ac9364a64bdb2)
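A unit mismatch of this kind shows up as values that are off by a factor of 1024. The sketch below uses a hypothetical `format_bytes` in place of si_t and a hypothetical `kb_to_bytes` conversion; it assumes 1024-based prefixes and truncating division, and is not the formatter used by ceph df.

```cpp
#include <cstdint>
#include <string>

// Hypothetical formatter in the spirit of si_t: expects a byte count.
std::string format_bytes(uint64_t bytes) {
  static const char* units[] = {"", "k", "M", "G", "T", "P", "E"};
  int i = 0;
  while (bytes >= 1024 && i < 6) {
    bytes >>= 10;  // divide by 1024, discarding the remainder
    ++i;
  }
  return std::to_string(bytes) + units[i];
}

// Convert a kilobyte count to bytes before handing it to the formatter.
uint64_t kb_to_bytes(uint64_t kb) { return kb << 10; }
```

Passing a raw kilobyte count of 2048 prints "2k" in this sketch, while converting first prints "2M", which is the size the caller actually meant.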

12 years ago udev: install disk/by-partuuid rules
Sage Weil [Fri, 17 May 2013 01:40:29 +0000 (18:40 -0700)]
udev: install disk/by-partuuid rules

Wheezy's udev (175-7.2) has broken rules for the /dev/disk/by-partuuid/
symlinks that ceph-disk relies on.  Install parallel rules that work.  On
new udev, this is harmless; on older udev, this will make life better.

Fixes: #4865
Backport: cuttlefish
Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit d8d7113c35b59902902d487738888567e3a6b933)

12 years ago debian: make radosgw require matching version of librados2
Sage Weil [Thu, 16 May 2013 20:17:45 +0000 (13:17 -0700)]
debian: make radosgw require matching version of librados2

...indirectly via ceph-common.  We get bad behavior when they diverge, I
think because of libcommon.la being linked both statically and dynamically.

Fixes: #4997
Backport: cuttlefish, bobtail
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Gary Lowell <gary.lowell@inktank.com>
(cherry picked from commit 604c83ff18f9a40c4f44bc8483ef22ff41efc8ad)

12 years ago mon: fix validation of mds ids in mon commands
Sage Weil [Sat, 11 May 2013 05:14:05 +0000 (22:14 -0700)]
mon: fix validation of mds ids in mon commands

Fixes: #4996
Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 5c305d63043762027323052b4bb3ae3063665c6f)

12 years ago v0.61.2 v0.61.2
Gary Lowell [Mon, 13 May 2013 18:58:35 +0000 (11:58 -0700)]
v0.61.2

12 years ago mon: Monitor: tolerate GV duplicates during conversion
Joao Eduardo Luis [Mon, 13 May 2013 14:36:59 +0000 (15:36 +0100)]
mon: Monitor: tolerate GV duplicates during conversion

Fixes: #4974
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
(cherry picked from commit ba05b16ee2b6e25141f2ab88265a1cf92dcd493c)

12 years ago config_opts: default mon_debug_dump_transactions to 'false'
Dan Mick [Sat, 11 May 2013 03:09:34 +0000 (20:09 -0700)]
config_opts: default mon_debug_dump_transactions to 'false'

otherwise, it chews mon log space at an alarming rate.

Fixes: #5024
Signed-off-by: Dan Mick <dan.mick@inktank.com>

12 years ago v0.61.1 v0.61.1
Gary Lowell [Thu, 9 May 2013 00:23:47 +0000 (17:23 -0700)]
v0.61.1

12 years ago mon: dump MonitorDBStore transactions to file
Samuel Just [Thu, 2 May 2013 21:13:07 +0000 (14:13 -0700)]
mon: dump MonitorDBStore transactions to file

Signed-off-by: Samuel Just <sam.just@inktank.com>
(cherry picked from commit 797089ef082b99910eebfd9454c03d1f027c93bb)

12 years ago osd: optionally enable leveldb logging
Sage Weil [Mon, 6 May 2013 21:21:28 +0000 (14:21 -0700)]
osd: optionally enable leveldb logging

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 0b4c5c1a3349670d11cc3c4fb3c4b3c1a80b2502)

12 years ago mon: allow leveldb logging
Sage Weil [Mon, 6 May 2013 21:13:50 +0000 (14:13 -0700)]
mon: allow leveldb logging

'mon leveldb log = filename'

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit c1d5f815546b731e10bfcb81cbcd48b7d432e9c4)

12 years ago debian/control: squeeze requires cryptsetup package
Gary Lowell [Wed, 8 May 2013 23:33:05 +0000 (16:33 -0700)]
debian/control: squeeze requires cryptsetup package

Squeeze requires the cryptsetup package which has been renamed
cryptsetup-bin in later versions.  Allow either package to
satisfy the dependency.

Signed-off-by: Gary Lowell <gary.lowell@inktank.com>
(cherry picked from commit 83bbae415de16f708ca1cb24861ddbb0bd514a7f)

12 years ago osd: don't assert if get_omap_iterator() returns NULL
Yehuda Sadeh [Wed, 8 May 2013 19:18:49 +0000 (12:18 -0700)]
osd: don't assert if get_omap_iterator() returns NULL

Fixes: #4949
This can happen if the object does not exist and it's
a write operation. Just return -ENOENT.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
(cherry picked from commit 36ec6f9bce63641f4fc2e4ab04d03d3ec1638ea0)

12 years ago ceph-create-keys: gracefully handle no data from admin socket
Sage Weil [Wed, 8 May 2013 21:54:33 +0000 (14:54 -0700)]
ceph-create-keys: gracefully handle no data from admin socket

Old ceph-mon (prior to 393c9372f82ef37fc6497dd46fc453507a463d42) would
return an empty string and success if the command was not registered yet.
Gracefully handle that case by retrying.

If we still fail to parse, exit entirely with EINVAL.

Fixes: #4952
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
(cherry picked from commit e2528ae42c455c522154c9f68b5032a3362fca8e)

12 years ago init-ceph: fix osd_data location when checking df utilization
Sage Weil [Wed, 8 May 2013 21:35:54 +0000 (14:35 -0700)]
init-ceph: fix osd_data location when checking df utilization

Do not assume default osd data location.

Fixes: #4951
Backport: cuttlefish, bobtail
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Gary Lowell <gary.lowell@inktank.com>
(cherry picked from commit f2a54cc9c98a9f31aef049c74ea932b2d9000d3c)

12 years ago OSD: handle stray snap collections from upgrade bug
Samuel Just [Tue, 7 May 2013 23:41:22 +0000 (16:41 -0700)]
OSD: handle stray snap collections from upgrade bug

Previously, we failed to clear snap_collections, which caused split to
spawn a bunch of snap collections.  In load_pgs, we now clear any such
snap collections and then the snap_collections field on the PG itself.

Related: #4927
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 8e89db89cb36a217fd97cbc1f24fd643b62400dc)

12 years ago PG: clear snap_collections on upgrade
Samuel Just [Tue, 7 May 2013 23:35:57 +0000 (16:35 -0700)]
PG: clear snap_collections on upgrade

Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 252d71a81ef4536830a74897c84a7015ae6ec9fe)

12 years ago OSD: snap collections can be ignored on split
Samuel Just [Tue, 7 May 2013 23:34:57 +0000 (16:34 -0700)]
OSD: snap collections can be ignored on split

Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 438d9aa152e546b2008ec355b481df71aa1c51a5)

12 years ago ceph: return error code when failing to get result from admin socket
Sage Weil [Wed, 8 May 2013 18:05:29 +0000 (11:05 -0700)]
ceph: return error code when failing to get result from admin socket

Make sure we return a non-zero result code when we fail to read something
from the admin socket.

Backport: cuttlefish, bobtail
Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 393c9372f82ef37fc6497dd46fc453507a463d42)

12 years ago v0.61 v0.61
Gary Lowell [Mon, 6 May 2013 20:18:56 +0000 (13:18 -0700)]
v0.61

12 years ago os/: default to dio for non-block journals
Samuel Just [Mon, 6 May 2013 17:56:50 +0000 (10:56 -0700)]
os/: default to dio for non-block journals

Workaround: #4910
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

12 years ago ceph-disk: use separate lock files for prepare, activate
Sage Weil [Mon, 6 May 2013 18:40:52 +0000 (11:40 -0700)]
ceph-disk: use separate lock files for prepare, activate

Use a separate lock file for prepare and activate to avoid deadlock.  This
didn't seem to trigger on all machines, but in many cases, the prepare
process would take the file lock and later trigger a udev event and the
activate would then block on the same lock, either when we explicitly call
'udevadm settle --timeout=10' or when partprobe does it on our behalf
(without a timeout!).   Avoid this by using separate locks for prepare
and activate.  We only care if multiple activates race; it is
okay for a prepare to be in progress and for an activate to be kicked
off.

Signed-off-by: Sage Weil <sage@inktank.com>

12 years ago ceph-test.install: add ceph-monstore-tool and ceph-osdomap-tool
Danny Al-Gaaf [Mon, 6 May 2013 13:42:57 +0000 (15:42 +0200)]
ceph-test.install: add ceph-monstore-tool and ceph-osdomap-tool

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>

12 years ago ceph.spec.in: remove twice listed ceph-coverage
Danny Al-Gaaf [Mon, 6 May 2013 13:21:56 +0000 (15:21 +0200)]
ceph.spec.in: remove twice listed ceph-coverage

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>

12 years ago ceph.spec: add some files to ceph
Danny Al-Gaaf [Mon, 6 May 2013 13:09:32 +0000 (15:09 +0200)]
ceph.spec: add some files to ceph

Add installed, but not packaged files to ceph-test (ceph-monstore-tool,
ceph-osdomap-tool) rpm file section.

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>

12 years ago mon: fix init sequence when not daemonizing
Sage Weil [Fri, 3 May 2013 23:20:26 +0000 (16:20 -0700)]
mon: fix init sequence when not daemonizing

We made the common_init_finish and chdir conditional on daemonize in commit
2e0dd5ae6c8751e33d456b2b06c1204b63db959a, breaking init (asok at least)
when -f is specified (as with upstart).

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>

12 years ago mon: avoid null deref in Monitor::_mon_status()
Sage Weil [Fri, 3 May 2013 23:04:31 +0000 (16:04 -0700)]
mon: avoid null deref in Monitor::_mon_status()

mikedawson reports:

*** Caught signal (Segmentation fault) **
 in thread 7f40ce270700

 ceph version 0.60-801-g7ec0151 (7ec01513970b5a977bdbdf60052b6f6e257d267e)
 1: /usr/bin/ceph-mon() [0x59d550]
 2: (()+0xfbd0) [0x7f40d3e38bd0]
 3: (operator<<(std::ostream&, entity_name_t const&)+0x16) [0x4d7c46]
 4: (operator<<(std::ostream&, entity_inst_t const&)+0x1b) [0x4d837b]
 5: (Monitor::_mon_status(std::ostream&)+0x2ce) [0x4d284e]
 6: (Monitor::do_admin_command(std::string, std::string, std::ostream&)+0x4f) [0x4d652f]
 7: (AdminHook::call(std::string, std::string, ceph::buffer::list&)+0x68) [0x4efa38]
 8: (AdminSocket::do_accept()+0x451) [0x64ab81]
 9: (AdminSocket::entry()+0x398) [0x64c528]
 10: (()+0x7f8e) [0x7f40d3e30f8e]
 11: (clone()+0x6d) [0x7f40d237ae1d]

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>

12 years ago ceph.spec: require xfsprogs
Sage Weil [Fri, 3 May 2013 20:28:24 +0000 (13:28 -0700)]
ceph.spec: require xfsprogs

This is needed when creating new OSDs (via ceph-disk), at least for most
people.  Eventually we'll want to include btrfs here.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoinit-ceph: update osd crush map position on start
Sage Weil [Fri, 3 May 2013 00:18:27 +0000 (17:18 -0700)]
init-ceph: update osd crush map position on start

This is what the upstart ceph-osd.conf does; we need to do the same so that
new OSDs (e.g., that ceph-deploy creates) get added to the crush map.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agomon: fork early to avoid leveldb static env state
Sage Weil [Fri, 3 May 2013 18:29:24 +0000 (11:29 -0700)]
mon: fork early to avoid leveldb static env state

leveldb has static state that prevents it from recreating its worker thread
after our fork(), even when we close and reopen the database (tsk tsk!).
Avoid this by forking early, before we touch leveldb.

Hide the details in a Preforker class.  This is modeled after what
ceph-fuse already does; we should convert it later.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
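
The fork-early pattern described above can be sketched roughly as follows.
This is a minimal Python illustration, not the actual Preforker class from
the commit: the method names (prefork, signal_ready, wait_for_child) and the
run() helper are assumptions made for the example.  In this sketch the child
exits right after reporting its status; a real daemon would keep running.

```python
import os


class Preforker:
    """Hypothetical sketch: fork first, then let the child report startup status."""

    def __init__(self):
        self.read_fd = -1
        self.write_fd = -1
        self.child_pid = -1

    def prefork(self):
        """Fork before touching any library with static thread state.

        Returns True in the child and False in the parent.
        """
        self.read_fd, self.write_fd = os.pipe()
        self.child_pid = os.fork()
        if self.child_pid == 0:
            os.close(self.read_fd)
            return True
        os.close(self.write_fd)
        return False

    def signal_ready(self, status):
        """Child: tell the waiting parent whether initialization succeeded."""
        os.write(self.write_fd, bytes([status]))
        os.close(self.write_fd)

    def wait_for_child(self):
        """Parent: block until the child reports; return 1 if it died silently."""
        data = os.read(self.read_fd, 1)
        os.close(self.read_fd)
        return data[0] if data else 1


def run(init):
    """Fork early, run init() only in the child, return its status in the parent."""
    forker = Preforker()
    if forker.prefork():
        status = 1
        try:
            status = init()  # e.g. open the database here, after the fork
        finally:
            forker.signal_ready(status)
            os._exit(status)  # sketch only: a real daemon would keep running
    status = forker.wait_for_child()
    os.waitpid(forker.child_pid, 0)
    return status
```

The point of the design is ordering: anything that starts background threads
is only initialized inside init(), which runs after the fork.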
12 years agoMerge remote-tracking branch 'gh/wip-mon-rank' into next
Sage Weil [Thu, 2 May 2013 20:32:41 +0000 (13:32 -0700)]
Merge remote-tracking branch 'gh/wip-mon-rank' into next

Reviewed-by: Greg Farnum <greg@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
12 years agotools/: add paranoid option to ceph-osdomap-tool
Samuel Just [Thu, 2 May 2013 19:49:34 +0000 (12:49 -0700)]
tools/: add paranoid option to ceph-osdomap-tool

Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
12 years agoosd: default 'osd leveldb paranoid = false'
Sage Weil [Thu, 2 May 2013 19:47:24 +0000 (12:47 -0700)]
osd: default 'osd leveldb paranoid = false'

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agolibrados,client: bump mount timeout to 5 min
Sage Weil [Thu, 2 May 2013 19:31:38 +0000 (12:31 -0700)]
librados,client: bump mount timeout to 5 min

30 seconds is pretty short.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
12 years agoOSD: also walk maps individually for start_split in consume_map()
Samuel Just [Thu, 2 May 2013 17:47:55 +0000 (10:47 -0700)]
OSD: also walk maps individually for start_split in consume_map()

We need to go map-by-map to get the parents right in consume_map()
just as we must in load_pgs().

Fixes: 4884
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
12 years agorgw: increase startup timeout to 5 min
Sage Weil [Thu, 2 May 2013 18:06:22 +0000 (11:06 -0700)]
rgw: increase startup timeout to 5 min

30s is too short.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoMerge branch 'wip-paranoid' into next
Sage Weil [Thu, 2 May 2013 17:18:39 +0000 (10:18 -0700)]
Merge branch 'wip-paranoid' into next

12 years agoMerge remote-tracking branch 'gh/wip-doc-cuttlefish' into next
Sage Weil [Thu, 2 May 2013 00:24:40 +0000 (17:24 -0700)]
Merge remote-tracking branch 'gh/wip-doc-cuttlefish' into next

12 years agoMerge remote-tracking branch 'upstream/wip_4884' into next
Samuel Just [Wed, 1 May 2013 23:11:47 +0000 (16:11 -0700)]
Merge remote-tracking branch 'upstream/wip_4884' into next

Fixes: #4884
Reviewed-by: Greg Farnum <greg@inktank.com>
12 years agoMakefile,gitignore: ceph-monstore-tool, not ceph_monstore_tool
Samuel Just [Wed, 1 May 2013 01:11:05 +0000 (18:11 -0700)]
Makefile,gitignore: ceph-monstore-tool, not ceph_monstore_tool

Signed-off-by: Samuel Just <sam.just@inktank.com>
12 years agoMakefile: put ceph_monstore_tool in bin_DEBUGPROGRAMS
Samuel Just [Wed, 1 May 2013 00:57:56 +0000 (17:57 -0700)]
Makefile: put ceph_monstore_tool in bin_DEBUGPROGRAMS

Signed-off-by: Samuel Just <sam.just@inktank.com>
12 years agotools: ceph-osdomap-tool.cc
Samuel Just [Tue, 30 Apr 2013 16:31:26 +0000 (09:31 -0700)]
tools: ceph-osdomap-tool.cc

Add tool for dumping info from osd omap.

Signed-off-by: Samuel Just <sam.just@inktank.com>
12 years agoOSD: load_pgs() should fill in start_split honestly
Samuel Just [Wed, 1 May 2013 21:59:08 +0000 (14:59 -0700)]
OSD: load_pgs() should fill in start_split honestly

In load_pgs(), we previously assumed that every child created between
the loaded pg's stored epoch and the current osdmap had that pg as its
parent.  This is not correct: some of the children may have been split
in subsequent epochs from children split in earlier epochs.  Instead,
do each map individually.

Signed-off-by: Samuel Just <sam.just@inktank.com>
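
A small sketch of why the maps have to be walked one at a time.  It uses a
deliberately simplified model that is an assumption of this example, not the
OSD code: pg_num only takes power-of-two values, and a new pg splits from the
pg with the same id modulo the old pg_num.  The function names are
hypothetical.

```python
def parent_of(child, old_pg_num):
    # Simplified model (assumption): with power-of-two pg_num, a new pg splits
    # from the pg that has the same id modulo the old pg_num.
    return child % old_pg_num


def split_parents_per_map(pg_nums):
    """Walk consecutive pg_num values map by map; return {child: parent}."""
    parents = {}
    for old, new in zip(pg_nums, pg_nums[1:]):
        for child in range(old, new):
            parents[child] = parent_of(child, old)
    return parents


def split_parents_one_jump(pg_nums):
    """Incorrect shortcut: compare only the first and the last map."""
    old, new = pg_nums[0], pg_nums[-1]
    return {child: parent_of(child, old) for child in range(old, new)}
```

With pg_num going 4, 8, 16, the per-map walk makes pg 13 a child of pg 5
(itself split from pg 1 one epoch earlier), while the one-jump shortcut
attributes pg 13 directly to pg 1.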
12 years agoOSD: cancel_pending_splits needs to cancel all descendants
Samuel Just [Wed, 1 May 2013 21:56:25 +0000 (14:56 -0700)]
OSD: cancel_pending_splits needs to cancel all descendants

expand_pg_num() and load_pgs() may result in a pg with children
in pending_splits which also have children in pending_splits (etc).

Signed-off-by: Samuel Just <sam.just@inktank.com>
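
Cancelling all descendants amounts to a traversal of the child-to-parent
relation.  A minimal sketch, assuming pending_splits is modeled as a plain
dict mapping child to parent (the real data structure may differ):

```python
def cancel_pending_splits(pending_splits, parent):
    """Remove every descendant of `parent` from pending_splits (child -> parent).

    Returns the set of cancelled children.
    """
    to_visit = [parent]
    cancelled = set()
    while to_visit:
        current = to_visit.pop()
        children = [c for c, p in pending_splits.items() if p == current]
        for child in children:
            del pending_splits[child]
            cancelled.add(child)
            to_visit.append(child)  # its own children must be cancelled too
    return cancelled
```

Cancelling only the direct children would leave grandchildren such as 13
below stranded in the map with a parent that no longer has a pending split.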
12 years agoosd: add --osd-leveldb-paranoid flag
Sage Weil [Wed, 1 May 2013 21:40:33 +0000 (14:40 -0700)]
osd: add --osd-leveldb-paranoid flag

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agomon: add --mon-leveldb-paranoid flag
Sage Weil [Wed, 1 May 2013 21:38:59 +0000 (14:38 -0700)]
mon: add --mon-leveldb-paranoid flag

This is sort of equivalent to an fsck.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agodumper: fix Objecter locking
Greg Farnum [Wed, 1 May 2013 21:10:31 +0000 (14:10 -0700)]
dumper: fix Objecter locking

Locking expectations changed at some point, and the Dumper wasn't
updated to comply:
1) We need to take the lock for Objecter, as it
doesn't do so on its own any more.
2) We need to drop the lock in several places so that Objecter
can take delivery of messages.

Signed-off-by: Greg Farnum <greg@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
12 years agoRevert "PaxosService: use get and put for version_t"
Sage Weil [Wed, 1 May 2013 05:48:52 +0000 (22:48 -0700)]
Revert "PaxosService: use get and put for version_t"

This reverts commit e725c3e210b244e090d70c77d937c94f4f63a2be.

That commit inadvertently got rid of the prefix portion of the key, which
led to overwriting the wrong keys.

Fixes: #4872
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
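
The failure mode can be illustrated with a toy key-value store.  This is a
sketch of the general problem only, not the reverted code: the helper names,
the underscore separator, and the example prefixes are all assumptions.

```python
def combine_key(prefix, key):
    """Build the full store key; the prefix keeps each service in its own namespace."""
    return f"{prefix}_{key}"


def put_prefixed(store, prefix, key, value):
    store[combine_key(prefix, key)] = value


def put_unprefixed(store, prefix, key, value):
    # Faulty variant: the prefix is ignored, so all services share one namespace.
    store[key] = value


def demo(put):
    """Two services each record their own last_committed version."""
    store = {}
    put(store, "osdmap", "last_committed", 120)
    put(store, "monmap", "last_committed", 3)
    return store
```

With put_prefixed both values survive under distinct keys; with
put_unprefixed the second write silently replaces the first.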
12 years agomon/Paxos: update first_committed when we trim
Sage Weil [Wed, 1 May 2013 17:57:35 +0000 (10:57 -0700)]
mon/Paxos: update first_committed when we trim

The Paxos::trim() -> ::trim_to() path trims old states but does not
update first_committed.  This misinforms later paxos rounds: peers
think they can participate, receive COMMIT messages after the
COLLECT/LAST exchange for future commits they can't do anything with,
and then crash out when they get the BEGIN:

mon/Paxos.cc: 557: FAILED assert(begin->last_committed == last_committed)

Fixes: #4879
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
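
The invariant being restored is that first_committed always names the oldest
state still held.  A toy model, with a hypothetical PaxosLog class and a
hypothetical can_share_from() check standing in for the real peer logic:

```python
class PaxosLog:
    """Toy log of committed states, keyed by version (illustration only)."""

    def __init__(self):
        self.states = {}
        self.first_committed = 0
        self.last_committed = 0

    def commit(self, value):
        self.last_committed += 1
        self.states[self.last_committed] = value
        if self.first_committed == 0:
            self.first_committed = 1

    def trim_to(self, first):
        """Drop states older than `first` and record the new lower bound."""
        for version in range(self.first_committed, first):
            self.states.pop(version, None)
        if first > self.first_committed:
            self.first_committed = first  # the step the trim path was missing

    def can_share_from(self, peer_last_committed):
        """A peer can be caught up only if we still hold its next version."""
        return peer_last_committed + 1 >= self.first_committed
```

If trim_to() removed the states but left first_committed at 1, the check in
can_share_from() would keep answering yes for peers whose next version has
already been trimmed.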
12 years agomon/Paxos: don't ignore peer first_committed
Sage Weil [Wed, 1 May 2013 04:16:16 +0000 (21:16 -0700)]
mon/Paxos: don't ignore peer first_committed

We go to the effort of keeping a map of the peer's first/last committed
so that we can send the right commits during the first phase of paxos,
but we forgot to record the first value.  This appears to simply be an
oversight.  It is mostly harmless; it just means we send extra states
that the peer already has.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
12 years agomon: Monitor: fix bug on _pick_random_mon() that would choose an invalid rank
Joao Eduardo Luis [Tue, 30 Apr 2013 16:12:05 +0000 (17:12 +0100)]
mon: Monitor: fix bug on _pick_random_mon() that would choose an invalid rank

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
12 years agomon: Monitor: use rank instead of name when randomly picking monitors
Joao Eduardo Luis [Tue, 30 Apr 2013 15:28:42 +0000 (16:28 +0100)]
mon: Monitor: use rank instead of name when randomly picking monitors

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
12 years agoOSD: clean up in progress split state on pg removal
Samuel Just [Tue, 30 Apr 2013 22:48:10 +0000 (15:48 -0700)]
OSD: clean up in progress split state on pg removal

There are two cases:
1) The parent pg has not yet initiated the split.
2) The parent pg has initiated the split.

Previously in case 1), _remove_pg left the entry for its children in the
in_progress_splits map blocking subsequent peering attempts.

In case 1), we need to unblock requests on the child pgs for the parent on
parent removal.  We don't need to bother waking requests since any requests
received prior to the remove_pg request are necessarily obsolete.

In case 2), we don't need to do anything: the child will complete the split on
its own anyway.

Thus, we now track pending_splits vs in_progress_splits.  Children in
pending_splits are in state 1), in_progress_splits in state 2).  split_pgs
bumps pgs from pending_splits to in_progress_splits atomically with respect to
_remove_pg since the parent pg lock is held in both places.

Fixes: #4813
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
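
The two-map bookkeeping described above can be sketched as follows.  The
SplitTracker class and its method names are assumptions made for this
example; a single threading.Lock stands in for the parent pg lock that the
commit says is held in both split_pgs and _remove_pg.

```python
import threading


class SplitTracker:
    """Hypothetical model of pending_splits vs in_progress_splits."""

    def __init__(self):
        self.lock = threading.Lock()      # stands in for the parent pg lock
        self.pending_splits = {}          # child -> parent, split not started
        self.in_progress_splits = set()   # children whose split has started

    def add_pending(self, parent, children):
        with self.lock:
            for child in children:
                self.pending_splits[child] = parent

    def start_split(self, parent):
        """Like split_pgs: move the parent's children to in-progress atomically."""
        with self.lock:
            started = [c for c, p in self.pending_splits.items() if p == parent]
            for child in started:
                del self.pending_splits[child]
                self.in_progress_splits.add(child)
            return started

    def remove_pg(self, parent):
        """Like _remove_pg: drop pending children; leave in-progress ones alone."""
        with self.lock:
            dropped = [c for c, p in self.pending_splits.items() if p == parent]
            for child in dropped:
                del self.pending_splits[child]
            return dropped

    def finish_split(self, child):
        with self.lock:
            self.in_progress_splits.discard(child)

    def is_blocked(self, pg):
        with self.lock:
            return pg in self.pending_splits or pg in self.in_progress_splits
```

Because both transitions take the same lock, a child is always in exactly one
of the two states when the parent is removed, so removal never races with the
start of a split.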
12 years agomon: communicate the quorum_features properly when declaring victory.
Greg Farnum [Wed, 1 May 2013 01:12:10 +0000 (18:12 -0700)]
mon: communicate the quorum_features properly when declaring victory.

Fixes #4747.

Signed-off-by: Greg Farnum <greg@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
12 years agodoc: Incorporating Tamil's feedback.
John Wilkins [Wed, 1 May 2013 01:04:46 +0000 (18:04 -0700)]
doc: Incorporating Tamil's feedback.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
12 years agodoc: Reordered header levels for visual clarity.
John Wilkins [Wed, 1 May 2013 00:48:05 +0000 (17:48 -0700)]
doc: Reordered header levels for visual clarity.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>