git.apps.os.sepia.ceph.com Git - ceph.git/log
ceph.git
12 years agomon: Monitor: backup monmap using all ceph features instead of quorum's 333/head
Joao Eduardo Luis [Thu, 30 May 2013 17:17:28 +0000 (18:17 +0100)]
mon: Monitor: backup monmap using all ceph features instead of quorum's

When a monitor is freshly created and for some reason its initial sync is
aborted, it will end up with an incorrect backup monmap.  This monmap is
incorrect in the sense that it will not contain the monitor names that
the monitor will expect on its next run.

This results from us using the quorum features to encode the monmap
when backing it up, instead of CEPH_FEATURES_ALL.

Fixes: #5203
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
12 years agoosd: initialize new_state field when we use it
Sage Weil [Wed, 29 May 2013 23:50:04 +0000 (16:50 -0700)]
osd: initialize new_state field when we use it

If we use operator[] on a new int field its value is undefined; avoid
reading it or using |= et al until we initialize it.

Fixes: #4967
Backport: cuttlefish, bobtail
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: David Zafman <david.zafman@inktank.com>
12 years agomds: stay in SCAN state in file_eval
Sage Weil [Tue, 28 May 2013 17:51:11 +0000 (10:51 -0700)]
mds: stay in SCAN state in file_eval

If we are in the SCAN state, stay there until the recovery finishes.  Do
not jump to another state from file_eval().

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 0071b8e75bd3f5a09cc46e2225a018f6d1ef0680)

12 years agoosd: do not assume head obc object exists when getting snapdir
Sage Weil [Wed, 29 May 2013 16:49:11 +0000 (09:49 -0700)]
osd: do not assume head obc object exists when getting snapdir

For a list-snaps operation on the snapdir, do not assume that the obc for the
head means the object exists.  This fixes a race between a head deletion and
a list-snaps that wrongly returns ENOENT, triggered by the DiffIterateStress
test when thrashing OSDs.

Fixes: #5183
Backport: cuttlefish
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
12 years agomon: disable tdump by default
Sage Weil [Wed, 29 May 2013 05:13:11 +0000 (22:13 -0700)]
mon: disable tdump by default

Grr.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoMerge remote-tracking branch 'gh/last'
Sage Weil [Wed, 29 May 2013 05:10:21 +0000 (22:10 -0700)]
Merge remote-tracking branch 'gh/last'

12 years agoMerge branch 'wip-5172'
Sage Weil [Wed, 29 May 2013 03:44:48 +0000 (20:44 -0700)]
Merge branch 'wip-5172'

Reviewed-by: Samuel Just <sam.just@inktank.com>
12 years agoosd: fix note_down_osd
Sage Weil [Wed, 29 May 2013 03:38:43 +0000 (20:38 -0700)]
osd: fix note_down_osd

Fix bug introduced in 27381c0c6259ac89f5f9c592b4bfb585937a1cfc.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoosd: fix hb con failure handler
Sage Weil [Wed, 29 May 2013 03:39:30 +0000 (20:39 -0700)]
osd: fix hb con failure handler

Fix a few bugs introduced by 27381c0c6259ac89f5f9c592b4bfb585937a1cfc:

- check against both front and back cons; either one may have failed.
- close *both* front and back before reopening either.  this is
  overkill, but slightly simpler code.
- fix leak of con when marking down
- handle race against osdmap update and note_down_osd

Fixes: #5172
Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoMerge pull request #319 from dalgaaf/wip-da-pylint-3
Sage Weil [Wed, 29 May 2013 02:52:41 +0000 (19:52 -0700)]
Merge pull request #319 from dalgaaf/wip-da-pylint-3

Fix some smaller Python issues

12 years agoMerge pull request #326 from dalgaaf/wip-da-CID-727978
Sage Weil [Tue, 28 May 2013 22:48:11 +0000 (15:48 -0700)]
Merge pull request #326 from dalgaaf/wip-da-CID-727978

kv_flat_btree_async.cc: fix AioCompletion resource leak

12 years agov0.63 v0.63
Gary Lowell [Tue, 28 May 2013 20:58:22 +0000 (13:58 -0700)]
v0.63

12 years agoHashIndex: sync top directory during start_split,merge,col_split
Samuel Just [Tue, 28 May 2013 18:10:05 +0000 (11:10 -0700)]
HashIndex: sync top directory during start_split,merge,col_split

Otherwise, the links might be ordered after the in progress
operation tag write.  We need the in progress operation tag to
correctly recover from an interrupted merge, split, or col_split.

Fixes: #5180
Backport: cuttlefish, bobtail
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
12 years agoMerge pull request #325 from dalgaaf/wip-da-CID-727980
Sage Weil [Tue, 28 May 2013 17:27:56 +0000 (10:27 -0700)]
Merge pull request #325 from dalgaaf/wip-da-CID-727980

kv_flat_btree_async.cc: fix AioCompletion resource leak

12 years agoMerge pull request #324 from dalgaaf/wip-da-CID-727979
Sage Weil [Tue, 28 May 2013 17:27:25 +0000 (10:27 -0700)]
Merge pull request #324 from dalgaaf/wip-da-CID-727979

kv_flat_btree_async.cc: fix AioCompletion resource leak

12 years agoosd/OSDMap: fix Incremental dump
Sage Weil [Tue, 28 May 2013 16:16:17 +0000 (09:16 -0700)]
osd/OSDMap: fix Incremental dump

The front hb addr entry may not be present.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoMerge pull request #322 from guilhem/patch-1
Sage Weil [Tue, 28 May 2013 15:43:10 +0000 (08:43 -0700)]
Merge pull request #322 from guilhem/patch-1

Reviewed-by: Sage Weil <sage@inktank.com>
12 years agokv_flat_btree_async.cc: fix AioCompletion resource leak 326/head
Danny Al-Gaaf [Tue, 28 May 2013 10:43:12 +0000 (12:43 +0200)]
kv_flat_btree_async.cc: fix AioCompletion resource leak

Call AioCompletion::release() if the completion is no longer needed.

CID 727978 (#1-2 of 2): Resource leak (RESOURCE_LEAK)
  leaked_storage: Variable "obj_aioc" going out of scope leaks the
  storage it points to.

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
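
The leak pattern described in this commit can be sketched with a stand-in, manually released completion. `FakeCompletion`, `live_completions` and `do_async_op` are hypothetical names for illustration, not the librados `AioCompletion` API; the only point is that every exit path calls `release()` once the completion is no longer needed.

```cpp
#include <cassert>

// Hypothetical counter so the sketch can show that nothing is leaked.
static int live_completions = 0;

// Hypothetical stand-in for a heap-allocated completion object.
struct FakeCompletion {
  FakeCompletion() { ++live_completions; }
  ~FakeCompletion() { --live_completions; }
  // "Release when done" convention: the object frees itself.
  void release() { delete this; }
};

// Returns an error code and releases the completion on every path.
int do_async_op(bool fail_early) {
  FakeCompletion *aioc = new FakeCompletion;
  if (fail_early) {
    aioc->release();  // without this, the early return would leak aioc
    return -1;
  }
  // ... the operation would be submitted and waited on here ...
  aioc->release();
  return 0;
}
```

After either call, `live_completions` is back to zero, which is the property the static analyzer report asks for.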
12 years agokv_flat_btree_async.cc: fix AioCompletion resource leak 324/head
Danny Al-Gaaf [Tue, 28 May 2013 10:38:57 +0000 (12:38 +0200)]
kv_flat_btree_async.cc: fix AioCompletion resource leak

Call AioCompletion::release() if the completion is no longer needed.

CID 727979 (#1-2 of 2): Resource leak (RESOURCE_LEAK)
  leaked_storage: Variable "a" going out of scope leaks the storage
  it points to.

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
12 years agokv_flat_btree_async.cc: fix AioCompletion resource leak 325/head
Danny Al-Gaaf [Tue, 28 May 2013 10:27:37 +0000 (12:27 +0200)]
kv_flat_btree_async.cc: fix AioCompletion resource leak

Call AioCompletion::release() if the completion is no longer
needed.

CID 727980 (#1-4 of 4): Resource leak (RESOURCE_LEAK)
  leaked_storage: Variable "aioc" going out of scope leaks
  the storage it points to.

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
12 years agoRemove mon socket in post-stop 322/head
Guilhem Lettron [Mon, 27 May 2013 10:41:53 +0000 (12:41 +0200)]
Remove mon socket in post-stop

If ceph-mon segfaults, the socket file isn't removed.

Adding a removal step in post-stop lets upstart clean the run directory properly.

Signed-off-by: Guilhem Lettron <guilhem@lettron.fr>
12 years agoPendingReleaseNotes: notes about enabling HASHPSPOOL
Sage Weil [Tue, 28 May 2013 04:16:46 +0000 (21:16 -0700)]
PendingReleaseNotes: notes about enabling HASHPSPOOL

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoosdmaptool: fix cli tests
Sage Weil [Tue, 28 May 2013 04:12:29 +0000 (21:12 -0700)]
osdmaptool: fix cli tests

Now that the default pool flags have changed.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoMerge pull request #321 from dalgaaf/wip-da-CID-727981
Sage Weil [Mon, 27 May 2013 20:55:54 +0000 (13:55 -0700)]
Merge pull request #321 from dalgaaf/wip-da-CID-727981

kv_flat_btree_async.cc: fix AioCompletion resource leak

12 years agoMerge pull request #320 from dalgaaf/wip-da-CID-727983
Sage Weil [Mon, 27 May 2013 20:55:24 +0000 (13:55 -0700)]
Merge pull request #320 from dalgaaf/wip-da-CID-727983

kv_flat_btree_async.cc: fix resource leak

12 years agodoc: Updated rgw.conf example.
John Wilkins [Sat, 25 May 2013 22:13:01 +0000 (15:13 -0700)]
doc: Updated rgw.conf example.

fixes: #4608

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
12 years agodoc: Updated RGW Quickstart.
John Wilkins [Sat, 25 May 2013 22:11:49 +0000 (15:11 -0700)]
doc: Updated RGW Quickstart.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
12 years agodoc: Updated index for newer terms.
John Wilkins [Sat, 25 May 2013 22:11:06 +0000 (15:11 -0700)]
doc: Updated index for newer terms.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
12 years agopg_pool_t: enable FLAG_HASHPSPOOL by default
Samuel Just [Fri, 24 May 2013 23:20:38 +0000 (16:20 -0700)]
pg_pool_t: enable FLAG_HASHPSPOOL by default

Fixes: #5160
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
12 years agokv_flat_btree_async.cc: fix AioCompletion resource leak 321/head
Danny Al-Gaaf [Fri, 24 May 2013 12:47:49 +0000 (14:47 +0200)]
kv_flat_btree_async.cc: fix AioCompletion resource leak

Call AioCompletion::release() if the completion is no longer
needed to free the resources.

CID 727981 (#3 of 3): Resource leak (RESOURCE_LEAK)
  leaked_storage: Variable "top_aioc" going out of scope leaks the
  storage it points to.

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
12 years agokv_flat_btree_async.cc: fix resource leak 320/head
Danny Al-Gaaf [Fri, 24 May 2013 12:29:14 +0000 (14:29 +0200)]
kv_flat_btree_async.cc: fix resource leak

Call AioCompletion::release() if the completion is no longer
needed to free the resources.

CID 727983 : Resource leak (RESOURCE_LEAK)
  leaked_storage: Variable "aioc" going out of scope leaks the
  storage it points to.

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
12 years agoceph-disk: remove unnecessary semicolons 319/head
Danny Al-Gaaf [Fri, 24 May 2013 10:46:15 +0000 (12:46 +0200)]
ceph-disk: remove unnecessary semicolons

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
12 years agoceph-disk: cast output of _check_output()
Danny Al-Gaaf [Fri, 24 May 2013 10:41:11 +0000 (12:41 +0200)]
ceph-disk: cast output of _check_output()

Cast output of _check_output() to str() to be able to use
str.split().

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
12 years agoceph-disk: fix undefined variable
Danny Al-Gaaf [Fri, 24 May 2013 10:33:16 +0000 (12:33 +0200)]
ceph-disk: fix undefined variable

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
12 years agoceph-disk: add missing spaces around operator
Danny Al-Gaaf [Fri, 24 May 2013 10:29:07 +0000 (12:29 +0200)]
ceph-disk: add missing spaces around operator

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
12 years agoMerge branch 'wip_scrub_tphandle' into next
Samuel Just [Fri, 24 May 2013 03:08:11 +0000 (20:08 -0700)]
Merge branch 'wip_scrub_tphandle' into next

Fixes: #5159
Reviewed-by: Sage Weil <sage@inktank.com>
12 years agoPG: ping tphandle during omap loop as well
Samuel Just [Fri, 24 May 2013 00:40:44 +0000 (17:40 -0700)]
PG: ping tphandle during omap loop as well

Signed-off-by: Samuel Just <sam.just@inktank.com>
12 years agoPG: reset timeout in _scan_list for each object, read chunk
Samuel Just [Thu, 23 May 2013 22:24:39 +0000 (15:24 -0700)]
PG: reset timeout in _scan_list for each object, read chunk

Signed-off-by: Samuel Just <sam.just@inktank.com>
12 years agoOSD,PG: pass tphandle down to _scan_list
Samuel Just [Thu, 23 May 2013 22:23:05 +0000 (15:23 -0700)]
OSD,PG: pass tphandle down to _scan_list

Signed-off-by: Samuel Just <sam.just@inktank.com>
12 years agodoc: Updated Ceph FS Quick Start.
John Wilkins [Fri, 24 May 2013 00:02:17 +0000 (17:02 -0700)]
doc: Updated Ceph FS Quick Start.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
12 years agodoc: Added troubleshooting to Ceph FS index.
John Wilkins [Fri, 24 May 2013 00:01:51 +0000 (17:01 -0700)]
doc: Added troubleshooting to Ceph FS index.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
12 years agodoc: Added separate troubleshooting for MDS and Ceph FS.
John Wilkins [Fri, 24 May 2013 00:01:29 +0000 (17:01 -0700)]
doc: Added separate troubleshooting for MDS and Ceph FS.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
12 years agorgw: iterate usage entries from correct entry
Yehuda Sadeh [Thu, 23 May 2013 04:34:52 +0000 (21:34 -0700)]
rgw: iterate usage entries from correct entry

Fixes: #5152
When iterating through usage entries, and when user id was
provided, we started at the user's first entry and not from
the entry indexed by the request start time.
This commit fixes the issue.

Backport: bobtail

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
12 years agodoc: Updates for ceph-deploy and cuttlefish.
John Wilkins [Thu, 23 May 2013 18:45:14 +0000 (11:45 -0700)]
doc: Updates for ceph-deploy and cuttlefish.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
12 years agomon: drop unnecessary conditionals
Sage Weil [Thu, 23 May 2013 17:23:43 +0000 (10:23 -0700)]
mon: drop unnecessary conditionals

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoMerge pull request #311 from ceph/wip-5102
Sage Weil [Thu, 23 May 2013 17:21:51 +0000 (10:21 -0700)]
Merge pull request #311 from ceph/wip-5102

Reviewed-by: Sage Weil <sage@inktank.com>
12 years agoMerge pull request #312 from ceph/wip-osd-hb
Sage Weil [Thu, 23 May 2013 17:17:14 +0000 (10:17 -0700)]
Merge pull request #312 from ceph/wip-osd-hb

Reviewed-by: Samuel Just <sam.just@inktank.com>
12 years agoMerge branch 'next'
Sage Weil [Thu, 23 May 2013 15:49:10 +0000 (08:49 -0700)]
Merge branch 'next'

12 years agomodified: src/init-ceph.in
Xiaoxi Chen [Thu, 23 May 2013 01:33:27 +0000 (09:33 +0800)]
modified:   src/init-ceph.in
Fixed a bug in the init script: "df" should be run on the remote host via
do_cmd, and $host should be used instead of "hostname -s".

Signed-off-by: Xiaoxi Chen <xiaoxi.chen@intel.com>
(cherry picked from commit 1dd99f0fc91ee6d417325689f24601aa335b94c2)

Conflicts:

src/init-ceph.in

12 years agomsgr: increase port range to 6900-7300 (from -7100)
Sage Weil [Thu, 23 May 2013 15:40:23 +0000 (08:40 -0700)]
msgr: increase port range to 6900-7300 (from -7100)

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoMerge pull request #307 from xiaoxichen/master
Sage Weil [Thu, 23 May 2013 15:45:55 +0000 (08:45 -0700)]
Merge pull request #307 from xiaoxichen/master

Reviewed-by: Sage Weil <sage@inktank.com>
12 years agomodified: src/init-ceph.in 307/head
Xiaoxi Chen [Thu, 23 May 2013 01:33:27 +0000 (09:33 +0800)]
modified:   src/init-ceph.in
Fixed a bug in the init script: "df" should be run on the remote host via
do_cmd, and $host should be used instead of "hostname -s".

Signed-off-by: Xiaoxi Chen <xiaoxi.chen@intel.com>
12 years agoosd: ping both front and back interfaces 312/head
Sage Weil [Wed, 22 May 2013 15:44:52 +0000 (08:44 -0700)]
osd: ping both front and back interfaces

Send ping requests to both the front and back hb addrs for peer osds.  If
the front hb addr is not present, do not send it and interpret a reply
as coming from both.  This handles the transition from old to new OSDs
seamlessly.

Note both the front and back rx times.  Both need to be up to date in order
for the peer to be healthy.

Signed-off-by: Sage Weil <sage@inktank.com>
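
The health rule in this message can be sketched as follows. `PeerHeartbeat` and its members are hypothetical names, and times are plain seconds; this is an illustration of the stated rule under those assumptions, not the OSD's actual data structure.

```cpp
#include <cassert>

// Hypothetical per-peer heartbeat record.
struct PeerHeartbeat {
  bool has_front = true;     // false for an old peer with no front hb addr
  double last_rx_front = 0;  // time of the last reply seen on the front addr
  double last_rx_back = 0;   // time of the last reply seen on the back addr

  // Record a ping reply. With no front address, one reply counts for both.
  void note_reply(bool from_front, double now) {
    if (!has_front) {
      last_rx_front = last_rx_back = now;
    } else if (from_front) {
      last_rx_front = now;
    } else {
      last_rx_back = now;
    }
  }

  // Healthy only if both interfaces have been heard from since the cutoff.
  bool is_healthy(double cutoff) const {
    return last_rx_front >= cutoff && last_rx_back >= cutoff;
  }
};
```

A peer that has only answered on the back interface is not yet healthy, while an old peer without a front address is healthy after a single reply.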
12 years agomsgr: add Messenger reference to Connection
Sage Weil [Wed, 22 May 2013 15:13:21 +0000 (08:13 -0700)]
msgr: add Messenger reference to Connection

This allows us to get the messenger associated with a connection.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agomsgr: take an arbitrary set of ports to avoid binding to
Sage Weil [Wed, 22 May 2013 00:20:45 +0000 (17:20 -0700)]
msgr: take an arbitrary set of ports to avoid binding to

We used to only need to avoid 2 ports; now we need 3.  Make it a set so we
don't have this problem later.

Signed-off-by: Sage Weil <sage@inktank.com>
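
A minimal sketch of the idea, assuming a hypothetical helper `pick_port` and illustrative port numbers: the caller passes any number of ports to skip as a set, so the signature does not change when a third or fourth port has to be avoided. The real messenger also has to handle bind failures, which this sketch leaves out.

```cpp
#include <cassert>
#include <set>

// Returns the first port in [first, last] that is not in avoid_ports,
// or -1 if every port in the range is excluded.
int pick_port(int first, int last, const std::set<int> &avoid_ports) {
  for (int port = first; port <= last; ++port) {
    if (avoid_ports.count(port) == 0)
      return port;
  }
  return -1;
}
```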
12 years agoosd: bind front heartbeat messenger to public_addr
Sage Weil [Wed, 22 May 2013 00:10:01 +0000 (17:10 -0700)]
osd: bind front heartbeat messenger to public_addr

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoosd: send hb front addr to monitor at boot
Sage Weil [Tue, 21 May 2013 23:44:00 +0000 (16:44 -0700)]
osd: send hb front addr to monitor at boot

We still aren't binding it to anything yet, or putting it in the OSDMap.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoosd: create front and back hb messenger instances
Sage Weil [Tue, 21 May 2013 23:43:24 +0000 (16:43 -0700)]
osd: create front and back hb messenger instances

The hb_front messenger is not used yet.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoosd/OSDMap: encode front heartbeat addr
Sage Weil [Tue, 21 May 2013 23:48:43 +0000 (16:48 -0700)]
osd/OSDMap: encode front heartbeat addr

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoosd/OSDMap: hb_addr -> hb_back_addr
Sage Weil [Tue, 21 May 2013 23:39:00 +0000 (16:39 -0700)]
osd/OSDMap: hb_addr -> hb_back_addr

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoosd/OSDMap: new_hb_up -> new_hb_back_up
Sage Weil [Tue, 21 May 2013 22:41:46 +0000 (15:41 -0700)]
osd/OSDMap: new_hb_up -> new_hb_back_up

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoosd/OSDMap: new_up_internal -> new_up_cluster
Sage Weil [Wed, 22 May 2013 23:03:36 +0000 (16:03 -0700)]
osd/OSDMap: new_up_internal -> new_up_cluster

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoosd: Add asserts for seg faults caused by corrupt OSDs
David Zafman [Wed, 22 May 2013 05:10:41 +0000 (22:10 -0700)]
osd: Add asserts for seg faults caused by corrupt OSDs

fixes: #5139

Signed-off-by: David Zafman <david.zafman@inktank.com>
12 years agoosd: skip mark-me-down message if osd is not up
Sage Weil [Wed, 22 May 2013 22:03:50 +0000 (15:03 -0700)]
osd: skip mark-me-down message if osd is not up

Fixes crash when the OSD has not successfully booted and gets a
SIGINT or SIGTERM.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoosd, mds: shut down async signal handler on exit
Sage Weil [Wed, 22 May 2013 21:56:24 +0000 (14:56 -0700)]
osd, mds: shut down async signal handler on exit

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agorbd image_read.sh: ensure rbd is loaded
Alex Elder [Wed, 22 May 2013 21:50:19 +0000 (16:50 -0500)]
rbd image_read.sh: ensure rbd is loaded

Make sure rbd is loaded before proceeding with the script.

Signed-off-by: Alex Elder <elder@inktank.com>
12 years agomessages/MOSDMarkMeDown: fix uninit field
Sage Weil [Wed, 22 May 2013 21:29:37 +0000 (14:29 -0700)]
messages/MOSDMarkMeDown: fix uninit field

Fixes valgrind warning:
==14803== Use of uninitialised value of size 8
==14803==    at 0x12E7614: sctp_crc32c_sb8_64_bit (sctp_crc32.c:567)
==14803==    by 0x12E76F8: update_crc32 (sctp_crc32.c:609)
==14803==    by 0x12E7720: ceph_crc32c_le (sctp_crc32.c:733)
==14803==    by 0x105085F: ceph::buffer::list::crc32c(unsigned int) (buffer.h:427)
==14803==    by 0x115D7B2: Message::calc_front_crc() (Message.h:441)
==14803==    by 0x1159BB0: Message::encode(unsigned long, bool) (Message.cc:170)
==14803==    by 0x1323934: Pipe::writer() (Pipe.cc:1524)
==14803==    by 0x13293D9: Pipe::Writer::entry() (Pipe.h:59)
==14803==    by 0x120A398: Thread::_entry_func(void*) (Thread.cc:41)
==14803==    by 0x503BE99: start_thread (pthread_create.c:308)
==14803==    by 0x6C6E4BC: clone (clone.S:112)

Backport: cuttlefish
Signed-off-by: Sage Weil <sage@inktank.com>
12 years agomds: weaken reconnect assertion
Sage Weil [Wed, 22 May 2013 21:11:40 +0000 (14:11 -0700)]
mds: weaken reconnect assertion

See #5031.  This appears to be populated with another mds's mdsdir; just
not asserting avoids the problem for the time being.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoMerge pull request #316 from ceph/wip-sysvinit
Sage Weil [Wed, 22 May 2013 20:25:42 +0000 (13:25 -0700)]
Merge pull request #316 from ceph/wip-sysvinit

Reviewed-by: Dan Mick <dan.mick@inktank.com>
12 years agosysvinit: fix osd weight calculation on remote hosts 316/head
Sage Weil [Wed, 22 May 2013 16:47:29 +0000 (09:47 -0700)]
sysvinit: fix osd weight calculation on remote hosts

We need to do df on the remote host, not locally.

Similarly, the ceph command uses the osd key, which exists remotely; run it there.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agosysvinit: use known hostname $host instead of (incorrectly) recalculating
Sage Weil [Wed, 22 May 2013 16:47:03 +0000 (09:47 -0700)]
sysvinit: use known hostname $host instead of (incorrectly) recalculating

We would need to do hostname -s on the remote node, not the local one.
But we already have $host; use it!

Reported-by: Xiaoxi Chen <xiaoxi.chen@intel.com>
Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoMerge pull request #314 from ceph/wip-4228
Sage Weil [Wed, 22 May 2013 17:33:35 +0000 (10:33 -0700)]
Merge pull request #314 from ceph/wip-4228

Reviewed-by: Sage Weil <sage@inktank.com>
12 years agoOSDMonitor: skip new pools in update_pools_status() and get_pools_health()
Samuel Just [Tue, 21 May 2013 22:22:56 +0000 (15:22 -0700)]
OSDMonitor: skip new pools in update_pools_status() and get_pools_health()

New pools won't be full.  mon->pgmon()->pg_map.pg_pool_sum[poolid] will
implicitly create an entry for poolid, causing register_new_pgs() to assume
that the newly created pgs in the new pool are in fact the result of a split,
which prevents MOSDPGCreate messages from being sent out.

Fixes: #4813
Backport: cuttlefish
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
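
The implicit entry creation mentioned in this message is standard `std::map` behaviour: `operator[]` inserts a default-constructed value for a missing key. The sketch below shows a lookup that avoids that side effect; `PoolStats` and `lookup_pool` are hypothetical names. The commit itself takes a different route (it skips new pools), so this only illustrates the underlying pitfall.

```cpp
#include <cassert>
#include <map>

// Hypothetical per-pool summary record.
struct PoolStats { long num_objects = 0; };

// Read-only lookup that never inserts: returns nullptr for unknown pools.
const PoolStats *lookup_pool(const std::map<int, PoolStats> &pools, int poolid) {
  auto it = pools.find(poolid);
  return it == pools.end() ? nullptr : &it->second;
}
```

Calling `lookup_pool` on an empty map leaves it empty, whereas evaluating `pools[7]` on the same map adds an entry for pool 7.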
12 years agoceph-syn: specify which types of addresses to pick 314/head
Joao Eduardo Luis [Wed, 22 May 2013 16:52:27 +0000 (17:52 +0100)]
ceph-syn: specify which types of addresses to pick

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
12 years agoceph-mds: specify which types of addresses to pick
Joao Eduardo Luis [Wed, 22 May 2013 16:52:15 +0000 (17:52 +0100)]
ceph-mds: specify which types of addresses to pick

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
12 years agoMerge pull request #315 from ceph/wip-4507
Sage Weil [Wed, 22 May 2013 17:15:51 +0000 (10:15 -0700)]
Merge pull request #315 from ceph/wip-4507

Reviewed-by: Sage Weil <sage@inktank.com>
12 years agomon: PaxosService: drop atomic_t on 'proposing' 315/head
Joao Eduardo Luis [Mon, 6 May 2013 16:10:15 +0000 (17:10 +0100)]
mon: PaxosService: drop atomic_t on 'proposing'

We don't need this to be atomic -- a simple boolean is enough.

Fixes: #4507
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
12 years agoceph-osd: specify which types of addresses to pick
Joao Eduardo Luis [Wed, 22 May 2013 16:52:03 +0000 (17:52 +0100)]
ceph-osd: specify which types of addresses to pick

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
12 years agoceph-mon: only care about public addr during pick_addresses()
Joao Eduardo Luis [Mon, 6 May 2013 15:51:30 +0000 (16:51 +0100)]
ceph-mon: only care about public addr during pick_addresses()

Fixes: #4228
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
12 years agocommon: add mask argument to pick_addresses() to specify what we need
Joao Eduardo Luis [Mon, 6 May 2013 15:33:53 +0000 (16:33 +0100)]
common: add mask argument to pick_addresses() to specify what we need

Fixes: #4228
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
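
A mask argument of this kind might look like the sketch below. The flag names and values, the `Addrs` struct, `pick_addresses_sketch` and the placeholder addresses are all assumptions made for illustration and are not taken from the Ceph source.

```cpp
#include <cassert>
#include <string>

// Hypothetical flag names and values.
enum PickFlags : unsigned {
  PICK_PUBLIC  = 0x01,
  PICK_CLUSTER = 0x02,
};

struct Addrs {
  std::string public_addr;
  std::string cluster_addr;
};

// Fills in only the address types requested by `needs`.
void pick_addresses_sketch(unsigned needs, Addrs &out) {
  if (needs & PICK_PUBLIC)
    out.public_addr = "192.0.2.10";      // placeholder documentation address
  if (needs & PICK_CLUSTER)
    out.cluster_addr = "198.51.100.10";  // placeholder documentation address
}
```

A daemon that only cares about its public address would pass `PICK_PUBLIC`, while one that needs both would pass `PICK_PUBLIC | PICK_CLUSTER`.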
12 years agoceph: remove cli test
Sage Weil [Wed, 22 May 2013 16:39:11 +0000 (09:39 -0700)]
ceph: remove cli test

This is about to be removed by wip-ceph-cli anyway.  And it broke in
commit 132d5bf7f9af7de9e2028e20c95ba91637da5875.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agomon: Paxos: get rid of the 'prepare_bootstrap()' mechanism 311/head
Joao Eduardo Luis [Wed, 22 May 2013 12:59:08 +0000 (13:59 +0100)]
mon: Paxos: get rid of the 'prepare_bootstrap()' mechanism

We don't need it after all.  If we are in the middle of some proposal,
then we guarantee that said proposal is likely to be retried.  If we
haven't yet proposed, then it's forever more likely that a client will
eventually retry the message that triggered this proposal.

Basically, this mechanism attempted to fix a non-problem, and was in
fact triggering some unforeseen issues that would have required increasing
the code complexity for no good reason.

Fixes: #5102
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
12 years agomon: Paxos: finish queued proposals instead of clearing the list
Joao Eduardo Luis [Wed, 22 May 2013 12:51:13 +0000 (13:51 +0100)]
mon: Paxos: finish queued proposals instead of clearing the list

By finishing these Contexts, we make sure the Contexts they enclose (to be
called once the proposal goes through) will behave as they were initially
planned:  for instance, a C_Command() may retry the command if a -EAGAIN
is passed to 'finish_contexts', while a C_Trimmed() will simply set
'going_to_trim' to false.

This aims to fix at least one bug in which Paxos will stop trimming if an
election is triggered while a trim is queued but not yet finished.  This
happens because it is the C_Trimmed() context that is responsible for
resetting 'going_to_trim' back to false.  By clearing all the contexts on
the proposal list instead of finishing them, we stay forever unable to
trim Paxos again as 'going_to_trim' will stay True till the end of time as
we know it.

Fixes: #4895
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
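
The difference between clearing and finishing the queued contexts can be sketched as follows. `Context`, `C_TrimmedSketch` and `finish_contexts_sketch` are simplified, hypothetical stand-ins modelled on the description in this message, not the monitor's actual classes.

```cpp
#include <cassert>
#include <cerrno>
#include <list>

// Hypothetical minimal callback type.
struct Context {
  virtual ~Context() {}
  virtual void finish(int r) = 0;
};

// Resets the flag when the queued trim proposal completes or is aborted.
struct C_TrimmedSketch : Context {
  bool *going_to_trim;
  explicit C_TrimmedSketch(bool *flag) : going_to_trim(flag) {}
  void finish(int) override { *going_to_trim = false; }
};

// Calls finish(r) on every queued context, frees it, and empties the list.
void finish_contexts_sketch(std::list<Context *> &queued, int r) {
  std::list<Context *> pending;
  pending.swap(queued);
  for (Context *c : pending) {
    c->finish(r);
    delete c;
  }
}
```

Finishing the list with `-EAGAIN` runs the callback, so the flag returns to false; simply clearing the list would leave it set.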
12 years agoMerge pull request #297 from dalgaaf/wip-da-CID-727982
Sage Weil [Wed, 22 May 2013 15:54:56 +0000 (08:54 -0700)]
Merge pull request #297 from dalgaaf/wip-da-CID-727982

Reviewed-by: Sage Weil <sage@inktank.com>
12 years agoMerge pull request #310 from dalgaaf/wip-da-CID-fixes-4
Sage Weil [Wed, 22 May 2013 15:37:09 +0000 (08:37 -0700)]
Merge pull request #310 from dalgaaf/wip-da-CID-fixes-4

Reviewed-by: Sage Weil <sage@inktank.com>
12 years agomds/Migrator.cc: fix possible dereference NULL return value 310/head
Danny Al-Gaaf [Wed, 22 May 2013 15:28:06 +0000 (17:28 +0200)]
mds/Migrator.cc: fix possible dereference NULL return value

CID 716997 (#1 of 1): Dereference null return value (NULL_RETURNS)
  dereference: Dereferencing a pointer that might be null "in" when
  calling "MDSCacheObject::is_auth() const".

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
12 years agomds/Migrator.cc: fix possible dereference NULL return value
Danny Al-Gaaf [Wed, 22 May 2013 15:25:16 +0000 (17:25 +0200)]
mds/Migrator.cc: fix possible dereference NULL return value

CID 716998 (#1 of 2): Dereference null return value (NULL_RETURNS)
  dereference: Dereferencing a pointer that might be null "in" when
  calling "operator <<(std::ostream &, CInode &)".

CID 716998 (#2 of 2): Dereference null return value (NULL_RETURNS)
  dereference: Dereferencing a pointer that might be null "in" when
  calling "MDCache::add_replica_dir(ceph::buffer::list::iterator &,
  CInode *, int, std::list<Context *, std::allocator<Context *> > &)".

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
12 years agomds/Migrator.cc: delete some empty lines at EOF
Danny Al-Gaaf [Wed, 22 May 2013 15:23:40 +0000 (17:23 +0200)]
mds/Migrator.cc: delete some empty lines at EOF

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
12 years agomds/Migrator.cc: fix possible dereference NULL return value
Danny Al-Gaaf [Wed, 22 May 2013 15:21:59 +0000 (17:21 +0200)]
mds/Migrator.cc: fix possible dereference NULL return value

CID 716999 (#1 of 1): Dereference null return value (NULL_RETURNS)
  dereference: Dereferencing a pointer that might be null "in" when
  calling "CInode::put_stickydirs()".

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
12 years agoMerge pull request #309 from dalgaaf/wip-da-CID-fixes-3-v2
Sage Weil [Wed, 22 May 2013 15:20:49 +0000 (08:20 -0700)]
Merge pull request #309 from dalgaaf/wip-da-CID-fixes-3-v2

Reviewed-by: Sage Weil <sage@inktank.com>
12 years agomds/Migrator.cc: fix dereference NULL return value
Danny Al-Gaaf [Wed, 22 May 2013 15:17:01 +0000 (17:17 +0200)]
mds/Migrator.cc: fix dereference NULL return value

CID 717000 (#1 of 2): Dereference null return value (NULL_RETURNS)
  dereference: Dereferencing a pointer that might be null "dir" when
  calling "operator <<(std::ostream &, CDir &)".

CID 717000 (#2 of 2): Dereference null return value (NULL_RETURNS)
  dereference: Dereferencing a pointer that might be null "dir" when
  calling "Migrator::import_reverse_unfreeze(CDir *)".

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
12 years agomds/Migrator.cc: fix possible NULL pointer dereference
Danny Al-Gaaf [Wed, 22 May 2013 15:06:40 +0000 (17:06 +0200)]
mds/Migrator.cc: fix possible NULL pointer dereference

Move dout() calls behind the related asserts to prevent possible NULL
pointer dereference.

CID 717001 (#1 of 1): Dereference null return value (NULL_RETURNS)
  dereference: Dereferencing a pointer that might be null "diri" when calling
  "operator <<(std::ostream &, CInode

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
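
The ordering fix described here, checking the pointer before any logging that dereferences it, can be sketched with hypothetical names (`Inode`, `describe`); a string stream stands in for the dout() call.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical stand-in for an inode object.
struct Inode { int ino; };

// Asserts first, then formats: the dereference can no longer precede the check.
std::string describe(const Inode *diri) {
  assert(diri);                     // check before any use of diri
  std::ostringstream out;
  out << "inode " << diri->ino;     // safe: diri is known to be non-null
  return out.str();
}
```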
12 years agomds/Server.cc: fix possible NULL pointer dereference 309/head
Danny Al-Gaaf [Fri, 17 May 2013 12:38:24 +0000 (14:38 +0200)]
mds/Server.cc: fix possible NULL pointer dereference

Add asserts to solve these CID issues:

CID 717002 (#1 of 2): Dereference null return value (NULL_RETURNS)
  dereference: Dereferencing a pointer that might be null "dir"
  when calling "CDir::lookup(std::string const &, snapid_t)".
CID 717002 (#2 of 2): Dereference null return value (NULL_RETURNS)
  dereference: Dereferencing a pointer that might be null "dir"
  when calling "CDir::lookup(std::string const &, snapid_t)".

CID 717003 (#1 of 2): Dereference null return value (NULL_RETURNS)
  dereference: Dereferencing a pointer that might be null "dn" when
  calling "operator <<(std::ostream &, CDentry &)"
CID 717003 (#2 of 2): Dereference null return value (NULL_RETURNS)
  dereference: Dereferencing a pointer that might be null "straydn"
  when calling "CDentry::push_projected_linkage()".

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
12 years agosrc/rbd.cc: silence CID COPY_PASTE_ERROR warning
Danny Al-Gaaf [Fri, 17 May 2013 12:15:23 +0000 (14:15 +0200)]
src/rbd.cc: silence CID COPY_PASTE_ERROR warning

CID 1021212 (#1 of 1): Copy-paste error (COPY_PASTE_ERROR)
  copy_paste_error: "r" in "r = -*__errno_location()" looks like
  a copy-paste error. Should it say "fd" instead?

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
12 years agomds/MDS.cc: fix dereference null return value
Danny Al-Gaaf [Wed, 22 May 2013 13:42:52 +0000 (15:42 +0200)]
mds/MDS.cc: fix dereference null return value

Fix for:

returned_null: Function "SessionMap::get_session(entity_name_t)" returns
  null (checked 12 out of 14 times)

CID 739601 (#1 of 1): Dereference null return value (NULL_RETURNS)
  dereference: Dereferencing a pointer that might be null
  "this->sessionmap.get_session(entity_name_t::CLIENT(client.v))" when
  calling "MDS::send_message_client_counted(Message *, Session *)"

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
12 years agomds/MDCache.cc: fix possible NULL pointer dereference
Danny Al-Gaaf [Wed, 15 May 2013 16:14:06 +0000 (18:14 +0200)]
mds/MDCache.cc: fix possible NULL pointer dereference

Assert if 'cur' is NULL.

CID 966616 (#1 of 1): Dereference null return value (NULL_RETURNS)
  dereference: Dereferencing a pointer that might be null "cur" when
  calling "CInode::is_dir()".

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
12 years agomon: Paxos: finish_proposal() when we're finished recovering
Joao Eduardo Luis [Fri, 17 May 2013 17:23:36 +0000 (18:23 +0100)]
mon: Paxos: finish_proposal() when we're finished recovering

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
12 years agomon: implement --extract-monmap <filename>
Sage Weil [Tue, 21 May 2013 21:36:11 +0000 (14:36 -0700)]
mon: implement --extract-monmap <filename>

This will make for a simpler process for
  http://ceph.com/docs/master/rados/operations/add-or-rm-mons/#removing-monitors-from-an-unhealthy-cluster

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit c0268e27497a4d8228ef54da9d4ca12f3ac1f1bf)

12 years agodoc: update mon cluster rescue process for cuttlefish+
Sage Weil [Tue, 21 May 2013 21:45:29 +0000 (14:45 -0700)]
doc: update mon cluster rescue process for cuttlefish+

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoFix usage for "ceph osd lost"
David Zafman [Tue, 21 May 2013 21:43:41 +0000 (14:43 -0700)]
Fix usage for "ceph osd lost"

Will be superseded, but use this commit to backport

Signed-off-by: David Zafman <david.zafman@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>