git.apps.os.sepia.ceph.com Git - ceph.git/log
7 years ago qa/suites/powercycle/osd/whitelist_health: whitelist more 17306/head
Sage Weil [Mon, 28 Aug 2017 13:38:58 +0000 (09:38 -0400)]
qa/suites/powercycle/osd/whitelist_health: whitelist more

"2017-08-26 16:09:27.704418 mon.a mon.0 172.21.15.169:6789/0 876 : cluster [WRN] MDS health message (mds.0): Behind on trimming (66/30)" in cluster log

Signed-off-by: Sage Weil <sage@redhat.com>
7 years ago Merge pull request #17282 from dillaman/wip-21017-luminous
Jason Dillaman [Sun, 27 Aug 2017 12:44:33 +0000 (08:44 -0400)]
Merge pull request #17282 from dillaman/wip-21017-luminous

luminous: mgr/dashboard: fix duplicate images listed on iSCSI status page

Reviewed-by: John Spray <john.spray@redhat.com>
7 years ago mgr/dashboard: fix duplicate images listed on iSCSI status page 17282/head
Jason Dillaman [Thu, 17 Aug 2017 00:43:40 +0000 (20:43 -0400)]
mgr/dashboard: fix duplicate images listed on iSCSI status page

Fixes: http://tracker.ceph.com/issues/21017
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit e16e52ee1d30a86f1e9ddbbff6add8dec6deb7c4)

7 years ago Merge pull request #17274 from xiexingguo/wip-luminous-pr-17271
Sage Weil [Sun, 27 Aug 2017 02:02:41 +0000 (21:02 -0500)]
Merge pull request #17274 from xiexingguo/wip-luminous-pr-17271

luminous: mon/MonCommands: fix copy-and-paste error

7 years ago Merge pull request #17273 from xiexingguo/wip-luminous-pr-17268
Sage Weil [Sun, 27 Aug 2017 02:01:49 +0000 (21:01 -0500)]
Merge pull request #17273 from xiexingguo/wip-luminous-pr-17268

luminous: os/bluestore: compensate for bad freelistmanager size/blocks metadata

7 years ago os/bluestore: compensate for bad freelistmanager size/blocks metadata 17273/head
Sage Weil [Fri, 25 Aug 2017 22:08:25 +0000 (18:08 -0400)]
os/bluestore: compensate for bad freelistmanager size/blocks metadata

This repairs bluestores created before http://tracker.ceph.com/issues/21089
was fixed in f6f1ae3724d593d3709d982c973ec18a25a47b6e.

In both cases, the freelistmanager's size is off by one block (4k).  In
one case, it is just a matter of fixing the size and twiddling the trailing
bit.  In the second case, the size delta causes freelistmanager to need
a new row, which means the blocks count also changes, and we have lots
of bits to zero (all but one in the new row).

Both are silently corrected by fsck in this patch.

Fixes: http://tracker.ceph.com/issues/21089
Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit c029a9645b13d0c0cf412940010b90ac10638ec3)
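
The two cases described above can be illustrated with a small Python sketch. It is not BlueStore's FreelistManager code: the function name `freelist_layout` and the row width `blocks_per_key=128` are assumptions made for illustration; only the 4k block size comes from the message.

```python
def freelist_layout(size_bytes, block_size=4096, blocks_per_key=128):
    """Model a bitmap freelist: rows of blocks_per_key bits, one bit per block.

    Hypothetical model: bits past the end of the device are padding bits.
    """
    nblocks = -(-size_bytes // block_size)   # ceiling division
    rows = -(-nblocks // blocks_per_key)     # rows needed to cover nblocks
    total = rows * blocks_per_key            # blocks tracked, padding included
    return {"rows": rows, "blocks": total, "padding_bits": total - nblocks}

# Case 1: the extra block fits in the existing row, so one trailing bit changes.
before = freelist_layout(100 * 4096)
after = freelist_layout(101 * 4096)

# Case 2: the extra block needs a new row, so the blocks count changes too and
# all but one bit of the new row is padding.
full = freelist_layout(128 * 4096)
spilled = freelist_layout(129 * 4096)
```

In this model the first case changes a single bit, while the second adds a whole row in which every bit but one has to be adjusted.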

7 years ago mon/MonCommands: fix copy-and-paste error 17274/head
xie xingguo [Sat, 26 Aug 2017 02:09:11 +0000 (10:09 +0800)]
mon/MonCommands: fix copy-and-paste error

Class is definitely required by default for the "crush rule ls-by-class"
command.

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
(cherry picked from commit 49e293d5523840af4ae6ea589f6b015602a63c40)

7 years ago Merge pull request #17233 from tchaikov/wip-luminous-pr-17183
Sage Weil [Sat, 26 Aug 2017 18:13:46 +0000 (13:13 -0500)]
Merge pull request #17233 from tchaikov/wip-luminous-pr-17183

luminous: osd/PGBackend: delete reply if fails to complete delete request

7 years ago Merge pull request #17257 from tchaikov/wip-luminous-pr-17248
Sage Weil [Sat, 26 Aug 2017 18:13:25 +0000 (13:13 -0500)]
Merge pull request #17257 from tchaikov/wip-luminous-pr-17248

luminous: mon/OSDMonitor: check creating_pgs.last_scan_epoch instead when sending creates

7 years ago Merge pull request #17260 from liewegas/wip-pr-17029-luminous
Sage Weil [Sat, 26 Aug 2017 18:13:10 +0000 (13:13 -0500)]
Merge pull request #17260 from liewegas/wip-pr-17029-luminous

luminous: mon: "ceph osd crush rule rename" support

7 years ago Merge pull request #17232 from tchaikov/wip-luminous-pr-17179
Sage Weil [Fri, 25 Aug 2017 15:25:22 +0000 (10:25 -0500)]
Merge pull request #17232 from tchaikov/wip-luminous-pr-17179

luminous: mon/OSDMonitor: fix improper input/testing range of crush smoke testing

7 years ago Merge pull request #17234 from theanalyst/wip-21097-luminous
Sage Weil [Fri, 25 Aug 2017 15:24:23 +0000 (10:24 -0500)]
Merge pull request #17234 from theanalyst/wip-21097-luminous

luminous: multisite: FAILED assert(prev_iter != pos_to_prev.end()) in RGWMetaSyncShardCR::collect_children()

Reviewed-by: Casey Bodley <cbodley@redhat.com>
7 years ago crush, mon: "ceph osd crush rule ls-by-class" support 17260/head
xie xingguo [Thu, 17 Aug 2017 02:34:26 +0000 (10:34 +0800)]
crush, mon: "ceph osd crush rule ls-by-class" support

This command returns all crush rules that currently
reference the device class specified by the user.

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
(cherry picked from commit 7c67f95201316a240c7cdf1d8619b7642ff8fc33)

7 years ago mon: "ceph osd crush rule rename" support
xie xingguo [Tue, 15 Aug 2017 08:46:15 +0000 (16:46 +0800)]
mon: "ceph osd crush rule rename" support

A user may give a rule the same name as the pool it serves.
Since a pool can be renamed, the rule should be renamable too.

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
(cherry picked from commit a5075ed253940471b347ba0773f66ea6e61398d0)

7 years ago Merge pull request #17259 from cbodley/wip-luminous-qa-rgw-pool-application
Sage Weil [Fri, 25 Aug 2017 14:59:30 +0000 (09:59 -0500)]
Merge pull request #17259 from cbodley/wip-luminous-qa-rgw-pool-application

luminous: qa/rgw: use 'ceph osd pool application enable' on created pools

7 years ago qa/rgw: enable 'rgw' application on created pools 17259/head
Casey Bodley [Tue, 22 Aug 2017 17:56:11 +0000 (13:56 -0400)]
qa/rgw: enable 'rgw' application on created pools

Signed-off-by: Casey Bodley <cbodley@redhat.com>
7 years ago qa: add optional 'application' to pool creation helpers
Casey Bodley [Tue, 22 Aug 2017 17:55:47 +0000 (13:55 -0400)]
qa: add optional 'application' to pool creation helpers

Signed-off-by: Casey Bodley <cbodley@redhat.com>
7 years ago mon/OSDMonitor: check creating_pgs.last_scan_epoch instead when sending creates 17257/head
Kefu Chai [Fri, 25 Aug 2017 06:13:14 +0000 (14:13 +0800)]
mon/OSDMonitor: check creating_pgs.last_scan_epoch instead when sending creates

We cannot be sure that creating_pgs_by_osd_epoch is in sync with
creating_pgs.pgs even if mapping.get_epoch() is less than or equal to
creating_pgs_epoch, because 1) access to mapping.epoch is not
protected by a lock and 2) even worse, the mapping might not be finished
yet when we try to send pg-creates to subscribers.

So instead of comparing creating_pgs_epoch with the mapping's epoch, we
should compare it with creating_pgs.last_scan_epoch. The former is
updated once creating_pgs_by_osd_epoch is updated with the latest
mapping's epoch and creating_pgs.pgs; the latter is updated with the current
osdmap's epoch when creating_pgs is updated with the incremental osdmap.
If the creating_pgs_epoch we are using is in sync, it should
be creating_pgs.last_scan_epoch + 1.

Fixes: http://tracker.ceph.com/issues/20785
Signed-off-by: Kefu Chai <kchai@redhat.com>
7 years ago osd/PGBackend: release a msg using msg->put() not delete 17233/head
Kefu Chai [Fri, 25 Aug 2017 02:36:56 +0000 (10:36 +0800)]
osd/PGBackend: release a msg using msg->put() not delete

fix the regression introduced by 1c18b5cb

Fixes: http://tracker.ceph.com/issues/20913
Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit d2d941dd19b9dd6e41429d92cdab8390f9c3084d)
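
Why put() rather than delete matters can be shown with a minimal reference-counting sketch in Python. It assumes, as the put() call in the title suggests, that a message is reference counted; the class name `RefCounted` and the `freed` flag are hypothetical and do not mirror Ceph's C++ classes.

```python
class RefCounted:
    """Hypothetical reference-counted object with get()/put() semantics."""

    def __init__(self):
        self.nref = 1        # the creator holds the first reference
        self.freed = False

    def get(self):
        self.nref += 1       # take an additional reference
        return self

    def put(self):
        assert self.nref > 0, "put() called on an already released object"
        self.nref -= 1       # drop one reference
        if self.nref == 0:
            self.freed = True  # only the last put() releases the object

reply = RefCounted()
held_elsewhere = reply.get()     # another holder keeps a reference
reply.put()                      # our reference is dropped
still_alive = not reply.freed    # the other holder can still use it
held_elsewhere.put()             # last reference gone, object is released
```

A plain delete would destroy the object regardless of other holders, whereas put() only releases it once the last reference is dropped.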

7 years ago osd/PGBackend: delete reply if fails to complete delete request
Kefu Chai [Wed, 23 Aug 2017 08:34:12 +0000 (16:34 +0800)]
osd/PGBackend: delete reply if fails to complete delete request

If any of the objects fails to be deleted because the pg was reset after the
latest osdmap, the pg recovery delete reply won't be sent to the primary OSD.
In that case, we should delete the reply.

Fixes: http://tracker.ceph.com/issues/20913
Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit 1c18b5cb0c27d7976e6d3d5e4ea6c3935685019b)

7 years ago Merge pull request #17214 from tchaikov/wip-luminous-pr-17014
Sage Weil [Fri, 25 Aug 2017 02:30:46 +0000 (21:30 -0500)]
Merge pull request #17214 from tchaikov/wip-luminous-pr-17014

luminous: crush: various weight-set fixes

7 years ago Merge pull request #17215 from tchaikov/wip-luminous-pr-17099
Sage Weil [Fri, 25 Aug 2017 02:30:33 +0000 (21:30 -0500)]
Merge pull request #17215 from tchaikov/wip-luminous-pr-17099

luminous: mon/PGMap: fix "0 stuck requests are blocked > 4096 sec" warn

7 years ago Merge pull request #17228 from tchaikov/wip-luminous-pr-17058
Sage Weil [Fri, 25 Aug 2017 02:30:22 +0000 (21:30 -0500)]
Merge pull request #17228 from tchaikov/wip-luminous-pr-17058

luminous: crush: fix CrushCompiler won't compile maps with empty shadow tree

7 years ago Merge pull request #17229 from tchaikov/wip-luminous-pr-17083
Sage Weil [Fri, 25 Aug 2017 02:30:11 +0000 (21:30 -0500)]
Merge pull request #17229 from tchaikov/wip-luminous-pr-17083

luminous: crush: force rebuilding shadow hierarchy after swapping buckets

7 years ago Merge pull request #17230 from tchaikov/wip-luminous-pr-17034
Sage Weil [Fri, 25 Aug 2017 02:29:59 +0000 (21:29 -0500)]
Merge pull request #17230 from tchaikov/wip-luminous-pr-17034

luminous: mon/OSDMonitor: add plain output for "crush class ls-osd" command

7 years ago Merge PR #17240 into luminous
Patrick Donnelly [Thu, 24 Aug 2017 20:22:28 +0000 (13:22 -0700)]
Merge PR #17240 into luminous

* refs/remotes/upstream/pull/17240/head:
mds: check cap string only if !allow_all
mds/MDSDaemon: add 'is_valid=false' when failed to parse caps

Reviewed-by: Sage Weil <sage@redhat.com>
7 years ago Merge PR #17238 into luminous
Patrick Donnelly [Thu, 24 Aug 2017 20:22:26 +0000 (13:22 -0700)]
Merge PR #17238 into luminous

* refs/remotes/upstream/pull/17238/head:
mon: get writeable osdmap for added data pool

Reviewed-by: Sage Weil <sage@redhat.com>
7 years ago Merge PR #17237 into luminous
Patrick Donnelly [Thu, 24 Aug 2017 20:22:24 +0000 (13:22 -0700)]
Merge PR #17237 into luminous

* refs/remotes/upstream/pull/17237/head:
fuse: use c++ allocations for group list

Reviewed-by: Abhishek Lekshmanan <abhishek.lekshmanan@gmail.com>
7 years ago Merge PR #17236 into luminous
Patrick Donnelly [Thu, 24 Aug 2017 20:22:22 +0000 (13:22 -0700)]
Merge PR #17236 into luminous

* refs/remotes/upstream/pull/17236/head:
client: fix compat version on MStatfs

Reviewed-by: Sage Weil <sage@redhat.com>
7 years ago Merge PR #17235 into luminous
Patrick Donnelly [Thu, 24 Aug 2017 20:17:19 +0000 (13:17 -0700)]
Merge PR #17235 into luminous

* refs/remotes/upstream/pull/17235/head:
client: fix locking in Client::getcwd

Reviewed-by: Sage Weil <sage@redhat.com>
7 years ago qa/tasks/ceph_deploy: gatherkeys before mgr deploy
Sage Weil [Thu, 24 Aug 2017 13:52:17 +0000 (09:52 -0400)]
qa/tasks/ceph_deploy: gatherkeys before mgr deploy

Otherwise we may be missing the bootstrap-mgr key.

Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 800fdd9953013f975acd70de060bb828c83f30bc)

7 years ago Merge pull request #17196 from theanalyst/wip-21051-luminous
Sage Weil [Thu, 24 Aug 2017 17:40:26 +0000 (12:40 -0500)]
Merge pull request #17196 from theanalyst/wip-21051-luminous

luminous: Improve size scrub error handling and ignore system attrs in xattr checking

Reviewed-by: Kefu Chai <kchai@redhat.com>
7 years ago Merge pull request #17241 from dzafman/wip-pidfile-luminous
Sage Weil [Thu, 24 Aug 2017 17:39:49 +0000 (12:39 -0500)]
Merge pull request #17241 from dzafman/wip-pidfile-luminous

test/CMakeLists: disable test_pidfile.sh

Reviewed-by: Abhishek Lekshmanan <abhishek.lekshmanan@gmail.com>
7 years ago mds: check cap string only if !allow_all 17240/head
Patrick Donnelly [Mon, 21 Aug 2017 21:12:47 +0000 (14:12 -0700)]
mds: check cap string only if !allow_all

This corrects a regression introduced by #16891 which fixes
http://tracker.ceph.com/issues/20990. Not using cephx would
cause all clients to fail auth with:

    2017-08-17 12:21:05.191958 7f5b788d4700  0 -- 127.0.0.1:0/65887226 >> 127.0.0.1:6805/3339248996 conn(0x1004be8a0 :-1 s=STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH pgs=0 cs=0 l=0).handle_connect_reply connect got BADAUTHORIZER

Fixes: http://tracker.ceph.com/issues/21027
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit ce1995fc63829f854f2da16c68bee09c03efa180)

7 years ago mds/MDSDaemon: add 'is_valid=false' when failed to parse caps
Yanhu Cao [Tue, 8 Aug 2017 10:55:54 +0000 (18:55 +0800)]
mds/MDSDaemon: add 'is_valid=false' when failed to parse caps

Fixes: http://tracker.ceph.com/issues/20990
Signed-off-by: Yanhu Cao <gmayyyha@gmail.com>
(cherry picked from commit 353a89728ca82112873cc85ccc59b6cf7a3f37da)

7 years ago test/CMakeLists: disable test_pidfile.sh 17241/head
Sage Weil [Thu, 10 Aug 2017 19:41:38 +0000 (15:41 -0400)]
test/CMakeLists: disable test_pidfile.sh

Too flaky, see http://tracker.ceph.com/issues/20975

Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit ff3de2304497544033837bb8d0c809a9e54a3e6e)

7 years ago mon: get writeable osdmap for added data pool 17238/head
Patrick Donnelly [Tue, 22 Aug 2017 19:11:25 +0000 (12:11 -0700)]
mon: get writeable osdmap for added data pool

Continuation of: 435717791ec499f71c9d1485b1e4e63239a343e2

Fixes: http://tracker.ceph.com/issues/21064
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit 94a62a3dff847e515510550f5e8af4a61671d667)

7 years ago fuse: use c++ allocations for group list 17237/head
Jeff Layton [Wed, 23 Aug 2017 16:13:14 +0000 (12:13 -0400)]
fuse: use c++ allocations for group list

Valgrind is unhappy about our turning on supplementary group handling
with fuse by default. The problem is that we end up calling delete to
free the supplementary gids list, but fuse uses malloc to allocate it.

Note that I was initially concerned that I needed to use malloc and
free there to handle the case of userland calling ceph_userperm_new,
but we leave freeing the pointer up to the caller in that case.

Convert fuse to use new/delete to allocate and free the group lists
instead.

Tracker: http://tracker.ceph.com/issues/21065
Signed-off-by: Jeff Layton <jlayton@redhat.com>
(cherry picked from commit f4fe5e2d524f8cca74f80a8a80fcd3e82b9effcb)

7 years ago client: fix compat version on MStatfs 17236/head
John Spray [Wed, 23 Aug 2017 13:52:22 +0000 (14:52 +0100)]
client: fix compat version on MStatfs

Fixes: http://tracker.ceph.com/issues/21078
Signed-off-by: John Spray <john.spray@redhat.com>
(cherry picked from commit 744160d784463dd6220707c3bcc96a2194997aab)

7 years ago client: fix locking in Client::getcwd 17235/head
Jeff Layton [Wed, 23 Aug 2017 17:49:40 +0000 (13:49 -0400)]
client: fix locking in Client::getcwd

Currently, it doesn't take the client_lock at all, which is problematic
as make_request may very well end up unlocking it. Rename the current
function to _getcwd, and add a new getcwd wrapper that takes the mutex
before calling _getcwd.

This fixes: http://tracker.ceph.com/issues/21082

Signed-off-by: Jeff Layton <jlayton@redhat.com>
(cherry picked from commit 72909729254f70f3d8c6ec4191b1fead2212f3ed)
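
The wrapper pattern described above can be sketched in Python as follows. This is only an illustration of the locked public wrapper around an unlocked helper, not the C++ Client code; the class layout, the `_cwd` attribute, and the example path are assumptions.

```python
import threading

class Client:
    """Hypothetical client with a lock guarding its internal state."""

    def __init__(self, cwd="/"):
        self.client_lock = threading.Lock()
        self._cwd = cwd

    def _getcwd(self):
        # Internal helper: the caller must already hold client_lock.
        assert self.client_lock.locked(), "_getcwd requires client_lock"
        return self._cwd

    def getcwd(self):
        # Public wrapper: take the lock, then delegate to the helper.
        with self.client_lock:
            return self._getcwd()

client = Client("/mnt/cephfs")
path = client.getcwd()
```

Keeping the unlocked helper separate lets other code paths that already hold the lock call it without trying to take the lock a second time.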

7 years ago cls/log: cls_log_list always returns next marker 17234/head
Casey Bodley [Mon, 14 Aug 2017 19:25:44 +0000 (15:25 -0400)]
cls/log: cls_log_list always returns next marker

Commit 5334622a8365520fa4247241f97422c044cbf5b2 changed cls_log_list()
to only return the next marker if the results were truncated.

This broke RGWMetaSyncShardCR in rgw_sync.cc, which relies on
cls_log_list() to track its max_marker.

Fixes: http://tracker.ceph.com/issues/20906
Signed-off-by: Casey Bodley <cbodley@redhat.com>
(cherry picked from commit f7ea4ea2b2264fc74beb04872246a853efb9206a)
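
A rough Python model shows why a caller that tracks its position through the returned marker breaks when the marker is only returned on truncation. The functions `log_list` and `sync_max_marker`, the key format, and the page size are all hypothetical; this is not the cls_log or rgw_sync code.

```python
def log_list(entries, marker, max_entries, always_return_marker=True):
    """Return (page, next_marker, truncated) for keys sorted after marker."""
    remaining = [key for key in entries if key > marker]
    page = remaining[:max_entries]
    truncated = len(remaining) > max_entries
    if truncated or always_return_marker:
        # Report the last key handed out, or echo the input marker if none.
        next_marker = page[-1] if page else marker
    else:
        # Old behaviour being modelled: no marker unless truncated.
        next_marker = ""
    return page, next_marker, truncated

def sync_max_marker(entries, always_return_marker):
    """Page through the log and report the last marker the caller saw."""
    marker, truncated = "", True
    while truncated:
        _, marker, truncated = log_list(entries, marker, 2, always_return_marker)
    return marker

log = ["1_a", "1_b", "1_c"]
fixed = sync_max_marker(log, always_return_marker=True)
broken = sync_max_marker(log, always_return_marker=False)
```

With the marker always returned, the caller ends on the last key it processed; without it, the final (untruncated) page leaves the caller with an empty marker.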

7 years ago mon/OSDMonitor: fix improper input/testing range of crush smoke testing 17231/head 17232/head
xie xingguo [Wed, 23 Aug 2017 03:31:35 +0000 (11:31 +0800)]
mon/OSDMonitor: fix improper input/testing range of crush smoke testing

CrushTester::test() will reset testing range to [0, 1023] whenever
min_x or max_x is negative and the constructor of CrushTester will
always default min_x and max_x to -1.

Thus, to set the test range correctly, you have to specify both min_x and max_x.
Local testing shows this patch reduces the time consumed by the crush
smoke test to approximately 1/20 of what it was without it.

For example:
crush somke test duration: 0.668354 seconds ->
crush somke test duration: 0.012592 seconds

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
(cherry picked from commit e128a1e913fb4a224e392cd09203fc7cf4fa9f5f)
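
The reset behaviour described above can be modelled in a few lines of Python. The class below only mimics the stated rule (defaults of -1, reset to [0, 1023] when either bound is negative); it is not the C++ CrushTester, and the upper bound of 50 is an arbitrary value chosen for illustration.

```python
class CrushTester:
    """Toy model of the range handling described in the commit message."""

    def __init__(self):
        # The constructor defaults both bounds to -1.
        self.min_x = -1
        self.max_x = -1

    def test(self):
        # If either bound is negative, the whole range is reset.
        if self.min_x < 0 or self.max_x < 0:
            self.min_x, self.max_x = 0, 1023
        return self.max_x - self.min_x + 1   # number of inputs tested

only_max = CrushTester()
only_max.max_x = 50                  # min_x is still -1, so the range is reset
inputs_only_max = only_max.test()

both = CrushTester()
both.min_x, both.max_x = 0, 50       # both bounds set, so the range is kept
inputs_both = both.test()
```

Setting only one bound silently falls back to the full default range, which is why both have to be specified.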

7 years ago mon/OSDMonitor: add plain output for "crush class ls-osd" command 17230/head
xie xingguo [Tue, 15 Aug 2017 12:13:50 +0000 (20:13 +0800)]
mon/OSDMonitor: add plain output for "crush class ls-osd" command

Was:
ceph osd crush rm-device-class `ceph osd crush class ls-osd pool_bar`
Error EINVAL: Expected option value to be integer, got '[', unable to parse osd id:"[".

Now:
ceph osd crush rm-device-class `ceph osd crush class ls-osd pool_bar`
done removing class of osd(s): 0,2,4

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
(cherry picked from commit 985c21e77a9987f8f648341e04591abeb93304c3)

7 years ago crush: force rebuilding shadow hierarchy after swapping buckets 17229/head
xie xingguo [Fri, 18 Aug 2017 04:36:57 +0000 (12:36 +0800)]
crush: force rebuilding shadow hierarchy after swapping buckets

Was:
---------------------------------------------------------------
ID CLASS WEIGHT  TYPE NAME
-8   ssd 3.00000 root fake-root~ssd
-6   ssd 3.00000     host fake~ssd
 0   ssd 1.00000         osd.0
 1   ssd 1.00000         osd.1
 2   ssd 1.00000         osd.2
-7       3.00000 root fake-root
-5       3.00000     host gitbuilder-ceph-rpm-centos7-amd64-basic
 3   ssd 1.00000         osd.3
 4   ssd 1.00000         osd.4
 5   ssd 1.00000         osd.5
-4   ssd 3.00000 root default~ssd
-3   ssd 3.00000     host gitbuilder-ceph-rpm-centos7-amd64-basic~ssd
 3   ssd 1.00000         osd.3
 4   ssd 1.00000         osd.4
 5   ssd 1.00000         osd.5
-1       3.00000 root default
-2       3.00000     host fake
 0   ssd 1.00000         osd.0
 1   ssd 1.00000         osd.1
 2   ssd 1.00000         osd.2

Now:
---------------------------------------------------------------
ID CLASS WEIGHT  TYPE NAME
-8   ssd 3.00000 root fake-root~ssd
-7   ssd 3.00000     host gitbuilder-ceph-rpm-centos7-amd64-basic~ssd
 3   ssd 1.00000         osd.3
 4   ssd 1.00000         osd.4
 5   ssd 1.00000         osd.5
-6       3.00000 root fake-root
-5       3.00000     host gitbuilder-ceph-rpm-centos7-amd64-basic
 3   ssd 1.00000         osd.3
 4   ssd 1.00000         osd.4
 5   ssd 1.00000         osd.5
-4   ssd 3.00000 root default~ssd
-3   ssd 3.00000     host fake~ssd
 0   ssd 1.00000         osd.0
 1   ssd 1.00000         osd.1
 2   ssd 1.00000         osd.2
-1       3.00000 root default
-2       3.00000     host fake
 0   ssd 1.00000         osd.0
 1   ssd 1.00000         osd.1
 2   ssd 1.00000         osd.2

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
(cherry picked from commit 7e3528a9dc7fb53bce5b2c39b644c2b2b2c5d417)

7 years ago crush: fix CrushCompiler won't compile maps with empty shadow tree 17228/head
xie xingguo [Thu, 17 Aug 2017 06:45:13 +0000 (14:45 +0800)]
crush: fix CrushCompiler won't compile maps with empty shadow tree

Steps to reproduce:
(1) ceph osd crush rm-device-class osd.0
(2) ceph osd crush set-device-class foo osd.0
(3) ceph osd crush rule create-replicated foo_rule default host foo
(4) ceph osd crush rm-device-class osd.0
(5) ceph osd getcrushmap -o crushmap
(6) crushtool -d crushmap -o crushmap.txt
(7) crushtool -c crushmap.txt -o crushmap
    unknown device class 'foo'

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
(cherry picked from commit 3e4fe5bc7410cecaff86c7c216a3e63eb94f6213)

7 years ago crush: rebuild shadow trees on removing crush rule
xie xingguo [Fri, 18 Aug 2017 02:25:13 +0000 (10:25 +0800)]
crush: rebuild shadow trees on removing crush rule

This covers the case where the removed rule is the last crush rule still
referencing a specific device class. Otherwise the device class might be
left dangling.

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
(cherry picked from commit 968b72d8595996ee4d2e1364264c4c8203532045)

7 years ago qa/suites/upgrade/jewel-x/parallel: tolerate laggy mgr
Sage Weil [Thu, 24 Aug 2017 14:30:01 +0000 (10:30 -0400)]
qa/suites/upgrade/jewel-x/parallel: tolerate laggy mgr

Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit bf296018ff5b0a610869b8cbc8e385038aa6e2b7)

7 years ago qa/suites/upgrade/jewel-x/stress-split: tolerate sloppy past_intervals
Sage Weil [Thu, 24 Aug 2017 14:23:22 +0000 (10:23 -0400)]
qa/suites/upgrade/jewel-x/stress-split: tolerate sloppy past_intervals

This is harmless in general, esp during upgrade.

Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit d5d5d7d1d22fc8da7b7df21cf527aa4c9ed54c9f)

7 years ago Merge pull request #17208 from ceph/backport-bz1484002
Alfredo Deza [Thu, 24 Aug 2017 13:07:41 +0000 (09:07 -0400)]
Merge pull request #17208 from ceph/backport-bz1484002

luminous ceph-volume: use unique logical volumes

7 years ago Merge pull request #16985 from dzafman/wip-standalone-luminous
Sage Weil [Thu, 24 Aug 2017 03:07:17 +0000 (22:07 -0500)]
Merge pull request #16985 from dzafman/wip-standalone-luminous

luminous: tests: qa/standalone: misc fixes

7 years ago Merge pull request #17113 from theanalyst/wip-luminous-20962
Sage Weil [Thu, 24 Aug 2017 03:07:08 +0000 (22:07 -0500)]
Merge pull request #17113 from theanalyst/wip-luminous-20962

luminous: rgw: Fix rgw not responding occasionally when receiving SIGHUP signal.

7 years ago Merge pull request #17114 from theanalyst/wip-21043-luminous
Sage Weil [Thu, 24 Aug 2017 03:06:58 +0000 (22:06 -0500)]
Merge pull request #17114 from theanalyst/wip-21043-luminous

luminous: rgw: S3 v4 auth fails when query string contains

7 years ago Merge pull request #17172 from tchaikov/wip-luminous-pr-16887
Sage Weil [Thu, 24 Aug 2017 03:06:16 +0000 (22:06 -0500)]
Merge pull request #17172 from tchaikov/wip-luminous-pr-16887

luminous: mon: fix wrong mon-num counting logic of 'ceph features' command

7 years ago Merge pull request #17173 from tchaikov/wip-luminous-pr-16940
Sage Weil [Thu, 24 Aug 2017 03:05:59 +0000 (22:05 -0500)]
Merge pull request #17173 from tchaikov/wip-luminous-pr-16940

luminous: mgr: add missing call to pick_addresses

7 years ago Merge pull request #17174 from tchaikov/wip-luminous-pr-17059
Sage Weil [Thu, 24 Aug 2017 03:05:20 +0000 (22:05 -0500)]
Merge pull request #17174 from tchaikov/wip-luminous-pr-17059

luminous: compressor: conditionalize on HAVE_LZ4

7 years ago Merge pull request #17176 from tchaikov/wip-luminous-pr-16967
Sage Weil [Thu, 24 Aug 2017 03:04:46 +0000 (22:04 -0500)]
Merge pull request #17176 from tchaikov/wip-luminous-pr-16967

luminous: mon: fix legacy health checks in 'ceph status' during upgrade; fix jewel-x upgrade combo

7 years ago Merge pull request #17191 from tchaikov/wip-luminous-pr-17065
Sage Weil [Thu, 24 Aug 2017 03:03:44 +0000 (22:03 -0500)]
Merge pull request #17191 from tchaikov/wip-luminous-pr-17065

luminous: mon/OSDMonitor: do not send_pg_creates with stale info

Reviewed-by: Sage Weil <sage@redhat.com>
7 years ago Merge pull request #17193 from theanalyst/wip-21048-luminous
Sage Weil [Thu, 24 Aug 2017 03:03:08 +0000 (22:03 -0500)]
Merge pull request #17193 from theanalyst/wip-21048-luminous

luminous: Include front/back interface names in OSD metadata

Reviewed-by: Kefu Chai <kchai@redhat.com>
7 years ago Merge pull request #17195 from theanalyst/wip-21077-luminous
Sage Weil [Thu, 24 Aug 2017 03:02:44 +0000 (22:02 -0500)]
Merge pull request #17195 from theanalyst/wip-21077-luminous

luminous: osd: osd_scrub_during_recovery only considers primary, not replicas

Reviewed-by: David Zafman <dzafman@redhat.com>
7 years ago Merge pull request #17198 from theanalyst/wip-21079-luminous
Sage Weil [Thu, 24 Aug 2017 03:02:05 +0000 (22:02 -0500)]
Merge pull request #17198 from theanalyst/wip-21079-luminous

luminous: mon: bug in function reweight_by_utilization

Reviewed-by: Sage Weil <sage@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
7 years ago mon/PGMap: fix "0 stuck requests are blocked > 4096 sec" warn 17215/head
xie xingguo [Sat, 19 Aug 2017 02:05:36 +0000 (10:05 +0800)]
mon/PGMap: fix "0 stuck requests are blocked > 4096 sec" warn

In some test runs, I saw Ceph complain:

2017-08-19 01:02:22.393763 mon.a mon.0 172.21.15.108:6789/0 279 : cluster [ERR] Health check failed: 0 stuck requests are blocked > 4096 sec (REQUEST_STUCK)

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
(cherry picked from commit 364178e71622be032b10273a845af4373278cb0c)

7 years ago crush: fix type mismatch 17214/head
xie xingguo [Tue, 15 Aug 2017 04:18:30 +0000 (12:18 +0800)]
crush: fix type mismatch

Pool IDs are of type int64_t instead of uint64_t.

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
(cherry picked from commit 507f970b569280831ec9d12df8455227e9a31f36)
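
As a general illustration of why a signed/unsigned mismatch matters, the short Python snippet below reinterprets a negative 64-bit value as unsigned using the standard `ctypes` module. The value -1 is an arbitrary example and is not taken from the patch.

```python
import ctypes

# A negative int64_t value stored in a uint64_t wraps to a huge positive number.
as_unsigned = ctypes.c_uint64(-1).value

# Converting back to a signed 64-bit integer recovers the original value.
round_trip = ctypes.c_int64(as_unsigned).value
```

Any comparison or lookup done on the unsigned form would see a very different number from the one the signed code intended.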

7 years ago crush: fix bucket_adjust_item_weight() won't update weight-set correctly
xie xingguo [Mon, 14 Aug 2017 07:57:07 +0000 (15:57 +0800)]
crush: fix bucket_adjust_item_weight() won't update weight-set correctly

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
(cherry picked from commit d62c9f16cba5fef991b8405d3002da1c0a0ae090)

7 years ago crush: CrushWrapper::add_bucket - do not allow caller pass in null 'idout'
xie xingguo [Mon, 14 Aug 2017 06:23:24 +0000 (14:23 +0800)]
crush: CrushWrapper::add_bucket - do not allow caller pass in null 'idout'

*** Caught signal (Segmentation fault) **
 in thread 7f495c0f6300 thread_name:crushtool
 ceph version 12.1.2-768-gab69125 (ab6912523e779174f92f0b0fc10372bd0b645415) mimic (dev)
 1: (()+0x1a3d1) [0x7f495c1343d1]
 2: (()+0xf370) [0x7f4951deb370]
 3: (CrushWrapper::add_bucket(int, int, int, int, int, int*, int*, int*)+0x84) [0x7f49538ba084]
 4: (CrushCompiler::parse_bucket(__gnu_cxx::__normal_iterator<boost::spirit::tree_node<boost::spirit::node_val_data<char const*, boost::spirit::nil_t> >*, std::vector<boost::spirit::tree_node<boost::spirit::node_val_data<char const*, boost::spirit::nil_t> >, std::allocator<boost::spirit::tree_node<boost::spirit::node_val_data<char const*, boost::spirit::nil_t> > > > > const&)+0xef0) [0x7f49538dc170]
 5: (CrushCompiler::parse_crush(__gnu_cxx::__normal_iterator<boost::spirit::tree_node<boost::spirit::node_val_data<char const*, boost::spirit::nil_t> >*, std::vector<boost::spirit::tree_node<boost::spirit::node_val_data<char const*, boost::spirit::nil_t> >, std::allocator<boost::spirit::tree_node<boost::spirit::node_val_data<char const*, boost::spirit::nil_t> > > > > const&)+0x130) [0x7f49538dcba0]
 6: (CrushCompiler::compile(std::istream&, char const*)+0xb93) [0x7f49538deaa3]
 7: (main()+0x2615) [0x7f495c126015]
 8: (__libc_start_main()+0xf5) [0x7f49507ccb35]
 9: (()+0xf4b0) [0x7f495c1294b0]
2017-08-14 13:31:25.498050 7f495c0f6300 -1 *** Caught signal (Segmentation fault) **
 in thread 7f495c0f6300 thread_name:crushtool

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
(cherry picked from commit dc8f925cb4eebffa5f4c13d1063ae3ae8d7e15bd)

7 years ago crush: update crush_choose_arg_map size on resizing
xie xingguo [Mon, 14 Aug 2017 06:19:09 +0000 (14:19 +0800)]
crush: update crush_choose_arg_map size on resizing

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
(cherry picked from commit 0c8ca02727b83c5bcbf896010e675762fdc333f5)

7 years ago crush: fix bucket_remove_item() won't update weight-set simultaneously
xie xingguo [Mon, 14 Aug 2017 06:15:53 +0000 (14:15 +0800)]
crush: fix bucket_remove_item() won't update weight-set simultaneously

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
(cherry picked from commit 0e1d37f6f3160cda9fe3578b21ff002d9ebe5b36)

7 years ago crush: fix bucket_add_item() won't update weight-set simultaneously
xie xingguo [Mon, 14 Aug 2017 06:13:59 +0000 (14:13 +0800)]
crush: fix bucket_add_item() won't update weight-set simultaneously

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
(cherry picked from commit 475bddab51931ca3d5ade6cf62b43e9dbb9a5b3e)

7 years ago crush: fix bucket index to weight-set
xie xingguo [Mon, 14 Aug 2017 06:12:16 +0000 (14:12 +0800)]
crush: fix bucket index to weight-set

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
(cherry picked from commit 859b066adf5de02a5b127d9177f792b80616cbea)

7 years ago qa/suites/upgrade/jewel-x/parallel: tolerate OBJECT_MISPLACED
Sage Weil [Wed, 23 Aug 2017 18:24:00 +0000 (14:24 -0400)]
qa/suites/upgrade/jewel-x/parallel: tolerate OBJECT_MISPLACED

Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 5455f599b3a156c0f85f42d7faef8c5a59359982)

7 years ago qa/suites/upgrade/jewel-x/parallel: tolerate mgr warning
Sage Weil [Wed, 23 Aug 2017 18:22:34 +0000 (14:22 -0400)]
qa/suites/upgrade/jewel-x/parallel: tolerate mgr warning

Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 2504ab167518ee48e5e1fd4d3efccb072a186442)

7 years ago ceph-volume tests centos7 use the new ansible syntax for lvm 17208/head
Alfredo Deza [Wed, 23 Aug 2017 19:26:27 +0000 (15:26 -0400)]
ceph-volume tests centos7 use the new ansible syntax for lvm

Signed-off-by: Alfredo Deza <adeza@redhat.com>
(cherry picked from commit fbce7ad593ac5342ee6d1769e04d7f3adadab78a)

7 years ago ceph-volume tests use the new ansible syntax for lvm
Alfredo Deza [Wed, 23 Aug 2017 19:24:08 +0000 (15:24 -0400)]
ceph-volume tests use the new ansible syntax for lvm

Signed-off-by: Alfredo Deza <adeza@redhat.com>
(cherry picked from commit 8f0f202ec4b4f39d748b5aec013b0bc541439c2b)

7 years ago ceph-volume tests create tests for the get_lv helper method
Alfredo Deza [Wed, 23 Aug 2017 17:44:46 +0000 (13:44 -0400)]
ceph-volume tests create tests for the get_lv helper method

Signed-off-by: Alfredo Deza <adeza@redhat.com>
(cherry picked from commit 7584d64c0fd585fd0e02991ccb9cad2d147fb130)

7 years ago ceph-volume tests create tests for the new arg validator
Alfredo Deza [Wed, 23 Aug 2017 17:29:15 +0000 (13:29 -0400)]
ceph-volume tests create tests for the new arg validator

Signed-off-by: Alfredo Deza <adeza@redhat.com>
(cherry picked from commit d5eb9640aa4b20eabc9b5fdd07559082346478c6)

7 years ago ceph-volume util create a validator module for argparse
Alfredo Deza [Wed, 23 Aug 2017 17:28:56 +0000 (13:28 -0400)]
ceph-volume util create a validator module for argparse

Signed-off-by: Alfredo Deza <adeza@redhat.com>
(cherry picked from commit 0ce77806bd04d69cd9d1cdcd71e886f7763a1eb2)

7 years ago ceph-volume lvm.prepare enforce usage of vg/lv when preparing lvm devices
Alfredo Deza [Wed, 23 Aug 2017 17:26:15 +0000 (13:26 -0400)]
ceph-volume lvm.prepare enforce usage of vg/lv when preparing lvm devices

Signed-off-by: Alfredo Deza <adeza@redhat.com>
(cherry picked from commit 192fe4e1dd24713f33131583de4ca81b625bad98)

7 years ago ceph-volume lvm.common update help values for vg/lv usage
Alfredo Deza [Wed, 23 Aug 2017 17:21:56 +0000 (13:21 -0400)]
ceph-volume lvm.common update help values for vg/lv usage

Signed-off-by: Alfredo Deza <adeza@redhat.com>
(cherry picked from commit a0286da13bc1edef96489d7d525a1cd8e7066acf)

7 years ago ceph-volume lvm.create update docstring for vg/lv usage
Alfredo Deza [Wed, 23 Aug 2017 17:20:07 +0000 (13:20 -0400)]
ceph-volume lvm.create update docstring for vg/lv usage

Signed-off-by: Alfredo Deza <adeza@redhat.com>
(cherry picked from commit f907f7a91dce2a5748dd4c07d5919fa56aae4a83)

7 years ago Merge pull request #17197 from theanalyst/wip-20965-luminous
Sage Weil [Wed, 23 Aug 2017 17:00:34 +0000 (12:00 -0500)]
Merge pull request #17197 from theanalyst/wip-20965-luminous

luminous: src/common/LogClient.cc: 310: FAILED assert(num_unsent <= log_queue.size())

Reviewed-by: Kefu Chai <kchai@redhat.com>
7 years ago Merge pull request #17202 from jdurgin/wip-lumionus-pg-log
Josh Durgin [Wed, 23 Aug 2017 16:23:37 +0000 (09:23 -0700)]
Merge pull request #17202 from jdurgin/wip-lumionus-pg-log

osd: adjust osd_min_pg_log_entries

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
7 years ago osd: adjust osd_min_pg_log_entries 17202/head
J. Eric Ivancich [Thu, 17 Aug 2017 20:12:35 +0000 (16:12 -0400)]
osd: adjust osd_min_pg_log_entries

Return osd_min_pg_log_entries to its original value, which matches
osd_pg_log_dups_tracked, so that the extended dup log is not used in
the general case.

This helps address: http://tracker.ceph.com/issues/21026

Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
(cherry picked from commit 1c4df03394312fe67f36448613d8b54cb1a0e2c9)

7 years ago Merge pull request #17192 from idryomov/wip-krbd-unmap-tests-pool-luminous
Venky Shankar [Wed, 23 Aug 2017 15:39:27 +0000 (21:09 +0530)]
Merge pull request #17192 from idryomov/wip-krbd-unmap-tests-pool-luminous

luminous: qa: fix POOL_APP_NOT_ENABLED warning in krbd:unmap suite

7 years ago mon/PGMap: reweight::by_utilization - skip DNE osds 17198/head
xie xingguo [Thu, 17 Aug 2017 11:15:15 +0000 (19:15 +0800)]
mon/PGMap: reweight::by_utilization - skip DNE osds

EC can set one or more members of the acting set to CRUSH_ITEM_NONE,
which can cause pgs_by_osd.resize() to attempt to allocate far more
memory than we can afford.

Fix this by always excluding any OSD that currently does not exist
(DNE) from the acting set.

Fixes: http://tracker.ceph.com/issues/20970
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
(cherry picked from commit 6ae96d33a84404a144e06c423796e4a3886d8e61)
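
The failure mode can be sketched as follows. This is an illustrative Python model, not the C++ from the commit: `count_pgs_by_osd` and `existing_osds` are hypothetical names, and a plain list stands in for the vector that `pgs_by_osd.resize()` grows. `CRUSH_ITEM_NONE` is 0x7fffffff, so using it as an index would request roughly two billion slots.

```python
CRUSH_ITEM_NONE = 0x7FFFFFFF  # placeholder id for an unfilled acting-set slot


def count_pgs_by_osd(acting_sets, existing_osds):
    """Count PGs per OSD id, skipping ids that do not exist (DNE)."""
    pgs_by_osd = []
    for acting in acting_sets:
        for osd in acting:
            # Skipping DNE ids also skips CRUSH_ITEM_NONE, so the list is
            # never grown to an absurd size.
            if osd not in existing_osds:
                continue
            if osd >= len(pgs_by_osd):
                pgs_by_osd.extend([0] * (osd + 1 - len(pgs_by_osd)))
            pgs_by_osd[osd] += 1
    return pgs_by_osd
```

For example, the acting sets `[0, 1, CRUSH_ITEM_NONE]` and `[2, CRUSH_ITEM_NONE, 1]` with existing OSDs `{0, 1, 2}` produce a three-element result.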

7 years ago common/LogClient: make last_log non-atomic 17197/head
Sage Weil [Fri, 4 Aug 2017 21:18:17 +0000 (17:18 -0400)]
common/LogClient: make last_log non-atomic

Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit c36a98be97e15294f57c6a640fe6a1c277dce8a3)

7 years ago common/LogClient: fix indentation
Sage Weil [Fri, 4 Aug 2017 17:59:38 +0000 (13:59 -0400)]
common/LogClient: fix indentation

Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit bf92a8a2693e0b6a01a861977a5376eb584c9e42)

7 years ago common/LogClient: assign seq and queue atomically
Sage Weil [Fri, 4 Aug 2017 17:58:17 +0000 (13:58 -0400)]
common/LogClient: assign seq and queue atomically

_get_mon_log_message() assumes that log_last and log_queue are in
sync, but it was previously possible to increment log_last when
setting e.seq in do_log() and only queue the entry later.  If a
racing thread ran get_mon_log_message() in the meantime, it would
fail an assertion.

Fix by assigning the seq and queueing it atomically.  If the
cluster log is not enabled, use the get_next_seq() helper so that
graylog or syslog messages still have a seq assigned.

Fixes: http://tracker.ceph.com/issues/18209
Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 1f8f58becf1ec53c729b0befeb1f19c601588c4a)
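
The idea of assigning the sequence number and queueing under one lock can be sketched like this. It is a simplified Python model, not the LogClient code: the class, its attribute names, and `check_in_sync` are hypothetical stand-ins for the counter, the queue, and the assertion described above.

```python
import threading
from collections import deque


class LogQueue:
    """Toy model: the seq counter and the queue change under one lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self.last_seq = 0
        self.queue = deque()

    def do_log(self, message):
        # Increment and enqueue in the same critical section, so no reader
        # can observe a seq that has been assigned but not yet queued.
        with self._lock:
            self.last_seq += 1
            self.queue.append((self.last_seq, message))
            return self.last_seq

    def check_in_sync(self):
        # A reader taking the same lock always sees a consistent pair.
        with self._lock:
            if self.queue:
                assert self.queue[-1][0] == self.last_seq
            return len(self.queue)
```

If the increment and the append were done in separate critical sections, a reader running between them could see a counter that is ahead of the queue, which is the kind of mismatch the assertion guards against.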

7 years ago osd: In scrub's be_select_auth_object() detect multiple errors better 17196/head
David Zafman [Wed, 26 Jul 2017 23:22:26 +0000 (16:22 -0700)]
osd: In scrub's be_select_auth_object() detect multiple errors better

Signed-off-by: David Zafman <dzafman@redhat.com>
(cherry picked from commit 75b425671a75c80ed95a52820cabf25d3fafcfff)

7 years ago osd, rados: Adding ss_attr_missing and ss_attr_corrupt errors to list-inconsistent-obj
David Zafman [Wed, 19 Jul 2017 01:45:57 +0000 (18:45 -0700)]
osd, rados: Adding ss_attr_missing and ss_attr_corrupt errors to list-inconsistent-obj

Signed-off-by: David Zafman <dzafman@redhat.com>
(cherry picked from commit 4c949b6258109884ce1683d4474c740d5e61aee6)

7 years ago osd, rados: Improve size scrub error handling
David Zafman [Fri, 14 Jul 2017 04:01:18 +0000 (21:01 -0700)]
osd, rados: Improve size scrub error handling

Fixes: http://tracker.ceph.com/issues/20243
Signed-off-by: David Zafman <dzafman@redhat.com>
(cherry picked from commit 5f58301a1364e948834dabe503200dda07fc2790)

7 years ago osd: Compare all object info even when can't consider for auth copy
David Zafman [Thu, 13 Jul 2017 16:45:21 +0000 (09:45 -0700)]
osd: Compare all object info even when can't consider for auth copy

Signed-off-by: David Zafman <dzafman@redhat.com>
(cherry picked from commit 437e5cf1067658912fe15859d18615c733c84f1a)

7 years ago osd: Change a check to an assert() since it can't happen anymore
David Zafman [Thu, 13 Jul 2017 16:44:29 +0000 (09:44 -0700)]
osd: Change a check to an assert() since it can't happen anymore

Signed-off-by: David Zafman <dzafman@redhat.com>
(cherry picked from commit 8e2b9a07e0551895809a3fc036aae557fadc74ba)

7 years ago osd: Add whether shard is primary in list-inconsistent-obj
David Zafman [Thu, 6 Jul 2017 02:14:36 +0000 (19:14 -0700)]
osd: Add whether shard is primary in list-inconsistent-obj

Add new field in the client interface
Update test case

Fixes: http://tracker.ceph.com/issues/18836
Signed-off-by: David Zafman <dzafman@redhat.com>
(cherry picked from commit 8ad4b291131058bbdb4267f4cad35a40fb905bb4)

7 years ago test: Add undocumented corrupt-size for testing
David Zafman [Fri, 30 Jun 2017 00:13:50 +0000 (17:13 -0700)]
test: Add undocumented corrupt-size for testing

Signed-off-by: David Zafman <dzafman@redhat.com>
(cherry picked from commit c0606b9eea977074b560b44c4cd1a3d8e8bc3e0a)

7 years ago osd: Fix test op error message
David Zafman [Thu, 29 Jun 2017 23:44:29 +0000 (16:44 -0700)]
osd: Fix test op error message

Signed-off-by: David Zafman <dzafman@redhat.com>
(cherry picked from commit 8325639a320398dc5e4022634a2c4fc38a8fdba9)

7 years ago osd: Fixes for osd_scrub_during_recovery handling 17195/head
David Zafman [Tue, 15 Aug 2017 21:45:13 +0000 (14:45 -0700)]
osd: Fixes for osd_scrub_during_recovery handling

Fixes: http://tracker.ceph.com/issues/18206
Signed-off-by: David Zafman <dzafman@redhat.com>
(cherry picked from commit 367c32c69a512d2bea85a9b3860ec28bb4433750)

7 years ago tests: osd-scrub-snaps.sh minor cleanup
David Zafman [Tue, 15 Aug 2017 21:46:40 +0000 (14:46 -0700)]
tests: osd-scrub-snaps.sh minor cleanup

Signed-off-by: David Zafman <dzafman@redhat.com>
(cherry picked from commit 9f3d970a0dbb754ef34ffafa1c3e8d0d3c5982c7)

7 years ago osd: include front_iface+back_iface in metadata 17193/head
John Spray [Wed, 9 Aug 2017 11:09:55 +0000 (07:09 -0400)]
osd: include front_iface+back_iface in metadata

Fixes: http://tracker.ceph.com/issues/20956
Signed-off-by: John Spray <john.spray@redhat.com>
(cherry picked from commit 79491c15473310b2ac315f8698b11bdde588b20d)

7 years ago common: return iface instead of addr from ipaddr.cc helpers
John Spray [Wed, 9 Aug 2017 11:08:58 +0000 (07:08 -0400)]
common: return iface instead of addr from ipaddr.cc helpers

So that we can use the same helper functions to look
up interface names that we use to look up addresses.

Signed-off-by: John Spray <john.spray@redhat.com>
(cherry picked from commit 687ea102697b2ec64b503f16c9aaeb46bee5da99)

7 years ago mon/OSDMonitor: do not send_pg_creates with stale info 17191/head
Kefu Chai [Thu, 17 Aug 2017 11:01:40 +0000 (19:01 +0800)]
mon/OSDMonitor: do not send_pg_creates with stale info

We reset "creating_pgs" with the newly accepted paxos proposal, but
creating_pgs_by_osd_epoch is then out of sync with the new creating_pgs,
so we risk using a stale creating_pgs_by_osd_epoch along with the new
creating_pgs.pgs. To avoid this race, we need to check
creating_pgs_epoch before sending pg-creates using
creating_pgs_by_osd_epoch.

Fixes: http://tracker.ceph.com/issues/20785
Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit 689cf03f5515bb116ddf3966ef57550baf97eee9)
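
A rough sketch of the guard described above, in Python rather than the monitor's C++. The function name and arguments are hypothetical, and comparing the two epochs for equality is an assumption: the commit message only says that the epoch is checked before the per-OSD index is used.

```python
def pg_creates_to_send(creating_pgs_epoch, by_osd_epoch, by_osd):
    """Return pg-creates per OSD, or nothing if the per-OSD index is stale.

    creating_pgs_epoch: epoch of the current creating_pgs (hypothetical arg)
    by_osd_epoch:       epoch at which the per-OSD index was built
    by_osd:             mapping of OSD id to the set of PG ids to create
    """
    # Assumption: an index built at a different epoch counts as stale.
    if by_osd_epoch != creating_pgs_epoch:
        return {}
    return {osd: sorted(pgs) for osd, pgs in by_osd.items()}
```

Returning nothing for a stale index simply defers the sends until the index has been rebuilt against the new creating_pgs.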

7 years ago qa: fix POOL_APP_NOT_ENABLED warning in krbd:unmap suite 17192/head
Ilya Dryomov [Thu, 10 Aug 2017 09:54:53 +0000 (11:54 +0200)]
qa: fix POOL_APP_NOT_ENABLED warning in krbd:unmap suite

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit ad715b3368522a3913ccaa233abc9729f728e328)