git.apps.os.sepia.ceph.com Git - ceph-ci.git/log
ceph-ci.git
8 years ago qa/tasks: rbd-mirror daemon not properly run in foreground mode
Jason Dillaman [Fri, 14 Jul 2017 14:32:28 +0000 (10:32 -0400)]
qa/tasks: rbd-mirror daemon not properly run in foreground mode

Fixes: http://tracker.ceph.com/issues/20630
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
8 years ago Merge pull request #16043 from jcsp/wip-dashboard-updates
Sage Weil [Fri, 14 Jul 2017 14:16:33 +0000 (09:16 -0500)]
Merge pull request #16043 from jcsp/wip-dashboard-updates

mgr: dashboard improvements

Reviewed-by: Kefu Chai <kchai@redhat.com>
8 years ago mgr/dashboard: update health display
John Spray [Thu, 22 Jun 2017 00:34:27 +0000 (20:34 -0400)]
mgr/dashboard: update health display

This takes account of the new health format, and also
expands and visually cleans up the front page,
where we put the health information.

Dark backgrounds make it much easier to use
red/amber/green colours to grab attention.

Signed-off-by: John Spray <john.spray@redhat.com>
8 years ago mgr: expose a MgrMap in PyModules
John Spray [Thu, 22 Jun 2017 00:12:58 +0000 (20:12 -0400)]
mgr: expose a MgrMap in PyModules

Signed-off-by: John Spray <john.spray@redhat.com>
8 years ago Merge pull request #16020 from jcsp/wip-20383
Sage Weil [Fri, 14 Jul 2017 14:04:16 +0000 (09:04 -0500)]
Merge pull request #16020 from jcsp/wip-20383

mgr: clean up daemon start process

Reviewed-by: Sage Weil <sage@redhat.com>
8 years ago Merge pull request #16338 from scienceluo/wip-doc-branch
Jos Collin [Fri, 14 Jul 2017 13:58:15 +0000 (13:58 +0000)]
Merge pull request #16338 from scienceluo/wip-doc-branch

doc/release-notes: Luminous release notes typo fixes

Reviewed-by: Jos Collin <jcollin@redhat.com>
8 years ago doc/release-notes: Luminous release notes typo fixes "systemctl ceph-osd.target"...
Luo Kexue [Fri, 14 Jul 2017 10:17:07 +0000 (18:17 +0800)]
doc/release-notes: Luminous release notes typo fixes "systemctl ceph-osd.target"->"systemctl restart ceph-osd.target" and "systemctl ceph-mgr.target"->"systemctl restart ceph-mgr.target"

Signed-off-by: Luo Kexue <luo.kexue@zte.com.cn>
8 years ago Merge pull request #16318 from smithfarm/wip-jewel-10-2-9
Nathan Cutler [Fri, 14 Jul 2017 10:07:18 +0000 (12:07 +0200)]
Merge pull request #16318 from smithfarm/wip-jewel-10-2-9

doc: Jewel v10.2.9 release notes

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
8 years ago Merge pull request #16330 from scienceluo/master
Kefu Chai [Fri, 14 Jul 2017 08:31:42 +0000 (16:31 +0800)]
Merge pull request #16330 from scienceluo/master

doc/release-notes: Luminous release notes typo fixes  "ceph config-key ls"->"ceph config-key list"

Reviewed-by: xie xingguo <xie.xingguo@zte.com.cn>
Reviewed-by: Kefu Chai <kchai@redhat.com>
8 years ago doc: fix release-notes
scienceluo [Thu, 13 Jul 2017 13:05:58 +0000 (21:05 +0800)]
doc: fix release-notes

Signed-off-by: luo.kexue@zte.com.cn
8 years ago Merge pull request #16300 from liewegas/wip-20600
Sage Weil [Fri, 14 Jul 2017 03:16:39 +0000 (22:16 -0500)]
Merge pull request #16300 from liewegas/wip-20600

mon: fix hang on deprecated/removed 'pg set_*full_ratio' commands

Reviewed-by: Greg Farnum <gfarnum@redhat.com>
8 years ago Merge pull request #16321 from Yan-waller/wip-walle-0712cephosd
Sage Weil [Fri, 14 Jul 2017 03:16:16 +0000 (22:16 -0500)]
Merge pull request #16321 from Yan-waller/wip-walle-0712cephosd

common: misc cleanups in common, global, os, osd submodules

Reviewed-by: Jos Collin <jcollin@redhat.com>
8 years ago Merge pull request #16322 from liewegas/wip-20617
Sage Weil [Fri, 14 Jul 2017 03:15:03 +0000 (22:15 -0500)]
Merge pull request #16322 from liewegas/wip-20617

qa/tasks/ceph_manager: wait longer for pg stats to flush

Reviewed-by: Kefu Chai <kchai@redhat.com>
8 years ago Merge pull request #16323 from ceph/revert-15897-wip-20390
Sage Weil [Fri, 14 Jul 2017 03:14:35 +0000 (22:14 -0500)]
Merge pull request #16323 from ceph/revert-15897-wip-20390

Revert "msg/async: increase worker reference with local listen table enabled backend"

8 years ago Merge pull request #16319 from tchaikov/wip-ceph-helper-with-exp-features
Kefu Chai [Fri, 14 Jul 2017 03:13:57 +0000 (11:13 +0800)]
Merge pull request #16319 from tchaikov/wip-ceph-helper-with-exp-features

qa/workunits/ceph-helpers: enable experimental features for osd

Reviewed-by: Loic Dachary <ldachary@redhat.com>
8 years ago Merge pull request #16320 from tchaikov/wip-clang-analyzer-warnings
Kefu Chai [Fri, 14 Jul 2017 03:10:52 +0000 (11:10 +0800)]
Merge pull request #16320 from tchaikov/wip-clang-analyzer-warnings

test,mon,msg: kill clang analyzer warnings

Reviewed-by: Haomai Wang <haomai@xsky.com>
Reviewed-by: Brad Hubbard <bhubbard@redhat.com>
8 years ago doc: Jewel v10.2.9 changelog
Nathan Cutler [Thu, 13 Jul 2017 21:13:46 +0000 (23:13 +0200)]
doc: Jewel v10.2.9 changelog

Signed-off-by: Nathan Cutler <ncutler@suse.com>
8 years ago mon/PGMonitor: EOPNOTSUPP for old pgmon commands
Sage Weil [Thu, 13 Jul 2017 17:59:26 +0000 (13:59 -0400)]
mon/PGMonitor: EOPNOTSUPP for old pgmon commands

This includes 'pg set_full_ratio', which we keep only for the upgrade and
which goes away afterwards.

Also, return true to either swallow the request or indicate it has been
processed.

Fixes: http://tracker.ceph.com/issues/20600
Signed-off-by: Sage Weil <sage@redhat.com>
8 years ago mon: do not assign to never-read variable
Kefu Chai [Thu, 13 Jul 2017 10:44:45 +0000 (18:44 +0800)]
mon: do not assign to never-read variable

this silences clang analyzer's warning of

Value stored to 'err' is never read

Signed-off-by: Kefu Chai <kchai@redhat.com>
8 years ago msg/async/rdma: return stored errno on error
Kefu Chai [Thu, 13 Jul 2017 10:42:11 +0000 (18:42 +0800)]
msg/async/rdma: return stored errno on error

otherwise the errno would be overwritten, and we are returning 0 or the
errno set by ::close()

Signed-off-by: Kefu Chai <kchai@redhat.com>
8 years ago test: test_c_headers: silence clang analyzer warnings
Kefu Chai [Thu, 13 Jul 2017 10:38:36 +0000 (18:38 +0800)]
test: test_c_headers: silence clang analyzer warnings

this silences clang analyzer's warnings like:

Value stored to 'ret' is never read

Signed-off-by: Kefu Chai <kchai@redhat.com>
8 years ago Merge pull request #16262 from liewegas/wip-20208
Gregory Farnum [Thu, 13 Jul 2017 16:37:42 +0000 (09:37 -0700)]
Merge pull request #16262 from liewegas/wip-20208

mgr/ClusterState: do not mangle PGMap outside of Incremental

8 years ago Merge pull request #9974 from weiqiaomiao/wqm-wip-copy_obj
Yehuda Sadeh [Thu, 13 Jul 2017 16:36:12 +0000 (09:36 -0700)]
Merge pull request #9974 from weiqiaomiao/wqm-wip-copy_obj

rgw: fix memory leak in copy_obj_to_remote_dest

Reviewed-by: Orit Wasserman <owasserm@redhat.com>
8 years ago Merge pull request #11124 from zhangsw/cleanup-rgwrados-deleteobj
Yehuda Sadeh [Thu, 13 Jul 2017 16:22:59 +0000 (09:22 -0700)]
Merge pull request #11124 from zhangsw/cleanup-rgwrados-deleteobj

rgw: remove a redundant judgement in rgw_rados.cc:delete_obj.

Reviewed-by: Yehuda Sadeh <yehuda@redhat.com>
8 years ago Merge pull request #12010 from zhangsw/fix-rgw-multipart-bug
Yehuda Sadeh [Thu, 13 Jul 2017 16:19:36 +0000 (09:19 -0700)]
Merge pull request #12010 from zhangsw/fix-rgw-multipart-bug

rgw: Fix a bug that multipart upload may exceed the quota.

Reviewed-by: Yehuda Sadeh <yehuda@redhat.com>
8 years ago Merge pull request #12197 from zhangsw/fix-rgw-metasync-lock-bug
Yehuda Sadeh [Thu, 13 Jul 2017 16:14:45 +0000 (09:14 -0700)]
Merge pull request #12197 from zhangsw/fix-rgw-metasync-lock-bug

rgw: lock is not released when setting the sync marker fails.

Reviewed-by: Casey Bodley <cbodley@redhat.com>
8 years ago qa/tasks/ceph_manager: wait longer for pg stats to flush
Sage Weil [Thu, 13 Jul 2017 16:13:45 +0000 (12:13 -0400)]
qa/tasks/ceph_manager: wait longer for pg stats to flush

An ill-timed mgr restart could blow the current 15s wait.

Signed-off-by: Sage Weil <sage@redhat.com>
8 years ago Merge pull request #16021 from joscollin/wip-uninitialized-pointer-fields-1
Yehuda Sadeh [Thu, 13 Jul 2017 16:12:06 +0000 (09:12 -0700)]
Merge pull request #16021 from joscollin/wip-uninitialized-pointer-fields-1

rgw: Initialize pointer fields

Reviewed-by: Yehuda Sadeh <yehuda@redhat.com>
8 years ago Merge pull request #16014 from xiexingguo/wip-autoclass
Sage Weil [Thu, 13 Jul 2017 15:50:45 +0000 (10:50 -0500)]
Merge pull request #16014 from xiexingguo/wip-autoclass

osd/OSD: auto class on osd start up

Reviewed-by: Sage Weil <sage@redhat.com>
8 years ago Merge pull request #15774 from shashalu/drop-temp-var
Yuri Weinstein [Thu, 13 Jul 2017 15:41:19 +0000 (08:41 -0700)]
Merge pull request #15774 from shashalu/drop-temp-var

rgw: using RGW_OBJ_NS_MULTIPART in check_bad_index_multipart

Reviewed-by: Casey Bodley <cbodley@redhat.com>
8 years ago doc: Jewel v10.2.9 release notes
Nathan Cutler [Thu, 13 Jul 2017 09:59:37 +0000 (11:59 +0200)]
doc: Jewel v10.2.9 release notes

Signed-off-by: Nathan Cutler <ncutler@suse.com>
8 years ago Revert "msg/async: increase worker reference with local listen table enabled backend"
Haomai Wang [Thu, 13 Jul 2017 15:19:11 +0000 (23:19 +0800)]
Revert "msg/async: increase worker reference with local listen table enabled backend"

8 years ago Merge pull request #16255 from trociny/wip-test-librbd-internals
Jason Dillaman [Thu, 13 Jul 2017 15:18:09 +0000 (11:18 -0400)]
Merge pull request #16255 from trociny/wip-test-librbd-internals

test/librbd: re-enable internal tests in ceph_test_librbd

Reviewed-by: Jason Dillaman <dillaman@redhat.com>
8 years ago Merge pull request #16130 from fullerdj/wip-djf-ceph-connect-timeout
Sage Weil [Thu, 13 Jul 2017 14:46:10 +0000 (09:46 -0500)]
Merge pull request #16130 from fullerdj/wip-djf-ceph-connect-timeout

ceph.in: Check return value when connecting

Reviewed-by: Kefu Chai <kchai@redhat.com>
Reviewed-by: Alfredo Deza <adeza@redhat.com>
8 years ago Merge pull request #16243 from markhpc/wip-bluestore-freelist-iterator
Sage Weil [Thu, 13 Jul 2017 14:43:46 +0000 (09:43 -0500)]
Merge pull request #16243 from markhpc/wip-bluestore-freelist-iterator

os/bluestore: Make BitmapFreelistManager kv iterator short lived.

Reviewed-by: Sage Weil <sage@redhat.com>
8 years ago Merge pull request #16269 from liewegas/wip-bluestore-deferred-pending
Sage Weil [Thu, 13 Jul 2017 14:37:40 +0000 (09:37 -0500)]
Merge pull request #16269 from liewegas/wip-bluestore-deferred-pending

os/bluestore: only submit deferred if there is any

Reviewed-by: xie xingguo <xie.xingguo@zte.com.cn>
Reviewed-by: Mark Nelson <mnelson@redhat.com>
8 years ago Merge pull request #16306 from liewegas/wip-reg11184-health
Sage Weil [Thu, 13 Jul 2017 14:33:27 +0000 (09:33 -0500)]
Merge pull request #16306 from liewegas/wip-reg11184-health

qa/suites/rados/singleton/all/reg11184: whitelist health warnings

8 years ago osd: cleanups
Yan Jun [Wed, 12 Jul 2017 06:20:33 +0000 (14:20 +0800)]
osd: cleanups

Signed-off-by: Yan Jun <yan.jun8@zte.com.cn>
8 years ago Merge pull request #16317 from tchaikov/wip-0-osd-is-not-an-error
Kefu Chai [Thu, 13 Jul 2017 13:59:40 +0000 (21:59 +0800)]
Merge pull request #16317 from tchaikov/wip-0-osd-is-not-an-error

qa/workunits/ceph-helpers: test wait_for_health_ok differently

Reviewed-by: Loic Dachary <ldachary@redhat.com>
8 years ago Merge pull request #16274 from smithfarm/wip-jewel-10-2-8
Kefu Chai [Thu, 13 Jul 2017 13:01:40 +0000 (21:01 +0800)]
Merge pull request #16274 from smithfarm/wip-jewel-10-2-8

doc: Jewel v10.2.8 release notes

Reviewed-by: Sage Weil <sage@redhat.com>
Reviewed-by: Gregory Farnum <gfarnum@redhat.com>
8 years ago Merge pull request #16267 from liewegas/wip-restful-defaults
Boris Ranto [Thu, 13 Jul 2017 12:04:01 +0000 (14:04 +0200)]
Merge pull request #16267 from liewegas/wip-restful-defaults

mgr/restful: bind to :: and update docs

Reviewed-by: John Spray <john.spray@redhat.com>
Reviewed-by: Boris Ranto <branto@redhat.com>
8 years ago Merge pull request #16311 from Songweibin/wip-update-info-affi
Kefu Chai [Thu, 13 Jul 2017 11:19:55 +0000 (19:19 +0800)]
Merge pull request #16311 from Songweibin/wip-update-info-affi

.mailmap, .organizationmap: Update Song Weibin information and affiliation

Reviewed-by: Abhishek Lekshmanan <abhishek@suse.com>
8 years ago Merge pull request #16303 from bstillwell/releases-doc-update-201707
Kefu Chai [Thu, 13 Jul 2017 11:14:30 +0000 (19:14 +0800)]
Merge pull request #16303 from bstillwell/releases-doc-update-201707

doc/releases: Update releases from Feb 2017 to July 2017

Reviewed-by: Kefu Chai <kchai@redhat.com>
8 years ago doc: add v10.2.8 changelog
Nathan Cutler [Thu, 13 Jul 2017 10:18:17 +0000 (12:18 +0200)]
doc: add v10.2.8 changelog

Signed-off-by: Nathan Cutler <ncutler@suse.com>
8 years ago qa/workunits/ceph-helpers: enable experimental features for osd
Kefu Chai [Thu, 13 Jul 2017 09:57:07 +0000 (17:57 +0800)]
qa/workunits/ceph-helpers: enable experimental features for osd

This matches the settings in vstart.sh. It is also handy for those
who are still developing on btrfs, which is now marked as an experimental
feature.

Signed-off-by: Kefu Chai <kchai@redhat.com>
8 years ago qa/workunits/ceph-helpers: test wait_for_health_ok differently
Kefu Chai [Thu, 13 Jul 2017 09:43:39 +0000 (17:43 +0800)]
qa/workunits/ceph-helpers: test wait_for_health_ok differently

Having 0 OSDs is no longer an error in the new health checking implemented
by OSDMap::check_health(). This case was treated as an error before; see
OSDMonitor::get_health(). An osdmap without any OSDs is fine, I think,
whereas an osdmap with 3 OSDs that are all down and out is an error, and
we do report it as one. So let's update the test instead.

Signed-off-by: Kefu Chai <kchai@redhat.com>
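
The rule described in this commit message can be illustrated with a minimal Python sketch. It is not the OSDMap::check_health() implementation: the `Osd` record and the `osd_health` helper are hypothetical names, and the sketch collapses every intermediate state (for example, only some OSDs down) into HEALTH_OK for brevity.

```python
from dataclasses import dataclass


@dataclass
class Osd:
    """Hypothetical stand-in for one OSD entry in an osdmap."""
    up: bool
    is_in: bool  # named is_in because "in" is a Python keyword


def osd_health(osds):
    """Return "HEALTH_OK" or "HEALTH_ERR" for a list of Osd records."""
    if not osds:
        # An osdmap without any OSDs is not an error.
        return "HEALTH_OK"
    if all(not o.up and not o.is_in for o in osds):
        # OSDs exist, but every one of them is down and out.
        return "HEALTH_ERR"
    # Simplification: real checks would raise warnings for partial outages.
    return "HEALTH_OK"
```

With these assumptions, an empty list is healthy, while three OSDs that are all down and out produce an error.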
8 years ago Merge pull request #16277 from smithfarm/wip-rn-script
Kefu Chai [Thu, 13 Jul 2017 08:48:52 +0000 (16:48 +0800)]
Merge pull request #16277 from smithfarm/wip-rn-script

tools: ceph-release-notes: handle an edge case

Reviewed-by: Kefu Chai <kchai@redhat.com>
8 years ago doc: release notes: note MDS regression in 10.2.8
Nathan Cutler [Wed, 12 Jul 2017 07:05:12 +0000 (09:05 +0200)]
doc: release notes: note MDS regression in 10.2.8

See the discussion in https://github.com/ceph/ceph/pull/16192

Signed-off-by: Nathan Cutler <ncutler@suse.com>
8 years ago Merge pull request #16264 from dillaman/wip-20571
Mykola Golub [Thu, 13 Jul 2017 07:57:42 +0000 (10:57 +0300)]
Merge pull request #16264 from dillaman/wip-20571

rbd-mirror: ignore permission errors on rbd_mirroring object

Reviewed-by: Mykola Golub <mgolub@mirantis.com>
8 years ago .mailmap, .organizationmap: Update Song Weibin information and affiliation
songweibin [Thu, 13 Jul 2017 03:42:17 +0000 (11:42 +0800)]
.mailmap, .organizationmap: Update Song Weibin information and affiliation

Signed-off-by: songweibin <song.weibin@zte.com.cn>
8 years ago qa/suites/rados/singleton/all/reg11184: whitelist health warnings
Sage Weil [Wed, 12 Jul 2017 22:39:24 +0000 (18:39 -0400)]
qa/suites/rados/singleton/all/reg11184: whitelist health warnings

Signed-off-by: Sage Weil <sage@redhat.com>
8 years ago Update releases from Feb 2017 to July 2017
Bryan Stillwell [Wed, 12 Jul 2017 21:38:59 +0000 (15:38 -0600)]
Update releases from Feb 2017 to July 2017

Pull in the latest releases from the past 5 months and fix some of the
links so they jump to the correct sections in the release notes.

Signed-off-by: Bryan Stillwell <bstillwell@godaddy.com>
8 years ago Merge pull request #16301 from dmick/master
Sage Weil [Wed, 12 Jul 2017 19:55:56 +0000 (14:55 -0500)]
Merge pull request #16301 from dmick/master

mgr: increase debug level for ticks 0 -> 10

8 years ago Mgr: increase debug level for ticks 0 -> 10
Dan Mick [Wed, 12 Jul 2017 19:40:01 +0000 (15:40 -0400)]
Mgr: increase debug level for ticks 0 -> 10

Signed-off-by: Dan Mick <dmick@redhat.com>
8 years ago mon/MonCommands: mark 'pg set_*_ratio' deprecated
Sage Weil [Wed, 12 Jul 2017 19:09:00 +0000 (15:09 -0400)]
mon/MonCommands: mark 'pg set_*_ratio' deprecated

Signed-off-by: Sage Weil <sage@redhat.com>
8 years ago Merge pull request #16263 from liupan1111/wip-fix-fio
Sage Weil [Wed, 12 Jul 2017 18:21:18 +0000 (13:21 -0500)]
Merge pull request #16263 from liupan1111/wip-fix-fio

test/fio: remove experimental option for bluestore & rocksdb.

Reviewed-by: Casey Bodley <cbodley@redhat.com>
8 years ago Merge pull request #15643 from liewegas/wip-health
Sage Weil [Wed, 12 Jul 2017 17:10:47 +0000 (12:10 -0500)]
Merge pull request #15643 from liewegas/wip-health

mon: revamp health check/warning system

8 years ago mon/PGMap: adjust scrub checks to avoid overflow for future stamps
Sage Weil [Wed, 12 Jul 2017 13:17:55 +0000 (09:17 -0400)]
mon/PGMap: adjust scrub checks to avoid overflow for future stamps

Avoid an overflow (and false warning) when scrub stamps are in the future.

Signed-off-by: Sage Weil <sage@redhat.com>
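
The kind of overflow this commit message describes can be sketched as follows. The commit does not state the types involved, so the 32-bit unsigned arithmetic emulated here is an assumption, and `naive_age`, `safe_age` and `scrub_overdue` are hypothetical helpers rather than PGMap code.

```python
U32_MASK = 0xFFFFFFFF  # assumption: ages are computed in unsigned 32-bit seconds


def naive_age(now, stamp):
    """Emulate unsigned subtraction; wraps to a huge value if stamp > now."""
    return (now - stamp) & U32_MASK


def safe_age(now, stamp):
    """Treat a stamp that lies in the future as having age zero."""
    return now - stamp if stamp <= now else 0


def scrub_overdue(now, stamp, max_age):
    """True if the last scrub is older than max_age seconds."""
    return safe_age(now, stamp) > max_age
```

Under these assumptions a stamp five seconds in the future yields an enormous naive age, which would trigger a false "not scrubbed" warning, while the guarded version reports age zero.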
8 years ago qa/workunits/cephtool/test.sh: adjust full tests to avoid races
Sage Weil [Wed, 12 Jul 2017 12:10:47 +0000 (08:10 -0400)]
qa/workunits/cephtool/test.sh: adjust full tests to avoid races

OSDs may report fullness in any order.

Signed-off-by: Sage Weil <sage@redhat.com>
8 years ago qa/tasks/ceph: wait for osds to come up before creating pool
Sage Weil [Tue, 11 Jul 2017 03:48:47 +0000 (23:48 -0400)]
qa/tasks/ceph: wait for osds to come up before creating pool

Avoid health warnings.

Signed-off-by: Sage Weil <sage@redhat.com>
8 years ago qa/tasks/ceph_test_case.py: update health check helpers
Sage Weil [Tue, 11 Jul 2017 03:39:31 +0000 (23:39 -0400)]
qa/tasks/ceph_test_case.py: update health check helpers

Signed-off-by: Sage Weil <sage@redhat.com>
8 years ago qa/suites/fs: whitelist health warnings
Sage Weil [Mon, 10 Jul 2017 16:40:01 +0000 (12:40 -0400)]
qa/suites/fs: whitelist health warnings

Signed-off-by: Sage Weil <sage@redhat.com>
8 years ago qa/suites/rgw/thrash: whitelist
Sage Weil [Mon, 10 Jul 2017 16:39:50 +0000 (12:39 -0400)]
qa/suites/rgw/thrash: whitelist

Signed-off-by: Sage Weil <sage@redhat.com>
8 years ago qa/suites/rbd: whitelist health messages
Sage Weil [Mon, 10 Jul 2017 16:25:23 +0000 (12:25 -0400)]
qa/suites/rbd: whitelist health messages

Signed-off-by: Sage Weil <sage@redhat.com>
8 years ago qa: whitelist health warnings
Sage Weil [Thu, 6 Jul 2017 21:58:16 +0000 (17:58 -0400)]
qa: whitelist health warnings

Signed-off-by: Sage Weil <sage@redhat.com>
8 years ago qa/workunits/cephtool/test.sh: adjust for new health error codes
Sage Weil [Fri, 7 Jul 2017 03:24:52 +0000 (23:24 -0400)]
qa/workunits/cephtool/test.sh: adjust for new health error codes

Signed-off-by: Sage Weil <sage@redhat.com>
8 years ago mon/MgrMonitor: clear last_beacon after mon election
Sage Weil [Thu, 6 Jul 2017 21:53:34 +0000 (17:53 -0400)]
mon/MgrMonitor: clear last_beacon after mon election

The last_beacon map is local to an election interval; if there is a new
election completed we should reset it or else we may kill an apparently
laggy mgr that hasn't been able to get a beacon processed due to the mon
quorum changing, or had its beacon processed on a different leader.

Signed-off-by: Sage Weil <sage@redhat.com>
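
A minimal sketch of the idea in this commit message, assuming a simple per-mgr timestamp map. `BeaconTracker` and its methods are hypothetical and do not mirror the MgrMonitor code; in particular, how the real monitor repopulates the map after an election is not covered by the message.

```python
class BeaconTracker:
    """Hypothetical tracker of the last beacon time seen for each mgr."""

    def __init__(self, grace):
        self.grace = grace      # seconds without a beacon before a mgr is laggy
        self.last_beacon = {}   # mgr id -> time the last beacon was processed

    def on_beacon(self, mgr_id, now):
        self.last_beacon[mgr_id] = now

    def on_election_finished(self):
        # Times recorded in an earlier election interval are not comparable:
        # the mgr may have been unable to reach a quorum, or its beacon may
        # have been processed by a different leader. Start from scratch.
        self.last_beacon.clear()

    def laggy(self, now):
        """Return the ids of mgrs whose last beacon is older than the grace."""
        return sorted(m for m, t in self.last_beacon.items()
                      if now - t > self.grace)
```

In this sketch a mgr that looked laggy before the election is not flagged afterwards until a fresh beacon has been recorded and has itself aged past the grace period.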
8 years ago mon: clean up `osd out` messages
John Spray [Tue, 4 Jul 2017 22:37:25 +0000 (18:37 -0400)]
mon: clean up `osd out` messages

Cleaner prose for the auto-out case, and add
a cluster log message for OSDs that go out
at the behest of the administrator.

Signed-off-by: John Spray <john.spray@redhat.com>
8 years ago osd: don't log per-PG backfill messages at INFO level
John Spray [Tue, 4 Jul 2017 17:29:38 +0000 (13:29 -0400)]
osd: don't log per-PG backfill messages at INFO level

This behaviour led to way too many messages going to
the cluster log when an OSD is marked in.  Retain
the messages at debug level.

Signed-off-by: John Spray <john.spray@redhat.com>
8 years ago mon: simplify PG health checks
John Spray [Tue, 4 Jul 2017 13:52:59 +0000 (09:52 -0400)]
mon: simplify PG health checks

Instead of a distinct health check for each possible
PG state, group the states into categories (availability,
degraded, damage) and report on those.

That way, while a PG/pool is suffering from one of those
bad PG states, health conditions don't keep toggling on and
off as we transition from one unavailable state to another
unavailable state.

Signed-off-by: John Spray <john.spray@redhat.com>
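
The grouping described in this commit message might look like the sketch below. The three category names come from the message; the assignment of individual PG states to categories in `STATE_CATEGORY` is an assumption made for illustration, and `summarize` is a hypothetical helper.

```python
from collections import defaultdict

# Assumed mapping from PG state to health category (illustrative only).
STATE_CATEGORY = {
    "down": "availability",
    "incomplete": "availability",
    "degraded": "degraded",
    "undersized": "degraded",
    "inconsistent": "damage",
}


def summarize(pg_states):
    """Map each category to the sorted list of PG ids currently affected.

    pg_states: dict of pgid -> iterable of state names.
    """
    affected = defaultdict(set)
    for pgid, states in pg_states.items():
        for state in states:
            category = STATE_CATEGORY.get(state)
            if category is not None:
                affected[category].add(pgid)
    return {category: sorted(pgs) for category, pgs in affected.items()}
```

Because "down" and "incomplete" share a category here, a PG moving from one to the other leaves the reported result unchanged, which is the toggling the commit aims to avoid.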
8 years ago mgr/dashboard: update for new style health checks
John Spray [Mon, 26 Jun 2017 18:27:38 +0000 (14:27 -0400)]
mgr/dashboard: update for new style health checks

Signed-off-by: John Spray <john.spray@redhat.com>
8 years ago mon: demote cluster map prints to DEBUG level
John Spray [Fri, 23 Jun 2017 10:39:56 +0000 (06:39 -0400)]
mon: demote cluster map prints to DEBUG level

The PaxosService subclasses should be writing out
informative log messages, and not relying on
a stream of map summary prints to communicate
changes.

Signed-off-by: John Spray <john.spray@redhat.com>
8 years ago mon: prettify health check log messages
John Spray [Fri, 23 Jun 2017 10:37:53 +0000 (06:37 -0400)]
mon: prettify health check log messages

Add a "Cluster is now healthy" message after a series of
"health check cleared" messages, to make clear that
those were the last outstanding checks.

Convert certain health check messages into
well-formed sentences.

Don't print severity in the log string (it's already
expressed in the severity of the log entry).

Signed-off-by: John Spray <john.spray@redhat.com>
8 years ago mgr: fix spurious PG health messages on mgr restart
John Spray [Thu, 22 Jun 2017 21:41:35 +0000 (17:41 -0400)]
mgr: fix spurious PG health messages on mgr restart

Previously, the mgr would send MMonMgrReport indicating
a very unhappy PGMap to the mon right after startup.

This is a change to hold off on sending that report until
all the OSDs have reported in, or until some time has passed.

Signed-off-by: John Spray <john.spray@redhat.com>
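
The hold-off condition in this commit message can be sketched as a single predicate. The function name, its parameters and the `max_wait` value are hypothetical; the message only says "until some time has passed" and gives no concrete timeout.

```python
def should_send_report(expected_osds, reported_osds, started_at, now, max_wait):
    """Decide whether the mgr may send its PG report to the mon.

    expected_osds / reported_osds: iterables of OSD ids.
    started_at / now: timestamps in seconds.
    max_wait: assumed upper bound on the hold-off, in seconds.
    """
    if set(expected_osds) <= set(reported_osds):
        # Every OSD has reported in, so the PGMap is meaningful.
        return True
    # Otherwise, only report once we have waited long enough.
    return now - started_at >= max_wait
```

For example, with three expected OSDs and two reports in after 5 seconds of a 30-second window, the sketch holds the report back; it releases it as soon as the third OSD reports or the window expires.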
8 years ago mon: don't consider a starting mgr to be an error
John Spray [Thu, 22 Jun 2017 15:18:32 +0000 (11:18 -0400)]
mon: don't consider a starting mgr to be an error

The .available flag is there to tell MgrClients whether
to try and connect -- it isn't the right condition
for a health complaint.

Signed-off-by: John Spray <john.spray@redhat.com>
8 years ago mon: pass new style health to mgr
John Spray [Thu, 22 Jun 2017 00:14:19 +0000 (20:14 -0400)]
mon: pass new style health to mgr

Signed-off-by: John Spray <john.spray@redhat.com>
8 years ago mon: prefix periodic health reminder with 'overall'
Sage Weil [Fri, 30 Jun 2017 16:28:27 +0000 (12:28 -0400)]
mon: prefix periodic health reminder with 'overall'

...so we can whitelist it.

Signed-off-by: Sage Weil <sage@redhat.com>
8 years ago mon/PGMap: rename a few health checks
Sage Weil [Fri, 30 Jun 2017 03:30:17 +0000 (23:30 -0400)]
mon/PGMap: rename a few health checks

Signed-off-by: Sage Weil <sage@redhat.com>
8 years ago mgr/DaemonServer: debug log health checks
Sage Weil [Thu, 29 Jun 2017 22:22:24 +0000 (18:22 -0400)]
mgr/DaemonServer: debug log health checks

Signed-off-by: Sage Weil <sage@redhat.com>
8 years ago mon/MgrStatMonitor: show health check count on receipt
Sage Weil [Thu, 29 Jun 2017 22:20:28 +0000 (18:20 -0400)]
mon/MgrStatMonitor: show health check count on receipt

Signed-off-by: Sage Weil <sage@redhat.com>
8 years ago messages/MMonMgrReport: show health check count
Sage Weil [Thu, 29 Jun 2017 22:19:06 +0000 (18:19 -0400)]
messages/MMonMgrReport: show health check count

Signed-off-by: Sage Weil <sage@redhat.com>
8 years ago mon/PGMap: only warn about too few pgs after >0 pools exist
Sage Weil [Thu, 29 Jun 2017 20:08:38 +0000 (16:08 -0400)]
mon/PGMap: only warn about too few pgs after >0 pools exist

Signed-off-by: Sage Weil <sage@redhat.com>
8 years ago mon/PGMap: some stuck warnings are err, some warn
Sage Weil [Thu, 29 Jun 2017 16:38:49 +0000 (12:38 -0400)]
mon/PGMap: some stuck warnings are err, some warn

inactive and stale -> error
degraded, unclean, undersized -> warning

Signed-off-by: Sage Weil <sage@redhat.com>
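
The severity split listed in this commit message can be written down as a small lookup table. The state-to-severity assignments are taken from the message; the `worst_severity` helper and the choice to ignore unknown states are assumptions of this sketch.

```python
# Severity per stuck state, as listed in the commit message.
STUCK_SEVERITY = {
    "inactive": "HEALTH_ERR",
    "stale": "HEALTH_ERR",
    "degraded": "HEALTH_WARN",
    "unclean": "HEALTH_WARN",
    "undersized": "HEALTH_WARN",
}


def worst_severity(stuck_states):
    """Return the most severe status implied by a collection of stuck states."""
    severities = {STUCK_SEVERITY[s] for s in stuck_states if s in STUCK_SEVERITY}
    if "HEALTH_ERR" in severities:
        return "HEALTH_ERR"
    if severities:
        return "HEALTH_WARN"
    return "HEALTH_OK"
```

A cluster with PGs stuck both degraded and stale would therefore report an error, since stale outranks degraded.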
8 years ago qa/suites/rados: whitelist health warnings
Sage Weil [Tue, 20 Jun 2017 19:29:28 +0000 (15:29 -0400)]
qa/suites/rados: whitelist health warnings

Signed-off-by: Sage Weil <sage@redhat.com>
8 years ago mon/PGMap: do not warn about recovering, peering, stale
Sage Weil [Wed, 28 Jun 2017 04:32:50 +0000 (00:32 -0400)]
mon/PGMap: do not warn about recovering, peering, stale

Wait for stuck before complaining.  These aren't scary in and of
themselves.

Signed-off-by: Sage Weil <sage@redhat.com>
8 years ago qa/tasks/mon_clock_skew_check: vastly simplify
Sage Weil [Tue, 27 Jun 2017 18:57:53 +0000 (14:57 -0400)]
qa/tasks/mon_clock_skew_check: vastly simplify

Signed-off-by: Sage Weil <sage@redhat.com>
8 years ago mon: shorten 'stuck' threshold from 5m -> 1m
Sage Weil [Tue, 27 Jun 2017 18:09:22 +0000 (14:09 -0400)]
mon: shorten 'stuck' threshold from 5m -> 1m

Signed-off-by: Sage Weil <sage@redhat.com>
8 years ago osd/OSDMap: add per-osd flag OSD_FLAGS check
Sage Weil [Tue, 27 Jun 2017 19:01:16 +0000 (15:01 -0400)]
osd/OSDMap: add per-osd flag OSD_FLAGS check

rename OSD_FLAGS to OSDMAP_FLAGS

Signed-off-by: Sage Weil <sage@redhat.com>
8 years ago osd/OSDMap: rename a few health checks
Sage Weil [Wed, 5 Jul 2017 16:06:16 +0000 (12:06 -0400)]
osd/OSDMap: rename a few health checks

Signed-off-by: Sage Weil <sage@redhat.com>
8 years ago mon: move osd health checks into OSDMap method
Sage Weil [Fri, 23 Jun 2017 13:45:42 +0000 (09:45 -0400)]
mon: move osd health checks into OSDMap method

...with one check moving into HealthMonitor where it
belongs.

Signed-off-by: Sage Weil <sage@redhat.com>
8 years ago qa/tasks/ceph: stop logging health on shutdown
Sage Weil [Wed, 21 Jun 2017 18:02:05 +0000 (14:02 -0400)]
qa/tasks/ceph: stop logging health on shutdown

Don't log health during actual teardown or we'll see
various scary messages unrelated to our test run.

Signed-off-by: Sage Weil <sage@redhat.com>
8 years ago mon/MgrMonitor: do not issue MGR_DOWN on new cluster
Sage Weil [Tue, 20 Jun 2017 16:44:18 +0000 (12:44 -0400)]
mon/MgrMonitor: do not issue MGR_DOWN on new cluster

It is normal for the initial cluster to lack a mgr.  Wait for some
grace period before complaining about a missing mgr.

Default to 30m.

Signed-off-by: Sage Weil <sage@redhat.com>
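
A minimal sketch of the grace-period logic in this commit message. The 30-minute default is from the message; the function, its parameters, and the rule that a cluster which previously had a mgr gets no grace are assumptions for illustration.

```python
MGR_DOWN_GRACE = 30 * 60  # seconds; the commit message defaults to 30m


def mgr_down_check(has_active_mgr, ever_had_mgr, cluster_created_at, now,
                   grace=MGR_DOWN_GRACE):
    """Return True if MGR_DOWN should be raised."""
    if has_active_mgr:
        return False
    if ever_had_mgr:
        # Assumption: losing a mgr that once existed is reported at once.
        return True
    # A brand-new cluster is allowed to lack a mgr for the grace period.
    return now - cluster_created_at > grace
```

Under these assumptions a cluster created one minute ago without a mgr stays quiet, and the same cluster starts complaining once it is more than 30 minutes old.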
8 years ago mon/Monitor: periodically log new-style health warnings to log
Sage Weil [Tue, 27 Jun 2017 18:33:00 +0000 (14:33 -0400)]
mon/Monitor: periodically log new-style health warnings to log

Signed-off-by: Sage Weil <sage@redhat.com>
8 years ago mon/MDSMonitor: implement new-style cephfs health checks
Sage Weil [Thu, 15 Jun 2017 02:23:42 +0000 (22:23 -0400)]
mon/MDSMonitor: implement new-style cephfs health checks

Our detail elements are still strings, so we keep the bit that collapses
the metadata into a string and appends it to the string.

Each MDS-generated item becomes a detail record.

Health checks are consolidated either by MDS_ or FS_, counting the
number of mds servers or file systems affected.

Signed-off-by: Sage Weil <sage@redhat.com>
8 years ago mon,mgr: pass new-style health checks from mgr's PGMap
Sage Weil [Tue, 13 Jun 2017 19:08:19 +0000 (15:08 -0400)]
mon,mgr: pass new-style health checks from mgr's PGMap

Signed-off-by: Sage Weil <sage@redhat.com>
8 years ago mon/PGMap: implement new-style health checks
Sage Weil [Tue, 13 Jun 2017 19:07:46 +0000 (15:07 -0400)]
mon/PGMap: implement new-style health checks

Signed-off-by: Sage Weil <sage@redhat.com>
8 years ago mon/OSDMonitor: implement new health checks
Sage Weil [Tue, 13 Jun 2017 12:16:14 +0000 (08:16 -0400)]
mon/OSDMonitor: implement new health checks

Signed-off-by: Sage Weil <sage@redhat.com>
8 years ago mon: new health check framework
Sage Weil [Mon, 12 Jun 2017 22:39:15 +0000 (18:39 -0400)]
mon: new health check framework

Signed-off-by: Sage Weil <sage@redhat.com>
8 years ago mon: HealthMonitor -> OldHealthMonitor
Sage Weil [Mon, 12 Jun 2017 21:44:57 +0000 (17:44 -0400)]
mon: HealthMonitor -> OldHealthMonitor

This will go away post-luminous.

Signed-off-by: Sage Weil <sage@redhat.com>
8 years ago mon: remove Formatter arg to QuorumService::get_health()
Sage Weil [Mon, 12 Jun 2017 20:15:27 +0000 (16:15 -0400)]
mon: remove Formatter arg to QuorumService::get_health()

This is used to dump extra weirdness to the health detail structured
output, but we are about to remove all of that in luminous.

Signed-off-by: Sage Weil <sage@redhat.com>