git.apps.os.sepia.ceph.com Git - ceph.git/log
8 years agopybind/ceph_argparse: fix no arg check 15500/head
Sage Weil [Tue, 6 Jun 2017 13:23:32 +0000 (09:23 -0400)]
pybind/ceph_argparse: fix no arg check

This fixes breakage from commit a1214a702cfd8465deb91847f4dc63c0e80fb586
that caused a [] to be appended to some json argument lists.

Fixes: http://tracker.ceph.com/issues/20135
Signed-off-by: Sage Weil <sage@redhat.com>
8 years agoMerge pull request #14657 from chardan/jfw-wip-halflife_atomic_t-mothra
Sage Weil [Mon, 5 Jun 2017 18:35:27 +0000 (13:35 -0500)]
Merge pull request #14657 from chardan/jfw-wip-halflife_atomic_t-mothra

messenger,client,compressor: migrate atomic_t to std::atomic

Reviewed-by: Sage Weil <sage@redhat.com>
8 years agoMerge pull request #15380 from xiexingguo/wip-cache-trim
Sage Weil [Mon, 5 Jun 2017 18:35:01 +0000 (13:35 -0500)]
Merge pull request #15380 from xiexingguo/wip-cache-trim

os/bluestore: move cache_trim into MempoolThread

Reviewed-by: Igor Fedotov <ifedotov@mirantis.com>
8 years agoMerge pull request #15409 from Liuchang0812/wip-support-mon-target-in-pybind
Sage Weil [Mon, 5 Jun 2017 18:34:33 +0000 (13:34 -0500)]
Merge pull request #15409 from Liuchang0812/wip-support-mon-target-in-pybind

pybind: support mon target in pybind

Reviewed-by: Kefu Chai <kchai@redhat.com>
8 years agoMerge pull request #15469 from wjwithagen/wip-wjw-revert-wvla
Sage Weil [Mon, 5 Jun 2017 18:33:41 +0000 (13:33 -0500)]
Merge pull request #15469 from wjwithagen/wip-wjw-revert-wvla

build: revert -Wvla from #15342

Reviewed-by: Kefu Chai <kchai@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
8 years agoMerge pull request #15470 from xiexingguo/wip-rm-false-assert
Sage Weil [Mon, 5 Jun 2017 18:33:16 +0000 (13:33 -0500)]
Merge pull request #15470 from xiexingguo/wip-rm-false-assert

os/bluestore: fix false asserts in Cache::trim_all()

Reviewed-by: Sage Weil <sage@redhat.com>
8 years agoMerge pull request #15478 from xiexingguo/wip-perf-avg-time
Sage Weil [Mon, 5 Jun 2017 18:32:53 +0000 (13:32 -0500)]
Merge pull request #15478 from xiexingguo/wip-perf-avg-time

common/perf_counters: add average time for PERFCOUNTER_TIME

Reviewed-by: Sage Weil <sage@redhat.com>
8 years agocommon/perf_counters: add average time for PERFCOUNTER_TIME 15478/head
xie xingguo [Mon, 5 Jun 2017 06:34:31 +0000 (14:34 +0800)]
common/perf_counters: add average time for PERFCOUNTER_TIME

Otherwise we'll have to calculate this manually, which is annoying.

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
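
As a rough illustration of what "average time" means for a time-type counter, the sketch below keeps a running sum and a sample count and derives the average on demand. The class and method names (`TimeCounter`, `tinc`, `avgtime`) are hypothetical stand-ins chosen for this example, not the actual perf_counters API.

```python
from dataclasses import dataclass


@dataclass
class TimeCounter:
    """Hypothetical stand-in for a time-type perf counter."""
    total_seconds: float = 0.0  # accumulated time across all samples
    count: int = 0              # number of samples recorded

    def tinc(self, seconds: float) -> None:
        """Record one timed sample."""
        self.total_seconds += seconds
        self.count += 1

    def avgtime(self) -> float:
        """Average time per sample; 0.0 when nothing was recorded."""
        return self.total_seconds / self.count if self.count else 0.0
```

Exposing the average next to the sum and count saves the reader of a counter dump from doing the division by hand.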
8 years agoos/bluestore: fix false asserts in Cache::trim_all() 15470/head
xie xingguo [Mon, 5 Jun 2017 03:29:39 +0000 (11:29 +0800)]
os/bluestore: fix false asserts in Cache::trim_all()

These asserts are true if we are going to shut down the BlueStore instance.
But the caller can also be something like "ceph daemon out/osd.1.asok flush_store_cache",
which can fire these asserts as a result.

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
8 years agoMerge pull request #15231 from smithfarm/wip-20052
Nathan Cutler [Mon, 5 Jun 2017 09:38:40 +0000 (11:38 +0200)]
Merge pull request #15231 from smithfarm/wip-20052

build/ops: rpm: make librbd1 %post scriptlet depend on coreutils

Reviewed-by: Ken Dreyer <kdreyer@redhat.com>
8 years agotool/ceph: pep8 clean up 15409/head
liuchang0812 [Wed, 31 May 2017 14:09:09 +0000 (22:09 +0800)]
tool/ceph: pep8 clean up

Signed-off-by: liuchang0812 <liuchang0812@gmail.com>
8 years agopybind: support mon target, and clean up tool
liuchang0812 [Wed, 31 May 2017 13:03:38 +0000 (21:03 +0800)]
pybind: support mon target, and clean up tool

Signed-off-by: liuchang0812 <liuchang0812@gmail.com>
8 years agoMerge pull request #15165 from badone/wip-cls-optimize-header-file-dependency
Kefu Chai [Mon, 5 Jun 2017 02:15:56 +0000 (10:15 +0800)]
Merge pull request #15165 from badone/wip-cls-optimize-header-file-dependency

cls: optimize header file dependency

Reviewed-by: Sage Weil <sage@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
8 years agobuild: revert -Wvla from #15342 15469/head
Willem Jan Withagen [Sun, 4 Jun 2017 19:25:08 +0000 (21:25 +0200)]
build: revert -Wvla from #15342

 - VLAs are in GCC and Clang, and are there to stay forever,
   if only to be compatible with all the software that is already
   out there.
   - Theoretical debates about VLAs being hard to implement are
     long superseded by the actual implementations
 - Before setting this flag it would be required to first start
   work on fixing all the fallout/warnings that will arise from
   setting -Wvla
 - Allocating large variables/structures on the stack could be asking
   for trouble, but chances that ceph tools are going to be running
   on small embedded devices are rather slim.

Signed-off-by: Willem Jan Withagen <wjw@digiware.nl>
8 years agoMerge pull request #15073 from liewegas/wip-mgr-stats
Sage Weil [Sun, 4 Jun 2017 18:36:01 +0000 (13:36 -0500)]
Merge pull request #15073 from liewegas/wip-mgr-stats

mon,mgr: extricate PGmap from monitor

Reviewed-by: Greg Farnum <gfarnum@redhat.com>
8 years agoqa/suites/upgrade/hammer-jewel-x: don't initially start mgr daemons 15073/head
Kefu Chai [Sun, 4 Jun 2017 05:19:10 +0000 (13:19 +0800)]
qa/suites/upgrade/hammer-jewel-x: don't initially start mgr daemons

Signed-off-by: Kefu Chai <kchai@redhat.com>
8 years agoqa/workunits/rados/test_health_warning: misc fixes
Kefu Chai [Sat, 3 Jun 2017 18:13:02 +0000 (02:13 +0800)]
qa/workunits/rados/test_health_warning: misc fixes

* do not let the osd shut itself down: enlarge osd_max_markdown_count and
  shorten osd_max_markdown_period
* do not shut down all osds in the last test. if all osds are shut down at
  the same time, none of them will get the updated osdmap after noup is
  unset. we should leave at least one of them up, so the gossip protocol
  can kick in and propagate the news to all osds.

Fixes: http://tracker.ceph.com/issues/20174
Signed-off-by: Kefu Chai <kchai@redhat.com>
8 years agomon/OSDMonitor: add a space after __func__ in log message
Kefu Chai [Sat, 3 Jun 2017 17:37:05 +0000 (01:37 +0800)]
mon/OSDMonitor: add a space after __func__ in log message

Signed-off-by: Kefu Chai <kchai@redhat.com>
8 years agoMerge pull request #15443 from trociny/wip-rbd_image_options_t
Jason Dillaman [Sat, 3 Jun 2017 13:43:11 +0000 (09:43 -0400)]
Merge pull request #15443 from trociny/wip-rbd_image_options_t

librbd: remove unused rbd_image_options_t ostream operator

Reviewed-by: Jason Dillaman <dillaman@redhat.com>
8 years agobuild/ops: rpm: make librbd1 %post scriptlet depend on coreutils 15231/head
Nathan Cutler [Tue, 23 May 2017 09:36:54 +0000 (11:36 +0200)]
build/ops: rpm: make librbd1 %post scriptlet depend on coreutils

Fixes: http://tracker.ceph.com/issues/20052
Signed-off-by: Giacomo Comes <comes@naic.edu>
Signed-off-by: Nathan Cutler <ncutler@suse.com>
8 years agoMerge pull request #15347 from tchaikov/wip-cmake
Kefu Chai [Sat, 3 Jun 2017 12:44:24 +0000 (20:44 +0800)]
Merge pull request #15347 from tchaikov/wip-cmake

cmake: rgw: do not link against boost in a wholesale

Reviewed-by: Casey Bodley <cbodley@redhat.com>
Reviewed-by: Nathan Cutler <ncutler@suse.com>
8 years agoMerge pull request #15353 from wjwithagen/wip-wjw-compare-object_catcher
Jos Collin [Sat, 3 Jun 2017 11:54:40 +0000 (17:24 +0530)]
Merge pull request #15353 from wjwithagen/wip-wjw-compare-object_catcher

test/osdc: fix comparison error and silence warning from -Wunused-value

Reviewed-by: Jos Collin <jcollin@redhat.com>
8 years agocmake: rgw: do not link against boost in a wholesale 15347/head
Kefu Chai [Mon, 29 May 2017 06:28:51 +0000 (14:28 +0800)]
cmake: rgw: do not link against boost in a wholesale

With the new Beast frontend, RGW now has a small Boost dependency [1] which was
being addressed by statically (and unconditionally) linking *all* the Boost
libraries. This patch ensures that only the necessary Boost components are
linked.

We use the target_link_libraries(<target> <item>...) [2] syntax to ensure that the
library dependencies are transitive: i.e. "when this target is linked into
another target then the libraries linked to this target will appear on the link
line for the other target too."

[1] The boost/asio/spawn.hpp header used by rgw_asio_frontend.cc depends on
    boost::coroutine/boost::context

[2] https://cmake.org/cmake/help/v3.3/command/target_link_libraries.html#libraries-for-both-a-target-and-its-dependents

Signed-off-by: Nathan Cutler <ncutler@suse.com>
Signed-off-by: Kefu Chai <kchai@redhat.com>
8 years agolibrbd: remove unused rbd_image_options_t ostream operator 15443/head
Mykola Golub [Sat, 3 Jun 2017 08:34:17 +0000 (10:34 +0200)]
librbd: remove unused rbd_image_options_t ostream operator

This fixes librbd crashes currently observed on master, when
debug is on, because `rbd_image_options_t` is typedef-ed `void *`
and its operator is used when attempting to print out an
address (`void *`) of any object.

Signed-off-by: Mykola Golub <mgolub@mirantis.com>
8 years agoos/bluestore: move cache_trim into MempoolThread 15380/head
xie xingguo [Sat, 27 May 2017 09:30:24 +0000 (17:30 +0800)]
os/bluestore: move cache_trim into MempoolThread

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
8 years agomon: clean up PGMapDigest encoding
Sage Weil [Fri, 2 Jun 2017 19:12:18 +0000 (15:12 -0400)]
mon: clean up PGMapDigest encoding

Remove compat cruft due to initial testing on bigbang.

Signed-off-by: Sage Weil <sage@redhat.com>
8 years agoMerge pull request #15119 from ceph/wip-rgw-config-docs
Casey Bodley [Fri, 2 Jun 2017 18:09:33 +0000 (14:09 -0400)]
Merge pull request #15119 from ceph/wip-rgw-config-docs

doc: mention certain conf vars should be in global

Reviewed-by: Daniel Gryniewicz <dang@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
8 years agoselinux: clip the ceph context to ceph-mgr also
Kefu Chai [Fri, 2 Jun 2017 08:17:04 +0000 (16:17 +0800)]
selinux: clip the ceph context to ceph-mgr also

Signed-off-by: Kefu Chai <kchai@redhat.com>
8 years agoqa/tasks: add a blacklist for flush_pg_stats()
Kefu Chai [Fri, 2 Jun 2017 06:41:26 +0000 (14:41 +0800)]
qa/tasks: add a blacklist for flush_pg_stats()

so we don't wait for marked out osds.

Signed-off-by: Kefu Chai <kchai@redhat.com>
8 years agomon/OSDMonitor: filter the creating_pgs added from pgmap
Kefu Chai [Fri, 2 Jun 2017 04:43:07 +0000 (12:43 +0800)]
mon/OSDMonitor: filter the creating_pgs added from pgmap

the creating_pgs added from pgmap might contain pgs whose containing
pools have been deleted. this is fine with the PGMonitor, as it has the
updated pg mapping which is consistent with itself. but it does not work
with OSDMonitor's creating_pgs, whose pg mapping is calculated by
itself. so we need to filter the pgmap's creating_pgs when adding them to
OSDMonitor's creating_pgs with the latest osdmap.get_pools().

Fixes: http://tracker.ceph.com/issues/20067
Signed-off-by: Kefu Chai <kchai@redhat.com>
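
The filtering step described above can be sketched as follows. This is a simplified model, not the OSDMonitor code: pgs are represented as hypothetical `(pool_id, seed)` tuples, and `existing_pools` stands in for the pool ids found in the latest osdmap.

```python
def filter_creating_pgs(pgmap_creating_pgs, existing_pools):
    """Keep only the pgs whose pool still exists in the latest osdmap.

    pgmap_creating_pgs: iterable of (pool_id, seed) tuples (assumed shape)
    existing_pools: collection of pool ids present in the latest osdmap
    """
    pools = set(existing_pools)
    return {pg for pg in pgmap_creating_pgs if pg[0] in pools}
```

For instance, with pools `{1, 3}` still present, a pg `(2, 0)` belonging to a deleted pool 2 is dropped before it reaches the monitor's own creating_pgs set.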
8 years agomon/MonmapMonitor: initialize new cluster monmap with default persistent features
Sage Weil [Fri, 2 Jun 2017 04:11:21 +0000 (00:11 -0400)]
mon/MonmapMonitor: initialize new cluster monmap with default persistent features

This is similar to what we do for OSDMonitor::create_initial().

Avoid setting these initial features just for the mon test that verifies
persistent features get set on a full quorum.

Signed-off-by: Sage Weil <sage@redhat.com>
8 years agomon: mgr: remove osd_stats map from PGMapDigest
Greg Farnum [Thu, 1 Jun 2017 05:19:02 +0000 (22:19 -0700)]
mon: mgr: remove osd_stats map from PGMapDigest

We use this information only for dumps. Stop dumping per-OSD stats as they're
not needed. In order to maintain pool "fullness" information, calculate
the OSDMap-based rule availability ratios on the monitor and include those
values in the PGMapDigest. Also do it whenever we call dump_pool_stats_full()
on the manager.

Signed-off-by: Greg Farnum <gfarnum@redhat.com>
8 years agomon: pgstat: remove unused get_osd_sum from post-luminous state
Greg Farnum [Thu, 1 Jun 2017 01:39:48 +0000 (18:39 -0700)]
mon: pgstat: remove unused get_osd_sum from post-luminous state

Signed-off-by: Greg Farnum <gfarnum@redhat.com>
8 years agomon: mgr: pgstats: do available-space calculations on mgr instead of mon
Greg Farnum [Thu, 1 Jun 2017 01:36:47 +0000 (18:36 -0700)]
mon: mgr: pgstats: do available-space calculations on mgr instead of mon

This means we don't need the osd_stat map on the monitor at all. We're
about to remove it entirely!

Signed-off-by: Greg Farnum <gfarnum@redhat.com>
8 years agomon: pgstat: remove unused get_pg_sum interface
Greg Farnum [Thu, 1 Jun 2017 01:28:03 +0000 (18:28 -0700)]
mon: pgstat: remove unused get_pg_sum interface

Signed-off-by: Greg Farnum <gfarnum@redhat.com>
8 years agomon: pgstat: remove get_num_pg_by_osd() from post-luminous state
Greg Farnum [Thu, 1 Jun 2017 00:52:54 +0000 (17:52 -0700)]
mon: pgstat: remove get_num_pg_by_osd() from post-luminous state

Signed-off-by: Greg Farnum <gfarnum@redhat.com>
8 years agomon: pgstat: remove get_osd_stat() from post-luminous state
Greg Farnum [Fri, 19 May 2017 07:00:42 +0000 (00:00 -0700)]
mon: pgstat: remove get_osd_stat() from post-luminous state

Signed-off-by: Greg Farnum <gfarnum@redhat.com>
8 years agoosdmonitor: use a new get_full_osd_counts to generate health without osdstat
Greg Farnum [Fri, 19 May 2017 06:56:56 +0000 (23:56 -0700)]
osdmonitor: use a new get_full_osd_counts to generate health without osdstat

Signed-off-by: Greg Farnum <gfarnum@redhat.com>
8 years agomon/OSDMonitor: print pgid before looking it up in mapping
Kefu Chai [Thu, 1 Jun 2017 07:14:27 +0000 (15:14 +0800)]
mon/OSDMonitor: print pgid before looking it up in mapping

Signed-off-by: Kefu Chai <kchai@redhat.com>
8 years agomon: handle MGetPoolStats using PGStatService
Sage Weil [Wed, 31 May 2017 14:19:33 +0000 (10:19 -0400)]
mon: handle MGetPoolStats using PGStatService

otherwise ceph_test_rados_api_stat: LibRadosStat.PoolStat will always
timeout once the cluster is switched to luminous

Signed-off-by: Kefu Chai <kchai@redhat.com>
8 years agomon: handle MStatfs using PGStatService
Sage Weil [Wed, 31 May 2017 14:19:25 +0000 (10:19 -0400)]
mon: handle MStatfs using PGStatService

otherwise ceph_test_rados_api_stat: LibRadosStat.ClusterStat will always
timeout once the cluster is switched to luminous

Signed-off-by: Kefu Chai <kchai@redhat.com>
8 years agomon/PGMap: strip out PGMapDigest compat cruft
Sage Weil [Wed, 31 May 2017 13:27:08 +0000 (09:27 -0400)]
mon/PGMap: strip out PGMapDigest compat cruft

This was needed for bigbang testing, but not for the final version.

Signed-off-by: Sage Weil <sage@redhat.com>
8 years agomgr: reset pending_inc after applying it
Kefu Chai [Mon, 29 May 2017 16:40:25 +0000 (00:40 +0800)]
mgr: reset pending_inc after applying it

we cannot apply pending_inc twice and expect the result to be the same. in
other words, pg_map.apply_incremental(pending_inc) is not an idempotent
operation.

Signed-off-by: Kefu Chai <kchai@redhat.com>
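
A toy model may help show why a non-idempotent apply calls for a reset. It assumes, purely for illustration, that an incremental carries a numeric delta; `ToyMap` and `apply_and_reset` are hypothetical names, not mgr code.

```python
class ToyMap:
    """Toy map whose incrementals carry a delta (an assumption for this sketch)."""

    def __init__(self):
        self.total = 0

    def apply_incremental(self, inc):
        # Adding a delta is not idempotent: applying the same
        # incremental twice counts the delta twice.
        self.total += inc.get("delta", 0)


def apply_and_reset(toy_map, pending_inc):
    """Apply the pending incremental, then hand back a fresh empty one."""
    toy_map.apply_incremental(pending_inc)
    return {}
```

Because the caller replaces its pending incremental with the returned empty one, a second call before any new updates arrive leaves the map unchanged.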
8 years agoosd: do_shutdown takes precedence over fetching more maps
Sage Weil [Wed, 24 May 2017 22:40:44 +0000 (18:40 -0400)]
osd: do_shutdown takes precedence over fetching more maps

This is making my osd-markdown.sh test fail reliably.

Signed-off-by: Sage Weil <sage@redhat.com>
8 years agoosd/OSDMap: more efficient PGMapTemp
Sage Weil [Thu, 25 May 2017 16:13:58 +0000 (12:13 -0400)]
osd/OSDMap: more efficient PGMapTemp

Use a flat_map with pointers into a buffer with the actual data.  For a
decoded mapping, we have just two allocations (one for flat_map and one
for the encoded buffer).

This can get slow if you make lots of incremental changes after the fact
since flat_map is not efficient for modifications at large sizes.  :/

Signed-off-by: Sage Weil <sage@redhat.com>
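
The layout described above, a sorted flat index whose entries point into one shared buffer, can be approximated in a few lines. This is only a sketch of the idea under assumed encodings (a little-endian count followed by 32-bit OSD ids); `FlatTempMap` is a hypothetical name and does not mirror the C++ implementation.

```python
import bisect
import struct


class FlatTempMap:
    """Sorted key index plus offsets into a single byte buffer."""

    def __init__(self):
        self._keys = []           # sorted keys, e.g. (pool_id, seed) tuples
        self._offsets = []        # offset of each record in _buf, parallel to _keys
        self._buf = bytearray()   # all records, appended back to back

    def set(self, key, osds):
        # Record layout (assumed): uint32 count, then count int32 OSD ids.
        record = struct.pack("<I", len(osds)) + struct.pack(f"<{len(osds)}i", *osds)
        offset = len(self._buf)
        self._buf += record
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._offsets[i] = offset        # old record becomes dead space
        else:
            self._keys.insert(i, key)        # O(n) shift: slow for large maps
            self._offsets.insert(i, offset)

    def get(self, key):
        i = bisect.bisect_left(self._keys, key)
        if i == len(self._keys) or self._keys[i] != key:
            return None
        off = self._offsets[i]
        (count,) = struct.unpack_from("<I", self._buf, off)
        return list(struct.unpack_from(f"<{count}i", self._buf, off + 4))
```

Lookups are a binary search plus one decode, while each insertion of a new key shifts the tail of the index, which is the modification cost the commit message warns about.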
8 years agomon: print log for the creating_pgs changes
Kefu Chai [Wed, 24 May 2017 11:34:08 +0000 (19:34 +0800)]
mon: print log for the creating_pgs changes

print more log messages when updating creating_pgs.

see-also: http://tracker.ceph.com/issues/20067
Signed-off-by: Kefu Chai <kchai@redhat.com>
8 years agomgr: mgr_tick_period = 2
Sage Weil [Tue, 23 May 2017 20:35:57 +0000 (16:35 -0400)]
mgr: mgr_tick_period = 2

5 seconds is driving me nuts.  We cap the health message size so the
digest is now small and lightweight.

Signed-off-by: Sage Weil <sage@redhat.com>
8 years agomon/PGMap: count 'unknown' pgs
Sage Weil [Tue, 23 May 2017 20:35:30 +0000 (16:35 -0400)]
mon/PGMap: count 'unknown' pgs

Also, count "not active" (inactive) pgs instead of active so that we
list "bad" things consistently, and so that 'inactive' is a separate
bucket of pgs than the 'unknown' ones.

Signed-off-by: Sage Weil <sage@redhat.com>
8 years agomgr/MgrStandby: reset subscriptions when we become non-active
Sage Weil [Tue, 23 May 2017 19:17:34 +0000 (15:17 -0400)]
mgr/MgrStandby: reset subscriptions when we become non-active

This is a goofy workaround that we're also doing in Mgr::init().  Someday
we should come up with a more elegant solution.  In the meantime, this
works just fine!

Signed-off-by: Sage Weil <sage@redhat.com>
8 years agomgr/ClusterState: make pg stat filtering less fragile
Sage Weil [Tue, 23 May 2017 16:02:02 +0000 (12:02 -0400)]
mgr/ClusterState: make pg stat filtering less fragile

We want to drop updates for pgs for pools that don't exist.  Keep an
updated set of those pools instead of relying on the previous PGMap
having them instantiated.  (The previous map may drift due to bugs.)

Signed-off-by: Sage Weil <sage@redhat.com>
8 years agomgr/DaemonServer: log pgmap usage to cluster log
Sage Weil [Tue, 23 May 2017 14:05:45 +0000 (10:05 -0400)]
mgr/DaemonServer: log pgmap usage to cluster log

Signed-off-by: Sage Weil <sage@redhat.com>
8 years agomgr: apply PGMap incremental at same interval as reports
Sage Weil [Tue, 23 May 2017 13:49:16 +0000 (09:49 -0400)]
mgr: apply PGMap incremental at same interval as reports

We were doing an incremental per osd stat report; this screws up the
delta stats updates when there are more than a handful of OSDs.  Instead,
do it with the same period as the mgr->mon reports.

Signed-off-by: Sage Weil <sage@redhat.com>
8 years agomon/PGMap: encode delta info in digest
Sage Weil [Tue, 23 May 2017 13:43:36 +0000 (09:43 -0400)]
mon/PGMap: encode delta info in digest

It was already in PGMapDigest, but not encoded.

One field we didn't need; move that back to PGMap.

Signed-off-by: Sage Weil <sage@redhat.com>
8 years agomon/OSDMonitor: use newest creation epoch for pgs that we can
Sage Weil [Tue, 23 May 2017 03:39:09 +0000 (23:39 -0400)]
mon/OSDMonitor: use newest creation epoch for pgs that we can

If we have a huge pool it may take a while for the PGs to get out of the
queue and be created.  If we use the epoch the pool was created it may
mean a lot of old OSDMaps the OSD has to process.  If we use the current
epoch (the first epoch in which any OSD learned that this PG should
exist) we limit PastIntervals as much as possible.

It is still possible that we start trying to create a PG but the cluster
is unhealthy for a long time, resulting in a long PastIntervals that
needs to be generated by a primary OSD when it eventually comes up. So
this only partially

Partially-fixes: http://tracker.ceph.com/issues/20050
Signed-off-by: Sage Weil <sage@redhat.com>
8 years agomon/OSDMonitor: clean up no-beacon message
Sage Weil [Tue, 23 May 2017 02:58:44 +0000 (22:58 -0400)]
mon/OSDMonitor: clean up no-beacon message

Signed-off-by: Sage Weil <sage@redhat.com>
8 years agomgr: drop useless __func__ prints
Sage Weil [Mon, 22 May 2017 22:47:40 +0000 (18:47 -0400)]
mgr: drop useless __func__ prints

This is part of the default prefix.

Signed-off-by: Sage Weil <sage@redhat.com>
8 years agoosd: work around bluestore fragmented buffers in get_map_bl
Sage Weil [Tue, 23 May 2017 21:07:17 +0000 (17:07 -0400)]
osd: work around bluestore fragmented buffers in get_map_bl

Signed-off-by: Sage Weil <sage@redhat.com>
8 years agotest: switch from xmlstarlet to jq
Kefu Chai [Sun, 21 May 2017 08:33:53 +0000 (16:33 +0800)]
test: switch from xmlstarlet to jq

Signed-off-by: Kefu Chai <kchai@redhat.com>
8 years agotools/ceph-monstore-update-crush.sh: switch from xmlstarlet to jq
Kefu Chai [Sun, 21 May 2017 08:32:59 +0000 (16:32 +0800)]
tools/ceph-monstore-update-crush.sh: switch from xmlstarlet to jq

Signed-off-by: Kefu Chai <kchai@redhat.com>
8 years agodebian,rpm: add jq dependency to ceph-test
Kefu Chai [Sun, 21 May 2017 08:28:28 +0000 (16:28 +0800)]
debian,rpm: add jq dependency to ceph-test

Signed-off-by: Kefu Chai <kchai@redhat.com>
8 years agomon: speed up pg creates a bit
Sage Weil [Sat, 20 May 2017 21:54:09 +0000 (17:54 -0400)]
mon: speed up pg creates a bit

I don't see any noticeable load on bigbang cluster, so let's bump this up
a bit.  Not being super aggressive here, though, since pool creation is so
rare and who really cares if ginormous clusters take a few minutes to
create all the PGs; better to make sure the mon is happy and responsive
during setup.

Signed-off-by: Sage Weil <sage@redhat.com>
8 years agomon/PGMap: update osd_epoch in synchrony with osd_stat_updates
Sage Weil [Sat, 20 May 2017 21:25:11 +0000 (17:25 -0400)]
mon/PGMap: update osd_epoch in synchrony with osd_stat_updates

I'm not sure why this didn't bite us earlier, but there is an assert
in apply_incremental (not used in preluminous mon) and an implicit
dereference in PGMonitor::encode_pending (maybe didn't cause crash?)
that will trigger if we have an osd_stat_updates record without a matching
osd_epochs update.  Maybe there is some subtle reason why the osd_epochs
update happens elsewhere in master (it doesn't on the mgr), but my guess
is we were silently dereferencing the invalid iterator and not noticing.

Anyway, it's easy to fix.  We use the epoch from the previous PGMap.

Signed-off-by: Sage Weil <sage@redhat.com>
8 years agomon,mgr: update inc.osd_epochs when resetting osd_stat_updates
Kefu Chai [Sat, 20 May 2017 04:50:07 +0000 (12:50 +0800)]
mon,mgr: update inc.osd_epochs when resetting osd_stat_updates

otherwise we could hit the following backtrace in ceph-mgr:

 ceph version Development (no_version)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x137) [0x557c46f9d822]
 2: (PGMap::apply_incremental(CephContext*, PGMap::Incremental const&)+0x72f) [0x557c46cb0e7b]
 3: (ClusterState::notify_osdmap(OSDMap const&)+0x15c) [0x557c46d7fb90]
 4: (()+0xf9f761) [0x557c46dc0761]
 5: (()+0xfa1592) [0x557c46dc2592]
 6: (Mgr::handle_osd_map()+0x82) [0x557c46dc0952]
 7: (Mgr::ms_dispatch(Message*)+0x37d) [0x557c46dc0df9]
 8: (MgrStandby::ms_dispatch(Message*)+0x237) [0x557c46daa12b]
 9: (Messenger::ms_deliver_dispatch(Message*)+0xbf) [0x557c47379703]
 10: (DispatchQueue::entry()+0x623) [0x557c47378821]
 11: (DispatchQueue::DispatchThread::entry()+0x1c) [0x557c470f0ca8]
 12: (Thread::entry_wrapper()+0xc1) [0x557c47225e79]
 13: (Thread::_entry_func(void*)+0x18) [0x557c47225dae]
 14: (()+0x7494) [0x7f2ef43a4494]
 15: (clone()+0x3f) [0x7f2ef321993f]

Signed-off-by: Kefu Chai <kchai@redhat.com>
8 years agoosd/osd_types: make 0 pg state 'unknown' instead of 'inactive'
Sage Weil [Tue, 23 May 2017 20:25:14 +0000 (16:25 -0400)]
osd/osd_types: make 0 pg state 'unknown' instead of 'inactive'

The OSD never reports PGs this way; we'll only see it from a mgr
PGMap that hasn't filled in the state.

Signed-off-by: Sage Weil <sage@redhat.com>
8 years agomgr: simplify handling of new pgs/pools
Sage Weil [Fri, 19 May 2017 21:56:11 +0000 (17:56 -0400)]
mgr: simplify handling of new pgs/pools

Instantiate barebones pg records (creating+stale) in our PGMap when pgs
are created.  These will switch to 'creating' when the pg is in the
process of creating, and peering etc.  The 'stale' is an indicator that
the mon may not have even asked the pg to create them yet.

All of the old meticulous tracking in PGMap for mappings for creating
pgs is useless to us; OSDMonitor has new code to handle it.  This is
fast and simple.

Signed-off-by: Sage Weil <sage@redhat.com>
8 years agovstart: debug mon on mgr
Sage Weil [Fri, 19 May 2017 21:42:57 +0000 (17:42 -0400)]
vstart: debug mon on mgr

PGMap dout is tied to mon still; we want to see it on the mgr.

Signed-off-by: Sage Weil <sage@redhat.com>
8 years agomon/PGMap: fix deleted_pool cleanup
Sage Weil [Fri, 19 May 2017 21:42:42 +0000 (17:42 -0400)]
mon/PGMap: fix deleted_pool cleanup

- make sure num_pg_by_pool is cleared
- update_pool_deltas can repopulate pg_pool_sum; clear after that

Signed-off-by: Sage Weil <sage@redhat.com>
8 years agomgr/ClusterState: apply latest osdmap to pgmap
Sage Weil [Fri, 19 May 2017 21:01:55 +0000 (17:01 -0400)]
mgr/ClusterState: apply latest osdmap to pgmap

In particular, clear out deleted pools and clear osd stats for
deleted/down/out osds.

Signed-off-by: Sage Weil <sage@redhat.com>
8 years agomon/PGMap: new check_osd_map that takes a OSDMap& const
Sage Weil [Fri, 19 May 2017 21:01:22 +0000 (17:01 -0400)]
mon/PGMap: new check_osd_map that takes a OSDMap& const

The previous version takes an Incremental and requires that we see
every single consecutive map in the history.  This version is mgr-friendly
and just takes the latest OSDMap.  It's a bit simpler too because it
ignores the full/nearfull (legacy preluminous) and last_osd_report.

Signed-off-by: Sage Weil <sage@redhat.com>
8 years agoceph-kvstore-tool: implement 'rm' and 'rm-prefix' commands
Sage Weil [Fri, 19 May 2017 20:38:58 +0000 (16:38 -0400)]
ceph-kvstore-tool: implement 'rm' and 'rm-prefix' commands

Signed-off-by: Sage Weil <sage@redhat.com>
8 years agomon/PGMap: use int32_t, not int
Sage Weil [Fri, 19 May 2017 20:38:38 +0000 (16:38 -0400)]
mon/PGMap: use int32_t, not int

These are encoded! Be explicit.

Signed-off-by: Sage Weil <sage@redhat.com>
8 years agomon/PGMap: explicitly count pg per pool
Sage Weil [Fri, 19 May 2017 20:38:14 +0000 (16:38 -0400)]
mon/PGMap: explicitly count pg per pool

This will more reliably remove empty pools from pg_pool_sum.

Signed-off-by: Sage Weil <sage@redhat.com>
8 years agomon/MgrStatMonitor: fix version across restarts
Sage Weil [Fri, 19 May 2017 19:17:18 +0000 (15:17 -0400)]
mon/MgrStatMonitor: fix version across restarts

Signed-off-by: Sage Weil <sage@redhat.com>
8 years agomon/PGMap: cap health detail messages at 50 (configurable)
Sage Weil [Fri, 19 May 2017 15:48:15 +0000 (11:48 -0400)]
mon/PGMap: cap health detail messages at 50 (configurable)

There are two cases where we spew health detail warnings for potentially
every pg.  Cap those detail messages at 50 and, if we exceed that, include
a message saying how many more there are.  This avoids huge lists of
detail messages going from the mgr to mon and also makes life better for
users of the health detail api.

Signed-off-by: Sage Weil <sage@redhat.com>
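
The capping behaviour can be sketched like this. The function name, the parameter name `max_detail`, and the wording of the trailing summary line are assumptions made for the example; only the default of 50 and the "say how many more there are" behaviour come from the commit message.

```python
def cap_detail(messages, max_detail=50):
    """Return at most max_detail messages plus one summary line if truncated."""
    if len(messages) <= max_detail:
        return list(messages)
    capped = list(messages[:max_detail])
    # Hypothetical wording for the overflow notice.
    capped.append(f"{len(messages) - max_detail} more pgs are also affected")
    return capped
```

With 60 detail lines and the default cap, the caller gets the first 50 lines followed by a single line reporting the 10 that were omitted.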
8 years agomon/MgrStatMonitor: trim mgrstat states
Sage Weil [Fri, 19 May 2017 15:08:26 +0000 (11:08 -0400)]
mon/MgrStatMonitor: trim mgrstat states

We don't actually need any of these older states at all so I hard-coded
a constant (oh no!).  In reality it doesn't matter what it is anyway
since PaxosService waits for paxos_service_trim_min (=250) to accumulate
before removing anything.

Signed-off-by: Sage Weil <sage@redhat.com>
8 years agomon/MgrStatMonitor: wrap digest encoding in bufferlist
Sage Weil [Fri, 19 May 2017 15:07:34 +0000 (11:07 -0400)]
mon/MgrStatMonitor: wrap digest encoding in bufferlist

This is just for bigbang's benefit.

Signed-off-by: Sage Weil <sage@redhat.com>
8 years agomgr,mon: move 'osd {scrub,deep-scrub,repair}' handling to mgr
Sage Weil [Fri, 19 May 2017 14:25:28 +0000 (10:25 -0400)]
mgr,mon: move 'osd {scrub,deep-scrub,repair}' handling to mgr

Signed-off-by: Sage Weil <sage@redhat.com>
8 years agomgr/DaemonServer: use registered osd session to send scrub messages
Sage Weil [Wed, 31 May 2017 17:02:35 +0000 (13:02 -0400)]
mgr/DaemonServer: use registered osd session to send scrub messages

Signed-off-by: Sage Weil <sage@redhat.com>
8 years agomgr/DaemonServer: keep registry of osd sessions
Sage Weil [Fri, 19 May 2017 03:06:20 +0000 (23:06 -0400)]
mgr/DaemonServer: keep registry of osd sessions

Occasionally we send them messages.

Signed-off-by: Sage Weil <sage@redhat.com>
8 years agomon: move parse_* helpers into cmdparse
Sage Weil [Fri, 19 May 2017 14:25:43 +0000 (10:25 -0400)]
mon: move parse_* helpers into cmdparse

Signed-off-by: Sage Weil <sage@redhat.com>
8 years agotest/cephtool-test-mon: make ec test less sensitive to crush
Sage Weil [Fri, 19 May 2017 03:33:36 +0000 (23:33 -0400)]
test/cephtool-test-mon: make ec test less sensitive to crush

With only 3 we can get into a situation where one slot is CRUSH_ITEM_NONE.

Signed-off-by: Sage Weil <sage@redhat.com>
8 years agotest/osd/osd-dup.sh: flush_pg_stats before "ceph -s"
Kefu Chai [Fri, 19 May 2017 10:31:28 +0000 (18:31 +0800)]
test/osd/osd-dup.sh: flush_pg_stats before "ceph -s"

Signed-off-by: Kefu Chai <kchai@redhat.com>
8 years agoqa/workunits/ceph-helpers.sh: move flush_pg_stats() here
Kefu Chai [Fri, 19 May 2017 10:30:49 +0000 (18:30 +0800)]
qa/workunits/ceph-helpers.sh: move flush_pg_stats() here

Signed-off-by: Kefu Chai <kchai@redhat.com>
8 years agomon/PGMonitor: do not create/encode pending if luminous
Kefu Chai [Fri, 19 May 2017 09:53:27 +0000 (17:53 +0800)]
mon/PGMonitor: do not create/encode pending if luminous

Signed-off-by: Kefu Chai <kchai@redhat.com>
8 years agomon/PGMonitor: reset PGMonitor member vars when luminous
Kefu Chai [Fri, 19 May 2017 09:52:52 +0000 (17:52 +0800)]
mon/PGMonitor: reset PGMonitor member vars when luminous

Signed-off-by: Kefu Chai <kchai@redhat.com>
8 years agoqa/workunits/cephtool/test.sh: use jq instead of awk to select the required element
Kefu Chai [Fri, 19 May 2017 09:50:02 +0000 (17:50 +0800)]
qa/workunits/cephtool/test.sh: use jq instead of awk to select the required element

Signed-off-by: Kefu Chai <kchai@redhat.com>
8 years agoqa/workunits/cephtool/test.sh: use flush_pg_stats to sync mon with osd
Kefu Chai [Fri, 19 May 2017 06:54:58 +0000 (14:54 +0800)]
qa/workunits/cephtool/test.sh: use flush_pg_stats to sync mon with osd

Signed-off-by: Kefu Chai <kchai@redhat.com>
8 years agomgr: add a command "mgr report"
Kefu Chai [Wed, 17 May 2017 07:30:28 +0000 (15:30 +0800)]
mgr: add a command "mgr report"

* extract send_report() out of tick() so it can be reused.
* add a command "mgr report-mon" for mgr, so we are able to flush
  the mgr stats to mon actively without waiting for the tick. this
  could help with the tests.

Signed-off-by: Kefu Chai <kchai@redhat.com>
8 years agoqa/workunits/cephtool/test.sh: fix flush_pg_stats usage
Sage Weil [Thu, 18 May 2017 22:34:34 +0000 (18:34 -0400)]
qa/workunits/cephtool/test.sh: fix flush_pg_stats usage

Use a helper.

Signed-off-by: Sage Weil <sage@redhat.com>
8 years agoqa/tasks: use new reliable flush_pg_stats helper
Sage Weil [Thu, 18 May 2017 22:16:55 +0000 (18:16 -0400)]
qa/tasks: use new reliable flush_pg_stats helper

The helper gets a sequence number from the osd (or osds), and then
polls the mon until that seq is reflected there.

This is overkill in some cases, since many tests only require that the
stats be reflected on the mgr (not the mon), but waiting for it to also
reach the mon is sufficient!

Signed-off-by: Sage Weil <sage@redhat.com>
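
The flush-then-poll pattern described above might look roughly like the sketch below. The two callables are hypothetical stand-ins for "ask the osd to flush and return its seq" and "ask the mon for the last seq it has seen for that osd"; the real helper in qa/tasks issues ceph commands instead, and its timeout handling may differ.

```python
import time


def flush_pg_stats(osds, flush_on_osd, last_stat_seq_on_mon,
                   timeout=30.0, interval=0.1):
    """Flush stats on each osd, then wait until the mon reflects each seq.

    flush_on_osd(osd) -> int          (hypothetical: seq reported by the osd)
    last_stat_seq_on_mon(osd) -> int  (hypothetical: seq known to the mon)
    """
    # Collect the target sequence numbers first, then poll for all of them.
    wanted = {osd: flush_on_osd(osd) for osd in osds}
    deadline = time.monotonic() + timeout
    for osd, need in wanted.items():
        while last_stat_seq_on_mon(osd) < need:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"osd.{osd} seq {need} not seen on mon")
            time.sleep(interval)
    return wanted
```

Comparing with `<` rather than `!=` matters: the mon may already have moved past the requested seq by the time it is polled, and that still counts as flushed.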
8 years agomon: add 'osd last-stat-seq <osd>' command
Sage Weil [Thu, 18 May 2017 21:23:03 +0000 (17:23 -0400)]
mon: add 'osd last-stat-seq <osd>' command

Return the latest seq for the osd reflected in the mon's digest stats.

Signed-off-by: Sage Weil <sage@redhat.com>
8 years agomon/PGMap: coalesce last osd stat seq in the PGMapDigest
Sage Weil [Thu, 18 May 2017 21:21:32 +0000 (17:21 -0400)]
mon/PGMap: coalesce last osd stat seq in the PGMapDigest

This is, strictly speaking, redundant, since the osd_stat is also in the
digest, but we plan to remove that.

Signed-off-by: Sage Weil <sage@redhat.com>
8 years agoosd: report a seq from flush_pg_stats command
Sage Weil [Thu, 18 May 2017 21:19:08 +0000 (17:19 -0400)]
osd: report a seq from flush_pg_stats command

Report a sequence number when we flush_pg_stats.  Combine the up_from and
a per-boot seq number to get a monotonically increasing value across OSD
restarts (we assume less than 4 billion stats reports in a single epoch).

Signed-off-by: Sage Weil <sage@redhat.com>
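
One way to combine the two values into a single increasing number is to place up_from in the upper 32 bits and the per-boot counter in the lower 32 bits. The commit message does not spell out the exact bit layout, so this packing is an assumption consistent with the "less than 4 billion reports per epoch" remark; `make_stat_seq` is a hypothetical name.

```python
def make_stat_seq(up_from, boot_seq):
    """Pack up_from (epoch) and a per-boot counter into one 64-bit value.

    Assumed layout: up_from in the high 32 bits, boot_seq in the low 32 bits.
    """
    if not 0 <= boot_seq < 2**32:
        raise ValueError("per-boot seq must fit in 32 bits")
    if not 0 <= up_from < 2**32:
        raise ValueError("up_from must fit in 32 bits")
    return (up_from << 32) | boot_seq
```

Since an OSD that restarts comes up at a later epoch, any value produced after the restart compares greater than every value from the previous boot, even though the per-boot counter starts again from zero.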
8 years agoosd: include up_from, seq in osd_stat_t
Sage Weil [Thu, 18 May 2017 20:10:32 +0000 (16:10 -0400)]
osd: include up_from, seq in osd_stat_t

Signed-off-by: Sage Weil <sage@redhat.com>
8 years agomon/PGMonitor: clear PGMap data when require_luminous is set
Sage Weil [Thu, 18 May 2017 20:05:28 +0000 (16:05 -0400)]
mon/PGMonitor: clear PGMap data when require_luminous is set

Once the OSDMap flag is set there is no going back. Zero out the on-disk
PGMap data, and clear the in-memory PGMap to free up memory and make
bugs easier to spot.

Signed-off-by: Sage Weil <sage@redhat.com>
8 years agokv/RocksDBStore: make rmkeys_by_prefix efficient
Sage Weil [Thu, 18 May 2017 19:42:23 +0000 (15:42 -0400)]
kv/RocksDBStore: make rmkeys_by_prefix efficient

This matches what rm_range_keys does.

Signed-off-by: Sage Weil <sage@redhat.com>
8 years agomon/OSDMonitor: limit number of concurrently creating pgs
Sage Weil [Thu, 18 May 2017 19:41:39 +0000 (15:41 -0400)]
mon/OSDMonitor: limit number of concurrently creating pgs

There is overhead for PGs we are creating because the mon has to track
which OSD each one currently maps to.  This can be problematic on a very
large cluster.  Limit the overhead by setting a cap on the number of PGs
we are creating at once; leave the rest in a persistent queue.

Signed-off-by: Sage Weil <sage@redhat.com>
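
The cap-plus-queue idea can be illustrated with a small sketch. The function and parameter names are hypothetical, and persistence of the queue is left out; the point is only that pgs move from the queue into the tracked "creating" set while the set is below the cap.

```python
from collections import deque


def refill_creating(creating, queue, max_creating):
    """Move queued pgs into the creating set until the cap is reached.

    creating: set of pgs currently being created (tracked per OSD mapping)
    queue: deque of pgs waiting their turn
    max_creating: cap on len(creating) (hypothetical parameter name)
    Returns the pgs that were started in this call.
    """
    started = []
    while queue and len(creating) < max_creating:
        pg = queue.popleft()
        creating.add(pg)
        started.append(pg)
    return started
```

Calling this again after some pgs finish creating (and are removed from `creating`) drains more of the queue, so the tracking overhead stays bounded by the cap regardless of pool size.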
8 years agomon/MgrStatMonitor: fix digest vs pending digest
Sage Weil [Thu, 18 May 2017 19:32:57 +0000 (15:32 -0400)]
mon/MgrStatMonitor: fix digest vs pending digest

Signed-off-by: Sage Weil <sage@redhat.com>
8 years agomon/OSDMonitor: fix bad update_pending_pgs call
Sage Weil [Thu, 18 May 2017 19:32:26 +0000 (15:32 -0400)]
mon/OSDMonitor: fix bad update_pending_pgs call

We are not persisting the updated creating_pgs here; this is bad!  I'm
not sure why it was there to begin with?

Signed-off-by: Sage Weil <sage@redhat.com>
8 years agomon/PGMonitor: disable when REQUIRE_LUMINOUS is set
Sage Weil [Thu, 18 May 2017 17:57:58 +0000 (13:57 -0400)]
mon/PGMonitor: disable when REQUIRE_LUMINOUS is set

Signed-off-by: Sage Weil <sage@redhat.com>