Kefu Chai [Fri, 11 Jan 2019 10:47:39 +0000 (18:47 +0800)]
crimson/osd: enable crimson-osd to boot
* add state.h to encapsulate the states representing the different
stages of booting an OSD. the boot process of an OSD can be blocked
by
- waiting for PGs to consume updated osdmaps
- waiting for an osdmap marking osd.{whoami} up
- waiting for new osdmaps to bring this osd up to speed
- waiting for the current OSD to become healthy
we could chain these "waits" in a more seastarized way, and let
OSD::start() wait on the future returned by this chain. but that'd
require adding some seastar::shared_future<> as member variables of
the `OSD` class, which is a little more convoluted than the state
machine approach used in this change. we could switch over to the
`future<>` chain approach if we find that these futures have more
consumers than merely `OSD::start()`.
* all osdmaps are now stored in an `std::map` in `OSD`; we can
improve this by
- caching it using an LRU cache
- trimming the stale ones
- persisting the evicted maps into the meta collection in ObjectStore
* the superblock is not persisted to the store, nor is it read from
the store.
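The state-machine approach described above can be sketched as follows. This is a hypothetical illustration, not the contents of crimson's state.h: the class name, the state names, and the allowed-transition table are assumptions chosen to mirror the waits listed in the first bullet.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical sketch of an OSD boot state machine; the states and the
// transition table are assumptions, not crimson's actual state.h.
class OSDBootState {
public:
  enum class State {
    INITIALIZING,         // starting up, nothing sent to the monitor yet
    PREBOOT,              // fetching osdmaps to bring this OSD up to speed
    WAITING_FOR_HEALTHY,  // blocked until this OSD is healthy
    BOOTING,              // MOSDBoot sent; waiting for a map marking us up
    ACTIVE,               // an osdmap has marked osd.{whoami} up
  };

  State get() const { return state_; }
  bool is(State s) const { return state_ == s; }

  // Move to `next`, rejecting transitions the boot sequence does not allow.
  void set(State next) {
    if (!allowed(state_, next)) {
      throw std::logic_error("illegal OSD boot state transition");
    }
    state_ = next;
  }

private:
  static bool allowed(State from, State to) {
    switch (from) {
    case State::INITIALIZING:
      return to == State::PREBOOT;
    case State::PREBOOT:
      return to == State::WAITING_FOR_HEALTHY || to == State::BOOTING;
    case State::WAITING_FOR_HEALTHY:
      return to == State::PREBOOT;  // re-check the maps once healthy
    case State::BOOTING:
      return to == State::ACTIVE;
    case State::ACTIVE:
      return false;
    }
    return false;
  }

  State state_ = State::INITIALIZING;
};
```

Each event handler (e.g. on receiving a new osdmap) inspects the current state and calls `set()`, so the waiting is expressed as transitions rather than as futures kept as members of `OSD`.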
Kefu Chai [Thu, 10 Jan 2019 13:15:18 +0000 (21:15 +0800)]
crimson: set src for message
the monitor will panic upon seeing an MOSDBoot message that is not from
an OSD. see OSDMonitor::preprocess_boot():
```
ceph_assert(m->get_orig_source_inst().name.is_osd());
```
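To illustrate why the sender must fill in the source, here is a hypothetical, heavily simplified model: `EntityName`, `Message`, and `monitor_accepts_boot()` are stand-ins invented for this sketch, loosely mirroring `entity_name_t::is_osd()` and the assertion quoted above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for entity_name_t: a type plus a numeric id.
struct EntityName {
  enum Type : std::uint8_t { NONE, MON, OSD, CLIENT };
  Type type;
  std::int64_t num;
  EntityName() : type(NONE), num(-1) {}
  EntityName(Type t, std::int64_t n) : type(t), num(n) {}
  bool is_osd() const { return type == OSD; }
  static EntityName osd(std::int64_t id) { return EntityName(OSD, id); }
};

// Hypothetical message with a source name in its header. If the sender
// never sets it, the source stays NONE.
struct Message {
  EntityName src;
  void set_src(const EntityName& name) { src = name; }
};

// Models the monitor-side requirement: a boot message must come from an
// OSD (the real monitor asserts instead of returning false).
bool monitor_accepts_boot(const Message& m) { return m.src.is_osd(); }
```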
/home/jenkins-build/build/workspace/ceph-pull-requests-arm64/src/seastar/fmt/include/fmt/format.h:2120:14:
error: comparison of integer expressions of different signedness: 'const
wchar_t' and 'char' [-Werror=sign-compare]
if (*out == value)
~~~~~^~~~~~~~
cc1plus: all warnings being treated as errors
where libfmt compares a wchar_t with the literal '}', which is a char.
because the former is unsigned and the latter is of a signed type,
GCC complains. but since both of them are ASCII, and the signed
operand is converted to unsigned when performing the comparison, the
result of the comparison is correct per se. hence, it's safe to
silence this particular warning.
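The comparison in question, and one way to silence the warning locally, can be sketched as below. The function names are hypothetical, and the diagnostic pragmas are just one option (GCC and Clang both accept them); the actual change may instead have disabled the warning through compiler flags. The loop checks the claim above: widening any ASCII char to wchar_t preserves its value, so the comparison result is correct.

```cpp
#include <cassert>

#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wsign-compare"

// The shape of the comparison libfmt performs. On targets where wchar_t is
// unsigned (such as the arm64 build above), GCC flags it with -Wsign-compare.
inline bool is_close_brace(wchar_t c) { return c == '}'; }

// Returns true if every ASCII char compares equal to its wchar_t widening.
inline bool ascii_comparisons_agree() {
  for (int i = 0; i < 128; ++i) {
    const char c = static_cast<char>(i);
    const wchar_t w = static_cast<wchar_t>(c);
    if (!(w == c)) return false;
  }
  return true;
}

#pragma GCC diagnostic pop
```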
and warnings like:
/home/jenkins-build/build/workspace/ceph-pull-requests-arm64/src/seastar/src/core/future-util.cc:61:5:
required from 'seastar::future<> seastar::sleep_abortable(typename
Clock::duration, seastar::abort_source&) [with Clock =
std::chrono::_V2::steady_clock; typename Clock::duration =
std::chrono::duration<long int, std::ratio<1, 1000000000> >]'
/home/jenkins-build/build/workspace/ceph-pull-requests-arm64/src/seastar/src/core/future-util.cc:68:105:
required from here
/home/jenkins-build/build/workspace/ceph-pull-requests-arm64/src/seastar/src/core/future-util.cc:48:28:
error: 'seastar::sleep_abortable(typename Clock::duration,
seastar::abort_source&)::sleeper::sleeper(typename Clock::duration,
seastar::abort_source&) [with Clock = std::chrono::_V2::steady_clock;
typename Clock::duration = std::chrono::duration<long int, std::ratio<1, 1000000000> >]::<lambda()>' declared with greater visibility than the
type of its field 'seastar::sleep_abortable(typename Clock::duration,
seastar::abort_source&)::sleeper::sleeper(typename Clock::duration,
seastar::abort_source&) [with Clock = std::chrono::_V2::steady_clock;
typename Clock::duration = std::chrono::duration<long int, std::ratio<1, 1000000000> >]::<lambda()>::<this capture>' [-Werror=attributes]
: tmr([this] { done.set_value(); }) {
^
Signed-off-by: Kefu Chai <kchai@redhat.com>
Kefu Chai [Fri, 21 Dec 2018 10:34:22 +0000 (18:34 +0800)]
test/crimson/monc: start/stop perf counter
in CephContext::CephContext(), we assume that
ceph::common::local_perf_coll() is ready when a CephContext is
constructed, so we need to start it before creating the CephContext.
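The ordering constraint can be modeled with two hypothetical types: `PerfCollection` stands in for the collection behind `ceph::common::local_perf_coll()`, and `Context` stands in for `CephContext`, whose constructor assumes the collection is already running. Neither mirrors the real classes' APIs.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>

// Hypothetical stand-in for the per-shard perf counter collection.
class PerfCollection {
public:
  static void start() { slot().reset(new PerfCollection()); }
  static void stop() { slot().reset(); }
  static PerfCollection& local() {
    if (!slot()) throw std::logic_error("perf collection not started");
    return *slot();
  }
  int registered = 0;  // number of contexts currently using the collection

private:
  static std::unique_ptr<PerfCollection>& slot() {
    static std::unique_ptr<PerfCollection> inst;
    return inst;
  }
};

// Hypothetical stand-in for CephContext: like the real constructor, it
// assumes local() is usable while the object is being constructed.
class Context {
public:
  Context() : perf_(PerfCollection::local()) { ++perf_.registered; }
  ~Context() { --perf_.registered; }
  Context(const Context&) = delete;
  Context& operator=(const Context&) = delete;

private:
  PerfCollection& perf_;
};
```

The shutdown order is the mirror image: the context must be destroyed before the collection is stopped.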
Sage Weil [Fri, 18 Jan 2019 03:02:59 +0000 (21:02 -0600)]
Merge PR #25900 into master
* refs/pull/25900/head:
qa/tasks/ceph.py: bracket addrvecs in mon_host etc
vstart.sh: bracket addrvec on mon_host for msgr2-only mode
unittest_addrs: entity_addr_t: strengthen tests slightly
common/ceph_argparse: make parse_ip_port_vec handle list of addrs or addrvecs
common/ceph_argparse: parse_ip_port_vec returns addrvecs, not addrs
msg/msg_types: entity_addrvec_t: require brackets for size >1
msg/msg_types: entity_addrvec_t: allow brackets when parsing addrvec to match output
msg/msg_types: entity_addrvec_t: allow only ',' as an addrvec separator
msg/msg_types: entity_addr_t: we should not parse an addrvec
msg/msg_types: entity_addr_t: fix empty string parse cases
msg/msg_types: entity_addr_t: is_ipv6() and is_ipv4()
Sage Weil [Thu, 17 Jan 2019 17:04:30 +0000 (11:04 -0600)]
Merge PR #25849 into master
* refs/pull/25849/head:
qa/suites/rados/upgrade: one mon per node, and enable-msgr2 at end
qa/rados/thrash-old-clients: avoid msgr2
mon: make bootstrap rank check more robust
mon: clean up probe debug output a bit
msg/async: use v1 for v1 <-> [v2,v1] peers
msg/async/AsyncMessenger: drop single-use _send_to
mon/HealthMonitor: raise MON_MSGR2_NOT_ENABLED if mons not bound to msgr2
doc/rados/operations/health-checks: document MON_* health warnings
mon/MonMapMonitor: add 'mon enable-msgr2' command
mon: respawn if rank addr changes
mon/MonMap: calc_addr_mons() after setting rank addrvec
Kefu Chai [Wed, 28 Nov 2018 13:00:33 +0000 (21:00 +0800)]
crimson/monc: set name using a setter
* set entity_name using a setter instead of passing it to the
constructor. the entity_name is retrieved by ConfigProxy inside
seastar's app.run(), but it's simpler to instantiate mon::Client in
main() as a local variable than to manage it on the heap using a
smart pointer. so we cannot pass the entity_name as a parameter of
the ctor.
* also clean up the #includes, as they are already included in the
header.
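A minimal sketch of this design choice, assuming a hypothetical `MonClient` and a hypothetical `run_app()` that stands in for seastar's `app.run()`: the client lives on the stack in main(), and its name is filled in later, once the configuration is available.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical client that can be constructed before its name is known.
class MonClient {
public:
  MonClient() = default;
  void set_name(std::string name) { name_ = std::move(name); }
  // A real client needs its entity name before it can authenticate.
  bool start() const { return !name_.empty(); }
  const std::string& name() const { return name_; }

private:
  std::string name_;
};

// Stand-in for app.run(): runs the body once config has been parsed.
template <typename Body>
int run_app(Body body) { return body(); }

int run_client(const std::string& name_from_config) {
  MonClient client;  // a plain local, no smart pointer needed
  return run_app([&client, &name_from_config] {
    client.set_name(name_from_config);  // only now is the name known
    return client.start() ? 0 : 1;
  });
}
```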
Kefu Chai [Wed, 28 Nov 2018 12:55:44 +0000 (20:55 +0800)]
crimson: pass entity_name and cluster to ctor of ConfigProxy
as we always need to set entity_name and cluster before we start using
ConfigProxy, and we do not read these settings from the config file,
these two settings are special. so it'd be simpler to just pass them
as parameters of the constructor.
more importantly, we need to parse the command line arguments using
ceph_argparse_early_args() first, as it consumes the arguments it
parses and leaves the unparsed ones in the input parameter `args`.
then we can pass the unparsed args to app.run().
it's not a perfect solution, as there are some options that both
parsers are interested in. for instance, `-c`: ceph takes it as the
path of the conf file, while seastar takes it as the number of cores
to use. but let's feed ceph's parser first, unless it's fine to drop
backward compatibility with the command line syntax of ceph-osd.
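The "consume what you recognize, forward the rest" order can be sketched like this. `parse_early_args()` and `EarlyArgs` are hypothetical and handle only three options; the real `ceph_argparse_early_args()` has a different signature and covers many more. Once ceph consumes `-c`, seastar's core count has to be given in its long form, `--smp`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result of the early parse.
struct EarlyArgs {
  std::string conf_path;
  std::string cluster = "ceph";
  std::string id;
};

// Consume the options recognized here and leave everything else in `args`,
// so the remainder can be handed to seastar's app.run().
EarlyArgs parse_early_args(std::vector<std::string>& args) {
  EarlyArgs out;
  std::vector<std::string> rest;
  for (std::size_t i = 0; i < args.size(); ++i) {
    const std::string& a = args[i];
    const bool has_value = i + 1 < args.size();
    if ((a == "-c" || a == "--conf") && has_value) {
      out.conf_path = args[++i];  // ceph's meaning of -c wins
    } else if (a == "--cluster" && has_value) {
      out.cluster = args[++i];
    } else if ((a == "-i" || a == "--id") && has_value) {
      out.id = args[++i];
    } else {
      rest.push_back(a);  // unknown to ceph: keep it for seastar
    }
  }
  args.swap(rest);
  return out;
}
```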
Sage Weil [Wed, 16 Jan 2019 21:39:53 +0000 (15:39 -0600)]
msg/simple: remove forced authorizer refresh
This synchronous check has always been kludgey; remove it and simply fault
instead, as we did with 794a8f9cf51cf176636d114ccfbbf68fbc304083
in AsyncMessenger.
Sage Weil [Wed, 16 Jan 2019 19:12:35 +0000 (13:12 -0600)]
Merge PR #25934 into master
* refs/pull/25934/head:
msg/msg_type: entity_addr_t: fix legacy decode
msg/msg_types: make set_sockaddr() work with AF_UNSPEC (i.e., zeroed)
msg/msg_types: make set_sockaddr() a bit more robust
msg/async: fix IP inference
Sage Weil [Thu, 10 Jan 2019 20:18:55 +0000 (14:18 -0600)]
common/ceph_argparse: make parse_ip_port_vec handle list of addrs or addrvecs
This helper is only used for mon_host.
We want to be able to list addrs or addrvecs. It is slightly wonky because
you could parse something like
"1.2.3.4,5.6.7.8"
as either a list of addrs or a list of addrvecs. Since the addr parse
takes a default type, it is preferred, so first try parsing as an addr
before proceeding. Addrvecs that are size >1 must have brackets, so we
will always parse "[1.2.3.4,5.6.7.8]" unambiguously (since "[1.2.3.4"
won't parse as an addr).
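The "try an addr first, then an addrvec" rule can be sketched with a deliberately simplified model in which an addr is just an IPv4 `ip[:port]` string (no `v1:`/`v2:` prefixes, no IPv6) and an addrvec is a bracketed list. All names are hypothetical; this is not the real `parse_ip_port_vec()`.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

using Addr = std::string;  // simplified: "a.b.c.d" or "a.b.c.d:port"
using AddrVec = std::vector<Addr>;

// Parse a bare addr at `pos`. Anything starting with '[' is rejected, which
// is why "[1.2.3.4" can never be taken for an addr. Advances `pos` only on
// success.
bool try_parse_addr(const std::string& s, std::size_t& pos, Addr& out) {
  std::size_t p = pos;
  while (p < s.size() &&
         (std::isdigit(static_cast<unsigned char>(s[p])) || s[p] == '.' || s[p] == ':')) {
    ++p;
  }
  if (p == pos) return false;
  out = s.substr(pos, p - pos);
  pos = p;
  return true;
}

// Parse a bracketed addrvec "[a,b,...]". Advances `pos` only on success.
bool try_parse_addrvec(const std::string& s, std::size_t& pos, AddrVec& out) {
  if (pos >= s.size() || s[pos] != '[') return false;
  std::size_t p = pos + 1;
  AddrVec v;
  for (;;) {
    Addr a;
    if (!try_parse_addr(s, p, a)) return false;
    v.push_back(a);
    if (p < s.size() && s[p] == ',') { ++p; continue; }
    if (p < s.size() && s[p] == ']') { ++p; break; }
    return false;  // unterminated, e.g. "[1.2.3.4"
  }
  out = v;
  pos = p;
  return true;
}

// Parse a comma-separated list of addrs and/or addrvecs, preferring an addr
// at each position; a lone addr becomes a size-1 addrvec.
bool parse_ip_port_vec_sketch(const std::string& s, std::vector<AddrVec>& out) {
  out.clear();
  std::size_t pos = 0;
  while (pos < s.size()) {
    Addr a;
    AddrVec v;
    if (try_parse_addr(s, pos, a)) {
      out.push_back(AddrVec{a});
    } else if (try_parse_addrvec(s, pos, v)) {
      out.push_back(v);
    } else {
      return false;
    }
    if (pos < s.size()) {
      if (s[pos] != ',' || pos + 1 == s.size()) return false;  // bad or trailing separator
      ++pos;
    }
  }
  return true;
}
```

With this rule, "1.2.3.4,5.6.7.8" always yields two size-1 addrvecs; a single size-2 addrvec is only expressible with brackets.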
Sage Weil [Thu, 10 Jan 2019 19:53:55 +0000 (13:53 -0600)]
msg/msg_types: entity_addrvec_t: require brackets for size >1
Allowing "1.2.3.4,5.6.7.8" to parse as a single addrvec means we can't
unambiguously differentiate between one addrvec and a list of addrvecs,
which we'll want/need to do later.
Sage Weil [Wed, 16 Jan 2019 13:13:14 +0000 (07:13 -0600)]
msg/msg_type: entity_addr_t: fix legacy decode
If we decode a zeroed sockaddr, we should end up with a TYPE_NONE
entity_addr_t, not v1::/0.
This was obscured by unit test TestAddrvecEncodeAddrDecode3, which
took an addrvec with all v2 addrs, decoded it into an addr variable that
previously held v1:1.2.3.4:/0, and asserted that the result was not v1::/0.
The test used to pass only because set_sockaddr() failed on AF_UNSPEC,
so the addr kept v1:1.2.3.4; with the previous commit it failed because
the result equaled v1::/0. In reality, the addr should become "-" (i.e.,
TYPE_NONE).
The TestEmptyAddrvecEncodeAddrDecode test case is similarly adjusted.
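The intended behavior can be sketched with hypothetical stand-in types (`Addr`, `LegacySockaddr`, and `decode_legacy()` are invented here and much simpler than the real entity_addr_t): decoding a zeroed sockaddr into a variable that previously held v1:1.2.3.4 must leave it as TYPE_NONE, neither keeping the stale address nor producing v1::/0.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-ins for entity_addr_t and its legacy form.
enum class AddrType { NONE, LEGACY /* v1 */, MSGR2 /* v2 */ };

struct LegacySockaddr {
  std::uint16_t family;  // 0 means AF_UNSPEC, i.e. a zeroed sockaddr
  std::uint32_t ip;
  std::uint16_t port;
};

struct Addr {
  AddrType type = AddrType::NONE;
  std::uint32_t ip = 0;
  std::uint16_t port = 0;
};

// Decode into an existing variable, overwriting whatever it held before.
// A zeroed sockaddr resets the addr to TYPE_NONE (printed as "-") instead of
// leaving the old value in place or yielding an empty v1 addr.
void decode_legacy(Addr& a, const LegacySockaddr& sa) {
  if (sa.family == 0) {
    a = Addr{};
    return;
  }
  a.type = AddrType::LEGACY;
  a.ip = sa.ip;
  a.port = sa.port;
}
```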
xie xingguo [Mon, 14 Jan 2019 06:39:18 +0000 (14:39 +0800)]
mgr/balancer: blame if upmap won't actually work
With automatic balancing on and the mode set to upmap, the balancer
fails silently if min_compat_client is lower than luminous.
You can't figure that out unless you take a closer look at the
mgr log, which is super annoying.
Sage Weil [Tue, 15 Jan 2019 16:38:29 +0000 (10:38 -0600)]
msg/async: use v1 for v1 <-> [v2,v1] peers
If *peers* are communicating, i.e. there may be bidirectional connection
attempts, we must use the same protocol version from both ends, or else
we will get very confused.
Fix this by forcing the use of v1 when we
- are bound to a v1 endpoint only (nobody can connect to us via v2), and
- are connecting to a *peer*.
If it is a non-peer, then connections are uni-directional. If we both
have v2, we will both use v2.
If we ever switch to [v2,v1], it will be as part of a restart.
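The selection rule can be sketched as a pure function. `Proto`, `AddrVecInfo`, and `choose_protocol()` are hypothetical names for illustration, not AsyncMessenger's actual code; the inputs are reduced to which versions each addrvec offers and whether the target is a peer.

```cpp
#include <cassert>

enum class Proto { V1, V2 };

// Hypothetical summary of an addrvec: which protocol versions it offers.
struct AddrVecInfo {
  bool has_v1;
  bool has_v2;
};

// Pick the protocol to use when connecting to `target`.
// Peers may connect to each other simultaneously, so both ends must agree:
// if we are bound to v1 only, the peer can only reach us via v1, hence we
// use v1 toward it as well.
Proto choose_protocol(const AddrVecInfo& mine, const AddrVecInfo& target,
                      bool target_is_peer) {
  if (target_is_peer && !mine.has_v2 && target.has_v1) {
    return Proto::V1;
  }
  // Non-peer connections are uni-directional: use v2 when it is offered.
  return target.has_v2 ? Proto::V2 : Proto::V1;
}
```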
xie xingguo [Tue, 15 Jan 2019 08:23:26 +0000 (16:23 +0800)]
crush: fix memory leak
If we remove the last item of a bucket, there is still one final
entry allocated in the `weights` field of `weight_set`.
Free the corresponding memory before we null out the pointer.
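The shape of the fix, as a minimal sketch: `WeightSet` mirrors only the `weights`/`size` pair of crush's `crush_weight_set`, and `weight_set_remove()` is a hypothetical helper, not the function the commit changes. To keep the example short, the array is not shrunk on each removal.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Simplified stand-in for crush's struct crush_weight_set.
struct WeightSet {
  std::uint32_t* weights;  // malloc'ed array of per-item weights
  std::uint32_t size;
};

// Remove the weight at index `pos`, shifting later entries down. When the
// last item goes away the array still owns an allocation, so it must be
// freed before the pointer is nulled, or it leaks.
void weight_set_remove(WeightSet& ws, std::uint32_t pos) {
  assert(pos < ws.size);
  for (std::uint32_t j = pos; j + 1 < ws.size; ++j) {
    ws.weights[j] = ws.weights[j + 1];
  }
  --ws.size;
  if (ws.size == 0) {
    std::free(ws.weights);  // release the final allocation
    ws.weights = nullptr;
  }
}
```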