Sage Weil [Sat, 20 Mar 2021 13:15:58 +0000 (09:15 -0400)]
Merge PR #40220 into master
* refs/pull/40220/head:
mgr/cephadm: identify rgw, cephfs-mirror in servicemap
mgr/ServiceMap: adjust 'ceph -s' summary
rgw: register daemons in servicemap by gid; include id
cephadm: fix rbd-mirror auth name
Kefu Chai [Sat, 20 Mar 2021 05:00:01 +0000 (13:00 +0800)]
install-deps.sh: remove existing ceph-libboost of different version
we install different versions of precompiled ceph-libboost packages
for different branches when building and testing them on ubuntu test
nodes. for instance,
- nautilus: v1.72
- octopus, pacific: v1.73
they share the same set of test nodes. and these ceph-libboost packages
conflict with each other, because they install files to the same places.
in order to avoid the conflict, we should uninstall existing packages
before installing a different version of ceph-libboost packages.
ceph-libboost${version}-dev is a package providing the shared headers of
boost library, so, in this change we check if it is installed before
returning or removing the existing packages.
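The decision described above can be sketched in Python as follows. This is only an illustration, not the shell logic in install-deps.sh: the function names and the `ceph-libboost-atomic1.72-dev` package name used in the usage note are hypothetical, and the only naming rule taken from the message is `ceph-libboost${version}-dev`.

```python
import re

# ceph-libboost${version}-dev is the marker package named in the commit message.
_DEV_PKG = re.compile(r"^ceph-libboost(\d+\.\d+)-dev$")


def installed_boost_version(installed):
    """Return the version of the installed ceph-libboost dev package, if any."""
    for name in installed:
        match = _DEV_PKG.match(name)
        if match:
            return match.group(1)
    return None


def plan(installed, wanted):
    """Return (action, packages_to_remove) for the wanted boost version."""
    current = installed_boost_version(installed)
    if current == wanted:
        # The right version is already there: nothing to remove or install.
        return "keep", []
    if current is None:
        # No ceph-libboost dev package found: install without removing anything.
        return "install", []
    # A different version is present: remove every ceph-libboost package first.
    stale = [name for name in installed if name.startswith("ceph-libboost")]
    return "install", stale
```

For example, `plan(["ceph-libboost1.72-dev", "ceph-libboost-atomic1.72-dev"], "1.73")` asks for both 1.72 packages to be removed before installing 1.73.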
Sage Weil [Fri, 19 Mar 2021 20:42:14 +0000 (16:42 -0400)]
Merge PR #40242 into master
* refs/pull/40242/head:
mgr/cephadm/upgrade: do not repeat crash message
mgr/cephadm/upgrade: a little less verbose
mgr/cephadm: don't log not-ok-to-stop at ERR level
mgr/cephadm: is presumed -> appears
mgr/cephadm: don't double-log ok-to-stop results
mgr/cephadm/upgrade: include upgrade progress in ceph -s
Sage Weil [Fri, 19 Mar 2021 12:21:18 +0000 (08:21 -0400)]
mgr/ServiceMap: adjust 'ceph -s' summary
- Do not list individual daemon ids as this won't scale for larger
clusters
- Do not contemplate multiple daemons of the same type that register with
different "daemon_type" -- not until we actually have any that do that.
- Present counts by various groupings: distinct hosts and rgw zones to
start.
services:
mon: 1 daemons, quorum a (age 4m)
mgr: x(active, since 3m)
osd: 1 osds: 1 up (since 3m), 1 in (since 3m)
cephfs-mirror: 1 daemon active (1 hosts)
rbd-mirror: 2 daemons active (1 hosts)
rgw: 2 daemons active (1 hosts, 1 zones)
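A minimal sketch of the grouping idea, in Python rather than the mgr's C++: count daemons per service type, then report distinct hosts and (where present) distinct zones. The dict keys `type`, `hostname` and `zone` are assumptions for illustration and are not the ServiceMap metadata layout.

```python
from collections import defaultdict


def summarize_services(daemons):
    """Return one summary line per service type, sorted by type name."""
    by_type = defaultdict(list)
    for daemon in daemons:
        by_type[daemon["type"]].append(daemon)

    lines = []
    for svc_type in sorted(by_type):
        members = by_type[svc_type]
        # Count distinct values instead of listing individual daemon ids.
        hosts = {d["hostname"] for d in members}
        zones = {d["zone"] for d in members if "zone" in d}
        groupings = [f"{len(hosts)} hosts"]
        if zones:
            groupings.append(f"{len(zones)} zones")
        noun = "daemon" if len(members) == 1 else "daemons"
        lines.append(
            f"{svc_type}: {len(members)} {noun} active ({', '.join(groupings)})"
        )
    return lines
```

The output length depends on the number of service types and groupings, not on the number of daemons, which is the scaling property the message asks for.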
Kefu Chai [Thu, 11 Mar 2021 13:13:13 +0000 (21:13 +0800)]
mon/OSDMonitor: drop stale failure_info
failure_info keeps strong references of the MOSDFailure messages
sent by osd or peon monitors, whenever monitor starts to handle
an MOSDFailure message, it registers it in its OpTracker. and
the failure report message is unregistered when monitor acks it
by either canceling it or replying to the reporters with a new
osdmap marking the target osd down. but if this does not happen,
the failure reports just pile up in OpTracker. and monitor considers
them as slow ops. and they are reported as SLOW_OPS health warning.
in theory, it does not take long to mark an unresponsive osd down if
we have enough reporters. but there is a chance that a reporter fails
to cancel its report before it reboots, and the monitor also fails
to collect enough reports and mark the target osd down. so the
target osd never gets an osdmap marking it down, so it won't send
an alive message to monitor to fix this.
in this change, we check for the stale failure info in tick(), and
simply drop the stale reports. so the messages can be released and
marked "done".
will add a trim failures call in the loop, which mutates failure_info,
while we are still iterating this map. so have to restructure the loop
a little bit.
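The restructuring concern above (mutating a map while iterating it) can be illustrated with a small Python sketch. The layout of `failure_info` (target osd mapped to reporter and report time) and the `grace` parameter are assumptions made for the example; the real code is C++ in OSDMonitor.

```python
def drop_stale_failures(failure_info, now, grace):
    """Remove reports older than `grace` seconds and return what was dropped.

    failure_info: {target_osd: {reporter: report_time}}  (hypothetical layout)
    """
    dropped = []
    # Iterate over a snapshot of the keys so entries can be deleted safely.
    for target in list(failure_info):
        reporters = failure_info[target]
        stale = [r for r, reported_at in reporters.items()
                 if now - reported_at > grace]
        for reporter in stale:
            dropped.append((target, reporter))
            del reporters[reporter]
        if not reporters:
            # No reports left for this target: drop the whole entry.
            del failure_info[target]
    return dropped
```

Taking the key snapshot with `list(failure_info)` plays the role of the loop restructuring mentioned in the message: deletion never happens on the container currently being iterated.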
Kefu Chai [Thu, 11 Mar 2021 09:45:49 +0000 (17:45 +0800)]
mon/OSDMonitor: do not return no_reply() again
we always return "no_op" message to proxy monitor in
`OSDMonitor::prepare_failure()` at the very beginning of this method. so
no need to reply to the peon again when discarding the failure report.
Kefu Chai [Thu, 11 Mar 2021 09:09:57 +0000 (17:09 +0800)]
mon/Monitor: early return if routed request is not found
* early return if routed request is not found in routed_requests.
reduce the indent level, for better readability.
* do not look up the request twice. for better performance.
* use unique_ptr<> for holding the request, for better readability
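The first two points can be shown with a Python analogue; this is not the Monitor code, and `take_routed_request` and the `resent` flag are hypothetical names. A single `pop()` finds and removes the entry, and the early return keeps the main path at one indent level.

```python
def take_routed_request(routed_requests, tid):
    """Remove and return the routed request for `tid`, or None if unknown."""
    # One lookup: pop() both finds and removes the entry.
    request = routed_requests.pop(tid, None)
    if request is None:
        return None  # early return, no nested else branch needed
    request["resent"] = True  # stand-in for the real handling
    return request
```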
Sage Weil [Fri, 19 Mar 2021 12:25:23 +0000 (08:25 -0400)]
rgw: register daemons in servicemap by gid; include id
Registering by gid allows multiple radosgw instances to share an auth
key/identity. Including the id in the metadata allows them to still be
identified by name (even if not uniquely).
Kefu Chai [Fri, 19 Mar 2021 02:32:16 +0000 (10:32 +0800)]
test: run promtool test without docker on ubuntu/focal
before this change, we use docker for running promtools offered by
a docker image, but this is not efficient, and quite a few developers
do not want to use docker for running "make check". this change was
introduced by #39246, the reason was that, in Ceph's CI process, we
are using Ubuntu/Bionic for running "make check" jobs, but prometheus
packaged by Bionic does not offer the "test rules" command. so, to
address this problem, we are using "dnanexus/promtool:2.9.2" docker image
for verifying monitoring/prometheus/alerts/test_alerts.yml.
after this change, we use prometheus packaged by debian derivatives
instead of pulling a docker image.
* debian/control: add prometheus as a "make check" dependency
* install-deps.sh: partially revert 53a5816deda0874a3a37e131e9bc22d88bb2a588, as we don't need to
pull docker or start docker service for using promtool anymore.
* cmake: check if promtool is capable of running "test rules"
command, bail out if it is not.
Kefu Chai [Thu, 18 Mar 2021 11:50:58 +0000 (19:50 +0800)]
install-deps.sh: install boost 1.75 on focal
we bump boost on a regular basis. let's take the opportunity of moving to
focal to use boost v1.75.
v1.73 was used before this change. since both boost 1.75 and boost 1.73
install some files at the same places, we need to remove boost 1.73
before installing boost 1.75.
Kefu Chai [Thu, 18 Mar 2021 11:43:06 +0000 (19:43 +0800)]
install-deps.sh: install libzbd on focal
WITH_ZBD is enabled for testing the build of zbd bluestore backend, and
we plan to migrate to Ubuntu/Focal for testing "make check", so need to
install libzbd when the distro version is focal.
Kefu Chai [Fri, 19 Mar 2021 12:20:32 +0000 (20:20 +0800)]
osd/PeeringState: remove unused variable
recovery_ec_pool_below_min_size was used to verify if the osds in the cluster
are octopus and up, but since we are now quincy and up, there is no need
to verify this. so drop it for better readability and for silencing
the -Wunused-variable warning in Release build.
Kefu Chai [Fri, 19 Mar 2021 11:23:09 +0000 (19:23 +0800)]
script/run-make.sh: quote targets with double quote
in
ceph-build/ceph-perf-pull-requests/config/definitions/ceph-perf-pull-requests.yml,
we pass "vstart-base crimson-osd" as the targets argument, but the
build() function in ceph/src/script/run-make.sh fails to quote them, so
they are expanded into two arguments of `test -n`. hence it breaks like
src/script/run-make.sh: line 124: test: vstart-base: binary operator expected
make will run with option(s) -j40
Unknown argument vstart-base
Unknown argument crimson-osd
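The word splitting behind this failure can be reproduced with Python's `shlex`, which approximates POSIX shell tokenization. It is not the shell itself, so treat this as an illustration of the splitting only, not of `test`'s behaviour.

```python
import shlex

targets = "vstart-base crimson-osd"

# Unquoted: the value is split on whitespace into two separate words.
unquoted = shlex.split(f"test -n {targets}")

# Double-quoted: the value stays a single word.
quoted = shlex.split(f'test -n "{targets}"')

print(unquoted)
print(quoted)
```

In the unquoted form `test -n` receives two operands where it expects one, which matches the "binary operator expected" error shown above.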
Kefu Chai [Fri, 19 Mar 2021 08:18:23 +0000 (16:18 +0800)]
run-make-check.sh: increase fs.aio-max-nr
without this change the seastar based tests fail on a host with 48 cores,
because the /proc/sys/fs/aio-nr used by the tests is greater than
1048576. if run-make-check.sh is used to launch the test, the default
job number is `$(nproc) / 2`, and the peak number of /proc/sys/fs/aio-nr
when running ctest was 3190848 when testing on the 48-core host.
so we need to increase fs.aio-max-nr according to the available cores
on the host.
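A rough sizing sketch based only on the figures quoted above. Scaling linearly with the core count is an assumption made here, and `suggested_aio_max_nr` is a hypothetical helper; the formula actually used by run-make-check.sh is not stated in this message.

```python
# Figures quoted in the commit message.
PEAK_AIO_NR = 3190848   # observed peak of /proc/sys/fs/aio-nr
CORES = 48              # cores on the host where the peak was observed
OLD_LIMIT = 1048576     # limit the tests ran into


def suggested_aio_max_nr(ncores, per_core=None):
    """Scale the observed peak linearly with the number of cores (assumption)."""
    if per_core is None:
        # Ceiling division, so the per-core budget never rounds down.
        per_core = -(-PEAK_AIO_NR // CORES)
    return ncores * per_core
```

With these numbers the per-core budget is 66476, so a 48-core host needs at least 3190848, about three times the old limit.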
Kefu Chai [Fri, 19 Mar 2021 04:05:45 +0000 (12:05 +0800)]
pybind/mgr/dashboard: bump flake8 to 3.9.0
to address the failure of
ERROR: Cannot install -r requirements-lint.txt (line 2) and -r requirements-lint.txt (line 8) because these package versions have conflicting dependencies.
The conflict is caused by:
flake8 3.8.4 depends on pycodestyle<2.7.0 and >=2.6.0a1
autopep8 1.5.6 depends on pycodestyle>=2.7.0
To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict
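The conflict pip reports can be checked by hand: the two requirement ranges for pycodestyle do not overlap. A minimal sketch using plain tuple comparison follows; the pre-release lower bound 2.6.0a1 is simplified to 2.6.0, and the function names are made up for the example.

```python
def parse(version):
    """Parse 'X.Y.Z' into a tuple of ints (pre-release tags are not handled)."""
    return tuple(int(part) for part in version.split("."))


def flake8_3_8_4_accepts(pycodestyle):
    # flake8 3.8.4: pycodestyle>=2.6.0a1,<2.7.0 (lower bound simplified)
    return (2, 6, 0) <= parse(pycodestyle) < (2, 7, 0)


def autopep8_1_5_6_accepts(pycodestyle):
    # autopep8 1.5.6: pycodestyle>=2.7.0
    return parse(pycodestyle) >= (2, 7, 0)


def common_versions(candidates):
    """Return the candidate versions that satisfy both requirements."""
    return [v for v in candidates
            if flake8_3_8_4_accepts(v) and autopep8_1_5_6_accepts(v)]
```

No candidate can pass both checks, because one range ends exactly where the other begins.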
Sage Weil [Thu, 18 Mar 2021 20:11:38 +0000 (16:11 -0400)]
Merge PR #40048 into master
* refs/pull/40048/head:
mgr/cephadm: stop conflicting daemon when deploying to a specific port
mgr/cephadm: make DaemonPlacement print nicer
mgr/cephadm: fix --force remove comment
mgr/cephadm/schedule: choose an IP from a subnet list
mgr/cephadm: rgw: clean up config and config-key values on removal
mgr/cephadm: rgw: drop .crt extension when storing cert in config-key
mgr/cephadm/services: allow beast/civetweb to bind to a particular IP
python-common: add 'networks' property to ServiceSpec
mgr/cephadm/schedule: match placement ip only combination with port
Reviewed-by: Sebastian Wagner <swagner@suse.com>
Reviewed-by: Juan Miguel Olmo <jolmomar@redhat.com>
Sage Weil [Thu, 18 Mar 2021 18:26:48 +0000 (14:26 -0400)]
cephadm: prevent podman from breaking socket.getfqdn()
socket.getfqdn() will return the reverse lookup for 127.0.1.1, which is
the last item listed for that IP in /etc/hosts. Podman, by default, will
append the container name (ceph-$fsid-$name) to that line, which is not
a valid hostname, and not what we want the dashboard to use for the URI
it advertises in the service map.
Pass --no-hosts to podman to disable this.
Docker does not appear to modify /etc/hosts by default--or, more
importantly, does not add the container name there.
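A sketch of how a launcher might add the flag only for podman. `container_run_args` and the exact argument list are hypothetical and do not reproduce cephadm's code; `--no-hosts` is the podman option named in this message.

```python
def container_run_args(engine, image, name):
    """Build an argv list for running a container (illustrative only)."""
    args = [engine, "run", "--rm", "--name", name]
    if engine == "podman":
        # Keep podman from adding the container name to /etc/hosts.
        args.append("--no-hosts")
    args.append(image)
    return args
```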
Explicitly instruct podman (and docker) to add a
Fixes: https://tracker.ceph.com/issues/49890
Signed-off-by: Sage Weil <sage@newdream.net>
Adam King [Thu, 18 Mar 2021 17:20:46 +0000 (13:20 -0400)]
mgr/orchestrator: remove image name field from 'orch ps' and 'orch ls'
Now that we're typically using the image digests the name isn't as helpful. We also
end up in scenarios where some images use tags for their name and others use the
digest so the image name comes out as "mix" in orch ls despite it being the same image.
Fixes: https://tracker.ceph.com/issues/47333
Signed-off-by: Adam King <adking@redhat.com>
Sage Weil [Thu, 18 Mar 2021 16:45:48 +0000 (11:45 -0500)]
mon/MgrStatMonitor: ignore MMgrReport from non-active mgr
If it's not the active mgr, we should ignore it.
Since the mgr instance is best identified by the gid, add that to the
message. (We can't use the source_addrs for the message since that is
the MgrStandby monc addr, not the active mgr addrs in the MgrMap.)
This fixes a problem where a just-demoted mgr report gets processed and a
new mgr gets a ServiceMap with an epoch >= its pending map. (At least,
that is my theory!)
Fixes: https://tracker.ceph.com/issues/48022
Signed-off-by: Sage Weil <sage@newdream.net>
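The check described in this commit amounts to comparing the gid carried in the report with the active mgr's gid. The sketch below uses stand-in Python types with hypothetical names; the real MgrMap and MMgrReport are C++ classes with many more fields.

```python
from dataclasses import dataclass


@dataclass
class MgrMap:
    active_gid: int


@dataclass
class MgrReport:
    gid: int
    payload: str


def handle_report(mgrmap, report, accepted):
    """Process the report only if it comes from the active mgr."""
    if report.gid != mgrmap.active_gid:
        # Sent by a standby or just-demoted mgr: ignore it.
        return False
    accepted.append(report.payload)
    return True
```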
qa/tasks: Add additional wait_for_clean() check in lost_unfound tasks.
At the end of the lost_unfound tests add an additional wait_for_clean()
check to ensure that recoveries get enough time to complete before
proceeding and avoid failures down the line. For example, a failure like
"Scrubbing terminated -- not all pgs were active and clean." is because
recoveries on the PGs did not get sufficient time to complete even though
they were bound to eventually complete.
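The waiting behaviour can be pictured as a poll with a deadline. `wait_until_clean` below is a hypothetical standalone helper written for illustration and is not the qa framework's `wait_for_clean()`; the `all_pgs_clean` callable, the timeout and the interval are all assumptions.

```python
import time


def wait_until_clean(all_pgs_clean, timeout=300.0, interval=1.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll `all_pgs_clean()` until it returns True or `timeout` expires."""
    deadline = clock() + timeout
    while not all_pgs_clean():
        if clock() >= deadline:
            raise TimeoutError("not all pgs were active and clean")
        # Give recovery more time before checking again.
        sleep(interval)
```

The `sleep` and `clock` parameters exist only so the helper can be exercised without real delays.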