git.apps.os.sepia.ceph.com Git - ceph.git/log
Patrick Seidensal [Thu, 9 Dec 2021 13:59:49 +0000 (14:59 +0100)]
monitoring: fix display ceph_osd_in in Grafana panel
Signed-off-by: Patrick Seidensal <pseidensal@suse.com>
(cherry picked from commit
4a6b2c1dfbbe7182beaf510c4a7297a79c6e2524 )
Patrick Seidensal [Mon, 25 Oct 2021 13:00:14 +0000 (15:00 +0200)]
mgr/prometheus: Fix regression with OSD/host details/overview dashboards
Fix issues with PromQL expressions and vector matching with the
`ceph_disk_occupation` metric.
As it turns out, `ceph_disk_occupation` cannot simply be used as
expected, as there seem to be some edge cases for users that have
several OSDs on a single disk. This leads to issues which cannot be
solved by PromQL alone (many-to-many PromQL errors). The data we
expected is simply different in some rare cases.
I have not found a PromQL-only solution to this issue. What we basically
need is the following.
1. Match on labels `host` and `instance` to get one or more OSD names
from a metadata metric (`ceph_disk_occupation`) to let a user know
about which OSDs belong to which disk.
2. Match on labels `ceph_daemon` of the `ceph_disk_occupation` metric,
in which case the value of `ceph_daemon` must not refer to more than
a single OSD. The exact opposite to requirement 1.
As both operations are currently performed on a single metric, and there
is no way to satisfy both requirements on a single metric, the intention
of this commit is to extend the metric by providing a similar metric
that satisfies one of the requirements. This enables the queries to
differentiate between a vector matching operation to show a string to
the user (where `ceph_daemon` could possibly be `osd.1` or
`osd.1+osd.2`) and to match a vector by having a single `ceph_daemon` in
the condition for the matching.
Although the `ceph_daemon` label is used on a variety of daemons, only
OSDs seem to be affected by this issue (only if more than one OSD is run
on a single disk). This means that only the `ceph_disk_occupation`
metadata metric seems to need to be extended and provided as two
metrics.
`ceph_disk_occupation` is supposed to be used for matching the
`ceph_daemon` label value.
foo * on(ceph_daemon) group_left ceph_disk_occupation
`ceph_disk_occupation_human` is supposed to be used for anything where
the resulting data is displayed to be consumed by humans (graphs, alert
messages, etc).
foo * on(device,instance)
group_left(ceph_daemon) ceph_disk_occupation_human
Fixes: https://tracker.ceph.com/issues/52974
Signed-off-by: Patrick Seidensal <pseidensal@suse.com>
(cherry picked from commit
18d3a71618a5e3bc3cbd0bce017fb7b9c18c2ca0 )
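The split into `ceph_disk_occupation` and `ceph_disk_occupation_human` described in the commit above can be illustrated with a small Python sketch. This is not the mgr/prometheus code: the input tuples and the helper names `per_daemon_series` and `human_series` are assumptions made for illustration. It only shows why one label set keeps a single `ceph_daemon` per series while the other joins several OSD names with `+` for display.

```python
from collections import defaultdict

# Hypothetical input: one (ceph_daemon, device, instance) tuple per OSD.
osd_disks = [
    ("osd.0", "sda", "node1"),
    ("osd.1", "sdb", "node1"),
    ("osd.2", "sdb", "node1"),  # two OSDs share one disk
]


def per_daemon_series(rows):
    """One label set per OSD, so `ceph_daemon` always names a single daemon."""
    return [
        {"ceph_daemon": daemon, "device": device, "instance": instance}
        for daemon, device, instance in rows
    ]


def human_series(rows):
    """One label set per (device, instance); OSD names are joined with '+'."""
    grouped = defaultdict(list)
    for daemon, device, instance in rows:
        grouped[(device, instance)].append(daemon)
    return [
        {
            "ceph_daemon": "+".join(sorted(daemons)),
            "device": device,
            "instance": instance,
        }
        for (device, instance), daemons in sorted(grouped.items())
    ]
```

The first shape suits matching on `ceph_daemon`, the second suits matching on `device` and `instance` when the result is shown to a person.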
Patrick Seidensal [Mon, 25 Oct 2021 08:51:35 +0000 (10:51 +0200)]
mgr/prometheus: Refactoring: Introduce type aliases
Fixes: https://tracker.ceph.com/issues/52974
Signed-off-by: Patrick Seidensal <pseidensal@suse.com>
(cherry picked from commit
154d3525b19135a929851c0b027da19abda20ebe )
Guillaume Abrioux [Mon, 24 Jan 2022 12:39:15 +0000 (13:39 +0100)]
Merge pull request #44708 from guits/wip-53962-pacific
pacific: ceph-volume: show RBD devices as not available
Ernesto Puerta [Fri, 21 Jan 2022 19:44:47 +0000 (20:44 +0100)]
Merge pull request #44534 from rhcs-dashboard/wip-53834-pacific
pacific: mgr/dashboard: Update Angular version to 12
Reviewed-by: Waad Alkhoury <walkhour@redhat.com>
Reviewed-by: Alfonso Martínez <almartin@redhat.com>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>
Guillaume Abrioux [Fri, 21 Jan 2022 12:48:04 +0000 (13:48 +0100)]
Merge pull request #44701 from guits/wip-53955-pacific
pacific: ceph-volume: don't use MultiLogger in find_executable_on_host()
Michael Fritch [Tue, 18 Jan 2022 22:15:45 +0000 (15:15 -0700)]
ceph-volume: filter RBD devices from the device inventory
Avoid running `blkid` or deploying OSDs on RBD devices by ensuring they
do not appear in the `ceph-volume inventory`
Fixes: https://tracker.ceph.com/issues/53846
Signed-off-by: Michael Fritch <mfritch@suse.com>
(cherry picked from commit
47325ec3ec5ce1d53c5eae2952f631e95b7135fe )
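A minimal sketch of the filtering idea from the commit above, assuming RBD block devices can be recognized by a `/dev/rbd` path prefix. The helper names are hypothetical and this is not the ceph-volume implementation, which may identify RBD devices differently.

```python
def is_rbd_device(path: str) -> bool:
    # Assumption: kernel RBD devices show up as /dev/rbd0, /dev/rbd0p1, ...
    return path.startswith("/dev/rbd")


def filter_inventory(paths):
    """Return only the device paths that are not RBD devices."""
    return [p for p in paths if not is_rbd_device(p)]
```

With such a filter in front of the inventory, RBD devices never reach the steps that probe devices or deploy OSDs on them.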
Adam King [Thu, 20 Jan 2022 19:51:04 +0000 (14:51 -0500)]
Merge pull request #44681 from guits/split-cephadm-distros
qa: split distro for rados/cephadm/smoke tests
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
Guillaume Abrioux [Wed, 19 Jan 2022 14:04:20 +0000 (15:04 +0100)]
ceph-volume: don't use MultiLogger in find_executable_on_host()
This generates a lot of unnecessary messages on the terminal.
Fixes: https://tracker.ceph.com/issues/53934
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit
3be55621600be3ebc9c70295a3a351dab426b3a3 )
Ernesto Puerta [Thu, 20 Jan 2022 17:22:54 +0000 (18:22 +0100)]
Merge pull request #44480 from rhcs-dashboard/wip-53616-pacific
pacific: mgr/prometheus: expose ceph healthchecks as metrics
Reviewed-by: Avan Thakkar <athakkar@redhat.com>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>
Reviewed-by: Paul Cuzner <pcuzner@redhat.com>
Reviewed-by: sebastian-philipp <NOT@FOUND>
Guillaume Abrioux [Thu, 20 Jan 2022 10:29:52 +0000 (11:29 +0100)]
qa: split distro for rados/cephadm/smoke tests
There was a difference between master and pacific.
The hwe kernel modification for Ubuntu 20.04 should be done
only for cephadm tests. Modifying `qa/distros/all/ubuntu_20.04.yaml` broke
many tests.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Sebastian Wagner [Thu, 20 Jan 2022 10:09:29 +0000 (11:09 +0100)]
Merge pull request #44635 from sebastian-philipp/pacific-backport-44506
pacific: qa/suites/orch/cephadm: Also run the rbd/iscsi suite
Reviewed-by: Adam King <adking@redhat.com>
Yuri Weinstein [Wed, 19 Jan 2022 22:05:38 +0000 (14:05 -0800)]
Merge pull request #44596 from idryomov/wip-xfstests-qemu-cert-pacific
pacific: qa/run_xfstests_qemu.sh: stop reporting success without actually running any tests
Reviewed-by: Deepika Upadhyay <dupadhya@redhat.com>
Yuri Weinstein [Wed, 19 Jan 2022 22:04:47 +0000 (14:04 -0800)]
Merge pull request #44594 from idryomov/wip-diff-iterate-parent-fix-pacific
pacific: librbd: restore diff-iterate include_parent functionality in fast-diff mode
Reviewed-by: Mykola Golub <mgolub@mirantis.com>
Yuri Weinstein [Wed, 19 Jan 2022 22:04:06 +0000 (14:04 -0800)]
Merge pull request #44547 from cfsnyder/wip-53839-pacific
pacific: librbd: diff-iterate reports incorrect offsets in fast-diff mode
Reviewed-by: Ilya Dryomov <idryomov@redhat.com>
Sebastian Wagner [Wed, 19 Jan 2022 15:11:53 +0000 (16:11 +0100)]
Merge pull request #44626 from sebastian-philipp/pacific-backport-42905
pacific: python-common: improve OSD spec error messages
Reviewed-by: Michael Fritch <mfritch@suse.com>
Sebastian Wagner [Wed, 19 Jan 2022 12:35:41 +0000 (13:35 +0100)]
Merge pull request #44644 from guits/wip-53916-pacific
pacific: ceph-volume: fix regression introduced via #43536
Reviewed-by: Sebastian Wagner <sewagner@redhat.com>
Ernesto Puerta [Wed, 19 Jan 2022 12:14:59 +0000 (13:14 +0100)]
Merge pull request #44652 from rhcs-dashboard/wip-53921-pacific
pacific: mgr/dashboard: Refactoring dashboard cephadm checks
Reviewed-by: Alfonso Martínez <almartin@redhat.com>
Reviewed-by: Avan Thakkar <athakkar@redhat.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>
Ernesto Puerta [Wed, 19 Jan 2022 12:03:24 +0000 (13:03 +0100)]
Merge pull request #44650 from aaSharma14/wip-53828-pacific
pacific: mgr/dashboard: monitoring:Implement BlueStore onode hit/miss counters into the dashboard
Reviewed-by: Aashish Sharma <aasharma@redhat.com>
Reviewed-by: Avan Thakkar <athakkar@redhat.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>
Reviewed-by: Pere Diaz Bou <pdiazbou@redhat.com>
Sebastian Wagner [Thu, 25 Nov 2021 16:38:35 +0000 (17:38 +0100)]
python-common/tests: Remove filestore tests in test_disk_selector.py
Signed-off-by: Sebastian Wagner <sewagner@redhat.com>
(cherry picked from commit
1c40ca1e37e5e798cfd9cf317f39b11dd22ea086 )
Sebastian Wagner [Wed, 10 Nov 2021 14:54:42 +0000 (15:54 +0100)]
python-common: Don't validate ServiceSpec.from_json() in `orch ls`
Unfortunately, `ceph orch ls` may return invalid OSD specs for
OSDs not associated with any specs.
Signed-off-by: Sebastian Wagner <sewagner@redhat.com>
(cherry picked from commit
3f38583b7189d99be360d8475fe6ef8cd53dee7c )
Conflicts:
src/pybind/mgr/orchestrator/module.py
Sebastian Wagner [Wed, 22 Sep 2021 11:46:52 +0000 (13:46 +0200)]
python-common: HostSpec: add `validate()`
Adjust HostSpec interface to ServiceSpec
Signed-off-by: Sebastian Wagner <sewagner@redhat.com>
(cherry picked from commit
7c6d922dead8480cd1f2cd05be7ccd1d8d5b7dd8 )
Conflicts:
src/python-common/ceph/deployment/service_spec.py
Sebastian Wagner [Wed, 1 Sep 2021 13:46:12 +0000 (15:46 +0200)]
python-common: DriveGroupSpec: move placement validation to validate()
Signed-off-by: Sebastian Wagner <sewagner@redhat.com>
(cherry picked from commit
311860412e840e6b31e04b80a9de5e9ae05e7fb7 )
Sebastian Wagner [Wed, 1 Sep 2021 13:36:01 +0000 (15:36 +0200)]
python-common: DriveGroupSpec: Allow unnamed OSD specs
Because it never actually worked as expected.
Remove the duplicated service_id check, because it's already
verified by the parent method.
Fixes: https://tracker.ceph.com/issues/46253
Signed-off-by: Sebastian Wagner <sewagner@redhat.com>
(cherry picked from commit
8b567e132d75711179febac126c5ec8a250b8952 )
Conflicts:
src/python-common/ceph/deployment/service_spec.py
Sebastian Wagner [Tue, 24 Aug 2021 12:57:27 +0000 (14:57 +0200)]
python-common: Improve DriveSelection error messages
Fixes: https://tracker.ceph.com/issues/50685
Signed-off-by: Sebastian Wagner <sewagner@redhat.com>
(cherry picked from commit
74f29b97ea3331d43391cd40fe843104a2c15c3d )
Sebastian Wagner [Tue, 24 Aug 2021 10:56:21 +0000 (12:56 +0200)]
python-common: OSD specs: Improve quality of error messages
Fixes: https://tracker.ceph.com/issues/47401
Signed-off-by: Sebastian Wagner <sewagner@redhat.com>
(cherry picked from commit
4142c52d7406bb67042d9ad7b26d8e84f5a734ba )
Conflicts:
src/python-common/ceph/deployment/drive_group.py
Sebastian Wagner [Tue, 24 Aug 2021 12:31:56 +0000 (14:31 +0200)]
python-common: Remove duplicated DriveGroupSpec.__repr__ and __eq__
Signed-off-by: Sebastian Wagner <sewagner@redhat.com>
(cherry picked from commit
b91f81801af40c213adfbc88c8fd148b4edf3ede )
Conflicts:
src/python-common/ceph/deployment/drive_group.py
Sebastian Wagner [Wed, 22 Sep 2021 12:20:24 +0000 (14:20 +0200)]
mgr/orch: re-raise to make debugging easier
Signed-off-by: Sebastian Wagner <sewagner@redhat.com>
(cherry picked from commit
38b52f715fa581f3540ad6fc4c595ab0ede83ece )
Sebastian Wagner [Wed, 19 Jan 2022 10:39:36 +0000 (11:39 +0100)]
Merge pull request #44627 from sebastian-philipp/pacific-backport-44228
pacific: mgr/cephadm: fix 'cephadm osd activate' on existing osd devices
Reviewed-by: Adam King <adking@redhat.com>
Sebastian Wagner [Wed, 19 Jan 2022 10:39:13 +0000 (11:39 +0100)]
Merge pull request #44625 from sebastian-philipp/pacific-backport-43149
pacific: mgr/cephadm: Add client.admin keyring when upgrading from older version
Reviewed-by: Michael Fritch <mfritch@suse.com>
Guillaume Abrioux [Fri, 14 Jan 2022 17:20:10 +0000 (18:20 +0100)]
qa/cephadm: install hwe kernel only for focal
Let's install the hwe kernel only on Ubuntu focal; otherwise we only shift the
problem to Ubuntu bionic, given that the hwe kernel for bionic is 5.4.
Fixes: https://tracker.ceph.com/issues/53863
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit
5c0f0698a5b8db75ae9bcdca311a68a1589ee0a5 )
Guillaume Abrioux [Thu, 13 Jan 2022 21:46:03 +0000 (22:46 +0100)]
qa/nvme_loop: fix an issue on ubuntu 18.04
The following command:
```
echo /dev/sda | tee /sys/kernel/config/nvmet/subsystems/sda/namespaces/1/device_path
```
makes nvme_loop fail because fascinatingly, it adds an unexpected newline.
See:
```
/dev/sda
/dev/sda
1
tee: /sys/kernel/config/nvmet/subsystems/sda/namespaces/1/enable: No such file or directory
/dev/sda
1
```
Other distros don't have the same behavior:
```
CentOS 8
/dev/sda
/dev/sda
1
Ubuntu 20.04
/dev/sda
/dev/sda
1
```
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit
f8e22fb3da9bfbdc75d88beb66543716afb19511 )
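The newline problem described in the commit above can be avoided by writing the value without a trailing newline. The sketch below shows that idea in Python; the helper `write_attr` is hypothetical, it writes to an ordinary file standing in for the configfs attribute, and it is not the change made in qa/nvme_loop.

```python
def write_attr(path: str, value: str) -> None:
    """Write `value` to `path` with any trailing newline stripped."""
    with open(path, "w") as attr:
        attr.write(value.rstrip("\n"))
```

Stripping the newline on the writer side keeps the behavior the same regardless of how the target interprets trailing whitespace.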
Guillaume Abrioux [Mon, 10 Jan 2022 09:21:53 +0000 (10:21 +0100)]
ceph-volume: fix regression introduced via #43536
The recent changes from PR #43536 introduced a regression that prevents
ceph-volume from running in a containerized context on Ubuntu 18.04.
The path of the `lvs` binary differs between CentOS 8 and Ubuntu 18.04
(`/usr/sbin/lvs` and `/sbin/lvs` respectively). As a result, ceph-volume running
in a CentOS 8 container sees the `lvs` binary at `/usr/sbin/lvs` and tries to
run it with `nsenter` on the host, which is running Ubuntu 18.04.
Fixes: https://tracker.ceph.com/issues/53812
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit
95e88cda3df76b59b548ae808df0ef7f19db1f63 )
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit
3c93ffdc92d4d03b9ae7415b548192a572cfc5ea )
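One way to avoid assuming that a binary lives at the same path in the container and on the host is to resolve it against the host's own filesystem. The sketch below is an illustration under stated assumptions: the function name `locate_on_host`, its parameters, and the search directory list are hypothetical and are not taken from ceph-volume.

```python
import os

# Assumption: a fixed list of directories to probe on the host.
DEFAULT_DIRS = ("/usr/sbin", "/sbin", "/usr/bin", "/bin")


def locate_on_host(name, root="/", search_dirs=DEFAULT_DIRS):
    """Return the host-relative path of `name`, or None if it is not found.

    `root` is where the host filesystem is visible from the caller.
    """
    for directory in search_dirs:
        candidate = os.path.join(root, directory.lstrip("/"), name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return os.path.join(directory, name)
    return None
```

The returned path is expressed relative to the host, so it can be handed to a command that runs in the host's namespace.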
Nizamudeen A [Thu, 13 Jan 2022 12:58:56 +0000 (18:28 +0530)]
mgr/dashboard: Refactoring dashboard cephadm checks
I isolated all the test suites into their respective files
so that it is easier to add more tests in the future.
I also gave priority to the host actions.
Create OSD checks are now written in a way that OSDs
are created only on the intended hosts. This will make
the host draining process easier and less time consuming.
Also tried to address the flaky force maintenance checks.
Removed some duplicated code.
Improved the service creation part to reduce the time taken
for its completion.
Fixes: https://tracker.ceph.com/issues/53905
Signed-off-by: Nizamudeen A <nia@redhat.com>
(cherry picked from commit
b6759b75c9fc4d3fb565201aa6bbe0c2473fd3d4 )
Ernesto Puerta [Thu, 13 Jan 2022 16:21:12 +0000 (17:21 +0100)]
qa/dashboard: ensure node 16 is installed
For Ubuntu: https://github.com/nodesource/distributions#manual-installation
Fixes: https://tracker.ceph.com/issues/53843
Signed-off-by: Ernesto Puerta <epuertat@redhat.com>
(cherry picked from commit
7225b68e46173350954beb418ecd43e9eca4d179 )
Aashish Sharma [Mon, 13 Dec 2021 12:03:02 +0000 (17:33 +0530)]
mgr/dashboard: monitoring:Implement BlueStore onode hit/miss counters into the dashboard
Provide the details pulled from Bluestore stats in order to display the onode hit/miss counters
Fixes: https://tracker.ceph.com/issues/53577
Signed-off-by: Aashish Sharma <aasharma@redhat.com>
(cherry picked from commit
15aa4dffa91b325014024d3e35603d88330b87cc )
Ernesto Puerta [Tue, 18 Jan 2022 20:01:51 +0000 (21:01 +0100)]
Merge pull request #44467 from rhcs-dashboard/wip-53780-pacific
pacific: mgr/dashboard: fix orchestrator/02-hosts-inventory.e2e failure
Reviewed-by: Waad Alkhoury <walkhour@redhat.com>
Reviewed-by: Alfonso Martínez <almartin@redhat.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>
Reviewed-by: Pere Diaz Bou <pdiazbou@redhat.com>
Reviewed-by: Yuri Weinstein <yweins@redhat.com>
Ernesto Puerta [Tue, 18 Jan 2022 19:58:47 +0000 (20:58 +0100)]
Merge pull request #44533 from rhcs-dashboard/wip-53825-pacific
pacific: mgr/dashboard: add test coverage for API docs (SwaggerUI)
Reviewed-by: Waad Alkhoury <walkhour@redhat.com>
Reviewed-by: Alfonso Martínez <almartin@redhat.com>
Reviewed-by: Avan Thakkar <athakkar@redhat.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>
Sebastian Wagner [Tue, 18 Jan 2022 13:55:31 +0000 (14:55 +0100)]
Merge pull request #44529 from sebastian-philipp/pacific-backport-43901-44341
pacific: mgr/cephadm: Add snmp-gateway service support
Reviewed-by: Adam King <adking@redhat.com>
Reviewed-by: Paul Cuzner <pcuzner@redhat.com>
Sebastian Wagner [Mon, 10 Jan 2022 09:45:36 +0000 (10:45 +0100)]
qa/suites/orch/cephadm: Also run the rbd/iscsi suite
Adding a new workload test to our suite.
Signed-off-by: Sebastian Wagner <sewagner@redhat.com>
(cherry picked from commit
651192aacc4ac695a03f4ab0f7ffa045632d5d11 )
Sage Weil [Thu, 16 Dec 2021 15:00:05 +0000 (10:00 -0500)]
qa/suites/orch/cephadm/osds: test 'ceph cephadm osd activate'
Make sure this command behaves when the /var/lib/ceph osd.NNN dir is
removed.
Signed-off-by: Sage Weil <sage@newdream.net>
(cherry picked from commit
867bf04b74d510a544d9555afc56d5cd6657874d )
Sage Weil [Mon, 6 Dec 2021 15:19:57 +0000 (10:19 -0500)]
mgr/cephadm/services/osd: skip found osds that already have daemons
If we are trying to deploy new or newly-found osds, we can skip the ones
that already have cephadm daemons deployed.
Fixes: https://tracker.ceph.com/issues/53491
Signed-off-by: Sage Weil <sage@newdream.net>
(cherry picked from commit
dc3d45bbe8c3bfedee57da619616c0be489cd233 )
Conflicts:
src/pybind/mgr/cephadm/services/osd.py
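The skipping logic described in the commit above amounts to a set difference. A minimal sketch, assuming OSDs are identified by integer ids; the function name and parameters are hypothetical and do not mirror the code in mgr/cephadm/services/osd.py.

```python
def osds_needing_daemons(found_osd_ids, existing_daemon_ids):
    """Return found OSD ids that have no cephadm daemon yet, in input order."""
    existing = set(existing_daemon_ids)
    return [osd_id for osd_id in found_osd_ids if osd_id not in existing]
```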
Sage Weil [Mon, 6 Dec 2021 15:19:16 +0000 (10:19 -0500)]
mgr/cephadm: allow activation of OSDs that have previously started
When this code was introduced way back in ea987a0e56db106f7c76d11f86b3e602257f365e,
for some reason I was focused only on freshly created OSDs. The
get_osd_uuid_map() helper is used by deploy_osd_daemons_for_existing_osds()
which is called not only by OSD creation but also by 'ceph cephadm
osd activate', which is meant to instantiate daemons for existing OSD
devices (e.g., devices that were reattached to a new server, or whose
/var/lib/ceph/$fsid/osd.$id directory was lost for some other reason).
However, if we ignore OSDs with up_from > 0, then we can't recreate a
daemon instance for such existing OSDs--arguably the most important ones,
since they may hold real data.
Fixes: https://tracker.ceph.com/issues/53491
Signed-off-by: Sage Weil <sage@newdream.net>
(cherry picked from commit
40aeac7f52c80df0daa99bb664e3d672da3bc249 )
Sebastian Wagner [Mon, 20 Dec 2021 10:48:43 +0000 (11:48 +0100)]
python-common: move test_valid_snmp_gateway_spec from mgr/cephadm
We have to validate to_json() now as well, as we have special enums.
Otherwise we might end up with !!python... representations.
Signed-off-by: Sebastian Wagner <sewagner@redhat.com>
(cherry picked from commit
303843b476b442d0d398680b23aa244633768f29 )
Sebastian Wagner [Mon, 20 Dec 2021 10:37:40 +0000 (11:37 +0100)]
python-common: move test_invalid_snmp_gateway_spec from mgr/cephadm
Let's keep the tests in the same package where the class is defined.
Signed-off-by: Sebastian Wagner <sewagner@redhat.com>
(cherry picked from commit
c652ae74795252f875594b09627064d97ff2a762 )
Sebastian Wagner [Thu, 16 Dec 2021 16:57:50 +0000 (17:57 +0100)]
mgr/cephadm: SNMP: don't write urls manually
This was simply broken for non-trivial URLs. Don't be a bad example.
Signed-off-by: Sebastian Wagner <sewagner@redhat.com>
(cherry picked from commit
3f47c2293b9ace730d6f76c613ef2106f274ea32 )
Sebastian Wagner [Thu, 16 Dec 2021 16:51:07 +0000 (17:51 +0100)]
mgr/cephadm: SNMP: Don't write default values into the store
Enable us to change defaults in the future.
Signed-off-by: Sebastian Wagner <sewagner@redhat.com>
(cherry picked from commit
5e3cc4d6c167b7d5bdd0f08aa90ed7e7d0779b25 )
Sebastian Wagner [Thu, 16 Dec 2021 16:43:47 +0000 (17:43 +0100)]
mgr/cephadm: SNMP: use of python3 enums
Little reason to duplicate things ourselves
Signed-off-by: Sebastian Wagner <sewagner@redhat.com>
(cherry picked from commit
0039accb2caedf99166b88cc5b75736b6a7fd5c2 )
Conflicts:
src/pybind/mgr/orchestrator/module.py
src/python-common/ceph/deployment/service_spec.py
src/python-common/ceph/tests/test_service_spec.py
Paul Cuzner [Fri, 12 Nov 2021 03:16:59 +0000 (16:16 +1300)]
mgr/cephadm: Add snmp-gateway service support
Add a new snmp-gateway service to provide a bridge between
Prometheus and an SNMP management platform. The gateway
service uses https://github.com/maxwo/snmp_notifier to provide
SNMP v2c and SNMP V3 support.
The SNMP V3 support mandates at least authentication, and also
offers authentication and privacy (encryption).
Fixes: https://tracker.ceph.com/issues/52920
Signed-off-by: Paul Cuzner <pcuzner@redhat.com>
(cherry picked from commit
c2f5e105ca4870b2cb124db662537c20e6daadae )
Conflicts:
src/pybind/mgr/cephadm/module.py
src/pybind/mgr/orchestrator/_interface.py
src/pybind/mgr/orchestrator/module.py
src/python-common/ceph/deployment/service_spec.py
Paul Cuzner [Fri, 12 Nov 2021 03:19:00 +0000 (16:19 +1300)]
mgr/cephadm: Add unit tests for snmp-gateway support
Adds tests to validate the deployed configuration given a known
input context, and check the parameters created based on
various input scenarios.
Signed-off-by: Paul Cuzner <pcuzner@redhat.com>
(cherry picked from commit
2ffa81bb91618eb70708073096f39bc1f8e2a8e6 )
Conflicts:
src/pybind/mgr/cephadm/tests/test_services.py
Paul Cuzner [Fri, 12 Nov 2021 03:17:52 +0000 (16:17 +1300)]
mgr/cephadm: Updated docs for snmp-gateway support
Updated docs to show snmp-gateway usage. The docs provide
guidance on the SNMP versions supported and show CLI and
YAML deployment examples.
Signed-off-by: Paul Cuzner <pcuzner@redhat.com>
(cherry picked from commit
91f35e1f5355bb4d1c9e7be4a943d564483f4e13 )
Paul Cuzner [Wed, 13 Oct 2021 23:35:31 +0000 (12:35 +1300)]
mgr/cephadm: provide initial snmp gateway support
This patch enables the cephadm binary
to deploy an SNMP gateway based on -
https://hub.docker.com/r/maxwo/snmp-notifier
Fixes: https://tracker.ceph.com/issues/52920
Signed-off-by: Paul Cuzner <pcuzner@redhat.com>
(cherry picked from commit
5c997ad355dea01b1bec0b977f4b4ac33407d8d5 )
Conflicts:
src/cephadm/cephadm
Sebastian Wagner [Mon, 29 Nov 2021 10:50:59 +0000 (11:50 +0100)]
mgr/cephadm: serve.py: put _write_client_files into its own method
Signed-off-by: Sebastian Wagner <sewagner@redhat.com>
(cherry picked from commit
018807ef655068d699c70388e41284addee32040 )
Conflicts:
src/pybind/mgr/cephadm/serve.py
Sebastian Wagner [Mon, 29 Nov 2021 10:36:51 +0000 (11:36 +0100)]
mgr/cephadm: serve.py: put _calc_client_files into its own method
Signed-off-by: Sebastian Wagner <sewagner@redhat.com>
(cherry picked from commit
fb2321ec6988075777d8fc838f1d19034855264a )
Conflicts:
src/pybind/mgr/cephadm/serve.py
Sebastian Wagner [Mon, 13 Sep 2021 14:05:03 +0000 (16:05 +0200)]
mgr/cephadm: Raise errors to properly set a cli status code
otherwise `ceph orch host rm` will return 0
Signed-off-by: Sebastian Wagner <sewagner@redhat.com>
(cherry picked from commit
1a87e5eaf54b30c1974ed02aa7e69656d0106c27 )
Sebastian Wagner [Mon, 13 Sep 2021 14:03:02 +0000 (16:03 +0200)]
mgr/cephadm: Add client.admin keyring when upgrading from older version
Signed-off-by: Sebastian Wagner <sewagner@redhat.com>
(cherry picked from commit
02c942a093a28376301b9b4c66d9c712345ff953 )
Conflicts:
src/pybind/mgr/cephadm/tests/test_migration.py
Sebastian Wagner [Mon, 13 Sep 2021 07:56:06 +0000 (09:56 +0200)]
mgr/cephadm/inventory: remove unused `filter_by_label`
Signed-off-by: Sebastian Wagner <sewagner@redhat.com>
(cherry picked from commit
8de88a1d0ac4f4747fa15d45d2a82b34d6b35a95 )
Sebastian Wagner [Tue, 18 Jan 2022 10:18:37 +0000 (11:18 +0100)]
Merge pull request #44527 from sebastian-philipp/pacific-backport-44267
pacific: python-common: add int value validation for count and count_per_host
Reviewed-by: John Mulligan <jmulligan@redhat.com>
Sebastian Wagner [Tue, 18 Jan 2022 08:52:03 +0000 (09:52 +0100)]
Merge pull request #44528 from sebastian-philipp/pacific-backport-44293
pacific: cephadm: make extract_uid_gid errors more readable
Reviewed-by: Adam King <adking@redhat.com>
Sebastian Wagner [Tue, 18 Jan 2022 08:50:13 +0000 (09:50 +0100)]
Merge pull request #44526 from sebastian-philipp/pacific-backport-44035
pacific: mgr/cephadm: less log noise when config checks fail
Reviewed-by: Adam King <adking@redhat.com>
Sebastian Wagner [Tue, 18 Jan 2022 08:49:59 +0000 (09:49 +0100)]
Merge pull request #44248 from guits/pacific-backport-44104
pacific: cephadm: pass `CEPH_VOLUME_SKIP_RESTORECON=yes` (backport)
Reviewed-by: Adam King <adking@redhat.com>
Sebastian Wagner [Tue, 18 Jan 2022 08:30:52 +0000 (09:30 +0100)]
Merge pull request #44525 from sebastian-philipp/pacific-backport-44129-44109-44309
pacific: doc/cephadm: Doc backport
Reviewed-by: Adam King <adking@redhat.com>
Sebastian Wagner [Tue, 18 Jan 2022 08:30:24 +0000 (09:30 +0100)]
Merge pull request #44535 from adk3798/backport-44134
pacific: mgr/cephadm: avoid repeated calls to get_module_option
Reviewed-by: Michael Fritch <mfritch@suse.com>
Sebastian Wagner [Tue, 18 Jan 2022 08:29:48 +0000 (09:29 +0100)]
Merge pull request #44531 from sebastian-philipp/pacific-backport-44020
pacific: mgr/orchestrator: add filtering and count option for orch host ls
Reviewed-by: Adam King <adking@redhat.com>
Sebastian Wagner [Mon, 17 Jan 2022 09:16:37 +0000 (10:16 +0100)]
Merge pull request #44530 from sebastian-philipp/pacific-backport-44336
pacific: mgr/cephadm: Fix test_facts
Reviewed-by: Adam King <adking@redhat.com>
Ernesto Puerta [Fri, 14 Jan 2022 17:24:05 +0000 (18:24 +0100)]
Merge pull request #44597 from rhcs-dashboard/wip-53881-pacific
pacific: mgr/dashboard: fix: get SMART data from single-daemon device
Reviewed-by: Alfonso Martínez <almartin@redhat.com>
Reviewed-by: Avan Thakkar <athakkar@redhat.com>
Reviewed-by: Pere Diaz Bou <pdiazbou@redhat.com>
Alfonso Martínez [Thu, 13 Jan 2022 14:20:48 +0000 (15:20 +0100)]
mgr/dashboard: fix: get SMART data from single-daemon device
Return SMART data even when a device is only associated with a single daemon.
Fixes: https://tracker.ceph.com/issues/53858
Signed-off-by: Alfonso Martínez <almartin@redhat.com>
(cherry picked from commit
6cd3729e2737f9012569cffc6fd69cc5eed287ed )
Ilya Dryomov [Tue, 11 Jan 2022 20:26:12 +0000 (21:26 +0100)]
qa/tasks/qemu: get the new Let's Encrypt root certificate
Fixes: https://tracker.ceph.com/issues/53841
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit
b47965b5773d086eb64e7f91bdc05f483f562b00 )
Ilya Dryomov [Tue, 11 Jan 2022 12:13:01 +0000 (13:13 +0100)]
qa/run_xfstests_qemu.sh: harden against wget failures
If wget fails (e.g. due to a certificate issue), it still creates
an empty file. Then this file is marked executable, ./"${SCRIPT}"
immediately returns 0 and run_xfstests_qemu.sh exits successfully
without running a single xfstest.
This started on Sep 30, 2021 with the expiration of Let's Encrypt
root certificate -- all qemu jobs with "test: qa/run_xfstests_qemu.sh"
just booted the VM for a couple of seconds and reported success.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit
387be947948ff1dd40e88ae5288b9a52c7cde403 )
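The failure mode described in the commit above (an empty download that is marked executable and then "succeeds") can be guarded against by checking the file before changing its mode. The sketch below shows that check in Python; `mark_executable_if_valid` is a hypothetical helper and not the change made to run_xfstests_qemu.sh.

```python
import os
import stat


def mark_executable_if_valid(path: str) -> None:
    """Set the owner-execute bit, refusing missing or empty files."""
    if not os.path.isfile(path) or os.path.getsize(path) == 0:
        raise RuntimeError(f"download failed or produced an empty file: {path}")
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR)
```

Raising here turns a silent no-op run into a visible failure.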
Ilya Dryomov [Fri, 7 Jan 2022 12:31:08 +0000 (13:31 +0100)]
test/librbd: make diff-iterate clone tests exercise fast-diff mode
The fast-diff feature wasn't propagated to the clone so these tests
were exercising the slow list_snaps path no matter what RBD_FEATURES
value was supplied to ceph_test_librbd.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit
ceb13d76f2b3aba7209e85f3354970c072997742 )
Ilya Dryomov [Wed, 5 Jan 2022 19:24:40 +0000 (20:24 +0100)]
librbd: restore diff-iterate include_parent functionality in fast-diff mode
Commit 4429ed4f3f4c ("librbd: switch diff iterate API to use new snaps
list dispatch methods") removed the recursive execute() call. The new
list_snaps method does indeed handle parent diffs internally but it is
not used in fast-diff mode. Nothing changed there -- we still need to
load the parent object map, calculate parent object_diff_state, etc.
Fixes: https://tracker.ceph.com/issues/53787
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit
04293bef6ccd2b9ca3db53906b63c952e235cdb4 )
Ilya Dryomov [Wed, 5 Jan 2022 18:45:50 +0000 (19:45 +0100)]
librbd: stash unmodified include_parent value in DiffContext
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit
92ca5ec36496dd02f618dc161e52b24711baa47b )
Yuri Weinstein [Thu, 13 Jan 2022 15:50:19 +0000 (07:50 -0800)]
Merge pull request #44296 from batrick/i53445
pacific: mds: opening connection to up:replay/up:creating daemon causes message drop
Reviewed-by: Milind Changire <mchangir@redhat.com>
Yuri Weinstein [Thu, 13 Jan 2022 15:49:50 +0000 (07:49 -0800)]
Merge pull request #44272 from nmshelke/wip-53332-pacific
pacific: doc: prerequisites fix for cephFS mount
Reviewed-by: Milind Changire <mchangir@redhat.com>
Yuri Weinstein [Thu, 13 Jan 2022 15:49:25 +0000 (07:49 -0800)]
Merge pull request #44168 from cfsnyder/wip-50851-pacific
pacific: mds: PurgeQueue.cc fix for 32bit compilation
Reviewed-by: Milind Changire <mchangir@redhat.com>
Yuri Weinstein [Thu, 13 Jan 2022 15:48:30 +0000 (07:48 -0800)]
Merge pull request #43979 from lxbsz/wip-53218
pacific: qa: increase the timeout value to wait a little longer
Reviewed-by: Nikhilkumar Shelke <nshelke@redhat.com>
Reviewed-by: Milind Changire <mchangir@redhat.com>
Ilya Dryomov [Tue, 4 Jan 2022 19:38:35 +0000 (20:38 +0100)]
librbd: diff-iterate reports incorrect offsets in fast-diff mode
If rbd_diff_iterate2() is called on an image offset that doesn't
correspond to an object boundary, the callback is invoked with an
incorrect image offset. For example, assuming a fully allocated
image, a diff request for 806354944~57344 results in an offs=807403520,
len=57344, exists=true invocation, which is ahead by 1048576 bytes.
This occurs only in fast-diff mode. For a diff request on an image
with the fast-diff feature disabled, or if the whole_object parameter is
set to false, the invocation is correct.
This bug goes back to the introduction of fast-diff mode in commit
6d5b969d4206 ("librbd: add diff_iterate2 to API").
Fixes: https://tracker.ceph.com/issues/53784
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit
ea07d1e834018c693fc03637d338806f3c2f494f )
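The numbers in the commit above can be checked with a little arithmetic. The sketch assumes an object size of 4 MiB, which the commit message does not state. Under that assumption the requested offset lies 1048576 bytes into its object, which equals the reported discrepancy. Whether that is the actual cause is not claimed here.

```python
OBJECT_SIZE = 4 * 1024 * 1024  # assumption: 4 MiB objects


def split_offset(image_offset: int, object_size: int = OBJECT_SIZE):
    """Return (object number, offset within that object) for an image offset."""
    return divmod(image_offset, object_size)


requested_off = 806354944  # offset passed to the diff request
reported_off = 807403520   # offset the callback was invoked with

object_no, off_in_object = split_offset(requested_off)
discrepancy = reported_off - requested_off
```

Here `object_no` is 192, and both `off_in_object` and `discrepancy` are 1048576.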
Sage Weil [Mon, 29 Nov 2021 20:51:26 +0000 (15:51 -0500)]
pacific: mgr/cephadm: avoid repeated calls to get_module_option
We already stash these as MgrModule members.
Signed-off-by: Sage Weil <sage@newdream.net>
Conflicts:
src/pybind/mgr/cephadm/module.py
src/pybind/mgr/cephadm/serve.py
src/pybind/mgr/cephadm/services/cephadmservice.py
Nizamudeen A [Sun, 3 Oct 2021 18:56:45 +0000 (00:26 +0530)]
mgr/dashboard: Update Angular version to 12
A full changelog can be seen here: https://blog.angular.io/angular-v12-is-now-available-32ed51fbfd49
For us, the most I had to do was to take care of the min-max validation
and a small CSS change regarding math().
Fixes: https://tracker.ceph.com/issues/53049
Signed-off-by: Nizamudeen A <nia@redhat.com>
(cherry picked from commit
e3d92e4889a4022aebf343d1142388ba699b265d )
Conflicts:
src/pybind/mgr/dashboard/frontend/package-lock.json
- File regenerated.
src/pybind/mgr/dashboard/frontend/package.json
- Conflicts solved.
Alfonso Martínez [Mon, 3 Jan 2022 16:43:07 +0000 (17:43 +0100)]
mgr/dashboard: add test coverage for API docs (SwaggerUI)
Fixes: https://tracker.ceph.com/issues/53756
Signed-off-by: Alfonso Martínez <almartin@redhat.com>
(cherry picked from commit
7363bc3af1613f2b06eaf34ea8c57ee8f4583537 )
Adam King [Fri, 19 Nov 2021 00:43:35 +0000 (19:43 -0500)]
mgr/orchestrator: add filtering and count option for orch host ls
Filter orch host ls output for only hosts whose name
contains a certain substring or who have a certain label
Add a count flag that causes the command to return the number
of hosts found (either overall or matching the substring and/or
label) instead of a list of all the matching hosts
Fixes: https://tracker.ceph.com/issues/47774
Fixes: https://tracker.ceph.com/issues/53452
Signed-off-by: Adam King <adking@redhat.com>
(cherry picked from commit
edd9bf38c3f07f5fdb6714e7f66515820c736d2e )
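The filtering and counting behavior described in the commit above can be sketched as follows. The `Host` class and both function names are hypothetical; this is not the orchestrator module's code or its CLI syntax, only an illustration of combining a substring filter, a label filter, and a count.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Host:
    hostname: str
    labels: List[str] = field(default_factory=list)


def filter_hosts(hosts, substring: Optional[str] = None, label: Optional[str] = None):
    """Keep hosts whose name contains `substring` and that carry `label`."""
    result = []
    for host in hosts:
        if substring is not None and substring not in host.hostname:
            continue
        if label is not None and label not in host.labels:
            continue
        result.append(host)
    return result


def count_hosts(hosts, substring: Optional[str] = None, label: Optional[str] = None) -> int:
    """Return how many hosts match, instead of the matching hosts themselves."""
    return len(filter_hosts(hosts, substring, label))
```

With no filter given, `count_hosts` returns the overall number of hosts.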
Sebastian Wagner [Thu, 16 Dec 2021 15:40:08 +0000 (16:40 +0100)]
mgr/cephadm: Fix test_facts
Wasn't executed before
Signed-off-by: Sebastian Wagner <sewagner@redhat.com>
(cherry picked from commit
a03a34a01a70ce4d4ac8927a37d27e9853e46f8a )
Sebastian Wagner [Mon, 13 Dec 2021 11:54:22 +0000 (12:54 +0100)]
cephadm: make extract_uid_gid errors more readable
Avoid dumping a traceback
Signed-off-by: Sebastian Wagner <sewagner@redhat.com>
(cherry picked from commit
d732a51df3a8d6b9edc340251edcd024b0e70f09 )
John Mulligan [Fri, 10 Dec 2021 13:19:59 +0000 (08:19 -0500)]
python-common: add test inputs verifying count & count-per-host >= 1
This adds new unit test inputs, local to python-common, that verify the
correct error messages are raised when count == 0 and count_per_host ==
0.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit
0eb4e7dd56f3db6448080b0e9b880927c1bb7e04 )
John Mulligan [Fri, 10 Dec 2021 13:16:19 +0000 (08:16 -0500)]
python-common: make count & count-per-host >= 1 checks consistent
The previous version of the validate function had an incorrect error
statement that suggested the count must be >1 when it should have
been >=1. This confusion was possibly due to using "n < 1" on
one line and "n <= 0" on another line. Since both values are supposed
to be integers, this change corrects the error message and makes
the comparisons on the lines both use "n < 1" (since I find it easier
to see that the check "n < 1" is the inverse of the error text
asserting "n >= 1").
Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit
6169eb7f8e2462eb58338d6fc312b1347858b47f )
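The consistency argument in this commit message can be illustrated with a short sketch. The function name `validate_counts` and the exception class `PlacementError` are hypothetical, and the message wording is an assumption rather than the text used in python-common. The point is only that both checks use the same "n < 1" comparison, which is the direct inverse of the ">= 1" stated in the error text.

```python
from typing import Optional


class PlacementError(ValueError):
    """Hypothetical error type used only for this illustration."""


def validate_counts(
    count: Optional[int] = None,
    count_per_host: Optional[int] = None,
) -> None:
    # Both checks use "n < 1", mirroring the ">= 1" in the messages.
    if count is not None and count < 1:
        raise PlacementError(f"count must be >= 1 (got {count})")
    if count_per_host is not None and count_per_host < 1:
        raise PlacementError(
            f"count-per-host must be >= 1 (got {count_per_host})"
        )
```

For integers, "n < 1" and "n <= 0" are equivalent, so the change described above affects readability and the error text, not which values are rejected.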
John Mulligan [Wed, 8 Dec 2021 20:37:11 +0000 (15:37 -0500)]
python-common: add unit test func for invalid yaml inputs
I didn't find a preexisting test function for this so I added a
new test that is fed yaml snippets and expected error messages.
This verifies some of the recently added validation for
count and count_per_host under the placement spec.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit
068d37d95762bce4d11668a838c6e85f6098723a )
John Mulligan [Wed, 8 Dec 2021 20:33:54 +0000 (15:33 -0500)]
python-common: add int value validation for count and count_per_host
Add additional validation for the count and count_per_host fields
sourced from YAML.
Fixes: https://tracker.ceph.com/issues/50524
Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit
a9ad2a50fe83ea3342b7c1bbcfb942789e965cb4 )
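A rough sketch of what integer validation for values sourced from YAML could look like, together with a table of invalid inputs paired with expected error messages, in the spirit of the test function mentioned in the previous entry. Everything here is an assumption for illustration: the helper `to_positive_int`, the `INVALID_CASES` table and the message wording are hypothetical, and plain dicts stand in for already-parsed YAML so that the example needs no YAML library.

```python
from typing import Any, Dict, List, Optional, Tuple


def to_positive_int(name: str, value: Any) -> Optional[int]:
    """Return value unchanged if it is an int >= 1, None if unset."""
    if value is None:
        return None
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError(f"{name} must be an integer value (got {value!r})")
    if value < 1:
        raise ValueError(f"{name} must be >= 1 (got {value!r})")
    return value


# Each entry: (parsed placement mapping, fragment expected in the error).
INVALID_CASES: List[Tuple[Dict[str, Any], str]] = [
    ({"count": "three"}, "count must be an integer value"),
    ({"count": 0}, "count must be >= 1"),
    ({"count_per_host": 2.5}, "count_per_host must be an integer value"),
    ({"count_per_host": True}, "count_per_host must be an integer value"),
]


def check_cases(cases: List[Tuple[Dict[str, Any], str]]) -> List[str]:
    """Run every case and return descriptions of unexpected outcomes."""
    failures = []
    for placement, expected in cases:
        try:
            for key in ("count", "count_per_host"):
                to_positive_int(key, placement.get(key))
        except ValueError as e:
            if expected not in str(e):
                failures.append(f"{placement}: unexpected message {e}")
        else:
            failures.append(f"{placement}: no error raised")
    return failures
```

Keeping the inputs and expected messages in one table makes it cheap to add a new invalid case without writing another test function.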
Sage Weil [Sat, 20 Nov 2021 14:53:36 +0000 (09:53 -0500)]
mgr/cephadm: less log noise when config checks fail
We are already raising health alerts--there is no need to spam the log
every few seconds when these checks are evaluated.
Signed-off-by: Sage Weil <sage@newdream.net>
(cherry picked from commit
f2a2e2d92ca21700aeffc78cce4a3d3c5949fd3f )
Foad Lind [Tue, 14 Dec 2021 13:01:58 +0000 (14:01 +0100)]
doc/cephadm/upgrade: correct example command
Update the ceph version used in the example upgrade command to match the one mentioned in the text above it.
Signed-off-by: Foad Lind <foad.lind@citynetwork.eu>
(cherry picked from commit
5077eef37844c1fc25c444a5b54d44a37052875c )
Sebastian Wagner [Thu, 25 Nov 2021 14:52:20 +0000 (15:52 +0100)]
doc/cephadm: host location: add link to types
Signed-off-by: Sebastian Wagner <sewagner@redhat.com>
(cherry picked from commit
ee7ed53df865cfd1b88216cc7d27029172b935ef )
Brian_P [Mon, 29 Nov 2021 14:13:17 +0000 (14:13 +0000)]
doc: fix typo in cephadm host management
(cherry picked from commit
22ca9ce373efd527d838a58ed25617ce4e7dcd91 )
Sebastian Wagner [Tue, 11 Jan 2022 10:36:46 +0000 (11:36 +0100)]
Merge pull request #44446 from sebastian-philipp/pacific-backport-43827-43894-42906-43095-43929-43969-43873-43888-44092-44080-
pacific: cephadm: November batch 2
Reviewed-by: Adam King <adking@redhat.com>
Paul Cuzner [Wed, 3 Nov 2021 02:24:20 +0000 (15:24 +1300)]
mgr/prometheus: Update rule format and enhance SNMP support
Rules now adhere to the format defined by Prometheus.io.
This changes alert naming, and each alert now includes
a summary description to provide a quick one-liner.
In addition to reformatting, some missing alerts for MDS and
cephadm have been added, along with corresponding tests.
The MIB has also been refactored, so it now passes standard
lint tests, and a README is included for devs to understand the
OID schema.
Fixes: https://tracker.ceph.com/issues/53111
Signed-off-by: Paul Cuzner <pcuzner@redhat.com>
Alfonso Martínez [Fri, 7 Jan 2022 11:32:46 +0000 (12:32 +0100)]
Merge pull request #44468 from rhcs-dashboard/wip-53716-pacific
pacific: mgr/dashboard: fix timeout error in dashboard cephadm e2e job
Reviewed-by: Waad Alkhoury <walkhour@redhat.com>
Reviewed-by: Pere Diaz Bou <pdiazbou@redhat.com>
Yuri Weinstein [Thu, 6 Jan 2022 22:19:14 +0000 (14:19 -0800)]
Merge pull request #44171 from cfsnyder/wip-52073-pacific
pacific: rgw: user stats showing 0 value for "size_utilized" and "size_kb_utilized" fields
Reviewed-by: Casey Bodley <cbodley@redhat.com>
Yuri Weinstein [Thu, 6 Jan 2022 22:18:50 +0000 (14:18 -0800)]
Merge pull request #44166 from cfsnyder/wip-53289-pacific
pacific: rgw: fix `bi put` not using right bucket index shard
Reviewed-by: Casey Bodley <cbodley@redhat.com>
Yuri Weinstein [Thu, 6 Jan 2022 22:18:24 +0000 (14:18 -0800)]
Merge pull request #43968 from cfsnyder/wip-53256-pacific
pacific: librgw: treat empty root path as "/" on mount
Reviewed-by: Casey Bodley <cbodley@redhat.com>
Yuri Weinstein [Thu, 6 Jan 2022 22:17:58 +0000 (14:17 -0800)]
Merge pull request #43966 from cfsnyder/wip-53225-pacific
pacific: qa/rgw: bump tempest version to resolve dependency issue
Reviewed-by: Casey Bodley <cbodley@redhat.com>
Yuri Weinstein [Thu, 6 Jan 2022 22:17:28 +0000 (14:17 -0800)]
Merge pull request #43951 from cfsnyder/wip-53098-pacific
pacific: qa/rgw: Fix vault token file access.
Reviewed-by: Casey Bodley <cbodley@redhat.com>
Yuri Weinstein [Thu, 6 Jan 2022 22:17:00 +0000 (14:17 -0800)]
Merge pull request #43946 from cfsnyder/wip-53271-pacific
pacific: rgw/beast: optimizations for request timeout
Reviewed-by: Casey Bodley <cbodley@redhat.com>