ceph-volume: filter RBD devices from the device inventory

Avoid running `blkid` or deploying OSDs on RBD devices by ensuring they
do not appear in the output of `ceph-volume inventory`.
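A minimal Python sketch of the idea. It assumes RBD devices can be recognised by their kernel name (`/dev/rbdN`, with partitions such as `/dev/rbdNpM`). The helper names are hypothetical, and the actual ceph-volume change may detect RBD devices differently.

```python
import os


def is_rbd_device(path: str) -> bool:
    """Hypothetical check: krbd-mapped devices are named rbd<N>[p<M>]."""
    name = os.path.basename(path)
    return name.startswith("rbd") and name[3:4].isdigit()


def filter_inventory(paths):
    """Drop RBD devices so they are never probed with blkid or used for OSDs."""
    return [p for p in paths if not is_rbd_device(p)]


print(filter_inventory(["/dev/sda", "/dev/rbd0", "/dev/rbd1p1", "/dev/nvme0n1"]))
# ['/dev/sda', '/dev/nvme0n1']
```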

Fixes: https://tracker.ceph.com/issues/53846
Signed-off-by: Michael Fritch <mfritch@suse.com>
(cherry picked from commit 47325ec3ec5ce1d53c5eae2952f631e95b7135fe)

Merge pull request #44681 from guits/split-cephadm-distros

qa: split distro for rados/cephadm/smoke tests

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>

Merge pull request #44480 from rhcs-dashboard/wip-53616-pacific

pacific: mgr/prometheus: expose ceph healthchecks as metrics

Reviewed-by: Avan Thakkar <athakkar@redhat.com>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>
Reviewed-by: Paul Cuzner <pcuzner@redhat.com>
Reviewed-by: sebastian-philipp <NOT@FOUND>

qa: split distro for rados/cephadm/smoke tests

There was a difference between master and pacific.
The hwe kernel modification for Ubuntu 20.04 should be done
only for cephadm tests. Modifying `qa/distros/all/ubuntu_20.04.yaml` broke
many tests.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

Merge pull request #44635 from sebastian-philipp/pacific-backport-44506

pacific: qa/suites/orch/cephadm: Also run the rbd/iscsi suite

Reviewed-by: Adam King <adking@redhat.com>

Merge pull request #44596 from idryomov/wip-xfstests-qemu-cert-pacific

pacific: qa/run_xfstests_qemu.sh: stop reporting success without actually running any tests

Reviewed-by: Deepika Upadhyay <dupadhya@redhat.com>

Merge pull request #44594 from idryomov/wip-diff-iterate-parent-fix-pacific

pacific: librbd: restore diff-iterate include_parent functionality in fast-diff mode

Reviewed-by: Mykola Golub <mgolub@mirantis.com>

Merge pull request #44547 from cfsnyder/wip-53839-pacific

pacific: librbd: diff-iterate reports incorrect offsets in fast-diff mode

Reviewed-by: Ilya Dryomov <idryomov@redhat.com>

Merge pull request #44626 from sebastian-philipp/pacific-backport-42905

pacific: python-common: improve OSD spec error messages

Reviewed-by: Michael Fritch <mfritch@suse.com>

Merge pull request #44644 from guits/wip-53916-pacific

pacific: ceph-volume: fix regression introduced via #43536

Reviewed-by: Sebastian Wagner <sewagner@redhat.com>

Merge pull request #44652 from rhcs-dashboard/wip-53921-pacific

pacific: mgr/dashboard: Refactoring dashboard cephadm checks

Reviewed-by: Alfonso Martínez <almartin@redhat.com>
Reviewed-by: Avan Thakkar <athakkar@redhat.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>

Merge pull request #44650 from aaSharma14/wip-53828-pacific

pacific: mgr/dashboard: monitoring: Implement BlueStore onode hit/miss counters into the dashboard

Reviewed-by: Aashish Sharma <aasharma@redhat.com>
Reviewed-by: Avan Thakkar <athakkar@redhat.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>
Reviewed-by: Pere Diaz Bou <pdiazbou@redhat.com>

python-common/tests: Remove filestore tests in test_disk_selector.py

Signed-off-by: Sebastian Wagner <sewagner@redhat.com>
(cherry picked from commit 1c40ca1e37e5e798cfd9cf317f39b11dd22ea086)

python-common: Don't validate ServiceSpec.from_json() in `orch ls`

Unfortunately, `ceph orch ls` may return invalid OSD specs for
OSDs that are not associated with any spec.

Signed-off-by: Sebastian Wagner <sewagner@redhat.com>
(cherry picked from commit 3f38583b7189d99be360d8475fe6ef8cd53dee7c)

Conflicts:
src/pybind/mgr/orchestrator/module.py

python-common: HostSpec: add `validate()`

Adjust HostSpec interface to ServiceSpec

Signed-off-by: Sebastian Wagner <sewagner@redhat.com>
(cherry picked from commit 7c6d922dead8480cd1f2cd05be7ccd1d8d5b7dd8)

Conflicts:
src/python-common/ceph/deployment/service_spec.py

python-common: DriveGroupSpec: move placement validation to validate()

Signed-off-by: Sebastian Wagner <sewagner@redhat.com>
(cherry picked from commit 311860412e840e6b31e04b80a9de5e9ae05e7fb7)

python-common: DriveGroupSpec: Allow unnamed OSD specs

Because it never actually worked as expected.

Remove the duplicated service_id check, because it is already
verified by the parent method.

Fixes: https://tracker.ceph.com/issues/46253
Signed-off-by: Sebastian Wagner <sewagner@redhat.com>
(cherry picked from commit 8b567e132d75711179febac126c5ec8a250b8952)

Conflicts:
src/python-common/ceph/deployment/service_spec.py

python-common: Improve DriveSelection error messages

Fixes: https://tracker.ceph.com/issues/50685
Signed-off-by: Sebastian Wagner <sewagner@redhat.com>
(cherry picked from commit 74f29b97ea3331d43391cd40fe843104a2c15c3d)

python-common: OSD specs: Improve quality of error messages

Fixes: https://tracker.ceph.com/issues/47401
Signed-off-by: Sebastian Wagner <sewagner@redhat.com>
(cherry picked from commit 4142c52d7406bb67042d9ad7b26d8e84f5a734ba)

Conflicts:
src/python-common/ceph/deployment/drive_group.py

python-common: Remove duplicated DriveGroupSpec.__repr__ and __eq__

Signed-off-by: Sebastian Wagner <sewagner@redhat.com>
(cherry picked from commit b91f81801af40c213adfbc88c8fd148b4edf3ede)

Conflicts:
src/python-common/ceph/deployment/drive_group.py

mgr/orch: re-raise to make debugging easier

Signed-off-by: Sebastian Wagner <sewagner@redhat.com>
(cherry picked from commit 38b52f715fa581f3540ad6fc4c595ab0ede83ece)

Merge pull request #44627 from sebastian-philipp/pacific-backport-44228

pacific: mgr/cephadm: fix 'cephadm osd activate' on existing osd devices

Reviewed-by: Adam King <adking@redhat.com>

Merge pull request #44625 from sebastian-philipp/pacific-backport-43149

pacific: mgr/cephadm: Add client.admin keyring when upgrading from older version

Reviewed-by: Michael Fritch <mfritch@suse.com>

qa/cephadm: install hwe kernel only for focal

Let's install the hwe kernel only on Ubuntu focal; otherwise we only shift the
problem to Ubuntu bionic, given that the hwe kernel for bionic is 5.4.

Fixes: https://tracker.ceph.com/issues/53863
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 5c0f0698a5b8db75ae9bcdca311a68a1589ee0a5)

qa/nvme_loop: fix an issue on ubuntu 18.04

The following command:

```
echo /dev/sda | tee /sys/kernel/config/nvmet/subsystems/sda/namespaces/1/device_path
```

makes nvme_loop fail because, surprisingly, it adds an unexpected newline on Ubuntu 18.04.

See:
```
/dev/sda
/dev/sda

1
tee: /sys/kernel/config/nvmet/subsystems/sda/namespaces/1/enable: No such file or directory
/dev/sda
1
```

Other distros don't have the same behavior:

```
CentOS 8
/dev/sda
/dev/sda
1

Ubuntu 20.04
/dev/sda
/dev/sda
1
```
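For comparison, a writer that emits no trailing newline at all sidesteps the question (in shell, `printf '%s' /dev/sda` serves the same purpose as the `echo`). This hypothetical Python helper is only an illustration, run against a temporary file rather than configfs; it is not the change made to qa/nvme_loop.

```python
import os
import tempfile


def write_attr(path: str, value: str) -> None:
    """Write a configfs-style attribute without a trailing newline."""
    with open(path, "w") as f:
        f.write(value)  # unlike `echo`, nothing is appended after the value


with tempfile.TemporaryDirectory() as d:
    attr = os.path.join(d, "device_path")
    write_attr(attr, "/dev/sda")
    with open(attr) as f:
        print(repr(f.read()))  # '/dev/sda'
```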

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f8e22fb3da9bfbdc75d88beb66543716afb19511)

ceph-volume: fix regression introduced via #43536

The recent changes from PR #43536 introduced a regression that prevents
running ceph-volume in a containerized context on Ubuntu 18.04.

The path of the `lvs` binary differs between CentOS 8 and Ubuntu 18.04
(`/usr/sbin/lvs` and `/sbin/lvs`, respectively). As a result, ceph-volume
running in a CentOS 8 container sees the `lvs` binary at `/usr/sbin/lvs`
and tries to run it with `nsenter` on a host running Ubuntu 18.04, where
that path does not exist.
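One way to avoid baking a container-side path into the command is to resolve the binary against the host's filesystem. A minimal sketch, assuming the host root is visible inside the container at some mount point (`host_root` is a hypothetical parameter); the candidate directory list is also an assumption, not what ceph-volume uses.

```python
import os
import tempfile

# Assumed search order: /usr/sbin covers CentOS 8, /sbin covers Ubuntu 18.04.
CANDIDATE_DIRS = ("/usr/sbin", "/sbin", "/usr/bin", "/bin")


def find_host_binary(name: str, host_root: str = "/") -> str:
    """Return the host-side path of `name`, checked under `host_root`."""
    for directory in CANDIDATE_DIRS:
        host_path = os.path.join(directory, name)
        # Look the file up relative to the host root, not the container root
        if os.path.exists(os.path.join(host_root, host_path.lstrip("/"))):
            return host_path
    raise FileNotFoundError(f"{name} not found under {host_root}")


# Simulate an Ubuntu 18.04 host root that only has /sbin/lvs
with tempfile.TemporaryDirectory() as fake_root:
    os.makedirs(os.path.join(fake_root, "sbin"))
    open(os.path.join(fake_root, "sbin", "lvs"), "w").close()
    print(find_host_binary("lvs", fake_root))  # /sbin/lvs
```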

Fixes: https://tracker.ceph.com/issues/53812
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 95e88cda3df76b59b548ae808df0ef7f19db1f63)
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 3c93ffdc92d4d03b9ae7415b548192a572cfc5ea)

mgr/dashboard: Refactoring dashboard cephadm checks

I isolated all the test suites into their respective files
so that it is easier to add more tests to them in future.

I also gave priority to the host actions.

Create OSD checks are now written in a way that OSDs
are created only on the intended hosts. This will make
the host draining process easier and less time-consuming.

Also tried to address the flaky force maintenance checks.

Removed some duplicated code.

Improved the service creation part to reduce the time it takes
to complete.

Fixes: https://tracker.ceph.com/issues/53905
Signed-off-by: Nizamudeen A <nia@redhat.com>
(cherry picked from commit b6759b75c9fc4d3fb565201aa6bbe0c2473fd3d4)

mgr/dashboard: monitoring: Implement BlueStore onode hit/miss counters into the dashboard

Provide the details pulled from BlueStore stats in order to display the onode hit/miss counters.

Fixes: https://tracker.ceph.com/issues/53577
Signed-off-by: Aashish Sharma <aasharma@redhat.com>
(cherry picked from commit 15aa4dffa91b325014024d3e35603d88330b87cc)

Merge pull request #44467 from rhcs-dashboard/wip-53780-pacific

pacific: mgr/dashboard: fix orchestrator/02-hosts-inventory.e2e failure

Reviewed-by: Waad Alkhoury <walkhour@redhat.com>
Reviewed-by: Alfonso Martínez <almartin@redhat.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>
Reviewed-by: Pere Diaz Bou <pdiazbou@redhat.com>
Reviewed-by: Yuri Weinstein <yweins@redhat.com>

Merge pull request #44533 from rhcs-dashboard/wip-53825-pacific

pacific: mgr/dashboard: add test coverage for API docs (SwaggerUI)

Reviewed-by: Waad Alkhoury <walkhour@redhat.com>
Reviewed-by: Alfonso Martínez <almartin@redhat.com>
Reviewed-by: Avan Thakkar <athakkar@redhat.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>

Merge pull request #44529 from sebastian-philipp/pacific-backport-43901-44341

pacific: mgr/cephadm: Add snmp-gateway service support

Reviewed-by: Adam King <adking@redhat.com>
Reviewed-by: Paul Cuzner <pcuzner@redhat.com>

qa/suites/orch/cephadm: Also run the rbd/iscsi suite

Adding a new workload test to our suite.

Signed-off-by: Sebastian Wagner <sewagner@redhat.com>
(cherry picked from commit 651192aacc4ac695a03f4ab0f7ffa045632d5d11)

qa/suites/orch/cephadm/osds: test 'ceph cephadm osd activate'

Make sure this command behaves correctly when the /var/lib/ceph osd.NNN
directory is removed.

Signed-off-by: Sage Weil <sage@newdream.net>
(cherry picked from commit 867bf04b74d510a544d9555afc56d5cd6657874d)

mgr/cephadm/services/osd: skip found osds that already have daemons

If we are trying to deploy new or newly-found osds, we can skip the ones
that already have cephadm daemons deployed.
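The skip amounts to an order-preserving set difference. A minimal sketch with hypothetical names, treating OSD ids as strings (the id part of a daemon name such as `osd.3`):

```python
def osds_to_deploy(found_osd_ids, existing_daemon_ids):
    """Return the OSD ids that still need a cephadm daemon, in original order."""
    existing = set(existing_daemon_ids)  # O(1) membership checks
    return [osd_id for osd_id in found_osd_ids if osd_id not in existing]


print(osds_to_deploy(["0", "1", "2", "5"], ["1", "5"]))  # ['0', '2']
```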

Fixes: https://tracker.ceph.com/issues/53491
Signed-off-by: Sage Weil <sage@newdream.net>
(cherry picked from commit dc3d45bbe8c3bfedee57da619616c0be489cd233)

Conflicts:
src/pybind/mgr/cephadm/services/osd.py

mgr/cephadm: allow activation of OSDs that have previously started

When this code was introduced way back in ea987a0e56db106f7c76d11f86b3e602257f365e,
for some reason I was focused only on freshly created OSDs. The
get_osd_uuid_map() helper is used by deploy_osd_daemons_for_existing_osds()
which is called not only by OSD creation but also by 'ceph cephadm
osd activate', which is meant to instantiate daemons for existing OSD
devices (e.g., devices that were reattached to a new server, or whose
/var/lib/ceph/$fsid/osd.$id directory was lost for some other reason).
However, if we ignore OSDs with up_from > 0, then we can't recreate a
daemon instance for such existing OSDs--arguably the most important ones,
since they may hold real data.
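A minimal sketch of the filtering decision, assuming the JSON shape of `ceph osd dump -f json` (an `osds` list whose entries carry `osd`, `uuid` and `up_from`). The `skip_started` flag is hypothetical and only models the old behaviour; it is not the real helper's signature.

```python
def get_osd_uuid_map(osd_dump: dict, skip_started: bool = False) -> dict:
    """Map OSD id (as a string) to OSD uuid.

    skip_started=True reproduces the old behaviour of ignoring OSDs with
    up_from > 0, which made activation miss OSDs that had already run.
    """
    result = {}
    for osd in osd_dump.get("osds", []):
        if skip_started and osd.get("up_from", 0) > 0:
            continue
        result[str(osd["osd"])] = osd["uuid"]
    return result


dump = {"osds": [
    {"osd": 0, "uuid": "aaaa", "up_from": 0},    # never started
    {"osd": 1, "uuid": "bbbb", "up_from": 42},   # has started before
]}
print(get_osd_uuid_map(dump))                     # {'0': 'aaaa', '1': 'bbbb'}
print(get_osd_uuid_map(dump, skip_started=True))  # {'0': 'aaaa'}
```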

Fixes: https://tracker.ceph.com/issues/53491
Signed-off-by: Sage Weil <sage@newdream.net>
(cherry picked from commit 40aeac7f52c80df0daa99bb664e3d672da3bc249)

python-common: move test_valid_snmp_gateway_spec from mgr/cephadm

We now have to validate to_json() as well, since we have special enums.
Otherwise we might end up with !!python... representations.
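A minimal sketch of why to_json() needs its own check, using a hypothetical enum and spec class (not the real ServiceSpec). to_json() has to emit the enum's plain value: `json.dumps` rejects a plain `Enum` member outright, and a YAML dump of the member itself is where a `!!python...` tag would come from.

```python
import json
from enum import Enum


class SNMPVersion(Enum):  # hypothetical enum for illustration
    V2c = "V2c"
    V3 = "V3"


class GatewaySpec:  # hypothetical stand-in, not the real ServiceSpec
    def __init__(self, version: SNMPVersion):
        self.version = version

    def to_json(self) -> dict:
        # Emit the plain string value, never the Enum member itself
        return {"snmp_version": self.version.value}

    @classmethod
    def from_json(cls, data: dict) -> "GatewaySpec":
        return cls(SNMPVersion(data["snmp_version"]))


spec = GatewaySpec(SNMPVersion.V3)
out = json.dumps(spec.to_json())
print(out)  # {"snmp_version": "V3"}
```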

Signed-off-by: Sebastian Wagner <sewagner@redhat.com>
(cherry picked from commit 303843b476b442d0d398680b23aa244633768f29)

python-common: move test_invalid_snmp_gateway_spec from mgr/cephadm

Let's keep the tests in the same package where the class is defined.

Signed-off-by: Sebastian Wagner <sewagner@redhat.com>
(cherry picked from commit c652ae74795252f875594b09627064d97ff2a762)

mgr/cephadm: SNMP: don't write urls manually

This is simply broken for non-trivial URLs. Don't set a bad example.
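A minimal sketch of assembling a URL with `urllib.parse` instead of string formatting. `build_url` is a hypothetical helper, not the one the commit uses. An IPv6 host is the classic non-trivial case: written by hand as `http://fd00::1:8080`, the port can no longer be told apart from the address.

```python
from urllib.parse import quote, urlunsplit


def build_url(scheme: str, host: str, port: int, path: str = "") -> str:
    """Hypothetical helper: assemble a URL instead of formatting it by hand."""
    # IPv6 literals must be bracketed, otherwise the port is ambiguous
    if ":" in host and not host.startswith("["):
        host = f"[{host}]"
    # quote() escapes characters such as spaces but keeps '/' intact
    return urlunsplit((scheme, f"{host}:{port}", quote(path), "", ""))


print(build_url("http", "10.0.0.1", 8080, "/alerts"))  # http://10.0.0.1:8080/alerts
print(build_url("http", "fd00::1", 8080, "/alerts"))   # http://[fd00::1]:8080/alerts
```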

Signed-off-by: Sebastian Wagner <sewagner@redhat.com>
(cherry picked from commit 3f47c2293b9ace730d6f76c613ef2106f274ea32)

mgr/cephadm: SNMP: Don't write default values into the store

This enables us to change the defaults in the future.
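A minimal sketch of the principle: persist only the values that differ from the defaults and merge the current defaults back in at load time, so a later release can change a default without old stored specs pinning the previous value. Names and default values are hypothetical.

```python
# Hypothetical defaults; a later release may change them.
DEFAULTS = {"port": 8080, "snmp_version": "V2c"}


def to_store(settings: dict, defaults: dict = DEFAULTS) -> dict:
    """Persist only the values the user actually changed."""
    return {k: v for k, v in settings.items() if defaults.get(k) != v}


def from_store(stored: dict, defaults: dict = DEFAULTS) -> dict:
    """Apply the defaults that are current at load time, then stored overrides."""
    return {**defaults, **stored}


stored = to_store({"port": 8080, "snmp_version": "V3"})
print(stored)  # {'snmp_version': 'V3'}

# A later release changes the default port; the stored spec picks it up.
print(from_store(stored, {"port": 9090, "snmp_version": "V2c"}))
# {'port': 9090, 'snmp_version': 'V3'}
```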

Signed-off-by: Sebastian Wagner <sewagner@redhat.com>
(cherry picked from commit 5e3cc4d6c167b7d5bdd0f08aa90ed7e7d0779b25)

mgr/cephadm: SNMP: use python3 enums

There is little reason to duplicate things ourselves.

Signed-off-by: Sebastian Wagner <sewagner@redhat.com>
(cherry picked from commit 0039accb2caedf99166b88cc5b75736b6a7fd5c2)

Conflicts:
src/pybind/mgr/orchestrator/module.py
src/python-common/ceph/deployment/service_spec.py
src/python-common/ceph/tests/test_service_spec.py

mgr/cephadm: Add snmp-gateway service support

Add a new snmp-gateway service to provide a bridge between
Prometheus and an SNMP management platform. The gateway
service uses https://github.com/maxwo/snmp_notifier to provide
SNMP V2c and SNMP V3 support.

The SNMP V3 support mandates at least authentication, and optionally
adds privacy (encryption) on top of authentication.
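A minimal sketch of the V3 rule above. It rejects SNMPv3 `noAuthNoPriv` and returns `authNoPriv` or `authPriv` (the standard SNMPv3 security-level names) depending on whether privacy is configured. The credential key names are hypothetical and do not necessarily match the snmp-gateway spec.

```python
def snmp_security_level(version: str, creds: dict) -> str:
    """Validate hypothetical credential keys and return the resulting level."""
    if version == "V2c":
        if not creds.get("community"):
            raise ValueError("SNMP V2c requires a community string")
        return "v2c"
    if version == "V3":
        # Authentication is mandatory for V3 ...
        missing = [k for k in ("auth_username", "auth_password") if not creds.get(k)]
        if missing:
            raise ValueError("SNMP V3 requires authentication: missing " + ", ".join(missing))
        # ... while privacy (encryption) is an optional extra on top of it
        return "authPriv" if creds.get("priv_password") else "authNoPriv"
    raise ValueError(f"unsupported SNMP version: {version!r}")


print(snmp_security_level("V3", {"auth_username": "u", "auth_password": "p"}))  # authNoPriv
```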

Fixes: https://tracker.ceph.com/issues/52920
Signed-off-by: Paul Cuzner <pcuzner@redhat.com>
(cherry picked from commit c2f5e105ca4870b2cb124db662537c20e6daadae)

Conflicts:
src/pybind/mgr/cephadm/module.py
src/pybind/mgr/orchestrator/_interface.py
src/pybind/mgr/orchestrator/module.py
src/python-common/ceph/deployment/service_spec.py

mgr/cephadm: Add unit tests for snmp-gateway support

Adds tests to validate the deployed configuration given a known
input context, and to check the parameters created for various
input scenarios.

Signed-off-by: Paul Cuzner <pcuzner@redhat.com>
(cherry picked from commit 2ffa81bb91618eb70708073096f39bc1f8e2a8e6)

Conflicts:
src/pybind/mgr/cephadm/tests/test_services.py

mgr/cephadm: Updated docs for snmp-gateway support

Updated the docs to show snmp-gateway usage. The docs provide
guidance on the SNMP versions supported and show CLI and
YAML deployment examples.

Signed-off-by: Paul Cuzner <pcuzner@redhat.com>
(cherry picked from commit 91f35e1f5355bb4d1c9e7be4a943d564483f4e13)

mgr/cephadm: provide initial snmp gateway support

This patch enables the cephadm binary
to deploy an SNMP gateway based on
https://hub.docker.com/r/maxwo/snmp-notifier

Fixes: https://tracker.ceph.com/issues/52920
Signed-off-by: Paul Cuzner <pcuzner@redhat.com>
(cherry picked from commit 5c997ad355dea01b1bec0b977f4b4ac33407d8d5)

Conflicts:
src/cephadm/cephadm

mgr/cephadm: serve.py: put _write_client_files into its own method

Signed-off-by: Sebastian Wagner <sewagner@redhat.com>
(cherry picked from commit 018807ef655068d699c70388e41284addee32040)

Conflicts:
src/pybind/mgr/cephadm/serve.py

mgr/cephadm: serve.py: put _calc_client_files into its own method

Signed-off-by: Sebastian Wagner <sewagner@redhat.com>
(cherry picked from commit fb2321ec6988075777d8fc838f1d19034855264a)

Conflicts:
src/pybind/mgr/cephadm/serve.py

mgr/cephadm: Raise errors to properly set a cli status code

Otherwise, `ceph orch host rm` returns 0 even when it fails.
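A minimal sketch of the difference, with hypothetical names throughout. The `(retval, stdout, stderr)` triple is assumed to mirror what mgr command handlers return. A handler that only returns an error string leaves the status at 0; raising lets the dispatcher turn the error into a non-zero status.

```python
import errno


class OrchestratorError(Exception):
    """Hypothetical stand-in for an orchestrator error carrying an errno."""

    def __init__(self, msg: str, errno_value: int = errno.EINVAL):
        super().__init__(msg)
        self.errno_value = errno_value


def run_cli(handler, *args):
    """Turn a handler call into a (retval, stdout, stderr) triple."""
    try:
        return 0, handler(*args), ""
    except OrchestratorError as e:
        return -e.errno_value, "", str(e)


def host_rm_returning(hostname, known_hosts):
    if hostname not in known_hosts:
        return f"Error: host {hostname} not found"  # printed, but retval stays 0
    return f"Removed host {hostname}"


def host_rm_raising(hostname, known_hosts):
    if hostname not in known_hosts:
        raise OrchestratorError(f"host {hostname} not found", errno.ENOENT)
    return f"Removed host {hostname}"


print(run_cli(host_rm_returning, "node9", {"node1"})[0])     # 0
print(run_cli(host_rm_raising, "node9", {"node1"})[0] != 0)  # True
```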

Signed-off-by: Sebastian Wagner <sewagner@redhat.com>
(cherry picked from commit 1a87e5eaf54b30c1974ed02aa7e69656d0106c27)

mgr/cephadm: Add client.admin keyring when upgrading from older version

Signed-off-by: Sebastian Wagner <sewagner@redhat.com>
(cherry picked from commit 02c942a093a28376301b9b4c66d9c712345ff953)

Conflicts:
src/pybind/mgr/cephadm/tests/test_migration.py

mgr/cephadm/inventory: remove unused `filter_by_label`

Signed-off-by: Sebastian Wagner <sewagner@redhat.com>
(cherry picked from commit 8de88a1d0ac4f4747fa15d45d2a82b34d6b35a95)

Merge pull request #44527 from sebastian-philipp/pacific-backport-44267

pacific: python-common: add int value validation for count and count_per_host

Reviewed-by: John Mulligan <jmulligan@redhat.com>

Merge pull request #44528 from sebastian-philipp/pacific-backport-44293

pacific: cephadm: make extract_uid_gid errors more readable

Reviewed-by: Adam King <adking@redhat.com>

Merge pull request #44526 from sebastian-philipp/pacific-backport-44035

pacific: mgr/cephadm: less log noise when config checks fail

Reviewed-by: Adam King <adking@redhat.com>

Merge pull request #44248 from guits/pacific-backport-44104

pacific: cephadm: pass `CEPH_VOLUME_SKIP_RESTORECON=yes` (backport)

Reviewed-by: Adam King <adking@redhat.com>

Merge pull request #44525 from sebastian-philipp/pacific-backport-44129-44109-44309

pacific: doc/cephadm: Doc backport

Reviewed-by: Adam King <adking@redhat.com>

Merge pull request #44535 from adk3798/backport-44134

pacific: mgr/cephadm: avoid repeated calls to get_module_option

Reviewed-by: Michael Fritch <mfritch@suse.com>

Merge pull request #44531 from sebastian-philipp/pacific-backport-44020

pacific: mgr/orchestrator: add filtering and count option for orch host ls

Reviewed-by: Adam King <adking@redhat.com>

Merge pull request #44530 from sebastian-philipp/pacific-backport-44336

pacific: mgr/cephadm: Fix test_facts

Reviewed-by: Adam King <adking@redhat.com>

Merge pull request #44597 from rhcs-dashboard/wip-53881-pacific

pacific: mgr/dashboard: fix: get SMART data from single-daemon device

Reviewed-by: Alfonso Martínez <almartin@redhat.com>
Reviewed-by: Avan Thakkar <athakkar@redhat.com>
Reviewed-by: Pere Diaz Bou <pdiazbou@redhat.com>

mgr/dashboard: fix: get SMART data from single-daemon device

Return SMART data even when a device is only associated with a single daemon.

Fixes: https://tracker.ceph.com/issues/53858
Signed-off-by: Alfonso Martínez <almartin@redhat.com>
(cherry picked from commit 6cd3729e2737f9012569cffc6fd69cc5eed287ed)

qa/tasks/qemu: get the new Let's Encrypt root certificate

Fixes: https://tracker.ceph.com/issues/53841
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit b47965b5773d086eb64e7f91bdc05f483f562b00)

qa/run_xfstests_qemu.sh: harden against wget failures

If wget fails (e.g. due to a certificate issue), it still creates
an empty file. Then this file is marked executable, ./"${SCRIPT}"
immediately returns 0 and run_xfstests_qemu.sh exits successfully
without running a single xfstest.
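The script itself is shell, and how it is actually hardened is not shown in this message. As a hedged illustration of the guard only, a hypothetical Python helper that refuses to mark a missing or empty download as executable:

```python
import os
import stat
import tempfile


def make_executable_if_nonempty(path: str) -> None:
    """Refuse to mark a missing or empty download as executable."""
    if not os.path.isfile(path) or os.path.getsize(path) == 0:
        raise RuntimeError(f"{path} is missing or empty; the download probably failed")
    os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)


with tempfile.NamedTemporaryFile(delete=False) as f:
    empty_script = f.name  # what a failed wget leaves behind
try:
    make_executable_if_nonempty(empty_script)
except RuntimeError as err:
    print(err)
finally:
    os.unlink(empty_script)
```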

This started on Sep 30, 2021 with the expiration of the Let's Encrypt
root certificate: all qemu jobs with "test: qa/run_xfstests_qemu.sh"
just booted the VM for a couple of seconds and reported success.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 387be947948ff1dd40e88ae5288b9a52c7cde403)

test/librbd: make diff-iterate clone tests exercise fast-diff mode

The fast-diff feature wasn't propagated to the clone, so these tests
were exercising the slow list_snaps path no matter what RBD_FEATURES
value was supplied to ceph_test_librbd.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit ceb13d76f2b3aba7209e85f3354970c072997742)

librbd: restore diff-iterate include_parent functionality in fast-diff mode

Commit 4429ed4f3f4c ("librbd: switch diff iterate API to use new snaps
list dispatch methods") removed the recursive execute() call. The new
list_snaps method does indeed handle parent diffs internally but it is
not used in fast-diff mode. Nothing changed there -- we still need to
load the parent object map, calculate parent object_diff_state, etc.

Fixes: https://tracker.ceph.com/issues/53787
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 04293bef6ccd2b9ca3db53906b63c952e235cdb4)

librbd: stash unmodified include_parent value in DiffContext

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 92ca5ec36496dd02f618dc161e52b24711baa47b)

Merge pull request #44296 from batrick/i53445

pacific: mds: opening connection to up:replay/up:creating daemon causes message drop

Reviewed-by: Milind Changire <mchangir@redhat.com>

Merge pull request #44272 from nmshelke/wip-53332-pacific

pacific: doc: prerequisites fix for cephFS mount

Reviewed-by: Milind Changire <mchangir@redhat.com>

Merge pull request #44168 from cfsnyder/wip-50851-pacific

pacific: mds: PurgeQueue.cc fix for 32bit compilation

Reviewed-by: Milind Changire <mchangir@redhat.com>

Merge pull request #43979 from lxbsz/wip-53218

pacific: qa: increase the timeout value to wait a little longer

Reviewed-by: Nikhilkumar Shelke <nshelke@redhat.com>
Reviewed-by: Milind Changire <mchangir@redhat.com>

librbd: diff-iterate reports incorrect offsets in fast-diff mode

If rbd_diff_iterate2() is called on an image offset that doesn't
correspond to an object boundary, the callback is invoked with an
incorrect image offset. For example, assuming a fully allocated
image, a diff request for 806354944~57344 results in offs=807403520,
len=57344, exists=true invocation, which is ahead by 1048576 bytes.
This occurs only in fast-diff mode: for a diff request on an image
with the fast-diff feature disabled, or if the whole_object parameter
is set to false, the invocation is correct.
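The numbers in the example are consistent with the in-object offset being applied twice. A quick arithmetic check, assuming the default 4 MiB object size (the message does not state the object size). This models the symptom and is not the librbd code path.

```python
OBJECT_SIZE = 4 * 1024 * 1024  # assumption: default 4 MiB RBD objects

req_off, req_len = 806354944, 57344
obj_start = (req_off // OBJECT_SIZE) * OBJECT_SIZE  # start of object 192
in_obj = req_off - obj_start                        # offset inside that object

correct = obj_start + in_obj  # equals req_off
buggy = req_off + in_obj      # in-object offset added a second time
print(obj_start, in_obj)              # 805306368 1048576
print(correct, buggy, buggy - correct)  # 806354944 807403520 1048576
```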

This bug goes back to the introduction of fast-diff mode in commit
6d5b969d4206 ("librbd: add diff_iterate2 to API").

Fixes: https://tracker.ceph.com/issues/53784
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit ea07d1e834018c693fc03637d338806f3c2f494f)

pacific: mgr/cephadm: avoid repeated calls to get_module_option

We already stash these as MgrModule members.

Signed-off-by: Sage Weil <sage@newdream.net>
Conflicts:
src/pybind/mgr/cephadm/module.py
src/pybind/mgr/cephadm/serve.py
src/pybind/mgr/cephadm/services/cephadmservice.py

mgr/dashboard: add test coverage for API docs (SwaggerUI)

Fixes: https://tracker.ceph.com/issues/53756
Signed-off-by: Alfonso Martínez <almartin@redhat.com>
(cherry picked from commit 7363bc3af1613f2b06eaf34ea8c57ee8f4583537)

mgr/orchestrator: add filtering and count option for orch host ls

Filter orch host ls output to only hosts whose name
contains a certain substring or that have a certain label.

Add a count flag that causes the command to return the number
of hosts found (either overall or matching the substring and/or
label) instead of a list of all the matching hosts.
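A minimal sketch of the filter and count semantics described above. The function and host-record shapes are hypothetical, and the real command's flag names are not shown here.

```python
from typing import List, Optional, Union


def filter_hosts(hosts: List[dict], substr: Optional[str] = None,
                 label: Optional[str] = None,
                 count: bool = False) -> Union[int, List[dict]]:
    """Keep hosts whose name contains `substr` and/or that carry `label`."""
    matched = [
        h for h in hosts
        if (substr is None or substr in h["hostname"])
        and (label is None or label in h.get("labels", []))
    ]
    # With count=True, report only how many hosts matched
    return len(matched) if count else matched


hosts = [
    {"hostname": "ceph-node1", "labels": ["_admin", "mon"]},
    {"hostname": "ceph-node2", "labels": ["osd"]},
    {"hostname": "backup-1", "labels": ["osd"]},
]
print([h["hostname"] for h in filter_hosts(hosts, substr="ceph")])  # ['ceph-node1', 'ceph-node2']
print(filter_hosts(hosts, label="osd", count=True))                 # 2
```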

Fixes: https://tracker.ceph.com/issues/47774
Fixes: https://tracker.ceph.com/issues/53452
Signed-off-by: Adam King <adking@redhat.com>
(cherry picked from commit edd9bf38c3f07f5fdb6714e7f66515820c736d2e)

mgr/cephadm: Fix test_facts

Wasn't executed before

Signed-off-by: Sebastian Wagner <sewagner@redhat.com>
(cherry picked from commit a03a34a01a70ce4d4ac8927a37d27e9853e46f8a)

cephadm: make extract_uid_gid errors more readable

Avoid dumping a traceback

Signed-off-by: Sebastian Wagner <sewagner@redhat.com>
(cherry picked from commit d732a51df3a8d6b9edc340251edcd024b0e70f09)

python-common: add test inputs verifying count & count-per-host >= 1

This adds new unit test inputs, local to python-common, that verify the
correct error messages are raised when count == 0 and count_per_host ==
0.

Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit 0eb4e7dd56f3db6448080b0e9b880927c1bb7e04)

python-common: make count & count-per-host >= 1 checks consistent

The previous version of the validate function had an incorrect error
message that suggested the count must be >1 when it should have
been >=1. This confusion was possibly due to using "n < 1" on
one line and "n <= 0" on another. Since both values are supposed
to be integers, this change corrects the error message and makes
both comparisons use "n < 1" (since I find it easier
to see that the check "n < 1" is the inverse of the error text
asserting "n >= 1").
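A minimal sketch of the consistent form, with a hypothetical function name and messages: both checks are written as `n < 1`, which reads as the exact inverse of the ">= 1" in the error text.

```python
def validate_counts(count=None, count_per_host=None) -> None:
    """Hypothetical sketch: each check is the inverse of its error text."""
    if count is not None and count < 1:
        raise ValueError("count must be >= 1")
    if count_per_host is not None and count_per_host < 1:
        raise ValueError("count_per_host must be >= 1")


validate_counts(count=1, count_per_host=1)  # valid, no exception
```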

Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit 6169eb7f8e2462eb58338d6fc312b1347858b47f)

python-common: add unit test func for invalid yaml inputs

I didn't find a preexisting test function for this so I added a
new test that is fed yaml snippets and expected error messages.
This verifies some of the recently added validation for
count and count_per_host under the placement spec.

Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit 068d37d95762bce4d11668a838c6e85f6098723a)

python-common: add int value validation for count and count_per_host

Add additional validation for the count and count_per_host fields
sourced from YAML.
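Values loaded from YAML are not guaranteed to be integers: `count: "3"` arrives as a string and `count: 2.5` as a float. A minimal sketch of a type check with a hypothetical helper name. Rejecting booleans is this sketch's own choice (in Python, `bool` is a subclass of `int`); the commit does not say whether the real validation does so.

```python
def require_int(name: str, value) -> int:
    """Reject non-integer values for count-like fields."""
    # bool is a subclass of int, so `count: true` would otherwise pass as 1
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError(f"{name} must be an integer, got {value!r}")
    return value


print(require_int("count", 3))  # 3
```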

Fixes: https://tracker.ceph.com/issues/50524
Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit a9ad2a50fe83ea3342b7c1bbcfb942789e965cb4)

mgr/cephadm: less log noise when config checks fail

We are already raising health alerts--there is no need to spam the log
every few seconds when these checks are evaluated.

Signed-off-by: Sage Weil <sage@newdream.net>
(cherry picked from commit f2a2e2d92ca21700aeffc78cce4a3d3c5949fd3f)

doc/cephadm/upgrade: correct example command

Update the ceph version used in the example upgrade command to match the one mentioned in the text above it.

Signed-off-by: Foad Lind <foad.lind@citynetwork.eu>
(cherry picked from commit 5077eef37844c1fc25c444a5b54d44a37052875c)

doc/cephadm: host location: add link to types

Signed-off-by: Sebastian Wagner <sewagner@redhat.com>
(cherry picked from commit ee7ed53df865cfd1b88216cc7d27029172b935ef)

doc: fix typo in cephadm host management

(cherry picked from commit 22ca9ce373efd527d838a58ed25617ce4e7dcd91)

Merge pull request #44446 from sebastian-philipp/pacific-backport-43827-43894-42906-43095-43929-43969-43873-43888-44092-44080-

pacific: cephadm: November batch 2

Reviewed-by: Adam King <adking@redhat.com>

mgr/prometheus: Update rule format and enhance SNMP support

Rules now adhere to the format defined by Prometheus.io.
This changes alert naming, and each alert now includes a
summary description to provide a quick one-liner.

In addition to the reformatting, some missing alerts for MDS and
cephadm have been added, along with corresponding tests.

The MIB has also been refactored, so it now passes standard
lint tests, and a README is included for devs to understand the
OID schema.

Fixes: https://tracker.ceph.com/issues/53111
Signed-off-by: Paul Cuzner <pcuzner@redhat.com>

Merge pull request #44468 from rhcs-dashboard/wip-53716-pacific

pacific: mgr/dashboard: fix timeout error in dashboard cephadm e2e job

Reviewed-by: Waad Alkhoury <walkhour@redhat.com>
Reviewed-by: Pere Diaz Bou <pdiazbou@redhat.com>

Merge pull request #44171 from cfsnyder/wip-52073-pacific

pacific: rgw: user stats showing 0 value for "size_utilized" and "size_kb_utilized" fields

Reviewed-by: Casey Bodley <cbodley@redhat.com>

Merge pull request #44166 from cfsnyder/wip-53289-pacific

pacific: rgw: fix `bi put` not using right bucket index shard

Reviewed-by: Casey Bodley <cbodley@redhat.com>

Merge pull request #43968 from cfsnyder/wip-53256-pacific

pacific: librgw: treat empty root path as "/" on mount

Reviewed-by: Casey Bodley <cbodley@redhat.com>

Merge pull request #43966 from cfsnyder/wip-53225-pacific

pacific: qa/rgw: bump tempest version to resolve dependency issue

Reviewed-by: Casey Bodley <cbodley@redhat.com>

Merge pull request #43951 from cfsnyder/wip-53098-pacific

pacific: qa/rgw: Fix vault token file access.

Reviewed-by: Casey Bodley <cbodley@redhat.com>

Merge pull request #43946 from cfsnyder/wip-53271-pacific

pacific: rgw/beast: optimizations for request timeout

Reviewed-by: Casey Bodley <cbodley@redhat.com>

mgr/prometheus: remove cmake tests

Temporary removal of the cmake test integration

Signed-off-by: Paul Cuzner <pcuzner@redhat.com>

mgr/prometheus: update promtool testcase location

Signed-off-by: Paul Cuzner <pcuzner@redhat.com>
Conflicts:
src/test/CMakeLists.txt

monitoring/prometheus: Add cmake integration

Signed-off-by: Sebastian Wagner <sewagner@redhat.com>

mgr/prometheus: add test cases and validation using tox

Focus all tests inside a tests directory, and use pytest/tox to
perform validation of the overall content. The tox tests also use
promtool, if available, to provide rule checks and unittest runs.

In addition to these checks, a validate_rules script performs
format and content checks against all rules; it is also
called via tox, but can be run independently too.
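This message does not list the checks validate_rules performs. As a hedged illustration only, a content check over a single Prometheus alerting rule (fields `alert`, `expr`, `annotations`) might look like this. `check_rule` and the required-annotation list are assumptions.

```python
REQUIRED_FIELDS = ("alert", "expr")
REQUIRED_ANNOTATIONS = ("summary", "description")  # assumed requirement


def check_rule(rule: dict) -> list:
    """Return human-readable problems found in one alerting rule."""
    problems = [f"missing field '{f}'" for f in REQUIRED_FIELDS if f not in rule]
    annotations = rule.get("annotations") or {}
    problems += [
        f"missing annotation '{a}'"
        for a in REQUIRED_ANNOTATIONS
        if not annotations.get(a)
    ]
    return problems


rule = {
    "alert": "CephHealthError",
    "expr": "ceph_health_status == 2",
    "annotations": {"summary": "Cluster is in the ERROR state"},
}
print(check_rule(rule))  # ["missing annotation 'description'"]
```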

Signed-off-by: Paul Cuzner <pcuzner@redhat.com>

mgr/prometheus: track individual healthchecks as metrics

This patch creates a health history object maintained in
the module's kvstore. The history and current health
checks are used to create a metric per healthcheck, whilst
also providing a history feature. Two new commands are added:
ceph healthcheck history ls
ceph healthcheck history clear

In addition to the new commands, the additional metrics
have been used to update the prometheus alerts.
Fixes: https://tracker.ceph.com/issues/52638
Signed-off-by: Paul Cuzner <pcuzner@redhat.com>
(cherry picked from commit e0dfc02063ef40cf6a1dc6e3080d0a856ceff050)

Conflicts:
doc/mgr/prometheus.rst
- Aligned the doc with master.

mgr/dashboard: stabilizing the cephadm dashboard e2e

Reordering the tests and adding some more tests to verify the cluster is
healthy before proceeding to complex tasks like maintenance and
host draining.

Fixes: https://tracker.ceph.com/issues/53742
Signed-off-by: Nizamudeen A <nia@redhat.com>
(cherry picked from commit fbc9f46459537d0799c448e27f80623d7d4805c8)

mgr/dashboard: dashboard cephadm e2e improvement

Fixes: https://tracker.ceph.com/issues/53742
Signed-off-by: Nizamudeen A <nia@redhat.com>
(cherry picked from commit 26f86f6cd32fc86297e250472e3205d4d65744fb)

mgr/dashboard: fix timeout error in dashboard cephadm e2e job

1. Fix the timeout error happening in the dashboard e2e job
2. Take care of the flaky force maintenance check

Most of the time our test times out while searching for an item
in the table. This is because `.clear().type()` sometimes does not clear
the content of the search field, so the wrong text gets entered into the
search field and the search runs on this wrong name. To avoid this, I
explicitly clear the search area before typing.

Fixes: https://tracker.ceph.com/issues/53672
Signed-off-by: Nizamudeen A <nia@redhat.com>
(cherry picked from commit fed358d7c5a21cc76cae7042975f7e47ac3f8d50)

mgr/dashboard: fix orchestrator/02-hosts-inventory.e2e failure

I removed the `02-hosts-inventory.e2e` file because it duplicates
one of the tests in the `01-hosts.e2e` file, and fixed the error
in that file.

Also, in the inventory Identify test, we test for an element to be not
visible. According to the latest cypress docs, this should be not.exist
instead of not.visible, since the cd-modal will not even be present in
the DOM.

Fixes: https://tracker.ceph.com/issues/53499
Signed-off-by: Nizamudeen A <nia@redhat.com>
(cherry picked from commit 7b9fb258c46f4a93ca53a5864a8eb4363147bcdc)