Patrick Donnelly [Fri, 23 Jul 2021 15:19:48 +0000 (08:19 -0700)]
qa: stop overriding distro for k-testing
This is a continuation of the previous commit
qa: only use RHEL for workload testing
We don't want to test fs:workload with centos/ubuntu, to avoid packaging
issues and to reduce the matrix of distros we run workloads on.
Also, the testing kernel should install fine on the "supported" random
distros we test with.
Patrick Donnelly [Thu, 15 Jul 2021 17:35:41 +0000 (10:35 -0700)]
qa: only use RHEL for workload testing
It's not useful to test workloads on different distributions; it just
adds to the maintenance burden of this qa suite, as distro upgrades often
break compilation of various tests.
Rishabh Dave [Mon, 7 Feb 2022 18:44:42 +0000 (00:14 +0530)]
monitoring: mention PyYAML only once in requirements
The following error occurs while running "sudo install-deps.sh":
ERROR: Double requirement given: PyYAML==6.0 (from -r requirements-lint.txt (line 5)) (already in pyyaml (from -r requirements-alerts.txt (line 1)), name='PyYAML')
PyYAML is listed twice as a requirement, once in each of the following
files:
monitoring/ceph-mixin/requirements-lint.txt
monitoring/ceph-mixin/requirements-alerts.txt
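pip compares project names case-insensitively, which is why `pyyaml` and `PyYAML==6.0` collide in the error above. A minimal sketch of a duplicate check across requirements files, assuming one simple requirement per line (real pip also handles PEP 508 markers, extras and nested `-r` includes, which this ignores); `find_duplicates` and its helpers are hypothetical names, and the file contents below are only the two lines quoted in the error:

```python
import re
from typing import Dict, List, Tuple

def normalize(name: str) -> str:
    """PEP 503 normalization: case-insensitive, runs of '-', '_', '.' become '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()

def requirement_name(line: str) -> str:
    """Project name at the start of a requirement line (simplified, not full PEP 508)."""
    match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
    return match.group(0) if match else ""

def find_duplicates(files: Dict[str, List[str]]) -> List[Tuple[str, str, str]]:
    """Return (normalized name, first file, duplicate file) for each repeated project."""
    seen: Dict[str, str] = {}
    duplicates: List[Tuple[str, str, str]] = []
    for filename, lines in files.items():
        for raw in lines:
            line = raw.split("#", 1)[0].strip()
            if not line or line.startswith("-"):
                continue  # skip blanks, comments and options such as "-r other.txt"
            key = normalize(requirement_name(line))
            if key in seen:
                duplicates.append((key, seen[key], filename))
            else:
                seen[key] = filename
    return duplicates

files = {
    "requirements-alerts.txt": ["pyyaml"],
    "requirements-lint.txt": ["PyYAML==6.0"],
}
print(find_duplicates(files))
```

Keeping each requirement in exactly one file, as the fix does, makes such a check come back empty.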
Ilya Dryomov [Mon, 31 Jan 2022 13:08:26 +0000 (14:08 +0100)]
qa/suites/krbd: add legacy+rxbounce and crc+rxbounce coverage
For basic, rbd and rbd-nomount subsuites, replace legacy and crc
facets with "legacy or legacy+rxbounce" and "crc or crc+rxbounce"
facets (chosen at random).
For fsx, singleton and thrash subsuites, add legacy+rxbounce and
crc+rxbounce facets and drop prefer-crc facet. The expected behaviour
of the latter depends on cluster configuration and should be tested
separately.
Ilya Dryomov [Sat, 29 Jan 2022 14:01:27 +0000 (15:01 +0100)]
qa/suites/rbd: add cram-based mon command API test
With mon command definitions (for the rbd_support mgr module in this case)
generated automatically by the @CLI{Read,Write}Command decorators, it's
very easy to accidentally break the externally facing API.
Ilya Dryomov [Sat, 29 Jan 2022 14:01:27 +0000 (15:01 +0100)]
mgr/rbd_support: level_spec is optional for schedule list/status
Commit fea6fdff4c74 ("mgr/rbd_support: level_spec passed to some
commands is not optional") is wrong. While it is true that a valid
level_spec is needed to create a LevelSpec instance, an empty string
is very much a valid level spec -- it signifies "all levels".
This wasn't caught because within Ceph these commands are wrapped by
rbd CLI which injects an empty string in get_level_spec_args().
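To illustrate why the argument can be optional, here is a toy parser in which the empty string is a valid spec meaning "all levels". It assumes a simplified `pool[/namespace[/image]]` grammar and is not how rbd_support actually parses level specs; `Level`, `parse_level_spec` and `schedule_list` are hypothetical names:

```python
from typing import NamedTuple, Optional

class Level(NamedTuple):
    """None at a position means "no restriction at this level"."""
    pool: Optional[str] = None
    namespace: Optional[str] = None
    image: Optional[str] = None

def parse_level_spec(spec: str) -> Level:
    """Toy parser assuming a simplified pool[/namespace[/image]] grammar."""
    if spec == "":
        return Level()  # valid: the empty spec selects all levels
    parts = spec.split("/")
    if len(parts) > 3 or "" in parts:
        raise ValueError(f"invalid level spec: {spec!r}")
    return Level(*parts)

def schedule_list(level_spec: str = "") -> Level:
    # Optional argument: a caller that omits it gets the same result as
    # the rbd CLI, which injects an empty string.
    return parse_level_spec(level_spec)
```

Defaulting to `""` keeps external callers that omit the argument working, while the rbd CLI's explicit empty string still resolves to the same "all levels" spec.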
Ilya Dryomov [Fri, 28 Jan 2022 22:01:08 +0000 (23:01 +0100)]
mgr/rbd_support: "trash remove" takes image_id_spec, not image_spec
Because of @CLIWriteCommand, the parameter name has to adhere to
the mon command API. Commit dcb51b067a49 ("mgr/rbd_support: define
commands using CLICommand") accidentally changed image_id_spec to
image_spec, breaking external users such as go-ceph.
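The reason a parameter rename is an API break can be shown with a toy decorator that, like a CLICommand-style decorator, derives the command's argument names from the Python signature. This is a sketch, not the mgr implementation: `cli_command`, `COMMANDS`, `dispatch` and the exact prefix string are illustrative assumptions:

```python
import inspect
from typing import Any, Callable, Dict, List, Tuple

# prefix -> (handler, argument names external callers must use)
COMMANDS: Dict[str, Tuple[Callable[..., Any], List[str]]] = {}

def cli_command(prefix: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Toy CLICommand-style decorator: argument names come from the signature."""
    def register(func: Callable[..., Any]) -> Callable[..., Any]:
        params = list(inspect.signature(func).parameters)[1:]  # drop "self"
        COMMANDS[prefix] = (func, params)
        return func
    return register

class Module:
    @cli_command("rbd task add trash remove")  # illustrative prefix
    def trash_remove(self, image_id_spec: str) -> str:
        return f"queued trash removal of {image_id_spec}"

def dispatch(module: Module, cmd: Dict[str, str]) -> str:
    """Route a JSON-style mon command to its handler by argument name."""
    func, params = COMMANDS[cmd["prefix"]]
    missing = [name for name in params if name not in cmd]
    if missing:
        raise KeyError(f"missing argument(s): {missing}")
    return func(module, **{name: cmd[name] for name in params})
```

A client such as go-ceph sends `{"prefix": ..., "image_id_spec": ...}`; renaming the Python parameter to `image_spec` silently changes the registered name and that request starts failing.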
Casey Bodley [Wed, 2 Feb 2022 19:06:20 +0000 (14:06 -0500)]
qa/rgw: tests run against ceph-quincy branch
target the ceph-quincy branch of s3tests, ragweed, and java_s3tests.
this commit targets the quincy branch specifically, rather than merging
to master and backporting.
Before this patch, the test case behaved unreliably, depending on the
underlying memory allocator. This was because the bufferlist rebuild
can be skipped, leaving the number of buffers unchanged, if all of
them begin at aligned addresses.
The commit fixes that by allocating a 4 KiB-aligned buffer and
offsetting it by a small constant (42) to ensure the memory added
to the bufferlist begins at a non-4 KiB-aligned address.
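The alignment arithmetic behind the fix, sketched in Python with `ctypes` rather than the C++ test's own allocation helpers: rounding an address up to a 4 KiB boundary and then adding 42 always leaves a remainder of 42 modulo 4096, so the resulting memory can never look already aligned. `align_up` is a hypothetical helper name:

```python
import ctypes

ALIGN = 4096  # 4 KiB
OFFSET = 42   # small constant offset, as in the fix

def align_up(addr: int, align: int) -> int:
    """Round addr up to the next multiple of align (a power of two)."""
    return (addr + align - 1) & ~(align - 1)

# Over-allocate so that an aligned base plus some payload fits in the buffer.
raw = ctypes.create_string_buffer(3 * ALIGN)
raw_addr = ctypes.addressof(raw)

base = align_up(raw_addr, ALIGN)  # 4 KiB-aligned address inside raw
start = base + OFFSET             # deliberately not 4 KiB-aligned

print(base % ALIGN, start % ALIGN)  # prints "0 42"
```

Whatever address the allocator hands back, `start` is misaligned by construction, which removes the allocator dependence described above.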
test/bufferlist: assert the rebuild in rebuild_aligned_size_and_memory() actually happens.
To help investigate failures like the following one:
```
[ RUN ] BufferList.rebuild_aligned_size_and_memory
../src/test/bufferlist.cc:1865: Failure
Expected equality of these values:
bl.get_num_buffers()
Which is: 2
1
[ FAILED ] BufferList.rebuild_aligned_size_and_memory (0 ms)
```
The test case assumes that the rebuild before the failing check **always**
happens, while `bufferlist::rebuild_aligned_size_and_memory()` skips it
if the buffers are already aligned.
Neha Ojha [Fri, 21 Jan 2022 23:31:01 +0000 (23:31 +0000)]
qa/suites/rados: reduce the number of cephadm tests
Currently, every rados run of ~400 jobs is running ~150 cephadm tests,
which is unnecessary and redundant. With this change, we will run some
basic cephadm tests within the rados suite. The following seems to be
a good start.
John Mulligan [Thu, 20 Jan 2022 19:48:28 +0000 (14:48 -0500)]
cephadm: validate that the constructed YumDnf baseurl is usable
If the inputs to the `cephadm add-repo` command would result in an
invalid URL for repo metadata, fail the command early with a (somewhat)
helpful error.
Fixes: https://tracker.ceph.com/issues/46773
Signed-off-by: John Mulligan <jmulligan@redhat.com>
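A sketch of an early baseurl check, assuming "usable" means an http(s) URL under which the standard yum/dnf metadata index `repodata/repomd.xml` can be fetched. Whether cephadm probes the repo this way is an assumption; `validate_baseurl`, `fetch_url` and `RepoValidationError` are hypothetical names, and the fetcher is injectable so the check can be exercised without network access:

```python
from typing import Callable
from urllib.parse import urlparse
from urllib.request import urlopen

class RepoValidationError(Exception):
    """Raised early, before any repo file is written."""

def fetch_url(url: str) -> None:
    # urlopen raises URLError/HTTPError (both OSError subclasses) on failure.
    with urlopen(url, timeout=10):
        pass

def validate_baseurl(baseurl: str, fetch: Callable[[str], None] = fetch_url) -> str:
    """Check that baseurl looks like a yum/dnf repo; return the metadata URL probed."""
    parsed = urlparse(baseurl)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise RepoValidationError(f"not an http(s) URL: {baseurl!r}")
    metadata_url = baseurl.rstrip("/") + "/repodata/repomd.xml"
    try:
        fetch(metadata_url)
    except OSError as err:
        raise RepoValidationError(
            f"repo metadata not reachable at {metadata_url}: {err}") from err
    return metadata_url
```

Naming the exact metadata URL in the error is what makes it (somewhat) helpful: the user can see which path was tried.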
John Mulligan [Thu, 20 Jan 2022 19:46:03 +0000 (14:46 -0500)]
cephadm: add a validate function to Packager
The validate function is for testing the inputs to the Packager
subclasses independently of writing the configuration to disk.
It only raises an exception upon failed validation.
Use it for the existing YumDnf validation exceptions.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
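The split described above can be sketched as a base class whose `validate()` only raises and never touches disk, called before the configuration is produced. The class names mirror the commit, but the distro list, the version rule and `repo_config`/`add_repo` are illustrative assumptions, and the config is returned as a string instead of being written:

```python
class PackagerError(Exception):
    pass

class Packager:
    def validate(self) -> None:
        """Check inputs only; raise PackagerError on failure, never touch disk."""

    def repo_config(self) -> str:
        raise NotImplementedError

    def add_repo(self) -> str:
        self.validate()  # fail before anything would be written
        return self.repo_config()

class YumDnf(Packager):
    SUPPORTED = {"centos", "rhel", "fedora"}  # illustrative, not cephadm's list

    def __init__(self, distro: str, version: str, baseurl: str) -> None:
        self.distro = distro
        self.version = version
        self.baseurl = baseurl

    def validate(self) -> None:
        if self.distro not in self.SUPPORTED:
            raise PackagerError(f"unsupported distro {self.distro!r}")
        if not self.version.split(".")[0].isdigit():
            raise PackagerError(f"cannot parse distro version {self.version!r}")

    def repo_config(self) -> str:
        return f"[ceph]\nname=Ceph\nbaseurl={self.baseurl}\nenabled=1\n"
```

Because `validate()` has no side effects, a unit test can construct a subclass with bad inputs and assert on the exception without creating any files.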
Melissa Li [Tue, 11 Jan 2022 23:03:23 +0000 (18:03 -0500)]
mgr/cephadm/iscsi: use `mon_command` in `post_remove` instead of `check_mon_command`
Use `mon_command` instead of `check_mon_command` in `post_remove` to avoid errors. For example, if the iscsi service is removed before the iscsi gateway list is updated, the cluster enters an error state and the iscsi removal gets stuck.
Fixes: https://tracker.ceph.com/issues/53706
Signed-off-by: Melissa Li <melissali@redhat.com>
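Roughly, `check_mon_command` raises when the mon returns a non-zero code, while `mon_command` hands back `(retval, out, err)` and leaves the decision to the caller. The stub below mimics only that contract; `StubMgr`, the local `MonCommandFailed` class and the command dict are illustrative stand-ins, not the real mgr module or the exact command cephadm sends:

```python
from typing import Any, Dict, List, Tuple

class MonCommandFailed(RuntimeError):
    pass

class StubMgr:
    """Stand-in for a mgr module; mimics only the return/raise contract."""

    def __init__(self, retval: int, err: str = "") -> None:
        self.retval = retval
        self.err = err

    def mon_command(self, cmd: Dict[str, Any]) -> Tuple[int, str, str]:
        return self.retval, "", self.err

    def check_mon_command(self, cmd: Dict[str, Any]) -> Tuple[int, str, str]:
        ret, out, err = self.mon_command(cmd)
        if ret != 0:
            raise MonCommandFailed(f"{cmd['prefix']} failed: {err} retval: {ret}")
        return ret, out, err

def post_remove(mgr: StubMgr, gateway: str, log: List[str]) -> None:
    cmd = {"prefix": "dashboard iscsi-gateway-rm", "name": gateway}  # illustrative
    ret, _, err = mgr.mon_command(cmd)
    if ret != 0:
        # Log and carry on instead of raising, so service removal can finish.
        log.append(f"failed to remove iscsi gateway {gateway}: {err}")
```

The trade-off is that a genuine failure is only logged; the removal still completes rather than leaving the cluster in an error state.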
John Mulligan [Tue, 18 Jan 2022 18:31:03 +0000 (13:31 -0500)]
mgr/cephadm: add a test for enabling cephfs mirroring module
Add a test that checks that when the cephfs mirror service is enabled,
the mirroring mgr module gets enabled too.
Actually-written-by: Sebastian Wagner <sewagner@redhat.com>
Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit bcb4fa70f9f739dce3e1c111db0a322804350f9d)
Adam King [Thu, 6 Jan 2022 22:01:34 +0000 (17:01 -0500)]
mgr/cephadm: still check agent deps if it is marked down
Right now, if an agent is down, _check_agent returns without
ever going on to check the deps or scheduled actions for that
agent. This causes a few issues.
For one, if an agent is marked down and then a mgr failover
happens, even if reconfiguring the agent would put it in a working
state (e.g. changing the target ip if the active mgr has moved)
we never try it because _check_agent just returns as soon as it
sees the agent is down. Additionally, if someone purposely tried
to schedule a redeploy of a down agent for whatever reason, we
would never make good on this action.
This change allows us to still carry out the normal checks and
scheduled actions even on down agents.
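The control-flow change can be sketched with a simplified agent record: the old shape returns early on a down agent, the new shape still evaluates deps and scheduled actions. `Agent`, `pending_work` and the action strings are hypothetical; the real `_check_agent` does considerably more:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Agent:
    host: str
    down: bool
    deps_changed: bool = False
    scheduled_action: Optional[str] = None

def pending_work(agent: Agent) -> List[str]:
    """Actions derived from changed deps and any scheduled action."""
    actions: List[str] = []
    if agent.deps_changed:
        actions.append("reconfig")  # e.g. target ip changed after mgr failover
    if agent.scheduled_action:
        actions.append(agent.scheduled_action)
    return actions

def check_agent_old(agent: Agent) -> List[str]:
    if agent.down:
        return []  # early return: deps and scheduled actions never examined
    return pending_work(agent)

def check_agent_new(agent: Agent) -> List[str]:
    # Down agents still get their deps checked and scheduled actions run.
    return pending_work(agent)
```

With a down agent whose target ip changed and a user-scheduled redeploy, the old shape does nothing while the new one yields both actions.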
I isolated all the test suites into their respective files
so that it is easier to add more tests in the future.
I also gave priority to the host actions.
Create OSD checks are now written in a way that OSDs
are created only on the intended hosts. This will make
the host draining process easier and less time consuming.
Also tried to address the flaky force maintenance checks.
Removed some duplicated code.
Improved the service creation part to reduce the time it takes
to complete.
Ilya Dryomov [Sun, 23 Jan 2022 15:32:57 +0000 (16:32 +0100)]
qa/run_xfstests_qemu.sh: disable 251, 260 and 288
All three are skipped with virtio disks:
251 [not run] FITRIM not supported on /dev/vdc
260 [not run] FITRIM not supported on /dev/vdc
288 [not run] FITRIM not supported on /dev/vdc
But 260 and 288 fail with ide disks, where discard defaults to on. The
ancient kernel in our ubuntu-12.04.qcow2 doesn't support virtio discard
anyway so let's just disable them for consistency.
Ilya Dryomov [Fri, 21 Jan 2022 12:41:46 +0000 (13:41 +0100)]
rbd-mirror: fix races in snapshot-based mirroring deletion propagation
When the remote image is deleted, rbd-mirror can encounter three cases:
1) no remote image id
2) no remote mirror metadata
3) MIRROR_IMAGE_STATE_DISABLING in remote mirror metadata
Commit d4c66ac5c615 ("rbd-mirror: fix issue with snapshot-based
mirroring deletion propagation") fixed case 1. Cases 2 and 3 remained
broken because for both of them finalize_snapshot_state_builder() would
populate not only remote_mirror_peer_uuid but also remote_image_id,
thus disabling ENOLINK logic in handle_prepare_remote_image() and
handle_bootstrap(). Commit ff60aec2d9ef ("rbd-mirror: fix bootstrap
sequence while the image is removed") touched on case 3, but it made
a difference only for journal-based mirroring.
Stop calling finalize_snapshot_state_builder() on errors. Instead,
align with journal-based mirroring by filling remote_mirror_peer_uuid
together with remote_mirror_uuid.
Make it clear that the local image non-primariness is asserted
independently of the mode; avoid the default implementation being
overridden but still relied on by both modes.
Ilya Dryomov [Wed, 19 Jan 2022 11:54:23 +0000 (12:54 +0100)]
rbd: add missing switch arguments for recognition by get_command_spec()
Currently this
$ rbd --all children img
doesn't work, while this
$ rbd children --all img
or this
$ rbd children img --all
does. The issue is that -a/--all isn't on the list of known switch
arguments. The "rbd children" example may seem contrived but for more
complicated commands such as "rbd device map" mixing switches and
positional arguments occurs naturally:
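A toy model of the failure, in Python rather than the C++ rbd tool: it assumes that while scanning for the command spec, an option that is not a known switch is taken to consume the next token as its value. Under that assumption `rbd --all children img` loses "children", while the other two orderings still work. `positional_args`, `get_command_spec` here, the command list and the `--read-only` example are illustrative, not rbd's actual code:

```python
from typing import List, Optional, Sequence, Set

# Illustrative subset of rbd command specs.
COMMANDS: List[List[str]] = [["children"], ["device", "map"], ["device", "unmap"]]

def positional_args(args: Sequence[str], switches: Set[str]) -> List[str]:
    """Collect positional tokens, skipping options and their values."""
    positionals: List[str] = []
    i = 0
    while i < len(args):
        arg = args[i]
        if arg.startswith("-") and arg != "-":
            if arg in switches or "=" in arg:
                i += 1  # a switch (or --opt=value) consumes no extra token
            else:
                i += 2  # unknown option: assume the next token is its value
            continue
        positionals.append(arg)
        i += 1
    return positionals

def get_command_spec(args: Sequence[str], switches: Set[str]) -> Optional[List[str]]:
    """Return the longest command spec that prefixes the positional arguments."""
    positionals = positional_args(args, switches)
    matches = [spec for spec in COMMANDS if positionals[:len(spec)] == spec]
    return max(matches, key=len) if matches else None
```

Adding `-a`/`--all` to the switch set is enough for `["--all", "children", "img"]` to resolve to `["children"]`, which matches the fix's approach of listing missing switch arguments.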
ee8887f4c0ff4f91117f31b621b95c8d08019130 was intended to add
mpath device support to ceph-volume, but it missed the lvm batch scenario.
This also fixes the zapping of mpath devices prepared with `ceph-volume raw`.
The recent changes from PR #43536 introduced a regression that prevents
running ceph-volume in a containerized context on Ubuntu 18.04.
The path of the `lvs` binary differs between CentOS 8 and Ubuntu 18.04
(`/usr/sbin/lvs` and `/sbin/lvs` respectively), so ceph-volume running
in a CentOS 8 container sees the `lvs` binary at `/usr/sbin/lvs` and tries to
run it with `nsenter` on the host, which is running Ubuntu 18.04 and has
`lvs` at `/sbin/lvs` instead.
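One way to avoid baking in the container's path is to resolve the binary against the host's layout before running it through `nsenter`. A minimal sketch, assuming a caller-supplied `exists` check that looks at the host filesystem (how that filesystem is visible inside the container is deployment-specific); `find_host_binary` and the search order are hypothetical and not necessarily what ceph-volume does:

```python
import os
from typing import Callable, Sequence

# Candidate directories, in search order (an assumption for illustration).
SEARCH_DIRS: Sequence[str] = ("/usr/sbin", "/sbin", "/usr/bin", "/bin")

def find_host_binary(name: str,
                     exists: Callable[[str], bool],
                     search_dirs: Sequence[str] = SEARCH_DIRS) -> str:
    """Return the first path under search_dirs where exists() finds `name`."""
    for directory in search_dirs:
        path = os.path.join(directory, name)
        if exists(path):
            return path
    raise FileNotFoundError(f"{name} not found in {', '.join(search_dirs)}")
```

On a CentOS 8 host this resolves `lvs` to `/usr/sbin/lvs`, and on an Ubuntu 18.04 host to `/sbin/lvs`, regardless of which image the container was built from.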