John Mulligan [Thu, 22 Feb 2024 18:49:10 +0000 (13:49 -0500)]
qa/tasks: add templating functions to cephadm module
Add functions to cephadm.py that will be later used to template
strings within the yaml files in the cephadm suites. This will be used
to replace the specific subst_vip call with generic calls that let
tests access "any" variables stored on the test ctx.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
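The commit message does not say which templating engine the new functions use, so the following is only a minimal sketch of the general idea, using Python's built-in `str.format`. The names `render` and `ctx`, and the `vip` attribute layout, are hypothetical and are not taken from cephadm.py.

```python
from types import SimpleNamespace


def render(template: str, ctx) -> str:
    """Expand {ctx...} references in a string using values stored on ctx.

    Hypothetical helper: str.format already supports attribute access
    ({ctx.name}) and item access ({ctx.name[key]}), so any value stored
    on the context can be reached without a dedicated substitution call.
    """
    return template.format(ctx=ctx)


# Hypothetical test context; real test contexts carry far more state.
example_ctx = SimpleNamespace(
    cluster="ceph",
    vip={"vips": ["10.0.0.10", "10.0.0.11"]},
)

command = render(
    "ceph nfs cluster create foo --ingress --virtual-ip {ctx.vip[vips][0]}",
    example_ctx,
)
```

A generic helper of this kind would let a yaml snippet refer to any context value, rather than relying on one substitution function per variable.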
John Mulligan [Tue, 20 Feb 2024 15:09:50 +0000 (10:09 -0500)]
qa/tasks: fix VIPs log line
While testing that my previous patches were correct, I noticed that the
string here was logged exactly as written, and was thus pretty useless.
This was probably meant to be an f-string, so make it one. Also get rid of
the unnecessary map call; the list and IP address types can repr
themselves just fine IMO.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
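A small illustration of the bug class described above, assuming made-up variable names and addresses (the actual log line in the task is not quoted in the commit message):

```python
import ipaddress
import logging

log = logging.getLogger(__name__)

# Hypothetical values standing in for what the task computes.
vnet = ipaddress.ip_network("10.0.0.0/24")
vips = [ipaddress.ip_address("10.0.0.10"), ipaddress.ip_address("10.0.0.11")]

# Without the f prefix the braces are kept literally, so the log is useless.
broken = "vip network {vnet}, vips {vips}"

# With the f prefix the values are interpolated. No map(str, ...) is needed:
# the network formats via str() and the list via the repr() of its items.
fixed = f"vip network {vnet}, vips {vips}"

log.info(fixed)
```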
John Mulligan [Tue, 20 Feb 2024 00:14:52 +0000 (19:14 -0500)]
qa/tasks: change map_vips to raise exceptions instead of returning None
None of the callers of map_vips ever checks for a None return. So
instead of handling any error conditions, the callers would always just
blow up with a semi-obscure TypeError. Convert the function to always
raise an exception (one that tries to briefly explain the condition)
when something goes wrong. I also take the opportunity to make
the logging clearer and reduce an indentation level.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
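The pattern can be sketched as follows. This is not the real map_vips; `VIPMappingError` and `pick_vip` are hypothetical names used only to show raising a descriptive exception where a function might otherwise return None.

```python
import ipaddress
from itertools import islice


class VIPMappingError(Exception):
    """Hypothetical error type explaining why no VIP could be chosen."""


def pick_vip(vnet, index):
    """Return the index-th usable host address in vnet, or raise.

    Returning None here would only surface later as a TypeError in a
    caller that never checks the result, so fail early with a message.
    """
    if index < 0:
        raise VIPMappingError(f"invalid negative VIP index {index}")
    addr = next(islice(vnet.hosts(), index, None), None)
    if addr is None:
        raise VIPMappingError(f"network {vnet} has no host address at index {index}")
    return addr
```

The early checks also keep the successful path at a single indentation level.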
Xuehan Xu [Wed, 28 Feb 2024 05:42:04 +0000 (13:42 +0800)]
crimson/os/seastore: copy attrs and omaps when cloning objects
At present, we just copy attrs and omaps one by one, which is not
efficient, but the copy is essential for correct behavior, especially for
the teuthology tests.
Venky Shankar [Wed, 28 Feb 2024 04:02:42 +0000 (09:32 +0530)]
Merge PR #52258 into main
* refs/pull/52258/head:
client: check mds down status before getting mds_gid_t from mdsmap
mgr/dashboard: allow sending back error status code when fetching clients fails
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Rishabh Dave <ridave@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>
Reviewed-by: Dhairya Parmar <dparmar@redhat.com>
Zac Dover [Mon, 26 Feb 2024 10:03:48 +0000 (20:03 +1000)]
doc/rados: add "change public network" procedure
Add a procedure to /doc/rados/operations/add-or-rm-mons.rst that
explains how to change the public_network in a Ceph cluster deployed
with cephadm. This procedure was developed by Eugen Block, and can be
seen in its original form here:
https://heiterbiswolkig.blogs.nde.ag/2024/02/22/cephadm-change-public-network/
Co-authored-by: Eugen Block <eblock@nde.ag>
Signed-off-by: Zac Dover <zac.dover@proton.me>
Venky Shankar [Tue, 27 Feb 2024 05:53:13 +0000 (11:23 +0530)]
Merge PR #52859 into main
* refs/pull/52859/head:
qa: test cases to make sure invalid paths don't get updated
mgr/nfs: use helper to validate cephfs path
mgr/nfs: validate path before updating a cephfs export
mgr/nfs: add a helper to validate cephfs path
Adam King [Fri, 16 Feb 2024 16:24:32 +0000 (11:24 -0500)]
mgr/cephadm: catch CancelledError in asyncio timeout handler
Specifically, concurrent.futures.CancelledError. At least on
python 3.9, this error can be raised when certain commands
being run asynchronously fail. Not catching this results in
the whole cephadm module crashing with something like
Traceback (most recent call last):
  File "/usr/share/ceph/mgr/cephadm/utils.py", line 94, in do_work
    return f(*arg)
  File "/usr/share/ceph/mgr/cephadm/serve.py", line 267, in refresh
    r = self._refresh_facts(host)
  File "/usr/share/ceph/mgr/cephadm/serve.py", line 370, in _refresh_facts
    val = self.mgr.wait_async(self._run_cephadm_json(
  File "/usr/share/ceph/mgr/cephadm/module.py", line 671, in wait_async
    return self.event_loop.get_result(coro, timeout)
  File "/usr/share/ceph/mgr/cephadm/ssh.py", line 64, in get_result
    return future.result(timeout)
  File "/lib64/python3.9/concurrent/futures/_base.py", line 444, in result
    raise CancelledError()
concurrent.futures._base.CancelledError
Fixes: https://tracker.ceph.com/issues/64473
Signed-off-by: Adam King <adking@redhat.com>
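A minimal sketch of the failure mode, assuming a background event loop driven from a worker thread in the style the traceback suggests. `EventLoopThread`, `ok_command`, and `cancelled_command` are hypothetical names, and converting the error to RuntimeError is an illustrative choice, not necessarily what the actual fix does.

```python
import asyncio
import concurrent.futures
import threading


class EventLoopThread:
    """Hypothetical helper that runs an asyncio loop in a daemon thread."""

    def __init__(self):
        self.loop = asyncio.new_event_loop()
        self.thread = threading.Thread(target=self.loop.run_forever, daemon=True)
        self.thread.start()

    def get_result(self, coro, timeout=None):
        future = asyncio.run_coroutine_threadsafe(coro, self.loop)
        try:
            return future.result(timeout)
        except concurrent.futures.TimeoutError:
            # Give up on the command and let the caller see the timeout.
            future.cancel()
            raise
        except concurrent.futures.CancelledError:
            # Left uncaught, this would propagate out of the worker thread.
            # Here it becomes an ordinary error the caller can handle.
            raise RuntimeError("async command was cancelled before completing")


async def ok_command():
    return 42


async def cancelled_command():
    # Simulates a command whose task ends up cancelled.
    raise asyncio.CancelledError()
```

Calling `get_result(cancelled_command())` on an instance raises RuntimeError rather than letting `concurrent.futures.CancelledError` escape.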
Casey Bodley [Mon, 26 Feb 2024 14:38:52 +0000 (09:38 -0500)]
test/rgw: increase timeouts in unittest_rgw_dmclock_scheduler
1ms sleeps are generally below the timer's resolution. increase run_for()
durations to 50ms to make the tests far less sensitive to timing. in
practice, none of the sleeps actually wait the full 50ms
Zac Dover [Fri, 23 Feb 2024 16:05:42 +0000 (02:05 +1000)]
doc/rbd: repair ordered list
Fix the numbering in an ordered list. The numbering was thrown off
because a ".. prompt" directive was improperly indented (it wasn't
indented at all).
See https://github.com/ceph/ceph/pull/55540#discussion_r1500051264
Casey Bodley [Thu, 22 Feb 2024 21:54:54 +0000 (16:54 -0500)]
rgw/aio: avoid infinite recursion in aio_abstract()
a recent regression from 320a2179a3c6c1981a0fd2494938515997c1bfad causes
aio_abstract() to recurse when given an empty optional_yield. this is
exposed by the librgw_file tests
Ramana Raja [Thu, 25 May 2023 16:48:12 +0000 (16:48 +0000)]
qa: Add tests to validate syncing of images using rbd-mirror
Introduce functional tests to validate that the images under
workloads are correctly mirrored between two clusters using snapshot
based mirroring.
Run workload on a primary image using a krbd or nbd client. Take
mirror snapshots of the image under workload. Unmount the mapped image
and calculate its MD5 checksum before demoting it. After demotion,
wait for the mirror status of the image to be 'up+unknown' in both
the clusters. This is to make sure that the non-primary image in the
other cluster is ready to be promoted. Now promote the non-primary
image in the other cluster. Map the promoted image and calculate its
MD5 checksum. Verify that the checksums of the demoted and promoted
images in the two clusters are the same.
The above test is run as part of two different workunits:
- a workunit that validates the syncing of multiple mirrored images
with workloads running on them
- another workunit that validates the syncing of a single mirrored
image with a workload running on it, where the image is set as primary
alternately between the two clusters, as happens during
failover and failback scenarios.
Fixes: https://tracker.ceph.com/issues/61617
Signed-off-by: Ramana Raja <rraja@redhat.com>
Co-authored-by: Ilya Dryomov <idryomov@redhat.com>
Co-authored-by: Christopher Hoffman <choffman@redhat.com>
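The final verification step described above (comparing the MD5 checksums of the demoted and promoted images) can be sketched in Python as follows. The actual workunits are not shown in this log; the function names and the idea of passing device or file paths are assumptions made for illustration.

```python
import hashlib


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file or block device, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def images_match(demoted_path, promoted_path):
    """Hypothetical check: True if both mapped images have identical content."""
    return md5_of(demoted_path) == md5_of(promoted_path)
```

Reading in fixed-size chunks keeps memory use bounded regardless of image size.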
Ramana Raja [Fri, 9 Feb 2024 00:32:37 +0000 (19:32 -0500)]
qa/workunits: make wait_for_status_in_pool_dir() reentrant
In rbd_mirror_helpers.sh, the `wait_for_status_in_pool_dir()` helper
stored `mirror image status` and `mirror pool status` command outputs
in files that could be shared over successive calls or calls from
multiple threads. Instead store the command outputs in local variables
to make `wait_for_status_in_pool_dir()` reentrant.
This allows overriding the persistent min_alloc_size if needed.
This might be helpful to troubleshoot and work around issues like
https://tracker.ceph.com/issues/63618
Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>