Adam King [Fri, 1 Mar 2024 18:22:44 +0000 (13:22 -0500)]
cephadm: improve cephadm pull usage message
Generally, it's uncommon for users to run this
directly, but in case they need to for debugging
purposes, we should include how to pass the
image to be pulled in the usage message.
Additionally, include that this is only to be used
for pulling ceph images in the help message, as
that isn't necessarily clear. Pulling anything
else will result in a traceback as it tries
to run `ceph --version` inside the container.
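As an illustration, the usage the message now points at is roughly the
following (hedged; the image name is just a placeholder):

    cephadm --image quay.io/ceph/ceph:v18 pull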
After some tests, it turns out that, depending on the hardware,
the 'Location' header returned by the server after logging in can differ.
I noticed the following: the endpoint passed down to util.query()
is wrong: it passes the full URL (scheme://addr:port/path) where it
should only pass the path. The cause is that RedFishClient.login()
stores the raw value of the Location header in `self.location`.
The consequence is that the client is unable to properly log out.
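A minimal sketch of the intended fix, with illustrative names
(normalize_location() is a stand-in, not the real helper):

    from urllib.parse import urlparse

    def normalize_location(location_header: str) -> str:
        # The server may return either a bare path or a full URL
        # (scheme://addr:port/path) in the 'Location' header; keep only
        # the path so util.query() always receives an endpoint path.
        parsed = urlparse(location_header)
        return parsed.path if parsed.scheme else location_header

    # e.g. in RedFishClient.login():
    # self.location = normalize_location(response.headers['Location'])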
Venky Shankar [Mon, 4 Mar 2024 13:23:53 +0000 (18:53 +0530)]
mds: disable `defer_client_eviction_on_laggy_osds' by default
This config can result in a single client holding up the mds from
servicing other clients, since once a client is deferred from eviction
due to laggy OSD(s), a new client's cap acquire request can possibly be
blocked until the other laggy client resumes operation, i.e., when
the laggy OSD is considered non-laggy again.
Disable the config by default till the issue is fixed.
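For reference, operators who understand the trade-off can still turn the
behavior back on at runtime with something like:

    ceph config set mds defer_client_eviction_on_laggy_osds true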
Ramana Raja [Thu, 25 May 2023 16:48:12 +0000 (16:48 +0000)]
qa: Add tests to validate syncing of images using rbd-mirror
Introduce functional tests to validate that the images under
workloads are correctly mirrored between two clusters using snapshot
based mirroring.
Run a workload on a primary image using a krbd or nbd client. Take
mirror snapshots of the image under workload. Unmount the mapped image
and calculate its MD5 checksum before demoting it. After demotion,
wait for the mirror status of the image to be 'up+unknown' in both
clusters. This ensures that the non-primary image in the
other cluster is ready to be promoted. Then promote the non-primary
image in the other cluster. Map the promoted image and calculate its
MD5 checksum. Verify that the checksums of the demoted and promoted
images in the two clusters are the same.
The above test is run as part of two different workunits:
- a workunit that validates the syncing of multiple mirrored images
with workloads running on them
- another workunit that validates the syncing of a single mirrored
image with a workload running on it, where the image is alternately
set as primary in each of the two clusters, as happens during
failover and failback scenarios.
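A rough, command-level sketch of the per-image check (cluster, pool and
image names are placeholders; the real logic lives in the workunit
scripts):

    # on the primary cluster: quiesce the workload, checksum, demote
    md5sum /dev/rbd/pool1/image1 > primary.md5
    rbd --cluster site-a mirror image demote pool1/image1

    # wait for 'up+unknown' on both clusters, then promote on the peer,
    # map the image there and compare checksums
    rbd --cluster site-b mirror image promote pool1/image1
    rbd --cluster site-b device map pool1/image1
    md5sum /dev/rbd/pool1/image1 > secondary.md5
    diff primary.md5 secondary.md5   # checksums must match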
Fixes: https://tracker.ceph.com/issues/61617
Signed-off-by: Ramana Raja <rraja@redhat.com>
Co-authored-by: Ilya Dryomov <idryomov@redhat.com>
Co-authored-by: Christopher Hoffman <choffman@redhat.com>
(cherry picked from commit b7aae5c3c5a1dd24c4cb7ceb499292af00bae680)
Cherry-pick notes:
- In qa/workunits/rbd/compare_mirror_images.sh, replace
`wait_for_replaying_status_in_pool_dir` with `wait_for_status_in_pool_dir`,
since commit 3fd8a03, which added `wait_for_replaying_status_in_pool_dir`,
was not backported.
Adam King [Fri, 16 Feb 2024 16:24:32 +0000 (11:24 -0500)]
mgr/cephadm: catch CancelledError in asyncio timeout handler
Specifically, concurrent.futures.CancelledError. At least on
Python 3.9, this error can be raised when certain commands
being run asynchronously fail. Not catching this results in
the whole cephadm module crashing with something like:
Traceback (most recent call last):
  File "/usr/share/ceph/mgr/cephadm/utils.py", line 94, in do_work
    return f(*arg)
  File "/usr/share/ceph/mgr/cephadm/serve.py", line 267, in refresh
    r = self._refresh_facts(host)
  File "/usr/share/ceph/mgr/cephadm/serve.py", line 370, in _refresh_facts
    val = self.mgr.wait_async(self._run_cephadm_json(
  File "/usr/share/ceph/mgr/cephadm/module.py", line 671, in wait_async
    return self.event_loop.get_result(coro, timeout)
  File "/usr/share/ceph/mgr/cephadm/ssh.py", line 64, in get_result
    return future.result(timeout)
  File "/lib64/python3.9/concurrent/futures/_base.py", line 444, in result
    raise CancelledError()
concurrent.futures._base.CancelledError
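A minimal sketch of the handling this adds, with simplified names (the
real code lives in the mgr/cephadm ssh helper and raises its own
orchestrator error type):

    import concurrent.futures

    def get_result(future: concurrent.futures.Future, timeout: int):
        try:
            return future.result(timeout)
        except (concurrent.futures.TimeoutError,
                concurrent.futures.CancelledError) as err:
            # Treat a cancelled command like a timed-out one instead of
            # letting the exception escape and crash the whole module.
            raise TimeoutError(f'command failed or timed out: {err}') from err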
Zac Dover [Wed, 13 Mar 2024 12:04:35 +0000 (22:04 +1000)]
doc/dev: backport zipapp docs to reef
Backport the docs changes in https://github.com/ceph/ceph/pull/54173 to
the Reef release branch. This was not previously done because the docs
changes in PR#54173 were bundled with code changes.
Adam King [Mon, 6 Nov 2023 16:19:09 +0000 (11:19 -0500)]
qa/cephadm: adjust host drain test to handle explicit placement warning
Since we're adding a warning if any host is listed explicitly
in the placement of any service when removing the host,
we need to adjust the host drain test that removes a host
without the --force flag to not have the explicit hostname
in the placement for the mon service.
If, for example, a service's placement lists host3 explicitly
and then you run `ceph orch host drain host3`, cephadm will remove
the daemon from that host and the placement would then match nothing.
This is an issue that users should be able to bypass, as
it generally isn't serious, but it's good to let users
know they have the host listed explicitly in placements like this
when they want to drain it.
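As a concrete (hypothetical) illustration, a spec pinned to specific
hosts such as

    service_type: mon
    placement:
      hosts:
        - host1
        - host2
        - host3

will now produce a warning when host3 is drained, since the host3
entry would be left in the placement matching nothing.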
Adam King [Fri, 29 Sep 2023 20:09:48 +0000 (16:09 -0400)]
qa/cephadm: add teuthology test for host draining
This was a gap in our testing in general, but I'm
adding it here right now specifically to use it
to test the "--rm-crush-entry" flag in a follow-up
commit.
Adam King [Fri, 29 Sep 2023 18:39:10 +0000 (14:39 -0400)]
mgr/cephadm: add --rm-crush-entry flag to host removal
This will tell cephadm to try to remove the
crush bucket for the host at the end of the host
removal process. If this fails, we still consider the
host as having been successfully removed from
cephadm's POV, but the user will get back an error
message telling them we failed to remove the
host from the crush map.
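Usage is roughly the following (host name is a placeholder):

    ceph orch host rm host3 --rm-crush-entry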
Adam King [Thu, 15 Feb 2024 14:24:23 +0000 (09:24 -0500)]
qa/cephadm: don't test certain workunits with agent
There are a handful of workunits that don't work
with or don't make sense with the agent.
The test for the cephadm timeout only works if
the mgr directly runs ceph-volume inventory which
it won't do with the agent present. The adoption
test is just running direct cephadm commands that
are irrelevant to the agent. The test_orch_cli tests
rely on refresh timings that are different with
the agent running, causing spurious failures.
Seena Fallah [Sun, 11 Feb 2024 21:50:05 +0000 (22:50 +0100)]
cephadm: remove restriction for crush device classes
A restriction introduced in https://github.com/ceph/ceph/commit/6c6cb2f5130dbcf8e42cf03666173948411fc92b prevents OSDs from being created with custom crush device classes.
The crush device class is the key that lets CRUSH distinguish between multiple storage classes, so it must accept any custom name.
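As a hedged example, an OSD spec along these lines (ids and class name
are placeholders) should now be accepted with an arbitrary class name:

    service_type: osd
    service_id: nvme_fast
    placement:
      host_pattern: '*'
    spec:
      data_devices:
        rotational: false
      crush_device_class: my_custom_class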
Adam King [Wed, 14 Feb 2024 17:02:09 +0000 (12:02 -0500)]
cephadm: rm podman-auth.json if removing last cluster
We have points in rm-cluster where we check that
there are no other clusters on the host. If that
is the case, we can also clear /etc/ceph/podman-auth.json
which gets written out when we log in to a registry
while using podman
Adam King [Sun, 10 Mar 2024 20:42:51 +0000 (16:42 -0400)]
cephadm: create ceph-exporter sock dir if it's not present
Since this is usually /var/run/ceph/, which ends up getting
created by other daemons as well, it was common to see
ceph-exporter fail to deploy and then deploy fine later
once other daemons were up on the host. I don't see any
reason we can't just try to make the directory here instead
of bailing out.
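A minimal sketch of the idea (path and mode are assumptions, not the
exact values used by the patch):

    import os

    def ensure_sock_dir(path: str = '/var/run/ceph') -> None:
        # Create the socket directory if it doesn't exist yet instead of
        # failing the ceph-exporter deployment; other daemons normally
        # create it, but ceph-exporter may be the first one deployed.
        os.makedirs(path, mode=0o770, exist_ok=True)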
This patch had to be rewritten for reef, as it depended on
changes in cephadm that will not be backported to reef.
Fixes: https://tracker.ceph.com/issues/64491
Signed-off-by: Adam King <adking@redhat.com>
Paul Cuzner [Thu, 21 Dec 2023 01:12:45 +0000 (20:12 -0500)]
orchestrator: Add summary line to orch device ls
This patch just adds a summary line to the plain
text output of orch device ls when the --summary
switch is given. This helps to quickly understand your
device counts when managing hosts with many devices.
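For example:

    ceph orch device ls --summary

appends a summary line to the usual plain-text device listing.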
Adam King [Mon, 27 Nov 2023 20:04:42 +0000 (15:04 -0500)]
python-common/drive_selection: fix limit with existing devices
When devices have already been used for OSDs, they are still
allowed to pass filtering as they are still needed for the
resulting ceph-volume lvm batch command. This was causing an
issue with limit however. Limit adds the devices we've found
that match the filter and existing OSD daemons tied to the spec.
This allows double counting of devices that have been used for
OSDs, as they're counted in terms of being an existing device
and that they match the filter. To avoid this issue, devices
should only be counted towards the limit if they are not already
part of an OSD.
An additional note: The limit feature is only applied for
data devices, so there is no need to worry about the effect
of this change on selection of db, wal, or journal devices.
Also, we would still want to not count these devices if they
did end up passing the data device filter but had been used
for a db/wal/journal device previously.
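A simplified sketch of the corrected counting (names are illustrative,
not the actual DriveSelection code; 'used_by_ceph' stands in for
"this device already backs an OSD"):

    def reached_limit(limit, matched_devices, existing_osd_count):
        # Devices already used for OSDs still pass the filter (they are
        # needed for the resulting ceph-volume lvm batch command), but
        # they must not be counted again on top of the existing OSD
        # daemons tied to the spec.
        new_devices = [d for d in matched_devices if not d.used_by_ceph]
        return bool(limit) and (len(new_devices) + existing_osd_count) >= limit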
When no `service_id` is provided in the service spec (osd), it results in
OSDs created with the "osdspec_affinity" attribute set to a string
containing "None".
The DriveSelection class relies on comparing the actual
value of this attribute with the value of the service_id, which has
the Python value `None` in that case.
If any existing deployments were created without the service_id
attribute, we now have to support this case and make sure the check
won't filter out devices unexpectedly.
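A hedged sketch of the comparison this implies (names are illustrative):

    def affinity_matches(osdspec_affinity, service_id):
        # Deployments created without a service_id end up with the literal
        # string "None" in osdspec_affinity, while the spec side carries
        # the Python value None; treat the two as equivalent so devices
        # aren't filtered out unexpectedly.
        if service_id is None:
            return osdspec_affinity in (None, '', 'None')
        return osdspec_affinity == service_id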
Adam King [Tue, 7 Nov 2023 20:49:57 +0000 (15:49 -0500)]
mgr/cephadm: fix reweighting of OSD when OSD removal is stopped
Previously, when you ran "ceph orch osd rm stop <osd-id>"
cephadm would pass in a new OSD object to the removal
queue that would not have any of the fields set previously
for the OSD. This was mostly fine when removing it from
the queue as those fields were no longer needed, but an
exception was the initial weight, which you need if
you want to set the weight back when you stop removal.
This patch changes it so it will now remove the actual
OSD object the removal queue stores, so that we
get to use the previously set original weight. It also
changes when we grab the original weight to make it
happen earlier, and adds it to the to_json output so it survives
any potential mgr failovers.
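Roughly, the stop path now looks up the OSD object already sitting in
the removal queue (which recorded original_weight when draining began)
instead of constructing a fresh one; a simplified sketch, not the
actual module code:

    def stop_removal(removal_queue, osd_id):
        # The queued object carries the original crush weight captured
        # earlier; a freshly built OSD object would not.
        queued = next((o for o in removal_queue if o.osd_id == osd_id), None)
        if queued is None:
            return None
        removal_queue.remove(queued)
        if queued.original_weight is None:
            return None
        # Command that would restore the weight changed when draining began.
        return f'ceph osd crush reweight osd.{osd_id} {queued.original_weight}'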