ceph-volume: fix partitions support in disk.get_devices()
The following:
```
is_part = get_file_contents(os.path.join(_sys_dev_block_path, item, 'partition')) == "1"
```
assumes that any `/sys/dev/block/x:y/partition` contains '1', which is wrong.
This file actually contains the corresponding partition number.
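A minimal sketch of a corrected check (standalone illustration, not the exact ceph-volume helper): the presence of the `partition` attribute, rather than its value, is what identifies a partition.
```
import os

_sys_dev_block_path = '/sys/dev/block'

def is_partition(item):
    # /sys/dev/block/<maj:min>/partition exists only for partitions and
    # holds the partition number (1, 2, ...), so check for its presence
    # instead of comparing its contents to "1".
    return os.path.exists(os.path.join(_sys_dev_block_path, item, 'partition'))
```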
Nizamudeen A [Tue, 19 Mar 2024 14:57:13 +0000 (20:27 +0530)]
mgr/dashboard: rm warning/error threshold for cpu usage
For multi-core CPUs the value can be more than 100%, so it doesn't make
sense to show a warning/error when the usage is at or above 100%;
hence removing it.
Conflicts:
src/pybind/mgr/dashboard/frontend/src/app/shared/components/usage-bar/usage-bar.component.html
- some of the threshold changes are not in quincy, so adapting them
Igor Fedotov [Thu, 29 Sep 2022 11:52:45 +0000 (14:52 +0300)]
osd: improve OSD robustness.
Achieved by:
1. OSD superblock data is replicated in the onode's OMAP, so one can
recover it after the onode's content is corrupted.
2. The pg_num_history object gets a full overwrite, which eliminates the
need to merge with previous data (and hence reading corrupted data
wouldn't kill the OSD); see the sketch below.
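A minimal, hedged sketch of the idea behind point 2 (illustration only, not OSD code): a read-modify-write update has to trust whatever it reads back, so corrupted on-disk data poisons the merged result, whereas a full overwrite of independently kept state never depends on the old value.
```
def merge_update(store, key, new_entries):
    # read-modify-write: a corrupted read ends up in the merged result
    current = store.get(key, {})
    current.update(new_entries)
    store[key] = current

def full_overwrite(store, key, complete_state):
    # the complete state is rewritten every time; whatever was stored
    # before (even garbage) is simply replaced
    store[key] = dict(complete_state)

store = {'pg_num_history': '<corrupted blob>'}
full_overwrite(store, 'pg_num_history', {1: 128})   # recovers cleanly
```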
Xavi Hernandez [Fri, 16 Feb 2024 18:14:07 +0000 (19:14 +0100)]
client: fix leak of file handles
According to the POSIX specification, the fd passed to fdopendir() will be
closed by closedir(). However, the CephFS client wasn't doing that. If the
user opened a directory using ceph_openat(), for example, and then
passed the returned fd to ceph_fdopendir(), the created Fh associated
with the new open was never destroyed.
This patch records the fd used in ceph_fdopendir() so that it can be
closed when ceph_closedir() is called.
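A minimal, hedged sketch of the ownership rule being fixed (plain Python, not the libcephfs API): a directory handle built from an already-open fd takes ownership of that fd and closes it when the handle is closed, mirroring fdopendir()/closedir() semantics.
```
import os

class DirHandle:
    def __init__(self, fd):
        self.fd = fd                # take ownership of the caller's fd

    def entries(self):
        return os.listdir(self.fd)  # os.listdir() accepts a directory fd on POSIX

    def close(self):
        if self.fd is not None:
            os.close(self.fd)       # closing the handle closes the fd it was built from
            self.fd = None

fd = os.open('/tmp', os.O_RDONLY)
handle = DirHandle(fd)
print(handle.entries())
handle.close()                      # no leaked descriptor
```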
Ramana Raja [Thu, 29 Feb 2024 17:12:19 +0000 (12:12 -0500)]
qa/suites: add diff-continuous and compare-mirror-image tests
... to rbd and krbd suites respectively.
This allows the compare-mirror-image tests introduced in ea3a567
to be run against various kernel branches, e.g., the testing branch,
and allows the diff_continuous test in the rbd suite to run against
the distro kernel.
Adam King [Mon, 5 Jun 2023 19:05:55 +0000 (15:05 -0400)]
mgr/cephadm: add ability to zap OSDs' devices while draining host
Currently, when cephadm drains a host, it will remove all OSDs on
the host, but provides no option to zap the OSDs' devices afterwards.
Given that users draining a host are likely doing so to remove it from
the cluster, it makes sense that some users would want to clean up the
devices on the host that were being used for OSDs. Cephadm already
supports zapping devices outside of host draining, so it shouldn't take
much to add that functionality to the host drain as well.
Ilya Dryomov [Wed, 28 Feb 2024 13:20:16 +0000 (14:20 +0100)]
librbd: don't clip expanded diff on truncate in ObjectListSnapsRequest
If the diff was expanded due to LIST_SNAPS_FLAG_WHOLE_OBJECT, clipping
it when handling a truncate is wrong -- when subtracting that interval,
we either split the expanded extent into two or chop off a piece of it.
However, the point of LIST_SNAPS_FLAG_WHOLE_OBJECT is to report a single
extent covering the entire object.
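A small interval-arithmetic illustration (plain Python, not librbd code) of why clipping is wrong here: subtracting a truncated range from a whole-object extent either chops off its tail or splits it in two, so the result no longer covers the entire object.
```
OBJECT_SIZE = 4 << 20   # assumed 4 MiB object, for illustration

def subtract(extent, hole):
    # subtract interval `hole` from interval `extent`; both are (offset, length)
    e_off, e_len = extent
    h_off, h_len = hole
    e_end, h_end = e_off + e_len, h_off + h_len
    left = (e_off, max(0, min(e_end, h_off) - e_off))
    right = (max(e_off, h_end), max(0, e_end - max(e_off, h_end)))
    return [p for p in (left, right) if p[1] > 0]

whole_object = (0, OBJECT_SIZE)

# a truncate chops off the tail -> the extent no longer covers the whole object
print(subtract(whole_object, (3 << 20, 1 << 20)))    # [(0, 3145728)]

# a hole in the middle splits the single extent into two
print(subtract(whole_object, (1 << 20, 1 << 20)))    # [(0, 1048576), (2097152, 2097152)]
```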
Ilya Dryomov [Sun, 18 Feb 2024 10:46:15 +0000 (11:46 +0100)]
librados/snap_set_diff: ignore truncates above size at start
Because calc_snap_set_diff() currently only ever appends to the running
diff, an excessive (either too large or completely bogus) zero extent
is reported in cases where an object is first expanded (with a snapshot
taken at that point) and then truncated, but still to a size above that
of the object as of the starting snapshot.
Venky Shankar [Mon, 4 Mar 2024 13:23:53 +0000 (18:53 +0530)]
mds: disable `defer_client_eviction_on_laggy_osds' by default
This config can result in a single client holding up the MDS from
servicing other clients: once a client is deferred from eviction due to
laggy OSD(s), a new client's cap acquire request can possibly be
blocked until the deferred laggy client resumes operation, i.e., when
the laggy OSD is considered non-laggy again.
Disable the config by default till the issue is fixed.
Ramana Raja [Thu, 25 May 2023 16:48:12 +0000 (16:48 +0000)]
qa: Add tests to validate syncing of images using rbd-mirror
Introduce functional tests to validate that the images under
workloads are correctly mirrored between two clusters using snapshot
based mirroring.
Run workload on a primary image using a krbd or nbd client. Take
mirror snapshots of the image under workload. Unmount the mapped image
and calculate its MD5 checksum before demoting it. After demotion,
wait for the mirror status of the image to be 'up+unknown' in both
the clusters. This is to make sure that the non-primary image in the
other cluster is ready to be promoted. Now promote the non-primary
image in the other cluster. Map the promoted image and calculate its
MD5 checksum. Verify that the checksums of the demoted and promoted
images in the two clusters are the same.
The above test is run as part of two different workunits:
- a workunit that validates the syncing of multiple mirrored images
with workloads running on them
- another workunit that validates the syncing of a single mirrored
image with a workload running on it, where the image is alternately
set as primary in the two clusters, as happens during failover and
failback scenarios.
Fixes: https://tracker.ceph.com/issues/61617
Signed-off-by: Ramana Raja <rraja@redhat.com>
Co-authored-by: Ilya Dryomov <idryomov@redhat.com>
Co-authored-by: Christopher Hoffman <choffman@redhat.com>
(cherry picked from commit b7aae5c3c5a1dd24c4cb7ceb499292af00bae680)
Cherry-pick notes:
- In qa/workunits/rbd/compare_mirror_images.sh, replace
`wait_for_replaying_status_in_pool_dir` with `wait_for_status_in_pool_dir`,
since commit 3fd8a03, which added `wait_for_replaying_status_in_pool_dir`,
was not backported
Zac Dover [Wed, 13 Mar 2024 17:25:06 +0000 (03:25 +1000)]
doc/cephadm: explain different methods of cephadm delivery
Explain that only in Reef and later releases is cephadm distributed as
an executable compiled from source code. This note is to go into Quincy
and only into Quincy, to direct new users of Ceph whom circumstance has
delivered into the hands of Quincy and who might have the wrong idea
that the documentation of Reef and later releases applies to their
release.
Adam King [Fri, 16 Feb 2024 16:24:32 +0000 (11:24 -0500)]
mgr/cephadm: catch CancelledError in asyncio timeout handler
Specifically, concurrent.futures.CancelledError. At least on
Python 3.9, this error can be raised when certain commands
being run asynchronously fail. Not catching this results in
the whole cephadm module crashing with something like:
Traceback (most recent call last):
File "/usr/share/ceph/mgr/cephadm/utils.py", line 94, in do_work
return f(*arg)
File "/usr/share/ceph/mgr/cephadm/serve.py", line 267, in refresh
r = self._refresh_facts(host)
File "/usr/share/ceph/mgr/cephadm/serve.py", line 370, in _refresh_facts
val = self.mgr.wait_async(self._run_cephadm_json(
File "/usr/share/ceph/mgr/cephadm/module.py", line 671, in wait_async
return self.event_loop.get_result(coro, timeout)
File "/usr/share/ceph/mgr/cephadm/ssh.py", line 64, in get_result
return future.result(timeout)
File "/lib64/python3.9/concurrent/futures/_base.py", line 444, in result
raise CancelledError()
concurrent.futures._base.CancelledError
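A minimal, hedged sketch of the fix's shape (not the actual cephadm handler; names are illustrative): treat a cancelled future like a failed or timed-out command instead of letting the exception escape and crash the module.
```
import concurrent.futures

def get_result(future, timeout):
    try:
        return future.result(timeout)
    except concurrent.futures.TimeoutError:
        raise RuntimeError('command timed out')
    except concurrent.futures.CancelledError:
        # seen at least on Python 3.9 when an asynchronously run command
        # fails and its future is cancelled underneath us
        raise RuntimeError('command was cancelled')
```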
Adam King [Fri, 29 Sep 2023 20:09:48 +0000 (16:09 -0400)]
qa/cephadm: add teuthology test for host draining
This was a gap in our testing in general, but I'm
adding it here right now specifically to use it
to test the "--rm-crush-entry" flag in a follow-up
commit
Adam King [Fri, 29 Sep 2023 18:39:10 +0000 (14:39 -0400)]
mgr/cephadm: add --rm-crush-entry flag to host removal
This will tell cephadm to try to remove the
crush bucket for the host at the end of the host
removal process. If this fails, we still consider the
host as having been successfully removed from
cephadm's POV, but the user will get back an error
message telling them we failed to remove the
host from the crush map.
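A hedged sketch of the intended best-effort behavior (function name and message format are illustrative, not the actual cephadm code; `osd crush remove` is the existing mon command for deleting a CRUSH item):
```
def remove_host(mgr, hostname, rm_crush_entry=False):
    # ... existing host removal steps run first ...
    msg = f'Removed host {hostname}'
    if rm_crush_entry:
        try:
            # best-effort removal of the host's bucket from the CRUSH map
            mgr.check_mon_command({'prefix': 'osd crush remove', 'name': hostname})
        except Exception as e:
            # the host is still considered removed; only report the failure
            msg += f' (failed to remove host from the CRUSH map: {e})'
    return msg
```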
Adam King [Wed, 18 Oct 2023 18:00:05 +0000 (14:00 -0400)]
mgr/cephadm: update timestamp on repeat daemon/service events
If you have a daemon/service event and then an identical
event happens later (e.g. the same daemon is redeployed
multiple times), the events are not updated on the repeat
instances. In cases like this I think it makes more
sense to update the timestamp so users can see the most
recent time the event happened.
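A minimal, hedged sketch of the idea (not the actual cephadm event store): an otherwise-identical repeat event refreshes the stored timestamp so the most recent occurrence is what gets shown.
```
from datetime import datetime, timezone

events = {}   # (entity, level, message) -> timestamp of the most recent occurrence

def add_event(entity, level, message):
    key = (entity, level, message)
    # a repeat of an identical event bumps the timestamp instead of being ignored
    events[key] = datetime.now(timezone.utc)

add_event('osd.0', 'INFO', 'Deployed osd.0 on host1')
add_event('osd.0', 'INFO', 'Deployed osd.0 on host1')   # redeploy: timestamp updated
```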
Ronen Friedman [Mon, 22 May 2023 15:09:28 +0000 (18:09 +0300)]
osd/scrub: increasing max_osd_scrubs to 3
Bug reports seem to hint that the current default value of
'1' is too low: the cluster is susceptible to scrub scheduling
delays and issues stemming from local software/networking/hardware
problems, even if affecting a very small number of OSDs.
Squid will include a major overhaul of the way scrubs are counted
in the cluster, providing a better solution to the problem. For
now, modifying the default is an effective stop-gap measure.
Wei Wang [Mon, 29 Jan 2024 08:26:24 +0000 (08:26 +0000)]
mon: fix health store size growing infinitely
`check_mutes` wrongly sets `changed` to true, triggering `propose_pending` and blocking the subsequent `maybe_trim` logic (`have_pending` will always be false); as a result, the health store is never trimmed.
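A hedged sketch of the intended shape of the logic (illustration only, not the HealthMonitor code): `changed` should be set only when a mute actually changes, so a proposal is made only when needed and trimming can proceed otherwise.
```
import time

def check_mutes(mutes):
    # return True only if some mute actually expired or changed;
    # unconditionally returning True keeps proposing pending state
    # and prevents the trim path from ever running
    changed = False
    now = time.time()
    for code, expires in list(mutes.items()):
        if expires and expires <= now:
            del mutes[code]
            changed = True
    return changed
```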