Nizamudeen A [Tue, 19 Mar 2024 14:57:13 +0000 (20:27 +0530)]
mgr/dashboard: rm warning/error threshold for cpu usage
For multi-core CPUs the value can be more than 100%, so it doesn't make
sense to show a warning/error when the usage is at or above 100%.
Hence removing it.
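A minimal sketch of why a fixed 100% threshold misfires, assuming the metric is a sum of per-core utilisation. The function name and sample values are hypothetical and not taken from the dashboard code:

```python
def total_cpu_percent(per_core_percent):
    """Sum per-core utilisation; each core can contribute up to 100%."""
    return sum(per_core_percent)


# Four cores, each only moderately busy (hypothetical sample values).
usage = total_cpu_percent([60.0, 55.0, 70.0, 40.0])

# A threshold at 100% would flag this host even though no core is saturated.
exceeds_fixed_threshold = usage >= 100.0
```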
Ivo Almeida [Wed, 21 Feb 2024 13:02:19 +0000 (13:02 +0000)]
mgr/dashboard: fix retention add for subvolume
- Added parameters for subvolume and subvolume group when adding a new
snap schedule.
- Added call to remove retention policies when removing a snap schedule
in case it is the last one with same path
Fixes: https://tracker.ceph.com/issues/64524
Signed-off-by: Ivo Almeida <ialmeida@redhat.com>
(cherry picked from commit 80e1207f4b536fe6edbc81e61cbf951e135eba54)
Adam King [Wed, 13 Mar 2024 19:30:25 +0000 (15:30 -0400)]
mgr/cephadm: refresh public_network for config checks before checking
The place it was being run before meant it would only grab the
public_network setting once at startup of the module. This meant
if a user changed the setting, which they are likely to do if they
get the warning, cephadm would ignore the change and continue
reporting that the hosts don't match up with the old setting
for the public_network. This moves the call to refresh the
setting to right before we actually run the checks. It does
mean we'll do the `ceph config dump --format json` call
each serve loop iteration, but I've found that only tends
to take a few milliseconds, which is nothing compared to
the time to refresh other things we check during the serve
loop.
I additionally modified the use of this option to use
the attribute on the mgr, rather than calling
`get_module_option`. This was just to get it more in
line with how we tend to handle other config options
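The ordering change can be sketched roughly as follows. This is an illustration only: `ConfigChecker`, `fetch_public_network` and `run_checks` are hypothetical names, not the cephadm module's actual API.

```python
class ConfigChecker:
    """Hypothetical stand-in for a module that checks hosts against public_network."""

    def __init__(self, fetch_public_network):
        # Assumed callable that returns the currently configured value.
        self._fetch = fetch_public_network
        self.public_network = None

    def refresh_public_network(self):
        self.public_network = self._fetch()

    def run_checks(self):
        # Refresh right before checking, so a changed setting is picked up
        # on the next loop iteration instead of only at startup.
        self.refresh_public_network()
        return self.public_network


settings = {"public_network": "10.0.0.0/24"}
checker = ConfigChecker(lambda: settings["public_network"])

first = checker.run_checks()
settings["public_network"] = "192.168.1.0/24"  # user changes the option
second = checker.run_checks()
```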
Florent Carli [Tue, 12 Mar 2024 17:31:16 +0000 (18:31 +0100)]
cephadm.py: add timemaster to timesync services list
On Debian/Ubuntu, if you need PTP, it's possible to use the linuxptp package for time synchronization.
In that case the systemd service is called timemaster and is a wrapper for chrony/ntpd/phc2sys/ptp4l.
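A rough sketch of the kind of check this affects, assuming the list is a set of systemd unit names tested against the active units. The unit names and the helper below are illustrative; the real list in cephadm may differ.

```python
# Illustrative unit names only; not copied from cephadm.
TIMESYNC_UNITS = [
    "chrony.service",
    "chronyd.service",
    "systemd-timesyncd.service",
    "ntpd.service",
    "ntp.service",
    "timemaster.service",  # the addition described in this commit
]


def has_time_sync(active_units):
    """Return True if any known time sync unit is among the active units."""
    return any(unit in active_units for unit in TIMESYNC_UNITS)
```

With such a list, a host running only timemaster would pass the check instead of being reported as having no time synchronization.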
where the networks is set and the
"only_bind_port_on_networks" option is
set to true, the grafana daemon will bind
to its port (3000 in this case since it's
the default and I didn't set a port) only
on an IP from that network. I tested this
by holding port 3000 on an IP from a different
network on the host and then deploying
grafana. Without this patch it would have
failed with a port conflict error.
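One way to picture the selection of a bind address is the sketch below. It assumes the host's IPs and the spec's networks are available as strings; `pick_bind_ip` is a hypothetical helper, not the function cephadm uses.

```python
import ipaddress


def pick_bind_ip(host_ips, networks):
    """Return the first host IP that falls inside one of the given networks."""
    nets = [ipaddress.ip_network(n) for n in networks]
    for ip in host_ips:
        addr = ipaddress.ip_address(ip)
        if any(addr in net for net in nets):
            return ip
    return None  # no IP on the requested networks


# Port 3000 may already be held on 192.168.1.5; binding only on the
# 10.0.0.0/24 address avoids the conflict.
bind_ip = pick_bind_ip(["192.168.1.5", "10.0.0.7"], ["10.0.0.0/24"])
```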
Nizamudeen A [Wed, 18 Oct 2023 06:38:21 +0000 (12:08 +0530)]
mgr/dashboard: support rgw roles updating
Right now only the modification of max_session_duration is supported via
the roles update command. To update, we need to use the `policy modify`
command, which is not added in this PR. That should be done separately.
Adam King [Fri, 1 Mar 2024 18:22:44 +0000 (13:22 -0500)]
cephadm: improve cephadm pull usage message
Generally, it's uncommon for users to run this
directly, but in case they need to for debugging
purposes, we should include how to pass the
image to be pulled in the usage message.
Additionally, include that this is only to be used
for pulling ceph images in the help message, as
that isn't necessarily clear. Pulling anything
else will result in a traceback as it tries
to run `ceph --version` inside the container.
cephadm: adjust the ingress ha proxy health check interval
Currently the health checker uses the default interval of 2s, so it sends a
list bucket request every 2s. This seems too frequent and should be
adjustable. Hence introducing a new setting, health_check_interval, in
the ingress spec for haproxy.
Signed-off-by: Jiffin Tony Thottan <thottanjiffin@gmail.com>
Apply suggestions from code review
Co-authored-by: Adam King <47704447+adk3798@users.noreply.github.com>
Signed-off-by: Jiffin Tony Thottan <thottanjiffin@gmail.com>
(cherry picked from commit 75327c5b56591c6a29ad47745df24d16320f5a99)
Ramana Raja [Thu, 29 Feb 2024 17:12:19 +0000 (12:12 -0500)]
qa/suites: add diff-continuous and compare-mirror-image tests
... to rbd and krbd suites respectively.
This allows the compare-mirror-image tests introduced in ea3a567
to be run against various kernel branches, e.g., testing branch.
And allows diff_continuous test in rbd_suite to run against distro
kernel.
After some tests, it turns out that depending on the hardware,
the 'Location' header returned by the server after login can be different.
I noticed the following:
the endpoint passed down to util.query() is wrong:
it passes the full url (scheme://addr:port/path) where it should only
pass the path. The cause is that RedFishClient.login() basically stores
the value of the Location header in `self.location`.
The consequence is that the client is unable to log out properly.
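A minimal sketch of normalizing the header value, assuming the fix amounts to keeping only the path component. `endpoint_path` and the sample URL are hypothetical; this is not the actual patch.

```python
from urllib.parse import urlparse


def endpoint_path(location):
    """Return only the path, whether given a full URL or a bare path."""
    return urlparse(location).path


# Some hardware returns a full URL, other hardware only a path.
from_full_url = endpoint_path(
    "https://10.0.0.1:443/redfish/v1/SessionService/Sessions/1"
)
from_bare_path = endpoint_path("/redfish/v1/SessionService/Sessions/1")
```

Both forms reduce to the same path, so the logout request can be built the same way regardless of what the server sent.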
and it placed the node-exporter daemons on vm-00
and vm-02 but not vm-01. Obviously there are more
advanced scenarios that justify this than listing
two hosts, but using "|" as an OR like that is an
example of something you can't do with the fnmatch
version of the host pattern
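The difference can be reproduced with Python's standard library alone. This sketch uses the host names from the example above; how the orchestrator applies the pattern internally is not shown here.

```python
import fnmatch
import re

hosts = ["vm-00", "vm-01", "vm-02"]
pattern = "vm-00|vm-02"

# fnmatch treats "|" as a literal character, so nothing matches.
glob_matches = [h for h in hosts if fnmatch.fnmatchcase(h, pattern)]

# As a regular expression, "|" is alternation.
regex_matches = [h for h in hosts if re.fullmatch(pattern, h)]
```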
Nizamudeen A [Thu, 7 Mar 2024 08:43:54 +0000 (14:13 +0530)]
mgr/dashboard: disable applitools e2e
Temporarily disabling this so the CI can turn green. Meanwhile I'll
research a proper way to handle the applitools e2e tests, which I'll track
at https://tracker.ceph.com/issues/64783
Ilya Dryomov [Wed, 28 Feb 2024 13:20:16 +0000 (14:20 +0100)]
librbd: don't clip expanded diff on truncate in ObjectListSnapsRequest
If the diff was expanded due to LIST_SNAPS_FLAG_WHOLE_OBJECT, clipping
it when handling a truncate is wrong -- when subtracting that interval,
we either split the expanded extent into two or chop off a piece of it.
However the point of LIST_SNAPS_FLAG_WHOLE_OBJECT is to report a single
extent covering the entire object.
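To illustrate the split-or-chop effect, here is a small sketch of interval subtraction on half-open (start, end) tuples. It is a simplification for illustration and does not use librbd's interval types; `subtract` is a hypothetical helper.

```python
def subtract(extent, hole):
    """Subtract half-open interval `hole` from half-open interval `extent`."""
    (start, end), (h_start, h_end) = extent, hole
    pieces = []
    if h_start > start:
        # Part of the extent survives before the hole.
        pieces.append((start, min(end, h_start)))
    if h_end < end:
        # Part of the extent survives after the hole.
        pieces.append((max(start, h_end), end))
    return pieces


whole_object = (0, 100)
split = subtract(whole_object, (40, 60))     # hole in the middle
chopped = subtract(whole_object, (60, 100))  # hole at the tail
```

Either result no longer covers the entire object as a single extent, which is what the flag is meant to guarantee.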
Ilya Dryomov [Sun, 18 Feb 2024 10:46:15 +0000 (11:46 +0100)]
librados/snap_set_diff: ignore truncates above size at start
Because currently calc_snap_set_diff() only ever appends to the running
diff, an excessive (either too large or completely bogus) zero extent
is reported in cases where an object is first expanded (with a snapshot
taken at that point) and then truncated but still above the size of the
object as of the starting snapshot.
Venky Shankar [Mon, 4 Mar 2024 13:23:53 +0000 (18:53 +0530)]
mds: disable `defer_client_eviction_on_laggy_osds` by default
This config can result in a single client holding up the MDS from servicing
other clients: once a client is deferred from eviction due to
laggy OSD(s), a new client's cap acquire request can be
blocked until the other laggy client resumes operation, i.e., when
the laggy OSD is no longer considered laggy.
Disable the config by default till the issue is fixed.
Ramana Raja [Thu, 25 May 2023 16:48:12 +0000 (16:48 +0000)]
qa: Add tests to validate syncing of images using rbd-mirror
Introduce functional tests to validate that the images under
workloads are correctly mirrored between two clusters using snapshot
based mirroring.
Run workload on a primary image using a krbd or nbd client. Take
mirror snapshots of the image under workload. Unmount the mapped image
and calculate its MD5 checksum before demoting it. After demotion,
wait for the mirror status of the image to be 'up+unknown' in both
the clusters. This is to make sure that the non-primary image in the
other cluster is ready to be promoted. Now promote the non-primary
image in the other cluster. Map the promoted image and calculate its
MD5 checksum. Verify that the checksums of the demoted and promoted
images in the two clusters are the same.
The above test is run as part of two different workunits:
- a workunit that validates the syncing of multiple mirrored images
with workloads running on them
- another workunit that validates the syncing of a single mirrored
image with workload running on it and the image is set as primary
alternatively between the two clusters, as it happens during
failover and failback scenarios.
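The final verification step, comparing checksums of the demoted and promoted images, can be sketched in Python as below. The workunit itself is a shell script; the helper names and the idea of reading the mapped devices as files are assumptions made for illustration.

```python
import hashlib


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file or block device, reading in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def images_match(demoted_path, promoted_path):
    """Return True if both images have identical content by MD5."""
    return md5_of(demoted_path) == md5_of(promoted_path)
```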
Fixes: https://tracker.ceph.com/issues/61617
Signed-off-by: Ramana Raja <rraja@redhat.com>
Co-authored-by: Ilya Dryomov <idryomov@redhat.com>
Co-authored-by: Christopher Hoffman <choffman@redhat.com>
(cherry picked from commit b7aae5c3c5a1dd24c4cb7ceb499292af00bae680)
Cherry-pick notes:
- In qa/workunits/rbd/compare_mirror_images.sh, replace
`wait_for_replaying_status_in_pool_dir` with `wait_for_status_in_pool_dir`.
Commit 3fd8a03, which added `wait_for_replaying_status_in_pool_dir`,
was not backported.
- Adds support to set bucket policies through the Dashboard.
- Rename rgw bucket policy from 'policy' to 'bucket policy' and tab 'Permissions' to 'Policies'
- Fix: hide Tags when none are present on bucket list details and set the bucket form dirty after deleting a tag
- Added service to manage the formatting of a textArea that works with json
Signed-off-by: Pedro Gonzalez Gomez <pegonzal@redhat.com>
Fixes: https://tracker.ceph.com/issues/63942
(cherry picked from commit 2817d8e25d84bba47951bd68cb3e8651cdb51b56)