Patrick Donnelly [Tue, 27 Aug 2024 17:10:54 +0000 (13:10 -0400)]
Merge PR #58419 into main
* refs/pull/58419/head:
mds: generate correct path for unlinked snapped files
qa: add test for cephx path check on unlinked snapped dir tree
mds: add debugging for stray_prior_path
Casey Bodley [Fri, 23 Aug 2024 19:03:31 +0000 (15:03 -0400)]
rgw: ignore zoneless default realm when not configured
"default" zone/zonegroup deployments without a realm can be broken by
the creation of an unrelated realm, because that realm is (was)
automatically set as the default
when startup detects an incomplete default realm (one that doesn't have
a default zone), fall back to the realmless "default" zone/zonegroup
instead
Nitzan Mordechai [Mon, 10 Jun 2024 10:51:03 +0000 (10:51 +0000)]
crimson: Add support for bench osd command
this commit adds support for the 'bench' admin command in the OSD,
allowing administrators to perform benchmark tests on the OSD. The
'bench' command accepts 4 optional parameters with the following
default values:
1. count - Total number of bytes to write (default: 1GB).
2. size - Block size for each write operation (default: 4MB).
3. object_size - Size of each object to write (default: 0).
4. object_num - Number of objects to write (default: 0).
The results of the benchmark are returned in a JSON formatted output,
which includes the following fields:
1. bytes_written - Total number of bytes written during the benchmark.
2. blocksize - Block size used for each write operation.
3. elapsed_sec - Total time taken to complete the benchmark in seconds.
4. bytes_per_sec - Write throughput in bytes per second.
5. iops - Number of input/output operations per second.
Ronen Friedman [Sat, 17 Aug 2024 16:08:19 +0000 (11:08 -0500)]
osd/scrub: delay both targets on some failures
If the failure of a scrub-job is due to a condition that affects
both targets, both should be delayed. Otherwise, we may end up
with the following bogus scenario:
A high priority deep target is scheduled, but scrub session initiation
fails due to, for example, a concurrent snap trim. The deep target
will be delayed. A second initiation attempt may happen after the
snap trimming is done, but before the updated deep target not-before.
As a result - the lower priority target will be scheduled before the
higher priority one - which is a bug.
Ronen Friedman [Thu, 15 Aug 2024 13:17:48 +0000 (08:17 -0500)]
osd/scrub: reverse OSDRestrictions flags polarity
As most of the flags in OSDRestrictions are of 'true is bad' polarity,
reverse the two non-conforming flags - cpu load and time-of-day
restrictions - to match.
This flag was used to indicate that a deep scrub should
be performed if a shallow scrub finds an error. It was
always set true for shallow, regular, scrubs - if
can_autorepair flag was set. Thus, the ephemeral flag in
the requested_scrub_t object is not really needed.
Ronen Friedman [Tue, 6 Aug 2024 13:07:17 +0000 (08:07 -0500)]
qa/standalone/scrub: disable scrub_extended_sleep test
Disabling osd-scrub-test.sh::TEST_scrub_extended_sleep,
as the test is no longer valid (updated code no longer
produces the same logs or the same behavior).
osd/scrub: OSD's scrub queue now holds SchedEntry-s
The OSD's scrub queue now holds SchedEntry-s, instead of ScrubJob-s.
The queue itself is implemented using the 'not_before_queue_t' class.
Note: this is not a stable state of the scrubber code. In the next
commits:
- modifying the way sched targets are modified and updated, to match the
new queue implementation.
- removing the 'planned scrub' flags.
Important note: the interaction of initiate_scrub() and pop_ready_pg()
is not changed by this commit. Namely:
Currently - pop..() loops over all eligible jobs, until it finds one
that matches the environment restrictions (which most of the time, as the
concurrency limit is usually reached, would be 'high-priority-only').
The other option is to maintain Sam's 'not_before_q' clean interface: we
always pop the top, and if that top fails the preconds tests - we delay and
re-push. This has the following troubling implications:
- it would take a long time to find a viable scrub job, if the problem
is related to, for example, 'no scrub'.
- local resources failure (inc_scrubs() failure) must be handles
separately, as we do not want to reshuffle the queue for this
very very common case.
- but the real problem: unneeded shuffling of the queue, even as the
problem is not with the scrub job itself, but with the environment
(esp. no-scrub etc.).
This is a common case, and it would be wrong to reshuffle the queue
for that.
- and - remember that any change to a sched-entry must be done under PG
lock.
osd/scrub: modify ScrubJob to hold two SchedTarget-s
ScrubJob will now hold two SchedTarget-s - two sets of scheduling
information (times, levels, etc.) for the next shallow and deep scrubs.
This is in preparation for the upcoming changes to the scheduling queue.
The change cannot stand on its own, as the partial implementation
creates some inconsistencies in the scheduling logic.
Specifically, here is what changes here, and how it differs from the
desired implementation:
- The OSD still maintains a queue of scrub jobs - one object only per
PG.
But now - each queue element holds two SchedTarget-s.
- When a scrub is initiated, the Scrubber is handed a ScrubJob object.
Only in the next commit will it also receive the ID of the selected
level. That causes some issues when re-determining the level of the
initiated scrub. A failure to match the queue "intent" results in
failures.
- the 'planned scrub' flags are still here, instead of directly
encoding the characteristics of the next scrub in the relevant
sched-entry.
- the 'urgency' levels do not cover the full required range of
behaviors and priorities.
Ilya Dryomov [Sun, 25 Aug 2024 11:22:08 +0000 (13:22 +0200)]
qa: drop XMLSTARLET variable, use xmlstarlet directly
The variable was added in commit 9b6b7c35d03f ("Handle
differently-named xmlstarlet binary for *suse") but this
compatibility business is long outdated:
Mon Oct 13 08:52:37 UTC 2014 - toms@opensuse.org
- SPEC file changes
- Added link from /usr/bin/xml to /usr/bin/xmlstarlet as other
distributions do the same
- Did the same for the manpage
Ilya Dryomov [Fri, 23 Aug 2024 21:00:24 +0000 (23:00 +0200)]
rbd: "rbd bench" always writes the same byte
It's expected that the buffer is filled with the same byte, but the
byte should differ from run to run:
memset(bp.c_str(), rand() & 0xff, io_size);
This was broken in commit c7f71d14a5d3 ("rbd: migrated existing command
logic to new namespaces") which inadvertently moved the call to srand(),
leaving rand() unseeded for the above memset().
Ilya Dryomov [Fri, 16 Aug 2024 17:09:39 +0000 (19:09 +0200)]
librbd/migration: add external clusters support
This commit extends NativeFormat (aka migration where the migration
source is an RBD image) to support external Ceph clusters, limited to
import-only mode.
Co-authored-by: Or Ozeri <oro@il.ibm.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Venky Shankar [Thu, 22 Aug 2024 09:22:33 +0000 (14:52 +0530)]
Merge PR #56816 into main
* refs/pull/56816/head:
doc: mention the peer status failed when snapshot created on the remote filesystem.
qa: add test_cephfs_mirror_remote_snap_corrupt_fails_synced_snapshot
cephfs_mirror: update peer status for invalid metadata in remote snapshot
Reviewed-by: Venky Shankar <vshankar@redhat.com> Reviewed-by: Anthony D Atri <anthony.datri@gmail.com>
osd/scrub: introducing the concept of a SchedEntry
SchedEntry holds the scheduling details for scrubbing a specific PG at
a specific scrub level. Namely - it identifies the [pg,level]
combination, the 'urgency' attribute of the scheduled scrub
(which determines most of its behavior and scheduling decisions)
and the actual time attributes for scheduling (target,
deadline, not_before).
Added a table detailing, for each type of scrub, what limitations apply
to it, and what restrictions are waived.
The following commits will reshape the ScrubJob objects to hold
two instances of SchedTarget-s - two wrappers around SchedEntry-s,
one for the next shallow scrub and one for the next deep scrub.
Sched-entries (wrapped in sched-targets) have a defined order:
For ready-to-scrub entries (those that have an n.b. in the past),
the order is first by urgency, then by target time (and then by
level - deep before shallow - and then by the n.b. itself).
'Future' entries are ordered by n.b., then urgency,
target time, and level.
John Mulligan [Mon, 12 Aug 2024 14:56:51 +0000 (10:56 -0400)]
mgr/cephadm: enable the smb service to prevent stray ctdb services
Tell cephadm that any `ctdb` services are "owned" by the smb service
and should be ignored as not a stray.
Ideally, we do this on a per service basis but the info that the ctdb
lock helper provides to its registration function is pretty generic.
Future versions of samba may improve upon this.
Signed-off-by: John Mulligan <phlogistonjohn@asynchrono.us>
John Mulligan [Mon, 12 Aug 2024 14:56:36 +0000 (10:56 -0400)]
mgr/cephadm: extend stray service detection with a general ignore hook
Extend the system's current stray service detection with a new method on
the service classes so that new classes can hook into the stray services
in the case that ceph services and cephadm services have differing names
or use subsystems that call into ceph with different names (my use
case).
Signed-off-by: John Mulligan <phlogistonjohn@asynchrono.us>