John Mulligan [Wed, 21 Aug 2024 15:31:52 +0000 (11:31 -0400)]
python-common/deployment: add a cluster public ip spec for smb
This spec can be used to define one or more public addresses that will
be automatically assigned to hosts by CTDB. The address can be specified
in the "interface" form - an address plus prefix length. Optionally,
networks to bind to can be specified. The network value will be
converted to a network device name later by cephadm.
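For illustration only, such an address entry might be modeled roughly like this (class and field names are illustrative, not the exact spec added by this commit):

    from ipaddress import ip_interface, ip_network
    from typing import List, Optional


    class ClusterPublicIPSpec:
        """Illustrative public-address entry: an address in 'interface' form
        (address plus prefix length) plus optional networks to bind to."""

        def __init__(self, address: str, destination: Optional[List[str]] = None):
            self.address = address                 # e.g. "192.168.4.51/24"
            self.destination = destination or []   # e.g. ["192.168.4.0/24"]

        def validate(self) -> None:
            ip_interface(self.address)     # raises ValueError if malformed
            for net in self.destination:
                ip_network(net)            # raises ValueError if malformed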
Signed-off-by: John Mulligan <jmulligan@redhat.com>
John Mulligan [Wed, 21 Aug 2024 21:03:40 +0000 (17:03 -0400)]
cephadm: add support for cluster public ip addresses to smb daemon
When a list of public addresses (and optional network destination(s))
is supplied at deploy time, convert the networks to device names
and pass that result to the sambacc ctdb configuration.
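A rough sketch of the network-to-device-name conversion described above (the shape of the host-networks mapping is an assumption; cephadm's real cached data differs in detail):

    import ipaddress
    from typing import Dict, List


    def devices_for_network(
        network: str,
        host_networks: Dict[str, Dict[str, List[str]]],
    ) -> List[str]:
        """Return device names whose addresses fall inside `network`.
        `host_networks` is assumed to map device name -> {"addrs": [...]}."""
        net = ipaddress.ip_network(network)
        devices = []
        for dev, info in host_networks.items():
            for addr in info.get("addrs", []):
                if ipaddress.ip_address(addr) in net:
                    devices.append(dev)
                    break
        return devices

    # e.g. {"eth0": {"addrs": ["192.168.4.10"]}} -> ["eth0"] for "192.168.4.0/24"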
Signed-off-by: John Mulligan <jmulligan@redhat.com>
John Mulligan [Wed, 21 Aug 2024 21:03:19 +0000 (17:03 -0400)]
mgr/smb: simplify orch backend enablement
We have a developer/debug module option that allows one to disable
triggering orchestration. When I tried to use it I thought it was
buggy and I had trouble diagnosing it. The mistake was on my side,
but the code change makes it much clearer what is being enabled
so I want to keep it.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
Ronen Friedman [Sat, 17 Aug 2024 16:08:19 +0000 (11:08 -0500)]
osd/scrub: delay both targets on some failures
If the failure of a scrub-job is due to a condition that affects
both targets, both should be delayed. Otherwise, we may end up
with the following bogus scenario:
A high-priority deep target is scheduled, but scrub session initiation
fails due to, for example, a concurrent snap trim. The deep target
will be delayed. A second initiation attempt may happen after the
snap trimming is done, but before the updated not-before of the deep
target. As a result, the lower-priority target will be scheduled before
the higher-priority one - which is a bug.
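Sketched in Python for illustration only (the scrubber itself is C++), the fix amounts to pushing out the not-before of both targets when the failure condition affects both scrub levels:

    from datetime import datetime, timedelta


    def delay_on_failure(targets, delay, affects_both_levels):
        """Push out the not_before of the failed target - or of both targets,
        when the failure condition (e.g. a concurrent snap trim) affects both
        scrub levels - so the lower-priority target cannot jump ahead."""
        now = datetime.now()
        for t in targets:   # t: {"failed": bool, "not_before": datetime}
            if affects_both_levels or t["failed"]:
                t["not_before"] = max(t["not_before"], now) + delay

    # e.g. delay_on_failure([deep_target, shallow_target], timedelta(minutes=5), True)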
Ronen Friedman [Thu, 15 Aug 2024 13:17:48 +0000 (08:17 -0500)]
osd/scrub: reverse OSDRestrictions flags polarity
As most of the flags in OSDRestrictions are of 'true is bad' polarity,
reverse the two non-conforming flags - CPU load and time-of-day
restrictions - to match.
This flag was used to indicate that a deep scrub should
be performed if a shallow scrub finds an error. It was
always set to true for regular shallow scrubs if the
can_autorepair flag was set. Thus, the ephemeral flag in
the requested_scrub_t object is not really needed.
Ronen Friedman [Tue, 6 Aug 2024 13:07:17 +0000 (08:07 -0500)]
qa/standalone/scrub: disable scrub_extended_sleep test
Disabling osd-scrub-test.sh::TEST_scrub_extended_sleep,
as the test is no longer valid (updated code no longer
produces the same logs or the same behavior).
osd/scrub: OSD's scrub queue now holds SchedEntry-s
The OSD's scrub queue now holds SchedEntry-s, instead of ScrubJob-s.
The queue itself is implemented using the 'not_before_queue_t' class.
Note: this is not a stable state of the scrubber code. In the next
commits:
- changing how sched targets are modified and updated, to match the
new queue implementation.
- removing the 'planned scrub' flags.
Important note: the interaction of initiate_scrub() and pop_ready_pg()
is not changed by this commit. Namely:
Currently - pop..() loops over all eligible jobs, until it finds one
that matches the environment restrictions (which most of the time, as the
concurrency limit is usually reached, would be 'high-priority-only').
The other option is to maintain Sam's 'not_before_q' clean interface: we
always pop the top, and if that top fails the preconds tests - we delay and
re-push. This has the following troubling implications:
- it would take a long time to find a viable scrub job if the problem
is related to, for example, 'no scrub'.
- a local resources failure (inc_scrubs() failure) must be handled
separately, as we do not want to reshuffle the queue for this
very common case.
- but the real problem: unneeded shuffling of the queue, even though the
problem is not with the scrub job itself but with the environment
(esp. no-scrub etc.).
This is a common case, and it would be wrong to reshuffle the queue
for that.
- and remember that any change to a sched-entry must be done under PG
lock.
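For illustration (Python pseudocode; the actual code is C++, and the queue/entry names here are assumptions), the retained behavior scans the ready entries without reshuffling the queue:

    def pop_ready_entry(queue, preconds_ok):
        """Scan the ready entries (not_before already in the past) in order and
        return the first one that passes the environment restrictions. Entries
        blocked only by the environment (no-scrub flags, concurrency limits)
        stay where they are - the queue is not reshuffled for them."""
        for entry in queue.ready_entries():   # already ordered by urgency etc.
            if preconds_ok(entry):
                queue.remove(entry)
                return entry
        return None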
osd/scrub: modify ScrubJob to hold two SchedTarget-s
ScrubJob will now hold two SchedTarget-s - two sets of scheduling
information (times, levels, etc.) for the next shallow and deep scrubs.
This is in preparation for the upcoming changes to the scheduling queue.
The change cannot stand on its own, as the partial implementation
creates some inconsistencies in the scheduling logic.
Specifically, here is what changes here, and how it differs from the
desired implementation:
- The OSD still maintains a queue of scrub jobs - only one object per
PG. But now each queue element holds two SchedTarget-s.
- When a scrub is initiated, the Scrubber is handed a ScrubJob object.
Only in the next commit will it also receive the ID of the selected
level. That causes some issues when re-determining the level of the
initiated scrub. A failure to match the queue "intent" results in
failures.
- the 'planned scrub' flags are still here, instead of directly
encoding the characteristics of the next scrub in the relevant
sched-entry.
- the 'urgency' levels do not cover the full required range of
behaviors and priorities.
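A schematic of the new per-PG shape, in Python for illustration only (the real types are C++, and the SchedTarget wrappers around SchedEntry are collapsed here for brevity):

    from dataclasses import dataclass
    from datetime import datetime


    @dataclass
    class SchedEntry:
        pgid: int
        level: str                          # "shallow" or "deep"
        urgency: int = 0
        target: datetime = datetime.max
        not_before: datetime = datetime.max


    class ScrubJob:
        """One queue element per PG, now carrying two scheduling targets."""

        def __init__(self, pgid: int):
            self.pgid = pgid
            self.shallow_target = SchedEntry(pgid, "shallow")
            self.deep_target = SchedEntry(pgid, "deep")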
Ilya Dryomov [Fri, 16 Aug 2024 17:09:39 +0000 (19:09 +0200)]
librbd/migration: add external clusters support
This commit extends NativeFormat (aka migration where the migration
source is an RBD image) to support external Ceph clusters, limited to
import-only mode.
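The native-format migration source spec is JSON; a rough illustration of what an external-cluster spec might contain follows (the external-cluster key names are assumptions, not confirmed by this commit; see the RBD live-migration docs for the exact format):

    import json

    source_spec = {
        "type": "native",
        "cluster_name": "remote",        # assumed key for the external cluster
        "client_name": "client.admin",   # assumed key for the auth identity
        "pool_name": "rbd",
        "image_name": "src-image",
        "snap_name": "snap1",
    }
    print(json.dumps(source_spec))
    # then e.g.: rbd migration prepare --import-only \
    #     --source-spec-path spec.json dst-pool/dst-image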
Co-authored-by: Or Ozeri <oro@il.ibm.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Venky Shankar [Thu, 22 Aug 2024 09:22:33 +0000 (14:52 +0530)]
Merge PR #56816 into main
* refs/pull/56816/head:
doc: mention the peer status failed when snapshot created on the remote filesystem.
qa: add test_cephfs_mirror_remote_snap_corrupt_fails_synced_snapshot
cephfs_mirror: update peer status for invalid metadata in remote snapshot
Reviewed-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Anthony D Atri <anthony.datri@gmail.com>
osd/scrub: introducing the concept of a SchedEntry
SchedEntry holds the scheduling details for scrubbing a specific PG at
a specific scrub level. Namely - it identifies the [pg,level]
combination, the 'urgency' attribute of the scheduled scrub
(which determines most of its behavior and scheduling decisions)
and the actual time attributes for scheduling (target,
deadline, not_before).
Added a table detailing, for each type of scrub, what limitations apply
to it, and what restrictions are waived.
The following commits will reshape the ScrubJob objects to hold
two instances of SchedTarget-s - two wrappers around SchedEntry-s,
one for the next shallow scrub and one for the next deep scrub.
Sched-entries (wrapped in sched-targets) have a defined order:
For ready-to-scrub entries (those that have an n.b. in the past),
the order is first by urgency, then by target time (and then by
level - deep before shallow - and then by the n.b. itself).
'Future' entries are ordered by n.b., then urgency,
target time, and level.
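Expressed as Python sort keys for illustration (the real comparison is C++; the urgency polarity - higher sorts first - is an assumption here):

    def ready_sort_key(e):
        # n.b. in the past: urgency first, then target time,
        # deep before shallow, then the not_before itself
        return (-e.urgency, e.target, e.level != "deep", e.not_before)


    def future_sort_key(e):
        # n.b. still in the future: not_before first, then the same criteria
        return (e.not_before, -e.urgency, e.target, e.level != "deep")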
John Mulligan [Mon, 12 Aug 2024 14:56:51 +0000 (10:56 -0400)]
mgr/cephadm: enable the smb service to prevent stray ctdb services
Tell cephadm that any `ctdb` services are "owned" by the smb service
and should not be flagged as strays.
Ideally, we would do this on a per-service basis, but the info that the
ctdb lock helper provides to its registration function is pretty generic.
Future versions of samba may improve upon this.
Signed-off-by: John Mulligan <phlogistonjohn@asynchrono.us>
John Mulligan [Mon, 12 Aug 2024 14:56:36 +0000 (10:56 -0400)]
mgr/cephadm: extend stray service detection with a general ignore hook
Extend the system's current stray service detection with a new method on
the service classes, so that service classes can hook into stray detection
when ceph services and cephadm services have differing names, or when
they use subsystems that call into ceph with different names (my use
case).
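A rough sketch of the kind of hook described (method name and signature are hypothetical, not the exact cephadm API):

    class CephadmService:
        def ignore_possible_stray(self, daemon_type: str, daemon_id: str) -> bool:
            """Hypothetical hook: return True if this reported daemon should
            not be flagged as a stray, even though its name does not match a
            daemon cephadm deployed."""
            return False


    class SMBService(CephadmService):
        def ignore_possible_stray(self, daemon_type: str, daemon_id: str) -> bool:
            # ctdb registers itself under a generic name; claim it as ours
            return daemon_type == "ctdb"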
Signed-off-by: John Mulligan <phlogistonjohn@asynchrono.us>
John Mulligan [Mon, 15 Jul 2024 19:41:43 +0000 (15:41 -0400)]
mgr/smb: add a cluster resource field to manage clustering
Add a new `clustering` field to the smb cluster resource. This field can
be used to select automatic clustering with ctdb, disable it, or require
it. The default is automatic, based on the count value in the placement
spec: a count of 1 disables clustering, and any other value enables it.
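For illustration, a cluster resource carrying the new field might look like this (shown as a Python dict; the accepted `clustering` values and the surrounding fields are assumptions here, not confirmed by this commit):

    cluster_resource = {
        "resource_type": "ceph.smb.cluster",
        "cluster_id": "mycluster",
        "auth_mode": "user",
        # illustrative values: an automatic/default mode driven by the
        # placement count, plus explicit enable/disable settings
        "clustering": "default",
        "placement": {"count": 3},   # count > 1 => clustering enabled
    }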
Signed-off-by: John Mulligan <jmulligan@redhat.com>
John Mulligan [Thu, 15 Aug 2024 20:40:47 +0000 (16:40 -0400)]
mgr/cephadm: configure ctdb cluster metadata from cephadm smb service
Add support to the smb service module so that cephadm will provide
information about the layout of the smb daemons to the clustermeta
module that, in turn, will provide the information sambacc needs to
configure ctdb.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
John Mulligan [Mon, 15 Jul 2024 19:39:19 +0000 (15:39 -0400)]
mgr/smb: add a python module to help manage the ctdb cluster
Add a new module clustermeta that implements a JSON based interface
compatible with sambacc. This module will be called directly by cephadm
as it places the daemons on the cluster nodes.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
John Mulligan [Mon, 15 Jul 2024 19:22:22 +0000 (15:22 -0400)]
mgr/smb: add support for rados locks to rados store
Add support for using rados object locks to the rados store classes.
Callers directly using the rados store outside the store interface will
be able to make use of locking.
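The underlying librados primitive looks roughly like this (a sketch using the Python rados bindings, not the store classes touched by this commit; the pool and object names are illustrative):

    import rados

    with rados.Rados(conffile="/etc/ceph/ceph.conf") as cluster:
        with cluster.open_ioctx(".smb") as ioctx:
            # take an exclusive advisory lock on an object, then release it
            ioctx.lock_exclusive("cluster.meta", "smb-lock", "cookie-1",
                                 desc="illustrative lock", duration=None)
            try:
                pass  # read/modify/write the object while holding the lock
            finally:
                ioctx.unlock("cluster.meta", "smb-lock", "cookie-1")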
Signed-off-by: John Mulligan <jmulligan@redhat.com>
John Mulligan [Mon, 15 Jul 2024 19:38:12 +0000 (15:38 -0400)]
mgr/cephadm: improve key management of smb service
The clustered mode of a logical smb cluster needs certain additional
capabilities in the rados pool. Improve and reorganize the key
configuration functions, and add the new caps.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
NitzanMordhai [Tue, 28 Nov 2023 09:52:05 +0000 (09:52 +0000)]
mgr/rest: Trim request array and limit size
Presently, the requests array in the REST module has the potential to grow
indefinitely, leading to excessive memory consumption, particularly when
dealing with lengthy and intricate request results.
To address this issue, a limit will be imposed on the requests array within
the REST module.
This limitation will be governed by the `mgr/restful/x/max_requests` configuration
parameter specific to the REST module.
When submit_request() is called, we check whether the requests array
exceeds the max_requests option. If it does, we check whether the request
about to be trimmed has finished, and log an error message in case we are
trimming unfinished requests.
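A simplified sketch of the trimming logic described above (the names, including the is_finished() helper, approximate the REST module's internals and are not exact):

    import logging

    log = logging.getLogger("restful")


    def trim_requests(requests, max_requests):
        """Drop the oldest requests once the array exceeds the limit, logging
        an error if a request that has not finished yet has to be trimmed."""
        while len(requests) > max_requests:
            victim = requests.pop(0)        # oldest request first
            if not victim.is_finished():    # hypothetical helper
                log.error("trimming unfinished request %r", victim)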