Vallari Agrawal [Mon, 26 Aug 2024 04:23:07 +0000 (09:53 +0530)]
qa/tasks/nvmeof.py: add nvmeof gw-group to deployment
The group was made a required parameter of
`ceph orch apply nvmeof <pool> <group>` in
https://github.com/ceph/ceph/pull/58860.
That change broke the `nvmeof` suite, so this PR fixes it.
Right now, all gateways are deployed in a single group.
Later, this will be changed to use multiple groups for better test coverage.
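A minimal sketch of how the deployment command might now be assembled, assuming
a hypothetical helper (the real teuthology task differs):

    # Hypothetical sketch: building the `ceph orch apply nvmeof` command
    # with the now-required gateway group argument ("group0" is made up).
    def nvmeof_apply_cmd(pool, group="group0"):
        # All gateways are currently deployed into this single group.
        return ["ceph", "orch", "apply", "nvmeof", pool, group]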
Ronen Friedman [Sat, 17 Aug 2024 16:08:19 +0000 (11:08 -0500)]
osd/scrub: delay both targets on some failures
If the failure of a scrub job is due to a condition that affects
both targets, both should be delayed. Otherwise, we may end up
with the following bogus scenario:
A high-priority deep target is scheduled, but scrub session initiation
fails due to, for example, a concurrent snap trim. The deep target
is delayed. A second initiation attempt may happen after the
snap trimming is done, but before the updated deep target's not-before.
As a result, the lower-priority target will be scheduled before the
higher-priority one, which is a bug.
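A sketch of the intended behavior, using assumed structure and field names
rather than the scrubber's actual types:

    # Assumed sketch: a failure caused by a PG-wide condition (e.g. a
    # concurrent snap trim) pushes back the not-before of both targets,
    # so neither target can jump ahead of the other on the next attempt.
    def on_initiation_failure(job, failed_target, affects_both, now, delay):
        targets = (job.shallow, job.deep) if affects_both else (failed_target,)
        for t in targets:
            t.not_before = max(t.not_before, now + delay)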
Ronen Friedman [Thu, 15 Aug 2024 13:17:48 +0000 (08:17 -0500)]
osd/scrub: reverse OSDRestrictions flags polarity
As most of the flags in OSDRestrictions are of 'true is bad' polarity,
reverse the two non-conforming flags (CPU load and time-of-day
restrictions) to match.
This flag was used to indicate that a deep scrub should
be performed if a shallow scrub finds an error. It was
always set to true for regular shallow scrubs whenever the
can_autorepair flag was set. Thus, the ephemeral flag in
the requested_scrub_t object is not really needed.
Ronen Friedman [Tue, 6 Aug 2024 13:07:17 +0000 (08:07 -0500)]
qa/standalone/scrub: disable scrub_extended_sleep test
Disabling osd-scrub-test.sh::TEST_scrub_extended_sleep,
as the test is no longer valid (the updated code no longer
produces the same logs or the same behavior).
osd/scrub: OSD's scrub queue now holds SchedEntry-s
The OSD's scrub queue now holds SchedEntry-s, instead of ScrubJob-s.
The queue itself is implemented using the 'not_before_queue_t' class.
Note: this is not a stable state of the scrubber code. The next
commits will:
- modify the way sched targets are created and updated, to match the
  new queue implementation;
- remove the 'planned scrub' flags.
Important note: the interaction of initiate_scrub() and pop_ready_pg()
is not changed by this commit. Namely:
Currently, pop..() loops over all eligible jobs until it finds one
that matches the environment restrictions (which, most of the time, as the
concurrency limit is usually reached, means 'high-priority-only').
The other option is to maintain Sam's 'not_before_q' clean interface: we
always pop the top, and if that top fails the precondition tests, we delay
and re-push it. This has the following troubling implications (see the
sketch after this list):
- it would take a long time to find a viable scrub job if the problem
  is related to, for example, 'no scrub';
- local resource failures (inc_scrubs() failures) must be handled
  separately, as we do not want to reshuffle the queue for this
  very common case;
- but the real problem: needless shuffling of the queue, even when the
  problem is not with the scrub job itself but with the environment
  (esp. no-scrub etc.). This is a common case, and it would be wrong
  to reshuffle the queue for it;
- and remember that any change to a sched-entry must be done under the
  PG lock.
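A comparison of the two strategies as a Python sketch, using illustrative
queue and entry structures rather than the actual scrubber types:

    def pop_scan(queue, env_ok):
        # Current approach: scan the ready entries in order and skip those
        # blocked by the environment. The queue is never reshuffled when
        # the blocker is external (e.g. 'noscrub').
        for entry in queue.ready_entries():
            if env_ok(entry):
                return entry
        return None

    def pop_retry(queue, env_ok, now, delay):
        # Alternative: always pop the top; on failure, delay and re-push.
        # A clean interface, but it reshuffles the queue even when only
        # the environment, not the entry itself, is at fault.
        while (entry := queue.pop_top(now)) is not None:
            if env_ok(entry):
                return entry
            entry.not_before = now + delay
            queue.push(entry)
        return None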
osd/scrub: modify ScrubJob to hold two SchedTarget-s
ScrubJob will now hold two SchedTarget-s - two sets of scheduling
information (times, levels, etc.) for the next shallow and deep scrubs.
This is in preparation for the upcoming changes to the scheduling queue.
The change cannot stand on its own, as the partial implementation
creates some inconsistencies in the scheduling logic.
Specifically, here is what changes here, and how it differs from the
desired implementation:
- The OSD still maintains a queue of scrub jobs, with only one object
  per PG. But now each queue element holds two SchedTarget-s (see the
  sketch below).
- When a scrub is initiated, the Scrubber is handed a ScrubJob object.
  Only in the next commit will it also receive the ID of the selected
  level. That causes some issues when re-determining the level of the
  initiated scrub: a failure to match the queue's "intent" results in
  scrub failures.
- The 'planned scrub' flags are still here, instead of directly
  encoding the characteristics of the next scrub in the relevant
  sched-entry.
- The 'urgency' levels do not cover the full required range of
  behaviors and priorities.
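A sketch of the interim data shape, with assumed field names (not the actual
C++ types):

    # Assumed sketch: one queue element per PG, carrying scheduling
    # information for both the shallow and the deep target.
    from dataclasses import dataclass

    @dataclass
    class SchedTarget:
        level: str          # "shallow" or "deep"
        urgency: int        # determines most scheduling decisions
        target_time: float  # when the scrub was meant to happen
        not_before: float   # earliest time an attempt may be made

    @dataclass
    class ScrubJob:
        pgid: str
        shallow: SchedTarget
        deep: SchedTarget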
Ilya Dryomov [Fri, 16 Aug 2024 17:09:39 +0000 (19:09 +0200)]
librbd/migration: add external clusters support
This commit extends NativeFormat (aka migration where the migration
source is an RBD image) to support external Ceph clusters, limited to
import-only mode.
Co-authored-by: Or Ozeri <oro@il.ibm.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Venky Shankar [Thu, 22 Aug 2024 09:22:33 +0000 (14:52 +0530)]
Merge PR #56816 into main
* refs/pull/56816/head:
doc: mention the peer status failed when snapshot created on the remote filesystem.
qa: add test_cephfs_mirror_remote_snap_corrupt_fails_synced_snapshot
cephfs_mirror: update peer status for invalid metadata in remote snapshot
Reviewed-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Anthony D Atri <anthony.datri@gmail.com>
osd/scrub: introducing the concept of a SchedEntry
SchedEntry holds the scheduling details for scrubbing a specific PG at
a specific scrub level. Namely - it identifies the [pg,level]
combination, the 'urgency' attribute of the scheduled scrub
(which determines most of its behavior and scheduling decisions)
and the actual time attributes for scheduling (target,
deadline, not_before).
Added a table detailing, for each type of scrub, what limitations apply
to it, and what restrictions are waived.
The following commits will reshape the ScrubJob objects to hold
two instances of SchedTarget-s - two wrappers around SchedEntry-s,
one for the next shallow scrub and one for the next deep scrub.
Sched-entries (wrapped in sched-targets) have a defined order:
for ready-to-scrub entries (those whose not-before is in the past),
the order is first by urgency, then by target time, then by
level (deep before shallow), and then by the not-before itself.
'Future' entries are ordered by not-before, then urgency,
target time, and level.
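Expressed as a Python sort key, with assumed field names:

    # Sketch of the described ordering. Ready entries sort before future
    # ones; higher urgency sorts first, and deep sorts before shallow.
    def sched_key(e, now):
        deep_first = 0 if e.level == "deep" else 1
        if e.not_before <= now:  # ready to scrub
            return (0, -e.urgency, e.target_time, deep_first, e.not_before)
        return (1, e.not_before, -e.urgency, e.target_time, deep_first)

    entries.sort(key=lambda e: sched_key(e, now))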
John Mulligan [Mon, 12 Aug 2024 14:56:51 +0000 (10:56 -0400)]
mgr/cephadm: enable the smb service to prevent stray ctdb services
Tell cephadm that any `ctdb` services are "owned" by the smb service
and should not be flagged as stray.
Ideally, we would do this on a per-service basis, but the information
that the ctdb lock helper provides to its registration function is
fairly generic. Future versions of Samba may improve on this.
Signed-off-by: John Mulligan <phlogistonjohn@asynchrono.us>
John Mulligan [Mon, 12 Aug 2024 14:56:36 +0000 (10:56 -0400)]
mgr/cephadm: extend stray service detection with a general ignore hook
Extend the system's current stray-service detection with a new method on
the service classes, so that service classes can hook into stray-service
detection when Ceph services and cephadm services have differing names,
or when they use subsystems that register with Ceph under different
names (my use case).
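A sketch of what such a hook might look like; the method and class names
here are hypothetical, not the actual cephadm interface:

    # Hypothetical sketch of the ignore hook; names are illustrative.
    class CephadmService:
        def ignore_possible_stray(self, daemon_type, daemon_id):
            # Default: claim nothing, so unknown daemons remain "stray".
            return False

    class SMBService(CephadmService):
        def ignore_possible_stray(self, daemon_type, daemon_id):
            # ctdb registers under a generic name, so any `ctdb` daemon
            # is treated as owned by the smb service.
            return daemon_type == "ctdb"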
Signed-off-by: John Mulligan <phlogistonjohn@asynchrono.us>
John Mulligan [Mon, 15 Jul 2024 19:41:43 +0000 (15:41 -0400)]
mgr/smb: add a cluster resource field to manage clustering
Add a new `clustering` field to the smb cluster resource. This field can
be used to select automatic clustering with ctdb, disable it, or require
it. The default is automatic, based on the count value in the placement
spec: a count of 1 disables clustering, and any other value enables it.
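A sketch of the resolution logic; the value names for the `clustering`
field are assumptions, and the actual mgr/smb values may differ:

    # Assumed sketch: map the `clustering` field and the placement count
    # to a ctdb on/off decision.
    def clustering_enabled(clustering, placement_count):
        if clustering == "always":   # hypothetical "require it" value
            return True
        if clustering == "never":    # hypothetical "disable it" value
            return False
        # automatic (the default): a single instance needs no clustering
        return placement_count != 1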
Signed-off-by: John Mulligan <jmulligan@redhat.com>
John Mulligan [Thu, 15 Aug 2024 20:40:47 +0000 (16:40 -0400)]
mgr/cephadm: configure ctdb cluster metadata from cephadm smb service
Add support to the smb service module so that cephadm will provide
information about the layout of the smb daemons to the clustermeta
module, which in turn will provide the information sambacc needs to
configure ctdb.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
John Mulligan [Mon, 15 Jul 2024 19:39:19 +0000 (15:39 -0400)]
mgr/smb: add a python module to help manage the ctdb cluster
Add a new module clustermeta that implements a JSON based interface
compatible with sambacc. This module will be called directly by cephadm
as it places the daemons on the cluster nodes.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
John Mulligan [Mon, 15 Jul 2024 19:22:22 +0000 (15:22 -0400)]
mgr/smb: add support for rados locks to rados store
Add support for using rados object locks to the rados store classes.
Callers using the rados store directly, outside the store interface,
will be able to make use of locking.
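A sketch of object locking with the public python-rados API; the pool,
object, and lock names here are assumptions:

    # Sketch: guard an object update with an exclusive rados lock.
    import rados

    with rados.Rados(conffile="/etc/ceph/ceph.conf") as cluster:
        with cluster.open_ioctx(".smb") as ioctx:  # pool name assumed
            ioctx.lock_exclusive("cluster1.config", "smb-lock", "cookie-1",
                                 desc="guard concurrent config writes")
            try:
                ioctx.write_full("cluster1.config", b"{}")
            finally:
                ioctx.unlock("cluster1.config", "smb-lock", "cookie-1")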
Signed-off-by: John Mulligan <jmulligan@redhat.com>
John Mulligan [Mon, 15 Jul 2024 19:38:12 +0000 (15:38 -0400)]
mgr/cephadm: improve key management of smb service
The clustered mode of a logical smb cluster needs certain additional
capabilities in the rados pool. Improve and reorganize the key
configuration functions, and add the new caps.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
NitzanMordhai [Tue, 28 Nov 2023 09:52:05 +0000 (09:52 +0000)]
mgr/rest: Trim request array and limit size
Presently, the requests array in the REST module has the potential to grow
indefinitely, leading to excessive memory consumption, particularly when
dealing with lengthy and intricate request results.
To address this issue, a limit will be imposed on the requests array within
the REST module.
This limitation will be governed by the `mgr/restful/x/max_requests` configuration
parameter specific to the REST module.
When submit_request() is called, we check whether the requests array
exceeds the max_requests option. If it does, we check whether the
request about to be trimmed has finished, and log an error message if
we are trimming an unfinished request.
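A sketch of the trimming check, with assumed attribute names rather than
the exact restful module code:

    # Assumed sketch: cap the requests list, trimming the oldest entries
    # and logging an error when a still-running request is dropped.
    def submit_request(self, req):
        self.requests.append(req)
        while len(self.requests) > self.max_requests:
            victim = self.requests.pop(0)  # trim the oldest entry
            if not victim.is_finished():
                self.log.error("trimming unfinished request %s", victim.id)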
Tobias Urdin [Thu, 15 Aug 2024 15:17:14 +0000 (17:17 +0200)]
qa: barbican: restrict python packages with upper-constraints
We install barbican by doing a pip install directly on the
cloned git repository, but we don't honor the upper-constraints
from the OpenStack Requirements project that defines which
versions are supported.
This changes the pip install command that we issue when
installing barbican to honor the requirements for the
version (derived from the branch) that we use; in
this case it's the 2023.1 release upper-constraints [1].
This prevents us from pulling in untested Python packages.
This only updates Barbican, because for the Keystone job
we don't invoke pip directly but install via tox using the
`venv` environment, which already sets the constraints by
default, as can be seen in [2].
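A sketch of the kind of invocation this amounts to; the exact constraints
URL for the 2023.1 branch is an assumption:

    # Assumed sketch: install the barbican checkout while pinning
    # transitive dependencies to the release's upper-constraints.
    import subprocess

    subprocess.run(
        ["pip", "install",
         "-c", "https://releases.openstack.org/constraints/upper/2023.1",
         "."],  # run from the cloned barbican repository
        check=True,
    )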
Yuval Lifshitz [Mon, 19 Aug 2024 10:37:07 +0000 (13:37 +0300)]
Merge pull request #59239 from yuvalif/wip-yuval-67513
Reviewed-By: Casey Bodley <cbodley@ibm.com>
test/rgw/notification: use real ip address instead of localhost
Based on this comment:
https://tracker.ceph.com/issues/67206#note-6
the address used by the endpoint is now the real IP address of the
host where the test script is running, rather than localhost.
We also changed the rabbitmq-server conf to allow the "guest"
user to connect from non-localhost addresses.
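A common way to discover the host's outbound IP address in Python; whether
the test uses exactly this approach is an assumption:

    # Sketch: find the local address used for outbound traffic. The UDP
    # connect() sends no packets; it only selects a source address.
    import socket

    def get_real_ip():
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
            s.connect(("8.8.8.8", 80))
            return s.getsockname()[0]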
The commit starts to submit OOL writes before submitting the journal
write, true, but it cannot guarantee that the OOL writes finish before
the journal write.
Thus it is possible that, during SeaStore restart, a journal record
appears valid while its dependent OOL records are only partially
written, which leads to corruption.
Ilya Dryomov [Mon, 5 Aug 2024 15:52:10 +0000 (17:52 +0200)]
librbd/migration: move away from util::create_ioctx() in NativeFormat
This is another step towards supporting migration from external
clusters, where creating an IoCtx from a Rados instance that has
nothing to do with dst_io_ctx would be needed. It also allows us to
get rid of a pool lookup in the middle of the parsing code.
Ilya Dryomov [Fri, 16 Aug 2024 12:12:38 +0000 (14:12 +0200)]
common/config: export CEPH_CONF_FILE_DEFAULT
It used to be exported until commit 318c62f8ae16 ("common/config:
cleanup remove some unused macros"). Having CEPH_CONF_FILE_DEFAULT
available is handy to prevent parse_config_files() from picking up
the CEPH_CONF environment variable.
Or Ozeri [Tue, 31 Jan 2023 11:08:22 +0000 (13:08 +0200)]
librbd/migration: don't clone when flattening
When the flatten flag is set, instead of creating the
destination image by cloning, create it independently,
as the parent relation is unnecessary in this case.
This will be particularly useful when the migration source
is located in an external Ceph cluster, which will soon be
supported.
Signed-off-by: Or Ozeri <oro@il.ibm.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>