Previous commits handled the following two cases correctly:
- requeueing a scrub job while the OSD is still the primary, and
- not restoring the scrub job to the queue if the PG is not there.
Here we handle the missed scenario: the PG is there (we were able to
lock it), but is no longer the primary.
Also - a configuration change must not cause a re-queue of a
scrub-job for a PG that is in the middle of scrubbing.
Removing this unit test, as it must be rewritten to match the
changes in the implementation of the scrub scheduling mechanism,
and that implementation is still in flux.
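The requeue decision described above can be sketched roughly as follows. This is illustrative Python-style pseudocode only; the actual implementation is C++ in the OSD scrubber, and all names here are assumptions:
```
# Illustrative pseudocode of the requeue decision; names are assumptions,
# not the actual OSD/scrubber API.
def maybe_requeue_scrub_job(osd, pgid, job):
    pg = osd.try_lock_pg(pgid)
    if pg is None:
        return              # the PG is gone: do not restore the job
    try:
        if not pg.is_primary():
            return          # the PG is there, but we are no longer the primary
        if pg.is_scrubbing():
            return          # a config change must not re-queue a mid-scrub PG
        osd.scrub_queue.enqueue(job)
    finally:
        pg.unlock()
```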
Ronen Friedman [Thu, 20 Jun 2024 13:04:41 +0000 (08:04 -0500)]
osd/scrub: scheduling the next scrub following scrub completion
or after an aborted scrub.
Note one of the important changes in this commit:
the functionality of adjust_target_time() & update_schedule()
is merged into a single function - adjust_schedule().
Regarding the handling of aborts:
Most of the time, all that is required following a scrub abort is to
requeue the scrub job - the one that triggered the aborted scrub -
with just a delay added to its not-before (n.b.) time.
But we must take into account scenarios where "something" caused the
parameters prepared for the *next* scrub to show higher urgency or
priority. "Something" - as in an operator command requiring immediate
scrubbing, or a change in the pool/cluster configuration.
In such cases - the current requested flags and the parameters of
the aborted scrub must be merged.
Note that the current implementation is a temporary solution, to be
replaced by a per-level updating of the relevant target.
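As an illustration of the abort handling described above, a minimal Python-style sketch (the real code is C++; the attribute and method names here are assumptions, not the Scrubber API):
```
import time

# Minimal sketch of the abort handling described above; all names are
# assumptions, not the actual Scrubber API.
def requeue_after_abort(queue, aborted_job, next_params, penalty_delay):
    if next_params.urgency > aborted_job.urgency:
        # "something" (an operator command, a pool/cluster config change)
        # raised the urgency prepared for the *next* scrub: merge it into
        # the job that triggered the aborted scrub before requeueing
        aborted_job.merge(next_params)
    # the common case: only push the not-before (n.b.) time forward
    aborted_job.not_before = max(
        aborted_job.not_before, time.monotonic() + penalty_delay)
    queue.enqueue(aborted_job)
```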
...and on_scrub_schedule_input_change()
by accepting a parameter to control whether the scheduled time for
periodic scrubs that are already due should be updated.
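A rough sketch of how such a parameter might be applied (Python-style illustration only; the scheduler is C++, and the helper names are assumptions):
```
# Python-style illustration only; method and attribute names are assumptions.
def on_scrub_schedule_input_change(scrub_queue, config, update_if_already_due):
    for job in scrub_queue.periodic_jobs():
        if job.is_due() and not update_if_already_due:
            continue    # leave already-due periodic scrubs where they are
        job.adjust_schedule(config)
```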
Ronen Friedman [Sun, 16 Jun 2024 15:49:27 +0000 (10:49 -0500)]
osd/scrub: passing the scrub-job copy through the scrubber
Moving the scrub-job (sjob) copy through the scrubber - from scrub
session initiation to its termination (or abort, where we use the
handed "old version" of the sjob to update the new one).
Note that in this version not all the information that was used to
determine the specifics of the initiated scrub is passed to the
scrubber and back. At this half-baked stage of the refactoring,
the handling of corner cases - still using the "planned scrub"
flags - is less than optimal.
The next step (dual targets, replacing the 'planned scrub' flags with
specific attributes in the scheduling target) fixes this.
Ronen Friedman [Sat, 15 Jun 2024 12:58:30 +0000 (07:58 -0500)]
osd/scrub: fix adjust_target_time()
... to use populate_config_params() output
Also: fix determine_scrub_time() to use the same already-available
data. determine_scrub_time() renamed to determine_initial_schedule(),
to better reflect its purpose.
Following the change that has the queue hold a copy of the
ScrubJob, the registration process - initiated by
schedule_scrub_with_osd() - can now be simplified.
adjust_target_time() is relocated as is. It will be refactored
in the next commit.
Ronen Friedman [Thu, 13 Jun 2024 10:38:29 +0000 (05:38 -0500)]
osd/scrub: the scrub queue now holds a copy of the ScrubJob
... and not a shared pointer to it.
The PG's entry in the scrub queue and the scrub-job object that
is part of the scrubber are now separate entities that can be
modified separately - each with its own locking mechanism.
The interaction between OsdScrub (the OSD's scrub scheduler
object) and the scrub queue during scrub initiation is simplified:
instead of fetching from the queue a list of scrub targets
possibly ready to be scrubbed, the OsdScrub now picks only one
candidate - the top of the queue.
to convey the change in the scheduling-data ownership: the scrub-job
object is no longer shared between the Scrubber and the OSD
scrub queue; instead, the scrub queue holds a copied snapshot
of the scheduling data.
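To illustrate the ownership change and the simplified initiation path, a Python-style sketch (the actual code is C++; the class and method names here are assumptions):
```
import copy
import heapq

# Python-style sketch of the ownership model described above; the real code
# is C++, and the names used here are assumptions. ScrubJob snapshots are
# assumed to be ordered by their scheduled time.
class ScrubQueue:
    def __init__(self):
        self._heap = []      # holds copied ScrubJob snapshots, not pointers

    def enqueue(self, job):
        # store a snapshot: the Scrubber keeps and modifies its own object,
        # each side under its own lock
        heapq.heappush(self._heap, copy.deepcopy(job))

    def pop_top_candidate(self):
        # OsdScrub now picks a single candidate - the top of the queue -
        # instead of fetching a list of possibly-ready targets
        return heapq.heappop(self._heap) if self._heap else None
```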
Improve "Principles for format change" in doc/dev/encoding.rst. This
commit started as a response to Anthony D'Atri's suggestion here: https://github.com/ceph/ceph/pull/58299/files#r1656985564
Review of this section suggested to me that certain minor English usage
improvements would be of benefit. The numbered lists in this section
could still be made a bit clearer.
Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com>
Signed-off-by: Zac Dover <zac.dover@proton.me>
Edit the section called "Is mount helper present?", the title of which
prior to this commit was "Is mount helper is present?". Other small
disambiguating improvements have been made to the text in the section.
An unselectable prompt has been added before a command.
Laura Flores [Thu, 11 Jul 2024 22:20:50 +0000 (17:20 -0500)]
qa/workunits/cephtool: add extra privileges to cephtool script
This is more of a workaround for an issue in the infrastructure
where files are created with root privileges, causing permission
issues when operations are performed on them without sudo.
Fixes: https://tracker.ceph.com/issues/66881
Signed-off-by: Laura Flores <lflores@ibm.com>
This test deletes the CephFS already present on the cluster at the very
beginning and unmounts the first client beforehand. But it leaves the
second client mounted on this deleted CephFS that doesn't exist for the
rest of the test. Then, at the very end of this test, it attempts to
remount the second client (during tearDown()), which hangs and causes
the test runner to crash.
Unmount the second client beforehand to prevent the bug, and delete the
mount_b object to avoid future confusion for readers about whether or
not the second mount point exists.
Fixes: https://tracker.ceph.com/issues/66077
Signed-off-by: Rishabh Dave <ridave@redhat.com>
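The fix amounts to something like the following sketch, assuming the usual CephFSTestCase mount-object conventions; the actual change in the test may be placed differently:
```
# Sketch only (assumes the usual CephFSTestCase mount-object conventions);
# the actual change in the test may differ.
def unmount_unused_client(test):
    # the test deletes the only CephFS, so the second client must not stay
    # mounted on a filesystem that no longer exists
    test.mount_b.umount_wait()
    # drop the attribute so readers don't assume a 2nd mountpoint still exists
    del test.mount_b
```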
Rishabh Dave [Mon, 17 Jun 2024 19:03:28 +0000 (00:33 +0530)]
mgr/vol: handle case where clone index entry goes missing
In `async_cloner.py`, the clone index entry is fetched to get the next
clone job that needs to be executed. It might happen that the clone job
was cancelled just as it was going to be picked for execution (IOW, when
it was about to move from the pending state to the in-progress state).
Currently, the MGR hangs in such a case because the exception
`ObjectNotFound` from the CephFS Python bindings is raised and left
uncaught. To prevent this issue, catch the exception, log it, and return
None so that `get_job()` of `async_job.py` looks for the next job in the
queue.
Increase the scope of the try-except in the method
`get_oldest_clone_entry()` of `async_cloner.py` so that `cephfs.Error`,
or any exception derived from it, thrown by `self.fs.lstat()` is not
left uncaught.
The FS object is also passed to the method `list_one_entry_at_a_time()`,
so increasing the scope of the try-except is useful: exceptions raised
in other calls to CephFS Python binding methods are not left uncaught
either.
Fixes: https://tracker.ceph.com/issues/66560
Signed-off-by: Rishabh Dave <ridave@redhat.com>
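An illustrative sketch of the widened try-except described above; the real helper in `async_cloner.py` has a different signature and more context, and `entry.path` and the passed-in lister are assumptions:
```
import logging

import cephfs

log = logging.getLogger(__name__)

# Illustrative sketch only: the real get_oldest_clone_entry() differs.
def get_oldest_clone_entry(fs, index_path, list_one_entry_at_a_time):
    try:
        for entry in list_one_entry_at_a_time(fs, index_path):
            # the clone may have been cancelled between listing and pickup;
            # lstat() (and other binding calls) may then raise ObjectNotFound
            st = fs.lstat(entry.path)
            return entry, st
    except cephfs.ObjectNotFound as e:
        log.info("clone index entry went missing: %s", e)
    except cephfs.Error as e:
        log.error("error while fetching clone entry: %s", e)
    # returning None tells get_job() in async_job.py to look for the next job
    return None
```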
John Mulligan [Mon, 8 Jul 2024 19:14:30 +0000 (15:14 -0400)]
mgr/cephadm: allow services to be conditionally ranked
Add a spec argument to the `ranked` function. If this function returns
true, the service will use ranks. In some cases we want 'rankedness'
to be conditional on parameters of the given service spec.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
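A minimal sketch of the pattern: only the `ranked` name comes from the commit; the example class, spec attribute, and condition are assumptions:
```
# Minimal sketch of conditional 'rankedness'; only the `ranked` name comes
# from the commit - the class, spec attribute, and logic are assumptions.
class ExampleService:
    def ranked(self, spec) -> bool:
        # rank usage is decided per service spec rather than being a fixed
        # property of the service type
        return getattr(spec, 'clustering', 'default') != 'never'
```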
Adam King [Mon, 1 Jul 2024 17:44:29 +0000 (13:44 -0400)]
cephadm: turn off cgroups_split setting when bootstrapping with --no-cgroups-split
If users provide the --no-cgroups-split flag when bootstrapping a
cluster, they probably want the cluster to continue not using
split cgroups for its daemons post-bootstrap. Setting the
mgr/cephadm/cgroups_split setting to false accomplishes that.
Fixes: https://tracker.ceph.com/issues/66848
Signed-off-by: Adam King <adking@redhat.com>
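A minimal sketch of the bootstrap-time behaviour described above; `cli` stands in for whatever helper cephadm uses to run `ceph config set`, and the argument handling here is an assumption:
```
# Minimal sketch of the described behaviour; the `cli` helper and argument
# handling are assumptions, not cephadm's actual bootstrap code.
def apply_cgroups_split_choice(no_cgroups_split, cli):
    if no_cgroups_split:
        # keep the cluster from using split cgroups for daemons deployed
        # after bootstrap as well
        cli(['config', 'set', 'mgr', 'mgr/cephadm/cgroups_split', 'false'])
```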
John Mulligan [Wed, 10 Jul 2024 13:50:45 +0000 (09:50 -0400)]
mgr/smb: fix ceph smb show when a cluster has no associated shares
Fix an error condition in the `ceph smb show` command. When the ceph
smb show command was run after creating a usersgroups and cluster
resource, but no share resources, the following traceback was seen:
```
Error EINVAL: Traceback (most recent call last):
  File "/usr/share/ceph/mgr/mgr_module.py", line 1910, in _handle_command
    return CLICommand.COMMANDS[cmd['prefix']].call(self, cmd, inbuf)
  File "/usr/share/ceph/mgr/mgr_module.py", line 507, in call
    return self.func(mgr, **kwargs)
  File "/usr/share/ceph/mgr/object_format.py", line 592, in _format_response
    robj = f(*args, **kwargs)
  File "/usr/share/ceph/mgr/smb/module.py", line 258, in show
    resources = self._handler.matching_resources(resource_names)
  File "/usr/share/ceph/mgr/smb/handler.py", line 403, in matching_resources
    return self._search_resources(matcher)
  File "/usr/share/ceph/mgr/smb/handler.py", line 414, in _search_resources
    for share_id in cluster_shares[cluster_id]:
KeyError: 'smbcluster'
```
Fixes: a5cde6ebe940
Reported-by: Anoop C S <anoopcs@cryptolab.net>
Signed-off-by: John Mulligan <jmulligan@redhat.com>
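The failure is a plain missing-key lookup; a minimal sketch of the kind of guard that avoids it (the actual change in smb/handler.py may be structured differently):
```
# Minimal sketch of the kind of guard that avoids the KeyError above; the
# actual change in smb/handler.py may be structured differently.
def shares_of(cluster_shares: dict, cluster_id: str) -> list:
    # a cluster with no associated shares simply has no entry in the map,
    # so treat a missing key as "no shares" instead of raising KeyError
    return cluster_shares.get(cluster_id, [])
```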
qa/tasks/cephadm: don't wait for OSDs in create_rbd_pool()
This fails because teuthology.wait_until_osds_up() wants to use the
adjust-ulimits wrapper, which isn't available in the "cephadm shell"
environment. The whole thing is also redundant because the cephadm task
is supposed to wait for OSDs to come up earlier, in ceph_osds().
sunlan [Mon, 24 Jun 2024 08:29:38 +0000 (16:29 +0800)]
mgr/dashboard: add RESTful API for creating a CRUSH rule of type 'erasure'
Fixes: https://tracker.ceph.com/issues/66490
Signed-off-by: sunlan <sunlan@asiainfo.com>