Patrick Donnelly [Thu, 21 Dec 2023 13:48:33 +0000 (08:48 -0500)]
pybind/mgr/devicehealth: skip legacy objects that cannot be loaded
The log after the test looks like this:
2023-12-21T16:09:28.804+0000 7fbe7fd86700 0 [devicehealth DEBUG root] loading object ABC_DEADB33F_FA
2023-12-21T16:09:28.805+0000 7fbe7fd86700 0 [devicehealth DEBUG root] object rados.Object(ioctx=<rados.Ioctx object at 0x7fbeee0c4668>,key=ABC_DEADB33F_FA,nspace=--default--,locator=None) does not exist because it is deleted in HEAD
2023-12-21T16:09:28.805+0000 7fbe7fd86700 0 [devicehealth DEBUG root] finished reading legacy pool, complete = True
Credit to Greg Farnum for postulating the cause.
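A minimal sketch of the kind of guard this implies, using the rados
python bindings; the method shape mirrors the module's
_load_legacy_object but is illustrative, not the actual patch:

    import rados

    def _load_legacy_object(self, ioctx, oid: str) -> bool:
        self.log.debug(f"loading object {oid}")
        try:
            with rados.ReadOpCtx() as op:
                it, ret = ioctx.get_omap_vals(op, "", "", 1000)
                ioctx.operate_read_op(op, oid)
                for key, value in it:
                    pass  # ingest the legacy omap records here
        except rados.ObjectNotFound:
            # the object is deleted in HEAD and only lingers in a pool
            # snapshot; skip it rather than crash the module
            self.log.debug(f"object {oid} does not exist because it is deleted in HEAD")
        return True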
Fixes: https://tracker.ceph.com/issues/63882
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Patrick Donnelly [Thu, 21 Dec 2023 15:39:03 +0000 (10:39 -0500)]
qa: test devicehealth legacy load of deleted snap obj
The failure without the fix looks like this:
2023-12-21T16:05:55.737+0000 7fbe585b0700 0 [devicehealth DEBUG root] loading object ABC_DEADB33F_FA
2023-12-21T16:05:55.737+0000 7fbe585b0700 -1 log_channel(cluster) log [ERR] : Unhandled exception from module 'devicehealth' while running on mgr.x: [errno 2] RADOS object not found (Failed to operate read op for oid ABC_DEADB33F_FA)
2023-12-21T16:05:55.737+0000 7fbe585b0700 -1 devicehealth.serve:
2023-12-21T16:05:55.737+0000 7fbe585b0700 -1 Traceback (most recent call last):
File "/home/pdonnell/ceph/src/pybind/mgr/devicehealth/module.py", line 394, in serve
self._do_serve()
File "/home/pdonnell/ceph/src/pybind/mgr/mgr_module.py", line 524, in check
return func(self, *args, **kwargs)
File "/home/pdonnell/ceph/src/pybind/mgr/devicehealth/module.py", line 354, in _do_serve
finished_loading_legacy = self.check_legacy_pool()
File "/home/pdonnell/ceph/src/pybind/mgr/devicehealth/module.py", line 326, in check_legacy_pool
if self._load_legacy_object(ioctx, obj.key):
File "/home/pdonnell/ceph/src/pybind/mgr/devicehealth/module.py", line 300, in _load_legacy_object
ioctx.operate_read_op(op, oid)
File "rados.pyx", line 3723, in rados.Ioctx.operate_read_op
rados.ObjectNotFound: [errno 2] RADOS object not found (Failed to operate read op for oid ABC_DEADB33F_FA)
Credit to Greg Farnum for postulating the cause.
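A hedged sketch of how such an object can be staged with the rados
python bindings (the pool and object names follow the log above; the
actual qa task may set this up differently):

    import rados

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    ioctx = cluster.open_ioctx('device_health_metrics')  # the legacy pool
    ioctx.write_full('ABC_DEADB33F_FA', b'legacy record')
    ioctx.create_snap('snap1')  # pool snapshot keeps a clone of the object
    ioctx.remove_object('ABC_DEADB33F_FA')  # gone in HEAD, alive in the snap
    cluster.shutdown()
    # a devicehealth legacy-pool scan now hits a listing entry whose HEAD
    # is gone, which raised rados.ObjectNotFound before the fix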
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Casey Bodley [Mon, 8 Jan 2024 16:24:18 +0000 (08:24 -0800)]
make-dist: don't use --continue option for wget
the boost jfrog mirror is broken and returns an HTML error page instead
of the archive. the file size of this page is 11534 bytes.
when download_from() retries the download from download.ceph.com, the -c
option tells it to resume the download of the existing file. the
resulting boost_1_82_0.tar.bz2 ends up with the correct total file size
of 121325129 bytes, but the first 11534 bytes still correspond to the
HTML from jfrog. that causes the sha256sum mismatch.
remove the -c option so that wget fetches the archive in its entirety.
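A toy illustration of the failure mode, with synthetic byte strings
standing in for the real files:

    import hashlib

    error_page = b'<html>mirror error</html>'  # broken first attempt
    archive = bytes(range(256)) * 100          # stand-in for the tarball

    # with -c, wget resumes at len(error_page), so the file becomes:
    resumed = error_page + archive[len(error_page):]

    assert len(resumed) == len(archive)        # total size is correct...
    assert hashlib.sha256(resumed).digest() != \
        hashlib.sha256(archive).digest()       # ...but the checksum is not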
Adam King [Fri, 4 Aug 2023 17:30:55 +0000 (13:30 -0400)]
qa/cephadm: mgr-nfs-upgrade, match any migration > 2
I believe this check was originally added because the 2->3 migration
migrated some nfs-related bits. Since then, we've had to update the
migration number this checks for every time we bump the max migration.
This change instead has it check for any migration > 2 so we don't
have to keep updating it.
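A python sketch of the idea, not the actual teuthology yaml (the
config key is real; the polling helper is hypothetical):

    import subprocess
    import time

    def wait_for_migration(minimum: int = 2, timeout: float = 300) -> None:
        # poll until mgr/cephadm/migration_current exceeds `minimum`,
        # rather than equalling one specific number
        deadline = time.time() + timeout
        while time.time() < deadline:
            out = subprocess.check_output(
                ['ceph', 'config', 'get', 'mgr',
                 'mgr/cephadm/migration_current'])
            if int(out.strip() or 0) > minimum:
                return
            time.sleep(5)
        raise TimeoutError(f'migration_current never exceeded {minimum}')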
Adam King [Wed, 29 Nov 2023 16:49:38 +0000 (11:49 -0500)]
qa/upgrade/reef-x: pull compiled cephadm to start upgrades from reef
The compiled zipapp cephadm that began in reef needs to be pulled
differently from the old single python script cephadm of earlier
releases. This commit updates the reef-x upgrade suite to pull cephadm
in this new way.
Adam King [Wed, 16 Aug 2023 23:56:38 +0000 (19:56 -0400)]
qa/cephadm: support to pull stable branch compiled cephadm
This is to allow us to pull the latest build of cephadm from a stable
branch (currently the only valid option for that is reef, although
this hopefully will work with squid, the T release, etc. in the
future). This should allow us to bootstrap clusters based on those
stable branches for use in upgrade testing.
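A rough python sketch of what pulling the compiled build amounts to;
the URL is a placeholder, as the real location is resolved by the qa
tooling:

    import os
    import stat
    import urllib.request

    # hypothetical location of the compiled zipapp for a stable branch
    url = 'https://example.com/cephadm/reef/cephadm'
    urllib.request.urlretrieve(url, 'cephadm')

    # unlike the old single-file script, the zipapp must be fetched as
    # a compiled artifact, then made executable and run under python3
    os.chmod('cephadm', os.stat('cephadm').st_mode | stat.S_IEXEC)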
Paul Cuzner [Thu, 21 Dec 2023 01:12:45 +0000 (20:12 -0500)]
orchestrator: Add summary line to orch device ls
This patch just adds a summary line to the plain text output of orch
device ls when the --summary switch is given. This helps to quickly
understand your device counts when managing hosts with many devices.
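For illustration only (the exact wording and columns of the summary
output are assumptions, not taken from the patch):

    $ ceph orch device ls --summary
    HOST   PATH      TYPE  SIZE   AVAILABLE
    node1  /dev/sdb  hdd   4.0T   Yes
    node1  /dev/sdc  hdd   4.0T   No
    node2  /dev/sdb  ssd   1.9T   Yes
    2 hosts, 3 devices, 2 available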
Fixes: https://tracker.ceph.com/issues/63864
Signed-off-by: Paul Cuzner <pcuzner@ibm.com>
Ronen Friedman [Thu, 28 Dec 2023 19:41:19 +0000 (13:41 -0600)]
osd/scrub: avoid "over clearing" queued_or_active flag
If two StartScrub messages are received in quick succession, the
earlier one might clear the queued_or_active flag when it fails for
being from an old interval.
When that happens, a third scrub request will actually be allowed to
go through while the scrubber is still handling the second one.
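The shape of the fix, rendered as a python sketch since the scrubber
itself is C++; all names here are illustrative:

    class Scrubber:
        def __init__(self) -> None:
            # set once a scrub for this PG is queued or running
            self.queued_or_active = False

        def handle_start_scrub(self, msg_interval: int,
                               cur_interval: int) -> None:
            if msg_interval != cur_interval:
                # stale request from an old interval: drop it WITHOUT
                # clearing the flag, which may now belong to a newer,
                # still-running scrub
                return
            self.queued_or_active = True
            self.run_scrub()

        def run_scrub(self) -> None:
            pass  # actual scrubbing elided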
Myoungwon Oh [Sun, 17 Dec 2023 08:51:22 +0000 (17:51 +0900)]
crimson/os/seastore: introduce modified_region in DATA_BLOCK to keep track of modified region
The existing deltas have a limitation when tracking the modified
region: we cannot recover the correct region in two cases, 1) after
replay is done and 2) during duplicate_for_write. This commit
introduces modified_region to solve the problem.
Matan Breizman [Sun, 1 Jan 2023 11:41:34 +0000 (11:41 +0000)]
crimson/osd: Keep track of modified_ranges
* `modified_ranges` interval_set is added to osd_op_params_t
* keep track of modified_ranges while executing relevant ops
* Add `osd_op_params` parameter to `PGBackend::remove()`.
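A python sketch of the interval bookkeeping; the real code uses an
interval_set in osd_op_params_t, this stand-in just merges overlapping
or adjacent extents:

    def record_modified(ranges, off, length):
        # insert [off, off + length) and keep the set merged and sorted
        ranges.append((off, off + length))
        ranges.sort()
        merged = [ranges[0]]
        for start, end in ranges[1:]:
            if start <= merged[-1][1]:
                merged[-1] = (merged[-1][0], max(merged[-1][1], end))
            else:
                merged.append((start, end))
        return merged

    mods = []
    mods = record_modified(mods, 0, 4096)     # first write
    mods = record_modified(mods, 4096, 4096)  # adjacent write merges
    assert mods == [(0, 8192)]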
Zac Dover [Wed, 3 Jan 2024 08:41:51 +0000 (18:41 +1000)]
doc/radosgw: edit "Add/Remove a Key"
Edit the section "Add/Remove a Key" in doc/radosgw/admin.rst. Each
operation (e.g. "Adding an S3 key pair for a user", "Removing an S3 key
pair for a user") now has its own subsection. This increased granularity
should make it easier in the future to link to each of these specific
operations, if needed.
Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com>
Signed-off-by: Zac Dover <zac.dover@proton.me>
where the value ends up as a floating point value
after converting to a string (which is necessary to actually
pass it to the binary). By setting the field to be an
int, we should be able to avoid this.
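A small illustration of the float-to-string problem the change avoids,
assuming a numeric option that ends up on a binary's command line:

    value = 20.0
    assert str(value) == '20.0'      # a float field stringifies with
                                     # '.0', which an integer-only flag
                                     # rejects
    assert str(int(value)) == '20'   # an int field stringifies cleanly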
Adam King [Sat, 4 Nov 2023 22:45:17 +0000 (18:45 -0400)]
qa/cephadm: add test for cephadm asyncio based timeout
Adds a test that will set the default cephadm command timeout and then
force a timeout to occur by holding the cephadm lock and triggering a
device refresh. This works because cephadm ceph-volume commands
require the cephadm lock to run, so the command will time out waiting
for the lock to become available.
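The mechanism, reduced to a runnable python sketch with a plain
threading lock standing in for the cephadm lock:

    import concurrent.futures
    import threading

    lock = threading.Lock()
    lock.acquire()                    # someone holds the cephadm lock

    def refresh_devices():
        with lock:                    # ceph-volume needs the lock to run
            return 'device list'

    with concurrent.futures.ThreadPoolExecutor() as pool:
        future = pool.submit(refresh_devices)
        try:
            future.result(timeout=1)  # the configured command timeout
        except concurrent.futures.TimeoutError:
            print('command timed out waiting for the lock')
        lock.release()                # unblock the worker so the pool exits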
Adam King [Wed, 7 Jun 2023 14:33:13 +0000 (10:33 -0400)]
mgr/cephadm: Also catch concurrent.futures.TimeoutError for timeouts
On python 3.6, which Ceph currently uses for its container builds
(they are based on centos 8 stream builds, hence the python version),
the exception raised by a timeout from a concurrent.futures.Future is
successfully caught by looking for asyncio.TimeoutError. However, in
builds with later python versions, e.g. 3.9.16, the timeout is no
longer caught. This results in situations like:
Traceback (most recent call last):
File "/usr/share/ceph/mgr/cephadm/utils.py", line 79, in do_work
return f(*arg)
File "/usr/share/ceph/mgr/cephadm/serve.py", line 241, in refresh
r = self._refresh_host_devices(host)
File "/usr/share/ceph/mgr/cephadm/serve.py", line 352, in _refresh_host_devices
devices = self.mgr.wait_async(self._run_cephadm_json(
File "/usr/share/ceph/mgr/cephadm/module.py", line 635, in wait_async
return self.event_loop.get_result(coro, timeout)
File "/usr/share/ceph/mgr/cephadm/ssh.py", line 63, in get_result
return future.result(timeout)
File "/lib64/python3.9/concurrent/futures/_base.py", line 448, in result
raise TimeoutError()
concurrent.futures._base.TimeoutError
which causes the cephadm module to crash whenever one of these command
timeouts happens. This patch also catches the newer exception type so
that the handling works on later python versions as well.
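A sketch of the resulting handler. On python 3.6 asyncio.TimeoutError
is an alias of the concurrent.futures exception, from 3.8 they are
distinct classes, and from 3.11 both alias the builtin TimeoutError,
so catching the tuple is safe everywhere:

    import asyncio
    import concurrent.futures
    import time

    def get_result(future, timeout):
        try:
            return future.result(timeout)
        except (asyncio.TimeoutError, concurrent.futures.TimeoutError):
            # catch both spellings so the handler works on every
            # python version ceph builds against
            raise RuntimeError('cephadm command timed out') from None

    with concurrent.futures.ThreadPoolExecutor() as pool:
        f = pool.submit(time.sleep, 0.5)
        try:
            get_result(f, timeout=0.1)
        except RuntimeError as e:
            print(e)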