Test: osd-recovery-space.sh extends the wait time for "recovery toofull".
The osd-recovery-space test involves writing objects and expecting to receive
the "toofull" flag.
If we don't wait long enough, we might check for the "toofull" flag before all
objects have been written, when the "toofull" status has not yet been set.
This change extends the wait time and adds checks of the return code from
the status wait.
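The wait-and-check pattern described above can be sketched as follows. This is a minimal illustration in Python, not the shell code of the test itself; `wait_for` and `FakeCluster` are hypothetical names introduced only for this example.

```python
import time
from typing import Callable


def wait_for(condition: Callable[[], bool], timeout: float,
             interval: float = 0.01) -> bool:
    """Poll condition() until it returns True or `timeout` seconds elapse.

    Returns True if the condition was met and False on timeout, so the
    caller can check the result instead of assuming success.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


class FakeCluster:
    """Hypothetical stand-in: reports "toofull" only after a few polls."""

    def __init__(self, polls_until_toofull: int) -> None:
        self.remaining = polls_until_toofull

    def is_toofull(self) -> bool:
        self.remaining -= 1
        return self.remaining <= 0
```

A caller would then fail the test explicitly when `wait_for(cluster.is_toofull, timeout=...)` returns False, rather than going on to inspect a state that was never reached.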
Conflicts: git got confused about these; unchanged in original commit
qa/suites/rbd/device/ignore-pg-availability.yaml
qa/suites/upgrade/octopus-x/ignore-pg-availability.yaml
qa/suites/rgw/dbstore/ignore-pg-availability.yaml depended on
missing .qa symlink
The subsuite had a supported-all-distro$/ subdirectory, but it contained
only centos_8.yaml. qa/tasks/rabbitmq.py is hardcoded to use 'yum'
and rpm packages, so replace supported-all-distro$ with a link to
centos_latest.yaml.
qa/workunits/rbd: avoid caching effects in luks-encryption.sh
Commit 40f6f5224bce ("qa/workunits/rbd: fix issues in
luks-encryption.sh") did the right thing for reads, which solved
most of the issue. However, it actually made a step in the opposite
direction for writes -- depending on the RBD cache settings, rbd-nbd
virtual devices can behave as physical devices with a volatile write
cache, so fsync is required.
While at it, involving O_DIRECT for reads isn't needed outside of
test_encryption_format().
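The point about writes can be illustrated with a short sketch: when the device may have a volatile write cache, the write is followed by an fsync before the data is compared. This is a Python illustration under that assumption, not the commands used in luks-encryption.sh; `write_durably` is a hypothetical helper name.

```python
import os


def write_durably(path: str, offset: int, data: bytes) -> None:
    """Write `data` at `offset` and flush it to stable storage."""
    fd = os.open(path, os.O_WRONLY)
    try:
        written = 0
        # pwrite() may write fewer bytes than requested, so loop.
        while written < len(data):
            written += os.pwrite(fd, data[written:], offset + written)
        # Without this, the data may still sit in a volatile cache.
        os.fsync(fd)
    finally:
        os.close(fd)
```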
Or Ozeri [Tue, 7 Jun 2022 07:44:21 +0000 (10:44 +0300)]
qa/workunits/rbd: fix issues in luks-encryption.sh
This commit fixes 2 issues in luks-encryption.sh:
1. Fix sporadic comparison failures due to stale data read from kernel buffer cache.
2. Fix test skipping condition (when journaling is enabled)
Prashant D [Fri, 3 May 2024 23:32:32 +0000 (19:32 -0400)]
mgr/cephadm: Fix unfound progress events
While applying service specs, cephadm creates a progress event for
the daemons to be added to or deleted from the hosts. The progress
event is initialized only if progress_total is greater than 0,
but at the end cephadm tries to mark the progress event as
complete/fail even if the progress event has not been initialized.
Mark progress events as complete/fail only if they are initialized.
Fixes: https://tracker.ceph.com/issues/65799
Signed-off-by: Prashant D <pdhange@redhat.com>
(cherry picked from commit cfcdfd6ae2e0580450cc857123ca35e0d4d2ebea)
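The guard described in this commit can be sketched roughly as below. `ProgressTracker` and `apply_spec` are hypothetical names for illustration only and do not reproduce the cephadm or mgr progress API.

```python
from typing import Dict, Optional


class ProgressTracker:
    """Hypothetical stand-in for a progress-event store."""

    def __init__(self) -> None:
        self.events: Dict[str, str] = {}

    def start(self, event_id: str) -> None:
        self.events[event_id] = "in-progress"

    def finish(self, event_id: str, ok: bool) -> None:
        if event_id not in self.events:
            # This is the "unfound progress event" situation.
            raise KeyError(f"unknown progress event {event_id}")
        self.events[event_id] = "complete" if ok else "failed"


def apply_spec(tracker: ProgressTracker, event_id: str,
               progress_total: int, ok: bool = True) -> Optional[str]:
    """Create the event only when there is work; finish it only if created."""
    initialized = False
    if progress_total > 0:
        tracker.start(event_id)
        initialized = True
    # ... daemons would be added or removed here ...
    if initialized:
        tracker.finish(event_id, ok)
    return tracker.events.get(event_id)
```

With `progress_total == 0` no event is created and none is finished, so the lookup that would otherwise fail never happens.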
Adam King [Mon, 22 Jul 2024 15:27:26 +0000 (11:27 -0400)]
pybind/mgr/mgr_util: convert certs to bytes before loading them
This function expects to be passed bytes rather than a string.
Mypy complains when this conversion is missing:
mgr_util.py: note: In function "verify_cacrt_content":
mgr_util.py:547: error: Argument 2 to "load_certificate" has incompatible type "str"; expected "bytes"
mgr_util.py: note: In function "verify_tls":
mgr_util.py:584: error: Argument 2 to "load_certificate" has incompatible type "str"; expected "bytes"
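A minimal sketch of the kind of conversion that satisfies this check, assuming the certificate may arrive as either str or bytes. `as_bytes` is a hypothetical helper name and is not taken from mgr_util.py.

```python
from typing import Union


def as_bytes(pem: Union[str, bytes]) -> bytes:
    """Return PEM data as bytes, encoding it if it was passed as str."""
    return pem.encode("utf-8") if isinstance(pem, str) else pem


# The result is what would be handed to load_certificate() as its
# second argument, which the mypy output above says must be bytes.
```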
Kefu Chai [Thu, 23 May 2024 23:21:51 +0000 (07:21 +0800)]
cephadm: use importlib.metadata for querying ceph_iscsi's version
Use importlib.metadata to query ceph_iscsi's version and fall back to
pkg_resources, as the former is only available in Python 3.8 and later,
while the latter is deprecated.
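The fallback order described above might look like the following sketch. `dist_version` is a hypothetical name; the real change runs the query through cephadm's python() helper, which is not reproduced here.

```python
from typing import Optional


def dist_version(name: str) -> Optional[str]:
    """Return the installed version of distribution `name`, or None."""
    try:
        from importlib import metadata  # available in Python 3.8 and later
    except ImportError:
        metadata = None

    if metadata is not None:
        try:
            return metadata.version(name)
        except metadata.PackageNotFoundError:
            return None

    # Older interpreters: fall back to the deprecated pkg_resources.
    import pkg_resources
    try:
        return pkg_resources.get_distribution(name).version
    except pkg_resources.DistributionNotFound:
        return None
```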
Kefu Chai [Thu, 23 May 2024 23:16:14 +0000 (07:16 +0800)]
cephadm: extract python() helper to execute python statement
Extract the helper to prepare for a change that uses importlib first and
then falls back to pkg_resources, as the former is only available in
Python 3.8 and later, while the latter is deprecated.
Laura Flores [Tue, 16 Jul 2024 16:47:35 +0000 (11:47 -0500)]
qa/suites/rados/thrash-old-clients/0-distros$: test on ubuntu_20.04 and drop nautilus
CentOS 8 has reached end of life, so we need to choose a different distro on which
to test thrash-old-clients.
thrash-old-clients tests should only support N-3 releases. Nautilus fits with
this, but unfortunately there is no overlapping distro between nautilus, pacific,
octopus, AND quincy (bionic was dropped from quincy, and nautilus does not build
focal). As such, we are only able to test N-2.
Proof that focal is not available for nautilus (this is where the test would search for packages):
https://shaman.ceph.com/api/search/?status=ready&project=ceph&flavor=default&distros=ubuntu%2F20.04%2Fx86_64&ref=nautilus
Edit the section called "Is mount helper present?", the title of which
prior to this commit was "Is mount helper is present?". Other small
disambiguating improvements have been made to the text in the section.
An unselectable prompt has been added before a command.
Improve "Principles for format change" in doc/dev/encoding.rst. This
commit started as a response to Anthony D'Atri's suggestion here: https://github.com/ceph/ceph/pull/58299/files#r1656985564
Review of this section suggested to me that certain minor English usage
improvements would be of benefit. The numbered lists in this section
could still be made a bit clearer.
Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com>
Signed-off-by: Zac Dover <zac.dover@proton.me>
(cherry picked from commit 570797e5588b67b8c72e5297b61f84d9aa48dc45)
After rollback started being tested in commit b3977c53c930
("test/librbd: make rollback in TestGroup.add_snapshot{,PP}
meaningful"), these tests can fail on comparing post-rollback
data to expected data if run with exclusive lock disabled.
This doesn't occur with exclusive lock enabled because the RBD
cache gets invalidated implicitly before releasing the lock.
While at it, pass LIBRADOS_OP_FLAG_FADVISE_FUA to avoid relying
on any cache settings that happen to be in effect.
Ilya Dryomov [Fri, 14 Jun 2024 12:04:39 +0000 (14:04 +0200)]
librbd: disallow group snap rollback if memberships don't match
Before proceeding with group rollback, ensure that the set of images
that took part in the group snapshot matches the set of images that are
currently part of the group. Otherwise, because we preserve affected
snapshots when an image is removed from the group, data loss can ensue
where an image gets rolled back while part of another group or not part
of any group but long repurposed for something else.
Similarly, ensure that the group snapshot is complete.
Conflicts:
src/cls/rbd/cls_rbd_types.h [ provide operator== for
GroupImageSpec manually since quincy is on C++17 ]
src/test/pybind/test_rbd.py [ commit d7fd66ec9944 ("librbd: add
rbd_clone4() API to take parent snapshot by ID") not in
quincy ]
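The precondition introduced by this commit can be sketched as a set comparison. This is an illustration only: `can_rollback` is a hypothetical name, and images are represented here as (pool ID, image ID) tuples by assumption.

```python
from typing import Iterable, Tuple

ImageSpec = Tuple[int, str]  # assumed shape: (pool_id, image_id)


def can_rollback(snapshot_images: Iterable[ImageSpec],
                 current_images: Iterable[ImageSpec],
                 snapshot_complete: bool) -> bool:
    """Allow rollback only for a complete snapshot with matching members."""
    if not snapshot_complete:
        return False
    # Order does not matter, only the set of member images.
    return set(snapshot_images) == set(current_images)
```

For example, if an image was removed from the group after the snapshot was taken, the two sets differ and the rollback is refused rather than rolling back an image that may since have been repurposed.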
After the rollback assert in TestGroup.add_snapshot{,PP} was made
meaningful in the previous commit, it fails in mock tests which means
that rollback has never been exercised properly...
While I confess to not following the file->snap_id == CEPH_NOSNAP branch,
especially given how the file variable is shadowed, it's pretty clear that
get_snap_read() doesn't belong here -- the snapshot selected for reads
has nothing to do with rollback. Replacing it with the rollback snap
ID makes sense of the other branches and makes the tests in question
pass.
Ilya Dryomov [Thu, 13 Jun 2024 14:24:43 +0000 (16:24 +0200)]
test/librbd: make rollback in TestGroup.add_snapshot{,PP} meaningful
The rollback assert doesn't really test anything -- because orig_data
and test_data are written to non-overlapping areas, the test would pass
even if rbd_group_snap_rollback() does nothing (i.e. rollback isn't
performed) as long as the call returns 0.
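The weakness described above can be reproduced with a toy model. This sketch assumes a hypothetical in-memory `FakeImage` and is not the librbd test code; it only shows why writing to a non-overlapping area lets a no-op rollback go unnoticed.

```python
class FakeImage:
    """Hypothetical in-memory image with a single snapshot."""

    def __init__(self, size: int) -> None:
        self.data = bytearray(size)
        self.snap = bytes(size)

    def write(self, offset: int, buf: bytes) -> None:
        self.data[offset:offset + len(buf)] = buf

    def read(self, offset: int, length: int) -> bytes:
        return bytes(self.data[offset:offset + length])

    def snapshot(self) -> None:
        self.snap = bytes(self.data)

    def rollback(self, broken: bool = False) -> None:
        # A broken rollback returns success but restores nothing.
        if not broken:
            self.data = bytearray(self.snap)


def check_passes(test_offset: int, broken: bool) -> bool:
    """Return True if the post-rollback comparison succeeds."""
    img = FakeImage(64)
    orig_data = b"A" * 8
    test_data = b"B" * 8
    img.write(0, orig_data)
    img.snapshot()
    img.write(test_offset, test_data)
    img.rollback(broken=broken)
    return img.read(0, 8) == orig_data
```

With `test_offset=8` the comparison succeeds even when `broken=True`; with `test_offset=0` the same broken rollback is caught.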
Ilya Dryomov [Thu, 20 Jun 2024 19:13:56 +0000 (21:13 +0200)]
librbd: make diff-iterate in fast-diff mode aware of encryption
diff-iterate wasn't updated when librbd was being prepared to support
encryption in commit 8d6a47933269 ("librbd: add crypto image dispatch
layer"). This is even noted in [1]:
> The two places I skipped for now are DiffIterate and TrimRequest.
CryptoImageDispatch has since been removed, but diff-iterate in
fast-diff mode is still unaware of encryption and just assumes that all
offsets are raw. This means that the callback gets invoked with
incorrect image offsets when encryption is loaded. For example, for
a LUKS1-formatted image with some data at offsets 0 and 20971520,
diff-iterate with encryption loaded reports
as "exists". For any piece of code that is using diff-iterate to
optimize block-by-block processing (e.g. copy an encrypted source image
to a differently-encrypted destination image), this is fatal: it would
skip processing block 20971520 which has data and instead process block
25165824 which doesn't have any data and was to be skipped, producing
a corrupted destination image.
Conflicts:
src/librbd/api/DiffIterate.cc [ ImageArea support not in
quincy ]
src/test/librbd/test_librbd.cc [ commit 4a5a0a5dd82b ("librbd:
add cloned images encryption API") not in quincy ]
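The offset mix-up described in this commit comes down to a fixed shift between raw and image offsets. The sketch below assumes a 4 MiB data offset, which is inferred from the two offsets quoted above (25165824 - 20971520 = 4194304) and not stated in the message; `raw_to_image` is a hypothetical name.

```python
# Assumed offset of the encrypted payload within the raw image.
DATA_OFFSET = 4 * 1024 * 1024


def raw_to_image(raw_offset: int, data_offset: int = DATA_OFFSET) -> int:
    """Translate a raw offset into the offset seen with encryption loaded."""
    if raw_offset < data_offset:
        raise ValueError("offset falls inside the encryption header")
    return raw_offset - data_offset
```

Under that assumption, the raw offset 25165824 corresponds to image offset 20971520, which is where the data actually lives from the caller's point of view.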
Currently, data is written only at the beginning of an object.
Extend the skeletons to write to three different offsets in the middle
and also at the end of the object.
Separately, make C and C++ API test variants slightly different in
terms of offsets being targeted to not go through exactly the same
scenario twice.