Samuel Just [Wed, 19 Jun 2024 04:10:34 +0000 (21:10 -0700)]
crimson/.../object_context: drop recovery_read_marker
This doesn't seem to serve a purpose with current crimson. classic
uses ObjectState::recovery_read_marker to indicate that backfill
should be requeued upon wakeup, but that hasn't been necessary so
far in crimson. We can reintroduce this if it becomes useful.
Samuel Just [Thu, 13 Jun 2024 00:47:08 +0000 (00:47 +0000)]
crimson/.../tri_mutex: lock() methods return normal future
f63d76a2 modified the lock() variants on tri_mutex so that the obc
loading pathway wouldn't invoke .then() on a returned future known
statically to be ready. Now that the loading pathway uses demotion
mechanisms that cannot block and do not return futures, we no longer
have any users like that and can drop the extra std::nullopt
possibility.
In a larger sense, if lock() *can* return a non-ready future in a
particular pathway, there's no semantic difference between returning
std::optional<future<>> and future<>: the caller still has to deal with
a possible non-ready future even when std::nullopt is also possible. If
the pathway can be demonstrated statically to be non-blocking, as with
the obc loading mechanism, we really want a mechanism that obviously
cannot block rather than relying on a mechanism with a return signature
of std::optional<future<>> to return std::nullopt.
Because we just constructed the obc, we know that we can get an
exclusive lock without blocking. Introduce
ObjectContext::load_then_with_lock to take an exclusive lock
unconditionally, load, downgrade (which we also know must be safe), and
then run the passed function.
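To make the shape of this concrete, here is a rough Python analogue of
load_then_with_lock (a sketch only; the real code is C++/Seastar, and all
names below are illustrative rather than crimson's actual API):
```
class ToyRWLock:
    """Toy lock with non-blocking exclusive acquisition and
    exclusive-to-shared demotion, mirroring the tri_mutex idea."""
    def __init__(self):
        self.writer = False
        self.readers = 0

    def try_lock_exclusive(self):
        # Non-blocking: succeeds only when nobody holds the lock.
        if self.writer or self.readers:
            return False
        self.writer = True
        return True

    def demote(self):
        # Exclusive -> shared; cannot block since we already hold exclusive.
        assert self.writer
        self.writer = False
        self.readers += 1

    def unlock_shared(self):
        self.readers -= 1

async def load_then_with_lock(obc, loader, func):
    # The obc was just constructed, so this non-blocking acquire must succeed.
    assert obc.lock.try_lock_exclusive()
    await loader(obc)   # load the object state
    obc.lock.demote()   # downgrade, also known to be safe here
    try:
        return await func(obc)
    finally:
        obc.lock.unlock_shared()
```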
Laura Flores [Wed, 19 Jun 2024 21:57:45 +0000 (16:57 -0500)]
qa/suites/upgrade/telemetry-upgrade/reef-x: update how cephadm is pulled and change image reference
Update how cephadm is pulled:
`cephadm_git_url` and `cephadm_branch` are used in releases older than reef
to install cephadm; both of these keys are needed to install it from the
GitHub repo.
However, from reef onward, the compiled zipapp cephadm needs to be pulled
differently from the old single Python script `cephadm` of earlier releases.
Laura Flores [Wed, 19 Jun 2024 21:07:31 +0000 (16:07 -0500)]
qa/suites/upgrade/telemetry-upgrade: add more ignorelist items and require_osd_release=squid
The warnings added to the ignorelist show up in the cluster log, but they are
expected during upgrades and should thus be ignored.
We also need to set `require_osd_release=squid` (e.g. with `ceph osd require-osd-release squid`) to avoid this warning:
```
cluster [WRN] Health check failed: all OSDs are running squid or later but require_osd_release < squid (OSD_UPGRADE_FINISHED)
```
Note from Dan van der Ster:
This is a test that should succeed; it
definitely used to succeed back in the L/O days of Ceph. At some
point peering code changed and this behaviour regressed. In short,
an OSD goes down then comes up, and no objects were modified in the meantime.
There should be no degraded PGs in this case.
As this commit currently breaks make check on all PRs, I think it should
be re-evaluated and merged together with whatever fix is needed to make
the test pass.
Fixes: https://tracker.ceph.com/issues/66556
Signed-off-by: Laura Flores <lflores@ibm.com>
Patrick Donnelly [Thu, 20 Jun 2024 15:43:17 +0000 (11:43 -0400)]
script/backport-create-issue: update tag custom field
The redmineup_tags plugin was added and, apparently, the old "Tags" custom
field with id == 3 was deleted and then recreated with id == 31. This broke
the script. Update and refactor the id for that "Tags" custom field.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Patrick Donnelly [Wed, 19 Jun 2024 19:19:56 +0000 (15:19 -0400)]
Merge PR #55792 into main
* refs/pull/55792/head:
tools/cephfs: recover alternate_name of dentries from journal
qa: add test to verify recovery of alternate_name from journal
tools/cephfs/JournalTool: add some more debugging
tools/cephfs/JournalTool: remove extraneous 0x in debug output
mds: dump alternate_name to formatter
mds: add warning about encoding new fields
Reviewed-by: Christopher Hoffman <choffman@redhat.com>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Afreen Misbah [Fri, 14 Jun 2024 05:21:47 +0000 (10:51 +0530)]
mgr/dashboard: fix service page e2e tests
- the service page now uses a default value for the placement count, which made the mds test fail
- the test enters "1" while "2", the default count for mds, is already populated, making the field "21" and causing the mds service creation to fail
Zack Cerza [Fri, 14 Jun 2024 19:37:16 +0000 (13:37 -0600)]
qa/tasks/qemu: Fix OS version comparison
See: https://sentry.ceph.com/share/issue/21ed88d705854238bdafbf6711e795ee/
They're strings, not floats.
This surfaced as a result of https://github.com/ceph/teuthology/pull/1953
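For illustration (a sketch of the pitfall, not the actual teuthology code):
string comparison is lexicographic, and floats don't work either, so version
strings should be compared component-wise as integers:
```
assert "10.04" < "9.10"               # lexicographic: wrong order
assert float("8.10") < float("8.9")   # floats are no better: 8.10 becomes 8.1

def version_tuple(v):
    # compare versions component-wise as integers
    return tuple(int(part) for part in v.split("."))

assert version_tuple("10.04") > version_tuple("9.10")  # correct
assert version_tuple("8.10") > version_tuple("8.9")    # correct
```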
Nizamudeen A [Sun, 16 Jun 2024 11:15:43 +0000 (16:45 +0530)]
mgr/dashboard: disable telemetry notification in e2e
After the new UI shell, the telemetry notifications are displayed at the
bottom, where they sit on top of some buttons and cause Cypress failures
when clicking the Submit button in a form.
I am disabling the notification itself in e2e because it is not checked in
the e2e at all, so we can afford to disable it. In case we decide to add an
e2e test for the notification, we can just toggle it back on and check it
then.
Fixes: https://tracker.ceph.com/issues/66506
Signed-off-by: Nizamudeen A <nia@redhat.com>
Xuehan Xu [Fri, 24 May 2024 09:30:41 +0000 (17:30 +0800)]
crimson/osd/osd_operations/client_request_common: `PeeringState::needs_recovery()` may fail if the object is under backfill
Meanwhile, set the correct version for backfill:
From Classic:
```
if (is_degraded_or_backfilling_object(head)) {
if (can_backoff && g_conf()->osd_backoff_on_degraded) {
add_backoff(session, head, head);
maybe_kick_recovery(head);
}
```
John Mulligan [Thu, 2 May 2024 20:41:15 +0000 (16:41 -0400)]
mgr/smb: add validation funcs for custom parameter dictionaries
Custom parameter dictionaries will be used to pass options to the samba
config without much filtering or control by the smb mgr module. Because of
the risks this entails, the user must "agree", via a "magic" key-value
pair, that using these options can break their setup.
This pair will be filtered out of the eventual data passed to samba.
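A minimal sketch of what such validation could look like (the key name and
helper names below are hypothetical, not the smb module's actual API):
```
_MAGIC_KEY = "_allow_customization"     # hypothetical magic key
_MAGIC_VALUE = "i-take-responsibility"  # hypothetical opt-in value

def check_custom_options(opts):
    # Reject custom samba options unless the user has opted in explicitly.
    if opts.get(_MAGIC_KEY) != _MAGIC_VALUE:
        raise ValueError(
            "custom samba options can break your setup; "
            "set %r to acknowledge the risk" % _MAGIC_KEY)

def clean_custom_options(opts):
    # Drop the magic pair so it never reaches the samba config.
    return {k: v for k, v in opts.items() if k != _MAGIC_KEY}
```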
Signed-off-by: John Mulligan <jmulligan@redhat.com>
John Mulligan [Thu, 2 May 2024 20:40:53 +0000 (16:40 -0400)]
mgr/smb: convert failures to create a valid resource into error results
Convert failures to create a valid resource into error results that can
be reported back to the caller like the other error result types
generated by actually attempting to apply the resources.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
John Mulligan [Thu, 2 May 2024 20:33:01 +0000 (16:33 -0400)]
mgr/smb: add result type for reporting resource validation errors
This error type cannot take a real resource object because the resource
object could not be constructed from the data. Use the raw data for
reporting the error result.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
John Mulligan [Thu, 2 May 2024 20:32:35 +0000 (16:32 -0400)]
mgr/smb: add resource construction error handling exception
Use the error hook function to wrap plain ValueError instances into an
InvalidResourceError. This error retains the simplified data being
(re)constructed into an object and thus will be used later to generate
proper error results.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
John Mulligan [Thu, 2 May 2024 20:28:46 +0000 (16:28 -0400)]
mgr/smb: add error handling & conversion hook to resourcelib
Add a method to supply the Resource instances with a callback that
can handle errors that occur during object construction from simplified
data. If the callback is not set, the exceptions are handled as usual.
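Taken together, the flow described in these commits might look roughly like
this (illustrative names only; the real resourcelib API may differ):
```
class InvalidResourceError(ValueError):
    def __init__(self, msg, data):
        super().__init__(msg)
        self.data = data  # keep the simplified input for the error result

def wrap_value_error(exc, data):
    # Error hook: wrap plain ValueErrors so the raw data survives
    # to the result-reporting stage.
    if isinstance(exc, ValueError) and not isinstance(exc, InvalidResourceError):
        return InvalidResourceError(str(exc), data)
    return exc

class Resource:
    on_error = None  # optional callback; when unset, errors raise as usual

    @classmethod
    def from_simplified(cls, data):
        try:
            if "name" not in data:
                raise ValueError("resource requires a 'name'")
            obj = cls()
            obj.name = data["name"]
            return obj
        except Exception as exc:
            if cls.on_error is not None:
                raise cls.on_error(exc, data) from exc
            raise

Resource.on_error = staticmethod(wrap_value_error)
```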
Signed-off-by: John Mulligan <jmulligan@redhat.com>
Some tests are currently failing when `lvm2` isn't installed:
```
FAILED ceph_volume/tests/devices/lvm/test_listing.py::TestList::test_empty_device_json_zero_exit_status - FileNotFoundError: [Errno 2] No such file or directory: 'pvs'
FAILED ceph_volume/tests/devices/lvm/test_listing.py::TestList::test_empty_device_zero_exit_status - FileNotFoundError: [Errno 2] No such file or directory: 'pvs'
FAILED ceph_volume/tests/devices/lvm/test_listing.py::TestFullReport::test_no_ceph_lvs - FileNotFoundError: [Errno 2] No such file or directory: 'pvs'
FAILED ceph_volume/tests/devices/lvm/test_listing.py::TestFullReport::test_ceph_data_lv_reported - FileNotFoundError: [Errno 2] No such file or directory: 'pvs'
FAILED ceph_volume/tests/devices/lvm/test_listing.py::TestFullReport::test_ceph_journal_lv_reported - FileNotFoundError: [Errno 2] No such file or directory: 'pvs'
FAILED ceph_volume/tests/devices/lvm/test_listing.py::TestFullReport::test_ceph_wal_lv_reported - FileNotFoundError: [Errno 2] No such file or directory: 'pvs'
FAILED ceph_volume/tests/devices/lvm/test_listing.py::TestFullReport::test_physical_2nd_device_gets_reported[journal] - FileNotFoundError: [Errno 2] No such file or directory: 'pvs'
FAILED ceph_volume/tests/devices/lvm/test_listing.py::TestFullReport::test_physical_2nd_device_gets_reported[db] - FileNotFoundError: [Errno 2] No such file or directory: 'pvs'
FAILED ceph_volume/tests/devices/lvm/test_listing.py::TestFullReport::test_physical_2nd_device_gets_reported[wal] - FileNotFoundError: [Errno 2] No such file or directory: 'pvs'
FAILED ceph_volume/tests/devices/lvm/test_listing.py::TestSingleReport::test_not_a_ceph_lv - FileNotFoundError: [Errno 2] No such file or directory: 'pvs'
FAILED ceph_volume/tests/devices/lvm/test_listing.py::TestSingleReport::test_report_a_ceph_lv - FileNotFoundError: [Errno 2] No such file or directory: 'pvs'
FAILED ceph_volume/tests/devices/lvm/test_listing.py::TestSingleReport::test_report_a_ceph_journal_device - FileNotFoundError: [Errno 2] No such file or directory: 'pvs'
FAILED ceph_volume/tests/devices/lvm/test_listing.py::TestSingleReport::test_report_by_osd_id_for_just_block_dev - FileNotFoundError: [Errno 2] No such file or directory: 'pvs'
FAILED ceph_volume/tests/devices/lvm/test_listing.py::TestSingleReport::test_report_by_osd_id_for_just_data_dev - FileNotFoundError: [Errno 2] No such file or directory: 'pvs'
FAILED ceph_volume/tests/devices/lvm/test_listing.py::TestSingleReport::test_report_by_osd_id_for_just_block_wal_and_db_dev - FileNotFoundError: [Errno 2] No such file or directory: 'pvs'
FAILED ceph_volume/tests/devices/lvm/test_listing.py::TestSingleReport::test_report_by_osd_id_for_data_and_journal_dev - FileNotFoundError: [Errno 2] No such file or directory: 'pvs'
FAILED ceph_volume/tests/devices/lvm/test_listing.py::TestSingleReport::test_report_by_nonexistent_osd_id - FileNotFoundError: [Errno 2] No such file or directory: 'pvs'
FAILED ceph_volume/tests/devices/lvm/test_listing.py::TestSingleReport::test_report_a_ceph_lv_with_no_matching_devices - FileNotFoundError: [Errno 2] No such file or directory: 'pvs'
FAILED ceph_volume/tests/devices/lvm/test_migrate.py::TestNew::test_newdb_not_target_lvm - FileNotFoundError: [Errno 2] No such file or directory: 'lvs'
FAILED ceph_volume/tests/devices/lvm/test_zap.py::TestEnsureAssociatedLVs::test_nothing_is_found - FileNotFoundError: [Errno 2] No such file or directory: 'lvs'
FAILED ceph_volume/tests/devices/lvm/test_zap.py::TestEnsureAssociatedLVs::test_multiple_journals_are_found - FileNotFoundError: [Errno 2] No such file or directory: 'lvs'
FAILED ceph_volume/tests/devices/lvm/test_zap.py::TestEnsureAssociatedLVs::test_multiple_dbs_are_found - FileNotFoundError: [Errno 2] No such file or directory: 'lvs'
FAILED ceph_volume/tests/devices/lvm/test_zap.py::TestEnsureAssociatedLVs::test_multiple_wals_are_found - FileNotFoundError: [Errno 2] No such file or directory: 'lvs'
FAILED ceph_volume/tests/devices/lvm/test_zap.py::TestEnsureAssociatedLVs::test_multiple_backing_devs_are_found - FileNotFoundError: [Errno 2] No such file or directory: 'lvs'
FAILED ceph_volume/tests/objectstore/test_lvmbluestore.py::TestLvmBlueStore::test_activate_all_osd_is_active - FileNotFoundError: [Errno 2] No such file or directory: 'lvs'
```
Everything should actually be mocked. This commit addresses that.
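A sketch of the general approach (the commit's actual fixtures may differ):
stub ceph-volume's process helper so the tests never spawn `pvs`/`lvs`:
```
from ceph_volume import process

def fake_call(command, **kw):
    # pretend the lvm2 tool ran and produced no output;
    # process.call returns (stdout_lines, stderr_lines, returncode)
    return ([], [], 0)

def test_no_ceph_lvs_sketch(monkeypatch):
    monkeypatch.setattr(process, "call", fake_call)
    # ... exercise the listing code; 'pvs' is never executed now ...
```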
Venky Shankar [Mon, 17 Jun 2024 09:27:52 +0000 (14:57 +0530)]
Merge PR #57991 into main
* refs/pull/57991/head:
qa: upgrade sub-suite upgraded_client from n-1|n-2 releases
qa: upgrade sub-suite nofs from n-1 and n-2 releases
qa: use supported releases for featureful_client
Ilya Dryomov [Fri, 14 Jun 2024 12:04:39 +0000 (14:04 +0200)]
librbd: disallow group snap rollback if memberships don't match
Before proceeding with group rollback, ensure that the set of images
that took part in the group snapshot matches the set of images that are
currently part of the group. Otherwise, because we preserve affected
snapshots when an image is removed from the group, data loss can ensue
where an image gets rolled back while part of another group or not part
of any group but long repurposed for something else.
Similarly, ensure that the group snapshot is complete.
After the rollback assert in TestGroup.add_snapshot{,PP} was made
meaningful in the previous commit, it fails in mock tests which means
that rollback has never been exercised properly...
While I confess to not following the file->snap_id == CEPH_NOSNAP branch,
especially given how the file variable is shadowed, it's pretty clear that
get_snap_read() doesn't belong here -- the snapshot selected for reads
has nothing to do with rollback. Replacing it with the rollback snap
ID makes sense of the other branches and makes the tests in question
pass.
Ilya Dryomov [Thu, 13 Jun 2024 14:24:43 +0000 (16:24 +0200)]
test/librbd: make rollback in TestGroup.add_snapshot{,PP} meaningful
The rollback assert doesn't really test anything -- because orig_data
and test_data are written to non-overlapping areas, the test would pass
even if rbd_group_snap_rollback() does nothing (i.e. rollback isn't
performed) as long as the call returns 0.
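For illustration, a toy Python version of the flaw (the real test is C++):
with non-overlapping writes, even a rollback that restores nothing
satisfies the assert:
```
image = bytearray(512)
orig_data = b"original"
test_data = b"overwrite"

image[0:len(orig_data)] = orig_data           # orig_data at offset 0
group_snap = bytes(image)                     # "take the group snapshot"
image[256:256 + len(test_data)] = test_data   # test_data does NOT overlap

def broken_rollback(img, snap):
    return 0  # claims success, restores nothing

assert broken_rollback(image, group_snap) == 0
# The assert only re-reads orig_data's range, which the second write never
# touched -- so it passes even though no rollback happened:
assert bytes(image[0:len(orig_data)]) == orig_data
```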