Adam King [Wed, 19 Jun 2024 19:04:21 +0000 (15:04 -0400)]
mgr/rgw: fix error handling in rgw zone create
This was returning either a list of strings or
a HandleCommandResult and in the latter case
it would error out trying to build the final
return message which covered up the original
error
Fixes: https://tracker.ceph.com/issues/66568
Signed-off-by: Adam King <adking@redhat.com>
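A minimal sketch of the failure mode described in this commit, not the actual mgr/rgw code. `HandleCommandResult` below is a stand-in NamedTuple with an assumed shape, and `create_zones()` and `zone_create()` are hypothetical helpers. It shows the error result being checked before the success message is built.

```python
from typing import List, NamedTuple, Union


class HandleCommandResult(NamedTuple):
    """Stand-in for the mgr result type (assumed fields: retval, stdout, stderr)."""
    retval: int = 0
    stdout: str = ""
    stderr: str = ""


def create_zones(fail: bool) -> Union[List[str], HandleCommandResult]:
    # Hypothetical helper that mixes two return types, as the commit describes.
    if fail:
        return HandleCommandResult(retval=-22, stderr="zone spec is invalid")
    return ["zone-a", "zone-b"]


def zone_create(fail: bool) -> HandleCommandResult:
    result = create_zones(fail)
    # Check for the error type first so the original error is passed through
    # instead of being fed into the success-message formatting below.
    if isinstance(result, HandleCommandResult):
        return result
    return HandleCommandResult(stdout="Zones " + ", ".join(result) + " created")
```

Without the `isinstance` check, the error result would reach the message formatting and the caller would see a formatting failure rather than the original error.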
Xuehan Xu [Fri, 24 May 2024 09:30:41 +0000 (17:30 +0800)]
crimson/osd/osd_operations/client_request_common: `PeeringState::needs_recovery()`
may fail if the object is under backfill
Meanwhile, set the correct version for backfill:
From Classic:
```
if (is_degraded_or_backfilling_object(head)) {
  if (can_backoff && g_conf()->osd_backoff_on_degraded) {
    add_backoff(session, head, head);
    maybe_kick_recovery(head);
  }
```
John Mulligan [Thu, 2 May 2024 20:41:15 +0000 (16:41 -0400)]
mgr/smb: add validation funcs for custom parameter dictionaries
Custom parameter dictionaries will be used to pass options to samba
config without much filtering and control by the smb mgr module. Because
of the risks this entails, the user must "agree" that using these options
can break their setup by supplying a "magic" key-value pair.
This pair will be filtered out of the eventual data passed to samba.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
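A rough sketch of the validation idea, assuming a made-up acknowledgement pair. The real "magic" key and value are not given in this log, so `_ACK_KEY`, `_ACK_VALUE` and both function names are hypothetical.

```python
from typing import Dict, Optional

# Hypothetical acknowledgement pair; the real key and value are not stated here.
_ACK_KEY = "_i_accept_the_risk"
_ACK_VALUE = "yes"


def check_custom_options(opts: Optional[Dict[str, str]]) -> None:
    """Raise ValueError unless the caller included the acknowledgement pair."""
    if opts is None:
        return  # no custom options supplied, nothing to acknowledge
    if opts.get(_ACK_KEY) != _ACK_VALUE:
        raise ValueError(f"custom options require {_ACK_KEY!r}: {_ACK_VALUE!r}")


def clean_custom_options(opts: Optional[Dict[str, str]]) -> Dict[str, str]:
    """Return the options without the acknowledgement pair."""
    return {k: v for k, v in (opts or {}).items() if k != _ACK_KEY}
```

Keeping the check and the filtering separate means validation can run early while the cleaned dictionary is only produced when the samba config is generated.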
John Mulligan [Thu, 2 May 2024 20:40:53 +0000 (16:40 -0400)]
mgr/smb: convert failures to create a valid resource into error results
Convert failures to create a valid resource into error results that can
be reported back to the caller like the other error result types
generated by actually attempting to apply the resources.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
John Mulligan [Thu, 2 May 2024 20:33:01 +0000 (16:33 -0400)]
mgr/smb: add result type for reporting resource validation errors
This error type can not take a real resource object because the resource
object could not be constructed from the data. Use the raw data for
reporting the error result.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
John Mulligan [Thu, 2 May 2024 20:32:35 +0000 (16:32 -0400)]
mgr/smb: add resource construction error handling exception
Use the error hook function to convert plain ValueError instances into an
InvalidResourceError. This error retains the simplified data being
(re)constructed into an object and thus will be used later to generate
proper error results.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
John Mulligan [Thu, 2 May 2024 20:28:46 +0000 (16:28 -0400)]
mgr/smb: add error handling & conversion hook to resourcelib
Add a method to supply the Resource instances with a callback that
can handle errors that occur during object construction from simplified
data. If the callback is not set, the exceptions are handled as usual.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
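A sketch of how such a hook could fit together, under assumptions: `Loader`, `on_construction_error()` and `wrap_value_error()` are illustrative names, not the resourcelib API, and the exception class here only mirrors the idea of keeping the raw data.

```python
from typing import Any, Callable, Dict, Optional

ErrorHook = Callable[[Exception, Dict[str, Any]], Exception]


class InvalidResourceError(ValueError):
    """Keeps the simplified data that failed to become an object."""

    def __init__(self, msg: str, resource_data: Dict[str, Any]) -> None:
        super().__init__(msg)
        self.resource_data = resource_data


class Loader:
    """Hypothetical stand-in for the object that builds resources from data."""

    def __init__(self) -> None:
        self._on_error: Optional[ErrorHook] = None

    def on_construction_error(self, hook: ErrorHook) -> None:
        self._on_error = hook

    def load(self, data: Dict[str, Any]) -> Dict[str, Any]:
        try:
            if "resource_type" not in data:
                raise ValueError("missing resource_type")
            return dict(data)
        except Exception as err:
            if self._on_error is None:
                raise  # no hook set: the exception propagates as usual
            new_err = self._on_error(err, data)
            if new_err is err:
                raise
            raise new_err from err


def wrap_value_error(err: Exception, data: Dict[str, Any]) -> Exception:
    # Only plain ValueError instances are converted; others pass through.
    if type(err) is ValueError:
        return InvalidResourceError(str(err), data)
    return err
```

Because the wrapped error carries `resource_data`, a later layer can build an error result from the raw input even though no resource object exists.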
Some tests are currently failing when `lvm2` isn't installed:
```
FAILED ceph_volume/tests/devices/lvm/test_listing.py::TestList::test_empty_device_json_zero_exit_status - FileNotFoundError: [Errno 2] No such file or directory: 'pvs'
FAILED ceph_volume/tests/devices/lvm/test_listing.py::TestList::test_empty_device_zero_exit_status - FileNotFoundError: [Errno 2] No such file or directory: 'pvs'
FAILED ceph_volume/tests/devices/lvm/test_listing.py::TestFullReport::test_no_ceph_lvs - FileNotFoundError: [Errno 2] No such file or directory: 'pvs'
FAILED ceph_volume/tests/devices/lvm/test_listing.py::TestFullReport::test_ceph_data_lv_reported - FileNotFoundError: [Errno 2] No such file or directory: 'pvs'
FAILED ceph_volume/tests/devices/lvm/test_listing.py::TestFullReport::test_ceph_journal_lv_reported - FileNotFoundError: [Errno 2] No such file or directory: 'pvs'
FAILED ceph_volume/tests/devices/lvm/test_listing.py::TestFullReport::test_ceph_wal_lv_reported - FileNotFoundError: [Errno 2] No such file or directory: 'pvs'
FAILED ceph_volume/tests/devices/lvm/test_listing.py::TestFullReport::test_physical_2nd_device_gets_reported[journal] - FileNotFoundError: [Errno 2] No such file or directory: 'pvs'
FAILED ceph_volume/tests/devices/lvm/test_listing.py::TestFullReport::test_physical_2nd_device_gets_reported[db] - FileNotFoundError: [Errno 2] No such file or directory: 'pvs'
FAILED ceph_volume/tests/devices/lvm/test_listing.py::TestFullReport::test_physical_2nd_device_gets_reported[wal] - FileNotFoundError: [Errno 2] No such file or directory: 'pvs'
FAILED ceph_volume/tests/devices/lvm/test_listing.py::TestSingleReport::test_not_a_ceph_lv - FileNotFoundError: [Errno 2] No such file or directory: 'pvs'
FAILED ceph_volume/tests/devices/lvm/test_listing.py::TestSingleReport::test_report_a_ceph_lv - FileNotFoundError: [Errno 2] No such file or directory: 'pvs'
FAILED ceph_volume/tests/devices/lvm/test_listing.py::TestSingleReport::test_report_a_ceph_journal_device - FileNotFoundError: [Errno 2] No such file or directory: 'pvs'
FAILED ceph_volume/tests/devices/lvm/test_listing.py::TestSingleReport::test_report_by_osd_id_for_just_block_dev - FileNotFoundError: [Errno 2] No such file or directory: 'pvs'
FAILED ceph_volume/tests/devices/lvm/test_listing.py::TestSingleReport::test_report_by_osd_id_for_just_data_dev - FileNotFoundError: [Errno 2] No such file or directory: 'pvs'
FAILED ceph_volume/tests/devices/lvm/test_listing.py::TestSingleReport::test_report_by_osd_id_for_just_block_wal_and_db_dev - FileNotFoundError: [Errno 2] No such file or directory: 'pvs'
FAILED ceph_volume/tests/devices/lvm/test_listing.py::TestSingleReport::test_report_by_osd_id_for_data_and_journal_dev - FileNotFoundError: [Errno 2] No such file or directory: 'pvs'
FAILED ceph_volume/tests/devices/lvm/test_listing.py::TestSingleReport::test_report_by_nonexistent_osd_id - FileNotFoundError: [Errno 2] No such file or directory: 'pvs'
FAILED ceph_volume/tests/devices/lvm/test_listing.py::TestSingleReport::test_report_a_ceph_lv_with_no_matching_devices - FileNotFoundError: [Errno 2] No such file or directory: 'pvs'
FAILED ceph_volume/tests/devices/lvm/test_migrate.py::TestNew::test_newdb_not_target_lvm - FileNotFoundError: [Errno 2] No such file or directory: 'lvs'
FAILED ceph_volume/tests/devices/lvm/test_zap.py::TestEnsureAssociatedLVs::test_nothing_is_found - FileNotFoundError: [Errno 2] No such file or directory: 'lvs'
FAILED ceph_volume/tests/devices/lvm/test_zap.py::TestEnsureAssociatedLVs::test_multiple_journals_are_found - FileNotFoundError: [Errno 2] No such file or directory: 'lvs'
FAILED ceph_volume/tests/devices/lvm/test_zap.py::TestEnsureAssociatedLVs::test_multiple_dbs_are_found - FileNotFoundError: [Errno 2] No such file or directory: 'lvs'
FAILED ceph_volume/tests/devices/lvm/test_zap.py::TestEnsureAssociatedLVs::test_multiple_wals_are_found - FileNotFoundError: [Errno 2] No such file or directory: 'lvs'
FAILED ceph_volume/tests/devices/lvm/test_zap.py::TestEnsureAssociatedLVs::test_multiple_backing_devs_are_found - FileNotFoundError: [Errno 2] No such file or directory: 'lvs'
FAILED ceph_volume/tests/objectstore/test_lvmbluestore.py::TestLvmBlueStore::test_activate_all_osd_is_active - FileNotFoundError: [Errno 2] No such file or directory: 'lvs'
```
Everything should actually be mocked. This commit addresses that.
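For illustration only, a self-contained sketch of the kind of mocking meant here. `list_pvs()` is a hypothetical helper, not ceph-volume code; the point is that patching the process call lets the test run on a host without the `pvs` binary.

```python
import subprocess
from unittest import mock


def list_pvs() -> list:
    """Hypothetical helper that shells out to the `pvs` binary from lvm2."""
    proc = subprocess.run(
        ["pvs", "--noheadings", "-o", "pv_name"],
        capture_output=True, text=True, check=True,
    )
    return [line.strip() for line in proc.stdout.splitlines() if line.strip()]


def check_list_pvs_is_mocked() -> list:
    # Patch subprocess.run so no real binary is executed during the test.
    fake = subprocess.CompletedProcess(
        args=["pvs"], returncode=0,
        stdout="  /dev/sda1\n  /dev/sdb1\n", stderr="",
    )
    with mock.patch("subprocess.run", return_value=fake) as run:
        result = list_pvs()
    run.assert_called_once()
    return result
```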
Venky Shankar [Mon, 17 Jun 2024 09:27:52 +0000 (14:57 +0530)]
Merge PR #57991 into main
* refs/pull/57991/head:
qa: upgrade sub-suite upgraded_client from n-1|n-2 releases
qa: upgrade sub-suite nofs from n-1 and n-2 releases
qa: use supported releases for featureful_client
Ilya Dryomov [Fri, 14 Jun 2024 12:04:39 +0000 (14:04 +0200)]
librbd: disallow group snap rollback if memberships don't match
Before proceeding with group rollback, ensure that the set of images
that took part in the group snapshot matches the set of images that are
currently part of the group. Otherwise, because we preserve affected
snapshots when an image is removed from the group, data loss can ensue
where an image gets rolled back while part of another group or not part
of any group but long repurposed for something else.
Similarly, ensure that the group snapshot is complete.
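The precondition can be pictured as a set comparison. This is a sketch, not the librbd implementation: the function name is hypothetical and the `-EINVAL` return value is an assumption about which error would be reported.

```python
import errno
from typing import Iterable


def check_group_snap_rollback(snap_image_ids: Iterable[str],
                              group_image_ids: Iterable[str],
                              snap_complete: bool) -> int:
    """Return 0 if rollback may proceed, or a negative errno otherwise."""
    if not snap_complete:
        return -errno.EINVAL  # assumed error code for an incomplete snapshot
    if set(snap_image_ids) != set(group_image_ids):
        return -errno.EINVAL  # assumed error code for a membership mismatch
    return 0
```

An image that was removed from the group after the snapshot makes the two sets differ, so the rollback is refused before any image is touched.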
After the rollback assert in TestGroup.add_snapshot{,PP} was made
meaningful in the previous commit, it fails in mock tests which means
that rollback has never been exercised properly...
While I confess to not following the file->snap_id == CEPH_NOSNAP branch,
especially given how the file variable is shadowed, it's pretty clear that
get_snap_read() doesn't belong here -- the snapshot selected for reads
has nothing to do with rollback. Replacing it with the rollback snap
ID makes sense of the other branches and makes the tests in question
pass.
Ilya Dryomov [Thu, 13 Jun 2024 14:24:43 +0000 (16:24 +0200)]
test/librbd: make rollback in TestGroup.add_snapshot{,PP} meaningful
The rollback assert doesn't really test anything -- because orig_data
and test_data are written to non-overlapping areas, the test would pass
even if rbd_group_snap_rollback() does nothing (i.e. rollback isn't
performed) as long as the call returns 0.
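A toy model of why the old assert was vacuous, using an in-memory buffer in place of an image. All names are illustrative and nothing here calls librbd.

```python
def run_test(rollback_works: bool, overlap: bool) -> bool:
    """Return True if the rollback assert would pass."""
    size = 8
    image = bytearray(2 * size)
    orig_data = b"A" * size
    test_data = b"B" * size

    image[0:size] = orig_data                 # written before the snapshot
    snapshot = bytes(image)                   # take the group snapshot

    offset = 0 if overlap else size
    image[offset:offset + size] = test_data   # written after the snapshot

    if rollback_works:
        image[:] = snapshot                   # a real rollback restores the data
    # otherwise: model a rollback that returns 0 but does nothing

    return bytes(image[0:size]) == orig_data  # the assert under test
```

With non-overlapping writes, `run_test(False, overlap=False)` still passes, which is the problem described above. Only overlapping writes make a no-op rollback fail the assert.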
Ronen Friedman [Tue, 4 Jun 2024 09:02:55 +0000 (04:02 -0500)]
osd/scrub: do not track reserving state at OSD level
As we no longer block the initiation of new scrub sessions for an OSD
for which any of its PGs is in the process of reserving scrub resources,
there is no need to track the reserving state at the OSD level.
Ronen Friedman [Tue, 4 Jun 2024 08:53:04 +0000 (03:53 -0500)]
osd/scrub: allow new scrubs while reserving
Allow new scrub sessions to be initiated by an OSD even while a PG is
in the process of reserving scrub resources.
The existing restriction made sense when the replica reservation process
was expected to succeed or fail within a few milliseconds. It makes less
sense now that the reservation process is queue-based (Reserver based)
and can take unlimited time (hours, days, ...) to complete.
Zac Dover [Sat, 15 Jun 2024 11:55:18 +0000 (21:55 +1000)]
doc/rados: explain replaceable parts of command
Add an explanation that directs the reader to replace the "X" part of
the command "ceph tell mon.X mon_status" with the value specific to the
reader's Ceph cluster (which is (probably) not "X").
In the future, such replaceable strings in commands may be bounded by
angle brackets ("<" and ">").
This improvement to the documentation was suggested on the [ceph-users]
email list by Joel Davidow. This email, an absolute model of user
engagement with an upstream project, can be reviewed here:
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/KF67F5TXFSSTPXV7EKL6JKLA5KZQDLDQ/
John Mulligan [Fri, 14 Jun 2024 14:07:07 +0000 (10:07 -0400)]
script/cpatch.py: add support for multiple valid python versions
Fix running cpatch.py with the latest centos9s based container images.
Future proof a little by adding multiple valid, existing, python version
numbers to probe.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
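A sketch of the probing approach, assuming a hypothetical list of versions and a hypothetical path layout; the values actually used by cpatch.py may differ.

```python
from pathlib import Path
from typing import Callable, Iterable, Optional

# Hypothetical candidates; the actual list probed by cpatch.py may differ.
CANDIDATE_VERSIONS = ("3.9", "3.11", "3.12")


def find_site_packages(
    versions: Iterable[str] = CANDIDATE_VERSIONS,
    exists: Callable[[Path], bool] = Path.is_dir,
) -> Optional[Path]:
    """Return the first candidate site-packages directory that exists."""
    for ver in versions:
        path = Path(f"/usr/lib/python{ver}/site-packages")  # assumed layout
        if exists(path):
            return path
    return None
```

The `exists` parameter is only there so the probe can be exercised without touching the filesystem.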
Samuel Just [Tue, 28 May 2024 20:45:26 +0000 (20:45 +0000)]
crimson/.../snaptrim_event: SnapTrimObjSubEvent should enter WaitRepop
Otherwise, it parks on Process until the repop completes, blocking any
other repops, including client IO. Since we don't actually care about
ordering, simply calling handle.complete() would also be viable, but
this is a valid usage of the stage and does provide information to an
operator.
SnapTrimEvent doesn't actually do or block on GetOBC or Process --
remove those stages entirely. Entering Process, in particular, causes
problems unless we immediately leave it as SnapTrimObjSubEvent needs to
enter and leave it to complete. Entering one of the stages removed in
a prior commit had a side effect of exiting Process -- without that
exit SnapTrimEvent and SnapTrimObjSubEvent mutually block preventing
snap trim or client io from making progress.
This leaves no actual pipeline stages on SnapTrimEvent, which makes
sense as only SnapTrimObjSubEvent actually does IO.
Samuel Just [Tue, 28 May 2024 16:48:42 +0000 (09:48 -0700)]
crimson/.../snaptrim_event: remove pipeline stages located on event
WaitSubop, WaitTrimTimer, and WaitRepop are pipeline stages local to
the operation. As such, they don't actually provide any ordering
guarantees as only one operation will ever enter them. Rather, the
intent is to hook into the event system to expose information to an
administrator.
This poses a problem for OrderedConcurrentPhase as it is currently
implemented. PipelineHandle::exit() is invoked prior to the op being
destructed. PipelineHandle::exit() does:
    void exit() {
      barrier.reset();
    }
For OrderedConcurrentPhase, ~ExitBarrier() invokes ExitBarrier::exit():
The problem comes in not waiting for the phase->mutex.unlock() to occur.
For SnapTrimEvent, phase is actually in the operation itself. It's
possible for that continuation
to occur after the last finally() in ShardServices::start_operation
completes and releases the final reference to SnapTrimEvent. This is
harmless normally provided that the PG or connection outlives it,
but it's a problem for these stages.
For now, let's just remove these stages. We can reintroduce another
mechanism later to set these event flags without an actual pipeline
stage.
This is likely a bug even with pipelines not embedded in an operation,
but we can fix it later -- https://tracker.ceph.com/issues/64545.
Fixes: https://tracker.ceph.com/issues/63647
Signed-off-by: Samuel Just <sjust@redhat.com>
John Mulligan [Wed, 1 May 2024 14:57:02 +0000 (10:57 -0400)]
mgr/smb: share and cluster create commands only create resources
Prior to this change the create commands could be used counter to the
term 'create' as a create-or-update command. IMO this violates the
principle of least surprise so make them create-only.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
John Mulligan [Wed, 1 May 2024 14:55:27 +0000 (10:55 -0400)]
mgr/smb: add create_only arg for handler apply function
Add a create_only argument to the handler class apply function. This
flag is used to prevent modification of existing resources. This flag
will be used by 'cluster create' and 'share create' commands to make
them true to their names and not sneaky modify-or-create commands.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
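A simplified sketch of the create_only semantics with a dictionary standing in for the resource store. The function signature and `ResourceExistsError` are hypothetical; the real handler may report the conflict as an error result instead of raising.

```python
from typing import Dict


class ResourceExistsError(Exception):
    """Hypothetical error for a create-only request that hits an existing resource."""


def apply(store: Dict[str, dict], key: str, value: dict,
          create_only: bool = False) -> str:
    """Create or update a resource, refusing updates when create_only is set."""
    if key in store:
        if create_only:
            raise ResourceExistsError(f"{key} already exists")
        store[key] = value
        return "updated"
    store[key] = value
    return "created"
```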
Afreen [Fri, 31 May 2024 07:54:27 +0000 (13:24 +0530)]
mgr/dashboard: Configure NVMe/TCP
Fixes https://tracker.ceph.com/issues/63686
- creation of NVMe-oF/TCP service
- deletion of NVMe-oF/TCP service
- edit/update NVMe-oF/TCP service
- added unit tests for NVMe-oF/TCP service
- changed Id -> Service Name
- added prefix of service type in service name (similar to <client.> in
fs access)
- service name and pool are required fields for nvmeof
- placement count now takes default value as mentioned in cephadm
- slight refactors
- prepopulate serviceId for each service type setServiceId()
- if serviceId is the same as the service type, do not create the service name in <servicetype>.<serviceid> format
Ilya Dryomov [Fri, 7 Jun 2024 10:12:29 +0000 (12:12 +0200)]
librbd: add rbd_snap_get_trash_namespace2() API to return full namespace
The existing rbd_snap_get_trash_namespace() API returns only the
original name of the deleted snapshot, omitting its namespace type.
While non-user snapshots have distinctive names, there is nothing
preventing the user from creating user snapshots with identical names
(i.e. starting with ".group" or ".mirror" prefix). After cloning from
non-user snapshots is allowed, it's possible for such user snapshots to
get mixed up with non-user snapshots in the trash, so let's provide
means for disambiguation.
Ilya Dryomov [Thu, 30 May 2024 14:54:53 +0000 (16:54 +0200)]
qa/workunits/rbd: fix bogus grep -v asserts in test_clone()
The intent of "rbd ls | grep -v clone" was probably to check that an
image with the name "clone" shows up in rbd2 pool and not in rbd pool.
However, it's very far from that -- "grep -v clone" would succeed
regardless because of an image with the name "test1" in rbd pool.
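The `grep -v` exit status can be modeled in a few lines. This sketch uses plain substring matching rather than regular expressions, which is enough for the "clone" pattern here; the function name is made up.

```python
from typing import Iterable


def grep_v_succeeds(lines: Iterable[str], pattern: str) -> bool:
    """Model grep -v's exit status: success if any line lacks the pattern."""
    return any(pattern not in line for line in lines)
```

With both "test1" and "clone" in the listing the check still succeeds, so the assert cannot detect an unexpected "clone" image. It fails only when every line contains the pattern or the listing is empty.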
Ilya Dryomov [Fri, 24 May 2024 10:06:09 +0000 (12:06 +0200)]
librbd: add rbd_clone4() API to take parent snapshot by ID
Allow cloning from non-user snapshots -- namely snapshots in group
and mirror namespaces. The motivation is to provide a building block
for cloning new groups from group snapshots ("rbd group snap create").
Otherwise, group snapshots as they are today can be used only for
rolling back the group as a whole, which is very limiting.
While at it, there doesn't seem to be anything wrong with making it
possible to clone from mirror snapshots as well.
Snapshots in a trash namespace can't be cloned from since they are
considered to be deleted.
Cloning from non-user snapshots is limited to clone v2 just because
protecting/unprotecting is limited to snapshots in a user namespace.
This happens to simplify some invariants.
librbd: replace assert with error check in clone()
With an error check for p_snap_name, it doesn't make much sense to
crash if "either p_id or p_name" contract is violated. Replace the
assert with a similar error check.