Ilya Dryomov [Fri, 14 Jun 2024 12:04:39 +0000 (14:04 +0200)]
librbd: disallow group snap rollback if memberships don't match
Before proceeding with group rollback, ensure that the set of images
that took part in the group snapshot matches the set of images that are
currently part of the group. Otherwise, because we preserve affected
snapshots when an image is removed from the group, data loss can ensue
where an image gets rolled back while part of another group or not part
of any group but long repurposed for something else.
Similarly, ensure that the group snapshot is complete.
After the rollback assert in TestGroup.add_snapshot{,PP} was made
meaningful in the previous commit, it fails in mock tests, which means
that rollback has never been exercised properly...
While I confess to not following the file->snap_id == CEPH_NOSNAP
branch, especially given how the file variable is shadowed, it's pretty
clear that
get_snap_read() doesn't belong here -- the snapshot selected for reads
has nothing to do with rollback. Replacing it with the rollback snap
ID makes sense of the other branches and makes the tests in question
pass.
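For illustration, a minimal C sketch of what a caller sees after this
change, assuming an open RADOS ioctx; the group and snapshot names are
hypothetical, and the exact error code returned on a membership
mismatch is an assumption:
```
#include <stdio.h>
#include <rados/librados.h>
#include <rbd/librbd.h>

/* Roll the group back to one of its snapshots. With this change the
 * call is expected to fail cleanly (instead of risking data loss) if
 * the set of member images changed since the snapshot was taken, or
 * if the group snapshot is incomplete. */
int rollback_group(rados_ioctx_t ioctx)
{
    /* "mygroup" and "mysnap" are hypothetical names. */
    int r = rbd_group_snap_rollback(ioctx, "mygroup", "mysnap");
    if (r < 0) {
        fprintf(stderr, "group snap rollback refused: %d\n", r);
        return r;
    }
    return 0;
}
```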
Ilya Dryomov [Thu, 13 Jun 2024 14:24:43 +0000 (16:24 +0200)]
test/librbd: make rollback in TestGroup.add_snapshot{,PP} meaningful
The rollback assert doesn't really test anything -- because orig_data
and test_data are written to non-overlapping areas, the test would pass
even if rbd_group_snap_rollback() does nothing (i.e. rollback isn't
performed) as long as the call returns 0.
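In sketch form, a meaningful version has the post-rollback read target
the same extent that was overwritten. This is a simplified C paraphrase
of the gtest flow, assuming an open image that already belongs to a
hypothetical group "grp" (a real test may also need to refresh or
reopen the image after rollback):
```
#include <assert.h>
#include <string.h>
#include <rados/librados.h>
#include <rbd/librbd.h>

void exercise_rollback(rados_ioctx_t ioctx, rbd_image_t image)
{
    const char orig_data[] = "original data";
    const char test_data[] = "0000000000000";  /* same length */
    char read_buf[sizeof(orig_data)];

    /* Write orig_data, then snapshot the group. */
    assert(rbd_write(image, 0, sizeof(orig_data), orig_data) ==
           (ssize_t)sizeof(orig_data));
    assert(rbd_group_snap_create(ioctx, "grp", "snap") == 0);

    /* Overwrite the SAME extent -- with non-overlapping writes the
     * final assert would pass even if rollback did nothing. */
    assert(rbd_write(image, 0, sizeof(test_data), test_data) ==
           (ssize_t)sizeof(test_data));

    assert(rbd_group_snap_rollback(ioctx, "grp", "snap") == 0);

    /* After rollback, reads must return orig_data again. */
    assert(rbd_read(image, 0, sizeof(read_buf), read_buf) ==
           (ssize_t)sizeof(read_buf));
    assert(memcmp(read_buf, orig_data, sizeof(orig_data)) == 0);
}
```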
Ilya Dryomov [Fri, 7 Jun 2024 10:12:29 +0000 (12:12 +0200)]
librbd: add rbd_snap_get_trash_namespace2() API to return full namespace
The existing rbd_snap_get_trash_namespace() API returns only the
original name of the deleted snapshot, omitting its namespace type.
While non-user snapshots have distinctive names, there is nothing
preventing the user from creating user snapshots with identical names
(i.e. starting with a ".group" or ".mirror" prefix). Once cloning from
non-user snapshots is allowed, it's possible for such user snapshots to
get mixed up with non-user snapshots in the trash, so let's provide
a means for disambiguation.
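A hedged sketch of consuming the new call; the
rbd_snap_trash_namespace_t struct, the signature, and the cleanup
helper are assumptions inferred from the description above and from
existing librbd API patterns:
```
#include <stdint.h>
#include <stdio.h>
#include <rbd/librbd.h>

/* Unlike rbd_snap_get_trash_namespace(), which returns only the
 * original name, the "2" variant also reports the original namespace
 * type, so a user snapshot named ".group..." can be told apart from
 * a genuine group snapshot sitting in the trash. */
void inspect_trashed_snap(rbd_image_t image, uint64_t snap_id)
{
    rbd_snap_trash_namespace_t trash;  /* assumed struct */
    int r = rbd_snap_get_trash_namespace2(image, snap_id, &trash,
                                          sizeof(trash));
    if (r < 0) {
        fprintf(stderr, "failed to query trash namespace: %d\n", r);
        return;
    }
    printf("original name: %s, original namespace type: %d\n",
           trash.original_name, (int)trash.original_namespace_type);
    /* assumed cleanup helper, following existing *_cleanup patterns */
    rbd_snap_trash_namespace_cleanup(&trash, sizeof(trash));
}
```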
Ilya Dryomov [Thu, 30 May 2024 14:54:53 +0000 (16:54 +0200)]
qa/workunits/rbd: fix bogus grep -v asserts in test_clone()
The intent of "rbd ls | grep -v clone" was probably to check that an
image with the name "clone" shows up in rbd2 pool and not in rbd pool.
However, it's very far from that -- "grep -v clone" exits 0 as long as
at least one listed line doesn't match, and the rbd pool also contains
an image named "test1", so the assert passes regardless of whether an
image named "clone" is present.
Ilya Dryomov [Fri, 24 May 2024 10:06:09 +0000 (12:06 +0200)]
librbd: add rbd_clone4() API to take parent snapshot by ID
Allow cloning from non-user snapshots -- namely snapshots in group
and mirror namespaces. The motivation is to provide a building block
for cloning new groups from group snapshots ("rbd group snap create").
Otherwise, group snapshots as they are today can be used only for
rolling back the group as a whole, which is very limiting.
While at it, there doesn't seem to be anything wrong with making it
possible to clone from mirror snapshots as well.
Snapshots in a trash namespace can't be cloned from since they are
considered to be deleted.
Cloning from non-user snapshots is limited to clone v2 just because
protecting/unprotecting is limited to snapshots in a user namespace.
This happens to simplify some invariants.
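A sketch of the intended building block; the rbd_clone4() signature is
assumed to mirror rbd_clone3() with the parent snapshot name replaced
by a snapshot ID, and the image names are illustrative:
```
#include <stdint.h>
#include <stdio.h>
#include <rados/librados.h>
#include <rbd/librbd.h>

/* Clone a child image from a parent snapshot identified by ID rather
 * than by name, which is what allows cloning from group/mirror
 * snapshots that have no user-creatable snapshot name. */
int clone_from_snap_id(rados_ioctx_t p_ioctx, rados_ioctx_t c_ioctx,
                       uint64_t p_snap_id)
{
    rbd_image_options_t opts;
    rbd_image_options_create(&opts);

    /* Non-user snapshots require clone v2. */
    rbd_image_options_set_uint64(opts, RBD_IMAGE_OPTION_CLONE_FORMAT, 2);

    int r = rbd_clone4(p_ioctx, "parent", p_snap_id,
                       c_ioctx, "child", opts);
    rbd_image_options_destroy(opts);
    if (r < 0)
        fprintf(stderr, "clone failed: %d\n", r);
    return r;
}
```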
librbd: replace assert with error check in clone()
With an error check for p_snap_name in place, it doesn't make much
sense to crash if the "either p_id or p_name" contract is violated.
Replace the assert with a similar error check.
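Schematically (a sketch, not the actual librbd code), the contract
check becomes an ordinary argument check:
```
#include <errno.h>
#include <stddef.h>

/* Exactly one of p_id / p_name must be provided. Previously a
 * violation tripped an assert and crashed; now it is reported like
 * any other bad argument (error code assumed). */
static int check_parent_spec(const char *p_id, const char *p_name)
{
    if ((p_id != NULL) == (p_name != NULL))
        return -EINVAL;
    return 0;
}
```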
Afreen Misbah [Wed, 12 Jun 2024 15:50:04 +0000 (21:20 +0530)]
mgr/dashboard: Fix login and notification e2e tests
Fixes https://tracker.ceph.com/issues/66453
- the `#rbdMirroring` checkbox is not found, due to which both of these tests fail on most PRs
- this is due to the pool helper function, which checks for an existing app passed as a parameter
- if the app is not found, the mirroring checkbox remains hidden
`set_dmcrypt_no_workqueue()` from `ceph_volume.util.encryption`
The function `set_dmcrypt_no_workqueue` in `encryption.py` now
dynamically retrieves the installed cryptsetup version via the
`cryptsetup --version` command. It then parses the version string with
a regular expression that accommodates varying digit counts. If the
retrieved version is greater than or equal to the specified target
version, `conf.dmcrypt_no_workqueue` is set to True, allowing for
flexible version handling.
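The described implementation is Python in
`ceph_volume.util.encryption`; below is a minimal C sketch of the same
parse-and-compare logic, where the 2.3.4 target version is an
assumption and sscanf stands in for the regular expression:
```
#include <stdbool.h>
#include <stdio.h>

/* Assumed target: the release taken to support the no-workqueue
 * performance flags. */
static const int TARGET[3] = {2, 3, 4};

/* Run "cryptsetup --version", parse "cryptsetup X.Y.Z ...", and
 * report whether the installed version is >= TARGET. Returns false
 * if the output can't be obtained or parsed. */
static bool dmcrypt_no_workqueue_supported(void)
{
    char line[256];
    int v[3] = {0, 0, 0};
    FILE *p = popen("cryptsetup --version", "r");
    if (p == NULL)
        return false;
    if (fgets(line, sizeof(line), p) == NULL) {
        pclose(p);
        return false;
    }
    pclose(p);
    /* %d accommodates varying digit counts, e.g. 2.3.4 or 2.10.12. */
    if (sscanf(line, "cryptsetup %d.%d.%d", &v[0], &v[1], &v[2]) < 2)
        return false;
    for (int i = 0; i < 3; i++) {
        if (v[i] != TARGET[i])
            return v[i] > TARGET[i];
    }
    return true;  /* exactly the target version */
}
```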
Laura Flores [Fri, 7 Jun 2024 17:30:33 +0000 (12:30 -0500)]
qa/suites/rados/thrash-old-clients: update supported releases and distro
thrash-old-clients tests should only support N-3 releases. To fix this for
main, I have removed all releases < quincy and have added squid.
Also, we are fully switching to centos.9_stream packages/containers after
the centos.8_stream end of life, so I changed the distro from centos.8_stream
to centos.9_stream.
*** Note: If this commit is backported, it should be done in such a way that
only releases >= quincy reference centos.9_stream. For instance, if backporting
to squid, it is fine for a reef/squid thrash test to reference centos.9_stream
since both reef and squid support it, but a pacific/squid test will have to
take a different approach since pacific does not support centos.9_stream.
Fixes: https://tracker.ceph.com/issues/66398
Signed-off-by: Laura Flores <lflores@ibm.com>
Add an explanation of leader-peon conditions that obtain when the
cluster is in the "HEALTH_OK" state. Previously, the text discussed
these two monitor states only in the context of a health detail entry.
This improvement to the documentation was suggested on the [ceph-users]
email list by Joel Davidow. This email, an absolute model of user
engagement with an upstream project, can be reviewed here: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/KF67F5TXFSSTPXV7EKL6JKLA5KZQDLDQ/
I will list Joel Davidow here as the co-author for the sake of getting
this change into the documentation more expediently, but although he is
listed as the co-author, he is the true author.
Co-authored-by: Joel Davidow <jdavidow@nso.edu>
Signed-off-by: Zac Dover <zac.dover@proton.me>
* refs/pull/48130/head:
qa: add killpoint testing for dirfrags
qa: stringify arguments to setfattr
qa: move some configs to cluster-conf
qa: restore default for config to split exports
qa/tasks/ceph_test_case: rollback configs using `config reset`
qa/cephfs: set confs using cluster-conf
qa/tasks/ceph: provide configuration for setting configs via mon
mds: optimize MDBalancer code path config access
mds: add killpoints for directory fragmentation
Nizamudeen A [Fri, 7 Jun 2024 13:49:42 +0000 (19:19 +0530)]
mgr/dashboard: fix edit bucket failing in other selected gateways
Even if I select gateway 8002, the bucket policy request still goes through 8000 and doesn't find the bucket:
```
2024-06-07T13:40:33.161+0000 7f563be00700 0 [dashboard DEBUG rest_client] RGW REST API GET req: /hello?policy data: None
2024-06-07T13:40:33.164+0000 7f563be00700 0 [dashboard DEBUG urllib3.connectionpool] http://172.20.0.5:8000 "GET /hello?policy HTTP/1.1" 404 174
2024-06-07T13:40:33.164+0000 7f563be00700 0 [dashboard ERROR rest_client] RGW REST API failed GET req status: 404
2024-06-07T13:40:33.164+0000 7f563be00700 0 [dashboard ERROR exception] Internal Server Error
Traceback (most recent call last):
File "/ceph/src/pybind/mgr/dashboard/services/exception.py", line 47, in dashboard_exception_handler
return handler(*args, **kwargs)
File "/usr/lib/python3.6/site-packages/cherrypy/_cpdispatch.py", line 54, in __call__
return self.callable(*self.args, **self.kwargs)
File "/ceph/src/pybind/mgr/dashboard/controllers/_base_controller.py", line 263, in inner
ret = func(*args, **kwargs)
File "/ceph/src/pybind/mgr/dashboard/controllers/_rest_controller.py", line 193, in wrapper
return func(*vpath, **params)
File "/ceph/src/pybind/mgr/dashboard/controllers/rgw.py", line 463, in get
result['bucket_policy'] = self._get_policy(bucket_name)
File "/ceph/src/pybind/mgr/dashboard/controllers/rgw.py", line 381, in _get_policy
return rgw_client.get_bucket_policy(bucket)
File "/ceph/src/pybind/mgr/dashboard/rest_client.py", line 543, in func_wrapper
**kwargs)
File "/ceph/src/pybind/mgr/dashboard/services/rgw_client.py", line 957, in get_bucket_policy
raise e
File "/ceph/src/pybind/mgr/dashboard/services/rgw_client.py", line 949, in get_bucket_policy
request = request()
File "/ceph/src/pybind/mgr/dashboard/rest_client.py", line 325, in __call__
data, raw_content, headers)
File "/ceph/src/pybind/mgr/dashboard/rest_client.py", line 428, in do_request
resp.content)
dashboard.rest_client.RequestException: RGW REST API failed request with status code 404
(b'{"Code":"NoSuchBucket","Message":"","BucketName":"hello","RequestId":"tx0000'
b'0d73bbbad485175ea-0066630dd1-18785-zone1-zg1-realm1","HostId":"18785-zone1-z'
b'g1-realm1-zg1-realm1"}')
```
But for the same bucket, the encryption and other requests go through the correct gateway:
```
2024-06-07T13:40:32.704+0000 7f563be00700 0 [dashboard DEBUG urllib3.connectionpool] http://172.20.0.5:8002 "GET /hello?versioning HTTP/1.1" 200 2
2024-06-07T13:40:32.745+0000 7f563be00700 0 [dashboard DEBUG rest_client] RGW REST API GET res status: 200 content: {}
2024-06-07T13:40:32.745+0000 7f563be00700 0 [dashboard INFO rgw_client] Found RGW daemon with configuration: host=172.20.0.5, port=8000, ssl=False
2024-06-07T13:40:32.746+0000 7f563be00700 0 [dashboard INFO rgw_client] Found RGW daemon with configuration: host=172.20.0.5, port=8002, ssl=False
2024-06-07T13:40:32.746+0000 7f563be00700 0 [dashboard DEBUG rest_client] RGW REST API GET req: /hello?encryption data: None
2024-06-07T13:40:32.747+0000 7f563be00700 0 [dashboard DEBUG urllib3.connectionpool] http://172.20.0.5:8002 "GET /hello?encr
```
Fixes: https://tracker.ceph.com/issues/66395
Signed-off-by: Nizamudeen A <nia@redhat.com>