qa/tasks/cephadm: don't wait for OSDs in create_rbd_pool()
This fails because teuthology.wait_until_osds_up() wants to use the
adjust-ulimits wrapper, which isn't available in the "cephadm shell"
environment. The wait is also redundant because the cephadm task is
supposed to wait for OSDs to come up earlier, in ceph_osds().
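For context, a minimal Python sketch of the task after the change; `_shell` and the config plumbing are approximations of the real qa/tasks/cephadm.py code, not the exact implementation:
```python
def _shell(ctx, cluster_name, remote, args):
    # Stand-in for the cephadm task helper that runs commands via
    # "cephadm shell" on the bootstrap remote.
    remote.run(args=['cephadm', 'shell', '--'] + args)

def create_rbd_pool(ctx, config):
    cluster_name = config['cluster']
    remote = ctx.ceph[cluster_name].bootstrap_remote
    # No teuthology.wait_until_osds_up() here: it shells out through
    # adjust-ulimits, which "cephadm shell" lacks, and ceph_osds() has
    # already waited for the OSDs to come up.
    _shell(ctx, cluster_name, remote,
           ['ceph', 'osd', 'pool', 'create', 'rbd', '8'])
    _shell(ctx, cluster_name, remote,
           ['rbd', 'pool', 'init', 'rbd'])
```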
Merge pull request #57996 from kchheda3/wip-66443-squid
squid: rgw/notification: Store the value of `persistent_queue` for existing topics and continue committing events for all topics subscribed to a given bucket
Casey Bodley [Wed, 26 Jun 2024 14:52:37 +0000 (10:52 -0400)]
rgw: fix multipart get part when count==1
the RGWObjManifest for multipart uploads is subtly different when
there's only a single part. in that case, get_cur_part_id() for the
final rule returns 1 where it otherwise returns (parts_count + 1)
this caused two problems:
* we returned a parts_count of 0 instead of 1, and
* the do-while loop got stuck in an infinite loop expecting the last
rule's part id to be higher than the requested part id
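To make the off-by-one concrete, here is a hedged Python model of the parts_count derivation (the real logic is C++ in RGWObjManifest; the function below is illustrative only):
```python
def parts_count_from_final_rule(final_rule_part_id: int) -> int:
    # For multipart manifests the final rule's get_cur_part_id() is
    # parts_count + 1, so the naive derivation is:
    #   parts_count = final_rule_part_id - 1
    # With a single part, however, the final rule's part id is 1, which
    # would yield a parts_count of 0; treat that case as 1.
    if final_rule_part_id == 1:
        return 1
    return final_rule_part_id - 1

assert parts_count_from_final_rule(1) == 1   # single-part upload
assert parts_count_from_final_rule(5) == 4   # four-part upload
```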
authentication.rst described the steps to generate a v2 signature,
without reference to aws docs. replace that with sections that reference
aws docs for v2 and v4 signatures. list which values of the request
header x-amz-content-sha256 are supported for v4
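For reference, the default value of x-amz-content-sha256 in a SigV4 request is the hex-encoded SHA-256 of the payload; a minimal Python sketch:
```python
import hashlib

def content_sha256(payload: bytes) -> str:
    # Hex-encoded SHA-256 of the request body, the usual value of the
    # x-amz-content-sha256 header for a SigV4 request.
    return hashlib.sha256(payload).hexdigest()

# An empty payload hashes to the well-known constant:
assert content_sha256(b'') == (
    'e3b0c44298fc1c149afbf4c8996fb924'
    '27ae41e4649b934ca495991b7852b855')
# Clients may instead send sentinel values such as UNSIGNED-PAYLOAD or
# STREAMING-AWS4-HMAC-SHA256-PAYLOAD; see the AWS SigV4 docs.
```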
Document how to manually pass the search domain to "mon_dns_srv_name" in
doc/rados/configuration/mon-lookup-dns.rst.
This commit is made in response to a request by Lander Duncan that was made on the [ceph-users] mailing list, and can be seen here: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/F7V4CWLIYCAJ4JXI2JLNY6QPCFPR4SLA/
Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com> Signed-off-by: Zac Dover <zac.dover@proton.me>
(cherry picked from commit 98938a0312dd0c8e0b293ed9aa2e0760cc9619fa)
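As an illustration of the documented setup, the configuration and DNS records might look like this (domain and hostnames are hypothetical; the search domain itself comes from the host's resolver configuration):
```
# ceph.conf
[global]
mon_dns_srv_name = ceph-mon

# Matching DNS SRV records, looked up as _ceph-mon._tcp.<search domain>:
# _ceph-mon._tcp.example.com. 3600 IN SRV 10 60 6789 mon1.example.com.
# _ceph-mon._tcp.example.com. 3600 IN SRV 10 60 6789 mon2.example.com.
```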
Repair the link to cephfs-shell.rst in doc/cephfs/cephfs-shell.rst that
was broken in https://github.com/ceph/ceph/pull/41165/ when
doc/cephfs/cephfs-shell.rst was moved to doc/man/8/cephfs-shell.rst.
This commit is made in response to a request by Lander Duncan that was
made on the [ceph-users] mailing list, and can be seen here: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/F7V4CWLIYCAJ4JXI2JLNY6QPCFPR4SLA/
Nizamudeen A [Wed, 26 Jun 2024 13:22:40 +0000 (18:52 +0530)]
mgr/dashboard: fix clone async validators with different groups
Provide a way to dynamically update the async validator based on the
selector field, so that when the selected value changes, dependent
fields like the clone name get validated again against the new value.
Pere Diaz Bou [Wed, 26 Jun 2024 13:57:47 +0000 (15:57 +0200)]
doc/rados: update how to install c++ header files
In this example, librados2-devel installs only the C header files on
Fedora 40, so libradospp-devel was added to the command to include the
C++ header files as well, as shown below.
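The resulting command on Fedora 40 then looks like:
```
# librados2-devel provides the C headers; libradospp-devel the C++ ones.
dnf install librados2-devel libradospp-devel
```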
Zac Dover [Mon, 24 Jun 2024 10:32:30 +0000 (20:32 +1000)]
doc/rados: edit troubleshooting-osd.rst
Make minor changes to the "Debugging Slow Requests" section of
doc/rados/troubleshooting/troubleshooting-osd.rst in preparation
for an expansion of this section in response to a request from Joel
Davidow.
Bill Scales [Wed, 19 Jun 2024 08:36:06 +0000 (08:36 +0000)]
osd/ECBackend.cc: Fix double increment of num_shards_repaired stat
Commit https://github.com/ceph/ceph/commit/deffa8209f9c0bd300cfdb54d358402bfc6e41c6 refactored
ECBackend::handle_recovery_push for Crimson but accidentally duplicated the code that increments
the num_shards_repaired OSD statistic.
This caused one of the QA tests to fail because the stat reported twice
as much repair work as had actually been completed:
qa/standalone/scrub/osd-scrub-repair.sh: TEST_repair_stats_ec: test 26 = 13
Fixes: https://tracker.ceph.com/issues/64437 Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
(cherry picked from commit e618dc01a7a1bdfaa3e1a6fa2a9a9ac13eee11b8)
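The bug pattern, shown as a hedged Python illustration (the real code is C++ in ECBackend::handle_recovery_push; names are simplified):
```python
class RecoveryStats:
    def __init__(self):
        self.num_shards_repaired = 0

    def handle_recovery_push(self, is_repair: bool):
        if is_repair:
            self.num_shards_repaired += 1  # original increment
        self._refactored_common_path(is_repair)

    def _refactored_common_path(self, is_repair: bool):
        if is_repair:
            self.num_shards_repaired += 1  # duplicate left by the refactor

stats = RecoveryStats()
for _ in range(13):
    stats.handle_recovery_push(is_repair=True)
assert stats.num_shards_repaired == 26  # test expected 13; fix removes one
```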
Nizamudeen A [Fri, 7 Jun 2024 07:45:06 +0000 (13:15 +0530)]
mgr/dashboard: select default daemon based on the default zonegroup
if multisite is configured, the default daemon needs to be selected
based on the default zonegroup. Otherwise the dashboard gives you
incorrect details when doing the period commit.
The issue occurs when you run `period update --commit` and reload one of
the pages: for a moment the API assigns the zonegroup of the second
gateway, because the first gateway briefly reflects the period changes.
This is wrong because the default zonegroup belongs to the previously
active gateway; even though the back-end reports the active zonegroup
correctly, the dashboard API reports it incorrectly.
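A hedged sketch of the selection logic (helper names are hypothetical, not the actual mgr/dashboard API):
```python
def select_default_daemon(daemons, default_zonegroup_name):
    # Prefer a daemon that belongs to the default zonegroup so period
    # commits are reported against the right gateway.
    for daemon in daemons:
        if daemon.zonegroup_name == default_zonegroup_name:
            return daemon
    # Fall back to the first daemon if none matches.
    return daemons[0] if daemons else None
```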
Nizamudeen A [Fri, 24 May 2024 14:20:11 +0000 (19:50 +0530)]
mgr/dashboard: apply replication policy for a bucket
On a normally configured multisite cluster, you can create a bucket
with replication enabled; this stops the zone-wide syncing and starts
granular bucket syncing, meaning only the buckets with replication
enabled will be synced to the secondary site.
To enable replication, there must be a group policy created in the
primary site. If no group policy exists, the dashboard will create one
with a bidirectional rule and add all the zones in the zonegroup for
syncing, as sketched below.
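The equivalent manual steps with radosgw-admin look roughly like this (group, flow, and pipe ids are illustrative):
```
# Zonegroup-level group policy with a bidirectional (symmetrical) flow
# across all zones in the zonegroup:
radosgw-admin sync group create --group-id=group1 --status=allowed
radosgw-admin sync group flow create --group-id=group1 \
    --flow-id=flow1 --flow-type=symmetrical --zones='*'
radosgw-admin sync group pipe create --group-id=group1 \
    --pipe-id=pipe1 --source-zones='*' --dest-zones='*'
# Then enable replication for one bucket:
radosgw-admin sync group create --bucket=mybucket \
    --group-id=bucket-group1 --status=enabled
```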
Nizamudeen A [Fri, 24 May 2024 15:16:17 +0000 (20:46 +0530)]
mgr/dashboard: add dueTime to rgw bucket validator
the unique async validator in the bucket creation form, which checks
whether the typed bucket name already exists, sends a request to the
backend on each keystroke, and each request raises an exception if the
bucket is not found.
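The fix adds a dueTime so the validator is debounced and fires only after typing pauses. A conceptual Python analog of debouncing (the actual change is in the Angular/RxJS validator; this is not dashboard code):
```python
import asyncio

async def check_bucket_exists(name: str) -> None:
    # Stand-in for the dashboard's backend request; hypothetical.
    print(f"validating bucket name {name!r}")

async def debounced_validate(keystrokes: asyncio.Queue, due_time: float = 0.3):
    # Send one validation request after due_time seconds of quiet,
    # instead of one request per keystroke.
    value = await keystrokes.get()
    while True:
        try:
            # A newer keystroke within due_time supersedes this one.
            value = await asyncio.wait_for(keystrokes.get(), timeout=due_time)
        except asyncio.TimeoutError:
            break
    await check_bucket_exists(value)
```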
Nizamudeen A [Fri, 7 Jun 2024 13:49:42 +0000 (19:19 +0530)]
mgr/dashboard: fix edit bucket failing in other selected gateways
even if I select gateway 8002, the bucket policy request seems to go through 8000 and doesn't find the bucket:
```
2024-06-07T13:40:33.161+0000 7f563be00700 0 [dashboard DEBUG rest_client] RGW REST API GET req: /hello?policy data: None
2024-06-07T13:40:33.164+0000 7f563be00700 0 [dashboard DEBUG urllib3.connectionpool] http://172.20.0.5:8000 "GET /hello?policy HTTP/1.1" 404 174
2024-06-07T13:40:33.164+0000 7f563be00700 0 [dashboard ERROR rest_client] RGW REST API failed GET req status: 404
2024-06-07T13:40:33.164+0000 7f563be00700 0 [dashboard ERROR exception] Internal Server Error
Traceback (most recent call last):
File "/ceph/src/pybind/mgr/dashboard/services/exception.py", line 47, in dashboard_exception_handler
return handler(*args, **kwargs)
File "/usr/lib/python3.6/site-packages/cherrypy/_cpdispatch.py", line 54, in __call__
return self.callable(*self.args, **self.kwargs)
File "/ceph/src/pybind/mgr/dashboard/controllers/_base_controller.py", line 263, in inner
ret = func(*args, **kwargs)
File "/ceph/src/pybind/mgr/dashboard/controllers/_rest_controller.py", line 193, in wrapper
return func(*vpath, **params)
File "/ceph/src/pybind/mgr/dashboard/controllers/rgw.py", line 463, in get
result['bucket_policy'] = self._get_policy(bucket_name)
File "/ceph/src/pybind/mgr/dashboard/controllers/rgw.py", line 381, in _get_policy
return rgw_client.get_bucket_policy(bucket)
File "/ceph/src/pybind/mgr/dashboard/rest_client.py", line 543, in func_wrapper
**kwargs)
File "/ceph/src/pybind/mgr/dashboard/services/rgw_client.py", line 957, in get_bucket_policy
raise e
File "/ceph/src/pybind/mgr/dashboard/services/rgw_client.py", line 949, in get_bucket_policy
request = request()
File "/ceph/src/pybind/mgr/dashboard/rest_client.py", line 325, in __call__
data, raw_content, headers)
File "/ceph/src/pybind/mgr/dashboard/rest_client.py", line 428, in do_request
resp.content)
dashboard.rest_client.RequestException: RGW REST API failed request with status code 404
(b'{"Code":"NoSuchBucket","Message":"","BucketName":"hello","RequestId":"tx0000'
b'0d73bbbad485175ea-0066630dd1-18785-zone1-zg1-realm1","HostId":"18785-zone1-z'
b'g1-realm1-zg1-realm1"}')
```
But for the same bucket, the encryption and other requests go through the correct gateway:
```
2024-06-07T13:40:32.704+0000 7f563be00700 0 [dashboard DEBUG urllib3.connectionpool] http://172.20.0.5:8002 "GET /hello?versioning HTTP/1.1" 200 2
2024-06-07T13:40:32.745+0000 7f563be00700 0 [dashboard DEBUG rest_client] RGW REST API GET res status: 200 content: {}
2024-06-07T13:40:32.745+0000 7f563be00700 0 [dashboard INFO rgw_client] Found RGW daemon with configuration: host=172.20.0.5, port=8000, ssl=False
2024-06-07T13:40:32.746+0000 7f563be00700 0 [dashboard INFO rgw_client] Found RGW daemon with configuration: host=172.20.0.5, port=8002, ssl=False
2024-06-07T13:40:32.746+0000 7f563be00700 0 [dashboard DEBUG rest_client] RGW REST API GET req: /hello?encryption data: None
2024-06-07T13:40:32.747+0000 7f563be00700 0 [dashboard DEBUG urllib3.connectionpool] http://172.20.0.5:8002 "GET /hello?encr
```
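A hedged sketch of the fix's idea (not the actual mgr/dashboard code): thread the selected daemon through to the REST client instead of silently falling back to the default gateway:
```python
class RgwClientRegistry:
    def __init__(self, clients, default_daemon):
        self._clients = clients      # daemon_name -> per-gateway client
        self._default = default_daemon

    def instance(self, daemon_name=None):
        # Bug pattern: callers that omit daemon_name hit the default
        # gateway (port 8000) even when 8002 was selected in the UI.
        return self._clients[daemon_name or self._default]

def get_bucket_policy(registry, bucket_name, daemon_name):
    client = registry.instance(daemon_name=daemon_name)  # pass it through
    return client.get_bucket_policy(bucket_name)
```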
commit [1] introduced a behavior change.
`ceph-volume lvm prepare` used to create VGs/LVs when it was passed
partitions for db and/or wal devices. Since commit [1] was introduced,
ceph-volume consumes the partition directly and no longer creates an LV.
Although this doesn't prevent OSDs from being created, it is a behavior
change.
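For example (device paths are illustrative):
```
# Before commit [1]: the db partition below would be wrapped in a new
# VG/LV. After commit [1]: the partition is consumed directly and no
# LV is created.
ceph-volume lvm prepare --data /dev/sdb --block.db /dev/sdc1
```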
Laura Flores [Wed, 19 Jun 2024 21:57:45 +0000 (16:57 -0500)]
qa/suites/upgrade/telemetry-upgrade/reef-x: update how cephadm is pulled and change image reference
Update how cephadm is pulled:
`cephadm_git_url` and `cephadm_branch` are used in releases older than reef
to install cephadm. Both of these keys are needed to install it from the github
repo.
However, from reef onward, the compiled zipapp cephadm needs to be
pulled differently from the old single-python-script `cephadm` of
earlier releases.
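For pre-reef releases, the two keys from the commit message appear in the suite yaml roughly like this (the surrounding structure is illustrative; the reef-and-later zipapp pull is configured differently and not shown here):
```yaml
tasks:
- cephadm:
    cephadm_git_url: https://github.com/ceph/ceph
    cephadm_branch: pacific
```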
Laura Flores [Wed, 19 Jun 2024 21:07:31 +0000 (16:07 -0500)]
qa/suites/upgrade/telemetry-upgrade: add more ignorelist items and require_osd_release=squid
The warnings added to the ignorelist show up in the cluster log, but they are
expected during upgrades and should thus be ignored.
We also need to set require_osd_release=squid to avoid this warning:
```
cluster [WRN] Health check failed: all OSDs are running squid or later but require_osd_release < squid (OSD_UPGRADE_FINISHED)
```
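Once all OSDs run squid, the flag can be set with the standard CLI command:
```
ceph osd require-osd-release squid
```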
Laura Flores [Tue, 11 Jun 2024 20:10:01 +0000 (15:10 -0500)]
qa/suites/upgrade/telemetry-upgrade: upgrade from reef instead of pacific
With cephadm upgrades, we are only allowed to upgrade from as far back as N-2
releases. On the main branch, that means we can only upgrade from quincy and reef, and
we can no longer upgrade from pacific.
This test was trying to upgrade from pacific, which isn't allowed and
led to an `UPGRADE_BAD_TARGET_VERSION` cluster error.