Zac Dover [Wed, 11 Jun 2025 12:44:32 +0000 (22:44 +1000)]
doc/rados/ops: edit cache-tiering.rst
Add material to doc/rados/operations/cache-tiering.rst, as suggested by
Anthony D'Atri in
https://github.com/ceph/ceph/pull/63745#discussion_r2127887785.
Ville Ojamo [Wed, 30 Apr 2025 18:17:14 +0000 (01:17 +0700)]
doc/radosgw: Improve rgw-cache.rst
Try to improve the language by completely rewriting some sentences.
Attempt to format the document more like the rest of the docs.
Fix several errors in punctuation, capitalization, spaces etc.
Use blocks with bash prompts for CLI commands instead of hardcoded
prompts.
Fix section hierarchy and section title underline lengths.
Use admonition.
Add comprehensive documentation for defining configuration options in
ceph-mgr modules, including all supported properties and their usage.
Previously, the documentation did not explain how to define ceph-mgr
module configuration options, despite subtle differences from other Ceph
components. This change documents all supported Option properties, their
types, and provides clear examples to help module developers properly
configure their options.
Casey Bodley [Wed, 18 Dec 2024 16:28:02 +0000 (11:28 -0500)]
rgw: don't use merge_and_store_attrs() when recreating a bucket
https://github.com/ceph/ceph/pull/56583 recently fixed
merge_and_store_attrs() to preserve existing attrs, but this broke the
Swift API's ability to remove container metadata. RGWCreateBucket
handles this merging itself with prepare_add_del_attrs(), so we should
just assign createparams.attrs to the bucket and store it with
bucket->put_info().
Make the same change for RGWPutMetadataBucket, which Swift uses to
add/remove existing metadata.
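A toy model of why a preserving merge defeats metadata removal. This is only a sketch: bucket attrs are modeled as a plain dict, and the function names below are hypothetical stand-ins, not the RGW C++ signatures.

```python
# Toy model only: attrs are a plain dict. Function names are hypothetical
# and do not mirror merge_and_store_attrs() or prepare_add_del_attrs().

def merge_preserving(existing, new):
    """Merge that keeps every existing attr; new values win on conflict."""
    merged = dict(existing)
    merged.update(new)
    return merged


def add_del_attrs(existing, to_add, to_delete):
    """Caller-side merge: drop the requested keys, then apply additions."""
    result = {k: v for k, v in existing.items() if k not in to_delete}
    result.update(to_add)
    return result


stored = {"meta-color": "blue", "meta-owner": "alice"}

# The client asks to remove "meta-color" and to set "meta-size".
wanted = add_del_attrs(stored, {"meta-size": "large"}, {"meta-color"})

# Merging the prepared attrs back into the stored ones resurrects the key.
after_merge = merge_preserving(stored, wanted)

# Assigning the prepared attrs directly honours the removal.
after_assign = dict(wanted)
```

In this model the removed key survives in after_merge but not in after_assign, which is the behaviour the commit restores by assigning the prepared attrs instead of merging them.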
J. Eric Ivancich [Tue, 22 Oct 2024 17:17:14 +0000 (13:17 -0400)]
rgw: fix empty storage class on display of multipart uploads
Some multipart uploads do not have a stored storage class, however the
code is written such that an empty storage class is treated as the
"STANDARD" storage class. So when encoding the storage class in JSON,
use the canonical storage class.
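A minimal sketch of the idea, assuming the canonical name for an empty stored value is "STANDARD" as described above. The helper names and the JSON field names are hypothetical and are not taken from the RGW code.

```python
import json

DEFAULT_STORAGE_CLASS = "STANDARD"


def canonical_storage_class(stored):
    """Map an empty or missing stored storage class to the canonical name."""
    return stored if stored else DEFAULT_STORAGE_CLASS


def encode_upload(key, stored_storage_class):
    """Encode one multipart upload entry (field names are illustrative)."""
    return json.dumps({
        "Key": key,
        "StorageClass": canonical_storage_class(stored_storage_class),
    })
```

Normalizing at encoding time means the stored data does not need to be rewritten; only the displayed value changes.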
The crash module has been enabled by default since commit 18f253aa in
Nautilus and is now in the always_on_modules list. However, the
documentation still contained instructions for manually enabling it.
When users followed these outdated instructions, they encountered:
```
module 'crash' is already enabled (always-on)
```
The module cannot be disabled either. Running:
```
ceph mgr module disable crash
```
returns the error:
```
Error EINVAL: module 'crash' cannot be disabled (always-on)
```
In this change, we remove the obsolete enabling instructions and clarify
that this module is always active and cannot be disabled.
Kefu Chai [Wed, 25 Jun 2025 04:14:36 +0000 (12:14 +0800)]
mgr/dashboard: Fix inline markup warning in API documentation
Remove trailing space from summary field that was causing Sphinx build
warning.
Sphinx was generating a warning due to malformed inline markup:
```
/home/kefu/dev/ceph/doc/mgr/ceph_api/index.rst:3349: WARNING: Inline strong start-string without end-string.
```
The openapi directive appears to convert trailing spaces into asterisk
markers, creating unterminated strong markup. This change removes the
trailing space to eliminate the warning and maintain consistency with
other entries in the file.
Mark Kogan [Wed, 25 Jun 2025 12:21:49 +0000 (12:21 +0000)]
qa/rgw: fix perl tests missing Amazon::S3 module
Also fix a second case where the perl tests can fail without error output.
1. Fix errors like `Can't locate Amazon/S3.pm in @INC (you may need to
install the Amazon::S3 module)`
by installing the Amazon::S3 module from CPAN before the perl tests run.
Example:
```
2025-06-23T19:18:40.162 INFO:tasks.workunit.client.0.smithi090.stderr:Can't locate Amazon/S3.pm in @INC (you may need to install the Amazon::S3 module) (@INC contains: /usr/local/lib64/perl5/5.32 ...
```
Conflicts:
qa/suites/rgw/multifs/0-install.yaml
- Doesn't exist in this branch, or support for it, so duplicated in
individual tests.
qa/suites/rgw/multifs/tasks/rgw_bucket_quota.yaml
qa/suites/rgw/multifs/tasks/rgw_multipart_upload.yaml
qa/suites/rgw/multifs/tasks/rgw_user_quota.yaml
- Has overrides no longer needed in main
Fixes: https://tracker.ceph.com/issues/71873
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
Deepika Upadhyay [Mon, 28 Oct 2024 09:19:52 +0000 (14:49 +0530)]
rgw: make keystone work without admin token(service ac requirement)
Ceph RGW admin credentials must not be a requirement.
Both ec2 auth and keystone token validation work without an admin token.
User token verification uses the user's own token. The service admin
user token is required only for allow_expired, and in our case we don't
use this parameter.
When the cluster needs to be read, the completion is posted to ASIO.
However, in the two special cases (cluster DNE and zero cluster), the
completion is currently completed inline. This violates invariants
and can eventually lead to a lockup. For example, in a scenario of
a read from a clone image whose parent is under migration:
  io::ObjectReadRequest::read_parent()
    io::util::read_parent()
      < image_lock is taken for read >
      io::ImageDispatchSpec::send()
        migration::ImageDispatch::read()
          migration::QCOWFormat::ReadRequest::send()
            ...
            migration::QCOWFormat::ReadRequest::read_clusters()
              < cluster DNE >
              migration::QCOWFormat::ReadRequest::handle_read_clusters()
                io::AioCompletion::complete()
                  io::ObjectReadRequest::copyup()
                    is_copy_on_read()
                      < image_lock is taken for read >
copyup() expects to be called with no locks held, but going through
QCOWFormat in the "cluster DNE" case keeps image_lock, taken in
read_parent(), held, and then it's taken again by the same thread in
is_copy_on_read(). Under pthreads, it's not a problem:
  A thread may hold multiple concurrent read locks on rwlock (that is,
  successfully call the pthread_rwlock_rdlock() function n times). If
  so, the thread must perform matching unlocks (that is, it must call
  the pthread_rwlock_unlock() function n times).
But according to the C++ standard, it's undefined behavior:
  If lock_shared is called by a thread that already owns the mutex in
  any mode (exclusive or shared), the behavior is undefined.
Other, longer and more elaborate, call chains are possible too and
there it may end up being a write lock, a tripped assertion, etc. To
avoid this, make the special cases in read_clusters() behave the same
as the main path.
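A rough illustration of the inline-versus-deferred distinction, written as a single-threaded Python sketch rather than the librbd C++ code. Everything here is a hypothetical stand-in: a non-reentrant threading.Lock plays the role of image_lock, a deque plays the role of the executor queue, and a non-blocking acquire makes the double-lock visible instead of hanging. Re-taking a shared lock in C++ is undefined behavior rather than a guaranteed failure, so this only models the hazard.

```python
import threading
from collections import deque

image_lock = threading.Lock()   # non-reentrant stand-in for image_lock
pending = deque()               # stand-in for the executor's work queue


def on_complete():
    """Completion handler that needs the lock, like is_copy_on_read()."""
    # blocking=False reports a would-be deadlock instead of hanging.
    if not image_lock.acquire(blocking=False):
        return "lock already held"
    try:
        return "ok"
    finally:
        image_lock.release()


def read_clusters(cluster_exists, inline_special_case):
    """Return the handler's result if it ran inline, else None (deferred)."""
    if not cluster_exists and inline_special_case:
        return on_complete()        # old behaviour: runs in the caller's context
    pending.append(on_complete)     # main path, and the fixed special case
    return None


def read_parent(inline_special_case):
    """Dispatch a read for a missing cluster while holding the lock."""
    with image_lock:
        inline_result = read_clusters(False, inline_special_case)
    # The lock is released before the queued completions run.
    deferred = [handler() for handler in list(pending)]
    pending.clear()
    return inline_result, deferred
```

With the inline special case, the handler runs while the caller still holds the lock and observes it as already held. When the special case is queued like the main path, the handler runs after the caller has released the lock.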
Kefu Chai [Wed, 25 Jun 2025 03:50:24 +0000 (11:50 +0800)]
doc/dev/config: Document how to use :confval: directive for config options
Add comprehensive guide for documenting configuration options using the
:confval: directive, including naming conventions and cross-referencing.
Previously, the documentation lacked guidance on using the :confval:
directive and the important distinction between regular config options
and mgr module options (which require the mgr/<module>/ namespace
prefix). This change provides detailed examples and best practices for
properly documenting and referencing both types of configuration options.
Jos Collin [Tue, 6 May 2025 11:50:39 +0000 (17:20 +0530)]
qa: fix test_cephfs_mirror_stats failure
* Don't create huge files that result in 'No space left on device'.
* Relax the last_synced_end > last_synced_start check, so that
the test doesn't fail even if 'counter dump' is slow to report updated
values within a particular snapshot sync.
Fixes: https://tracker.ceph.com/issues/71186
Signed-off-by: Jos Collin <jcollin@redhat.com>
(cherry picked from commit 9738b8d36275fda42d847058aab55ba1e6e6e7fc)
Jos Collin [Fri, 13 Dec 2024 02:53:07 +0000 (08:23 +0530)]
qa: fix test_cephfs_mirror_stats failure
100MB files would take less than a second to sync, which makes no difference
in 'last_synced_end' and the test fails intermittently. We need to increase the
size of the files, as the time/duration is determined only in seconds.
Because of this, it also needs more sleep time before checking the status.
Fixes: https://tracker.ceph.com/issues/69232
Signed-off-by: Jos Collin <jcollin@redhat.com>
(cherry picked from commit 005e492288b71c641f33396cc8b13cc53d52b478)
Zac Dover [Mon, 23 Jun 2025 08:18:07 +0000 (18:18 +1000)]
doc/radosgw: remove "pubsub_event_lost"
Remove "pubsub_event_lost" from the list of "Notification Performance
Statistics" in doc/radosgw/notifications.rst. "pubsub_event_lost" is now
obsolete.
J. Eric Ivancich [Thu, 22 May 2025 20:15:56 +0000 (16:15 -0400)]
rgw: make sure max_objs_per_shard is appropriate in debugging scenarios
When we have a versioned bucket, we reduce max_objs_per_shard by a
factor of 3 to account for the extra bucket index entries required in
such buckets. And during debugging, we may want to induce early
resharding by setting max_objs_per_shard to an artificially low
value. Combined, that math could result in max_objs_per_shard with a
value of 0 that would cause a division by zero crash. This fixes that.
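The arithmetic can be sketched as follows. The factor of 3 comes from the description above. The clamp to a minimum of 1 and the function names are assumptions made for illustration, and the actual fix may guard against zero differently.

```python
VERSIONED_FACTOR = 3  # reduction for versioned buckets, per the description


def effective_max_objs_per_shard(configured, versioned):
    """Apply the versioned-bucket reduction, never returning less than 1."""
    value = configured // VERSIONED_FACTOR if versioned else configured
    return max(value, 1)  # assumed clamp; avoids a zero divisor downstream


def suggested_shards(num_objects, configured, versioned):
    """Ceiling division of the object count by the effective per-shard limit."""
    per_shard = effective_max_objs_per_shard(configured, versioned)
    return -(-num_objects // per_shard)
```

For example, a debugging value of 2 on a versioned bucket gives 2 // 3 = 0 before the clamp, which is the zero divisor the commit guards against.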