This should fix the `chmod 777 /var/log/ceph` failures.
We were missing the install task, so /var/log/ceph was never created:
```
2025-07-07T08:55:44.586 INFO:teuthology.run_tasks:Running task ceph...
2025-07-07T08:55:44.679 INFO:tasks.ceph:Making ceph log dir writeable by non-root...
2025-07-07T08:55:44.679 DEBUG:teuthology.orchestra.run.smithi144:> sudo chmod 777 /var/log/ceph
2025-07-07T08:55:44.711 INFO:teuthology.orchestra.run.smithi144.stderr:chmod: cannot access '/var/log/ceph': No such file or directory
```
.github/workflows/scripts/config-diff-post-comment.js: fix config check ok logic
currently, whenever a "config diff tool output" comment is created, it
also contains the string `/config check ok`. The next time the test
runs, it sees this text and assumes that the user has posted it.
We fix the logic to make sure that we ignore such cases.
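The corrected check can be sketched as a filter over the PR's comments. This is a minimal C++ illustration, not the actual JavaScript in config-diff-post-comment.js: the `Comment` struct, the `user_approved_config()` helper, and recognizing the tool's own comment by its "config diff tool output" text are assumptions made for the sketch.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical view of a PR comment; the real script works on GitHub API objects.
struct Comment {
  std::string body;
};

// Returns true only if some comment other than the tool's own output
// contains the "/config check ok" approval string.
bool user_approved_config(const std::vector<Comment>& comments) {
  const std::string tool_marker = "config diff tool output";  // assumed marker text
  const std::string approval = "/config check ok";
  for (const auto& c : comments) {
    if (c.body.find(tool_marker) != std::string::npos) {
      continue;  // the tool's own comment quotes the approval string; ignore it
    }
    if (c.body.find(approval) != std::string::npos) {
      return true;  // a genuine approval from someone else
    }
  }
  return false;
}
```

With only the tool's comment present the check stays false; a separate comment containing `/config check ok` flips it to true.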
Kefu Chai [Mon, 30 Jun 2025 08:48:09 +0000 (16:48 +0800)]
osdc: remove unused rados.h include from error_code.h
Remove unnecessary `#include "include/rados.h"` from error_code.h as it's not
used by the header and error_code.h doesn't need to expose any RADOS
declarations.
This improves compilation time and reduces unnecessary dependencies.
mgr/dashboard: Enable rgw module automatically in the primary and secondary cluster if not enabled during multi-site automation
1. Automatically enable the rgw module in the primary and secondary clusters during multi-site automation if it is not already enabled
2. Improve progress bar descriptions and add sub-descriptions for steps
The crash module has been enabled by default since commit 18f253aa in
Nautilus and is now in the always_on_modules list. However, the
documentation still contained instructions for manually enabling it.
When users followed these outdated instructions, they encountered:
```
module 'crash' is already enabled (always-on)
```
The module cannot be disabled either. Running:
```
ceph mgr module disable crash
```
returns the error:
```
Error EINVAL: module 'crash' cannot be disabled (always-on)
```
In this change, we remove the obsolete enabling instructions and clarify
that this module is always active and cannot be disabled.
Kirill Nazarov [Sun, 26 Jan 2025 19:08:24 +0000 (22:08 +0300)]
rbd: add --estimated-size option for import from stdin
One issue with importing from stdin is that it's not easy to track
progress. The only feasible option is to process messages at the highest
log level, looking for lines like
but for large images that takes a lot of effort.
This commit introduces the --estimated-size option, which makes it
possible to print progress as a percentage via the standard mechanism.
Obviously, it requires knowing the amount of provided data in advance,
and in case of an error nonsensical percentages might be printed, but I
don't think that is a big deal.
Also use the estimated size as the base image size, making a resize
unnecessary in cases where we know the exact amount of data provided on
stdin.
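Once an estimate is known up front, progress reduces to simple arithmetic. A minimal C++ sketch, assuming a hypothetical `progress_percent()` helper rather than the actual rbd import code; it deliberately does not clamp, which reproduces the caveat above: a too-small estimate yields values above 100.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: percentage of the estimated size consumed so far.
// Not clamped, so an underestimated size produces values above 100.
// Assumes bytes_done * 100 fits in 64 bits.
std::uint64_t progress_percent(std::uint64_t bytes_done,
                               std::uint64_t estimated_size) {
  if (estimated_size == 0) {
    return 0;  // no estimate given: nothing meaningful to report
  }
  return bytes_done * 100 / estimated_size;
}
```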
Mark Kogan [Wed, 25 Jun 2025 12:21:49 +0000 (12:21 +0000)]
qa/rgw: fix perl tests missing Amazon::S3 module
and a second case where perl tests can fail without error output
1. fix errors like: `Can't locate Amazon/S3.pm in @INC (you may need to
install the Amazon::S3 module)`
by priming the perl tests with an install of the Amazon::S3 module from CPAN,
e.g.:
```
2025-06-23T19:18:40.162 INFO:tasks.workunit.client.0.smithi090.stderr:Can't locate Amazon/S3.pm in @INC (you may need to install the Amazon::S3 module) (@INC contains: /usr/local/lib64/perl5/5.32 ...
```
2. log an error when RGW process is not detected
Fixes: https://tracker.ceph.com/issues/71577
Signed-off-by: Mark Kogan <mkogan@redhat.com>
Yuval Lifshitz [Wed, 18 Jun 2025 12:11:46 +0000 (12:11 +0000)]
test/rgw/notifications: prevent client retries to avoid duplicates
if the RGW is slow and the client retries, the test may fail because
the number of notifications would be off.
in addition, with a slow RGW, we need to verify that the expiry time has
not passed before checking the queue, so that we see the expected number
of entries in the queue before they expire.
Yuval Lifshitz [Wed, 18 Jun 2025 12:09:12 +0000 (12:09 +0000)]
rgw/notifications: stop processing when we reach a skipped notification
if a notification retry should be skipped, we should stop processing
all notifications. if we successfully process another notification after
it, it will not be removed (as we remove only up to the marker of the
skipped notification). as a result, the successful notification will be
processed again.
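The stop-at-skip rule can be sketched as follows. This is a minimal C++ illustration, not RGW code: `Outcome`, `process_queue()`, and modeling the marker as a count of removable head entries are hypothetical simplifications. Processing halts at the first skipped entry, so nothing after it is delivered in a pass that could not trim it afterwards.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result of trying to deliver one queued notification.
enum class Outcome { Delivered, SkipRetryLater };

// Returns the marker: how many entries from the head of the queue may be
// removed. Stops at the first entry whose retry is skipped, because the
// queue is only trimmed up to that point; delivering later entries now
// would leave them queued and they would be sent again.
template <typename Deliver>
std::size_t process_queue(const std::vector<int>& entries, Deliver deliver) {
  std::size_t marker = 0;
  for (int e : entries) {
    if (deliver(e) == Outcome::SkipRetryLater) {
      break;  // stop here; later entries wait for the next pass
    }
    ++marker;
  }
  return marker;
}
```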
Ronen Friedman [Wed, 25 Jun 2025 14:25:08 +0000 (09:25 -0500)]
osd/scrub: some perf counters priority was '0'
Some scrub perf counters were created without specifying
individual priorities, assuming by mistake that the
default priority is '_INTERESTING'. That was not the case,
and those perf counters were not reported.
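A minimal C++ sketch of the failure mode, assuming a simplified builder: `Builder`, `Counter`, and `reported()` are hypothetical, and while the `PRIO_*` names follow the priority names used for Ceph perf counters, the numeric values and the reporting threshold here are assumptions. A counter added without an explicit priority inherits a default of 0 and falls below the threshold; giving each counter an explicit priority, as the fix does, makes it visible again.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Priority names as used for Ceph perf counters; values assumed for the sketch.
constexpr int PRIO_DEBUGONLY = 0;
constexpr int PRIO_USEFUL = 5;
constexpr int PRIO_INTERESTING = 8;

struct Counter {
  std::string name;
  int prio;
};

// Hypothetical mini builder: counters added without an explicit priority get
// prio_default_, which starts at PRIO_DEBUGONLY (0), not PRIO_INTERESTING.
class Builder {
 public:
  void set_prio_default(int p) { prio_default_ = p; }
  void add(const std::string& name, int prio = -1) {
    counters_.push_back({name, prio < 0 ? prio_default_ : prio});
  }
  // Names of the counters whose priority meets the reporting threshold.
  std::vector<std::string> reported(int threshold) const {
    std::vector<std::string> out;
    for (const auto& c : counters_) {
      if (c.prio >= threshold) out.push_back(c.name);
    }
    return out;
  }

 private:
  int prio_default_ = PRIO_DEBUGONLY;
  std::vector<Counter> counters_;
};
```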
Kefu Chai [Wed, 25 Jun 2025 13:51:04 +0000 (21:51 +0800)]
rgw: do not include unused header
previously, when building cls_rgw, we could hit the following build
failure:
```
In file included from /home/jenkins-build/build/workspace/ceph-dashboard-pull-requests/src/cls/rgw/cls_rgw_types.cc:4:
In file included from /home/jenkins-build/build/workspace/ceph-dashboard-pull-requests/src/cls/rgw/cls_rgw_types.h:15:
In file included from /home/jenkins-build/build/workspace/ceph-dashboard-pull-requests/src/rgw/rgw_basic_types.h:32:
In file included from /home/jenkins-build/build/workspace/ceph-dashboard-pull-requests/src/rgw/rgw_user_types.h:27:
In file included from /home/jenkins-build/build/workspace/ceph-dashboard-pull-requests/src/common/dout.h:29:
In file included from /home/jenkins-build/build/workspace/ceph-dashboard-pull-requests/src/common/ceph_context.h:41:
In file included from /home/jenkins-build/build/workspace/ceph-dashboard-pull-requests/src/common/config_proxy.h:7:
In file included from /home/jenkins-build/build/workspace/ceph-dashboard-pull-requests/src/common/config.h:28:
In file included from /home/jenkins-build/build/workspace/ceph-dashboard-pull-requests/src/common/config_values.h:59:
/home/jenkins-build/build/workspace/ceph-dashboard-pull-requests/src/common/options/legacy_config_opts.h:1:10: fatal error: 'global_legacy_options.h' file not found
1 | #include "global_legacy_options.h"
| ^~~~~~~~~~~~~~~~~~~~~~~~~
```
but it turned out that `cls_rgw_types.h` does not use `dout.h` at all.
so, in this change, we just drop this include. this helps to reduce
the build dependency.
Kefu Chai [Wed, 25 Jun 2025 04:14:36 +0000 (12:14 +0800)]
mgr/dashboard: Fix inline markup warning in API documentation
Remove a trailing space from a summary field that was causing a Sphinx
build warning.
Sphinx was generating a warning due to malformed inline markup:
```
/home/kefu/dev/ceph/doc/mgr/ceph_api/index.rst:3349: WARNING: Inline strong start-string without end-string.
```
The openapi directive appears to convert trailing spaces into asterisk
markers, creating unterminated strong markup. This change removes the
trailing space to eliminate the warning and maintain consistency with
other entries in the file.
When a cluster needs to be read, the completion is posted to ASIO.
However, in the two special cases (cluster DNE and zero cluster), the
completion is currently completed inline. This violates invariants
and can eventually lead to a lockup. Consider, for example, a read
from a clone image whose parent is under migration:
```
io::ObjectReadRequest::read_parent()
io::util::read_parent()
< image_lock is taken for read >
io::ImageDispatchSpec::send()
migration::ImageDispatch::read()
migration::QCOWFormat::ReadRequest::send()
...
migration::QCOWFormat::ReadRequest::read_clusters()
< cluster DNE >
migration::QCOWFormat::ReadRequest::handle_read_clusters()
io::AioCompletion::complete()
io::ObjectReadRequest::copyup()
is_copy_on_read()
< image_lock is taken for read >
```
copyup() expects to be called with no locks held, but going through
QCOWFormat in the "cluster DNE" case essentially keeps the image_lock
taken in read_parent() held, and then the same thread takes it again in
is_copy_on_read(). Under pthreads, this is not a problem:
A thread may hold multiple concurrent read locks on rwlock (that is,
successfully call the pthread_rwlock_rdlock() function n times). If
so, the thread must perform matching unlocks (that is, it must call
the pthread_rwlock_unlock() function n times).
But according to the C++ standard, it's undefined behavior:
If lock_shared is called by a thread that already owns the mutex in
any mode (exclusive or shared), the behavior is undefined.
Other, longer and more elaborate call chains are possible too, and
there the second acquisition may end up being a write lock, a tripped
assertion, etc. To avoid this, make the special cases in read_clusters()
behave the same as the main path, posting the completion instead of
completing it inline.
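The difference between completing inline and posting can be sketched in C++. This is a minimal illustration, not librbd code: `WorkQueue` is a hypothetical stand-in for the ASIO executor, and `read_clusters()` with its callback only models the shape of the QCOWFormat path. Because the completion is queued, it runs only after the caller has released `image_lock`, so the callback never re-acquires a shared lock its own thread already holds.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <mutex>
#include <shared_mutex>
#include <utility>

// Hypothetical stand-in for the ASIO executor: posted work runs later from
// the event loop, never on the poster's stack.
class WorkQueue {
 public:
  void post(std::function<void()> fn) { q_.push_back(std::move(fn)); }
  void run() {  // drain everything posted so far (and anything it posts)
    while (!q_.empty()) {
      auto fn = std::move(q_.front());
      q_.pop_front();
      fn();
    }
  }

 private:
  std::deque<std::function<void()>> q_;
};

// Hypothetical read_clusters(): whether or not the cluster exists, the
// completion goes through the work queue instead of being invoked inline.
void read_clusters(bool cluster_exists, WorkQueue& wq,
                   std::function<void(int)> on_finish) {
  if (!cluster_exists) {
    // Special case ("cluster DNE"): nothing to read, but still post the
    // completion rather than calling on_finish(0) on the caller's stack.
    wq.post([on_finish] { on_finish(0); });
    return;
  }
  // Main path: a real implementation would issue an async read here and post
  // its completion when the read finishes; this sketch posts a result directly.
  wq.post([on_finish] { on_finish(0); });
}
```

In use, a shared lock held across the call (as in read_parent()) is released before the queue is drained, so a callback that needs the lock can take it safely.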
Zac Dover [Wed, 25 Jun 2025 09:19:49 +0000 (19:19 +1000)]
doc/radosgw: line edit bucket_logging.rst
Edit doc/radosgw/bucket_logging.rst so that it is not solecistic and so
that its punctuation and its use of articles are corrected.
This file remains, in my judgment, demotic, and maybe demotic enough to
warrant another editorial pass in the future.
Venky Shankar [Wed, 25 Jun 2025 06:39:39 +0000 (12:09 +0530)]
Merge PR #59435 into main
* refs/pull/59435/head:
mgr/volumes: Fix json.loads for test on mon caps
mgr/volumes: Add test for mon caps if auth key has remaining mds/osd caps
mgr/volumes: Keep mon caps if auth key has remaining mds/osd caps
Add comprehensive documentation for defining configuration options in
ceph-mgr modules, including all supported properties and their usage.
Previously, the documentation did not explain how to define ceph-mgr
module configuration options, despite subtle differences from other Ceph
components. This change documents all supported Option properties and
their types, and provides clear examples to help module developers
configure their options properly.
Kefu Chai [Wed, 25 Jun 2025 03:02:46 +0000 (11:02 +0800)]
doc: do not depend on typed-ast
the typed-ast project has been end of life since July 2023 and is no
longer maintained. we build the documentation using the readthedocs
service, and in .readthedocs.yml we use python 3.9, whose standard
library already includes the ast module.
the typed-ast dependency was originally added in 30d41597, but now that
we are using python 3.9, there is no need to use this module anymore.
Kefu Chai [Wed, 25 Jun 2025 03:50:24 +0000 (11:50 +0800)]
doc/dev/config: Document how to use :confval: directive for config options
Add a comprehensive guide for documenting configuration options using the
:confval: directive, including naming conventions and cross-referencing.
Previously, the documentation lacked guidance on using the :confval:
directive and the important distinction between regular config options
and mgr module options (which require the mgr/<module>/ namespace
prefix). This change provides detailed examples and best practices for
properly documenting and referencing both types of configuration options.
Kefu Chai [Tue, 24 Jun 2025 14:38:13 +0000 (22:38 +0800)]
rbd: fix unused function warning when WITH_KRBD is disabled
Guard print_error_description() and get_unsupported_features() with
`#ifdef WITH_KRBD` to prevent compiler warnings when KRBD support is
not enabled.
These functions are only called by do_kernel_map(), which is itself
conditionally compiled. When WITH_KRBD is not defined, the compiler
generates unused function warnings for these helper functions.
Fixes warning:
```
/home/kefu/dev/ceph/src/tools/rbd/action/Kernel.cc:305:13: warning: ‘void rbd::action::kernel::print_error_description(const char*, const char*, const char*, const char*, int)’ defined but not used [-Wunused-function]
305 | static void print_error_description(const char *poolname,
| ^~~~~~~~~~~~~~~~~~~~~~~
```