.github/workflows/scripts/config-diff-post-comment.js: fix config check ok logic
Currently, whenever a "config diff tool output" comment is created, it
also contains the string `/config check ok`. The next time the test
runs, it sees this text and assumes that the user has commented it.
We fix the logic to make sure that we ignore such cases.
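The check described above could look roughly like the following. This is a minimal sketch in Python, not the workflow's actual JavaScript; the `BOT_MARKER` text, the comment dictionary layout, and the `user_approved` helper are assumptions made for illustration.

```python
# Hypothetical sketch: the real workflow script is JavaScript and may differ.
APPROVAL = "/config check ok"
BOT_MARKER = "config diff tool output"  # assumed text identifying the tool's own comment


def user_approved(comments):
    """Return True only if a comment other than the tool's own contains the approval string."""
    for comment in comments:
        body = comment["body"]
        if BOT_MARKER in body.lower():
            # The tool's own comment mentions the approval string; ignore it.
            continue
        if APPROVAL in body:
            return True
    return False
```

With this shape, a pull request that only carries the tool's own comment is not treated as approved; an explicit user comment is still required.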
Kefu Chai [Mon, 30 Jun 2025 08:48:09 +0000 (16:48 +0800)]
osdc: remove unused rados.h include from error_code.h
Remove unnecessary `#include "include/rados.h"` from error_code.h as it's not
used by the header and error_code.h doesn't need to expose any RADOS
declarations.
This improves compilation time and reduces unnecessary dependencies.
mgr/dashboard: Enable rgw module automatically in the primary and secondary cluster if not enabled during multi-site automation
1. Enable rgw module automatically in the primary and secondary cluster if not enabled during multi-site automation
2. Improve progress bar descriptions and add sub-descriptions for steps
Kefu Chai [Mon, 30 Jun 2025 02:58:31 +0000 (10:58 +0800)]
cmake/modules/FindBoost: add support for Boost 1.88.0
Add Boost 1.88.0 to the supported versions list and update component
dependencies to eliminate build warnings.
This resolves the following warning when building with Boost 1.88.0:
```
-- Found Boost: /usr/include (found suitable version "1.88.0", minimum required is "1.73.0")
CMake Warning at cmake/modules/FindBoost.cmake:1413 (message):
New Boost version may have incorrect or missing dependencies and imported
targets
Call Stack (most recent call first):
cmake/modules/FindBoost.cmake:1538 (_Boost_COMPONENT_DEPENDENCIES)
cmake/modules/FindBoost.cmake:2157 (_Boost_MISSING_DEPENDENCIES)
src/CMakeLists.txt:461 (_find_package)
src/seastar/cmake/SeastarDependencies.cmake:136 (find_package)
src/seastar/CMakeLists.txt:395 (seastar_find_dependencies)
```
Boost 1.88.0 was released on April 3, 2025, and is already available
in some distributions. Since many distributions don't yet ship Boost's
native CMake configuration files, our vendored FindBoost.cmake module
needs updating to handle this version.
The component dependencies were updated following the scanning procedure
documented in the _Boost_COMPONENT_DEPENDENCIES() function. The change
will be upstreamed to CMake shortly.
Kefu Chai [Mon, 30 Jun 2025 02:01:20 +0000 (10:01 +0800)]
cmake/modules/FindBoost: sync with upstream FindBoost.cmake
Update our local FindBoost.cmake module to match CMake upstream's
latest version to properly handle Boost component dependencies.
While commit b446290f prevented warnings when checking Boost 1.87,
it failed to update the boost component dependency mappings. This
change synchronizes our module with upstream to ensure correct
dependency resolution.
Additionally, this prepares for Boost 1.88.0 support (released
April 3, 2025) which some distributions have already adopted.
Since CMake upstream hasn't yet added 1.88 support, this sync
provides the foundation for adding 1.88.0 compatibility in a
subsequent commit.
Changes made:
- Sync with upstream CMake FindBoost.cmake (commit 17726227)
(at https://github.com/Kitware/CMake/blob/1772622772133fad3b348ca4a5b4df3bbd69da75/Modules/FindBoost.cmake)
- Reapply local modifications from commit 06824bc1
Note: New dependencies can be scanned using:
```
cmake -DBOOST_DIR=/path/to/boost_1_88_0 -P
Utilities/Scripts/BoostScanDeps.cmake
```
Signed-off-by: Kefu Chai <tchaikov@gmail.com>
cmake: adapt FindBoost.cmake to our needs
the vanilla FindBoost.cmake pulled from cmake makes a couple of assumptions
which do not hold in our environment, so address them case by case.
The crash module has been enabled by default since commit 18f253aa in
Nautilus and is now in the always_on_modules list. However, the
documentation still contained instructions for manually enabling it.
When users followed these outdated instructions, they encountered:
```
module 'crash' is already enabled (always-on)
```
The module cannot be disabled either. Running:
```
ceph mgr module disable crash
```
Returns the error:
```
Error EINVAL: module 'crash' cannot be disabled (always-on)
```
In this change, we remove the obsolete enabling instructions and clarify
that this module is always active and cannot be disabled.
Kefu Chai [Sun, 29 Jun 2025 02:15:25 +0000 (10:15 +0800)]
test/erasure-code: fix stack-use-after-scope by replacing initializer_list with array
Previously, we used std::array<std::initializer_list<int>, 27> to store
a multi-dimensional array. However, initializer_list objects only hold
pointers to their underlying data, not the data itself. When initialized
with brace-enclosed lists like {0,1,2,3}, the temporary arrays created
by these literals are destroyed after the initialization expression
completes, leaving the initializer_list objects pointing to deallocated
memory.
This caused AddressSanitizer to detect stack-use-after-scope errors when
getint() attempted to iterate over the initializer_list contents:
```
==2085499==ERROR: AddressSanitizer: stack-use-after-scope on address 0x7f5fe9803580 at pc 0x55d851bea586 bp 0x7ffc9816a5b0 sp 0x7ffc9816a5a8
READ of size 4 at 0x7f5fe9803580 thread T0
#0 0x55d851bea585 in getint(std::initializer_list<int>) /home/jenkins-build/build/workspace/ceph-pull-requests/src/test/erasure-code/TestErasureCodeShec_arguments.cc:46:21
#1 0x55d851bf0258 in int std::__invoke_impl<int, int (*&)(std::initializer_list<int>), std::initializer_list<int>&>(std::__invoke_other, int (*&)(std::initializer_list<int>), std::initializer_list<int>&) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:61:14
...
Address 0x7f5fe9803580 is located in stack of thread T0 at offset 1408 in frame
#0 0x55d851bdd07f in create_table_shec432() /home/jenkins-build/build/workspace/ceph-pull-requests/src/test/erasure-code/TestErasureCodeShec_arguments.cc:52
```
Fix this by using std::array<std::array<int, 4>, 27> instead, which
actually owns and stores the data rather than just pointing to it.
Kirill Nazarov [Sun, 26 Jan 2025 19:08:24 +0000 (22:08 +0300)]
rbd: add --estimated-size option for import from stdin
One issue with importing from stdin is that it's not easy to track
progress. The only feasible option is to process messages on the highest
log level looking for lines like
but when it comes to large images it takes a lot of effort.
This commit introduces the --estimated-size option, which makes it possible
to print progress as a percentage via the standard mechanism. Obviously,
it requires knowing the amount of provided data in advance, and
in case of an error nonsensical percentages might be printed, but I don't
think it's that big of a deal.
Also use `estimated size` as the base image size, making resizing not
necessary in cases where we know the exact amount of data provided from
stdin.
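To illustrate why a wrong estimate yields nonsensical percentages, here is a minimal sketch of the arithmetic. The `progress_percent` helper is hypothetical and is not taken from the rbd code; it only assumes that progress is computed as bytes read relative to the estimate.

```python
def progress_percent(bytes_read, estimated_size):
    """Percent of the estimated size consumed so far (deliberately not clamped)."""
    if estimated_size <= 0:
        raise ValueError("estimated_size must be positive")
    return bytes_read * 100 // estimated_size
```

If stdin provides more data than estimated, the value simply exceeds 100, which is the kind of output the message above is willing to tolerate.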
John Mulligan [Fri, 27 Jun 2025 15:08:39 +0000 (11:08 -0400)]
ceph.spec.in: conditionalize crimson gts version on el10
EL10 distros come with GCC 14. When crimson was enabled it was always
trying to set gts_version to 13 (gcc-toolset version). Make the use of
gts version conditional on using el versions lower than 10.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
John Mulligan [Fri, 27 Jun 2025 15:04:44 +0000 (11:04 -0400)]
install-deps.sh: add a temporary repo for missing el10 deps
Add a new dnf/yum repository hosted in the ceph lab infra for providing
the last few dependencies missing from other el10 repos.
Hopefully we can remove this soon but it serves as a stopgap as we work
on getting el10 builds working in the ceph CI infra and tested.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
Mark Kogan [Wed, 25 Jun 2025 12:21:49 +0000 (12:21 +0000)]
qa/rgw: fix perl tests missing Amazon::S3 module
and a second case where perl tests can fail without error output
1. fix errors like: `Can't locate Amazon/S3.pm in @INC (you may need to
install the Amazon::S3 module)`
by priming the perl tests with installing the Amazon::S3 module from cpan
ex:
```
2025-06-23T19:18:40.162 INFO:tasks.workunit.client.0.smithi090.stderr:Can't locate Amazon/S3.pm in @INC (you may need to install the Amazon::S3 module) (@INC contains: /usr/local/lib64/perl5/5.32 ...
```
2. log an error when RGW process is not detected
Fixes: https://tracker.ceph.com/issues/71577
Signed-off-by: Mark Kogan <mkogan@redhat.com>
Yuval Lifshitz [Wed, 18 Jun 2025 12:11:46 +0000 (12:11 +0000)]
test/rgw/notifications: prevent client retries to avoid duplicates
if the RGW is slow and the client retries, it may cause the test to fail
since the number of notifications would be off.
in addition, with a slow RGW, we need to verify that the expiry time did
not pass before checking the queue, so we see the expected number of
entries in the queue before they expire.
Yuval Lifshitz [Wed, 18 Jun 2025 12:09:12 +0000 (12:09 +0000)]
rgw/notifications: stop processing when we reach a skipped notifications
if a notification retry should be skipped, we should stop processing
all notifications. if we successfully process another notification,
it will not be removed (as we will remove only up to the marker of the
skipped notification). as a result, the successful notification will be
processed again.
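The reasoning above can be sketched as follows. This is a simplified Python model, not the RGW implementation; `process_queue` and `try_send` are hypothetical names, and the queue is modeled as a plain list.

```python
def process_queue(entries, try_send):
    """Process entries in order and return (sent, remaining).

    try_send(entry) returns True if the entry was delivered and False if
    its retry should be skipped for now.
    """
    sent = []
    for index, entry in enumerate(entries):
        if not try_send(entry):
            # Stop here: only entries before this marker are removed, so
            # sending anything after it would lead to a duplicate later.
            return sent, entries[index:]
        sent.append(entry)
    return sent, []
```

For entries `a`, `b`, `c` where `b` is skipped, only `a` is sent and removed; `c` stays untouched in the queue instead of being delivered now and again on the next pass.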
Ronen Friedman [Wed, 25 Jun 2025 14:25:08 +0000 (09:25 -0500)]
osd/scrub: some perf counters priority was '0'
Some scrub perf counters were created without specifying
individual priorities, assuming by mistake that the
default priority is '_INTERESTING'. That was not the case,
and those perf counters were not reported.
Kefu Chai [Wed, 25 Jun 2025 13:51:04 +0000 (21:51 +0800)]
rgw: do not include unused header
previously, when building cls_rgw, we could have the following build
failure:
```
In file included from /home/jenkins-build/build/workspace/ceph-dashboard-pull-requests/src/cls/rgw/cls_rgw_types.cc:4:
In file included from /home/jenkins-build/build/workspace/ceph-dashboard-pull-requests/src/cls/rgw/cls_rgw_types.h:15:
In file included from /home/jenkins-build/build/workspace/ceph-dashboard-pull-requests/src/rgw/rgw_basic_types.h:32:
In file included from /home/jenkins-build/build/workspace/ceph-dashboard-pull-requests/src/rgw/rgw_user_types.h:27:
In file included from /home/jenkins-build/build/workspace/ceph-dashboard-pull-requests/src/common/dout.h:29:
In file included from /home/jenkins-build/build/workspace/ceph-dashboard-pull-requests/src/common/ceph_context.h:41:
In file included from /home/jenkins-build/build/workspace/ceph-dashboard-pull-requests/src/common/config_proxy.h:7:
In file included from /home/jenkins-build/build/workspace/ceph-dashboard-pull-requests/src/common/config.h:28:
In file included from /home/jenkins-build/build/workspace/ceph-dashboard-pull-requests/src/common/config_values.h:59:
/home/jenkins-build/build/workspace/ceph-dashboard-pull-requests/src/common/options/legacy_config_opts.h:1:10: fatal error: 'global_legacy_options.h' file not found
1 | #include "global_legacy_options.h"
| ^~~~~~~~~~~~~~~~~~~~~~~~~
```
but it turned out that `cls_rgw_types.h` does not use `dout.h` at all.
so, in this change, we just drop this include. this helps to reduce
the build dependency.
Kefu Chai [Wed, 25 Jun 2025 04:14:36 +0000 (12:14 +0800)]
mgr/dashboard: Fix inline markup warning in API documentation
Remove trailing space from summary field that was causing Sphinx build
warning.
Sphinx was generating a warning due to malformed inline markup:
```
/home/kefu/dev/ceph/doc/mgr/ceph_api/index.rst:3349: WARNING: Inline strong start-string without end-string.
```
The openapi directive appears to convert trailing spaces into asterisk
markers, creating unterminated strong markup. This change removes the
trailing space to eliminate the warning and maintain consistency with
other entries in the file.
When the cluster needs to be read, the completion is posted to ASIO.
However, in the two special cases (cluster DNE and zero cluster), the
completion is completed inline at the moment. This violates invariants
and can eventually lead to a lockup. For example, in a scenario of
a read from a clone image whose parent is under migration:
io::ObjectReadRequest::read_parent()
io::util::read_parent()
< image_lock is taken for read >
io::ImageDispatchSpec::send()
migration::ImageDispatch::read()
migration::QCOWFormat::ReadRequest::send()
...
migration::QCOWFormat::ReadRequest::read_clusters()
< cluster DNE >
migration::QCOWFormat::ReadRequest::handle_read_clusters()
io::AioCompletion::complete()
io::ObjectReadRequest::copyup()
is_copy_on_read()
< image_lock is taken for read >
copyup() expects to be called with no locks held, but going through
QCOWFormat in the "cluster DNE" case essentially maintains image_lock
taken in read_parent() and then it's taken again by the same thread in
is_copy_on_read(). Under pthreads, it's not a problem:
A thread may hold multiple concurrent read locks on rwlock (that is,
successfully call the pthread_rwlock_rdlock() function n times). If
so, the thread must perform matching unlocks (that is, it must call
the pthread_rwlock_unlock() function n times).
But according to the C++ standard, it's undefined behavior:
If lock_shared is called by a thread that already owns the mutex in
any mode (exclusive or shared), the behavior is undefined.
Other, longer and more elaborate, call chains are possible too and
there it may end up being a write lock, a tripped assertion, etc. To
avoid this, make the special cases in read_clusters() behave the same
as the main path.
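As a loose analogy for the approach described above, the following Python sketch posts the completion to a queue in every case instead of running it inline. All names here are stand-ins: `pending` plays the role of the executor queue, `threading.Lock` plays the role of image_lock, and none of this is the librbd code.

```python
import threading
from collections import deque

image_lock = threading.Lock()  # stand-in for image_lock; not re-entrant
pending = deque()              # stand-in for the executor's queue of posted work
events = []


def post(callback):
    """Queue a callback to run later instead of calling it inline."""
    pending.append(callback)


def run_pending():
    """Run queued callbacks; called with no locks held."""
    while pending:
        pending.popleft()()


def on_complete():
    # Like is_copy_on_read() in the chain above, the completion takes the lock itself.
    with image_lock:
        events.append("completed")


def read_clusters(cluster_exists):
    if cluster_exists:
        pass  # the actual cluster read is elided in this sketch
    # Special cases and the main path alike post the completion.
    post(on_complete)


def read_parent(cluster_exists):
    with image_lock:
        read_clusters(cluster_exists)
        events.append("request sent")
    run_pending()
```

Had `read_clusters()` called `on_complete()` directly in the special case, it would have tried to take the lock while `read_parent()` still held it, which in this model blocks forever.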
Zac Dover [Wed, 25 Jun 2025 09:19:49 +0000 (19:19 +1000)]
doc/radosgw: line edit bucket_logging.rst
Edit doc/radosgw/bucket_logging.rst so that it is not solecistic and so
that its punctuation is corrected and its use of articles is corrected.
This file remains in my judgment demotic and maybe demotic enough to
warrant another editorial pass in the future.
Venky Shankar [Wed, 25 Jun 2025 06:39:39 +0000 (12:09 +0530)]
Merge PR #59435 into main
* refs/pull/59435/head:
mgr/volumes: Fix json.loads for test on mon caps
mgr/volumes: Add test for mon caps if auth key has remaining mds/osd caps
mgr/volumes: Keep mon caps if auth key has remaining mds/osd caps
Add comprehensive documentation for defining configuration options in
ceph-mgr modules, including all supported properties and their usage.
Previously, the documentation did not explain how to define ceph-mgr
module configuration options, despite subtle differences from other Ceph
components. This change documents all supported Option properties, their
types, and provides clear examples to help module developers properly
configure their options.
Kefu Chai [Wed, 25 Jun 2025 03:02:46 +0000 (11:02 +0800)]
doc: do not depend on typed-ast
the typed-ast project was marked end of life in July 2023 and is no
longer maintained. we build the documentation using readthedocs'
service, and in .readthedocs.yml we use python 3.9, whose standard
library already includes the ast module.
the typed-ast dependency was originally added in 30d41597, but now that
we are using python 3.9, there is no need to use this module anymore.
Kefu Chai [Wed, 25 Jun 2025 03:50:24 +0000 (11:50 +0800)]
doc/dev/config: Document how to use :confval: directive for config options
Add comprehensive guide for documenting configuration options using the
:confval: directive, including naming conventions and cross-referencing.
Previously, the documentation lacked guidance on using the :confval:
directive and the important distinction between regular config options
and mgr module options (which require the mgr/<module>/ namespace
prefix). This change provides detailed examples and best practices for
properly documenting and referencing both types of configuration options.