This is unsafe because bufferlist data is not guaranteed to be null-
terminated. The std::string constructor searches for a null terminator
and may read beyond the bufferlist's allocated memory, causing a
heap-buffer-overflow detected by AddressSanitizer:
```
==66092==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x7e0c65215004 at pc 0x7fbc6e27c597 bp 0x7ffe29fb6100 sp 0x7ffe29fb58b8
READ of size 5 at 0x7e0c65215004 thread T0
#0 0x7fbc6e27c596 in strlen /usr/src/debug/gcc/gcc/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:425
#1 0x562c75fad91a in std::char_traits<char>::length(char const*) /usr/include/c++/15.2.1/bits/char_traits.h:393
#2 0x562c75fb4222 in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string<std::allocator<char> >(char const*, std::allocator<char> const&) /usr/include/c++/15.2.1/bits/basic_string.h:713
#3 0x562c761b81ae in operator() /home/kefu/dev/ceph/src/osd/scrubber/scrub_backend.cc:1300
#4 0x562c761d7d53 in operator()<mini_flat_map<shard_id_t, ceph::buffer::v15_2_0::list, signed char>::_iterator<false> > /usr/include/c++/15.2.1/bits/predefined_ops.h:318
#5 0x562c761d789c in __find_if<mini_flat_map<shard_id_t, ceph::buffer::v15_2_0::list, signed char>::_iterator<false>, __gnu_cxx::__ops::_Iter_pred<ScrubBackend::match_in_shards(const hobject_t&, auth_selection_t&, inconsistent_obj_wrapper&, std::stringstream&)::<lambda(const std::pair<const shard_id_t, ceph::buffer::v15_2_0::list&>&)> > > /usr/include/c++/15.2.1/bits/stl_algobase.h:2095
#6 0x562c761d72b2 in find_if<mini_flat_map<shard_id_t, ceph::buffer::v15_2_0::list, signed char>::_iterator<false>, ScrubBackend::match_in_shards(const hobject_t&, auth_selection_t&, inconsistent_obj_wrapper&, std::stringstream&)::<lambda(const std::pair<const shard_id_t, ceph::buffer::v15_2_0::list&>&)> > /usr/include/c++/15.2.1/bits/stl_algo.h:3921
#7 0x562c761d5f6f in none_of<mini_flat_map<shard_id_t, ceph::buffer::v15_2_0::list, signed char>::_iterator<false>, ScrubBackend::match_in_shards(const hobject_t&, auth_selection_t&, inconsistent_obj_wrapper&, std::stringstream&)::<lambda(const std::pair<const shard_id_t, ceph::buffer::v15_2_0::list&>&)> > /usr/include/c++/15.2.1/bits/stl_algo.h:431
#8 0x562c761d4a50 in any_of<mini_flat_map<shard_id_t, ceph::buffer::v15_2_0::list, signed char>::_iterator<false>, ScrubBackend::match_in_shards(const hobject_t&, auth_selection_t&, inconsistent_obj_wrapper&, std::stringstream&)::<lambda(const std::pair<const shard_id_t, ceph::buffer::v15_2_0::list&>&)> > /usr/include/c++/15.2.1/bits/stl_algo.h:450
#9 0x562c761bb84b in ScrubBackend::match_in_shards(hobject_t const&, auth_selection_t&, inconsistent_obj_wrapper&, std::__cxx11::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >&) /home/kefu/dev/ceph/src/osd/scrubber/scrub_backend.cc:1297
#10 0x562c761b4282 in ScrubBackend::compare_obj_in_maps[abi:cxx11](hobject_t const&) /home/kefu/dev/ceph/src/osd/scrubber/scrub_backend.cc:941
#11 0x562c761d44af in operator()<hobject_t> /home/kefu/dev/ceph/src/osd/scrubber/scrub_backend.cc:887
#12 0x562c761d4836 in for_each<std::_Rb_tree_const_iterator<hobject_t>, ScrubBackend::compare_smaps()::<lambda(const auto:422&)> > /usr/include/c++/15.2.1/bits/stl_algo.h:3798
#13 0x562c761b3259 in ScrubBackend::compare_smaps() /home/kefu/dev/ceph/src/osd/scrubber/scrub_backend.cc:884
#14 0x562c761a478d in ScrubBackend::update_authoritative() /home/kefu/dev/ceph/src/osd/scrubber/scrub_backend.cc:315
```
Fix by checking bufferlist::length(), which reports whether the buffer is
empty without converting the buffer content to a string.
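A minimal sketch of the difference between the two checks. The `Buffer` struct below is a hypothetical stand-in for a bufferlist (raw bytes plus a length, no trailing null byte) and is not the real ceph::bufferlist API; `is_empty` and `copy_out` are names invented for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a bufferlist: raw bytes plus a length,
// with no guarantee of a trailing '\0'.
struct Buffer {
  std::vector<char> bytes;
  const char* c_str() const { return bytes.data(); }  // NOT null-terminated
  std::size_t length() const { return bytes.size(); }
};

// Unsafe pattern (shown only as a comment, never executed here):
//   std::string(buf.c_str()).empty()
// std::string(const char*) calls strlen(), which keeps reading until it
// finds a '\0' and may therefore run past the end of the allocation.

// Safe emptiness check: ask the buffer for its length.
inline bool is_empty(const Buffer& buf) {
  return buf.length() == 0;
}

// Safe conversion when a string is really needed: pass the length
// explicitly so no terminator search takes place.
inline std::string copy_out(const Buffer& buf) {
  return std::string(buf.c_str(), buf.length());
}
```

The length-based check never touches the buffer contents, so it cannot read out of bounds regardless of what the bytes are.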
Patrick Donnelly [Wed, 27 Aug 2025 16:02:00 +0000 (12:02 -0400)]
Merge PR #65266 into main
* refs/pull/65266/head:
script/redmine-upkeep: filter by one status_id at a time
script/redmine-upkeep: add more statuses and organize
script/redmine-upkeep: break out of filters on limit threshold
.github/workflows/redmine-upkeep: bump limit
Patrick Donnelly [Tue, 12 Aug 2025 20:01:44 +0000 (16:01 -0400)]
.github/workflows/redmine-upkeep: bump limit
Now that this no longer hits the github API in general, it's safer to process
many more issues. This is generally good as it's not possible to construct a
specific search that will process issues that have had their corresponding PR
merged. (We do have merge triggers however.) Additionally, process more issues
but only every 12 hours.
Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
Zack Cerza [Thu, 14 Aug 2025 16:29:47 +0000 (10:29 -0600)]
cephadm: Use specified method for osd.default
When creating the first OSD in a cluster, the method requested was being
ignored - meaning an LVM OSD would be created in all cases. If a given cluster
couldn't support LVM, it could not be deployed. If we relay the method value
requested by the cephadm invocation, we can create OSDs as expected.
Edwin Rodriguez [Thu, 7 Aug 2025 20:28:44 +0000 (16:28 -0400)]
test/osd: Suppress subobject-linkage warning in SelectMappingAndLayers class
Change the SelectMapping and SelectLayers definitions to use non-static arrays of strings.
SelectMappingAndLayers::sma and SelectMappingAndLayers::sly have internal linkage,
because they are non-template, non-inline, non-extern const-qualified variables.
As a consequence, sma and sly are different objects in each translation unit.
And because ProgramOptionSelector takes a reference as a template argument, the
ProgramOptionSelector<...> specializations differ between translation units,
because the template parameter references a different object in each.
If the header is then included in two different translation units, the program
has undefined behavior, because the definitions of SelectMapping violate the one-definition
rule: they are, roughly speaking, not semantically identical. The compiler has no way to
decide whether SelectMapping is supposed to have ProgramOptionSelector<value1> or
ProgramOptionSelector<value2> as its base class (where value1 and value2 are invented names
for the two instances of io_sequence::tester::lrc::mapping_layer_array_sizes in the
different translation units).
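The mechanism can be sketched in a single file. `Selector` is a hypothetical stand-in for ProgramOptionSelector, and the two `choices_tu*` arrays mimic what two translation units would each get from one internal-linkage constant in a header. All names and values here are assumptions for illustration. The `inline` variable at the end is one common way to get a single shared object, not necessarily what the commit does.

```cpp
#include <array>
#include <cassert>
#include <string_view>
#include <type_traits>

using Choices = std::array<std::string_view, 2>;

// Hypothetical stand-in for ProgramOptionSelector: a class template whose
// non-type template parameter is a reference to an array object.
template <const Choices& C>
struct Selector {
  static constexpr std::string_view first() { return C[0]; }
};

// Two distinct objects with identical contents. With internal linkage,
// every translation unit including the header would own such a copy.
static constexpr Choices choices_tu1{"plugin", "layers"};
static constexpr Choices choices_tu2{"plugin", "layers"};

// Same contents, different objects, therefore different specializations.
static_assert(!std::is_same_v<Selector<choices_tu1>, Selector<choices_tu2>>,
              "reference template arguments are compared by object identity");

// A C++17 inline variable is one object program-wide, so every
// translation unit would name the same specialization.
inline constexpr Choices shared_choices{"plugin", "layers"};
using SharedSelector = Selector<shared_choices>;
```

A class deriving from `Selector<choices_tu1>` in one translation unit and `Selector<choices_tu2>` in another would have two non-identical definitions, which is the one-definition-rule violation the message describes.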
gal salomon [Wed, 11 Jun 2025 18:05:03 +0000 (21:05 +0300)]
The current setup uses a single shared connection, and the strand on that single connection may serialize operations.
Note that each S3 request may issue several Redis operations, which may run in coroutines.
The Redis connection pool implements guarded acquire/release APIs.
Add the configuration option rgw_redis_connection_pool_size.
Refactor the redis-exec* functions.
Use a shared pointer for the Redis connection pool.
Add a branch prediction optimization for the redis-pool/single-shared-connection condition.
Add a warning-report method for the case where a caller blocks on an empty connection pool.
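A rough sketch of what a pool with guarded acquire/release could look like. This is a thread-based illustration using only the standard library, whereas the commit targets coroutine-based code. `Connection`, `ConnectionPool`, `PooledConnection`, and `blocked_waits` are all hypothetical names, not the RGW implementation.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

struct Connection { int id; };  // placeholder for a real Redis connection

class ConnectionPool {
 public:
  explicit ConnectionPool(std::size_t size) {
    for (std::size_t i = 0; i < size; ++i)
      idle_.push_back(std::make_unique<Connection>(Connection{static_cast<int>(i)}));
  }

  // Blocks while the pool is empty; counts such waits so that a caller
  // could emit a warning about pool exhaustion.
  std::unique_ptr<Connection> acquire() {
    std::unique_lock<std::mutex> lk(m_);
    if (idle_.empty()) ++blocked_waits_;
    cv_.wait(lk, [this] { return !idle_.empty(); });
    auto c = std::move(idle_.back());
    idle_.pop_back();
    return c;
  }

  void release(std::unique_ptr<Connection> c) {
    {
      std::lock_guard<std::mutex> lk(m_);
      idle_.push_back(std::move(c));
    }
    cv_.notify_one();
  }

  std::size_t idle() const {
    std::lock_guard<std::mutex> lk(m_);
    return idle_.size();
  }

  std::size_t blocked_waits() const {
    std::lock_guard<std::mutex> lk(m_);
    return blocked_waits_;
  }

 private:
  mutable std::mutex m_;
  std::condition_variable cv_;
  std::vector<std::unique_ptr<Connection>> idle_;
  std::size_t blocked_waits_ = 0;
};

// RAII guard: acquires on construction, releases on destruction, so a
// connection is returned to the pool even on early return or exception.
class PooledConnection {
 public:
  explicit PooledConnection(ConnectionPool& p) : pool_(p), conn_(p.acquire()) {}
  ~PooledConnection() { pool_.release(std::move(conn_)); }
  PooledConnection(const PooledConnection&) = delete;
  PooledConnection& operator=(const PooledConnection&) = delete;
  Connection& operator*() { return *conn_; }

 private:
  ConnectionPool& pool_;
  std::unique_ptr<Connection> conn_;
};
```

With more than one connection available, independent operations no longer queue behind a single strand. The wait counter gives a place to hang the empty-pool warning.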
Dan Mick [Tue, 26 Aug 2025 00:45:21 +0000 (17:45 -0700)]
Remove git clean -fdx
Either
1) a source tarball is supplied, in which case the local dir is
irrelevant, or
2) make-debs calls make-dist, which doesn't care about a dirty cwd.
So the clean just punishes the unaware by removing things that they may
have wanted to keep.
Afreen Misbah [Wed, 13 Aug 2025 06:49:02 +0000 (12:19 +0530)]
mgr/dashboard: Add /health/snapshot api
Fixes https://tracker.ceph.com/issues/72609
- The current minimal API relies on fetching data from osdmap and pgmap.
- These commands produce large, detailed payloads that become a performance bottleneck and impact scalability, especially in large clusters.
- To address this, we propose switching to the ceph snapshot API using ceph status command, which retrieves essential information directly from the cluster map.
- ceph status is significantly more lightweight compared to osdmap/pgmap, reducing payload sizes and processing overhead.
- This change ensures faster response times, improves system efficiency in large deployments, and minimizes unnecessary data transfer.
- update tests
Dan Mick [Sat, 23 Aug 2025 00:43:24 +0000 (17:43 -0700)]
make-debs.sh: invoke tar with --no-same-owner
When running as a normal user, tar does not attempt to preserve
owners set on the tar content files. When running as root, it does.
Containerized builds are running as root. Stop make-debs.sh from
trying to set other owners for files, and leaving files in the
host system with mapped UIDs other than the user running the container
(which causes jenkins to be unable to clear the workspace).
rgw/d4n: modify the update method to optionally take a bool
for the dirty flag, so that when the flag is not passed, the method
reuses the previous value of the dirty flag from the in-memory data structure.
This helps eliminate a call to the directory to fetch the
'dirty' flag in the flush() method.
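The optional-flag idea might be sketched as follows. `CachePolicy`, `Entry`, and the signatures are hypothetical and do not reflect the actual D4N policy interface; an unknown key is assumed to default to not dirty.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <unordered_map>

struct Entry {
  std::size_t len;
  bool dirty;
};

class CachePolicy {
 public:
  // When `dirty` is std::nullopt, keep the value already recorded in
  // memory (assumed false for a key that has not been seen before).
  void update(const std::string& key, std::size_t len,
              std::optional<bool> dirty = std::nullopt) {
    auto it = entries_.find(key);
    const bool previous = (it != entries_.end()) ? it->second.dirty : false;
    entries_[key] = Entry{len, dirty.value_or(previous)};
  }

  bool is_dirty(const std::string& key) const {
    auto it = entries_.find(key);
    return it != entries_.end() && it->second.dirty;
  }

 private:
  std::unordered_map<std::string, Entry> entries_;
};
```

A caller that does not know the current state simply omits the argument, so no extra lookup is needed to preserve the flag.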
rgw/d4n: optimize the iterate method to also align
the last block with max_chunk_size (object size
or rgw_max_chunk_size) and to perform checks
based on object size.
Zac Dover [Fri, 22 Aug 2025 08:39:29 +0000 (18:39 +1000)]
doc/cephfs: edit troubleshooting.rst (Slow MDS)
Move the "Slow requests (MDS)" section immediately after the first
section in this document ("Slow/Stuck Operations"), because the first
procedure on the page directs the reader to undertake the operation in
"Slow requests (MDS)" before trying anything else.
Dan Mick [Thu, 21 Aug 2025 20:00:43 +0000 (13:00 -0700)]
make-debs.sh: make "skip debug packages" conditional
Now that we're using make-debs.sh as a builder inside containers,
the default should be to build all the packages, including debug.
(Also, fix a typo.)
Ilya Dryomov [Thu, 21 Aug 2025 19:39:29 +0000 (21:39 +0200)]
mon/MonClient: post version request completions outside of monc_lock
dispatch() is allowed to invoke the completion object in the current
thread, before control returns from dispatch(). This isn't desirable
when it comes to discarding version requests in MonClient::shutdown()
and MonClient::_reopen_session() because completion objects could then
be invoked under monc_lock. In case of MonClient::_reopen_session() in
particular, this leads to an attempt to acquire monc_lock once again in
MonClient::get_version() on a retry due to monc_errc::session_reset
that is converted to errc::resource_unavailable_try_again:
  MonClient::ms_handle_reset
    < takes monc_lock >
    MonClient::_reopen_session
      < invokes the completion object via dispatch() with ec == monc_errc::session_reset >
      Objecter::CB_Objecter_GetVersion::operator() [ ec == errc::resource_unavailable_try_again ]
        Objecter::_wait_for_latest_osdmap
          MonClient::get_version
            < attempts to take monc_lock in the body of the lambda >
The end result is either a lockup or some form of undefined behavior.
The best possible outcome here is an exception (std::system_error with
"Resource deadlock avoided" error) and a successive call to
std::terminate().
This is a regression introduced in commit e81d4eae4e76 ("common/async:
Update `use_blocked` for newer asio"). Revert to posting version
request completions for the error cases in a way that is uniform with
the success case in MonClient::handle_get_version_reply().
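The underlying pattern, never invoking a completion while holding the lock it may need, can be illustrated with a simplified sketch. The `Client` class below is hypothetical and is not MonClient. The actual fix posts completions to an executor, while this sketch only defers invocation until after the mutex is released.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

class Client {
 public:
  using Completion = std::function<void(int /*error code*/)>;
  static constexpr int kSessionReset = 1;  // invented error value

  // Registers a request; takes the (non-recursive) lock.
  void get_version(Completion c) {
    std::lock_guard<std::mutex> l(lock_);
    pending_.push_back(std::move(c));
  }

  // Discards pending requests. Completions are moved out under the lock
  // and invoked only after it is released, so a completion that retries
  // via get_version() does not try to re-acquire a lock already held.
  void reopen_session() {
    std::vector<Completion> to_run;
    {
      std::lock_guard<std::mutex> l(lock_);
      to_run.swap(pending_);
    }
    for (auto& c : to_run) c(kSessionReset);
  }

  std::size_t pending() const {
    std::lock_guard<std::mutex> l(lock_);
    return pending_.size();
  }

 private:
  mutable std::mutex lock_;
  std::vector<Completion> pending_;
};
```

Had `reopen_session()` invoked the completions inside the locked scope, a retrying completion would lock the same `std::mutex` twice on one thread, which is undefined behavior and matches the lockup or "Resource deadlock avoided" outcome described above.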