John Mulligan [Fri, 21 Mar 2025 18:28:25 +0000 (14:28 -0400)]
script/build-with-container: cache git branch result
Cache the branch we got from the git command as it is highly unlikely
to change during the script execution and if it does -- we mostly don't
care anyway.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
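A minimal sketch of the caching idea described above, not the script's actual code. The function name `current_branch` and the exact git invocation are assumptions made for illustration; the point is only that the git command runs once and later calls reuse the result.

```python
import functools
import subprocess


# Hypothetical helper: the real script's function name and git arguments may differ.
@functools.lru_cache(maxsize=None)
def current_branch(repo_dir="."):
    """Return the checked-out branch name, asking git only once per repo_dir."""
    result = subprocess.run(
        ["git", "rev-parse", "--abbrev-ref", "HEAD"],
        cwd=repo_dir,
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()
```

If the branch does change while the script runs, the cached value goes stale, which matches the trade-off the commit message accepts.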
Zac Dover [Wed, 11 Jun 2025 12:44:32 +0000 (22:44 +1000)]
doc/rados/ops: edit cache-tiering.rst
Add material to doc/rados/operations/cache-tiering.rst, as suggested by
Anthony D'Atri in
https://github.com/ceph/ceph/pull/63745#discussion_r2127887785.
Zac Dover [Wed, 11 Jun 2025 12:39:50 +0000 (22:39 +1000)]
doc/radosgw: edit cloud-transition.rst
Add a link to the "Versioned Objects" section from a place in the docs
where that section is referred to. This change was requested by Anthony
D'Atri in
https://github.com/ceph/ceph/pull/63447#discussion_r2104492552.
Jaya Prakash [Wed, 5 Mar 2025 21:56:37 +0000 (21:56 +0000)]
os/bluestore: Implemented create-bdev-label
Introduces a helper function create_bdev_label() and a new command create-bdev-label
to write essential OSD metadata (e.g., fsid, whoami) directly into the device label
at offset 0, for use on devices where support_bdev_label == false.
Zac Dover [Tue, 10 Jun 2025 10:38:54 +0000 (20:38 +1000)]
doc/rbd: add mirroring troubleshooting info
Add a note to doc/rbd/rbd-mirroring.rst that directs the reader to set
both "site-a" and "site-b" to have the same pool names in the event that
rbd throws the error message "failed to import peer bootstrap token".
This information was reported to the Ceph upstream by Petr Tlapa in June
of 2025, and credit for its development goes to Petr.
Zac Dover [Tue, 10 Jun 2025 10:58:22 +0000 (20:58 +1000)]
doc/rados: enhance "pools.rst"
Add a link to the instructions for modifying a user's caps for a given
pool, placed where the reader would naturally want to have the link.
Kefu Chai [Tue, 10 Jun 2025 09:59:28 +0000 (17:59 +0800)]
test/erasure-code: fix memory leak in ErasureCodePlugin.parity_delta_write
Fix 4KB memory leak in ErasureCodePlugin_parity_delta_write_Test caused by
unmanaged raw buffer allocation. The test was allocating a 4096-byte raw
buffer to replace shard 4 for delta encoding validation, but the buffer::ptr
constructed from the raw pointer did not manage the buffer's lifecycle.
Detected by AddressSanitizer:
```
Direct leak of 4096 byte(s) in 1 object(s) allocated from:
#0 0x7fb73a720e15 in malloc /usr/src/debug/gcc/gcc/libsanitizer/asan/asan_malloc_linux.cpp:67
#1 0x5562f4062ccc in ErasureCodePlugin_parity_delta_write_Test::TestBody() /home/kefu/dev/ceph/src/test/erasure-code/TestErasureCodePluginJerasure.cc:122
#2 0x5562f41081a1 in void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/kefu/dev/ceph/src/googletest/googletest/src/gtest.cc:2653
#3 0x5562f40f3004 in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/kefu/dev/ceph/src/googletest/googletest/src/gtest.cc:2689
#4 0x5562f409cbba in testing::Test::Run() /home/kefu/dev/ceph/src/googletest/googletest/src/gtest.cc:2728
```
In this change, we replace raw pointer allocation with
create_bufferptr() to ensure proper memory management by buffer::ptr.
Zac Dover [Tue, 10 Jun 2025 02:54:18 +0000 (12:54 +1000)]
doc/mgr: edit telemetry.rst (lines 300-400)
Edit doc/mgr/telemetry.rst (lines 300-400).
Follow up on the suggestions made by Anthony D'Atri in
https://github.com/ceph/ceph/pull/63741 (except for the one about
including Lovecraftian lore in the dummy user data in this file).
Yuval Lifshitz [Tue, 3 Jun 2025 10:30:46 +0000 (10:30 +0000)]
rgw/logging: return the last object name that was actually committed
When committing a pending object that was never created, we should
not return the object name as the name of the committed object.
Instead, we should return the name of the object that was actually
committed.
Kefu Chai [Mon, 9 Jun 2025 10:26:21 +0000 (18:26 +0800)]
rgw/rgw_lua_utils: fix memory leak in luaL_error() formatting
Previously, error messages passed to luaL_error() were formatted using
std::string concatenation. Since luaL_error() never returns (it throws
a Lua exception via longjmp), the allocated std::string memory was
leaked, as detected by AddressSanitizer.
This change replaces std::string formatting with stack-allocated buffer
and std::to_chars() to eliminate the memory leak.
Note: We cannot format int64_t directly through luaL_error() because
lua_pushfstring() does not support long long or int64_t format specifiers,
even in Lua 5.4 (see https://www.lua.org/manual/5.4/manual.html#lua_pushfstring).
Since libstdc++ uses int64_t for std::chrono::milliseconds::rep, we use
std::to_chars() for safe, efficient conversion without heap allocation.
The maximum runtime limit was a configuration introduced by 3e3cb156.
Naveen Naidu [Mon, 9 Jun 2025 08:02:44 +0000 (13:32 +0530)]
.github/workflows/diff-ceph-config.yml: use --ref-commit-sha and --cmp-commit-sha
Update the config_diff.py invocation to use `--ref-commit-sha` and
`--cmp-commit-sha` to replicate the three-dot diff [1] that GitHub uses
for showing its diff. This way we only output the configuration changes
that have been made in the PR.
Naveen Naidu [Sun, 8 Jun 2025 13:55:24 +0000 (19:25 +0530)]
src/script/config_diff.py: add support for `ref-commit-sha` and `cmp-commit-sha` arguments
Introduced `ref-commit-sha` and `cmp-commit-sha` arguments to the
`diff-branch-remote-repo` mode, enabling comparison of remote
branches against specific commits.
This enhancement is crucial for comparing configuration changes
between a pull request (PR) and the Ceph upstream main branch. It
allows for precise comparison by focusing on files changed in the
PR, rather than simply comparing the PR's head with its latest
commit.
The approach mirrors GitHub's three-dot diff [1], where the PR is
compared against the common ancestor of the Ceph upstream repository,
i.e., the point where the PR was forked.
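A rough sketch of the three-dot comparison described above, assuming a local git checkout. The helper names `_git` and `three_dot_changed_files` are hypothetical and are not taken from config_diff.py; the sketch relies only on the fact that `git diff A...B` is equivalent to diffing the merge base of A and B against B.

```python
import subprocess


def _git(args, repo_dir="."):
    """Run a git command and return its stripped stdout (hypothetical helper)."""
    result = subprocess.run(
        ["git", *args],
        cwd=repo_dir,
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def three_dot_changed_files(ref_sha, cmp_sha, repo_dir="."):
    """List files changed on cmp_sha since it diverged from ref_sha."""
    # Find the common ancestor, i.e. the point where the PR was forked.
    base = _git(["merge-base", ref_sha, cmp_sha], repo_dir)
    # Diff the ancestor against the PR commit, not against the tip of ref_sha.
    names = _git(["diff", "--name-only", base, cmp_sha], repo_dir)
    return names.splitlines()
```

Diffing against the merge base keeps unrelated upstream changes that landed after the fork out of the result.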
Naveen Naidu [Mon, 9 Jun 2025 07:36:00 +0000 (13:06 +0530)]
.github/workflows/config-diff-post-comment.js: improve handling of GH comment
1. When no configuration changes are detected, delete the outdated
configuration diff GitHub comment. This ensures that the PR does not
have any misleading information about configuration changes.
2. Configuration changes might differ with every push event, so update the
old configuration diff comment with the new configDiff that was
calculated in the present run.
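The two rules above can be sketched as a small decision function. This is an illustration in Python rather than the workflow's JavaScript, and `plan_comment_action` is a hypothetical name; the "create" and "noop" branches are assumptions that the commit message does not spell out.

```python
def plan_comment_action(existing_comment_id, config_diff):
    """Decide what to do with the PR's configuration-diff comment.

    Returns a tuple whose first element names the action.
    """
    has_changes = bool(config_diff and config_diff.strip())
    if not has_changes:
        # Rule 1: nothing to report, so remove a stale comment if one exists.
        if existing_comment_id is not None:
            return ("delete", existing_comment_id)
        # Assumption: with no changes and no comment there is nothing to do.
        return ("noop",)
    if existing_comment_id is not None:
        # Rule 2: refresh the old comment with the diff from the present run.
        return ("update", existing_comment_id, config_diff)
    # Assumption: with changes and no prior comment, a new one is posted.
    return ("create", config_diff)
```

For example, `plan_comment_action(42, "")` yields `("delete", 42)`, while `plan_comment_action(42, "+ osd_foo")` yields `("update", 42, "+ osd_foo")`.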
Naveen Naidu [Sun, 8 Jun 2025 06:37:11 +0000 (12:07 +0530)]
src/scripts/config-diff.py: simplify sparse_branch_checkout_* functions and add file names to POSIX diff
Refactored `sparse_branch_checkout_skip_clone` and
`sparse_branch_checkout_remote_repo_skip_clone` to accept and use
branch/tag names directly instead of constructing `ref_sha` strings
throughout the code.
Also include in the POSIX diff the names of the files that the
configuration values come from. This helps identify the config options
faster in case of discrepancies.
Kefu Chai [Wed, 4 Jun 2025 03:05:38 +0000 (11:05 +0800)]
cmake: enable out-of-source build of breakpad
Previously, Breakpad was built in its source tree instead of the
user-specified build directory, inconsistent with other external
projects and potentially causing source tree pollution.
Include path fix:
- Add ${INSTALL_DIR}/include/breakpad to include directories to fix
FTBFS on Jammy builders
Build system improvements:
- Replace dedicated LSS submodule symlink target with PATCH_COMMAND to
simplify the build process
- Use user-specified make command instead of hardcoded "make"
- Skip building unused process library and tools
- Link against breakpad with PRIVATE visibility unless required
Compiler flag cleanups:
- Remove -Wno-array-bounds from CFLAGS (Breakpad uses C++/CXXFLAGS)
- Remove compile-time flags incorrectly placed in LDFLAGS
- Remove '-fPIC' from CFLAGS, as it is already included by breakpad
when building on linux hosts.
- Replace the individual -Wno-* flags with -Wno-error to cancel
-Werror option specified by breakpad. This is more future-proof.
CMake target modernization:
- Rename libbreakpad_client to Breakpad::client following modern conventions
- Add Breakpad::breakpad header-only target to minimize dependencies
- Install library to enable proper include path prefixes
(breakpad/client/... vs client/...)
Header dependency optimization:
- Remove Breakpad includes from popular headers, use forward declarations
- Include Breakpad headers before internal headers for better readability
Ronen Friedman [Wed, 4 Jun 2025 17:44:16 +0000 (12:44 -0500)]
osd/scrub: make m_session_started_at at Session state ctor
ScrubMachine::get_time_scrubbing() must access the Session object
to compute the scrub duration. But the State data is not externally
accessible before its ctor has completed.
As we always happen to try to access that data inside the ctor,
this always results in a warning log message.
Here we move m_session_started_at into the outer state, simplifying
the logic required to access it.
Zac Dover [Wed, 4 Jun 2025 23:39:33 +0000 (09:39 +1000)]
doc/glossary: s/OMAP/omap/
Change "OMAP" to "omap" to match the capitalization established by
Eleanor Cawthon in her 2012 omap paper, here:
https://ceph.io/assets/pdfs/CawthonKeyValueStore.pdf.
Samuel Just [Wed, 4 Jun 2025 20:55:21 +0000 (20:55 +0000)]
.gitmodules: remove shallow=true config from nvmeof/gateway
https://github.com/ceph/ceph/pull/61264 reintroduced
https://tracker.ceph.com/issues/67640 fixed by 383091e89.
Setting shallow=true for the nvmeof/gateway submodule
is problematic because the ceph.git submodule sha1
is only very rarely the head sha1 of the default
branch.
Fixes: https://tracker.ceph.com/issues/71568
Signed-off-by: Samuel Just <sjust@redhat.com>
mgr/dashboard: fix KeyError exception in HardwareService.get_summary()
Typical error:
```
[dashboard ERROR exception] Internal Server Error
Traceback (most recent call last):
File "/usr/share/ceph/mgr/dashboard/services/exception.py", line 48, in dashboard_exception_handler
return handler(*args, **kwargs)
File "/lib/python3.9/site-packages/cherrypy/_cpdispatch.py", line 54, in __call__
return self.callable(*self.args, **self.kwargs)
File "/usr/share/ceph/mgr/dashboard/controllers/_base_controller.py", line 263, in inner
ret = func(*args, **kwargs)
File "/usr/share/ceph/mgr/dashboard/controllers/_rest_controller.py", line 193, in wrapper
return func(*vpath, **params)
File "/usr/share/ceph/mgr/dashboard/controllers/hardware.py", line 21, in summary
return HardwareService.get_summary(categories, hostname)
File "/usr/share/ceph/mgr/dashboard/services/hardware.py", line 33, in get_summary
'ok': sum(item['status']['health'] == 'OK' for items in data.values()
File "/usr/share/ceph/mgr/dashboard/services/hardware.py", line 33, in <genexpr>
'ok': sum(item['status']['health'] == 'OK' for items in data.values()
KeyError: 'status'
```
The recent change from commit `fbcdf571ca1` introduced this regression.
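One way to avoid this class of KeyError is to read the nested keys defensively. The sketch below is not the actual fix; `count_ok` is a hypothetical helper, and the data shape (host name mapped to components, each with an optional `status` dict) is an assumption inferred from the traceback.

```python
def count_ok(data):
    """Count components whose health is 'OK', tolerating missing keys."""
    return sum(
        # A missing or empty 'status' entry counts as not OK instead of raising.
        (item.get("status") or {}).get("health") == "OK"
        for items in data.values()
        for item in items.values()
    )
```

With this approach, a component that reports no `status` is simply not counted as healthy.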