Ilya Dryomov [Thu, 21 Aug 2025 19:39:29 +0000 (21:39 +0200)]
mon/MonClient: post version request completions outside of monc_lock
dispatch() is allowed to invoke the completion object in the current
thread, before control returns from dispatch(). This isn't desirable
when it comes to discarding version requests in MonClient::shutdown()
and MonClient::_reopen_session() because completion objects could then
be invoked under monc_lock. In the case of MonClient::_reopen_session() in
particular, this leads to an attempt to acquire monc_lock once again in
MonClient::get_version() on a retry due to monc_errc::session_reset
that is converted to errc::resource_unavailable_try_again:
  MonClient::ms_handle_reset
    < takes monc_lock >
    MonClient::_reopen_session
      < invokes the completion object via dispatch() with ec == monc_errc::session_reset >
      Objecter::CB_Objecter_GetVersion::operator() [ ec == errc::resource_unavailable_try_again ]
        Objecter::_wait_for_latest_osdmap
          MonClient::get_version
            < attempts to take monc_lock in the body of the lambda >
The end result is either a lockup or some form of undefined behavior.
The best possible outcome here is an exception (std::system_error with
"Resource deadlock avoided" error) and a subsequent call to
std::terminate().
This is a regression introduced in commit e81d4eae4e76 ("common/async:
Update `use_blocked` for newer asio"). Revert to posting version
request completions for the error cases in a way that is uniform with
the success case in MonClient::handle_get_version_reply().
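A minimal sketch of the lock-ordering issue, using only the C++ standard library. The class and method names below are hypothetical stand-ins, not the real MonClient API, and running the completions after the lock is released is a simplified, synchronous stand-in for posting them to an asio executor. The point is that a completion which retries by re-taking the lock only works if it is not invoked while that lock is held.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical error code standing in for monc_errc::session_reset.
enum class Err { ok, session_reset };

// Hypothetical, simplified client; not the real MonClient interface.
class VersionClient {
public:
  using Completion = std::function<void(Err, int /*version*/)>;

  // Like MonClient::get_version(), registering a request takes the lock.
  void get_version(Completion c) {
    std::lock_guard<std::mutex> l(lock_);
    pending_.push_back(std::move(c));
  }

  // Analogue of _reopen_session(): fail every pending request.
  void reopen_session() {
    std::vector<Completion> to_fail;
    {
      std::lock_guard<std::mutex> l(lock_);
      to_fail.swap(pending_);  // detach completions while the lock is held
    }
    // Completions run only after the lock is released, so one that retries
    // via get_version() can take the lock again without self-deadlocking.
    for (auto& c : to_fail) {
      c(Err::session_reset, 0);
    }
  }

  std::size_t pending() {
    std::lock_guard<std::mutex> l(lock_);
    return pending_.size();
  }

private:
  std::mutex lock_;
  std::vector<Completion> pending_;
};
```

In this sketch, a completion that retries on session_reset simply calls get_version() again. Had reopen_session() invoked it inside the locked scope, the same thread would try to lock the std::mutex twice, which is the situation described in the call stack above.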
Improve source rpm detection by adding a new detection method that
executes an rpm command in a container to get exactly the version of
the source rpm that the ceph.spec file would have generated. For
backwards compatibility, and because I don't entirely trust myself to
have tested this fully, the old methods are still available.
The old `--rpm-no-match-sha` option is now an alias for `--srpm-match=any`,
which causes it to build any (unique) ceph srpm it finds.
`--srpm-match=versionglob` retains the previous default behavior of
using a glob matching on the git id or ceph version value. The new
default of `--srpm-match=auto` implements the rpm command based behavior
described above.
All of this is wrapped in a new `find-rpm` step, but that is mostly an
implementation detail and exists for testing.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
Alex Ainscow [Wed, 13 Aug 2025 11:03:21 +0000 (12:03 +0100)]
src: Add sign-compare warnings to clang
For a while, GCC has generated warnings about signed/unsigned comparisons.
A common mistake when compiling with clang was to accidentally introduce
signedness errors, which were then only picked up by the GCC builds.
This occurs due to an inconsistency in the -Wall implementation between
clang and gcc: gcc includes sign-compare, clang does not.
See:
https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Wall
vs
https://clang.llvm.org/docs/DiagnosticsReference.html#wall
Note that sign-compare is included under -Wextra for clang:
https://clang.llvm.org/docs/DiagnosticsReference.html#wextra
Clang will now generate similar warnings with -Wsign-compare:
https://clang.llvm.org/docs/DiagnosticsReference.html#wsign-compare
Interestingly, if specified on its own, -Wsign-compare will include
C, whereas gcc -Wall affects C++ only. Therefore we must work around
this in the make file to emulate the GCC behaviour in clang builds.
Also fix a couple of warnings found in some tests.
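To illustrate the class of bug that -Wsign-compare catches, here is a small C++ sketch (the function names are made up for illustration). When an int is compared with a std::size_t, the int is converted to the unsigned type first, so -1 becomes a very large value and the comparison gives a surprising answer.

```cpp
#include <cassert>
#include <cstddef>

// Mirrors what an implicit `i < n` does: the int is converted to the
// unsigned type. The explicit cast keeps this sketch warning-free.
bool naive_less(int i, std::size_t n) {
  return static_cast<std::size_t>(i) < n;
}

// Sign-aware comparison: a negative int is always less than any size_t.
bool safe_less(int i, std::size_t n) {
  return i < 0 || static_cast<std::size_t>(i) < n;
}
```

With these definitions, naive_less(-1, 1) is false while safe_less(-1, 1) is true; a plain `i < n` in real code would behave like naive_less and is what the warning flags.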
Kefu Chai [Mon, 18 Aug 2025 02:41:07 +0000 (10:41 +0800)]
os/Transaction: initialize unused fields in TransactionData
Initialize unused1, unused2, and unused3 fields to zero in TransactionData
to ensure consistent encoding/decoding behavior.
Background:
In commit a0c9fec7, we updated TransactionData encoding/decoding and bumped
the Transaction encoding version from 9 to 10. As part of this change, we
renamed three fields to mark them as unused:
- largest_data_len → unused1
- largest_data_off → unused2
- largest_data_off_in_data_bl → unused3
The move constructor was also updated to stop setting these fields, leaving
them uninitialized after move operations.
Problem:
This worked with existing tests because check-generated.sh reused struct
instances, preserving stale values across encode/decode cycles. However,
an upcoming test change will stop reusing instances and compare hexdumps
of encoded/re-encoded values to verify consistency. Uninitialized fields
cause these comparisons to fail due to garbage values.
Solution:
Initialize the unused fields to zero in the move constructor. This preserves
existing behavior while ensuring consistent encoding. These fields can be
removed entirely in a future change.
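A minimal sketch of the fix, assuming a trimmed-down, hypothetical struct (not the real TransactionData layout) whose fields are all written to the encoded output. Because the move constructor sets the unused fields to zero rather than leaving them uninitialized, a moved-to value encodes to the same bytes as a freshly built one.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <utility>
#include <vector>

// Hypothetical stand-in for TransactionData; every field reaches the wire.
struct TxData {
  uint64_t ops;
  uint32_t unused1;  // formerly a "largest_data_*" field, still encoded
  uint32_t unused2;
  uint32_t unused3;

  TxData() : ops(0), unused1(0), unused2(0), unused3(0) {}

  // Move constructor: copy the live field, zero the unused ones instead of
  // leaving them uninitialized.
  TxData(TxData&& o) noexcept : ops(o.ops), unused1(0), unused2(0), unused3(0) {}
};

// Append the raw bytes of a trivially copyable value.
template <typename T>
void put(std::vector<unsigned char>& out, T v) {
  unsigned char buf[sizeof(T)];
  std::memcpy(buf, &v, sizeof(T));
  out.insert(out.end(), buf, buf + sizeof(T));
}

// Field-by-field encoding, so no struct padding ends up in the output.
std::vector<unsigned char> encode(const TxData& d) {
  std::vector<unsigned char> out;
  put(out, d.ops);
  put(out, d.unused1);
  put(out, d.unused2);
  put(out, d.unused3);
  return out;
}
```

Comparing encode() of a moved-to value with encode() of a fresh value is a toy version of the hexdump comparison mentioned above: any stale value left in an unused field would show up as a byte difference.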
Ronen Friedman [Thu, 7 Aug 2025 04:54:30 +0000 (23:54 -0500)]
qa/standalone/scrub: re-code osd-scrub-dump.sh to test scrub repair functionality.
The new version of osd-scrub-dump.sh is designed to
allow multiple "corruption methods" on a subset of objects.
The functionality includes specifying:
- the number of objects created;
- the number to have their Primary version modified;
- the number to have their Replicas modified;
- the set of "manipulations" to perform on the objects.
The arm64-only module uadk needs numa.h to build; nothing else
ensures it's available. Make it an unconditional ceph build
dependency on behalf of the arm64 build.
Fixes: https://tracker.ceph.com/issues/72594
Signed-off-by: Dan Mick <dan.mick@redhat.com>
tasks/cephfs: Use different errmsg for invalid dir
During test_df_for_invalid_directory, path_walk is now called.
Use a more general error message, since more errnos can be returned
and a general message is a better catch-all.
Signed-off-by: Christopher Hoffman <choffman@redhat.com>
Dan Mick [Wed, 13 Aug 2025 19:16:45 +0000 (12:16 -0700)]
pybind/mgr/dashboard/frontend: add NPM_CACHEDIR envvar, use in bwc
Add an optional NPM_CACHEDIR environment variable to serve as the
cache parameter for npm in the dashboard frontend build. The idea
is to allow it to persist across builds so that we decrease the load
on registry.npmjs.org, which has been throttling our requests when
using build-with-container.py, and also hopefully improve the time
of the frontend npm operations.
build-with-container.py also grows a --npm-cache-path option to allow
setting it for container builds and passing the envvar to the build.
Fixes: https://tracker.ceph.com/issues/72298
Signed-off-by: Dan Mick <dan.mick@redhat.com>
John Mulligan [Wed, 13 Aug 2025 16:59:24 +0000 (12:59 -0400)]
cephadm/smb: fix issue setting port of remote-control sidecar
Use the new set of constants to ensure all components that touch the smb
services use the same set of ports and port names. Ensure that the
remote-control sidecar service's port can be customized.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
John Mulligan [Wed, 13 Aug 2025 16:58:06 +0000 (12:58 -0400)]
mgr/smb: update port validation to use new smb constants
Make behavior consistent across the smb mgr module and the service spec
class by using the same set of constants.
Fix an issue supporting the customization of the `remote-control`
sidecar's port.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
rgw/restore: Update expiry-date of restored copies
As per the AWS spec (https://docs.aws.amazon.com/AmazonS3/latest/API/API_RestoreObject.html),
if a `restore-object` request is re-issued on an already restored copy, the server needs to
update the restoration period relative to the current time. These changes implement that.
Note: this applies only to temporarily restored copies.
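A rough sketch of the expiry update, assuming a hypothetical RestoredCopy type and a restore period expressed in days. Any date rounding the real implementation may apply is ignored here.

```cpp
#include <cassert>
#include <chrono>

using Clock = std::chrono::system_clock;

// Hypothetical model of a restored object copy.
struct RestoredCopy {
  bool temporary;            // restored for a limited number of days
  Clock::time_point expiry;  // when the restored copy should expire
};

// Re-issuing restore-object on a temporary copy moves its expiry to
// `now + days`; non-temporary copies are left untouched.
void reissue_restore(RestoredCopy& copy, int days, Clock::time_point now) {
  if (!copy.temporary) {
    return;
  }
  copy.expiry = now + std::chrono::hours(24 * days);
}
```

The key design point is that the new expiry is computed from the time of the re-issued request, not extended from the previous expiry.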
John Mulligan [Wed, 13 Aug 2025 15:06:35 +0000 (11:06 -0400)]
python-common: add a new smb sub package
Add a new smb sub package for smb-related things that are meant to
be shared throughout the Ceph Python code dealing with smb and the
smb management stack.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
Jon Bailey [Mon, 11 Aug 2025 11:51:42 +0000 (12:51 +0100)]
erasure-code/consistency: Allow consistency checker to be able to deal with non-4k aligned buffers
A parity returned to the client will be the size of shard 0, which may not be 4k aligned in optimised erasure coding. A parity returned through the encode function, however, will always be 4k aligned. This change truncates the encoded data down to the size of the client write before the consistency check compares the two, so we know we are comparing like-for-like.
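A minimal sketch of the like-for-like comparison, assuming parity buffers are plain byte vectors. The real consistency checker works on Ceph buffer lists, and parity_matches is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Buffer = std::vector<unsigned char>;

// Compare the parity seen by the client (sized to shard 0, possibly not
// 4096-byte aligned) with the parity from encode (padded to 4096 bytes),
// after truncating the encoded copy to the client's length.
bool parity_matches(const Buffer& from_client, const Buffer& from_encode) {
  if (from_encode.size() < from_client.size()) {
    return false;  // encoded parity should never be shorter
  }
  Buffer truncated(from_encode.begin(),
                   from_encode.begin() + static_cast<std::ptrdiff_t>(from_client.size()));
  return truncated == from_client;
}
```

Without the truncation, a 5000-byte client parity would never compare equal to an 8192-byte encoded parity, even when the meaningful bytes are identical.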
Signed-off-by: Jon Bailey <jonathan.bailey1@ibm.com>
Jon Bailey [Thu, 17 Jul 2025 15:51:58 +0000 (16:51 +0100)]
test/osd: Move initialisation of overwrites and optimisation earlier in ceph_test_rados_io_sequence
All other pool initialisation happens straight after pool creation; however, these two items happened later on, simply because they were the first two rest calls added. We now have better and more logical places for this code, and this commit moves it into that structure.
Signed-off-by: Jon Bailey <jonathan.bailey1@ibm.com>
Afreen Misbah [Mon, 11 Aug 2025 09:03:32 +0000 (14:33 +0530)]
mgr/dashboard: Replace capacity threshold data with prometheus metrics
- Fixes https://tracker.ceph.com/issues/72519
- the osd dump metrics are used in /api/osd/settings
- these metrics create a performance bottleneck when there are thousands of OSDs
- replace them with similar prometheus metrics
- minor refactors, including renaming and comments
client: get quota root based off of provided inode in statfs
In statfs, get the quota_root for the provided inode. Check whether a
quota is applied directly to the inode. If not, walk up the tree in
reverse and possibly find a quota set higher up.
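The lookup described above can be sketched as a simple walk towards the root. The Inode type and get_quota_root() below are hypothetical simplifications, not the actual ceph Client code, and this sketch returns nullptr when no quota is found anywhere on the path.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical inode model: a parent pointer (nullptr at the root) and an
// optional quota, where 0 means "no quota set on this inode".
struct Inode {
  Inode* parent = nullptr;
  uint64_t max_bytes = 0;
  bool has_quota() const { return max_bytes != 0; }
};

// Start at the inode given to statfs; if it has no quota of its own,
// walk up towards the root and return the first ancestor that does.
const Inode* get_quota_root(const Inode* in) {
  for (const Inode* cur = in; cur != nullptr; cur = cur->parent) {
    if (cur->has_quota()) {
      return cur;
    }
  }
  return nullptr;  // no quota anywhere on the path
}
```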
Fixes: https://tracker.ceph.com/issues/72355
Signed-off-by: Christopher Hoffman <choffman@redhat.com>
Patrick Donnelly [Tue, 12 Aug 2025 18:43:43 +0000 (14:43 -0400)]
Merge PR #64821 into main
* refs/pull/64821/head:
.github: only run redmine-upkeep actions from ceph/ceph.git
script/redmine-upkeep: add transform to resolve merged issue
script/redmine-upkeep: set default filter name/priority
script/redmine-upkeep: raise exception for PUT failures
script/redmine-upkeep: finish transform after application
script/redmine-upkeep: indicate log location in comments
script/redmine-upkeep: check envvar to see if running as action
script/redmine-upkeep: bullet issue list
script/redmine-upkeep: add stronger note on upkeep-failed tag in failure message
script/redmine-upkeep: do not raise comment if upkeep-failed already present
script/redmine-upkeep: correct filter out of upkeep-failed
.github/workflows: allow redmine-upkeep to write comments
Patrick Donnelly [Wed, 16 Jul 2025 18:28:59 +0000 (14:28 -0400)]
script/redmine-upkeep: add transform to resolve merged issue
A few things:
- Add priority to transforms. Largely this is to have the "merged"
transformation run first to update the "Merge Commit" field of the ticket
before any other transform intends to look at that field. This avoids
duplicating logic to set the Merge Commit field.
- Fix a bug where the github API cannot be trusted to indicate the Merge Commit
for a PR. When the branch is renamed or changed, the github backend clearly
gets confused and gives the "HEAD" commit instead.
- Add a new transform that moves merged tickets to either Resolved or
Pending Backport status.
* Note: filters on TAGS cannot be combined. There is some restructuring to deal with that.
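The script itself is written in Python; the following C++ sketch only illustrates the ordering idea, using hypothetical Issue and Transform types. Transforms are sorted by priority before being applied, so the transform that fills in the merge commit runs before the one that reads it.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-ins for a tracker issue and a transform.
struct Issue {
  std::string merge_commit;
  std::string status = "New";
};

struct Transform {
  int priority;  // lower values run earlier
  std::string name;
  std::function<void(Issue&)> apply;
};

// Apply transforms in ascending priority order; stable_sort keeps the
// original order among transforms with equal priority.
void run_transforms(std::vector<Transform> transforms, Issue& issue) {
  std::stable_sort(transforms.begin(), transforms.end(),
                   [](const Transform& a, const Transform& b) {
                     return a.priority < b.priority;
                   });
  for (auto& t : transforms) {
    t.apply(issue);
  }
}
```

Even if the resolving transform is registered first, giving the "merged" transform a lower priority ensures the Merge Commit field is set before anything inspects it, which avoids duplicating that logic in each transform.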
Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>