John Mulligan [Wed, 15 Apr 2026 21:15:03 +0000 (17:15 -0400)]
CODEOWNERS: add a build-sig group for various build / test files
Add a new build-sig group that covers some of the high level tools and
scripts used in the build and CI processes. This should help PRs not
pass by without notifying people who care about these things.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
Kefu Chai [Sun, 29 Mar 2026 11:41:48 +0000 (19:41 +0800)]
crimson/osd: remove unnecessary named string for --smp value
The local variable `smp` is used only in the two immediately following
statements. Inline the fmt::format() call into emplace_back() and pass
reactor_num directly to the logger.
Kefu Chai [Sun, 29 Mar 2026 11:39:40 +0000 (19:39 +0800)]
crimson/osd: make early_config_t::to_ptr_vector private
The helper is an implementation detail of get_early_args() and
get_ceph_args(). Making it private prevents callers from inadvertently
holding the returned const char* pointers past the lifetime of the
input vector. Also fix the truncated doc-comment ("must not outlive in").
Shraddha Agrawal [Thu, 19 Mar 2026 08:01:28 +0000 (13:31 +0530)]
crimson/osd/pg_recovery: call MOSDPGRecoveryDelete instead of MOSDPGBackfillRemove
This commit fixes the abort in Recovered::Recovered.
There is a race to acquire the OBC lock between backfill and
client delete for the same object.
When the lock is acquired first by the backfill, the object is
recovered first, and then deleted by the client delete request.
When recovering the object, the corresponding peer_missing entry
is cleared and we are able to transition to Recovered state
successfully.
When the lock is acquired first by client delete request, the
object is deleted. Then backfill tries to recover the object,
finds it deleted and exits early. The stale peer_missing
entry is not cleared. In Recovered::Recovered, needs_recovery()
sees this stale peer_missing entry and calls abort.
The issue is fixed by sending MOSDPGRecoveryDelete from the client
path to peers and waiting for MOSDPGRecoveryDeleteReply in
recover_object.
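The two lock orderings described above can be replayed in a toy model. This is only a sketch: the function name, the "obj" key, and the way the fix is modelled (the recovery-delete round trip ends with the entry cleared) are assumptions made for illustration, not the crimson OSD code.

```python
# Toy model of the backfill vs. client delete race. All names here are
# illustrative assumptions; none of this is taken from the crimson sources.

def stale_entry_remains(order, fixed):
    """Replay the two actors in the given order.

    Returns True if a peer_missing entry is left behind, which is the
    condition that makes the needs_recovery() check trip.
    """
    object_exists = True
    peer_missing = {"obj"}  # peers that still miss the object

    for actor in order:
        if actor == "backfill":
            if not object_exists:
                continue  # object already deleted: backfill exits early
            peer_missing.discard("obj")  # recovery clears the entry
        elif actor == "client_delete":
            object_exists = False
            if fixed:
                # Assumption: notifying the peers about the delete and
                # waiting for their reply ends with the entry cleared.
                peer_missing.discard("obj")

    return bool(peer_missing)


if __name__ == "__main__":
    print(stale_entry_remains(["backfill", "client_delete"], fixed=False))
    print(stale_entry_remains(["client_delete", "backfill"], fixed=False))
    print(stale_entry_remains(["client_delete", "backfill"], fixed=True))
```

Only the delete-first ordering without the fix leaves an entry behind in this model.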
Ville Ojamo [Tue, 31 Mar 2026 06:51:15 +0000 (13:51 +0700)]
doc/ceph-volume: Fix spelling etc errors
Low-hanging spelling, punctuation, and capitalization errors.
Ignore style and other more complex issues.
Use angle brackets consistently for value placeholders.
Signed-off-by: Ville Ojamo <git2233+ceph@ojamo.eu>
Aliaksei Makarau [Tue, 31 Mar 2026 06:40:04 +0000 (08:40 +0200)]
This change introduces the shared memory communication (SMC-D) for the cluster network.
SMC-D is faster than ethernet in IBM Z LPARs and/or VMs (zVM or KVM).
bst2002git [Wed, 4 Mar 2026 15:48:20 +0000 (16:48 +0100)]
found duplicate series for the match group {fs_id="-1"}
when 1 MDS active and 2 MDS standby (on 3Node-Cluster)
found duplicate series for the match group {fs_id="-1"} on the right hand-side of the operation
many-to-many matching not allowed: matching labels must be unique on one side
Vallari Agrawal [Thu, 12 Mar 2026 13:50:00 +0000 (19:20 +0530)]
mgr/dashboard: Add 'network_mask' to nvmeof cli
This commit adds the following to the nvmeof cli:
0. Add new param `--network-mask` to 'subsystem add' cmd
It's a list parameter so we can pass multiple netmasks by
`subsystem add --network-mask <subnet1> --network-mask <subnet2>`
1. Add new cli `subsystem add_network --network-mask <subnet>`
2. Add new cli `subsystem del_network --network-mask <subnet>`
3. Add column 'network_mask' to `subsystem list` output
4. Add column 'manual' to `listener list` output
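A repeatable list parameter like `--network-mask` behaves roughly like an argparse option with the "append" action. The sketch below only illustrates that behaviour; the parser, the function name, and the example subnets are assumptions and are not how the nvmeof cli is actually implemented.

```python
import argparse


def parse_add_args(argv):
    """Parse a hypothetical 'subsystem add' command line."""
    parser = argparse.ArgumentParser(prog="subsystem add")
    # action="append" collects every occurrence of the flag into one list.
    parser.add_argument(
        "--network-mask",
        action="append",
        default=[],
        metavar="<subnet>",
    )
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_add_args(
        ["--network-mask", "10.0.0.0/24", "--network-mask", "192.168.1.0/24"]
    )
    print(args.network_mask)
```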
Shraddha Agrawal [Mon, 30 Mar 2026 10:12:08 +0000 (15:42 +0530)]
qa/tasks/cephadm.py: only pass --objectstore when not bluestore
This commit ensures that we pass the --objectstore argument to
cephadm's add/apply OSD command only when the value is not the
default value, bluestore.
This is done to ensure older ceph releases, like Squid and Tentacle,
do not fail, as the --objectstore argument was only added in Umbrella.
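The rule above amounts to omitting the flag for the default value. A minimal sketch, assuming a hypothetical helper name and that the flag takes its value as a separate argument; the real code in qa/tasks/cephadm.py may be structured differently.

```python
DEFAULT_OBJECTSTORE = "bluestore"


def objectstore_args(objectstore):
    """Return the extra arguments for the OSD add/apply command.

    Hypothetical helper: the flag is emitted only for non-default values,
    so releases that do not know --objectstore never see it.
    """
    if objectstore and objectstore != DEFAULT_OBJECTSTORE:
        return ["--objectstore", objectstore]
    return []


if __name__ == "__main__":
    print(objectstore_args("bluestore"))
    print(objectstore_args("seastore"))
```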
Kefu Chai [Sun, 29 Mar 2026 11:41:24 +0000 (19:41 +0800)]
crimson/osd: fix inaccurate comment about child early-exit in get_early_config
The comment contained a typo ("taged") and vaguely referred to "one of
the parameters" without explaining what actually happens: the child
calls exit(0) for early-exit paths such as --help and --version, and
the parent detects this by checking for a clean exit with no pipe data.
Kefu Chai [Sun, 29 Mar 2026 11:40:46 +0000 (19:40 +0800)]
crimson/osd: remove redundant comments
Remove comments that merely restate what the code already says clearly:
- SeastarOption field comments (option_name, config_key, value_type)
- "Define a list of Seastar options" above seastar_options
- "Function to get the option value as a string" above get_option_value
- "stop()s registered using defer() are called here" in main()
Also fix the trailing space before the semicolon in the value_type
field declaration.
Lumir Sliva [Sat, 28 Mar 2026 23:27:10 +0000 (00:27 +0100)]
doc/dev: fix typos in running-tests-locally.rst
Fix grammar error ('is be tested' -> 'can be tested'), misspellings
of 'bootstrap', 'teuthology', and 'environment', a repeated word
('manually manually'), and a missing article ('maybe bootstrap' ->
'maybe the bootstrap').
Lumir Sliva [Sat, 28 Mar 2026 23:43:27 +0000 (00:43 +0100)]
doc: fix typos and outdated refs across developer guide
Fix 'elipsis' to 'ellipsis' in SubmittingPatches.rst, update
outdated 'master' branch references to 'main' in essentials.rst
and running-tests-locally.rst, fix 'sometime' to 'sometimes' in
merging.rst, and remove duplicated word in teuthology-intro.rst.
Nizamudeen A [Sat, 28 Mar 2026 08:20:44 +0000 (13:50 +0530)]
mgr/dashboard: fix subvolume group corruption from smb share form
The SMB share form accidentally corrupts the subvolume group when it
calls the subvolume_info API with an empty subvol_name. This
corrupts the group entirely, and the following subvolume creation
fails.
The fix is to not call subvol_info with an empty name.
Fixes: https://tracker.ceph.com/issues/75771
Signed-off-by: Nizamudeen A <nia@redhat.com>
WenLei [Fri, 27 Mar 2026 08:40:14 +0000 (16:40 +0800)]
src/arch: fix hwprobe include path and ZBC/ZVBC offsets for riscv64
Signed-off-by: WenLei <lei.wen2@zte.com.cn>
Fix runtime detection of RISC-V ZBC and ZVBC crypto extensions.
Problems fixed:
- <sys/hwprobe.h> only exists in glibc >= 2.40 (released 2024-07-22).
Many production RISC-V distros still use older glibc (Ubuntu 22.04: 2.35,
Debian 12: 2.36, etc.) and would fail to compile.
Therefore we switch to the kernel UAPI header <asm/hwprobe.h>,
which works with all current glibc versions.
Proof:
- Absent in glibc 2.39:
https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h;hb=refs/tags/glibc-2.39
- Present in glibc 2.40:
https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h;hb=refs/tags/glibc-2.40
- Introducing commit:
https://sourceware.org/git/?p=glibc.git;a=commit;h=426d0e1aa8f17426d13707594111df712d2b8911
- Incorrect fallback bit positions:
- RISCV_HWPROBE_EXT_ZBC was (1ULL << 15) → should be (1ULL << 7)
- RISCV_HWPROBE_EXT_ZVBC was (1ULL << 20) → should be (1ULL << 18)
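The effect of the wrong fallback bit positions can be checked with plain integer arithmetic. The sketch below uses the corrected values from the list above; the `has_ext` helper and the sample mask are assumptions for illustration and do not call the hwprobe syscall.

```python
# Corrected fallback bit positions, as listed in the commit message.
RISCV_HWPROBE_EXT_ZBC = 1 << 7
RISCV_HWPROBE_EXT_ZVBC = 1 << 18

# The previous, incorrect fallback values.
OLD_ZBC = 1 << 15
OLD_ZVBC = 1 << 20


def has_ext(ext_mask, ext_bit):
    """Return True if ext_bit is set in the extension mask."""
    return (ext_mask & ext_bit) != 0


if __name__ == "__main__":
    # Hypothetical mask from a CPU that supports both extensions.
    mask = RISCV_HWPROBE_EXT_ZBC | RISCV_HWPROBE_EXT_ZVBC
    print(hex(RISCV_HWPROBE_EXT_ZBC), hex(RISCV_HWPROBE_EXT_ZVBC))
    print(has_ext(mask, RISCV_HWPROBE_EXT_ZBC), has_ext(mask, OLD_ZBC))
```

With the old values, a mask that has both extensions set would still be reported as lacking them.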
Ronen Friedman [Mon, 23 Mar 2026 16:24:20 +0000 (16:24 +0000)]
Crimson/osd/run_bench(): make randomness follow Classic more closely
Direct gen() calls for randomness: Crimson uses dis(gen) % onum and
dis(gen) % (osize / bsize) to pick random object indices and
offsets, which limits the range to 0–255. Classic uses mt19937s
directly, allowing the full 32-bit range of randomness.
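The range limit is easy to see with a small experiment. This sketch models the narrow source as a value in 0..255 and the wide source as 32 random bits; Python's `random` module stands in for the C++ generators, and the function names and the object count are assumptions.

```python
import random


def pick_narrow(rng, onum):
    """Index from a source limited to 0..255; it can never exceed 255."""
    return rng.randrange(256) % onum


def pick_wide(rng, onum):
    """Index from a full 32-bit value; it can reach any index below onum."""
    return rng.getrandbits(32) % onum


if __name__ == "__main__":
    rng = random.Random(0)
    onum = 10000  # hypothetical number of benchmark objects
    narrow = [pick_narrow(rng, onum) for _ in range(1000)]
    wide = [pick_wide(rng, onum) for _ in range(1000)]
    print(max(narrow), max(wide))
```

With more than 256 objects, the narrow variant never touches the objects at index 256 and above.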
rbd: improve mirror image status and validation error messages
When a mirror image is left in a transitional state such as DISABLING,
the current mirror image status command reports:
$ rbd mirror image status test_pool/test_image1
rbd: mirroring not enabled on the image
This is the same message shown when mirroring is disabled or not yet
enabled, which can give the impression that mirroring is already
disabled.
Improve the validation logic and error messages to distinguish between
the DISABLED state and other non-enabled states, and include the image
name and current state in the output.
Examples:
When the image is completely disabled:
$ rbd mirror image status test_pool/test_image1
rbd: mirroring disabled on image 'test_image1'
When the image is in a transitional state (ex: DISABLING):
$ rbd mirror image status test_pool/test_image1
rbd: mirroring not enabled on image 'test_image1' (state: disabling)
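The distinction between the two messages can be sketched as a small function. The function name and the lowercase state strings are assumptions; only the message texts are taken from the examples above.

```python
def mirror_status_error(image_name, state):
    """Build the error text for an image whose mirroring is not enabled.

    Hypothetical helper: 'state' is the lowercase mirror image state name.
    """
    if state == "disabled":
        return f"rbd: mirroring disabled on image '{image_name}'"
    # Any other non-enabled state is reported together with its name.
    return f"rbd: mirroring not enabled on image '{image_name}' (state: {state})"


if __name__ == "__main__":
    print(mirror_status_error("test_image1", "disabled"))
    print(mirror_status_error("test_image1", "disabling"))
```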
John Mulligan [Mon, 9 Mar 2026 23:03:46 +0000 (19:03 -0400)]
mgr/cephadm: move ceph specific action checks to function
Move the next action check for core ceph type services to the
_ceph_service_needs_reconfig helper function. This is a private helper
that does not use choose_next_action because of the additional needs for
the last_config and monmap/extra conf that no other service needed to
care about. Moving the logic to a function shrinks the already-long
_check_daemons a bit and makes it possible to stop checking for
services that don't use choose_next_action in a future commit.
Plus cephadm always treats core ceph services a bit special anyway,
right? :-)
Signed-off-by: John Mulligan <jmulligan@redhat.com>
John Mulligan [Mon, 9 Mar 2026 21:14:50 +0000 (17:14 -0400)]
mgr/cephadm: add custom choose_next_action to ingress service
The haproxy component of the ingress service performs additional
checks to determine if the service needs to be redeployed in the
case it is fronting nfs and the placement has changed.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
John Mulligan [Mon, 9 Mar 2026 20:49:26 +0000 (16:49 -0400)]
mgr/cephadm: add custom choose_next_action to monitoring services
Like the previous commit, update the prometheus, node-exporter, and
alertmanager services to use choose_next_action and share the
logic of that function via next_action_for_mgmt_stack_service.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
John Mulligan [Mon, 9 Mar 2026 20:48:53 +0000 (16:48 -0400)]
mgr/cephadm: add custom choose_next_action to ceph exporter service
The ceph exporter service (and similar monitoring stack services)
need to detect if any dependencies in the mgmt stack support services have
changed and be redeployed if so.
Update the ceph exporter service to make use of a common function
for checking for this need. A common function will be used instead of
messing around with inheritance because I'm simply not brave enough
to look at doing that and I know a function provides common
implementation without side-effecting the class hierarchy.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
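The idea of sharing logic through a plain function instead of a base class can be sketched as follows. Every name and signature here is hypothetical; the real choose_next_action and its helper in mgr/cephadm are not shown in this log and likely take different arguments.

```python
def action_for_changed_deps(last_deps, current_deps):
    """Return 'redeploy' if the dependency set changed, else None.

    Hypothetical shared helper: the order of dependencies is ignored.
    """
    if sorted(last_deps) != sorted(current_deps):
        return "redeploy"
    return None


class ExporterService:
    """Stand-in for one service class; not the real cephadm class."""

    def choose_next_action(self, last_deps, current_deps):
        # Delegate to the shared function; no common base class is needed.
        return action_for_changed_deps(last_deps, current_deps)


class PrometheusService:
    """A second, unrelated stand-in class reusing the same function."""

    def choose_next_action(self, last_deps, current_deps):
        return action_for_changed_deps(last_deps, current_deps)


if __name__ == "__main__":
    print(ExporterService().choose_next_action(["mgr.a"], ["mgr.a", "mgr.b"]))
    print(PrometheusService().choose_next_action(["mgr.a"], ["mgr.a"]))
```

Both classes get the same behaviour while the class hierarchy stays untouched.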