Patrick Donnelly [Tue, 31 Mar 2026 13:10:08 +0000 (18:40 +0530)]
mon/MonClient: check stopping for auth request handling
When the MonClient is shutting down, it is no longer safe to
access MonClient::auth and other members. The AuthClient
methods should be checking the stopping flag in this case.
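The guard described above can be sketched as a Python toy model of the C++ pattern (all names here are illustrative stand-ins, not actual Ceph code): callbacks re-check a stopping flag under the lock before touching members that shutdown() may already have torn down.

```python
import threading

class AuthClientStub:
    """Toy model of the stopping-flag guard; not Ceph code."""

    def __init__(self):
        self._lock = threading.Lock()
        self._stopping = False
        self._auth = object()  # stands in for MonClient::auth

    def shutdown(self):
        with self._lock:
            self._stopping = True
            self._auth = None  # member access is unsafe past this point

    def handle_auth_done(self):
        with self._lock:
            if self._stopping:
                return -1      # bail out instead of dereferencing _auth
            assert self._auth is not None
            return 0           # safe: shutdown cannot race past the lock
```

A late messenger callback that arrives after shutdown() takes the early-return path instead of dereferencing freed members.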
The key bit from the segfault backtrace (thanks Brad Hubbard!) is here:
#13 0x00007f921ee23c40 in ProtocolV2::handle_auth_done (this=0x7f91cc0945f0, payload=...) at /usr/include/c++/12/bits/shared_ptr_base.h:1665
#14 0x00007f921ee16a29 in ProtocolV2::run_continuation (this=0x7f91cc0945f0, continuation=...) at msg/./src/msg/async/ProtocolV2.cc:54
#15 0x00007f921edee56e in std::function<void (char*, long)>::operator()(char*, long) const (__args#1=0, __args#0=<optimized out>, this=0x7f91cc0744d8) at /usr/include/c++/12/bits/std_function.h:591
#16 AsyncConnection::process (this=0x7f91cc074140) at msg/./src/msg/async/AsyncConnection.cc:485
#17 0x00007f921ee3300c in EventCenter::process_events (this=0x55efc9d0a058, timeout_microseconds=<optimized out>, working_dur=0x7f921a891d88) at msg/./src/msg/async/Event.cc:465
#18 0x00007f921ee38bf9 in operator() (__closure=<optimized out>) at msg/./src/msg/async/Stack.cc:50
#19 std::__invoke_impl<void, NetworkStack::add_thread(Worker*)::<lambda()>&> (__f=...) at /usr/include/c++/12/bits/invoke.h:61
#20 std::__invoke_r<void, NetworkStack::add_thread(Worker*)::<lambda()>&> (__fn=...) at /usr/include/c++/12/bits/invoke.h:111
#21 std::_Function_handler<void(), NetworkStack::add_thread(Worker*)::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...) at /usr/include/c++/12/bits/std_function.h:290
#22 0x00007f921e81f253 in std::execute_native_thread_routine (__p=0x55efc9e9c5f0) at ../../../../../src/libstdc++-v3/src/c++11/thread.cc:82
#23 0x00007f921f5e8ac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#24 0x00007f921f67a8d0 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
I originally thought this might be the issue causing [1]; however, that
turned out to be caused by OpenSSL's use of atexit handlers.
I still think there is a bug here, so I am continuing with this change.
[1] https://tracker.ceph.com/issues/59335
Fixes: https://tracker.ceph.com/issues/76017
Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
benhanokh [Mon, 30 Mar 2026 08:22:51 +0000 (11:22 +0300)]
rgw/dedup: extend split-head to objects that already have tail RADOS objects
This change extends the RGW dedup split-head feature to support objects that already have tail RADOS objects (i.e. objects larger than the head chunk size).
Previously, split-head was restricted to objects whose entire data fit in the head (≤4 MiB).
It also migrates the split-head manifest representation from the legacy explicit-objs format to the prefix+index rules-based format.
Refactored should_split_head(), which now performs a larger set of eligibility checks:
* d_split_head flag is set
* single-part object only
* non-empty head
* not a legacy manifest
* not an Alibaba Cloud OSS AppendObject
Explicit skips for unsupported manifest types:
* old-style explicit-objs manifests
* OSS AppendObject manifests (detected via a non-empty override_prefix)
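The eligibility checks above can be sketched as a single predicate (a hedged Python sketch; the field names are illustrative stand-ins, not the actual RGW manifest structures):

```python
from dataclasses import dataclass

@dataclass
class ManifestInfo:
    # All field names are illustrative, not the real RGW structures.
    split_head_flag: bool = True
    num_parts: int = 1
    head_size: int = 4096
    has_explicit_objs: bool = False   # legacy explicit-objs manifest
    override_prefix: str = ""         # non-empty => OSS AppendObject

def should_split_head(m: ManifestInfo) -> bool:
    """Sketch of the eligibility checks listed above."""
    return (m.split_head_flag          # d_split_head must be set
            and m.num_parts == 1       # single-part objects only
            and m.head_size > 0        # non-empty head
            and not m.has_explicit_objs
            and not m.override_prefix)
```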
New config option: rgw_dedup_split_obj_head:
Default is true (split-head enabled).
Setting to false disables split-head entirely.
Tail object lookup via manifest iterator:
Replaces the old get_tail_ioctx(), which manually constructed the tail OID via generate_split_head_tail_name().
The new function simply calls manifest.obj_begin() and resolves the first tail object's location through the standard manifest iterator.
Stats cleanup:
Removed the "Potential Dedup" stats section (small_objs_stat, dup_head_bytes, dup_head_bytes_estimate, ingress_skip_too_small_64KB*)
which tracked 64KB–4MB objects as potential-but-skipped candidates.
Since split-head now covers all sizes, this distinction is no longer meaningful. calc_deduped_bytes() is simplified accordingly.
Ilya Dryomov [Thu, 15 Jan 2026 12:56:13 +0000 (13:56 +0100)]
librbd: avoid losing sparseness in read_parent()
When read_parent() constructs a read for image_ctx->parent, it employs
a thick bufferlist (either re-using the bufferlist on the object extent
or creating a temporary one inside of C_ObjectReadMergedExtents). This
forgoes any sparseness: even if the result obtained by ObjectRequest is
sparse, it's thickened by ReadResult's handler for Bufferlist type.
This behavior is very old and hasn't been a problem for regular clones
because the public API returns a thick bufferlist in the case of C++ or
equivalent char* buf/struct iovec iov[] buffers in the case of C anyway.
ObjectCacher isn't sparse-aware but it's also not used for caching reads
by default and reading from parent for the purposes of a copyup is done
in CopyupRequest in a way that preserves sparseness. However, when it
comes to migration, source image reads go through read_parent() and the
destination image gets thickened as an inadvertent side effect.
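What "thickened" means here can be shown with a minimal sketch (plain Python, not librbd code): merging a sparse read result into one contiguous buffer turns every hole into explicit zeros, which is exactly the sparseness that is lost.

```python
def thicken(extents, length):
    """Merge a sparse read result (a list of (offset, data) extents)
    into one contiguous buffer of the given length. Holes between
    extents become explicit zero bytes: the sparseness is gone."""
    buf = bytearray(length)  # every hole starts as zeros
    for off, data in extents:
        buf[off:off + len(data)] = data
    return bytes(buf)
```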
Fix this by introducing a new ChildObject type for ReadResult whose
handler would plant the result obtained by parent's ObjectRequest into
child's ObjectRequest, as if read_parent() wasn't even called.
qa: fix misleading "in cluster log" failures during cluster log scan
Summary:
Fix misleading failure reasons reported as `"… in cluster log"` when
no such log entry actually exists.
The cephadm task currently treats `grep` errors from the cluster log
scan as if they were actual log matches. This can produce bogus
failure summaries when `ceph.log` is missing, especially after early
failures such as image pull or bootstrap problems.
Problem:
first_in_ceph_log() currently:
- returns stdout if a match is found
- otherwise returns stderr
The caller then treats any non-None value as a real cluster log hit and formats it as:
"<value>" in cluster log
That means an error like:
grep: /var/log/ceph/<fsid>/ceph.log: No such file or directory
can be misreported as if it came from the cluster log.
This change makes cluster log scanning robust and accurate by:
- checking whether /var/log/ceph/<fsid>/ceph.log exists before scanning
- using check_status=False for the grep pipeline
- treating only stdout as a real log match
- treating stderr as a scan error instead of log content
- avoiding overwrite of a more accurate pre-existing failure_reason
- reporting scan failures separately as cluster log scan failed
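The fixed decision logic can be sketched as a pure function (the function name and return shape are assumptions for illustration, not the actual cephadm task code): only stdout ever counts as a cluster-log hit, and stderr is surfaced as a scan error.

```python
def classify_scan(log_exists, stdout, stderr):
    """Sketch of the fixed logic. Returns (match, scan_error):
    only grep stdout is ever treated as a real log match."""
    if not log_exists:
        return None, "cluster log scan failed: ceph.log does not exist"
    if stdout.strip():
        return stdout.strip().splitlines()[0], None  # real match
    if stderr.strip():
        # grep noise such as 'No such file or directory' lands here
        return None, "cluster log scan failed: " + stderr.strip()
    return None, None  # clean scan, nothing found
```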
John Mulligan [Wed, 15 Apr 2026 21:15:03 +0000 (17:15 -0400)]
CODEOWNERS: add a build-sig group for various build / test files
Add a new build-sig group that covers some of the high level tools and
scripts used in the build and CI processes. This should help ensure that
such PRs do not slip by without notifying the people who care about these files.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
Nizamudeen A [Fri, 10 Apr 2026 06:25:27 +0000 (11:55 +0530)]
mgr/dashboard: table cell inline edit emit editing state
- Emit the editing state so that the consuming component can use it
  to add extra validations
- Replace the button with cds-icon-button
- Make the submit button tertiary instead of ghost for better visibility
- Add a cancel button to cancel the ongoing edit
Fixes: https://tracker.ceph.com/issues/75949
Signed-off-by: Nizamudeen A <nia@redhat.com>
crimson/osd: fix race between AllReplicasRecovered and DeferRecovery
Fixes a crash where the AllReplicasRecovered event arrives in the NotRecovering
state due to an async event delivery race with DeferRecovery preemption.
The issue occurs when:
1. Recovery completes and AllReplicasRecovered is queued asynchronously
2. A higher priority operation (e.g., client I/O) triggers AsyncReserver
to preempt recovery, posting DeferRecovery event
3. DeferRecovery is processed first, transitioning PG to NotRecovering
4. AllReplicasRecovered arrives at wrong state → crash with "bad state
machine event" because NotRecovering doesn't handle it
The fix follows Classic OSD's approach in PrimaryLogPG::start_recovery_ops():
clear PG_STATE_RECOVERING before posting recovery completion events. This
makes the existing safety check in PeeringState::Recovering::react() work:
when DeferRecovery arrives and sees !state_test(PG_STATE_RECOVERING), it
discards itself, preventing the state transition that would cause the crash.
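The ordering fix can be modeled with a small Python toy (not crimson code; names are illustrative): clearing the recovering flag before queueing the completion event is what lets a racing DeferRecovery discard itself.

```python
class RecoveryPG:
    """Toy model of the ordering fix described above."""

    def __init__(self):
        self.recovering = True   # stands in for PG_STATE_RECOVERING
        self.queue = []

    def complete_recovery(self):
        self.recovering = False  # clear the flag FIRST (the fix)
        self.queue.append("AllReplicasRecovered")

    def react_defer_recovery(self):
        # mirrors the safety check: if the flag is already cleared,
        # the preemption event discards itself instead of transitioning
        if not self.recovering:
            return "discarded"
        self.recovering = False
        return "transition-to-NotRecovering"
```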
Patrick Donnelly [Tue, 14 Apr 2026 15:46:54 +0000 (11:46 -0400)]
Merge PR #66294 into main
* refs/pull/66294/head:
qa: enforce centos9 for test
qa: rename distro
qa/suites/fs/bugs: use centos9 for squid upgrade test
qa: remove unused variables
qa: use centos9 for fs suites using k-testing
qa: update fs suite to rocky10
qa: skip dashboard install due to dependency noise
qa: only setup nat rules during bridge creation
qa: correct wording of comment
qa: use nft instead iptables
qa: use py3 builtin ipaddress module
Patrick Donnelly [Wed, 21 Jan 2026 17:25:31 +0000 (12:25 -0500)]
tools/cephfs: add new cephfs-tool
This patch introduces `cephfs-tool`, a new standalone C++ utility
designed to interact directly with `libcephfs`.
While the tool is architected to support various subcommands in the
future, the initial implementation focuses on a `bench` command to
measure library performance. This allows developers and administrators
to benchmark the userspace library isolated from FUSE or Kernel client
overheads.
Key features include:
* Multi-threaded Read/Write throughput benchmarking.
* Configurable block sizes, file counts, and fsync intervals.
* Detailed statistical reporting (Mean, Std Dev, Min/Max) for throughput and IOPS.
* Support for specific CephFS user/group impersonation (UID/GID) via `ceph_mount_perms_set`.
As an example test on a "trial" sepia machine against the new LRC, I
used a command like:
General Options:
-h [ --help ] Produce help message
-c [ --conf ] arg Ceph config file path
-i [ --id ] arg (=admin) Client ID
-k [ --keyring ] arg Path to keyring file
--filesystem arg CephFS filesystem name to mount
--uid arg (=-1) User ID to mount as
--gid arg (=-1) Group ID to mount as
Benchmark Options (used with 'bench' command):
--threads arg (=1) Number of threads
--iterations arg (=1) Number of iterations
--files arg (=100) Total number of files
--size arg (=4MB) File size (e.g. 4MB, 0 for creates only)
--block-size arg (=4MB) IO block size (e.g. 1MB)
--fsync-every arg (=0) Call fsync every N bytes
--prefix arg (=benchmark_) Filename prefix
--dir-prefix arg (=bench_run_) Directory prefix
--root-path arg (=/) Root path in CephFS
--per-thread-mount Use separate mount per thread
--no-cleanup Disable cleanup of files
AI-Assisted: significant portions of this code were AI-generated through dozens of iterative prompts.
Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
cephadm: wait for latest osd map after ceph-volume before OSD deploy
After ceph-volume creates an OSD, the mgr's cached osd map can lag
behind the monitors; get_osd_uuid_map() then misses the new OSD id,
and deploy_osd_daemons_for_existing_osds() skips deploying the
cephadm daemon, reporting a misleading "Created no osd(s)" even though
the OSD exists.
This behavior is often seen with raw devices ('lvm list' returns more quickly).
This also fixes get_osd_uuid_map(only_up=True), as the previous branch
never populated the map when 'only_up' was true.
Now we only include OSDs with 'up==1', so a newly created (but still down)
OSD is not treated as already present.
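The wait can be sketched as a simple polling loop (a hedged sketch; the function and parameter names are assumptions, not the actual cephadm code): keep refreshing the possibly-stale mgr view of the OSD map until the freshly created uuid appears, instead of deploying against a lagging map.

```python
import time

def wait_for_osd_uuid(fetch_uuid_map, osd_uuid, timeout=30.0, interval=1.0):
    """Poll fetch_uuid_map() until osd_uuid shows up or timeout expires.
    Illustrative sketch only; names are assumptions."""
    deadline = time.monotonic() + timeout
    while True:
        # include down OSDs: the new OSD has not booted yet
        if osd_uuid in fetch_uuid_map(only_up=False).values():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```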
Patrick Donnelly [Tue, 14 Apr 2026 00:47:43 +0000 (20:47 -0400)]
qa: rename distro
The kernel mount overrides for the distro have no effect if they are
applied before `supported-random-distro`.
Fixes:
2026-04-13T19:06:13.603 INFO:teuthology.task.pexec:sudo dnf remove nvme-cli -y
2026-04-13T19:06:13.603 INFO:teuthology.task.pexec:sudo dnf install nvmetcli nvme-cli -y
2026-04-13T19:06:13.626 INFO:teuthology.task.pexec:Running commands on host ubuntu@trial005.front.sepia.ceph.com
2026-04-13T19:06:13.627 INFO:teuthology.task.pexec:sudo dnf remove nvme-cli -y
2026-04-13T19:06:13.627 INFO:teuthology.task.pexec:sudo dnf install nvmetcli nvme-cli -y
2026-04-13T19:06:13.652 INFO:teuthology.orchestra.run.trial148.stderr:sudo: dnf: command not found
2026-04-13T19:06:13.653 DEBUG:teuthology.orchestra.run:got remote process result: 1
2026-04-13T19:06:13.654 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):
File "/home/teuthworker/src/git.ceph.com_teuthology_426ec63bc4a39bba882efb593125294667afc593/teuthology/run_tasks.py", line 105, in run_tasks
manager = run_one_task(taskname, ctx=ctx, config=config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/teuthworker/src/git.ceph.com_teuthology_426ec63bc4a39bba882efb593125294667afc593/teuthology/run_tasks.py", line 83, in run_one_task
return task(**kwargs)
^^^^^^^^^^^^^^
File "/home/teuthworker/src/git.ceph.com_teuthology_426ec63bc4a39bba882efb593125294667afc593/teuthology/task/pexec.py", line 149, in task
with parallel() as p:
File "/home/teuthworker/src/git.ceph.com_teuthology_426ec63bc4a39bba882efb593125294667afc593/teuthology/parallel.py", line 84, in __exit__
for result in self:
File "/home/teuthworker/src/git.ceph.com_teuthology_426ec63bc4a39bba882efb593125294667afc593/teuthology/parallel.py", line 98, in __next__
resurrect_traceback(result)
File "/home/teuthworker/src/git.ceph.com_teuthology_426ec63bc4a39bba882efb593125294667afc593/teuthology/parallel.py", line 30, in resurrect_traceback
raise exc.exc_info[1]
File "/home/teuthworker/src/git.ceph.com_teuthology_426ec63bc4a39bba882efb593125294667afc593/teuthology/parallel.py", line 23, in capture_traceback
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/teuthworker/src/git.ceph.com_teuthology_426ec63bc4a39bba882efb593125294667afc593/teuthology/task/pexec.py", line 62, in _exec_host
tor.wait([r])
File "/home/teuthworker/src/git.ceph.com_teuthology_426ec63bc4a39bba882efb593125294667afc593/teuthology/orchestra/run.py", line 485, in wait
proc.wait()
File "/home/teuthworker/src/git.ceph.com_teuthology_426ec63bc4a39bba882efb593125294667afc593/teuthology/orchestra/run.py", line 161, in wait
self._raise_for_status()
File "/home/teuthworker/src/git.ceph.com_teuthology_426ec63bc4a39bba882efb593125294667afc593/teuthology/orchestra/run.py", line 181, in _raise_for_status
raise CommandFailedError(
teuthology.exceptions.CommandFailedError: Command failed on trial148 with status 1: 'TESTDIR=/home/ubuntu/cephtest bash -s'
This happened because those dnf commands were pulled in from rocky10.yaml via the kclient overrides, while ubuntu_latest was selected as the random distro.
Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
Patrick Donnelly [Thu, 12 Feb 2026 15:36:29 +0000 (10:36 -0500)]
qa: use centos9 for fs suites using k-testing
A better approach would be to include centos9 OR rocky10 for
distribution choice. Then we can just filter out rocky10 when we're
testing the `testing` kernel but keep rocky10 coverage for other
testing.
Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
Patrick Donnelly [Wed, 19 Nov 2025 17:25:45 +0000 (12:25 -0500)]
qa: skip dashboard install due to dependency noise
2025-11-18T19:46:46.226 INFO:teuthology.orchestra.run.smithi008.stdout:/usr/bin/ceph: stderr Error ENOTSUP: Module 'alerts' is not enabled/loaded (required by command 'dashboard set-ssl-certificate'): use `ceph mgr module enable alerts` to enable it
Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
Gil Bregman [Mon, 13 Apr 2026 21:41:25 +0000 (00:41 +0300)]
mgr/dashboard: Add port and secure-listeners to subsystem add NVMeoF CLI command
Fixes: https://tracker.ceph.com/issues/75998
Signed-off-by: Gil Bregman <gbregman@il.ibm.com>
mgr/cephadm: fix nvmeof TLS handling and add coverage for ssl/mTLS
This change fixes the value of the `ssl` field on `NvmeofServiceSpec`
(it was always set to enable_auth) and adds unit tests to make sure
that specs with ssl only and specs with mTLS enabled (enable_auth)
both generate the expected daemon configuration.