Casey Bodley [Mon, 23 Mar 2026 14:34:38 +0000 (10:34 -0400)]
qa/rgw: don't duplicate 'user list' commands for default zone
some commands during setup expect the zone to exist already, so run
'radosgw-admin user list' to make sure a default zone/zonegroup are
created. avoid duplicating this in several subtasks by moving this to
its own subtask that runs when a realm is not configured
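The new subtask's behavior can be sketched as follows (a minimal Python sketch with hypothetical names; the real code is a teuthology task, and the runner callback here stands in for remote command execution):

```python
def maybe_init_default_zone(realm_configured, run_admin):
    """When no realm is configured, run `radosgw-admin user list` once so
    the default zone/zonegroup get created, instead of repeating the
    command in every subtask that needs the zone to exist."""
    if realm_configured:
        return False  # realm setup creates the zone itself
    run_admin(["radosgw-admin", "user", "list"])
    return True
```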
rgw: fix cloud tier multipart resume starting at part number 0
When resuming a cloud tier multipart upload, the part-size
calculation was inside the fresh-init block and never executed.
cur_part, num_parts, and part_size stayed at 0, causing the
remote endpoint to reject part number 0 as invalid.
Move the part-size calculation out of the init block so it
runs for both fresh and resumed uploads.
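The fixed control flow can be sketched like this (a hypothetical Python rendering; the real code is C++ in RGW's cloud-tier transfer path, and all names here are illustrative):

```python
def resume_multipart(total_size, max_part_size, resume_marker=None):
    """Sketch of the fixed control flow for a cloud tier multipart upload."""
    if resume_marker is None:
        # fresh upload: initialize state from scratch
        cur_part = 1
        state = "fresh"
    else:
        # resumed upload: continue from the recorded part
        cur_part = resume_marker + 1
        state = "resumed"

    # The part-size calculation now runs for BOTH paths. Before the fix
    # it sat inside the fresh-init branch, so a resumed upload carried
    # cur_part == 0 and part_size == 0, and the remote endpoint rejected
    # part number 0 as invalid.
    num_parts = -(-total_size // max_part_size)  # ceiling division
    part_size = min(max_part_size, total_size)
    return state, cur_part, num_parts, part_size
```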
Signed-off-by: Matthew N. Heler <matthew.heler@hotmail.com>
Matthew N. Heler [Thu, 26 Feb 2026 21:14:29 +0000 (15:14 -0600)]
rgw/cloud-restore: strip quotes from ETag on cloud tier fetch
When objects are restored from a cloud endpoint, the ETag value read
from the HTTP response includes surrounding double-quotes per RFC 7232.
RGW stores ETags unquoted internally, and dump_etag() adds its own
quotes when serving responses. The mismatch results in double-quoted
ETags like ""abc123-6"" on restored objects.
Strip the quotes from both the etag output parameter and the
RGW_ATTR_ETAG attribute after fetching from the cloud endpoint,
matching the unquoted format RGW uses everywhere else.
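The quote stripping amounts to the following (a Python sketch with a hypothetical helper name; the real code is C++ and applies this to both the etag output parameter and RGW_ATTR_ETAG):

```python
def strip_etag_quotes(etag: str) -> str:
    # RFC 7232 entity-tags arrive quoted on the wire, e.g. '"abc123-6"'.
    # RGW stores ETags unquoted and dump_etag() re-adds the quotes when
    # serving responses, so strip one surrounding pair after the fetch.
    if len(etag) >= 2 and etag[0] == '"' and etag[-1] == '"':
        return etag[1:-1]
    return etag
```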
Signed-off-by: Matthew N. Heler <matthew.heler@hotmail.com>
Kushal Deb [Mon, 22 Dec 2025 12:58:29 +0000 (18:28 +0530)]
mgr/cephadm: Implement D3N L1 persistent datacache support for RGW
Add RGW D3N L1 persistent datacache support backed by host block devices.
Select devices deterministically per (service, host) with intra-service
sharing, forbid cross-service reuse, prepare/mount devices, and
bind-mount per-daemon cache directories into the container.
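The selection policy can be sketched as follows (a minimal sketch, assuming a hash-based pick and an in-memory claims map; function and parameter names are hypothetical, not the cephadm API):

```python
import hashlib

def pick_cache_device(service: str, host: str, host_devices: list,
                      claims: dict) -> str:
    """Pick a D3N cache device for a (service, host) pair.

    `claims` maps device path -> owning service on this host. The pick is
    deterministic (hash of service + host), daemons of the same service on
    the same host share one device, and devices already claimed by another
    service are never reused.
    """
    free = [d for d in sorted(host_devices)
            if claims.get(d, service) == service]
    if not free:
        raise RuntimeError("no cache device available on " + host)
    idx = int(hashlib.sha256((service + ":" + host).encode()).hexdigest(), 16)
    dev = free[idx % len(free)]
    claims[dev] = service  # records both sharing and exclusivity
    return dev
```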
qa: fix setting rbd_sparse_read_threshold_bytes in test_migration_clone()
Currently it's set on the intermediary clone instead of the parent.
As a result the setting is effective only for reads that terminate at
the intermediary clone -- reads that go all the way to the parent may
end up being handled as not sparse depending on their size.
rgw/cloud-restore: validate restore Days before modifying state
AWS S3 requires Days to be a positive non-zero integer. Parse Days as
a signed integer and validate it in get_params() before any restore state
is modified, returning InvalidArgument for values less than 1.
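The validation reads like this (a Python sketch of the logic; the real code is C++ in get_params(), and the exception here stands in for returning the InvalidArgument error code):

```python
def parse_restore_days(raw: str) -> int:
    """Parse the Days field of an S3 RestoreObject request.

    Parse as a signed integer and validate before any restore state is
    touched; AWS S3 requires a positive non-zero integer, so anything
    non-numeric or below 1 maps to InvalidArgument.
    """
    try:
        days = int(raw)
    except ValueError:
        raise ValueError("InvalidArgument: Days must be an integer")
    if days < 1:
        raise ValueError("InvalidArgument: Days must be >= 1")
    return days
```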
Signed-off-by: Matthew N. Heler <matthew.heler@hotmail.com>
mgr/cephadm: fix NFS ganesha service registration via nodeid and register_service
The NFS Ganesha service was not consistently visible in `ceph -s`,
especially in multi-daemon deployments. This was due to missing or
incorrect service registration with the Ceph manager.
This change updates the ganesha.conf template to explicitly set a unique
nodeid and to enable register_service.
Key points:
- Enables proper service registration in mgr via register_service
- Ensures unique nodeid per daemon using namespace + nodeid
- Fixes visibility of NFS daemons in `ceph -s`
- Works correctly for both single and multi-node deployments
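The rendered template would look roughly like this (a hedged sketch; the option names nodeid and register_service are taken from this commit's description, and the exact block syntax may differ across Ganesha versions):

```
RADOS_KV {
    # unique per daemon: namespace + nodeid
    nodeid = "{{ namespace }}.{{ name }}";
}

CEPH {
    # same nodeid, plus explicit registration with the ceph-mgr
    # service map so the daemon shows up in `ceph -s`
    nodeid = "{{ namespace }}.{{ name }}";
    register_service = true;
}
```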
Validation:
- Verified with single NFS daemon → visible in `ceph -s`
- Verified with 3 NFS daemons → all correctly aggregated and visible
- Confirmed export creation activates service visibility
- Tested using Ceph 9.1 (Ganesha 9.7)
Shubha Jain [Wed, 25 Mar 2026 14:42:26 +0000 (20:12 +0530)]
mgr/cephadm: add nodeid and register_service for NFS Ganesha service visibility
- Add 'name' to template context in nfs.py
- Use consistent nodeid in RADOS_KV and CEPH blocks
- Enable register_service in CEPH block for service map visibility
Note: CEPH block is ignored in Ganesha 5.9 (seen as 'Unknown block (CEPH)' in logs),
so ceph -s visibility cannot be validated upstream. This change is forward-compatible
with newer Ganesha versions.
Patrick Donnelly [Tue, 31 Mar 2026 13:10:08 +0000 (18:40 +0530)]
mon/MonClient: check stopping for auth request handling
When the MonClient is shutting down, it is no longer safe to
access MonClient::auth and other members. The AuthClient
methods should be checking the stopping flag in this case.
The key bit from the segfault backtrace (thanks Brad Hubbard!) is here:
#13 0x00007f921ee23c40 in ProtocolV2::handle_auth_done (this=0x7f91cc0945f0, payload=...) at /usr/include/c++/12/bits/shared_ptr_base.h:1665
#14 0x00007f921ee16a29 in ProtocolV2::run_continuation (this=0x7f91cc0945f0, continuation=...) at msg/./src/msg/async/ProtocolV2.cc:54
#15 0x00007f921edee56e in std::function<void (char*, long)>::operator()(char*, long) const (__args#1=0, __args#0=<optimized out>, this=0x7f91cc0744d8) at /usr/include/c++/12/bits/std_function.h:591
#16 AsyncConnection::process (this=0x7f91cc074140) at msg/./src/msg/async/AsyncConnection.cc:485
#17 0x00007f921ee3300c in EventCenter::process_events (this=0x55efc9d0a058, timeout_microseconds=<optimized out>, working_dur=0x7f921a891d88) at msg/./src/msg/async/Event.cc:465
#18 0x00007f921ee38bf9 in operator() (__closure=<optimized out>) at msg/./src/msg/async/Stack.cc:50
#19 std::__invoke_impl<void, NetworkStack::add_thread(Worker*)::<lambda()>&> (__f=...) at /usr/include/c++/12/bits/invoke.h:61
#20 std::__invoke_r<void, NetworkStack::add_thread(Worker*)::<lambda()>&> (__fn=...) at /usr/include/c++/12/bits/invoke.h:111
#21 std::_Function_handler<void(), NetworkStack::add_thread(Worker*)::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...) at /usr/include/c++/12/bits/std_function.h:290
#22 0x00007f921e81f253 in std::execute_native_thread_routine (__p=0x55efc9e9c5f0) at ../../../../../src/libstdc++-v3/src/c++11/thread.cc:82
#23 0x00007f921f5e8ac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#24 0x00007f921f67a8d0 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
I originally thought this may be the issue causing [1] however that
turned out to be an issue caused by OpenSSL's use of atexit handlers.
I still think there is a bug here so I am continuing with this change.
[1] https://tracker.ceph.com/issues/59335
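The guard can be sketched as follows (a hypothetical Python rendering of the idea; the real fix is in the C++ MonClient, where the AuthClient method implementations check the stopping flag under the monc lock):

```python
import threading

class MonClientSketch:
    """Sketch: ignore late auth events once shutdown has begun, since
    MonClient::auth and other members are no longer safe to access."""
    def __init__(self):
        self._lock = threading.Lock()
        self.stopping = False
        self.auth = object()   # stands in for the real auth state

    def shutdown(self):
        with self._lock:
            self.stopping = True
            self.auth = None   # torn down; unsafe to touch afterwards

    def handle_auth_done(self):
        with self._lock:
            if self.stopping:
                return -107    # reject the late event (e.g. -ENOTCONN)
            # safe: self.auth is still valid while the lock is held
            assert self.auth is not None
            return 0
```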
Fixes: https://tracker.ceph.com/issues/76017
Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
benhanokh [Mon, 30 Mar 2026 08:22:51 +0000 (11:22 +0300)]
rgw/dedup: extend split-head to objects with tail RADOS objects
This extends the RGW dedup split-head feature to support objects that
already have tail RADOS objects (i.e. objects larger than the head chunk
size). Previously, split-head was restricted to objects whose entire data
fit in the head (≤4 MiB).
It also migrates the split-head manifest representation from the legacy
explicit-objs format to the prefix+index rules-based format.
Refactored should_split_head():
Now performs a larger set of eligibility checks:
- d_split_head flag is set
- single-part object only
- non-empty head
- not a legacy manifest
- not an Alibaba Cloud OSS AppendObject
Explicit skips for unsupported manifest types:
- old-style explicit-objs manifests
- OSS AppendObject manifests (detected via non-empty override_prefix)
New config option rgw_dedup_split_obj_head:
- default is true (split-head enabled)
- setting it to false disables split-head entirely
Tail object lookup via manifest iterator:
Replaces the old get_tail_ioctx(), which manually constructed the tail OID
via generate_split_head_tail_name(). The new function simply calls
manifest.obj_begin() and resolves the first tail object location through
the standard manifest iterator.
Stats cleanup:
Removed the "Potential Dedup" stats section (small_objs_stat,
dup_head_bytes, dup_head_bytes_estimate, ingress_skip_too_small_64KB*),
which tracked 64KB-4MB objects as potential-but-skipped candidates.
Since split-head now covers all sizes, this distinction is no longer
meaningful; calc_deduped_bytes() is simplified accordingly.
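The eligibility checks above can be sketched as one predicate (a Python sketch; parameter names are hypothetical stand-ins for the C++ manifest API):

```python
def should_split_head(split_head_enabled, num_parts, head_size,
                      is_legacy_manifest, override_prefix):
    """Sketch of the refactored split-head eligibility test."""
    if not split_head_enabled:      # rgw_dedup_split_obj_head = false
        return False
    if num_parts > 1:               # single-part objects only
        return False
    if head_size == 0:              # non-empty head required
        return False
    if is_legacy_manifest:          # old-style explicit-objs manifest
        return False
    if override_prefix:             # OSS AppendObject marker
        return False
    return True
```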
Ilya Dryomov [Thu, 15 Jan 2026 12:56:13 +0000 (13:56 +0100)]
librbd: avoid losing sparseness in read_parent()
When read_parent() constructs a read for image_ctx->parent, it employs
a thick bufferlist (either re-using the bufferlist on the object extent
or creating a temporary one inside of C_ObjectReadMergedExtents). This
forgoes any sparseness: even if the result obtained by ObjectRequest is
sparse, it's thickened by ReadResult's handler for Bufferlist type.
This behavior is very old and hasn't been a problem for regular clones
because the public API returns a thick bufferlist in the case of C++ or
equivalent char* buf/struct iovec iov[] buffers in the case of C anyway.
ObjectCacher isn't sparse-aware but it's also not used for caching reads
by default and reading from parent for the purposes of a copyup is done
in CopyupRequest in a way that preserves sparseness. However, when it
comes to migration, source image reads go through read_parent() and the
destination image gets thickened as an inadvertent side effect.
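What "thickening" means here can be illustrated in miniature (a hypothetical Python sketch, not librbd code): a sparse result is a set of (offset, data) extents, and the Bufferlist handler collapses it into one zero-filled buffer, losing the hole information.

```python
def thicken(extents, length):
    """Collapse a sparse read result (list of (offset, data) extents)
    into a single thick buffer, zero-filling the holes. This is the
    lossy step that read_parent() previously forced on migration reads."""
    buf = bytearray(length)
    for off, data in extents:
        buf[off:off + len(data)] = data
    return bytes(buf)
```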
Fix this by introducing a new ChildObject type for ReadResult whose
handler would plant the result obtained by parent's ObjectRequest into
child's ObjectRequest, as if read_parent() wasn't even called.
qa: fix misleading "in cluster log" failures during cluster log scan
Summary:
Fix misleading failure reasons reported as `"… in cluster log"` when
no such log entry actually exists.
The cephadm task currently treats `grep` errors from the cluster log
scan as if they were actual log matches. This can produce bogus
failure summaries when `ceph.log` is missing, especially after early
failures such as image pull or bootstrap problems.
Problem:
first_in_ceph_log() currently:
- returns stdout if a match is found
- otherwise returns stderr
The caller then treats any non-None value as a real cluster log hit and formats it as:
"<value>" in cluster log
That means an error like:
grep: /var/log/ceph/<fsid>/ceph.log: No such file or directory
can be misreported as if it came from the cluster log.
This change makes cluster log scanning robust and accurate by:
- checking whether /var/log/ceph/<fsid>/ceph.log exists before scanning
- using check_status=False for the grep pipeline
- treating only stdout as a real log match
- treating stderr as a scan error instead of log content
- avoiding overwrite of a more accurate pre-existing failure_reason
- reporting scan failures separately as cluster log scan failed
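The fixed scan logic can be sketched like this (a simplified Python sketch: run_grep stands in for the remote grep pipeline run with check_status=False, and log_exists for the ceph.log existence check):

```python
def first_in_ceph_log(run_grep, log_exists):
    """Sketch of the fixed cluster-log scan.

    run_grep() returns (stdout, stderr). Only stdout counts as a real
    cluster log match; stderr is treated as a scan error, never as log
    content, so grep failures can no longer masquerade as log hits.
    """
    if not log_exists:
        return None, "cluster log scan failed: ceph.log missing"
    stdout, stderr = run_grep()
    if stdout:
        return stdout.strip(), None      # genuine cluster log hit
    if stderr:
        return None, "cluster log scan failed: " + stderr.strip()
    return None, None                    # no match, no error
```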
John Mulligan [Wed, 15 Apr 2026 21:15:03 +0000 (17:15 -0400)]
CODEOWNERS: add a build-sig group for various build / test files
Add a new build-sig group that covers some of the high level tools and
scripts used in the build and CI processes. This should help PRs not
pass by without notifying people who care about these things.
Signed-off-by: John Mulligan <jmulligan@redhat.com>