Casey Bodley [Mon, 23 Mar 2026 14:34:38 +0000 (10:34 -0400)]
qa/rgw: don't duplicate 'user list' commands for default zone
some commands during setup expect the zone to exist already, so run
'radosgw-admin user list' to make sure a default zone/zonegroup are
created. avoid duplicating this in several subtasks by moving it into
its own subtask that runs when a realm is not configured
rgw: fix cloud tier multipart resume starting at part number 0
When resuming a cloud tier multipart upload, the part-size
calculation was inside the fresh-init block and never executed.
cur_part, num_parts, and part_size stayed at 0, causing the
remote endpoint to reject part number 0 as invalid.
Move the part-size calculation out of the init block so it
runs for both fresh and resumed uploads.
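The before/after behavior can be sketched in Python (a hedged illustration; the names are stand-ins for the actual RGW C++ identifiers, not the real code):

```python
def multipart_state(total_size, max_part_size, parts_done, fresh_init):
    # Hypothetical sketch of the fixed control flow.
    if fresh_init:
        pass  # fresh-init-only setup (e.g. allocating upload state) goes here
    # Part-size calculation moved OUT of the init block so it also runs
    # on resume; otherwise cur_part/num_parts/part_size stay at 0 and
    # the remote endpoint rejects part number 0 as invalid.
    num_parts = -(-total_size // max_part_size)  # ceiling division
    part_size = min(max_part_size, total_size)
    cur_part = parts_done + 1                    # S3 part numbers are 1-based
    return cur_part, num_parts, part_size
```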
Signed-off-by: Matthew N. Heler <matthew.heler@hotmail.com>
Matthew N. Heler [Thu, 26 Feb 2026 21:14:29 +0000 (15:14 -0600)]
rgw/cloud-restore: strip quotes from ETag on cloud tier fetch
When objects are restored from a cloud endpoint, the ETag value read
from the HTTP response includes surrounding double-quotes per RFC 7232.
RGW stores ETags unquoted internally, and dump_etag() adds its own
quotes when serving responses. The mismatch results in double-quoted
ETags like ""abc123-6"" on restored objects.
Strip the quotes from both the etag output parameter and the
RGW_ATTR_ETAG attribute after fetching from the cloud endpoint,
matching the unquoted format RGW uses everywhere else.
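The normalization amounts to the following (a Python sketch of the idea; the actual change is in RGW's C++ cloud-restore path):

```python
def strip_etag_quotes(etag: str) -> str:
    # The HTTP response carries the ETag quoted per RFC 7232,
    # e.g. '"abc123-6"'. RGW stores ETags unquoted and dump_etag()
    # re-adds quotes when serving, so strip them once on fetch.
    return etag.strip('"')
```

The same stripped value is applied to both the etag output parameter and the RGW_ATTR_ETAG attribute.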
Signed-off-by: Matthew N. Heler <matthew.heler@hotmail.com>
Kushal Deb [Mon, 22 Dec 2025 12:58:29 +0000 (18:28 +0530)]
mgr/cephadm: Implement D3N L1 persistent datacache support for RGW
Add RGW D3N L1 persistent datacache support backed by host block devices.
Select devices deterministically per (service, host) with intra-service
sharing, forbid cross-service reuse, prepare/mount devices, and
bind-mount per-daemon cache directories into the container.
qa: fix setting rbd_sparse_read_threshold_bytes in test_migration_clone()
Currently it's set on the intermediary clone instead of the parent.
As a result the setting is effective only for reads that terminate at
the intermediary clone -- reads that go all the way to the parent may
end up being handled as not sparse depending on their size.
AWS S3 requires Days to be a positive non-zero integer. Parse Days as
a signed integer and validate in get_params() before any restore state
is modified, returning InvalidArgument for values less than 1.
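The validation logic, sketched in Python (hypothetical names; the real check lives in RGW's C++ get_params()):

```python
class InvalidArgument(ValueError):
    """Stands in for the S3 InvalidArgument error response."""

def parse_restore_days(raw) -> int:
    # Parse as a signed integer first so negative values are seen as
    # negative, then validate before any restore state is touched.
    try:
        days = int(raw)
    except (TypeError, ValueError):
        raise InvalidArgument("Days must be an integer")
    if days < 1:
        raise InvalidArgument("Days must be a positive non-zero integer")
    return days
```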
Signed-off-by: Matthew N. Heler <matthew.heler@hotmail.com>
mgr/cephadm: fix NFS ganesha service registration via nodeid and register_service
The NFS Ganesha service was not consistently visible in `ceph -s`,
especially in multi-daemon deployments. This was due to missing or
incorrect service registration with the Ceph manager.
This change updates the ganesha.conf template to explicitly set nodeid and register_service. Key points:
- Enables proper service registration in mgr via register_service
- Ensures unique nodeid per daemon using namespace + nodeid
- Fixes visibility of NFS daemons in `ceph -s`
- Works correctly for both single and multi-node deployments
Validation:
- Verified with single NFS daemon → visible in `ceph -s`
- Verified with 3 NFS daemons → all correctly aggregated and visible
- Confirmed export creation activates service visibility
- Tested using Ceph 9.1 (Ganesha 9.7)
Shubha Jain [Wed, 25 Mar 2026 14:42:26 +0000 (20:12 +0530)]
mgr/cephadm: add nodeid and register_service for NFS Ganesha service visibility
- Add 'name' to template context in nfs.py
- Use consistent nodeid in RADOS_KV and CEPH blocks
- Enable register_service in CEPH block for service map visibility
Note: CEPH block is ignored in Ganesha 5.9 (seen as 'Unknown block (CEPH)' in logs),
so ceph -s visibility cannot be validated upstream. This change is forward-compatible
with newer Ganesha versions.
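The intended shape of the rendered template might look like the following (a hedged sketch: the block and option names are taken from this commit's description, and exact syntax may vary across Ganesha versions):

```conf
# Illustrative ganesha.conf fragment, not the verbatim cephadm template.
RADOS_KV {
    ...
    nodeid = "<namespace>-<name>";   # unique per daemon: namespace + nodeid
}
CEPH {
    ...
    register_service = true;         # register with the mgr service map
}
```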
Patrick Donnelly [Tue, 31 Mar 2026 13:10:08 +0000 (18:40 +0530)]
mon/MonClient: check stopping for auth request handling
When the MonClient is shutting down, it is no longer safe to
access MonClient::auth and other members. The AuthClient
methods should be checking the stopping flag in this case.
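The shape of the fix can be sketched in Python (a hedged analogue; the real code is C++ in MonClient, and the names here are illustrative):

```python
import threading

class MonClientLike:
    # The point: auth callbacks must check a stopping flag before
    # touching members that shutdown may have torn down.
    def __init__(self):
        self.stopping = threading.Event()
        self.auth = object()  # stands in for MonClient::auth

    def shutdown(self):
        self.stopping.set()
        self.auth = None

    def handle_auth_done(self, payload):
        if self.stopping.is_set():
            return -1  # bail out early; self.auth is no longer safe to use
        # ... normal completion would dereference self.auth here ...
        return 0
```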
The key bit from the segfault backtrace (thanks Brad Hubbard!) is here:
#13 0x00007f921ee23c40 in ProtocolV2::handle_auth_done (this=0x7f91cc0945f0, payload=...) at /usr/include/c++/12/bits/shared_ptr_base.h:1665
#14 0x00007f921ee16a29 in ProtocolV2::run_continuation (this=0x7f91cc0945f0, continuation=...) at msg/./src/msg/async/ProtocolV2.cc:54
#15 0x00007f921edee56e in std::function<void (char*, long)>::operator()(char*, long) const (__args#1=0, __args#0=<optimized out>, this=0x7f91cc0744d8) at /usr/include/c++/12/bits/std_function.h:591
#16 AsyncConnection::process (this=0x7f91cc074140) at msg/./src/msg/async/AsyncConnection.cc:485
#17 0x00007f921ee3300c in EventCenter::process_events (this=0x55efc9d0a058, timeout_microseconds=<optimized out>, working_dur=0x7f921a891d88) at msg/./src/msg/async/Event.cc:465
#18 0x00007f921ee38bf9 in operator() (__closure=<optimized out>) at msg/./src/msg/async/Stack.cc:50
#19 std::__invoke_impl<void, NetworkStack::add_thread(Worker*)::<lambda()>&> (__f=...) at /usr/include/c++/12/bits/invoke.h:61
#20 std::__invoke_r<void, NetworkStack::add_thread(Worker*)::<lambda()>&> (__fn=...) at /usr/include/c++/12/bits/invoke.h:111
#21 std::_Function_handler<void(), NetworkStack::add_thread(Worker*)::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...) at /usr/include/c++/12/bits/std_function.h:290
#22 0x00007f921e81f253 in std::execute_native_thread_routine (__p=0x55efc9e9c5f0) at ../../../../../src/libstdc++-v3/src/c++11/thread.cc:82
#23 0x00007f921f5e8ac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#24 0x00007f921f67a8d0 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
I originally thought this may be the issue causing [1] however that
turned out to be an issue caused by OpenSSL's use of atexit handlers.
I still think there is a bug here so I am continuing with this change.
[1] https://tracker.ceph.com/issues/59335
Fixes: https://tracker.ceph.com/issues/76017
Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
benhanokh [Mon, 30 Mar 2026 08:22:51 +0000 (11:22 +0300)]
rgw/dedup: extend split-head to objects that already have tail RADOS objects
This PR extends the RGW dedup split-head feature to support objects that already have tail RADOS objects (i.e. objects larger than the head chunk size).
Previously, split-head was restricted to objects whose entire data fit in the head (≤4 MiB).
It also migrates the split-head manifest representation from the legacy explicit-objs format to the prefix+index rules-based format.
Refactored should_split_head():
Now performs a larger set of eligibility checks:
* d_split_head flag is set
* single-part object only
* non-empty head
* not a legacy manifest
* not an Alibaba Cloud OSS AppendObject
Explicit skips for unsupported manifest types:
* old-style explicit-objs manifests
* OSS AppendObject manifests (detected via non-empty override_prefix)
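The eligibility checks above can be restated as a predicate (a hedged Python restatement; the real should_split_head() inspects RGW manifest structures in C++):

```python
def should_split_head(d_split_head, num_parts, head_len,
                      has_explicit_objs, override_prefix):
    # Returns True only when every eligibility check passes.
    return (d_split_head                # d_split_head flag is set
            and num_parts == 1          # single-part object only
            and head_len > 0            # non-empty head
            and not has_explicit_objs   # legacy explicit-objs manifest unsupported
            and not override_prefix)    # OSS AppendObject: non-empty override_prefix
```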
New config option: rgw_dedup_split_obj_head:
Default is true (split-head enabled).
Setting to false disables split-head entirely.
Tail object lookup via manifest iterator:
Replaces the old get_tail_ioctx() which manually constructed the tail OID via generate_split_head_tail_name().
The new function simply calls manifest.obj_begin() and resolves the first tail object location through the standard manifest iterator.
Stats cleanup:
Removed the "Potential Dedup" stats section (small_objs_stat, dup_head_bytes, dup_head_bytes_estimate, ingress_skip_too_small_64KB*)
which tracked 64KB–4MB objects as potential-but-skipped candidates.
Since split-head now covers all sizes, this distinction is no longer meaningful. calc_deduped_bytes() is simplified accordingly.
Ilya Dryomov [Thu, 15 Jan 2026 12:56:13 +0000 (13:56 +0100)]
librbd: avoid losing sparseness in read_parent()
When read_parent() constructs a read for image_ctx->parent, it employs
a thick bufferlist (either re-using the bufferlist on the object extent
or creating a temporary one inside of C_ObjectReadMergedExtents). This
forgoes any sparseness: even if the result obtained by ObjectRequest is
sparse, it's thickened by ReadResult's handler for Bufferlist type.
This behavior is very old and hasn't been a problem for regular clones
because the public API returns a thick bufferlist in the case of C++ or
equivalent char* buf/struct iovec iov[] buffers in the case of C anyway.
ObjectCacher isn't sparse-aware but it's also not used for caching reads
by default and reading from parent for the purposes of a copyup is done
in CopyupRequest in a way that preserves sparseness. However, when it
comes to migration, source image reads go through read_parent() and the
destination image gets thickened as an inadvertent side effect.
Fix this by introducing a new ChildObject type for ReadResult whose
handler would plant the result obtained by parent's ObjectRequest into
child's ObjectRequest, as if read_parent() wasn't even called.
qa: fix misleading "in cluster log" failures during cluster log scan
Summary:
Fix misleading failure reasons reported as `"… in cluster log"` when
no such log entry actually exists.
The cephadm task currently treats `grep` errors from the cluster log
scan as if they were actual log matches. This can produce bogus
failure summaries when `ceph.log` is missing, especially after early
failures such as image pull or bootstrap problems.
Problem:
first_in_ceph_log() currently:
- returns stdout if a match is found
- otherwise returns stderr
The caller then treats any non-None value as a real cluster log hit and formats it as:
"<value>" in cluster log
That means an error like:
grep: /var/log/ceph/<fsid>/ceph.log: No such file or directory
can be misreported as if it came from the cluster log.
This change makes cluster log scanning robust and accurate by:
- checking whether /var/log/ceph/<fsid>/ceph.log exists before scanning
- using check_status=False for the grep pipeline
- treating only stdout as a real log match
- treating stderr as a scan error instead of log content
- avoiding overwrite of a more accurate pre-existing failure_reason
- reporting scan failures separately as `cluster log scan failed`
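The core of the fixed classification can be sketched as a pure function (a hedged Python sketch, not the actual teuthology helper):

```python
def classify_log_scan(stdout: str, stderr: str):
    # Only stdout counts as a real cluster log match; stderr (e.g.
    # "grep: /var/log/ceph/<fsid>/ceph.log: No such file or directory")
    # means the scan itself failed and must not be reported as log content.
    if stdout.strip():
        return ("match", stdout.strip().splitlines()[0])
    if stderr.strip():
        return ("scan-error", stderr.strip())
    return ("clean", None)
```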