rgw/logging: add error message when log_record fails
when log_record fails in journal mode due to issues in the target
bucket, the result code that the client gets will be confusing, since
there is no indication that the issue is with the target bucket and not
the source bucket on which the client was operating.
the HTTP error message will be used to convey this information.
rgw/restore: Mark the restore entry status as `None` first time
While adding the restore entry to the FIFO, mark its status as `None`
so that restore thread knows that the entry is being processed for
the first time. In case the restore is still in progress and the entry
needs to be re-added to the queue, its status then will be marked
`InProgress`.
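The status handling described above can be sketched roughly as follows. This is a minimal illustration only: `RestoreStatus`, `RestoreEntry`, `process_once` and the `restore_done` callback are hypothetical names, and a `deque` stands in for the restore FIFO.

```python
from collections import deque
from dataclasses import dataclass
from enum import Enum


class RestoreStatus(Enum):
    NONE = "None"                # entry has not been processed yet
    IN_PROGRESS = "InProgress"   # entry was processed before and re-queued


@dataclass
class RestoreEntry:
    obj_key: str
    # New entries start as NONE so the worker can tell a first pass apart.
    status: RestoreStatus = RestoreStatus.NONE


def process_once(queue, restore_done):
    """Pop one entry; re-queue it as IN_PROGRESS if the restore is unfinished.

    Returns True when the entry was seen for the first time.
    """
    entry = queue.popleft()
    first_time = entry.status is RestoreStatus.NONE
    if not restore_done(entry.obj_key):
        entry.status = RestoreStatus.IN_PROGRESS
        queue.append(entry)
    return first_time
```

A worker loop would call `process_once()` repeatedly and could use the return value to decide whether the remote restore request still needs to be issued.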
Soumya Koduri [Sun, 10 Aug 2025 12:13:11 +0000 (17:43 +0530)]
rgw/restore: Persistently store the restore state for cloud-s3 tier
In order to resume IN_PROGRESS restore operations post RGW service
restarts, store the entries of the objects being restored from `cloud-s3`
tier persistently. This is already being done for `cloud-s3-glacier`
tier and now the same will be applied to `cloud-s3` tier too.
With this change, when `restore-object` is performed on any object,
it will be marked RESTORE_ALREADY_IN_PROGRESS and added to a restore FIFO queue.
This queue is later processed by Restore worker thread which will try to
fetch the objects from Cloud or Glacier/Tape S3 services. Hence all the
restore operations are now handled asynchronously (for both `cloud-s3`,
`cloud-s3-glacier` tiers).
Matt Benjamin [Thu, 11 Sep 2025 20:42:03 +0000 (16:42 -0400)]
rgw_cksum: return ChecksumAlgorithm and ChecksumType in ListParts
An uncompleted multipart upload's checksum algorithm and type can
be deduced from the upload object. Also the ChecksumType element
was being omitted in the completed case.
rgw/restore: Update expiry-date of restored copies
As per AWS spec (https://docs.aws.amazon.com/AmazonS3/latest/API/API_RestoreObject.html),
if a `restore-object` request is re-issued on an already restored copy, the server needs to
update the restoration period relative to the current time. These changes handle that.
Note: this applies only to temporary restored copies
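As a rough sketch of the rule above (not the RGW code): the new expiry is computed from the current time and the requested number of days, ignoring the previous expiry. `refreshed_expiry` and its parameters are hypothetical names, and any rounding of the expiry time is left out.

```python
from datetime import datetime, timedelta, timezone


def refreshed_expiry(current_expiry, now, days, temporary=True):
    """Return the expiry after a repeated restore-object request."""
    if not temporary:
        # Only temporary restored copies get their expiry date updated.
        return current_expiry
    # The restoration period is counted from the current time,
    # not from the old expiry date.
    return now + timedelta(days=days)
```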
cloud restore : add None type for cloud-s3-glacier
AWS supports various glacier conf options such as Standard, Expedited
to restore an object within a time period. These options may not be supported in
other S3 servers. So introducing option NoTier, so other vendors can be supported.
Signed-off-by: Harsimran Singh <hsthukral51@gmail.com>
(cherry picked from commit b588fd05c7d82b52fc8fa3742976a9a45c3755b4) Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
Signed-off-by: Ali Masarwa <ali.saed.masarwa@gmail.com> Signed-off-by: Ali Masarwa <amasarwa@redhat.com>
(cherry picked from commit 47166556c5bbcf1f26621bf24cf04221b65af366)
Signed-off-by: Oguzhan Ozmen <oozmen@bloomberg.net>
(cherry picked from commit 9bb170104446bfea0ad87b34244f3a3d47962fcc) Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
Co-authored-by: Yuval Lifshitz <yuvalif@yahoo.com> Signed-off-by: Mark Kogan <31659604+mkogan1@users.noreply.github.com>
(cherry picked from commit 965eda7a45b12c9ccd78f230076002043f7df65c) Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
Marcus Watts [Wed, 28 Aug 2024 21:21:13 +0000 (17:21 -0400)]
rgw/storage class. Don't inherit storage class for copy object.
When an object is copied, it should only be depending on data
in the request to determine the storage class, and if it is
not specified, it should default to 'STANDARD'. In radosgw,
this means that this is another attribute (similar to encryption)
that should not be merged from the source object.
Fixes: https://tracker.ceph.com/issues/67787 Signed-off-by: Marcus Watts <mwatts@redhat.com>
(cherry picked from commit a0e60bda70d4af93aa545a3fdea46eb9e68088c4)
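The rule described in this commit can be illustrated with a small sketch. The function name and the dict-based arguments are assumptions for illustration; only the `x-amz-storage-class` request header and the 'STANDARD' default come from the S3 behaviour described above.

```python
def storage_class_for_copy(request_headers, source_attrs):
    """Pick the storage class of the destination object of a copy."""
    # source_attrs is deliberately ignored: the storage class of the
    # source object must not be merged into the destination.
    del source_attrs
    return request_headers.get("x-amz-storage-class", "STANDARD")
```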
Marcus Watts [Wed, 28 Aug 2024 15:42:05 +0000 (11:42 -0400)]
rgw/storage class: don't store/report STANDARD storage class.
While 'STANDARD' is a valid storage class, it is not supposed
to ever be returned when fetching an object. This change suppresses
storing 'STANDARD' as the attribute value, so that objects
explicitly created with 'STANDARD' will in fact be indistinguishable
from those where it was implicitly set.
Fixes: https://tracker.ceph.com/issues/67786 Signed-off-by: Marcus Watts <mwatts@redhat.com>
(cherry picked from commit b95e743ab9374cd3463a29c5f719ffce1c9fb28a)
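A minimal sketch of the suppression described above, assuming a hypothetical helper that decides which attribute value (if any) gets persisted:

```python
def storage_class_attr_to_store(requested):
    """Return the attribute value to persist, or None to store nothing."""
    # An explicit 'STANDARD' is treated exactly like an unset storage
    # class, so both kinds of objects look the same afterwards.
    if requested is None or requested == "STANDARD":
        return None
    return requested
```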
Marcus Watts [Sat, 25 May 2024 03:45:14 +0000 (23:45 -0400)]
Fix lifecycle transition of encrypted multipart objects.
Lifecycle transition can copy objects to a different storage tier.
When this happens, since the object is repacked, the original
manifest is invalidated. It is necessary to store a special
"parts_len" attribute to fix this. There was code in PutObj
to handle this, but that was only used for multisite replication,
it is not used by the lifecycle transition code. This fix
adds similar logic to the lifecycle transition code path to make the
same thing happen.
Fixes: https://tracker.ceph.com/issues/23264 Signed-off-by: Marcus Watts <mwatts@redhat.com>
(cherry picked from commit 60ddd17d2753b769ba2f5ebde60eb7753649d73f)
Marcus Watts [Fri, 14 Apr 2023 09:19:59 +0000 (05:19 -0400)]
copy object encryption fixes
This contains code to allow copyobject to copy encrypted objects.
It includes additional data paths to communicate data from the
rest layer down to the sal layer to handle decrypting
objects. The data paths include logic to use filter chains
from get and put that process encryption and compression.
There are several hacks to deal with quirks of the filter chains.
The "get" path has to propagate flushes around the chain,
because a flush isn't guaranteed to propagate through it.
Also the "get" and "put" chains have conflicting uses of the
buffer list logic, so the buffer list has to be copied so that
they don't step on each other's toes.
Fixes: https://tracker.ceph.com/issues/23264 Signed-off-by: Marcus Watts <mwatts@redhat.com>
(cherry picked from commit bcaaf55f4182da0a980c87c1dbd7e1d3c868626c)
Marcus Watts [Tue, 16 Jul 2024 21:16:10 +0000 (17:16 -0400)]
rgw/compression antibug check
If another bug tells the compression filter to decompress more
data than is actually present, the resulting "end_of_buffer"
error was thrown. The thrown exception unwinds the stack,
including a completion that is pending. The resulting core dump
indicates a failure with this completion rather than the end of buffer
exception, which is misleading and not useful.
With this change, radosgw does not abort, and instead logs
a somewhat useful message before returning an "unknown" error
to the client.
Fixes: https://tracker.ceph.com/issues/23264 Signed-off-by: Marcus Watts <mwatts@redhat.com>
(cherry picked from commit 8c7b0fac53107c5fdfcd1b9d5c5d6933b7ace39f)
rgw_beast_enable_async=0 can be used to run process_request() without a
coroutine context, which can make stack traces easier to view and debug
however, the frontend's reads/writes through ClientIO were still using
the yield_context to suspend/resume. so after ClientIO, the stack traces
came from the coroutine resume instead of process_request()
the beast frontend's ClientIO now issues synchronous reads/writes when
rgw_beast_enable_async is disabled
matt benjamin [Fri, 16 May 2025 16:02:20 +0000 (12:02 -0400)]
rgw: defensive fix for crash attempting part-copy of '%' versioned obj
The proximate cause of the issue actually appears to be in recognizing
the key.name of the object, only failing in rgw_rados due to an assert
on key.name being non-empty.
Signed-off-by: matt benjamin <mbenjamin@redhat.com>
(cherry picked from commit 5111b625a174aa2eaeb4be943dec9fe4b9d948af) Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
Matt Benjamin [Mon, 30 Jun 2025 14:26:25 +0000 (10:26 -0400)]
rgwlc: fix removal of delete markers (SAL)
S3 delete markers do not have head objects, and SAL's Object::load_obj_state()
returns -ENOENT in this case. Handle this case in LC's remove_expired_obj().
Matt Benjamin [Sun, 10 Aug 2025 18:05:43 +0000 (14:05 -0400)]
rgw:chksum: pull up aws-sdk-java-v2 and fix S3Builder invocation
This commit pulls up aws-sdk-java-v2 to 2.32.2, which has trailing header
formatting previously seen with golang v2 sdk--for which the upstream
*Reef* logic is not present (see prior commit by Yixin Jin).
And it fixes the construction of S3Client to accept endpoint self-signed
certificates--logic which is present in the main function example code
in jcksum.java, but somehow not in putobjects.java (anymore?).
Yixin Jin [Sun, 10 Aug 2025 15:59:18 +0000 (11:59 -0400)]
rgw:cksum: fix two checksum-trailer related signing issues
1. return error code on signature mismatch (should be 400,
XAmzContentSHA256Mismatch)
2. reorder final chunk extraction and signing to better address
what we were handling as a special case of a few trailing bytes--
this is arising because the implementer was working against Reef,
which I guess doesn't have the extra extraction logic (c.f.,
ceph/main and its upstream backport)
(A change to catch rgw::io::Exception at rgw_process_authenticated
has been removed, as it is already handled in the only applicable
path.)
Matt Benjamin [Sun, 18 May 2025 01:02:34 +0000 (21:02 -0400)]
rgw: aws-chunked need not supply any content-length
The updated logic for aws chunked handling (2024) appears sufficient
to handle the cases produced by aws-sdk-go-v2.
Note that https://docs.aws.amazon.com/AmazonS3/latest/API/sigv4-streaming.html
states that "For all requests, you must include the
x-amz-decoded-content-length header, specifying the size of the object in
bytes." (accessed 5/17/2025) (but now we do not enforce it).
Matt Benjamin [Sat, 17 May 2025 19:52:20 +0000 (15:52 -0400)]
rgw: recognize checksum from x-amz-checksum-{type} alone
Some SDKs may send x-amz-checksum-algorithm or
x-amz-sdk-checksum-algorithm regardless as well, but those are
only required if the checksum header is in the trailer section.
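The idea of deducing the algorithm from the `x-amz-checksum-{type}` header name alone could look roughly like this. It is a sketch, not the rgw_cksum code: `checksum_from_headers` is a hypothetical helper and the list of suffixes is an assumption about which algorithms are of interest.

```python
# Header suffixes considered here (assumed list, for illustration).
KNOWN_ALGORITHMS = ("crc32", "crc32c", "crc64nvme", "sha1", "sha256")


def checksum_from_headers(headers):
    """Return (algorithm, value) from an x-amz-checksum-{type} header, or None."""
    lowered = {name.lower(): value for name, value in headers.items()}
    for algo in KNOWN_ALGORITHMS:
        value = lowered.get("x-amz-checksum-" + algo)
        if value is not None:
            # The header name alone identifies the algorithm; no
            # x-amz-checksum-algorithm header is consulted.
            return algo, value
    return None
```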
Max Kellermann [Thu, 24 Apr 2025 11:22:55 +0000 (13:22 +0200)]
rgw/rgw_cksum: work around -Wsometimes-uninitialized
clang complains that `cck3` might not be initialized:
```
/home/jenkins-build/build/workspace/ceph-api/src/rgw/rgw_cksum.cc:74:2: error: variable 'cck3' is used uninitialized whenever switch default is taken [-Werror,-Wsometimes-uninitialized]
74 | default:
| ^~~~~~~
/home/jenkins-build/build/workspace/ceph-api/src/rgw/rgw_cksum.cc:78:31: note: uninitialized use occurs here
78 | cck3 = rgw::digest::byteswap(cck3);
| ^~~~
/home/jenkins-build/build/workspace/ceph-api/src/rgw/rgw_cksum.cc:61:15: note: initialize the variable 'cck3' to silence this warning
61 | uint32_t cck3;
| ^
| = 0
```
The `default:` case however is not reachable because `ck1.type` has
already been checked. Adding initializers to `cck3` would only hide
potential future bugs, therefore I suggest just bailing out of the
function for this unreachable piece of code. With C++23, we could use
`std::unreachable()` instead.
Sachin Punadikar [Thu, 21 Aug 2025 10:09:17 +0000 (06:09 -0400)]
NFS CONF: Disable dentry caching in Ganesha
Disable dentry caching in Ganesha. This caching leads to inconsistent
directory listing to connected NFS clients.
Fixes - https://tracker.ceph.com/issues/72797
rgw/multisite: handle secondary zone's response appropriately
depending on primary zone's version.
decode primary's response only when generate-key is true.
rgw/multisite: forward create_key request to master, fetch the newly created key
and store it on secondary. also, include 'create_date' in the user info response to
help identify timestamp of each key.
Mark Kogan [Mon, 27 May 2024 17:01:01 +0000 (17:01 +0000)]
rgw: qat: if necessary initialize the `qat` supplemental group
when RGW is started as an entry point of a container the shell
does not have the opportunity to initialize the supplemental groups
hence the `sudo usermod -a -G qat <USER>` has not taken effect,
a call to initgroups() (see `man 3 initgroups`) is necessary
John Mulligan [Tue, 22 Jul 2025 23:24:11 +0000 (19:24 -0400)]
mgr/smb: add new cephfs parameter for getting fscrypt keys
Add a new field to the cephfs configuration section for shares. This
section selects the keybridge scope and key name to use when acquiring
the key to use for fscrypt.
John Mulligan [Tue, 22 Jul 2025 23:22:15 +0000 (19:22 -0400)]
mgr/smb: add keybridge configuration to cluster resource
Add keybridge service configuration classes and parameters to the
resources module. This supports enabling the keybridge, setting up
scopes for the keybridge and its access control.
A helper class is added that parses and helps validate the scope names.
John Mulligan [Wed, 16 Jul 2025 21:55:44 +0000 (17:55 -0400)]
mgr/smb: add enums that will be used for configuring keybridge
Add a pair of enum types that will be used for configuring the
keybridge. The scope type identifies what kind of scope is being
used. The peer policy can be used to allow a dev or other user
more access to the keybridge api for development purposes.
John Mulligan [Fri, 18 Jul 2025 14:23:31 +0000 (10:23 -0400)]
mgr/smb: fix a resource error unpacking str instead of list
Add special handling for the case where a string is passed instead of a
list. Without this fix a string will be converted into a list of single
letter items, something pretty much no one ever wants. Raise an
exception instead.
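The pitfall is easy to reproduce in plain Python: `list("abc")` yields `['a', 'b', 'c']`. A minimal sketch of the guard follows; `ensure_list` is a hypothetical helper name, not the function used in the smb module.

```python
def ensure_list(value):
    """Convert an iterable to a list, rejecting a bare string."""
    if isinstance(value, str):
        # A str is iterable, so list(value) would silently split it
        # into single-letter items. Refuse it instead.
        raise ValueError("expected a list of strings, not a single string")
    return list(value)
```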
John Mulligan [Fri, 18 Jul 2025 16:20:17 +0000 (12:20 -0400)]
cephadm: add keybridge sidecar to smb daemon module
The keybridge uses the sambacc configuration but can also be passed
CLI options. Since cephadm writes the cert files, cephadm must also
pass the file names to use to the container args.
John Mulligan [Wed, 16 Jul 2025 21:08:49 +0000 (17:08 -0400)]
python-common/deployment: add keybridge feature to smb service spec
The keybridge sidecar is enabled by the keybridge feature flag.
This sidecar will be used to help fetch keys over various protocols
for the ceph module to use to set up fs encryption.
Adam Kupczyk [Mon, 31 Mar 2025 11:38:08 +0000 (13:38 +0200)]
os/bluestore: Add ability to ignore BlueFS zombie files
Under normal circumstances BlueFS _replay() procedure
does not allow for zombie files to be present.
Zombie files are files that are declared but are not attached to any name+dir.
One exception is the special BlueFS log file (ino 1).
This change introduces configurable 'bluefs_log_replay_remove_zombie_files'.
When set to 'true', instead of refusing to mount, BlueFS logs the error and ignores the file.
This is equivalent to removing it, with the distinction being that until the BlueFS log is
compacted, one cannot revert the option back, or the problem will reemerge.
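A simplified model of the behaviour described above, assuming files are identified by inode number and that a set of linked inodes is known after replay. The function names and data layout are hypothetical; only the meaning of the option and the ino 1 exception come from the commit message.

```python
LOG_INO = 1  # the BlueFS log file is allowed to have no name+dir


def find_zombies(file_inos, linked_inos):
    """Return inodes that are declared but not attached to any name+dir."""
    return sorted(
        ino for ino in file_inos
        if ino != LOG_INO and ino not in linked_inos
    )


def replay(file_inos, linked_inos, remove_zombie_files=False):
    """Return the set of inodes kept after replay, or fail on zombies."""
    zombies = find_zombies(file_inos, linked_inos)
    if zombies and not remove_zombie_files:
        # Default behaviour: refuse to mount.
        raise RuntimeError(f"zombie files found: {zombies}")
    for ino in zombies:
        print(f"ignoring zombie file ino {ino}")
    return set(file_inos) - set(zombies)
```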
Jamie Pryde [Wed, 13 Aug 2025 10:57:40 +0000 (11:57 +0100)]
erasure-code: use cauchy if K and M values are not supported by ISA-L reed_sol_van
ISA-L supports a limited set of K and M values when using a vandermonde
matrix. There are no such limitations when using a cauchy matrix. If the
user specifies reed_sol_van (or does not specify a technique and relies
on the default reed_sol_van setting) and an unsupported K/M combination,
then we will automatically switch the technique for the new EC profile
to cauchy. Benchmarking has not shown any noticeable performance
differences between ISA-L in reed_sol_van vs cauchy modes.
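The fallback can be sketched as below. Both function names are hypothetical, and the limits inside `vandermonde_supported` are placeholder assumptions used only to make the example runnable; the commit message does not state the actual supported K/M set.

```python
def vandermonde_supported(k, m):
    """Placeholder check; the real supported K/M set is assumed here."""
    if k > 32 or m > 4:
        return False
    if m == 4 and k > 21:
        return False
    return True


def choose_technique(k, m, requested=None):
    """Pick the technique for a new EC profile."""
    # reed_sol_van is the default when no technique is specified.
    technique = requested or "reed_sol_van"
    if technique == "reed_sol_van" and not vandermonde_supported(k, m):
        # Unsupported K/M for a vandermonde matrix: switch to cauchy.
        return "cauchy"
    return technique
```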
osd: stop scrub_purged_snaps() from ignoring osd_beacon_report_interval
OSD beacons could be burdensome to the entire cluster, as they lead
to generation of new `OSDMap` epochs. Therefore their frequency is
restricted through `osd_beacon_report_interval` to 5 mins by default.
Unfortunately, the `OSD::send_purged_snaps()` is unaware of this
policy with the net result being a storm of OSDMaps. This patch unifies
its behavior with `OSD::tick_without_osd_lock()`.
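The interval policy amounts to a simple rate limit that every beacon-sending path has to go through. A minimal sketch, with a hypothetical `BeaconLimiter` class and timestamps passed in as plain seconds:

```python
class BeaconLimiter:
    """Allow at most one beacon per `interval` seconds."""

    def __init__(self, interval):
        self.interval = interval
        self.last_sent = None

    def maybe_send(self, now):
        """Return True if a beacon may be sent at time `now`."""
        if self.last_sent is not None and now - self.last_sent < self.interval:
            return False  # too soon since the previous beacon
        self.last_sent = now
        return True
```

With a single limiter shared by all callers, a burst of requests inside the interval results in one beacon rather than one per request.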
mgr/DaemonState: Minimise time we hold the DaemonStateIndex lock
Calling back into python functions whilst holding the lock can result in
this thread being queued for the GIL and resulting in extended delays
for threads waiting to acquire the lock.
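One common way to shorten lock hold times in this situation is to copy the needed data under the lock and invoke the callbacks only after releasing it. The sketch below shows that general pattern with a hypothetical `DaemonIndex` class; it is not a description of how the patch itself is implemented.

```python
import threading


class DaemonIndex:
    def __init__(self):
        self._lock = threading.Lock()
        self._daemons = {}

    def insert(self, name, state):
        with self._lock:
            self._daemons[name] = state

    def names(self):
        with self._lock:
            return sorted(self._daemons)

    def for_each(self, callback):
        # Hold the lock only long enough to take a snapshot.
        with self._lock:
            snapshot = list(self._daemons.items())
        # Callbacks run without the lock, so a slow callback does not
        # block other threads that need the index.
        for name, state in snapshot:
            callback(name, state)
```

Because the lock is not held during the callbacks, a callback may even call back into the index (for example `insert()`) without deadlocking on the non-reentrant lock.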