doc: update Grafana certificate configuration to use certmgr
With the introduction of certmgr, users must register their certificates
via `ceph orch certmgr cert set --hostname ...` instead of the old
config-key method. The updated docs clarify that Grafana certificates
are host-scoped and can only be provided by reference (or default to
cephadm-signed).
doc: update RGW HTTPS configuration to use certmgr and new fields
With the introduction of certmgr, RGW services now support three
certificate sources: cephadm-signed (default), inline, and reference.
Docs have been updated to:
- Show how to provide inline certificates using the new ssl_cert/ssl_key
fields instead of the deprecated rgw_frontend_ssl_certificate.
- Explain how to register and reference user-provided certs/keys
- Clarify that cephadm-signed certificates remain the default, with
optional wildcard SANs support.
The usage of rgw_frontend_ssl_certificate is still supported for
backward compatibility, but is now documented as deprecated.
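As a sketch of the inline form described above (the `ssl_cert`/`ssl_key` field names come from this change; the surrounding spec layout is the usual cephadm service spec and may differ slightly by release):

```yaml
service_type: rgw
service_id: myrgw
placement:
  count: 1
spec:
  ssl: true
  ssl_cert: |
    -----BEGIN CERTIFICATE-----
    ...
    -----END CERTIFICATE-----
  ssl_key: |
    -----BEGIN PRIVATE KEY-----
    ...
    -----END PRIVATE KEY-----
```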
Remove the code used to migrate Grafana self-signed certificates, as
it is no longer needed. The certmgr logic now handles generating new
certificates during the upgrade, eliminating the need for any migration
code or logic.
Remove the special-case code used for RGW service migration, as it is no
longer needed. The certmgr logic now handles populating the certstore
with the corresponding certificate and key entries by reading their values
directly from the spec. During RGW service redeployment as part of the
upgrade, certmgr will ensure the certstore is updated accordingly.
mgr/cephadm: Fix RGW spec validation for deprecated rgw cert field
Starting from Tentacle, the rgw_frontend_ssl_certificate field has been
deprecated in favor of the new ssl_cert and ssl_key fields. Update the
validation logic to run after this field is automatically transformed into
the new fields, ensuring proper validation of RGW specs.
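The ordering fix can be sketched as follows (illustrative Python, not the actual cephadm code; function and field handling are assumptions):

```python
# Sketch: fold the deprecated rgw_frontend_ssl_certificate field into the
# new ssl_cert field *before* validating, so validation sees the final spec.
def transform_deprecated_fields(spec: dict) -> dict:
    cert = spec.pop('rgw_frontend_ssl_certificate', None)
    if cert is not None and 'ssl_cert' not in spec:
        spec['ssl_cert'] = cert
    return spec

def validate(spec: dict) -> None:
    # Validation runs on the already-transformed fields.
    if spec.get('ssl') and not spec.get('ssl_cert'):
        raise ValueError('SSL enabled but no certificate provided')

def apply_spec(spec: dict) -> None:
    validate(transform_deprecated_fields(spec))
```

A spec that only carries the deprecated field now passes validation instead of being rejected before the transformation runs.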
mgr/cephadm: Include mgmt-gateway/oauth2-proxy in upgrade process
Add the new mgmt-gateway and oauth2-proxy services to the list of
services upgraded by cephadm, ensuring they are updated alongside the
rest of the cephadm-managed services.
pybind/mgr/volumes: add getter and setter APIs for snapdir_visibility
Conflicts:
fscrypt changes exist downstream in 01a4d2a0356e5f66b7260dad7de70a5fa9cc3aa7 but not upstream,
which led to a conflict; both changes were kept in the branch.
client: check client config and snaprealm flag before snapdir lookup
this commit adds a new client config, client_respect_subvolume_snapshot_visibility,
which acts as a knob for per-client control over snapshot visibility and
is checked along with the snaprealm flag while looking up a subvolume inode.
Dhairya Parmar [Wed, 6 Aug 2025 21:32:05 +0000 (03:02 +0530)]
common,mds: transmit SNAPDIR_VISIBILITY flag via SnapRealmInfoNew
at the time of building the snap trace
Conflicts:
upstream ed6b71246137f9793f2d56b4d050b271a3da29fd made changes to generate_test_instances()
which is not present downstream in ceph-9.0-rhel-patches, so had to adjust accordingly.
mds: rebuild snaprealm cache if last_modified or change_attr changed
For the server-side snapdir visibility changes to be transported to the
client, the SnapRealm cache needs to be rebuilt; otherwise the same metadata
would be sent via send_snap_update() in C_MDS_inode_update_finish() while
setting the `ceph.dir.subvolume.snaps.visible` vxattr.
The condition used to check `seq` and `last_destroyed` against their cached
values, but for a vxattr change it is infeasible heavy lifting to update
`seq`: that involves a set of steps to prepare the op, commit the op, journal
the changes, and update the snap server/client(s), just for a mere flag
update (and updating last_destroyed makes no sense for this case anyway).
So, compare last_modified and change_attr with their cached values to decide
whether the SnapRealm cache should be rebuilt. These values are incremented
in Server::handle_client_setvxattr while toggling the snapshot visibility
xattr, which enforces a cache rebuild.
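The rebuild condition above can be sketched as (illustrative Python, not the MDS code; field names mirror the prose):

```python
# Sketch: rebuild the SnapRealm cache when last_modified or change_attr
# differ from their cached values, instead of comparing seq/last_destroyed.
def should_rebuild(cached: dict, current: dict) -> bool:
    return (current['last_modified'] != cached['last_modified']
            or current['change_attr'] != cached['change_attr'])
```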
Conflicts:
upstream ed6b71246137f9793f2d56b4d050b271a3da29fd made changes to generate_test_instances()
but is not present downstream in ceph-9.0-rhel-patches, so had to adjust accordingly.
librbd: fix segfault when removing non-existent group
Removing a non-existent group triggers a segfault in
librbd::mirror::GroupGetInfoRequest::send(). The issue is caused by a missing
return after finish(), which allows execution to fall through into
GroupGetInfoRequest::get_id() and access invalid memory.
Also, make sure to ignore ENOENT throughout Group::remove(),
except at cls_client::dir_get_id().
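The control-flow bug can be sketched in miniature (Python stand-in for the C++ state machine; class shape and -ENOENT handling are illustrative, not the librbd API):

```python
# Sketch: finish() tears the request down; without an immediate `return`,
# execution falls through into get_id() on invalid state -- the segfault.
class GroupGetInfoRequest:
    def __init__(self, exists: bool):
        self.exists = exists
        self.result = None

    def finish(self, r: int) -> None:
        self.result = r  # stands in for completing/freeing the request

    def get_id(self) -> None:
        # must never run after finish(); in librbd this touched freed memory
        assert self.result is None, "fall-through after finish()"

    def send(self) -> int:
        if not self.exists:
            self.finish(-2)      # -ENOENT
            return self.result   # the missing `return` was the bug
        self.get_id()
        return 0
```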
Ramana Raja [Mon, 8 Sep 2025 02:50:51 +0000 (22:50 -0400)]
qa/workunits: add scenario to "test_force_promote_delete_group"
... in rbd_mirror_group_simple test suite.
After the group and its images are removed from the secondary, the test
can run in one of two scenarios. In Scenario 1, the test confirms that
the group is completely synced from the primary to the secondary. In
Scenario 2, the test disables and re-enables the primary, and then
confirms the group syncs from the primary to the secondary. Currently,
both of the scenarios fail occasionally when trying to confirm that the
group is completely synced from the primary to the secondary.
Signed-off-by: Ramana Raja <rraja@redhat.com>
Resolves: rhbz#2399618
rbd-mirror: skip validation of primary demote snapshots
Problem:
When a primary demotion is in progress, the demote snapshot is in an incomplete
state. However, the group replayer incorrectly attempts to validate this
snapshot using validate_local_group_snapshots(), treating the cluster as if it
were secondary. This results in the group status being incorrectly set to
up+replaying instead of up+unknown.
Solution:
Avoid validating snapshots that are in the process of being demoted on the
primary. This ensures the group replayer does not mistakenly assign an
incorrect role or state during transition.
Adam King [Thu, 25 Sep 2025 20:13:18 +0000 (16:13 -0400)]
mgr/cephadm: split host cache entries if they exceed max mon store entry size
If the json blob we attempt to store for a host entry
exceeds the max mon store entry size, we become unable
to continue storing that host's information in the
config-key store. This means that each time the mgr
fails over we only have the information from the last
time the json blob was under the size limit, resulting
in a number of stray host/daemon warnings being
generated and very outdated information being reported
by `ceph orch ps` and `ceph orch ls` around the time of
the failover.
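The splitting idea can be sketched as follows (illustrative Python, not the cephadm implementation; the chunk size and helper names are assumptions):

```python
# Sketch: store an oversized host blob as multiple config-key entries and
# rejoin them on load. MAX_SIZE is an illustrative stand-in for the mon
# store entry limit.
MAX_SIZE = 64 * 1024

def split_blob(blob: str, max_size: int = MAX_SIZE) -> list:
    # at least one chunk, so an empty blob still gets an entry
    return [blob[i:i + max_size] for i in range(0, len(blob), max_size)] or ['']

def join_blob(chunks: list) -> str:
    return ''.join(chunks)
```

Each chunk stays under the store limit, so a host's full state survives mgr failover instead of reverting to the last blob that happened to fit.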
Igor Fedotov [Thu, 21 Aug 2025 10:42:54 +0000 (13:42 +0300)]
test/libcephfs: use more entries to reproduce snapdiff fragmentation issue
Snapdiff listing fragments have different boundaries in Reef and Squid+
releases, hence the original reproducer (made for Reef) doesn't work properly
in Squid+ releases. This patch fixes that at the cost of longer execution.
This might be redundant/senseless when backporting to Reef.
Igor Fedotov [Tue, 12 Aug 2025 13:17:49 +0000 (16:17 +0300)]
mds: rollback the snapdiff fragment entries with the same name if needed.
This is required when more entries with the same name don't fit into the
fragment. With the existing means of fragment offset specification, such
splitting has to be prohibited.
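The rollback rule can be sketched as follows (illustrative Python, not the MDS code; capacity counts entries here for simplicity):

```python
# Sketch: when packing snapdiff entries into a size-limited fragment, a run
# of entries sharing the same name must not straddle the fragment boundary,
# so roll the whole run back into the next fragment.
def pack_fragment(entries: list, capacity: int):
    if len(entries) <= capacity:
        return entries, []
    cut = capacity
    # roll back while the boundary would split a run of identical names
    while cut > 0 and entries[cut - 1] == entries[cut]:
        cut -= 1
    return entries[:cut], entries[cut:]
```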
Signed-off-by: Dnyaneshwari <dtalweka@redhat.com>
mgr/dashboard: handle creation of new pool
Commit includes:
1) Provide link to create a new pool
2) Refactored validation on ACL mapping, removed required validator as default
3) fixed runtime error on console due to ACL length due to which the details section was not opening
4) Used rxjs operators to make API calls and making form ready once all data is available, fixing the form patch issues
5) Refactored some part of code to improve the performance
6) Added zone and pool information in details section for local storage class
Fixes: https://tracker.ceph.com/issues/72569
Signed-off-by: Naman Munet <naman.munet@ibm.com>
(cherry picked from commit 2d0e71c845643a26d4425ddac8ee0ff30153eff2)
Problem:
The readdir wouldn't list all the entries in the directory
when the osd is full with rstats enabled.
Cause:
The issue happens only in multi-mds cephfs cluster. If rstats
is enabled, the readdir would request 'Fa' cap on every dentry,
basically to fetch the size of the directories. Note that 'Fa' is
CEPH_CAP_GWREXTEND which maps to CEPH_CAP_FILE_WREXTEND and is
used by CEPH_STAT_RSTAT.
The request for the cap is a getattr call and it need not go to
the auth mds. If rstats is enabled, the getattr would go with
the mask CEPH_STAT_RSTAT which mandates the requirement for
auth-mds in 'handle_client_getattr', so that the request gets
forwarded to the auth mds if it's not the auth. But if the osd is full,
the inode is fetched in 'dispatch_client_request' before
calling the handler function of the respective op, to check the
FULL cap access for certain metadata write operations. If the inode
doesn't exist, ESTALE is returned. This is wrong for operations
like getattr, where the inode might not be in memory on the non-auth
mds; returning ESTALE there is confusing and the client wouldn't retry.
This was introduced by commit 6db81d8479b539d, which fixes subvolume
deletion when the osd is full.
Fix:
Fetch the inode required for the FULL cap-access check only for the
relevant operations in the osd-full scenario. This makes sense because
those operations would mostly be preceded by a lookup that loads the
inode into memory, or they would handle ESTALE gracefully.
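The dispatch change can be sketched as follows (illustrative Python, not the MDS code; the op set and return strings are assumptions):

```python
# Sketch: with the osd full, only the metadata *write* ops need the inode
# for the FULL cap-access check. Read ops like getattr bypass the check, so
# a non-auth mds can forward them to the auth mds instead of wrongly
# returning ESTALE.
WRITE_OPS = {'setattr', 'setxattr', 'mkdir', 'create'}

def dispatch(op: str, osd_full: bool, inode_in_memory: bool) -> str:
    if osd_full and op in WRITE_OPS:
        if not inode_in_memory:
            return 'ESTALE'        # inode genuinely required for the check
        return 'check_full_caps'
    return 'handle_op'             # getattr etc.: forward to auth if needed
```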
Venky Shankar [Fri, 29 Aug 2025 07:15:09 +0000 (07:15 +0000)]
qa/cephfs: use fuse mount for volumes/subvolume tests
Using the kernel client is a) not really required for the existing
volume/subvolume tests and b) per-subvolume metrics are only
supported by the user-space client library.
Igor Golikov [Thu, 10 Jul 2025 10:18:57 +0000 (10:18 +0000)]
mds: aggregate and expose subvolume metrics
rank0 periodically receives subvolume metrics from other MDS instances
and aggregates them using a sliding window.
The MetricsAggregator exposes PerfCounters and PerfQueries for these
metrics.
Igor Golikov [Thu, 10 Jul 2025 10:17:36 +0000 (10:17 +0000)]
client,mds: add support for subvolume level metrics
Add support for client-side metrics collection using the SimpleIOMetric
struct and aggregation using the AggregatedIOMetrics struct.
The client holds a SimpleIOMetrics vector for each subvolume it recognizes
(via caps/metadata messages), aggregates them into the
AggregatedIOMetric struct, and sends them periodically to the MDS, along
with the regular client metrics.
The MDS holds a map of subvolume_path -> vector<AggregatedIOMetrics> and
sends it periodically to rank0 for further aggregation and exposure.
Yaarit Hatuka [Mon, 21 Oct 2024 20:35:31 +0000 (16:35 -0400)]
mgr/callhome: persist operations between mgr restarts
Currently the operations dictionary is only kept in memory. It is lost
when the mgr restarts, and this can cause the module to handle upload
requests which were already processed and registered in the operations
dictionary. To prevent that, we write the operations to the db, and load
them when the module starts.
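The persistence pattern can be sketched as follows (illustrative Python; the real module writes to the mgr's db, for which a plain dict stands in here):

```python
# Sketch: serialize the operations dictionary to the store on change and
# reload it at module start, so a mgr restart doesn't forget which upload
# requests were already processed.
import json

def save_operations(store: dict, operations: dict) -> None:
    store['operations'] = json.dumps(operations)

def load_operations(store: dict) -> dict:
    raw = store.get('operations')
    return json.loads(raw) if raw else {}
```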
mgr/callhome: management of diagnostic upload requests (#78)
Call Home stores diagnostic upload requests for 10 days.
Call Home does not process operations sent repeatedly by the IBM Call Home mesh.
Call Home is able to repeat level 1 operations after 5 minutes.
Call Home is able to repeat level 2 (and higher) operations after 1 hour.
This is a combination of 18 commits to ease maintenance.
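The repeat-throttling rules above can be sketched as follows (illustrative helper, not the Call Home module API):

```python
# Sketch: level 1 operations may repeat after 5 minutes; level 2 and
# higher only after 1 hour.
REPEAT_DELAY_SECS = {1: 5 * 60}
DEFAULT_DELAY_SECS = 60 * 60

def may_repeat(level: int, last_seen_secs_ago: float) -> bool:
    delay = REPEAT_DELAY_SECS.get(level, DEFAULT_DELAY_SECS)
    return last_seen_secs_ago >= delay
```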
Signed-off-by: Yaarit Hatuka <yhatuka@ibm.com>
Signed-off-by: Juan Miguel Olmo Martínez <jolmomar@ibm.com>
(cherry picked from commit c9deac16f75e174e66ecde453cc8e71c936b3981)
A new line was missing between the block of "%files node-proxy"
and that of "%files mgr-callhome".
Please note that the changes in de6cbfbde53c64877941751d2ef5f8198ae5dccc
to src/cephadm/cephadm.py were reset in this commit, since they were
extracted and cherry-picked to a separate call-home-cephadm branch.
pybind/mgr: add call_home_agent to CMakeLists.txt and tox.ini
This commit can be safely squashed along with: 209527a8e087c916fadd0e395e3619a89cf1c3a6
mgr/callhome: Add hardware status to inventory reports
in future releases
Signed-off-by: Yaarit Hatuka <yhatuka@ibm.com>
mgr/ccha: Remove jti error message when no credentials (#61)
Avoid the annoying error message when no credentials are present.
Fix error when registry credentials are set using ceph cephadm registry-reg_credentials.
Changed the default regex for registry urls.
ECuRep requires Transfer ID credentials (user ID and password). In this fix we
are adding the option to load them from the encrypted keys file instead of
asking the user to populate them. The keys from the files are the default. As a
workaround, we are leaving the option to manually populate the module options,
in case we ever need it.
BF-2271537: mgr/callhome: pick up SI event ID (#65)
Storage Insights event ID was not picked up correctly, which prevented Ceph
from listening to SI-triggered requests, and thus from fulfilling them and
updating their status.
Status for operations updated to match SI expectations.
complete_multipart_upload: the spec requires that the client
provide the same SSE-C values as were used to initiate
the upload. Verify that the required parameters exist and match.
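The check can be sketched as follows (illustrative Python; the parameter names below are stand-ins, not the exact RGW attribute keys):

```python
# Sketch: completion must present the same SSE-C customer-key parameters
# that initiated the multipart upload; a non-SSE-C upload needs none.
SSE_C_PARAMS = ('sse-customer-algorithm', 'sse-customer-key-md5')

def verify_sse_c(upload_attrs: dict, request_params: dict) -> bool:
    if not any(upload_attrs.get(p) for p in SSE_C_PARAMS):
        return True  # upload was not SSE-C encrypted
    return all(upload_attrs.get(p) == request_params.get(p)
               for p in SSE_C_PARAMS)
```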
rgw/multisite: reset RGW_ATTR_OBJ_REPLICATION_TRACE during object attr changes.
Otherwise, if a zone receives a request for any s3 object api request like
PutObjectAcl, PutObjectTagging, etc., and this zone was originally the source
zone for the object put request, then subsequent sync ops will fail. This is
because the zone id was added to the replication trace to ensure that we
don't sync the object back to it, for example in a put/delete race during
full sync (https://tracker.ceph.com/issues/58911).
So, if the same zone ever becomes the destination for subsequent sync
requests on the same object, we compare this zone as the destination zone
against the zone entries in the replication trace, and because its entry is
already present in the trace, the sync operation returns -ERR_NOT_MODIFIED.
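The trace logic can be sketched as follows (illustrative Python, not the RGW attr machinery; the trace is modeled as a set of zone ids):

```python
# Sketch: sync to a destination zone is skipped (-ERR_NOT_MODIFIED) if that
# zone is already in the replication trace; resetting the trace on object
# attr changes lets the original source zone receive later sync updates.
def sync_allowed(trace: set, dest_zone: str) -> bool:
    return dest_zone not in trace

def on_attr_change(trace: set) -> set:
    # stands in for resetting RGW_ATTR_OBJ_REPLICATION_TRACE
    return set()
```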
rgw/logging: add error message when log_record fails
When log_record fails in journal mode due to issues in the target
bucket, the result code that the client gets will be confusing, since
there is no indication that the issue is with the target bucket and not
the source bucket on which the client was operating.
The HTTP error message will be used to convey this information.
rgw/restore: Mark the restore entry status as `None` first time
While adding the restore entry to the FIFO, mark its status as `None`
so that the restore thread knows the entry is being processed for
the first time. In case the restore is still in progress and the entry
needs to be re-added to the queue, its status will then be marked
`InProgress`.
Soumya Koduri [Sun, 10 Aug 2025 12:13:11 +0000 (17:43 +0530)]
rgw/restore: Persistently store the restore state for cloud-s3 tier
In order to resume IN_PROGRESS restore operations post RGW service
restarts, store the entries of the objects being restored from `cloud-s3`
tier persistently. This is already being done for `cloud-s3-glacier`
tier and now the same will be applied to `cloud-s3` tier too.
With this change, when `restore-object` is performed on any object,
it will be marked RESTORE_ALREADY_IN_PROGRESS and added to a restore FIFO queue.
This queue is later processed by Restore worker thread which will try to
fetch the objects from Cloud or Glacier/Tape S3 services. Hence all the
restore operations are now handled asynchronously (for both `cloud-s3`,
`cloud-s3-glacier` tiers).
Matt Benjamin [Thu, 11 Sep 2025 20:42:03 +0000 (16:42 -0400)]
rgw_cksum: return ChecksumAlgorithm and ChecksumType in ListParts
An uncompleted multipart upload's checksum algorithm and type can
be deduced from the upload object. Also the ChecksumType element
was being omitted in the completed case.
rgw/restore: Update expiry-date of restored copies
As per the AWS spec (https://docs.aws.amazon.com/AmazonS3/latest/API/API_RestoreObject.html),
if a `restore-object` request is re-issued on an already restored copy, the
server needs to update the restoration period relative to the current time.
This change handles that.
Note: this applies only to temporary restored copies.
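The recalculation is simply (illustrative helper; the name is an assumption):

```python
# Sketch: a repeated restore-object moves the expiry to now + requested
# days, rather than keeping the expiry from the original restore.
from datetime import datetime, timedelta

def new_expiry(now: datetime, days: int) -> datetime:
    return now + timedelta(days=days)
```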
cloud restore: add None type for cloud-s3-glacier
AWS supports various glacier conf options, such as Standard and Expedited,
to restore an object within a given time period. These options may not be
supported by other S3 servers, so introduce the option NoTier so that other
vendors can be supported.
Signed-off-by: Harsimran Singh <hsthukral51@gmail.com>
(cherry picked from commit b588fd05c7d82b52fc8fa3742976a9a45c3755b4)
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>