When creating the service without providing the allowlist domain, the
UI fails with the following error, which is logged in the mgr log:
```
Nov 05 04:11:56 ceph-node-00 ceph-mgr[1587]: [dashboard ERROR frontend.error] (https://192.168.100.100:8443/#/services/(modal:create)): Cannot read properties of null (reading 'split')
TypeError: Cannot read properties of null (reading 'split')
at ServiceFormComponent.onSubmit (https://192.168.100.100:8443/src_bootstrap_ts.js:31997:74)
at ServiceFormComponent_Template_cd_form_button_panel_submitActionEvent_60_listener (https://192.168.100.100:8443/src_bootstrap_ts.js:34168:83)
at executeListenerWithErrorHandling (https://192.168.100.100:8443/node_modules_angular_core_fesm2022_core_mjs.js:26276:12)
at Object.wrapListenerIn_markDirtyAndPreventDefault [as next] (https://192.168.100.100:8443/node_modules_angular_core_fesm2022_core_mjs.js:26308:18)
at SafeSubscriber.__tryOrUnsub (https://192.168.100.100:8443/default-node_modules_rxjs__esm2015_internal_AsyncSubject_js-node_modules_rxjs__esm2015_intern-7c6e1a.js:960:10)
at SafeSubscriber.next (https://192.168.100.100:8443/default-node_modules_rxjs__esm2015_internal_AsyncSubject_js-node_modules_rxjs__esm2015_intern-7c6e1a.js:900:14)
at Subscriber._next (https://192.168.100.100:8443/default-node_modules_rxjs__esm2015_internal_AsyncSubject_js-node_modules_rxjs__esm2015_intern-7c6e1a.js:847:22)
at Subscriber.next (https://192.168.100.100:8443/default-node_modules_rxjs__esm2015_internal_AsyncSubject_js-node_modules_rxjs__esm2015_intern-7c6e1a.js:824:12)
at EventEmitter_.next (https://192.168.100.100:8443/default-node_modules_rxjs__esm2015_internal_AsyncSubject_js-node_modules_rxjs__esm2015_intern-7c6e1a.js:604:17)
at EventEmitter_.emit (https://192.168.100.100:8443/node_modules_angular_core_fesm2022_core_mjs.js:7069:13)
```
Suyash Dongre [Wed, 20 Aug 2025 17:52:41 +0000 (23:22 +0530)]
Check if `HTTP_X_AMZ_COPY_SOURCE` header is empty
The issue was that the `HTTP_X_AMZ_COPY_SOURCE` header could be present but empty (i.e., an empty string rather than NULL). The code only checked if the pointer was not NULL, but didn't verify that the string had content. When an empty string was passed to RGWCopyObj::parse_copy_location(), it would eventually try to access name_str[0] on an empty string, causing a crash.
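The header guard described above can be sketched in Python. This is only an illustration, assuming a CGI-style environment mapping. The real check lives in RGW's C++ code, and `get_copy_source` and `split_copy_source` are hypothetical names, not RGW functions.

```python
from typing import Mapping, Optional, Tuple


def get_copy_source(env: Mapping[str, str]) -> Optional[str]:
    """Return the copy source only if the header is present *and* non-empty."""
    src = env.get("HTTP_X_AMZ_COPY_SOURCE")
    # Checking only `src is not None` mirrors the old NULL-pointer check and
    # lets "" through; `not src` rejects both the missing and the empty case.
    if not src:
        return None
    return src


def split_copy_source(src: str) -> Tuple[str, str]:
    """Hypothetical parser: split "/bucket/key" or "bucket/key" into parts."""
    # src[0] raises IndexError on "", the Python analogue of reading
    # name_str[0] on an empty string in the original crash.
    if src[0] == "/":
        src = src[1:]
    bucket, _, key = src.partition("/")
    return bucket, key
```

Callers should only reach the parser after the guard has returned a non-`None` value, which is exactly what the fix enforces.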
Changes include:
Added styles in rh_overrides for `btn-tertiary` to fix the styles on the multisite page, and added the `btn-group` class so the buttons look as they did before.
Regression introduced by the previous PR https://gitlab.cee.redhat.com/ceph/ceph/-/merge_requests/1323
Adam King [Wed, 29 Oct 2025 19:27:09 +0000 (15:27 -0400)]
cephadm: mount nvmeof conf under /src/
The current downstream nvmeof container builds for
9.0 seem to be using /src/ as the home directory for
the container rather than /remote-source/ceph-nvmeof/app/
This is effectively the reverse of the issue
seen in https://bugzilla.redhat.com/show_bug.cgi?id=2240588
Shweta Bhosale [Fri, 24 Oct 2025 11:00:16 +0000 (16:30 +0530)]
mgr/cephadm: For updating NFS backends in HAProxy, send a SIGHUP signal to reload the configuration instead of restarting it
Fixes: https://tracker.ceph.com/issues/73633
Signed-off-by: Shweta Bhosale <Shweta.Bhosale1@ibm.com>
Resolves: rhbz#2401776
rgw/dedup: fixes an assertion failure from __snprintf_chk in fortified mode when handling dedup cluster shard token OIDs.
The issue stems from buffer size validation in string operations.
- Remove the 'Z' suffix from rbd APIs, which now return `aware` (timezone-aware) timestamps.
- `datetime.utcfromtimestamp` is deprecated, so use `datetime.fromtimestamp(timestamp, tz=timezone.utc)` instead; this returns only `aware` timestamps, which is why the 'Z' is removed.
- Similarly, `datetime.utcnow()` is deprecated and has been migrated to `datetime.now(timezone.utc)`.
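A minimal before/after sketch of this migration, using only the standard library. It also shows why the trailing 'Z' has to go: an aware timestamp already carries its `+00:00` offset in `isoformat()`, so appending 'Z' would produce an invalid string such as `...+00:00Z`.

```python
from datetime import datetime, timezone

ts = 0  # example epoch timestamp

# Before (deprecated since Python 3.12): a naive datetime, so callers had to
# append 'Z' themselves to mark it as UTC.
# legacy = datetime.utcfromtimestamp(ts).isoformat() + "Z"

# After: timezone-aware datetimes; the UTC offset is part of the value.
aware = datetime.fromtimestamp(ts, tz=timezone.utc)
now = datetime.now(timezone.utc)

print(aware.isoformat())  # 1970-01-01T00:00:00+00:00
```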
This commit refactors setup_metadata_devices into smaller helper methods.
It keeps the distinction between existing logical volumes and raw devices
explicit, centralizes tag handling and path assignment to make the
control flow obvious and separates responsibilities for checking, creating,
and tagging devices.
Adam King [Fri, 10 Oct 2025 20:46:03 +0000 (16:46 -0400)]
python-common/cryptotools: add funcs for call_home_agent crypto activities
So that cephadm and the call_home_agent modules aren't both attempting
to import cryptography libraries that cause https://tracker.ceph.com/issues/64213
Introduce a termination_grace_period field in service spec to define how long the
orchestrator should wait for a service to shut down gracefully before forcefully terminating it.
The value is plumbed mgr -> cephadm and written into 'unit.stop' as 'podman stop -t <N>'
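A sketch of how the spec value could be rendered into the stop command. `render_unit_stop` and the container name are hypothetical; only the `podman stop -t <N>` form comes from the description above.

```python
from typing import List, Optional


def render_unit_stop(container: str, termination_grace_period: Optional[int]) -> str:
    """Build the stop command line for a container (hypothetical helper)."""
    if termination_grace_period is not None and termination_grace_period < 0:
        raise ValueError("termination_grace_period must be >= 0")
    cmd: List[str] = ["podman", "stop"]
    if termination_grace_period is not None:
        # podman waits this many seconds for a graceful exit before killing it
        cmd += ["-t", str(termination_grace_period)]
    cmd.append(container)
    return " ".join(cmd)


print(render_unit_stop("ceph-osd-0", 30))  # podman stop -t 30 ceph-osd-0
```

When the field is unset, the sketch leaves `-t` off so podman's own default applies.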
mgr/cephadm: add the VIP to the internal mgmt-gateway cert SAN list
Include the VIP as part of the mgmt-gateway internal server
certificate SAN list when operating in HA mode. Otherwise
the communication between internal services might fail.
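A sketch of the SAN list construction in HA mode. `internal_cert_sans` is a hypothetical helper, and it assumes the VIP may be supplied in CIDR form (for example `10.0.0.100/24`), so the prefix length is stripped before adding it.

```python
import ipaddress
from typing import List, Optional


def internal_cert_sans(host_addrs: List[str], vip: Optional[str]) -> List[str]:
    """Hypothetical helper: SAN entries for the internal mgmt-gateway cert."""
    sans = list(dict.fromkeys(host_addrs))  # de-duplicate, keep order
    if vip:
        # Assumption: the VIP may carry a prefix length, e.g. "10.0.0.100/24".
        vip_ip = str(ipaddress.ip_interface(vip).ip)
        if vip_ip not in sans:
            sans.append(vip_ip)
    return sans
```

Without the VIP entry, an internal client connecting through the VIP would see a certificate that does not match the address it dialed.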
Yuval Lifshitz [Sun, 12 Oct 2025 14:14:36 +0000 (14:14 +0000)]
rgw/logging: fix race condition when name update returns ECANCELED
* when we get ECANCELED indication from the name set operation we should
bail out and not continue with the rollover
* this fix revealed a hidden bug where we do not check the existing temp
name when we do conf change cleanup (rollover)
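The bail-out rule can be sketched as follows. `rollover`, `set_name` and `commit` are hypothetical stand-ins (RGW's logging code is C++); the sketch assumes the name-set operation returns 0 or a negative errno, as RADOS-style calls do.

```python
import errno
from typing import Callable


def rollover(set_name: Callable[[str], int], new_name: str,
             commit: Callable[[str], None]) -> int:
    """Hypothetical rollover step; set_name returns 0 or a negative errno."""
    ret = set_name(new_name)
    if ret == -errno.ECANCELED:
        # Another writer updated the name first (the race): bail out and
        # leave the rollover to it instead of continuing with stale state.
        return ret
    if ret < 0:
        return ret
    commit(new_name)
    return 0
```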
Adam King [Fri, 10 Oct 2025 14:48:35 +0000 (10:48 -0400)]
mgr/orchestrator: stop passing "default_flow_style" flag to yaml dump
This flag does not seem to be compatible with pyyaml 6.0:
```
File "/lib/python3.12/site-packages/ceph/deployment/service_spec.py", line 1350, in __repr__
y = yaml.dump(cast(dict, self), default_flow_style=False)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lib64/python3.12/site-packages/yaml/__init__.py", line 253, in dump
return dump_all([data], stream, Dumper=Dumper, **kwds)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lib64/python3.12/site-packages/yaml/__init__.py", line 241, in dump_all
dumper.represent(data)
File "/lib64/python3.12/site-packages/yaml/representer.py", line 28, in represent
self.serialize(node)
File "/lib64/python3.12/site-packages/yaml/serializer.py", line 54, in serialize
self.serialize_node(node, None, None)
File "/lib64/python3.12/site-packages/yaml/serializer.py", line 104, in serialize_node
self.emit(MappingStartEvent(alias, node.tag, implicit,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: Prepared.__init__() got an unexpected keyword argument 'flow_style'
```
Dropping it didn't seem to cause any issues with keeping our specs
readable in the logs or with round-tripping specs
via `ceph orch ls --export` (minus the known bug
around doing so with multi-line certs).
rgw/lc: At least wait for |rgw_lc_lock_max_time| while trying to fetch the lc-shard lock to get or update the bucket status.
Currently each LC worker tries for 1 second to get the lock on the lc_shard to decide which bucket to process, and again for 1 second to update the bucket status once the bucket has been LC processed. However, when multiple RGWs are running LC, the shard is often locked by another LC worker, or, when RADOS is slow, the lock is not acquired within 1 second. The worker then either skips processing the bucket or skips updating its status, so LC is missed or the bucket status is not updated.
So, in the worst case where another LC worker is already processing a shard, wait up to rgw_lc_lock_max_time to get the lock, since any given worker can hold a given shard's lock for at most rgw_lc_lock_max_time.
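The timing rule can be sketched with a local lock standing in for the distributed lc_shard lock. This is a sketch under assumptions: RGW actually uses a RADOS object lock, `with_shard_lock` is a hypothetical name, and the 2-second constant is an arbitrary stand-in for the `rgw_lc_lock_max_time` option.

```python
import threading
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

# Hypothetical stand-in for the rgw_lc_lock_max_time option, in seconds.
RGW_LC_LOCK_MAX_TIME = 2.0


def with_shard_lock(lock: threading.Lock, fn: Callable[[], T],
                    max_wait: float = RGW_LC_LOCK_MAX_TIME) -> Optional[T]:
    # Old behaviour: a fixed 1 s attempt, then skip the bucket/status update.
    # New behaviour: wait as long as another worker may legitimately hold it.
    if not lock.acquire(timeout=max_wait):
        return None
    try:
        return fn()
    finally:
        lock.release()
```

Because a holder can keep the lock for at most the max lock time, waiting that long means a worker only gives up when something is genuinely wrong, not merely because another worker is busy.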
Signed-off-by: kchheda3 <kchheda3@bloomberg.net>
(cherry picked from commit 937ac626afd3bf443edf96aa177854e8eb291af5)
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
rgw/lc: if the bucket's last LC processing time is earlier than the start time of the current LC session, continue processing the bucket for LC even if its status is not in the initialized state.
Currently the logic inside expired_session() considers an LC session valid for roughly 2-3 days, because it multiplies num_seconds_day by 2. So for a bucket whose status update after LC processing fails, the next LC session skips the bucket, because expired_session() returns false. Instead of hardcoding a 2-day window, store the start time of each LC session and compare the bucket's update time with that start time: if the bucket was last processed before the current LC session started, the previous session has already expired and the bucket can be processed.
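A simplified model of the old and new checks, for illustration only. The status strings, `BucketLCEntry` and both functions are hypothetical; RGW implements this in C++ with its own status enum.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 24 * 60 * 60


@dataclass
class BucketLCEntry:
    status: str         # hypothetical: "uninitial", "processing", "complete", ...
    last_update: float  # epoch seconds of the last status update


def old_session_expired(entry: BucketLCEntry, now: float) -> bool:
    # Simplified old heuristic: a session counts as live for ~2 days.
    return now - entry.last_update > 2 * SECONDS_PER_DAY


def should_process(entry: BucketLCEntry, lc_session_start: float) -> bool:
    if entry.status == "uninitial":
        return True
    # Last touched before the current session began, so that session is over,
    # even if its final status update never landed.
    return entry.last_update < lc_session_start
```

With the old heuristic, a bucket stuck in "processing" from an hour before the new session would be skipped; comparing against the session start time lets it be picked up immediately.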
client: adjust `Fb` cap ref count check during synchronous fsync()
The cephfs client holds a ref on Fb caps when handing out a write delegation[0].
As a result, an fsync from a (Ganesha) client holding a write delegation blocks
indefinitely[1] waiting for the Fb cap ref count to drop to 0, which will never
happen until the delegation is returned/recalled.
If an inode has been write delegated, adjust the cap reference count
check in fsync() to account for the delegation's ref.
Note: This only works for synchronous fsync() since `client_lock` is
held for the entire duration of the call (at least along the path leading
up to the reference count check). Asynchronous fsync() needs to be fixed
separately (as it can drop `client_lock`).
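The adjusted check reduces to a simple predicate. This is a Python sketch only (the real code is in the C++ cephfs client), and it assumes the delegation pins exactly one Fb ref; `fsync_may_proceed` is a hypothetical name.

```python
def fsync_may_proceed(fb_cap_refs: int, write_delegated: bool) -> bool:
    # A write delegation pins one Fb ref that is only dropped when the
    # delegation is returned/recalled, so do not wait for the count to hit 0.
    baseline = 1 if write_delegated else 0
    return fb_cap_refs <= baseline
```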
Nizamudeen A [Thu, 11 Sep 2025 05:29:47 +0000 (10:59 +0530)]
mgr/dashboard: improve search and pagination behavior
Add a throttle to page cycling so that repeatedly cycling through pages
increases the delay. This is done because, unlike search, a click on a
page-change button is deliberate, so the first click should respond
immediately.
Also, searching with a keyword stored every keystroke typed in the search
field and, after the debounce interval, sent all of those requests one
by one.
For example, typing 222 waited 1s for the debounce timer and then sent a
request to find the OSD with id 2, then another for 2, and then another
for 2. Instead, it should only send 222 at the end.
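The intended search behaviour is a standard debounce: only the value that survives a quiet interval is sent. The Python below only models the timing with a hypothetical `debounced_requests` function; in the Angular dashboard this kind of behaviour is usually built from RxJS operators such as `debounceTime`, and the actual implementation may differ.

```python
from typing import List, Tuple


def debounced_requests(keystrokes: List[Tuple[float, str]], interval: float) -> List[str]:
    """Return the search terms that would actually be sent.

    keystrokes: (time in seconds, full field value after the keystroke).
    A value is sent only if no newer keystroke arrives within `interval`.
    """
    sent: List[str] = []
    for i, (t, value) in enumerate(keystrokes):
        next_t = keystrokes[i + 1][0] if i + 1 < len(keystrokes) else None
        if next_t is None or next_t - t >= interval:
            sent.append(value)
    return sent


# Typing "222" quickly: one request for "222", not one per keystroke.
print(debounced_requests([(0.0, "2"), (0.1, "22"), (0.2, "222")], 1.0))  # ['222']
```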
Nizamudeen A [Thu, 11 Sep 2025 04:13:13 +0000 (09:43 +0530)]
mgr/dashboard: fix missing schedule interval in rbd API
Fetch the rbd image schedule interval through the rbd_support module's
schedule list command.
`GET /api/rbd` will have the following field per image:
```
"schedule_info": {
"image": "rbd/rbd_1",
"schedule_time": "2025-09-11 03:00:00",
"schedule_interval": [
{
"interval": "5d",
"start_time": null
},
{
"interval": "3h",
"start_time": null
}
]
},
```
This also fixes the UI, where the schedule interval was missing from the form,
and disables editing of the schedule_interval.
The same change is extended to the `GET /api/pool` endpoint.
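A short client-side sketch of consuming the new field, using the example payload shown above. It assumes intervals use the rbd schedule suffixes `m`, `h` and `d` (minutes, hours, days); `interval_seconds` is a hypothetical helper, not part of the API.

```python
import json
from typing import Dict, List

# Example response fragment, copied from the field shown above.
image = json.loads("""
{"schedule_info": {"image": "rbd/rbd_1",
                   "schedule_time": "2025-09-11 03:00:00",
                   "schedule_interval": [{"interval": "5d", "start_time": null},
                                         {"interval": "3h", "start_time": null}]}}
""")

# Assumption: intervals use the rbd schedule suffixes m (minutes), h, d.
UNIT_SECONDS: Dict[str, int] = {"m": 60, "h": 3600, "d": 86400}


def interval_seconds(interval: str) -> int:
    return int(interval[:-1]) * UNIT_SECONDS[interval[-1]]


intervals: List[str] = [s["interval"] for s in image["schedule_info"]["schedule_interval"]]
print(intervals)  # ['5d', '3h']
print([interval_seconds(i) for i in intervals])  # [432000, 10800]
```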
This commit includes the following changes:
1) Renaming Topic to Notification destination
2) Renaming Tiering to Storage class
3) Renaming Users to User Management
4) Fix storage class table refresh after delete
5) Update the internal routing for topic and storage class
rgw/dedup: Grant dedup process full RGW permissions.
This is necessary to allow for the creation of intermediate SLAB objects on systems configured with Ceph authentication.
Fixes: https://tracker.ceph.com/issues/72894
Signed-off-by: Mark Kogan <mkogan@ibm.com>
Update PendingReleaseNotes
Co-authored-by: Yuval Lifshitz <yuvalif@yahoo.com>
Signed-off-by: Mark Kogan <31659604+mkogan1@users.noreply.github.com>
(cherry picked from commit dae572d50080609c77d7131cfc99b1fb3f16d31b)
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
Resolves: rhbz#2393790
Rishabh Dave [Thu, 8 May 2025 15:05:39 +0000 (20:35 +0530)]
mgr/vol: keep clone source info even after cloning is finished
Instead of removing the information regarding the source of a cloned
subvolume from the .meta file after the cloning has finished, keep it
as is, since the user may find it useful.
Justin Caratzas [Mon, 6 Oct 2025 23:25:44 +0000 (19:25 -0400)]
mgr/dashboard: add an option to control the dashboard crypto caller
Add a mgr config option `crypto_caller` that lets a ceph user override
the default behavior of using the remote crypto caller. Supported
values are `internal` and `remote`.
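A sketch of how such an option value might be validated and turned into a caller. `select_crypto_caller` and the factory mapping are hypothetical; only the two supported values and the remote default come from the description above.

```python
from typing import Callable, Dict, TypeVar

T = TypeVar("T")

# Values accepted by the `crypto_caller` option, per the description above.
SUPPORTED_CALLERS = ("internal", "remote")


def select_crypto_caller(factories: Dict[str, Callable[[], T]],
                         option: str = "remote") -> T:
    """Hypothetical helper: turn the option value into a caller instance.

    Defaults to "remote", the behaviour the option overrides.
    """
    if option not in SUPPORTED_CALLERS:
        raise ValueError(f"crypto_caller must be one of {SUPPORTED_CALLERS}, got {option!r}")
    return factories[option]()
```

Rejecting unknown values up front keeps a typo in the option from silently falling back to one of the callers.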
Justin Caratzas [Mon, 6 Oct 2025 23:25:44 +0000 (19:25 -0400)]
mgr/cephadm: always use the internal cryptocaller
The cephadm module needs to use the python cryptography module for ssh (via
asyncssh), so there's no need to use the remote crypto caller in
cephadm. Configure cephadm to always use the internal cryptocaller.
Justin Caratzas [Mon, 6 Oct 2025 23:25:44 +0000 (19:25 -0400)]
python-common/cryptotools: catch all failures to read cert
Previously, the internal crypto caller would catch (and convert) some
errors when reading the cert, but not all of them. Move the logic that catches
the errors to a common location and do it once, consistently.
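The "catch everything in one place" pattern can be sketched with the standard library. This is an assumption-laden illustration: `read_cert` and `CertReadError` are hypothetical names, and `ssl.PEM_cert_to_DER_cert` only decodes the PEM armor, whereas the real cryptotools code parses the certificate with the cryptography library.

```python
import ssl


class CertReadError(Exception):
    """Single error type callers see for any failure to read a cert."""


def read_cert(pem: str) -> bytes:
    try:
        # Stand-in loader: strips the PEM armor and base64-decodes the body.
        return ssl.PEM_cert_to_DER_cert(pem)
    except Exception as e:
        # Convert *every* failure (bad armor, wrong type, ...) the same way.
        raise CertReadError(f"failed to read certificate: {e}") from e
```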
Justin Caratzas [Mon, 6 Oct 2025 23:25:44 +0000 (19:25 -0400)]
python-common/cryptotools: unify and organize all endpoint functions
Lightly reorganize and make the "endpoint" functions in cryptotools.py more
consistent and uniform. Use small functions for input and output
handling so that the handling is done the same way throughout. Pass a
pre-constructed crypto caller via the args to the endpoint functions.
Make generating the private key its own named function rather than
having one single (and only) function with overloaded behavior controlled
by a cli switch.
Justin Caratzas [Mon, 6 Oct 2025 23:25:44 +0000 (19:25 -0400)]
pybind/mgr: fix test case in test_tls.py
Why violate the typing in a test? mypy never noticed this because tests
are not type checked but there seems to be no need to turn a str into
bytes to pass to a function that is typed only as taking str!
Justin Caratzas [Mon, 6 Oct 2025 23:25:43 +0000 (19:25 -0400)]
python-common/cryptotools: fix error path in verify tls function
The remote verify_tls function was not raising errors when it should.
Fix the function so that it always returns an object when it succeeds or
fails gracefully. Always parse that function in the crypto caller class.
Justin Caratzas [Mon, 6 Oct 2025 23:25:43 +0000 (19:25 -0400)]
python-common/cryptotools: create CryptoCaller interface class
Create a class to act as a common shim between the cryptotools external
functions and the mgr. It provides common conversion mechanisms and
could possibly act as an abstraction in case we decide to make
the external function calls in different ways in the future.
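One way such an interface class could look, as a sketch only: the method names, the JSON result shape and `FakeCaller` are assumptions, not the actual cryptotools API. It also illustrates the verify_tls fix above: the caller always parses a result object and turns a reported failure into an exception.

```python
import abc
import json
from typing import Any, Dict


class CryptoError(Exception):
    pass


class CryptoCaller(abc.ABC):
    """Hypothetical shape of a shim between the mgr and cryptotools functions."""

    @abc.abstractmethod
    def _call(self, func: str, payload: Dict[str, Any]) -> str:
        """Run `func` (in-process or remotely) and return its JSON output."""

    def verify_tls(self, crt: str, key: str) -> None:
        # Common conversion: always parse the result object, raise on error.
        result = json.loads(self._call("verify_tls", {"crt": crt, "key": key}))
        if result.get("error"):
            raise CryptoError(result["error"])


class FakeCaller(CryptoCaller):
    """Test double standing in for an internal or remote implementation."""

    def __init__(self, reply: Dict[str, Any]) -> None:
        self.reply = reply

    def _call(self, func: str, payload: Dict[str, Any]) -> str:
        return json.dumps(self.reply)
```

Keeping the conversion in the base class means internal and remote implementations only differ in how `_call` runs the function.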
Justin Caratzas [Mon, 6 Oct 2025 23:25:43 +0000 (19:25 -0400)]
pybind/mgr: Hack around the 'ImportError: PyO3 modules may only be initialized once per interpreter process' issue.
Fixes: https://tracker.ceph.com/issues/64213
Signed-off-by: Paulo E. Castro <pecastro@wormholenet.com>
(cherry picked from commit 717d0a6f3530ad3e07f4423002810327b2addcf1)
doc: update Grafana certificate configuration to use certmgr
With the introduction of certmgr, users must register their certificates
via `ceph orch certmgr cert set --hostname ...` instead of the old
config-key method. The updated docs clarify that Grafana certificates
are host-scoped and can only be provided by reference (or default to
cephadm-signed).