git-server-git.apps.pok.os.sepia.ceph.com Git - ceph-ci.git/log
ceph-ci.git
3 months ago mgr/callhome: turn service events off by default
Yaarit Hatuka [Mon, 24 Nov 2025 04:07:09 +0000 (04:07 +0000)]
mgr/callhome: turn service events off by default

The flag `enable_service_events` is now set to False.

Also, move the description field to the payload.

Resolves: rhbz#2408379
Signed-off-by: Yaarit Hatuka <yhatuka@ibm.com>
(cherry picked from commit b8b31e9c1fc6820a756f417a274e90138f3f3689)

3 months ago rbd-mirror: integration of the new GroupSnapshotNamespaceMirror::complete field
VinayBhaskar-V [Wed, 20 Aug 2025 18:30:09 +0000 (18:30 +0000)]
rbd-mirror: integration of the new GroupSnapshotNamespaceMirror::complete field

This commit introduces the new field **complete**, of type **MirrorGroupSnapshotCompleteState** enum,
to the GroupSnapshotNamespaceMirror structure. This change is necessary to align behavior of
mirror group snapshots with that of mirror image snapshots, allowing for a precise differentiation
between a group snapshot that has been created and one that has been fully synced.

**1. Handling Old-Style Snapshots**

Decoding Old Snapshots: The original GroupSnapshotNamespaceMirror structure lacked the complete field,
which implicitly defaulted to a bool value of false upon initialization.
When an old snapshot (lacking the complete field) is decoded by an upgraded client,
the implicit default value maps to MIRROR_GROUP_SNAPSHOT_COMPLETE_IF_CREATED.

Completion Check: A snapshot is identified as old by checking its complete field, i.e.
complete == MIRROR_GROUP_SNAPSHOT_COMPLETE_IF_CREATED. If it is old, sync completion
for the group snapshot is determined by checking the state field,
i.e. state == GROUP_SNAPSHOT_STATE_CREATED.

During an upgrade where **OSDs have not yet been updated**, the new client is forced to create
snapshots using the old style. These snapshots are initialized with MIRROR_GROUP_SNAPSHOT_COMPLETE_IF_CREATED
and stay in that state to prevent immediate, incorrect cleanup by the old OSDs. In this case the
state field is set to **GROUP_SNAPSHOT_STATE_CREATED** only after the snapshot has completed its sync.

**2. Handling New-Style Snapshots**

New snapshots are initialized with complete == **MIRROR_GROUP_SNAPSHOT_INCOMPLETE**,
state == GROUP_SNAPSHOT_STATE_CREATING. The group snapshot's state is marked as GROUP_SNAPSHOT_STATE_CREATED
as soon as its metadata is fully available and stored.

Completion Check: The snapshot's sync is confirmed only when complete == MIRROR_GROUP_SNAPSHOT_COMPLETE
and the state check (state == GROUP_SNAPSHOT_STATE_CREATED) is also satisfied.

This approach ensures seamless transition and compatibility, allowing the system to correctly interpret the
synchronization status of both old and newly created group snapshots.
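
The two completion checks described above can be sketched as follows. This is a minimal Python model, not the librbd code: the enum classes and the helper `is_group_snapshot_synced` are hypothetical, and only the member names echo the constants in the commit message.

```python
from enum import Enum, auto


class CompleteState(Enum):
    # Member names echo the commit message; the values are arbitrary.
    COMPLETE_IF_CREATED = auto()  # old-style snapshot (implicit default)
    INCOMPLETE = auto()           # new-style snapshot, sync still pending
    COMPLETE = auto()             # new-style snapshot, fully synced


class SnapState(Enum):
    CREATING = auto()
    CREATED = auto()


def is_group_snapshot_synced(complete: CompleteState, state: SnapState) -> bool:
    """Hypothetical helper: True when the group snapshot counts as synced."""
    if complete is CompleteState.COMPLETE_IF_CREATED:
        # Old style: the state field alone signals sync completion.
        return state is SnapState.CREATED
    # New style: both the complete field and the state must agree.
    return complete is CompleteState.COMPLETE and state is SnapState.CREATED
```

A new-style snapshot in the CREATED state with complete still INCOMPLETE is therefore reported as not synced, which is the distinction the new field is meant to provide.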

Signed-off-by: VinayBhaskar-V <vvarada@redhat.com>
Resolves: rhbz#2396583

3 months ago librbd: fix incomplete group snapshot not being removed on creation failure
Prasanna Kumar Kalever [Wed, 19 Nov 2025 11:43:36 +0000 (17:13 +0530)]
librbd: fix incomplete group snapshot not being removed on creation failure

Problem:
GroupCreatePrimaryRequest doesn't remove group snapshot when group
snapshot creation encounters an error in notify_quiesce(). As a result,
INCOMPLETE snapshots from previous failed attempts remain uncleaned.

Log snippet:
librbd::watcher::Notifier: 0x7fbdac0168b0 handle_notify: r=-110
librbd::mirror::snapshot::GroupCreatePrimaryRequest:  handle_notify_quiesce: r=-110
librbd::mirror::snapshot::GroupCreatePrimaryRequest:  notify_unquiesce:
librbd::watcher::Notifier: 0x7fbda83c59a0 handle_notify: r=-110
librbd::mirror::snapshot::GroupCreatePrimaryRequest:  handle_notify_unquiesce: r=-110
librbd::mirror::snapshot::GroupCreatePrimaryRequest:  handle_notify_unquiesce: failed to notify the unquiesce requests: (110) Connection timed out
librbd::mirror::snapshot::GroupCreatePrimaryRequest:  close_images:
librbd::mirror::snapshot::GroupCreatePrimaryRequest:  handle_close_images: r=0
librbd::mirror::snapshot::GroupCreatePrimaryRequest:  finish: r=-110

When snapshot creation fails, the remove snap path that cleans the snapshot is
skipped, leaving behind INCOMPLETE snapshot entries.

Solution:
Ensure remove_snap_metadata() is executed in failed-to-quiesce scenarios like the one
above, allowing INCOMPLETE snapshots to be consistently cleaned up.

Note:
Another issue was identified and fixed around GroupUnlinkPeerRequest::remove_peer_uuid():
in the case of an INCOMPLETE snapshot, group_snap_set() is expected to return
an EEXIST error, and that is now handled.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
Resolves: rhbz#2415401

3 months ago rbd-mirror: allow incomplete group demote snapshot to sync after rbd-mirror daemon...
VinayBhaskar-V [Wed, 15 Oct 2025 10:37:40 +0000 (16:07 +0530)]
rbd-mirror: allow incomplete group demote snapshot to sync after rbd-mirror daemon restart

Previously, when the secondary daemon was killed while the group demote snapshot was in an incomplete state on the secondary,
the promotion state was set to PROMOTION_STATE_ORPHAN upon restart. This state prevents the incomplete
demote snapshot from syncing after restart because bootstrap on the secondary fails. This commit fixes that by assigning
promotion state PROMOTION_STATE_NON_PRIMARY to a group with an incomplete non-primary demote snapshot.

The downside is that if the group is removed on the primary cluster, then after restart of
rbd-mirror daemon on secondary cluster, the corresponding group on the secondary also gets removed.
This is because deletion propagation is unconditionally enabled precisely for PROMOTION_STATE_NON_PRIMARY
and this is okay since the user would have deleted the primary demoted group forcefully.

Signed-off-by: VinayBhaskar-V <vvarada@redhat.com>
Resolves: rhbz#2416554

3 months ago rbd-mirror: fix potential deadlock in finish_shut_down()
Prasanna Kumar Kalever [Thu, 6 Nov 2025 10:29:42 +0000 (15:59 +0530)]
rbd-mirror: fix potential deadlock in finish_shut_down()

Problem:
A race condition could lead to a deadlock in finish_shut_down(). The issue
occurs when the routine attempts to acquire m_lock while it is already held
by one of the Replayer callbacks, such as validate_local_group_snapshots() or
create_group_snapshot().

Solution:
Refactored finish_shut_down() to run asynchronously by splitting it into two
routines, wait_for_in_flight_ops() and handle_wait_for_in_flight_ops(). The
handle_wait_for_in_flight_ops() function acquires the lock, but now executes in
a separate thread, avoiding lock contention and eliminating the deadlock risk.
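
The pattern described in the solution can be illustrated with a small Python model. This is only a sketch under assumptions: the class `ReplayerModel`, the single-worker executor and the `callback_holding_lock` method are hypothetical stand-ins, and only the two routine names come from the commit message.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor


class ReplayerModel:
    """Toy model; only the two routine names come from the commit message."""

    def __init__(self) -> None:
        self._lock = threading.Lock()  # non-reentrant, standing in for m_lock
        self._work_queue = ThreadPoolExecutor(max_workers=1)
        self.shut_down_complete = False

    def callback_holding_lock(self) -> Future:
        # A callback that triggers shutdown while it still holds the lock.
        with self._lock:
            # Taking self._lock again on this thread would block forever.
            return self.wait_for_in_flight_ops()

    def wait_for_in_flight_ops(self) -> Future:
        # Queue the handler instead of calling it inline.
        return self._work_queue.submit(self.handle_wait_for_in_flight_ops)

    def handle_wait_for_in_flight_ops(self) -> None:
        # Runs on the worker thread; it waits until the callback releases the lock.
        with self._lock:
            self.shut_down_complete = True

    def close(self) -> None:
        self._work_queue.shutdown(wait=True)
```

Calling `callback_holding_lock()` returns a future immediately; the handler only proceeds once the callback has left its locked section, so the same thread never tries to take the lock twice.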

Credits to Ilya Dryomov <idryomov@gmail.com>

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
Resolves: rhbz#2411963

3 months ago rbd-mirror: allow incomplete demote snapshot to sync after rbd-mirror daemon restart
VinayBhaskar-V [Tue, 14 Oct 2025 11:47:37 +0000 (17:17 +0530)]
rbd-mirror: allow incomplete demote snapshot to sync after rbd-mirror daemon restart

Previously, when the rbd-mirror daemon on the secondary was killed while a demote snapshot was newly created (0% copied)
or partially synced, the image's promotion state was set to **PROMOTION_STATE_ORPHAN** upon restart of the rbd-mirror
daemon on the secondary. This state prevents the demote snapshot from syncing after restart because bootstrap on the secondary fails.
This commit fixes that by assigning promotion state **PROMOTION_STATE_NON_PRIMARY**
to an image with an **incomplete non-primary demote snapshot**.

The downside is that if the image is removed on the primary cluster, then after restart of
rbd-mirror daemon on secondary cluster, the corresponding image on the secondary also gets removed.
This is because deletion propagation is unconditionally enabled precisely for **PROMOTION_STATE_NON_PRIMARY**
images and this is okay since the user would have deleted the primary demoted image forcefully.

Fixes: https://tracker.ceph.com/issues/73528
Signed-off-by: VinayBhaskar-V <vvarada@redhat.com>
(cherry picked from commit 636a3929c1a4052827660f44f35842ac62f6d69a)

Resolves: rhbz#2416554

3 months ago neorados: Fix Neorados CephContext leak and prevent future ones
Adam C. Emerson [Fri, 21 Nov 2025 08:48:33 +0000 (03:48 -0500)]
neorados: Fix Neorados CephContext leak and prevent future ones

The original fix just dropped the extraneous reference. It seemed
better to make it explicit when a reference is being taken, so
`make_with_cct` no longer accepts `CephContext*` and instead requires
`boost::intrusive_ptr<CephContext>`.

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
(cherry picked from commit 949e80d0a6e522322d0e00a65feb93f5e13c4655)
Resolves: rhbz#2412500
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
3 months ago rgw/sts: build fix
Matt Benjamin [Fri, 21 Nov 2025 18:41:48 +0000 (13:41 -0500)]
rgw/sts: build fix

    resolves: rhbz#2406837
    resolves: rhbz#2412223

Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
3 months ago rgw/sts: maintain backward compatibility with
Pritha Srivastava [Mon, 3 Nov 2025 09:20:28 +0000 (14:50 +0530)]
rgw/sts: maintain backward compatibility with
7.1 tenant based STS when bucket owner is the
same as one assuming the role.

resolves: rhbz#2406837
resolves: rhbz#2412223

Signed-off-by: Pritha Srivastava <prsrivas@redhat.com>
3 months ago mon: ceph pg repeer should propose a correctly sized pg temp.
Alex Ainscow [Wed, 19 Nov 2025 11:32:14 +0000 (11:32 +0000)]
mon: ceph pg repeer should propose a correctly sized pg temp.

Resolves: rhbz#2415796
Fixes: https://tracker.ceph.com/issues/73897
Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
(cherry picked from commit a3cc500a543d1c2fb9e1d55c144e0a041e3d1f80)
(cherry picked from commit d5d37bcf35617898c872aa041b217bc5d96d8a22)

3 months ago RGW/multisite: add some more debug logs to sync codepath
Oguzhan Ozmen [Tue, 11 Nov 2025 16:12:54 +0000 (16:12 +0000)]
RGW/multisite: add some more debug logs to sync codepath

Signed-off-by: Oguzhan Ozmen <oozmen@bloomberg.net>
(cherry picked from commit 7e12bbc50b68ae045f7d30ecb440545672620090)
resolves rhbz#2412220

3 months ago RGW/multisite/RGWListRemoteBucketCR: clear reused bucket_list_result to avoid stale...
Oguzhan Ozmen [Tue, 11 Nov 2025 16:16:19 +0000 (16:16 +0000)]
RGW/multisite/RGWListRemoteBucketCR: clear reused bucket_list_result to avoid stale listings

RGWBucketFullSyncCR could spin indefinitely when the source bucket was
already deleted. The coroutine reused a bucket_list_result member, and
RGWListRemoteBucketCR populated it without clearing prior state. Stale
entries/is_truncated from a previous iteration caused the loop to
continue even after the bucket no longer existed.

Fix by clearing the provided bucket_list_result at the start of
RGWListRemoteBucketCR (constructor), ensuring each listing starts from a
clean state and reflects the current remote bucket contents.

This prevents the infinite loop and returns correct results when the
bucket has been deleted.
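
A minimal Python model of the stale-state problem and its fix, assuming a result object that is reused across listing calls. The `BucketListResult` class, the `fetch` callable and `list_remote_bucket` are hypothetical illustrations; the real code is the C++ coroutine named above.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# Hypothetical remote call: returns (entries, is_truncated), or None if the bucket is gone.
Fetch = Callable[[], Optional[Tuple[List[str], bool]]]


@dataclass
class BucketListResult:
    entries: List[str] = field(default_factory=list)
    is_truncated: bool = False

    def reset_entries(self) -> None:
        self.entries.clear()
        self.is_truncated = False


def list_remote_bucket(result: BucketListResult, fetch: Fetch) -> None:
    # Clear prior state first so a failed or empty listing cannot leave stale data.
    result.reset_entries()
    page = fetch()
    if page is None:
        return
    result.entries.extend(page[0])
    result.is_truncated = page[1]
```

Without the reset, a second call against a deleted bucket would leave `is_truncated` set from the previous page, and a loop of the form "while result.is_truncated" would never terminate.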

Fixes: https://tracker.ceph.com/issues/73799
Signed-off-by: Oguzhan Ozmen <oozmen@bloomberg.net>
(cherry picked from commit 4e80bbfe05d0d8659a0ab54b578c42deea99a915)
resolves rhbz#2412220

3 months ago RGW/multisite: bucket_list_result object provides a method to reset its entries
Oguzhan Ozmen [Tue, 11 Nov 2025 16:14:41 +0000 (16:14 +0000)]
RGW/multisite: bucket_list_result object provides a method to reset its entries

Add a new method `reset_entries()` to the `bucket_list_result` struct
that clears the list of entries and resets the truncated flag.

This can be used wherever the object is reused, to avoid reading stale
entries or a stale truncated flag.

Signed-off-by: Oguzhan Ozmen <oozmen@bloomberg.net>
(cherry picked from commit 911d0cb2bce2a613657a6aebcb9724cabc23d6eb)
resolves rhbz#2412220
(cherry picked from commit ac270428b90cc411074467db4c5f0dfd2e9e1e59)

3 months ago rgw/datalog: enable fallback implementation of asio co_compose
Shilpa Jagannath [Thu, 13 Nov 2025 19:42:17 +0000 (14:42 -0500)]
rgw/datalog: enable fallback implementation of asio co_compose

(cherry picked from commit ffbef3ad9cb63aca539f51d38cdeabae295b22d3)

3 months ago rgw : Fixing Priority levels of the perfCounters
Harsimran Singh [Wed, 19 Nov 2025 08:27:35 +0000 (13:57 +0530)]
rgw : Fixing Priority levels of the perfCounters

Resolves: rhbz#2411930

(cherry picked from commit 16c668b9e28c503e84cb363eea5a49c0ba0ffd3a)

3 months ago include: detect corrupt frag from byteswap
Patrick Donnelly [Thu, 13 Nov 2025 19:51:20 +0000 (14:51 -0500)]
include: detect corrupt frag from byteswap

Fixes: https://tracker.ceph.com/issues/73792
Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
(cherry picked from commit 59461534437ed0c422d42b265ab0f401053c8a7f)
Resolves: rhbz#2414841

3 months ago common: simplify fragment printing
Patrick Donnelly [Thu, 13 Nov 2025 19:47:24 +0000 (14:47 -0500)]
common: simplify fragment printing

Fixes: https://tracker.ceph.com/issues/73792
Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
(cherry picked from commit a8bb5a891eef8aae11e7748f71eac666732f3cff)
Resolves: rhbz#2414841

3 months ago common: properly convert frag_t to net/store endianness
Patrick Donnelly [Tue, 11 Nov 2025 15:15:03 +0000 (10:15 -0500)]
common: properly convert frag_t to net/store endianness

The MDS/client are already accidentally doing the right thing unless
they are running on a big-endian machine.
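
To illustrate why host-order encoding only works by accident, here is a small Python sketch. It assumes the encoded value is a single 32-bit integer and that the net/store format is little-endian, which the remark about big-endian machines implies but the commit message does not state. The function names are hypothetical.

```python
import struct


def encode_u32_le(value: int) -> bytes:
    """Encode a 32-bit value in little-endian order regardless of host CPU."""
    return struct.pack("<I", value)


def decode_u32_le(data: bytes) -> int:
    """Decode a little-endian 32-bit value regardless of host CPU."""
    return struct.unpack("<I", data)[0]


def encode_u32_native(value: int) -> bytes:
    """Host-order encoding: matches encode_u32_le only on little-endian hosts."""
    return struct.pack("=I", value)
```

On a little-endian host both encoders produce the same bytes, so skipping the explicit conversion goes unnoticed; on a big-endian host `encode_u32_native` yields the bytes in reverse order.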

Fixes: https://tracker.ceph.com/issues/73792
Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
(cherry picked from commit e1c205209aeee636325ab22635f0cade24044f1c)
Resolves: rhbz#2414841

3 months ago mds: include sysinfo in status command output
Patrick Donnelly [Thu, 13 Nov 2025 14:24:19 +0000 (09:24 -0500)]
mds: include sysinfo in status command output

Of particular interest is the CPU architecture.

Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
(cherry picked from commit 57a42cede947168fd633ac370cb846672b911b42)
Resolves: rhbz#2414841

3 months ago include/frag.h: un-inline methods to reduce header dependencies
Max Kellermann [Fri, 25 Oct 2024 16:14:34 +0000 (18:14 +0200)]
include/frag.h: un-inline methods to reduce header dependencies

Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
(cherry picked from commit 5f1a893dc54dc579a8428100496adc27d638aab9)
Resolves: rhbz#2414841

3 months ago rgw/dedup: Prevent the dup-counter from wrapping around after it reaches 64K of ident...
Gabriel BenHanokh [Tue, 18 Nov 2025 21:53:50 +0000 (23:53 +0200)]
rgw/dedup: Prevent the dup-counter from wrapping around after it reaches 64K of identical copies.

Resolves: rhbz#2415656

Signed-off-by: Gabriel BenHanokh <gbenhano@redhat.com>
3 months ago cephadm: add pid file directory option to cfg watch sidecar
John Mulligan [Mon, 10 Nov 2025 21:15:29 +0000 (16:15 -0500)]
cephadm: add pid file directory option to cfg watch sidecar

This will enable the config watch sidecar to signal processes
with a SIGHUP to tell them to reload configuration when config
watch has detected a configuration change. Currently only used
by keybridge.
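
The reload mechanism can be sketched roughly as below. This is an assumption-laden illustration, not the sidecar's code: the function `signal_reload`, the `*.pid` naming and the injectable `send` parameter are all hypothetical.

```python
import os
import signal
from pathlib import Path
from typing import Callable, List


def signal_reload(pid_dir: str, send: Callable[[int, int], None] = os.kill) -> List[int]:
    """Send SIGHUP to every PID found in *.pid files under pid_dir."""
    signalled = []
    for pid_file in sorted(Path(pid_dir).glob("*.pid")):
        text = pid_file.read_text().strip()
        if not text.isdigit():
            continue  # skip empty or malformed pid files
        pid = int(text)
        try:
            send(pid, signal.SIGHUP)
        except ProcessLookupError:
            continue  # stale pid file; the process already exited
        signalled.append(pid)
    return signalled
```

Passing `send` as a parameter keeps the helper testable without signalling real processes; by default it uses `os.kill`.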

Resolves: rhbz#2412278
Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit 54f2c897a68daf93252ea20d070d45e561f26718)

3 months ago cephadm: add pidfile option to smb keybridge sidecar
John Mulligan [Mon, 10 Nov 2025 21:15:19 +0000 (16:15 -0500)]
cephadm: add pidfile option to smb keybridge sidecar

Resolves: rhbz#2412278
Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit a1f98e70c2e209d6b05b7145ab72bcf26e3cf205)

3 months ago rgw/zone: remove duplicated startup logic in RGWSI_Zone
Casey Bodley [Tue, 13 May 2025 13:42:32 +0000 (09:42 -0400)]
rgw/zone: remove duplicated startup logic in RGWSI_Zone

SiteConfig had already loaded the correct configuration without all of
the crazy search_realm_with_zone() stuff, which is now confused by
defaults. Remove all of this duplicated logic and rely on SiteConfig.

Removes functions search_realm_with_zone(), create_default_zg() and
init_default_zone(), which are no longer used.

Fixes: https://tracker.ceph.com/issues/71291
Signed-off-by: Casey Bodley <cbodley@redhat.com>
(cherry picked from commit 17805489404760c8fa45caa71786058a3984ed0d)
resolves rhbz#2412273

Resolved Conflicts:
src/rgw/services/svc_zone.cc
src/rgw/services/svc_zone.h

3 months ago mgr/dashboard: Set max subsystem count to 512 rather than 4096
Afreen Misbah [Mon, 17 Nov 2025 05:01:45 +0000 (10:31 +0530)]
mgr/dashboard: Set max subsystem count to 512 rather than 4096

Resolves: rhbz#2413306

Fixes https://tracker.ceph.com/issues/73867

- regression from https://github.com/ceph/ceph/pull/64477/files
- removing frontend validations as these values are volatile and require changes every release. Nvmeof is setting these and validating as well.

Signed-off-by: Afreen Misbah <afreen@ibm.com>
(cherry picked from commit d29084085bab52ef6eba224d35b58c44b6157ef6)

3 months ago mgr/callhome: fix timestamp from Prometheus
Yaarit Hatuka [Mon, 17 Nov 2025 05:13:19 +0000 (00:13 -0500)]
mgr/callhome: fix timestamp from Prometheus

The timestamp from Prometheus uses nanoseconds while we expect
microseconds.

Resolves: rhbz#2408379
Signed-off-by: Yaarit Hatuka <yhatuka@ibm.com>
3 months ago Revert "python-common/cryptotools: add funcs for call_home_agent crypto activities"
Yaarit Hatuka [Fri, 31 Oct 2025 21:02:26 +0000 (17:02 -0400)]
Revert "python-common/cryptotools: add funcs for call_home_agent crypto activities"

This reverts commit 21230c1d73dd6c684a382f3b19bc043a17ddcc2e.

Resolves: rhbz#2408379

Signed-off-by: Yaarit Hatuka <yhatuka@ibm.com>
3 months ago mgr/cephadm: don't remove TLS certs if svc still has daemons on host
Redouane Kachach [Fri, 14 Nov 2025 12:06:59 +0000 (13:06 +0100)]
mgr/cephadm: don't remove TLS certs if svc still has daemons on host

This change fixes an issue in cephadm where cephadm-signed (and
inline) TLS certificates could be removed while a service was still
running on the same host. During a rolling transition from HTTP to
HTTPS (e.g. RGW moving from port 80 to 443 with ssl: true), the
previous post_remove() logic deletes the service’s cephadm-signed
cert/key as soon as any daemon is removed, even if a new HTTPS daemon
for the same service is being deployed on that host. In practice this
leads to the certificate being created for the new daemon and then
immediately deleted from the certmgr store.

The new behavior makes post_remove() more conservative: before
removing a cephadm-signed or inline certificate, it checks whether
there are any remaining daemons for that service on the same host. If
there are, the cert/key is left in place because it may still be in
use (for example during an HTTP->HTTPS rollout). Certificates are only
cleaned up once the last daemon for that service disappears from the
host (and when the service no longer uses SSL). This preserves
correct TLS behavior during service transitions while still
ensuring certificates are eventually garbage-collected when no longer
needed.
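
The more conservative check can be sketched as a small predicate. This is a hypothetical model, not the cephadm implementation: `should_remove_cert` and the `(service, host)` tuple representation are assumptions, and the additional SSL condition mentioned above is deliberately not modeled.

```python
from typing import Iterable, Tuple

# (service_name, hostname) pairs for daemons that remain after the removal.
Daemon = Tuple[str, str]


def should_remove_cert(service: str, host: str, remaining: Iterable[Daemon]) -> bool:
    """Return True only when no daemon of `service` is left on `host`."""
    return not any(svc == service and h == host for svc, h in remaining)
```

During an HTTP to HTTPS rollout the new daemon of the same service is still present on the host, so the predicate returns False and the cert/key stays in place.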

Fixes: https://tracker.ceph.com/issues/73853
Resolves: rhbz#2408795

Signed-off-by: Redouane Kachach <rkachach@ibm.com>
3 months ago mgr/cephadm: Fix RGW zone endpoint auto-update logic
Aashish Sharma [Wed, 12 Nov 2025 10:57:58 +0000 (16:27 +0530)]
mgr/cephadm: Fix RGW zone endpoint auto-update logic in _update_rgw_endpoints method

Issue: The existing implementation does not re-attempt endpoint updates when no RGW daemons were found for a service or the daemon deployment is still in progress. The zone is being modified with an empty endpoint array in this case.

Fix: Added conditional checks to retry the update if no daemons are found.
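
A rough sketch of the guard, assuming the caller reschedules the update whenever it gets False back. The function name `update_zone_endpoints` and the `apply` callback are hypothetical and do not reflect the actual `_update_rgw_endpoints` signature.

```python
from typing import Callable, List


def update_zone_endpoints(
    daemon_endpoints: List[str],
    apply: Callable[[List[str]], None],
) -> bool:
    """Apply endpoints to the zone; return False when the caller should retry."""
    if not daemon_endpoints:
        # No daemons yet (or still deploying): leave the zone untouched and retry.
        return False
    apply(sorted(set(daemon_endpoints)))
    return True
```

The key point is that an empty endpoint list never reaches `apply`, so the zone is not overwritten with an empty array while daemons are still being deployed.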

Fixes: https://tracker.ceph.com/issues/73814
Resolves: rhbz#2406510
Resolves: rhbz#2406696

Signed-off-by: Aashish Sharma <aasharma@redhat.com>
(cherry picked from commit c8e878fe5f54b1cee3637f99787950f4616d8041)

3 months ago rgw/sts: correct error code to 400 (from 403)
Pritha Srivastava [Thu, 9 Oct 2025 06:05:13 +0000 (11:35 +0530)]
rgw/sts: correct error code to 400 (from 403)
for expired sts credentials.

Fixes: https://tracker.ceph.com/issues/73441
Resolves: rhbz#2402526

Signed-off-by: Pritha Srivastava <prsrivas@redhat.com>
3 months ago osd: Make scrub determine the correct object size.
Alex Ainscow [Thu, 13 Nov 2025 11:22:07 +0000 (11:22 +0000)]
osd: Make scrub determine the correct object size.

Permits shard sizes to be either legacy or new after upgrade

Resolves: rhbz#2400427

Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
(cherry picked from commit 9609410db0df8879af4c8aacf2ff37b141d1bbcd)

3 months ago mgr/nfs: Cephadm support for NFS-Ganesha TLS configuration adding new option xprtsec
Shweta Bhosale [Tue, 11 Nov 2025 07:47:31 +0000 (13:17 +0530)]
mgr/nfs: Cephadm support for NFS-Ganesha TLS configuration adding new option xprtsec

Fixes: https://tracker.ceph.com/issues/73774
Signed-off-by: Shweta Bhosale <Shweta.Bhosale1@ibm.com>
Resolves: rhbz#2413723

 Conflicts:
src/pybind/mgr/nfs/export.py
src/pybind/mgr/nfs/ganesha_conf.py
src/pybind/mgr/nfs/module.py

3 months ago mgr/dashboard: hide multi-cluster context switcher
Aashish Sharma [Wed, 12 Nov 2025 13:18:43 +0000 (18:48 +0530)]
mgr/dashboard: hide multi-cluster context switcher

We need to hide multi-cluster context-switcher from the downstream 9.0
clusters in case a cluster with an existing multi-cluster setup upgrades
to this version.

Resolves: rhbz#2406182

Signed-off-by: Aashish Sharma <aasharma@redhat.com>
3 months ago mgr/dashboard: allow deletion of non-default zone and zonegroup
Aashish Sharma [Tue, 4 Nov 2025 08:49:03 +0000 (14:19 +0530)]
mgr/dashboard: allow deletion of non-default zone and zonegroup

Fixes: https://tracker.ceph.com/issues/73708
Resolves: rhbz#2406519

Signed-off-by: Aashish Sharma <aasharma@redhat.com>
(cherry picked from commit c59f5afa7b311d27edc8f9399ed1845993219d14)
(cherry picked from commit b90e37453d796a224b51d3db4f23016454d8e781)

3 months ago mgr/cephadm: add tombstones to persist certs info after mgr failover
Redouane Kachach [Thu, 23 Oct 2025 11:10:49 +0000 (13:10 +0200)]
mgr/cephadm: add tombstones to persist certs info after mgr failover

Runtime-added TLS object names were lost across mgr restarts/failovers
since they existed only in memory. We now write a tombstone to the KV
store whenever a new certificate is registered (empty map for
service/host scope; minimal JSON for global), so the object name is
restored during load().
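
The tombstone idea can be modeled in a few lines. This sketch is an illustration only: `CertStoreModel`, the key prefix and the plain dict standing in for the KV store are hypothetical, and it writes an empty map in every case rather than distinguishing global scope.

```python
import json
from typing import Dict


class CertStoreModel:
    """Toy model of a cert registry backed by a key/value store (a plain dict here)."""

    PREFIX = "cert_store.cert."  # hypothetical key prefix

    def __init__(self, kv: Dict[str, str]) -> None:
        self.kv = kv
        self.known_names = set()

    def register(self, name: str) -> None:
        self.known_names.add(name)
        # Tombstone: an empty placeholder so the name outlives this process.
        self.kv.setdefault(self.PREFIX + name, json.dumps({}))

    def load(self) -> None:
        # Restore every registered name from the persisted keys.
        for key in self.kv:
            if key.startswith(self.PREFIX):
                self.known_names.add(key[len(self.PREFIX):])
```

A second instance built on the same store knows nothing about runtime-added names until `load()` runs, which mirrors the failover scenario.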

Fixes: https://tracker.ceph.com/issues/73625
Resolves: rhbz#2404347

Signed-off-by: Redouane Kachach <rkachach@ibm.com>
(cherry picked from commit b9f81a682db2638698b09c6386d9f94c0fae7223)

3 months ago mgr/cephadm: fix objects_by_names initialization + some improvements
Redouane Kachach [Thu, 23 Oct 2025 11:06:08 +0000 (13:06 +0200)]
mgr/cephadm: fix objects_by_names initialization + some improvements

This commit includes the following changes:

1. Fix objects_by_names initialization, as we were using a dict for all
cases, including global-scoped objects, which is not correct. For
those cases an instance of an empty TLSObject must be used.
2. Add sanity checks to the load() method to avoid loading incorrect
and malformed entries.
3. Add some helper functions to avoid code repetition

Fixes: https://tracker.ceph.com/issues/73625
Resolves: rhbz#2404347

Signed-off-by: Redouane Kachach <rkachach@ibm.com>
(cherry picked from commit d06d06fda84363a57fe0c66339fcc2ebd254f4b1)

3 months ago mgr/cephadm: update grafana conf for disconnected environment
Nizamudeen A [Fri, 17 Oct 2025 03:36:20 +0000 (09:06 +0530)]
mgr/cephadm: update grafana conf for disconnected environment

Resolves: rhbz#2346107

Fixes: https://tracker.ceph.com/issues/70070
Signed-off-by: Nizamudeen A <nia@redhat.com>
(cherry picked from commit fb6dbbf11b73d4806df3f84906526355261e152d)
(cherry picked from commit 3878143bfe199031972af42155696d2a85b92e2f)

3 months ago rgw: add missing logic, backport (multi-delete optimization)
Matt Benjamin [Tue, 11 Nov 2025 19:55:28 +0000 (14:55 -0500)]
rgw: add missing logic, backport (multi-delete optimization)

Resolves: rhbz#2387764

Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
3 months ago rgw/dedup: remove the timeout limit when calling rgw_rados_notify
Gabriel BenHanokh [Tue, 11 Nov 2025 10:59:06 +0000 (12:59 +0200)]
rgw/dedup: remove the timeout limit when calling rgw_rados_notify

Resolves: rhbz#2413802

Signed-off-by: Gabriel BenHanokh <gbenhano@redhat.com>
3 months ago rgw/dedup: skip delete-markers
Gabriel BenHanokh [Tue, 11 Nov 2025 10:23:03 +0000 (12:23 +0200)]
rgw/dedup: skip delete-markers

Resolves: rhbz#2413062

Signed-off-by: Gabriel BenHanokh <gbenhano@redhat.com>
3 months ago Fixing unused variable in the test
Harsimran Singh [Wed, 29 Oct 2025 12:27:04 +0000 (17:57 +0530)]
Fixing unused variable in the test

Signed-off-by: Harsimran Singh <hsthukral51@gmail.com>
(cherry picked from commit 0c8b668009488911a959c26530e42d2f9e3ad0a1)
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
Resolves: rhbz#2402146
Resolves: rhbz#2411930

3 months ago :Fixing issue with sync_owner_stats() method call
Harsimran Singh [Wed, 22 Oct 2025 05:16:53 +0000 (10:46 +0530)]
:Fixing issue with sync_owner_stats() method call

Signed-off-by: Harsimran Singh <hsthukral51@gmail.com>
(cherry picked from commit b8d7601cd8c1127e152227ca13c6680a1768c295)
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
Resolves: rhbz#2402146
Resolves: rhbz#2411930

3 months ago Fixing Stress Test
Harsimran Singh [Tue, 30 Sep 2025 09:23:50 +0000 (14:53 +0530)]
Fixing Stress Test

Signed-off-by: Harsimran Singh <hsthukral51@gmail.com>
(cherry picked from commit 184cad26575cb73203d1073c82829040d6822d45)
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
Resolves: rhbz#2402146
Resolves: rhbz#2411930

3 months ago rgw: Fix virtual-hosted style URL
Soumya Koduri [Mon, 27 Oct 2025 05:54:25 +0000 (11:24 +0530)]
rgw: Fix virtual-hosted style URL

If the host_style is set to "Virtual", the endpoint URL and
host should be set to "<bucket_name>.host" while sending
requests.
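
As an illustration of the "<bucket_name>.host" form, the sketch below rewrites an endpoint URL in Python. The helper name `virtual_hosted_endpoint` is hypothetical, and the sketch assumes the endpoint carries no user-info component.

```python
from urllib.parse import urlsplit, urlunsplit


def virtual_hosted_endpoint(endpoint: str, bucket: str) -> str:
    """Prefix the endpoint's host with the bucket name: <bucket_name>.host."""
    parts = urlsplit(endpoint)
    return urlunsplit(parts._replace(netloc=f"{bucket}.{parts.netloc}"))
```

For example, an endpoint of `https://s3.example.com` and a bucket named `mybucket` give `https://mybucket.s3.example.com`; the same value would be used for the Host header.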

Resolves: rhbz#2402662

Signed-off-by: Soumya Koduri <skoduri@redhat.com>
(cherry picked from commit a3cba0372126b2e99ffbe46d729a8867754baa71)

3 months ago rgw/cloud-transition: Include LocationConstraint for non-default regions
Soumya Koduri [Wed, 29 Oct 2025 20:00:11 +0000 (01:30 +0530)]
rgw/cloud-transition: Include LocationConstraint for non-default regions

Add a new tier-config option "location_constraint" to be configured
while using AWS non-default regions.
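
For context, the S3 CreateBucket request carries the region in a `CreateBucketConfiguration` XML body. The sketch below builds such a body in Python; how RGW assembles its own request is not shown in the commit message, so the helper `create_bucket_body` is purely a hypothetical illustration.

```python
import xml.etree.ElementTree as ET

S3_NS = "http://s3.amazonaws.com/doc/2006-03-01/"


def create_bucket_body(location_constraint: str) -> bytes:
    """Build a CreateBucketConfiguration document; empty when no region is set."""
    if not location_constraint:
        return b""
    root = ET.Element("CreateBucketConfiguration", xmlns=S3_NS)
    ET.SubElement(root, "LocationConstraint").text = location_constraint
    return ET.tostring(root)
```

With the option left empty, no body is sent, which corresponds to creating the bucket in the default region.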

Resolves: rhbz#2402662

Signed-off-by: Soumya Koduri <skoduri@redhat.com>
(cherry picked from commit 7bca6467c783a7f33209721ea20d728e58daf457)

3 months ago mgr/cephadm: increase default backend health check interval for NFS
Shweta Bhosale [Tue, 4 Nov 2025 13:54:27 +0000 (19:24 +0530)]
mgr/cephadm: increase default backend health check interval for NFS

Fixes: https://tracker.ceph.com/issues/73712
Signed-off-by: Shweta Bhosale <Shweta.Bhosale1@ibm.com>
Resolves: rhbz#2400121

3 months ago mgr/dashboard: fix oauth2-service creation UI error
Nizamudeen A [Wed, 5 Nov 2025 04:19:04 +0000 (09:49 +0530)]
mgr/dashboard: fix oauth2-service creation UI error

While creating the service without providing the allowlist domain, the
UI fails with an error which is logged in the mgr log

```
Nov 05 04:11:56 ceph-node-00 ceph-mgr[1587]: [dashboard ERROR frontend.error] (https://192.168.100.100:8443/#/services/(modal:create)): Cannot read properties of null (reading 'split')
                                              TypeError: Cannot read properties of null (reading 'split')
                                                 at ServiceFormComponent.onSubmit (https://192.168.100.100:8443/src_bootstrap_ts.js:31997:74)
                                                 at ServiceFormComponent_Template_cd_form_button_panel_submitActionEvent_60_listener (https://192.168.100.100:8443/src_bootstrap_ts.js:34168:83)
                                                 at executeListenerWithErrorHandling (https://192.168.100.100:8443/node_modules_angular_core_fesm2022_core_mjs.js:26276:12)
                                                 at Object.wrapListenerIn_markDirtyAndPreventDefault [as next] (https://192.168.100.100:8443/node_modules_angular_core_fesm2022_core_mjs.js:26308:18)
                                                 at SafeSubscriber.__tryOrUnsub (https://192.168.100.100:8443/default-node_modules_rxjs__esm2015_internal_AsyncSubject_js-node_modules_rxjs__esm2015_intern-7c6e1a.js:960:10)
                                                 at SafeSubscriber.next (https://192.168.100.100:8443/default-node_modules_rxjs__esm2015_internal_AsyncSubject_js-node_modules_rxjs__esm2015_intern-7c6e1a.js:900:14)
                                                 at Subscriber._next (https://192.168.100.100:8443/default-node_modules_rxjs__esm2015_internal_AsyncSubject_js-node_modules_rxjs__esm2015_intern-7c6e1a.js:847:22)
                                                 at Subscriber.next (https://192.168.100.100:8443/default-node_modules_rxjs__esm2015_internal_AsyncSubject_js-node_modules_rxjs__esm2015_intern-7c6e1a.js:824:12)
                                                 at EventEmitter_.next (https://192.168.100.100:8443/default-node_modules_rxjs__esm2015_internal_AsyncSubject_js-node_modules_rxjs__esm2015_intern-7c6e1a.js:604:17)
                                                 at EventEmitter_.emit (https://192.168.100.100:8443/node_modules_angular_core_fesm2022_core_mjs.js:7069:13)
```

Resolves: rhbz#2412238

Fixes: https://tracker.ceph.com/issues/73717
Signed-off-by: Nizamudeen A <nia@redhat.com>
(cherry picked from commit f213d84b947b0f9d98181460d5efca74c34c099a)
(cherry picked from commit b52504abcba2ac9cab046b4c973f2e7807f18d5c)

3 months ago mgr/alerts: enforce ssl context to SMTP_SSL
Nizamudeen A [Thu, 30 Oct 2025 04:35:04 +0000 (10:05 +0530)]
mgr/alerts: enforce ssl context to SMTP_SSL
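
As a general illustration of passing an explicit TLS context to `smtplib.SMTP_SSL`, consider the sketch below. The helper names are hypothetical and the actual change in the alerts module may differ; the point is only that `ssl.create_default_context()` enables certificate and hostname verification.

```python
import smtplib
import ssl


def make_tls_context() -> ssl.SSLContext:
    # create_default_context() enables certificate and hostname verification.
    return ssl.create_default_context()


def open_smtp_ssl(host: str, port: int = 465) -> smtplib.SMTP_SSL:
    """Connect with an explicit, verifying TLS context."""
    return smtplib.SMTP_SSL(host, port, context=make_tls_context())
```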

Resolves: rhbz#2392901

Fixes: https://github.com/ceph/ceph/security/advisories/GHSA-xj9f-7g59-m4jx
Signed-off-by: Nizamudeen A <nia@redhat.com>
(cherry picked from commit 5f7fc5267e55089eeb1cfc87e9c1215c32439102)
(cherry picked from commit 1167b9de50c8e79e8f3d09014e4d78004abf7547)

3 months ago Check if `HTTP_X_AMZ_COPY_SOURCE` header is empty
Suyash Dongre [Wed, 20 Aug 2025 17:52:41 +0000 (23:22 +0530)]
Check if `HTTP_X_AMZ_COPY_SOURCE` header is empty

The issue was that the `HTTP_X_AMZ_COPY_SOURCE` header could be present but empty (i.e., an empty string rather than NULL). The code only checked that the pointer was not NULL, but didn't verify that the string had content. When an empty string was passed to RGWCopyObj::parse_copy_location(), it would eventually try to access name_str[0] on an empty string, causing a crash.
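
The difference between "present" and "present with content" can be shown with a short Python analogue. The function `get_copy_source` and the dict standing in for the request environment are hypothetical; the real check lives in the C++ request handling.

```python
from typing import Mapping, Optional


def get_copy_source(env: Mapping[str, str]) -> Optional[str]:
    """Return the copy source, treating an empty header like a missing one."""
    value = env.get("HTTP_X_AMZ_COPY_SOURCE")
    if not value:  # covers both None (absent) and "" (present but empty)
        return None
    return value
```

Only a non-empty value is handed on to the parsing step, so indexing the first character is always safe.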

Fixes: https://tracker.ceph.com/issues/72669
Resolves: rhbz#2412218

Signed-off-by: Suyash Dongre <suyashd999@gmail.com>
(cherry picked from commit bef59f17293e6e93af025eba1e00646d0b1a2bf0)

3 months ago src/rgw_bucket.cc: include restore stats in the same json dict for bucket stats
Jiffin Tony Thottan [Fri, 17 Oct 2025 09:00:24 +0000 (14:30 +0530)]
src/rgw_bucket.cc: include restore stats in the same json dict for bucket stats

Resolves: rhbz#2403699

Signed-off-by: Jiffin Tony Thottan <thottanjiffin@gmail.com>
(cherry picked from commit 1085dfca361fc880c14542ea49094753e303ffc8)

3 months ago mgr/dashboard: fix icon alignment in navigation header
Naman Munet [Wed, 29 Oct 2025 10:44:53 +0000 (16:14 +0530)]
mgr/dashboard: fix icon alignment in navigation header

Fixes: https://tracker.ceph.com/issues/73665
Resolves: rhbz#2404088

Changes include:
Added styles in rh_overrides for btn-tertiary to fix the styles on the multisite page, and added the `btn-group` class to make the buttons look as they did before.
Regression introduced by the previous PR https://gitlab.cee.redhat.com/ceph/ceph/-/merge_requests/1323

Signed-off-by: Naman Munet <naman.munet@ibm.com>
(cherry picked from commit b4d61dcd304ef9040e3b53a4730bf1ab451b233f)
(cherry picked from commit 5af064e6b3ccc19019c6fbf2b5749e853b703e4f)

3 months agomgr/dashboard: Remove Multi-Cluster Tab from Dashboard
Aashish Sharma [Fri, 24 Oct 2025 09:08:32 +0000 (14:38 +0530)]
mgr/dashboard: Remove Multi-Cluster Tab from Dashboard

With the introduction of the new Multi-Cluster manager, we need to remove the
multi-cluster references from the UI:

1. Multi-Cluster tab from side navigation
2. Multi-cluster reference from Multi-site replication wizard

Resolves: rhbz#2406182

Signed-off-by: Aashish Sharma <aasharma@redhat.com>
3 months agocephadm: mount nvmeof conf under /src/
Adam King [Wed, 29 Oct 2025 19:27:09 +0000 (15:27 -0400)]
cephadm: mount nvmeof conf under /src/

The current downstream nvmeof container builds for
9.0 seem to be using /src/ as the home directory for
the container rather than /remote-source/ceph-nvmeof/app/

This is effectively the reverse of the issue
seen in https://bugzilla.redhat.com/show_bug.cgi?id=2240588

Resolves: rhbz#2406481

Signed-off-by: Adam King <adking@redhat.com>
3 months agomgr/dashboard: Edit user via UI throwing multiple server errors
Naman Munet [Fri, 24 Oct 2025 05:59:09 +0000 (11:29 +0530)]
mgr/dashboard: Edit user via UI throwing multiple server errors

Fixes: https://tracker.ceph.com/issues/73637
Resolves: rhbz#2403703

Commit includes:
Returning the default user ratelimit when the ratelimit for user is not set, hence eliminating the 500 error on UI

Signed-off-by: Naman Munet <naman.munet@ibm.com>
(cherry picked from commit b72071af0beb30ff022bdfa9b9f970309438632a)
(cherry picked from commit 4755f742247b51f28b1a0be56f30d9f443021b3e)

3 months agomgr/cephadm: For updating NFS backends in HAProxy, send a SIGHUP signal to reload...
Shweta Bhosale [Fri, 24 Oct 2025 11:00:16 +0000 (16:30 +0530)]
mgr/cephadm: For updating NFS backends in HAProxy, send a SIGHUP signal to reload the configuration instead of restart
Fixes: https://tracker.ceph.com/issues/73633
Signed-off-by: Shweta Bhosale <Shweta.Bhosale1@ibm.com>
Resolves: rhbz#2401776

 Conflicts:
src/pybind/mgr/cephadm/serve.py

3 months agomgr/cephadm: Stop NFS service/daemon from starting automatically after reboot, cephad...
Shweta Bhosale [Thu, 23 Oct 2025 05:50:16 +0000 (11:20 +0530)]
mgr/cephadm: Stop NFS service/daemon from starting automatically after reboot, cephadm to manage startup

Fixes: https://tracker.ceph.com/issues/73442
Signed-off-by: Shweta Bhosale <Shweta.Bhosale1@ibm.com>
Resolves: rhbz#2377090

 Conflicts:
src/pybind/mgr/cephadm/serve.py

3 months agoRevert "mgr/cephadm: Restart nfs daemon when same rank daemon is removed"
Shweta Bhosale [Wed, 29 Oct 2025 06:30:26 +0000 (12:00 +0530)]
Revert "mgr/cephadm: Restart nfs daemon when same rank daemon is removed"

This reverts commit 6fc3ac33e179fb67a6435a1bfbbb6c064d53976d.

3 months agofix unreachable code path
Kushal Deb [Wed, 10 Sep 2025 09:53:44 +0000 (15:23 +0530)]
fix unreachable code path

Signed-off-by: Kushal Deb <Kushal.Deb@ibm.com>
Resolves: rhbz#2279848
(cherry-picked from commit 580bf32f4c70699dddc65f571da3c902f66a5b54)

4 months agorgw/dedup: fixes an assertion failure from __snprintf_chk in fortified mode when...
Gabriel BenHanokh [Sun, 26 Oct 2025 15:51:53 +0000 (17:51 +0200)]
rgw/dedup: fixes an assertion failure from __snprintf_chk in fortified mode when handling dedup cluster shard token OIDs.
The issue stems from buffer size validation in string operations.

Resolves: rhbz#2405986

Signed-off-by: Gabriel BenHanokh <gbenhano@redhat.com>
4 months agorgw: fix new_key for swift users.
Shilpa Jagannath [Thu, 23 Oct 2025 22:13:44 +0000 (18:13 -0400)]
rgw: fix new_key for swift users.

Signed-off-by: Shilpa Jagannath <smanjara@redhat.com>
(cherry picked from commit 34907631f25f8e7c56be30aa73afb4978056c759)
resolves rhbz#2403487

4 months agorgw/dedup:
Gabriel BenHanokh [Thu, 23 Oct 2025 10:11:51 +0000 (13:11 +0300)]
rgw/dedup:
Resolves: rhbz#2401399

Signed-off-by: Gabriel BenHanokh <gbenhano@redhat.com>
4 months agomonitoring: Fix Filesystem grafana dashboard units
Ankush Behl [Mon, 13 Oct 2025 12:43:00 +0000 (18:13 +0530)]
monitoring: Fix Filesystem grafana dashboard units

Fixes: https://tracker.ceph.com/issues/73521
Resolves: rhbz#2405989

Signed-off-by: Ankush Behl <cloudbehl@gmail.com>
(cherry picked from commit 3174b4ee9a92917d353e6f9ccf4cda598f3d6c18)
(cherry picked from commit 917433784522895f1294a001679b7d1f984d9b51)

4 months agomgr/dashboard: Fix timestamps in APIs
Afreen Misbah [Tue, 21 Oct 2025 18:20:19 +0000 (23:50 +0530)]
mgr/dashboard: Fix timestamps in APIs

Resolves: rhbz#2403686

- remove 'Z' from rbd APIs, which now return an `aware` timestamp
- `datetime.utcfromtimestamp` is deprecated, so use `datetime.fromtimestamp(timestamp, tz=timezone.utc)`, thereby returning only an `aware` timestamp and removing 'Z'.
- similarly, `datetime.utcnow()` is deprecated; migrated to `datetime.now(timezone.utc)`
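
The migration in the list above can be sketched with the standard library alone. The helper names are hypothetical and not taken from the dashboard code; the calls themselves are the documented replacements for the deprecated ones.

```python
from datetime import datetime, timezone


def aware_utc_from_timestamp(timestamp: float) -> datetime:
    """Replacement for datetime.utcfromtimestamp(): returns an aware UTC datetime."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc)


def aware_utc_now() -> datetime:
    """Replacement for datetime.utcnow(): returns an aware UTC datetime."""
    return datetime.now(timezone.utc)


stamp = aware_utc_from_timestamp(0)
print(stamp.isoformat())  # 1970-01-01T00:00:00+00:00
```

An aware timestamp already carries its `+00:00` offset when formatted with `isoformat()`, which is presumably why a hand-appended 'Z' is no longer wanted.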

https://docs.python.org/3/library/datetime.html#datetime.datetime.utcnow
https://docs.python.org/3/library/datetime.html#datetime.datetime.utcfromtimestamp

Signed-off-by: Afreen Misbah <afreen@ibm.com>
(cherry picked from commit 9b606ad89683c2f196603fc094eb8d4ae96bb5f2)
(cherry picked from commit d6282398de584baa74b7a57e832163f4c8b95521)

4 months agomgr/smb: use lazy_init to create the rados store
Sachin Prabhu [Mon, 18 Aug 2025 17:17:58 +0000 (18:17 +0100)]
mgr/smb: use lazy_init to create the rados store

The rados store is created when the service is deployed.

Also fix a typo identified in _lazy_init()

Signed-off-by: Sachin Prabhu <sp@spui.uk>
(cherry picked from commit 4fc592eb64cce763f155d485330ea7cc342b3eb7)

Resolves: rhbz#2380412

Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
4 months agomonitoring: Fixes for smb overview
Ankush Behl [Tue, 14 Oct 2025 10:39:45 +0000 (16:09 +0530)]
monitoring: Fixes for smb overview

Fixes: https://tracker.ceph.com/issues/73535
Resolves: rhbz#2405692

Signed-off-by: Ankush Behl <cloudbehl@gmail.com>
(cherry picked from commit 39eabe530509312872a5cddd7ab180964a5996b9)
(cherry picked from commit 97002532ca93e8a714e9f7c736a84a1320bbc37a)

4 months agorgw/multisite: check the local bucket's versioning status when replicating deletion...
Jane Zhu [Tue, 14 Oct 2025 21:57:08 +0000 (21:57 +0000)]
rgw/multisite: check the local bucket's versioning status when replicating deletion from remote

Signed-off-by: Jane Zhu <jzhu116@bloomberg.net>
(cherry picked from commit 7e3d493dc3240fc7c8b2976e0de09cf2ecaebd99)
resolves rhbz#2405377

4 months agocall_home: add service events
Yaarit Hatuka [Mon, 4 Aug 2025 00:28:21 +0000 (20:28 -0400)]
call_home: add service events

Open a support case when these events are detected by Prometheus:

CephMonDiskspaceCritical
CephOSDFull
CephFilesystemOffline
CephFilesystemDamaged
CephPGsInactive
CephObjectMissing

Users need to populate the IBM Customer Number (ICN) and
customer_country_code fields.

Logs of level 1 and 2 will be uploaded to the ticket once it has been opened.

Resolves: rhbz#2242911
Signed-off-by: Yaarit Hatuka <yhatuka@ibm.com>
(cherry picked from commit 66f20c29299047d2e365da5a6779b287c2f572af)

4 months agomgr/cephadm: remove logging of service specs in osd.py
Adam King [Fri, 17 Oct 2025 02:31:57 +0000 (22:31 -0400)]
mgr/cephadm: remove logging of service specs in osd.py

Resolves: rhbz#2402769

Signed-off-by: Adam King <adking@redhat.com>
4 months agoceph-volume: lvm.Lvm.setup_metadata_devices refactor
Guillaume Abrioux [Thu, 9 Oct 2025 07:31:58 +0000 (09:31 +0200)]
ceph-volume: lvm.Lvm.setup_metadata_devices refactor

This commit refactors setup_metadata_devices into smaller helper methods.
It keeps the distinction between existing logical volumes and raw devices
explicit, centralizes tag handling and path assignment to make the
control flow obvious and separates responsibilities for checking, creating,
and tagging devices.

Fixes: https://tracker.ceph.com/issues/73445
Resolves: rhbz#2401806

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit f6d2b20dbb7ba18dcd137990dd1637794a8f0d70)
(cherry picked from commit 7fa9223335a58ba2c8782d7bec99c8fe29dd09df)

4 months agopython-common/cryptotools: add funcs for call_home_agent crypto activities
Adam King [Fri, 10 Oct 2025 20:46:03 +0000 (16:46 -0400)]
python-common/cryptotools: add funcs for call_home_agent crypto activities

So that cephadm and the call_home_agent modules aren't both attempting
to import cryptography libraries that cause https://tracker.ceph.com/issues/64213

Resolves: rhbz#2402769

Signed-off-by: Adam King <adking@redhat.com>
4 months agomgr/dashboard: add customizations to table-actions
Naman Munet [Thu, 28 Aug 2025 09:12:00 +0000 (14:42 +0530)]
mgr/dashboard: add customizations to table-actions

Fixes: https://tracker.ceph.com/issues/72764
Resolves: rhbz#2404088

Signed-off-by: Naman Munet <naman.munet@ibm.com>
(cherry picked from commit dd06c75e3fe429be496f886aac54831dd7685e5e)

 Conflicts:
src/pybind/mgr/dashboard/frontend/src/app/shared/datatable/table-actions/table-actions.component.html

(cherry picked from commit bcc3d2017c20f4b54e53a87c7684fc6925a254ed)

 Conflicts:
src/pybind/mgr/dashboard/frontend/src/app/ceph/rgw/rgw-multisite-details/rgw-multisite-details.component.html

4 months agomgr/dashboard : Fixed usage bar for secondary site in rbd mirroring
Abhishek Desai [Thu, 9 Oct 2025 07:49:34 +0000 (13:19 +0530)]
mgr/dashboard : Fixed usage bar for secondary site in rbd mirroring
Resolves : rhbz#2383217
fixes : https://tracker.ceph.com/issues/73447
Signed-off-by: Abhishek Desai <abhishek.desai1@ibm.com>
(cherry picked from commit 60140b1ccc8006325632320e39fc209724524aef)
(cherry picked from commit 87fc968391636cf294cf878e71a5b21357ff564f)

4 months agomgr/dashboard : Fix secure-monitoring-stack creds issue
Abhishek Desai [Wed, 8 Oct 2025 07:10:22 +0000 (12:40 +0530)]
mgr/dashboard : Fix secure-monitoring-stack creds issue
Resolves: rhbz#2346779
Fixes : https://tracker.ceph.com/issues/73379

Signed-off-by: Abhishek Desai <abhishek.desai1@ibm.com>
(cherry picked from commit 01cb6886bef8a9c8a2c2946fcb7265575e9375d2)
(cherry picked from commit 2140a8fcbe862187c15bac80a39074bdb841271c)

4 months agomgr/dashboard : Skip calls until secure_monitoring_stack is enabled
Abhishek Desai [Tue, 19 Aug 2025 07:11:19 +0000 (12:41 +0530)]
mgr/dashboard : Skip calls until secure_monitoring_stack is enabled
fixes : https://tracker.ceph.com/issues/72635

Signed-off-by: Abhishek Desai <abhishek.desai1@ibm.com>
(cherry picked from commit 42ce56f4b96a46f01d2078132ced40182aa30d68)
(cherry picked from commit f6d0fa2f043df8c0f5b4f9b11026fcddcd734409)

4 months agomgr/cephadm: configurable per-service stop timeout
Kushal Deb [Tue, 9 Sep 2025 12:02:00 +0000 (17:32 +0530)]
mgr/cephadm: configurable per-service stop timeout

Introduce a termination_grace_period field in service spec to define how long the
orchestrator should wait for a service to shut down gracefully before forcefully terminating it.
The value is plumbed mgr -> cephadm and written into 'unit.stop' as 'podman stop -t <N>'
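
A rough sketch of the last step, turning the spec value into the stop command. The function name, the container name, and the fallback behaviour when the field is unset are assumptions for illustration; the actual cephadm code is not shown in this message.

```python
from typing import Optional


def render_stop_command(container_name: str,
                        termination_grace_period: Optional[int] = None) -> str:
    """Build a 'podman stop' line, adding '-t <N>' only when a period is set."""
    cmd = ["podman", "stop"]
    if termination_grace_period is not None:
        if termination_grace_period < 0:
            raise ValueError("termination_grace_period must be non-negative")
        cmd += ["-t", str(termination_grace_period)]
    cmd.append(container_name)
    return " ".join(cmd)


# Hypothetical container name, used only for the example.
print(render_stop_command("ceph-nfs-example", 30))  # podman stop -t 30 ceph-nfs-example
```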

Resolves: rhbz#2387066

Signed-off-by: Kushal Deb <Kushal.Deb@ibm.com>
(cherry picked from commit a2f2696398c3813cfa420050be5cfd901ab3a4a3)

Conflicts:
src/cephadm/cephadm.py
src/pybind/mgr/cephadm/serve.py
src/python-common/ceph/tests/test_service_spec.py

4 months agoFrom efec6856a67b1525606b4cc2cd2861e30ddf0c48 Mon Sep 17 00:00:00 2001
Matt Benjamin [Tue, 14 Oct 2025 13:21:08 +0000 (09:21 -0400)]
From efec6856a67b1525606b4cc2cd2861e30ddf0c48 Mon Sep 17 00:00:00 2001
From: Gabriel BenHanokh <gbenhano@redhat.com>
Date: Mon, 15 Sep 2025 06:58:23 +0000
Subject: [PATCH] rgw/dedup: add throttling mechanism

Signed-off-by: Gabriel BenHanokh <gbenhano@redhat.com>
rgw/dedup: Change throttle code to work lock free and remove the atomic
from the timestamp

Resolves: rhbz#2401399

Signed-off-by: Gabriel BenHanokh <gbenhano@redhat.com>
(cherry picked from commit 16ad586dac47fe9d490ed42a8c93072593b699d3)

Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
4 months agomgr/dashboard: Fix indentation of Release in about panel
Afreen Misbah [Tue, 14 Oct 2025 10:44:52 +0000 (16:14 +0530)]
mgr/dashboard: Fix indentation of Release in about panel

Resolves: rhbz#2107518

Signed-off-by: Afreen Misbah <afreen@ibm.com>
4 months agomgr/dashboard: Remove the time dropdown from grafana iframe.
Abhishek Desai [Thu, 3 Jul 2025 08:25:52 +0000 (13:55 +0530)]
mgr/dashboard: Remove the time dropdown from grafana iframe.
Resolves: rhbz#2393264
fixes: https://tracker.ceph.com/issues/71907

Signed-off-by: Abhishek Desai <abhishek.desai1@ibm.com>
(cherry picked from commit 8580fd50d8e1c6ce34d6eba0fe3f7e0d82ca02e3)
(cherry picked from commit fe218974727d3f4b79c272e90fe15d60e3b5a4e4)

4 months agomgr/dashboard: fix form modals background
Nizamudeen A [Tue, 14 Oct 2025 07:15:30 +0000 (12:45 +0530)]
mgr/dashboard: fix form modals background

The about modal's background is overridden globally. Restrict that to just
the modal component.

Resolves: rhbz#2403700

Signed-off-by: Nizamudeen A <nia@redhat.com>
4 months agomgr/cephadm: Add stick table and haproxy peers in haproxy.cfg for NFS to support...
Shweta Bhosale [Mon, 15 Sep 2025 16:16:21 +0000 (21:46 +0530)]
mgr/cephadm: Add stick table and haproxy peers in haproxy.cfg for NFS to support nfs active-active cluster

Fixes: https://tracker.ceph.com/issues/72906
Signed-off-by: Shweta Bhosale <Shweta.Bhosale1@ibm.com>
Resolves: rhbz#2388477

 Conflicts:
src/pybind/mgr/cephadm/schedule.py
src/pybind/mgr/cephadm/serve.py
src/pybind/mgr/cephadm/services/cephadmservice.py

4 months agomgr/cephadm: add the VIP to the internal mgmt-gateway cert SAN list
Redouane Kachach [Thu, 9 Oct 2025 08:55:50 +0000 (10:55 +0200)]
mgr/cephadm: add the VIP to the internal mgmt-gateway cert SAN list

Include the VIP as part of the mgmt-gateway internal server
certificate SAN list when operating in HA mode. Otherwise
the communication between internal services might fail.

Fixes: https://tracker.ceph.com/issues/73384
Resolves: rhbz#2402468

Signed-off-by: Redouane Kachach <rkachach@ibm.com>
(cherry picked from commit f63ebe811b0e3c7c090e35c6d2502c804a0bbec1)

4 months agorgw/logging: fix race condition when name update returns ECANCELED
Yuval Lifshitz [Sun, 12 Oct 2025 14:14:36 +0000 (14:14 +0000)]
rgw/logging: fix race condition when name update returns ECANCELED

* when we get ECANCELED indication from the name set operation we should
  bail out and not continue with the rollover
* this fix revealed a hidden bug where we do not check the existing temp
  name when we do conf change cleanup (rollover)
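
The first point can be sketched as follows, in Python rather than RGW's C++. `set_name` and `commit` are hypothetical callables standing in for the name set operation and the rest of the rollover; only the early return on ECANCELED reflects the message above.

```python
import errno
from typing import Callable


def rollover(set_name: Callable[[], int], commit: Callable[[], int]) -> int:
    """Run a rollover, bailing out if the name update lost a race."""
    ret = set_name()
    if ret == -errno.ECANCELED:
        # Someone else updated the name first: stop here, do not roll over.
        return ret
    if ret < 0:
        return ret
    return commit()
```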

Fixes: https://tracker.ceph.com/issues/73434
Signed-off-by: Yuval Lifshitz <ylifshit@ibm.com>
(cherry picked from commit 78f62f4207b9752d85a9ccbfa3007f2f2cf79d21)

4 months agomgr/dashboard: type the rbd mirror modes
Nizamudeen A [Wed, 17 Sep 2025 11:10:48 +0000 (16:40 +0530)]
mgr/dashboard: type the rbd mirror modes

Resolves: rhbz#2403464

Fixes: https://tracker.ceph.com/issues/72458
Signed-off-by: Nizamudeen A <nia@redhat.com>
(cherry picked from commit 93e34a7cb58e27f8c60c9b4b3927c2548aff907e)
(cherry picked from commit 90bdd0c640ba5597d33c9ba55ac8b6ad5a9822bb)

4 months agomgr/dashboard: fix rbd form mirroring toggle
Nizamudeen A [Wed, 17 Sep 2025 03:39:54 +0000 (09:09 +0530)]
mgr/dashboard: fix rbd form mirroring toggle

- fix the toggle not working while editing the image
- the rbd form mirroring toggle doesn't disable/enable the mirror mode
when you change the pool.

- also re-arrange the form in a way that the required fields are together.
- disable mirroring when selecting the namespace

Resolves: rhbz#2403464

Fixes: https://tracker.ceph.com/issues/72458
Signed-off-by: Nizamudeen A <nia@redhat.com>
(cherry picked from commit 1efd37d4b5fc4363c6c86a84ede9c826c94d2c22)
(cherry picked from commit 63e7555a1f3d877faf54047a46e6499ed2cc05e3)

4 months agomgr/dashboard: fix dashboard freeze on missing smb permissions
Pedro Gonzalez Gomez [Wed, 8 Oct 2025 17:25:29 +0000 (19:25 +0200)]
mgr/dashboard: fix dashboard freeze on missing smb permissions

Resolves: rhbz#2400920

Fixes: https://tracker.ceph.com/issues/73436
Signed-off-by: Pedro Gonzalez Gomez <pegonzal@ibm.com>
(cherry picked from commit d987989acc22b7b7359f80b5310441297bf16b72)
(cherry picked from commit 1520e3a2f52952a9133425233c32b3e5b38e4d22)
Signed-off-by: Afreen Misbah <afreen@ibm.com>
Conflicts:
src/pybind/mgr/dashboard/frontend/src/app/core/navigation/navigation/navigation.component.html

4 months agomgr/orchestrator: stop passing "default_flow_style" flag to yaml dump
Adam King [Fri, 10 Oct 2025 14:48:35 +0000 (10:48 -0400)]
mgr/orchestrator: stop passing "default_flow_style" flag to yaml dump

This seems to not be compatible with pyyaml 6.0

```
File "/lib/python3.12/site-packages/ceph/deployment/service_spec.py", line 1350, in __repr__
 y = yaml.dump(cast(dict, self), default_flow_style=False)
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lib64/python3.12/site-packages/yaml/__init__.py", line 253, in dump
 return dump_all([data], stream, Dumper=Dumper, **kwds)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lib64/python3.12/site-packages/yaml/__init__.py", line 241, in dump_all
 dumper.represent(data)
File "/lib64/python3.12/site-packages/yaml/representer.py", line 28, in represent
 self.serialize(node)
File "/lib64/python3.12/site-packages/yaml/serializer.py", line 54, in serialize
 self.serialize_node(node, None, None)
File "/lib64/python3.12/site-packages/yaml/serializer.py", line 104, in serialize_node
 self.emit(MappingStartEvent(alias, node.tag, implicit,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: Prepared.__init__() got an unexpected keyword argument 'flow_style'
```

Dropping the flag didn't seem to cause any issues with making our specs look
readable in the logs or being able to round-trip specs
when using `ceph orch ls --export` (minus the known bug
around doing so with multi-line certs)

Resolves: rhbz#2402684

Signed-off-by: Adam King <adking@redhat.com>
4 months agomgr/dashboard: change the default max namespace from 4096 to None in subsystem add...
Tomer Haskalovitch [Thu, 9 Oct 2025 06:33:57 +0000 (09:33 +0300)]
mgr/dashboard: change the default max namespace from 4096 to None in subsystem add command to take gw default

Resolves: rhbz#2402658

Signed-off-by: Tomer Haskalovitch <tomer.haska@ibm.com>
(cherry picked from commit 97f54b081537e0ee99317927d8da850b72e4415f)

4 months agorgw/lc: At least wait for |rgw_lc_lock_max_time| while trying to fetch the lc-shard...
kchheda3 [Fri, 19 Sep 2025 20:05:55 +0000 (16:05 -0400)]
rgw/lc: At least wait for |rgw_lc_lock_max_time| while trying to fetch the lc-shard lock to get or update the bucket status.

Currently each lc worker tries for 1 second to get the lock on the lc_shard to decide which bucket to process, and again for 1 second to update the bucket status once the bucket has been lc processed. However, when multiple rgws are running lc, the shard is often locked by another lc worker, or rados is slow and the lock is not granted within 1 second. The worker then either skips processing the bucket or skips updating its status, resulting in a missed LC run or a missed bucket status update.
So in the worst case, when another lc worker is already processing a shard, wait up to rgw_lc_lock_max_time to get the lock, since any given worker can hold a given shard for at most rgw_lc_lock_max_time.
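
A minimal sketch of the retry policy, in Python rather than RGW's C++. `try_lock`, the retry interval, and the injectable `clock`/`sleep` parameters are assumptions made for the example; `max_wait_secs` plays the role of rgw_lc_lock_max_time.

```python
import time
from typing import Callable


def acquire_with_deadline(try_lock: Callable[[], bool],
                          max_wait_secs: float,
                          retry_interval_secs: float = 1.0,
                          clock: Callable[[], float] = time.monotonic,
                          sleep: Callable[[float], None] = time.sleep) -> bool:
    """Retry try_lock() until it succeeds or max_wait_secs has elapsed."""
    deadline = clock() + max_wait_secs
    while True:
        if try_lock():
            return True
        if clock() >= deadline:
            return False
        # Never sleep past the deadline.
        sleep(min(retry_interval_secs, max(0.0, deadline - clock())))
```

Waiting the full hold time rather than a fixed 1 second means a worker gives up only when the current holder must already have released the shard.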

Fixes: https://tracker.ceph.com/issues/72572
Resolves: rhbz#2401203

Signed-off-by: kchheda3 <kchheda3@bloomberg.net>
(cherry picked from commit 937ac626afd3bf443edf96aa177854e8eb291af5)
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
4 months agorgw/lc: if the buckets last lc processing time is less than start time of current...
kchheda3 [Thu, 18 Sep 2025 20:01:50 +0000 (16:01 -0400)]
rgw/lc: if the buckets last lc processing time is less than start time of current LC session, then continue processing bucket for LC even if the status is not in initialized state.

Currently the logic inside expired_session() considers an LC session valid for almost 2-3 days. For a bucket where the status update after lc processing fails, the next lc session skips the bucket, because expired_session() returns false as it multiplies num_seconds_day * 2. Instead of hardcoding the logic to 2 days, store the start time of each lc session and compare the bucket update time with the lc start time: if the bucket process time is less than the current lc start time, the bucket can be processed, as the previous session has already expired.
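
The comparison described above can be sketched like this, in Python rather than RGW's C++. The function and parameter names are hypothetical; times are plain epoch seconds for the sake of the example.

```python
def should_process_bucket(bucket_last_processed: float,
                          lc_session_start: float,
                          status_initialized: bool) -> bool:
    """Decide whether the current LC session should process a bucket."""
    if status_initialized:
        # Bucket has not been picked up yet in this session.
        return True
    # Last processed before this session began: the earlier session is over,
    # so a stale status left by a failed update must not block processing.
    return bucket_last_processed < lc_session_start
```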

Fixes: https://tracker.ceph.com/issues/72572
Resolves: rhbz#2401203

Signed-off-by: kchheda3 <kchheda3@bloomberg.net>
(cherry picked from commit 541d13a6305bac9255348eeeef61d0c5096bf5bf)

4 months agotest/libcephfs: add test for fsync on a write delegated inode
Venky Shankar [Mon, 29 Sep 2025 06:44:28 +0000 (06:44 +0000)]
test/libcephfs: add test for fsync on a write delegated inode

Resolves: rhbz#2355723

Signed-off-by: Venky Shankar <vshankar@redhat.com>
(cherry picked from commit be0c40c89c0556ae7696dfaaf6804684ecfaddeb)

4 months agoclient: adjust `Fb` cap ref count check during synchronous fsync()
Venky Shankar [Mon, 29 Sep 2025 06:41:23 +0000 (06:41 +0000)]
client: adjust `Fb` cap ref count check during synchronous fsync()

The cephfs client holds a ref on Fb caps when handing out a write delegation[0].
As a result, fsync from a (Ganesha) client holding a write delegation will block indefinitely[1]
waiting for the cap ref for Fb to drop to 0, which will never happen until the
delegation is returned/recalled.

[0]: https://github.com/ceph/ceph/blob/main/src/client/Delegation.cc#L71
[1]: https://github.com/ceph/ceph/blob/main/src/client/Client.cc#L12438

If an inode has been write delegated, adjust the cap reference count
check in fsync() accordingly.

Note: This only works for synchronous fsync() since `client_lock` is
held for the entire duration of the call (at least till the path leading
up to the reference count check). Asynchronous fsync() needs to be fixed
separately (as that can drop `client_lock`).

Resolves: rhbz#2355723

Fixes: https://tracker.ceph.com/issues/73298
Signed-off-by: Venky Shankar <vshankar@redhat.com>
(cherry picked from commit d7eca69a5b887e2b65513411280158d06cdb6b3c)

4 months agomgr/dashboard: ns list now support not passing nqn param
Tomer Haskalovitch [Wed, 8 Oct 2025 16:21:46 +0000 (19:21 +0300)]
mgr/dashboard: ns list now support not passing nqn param

Signed-off-by: Tomer Haskalovitch <tomer.haska@ibm.com>
(cherry picked from commit 0419a1a17537917e01950745d3756591d83923da)

4 months agomgr/dashboard: raise exception if both size and rbd_image_size are being passed in...
Tomer Haskalovitch [Thu, 18 Sep 2025 07:58:44 +0000 (10:58 +0300)]
mgr/dashboard: raise exception if both size and rbd_image_size are being passed in ns add

Signed-off-by: Tomer Haskalovitch <tomer.haska@ibm.com>
(cherry picked from commit 3ff7d737bb1934dbfe26d86a819727d9456a6da6)

4 months agomgr/dashboard: support gw get_stats and listener info
Tomer Haskalovitch [Sun, 21 Sep 2025 18:42:49 +0000 (21:42 +0300)]
mgr/dashboard: support gw get_stats and listener info

Update nvmeof/gateway submodule to have the relevant protobuf objects and calls.

Signed-off-by: Tomer Haskalovitch <tomer.haska@ibm.com>
(cherry picked from commit 0ffbf3be8fad64085893afd9acc458ead503fb3b)

# Conflicts:
# src/pybind/mgr/dashboard/services/proto/gateway.proto

4 months agomgr/dashboard: improve search and pagination behavior
Nizamudeen A [Thu, 11 Sep 2025 05:29:47 +0000 (10:59 +0530)]
mgr/dashboard: improve search and pagination behavior

Add a throttle to the pagination cycle so that if you repeatedly try to
cycle through the pages, the delay increases. This is done because,
unlike search, the button click to change page is deliberate and the
first click on the button should respond immediately.

Another thing is that the search with a keyword stores every keystroke
typed in the search field and then, after the debounce interval, sends
all those requests one by one.

For example: if I type 222, it waits 1s for the
debounce timer and then sends a request to find the osd with id 2 first, then
again 2, and then again 2. Instead it should only send 222 at the end.
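
The intended search behaviour can be modelled in a few lines. This is a Python model of debouncing over recorded (time, value) pairs, not the dashboard's frontend code; the function name and input shape are assumptions for the example.

```python
from typing import List, Tuple


def debounce(events: List[Tuple[float, str]], interval: float) -> List[str]:
    """Return the values that would be sent for time-sorted (time, value) events.

    A value is sent only if no newer event arrives within `interval` seconds.
    """
    sent = []
    for i, (when, value) in enumerate(events):
        is_last = i == len(events) - 1
        if is_last or events[i + 1][0] - when >= interval:
            sent.append(value)
    return sent


# Typing "222" quickly yields one request for the final text, not three.
print(debounce([(0.0, "2"), (0.1, "22"), (0.2, "222")], 1.0))  # ['222']
```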

Resolves: rhbz#2312512

Fixes: https://tracker.ceph.com/issues/72979
Signed-off-by: Nizamudeen A <nia@redhat.com>
(cherry picked from commit 5eda016780b91ca46ba394a3a5ef3fd988897ebd)
(cherry picked from commit aa1b1eba918759a4631cf7d02768e8dcc03d6269)

4 months agomgr/dashboard: show loader while changing pages
Nizamudeen A [Thu, 11 Sep 2025 05:25:08 +0000 (10:55 +0530)]
mgr/dashboard: show loader while changing pages

during server side pagination where each pagination cycle is delayed by
1s.

Resolves: rhbz#2312512

Fixes: https://tracker.ceph.com/issues/72979
Signed-off-by: Nizamudeen A <nia@redhat.com>
(cherry picked from commit 66ce55ae2bc823e39f5f0c9e4f1db7609f85974d)
(cherry picked from commit bb81d3366d5fb7785021bc20f8c2388b52a80996)

4 months agomgr/dashboard: fix missing schedule interval in rbd API
Nizamudeen A [Thu, 11 Sep 2025 04:13:13 +0000 (09:43 +0530)]
mgr/dashboard: fix missing schedule interval in rbd API

Fetching the rbd image schedule interval through the rbd_support module
schedule list command

GET /api/rbd will have the following field per image
```
"schedule_info": {
                    "image": "rbd/rbd_1",
                    "schedule_time": "2025-09-11 03:00:00",
                    "schedule_interval": [
                        {
                            "interval": "5d",
                            "start_time": null
                        },
                        {
                            "interval": "3h",
                            "start_time": null
                        }
                    ]
                },
```
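
A short sketch of how a client might read the new field. The payload is the fragment above wrapped in an object so that it parses as JSON; the surrounding response shape is an assumption, and no HTTP call is made here.

```python
import json

# One image entry, reduced to the field shown above (assumed wrapper object).
image = json.loads("""
{
    "schedule_info": {
        "image": "rbd/rbd_1",
        "schedule_time": "2025-09-11 03:00:00",
        "schedule_interval": [
            {"interval": "5d", "start_time": null},
            {"interval": "3h", "start_time": null}
        ]
    }
}
""")

schedule = image["schedule_info"]
intervals = [entry["interval"] for entry in schedule["schedule_interval"]]
print(schedule["image"], intervals)  # rbd/rbd_1 ['5d', '3h']
```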

Also fixes the UI where the schedule interval was missing in the form and
disables editing the schedule_interval.

Extended the same thing to the `GET /api/pool` endpoint.

Resolves: rhbz#2392374

Fixes: https://tracker.ceph.com/issues/72977
Signed-off-by: Nizamudeen A <nia@redhat.com>
(cherry picked from commit 72cebf0126bd07f7d42b0ae7b68646c527044942)

4 months agomgr/dashboard: fix prometheus API error when not configured
Nizamudeen A [Mon, 22 Sep 2025 15:43:52 +0000 (21:13 +0530)]
mgr/dashboard: fix prometheus API error when not configured

Resolves: rhbz#2398040

Fixes: https://tracker.ceph.com/issues/73174
Signed-off-by: Nizamudeen A <nia@redhat.com>
(cherry picked from commit 0c0e0d436e63fa767da149402fead6a25e513978)
(cherry picked from commit 4689bb4a039ac33353fbf386d1e4feb6254ffebf)

4 months agomgr/dashboard: Rename side-nav panel items
Naman Munet [Mon, 29 Sep 2025 04:51:06 +0000 (10:21 +0530)]
mgr/dashboard: Rename side-nav panel items

Fixes: https://tracker.ceph.com/issues/73252
Resolves: rhbz#2402656

Commit includes changes:
1) Renaming Topic to Notification destination
2) Renaming Tiering to Storage class
3) Renaming Users to User Management
4) fix storage class table refresh after delete
5) Also made changes to internal routing for topic and storage class

Signed-off-by: Naman Munet <naman.munet@ibm.com>
(cherry picked from commit 7aac42984c7ea24555ba1f8936a550c39902c389)
(cherry picked from commit 79c4abd9a2521fdb6a373eb205d5a3a74d04e417)

4 months agoblk/kernel: bring "bdev_async_discard" config parameter back.
Igor Fedotov [Wed, 12 Mar 2025 14:42:24 +0000 (17:42 +0300)]
blk/kernel: bring "bdev_async_discard" config parameter back.

To ensure backward compatibility for clusters with this parameter
previously set to true.

Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>
(cherry picked from commit 7b914cb49d50241b2ed7811d8660fce27a80ae39)
(cherry picked from commit 615ca9bfbc279bd0be7c0545b35d681dd168d088)

Resolves: rhbz#2394395