git-server-git.apps.pok.os.sepia.ceph.com Git - ceph.git/log
ceph.git
4 weeks agoMerge pull request #68782 from smanjara/wip-fix-frontend-exception
Shilpa Jagannath [Mon, 18 May 2026 16:54:01 +0000 (09:54 -0700)]
Merge pull request #68782 from smanjara/wip-fix-frontend-exception

rgw: catch exception from abort_early() on client disconnect

4 weeks agomgr/dashboard: Remove `ng-click-outside` and `ngx-toastr` package
Afreen Misbah [Mon, 18 May 2026 11:33:16 +0000 (17:03 +0530)]
mgr/dashboard: Remove `ng-click-outside` and `ngx-toastr` package

Fixes https://tracker.ceph.com/issues/70934
Fixes https://tracker.ceph.com/issues/76631

Signed-off-by: Afreen Misbah <afreen@ibm.com>
4 weeks agoMerge pull request #68685 from perezjosibm/wip-perezjos-doc-crimson-dev
Jose Juan Palacios-Perez [Mon, 18 May 2026 15:20:49 +0000 (16:20 +0100)]
Merge pull request #68685 from perezjosibm/wip-perezjos-doc-crimson-dev

doc: crimson/dev - add a vstart.sh example using SeaStore options, minor formatting fixes

4 weeks agoMerge pull request #68891 from rhcs-dashboard/carbonize-cluster-wide-osd-flags-modal
Afreen Misbah [Mon, 18 May 2026 14:55:15 +0000 (20:25 +0530)]
Merge pull request #68891 from rhcs-dashboard/carbonize-cluster-wide-osd-flags-modal

mgr/dashboard: Carbonize cluster-wide OSD flags modal

Reviewed-by: Afreen Misbah <afreen@ibm.com>
Reviewed-by: pujaoshahu <pshahu@redhat.com>
4 weeks agoMerge pull request #68971 from rhcs-dashboard/carbonize-upgrade
Afreen Misbah [Mon, 18 May 2026 14:54:28 +0000 (20:24 +0530)]
Merge pull request #68971 from rhcs-dashboard/carbonize-upgrade

Carbonize upgrade page

Reviewed-by: Devika Babrekar <devika.babrekar@ibm.com>
4 weeks agodoc:crimson-dev: add RANDOM_BLOCK_SSD usage example, fix indentation 68685/head
Jose J Palacios-Perez [Fri, 8 May 2026 09:58:13 +0000 (10:58 +0100)]
doc:crimson-dev: add RANDOM_BLOCK_SSD usage example, fix indentation

Signed-off-by: Jose J Palacios-Perez <perezjos@uk.ibm.com>
4 weeks agoMerge PR #68937 into main
Patrick Donnelly [Mon, 18 May 2026 14:20:08 +0000 (10:20 -0400)]
Merge PR #68937 into main

* refs/pull/68937/head:
.github/workflows/releng-audit: group events to serialize executions
.github/workflows/releng-audit: remove override on reopen
.github/workflows/releng-audit: refactor auth check to function

Reviewed-by: Yuri Weinstein <yweins@redhat.com>
4 weeks agocephadm: disable UDP in samples/nfs.json for test_cephadm Ganesha 68976/head
Kobi Ginon [Mon, 18 May 2026 13:45:32 +0000 (16:45 +0300)]
cephadm: disable UDP in samples/nfs.json for test_cephadm Ganesha

test_cephadm.sh deploys NFS through cephadm _orch deploy using
src/cephadm/samples/nfs.json. That sample is separate from the mgr
ganesha.conf.j2 template, which already sets Enable_UDP = false.

Without that setting, Ganesha on Rocky 10 (ceph-ci image) fails during
startup with "Cannot register NFS V3 on UDP", so test_cephadm.sh never
sees ganesha.nfsd listening on port 2049.

Add Protocols = 3, 4 and Enable_UDP = false to NFS_CORE_PARAM so the
sample matches the orchestrator defaults.

Fixes: https://tracker.ceph.com/issues/76295
Signed-off-by: Kobi Ginon <kginon@redhat.com>
4 weeks agocrimson/os/seastore: yield to user IO between cleaner cycles 68961/head
Shai Fultheim [Sun, 17 May 2026 08:27:00 +0000 (11:27 +0300)]
crimson/os/seastore: yield to user IO between cleaner cycles

After the deadlock fix in the preceding commit ("fix IO-block deadlock
when cleaner is sleeping"), the cleaner stays awake while user IO is
blocked, but a second symptom appears at high alive_ratio (~0.79): the
cleaner's segment-allocate-and-fill loop runs tightly enough that the
user-IO continuation scheduled by maybe_wake_blocked_io() never gets a
chance to retry try_reserve_io() before the cleaner consumes the
projected_avail headroom again on its next iteration. User IO wakes,
sees projected_avail still below hard_limit, re-blocks immediately.

In the qa/standalone/crimson randwrite bench this manifests as: cluster
makes 500-700 GB of progress, then user_written counter freezes for
~75 seconds (watchdog window) while the cleaner is fully busy.

In BackgroundProcess::run(), after each do_background_cycle, if user IO
is currently blocked, yield to the reactor. That gives the woken
user-IO continuation a chance to slot in and complete a reservation
before the cleaner starts its next reservation-consuming cycle.

With this change, the same bench runs 19 minutes (vs 11-16 min) and
writes 785 GB user (vs 506-692 GB) before the next cluster limit hits,
which is the inherent throughput cap at alive_ratio 0.79 where each
reclaim only frees ~21% of segment size — not a coordination bug.
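
The effect of yielding between cleaner cycles can be sketched with a toy cooperative scheduler. This is only an illustration in Python's asyncio, not the Seastar code: `simulate`, `user_io`, and `cleaner` are hypothetical names, and the three-cycle loop stands in for BackgroundProcess::run().

```python
import asyncio


async def simulate(yield_when_blocked):
    """Return the order in which cleaner cycles and the user IO retry run."""
    log = []
    io_blocked = True

    async def user_io():
        # Stand-in for the woken user-IO continuation retrying its reservation.
        nonlocal io_blocked
        log.append("io")
        io_blocked = False

    async def cleaner():
        task = None
        for cycle in range(3):
            log.append(f"clean{cycle}")
            if cycle == 0:
                # Stand-in for maybe_wake_blocked_io(): schedule the retry.
                task = asyncio.create_task(user_io())
            if yield_when_blocked and io_blocked:
                # Hand control back to the event loop so the retry can run.
                await asyncio.sleep(0)
        await task

    await cleaner()
    return log


if __name__ == "__main__":
    print(asyncio.run(simulate(False)))
    print(asyncio.run(simulate(True)))
```

Without the yield, the scheduled retry only runs once the cleaner loop has finished all of its cycles; with the yield, it runs right after the cycle that woke it.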

Signed-off-by: Shai Fultheim <shai.fultheim@gmail.com>
4 weeks agocrimson/os/seastore: fix IO-block deadlock when cleaner is sleeping
Shai Fultheim [Sun, 17 May 2026 04:43:19 +0000 (07:43 +0300)]
crimson/os/seastore: fix IO-block deadlock when cleaner is sleeping

Two coordinated changes that together close a stall observed at high
alive_ratio in the qa/standalone/crimson randwrite bench (one OSD
frozen for 70+ minutes, alive_ratio ~0.79, projected_avail_ratio ~0.10,
slow_ops accumulating indefinitely).

1. SegmentCleaner::should_clean_space() used segments.get_available_ratio()
   (actual ratio) while should_block_io_on_clean() used
   get_projected_available_ratio() (actual minus in-flight reservations).
   When the actual ratio sat just above available_ratio_hard_limit but
   the projected ratio dipped below it, IO would block while the cleaner
   slept. Make should_clean_space() also trip on the projected ratio.

2. BackgroundProcess::reserve_projected_usage() did not wake the
   background process when an IO blocked. With the cleaner asleep and
   all IO blocked, nothing called maybe_wake_blocked_io() (no
   release_projected_usage runs without completing IO; no segment
   release runs without the cleaner). Kick do_wake_background() at the
   point of blocking, so the cleaner re-evaluates and runs.
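
The mismatch described in item 1 can be shown with a small model. This is a hedged sketch, not the SegmentCleaner code: `SpaceState` and the three predicate functions are hypothetical, and the real conditions involve more inputs than a single hard limit.

```python
from dataclasses import dataclass


@dataclass
class SpaceState:
    total: int          # total capacity, in arbitrary units
    available: int      # space that is actually free right now
    reserved: int       # in-flight reservations not yet written
    hard_limit: float   # stand-in for available_ratio_hard_limit

    def available_ratio(self):
        return self.available / self.total

    def projected_available_ratio(self):
        # Actual free space minus in-flight reservations.
        return (self.available - self.reserved) / self.total


def should_block_io(s):
    return s.projected_available_ratio() < s.hard_limit


def should_clean_space_old(s):
    # Before the fix: only the actual ratio is consulted.
    return s.available_ratio() < s.hard_limit


def should_clean_space_new(s):
    # After the fix: also trip on the projected ratio.
    return should_clean_space_old(s) or s.projected_available_ratio() < s.hard_limit


if __name__ == "__main__":
    s = SpaceState(total=1000, available=120, reserved=30, hard_limit=0.10)
    print(should_block_io(s), should_clean_space_old(s), should_clean_space_new(s))
```

With the actual ratio at 0.12 and the projected ratio at 0.09, IO blocks while the old predicate leaves the cleaner asleep; the new predicate agrees with the IO-blocking check.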

Signed-off-by: Shai Fultheim <shai.fultheim@gmail.com>
4 weeks agoMerge pull request #68868 from rhcs-dashboard/fix-edit
Afreen Misbah [Mon, 18 May 2026 13:20:11 +0000 (18:50 +0530)]
Merge pull request #68868 from rhcs-dashboard/fix-edit

mgr/dashboard: Fix edit and delete access for pool-manager role

Reviewed-by: Abhishek Desai <abhishek.desai1@ibm.com>
4 weeks agoMerge pull request #68951 from rhcs-dashboard/revert-nx
Afreen Misbah [Mon, 18 May 2026 13:19:34 +0000 (18:49 +0530)]
Merge pull request #68951 from rhcs-dashboard/revert-nx

Revert: mgr/dashboard: reverting the nx tool changes

Reviewed-by: Nizamudeen A <nia@redhat.com>
4 weeks agomgr/dashboard: Remove cephfs mirroring navigation from Umbrella 68967/head
Dnyaneshwari Talwekar [Mon, 18 May 2026 04:49:36 +0000 (10:19 +0530)]
mgr/dashboard: Remove cephfs mirroring navigation from Umbrella

Fixes: https://tracker.ceph.com/issues/76649
Signed-off-by: Dnyaneshwari Talwekar <dtalweka@redhat.com>
4 weeks agoMerge pull request #67547 from mheler/wip-list-restorestatus
mheler [Mon, 18 May 2026 11:23:15 +0000 (06:23 -0500)]
Merge pull request #67547 from mheler/wip-list-restorestatus

rgw: add RestoreStatus support to object listings

4 weeks agomgr/dashboard: fix logs e2e tests after carbonization 68971/head
Afreen Misbah [Mon, 18 May 2026 10:01:58 +0000 (15:31 +0530)]
mgr/dashboard: fix logs e2e tests after carbonization

Update e2e test selectors to match the new Carbon component structure.
The .card-body and .message classes were replaced with .log-viewer
and .log-entry__message after carbonizing the logs component.

Assisted-by: Claude
Signed-off-by: Afreen Misbah <afreen@ibm.com>
4 weeks agoMerge pull request #68953 from rhcs-dashboard/linter-modernization-research
Afreen Misbah [Mon, 18 May 2026 10:01:28 +0000 (15:31 +0530)]
Merge pull request #68953 from rhcs-dashboard/linter-modernization-research

mgr/dashboard: Replace htmllint with Prettier for HTML linting

Reviewed-by: Nizamudeen A <nia@redhat.com>
4 weeks agoRevert "mgr/dashboard: set up dashboard as a app shell" 68951/head
Afreen Misbah [Fri, 15 May 2026 22:34:44 +0000 (04:04 +0530)]
Revert "mgr/dashboard: set up dashboard as a app shell"

Fixes https://tracker.ceph.com/issues/74006

This reverts commit a0dd52fe100932922ceab9277490bfa2f8631431.

 Conflicts:
src/pybind/mgr/dashboard/frontend/module-federation.config.ts
src/pybind/mgr/dashboard/frontend/package-lock.json
src/pybind/mgr/dashboard/frontend/package.json
src/pybind/mgr/dashboard/frontend/project.json

Signed-off-by: Afreen Misbah <afreen@ibm.com>
5 weeks agoRevert " mgr/dashboard: add rollup as optional deps"
Afreen Misbah [Fri, 15 May 2026 22:28:34 +0000 (03:58 +0530)]
Revert " mgr/dashboard: add rollup as optional deps"

This reverts commit 6f14d6f25f06ed3d78a4c603e1ad9f10fc9c17d8.

 Conflicts:
src/pybind/mgr/dashboard/frontend/package-lock.json
src/pybind/mgr/dashboard/frontend/package.json

Signed-off-by: Afreen Misbah <afreen@ibm.com>
5 weeks agomgr/dashboard: remove unused upgradable component
Afreen Misbah [Sun, 17 May 2026 21:18:22 +0000 (02:48 +0530)]
mgr/dashboard: remove unused upgradable component

The upgradable component is no longer used after converting
the upgrade page to use Carbon tiles directly.

Assisted-by: Claude
Signed-off-by: Afreen Misbah <afreenmisbah@ibm.com>
5 weeks agomgr/dashboard: carbonize logs component
Afreen Misbah [Sun, 17 May 2026 21:18:11 +0000 (02:48 +0530)]
mgr/dashboard: carbonize logs component

Fixes https://tracker.ceph.com/issues/68260

Assisted-by: Claude
Signed-off-by: Afreen Misbah <afreenmisbah@ibm.com>
5 weeks agomgr/dashboard: Carbonize upgrade page
Afreen Misbah [Sun, 17 May 2026 14:53:54 +0000 (20:23 +0530)]
mgr/dashboard: Carbonize upgrade page

- Made cluster status clickable to navigate to overview when not HEALTH_OK
- Replaced Bootstrap classes with Carbon design tokens
- Updated upgrade.component.scss to use CSS custom properties

Assisted-by: Claude
Signed-off-by: Afreen Misbah <afreenmisbah@ibm.com>
5 weeks agoMerge pull request #66908 from rkachach/fix_nvmeof_dashboard_interface
Redouane Kachach [Mon, 18 May 2026 07:03:20 +0000 (09:03 +0200)]
Merge pull request #66908 from rkachach/fix_nvmeof_dashboard_interface

mgr/cephadm: Add a new cephadm's API to get nvmeof TLS bundle

Reviewed-by: Nizamudeen A <nia@redhat.com>
Reviewed-by: Adam King <adking@redhat.com>
5 weeks agomgr/dashboard: NFS enhancements - terminology alignment 68970/head
Dnyaneshwari Talwekar [Mon, 18 May 2026 06:55:14 +0000 (12:25 +0530)]
mgr/dashboard: NFS enhancements - terminology alignment

Fixes: https://tracker.ceph.com/issues/76655
Signed-off-by: Dnyaneshwari Talwekar <dtalweka@redhat.com>
5 weeks agoMerge pull request #68686 from rishabh-d-dave/fs-scrub-set-flag-for-dirfrags
Venky Shankar [Mon, 18 May 2026 05:24:29 +0000 (10:54 +0530)]
Merge pull request #68686 from rishabh-d-dave/fs-scrub-set-flag-for-dirfrags

mds/ScrubStack: set added_children to true for dirfrags too

Reviewed-by: Venky Shankar <vshankar@redhat.com>
5 weeks agoMerge pull request #67752 from supriti/wip-s3-policy-keystone-role
anrao19 [Mon, 18 May 2026 05:01:21 +0000 (10:31 +0530)]
Merge pull request #67752 from supriti/wip-s3-policy-keystone-role

rgw: Inject keystone roles into IAM policy

5 weeks agoMerge pull request #68740 from smanjara/wip-fix-multi-delete-crash
anrao19 [Mon, 18 May 2026 04:59:49 +0000 (10:29 +0530)]
Merge pull request #68740 from smanjara/wip-fix-multi-delete-crash

rgw: remove redundant close_section() call in RGWDeleteMultiObj end_response()

5 weeks agoMerge pull request #68601 from aza547/multisite-data-log-fix
anrao19 [Mon, 18 May 2026 04:47:55 +0000 (10:17 +0530)]
Merge pull request #68601 from aza547/multisite-data-log-fix

rgw: multisite sync data_log error handling broken in tentacle

5 weeks agoMerge pull request #68567 from aza547/radosgw-sync-status-flush-fix
anrao19 [Mon, 18 May 2026 04:47:40 +0000 (10:17 +0530)]
Merge pull request #68567 from aza547/radosgw-sync-status-flush-fix

radosgw-admin: fix output of sync status

5 weeks agomgr/dashboard: Fix mon_allow_pool_delete unit test 68868/head
Afreen Misbah [Tue, 12 May 2026 20:16:56 +0000 (01:46 +0530)]
mgr/dashboard: Fix mon_allow_pool_delete unit test

Signed-off-by: Afreen Misbah <afreen@ibm.com>
5 weeks agomgr/dashboard: Fix edit and delete access for pool-manager role
Afreen Misbah [Tue, 12 May 2026 12:07:39 +0000 (17:37 +0530)]
mgr/dashboard: Fix edit and delete access for pool-manager role

Fixes https://tracker.ceph.com/issues/76561

- allows deleting pools in the pool-manager role by bypassing config-opt read permissions
- allows editing in the pool-manager role, which was failing due to missing rbd mirroring permissions
- fixes a bug in pool edit mode where editing both compression and name at once failed because of an if-else logic error

Signed-off-by: Afreen Misbah <afreen@ibm.com>
5 weeks agocmake/BuildISAL: build and install library targets only 68758/head
Kefu Chai [Wed, 6 May 2026 02:08:20 +0000 (10:08 +0800)]
cmake/BuildISAL: build and install library targets only

Skip building the igzip executables; Ceph only needs libisal.la.
This should speed up the build a little, as we no longer build the
executables previously built with "make".

Signed-off-by: Kefu Chai <k.chai@proxmox.com>
5 weeks agocrimson/osd: complete PGAdvanceMap's pg-deleted path properly 68823/head
Kefu Chai [Sun, 17 May 2026 10:06:04 +0000 (18:06 +0800)]
crimson/osd: complete PGAdvanceMap's pg-deleted path properly

When the pg has been deleted while PGAdvanceMap was queued, start()
takes an early return and skips the map-advance loop. The PeeringCtx
handed to the operation may already carry a transaction and queued
peering messages, so returning without calling complete_rctx() would
drop them. Dispatch the rctx before bailing out.

Also leave the PGPeeringPipeline::process stage via handle.complete()
instead of relying on the exit_handle defer's handle.exit(). The
pg-deleted path is a graceful completion, not an op failure, so it
should match the normal completion path; handle.exit() is documented
for the failure case only.

Signed-off-by: Kefu Chai <tchaikov@gmail.com>
5 weeks agocrimson/osd: remove unnecessary 'using cached_map_t'
Kefu Chai [Tue, 12 May 2026 06:15:04 +0000 (14:15 +0800)]
crimson/osd: remove unnecessary 'using cached_map_t'

`PGAdvanceMap` already declares an identical alias in-class (see
pg_advance_map.h), so there is no need to repeat it.

Signed-off-by: Kefu Chai <k.chai@proxmox.com>
5 weeks agocrimson/osd: coroutinize PGAdvanceMap::start()
Kefu Chai [Tue, 12 May 2026 06:11:29 +0000 (14:11 +0800)]
crimson/osd: coroutinize PGAdvanceMap::start()

for better readability. also take this opportunity to use
seastar::defer() to call handle->exit() in the error handling path.

Signed-off-by: Kefu Chai <k.chai@proxmox.com>
5 weeks agocrimson/osd: make PGAdvanceMap idempotent
Kefu Chai [Fri, 8 May 2026 12:43:12 +0000 (20:43 +0800)]
crimson/osd: make PGAdvanceMap idempotent

PGAdvanceMap is scheduled by two independent callers: pg creation
(do_init=true, to=current_epoch) and broadcast_map_to_pgs. They do
not coordinate, so a broadcast advance can race with an init advance
that has already pushed the pg past the broadcast's target. The op
carried a std::optional<from> that was overwritten at start-time,
guarded by ceph_assert(from <= to) which fires in this race.

The "from" parameter was never really an input. It was always
re-read inside the pipeline from pg->get_osdmap_epoch(); the value
passed in (when there was one) was discarded. Drop the member and
let the op contract be: "ensure pg has processed osdmaps up to at
least 'to'". If the pg is already past 'to', skip. This matches
OSD::advance_pg() in the classic OSD and removes the need to
serialize the two callers.
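
The "at least 'to'" contract can be sketched as follows. This is an illustrative model only: the `PG` class and `advance_pg` function below are hypothetical stand-ins, not the crimson or classic OSD implementation.

```python
class PG:
    """Minimal stand-in for a placement group tracking its osdmap epoch."""

    def __init__(self, epoch):
        self.epoch = epoch
        self.processed = []

    def get_osdmap_epoch(self):
        return self.epoch

    def handle_advance_map(self, epoch):
        self.processed.append(epoch)
        self.epoch = epoch


def advance_pg(pg, to):
    """Ensure pg has processed osdmaps up to at least `to`.

    The starting epoch is always read from the pg itself, so no caller
    supplies a `from` value. Returns the number of epochs processed.
    """
    start = pg.get_osdmap_epoch()
    if start >= to:
        # Another advance already moved the pg past the target: skip.
        return 0
    for epoch in range(start + 1, to + 1):
        pg.handle_advance_map(epoch)
    return to - start


if __name__ == "__main__":
    pg = PG(epoch=5)
    print(advance_pg(pg, 8))  # init advance
    print(advance_pg(pg, 7))  # late broadcast advance, nothing to do
```

Because a late advance with an older target is a no-op, the two callers can run in either order without an assertion on the relative epochs.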

Fixes: https://tracker.ceph.com/issues/61744
Signed-off-by: Kefu Chai <k.chai@proxmox.com>
5 weeks agoMerge pull request #68949 from fultheim/fix-cleanr-space-leak
Matan Breizman [Sun, 17 May 2026 08:32:46 +0000 (11:32 +0300)]
Merge pull request #68949 from fultheim/fix-cleanr-space-leak

crimson/os/seastore: fix cleaner space leak from shadowed result list

Reviewed-by: Matan Breizman <mbreizma@redhat.com>
Reviewed-by: Kefu Chai <tchaikov@gmail.com>
5 weeks agomgr/dashboard: Replace htmllint with Prettier for HTML linting 68953/head
Afreen Misbah [Sat, 16 May 2026 23:20:24 +0000 (04:50 +0530)]
mgr/dashboard: Replace htmllint with Prettier for HTML linting

Fixes: https://tracker.ceph.com/issues/76631
Signed-off-by: Afreen Misbah <afreenmisbah@example.com>
5 weeks agocrimson/os/seastore: fix cleaner space leak from shadowed result list 68949/head
Shai Fultheim [Sat, 16 May 2026 20:17:59 +0000 (23:17 +0300)]
crimson/os/seastore: fix cleaner space leak from shadowed result list

TransactionManager::get_extents_if_live() declared an inner
std::list<CachedExtentRef> res inside the "extent is cached" branch
that shadowed the outer res returned by the coroutine. When the
queried extent was present in the cache, it was moved into the inner
list and immediately discarded, and the empty outer list was returned
to the caller.

The async cleaner uses this result to decide whether to rewrite an
extent or treat it as dead. For recently-allocated LBA tree internal
nodes (still hot in cache), the shadowed return caused the cleaner to
skip them, so mark_space_free() never paired with the earlier
mark_space_used(). Each affected reclaim leaked exactly one extent
(4 KiB for LADDR_INTERNAL), tripping the live_bytes != 0 assertion in
SegmentCleaner::clean_space() (async_cleaner.cc:1441) once a victim
segment with such a leftover was selected.

The reproducer (at ~70% full) deterministically aborted within ~3
minutes before this fix; with the fix the OSDs run cleanly past the
trigger point.

Fixes: 87a5984b3ae ("crimson/.../transaction_manager: convert get_extents_if_live to coroutine")
Signed-off-by: Shai Fultheim <shai.fultheim@gmail.com>
5 weeks ago.github/workflows/releng-audit: group events to serialize executions 68937/head
Patrick Donnelly [Fri, 15 May 2026 15:43:08 +0000 (11:43 -0400)]
.github/workflows/releng-audit: group events to serialize executions

This avoids confusion when several events are fired for e.g. label
changes before the bot can validate each change is authorized.

Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
Assisted-by: Gemini
5 weeks ago.github/workflows/releng-audit: remove override on reopen
Patrick Donnelly [Fri, 15 May 2026 15:17:41 +0000 (11:17 -0400)]
.github/workflows/releng-audit: remove override on reopen

Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
Assisted-by: Gemini
5 weeks ago.github/workflows/releng-audit: refactor auth check to function
Patrick Donnelly [Fri, 15 May 2026 15:17:01 +0000 (11:17 -0400)]
.github/workflows/releng-audit: refactor auth check to function

Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
Assisted-by: Gemini
5 weeks agoMerge pull request #68743 from tchaikov/mgr-get_metadata
Kefu Chai [Sat, 16 May 2026 12:49:25 +0000 (20:49 +0800)]
Merge pull request #68743 from tchaikov/mgr-get_metadata

pybind/mgr/status: drop asserts that fight the defaultdict defaults

Reviewed-by: Nitzan Mordechai <nmordec@ibm.com>
5 weeks agocrimson/os/seastore: use configured device type to select segment manager 68358/head
Ronen Friedman [Mon, 13 Apr 2026 15:17:26 +0000 (15:17 +0000)]
crimson/os/seastore: use configured device type to select segment manager

In get_segment_manager(), trust the user-specified device type rather
than probing the device for ZNS zones. This simplifies device type
selection. More importantly, the change avoids opening a block file that
mkfs has not yet created: we start() seastore and then call mkfs, but
starting seastore requires creating the right segment manager, and
currently we probe the not-yet-created block file in mkfs() when trying
to create it.

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
Co-authored-by: Kefu Chai <k.chai@proxmox.com>
5 weeks agodoc/dev: refresh vstart.sh options in dev_cluster_deployment 68946/head
Kefu Chai [Sat, 16 May 2026 02:53:41 +0000 (10:53 +0800)]
doc/dev: refresh vstart.sh options in dev_cluster_deployment

Bring doc/dev/dev_cluster_deployment.rst back in line with the current
src/vstart.sh:

* drop the removed -K/--kstore objectstore backend
* drop -N/--not-new, which was dropped in 8dd2e418; reusing the existing
  cluster config is simply the default when -n is not given
* correct the --rgw_frontend default from civetweb to beast
* note that -b/--bluestore is the default objectstore backend
* update the example and add a note that a fresh build needs -n on the
  first run, while later runs can omit it
* note that the option list is not exhaustive and point at src/vstart.sh

Fixes: https://tracker.ceph.com/issues/57272
Signed-off-by: Kefu Chai <k.chai@proxmox.com>
5 weeks agosrc/cephadm: added ceph-exporter to post-rotate signal list 68902/head
Timothy Q Nguyen [Wed, 13 May 2026 23:19:57 +0000 (16:19 -0700)]
src/cephadm: added ceph-exporter to post-rotate signal list

As the title says, this change adds ceph-exporter to the logrotate
post-rotate signal list, so that ceph-exporter keeps writing to a new
log file after log rotation. Currently no new log file is written to,
and ceph-exporter has to be added to logrotate.d manually.

Signed-off-by: Timothy Q Nguyen <timqn22@gmail.com>
5 weeks agoqa: ignore expected OSD_ROOT_DOWN 68893/head
Patrick Donnelly [Fri, 15 May 2026 17:20:50 +0000 (13:20 -0400)]
qa: ignore expected OSD_ROOT_DOWN

Fixes: https://tracker.ceph.com/issues/76620
Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
5 weeks agoqa: ignore fs offline warning
Patrick Donnelly [Fri, 15 May 2026 17:12:59 +0000 (13:12 -0400)]
qa: ignore fs offline warning

Dup of FS_DOWN.

Fixes: https://tracker.ceph.com/issues/76619
Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
5 weeks agoqa: add MDS_INSUFFICIENT_STANDBY to ignorelist
Patrick Donnelly [Wed, 13 May 2026 14:22:50 +0000 (10:22 -0400)]
qa: add MDS_INSUFFICIENT_STANDBY to ignorelist

This is expected when MDS are going up and down.

Fixes: https://tracker.ceph.com/issues/75419
Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
5 weeks agoqa/suites/upgrade: use common ignorelist
Patrick Donnelly [Wed, 13 May 2026 14:22:05 +0000 (10:22 -0400)]
qa/suites/upgrade: use common ignorelist

Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
5 weeks agoMerge pull request #68571 from lumir-sliva/wip-rgw-postobj-bytes-received
Adam Emerson [Fri, 15 May 2026 17:04:04 +0000 (13:04 -0400)]
Merge pull request #68571 from lumir-sliva/wip-rgw-postobj-bytes-received

rgw: account presigned POST bytes_received in usage log

Reviewed-by: Casey Bodley <cbodley@redhat.com>
5 weeks agoMerge pull request #68932 from mheler/wip-mclock-docs
Mark Nelson [Fri, 15 May 2026 17:03:01 +0000 (10:03 -0700)]
Merge pull request #68932 from mheler/wip-mclock-docs

doc/rados/configuration: recommend wpq for EC clusters seeing slow ops

5 weeks agocephadm: fix mgr ports list growth; add unit tests (#76564) 68915/head
Kobi Ginon [Fri, 15 May 2026 16:22:30 +0000 (19:22 +0300)]
cephadm: fix mgr ports list growth; add unit tests (#76564)

Problem
-------
MgrService.prepare_create built module ports from ``mgr services`` but
only assigned them when non-empty, then always appended
service_discovery_port. After rehydration from mgr/cephadm/host.*, an
empty ``mgr services`` response left stale ports in place and appended
another 8765 each redeploy (https://tracker.ceph.com/issues/76564).

Fix
---
Always set ``daemon_spec.ports = ports + [service_discovery_port]`` so
each prepare uses a fresh list plus exactly one discovery port.
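
The list growth and the fix can be reproduced with a small model. This is a hedged sketch, not the MgrService code: `DaemonSpec`, `prepare_create_old`, and `prepare_create_new` are hypothetical, and the `module_ports` argument stands in for the ports derived from ``mgr services``.

```python
SERVICE_DISCOVERY_PORT = 8765


class DaemonSpec:
    """Minimal stand-in for a daemon spec carrying a ports list."""

    def __init__(self, ports=None):
        self.ports = list(ports or [])


def prepare_create_old(spec, module_ports):
    # Before the fix: ports are only replaced when the response is non-empty,
    # but the discovery port is appended every time.
    if module_ports:
        spec.ports = list(module_ports)
    spec.ports.append(SERVICE_DISCOVERY_PORT)


def prepare_create_new(spec, module_ports):
    # After the fix: always start from a fresh list plus one discovery port.
    spec.ports = list(module_ports) + [SERVICE_DISCOVERY_PORT]


if __name__ == "__main__":
    old = DaemonSpec([8443, 8765])   # rehydrated, already has a discovery port
    new = DaemonSpec([8443, 8765])
    for _ in range(3):               # three redeploys with an empty response
        prepare_create_old(old, [])
        prepare_create_new(new, [])
    print(old.ports)
    print(new.ports)
```

In the old variant each redeploy with an empty response adds another 8765 to the stale list, while the new variant ends every call with exactly one discovery port.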

Tests
-----
Add src/pybind/mgr/cephadm/tests/services/test_mgr.py: empty vs
non-empty ``mgr services`` with carried-over duplicate ports. Cases
consolidated from parallel PR https://github.com/ceph/ceph/pull/68879 .

Authors
-------
Kobi Ginon (@kginonredhat) — fix + test integration for this PR.
Raimund Sacherer (@rsacherer) — original fix + tests in #68879;
coordinated to land a single PR (#68915).

Fixes: https://tracker.ceph.com/issues/76564
Signed-off-by: Kobi Ginon <kginon@redhat.com>
Signed-off-by: Raimund Sacherer <rsachere@redhat.com>
5 weeks agoMerge pull request #68909 from ShwetaBhosale1/fix_nfs_version_build_issue
David Galloway [Fri, 15 May 2026 16:21:21 +0000 (12:21 -0400)]
Merge pull request #68909 from ShwetaBhosale1/fix_nfs_version_build_issue

Use GANESHA_REPO_BASEURL for NFS-Ganesha on all distros

5 weeks agoMerge PR #68931 into main
Patrick Donnelly [Fri, 15 May 2026 15:45:58 +0000 (11:45 -0400)]
Merge PR #68931 into main

* refs/pull/68931/head:
doc/dev: fix release cycle diagram and missing text

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>
5 weeks agoMerge PR #68923 into main
Patrick Donnelly [Fri, 15 May 2026 15:15:46 +0000 (11:15 -0400)]
Merge PR #68923 into main

* refs/pull/68923/head:
script/ptl-tool: consolidate conflict reviews

Reviewed-by: Yuri Weinstein <yweins@redhat.com>
5 weeks agoMerge PR #68921 into main
Patrick Donnelly [Fri, 15 May 2026 15:13:28 +0000 (11:13 -0400)]
Merge PR #68921 into main

* refs/pull/68921/head:
.github/workflows/releng-audit: handle missing case of skipping audit on override

Reviewed-by: Yuri Weinstein <yweins@redhat.com>
5 weeks agoosd: Fix bug when calculating min_peer_features 68936/head
Bill Scales [Fri, 15 May 2026 14:39:25 +0000 (15:39 +0100)]
osd: Fix bug when calculating min_peer_features

PeeringState calculates the minimum set of features for the set
of OSDs within a PG. There is a bug when the peer info has
already been cached: these peers' features are not included
in the calculation. This can lead to the min feature set
including features that not all OSDs have.

Previously this just made some asserts less aggressive than they
should have been. Pull request https://github.com/ceph/ceph/pull/57740
uses min_peer_features to decide how to encode messages to other OSDs.

Midway through an upgrade this bug can cause an OSD to send
the wrong version of a message to a downlevel OSD causing
it to abort.
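
The consequence of skipping cached peers can be illustrated with feature bitmasks. This is a simplified sketch under stated assumptions, not the PeeringState code: the `min_peer_features` function, the `cached` flag, and the `include_cached` switch are all hypothetical ways of modelling the described behaviour.

```python
def min_peer_features(own_features, peers, include_cached):
    """Intersect feature bitmasks across this OSD and its peers.

    `peers` is a list of dicts with `features` (int bitmask) and `cached`
    (bool). With include_cached=False, peers whose info was already cached
    are skipped, which models the bug described above.
    """
    features = own_features
    for peer in peers:
        if peer["cached"] and not include_cached:
            continue
        features &= peer["features"]
    return features


if __name__ == "__main__":
    peers = [
        {"features": 0b011, "cached": True},   # downlevel OSD, info cached
        {"features": 0b111, "cached": False},  # upgraded OSD
    ]
    print(bin(min_peer_features(0b111, peers, include_cached=False)))
    print(bin(min_peer_features(0b111, peers, include_cached=True)))
```

When the downlevel peer is skipped, the result still advertises the highest feature bit, which that peer does not support; including it clears the bit.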

Fixes: https://tracker.ceph.com/issues/76600
Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
5 weeks agorgw/beast: add ssl_ciphersuites option for tls 1.3 68934/head
Casey Bodley [Fri, 15 May 2026 14:40:50 +0000 (10:40 -0400)]
rgw/beast: add ssl_ciphersuites option for tls 1.3

the existing ssl_ciphers option is passed to `SSL_CTX_set_cipher_list()`
which only applies to "TLSv1.2 and below". there's a separate
`SSL_CTX_set_ciphersuites()` for TLSv1.3

because the frontend's default configuration for `ssl_options` accepts
both 1.2 and 1.3, users may need to specify ciphers for each. that's why
`ssl_ciphersuites` is introduced as a separate option

Fixes: https://tracker.ceph.com/issues/76578
Signed-off-by: Casey Bodley <cbodley@redhat.com>
5 weeks agodoc/rados/configuration: recommend wpq for EC clusters seeing slow ops 68932/head
Matthew N. Heler [Fri, 15 May 2026 11:11:35 +0000 (06:11 -0500)]
doc/rados/configuration: recommend wpq for EC clusters seeing slow ops

On large EC clusters, mClock currently routes recovery EC sub-reads
through the immediate queue, skipping throttling. When many OSDs read
from one source during recovery, that source's high-priority queue
saturates and starves client work, producing slow ops. Recommend
falling back to wpq in the mClock config reference until the
scheduler treats those reads as background.

Signed-off-by: Matthew N. Heler <matthew.heler@hotmail.com>
5 weeks agoqa/crimson: disable test-coredump 68880/head
Ronen Friedman [Wed, 13 May 2026 08:06:17 +0000 (08:06 +0000)]
qa/crimson: disable test-coredump

This test exists to aid in debugging coredump creation and collection.
It always fails, as it intentionally leaves the coredump in place for collection.
Disable it in the suite to avoid noise, but leave the test in place for manual
runs when needed.

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
5 weeks agoqa/crimson: add coredump generation test using ASOK assert
Ronen Friedman [Sun, 10 May 2026 13:16:02 +0000 (13:16 +0000)]
qa/crimson: add coredump generation test using ASOK assert

Trigger a crash on a Crimson OSD via the admin socket 'assert'
command and verify the OSD goes down and a coredump is produced.
Exercises the debug_asok_assert_abort path added in the companion
commit.

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
5 weeks agocrimson/osd: support the Assert test ASOK command
Ronen Friedman [Sun, 10 May 2026 12:59:20 +0000 (12:59 +0000)]
crimson/osd: support the Assert test ASOK command

The command was mostly implemented, but was not fully wired
up to the ASOK command handler. This change adds the necessary
command registration.

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
5 weeks agodoc/dev: fix release cycle diagram and missing text 68931/head
Ville Ojamo [Fri, 15 May 2026 09:14:36 +0000 (16:14 +0700)]
doc/dev: fix release cycle diagram and missing text

Introduced in 0a54fcdfc491ce2b2bb3ded77e319a7cff785e73

Signed-off-by: Ville Ojamo <git2233+ceph@ojamo.eu>
5 weeks agoMerge pull request #68359 from ronen-fr/wip-rf-cls-fromerror
Ronen Friedman [Fri, 15 May 2026 03:34:18 +0000 (06:34 +0300)]
Merge pull request #68359 from ronen-fr/wip-rf-cls-fromerror

cls: return EIO instead of ceph::from_error_code()

Reviewed-by: Alex Ainscow <aainscow@uk.ibm.com>
Reviewed-by: Adam C. Emerson <aemerson@redhat.com>
5 weeks agoMerge pull request #68811 from tchaikov/wip-silence-cpp-btree-warnings
Kefu Chai [Fri, 15 May 2026 02:02:06 +0000 (10:02 +0800)]
Merge pull request #68811 from tchaikov/wip-silence-cpp-btree-warnings

include/cpp-btree: fix false -Warray-bounds in child accessors

Reviewed-by: Matan Breizman <mbreizma@redhat.com>
5 weeks agoscript/ptl-tool: consolidate conflict reviews 68923/head
Patrick Donnelly [Fri, 15 May 2026 00:32:54 +0000 (20:32 -0400)]
script/ptl-tool: consolidate conflict reviews

To avoid saying the same things repeatedly.

Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
5 weeks ago.github/workflows/releng-audit: handle missing case of skipping audit on override 68921/head
Patrick Donnelly [Thu, 14 May 2026 23:59:47 +0000 (19:59 -0400)]
.github/workflows/releng-audit: handle missing case of skipping audit on override

If someone adds -fail/-pass and override exists, the label should be
removed and -override respected.

Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
5 weeks agoMerge pull request #68721 from adamemerson/wip-boost-1.91-container-bug
Adam Emerson [Thu, 14 May 2026 23:54:11 +0000 (19:54 -0400)]
Merge pull request #68721 from adamemerson/wip-boost-1.91-container-bug

rgw: Work around Boost.Containers bug in 1.91

Reviewed-by: Kefu Chai <tchaikov@gmail.com>
5 weeks agoMerge PR #68913 into main
Patrick Donnelly [Thu, 14 May 2026 23:51:40 +0000 (19:51 -0400)]
Merge PR #68913 into main

* refs/pull/68913/head:
.github/workflows/releng-audit: reuse existing redmine secret
.github/workflows/releng-audit: consolidate into single job
.github/workflows/releng-audit: handle simultaneous override and fail label changes

Reviewed-by: Yuri Weinstein <yweins@redhat.com>
5 weeks agorgw/cloud-transition: url-encode rgwx-source-key metadata header 68784/head
Matthew N. Heler [Wed, 6 May 2026 16:10:32 +0000 (11:10 -0500)]
rgw/cloud-transition: url-encode rgwx-source-key metadata header

For non-ASCII object keys, raw UTF-8 bytes end up in the signed
x-amz-meta-rgwx-source-key header. Strict S3-compatible backends
normalize non-ASCII bytes when verifying SigV4, producing a signature
mismatch -> HTTP 403, surfaced in LC as -EACCES (-13).

url_encode() the value before signing. The header is write-only,
so no decode is needed.
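
The idea of encoding the value before it is signed can be sketched with the Python standard library. This is an illustration only: `encode_source_key` is a hypothetical helper, and `urllib.parse.quote` is a stand-in whose set of reserved characters may differ from Ceph's url_encode().

```python
from urllib.parse import quote


def encode_source_key(key):
    """Percent-encode an object key so the header value is pure ASCII.

    Non-ASCII characters are encoded as their UTF-8 bytes, so the signer
    and a strict verifier see the same byte sequence.
    """
    return quote(key)


if __name__ == "__main__":
    key = "r\u00e9sum\u00e9.txt"  # object key with two non-ASCII characters
    print(encode_source_key(key))
```

The encoded value contains only ASCII, so there is nothing left for a backend to normalize differently when it recomputes the SigV4 signature.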

Signed-off-by: Matthew N. Heler <matthew.heler@hotmail.com>
5 weeks agoMerge pull request #68409 from kamoltat/wip-ksirivad-hide-tiebreaker
Kamoltat (Junior) Sirivadhna [Thu, 14 May 2026 21:34:52 +0000 (17:34 -0400)]
Merge pull request #68409 from kamoltat/wip-ksirivad-hide-tiebreaker

mon: make tiebreaker mon optional in stretch-mode

Reviewed-by: Greg Farnum <gfarnum@redhat.com>
5 weeks agorgw: group lifecycle versioned deletes to reduce OLH contention 67700/head
Matthew N. Heler [Fri, 6 Mar 2026 17:46:44 +0000 (11:46 -0600)]
rgw: group lifecycle versioned deletes to reduce OLH contention

When multiple versions of the same key expire together, each delete
does a read-modify-write of the OLH on the same bucket index shard.

Buffer versions of the same key during listing and flush on key change.
Groups with multiple versions get pre-evaluated, then hard deletes go
through rgw::multi_delete::dispatch() which skips OLH updates on all
but the last delete. LCOpRule::process() is split into evaluate() and
execute() to support this two-phase pattern.

Non-versioned buckets and single-version groups are unchanged.

Signed-off-by: Matthew N. Heler <matthew.heler@hotmail.com>
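
The buffer-and-flush pattern described in this commit can be sketched as follows. This is a simplified Python model, not the RGW C++ code: the (key, version_id) tuple shape, flush_on_key_change, and the assumption of one OLH update per flushed group are all hypothetical stand-ins for the behavior the message describes.

```python
from typing import Callable, Iterable, List, Tuple

Version = Tuple[str, str]  # (key, version_id); hypothetical shape


def flush_on_key_change(
    listing: Iterable[Version],
    flush: Callable[[List[Version]], None],
) -> None:
    """Buffer consecutive versions of one key; flush when the key changes."""
    buffered: List[Version] = []
    for key, version_id in listing:
        if buffered and buffered[0][0] != key:
            flush(buffered)   # key changed: hand off the finished group
            buffered = []     # start a fresh buffer for the new key
        buffered.append((key, version_id))
    if buffered:
        flush(buffered)       # flush the final group at end of listing


# Listing is assumed to arrive sorted by key, as a bucket listing would.
listing = [("a", "v3"), ("a", "v2"), ("a", "v1"),
           ("b", "v1"),
           ("c", "v2"), ("c", "v1")]

groups: List[List[Version]] = []
flush_on_key_change(listing, groups.append)

# Assumption: each group costs one OLH update (on its last delete only).
olh_updates = len(groups)
print(len(listing), "deletes,", olh_updates, "OLH updates")
```

In this model six deletes need three OLH updates instead of six, and the single-version group for key "b" behaves exactly as it would without grouping.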
5 weeks agomgr/dashboard: adding daemon_name as an arg to nvmeof get bundle API 66908/head
Redouane Kachach [Mon, 13 Apr 2026 13:00:41 +0000 (15:00 +0200)]
mgr/dashboard: adding daemon_name as an arg to nvmeof get bundle API

When cephadm-signed certificates are in use, we need to know exactly which
nvmeof daemon is being used so we get the correct certificates for this
daemon in particular.

Fixes: https://tracker.ceph.com/issues/74377
Signed-off-by: Redouane Kachach <rkachach@ibm.com>
5 weeks agorgw: extract multi-delete OLH grouping for use by lifecycle
Matthew N. Heler [Fri, 6 Mar 2026 04:21:41 +0000 (22:21 -0600)]
rgw: extract multi-delete OLH grouping for use by lifecycle

Move the OLH-aware dispatch logic out of RGWDeleteMultiObj into a
standalone rgw::multi_delete::dispatch() so lifecycle expiration
can group versioned deletes of the same key and skip redundant
OLH updates.

Signed-off-by: Matthew N. Heler <matthew.heler@hotmail.com>
5 weeks agoMerge pull request #67858 from adk3798/cephadm-serialize-osd-rm-status
Redouane Kachach [Thu, 14 May 2026 19:21:44 +0000 (21:21 +0200)]
Merge pull request #67858 from adk3798/cephadm-serialize-osd-rm-status

mgr/cephadm: serialize OSD class before returning for OSD rm status

Reviewed-by: John Mulligan <jmulligan@redhat.com>
5 weeks agoMerge pull request #67694 from ashjosh1git/ceph-tracker-69477-pgscalar
Redouane Kachach [Thu, 14 May 2026 19:19:45 +0000 (21:19 +0200)]
Merge pull request #67694 from ashjosh1git/ceph-tracker-69477-pgscalar

Control PG autoscaler during upgrades with pg_autoscale_during_upgrade

Reviewed-by: Adam King <adking@redhat.com>
5 weeks agocephadm: mgr prepare_create must replace ports, not append
Kobi Ginon [Thu, 14 May 2026 17:56:59 +0000 (20:56 +0300)]
cephadm: mgr prepare_create must replace ports, not append

- Root cause: empty "mgr services" skipped "if ports"; stale list +
  unconditional append duplicated 8765 across redeploys.
- Fix: assign ports + [service_discovery_port] (tracker #76564).
- Repro tip: disable dashboard/prometheus, redeploy mgr repeatedly,
  inspect mgr/cephadm/host.<host> JSON before/after.

Fixes: https://tracker.ceph.com/issues/76564
Signed-off-by: Kobi Ginon <kginon@redhat.com>
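
The root cause and the fix can be modelled with a small sketch. It is not the cephadm code: ports_append and ports_replace are hypothetical functions that reproduce only the pattern the message describes, using the port number 8765 quoted above.

```python
from typing import List

SERVICE_DISCOVERY_PORT = 8765  # port number taken from the commit message


def ports_append(stored: List[int], ports: List[int]) -> List[int]:
    """Buggy pattern: an empty `ports` skips the reset, then appends anyway."""
    result = list(ports) if ports else list(stored)  # stale list survives
    result.append(SERVICE_DISCOVERY_PORT)            # unconditional append
    return result


def ports_replace(stored: List[int], ports: List[int]) -> List[int]:
    """Fixed pattern: always assign ports + [service_discovery_port]."""
    return list(ports) + [SERVICE_DISCOVERY_PORT]


# Three redeploys with no dashboard/prometheus ports ("mgr services" empty).
stale: List[int] = []
fresh: List[int] = []
for _ in range(3):
    stale = ports_append(stale, [])
    fresh = ports_replace(fresh, [])

print("append :", stale)
print("replace:", fresh)
```

With the append pattern the stored list grows by one 8765 entry per redeploy, while the replace pattern keeps a single entry no matter how often the mgr is redeployed.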
5 weeks ago.github/workflows/releng-audit: reuse existing redmine secret 68913/head
Patrick Donnelly [Thu, 14 May 2026 17:40:02 +0000 (13:40 -0400)]
.github/workflows/releng-audit: reuse existing redmine secret

Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
5 weeks ago.github/workflows/releng-audit: consolidate into single job
Patrick Donnelly [Thu, 14 May 2026 17:26:33 +0000 (13:26 -0400)]
.github/workflows/releng-audit: consolidate into single job

In order to make this a required check someday, we can't have the main
job ever be skipped. So, consolidate into a single job and skip actions
based on the router logic.

Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
5 weeks ago.github/workflows/releng-audit: handle simultaneous override and fail label changes
Patrick Donnelly [Thu, 14 May 2026 16:48:09 +0000 (12:48 -0400)]
.github/workflows/releng-audit: handle simultaneous override and fail label changes

And add branch debugging.

Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
5 weeks agoMerge PR #68703 into main
Patrick Donnelly [Thu, 14 May 2026 15:25:26 +0000 (11:25 -0400)]
Merge PR #68703 into main

* refs/pull/68703/head:
script/ptl-tool: continue adding conflicts to review when interactive
script/ptl-tool: improve wording for rationale requests
script/ptl-tool: refactor verify_commit_parity
script/ptl-tool: replace gitauth redirection
doc: document the releng-audit workflow and update release examples
script/ptl-tool, actions: introduce event-driven CI backport auditing
script/ptl-tool: introduce interactive backport parity and conflict verification
script/ptl-tool: use Authorization header

Reviewed-by: John Mulligan <jmulligan@redhat.com>
5 weeks agoMerge pull request #68866 from ochaze/wip-doc-rgw-usage-shards-warning
Casey Bodley [Thu, 14 May 2026 15:24:54 +0000 (11:24 -0400)]
Merge pull request #68866 from ochaze/wip-doc-rgw-usage-shards-warning

doc/rgw: warn about rgw_usage_max_shards consistency

Reviewed-by: Casey Bodley <cbodley@redhat.com>
5 weeks agoMerge pull request #66064 from mheler/lifecycle_monitoring
Casey Bodley [Thu, 14 May 2026 15:14:30 +0000 (11:14 -0400)]
Merge pull request #66064 from mheler/lifecycle_monitoring

rgw/lc: add per-bucket lifecycle performance monitoring

Reviewed-by: Matt Benjamin <mbenjamin@redhat.com>
5 weeks agoqa: ignore pg stuck peering 68907/head
Patrick Donnelly [Thu, 14 May 2026 12:36:38 +0000 (08:36 -0400)]
qa: ignore pg stuck peering

Originally resolved in 9fa163df8f1449a23d186d3fb20610dd1d2e69de but the
changes were lost when upgrade suites were refreshed for the new release.

Fixes: https://tracker.ceph.com/issues/76599
Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
5 weeks agoUse GANESHA_REPO_BASEURL for NFS-Ganesha on all distros 68909/head
Shweta Bhosale [Thu, 14 May 2026 13:49:56 +0000 (19:19 +0530)]
Use GANESHA_REPO_BASEURL for NFS-Ganesha on all distros

Fixes: https://tracker.ceph.com/issues/76603
Signed-off-by: Shweta Bhosale <Shweta.Bhosale1@ibm.com>
5 weeks agoMerge pull request #68842 from ShwetaBhosale1/fix_issue_76504_nfs_to_reuse_cephfsclie...
Redouane Kachach [Thu, 14 May 2026 12:40:42 +0000 (14:40 +0200)]
Merge pull request #68842 from ShwetaBhosale1/fix_issue_76504_nfs_to_reuse_cephfsclient_cache

mgr/nfs: reuse CephfsClient for path checks and earmark resolver

Reviewed-by: Kushal Deb <Kushal.Deb@ibm.com>
Reviewed-by: Ashwin M. Joshi <ashjosh1@in.ibm.com>
5 weeks agoMerge pull request #68646 from ShwetaBhosale1/fix_issue_76284_skip_rdma_device_check_...
Redouane Kachach [Thu, 14 May 2026 12:38:22 +0000 (14:38 +0200)]
Merge pull request #68646 from ShwetaBhosale1/fix_issue_76284_skip_rdma_device_check_for_nfs_during_upgarde

mgr/cephadm: Skip RDMA device check for NFS during upgrade

Reviewed-by: Redouane Kachach <rkachach@ibm.com>
5 weeks agoMerge pull request #67070 from JoshuaGabriel/wip-cephadm-ssh-74551
Redouane Kachach [Thu, 14 May 2026 12:00:55 +0000 (14:00 +0200)]
Merge pull request #67070 from JoshuaGabriel/wip-cephadm-ssh-74551

mgr/cephadm: remove SSH error logs from health detail when host is unreachable

Reviewed-by: Redouane Kachach <rkachach@ibm.com>
5 weeks agoMerge pull request #68699 from Shubhaj1810/fix-issue-IBMCEPH-13078
Redouane Kachach [Thu, 14 May 2026 11:56:03 +0000 (13:56 +0200)]
Merge pull request #68699 from Shubhaj1810/fix-issue-IBMCEPH-13078

cephadm: improve oauth2-proxy validation error messaging

Reviewed-by: Adam King <adking@redhat.com>
5 weeks agoMerge pull request #68712 from yzaken/oauth2_proxy_redirect_dahsboard_browser_to_corr...
Redouane Kachach [Thu, 14 May 2026 11:54:08 +0000 (13:54 +0200)]
Merge pull request #68712 from yzaken/oauth2_proxy_redirect_dahsboard_browser_to_correct_port

mgr/cephadm: redirect browser to correct port by identity provider

Reviewed-by: Redouane Kachach <rkachach@ibm.com>
5 weeks agomgr/dashboard: show warning message in nvmeof cli 68996/head
Vallari Agrawal [Thu, 14 May 2026 11:36:00 +0000 (17:06 +0530)]
mgr/dashboard: show warning message in nvmeof cli

If the return status is 0 and there is an error_message, then it is
a warning message from the gateway; add it to the output string
for plain text output.
It is already there for JSON output.

Fixes: https://tracker.ceph.com/issues/76595
Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com>
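
A minimal sketch of the rule described above, assuming a hypothetical helper append_gateway_warning and a "Warning:" prefix that are not taken from the dashboard code. Handling of non-zero status is outside the scope of this sketch.

```python
from typing import Optional


def append_gateway_warning(
    output: str, status: int, error_message: Optional[str]
) -> str:
    """Append the gateway's message to plain-text output on success.

    A status of 0 combined with a non-empty error_message is treated as
    a warning, not a failure. The "Warning:" prefix is an assumption.
    """
    if status == 0 and error_message:
        return f"{output}\nWarning: {error_message}"
    return output  # no warning to add, or not a success response


with_warning = append_gateway_warning("Success", 0, "listener already exists")
without_warning = append_gateway_warning("Success", 0, "")
print(with_warning)
```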
5 weeks agomds/ScrubStack: set added_children to true for dirfrags too 68686/head
Rishabh Dave [Thu, 30 Apr 2026 07:18:11 +0000 (12:48 +0530)]
mds/ScrubStack: set added_children to true for dirfrags too

Introduced-by: 9e83e1c
Fixes: https://tracker.ceph.com/issues/76321
Signed-off-by: Rishabh Dave <ridave@redhat.com>
5 weeks agotools/cephfs: always execute scan_{extents,inodes,frags} and cleanup 67709/head
Venky Shankar [Thu, 26 Feb 2026 14:43:19 +0000 (20:13 +0530)]
tools/cephfs: always execute scan_{extents,inodes,frags} and cleanup

Even when the number of objects reported from pool stats is zero.
Pool stats metrics are delayed and cannot be fully relied on for
accuracy. Trusting the number of objects (esp. when reported as
zero) could result in missed steps during data-scan execution.

So, we try to do two things now:

1. ProgressTracker::display_progress() will display progress only
when the total number of items exceeds the number of processed items.
2. Refresh total object count during each iteration of processing
objects. This might be a bit too much, so we probably need to
do this periodically rather than on each iteration.

Fixes: http://tracker.ceph.com/issues/75083
Signed-off-by: Venky Shankar <vshankar@redhat.com>
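
The two changes listed in this commit can be sketched as a small Python model. The real ProgressTracker is C++; apart from the display_progress name, the methods, fields, and the sequence of pool-stat readings below are hypothetical and only illustrate the described behavior.

```python
from typing import List, Optional


class ProgressTracker:
    """Toy model: processing never depends on the (possibly stale) total."""

    def __init__(self, total: int) -> None:
        self.total = total
        self.done = 0

    def refresh_total(self, total: int) -> None:
        self.total = total  # pool stats may lag behind reality

    def advance(self) -> None:
        self.done += 1

    def display_progress(self) -> Optional[str]:
        # Only show progress when the total exceeds the processed count.
        if self.total > self.done:
            return f"{self.done}/{self.total} ({100 * self.done // self.total}%)"
        return None


# Pool stats report 0 objects at first, then catch up (sample readings).
stat_readings = [0, 0, 4]
tracker = ProgressTracker(total=0)
lines: List[Optional[str]] = []
for reported_total in stat_readings:
    tracker.refresh_total(reported_total)  # refresh on each iteration
    tracker.advance()                      # the object is processed regardless
    lines.append(tracker.display_progress())

print(tracker.done, lines)
```

All three objects are processed even while the reported total is zero; only the progress display is suppressed until the total becomes meaningful.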
5 weeks agoMerge PR #64774 into main
Venky Shankar [Thu, 14 May 2026 09:27:21 +0000 (14:57 +0530)]
Merge PR #64774 into main

* refs/pull/64774/head:
test_cephfs.py: delete purge_dir() helper method, use rmtree() instead
test_cephfs.py: remove rendundant call to purge_dir()
test_cephfs.py: test rmtree on root
pybind/cephfs: don't attempt to unlink root in rmtree
test_cephfs.py: test rmtree with and without should_cancel
pybind/cephfs: make should_cancel option parameter for rmtree()
mgr/volumes: clone using cptree() from cephfs python bindings
test_cephfs: add unit tests for cptree() in cephfs python bindings
test/pybind/assertions: add helper method assert_less
pybind/cephfs: use depth-first, non-recursive approach for cloning
test_cephfs: call object setup/teardown for all tests in TestWithRootUser
test_cephfs.py: add tests for utimensat()
pybind/cephfs: add python bindings for utimensat()
qa/cephfs: add tests for chownat()
pybind/cephfs: add python bindings for chownat()
test_cephfs.py: add tests for chmodat()
pybind/cephfs: add python bindings for chmodat()
test_cephfs.py: add tests for symlinkat()
pybind/cephfs: add python binding for symlinkat()
test_cephfs.py: add test for readlinkat()
pybind/cephfs: add python binding for readlinkat()
pybind/cephfs: add tests for statxat()
pybind/cephfs: add python bindings for statxat()
test_cephfs.py: add tests for mkdirat()
pybind/cephfs: add python binding for mkdirat()

Reviewed-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Jos Collin <jcollin@redhat.com>
5 weeks agomgr/cephadm: fix get_cert_with_label to use host instead of fqdn
Redouane Kachach [Wed, 11 Feb 2026 14:57:56 +0000 (15:57 +0100)]
mgr/cephadm: fix get_cert_with_label to use host instead of fqdn

Fixes: https://tracker.ceph.com/issues/75009
Signed-off-by: Redouane Kachach <rkachach@ibm.com>
5 weeks agomgr/cephadm: adding UT for this new functionality (leftover cleanup)
Redouane Kachach [Wed, 11 Feb 2026 15:55:21 +0000 (16:55 +0100)]
mgr/cephadm: adding UT for this new functionality (leftover cleanup)

Fixes: https://tracker.ceph.com/issues/75009
Signed-off-by: Redouane Kachach <rkachach@ibm.com>
5 weeks agomgr/cephadm: cleanup leftover certs/keys after cert_src changes
Redouane Kachach [Wed, 11 Feb 2026 13:36:01 +0000 (14:36 +0100)]
mgr/cephadm: cleanup leftover certs/keys after cert_src changes

This PR improves certificate cleanup when a service switches
certificate sources (cephadm-signed <-> inline/reference). It also adds
best-effort post-remove helpers to purge stale cephadm-managed
cert/key pairs. Inline-stored (non-editable) certs/keys are removed,
while referenced/user-managed (editable) credentials are preserved.

Fixes: https://tracker.ceph.com/issues/75009
Signed-off-by: Redouane Kachach <rkachach@ibm.com>
5 weeks agomgr/cephadm: adding tls fields as deps for services with TLS support
Redouane Kachach [Wed, 11 Feb 2026 11:17:55 +0000 (12:17 +0100)]
mgr/cephadm: adding tls fields as deps for services with TLS support

This is especially important for inline certificates, so the certmgr
store is updated automatically whenever the user changes the values in
the spec and reapplies it.

Fixes: https://tracker.ceph.com/issues/75009
Signed-off-by: Redouane Kachach <rkachach@ibm.com>
5 weeks agoMerge pull request #67087 from ShwetaBhosale1/fix_issue_74479_nfs_active_active_suppo...
Redouane Kachach [Thu, 14 May 2026 08:58:08 +0000 (10:58 +0200)]
Merge pull request #67087 from ShwetaBhosale1/fix_issue_74479_nfs_active_active_support_allow_colo

mgr/cephadm: Allow colocation of NFS daemon to support active-active mode

Reviewed-by: Adam King <adking@redhat.com>
5 weeks agomgr/dashboard: Carbonize cluster-wide OSD flags modal 68891/head
Sagar Gopale [Wed, 13 May 2026 14:13:56 +0000 (19:43 +0530)]
mgr/dashboard: Carbonize cluster-wide OSD flags modal

Fixes: https://tracker.ceph.com/issues/76580
Signed-off-by: Sagar Gopale <sagar.gopale@ibm.com>