Krunal Chheda [Wed, 20 May 2026 18:14:22 +0000 (14:14 -0400)]
rgw/notification: fix zero eventTime in bucket notifications on concurrent PUT race
When concurrent PUTs target the same object key, RADOS may return
-ECANCELED to the losing writers. In that path *meta.mtime was never
populated from meta.set_mtime, leaving mtime at epoch (zero), which
propagated into bucket notification eventTime as
"1970-01-01T00:00:00.000Z".
Fix: set *meta.mtime from meta.set_mtime before returning 0 in the
ECANCELED/ENOENT/EEXIST early-return block, matching the behaviour of
the successful write path.
Also add a regression test that fires 20 concurrent threads writing
the same key and asserts no event in the persistent queue carries a
zero eventTime.
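The race and the check made by the regression test can be sketched with a toy model. This is an illustration only, not RGW code: ToyStore, put, run_concurrent_puts and WRITE_TIME are hypothetical names, and the "first writer wins, the rest take the early-return path" rule is an assumption standing in for RADOS returning -ECANCELED.

```python
import threading
from datetime import datetime, timezone

EPOCH_EVENT_TIME = "1970-01-01T00:00:00.000Z"
WRITE_TIME = 1_700_000_000.0  # arbitrary non-zero timestamp used by every writer


def format_event_time(mtime: float) -> str:
    """Render a POSIX timestamp as ISO-8601 UTC with millisecond precision."""
    dt = datetime.fromtimestamp(mtime, tz=timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%S.") + f"{dt.microsecond // 1000:03d}Z"


class ToyStore:
    """Hypothetical stand-in for the write path; only models the mtime handling."""

    def __init__(self, fixed: bool):
        self.fixed = fixed
        self.lock = threading.Lock()
        self.winner_chosen = False
        self.events = []

    def put(self, key: str, set_mtime: float) -> None:
        mtime = 0.0  # the out-parameter starts at epoch
        with self.lock:
            lost_race = self.winner_chosen
            self.winner_chosen = True
        if lost_race:
            # Early-return path (modelled on the ECANCELED case).
            if self.fixed:
                mtime = set_mtime  # the fix: populate mtime before returning
        else:
            mtime = set_mtime  # successful write path
        with self.lock:
            self.events.append({"key": key, "eventTime": format_event_time(mtime)})


def run_concurrent_puts(fixed: bool, writers: int = 20) -> list:
    """Fire `writers` threads at one key and return the events with a zero eventTime."""
    store = ToyStore(fixed)
    threads = [
        threading.Thread(target=store.put, args=("obj", WRITE_TIME))
        for _ in range(writers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [e for e in store.events if e["eventTime"] == EPOCH_EVENT_TIME]
```

In this model, run_concurrent_puts(fixed=False) returns one zero-eventTime event per losing writer, while run_concurrent_puts(fixed=True) returns an empty list, which is the condition the regression test asserts.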
mgr/DaemonServer: auto-tune stats period when message queue gets backed up
The mgr can get overwhelmed when there's a lot of cluster activity and
daemons are sending stats reports faster than we can process them.
This commit adds logic to monitor the messenger queue depth and bump
up mgr_stats_period when things get congested. This reduces the
frequency of daemon stat reports, allowing the mgr to process existing
reports without being overwhelmed by new ones. The period automatically
scales back down when the queue clears up.
Adds mgr_stats_period_autotune (on by default) and a queue threshold
setting. The maximum period is capped at 60 seconds to prevent
excessive stat delays.
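A minimal sketch of the kind of control loop described above. The 60 second cap comes from the commit message; the function name, its parameters, and the doubling/halving policy are assumptions made for illustration and may not match how the mgr actually scales the period.

```python
MAX_STATS_PERIOD = 60  # seconds; the cap stated in the commit message


def next_stats_period(current: int, base: int, queue_depth: int, threshold: int) -> int:
    """Return the stats period to use for the next interval.

    current     -- period currently in effect (seconds)
    base        -- the configured period to recover to (seconds)
    queue_depth -- observed messenger queue depth
    threshold   -- queue depth above which the mgr counts as congested

    Assumed policy: double while congested, halve while the queue is clear.
    """
    if queue_depth > threshold:
        # Congested: back off, but never beyond the cap.
        return min(current * 2, MAX_STATS_PERIOD)
    # Queue is clear: scale back down, but never below the configured period.
    return max(current // 2, base)
```

Starting from a base period of 5 seconds, repeated congested intervals in this sketch step the period through 10, 20, 40 and then hold it at 60; once the queue drops below the threshold it steps back down until it reaches 5 again.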
Kefu Chai [Tue, 19 May 2026 12:58:10 +0000 (20:58 +0800)]
debian/rules: strip ceph-osd-classic and ceph-osd-crimson
override_dh_strip enumerates each binary package explicitly. It was not
updated when ceph-osd was split into the ceph-osd-classic and
ceph-osd-crimson implementation packages, so the OSD binaries in those
two packages are shipped unstripped (ceph-osd-crimson installs at ~4.6
GiB) and their -dbg packages are left empty.
Add the missing dh_strip invocations so the OSD binaries are stripped
and their debug symbols land in the corresponding -dbg packages, as is
already done for every other binary package.
Afreen Misbah [Mon, 18 May 2026 20:06:35 +0000 (01:36 +0530)]
mgr/dashboard: fix remaining FA icon references and test failures
- Fix icon size mismatches and HTML lint errors
- Fix remaining FA icon references in tests
- Replace FA icons with Carbon in upgrade component:
use cds-inline-loading for spinners, cd-icon for status icons
- Update test selectors for Carbon icon queries
Fixes: https://tracker.ceph.com/issues/76631 Signed-off-by: Afreen Misbah <afreen23@gmail.com> Assisted-by: Claude
Afreen Misbah [Sun, 17 May 2026 16:43:59 +0000 (22:13 +0530)]
mgr/dashboard: fix filter icon alignment in table toolbar
Replace Bootstrap inline styles with proper CSS class for filter
icon and select dropdowns alignment. Created filter-wrapper class
to properly align filter icon with select elements using flexbox.
Signed-off-by: Afreen Misbah <afreen@ibm.com> Assisted-by: Claude Fixes: https://tracker.ceph.com/issues/76631
Afreen Misbah [Sun, 17 May 2026 15:07:45 +0000 (20:37 +0530)]
mgr/dashboard: fix missing loader and zone group icon
- Add state="active" to cds-inline-loading in card-row component
to properly show loading spinner for table row actions
- Replace parentChild icon with clusterIcon (web-services--cluster)
for zone group representation in RGW multisite
- Remove parentChild from Icons enum and replace with
WebServicesCluster in components.module.ts
- Import ComponentsModule in rgw.module.ts for cd-icon support
Signed-off-by: Afreen Misbah <afreen@ibm.com> Assisted-by: Claude Fixes: https://tracker.ceph.com/issues/76631
Added LoadingModule and InlineLoadingModule imports to:
- block.module.ts
- cephfs.module.ts
- cluster.module.ts
(rgw.module.ts and components.module.ts already had them)
Signed-off-by: Afreen Misbah <afreen@ibm.com> Assisted-by: Claude Fixes: https://tracker.ceph.com/issues/76631
Afreen Misbah [Sun, 17 May 2026 00:14:41 +0000 (05:44 +0530)]
mgr/dashboard: remove font awesome references
- Remove .fa and .fa-* class styles from component SCSS files
- Remove FA icon spacing rules from global styles
- Clean up .fa-stack styles (FA stacking feature)
- Remove FA-specific color styles
- Remove FA icons
Signed-off-by: Afreen Misbah <afreen@ibm.com> Assisted-by: Claude Fixes: https://tracker.ceph.com/issues/76631
Bill Scales [Tue, 19 May 2026 06:05:13 +0000 (07:05 +0100)]
doc/dev/internals: Improve Ceph Internals TOC
The Ceph internals section of the docs is a bit of a mess
as far as the table of contents is concerned. This commit
adds more structure by grouping topics by area and arranging
them in a more logical order.
Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
rgw/dedup: add --allow/deny-bucket-list and --allow/deny-storage-class-list to dedup commands
Resolves: bz#2413730 Signed-off-by: Gabriel BenHanokh <gbenhano@redhat.com>
Patrick Donnelly [Mon, 18 May 2026 14:20:08 +0000 (10:20 -0400)]
Merge PR #68937 into main
* refs/pull/68937/head:
.github/workflows/releng-audit: group events to serialize executions
.github/workflows/releng-audit: remove override on reopen
.github/workflows/releng-audit: refactor auth check to function
Afreen Misbah [Mon, 18 May 2026 10:01:58 +0000 (15:31 +0530)]
mgr/dashboard: fix logs e2e tests after carbonization
Update e2e test selectors to match the new Carbon component structure.
The .card-body and .message classes were replaced with .log-viewer
and .log-entry__message after carbonizing the logs component.
Assisted-by: Claude Signed-off-by: Afreen Misbah <afreen@ibm.com>
Afreen Misbah [Sun, 17 May 2026 14:53:54 +0000 (20:23 +0530)]
mgr/dashboard: Carbonize upgrade page
- Made cluster status clickable to navigate to overview when not HEALTH_OK
- Replaced Bootstrap classes with Carbon design tokens
- Updated upgrade.component.scss to use CSS custom properties
Assisted-by: Claude Signed-off-by: Afreen Misbah <afreenmisbah@ibm.com>
Afreen Misbah [Tue, 12 May 2026 12:07:39 +0000 (17:37 +0530)]
mgr/dashboard: Fix edit and delete access for pool-manager role
Fixes https://tracker.ceph.com/issues/76561
- allows deleting pools in pool-manager role by bypassing config-opt read permissions
- allows editing in the pool-manager role, which was failing due to missing rbd mirroring permissions
- fixes a bug in pool edit mode where editing both compression and name failed because of an if-else logic error
Kefu Chai [Wed, 6 May 2026 02:08:20 +0000 (10:08 +0800)]
cmake/BuildISAL: build and install library targets only
Skip building the igzip executables; Ceph only needs libisal.la.
This should speed up the build slightly, as we no longer build the
executables previously built with "make".
Shai Fultheim [Sat, 16 May 2026 20:17:59 +0000 (23:17 +0300)]
crimson/os/seastore: fix cleaner space leak from shadowed result list
TransactionManager::get_extents_if_live() declared an inner
std::list<CachedExtentRef> res inside the "extent is cached" branch
that shadowed the outer res returned by the coroutine. When the
queried extent was present in the cache, it was moved into the inner
list and immediately discarded, and the empty outer list was returned
to the caller.
The async cleaner uses this result to decide whether to rewrite an
extent or treat it as dead. For recently-allocated LBA tree internal
nodes (still hot in cache), the shadowed return caused the cleaner to
skip them, so mark_space_free() never paired with the earlier
mark_space_used(). Each affected reclaim leaked exactly one extent
(4 KiB for LADDR_INTERNAL), tripping the live_bytes != 0 assertion in
SegmentCleaner::clean_space() (async_cleaner.cc:1441) once a victim
segment with such a leftover was selected.
The reproducer (at ~70% full) deterministically aborted within ~3
minutes before this fix; with the fix the OSDs run cleanly past the
trigger point.
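The accounting effect described above can be illustrated with a toy model. This is not SeaStore code: the Python functions below only borrow the names from the commit message, every extent is assumed to be live, and the discarded inner list is represented by a local that is never returned.

```python
EXTENT_SIZE = 4096  # bytes; the LADDR_INTERNAL extent size quoted above


def get_extents_if_live(extent: str, cache: set, shadow_bug: bool) -> list:
    """Toy version of the lookup: return the extent if it is live."""
    res = []
    if extent in cache:
        if shadow_bug:
            # Models the shadowing inner list: filled, then dropped.
            inner = [extent]  # noqa: F841 (intentionally unused)
        else:
            res.append(extent)
    else:
        # Not cached: assumed to be read back and found live in this toy.
        res.append(extent)
    return res


def leftover_live_bytes(segment_extents: list, cache: set, shadow_bug: bool) -> int:
    """Reclaim a segment and return the bytes still accounted as live."""
    # Each extent was accounted once when allocated (mark_space_used).
    live_bytes = len(segment_extents) * EXTENT_SIZE
    for extent in segment_extents:
        if get_extents_if_live(extent, cache, shadow_bug):
            # Extent is rewritten elsewhere and its space released
            # (mark_space_free).
            live_bytes -= EXTENT_SIZE
        # An empty result means "dead": nothing is rewritten or freed.
    return live_bytes
```

With one cached extent in the segment, leftover_live_bytes(["a", "b", "c"], {"b"}, shadow_bug=True) leaves 4096 bytes accounted as live, the kind of leftover that would trip a live_bytes != 0 check, while the same call with shadow_bug=False leaves 0.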
Kefu Chai [Sat, 16 May 2026 02:53:41 +0000 (10:53 +0800)]
doc/dev: refresh vstart.sh options in dev_cluster_deployment
Bring doc/dev/dev_cluster_deployment.rst back in line with the current
src/vstart.sh:
* drop the removed -K/--kstore objectstore backend
* drop -N/--not-new, which was dropped in 8dd2e418; reusing the existing
cluster config is simply the default when -n is not given
* correct the --rgw_frontend default from civetweb to beast
* note that -b/--bluestore is the default objectstore backend
* update the example and add a note that a fresh build needs -n on the
first run, while later runs can omit it
* note that the option list is not exhaustive and point at src/vstart.sh
Matthew N. Heler [Fri, 15 May 2026 11:11:35 +0000 (06:11 -0500)]
doc/rados/configuration: recommend wpq for EC clusters seeing slow ops
On large EC clusters, mClock currently routes recovery EC sub-reads
through the immediate queue, skipping throttling. When many OSDs read
from one source during recovery, that source's high-priority queue
saturates and starves client work, producing slow ops. Recommend
falling back to wpq in the mClock config reference until the
scheduler treats those reads as background.
Signed-off-by: Matthew N. Heler <matthew.heler@hotmail.com>
Ronen Friedman [Wed, 13 May 2026 08:06:17 +0000 (08:06 +0000)]
qa/crimson: disable test-coredump
This test exists to aid in debugging coredump creation and collection.
It always fails (as it intentionally leaves the coredump in place for collection).
Disabling it in the suite to avoid noise, but leaving the test in place for manual
runs when needed.
Ronen Friedman [Sun, 10 May 2026 13:16:02 +0000 (13:16 +0000)]
qa/crimson: add coredump generation test using ASOK assert
Trigger a crash on a Crimson OSD via the admin socket 'assert'
command and verify the OSD goes down and a coredump is produced.
Exercises the debug_asok_assert_abort path added in the companion
commit.