This commit introduces performance counters for individual Ceph mgr modules.
These counters allow monitoring module behavior, debugging latency issues,
and identifying performance bottlenecks, all without modifying the modules themselves.
The following counters are now exposed under:
> ceph daemon mgr.<id> perf dump
Example structure:
    "mgr_module_<module_name>": {
        "notify_avg_usec": {            <- Average time spent handling notify events
            "avgcount": 0,
            "sum": 0
        },
        "cmd_avg_usec": {               <- Average time spent processing CLI/admin commands
            "avgcount": 0,
            "sum": 0
        },
        "serve_avg_usec": {             <- Average time spent in module serve loop (if applicable)
            "avgcount": 0,
            "sum": 0
        },
        "alive": 1,                     <- Module is alive (1 = running, 0 = exited)
        "cpu_usage": 0,                 <- CPU usage in percent
        "mem_rss_change": 0,            <- Memory RSS change in bytes
        "mem_rss_current": 490737664    <- Current memory RSS in bytes
    }
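As a rough illustration of how the averaged counters above might be consumed, the sketch below derives a mean latency from each avgcount/sum pair. It assumes that "sum" holds the accumulated microseconds and "avgcount" the number of samples, which the counter names suggest but this message does not state. The helper names and the sample values are hypothetical.

```python
import json


def average_usec(counter):
    """Return sum / avgcount for one averaged counter, or None if it has no samples."""
    count = counter.get("avgcount", 0)
    if count == 0:
        return None
    return counter["sum"] / count


def module_latencies(perf_dump, module_name):
    """Map each averaged counter of one mgr module to its mean value.

    Scalar counters such as "alive" or "cpu_usage" are skipped.
    """
    section = perf_dump[f"mgr_module_{module_name}"]
    return {
        name: average_usec(value)
        for name, value in section.items()
        if isinstance(value, dict) and "avgcount" in value
    }


# Hypothetical sample shaped like the structure above (values are made up).
sample = json.loads("""
{
  "mgr_module_balancer": {
    "notify_avg_usec": {"avgcount": 4, "sum": 200},
    "cmd_avg_usec": {"avgcount": 0, "sum": 0},
    "serve_avg_usec": {"avgcount": 2, "sum": 3000},
    "alive": 1,
    "cpu_usage": 0,
    "mem_rss_change": 0,
    "mem_rss_current": 490737664
  }
}
""")

latencies = module_latencies(sample, "balancer")
```

A counter with avgcount 0 yields None rather than 0, so "no samples yet" stays distinguishable from "zero latency".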
Signed-off-by: Nitzan Mordechai <nmordech@ibm.com>
Conflicts:
src/mgr/ActivePyModules.cc - finisher.queue changed by 63859, adding py_module to the parameter list
src/mgr/PyModuleRegistry.cc - check_all_modules_started added by 63859
Jan Radon [Fri, 15 May 2026 13:42:08 +0000 (15:42 +0200)]
feat(rgw/kafka): add mTLS client certificate authentication for Kafka notifications
Add support for mutual TLS (mTLS) client certificate authentication
when publishing bucket notifications to Kafka brokers. RGW can now
present a client certificate and private key to authenticate with
brokers that require ssl.client.auth=required.
Changes:
- Add ssl-certificate-location, ssl-key-location, and ssl-key-password
topic attributes for configuring client certificates
- Validate that ssl_certificate and ssl_key are provided together
- Include ssl_key_password in connection identity (hash/equality)
- Add kafka-security.sh script for generating broker and client TLS certs
- Add mTLS test (test_notification_kafka_security_ssl_mtls) using
use_mtls=True flag on the existing SSL security path
- Update RGW notifications documentation with mTLS parameters
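The pairing check and the identity change listed above can be sketched roughly as follows. This is not the RGW implementation: the class name, field names, and error text are hypothetical, and the sketch only shows a certificate and key being required together, with the key password taking part in equality and hashing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class KafkaConnectionId:
    """Hypothetical connection identity; frozen dataclasses hash and compare by field."""
    broker: str
    ssl_certificate: Optional[str] = None
    ssl_key: Optional[str] = None
    ssl_key_password: Optional[str] = None

    def __post_init__(self):
        # A client certificate is useless without its key, and vice versa.
        if (self.ssl_certificate is None) != (self.ssl_key is None):
            raise ValueError("ssl_certificate and ssl_key must be provided together")


a = KafkaConnectionId("broker:9093", "/etc/client.pem", "/etc/client.key", "secret-1")
b = KafkaConnectionId("broker:9093", "/etc/client.pem", "/etc/client.key", "secret-2")
same_as_a = KafkaConnectionId("broker:9093", "/etc/client.pem", "/etc/client.key", "secret-1")
```

Because the password is part of the identity, two topics that differ only in ssl_key_password would not share a cached connection in this sketch.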
Fixes: http://tracker.ceph.com/issues/67427
Signed-off-by: Jan Radon <jan.fabian.radon@sap.com>
The new implementation retires an absent extent by constructing a real
empty extent and adding it to the transaction's retired_set, instead of
creating a retired placeholder.
osd/scrub: limit scrubbing under snap-trimming overload
When the snap-trim queues are long, scrubbing is likely to
make things worse. This change adds a new scrubbing restriction
for that case, and prevents periodic scrubs from starting when
the total snap-trim queue length across all PGs exceeds a
configurable threshold.
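The restriction described above amounts to a simple gate, sketched here under assumptions: the function and parameter names are hypothetical, and the real check lives in the OSD scrub scheduler rather than in Python.

```python
def periodic_scrub_allowed(snap_trim_queue_lens, max_total_queue_len):
    """Allow a periodic scrub only while the summed snap-trim backlog is within the limit.

    snap_trim_queue_lens: per-PG snap-trim queue lengths (hypothetical input).
    max_total_queue_len: the configurable threshold (hypothetical name).
    """
    return sum(snap_trim_queue_lens) <= max_total_queue_len
```

For example, with a threshold of 40, per-PG queues of 10, 20 and 5 would still permit a periodic scrub, while 10, 20 and 15 would not.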
Maodi Ma [Wed, 5 Nov 2025 02:35:46 +0000 (02:35 +0000)]
common: enable AVX512+VPCLMULQDQ for crc32c performance on x86
- Add crc32_iscsi_by16_10 in src/isa-l into candidates for ceph_crc32c
- Add hardware capability check for AVX512 instructions before registration
- Add NASM feature check to ensure compatibility and to enable
AS_FEATURE_LEVEL in crc32_iscsi_by16_10.asm
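The selection logic can be pictured roughly as below. This is a sketch only: the set of required CPU flags is an assumption (written as the flag names Linux reports in /proc/cpuinfo), the fallback name is hypothetical, and in the real change the assembler check happens at build time while the CPU check happens at run time.

```python
# Assumed requirement set for the AVX512+VPCLMULQDQ kernel; not taken from the commit.
REQUIRED_FLAGS = frozenset({"avx512f", "avx512vl", "vpclmulqdq"})


def pick_crc32c_impl(cpu_flags, assembler_supports_avx512):
    """Pick the fast kernel only when both the build and the CPU support it."""
    if assembler_supports_avx512 and REQUIRED_FLAGS <= set(cpu_flags):
        return "crc32_iscsi_by16_10"
    return "fallback"  # hypothetical name for any previously available candidate
```

Gating on both conditions keeps the new kernel from being registered on hosts, or in builds, that cannot execute it.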
Ville Ojamo [Wed, 20 May 2026 06:26:08 +0000 (13:26 +0700)]
doc/rados: move label to right place in pools.rst
The label is named setting and not unsetting, so move it from the
unsetting section to the setting section.
The label is used only once and the context in which it is used is also
more fitting for the setting section.
Signed-off-by: Ville Ojamo <git2233+ceph@ojamo.eu>
Ronen Friedman [Sat, 16 May 2026 14:39:51 +0000 (14:39 +0000)]
crimson/osd: decouple snap trim initiation from scrub completion
Add SnapTrimInitiate operation so kick_snap_trim() no longer calls
on_active_actmap() inline during scrub completion, which nested
conflicting with_interruption contexts and hit an assertion.
mgr/DaemonServer: auto-tune stats period when message queue gets backed up
The mgr can get overwhelmed when there's a lot of cluster activity and
daemons are sending stats reports faster than we can process them.
This commit adds logic to monitor the messenger queue depth and bump
up mgr_stats_period when things get congested. This reduces the
frequency of daemon stat reports, allowing the mgr to process existing
reports without being overwhelmed by new ones. The period automatically
scales back down when the queue clears up.
Added mgr_stats_period_autotune (on by default) and a queue threshold
setting. The maximum period is capped at 60 seconds to prevent excessive
stat delays.
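A minimal sketch of the tuning loop described above follows. The step policy is an assumption: the message only says the period is bumped up under congestion, scales back down when the queue clears, and is capped at 60 seconds. Doubling and halving are used here for illustration, and all names are hypothetical.

```python
MAX_PERIOD = 60  # seconds, the cap stated in the commit message


def next_stats_period(current, base, queue_depth, threshold, max_period=MAX_PERIOD):
    """Return the stats period to use for the next interval.

    current: period currently in effect (seconds)
    base: the configured period to fall back to (seconds)
    queue_depth: observed messenger queue depth
    threshold: queue depth above which the mgr counts as congested
    """
    if queue_depth > threshold:
        # Congested: back off, but never beyond the cap.
        return min(current * 2, max_period)
    # Queue has cleared: step back toward the configured period.
    return max(current // 2, base)
```

With a base of 5 seconds and a threshold of 50, a queue depth of 100 would move the period from 5 to 10, and from 40 to the 60-second cap; once the queue drains, 60 would step down to 30 and eventually settle at 5.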
Patrick Donnelly [Tue, 19 May 2026 14:10:33 +0000 (10:10 -0400)]
.github/workflows/releng-audit: update workflows
To avoid this warning:
> Warning: Node.js 20 actions are deprecated. The following actions are running on Node.js 20 and may not work as expected: actions/checkout@v3, actions/setup-python@v4. Actions will be forced to run with Node.js 24 by default starting June 2nd, 2026. Node.js 20 will be removed from the runner on September 16th, 2026. Please check if updated versions of these actions are available that support Node.js 24. To opt into Node.js 24 now, set the FORCE_JAVASCRIPT_ACTIONS_TO_NODE24=true environment variable on the runner or in your workflow file. Once Node.js 24 becomes the default, you can temporarily opt out by setting ACTIONS_ALLOW_USE_UNSECURE_NODE_VERSION=true. For more information see: https://github.blog/changelog/2025-09-19-deprecation-of-node-20-on-github-actions-runners/
Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
Kefu Chai [Tue, 19 May 2026 12:58:10 +0000 (20:58 +0800)]
debian/rules: strip ceph-osd-classic and ceph-osd-crimson
override_dh_strip enumerates each binary package explicitly. It was not
updated when ceph-osd was split into the ceph-osd-classic and
ceph-osd-crimson implementation packages, so the OSD binaries in those
two packages are shipped unstripped (ceph-osd-crimson installs at ~4.6
GiB) and their -dbg packages are left empty.
Add the missing dh_strip invocations so the OSD binaries are stripped
and their debug symbols land in the corresponding -dbg packages, as is
already done for every other binary package.
Afreen Misbah [Mon, 18 May 2026 20:06:35 +0000 (01:36 +0530)]
mgr/dashboard: fix remaining FA icon references and test failures
- Fix icon size mismatches and HTML lint errors
- Fix remaining FA icon references in tests
- Replace FA icons with Carbon in upgrade component:
use cds-inline-loading for spinners, cd-icon for status icons
- Update test selectors for Carbon icon queries
Fixes: https://tracker.ceph.com/issues/76631
Signed-off-by: Afreen Misbah <afreen23@gmail.com>
Assisted-by: Claude
Afreen Misbah [Sun, 17 May 2026 16:43:59 +0000 (22:13 +0530)]
mgr/dashboard: fix filter icon alignment in table toolbar
Replace Bootstrap inline styles with proper CSS class for filter
icon and select dropdowns alignment. Created filter-wrapper class
to properly align filter icon with select elements using flexbox.
Signed-off-by: Afreen Misbah <afreen@ibm.com>
Assisted-by: Claude
Fixes: https://tracker.ceph.com/issues/76631
Afreen Misbah [Sun, 17 May 2026 15:07:45 +0000 (20:37 +0530)]
mgr/dashboard: fix missing loader and zone group icon
- Add state="active" to cds-inline-loading in card-row component
to properly show loading spinner for table row actions
- Replace parentChild icon with clusterIcon (web-services--cluster)
for zone group representation in RGW multisite
- Remove parentChild from Icons enum and replace with
WebServicesCluster in components.module.ts
- Import ComponentsModule in rgw.module.ts for cd-icon support
Signed-off-by: Afreen Misbah <afreen@ibm.com>
Assisted-by: Claude
Fixes: https://tracker.ceph.com/issues/76631
Added LoadingModule and InlineLoadingModule imports to:
- block.module.ts
- cephfs.module.ts
- cluster.module.ts
(rgw.module.ts and components.module.ts already had them)
Signed-off-by: Afreen Misbah <afreen@ibm.com>
Assisted-by: Claude
Fixes: https://tracker.ceph.com/issues/76631
Afreen Misbah [Sun, 17 May 2026 00:14:41 +0000 (05:44 +0530)]
mgr/dashboard: remove font awesome references
- Remove .fa and .fa-* class styles from component SCSS files
- Remove FA icon spacing rules from global styles
- Clean up .fa-stack styles (FA stacking feature)
- Remove FA-specific color styles
- Remove FA icons
Signed-off-by: Afreen Misbah <afreen@ibm.com>
Assisted-by: Claude
Fixes: https://tracker.ceph.com/issues/76631
Bill Scales [Tue, 19 May 2026 06:05:13 +0000 (07:05 +0100)]
doc/dev/internals: Improve Ceph Internals TOC
The Ceph internals section of the docs is a bit of a mess
as far as the table of contents is concerned. This commit
tries to add a bit more structure by grouping topics by
area and arranging them in a more logical order.
Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
rgw/dedup: add --allow/deny-bucket-list and --allow/deny-storage-class-list to dedup commands
Resolves: bz#2413730
Signed-off-by: Gabriel BenHanokh <gbenhano@redhat.com>
Casey Bodley [Tue, 12 May 2026 18:58:16 +0000 (14:58 -0400)]
librados/asio: clear cancellation slot in associated executor
the librados callback function `AsyncOp::aio_dispatch()` runs on
Objecter's finisher strand executor, and dispatches the completion
handler to its associated executor
asio cancellation is not thread-safe, so should be synchronized on that
associated executor. move the call to `slot.clear()` from that librados
callback into the AsyncHandler wrapper so it doesn't run until we've
switched to the correct executor
because our `op_cancellation` handler depends on an `AioCompletion`
pointer, we have to clear the cancellation slot before that
`AioCompletion` lifetime ends