the vhost-style transformations ran in RGWREST::preprocess() before we
even route the request, so they applied to every REST API in radosgw
vhost-style requests are specific to the S3 API, so they should only
apply after being routed to RGWRESTMgr_S3
extract the vhost logic from RGWREST::preprocess() into
rgw_rest_transform_s3_vhost_style(), and call that only from
RGWRESTMgr_S3::get_resource_mgr_as_default()
url-decoding of request_uri into decoded_uri is now duplicated in
preprocess() to apply to all requests, then again after vhost-style
transforms the request_uri
avoid allocating a list of strings to parse the comma-separated
rgw_enable_apis configuration
the range returned by ceph::split() has no size() function, so change
the calculation to not require it - `size() - distance(begin(), pos)`
is the same thing as `distance(pos, end())`
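A minimal sketch of the equivalence above, assuming a forward-only range with no size(). std::forward_list stands in for the range returned by ceph::split(); remaining_after() and example_apis() are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <forward_list>
#include <iterator>
#include <string>

// Count the elements from `pos` to the end of a range that has no size().
// This equals size() - distance(begin(), pos), but never needs size().
template <typename Range>
std::size_t remaining_after(const Range& range,
                            typename Range::const_iterator pos) {
  return static_cast<std::size_t>(std::distance(pos, range.end()));
}

// Hypothetical stand-in for a parsed rgw_enable_apis value.
inline std::forward_list<std::string> example_apis() {
  return {"s3", "swift", "admin", "sts"};
}
```

Calling remaining_after(apis, std::next(apis.begin())) on the four-element list counts the three elements from `pos` onward.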
Casey Bodley [Mon, 30 Jun 2025 22:06:08 +0000 (18:06 -0400)]
rgw: add helper for bucket + account PublicAccessBlock config
get_public_access_conf() takes an optional account, and checks
RGW_ATTR_PUBLIC_ACCESS on that in addition to the bucket. if both attrs
are found, return the union of their configurations
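The union described above could look roughly like the sketch below. PublicAccessConf, merge() and effective_conf() are hypothetical stand-ins, not the rgw types; the four flags follow the S3 PublicAccessBlock fields. Returning whichever side exists when only one attr is found is an assumption.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for the decoded RGW_ATTR_PUBLIC_ACCESS value.
struct PublicAccessConf {
  bool block_public_acls = false;
  bool ignore_public_acls = false;
  bool block_public_policy = false;
  bool restrict_public_buckets = false;
};

// Union of two configurations: a restriction applies if either side sets it.
inline PublicAccessConf merge(const PublicAccessConf& a,
                              const PublicAccessConf& b) {
  PublicAccessConf out;
  out.block_public_acls = a.block_public_acls || b.block_public_acls;
  out.ignore_public_acls = a.ignore_public_acls || b.ignore_public_acls;
  out.block_public_policy = a.block_public_policy || b.block_public_policy;
  out.restrict_public_buckets =
      a.restrict_public_buckets || b.restrict_public_buckets;
  return out;
}

// Combine the bucket's config with an optional account config.
// Assumption: if only one side is present, it is returned unchanged.
inline std::optional<PublicAccessConf> effective_conf(
    const std::optional<PublicAccessConf>& bucket,
    const std::optional<PublicAccessConf>& account) {
  if (bucket && account) {
    return merge(*bucket, *account);
  }
  return bucket ? bucket : account;
}
```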
Oguzhan Ozmen [Tue, 19 May 2026 22:12:35 +0000 (22:12 +0000)]
rgw/datalog: DataLogBackends::trim_entries: fix crash when target_gen > head_gen
When a cluster has no sync zones (single-zone), DataLogTrimCR passes
max_marker() as the trim marker, which encodes target_gen = UINT64_MAX
from gencursor(). In DataLogBackends::trim_entries, after trimming the
head (last) generation, the break condition
if (be->gen_id == target_gen)
is false (e.g. 0 != UINT64_MAX), so the loop attempts its increment
expression:
be = upper_bound(be->gen_id)->second
upper_bound(head_gen) returns end(), and dereferencing end()->second
causes a crash.
Fix: also break when be->gen_id >= head_gen. Once we've trimmed the
head generation there are no further backends in the map, so the
upper_bound dereference in the loop increment will be skipped.
This is a general bug that affects any cluster using max_marker() as a
trim target (i.e. every single-zone deployment).
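A simplified sketch of the loop shape and the added break condition. The std::map stands in for the generation map, and trim_generations() is a hypothetical name; the real function trims log entries rather than recording generation ids.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Walk the generations in order and record each one as "trimmed".
// Stops at target_gen, or at the head (last) generation if target_gen is
// beyond it, so upper_bound() is never dereferenced at end().
inline std::vector<uint64_t> trim_generations(
    const std::map<uint64_t, int>& backends, uint64_t target_gen) {
  std::vector<uint64_t> trimmed;
  if (backends.empty()) {
    return trimmed;
  }
  const uint64_t head_gen = backends.rbegin()->first;
  auto be = backends.begin();
  for (;;) {
    trimmed.push_back(be->first);
    // The second condition is the fix: without it, a target_gen of
    // UINT64_MAX never matches and the loop steps past the head.
    if (be->first == target_gen || be->first >= head_gen) {
      break;
    }
    be = backends.upper_bound(be->first);
  }
  return trimmed;
}
```

With a single generation 0 and target_gen = UINT64_MAX, the loop trims generation 0 and stops.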
Oguzhan Ozmen [Tue, 19 May 2026 23:43:58 +0000 (23:43 +0000)]
test/rgw/datalog: test for trim_entries with max_marker
Verify that DataLogBackends::trim_entries does not crash when called
with max_marker() on a single-generation cluster. The bug causes
upper_bound(head_gen)->second to dereference end() (SIGSEGV)
because the only break condition checked be->gen_id == target_gen,
which is never true when target_gen is UINT64_MAX as encoded by
max_marker() and the cluster has only generation 0.
Kefu Chai [Wed, 20 May 2026 07:55:58 +0000 (15:55 +0800)]
crimson/osd: use store-specific max_object_size for the OSD-layer write check
is_offset_and_length_valid() checked write sizes against
osd_max_object_size (128 MiB), but SeaStore caps per-onode laddr space
at seastore_default_max_object_size (16 MiB). Writes between the two
limits pass the OSD check, reach SeaStore, and trip
prepare_data_reservation()'s ceph_assert(), crashing the OSD and its
replicas.
Add FuturizedStore::Shard::get_max_object_size() (returns
osd_max_object_size by default) and override it in SeaStore::Shard to
return min(osd_max_object_size, max_object_size). Convert
is_offset_and_length_valid() from a static function to a PGBackend
member that queries the store, so EFBIG reaches the client before the
write ever hits the store.
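A rough sketch of the idea, assuming the two limits quoted above. StoreShard and SeaStoreShard are hypothetical stand-ins for FuturizedStore::Shard and SeaStore::Shard, and the free function below only mirrors the shape of the check, not the actual PGBackend member.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Sizes taken from the commit message; the names are illustrative.
constexpr uint64_t osd_max_object_size = 128ull << 20;      // 128 MiB
constexpr uint64_t seastore_max_object_size = 16ull << 20;  // 16 MiB

// Stand-in for the store interface: the default is the OSD-wide limit.
struct StoreShard {
  virtual ~StoreShard() = default;
  virtual uint64_t get_max_object_size() const {
    return osd_max_object_size;
  }
};

// Stand-in for a store with a tighter per-object cap.
struct SeaStoreShard : StoreShard {
  uint64_t get_max_object_size() const override {
    return std::min(osd_max_object_size, seastore_max_object_size);
  }
};

// Validate a write extent against the limit reported by the store.
// Written so that offset + length cannot overflow.
inline bool is_offset_and_length_valid(const StoreShard& store,
                                       uint64_t offset, uint64_t length) {
  const uint64_t max = store.get_max_object_size();
  return offset < max && length <= max - offset;
}
```

A 32 MiB write passes against the default store but is rejected against the SeaStore-like one, which is where the OSD layer would return EFBIG.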
Kefu Chai [Wed, 20 May 2026 04:03:12 +0000 (12:03 +0800)]
crimson/osd: only unblock wait_for_active_blocker on replica when ACTIVE
ReplicaActive::react(ActivateCommitted) sets ACTIVE or PEERED before
calling on_activate_committed(). Without a guard, an unconditional
unblock() on the PEERED path resets the promise, causing ops that
arrive afterward to park indefinitely (until the next on_change()).
The primary already has this guard in on_activate_complete(); mirror it
on the replica side.
Kefu Chai [Wed, 20 May 2026 02:22:24 +0000 (10:22 +0800)]
crimson/osd: wake pgs_creating waiters in PGMap::pg_loaded()
wait_for_pg() parks callers on pgs_creating[pgid] when the PG isn't in
pgs yet. pg_created() wakes those waiters; pg_loaded() didn't. An op
that races ahead of a PG load hangs indefinitely at CreateOrWaitPG.
Add the symmetric wake-up in pg_loaded() with a conditional find --
unlike pg_created(), a loaded PG may have no waiters at all.
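A synchronous sketch of the wake-up with a conditional find. PGMapSketch is a hypothetical stand-in: it uses plain callbacks and integer PG ids in place of the futures and pgid types the real PGMap uses.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <set>
#include <utility>
#include <vector>

class PGMapSketch {
  std::set<int> pgs;
  std::map<int, std::vector<std::function<void()>>> pgs_creating;

 public:
  // Run `on_ready` now if the PG exists, otherwise park it.
  void wait_for_pg(int pgid, std::function<void()> on_ready) {
    if (pgs.count(pgid)) {
      on_ready();
      return;
    }
    pgs_creating[pgid].push_back(std::move(on_ready));
  }

  // Register a loaded PG and wake anyone parked on it.
  void pg_loaded(int pgid) {
    pgs.insert(pgid);
    auto it = pgs_creating.find(pgid);
    if (it == pgs_creating.end()) {
      return;  // a loaded PG may have no waiters at all
    }
    auto waiters = std::move(it->second);
    pgs_creating.erase(it);
    for (auto& wake : waiters) {
      wake();
    }
  }
};
```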
Kefu Chai [Wed, 20 May 2026 07:36:51 +0000 (15:36 +0800)]
crimson/seastore: reject oversized writes and zeros instead of aborting
prepare_data_reservation() ceph_assert()s the request fits within
seastore_default_max_object_size (16 MiB), but the OSD validates writes
against osd_max_object_size (128 MiB). Anything between the two limits
passes OSD validation then trips the assert, crashing the OSD and its
replicas.
_zero() already returned EIO for this case; mirror that in _write() and
fix _zero()'s off-by-one (>= should be >, matching the <= in the assert).
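The boundary case can be illustrated as follows. check_extent() is a hypothetical helper, not the SeaStore code; the limit is the 16 MiB value quoted above. An extent whose end lands exactly on the limit satisfies the assert's `<=`, so the rejection test has to be a strict `>`.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>

constexpr uint64_t max_object_size = 16ull << 20;  // 16 MiB

// Return 0 if [offset, offset + len) fits within max_object_size,
// -EIO otherwise. Equivalent to rejecting offset + len > max_object_size,
// written so the addition cannot overflow.
inline int check_extent(uint64_t offset, uint64_t len) {
  if (offset > max_object_size || len > max_object_size - offset) {
    return -EIO;
  }
  return 0;
}
```

With a `>=` comparison, the first assertion in a test such as check_extent(0, 16 MiB) == 0 would fail, which is the off-by-one being fixed.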
Jamie Pryde [Wed, 20 May 2026 10:31:53 +0000 (11:31 +0100)]
mon: Add health checker for deprecated EC plugins and techniques
We want to reduce the number of EC plugins and techniques we support
in order to focus dev and test effort on the ones that are most
useful.
We are deprecating the following plugins and techniques in Umbrella,
and dropping support for them in the V release:
* shec
* clay
* all non-reed_sol_van jerasure techniques
This commit adds a health checker to print a warning message if the cluster
is using any of the deprecated plugins/techniques and instructs the user
to migrate objects to a different pool.
Matan Breizman [Wed, 20 May 2026 08:22:22 +0000 (11:22 +0300)]
crimson/osd: call pubsetbuf() before open()
Move rdbuf()->pubsetbuf(nullptr, 0) before ofstream::open() since libstdc++
may ignore setbuf() once the filebuf is associated with a file.
```
setbuf() may only be called when the std::basic_filebuf is not associated with a file (has no effect otherwise)
```
https://en.cppreference.com/cpp/io/basic_filebuf/setbuf
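The ordering can be sketched with the standard library alone. open_unbuffered() is a hypothetical helper name, not the crimson code.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Request an unbuffered stream, then open the file.
// pubsetbuf() is called while the filebuf has no associated file,
// which is the only state in which it is guaranteed to take effect.
inline std::ofstream open_unbuffered(const std::string& path) {
  std::ofstream out;
  out.rdbuf()->pubsetbuf(nullptr, 0);
  out.open(path, std::ios::out | std::ios::trunc);
  return out;
}
```

Calling pubsetbuf() after open() compiles just as well, which is why the ordering mistake is easy to miss.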
Ville Ojamo [Wed, 20 May 2026 06:26:08 +0000 (13:26 +0700)]
doc/rados: move label to right place in pools.rst
The label is named setting and not unsetting, so move it from the
unsetting section to the setting section.
The label is used only once and the context in which it is used is also
more fitting for the setting section.
Signed-off-by: Ville Ojamo <git2233+ceph@ojamo.eu>
Ronen Friedman [Sat, 16 May 2026 14:39:51 +0000 (14:39 +0000)]
crimson/osd: decouple snap trim initiation from scrub completion
Add SnapTrimInitiate operation so kick_snap_trim() no longer calls
on_active_actmap() inline during scrub completion, which nested
conflicting with_interruption contexts and hit an assertion.
Kefu Chai [Wed, 20 May 2026 00:55:55 +0000 (08:55 +0800)]
crimson/seastore: clamp block_size to laddr_t::UNIT_SIZE on small-LBA devices
Seastar's file::disk_write_dma_alignment() faithfully reports what the
kernel exposes for the underlying device. On block devices in 512-byte
LBA mode (the factory default for many NVMe SSDs), it correctly returns
512.
SeaStore's internal addressing, however, operates at a 4 KiB page
granularity defined by laddr_t::UNIT_SIZE, and SeaStore::_mount() asserts
that block_size >= UNIT_SIZE. As a result, ceph-osd-crimson --mkfs
--osd-objectstore seastore aborts on any device shipped in 512-LBA
mode:
seastore.cc:344 ceph_assert(block_size >= laddr_t::UNIT_SIZE)
seastore requires a device block size of at least 4096 bytes,
but the primary device at '/var/lib/ceph/osd/ceph-N/block'
reports block_size=512
The reported alignment is not wrong; it is the minimum Linux enforces
for O_DIRECT on that device, so users with optimization in mind can
override it through io_properties.yaml. But SeaStore can run correctly
on 512-LBA devices as long as it issues only 4 KiB-aligned I/O (which
is also 512-aligned, so the device is happy). Clamp the captured
block_size to laddr_t::UNIT_SIZE so SeaStore can host an OSD on
512-LBA storage without operator intervention, while still honoring
larger device-reported alignments when present.
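The clamp amounts to taking the larger of the two values. In this sketch UNIT_SIZE is a stand-in for laddr_t::UNIT_SIZE (4 KiB, per the description above) and clamp_block_size() is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

constexpr std::size_t UNIT_SIZE = 4096;  // stand-in for laddr_t::UNIT_SIZE

// Raise small device-reported alignments to UNIT_SIZE and keep
// larger ones as reported.
inline std::size_t clamp_block_size(std::size_t reported) {
  return std::max(reported, UNIT_SIZE);
}
```

A 512-byte report becomes 4096, while an 8192-byte report is left alone.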
Oguzhan Ozmen [Tue, 12 May 2026 19:43:13 +0000 (19:43 +0000)]
neocls log trimming (time based): fix infinite loop on ENODATA
This is essentially the same fix as the previous commit.
The time-based use_awaitable_t overload of trim() has the same
issue as the marker-based overload: the try-catch for ENODATA is inside
the for(;;) loop, so ENODATA is caught and swallowed, causing the loop
to retry forever.
Oguzhan Ozmen [Tue, 12 May 2026 19:38:12 +0000 (19:38 +0000)]
neocls log trimming (marker based): fix infinite loop on ENODATA
The use_awaitable_t overload of trim() has the try-catch for
ENODATA (no_message_available) inside the for(;;) loop. When
cls_log_trim returns ENODATA (i.e., nothing left to trim), the exception is
caught and silently swallowed; execution falls through the catch block
back to for(;;), retrying the trim forever.
This should be rare, as three conditions must all be met in a
single cluster (once configured as multisite):
- a realm/period exists
- zone endpoints are configured
- data_log* objects exist
In a properly set up multisite cluster, data churn is continuous, so the
issue is hard to notice. In the case the client reported, the multisite
cluster was reverted back to single site, so the data_logs never have any
data; hence, the issue is pronounced.
This fix adds co_return inside the catch block so ENODATA exits the loop.
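A non-coroutine sketch of the same control flow, where a plain `return` plays the role of co_return. trim_until_empty() and the `trim_once` callback are hypothetical; the real code awaits cls_log_trim.

```cpp
#include <cassert>
#include <functional>
#include <system_error>

// Call `trim_once` repeatedly until it reports ENODATA, then stop.
// Returns the number of calls made, including the final one.
inline int trim_until_empty(const std::function<void()>& trim_once) {
  int calls = 0;
  for (;;) {
    try {
      ++calls;
      trim_once();
    } catch (const std::system_error& e) {
      if (e.code() == std::errc::no_message_available) {
        return calls;  // nothing left to trim: leave the loop
      }
      throw;  // any other error propagates to the caller
    }
  }
}
```

Without the `return` in the catch block, control would fall out of the handler and back to the top of for(;;), which is the infinite retry described above.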
Jaya Prakash [Tue, 19 May 2026 17:02:22 +0000 (17:02 +0000)]
qa: fix TEST_mon_features persistent feature checks in mon/misc.sh
The test reused the "tentacle" jq filter while validating
"nvmeof_beacon_diff", causing the comparison to fail. Also fix
the umbrella feature validation and update the expected
persistent feature count.
mgr/DaemonServer: auto-tune stats period when message queue gets backed up
The mgr can get overwhelmed when there's a lot of cluster activity and
daemons are sending stats reports faster than we can process them.
This commit adds logic to monitor the messenger queue depth and bump
up mgr_stats_period when things get congested. This reduces the
frequency of daemon stat reports, allowing the mgr to process existing
reports without being overwhelmed by new ones. The period automatically
scales back down when the queue clears up.
Added mgr_stats_period_autotune (on by default) and a queue threshold
setting. The maximum period is capped at 60 seconds to prevent excessive
stat delays.
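One possible shape for the tuning step is sketched below. Only the 60-second cap comes from the description above; doubling and halving the period, and the names next_period, base and threshold, are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Compute the next stats period in seconds.
// Assumed policy: double while the queue is above the threshold,
// halve back toward the configured base once it is not.
inline uint64_t next_period(uint64_t current, uint64_t base,
                            std::size_t queue_depth, std::size_t threshold) {
  constexpr uint64_t max_period = 60;  // cap from the commit message
  if (queue_depth > threshold) {
    return std::min(current * 2, max_period);
  }
  return std::max(current / 2, base);
}
```

Starting from a base of 5 seconds, a congested queue would step the period 5, 10, 20, 40, 60 under this assumed policy, then back down once the queue drains.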
Patrick Donnelly [Tue, 19 May 2026 14:10:33 +0000 (10:10 -0400)]
.github/workflows/releng-audit: update workflows
To avoid this warning:
> Warning: Node.js 20 actions are deprecated. The following actions are running on Node.js 20 and may not work as expected: actions/checkout@v3, actions/setup-python@v4. Actions will be forced to run with Node.js 24 by default starting June 2nd, 2026. Node.js 20 will be removed from the runner on September 16th, 2026. Please check if updated versions of these actions are available that support Node.js 24. To opt into Node.js 24 now, set the FORCE_JAVASCRIPT_ACTIONS_TO_NODE24=true environment variable on the runner or in your workflow file. Once Node.js 24 becomes the default, you can temporarily opt out by setting ACTIONS_ALLOW_USE_UNSECURE_NODE_VERSION=true. For more information see: https://github.blog/changelog/2025-09-19-deprecation-of-node-20-on-github-actions-runners/
Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
Kefu Chai [Tue, 19 May 2026 12:58:10 +0000 (20:58 +0800)]
debian/rules: strip ceph-osd-classic and ceph-osd-crimson
override_dh_strip enumerates each binary package explicitly. It was not
updated when ceph-osd was split into the ceph-osd-classic and
ceph-osd-crimson implementation packages, so the OSD binaries in those
two packages are shipped unstripped (ceph-osd-crimson installs at ~4.6
GiB) and their -dbg packages are left empty.
Add the missing dh_strip invocations so the OSD binaries are stripped
and their debug symbols land in the corresponding -dbg packages, as is
already done for every other binary package.
Afreen Misbah [Mon, 18 May 2026 20:06:35 +0000 (01:36 +0530)]
mgr/dashboard: fix remaining FA icon references and test failures
- Fix icon size mismatches and HTML lint errors
- Fix remaining FA icon references in tests
- Replace FA icons with Carbon in upgrade component:
use cds-inline-loading for spinners, cd-icon for status icons
- Update test selectors for Carbon icon queries
Fixes: https://tracker.ceph.com/issues/76631
Signed-off-by: Afreen Misbah <afreen23@gmail.com>
Assisted-by: Claude
Afreen Misbah [Sun, 17 May 2026 16:43:59 +0000 (22:13 +0530)]
mgr/dashboard: fix filter icon alignment in table toolbar
Replace Bootstrap inline styles with proper CSS class for filter
icon and select dropdowns alignment. Created filter-wrapper class
to properly align filter icon with select elements using flexbox.
Signed-off-by: Afreen Misbah <afreen@ibm.com>
Assisted-by: Claude
Fixes: https://tracker.ceph.com/issues/76631
Afreen Misbah [Sun, 17 May 2026 15:07:45 +0000 (20:37 +0530)]
mgr/dashboard: fix missing loader and zone group icon
- Add state="active" to cds-inline-loading in card-row component
to properly show loading spinner for table row actions
- Replace parentChild icon with clusterIcon (web-services--cluster)
for zone group representation in RGW multisite
- Remove parentChild from Icons enum and replace with
WebServicesCluster in components.module.ts
- Import ComponentsModule in rgw.module.ts for cd-icon support
Signed-off-by: Afreen Misbah <afreen@ibm.com>
Assisted-by: Claude
Fixes: https://tracker.ceph.com/issues/76631
Added LoadingModule and InlineLoadingModule imports to:
- block.module.ts
- cephfs.module.ts
- cluster.module.ts
(rgw.module.ts and components.module.ts already had them)
Signed-off-by: Afreen Misbah <afreen@ibm.com>
Assisted-by: Claude
Fixes: https://tracker.ceph.com/issues/76631
Afreen Misbah [Sun, 17 May 2026 00:14:41 +0000 (05:44 +0530)]
mgr/dashboard: remove font awesome references
- Remove .fa and .fa-* class styles from component SCSS files
- Remove FA icon spacing rules from global styles
- Clean up .fa-stack styles (FA stacking feature)
- Remove FA-specific color styles
- Remove FA icons
Signed-off-by: Afreen Misbah <afreen@ibm.com>
Assisted-by: Claude
Fixes: https://tracker.ceph.com/issues/76631
Bill Scales [Tue, 19 May 2026 06:05:13 +0000 (07:05 +0100)]
doc/dev/internals: Improve Ceph Internals TOC
The Ceph internals section of the docs is a bit of a mess
as far as the table of contents is concerned. This commit
tries to add a bit more structure grouping topics by
area and trying to arrange them in a more logical order.
Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
rgw/dedup: add --allow/deny-bucket-list and --allow/deny-storage-class-list to dedup commands
Resolves: bz#2413730
Signed-off-by: Gabriel BenHanokh <gbenhano@redhat.com>
Casey Bodley [Tue, 12 May 2026 18:58:16 +0000 (14:58 -0400)]
librados/asio: clear cancellation slot in associated executor
the librados callback function `AsyncOp::aio_dispatch()` runs on
Objecter's finisher strand executor, and dispatches the completion
handler to its associated executor
asio cancellation is not thread-safe, so should be synchronized on that
associated executor. move the call to `slot.clear()` from that librados
callback into the AsyncHandler wrapper so it doesn't run until we've
switched to the correct executor
because our `op_cancellation` handler depends on an `AioCompletion`
pointer, we have to clear the cancellation slot before that
`AioCompletion` lifetime ends
Patrick Donnelly [Mon, 18 May 2026 14:20:08 +0000 (10:20 -0400)]
Merge PR #68937 into main
* refs/pull/68937/head:
.github/workflows/releng-audit: group events to serialize executions
.github/workflows/releng-audit: remove override on reopen
.github/workflows/releng-audit: refactor auth check to function
Kobi Ginon [Mon, 18 May 2026 13:45:32 +0000 (16:45 +0300)]
cephadm: disable UDP in samples/nfs.json for test_cephadm Ganesha
test_cephadm.sh deploys NFS through cephadm _orch deploy using
src/cephadm/samples/nfs.json. That sample is separate from the mgr
ganesha.conf.j2 template, which already sets Enable_UDP = false.
Without that setting, Ganesha on Rocky 10 (ceph-ci image) fails during
startup with "Cannot register NFS V3 on UDP", so test_cephadm.sh never
sees ganesha.nfsd listening on port 2049.
Add Protocols = 3, 4 and Enable_UDP = false to NFS_CORE_PARAM so the
sample matches the orchestrator defaults.
Fixes: https://tracker.ceph.com/issues/76295
Signed-off-by: Kobi Ginon <kginon@redhat.com>
Shai Fultheim [Sun, 17 May 2026 08:27:00 +0000 (11:27 +0300)]
crimson/os/seastore: yield to user IO between cleaner cycles
After the deadlock fix in the preceding commit ("fix IO-block deadlock
when cleaner is sleeping"), the cleaner stays awake while user IO is
blocked, but a second symptom appears at high alive_ratio (~0.79): the
cleaner's segment-allocate-and-fill loop runs tightly enough that the
user-IO continuation scheduled by maybe_wake_blocked_io() never gets a
chance to retry try_reserve_io() before the cleaner consumes the
projected_avail headroom again on its next iteration. User IO wakes,
sees projected_avail still below hard_limit, re-blocks immediately.
In the qa/standalone/crimson randwrite bench this manifests as: cluster
makes 500-700 GB of progress, then user_written counter freezes for
~75 seconds (watchdog window) while the cleaner is fully busy.
In BackgroundProcess::run(), after each do_background_cycle, if user IO
is currently blocked, yield to the reactor. That gives the woken
user-IO continuation a chance to slot in and complete a reservation
before the cleaner starts its next reservation-consuming cycle.
With this change, the same bench runs 19 minutes (vs 11-16 min) and
writes 785 GB user (vs 506-692 GB) before the next cluster limit hits,
which is the inherent throughput cap at alive_ratio 0.79 where each
reclaim only frees ~21% of segment size — not a coordination bug.
Shai Fultheim [Sun, 17 May 2026 04:43:19 +0000 (07:43 +0300)]
crimson/os/seastore: fix IO-block deadlock when cleaner is sleeping
Two coordinated changes that together close a stall observed at high
alive_ratio in the qa/standalone/crimson randwrite bench (one OSD
frozen for 70+ minutes, alive_ratio ~0.79, projected_avail_ratio ~0.10,
slow_ops accumulating indefinitely).
1. SegmentCleaner::should_clean_space() used segments.get_available_ratio()
(actual ratio) while should_block_io_on_clean() used
get_projected_available_ratio() (actual minus in-flight reservations).
When the actual ratio sat just above available_ratio_hard_limit but
the projected ratio dipped below it, IO would block while the cleaner
slept. Make should_clean_space() also trip on the projected ratio.
2. BackgroundProcess::reserve_projected_usage() did not wake the
background process when an IO blocked. With the cleaner asleep and
all IO blocked, nothing called maybe_wake_blocked_io() (no
release_projected_usage runs without completing IO; no segment
release runs without the cleaner). Kick do_wake_background() at the
point of blocking, so the cleaner re-evaluates and runs.
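The mismatch in item 1 can be shown with a reduced model. SpaceState and the three predicates below are hypothetical simplifications: the real checks involve more conditions than the hard limit alone.

```cpp
#include <cassert>

// Reduced model of the space accounting, as fractions of total space.
struct SpaceState {
  double available_ratio;  // actual free space
  double reserved_ratio;   // in-flight reservations
  double projected() const { return available_ratio - reserved_ratio; }
};

// IO blocks when the projected ratio is under the hard limit.
inline bool should_block_io(const SpaceState& s, double hard_limit) {
  return s.projected() < hard_limit;
}

// Before the fix: the cleaner only looked at the actual ratio.
inline bool should_clean_space_old(const SpaceState& s, double hard_limit) {
  return s.available_ratio < hard_limit;
}

// After the fix: the cleaner also trips on the projected ratio.
inline bool should_clean_space_new(const SpaceState& s, double hard_limit) {
  return s.available_ratio < hard_limit || s.projected() < hard_limit;
}
```

With an actual ratio of 0.12, reservations of 0.04 and a hard limit of 0.10, IO blocks while the old predicate leaves the cleaner asleep; the new predicate wakes it.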
Afreen Misbah [Mon, 18 May 2026 10:01:58 +0000 (15:31 +0530)]
mgr/dashboard: fix logs e2e tests after carbonization
Update e2e test selectors to match the new Carbon component structure.
The .card-body and .message classes were replaced with .log-viewer
and .log-entry__message after carbonizing the logs component.
Assisted-by: Claude
Signed-off-by: Afreen Misbah <afreen@ibm.com>