Sun Yuechi [Fri, 29 May 2026 10:39:51 +0000 (18:39 +0800)]
rgw: move SWIFT error_handler out-of-line to fix link failure
The two error_handler overrides are defined inline in rgw_rest_swift.h
and delegate to RGWSwiftWebsiteHandler::error_handler, a non-virtual
function defined in rgw_rest_swift.cc (librgw_a.a). Because the header
is included by rgw_rest.cc, the inline bodies are emitted in
librgw_common.a, which then ODR-uses that symbol across archives.
The link line lists librgw_a.a before librgw_common.a, and GNU ld only
pulls archive members on demand: when librgw_a.a is scanned nothing yet
references RGWSwiftWebsiteHandler::error_handler, so rgw_rest_swift.cc.o
is dropped and the symbol is later unresolved. This shows up as a link
failure with gcc 16 -O2.
Move the two bodies into rgw_rest_swift.cc next to the function they
call, so the ODR-use stays within the same object and the build no
longer depends on archive scan order. No functional change.
Nitzan Mordechai [Wed, 27 May 2026 11:48:14 +0000 (11:48 +0000)]
mgr/ThreadMonitor: monitor interval running in seconds and not nanoseconds
The ctor accidently use the mgr_module_monitor_interval as nanoseconds
we need to use it as seconds.
Also, prevent high cpu loop in case read_process_statm failed during
while loop
7f739adae2 dropped the last log call from get_segment_manager(), after
which `LOG_PREFIX(SegmentManager::get_segment_manager)` and
`SET_SUBSYS(seastore_device)` had no remaining users under `HAVE_ZNS`,
generating:
Afreen Misbah [Wed, 27 May 2026 00:07:38 +0000 (05:37 +0530)]
mgr/dashboard: fix nested shell quoting in cephadm e2e start-cluster
with_libvirt wraps commands in sg libvirt -c "$1", adding an extra
shell layer. Nested double quotes inside the outer double-quoted
string caused the argument to be split — with_libvirt received a
truncated $1, producing "Unterminated quoted string" on the remote
shell.
Drop the unnecessary inner double quotes around cephadm shell
arguments since cephadm shell accepts the command as separate args.
Use single quotes for the grep pattern inside the double-quoted
string so it survives the sg subshell.
stzuraski898 [Mon, 11 May 2026 20:10:07 +0000 (20:10 +0000)]
scripts: add Jenkins unit test analysis tool
Add analyze_unittest_jenkins.py to aggregate and analyze unit test
results across multiple Ceph PRs by mining Jenkins CI build logs.
The script fetches recent open PRs from GitHub, extracts Jenkins build
URLs from PR checks, downloads console logs in parallel, and parses
CTest output to generate comprehensive failure reports.
This enables data-driven decisions about test infrastructure
improvements and helps identify systemic issues in the test suite.
Fixes: https://tracker.ceph.com/issues/76513 Assisted-by: IBM Bob (Claude 3.7 Sonnet) Signed-off-by: Steven Zuraski <steven.zuraski@ibm.com>
Casey Bodley [Tue, 26 May 2026 16:03:48 +0000 (12:03 -0400)]
rgw/s3control: skip account id check for admin users
allow access to admin users that don't belong to the requested account.
this is also necessary for multisite, where requests are forwarded to
the metadata master as the multisite system user instead of the original
requester
Ronen Friedman [Tue, 26 May 2026 15:29:32 +0000 (15:29 +0000)]
osd/scrub: 'repairing' scrubs allowed at all times
Fix ScrubJob::observes_allowed_hours() to not block 'repairing'
scrubs outside of the allowed hours. This allows repair scrubs
to run at any time or day-of-week.
The fixed behaviour matches the documented requirements.
osd_scrub_queued_snaptrims_limit, introduced in PR#68737,
blocks the initiation of non-urgent scrubs on OSDs that
are overloaded with snap-trim operations.
Shai Fultheim [Sun, 17 May 2026 09:40:44 +0000 (12:40 +0300)]
crimson/os/seastore: auto-tune cleaner gc segment pick under random-write
SegmentCleaner uses one of three configurable gc formulas to select
the next segment to reclaim: GREEDY (lowest util wins), COST_BENEFIT
((1-u)*age/(2u)), or BENEFIT (an age-weighted quadratic in util).
COST_BENEFIT is the default and the right choice for journaling /
LIFO workloads, where old segments accumulate more dead bytes than
young ones — age predicts deadness, so an old high-util segment is
worth reclaiming because its util will keep rising as long as we wait.
That assumption breaks under random-write at high cluster fill. Dead
bytes spread uniformly across segments regardless of age, so age stops
predicting future deadness, and (1-u)/(2u) becomes the only term that
distinguishes candidates. With every segment in the 0.7-0.94 util
band, (1-u)/(2u) ranges from 0.227 to 0.032 — a 7x spread the formula
can easily lose to a 7x age difference. Result: a 0.94-util old
segment scores higher than a 0.68-util young one, even though
reclaiming the 0.68 segment would free 5x more space (32% of a
64 MB segment vs 6%).
Observed in qa/standalone/crimson randwrite at ~70% full: with the
unmodified formula, cleaner picks settled on 0.92-0.94 util segments
freeing ~4 MB net each; net free rate collapsed to single-digit KB/s
even though the cleaner was running cycles at ~30 µs each. fio's
stall watchdog killed the bench after 535 GB user written (target
1280 GB). Switching gc_formula = greedy by hand let the bench
complete the target.
This patch detects the mis-selection at runtime and overrides the
formula's pick with the greedy choice only when the difference is
significant. In get_next_reclaim_segment() we already iterate all
closed reclaimable segments to find the formula's max-score candidate;
in the same pass we now also track the lowest-util candidate (what
GREEDY would have picked). After the loop, if greedy's free-fraction
(1 - greedy_util) is at least seastore_segment_cleaner_gc_autotune_ratio
times the formula's pick's free-fraction (default 2.0), we swap to
greedy. Since all segments share the same size, comparing free-
fractions is equivalent to comparing freed bytes; the fraction form
avoids an unnecessary multiplication.
The full design rationale (regime-by-regime behaviour, safety guard
against picked_free near zero, score-recompute on override, threshold
calibration) lives in doc/dev/crimson/seastore.rst under the new
"Cleaner GC autotune" section. The code references it from short
inline comments.
Configurable knobs:
* seastore_segment_cleaner_gc_autotune (bool, default true) —
operators can disable the override entirely to honor the
configured formula unconditionally. Ignored when gc_formula =
greedy.
* seastore_segment_cleaner_gc_autotune_ratio (float, default 2.0,
min 1.0) — operators can tune the override threshold. Higher is
more conservative (preserves age weighting more aggressively);
lower is more aggressive (behaviour converges toward pure greedy).
The override predicate is factored into a static helper
`SegmentCleaner::should_override_to_greedy(picked_free, greedy_free,
ratio)` so the call site stays readable and the predicate is
independently testable.
With this change the qa/standalone/crimson randwrite bench at 70%
fill completes the target run rather than stalling at the 500-600 GB
mark, with the override firing reliably under high uniform alive_
ratio and not firing under low or non-uniform alive_ratio. Override
behaviour can be observed with debug_seastore_cleaner=20.
ShreeJejurikar [Wed, 20 May 2026 07:18:03 +0000 (12:48 +0530)]
qa/rgw/bucket-logging: configure STS for assume-role test
Set rgw sts key and enable rgw s3 auth use sts, both needed by
test_bucket_logging_requester_assumed_role. Mirrors the existing
settings in qa/suites/rgw/verify/overrides.yaml.
Xuehan Xu [Fri, 15 May 2026 09:10:04 +0000 (17:10 +0800)]
crimson/os/seastore: also update the mappings copied by client
transactions when committing background rewriting transactions
With the 128-bit laddr key layout in place, SeaStore::rename would
involve copying mappings. These mappings must also be updated when
the logical extents they point to are rewritten.
This commit introduces performance counters for individual Ceph mgr modules.
These counters allow monitoring module behavior, debugging latency issues,
and identifying performance bottlenecks, all without modifying the modules themselves.
The following counters are now exposed under:
> ceph daemon mgr.<id> perf dump
Example structure:
"mgr_module_<module_name>": {
"notify_avg_usec": { <- Average time spent handling notify events
"avgcount": 0,
"sum": 0
},
"cmd_avg_usec": { <- Average time spent processing CLI/admin commands
"avgcount": 0,
"sum": 0
},
"serve_avg_usec": { <- Average time spent in module serve loop (if applicable)
"avgcount": 0,
"sum": 0
},
"alive": 1 <- Module is alive (1 = running, 0 = exited)
"cpu_usage": 0, <- CPU usage in percent
"mem_rss_change": 0, <- Memory RSS change in bytes
"mem_rss_current": 490737664 <- Memory RSS current in bytes
}
Signed-off-by: Nitzan Mordechai <nmordech@ibm.com>
Conflicts:
src/mgr/ActivePyModules.cc - finisher.queue changed by 63859, adding py_module to the parameter list
src/mgr/PyModuleRegistry.cc - check_all_modules_started added by 63859
ceph-volume: OSD mapper lifecycle (LVM + raw) for activate
This adds small helpers so activate can consistently bring the OSD device
stack online (LVM lvchange, optional mapper open) and tear it down again,
with refresh in between. Same idea for the raw path. Crypto is handled
inside that flow when the OSD is encrypted.
Kefu Chai [Sun, 24 May 2026 08:25:46 +0000 (16:25 +0800)]
rgw: bump Apache Arrow submodule from 17.0.0 to 19.0.1
When WITH_SYSTEM_ARROW is false, Ceph builds Arrow from the bundled
src/apache submodule. Our CI uses ubuntu:jammy as the base image, which
does not package libarrow-dev, so the bundled path is always taken there.
Arrow 17.0.0 vendors a copy of Thrift whose download URLs are no longer
reachable, breaking CI builds that try to fetch them at configure time.
Bump arrow submodule to 19.0.1, the latest Arrow release that:
- builds successfully on ubuntu:jammy, and
- requires only CMake 3.22 (the version shipped by ubuntu:jammy)
See also
CMake version shipped by ubuntu:jammy
- https://packages.ubuntu.com/jammy/cmake
Kefu Chai [Fri, 22 May 2026 11:01:17 +0000 (19:01 +0800)]
crimson/scrub: fix assert in PGScrubber::release_range() on interval change
when an interval change occurs while ScrubReserveRange is still
waiting to acquire background_process_lock, ChunkState::exit()
calls release_range() but blocked is not yet set. this triggers
ceph_assert(blocked) in release_range().
fix by checking if blocked is set before asserting. if blocked is
not set, the range was never reserved, so release_range() is a
no-op. ScrubReserveRange's finally block handles lock cleanup in
this case.