Patrick Donnelly [Wed, 27 May 2026 03:05:22 +0000 (23:05 -0400)]
qa/tasks/cbt: construct venv just for cbt
So we no longer need to install system-wide.
Avoids errors like on Ubuntu 24.04:
2026-05-24T13:48:19.681 DEBUG:teuthology.orchestra.run.trial043:> python3 -m pip install -r /home/ubuntu/cephtest/cbt/requirements.txt
2026-05-24T13:48:19.861 INFO:teuthology.orchestra.run.trial043.stderr:error: externally-managed-environment
2026-05-24T13:48:19.861 INFO:teuthology.orchestra.run.trial043.stderr:
2026-05-24T13:48:19.861 INFO:teuthology.orchestra.run.trial043.stderr:× This environment is externally managed
2026-05-24T13:48:19.861 INFO:teuthology.orchestra.run.trial043.stderr:╰─> To install Python packages system-wide, try apt install
2026-05-24T13:48:19.862 INFO:teuthology.orchestra.run.trial043.stderr: python3-xyz, where xyz is the package you are trying to
2026-05-24T13:48:19.862 INFO:teuthology.orchestra.run.trial043.stderr: install.
2026-05-24T13:48:19.862 INFO:teuthology.orchestra.run.trial043.stderr:
2026-05-24T13:48:19.862 INFO:teuthology.orchestra.run.trial043.stderr: If you wish to install a non-Debian-packaged Python package,
2026-05-24T13:48:19.862 INFO:teuthology.orchestra.run.trial043.stderr: create a virtual environment using python3 -m venv path/to/venv.
2026-05-24T13:48:19.862 INFO:teuthology.orchestra.run.trial043.stderr: Then use path/to/venv/bin/python and path/to/venv/bin/pip. Make
2026-05-24T13:48:19.862 INFO:teuthology.orchestra.run.trial043.stderr: sure you have python3-full installed.
2026-05-24T13:48:19.862 INFO:teuthology.orchestra.run.trial043.stderr:
2026-05-24T13:48:19.862 INFO:teuthology.orchestra.run.trial043.stderr: If you wish to install a non-Debian packaged Python application,
2026-05-24T13:48:19.862 INFO:teuthology.orchestra.run.trial043.stderr: it may be easiest to use pipx install xyz, which will manage a
2026-05-24T13:48:19.862 INFO:teuthology.orchestra.run.trial043.stderr: virtual environment for you. Make sure you have pipx installed.
2026-05-24T13:48:19.863 INFO:teuthology.orchestra.run.trial043.stderr:
2026-05-24T13:48:19.863 INFO:teuthology.orchestra.run.trial043.stderr: See /usr/share/doc/python3.12/README.venv for more information.
2026-05-24T13:48:19.863 INFO:teuthology.orchestra.run.trial043.stderr:
2026-05-24T13:48:19.863 INFO:teuthology.orchestra.run.trial043.stderr:note: If you believe this is a mistake, please contact your Python installation or OS distribution provider. You can override this, at the risk of breaking your Python installation or OS, by passing --break-system-packages.
2026-05-24T13:48:19.863 INFO:teuthology.orchestra.run.trial043.stderr:hint: See PEP 668 for the detailed specification.
2026-05-24T13:48:19.883 DEBUG:teuthology.orchestra.run:got remote process result: 1
2026-05-24T13:48:19.883 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):
File "/home/teuthworker/src/git.ceph.com_teuthology_3686f8793d626abcf5a0018da0a50786e41fed9d/teuthology/run_tasks.py", line 112, in run_tasks
manager.__enter__()
File "/home/teuthworker/src/git.ceph.com_teuthology_3686f8793d626abcf5a0018da0a50786e41fed9d/teuthology/task/__init__.py", line 122, in __enter__
self.setup()
File "/home/teuthworker/src/github.com_ceph_ceph-c_1bc3c25246d3a6fbc360dc78d9b4b51200743391/qa/tasks/cbt.py", line 173, in setup
self.install_dependencies()
File "/home/teuthworker/src/github.com_ceph_ceph-c_1bc3c25246d3a6fbc360dc78d9b4b51200743391/qa/tasks/cbt.py", line 112, in install_dependencies
self.first_mon.run(args=pip_install_cmd)
File "/home/teuthworker/src/git.ceph.com_teuthology_3686f8793d626abcf5a0018da0a50786e41fed9d/teuthology/orchestra/remote.py", line 596, in run
r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/teuthworker/src/git.ceph.com_teuthology_3686f8793d626abcf5a0018da0a50786e41fed9d/teuthology/orchestra/run.py", line 461, in run
r.wait()
File "/home/teuthworker/src/git.ceph.com_teuthology_3686f8793d626abcf5a0018da0a50786e41fed9d/teuthology/orchestra/run.py", line 161, in wait
self._raise_for_status()
File "/home/teuthworker/src/git.ceph.com_teuthology_3686f8793d626abcf5a0018da0a50786e41fed9d/teuthology/orchestra/run.py", line 181, in _raise_for_status
raise CommandFailedError(
teuthology.exceptions.CommandFailedError: Command failed on trial043 with status 1: 'python3 -m pip install -r /home/ubuntu/cephtest/cbt/requirements.txt'
Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
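The venv approach described in the qa/tasks/cbt commit above can be sketched roughly as follows. This is only an illustration: the helper name `venv_install_cmds` and the `venv` subdirectory are hypothetical and are not taken from qa/tasks/cbt.py.

```python
import posixpath


def venv_install_cmds(cbt_dir, venv_dir=None):
    """Build the two commands: create a venv, then install cbt's deps into it."""
    # Hypothetical default location; the real task may place the venv elsewhere.
    venv_dir = venv_dir or posixpath.join(cbt_dir, "venv")
    requirements = posixpath.join(cbt_dir, "requirements.txt")
    create = ["python3", "-m", "venv", venv_dir]
    # Calling the venv's own pip keeps the install out of the system
    # site-packages, which is what PEP 668 (externally-managed-environment)
    # protects on Ubuntu 24.04.
    install = [posixpath.join(venv_dir, "bin", "pip"), "install", "-r", requirements]
    return create, install
```

Each returned list could then be passed as `args` to a remote run call, the same way the traceback above shows `pip_install_cmd` being used.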
Patrick Donnelly [Wed, 27 May 2026 02:21:12 +0000 (22:21 -0400)]
qa/distros: use consistent naming
Put the release name in the yaml name so it's easy to read from the job
description. "ubuntu_latest" means different things depending on the
Ceph release.
Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
Adam King [Tue, 16 Sep 2025 16:07:36 +0000 (12:07 -0400)]
qa/tasks/nvme_loop: fix nvme loop task for ubuntu noble
Compared to older distros, this one complains if
you include `-q hostnqn` in the nvme connect command,
saying "Failed to write to /dev/nvme-fabrics: Invalid argument".
Removing that argument gets past that error and
doesn't seem to have any downsides.
Nitzan Mordechai [Wed, 27 May 2026 11:48:14 +0000 (11:48 +0000)]
mgr/ThreadMonitor: monitor interval running in seconds and not nanoseconds
The ctor accidentally used mgr_module_monitor_interval as nanoseconds;
it must be treated as seconds.
Also, prevent a high-CPU loop when read_process_statm fails inside the
while loop.
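The second half of that fix (no busy loop when the read fails) can be illustrated with a small Python sketch. It is not the C++ ThreadMonitor code; `monitor_loop`, `read_statm` and `on_sample` are hypothetical names, and the interval is taken in seconds as the commit requires.

```python
import threading


def monitor_loop(read_statm, interval_sec, stop, on_sample):
    """Poll read_statm() every interval_sec seconds until stop is set."""
    while not stop.is_set():
        sample = read_statm()
        if sample is not None:
            on_sample(sample)
        # Wait on both the success and the failure path, so a failing
        # read cannot turn the loop into a tight CPU spin.
        stop.wait(interval_sec)
```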
7f739adae2 dropped the last log call from get_segment_manager(), after
which `LOG_PREFIX(SegmentManager::get_segment_manager)` and
`SET_SUBSYS(seastore_device)` had no remaining users under `HAVE_ZNS`,
generating:
Kefu Chai [Tue, 26 May 2026 14:01:41 +0000 (22:01 +0800)]
crimson/seastore: make RecordSubmitter::wait_available() idempotent
Under sustained 4K randwrite workloads that roll journal segments
frequently, crimson-osd hits
```
crimson/os/seastore/journal/record_submitter.cc:198:
FAILED ceph_assert(!is_available())
```
and, in release builds without assertions, a downstream
`boost::throw_exception<std::length_error>` from
`seastar::shared_promise::get_shared_future()` called on a
disengaged `std::optional` in the same code path.
`RecordSubmitter::roll_segment()` arms wait_available_promise on entry,
then chains `journal_allocator.roll().safe_then(...)` whose continuation
sets the promise's value and resets the optional. The background
continuation can resolve before the subsequent `wait_available()` call
is entered -- the optional gets reset, `is_available()` becomes true
again, and `wait_available()`'s `assert(!is_available())` fires. The
brittle invariant being assumed
> .safe_then's continuation will not run before its outer call returns
is not part of seastar's contract.
Honour the documented contract instead. record_submitter.h
says:
> wait for available if cannot submit, should check
> is_available() again when the future is resolved.
The postcondition is "available when resolved"; the precondition
"unavailable when called" was incidental. Make `wait_available()`
idempotent: if `is_available()` is already true on entry, return a
ready future immediately. All three external callers
- `RecordSubmitter::roll_segment`
- `CircularBoundedJournal::submit_record`
- `SegmentedOolWriter::do_write`
re-check `is_available()` on the next iteration or in the chained
continuation and dispatch correctly.
Validated by running a 96-job fio randwrite bench to confirm
the fix in operation; pre-patch the assert fires within ~30 min
and kills the OSD.
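The idempotent contract can be modelled in a few lines of Python asyncio. This is an analogy only, not the seastar code: the `Submitter` class and its `begin_roll`/`finish_roll` methods are hypothetical stand-ins for arming and resolving wait_available_promise.

```python
import asyncio


class Submitter:
    def __init__(self):
        # Armed while a roll is in flight; None means "available".
        self._wait_available = None

    def is_available(self):
        return self._wait_available is None

    def begin_roll(self):
        self._wait_available = asyncio.get_running_loop().create_future()

    def finish_roll(self):
        # Reset the optional first, then resolve the waiters.
        fut, self._wait_available = self._wait_available, None
        fut.set_result(None)

    async def wait_available(self):
        # Idempotent: if the roll already completed, return at once
        # instead of asserting that we are still unavailable.
        if self.is_available():
            return
        await self._wait_available
```

Callers still re-check `is_available()` after the await, as the header comment quoted above asks.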
ceph-volume: detect rotational media under dm-crypt for workqueue bypass
bypass_workqueue() was inspecting the top-level block device
(e.g. /dev/mapper/*) when deciding whether to disable read/write
workqueues for nvme devices. It must look at the real disk under
dmcrypt/lvm, not the mapper. On osd block paths the top device
often misreports rotational, so the --perf-no_workqueue decision
was wrong.
The fix walks sysfs 'slaves/' down to the leaf device, then
checks rotational there (udev + rota).
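A minimal sketch of the sysfs walk, assuming the usual /sys/block/<dev>/slaves and /sys/block/<dev>/queue/rotational layout. The function names are hypothetical, the udev half of the check is not modelled, and slaves that are partitions (which have no queue/ directory of their own) are not handled.

```python
import os


def leaf_devices(name, sysfs_block="/sys/block"):
    """Follow slaves/ links from a mapper device down to the leaf device(s)."""
    slaves_dir = os.path.join(sysfs_block, name, "slaves")
    slaves = sorted(os.listdir(slaves_dir)) if os.path.isdir(slaves_dir) else []
    if not slaves:
        return [name]  # nothing underneath: this is a leaf
    leaves = []
    for slave in slaves:
        leaves.extend(leaf_devices(slave, sysfs_block))
    return leaves


def is_rotational(name, sysfs_block="/sys/block"):
    """True if any leaf device under `name` reports rotational == 1."""
    for leaf in leaf_devices(name, sysfs_block):
        path = os.path.join(sysfs_block, leaf, "queue", "rotational")
        with open(path) as f:
            if f.read().strip() == "1":
                return True
    return False
```

The `sysfs_block` parameter exists only so the walk can be exercised against a fake directory tree.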
Venky Shankar [Tue, 26 May 2026 12:09:53 +0000 (17:39 +0530)]
qa/cephfs: install ceph-mgr-modules-standard for cephfs tests
Now that the ceph-mgr plugins are being separated into essential and
non-essential packages (always-on vs. optional), cephfs qa suite
requires the optional packages for ceph-mgr plugins which are not
always-on, but are being tested with fs suite. The good thing is, we
can install _all_ optional plugins using ceph-mgr-modules-standard
package instead of installing cherry-picked packages.
Afreen Misbah [Wed, 27 May 2026 00:07:38 +0000 (05:37 +0530)]
mgr/dashboard: fix nested shell quoting in cephadm e2e start-cluster
with_libvirt wraps commands in sg libvirt -c "$1", adding an extra
shell layer. Nested double quotes inside the outer double-quoted
string caused the argument to be split — with_libvirt received a
truncated $1, producing "Unterminated quoted string" on the remote
shell.
Drop the unnecessary inner double quotes around cephadm shell
arguments since cephadm shell accepts the command as separate args.
Use single quotes for the grep pattern inside the double-quoted
string so it survives the sg subshell.
stzuraski898 [Mon, 11 May 2026 20:10:07 +0000 (20:10 +0000)]
scripts: add Jenkins unit test analysis tool
Add analyze_unittest_jenkins.py to aggregate and analyze unit test
results across multiple Ceph PRs by mining Jenkins CI build logs.
The script fetches recent open PRs from GitHub, extracts Jenkins build
URLs from PR checks, downloads console logs in parallel, and parses
CTest output to generate comprehensive failure reports.
This enables data-driven decisions about test infrastructure
improvements and helps identify systemic issues in the test suite.
Fixes: https://tracker.ceph.com/issues/76513
Assisted-by: IBM Bob (Claude 3.7 Sonnet)
Signed-off-by: Steven Zuraski <steven.zuraski@ibm.com>
Casey Bodley [Tue, 26 May 2026 16:03:48 +0000 (12:03 -0400)]
rgw/s3control: skip account id check for admin users
allow access to admin users that don't belong to the requested account.
this is also necessary for multisite, where requests are forwarded to
the metadata master as the multisite system user instead of the original
requester
Ronen Friedman [Tue, 26 May 2026 15:29:32 +0000 (15:29 +0000)]
osd/scrub: 'repairing' scrubs allowed at all times
Fix ScrubJob::observes_allowed_hours() to not block 'repairing'
scrubs outside of the allowed hours. This allows repair scrubs
to run at any time or day-of-week.
The fixed behaviour matches the documented requirements.
Adam Kupczyk [Tue, 26 May 2026 10:38:17 +0000 (10:38 +0000)]
os/bluestore: Collection::split_cache no longer purges cache
There are 2 cases when Collection::split_cache is used:
1) Merge Collections.
In this case we get more elements in the dest collection,
but we do not want to trim by force as it might cause a stall.
2) Split Collection
Source cache is getting thinner, and dest cache is growing.
Similarly, trimming dest will cause stall.
Decision:
It is better not to trim forcibly and to rely on gradual trimming
from MempoolThread or client IO.
Adam Kupczyk [Mon, 6 Oct 2025 11:33:50 +0000 (11:33 +0000)]
os/bluestore: Allow onode eviction be part of autotune
Add the bluestore_cache_meta_evict_in_autotune configurable.
It allows onodes to be evicted as part of the autotune process.
This makes the onode CacheShards size adapt to the PriorityCache Meta
memory allocation.
Fixes: https://tracker.ceph.com/issues/73353
Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
Adam Kupczyk [Mon, 6 Oct 2025 10:53:04 +0000 (10:53 +0000)]
os/bluestore: Limit stall from evicting onodes
Limits how many onodes can be evicted from OnodeCacheShard in one go.
Added bluestore_cache_meta_drop_limit that controls how fast
onodes can be evicted from cache.
Fixes: https://tracker.ceph.com/issues/73353
Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
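The idea of capping evictions "in one go" can be sketched as follows. This is a simplified model, not BlueStore code: `trim_onodes` is a hypothetical helper, the cache is an `OrderedDict` with the oldest entry first, and `drop_limit` plays the role that the commit assigns to bluestore_cache_meta_drop_limit.

```python
from collections import OrderedDict


def trim_onodes(cache, target_size, drop_limit):
    """Evict oldest entries toward target_size, but at most drop_limit per call."""
    evicted = 0
    while len(cache) > target_size and evicted < drop_limit:
        cache.popitem(last=False)  # drop the oldest entry
        evicted += 1
    return evicted
```

Repeated calls converge on the target size while bounding the work, and therefore the stall, of any single pass.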
osd_scrub_queued_snaptrims_limit, introduced in PR#68737,
blocks the initiation of non-urgent scrubs on OSDs that
are overloaded with snap-trim operations.
Shai Fultheim [Sun, 17 May 2026 09:40:44 +0000 (12:40 +0300)]
crimson/os/seastore: auto-tune cleaner gc segment pick under random-write
SegmentCleaner uses one of three configurable gc formulas to select
the next segment to reclaim: GREEDY (lowest util wins), COST_BENEFIT
((1-u)*age/(2u)), or BENEFIT (an age-weighted quadratic in util).
COST_BENEFIT is the default and the right choice for journaling /
LIFO workloads, where old segments accumulate more dead bytes than
young ones — age predicts deadness, so an old high-util segment is
worth reclaiming because its util will keep rising as long as we wait.
That assumption breaks under random-write at high cluster fill. Dead
bytes spread uniformly across segments regardless of age, so age stops
predicting future deadness, and (1-u)/(2u) becomes the only term that
distinguishes candidates. With every segment in the 0.7-0.94 util
band, (1-u)/(2u) ranges from 0.227 to 0.032 — a 7x spread the formula
can easily lose to a 7x age difference. Result: a 0.94-util old
segment scores higher than a 0.68-util young one, even though
reclaiming the 0.68 segment would free 5x more space (32% of a
64 MB segment vs 6%).
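The mis-selection can be reproduced with a few lines of arithmetic. The sketch below uses the COST_BENEFIT form quoted above, (1-u)*age/(2u); the ages (10 and 1, in arbitrary units) are hypothetical values chosen only to show the effect.

```python
def util_term(u):
    """The (1-u)/(2u) factor of the COST_BENEFIT score."""
    return (1.0 - u) / (2.0 * u)


def cost_benefit(u, age):
    """COST_BENEFIT score as quoted in the commit message: (1-u)*age/(2u)."""
    return util_term(u) * age


old = cost_benefit(0.94, age=10.0)   # old, nearly full segment
young = cost_benefit(0.68, age=1.0)  # young segment with 32% dead space
# The old segment wins on score although it frees far less space.
space_ratio = (1.0 - 0.68) / (1.0 - 0.94)
```

Here `old > young` even though `space_ratio` is above 5, which is the situation the autotune override is meant to catch.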
Observed in qa/standalone/crimson randwrite at ~70% full: with the
unmodified formula, cleaner picks settled on 0.92-0.94 util segments
freeing ~4 MB net each; net free rate collapsed to single-digit KB/s
even though the cleaner was running cycles at ~30 µs each. fio's
stall watchdog killed the bench after 535 GB user written (target
1280 GB). Switching gc_formula = greedy by hand let the bench
complete the target.
This patch detects the mis-selection at runtime and overrides the
formula's pick with the greedy choice only when the difference is
significant. In get_next_reclaim_segment() we already iterate all
closed reclaimable segments to find the formula's max-score candidate;
in the same pass we now also track the lowest-util candidate (what
GREEDY would have picked). After the loop, if greedy's free-fraction
(1 - greedy_util) is at least seastore_segment_cleaner_gc_autotune_ratio
times the formula's pick's free-fraction (default 2.0), we swap to
greedy. Since all segments share the same size, comparing free-
fractions is equivalent to comparing freed bytes; the fraction form
avoids an unnecessary multiplication.
The full design rationale (regime-by-regime behaviour, safety guard
against picked_free near zero, score-recompute on override, threshold
calibration) lives in doc/dev/crimson/seastore.rst under the new
"Cleaner GC autotune" section. The code references it from short
inline comments.
Configurable knobs:
* seastore_segment_cleaner_gc_autotune (bool, default true) —
operators can disable the override entirely to honor the
configured formula unconditionally. Ignored when gc_formula =
greedy.
* seastore_segment_cleaner_gc_autotune_ratio (float, default 2.0,
min 1.0) — operators can tune the override threshold. Higher is
more conservative (preserves age weighting more aggressively);
lower is more aggressive (behaviour converges toward pure greedy).
The override predicate is factored into a static helper
`SegmentCleaner::should_override_to_greedy(picked_free, greedy_free,
ratio)` so the call site stays readable and the predicate is
independently testable.
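A rough Python rendering of the predicate and the single-pass selection described above. The real helper is the C++ `SegmentCleaner::should_override_to_greedy`; its exact guard against picked_free near zero is documented in seastore.rst and is not reproduced here, and the tuple layout `(seg_id, util, age)` is an assumption made for illustration.

```python
def should_override_to_greedy(picked_free, greedy_free, ratio):
    """True when greedy's pick frees at least `ratio` times more space."""
    # Multiplicative form, so no division by a near-zero picked_free.
    return greedy_free >= ratio * picked_free


def score(seg):
    _, util, age = seg
    return (1.0 - util) * age / (2.0 * util)  # COST_BENEFIT


def pick_segment(segments, ratio=2.0, autotune=True):
    """One pass: track the formula's best pick and the lowest-util pick."""
    picked = greedy = None
    for seg in segments:
        if picked is None or score(seg) > score(picked):
            picked = seg
        if greedy is None or seg[1] < greedy[1]:
            greedy = seg
    if picked is None:
        return None
    if autotune and should_override_to_greedy(
            1.0 - picked[1], 1.0 - greedy[1], ratio):
        return greedy
    return picked
```

Raising `ratio` makes the override fire less often, which matches the "higher is more conservative" description of the knob.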
With this change the qa/standalone/crimson randwrite bench at 70%
fill completes the target run rather than stalling at the 500-600 GB
mark, with the override firing reliably under high uniform
alive_ratio and not firing under low or non-uniform alive_ratio. Override
behaviour can be observed with debug_seastore_cleaner=20.
ShreeJejurikar [Wed, 20 May 2026 07:18:03 +0000 (12:48 +0530)]
qa/rgw/bucket-logging: configure STS for assume-role test
Set rgw sts key and enable rgw s3 auth use sts, both needed by
test_bucket_logging_requester_assumed_role. Mirrors the existing
settings in qa/suites/rgw/verify/overrides.yaml.
Xuehan Xu [Fri, 15 May 2026 09:10:04 +0000 (17:10 +0800)]
crimson/os/seastore: also update the mappings copied by client
transactions when committing background rewriting transactions
With the 128-bit laddr key layout in place, SeaStore::rename would
involve copying mappings. These mappings must also be updated when
the logical extents they point to are rewritten.
This commit introduces performance counters for individual Ceph mgr modules.
These counters allow monitoring module behavior, debugging latency issues,
and identifying performance bottlenecks, all without modifying the modules themselves.
The following counters are now exposed under:
> ceph daemon mgr.<id> perf dump
Example structure:
"mgr_module_<module_name>": {
"notify_avg_usec": { <- Average time spent handling notify events
"avgcount": 0,
"sum": 0
},
"cmd_avg_usec": { <- Average time spent processing CLI/admin commands
"avgcount": 0,
"sum": 0
},
"serve_avg_usec": { <- Average time spent in module serve loop (if applicable)
"avgcount": 0,
"sum": 0
},
"alive": 1, <- Module is alive (1 = running, 0 = exited)
"cpu_usage": 0, <- CPU usage in percent
"mem_rss_change": 0, <- Memory RSS change in bytes
"mem_rss_current": 490737664 <- Memory RSS current in bytes
}
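Given a perf dump parsed into a dict with the shape shown above, per-module averages could be derived as in this sketch. It assumes `sum` is accumulated in microseconds, as the `_usec` suffix suggests; `avg_usec`, `module_averages` and the module name used in any call are hypothetical.

```python
def avg_usec(counter):
    """Average of one avgcount/sum counter, or 0.0 when nothing was recorded."""
    count = counter.get("avgcount", 0)
    return counter.get("sum", 0) / count if count else 0.0


def module_averages(perf_dump, module_name):
    """Map each avgcount/sum counter of one mgr module to its average."""
    section = perf_dump.get("mgr_module_" + module_name, {})
    # Scalar gauges such as "alive" or "cpu_usage" are skipped here.
    return {name: avg_usec(value)
            for name, value in section.items()
            if isinstance(value, dict)}
```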
Signed-off-by: Nitzan Mordechai <nmordech@ibm.com>
Conflicts:
src/mgr/ActivePyModules.cc - finisher.queue changed by 63859, adding py_module to the parameter list
src/mgr/PyModuleRegistry.cc - check_all_modules_started added by 63859
Sun Yuechi [Mon, 25 May 2026 07:07:36 +0000 (15:07 +0800)]
isa-l: enable on RISC-V
ISA-L v2.32.0 added RISC-V support. Enable the ISA-L erasure code
plugin and the zlib compressor on RISC-V when RVV is available.
RVV is detected via the existing ceph_arch_riscv_probe() path added
in 01dc12ad565, so the same Linux 6.5+ requirement applies; on older
kernels the RVV path stays disabled.
ceph-volume: OSD mapper lifecycle (LVM + raw) for activate
This adds small helpers so activate can consistently bring the OSD device
stack online (LVM lvchange, optional mapper open) and tear it down again,
with refresh in between. Same idea for the raw path. Crypto is handled
inside that flow when the OSD is encrypted.
Kefu Chai [Sun, 24 May 2026 08:25:46 +0000 (16:25 +0800)]
rgw: bump Apache Arrow submodule from 17.0.0 to 19.0.1
When WITH_SYSTEM_ARROW is false, Ceph builds Arrow from the bundled
src/apache submodule. Our CI uses ubuntu:jammy as the base image, which
does not package libarrow-dev, so the bundled path is always taken there.
Arrow 17.0.0 vendors a copy of Thrift whose download URLs are no longer
reachable, breaking CI builds that try to fetch them at configure time.
Bump arrow submodule to 19.0.1, the latest Arrow release that:
- builds successfully on ubuntu:jammy, and
- requires only CMake 3.22 (the version shipped by ubuntu:jammy)
See also
CMake version shipped by ubuntu:jammy
- https://packages.ubuntu.com/jammy/cmake