git-server-git.apps.pok.os.sepia.ceph.com Git - ceph.git/log

Guillaume Abrioux [Fri, 29 May 2026 11:13:52 +0000 (13:13 +0200)]

cephadm: omit --osd-type classic for older ceph-volume

Tentacle's ceph-volume doesn't know that flag yet, so teuthology
tests can break during an upgrade.
With this fix, we only add the flag when osd_type isn't classic.
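
A minimal sketch of the conditional described above, in Python. The helper name `build_osd_type_args` and the shape of the argument list are hypothetical and are not taken from cephadm; only the `--osd-type` flag and the `classic` value come from the commit message.

```python
from typing import List, Optional


def build_osd_type_args(osd_type: Optional[str]) -> List[str]:
    """Return extra ceph-volume arguments for the requested OSD type.

    Hypothetical helper: older ceph-volume builds do not accept
    --osd-type, so the flag is left out for the default 'classic' type.
    """
    if osd_type is None or osd_type == "classic":
        return []
    return ["--osd-type", osd_type]
```

Omitting the flag for the default type keeps the generated command line valid for both old and new ceph-volume builds, which is what matters in the middle of an upgrade.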

Fixes: https://tracker.ceph.com/issues/76968
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>

Aashish Sharma [Fri, 29 May 2026 11:01:50 +0000 (16:31 +0530)]

mgr/dashboard: Add Sync from/sync from all options on master zone edit

In the dashboard, the master zone's edit functionality now includes the expected "Sync from Zones" and "Sync from All Zones" options.

Fixes: https://tracker.ceph.com/issues/76989
Signed-off-by: Aashish Sharma <aasharma@redhat.com>

Sun Yuechi [Fri, 29 May 2026 10:39:51 +0000 (18:39 +0800)]

rgw: move SWIFT error_handler out-of-line to fix link failure

The two error_handler overrides are defined inline in rgw_rest_swift.h
and delegate to RGWSwiftWebsiteHandler::error_handler, a non-virtual
function defined in rgw_rest_swift.cc (librgw_a.a). Because the header
is included by rgw_rest.cc, the inline bodies are emitted in
librgw_common.a, which then ODR-uses that symbol across archives.

The link line lists librgw_a.a before librgw_common.a, and GNU ld only
pulls archive members on demand: when librgw_a.a is scanned nothing yet
references RGWSwiftWebsiteHandler::error_handler, so rgw_rest_swift.cc.o
is dropped and the symbol is later unresolved. This shows up as a link
failure with gcc 16 -O2.

Move the two bodies into rgw_rest_swift.cc next to the function they
call, so the ODR-use stays within the same object and the build no
longer depends on archive scan order. No functional change.

Signed-off-by: Sun Yuechi <sunyuechi@iscas.ac.cn>

Sun Yuechi [Fri, 29 May 2026 10:19:18 +0000 (18:19 +0800)]

cmake/AddCephTest: use namespaced Catch2 imported targets

AddCephTest.cmake links the bare target names Catch2 / Catch2WithMain.
With WITH_SYSTEM_CATCH2=ON, CPM resolves Catch2 via find_package(),
which only exports the namespaced IMPORTED targets Catch2::Catch2 /
Catch2::Catch2WithMain. CMake then treats the bare names as plain
library names and the link fails with -lCatch2WithMain, since the
physical library is named libCatch2Main (OUTPUT_NAME "Catch2Main").

Use the namespaced names. Catch2 exports them as ALIASes in the bundled
(CPM subproject) build too, so the default path keeps working.

Signed-off-by: Sun Yuechi <sunyuechi@iscas.ac.cn>

Vallari Agrawal [Wed, 27 May 2026 12:17:55 +0000 (17:47 +0530)]

qa/suites/nvmeof: ignore "have only 1 nvmeof gateway"

Add "have only 1 nvmeof gateway" to ignorelist.
NVMEOF_SINGLE_GATEWAY is already part of ignorelist
but tests sometimes fail on "have only 1 nvmeof gateway".

Thrasher or scalability tests can trigger this, but there
are enough asserts to ensure all expected gateways are
up, so we can safely ignore this health check warning.

Fixes: https://tracker.ceph.com/issues/75913
Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com>

Shai Fultheim [Fri, 29 May 2026 09:26:39 +0000 (12:26 +0300)]

crimson/os/seastore: add safety clamp to adaptive hard_limit and crash_floor

Signed-off-by: Shai Fultheim <shai.fultheim@gmail.com>

Redouane Kachach [Fri, 29 May 2026 09:09:44 +0000 (11:09 +0200)]

qa/tasks: capture CommandCrashedError when running nvme list cmd

The safe_while retry loop does not catch exceptions, so a
CommandCrashedError from `nvme list` bypasses it entirely. Catch
CommandCrashedError and continue the retry loop instead.
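
The idea can be sketched in Python as follows. This is not teuthology's `safe_while`; it is a plain loop written for illustration, and the `CommandCrashedError` class here is a local stand-in for the teuthology exception of the same name. The helper `run_with_retry` is hypothetical.

```python
import time
from typing import Callable


class CommandCrashedError(Exception):
    """Local stand-in for teuthology's exception of the same name."""


def run_with_retry(cmd: Callable[[], str], tries: int = 5, sleep: float = 0.0) -> str:
    """Call cmd() until it succeeds, treating a crash as a retryable failure."""
    last_error = None
    for _ in range(tries):
        try:
            return cmd()
        except CommandCrashedError as err:
            # Without this handler the exception would escape the loop
            # on the first crash, and no retry would happen.
            last_error = err
            time.sleep(sleep)
    raise RuntimeError(f"command still crashing after {tries} tries") from last_error
```

Catching only the crash exception keeps other failures visible: anything unexpected still propagates immediately instead of being retried.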

Fixes: https://tracker.ceph.com/issues/76984
Signed-off-by: Redouane Kachach <rkachach@ibm.com>

Shai Fultheim [Tue, 19 May 2026 22:53:21 +0000 (01:53 +0300)]

crimson/os/seastore: adaptive cleaner gc_max from observed user-burst peak

The previous commit adapts `hard_limit` to track the cleaner's observed
open-segment peak, removing the hard-coded `.10` floor and cutting WAF
~43%. With hard_limit adaptive, the remaining WAF lever is `gc_max` —
the threshold that gates when the cleaner runs in non-emergency mode
and therefore the cluster's steady-state operating fill. Lower gc_max
= higher fill = more dead bytes per reclaim cycle = fewer live bytes
copied = lower GC component of WAF.

The hard-coded default of `0.15` (cleaner triggers at 85% segment
fill) is over-provisioned for the typical cluster. On the bench
workload the empirically optimal `gc_max` is about 0.08, which at the
default 0.15 means ~7% of cluster space sits unused and ~1.5x of WAF
is paid for the privilege.

This commit makes gc_max adaptive: it decays each window from its
initial static value toward an observation-derived floor

  target_floor = hard_limit + (peak_projected_used / total)

The floor is the smallest gap the cluster needs to absorb its observed
worst-case in-flight user reservation. `peak_projected_used` is tracked
across the cluster's lifetime with a slow exponential decay applied
each adjust cycle.

Decay rate
==========

The decay multiplier is `0.995` per 30 s elapsed window. The decay is
applied lazily: each call to `maybe_adjust_thresholds()` raises 0.995
to the actual elapsed seconds / 30. This way the decay catches up
correctly even if the background process was idle and the hook went
uncalled for many cycles. A naive per-call multiplication would freeze
the decay during idle phases (the issue observed in v1 testing where
peak stayed at its high-water mark across a 45-minute idle window).
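
A small Python sketch of the elapsed-time decay described above. The constants 0.995 and 30 s come from the commit message; the `DecayedPeak` class and its method names are hypothetical and do not mirror the SegmentCleaner code.

```python
DECAY_PER_WINDOW = 0.995   # multiplier per elapsed window (from the commit message)
WINDOW_SEC = 30.0          # window length in seconds (from the commit message)


def retention(elapsed_sec: float) -> float:
    """Fraction of a peak that survives after elapsed_sec seconds."""
    return DECAY_PER_WINDOW ** (elapsed_sec / WINDOW_SEC)


class DecayedPeak:
    """Hypothetical tracker for a peak value with elapsed-time decay."""

    def __init__(self, now: float) -> None:
        self.value = 0.0
        self.last_decay = now

    def observe(self, sample: float) -> None:
        # Keep the larger of the decayed peak and the new sample.
        self.value = max(self.value, sample)

    def decay(self, now: float) -> None:
        # Lazy decay: scale by the real elapsed time, so one call after a
        # long idle gap equals many on-time calls.
        self.value *= retention(now - self.last_decay)
        self.last_decay = now
```

Because the exponent is the elapsed time, a single `decay()` call after a 45-minute gap gives the same result as ninety calls spaced 30 s apart, which is the property the naive per-call multiplication lacks.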

Decay timeline (fraction of original value remaining, on a system
where maybe_adjust_thresholds is called at least every 30 s during
idle — or any interval, since the decay is now elapsed-time-based):

  - half-life: log(0.5) / log(0.995) ≈ 138 windows ≈ 69 min ≈ 1 hour
  - peak retention timeline:
       5 min  → 95 %
      30 min  → 74 %
       1 hour → 55 %
       4 hours →  9 %
      12 hours →  0.07 %
      24 hours → effectively 0

So a single observed peak influences gc_max strongly for ~1 hour,
noticeably for ~4 hours, and is essentially forgotten within a day.

This is sized to be much longer than transient bench phases (peaks
remain >85% of true value within a 16 min bench, never roll out
prematurely) yet much shorter than workload-shift timescales (a
workload that genuinely eases sees gc_max shrink within hours).

Re-discovery
============

The decay lets gc_max eventually re-discover lower floors when a
workload genuinely eases, while preserving observed peaks long enough
that transient bursts inside a steady workload don't roll out
prematurely.

gc_max is bounded below by the floor at all times — so the workload's
observed needs are always satisfied without static tuning. Each
window, gc_max moves halfway toward the floor (`gc_max = max(floor,
(gc_max + floor) / 2)`). This is binary-search-style convergence:
distance to floor halves per window. When the floor rises (workload
reveals a new peak), gc_max jumps up to meet it immediately. When the
floor falls (peaks have decayed below current gc_max), gc_max halves
toward the lower value over the next several windows.
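
The update rule above can be sketched in Python. Both formulas are taken from the commit message; the function names `target_floor` and `step_gc_max` are hypothetical.

```python
def target_floor(hard_limit: float, peak_projected_used: float, total: float) -> float:
    """Floor from the commit message: hard_limit plus the peak reservation ratio."""
    return hard_limit + peak_projected_used / total


def step_gc_max(gc_max: float, floor: float) -> float:
    """One window of the update: move halfway to the floor, never below it."""
    return max(floor, (gc_max + floor) / 2.0)
```

When `gc_max` is above the floor, the midpoint is also above the floor, so the distance halves each window. When `gc_max` is below the floor, the midpoint is below it too, so `max()` returns the floor and the value jumps up in a single step.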

Bootstrap safety: gc_max retains the existing static initial value
(0.15), so a freshly mounted cleaner runs at the same operating point
as today's code until observations have accumulated. This avoids the
"cluster crashes before adaptive sees a workload" failure mode that
naive `gc_max = hard_limit + observed` produces.

Implementation
==============

A single double member on SegmentCleaner: `peak_projected_used_decayed`
is updated to `max(current, projected_used_bytes)` on each
`try_reserve_projected_usage()` call. `maybe_adjust_thresholds()`
applies `std::pow(0.995, elapsed_sec / 30.0)` decay on each invocation
(every ≥30 s in steady state, longer if the cleaner was idle). The
floor uses this value directly.

Bench measurements (qa/standalone/crimson randwrite, 1 MiB writes,
32 GiB per-OSD null_blk, 70% fill, 1280 GiB write target):

  Configuration                          | WAF     | Duration | Status
  ---------------------------------------|---------|----------|---------
  Static defaults (gc_max=.15, hard=.10) |   5.749 |   33 min | clean
  Manual tuned (gc_max=.08, hard=.02)    |   2.926 |   16 min | clean
  Adaptive hard_limit only               |   3.276 |   17 min | clean
  Adaptive hard_limit + gc_max (HEAD)    |   2.829 |   17 min | clean

Adaptive gc_max reduces WAF a further 14% vs hard_limit-only (3.276 ->
2.829) and slightly beats the hand-tuned manual point (2.926). The
per-OSD adaptation captures workload asymmetry that uniform static
defaults can't: on the bench's PG-imbalanced setup the lightly-loaded
osd.0 settled at gc_max=0.026 (much tighter than the manual 0.08)
while osd.1 took the full traffic and settled at gc_max=0.084. Both
extract maximum efficiency for their actual load instead of running
at worst-case-conservative values.

A separate decay-validation run (45-minute idle interlude between two
heavy phases) confirmed that the lazy decay catches up correctly even
when the background process was dormant during the idle phase.

No new workload-tuned constants are introduced. The literal numbers
in this commit are:
  - the 30 s window from the previous commit (time scale of the
    feedback loop)
  - the binary-search halving rate (control geometry, not workload-
    specific; could be 1/3 or 1/4 with similar convergence)
  - the 0.995 decay rate (per-window multiplier; gives the ~1-hour
    half-life and ~24-hour full-forget behaviour described above;
    recompile-only)

The existing `get_default()` value of `0.15` is left untouched as the
bootstrap initial — operators who disable adaptive control (future
config knob) revert to today's exact behaviour.

Signed-off-by: Shai Fultheim <shai.fultheim@gmail.com>

Shai Fultheim [Tue, 19 May 2026 10:55:02 +0000 (13:55 +0300)]

crimson/os/seastore: adaptive cleaner hard_limit from observed open-segment peak

The cleaner's `available_ratio_hard_limit` controls when user IO blocks
(once projected_aratio < hard_limit). Setting it too high causes
unnecessary blocks during transient pressure; setting it too low risks
running out of free segments for the cleaner's own working set, which
aborts the OSD with "seastore device size setting is too small".

The current default of `0.10` was chosen empirically and does not scale
with cluster geometry. On a 32 GiB cluster with default 64 MiB segments,
`0.10` reserves ~3 GiB of always-empty space. The cleaner's actual
named-writer working set is 1 journal + `seastore_hot_tier_generations`
hot writers + `seastore_cold_tier_generations` cold writers + 1
metadata writer = (hot + cold + 2) segments. For the typical defaults
(5 hot, 3 cold) that is 10 segments = 640 MiB on a 32 GiB OSD = 2.0%.
Reserving 10% leaves ~80% of that "headroom" sitting unused, which
causes the cluster to operate at lower fill, accumulate fewer dead
bytes per segment, and pay 4-5x WAF on garbage collection cycles.

This commit makes hard_limit adaptive: track the peak open-segment
count observed during each 30 s window, then derive

  hard_limit = max(observed_peak, named_writers) + 1
             ────────────────────────────────────────
                       (segments_in_cluster)

where the "+ 1" segment is the minimum safety unit (one more open
segment than ever observed). The `named_writers` count is the
architectural floor below which the cleaner cannot allocate; staying
above it prevents the abort. `observed_peak` floats to track the
actual transient overhead introduced by segment transitions in the
running workload.
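
As a worked example of the formula above, here is a short Python sketch. The function names are hypothetical; the inputs used in the note below (5 hot and 3 cold generations, 64 MiB segments on a 32 GiB OSD, observed peak of 7) are the figures quoted in this commit message.

```python
def named_writer_count(hot_generations: int, cold_generations: int) -> int:
    """1 journal + hot writers + cold writers + 1 metadata writer."""
    return hot_generations + cold_generations + 2


def adaptive_hard_limit(observed_peak: int, named_writers: int, total_segments: int) -> float:
    """(max(observed_peak, named_writers) + 1) / total_segments."""
    if total_segments <= 0:
        raise ValueError("total_segments must be positive")
    return (max(observed_peak, named_writers) + 1) / total_segments
```

With 32 GiB / 64 MiB = 512 segments, 10 named writers and an observed peak of 7, the result is 11 / 512, about 0.0215, which agrees with the computed hard_limit reported in the bench table.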

Implementation
==============

`AsyncCleaner::maybe_adjust_thresholds()` is added as a virtual no-op
hook; `SegmentCleaner` overrides it. The hook is invoked once per
`BackgroundProcess::run()` iteration. Each call samples the current
open-segment count into the rolling window peak. Every 30 s, the
window's peak is consumed to recompute hard_limit, and the window
resets.
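
The sampling scheme in this paragraph can be sketched in Python. The `WindowPeak` class is hypothetical and only illustrates the sample, consume, reset cycle; it is not the SegmentCleaner implementation.

```python
from typing import Optional


class WindowPeak:
    """Hypothetical rolling-window peak sampler with a fixed recompute interval."""

    def __init__(self, now: float, window_sec: float = 30.0) -> None:
        self.window_sec = window_sec
        self.window_start = now
        self.peak = 0

    def sample(self, open_segments: int, now: float) -> Optional[int]:
        """Record one sample; return the window's peak when the window closes."""
        self.peak = max(self.peak, open_segments)
        if now - self.window_start < self.window_sec:
            return None
        closed_peak = self.peak
        # Start a fresh window so old peaks do not leak into the next one.
        self.peak = 0
        self.window_start = now
        return closed_peak
```

A caller would invoke `sample()` once per background iteration and recompute the threshold only when it returns a value.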

`config_t config` loses its `const` qualifier; the only mutation is
this hook, which is the single writer in the cleaner's shard.

This commit only adapts `hard_limit`. `gc_max` remains at its existing
default (0.15). A follow-up commit will add adaptive `gc_max` driven
by observed user-burst and cleaner-cycle peaks; that is where the
remaining WAF reduction lives.

Bench measurements
==================

qa/standalone/crimson randwrite at 70% fill, 1 MiB writes, 32 GiB
per-OSD null_blk backing, 1280 GiB write target. Comparison against
the same workload with static `hard_limit = 0.10`:

  Metric                | static (0.15, 0.10) | adaptive hard_limit |
  ----------------------|---------------------|---------------------|
  user_written          |          1,374 GiB  |          1,374 GiB  |
  device_written        |          7,901 GiB  |          4,503 GiB  |
  WAF (d / u)           |              5.749  |              3.276  |
  completion            |              100 %  |              100 %  |
  bench duration        |             33 min  |             17 min  |
  fio exit              |             rc = 0  |             rc = 0  |
  observed peak open    |                  -  |       7 (each OSD)  |
  computed hard_limit   |                  -  |             0.0215  |

WAF drops 43 % and end-to-end throughput nearly doubles. The mechanism
is that fewer projected_aratio dips cross the (much lower) block
threshold, so the cluster spends less time in the block-recover-block
cycle that bloats device_written without progressing user_written.

No new workload-tuned constants are introduced. The two literal
numbers in the algorithm are the 30 s recompute interval (time scale
of the feedback loop, not workload-specific) and the `+ 1 segment`
safety unit (the smallest possible buffer in units the cleaner can
allocate).

Signed-off-by: Shai Fultheim <shai.fultheim@gmail.com>

Ronen Friedman [Thu, 28 May 2026 16:14:41 +0000 (16:14 +0000)]

osd/scrub: prefer modern C++ random facilities

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>

Ronen Friedman [Thu, 28 May 2026 16:08:41 +0000 (16:08 +0000)]

osd/scrub: guarantee the double version of abs()

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>

Ronen Friedman [Thu, 28 May 2026 16:01:24 +0000 (16:01 +0000)]

osd/scrub: fix level comparison in cmp_*_entries()

The existing code was, by mistake, asymmetric.

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>

Ronen Friedman [Thu, 28 May 2026 15:54:44 +0000 (15:54 +0000)]

osd/scrub: remove the unused current-time parameter

from both adjust_deep_schedule() and adjust_shallow_schedule().
In both cases, we only rely on the 'last' stamps to
determine the next scheduled time.

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>

Ronen Friedman [Thu, 28 May 2026 15:25:29 +0000 (15:25 +0000)]

osd/scrub: removing dead code

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>

Ronen Friedman [Thu, 28 May 2026 15:24:44 +0000 (15:24 +0000)]

osd/scrub: fixing doc lines re 'repairing' scrubs

Scrubs with the 'repairing' urgency are not subject to the
'no-scrub' flags, nor are they blocked by max-concurrency.

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>

Ronen Friedman [Fri, 29 May 2026 04:00:00 +0000 (07:00 +0300)]

Merge pull request #69110 from ronen-fr/wip-rf-hours

osd/scrub: 'repairing' scrubs allowed at all times

Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>

Patrick Donnelly [Fri, 29 May 2026 01:35:28 +0000 (21:35 -0400)]

script/backport-create-issue: update custom field name

It's now "Ceph Release". I renamed it for clarity.

Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>

Patrick Donnelly [Thu, 28 May 2026 23:48:14 +0000 (19:48 -0400)]

Merge PR #68936 into main

* refs/pull/68936/head:
osd: Fix bug when calculating min_peer_features

Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>
Reviewed-by: Alex Ainscow <aainscow@uk.ibm.com>

Yuri Weinstein [Mon, 13 Apr 2026 21:33:15 +0000 (14:33 -0700)]

doc: squid 19.2.4 release notes

Signed-off-by: Yuri Weinstein <yweinste@redhat.com>

David Galloway [Thu, 28 May 2026 21:25:06 +0000 (17:25 -0400)]

Merge pull request #69073 from ShwetaBhosale1/fix_nfs_version_issue

Use GANESHA_REPO_BASEURL for NFS-Ganesha on all distros

John Mulligan [Tue, 5 May 2026 17:47:44 +0000 (13:47 -0400)]

cephadm: in cephadm shell mount /var/lib/ceph under /srv

When running cephadm shell, mount /var/lib/ceph at /srv/ceph unless
/var/lib/ceph is already being passed via the cephadm shell -v option.
The default mount is read-only. Passing it manually allows the user
to mount it in a custom location read/write.
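
A minimal Python sketch of that rule. The helper `add_default_ceph_mount` is hypothetical, and the `src:dst[:opts]` spec format and the exact-path comparison are assumptions made for illustration; cephadm's own matching logic may differ.

```python
from typing import List

CEPH_LIB_DIR = "/var/lib/ceph"
DEFAULT_TARGET = "/srv/ceph"


def add_default_ceph_mount(volume_args: List[str]) -> List[str]:
    """Return volume specs, adding a read-only default mount when needed.

    volume_args holds 'src:dst[:opts]' strings as given to -v.
    """
    for spec in volume_args:
        source = spec.split(":", 1)[0].rstrip("/")
        if source == CEPH_LIB_DIR:
            # The user mounted it explicitly; respect their choice.
            return list(volume_args)
    return list(volume_args) + [f"{CEPH_LIB_DIR}:{DEFAULT_TARGET}:ro"]
```

Returning a new list rather than mutating the input keeps the user's original arguments intact for logging or error messages.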

The mount location at /srv/ceph is chosen because /var/lib/ceph is
already in use for compatibility with various ceph. The /srv tree
is currently unused by the container and serves a similar purpose
to /var/lib if you turn your head and squint a little.

Making this change enables the use of tools that want to read
files or connect to sockets in that file hierarchy. Specifically,
in this case, the ceph smb ctl tool.

Signed-off-by: John Mulligan <jmulligan@redhat.com>

John Mulligan [Fri, 3 Apr 2026 19:13:47 +0000 (15:13 -0400)]

python-common/ceph/smb: add frontend entry point to library

Add a __main__.py file with a frontend for interacting with the
remote-control (grpc) feature for SMB. This can be invoked
on the command line using `python -m ceph.smb.ctl` assuming that
the ceph module is importable.

This command line makes it easier to interact with the remote-control
server without knowing a lot about how it is implemented.

Signed-off-by: John Mulligan <jmulligan@redhat.com>

John Mulligan [Wed, 1 Apr 2026 22:22:53 +0000 (18:22 -0400)]

python-common/ceph/smb: add client.py for remote-control grpc client

Add a new client.py that contains the main library for acting as a
client of the remote-control grpc service for SMB. This is based on grpc
reflection rather than rigidly following an api generated from protobuf.
As this system is rapidly evolving this avoids having to keep generated
files in sync and more closely matches the grpcurl tool people are
already using with this feature.

Signed-off-by: John Mulligan <jmulligan@redhat.com>

Patrick Donnelly [Thu, 28 May 2026 17:57:54 +0000 (13:57 -0400)]

Merge PR #69055 into main

* refs/pull/69055/head:
qa/suites/upgrade: ignore osd in unknown state

Reviewed-by: Laura Flores <lflores@redhat.com>

John Mulligan [Thu, 28 May 2026 17:53:15 +0000 (13:53 -0400)]

Merge pull request #69128 from xhernandez/fix-unbound-var

mgr/smb: fix incorrect referenced variable

Reviewed-by: Avan Thakkar <athakkar@redhat.com>
Reviewed-by: Shwetha Acharya <sacharya@redhat.com>
Reviewed-by: John Mulligan <jmulligan@redhat.com>

Adam Kupczyk [Thu, 28 May 2026 16:28:59 +0000 (18:28 +0200)]

Merge pull request #3 from ifed01/wip-ifed-improve-stray-spanning-blobs-fix

os/bluestore: improve stray spanning blobs fix

Ronen Friedman [Thu, 28 May 2026 14:55:41 +0000 (14:55 +0000)]

osd/scrub: removed a misleading comment about 'overdue' scrubs

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>

Jon Bailey [Tue, 26 May 2026 12:31:47 +0000 (13:31 +0100)]

qa: test_rados_tool - change check on osd dump command to use json

Previously the test_rados_tool.sh test was dependent on flag ordering. This meant that adding a new flag after full_quota (such as split_reads or ec_optimizations) could break the teuthology test when run with these flags on. We prevent this by changing the condition to use JSON, so that we no longer depend on the order of the flags in the default command-line output.

This also adds a check to ensure the pool name matches what we are working on, to ensure we don't get false-positives if we happened to have other pools.
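
The order-independent check can be sketched in Python. The actual test is a shell script, so this is only an illustration of the approach; the JSON field names `pools`, `pool_name` and `flags_names` are assumptions about the shape of the JSON osd dump output, and `pool_flags` is a hypothetical helper.

```python
import json
from typing import Set


def pool_flags(osd_dump_json: str, pool_name: str) -> Set[str]:
    """Return the set of flag names for one pool from JSON osd dump output."""
    dump = json.loads(osd_dump_json)
    for pool in dump.get("pools", []):
        if pool.get("pool_name") == pool_name:
            names = pool.get("flags_names", "")
            return {flag for flag in names.split(",") if flag}
    raise KeyError(f"pool {pool_name!r} not found")
```

Testing `"full_quota" in pool_flags(output, name)` then passes regardless of which other flags are set or where they appear, and the pool name match avoids picking up flags from an unrelated pool.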

Signed-off-by: Jon Bailey <jonathan.bailey1@ibm.com>

Callum James [Tue, 21 Apr 2026 10:08:16 +0000 (11:08 +0100)]

osd: Using objects_read_local instead of objects_read_sync

Signed-off-by: Callum James <callum.james@ibm.com>

Alex Ainscow [Thu, 5 Feb 2026 13:08:10 +0000 (13:08 +0000)]

docs: Update design document

Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
Signed-off-by: Callum James <callum.james@ibm.com>

Alex Ainscow [Thu, 5 Feb 2026 13:07:51 +0000 (13:07 +0000)]

*: Update PendingReleaseNotes for new parameter

Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
Signed-off-by: Callum James <callum.james@ibm.com>

Alex Ainscow [Thu, 16 Apr 2026 09:28:29 +0000 (10:28 +0100)]

test/librbd: Fix infinite recursion in MockTestMemRadosClient::do_mon_command

The do_mon_command() method was calling the mocked mon_command() which
has a default action that calls do_mon_command(), creating an infinite
recursion that caused a segmentation fault due to stack overflow.

Fixed by calling TestRadosClient::mon_command() (the base class
implementation) instead of the mocked version.

This resolves the segfault in unittest_librbd when running the
run-rbd-unit-tests.sh script.

Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>

Jon Bailey [Wed, 4 Mar 2026 13:45:57 +0000 (13:45 +0000)]

test: modify osd types tests to include the new pool parameters so they are tested

Signed-off-by: Jon Bailey <jonathan.bailey1@ibm.com>

Alex Ainscow [Thu, 5 Feb 2026 14:51:46 +0000 (14:51 +0000)]

test: Parameterize librados tests and add more split op tests.

Previously the librados tests were each restricted to a particular
configuration. Here we parameterize to execute against multiple
configurations of pool and fix the necessary create/clean up work.

Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
Signed-off-by: Jon Bailey <jonathan.bailey1@ibm.com>

Alex Ainscow [Wed, 22 Apr 2026 09:46:43 +0000 (10:46 +0100)]

osdc/SplitOp: Fix reference_sub_read initialization bug

Fixed a bug where reference_sub_read was set to -1 when non-read operations
were processed before read operations, causing crashes when accessing
sub_reads[reference_sub_read].

Changes:
1. Added init_reference_sub_read() virtual method to initialize reference_sub_read
   early, after _calc_target() populates the acting set but before processing
   any operations.

2. ECSplitOp::init_reference_sub_read(): Sets reference_sub_read to the acting
   index of the primary shard by performing a reverse lookup.

3. ReplicaSplitOp::init_reference_sub_read(): Counts valid OSDs and picks a
   random acting index for load balancing.

4. Simplified init_read() in both classes by removing duplicate logic that
   previously set reference_sub_read during read processing.

5. Added safety check in SplitOp::init() to ensure the reference_sub_read
   entry exists in sub_reads before accessing it for non-read operations.
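
Items 2 and 3 above can be illustrated with a short Python sketch. The real code is C++ in osdc/SplitOp; the function names here are hypothetical, and the `NO_OSD` placeholder value for an empty acting-set slot is an assumption.

```python
import random
from typing import List, Optional

NO_OSD = 0x7FFFFFFF  # assumed placeholder for an empty acting-set slot


def ec_reference_index(acting: List[int], primary: int) -> int:
    """Reverse lookup: position of the primary OSD in the acting set."""
    for index, osd in enumerate(acting):
        if osd == primary:
            return index
    raise ValueError("primary is not in the acting set")


def replica_reference_index(acting: List[int], rng: Optional[random.Random] = None) -> int:
    """Pick a random acting index among slots that hold a valid OSD."""
    valid = [index for index, osd in enumerate(acting) if osd != NO_OSD]
    if not valid:
        raise ValueError("acting set has no valid OSD")
    return (rng or random).choice(valid)
```

In both variants the index is chosen from the acting set alone, before any operation is inspected, so it no longer matters whether a read or a non-read operation comes first.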

The fix ensures reference_sub_read is always set before any operations are
processed, preventing the crash that occurred when STAT, GETXATTR, or other
non-read operations appeared before READ operations in the operation list.

Tested with split_op_cxx.cc test suite: 33/35 tests pass (2 test expectation
issues unrelated to the core bug fix).

Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>

Matty Williams [Mon, 20 Oct 2025 15:46:43 +0000 (16:46 +0100)]

test/osd: Add balanced read flags to io_sequence exerciser

Added optional "-b"/"balanced" flag to the end of read/read2/read3 operations in interactive mode, to make them balanced reads.
Balanced read percentage is not used in interactive mode.

Add command line argument to specify percentage of read ops that should use the balanced reads flag. Default is 100%.

Signed-off-by: Matty Williams <Matty.Williams@ibm.com>
Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
Signed-off-by: Jon Bailey <jonathan.bailey1@ibm.com>

Alex Ainscow [Thu, 5 Feb 2026 13:12:47 +0000 (13:12 +0000)]

erasure-code/consistency: Extend get shard/osd to specify namespace.

Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>

Alex Ainscow [Thu, 5 Feb 2026 14:45:04 +0000 (14:45 +0000)]

osdc: Refactor SplitOp

There are a large number of changes in this commit which were found through
development and testing of split ops.

I have split out all the objecter updates carefully, but since the split op
code is not currently used in production, I have not documented every change
and made significant refactors/rearrangements.

Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>

Alex Ainscow [Thu, 5 Feb 2026 14:04:30 +0000 (14:04 +0000)]

osdc: Implement cancel mechanism for split ops.

When an op that has been split is canceled or timed out, all the
sub ops need to be canceled.

This commit adds a mechanism to map the original op to the sub reads.

Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>

Alex Ainscow [Thu, 5 Feb 2026 14:00:38 +0000 (14:00 +0000)]

osdc: Do not recalculate target for split ops.

SplitOp calculates the target and sets the necessary target OSD itself. This
means that calc_target is not required again on first submit of the sub
read ops.

Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>

Alex Ainscow [Thu, 5 Feb 2026 14:38:21 +0000 (14:38 +0000)]

osdc: Split the Objecter handle reply into more utilities

This was one big monolithic function which handled the entire reply message.

SplitOps needs to re-use parts of this function. The previous split ops
commit dealt with one split, but this adds another.

Also rename the functions to have a better name.

Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>

Alex Ainscow [Thu, 5 Feb 2026 13:34:58 +0000 (13:34 +0000)]

osdc: Extend op_post_submit to cope with successful Ops and move SplitOp decision point.

The locking situation in Objecter is complex. When ops are completed, whether with success or otherwise, some locks are held. For split ops this is particularly complex, since multiple sessions are involved in the completion.

To avoid these deadlock issues, SplitOps schedule a completion task using asio::post, which can then take the appropriate locks before completing the IO, without risk of deadlock.

Usage of this will be added in a refactor of SplitOps.

In addition, the split was previously calculated as soon as the op was submitted. Here we move the submit down to below the throttling and timeout code. This way we throttle/timeout the original op.

The timeout (op_cancel) will be handled in a later commit.

As part of this commit we also introduce a SplitOp session. This allows us to keep track of the parent ops while the child ops have been submitted and redrive the correct op(s) when necessary.

Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
Signed-off-by: Jon Bailey <jonathan.bailey1@ibm.com>

Alex Ainscow [Thu, 5 Feb 2026 13:30:36 +0000 (13:30 +0000)]

osdc: Add split_op statistic

This statistic counts the number of OPs which have been submitted using the
split op mechanism. It allows a user to check how useful this is and
performance/development to check that this mechanism is being used in
any given application.

Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>

Alex Ainscow [Thu, 5 Feb 2026 13:24:29 +0000 (13:24 +0000)]

osdc: Add config option to specify split-replica-read threshold

SplitOps will add support to split replica reads. This allows the user to
specify the threshold at which they are split.

The default is currently set at 256k. This is set at a point where we have
confidence that split ops will never reduce performance.

Development may choose to reduce this default based on performance measurements.

Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
Signed-off-by: Callum James <callum.james@ibm.com>
Signed-off-by: Jon Bailey <jonathan.bailey1@ibm.com>

Alex Ainscow [Thu, 5 Feb 2026 13:16:25 +0000 (13:16 +0000)]

osdc: Add FORCE and FAIL_ON_EAGAIN flags.

Previously, the lower levels of Objecter would potentially redrive ops to
different OSDs when the map changed or the OSD returned -EAGAIN. These
flags will be used to change this behaviour:

* FORCE_OSD means that the OSD is fixed and cannot be changed.
* FAIL_ON_EAGAIN means that rather than redriving, the OP should be failed (to splitops)

Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>

Alex Ainscow [Thu, 5 Feb 2026 15:00:57 +0000 (15:00 +0000)]

osd: Remove lldiv from ECUtil

lldiv is not faster and is less clear, so refactor the code to be more readable
and faster!

Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>

Alex Ainscow [Thu, 5 Feb 2026 15:00:03 +0000 (15:00 +0000)]

osd: Correct accounting and return codes for Direct Reads

We will never return -EAGAIN from ECBackend. If ECBackend returns EAGAIN, this causes the PrimaryLogPG code to drop the op. This is for historical reasons, but hard to refactor out.
Instead, the PrimaryLogPG code has been refactored to work out that EAGAIN is required much earlier in the processing, where EAGAIN will be returned to the client.

Here we also correct accounting in do_read and sparse_read so that we can correctly track the number of bytes read from direct reads.

Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
Signed-off-by: Jon Bailey <jonathan.bailey1@ibm.com>

Alex Ainscow [Thu, 5 Feb 2026 13:14:07 +0000 (13:14 +0000)]

osd: Torn write protection for Direct Reads

It is possible for direct reads to query two separate shards and
get a different version of the object from each shard.

To solve this we add a get_internal_version op to tell us the version
of the object on that shard and submit that in the same transaction
as the read so we can ensure the versions are what we expect. If we
have a mismatch, we resubmit the read through the primary path.
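
The fallback logic can be sketched in Python under simplifying assumptions. Here each shard read is modelled as a callable returning a `(version, data)` pair, and the versions are compared against each other rather than against an expected value; the names `ShardRead` and `read_with_version_check` are hypothetical and do not correspond to the OSD code.

```python
from typing import Callable, List, Tuple

ShardRead = Callable[[], Tuple[int, bytes]]  # returns (object version, data)


def read_with_version_check(shard_reads: List[ShardRead],
                            primary_read: Callable[[], bytes]) -> bytes:
    """Join direct shard reads if their versions agree, else use the primary."""
    results = [read() for read in shard_reads]
    versions = {version for version, _ in results}
    if len(versions) > 1:
        # Shards saw different object versions: a write landed between
        # the reads, so the pieces cannot be combined safely.
        return primary_read()
    return b"".join(data for _, data in results)
```

Fetching the version in the same step as the data is what makes the check meaningful: a version read separately could itself be stale by the time the data arrives.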

Also a couple of spelling/tidy ups

Signed-off-by: Jon Bailey <jonathan.bailey1@ibm.com>
Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
Signed-off-by: Callum James <callum.james@ibm.com>

Alex Ainscow [Thu, 5 Feb 2026 15:00:03 +0000 (15:00 +0000)]

osd: SplitOp preparatory work in osd_types

- Add ec_data_shard_count interface
- Prevent sending of the split reads flag to tentacle OSDs
- Add ec data shard count and coding shard count into the pool
- Encode shard mappings into the pool, for use by Direct reads

Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>

Gil Bregman [Thu, 28 May 2026 12:41:26 +0000 (15:41 +0300)]

nvmeof: Change the NVMEOF image version to 1.8

Fixes: https://tracker.ceph.com/issues/76958
Signed-off-by: Gil Bregman <gbregman@il.ibm.com>

Guillaume Abrioux [Thu, 28 May 2026 11:39:55 +0000 (13:39 +0200)]

ceph-volume: retry lvs after empty result and "devices file is missing" stderr

When LVM's devices file is out of sync with the runtime device view (common
in teuthology/container namespaces with multipath), `lvs` can exit 0 with
empty stdout and only stderr warnings about missing mapper entries.
It can leave get_lvs() empty and cause Device() to fall through to lsblk on a
vg/lv path which can produce a misleading "not a block device" error.

With this fix, ceph-volume retries once with 'use_devicesfile=0' when it
detects this specific pattern.
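
The retry condition can be sketched as follows. This is an illustration
only, not the ceph-volume code: `lvs_with_retry`, the `run` callable and
`retry_args` are hypothetical, and the commit message does not show how
'use_devicesfile=0' is actually passed to `lvs`, so the caller supplies
those arguments.

```python
from typing import Callable, Sequence, Tuple

RunResult = Tuple[str, str, int]  # (stdout, stderr, returncode)
MARKER = "devices file is missing"  # stderr text named in the commit subject


def lvs_with_retry(run: Callable[[Sequence[str]], RunResult],
                   base_cmd: Sequence[str],
                   retry_args: Sequence[str]) -> str:
    """Run lvs; retry once with extra args on the 'empty but successful' pattern."""
    out, err, rc = run(list(base_cmd))
    if rc == 0 and not out.strip() and MARKER in err:
        # Exit status 0, empty stdout, and the specific stderr warning:
        # retry exactly once with the devices file disabled.
        out, err, rc = run(list(base_cmd) + list(retry_args))
    return out
```

Injecting `run` keeps the sketch testable without LVM installed.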

Fixes: https://tracker.ceph.com/issues/76959
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>

commit | commitdiff | tree

Kefu Chai [Thu, 28 May 2026 02:25:19 +0000 (10:25 +0800)]

crimson: replace assert_all class with a format-safe function template

27b232ecc058 relaxed `assert_all` to accept any `const char*` instead of
only array-ref literals, after which callers started passing
`fmt::format(...).c_str()`, which returns a pointer into a temporary
`std::string` that dies at the semicolon, leaving the stored pointer
dangling when an error eventually fires.

fmtlib solves the same problem with one `format()` function and
`fmt::format_string<T...>`, which validates the format string at compile
time and accepts only literals or `fmt::runtime()` at the call site, with
no `const char*` path at all.

apply the same approach here: replace the `assert_all` class in both the
errorator class template and the `ct_error` namespace with a
`static auto assert_all(fmt::format_string<Args...>, Args&&...)` function
template. the format string is validated at the call site; inside the
lambda the args are captured by value and `fmt::vformat` is used for
runtime dispatch, avoiding the consteval re-entry that `fmt::format` would
trigger. all 56 call sites are migrated from brace-init syntax to
function-call syntax, and format arguments can now be passed directly
instead of requiring a pre-formatted string.

Signed-off-by: Kefu Chai <k.chai@proxmox.com>

commit | commitdiff | tree

Xuehan Xu [Thu, 28 May 2026 02:03:09 +0000 (10:03 +0800)]

Merge pull request #69095 from xxhdx1985126/wip-seastore-onode-cpu-overhead-output

crimson/os/seastore/onode_manager: avoid debug related info from occupying too much cpu time

Reviewed-by: Matan Breizman <mbreizma@redhat.com>

commit | commitdiff | tree

Matthew N. Heler [Mon, 4 May 2026 15:54:13 +0000 (10:54 -0500)]

rgw/http: use a dedicated mutex for reqs_change_state

set_request_state() pushed into reqs_change_state without holding
any lock. Concurrent callers and manage_pending_requests raced on
the list and corrupted node links, crashing in std::list::_M_hook.

Use a dedicated mutex; reusing reqs_lock would invert the
completion path's reqs_lock -> req_data->lock order against the
set_request_state callers, which already hold req_data->lock.
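
A rough Python analogue of the locking pattern, assuming a hypothetical
`RequestManager` class. It is not the RGW code. In CPython the GIL already
makes `list.append` atomic, so the sketch shows only the structure: one
lock that guards nothing but the state-change list, and that is never held
together with any other lock.

```python
import threading


class RequestManager:
    def __init__(self):
        # Dedicated lock: protects only the state-change list.
        self._change_state_lock = threading.Lock()
        self._reqs_change_state = []

    def set_request_state(self, req, state):
        # Callers may already hold a per-request lock; this lock is a leaf,
        # so taking it here cannot invert any existing lock order.
        with self._change_state_lock:
            self._reqs_change_state.append((req, state))

    def manage_pending_requests(self):
        # Swap the list out under the lock, then process it outside.
        with self._change_state_lock:
            pending, self._reqs_change_state = self._reqs_change_state, []
        return pending
```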

Signed-off-by: Matthew N. Heler <matthew.heler@hotmail.com>

commit | commitdiff | tree

Casey Bodley [Wed, 27 May 2026 21:22:10 +0000 (17:22 -0400)]

Merge pull request #67246 from mheler/wip-rgw-sse-gcm

rgw: Add AES-256-GCM support for RGW server-side encryption

Reviewed-by: Casey Bodley <cbodley@redhat.com>

commit | commitdiff | tree

Tomer Haskalovitch [Mon, 4 May 2026 10:20:45 +0000 (13:20 +0300)]

mgr/dashboard: don't treat errno.EREMOTE as an error to avoid errors from listener add command

fixes: https://tracker.ceph.com/issues/76418
Signed-off-by: Tomer Haskalovitch <tomer.haska@ibm.com>

commit | commitdiff | tree

John Mulligan [Wed, 27 May 2026 19:11:47 +0000 (15:11 -0400)]

Merge pull request #68851 from stzuraski898/wip-sz-76513-unittest-analysis-script

scripts: add Jenkins unit test analysis tool

Reviewed-by: John Mulligan <jmulligan@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>

commit | commitdiff | tree

Casey Bodley [Wed, 27 May 2026 18:35:40 +0000 (14:35 -0400)]

Merge pull request #69111 from cbodley/wip-doc-71265

doc/rgw: document s3control apis for PublicAccessBlock

Reviewed-by: Anthony D Atri <anthony.datri@gmail.com>

commit | commitdiff | tree

John Mulligan [Wed, 1 Apr 2026 22:22:54 +0000 (18:22 -0400)]

python-common/ceph/smb: add file with basic config classes

Add a file with the basic configuration classes and functions needed to
set up a connection with the remote-control grpc server for SMB.

Signed-off-by: John Mulligan <jmulligan@redhat.com>

commit | commitdiff | tree

John Mulligan [Sat, 4 Apr 2026 20:34:32 +0000 (16:34 -0400)]

python-common/ceph/smb: add a shim file for configuring grpc imports

Add a private shim file that can be used by ceph.smb.ctl to choose
the grpc implementation in order to work around issues present in some
grpc versions. These are unfortunately the versions packaged for EL
distros.

Signed-off-by: John Mulligan <jmulligan@redhat.com>

commit | commitdiff | tree

John Mulligan [Wed, 1 Apr 2026 22:22:51 +0000 (18:22 -0400)]

python-common/ceph/smb: add typing shim file

Avoid having to repeat this in a bunch of our new ceph.smb.ctl files.

Signed-off-by: John Mulligan <jmulligan@redhat.com>

commit | commitdiff | tree

John Mulligan [Thu, 2 Apr 2026 20:52:19 +0000 (16:52 -0400)]

python-common: add smb module to black formatting in tox.ini

Signed-off-by: John Mulligan <jmulligan@redhat.com>

commit | commitdiff | tree

John Mulligan [Fri, 3 Apr 2026 15:37:30 +0000 (11:37 -0400)]

python-common: work around some wonky grpc type checking behavior

Enable mypy type checking for grpc related packages that will be
consumed within the ceph.smb.ctl package. The google[.protobuf] package
causes problems and gets an exception instead of a working stubs
dependency - see the extensive comment in the file for more details.

Signed-off-by: John Mulligan <jmulligan@redhat.com>

commit | commitdiff | tree

John Mulligan [Wed, 1 Apr 2026 22:22:48 +0000 (18:22 -0400)]

python-common/ceph/smb: create a new ctl submodule

This new submodule of smb will be used for interacting with the (gRPC
based) remote control api provided by smb services.

Signed-off-by: John Mulligan <jmulligan@redhat.com>

commit | commitdiff | tree

John Mulligan [Fri, 3 Apr 2026 15:23:23 +0000 (11:23 -0400)]

python-common: copy mgr isort tox config into this tox.ini

For consistency, add a tox env for running isort in python-common.
This basically copies the pybind/mgr tox.ini config.
As in pybind/mgr, this is 'opt in', so initially only the smb
module is listed here.

Signed-off-by: John Mulligan <jmulligan@redhat.com>

commit | commitdiff | tree

John Mulligan [Fri, 3 Apr 2026 15:40:38 +0000 (11:40 -0400)]

python-common/ceph/smb: fix formatting in preparation for isort

Make check-isort happier.

Signed-off-by: John Mulligan <jmulligan@redhat.com>

commit | commitdiff | tree

Gil Bregman [Wed, 27 May 2026 18:22:28 +0000 (21:22 +0300)]

Merge pull request #69126 from gbregman/main

mgr/dashboard: Add EC pools support to NVMEoF CLI

commit | commitdiff | tree

Casey Bodley [Tue, 26 May 2026 16:42:04 +0000 (12:42 -0400)]

doc/rgw: document s3control apis for PublicAccessBlock

Signed-off-by: Casey Bodley <cbodley@redhat.com>

commit | commitdiff | tree

Casey Bodley [Wed, 27 May 2026 17:23:56 +0000 (13:23 -0400)]

Merge pull request #68972 from bluikko/wip-common-options-rgw-separator-note

common/options: improve rgw_dns_name and clarify separator

Reviewed-by: Anthony D Atri <anthony.datri@gmail.com>
Reviewed-by: Casey Bodley <cbodley@redhat.com>

commit | commitdiff | tree

Matan Breizman [Wed, 27 May 2026 16:54:10 +0000 (19:54 +0300)]

Merge pull request #68964 from fultheim/fix-cleaner-gc-autotune

crimson/os/seastore: auto-tune cleaner gc segment pick under random-write

Reviewed-by: Matan Breizman <mbreizma@redhat.com>

commit | commitdiff | tree

Xavi Hernandez [Wed, 27 May 2026 16:09:10 +0000 (18:09 +0200)]

mgr/smb: fix incorrect referenced variable

An unassigned variable was used in a log message. Replace it by the
relevant one.

Fixes: https://tracker.ceph.com/issues/76947
Signed-off-by: Xavi Hernandez <xhernandez@gmail.com>

commit | commitdiff | tree

NitzanMordhai [Wed, 27 May 2026 15:44:48 +0000 (18:44 +0300)]

Merge pull request #69125 from NitzanMordhai/wip-nitzan-perf-count-high-cpu

mgr/ThreadMonitor: monitor interval running in seconds and not nanose…

Reviewed-by: Sridhar Seshasayee <sridhar.seshasayee@ibm.com>

commit | commitdiff | tree

Alex Ainscow [Thu, 5 Feb 2026 13:25:20 +0000 (13:25 +0000)]

mon: Functionality for enabling and upgrading ec_direct_reads

When a cluster upgrades to umbrella, we will enable direct reads for any pool which is using ec optimizations.
We also add k and m to the pg_pool_t structure to allow more efficient parsing of the k and m values of EC rather than string parsing of the profile.

Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
Signed-off-by: Jon Bailey <jonathan.bailey1@ibm.com>

commit | commitdiff | tree

Alex Ainscow [Thu, 5 Feb 2026 13:19:05 +0000 (13:19 +0000)]

mon: Add mechanism for user to add/clear pool flags.

Previously, every time we had a new experimental feature switched with a
pool flag, we needed to add a bunch of boilerplate. Given that end users
should not be using these features, adding all of this user-visible
behaviour is not desirable.

This adds a single mechanism to specify a flag set by number. These magic
numbers can be used during development and then either removed, or
promoted to user-friendly flags.
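
As an illustration only: if "by number" means a bit position in the pool's
flag word (an assumption, the message does not say), setting and clearing
reduce to two bit operations. `set_pool_flag` and `clear_pool_flag` are
hypothetical helper names, not the mon command interface.

```python
def set_pool_flag(flags: int, flag_number: int) -> int:
    """Return flags with bit `flag_number` set."""
    return flags | (1 << flag_number)


def clear_pool_flag(flags: int, flag_number: int) -> int:
    """Return flags with bit `flag_number` cleared."""
    return flags & ~(1 << flag_number)
```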

Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>

commit | commitdiff | tree

Laura Flores [Wed, 27 May 2026 15:26:43 +0000 (10:26 -0500)]

Merge pull request #68727 from aainscow/wip_75962

osd: Correct missing list on divergent merge of partial writes

Reviewed-by: Bill Scales <bill_scales@uk.ibm.com>

commit | commitdiff | tree

Matan Breizman [Wed, 27 May 2026 14:08:45 +0000 (17:08 +0300)]

Merge pull request #69122 from tchaikov/wip-crimson-silence-unused-warning

crimson/seastore: segment_manager: fix -Wunused warnings

Reviewed-by: Matan Breizman <mbreizma@redhat.com>

commit | commitdiff | tree

Casey Bodley [Wed, 27 May 2026 14:02:47 +0000 (10:02 -0400)]

doc/rgw: document support for bucket-level PublicAccessBlock

Signed-off-by: Casey Bodley <cbodley@redhat.com>

commit | commitdiff | tree

Matan Breizman [Wed, 27 May 2026 13:23:16 +0000 (16:23 +0300)]

Merge pull request #69019 from tchaikov/wip-crimson-wake-in-loaded

crimson/osd: wake pgs_creating waiters in PGMap::pg_loaded()

Reviewed-by: Matan Breizman <mbreizma@redhat.com>

commit | commitdiff | tree

Gil Bregman [Wed, 27 May 2026 13:14:36 +0000 (16:14 +0300)]

mgr/dashboard: Add EC pools support to NVMEoF CLI

Fixes: https://tracker.ceph.com/issues/76937
Signed-off-by: Gil Bregman <gbregman@il.ibm.com>

commit | commitdiff | tree

Casey Bodley [Wed, 27 May 2026 13:08:45 +0000 (09:08 -0400)]

Merge pull request #64293 from cbodley/wip-71265

rgw: add s3control apis for account-wide PublicAccessBlock

Reviewed-by: Adam Emerson <aemerson@redhat.com>

commit | commitdiff | tree

Patrick Donnelly [Wed, 27 May 2026 03:05:22 +0000 (23:05 -0400)]

qa/tasks/cbt: construct venv just for cbt

So we no longer need to install system-wide.

Avoids errors like on Ubuntu 24.04:

    2026-05-24T13:48:19.681 DEBUG:teuthology.orchestra.run.trial043:> python3 -m pip install -r /home/ubuntu/cephtest/cbt/requirements.txt
    2026-05-24T13:48:19.861 INFO:teuthology.orchestra.run.trial043.stderr:error: externally-managed-environment
    2026-05-24T13:48:19.861 INFO:teuthology.orchestra.run.trial043.stderr:
    2026-05-24T13:48:19.861 INFO:teuthology.orchestra.run.trial043.stderr:× This environment is externally managed
    2026-05-24T13:48:19.861 INFO:teuthology.orchestra.run.trial043.stderr:╰─> To install Python packages system-wide, try apt install
    2026-05-24T13:48:19.862 INFO:teuthology.orchestra.run.trial043.stderr:    python3-xyz, where xyz is the package you are trying to
    2026-05-24T13:48:19.862 INFO:teuthology.orchestra.run.trial043.stderr:    install.
    2026-05-24T13:48:19.862 INFO:teuthology.orchestra.run.trial043.stderr:
    2026-05-24T13:48:19.862 INFO:teuthology.orchestra.run.trial043.stderr:    If you wish to install a non-Debian-packaged Python package,
    2026-05-24T13:48:19.862 INFO:teuthology.orchestra.run.trial043.stderr:    create a virtual environment using python3 -m venv path/to/venv.
    2026-05-24T13:48:19.862 INFO:teuthology.orchestra.run.trial043.stderr:    Then use path/to/venv/bin/python and path/to/venv/bin/pip. Make
    2026-05-24T13:48:19.862 INFO:teuthology.orchestra.run.trial043.stderr:    sure you have python3-full installed.
    2026-05-24T13:48:19.862 INFO:teuthology.orchestra.run.trial043.stderr:
    2026-05-24T13:48:19.862 INFO:teuthology.orchestra.run.trial043.stderr:    If you wish to install a non-Debian packaged Python application,
    2026-05-24T13:48:19.862 INFO:teuthology.orchestra.run.trial043.stderr:    it may be easiest to use pipx install xyz, which will manage a
    2026-05-24T13:48:19.862 INFO:teuthology.orchestra.run.trial043.stderr:    virtual environment for you. Make sure you have pipx installed.
    2026-05-24T13:48:19.863 INFO:teuthology.orchestra.run.trial043.stderr:
    2026-05-24T13:48:19.863 INFO:teuthology.orchestra.run.trial043.stderr:    See /usr/share/doc/python3.12/README.venv for more information.
    2026-05-24T13:48:19.863 INFO:teuthology.orchestra.run.trial043.stderr:
    2026-05-24T13:48:19.863 INFO:teuthology.orchestra.run.trial043.stderr:note: If you believe this is a mistake, please contact your Python installation or OS distribution provider. You can override this, at the risk of breaking your Python installation or OS, by passing --break-system-packages.
    2026-05-24T13:48:19.863 INFO:teuthology.orchestra.run.trial043.stderr:hint: See PEP 668 for the detailed specification.
    2026-05-24T13:48:19.883 DEBUG:teuthology.orchestra.run:got remote process result: 1
    2026-05-24T13:48:19.883 ERROR:teuthology.run_tasks:Saw exception from tasks.
    Traceback (most recent call last):
      File "/home/teuthworker/src/git.ceph.com_teuthology_3686f8793d626abcf5a0018da0a50786e41fed9d/teuthology/run_tasks.py", line 112, in run_tasks
        manager.__enter__()
      File "/home/teuthworker/src/git.ceph.com_teuthology_3686f8793d626abcf5a0018da0a50786e41fed9d/teuthology/task/__init__.py", line 122, in __enter__
        self.setup()
      File "/home/teuthworker/src/github.com_ceph_ceph-c_1bc3c25246d3a6fbc360dc78d9b4b51200743391/qa/tasks/cbt.py", line 173, in setup
        self.install_dependencies()
      File "/home/teuthworker/src/github.com_ceph_ceph-c_1bc3c25246d3a6fbc360dc78d9b4b51200743391/qa/tasks/cbt.py", line 112, in install_dependencies
        self.first_mon.run(args=pip_install_cmd)
      File "/home/teuthworker/src/git.ceph.com_teuthology_3686f8793d626abcf5a0018da0a50786e41fed9d/teuthology/orchestra/remote.py", line 596, in run
        r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/home/teuthworker/src/git.ceph.com_teuthology_3686f8793d626abcf5a0018da0a50786e41fed9d/teuthology/orchestra/run.py", line 461, in run
        r.wait()
      File "/home/teuthworker/src/git.ceph.com_teuthology_3686f8793d626abcf5a0018da0a50786e41fed9d/teuthology/orchestra/run.py", line 161, in wait
        self._raise_for_status()
      File "/home/teuthworker/src/git.ceph.com_teuthology_3686f8793d626abcf5a0018da0a50786e41fed9d/teuthology/orchestra/run.py", line 181, in _raise_for_status
        raise CommandFailedError(
    teuthology.exceptions.CommandFailedError: Command failed on trial043 with status 1: 'python3 -m pip install -r /home/ubuntu/cephtest/cbt/requirements.txt'
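
A sketch of the approach the pip error message itself recommends: create a
venv, then use its own pip. `venv_commands` is a hypothetical helper and
the venv path in the usage note is made up; the actual task code may be
structured differently.

```python
import posixpath
from typing import List


def venv_commands(venv_dir: str, requirements: str) -> List[List[str]]:
    """Argument lists to create a venv and install requirements into it."""
    return [
        # Create the isolated environment.
        ["python3", "-m", "venv", venv_dir],
        # Install with the venv's pip, so nothing goes system-wide.
        [posixpath.join(venv_dir, "bin", "pip"), "install", "-r", requirements],
    ]
```

For example, `venv_commands("/home/ubuntu/cephtest/cbt-venv",
"/home/ubuntu/cephtest/cbt/requirements.txt")` yields two argument lists
that could be run in order on the remote host.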

Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>

commit | commitdiff | tree

Patrick Donnelly [Wed, 27 May 2026 02:21:12 +0000 (22:21 -0400)]

qa/distros: use consistent naming

Put the release name in the yaml name so it's easy to read from the job
description. "ubuntu_latest" means different things depending on the
Ceph release.

Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>

commit | commitdiff | tree

Adam King [Tue, 16 Sep 2025 16:07:36 +0000 (12:07 -0400)]

qa/tasks/nvme_loop: fix nvme loop task for ubuntu noble

Compared to older distros, this one complains if
you include `-q hostnqn` in the nvme connect command,
saying "Failed to write to /dev/nvme-fabrics: Invalid argument".
Removing that argument gets past that error and
doesn't seem to have any downsides.

Signed-off-by: Adam King <adking@redhat.com>

commit | commitdiff | tree

Casey Bodley [Thu, 12 Jun 2025 20:40:49 +0000 (16:40 -0400)]

qa/distros: add ubuntu_24.04 as supported container host

Signed-off-by: Casey Bodley <cbodley@redhat.com>
CEPH-BUILD-JOB: ceph-dev-pipeline
DISTROS: noble jammy centos9
ARCHS: x86_64
FLAVORS: default

commit | commitdiff | tree

Casey Bodley [Thu, 12 Jun 2025 20:32:37 +0000 (16:32 -0400)]

qa/distros: bump ubuntu_latest.yaml to 24.04

and add ubuntu_22.04.yaml back to distros/supported and
distros/supported-random-distro$

Signed-off-by: Casey Bodley <cbodley@redhat.com>

commit | commitdiff | tree

Casey Bodley [Thu, 12 Jun 2025 19:57:12 +0000 (15:57 -0400)]

qa/distros: add all/ubuntu_24.04.yaml

Signed-off-by: Casey Bodley <cbodley@redhat.com>

commit | commitdiff | tree

Patrick Donnelly [Fri, 22 May 2026 22:53:33 +0000 (18:53 -0400)]

qa/suites/rados/encoder: use random supported distro

Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>

commit | commitdiff | tree

Casey Bodley [Thu, 12 Jun 2025 20:08:57 +0000 (16:08 -0400)]

qa/ceph-ansible: symlink supported-random-distro$

Signed-off-by: Casey Bodley <cbodley@redhat.com>

commit | commitdiff | tree

Casey Bodley [Thu, 12 Jun 2025 20:06:43 +0000 (16:06 -0400)]

qa/fs/fscrypt: symlink supported-random-distro$

Signed-off-by: Casey Bodley <cbodley@redhat.com>

commit | commitdiff | tree

Casey Bodley [Thu, 12 Jun 2025 20:06:03 +0000 (16:06 -0400)]

qa/cephmetrics: symlink supported-random-distro$

Signed-off-by: Casey Bodley <cbodley@redhat.com>

commit | commitdiff | tree

Nitzan Mordechai [Wed, 27 May 2026 11:48:14 +0000 (11:48 +0000)]

mgr/ThreadMonitor: monitor interval running in seconds and not nanoseconds

The ctor accidentally used mgr_module_monitor_interval as nanoseconds;
we need to use it as seconds.
Also, prevent a high-CPU loop in case read_process_statm fails during the
while loop.
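
A small Python sketch of the second point, under the assumption that the
busy loop came from skipping the sleep when a read failed. `monitor_loop`
and its parameters are hypothetical; the real code is C++ in the mgr.

```python
import time
from typing import Callable, List, Optional


def monitor_loop(read_statm: Callable[[], Optional[int]],
                 interval_seconds: float,
                 iterations: int,
                 sleep: Callable[[float], None] = time.sleep) -> List[int]:
    """Sample `read_statm` `iterations` times, sleeping between samples."""
    samples: List[int] = []
    for _ in range(iterations):
        value = read_statm()
        if value is not None:
            samples.append(value)
        # Sleep on every iteration, including failed reads, so a
        # persistent failure cannot spin the CPU. The interval is in seconds.
        sleep(interval_seconds)
    return samples
```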

Fixes: https://tracker.ceph.com/issues/76938
Signed-off-by: Nitzan Mordechai <nmordec@ibm.com>

commit | commitdiff | tree

Adam Kupczyk [Wed, 27 May 2026 11:49:33 +0000 (13:49 +0200)]

options: Add desc and flags to "rocksdb_cache_shard_bits"

This is needed for the docs to properly reference the option.

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>

commit | commitdiff | tree

Adam Kupczyk [Mon, 4 May 2026 18:47:12 +0000 (20:47 +0200)]

doc/rados/bluestore: RocksDB cache shards, perf counters

Document the role of RocksDB cache shards.
Explain how to monitor cache status and performance.
Explain how to detect cache flickering.

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>

commit | commitdiff | tree

Kefu Chai [Wed, 27 May 2026 08:52:31 +0000 (16:52 +0800)]

crimson/seastore: segment_manager: fix -Wunused warnings

7f739adae2 dropped the last log call from get_segment_manager(), after
which `LOG_PREFIX(SegmentManager::get_segment_manager)` and
`SET_SUBSYS(seastore_device)` had no remaining users under `HAVE_ZNS`,
generating:

```
src/crimson/os/seastore/segment_manager.cc:38:3: warning: unused variable 'FNAME' [-Wunused-variable]
   38 |   LOG_PREFIX(SegmentManager::get_segment_manager);
      |   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/crimson/common/log.h:49:38: note: expanded from macro 'LOG_PREFIX'
   49 | #define LOG_PREFIX(x) constexpr auto FNAME = #x
      |                                      ^~~~~
src/crimson/os/seastore/segment_manager.cc:10:1: warning: unused variable 'SOURCE_SUBSYS' [-Wunused-const-variable]
   10 | SET_SUBSYS(seastore_device);
      | ^~~~~~~~~~~~~~~~~~~~~~~~~~~
src/crimson/common/log.h:46:52: note: expanded from macro 'SET_SUBSYS'
   46 | #define SET_SUBSYS(subname_) static constexpr auto SOURCE_SUBSYS = ceph_subsys_##subname_
      |                                                    ^~~~~~~~~~~~~
2 warnings generated.
```

drop both to silence them.

Signed-off-by: Kefu Chai <k.chai@proxmox.com>

commit | commitdiff | tree

Vallari Agrawal [Wed, 27 May 2026 08:12:12 +0000 (13:42 +0530)]

qa/suites/nvmeof: set beacon grace and connect panic

For teuthology clusters, set:
'''
ceph config set mon mon_nvmeofgw_beacon_grace 10
ceph config set mon nvmeof_mon_client_connect_panic 60
'''
Reason explained in https://tracker.ceph.com/issues/74660#note-12

Fixes: https://tracker.ceph.com/issues/74660
Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com>

commit | commitdiff | tree

Kefu Chai [Tue, 26 May 2026 14:01:41 +0000 (22:01 +0800)]

crimson/seastore: make RecordSubmitter::wait_available() idempotent

Under sustained 4K randwrite workloads that roll journal segments
frequently, crimson-osd hits
```
    crimson/os/seastore/journal/record_submitter.cc:198:
    FAILED ceph_assert(!is_available())
```
and, in release builds without assertions, a downstream
`boost::throw_exception<std::length_error>` from
`seastar::shared_promise::get_shared_future()` called on a
disengaged `std::optional` in the same code path.

`RecordSubmitter::roll_segment()` arms wait_available_promise on entry,
then chains `journal_allocator.roll().safe_then(...)` whose continuation
sets the promise's value and resets the optional. The background
continuation can resolve before the subsequent `wait_available()` call
is entered -- the optional gets reset, `is_available()` becomes true
again, and `wait_available()`'s `assert(!is_available())` fires. The
brittle invariant being assumed

> .safe_then's continuation will not run before its outer call returns

is not part of seastar's contract.

Honour the documented contract instead.  record_submitter.h
says:

> wait for available if cannot submit, should check
> is_available() again when the future is resolved.

The postcondition is "available when resolved"; the precondition
"unavailable when called" was incidental.  Make `wait_available()`
idempotent: if `is_available()` is already true on entry, return a
ready future immediately. All three external callers
- `RecordSubmitter::roll_segment`
- `CircularBoundedJournal::submit_record`
- `SegmentedOolWriter::do_write`

re-check `is_available()` on the next iteration or in the chained
continuation and dispatch correctly.
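
The idempotent behaviour can be illustrated with asyncio in place of
seastar. `RecordSubmitterSketch`, `begin_roll` and `finish_roll` are
hypothetical names; only the shape of `wait_available()` follows the
description above.

```python
import asyncio
from typing import Optional


class RecordSubmitterSketch:
    def __init__(self) -> None:
        # Engaged only while a roll is in flight.
        self._wait_available: Optional[asyncio.Future] = None

    def is_available(self) -> bool:
        return self._wait_available is None

    def begin_roll(self) -> None:
        self._wait_available = asyncio.get_running_loop().create_future()

    def finish_roll(self) -> None:
        # Resolve waiters and reset the optional, like the roll continuation.
        fut, self._wait_available = self._wait_available, None
        fut.set_result(None)

    async def wait_available(self) -> None:
        if self.is_available():
            # The roll already finished: return immediately instead of
            # asserting on a precondition that was only incidental.
            return
        await self._wait_available
```

Callers still re-check `is_available()` after the await, as the header
comment asks.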

Validated by running a 96-job fio randwrite bench to confirm
the fix in operation; pre-patch the assert fires within ~30 min
and kills the OSD.

Signed-off-by: Kefu Chai <k.chai@proxmox.com>

commit | commitdiff | tree

Guillaume Abrioux [Wed, 27 May 2026 07:41:13 +0000 (09:41 +0200)]

ceph-volume: detect rotational media under dm-crypt for workqueue bypass

bypass_workqueue() was inspecting the top-level block device
(e.g. /dev/mapper/*) when deciding whether to disable read/write
workqueues for nvme devices. It must look at the real disk under
dmcrypt/lvm, not the mapper. On osd block paths the top device
often lies about rotational, so --perf-no_workqueue was wrong.

The idea of this fix is to walk sysfs 'slaves/' to the leaf, then
check rotational there (udev + rota).
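
A simplified sketch of the sysfs walk, not the ceph-volume implementation.
It assumes every device in the stack appears under one `sys_block`
directory with `slaves/` and `queue/rotational` entries, ignores
partitions and the udev side, and uses hypothetical helper names.

```python
from pathlib import Path
from typing import List

SYS_BLOCK = Path("/sys/block")


def leaf_devices(name: str, sys_block: Path = SYS_BLOCK) -> List[str]:
    """Follow slaves/ recursively and return the devices that have none."""
    slaves_dir = sys_block / name / "slaves"
    slaves = sorted(p.name for p in slaves_dir.iterdir()) if slaves_dir.is_dir() else []
    if not slaves:
        return [name]  # no slaves: this is a leaf (the real disk)
    leaves: List[str] = []
    for slave in slaves:
        leaves.extend(leaf_devices(slave, sys_block))
    return leaves


def is_rotational(name: str, sys_block: Path = SYS_BLOCK) -> bool:
    """True if any leaf under `name` reports rotational == 1."""
    return any(
        (sys_block / leaf / "queue" / "rotational").read_text().strip() == "1"
        for leaf in leaf_devices(name, sys_block)
    )
```

Passing `sys_block` as a parameter lets the walk be exercised against a
fake directory tree.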

Fixes: https://tracker.ceph.com/issues/76805
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>

commit | commitdiff | tree

Nizamudeen A [Wed, 27 May 2026 05:52:17 +0000 (11:22 +0530)]

Merge pull request #69116 from rhcs-dashboard/fix-cephadm-e2e-quoting

mgr/dashboard: fix nested shell quoting in cephadm e2e start-cluster

Reviewed-by: Nizamudeen A <nia@redhat.com>

commit | commitdiff | tree

Venky Shankar [Tue, 26 May 2026 12:09:53 +0000 (17:39 +0530)]

qa/cephfs: install ceph-mgr-modules-standard for cephfs tests

Now that the ceph-mgr plugins are being separated into essential and
non-essential packages (always-on vs. optional), cephfs qa suite
requires the optional packages for ceph-mgr plugins which are not
always-on, but are being tested with fs suite. The good thing is, we
can install _all_ optional plugins using ceph-mgr-modules-standard
package instead of installing cherry-picked packages.

Signed-off-by: Venky Shankar <vshankar@redhat.com>

commit | commitdiff | tree

Xuehan Xu [Thu, 21 May 2026 12:35:47 +0000 (20:35 +0800)]

crimson/os/seastore: force rewrite transactions to conflict with others
if they involve insertions on the lba tree

The issue was introduced in bd0ce704f24afbe11830d31b46d8b22771f54456.

Signed-off-by: Xuehan Xu <xuxuehan@qianxin.com>
