git-server-git.apps.pok.os.sepia.ceph.com Git - ceph.git/log
ceph.git
5 days ago crimson/os/seastore: add safety clamp to adaptive hard_limit and crash_floor 69083/head
Shai Fultheim [Fri, 29 May 2026 09:26:39 +0000 (12:26 +0300)]
crimson/os/seastore: add safety clamp to adaptive hard_limit and crash_floor

Signed-off-by: Shai Fultheim <shai.fultheim@gmail.com>
5 days ago crimson/os/seastore: adaptive cleaner gc_max from observed user-burst peak
Shai Fultheim [Tue, 19 May 2026 22:53:21 +0000 (01:53 +0300)]
crimson/os/seastore: adaptive cleaner gc_max from observed user-burst peak

The previous commit adapts `hard_limit` to track the cleaner's observed
open-segment peak, removing the hard-coded `.10` floor and cutting WAF
~43%. With hard_limit adaptive, the remaining WAF lever is `gc_max` —
the threshold that gates when the cleaner runs in non-emergency mode
and therefore the cluster's steady-state operating fill. Lower gc_max
= higher fill = more dead bytes per reclaim cycle = fewer live bytes
copied = lower GC component of WAF.

The hard-coded default of `0.15` (cleaner triggers at 85% segment
fill) is over-provisioned for the typical cluster. On the bench
workload the empirically optimal `gc_max` is about 0.08, so running at
the default 0.15 leaves ~7% of cluster space unused and pays ~1.5x
WAF for the privilege.

This commit makes gc_max adaptive: it decays each window from its
initial static value toward an observation-derived floor

  target_floor = hard_limit + (peak_projected_used / total)

The floor is the smallest gap the cluster needs to absorb its observed
worst-case in-flight user reservation. `peak_projected_used` is tracked
across the cluster's lifetime with a slow exponential decay applied
each adjust cycle.
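
As a minimal sketch of the floor formula above (a hypothetical free
function for illustration, not the SegmentCleaner code; the parameter
names are assumptions):

```cpp
#include <cassert>

// Hypothetical helper, named after the formula in the text:
//   target_floor = hard_limit + (peak_projected_used / total)
// All three inputs are supplied by the caller; nothing here is read
// from the real cleaner state.
double target_floor(double hard_limit,
                    double peak_projected_used_bytes,
                    double total_bytes) {
  assert(total_bytes > 0.0);
  return hard_limit + peak_projected_used_bytes / total_bytes;
}
```

With an illustrative hard_limit of 0.02 and a peak reservation equal
to a quarter of the device, the floor comes out at 0.27.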

Decay rate
==========

The decay multiplier is `0.995` per 30 s elapsed window. The decay is
applied lazily: each call to `maybe_adjust_thresholds()` raises 0.995
to the actual elapsed seconds / 30. This way the decay catches up
correctly even if the background process was idle and the hook went
uncalled for many cycles. A naive per-call multiplication would freeze
the decay during idle phases (the issue observed in v1 testing where
peak stayed at its high-water mark across a 45-minute idle window).
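
The lazy, elapsed-time-based decay described above can be sketched
roughly as follows. `DecayedPeak` and its members are hypothetical
names for illustration; only the 0.995 multiplier, the 30 s window and
the `std::pow` form come from this commit message.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Constants quoted in the commit message.
constexpr double kDecayPerWindow = 0.995;
constexpr double kWindowSec = 30.0;

// Hypothetical stand-in for the peak tracked on the cleaner.
struct DecayedPeak {
  double value = 0.0;

  // Record an observation; observing never lowers the peak.
  void observe(double sample) {
    value = std::max(value, sample);
  }

  // Decay by the time that actually elapsed, so one call after a long
  // idle phase is equivalent to many back-to-back 30 s calls.
  void decay(double elapsed_sec) {
    assert(elapsed_sec >= 0.0);
    value *= std::pow(kDecayPerWindow, elapsed_sec / kWindowSec);
  }
};
```

A single `decay(2700.0)` after a 45-minute idle phase lands on the
same value (up to floating-point rounding) as ninety `decay(30.0)`
calls, which is the property a per-call multiplication lacks.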

Decay timeline (fraction of the original value remaining; this holds
for any call interval of maybe_adjust_thresholds(), since the decay is
elapsed-time-based):

  - half-life: log(0.5) / log(0.995) ≈ 138 windows ≈ 69 min (roughly
    1 hour)
  - peak retention timeline:
       5 min  → 95 %
      30 min  → 74 %
       1 hour → 55 %
       4 hours →  9 %
      12 hours →  0.07 %
      24 hours → effectively 0

So a single observed peak influences gc_max strongly for ~1 hour,
noticeably for ~4 hours, and is essentially forgotten within a day.
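
The half-life and retention figures follow directly from the 0.995
per-30 s multiplier. A small sketch for re-deriving them (function
names are hypothetical, chosen only for this illustration):

```cpp
#include <cassert>
#include <cmath>

// Fraction of an observed peak still remaining after `minutes`,
// assuming a 0.995 multiplier per 30 s of elapsed time.
double retention_after_minutes(double minutes) {
  assert(minutes >= 0.0);
  const double windows = minutes * 60.0 / 30.0;
  return std::pow(0.995, windows);
}

// Number of 30 s windows after which half of the peak is left.
double half_life_windows() {
  return std::log(0.5) / std::log(0.995);
}
```

Multiplying `half_life_windows()` by 30 s gives the half-life in
seconds; `retention_after_minutes()` can be evaluated at any point of
interest to check how much of a peak survives.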

This is sized to be much longer than transient bench phases (peaks
remain >92% of true value within a 16 min bench, never roll out
prematurely) yet much shorter than workload-shift timescales (a
workload that genuinely eases sees gc_max shrink within hours).

Re-discovery
============

The decay lets gc_max eventually re-discover lower floors when a
workload genuinely eases, while preserving observed peaks long enough
that transient bursts inside a steady workload don't roll out
prematurely.

gc_max is bounded below by the floor at all times, so the workload's
observed needs are always satisfied without static tuning. Each
window, gc_max moves halfway toward the floor (`gc_max = max(floor,
(gc_max + floor) / 2)`). This is binary-search-style convergence: the
distance to the floor halves per window. When the floor rises (workload
reveals a new peak), gc_max jumps up to meet it immediately. When the
floor falls (peaks have decayed below current gc_max), the gap between
gc_max and the lower floor halves each window over the next several
windows.
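
The per-window update can be written as a one-line step function. This
is a sketch of the quoted expression only; `next_gc_max` is a
hypothetical name, not a function in the tree.

```cpp
#include <algorithm>
#include <cassert>

// One adjustment window: move gc_max halfway toward the floor, but
// never below it. If the floor is above gc_max, the midpoint is below
// the floor and max() snaps gc_max up to the floor at once.
double next_gc_max(double gc_max, double floor) {
  assert(gc_max >= 0.0 && floor >= 0.0);
  return std::max(floor, (gc_max + floor) / 2.0);
}
```

Starting from the bootstrap 0.15 with an illustrative floor of 0.05,
the first step yields 0.10 and the remaining gap keeps halving; after
ten windows the gap is under 0.0002.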

Bootstrap safety: gc_max retains the existing static initial value
(0.15), so a freshly mounted cleaner runs at the same operating point
as today's code until observations have accumulated. This avoids the
"cluster crashes before adaptive sees a workload" failure mode that
naive `gc_max = hard_limit + observed` produces.

Implementation
==============

A single double member on SegmentCleaner: `peak_projected_used_decayed`
is updated to `max(current, projected_used_bytes)` on each
`try_reserve_projected_usage()` call. `maybe_adjust_thresholds()`
applies `std::pow(0.995, elapsed_sec / 30.0)` decay on each invocation
(every ≥30 s in steady state, longer if the cleaner was idle). The
floor uses this value directly.

Bench measurements (qa/standalone/crimson randwrite, 1 MiB writes,
32 GiB per-OSD null_blk, 70% fill, 1280 GiB write target):

  Configuration                          | WAF     | Duration | Status
  ---------------------------------------|---------|----------|---------
  Static defaults (gc_max=.15, hard=.10) |   5.749 |   33 min | clean
  Manual tuned (gc_max=.08, hard=.02)    |   2.926 |   16 min | clean
  Adaptive hard_limit only               |   3.276 |   17 min | clean
  Adaptive hard_limit + gc_max (HEAD)    |   2.829 |   17 min | clean

Adaptive gc_max reduces WAF a further 14% vs hard_limit-only (3.276 ->
2.829) and slightly beats the hand-tuned manual point (2.926). The
per-OSD adaptation captures workload asymmetry that uniform static
defaults can't: on the bench's PG-imbalanced setup the lightly-loaded
osd.0 settled at gc_max=0.026 (much tighter than the manual 0.08)
while osd.1 took the full traffic and settled at gc_max=0.084. Both
extract maximum efficiency for their actual load instead of running
at worst-case-conservative values.

A separate decay-validation run (45-minute idle interlude between two
heavy phases) confirmed that the lazy decay catches up correctly even
when the background process was dormant during the idle phase.

No new workload-tuned constants are introduced. The literal numbers
in this commit are:
  - the 30 s window from the previous commit (time scale of the
    feedback loop)
  - the binary-search halving rate (control geometry, not workload-
    specific; could be 1/3 or 1/4 with similar convergence)
  - the 0.995 decay rate (per-window multiplier; gives the ~1-hour
    half-life and ~24-hour full-forget behaviour described above;
    recompile-only)

The existing `get_default()` value of `0.15` is left untouched as the
bootstrap initial — operators who disable adaptive control (future
config knob) revert to today's exact behaviour.

Signed-off-by: Shai Fultheim <shai.fultheim@gmail.com>
5 days ago crimson/os/seastore: adaptive cleaner hard_limit from observed open-segment peak
Shai Fultheim [Tue, 19 May 2026 10:55:02 +0000 (13:55 +0300)]
crimson/os/seastore: adaptive cleaner hard_limit from observed open-segment peak

The cleaner's `available_ratio_hard_limit` controls when user IO blocks
(once projected_aratio < hard_limit). Setting it too high causes
unnecessary blocks during transient pressure; setting it too low risks
running out of free segments for the cleaner's own working set, which
aborts the OSD with "seastore device size setting is too small".

The current default of `0.10` was chosen empirically and does not scale
with cluster geometry. On a 32 GiB cluster with default 64 MiB segments,
`0.10` reserves ~3 GiB of always-empty space. The cleaner's actual
named-writer working set is 1 journal + `seastore_hot_tier_generations`
hot writers + `seastore_cold_tier_generations` cold writers + 1
metadata writer = (hot + cold + 2) segments. For the typical defaults
(5 hot, 3 cold) that is 10 segments = 640 MiB on a 32 GiB OSD = 2.0%.
Reserving 10% leaves ~80% of that "headroom" sitting unused, which
causes the cluster to operate at lower fill, accumulate fewer dead
bytes per segment, and pay 4-5x WAF on garbage collection cycles.

This commit makes hard_limit adaptive: track the peak open-segment
count observed during each 30 s window, then derive

  hard_limit = max(observed_peak, named_writers) + 1
             ────────────────────────────────────────
                       (segments_in_cluster)

where the "+ 1" segment is the minimum safety unit (one more open
segment than ever observed). The `named_writers` count is the
architectural floor below which the cleaner cannot allocate; staying
above it prevents the abort. `observed_peak` floats to track the
actual transient overhead introduced by segment transitions in the
running workload.
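
A minimal sketch of the derivation above, assuming all three inputs
are plain segment counts (`adaptive_hard_limit` is a hypothetical
name; the real computation lives in the SegmentCleaner hook):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// hard_limit = (max(observed_peak, named_writers) + 1)
//              / segments_in_cluster
double adaptive_hard_limit(std::uint64_t observed_peak,
                           std::uint64_t named_writers,
                           std::uint64_t segments_in_cluster) {
  assert(segments_in_cluster > 0);
  // "+ 1" is the one-segment safety unit on top of the larger of the
  // observed peak and the architectural floor.
  const std::uint64_t reserve =
      std::max(observed_peak, named_writers) + 1;
  return static_cast<double>(reserve) /
         static_cast<double>(segments_in_cluster);
}
```

With the figures quoted in this message (observed peak 7, 10 named
writers, 32 GiB / 64 MiB = 512 segments) this evaluates to 11 / 512,
which rounds to 0.0215.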

Implementation
==============

`AsyncCleaner::maybe_adjust_thresholds()` is added as a virtual no-op
hook; `SegmentCleaner` overrides it. The hook is invoked once per
`BackgroundProcess::run()` iteration. Each call samples the current
open-segment count into the rolling window peak. Every 30 s, the
window's peak is consumed to recompute hard_limit, and the window
resets.
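
The sample-then-consume pattern described above might look roughly
like this. `WindowPeak`, its members and the explicit `now_sec`
argument are assumptions made for a self-contained illustration; the
actual hook reads its own clock and cleaner state.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

constexpr double kWindowSec = 30.0;  // recompute interval from the text

// Hypothetical rolling-window peak tracker.
struct WindowPeak {
  std::uint64_t peak = 0;
  double window_start_sec = 0.0;

  // Called once per background iteration. Folds the current
  // open-segment count into the window peak. When at least 30 s have
  // passed, writes the window's peak to `out_peak`, resets the window
  // and returns true; otherwise returns false.
  bool sample(std::uint64_t open_segments, double now_sec,
              std::uint64_t& out_peak) {
    assert(now_sec >= window_start_sec);
    peak = std::max(peak, open_segments);
    if (now_sec - window_start_sec < kWindowSec) {
      return false;
    }
    out_peak = peak;
    peak = 0;
    window_start_sec = now_sec;
    return true;
  }
};
```

The caller would feed `out_peak` into the hard_limit recomputation
whenever `sample()` returns true.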

`config_t config` loses its `const` qualifier; the only mutation is
this hook, which is the single writer in the cleaner's shard.

This commit only adapts `hard_limit`. `gc_max` remains at its existing
default (0.15). A follow-up commit will add adaptive `gc_max` driven
by observed user-burst and cleaner-cycle peaks; that is where the
remaining WAF reduction lives.

Bench measurements
==================

qa/standalone/crimson randwrite at 70% fill, 1 MiB writes, 32 GiB
per-OSD null_blk backing, 1280 GiB write target. Comparison against
the same workload with static `hard_limit = 0.10`:

  Metric                | static (0.15, 0.10) | adaptive hard_limit |
  ----------------------|---------------------|---------------------|
  user_written          |          1,374 GiB  |          1,374 GiB  |
  device_written        |          7,901 GiB  |          4,503 GiB  |
  WAF (d / u)           |              5.749  |              3.276  |
  completion            |              100 %  |              100 %  |
  bench duration        |             33 min  |             17 min  |
  fio exit              |             rc = 0  |             rc = 0  |
  observed peak open    |                  -  |       7 (each OSD)  |
  computed hard_limit   |                  -  |             0.0215  |

WAF drops 43 % and end-to-end throughput nearly doubles. The mechanism
is that fewer projected_aratio dips cross the (much lower) block
threshold, so the cluster spends less time in the block-recover-block
cycle that bloats device_written without progressing user_written.

No new workload-tuned constants are introduced. The two literal
numbers in the algorithm are the 30 s recompute interval (time scale
of the feedback loop, not workload-specific) and the `+ 1 segment`
safety unit (the smallest possible buffer in units the cleaner can
allocate).

Signed-off-by: Shai Fultheim <shai.fultheim@gmail.com>
6 days ago Merge pull request #69110 from ronen-fr/wip-rf-hours
Ronen Friedman [Fri, 29 May 2026 04:00:00 +0000 (07:00 +0300)]
Merge pull request #69110 from ronen-fr/wip-rf-hours

osd/scrub: 'repairing' scrubs allowed at all times

Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
6 days ago Merge PR #68936 into main
Patrick Donnelly [Thu, 28 May 2026 23:48:14 +0000 (19:48 -0400)]
Merge PR #68936 into main

* refs/pull/68936/head:
osd: Fix bug when calculating min_peer_features

Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>
Reviewed-by: Alex Ainscow <aainscow@uk.ibm.com>
6 days ago Merge pull request #69073 from ShwetaBhosale1/fix_nfs_version_issue
David Galloway [Thu, 28 May 2026 21:25:06 +0000 (17:25 -0400)]
Merge pull request #69073 from ShwetaBhosale1/fix_nfs_version_issue

Use GANESHA_REPO_BASEURL for NFS-Ganesha on all distros

6 days ago Merge PR #69055 into main
Patrick Donnelly [Thu, 28 May 2026 17:57:54 +0000 (13:57 -0400)]
Merge PR #69055 into main

* refs/pull/69055/head:
qa/suites/upgrade: ignore osd in unknown state

Reviewed-by: Laura Flores <lflores@redhat.com>
6 days ago Merge pull request #69128 from xhernandez/fix-unbound-var
John Mulligan [Thu, 28 May 2026 17:53:15 +0000 (13:53 -0400)]
Merge pull request #69128 from xhernandez/fix-unbound-var

mgr/smb: fix incorrect referenced variable

Reviewed-by: Avan Thakkar <athakkar@redhat.com>
Reviewed-by: Shwetha Acharya <sacharya@redhat.com>
Reviewed-by: John Mulligan <jmulligan@redhat.com>
6 days ago osd/scrub: removed a misleading comment about 'overdue' scrubs 69110/head
Ronen Friedman [Thu, 28 May 2026 14:55:41 +0000 (14:55 +0000)]
osd/scrub: removed a misleading comment about 'overdue' scrubs

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
7 days ago Merge pull request #69095 from xxhdx1985126/wip-seastore-onode-cpu-overhead-output
Xuehan Xu [Thu, 28 May 2026 02:03:09 +0000 (10:03 +0800)]
Merge pull request #69095 from xxhdx1985126/wip-seastore-onode-cpu-overhead-output

crimson/os/seastore/onode_manager: avoid debug related info from occupying too much cpu time

Reviewed-by: Matan Breizman <mbreizma@redhat.com>
7 days ago Merge pull request #67246 from mheler/wip-rgw-sse-gcm
Casey Bodley [Wed, 27 May 2026 21:22:10 +0000 (17:22 -0400)]
Merge pull request #67246 from mheler/wip-rgw-sse-gcm

rgw: Add AES-256-GCM support for RGW server-side encryption

Reviewed-by: Casey Bodley <cbodley@redhat.com>
7 days ago Merge pull request #68851 from stzuraski898/wip-sz-76513-unittest-analysis-script
John Mulligan [Wed, 27 May 2026 19:11:47 +0000 (15:11 -0400)]
Merge pull request #68851 from stzuraski898/wip-sz-76513-unittest-analysis-script

scripts: add Jenkins unit test analysis tool

Reviewed-by: John Mulligan <jmulligan@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>
7 days ago Merge pull request #69111 from cbodley/wip-doc-71265
Casey Bodley [Wed, 27 May 2026 18:35:40 +0000 (14:35 -0400)]
Merge pull request #69111 from cbodley/wip-doc-71265

doc/rgw: document s3control apis for PublicAccessBlock

Reviewed-by: Anthony D Atri <anthony.datri@gmail.com>
7 days ago Merge pull request #69126 from gbregman/main
Gil Bregman [Wed, 27 May 2026 18:22:28 +0000 (21:22 +0300)]
Merge pull request #69126 from gbregman/main

mgr/dashboard: Add EC pools support to NVMEoF CLI

7 days ago doc/rgw: document s3control apis for PublicAccessBlock 69111/head
Casey Bodley [Tue, 26 May 2026 16:42:04 +0000 (12:42 -0400)]
doc/rgw: document s3control apis for PublicAccessBlock

Signed-off-by: Casey Bodley <cbodley@redhat.com>
7 days ago Merge pull request #68972 from bluikko/wip-common-options-rgw-separator-note
Casey Bodley [Wed, 27 May 2026 17:23:56 +0000 (13:23 -0400)]
Merge pull request #68972 from bluikko/wip-common-options-rgw-separator-note

common/options: improve rgw_dns_name and clarify separator

Reviewed-by: Anthony D Atri <anthony.datri@gmail.com>
Reviewed-by: Casey Bodley <cbodley@redhat.com>
7 days ago Merge pull request #68964 from fultheim/fix-cleaner-gc-autotune
Matan Breizman [Wed, 27 May 2026 16:54:10 +0000 (19:54 +0300)]
Merge pull request #68964 from fultheim/fix-cleaner-gc-autotune

  crimson/os/seastore: auto-tune cleaner gc segment pick under random-write

Reviewed-by: Matan Breizman <mbreizma@redhat.com>
7 days ago mgr/smb: fix incorrect referenced variable 69128/head
Xavi Hernandez [Wed, 27 May 2026 16:09:10 +0000 (18:09 +0200)]
mgr/smb: fix incorrect referenced variable

An unassigned variable was used in a log message. Replace it with the
relevant one.

Fixes: https://tracker.ceph.com/issues/76947
Signed-off-by: Xavi Hernandez <xhernandez@gmail.com>
7 days ago Merge pull request #69125 from NitzanMordhai/wip-nitzan-perf-count-high-cpu
NitzanMordhai [Wed, 27 May 2026 15:44:48 +0000 (18:44 +0300)]
Merge pull request #69125 from NitzanMordhai/wip-nitzan-perf-count-high-cpu

mgr/ThreadMonitor: monitor interval running in seconds and not nanose…

Reviewed-by: Sridhar Seshasayee <sridhar.seshasayee@ibm.com>
7 days ago Merge pull request #68727 from aainscow/wip_75962
Laura Flores [Wed, 27 May 2026 15:26:43 +0000 (10:26 -0500)]
Merge pull request #68727 from aainscow/wip_75962

osd: Correct missing list on divergent merge of partial writes

Reviewed-by: Bill Scales <bill_scales@uk.ibm.com>
7 days ago Merge pull request #69122 from tchaikov/wip-crimson-silence-unused-warning
Matan Breizman [Wed, 27 May 2026 14:08:45 +0000 (17:08 +0300)]
Merge pull request #69122 from tchaikov/wip-crimson-silence-unused-warning

crimson/seastore: segment_manager: fix -Wunused warnings

Reviewed-by: Matan Breizman <mbreizma@redhat.com>
7 days ago doc/rgw: document support for bucket-level PublicAccessBlock
Casey Bodley [Wed, 27 May 2026 14:02:47 +0000 (10:02 -0400)]
doc/rgw: document support for bucket-level PublicAccessBlock

Signed-off-by: Casey Bodley <cbodley@redhat.com>
7 days ago Merge pull request #69019 from tchaikov/wip-crimson-wake-in-loaded
Matan Breizman [Wed, 27 May 2026 13:23:16 +0000 (16:23 +0300)]
Merge pull request #69019 from tchaikov/wip-crimson-wake-in-loaded

crimson/osd: wake pgs_creating waiters in PGMap::pg_loaded()

Reviewed-by: Matan Breizman <mbreizma@redhat.com>
7 days ago mgr/dashboard: Add EC pools support to NVMEoF CLI 69126/head
Gil Bregman [Wed, 27 May 2026 13:14:36 +0000 (16:14 +0300)]
mgr/dashboard: Add EC pools support to NVMEoF CLI
Fixes: https://tracker.ceph.com/issues/76937
Signed-off-by: Gil Bregman <gbregman@il.ibm.com>
7 days ago Merge pull request #64293 from cbodley/wip-71265
Casey Bodley [Wed, 27 May 2026 13:08:45 +0000 (09:08 -0400)]
Merge pull request #64293 from cbodley/wip-71265

rgw: add s3control apis for account-wide PublicAccessBlock

Reviewed-by: Adam Emerson <aemerson@redhat.com>
7 days ago mgr/ThreadMonitor: monitor interval running in seconds and not nanoseconds 69125/head
Nitzan Mordechai [Wed, 27 May 2026 11:48:14 +0000 (11:48 +0000)]
mgr/ThreadMonitor: monitor interval running in seconds and not nanoseconds

The ctor accidentally used mgr_module_monitor_interval as nanoseconds;
it needs to be used as seconds.
Also, prevent a high-CPU loop in case read_process_statm fails during
the while loop.

Fixes: https://tracker.ceph.com/issues/76938
Signed-off-by: Nitzan Mordechai <nmordec@ibm.com>
7 days ago crimson/seastore: segment_manager: fix -Wunused warnings 69122/head
Kefu Chai [Wed, 27 May 2026 08:52:31 +0000 (16:52 +0800)]
crimson/seastore: segment_manager: fix -Wunused warnings

7f739adae2 dropped the last log call from get_segment_manager(), after
which `LOG_PREFIX(SegmentManager::get_segment_manager)` and
`SET_SUBSYS(seastore_device)` had no remaining users under `HAVE_ZNS`,
generating:

```
src/crimson/os/seastore/segment_manager.cc:38:3: warning: unused variable 'FNAME' [-Wunused-variable]
   38 |   LOG_PREFIX(SegmentManager::get_segment_manager);
      |   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/crimson/common/log.h:49:38: note: expanded from macro 'LOG_PREFIX'
   49 | #define LOG_PREFIX(x) constexpr auto FNAME = #x
      |                                      ^~~~~
src/crimson/os/seastore/segment_manager.cc:10:1: warning: unused variable 'SOURCE_SUBSYS' [-Wunused-const-variable]
   10 | SET_SUBSYS(seastore_device);
      | ^~~~~~~~~~~~~~~~~~~~~~~~~~~
src/crimson/common/log.h:46:52: note: expanded from macro 'SET_SUBSYS'
   46 | #define SET_SUBSYS(subname_) static constexpr auto SOURCE_SUBSYS = ceph_subsys_##subname_
      |                                                    ^~~~~~~~~~~~~
2 warnings generated.
```

drop both to silence them.

Signed-off-by: Kefu Chai <k.chai@proxmox.com>
7 days ago Merge pull request #69116 from rhcs-dashboard/fix-cephadm-e2e-quoting
Nizamudeen A [Wed, 27 May 2026 05:52:17 +0000 (11:22 +0530)]
Merge pull request #69116 from rhcs-dashboard/fix-cephadm-e2e-quoting

mgr/dashboard: fix nested shell quoting in cephadm e2e start-cluster

Reviewed-by: Nizamudeen A <nia@redhat.com>
8 days ago mgr/dashboard: fix nested shell quoting in cephadm e2e start-cluster 69116/head
Afreen Misbah [Wed, 27 May 2026 00:07:38 +0000 (05:37 +0530)]
mgr/dashboard: fix nested shell quoting in cephadm e2e start-cluster

with_libvirt wraps commands in sg libvirt -c "$1", adding an extra
shell layer. Nested double quotes inside the outer double-quoted
string caused the argument to be split — with_libvirt received a
truncated $1, producing "Unterminated quoted string" on the remote
shell.

Drop the unnecessary inner double quotes around cephadm shell
arguments since cephadm shell accepts the command as separate args.
Use single quotes for the grep pattern inside the double-quoted
string so it survives the sg subshell.

Signed-off-by: Afreen Misbah <afreen@ibm.com>
8 days ago Merge pull request #69068 from tchaikov/wip-bump-arrow-submodule
Kefu Chai [Wed, 27 May 2026 00:05:25 +0000 (08:05 +0800)]
Merge pull request #69068 from tchaikov/wip-bump-arrow-submodule

rgw: bump Apache Arrow submodule from 17.0.0 to 19.0.1

Reviewed-by: Justin Caratzas <jcaratza@ibm.com>
Reviewed-by: Casey Bodley <cbodley@redhat.com>
8 days ago Merge pull request #67551 from Ericmzhang/wip-improve-pg-autoscale
Kamoltat (Junior) Sirivadhna [Tue, 26 May 2026 22:04:43 +0000 (18:04 -0400)]
Merge pull request #67551 from Ericmzhang/wip-improve-pg-autoscale

mgr: Fix autoscaling PG distribution
Reviewed-by: Kamoltat (Junior) Sirivadhna <ksirivad@redhat.com>
8 days ago Merge pull request #67337 from badone/wip-tracker-74919-ceph-dump-log-new-global...
Brad Hubbard [Tue, 26 May 2026 21:59:37 +0000 (07:59 +1000)]
Merge pull request #67337 from badone/wip-tracker-74919-ceph-dump-log-new-global-access

scripts: ceph_dump_log.py change global context access

Reviewed-by: Aishwarya Mathuria <amathuri@redhat.com>
8 days ago Merge pull request #67857 from yaelazulay-redhat/issues_74393_dashboard_fail_to_acces...
Afreen Misbah [Tue, 26 May 2026 21:35:55 +0000 (03:05 +0530)]
Merge pull request #67857 from yaelazulay-redhat/issues_74393_dashboard_fail_to_access_object_when_rgw_use_cephadm_certificate

Issues 74393 dashboard fail to access object when rgw use cephadm certificate

Reviewed-by: Afreen Misbah <afreen@ibm.com>
Reviewed-by: Redouane Kachach <rkachach@ibm.com>
8 days ago scripts: add Jenkins unit test analysis tool 68851/head
stzuraski898 [Mon, 11 May 2026 20:10:07 +0000 (20:10 +0000)]
scripts: add Jenkins unit test analysis tool

Add analyze_unittest_jenkins.py to aggregate and analyze unit test
results across multiple Ceph PRs by mining Jenkins CI build logs.

The script fetches recent open PRs from GitHub, extracts Jenkins build
URLs from PR checks, downloads console logs in parallel, and parses
CTest output to generate comprehensive failure reports.

This enables data-driven decisions about test infrastructure
improvements and helps identify systemic issues in the test suite.

Fixes: https://tracker.ceph.com/issues/76513
Assisted-by: IBM Bob (Claude 3.7 Sonnet)
Signed-off-by: Steven Zuraski <steven.zuraski@ibm.com>
8 days ago Merge pull request #68874 from BBoozmen/wip-oozmen-76563
Adam Emerson [Tue, 26 May 2026 18:49:10 +0000 (14:49 -0400)]
Merge pull request #68874 from BBoozmen/wip-oozmen-76563

neorados/cls/log: fix infinite trim loop on empty data log shards

Reviewed-by: Adam C. Emerson <aemerson@redhat.com>
8 days ago Merge pull request #67079 from MattyWilliams22/ec-sync-reads
Radoslaw Zarzynski [Tue, 26 May 2026 16:31:12 +0000 (18:31 +0200)]
Merge pull request #67079 from MattyWilliams22/ec-sync-reads

osd: Support for Synchronous Reads in EC

Reviewed-by: Alex Ainscow <aainscow@uk.ibm.com>
Reviewed-by: Bill Scales <bill_scales@uk.ibm.com>
Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
8 days ago rgw/s3control: skip account id check for admin users 64293/head
Casey Bodley [Tue, 26 May 2026 16:03:48 +0000 (12:03 -0400)]
rgw/s3control: skip account id check for admin users

Allow access to admin users that don't belong to the requested account.
This is also necessary for multisite, where requests are forwarded to
the metadata master as the multisite system user instead of the original
requester.

Signed-off-by: Casey Bodley <cbodley@redhat.com>
8 days ago osd/scrub: 'repairing' scrubs allowed at all times
Ronen Friedman [Tue, 26 May 2026 15:29:32 +0000 (15:29 +0000)]
osd/scrub: 'repairing' scrubs allowed at all times

Fix ScrubJob::observes_allowed_hours() to not block 'repairing'
scrubs outside of the allowed hours. This allows repair scrubs
to run at any time or day-of-week.
The fixed behaviour matches the documented requirements.

Fixes: https://tracker.ceph.com/issues/76811
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
8 days ago Merge pull request #67950 from rhcs-dashboard/add-telemetry-status
Pedro Gonzalez Gomez [Tue, 26 May 2026 10:16:32 +0000 (12:16 +0200)]
Merge pull request #67950 from rhcs-dashboard/add-telemetry-status

mgr/dashboard: add telemetry status to overview-health-card

Reviewed-by: Abhishek Desai <abhishek.desai1@ibm.com>
8 days ago crimson/os/seastore/onode_manager: avoid debug related info from 69095/head
Xuehan Xu [Tue, 26 May 2026 09:55:00 +0000 (17:55 +0800)]
crimson/os/seastore/onode_manager: avoid debug related info from
occupying too much cpu time

According to the perf tests, constructing the info accounts for about
5% of the total system CPU overhead.

Signed-off-by: Xuehan Xu <xuxuehan@qianxin.com>
8 days ago Merge pull request #68258 from tchaikov/wip-with-system-jerasure
Kefu Chai [Tue, 26 May 2026 09:53:02 +0000 (17:53 +0800)]
Merge pull request #68258 from tchaikov/wip-with-system-jerasure

cmake: support building with system jerasure and gf-complete

Reviewed-by: Casey Bodley <cbodley@redhat.com>
8 days ago Merge pull request #61131 from NitzanMordhai/wip-nitzan-mgr-modules-perf-counts
NitzanMordhai [Tue, 26 May 2026 09:31:45 +0000 (12:31 +0300)]
Merge pull request #61131 from NitzanMordhai/wip-nitzan-mgr-modules-perf-counts

mgr: Add per-module performance counters to mgr

Reviewed-by: Sridhar Seshasayee <sridhar.seshasayee@ibm.com>
8 days ago Merge pull request #68858 from rsacherer/wip-fix-limit-break-existing-devices
Guillaume Abrioux [Tue, 26 May 2026 09:19:03 +0000 (11:19 +0200)]
Merge pull request #68858 from rsacherer/wip-fix-limit-break-existing-devices

ceph-volume: fix re-deployment of OSD issues with disk selection filters and DB Devices

8 days ago Merge pull request #67935 from rhcs-dashboard/add-csv
Pedro Gonzalez Gomez [Tue, 26 May 2026 09:06:14 +0000 (11:06 +0200)]
Merge pull request #67935 from rhcs-dashboard/add-csv

mgr/dashboard: Add Hosts via CSV Upload

Reviewed-by: Devika Babrekar <devika.babrekar@ibm.com>
Reviewed-by: Puja Shahu <pshahu@redhat.com>
Reviewed-by: Pedro Gonzalez Gomez <pegonzal@ibm.com>
8 days ago Merge pull request #68894 from guits/cv-dm-mgmt
Guillaume Abrioux [Tue, 26 May 2026 09:04:53 +0000 (11:04 +0200)]
Merge pull request #68894 from guits/cv-dm-mgmt

ceph-volume: OSD mapper lifecycle (LVM + raw) for activate

8 days ago Merge pull request #69064 from tchaikov/wip-crimson-scrub-blocked
Matan Breizman [Tue, 26 May 2026 08:02:54 +0000 (11:02 +0300)]
Merge pull request #69064 from tchaikov/wip-crimson-scrub-blocked

crimson/scrub: fix assert in PGScrubber::release_range() on interval change

Reviewed-by: Matan Breizman <mbreizma@redhat.com>
8 days ago Merge pull request #69020 from tchaikov/wip-level-triggered-unblock
Matan Breizman [Tue, 26 May 2026 08:01:14 +0000 (11:01 +0300)]
Merge pull request #69020 from tchaikov/wip-level-triggered-unblock

crimson/osd: only unblock wait_for_active_blocker on replica when ACTIVE

Reviewed-by: Matan Breizman <mbreizma@redhat.com>
8 days ago Merge pull request #69018 from tchaikov/wip-large-object-size
Matan Breizman [Tue, 26 May 2026 08:00:37 +0000 (11:00 +0300)]
Merge pull request #69018 from tchaikov/wip-large-object-size

crimson/seastore: reject oversized writes and zeros instead of aborting

Reviewed-by: Matan Breizman <mbreizma@redhat.com>
8 days ago Merge pull request #59476 from zhscn/wip-new-128
Xuehan Xu [Tue, 26 May 2026 05:40:09 +0000 (13:40 +0800)]
Merge pull request #59476 from zhscn/wip-new-128

crimson/os/seastore: introduce static layout of laddr_t

Reviewed-by: Samuel Just <sjust@redhat.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Yingxin Cheng <yingxin.cheng@intel.com>
Reviewed-by: Matan Breizman <mbreizma@redhat.com>
Reviewed-by: Anthony D Atri <anthony.datri@gmail.com>
8 days ago Merge pull request #68887 from ShreeJejurikar/wip-bucket-logging-requester-assumed...
Yuval Lifshitz [Tue, 26 May 2026 04:58:58 +0000 (07:58 +0300)]
Merge pull request #68887 from ShreeJejurikar/wip-bucket-logging-requester-assumed-role

rgw/logging: use assumed-role ARN as Requester for STS requests

9 days ago Merge pull request #69067 from xxhdx1985126/wip-seastore-lba-wrong-asserts
Xuehan Xu [Tue, 26 May 2026 02:54:07 +0000 (10:54 +0800)]
Merge pull request #69067 from xxhdx1985126/wip-seastore-lba-wrong-asserts

crimson/os/seastore/lba: fix wrong asserts and "if" conditions

Reviewed-by: Matan Breizman <mbreizma@redhat.com>
9 days ago Merge pull request #69082 from ronen-fr/wip-rf-trimlmt-rst
Ronen Friedman [Mon, 25 May 2026 18:27:42 +0000 (21:27 +0300)]
Merge pull request #69082 from ronen-fr/wip-rf-trimlmt-rst

doc/PendingReleaseNotes: document osd_scrub_queued_snaptrims_limit

Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>
9 days ago doc/PendingReleaseNotes: document osd_scrub_queued_snaptrims_limit 69082/head
Ronen Friedman [Mon, 25 May 2026 13:13:03 +0000 (13:13 +0000)]
doc/PendingReleaseNotes: document osd_scrub_queued_snaptrims_limit

osd_scrub_queued_snaptrims_limit, introduced in PR#68737,
blocks the initiation of non-urgent scrubs on OSDs that
are overloaded with snap-trim operations.

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
9 days ago crimson/os/seastore: auto-tune cleaner gc segment pick under random-write 68964/head
Shai Fultheim [Sun, 17 May 2026 09:40:44 +0000 (12:40 +0300)]
crimson/os/seastore: auto-tune cleaner gc segment pick under random-write

SegmentCleaner uses one of three configurable gc formulas to select
the next segment to reclaim: GREEDY (lowest util wins), COST_BENEFIT
((1-u)*age/(2u)), or BENEFIT (an age-weighted quadratic in util).
COST_BENEFIT is the default and the right choice for journaling /
LIFO workloads, where old segments accumulate more dead bytes than
young ones — age predicts deadness, so an old high-util segment is
worth reclaiming because its util will keep rising as long as we wait.

That assumption breaks under random-write at high cluster fill. Dead
bytes spread uniformly across segments regardless of age, so age stops
predicting future deadness, and (1-u)/(2u) becomes the only term that
distinguishes candidates. With every segment in the 0.7-0.94 util
band, (1-u)/(2u) ranges from 0.214 to 0.032, a roughly 7x spread that
a 7x age difference can easily cancel out. Result: a 0.94-util old
segment scores higher than a 0.68-util young one, even though
reclaiming the 0.68 segment would free 5x more space (32% of a
64 MB segment vs 6%).
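
To make the mis-ranking concrete, here is a sketch of the COST_BENEFIT
expression exactly as quoted above. The function name and the unitless
`age` argument are assumptions for illustration; the in-tree formula
may normalise age differently.

```cpp
#include <cassert>

// COST_BENEFIT score as written in the text: (1 - u) * age / (2u).
// `util` is the live fraction of the segment, `age` an arbitrary
// non-negative age measure (hypothetical units).
double cost_benefit_score(double util, double age) {
  assert(util > 0.0 && util <= 1.0 && age >= 0.0);
  return (1.0 - util) * age / (2.0 * util);
}
```

With equal ages the 0.68-util segment scores higher, as expected. Give
the 0.94-util segment an age 8x that of the 0.68-util one and it
scores higher instead, even though it frees far less space.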

Observed in qa/standalone/crimson randwrite at ~70% full: with the
unmodified formula, cleaner picks settled on 0.92-0.94 util segments
freeing ~4 MB net each; net free rate collapsed to single-digit KB/s
even though the cleaner was running cycles at ~30 µs each. fio's
stall watchdog killed the bench after 535 GB user written (target
1280 GB). Switching gc_formula = greedy by hand let the bench
complete the target.

This patch detects the mis-selection at runtime and overrides the
formula's pick with the greedy choice only when the difference is
significant. In get_next_reclaim_segment() we already iterate all
closed reclaimable segments to find the formula's max-score candidate;
in the same pass we now also track the lowest-util candidate (what
GREEDY would have picked). After the loop, if greedy's free-fraction
(1 - greedy_util) is at least seastore_segment_cleaner_gc_autotune_ratio
times the formula's pick's free-fraction (default 2.0), we swap to
greedy. Since all segments share the same size, comparing free-
fractions is equivalent to comparing freed bytes; the fraction form
avoids an unnecessary multiplication.
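
A minimal sketch of the override test described above. The name and
parameter order mirror the helper mentioned later in this message, but
the body is an assumption written from the prose, and the near-zero
`picked_free` guard documented in seastore.rst is not reproduced here.

```cpp
#include <cassert>

// Returns true when the greedy candidate would free at least `ratio`
// times as much as the formula's pick. Inputs are free-fractions,
// i.e. (1 - util), in [0, 1]. Written as a multiplication so that a
// picked_free of zero cannot cause a division by zero.
bool should_override_to_greedy(double picked_free,
                               double greedy_free,
                               double ratio) {
  assert(ratio >= 1.0);
  assert(picked_free >= 0.0 && greedy_free >= 0.0);
  return greedy_free >= ratio * picked_free;
}
```

With the default ratio of 2.0, a formula pick at 0.94 util (6% free)
is overridden by a greedy candidate at 0.68 util (32% free), while a
pick at 0.80 util (20% free) is kept.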

The full design rationale (regime-by-regime behaviour, safety guard
against picked_free near zero, score-recompute on override, threshold
calibration) lives in doc/dev/crimson/seastore.rst under the new
"Cleaner GC autotune" section. The code references it from short
inline comments.

Configurable knobs:

  * seastore_segment_cleaner_gc_autotune (bool, default true) —
    operators can disable the override entirely to honor the
    configured formula unconditionally. Ignored when gc_formula =
    greedy.

  * seastore_segment_cleaner_gc_autotune_ratio (float, default 2.0,
    min 1.0) — operators can tune the override threshold. Higher is
    more conservative (preserves age weighting more aggressively);
    lower is more aggressive (behaviour converges toward pure greedy).

The override predicate is factored into a static helper
`SegmentCleaner::should_override_to_greedy(picked_free, greedy_free,
ratio)` so the call site stays readable and the predicate is
independently testable.

With this change the qa/standalone/crimson randwrite bench at 70%
fill completes the target run rather than stalling at the 500-600 GB
mark, with the override firing reliably under high uniform
alive_ratio and not firing under low or non-uniform alive_ratio.
Override behaviour can be observed with debug_seastore_cleaner=20.

Signed-off-by: Shai Fultheim <shai.fultheim@gmail.com>
9 days agoqa/rgw/bucket-logging: configure STS for assume-role test 68887/head
ShreeJejurikar [Wed, 20 May 2026 07:18:03 +0000 (12:48 +0530)]
qa/rgw/bucket-logging: configure STS for assume-role test

Set rgw sts key and enable rgw s3 auth use sts, both needed by
test_bucket_logging_requester_assumed_role. Mirrors the existing
settings in qa/suites/rgw/verify/overrides.yaml.

Signed-off-by: ShreeJejurikar <shreemj8@gmail.com>
9 days agoMerge pull request #69006 from tchaikov/wip-seastore-clamp-block-size-on-small-lba
Matan Breizman [Mon, 25 May 2026 10:39:18 +0000 (13:39 +0300)]
Merge pull request #69006 from tchaikov/wip-seastore-clamp-block-size-on-small-lba

crimson/seastore: clamp block_size to laddr_t::UNIT_SIZE on small-LBA devices

Reviewed-by: Matan Breizman <mbreizma@redhat.com>
9 days agoMerge pull request #68961 from fultheim/fix-cleaner-stall-projected-ratio
Matan Breizman [Mon, 25 May 2026 10:24:59 +0000 (13:24 +0300)]
Merge pull request #68961 from fultheim/fix-cleaner-stall-projected-ratio

crimson/os/seastore: fix cleaner stall under IO-block pressure

Reviewed-by: Matan Breizman <mbreizma@redhat.com>
9 days agoMerge pull request #68884 from tchaikov/wip-crimson-advance-osdmap
Matan Breizman [Mon, 25 May 2026 09:27:01 +0000 (12:27 +0300)]
Merge pull request #68884 from tchaikov/wip-crimson-advance-osdmap

crimson/osd: fix mark-down crash for removed OSDs

Reviewed-by: Matan Breizman <mbreizma@redhat.com>
9 days agoMerge pull request #68861 from tchaikov/wip-crimson-reset-logger
Matan Breizman [Mon, 25 May 2026 09:26:06 +0000 (12:26 +0300)]
Merge pull request #68861 from tchaikov/wip-crimson-reset-logger

crimson/osd: inline log file stream setup to fix dangling pointer

Reviewed-by: Matan Breizman <mbreizma@redhat.com>
9 days agoMerge pull request #69042 from Shubhaj1810/revert-67999
Redouane Kachach [Mon, 25 May 2026 09:14:57 +0000 (11:14 +0200)]
Merge pull request #69042 from Shubhaj1810/revert-67999

Revert "mgr/cephadm: align nodeid and add register_service for NFS Ganesha service visibility"

Reviewed-by: Shweta Bhosale <Shweta.Bhosale1@ibm.com>
9 days agocrimson/os/seastore: also update the mappings copied by client 59476/head
Xuehan Xu [Fri, 15 May 2026 09:10:04 +0000 (17:10 +0800)]
crimson/os/seastore: also update the mappings copied by client
transactions when committing background rewriting transactions

With the 128-bit laddr key layout in place, SeaStore::rename would
involve copying mappings. These mappings must also be updated when
the logical extents they point to are rewritten.

Signed-off-by: Xuehan Xu <xuxuehan@qianxin.com>
9 days agocrimson/os/seastore/omap_manager/log: better output
Xuehan Xu [Tue, 28 Apr 2026 07:00:24 +0000 (15:00 +0800)]
crimson/os/seastore/omap_manager/log: better output

Signed-off-by: Xuehan Xu <xuxuehan@qianxin.com>
9 days agodoc/dev/crimson/seastore_laddr.rst: add descriptions about temp
Xuehan Xu [Sun, 10 May 2026 07:36:22 +0000 (15:36 +0800)]
doc/dev/crimson/seastore_laddr.rst: add descriptions about temp
recovering objects

Signed-off-by: Xuehan Xu <xuxuehan@qianxin.com>
9 days agocrimson/osd: treat OI-not-existing cases as enoent
Xuehan Xu [Thu, 16 Apr 2026 05:47:18 +0000 (13:47 +0800)]
crimson/osd: treat OI-not-existing cases as enoent

This is consistent with classic OSDs.

Signed-off-by: Xuehan Xu <xuxuehan@qianxin.com>
9 days agocrimson/os/seastore/object_data_handler: new debug logs
Xuehan Xu [Tue, 14 Apr 2026 06:00:52 +0000 (14:00 +0800)]
crimson/os/seastore/object_data_handler: new debug logs

Signed-off-by: Xuehan Xu <xuxuehan@qianxin.com>
9 days agocrimson/osd: create temp recovering objects through touch_temp
Xuehan Xu [Wed, 22 Apr 2026 05:37:46 +0000 (13:37 +0800)]
crimson/osd: create temp recovering objects through touch_temp

Signed-off-by: Xuehan Xu <xuxuehan@qianxin.com>
9 days agocrimson/os/seastore: handle OP_TOUCH_TEMP
Xuehan Xu [Sun, 29 Mar 2026 03:20:52 +0000 (11:20 +0800)]
crimson/os/seastore: handle OP_TOUCH_TEMP

Signed-off-by: Xuehan Xu <xuxuehan@qianxin.com>
9 days agoos/Transaction: add the interface dedicated to touching temp objects
Xuehan Xu [Thu, 26 Mar 2026 08:08:41 +0000 (16:08 +0800)]
os/Transaction: add the interface dedicated to touching temp objects

Signed-off-by: Xuehan Xu <xuxuehan@qianxin.com>
9 days agocrimson/os/seastore/lba: fix possible namespace lookup error
Xuehan Xu [Tue, 3 Feb 2026 03:11:33 +0000 (11:11 +0800)]
crimson/os/seastore/lba: fix possible namespace lookup error

Signed-off-by: Xuehan Xu <xuxuehan@qianxin.com>
9 days agodev/doc/crimson: clarify dynamic PG and object bits for static laddr design
Zhang Song [Thu, 8 Jan 2026 04:14:20 +0000 (12:14 +0800)]
dev/doc/crimson: clarify dynamic PG and object bits for static laddr design

Signed-off-by: Zhang Song <zhangsong02@qianxin.com>
Signed-off-by: Xuehan Xu <xuxuehan@qianxin.com>
9 days agocrimson/os/seastore: adapt copy on write for static onode prefix
Zhang Song [Wed, 3 Sep 2025 07:54:40 +0000 (15:54 +0800)]
crimson/os/seastore: adapt copy on write for static onode prefix

Signed-off-by: Zhang Song <zhangsong02@qianxin.com>
Signed-off-by: Xuehan Xu <xuxuehan@qianxin.com>
9 days agocrimson/os/seastore: support rename for static layout of laddr
Zhang Song [Tue, 26 Aug 2025 03:42:49 +0000 (11:42 +0800)]
crimson/os/seastore: support rename for static layout of laddr

Signed-off-by: Zhang Song <zhangsong02@qianxin.com>
Signed-off-by: Xuehan Xu <xuxuehan@qianxin.com>
9 days agocrimson/os/seastore: add "move_mapping" to TransactionManager and LBAManager
Xuehan Xu [Tue, 26 Aug 2025 06:28:55 +0000 (14:28 +0800)]
crimson/os/seastore: add "move_mapping" to TransactionManager and LBAManager

Signed-off-by: Xuehan Xu <xuxuehan@qianxin.com>
9 days agocrimson/os/seastore/lba: set extent type for ZERO lba mappings
Xuehan Xu [Mon, 2 Feb 2026 07:42:49 +0000 (15:42 +0800)]
crimson/os/seastore/lba: set extent type for ZERO lba mappings

Signed-off-by: Zhang Song <zhangsong02@qianxin.com>
Signed-off-by: Xuehan Xu <xuxuehan@qianxin.com>
9 days agomgr/dashboard: Add Hosts via CSV Upload 67935/head
Sagar Gopale [Mon, 23 Mar 2026 06:08:44 +0000 (11:38 +0530)]
mgr/dashboard: Add Hosts via CSV Upload

Fixes: https://tracker.ceph.com/issues/75578
Signed-off-by: Sagar Gopale <sagar.gopale@ibm.com>
9 days agoMerge pull request #68667 from rhcs-dashboard/fix-76316-main
Redouane Kachach [Mon, 25 May 2026 08:32:23 +0000 (10:32 +0200)]
Merge pull request #68667 from rhcs-dashboard/fix-76316-main

mgr/dashboard: add remote write section to prometheus configuration

Reviewed-by: Redouane Kachach <rkachach@ibm.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>
9 days agoworkunits/mgr/test_mgr_modules_perf_counters: new test for enable\disable\perf counts 61131/head
Nitzan Mordechai [Tue, 17 Dec 2024 13:49:00 +0000 (13:49 +0000)]
workunits/mgr/test_mgr_modules_perf_counters: new test for enable\disable\perf counts

Simple test to enable/disable modules and get a counters dump
for checking perf counters.

Signed-off-by: Nitzan Mordechai <nmordech@redhat.com>
9 days agomgr: Add per-module performance counters to mgr
Nitzan Mordechai [Sun, 8 Dec 2024 18:08:39 +0000 (18:08 +0000)]
mgr: Add per-module performance counters to mgr

This commit introduces performance counters for individual Ceph mgr modules.
These counters allow monitoring module behavior, debugging latency issues,
and identifying performance bottlenecks, all without modifying the modules themselves.

The following counters are now exposed under:
  > ceph daemon mgr.<id> perf dump

Example structure:
"mgr_module_<module_name>": {
    "notify_avg_usec": {     <- Average time spent handling notify events
        "avgcount": 0,
        "sum": 0
    },
    "cmd_avg_usec": {        <- Average time spent processing CLI/admin commands
        "avgcount": 0,
        "sum": 0
    },
    "serve_avg_usec": {      <- Average time spent in module serve loop (if applicable)
        "avgcount": 0,
        "sum": 0
    },
    "alive": 1,              <- Module is alive (1 = running, 0 = exited)
    "cpu_usage": 0,          <- CPU usage in percent
    "mem_rss_change": 0,     <- Memory RSS change in bytes
    "mem_rss_current": 490737664 <- Memory RSS current in bytes
}
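
A dump with this structure could be post-processed as sketched below.
The sketch assumes each *_avg_usec counter reports an event count
(avgcount) and a running total in microseconds (sum), so the mean is
sum / avgcount; the module name "example", the sample values and the
helper names are hypothetical.

```python
import json


def avg_usec(counter: dict) -> float:
    """Mean for one averaged counter; 0.0 when nothing was recorded."""
    count = counter.get("avgcount", 0)
    return counter.get("sum", 0) / count if count else 0.0


def summarize(perf_dump: dict, prefix: str = "mgr_module_") -> dict:
    """Map module name -> {counter name: mean} for averaged counters."""
    out = {}
    for section, counters in perf_dump.items():
        if not section.startswith(prefix):
            continue
        out[section[len(prefix):]] = {
            name: avg_usec(value)
            for name, value in counters.items()
            if isinstance(value, dict) and "avgcount" in value
        }
    return out


# Hypothetical sample shaped like the structure above.
SAMPLE = json.loads("""
{
  "mgr_module_example": {
    "notify_avg_usec": {"avgcount": 4, "sum": 1000},
    "cmd_avg_usec": {"avgcount": 0, "sum": 0},
    "alive": 1
  }
}
""")

print(summarize(SAMPLE))
```

Plain gauges such as alive or cpu_usage are skipped because they carry
no avgcount/sum pair.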

Signed-off-by: Nitzan Mordechai <nmordech@ibm.com>
Conflicts:
  src/mgr/ActivePyModules.cc - finisher.queue changed by 63859, adding py_module to the parameter list
  src/mgr/PyModuleRegistry.cc - check_all_modules_started added by 63859

9 days agoUse GANESHA_REPO_BASEURL for NFS-Ganesha on all distros 69073/head
Shweta Bhosale [Thu, 14 May 2026 13:49:56 +0000 (19:19 +0530)]
Use GANESHA_REPO_BASEURL for NFS-Ganesha on all distros

Fixes: https://tracker.ceph.com/issues/76603
Signed-off-by: Shweta Bhosale <Shweta.Bhosale1@ibm.com>
10 days agoceph-volume: OSD mapper lifecycle (LVM + raw) for activate 68894/head
Guillaume Abrioux [Wed, 13 May 2026 12:57:03 +0000 (14:57 +0200)]
ceph-volume: OSD mapper lifecycle (LVM + raw) for activate

This adds small helpers so activate can consistently bring the OSD device
stack online (LVM lvchange, optional mapper open) and tear it down again,
with refresh in between. Same idea for the raw path. Crypto is handled
inside that flow when the OSD is encrypted.

Fixes: https://tracker.ceph.com/issues/76591
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
10 days agoMerge pull request #68771 from jrse/rgw-kafka-mtls-rebased
Yuval Lifshitz [Sun, 24 May 2026 19:29:38 +0000 (22:29 +0300)]
Merge pull request #68771 from jrse/rgw-kafka-mtls-rebased

rgw/kafka: add mTLS support (extends #61572)

10 days agorgw: bump Apache Arrow submodule from 17.0.0 to 19.0.1 69068/head
Kefu Chai [Sun, 24 May 2026 08:25:46 +0000 (16:25 +0800)]
rgw: bump Apache Arrow submodule from 17.0.0 to 19.0.1

When WITH_SYSTEM_ARROW is false, Ceph builds Arrow from the bundled
src/apache submodule. Our CI uses ubuntu:jammy as the base image, which
does not package libarrow-dev, so the bundled path is always taken there.

Arrow 17.0.0 vendors a copy of Thrift whose download URLs are no longer
reachable, breaking CI builds that try to fetch them at configure time.

Bump arrow submodule to 19.0.1, the latest Arrow release that:
- builds successfully on ubuntu:jammy, and
- requires only CMake 3.22 (the version shipped by ubuntu:jammy)

See also

CMake version shipped by ubuntu:jammy
- https://packages.ubuntu.com/jammy/cmake

arrow releases' CMake support
- maint-19.0.1: https://github.com/apache/arrow/blob/272715f6df2a042d69881ffa03d5078c58e4b345/cpp/CMakeLists.txt#L18
- maint-20.0.0: https://github.com/apache/arrow/blob/3ad0370a04ccdae638755b94c3c31c8760a11193/cpp/CMakeLists.txt#L18

arrow enabled mimalloc by default
- https://github.com/apache/arrow/commit/b907c5dadb516b525c8fafbf34b0116d44044733

Because arrow uses the bundled mimalloc library by default, we need
to disable it in the same commit that bumps the submodule.

Signed-off-by: Kefu Chai <k.chai@proxmox.com>
10 days agoMerge pull request #66150 from MaodiMa/AVX512_crc32c
Kefu Chai [Sun, 24 May 2026 09:55:45 +0000 (17:55 +0800)]
Merge pull request #66150 from MaodiMa/AVX512_crc32c

common: enable AVX512+VPCLMULQDQ for crc32c performance on x86

Reviewed-by: Kefu Chai <k.chai@proxmox.com>
10 days agocrimson/os/seastore/lba: fix wrong asserts and "if" conditions 69067/head
Xuehan Xu [Sat, 23 May 2026 09:23:02 +0000 (17:23 +0800)]
crimson/os/seastore/lba: fix wrong asserts and "if" conditions

Signed-off-by: Xuehan Xu <xuxuehan@qianxin.com>
11 days agocrimson/os/seastore/OMapManager: only store the relative block offset to omap root...
Zhang Song [Fri, 30 May 2025 09:45:39 +0000 (17:45 +0800)]
crimson/os/seastore/OMapManager: only store the relative block offset to omap root in OMapInnerNode

Signed-off-by: Zhang Song <zhangsong02@qianxin.com>
Signed-off-by: Xuehan Xu <xuxuehan@qianxin.com>
11 days agotest/crimson/seastore/test_btree_lba_manager: add test cases for conflict policy
Zhang Song [Tue, 27 May 2025 07:31:13 +0000 (15:31 +0800)]
test/crimson/seastore/test_btree_lba_manager: add test cases for conflict policy

Signed-off-by: Zhang Song <zhangsong02@qianxin.com>
Signed-off-by: Xuehan Xu <xuxuehan@qianxin.com>
11 days agocrimson/os/seastore/lba_manager: implement conflict policy
Zhang Song [Tue, 26 Aug 2025 03:38:49 +0000 (11:38 +0800)]
crimson/os/seastore/lba_manager: implement conflict policy

Signed-off-by: Zhang Song <zhangsong02@qianxin.com>
Signed-off-by: Xuehan Xu <xuxuehan@qianxin.com>
11 days agocrimson/os/seastore: reserve region in LBABtree when touching onode
Zhang Song [Wed, 11 Jun 2025 04:04:25 +0000 (12:04 +0800)]
crimson/os/seastore: reserve region in LBABtree when touching onode

Signed-off-by: Zhang Song <zhangsong02@qianxin.com>
Signed-off-by: Xuehan Xu <xuxuehan@qianxin.com>
11 days agocrimson/os/seastore/OnodeManager: adapt laddr_hint_t approach
Zhang Song [Wed, 11 Jun 2025 04:04:03 +0000 (12:04 +0800)]
crimson/os/seastore/OnodeManager: adapt laddr_hint_t approach

Signed-off-by: Zhang Song <zhangsong02@qianxin.com>
Signed-off-by: Xuehan Xu <xuxuehan@qianxin.com>
11 days agocrimson/os/seastore/OMapManager: adapt laddr_hint_t approach
Zhang Song [Mon, 26 May 2025 07:23:25 +0000 (15:23 +0800)]
crimson/os/seastore/OMapManager: adapt laddr_hint_t approach

Signed-off-by: Zhang Song <zhangsong02@qianxin.com>
Signed-off-by: Xuehan Xu <xuxuehan@qianxin.com>
11 days agocrimson/os/seastore: use laddr_hint_t to allocate the laddr
Zhang Song [Tue, 26 Aug 2025 03:36:07 +0000 (11:36 +0800)]
crimson/os/seastore: use laddr_hint_t to allocate the laddr

Signed-off-by: Zhang Song <zhangsong02@qianxin.com>
Signed-off-by: Xuehan Xu <xuxuehan@qianxin.com>
11 days agocrimson/os/seastore/Onode: get sibling's object id when creating new onode
Zhang Song [Wed, 11 Jun 2025 03:50:12 +0000 (11:50 +0800)]
crimson/os/seastore/Onode: get sibling's object id when creating new onode

Signed-off-by: Zhang Song <zhangsong02@qianxin.com>
Signed-off-by: Xuehan Xu <xuxuehan@qianxin.com>
11 days agocrimson/os/seastore/Onode: adapt new get hint approach
Zhang Song [Tue, 26 Aug 2025 03:34:37 +0000 (11:34 +0800)]
crimson/os/seastore/Onode: adapt new get hint approach

Signed-off-by: Zhang Song <zhangsong02@qianxin.com>
Signed-off-by: Xuehan Xu <xuxuehan@qianxin.com>
11 days agocrimson/os/seastore/Onode: support get object/clone prefix
Zhang Song [Thu, 22 May 2025 08:58:14 +0000 (16:58 +0800)]
crimson/os/seastore/Onode: support get object/clone prefix

Signed-off-by: Zhang Song <zhangsong02@qianxin.com>
Signed-off-by: Xuehan Xu <xuxuehan@qianxin.com>
11 days agocrimson/os/seastore/Onode: remove default metadata offset/range
Zhang Song [Tue, 26 Aug 2025 03:31:03 +0000 (11:31 +0800)]
crimson/os/seastore/Onode: remove default metadata offset/range

Signed-off-by: Zhang Song <zhangsong02@qianxin.com>
Signed-off-by: Xuehan Xu <xuxuehan@qianxin.com>
11 days agocrimson/os/seastore: introduce laddr_hint_t and associated factory methods
Zhang Song [Wed, 14 May 2025 08:34:00 +0000 (16:34 +0800)]
crimson/os/seastore: introduce laddr_hint_t and associated factory methods

Signed-off-by: Zhang Song <zhangsong02@qianxin.com>
Signed-off-by: Xuehan Xu <xuxuehan@qianxin.com>
11 days agocrimson/os/seastore: make pladdr_t only store the local clone id instead of full...
Zhang Song [Tue, 26 Aug 2025 02:35:55 +0000 (10:35 +0800)]
crimson/os/seastore: make pladdr_t only store the local clone id instead of full laddr_t

Signed-off-by: Zhang Song <zhangsong02@qianxin.com>
Signed-off-by: Xuehan Xu <xuxuehan@qianxin.com>
11 days agocrimson/os/seastore: introduce static layout of laddr_t
Zhang Song [Wed, 14 May 2025 08:26:26 +0000 (16:26 +0800)]
crimson/os/seastore: introduce static layout of laddr_t

Signed-off-by: Zhang Song <zhangsong02@qianxin.com>
Signed-off-by: Xuehan Xu <xuxuehan@qianxin.com>
11 days agocrimson/os/seastore: extend the size of laddr_t from 64 bits to 128 bits
Zhang Song [Wed, 14 May 2025 07:22:15 +0000 (15:22 +0800)]
crimson/os/seastore: extend the size of laddr_t from 64 bits to 128 bits

Signed-off-by: Zhang Song <zhangsong02@qianxin.com>
Signed-off-by: Xuehan Xu <xuxuehan@qianxin.com>
11 days agoMerge pull request #69045 from xxhdx1985126/wip-seastore-drop-retired-placeholder
Kefu Chai [Sat, 23 May 2026 13:56:32 +0000 (21:56 +0800)]
Merge pull request #69045 from xxhdx1985126/wip-seastore-drop-retired-placeholder

crimson/os/seastore: remove RetiredExtentPlaceholder

Reviewed-by: Kefu Chai <k.chai@proxmox.com>