git-server-git.apps.pok.os.sepia.ceph.com Git - ceph.git/log


commit | commitdiff | tree

Sridhar Seshasayee [Thu, 4 Jun 2026 06:58:35 +0000 (12:28 +0530)]

doc/rados/configuration: Remove wpq recommendation warning for EC clusters

Remove the warning that recommends using the wpq scheduler as a fallback for EC
clusters. This issue is addressed by treating EC recovery reads as
background operations, assigning an accurate cost to those reads, and tuning
the QoS parameters associated with the best-effort class of operations.

Signed-off-by: Sridhar Seshasayee <sridhar.seshasayee@ibm.com>

commit | commitdiff | tree

Sridhar Seshasayee [Mon, 25 May 2026 12:14:54 +0000 (17:44 +0530)]

mclock_common: adjust mClock profile parameters to prevent backfill starvation

Adjust the 'background_best_effort' queue parameters across the
three standard mClock profiles (high_client_ops, balanced, and
high_recovery_ops) to ensure best effort ops are not starved.

Previously, the 'background_best_effort' queue carried a default allocation
of 0% (MIN) reservation and a weight of 1 under these profiles. When
concurrent client traffic is dense, the zero reservation can, for example,
completely starve backfill sub-ops (MSG_OSD_EC_READ) on pools with
'allow_ec_optimizations' set to false. This starvation forces the primary OSD
to hold internal BlueStore transactions and PG object locks for extended
windows, causing severe inflation of the client median (50th percentile)
latency.

To prevent background starvation and resolve the effects of primary lock
retention, the profile configurations are tuned as listed below. These
changes force low-cost sub-ops to clear out of peer queues rapidly and drop
primary locks, which helps improve the client completion latency and tail
latency (95th, 99th and 99.5th percentiles).

1. high_client_ops profile:
   - Grant 'background_best_effort' a safe 5% minimum reservation.
   - Scale the queue weight to 4.

2. balanced profile:
   - Grant 'background_best_effort' a 5% minimum reservation.
   - Set the queue weight to 2.

3. high_recovery_ops profile:
   - Grant 'background_best_effort' a 5% minimum reservation.
   - Set the queue weight to 2.

4. Modify the mClock config reference documentation to reflect the tuning
   changes to the best-effort QoS parameters across the profiles.

Note on Proportional Scaling Compatibility:
Configuring these changes shifts total reservations to 105% (e.g., 50%
client + 50% recovery + 5% best-effort under the balanced profile). Under
heavy concurrent saturation, mClock's internal controls resolve this
gracefully via proportional down-scaling, preserving the underlying
device bandwidth limits for the different classes of clients. For example,
instead of the client being allocated 50% of the bandwidth, it receives a
slightly lower reservation, and the remaining bandwidth shifts to the
best-effort queue. This minor scaling shift is virtually unnoticeable to the
client application, but it prevents the internal queue deadlocks.
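
The arithmetic behind the note above can be sketched as follows. This is only plain proportional normalization under the assumption that reservations are scaled by a common factor once they exceed 100%; the commit message does not describe mClock's internal mechanism, and the function name `scale_reservations` is hypothetical.

```python
def scale_reservations(reservations):
    """Scale reservation percentages down by a common factor if they sum to more than 100.

    Illustrative only: this is simple normalization, not mClock's actual code.
    """
    total = sum(reservations.values())
    if total <= 100:
        return dict(reservations)
    return {name: pct * 100 / total for name, pct in reservations.items()}


# Balanced profile figures quoted in the commit message: 50% + 50% + 5% = 105%.
balanced = {
    "client": 50,
    "background_recovery": 50,
    "background_best_effort": 5,
}

scaled = scale_reservations(balanced)
for name, pct in scaled.items():
    print(f"{name}: {pct:.2f}%")
```

Under this assumption the client share drops from 50% to roughly 47.6%, and the best-effort queue ends up with a little under 5%.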

Signed-off-by: Sridhar Seshasayee <sridhar.seshasayee@ibm.com>

commit | commitdiff | tree

Sridhar Seshasayee [Tue, 21 Apr 2026 12:30:50 +0000 (18:00 +0530)]

mclock_common, mClockScheduler: Add perf counters for scheduler ops

Add perf counters that report the number of ops, dynamic queue lengths,
queue latency and bytes read for the following ops handled in the high
queues and in the scheduler queues:
- peering
- client
- ec reads/writes
- ec recovery reads

Additional counters can be added in the future as required.

Signed-off-by: Sridhar Seshasayee <sridhar.seshasayee@ibm.com>

commit | commitdiff | tree

Sridhar Seshasayee [Mon, 28 Jul 2025 11:09:34 +0000 (16:39 +0530)]

src/messages, osd: Calculate and set cost for subOpReads for mClock scheduler

Previously, sub-op reads returned a hardcoded cost of 0, bypassing
mClock's background bandwidth and tag calculation mechanisms. This
allowed backfill operations to proceed un-metered, occasionally causing
backend resource contention and driving up client tail latencies.

Cost is calculated based on whether the complete chunk/shard or a subchunk
needs to be read. The possible cases are:
1. Read the complete chunk aligned length:
   - Cost is set to the length of the chunk aligned extent size.
2. Fragmented reads:
   - Consider the subchunk length and count to calculate the cost.
   - compute_cost evaluates the exact layout of fragmented shard bytes on
     disk by summing up the active subchunk allocations exactly once
     (`fragmented_shard_bytes += k.second * subchunk_size`).
   - Linear Extent Scaling: Scale the baseline footprint cleanly by
     multiplying it against the true count of read extents (`tl.size()`),
     achieving a highly efficient O(N) time complexity.
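
The two cases above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the function name `sub_read_cost`, its parameters, and the `whole_chunk` flag used to select the case are all hypothetical, and only the two quoted expressions (`k.second * subchunk_size` and `tl.size()`) come from the commit message.

```python
def sub_read_cost(whole_chunk, chunk_aligned_len, extents, subchunks, subchunk_size):
    """Return an illustrative cost in bytes for one EC sub-op read.

    whole_chunk:       True when the complete chunk-aligned extent is read (case 1).
    chunk_aligned_len: length of the chunk-aligned extent, in bytes.
    extents:           list of (offset, length) read extents, a stand-in for `tl`.
    subchunks:         list of (first_subchunk, count) pairs, a stand-in for the
                       pairs whose `.second` is summed in the commit message.
    subchunk_size:     size of one subchunk, in bytes.
    """
    if whole_chunk:
        # Case 1: cost is the chunk-aligned extent length.
        return chunk_aligned_len

    # Case 2: add up the subchunk allocations exactly once...
    fragmented_shard_bytes = 0
    for _first, count in subchunks:
        fragmented_shard_bytes += count * subchunk_size

    # ...then scale linearly by the number of read extents.
    return fragmented_shard_bytes * len(extents)


full = sub_read_cost(True, 65536, [(0, 65536)], [], 4096)
fragmented = sub_read_cost(False, 65536, [(0, 1), (1, 1), (2, 1)], [(0, 2), (4, 1)], 4096)
print(full, fragmented)
```

In the fragmented call, three subchunks of 4096 bytes are summed once and then multiplied by three extents.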

This linear cost model is compatible with pools running with
'allow_ec_optimizations' set to true. Under the FastEC optimized
pipeline, most operations are unified and bypass fragment slicing,
meaning requests will primarily match the Case 1 chunk-aligned path.
In Case 2 where applicable, the O(N) loop ensures that cost will
scale proportionally according to the layout.

It is important to note that the amount of data to read was previously set to
an upper bound defined by osd_recovery_max_chunk (8 MiB), rounded up to the
stripe width. The reason for setting a higher than actual upper bound is that
there may be cases where the object doesn't yet have the xattrs needed to
determine its size. Therefore, the amount to read was ultimately set to
~(8 MiB / k), where k is the number of data shards. This can cause mClock to
prolong recovery times, as items stay longer in the queue. To address this,
the amount to read is set to the remaining length of the object to recover
if the object size is known. Otherwise, the amount to read is set to the
recovery chunk size as before. Therefore, in some cases, only the first
recovery read could be costly if the object context is not known.
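
The selection rule described in this paragraph can be illustrated with a short sketch. The function and parameter names are hypothetical, and the commit message does not say whether the remaining length is additionally capped or aligned, so the sketch follows the text literally.

```python
MIB = 1024 * 1024


def recovery_read_amount(recovery_chunk_size, object_size=None, recovered_so_far=0):
    """Pick the amount to read for one recovery read (illustrative only).

    If the object size is unknown (None), fall back to the recovery chunk
    size as before; otherwise read only what is left of the object.
    """
    if object_size is None:
        return recovery_chunk_size
    return max(object_size - recovered_so_far, 0)


# With k = 4 data shards, the old per-read amount is roughly 8 MiB / 4 = 2 MiB.
chunk = 8 * MIB // 4

unknown_size = recovery_read_amount(chunk)
small_object = recovery_read_amount(chunk, object_size=64 * 1024)
print(unknown_size, small_object)
```

A 64 KiB object whose size is known is costed at 64 KiB rather than 2 MiB, which is why only the first read may be expensive when the object context is missing.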

The MOSDECSubOpRead class introduces the following:
- cost member. This necessitates an increment to the HEAD_VERSION and
   appropriate handling within the encode and decode methods.
- compute_cost() that is called when creating the message by
   ECCommonL::ReadPipeline::do_read_op(). This calls into ECSubRead::cost()
   that performs the actual calculations to set the cost based on the cases
   mentioned above.
- The same sequence applies to the EC optimized path in
   ECCommon::ReadPipeline::do_read_op().

Fixes: https://tracker.ceph.com/issues/71655
Signed-off-by: Sridhar Seshasayee <sridhar.seshasayee@ibm.com>

commit | commitdiff | tree

Sridhar Seshasayee [Tue, 22 Jul 2025 08:39:16 +0000 (14:09 +0530)]

osd/scheduler: Classify EC subOp reads according to op priority for mClock

The change brings MSG_OSD_EC_READ into the fold of the mClock scheduler. This
improves the scheduling of client and other classes of operations, as they
are no longer unnecessarily preempted by the 'immediate' queue.
EC SubOps are now handled as follows:

- EC SubOp reads generated during recovery will either go into the
   'background_recovery' or 'background_best_effort' class based on
   the recovery priority set for the op. EC SubOp reads generated by
   client operations will continue to be classified as 'immediate'.

- EC SubOp writes generated as a result of client operations will
   continue to be classified as 'immediate'.

- EC SubOp replies are considered high priority and therefore
   continue to be classed as 'immediate'.
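
The classification rules listed above can be summarized in a small sketch. The enum, the function name, and especially the priority threshold are hypothetical; the commit message only says that the class depends on the recovery priority set for the op, not where the cutoff lies.

```python
from enum import Enum


class SchedulerClass(Enum):
    # Illustrative stand-in for the scheduler classes named in the commit message.
    IMMEDIATE = "immediate"
    BACKGROUND_RECOVERY = "background_recovery"
    BACKGROUND_BEST_EFFORT = "background_best_effort"


# Hypothetical cutoff: the real value is not given in the commit message.
RECOVERY_PRIORITY_THRESHOLD = 10


def classify_ec_subop(kind, for_recovery=False, priority=0):
    """Classify an EC sub-op; kind is one of "read", "write" or "reply"."""
    if kind == "read" and for_recovery:
        if priority >= RECOVERY_PRIORITY_THRESHOLD:
            return SchedulerClass.BACKGROUND_RECOVERY
        return SchedulerClass.BACKGROUND_BEST_EFFORT
    # Client-generated reads, client writes and replies stay 'immediate'.
    return SchedulerClass.IMMEDIATE


print(classify_ec_subop("read", for_recovery=True, priority=20).value)
print(classify_ec_subop("read", for_recovery=True, priority=1).value)
print(classify_ec_subop("reply").value)
```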

Fixes: https://tracker.ceph.com/issues/71655
Signed-off-by: Sridhar Seshasayee <sridhar.seshasayee@ibm.com>

commit | commitdiff | tree

Sridhar Seshasayee [Tue, 22 Jul 2025 08:23:07 +0000 (13:53 +0530)]

osd/scheduler/mClockScheduler: Fix line alignments

Signed-off-by: Sridhar Seshasayee <sridhar.seshasayee@ibm.com>

commit | commitdiff | tree

Sridhar Seshasayee [Tue, 22 Jul 2025 08:08:16 +0000 (13:38 +0530)]

osd/scheduler/mClockScheduler: Log the size of high priority queues.

Signed-off-by: Sridhar Seshasayee <sridhar.seshasayee@ibm.com>

commit | commitdiff | tree

Sridhar Seshasayee [Tue, 22 Jul 2025 07:43:55 +0000 (13:13 +0530)]

src/common/mclock_common: Fix output formatting of SchedulerClass

The earlier output formatting resulted in the value and the string
representation of the SchedulerClass being run together, e.g. "3client".

The formatting is now fixed to log SchedulerClass as "3 (client)".
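
The difference between the two formats can be shown with a tiny sketch. The enum below is a hypothetical stand-in that defines only the one member quoted in the commit message; the actual change is in the C++ output formatting.

```python
from enum import IntEnum


class SchedulerClass(IntEnum):
    # Only the member quoted in the commit message; other members are omitted.
    CLIENT = 3


def format_old(sc):
    """Value and name run together, as in the earlier output."""
    return f"{sc.value}{sc.name.lower()}"


def format_new(sc):
    """Value followed by the name in parentheses."""
    return f"{sc.value} ({sc.name.lower()})"


print(format_old(SchedulerClass.CLIENT))
print(format_new(SchedulerClass.CLIENT))
```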

Signed-off-by: Sridhar Seshasayee <sridhar.seshasayee@ibm.com>

commit | commitdiff | tree

Afreen Misbah [Thu, 11 Jun 2026 19:08:30 +0000 (00:38 +0530)]

Merge pull request #69136 from syedali237/rhcs-dashboard/hosts

mgr/dashboard: migrated host table tabs to resource pages

Reviewed-by: Afreen Misbah <afreen@ibm.com>

commit | commitdiff | tree

David Galloway [Thu, 11 Jun 2026 18:01:05 +0000 (14:01 -0400)]

Merge pull request #69408 from tchaikov/wip-spec-update-alternatives

ceph.spec.in: add update-alternatives as runtime dep and correct macro call

commit | commitdiff | tree

Afreen Misbah [Thu, 11 Jun 2026 15:16:32 +0000 (20:46 +0530)]

Merge pull request #68031 from syedali237/rhcs-dashboard/osd-component

mgr/dashboard : carbonize OSD form component

Reviewed-by: Afreen Misbah <afreen@ibm.com>
Reviewed-by: Devika Babrekar <devika.babrekar@ibm.com>

commit | commitdiff | tree

Kamoltat (Junior) Sirivadhna [Thu, 11 Jun 2026 13:47:19 +0000 (09:47 -0400)]

Merge pull request #69402 from Ericmzhang/wip-fix-mon-stretch_cluster

qa: Fix stretch_cluster.py missing function call

Reviewed-by: Kamoltat Sirivadhna <ksirivad@redhat.com>

commit | commitdiff | tree

Guillaume Abrioux [Thu, 11 Jun 2026 07:34:49 +0000 (09:34 +0200)]

Merge pull request #69391 from guits/fix-raw-activate

ceph-volume: fix raw activate when device path is stale

commit | commitdiff | tree

Zac Dover [Thu, 11 Jun 2026 06:32:20 +0000 (16:32 +1000)]

Merge pull request #69375 from zdover23/2026-06-10-organizationmap-update

organizationmap: add Zac Dover (Clyso)

Reviewed-by: Dan van der Ster <dan.vanderster@clyso.com>

commit | commitdiff | tree

Kefu Chai [Thu, 11 Jun 2026 03:32:24 +0000 (11:32 +0800)]

ceph.spec.in: require update-alternatives for the osd scriptlets

ceph-osd-crimson and ceph-osd-classic call update-alternatives in their
%posttrans and %preun scriptlets but don't depend on it. declare it as a
scriptlet dependency so the binary is there when they run.

Fixes: https://tracker.ceph.com/issues/77323
Signed-off-by: Kefu Chai <k.chai@proxmox.com>

commit | commitdiff | tree

Kefu Chai [Thu, 11 Jun 2026 03:29:05 +0000 (11:29 +0800)]

ceph.spec.in: use %{_sbindir} instead of ${_sbindir} in osd %preun

a37b5b5bde8c added %preun scriptlets that use ${_sbindir}, which is
shell syntax rather than an rpm macro, so it expands to empty at run
time and the scriptlet runs "/update-alternatives", failing on
uninstall/upgrade with:

  /var/tmp/rpm-tmp.K1fvm3: line 2: /update-alternatives: No such file or directory
  error: %preun(ceph-osd-crimson-2:20.3.0-5054.g33c1d671.el9.x86_64) scriptlet failed, exit status 127
  Error in PREUN scriptlet in rpm package ceph-osd-crimson.

use %{_sbindir}, like the %posttrans --install lines already do, so it
expands to /usr/sbin/update-alternatives at build time.

Fixes: https://tracker.ceph.com/issues/77323
Signed-off-by: Kefu Chai <k.chai@proxmox.com>

commit | commitdiff | tree

Patrick Donnelly [Thu, 11 Jun 2026 01:58:01 +0000 (21:58 -0400)]

Merge PR #69404 into main

* refs/pull/69404/head:
.github/milestone: add umbrella

Reviewed-by: Yuri Weinstein <yweins@redhat.com>

commit | commitdiff | tree

Anthony D'Atri [Thu, 11 Jun 2026 00:24:36 +0000 (20:24 -0400)]

Merge pull request #60492 from anthonyeleven/more-pgs

src/common/options: Increase autoscaler PG target and overload values

commit | commitdiff | tree

Patrick Donnelly [Wed, 10 Jun 2026 22:25:16 +0000 (18:25 -0400)]

.github/milestone: add umbrella

Fixes: https://tracker.ceph.com/issues/77308
Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>

commit | commitdiff | tree

Patrick Donnelly [Wed, 10 Jun 2026 20:35:13 +0000 (16:35 -0400)]

Merge PR #69399 into main

* refs/pull/69399/head:
doc/dev/release-checklists: reset to skeleton

Reviewed-by: Anthony D Atri <anthony.datri@gmail.com>

commit | commitdiff | tree

Eric Zhang [Wed, 10 Jun 2026 19:41:21 +0000 (12:41 -0700)]

qa: Fix task missing function call

The lambda was missing a function call, so it always returned true.
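
The class of bug described here can be reproduced in a few lines. The names below are hypothetical and are not taken from stretch_cluster.py; the sketch only shows why a lambda that returns a function object, instead of calling it, is always truthy.

```python
def cluster_is_healthy():
    """Hypothetical check that currently reports an unhealthy state."""
    return False


# Bug: the lambda returns the function object itself, which is always truthy.
broken_check = lambda: cluster_is_healthy

# Fix: the lambda calls the function and returns its result.
fixed_check = lambda: cluster_is_healthy()

print(bool(broken_check()))  # True, regardless of the actual state
print(bool(fixed_check()))   # False, the real result
```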

Signed-off-by: Eric Zhang <emzhang@ibm.com>

commit | commitdiff | tree

Patrick Donnelly [Wed, 10 Jun 2026 18:36:59 +0000 (14:36 -0400)]

doc/dev/release-checklists: reset to skeleton

Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>

commit | commitdiff | tree

Patrick Donnelly [Wed, 10 Jun 2026 18:30:59 +0000 (14:30 -0400)]

Merge PR #66726 into main

* refs/pull/66726/head:
doc: Update documentation to reflect new functionality
test: Add integration tests for EC Omap operations and recovery
osd: Hook up omap operations in EC pools
osd: Allow for recovery of OMAP header and entries in EC pools
doc: Write design document to explain the reasoning behind implementing this feature
osd: Introduce functions required for EC OMAP support
osd: Add ECOmapJournal class and relocate OmapUpdateType enum class

Reviewed-by: Bill Scales <bill_scales@uk.ibm.com>
Reviewed-by: Alex Ainscow <aainscow@uk.ibm.com>
Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>

commit | commitdiff | tree

mheler [Wed, 10 Jun 2026 18:11:19 +0000 (13:11 -0500)]

Merge pull request #69051 from mheler/wip-rgw-http-reqs-lock

rgw/http: take reqs_lock when appending to reqs_change_state

commit | commitdiff | tree

mheler [Wed, 10 Jun 2026 18:10:51 +0000 (13:10 -0500)]

Merge pull request #68784 from mheler/wip-checksum-special-char

rgw/cloud-transition: url-encode rgwx-source-key metadata header

commit | commitdiff | tree

Syed Ali Ul Hasan [Sat, 6 Jun 2026 16:47:23 +0000 (22:17 +0530)]

mgr/dashboard: migrated host table tabs to resource pages

Fixes : https://tracker.ceph.com/issues/76712
Signed-off-by: Syed Ali Ul Hasan <syedaliulhasan19@gmail.com>

commit | commitdiff | tree

Syed Ali Ul Hasan [Wed, 10 Jun 2026 17:30:36 +0000 (23:00 +0530)]

mgr/dashboard: carbonized OSD form component

Fixes: https://tracker.ceph.com/issues/68265
Signed-off-by: Syed Ali Ul Hasan <syedaliulhasan19@gmail.com>

commit | commitdiff | tree

Ronen Friedman [Wed, 10 Jun 2026 15:31:58 +0000 (18:31 +0300)]

Merge pull request #69256 from ronen-fr/wip-rf-stshards

crimson/osd: avoid calling get_sharded_store() for obj size

Reviewed-by: Kefu Chai <k.chai@proxmox.com>
Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Reviewed-by: Matan Breizman <mbreizma@redhat.com>

commit | commitdiff | tree

Matty Williams [Wed, 10 Jun 2026 15:20:23 +0000 (16:20 +0100)]

Merge pull request #68888 from MattyWilliams22/mw-peering-state-rollforward

osd: Fix condition for rolling forward pg log entries

Reviewed-by: Alex Ainscow <aainscow@uk.ibm.com>
Reviewed-by: Bill Scales <bill_scales@uk.ibm.com>

commit | commitdiff | tree

Afreen Misbah [Wed, 10 Jun 2026 14:33:03 +0000 (20:03 +0530)]

Merge pull request #69276 from afreen23/worktree-umbrella-release-notes

doc: add Dashboard and Monitoring release notes for Umbrella

Reviewed-by: Afreen Misbah <afreen@ibm.com>
Reviewed-by: Naman Munet <nmunet@redhat.com>

commit | commitdiff | tree

David Galloway [Wed, 10 Jun 2026 14:32:33 +0000 (10:32 -0400)]

Merge pull request #68368 from kginonredhat/issue-75389-yaml-and-jinja2-deps-on-centos-distro

ceph.spec: declare PyYAML and Jinja2 Requires for cephadm RPM

commit | commitdiff | tree

Afreen Misbah [Mon, 25 May 2026 23:10:46 +0000 (04:40 +0530)]

doc: add Dashboard and Monitoring release notes for Umbrella

Signed-off-by: Afreen Misbah <afreen23@gmail.com>

commit | commitdiff | tree

Jaya Prakash [Wed, 10 Jun 2026 11:31:07 +0000 (17:01 +0530)]

Merge pull request #68984 from Jayaprakash-ibm/wip-faster-alloc-recovery-testing

qa: Add Teuthology tests for BlueStore faster allocation recovery

Reviewed-by: Jaya Prakash <jayaprakash@ibm.com>

commit | commitdiff | tree

Jaya Prakash [Wed, 10 Jun 2026 11:30:14 +0000 (17:00 +0530)]

Merge pull request #64369 from aclamk/aclamk-bs-faster-start-more

bluestore: Faster allocation recovery - evolution

Reviewed-by: Jaya Prakash <jayaprakash@ibm.com>

commit | commitdiff | tree

Jaya Prakash [Wed, 10 Jun 2026 11:28:10 +0000 (16:58 +0530)]

Merge pull request #68981 from aclamk/aclamk-kv-divide-range

kv/KeyValueDB: New utility function util_divide_key_range

Reviewed-by: Jaya Prakash <jayaprakash@ibm.com>

commit | commitdiff | tree

Guillaume Abrioux [Wed, 10 Jun 2026 11:22:14 +0000 (13:22 +0200)]

ceph-volume: fix raw activate when device path is stale

This changes unlink_bs_symlinks to use os.path.lexists instead
of os.path.exists. Devices can get renumbered; in that case,
the OSD symlink still exists but its target device is gone, so
os.path.exists returns False, the symlink is never cleaned up,
and ceph-volume activate can fail later.
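
The behaviour difference can be reproduced with a dangling symlink in a temporary directory. The helper `unlink_if_present` is a hypothetical stand-in written for this sketch, not the ceph-volume function; only `os.path.exists` and `os.path.lexists` are the calls named in the commit message.

```python
import os
import tempfile


def unlink_if_present(path, check=os.path.lexists):
    """Remove path if the given check says it is there; return True if removed."""
    if check(path):
        os.unlink(path)
        return True
    return False


with tempfile.TemporaryDirectory() as tmp:
    link = os.path.join(tmp, "block")
    # Point the symlink at a target that does not exist (a "stale" device path).
    os.symlink(os.path.join(tmp, "missing-device"), link)

    exists = os.path.exists(link)    # follows the link, target is gone
    lexists = os.path.lexists(link)  # checks the link itself

    removed_old = unlink_if_present(link, check=os.path.exists)
    removed_new = unlink_if_present(link)

print(exists, lexists, removed_old, removed_new)
```

With the `os.path.exists` check the stale link is left behind, while the `os.path.lexists` check removes it.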

Fixes: https://tracker.ceph.com/issues/77295
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>

commit | commitdiff | tree

Ilya Dryomov [Wed, 10 Jun 2026 10:00:45 +0000 (12:00 +0200)]

Merge pull request #69364 from eameh-LF/wip-doc-77191

doc/man: Remove stale EOL release names from deprecation notices

Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>

commit | commitdiff | tree

Ronen Friedman [Wed, 3 Jun 2026 05:40:25 +0000 (05:40 +0000)]

crimson/osd: move get_max_object_size() to store level

is_offset_and_length_valid() called get_sharded_store() locally to
obtain the store-specific max_object_size. On alien cores (where
smp::count > store_shard_nums), the local store is inactive and the
call hits assert(shard_store.get_status() == true).

As the max object size is a store-specific property and not a
store-shard one, there is no reason to acquire the
store shard to obtain it. Instead, a get_max_object_size()
method is added to the Store interface.

Fixes: https://tracker.ceph.com/issues/76946
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>

commit | commitdiff | tree

Zac Dover [Wed, 10 Jun 2026 01:00:18 +0000 (11:00 +1000)]

docs: organizationmap: add Zac Dover (Clyso)

Add Zac Dover (Clyso) to .organizationmap.

Signed-off-by: Zac Dover <zac.dover@clyso.com>

commit | commitdiff | tree

Nizamudeen A [Wed, 10 Jun 2026 05:02:26 +0000 (10:32 +0530)]

Merge pull request #68990 from rhcs-dashboard/carbon-filter

mgr/dashboard: carbonize table filters

Reviewed-by: Nizamudeen A <nia@redhat.com>
Reviewed-by: Afreen Misbah <afreen@ibm.com>
Reviewed-by: Naman Munet <nmunet@redhat.com>

commit | commitdiff | tree

Kefu Chai [Wed, 10 Jun 2026 03:18:07 +0000 (11:18 +0800)]

Merge pull request #69374 from sunyuechi/wip-catch2-disconnected-guard

cmake: disable Catch2 tests when Catch2 is unavailable

Reviewed-by: Kefu Chai <k.chai@proxmox.com>

commit | commitdiff | tree

Kefu Chai [Wed, 10 Jun 2026 01:52:35 +0000 (09:52 +0800)]

Merge pull request #69120 from tchaikov/wip-crimson-fix-move-rctx

crimson/osd: give each split child its own PeeringCtx

Reviewed-by: Aishwarya Mathuria <amathuri@redhat.com>

commit | commitdiff | tree

Sun Yuechi [Wed, 10 Jun 2026 00:13:53 +0000 (08:13 +0800)]

cmake: disable Catch2 tests when Catch2 is unavailable

debhelper on noble passes -DFETCHCONTENT_FULLY_DISCONNECTED=ON, so CPM
cannot fetch Catch2 and silently skips it, leaving no Catch2 targets
behind and breaking the generate step. Fall back to WITH_CATCH2=OFF
with a warning instead.

Signed-off-by: Sun Yuechi <sunyuechi@iscas.ac.cn>

commit | commitdiff | tree

Anthony D'Atri [Sat, 30 May 2026 01:36:48 +0000 (21:36 -0400)]

qa/workunits/mon: Update pg_autoscaler.sh in conjunction with https://github.com/ceph/ceph/pull/60492

Signed-off-by: Anthony D'Atri <anthonyeleven@users.noreply.github.com>

commit | commitdiff | tree

Adam Emerson [Tue, 9 Jun 2026 20:22:35 +0000 (16:22 -0400)]

Merge pull request #61256 from irq0/wip/rgw-kms-cache

RGW SSE-KMS secrets cache

Reviewed-by: Adam Emerson <aemerson@redhat.com>

commit | commitdiff | tree

Adam Kupczyk [Tue, 9 Jun 2026 19:06:36 +0000 (21:06 +0200)]

Merge pull request #69085 from dheart-joe/wip-reconstruct-allocations

os/bluestore: fix reallocation and corruption when shared_blob key is missing/undecodable

commit | commitdiff | tree

Laura Flores [Tue, 9 Jun 2026 18:59:59 +0000 (13:59 -0500)]

Merge pull request #68837 from NitzanMordhai/wip-nitzan-cephtool-singleton-bluestore-evicting-unresponsive-client

qa: ignore evicted client warnings for singleton bluestore

Reviewed-by: Radosław Zarzyński <Radoslaw.Adam.Zarzynski@ibm.com>
Reviewed-by: Yuri Weinstein <yweinste@ibm.com>

commit | commitdiff | tree

John Mulligan [Tue, 9 Jun 2026 18:21:32 +0000 (14:21 -0400)]

Merge pull request #68825 from phlogistonjohn/jjm-smb-ctl-tool-fe

smb: add a smb remote control client tool frontend

Reviewed-by: Avan Thakkar <athakkar@redhat.com>
Reviewed-by: Anoop C S <anoopcs@cryptolab.net>

commit | commitdiff | tree

Anthony D'Atri [Fri, 25 Oct 2024 19:45:27 +0000 (15:45 -0400)]

src/common/options: Increase autoscaler PG target and overload values

Signed-off-by: Anthony D'Atri <anthonyeleven@users.noreply.github.com>

commit | commitdiff | tree

Igor Fedotov [Tue, 9 Jun 2026 15:51:59 +0000 (18:51 +0300)]

Merge pull request #65275 from ifed01/wip-ifed-no-buffered-wal

os/bluestore: do not use buffered IO for BlueFS WAL.

Reviewed-by: Adam Kupczyk <akupczyk@ibm.com>

commit | commitdiff | tree

Matan Breizman [Tue, 9 Jun 2026 13:53:42 +0000 (16:53 +0300)]

Merge pull request #69211 from Matan-B/wip-matanb-seastore-conflict-counters

crimson/os/seastore: separate reset accounting from transaction creation

Reviewed-by: Xuehan Xu <xuxuehan@qianxin.com>

commit | commitdiff | tree

dheart [Tue, 9 Jun 2026 13:27:14 +0000 (21:27 +0800)]

os/bluestore: prevent reallocation and corruption when shared_blob key is missing/undecodable

When the shared_blob key is missing or fails to decode,
it is necessary to scan the blob's pextents directly as the sole authoritative source
to verify allocated blocks and prevent double-allocation.

Signed-off-by: dheart <dheart_joe@163.com>

commit | commitdiff | tree

Casey Bodley [Tue, 9 Jun 2026 13:16:15 +0000 (09:16 -0400)]

Merge pull request #69233 from tchaikov/wip-rgw-posix-thread-last

rgw/posix: start the Inotify thread last, after the rest is built

Reviewed-by: Casey Bodley <cbodley@redhat.com>

commit | commitdiff | tree

Emmanuel Ameh [Tue, 9 Jun 2026 12:40:03 +0000 (13:40 +0100)]

doc/man: Remove stale EOL release names from deprecation notices

ceph.rst: "osd create" deprecation notice cited "the Luminous release"
(2017, EOL 2020). Update to a plain deprecation statement directing
users to the replacement command (osd new).

rbd.rst: cephx_require_signatures option deprecation cited "the Bobtail
release" (2013, EOL 2015) as context for why the option is deprecated.
Remove the EOL release name; retain the deprecation warning. Fix the
companion nocephx_require_signatures notice for consistency ("in a
future release" instead of "in the future").

Fixes: https://tracker.ceph.com/issues/77191
Signed-off-by: Emmanuel Ameh <eameh@contractor.linuxfoundation.org>

commit | commitdiff | tree

Casey Bodley [Tue, 9 Jun 2026 12:24:19 +0000 (08:24 -0400)]

Merge pull request #69253 from cbodley/wip-76725

osdc: deliver neorados completions to associated executor

Reviewed-by: Adam Emerson <aemerson@redhat.com>
Reviewed-by: Shilpa Jagannath <smanjara@redhat.com>

commit | commitdiff | tree

eameh-LF [Tue, 9 Jun 2026 12:06:30 +0000 (13:06 +0100)]

Merge pull request #69246 from eameh-LF/i77075

doc/cephadm: fix typo and missing quote in activate-existing-osds

commit | commitdiff | tree

Jaya Prakash [Tue, 9 Jun 2026 11:53:16 +0000 (17:23 +0530)]

Merge pull request #65792 from aclamk/aclamk-bs-onode-stall-fix

os/bluestore: Fix problem with onode cache causing stalls

Reviewed-by: Igor Fedotov <igor.fedotov@croit.io>

commit | commitdiff | tree

Jaya Prakash [Tue, 9 Jun 2026 11:52:57 +0000 (17:22 +0530)]

Merge pull request #68798 from aclamk/aclamk-bs-fix-stray-spanning-blobs

os/bluestore: Fix ExtentMap::reshard produce stray spanning blobs

Reviewed-by: Igor Fedotov <igor.fedotov@croit.io>

commit | commitdiff | tree

Matty Williams [Mon, 23 Feb 2026 16:32:13 +0000 (16:32 +0000)]

doc: Update documentation to reflect new functionality

https://tracker.ceph.com/issues/74188
Signed-off-by: Matty Williams <Matty.Williams@ibm.com>

commit | commitdiff | tree

Matty Williams [Tue, 23 Dec 2025 13:42:37 +0000 (13:42 +0000)]

test: Add integration tests for EC Omap operations and recovery

Assisted-by: Bob
Used for writing tests following the pattern of existing tests.

Fixes: https://tracker.ceph.com/issues/74188
Signed-off-by: Matty Williams <Matty.Williams@ibm.com>

commit | commitdiff | tree

Matty Williams [Mon, 18 May 2026 09:09:32 +0000 (10:09 +0100)]

osd: Hook up omap operations in EC pools

Add pool flag to determine if omap operations are supported in a pool.
- Currently disabled in EC pools (will later be enabled for Fast EC pools)
Require all osds to have umbrella or later release version to enable pool flag.
Change recovery reads to use journal updates.
Clear the journal for a new epoch.
Set omap_complete accurately before recovery.
Encode omap updates and add entry to journal.
Decode omap updates, apply updates to object store, then remove from journal.
Change omap reads in PrimaryLogPG to use PGBackend functions, including omap updates from journal.

Assisted-by: Bob
Used for debugging and copying patterns (e.g. implementing REPLACE type to match MODIFY).

Fixes: https://tracker.ceph.com/issues/74188
Signed-off-by: Matty Williams <Matty.Williams@ibm.com>

commit | commitdiff | tree

Matty Williams [Tue, 12 May 2026 15:11:17 +0000 (16:11 +0100)]

osd: Allow for recovery of OMAP header and entries in EC pools

Add omap fields to read_request_t, read_result_t, ECSubRead and ECSubReadReply.
Read and write omap header and entries if !omap_complete.
Require omap_complete to finish recovery.

Fixes: https://tracker.ceph.com/issues/74244
Signed-off-by: Matty Williams <Matty.Williams@ibm.com>

commit | commitdiff | tree

Matty Williams [Tue, 24 Feb 2026 15:16:28 +0000 (15:16 +0000)]

doc: Write design document to explain the reasoning behind implementing this feature

Assisted-by: Bob
Used to create the first draft of the design document.

https://tracker.ceph.com/issues/74187
Signed-off-by: Matty Williams <Matty.Williams@ibm.com>

commit | commitdiff | tree

Matty Williams [Fri, 12 Dec 2025 11:21:10 +0000 (11:21 +0000)]

osd: Introduce functions required for EC OMAP support

Introduced a "supports_omap" pool flag which is always enabled for Replicated pools and currently always disabled for EC pools.
Introduced wrappers around omap read operations in PGBackend to include updates from the journal in EC pools with optimisations enabled.
Introduced a function for encoding an EC_OMAP operation in the ObjectModDesc::Visitor class and a function for committing an operation in the Trimmer struct.

Signed-off-by: Matty Williams <Matty.Williams@ibm.com>

commit | commitdiff | tree

Yuval Lifshitz [Tue, 9 Jun 2026 07:58:15 +0000 (10:58 +0300)]

Merge pull request #69033 from kchheda3/fix-76729-notif-eventtime-race

rgw/notification: fix zero eventTime in bucket notifications on concurrent PUT race

commit | commitdiff | tree

Venky Shankar [Tue, 9 Jun 2026 01:32:00 +0000 (07:02 +0530)]

Merge PR #68413 into main

* refs/pull/68413/head:
mds: fix shutdown hang when ephemeral pins active and max_mds is 0
mds: fix crash in hash_into_rank_bucket() when max_mds is 0

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>

commit | commitdiff | tree

Kefu Chai [Mon, 8 Jun 2026 23:37:28 +0000 (07:37 +0800)]

Merge pull request #69165 from sunyuechi/wip-addcephtest-catch2-imported-target

cmake/AddCephTest: use namespaced Catch2 imported targets

Reviewed-by: Jesse F. Williamson <jfw@ibm.com>

commit | commitdiff | tree

Patrick Donnelly [Mon, 8 Jun 2026 22:31:53 +0000 (18:31 -0400)]

Merge PR #69337 into main

* refs/pull/69337/head:
doc: governance/csc: update email address

Reviewed-by: Joseph Mundackal <jmundackal@bloomberg.net>
Reviewed-by: Anthony D Atri <anthony.datri@gmail.com>
Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>

commit | commitdiff | tree

Yehuda Sadeh Weinraub [Mon, 8 Jun 2026 18:38:26 +0000 (11:38 -0700)]

doc: governance/csc: update email address

yehuda@redhat.com -> yehuda@ui.com

Signed-off-by: Yehuda Sadeh Weinraub <yehuda@ui.com>

commit | commitdiff | tree

Ericmzhang [Mon, 8 Jun 2026 19:12:11 +0000 (12:12 -0700)]

Merge pull request #69176 from Ericmzhang/wip-fix-pg_autoscaler-tests

qa: Fix pg autoscaler tests

commit | commitdiff | tree

Zack Cerza [Mon, 8 Jun 2026 18:37:07 +0000 (12:37 -0600)]

Merge pull request #69315 from sunyuechi/wip-sccache-riscv64

Dockerfile.build: bump sccache and fetch it on riscv64

commit | commitdiff | tree

Jaya Prakash [Mon, 18 May 2026 19:57:50 +0000 (19:57 +0000)]

qa/suites: add faster allocation recovery thrashing suite

Signed-off-by: Jaya Prakash <jayaprakash@ibm.com>

commit | commitdiff | tree

Jaya Prakash [Mon, 18 May 2026 19:57:33 +0000 (19:57 +0000)]

qa/workunits: add EC fio workload for allocation recovery testing

Signed-off-by: Jaya Prakash <jayaprakash@ibm.com>

commit | commitdiff | tree

Adam Kupczyk [Fri, 29 May 2026 11:16:39 +0000 (11:16 +0000)]

os/bluestore: Add printout to CBT's recovery-compare command

1) recovery-compare prints to stdout
2) gracefully rejects comparing when multithreaded recovery is not enabled

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>

commit | commitdiff | tree

Adam Kupczyk [Tue, 19 May 2026 19:36:37 +0000 (19:36 +0000)]

os/bluestore: Add bluestore_debug_fast_recovery_compare_chance

The setting is used for testing purposes only.
It allows forcing a comparison when required,
or setting the chance of a comparison in teuthology thrash tests.

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>

commit | commitdiff | tree

Adam Kupczyk [Mon, 7 Jul 2025 10:16:43 +0000 (10:16 +0000)]

os/bluestore: Make OnodeScan use just one Blob

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>

commit | commitdiff | tree

Adam Kupczyk [Mon, 7 Jul 2025 10:02:01 +0000 (10:02 +0000)]

os/bluestore: Tell OnodeScan to skip decoding checksums

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>

commit | commitdiff | tree

Adam Kupczyk [Mon, 7 Jul 2025 07:24:42 +0000 (07:24 +0000)]

os/bluestore: Adapt multithread recovery

Adapt multithread recovery to modified ExtentDecoder interface.

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>

commit | commitdiff | tree

Adam Kupczyk [Thu, 3 Jul 2025 08:04:01 +0000 (08:04 +0000)]

os/bluestore: Multithreaded allocation recovery

Added multithreaded processing for allocation recovery.
Added new config "bluestore_allocation_recovery_threads".

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>

commit | commitdiff | tree

Adam Kupczyk [Tue, 1 Jul 2025 13:25:38 +0000 (13:25 +0000)]

os/bluestore: Add "recovery-compare" action to CBT

New command compares 2 recovery modes:
- legacy
- new multithreaded
The command is hidden - it does not show in help.
Its role is devel & test only.

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>

commit | commitdiff | tree

Adam Kupczyk [Tue, 1 Jul 2025 13:47:14 +0000 (13:47 +0000)]

os/bluestore: Add new onode recovery method

Added read_allocation_from_onodes_mt function
  (originally copied from read_allocation_from_onodes).
Added Decoder_AllocationsAndStatFS class
  (originally copied from ExtentDecoderPartial).

There are significant differences from originals:
- shared blobs are not scanned at all
- to avoid accounting allocations more than once,
  collisions are detected at the SimpleBitmap level;
  only the first onode referencing a shared blob marks the allocation
- Blobs are not preserved
- instead, we remember only whether a blob or spanning blob was compressed

The underlying rationale is to make recovery faster and to prepare for
the multithreaded refactor.

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
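
The "first referencer marks the allocation" rule described in this commit
can be illustrated with a minimal sketch. Everything here is hypothetical:
ExtentSketch, AllocationMarkerSketch and count_allocated_units are
illustrative names, not BlueStore types, and a per-unit atomic flag stands in
for the real SimpleBitmap. The point is only that an extent shared by two
onodes is counted once.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical extent: a run of allocation units (not a BlueStore type).
struct ExtentSketch {
  std::size_t first_unit;
  std::size_t unit_count;
};

// Hypothetical marker: one atomic flag per allocation unit.
class AllocationMarkerSketch {
public:
  explicit AllocationMarkerSketch(std::size_t units) : marked_(units) {}

  // Mark every unit of the extent; return how many units were not
  // marked before this call. A unit already marked by an earlier
  // referencer is a collision and contributes nothing.
  // No bounds checking: the caller must stay within the marker's size.
  std::size_t mark(const ExtentSketch& e) {
    std::size_t fresh = 0;
    for (std::size_t u = e.first_unit; u < e.first_unit + e.unit_count; ++u) {
      if (!marked_[u].exchange(true, std::memory_order_relaxed)) {
        ++fresh;
      }
    }
    return fresh;
  }

private:
  std::vector<std::atomic<bool>> marked_;
};

// Sum the newly marked units over all onodes; shared extents are
// accounted only for the first onode that references them.
std::size_t count_allocated_units(
    AllocationMarkerSketch& marker,
    const std::vector<std::vector<ExtentSketch>>& onodes) {
  std::size_t total = 0;
  for (const auto& extents : onodes) {
    for (const auto& e : extents) {
      total += marker.mark(e);
    }
  }
  return total;
}
```

Because mark() uses an atomic exchange per unit, the same accounting would
hold if several threads called it concurrently, which is the property the
later multithreaded commits rely on.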

commit | commitdiff | tree

Adam Kupczyk [Tue, 1 Jul 2025 11:54:01 +0000 (11:54 +0000)]

os/bluestore: Tiny refactor

Moved statfs initialization that is done after onode recovery
from read_allocation_from_onodes()
to reconstruct_allocations().

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>

commit | commitdiff | tree

Adam Kupczyk [Tue, 1 Jul 2025 11:48:45 +0000 (11:48 +0000)]

os/bluestore: Add set_atomic and clr_atomic to SimpleBitmap

The functions are analogs of set and clr, respectively, that allow multithreaded use.
In addition, the return value is a count of set/cleared bits.

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
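
A minimal sketch of what atomic range set/clear with a changed-bit count
could look like. This is an assumption-laden illustration, not the
SimpleBitmap code: AtomicBitmapSketch, its 64-bit word layout, the
(offset, length) signatures and the popcount64 helper are all hypothetical.
It uses fetch_or / fetch_and on whole words and counts only the bits that
this call actually changed.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Portable population count (hypothetical helper).
inline std::size_t popcount64(std::uint64_t v) {
  std::size_t n = 0;
  while (v != 0) {
    v &= v - 1;  // clear the lowest set bit
    ++n;
  }
  return n;
}

// Hypothetical bitmap; not BlueStore's SimpleBitmap.
class AtomicBitmapSketch {
public:
  explicit AtomicBitmapSketch(std::size_t bits) : words_((bits + 63) / 64) {}

  // Atomically set bits [offset, offset + length).
  // Returns how many bits this call flipped from 0 to 1.
  std::size_t set_atomic(std::size_t offset, std::size_t length) {
    std::size_t changed = 0;
    for_each_word(offset, length, [&](std::size_t idx, std::uint64_t mask) {
      std::uint64_t old = words_[idx].fetch_or(mask, std::memory_order_relaxed);
      changed += popcount64(mask & ~old);
    });
    return changed;
  }

  // Atomically clear bits [offset, offset + length).
  // Returns how many bits this call flipped from 1 to 0.
  std::size_t clr_atomic(std::size_t offset, std::size_t length) {
    std::size_t changed = 0;
    for_each_word(offset, length, [&](std::size_t idx, std::uint64_t mask) {
      std::uint64_t old = words_[idx].fetch_and(~mask, std::memory_order_relaxed);
      changed += popcount64(mask & old);
    });
    return changed;
  }

private:
  // Split a bit range into per-word masks. No bounds checking:
  // the caller must keep the range inside the bitmap.
  template <typename F>
  void for_each_word(std::size_t offset, std::size_t length, F&& fn) {
    const std::size_t end = offset + length;
    while (offset < end) {
      const std::size_t idx = offset / 64;
      const std::size_t bit = offset % 64;
      const std::size_t n = std::min<std::size_t>(64 - bit, end - offset);
      const std::uint64_t mask =
          (n == 64) ? ~std::uint64_t{0} : (((std::uint64_t{1} << n) - 1) << bit);
      fn(idx, mask);
      offset += n;
    }
  }

  std::vector<std::atomic<std::uint64_t>> words_;
};
```

Returning the count of changed bits lets concurrent callers detect overlap
without a lock: if two threads set overlapping ranges, the counts they get
back add up to the number of distinct bits set.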

commit | commitdiff | tree

Adam Kupczyk [Fri, 4 Jul 2025 16:28:16 +0000 (16:28 +0000)]

os/bluestore: Rework on decoding

Refactored ExtentDecoder.
Introduced decode_create_blob method to it.
Converted bluestore_blob_t::decode and Blob::decode methods into templates.
Created a clear example of how to specialize these and other decoders.

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>

commit | commitdiff | tree

Shraddha Agrawal [Mon, 8 Jun 2026 14:54:47 +0000 (20:24 +0530)]

Merge pull request #69212 from shraddhaag/wip-shraddhaag-enable-debian-crimson-builds

debian: enable crimson packages

commit | commitdiff | tree

Kefu Chai [Mon, 8 Jun 2026 14:11:05 +0000 (22:11 +0800)]

Merge pull request #66746 from datdenkikniet/prologue-not-epilogue

msg/async/frames_v2: doc: FRAME_EARLY_DATA_COMPRESSED is used in prologue, not epilogue

Reviewed-by: Kefu Chai <k.chai@proxmox.com>

commit | commitdiff | tree

Kefu Chai [Mon, 8 Jun 2026 13:34:54 +0000 (21:34 +0800)]

Merge pull request #69188 from sunyuechi/zstd-system-include

compressor/zstd: include <zstd.h> instead of the bundled path

Reviewed-by: Kefu Chai <k.chai@proxmox.com>

commit | commitdiff | tree

chungfengz [Thu, 16 Apr 2026 06:54:16 +0000 (06:54 +0000)]

mds: fix shutdown hang when ephemeral pins active and max_mds is 0

During shutdown, `ceph fs set <fs> down true` sets max_mds to 0 before
the MDS daemons have finished exporting their subtrees. shutdown_pass()
iterates over auth subtrees and skips any dir whose inode is
ephemerally pinned, expecting handle_export_pins() to re-place them.
However, handle_export_pins() calls hash_into_rank_bucket() which (after
the companion fix) now returns MDS_RANK_NONE when max_mds == 0. With
no valid target rank the export is never scheduled, so the ephemerally-
pinned dirs are skipped by shutdown_pass() indefinitely and the daemon
loops.

Fixes: https://tracker.ceph.com/issues/76059
Signed-off-by: chungfengz <chungfengz@synology.com>

commit | commitdiff | tree

chungfengz [Thu, 16 Apr 2026 06:53:51 +0000 (06:53 +0000)]

mds: fix crash in hash_into_rank_bucket() when max_mds is 0

When a CephFS cluster is paused (e.g. via `ceph fs set <fs> down true`
or `ceph fs pause`) the MDS map's max_mds is set to 0. Any subsequent
call to hash_into_rank_bucket() with max_mds == 0 triggers a crash:
the jump-consistent-hash loop never executes (j starts at 0, condition
j < max_mds is immediately false), leaving b = -1, so the final
assert(result >= 0 && result < max_mds) aborts the daemon.

Fixes: https://tracker.ceph.com/issues/76059
Signed-off-by: chungfengz <chungfengz@synology.com>
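
The failure mode described above can be reproduced with a generic jump
consistent hash. The sketch below is not the MDS code: the function name,
the RANK_NONE_SKETCH sentinel (standing in for MDS_RANK_NONE) and the
constants, which come from the published jump-consistent-hash algorithm, are
assumptions for illustration. With zero buckets the loop body never runs and
b stays at -1, so the sketch returns a sentinel up front instead of
asserting on the result.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sentinel meaning "no valid rank".
constexpr std::int32_t RANK_NONE_SKETCH = -1;

// Generic jump consistent hash with an explicit guard for an empty
// bucket set. Without the guard, num_buckets == 0 would leave b == -1
// and a range assertion on the result would fail.
std::int32_t hash_into_bucket_sketch(std::uint64_t key,
                                     std::int32_t num_buckets) {
  if (num_buckets <= 0) {
    return RANK_NONE_SKETCH;  // nothing to hash into
  }
  std::int64_t b = -1;
  std::int64_t j = 0;
  while (j < num_buckets) {
    b = j;
    key = key * 2862933555777941757ULL + 1;
    j = static_cast<std::int64_t>(
        static_cast<double>(b + 1) *
        (static_cast<double>(1LL << 31) /
         static_cast<double>((key >> 33) + 1)));
  }
  // The loop ran at least once, so 0 <= b < num_buckets here.
  return static_cast<std::int32_t>(b);
}
```

Callers then have to treat the sentinel as "do not schedule an export",
which is the situation the companion shutdown fix has to handle.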

commit | commitdiff | tree

Venky Shankar [Mon, 8 Jun 2026 09:03:10 +0000 (14:33 +0530)]

Merge pull request #56634 from neesingh-rh/wip-64064

mds: comply with the valid range for `mds_log_max_segments`

Reviewed-by: Venky Shankar <vshankar@redhat.com>

commit | commitdiff | tree

Venky Shankar [Mon, 8 Jun 2026 08:53:57 +0000 (14:23 +0530)]

Merge PR #68793 into main

* refs/pull/68793/head:
mds: prevent CDir omap commit with empty updates/removals/header

Reviewed-by: Igor Golikov <igolikov@ibm.com>
Reviewed-by: Dhairya Parmar <dparmar@redhat.com>

commit | commitdiff | tree

Matan Breizman [Mon, 8 Jun 2026 08:13:54 +0000 (11:13 +0300)]

Merge pull request #69153 from fultheim/rbm-capacity-enforcement

crimson/os/seastore: enforce capacity in RBMCleaner::try_reserve_projected_usage

Reviewed-by: Matan Breizman <mbreizma@redhat.com>

commit | commitdiff | tree

Nizamudeen A [Tue, 19 May 2026 04:40:08 +0000 (10:10 +0530)]

mgr/dashboard: carbonize table filters

Fixes: https://tracker.ceph.com/issues/76687
Signed-off-by: Nizamudeen A <nia@redhat.com>

commit | commitdiff | tree

Matan Breizman [Mon, 8 Jun 2026 07:43:56 +0000 (10:43 +0300)]

Merge pull request #69248 from xxhdx1985126/wip-seastore-get_child_sync-fix

crimson/os/seastore/linked_tree_node: get_child_sync should also get transactional views of the extent

Reviewed-by: Ronen Friedman <rfriedma@redhat.com>
Reviewed-by: Matan Breizman <mbreizma@redhat.com>

commit | commitdiff | tree

Venky Shankar [Mon, 8 Jun 2026 07:27:10 +0000 (12:57 +0530)]

Merge PR #66492 into main

* refs/pull/66492/head:
src/pybind/mgr: handle json-pretty for perf stats

Reviewed-by: Jos Collin <jcollin@redhat.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>

commit | commitdiff | tree

Shraddha Agrawal [Mon, 1 Jun 2026 10:58:48 +0000 (16:28 +0530)]

debian: enable crimson packages

This commit enables ceph-osd-crimson and ceph-osd-crimson-dbg
packages for debian builds which have gcc version 13 or above.
This is done as a first step to add noble to the supported distros
for crimson.

Signed-off-by: Shraddha Agrawal <shraddha.agrawal000@gmail.com>

commit | commitdiff | tree

Nizamudeen A [Mon, 8 Jun 2026 05:25:57 +0000 (10:55 +0530)]

Merge pull request #68094 from rhcs-dashboard/cleanup-log

mgr/prometheus: cleanup the smb share processing logs

Reviewed-by: Avan Thakkar <athakkar@redhat.com>

commit | commitdiff | tree

Nizamudeen A [Mon, 8 Jun 2026 05:23:20 +0000 (10:53 +0530)]

Merge pull request #69317 from tchaikov/wip-mgr-dashboard-immutable-cache

mgr/dashboard: don't mutate the cached osd_map in CephService

Reviewed-by: Nizamudeen A <nia@redhat.com>

commit | commitdiff | tree

Venky Shankar [Mon, 8 Jun 2026 04:35:28 +0000 (10:05 +0530)]

Merge pull request #65950 from joscollin/wip-71701-near-full

qa: drop creating huge files in test_cephfs_mirror_cancel_sync

Reviewed-by: Venky Shankar <vshankar@redhat.com>

commit | commitdiff | tree

Kefu Chai [Mon, 8 Jun 2026 01:33:54 +0000 (09:33 +0800)]

Merge pull request #67371 from greenx/main

logrotate: send SIGHUP to ceph-exporter on log rotation

Reviewed-by: Kefu Chai <k.chai@proxmox.com>
