git-server-git.apps.pok.os.sepia.ceph.com Git - ceph.git/log
ceph.git
2 days agoceph.spec.in: require c-ares >= 1.28 for ceph-osd-crimson 69259/head
Kautilya Tripathi [Wed, 3 Jun 2026 08:32:08 +0000 (14:02 +0530)]
ceph.spec.in: require c-ares >= 1.28 for ceph-osd-crimson

Seastar's DNS stack uses ares_query_dnsrec when built against c-ares
>= 1.28 (ARES_VERSION >= 0x011c00). Only ceph-osd-crimson links that
path; classic-osd does not, so add the version floor on the crimson
subpackage only.

Rocky Linux 10 shaman builds use docker.io/rockylinux/rockylinux:10
(os-release 10.1), but dnf builddeps resolve against the live Rocky 10
BaseOS/AppStream repos, which track the newest minor and install
c-ares-devel/c-ares 1.34.6. CMake links ceph-osd-crimson against that
library. Teuthology nodes are provisioned as Rocky 10.1 and install only
the requested Ceph packages without a full distro upgrade, so their
baseline c-ares stays at 1.25.0 (< 1.28, no ares_query_dnsrec). Install
succeeds but OSD startup fails with "undefined symbol: ares_query_dnsrec".
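
The version comparison above can be sketched as follows. This is a minimal illustration, assuming ARES_VERSION packs the version as (major << 16) | (minor << 8) | patch, which matches the 0x011c00 value quoted for 1.28. The helper names `ares_version_hex` and `has_query_dnsrec` are hypothetical and not part of c-ares or Ceph.

```python
def ares_version_hex(major: int, minor: int, patch: int = 0) -> int:
    """Pack a version triple the way ARES_VERSION is assumed to be packed."""
    return (major << 16) | (minor << 8) | patch


# Floor named in the commit message: c-ares 1.28 -> 0x011c00.
FLOOR = ares_version_hex(1, 28)


def has_query_dnsrec(major: int, minor: int, patch: int = 0) -> bool:
    """True if this c-ares version is at or above the 1.28 floor."""
    return ares_version_hex(major, minor, patch) >= FLOOR


# Versions taken from the commit message.
print(hex(FLOOR))                   # build-time floor
print(has_query_dnsrec(1, 25, 0))   # teuthology baseline
print(has_query_dnsrec(1, 34, 6))   # library the build linked against
```

Under that packing, the 1.25.0 baseline falls below the floor while 1.34.6 clears it, which is the mismatch the versioned requirement is meant to catch at install time.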

Require c-ares >= 1.28 on ceph-osd-crimson so dnf upgrades to a suitable
libcares (1.34.6 is already in Rocky 10.1 baseos) or fails cleanly at
install. Ubuntu crimson CI does not show this mismatch: the same LTS is
used for building and testing, and maintainers do not bump upstream
package versions across an LTS lifecycle (only cherry-picked fixes), so
build-time and runtime libc-ares stay aligned.

Signed-off-by: Kautilya Tripathi <kautilya.tripathi@ibm.com>
3 days agoMerge pull request #69007 from MaxKellermann/test__missing_includes
Ilya Dryomov [Wed, 3 Jun 2026 09:39:18 +0000 (11:39 +0200)]
Merge pull request #69007 from MaxKellermann/test__missing_includes

test: add missing includes

Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
3 days agoMerge pull request #66459 from aainscow/ec_direct_reads_pr2
Jon Bailey [Wed, 3 Jun 2026 09:13:42 +0000 (10:13 +0100)]
Merge pull request #66459 from aainscow/ec_direct_reads_pr2

EC Direct Reads

Reviewed-by: Bill Scales <bill_scales@uk.ibm.com>
Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Reviewed-by: Adam Emerson <aemerson@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
3 days agoMerge pull request #68996 from VallariAg/wip-nvmeof-cli-warning
Vallari Agrawal [Wed, 3 Jun 2026 08:07:35 +0000 (13:37 +0530)]
Merge pull request #68996 from VallariAg/wip-nvmeof-cli-warning

mgr/dashboard: show warning message in nvmeof cli

3 days agoMerge pull request #68817 from tchaikov/nvme-of-mon-client
Kefu Chai [Wed, 3 Jun 2026 07:00:43 +0000 (15:00 +0800)]
Merge pull request #68817 from tchaikov/nvme-of-mon-client

cmake,debian: enable ceph-mon-client-nvmeof on Debian derivatives

Reviewed-by: Dan Mick <dan.mick@redhat.com>
3 days agoMerge PR #66748 into main
Venky Shankar [Wed, 3 Jun 2026 06:54:23 +0000 (12:24 +0530)]
Merge PR #66748 into main

* refs/pull/66748/head:
doc: Document that client_dirsize_rbytes confuses rsync

Reviewed-by: Greg Farnum <gfarnum@redhat.com>
Reviewed-by: Anthony D Atri <anthony.datri@gmail.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
3 days agoMerge pull request #69167 from rhcs-dashboard/fix-76989-main
Aashish Sharma [Wed, 3 Jun 2026 04:36:33 +0000 (10:06 +0530)]
Merge pull request #69167 from rhcs-dashboard/fix-76989-main

mgr/dashboard: Add Sync from/sync from all options on master zone edit

Reviewed-by: Naman Munet <nmunet@redhat.com>
3 days agoMerge pull request #69137 from tchaikov/wip-assert-all-fmt
Kefu Chai [Wed, 3 Jun 2026 01:33:47 +0000 (09:33 +0800)]
Merge pull request #69137 from tchaikov/wip-assert-all-fmt

crimson: replace assert_all class with a format-safe function template

Reviewed-by: Ronen Friedman <rfriedma@redhat.com>
3 days agoMerge PR #69222 into main
Patrick Donnelly [Tue, 2 Jun 2026 19:59:10 +0000 (15:59 -0400)]
Merge PR #69222 into main

* refs/pull/69222/head:
qa: install nvme-cli only if distro remains rocky10

Reviewed-by: Redouane Kachach <rkachach@redhat.com>
3 days agoMerge pull request #68941 from adamemerson/wip-rgw-deprecate-omap-datalog
Adam Emerson [Tue, 2 Jun 2026 18:39:46 +0000 (14:39 -0400)]
Merge pull request #68941 from adamemerson/wip-rgw-deprecate-omap-datalog

rgw: Deprecate OMAP datalog

Reviewed-by: J. Eric Ivancich <ivancich@redhat.com>
3 days agoMerge PR #69219 into main
Patrick Donnelly [Tue, 2 Jun 2026 17:59:18 +0000 (13:59 -0400)]
Merge PR #69219 into main

* refs/pull/69219/head:
script/backport-create-issue: catch errors during traversal

Reviewed-by: Yuri Weinstein <yweins@redhat.com>
3 days agoMerge pull request #69174 from dang/wip-dang-merge-standalone
Daniel Gryniewicz [Tue, 2 Jun 2026 16:37:31 +0000 (12:37 -0400)]
Merge pull request #69174 from dang/wip-dang-merge-standalone

Merge rgw-standalone

3 days agoMerge pull request #68756 from phlogistonjohn/jjm-smb-ctl-tool
John Mulligan [Tue, 2 Jun 2026 14:42:52 +0000 (10:42 -0400)]
Merge pull request #68756 from phlogistonjohn/jjm-smb-ctl-tool

smb: add a ceph based smb remote control client tool

Reviewed-by: Avan Thakkar <athakkar@redhat.com>
Reviewed-by: Anoop C S <anoopcs@cryptolab.net>
3 days agoMerge pull request #68774 from aclamk/aclamk-doc-bs-rocksdb-perf-counters
Adam Kupczyk [Tue, 2 Jun 2026 13:16:26 +0000 (15:16 +0200)]
Merge pull request #68774 from aclamk/aclamk-doc-bs-rocksdb-perf-counters

doc/rados/bluestore: RocksDB cache shards, perf counters

3 days agoMerge pull request #68430 from Jayaprakash-ibm/wip-bluefs-spillover-cleaner-rework
Jaya Prakash [Tue, 2 Jun 2026 13:04:54 +0000 (18:34 +0530)]
Merge pull request #68430 from Jayaprakash-ibm/wip-bluefs-spillover-cleaner-rework

os/bluestore: BlueFS Spillover Cleaner Evolution

Reviewed-by: Adam Kupczyk <akupczyk@ibm.com>
4 days agoMerge PR #67709 into main
Venky Shankar [Tue, 2 Jun 2026 10:12:16 +0000 (15:42 +0530)]
Merge PR #67709 into main

* refs/pull/67709/head:
tools/cephfs: always execute scan_{extents,inodes,frags} and cleanup

Reviewed-by: Edwin Rodriguez <edwin.rodriguez1@ibm.com>
4 days agoMerge pull request #69037 from dparmar18/i76728
Venky Shankar [Tue, 2 Jun 2026 08:55:12 +0000 (14:25 +0530)]
Merge pull request #69037 from dparmar18/i76728

mds: persist session auth_name in ESession journal event

Reviewed-by: Christopher Hoffman <choffman@redhat.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
4 days agoMerge pull request #67717 from leonidc/delay-failback
leonidc [Tue, 2 Jun 2026 08:03:36 +0000 (11:03 +0300)]
Merge pull request #67717 from leonidc/delay-failback

nvmeofgw: delay failback

4 days agoMerge pull request #69105 from guits/fix-bypass_workqueue
Guillaume Abrioux [Tue, 2 Jun 2026 06:10:59 +0000 (08:10 +0200)]
Merge pull request #69105 from guits/fix-bypass_workqueue

ceph-volume: detect rotational media under dm-crypt for workqueue bypass

4 days agosrc/test/reclaim: test session reclaim after mds failover 69037/head
Dhairya Parmar [Mon, 25 May 2026 12:01:33 +0000 (17:31 +0530)]
src/test/reclaim: test session reclaim after mds failover

ensure that the new active MDS reads the auth_name from the ESession
event and assigns it to the new session that MDS creates during journal
replay.

NOTE: the MDS failover is carried out by sending the "respawn" command to
the active MDS using libcephfs's ceph_mds_command().

Fixes: https://tracker.ceph.com/issues/76728
Signed-off-by: Dhairya Parmar <dparmar@redhat.com>
4 days agomds: persist session auth_name in ESession journal event
Dhairya Parmar [Wed, 20 May 2026 21:18:15 +0000 (02:48 +0530)]
mds: persist session auth_name in ESession journal event

So that it can be applied to the freshly created session, which happens
while recreating the session in ESession::replay when the OMAP version fell
behind the ESession cmapv; without it, the newly created session would be
rejected as a target when a client tries to reclaim this session.

Fixes: https://tracker.ceph.com/issues/76728
Signed-off-by: Dhairya Parmar <dparmar@redhat.com>
4 days agoMerge pull request #68098 from sunyuechi/riscv-isa-l-support
Kefu Chai [Tue, 2 Jun 2026 03:47:35 +0000 (11:47 +0800)]
Merge pull request #68098 from sunyuechi/riscv-isa-l-support

isa-l: enable on RISC-V

Reviewed-by: Kefu Chai <k.chai@proxmox.com>
4 days agoMerge pull request #69121 from tchaikov/wip-seastore-rolling-in-bg
Kefu Chai [Tue, 2 Jun 2026 02:14:48 +0000 (10:14 +0800)]
Merge pull request #69121 from tchaikov/wip-seastore-rolling-in-bg

crimson/seastore: make RecordSubmitter::wait_available() idempotent

Reviewed-by: Matan Breizman <mbreizma@redhat.com>
4 days agoMerge pull request #69214 from tchaikov/wip-cephadm-iscsi-gw
Kefu Chai [Mon, 1 Jun 2026 23:35:37 +0000 (07:35 +0800)]
Merge pull request #69214 from tchaikov/wip-cephadm-iscsi-gw

qa/cephadm: query iSCSI gateway FQDN from inside the container

Reviewed-by: Redouane Kachach <rkachach@ibm.com>
4 days agoMerge pull request #69026 from jamiepryde/ec-profile-deprecation-warning
SrinivasaBharathKanta [Mon, 1 Jun 2026 23:19:05 +0000 (04:49 +0530)]
Merge pull request #69026 from jamiepryde/ec-profile-deprecation-warning

Add health warning for deprecated EC plugins and techniques

4 days agoMerge PR #68362 into main
Patrick Donnelly [Mon, 1 Jun 2026 19:33:50 +0000 (15:33 -0400)]
Merge PR #68362 into main

* refs/pull/68362/head:
doc: squid 19.2.4 release notes

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>
Reviewed-by: Ilya Dryomov <idryomov@redhat.com>
Reviewed-by: Redouane Kachach <rkachach@redhat.com>
4 days agoMerge pull request #66936 from jacquesh/remove-text-output-from-rados-bench-json
Radoslaw Zarzynski [Mon, 1 Jun 2026 19:30:25 +0000 (21:30 +0200)]
Merge pull request #66936 from jacquesh/remove-text-output-from-rados-bench-json

tools/rados: Remove plain text snippets from rados bench JSON output

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>
Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
4 days agoMerge pull request #69061 from jzhu116-bloomberg/wip-70346
Radoslaw Zarzynski [Mon, 1 Jun 2026 19:00:45 +0000 (21:00 +0200)]
Merge pull request #69061 from jzhu116-bloomberg/wip-70346

osd: unregister admin socket commands in fast shutdown

Reviewed-by: Kefu Chai <k.chai@proxmox.com>
Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
4 days agocommon/options, os/bluestore: add debug option to force bluefs files onto slow device 68430/head
Jaya Prakash [Thu, 7 May 2026 12:09:07 +0000 (12:09 +0000)]
common/options, os/bluestore: add debug option to force bluefs files onto slow device

Fixes: https://tracker.ceph.com/issues/74319
Signed-off-by: Jaya Prakash <jayaprakash@ibm.com>
4 days agoos/bluestore: start/stop BlueFS spillover cleaner on config change
Jaya Prakash [Mon, 16 Mar 2026 19:22:49 +0000 (19:22 +0000)]
os/bluestore: start/stop BlueFS spillover cleaner on config change

Fixes: https://tracker.ceph.com/issues/74319
Signed-off-by: Jaya Prakash <jayaprakash@ibm.com>
(cherry picked from commit dc768b782d54cc6a5dee29a9c4f358e8b9183aa6)

4 days agoos/bluestore: migrated files in 128MB chunks
Jaya Prakash [Fri, 15 May 2026 17:07:32 +0000 (17:07 +0000)]
os/bluestore: migrated files in 128MB chunks

Signed-off-by: Jaya Prakash <jayaprakash@ibm.com>
4 days agoos/bluestore: Spillover Cleaner Thread implementation in BlueFS
Jaya Prakash [Thu, 16 Apr 2026 15:30:28 +0000 (15:30 +0000)]
os/bluestore: Spillover Cleaner Thread implementation in BlueFS

Fixes: https://tracker.ceph.com/issues/74319
Signed-off-by: Jaya Prakash <jayaprakash@ibm.com>
4 days agocommon/options: add bluefs_spillover_cleaner option
Jaya Prakash [Mon, 16 Mar 2026 19:23:05 +0000 (19:23 +0000)]
common/options: add bluefs_spillover_cleaner option

Fixes: https://tracker.ceph.com/issues/74319
Signed-off-by: Jaya Prakash <jayaprakash@ibm.com>
4 days agoqa: install nvme-cli only if distro remains rocky10 69222/head
Patrick Donnelly [Mon, 1 Jun 2026 15:37:23 +0000 (11:37 -0400)]
qa: install nvme-cli only if distro remains rocky10

Notably, only include the `dnf install` commands if the distro is
not overridden by some other mechanism (like cephfs kernel overrides).

This is only a problem for tentacle presently as the k-stock kernel will
override with centos9.

Fixes: https://tracker.ceph.com/issues/77037
Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
4 days agoMerge pull request #69083 from fultheim/adaptive-cleaner-thresholds
Matan Breizman [Mon, 1 Jun 2026 16:12:08 +0000 (19:12 +0300)]
Merge pull request #69083 from fultheim/adaptive-cleaner-thresholds

crimson/os/seastore: adaptive cleaner thresholds from observed workload

Reviewed-by: Matan Breizman <mbreizma@redhat.com>
4 days agoscript/backport-create-issue: catch errors during traversal 69219/head
Patrick Donnelly [Mon, 1 Jun 2026 14:29:13 +0000 (10:29 -0400)]
script/backport-create-issue: catch errors during traversal

A ServerError shouldn't prevent all forward progress.
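
The idea of not letting one failure stop the traversal can be sketched as below. This is only an illustration under stated assumptions: the `ServerError` class here is a local stand-in for whatever exception the tracker client raises, and `process_all` and `handle` are hypothetical names, not the script's actual functions.

```python
class ServerError(Exception):
    """Hypothetical stand-in for a tracker-side server failure."""


def process_all(issues, handle):
    """Apply handle() to every issue, recording failures instead of aborting."""
    failed = []
    for issue in issues:
        try:
            handle(issue)
        except ServerError as exc:
            # Remember the failure and keep going with the remaining issues.
            failed.append((issue, str(exc)))
    return failed


def demo_handle(issue):
    """Hypothetical handler that fails for one particular issue."""
    if issue == 2:
        raise ServerError("tracker returned an error")
    print("processed issue", issue)


print(process_all([1, 2, 3], demo_handle))
```

Collecting the failures rather than swallowing them keeps the errors visible for a later retry or report.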

Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
4 days agoMerge pull request #69203 from tchaikov/wip-libcephfs-test
Kefu Chai [Mon, 1 Jun 2026 14:08:36 +0000 (22:08 +0800)]
Merge pull request #69203 from tchaikov/wip-libcephfs-test

test/libcephfs: reduce SnapDiffDeletionRecreation bulk_count on Windows

Reviewed-by: Igor Fedotov <igor.fedotov@croit.io>
4 days agoMerge pull request #68775 from gardran/wip-gardran-fix-write-v2-deferred-counters
Igor Fedotov [Mon, 1 Jun 2026 13:58:53 +0000 (16:58 +0300)]
Merge pull request #68775 from gardran/wip-gardran-fix-write-v2-deferred-counters

os/bluestore: do not increment *issued_deferred* counters twice

Reviewed-by: Jaya Prakash <jayaprakash@ibm.com>
Reviewed-by: Adam Kupczyk <akupczyk@ibm.com>
4 days agoMerge pull request #69168 from guits/fix-osd-type
Guillaume Abrioux [Mon, 1 Jun 2026 13:49:09 +0000 (15:49 +0200)]
Merge pull request #69168 from guits/fix-osd-type

cephadm: cephadm: omit --osd-type classic for older ceph-volume

4 days agoMerge PR #69152 into main
Patrick Donnelly [Mon, 1 Jun 2026 13:33:25 +0000 (09:33 -0400)]
Merge PR #69152 into main

* refs/pull/69152/head:
script/backport-create-issue: update custom field name

Reviewed-by: Redouane Kachach <rkachach@redhat.com>
4 days agoMerge pull request #67889 from gardran/wip-gardran-no-seq-bytes
Igor Fedotov [Mon, 1 Jun 2026 10:55:28 +0000 (13:55 +0300)]
Merge pull request #67889 from gardran/wip-gardran-no-seq-bytes

os/bluestore: avoid redundant map lookup for deferred op

Reviewed-by: Jaya Prakash <jayaprakash@ibm.com>
4 days agoos/bluestore: do not increment *issued_deferred* counter twice 68775/head
Garry Drankovich [Wed, 6 May 2026 16:19:45 +0000 (19:19 +0300)]
os/bluestore: do not increment *issued_deferred* counter twice
in write v2 mode.

_get_deferred_op() already increments the performance counter on its own.

Signed-off-by: Garry Drankovich <garry.drankovich@clyso.com>
4 days agoqa/cephadm: query iSCSI gateway FQDN from inside the container 69214/head
Kefu Chai [Mon, 1 Jun 2026 10:40:06 +0000 (18:40 +0800)]
qa/cephadm: query iSCSI gateway FQDN from inside the container

rbd-target-api validates that the gateway hostname supplied by gwcli
matches the container's own socket.getfqdn(). Running the same call on
the host can return a different value when the host and container resolve
names differently (e.g. on Rocky 10), causing gateway creation to fail
with HTTP 400 and all subsequent gwcli configuration to break silently.

Query the FQDN from inside the iSCSI container directly so the value is
always consistent with what rbd-target-api expects. This also removes the
"run twice" workaround, which was compensating for host-side DNS
warm-up flakiness rather than addressing the underlying mismatch.

Fixes: https://tracker.ceph.com/issues/74577
Signed-off-by: Kefu Chai <k.chai@proxmox.com>
5 days agoMerge pull request #69143 from guits/fix-cv-vg-lv-batch
Guillaume Abrioux [Mon, 1 Jun 2026 07:57:17 +0000 (09:57 +0200)]
Merge pull request #69143 from guits/fix-cv-vg-lv-batch

ceph-volume: retry lvs after empty result and "devices file is missing" stderr

5 days agotest/libcephfs: reduce SnapDiffDeletionRecreation bulk_count on Windows 69203/head
Kefu Chai [Mon, 1 Jun 2026 05:19:04 +0000 (13:19 +0800)]
test/libcephfs: reduce SnapDiffDeletionRecreation bulk_count on Windows

this test timed out on Windows, while HugeSnapDiffLargeDelta, at half
the file count, passed in 508 seconds on the same run, suggesting this
test takes ~17 minutes on Windows -- beyond the test runner limit.

we haven't profiled the Windows client yet, but the likely culprit is
EventPoll, the Windows messenger backend, which scans the entire poll
array on every event_wait() and poll_ctl() call rather than using a
keyed data structure.

in this change, we reduce bulk_count to 1 << 12 on Windows. the unique
thing this test covers is the deletion-recreation pattern: a name that
exists as a file in snap1, gets deleted, and reappears as a directory in
snap2 -- it must show up in the diff with both snapids. 4096 produces
1024 such pairs, which is enough to exercise that logic. multi-fragment
snapdiff is already covered by HugeSnapDiffLargeDelta, which derives its
file count from mds_bal_split_size and mds_bal_fragment_fast_factor
explicitly to trigger fragmentation.
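
The numbers in this message can be reproduced with a short calculation. It is a rough sketch that assumes runtime scales linearly with file count, which the message implies but does not state. The ratio of four entries per deletion-recreation pair is inferred from the quoted figures, not taken from the test source.

```python
# Runtime estimate: HugeSnapDiffLargeDelta ran at half the file count.
half_run_seconds = 508
estimated_seconds = 2 * half_run_seconds      # assumes linear scaling
estimated_minutes = estimated_seconds / 60

# Reduced bulk_count on Windows and the pairs it is said to produce.
bulk_count = 1 << 12
pairs = 1024                                  # figure quoted in the message
entries_per_pair = bulk_count // pairs        # inferred ratio, an assumption

print(estimated_seconds, round(estimated_minutes, 1))
print(bulk_count, entries_per_pair)
```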

Fixes: https://tracker.ceph.com/issues/77015
Signed-off-by: Kefu Chai <k.chai@proxmox.com>
5 days agoMerge pull request #69135 from VallariAg/wip-nvmeof-teuthology-mon-conf
Vallari Agrawal [Sun, 31 May 2026 16:00:05 +0000 (21:30 +0530)]
Merge pull request #69135 from VallariAg/wip-nvmeof-teuthology-mon-conf

qa/suites/nvmeof: set beacon grace and connect panic

5 days agoMerge pull request #66500 from AliMasarweh/wip-alimasa-global-cors
Ali Masarwa [Sun, 31 May 2026 10:30:56 +0000 (13:30 +0300)]
Merge pull request #66500 from AliMasarweh/wip-alimasa-global-cors

RGW: add support for global CORS rule

Reviewed-by: Naman Munet <naman.munet@ibm.com>, Casey Bodley <cbodley@redhat.com>
6 days agoMerge pull request #69185 from sunyuechi/wip-with-system-spdk
Kefu Chai [Sun, 31 May 2026 10:26:14 +0000 (18:26 +0800)]
Merge pull request #69185 from sunyuechi/wip-with-system-spdk

cmake,blk/spdk: support WITH_SYSTEM_SPDK

Reviewed-by: Kefu Chai <k.chai@proxmox.com>
6 days agoMerge pull request #68745 from Hezko/bugfix-13279
Hezko [Sun, 31 May 2026 08:04:07 +0000 (11:04 +0300)]
Merge pull request #68745 from Hezko/bugfix-13279

mgr/dashboard: fix listener add errors

6 days agoMerge pull request #69044 from xxhdx1985126/wip-seastore-rewrite-fix
Matan Breizman [Sun, 31 May 2026 07:20:36 +0000 (10:20 +0300)]
Merge pull request #69044 from xxhdx1985126/wip-seastore-rewrite-fix

crimson/os/seastore: force rewrite transactions to conflict with others if they involve insertions on the lba tree

Reviewed-by: Matan Breizman <mbreizma@redhat.com>
6 days agocmake: add WITH_SYSTEM_SPDK to link a system-installed SPDK 69185/head
Sun Yuechi [Sat, 30 May 2026 06:15:12 +0000 (14:15 +0800)]
cmake: add WITH_SYSTEM_SPDK to link a system-installed SPDK

By default ceph builds the bundled src/spdk fork via BuildSPDK. Add a
WITH_SYSTEM_SPDK option that instead locates a distro-provided SPDK
through a new Findspdk.cmake (pkg-config based, modelled on
Finddpdk.cmake), exposing the same spdk::spdk target.

Signed-off-by: Sun Yuechi <sunyuechi@iscas.ac.cn>
6 days agoblk/spdk: support both old and new spdk_env_opts member names
Sun Yuechi [Sat, 30 May 2026 06:11:11 +0000 (14:11 +0800)]
blk/spdk: support both old and new spdk_env_opts member names

SPDK 21.01 renamed two struct spdk_env_opts members: pci_whitelist ->
pci_allowed and master_core -> main_core. Guard the assignments in
NVMEDevice with SPDK_VERSION.

pci_whitelist -> pci_allowed:  https://github.com/spdk/spdk/commit/4a6a2824119b
master_core -> main_core:      https://github.com/spdk/spdk/commit/fe137c8970bf

Signed-off-by: Sun Yuechi <sunyuechi@iscas.ac.cn>
7 days agoMerge pull request #68934 from cbodley/wip-76578
Casey Bodley [Fri, 29 May 2026 17:52:00 +0000 (13:52 -0400)]
Merge pull request #68934 from cbodley/wip-76578

rgw/beast: add ssl_ciphersuites option for tls 1.3

Reviewed-by: Matt Benjamin <mbenjamin@redhat.com>
7 days agorgw/posix: remove path from table names 69174/head
Nithya Balachandran [Thu, 16 Apr 2026 10:01:50 +0000 (10:01 +0000)]
rgw/posix: remove path from table names

Removes the DB directory path from the table names.

Signed-off-by: Nithya Balachandran <nithya.balachandran@ibm.com>
7 days agorgw/posix: implement the quota feature
Nithya Balachandran [Tue, 24 Mar 2026 08:17:52 +0000 (08:17 +0000)]
rgw/posix: implement the quota feature

Implement the quota feature for the POSIX driver.

Signed-off-by: Nithya Balachandran <nithya.balachandran@ibm.com>
7 days agoRGW | standalone: add support for accounts in dbstore
Ali Masarwa [Sun, 12 Apr 2026 13:07:38 +0000 (16:07 +0300)]
RGW | standalone: add support for accounts in dbstore

Signed-off-by: Ali Masarwa <amasarwa@redhat.com>
7 days agoradosgw-admin: Remove dependence on RADOS
Samarah Uriarte [Tue, 24 Mar 2026 15:21:00 +0000 (15:21 +0000)]
radosgw-admin: Remove dependence on RADOS

Signed-off-by: Samarah Uriarte <samarah.uriarte@ibm.com>
7 days agoRGW POSIX - Fix POSIX unittest
Daniel Gryniewicz [Mon, 30 Mar 2026 14:49:47 +0000 (10:49 -0400)]
RGW POSIX - Fix POSIX unittest

Signed-off-by: Daniel Gryniewicz <dang@fprintf.net>
7 days agorgw/posix: fix cached size of uploaded objects
Matt Benjamin [Tue, 24 Mar 2026 18:10:28 +0000 (14:10 -0400)]
rgw/posix: fix cached size of uploaded objects

Moves file open and stat into the (atomic) link step, so size
is correctly interned in the cache.  Fix suggested by dang.

Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
7 days agorgw/posix: fix crash in radosgw-admin
Nithya Balachandran [Tue, 24 Mar 2026 11:33:15 +0000 (11:33 +0000)]
rgw/posix: fix crash in radosgw-admin

The POSIXBucket copy constructor incorrectly calls .get() on a
temporary unique_ptr returned by clone(), causing immediate
deletion of the Directory object. This leaves a dangling pointer
that triggers a segfault during destruction.

Signed-off-by: Nithya Balachandran <nithya.balachandran@ibm.com>
7 days agocohort_lru: keep strict discard, but from LRU
Matt Benjamin [Wed, 26 Nov 2025 23:17:02 +0000 (18:17 -0500)]
cohort_lru: keep strict discard, but from LRU

Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
7 days agoposixdriver: properly destruct BucketCacheEntry objects
Matt Benjamin [Wed, 26 Nov 2025 14:00:03 +0000 (09:00 -0500)]
posixdriver:  properly destruct BucketCacheEntry objects

* avoids leak of database handles during eviction

Also adds missing return-ref in invalidate_entry--this would
leak a cache entry.

With this change, we can now tolerate indefinite s3-test runs
with rgw_posix_cache_max_buckets=100.

Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
7 days agocohort_lru: crash fix and reduce lock contention
Matt Benjamin [Tue, 25 Nov 2025 17:41:37 +0000 (12:41 -0500)]
cohort_lru: crash fix and reduce lock contention

Fixes crash induced by taking the address of the last element
of an empty intrusive list (!).

Also, introduces active queue, reducing potential for lock
contention in evict_block():

* entries are tracked on lane::active_queue when lru_refcnt > 1
** on some lane::q otherwise

Objects transition between queues when lru_refcnt changes value--
a value of 0 triggers deletion, as before.
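
The placement rule described above can be modelled in a few lines. This is a toy sketch of the stated rule only, not the cohort_lru implementation. The function name `queue_for` and the string labels are hypothetical.

```python
def queue_for(lru_refcnt: int) -> str:
    """Return where an entry belongs for a given lru_refcnt, per the rule above."""
    if lru_refcnt < 0:
        raise ValueError("lru_refcnt cannot be negative")
    if lru_refcnt == 0:
        return "delete"          # a value of 0 triggers deletion
    if lru_refcnt > 1:
        return "active_queue"    # tracked on lane::active_queue
    return "q"                   # otherwise on some lane::q


for refcnt in (0, 1, 2, 3):
    print(refcnt, queue_for(refcnt))
```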

Fixes: https://tracker.ceph.com/issues/73992
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
7 days agoposixdriver: can move buffer::list leaving scope
Matt Benjamin [Fri, 13 Feb 2026 20:29:58 +0000 (15:29 -0500)]
posixdriver: can move buffer::list leaving scope

Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
7 days agoposixdriver: add provisional manifest
Matt Benjamin [Wed, 4 Feb 2026 02:05:47 +0000 (21:05 -0500)]
posixdriver: add provisional manifest

initially, it is just used to remember the multipart layout, but
likely will see other use.

Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
7 days agoposixdriver: fix cksum_type, flags propagation
Matt Benjamin [Tue, 3 Feb 2026 22:12:22 +0000 (17:12 -0500)]
posixdriver: fix cksum_type, flags propagation

Posixdriver doesn't serialize POSIXMultipartUpload, but rather a
member mp_obj of type POSIXMPObj--so to avoid losing the latter's
inherited cksum_type and cksum_flags members (which are already
copied in), copy them out in POSIXMultiPartUpload::get_info() which
we need to call to copy out dest_placement anyway.

(oops, cksum_type was copied in, but not cksum_flags)

Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
7 days agoposixdriver: fix cache fill of versioned buckets
Matt Benjamin [Sun, 15 Feb 2026 20:56:03 +0000 (15:56 -0500)]
posixdriver: fix cache fill of versioned buckets

This change completes the original intent (hypothesized) to
conditionally set the FLAG_CURRENT bit on just the current
entries during bucket listing cache fill.

This avoids interning 2 copies of the current version of each
object in the listing cache, and also correctly sets the
FLAG_CURRENT bit as required--so the current versions are correctly
reported in versioned listings.

Janky logic to find the current version by explicitly chasing
the symlink target and saving it outside the enumeration scope
has been replaced with proper call to stat() provided by Dang.

Symlink::fill_cache() is no longer used, so removed.

Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
7 days agoposixdriver: add bde.flags to bucket cache serde cycle
Matt Benjamin [Sun, 15 Feb 2026 15:21:28 +0000 (10:21 -0500)]
posixdriver: add bde.flags to bucket cache serde cycle

The upstream logic (mostly?) correctly uses bde.flags when filling
the cache for versioned objects, but cache ser(de)ialization has
been discarding that member.

This change suppresses the visible result where RGW incorrectly produces
multiple versions in non-versioned listing because none uniquely sets
FLAG_CURRENT:

mbenjamin@fedora:~/dev/rgw/s3_py/python$ s3cmd ls s3://sheik2
2026-02-14 22:44           22  s3://sheik2/ginfizz_1
2026-02-14 22:44           22  s3://sheik2/ginfizz_1
2026-02-14 22:44           22  s3://sheik2/ginfizz_1
2026-02-14 22:44           22  s3://sheik2/ginfizz_2
2026-02-14 22:44           22  s3://sheik2/ginfizz_2
2026-02-14 22:44           22  s3://sheik2/ginfizz_2

Corrected result is:

mbenjamin@fedora:~/dev/rgw/s3_py/python$ s3cmd ls s3://sheik2
2026-02-14 22:44           22  s3://sheik2/ginfizz_1
2026-02-14 22:44           22  s3://sheik2/ginfizz_2

Cached listings for versions are still incorrect in containing an
extra entry for the "current" version with an empty instance
(from the Symlink)--the visible effect being that list-object-versions
output is incorrect (no entry is sent with IsLatest, after the
empty instance version has been filtered out).

Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
7 days agoposixdriver: propagate object lock attrs across multipart upload
Matt Benjamin [Thu, 12 Feb 2026 19:13:17 +0000 (14:13 -0500)]
posixdriver: propagate object lock attrs across multipart upload

Retention rules can be specified in init-multipart and, if present,
need to propagate to the final object if the upload completes.

Needed for (e.g.) test_object_lock_delete_multipart_object_with_retention

Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
7 days agoposixdriver: page in all xattrs in POSIXObject::load_obj_state()
Matt Benjamin [Wed, 11 Feb 2026 21:44:42 +0000 (16:44 -0500)]
posixdriver: page in all xattrs in POSIXObject::load_obj_state()

This seems to be needed for (at least) object lock retention period
checks, e.g., in DeleteObject::execute().

Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
7 days agoMerge pull request #68899 from batrick/i76586
Ilya Dryomov [Fri, 29 May 2026 16:02:02 +0000 (18:02 +0200)]
Merge pull request #68899 from batrick/i76586

qa: ignore POOL_FULL for rbd tests exercising full pools

Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
7 days agoMerge PR #67683 into main
Patrick Donnelly [Fri, 29 May 2026 15:08:23 +0000 (11:08 -0400)]
Merge PR #67683 into main

* refs/pull/67683/head:
qa/tasks/cbt: construct venv just for cbt
qa/distros: use consistent naming
qa/tasks/nvme_loop: fix nvme loop task for ubuntu noble
qa/distros: add ubuntu_24.04 as supported container host
qa/distros: bump ubuntu_latest.yaml to 24.04
qa/distros: add all/ubuntu_24.04.yaml
qa/suites/rados/encoder: use random supported distro
qa/ceph-ansible: symlink supported-random-distro$
qa/fs/fscrypt: symlink supported-random-distro$
qa/cephmetrics: symlink supported-random-distro$

Reviewed-by: Redouane Kachach <rkachach@redhat.com>
Reviewed-by: Ilya Dryomov <idryomov@redhat.com>
7 days agoMerge PR #69163 into main
Patrick Donnelly [Fri, 29 May 2026 15:07:03 +0000 (11:07 -0400)]
Merge PR #69163 into main

* refs/pull/69163/head:
qa/tasks: capture CommandCrashedError when running nvme list cmd

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>
7 days agoMerge pull request #66439 from aclamk/aclamk-bs-simpler-flush
Igor Fedotov [Fri, 29 May 2026 15:04:47 +0000 (18:04 +0300)]
Merge pull request #66439 from aclamk/aclamk-bs-simpler-flush

bluestore/bluefs: FileWriter simpler flush

Reviewed-by: Igor Fedotov <igor.fedotov@croit.io>
7 days agoMerge pull request #68607 from dheart-joe/wip-bluestore-unshare-blob
Igor Fedotov [Fri, 29 May 2026 15:03:05 +0000 (18:03 +0300)]
Merge pull request #68607 from dheart-joe/wip-bluestore-unshare-blob

os/bluestore: optimize shared blob unsharing during snapshot removal

Reviewed-by: Igor Fedotov <igor.fedotov@croit.io>
7 days agoMerge pull request #69166 from sunyuechi/wip-rgw-swift-error-handler-out-of-line
Casey Bodley [Fri, 29 May 2026 15:01:51 +0000 (11:01 -0400)]
Merge pull request #69166 from sunyuechi/wip-rgw-swift-error-handler-out-of-line

rgw: move SWIFT error_handler out-of-line to fix link failure

Reviewed-by: Casey Bodley <cbodley@redhat.com>
7 days agoMerge pull request #68898 from gardran/wip-gardran-show-esb-in-metadata
Igor Fedotov [Fri, 29 May 2026 15:01:44 +0000 (18:01 +0300)]
Merge pull request #68898 from gardran/wip-gardran-show-esb-in-metadata

os/bluestore: dump effective elastic shared blobs mode in OSD metadata report

Reviewed-by: Adam Kupczyk <akupczyk@ibm.com>
7 days agotools/rados: Remove plain text snippets from rados bench JSON output 66936/head
Jacques Heunis [Thu, 15 Jan 2026 12:11:11 +0000 (12:11 +0000)]
tools/rados: Remove plain text snippets from rados bench JSON output

`rados bench` emits performance stats as its output. It is very helpful
for this output to be in a machine-readable format and the CLI provides
the `--format=json` flag to achieve this.

There are some logs that do not respect the formatter flag though, as
they provide status updates as the tool is running and do not form part
of the output dataset. This prevents the contents of stdout from being
valid JSON which destroys the machine-readability of the output.

To resolve this we gate those status messages behind a check for the
formatter. If any specific formatter is provided we do not emit the
status logs. This leaves the plaintext output largely untouched while
helping the machine-readable output to be well-formed.
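
The gating described above can be sketched as follows. The real tool is not written in Python, so this is only an illustration of the idea. The function `render_bench`, the status text, and the result keys are all hypothetical placeholders.

```python
import json
from typing import Optional


def render_bench(results: dict, formatter: Optional[str] = None) -> str:
    """Render benchmark output, emitting status text only in plain-text mode."""
    lines = []
    if formatter is None:
        # Status updates are not part of the dataset, so they are only
        # written when no formatter was requested.
        lines.append("status: benchmark running")
        for key, value in results.items():
            lines.append(f"{key}: {value}")
    elif formatter == "json":
        lines.append(json.dumps(results))
    else:
        raise ValueError(f"formatter {formatter!r} is not covered by this sketch")
    return "\n".join(lines)


sample = {"bandwidth_mb_sec": 1.5}           # made-up value for illustration
print(render_bench(sample))                  # plain text, includes status line
print(json.loads(render_bench(sample, "json")))  # parses cleanly as JSON
```

Because the status line is skipped whenever a formatter is set, the whole of the formatted output can be handed to a JSON parser without preprocessing.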

Fixes: https://tracker.ceph.com/issues/74370
Signed-off-by: Jacques Heunis <jheunis@bloomberg.net>
7 days agoMerge pull request #69144 from gbregman/main
Gil Bregman [Fri, 29 May 2026 14:39:57 +0000 (17:39 +0300)]
Merge pull request #69144 from gbregman/main

nvmeof: Change the NVMEOF image version to 1.8

7 days agorgw/datalog: `radosgw-admin` will no longer convert datalog to omap 68941/head
Adam C. Emerson [Thu, 5 Feb 2026 22:02:44 +0000 (17:02 -0500)]
rgw/datalog: `radosgw-admin` will no longer convert datalog to omap

Omap-backed datalogs are deprecated, so we remove the ability to
convert to them.

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
7 days agorgw/datalog: Remove `rgw default data log backing` option
Adam C. Emerson [Thu, 5 Feb 2026 19:27:43 +0000 (14:27 -0500)]
rgw/datalog: Remove `rgw default data log backing` option

Omap-backed datalogs are deprecated. This option is removed and we no
longer support creating new clusters using them.

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
7 days agoMerge pull request #68095 from lumir-sliva/fix-deprecated-egrep-fgrep
Ilya Dryomov [Fri, 29 May 2026 13:52:14 +0000 (15:52 +0200)]
Merge pull request #68095 from lumir-sliva/fix-deprecated-egrep-fgrep

qa,src: replace deprecated egrep/fgrep with grep -E/grep -F

Reviewed-by: Kefu Chai <k.chai@proxmox.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
7 days agoqa: Ignore deprecated EC plugin warning in teuthology tests 69026/head
Jamie Pryde [Fri, 29 May 2026 11:44:56 +0000 (12:44 +0100)]
qa: Ignore deprecated EC plugin warning in teuthology tests

Add DEPRECATED_EC_PLUGIN to the list of health warnings to
ignore in the thrash-erasure-code-* tests that use deprecated
plugins or techniques. It is expected that this warning will
be raised.

Signed-off-by: Jamie Pryde <jamiepry@uk.ibm.com>
7 days agocephadm: cephadm: omit --osd-type classic for older ceph-volume 69168/head
Guillaume Abrioux [Fri, 29 May 2026 11:13:52 +0000 (13:13 +0200)]
cephadm: cephadm: omit --osd-type classic for older ceph-volume

tentacle doesn't know that flag yet.
During an upgrade, teuthology tests can break.
With this fix, we only add the flag when osd_type isn't classic.
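
A small sketch of the conditional described above. The helper name `build_ceph_volume_args` and the argument list shape are assumptions for illustration, not the cephadm code; it only shows the flag being left out for the classic OSD type.

```python
def build_ceph_volume_args(base_args, osd_type="classic"):
    """Append --osd-type only for non-classic OSDs (hypothetical helper)."""
    args = list(base_args)  # copy so the caller's list is not modified
    if osd_type != "classic":
        # Older ceph-volume releases do not know this flag, so it is only
        # passed when it actually changes behaviour.
        args += ["--osd-type", osd_type]
    return args
```

Calling it with the default `osd_type` returns the base arguments unchanged, which is what keeps an older ceph-volume working during an upgrade.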

Fixes: https://tracker.ceph.com/issues/76968
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
7 days agomgr/dashboard: Add Sync from/sync from all options on master zone edit 69167/head
Aashish Sharma [Fri, 29 May 2026 11:01:50 +0000 (16:31 +0530)]
mgr/dashboard: Add Sync from/sync from all options on master zone edit

In the dashboard, the master zone's edit functionality includes the
expected "Sync from Zones" and "Sync from All Zones" options.

Fixes: https://tracker.ceph.com/issues/76989
Signed-off-by: Aashish Sharma <aasharma@redhat.com>
7 days agorgw: move SWIFT error_handler out-of-line to fix link failure 69166/head
Sun Yuechi [Fri, 29 May 2026 10:39:51 +0000 (18:39 +0800)]
rgw: move SWIFT error_handler out-of-line to fix link failure

The two error_handler overrides are defined inline in rgw_rest_swift.h
and delegate to RGWSwiftWebsiteHandler::error_handler, a non-virtual
function defined in rgw_rest_swift.cc (librgw_a.a). Because the header
is included by rgw_rest.cc, the inline bodies are emitted in
librgw_common.a, which then ODR-uses that symbol across archives.

The link line lists librgw_a.a before librgw_common.a, and GNU ld only
pulls archive members on demand: when librgw_a.a is scanned nothing yet
references RGWSwiftWebsiteHandler::error_handler, so rgw_rest_swift.cc.o
is dropped and the symbol is later unresolved. This shows up as a link
failure with gcc 16 -O2.

Move the two bodies into rgw_rest_swift.cc next to the function they
call, so the ODR-use stays within the same object and the build no
longer depends on archive scan order. No functional change.

Signed-off-by: Sun Yuechi <sunyuechi@iscas.ac.cn>
8 days agoqa/suites/nvmeof: ignore "have only 1 nvmeof gateway" 69135/head
Vallari Agrawal [Wed, 27 May 2026 12:17:55 +0000 (17:47 +0530)]
qa/suites/nvmeof: ignore "have only 1 nvmeof gateway"

Add "have only 1 nvmeof gateway" to ignorelist.
NVMEOF_SINGLE_GATEWAY is already part of ignorelist
but tests sometimes fail on "have only 1 nvmeof gateway".

Thrasher or scalability tests can trigger this, but there
are enough asserts to ensure all expected gateways are
up, so we can safely ignore this healthcheck warning.

Fixes: https://tracker.ceph.com/issues/75913
Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com>
8 days agocrimson/os/seastore: add safety clamp to adaptive hard_limit and crash_floor 69083/head
Shai Fultheim [Fri, 29 May 2026 09:26:39 +0000 (12:26 +0300)]
crimson/os/seastore: add safety clamp to adaptive hard_limit and crash_floor

Signed-off-by: Shai Fultheim <shai.fultheim@gmail.com>
8 days agoqa/tasks: capture CommandCrashedError when running nvme list cmd 69163/head
Redouane Kachach [Fri, 29 May 2026 09:09:44 +0000 (11:09 +0200)]
qa/tasks: capture CommandCrashedError when running nvme list cmd

The safe_while retry loop does not catch exceptions, so a
CommandCrashedError from `nvme list` bypasses it entirely. Catch
CommandCrashedError and continue the retry loop instead.
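
A sketch of the pattern, using a plain `for` loop instead of teuthology's `safe_while` so it runs standalone. The `CommandCrashedError` class below is a local stand-in and `retry_until_success` is a hypothetical helper; sleeping between attempts is omitted for brevity.

```python
class CommandCrashedError(Exception):
    """Local stand-in for the exception named in the commit message."""


def retry_until_success(run_cmd, tries=5):
    """Call run_cmd up to `tries` times, treating a crash as retryable.

    Returns (result, attempt_number) on the first successful call.
    """
    for attempt in range(1, tries + 1):
        try:
            return run_cmd(), attempt
        except CommandCrashedError:
            # Without this handler the exception would escape the loop
            # on the first crash and no retry would happen.
            continue
    raise RuntimeError(f"command did not succeed after {tries} tries")
```

A command that crashes on its first calls and then succeeds is retried rather than aborting the caller.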

Fixes: https://tracker.ceph.com/issues/76984
Signed-off-by: Redouane Kachach <rkachach@ibm.com>
8 days agocrimson/os/seastore: adaptive cleaner gc_max from observed user-burst peak
Shai Fultheim [Tue, 19 May 2026 22:53:21 +0000 (01:53 +0300)]
crimson/os/seastore: adaptive cleaner gc_max from observed user-burst peak

The previous commit adapts `hard_limit` to track the cleaner's observed
open-segment peak, removing the hard-coded `.10` floor and cutting WAF
~43%. With hard_limit adaptive, the remaining WAF lever is `gc_max` —
the threshold that gates when the cleaner runs in non-emergency mode
and therefore the cluster's steady-state operating fill. Lower gc_max
= higher fill = more dead bytes per reclaim cycle = fewer live bytes
copied = lower GC component of WAF.

The hard-coded default of `0.15` (cleaner triggers at 85% segment
fill) is over-provisioned for the typical cluster. On the bench
workload the empirically optimal `gc_max` is about 0.08, which at the
default 0.15 means ~7% of cluster space sits unused and ~1.5x of WAF
is paid for the privilege.

This commit makes gc_max adaptive: it decays each window from its
initial static value toward an observation-derived floor

  target_floor = hard_limit + (peak_projected_used / total)

The floor is the smallest gap the cluster needs to absorb its observed
worst-case in-flight user reservation. `peak_projected_used` is tracked
across the cluster's lifetime with a slow exponential decay applied
each adjust cycle.

Decay rate
==========

The decay multiplier is `0.995` per 30 s elapsed window. The decay is
applied lazily: each call to `maybe_adjust_thresholds()` raises 0.995
to the actual elapsed seconds / 30. This way the decay catches up
correctly even if the background process was idle and the hook went
uncalled for many cycles. A naive per-call multiplication would freeze
the decay during idle phases (the issue observed in v1 testing where
peak stayed at its high-water mark across a 45-minute idle window).

Decay timeline (fraction of original value remaining, on a system
where maybe_adjust_thresholds is called at least every 30 s during
idle — or any interval, since the decay is now elapsed-time-based):

  - half-life: log(0.5) / log(0.995) ≈ 138 windows ≈ 69 min ≈ 1 hour
  - peak retention timeline:
       5 min  → 95 %
      30 min  → 74 %
       1 hour → 55 %
       4 hours →  9 %
      12 hours →  0.07 %
      24 hours → effectively 0

So a single observed peak influences gc_max strongly for ~1 hour,
noticeably for ~4 hours, and is essentially forgotten within a day.
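
The elapsed-time decay can be checked with a few lines of Python. This is a sketch of the arithmetic only; the function names are made up here, and the constants are the 0.995 multiplier and 30 s window stated above.

```python
import math

DECAY_PER_WINDOW = 0.995  # multiplier per elapsed window
WINDOW_SEC = 30.0         # window length in seconds


def decay_factor(elapsed_sec):
    """Fraction of a peak remaining after elapsed_sec seconds."""
    return DECAY_PER_WINDOW ** (elapsed_sec / WINDOW_SEC)


def half_life_windows():
    """Number of windows after which half of the peak remains."""
    return math.log(0.5) / math.log(DECAY_PER_WINDOW)


def apply_lazy_decay(peak, elapsed_sec):
    """Decay a stored peak by however much time has actually passed."""
    return peak * decay_factor(elapsed_sec)
```

Because the exponent is the real elapsed time, one call after a 90 s gap gives the same result as three calls 30 s apart, which is the property that keeps the decay correct across idle phases.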

This is sized to be much longer than transient bench phases (a peak
retains about 85% of its value after 16 min, i.e. 0.995^32, so it never
rolls out prematurely within a bench) yet much shorter than
workload-shift timescales (a workload that genuinely eases sees gc_max
shrink within hours).

Re-discovery
============

The decay lets gc_max eventually re-discover lower floors when a
workload genuinely eases, while preserving observed peaks long enough
that transient bursts inside a steady workload don't roll out
prematurely.

gc_max is bounded below by the floor at all times — so the workload's
observed needs are always satisfied without static tuning. Each
window, gc_max moves halfway toward the floor (`gc_max = max(floor,
(gc_max + floor) / 2)`). This is binary-search-style convergence:
distance to floor halves per window. When the floor rises (workload
reveals a new peak), gc_max jumps up to meet it immediately. When the
floor falls (peaks have decayed below current gc_max), gc_max halves
toward the lower value over the next several windows.
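
A sketch of the update rule quoted above, together with the `target_floor` formula from earlier in this message. The function names are illustrative, and the numbers in the usage note are arbitrary inputs rather than measurements.

```python
def target_floor(hard_limit, peak_projected_used, total):
    """Floor = hard_limit plus the observed peak as a fraction of total."""
    return hard_limit + peak_projected_used / total


def step_gc_max(gc_max, floor):
    """One window: move halfway toward the floor, never below it."""
    return max(floor, (gc_max + floor) / 2)


def converge(gc_max, floor, windows):
    """Apply step_gc_max for a number of windows and return each value."""
    history = []
    for _ in range(windows):
        gc_max = step_gc_max(gc_max, floor)
        history.append(gc_max)
    return history
```

Starting from 0.15 with a floor of 0.05, the distance to the floor halves each window (0.10, 0.075, 0.0625), while a floor above the current gc_max is adopted in a single step.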

Bootstrap safety: gc_max retains the existing static initial value
(0.15), so a freshly mounted cleaner runs at the same operating point
as today's code until observations have accumulated. This avoids the
"cluster crashes before adaptive sees a workload" failure mode that
naive `gc_max = hard_limit + observed` produces.

Implementation
==============

A single double member on SegmentCleaner: `peak_projected_used_decayed`
is updated to `max(current, projected_used_bytes)` on each
`try_reserve_projected_usage()` call. `maybe_adjust_thresholds()`
applies `std::pow(0.995, elapsed_sec / 30.0)` decay on each invocation
(every ≥30 s in steady state, longer if the cleaner was idle). The
floor uses this value directly.

Bench measurements (qa/standalone/crimson randwrite, 1 MiB writes,
32 GiB per-OSD null_blk, 70% fill, 1280 GiB write target):

  Configuration                          | WAF     | Duration | Status
  ---------------------------------------|---------|----------|---------
  Static defaults (gc_max=.15, hard=.10) |   5.749 |   33 min | clean
  Manual tuned (gc_max=.08, hard=.02)    |   2.926 |   16 min | clean
  Adaptive hard_limit only               |   3.276 |   17 min | clean
  Adaptive hard_limit + gc_max (HEAD)    |   2.829 |   17 min | clean

Adaptive gc_max reduces WAF a further 14% vs hard_limit-only (3.276 ->
2.829) and slightly beats the hand-tuned manual point (2.926). The
per-OSD adaptation captures workload asymmetry that uniform static
defaults can't: on the bench's PG-imbalanced setup the lightly-loaded
osd.0 settled at gc_max=0.026 (much tighter than the manual 0.08)
while osd.1 took the full traffic and settled at gc_max=0.084. Both
extract maximum efficiency for their actual load instead of running
at worst-case-conservative values.

A separate decay-validation run (45-minute idle interlude between two
heavy phases) confirmed that the lazy decay catches up correctly even
when the background process was dormant during the idle phase.

No new workload-tuned constants are introduced. The literal numbers
in this commit are:
  - the 30 s window from the previous commit (time scale of the
    feedback loop)
  - the binary-search halving rate (control geometry, not workload-
    specific; could be 1/3 or 1/4 with similar convergence)
  - the 0.995 decay rate (per-window multiplier; gives the ~1-hour
    half-life and ~24-hour full-forget behaviour described above;
    recompile-only)

The existing `get_default()` value of `0.15` is left untouched as the
bootstrap initial — operators who disable adaptive control (future
config knob) revert to today's exact behaviour.

Signed-off-by: Shai Fultheim <shai.fultheim@gmail.com>
8 days agocrimson/os/seastore: adaptive cleaner hard_limit from observed open-segment peak
Shai Fultheim [Tue, 19 May 2026 10:55:02 +0000 (13:55 +0300)]
crimson/os/seastore: adaptive cleaner hard_limit from observed open-segment peak

The cleaner's `available_ratio_hard_limit` controls when user IO blocks
(once projected_aratio < hard_limit). Setting it too high causes
unnecessary blocks during transient pressure; setting it too low risks
running out of free segments for the cleaner's own working set, which
aborts the OSD with "seastore device size setting is too small".

The current default of `0.10` was chosen empirically and does not scale
with cluster geometry. On a 32 GiB cluster with default 64 MiB segments,
`0.10` reserves ~3 GiB of always-empty space. The cleaner's actual
named-writer working set is 1 journal + `seastore_hot_tier_generations`
hot writers + `seastore_cold_tier_generations` cold writers + 1
metadata writer = (hot + cold + 2) segments. For the typical defaults
(5 hot, 3 cold) that is 10 segments = 640 MiB on a 32 GiB OSD = 2.0%.
Reserving 10% leaves ~80% of that "headroom" sitting unused, which
causes the cluster to operate at lower fill, accumulate fewer dead
bytes per segment, and pay 4-5x WAF on garbage collection cycles.

This commit makes hard_limit adaptive: track the peak open-segment
count observed during each 30 s window, then derive

  hard_limit = max(observed_peak, named_writers) + 1
             ────────────────────────────────────────
                       (segments_in_cluster)

where the "+ 1" segment is the minimum safety unit (one more open
segment than ever observed). The `named_writers` count is the
architectural floor below which the cleaner cannot allocate; staying
above it prevents the abort. `observed_peak` floats to track the
actual transient overhead introduced by segment transitions in the
running workload.
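
The formula can be evaluated for the example geometry above (32 GiB, 64 MiB segments, 5 hot and 3 cold generations). This is a sketch with made-up function names; treating the 32 GiB device as the full segment count is an assumption taken from that example.

```python
def named_writers(hot_generations, cold_generations):
    """Journal + hot writers + cold writers + metadata writer."""
    return hot_generations + cold_generations + 2


def adaptive_hard_limit(observed_peak, named, total_segments):
    """(max(observed_peak, named_writers) + 1) / total segment count."""
    return (max(observed_peak, named) + 1) / total_segments


# 32 GiB of 64 MiB segments gives 512 segments.
TOTAL_SEGMENTS = (32 * 1024) // 64
```

With an observed peak of 7 open segments the named-writer floor of 10 dominates, so the result is 11 / 512, which rounds to 0.0215.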

Implementation
==============

`AsyncCleaner::maybe_adjust_thresholds()` is added as a virtual no-op
hook; `SegmentCleaner` overrides it. The hook is invoked once per
`BackgroundProcess::run()` iteration. Each call samples the current
open-segment count into the rolling window peak. Every 30 s, the
window's peak is consumed to recompute hard_limit, and the window
resets.

`config_t config` loses its `const` qualifier; the only mutation is
this hook, which is the single writer in the cleaner's shard.

This commit only adapts `hard_limit`. `gc_max` remains at its existing
default (0.15). A follow-up commit will add adaptive `gc_max` driven
by observed user-burst and cleaner-cycle peaks; that is where the
remaining WAF reduction lives.

Bench measurements
==================

qa/standalone/crimson randwrite at 70% fill, 1 MiB writes, 32 GiB
per-OSD null_blk backing, 1280 GiB write target. Comparison against
the same workload with static `hard_limit = 0.10`:

  Metric                | static (0.15, 0.10) | adaptive hard_limit |
  ----------------------|---------------------|---------------------|
  user_written          |          1,374 GiB  |          1,374 GiB  |
  device_written        |          7,901 GiB  |          4,503 GiB  |
  WAF (d / u)           |              5.749  |              3.276  |
  completion            |              100 %  |              100 %  |
  bench duration        |             33 min  |             17 min  |
  fio exit              |             rc = 0  |             rc = 0  |
  observed peak open    |                  -  |       7 (each OSD)  |
  computed hard_limit   |                  -  |             0.0215  |

WAF drops 43 % and end-to-end throughput nearly doubles. The mechanism
is that fewer projected_aratio dips cross the (much lower) block
threshold, so the cluster spends less time in the block-recover-block
cycle that bloats device_written without progressing user_written.
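
The headline figures follow directly from the table. A short sketch of the arithmetic, with illustrative function names; the inputs are the rounded GiB values from the table, so the ratios agree with the reported WAF only to about two decimal places.

```python
def waf(device_written, user_written):
    """Write amplification factor: device bytes per user byte."""
    return device_written / user_written


def relative_reduction(before, after):
    """Fractional drop from `before` to `after`."""
    return (before - after) / before


static_waf = waf(7901, 1374)    # static (0.15, 0.10) column
adaptive_waf = waf(4503, 1374)  # adaptive hard_limit column
drop = relative_reduction(5.749, 3.276)  # using the reported WAF values
```

Here `drop` comes out at about 0.43, the 43 % quoted above, and the 33 min to 17 min bench duration is a speed-up of a little under 2x.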

No new workload-tuned constants are introduced. The two literal
numbers in the algorithm are the 30 s recompute interval (time scale
of the feedback loop, not workload-specific) and the `+ 1 segment`
safety unit (the smallest possible buffer in units the cleaner can
allocate).

Signed-off-by: Shai Fultheim <shai.fultheim@gmail.com>
8 days agoMerge pull request #69110 from ronen-fr/wip-rf-hours
Ronen Friedman [Fri, 29 May 2026 04:00:00 +0000 (07:00 +0300)]
Merge pull request #69110 from ronen-fr/wip-rf-hours

osd/scrub: 'repairing' scrubs allowed at all times

Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
8 days agoscript/backport-create-issue: update custom field name 69152/head
Patrick Donnelly [Fri, 29 May 2026 01:35:28 +0000 (21:35 -0400)]
script/backport-create-issue: update custom field name

It's now "Ceph Release". I renamed it for clarity.

Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
8 days agoMerge PR #68936 into main
Patrick Donnelly [Thu, 28 May 2026 23:48:14 +0000 (19:48 -0400)]
Merge PR #68936 into main

* refs/pull/68936/head:
osd: Fix bug when calculating min_peer_features

Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>
Reviewed-by: Alex Ainscow <aainscow@uk.ibm.com>
8 days agodoc: squid 19.2.4 release notes 68362/head
Yuri Weinstein [Mon, 13 Apr 2026 21:33:15 +0000 (14:33 -0700)]
doc: squid 19.2.4 release notes

Signed-off-by: Yuri Weinstein <yweinste@redhat.com>
8 days agoMerge pull request #69073 from ShwetaBhosale1/fix_nfs_version_issue
David Galloway [Thu, 28 May 2026 21:25:06 +0000 (17:25 -0400)]
Merge pull request #69073 from ShwetaBhosale1/fix_nfs_version_issue

Use GANESHA_REPO_BASEURL for NFS-Ganesha on all distros

8 days agocephadm: in cephadm shell mount /var/lib/ceph under /srv 68756/head
John Mulligan [Tue, 5 May 2026 17:47:44 +0000 (13:47 -0400)]
cephadm: in cephadm shell mount /var/lib/ceph under /srv

When running cephadm shell, mount /var/lib/ceph at /srv/ceph unless
/var/lib/ceph is already being passed via the cephadm shell -v option.
The default mount is read-only. Passing it manually allows the user
to mount it in a custom location read/write.
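
A sketch of the rule described above. The helper `shell_mounts` and the `src:dst[:opts]` string format for volumes are assumptions made for illustration and are not copied from cephadm.

```python
DATA_DIR = "/var/lib/ceph"
DEFAULT_TARGET = "/srv/ceph"


def shell_mounts(user_volumes):
    """Return the volume list, adding a read-only default mount if needed.

    user_volumes: list of 'src:dst[:opts]' strings supplied by the user.
    """
    mounts = list(user_volumes)
    # Collect the source side of every user-supplied volume.
    sources = {v.split(":", 1)[0].rstrip("/") for v in user_volumes}
    if DATA_DIR not in sources:
        # The user did not mount the data dir, so expose it read-only.
        mounts.append(f"{DATA_DIR}:{DEFAULT_TARGET}:ro")
    return mounts
```

When the user passes the data directory explicitly, their entry is kept as given and no default mount is added.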

The mount location at /srv/ceph is chosen because /var/lib/ceph is
already in use for compatibility with various ceph. The /srv tree
is currently unused by the container and serves a similar purpose
to /var/lib if you turn your head and squint a little.

Making this change enables the use of tools that want to read
files or connect to sockets in that file hierarchy. Specifically,
in this case, the ceph smb ctl tool.

Signed-off-by: John Mulligan <jmulligan@redhat.com>
8 days agopython-common/ceph/smb: add frontend entry point to library
John Mulligan [Fri, 3 Apr 2026 19:13:47 +0000 (15:13 -0400)]
python-common/ceph/smb: add frontend entry point to library

Add a __main__.py file with a frontend for interacting with the
remote-control (grpc) feature for SMB. This can be invoked
on the command line using `python -m ceph.smb.ctl` assuming that
the ceph module is importable.

This command line makes it easier to interact with the remote-control
server without knowing a lot about how it is implemented.

Signed-off-by: John Mulligan <jmulligan@redhat.com>
8 days agopython-common/ceph/smb: add client.py for remote-control grpc client
John Mulligan [Wed, 1 Apr 2026 22:22:53 +0000 (18:22 -0400)]
python-common/ceph/smb: add client.py for remote-control grpc client

Add a new client.py that contains the main library for acting as a
client of the remote-control grpc service for SMB. This is based on grpc
reflection rather than rigidly following an api generated from protobuf.
As this system is rapidly evolving, this avoids having to keep generated
files in sync and more closely matches the grpcurl tool people are
already using with this feature.

Signed-off-by: John Mulligan <jmulligan@redhat.com>
8 days agoMerge PR #69055 into main
Patrick Donnelly [Thu, 28 May 2026 17:57:54 +0000 (13:57 -0400)]
Merge PR #69055 into main

* refs/pull/69055/head:
qa/suites/upgrade: ignore osd in unknown state

Reviewed-by: Laura Flores <lflores@redhat.com>