git-server-git.apps.pok.os.sepia.ceph.com Git - ceph.git/log
ceph.git
12 days agodoc: add PendingReleaseNotes entry for rgw multisite DNS endpoint resolution 67141/head
Oguzhan Ozmen [Tue, 24 Mar 2026 17:54:22 +0000 (17:54 +0000)]
doc: add PendingReleaseNotes entry for rgw multisite DNS endpoint resolution

Documents the new rgw_rest_conn_connect_to_resolved_ips feature, which
lets RGW resolve the HTTP endpoints of RGW services such as multisite
into all of their IP addresses and distribute requests across them using
round-robin with per-IP health tracking, supporting DNS service
discovery deployments without external load balancers.

Signed-off-by: Oguzhan Ozmen <oozmen@bloomberg.net>
3 weeks agorgw/rest: add TODO for concurrent endpoint DNS resolution
Oguzhan Ozmen [Wed, 3 Jun 2026 16:37:15 +0000 (16:37 +0000)]
rgw/rest: add TODO for concurrent endpoint DNS resolution

Signed-off-by: Oguzhan Ozmen <oozmen@bloomberg.net>
3 weeks agorgw/multisite: fix endpoint unreachable detection in RGWRESTConn sync paths
Oguzhan Ozmen [Wed, 6 May 2026 19:17:55 +0000 (19:17 +0000)]
rgw/multisite: fix endpoint unreachable detection in RGWRESTConn sync paths

The checks that decide whether to call set_endpoint_unconnectable()
were comparing against -EIO, but the actual error codes returned on
connection failure changed after commit 37352a9074 ("rgw: change
rgw_http_error_to_errno default to -ERR_INTERNAL_ERROR").

- complete_request() -> wait() returns req_data->ret which is set to
  rgw_http_error_to_errno(0) = -ERR_INTERNAL_ERROR when http_status is 0
  (TCP connect failed). Fix the six call sites to check -ERR_INTERNAL_ERROR.

- forward_request() returns tl::unexpected(-ERR_SERVICE_UNAVAILABLE)
  when http_status == 0 (no HTTP response received at all). Fix the two
  forward/forward_iam conditionals to check -ERR_SERVICE_UNAVAILABLE.

Without this fix, connection failures are never detected in the sync
paths, so set_endpoint_unconnectable() is never called and the
IP failover / retry logic is effectively dead.

The .h coroutine paths were already fixed by dbb409e21b9 ("rgw: fix
endpoint detection in RGWRESTConn") but that commit missed all .cc
sync paths.

Signed-off-by: Oguzhan Ozmen <oozmen@bloomberg.net>
3 weeks agorgw: store RGWEndpoint URL as boost::urls::url
Oguzhan Ozmen [Tue, 14 Apr 2026 16:46:05 +0000 (16:46 +0000)]
rgw: store RGWEndpoint URL as boost::urls::url

Replace raw string manipulation in RGWEndpoint with boost::urls::url.
URL path/query/host changes now use set_path(), set_query(), set_host()
instead of string concatenation.

Rename original_url to endpoint_url_lookup_id to clarify its role as a
health-tracking key for ResolvedEndpoint lookup in
set_endpoint_unconnectable().

Signed-off-by: Oguzhan Ozmen <oozmen@bloomberg.net>
3 weeks agodoc/radosgw: expose rgw_rest_conn_connect_to_resolved_ips and rgw_rest_conn_ip_fail_t...
Oguzhan Ozmen [Tue, 24 Mar 2026 15:52:38 +0000 (15:52 +0000)]
doc/radosgw: expose rgw_rest_conn_connect_to_resolved_ips and rgw_rest_conn_ip_fail_timeout_secs

Expose the confvals "rgw_rest_conn_connect_to_resolved_ips" and
"rgw_rest_conn_ip_fail_timeout_secs" in the "Ceph Object Gateway Config
Reference", as these are meant to be client-facing configs.

Signed-off-by: Oguzhan Ozmen <oozmen@bloomberg.net>
3 weeks agorgw: rename round-robin counters for brevity
Oguzhan Ozmen [Tue, 3 Mar 2026 20:46:38 +0000 (20:46 +0000)]
rgw: rename round-robin counters for brevity

Rename endpoint_round_robin_counter to endpoint_rr_index and
endpoint_ips_round_robin_counter to ip_rr_index for shorter,
cleaner variable names while maintaining clarity.

Signed-off-by: Oguzhan Ozmen <oozmen@bloomberg.net>
3 weeks agorgw/zone: increase visibility into zone connections via admin socket
Oguzhan Ozmen [Tue, 3 Mar 2026 16:49:21 +0000 (16:49 +0000)]
rgw/zone: increase visibility into zone connections via admin socket

Adds a new admin socket command to dump zone connection details
including endpoints, resolved IPs, and health status. Useful for
debugging multisite connectivity issues.

Usage: ceph daemon <radosgw.asok> zone connections

Signed-off-by: Oguzhan Ozmen <oozmen@bloomberg.net>
3 weeks agorgw: make CONN_STATUS_EXPIRE_SECS a cfg option
Oguzhan Ozmen [Tue, 3 Mar 2026 01:39:19 +0000 (01:39 +0000)]
rgw: make CONN_STATUS_EXPIRE_SECS a cfg option

Introduce a new radosgw option 'rgw_rest_conn_ip_fail_timeout_secs' so
that the value of the constant CONN_STATUS_EXPIRE_SECS can be set dynamically.

Signed-off-by: Oguzhan Ozmen <oozmen@bloomberg.net>
3 weeks agorgw/rest: track connection failures per-IP instead of per-endpoint
Oguzhan Ozmen [Tue, 3 Mar 2026 00:45:59 +0000 (00:45 +0000)]
rgw/rest: track connection failures per-IP instead of per-endpoint

Previously, when a connection to a zone endpoint failed, the entire
endpoint was marked as unavailable for a timeout period. Since we now
resolve endpoints to all their IP addresses (via DNS A/AAAA records),
we can be more granular: track failures at the individual IP level.

Introduce ResolvedIP struct that pairs each IP's connect_to string
with its own failure timestamp. When selecting an IP for a request,
round-robin skips IPs that have recently failed, allowing traffic to
continue flowing to healthy nodes even when some are down.

An endpoint-level last_failure_time is maintained as a fast-path
optimization to avoid scanning all IPs when none have failed recently.
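
The per-IP selection described above can be sketched roughly as follows. This is a minimal, single-threaded illustration and not the code from the commit: only the names ResolvedIP, connect_to, ip_rr_index and last_failure_time come from the commit messages, while pick_ip(), mark_failed(), recently_failed() and the choice to return nullptr when every IP has failed recently are assumptions made for the example.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

using Clock = std::chrono::steady_clock;

// Stand-in for the ResolvedIP struct named in the commit message.
struct ResolvedIP {
  std::string connect_to;                         // per-IP connect_to string
  std::optional<Clock::time_point> last_failure;  // empty if it never failed
};

// Simplified stand-in for the per-endpoint state (assumed layout).
struct ResolvedEndpoint {
  std::vector<ResolvedIP> ips;
  std::size_t ip_rr_index = 0;
  std::optional<Clock::time_point> last_failure_time;  // endpoint-level fast path
};

// True if a failure was recorded less than `timeout` before `now`.
inline bool recently_failed(const std::optional<Clock::time_point>& t,
                            Clock::time_point now,
                            std::chrono::seconds timeout) {
  return t && (now - *t) < timeout;
}

// Returns the next IP to try, or nullptr if all IPs failed recently
// (the nullptr fallback is an assumption of this sketch).
inline const ResolvedIP* pick_ip(ResolvedEndpoint& ep, Clock::time_point now,
                                 std::chrono::seconds fail_timeout) {
  assert(fail_timeout.count() >= 0);
  const std::size_t n = ep.ips.size();
  if (n == 0) return nullptr;
  // Fast path: no recent failure on this endpoint, plain round-robin.
  if (!recently_failed(ep.last_failure_time, now, fail_timeout)) {
    return &ep.ips[ep.ip_rr_index++ % n];
  }
  // Slow path: keep advancing round-robin, skipping recently failed IPs.
  for (std::size_t tried = 0; tried < n; ++tried) {
    const ResolvedIP& ip = ep.ips[ep.ip_rr_index++ % n];
    if (!recently_failed(ip.last_failure, now, fail_timeout)) return &ip;
  }
  return nullptr;
}

// Record a connection failure on one IP and refresh the fast-path stamp.
inline void mark_failed(ResolvedEndpoint& ep, std::size_t idx,
                        Clock::time_point now) {
  assert(idx < ep.ips.size());
  ep.ips[idx].last_failure = now;
  ep.last_failure_time = now;
}
```

The endpoint-level stamp means the common case (no recent failures) costs one comparison instead of a scan over all IPs. Real code shared between threads would need atomics or locking, which this sketch omits.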

Signed-off-by: Oguzhan Ozmen <oozmen@bloomberg.net>
3 weeks agorgw/rest: remove unused headers
Oguzhan Ozmen [Tue, 3 Mar 2026 01:05:06 +0000 (01:05 +0000)]
rgw/rest: remove unused headers

Signed-off-by: Oguzhan Ozmen <oozmen@bloomberg.net>
3 weeks agorgw/rest: consolidate endpoint_urls and resolved_endpoints into single vector
Oguzhan Ozmen [Mon, 2 Mar 2026 18:11:27 +0000 (18:11 +0000)]
rgw/rest: consolidate endpoint_urls and resolved_endpoints into single vector

Previously RGWRESTConn stored endpoints in two data structures:
- endpoint_urls: vector<string> for ordered round-robin iteration
- resolved_endpoints: unordered_map<string, ResolvedEndpoint> for lookup

This was redundant since the URL was stored in both places.

Signed-off-by: Oguzhan Ozmen <oozmen@bloomberg.net>
3 weeks agorgw: add operator<< for RGWEndpoint and simplify logging
Oguzhan Ozmen [Sun, 8 Feb 2026 00:30:13 +0000 (00:30 +0000)]
rgw: add operator<< for RGWEndpoint and simplify logging

Add ostream operator<< to RGWEndpoint struct for convenient logging of
endpoint details (url, original_url when different, and connect_to).
Update log statements across rgw_http_client.cc, rgw_rest_client.cc,
and rgw_rest_conn.cc to use the new operator for cleaner, more
consistent output.
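
As a rough illustration of the logging helper described above, the sketch below prints url, original_url only when it differs from url, and connect_to. The struct is a simplified stand-in for RGWEndpoint; the "key=value" output format and the choice to omit an empty connect_to are assumptions of this example, not taken from the commit.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Simplified stand-in for RGWEndpoint (field set assumed from the message).
struct Endpoint {
  std::string url;
  std::string original_url;
  std::string connect_to;
};

// Print the endpoint in a compact form suitable for log lines.
inline std::ostream& operator<<(std::ostream& os, const Endpoint& ep) {
  os << "url=" << ep.url;
  // Only show original_url when it carries extra information.
  if (!ep.original_url.empty() && ep.original_url != ep.url) {
    os << " original_url=" << ep.original_url;
  }
  // Assumption: an empty connect_to is simply left out.
  if (!ep.connect_to.empty()) {
    os << " connect_to=" << ep.connect_to;
  }
  return os;
}
```

With such an operator, a log statement can stream the endpoint object directly instead of formatting each field by hand.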

Add unittest_rgw_http_client to test RGWEndpoint functionality.

Signed-off-by: Oguzhan Ozmen <oozmen@bloomberg.net>
3 weeks agorgw: track original URL within RGWEndpoint instead of separate member (refactor)
Oguzhan Ozmen [Sat, 7 Feb 2026 17:33:42 +0000 (17:33 +0000)]
rgw: track original URL within RGWEndpoint instead of separate member (refactor)

Refactor endpoint tracking by adding original_url to RGWEndpoint struct
instead of maintaining a separate endpoint_orig member in RGWHTTPClient.
This simplifies the code by having each endpoint self-track its original
URL, which is needed for connection status lookups after URL modifications.

No functional changes intended.

Signed-off-by: Oguzhan Ozmen <oozmen@bloomberg.net>
3 weeks agorgw: fix incomplete RGWRESTConn move constructor/assignment
Oguzhan Ozmen [Sat, 7 Feb 2026 02:00:16 +0000 (02:00 +0000)]
rgw: fix incomplete RGWRESTConn move constructor/assignment

Signed-off-by: Oguzhan Ozmen <oozmen@bloomberg.net>
3 weeks agorgw/rest: consolidate endpoint status tracking into ResolvedEndpoint
Oguzhan Ozmen [Sat, 7 Feb 2026 01:45:20 +0000 (01:45 +0000)]
rgw/rest: consolidate endpoint status tracking into ResolvedEndpoint

Refactor RGWRESTConn to eliminate the separate endpoints_status map by
moving the connection status (std::atomic<ceph::real_time>) directly
into the ResolvedEndpoint struct. This reduces redundancy and simplifies
endpoint state management.

Signed-off-by: Oguzhan Ozmen <oozmen@bloomberg.net>
3 weeks agorgw/http: apply RGWEndpoint connect_to via libcurl CURLOPT_CONNECT_TO
Oguzhan Ozmen [Fri, 23 Jan 2026 19:07:19 +0000 (19:07 +0000)]
rgw/http: apply RGWEndpoint connect_to via libcurl CURLOPT_CONNECT_TO

If endpoint.connect_to is non-empty, populate and attach a curl slist to
CURLOPT_CONNECT_TO for this request.

Ensure the slist is freed with the request lifetime to avoid leaks across
retries/requests.

Fixes: https://tracker.ceph.com/issues/74677
Signed-off-by: Oguzhan Ozmen <oozmen@bloomberg.net>
3 weeks agorgw/rest: round-robin resolved endpoint IPs into curl CONNECT_TO mapping
Oguzhan Ozmen [Fri, 23 Jan 2026 18:51:22 +0000 (18:51 +0000)]
rgw/rest: round-robin resolved endpoint IPs into curl CONNECT_TO mapping

Add logic to select an IP for a given endpoint URL (RR over resolved addresses)
and build a host:port:ip:port mapping.

Store the mapping in the RGWEndpoint so the HTTP layer can apply it per request.
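
A minimal sketch of building such a mapping string is shown below. libcurl documents the CURLOPT_CONNECT_TO entry format as HOST:PORT:CONNECT-TO-HOST:CONNECT-TO-PORT, with numerical IPv6 addresses written in square brackets. The helper name make_connect_to() and the assumption that the target port equals the original port are illustrative and not taken from the commit.

```cpp
#include <cassert>
#include <string>

// Build one "host:port:ip:port" entry for a resolved IP address.
// Assumption: the connection goes to the same port as the original URL.
inline std::string make_connect_to(const std::string& host, unsigned port,
                                   const std::string& ip) {
  assert(!host.empty() && !ip.empty());
  // A colon inside the address means IPv6, which must be bracketed.
  const bool is_ipv6 = ip.find(':') != std::string::npos;
  const std::string target = is_ipv6 ? "[" + ip + "]" : ip;
  const std::string p = std::to_string(port);
  return host + ":" + p + ":" + target + ":" + p;
}
```

For example, make_connect_to("rgw.example.com", 8080, "10.0.0.5") yields "rgw.example.com:8080:10.0.0.5:8080". The request URL keeps the hostname, so the Host header and TLS name checks are unaffected by the redirect to a specific IP.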

Signed-off-by: Oguzhan Ozmen <oozmen@bloomberg.net>
3 weeks agorgw/rest: resolve multisite endpoints to all A/AAAA records (optional)
Oguzhan Ozmen [Fri, 23 Jan 2026 15:34:18 +0000 (15:34 +0000)]
rgw/rest: resolve multisite endpoints to all A/AAAA records (optional)

When rgw_resolve_endpoints_into_all_addresses=true, parse each configured
endpoint URL, extract host/port, and resolve the host into all IP addresses.

Store resolution results (including a round-robin index) alongside the original
endpoint URL for later selection.
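
The resolution step can be sketched with the POSIX getaddrinfo() call, as below. This is an illustration only: the commit message does not say which resolver API the RGW code uses, and the helper name resolve_all() is hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

#include <netdb.h>
#include <sys/socket.h>
#include <sys/types.h>

// Resolve `host` into the textual form of every address it maps to.
// Returns an empty vector if resolution fails.
inline std::vector<std::string> resolve_all(const std::string& host) {
  assert(!host.empty());
  addrinfo hints{};
  hints.ai_family = AF_UNSPEC;      // ask for both A (IPv4) and AAAA (IPv6)
  hints.ai_socktype = SOCK_STREAM;  // avoid one duplicate per socket type

  addrinfo* res = nullptr;
  std::vector<std::string> out;
  if (getaddrinfo(host.c_str(), nullptr, &hints, &res) != 0) {
    return out;
  }
  for (const addrinfo* ai = res; ai != nullptr; ai = ai->ai_next) {
    char buf[256];
    // NI_NUMERICHOST formats the address without a reverse DNS lookup.
    if (getnameinfo(ai->ai_addr, ai->ai_addrlen, buf, sizeof(buf),
                    nullptr, 0, NI_NUMERICHOST) == 0) {
      out.emplace_back(buf);
    }
  }
  freeaddrinfo(res);
  return out;
}
```

A caller would store the returned list next to the original endpoint URL, together with a round-robin index, and pick one address per request.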

Signed-off-by: Oguzhan Ozmen <oozmen@bloomberg.net>
3 weeks agorgw: add rgw_resolve_endpoints_into_all_addresses config option
Oguzhan Ozmen [Fri, 19 Dec 2025 02:25:55 +0000 (02:25 +0000)]
rgw: add rgw_resolve_endpoints_into_all_addresses config option

Introduce an advanced boolean config option (default: false) to enable resolving
multisite endpoint hostnames into all A/AAAA records.

When enabled, RGW can distribute outgoing inter-zone traffic across DNS-provided
backends even without an external load balancer.

Signed-off-by: Oguzhan Ozmen <oozmen@bloomberg.net>
3 weeks agorgw/http: introduce RGWEndpoint to carry url + connect_to (refactor)
Oguzhan Ozmen [Wed, 21 Jan 2026 20:11:29 +0000 (20:11 +0000)]
rgw/http: introduce RGWEndpoint to carry url + connect_to (refactor)

Replace the "URL as plain string" plumbing with an RGWEndpoint value type
that:

  - carries the URL string,
  - carries optional per-request connect_to data for libcurl, and
  - encapsulates related functions/methods.

This commit is intended to be non-functional: behavior should remain unchanged
until callers populate connect_to.

Signed-off-by: Oguzhan Ozmen <oozmen@bloomberg.net>
3 weeks agoMerge PR #69222 into main
Patrick Donnelly [Tue, 2 Jun 2026 19:59:10 +0000 (15:59 -0400)]
Merge PR #69222 into main

* refs/pull/69222/head:
qa: install nvme-cli only if distro remains rocky10

Reviewed-by: Redouane Kachach <rkachach@redhat.com>
3 weeks agoMerge pull request #68941 from adamemerson/wip-rgw-deprecate-omap-datalog
Adam Emerson [Tue, 2 Jun 2026 18:39:46 +0000 (14:39 -0400)]
Merge pull request #68941 from adamemerson/wip-rgw-deprecate-omap-datalog

rgw: Deprecate OMAP datalog

Reviewed-by: J. Eric Ivancich <ivancich@redhat.com>
3 weeks agoMerge PR #69219 into main
Patrick Donnelly [Tue, 2 Jun 2026 17:59:18 +0000 (13:59 -0400)]
Merge PR #69219 into main

* refs/pull/69219/head:
script/backport-create-issue: catch errors during traversal

Reviewed-by: Yuri Weinstein <yweins@redhat.com>
3 weeks agoMerge pull request #69174 from dang/wip-dang-merge-standalone
Daniel Gryniewicz [Tue, 2 Jun 2026 16:37:31 +0000 (12:37 -0400)]
Merge pull request #69174 from dang/wip-dang-merge-standalone

Merge rgw-standalone

3 weeks agoMerge pull request #68756 from phlogistonjohn/jjm-smb-ctl-tool
John Mulligan [Tue, 2 Jun 2026 14:42:52 +0000 (10:42 -0400)]
Merge pull request #68756 from phlogistonjohn/jjm-smb-ctl-tool

smb: add a ceph based smb remote control client tool

Reviewed-by: Avan Thakkar <athakkar@redhat.com>
Reviewed-by: Anoop C S <anoopcs@cryptolab.net>
3 weeks agoMerge pull request #68774 from aclamk/aclamk-doc-bs-rocksdb-perf-counters
Adam Kupczyk [Tue, 2 Jun 2026 13:16:26 +0000 (15:16 +0200)]
Merge pull request #68774 from aclamk/aclamk-doc-bs-rocksdb-perf-counters

doc/rados/bluestore: RocksDB cache shards, perf counters

3 weeks agoMerge pull request #68430 from Jayaprakash-ibm/wip-bluefs-spillover-cleaner-rework
Jaya Prakash [Tue, 2 Jun 2026 13:04:54 +0000 (18:34 +0530)]
Merge pull request #68430 from Jayaprakash-ibm/wip-bluefs-spillover-cleaner-rework

os/bluestore: BlueFS Spillover Cleaner Evolution

Reviewed-by: Adam Kupczyk <akupczyk@ibm.com>
3 weeks agoMerge PR #67709 into main
Venky Shankar [Tue, 2 Jun 2026 10:12:16 +0000 (15:42 +0530)]
Merge PR #67709 into main

* refs/pull/67709/head:
tools/cephfs: always execute scan_{extents,inodes,frags} and cleanup

Reviewed-by: Edwin Rodriguez <edwin.rodriguez1@ibm.com>
3 weeks agoMerge pull request #69037 from dparmar18/i76728
Venky Shankar [Tue, 2 Jun 2026 08:55:12 +0000 (14:25 +0530)]
Merge pull request #69037 from dparmar18/i76728

mds: persist session auth_name in ESession journal event

Reviewed-by: Christopher Hoffman <choffman@redhat.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
3 weeks agoMerge pull request #67717 from leonidc/delay-failback
leonidc [Tue, 2 Jun 2026 08:03:36 +0000 (11:03 +0300)]
Merge pull request #67717 from leonidc/delay-failback

nvmeofgw: delay failback

3 weeks agoMerge pull request #69105 from guits/fix-bypass_workqueue
Guillaume Abrioux [Tue, 2 Jun 2026 06:10:59 +0000 (08:10 +0200)]
Merge pull request #69105 from guits/fix-bypass_workqueue

ceph-volume: detect rotational media under dm-crypt for workqueue bypass

3 weeks agosrc/test/reclaim: test session reclaim after mds failover 69037/head
Dhairya Parmar [Mon, 25 May 2026 12:01:33 +0000 (17:31 +0530)]
src/test/reclaim: test session reclaim after mds failover

ensure that the new active MDS reads the auth_name from the ESession
event and assigns it to the new session that MDS creates during journal
replay.

NOTE: the mds failover is carried out by sending the "respawn" command to the
active MDS using libcephfs's ceph_mds_command().

Fixes: https://tracker.ceph.com/issues/76728
Signed-off-by: Dhairya Parmar <dparmar@redhat.com>
3 weeks agomds: persist session auth_name in ESession journal event
Dhairya Parmar [Wed, 20 May 2026 21:18:15 +0000 (02:48 +0530)]
mds: persist session auth_name in ESession journal event

So that it can be applied to the freshly created session, which is
recreated in ESession::replay when the OMAP version has fallen behind the
ESession cmapv. Otherwise the newly created session would be rejected as
a target when a client tries to reclaim it.

Fixes: https://tracker.ceph.com/issues/76728
Signed-off-by: Dhairya Parmar <dparmar@redhat.com>
3 weeks agoMerge pull request #68098 from sunyuechi/riscv-isa-l-support
Kefu Chai [Tue, 2 Jun 2026 03:47:35 +0000 (11:47 +0800)]
Merge pull request #68098 from sunyuechi/riscv-isa-l-support

isa-l: enable on RISC-V

Reviewed-by: Kefu Chai <k.chai@proxmox.com>
3 weeks agoMerge pull request #69121 from tchaikov/wip-seastore-rolling-in-bg
Kefu Chai [Tue, 2 Jun 2026 02:14:48 +0000 (10:14 +0800)]
Merge pull request #69121 from tchaikov/wip-seastore-rolling-in-bg

crimson/seastore: make RecordSubmitter::wait_available() idempotent

Reviewed-by: Matan Breizman <mbreizma@redhat.com>
3 weeks agoMerge pull request #69214 from tchaikov/wip-cephadm-iscsi-gw
Kefu Chai [Mon, 1 Jun 2026 23:35:37 +0000 (07:35 +0800)]
Merge pull request #69214 from tchaikov/wip-cephadm-iscsi-gw

qa/cephadm: query iSCSI gateway FQDN from inside the container

Reviewed-by: Redouane Kachach <rkachach@ibm.com>
3 weeks agoMerge pull request #69026 from jamiepryde/ec-profile-deprecation-warning
SrinivasaBharathKanta [Mon, 1 Jun 2026 23:19:05 +0000 (04:49 +0530)]
Merge pull request #69026 from jamiepryde/ec-profile-deprecation-warning

Add health warning for deprecated EC plugins and techniques

3 weeks agoMerge PR #68362 into main
Patrick Donnelly [Mon, 1 Jun 2026 19:33:50 +0000 (15:33 -0400)]
Merge PR #68362 into main

* refs/pull/68362/head:
doc: squid 19.2.4 release notes

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>
Reviewed-by: Ilya Dryomov <idryomov@redhat.com>
Reviewed-by: Redouane Kachach <rkachach@redhat.com>
3 weeks agoMerge pull request #66936 from jacquesh/remove-text-output-from-rados-bench-json
Radoslaw Zarzynski [Mon, 1 Jun 2026 19:30:25 +0000 (21:30 +0200)]
Merge pull request #66936 from jacquesh/remove-text-output-from-rados-bench-json

tools/rados: Remove plain text snippets from rados bench JSON output

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>
Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
3 weeks agoMerge pull request #69061 from jzhu116-bloomberg/wip-70346
Radoslaw Zarzynski [Mon, 1 Jun 2026 19:00:45 +0000 (21:00 +0200)]
Merge pull request #69061 from jzhu116-bloomberg/wip-70346

osd: unregister admin socket commands in fast shutdown

Reviewed-by: Kefu Chai <k.chai@proxmox.com>
Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
3 weeks agocommon/options, os/bluestore: add debug option to force bluefs files onto slow device 68430/head
Jaya Prakash [Thu, 7 May 2026 12:09:07 +0000 (12:09 +0000)]
common/options, os/bluestore: add debug option to force bluefs files onto slow device

Fixes: https://tracker.ceph.com/issues/74319
Signed-off-by: Jaya Prakash <jayaprakash@ibm.com>
3 weeks agoos/bluestore: start/stop BlueFS spillover cleaner on config change
Jaya Prakash [Mon, 16 Mar 2026 19:22:49 +0000 (19:22 +0000)]
os/bluestore: start/stop BlueFS spillover cleaner on config change

Fixes: https://tracker.ceph.com/issues/74319
Signed-off-by: Jaya Prakash <jayaprakash@ibm.com>
(cherry picked from commit dc768b782d54cc6a5dee29a9c4f358e8b9183aa6)

3 weeks agoos/bluestore: migrated files in 128MB chunks
Jaya Prakash [Fri, 15 May 2026 17:07:32 +0000 (17:07 +0000)]
os/bluestore: migrated files in 128MB chunks

Signed-off-by: Jaya Prakash <jayaprakash@ibm.com>
3 weeks agoos/bluestore: Spillover Cleaner Thread implementation in BlueFS
Jaya Prakash [Thu, 16 Apr 2026 15:30:28 +0000 (15:30 +0000)]
os/bluestore: Spillover Cleaner Thread implementation in BlueFS

Fixes: https://tracker.ceph.com/issues/74319
Signed-off-by: Jaya Prakash <jayaprakash@ibm.com>
3 weeks agocommon/options: add bluefs_spillover_cleaner option
Jaya Prakash [Mon, 16 Mar 2026 19:23:05 +0000 (19:23 +0000)]
common/options: add bluefs_spillover_cleaner option

Fixes: https://tracker.ceph.com/issues/74319
Signed-off-by: Jaya Prakash <jayaprakash@ibm.com>
3 weeks agoqa: install nvme-cli only if distro remains rocky10 69222/head
Patrick Donnelly [Mon, 1 Jun 2026 15:37:23 +0000 (11:37 -0400)]
qa: install nvme-cli only if distro remains rocky10

Notably, only include the `dnf install` commands if the distro is
not overridden by some other mechanism (like cephfs kernel overrides).

This is only a problem for tentacle presently as the k-stock kernel will
override with centos9.

Fixes: https://tracker.ceph.com/issues/77037
Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
3 weeks agoMerge pull request #69083 from fultheim/adaptive-cleaner-thresholds
Matan Breizman [Mon, 1 Jun 2026 16:12:08 +0000 (19:12 +0300)]
Merge pull request #69083 from fultheim/adaptive-cleaner-thresholds

crimson/os/seastore: adaptive cleaner thresholds from observed workload

Reviewed-by: Matan Breizman <mbreizma@redhat.com>
3 weeks agoscript/backport-create-issue: catch errors during traversal 69219/head
Patrick Donnelly [Mon, 1 Jun 2026 14:29:13 +0000 (10:29 -0400)]
script/backport-create-issue: catch errors during traversal

A ServerError shouldn't prevent all forward progress.

Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
3 weeks agoMerge pull request #69203 from tchaikov/wip-libcephfs-test
Kefu Chai [Mon, 1 Jun 2026 14:08:36 +0000 (22:08 +0800)]
Merge pull request #69203 from tchaikov/wip-libcephfs-test

test/libcephfs: reduce SnapDiffDeletionRecreation bulk_count on Windows

Reviewed-by: Igor Fedotov <igor.fedotov@croit.io>
3 weeks agoMerge pull request #68775 from gardran/wip-gardran-fix-write-v2-deferred-counters
Igor Fedotov [Mon, 1 Jun 2026 13:58:53 +0000 (16:58 +0300)]
Merge pull request #68775 from gardran/wip-gardran-fix-write-v2-deferred-counters

os/bluestore: do not increment *issued_deferred* counters twice

Reviewed-by: Jaya Prakash <jayaprakash@ibm.com>
Reviewed-by: Adam Kupczyk <akupczyk@ibm.com>
3 weeks agoMerge pull request #69168 from guits/fix-osd-type
Guillaume Abrioux [Mon, 1 Jun 2026 13:49:09 +0000 (15:49 +0200)]
Merge pull request #69168 from guits/fix-osd-type

cephadm: cephadm: omit --osd-type classic for older ceph-volume

3 weeks agoMerge PR #69152 into main
Patrick Donnelly [Mon, 1 Jun 2026 13:33:25 +0000 (09:33 -0400)]
Merge PR #69152 into main

* refs/pull/69152/head:
script/backport-create-issue: update custom field name

Reviewed-by: Redouane Kachach <rkachach@redhat.com>
3 weeks agoMerge pull request #67889 from gardran/wip-gardran-no-seq-bytes
Igor Fedotov [Mon, 1 Jun 2026 10:55:28 +0000 (13:55 +0300)]
Merge pull request #67889 from gardran/wip-gardran-no-seq-bytes

os/bluestore: avoid redundant map lookup for deferred op

Reviewed-by: Jaya Prakash <jayaprakash@ibm.com>
3 weeks agoos/bluestore: do not increment *issued_deferred* counter twice 68775/head
Garry Drankovich [Wed, 6 May 2026 16:19:45 +0000 (19:19 +0300)]
os/bluestore: do not increment *issued_deferred* counter twice
in write v2 mode.

_get_deferred_op() is already increasing performance counter on its own.

Signed-off-by: Garry Drankovich <garry.drankovich@clyso.com>
3 weeks agoqa/cephadm: query iSCSI gateway FQDN from inside the container 69214/head
Kefu Chai [Mon, 1 Jun 2026 10:40:06 +0000 (18:40 +0800)]
qa/cephadm: query iSCSI gateway FQDN from inside the container

rbd-target-api validates that the gateway hostname supplied by gwcli
matches the container's own socket.getfqdn(). Running the same call on
the host can return a different value when the host and container resolve
names differently (e.g. on Rocky 10), causing gateway creation to fail
with HTTP 400 and all subsequent gwcli configuration to break silently.

Query the FQDN from inside the iSCSI container directly so the value is
always consistent with what rbd-target-api expects. This also removes the
"run twice" workaround, which was compensating for host-side DNS
warm-up flakiness rather than addressing the underlying mismatch.

Fixes: https://tracker.ceph.com/issues/74577
Signed-off-by: Kefu Chai <k.chai@proxmox.com>
3 weeks agoMerge pull request #69143 from guits/fix-cv-vg-lv-batch
Guillaume Abrioux [Mon, 1 Jun 2026 07:57:17 +0000 (09:57 +0200)]
Merge pull request #69143 from guits/fix-cv-vg-lv-batch

ceph-volume: retry lvs after empty result and "devices file is missing" stderr

3 weeks agotest/libcephfs: reduce SnapDiffDeletionRecreation bulk_count on Windows 69203/head
Kefu Chai [Mon, 1 Jun 2026 05:19:04 +0000 (13:19 +0800)]
test/libcephfs: reduce SnapDiffDeletionRecreation bulk_count on Windows

this test timed out on Windows, and HugeSnapDiffLargeDelta, at half
the file count, passed in 508 seconds on the same run, suggesting this
test takes ~17 minutes on Windows -- beyond the test runner limit.

we haven't profiled the Windows client yet, but the likely culprit is
EventPoll, the Windows messenger backend, which scans the entire poll
array on every event_wait() and poll_ctl() call rather than using a
keyed data structure.

in this change, we reduce bulk_count to 1 << 12 on Windows. the unique
thing this test covers is the deletion-recreation pattern: a name that
exists as a file in snap1, gets deleted, and reappears as a directory in
snap2 -- it must show up in the diff with both snapids. 4096 produces
1024 such pairs, which is enough to exercise that logic. multi-fragment
snapdiff is already covered by HugeSnapDiffLargeDelta, which derives its
file count from mds_bal_split_size and mds_bal_fragment_fast_factor
explicitly to trigger fragmentation.

Fixes: https://tracker.ceph.com/issues/77015
Signed-off-by: Kefu Chai <k.chai@proxmox.com>
4 weeks agoMerge pull request #69135 from VallariAg/wip-nvmeof-teuthology-mon-conf
Vallari Agrawal [Sun, 31 May 2026 16:00:05 +0000 (21:30 +0530)]
Merge pull request #69135 from VallariAg/wip-nvmeof-teuthology-mon-conf

qa/suites/nvmeof: set beacon grace and connect panic

4 weeks agoMerge pull request #66500 from AliMasarweh/wip-alimasa-global-cors
Ali Masarwa [Sun, 31 May 2026 10:30:56 +0000 (13:30 +0300)]
Merge pull request #66500 from AliMasarweh/wip-alimasa-global-cors

RGW: add support for global CORS rule

Reviewed-by: Naman Munet <naman.munet@ibm.com>, Casey Bodley <cbodley@redhat.com>
4 weeks agoMerge pull request #69185 from sunyuechi/wip-with-system-spdk
Kefu Chai [Sun, 31 May 2026 10:26:14 +0000 (18:26 +0800)]
Merge pull request #69185 from sunyuechi/wip-with-system-spdk

cmake,blk/spdk: support WITH_SYSTEM_SPDK

Reviewed-by: Kefu Chai <k.chai@proxmox.com>
4 weeks agoMerge pull request #68745 from Hezko/bugfix-13279
Hezko [Sun, 31 May 2026 08:04:07 +0000 (11:04 +0300)]
Merge pull request #68745 from Hezko/bugfix-13279

mgr/dashboard: fix listener add errors

4 weeks agoMerge pull request #69044 from xxhdx1985126/wip-seastore-rewrite-fix
Matan Breizman [Sun, 31 May 2026 07:20:36 +0000 (10:20 +0300)]
Merge pull request #69044 from xxhdx1985126/wip-seastore-rewrite-fix

crimson/os/seastore: force rewrite transactions to conflict with others if it involve insertions on the lba tree

Reviewed-by: Matan Breizman <mbreizma@redhat.com>
4 weeks agocmake: add WITH_SYSTEM_SPDK to link a system-installed SPDK 69185/head
Sun Yuechi [Sat, 30 May 2026 06:15:12 +0000 (14:15 +0800)]
cmake: add WITH_SYSTEM_SPDK to link a system-installed SPDK

By default ceph builds the bundled src/spdk fork via BuildSPDK. Add a
WITH_SYSTEM_SPDK option that instead locates a distro-provided SPDK
through a new Findspdk.cmake (pkg-config based, modelled on
Finddpdk.cmake), exposing the same spdk::spdk target.

Signed-off-by: Sun Yuechi <sunyuechi@iscas.ac.cn>
4 weeks agoblk/spdk: support both old and new spdk_env_opts member names
Sun Yuechi [Sat, 30 May 2026 06:11:11 +0000 (14:11 +0800)]
blk/spdk: support both old and new spdk_env_opts member names

SPDK 21.01 renamed two struct spdk_env_opts members: pci_whitelist ->
pci_allowed and master_core -> main_core. Guard the assignments in
NVMEDevice with SPDK_VERSION.

pci_whitelist -> pci_allowed:  https://github.com/spdk/spdk/commit/4a6a2824119b
master_core -> main_core:      https://github.com/spdk/spdk/commit/fe137c8970bf

Signed-off-by: Sun Yuechi <sunyuechi@iscas.ac.cn>
4 weeks agoMerge pull request #68934 from cbodley/wip-76578
Casey Bodley [Fri, 29 May 2026 17:52:00 +0000 (13:52 -0400)]
Merge pull request #68934 from cbodley/wip-76578

rgw/beast: add ssl_ciphersuites option for tls 1.3

Reviewed-by: Matt Benjamin <mbenjamin@redhat.com>
4 weeks agorgw/posix: remove path from table names 69174/head
Nithya Balachandran [Thu, 16 Apr 2026 10:01:50 +0000 (10:01 +0000)]
rgw/posix: remove path from table names

Removes the DB directory path from the table names.

Signed-off-by: Nithya Balachandran <nithya.balachandran@ibm.com>
4 weeks agorgw/posix: implement the quota feature
Nithya Balachandran [Tue, 24 Mar 2026 08:17:52 +0000 (08:17 +0000)]
rgw/posix: implement the quota feature

Implement the quota feature for the POSIX driver.

Signed-off-by: Nithya Balachandran <nithya.balachandran@ibm.com>
4 weeks agoRGW | standalone: add support for accounts in dbstore
Ali Masarwa [Sun, 12 Apr 2026 13:07:38 +0000 (16:07 +0300)]
RGW | standalone: add support for accounts in dbstore

Signed-off-by: Ali Masarwa <amasarwa@redhat.com>
4 weeks agoradosgw-admin: Remove dependence on RADOS
Samarah Uriarte [Tue, 24 Mar 2026 15:21:00 +0000 (15:21 +0000)]
radosgw-admin: Remove dependence on RADOS

Signed-off-by: Samarah Uriarte <samarah.uriarte@ibm.com>
4 weeks agoRGW POSIX - Fix POSIX unittest
Daniel Gryniewicz [Mon, 30 Mar 2026 14:49:47 +0000 (10:49 -0400)]
RGW POSIX - Fix POSIX unittest

Signed-off-by: Daniel Gryniewicz <dang@fprintf.net>
4 weeks agorgw/posix: fix cached size of uploaded objects
Matt Benjamin [Tue, 24 Mar 2026 18:10:28 +0000 (14:10 -0400)]
rgw/posix: fix cached size of uploaded objects

Moves file open and stat into the (atomic) link step, so size
is correctly interned in the cache.  Fix suggested by dang.

Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
4 weeks agorgw/posix: fix crash in radosgw-admin
Nithya Balachandran [Tue, 24 Mar 2026 11:33:15 +0000 (11:33 +0000)]
rgw/posix: fix crash in radosgw-admin

The POSIXBucket copy constructor incorrectly calls .get() on a
temporary unique_ptr returned by clone(), causing immediate
deletion of the Directory object. This leaves a dangling pointer
that triggers a segfault during destruction.
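
The ownership problem described above can be illustrated with simplified stand-in types. This sketch is not the actual POSIXBucket or Directory code, and the fix shown (keeping the unique_ptr returned by clone() instead of a raw pointer) is one plausible correction, assumed for the example.

```cpp
#include <memory>

// Simplified stand-in for the Directory object owned by a bucket.
struct Directory {
  int tag;
  explicit Directory(int t) : tag(t) {}
  std::unique_ptr<Directory> clone() const {
    return std::make_unique<Directory>(*this);
  }
};

// Simplified stand-in for a bucket that owns its Directory.
struct Bucket {
  std::unique_ptr<Directory> dir;

  explicit Bucket(int tag) : dir(std::make_unique<Directory>(tag)) {}

  // Buggy pattern (shown only as a comment):
  //   Directory* raw = other.dir->clone().get();
  // The temporary unique_ptr is destroyed at the end of that statement,
  // deleting the clone and leaving `raw` dangling.
  //
  // Safe pattern: take ownership of the clone.
  Bucket(const Bucket& other) {
    if (other.dir) {
      dir = other.dir->clone();
    }
  }
};
```

Copying a Bucket this way gives the copy its own Directory, so neither object's destructor touches memory owned by the other.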

Signed-off-by: Nithya Balachandran <nithya.balachandran@ibm.com>
4 weeks agocohort_lru: keep strict discard, but from LRU
Matt Benjamin [Wed, 26 Nov 2025 23:17:02 +0000 (18:17 -0500)]
cohort_lru: keep strict discard, but from LRU

Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
4 weeks agoposixdriver: properly destruct BucketCacheEntry objects
Matt Benjamin [Wed, 26 Nov 2025 14:00:03 +0000 (09:00 -0500)]
posixdriver:  properly destruct BucketCacheEntry objects

* avoids leak of database handles during eviction

Also adds missing return-ref in invalidate_entry--this would
leak a cache entry.

With this change, we can now tolerate indefinite s3-test runs
with rgw_posix_cache_max_buckets=100.

Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
4 weeks agocohort_lru: crash fix and reduce lock contention
Matt Benjamin [Tue, 25 Nov 2025 17:41:37 +0000 (12:41 -0500)]
cohort_lru: crash fix and reduce lock contention

Fixes crash induced by taking the address of the last element
of an empty intrusive list (!).

Also, introduces active queue, reducing potential for lock
contention in evict_block():

* entries are tracked on lane::active_queue when lru_refcnt > 1
** on some lane::q otherwise

Objects transition between queues when lru_refcnt changes value--
a value of 0 triggers deletion, as before.
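
A rough sketch of the two ideas above follows: guarding against an empty list before taking its last element, and choosing a queue from lru_refcnt. It uses std::list instead of an intrusive list, and the names Entry, Lane, evict_candidate() and placement_for() are hypothetical; only active_queue, q and lru_refcnt come from the commit message.

```cpp
#include <cstdint>
#include <list>

struct Entry {
  std::uint32_t lru_refcnt = 1;
};

// Simplified stand-in for one LRU lane.
struct Lane {
  std::list<Entry*> q;             // entries with lru_refcnt == 1
  std::list<Entry*> active_queue;  // entries with lru_refcnt > 1
};

// Return the eviction candidate at the LRU end, or nullptr if none.
inline Entry* evict_candidate(Lane& lane) {
  // Calling back() on an empty list is undefined behaviour, so check first.
  if (lane.q.empty()) return nullptr;
  return lane.q.back();
}

enum class Placement { Deleted, Idle, Active };

// Map a reference count to the queue an entry belongs on.
inline Placement placement_for(std::uint32_t lru_refcnt) {
  if (lru_refcnt == 0) return Placement::Deleted;  // 0 triggers deletion
  return lru_refcnt > 1 ? Placement::Active : Placement::Idle;
}
```

Keeping referenced entries on a separate queue means an eviction scan only walks entries that could actually be evicted, which is the likely source of the reduced lock contention.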

Fixes: https://tracker.ceph.com/issues/73992
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
4 weeks agoposixdriver: can move buffer::list leaving scope
Matt Benjamin [Fri, 13 Feb 2026 20:29:58 +0000 (15:29 -0500)]
posixdriver: can move buffer::list leaving scope

Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
4 weeks agoposixdriver: add provisional manifest
Matt Benjamin [Wed, 4 Feb 2026 02:05:47 +0000 (21:05 -0500)]
posixdriver: add provisional manifest

initially, it is just used to remember the multipart layout, but
likely will see other use.

Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
4 weeks agoposixdriver: fix cksum_type, flags propagation
Matt Benjamin [Tue, 3 Feb 2026 22:12:22 +0000 (17:12 -0500)]
posixdriver: fix cksum_type, flags propagation

Posixdriver doesn't serialize POSIXMultipartUpload, but rather a
member mp_obj of type POSIXMPObj--so to avoid losing the latter's
inherited cksum_type and cksum_flags members (which are already
copied in), copy them out in POSIXMultiPartUpload::get_info() which
we need to call to copy out dest_placement anyway.

(oops, cksum_type was copied in, but not cksum_flags)

Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
4 weeks agoposixdriver: fix cache fill of versioned buckets
Matt Benjamin [Sun, 15 Feb 2026 20:56:03 +0000 (15:56 -0500)]
posixdriver: fix cache fill of versioned buckets

This change completes the original intent (hypothesized) to
conditionally set the FLAG_CURRENT bit on just the current
entries during bucket listing cache fill.

This avoids interning 2 copies of the current version of each
object in the listing cache, and also correctly sets the
FLAG_CURRENT bit as required--so the current versions are correctly
reported in versioned listings.

Janky logic to find the current version by explicitly chasing
the symlink target and saving it outside the enumeration scope
has been replaced with a proper call to stat() provided by Dang.

Symlink::fill_cache() is no longer used, so removed.

Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
4 weeks agoposixdriver: add bde.flags to in bucket cache serde cycle
Matt Benjamin [Sun, 15 Feb 2026 15:21:28 +0000 (10:21 -0500)]
posixdriver: add bde.flags to in bucket cache serde cycle

The upstream logic (mostly?) correctly uses bde.flags when filling
the cache for versioned objects, but cache ser(de)ialization has
been discarding that member.

This change suppresses the visible result where RGW incorrectly produces
multiple versions in non-versioned listing because none uniquely sets
FLAG_CURRENT:

mbenjamin@fedora:~/dev/rgw/s3_py/python$ s3cmd ls s3://sheik2
2026-02-14 22:44           22  s3://sheik2/ginfizz_1
2026-02-14 22:44           22  s3://sheik2/ginfizz_1
2026-02-14 22:44           22  s3://sheik2/ginfizz_1
2026-02-14 22:44           22  s3://sheik2/ginfizz_2
2026-02-14 22:44           22  s3://sheik2/ginfizz_2
2026-02-14 22:44           22  s3://sheik2/ginfizz_2

Corrected result is:

mbenjamin@fedora:~/dev/rgw/s3_py/python$ s3cmd ls s3://sheik2
2026-02-14 22:44           22  s3://sheik2/ginfizz_1
2026-02-14 22:44           22  s3://sheik2/ginfizz_2

Cached listings for versions are still incorrect in containing
an extra entry for the "current" version with an empty instance
(from the Symlink)--the visible effect being that list-object-versions
output is incorrect (no entry is sent with IsLatest, after the
empty instance version has been filtered out).
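
As a rough illustration of the underlying problem, the sketch below shows how a serialization round trip that omits a flags field loses the current-version marker. Everything here is hypothetical: the JSON encoding, the field names, the instance values, and the value of FLAG_CURRENT are stand-ins, and the model does not reproduce how the real listing code behaves when no entry is marked current.

```python
import json

FLAG_CURRENT = 0x1  # hypothetical bit value, for illustration only


def encode(entries, keep_flags):
    """Serialize cache entries, optionally dropping the flags member."""
    fields = ("name", "instance", "flags") if keep_flags else ("name", "instance")
    return json.dumps([{k: e[k] for k in fields} for e in entries])


def decode(blob):
    """Deserialize entries; a missing flags member comes back as 0."""
    return [{"flags": 0, **e} for e in json.loads(blob)]


def current_entries(entries):
    """Keep only the entries marked as the current version."""
    return [e for e in entries if e["flags"] & FLAG_CURRENT]


versions = [
    {"name": "ginfizz_1", "instance": "v1", "flags": 0},
    {"name": "ginfizz_1", "instance": "v2", "flags": 0},
    {"name": "ginfizz_1", "instance": "v3", "flags": FLAG_CURRENT},
]
```

With keep_flags=True one entry survives current_entries() after a round trip; with keep_flags=False none does, so nothing downstream can tell which version is current.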

Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
4 weeks agoposixdriver: propagate object lock attrs across multipart upload
Matt Benjamin [Thu, 12 Feb 2026 19:13:17 +0000 (14:13 -0500)]
posixdriver: propagate object lock attrs across multipart upload

Retention rules can be specified in init-multipart and, if present,
need to propagate to the final object if the upload completes.

Needed for (e.g.) test_object_lock_delete_multipart_object_with_retention

Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
4 weeks agoposixdriver: page in all xattrs in POSIXObject::load_obj_state()
Matt Benjamin [Wed, 11 Feb 2026 21:44:42 +0000 (16:44 -0500)]
posixdriver: page in all xattrs in POSIXObject::load_obj_state()

This seems to be needed for (at least) object lock retention period
checks, e.g., in DeleteObject::execute().

Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
4 weeks agoMerge pull request #68899 from batrick/i76586
Ilya Dryomov [Fri, 29 May 2026 16:02:02 +0000 (18:02 +0200)]
Merge pull request #68899 from batrick/i76586

qa: ignore POOL_FULL for rbd tests exercising full pools

Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
4 weeks agoMerge PR #67683 into main
Patrick Donnelly [Fri, 29 May 2026 15:08:23 +0000 (11:08 -0400)]
Merge PR #67683 into main

* refs/pull/67683/head:
qa/tasks/cbt: construct venv just for cbt
qa/distros: use consistent naming
qa/tasks/nvme_loop: fix nvme loop task for ubuntu noble
qa/distros: add ubuntu_24.04 as supported container host
qa/distros: bump ubuntu_latest.yaml to 24.04
qa/distros: add all/ubuntu_24.04.yaml
qa/suites/rados/encoder: use random supported distro
qa/ceph-ansible: symlink supported-random-distro$
qa/fs/fscrypt: symlink supported-random-distro$
qa/cephmetrics: symlink supported-random-distro$

Reviewed-by: Redouane Kachach <rkachach@redhat.com>
Reviewed-by: Ilya Dryomov <idryomov@redhat.com>
4 weeks agoMerge PR #69163 into main
Patrick Donnelly [Fri, 29 May 2026 15:07:03 +0000 (11:07 -0400)]
Merge PR #69163 into main

* refs/pull/69163/head:
qa/tasks: capture CommandCrashedError when running nvme list cmd

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>
4 weeks agoMerge pull request #66439 from aclamk/aclamk-bs-simpler-flush
Igor Fedotov [Fri, 29 May 2026 15:04:47 +0000 (18:04 +0300)]
Merge pull request #66439 from aclamk/aclamk-bs-simpler-flush

bluestore/bluefs: FileWriter simpler flush

Reviewed-by: Igor Fedotov <igor.fedotov@croit.io>
4 weeks agoMerge pull request #68607 from dheart-joe/wip-bluestore-unshare-blob
Igor Fedotov [Fri, 29 May 2026 15:03:05 +0000 (18:03 +0300)]
Merge pull request #68607 from dheart-joe/wip-bluestore-unshare-blob

os/bluestore: optimize shared blob unsharing during snapshot removal

Reviewed-by: Igor Fedotov <igor.fedotov@croit.io>
4 weeks agoMerge pull request #69166 from sunyuechi/wip-rgw-swift-error-handler-out-of-line
Casey Bodley [Fri, 29 May 2026 15:01:51 +0000 (11:01 -0400)]
Merge pull request #69166 from sunyuechi/wip-rgw-swift-error-handler-out-of-line

rgw: move SWIFT error_handler out-of-line to fix link failure

Reviewed-by: Casey Bodley <cbodley@redhat.com>
4 weeks agoMerge pull request #68898 from gardran/wip-gardran-show-esb-in-metadata
Igor Fedotov [Fri, 29 May 2026 15:01:44 +0000 (18:01 +0300)]
Merge pull request #68898 from gardran/wip-gardran-show-esb-in-metadata

os/bluestore: dump effective elastic shared blobs mode in OSD metadata report

Reviewed-by: Adam Kupczyk <akupczyk@ibm.com>
4 weeks agotools/rados: Remove plain text snippets from rados bench JSON output 66936/head
Jacques Heunis [Thu, 15 Jan 2026 12:11:11 +0000 (12:11 +0000)]
tools/rados: Remove plain text snippets from rados bench JSON output

`rados bench` emits performance stats as its output. It is very helpful
for this output to be in a machine-readable format and the CLI provides
the `--format=json` flag to achieve this.

There are some logs that do not respect the formatter flag though, as
they provide status updates as the tool is running and do not form part
of the output dataset. This prevents the contents of stdout from being
valid JSON which destroys the machine-readability of the output.

To resolve this we gate those status messages behind a check for the
formatter. If any specific formatter is provided we do not emit the
status logs. This leaves the plaintext output largely untouched while
helping the machine-readable output to be well-formed.
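
The gating idea can be sketched as follows. This is a Python model rather than the rados tool's code: run_bench(), the status text, and the numbers are placeholders, and only the notion of a JSON formatter selected with `--format=json` comes from the commit message.

```python
import io
import json


def run_bench(out, formatter=None):
    """Write results to out; emit status lines only when no formatter is set.

    formatter is None for plain text or "json" (other values are not modelled).
    """
    stats = {"ops": 128, "seconds": 10}  # placeholder numbers
    if formatter is None:
        out.write("status: benchmark starting\n")  # placeholder status text
    if formatter == "json":
        out.write(json.dumps(stats) + "\n")
    else:
        for key, value in stats.items():
            out.write(f"{key}: {value}\n")


buf = io.StringIO()
run_bench(buf, formatter="json")
parsed = json.loads(buf.getvalue())  # succeeds because no status line was mixed in
```

Had the status line been written unconditionally, the json.loads() call would fail on the leading non-JSON text.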

Fixes: https://tracker.ceph.com/issues/74370
Signed-off-by: Jacques Heunis <jheunis@bloomberg.net>
4 weeks agoMerge pull request #69144 from gbregman/main
Gil Bregman [Fri, 29 May 2026 14:39:57 +0000 (17:39 +0300)]
Merge pull request #69144 from gbregman/main

nvmeof: Change the NVMEOF image version to 1.8

4 weeks agorgw/datalog: `radosgw-admin` will no longer convert datalog to omap 68941/head
Adam C. Emerson [Thu, 5 Feb 2026 22:02:44 +0000 (17:02 -0500)]
rgw/datalog: `radosgw-admin` will no longer convert datalog to omap

Omap-backed datalogs are deprecated, so we remove the ability to
convert to them.

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
4 weeks agorgw/datalog: Remove `rgw default data log backing` option
Adam C. Emerson [Thu, 5 Feb 2026 19:27:43 +0000 (14:27 -0500)]
rgw/datalog: Remove `rgw default data log backing` option

Omap-backed datalogs are deprecated. This option is removed and we no
longer support creating new clusters using them.

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
4 weeks agoMerge pull request #68095 from lumir-sliva/fix-deprecated-egrep-fgrep
Ilya Dryomov [Fri, 29 May 2026 13:52:14 +0000 (15:52 +0200)]
Merge pull request #68095 from lumir-sliva/fix-deprecated-egrep-fgrep

qa,src: replace deprecated egrep/fgrep with grep -E/grep -F

Reviewed-by: Kefu Chai <k.chai@proxmox.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
4 weeks agoqa: Ignore deprecated EC plugin warning in teuthology tests 69026/head
Jamie Pryde [Fri, 29 May 2026 11:44:56 +0000 (12:44 +0100)]
qa: Ignore deprecated EC plugin warning in teuthology tests

Add DEPRECATED_EC_PLUGIN to the list of health warnings to
ignore in the thrash-erasure-code-* tests that use deprecated
plugins or techniques. It is expected that this warning will
be raised.

Signed-off-by: Jamie Pryde <jamiepry@uk.ibm.com>
4 weeks agocephadm: cephadm: omit --osd-type classic for older ceph-volume 69168/head
Guillaume Abrioux [Fri, 29 May 2026 11:13:52 +0000 (13:13 +0200)]
cephadm: cephadm: omit --osd-type classic for older ceph-volume

tentacle doesn't know that flag yet.
During an upgrade, teuthology tests can break.
With this fix, we only add the flag when osd_type isn't classic.
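
A minimal sketch of that rule, assuming a helper that builds the extra ceph-volume arguments. The function name osd_type_args() and the non-classic value "crimson" are assumptions for illustration, not taken from the cephadm source.

```python
def osd_type_args(osd_type):
    """Return extra ceph-volume arguments for the requested OSD type.

    The flag is omitted for "classic" so that an older ceph-volume,
    which does not know the flag, is never handed it for the default case.
    """
    if osd_type == "classic":
        return []
    return ["--osd-type", osd_type]
```

Omitting the flag for the default type keeps the command line identical to what older releases already accept.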

Fixes: https://tracker.ceph.com/issues/76968
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
4 weeks agorgw: move SWIFT error_handler out-of-line to fix link failure 69166/head
Sun Yuechi [Fri, 29 May 2026 10:39:51 +0000 (18:39 +0800)]
rgw: move SWIFT error_handler out-of-line to fix link failure

The two error_handler overrides are defined inline in rgw_rest_swift.h
and delegate to RGWSwiftWebsiteHandler::error_handler, a non-virtual
function defined in rgw_rest_swift.cc (librgw_a.a). Because the header
is included by rgw_rest.cc, the inline bodies are emitted in
librgw_common.a, which then ODR-uses that symbol across archives.

The link line lists librgw_a.a before librgw_common.a, and GNU ld only
pulls archive members on demand: when librgw_a.a is scanned nothing yet
references RGWSwiftWebsiteHandler::error_handler, so rgw_rest_swift.cc.o
is dropped and the symbol is later unresolved. This shows up as a link
failure with gcc 16 -O2.

Move the two bodies into rgw_rest_swift.cc next to the function they
call, so the ODR-use stays within the same object and the build no
longer depends on archive scan order. No functional change.
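
The scan-order dependence can be reproduced with a small simulation of on-demand archive member selection. This is a simplified model, not GNU ld itself (it ignores archive groups and other options), and the symbol names below are shortened, hypothetical stand-ins.

```python
def link(objects, archives):
    """Simulate one left-to-right pass over static archives.

    objects:  list of (defined, referenced) symbol sets that are always linked.
    archives: list of archives, each a list of (defined, referenced) members.
    Returns the set of symbols still undefined after the pass.
    """
    defined, undefined = set(), set()

    def add(member):
        symbols_defined, symbols_referenced = member
        defined.update(symbols_defined)
        undefined.update(symbols_referenced)
        undefined.difference_update(defined)

    for obj in objects:
        add(obj)
    for archive in archives:
        remaining = list(archive)
        pulled = True
        while pulled:  # rescan this archive until no member resolves anything
            pulled = False
            for member in list(remaining):
                if member[0] & undefined:  # member defines a needed symbol
                    add(member)
                    remaining.remove(member)
                    pulled = True
    return undefined


main_o = ({"main"}, {"rest_entry"})            # hypothetical program object
swift_o = ({"error_handler"}, set())           # stands in for rgw_rest_swift.cc.o
rest_o = ({"rest_entry"}, {"error_handler"})   # stands in for rgw_rest.cc.o
librgw_a = [swift_o]
librgw_common = [rest_o]

failing = link([main_o], [librgw_a, librgw_common])
working = link([main_o], [librgw_common, librgw_a])
```

In the first order the member defining error_handler is skipped because nothing references it yet, so the symbol stays undefined; reversing the order resolves it.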

Signed-off-by: Sun Yuechi <sunyuechi@iscas.ac.cn>
4 weeks agoqa/suites/nvmeof: ignore "have only 1 nvmeof gateway" 69135/head
Vallari Agrawal [Wed, 27 May 2026 12:17:55 +0000 (17:47 +0530)]
qa/suites/nvmeof: ignore "have only 1 nvmeof gateway"

Add "have only 1 nvmeof gateway" to ignorelist.
NVMEOF_SINGLE_GATEWAY is already part of ignorelist
but tests sometimes fail on "have only 1 nvmeof gateway".

Thrasher or scalability tests can trigger this, but there
are enough asserts to ensure all expected gateways are
up, so we can safely ignore this healthcheck warning.

Fixes: https://tracker.ceph.com/issues/75913
Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com>
4 weeks agocrimson/os/seastore: add safety clamp to adaptive hard_limit and crash_floor 69083/head
Shai Fultheim [Fri, 29 May 2026 09:26:39 +0000 (12:26 +0300)]
crimson/os/seastore: add safety clamp to adaptive hard_limit and crash_floor

Signed-off-by: Shai Fultheim <shai.fultheim@gmail.com>
4 weeks agoqa/tasks: capture CommandCrashedError when running nvme list cmd 69163/head
Redouane Kachach [Fri, 29 May 2026 09:09:44 +0000 (11:09 +0200)]
qa/tasks: capture CommandCrashedError when running nvme list cmd

The safe_while retry loop does not catch exceptions, so a
CommandCrashedError from `nvme list` bypasses it entirely. Catch
CommandCrashedError and continue the retry loop instead.
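
The pattern can be sketched with a plain retry loop. This is not the teuthology code: the CommandCrashedError class below is a local stand-in for the exception of the same name, and retry() is a hypothetical helper used in place of safe_while.

```python
class CommandCrashedError(Exception):
    """Local stand-in for the teuthology exception of the same name."""


def retry(action, tries=3):
    """Call action up to `tries` times and return the attempt that succeeded.

    A CommandCrashedError is treated like any other failed attempt instead
    of escaping the loop.
    """
    for attempt in range(1, tries + 1):
        try:
            if action():
                return attempt
        except CommandCrashedError:
            continue  # keep retrying rather than propagating the crash
    raise RuntimeError(f"gave up after {tries} attempts")
```

Without the except clause, the first crash would propagate out of the loop and the remaining attempts would never run.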

Fixes: https://tracker.ceph.com/issues/76984
Signed-off-by: Redouane Kachach <rkachach@ibm.com>