git-server-git.apps.pok.os.sepia.ceph.com Git - ceph.git/log
ceph.git
9 days ago test/rgw/posix: free the quota handler in TestDriver 69285/head
Kefu Chai [Thu, 4 Jun 2026 10:38:24 +0000 (18:38 +0800)]
test/rgw/posix: free the quota handler in TestDriver

TestDriver::init() allocates quota_handler via
RGWQuotaHandler::generate_handler() but nothing frees it. The real
POSIXDriver frees it in finalize(), which the unit tests never call, so
every fixture that runs init() leaks the handler and the stat caches
hanging off it: 274 allocations, ~40KB, all rooted at generate_handler()
under ASan:

  ==6102==ERROR: LeakSanitizer: detected memory leaks
  Direct leak of 3200 byte(s) in 5 object(s) allocated from:
    #1 RGWQuotaHandler::generate_handler(...) src/rgw/rgw_quota.cc:989
    #2 TestDriver::init(...) src/test/rgw/test_rgw_posix_driver.cc:1100
    #3 POSIXDriverTest::SetUp() src/test/rgw/test_rgw_posix_driver.cc:1191
    ...
  SUMMARY: AddressSanitizer: 40099 byte(s) leaked in 274 allocation(s).

So free it in ~TestDriver(), the counterpart to the init() allocation.
~POSIXDriver() is empty and nothing else touches quota_handler, so there
is no double free, and free_handler(nullptr) is a no-op when init()
bailed out early.

Signed-off-by: Kefu Chai <k.chai@proxmox.com>
9 days ago Merge pull request #64805 from NitzanMordhai/wip-nitzan-mgr-from-cache-ttl-to-cache...
NitzanMordhai [Thu, 4 Jun 2026 09:59:45 +0000 (12:59 +0300)]
Merge pull request #64805 from NitzanMordhai/wip-nitzan-mgr-from-cache-ttl-to-cache-changed

Enhance mgr cache - from cache based on ttl to cache based on changes

Reviewed-by: Sridhar Seshasayee <sridhar.seshasayee@ibm.com>
Signed-off-by: Nizamudeen A <nia@redhat.com>
9 days ago Merge pull request #69201 from kginonredhat/issue-75359-iscsi-rocky-doc
Ilya Dryomov [Thu, 4 Jun 2026 09:40:39 +0000 (11:40 +0200)]
Merge pull request #69201 from kginonredhat/issue-75359-iscsi-rocky-doc

doc/rbd: clarify Rocky iSCSI gateway requirements

Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
9 days ago Merge pull request #68734 from rhcs-dashboard/sync-policy-daemon-context
Aashish Sharma [Thu, 4 Jun 2026 08:58:46 +0000 (14:28 +0530)]
Merge pull request #68734 from rhcs-dashboard/sync-policy-daemon-context

mgr/dashboard: multisite sync-policy page should include daemon selection

Reviewed-by: Aashish Sharma <aasharma@redhat.com>
9 days ago Merge pull request #69183 from tchaikov/wip-rgw-posix-leak
Kefu Chai [Thu, 4 Jun 2026 08:04:23 +0000 (16:04 +0800)]
Merge pull request #69183 from tchaikov/wip-rgw-posix-leak

rgw/posix: fix leaks in error paths

Reviewed-by: Matt Benjamin <mbenjamin@redhat.com>
9 days ago Merge pull request #69215 from sunyuechi/wip-legacy-option-headers-race
Kefu Chai [Thu, 4 Jun 2026 07:59:24 +0000 (15:59 +0800)]
Merge pull request #69215 from sunyuechi/wip-legacy-option-headers-race

cmake: link legacy-option-headers from targets that use legacy options

Reviewed-by: Kefu Chai <k.chai@proxmox.com>
10 days ago ceph.spec.in: require c-ares >= 1.28 for ceph-osd-crimson 69259/head
Kautilya Tripathi [Wed, 3 Jun 2026 08:32:08 +0000 (14:02 +0530)]
ceph.spec.in: require c-ares >= 1.28 for ceph-osd-crimson

Seastar's DNS stack uses ares_query_dnsrec when built against c-ares
>= 1.28 (ARES_VERSION >= 0x011c00). Only ceph-osd-crimson links that
path; classic-osd does not, so add the version floor on the crimson
subpackage only.

Rocky Linux 10 shaman builds use docker.io/rockylinux/rockylinux:10
(os-release 10.1), but dnf builddeps resolve against the live Rocky 10
BaseOS/AppStream repos, which track the newest minor and install
c-ares-devel/c-ares 1.34.6. CMake links ceph-osd-crimson against that
library. Teuthology nodes are provisioned as Rocky 10.1 and install only
the requested Ceph packages without a full distro upgrade, so their
baseline c-ares stays at 1.25.0 (< 1.28, no ares_query_dnsrec). Install
succeeds but OSD startup fails with "undefined symbol: ares_query_dnsrec".
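
The version numbers above can be checked against the ARES_VERSION
threshold, which packs major, minor and patch into one byte each. A small
Python sketch of that arithmetic; the helper names are made up for
illustration and are not part of Ceph, Seastar or c-ares:

```python
def ares_version_hex(major: int, minor: int, patch: int = 0) -> int:
    """Pack a c-ares style version as (major << 16) | (minor << 8) | patch."""
    return (major << 16) | (minor << 8) | patch


# The floor named in the commit message: c-ares 1.28.0.
DNSREC_FLOOR = ares_version_hex(1, 28)


def meets_dnsrec_floor(major: int, minor: int, patch: int = 0) -> bool:
    """True when the version is at or above the 0x011c00 threshold."""
    return ares_version_hex(major, minor, patch) >= DNSREC_FLOOR
```

Under this encoding 1.25.0 falls below the floor and 1.34.6 is above it,
which matches the build-time versus runtime mismatch described here.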

Require c-ares >= 1.28 on ceph-osd-crimson so dnf upgrades to a suitable
libcares (1.34.6 is already in Rocky 10.1 baseos) or fails cleanly at
install. Ubuntu crimson CI does not show this mismatch: the same LTS is
used for building and testing, and maintainers do not bump upstream
package versions across an LTS lifecycle (only cherry-picked fixes), so
build-time and runtime libc-ares stay aligned.

Signed-off-by: Kautilya Tripathi <kautilya.tripathi@ibm.com>
10 days ago mgr/dashboard: cached osd_map pop pg_temp 64805/head
Nitzan Mordechai [Thu, 4 Jun 2026 06:52:18 +0000 (06:52 +0000)]
mgr/dashboard: cached osd_map pop pg_temp

get('osd_map') returns the cached object directly, so del and key
assignments were silently corrupting the cache for subsequent callers.
Take a shallow copy before modifying, and use pop() instead of del in
case the cache was already corrupted.

Fixes: https://tracker.ceph.com/issues/72447
Signed-off-by: Nitzan Mordechai <nmordec@ibm.com>
10 days ago mgr/cli_api: pretty_json for mappingproxy fix
Nitzan Mordechai [Thu, 4 Sep 2025 14:24:53 +0000 (14:24 +0000)]
mgr/cli_api: pretty_json for mappingproxy fix

Since we are modifying a read-only Python object, we need to copy it first.
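
As an illustration only, the copy step might look like the sketch below.
It assumes the read-only objects are `types.MappingProxyType` mappings and
tuples; `to_mutable()` is a hypothetical helper, and this `pretty_json()`
is a stand-in rather than the module's real function:

```python
import json
from types import MappingProxyType


def to_mutable(obj):
    """Recursively copy read-only mappings and tuples into dicts and lists."""
    if isinstance(obj, (dict, MappingProxyType)):
        return {key: to_mutable(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_mutable(item) for item in obj]
    return obj


def pretty_json(obj):
    # json.dumps() cannot serialize a mappingproxy directly, so copy first.
    return json.dumps(to_mutable(obj), indent=2, sort_keys=True)
```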

Fixes: https://tracker.ceph.com/issues/72447
Signed-off-by: Nitzan Mordechai <nmordec@ibm.com>
10 days ago mgr/DaemonServer: clarify ok-to-upgrade error message for CRUSH buckets 69270/head
Sridhar Seshasayee [Wed, 6 May 2026 15:11:33 +0000 (20:41 +0530)]
mgr/DaemonServer: clarify ok-to-upgrade error message for CRUSH buckets

Refine the error string in DaemonServer.cc returned by the
ok-to-upgrade command when OSDs in a CRUSH bucket cannot be upgraded.

The original message is ambiguous. It fails to clearly convey that
stopping *any* individual OSD in that specific bucket will drop PGs
offline, meaning no OSDs within that bucket can be safely upgraded at
this time.

Update the phrasing to explicitly state that at least X PGs will go offline
if any OSD out of the total count in that CRUSH bucket is stopped. Also
standardize on capitalized acronyms (PG, OSD, CRUSH) and wrap the bucket
name in single quotes for better log readability.

Fixes: https://tracker.ceph.com/issues/74612
Signed-off-by: Sridhar Seshasayee <sridhar.seshasayee@ibm.com>
10 days ago ceph.spec: declare PyYAML and Jinja2 Requires for cephadm RPM 68368/head
Kobi Ginon [Wed, 20 May 2026 13:09:53 +0000 (16:09 +0300)]
ceph.spec: declare PyYAML and Jinja2 Requires for cephadm RPM

cephadm's zipapp imports yaml and jinja2. When the package is built
with cephadm_bundling and RPM-sourced deps (--without cephadm_pip_deps),
only BuildRequires were listed for jinja2 and no runtime Requires were
declared for PyYAML or Jinja2, so minimal "dnf install cephadm" could
fail with ModuleNotFoundError. The unbundled cephadm case also lacked
PyYAML.

Add matching BuildRequires/Requires in the rpm-bundle branch and
Requires (plus SUSE naming for Jinja2/PyYAML) in the no-bundle branch.

Fixes: https://tracker.ceph.com/issues/75389
Signed-off-by: Kobi Ginon <kginon@redhat.com>
10 days ago Merge pull request #69180 from ljflores/wip-tracker-68669
Laura Flores [Wed, 3 Jun 2026 21:45:40 +0000 (16:45 -0500)]
Merge pull request #69180 from ljflores/wip-tracker-68669

qa/workunits/rados: fetch files via GitHub instead of git.ceph.com

Reviewed-by: Kamoltat Sirivadhna <ksirivad@ibm.com>
10 days ago Merge pull request #69268 from ronen-fr/wip-rf-utst-alienstore-crimson
Ronen Friedman [Wed, 3 Jun 2026 18:24:09 +0000 (21:24 +0300)]
Merge pull request #69268 from ronen-fr/wip-rf-utst-alienstore-crimson

crimson/tests: emit success message in unittest-seastar-alienstore-thread-pool

Reviewed-by: Kamoltat Sirivadhna <ksirivad@redhat.com>
Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
10 days ago Merge pull request #68979 from nhoad/trim-dead-file
Nathan Hoad [Wed, 3 Jun 2026 16:37:36 +0000 (12:37 -0400)]
Merge pull request #68979 from nhoad/trim-dead-file

rgw: Remove blank file.

10 days ago Merge pull request #66154 from dparmar18/fix-ganesha-conf-vstart
Christopher Hoffman [Wed, 3 Jun 2026 16:02:38 +0000 (12:02 -0400)]
Merge pull request #66154 from dparmar18/fix-ganesha-conf-vstart

src/vstart.sh: fix start_ganesha() to avoid crashing nfs-ganesha server

Reviewed-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Christopher Hoffman <choffman@redhat.com>
10 days ago python-common: Move validation function to utils and remove unused 67384/head
Ashwin M. Joshi [Wed, 18 Feb 2026 05:49:12 +0000 (11:19 +0530)]
python-common: Move validation function to utils and remove unused

Fixes: https://tracker.ceph.com/issues/74986
Signed-off-by: Ashwin M. Joshi <ashjosh1@in.ibm.com>
 Conflicts:
src/python-common/ceph/deployment/service_spec.py
src/python-common/ceph/deployment/utils.py

10 days ago Merge pull request #69221 from Jayaprakash-ibm/wip-doc-bluefs-spillover-cleaners
Jaya Prakash [Wed, 3 Jun 2026 15:49:12 +0000 (21:19 +0530)]
Merge pull request #69221 from Jayaprakash-ibm/wip-doc-bluefs-spillover-cleaners

doc/rados/bluestore: Add documentation for the BlueFS spillover cleaner

Reviewed-by: Adam Kupczyk <akupczyk@ibm.com>
10 days ago doc/rbd: clarify Rocky iSCSI gateway requirements 69201/head
Kobi Ginon [Wed, 3 Jun 2026 14:03:09 +0000 (17:03 +0300)]
doc/rbd: clarify Rocky iSCSI gateway requirements

List Rocky Linux 8+ alongside RHEL/CentOS Stream 7.5+. Note that packaged
ceph-iscsi must recognize Rocky in /etc/os-release (ceph-iscsi#282). Add a
short Rocky note under iSCSI targets; expand the overview maintenance
warning with migration guidance to RBD and the NVMe-oF gateway.

Co-authored-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Kobi Ginon <kginon@redhat.com>
10 days ago crimson/tests: emit success message in unittest-seastar-alienstore-thread-pool 69268/head
Ronen Friedman [Wed, 3 Jun 2026 14:17:46 +0000 (14:17 +0000)]
crimson/tests: emit success message in unittest-seastar-alienstore-thread-pool

... avoiding the need to guess the results.

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
10 days ago osdc: deliver neorados completions to associated executor
Casey Bodley [Tue, 2 Jun 2026 20:17:59 +0000 (16:17 -0400)]
osdc: deliver neorados completions to associated executor

while Objecter delivers librados completions directly by calling
Context::complete(), neorados completions are passed in as
boost::asio::any_completion_handler and delivered to an asio executor
via boost::asio::defer() on completion

asio handlers may have an "associated executor" so callers can customize
where these completions are delivered. for example, multithreaded
applications often use strand executors to synchronize completion
handlers and prevent data races between concurrent operations

however, applications like radosgw that depend on strands for
thread-safety did not get it due to the fact that Objecter's
Op::complete() delivered all neorados completions to the default
io_context executor

use boost::asio::get_associated_executor() to respect the handler's
executor affinity, if any. but because the Op's handler is the
type-erased any_completion_handler, its associated executor is also
type-erased as any_completion_executor. that any_completion_executor
doesn't support the blocking::never_t property required by defer/post,
so defer() was changed to dispatch() which may call the handler directly
if Objecter is already running on the requested executor. i assume this
is safe, given that librados' Context-based completions already do this

Fixes: https://tracker.ceph.com/issues/76725
Co-Authored-by: Oguzhan Ozmen <oozmen@bloomberg.net>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
10 days ago qa: updating the 'mds_log_max_segments' to new min value and including 'test_meta_inj... 56634/head
neeraj pratap singh [Thu, 4 Apr 2024 03:56:59 +0000 (09:26 +0530)]
qa: updating the 'mds_log_max_segments' to new min value and including 'test_meta_injection' into test suite

Fixes: https://tracker.ceph.com/issues/64064
Signed-off-by: Neeraj Pratap Singh <neesingh@redhat.com>
10 days ago mds : correction in the description for mds_log_max_segments config
neeraj pratap singh [Tue, 2 Apr 2024 11:03:31 +0000 (16:33 +0530)]
mds : correction in the description for mds_log_max_segments config

Since we use an unsigned integer for the config option
`mds_log_max_segments`, the value '-1' is not permitted,
and there is no need to disable this limit. Hence, remove
that statement from its description.

Fixes: https://tracker.ceph.com/issues/64064
Signed-off-by: Neeraj Pratap Singh <neesingh@redhat.com>
10 days ago Merge pull request #68863 from benhanokh/dedup_ops_api
Gabriel Benhanokh [Wed, 3 Jun 2026 11:24:06 +0000 (14:24 +0300)]
Merge pull request #68863 from benhanokh/dedup_ops_api

rgw/dedup: add Admin OPS REST API for dedup commands

10 days ago doc/rados/bluestore: Add documentation for the BlueFS spillover cleaner 69221/head
Jaya Prakash [Mon, 1 Jun 2026 15:39:24 +0000 (15:39 +0000)]
doc/rados/bluestore: Add documentation for the BlueFS spillover cleaner

Fixes: https://tracker.ceph.com/issues/74319
Signed-off-by: Jaya Prakash <jayaprakash@ibm.com>
10 days ago crimson/os/seastore/cache: avoid double increments 69211/head
Matan Breizman [Wed, 3 Jun 2026 10:08:57 +0000 (10:08 +0000)]
crimson/os/seastore/cache: avoid double increments

Resetting a txn doesn't really create a new one semantically.
Avoid incrementing "created" on reset; otherwise we end up
with inflated numbers where the MUTATE txn created count
is twice as high as the committed count.

Note, "resets" are already tracked as invalidated.
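
A toy Python model of the counting rule, for illustration only.
`TxnStats` and its method names are hypothetical and do not mirror the
seastore cache's actual counters:

```python
class TxnStats:
    """Toy counters for one transaction type."""

    def __init__(self):
        self.created = 0
        self.committed = 0
        self.invalidated = 0

    def create(self):
        self.created += 1

    def reset(self):
        # A reset reuses the same logical transaction, so it is recorded
        # as an invalidation only and does not bump "created".
        self.invalidated += 1

    def commit(self):
        self.committed += 1
```

With one reset per transaction, counting the reset as a creation would
give created == 2 * committed, which is the inflation described above.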

Signed-off-by: Matan Breizman <mbreizma@redhat.com>
10 days ago crimson/os/seastore: move reset_preserve_handle to retry prepare step
Matan Breizman [Wed, 3 Jun 2026 09:43:19 +0000 (09:43 +0000)]
crimson/os/seastore: move reset_preserve_handle to retry prepare step

do_transaction_no_callbacks() relied on with_repeat_trans_intr()
implicitly resetting the transaction before every retry attempt.

That behavior is policy, not retry mechanics, so it should live
with the caller.

Signed-off-by: Matan Breizman <mbreizma@redhat.com>
10 days ago crimson/os/seastore: make retry helper policy-free
Matan Breizman [Wed, 3 Jun 2026 09:39:52 +0000 (09:39 +0000)]
crimson/os/seastore: make retry helper policy-free

with_repeat_trans_intr() was previously mixing retry mechanics with
transaction-specific behavior such as reset and preserve handle.

Instead, introduce a prepare_attempt hook that runs before each retry
attempt. Keep the older overload with no prepare hook (for internal
users).
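
The split between retry mechanics and caller policy can be sketched in
Python as an analogy. The real helper is C++ and Seastar-based; the names
`Conflict` and `with_repeat()` are hypothetical, and running the hook only
before attempts after the first one is an assumption of this sketch:

```python
class Conflict(Exception):
    """Stand-in for a transaction conflict that warrants a retry."""


def with_repeat(func, prepare_attempt=None, max_attempts=5):
    """Call func() until it succeeds; run prepare_attempt() before each retry."""
    for attempt in range(max_attempts):
        # Policy (e.g. resetting a transaction) lives in the caller's hook;
        # the helper itself only knows how to loop.
        if attempt > 0 and prepare_attempt is not None:
            prepare_attempt()
        try:
            return func()
        except Conflict:
            continue
    raise Conflict(f"gave up after {max_attempts} attempts")
```

Callers that need no per-attempt preparation simply omit the hook, which
plays the role of the older overload.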

Also, fix func/args forwarding to avoid re-moving.

Signed-off-by: Matan Breizman <mbreizma@redhat.com>
10 days ago Merge pull request #69007 from MaxKellermann/test__missing_includes
Ilya Dryomov [Wed, 3 Jun 2026 09:39:18 +0000 (11:39 +0200)]
Merge pull request #69007 from MaxKellermann/test__missing_includes

test: add missing includes

Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
10 days ago osd: Add ECOmapJournal class and relocate OmapUpdateType enum class
Matty Williams [Fri, 12 Dec 2025 10:13:11 +0000 (10:13 +0000)]
osd: Add ECOmapJournal class and relocate OmapUpdateType enum class

The ECOmapJournal will be used to store omap updates (in EC pools with
optimisations enabled) that have not yet been committed to the object store.
Added unit tests for this class.
Promoted OmapUpdateType to osd_types.h so that it can be shared across
multiple files without circular dependencies.

Signed-off-by: Matty Williams <Matty.Williams@ibm.com>
10 days ago Merge pull request #66459 from aainscow/ec_direct_reads_pr2
Jon Bailey [Wed, 3 Jun 2026 09:13:42 +0000 (10:13 +0100)]
Merge pull request #66459 from aainscow/ec_direct_reads_pr2

EC Direct Reads

Reviewed-by: Bill Scales <bill_scales@uk.ibm.com>
Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Reviewed-by: Adam Emerson <aemerson@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
10 days ago mgr/telemetry: get mutable copy for pool stats
Nitzan Mordechai [Wed, 3 Sep 2025 12:03:02 +0000 (12:03 +0000)]
mgr/telemetry: get mutable copy for pool stats

Since we are changing the 'application' for the report, we need a
non-read-only copy in case the API call is served from the cache.
Use the 'pool_stats' map directly to avoid copying the pg_dump,
which can be huge.

Fixes: https://tracker.ceph.com/issues/72447
Signed-off-by: Nitzan Mordechai <nmordec@ibm.com>
10 days ago mgr/insights: fix read-only copy before change
Nitzan Mordechai [Wed, 27 Aug 2025 10:49:40 +0000 (10:49 +0000)]
mgr/insights: fix read-only copy before change

Since we are modifying a read-only Python object, we need to copy it first.

Fixes: https://tracker.ceph.com/issues/72447
Signed-off-by: Nitzan Mordechai <nmordec@ibm.com>
10 days ago mgr: replace TTLCache with MgrMapCache and protect api with readonly
Nitzan Mordechai [Thu, 17 Jul 2025 06:17:00 +0000 (06:17 +0000)]
mgr: replace TTLCache with MgrMapCache and protect api with readonly

This patch removes the old TTLCache implementation and introduces
a new generic MgrMapCache driven by a runtime toggle:

- Add `mgr_map_cache_enabled` config option in global.yaml
- Swap out `ttl_cache` for `api_cache` (MgrMapCache) in ActivePyModules
- Update cacheable_get_python() and get_python() to use LFU‐based api_cache
- Add a new get_mutable parameter to the get API call to return a mutable copy
- Invalidate api_cache on notify_all events
- Remove all TTLCache headers, sources, and tests
- Include MgrMapCache.cc in CMakeLists and update BaseMgrModule bindings
- Improve logging around cache hits, misses, and state changes

- ActivePyModules
  * Remove unused update_cache_metrics()
  * Log cache hits/misses inline and only insert into cache when
    enabled+cacheable (with proper Py_INCREF)
  * Switch get_python() to use PyFormatterRO for cacheable routes, PyFormatter otherwise

- MgrMapCache/LFUCache
  * Add can_read_cache()/can_write_cache() helpers and use const& for key parameters
  * Guard perf counter increments and improve debug logging

- PyFormatter
  * Add PyFormatterRO subclass that freezes dicts/lists into read-only
    proxies on the fly

- Python mgr_module
  * Simplify get() to return raw result

This change ensures immutable JSON output on cache hits and tightens up cache logic.
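
The idea of freezing dicts and lists into read-only objects can be
sketched in pure Python. The real PyFormatterRO is C++ that builds Python
objects; `freeze()` is a hypothetical helper, and turning lists into
tuples is an assumption of this sketch rather than a statement about the
actual implementation:

```python
from types import MappingProxyType


def freeze(obj):
    """Recursively wrap dicts in read-only proxies and turn lists into tuples."""
    if isinstance(obj, dict):
        return MappingProxyType({key: freeze(value) for key, value in obj.items()})
    if isinstance(obj, list):
        return tuple(freeze(item) for item in obj)
    return obj
```

Any attempt to assign into the frozen result raises TypeError, so a
caller that wants to modify it has to take a copy first.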

mgr/cli: add cache flush command with proper status reporting

Allow operators to invalidate individual mgr Python caches at runtime
without restarting the manager. Introduces a new CLI command:

  $ ceph mgr cli cache flush <map-name>

which returns success or a clear error if the named cache entry doesn’t
exist or isn’t cacheable. This makes it easy to drop stale cached maps
(e.g. osd_map, mon_map) on demand.
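
The flush semantics described above could be modelled roughly as follows.
This is a toy sketch: `MapCache`, the `CACHEABLE` allow-list and the
specific return codes are assumptions for illustration, not the mgr's
implementation:

```python
import errno

# Hypothetical allow-list; the real set of cacheable maps is not given here.
CACHEABLE = {"osd_map", "mon_map"}


class MapCache:
    """Toy cache keyed by map name."""

    def __init__(self):
        self._entries = {}

    def put(self, name, value):
        # Only cacheable maps are ever stored.
        if name in CACHEABLE:
            self._entries[name] = value

    def flush(self, name):
        """Drop one entry and return (retcode, message)."""
        if name not in CACHEABLE:
            return -errno.EINVAL, f"'{name}' is not cacheable"
        if name not in self._entries:
            return -errno.ENOENT, f"'{name}' is not cached"
        del self._entries[name]
        return 0, f"flushed '{name}'"
```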

Fixes: https://tracker.ceph.com/issues/72447
Signed-off-by: Nitzan Mordechai <nmordec@ibm.com>

mgr: add new unit tests for MgrMapCache

- Guard against null perf‐counter before calling inc(), preventing crashes
- Add “foo” to allowed_keys list (for test coverage)
- Rename and refocus the CMake test target from TTLCache to MgrMapCache
- Introduce test_mgrmapcache.cc with LFUCache tests.
- Remove the obsolete test_ttlcache.cc

Fixes: https://tracker.ceph.com/issues/72447
Signed-off-by: Nitzan Mordechai <nmordec@ibm.com>

mgr/test_cache: add new tests

Add new unit tests for the mgr cache.

Fixes: https://tracker.ceph.com/issues/72447
Signed-off-by: Nitzan Mordechai <nmordec@ibm.com>
10 days ago Merge pull request #68996 from VallariAg/wip-nvmeof-cli-warning
Vallari Agrawal [Wed, 3 Jun 2026 08:07:35 +0000 (13:37 +0530)]
Merge pull request #68996 from VallariAg/wip-nvmeof-cli-warning

mgr/dashboard: show warning message in nvmeof cli

11 days ago crimson/osd: add debug logs for snaptrim and scrub background_process_lock 69240/head
Aishwarya Mathuria [Tue, 2 Jun 2026 13:37:50 +0000 (19:07 +0530)]
crimson/osd: add debug logs for snaptrim and scrub background_process_lock

Add targeted debug logs to help diagnose snaptrim stalls under thrash/pggrow

Signed-off-by: Aishwarya Mathuria <amathuri@redhat.com>
11 days ago Merge pull request #68817 from tchaikov/nvme-of-mon-client
Kefu Chai [Wed, 3 Jun 2026 07:00:43 +0000 (15:00 +0800)]
Merge pull request #68817 from tchaikov/nvme-of-mon-client

cmake,debian: enable ceph-mon-client-nvmeof on Debian derivatives

Reviewed-by: Dan Mick <dan.mick@redhat.com>
11 days ago Merge PR #66748 into main
Venky Shankar [Wed, 3 Jun 2026 06:54:23 +0000 (12:24 +0530)]
Merge PR #66748 into main

* refs/pull/66748/head:
doc: Document that client_dirsize_rbytes confuses rsync

Reviewed-by: Greg Farnum <gfarnum@redhat.com>
Reviewed-by: Anthony D Atri <anthony.datri@gmail.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
11 days ago crimson/osd: fold the split-child setup into handle_split_pg_creation 69120/head
Kefu Chai [Wed, 3 Jun 2026 05:53:06 +0000 (13:53 +0800)]
crimson/osd: fold the split-child setup into handle_split_pg_creation

A readability cleanup with no behaviour change.

split_pg()'s loop still did all the per-child setup inline (core
mapping, make_pg, split_colls, split_into, the snapmapper touch) and
then called handle_split_pg_creation() to kick off the child's
PGAdvanceMap. That is a lot of detail for what the loop is really doing.

So let's move that setup into handle_split_pg_creation() and have it
return the child PG. The loop then just asks it to create each child and
collects the result, and the per-child PeeringCtx never has to leave the
function. Children are still created one at a time, each with its own
PeeringCtx.

Signed-off-by: Kefu Chai <k.chai@proxmox.com>
11 days ago crimson/osd: give each split child its own PeeringCtx
Kefu Chai [Tue, 26 May 2026 08:01:48 +0000 (16:01 +0800)]
crimson/osd: give each split child its own PeeringCtx

This is a readability cleanup, not a bugfix; the existing code is
correct.

handle_split_pg_creation() handed std::move(rctx) to each child's
PGAdvanceMap, reusing the parent's PeeringCtx by moving it into the
child. That leaves the parent's rctx moved-from, so the next split_pg()
iteration writes into the empty husk and moves it again, and
split_stats() runs against it afterwards too. It works, but only because
a moved-from ceph::os::Transaction comes back empty, which is a subtle
thing to rely on.

So let's just give each child its own PeeringCtx: split_colls() and the
snapmapper touch go into the child's context, and we hand that to the
child's do-init PGAdvanceMap. The only behavioural difference is that the
parent's own map-advance writes now commit with the parent instead of
riding into the first child's transaction, which is harmless because the
children are built from the parent's in-memory state.

Signed-off-by: Kefu Chai <k.chai@proxmox.com>
11 days ago Merge pull request #69167 from rhcs-dashboard/fix-76989-main
Aashish Sharma [Wed, 3 Jun 2026 04:36:33 +0000 (10:06 +0530)]
Merge pull request #69167 from rhcs-dashboard/fix-76989-main

mgr/dashboard: Add Sync from/sync from all options on master zone edit

Reviewed-by: Naman Munet <nmunet@redhat.com>
11 days ago mgr/dashboard: multisite sync-policy page should include daemon selection 68734/head
Naman Munet [Mon, 4 May 2026 12:57:53 +0000 (18:27 +0530)]
mgr/dashboard: multisite sync-policy page should include daemon selection

Fixes: https://tracker.ceph.com/issues/71522
Changes include:
- Added daemon selection support to all sync policy endpoints
- Enhanced backend with daemon context awareness
- Fetch only the sync policies from the specified daemon

Signed-off-by: Naman Munet <naman.munet@ibm.com>
11 days ago Merge pull request #69137 from tchaikov/wip-assert-all-fmt
Kefu Chai [Wed, 3 Jun 2026 01:33:47 +0000 (09:33 +0800)]
Merge pull request #69137 from tchaikov/wip-assert-all-fmt

crimson: replace assert_all class with a format-safe function template

Reviewed-by: Ronen Friedman <rfriedma@redhat.com>
11 days ago Merge PR #69222 into main
Patrick Donnelly [Tue, 2 Jun 2026 19:59:10 +0000 (15:59 -0400)]
Merge PR #69222 into main

* refs/pull/69222/head:
qa: install nvme-cli only if distro remains rocky10

Reviewed-by: Redouane Kachach <rkachach@redhat.com>
11 days ago os/bluestore: enable non-buffered IO on WAL read 65275/head
Igor Fedotov [Tue, 2 Sep 2025 15:41:50 +0000 (18:41 +0300)]
os/bluestore: enable non-buffered IO on WAL read

Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>
11 days ago os/bluestore: use non-buffered IO for WAL files
Igor Fedotov [Wed, 27 Aug 2025 19:36:12 +0000 (22:36 +0300)]
os/bluestore: use non-buffered IO for WAL files

Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>
11 days ago os/bluestore: use single bufferlist:splice() call.
Igor Fedotov [Wed, 27 Aug 2025 19:13:23 +0000 (22:13 +0300)]
os/bluestore: use single bufferlist:splice() call.

Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>
11 days ago changes requested by reviewer 68863/head
Gabriel BenHanokh [Tue, 2 Jun 2026 19:12:55 +0000 (19:12 +0000)]
changes requested by reviewer

Signed-off-by: Gabriel BenHanokh <gbenhano@redhat.com>
11 days ago Merge pull request #68941 from adamemerson/wip-rgw-deprecate-omap-datalog
Adam Emerson [Tue, 2 Jun 2026 18:39:46 +0000 (14:39 -0400)]
Merge pull request #68941 from adamemerson/wip-rgw-deprecate-omap-datalog

rgw: Deprecate OMAP datalog

Reviewed-by: J. Eric Ivancich <ivancich@redhat.com>
11 days ago Merge PR #69219 into main
Patrick Donnelly [Tue, 2 Jun 2026 17:59:18 +0000 (13:59 -0400)]
Merge PR #69219 into main

* refs/pull/69219/head:
script/backport-create-issue: catch errors during traversal

Reviewed-by: Yuri Weinstein <yweins@redhat.com>
11 days ago Merge pull request #69174 from dang/wip-dang-merge-standalone
Daniel Gryniewicz [Tue, 2 Jun 2026 16:37:31 +0000 (12:37 -0400)]
Merge pull request #69174 from dang/wip-dang-merge-standalone

Merge rgw-standalone

11 days ago ceph.spec.in: add support for ceph-smb-ctl entrypoint script
John Mulligan [Wed, 22 Apr 2026 18:06:20 +0000 (14:06 -0400)]
ceph.spec.in: add support for ceph-smb-ctl entrypoint script

This package is conditional on the new `pypkg` packaging mode.

Signed-off-by: John Mulligan <jmulligan@redhat.com>
11 days ago python-common: add entry point for smb ctl to pyproject.toml
John Mulligan [Wed, 15 Apr 2026 19:00:45 +0000 (15:00 -0400)]
python-common: add entry point for smb ctl to pyproject.toml

Allow invoking the ceph.smb.ctl as a script `ceph-smb-ctl`.

Signed-off-by: John Mulligan <jmulligan@redhat.com>
11 days ago Merge pull request #68756 from phlogistonjohn/jjm-smb-ctl-tool
John Mulligan [Tue, 2 Jun 2026 14:42:52 +0000 (10:42 -0400)]
Merge pull request #68756 from phlogistonjohn/jjm-smb-ctl-tool

smb: add a ceph based smb remote control client tool

Reviewed-by: Avan Thakkar <athakkar@redhat.com>
Reviewed-by: Anoop C S <anoopcs@cryptolab.net>
11 days ago mgr/cephadm: Don't skip OSDs with non-empty osdspec_affinity 69247/head
Redouane Kachach [Tue, 2 Jun 2026 14:33:17 +0000 (16:33 +0200)]
mgr/cephadm: Don't skip OSDs with non-empty osdspec_affinity

`ceph cephadm osd activate` calls `deploy_osd_daemons_for_existing_osds`
with a synthetic DriveGroupSpec where service_id=''. Commit fbe3a053
introduced an unconditional osdspec_affinity filter:

    if osd['tags']['ceph.osdspec_affinity'] != spec.service_id:
        continue

The fix is to only enforce the affinity check when spec.service_id is
non-empty. An empty service_id means the caller is osd activate, which
should adopt any existing OSD regardless of its affinity tag.
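
A minimal sketch of the conditional check, assuming OSDs are plain dicts
with a 'tags' mapping as in the snippet above. `adoptable_osds()` is a
hypothetical helper, and using .get() with an empty default for a missing
tag is an assumption of this sketch:

```python
def adoptable_osds(osds, service_id):
    """Return the OSDs that a spec with this service_id should deploy."""
    selected = []
    for osd in osds:
        affinity = osd["tags"].get("ceph.osdspec_affinity", "")
        # Enforce the affinity check only for a non-empty service_id;
        # an empty one means "osd activate", which adopts everything.
        if service_id and affinity != service_id:
            continue
        selected.append(osd)
    return selected
```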

Fixes: https://tracker.ceph.com/issues/76979
Signed-off-by: Redouane Kachach <rkachach@ibm.com>
11 days ago doc/cephadm: fix typo and missing quote in activate-existing-osds 69246/head
Emmanuel Ameh [Tue, 2 Jun 2026 14:04:11 +0000 (15:04 +0100)]
doc/cephadm: fix typo and missing quote in activate-existing-osds

Correct "clea" to "clear" and add the missing closing quotation mark
after the description of an "online" host in the OSD activation section.

Fixes: https://tracker.ceph.com/issues/77075
Signed-off-by: Emmanuel Ameh <eameh@contractor.linuxfoundation.org>
11 days ago Merge pull request #68774 from aclamk/aclamk-doc-bs-rocksdb-perf-counters
Adam Kupczyk [Tue, 2 Jun 2026 13:16:26 +0000 (15:16 +0200)]
Merge pull request #68774 from aclamk/aclamk-doc-bs-rocksdb-perf-counters

doc/rados/bluestore: RockDB cache shards, perf counters

11 days ago Merge pull request #68430 from Jayaprakash-ibm/wip-bluefs-spillover-cleaner-rework
Jaya Prakash [Tue, 2 Jun 2026 13:04:54 +0000 (18:34 +0530)]
Merge pull request #68430 from Jayaprakash-ibm/wip-bluefs-spillover-cleaner-rework

os/bluestore: BlueFS Spillover Cleaner Evolution

Reviewed-by: Adam Kupczyk <akupczyk@ibm.com>
11 days ago Merge PR #67709 into main
Venky Shankar [Tue, 2 Jun 2026 10:12:16 +0000 (15:42 +0530)]
Merge PR #67709 into main

* refs/pull/67709/head:
tools/cephfs: always execute scan_{extents,inodes,frags} and cleanup

Reviewed-by: Edwin Rodriguez <edwin.rodriguez1@ibm.com>
11 days ago Merge pull request #69037 from dparmar18/i76728
Venky Shankar [Tue, 2 Jun 2026 08:55:12 +0000 (14:25 +0530)]
Merge pull request #69037 from dparmar18/i76728

mds: persist session auth_name in ESession journal event

Reviewed-by: Christopher Hoffman <choffman@redhat.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
11 days ago rgw/posix: start the Inotify thread last, after the rest is built 69233/head
Kefu Chai [Tue, 2 Jun 2026 07:48:41 +0000 (15:48 +0800)]
rgw/posix: start the Inotify thread last, after the rest is built

f62e811f9ef fixed the wfd/efd init-order race but missed a sibling: thrd
was still declared before map_mutex, the watch maps and the shutdown flag.
Members come up in declaration order, so building thrd kicks off ev_loop()
while those are still uninitialized.

That is bad news, because ev_loop() reads shutdown and, when an event
arrives, locks map_mutex and pokes at the maps. Doing any of that before
they are constructed is undefined behavior: reading the shutdown atomic
before its initializer has even stored false, or locking a std::mutex that
does not exist yet. Making shutdown a std::atomic<bool> made concurrent
access well-defined, but that does not help if the load happens before the
object is constructed.

So just declare thrd last, and the thread will not start until everything
it touches is ready. wfd/efd stay ahead of it, so the earlier fix still
holds.

helgrind caught the shutdown read in ev_loop() racing its own initializer
in the constructor.

Fixes: https://tracker.ceph.com/issues/75601
Signed-off-by: Kefu Chai <k.chai@proxmox.com>
11 days ago Merge pull request #67717 from leonidc/delay-failback
leonidc [Tue, 2 Jun 2026 08:03:36 +0000 (11:03 +0300)]
Merge pull request #67717 from leonidc/delay-failback

nvmeofgw: delay failback

11 days ago objectstore/test_kv: Unittest for util_divide_key_range 68981/head
Adam Kupczyk [Mon, 18 May 2026 16:33:45 +0000 (16:33 +0000)]
objectstore/test_kv: Unittest for util_divide_key_range

Extensive tests for the quality of KeyValueDB::util_divide_key_range.
They test the speed and correctness of the split.
There are 2 control modes:
1) on Jenkins (detected by JENKINS_HOME), run with reduced scope
2) passing env VERBOSE=1 gives more details

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
12 days ago kv/KeyValueDB: New utility function util_divide_key_range
Adam Kupczyk [Fri, 15 May 2026 16:07:07 +0000 (16:07 +0000)]
kv/KeyValueDB: New utility function util_divide_key_range

Significant reshuffle. Cleaned up loops.
Points scanned on the db were [size]->[key]. Now they are [key]->[size],
which is better since keys are unique by design, whereas the size
calculation is at the mercy of RocksDB's estimation precision.

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
12 days ago Merge pull request #69105 from guits/fix-bypass_workqueue
Guillaume Abrioux [Tue, 2 Jun 2026 06:10:59 +0000 (08:10 +0200)]
Merge pull request #69105 from guits/fix-bypass_workqueue

ceph-volume: detect rotational media under dm-crypt for workqueue bypass

12 days ago src/test/reclaim: test session reclaim after mds failover 69037/head
Dhairya Parmar [Mon, 25 May 2026 12:01:33 +0000 (17:31 +0530)]
src/test/reclaim: test session reclaim after mds failover

ensure that the new active MDS reads the auth_name from the ESession
event and assigns it to the new session that MDS creates during journal
replay.

NOTE: the MDS failover is triggered by sending the "respawn" command to
the active MDS using libcephfs's ceph_mds_command().

Fixes: https://tracker.ceph.com/issues/76728
Signed-off-by: Dhairya Parmar <dparmar@redhat.com>
12 days agomds: persist session auth_name in ESession journal event
Dhairya Parmar [Wed, 20 May 2026 21:18:15 +0000 (02:48 +0530)]
mds: persist session auth_name in ESession journal event

So that it can be applied to the freshly created session when
ESession::replay recreates a session because the OMAP version fell
behind the ESession cmapv. Otherwise the newly created session would be
rejected as a target when a client tries to reclaim it.

Fixes: https://tracker.ceph.com/issues/76728
Signed-off-by: Dhairya Parmar <dparmar@redhat.com>
12 days agoMerge pull request #68098 from sunyuechi/riscv-isa-l-support
Kefu Chai [Tue, 2 Jun 2026 03:47:35 +0000 (11:47 +0800)]
Merge pull request #68098 from sunyuechi/riscv-isa-l-support

isa-l: enable on RISC-V

Reviewed-by: Kefu Chai <k.chai@proxmox.com>
12 days agoMerge pull request #69121 from tchaikov/wip-seastore-rolling-in-bg
Kefu Chai [Tue, 2 Jun 2026 02:14:48 +0000 (10:14 +0800)]
Merge pull request #69121 from tchaikov/wip-seastore-rolling-in-bg

crimson/seastore: make RecordSubmitter::wait_available() idempotent

Reviewed-by: Matan Breizman <mbreizma@redhat.com>
12 days agoMerge pull request #69214 from tchaikov/wip-cephadm-iscsi-gw
Kefu Chai [Mon, 1 Jun 2026 23:35:37 +0000 (07:35 +0800)]
Merge pull request #69214 from tchaikov/wip-cephadm-iscsi-gw

qa/cephadm: query iSCSI gateway FQDN from inside the container

Reviewed-by: Redouane Kachach <rkachach@ibm.com>
12 days agoMerge pull request #69026 from jamiepryde/ec-profile-deprecation-warning
SrinivasaBharathKanta [Mon, 1 Jun 2026 23:19:05 +0000 (04:49 +0530)]
Merge pull request #69026 from jamiepryde/ec-profile-deprecation-warning

Add health warning for deprecated EC plugins and techniques

12 days agoMerge PR #68362 into main
Patrick Donnelly [Mon, 1 Jun 2026 19:33:50 +0000 (15:33 -0400)]
Merge PR #68362 into main

* refs/pull/68362/head:
doc: squid 19.2.4 release notes

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>
Reviewed-by: Ilya Dryomov <idryomov@redhat.com>
Reviewed-by: Redouane Kachach <rkachach@redhat.com>
12 days agoMerge pull request #66936 from jacquesh/remove-text-output-from-rados-bench-json
Radoslaw Zarzynski [Mon, 1 Jun 2026 19:30:25 +0000 (21:30 +0200)]
Merge pull request #66936 from jacquesh/remove-text-output-from-rados-bench-json

tools/rados: Remove plain text snippets from rados bench JSON output

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>
Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
12 days agoMerge pull request #69061 from jzhu116-bloomberg/wip-70346
Radoslaw Zarzynski [Mon, 1 Jun 2026 19:00:45 +0000 (21:00 +0200)]
Merge pull request #69061 from jzhu116-bloomberg/wip-70346

osd: unregister admin socket commands in fast shutdown

Reviewed-by: Kefu Chai <k.chai@proxmox.com>
Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
12 days agocommon/options, os/bluestore: add debug option to force bluefs files onto slow device 68430/head
Jaya Prakash [Thu, 7 May 2026 12:09:07 +0000 (12:09 +0000)]
common/options, os/bluestore: add debug option to force bluefs files onto slow device

Fixes: https://tracker.ceph.com/issues/74319
Signed-off-by: Jaya Prakash <jayaprakash@ibm.com>
12 days agoos/bluestore: start/stop BlueFS spillover cleaner on config change
Jaya Prakash [Mon, 16 Mar 2026 19:22:49 +0000 (19:22 +0000)]
os/bluestore: start/stop BlueFS spillover cleaner on config change

Fixes: https://tracker.ceph.com/issues/74319
Signed-off-by: Jaya Prakash <jayaprakash@ibm.com>
(cherry picked from commit dc768b782d54cc6a5dee29a9c4f358e8b9183aa6)

12 days agoos/bluestore: migrated files in 128MB chunks
Jaya Prakash [Fri, 15 May 2026 17:07:32 +0000 (17:07 +0000)]
os/bluestore: migrated files in 128MB chunks

Signed-off-by: Jaya Prakash <jayaprakash@ibm.com>
12 days agoos/bluestore: Spillover Cleaner Thread implementation in BlueFS
Jaya Prakash [Thu, 16 Apr 2026 15:30:28 +0000 (15:30 +0000)]
os/bluestore: Spillover Cleaner Thread implementation in BlueFS

Fixes: https://tracker.ceph.com/issues/74319
Signed-off-by: Jaya Prakash <jayaprakash@ibm.com>
12 days agocommon/options: add bluefs_spillover_cleaner option
Jaya Prakash [Mon, 16 Mar 2026 19:23:05 +0000 (19:23 +0000)]
common/options: add bluefs_spillover_cleaner option

Fixes: https://tracker.ceph.com/issues/74319
Signed-off-by: Jaya Prakash <jayaprakash@ibm.com>
12 days agocrimson/os/seastore: enforce capacity in RBMCleaner::try_reserve_projected_usage 69153/head
Shai Fultheim [Sun, 24 May 2026 11:19:56 +0000 (14:19 +0300)]
crimson/os/seastore: enforce capacity in RBMCleaner::try_reserve_projected_usage

RBMCleaner::try_reserve_projected_usage always returned true and just
incremented stats.projected_used_bytes. The EPM BackgroundProcess
relies on the return value to block IO when the device is full, so
this effectively disabled backpressure for the RANDOM_BLOCK_SSD
backend: concurrent transactions could each reserve unbounded amounts,
and the over-commit surfaced downstream as `unexpected enospc` asserts
in the data path (object_data_handler.cc and friends, where ENOSPC is
treated as crimson::ct_error::enospc::assert_failure because the
existing infrastructure assumes ENOSPC is impossible). The OSD aborted
under sustained random-write workloads that exceeded RBM capacity.

Compute the device's data capacity as total - journal, subtract a 5%
headroom (for metadata writes and fragmentation slack the AVL allocator
cannot pack into), and reject reservations that would push
used + projected over the line. The existing EPM blocking-IO path
(extent_placement_manager.cc:726) already queues the IO until
release_projected_usage wakes it, so no caller-side changes are needed.
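
The reservation check described above can be sketched roughly as below. This is an illustrative model under stated assumptions, not the RBMCleaner code: the class name `ReservationSketch`, its fields and the integer 5% computation are hypothetical, and only the rule (capacity = total - journal, minus 5% headroom, reject when used + projected would cross the line) comes from the message.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the capacity check; names are illustrative only.
class ReservationSketch {
  std::uint64_t total_bytes;
  std::uint64_t journal_bytes;
  std::uint64_t used_bytes = 0;
  std::uint64_t projected_used_bytes = 0;

public:
  ReservationSketch(std::uint64_t total, std::uint64_t journal)
    : total_bytes(total), journal_bytes(journal) {}

  // Data capacity is total - journal, minus a 5% headroom.
  std::uint64_t limit() const {
    const std::uint64_t data = total_bytes - journal_bytes;
    return data - data / 20;
  }

  // Returns false instead of over-committing, so the caller can block.
  bool try_reserve(std::uint64_t bytes) {
    const std::uint64_t committed = used_bytes + projected_used_bytes;
    if (committed > limit() || bytes > limit() - committed) {
      return false;
    }
    projected_used_bytes += bytes;
    return true;
  }

  // Gives back a reservation that was previously granted.
  void release(std::uint64_t bytes) { projected_used_bytes -= bytes; }
};
```

A rejected reservation changes no state, so a caller can simply retry after a release.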

This is the minimal fix to keep the OSD alive under sustained random
writes. It converts a crash into a stall: once the device fills and
the cleaner has nothing to free (RBMCleaner::clean_space is still a
TODO), new writes block indefinitely instead of crashing. Verified
against an 8-job 1MB random-write fio (--size 63g, 90GB RBM, 3GB
journal): 68 GB user-written, host WAF 1.696, OSD survives, watchdog
kills fio after slow-ops timeout. Without this patch the same workload
asserts in the data path.

The headroom is intentionally generous (5%) because there is no GC
yet; once RBMCleaner::clean_space() exists, the headroom can shrink.

Fixes: https://tracker.ceph.com/issues/75598
Signed-off-by: Shai Fultheim <shai.fultheim@gmail.com>
12 days agorgw: SSE-KMS: Handle Testing Key Per Object 61256/head
Marcel Lauhoff [Tue, 5 May 2026 12:21:03 +0000 (14:21 +0200)]
rgw: SSE-KMS: Handle Testing Key Per Object

The testing backend uses a 'keysel' attribute to derive a per-object
key from the KEK in the config. A single key_id with distinct keysel
values yields different keys, which need to be cached as such.

Add the keysel to the cache key id to handle these collisions.

Signed-off-by: Marcel Lauhoff <marcel.lauhoff@clyso.com>
On-behalf-of: SAP marcel.lauhoff@sap.com

12 days agorgw: KMS Cache Shutdown: Reaper first
Marcel Lauhoff [Tue, 10 Mar 2026 10:51:30 +0000 (11:51 +0100)]
rgw: KMS Cache Shutdown: Reaper first

1. Don't delete the KMS cache before draining/joining the frontend
coroutine threads. They may still depend on the KMS cache.
2. Stop the TTL reaper early to get it off the coroutine pool.

Signed-off-by: Marcel Lauhoff <marcel.lauhoff@clyso.com>
On-behalf-of: SAP marcel.lauhoff@sap.com

12 days agorgw: KMS Cache: Reset Reaper State in Async+Threaded
Marcel Lauhoff [Fri, 6 Mar 2026 09:44:08 +0000 (10:44 +0100)]
rgw: KMS Cache: Reset Reaper State in Async+Threaded

Reset reaper state to monostate in the async and threaded case.
Fixes a possible use after free in the async reaper case.

Signed-off-by: Marcel Lauhoff <marcel.lauhoff@clyso.com>
On-behalf-of: SAP marcel.lauhoff@sap.com

12 days agocommon/keyring: Fix reset error checking
Marcel Lauhoff [Tue, 3 Mar 2026 20:25:05 +0000 (21:25 +0100)]
common/keyring: Fix reset error checking

Signed-off-by: Marcel Lauhoff <marcel.lauhoff@clyso.com>
On-behalf-of: SAP marcel.lauhoff@sap.com

12 days agocommon/web_cache: Fix _sieve_hand dangling pointer
Marcel Lauhoff [Tue, 3 Mar 2026 20:24:57 +0000 (21:24 +0100)]
common/web_cache: Fix _sieve_hand dangling pointer

sieve_expire_erase_unmutexed did not update the sieve hand passed in,
as advertised. Make it return the updated hand and use that to
update the global _sieve_hand in expire_erase.
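
The pattern of returning the updated hand can be sketched with a plain `std::list`. This is a simplified illustration, not the WebCache code: `Entries` and `erase_entry` are hypothetical, and the real hand may move in a different direction than the iterator returned by `erase()`.

```cpp
#include <cassert>
#include <iterator>
#include <list>

using Entries = std::list<int>;

// Erase `victim` and return the hand the caller must use afterwards.
// If the hand pointed at the erased entry, it would dangle, so it is
// replaced by the iterator that std::list::erase() returns.
Entries::iterator erase_entry(Entries& entries,
                              Entries::iterator victim,
                              Entries::iterator hand) {
  // Compare before erasing: `victim` is invalid after erase().
  const bool hand_hit = (hand == victim);
  Entries::iterator next = entries.erase(victim);
  return hand_hit ? next : hand;
}
```

The caller assigns the result back to its stored hand, e.g. `hand = erase_entry(entries, victim, hand);`, instead of assuming the old iterator is still valid.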

Signed-off-by: Marcel Lauhoff <marcel.lauhoff@clyso.com>
On-behalf-of: SAP marcel.lauhoff@sap.com

12 days agocommon/web_cache: delete perfcounters on destruction
Marcel Lauhoff [Tue, 3 Mar 2026 20:24:26 +0000 (21:24 +0100)]
common/web_cache: delete perfcounters on destruction

Signed-off-by: Marcel Lauhoff <marcel.lauhoff@clyso.com>
On-behalf-of: SAP marcel.lauhoff@sap.com

12 days agorgw: SSE-KMS: Fix wrong cache key in lookup_or() call
Marcel Lauhoff [Tue, 3 Mar 2026 20:24:34 +0000 (21:24 +0100)]
rgw: SSE-KMS: Fix wrong cache key in lookup_or() call

Signed-off-by: Marcel Lauhoff <marcel.lauhoff@clyso.com>
On-behalf-of: SAP marcel.lauhoff@sap.com

12 days agorgw: SSE-KMS: Handle Vault Transit Key Per Object
Marcel Lauhoff [Tue, 3 Mar 2026 20:24:10 +0000 (21:24 +0100)]
rgw: SSE-KMS: Handle Vault Transit Key Per Object

KMS backends Barbican, Vault KV, and KMIP have a static key per
key_id. However, with Vault Transit, each object has a unique DEK
wrapped by the transit key.

Keying the cache with key_id in Transit mode results in only the first
DEK being cached and reused for all subsequent objects.

Fix this by appending a hash of the wrapped DEK to the cache key.
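
A minimal sketch of such a cache key, assuming a simple non-cryptographic hash. The message does not say which hash or key layout RGW uses, so FNV-1a, the `:` separator and the names `fnv1a64` and `transit_cache_key` are illustrative assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// 64-bit FNV-1a, used here only as a stand-in hash (assumption).
std::uint64_t fnv1a64(const std::string& data) {
  std::uint64_t h = 0xcbf29ce484222325ULL;  // FNV offset basis
  for (unsigned char c : data) {
    h ^= c;
    h *= 0x100000001b3ULL;                  // FNV prime
  }
  return h;
}

// Hypothetical key layout: "<key_id>:<hash of wrapped DEK>".
std::string transit_cache_key(const std::string& key_id,
                              const std::string& wrapped_dek) {
  return key_id + ":" + std::to_string(fnv1a64(wrapped_dek));
}
```

Two objects that share a key_id but carry different wrapped DEKs then map to different cache entries, while repeated lookups for the same object still hit the same entry.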

Signed-off-by: Marcel Lauhoff <marcel.lauhoff@clyso.com>
On-behalf-of: SAP marcel.lauhoff@sap.com

12 days agorgw: Fix typos in perf counter descriptions
Marcel Lauhoff [Tue, 3 Mar 2026 20:24:48 +0000 (21:24 +0100)]
rgw: Fix typos in perf counter descriptions

Signed-off-by: Marcel Lauhoff <marcel.lauhoff@clyso.com>
On-behalf-of: SAP marcel.lauhoff@sap.com

12 days agoPendingReleaseNotes: Add SSE-KMS Cache
Marcel Lauhoff [Fri, 19 Dec 2025 09:20:13 +0000 (10:20 +0100)]
PendingReleaseNotes: Add SSE-KMS Cache

Signed-off-by: Marcel Lauhoff <marcel.lauhoff@clyso.com>
On-behalf-of: SAP marcel.lauhoff@sap.com

12 days agorgw: SSE-KMS Secrets Cache
Marcel Lauhoff [Thu, 19 Dec 2024 14:41:30 +0000 (15:41 +0100)]
rgw: SSE-KMS Secrets Cache

Add SSE Key Management System secrets cache to RGW.

It is common to have secrets shared by many if not all objects in a
bucket. Without RGW-side caching, every PUT/GET causes a request to
an external KMS. This not only adds load to the KMS, but also slows
down reads and writes.

Combine WebCache, ceph::async::call_once and LinuxKeyringSecret into
KMSCache. WebCache stores async::once_result to wrap results of a KMS
secret fetch to mitigate cache stampedes (concurrent cache requests to
the same key coalesce into one). The retrieved secrets are stored in
the Linux kernel key retention service (LinuxKeyringSecret) for
safekeeping and retrieval by subsequent requests. KMSCache adds a TTL
reaper and life cycle.

Cache values and error handling: The cache stores positive
fetch results, permanent errors (e.g. key does not exist) and
transient errors (e.g. fetch timeout), each with a different TTL.
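
The per-outcome TTL selection can be sketched as below. The enum, struct and function names are hypothetical, and the message gives no concrete TTL values, so the policy is passed in rather than hard-coded.

```cpp
#include <cassert>
#include <chrono>

// The three kinds of cached fetch results named in the message.
enum class FetchOutcome { Success, PermanentError, TransientError };

// Hypothetical policy holder; the values would come from configuration.
struct TtlPolicy {
  std::chrono::seconds success;
  std::chrono::seconds permanent_error;
  std::chrono::seconds transient_error;
};

// Pick the TTL that applies to a cached result of the given kind.
std::chrono::seconds ttl_for(FetchOutcome outcome, const TtlPolicy& policy) {
  switch (outcome) {
    case FetchOutcome::Success:
      return policy.success;
    case FetchOutcome::PermanentError:
      return policy.permanent_error;
    case FetchOutcome::TransientError:
      return policy.transient_error;
  }
  return policy.transient_error;  // not reached for valid enum values
}
```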

Unit tests to cover cached / uncached KMS retrieve and runtime cache
disable via config.

Add perf counter `kms_fetch_lat` to track KMS fetch request latency
and error counters to track permanent, transient and key store
errors.

Signed-off-by: Marcel Lauhoff <marcel.lauhoff@clyso.com>
Fixes: https://tracker.ceph.com/issues/68524
On-behalf-of: SAP marcel.lauhoff@sap.com

12 days agocommon: Refactor LinuxKeyringSecret into Keyring Interface
Marcel Lauhoff [Thu, 18 Dec 2025 18:42:07 +0000 (19:42 +0100)]
common: Refactor LinuxKeyringSecret into Keyring Interface

Goal: Support multiple backends and faking / mocking for testing.

Add abstract classes Keyring (factory) and KeyringSecret. Add
"Unsupported" implementation for non-Linux platforms. Add a get_best
factory function that currently returns the LinuxKeyring impl on Linux
or Unsupported elsewhere.
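
The shape of such an interface might look like the sketch below. All signatures are assumptions made for illustration; only the class roles (Keyring as factory, KeyringSecret, an "Unsupported" implementation and a selecting factory function) come from the message. An in-memory fake stands in for the Linux backend, and `get_best_sketch` takes a flag where the real function would pick by platform.

```cpp
#include <cassert>
#include <memory>
#include <optional>
#include <string>

// Abstract handle to one stored secret (hypothetical signature).
class KeyringSecret {
public:
  virtual ~KeyringSecret() = default;
  virtual std::optional<std::string> read() const = 0;
};

// Abstract factory for secrets (hypothetical signature).
class Keyring {
public:
  virtual ~Keyring() = default;
  virtual bool supported() const = 0;
  virtual std::unique_ptr<KeyringSecret> add(const std::string& name,
                                             const std::string& payload) = 0;
};

// Fallback for platforms without a keyring: every add() fails.
class UnsupportedKeyring : public Keyring {
public:
  bool supported() const override { return false; }
  std::unique_ptr<KeyringSecret> add(const std::string&,
                                     const std::string&) override {
    return nullptr;
  }
};

// In-memory fake, useful for unit tests.
class FakeSecret : public KeyringSecret {
  std::string payload;
public:
  explicit FakeSecret(std::string p) : payload(std::move(p)) {}
  std::optional<std::string> read() const override { return payload; }
};

class FakeKeyring : public Keyring {
public:
  bool supported() const override { return true; }
  std::unique_ptr<KeyringSecret> add(const std::string&,
                                     const std::string& payload) override {
    return std::make_unique<FakeSecret>(payload);
  }
};

// Stand-in for the selecting factory; the flag replaces platform detection.
std::unique_ptr<Keyring> get_best_sketch(bool backend_available) {
  if (backend_available) {
    return std::make_unique<FakeKeyring>();
  }
  return std::make_unique<UnsupportedKeyring>();
}
```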

Signed-off-by: Marcel Lauhoff <marcel.lauhoff@clyso.com>
On-behalf-of: SAP marcel.lauhoff@sap.com

12 days agodoc: Document RGW KMS Cache
Marcel Lauhoff [Fri, 27 Jun 2025 10:22:06 +0000 (12:22 +0200)]
doc: Document RGW KMS Cache

Add caching section to the RGW Encryption docs. Add cache
settings to the RGW configuration reference.

Signed-off-by: Marcel Lauhoff <marcel.lauhoff@clyso.com>
On-behalf-of: SAP marcel.lauhoff@sap.com

12 days agorgw: Early Linux process keyring initialization
Marcel Lauhoff [Fri, 13 Jun 2025 14:45:41 +0000 (16:45 +0200)]
rgw: Early Linux process keyring initialization

To allow RGW threads to share possession of process keyring keys, the
keyring must be created before any child thread adds keys.

Since we only use the process keyring for KMS cache secrets, only
initialize the keyring if it is enabled on startup.

Signed-off-by: Marcel Lauhoff <marcel.lauhoff@clyso.com>
On-behalf-of: SAP marcel.lauhoff@sap.com

12 days agocommon: Add Linux Keyring Secret Store Wrapper
Marcel Lauhoff [Fri, 25 Apr 2025 14:27:57 +0000 (16:27 +0200)]
common: Add Linux Keyring Secret Store Wrapper

Add RAII wrapper around the Linux Key Retention Service
add_key(2), keyctl_read(3), keyctl_invalidate(3)

Signed-off-by: Marcel Lauhoff <marcel.lauhoff@clyso.com>
On-behalf-of: SAP marcel.lauhoff@sap.com

12 days agotest: Add Secrets Store µBenchmarks
Marcel Lauhoff [Thu, 20 Mar 2025 16:27:11 +0000 (17:27 +0100)]
test: Add Secrets Store µBenchmarks

Benchmark:
- Linux Kernel Key Retention Service (kernel keystore) [0]
- memfd_secret(2)
- plain memory

Tests:
- Random reads
- (keystore) Write, Read, Remove

[0] https://docs.kernel.org/security/keys/core.html

Signed-off-by: Marcel Lauhoff <marcel.lauhoff@clyso.com>
On-behalf-of: SAP marcel.lauhoff@sap.com

12 days agotest: Add Cache benchmarks
Marcel Lauhoff [Fri, 21 Feb 2025 11:42:07 +0000 (12:42 +0100)]
test: Add Cache benchmarks

Add Google benchmark [0] based micro benchmarks for Cache/LRU
implementations in the Ceph code base.

[0] https://github.com/google/benchmark

Signed-off-by: Marcel Lauhoff <marcel.lauhoff@clyso.com>
On-behalf-of: SAP marcel.lauhoff@sap.com

12 days agocommon: Add WebCache
Marcel Lauhoff [Tue, 11 Feb 2025 12:48:22 +0000 (13:48 +0100)]
common: Add WebCache

A cache data structure for values that need to be retrieved from
outside systems (e.g. Key Management Systems).

Features:
- Thread safe, optimized for concurrent lookups and cache hits
- Entry TTL expiration
- Cache replacement strategy tuned to "web" workloads (SIEVE)
- Performance Counters on hit, miss, expire, size, capacity, clears

Signed-off-by: Marcel Lauhoff <marcel.lauhoff@clyso.com>
On-behalf-of: SAP marcel.lauhoff@sap.com

12 days agocommon/async: add call_once() algorithm for optional_yield
Casey Bodley [Mon, 24 Mar 2025 16:51:15 +0000 (12:51 -0400)]
common/async: add call_once() algorithm for optional_yield

modeled after std::call_once() to guarantee that racing callers wait for
the initial caller to finish. the main differences here are

* support for coroutine callers to suspend instead of blocking while
  waiting for the initial caller, and
* the wrapped function must return a value, which is cached and returned
  to all callers
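
The value-caching half of this behaviour can be sketched with a plain mutex. This is a blocking-only illustration with a hypothetical `OnceResult` class; it does not model coroutine suspension or optional_yield, and it is not the ceph::async implementation.

```cpp
#include <cassert>
#include <mutex>
#include <optional>
#include <utility>

// Runs the wrapped function at most once and hands the cached value to
// every caller. Racing callers block on the mutex until the first
// caller has finished, similar in spirit to std::call_once().
template <typename T>
class OnceResult {
  std::mutex mtx;
  std::optional<T> value;

public:
  template <typename Fetch>
  const T& get(Fetch&& fetch) {
    std::lock_guard<std::mutex> lock(mtx);
    if (!value) {
      // Only the first caller gets here; later callers reuse the value.
      value.emplace(std::forward<Fetch>(fetch)());
    }
    return *value;
  }
};
```

Holding the lock during the fetch is what makes later callers wait; a coroutine-aware version would suspend the waiter instead of blocking its thread.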

Signed-off-by: Casey Bodley <cbodley@redhat.com>
12 days agocommon/async: yield_waiter can return the associated executor
Casey Bodley [Tue, 25 Mar 2025 22:06:36 +0000 (18:06 -0400)]
common/async: yield_waiter can return the associated executor

also adds an empty() function so it's easier to specify its precondition

Signed-off-by: Casey Bodley <cbodley@redhat.com>
12 days agocommon/async: yield_waiter overloads for unique_lock
Casey Bodley [Mon, 24 Mar 2025 16:50:16 +0000 (12:50 -0400)]
common/async: yield_waiter overloads for unique_lock

if async_wait() can race with complete() across threads, the
yield_waiter's handler_state needs to be protected by a mutex. add
an async_wait() overload for unique_lock that behaves like
condition_variable::wait(): the lock is released immediately before
suspending, and reacquired immediately before calling its completion
handler

Signed-off-by: Casey Bodley <cbodley@redhat.com>