common/options, os/bluestore: add debug option to force bluefs files onto slow device

Fixes: https://tracker.ceph.com/issues/74319
Signed-off-by: Jaya Prakash <jayaprakash@ibm.com>

os/bluestore: start/stop BlueFS spillover cleaner on config change

Fixes: https://tracker.ceph.com/issues/74319
Signed-off-by: Jaya Prakash <jayaprakash@ibm.com>
(cherry picked from commit dc768b782d54cc6a5dee29a9c4f358e8b9183aa6)

os/bluestore: migrate files in 128MB chunks

Signed-off-by: Jaya Prakash <jayaprakash@ibm.com>

os/bluestore: Spillover Cleaner Thread implementation in BlueFS

Fixes: https://tracker.ceph.com/issues/74319
Signed-off-by: Jaya Prakash <jayaprakash@ibm.com>

common/options: add bluefs_spillover_cleaner option

Fixes: https://tracker.ceph.com/issues/74319
Signed-off-by: Jaya Prakash <jayaprakash@ibm.com>

crimson/os/seastore: enforce capacity in RBMCleaner::try_reserve_projected_usage

RBMCleaner::try_reserve_projected_usage always returned true and just
incremented stats.projected_used_bytes. The EPM BackgroundProcess
relies on the return value to block IO when the device is full, so
this effectively disabled backpressure for the RANDOM_BLOCK_SSD
backend: concurrent transactions could each reserve unbounded amounts,
and the over-commit surfaced downstream as `unexpected enospc` asserts
in the data path (object_data_handler.cc and friends, where ENOSPC is
treated as crimson::ct_error::enospc::assert_failure because the
existing infrastructure assumes ENOSPC is impossible). The OSD aborted
under sustained random-write workloads that exceeded RBM capacity.

Compute the device's data capacity as total - journal, subtract a 5%
headroom (for metadata writes and fragmentation slack the AVL allocator
cannot pack into), and reject reservations that would push
used + projected over the line. The existing EPM blocking-IO path
(extent_placement_manager.cc:726) already queues the IO until
release_projected_usage wakes it, so no caller-side changes are needed.
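
The capacity check described above can be sketched roughly as follows. This is a simplified Python model rather than the actual C++ change: the class name, the field names, and the use of GiB for the 90GB/3GB figures from the test setup are assumptions made for illustration.

```python
# Hypothetical model of the reservation check; not the real RBMCleaner code.
GiB = 1 << 30


class RbmUsageModel:
    def __init__(self, total_bytes, journal_bytes):
        data_capacity = total_bytes - journal_bytes
        # Keep 5% of the data capacity as headroom for metadata writes
        # and fragmentation slack (integer math: capacity // 20).
        self.limit = data_capacity - data_capacity // 20
        self.used = 0
        self.projected = 0

    def try_reserve_projected_usage(self, nbytes):
        # Reject reservations that would push used + projected over the limit.
        if self.used + self.projected + nbytes > self.limit:
            return False
        self.projected += nbytes
        return True

    def release_projected_usage(self, nbytes):
        self.projected -= nbytes


# Sizes taken from the fio run described in this message (assumed GiB).
model = RbmUsageModel(total_bytes=90 * GiB, journal_bytes=3 * GiB)
```

With these sizes the limit is 87 GiB minus 5%, so a rejected reservation is expected to succeed only after an earlier one is released.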

This is the minimal fix to keep the OSD alive under sustained random
writes. It converts a crash into a stall: once the device fills and
the cleaner has nothing to free (RBMCleaner::clean_space is still a
TODO), new writes block indefinitely instead of crashing. Verified
against an 8-job 1MB random-write fio (--size 63g, 90GB RBM, 3GB
journal): 68 GB user-written, host WAF 1.696, OSD survives, watchdog
kills fio after slow-ops timeout. Without this patch the same workload
asserts in the data path.

The headroom is intentionally generous (5%) because there is no GC
yet; once RBMCleaner::clean_space() exists, the headroom can shrink.

Fixes: https://tracker.ceph.com/issues/75598
Signed-off-by: Shai Fultheim <shai.fultheim@gmail.com>

rgw: SSE-KMS: Handle Testing Key Per Object

The testing backend uses a 'keysel' attribute to derive a per-object
key from the KEK in the config. A single key_id with distinct keysel
values yields different keys, which need to be cached as such.

Add the keysel to the cache key id to handle these collisions.

Signed-off-by: Marcel Lauhoff <marcel.lauhoff@clyso.com>
On-behalf-of: SAP marcel.lauhoff@sap.com

rgw: KMS Cache Shutdown: Reaper first

1. Don't delete the KMS cache before draining/joining the frontend
coroutine threads. They may still depend on the KMS cache.
2. Stop the TTL reaper early to get it off the coroutine pool.

Signed-off-by: Marcel Lauhoff <marcel.lauhoff@clyso.com>
On-behalf-of: SAP marcel.lauhoff@sap.com

rgw: KMS Cache: Reset Reaper State in Async+Threaded

Reset reaper state to monostate in the async and threaded case.
Fixes a possible use after free in the async reaper case.

Signed-off-by: Marcel Lauhoff <marcel.lauhoff@clyso.com>
On-behalf-of: SAP marcel.lauhoff@sap.com

common/keyring: Fix reset error checking

Signed-off-by: Marcel Lauhoff <marcel.lauhoff@clyso.com>
On-behalf-of: SAP marcel.lauhoff@sap.com

common/web_cache: Fix _sieve_hand dangling pointer

sieve_expire_erase_unmutexed did not update the sieve hand passed to
it, as advertised. Make it return the updated hand and use that to
update the global _sieve_hand in expire_erase.

Signed-off-by: Marcel Lauhoff <marcel.lauhoff@clyso.com>
On-behalf-of: SAP marcel.lauhoff@sap.com

common/web_cache: delete perfcounters on destruction

Signed-off-by: Marcel Lauhoff <marcel.lauhoff@clyso.com>
On-behalf-of: SAP marcel.lauhoff@sap.com

rgw: SSE-KMS: Fix wrong cache key in lookup_or() call

Signed-off-by: Marcel Lauhoff <marcel.lauhoff@clyso.com>
On-behalf-of: SAP marcel.lauhoff@sap.com

rgw: SSE-KMS: Handle Vault Transit Key Per Object

KMS backends Barbican, Vault KV, and KMIP have a static key per
key_id. However, with Vault Transit, each object has a unique DEK
wrapped by the transit key.

Keying the cache with key_id in Transit mode results in only the first
DEK being cached and reused for all subsequent objects.

Fix this by appending a hash of the wrapped DEK to the cache key.
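
As an illustration only, the cache key could be derived like this. SHA-256, the separator, and the function name are assumptions; the message only says "a hash of the wrapped DEK".

```python
import hashlib


def transit_cache_key(key_id, wrapped_dek):
    # Assumption: SHA-256 hex digest appended after a "/" separator.
    digest = hashlib.sha256(wrapped_dek).hexdigest()
    return f"{key_id}/{digest}"


key_obj1 = transit_cache_key("transit-key", b"wrapped-dek-of-object-1")
key_obj2 = transit_cache_key("transit-key", b"wrapped-dek-of-object-2")
```

Two objects under the same transit key now map to distinct cache entries, while repeated reads of one object still hit the same entry.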

Signed-off-by: Marcel Lauhoff <marcel.lauhoff@clyso.com>
On-behalf-of: SAP marcel.lauhoff@sap.com

rgw: Fix typos in perf counter descriptions

Signed-off-by: Marcel Lauhoff <marcel.lauhoff@clyso.com>
On-behalf-of: SAP marcel.lauhoff@sap.com

PendingReleaseNotes: Add SSE-KMS Cache

Signed-off-by: Marcel Lauhoff <marcel.lauhoff@clyso.com>
On-behalf-of: SAP marcel.lauhoff@sap.com

rgw: SSE-KMS Secrets Cache

Add SSE Key Management System secrets cache to RGW.

It is common to have secrets shared by many, if not all, objects in a
bucket. Without RGW-side caching, every PUT/GET causes a request to
an external KMS. This not only adds load to the KMS, but also slows
down reads and writes.

Combine WebCache, ceph::async::call_once and LinuxKeyringSecret into
KMSCache. WebCache stores async::once_result to wrap results of a KMS
secret fetch to mitigate cache stampedes (concurrent cache requests to
the same key coalesce into one). The retrieved secrets are stored in
the Linux kernel key retention service (LinuxKeyringSecret) for
safekeeping and retrieval by subsequent requests. KMSCache adds a TTL
reaper and life cycle.

Cache values and error handling: The cache stores positive
fetch results, permanent errors (e.g. key does not exist) and
transient errors (e.g. fetch timeout), each with a different TTL.
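
The per-outcome TTL idea can be sketched as below. This is a toy Python model, not the KMSCache code: the class names and the TTL values are placeholders, since the real values come from configuration not shown here.

```python
import enum
import time


class Outcome(enum.Enum):
    OK = "ok"
    PERMANENT_ERROR = "permanent"
    TRANSIENT_ERROR = "transient"


# Placeholder TTLs in seconds (assumed, not the shipped defaults).
TTL = {
    Outcome.OK: 300.0,
    Outcome.PERMANENT_ERROR: 60.0,
    Outcome.TRANSIENT_ERROR: 5.0,
}


class TtlCache:
    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}

    def put(self, key, outcome, value=None):
        # The expiry depends on what kind of result is being cached.
        self._entries[key] = (outcome, value, self._clock() + TTL[outcome])

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        outcome, value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: treat as a miss
            return None
        return outcome, value
```

Caching transient errors briefly avoids hammering a struggling KMS, while the short TTL lets a recovered KMS be retried soon.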

Unit tests to cover cached / uncached KMS retrieve and runtime cache
disable via config.

Add perf counter `kms_fetch_lat` to track KMS fetch request latency
and error counters to track permanent, transient and key store
errors.

Signed-off-by: Marcel Lauhoff <marcel.lauhoff@clyso.com>
Fixes: https://tracker.ceph.com/issues/68524
On-behalf-of: SAP marcel.lauhoff@sap.com

common: Refactor LinuxKeyringSecret into Keyring Interface

Goal: Support multiple backends and faking / mocking for testing.

Add abstract classes Keyring (factory) and KeyringSecret. Add
"Unsupported" implementation for non-Linux platforms. Add a get_best
factory function that currently returns the LinuxKeyring impl on Linux
or Unsupported elsewhere.
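
The shape of the interface might look roughly like this Python sketch. Method names, the in-memory fake, and the platform check are assumptions for illustration; on Linux the real get_best returns the LinuxKeyring implementation rather than a fake.

```python
import abc
import sys


class KeyringSecret(abc.ABC):
    @abc.abstractmethod
    def read(self):
        """Return the stored secret bytes."""


class Keyring(abc.ABC):
    @abc.abstractmethod
    def supported(self):
        """Report whether this backend can store secrets."""

    @abc.abstractmethod
    def add(self, name, secret):
        """Store a secret and return a KeyringSecret handle."""


class FakeSecret(KeyringSecret):
    def __init__(self, secret):
        self._secret = secret

    def read(self):
        return self._secret


class FakeKeyring(Keyring):
    # In-memory stand-in, e.g. for unit tests.
    def supported(self):
        return True

    def add(self, name, secret):
        return FakeSecret(secret)


class UnsupportedKeyring(Keyring):
    def supported(self):
        return False

    def add(self, name, secret):
        raise NotImplementedError("no keyring backend on this platform")


def get_best(platform=sys.platform):
    # Sketch only: the fake stands in for the Linux implementation.
    if platform.startswith("linux"):
        return FakeKeyring()
    return UnsupportedKeyring()
```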

Signed-off-by: Marcel Lauhoff <marcel.lauhoff@clyso.com>
On-behalf-of: SAP marcel.lauhoff@sap.com

doc: Document RGW KMS Cache

Add caching section to the RGW Encryption docs. Add cache
settings to the RGW configuration reference.

Signed-off-by: Marcel Lauhoff <marcel.lauhoff@clyso.com>
On-behalf-of: SAP marcel.lauhoff@sap.com

rgw: Early Linux process keyring initialization

To allow RGW threads to share possession over process keyring keys the
keyring must be created before a child thread adds keys.

Since we only use the process keyring for KMS cache secrets, only
initialize the keyring if it is enabled on startup.

Signed-off-by: Marcel Lauhoff <marcel.lauhoff@clyso.com>
On-behalf-of: SAP marcel.lauhoff@sap.com

common: Add Linux Keyring Secret Store Wrapper

Add RAII wrapper around the Linux Key Retention Service
add_key(2), keyctl_read(3), keyctl_invalidate(3)

Signed-off-by: Marcel Lauhoff <marcel.lauhoff@clyso.com>
On-behalf-of: SAP marcel.lauhoff@sap.com

test: Add Secrets Store µBenchmarks

Benchmark:
- Linux Kernel Key Retention Service (kernel keystore) [0]
- memfd_secret(2)
- plain memory

Tests:
- Random reads
- (keystore) Write, Read, Remove

[0] https://docs.kernel.org/security/keys/core.html

Signed-off-by: Marcel Lauhoff <marcel.lauhoff@clyso.com>
On-behalf-of: SAP marcel.lauhoff@sap.com

test: Add Cache benchmarks

Add Google benchmark [0] based micro benchmarks for Cache/LRU
implementations in the Ceph code base.

[0] https://github.com/google/benchmark

Signed-off-by: Marcel Lauhoff <marcel.lauhoff@clyso.com>
On-behalf-of: SAP marcel.lauhoff@sap.com

common: Add WebCache

A cache data structure for values that need to be retrieved from
outside systems (e.g. Key Management Systems).

Features:
- Thread safe, optimized for concurrent lookups and cache hits
- Entry TTL expiration
- Cache replacement strategy tuned to "web" workloads (SIEVE)
- Performance Counters on hit, miss, expire, size, capacity, clears
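
For readers unfamiliar with SIEVE, the eviction policy can be sketched as below. This is a single-threaded toy in Python with no TTL or perf counters, and is not the WebCache implementation; the class and attribute names are hypothetical.

```python
class SieveCache:
    def __init__(self, capacity):
        assert capacity > 0
        self._capacity = capacity
        self._order = []    # index 0 is the oldest entry, the end is the newest
        self._values = {}
        self._visited = {}
        self._hand = 0      # eviction hand, moves from oldest toward newest

    def __contains__(self, key):
        return key in self._values

    def get(self, key):
        if key not in self._values:
            return None
        self._visited[key] = True  # a hit only sets a bit, no reordering
        return self._values[key]

    def put(self, key, value):
        if key in self._values:
            self._values[key] = value
            self._visited[key] = True
            return
        if len(self._order) >= self._capacity:
            self._evict()
        self._order.append(key)     # new entries go in at the newest end
        self._values[key] = value
        self._visited[key] = False

    def _evict(self):
        i = self._hand if self._hand < len(self._order) else 0
        # Skip visited entries, clearing their bit, until an unvisited one.
        while self._visited[self._order[i]]:
            self._visited[self._order[i]] = False
            i = (i + 1) % len(self._order)
        victim = self._order.pop(i)
        del self._values[victim]
        del self._visited[victim]
        self._hand = i  # after the pop, index i is the next newer entry
```

Because a hit only flips a flag, lookups do not need to reorder a shared list, which is part of what makes the policy friendly to concurrent cache hits.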

Signed-off-by: Marcel Lauhoff <marcel.lauhoff@clyso.com>
On-behalf-of: SAP marcel.lauhoff@sap.com

common/async: add call_once() algorithm for optional_yield

modeled after std::call_once() to guarantee that racing callers wait for
the initial caller to finish. the main differences here are

* support for coroutine callers to suspend instead of blocking while
waiting for the initial caller, and
* the wrapped function must return a value, which is cached and returned
to all callers
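
The caching behaviour can be illustrated with a thread-only Python sketch. It does not model the coroutine suspension that is the main point of this change; the class name and the fetch function are hypothetical.

```python
import threading


class CallOnce:
    def __init__(self):
        self._lock = threading.Lock()
        self._done = False
        self._value = None

    def call(self, fn):
        # Racing callers block on the lock until the first caller finishes,
        # then all of them return the cached value.
        with self._lock:
            if not self._done:
                self._value = fn()
                self._done = True
            return self._value


calls = []


def fetch():
    calls.append(1)  # count how often the wrapped function really runs
    return "secret"


once = CallOnce()
results = []
threads = [
    threading.Thread(target=lambda: results.append(once.call(fetch)))
    for _ in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```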

Signed-off-by: Casey Bodley <cbodley@redhat.com>

common/async: yield_waiter can return the associated executor

also adds an empty() function so it's easier to specify its precondition

Signed-off-by: Casey Bodley <cbodley@redhat.com>

common/async: yield_waiter overloads for unique_lock

if async_wait() can race with complete() across threads, the
yield_waiter's handler_state needs to be protected by a mutex. add
an async_wait() overload for unique_lock that behaves like
condition_variable::wait(): the lock is released immediately before
suspending, and reacquired immediately before calling its completion
handler

Signed-off-by: Casey Bodley <cbodley@redhat.com>

qa: install nvme-cli only if distro remains rocky10

Notably, only include the `dnf install` commands if the distro is
not overridden by some other mechanism (like cephfs kernel overrides).

This is only a problem for tentacle presently as the k-stock kernel will
override with centos9.

Fixes: https://tracker.ceph.com/issues/77037
Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>

kv/KeyValueDB: New utility function util_divide_key_range

New function splits provided range into smaller chunks.
Declared in KeyValueDB, but implemented only for RocksDBStore.
Useful for splitting large datasets for multiple threads to
iterate in parallel.
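
Conceptually, the split works like the Python sketch below, which divides an in-memory sorted key list into contiguous ranges. The real function operates on RocksDB and its signature is not shown here, so the name and the return shape are assumptions.

```python
def divide_key_range(sorted_keys, parts):
    """Split sorted_keys into at most `parts` contiguous (first, last) ranges."""
    if parts <= 0:
        raise ValueError("parts must be positive")
    n = len(sorted_keys)
    if n == 0:
        return []
    parts = min(parts, n)
    base, extra = divmod(n, parts)
    ranges = []
    start = 0
    for i in range(parts):
        # The first `extra` chunks take one additional key each.
        size = base + (1 if i < extra else 0)
        chunk = sorted_keys[start:start + size]
        ranges.append((chunk[0], chunk[-1]))
        start += size
    return ranges


keys = [f"k{i:02d}" for i in range(10)]
ranges = divide_key_range(keys, 3)
```

Each resulting range could then be handed to its own worker thread for iteration.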

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>

kv/KeyValueDB: New estimate_range_size function

Taking estimate_prefix_size to another level.
Makes possible detailed inspection of db size.
Used primarily for bisecting key range.

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>

Merge pull request #69083 from fultheim/adaptive-cleaner-thresholds

crimson/os/seastore: adaptive cleaner thresholds from observed workload

Reviewed-by: Matan Breizman <mbreizma@redhat.com>

script/backport-create-issue: catch errors during traversal

A ServerError shouldn't prevent all forward progress.

Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>

Merge pull request #69203 from tchaikov/wip-libcephfs-test

test/libcephfs: reduce SnapDiffDeletionRecreation bulk_count on Windows

Reviewed-by: Igor Fedotov <igor.fedotov@croit.io>

Merge pull request #68775 from gardran/wip-gardran-fix-write-v2-deferred-counters

os/bluestore: do not increment *issued_deferred* counters twice

Reviewed-by: Jaya Prakash <jayaprakash@ibm.com>
Reviewed-by: Adam Kupczyk <akupczyk@ibm.com>

Merge pull request #69168 from guits/fix-osd-type

cephadm: omit --osd-type classic for older ceph-volume

Merge PR #69152 into main

* refs/pull/69152/head:
script/backport-create-issue: update custom field name

Reviewed-by: Redouane Kachach <rkachach@redhat.com>

mgr/dashboard: Combining Quorum tables data on Monitors page

Fixes: https://tracker.ceph.com/issues/76746
Signed-off-by: Devika Babrekar <devika.babrekar@ibm.com>

cmake: link legacy-option-headers from targets that use legacy options

The *_legacy_options.h headers that define the legacy ConfigValues
members are generated at build time by y2c.py. Linking the
legacy-option-headers INTERFACE library adds an order dependency on
that step. A few targets reference legacy members without linking it,
so under a parallel build they can be compiled before the headers
exist and fail with "class ConfigValues has no member ...":

  neorados_objs, neorados_api_obj - objecter_inflight_ops,
      ms_die_on_unhandled_msg (via Objecter.h / Messenger.h)
  ceph_zstd - compressor_zstd_level
  heap_profiler - log_file

Link legacy-option-headers from them, as ceph_lz4, ceph_snappy and
jerasure_utils already do.

Signed-off-by: Sun Yuechi <sunyuechi@iscas.ac.cn>

Merge pull request #67889 from gardran/wip-gardran-no-seq-bytes

os/bluestore: avoid redundant map lookup for deferred op

Reviewed-by: Jaya Prakash <jayaprakash@ibm.com>

os/bluestore: do not increment *issued_deferred* counter twice
in write v2 mode.

_get_deferred_op() is already increasing performance counter on its own.

Signed-off-by: Garry Drankovich <garry.drankovich@clyso.com>

qa/cephadm: query iSCSI gateway FQDN from inside the container

rbd-target-api validates that the gateway hostname supplied by gwcli
matches the container's own socket.getfqdn(). Running the same call on
the host can return a different value when the host and container resolve
names differently (e.g. on Rocky 10), causing gateway creation to fail
with HTTP 400 and all subsequent gwcli configuration to break silently.

Query the FQDN from inside the iSCSI container directly so the value is
always consistent with what rbd-target-api expects. This also removes the
"run twice" workaround, which was compensating for host-side DNS
warm-up flakiness rather than addressing the underlying mismatch.

Fixes: https://tracker.ceph.com/issues/74577
Signed-off-by: Kefu Chai <k.chai@proxmox.com>

python-common: Improve profile name string validation

Fixes: https://tracker.ceph.com/issues/74986
Signed-off-by: Ashwin M. Joshi <ashjosh1@in.ibm.com>
src/python-common/ceph/tests/test_service_spec.py

Conflicts:
src/python-common/ceph/deployment/service_spec.py

container: install ceph-mgr-modules-core and ceph-mgr-modules-standard

The Containerfile uses --setopt=install_weak_deps=False throughout, so
ceph-mgr-modules-core (a Recommends of ceph-mgr, not a Requires) and
the split-out module packages are not automatically installed. Add them
explicitly.

Signed-off-by: Kefu Chai <k.chai@proxmox.com>

debian, rpm: add ceph-mgr-modules-standard meta-package

ceph-mgr-modules-core was split into per-module packages so that users
only need to install what they actually use. To ease migration for
existing deployments that want the full former set, add a meta-package
ceph-mgr-modules-standard that pulls in all modules which were
previously bundled in ceph-mgr-modules-core.

Signed-off-by: Kefu Chai <k.chai@proxmox.com>

debian, rpm: add ceph-mgr-cli-api package

cli_api is a new module in this release (not previously shipped in
ceph-mgr-modules-core), so it gets its own package without any
Breaks/Replaces or Obsoletes against the old monolithic package.

Signed-off-by: Kefu Chai <k.chai@proxmox.com>

cmake: install cli_api mgr module

cli_api was missing from the mgr_modules list, so cmake did not install
it into the buildroot, causing RPM packaging to fail with "File not
found".

Signed-off-by: Kefu Chai <k.chai@proxmox.com>

debian: fix missing python3 deps for diskprediction-local and osd-support

diskprediction-local depends on python3-prettytable and osd-support
depends on python3-cherrypy3; both need to be declared explicitly now
that these modules are separate packages.

Signed-off-by: Kefu Chai <k.chai@proxmox.com>

rpm: split ceph-mgr-modules-core into per-module packages

ceph-mgr-modules-core has historically bundled always-on modules
together with optional ones, forcing users to install modules and their
dependencies even when they have no use for them. Split each optional
module into its own package so users and distributions can install only
what they need.

ceph-mgr-modules-core is trimmed to the 10 always-on modules defined
in src/mon/MgrMonitor.cc: balancer, crash, devicehealth, orchestrator,
pg_autoscaler, progress, rbd_support, status, telemetry, volumes.
Each optional module now follows the pattern of ceph-mgr-k8sevents and
ceph-mgr-rook.

New packages carry Obsoletes: ceph-mgr-modules-core < 21.0.0 for
proper upgrade path.

The split also exposes cross-module Python dependencies: modules
co-installed in ceph-mgr-modules-core could freely import each other,
but once separated into individual packages those imports require
explicit Requires entries. Now the inter-dependencies are expressed
properly in ceph.spec.in.

Signed-off-by: Kefu Chai <k.chai@proxmox.com>

rpm: deduplicate mgr module scriptlets with a macro

Define %ceph_mgr_module_scripts() to emit the identical %post/%postun
pair for each optional mgr module package, replacing the 5 existing
hand-written copies (dashboard, diskprediction-local, rook, k8sevents,
cephadm) with a single call site per package. Subsequent commits that
introduce new mgr module packages can use the macro from the start.

Signed-off-by: Kefu Chai <k.chai@proxmox.com>

debian: split ceph-mgr-modules-core into per-module packages

ceph-mgr-modules-core has historically bundled always-on modules
together with optional ones, forcing users to install modules and their
dependencies even when they have no use for them. Split each optional
module into its own package so users and distributions can install only
what they need.

ceph-mgr-modules-core is trimmed to the 10 always-on modules defined
in src/mon/MgrMonitor.cc: balancer, crash, devicehealth, orchestrator,
pg_autoscaler, progress, rbd_support, status, telemetry, volumes.
Each optional module now follows the pattern of ceph-mgr-k8sevents and
ceph-mgr-rook.

New packages carry Breaks/Replaces: ceph-mgr-modules-core (<< 21.0.0)
for proper file ownership transfer on upgrade.

The split also exposes cross-module Python dependencies: modules
co-installed in ceph-mgr-modules-core could freely import each other,
but once separated into individual packages those imports require
explicit Depends entries. Now the inter-dependencies are properly
reflected in debian/control.

Signed-off-by: Kefu Chai <k.chai@proxmox.com>

Merge pull request #69143 from guits/fix-cv-vg-lv-batch

ceph-volume: retry lvs after empty result and "devices file is missing" stderr

test/libcephfs: reduce SnapDiffDeletionRecreation bulk_count on Windows

this test timed out on Windows, and HugeSnapDiffLargeDelta, at half
the file count, passed in 508 seconds on the same run, suggesting this
test takes ~17 minutes on Windows -- beyond the test runner limit.
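
the ~17 minute figure follows from simple scaling, assuming runtime grows linearly with file count (an assumption, not a measurement):

```python
half_count_runtime_s = 508  # HugeSnapDiffLargeDelta, half the file count

# Assumption: doubling the file count doubles the runtime.
estimated_runtime_s = 2 * half_count_runtime_s
estimated_runtime_min = estimated_runtime_s / 60
```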

we haven't profiled the Windows client yet, but the likely culprit is
EventPoll, the Windows messenger backend, which scans the entire poll
array on every event_wait() and poll_ctl() call rather than using a
keyed data structure.

in this change, we reduce bulk_count to 1 << 12 on Windows. the unique
thing this test covers is the deletion-recreation pattern: a name that
exists as a file in snap1, gets deleted, and reappears as a directory in
snap2 -- it must show up in the diff with both snapids. 4096 produces
1024 such pairs, which is enough to exercise that logic. multi-fragment
snapdiff is already covered by HugeSnapDiffLargeDelta, which derives its
file count from mds_bal_split_size and mds_bal_fragment_fast_factor
explicitly to trigger fragmentation.

Fixes: https://tracker.ceph.com/issues/77015
Signed-off-by: Kefu Chai <k.chai@proxmox.com>

Merge pull request #69135 from VallariAg/wip-nvmeof-teuthology-mon-conf

qa/suites/nvmeof: set beacon grace and connect panic

Merge pull request #66500 from AliMasarweh/wip-alimasa-global-cors

RGW: add support for global CORS rule

Reviewed-by: Naman Munet <naman.munet@ibm.com>, Casey Bodley <cbodley@redhat.com>

Merge pull request #69185 from sunyuechi/wip-with-system-spdk

cmake,blk/spdk: support WITH_SYSTEM_SPDK

Reviewed-by: Kefu Chai <k.chai@proxmox.com>

compressor/zstd: include <zstd.h> instead of the bundled path

ZstdCompressor.h hard-codes #include "zstd/lib/zstd.h", which only
resolves because include_directories(src) puts the bundled submodule
on the search path. It thus silently depends on src/zstd being checked
out, and breaks with -DWITH_SYSTEM_ZSTD=ON where the submodule is absent.

ceph_zstd already links Zstd::Zstd, whose INTERFACE_INCLUDE_DIRECTORIES
points at the directory holding zstd.h in both modes: src/zstd/lib for
the bundled build, ${Zstd_INCLUDE_DIR} for the system one. Use <zstd.h>
so the include resolves through that interface either way.

Signed-off-by: Sun Yuechi <sunyuechi@iscas.ac.cn>

Merge pull request #68745 from Hezko/bugfix-13279

mgr/dashboard: fix listener add errors

Merge pull request #69044 from xxhdx1985126/wip-seastore-rewrite-fix

crimson/os/seastore: force rewrite transactions to conflict with others if they involve insertions on the lba tree

Reviewed-by: Matan Breizman <mbreizma@redhat.com>

cmake: add WITH_SYSTEM_SPDK to link a system-installed SPDK

By default ceph builds the bundled src/spdk fork via BuildSPDK. Add a
WITH_SYSTEM_SPDK option that instead locates a distro-provided SPDK
through a new Findspdk.cmake (pkg-config based, modelled on
Finddpdk.cmake), exposing the same spdk::spdk target.

Signed-off-by: Sun Yuechi <sunyuechi@iscas.ac.cn>

blk/spdk: support both old and new spdk_env_opts member names

SPDK 21.01 renamed two struct spdk_env_opts members: pci_whitelist ->
pci_allowed and master_core -> main_core. Guard the assignments in
NVMEDevice with SPDK_VERSION.

pci_whitelist -> pci_allowed: https://github.com/spdk/spdk/commit/4a6a2824119b
master_core -> main_core: https://github.com/spdk/spdk/commit/fe137c8970bf

Signed-off-by: Sun Yuechi <sunyuechi@iscas.ac.cn>

rgw/posix: fix event replay in BucketCache ev_loop

evec is never cleared after each n->notify() call, so events accumulate
across iterations of ev_loop's inner for loop. Each notify() call
receives not just the current event but all events dispatched in earlier
iterations too.

Add evec.clear() after each n->notify() call.
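
The effect can be reproduced with a stripped-down Python model of the loop. The names below are hypothetical and the real ev_loop is more involved; the sketch only shows why a missing clear() replays earlier events.

```python
class Recorder:
    def __init__(self):
        self.batches = []

    def notify(self, events):
        self.batches.append(list(events))


def dispatch(incoming, notifier, clear_after_notify):
    evec = []
    for ev in incoming:
        evec.append(ev)
        notifier.notify(evec)
        if clear_after_notify:
            evec.clear()  # the fix: drop events that were already delivered


buggy, fixed = Recorder(), Recorder()
dispatch(["a", "b", "c"], buggy, clear_after_notify=False)
dispatch(["a", "b", "c"], fixed, clear_after_notify=True)
```

Without the clear, the third notify() call sees all three events; with it, each call sees exactly one.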

Signed-off-by: Kefu Chai <k.chai@proxmox.com>

rgw/posix: fix refcount leaks in BucketCache

get_bucket(FLAG_LOCK) increments the refcount via lru.ref(), but three
paths returned without the paired lru.unref(): the "do nothing" early
return and the INVALIDATE branch in notify(), and unconditionally in
invalidate_bucket(). Entries hitting these paths accumulated inflated
refcounts that the LRU could never reclaim, leaking during
~BucketCache() → cache.drain().

Replace the manual lru.unref() calls in notify(), add_entry(),
remove_entry(), invalidate_bucket(), and list_bucket() with a scope_guard
declared before unique_lock. Since the guard outlives ulk, it fires after
the mutex is released on all paths, including exceptions from
getRWTransaction() or txn->commit() (e.g. MDB_MAP_FULL, EIO) that the
manual calls never reached.

list_bucket() also had a bare b->mtx.unlock() after fill(); replace it
with unique_lock{..., std::adopt_lock} so a throw from fill() releases
the mutex too.
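
The ordering argument (guard declared before the lock, so it fires after the lock is released, even on exceptions) can be sketched with nested context managers in Python. This is an analogy to the C++ scope_guard, not the actual code; all names are hypothetical.

```python
import contextlib
import threading


@contextlib.contextmanager
def unref_on_exit(entry, mutex, log):
    try:
        yield
    finally:
        # Runs on every exit path, including exceptions.
        entry["refcnt"] -= 1
        log.append(("unref", mutex.locked()))


def invalidate(entry, mutex, log, fail=False):
    entry["refcnt"] += 1  # stands in for the ref taken by get_bucket()
    with unref_on_exit(entry, mutex, log):  # entered first, so exits last
        with mutex:
            if fail:
                raise OSError("simulated commit failure")
            log.append("committed")
```

The log entry records whether the mutex was still held when the unref ran, which should be False on both the success and the failure path.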

Signed-off-by: Kefu Chai <k.chai@proxmox.com>

qa: Fix teuthology test timing out

Enable autoscale mode for pools, which is off by default for teuthology.
Increase mon_target_pg_per_osd so pools scale up by enough.

Signed-off-by: Eric Zhang <emzhang@ibm.com>

qa/workunits/rados: fetch files via GitHub instead of git.ceph.com

The current method fetches files from git.ceph.com, which is unreliable
and sometimes causes the file to contain HTML output instead of the C++ code.

Fetching from GitHub is a more reliable way to get the C++ code.

Fixes: https://tracker.ceph.com/issues/68669
Signed-off-by: Laura Flores <lflores@ibm.com>

Merge pull request #68934 from cbodley/wip-76578

rgw/beast: add ssl_ciphersuites option for tls 1.3

Reviewed-by: Matt Benjamin <mbenjamin@redhat.com>

rgw/posix: remove path from table names

Removes the DB directory path from the table names.

Signed-off-by: Nithya Balachandran <nithya.balachandran@ibm.com>

rgw/posix: implement the quota feature

Implement the quota feature for the POSIX driver.

Signed-off-by: Nithya Balachandran <nithya.balachandran@ibm.com>

RGW | standalone: add support for accounts in dbstore

Signed-off-by: Ali Masarwa <amasarwa@redhat.com>

radosgw-admin: Remove dependence on RADOS

Signed-off-by: Samarah Uriarte <samarah.uriarte@ibm.com>

RGW POSIX - Fix POSIX unittest

Signed-off-by: Daniel Gryniewicz <dang@fprintf.net>

rgw/posix: fix cached size of uploaded objects

Moves file open and stat into the (atomic) link step, so size
is correctly interned in the cache. Fix suggested by dang.

Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>

rgw/posix: fix crash in radosgw-admin

The POSIXBucket copy constructor incorrectly calls .get() on a
temporary unique_ptr returned by clone(), causing immediate
deletion of the Directory object. This leaves a dangling pointer
that triggers a segfault during destruction.

Signed-off-by: Nithya Balachandran <nithya.balachandran@ibm.com>

cohort_lru: keep strict discard, but from LRU

Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>

posixdriver: properly destruct BucketCacheEntry objects

* avoids leak of database handles during eviction

Also adds missing return-ref in invalidate_entry--this would
leak a cache entry.

With this change, we can now tolerate indefinite s3-test runs
with rgw_posix_cache_max_buckets=100.

Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>

cohort_lru: crash fix and reduce lock contention

Fixes crash induced by taking the address of the last element
of an empty intrusive list (!).

Also, introduces active queue, reducing potential for lock
contention in evict_block():

* entries are tracked on lane::active_queue when lru_refcnt > 1
** on some lane::q otherwise

Objects transition between queues when lru_refcnt changes value--
a value of 0 triggers deletion, as before.

Fixes: https://tracker.ceph.com/issues/73992
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>

posixdriver: can move buffer::list leaving scope

Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>

posixdriver: add provisional manifest

initially, it is just used to remember the multipart layout, but
likely will see other use.

Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>

posixdriver: fix cksum_type, flags propagation

Posixdriver doesn't serialize POSIXMultipartUpload, but rather a
member mp_obj of type POSIXMPObj--so to avoid losing the latter's
inherited cksum_type and cksum_flags members (which are already
copied in), copy them out in POSIXMultiPartUpload::get_info() which
we need to call to copy out dest_placement anyway.

(oops, cksum_type was copied in, but not cksum_flags)

Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>

posixdriver: fix cache fill of versioned buckets

This change completes the original intent (hypothesized) to
conditionally set the FLAG_CURRENT bit on just the current
entries during bucket listing cache fill.

This avoids interning 2 copies of the current version of each
object in the listing cache, and also correctly sets the
FLAG_CURRENT bit as required--so the current versions are correctly
reported in versioned listings.

Janky logic to find the current version by explicitly chasing
the symlink target and saving it outside the enumeration scope
has been replaced with proper call to stat() provided by Dang.

Symlink::fill_cache() is no longer used, so removed.

Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>

posixdriver: add bde.flags to bucket cache serde cycle

The upstream logic (mostly?) correctly uses bde.flags when filling
the cache for versioned objects, but cache ser(de)ialization has
been discarding that member.

This change suppresses the visible result where RGW incorrectly produces
multiple versions in non-versioned listing because none uniquely sets
FLAG_CURRENT:

mbenjamin@fedora:~/dev/rgw/s3_py/python$ s3cmd ls s3://sheik2
2026-02-14 22:44           22  s3://sheik2/ginfizz_1
2026-02-14 22:44           22  s3://sheik2/ginfizz_1
2026-02-14 22:44           22  s3://sheik2/ginfizz_1
2026-02-14 22:44           22  s3://sheik2/ginfizz_2
2026-02-14 22:44           22  s3://sheik2/ginfizz_2
2026-02-14 22:44           22  s3://sheik2/ginfizz_2

Corrected result is:

mbenjamin@fedora:~/dev/rgw/s3_py/python$ s3cmd ls s3://sheik2
2026-02-14 22:44           22  s3://sheik2/ginfizz_1
2026-02-14 22:44           22  s3://sheik2/ginfizz_2

Cached listings for versions are still incorrect in containing an
extra entry for the "current" version with an empty instance
(from the Symlink)--the visible effect being that list-object-versions
output is incorrect (no entry is sent with IsLatest, after the
empty instance version has been filtered out).

Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>

posixdriver: propagate object lock attrs across multipart upload

Retention rules can be specified in init-multipart and, if present,
need to propagate to the final object if the upload completes.

Needed for (e.g.) test_object_lock_delete_multipart_object_with_retention

Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>

posixdriver: page in all xattrs in POSIXObject::load_obj_state()

This seems to be needed for (at least) object lock retention period
checks, e.g., in DeleteObject::execute().

Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>

Merge pull request #68899 from batrick/i76586

qa: ignore POOL_FULL for rbd tests exercising full pools

Reviewed-by: Ilya Dryomov <idryomov@gmail.com>

Merge PR #67683 into main

* refs/pull/67683/head:
qa/tasks/cbt: construct venv just for cbt
qa/distros: use consistent naming
qa/tasks/nvme_loop: fix nvme loop task for ubuntu noble
qa/distros: add ubuntu_24.04 as supported container host
qa/distros: bump ubuntu_latest.yaml to 24.04
qa/distros: add all/ubuntu_24.04.yaml
qa/suites/rados/encoder: use random supported distro
qa/ceph-ansible: symlink supported-random-distro$
qa/fs/fscrypt: symlink supported-random-distro$
qa/cephmetrics: symlink supported-random-distro$

Reviewed-by: Redouane Kachach <rkachach@redhat.com>
Reviewed-by: Ilya Dryomov <idryomov@redhat.com>

Merge PR #69163 into main

* refs/pull/69163/head:
qa/tasks: capture CommandCrashedError when running nvme list cmd

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>

Merge pull request #66439 from aclamk/aclamk-bs-simpler-flush

bluestore/bluefs: FileWriter simpler flush

Reviewed-by: Igor Fedotov <igor.fedotov@croit.io>

Merge pull request #68607 from dheart-joe/wip-bluestore-unshare-blob

os/bluestore: optimize shared blob unsharing during snapshot removal

Reviewed-by: Igor Fedotov <igor.fedotov@croit.io>

Merge pull request #69166 from sunyuechi/wip-rgw-swift-error-handler-out-of-line

rgw: move SWIFT error_handler out-of-line to fix link failure

Reviewed-by: Casey Bodley <cbodley@redhat.com>

Merge pull request #68898 from gardran/wip-gardran-show-esb-in-metadata

os/bluestore: dump effective elastic shared blobs mode in OSD metadata report

Reviewed-by: Adam Kupczyk <akupczyk@ibm.com>

tools/rados: Remove plain text snippets from rados bench JSON output

`rados bench` emits performance stats as its output. It is very helpful
for this output to be in a machine-readable format and the CLI provides
the `--format=json` flag to achieve this.

Some log messages do not respect the formatter flag, though: they
provide status updates while the tool is running and are not part of
the output dataset. As a result, the contents of stdout are not valid
JSON, which destroys the machine-readability of the output.

To resolve this we gate those status messages behind a check for the
formatter. If any specific formatter is provided we do not emit the
status logs. This leaves the plaintext output largely untouched while
helping the machine-readable output to be well-formed.
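
The gating described above can be sketched in a few lines. This is a
minimal Python illustration, not the actual `rados bench` code (which is
C++); `run_bench`, `status`, the status strings, and the result fields
are all hypothetical names and values chosen for the example.

```python
import io
import json


def run_bench(output_format=None, out=None):
    """Toy benchmark runner that writes to `out` and returns it.

    Hypothetical stand-in for the tool: status lines are emitted only
    when no formatter was requested, so formatted output stays parseable.
    """
    out = out if out is not None else io.StringIO()

    def status(message):
        # Progress updates are not part of the result dataset, so they
        # are suppressed whenever a specific formatter was requested.
        if output_format is None:
            print(message, file=out)

    status("Maintaining 16 concurrent writes")  # made-up status text
    results = {"bandwidth_mb_sec": 123.4, "total_ops": 1000}  # made-up data
    status("Cleaning up")

    if output_format == "json":
        print(json.dumps(results), file=out)
    else:
        for key, value in results.items():
            print(f"{key}: {value}", file=out)
    return out
```

With `output_format="json"` the buffer holds a single JSON document that
`json.loads` accepts; with no formatter the status lines still appear
alongside the plain-text results.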

Fixes: https://tracker.ceph.com/issues/74370
Signed-off-by: Jacques Heunis <jheunis@bloomberg.net>

qa/rgw: remove ragweed from multifs subsuite

ragweed is currently broken with newer Python on Rocky 10 and Ubuntu 24
(tracked in https://tracker.ceph.com/issues/72500) and doesn't provide
interesting test coverage outside of rgw/upgrade.

Fixes: https://tracker.ceph.com/issues/76996
Signed-off-by: Casey Bodley <cbodley@redhat.com>

Merge pull request #69144 from gbregman/main

nvmeof: Change the NVMEOF image version to 1.8

rgw/datalog: `radosgw-admin` will no longer convert datalog to omap

Omap-backed datalogs are deprecated, so we remove the ability to
convert to them.

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>

rgw/datalog: Remove `rgw default data log backing` option

Omap-backed datalogs are deprecated. This option is removed and we no
longer support creating new clusters using them.

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>

Merge pull request #68095 from lumir-sliva/fix-deprecated-egrep-fgrep

qa,src: replace deprecated egrep/fgrep with grep -E/grep -F

Reviewed-by: Kefu Chai <k.chai@proxmox.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>

qa: Ignore deprecated EC plugin warning in teuthology tests

Add DEPRECATED_EC_PLUGIN to the list of health warnings to
ignore in the thrash-erasure-code-* tests that use deprecated
plugins or techniques. It is expected that this warning will
be raised.

Signed-off-by: Jamie Pryde <jamiepry@uk.ibm.com>

cephadm: omit --osd-type classic for older ceph-volume

The ceph-volume in tentacle doesn't know that flag yet, so teuthology
tests can break during an upgrade.
With this fix, we only add the flag when osd_type isn't classic.
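
The conditional can be illustrated with a short Python sketch. It is not
the cephadm code itself: `build_ceph_volume_args` is a hypothetical
helper, the `lvm batch` base command is only an illustrative choice, and
"crimson" is assumed here as an example of a non-classic type.

```python
def build_ceph_volume_args(osd_type="classic"):
    """Return an argument list for a hypothetical ceph-volume invocation."""
    args = ["ceph-volume", "lvm", "batch"]  # illustrative base command
    # An older ceph-volume does not recognise --osd-type, so the flag is
    # passed only when the requested type is something other than classic.
    if osd_type != "classic":
        args += ["--osd-type", osd_type]
    return args
```

Calling `build_ceph_volume_args()` leaves the flag out entirely, which is
what keeps an older ceph-volume working; any other type appends
`--osd-type <type>` to the list.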

Fixes: https://tracker.ceph.com/issues/76968
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>

mgr/dashboard: Add Sync from/sync from all options on master zone edit

In the dashboard, the master zone's edit functionality includes the expected "Sync from Zones" and "Sync from All Zones" options.

Fixes: https://tracker.ceph.com/issues/76989
Signed-off-by: Aashish Sharma <aasharma@redhat.com>

rgw: move SWIFT error_handler out-of-line to fix link failure

The two error_handler overrides are defined inline in rgw_rest_swift.h
and delegate to RGWSwiftWebsiteHandler::error_handler, a non-virtual
function defined in rgw_rest_swift.cc (librgw_a.a). Because the header
is included by rgw_rest.cc, the inline bodies are emitted in
librgw_common.a, which then ODR-uses that symbol across archives.

The link line lists librgw_a.a before librgw_common.a, and GNU ld only
pulls archive members on demand: when librgw_a.a is scanned nothing yet
references RGWSwiftWebsiteHandler::error_handler, so rgw_rest_swift.cc.o
is dropped and the symbol is later unresolved. This shows up as a link
failure with gcc 16 -O2.

Move the two bodies into rgw_rest_swift.cc next to the function they
call, so the ODR-use stays within the same object and the build no
longer depends on archive scan order. No functional change.
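
The scan-order dependence can be reproduced with a toy model of
single-pass archive resolution. This Python sketch is a simplification,
not a description of GNU ld internals: `link` is a hypothetical function,
and `rest_entry` is a made-up symbol standing in for whatever causes
rgw_rest.cc.o to be pulled in.

```python
def link(initial_undefined, archives):
    """Toy single-pass archive resolution.

    `archives` is an ordered list; each archive maps a member name to a
    (defined_symbols, referenced_symbols) pair. Archives are visited once,
    left to right, and a member is pulled only if it defines a symbol that
    is undefined at that moment. Returns the symbols left undefined.
    """
    undefined = set(initial_undefined)
    defined = set()
    for archive in archives:
        pulled = set()
        progress = True
        while progress:  # members of one archive may satisfy each other
            progress = False
            for member, (defs, refs) in archive.items():
                if member in pulled or not (set(defs) & undefined):
                    continue
                pulled.add(member)
                defined |= set(defs)
                undefined |= set(refs)
                undefined -= defined
                progress = True
    return undefined


HANDLER = "RGWSwiftWebsiteHandler::error_handler"
# librgw_a.a: defines the handler, references nothing in this model.
rgw_a = {"rgw_rest_swift.cc.o": ({HANDLER}, set())}
# librgw_common.a: the inline bodies emitted here reference the handler.
rgw_common = {"rgw_rest.cc.o": ({"rest_entry"}, {HANDLER})}

print(link({"rest_entry"}, [rgw_a, rgw_common]))  # handler stays undefined
print(link({"rest_entry"}, [rgw_common, rgw_a]))  # set(): everything resolved
```

In the first call the archive that defines the handler is scanned before
anything references it, so its member is never pulled. Moving the bodies
next to the function they call removes the cross-archive reference, so
the outcome no longer depends on the order at all.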

Signed-off-by: Sun Yuechi <sunyuechi@iscas.ac.cn>

cmake/AddCephTest: use namespaced Catch2 imported targets

AddCephTest.cmake links the bare target names Catch2 / Catch2WithMain.
With WITH_SYSTEM_CATCH2=ON, CPM resolves Catch2 via find_package(),
which only exports the namespaced IMPORTED targets Catch2::Catch2 /
Catch2::Catch2WithMain. CMake then treats the bare names as plain
library names and the link fails with -lCatch2WithMain, since the
physical library is named libCatch2Main (OUTPUT_NAME "Catch2Main").

Use the namespaced names. Catch2 exports them as ALIASes in the bundled
(CPM subproject) build too, so the default path keeps working.

Signed-off-by: Sun Yuechi <sunyuechi@iscas.ac.cn>