unittest-seastore runs the seastar reactor on a separate thread
(SeastarRunner) and stops it at exit without draining its pending
tasks, so a few cached extents those tasks still held are leaked at
shutdown. Suppress them by the three seastore subsystems they come from.
Several unittests fail LeakSanitizer on still-reachable allocations that
belong to third-party libraries, not to Ceph. These are boost.thread's
main-thread TLS, OpenSSL's one-time init (the ForkDeathTest children
_exit() before it is freed), and the cipher, DRBG and error-stack state
that OpenSSL and libcryptsetup keep behind the librbd encryption and
migration unittests. None get freed without OPENSSL_cleanup(), so suppress
them by their allocation entry points.
Sun Yuechi [Sun, 21 Jun 2026 08:42:54 +0000 (16:42 +0800)]
vstart: load lsan/asan suppressions on WITH_ASAN builds
AddCephTest.cmake runs unittests with
ASAN_OPTIONS/LSAN_OPTIONS=suppressions=qa/{asan,lsan}.supp, but vstart.sh
does not, so on a WITH_ASAN build `ceph-mon --mkfs` aborts on a still-reachable
leak that those suppressions cover and fails the "ceph API tests" job. Export
the same options when WITH_ASAN=ON.
Sun Yuechi [Sun, 21 Jun 2026 08:42:54 +0000 (16:42 +0800)]
test/encoding: probe `setarch -R` and drop the arch argument
readable.sh wraps ceph-dencoder with `setarch $(uname -m) -R` to disable
ASLR on ASan builds, but the arch-qualified form also sets the personality
to that arch, which fails where setarch can't (e.g. riscv64). Use bare
`setarch -R` to only clear ASLR, and probe it first so the script falls
back to running ceph-dencoder unwrapped.
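
The probe-then-fall-back logic described above can be sketched as follows. This is a Python rendering for illustration only (readable.sh itself is a shell script); the helper name `wrap_without_aslr` is hypothetical, and the sketch assumes a `true` binary is on PATH for the probe.

```python
import shutil
import subprocess


def wrap_without_aslr(cmd):
    """Prefix cmd with bare `setarch -R` if that form works here.

    Hypothetical helper: returns cmd unchanged when setarch is missing
    or when the probe fails, so the caller still runs the tool unwrapped.
    """
    setarch = shutil.which("setarch")
    if setarch is None:
        return list(cmd)
    try:
        # Probe with a trivial command; no arch argument, so only ASLR
        # is cleared and the personality's architecture is left alone.
        probe = subprocess.run(
            [setarch, "-R", "true"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
    except OSError:
        return list(cmd)
    if probe.returncode != 0:
        return list(cmd)
    return [setarch, "-R", *cmd]
```

Either way the caller gets back a runnable argument list, so a platform where `setarch -R` is unusable degrades to running ceph-dencoder with ASLR left on rather than failing.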
Sun Yuechi [Sun, 21 Jun 2026 08:42:54 +0000 (16:42 +0800)]
crimson: build seastar with the default allocator under ASan
With this build configured as RelWithDebInfo, seastar keeps its own
allocator instead of falling back to libc's. Under ASan that allocator
is called (via dlsym) before it is initialized and SIGSEGVs every
seastar/crimson unittest before main(). Define SEASTAR_DEFAULT_ALLOCATOR
under WITH_ASAN to keep seastar on the libc allocator.
Sun Yuechi [Sun, 21 Jun 2026 16:14:55 +0000 (00:14 +0800)]
cmake: use the libc allocator for sanitizer builds
tcmalloc/jemalloc keep exporting the global operator new/delete even though
their malloc is shadowed by the sanitizer interceptor, so memory the sanitizer
allocated gets freed through tcmalloc and SIGSEGVs (e.g. seastar coroutine
frames). Force libc when WITH_ASAN is set.
Sun Yuechi [Sat, 20 Jun 2026 05:51:39 +0000 (13:51 +0800)]
test/cmake: drop dead env_vars_for_tox_tests set_property
Both `tox_tests` and `env_vars_for_tox_tests` have been undefined since
f0079a1030b, so this expands to `set_property(TEST PROPERTY ENVIRONMENT)`,
a no-op. The actual per-test environment for tox tests is set in
add_tox_test() (cmake/modules/AddCephTest.cmake). Remove the leftover.
Sun Yuechi [Sun, 7 Jun 2026 14:19:08 +0000 (22:19 +0800)]
mypy: skip follow_imports for prettytable
With test venvs on system site-packages, mypy picks up the system
prettytable (3.4.0+, typed). It flags mgr's add_row(tuple) against the
list[Any] signature (src/mypy.ini) and qa's float_format = str against
the dict[str, str] property (qa/mypy.ini). Skip follow_imports in both.
Sun Yuechi [Sat, 6 Jun 2026 18:05:34 +0000 (02:05 +0800)]
test: optionally run test venvs with system site-packages
Add a CEPH_PYTHON_SYSTEM_SITE switch (off by default). When set:
- setup-virtualenv.sh builds its venv with --system-site-packages;
- run_tox.sh exports VIRTUALENV_SYSTEM_SITE_PACKAGES=true for tox's venvs.
This lets distro packages satisfy test dependencies instead of pip building
them from sdist, which helps where prebuilt wheels are missing (e.g. scipy and
numpy on riscv64) by avoiding a slow rebuild when the RPMs are installed.
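
A minimal sketch of how such a switch could be threaded through, written in Python for illustration. The function names are hypothetical, the real scripts are shell, and treating any non-empty value as "set" is an assumption; `python3 -m venv --system-site-packages` is used here only as a stand-in for whatever setup-virtualenv.sh actually invokes.

```python
import os


def venv_command(venv_dir, environ=None):
    """Build a venv creation command, honoring CEPH_PYTHON_SYSTEM_SITE."""
    environ = os.environ if environ is None else environ
    cmd = ["python3", "-m", "venv"]
    # Assumption: any non-empty value counts as "set".
    if environ.get("CEPH_PYTHON_SYSTEM_SITE"):
        cmd.append("--system-site-packages")
    cmd.append(venv_dir)
    return cmd


def tox_environment(environ=None):
    """Return a copy of the environment to hand to tox."""
    env = dict(os.environ if environ is None else environ)
    if env.get("CEPH_PYTHON_SYSTEM_SITE"):
        # Variable name taken from the commit message above.
        env["VIRTUALENV_SYSTEM_SITE_PACKAGES"] = "true"
    return env
```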
Sun Yuechi [Thu, 18 Jun 2026 19:06:06 +0000 (03:06 +0800)]
mgr/tox: run pytest in parallel
The py3 and coverage tox environments run the full mgr pytest suite
serially, which makes run-tox-mgr the longest test in CI. Add
pytest-xdist and pass `-n auto` to both so the suite is distributed
across the available CPUs.
pytest-xdist is constrained to <2 to stay compatible with the pinned
pytest-cov. Running in parallel also surfaced a hard-coded port in
cephadm's test_node_proxy, which now allocates an ephemeral port per
process.
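
One common way to get a per-process ephemeral port is to bind to port 0 and let the kernel choose. The sketch below illustrates that technique; it is not necessarily the exact code used in test_node_proxy, and the helper name is hypothetical.

```python
import socket


def allocate_ephemeral_port(host="127.0.0.1"):
    """Ask the kernel for a currently free TCP port on host."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # Port 0 means "pick any free port".
        sock.bind((host, 0))
        return sock.getsockname()[1]
```

The socket is closed before the port is reused, so another process could in principle claim it in between. For xdist workers on a test host this window is usually acceptable, and it avoids every worker colliding on one hard-coded port.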
Kefu Chai [Sun, 17 Mar 2024 10:42:44 +0000 (18:42 +0800)]
script/run-make: enable ASan
when performing tests, we should enable sanitizers for detecting
potential issues. so, in this change, we enable ASan, TSan and
UBSan.
script/run-make.sh is used by our CI job for testing PRs, so
enabling these sanitizers helps us to identify issues as early as
possible. because ASan cannot be used along with TSan, we prefer
using ASan to capture memory-related issues over detecting
multi-threading issues.
also, because of https://bugs.llvm.org/show_bug.cgi?id=23272, we
cannot enable multiple sanitizers. but we should enable UBSan as well,
once we can use a higher version of Clang than Clang-14. with
Clang-14, when enabling UBSan, we'd hit the following FTBFS
```
error: Cannot represent a difference across sections
```
when compiling `src/tools/neorados.cc`
Abhishek Desai [Tue, 26 May 2026 07:48:40 +0000 (13:18 +0530)]
mgr/dashboard : Support wildcard sans and zonegroup hostnames
Fixes: https://tracker.ceph.com/issues/76795
Signed-off-by: Abhishek Desai <abhishek.desai1@ibm.com>
Kefu Chai [Thu, 18 Jun 2026 14:06:43 +0000 (22:06 +0800)]
ceph.spec.in: exclude CI and test directories from mgr plugin packages
The ceph-mgr-dashboard and ceph-mgr-rook packages install their entire
mgr module directory, which includes a ci/ subdirectory containing
Dockerfiles, e2e test scripts, and cluster specs used only for upstream
CI pipelines. ceph-mgr-cephadm similarly ships a tests/ directory with
Python unit tests. These files have no runtime purpose on a deployed
system and should not be shipped in the binary packages.
Exclude mgr/cephadm/tests, mgr/dashboard/ci, and mgr/rook/ci via
%exclude directives.
Kefu Chai [Thu, 18 Jun 2026 14:06:36 +0000 (22:06 +0800)]
debian/rules: exclude CI and test directories from mgr plugin packages
The ceph-mgr-dashboard and ceph-mgr-rook packages install their entire
mgr module directory, which includes a ci/ subdirectory containing
Dockerfiles, e2e test scripts, and cluster specs used only for upstream
CI pipelines. ceph-mgr-cephadm similarly ships a tests/ directory with
Python unit tests. These files have no runtime purpose on a deployed
system and should not be shipped in the binary packages.
Exclude mgr/cephadm/tests, mgr/dashboard/ci, and mgr/rook/ci via
dh_install --exclude.
mgr/cephadm: add ca_cert_required parameter to get_certificates mock in test
Add the ca_cert_required parameter to the lambda function mocking
CephadmService.get_certificates in test_prometheus_config_security_enabled
to match the updated method signature.
This ensures the test mock properly handles the new parameter that was
added to the get_certificates method.
mgr/smb: Add ssl_certificates buffer for smb features
Add an ssl_certificates buffer for smb features like remote_control
and keybridge. When a certificate is applied, it is stored with the
feature name as the key and an SSLParameters object as the value, where
SSLParameters holds the cert, key and ca-cert.
mgr/cephadm: Add function to get ssl certificate from ssl_certificates
Add _get_certificates_from_spec_ssl_certificates to get certificates
from the ssl_certificates buffer. It returns TLSCredentials built from
the SSLParameters of ssl_certificates[feature].
Kefu Chai [Sun, 14 Jun 2026 06:52:00 +0000 (14:52 +0800)]
journal/ObjectPlayer: don't acquire locks in destructor
~ObjectPlayer took m_timer_lock only to assert two invariants. but that
lock is borrowed by reference from the caller's SafeTimer, and an
ObjectPlayer can outlive it: a C_Fetch/C_WatchFetch completion on the
librados finisher may hold the last reference and run ~ObjectPlayer after
the timer and its lock are already gone. re-taking the freed lock is a
heap-use-after-free, which unittest_journal hits on arm64 under ASan.
the lock isn't needed though: at refcount 0 the watch has been cancelled
(m_watch_ctx == nullptr, asserted below) so no timer task references us, and
no fetch is in flight since a pending fetch holds a reference. nothing else
can touch our state. we can skip acquiring `m_lock` as well: in the
destructor it shouldn't matter. if one of these asserts fails because
the destructor races with some `ObjectPlayer` method, we get exactly
what the `assert()` was added for. they are here to catch bugs, and such
a race being possible at all is a bug in itself.
Xiubo Li [Wed, 17 Jun 2026 01:14:43 +0000 (09:14 +0800)]
common/options: mark client_force_lazyio as not runtime updatable
The client_force_lazyio option is currently marked as supporting
runtime updates in the configuration schema, but this is misleading.
The value is read once during each file open/create and stored in
the file flags. There is no config observer registered to handle
dynamic updates, and there is no logic to propagate changes to the
already opened file handles.
This patch adds the NO_RUNTIME flag to the option definition to
correctly reflect reality.
Fixes: https://tracker.ceph.com/issues/77451
Signed-off-by: Xiubo Li <xiubo.li@clyso.com>
Matthew N. Heler [Mon, 18 May 2026 01:57:01 +0000 (20:57 -0500)]
mon: add monitor RocksDB backup and restore
Implements an opt-in backup mechanism for the monitor using
rocksdb::BackupEngine. Backups run on a schedule when
mon_backup_interval is set, or are triggered manually via
`ceph tell mon.* backup`. Cleanup keeps the last N, hourly,
and daily snapshots, with a free-space guard. Off by default.
Restore is offline: stop the mon and run
ceph-mon --restore-backup <dir> --yes-i-really-mean-it
optionally with --backup-version (BackupEngine logical version,
as shown by --list-backups). The mon keyring is stashed alongside
the RocksDB backup so a wiped mon_data is recovered end-to-end,
and kv_backend is stamped back when missing.
Co-authored-by: Daniel Poelzleithner <poelzleithner@b1-systems.de>
Signed-off-by: Matthew N. Heler <matthew.heler@hotmail.com>
Emmanuel Ameh [Tue, 9 Jun 2026 12:19:27 +0000 (13:19 +0100)]
doc: Replace Python 2 package names with Python 3 equivalents
librados-intro.rst referenced ``python-rados`` for CentOS/RHEL.
rbd-openstack.rst referenced ``python-rbd`` for both apt and yum.
Python 2 reached end-of-life in January 2020; these package names
install the Python 2 bindings (or fail entirely) on current distros.
Replace with the correct Python 3 package names: python3-rados and
python3-rbd.
rbd-mirror: prune obsolete primary mirror snapshots after relocation
Previously, obsolete primary and demoted primary snapshots on the
secondary cluster were not cleaned up immediately after relocation.
Instead, old primary snapshots remained until a subsequent promote
operation triggered their cleanup, while old demoted primary snapshots
persisted until a later demote operation removed them.
Proactively clean up obsolete primary and demoted primary snapshots
that are no longer required after relocation, and add test coverage to
validate the cleanup behavior.
Xuehan Xu [Sun, 24 May 2026 08:45:01 +0000 (16:45 +0800)]
crimson/os/seastore/lba: avoid paddr from crossing coroutines
Previously, LBAManager::remap_mappings() worked as follows:
1. get the mapping's val;
2. remove the mapping;
3. insert the remapped mappings.
With pessimistic cc in place, during steps 2 and 3 above, a rewrite
transaction that modifies the same mapping might be committed and miss
the pending lba leaf nodes, because they don't contain the mapping at
that time.
This commit changes the above workflow as follows:
1. replace the mapping with the first remapped mapping;
2. insert the remaining remapped mappings.
Note that the paddr of each remaining remapped mapping is calculated from
the mapping before it, in the same coroutine as the insertion, which
means it always sees the modification made by a background rewrite
transaction.
Nizamudeen A [Wed, 10 Jun 2026 05:21:33 +0000 (10:51 +0530)]
mgr/dashboard: add custom filtering rules to the table
```
<cd-table #table
id="pool-list"
[data]="pools"
[columns]="columns"
selectionType="single"
[hasDetails]="true"
[status]="tableStatus"
[autoReload]="-1"
(fetchData)="taskListService.fetch()"
(setExpandedRow)="setExpandedRow($event)"
(updateSelection)="updateSelection($event)"
[customFilter]="true" # set this to true
(customFilterChange)="onCustomFilterChange($event)" # get the new rules from here>
```
Fixes: https://tracker.ceph.com/issues/77290
Signed-off-by: Nizamudeen A <nia@redhat.com>
Oguzhan Ozmen [Tue, 24 Mar 2026 17:54:22 +0000 (17:54 +0000)]
doc: add PendingReleaseNotes entry for rgw multisite DNS endpoint resolution
Documents the new rgw_rest_conn_connect_to_resolved_ips feature, which
lets RGW resolve the HTTP endpoints of RGW services (such as multisite)
into all of their IP addresses and distribute requests across them using
round-robin with per-IP health tracking. This supports DNS service
discovery deployments without external load balancers.
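
Round-robin selection with per-IP health tracking can be sketched as below. This is a simplified Python illustration of the general technique, not RGW's implementation: the class and method names are hypothetical, and how RGW decides that an IP is unhealthy or recovers it is not modeled.

```python
class ResolvedEndpoints:
    """Rotate over resolved IPs, skipping ones marked unhealthy."""

    def __init__(self, ips):
        self._ips = list(ips)
        self._healthy = {ip: True for ip in self._ips}
        self._next = 0  # index where the next search starts

    def mark(self, ip, healthy):
        """Record the outcome of the latest request to ip."""
        self._healthy[ip] = healthy

    def pick(self):
        """Return the next healthy IP, or None if none is usable."""
        count = len(self._ips)
        for step in range(count):
            ip = self._ips[(self._next + step) % count]
            if self._healthy[ip]:
                # Resume after the chosen IP so load keeps rotating.
                self._next = (self._next + step + 1) % count
                return ip
        return None
```

With three addresses the picks cycle through all of them; after `mark(ip, False)` that address is skipped until it is marked healthy again.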
Kotresh HR [Tue, 16 Jun 2026 17:31:13 +0000 (23:01 +0530)]
qa: Add tests for cephfs_mirror_directory perf counters
Verify counter dump registration, current-sync gauges while syncing
(full/delta), last-sync and summary counters after idle sync, and
extend mirror stats and remote-snap failure tests for per-directory
snaps_* and dir_state.
Shweta Sodani [Fri, 12 Jun 2026 11:41:15 +0000 (17:11 +0530)]
doc/mgr/smb: document client compatibility mode
Add documentation for --client-compat parameter in 'cluster create'
command and new 'cluster update client-compat' command. This feature
enables macOS-specific SMB optimizations (fruit VFS, streams_xattr)
and can be set during cluster creation or updated for existing clusters.
Describe per-directory labeled perf counters, labels, update behavior,
mapping to peer status, and counter reference tables. Document the
per-peer tick thread and cephfs_mirror_tick_interval, which refreshes
current-sync gauges on each tick. Add a PendingReleaseNotes entry for
the new cephfs_mirror_directory perf counter group.
Kotresh HR [Tue, 16 Jun 2026 16:57:12 +0000 (22:27 +0530)]
cephfs_mirror: update per-directory last sync and summary perf counters
Wire the remaining cephfs_mirror_directory labeled counters that were
registered in the prior commit but not yet refreshed. Live
current_syncing_snap gauges continue to be updated from the per-peer tick
thread; this commit updates last_synced_snap and per-directory snap
summary counters at the points where SnapSyncStat is actually modified.
Depends on the cephfs_mirror_directory PerfCounters schema (dir_state,
current_*, last_*, snaps_*) added when each mirrored directory is
registered.
These counters appear under "cephfs_mirror_directory" in "counter dump" with
the same labels as current-sync metrics (source_fscid, source_filesystem,
peer_uuid, peer_cluster_name, peer_cluster_filesystem, directory). ceph-exporter
exposes them as e.g. ceph_cephfs_mirror_directory_last_sync_bytes and
ceph_cephfs_mirror_directory_snaps_synced.
Unlike cephfs_mirror_peers, values are per (peer_uuid, directory) rather than
aggregated across all directories on the peer. Peer-level counters are
unchanged.
Kotresh HR [Tue, 16 Jun 2026 17:22:21 +0000 (22:52 +0530)]
tools/cephfs_mirror: Expose per-directory snap metrics via perf counters
Introduce a new labeled perf counter group, cephfs_mirror_directory, so
per-directory snapshot mirror progress can be scraped via "counter dump" and
exported to Prometheus by ceph-exporter (e.g.
ceph_cephfs_mirror_directory_current_sync_bytes).
Design
------
* One PerfCounters instance per mirrored directory on a peer, keyed in
m_directory_perf_counters and registered on the daemon-wide
PerfCountersCollection.
* Labels on each instance (flat counter dump array entries):
- source_fscid, source_filesystem
- peer_uuid, peer_cluster_name, peer_cluster_filesystem
- directory (dir_root, e.g. "/parent/d1")
The peer_uuid label disambiguates the same directory path mirrored to
different peers.
* Counters are created in init() and add_directory(), removed in
remove_directory() and the PeerReplayer destructor.
* Priority follows cephfs_mirror_perf_stats_prio (same as
cephfs_mirror_peers).
Update path
-----------
Live / current_syncing_snap gauges are refreshed from
update_directory_current_sync_perf_counters(), called by
refresh_directory_current_sync_perf_counters() from the per-peer tick
thread (run_tick()). Each cephfs_mirror_tick_interval seconds (default 5)
the tick thread updates counters for each registered (actively syncing)
directory.
All of the following are added to the builder in this commit. Only the
"current sync" and dir_state fields listed under "Updated in this commit"
are written here; last_synced_snap and per-directory snap summary counters
are registered for a follow-up commit that updates them when stats change.
Current syncing snapshot (peer_status "current_syncing_snap")
[Updated in this commit]
current_snap_id - snapshot id being synchronized
current_sync_mode - 0 = full, 1 = delta (snapdiff)
current_read_bps - bytes/sec read (raw, not formatted)
current_write_bps - bytes/sec written
crawl_state - 0 = N/A, 1 = in-progress, 2 = completed
crawl_duration_seconds - crawl duration; in-progress uses now - start
datasync_wait_state - 0 = none, 1 = waiting, 2 = complete
datasync_wait_duration_seconds
current_sync_bytes - bytes synced so far for this snap
current_total_bytes - total bytes for this snap
current_sync_bytes_percent - basis points (1745 = 17.45%)
current_sync_files
current_total_files
current_sync_files_percent - basis points
current_eta_valid - 0 = calculating, 1 = ETA available
current_eta_seconds - ETA in seconds when valid
Per-directory snapshot summary (peer_status snaps_*)
[Registered only; not updated in this commit]
snaps_synced, snaps_deleted, snaps_renamed
Last synced snapshot (peer_status "last_synced_snap")
[Registered only; not updated in this commit]
last_snap_id
last_crawl_duration_seconds
last_datasync_wait_duration_seconds
last_sync_duration_seconds
last_sync_timestamp - utime_t / seconds since epoch
last_sync_bytes
last_sync_files
When idle or failed, current_* counters are zeroed and dir_state reflects
0 or 2 respectively.
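
The basis-point encoding used by the *_percent counters above (1745 = 17.45%) can be illustrated with a small Python sketch. The helper names are hypothetical, and both the truncating division and the value reported for a zero total are assumptions, not taken from the commit.

```python
def percent_basis_points(done, total):
    """Encode done/total as integer basis points (1/100 of a percent)."""
    if total <= 0:
        # Assumption: report 0 when the total is not yet known.
        return 0
    return done * 10000 // total


def format_basis_points(bp):
    """Render a basis-point value as a percentage string."""
    return f"{bp // 100}.{bp % 100:02d}%"
```

For example, `format_basis_points(1745)` gives `"17.45%"`. Keeping the counter an integer avoids floating-point values in the perf counter while preserving two decimal places.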
Kotresh HR [Tue, 16 Jun 2026 16:38:15 +0000 (22:08 +0530)]
tools/cephfs_mirror: Add per-peer tick thread with configurable interval
Introduce a per-peer tick thread controlled by cephfs_mirror_tick_interval
(default 5 seconds). The interval is re-read each iteration so configuration
changes take effect without restarting the daemon. The thread provides a
generic hook for future periodic mirroring work.
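
A tick loop that re-reads its interval on every iteration can be sketched as follows. This is a Python illustration of the pattern only; the daemon is C++, and the names `Ticker`, `get_interval` and `on_tick` are hypothetical.

```python
import threading


class Ticker:
    """Call on_tick periodically; the interval is re-read each iteration."""

    def __init__(self, get_interval, on_tick):
        self._get_interval = get_interval  # returns seconds, e.g. from config
        self._on_tick = on_tick
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def _run(self):
        # Event.wait() returns True once stop() was called, which ends
        # the loop; on timeout it returns False and we run one tick.
        while not self._stop.wait(self._get_interval()):
            self._on_tick()
```

Because `get_interval` is called before every wait, a changed setting is picked up on the next iteration without restarting the thread, and waiting on the event rather than sleeping lets `stop()` interrupt a long interval promptly.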
Ronen Friedman [Mon, 15 Jun 2026 11:24:18 +0000 (11:24 +0000)]
crimson/os/seastore: fix laddr_t formatter and its use
The existing 'laddr_t' formatter did not support a ':x' format specifier
(in fact, the output was always hexadecimal).
Here we remove the ':x', but also refactor the custom formatter to
avoid using the streambuf mechanism.
Note - SEASTORE_LADDR_USE_BOOST_U128 is no longer supported by the formatter.
Shweta Sodani [Fri, 12 Jun 2026 11:29:22 +0000 (16:59 +0530)]
mgr/smb: add tests for client compatibility mode
Add comprehensive test coverage for the new client compatibility
feature that enables macOS-specific SMB optimizations:
- test_enums.py: Add tests for ClientSupportMode enum values
(DEFAULT and MACOS) and string representation
- test_resources.py: Add tests for cluster client_compat field,
effective_client_compat property, and is_macos_compatibility_enabled
property with different mode configurations
- test_smb.py: Add integration tests for cluster_update_client_compat
CLI command including successful updates and error handling for
non-existent clusters
These tests ensure the client compatibility mode can be properly
set, retrieved, and updated at the cluster level.
Kefu Chai [Sun, 14 Jun 2026 08:47:59 +0000 (16:47 +0800)]
test/objectstore: hold split_blob cache shards in unique_ptr
ExtentMap.split_blob, added in be93e121a98, creates the onode and buffer cache
shards as raw pointers and never frees them. once we build with ASan,
LeakSanitizer reports them (and the structures they own):
Direct leak of 9928 byte(s) ... BlueStore::OnodeCacheShard::create
Direct leak of 224 byte(s) ... BlueStore::BufferCacheShard::create
SUMMARY: AddressSanitizer: 10288 byte(s) leaked in 8 allocation(s).
every other test in this file already wraps these shards in a unique_ptr,
declared before the collection that borrows them so they outlive it. do
the same here.