Emmanuel Ameh [Tue, 9 Jun 2026 12:19:27 +0000 (13:19 +0100)]
doc: Replace Python 2 package names with Python 3 equivalents
librados-intro.rst referenced ``python-rados`` for CentOS/RHEL.
rbd-openstack.rst referenced ``python-rbd`` for both apt and yum.
Python 2 reached end-of-life in January 2020; these package names
install the Python 2 bindings (or fail entirely) on current distros.
Replace with the correct Python 3 package names: python3-rados and
python3-rbd.
Nizamudeen A [Wed, 10 Jun 2026 05:21:33 +0000 (10:51 +0530)]
mgr/dashboard: add custom filtering rules to the table
```
<cd-table #table
id="pool-list"
[data]="pools"
[columns]="columns"
selectionType="single"
[hasDetails]="true"
[status]="tableStatus"
[autoReload]="-1"
(fetchData)="taskListService.fetch()"
(setExpandedRow)="setExpandedRow($event)"
(updateSelection)="updateSelection($event)"
[customFilter]="true" # set this to true
(customFilterChange)="onCustomFilterChange($event)" # get the new rules from here>
```
Fixes: https://tracker.ceph.com/issues/77290
Signed-off-by: Nizamudeen A <nia@redhat.com>
Oguzhan Ozmen [Tue, 24 Mar 2026 17:54:22 +0000 (17:54 +0000)]
doc: add PendingReleaseNotes entry for rgw multisite DNS endpoint resolution
Documents the new rgw_rest_conn_connect_to_resolved_ips feature, which
enables RGW to resolve the HTTP endpoints of RGW services (such as
multisite) into all of their IP addresses and distribute requests across
them using round-robin with per-IP health tracking, supporting DNS service
discovery deployments without external load balancers.
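A minimal Python sketch of round-robin selection with per-IP health
tracking, to illustrate the idea only. It is not the RGW implementation:
the class name, the retry_after cool-down and the mark_failed/mark_ok
methods are assumptions made for this example.

```python
import time


class ResolvedEndpointPool:
    """Hypothetical round-robin pool over the IPs one hostname resolved to."""

    def __init__(self, ips, retry_after=30.0, clock=time.monotonic):
        self._ips = list(ips)
        self._retry_after = retry_after  # assumed cool-down, in seconds
        self._clock = clock
        self._failed_at = {}  # ip -> time of the last recorded failure
        self._next = 0

    def mark_failed(self, ip):
        self._failed_at[ip] = self._clock()

    def mark_ok(self, ip):
        self._failed_at.pop(ip, None)

    def _is_healthy(self, ip):
        failed = self._failed_at.get(ip)
        return failed is None or self._clock() - failed >= self._retry_after

    def pick(self):
        """Return the next healthy IP in rotation, or None if all are down."""
        for _ in range(len(self._ips)):
            ip = self._ips[self._next]
            self._next = (self._next + 1) % len(self._ips)
            if self._is_healthy(ip):
                return ip
        return None
```

An IP that failed is skipped until the cool-down expires, after which it
is tried again in its normal turn.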
Shweta Sodani [Fri, 12 Jun 2026 11:41:15 +0000 (17:11 +0530)]
doc/mgr/smb: document client compatibility mode
Add documentation for --client-compat parameter in 'cluster create'
command and new 'cluster update client-compat' command. This feature
enables macOS-specific SMB optimizations (fruit VFS, streams_xattr)
and can be set during cluster creation or updated for existing clusters.
Ronen Friedman [Mon, 15 Jun 2026 11:24:18 +0000 (11:24 +0000)]
crimson/os/seastore: fix laddr_t formatter and its use
The existing 'laddr_t' formatter did not support a ':x' format specifier
(in fact, the output was always hexadecimal).
Here we remove the ':x', but also refactor the custom formatter to
avoid using the streambuf mechanism.
Note - SEASTORE_LADDR_USE_BOOST_U128 is no longer supported by the formatter.
Shweta Sodani [Fri, 12 Jun 2026 11:29:22 +0000 (16:59 +0530)]
mgr/smb: add tests for client compatibility mode
Add comprehensive test coverage for the new client compatibility
feature that enables macOS-specific SMB optimizations:
- test_enums.py: Add tests for ClientSupportMode enum values
(DEFAULT and MACOS) and string representation
- test_resources.py: Add tests for cluster client_compat field,
effective_client_compat property, and is_macos_compatibility_enabled
property with different mode configurations
- test_smb.py: Add integration tests for cluster_update_client_compat
CLI command including successful updates and error handling for
non-existent clusters
These tests ensure the client compatibility mode can be properly
set, retrieved, and updated at the cluster level.
Kefu Chai [Tue, 16 Jun 2026 08:24:16 +0000 (16:24 +0800)]
rocksdb: update submodule to fix FTBFS due to missing <cstdint>
43dd4cbd370 bumped the rocksdb submodule to v7.10.2 for CVE-2022-23476,
dropping the <cstdint> includes the v7.9.2 pin carried.
db/blob/blob_file_meta.h uses uint64_t but no longer includes <cstdint>,
so it compiles only where another header pulls <cstdint> in transitively.
GCC with libstdc++ 16.1.0 no longer does, so the build fails:
db/blob/blob_file_meta.h: error: 'uint64_t' has not been declared
our targeted distros still pull it in, so the failure went unnoticed
there: ubuntu jammy (GCC 11.2.0) and noble (GCC 13.2).
bump the submodule to a cherry-pick of upstream rocksdb 72c3887167,
which fixes the same FTBFS.
rgw/d4n: adding a thread to asynchronously update
localweight to the cache backend. Removing the code
to update the localweight from GET and PUT requests.
Sun Yuechi [Mon, 15 Jun 2026 19:41:34 +0000 (03:41 +0800)]
crimson,mgr: mark assert-only variables [[maybe_unused]]
These variables are only read inside assert(), which is compiled out
under NDEBUG. Mark them [[maybe_unused]] to silence the warnings while
keeping the debug-only assert() style used by the surrounding code:
src/crimson/os/seastore/lba/btree_lba_manager.cc:1078: unused variable 'orig_len' [-Wunused-variable]
src/crimson/os/seastore/omap_manager/log/log_manager.cc:73: variable 'ret' set but not used [-Wunused-but-set-variable]
src/crimson/os/seastore/transaction_manager.cc:382: variable 'intermediate_key' set but not used [-Wunused-but-set-variable]
src/mgr/PyModule.cc:166,186: unused variable 'r' [-Wunused-variable]
Sun Yuechi [Mon, 15 Jun 2026 19:41:27 +0000 (03:41 +0800)]
test/crimson/seastore: use gtest assertion macros instead of assert()
Plain assert() is compiled out under NDEBUG, leaving the checked
variables unused. Use the always-evaluated gtest macros instead.
src/test/crimson/seastore/test_cbjournal.cc:586: variable 'old_written_to' set but not used [-Wunused-but-set-variable]
src/test/crimson/seastore/test_btree_lba_manager.cc:345: unused structured binding declaration [-Wunused-variable]
Sun Yuechi [Mon, 15 Jun 2026 19:41:19 +0000 (03:41 +0800)]
crimson,test: remove unused functions and dead variable
Fixing these warnings:
src/crimson/os/seastore/seastore.cc:83: 'omaptree_initialize' defined but not used [-Wunused-function]
src/crimson/osd/replicated_recovery_backend.cc:733: 'nullopt_if_empty' defined but not used [-Wunused-function]
src/test/rgw/test_rgw_kms_cache.cc:63: 'rethrow' defined but not used [-Wunused-function]
src/test/librados/test_cxx.cc:215: variable 'cmd' set but not used [-Wunused-but-set-variable]
Kefu Chai [Sat, 13 Jun 2026 01:50:09 +0000 (09:50 +0800)]
python-common/cryptotools: stop using the removed X509Req API
pyOpenSSL deprecated OpenSSL.crypto.X509Req in 24.2.0 (2024-07-20) and
removed it in 26.3.0 (2026-06-12). as we don't pin pyopenssl, CI picked
up the new release, and create_self_signed_cert() started failing with:
AttributeError: module 'OpenSSL.crypto' has no attribute 'X509Req'
this took down run-tox-mgr, run-tox-mgr-dashboard-py3 and the mypy check.
we only used X509Req to build a subject name and then copied it into the
X509 cert. so drop it, and set the subject on the cert directly. the
resulting cert stays the same: subject from dname, issuer set to the same
subject, self-signed.
Kotresh HR [Mon, 25 May 2026 18:22:29 +0000 (23:52 +0530)]
doc: Update the mirroring doc with new metrics fields
Update the mirroring documentation and the
release notes with the newly introduced metrics
and their availability via the 'fs mirror peer
status' asok interface.
Kotresh HR [Fri, 5 Jun 2026 14:23:14 +0000 (19:53 +0530)]
tools/cephfs_mirror: Nest peer_status metrics by dir path and peer uuid
Restructure peer_status output so mirrored directory paths can be
shared by multiple peers without key collisions. Metrics are grouped
as metrics/<dir_path>/peer/<peer_uuid>/ instead of flat dir keys.
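A small Python sketch of the nesting described above, assuming the
samples arrive as (dir_path, peer_uuid, metrics) tuples. The function
name and the metric fields in the usage note are hypothetical; only the
metrics/<dir_path>/peer/<peer_uuid>/ layout comes from the text.

```python
def nest_peer_metrics(samples):
    """Group (dir_path, peer_uuid, metrics) samples by directory, then peer."""
    nested = {"metrics": {}}
    for dir_path, peer_uuid, metrics in samples:
        # One entry per directory, holding a "peer" map keyed by peer uuid.
        per_dir = nested["metrics"].setdefault(dir_path, {"peer": {}})
        per_dir["peer"][peer_uuid] = dict(metrics)
    return nested
```

With a flat mapping keyed only by directory, two peers mirroring /d0
would overwrite each other. Here
nest_peer_metrics([("/d0", "peer-a", {"state": "idle"}), ("/d0", "peer-b", {"state": "syncing"})])
keeps both entries under metrics -> /d0 -> peer.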
Kotresh HR [Sat, 28 Mar 2026 11:23:33 +0000 (16:53 +0530)]
tools/cephfs_mirror: Add eta metrics
Add an estimated time of completion for the
snapshot currently being synced. The calculation
uses the average read/write throughput since the
start of the snapshot sync, not the current
read/write throughput, so the ETA reflects the
average rate rather than the instantaneous one.
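The calculation can be sketched in Python as follows. This is an
illustration, not the cephfs_mirror code: the function name and the
choice of bytes as the unit of progress are assumptions.

```python
def estimate_eta_seconds(sync_bytes, total_bytes, elapsed_seconds):
    """ETA from the average throughput since the snapshot sync started."""
    if sync_bytes <= 0 or elapsed_seconds <= 0:
        return None  # no progress yet, so no average to extrapolate from
    avg_throughput = sync_bytes / elapsed_seconds  # bytes per second
    remaining = max(total_bytes - sync_bytes, 0)
    return remaining / avg_throughput
```

Because the average covers the whole sync so far, a recent slowdown or
speedup changes the estimate only gradually.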
Kotresh HR [Sat, 28 Mar 2026 10:57:02 +0000 (16:27 +0530)]
tools/cephfs_mirror: Add read/write throughput
The read throughput added measures the bytes
read per second from the source ceph filesystem.
Similarly, the write throughput added measures
the bytes written per second to the remote ceph
filesystem. Both are derived from the time spent
in preadv and pwritev calls.
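A rough Python sketch of throughput derived from time spent inside the
I/O calls rather than from wall-clock time. The IoThroughput class and
its method names are hypothetical and only illustrate the accounting.

```python
import time


class IoThroughput:
    """Accumulate bytes moved and the time spent inside the I/O calls."""

    def __init__(self):
        self.total_bytes = 0
        self.busy_seconds = 0.0

    def record(self, nbytes, seconds):
        self.total_bytes += nbytes
        self.busy_seconds += seconds

    def timed(self, io_call, *args):
        """Run io_call(*args), which must return a byte count, and time it."""
        start = time.monotonic()
        nbytes = io_call(*args)
        self.record(nbytes, time.monotonic() - start)
        return nbytes

    def bytes_per_second(self):
        if self.busy_seconds <= 0:
            return 0.0
        return self.total_bytes / self.busy_seconds
```

One instance would track reads and another writes; time spent outside
the calls (crawling, queuing) does not lower the reported rate.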
sync-mode:
---------
The 'sync-mode: full/delta' field is added to peer
status. 'delta' means that blockdiff along with
snapdiff is being used to sync the files, whereas
'full' means the full directory is crawled and each
file is synced entirely.
crawl:
-----
The state can be in-progress/completed. This
identifies whether the crawler thread is done
queuing the files for data sync threads.
The time taken by the crawl is also shown.
If the crawl is in-progress, the duration
shows the time elapsed since the start of
the crawl. If the crawl state is completed,
the duration indicates the total time taken
for the crawl.
The crawl duration is shown in "d h m s" format.
The existing 'sync_duration' in last_synced_snap
is also formatted this way.
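A possible "d h m s" formatter, sketched in Python. The exact rendering
(for instance whether zero-valued fields are omitted) is not specified
here, so this version, which always prints all four fields, is an
assumption.

```python
def format_duration(total_seconds):
    """Render a duration in seconds as 'Nd Nh Nm Ns'."""
    seconds = int(total_seconds)
    days, seconds = divmod(seconds, 86400)
    hours, seconds = divmod(seconds, 3600)
    minutes, seconds = divmod(seconds, 60)
    return f"{days}d {hours}h {minutes}m {seconds}s"
```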
The values are as below. When crawl state is
completed, the 'total_files' metric doesn't
grow anymore.
crawl_duration:
--------------
The crawl_duration of last snapshot is saved in last_synced_snap
section as well.
Kotresh HR [Mon, 16 Feb 2026 10:59:31 +0000 (16:29 +0530)]
tools/cephfs_mirror: Add inprogress bytes and files metric
Add the following mirroring progress metrics to current_syncing_snap:
bytes:
sync_bytes - bytes synced till now
total_bytes - total bytes to be synced
sync_percent - Percentage of bytes synced till now
files:
total_files - Total files to be synced
sync_files - files synced till now
sync_percent - Percentage of files synced till now
sync_files and sync_bytes are also stored in last_synced_snap section
after the snapshot is synced.
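The percentages follow directly from the counters. A minimal Python
sketch, assuming two-decimal rounding and a value of 0.0 while the totals
are still unknown (both are assumptions of this example):

```python
def sync_percent(synced, total):
    """Percentage synced so far; 0.0 when the total is not known yet."""
    if total <= 0:
        return 0.0
    return round(100.0 * synced / total, 2)


def progress(sync_bytes, total_bytes, sync_files, total_files):
    """Build the bytes/files sections using the field names listed above."""
    return {
        "bytes": {
            "sync_bytes": sync_bytes,
            "total_bytes": total_bytes,
            "sync_percent": sync_percent(sync_bytes, total_bytes),
        },
        "files": {
            "total_files": total_files,
            "sync_files": sync_files,
            "sync_percent": sync_percent(sync_files, total_files),
        },
    }
```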
Ashwin M. Joshi [Tue, 10 Feb 2026 06:29:49 +0000 (11:59 +0530)]
mgr/cephadm: Control cephadm.log messages based on a new mgr logging level flag
Introduces a new 'cephadm_binary_logging_level' config option to control
the verbosity of cephadm logging to persistent destinations (cephadm.log, syslog).
- Adds --logging-level CLI flag (info, debug, error, warning)
- Adds mgr/cephadm/cephadm_binary_logging_level config option
- Applies logging level to file and syslog handlers
- Console handlers maintain their defaults for terminal UX
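The handler split can be sketched with the standard Python logging
module. This is an illustration under assumptions, not the cephadm code:
apply_persistent_level and the LEVELS table are hypothetical names.

```python
import logging
import logging.handlers

# Level names taken from the --logging-level choices listed above.
LEVELS = {
    "debug": logging.DEBUG,
    "info": logging.INFO,
    "warning": logging.WARNING,
    "error": logging.ERROR,
}


def apply_persistent_level(logger, level_name):
    """Set the level on file and syslog handlers; leave console handlers."""
    level = LEVELS[level_name.lower()]
    persistent = (logging.FileHandler, logging.handlers.SysLogHandler)
    for handler in logger.handlers:
        # A console StreamHandler is not a FileHandler, so it is skipped.
        if isinstance(handler, persistent):
            handler.setLevel(level)
```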
Fixes: https://tracker.ceph.com/issues/74872
Signed-off-by: Ashwin M. Joshi <ashjosh1@in.ibm.com>
doc/rados/configuration: Remove wpq recommendation warning for EC clusters
Remove the warning that recommends using wpq scheduler as a fallback for EC
clusters. This issue is addressed by considering EC recovery reads as
background, assigning an accurate cost for those reads and tuning the QoS
parameters associated with best-effort class of operations.
mclock_common: adjust mClock profile parameters to prevent backfill starvation
Adjust the 'background_best_effort' queue parameters across the
three standard mClock profiles (high_client_ops, balanced, and
high_recovery_ops) to ensure best effort ops are not starved.
Previously, the 'background_best_effort' queue carried a default allocation
of 0% (MIN) reservation and a weight of 1 under these profiles. When
concurrent client traffic is dense, the zero reservation can, for example,
completely starve backfill sub-ops (MSG_OSD_EC_READ) on pools with
'allow_ec_optimizations' set to false. This starvation forces the Primary OSD
to hold internal BlueStore transactions and PG object locks for extended
windows, severely inflating client median (50th percentile) latency.
To prevent background starvation and resolve the effects of primary lock
retention, the profile configurations are tuned as listed below. These
changes force low-cost sub-ops to clear out of peer queues rapidly and drop
primary locks, which helps improve client completion latency and tail
latency (95th, 99th and 99.5th percentiles):
1. high_client_ops profile:
- Grant 'background_best_effort' a safe 5% minimum reservation.
- Scale the queue weight to 4.
2. balanced profile:
- Grant 'background_best_effort' a 5% minimum reservation.
- Set the queue weight to 2.
3. high_recovery_ops profile:
- Grant 'background_best_effort' a 5% minimum reservation.
- Set the queue weight to 2.
4. Modify the mClock config reference documentation to reflect the tuning
changes to the best-effort QoS parameters across the profiles.
Note on Proportional Scaling Compatibility:
Configuring these changes shifts total reservations to 105% (e.g., 50%
client + 50% recovery + 5% best-effort under the Balanced profile). Under
heavy concurrent saturation, mClock's internal controls resolve this
gracefully via proportional down-scaling, preserving the underlying
device bandwidth limits for different classes of clients. For example, instead
of the client being allocated 50% bandwidth, a slightly lower reservation is
allocated while shifting the remaining bandwidth to the best-effort queue.
This minor scaling shift is virtually unnoticeable to the client application,
but it prevents the internal queue deadlocks.
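The arithmetic behind the note can be illustrated in Python. This only
shows what proportional down-scaling of an over-committed set of
reservations looks like; it is not mClock's scheduling algorithm, and
scale_reservations is a hypothetical helper.

```python
def scale_reservations(reservations):
    """Scale percentage reservations down proportionally if they exceed 100."""
    total = sum(reservations.values())
    if total <= 100.0:
        return dict(reservations)
    factor = 100.0 / total
    return {name: share * factor for name, share in reservations.items()}


# Balanced profile figures from the note above: 50 + 50 + 5 = 105.
balanced = {
    "client": 50.0,
    "background_recovery": 50.0,
    "background_best_effort": 5.0,
}
scaled = scale_reservations(balanced)
```

Under this simple model each share is multiplied by 100/105, so the
client share drops from 50% to roughly 47.6% and best-effort ends up at
roughly 4.8%.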
mclock_common, mClockScheduler: Add perf counters for scheduler ops
Add perf counters to show the status pertaining to the number of ops,
dynamic queue lengths, queue latency and bytes read for the following
ops handled in the high queues and in the scheduler queues:
- peering
- client
- ec reads/writes
- ec recovery reads
Additional counters can be added in the future as required.
src/messages, osd: Calculate and set cost for subOpReads for mClock scheduler
Previously, sub-op reads returned a hardcoded cost of 0, bypassing
mClock's background bandwidth and tag calculation mechanisms. This
allowed backfill operations to proceed un-metered, occasionally causing
backend resource contention and driving up client tail latencies.
Cost is calculated based on whether the complete chunk/shard or a subchunk
needs to be read. The possible cases are:
1. Read the complete chunk aligned length:
- Cost is set to the length of the chunk aligned extent size.
2. Fragmented reads:
- Consider the subchunk length and count to calculate the cost.
- compute_cost evaluates the exact layout of fragmented shard bytes on
disk by summing up the active subchunk allocations exactly once
(`fragmented_shard_bytes += k.second * subchunk_size`).
- Linear Extent Scaling: Scale the baseline footprint cleanly by
multiplying it against the true count of read extents (`tl.size()`),
achieving a highly efficient O(N) time complexity.
This linear cost model is compatible with pools running with
'allow_ec_optimizations' set to true. Under the FastEC optimized
pipeline, most operations are unified and bypass fragment slicing,
meaning requests will primarily match the Case 1 chunk-aligned path.
In Case 2 where applicable, the O(N) loop ensures that cost will
scale proportionally according to the layout.
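The two cases can be sketched in Python as below. This is a simplified
model, not the ECSubRead code: the function signature, the test used to
tell the cases apart (all subchunks of a chunk requested) and the summing
of extent lengths in Case 1 are assumptions of this example.

```python
def sub_read_cost(extents, subchunks, subchunk_size, subchunks_per_chunk):
    """Cost of one sub-op read.

    extents:   list of (offset, length) read extents (tl in the text above)
    subchunks: list of (first_index, count) subchunk runs to read
    """
    wanted = sum(count for _, count in subchunks)
    if wanted >= subchunks_per_chunk:
        # Case 1: complete chunks are read, so charge the extent lengths.
        return sum(length for _, length in extents)
    # Case 2: sum the requested subchunk bytes once ...
    fragmented_shard_bytes = sum(count * subchunk_size for _, count in subchunks)
    # ... then scale by the number of read extents.
    return fragmented_shard_bytes * len(extents)
```

For example, reading 2 of 4 subchunks of 1024 bytes across 3 extents
costs 2 * 1024 * 3 = 6144 in this model.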
It is important to note that the amount of data to read was set to an upper
bound defined by osd_recovery_max_chunk (8 MiB) and was rounded up to the
stripe width. The reason for setting a higher than actual upper bound is that
there may be cases where the object doesn't have the xattrs yet to determine
its size. Therefore, the amount to read was ultimately set to ~(8 MiB / k)
where k is the number of data shards. This can cause mClock to prolong
the recovery times as items stay longer in the queue. To address this, the
amount to read is set to the remaining length of the object to recover
if the object size is known. Otherwise, the amount to read is set to the
recovery chunk size as before. Therefore, in some cases, only the first
recovery read could be costly if the object context is not known.
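The new sizing rule, sketched in Python. The function and parameter names
are hypothetical; rounding up to the stripe width is left out, and
whether the remaining length is additionally capped is not stated above,
so this sketch does not cap it.

```python
def recovery_read_length(recovered_so_far, object_size, recovery_chunk_size):
    """Amount to read for the next recovery read of one object."""
    if object_size is None:
        # Size unknown (e.g. xattrs not available yet): use the upper bound.
        return recovery_chunk_size
    # Size known: read only what is left of the object.
    return max(object_size - recovered_so_far, 0)
```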
The MOSDECSubOpRead class introduces the following:
- cost member. This necessitates an increment to the HEAD_VERSION and
appropriate handling within the encode and decode methods.
- compute_cost() that is called when creating the message by
ECCommonL::ReadPipeline::do_read_op(). This calls into ECSubRead::cost()
that performs the actual calculations to set the cost based on the cases
mentioned above.
- The same sequence applies to the EC optimized path in
ECCommon::ReadPipeline::do_read_op().
osd/scheduler: Classify EC subOp reads according to op priority for mClock
The change brings MSG_OSD_EC_READ into the fold of mClock scheduler. This
improves the scheduling of client and other classes of operation as they
are no longer unnecessarily preempted by the 'immediate' queue.
EC SubOps are now handled as follows:
- EC SubOp reads generated during recovery will go into either the
'background_recovery' or the 'background_best_effort' class based on
the recovery priority set for the op. EC SubOp reads generated by
client operations will continue to be classified as 'immediate'.
- EC SubOp writes generated as a result of client operations will
continue to be classified as 'immediate'.
- EC SubOp replies are considered high priority and therefore
continue to be classed as 'immediate'.
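The classification rules above can be summarized in a short Python
sketch. It is not the OSD scheduler code: the function, the string op
kinds, the priority cutoff value and the direction of the comparison
(higher priority maps to 'background_recovery') are all assumptions.

```python
from enum import Enum


class SchedClass(Enum):
    IMMEDIATE = "immediate"
    BACKGROUND_RECOVERY = "background_recovery"
    BACKGROUND_BEST_EFFORT = "background_best_effort"


# Hypothetical cutoff; the real threshold is not given in the text above.
HYPOTHETICAL_PRIORITY_CUTOFF = 10


def classify_ec_subop(kind, for_recovery=False, priority=0):
    """Map an EC sub-op ('read', 'write' or 'reply') to a scheduler class."""
    if kind not in ("read", "write", "reply"):
        raise ValueError(f"unknown EC sub-op kind: {kind}")
    if kind == "read" and for_recovery:
        if priority >= HYPOTHETICAL_PRIORITY_CUTOFF:
            return SchedClass.BACKGROUND_RECOVERY
        return SchedClass.BACKGROUND_BEST_EFFORT
    # Client-driven reads, writes and all replies stay 'immediate'.
    return SchedClass.IMMEDIATE
```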