Igor Golikov [Sun, 10 Aug 2025 17:48:55 +0000 (17:48 +0000)]
client: aggregate metrics on the fly
Instead of keeping vector<SimpleIOMetric>, which must be capped in terms
of memory footprint, aggregate metrics per each subvolume on the fly,
keeping memory footprint minimal.
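
A minimal sketch of the idea, not the client's actual code: instead of appending every sample to a per-subvolume list, each sample is folded into a few running counters, so memory use stays constant per subvolume. The class, field, and method names below (`RunningIOAggregate`, `record`, `avg_latency_us`) and the integer subvolume keys are hypothetical, chosen for illustration only.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class RunningIOAggregate:
    """Hypothetical per-subvolume aggregate; not the actual Ceph struct."""
    ops: int = 0
    total_bytes: int = 0
    total_latency_us: int = 0
    max_latency_us: int = 0

    def record(self, latency_us: int, nbytes: int) -> None:
        # Fold one I/O sample into the counters; nothing is stored per sample.
        self.ops += 1
        self.total_bytes += nbytes
        self.total_latency_us += latency_us
        self.max_latency_us = max(self.max_latency_us, latency_us)

    def avg_latency_us(self) -> float:
        return self.total_latency_us / self.ops if self.ops else 0.0


# One aggregate per subvolume, keyed by a hypothetical subvolume id.
aggregates = defaultdict(RunningIOAggregate)
for subvol, latency_us, nbytes in [(1, 100, 4096), (1, 300, 8192), (2, 50, 512)]:
    aggregates[subvol].record(latency_us, nbytes)
```

The trade-off is that individual samples can no longer be inspected afterwards; only the aggregated values survive.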
Signed-off-by: Igor Golikov <igolikov@ibm.com>
Fixes: https://tracker.ceph.com/issues/68929
Igor Golikov [Thu, 10 Jul 2025 10:18:57 +0000 (10:18 +0000)]
mds: aggregate and expose subvolume metrics
rank0 periodically receives subvolume metrics from the other MDS instances
and aggregates them using a sliding window.
The MetricsAggregator exposes PerfCounters and PerfQueries for these
metrics.
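
One way a sliding-window aggregation could look, as a rough sketch only: samples carry a timestamp, and anything older than the window is dropped before the aggregate is read. `SlidingWindowSum`, its methods, and the choice of a plain sum are assumptions for illustration; the MetricsAggregator's real data structures are not described in this commit message.

```python
from collections import deque


class SlidingWindowSum:
    """Hypothetical helper: sums values reported within the last `window_s` seconds."""

    def __init__(self, window_s: float) -> None:
        self.window_s = window_s
        self.samples = deque()  # (timestamp, value) pairs, oldest first
        self.total = 0

    def add(self, now: float, value: int) -> None:
        self.samples.append((now, value))
        self.total += value
        self._evict(now)

    def value(self, now: float) -> int:
        self._evict(now)
        return self.total

    def _evict(self, now: float) -> None:
        # Drop samples that have fallen out of the window.
        while self.samples and self.samples[0][0] <= now - self.window_s:
            _, old = self.samples.popleft()
            self.total -= old
```

Keeping a running total alongside the deque means reading the aggregate costs only the evictions, not a full pass over the window.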
Fixes: https://tracker.ceph.com/issues/68931
Signed-off-by: Igor Golikov <igolikov@ibm.com>
Igor Golikov [Thu, 10 Jul 2025 10:17:36 +0000 (10:17 +0000)]
client,mds: add support for subvolume level metrics
Add support for client-side metrics collection using the SimpleIOMetric
struct and aggregation using the AggregatedIOMetrics struct.
The client holds a SimpleIOMetrics vector for each subvolume it recognizes
(via caps/metadata messages), aggregates them into the
AggregatedIOMetric struct, and periodically sends the result to the MDS, along
with regular client metrics.
The MDS holds a map of subvolume_path -> vector<AggregatedIOMetrics> and
periodically sends it to rank0 for further aggregation and exposure.
Fixes: https://tracker.ceph.com/issues/68929, https://tracker.ceph.com/issues/68930
Signed-off-by: Igor Golikov <igolikov@ibm.com>
Nizamudeen A [Mon, 28 Jul 2025 08:22:36 +0000 (13:52 +0530)]
mgr/dashboard: fix table dom re-rendering
Each table refresh either creates new data or updates the existing data. This
causes the existing data to be completely replaced with newer data,
thereby losing the trackBy functionality. So I am modifying the data
in place so that the memory reference doesn't change.
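
The dashboard itself is Angular/TypeScript; the Python sketch below only illustrates the general idea of refreshing a collection without changing object identity. `update_rows_in_place` and the `id` key are hypothetical names, not taken from the dashboard code.

```python
def update_rows_in_place(rows: list, incoming: list, key: str = "id") -> None:
    """Sync `rows` with `incoming` without replacing `rows` or reused row objects."""
    existing = {row[key]: row for row in rows}
    merged = []
    for new_row in incoming:
        row = existing.get(new_row[key])
        if row is None:
            row = dict(new_row)   # genuinely new row
        else:
            row.clear()           # reuse the same object, refresh its fields
            row.update(new_row)
        merged.append(row)
    rows[:] = merged              # mutate the list itself; its identity is kept
```

Because both the list and the surviving row objects keep their identity, anything that tracks rows by reference sees an update rather than a wholesale replacement.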
Fixes: https://tracker.ceph.com/issues/72491
Signed-off-by: Nizamudeen A <nia@redhat.com>
crimson/tools: Added PG log and rgw_index workload
This commit adds two workloads to crimson-store-bench:
(a) PG_log workload with sequential omap write and delete
(b) RGW_index workload with randomised omap write and delete
Output is the number of operations, the total latency in seconds and the
duration of the workload in seconds per reactor.
Avoids severe slowdowns with detect_stack_use_after_return=1.
The root cause is unclear, but ASan's fake stack GC behavior is
suspected. Tuning the UAR (Use-After-Return) fake stack size
(reduced from 64KB–1MB to 64KB) helped delay the onset of the
performance degradation.
Fixes: https://tracker.ceph.com/issues/71704
Signed-off-by: Chanyoung Park <chaney.p@kakaoenterprise.com>
Zac Dover [Thu, 7 Aug 2025 05:03:22 +0000 (15:03 +1000)]
doc/cephfs: edit troubleshooting.rst
Follow up on comments made by Anthony D'Atri in
https://github.com/ceph/ceph/pull/64832 and make other small changes to
increase the ease of reading this text.
Adam C. Emerson [Fri, 18 Apr 2025 07:27:36 +0000 (03:27 -0400)]
rgw: Add run_coro utility
A convenience function for turning coroutines that return values and
use exceptions, `error_code`, or similar into `int`-returning
functions that take references to out parameters.
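
The utility itself is C++; as a loose analogy only, the same adapter pattern can be sketched in Python with `asyncio`: run a coroutine to completion, store its result through an out parameter, and map failures to a negative errno. The `run_coro` signature below, the `read_size` coroutine, and the fallback to `EIO` are all assumptions made for this illustration and do not mirror the RGW function.

```python
import asyncio
import errno


async def read_size(exists: bool) -> int:
    """Hypothetical coroutine: returns a value or raises on failure."""
    if not exists:
        raise FileNotFoundError(errno.ENOENT, "no such object")
    return 4096


def run_coro(make_coro, out: dict, name: str) -> int:
    """Run a coroutine to completion; return 0 or a negative errno."""
    try:
        out[name] = asyncio.run(make_coro())
        return 0
    except OSError as e:
        # OSError carries an errno; fall back to EIO if it is unset.
        return -(e.errno or errno.EIO)
    except Exception:
        return -errno.EIO
```

A caller would use it as `rc = run_coro(lambda: read_size(True), out, "size")` and check `rc` before reading `out["size"]`.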
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
Adam C. Emerson [Wed, 6 Aug 2025 20:02:32 +0000 (16:02 -0400)]
common/async: Update `use_blocked` for newer asio
Reimplement with `initiate` rather than the old style. This
necessitates getting rid of the old `async::Completion` in anything
that was calling it, and other changes.
Also, use disposition for error handling.
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
Ronen Friedman [Wed, 6 Aug 2025 05:38:07 +0000 (00:38 -0500)]
osd/scrub: do not limit operator-initiated repairs
'auto-repair' scrubs are limited to a maximum of
'scrub_auto_repair_num_errors' damaged objects.
However, operator-initiated repairs should not be limited
by that number. Alas, a bug in a previous commit
(97de817ad1c253ee1c7c9c9302981ad2435301b9) modified the
code in such a way that it applied the
'scrub_auto_repair_num_errors' limit to all repairs,
including operator-initiated ones. This commit fixes that.
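
The intended behaviour can be summarised in a small sketch. The function name, its parameters, and the exact comparison (`<=`) are assumptions for illustration; only the rule that the limit applies to auto-repair scrubs and not to operator-initiated repairs comes from the description above.

```python
def repair_allowed(num_errors: int, operator_initiated: bool,
                   auto_repair_limit: int) -> bool:
    """Hypothetical check: decide whether a scrub may proceed to repair."""
    if operator_initiated:
        # Operator-initiated repairs are never capped by the auto-repair limit.
        return True
    # Auto-repair is only allowed up to the configured number of damaged objects.
    return num_errors <= auto_repair_limit
```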
Ville Ojamo [Wed, 6 Aug 2025 05:05:49 +0000 (12:05 +0700)]
doc/install: Linkify mention of ceph.conf and use ref for links
Linkify first mention of config file to ceph.conf docs in:
- install-storage-cluster.rst
- manual-deployment.rst
- manual-freebsd-deployment.rst
Use ref instead of an external link in:
- clone-source.rst
- get-packages.rst
- index_manual.rst
- install-storage-cluster.rst
- manual-deployment.rst
- manual-freebsd-deployment.rst
Only where a label already exists at the destination.
Delete the old link definition if one was used previously.
That should be about all external links in install/ that can use
existing labels for ref.
Change an instance of "the a" to just "a", consistent with other
similar mentions in manual-freebsd-deployment.rst.
Signed-off-by: Ville Ojamo <14869000+bluikko@users.noreply.github.com>
Adam C. Emerson [Mon, 30 Jun 2025 20:54:46 +0000 (16:54 -0400)]
rgw/datalog: Manage and shutdown tasks properly
This is slightly ugly but good enough for now. Make sure we can block
when shutting down background tasks.
Remove a few `driver` parameters that are unused. This lets us
simplify the IAM Policy and Lua tests and not construct stores we
never use. (Which is good since we aren't running them under a cluster.)
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
Adam C. Emerson [Fri, 30 May 2025 20:54:45 +0000 (16:54 -0400)]
neorados: Hold reference to implementation across operations
Asynchrony combined with cancellation keeps leading to occasional
lifetime issues, so follow the best practices of Asio I/O objects by
having completions keep a reference live.
The original NeoRados backing implements Asio's two-phase shutdown
properly.
The RadosClient backing does not, because it shares an Objecter with
completions that do not belong to it. In practice I don't think this
will matter since librados and neorados get shut down around the same
time.
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
* kafka: pass full broker list to consumer in tests
* kafka: use ip instead of localhost
* kafka: make sure topic exists before consumer start
* kafka: fix zookeeper and broker conf in tests
* kafka: verify receiver in the test
* kafka: tests were not running (Fixes: https://tracker.ceph.com/issues/72240)
* kafka: failover tests were failing (Fixes: https://tracker.ceph.com/issues/71585)
* simplify basic tests run command
* v2 migration tests were not running
* fix failing migration tests
Bill Scales [Fri, 1 Aug 2025 15:17:58 +0000 (16:17 +0100)]
doc: erasure coding enhancements for tentacle
* Document new pool flag allow_ec_optimizations
* Reference new conf setting osd_pool_default_flag_ec_optimizations
* Add section describing Erasure Code Optimizations
Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
Ville Ojamo [Tue, 5 Aug 2025 15:34:26 +0000 (22:34 +0700)]
doc/rados: Remove obsolete fs-recomm links
2 files linked to filesystem-recommendations.rst which was removed
around the year 2017.
I understand this was relevant only for Filestore. So simply remove the
references to this file & the link definition if one was used.
Ville Ojamo [Tue, 5 Aug 2025 14:45:05 +0000 (21:45 +0700)]
doc/rados: Use ref instead of relative external links
Instead of external links use :ref: where dst labels exist already in:
operations/erasure-code.rst
operations/pools.rst
troubleshooting/troubleshooting-osd.rst
Use link text generation where it is reasonably close to previous manual
link text.
Delete some unused link definitions.
Signed-off-by: Ville Ojamo <14869000+bluikko@users.noreply.github.com>
Zac Dover [Tue, 5 Aug 2025 11:24:41 +0000 (21:24 +1000)]
doc/cephfs: edit troubleshooting.rst
Edit "Stuck in up:replay" under the "Stuck During Recovery" section of
doc/cephfs/troubleshooting.rst. I had planned to edit the entire "Stuck
During Recovery" section in a single commit, but I think that the
material is too involved for that.
Xuehan Xu [Fri, 21 Mar 2025 02:58:27 +0000 (10:58 +0800)]
crimson/os/seastore/object_data_handler: LBACursor based overwrite
This should avoid unnecessary lba tree searches in the old
implementation of ObjectDataHandler::overwrite()
Overwrites of ObjectDataBlocks are dealt with by first punching holes
in the lba tree and then inserting new extents in the holes.
Specifically, overwrites are classified into two categories:
1. the range of the overwrite falls in a single lba mapping;
2. the range of the overwrite crosses multiple lba mappings.
For the first category, ObjectDataHandler processes the overwrites in
the following way:
1. if the mapping is a pending one (corresponds to a pending extent),
merge the overwrite with the data of the pending extent;
2. otherwise, if possible, apply the overwrite as a delta-based
overwrite;
3. otherwise, punch a hole in the mapping and insert a new extent with the
data of the overwrite.
For the second category, the overwrite is processed as follows:
1. if the left boundary of the overwrite is inside an existing mapping,
deal with the mapping in a way similar to the single-mapping
overwrites;
2. remove all lba mappings that are strictly within the range of the
overwrite;
3. deal with the right boundary of the overwrite in the same way as the
left boundary.
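
The classification above can be sketched as follows. This is an illustration only, not the ObjectDataHandler code: mappings are modelled as sorted, non-overlapping `(offset, length)` tuples, and `plan_overwrite` and the keys of its result are hypothetical names.

```python
def plan_overwrite(mappings, start, length):
    """Classify an overwrite of [start, start + length) against lba mappings.

    `mappings` is a sorted list of non-overlapping (offset, length) tuples.
    """
    end = start + length
    # Mappings that intersect the overwrite range.
    touched = [(o, l) for o, l in mappings if o < end and o + l > start]
    if len(touched) <= 1:
        # Category 1: the range falls in a single mapping (or in none).
        return {"category": "single", "mapping": touched[0] if touched else None}
    # Category 2: the range crosses multiple mappings.
    first, last = touched[0], touched[-1]
    return {
        "category": "multi",
        # Left boundary lies inside an existing mapping.
        "left": first if first[0] < start else None,
        # Mappings strictly within the range are removed outright.
        "remove": [(o, l) for o, l in touched if o >= start and o + l <= end],
        # Right boundary lies inside an existing mapping.
        "right": last if last[0] + last[1] > end else None,
    }
```

For example, with four 4096-byte mappings starting at offset 0, an overwrite of 8192 bytes at offset 2048 yields a left boundary mapping at 0, one fully covered mapping at 4096 to remove, and a right boundary mapping at 8192.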
Xuehan Xu [Wed, 19 Mar 2025 09:06:12 +0000 (17:06 +0800)]
crimson/os/seastore/lba_mapping: make LBAMapping duplicate lightweight and safe
This commit removes LBAMapping::child_pos and forces TransactionManager
methods to link children directly through child_pos_t, so that
LBAMapping::duplicate() can be a shallow copy.