ftbfs:
```
/home/jenkins-build/build/workspace/ceph-pull-requests/src/neorados/cls/fifo/detail/fifo.h:630:14: error: no member named 'parse' in namespace 'ceph'; did you mean 'pause'?
630 | auto n = ceph::parse<decltype(m.num)>(num);
| ^~~~~~~~~~~
```
Nizamudeen A [Fri, 8 Aug 2025 06:42:20 +0000 (12:12 +0530)]
mgr/dashboard: fix memory leak in prometheus service
Prometheus API calls in the Cluster Utilization call is subscribed in
the for loop multiple times but this is not properly unsubscribed. As we
stay in the dashboard page for longer time, it produces a significant
memory leak which eventually lags the UI. Attempting to fix it by
properly handling the subscription
Fixes: https://tracker.ceph.com/issues/72511 Signed-off-by: Nizamudeen A <nia@redhat.com>
Afreen Misbah [Wed, 6 Aug 2025 07:37:16 +0000 (13:07 +0530)]
mgr/dashboard: Stop rules api being polled on every page
- /rules ar epolled every 5 seconds on every page
- it is only required for alerts page where full rules list is shown in `Alerts` tab
- also added observable for getting rules instead of plain array
Nizamudeen A [Mon, 28 Jul 2025 08:22:36 +0000 (13:52 +0530)]
mgr/dashboard: fix table dom re-rendering
each table refresh creates a new data or update the existing data. this
causes the existing data to be completely replaced with a newer one and
thereby loosing the trackBy functionality. So I am modifying the data
in-place so that the memory reference doesn't get changed
Fixes: https://tracker.ceph.com/issues/72491 Signed-off-by: Nizamudeen A <nia@redhat.com>
crimson/tools: Added PG log and rgw_index workload
This commit includes 2 workloads to crimson-store-bench
(a)PG_log workload with sequential omap write and delete
(b)RGW_index workload with randomised omap write and delete
Output is the number of operations, the total latency in seconds and the
duration of the workload in seconds per reactor.
Avoids severe slowdowns with detect_stack_use_after_return=1.
The root cause is unclear, but ASan's fake stack GC behavior is
suspected. Tuning the UAR (Use-After-Return) fake stack size
(reduced from 64KB–1MB to 64KB) helped delay the onset of the
performance degradation.
Fixes: https://tracker.ceph.com/issues/71704 Signed-off-by: Chanyoung Park <chaney.p@kakaoenterprise.com>
Zac Dover [Thu, 7 Aug 2025 05:03:22 +0000 (15:03 +1000)]
doc/cephfs: edit troubleshooting.rst
Follow up on comments made by Anthony D'Atri in
https://github.com/ceph/ceph/pull/64832 and make other small changes to
increase the ease of reading this text.
Adam C. Emerson [Fri, 18 Apr 2025 07:27:36 +0000 (03:27 -0400)]
rgw: Add run_coro utility
A convenience function for turning coroutines that return values and
use exceptions, `error_code`, or similar into `int`-returning
functions that take references to out parameters.
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
Adam C. Emerson [Wed, 6 Aug 2025 20:02:32 +0000 (16:02 -0400)]
common/async: Update `use_blocked` for newer asio
Reimplement with `initiate` rather than the old style. This
necessitates getting rid of the old `async::Completion` in anything
that was calling it, and other changes.
Also, use disposition for error handling.
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
Ronen Friedman [Wed, 6 Aug 2025 05:38:07 +0000 (00:38 -0500)]
osd/scrub: do not limit operator-initiated repairs
'auto-repair' scrubs are limited to a maximum of
'scrub_auto_repair_num_errors' damaged objects.
However, operator-initiated repairs should not be limited
by that number. Alas, a bug in a previous commit
(97de817ad1c253ee1c7c9c9302981ad2435301b9) modified the
code in such a way that it applied the
'scrub_auto_repair_num_errors' limit to all repairs,
including operator-initiated ones. This commit fixes that.
Ville Ojamo [Wed, 6 Aug 2025 05:05:49 +0000 (12:05 +0700)]
doc/install: Linkify mention of ceph.conf and use ref for links
Linkify first mention of config file to ceph.conf docs in:
- install-storage-cluster.rst
- manual-deployment.rst
- manual-freebsd-deployment.rst
Use ref instead of an external link in:
- clone-source.rst
- get-packages.rst
- index_manual.rst
- install-storage-cluster.rst
- manual-deploymen.rst
- manual-freebsd-deployment.rst
Only where a label already exists at the destination.
Delete the old link definition if one was used previously.
That should be about all external links in install/ that can use
existing labels for ref.
Fix an instance of "the a" into just "a" that is consistent with other
similar mentions in manual-freebsd-deployment.rst.
Signed-off-by: Ville Ojamo <14869000+bluikko@users.noreply.github.com>
Adam C. Emerson [Mon, 30 Jun 2025 20:54:46 +0000 (16:54 -0400)]
rgw/datalog: Manage and shutdown tasks properly
This is slightly ugly but good enough for now. Make sure we can block
when shutting down background tasks.
Remove a few `driver` parameters that are unused. This lets us
simplify the IAM Policy and Lua tests and not construct stores we
never use. (Which is good since we aren't running them under a cluster.)
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
Adam C. Emerson [Fri, 30 May 2025 20:54:45 +0000 (16:54 -0400)]
neorados: Hold reference to implementation across operations
Asynchrony combined with cancellations keeps leading to occasional
lifetime issues, so follow the best-practices of Asio I/O objects by
having completions keep a reference live.
The original NeoRados backing implements Asio's two-phase shutdown
properly.
The RadosClient backing does not, because it shares an Objecter with
completions that do not belong to it. In practice I don't think this
will matter since librados and neorados get shut down around the same
time.
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
* kafka: pass full broker list to consumer in tests
* kafka: use ip instead of localhost
* kafka: make sure topic exists before consumer start
* kafka: fix zookeeper and broker conf in tests
* kafka: verify receiver in the test
* kafka: tests were not running (Fixes: https://tracker.ceph.com/issues/72240)
* kafka: failover tests were failing (Fixes: https://tracker.ceph.com/issues/71585)
* simplify basic tests run command
* v2 migration tests were not running
* fix failing migration tests
Bill Scales [Fri, 1 Aug 2025 15:17:58 +0000 (16:17 +0100)]
doc: erasure coding enhancements for tentacle
* Document new pool flag allow_ec_optimizations
* Reference new conf setting osd_pool_default_flag_ec_optimizations
* Add section describing Erasure Code Optimizations
Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
Ville Ojamo [Tue, 5 Aug 2025 15:34:26 +0000 (22:34 +0700)]
doc/rados: Remove obsolete fs-recomm links
2 files linked to filesystem-recommendations.rst which was removed
around the year 2017.
I understand this was relevant only for Filestore. So simply remove the
references to this file & the link definition if one was used.
Ville Ojamo [Tue, 5 Aug 2025 14:45:05 +0000 (21:45 +0700)]
doc/rados: Use ref instead of relative external links
Instead of external links use :ref: where dst labels exist already in:
operations/erasure-code.rst
operations/pools.rst
troubleshooting/troubleshooting-osd.rst
Use link text generation where it is reasonably close to previous manual
link text.
Delete some unused link definitions.
Signed-off-by: Ville Ojamo <14869000+bluikko@users.noreply.github.com>
Zac Dover [Tue, 5 Aug 2025 11:24:41 +0000 (21:24 +1000)]
doc/cephfs: edit troubleshooting.rst
Edit "Stuck in up:replay" under the "Stuck During Recovery" section of
doc/cephfs/troubleshooting.rst. I had planned to edit the entire "Stuck
During Recovery" section in a single commit, but I think that the
material is too involved for that.