Venky Shankar [Tue, 20 May 2025 05:28:55 +0000 (10:58 +0530)]
Merge PR #62632 into main
* refs/pull/62632/head:
libcephfs: increment library minor version
test: add test to fetch perf counters via libcephfs API
libcephfs: add API to get client perf counters
client: fix total write operations perf counter name
Ville Ojamo [Sun, 18 May 2025 05:35:48 +0000 (12:35 +0700)]
doc/man: Fix inline formatting in ceph-bluestore-tool.rst
A space is missing between a token with emphasis and the following
token:
- Not consistent with other commands like "show-label" (has space).
- Inline formatting is rendered verbatim in the second occurrence, without
the formatting being applied.
- Warning from Sphinx.
Signed-off-by: Ville Ojamo <14869000+bluikko@users.noreply.github.com>
Ville Ojamo [Thu, 15 May 2025 10:32:29 +0000 (17:32 +0700)]
doc: Use existing labels and ref for hyperlinks in architecture.rst
Use validated ":ref:" hyperlinks instead of "external links" in "target
definitions" when linking within the Ceph docs:
- Update to use existing labels when linking from architecture.rst.
- Remove unused "target definitions".
Also use title case for section titles in
doc/start/hardware-recommendations.rst, because the link text is now
generated from the section title.
Other than generated link texts the rendered PR should look the same as
the old docs, only differing in the source RST.
Signed-off-by: Ville Ojamo <14869000+bluikko@users.noreply.github.com>
Ronen Friedman [Sat, 17 May 2025 07:17:42 +0000 (02:17 -0500)]
osd/scrub: minimize calls to sysconf() in scrub_load_below_threshold()
Return an 'all is OK' value if the 1min CPU load - even before being
divided by the number of CPUs - is below the configured threshold.
This is a very common case, and avoids the need to call sysconf()
to get the number of CPUs.
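The short-circuit described above can be sketched as follows. This is a minimal illustration, not the actual scrub_load_below_threshold() code: the function name, its parameters, and the injected `cpu_count` callback (standing in for the sysconf() lookup) are all assumptions made for the example.

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-in for the scrub load check; not the Ceph implementation.
// `cpu_count` is injected so that the comparatively expensive CPU-count lookup
// (sysconf() in the commit message) is only performed when actually needed.
bool load_below_threshold(double loadavg_1min,
                          double threshold,
                          const std::function<long()>& cpu_count)
{
  // Fast path: dividing a non-negative load by the CPU count (>= 1) can only
  // make it smaller, so a raw load already under the threshold is always OK.
  if (loadavg_1min < threshold) {
    return true;
  }

  const long cpus = cpu_count();
  if (cpus <= 0) {
    return false;  // unknown CPU count: conservatively report "not low"
  }
  return (loadavg_1min / static_cast<double>(cpus)) < threshold;
}
```

With a lightly loaded host the callback is never invoked, which is the common case the commit targets.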
Ronen Friedman [Sat, 17 May 2025 06:04:22 +0000 (01:04 -0500)]
osd/scrub: remove the 2nd option for determining 'low load' for scrubbing
Previously, there were two conditions under which the CPU load was
considered low enough to allow scrubbing:
- the CPU load was below the configured threshold, or
- the load was below a calculated "daily" average, and lower than the
15-min average.
That second condition was confusing and surprising, and is now removed.
As the scrubber logic no longer requires the 5m & 15m load averages,
scrub_load_below_threshold() can use the data gathered by the
periodic LoadTracker::update_load_average().
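The behavioural change can be illustrated with two small predicates. This is a sketch under assumptions: the `LoadSample` struct, its field names, and both function names are hypothetical, and the real code obtains its figures differently.

```cpp
#include <cassert>

// Hypothetical snapshot of the load figures mentioned in the commit message.
struct LoadSample {
  double current;    // most recent load figure
  double avg_15min;  // 15-minute load average
  double daily_avg;  // calculated "daily" average
};

// Old rule: either condition was enough to allow scrubbing.
bool old_low_load(const LoadSample& s, double threshold)
{
  if (s.current < threshold) {
    return true;
  }
  return s.current < s.daily_avg && s.current < s.avg_15min;
}

// New rule: only the configured threshold matters.
bool new_low_load(const LoadSample& s, double threshold)
{
  return s.current < threshold;
}
```

A sample such as `{0.8, 1.0, 0.9}` with a threshold of 0.5 passed the old rule through its second condition but is rejected by the new one.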
Kefu Chai [Thu, 15 May 2025 07:41:07 +0000 (15:41 +0800)]
tools/ceph_dedup: Add const qualifiers and reference parameters
Improve code quality in ceph_dedup tool by:
- Adding const qualifiers to member functions and parameters where appropriate
- Converting parameter passing to use references instead of value copies
for complex objects
These changes enhance code readability, better express intent through
const-correctness, and improve performance by avoiding unnecessary deep
copies.
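As an illustration of the kind of change described, and not code taken from the ceph_dedup tool, the sketch below uses a hypothetical `Chunk` record and `ChunkStats` class to show const member functions and pass-by-const-reference.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical chunk record; not a type from ceph_dedup.
struct Chunk {
  std::string fingerprint;
  std::size_t size;
};

class ChunkStats {
public:
  // A const reference avoids copying the vector and every string inside it.
  void add_all(const std::vector<Chunk>& chunks)
  {
    for (const auto& c : chunks) {
      total_bytes_ += c.size;
      ++count_;
    }
  }

  // Accessors are const: they promise not to modify the object and can be
  // called through a const reference.
  std::size_t total_bytes() const { return total_bytes_; }
  std::size_t count() const { return count_; }

  double mean_size() const
  {
    assert(count_ > 0);  // the mean is undefined for an empty set
    return static_cast<double>(total_bytes_) / static_cast<double>(count_);
  }

private:
  std::size_t total_bytes_ = 0;
  std::size_t count_ = 0;
};
```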
Ville Ojamo [Fri, 16 May 2025 08:01:14 +0000 (15:01 +0700)]
doc/rados/configuration: Fix invalid hyperlinks in mclock-config-ref.rst
Fix two intradocument links that pointed to the same nonexisting section
title.
These were rendered "as-is" with backticks and everything, while still
linkified pointing to invalid "#idN" anchors.
Modify them to point to the correct section title text.
Signed-off-by: Ville Ojamo <14869000+bluikko@users.noreply.github.com>
Ville Ojamo [Fri, 16 May 2025 08:40:41 +0000 (15:40 +0700)]
doc/radosgw: Fix indentation in cloud-transition.rst
Indent the second paragraph of a list item at the same level as the
previous paragraph. The unexpected indentation resulted in an ERROR from
Sphinx but it was still rendered with increased indentation looking
rather out of place.
Capitalize the first letter similarly to the previous paragraph.
Signed-off-by: Ville Ojamo <14869000+bluikko@users.noreply.github.com>
Ville Ojamo [Fri, 16 May 2025 08:14:09 +0000 (15:14 +0700)]
doc/rados/operations: Fix invalid hyperlink in crush-map-edits.rst
Fix attempted use of underscores for inline emphasis which resulted in
the text being emphasized to be considered a link. The text was rendered
partially as a link to an invalid anchor "#id3".
Instead use inline italic for formatting emphasis.
Signed-off-by: Ville Ojamo <14869000+bluikko@users.noreply.github.com>
Kefu Chai [Thu, 15 May 2025 06:58:22 +0000 (14:58 +0800)]
tools/ceph_dedup: add modelines for emacs and vim
Add modelines for emacs and vim so that the source code is formatted
with the expected alignment even if the editor's default settings do
not match the expected settings.
Zac Dover [Thu, 15 May 2025 13:24:58 +0000 (23:24 +1000)]
doc/mgr: edit dashboard.rst
Edit doc/mgr/crash.rst. Add prompts.
This changes eighty-nine prompts. Because this makes so many changes,
all other edits included in https://github.com/ceph/ceph/pull/63255 will
be made in a separate commit. This is done for the sake of the patience of
the reviewers (probably Anthony, if history is any guide).
This commit is part of a project to separate out the twenty-five files
that were committed to https://github.com/ceph/ceph/pull/63255.
Kefu Chai [Thu, 15 May 2025 13:03:33 +0000 (21:03 +0800)]
common: fix backtrace leak in __ceph_abort and friends
Previously, in __ceph_abort and related abort handlers, we allocated
ClibBackTrace instances using raw pointers without proper cleanup. Since
these handlers terminate execution, the leaks didn't affect production
systems but were correctly flagged by ASan during testing:
```
Direct leak of 288 byte(s) in 1 object(s) allocated from:
#0 0x55aefe8cb65d in operator new(unsigned long) (/home/jenkins-build/build/workspace/ceph-pull-requests/build/bin/unittest_ceph_assert+0x1f465d) (BuildId: a4faeddac80b0d81062bd53ede3388c0c10680bc)
#1 0x7f3b84da988d in ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...) /home/jenkins-build/build/workspace/ceph-pull-requests/src/common/assert.cc:157:21
#2 0x55aefe8cf04b in supressed_assertf_line22() /home/jenkins-build/build/workspace/ceph-pull-requests/src/test/ceph_assert.cc:22:3
#3 0x55aefe8ce4e6 in CephAssertDeathTest_ceph_assert_supresssions_Test::TestBody() /home/jenkins-build/build/workspace/ceph-pull-requests/src/test/ceph_assert.cc:31:3
#4 0x55aefe99135d in void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/jenkins-build/build/workspace/ceph-pull-requests/src/googletest/googletest/src/gtest.cc:2653:10
#5 0x55aefe94f015 in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/jenkins-build/build/workspace/ceph-pull-requests/src/googletest/googletest/src/gtest.cc:2689:14
...
```
This commit resolves the issue by using std::unique_ptr to manage the
lifecycle of backtrace objects, ensuring proper cleanup even in
non-returning functions. While these leaks had no practical impact in
production (as the process terminates anyway), fixing them improves code
quality and eliminates false positives in memory analysis tools.
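The ownership pattern can be sketched as below. Everything here is an assumption made for illustration: `FakeBackTrace` is a hypothetical stand-in for the real backtrace class, and `describe_failure()` returns a string instead of terminating the process so that the cleanup can be observed.

```cpp
#include <cassert>
#include <memory>
#include <sstream>
#include <string>

// Counts live FakeBackTrace objects so that a leak would be visible.
int g_live_backtraces = 0;

// Hypothetical stand-in for a backtrace object; not the Ceph class.
class FakeBackTrace {
public:
  FakeBackTrace() { ++g_live_backtraces; }
  ~FakeBackTrace()
  {
    --g_live_backtraces;
    assert(g_live_backtraces >= 0);  // the counter must never go negative
  }
  FakeBackTrace(const FakeBackTrace&) = delete;
  FakeBackTrace& operator=(const FakeBackTrace&) = delete;

  void print(std::ostream& out) const { out << "  frame #0 (placeholder)\n"; }
};

// Builds a failure message. The unique_ptr owns the backtrace, so it is
// released on every path that leaves this function by returning.
std::string describe_failure(const std::string& msg, bool include_backtrace)
{
  auto bt = std::make_unique<FakeBackTrace>();
  std::ostringstream oss;
  oss << "abort: " << msg << '\n';
  if (!include_backtrace) {
    return oss.str();  // early exit: bt is still destroyed
  }
  bt->print(oss);
  return oss.str();
}
```

With a raw `new` in place of `std::make_unique`, both return paths would need an explicit `delete` to keep the counter at zero.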
Aashish Sharma [Wed, 14 May 2025 08:05:10 +0000 (13:35 +0530)]
mgr/dashboard: fix flaky promql query test
There is a test collision in the "promql-query-test" test suite: two
different IOPS panels share the same title (IOPS) and legend labels
(Read, Write) but use different expressions, and the test framework is
not able to distinguish between them. The two panels are defined in:
1. ceph-application-overview.json
2. ceph-cluster-advanced.json
Ville Ojamo [Thu, 15 May 2025 09:46:21 +0000 (16:46 +0700)]
doc/radosgw: Use ref for hyperlinking to multisite
Use validated ":ref:" hyperlinks instead of "external links" in "target
definitions" when linking within the Ceph docs:
- Update to use existing label in multisite.rst.
- Remove unused "target definitions".
Also use existing label for linking from multisite.rst.
Fix a broken link within multisite.rst.
The rendered PR should look the same as the old docs, only differing in
the source RST.
Signed-off-by: Ville Ojamo <14869000+bluikko@users.noreply.github.com>
doc/rados/operations.rst: add docs for availability score
This commit adds docs for how to use the availability score
feature. It also details when we consider a pool available
and when not and how we calculate the availability score.
This is a null pointer dereference issue and it happens as follows:
Uninline Data is not a regular client request ... it is an Internal Request.
So, there's no client request struct allocated and assigned in the mdr to
begin with.
In the scrubbing path, the auth validation is already done in
ScrubStack::kick_off_scrubs() ... and since Uninline Data path piggybacks
on the scrubbing path, we get the auth validation for free.
rdlock_path_pin_ref() fails to lock the path if the lock is already taken.
This is what happens in the Uninline Data case. So rdlock_path_pin_ref()
creates a C_MDS_RetryRequest and this causes the request to be re-attempted
in the regular client request path where Server::handle_client_request()
assumes that the mdr->client_request member is valid ...
and hence the null pointer dereference issue.
---
Since the scrub path dequeues the CInode* from the ScrubStack, this
commit attempts to use the already available CInode*.
Venky Shankar [Wed, 14 May 2025 10:17:21 +0000 (11:17 +0100)]
Revert "mgr/volumes: handle bad arguments during subvolume create"
PR https://github.com/ceph/ceph/pull/53989 is causing failures in
fs:upgrade. Also, @VallariAg reported an issue with something
similar. I don't think adequate tests were run to qualify the PR
as mergeable. Reverting the change for now.
Matan Breizman [Wed, 7 May 2025 12:34:35 +0000 (12:34 +0000)]
qa/suites/crimson-rados: Seastore (recovery) thrash tests
Seastore is currently only being tested with thrash_simple without recovery.
This commit adds recovery thrash tests with radosbench only for now.
Other workloads, mainly `ceph_test_rados` (rados) are not yet supported.
See: https://tracker.ceph.com/issues/71237