Ronen Friedman [Tue, 20 May 2025 07:29:04 +0000 (02:29 -0500)]
osd: move load avg units conversion to the client
The OSD calls OsdScrub::update_load_average() to find out the load
average, and notes it down in a performance counter. The system
load average is multiplied by 100 (to improve precision). That
multiplication should be on the side of the client, not the
scrub queue service.
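A minimal sketch of the split described above, not the actual Ceph code. The helper names `read_load_average` and `to_perf_counter_units` are hypothetical; the only assumptions are that the service side reports the raw 1-minute load average and that the client scales it by 100 before storing it in an integer counter.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>   // getloadavg() (glibc/BSD extension)
#include <optional>

// Hypothetical service-side helper: reports the 1-minute load average
// in its natural units, with no scaling applied.
inline std::optional<double> read_load_average()
{
  double avg[1];
  if (getloadavg(avg, 1) != 1) {
    return std::nullopt;  // load average not available
  }
  return avg[0];
}

// Hypothetical client-side helper: scales by 100 so that an integer
// performance counter keeps two decimal digits (the value is truncated).
inline std::uint64_t to_perf_counter_units(double loadavg)
{
  assert(loadavg >= 0.0);
  return static_cast<std::uint64_t>(loadavg * 100.0);
}
```

With this split, the scaling factor is visible only where the counter is written, and any other consumer of the load average sees unscaled values.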
Ronen Friedman [Tue, 20 May 2025 05:21:37 +0000 (00:21 -0500)]
osd/scrub: remove OsdScrub::LoadTracker
As we no longer maintain a 'daily average', and as the interaction
between the load tracker and the scrub scheduler is now much simplified,
we can remove the load tracker entirely.
Ville Ojamo [Sun, 18 May 2025 05:35:48 +0000 (12:35 +0700)]
doc/man: Fix inline formatting in ceph-bluestore-tool.rst
A space is missing between a token with emphasis and the following
token:
- Not consistent with other commands like "show-label" (has space).
- Inline formatting is rendered verbatim in the second occurrence, without
the formatting being applied.
- Warning from Sphinx.
Signed-off-by: Ville Ojamo <14869000+bluikko@users.noreply.github.com>
Ville Ojamo [Thu, 15 May 2025 10:32:29 +0000 (17:32 +0700)]
doc: Use existing labels and ref for hyperlinks in architecture.rst
Use validated ":ref:" hyperlinks instead of "external links" in "target
definitions" when linking within the Ceph docs:
- Update to use existing labels when linking from architecture.rst.
- Remove unused "target definitions".
Also use title case for section titles in
doc/start/hardware-recommendations.rst, because the link text is now
generated from the section title.
Other than generated link texts the rendered PR should look the same as
the old docs, only differing in the source RST.
Signed-off-by: Ville Ojamo <14869000+bluikko@users.noreply.github.com>
Ronen Friedman [Sat, 17 May 2025 07:17:42 +0000 (02:17 -0500)]
osd/scrub: minimize calls to sysconf() in scrub_load_below_threshold()
Return an 'all is OK' value if the 1min CPU load - even before being
divided by the number of CPUs - is below the configured threshold.
This is a very common case, and avoids the need to call sysconf()
to get the number of CPUs.
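The shortcut can be sketched as follows. This is an illustration under stated assumptions, not the code from the commit: `load_below_threshold` and its `cpu_count` callback are hypothetical names, and the callback stands in for a `sysconf(_SC_NPROCESSORS_ONLN)` call so that the number of invocations can be observed.

```cpp
#include <cassert>
#include <functional>

// Hypothetical sketch: 'cpu_count' stands in for the sysconf() call and
// is only invoked when the cheap check fails.
inline bool load_below_threshold(double loadavg_1min, double threshold,
                                 const std::function<long()>& cpu_count)
{
  assert(threshold >= 0.0);
  // Fast path: dividing by the CPU count (>= 1) can only lower the value,
  // so a raw load already under the threshold needs no further work.
  if (loadavg_1min < threshold) {
    return true;
  }
  const long cpus = cpu_count();
  if (cpus <= 0) {
    return false;  // cannot normalize; treat as 'not below threshold'
  }
  return (loadavg_1min / static_cast<double>(cpus)) < threshold;
}
```

In real code the callback would wrap `sysconf(_SC_NPROCESSORS_ONLN)` from `<unistd.h>`; on a lightly loaded host the fast path returns before that call is ever made.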
Ronen Friedman [Sat, 17 May 2025 06:04:22 +0000 (01:04 -0500)]
osd/scrub: remove the 2nd option for determining 'low load' for scrubbing
Previously, there were two conditions under which the CPU load was
considered low enough to allow scrubbing:
- the CPU load was below the configured threshold, or
- the load was below a calculated "daily" average, and lower than the
15-min average.
That second condition was confusing and surprising, and is now removed.
As the scrubber logic no longer requires the 5m & 15m load averages,
scrub_load_below_threshold() can use the data gathered by the
periodic LoadTracker::update_load_average().
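The difference between the old and new predicates can be sketched as below. The struct and function names are hypothetical, and per-CPU normalization is left out for brevity; only the two conditions listed in the commit message are modeled.

```cpp
#include <cassert>

// Hypothetical snapshot of the values the old check consulted.
struct LoadSample {
  double load_1min;    // 1-minute load average
  double load_15min;   // 15-minute load average
  double daily_avg;    // the calculated "daily" average
};

// Old behaviour as described above: either condition was sufficient.
inline bool old_low_load(const LoadSample& s, double threshold)
{
  const bool below_threshold = s.load_1min < threshold;
  const bool below_daily_and_falling =
      s.load_1min < s.daily_avg && s.load_1min < s.load_15min;
  return below_threshold || below_daily_and_falling;
}

// New behaviour: only the configured threshold matters, so the 15-minute
// and daily values are no longer needed.
inline bool new_low_load(const LoadSample& s, double threshold)
{
  assert(threshold >= 0.0);
  return s.load_1min < threshold;
}
```

A sample such as `{1.0, 2.0, 1.5}` with a threshold of 0.5 shows the surprising case: the old check allowed scrubbing although the load was twice the configured limit, while the new check does not.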
Kefu Chai [Thu, 15 May 2025 07:41:07 +0000 (15:41 +0800)]
tools/ceph_dedup: Add const qualifiers and reference parameters
Improve code quality in ceph_dedup tool by:
- Adding const qualifiers to member functions and parameters where appropriate
- Converting parameter passing to use references instead of value copies
for complex objects
These changes enhance code readability, better express intent through
const-correctness, and improve performance by avoiding unnecessary deep
copies.
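As an illustration of the two kinds of change, here is a small sketch. The `ChunkIndex` type and `count_chunks` function are hypothetical and are not taken from the ceph_dedup sources.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical type for illustration only.
class ChunkIndex {
public:
  // Taking the argument by const reference avoids copying the string
  // at the call site.
  void add(const std::string& fingerprint)
  {
    assert(!fingerprint.empty());
    fingerprints_.push_back(fingerprint);
  }

  // A const member function promises not to modify the object, so it
  // can be called through a const reference.
  std::size_t size() const { return fingerprints_.size(); }

private:
  std::vector<std::string> fingerprints_;
};

// Passing the index by const reference avoids a deep copy of the vector.
inline std::size_t count_chunks(const ChunkIndex& index)
{
  return index.size();
}
```

Had `size()` not been declared `const`, `count_chunks` would fail to compile, which is how const-correctness documents and enforces intent.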
Ville Ojamo [Fri, 16 May 2025 08:01:14 +0000 (15:01 +0700)]
doc/rados/configuration: Fix invalid hyperlinks in mclock-config-ref.rst
Fix two intradocument links that pointed to the same nonexistent section
title.
These were rendered "as-is" with backticks and everything, while still
linkified pointing to invalid "#idN" anchors.
Modify them to point to the correct section title text.
Signed-off-by: Ville Ojamo <14869000+bluikko@users.noreply.github.com>
Ville Ojamo [Fri, 16 May 2025 08:40:41 +0000 (15:40 +0700)]
doc/radosgw: Fix indentation in cloud-transition.rst
Indent the second paragraph of a list item at the same level as the
previous paragraph. The unexpected indentation resulted in an ERROR from
Sphinx but it was still rendered with increased indentation looking
rather out of place.
Capitalize the first letter similarly to the previous paragraph.
Signed-off-by: Ville Ojamo <14869000+bluikko@users.noreply.github.com>
Ville Ojamo [Fri, 16 May 2025 08:14:09 +0000 (15:14 +0700)]
doc/rados/operations: Fix invalid hyperlink in crush-map-edits.rst
Fix attempted use of underscores for inline emphasis, which caused the
text meant to be emphasized to be treated as a link. The text was rendered
partially as a link to an invalid anchor "#id3".
Instead use inline italic for formatting emphasis.
Signed-off-by: Ville Ojamo <14869000+bluikko@users.noreply.github.com>
Kefu Chai [Thu, 15 May 2025 06:58:22 +0000 (14:58 +0800)]
tools/ceph_dedup: add modelines for emacs and vim
Add modelines for Emacs and Vim so that the source code is formatted
with the expected alignment even if the editor's default settings do
not match the expected settings.
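For reference, a modeline pair of the kind found at the top of many Ceph C++ sources looks like the following. Whether this commit used exactly these values is an assumption; the first line is read by Emacs and the second by Vim.

```cpp
// -*- mode:C++; tab-width:8; c-basic-offset:2; indent-tabs-mode:t -*-
// vim: ts=8 sw=2 smarttab
```

Both editors only honor such lines near the start (or, for Vim, also the end) of a file, so they are normally placed before any other content.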
Zac Dover [Thu, 15 May 2025 13:24:58 +0000 (23:24 +1000)]
doc/mgr: edit dashboard.rst
Edit doc/mgr/crash.rst. Add prompts.
This changes eighty-nine prompts. Because this makes so many changes,
all other edits included in https://github.com/ceph/ceph/pull/63255 will
be made in a separate commit. This is done for the sake of the patience of
the reviewers (probably Anthony, if history is any guide).
This commit is part of a project to separate out the twenty-five files
that were committed to https://github.com/ceph/ceph/pull/63255.
Kefu Chai [Thu, 15 May 2025 13:03:33 +0000 (21:03 +0800)]
common: fix backtrace leak in __ceph_abort and friends
Previously, in __ceph_abort and related abort handlers, we allocated
ClibBackTrace instances using raw pointers without proper cleanup. Since
these handlers terminate execution, the leaks didn't affect production
systems but were correctly flagged by ASan during testing:
```
Direct leak of 288 byte(s) in 1 object(s) allocated from:
#0 0x55aefe8cb65d in operator new(unsigned long) (/home/jenkins-build/build/workspace/ceph-pull-requests/build/bin/unittest_ceph_assert+0x1f465d) (BuildId: a4faeddac80b0d81062bd53ede3388c0c10680bc)
#1 0x7f3b84da988d in ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...) /home/jenkins-build/build/workspace/ceph-pull-requests/src/common/assert.cc:157:21
#2 0x55aefe8cf04b in supressed_assertf_line22() /home/jenkins-build/build/workspace/ceph-pull-requests/src/test/ceph_assert.cc:22:3
#3 0x55aefe8ce4e6 in CephAssertDeathTest_ceph_assert_supresssions_Test::TestBody() /home/jenkins-build/build/workspace/ceph-pull-requests/src/test/ceph_assert.cc:31:3
#4 0x55aefe99135d in void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/jenkins-build/build/workspace/ceph-pull-requests/src/googletest/googletest/src/gtest.cc:2653:10
#5 0x55aefe94f015 in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/jenkins-build/build/workspace/ceph-pull-requests/src/googletest/googletest/src/gtest.cc:2689:14
...
```
This commit resolves the issue by using std::unique_ptr to manage the
lifecycle of backtrace objects, ensuring proper cleanup even in
non-returning functions. While these leaks had no practical impact in
production (as the process terminates anyway), fixing them improves code
quality and eliminates false positives in memory analysis tools.
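The ownership change can be sketched as follows. `FakeBackTrace` is a hypothetical stand-in for ClibBackTrace that counts live instances, and `format_abort_message` is a hypothetical helper that does the logging part of an abort handler without terminating, so that the cleanup can be observed.

```cpp
#include <cassert>
#include <memory>
#include <ostream>
#include <sstream>
#include <string>

// Counter of live FakeBackTrace objects (function-local static).
inline int& live_backtraces()
{
  static int n = 0;
  return n;
}

// Hypothetical stand-in for ClibBackTrace.
struct FakeBackTrace {
  FakeBackTrace() { ++live_backtraces(); }
  ~FakeBackTrace() { --live_backtraces(); }
  void print(std::ostream& out) const { out << "<backtrace>"; }
};

// Formats the message an abort handler would log. The backtrace is owned
// by a std::unique_ptr, so it is released when this scope ends, before
// the caller goes on to terminate the process.
inline std::string format_abort_message(const std::string& what)
{
  std::ostringstream oss;
  auto bt = std::make_unique<FakeBackTrace>();
  oss << what << "\n";
  bt->print(oss);
  return oss.str();
}
```

With a raw `new FakeBackTrace` in place of `std::make_unique`, the counter would stay at 1 after the call, which is the kind of leak a leak checker reports.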
Aashish Sharma [Wed, 14 May 2025 08:05:10 +0000 (13:35 +0530)]
mgr/dashboard: fix flaky promql query test
There is a test collision in "promql-query-test" test suite because two different IOPS panels with the same title and legend labels (Read, Write) are present, and the test framework is not able to distinguish between them.
There are two panels with the same title IOPS and legends Read / Write, but different expressions:
1. ceph-application-overview.json
2. ceph-cluster-advanced.json
Ville Ojamo [Thu, 15 May 2025 09:46:21 +0000 (16:46 +0700)]
doc/radosgw: Use ref for hyperlinking to multisite
Use validated ":ref:" hyperlinks instead of "external links" in "target
definitions" when linking within the Ceph docs:
- Update to use existing label in multisite.rst.
- Remove unused "target definitions".
Also use existing label for linking from multisite.rst.
Fix a broken link within multisite.rst.
The rendered PR should look the same as the old docs, only differing in
the source RST.
Signed-off-by: Ville Ojamo <14869000+bluikko@users.noreply.github.com>
Venky Shankar [Wed, 14 May 2025 10:17:21 +0000 (11:17 +0100)]
Revert "mgr/volumes: handle bad arguments during subvolume create"
PR https://github.com/ceph/ceph/pull/53989 is causing failures in
fs:upgrade. Also, @VallariAg reported an issue with something
similar. I don't think adequate tests were run to qualify the PR
as mergeable. Reverting the change for now.
Matan Breizman [Wed, 7 May 2025 12:34:35 +0000 (12:34 +0000)]
qa/suites/crimson-rados: Seastore (recovery) thrash tests
Seastore is currently only being tested with thrash_simple, without recovery.
This commit adds recovery thrash tests with radosbench only for now.
Other workloads, mainly `ceph_test_rados` (rados), are not yet supported.
See: https://tracker.ceph.com/issues/71237
Zac Dover [Tue, 13 May 2025 06:31:42 +0000 (16:31 +1000)]
doc/dev/cephfs-mirroring: edit file 1 of x
Add prompts (and perform necessary corrections to glaring grammatical
errors) to doc/dev/cephfs-mirroring.rst, as requested by Jos Collin in https://github.com/ceph/ceph/pull/63237/files#r2085886075.
This commit edits the first quarter of the doc/dev/cephfs-mirroring.rst
file. This commit encompasses about one hundred lines of RST.
Zac Dover [Tue, 13 May 2025 06:58:39 +0000 (16:58 +1000)]
doc/dev/cephfs-mirroring: edit file 2 of x
Add prompts (and perform necessary corrections to glaring grammatical
errors) to doc/dev/cephfs-mirroring.rst, as requested by Jos Collin in
https://github.com/ceph/ceph/pull/63237/files#r2085886075.
This commit edits the second quarter of the doc/dev/cephfs-mirroring.rst
file. This commit encompasses about one hundred lines of RST.
Afreen Misbah [Tue, 6 May 2025 14:27:03 +0000 (19:57 +0530)]
mgr/dashboard: Fix delete listener
- pass gw_group to delete API in frontend
- when more than one gateway group is present, deleting a listener fails with the error message: Multiple NVMe-oF gateway groups are configured. Please specify the 'gw_group' parameter in the request.
- added missing types, i18n
Afreen Misbah [Thu, 8 May 2025 04:09:59 +0000 (09:39 +0530)]
mgr/dashboard: Add default state when gateway groups are empty
Fixes https://tracker.ceph.com/issues/71247
- after upgrades the nvmeof service spec does not contain `group` field
- this causes UI combobox internal errors
- checking for `group` in spec and disabling the selector
Venky Shankar [Mon, 12 May 2025 13:02:40 +0000 (18:32 +0530)]
Merge PR #62250 into main
* refs/pull/62250/head:
qa/cephfs: increase data to be delay data sync by mirror daemon
cephfs-mirror: integrate blockdiff API for regular file transfers
mds: dout snapdiff snapid's before validation check
cephfs-mirror: current sync mechanism uses sync mechanism subclass'ing
qa: add test for syncing already existing snapshots
cephfs_mirror: avoid latest changes on the source fs to enable mirroring
Matan Breizman [Mon, 12 May 2025 11:16:26 +0000 (11:16 +0000)]
crimson/osd/../client_request: add logs around get_obc stage
If this stage is in use by another operation, we keep waiting for it
to finish. Add logs before entering the stage and after leaving it to
keep track of stuck requests.
Matan Breizman [Mon, 12 May 2025 11:14:47 +0000 (11:14 +0000)]
crimson/osd/pg: set log_entry_update_waiting_on prior to sending requests
Before this patch, we would first send the MOSDPGUpdateLogMissing to
all peers and only then insert this rep_tid into log_entry_update_waiting_on.
This could have resulted in a race where we receive the reply prior to
actually inserting the rep_tid.
The reply would have been discarded with "reply on unknown tid" (which
now aborts).
The unhandled reply would have prevented submit_error from returning,
and the lock on this obc would have remained held.
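The ordering fix can be modeled in a few lines. This is a simplified, single-threaded sketch with hypothetical names: `ReplyTracker` plays the role of the code around log_entry_update_waiting_on, and the `send` callback models a transport whose reply may be handled before `send` returns.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <set>

// Hypothetical, simplified model of the reply bookkeeping.
class ReplyTracker {
public:
  using tid_t = std::uint64_t;

  // Returns true if the reply matched a registered tid; false is the
  // "reply on unknown tid" case.
  bool handle_reply(tid_t tid) { return waiting_on_.erase(tid) == 1; }

  // Registers the tid first and only then sends, so that even an
  // immediate reply finds its entry.
  void submit(tid_t tid, const std::function<void(tid_t)>& send)
  {
    waiting_on_.insert(tid);
    send(tid);
  }

  bool is_waiting(tid_t tid) const { return waiting_on_.count(tid) != 0; }

private:
  std::set<tid_t> waiting_on_;
};
```

If `submit` called `send` before `insert`, a reply delivered during `send` would hit the unknown-tid path and the entry inserted afterwards would never be cleared.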
Ronen Friedman [Sun, 11 May 2025 05:24:33 +0000 (00:24 -0500)]
osd/scrub: remove the 'deadline' attribute from the scrub job
The scrub job's 'overdue' attribute is no longer calculated: the only
'scrub is overdue' status remaining after the latest scheduling
refactor is the one computed in PGMap.cc (the one affecting the
'health warning' status of the cluster).
Thus, there is no longer any reason to maintain a 'deadline'
attribute for the scrub scheduler.
Ronen Friedman [Fri, 9 May 2025 12:46:26 +0000 (07:46 -0500)]
osd/scrub: remove the deep-scrubs deadline attribute
As it is no longer meaningful in the context of the new
scrub scheduling design.
The change mandates fixes to the way 'schedule-[deep-]scrub'
commands are implemented. The offset to use when forcing the
last-scrub timestamp to a new value is now calculated in
ScrubJob::guaranteed_offset(), as ScrubJob is where all
schedule adjustments (which employ the same logic) are
implemented.
Samuel Just [Fri, 9 May 2025 16:46:48 +0000 (16:46 +0000)]
crimson/osd/pg_recovery: only reset_pglog_based_recovery_op if complete
ce4e9aaad, as part of the start_recovery_ops changes, made the call to
reset_pglog_based_recovery_op occur unconditionally rather than only
if recovery has completed.
Note, this fix only restores the prior behavior. There's actually still
a race here where a DeferRecovery could be processed between the call to
reset_pglog_based_recovery_op and the RequestBackfill or
AllReplicasRecovered being processed.
Introduced: ce4e9aaad8f2cafae24511fe1687c61dc41affc1
Related: https://tracker.ceph.com/issues/71267
Fixes: https://tracker.ceph.com/issues/70337
Signed-off-by: Samuel Just <sjust@redhat.com>
Kefu Chai [Wed, 7 May 2025 01:09:00 +0000 (09:09 +0800)]
tools/ceph_dedup: remove 'using namespace std'
Remove 'using namespace std' from common.h to maintain consistent coding
practices. Although common.h is only used by ceph_dedup implementation,
keeping namespace declarations out of header files prevents potential
name conflicts and follows best practices for C++ code organization.
This change improves code clarity and reduces the risk of symbol collisions
when standard library elements are used alongside custom
implementations.
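A header written in this style qualifies standard library names explicitly instead of relying on a using-directive. The namespace and function below are hypothetical and only illustrate the convention; they are not taken from common.h.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical header-style code: there is no 'using namespace std' at
// namespace scope, so files that include it keep control of their names.
namespace dedup_example {

// Sums the sizes of all chunks; every standard type is spelled with std::.
inline std::size_t total_bytes(const std::vector<std::string>& chunks)
{
  std::size_t total = 0;
  for (const std::string& chunk : chunks) {
    total += chunk.size();
  }
  return total;
}

}  // namespace dedup_example
```

A using-directive inside a function body or a .cc file affects only that scope, whereas one in a header is applied to every translation unit that includes it.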
Ville Ojamo [Fri, 9 May 2025 08:17:00 +0000 (15:17 +0700)]
doc/radosgw: Use ref for hyperlinks, 1st batch
Use validated ":ref:" hyperlinks instead of "external links" in "target
definitions" when linking within the Ceph docs:
- Add a label at beginning of referenced files if missing.
- Remove unused "target definitions".
The rendered PR should look the same as the old docs, only differing in
the source RST.
Signed-off-by: Ville Ojamo <14869000+bluikko@users.noreply.github.com>