doc/rados/operations.rst: add docs for availability score
This commit adds docs on how to use the availability score
feature. It also details when a pool is considered available
and when it is not, and how the availability score is calculated.
Ronen Friedman [Sat, 17 May 2025 07:17:42 +0000 (02:17 -0500)]
osd/scrub: minimize calls to sysconf() in scrub_load_below_threshold()
Return an 'all is OK' value if the 1-minute CPU load, even before being
divided by the number of CPUs, is below the configured threshold.
This is a very common case, and short-circuiting it avoids the need to
call sysconf() to get the number of CPUs.
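A minimal Python sketch of the short-circuit, assuming the per-CPU load is the 1-minute load average divided by a CPU count of at least 1. The function name mirrors the C++ one, but the signature and the `get_ncpus` callback (standing in for the sysconf() call) are hypothetical. Because dividing by a count of at least 1 can only lower the value, a raw load already below the threshold is guaranteed to pass, so the CPU count is never queried in that case.

```python
from typing import Callable


def scrub_load_below_threshold(loadavg_1min: float,
                               threshold: float,
                               get_ncpus: Callable[[], int]) -> bool:
    """Hypothetical model of the check; get_ncpus stands in for sysconf()."""
    # Common case: even the undivided load is below the threshold, and
    # dividing by ncpus >= 1 cannot raise it, so skip the CPU-count query.
    if loadavg_1min < threshold:
        return True
    # Otherwise compare the per-CPU load (assumes get_ncpus() returns >= 1).
    return loadavg_1min / get_ncpus() < threshold
```

On a Unix host, `os.getloadavg()[0]` and `lambda: os.cpu_count() or 1` could serve as inputs.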
Ronen Friedman [Sat, 17 May 2025 06:04:22 +0000 (01:04 -0500)]
osd/scrub: remove the second option for determining 'low load' for scrubbing
Previously, there were two conditions under which the CPU load was
considered low enough to allow scrubbing:
- the CPU load was below the configured threshold, or
- the load was below a calculated "daily" average, and lower than the
15-min average.
That second condition was confusing and surprising, and is now removed.
As the scrubber logic no longer requires the 5m & 15m load averages,
scrub_load_below_threshold() can use the data gathered by the
periodic LoadTracker::update_load_average().
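To make the removed behavior concrete, here is a hypothetical Python model of the old and new predicates. All inputs are treated as already-normalized load values in the same units; that simplification and the parameter names are assumptions, not the scrubber's actual variables.

```python
def load_ok_before(load: float, threshold: float,
                   daily_avg: float, avg_15min: float) -> bool:
    # Old rule: below the threshold, OR below both the "daily" average
    # and the 15-minute average (the condition this commit removes).
    return load < threshold or (load < daily_avg and load < avg_15min)


def load_ok_after(load: float, threshold: float) -> bool:
    # New rule: only the configured threshold matters.
    return load < threshold
```

With `load=0.9`, `threshold=0.5`, `daily_avg=1.2` and `avg_15min=1.0`, the old rule allows scrubbing while the new one does not, which shows how the second condition could permit scrubbing on a busy host.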
Matan Breizman [Wed, 7 May 2025 12:34:35 +0000 (12:34 +0000)]
qa/suites/crimson-rados: Seastore (recovery) thrash tests
Seastore is currently tested only with thrash_simple, without recovery.
This commit adds recovery thrash tests, with radosbench only for now.
Other workloads, mainly `ceph_test_rados` (rados), are not yet supported.
See: https://tracker.ceph.com/issues/71237
Zac Dover [Thu, 15 May 2025 13:24:58 +0000 (23:24 +1000)]
doc/mgr: edit dashboard.rst
Edit doc/mgr/crash.rst. Add prompts.
This changes eighty-nine prompts. Because this makes so many changes,
all other edits included in https://github.com/ceph/ceph/pull/63255 will
be made in a separate commit. This is done for the sake of the patience of
the reviewers (probably Anthony, if history is any guide).
This commit is part of a project to separate out the twenty-five files
that were committed to https://github.com/ceph/ceph/pull/63255.
Ville Ojamo [Thu, 15 May 2025 09:46:21 +0000 (16:46 +0700)]
doc/radosgw: Use ref for hyperlinking to multisite
Use validated ":ref:" hyperlinks instead of "external links" in "target
definitions" when linking within the Ceph docs:
- Update to use existing label in multisite.rst.
- Remove unused "target definitions".
Also use existing label for linking from multisite.rst.
Fix a broken link within multisite.rst.
The rendered PR should look the same as the old docs; only the source
RST differs.
Zac Dover [Tue, 13 May 2025 06:31:42 +0000 (16:31 +1000)]
doc/dev/cephfs-mirroring: edit file 1 of x
Add prompts (and correct glaring grammatical errors) in
doc/dev/cephfs-mirroring.rst, as requested by Jos Collin in
https://github.com/ceph/ceph/pull/63237/files#r2085886075.
This commit edits the first quarter of the doc/dev/cephfs-mirroring.rst
file, which amounts to about one hundred lines of RST.
Zac Dover [Tue, 13 May 2025 06:58:39 +0000 (16:58 +1000)]
doc/dev/cephfs-mirroring: edit file 2 of x
Add prompts (and correct glaring grammatical errors) in
doc/dev/cephfs-mirroring.rst, as requested by Jos Collin in
https://github.com/ceph/ceph/pull/63237/files#r2085886075.
This commit edits the second quarter of the doc/dev/cephfs-mirroring.rst
file, which amounts to about one hundred lines of RST.
Matan Breizman [Mon, 12 May 2025 11:16:26 +0000 (11:16 +0000)]
crimson/osd/../client_request: add logs around get_obc stage
If this stage is in use by another operation, we keep waiting for it
to finish. Add logs before and after entering the stage to keep track
of stuck requests.
Matan Breizman [Mon, 12 May 2025 11:14:47 +0000 (11:14 +0000)]
crimson/osd/pg: set log_entry_update_waiting_on prior to sending requests
Before this patch, we would first send the MOSDPGUpdateLogMissing to
all peers and only then insert this rep_tid into log_entry_update_waiting_on.
This could have resulted in a race in which we receive the reply before
actually inserting the rep_tid.
The reply would then have been discarded with "reply on unknown tid"
(which now aborts).
The unhandled reply would have prevented submit_error from returning,
so it would keep holding the lock on this obc.
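The ordering fix can be illustrated with a small, hypothetical Python model. `LogUpdateTracker`, its methods and the synchronous `send` callback are inventions for this sketch, not crimson's API; the callback delivers each reply immediately, which is the worst case for the race described above.

```python
from typing import Callable, Dict, List, Set

Send = Callable[[str, int], object]


class LogUpdateTracker:
    """Hypothetical model of rep_tid bookkeeping (not crimson's API)."""

    def __init__(self) -> None:
        self.waiting_on: Dict[int, Set[str]] = {}
        self.unknown_replies: List[int] = []

    def submit(self, rep_tid: int, peers: List[str], send: Send) -> None:
        # Fixed ordering: register the rep_tid before any request goes out,
        # so even an immediate reply finds its entry.
        self.waiting_on[rep_tid] = set(peers)
        for peer in peers:
            send(peer, rep_tid)

    def submit_send_first(self, rep_tid: int, peers: List[str],
                          send: Send) -> None:
        # Pre-fix ordering: a reply arriving before this insert is lost.
        for peer in peers:
            send(peer, rep_tid)
        self.waiting_on[rep_tid] = set(peers)

    def handle_reply(self, peer: str, rep_tid: int) -> bool:
        pending = self.waiting_on.get(rep_tid)
        if pending is None:
            # Corresponds to the "reply on unknown tid" case.
            self.unknown_replies.append(rep_tid)
            return False
        pending.discard(peer)
        if not pending:
            del self.waiting_on[rep_tid]  # every peer has replied
        return True
```

With `submit_send_first`, the reply lands before the entry exists and is recorded as unknown, and the rep_tid then stays in `waiting_on` indefinitely, mirroring the obc lock that is never released.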
Ronen Friedman [Sun, 11 May 2025 05:24:33 +0000 (00:24 -0500)]
osd/scrub: remove the 'deadline' attribute from the scrub job
The scrub job's 'overdue' attribute is no longer calculated: after the
latest scheduling refactor, the only remaining 'scrub is overdue' status
is the one computed in PGMap.cc (the one affecting the cluster's
'health warning' status).
Thus, there is no longer any reason to maintain a 'deadline'
attribute for the scrub scheduler.
Ronen Friedman [Fri, 9 May 2025 12:46:26 +0000 (07:46 -0500)]
osd/scrub: remove the deep-scrubs deadline attribute
It is no longer meaningful in the context of the new
scrub scheduling design.
This change requires fixes to the way the 'schedule-scrub' and
'schedule-deep-scrub' commands are implemented. The offset used when
forcing the last-scrub timestamp to a new value is now calculated in
ScrubJob::guaranteed_offset(), as ScrubJob is where all
schedule adjustments (which employ the same logic) are
implemented.
Ronen Friedman [Thu, 8 May 2025 13:45:23 +0000 (08:45 -0500)]
osd/scrub: fix deadline calculations
The scrub scheduling deadlines are calculated based on pool and OSD
configuration parameters. The specifics of the calculations are
modified to match the new scrub scheduling design.
Comments and documentation are updated to reflect the fact that
the deadlines no longer have any meaningful effect on scrub
scheduling.
Samuel Just [Thu, 24 Apr 2025 22:13:04 +0000 (15:13 -0700)]
vstart.sh: simplify crimson core assignment, use assign_crimson_cores.py
This commit simplifies the internal flow in a few ways:
- core assignment is handled entirely by prep_balance_cpu and
do_balance_cpu. The latter simply does as the cpu_table
instructs.
- assign_crimson_cores calls lscpu and taskset internally, so there is
no need for temp files.
It also changes some defaults:
- if crimson-balance-cpu is unset or set to none, crimson-osd will not
pin CPUs at all, rather than using the simple sequential allocation
scheme, which could be much less efficient on platforms where
CPU IDs 0,1,2,3,... are on sockets 0,1,2,3,... The "osd" and "socket"
options provide NUMA-aware assignments when requested.
New features:
- Alienstore cores are now assigned with assign_crimson_cores
using the same balance strategy, with the number of cores set by
--crimson-alien-num-cores.
- --crimson-reactor-physical-only and
--crimson-alienstore-physical-only cause reactor or
alienstore CPUs, respectively, to be allocated with one
CPU per physical core rather than including SMT siblings.
Samuel Just [Tue, 29 Apr 2025 01:53:11 +0000 (01:53 +0000)]
tools/contrib: add assign_crimson_cores as a more general replacement for balance_cpu
Improvements:
- shorter
- has tests
- uses lscpu -e --json to get logical<->physical mappings and avoid
needing to parse cpu ranges in lscpu --json
- supports allocating alienstore threads
- supports requiring physical cores only, independently for alienstore
and seastar reactors
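The "physical cores only" option can be sketched in Python from `lscpu -e --json` output. This is not the code of assign_crimson_cores; it assumes the JSON has a top-level `cpus` list whose entries carry `cpu`, `socket`, `core` and `online` fields (older util-linux versions emit these as strings, so the sketch converts them), and it keeps the lowest-numbered SMT sibling of each physical core.

```python
import json
from typing import Dict, List, Tuple


def physical_only_cpus(lscpu_json: str) -> List[int]:
    """Return one online logical CPU per (socket, core) pair."""
    chosen: Dict[Tuple[int, int], int] = {}
    for entry in json.loads(lscpu_json)["cpus"]:
        # Skip offline CPUs; "online" may be a bool or "yes"/"no".
        if entry.get("online", True) in (False, "no"):
            continue
        cpu = int(entry["cpu"])
        key = (int(entry["socket"]), int(entry["core"]))
        # Keep the lowest-numbered logical CPU seen for this physical core.
        if key not in chosen or cpu < chosen[key]:
            chosen[key] = cpu
    return sorted(chosen.values())


if __name__ == "__main__":
    sample = json.dumps({"cpus": [
        {"cpu": 0, "socket": 0, "core": 0, "online": True},
        {"cpu": 1, "socket": 0, "core": 1, "online": True},
        {"cpu": 2, "socket": 0, "core": 0, "online": True},  # sibling of 0
        {"cpu": 3, "socket": 0, "core": 1, "online": True},  # sibling of 1
    ]})
    print(physical_only_cpus(sample))  # [0, 1]
```

Keying on `(socket, core)` rather than `core` alone stays correct whether or not core IDs are unique across sockets.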
Afreen Misbah [Tue, 6 May 2025 14:27:03 +0000 (19:57 +0530)]
mgr/dashboard: Fix delete listener
- pass gw_group to the delete API in the frontend
- when more than one gateway group was present, deleting a listener failed with the error message: Multiple NVMe-oF gateway groups are configured. Please specify the 'gw_group' parameter in the request.
- add missing types and i18n
Afreen Misbah [Thu, 8 May 2025 04:09:59 +0000 (09:39 +0530)]
mgr/dashboard: Add default state when gateway groups are empty
Fixes https://tracker.ceph.com/issues/71247
- after upgrades, the nvmeof service spec does not contain the `group` field
- this causes internal errors in the UI combobox
- check for `group` in the spec and disable the selector when it is missing
N Balachandran [Wed, 30 Apr 2025 05:15:13 +0000 (10:45 +0530)]
rbd: write image mirror status if state is CREATING
It can take up to 30s for the image mirror status to be written
to rbd_mirroring on the secondary for a newly created image. This fix
attempts to reduce that time by writing the status to rbd_mirroring even
if the image state is set to CREATING.
Fixes: https://tracker.ceph.com/issues/71138
Signed-off-by: N Balachandran <nithya.balachandran@ibm.com>
(cherry picked from commit 25a8de9c3db8309387eed3502e781872bc1e035e)
Samuel Just [Fri, 9 May 2025 16:46:48 +0000 (16:46 +0000)]
crimson/osd/pg_recovery: only reset_pglog_based_recovery_op if complete
Commit ce4e9aaad, as part of the start_recovery_ops changes, made the call
to reset_pglog_based_recovery_op unconditional, rather than occurring only
once recovery has completed.
Note that this fix only restores the prior behavior. There is still
a race here: a DeferRecovery could be processed between the call to
reset_pglog_based_recovery_op and the processing of RequestBackfill or
AllReplicasRecovered.
Zac Dover [Thu, 8 May 2025 02:29:25 +0000 (12:29 +1000)]
doc/mgr: edit alerts.rst
Edit doc/mgr/alerts.rst as part of the project to determine where the
error is in https://github.com/ceph/ceph/pull/62782 that prevents the
Jenkins tests from passing.
This commit adds to the work done in
https://github.com/ceph/ceph/pull/62782 by correcting some of the
English that was present in that PR.
This is a change to one of twenty-five files in
https://github.com/ceph/ceph/pull/62782, and this commit represents one
of what will be at least twenty-five other commits made to track this
error down.
Zac Dover [Thu, 8 May 2025 00:08:06 +0000 (10:08 +1000)]
doc/mgr/ceph_api: edit index.rst
Edit doc/mgr/ceph_api/index.rst as part of the project to determine
where the error is in https://github.com/ceph/ceph/pull/62782 that
prevents the Jenkins tests from passing.
This is a change to one of twenty-five files in
https://github.com/ceph/ceph/pull/62782, and this commit represents one
of what will be at least twenty-five other commits made to track this
error down.
Matan Breizman [Tue, 4 Feb 2025 10:24:43 +0000 (10:24 +0000)]
CMakeLists: Fallback to RelWithDebInfo
Currently, if .git exists, we set CMAKE_BUILD_TYPE=Debug.
Otherwise, we leave it empty and no optimization flags will
be used.
With this change, the fallback CMAKE_BUILD_TYPE is set
to RelWithDebInfo instead.
From the CMAKE_BUILD_TYPE manual:
The default value is often an empty string, but this is usually not
desirable and one of the other standard build types is usually more appropriate.
Note: One notable change is that -DNDEBUG will now be defined.
Adam Kupczyk [Wed, 7 May 2025 08:30:11 +0000 (08:30 +0000)]
os/bluestore/recompression: Estimator omits large compressed blobs
The problem was that the Estimator accepted large compressed blobs for
recompression. The fix is to discourage such actions by penalizing
compressed blobs based on their size. As a result, a small compressed
blob is likely to be recompressed, while a large compressed blob is not.
Fixes: https://tracker.ceph.com/issues/71244
Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
(cherry picked from commit bbc9e961e9046949138bb3d70e8dd91761fcb088)
Adam Kupczyk [Wed, 7 May 2025 08:25:19 +0000 (08:25 +0000)]
os/bluestore/recompression: Now able to reach left boundary
A bad comparison caused the recompression range to exclude the left
boundary point. In most cases this makes little difference, but it
prevented:
1) including an extent starting at 0
2) including an extent at the beginning of an onode segment
Now fixed.
Fixes: https://tracker.ceph.com/issues/71244
Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
(cherry picked from commit acfe527d9bbe3364f9e321ce6e790f93eafe41df)
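The class of bug described above (an exclusive comparison where an inclusive one was needed) can be shown with a hypothetical Python check for whether an extent's start offset falls inside a recompression range `[left, right)`. The half-open range and the function names are assumptions for illustration; the actual BlueStore comparison is not reproduced here.

```python
def starts_in_range_buggy(offset: int, left: int, right: int) -> bool:
    # Strict comparison on the left: an extent starting exactly at
    # `left` (for example at offset 0) is wrongly excluded.
    return left < offset < right


def starts_in_range_fixed(offset: int, left: int, right: int) -> bool:
    # Inclusive left boundary: the range is [left, right).
    return left <= offset < right
```

`starts_in_range_buggy(0, 0, 65536)` is False while `starts_in_range_fixed(0, 0, 65536)` is True; offsets strictly inside the range behave the same in both, which is why the bug made little difference in most cases.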
Nitzan Mordechai [Wed, 26 Mar 2025 08:20:15 +0000 (08:20 +0000)]
osd_types: Restore new_object marking for delete missing entries
Recent changes (PR #29893) removed the “new_object” parameter from missing.add() and the
pg_missing_item constructor. As a result, when processing delete log entries,
if an object is found on disk, its on‑disk version is stored as “have” instead
of the default eversion_t() (0'0). The invariant in read_log_and_missing() then
fails because delete entries are expected to have “have” set to eversion_t().
This patch reintroduces the following check:
    if (have == eversion_t())
        clean_regions.mark_object_new();
By doing so, we ensure that when the on‑disk “have” is default, the missing record
is marked as new, restoring the previous behavior and satisfying the invariant for
delete operations.
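A hypothetical Python model of the reintroduced check. `EVersion`, `CleanRegions`, `MissingItem` and `make_missing_item` are stand-ins loosely named after eversion_t, the clean-regions tracking and the pg_missing_item constructor; they are not Ceph's classes, and only the `have == eversion_t()` test is taken from the patch.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EVersion:
    """Stand-in for eversion_t: (epoch, version); default is 0'0."""
    epoch: int = 0
    version: int = 0


@dataclass
class CleanRegions:
    """Stand-in that only tracks the 'object is new' flag."""
    new_object: bool = False

    def mark_object_new(self) -> None:
        self.new_object = True


@dataclass
class MissingItem:
    need: EVersion
    have: EVersion
    clean_regions: CleanRegions = field(default_factory=CleanRegions)


def make_missing_item(need: EVersion, have: EVersion) -> MissingItem:
    item = MissingItem(need, have)
    # The reintroduced check: a default "have" (0'0) means there is no
    # on-disk version, so the missing object is marked as new.
    if have == EVersion():
        item.clean_regions.mark_object_new()
    return item
```

A missing entry whose "have" is the default 0'0 is marked new; one with an on-disk version, such as 3'7, is not.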