crimson/osd/osd_operations/pg_advance_map: Add splitting as a function
As we initiate pg splitting as part of the PGAdvanceMap workflow, it is not required
to maintain it as a separate osd_operation.
A new function in PGAdvanceMap, split_pg(), now takes care of the splitting workflow
when we detect split children in an OSD map.
Since crimson does not follow the same queuing system as the classic OSD, we do not
need to maintain pg_num_history. This makes the splitting check simpler.
With most of the splitting code being part of PGAdvanceMap, it makes sense to have the
splitting check there as well and leave broadcast_map_to_pgs untouched.
crimson/osd/shard_services: make use of PGCreationBlockingEvent
For every split child PG being created, make an entry in the wait_for_pg map using
a PGCreationBlockingEvent trigger.
This change ensures that even split PGs are part of the pgs_creating map that tracks
PGCreationState of a PG by the addition of the create_split_pg method.
crimson/osd/shard_services: Add function create_split_pg_mapping for PG splitting
Previously, get_or_create_pg_mapping was used to assign a new child PG to a specific core. When a peering operation occurred, it determined the core responsible for the PG and forwarded the operation there using get_or_create_pg_mapping.
However, earlier changes did not ensure that the split child PG creation logic ran on the same core to which it was mapped. This led to an issue where PGMap::pg_created could be updated on shard 0, while the child PG was actually mapped to shard 1. When a peering operation for the new PG arrived, it was forwarded to shard 1. Since the PG had been created on shard 0, shard 1 had no record of it and attempted to create it again. This resulted in an assertion failure in BlueStore due to an attempt to create a collection for an already existing PG.
This commit adds a function create_split_pg_mapping that will resolve the issue by ensuring that child PG creation always occurs on the core to which the PG is mapped.
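The failure mode described above can be illustrated with a toy model. This is only a sketch and not the crimson implementation: `ToyShards`, `map_pg`, `create_child_pg`, and the round-robin core assignment are all hypothetical names and policies chosen for illustration. The point it shows is that creation is recorded on the core the PG is mapped to, so a later lookup on that core finds the PG and does not try to create it a second time.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy model only: every name here is hypothetical, not a crimson API.
class ToyShards {
public:
  explicit ToyShards(std::size_t num_cores) : created_(num_cores) {
    assert(num_cores > 0);
  }

  // Assign a PG to a core once; later calls return the same core.
  // Round-robin assignment is an assumption made for this sketch.
  std::size_t map_pg(const std::string& pgid) {
    auto [it, inserted] = pg_to_core_.try_emplace(pgid, next_core_);
    if (inserted) {
      next_core_ = (next_core_ + 1) % created_.size();
    }
    return it->second;
  }

  // Record the child PG on the core it is mapped to, regardless of
  // which core initiated the split. Returns false if it already exists.
  bool create_child_pg(const std::string& pgid) {
    const std::size_t core = map_pg(pgid);
    return created_[core].insert(pgid).second;
  }

  // True if the given core has a record of the PG.
  bool core_has_pg(std::size_t core, const std::string& pgid) const {
    return created_.at(core).count(pgid) > 0;
  }

private:
  std::map<std::string, std::size_t> pg_to_core_;
  std::vector<std::set<std::string>> created_;
  std::size_t next_core_ = 0;
};
```

In this model, a peering operation forwarded to the mapped core always finds the child PG there, which is the invariant the commit message describes.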
mon/OSDMonitor: Remove check for FLAG_NOPGCHANGE for crimson pools
With PG Splitting introduced, we can remove the check that previously stopped changes in pg_num for crimson pools.
Note: `nopgchange` is set to true by default with Crimson pools.
For pg_num changes to actually take place, `nopgchange` must also be set to false.
Currently, this is used for (PG splitting) testing only. Once PG merging is in place as well,
we could revert `nopgchange` to be false by default in Crimson.
mgr/dashboard: Allow the user to re-use existing realm/zg/zone and setup replication
1. Currently, the multi-site replication wizard only lets the user create a new realm/zg/zone and set up replication. The ask is to also let the user select a pre-existing realm/zg/zone and set up replication via automatic export and import of the token.
2. Enable the rgw module automatically in the selected cluster if it's not enabled
Ronen Friedman [Tue, 20 May 2025 07:29:04 +0000 (02:29 -0500)]
osd: move load avg units conversion to the client
The OSD calls OsdScrub::update_load_average() to find out the load
average, and notes it down in a performance counter. The system
load average is multiplied by 100 (to improve precision). That
multiplication should be on the side of the client, not the
scrub queue service.
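As a rough illustration of the split in responsibilities, the sketch below keeps the raw load average on the service side and applies the x100 scaling on the client side. `LoadSample` and `to_counter_units` are hypothetical names for this example and do not correspond to actual Ceph types or functions.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical: what the service reports, in raw (unscaled) units.
struct LoadSample {
  double one_minute;
};

// Hypothetical client-side conversion: scale by 100 so that an integer
// performance counter keeps two decimal digits of precision.
inline std::uint64_t to_counter_units(const LoadSample& sample) {
  assert(sample.one_minute >= 0.0);
  return static_cast<std::uint64_t>(std::llround(sample.one_minute * 100.0));
}
```

With this arrangement the service never needs to know which units its consumers prefer.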
Ronen Friedman [Tue, 20 May 2025 05:21:37 +0000 (00:21 -0500)]
osd/scrub: remove OsdScrub::LoadTracker
As we no longer maintain a 'daily average', and as the interaction
between the load tracker and the scrub scheduler is now much simplified,
we can remove the load tracker entirely.
Venky Shankar [Tue, 20 May 2025 05:28:55 +0000 (10:58 +0530)]
Merge PR #62632 into main
* refs/pull/62632/head:
libcephfs: increment library minor version
test: add test to fetch perf counters via libcephfs API
libcephfs: add API to get client perf counters
client: fix total write operations perf counter name
Ville Ojamo [Sun, 18 May 2025 05:35:48 +0000 (12:35 +0700)]
doc/man: Fix inline formatting in ceph-bluestore-tool.rst
A space is missing between a token with emphasis and the following
token:
- Not consistent with other commands like "show-label" (has space).
- Inline formatting is rendered verbatim in the second occurrence, without
the formatting being applied.
- Warning from Sphinx.
Signed-off-by: Ville Ojamo <14869000+bluikko@users.noreply.github.com>
Ville Ojamo [Thu, 15 May 2025 10:32:29 +0000 (17:32 +0700)]
doc: Use existing labels and ref for hyperlinks in architecture.rst
Use validated ":ref:" hyperlinks instead of "external links" in "target
definitions" when linking within the Ceph docs:
- Update to use existing labels when linking from architecture.rst.
- Remove unused "target definitions".
Also use title case for section titles in
doc/start/hardware-recommendations.rst, because the link text is now
generated from the section title.
Other than generated link texts the rendered PR should look the same as
the old docs, only differing in the source RST.
Signed-off-by: Ville Ojamo <14869000+bluikko@users.noreply.github.com>
Ronen Friedman [Sat, 17 May 2025 07:17:42 +0000 (02:17 -0500)]
osd/scrub: minimize calls to sysconf() in scrub_load_below_threshold()
Return an 'all is OK' value if the 1min CPU load - even before being
divided by the number of CPUs - is below the configured threshold.
This is a very common case, and avoids the need to call sysconf()
to get the number of CPUs.
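The early return can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual scrub_load_below_threshold() code: the function name `load_below_threshold` is hypothetical, and the CPU count is passed in as a callable (standing in for sysconf()) so that the number of lookups can be observed.

```cpp
#include <cassert>
#include <functional>

// Hypothetical helper for illustration. `cpu_count` stands in for a
// sysconf(_SC_NPROCESSORS_ONLN) lookup.
bool load_below_threshold(double loadavg_1min,
                          double threshold,
                          const std::function<long()>& cpu_count)
{
  assert(threshold >= 0.0);
  // Fast path: with at least one CPU, the per-CPU load can never exceed
  // the raw load, so a raw load below the threshold is already enough.
  if (loadavg_1min < threshold) {
    return true;
  }
  // Slow path: normalize by the number of CPUs.
  const long cpus = cpu_count();
  if (cpus <= 0) {
    return false;  // cannot normalize; treat the load as too high
  }
  return (loadavg_1min / static_cast<double>(cpus)) < threshold;
}
```

The CPU count is only queried when the raw 1-minute load is at or above the threshold.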
Ronen Friedman [Sat, 17 May 2025 06:04:22 +0000 (01:04 -0500)]
osd/scrub: remove the 2nd option for determining 'low load' for scrubbing
Previously, there were two conditions under which the CPU load was
considered low enough to allow scrubbing:
- the CPU load was below the configured threshold, or
- the load was below a calculated "daily" average, and lower than the
15-min average.
That second condition was confusing and surprising, and is now removed.
As the scrubber logic no longer requires the 5m & 15m load averages,
scrub_load_below_threshold() can use the data gathered by the
periodic LoadTracker::update_load_average().
Kefu Chai [Thu, 15 May 2025 07:41:07 +0000 (15:41 +0800)]
tools/ceph_dedup: Add const qualifiers and reference parameters
Improve code quality in ceph_dedup tool by:
- Adding const qualifiers to member functions and parameters where appropriate
- Converting parameter passing to use references instead of value copies
for complex objects
These changes enhance code readability, better express intent through
const-correctness, and improve performance by avoiding unnecessary deep
copies.
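The kind of change described above might look like the sketch below. `ChunkInfo` and `ChunkStore` are hypothetical types made up for this example and are not taken from the ceph_dedup sources.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical types for illustration only.
struct ChunkInfo {
  std::string fingerprint;
  std::vector<char> data;
};

class ChunkStore {
public:
  // Taking the argument by const reference avoids copying the chunk
  // just to pass it in; the only copy is the one stored in the vector.
  void add(const ChunkInfo& chunk) { chunks_.push_back(chunk); }

  // const member functions: callable on a const ChunkStore, and the
  // compiler rejects accidental modification of the object.
  std::size_t total_bytes() const {
    std::size_t sum = 0;
    for (const auto& c : chunks_) {
      sum += c.data.size();
    }
    return sum;
  }

  bool contains(const std::string& fingerprint) const {
    for (const auto& c : chunks_) {
      if (c.fingerprint == fingerprint) {
        return true;
      }
    }
    return false;
  }

private:
  std::vector<ChunkInfo> chunks_;
};
```

Both read-only queries can be called through a `const ChunkStore&`, which is what makes the intent visible at the call site.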
Ville Ojamo [Fri, 16 May 2025 08:01:14 +0000 (15:01 +0700)]
doc/rados/configuration: Fix invalid hyperlinks in mclock-config-ref.rst
Fix two intradocument links that pointed to the same nonexistent section
title.
These were rendered "as-is" with backticks and everything, while still
linkified pointing to invalid "#idN" anchors.
Modify them to point to the correct section title text.
Signed-off-by: Ville Ojamo <14869000+bluikko@users.noreply.github.com>
Ville Ojamo [Fri, 16 May 2025 08:40:41 +0000 (15:40 +0700)]
doc/radosgw: Fix indentation in cloud-transition.rst
Indent the second paragraph of a list item at the same level as the
previous paragraph. The unexpected indentation resulted in an ERROR from
Sphinx but it was still rendered with increased indentation looking
rather out of place.
Capitalize the first letter similarly to the previous paragraph.
Signed-off-by: Ville Ojamo <14869000+bluikko@users.noreply.github.com>
Ville Ojamo [Fri, 16 May 2025 08:14:09 +0000 (15:14 +0700)]
doc/rados/operations: Fix invalid hyperlink in crush-map-edits.rst
Fix attempted use of underscores for inline emphasis, which caused the
text meant to be emphasized to be treated as a link. The text was rendered
partially as a link to an invalid anchor "#id3".
Instead use inline italic for formatting emphasis.
Signed-off-by: Ville Ojamo <14869000+bluikko@users.noreply.github.com>