Bill Scales [Fri, 25 Apr 2025 14:03:02 +0000 (15:03 +0100)]
osd: overaggressive assert in read_log_and_missing with optimized EC pool
read_log_and_missing is called during OSD initializaiton to sanity check
the PG log. One of its checks is too agressive for an optimized EC pool
where because of a partial write there can be a log entry but no update
to the object on this shard (other shards will have been updated). The
fix is to skip the checks when the log entry indicates this shard was
not updated.
Only affects pool with allow_ec_optimizations flag on.
Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
Bill Scales [Thu, 29 May 2025 11:53:27 +0000 (12:53 +0100)]
osd: EC optimizations rework for pg_temp
Bug fixes for how pg_temp is used with optimized EC pools. For these
pools pg_temp is re-ordered with non-primary shards last. The acting
set was undoing this re-ordering in PeeringState, but this is too
late and results code getting the shard id wrong. One consequence
iof this was an OSD refusing to create a PG because of an incorrect
shard id.
This commit moves the re-ordering earlier into OSDMap::_get_temp_osds,
some changes are then required to OSDMap::clean_temps.
Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
Bill Scales [Fri, 23 May 2025 09:45:46 +0000 (10:45 +0100)]
osd: EC Optimizations OSDMap::clean_temps preventing change of primary
clean_temps is clearing pg_temp if the acting set will be the same
as the up set. For optimized EC pools this is overaggressive because
there are scenarios where it is setting acting set to be the same as
up set to force an alternative shard to be chosen as primary - this
happens because the acting set is transformed to place non-primary
shards at the end of the pg_temp vector.
Detect this scenario and stop clean_temps from undoing the acting
set which is being set by PeeringState::choose_acting.
Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
Bill Scales [Thu, 22 May 2025 12:12:57 +0000 (13:12 +0100)]
osd: EC optimizations bug in OSDMap::clean_temps
OSDMap clean_temps clears pg_temp for a PG when the up set
matches the acting_set. For optimized EC pools the pg_temp
is reordered to place primary shards first, this function
was not calling pgtemp_undo_primaryfirst to revert the
reordering.
This meant that a 2+1 EC PG with up set [1,2,3] and
a desired acting set [1,3,2] re-ordered the acting
set to produce pg_temp as [1,2,3] and then deleted this
because it equals the up set.
Calling pgtemp_undo_primaryfirst makes this code work
as intended.
Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
Bill Scales [Mon, 19 May 2025 16:01:39 +0000 (17:01 +0100)]
osd: EC Optimizations fix routing of requests to non-zero shard id
Pools with EC optimizations can use pg_temp to modify the selection
of the primary. When this happens the clients route request to the
correct OSD, but wrong shard which causes hung I/Os or misdirected
I/Os.
Fix Objecter to select the correct shard when sending requests to
EC optimized pools. Fix OSD to modify the shard when receving
requests from legacy clients.
Add new unittests to test new functions for remapping the shard id.
Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
Alex Ainscow [Thu, 24 Apr 2025 14:13:31 +0000 (15:13 +0100)]
osd: Cosmetic code cleanup and improve debug
All these changes are one of:
* Whitespace changes
* Addition/removal of debug
* Typos
* Make CPU-intensive debug statements deb ug level 30
* Remove unnecessary counter which did not really help with debug
* Extra comments
Zac Dover [Wed, 25 Jun 2025 09:19:49 +0000 (19:19 +1000)]
doc/radosgw: line edit bucket_logging.rst
Edit doc/radosgw/bucket_logging.rst so that it is not solecistic and so
that its punctuation is corrected and its use of articles is corrected.
This file remains in my judgment demotic and maybe demotic enough to
warrant another editorial pass in the future.
Venky Shankar [Wed, 25 Jun 2025 06:39:39 +0000 (12:09 +0530)]
Merge PR #59435 into main
* refs/pull/59435/head:
mgr/volumes: Fix json.loads for test on mon caps
mgr/volumes: Add test for mon caps if auth key has remaining mds/osd caps
mgr/volumes: Keep mon caps if auth key has remaining mds/osd caps
Add comprehensive documentation for defining configuration options in
ceph-mgr modules, including all supported properties and their usage.
Previously, the documentation did not explain how to define ceph-mgr
module configuration options, despite subtle differences from other Ceph
components. This change documents all supported Option properties, their
types, and provides clear examples to help module developers properly
configure their options.
Kefu Chai [Wed, 25 Jun 2025 03:02:46 +0000 (11:02 +0800)]
doc: do not depend on typed-ast
the typed-ast project was marked end of life since July 2023, and
not maintained anymore. since we build the document using readthedocs'
service, and in .readtherdocs.yml we use python 3.9, which comes with
ast module included by its standard library.
the typed-ast dependency was originally added in 30d41597, but now that
we are using python 3.9, there is no need to use this module anymore.
Kefu Chai [Wed, 25 Jun 2025 03:50:24 +0000 (11:50 +0800)]
doc/dev/config: Document how to use :confval: directive for config options
Add comprehensive guide for documenting configuration options using the
:confval: directive, including naming conventions and cross-referencing.
Previously, the documentation lacked guidance on using the :confval:
directive and the important distinction between regular config options
and mgr module options (which require the mgr/<module>/ namespace
prefix). This change provides detailed examples and best practices for
properly documenting and referencing both types of configuration options.
Kefu Chai [Tue, 24 Jun 2025 14:38:13 +0000 (22:38 +0800)]
rbd: fix unused function warning when WITH_KRBD is disabled
Guard print_error_description() and get_unsupported_features() with
`#ifdef WITH_KRBD` to prevent compiler warnings when KRBD support is
not enabled.
These functions are only called by do_kernel_map(), which is itself
conditionally compiled. When WITH_KRBD is not defined, the compiler
generates unused function warnings for these helper functions.
Fixes warning:
```
/home/kefu/dev/ceph/src/tools/rbd/action/Kernel.cc:305:13: warning: ‘void rbd::action::kernel::print_error_description(const char*, const char*, const char*, const char*, int)’ defined but not used [-Wunused-function]
305 | static void print_error_description(const char *poolname,
| ^~~~~~~~~~~~~~~~~~~~~~~
```
Zac Dover [Mon, 23 Jun 2025 12:50:03 +0000 (22:50 +1000)]
doc/rados: clarify "upmap_max_deviation"
Clarify the threshold set by "upmap_max_deviation" and add the
information about this configurable that is currently in
src/pybind/mgr/balancer/module.py to src/common/options/global.yaml.in,
so that it will be accessible by means of ".. confval::" declarations.
Let PGShardManager::invoke_on_each_shard_seq pass the local shard_services
instance instead of using an additional helper.
The downside of dropping the generic sharded_map_seq helper is that it is
able to support *any* (seastar::)sharded object. However, as shard_services
is the only user of it - directly using the local instance without the
helper seems easier to read.
Matan Breizman [Sun, 22 Jun 2025 10:10:10 +0000 (10:10 +0000)]
crimson/common/smp_helpers: fix reactor_map_seq
Copy f into reactor_map_seq which would be kept alive
due to this method being a coroutine. That way, we can ensure
the lambdas passed to each core that are capturing f by
reference would be safe.
Alternatively, we can also copy f by using it's copy ctor and
pass a copy to each shard:
co_await crimson::submit_to(core, F(f))
However, avoiding the copy is possible here due to the sequential
traversal. Note, seastar's invoke_on_all do copy each callback to
every shard and is running the invocation in parallel.
The above would have fixed f's captures to be invalid and result
in a segfaults on diffrent shards.
Zac Dover [Mon, 23 Jun 2025 08:18:07 +0000 (18:18 +1000)]
doc/radosgw: remove "pubsub_event_lost"
Remove "pubsub_event_lost" from the list of "Notification Performance
Statistics" in doc/radosgw/notifications.rst. "pubsub_event_lost" is now
obsolete.
Shraddha Agrawal [Wed, 28 May 2025 05:56:26 +0000 (11:26 +0530)]
mon: add command osd pool clear-availability-status
This commit adds a new command to allow users to clear the
calculated availability score for a specified pool. This can be
done by issuing the command:
ceph osd pool clear-availability-status <pool_name>
Nitzan Mordechai [Wed, 21 May 2025 11:41:01 +0000 (11:41 +0000)]
src/mon/MgrStatMonitor: fix invalid iterator increment in calc_pool_availability()
Erasing entries from `pool_availability` inside a range-for
loop invalidated the hidden iterator, triggering an
“Invalid read” under Valgrind.
- Use `std::erase_if(pool_availability, predicate)` for
atomic removal.
- Refactor the stats-update loop to use structured bindings
and a clear `++it` for readability.
John Mulligan [Fri, 20 Jun 2025 23:03:22 +0000 (19:03 -0400)]
script/build-with-container: add rocky10 to built-in distros
Add "rocky10" (also aliased to "rockylinux10") to the known distro bases
so that the team can begin to experiment with the Rocky Linux 10 distro
for containerized builds.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
John Mulligan [Fri, 20 Jun 2025 23:46:16 +0000 (19:46 -0400)]
script/build-with-container: support --build-arg arguments
Allow passing --build-arg arguments to build-with-container.py
which are passed directly to the container build command.
This allows a developer to toggle certain features of the build
container, however this should not be used in CI.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
John Mulligan [Fri, 20 Jun 2025 23:34:45 +0000 (19:34 -0400)]
Dockerfile.build: make WITH_CRIMSON a build arg
We've chosen to enable crimson by default to match the CI, but that
is not always something a developer may want, so make WITH_CRIMSON
a build argument that can be toggled off if necessary.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
Kefu Chai [Fri, 20 Jun 2025 23:00:01 +0000 (07:00 +0800)]
common/static_ptr: pass an integer to alignas to fix GCC-11 build failure
GCC-11 fails to compile `alignas(std::bit_ceil(Size))` despite std::bit_ceil()
being marked constexpr in libstdc++11. The compiler doesn't recognize it as a
constant expression, while GCC-12+ and Clang-14+ handle it correctly.
Define the alignment value as a separate constexpr variable before passing it
to alignas() to ensure compatibility with GCC-11.
Fixes compilation issue introduced in commit 73399b05 when std::aligned_storage_t
was replaced with alignas.
J. Eric Ivancich [Fri, 20 Jun 2025 20:00:54 +0000 (14:00 -0600)]
Merge pull request #63271 from rafaelweingartner/parameter_to_externalize_secret_key_ttl-upstream-2
rgw: Externalize Keystone secret key cache TTL
Reviewed-by: Matt Benjamin <mbenjamin@redhat.com> Reviewed-by: Adam C. Emerson <aemerson@redhat.com> Reviewed-by: Tobias Urdin <tobias.urdin@binero.com>