Alex Ainscow [Thu, 5 Feb 2026 14:45:04 +0000 (14:45 +0000)]
osdc: Refactor SplitOp
There are a large number of changes in this commit, which were found through
development and testing of split ops.
I have split out all the objecter updates carefully, but since the split op
code is not currently used in production, I have not documented every change
and have made significant refactors/rearrangements.
Alex Ainscow [Thu, 5 Feb 2026 15:00:03 +0000 (15:00 +0000)]
osd: Never return -EAGAIN from ECBackend.
If ECBackend returns EAGAIN, this causes the PrimaryLogPG code to drop the
op. This is for historical reasons, but hard to refactor out.
Instead, the PrimaryLogPG code has been refactored to work out that EAGAIN
is required much earlier in the processing, where EAGAIN will be returned
to the client.
Alex Ainscow [Thu, 5 Feb 2026 14:00:38 +0000 (14:00 +0000)]
osdc: Do not recalculate target for split ops.
SplitOp calculates the target and sets the necessary target OSD itself. This
means that calc_target is not required again on first submit of the sub
read ops.
Alex Ainscow [Thu, 5 Feb 2026 13:47:11 +0000 (13:47 +0000)]
osdc: Move SplitOp decision point to later in submit procedure.
Previously, split ops were calculated as soon as the op was
submitted. Here we move the split op decision down to below the throttling
and timeout code. This way we throttle/timeout the original op.
The timeout (op_cancel) will be handled in a later commit.
Alex Ainscow [Thu, 5 Feb 2026 13:34:58 +0000 (13:34 +0000)]
osdc: Extend op_post_submit to cope with successful Ops.
The locking situation in Objecter is complex. When ops are completed, whether
with success or otherwise, some locks are held. For split ops, this is
particularly complex, since multiple sessions are involved in the completion.
To avoid all these deadlock issues, SplitOps schedule a completion
task using asio::post, which can then take the appropriate locks before
completing the IO, without risk of deadlock.
Usage of this will be added in a refactor of SplitOps.
Alex Ainscow [Thu, 5 Feb 2026 13:30:36 +0000 (13:30 +0000)]
osdc: Add split_op statistic
This statistic counts the number of ops which have been submitted using the
split op mechanism. It lets a user check how useful the mechanism is, and
lets performance and development teams check that it is being used in
any given application.
Alex Ainscow [Thu, 5 Feb 2026 13:19:05 +0000 (13:19 +0000)]
mon: Add mechanism for user to add/clear pool flags.
Previously, every time we had a new experimental feature, switched with a
pool flag, we needed to add a bunch of boilerplate. Given that end users
should not be using these features, adding all of this user-visible
behaviour is not desirable.
This adds a single mechanism to specify a flag set by number. These magic
numbers can be used during development and then either removed, or
promoted to user-friendly flags.
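As a rough illustration of setting and clearing a flag by number, here is a minimal Python sketch. It assumes the number is a bit position in a flags bitmask; the commit message does not say whether the number is a bit index or a mask value, and the helper names are hypothetical.

```python
def set_flag(flags, bit):
    """Return `flags` with bit number `bit` set (assumed bit-index semantics)."""
    return flags | (1 << bit)


def clear_flag(flags, bit):
    """Return `flags` with bit number `bit` cleared."""
    return flags & ~(1 << bit)


# Hypothetical experimental flag living at bit 20 of the pool flags.
pool_flags = 0
pool_flags = set_flag(pool_flags, 20)
flag_is_set = bool(pool_flags & (1 << 20))
pool_flags = clear_flag(pool_flags, 20)
```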
Alex Ainscow [Thu, 5 Feb 2026 13:16:25 +0000 (13:16 +0000)]
osdc: Add FORCE and FAIL_ON_EAGAIN flags.
Previously, the lower levels of Objecter would potentially redrive ops to
different OSDs when the map changed, or when the OSD returned -EAGAIN. These
flags will be used to change this behaviour:
* FORCE_OSD means that the OSD is fixed and cannot be changed.
* FAIL_ON_EAGAIN means that rather than redriving, the op should be failed (back to the split op code).
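The intended behaviour of the two flags can be sketched in Python as follows. This is only an illustration of the decision logic described above: the bit values, function names and return conventions are hypothetical and are not taken from Objecter.

```python
import errno

# Hypothetical bit values; the commit message does not give the real ones.
FLAG_FORCE_OSD = 0x1
FLAG_FAIL_ON_EAGAIN = 0x2


def on_eagain(flags):
    """Decide what to do when the OSD replies with -EAGAIN."""
    if flags & FLAG_FAIL_ON_EAGAIN:
        # Hand the error back to the submitter instead of redriving.
        return ("fail", -errno.EAGAIN)
    # Default behaviour: redrive the op.
    return ("resend", 0)


def on_map_change(flags, current_osd, new_osd):
    """Pick the OSD to target after an OSD map change."""
    if flags & FLAG_FORCE_OSD:
        return current_osd  # pinned: the target must not change
    return new_osd
```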
Alex Ainscow [Thu, 5 Feb 2026 13:14:07 +0000 (13:14 +0000)]
Torn write protection for Direct Reads
It is possible for direct reads to query two separate shards and
get a different version of the object from each shard.
To solve this we add a get_internal_version op to tell us the version
of the object on that shard and submit that in the same transaction
as the read so we can ensure the versions are what we expect. If we
have a mismatch, we resubmit the read through the primary path.
Signed-off-by: Jon Bailey <jonathan.bailey1@ibm.com>
Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
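A minimal Python sketch of the version check described in this commit. It assumes each shard reply carries a (version, data) pair and that a mismatch is detected by comparing the shard versions with each other; the function and parameter names are hypothetical and do not reflect the actual op encoding.

```python
def direct_read(shards, shard_ids, read_via_primary):
    """Read from each shard; fall back to the primary if versions disagree.

    `shards` maps shard id -> (version, data) and stands in for sending a
    read plus a version query to that shard in a single transaction.
    """
    replies = [shards[shard_id] for shard_id in shard_ids]
    versions = {version for version, _ in replies}
    if len(versions) > 1:
        # The shards returned different object versions: a torn read.
        return read_via_primary()
    return b"".join(data for _, data in replies)


consistent = {0: (5, b"ab"), 1: (5, b"cd")}
torn = {0: (5, b"ab"), 1: (6, b"XX")}
```

With the sample data above, reading `consistent` returns the concatenated shard data, while reading `torn` takes the fallback path.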
Matty Williams [Mon, 20 Oct 2025 15:46:43 +0000 (16:46 +0100)]
test/osd: Add balanced read flags to io_sequence exerciser
Add an optional "-b"/"balanced" flag to the end of read/read2/read3 operations in interactive mode, to make them balanced reads.
The balanced read percentage is not used in interactive mode.
Add a command line argument to specify the percentage of read ops that should use the balanced reads flag. The default is 100%.
Signed-off-by: Matty Williams <Matty.Williams@ibm.com>
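One way a percentage like this could be applied per read op is sketched below in Python. This is an assumption about the general technique (an independent random draw per op), not the exerciser's actual implementation, and the function name is hypothetical.

```python
import random


def use_balanced_read(percentage, rng=random):
    """Return True for roughly `percentage` percent of calls."""
    # rng.random() is in [0.0, 1.0), so 100 always selects and 0 never does.
    return rng.random() * 100 < percentage


always = all(use_balanced_read(100) for _ in range(1000))
never = not any(use_balanced_read(0) for _ in range(1000))
```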
Kefu Chai [Fri, 9 Jan 2026 23:53:29 +0000 (07:53 +0800)]
rgw: fix memory leak in RGWHTTPManager thread cleanup
Fix memory leak detected by AddressSanitizer in unittest_http_manager.
The test was failing with ASan enabled due to rgw_http_req_data objects
not being properly cleaned up when the HTTP manager thread exits.
ASan reported the following leaks:
  Direct leak of 17152 byte(s) in 32 object(s) allocated from:
    #0 operator new(unsigned long)
    #1 RGWHTTPManager::add_request(RGWHTTPClient*)
       /ceph/src/rgw/rgw_http_client.cc:946:33
    #2 HTTPManager_SignalThread_Test::TestBody()
       /ceph/src/test/rgw/test_http_manager.cc:132:10
  Indirect leak of 768 byte(s) in 32 object(s) allocated from:
    #0 operator new(unsigned long)
    #1 rgw_http_req_data::rgw_http_req_data()
       /ceph/src/rgw/rgw_http_client.cc:52:22
    #2 RGWHTTPManager::add_request(RGWHTTPClient*)
       /ceph/src/rgw/rgw_http_client.cc:946:37
  SUMMARY: AddressSanitizer: 17920 byte(s) leaked in 64 allocation(s).
Root cause: The rgw_http_req_data class uses reference counting
(inherits from RefCountedObject). When a request is unregistered,
unregister_request() calls get() to increment the refcount, expecting
a corresponding put() to be called later.
In manage_pending_requests(), unregistered requests are properly
handled with both _unlink_request() and put(). However, in the thread
cleanup code (reqs_thread_entry exit path), only _unlink_request() was
called without the matching put(), causing a reference count leak.
The fix adds the missing put() call in the thread cleanup code to match
the reference counting pattern used in manage_pending_requests().
Test results:
- Before: 17,920 bytes leaked in 64 allocations
- After: 0 leaks, unittest_http_manager passes with ASan
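The get()/put() imbalance described above can be modelled with a toy Python sketch. The class and functions here are hypothetical stand-ins, not Ceph's RefCountedObject or RGWHTTPManager; they only show why a cleanup path that skips the matching put() leaves the object alive.

```python
class RefCounted:
    """Toy reference-counted object (not Ceph's RefCountedObject)."""

    def __init__(self):
        self.nref = 1       # the creator holds the first reference
        self.freed = False

    def get(self):
        self.nref += 1
        return self

    def put(self):
        self.nref -= 1
        if self.nref == 0:
            self.freed = True  # stands in for freeing the object


def unregister(req, unregistered):
    # Take an extra reference; a later put() is expected to balance it.
    unregistered.append(req.get())


def thread_cleanup(unregistered, with_put):
    for req in unregistered:
        if with_put:
            req.put()  # the matching put() that the fix adds
    unregistered.clear()


def run(with_put):
    req = RefCounted()
    unregistered = []
    unregister(req, unregistered)
    req.put()  # the caller drops its own reference
    thread_cleanup(unregistered, with_put)
    return req.freed
```

In this model, run(False) leaves the object unfreed (the leak) and run(True) frees it.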
Ville Ojamo [Tue, 3 Feb 2026 06:28:12 +0000 (13:28 +0700)]
doc: unpin pip in admin/doc-read-the-docs.txt
7dd00ca introduced a proper fix for pip 25.3/PEP517 compatibility by
adding pyproject.toml files and the workaround in a65c46c is no longer
necessary. RTD builds with pip 25.3 and later work with the proper fix.
Remove the pinned pip in admin/doc-read-the-docs.txt and let RTD use the
default pip version.
Signed-off-by: Ville Ojamo <14869000+bluikko@users.noreply.github.com>
Shraddha Agrawal [Thu, 29 Jan 2026 04:28:00 +0000 (09:58 +0530)]
ceph-volume: support crimson osd binary
Prior to this commit, ceph-volume used a hardcoded OSD binary
to issue commands (e.g. to perform mkfs). This commit enables
ceph-volume to start supporting crimson OSDs.
A new argument, --osd-type, is introduced with the default value
classic. When this parameter is set to 'crimson', the ceph-osd-crimson
binary will be used to execute OSD commands.
This commit enables us to deploy both classic and crimson
type OSDs using cephadm. To enable this, a new feature,
osd_type, is added to DriverGroupSpec. Its default value
is classic, but it can also be set to crimson.
When this value is read by cephadm, the entrypoint is
changed from /usr/bin/ceph-osd to /usr/bin/ceph-osd-crimson.
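The selection described above amounts to a lookup from OSD type to binary path. A minimal Python sketch, using the two paths named in this commit message; the function and table names are hypothetical and this is not the actual ceph-volume or cephadm code.

```python
# Paths as given in the commit message; the mapping itself is illustrative.
OSD_BINARIES = {
    "classic": "/usr/bin/ceph-osd",
    "crimson": "/usr/bin/ceph-osd-crimson",
}


def osd_entrypoint(osd_type="classic"):
    """Return the OSD binary path for the given OSD type."""
    try:
        return OSD_BINARIES[osd_type]
    except KeyError:
        raise ValueError(f"unknown osd type: {osd_type!r}") from None
```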
- updates tearsheet component CSS to match the Carbon component
- adds loading state to submit button
- adds support for step validation when Angular components are used for steps rather than plain HTML templates
- adds step one of nvmeof
Ilya Dryomov [Fri, 30 Jan 2026 15:32:35 +0000 (16:32 +0100)]
qa/tasks/rbd_mirror_thrash: don't use random.randrange() on floats
This stopped working in Python 3.12:
Changed in version 3.12: Automatic conversion of non-integer types
is no longer supported. Calls such as randrange(10.0) and
randrange(Fraction(10, 1)) now raise a TypeError.
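A minimal Python sketch of two possible replacements for a randrange() call on float bounds. The variable names and values are hypothetical and are not taken from the rbd_mirror_thrash task; which replacement fits depends on whether a whole or fractional value is wanted.

```python
import random

# Hypothetical float-valued bounds, e.g. read from a task config.
min_delay = 1.0
max_delay = 10.0

# random.randrange(min_delay, max_delay) raises TypeError on Python 3.12+
# because the arguments are floats.

# Option 1: keep integer semantics by converting explicitly.
whole_seconds = random.randrange(int(min_delay), int(max_delay))

# Option 2: draw a float between the two bounds instead.
fractional_seconds = random.uniform(min_delay, max_delay)
```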
Ilya Dryomov [Tue, 11 Nov 2025 15:33:16 +0000 (16:33 +0100)]
qa/tasks/qemu: install genisoimage package
genisoimage is expected to be included in our base images but currently
isn't on Rocky 10. Since it's quite a niche thing, let's install the
package explicitly.
Ilya Dryomov [Thu, 29 Jan 2026 20:41:03 +0000 (21:41 +0100)]
qa/workunits/rbd: reduce randomized sleeps in live import tests
These tests were tuned for slower hardware than what we have now.
Currently "rbd migration execute" always finishes (successfully) before
the NBD server is killed.
Ilya Dryomov [Tue, 11 Nov 2025 20:39:58 +0000 (21:39 +0100)]
qa/valgrind.supp: make gcm_cipher_internal suppression more resilient
gcm_cipher_internal() and ossl_gcm_stream_final() make it to the stack
trace only on CentOS Stream 9. On Ubuntu 22.04 and Rocky 10, it looks
as follows:
Thread 4 msgr-worker-1:
Conditional jump or move depends on uninitialised value(s)
at 0x70A36D4: ??? (in /usr/lib64/libcrypto.so.3.2.2)
by 0x70A39A1: ??? (in /usr/lib64/libcrypto.so.3.2.2)
by 0x6F8A09C: EVP_DecryptFinal_ex (in /usr/lib64/libcrypto.so.3.2.2)
by 0xB498C1F: ceph::crypto::onwire::AES128GCM_OnWireRxHandler::authenticated_decrypt_update_final(ceph::buffer::v15_2_0::list&) (crypto_onwire.cc:271)
by 0xB4992D7: ceph::msgr::v2::FrameAssembler::disassemble_preamble(ceph::buffer::v15_2_0::list&) (frames_v2.cc:281)
by 0xB482D98: ProtocolV2::handle_read_frame_preamble_main(std::unique_ptr<ceph::buffer::v15_2_0::ptr_node, ceph::buffer::v15_2_0::ptr_node::disposer>&&, int) (ProtocolV2.cc:1149)
by 0xB475318: ProtocolV2::run_continuation(Ct<ProtocolV2>&) (ProtocolV2.cc:54)
by 0xB457012: AsyncConnection::process() (AsyncConnection.cc:495)
by 0xB49E61A: EventCenter::process_events(unsigned int, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*) (Event.cc:492)
by 0xB49EA9D: UnknownInlinedFun (Stack.cc:50)
by 0xB49EA9D: UnknownInlinedFun (invoke.h:61)
by 0xB49EA9D: UnknownInlinedFun (invoke.h:111)
by 0xB49EA9D: std::_Function_handler<void (), NetworkStack::add_thread(Worker*)::{lambda()#1}>::_M_invoke(std::_Any_data const&) (std_function.h:290)
by 0xBB11063: ??? (in /usr/lib64/libstdc++.so.6.0.33)
by 0x4F17119: start_thread (in /usr/lib64/libc.so.6)
The proposal to amend the existing suppression so that it's tied to the
specific callsite rather than libcrypto internals [1] received a thumbs
up from Radoslaw.