node-proxy: encapsulate send logic in dedicated method
Move the "send data to mgr when inventory changed" logic from main()
into a dedicated method _try_send_update().
This flattens the reporter loop and keeps main() to a single call under
the lock.
- use warning for bad request in the API,
when thread is not alive and for retry failure,
- use error for OOB load failure,
- use info for backoff interval,
- use debug in send attempts and for member fetch
this commit fixes mypy errors by adding explicit types for get_path
and get_* getters methods, extending SystemBackend with
start/shutdown and declaring _ca_temp_file on NodeProxyManager
node-proxy: split out config, bootstrap and redfish logic
refactor config, bootstrap, redfish layer, and monitoring:
this:
- adds a config module (CephadmCofnig, load_cephadm_config and
get_node_proxy_config) and protocols for api/reporter.
- extracts redfish logic to redfish.py
- adds a vendor registry with entrypoints.
- simplifies main() and NodeProxyManager().
This commit renames CONFIG to DEFAULTS and add load_config() with
deep merge, refactor Config to use path + defaults and makes
node-proxy config path configurable via bootstrap JSON or env.
node-proxy: introduce component spec registry and overrides for updates
This change introduces a single COMPONENT_SPECS dict and get_update_spec(component)
as the single source of truth for RedFish component update config (collection, path,
fields, attribute). To support hardware that uses different paths or attributes,
get_component_spec_overrides() allows overriding only those fields (via dataclasses.replace())
without duplicating the rest of the spec.
All _update_network, _update_power, etc. now call _run_update(component).
For instance, AtollonSystem uses this to set the power path to 'PowerSubsystem'.
mgr/cephadm: safe status/health access in node-proxy agent and inventory
This adds helpers in NodeProxyEndpoint and NodeProxyCache to safely
read status.health and status.state.
In NodeProxyEndpoint, methods _get_health_value() and _get_state_value()
are used in get_nok_members() to avoid KeyError on malformed data.
In NodeProxyCache, _get_health_value(), _has_health_value(),
_is_error_status(), and _is_unknown_status() are used in fullreport()
and when filtering 'non ok' members instead of accessing
status['status']['health'] inline.
node-proxy: narrow build_data exception handling and re-raise
With this commit, it catches only KeyError, TypeError, and
AttributeError in build_data() instead of Exception, and
re-raise after logging so callers get the actual error.
node-proxy: refactor Endpoint/EndpointMgr and fix chassis paths
This commit refactors EndpointMgr and Endpoint to use explicit dicts
instead of dynamic attributes. It also fixes member path filtering
so chassis endpoints use Chassis paths.
node-proxy: reduce log verbosity for missing optional fields
Change missing field logging from warning to debug level in
RedfishDellSystem, as missing optional fields can be expected behavior
and and doesn't require warning level logging.
Kefu Chai [Fri, 9 Jan 2026 23:53:29 +0000 (07:53 +0800)]
rgw: fix memory leak in RGWHTTPManager thread cleanup
Fix memory leak detected by AddressSanitizer in unittest_http_manager.
The test was failing with ASan enabled due to rgw_http_req_data objects
not being properly cleaned up when the HTTP manager thread exits.
ASan reported the following leaks:
Direct leak of 17152 byte(s) in 32 object(s) allocated from:
#0 operator new(unsigned long)
#1 RGWHTTPManager::add_request(RGWHTTPClient*)
/ceph/src/rgw/rgw_http_client.cc:946:33
#2 HTTPManager_SignalThread_Test::TestBody()
/ceph/src/test/rgw/test_http_manager.cc:132:10
Indirect leak of 768 byte(s) in 32 object(s) allocated from:
#0 operator new(unsigned long)
#1 rgw_http_req_data::rgw_http_req_data()
/ceph/src/rgw/rgw_http_client.cc:52:22
#2 RGWHTTPManager::add_request(RGWHTTPClient*)
/ceph/src/rgw/rgw_http_client.cc:946:37
SUMMARY: AddressSanitizer: 17920 byte(s) leaked in 64 allocation(s).
Root cause: The rgw_http_req_data class uses reference counting
(inherits from RefCountedObject). When a request is unregistered,
unregister_request() calls get() to increment the refcount, expecting
a corresponding put() to be called later.
In manage_pending_requests(), unregistered requests are properly
handled with both _unlink_request() and put(). However, in the thread
cleanup code (reqs_thread_entry exit path), only _unlink_request() was
called without the matching put(), causing a reference count leak.
The fix adds the missing put() call in the thread cleanup code to match
the reference counting pattern used in manage_pending_requests().
Test results:
- Before: 17,920 bytes leaked in 64 allocations
- After: 0 leaks, unittest_http_manager passes with ASan
Ville Ojamo [Tue, 3 Feb 2026 06:28:12 +0000 (13:28 +0700)]
doc: unpin pip in admin/doc-read-the-docs.txt
7dd00ca introduced a proper fix for pip 25.3/PEP517 compatibility by
adding pyproject.toml files and the workaround in a65c46c is no longer
necessary. RTD builds with pip 25.3 and later work with the proper fix.
Remove the pinned pip in admin/doc-read-the-docs.txt and let RTD use the
default PIP version.
Signed-off-by: Ville Ojamo <14869000+bluikko@users.noreply.github.com>
Shraddha Agrawal [Thu, 29 Jan 2026 04:28:00 +0000 (09:58 +0530)]
ceph-volume: support crimson osd binary
Prior to this commit, ceph-volume was using hardcoded OSD binary
to issue commands (eg - to perform mkfs, etc). This commit enables
ceph-volume to start supporting crimson OSDs.
A new argument, --osd-type is introduced with the default value
classic. When this parameter is set to 'crimson', ceph-osd-crimson
binary will be used to execute OSD commands.
This commit enables us to deploy both classic and crimson
type OSDs using cephadm. To enable the same, a new feature,
osd_type is added to DriverGroupSpec. The default value for
the same is classic, but can also be set to crimson.
When this value is read by cephadm, the entrypoint is
changed from /usr/bin/ceph-osd to /usr/bin/ceph-osd-crimson.
- updates tearsheet component css to match with carbon component
- adds laoding state to submit button
- adds support for step validation when angualr component are use for steps rather than plain html templates
- adds step one of nvmeof
Ilya Dryomov [Fri, 30 Jan 2026 15:32:35 +0000 (16:32 +0100)]
qa/tasks/rbd_mirror_thrash: don't use random.randrange() on floats
This stopped working in Python 3.12:
Changed in version 3.12: Automatic conversion of non-integer types
is no longer supported. Calls such as randrange(10.0) and
randrange(Fraction(10, 1)) now raise a TypeError.
Ilya Dryomov [Tue, 11 Nov 2025 15:33:16 +0000 (16:33 +0100)]
qa/tasks/qemu: install genisoimage package
genisoimage is expected to be included in our base images but currently
isn't on Rocky 10. Since it's quite a niche thing, let's install the
package explicitly.
Ilya Dryomov [Thu, 29 Jan 2026 20:41:03 +0000 (21:41 +0100)]
qa/workunits/rbd: reduce randomized sleeps in live import tests
These tests were tuned for slower hardware than what we have now.
Currently "rbd migration execute" always finishes (successfully) before
the NBD server is killed.
Ilya Dryomov [Tue, 11 Nov 2025 20:39:58 +0000 (21:39 +0100)]
qa/valgrind.supp: make gcm_cipher_internal suppression more resilient
gcm_cipher_internal() and ossl_gcm_stream_final() make it to the stack
trace only on CentOS Stream 9. On Ubuntu 22.04 and Rocky 10, it looks
as follows:
Thread 4 msgr-worker-1:
Conditional jump or move depends on uninitialised value(s)
at 0x70A36D4: ??? (in /usr/lib64/libcrypto.so.3.2.2)
by 0x70A39A1: ??? (in /usr/lib64/libcrypto.so.3.2.2)
by 0x6F8A09C: EVP_DecryptFinal_ex (in /usr/lib64/libcrypto.so.3.2.2)
by 0xB498C1F: ceph::crypto::onwire::AES128GCM_OnWireRxHandler::authenticated_decrypt_update_final(ceph::buffer::v15_2_0::list&) (crypto_onwire.cc:271)
by 0xB4992D7: ceph::msgr::v2::FrameAssembler::disassemble_preamble(ceph::buffer::v15_2_0::list&) (frames_v2.cc:281)
by 0xB482D98: ProtocolV2::handle_read_frame_preamble_main(std::unique_ptr<ceph::buffer::v15_2_0::ptr_node, ceph::buffer::v15_2_0::ptr_node::disposer>&&, int) (ProtocolV2.cc:1149)
by 0xB475318: ProtocolV2::run_continuation(Ct<ProtocolV2>&) (ProtocolV2.cc:54)
by 0xB457012: AsyncConnection::process() (AsyncConnection.cc:495)
by 0xB49E61A: EventCenter::process_events(unsigned int, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*) (Event.cc:492)
by 0xB49EA9D: UnknownInlinedFun (Stack.cc:50)
by 0xB49EA9D: UnknownInlinedFun (invoke.h:61)
by 0xB49EA9D: UnknownInlinedFun (invoke.h:111)
by 0xB49EA9D: std::_Function_handler<void (), NetworkStack::add_thread(Worker*)::{lambda()#1}>::_M_invoke(std::_Any_data const&) (std_function.h:290)
by 0xBB11063: ??? (in /usr/lib64/libstdc++.so.6.0.33)
by 0x4F17119: start_thread (in /usr/lib64/libc.so.6)
The proposal to amend the existing suppression so that it's tied to the
specific callsite rather than libcrypto internals [1] received a thumbs
up from Radoslaw.
Roland Sommer [Fri, 30 Jan 2026 07:54:49 +0000 (08:54 +0100)]
debian: package mgr/smb in ceph-mgr-modules-core
The `BaseController` auto-imports the packaged `mgr/dashboard/controllers/smb.py`
file, which in turn wants to import `smb.enums` etc. which is part of the `smb`
package which is missing from `debian/ceph-mgr-modules-core.install`, thus
missing in the package. The missing module causes an exception
`ModuleNotFoundError: No module named 'smb'` on mgr instances when running a
ceph tentacle cluster installed from debian packages.
See: https://tracker.ceph.com/issues/74268 Signed-off-by: Roland Sommer <rol@ndsommer.de>
Afreen Misbah [Wed, 28 Jan 2026 09:59:08 +0000 (15:29 +0530)]
mgr/dashboard: fetch all namespaces in a gateway group
- adds a new API /api/gateway_group/{group}/namespace
- updates tests
- needed for UI flows and in general to fetch all namespaces, could not change existing API due to the maintenence of backward compatibility
- in a followup PR will add server side pagination
Ville Ojamo [Fri, 30 Jan 2026 04:47:40 +0000 (11:47 +0700)]
doc/dev: add sequence diagrams back to health-reports.rst
The sequence diagrams were removed in ce96ddd because they were causing
issues. Add them back as SVG images. Include as comments the source code
used to generate the diagrams.
Signed-off-by: Ville Ojamo <14869000+bluikko@users.noreply.github.com>
John Mulligan [Thu, 29 Jan 2026 23:28:44 +0000 (18:28 -0500)]
Merge pull request #65632 from phlogistonjohn/jjm-smb-hosts-allow
smb: support shares equivalent for hosts allow
Reviewed-by: Anthony D Atri <anthony.datri@gmail.com> Reviewed-by: Anoop C S <anoopcs@cryptolab.net> Reviewed-by: Shwetha Acharya <sacharya@redhat.com> Reviewed-by: Avan Thakkar <athakkar@redhat.com> Reviewed-by: Adam King <adking@redhat.com>
John Mulligan [Fri, 9 Jan 2026 16:25:43 +0000 (11:25 -0500)]
qa/workunits/smb: make the runner script easier to use manually
When testing the tests it can help speed things up to avoid
recreating the virtualenv, allow an env var SMB_REUSE_VENV=<path>
to supply a specific virtual env dir to (re)use.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
John Mulligan [Thu, 8 Jan 2026 18:42:14 +0000 (13:42 -0500)]
qa/suites/orch/cephadm: enable hosts_access tests
Enable the hosts_access tests when running deploy_smb_mgr_basic.yaml,
deploy_smb_mgr_domain.yaml, deploy_smb_mgr_res_basic.yaml, or
deploy_smb_mgr_res_dom.yaml.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
John Mulligan [Thu, 8 Jan 2026 18:45:43 +0000 (13:45 -0500)]
qa/workunits/smb: add tests for hosts_access field
The recently added hosts_access field allows a share to be configured
to allow or deny hosts by IP or network. The new module reconfigures
a share to attempt a small set of access scenarios with the hosts_access
field.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
John Mulligan [Wed, 19 Nov 2025 22:26:27 +0000 (17:26 -0500)]
qa/workunits/smb: add utility module for cephadm shell commands
Add a helper module that makes it a bit cleaner and easier to
find and interact with the cluster's 'admin node' the node where
we can run `cephadm shell` and commands within that shell.
This will allow us to make modifications to smb resources via
the ceph command and JSON in order to test various features.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
John Mulligan [Fri, 9 Jan 2026 14:32:56 +0000 (09:32 -0500)]
qa/workunits/smb: make the smb_cfg fixture module scoped
This means the file will only be read when pytest changes modules.
This also allows this fixture to be used with other fixtures at the
module or scope "higher" than the function scope.
John Mulligan [Fri, 9 Jan 2026 16:12:46 +0000 (11:12 -0500)]
qa/tasks: add client node info to smb workunit config dump
When generating the big ball of config JSON that helps define
parameters for the smb tests in the workunit add client "node"
info as well.
Add a function to avoid repeating the logic of getting node
info from the teuthology remote object.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
John Mulligan [Wed, 7 Jan 2026 23:02:21 +0000 (18:02 -0500)]
qa/tasks: embed use of ssh_keys task in smb workunit
Automatically use the ssh_keys tasks in the smb workunit task.
It can be disabled by passing false to `ssh_keys:` config key.
This allows the node running the tests to ssh into the node where
cephadm is installed in order to execute commands within
the cephadm shell.
Signed-off-by: John Mulligan <jmulligan@redhat.com>