Shubha Jain [Mon, 16 Feb 2026 14:03:01 +0000 (19:33 +0530)]
mgr/nfs: improve cluster info implementation and fix deployment type logic
- Show placement details and daemon roles in cluster info output
- Add deployment type field showing standalone/active-passive/active-active
- Use orchestrator.DaemonDescriptionStatus.to_str() directly
- Use placement.to_json() for placement field
- Cache get_hosts() to avoid O(n) orchestrator calls
- Optimize ingress service lookup with direct query
- Fix safe access to daemon.ports to prevent IndexError
- Use explicit None checks for port values
- Return empty dict {} for placement instead of None
- Remove unnecessary wrapper methods and comments
- Fix flake8 issues and update tests
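The two port-related items above (safe access to daemon.ports and explicit None checks) can be illustrated with a minimal sketch. The helper names first_port() and describe_port() are hypothetical and are not taken from the mgr/nfs module; only the general pattern is shown.

```python
from typing import List, Optional


def first_port(ports: Optional[List[int]]) -> Optional[int]:
    """Return the first port, or None when the list is missing or empty.

    Indexing ports[0] directly would raise IndexError on an empty list
    and TypeError on None.
    """
    if not ports:
        return None
    return ports[0]


def describe_port(port: Optional[int]) -> str:
    # An explicit None check keeps a falsy value such as 0 from being
    # treated as "no port", which a bare `if port:` would do.
    if port is None:
        return "unknown"
    return str(port)
```

The same reasoning is behind returning {} rather than None for placement: callers can iterate or index the result without a separate None guard.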
Adam King [Thu, 19 Feb 2026 16:28:08 +0000 (11:28 -0500)]
qa/cephadm: start mgr-nfs-upgrade sequences from squid and tentacle
As main is what will become umbrella, we no longer
need the reef tests. I also added tests for the current
state of the branch, since we were only testing against
the initial release.
Adam King [Tue, 17 Feb 2026 18:56:36 +0000 (13:56 -0500)]
cephadm/samples: don't specify localhost as grafana addr
Grafana complains about this being set to localhost
and also appears to be able to figure out what to bind to
itself if this field is not set. In actual deployments
this would be set by the cephadm mgr module while the
test_cephadm workunit itself just directly passes
a JSON config.
Kushal Deb [Fri, 2 Jan 2026 09:34:15 +0000 (15:04 +0530)]
fix ast.ButOr -> ast.BitOr typo in test_remote_executables.py
Issue seen:
    def _names(node):
        if isinstance(node, ast.Name):
            return [node.id]
        if isinstance(node, ast.Attribute):
            vn = _names(node.value)
            return vn + [node.attr]
        if isinstance(node, ast.Call):
            return _names(node.func)
        if isinstance(node, ast.Constant):
            return [repr(node.value)]
        if isinstance(node, ast.JoinedStr):
            return [f"<JoinedStr: {node.values!r}>"]
        if isinstance(node, ast.Subscript):
            return [f"<Subscript: {node.value}{node.slice}>"]
        if isinstance(node, ast.BinOp):
            return [f"<BinaryOp: {_names(node.left)} {_names(node.op)} {_names(node.right)}"]
        if (
            isinstance(node, ast.Add)
            or isinstance(node, ast.Sub)
            or isinstance(node, ast.Mult)
            or isinstance(node, ast.Div)
            or isinstance(node, ast.FloorDiv)
            or isinstance(node, ast.Mod)
            or isinstance(node, ast.Pow)
            or isinstance(node, ast.LShift)
            or isinstance(node, ast.RShift)
>           or isinstance(node, ast.ButOr)
            or isinstance(node, ast.BitXor)
            or isinstance(node, ast.BitAnd)
            or isinstance(node, ast.MatMult)
        ):
E       AttributeError: module 'ast' has no attribute 'ButOr'. Did you mean: 'BitOr'?
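For illustration only, the operator check in the traceback can also be written as one isinstance() call against a tuple of node classes, which makes a misspelled class name fail at import time rather than on the first BinOp. This is a sketch of that alternative, not the code in test_remote_executables.py; BINARY_OPERATORS and operator_name() are hypothetical names.

```python
import ast

# The operator node classes from the check in the traceback, with
# ast.BitOr spelled correctly. A typo here raises AttributeError as
# soon as the module is imported.
BINARY_OPERATORS = (
    ast.Add, ast.Sub, ast.Mult, ast.Div, ast.FloorDiv, ast.Mod, ast.Pow,
    ast.LShift, ast.RShift, ast.BitOr, ast.BitXor, ast.BitAnd, ast.MatMult,
)


def operator_name(node: ast.AST) -> str:
    """Return the class name of a binary operator node."""
    if isinstance(node, BINARY_OPERATORS):
        return type(node).__name__
    raise TypeError(f"not a binary operator node: {node!r}")


# "a | b" parses to a BinOp whose op is an ast.BitOr instance.
expr = ast.parse("a | b", mode="eval").body
```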
Kushal Deb [Mon, 22 Dec 2025 12:58:29 +0000 (18:28 +0530)]
mgr/cephadm: Implement D3N L1 persistent datacache support for RGW
Add RGW D3N L1 persistent datacache support backed by host block devices.
Select devices deterministically per (service, host) with intra-service
sharing, forbid cross-service reuse, prepare/mount devices, and
bind-mount per-daemon cache directories into the container.
GitHub allows adding an instructions file to each repo
(.github/copilot-instructions.md) to improve the behavior
of Copilot Reviews and Agent.
These instructions can also be customized per path, filetype, etc.:
https://docs.github.com/en/copilot/how-tos/configure-custom-instructions/add-repository-instructions
This commit was authored through a GitHub Agent session: https://github.com/ceph/ceph/tasks/edeca07b-eabd-477c-917a-a18e72a0e2c2
This commit logs the HTTP error with its code and reason
in sessionservice_discover(), and logs the error code along with the
body in query() for 5xx responses.
node-proxy: encapsulate send logic in dedicated method
Move the "send data to mgr when inventory changed" logic from main()
into a dedicated method _try_send_update().
This flattens the reporter loop and keeps main() to a single call under
the lock.
- use warning for bad requests in the API,
when the thread is not alive, and for retry failures,
- use error for OOB load failures,
- use info for the backoff interval,
- use debug for send attempts and for member fetches.
This commit fixes mypy errors by adding explicit types for get_path
and the get_* getter methods, extending SystemBackend with
start/shutdown, and declaring _ca_temp_file on NodeProxyManager.
node-proxy: split out config, bootstrap and redfish logic
Refactor the config, bootstrap, redfish layer, and monitoring.
This commit:
- adds a config module (CephadmConfig, load_cephadm_config and
get_node_proxy_config) and protocols for api/reporter.
- extracts redfish logic to redfish.py
- adds a vendor registry with entrypoints.
- simplifies main() and NodeProxyManager().
This commit renames CONFIG to DEFAULTS, adds load_config() with
deep merge, refactors Config to use path + defaults, and makes the
node-proxy config path configurable via bootstrap JSON or env.
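A deep merge of user config over defaults can be sketched as below. This is a minimal illustration, not the load_config() implementation from the commit; deep_merge() and the keys in EXAMPLE_DEFAULTS are hypothetical.

```python
import copy
from typing import Any, Dict

# Hypothetical defaults, for illustration only.
EXAMPLE_DEFAULTS: Dict[str, Any] = {
    "reporter": {"interval": 60, "retries": 3},
    "logging": {"level": "INFO"},
}


def deep_merge(defaults: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict with overrides applied recursively over defaults.

    Nested dicts are merged key by key; any other value in overrides
    replaces the default. Neither input is modified.
    """
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged
```

With a shallow dict.update(), overriding only reporter.interval would drop reporter.retries; the recursive merge keeps the untouched defaults.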
node-proxy: introduce component spec registry and overrides for updates
This change introduces a single COMPONENT_SPECS dict and get_update_spec(component)
as the single source of truth for Redfish component update config (collection, path,
fields, attribute). To support hardware that uses different paths or attributes,
get_component_spec_overrides() allows overriding only those fields (via dataclasses.replace())
without duplicating the rest of the spec.
All _update_network, _update_power, etc. now call _run_update(component).
For instance, AtollonSystem uses this to set the power path to 'PowerSubsystem'.
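The override mechanism described above can be sketched with dataclasses.replace(). The ComponentSpec class, the apply_overrides() helper, and every field value except the 'PowerSubsystem' path are assumptions made for illustration; the actual COMPONENT_SPECS contents are not shown in this log.

```python
from dataclasses import dataclass, replace
from typing import Any, Dict, Tuple


@dataclass(frozen=True)
class ComponentSpec:
    # Field names follow the commit message; the types are assumptions.
    collection: str
    path: str
    fields: Tuple[str, ...]
    attribute: str


# Hypothetical base spec; only the component name "power" and the
# override value below come from the commit message.
BASE_SPECS: Dict[str, ComponentSpec] = {
    "power": ComponentSpec(
        collection="chassis",
        path="Power",
        fields=("Name", "Status"),
        attribute="PowerSupplies",
    ),
}


def apply_overrides(
    specs: Dict[str, ComponentSpec],
    overrides: Dict[str, Dict[str, Any]],
) -> Dict[str, ComponentSpec]:
    """Return new specs with only the overridden fields changed."""
    return {
        name: replace(spec, **overrides.get(name, {}))
        for name, spec in specs.items()
    }
```

A vendor class would then supply something like {"power": {"path": "PowerSubsystem"}} and inherit every other field from the base spec.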
mgr/cephadm: safe status/health access in node-proxy agent and inventory
This adds helpers in NodeProxyEndpoint and NodeProxyCache to safely
read status.health and status.state.
In NodeProxyEndpoint, methods _get_health_value() and _get_state_value()
are used in get_nok_members() to avoid KeyError on malformed data.
In NodeProxyCache, _get_health_value(), _has_health_value(),
_is_error_status(), and _is_unknown_status() are used in fullreport()
and when filtering 'non ok' members instead of accessing
status['status']['health'] inline.
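The idea behind these helpers can be sketched as follows. The function names mirror the commit message, but the bodies are an assumption about how such accessors might look, not the code in NodeProxyEndpoint or NodeProxyCache; is_ok() is a hypothetical addition.

```python
from collections.abc import Mapping
from typing import Any, Optional


def get_health_value(member: Any) -> Optional[str]:
    """Return member['status']['health'], or None if the data is malformed."""
    if not isinstance(member, Mapping):
        return None
    status = member.get("status")
    if not isinstance(status, Mapping):
        return None
    health = status.get("health")
    return health if isinstance(health, str) else None


def is_ok(member: Any) -> bool:
    # Missing or malformed health is treated as "not ok" instead of
    # raising KeyError or TypeError.
    health = get_health_value(member)
    return health is not None and health.lower() == "ok"
```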
node-proxy: narrow build_data exception handling and re-raise
This commit catches only KeyError, TypeError, and
AttributeError in build_data() instead of Exception, and
re-raises after logging so callers get the actual error.
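The pattern of catching a narrow set of exceptions, logging, and re-raising looks roughly like this. build_entry() and its input keys are hypothetical stand-ins for build_data(), whose body is not shown in this log.

```python
import logging
from typing import Any, Dict

log = logging.getLogger("node_proxy_example")


def build_entry(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Build one inventory entry from raw data (illustrative only)."""
    try:
        return {"name": raw["Name"], "health": raw["Status"]["Health"]}
    except (KeyError, TypeError, AttributeError) as e:
        # Log for diagnosis, then re-raise the original exception with a
        # bare `raise` so the caller sees the real error and traceback.
        log.error("could not build entry: %r", e)
        raise
```

Unrelated failures such as MemoryError or KeyboardInterrupt are no longer intercepted, and the caller decides how to handle the data error.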
node-proxy: refactor Endpoint/EndpointMgr and fix chassis paths
This commit refactors EndpointMgr and Endpoint to use explicit dicts
instead of dynamic attributes. It also fixes member path filtering
so chassis endpoints use Chassis paths.
node-proxy: reduce log verbosity for missing optional fields
Change missing field logging from warning to debug level in
RedfishDellSystem, as missing optional fields can be expected behavior
and don't require warning level logging.
Shraddha Agrawal [Wed, 11 Feb 2026 14:23:39 +0000 (19:53 +0530)]
cephadm: add tests for seastore support
This commit adds the following tests:
1. cephadm: JSON roundtrip of a spec with objectstore=seastore.
2. cephadm: validation checks for objectstore values.
3. cephadm to ceph-volume: cmd checks if objectstore=seastore is set.
Kushal Deb [Mon, 16 Feb 2026 14:42:20 +0000 (20:12 +0530)]
cephadm: reapply hugepages for nvmeof at service start
NVMeoF gateways (SPDK) require host hugepages (vm.nr_hugepages + /dev/hugepages).
After a power-cycle some nodes boot with hugepages=0 (and/or the cephadm sysctl
drop-in under /etc/sysctl.d is missing/not applied), causing SPDK to fail and the
nvmeof container to crash-loop until the service is redeployed.
Cephadm previously applied hugepages only during deploy/reconfig via install_sysctl().
Normal boot/start uses the generated systemd unit which runs unit.run and does not
re-apply sysctl settings.
Added a pre-start step for nvmeof to set vm.nr_hugepages to the configured value
(from spdk_huge_pages, defaulting to 4096) before launching the container, so the
service self-heals on reboot/service restart.
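As a rough sketch of the pre-start step, the helper below builds a sysctl command from the configured value, falling back to the default of 4096 named above. The function name hugepages_command() is hypothetical, and how cephadm actually wires the step into the systemd unit is not shown here.

```python
from typing import List, Optional

DEFAULT_HUGE_PAGES = 4096  # default stated in the commit message


def hugepages_command(spdk_huge_pages: Optional[int] = None) -> List[str]:
    """Return the sysctl invocation that sets vm.nr_hugepages."""
    # Explicit None check: only a missing value falls back to the default.
    pages = DEFAULT_HUGE_PAGES if spdk_huge_pages is None else spdk_huge_pages
    if pages < 0:
        raise ValueError("hugepage count must not be negative")
    return ["sysctl", "-w", f"vm.nr_hugepages={pages}"]
```

Running the returned command before the container starts would reapply the setting on every boot or restart, independent of the drop-in under /etc/sysctl.d.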
Shubha Jain [Tue, 6 Jan 2026 15:19:27 +0000 (20:49 +0530)]
mgr/orchestrator: default NFS ingress to haproxy-protocol mode
Change default ingress mode from haproxy-standard to haproxy-protocol
to preserve client IP addresses for proper IP-level export restrictions
in NFS Ganesha.
Shweta Bhosale [Thu, 22 Jan 2026 10:09:41 +0000 (15:39 +0530)]
mgr/cephadm: allow colocation for NFS daemons to support active-active mode
The spec gains a colocation_ports field that accepts ports for colocated daemons.