git.apps.os.sepia.ceph.com Git - ceph.git/log
Guillaume Abrioux [Wed, 24 Jan 2024 15:08:14 +0000 (15:08 +0000)]
mgr/cephadm: add a new config option 'oob_default_addr'
The default value (169.254.1.1) is the default
address of the 'OS to iDrac pass-through' interface.
Given that node-proxy reaches the RedFish API through this interface,
users no longer need to pass that address when providing the host spec
at bootstrap time.
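As a rough illustration of the fallback described here, the lookup could look like the sketch below. The function name `resolve_oob_addr` and the constant name are hypothetical; only the default address comes from this commit message.

```python
from typing import Optional

# Default taken from the commit message: the address of the
# 'OS to iDrac pass-through' interface.
OOB_DEFAULT_ADDR = '169.254.1.1'


def resolve_oob_addr(spec_addr: Optional[str],
                     default: str = OOB_DEFAULT_ADDR) -> str:
    """Return the address given in the host spec, or the default if omitted.

    Hypothetical helper, not the actual cephadm code.
    """
    return spec_addr if spec_addr else default


print(resolve_oob_addr(None))
print(resolve_oob_addr('10.10.10.12'))
```

An address given explicitly in the host spec still takes precedence in this sketch, so existing specs would keep working unchanged.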
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
b09fd672c9838a091d6779047f3292acbb62070d )
Guillaume Abrioux [Tue, 23 Jan 2024 09:41:39 +0000 (09:41 +0000)]
node-proxy: collect `LocationIndicatorActive` property (storage)
This makes node-proxy collect the `LocationIndicatorActive`
property for the storage component.
This can be needed for the Blinkenlight feature.
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
d4cfc5a96c9e6d04dedb21e7788325d7b00c533a )
Guillaume Abrioux [Tue, 23 Jan 2024 09:36:00 +0000 (09:36 +0000)]
node-proxy: add new attribute to BaseRedfishSystem()
This adds `self.component_list()` in order to parametrize
which categories the agent will collect.
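One possible reading of this change, as a minimal sketch: the list of categories drives which update methods get called. The class `SystemSketch`, its `_update_*` methods and the `data` dict are hypothetical stand-ins, not the real `BaseRedfishSystem` code.

```python
from typing import Dict, List, Optional


class SystemSketch:
    """Hypothetical stand-in for BaseRedfishSystem."""

    def __init__(self, component_list: Optional[List[str]] = None) -> None:
        # Assumed default; the real default list may differ.
        self.component_list = component_list or ['memory', 'storage']
        self.data: Dict[str, str] = {}

    def _update_memory(self) -> None:
        self.data['memory'] = 'collected'

    def _update_storage(self) -> None:
        self.data['storage'] = 'collected'

    def update(self) -> None:
        # Only the categories listed in component_list are collected.
        for component in self.component_list:
            getattr(self, f'_update_{component}')()


system = SystemSketch(['storage'])
system.update()
print(system.data)
```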
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
b49216bf8bb01fc8f11f4575cca644bd3ead5f5a )
Guillaume Abrioux [Mon, 15 Jan 2024 14:09:23 +0000 (14:09 +0000)]
node-proxy: add packaging related changes
This adds the required changes to build an RPM of node-proxy.
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
940ce782b5843ef1c0a80a74c5ad2af3f635a8b9 )
Guillaume Abrioux [Fri, 12 Jan 2024 09:15:02 +0000 (09:15 +0000)]
node-proxy: reduce log level in reporter agent
The following messages get logged very frequently, although
they carry little useful information in a normal situation:
```
2024-01-12 09:09:40,604 - reporter - INFO - data ready to be sent to the mgr.
2024-01-12 09:09:40,604 - reporter - INFO - no diff, not sending data to the mgr.
2024-01-12 09:10:15,022 - reporter - INFO - data ready to be sent to the mgr.
2024-01-12 09:10:15,022 - reporter - INFO - no diff, not sending data to the mgr.
...
```
This commit changes the log level to DEBUG.
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
b4091600f696fa8c3577876e071af3d53024f56f )
Guillaume Abrioux [Fri, 12 Jan 2024 09:11:21 +0000 (09:11 +0000)]
node-proxy: fix a thread/locking issue
This `sleep(5)` should be initiated *after* the lock is released.
Otherwise, it can cause trouble for the reporter loop, which may
never acquire the lock.
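A minimal sketch of the pattern, assuming two loops that share one `threading.Lock`. The names `updater`, `reporter` and `run_demo` are hypothetical and only illustrate where the sleep sits relative to the lock.

```python
import threading
import time
from typing import List

lock = threading.Lock()
events: List[str] = []


def updater(iterations: int, pause: float) -> None:
    """Stand-in for the update loop (hypothetical name)."""
    for _ in range(iterations):
        with lock:
            events.append('update')
        # The lock is released here, *before* sleeping, so the
        # reporter gets a window in which it can acquire it.
        time.sleep(pause)


def reporter(iterations: int, pause: float) -> None:
    """Stand-in for the reporter loop (hypothetical name)."""
    for _ in range(iterations):
        with lock:
            events.append('report')
        time.sleep(pause)


def run_demo(iterations: int = 3, pause: float = 0.01) -> List[str]:
    events.clear()
    threads = [
        threading.Thread(target=updater, args=(iterations, pause)),
        threading.Thread(target=reporter, args=(iterations, pause)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return list(events)


print(run_demo())
```

Had the `time.sleep()` call been inside the `with lock:` block, the updater would hold the lock for almost its entire loop iteration, leaving the other thread very little chance to run.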
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
06a4a637b5988a1b6d7bae5d74ae140ff9ba83b6 )
Guillaume Abrioux [Fri, 12 Jan 2024 09:09:15 +0000 (09:09 +0000)]
node-proxy: address a typo
While checking the logs, I noticed the following message:
```
2024-01-12 09:08:03,751 - reporter - INFO - Reporter url set to https:10.10.10.11:7150/node-proxy/data
```
Although this is only a cosmetic issue as this variable
is only used for logging messages, let's fix it.
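Judging from the log line above, the `//` after the scheme was missing. A sketch of how such a URL could be built correctly is shown here; the helper name `reporter_url` and its parameters are hypothetical.

```python
def reporter_url(addr: str, port: int,
                 endpoint: str = '/node-proxy/data') -> str:
    """Build the reporter URL, including the '//' after the scheme."""
    return f'https://{addr}:{port}{endpoint}'


print(reporter_url('10.10.10.11', 7150))
```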
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
1c4a212eb8d9608630c518cbbf46ab97051b1bc0 )
Guillaume Abrioux [Mon, 15 Jan 2024 12:38:39 +0000 (12:38 +0000)]
node-proxy: make it a separate daemon
The current implementation requires the inclusion of all the recent
modifications in the cephadm binary, which won't be backported.
Since we need the node-proxy code backported to reef, let's move the
code and make it a separate daemon.
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
Co-authored-by: Adam King <adking@redhat.com>
(cherry picked from commit
7e6bc179ae7e0d633bd63086775002182c861d3f )
Guillaume Abrioux [Wed, 17 Jan 2024 08:47:36 +0000 (08:47 +0000)]
node-proxy: rename attribute and class
This renames the mgr's NodeProxyCache attribute from
`self.node_proxy` to `self.node_proxy_cache` and the
class `NodeProxy` in agent.py to
`NodeProxyEndpoint` to make it clearer and avoid confusion.
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
c62d1c82cf6155aba5e75e88ff3390ed5288e758 )
Guillaume Abrioux [Tue, 19 Dec 2023 09:23:42 +0000 (09:23 +0000)]
node-proxy: enhance debug log messages for locking operations
This commit updates the debug log messages in the BaseRedfishSystem
and Reporter classes. The adjustments made enhance the clarity and
precision of the messages by specifically identifying acquired
and released locks, detailing their context, thereby improving the
understanding of the control flow during locking operations
in these components.
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
e68dceb1d2d6b4e6871c77465e1e23f2e726f84c )
Guillaume Abrioux [Tue, 19 Dec 2023 09:14:31 +0000 (09:14 +0000)]
node-proxy: explicitly set NodeProxy's attributes
The current logic using `setattr()` makes mypy complain:
"NodeProxy" has no attribute "xxx"
Using `self.__dict__['xxx']` addresses this mypy error, but the
downside is that the code becomes less clear and less readable.
Explicitly setting the different attributes makes the code clearer
and more readable.
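To illustrate the difference, here is a small sketch contrasting the two styles. Both classes and the attribute names `host` and `port` are hypothetical examples, not the real NodeProxy attributes.

```python
from typing import Any


class DynamicSketch:
    """Attributes created with setattr(): invisible to mypy."""

    def __init__(self, **kw: Any) -> None:
        for key, value in kw.items():
            setattr(self, key, value)


class ExplicitSketch:
    """Attributes declared one by one: mypy knows their names and types."""

    def __init__(self, **kw: Any) -> None:
        self.host: str = kw.get('host', '')
        self.port: int = kw.get('port', 443)


explicit = ExplicitSketch(host='node1')
print(explicit.host, explicit.port)
```

With the first class, an access such as `DynamicSketch(host='node1').host` works at runtime but is reported by mypy as an unknown attribute; the second form avoids that.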
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
e71bf838428c297075df5515c342c0db0a9e31e3 )
Guillaume Abrioux [Mon, 18 Dec 2023 14:26:04 +0000 (14:26 +0000)]
cephadm/tests: add pyyaml dependency
node-proxy requires this dependency, so it needs to be added as a
dependency for tox testing.
Typical failure:
```
ImportError while importing test module '/root/ceph/src/cephadm/tests/test_agent.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/lib64/python3.9/importlib/__init__.py:127: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
tests/test_agent.py:10: in <module>
_cephadm = import_cephadm()
tests/fixtures.py:14: in import_cephadm
import cephadm as _cephadm
cephadm.py:32: in <module>
from cephadmlib.node_proxy.main import NodeProxy
cephadmlib/node_proxy/main.py:2: in <module>
from .redfishdellsystem import RedfishDellSystem
cephadmlib/node_proxy/redfishdellsystem.py:2: in <module>
from .baseredfishsystem import BaseRedfishSystem
cephadmlib/node_proxy/baseredfishsystem.py:2: in <module>
from .basesystem import BaseSystem
cephadmlib/node_proxy/basesystem.py:2: in <module>
from .util import Config
cephadmlib/node_proxy/util.py:2: in <module>
import yaml
E ModuleNotFoundError: No module named 'yaml'
```
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
6e7ea5172ac489c01e0a073acf869bcf6982a2b4 )
Guillaume Abrioux [Thu, 7 Dec 2023 14:20:43 +0000 (14:20 +0000)]
node-proxy: send oob management requests to the MgrListener()
Note that this won't be true out-of-band management.
In the case where the host hangs, this won't work. The oob
management should be reached directly, but most of the time
the oob network is isolated. The idea is to send queries to
the tcp server exposed by the cephadm agent (MgrListener) so it
can itself send queries to the redfish API using the IP address
exposed on the OS.
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
323c8cb0fbee134b11267fdf1e2cbc26cda8b08a )
Guillaume Abrioux [Wed, 6 Dec 2023 15:09:44 +0000 (15:09 +0000)]
cephadm: add `types-PyYAML` dependency in mypy testing
In order to address the following error:
```
cephadmlib/node_proxy/util.py:2: error: Library stubs not installed for "yaml" (or incompatible with Python 3.9)
cephadmlib/node_proxy/util.py:2: note: Hint: "python3 -m pip install types-PyYAML"
cephadmlib/node_proxy/util.py:2: note: (or run "mypy --install-types" to install all missing stub packages)
cephadmlib/node_proxy/util.py:2: note: See https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-imports
```
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
47e7d3ddac1fd61b57149e5e4305bb9e819ae52e )
Guillaume Abrioux [Wed, 6 Dec 2023 15:01:29 +0000 (15:01 +0000)]
node-proxy: address flake8 errors in tests
This addresses a lot of flake8 errors in node-proxy tests:
E121 continuation line under-indented for hanging indent
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
f2c809e33f4999c4c64c58112cd94835a3b4ba24 )
Guillaume Abrioux [Wed, 6 Dec 2023 14:25:28 +0000 (14:25 +0000)]
node-proxy: move the output formatting logic to orchestrator
Implementing this in the cephadm module doesn't follow the general idea
of the orchestrator interface. This is where the output formatting should
be done so let's move the logic to the orchestrator module.
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
aa170850b8e7f63120169b89e39b21bee2c5287e )
Guillaume Abrioux [Wed, 6 Dec 2023 12:27:46 +0000 (12:27 +0000)]
node-proxy: address a typing issue in agent.NodeProxy.query()
The current logic supports str and bytes types for parameter
`data`. This doesn't make sense, let's drop this logic.
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
6cdb6f65b4ab84fcc5484ee7c6b940dd27b29587 )
Guillaume Abrioux [Fri, 1 Dec 2023 08:56:23 +0000 (08:56 +0000)]
node-proxy: address flake8 'Q000' warnings
This addresses the flake8 warning 'Q000':
`Q000 Double quotes found but single quotes preferred`
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
6cd42b73cfa843301ac8f58fe4f39eaf0b855b66 )
Guillaume Abrioux [Fri, 1 Dec 2023 08:18:25 +0000 (08:18 +0000)]
node-proxy: code change for hdd blinkenlight pre-requisites
This is mainly for anticipating the case where hdd blinkenlight via RedFish
works (testing has to be done). This introduces the required changes so the
endpoint `/led` can support blinkenlight for both chassis and disks.
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
febfe0bf7588705785047bec49bf1a970ce180eb )
Guillaume Abrioux [Fri, 1 Dec 2023 08:11:31 +0000 (08:11 +0000)]
node-proxy: Add a `NodeProxyManager` class
The current approach with `init_node_proxy()` and `node_proxy_loop_check()`
is 'cumbersome' and gives the heebie-jeebies.
Sub-classing `Thread()` makes the code a bit clearer and more readable.
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
2a840d7ce4d64dd38f7aea381207bbecfef629cd )
Guillaume Abrioux [Fri, 1 Dec 2023 08:03:58 +0000 (08:03 +0000)]
cephadm: gracefully shutdown the agent prior to removing
When the agent is removed, the daemon is abruptly stopped.
Since the node-proxy logic runs from within the cephadm agent,
it leaves an active RedFish session. The idea is to gracefully
shutdown the agent so node-proxy can catch that event and make sure
it closes the current active RedFish session prior to shutting down.
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
79bfe642001a7f9e1da28f987d1edb45174f6e86 )
Guillaume Abrioux [Wed, 29 Nov 2023 13:44:55 +0000 (13:44 +0000)]
orch/cephadm: add json format support to `ceph orch hardware`
This adds `--format json` option support to the `ceph orch hardware` CLI
command.
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
3a38755aef724d3039dc303a210b0972f7a71e63 )
Guillaume Abrioux [Tue, 28 Nov 2023 16:28:46 +0000 (16:28 +0000)]
node-proxy: update the data structure for summary report
This extends the current data structure for the 'summary' report.
It adds `sn` (serial number information) and the `firmwares` dict
to the current data structure.
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
61d07e0a441aafd84a463868f777d6091f6e92fe )
Guillaume Abrioux [Tue, 28 Nov 2023 13:17:47 +0000 (13:17 +0000)]
node-proxy: drop local API
This was intended to address the case where the Ceph
manager can't talk directly to the oob management tool because
of network restrictions (subnets not interconnected, etc.).
If for any reason the host is stuck or unreachable, that local API won't
be helpful anyway; as a result, any action the Ceph mgr would be asked
to perform on the node would fail.
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
3607a305cf629f54a8c3c3f52e56d41210c21d28 )
Guillaume Abrioux [Tue, 28 Nov 2023 08:05:47 +0000 (08:05 +0000)]
node-proxy: change 'idrac' terminology
The 'idrac' terminology is too specific, let's change this
to something more generic.
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
a69b267cb9c425facf6fe68daa08993cf81d0816 )
Guillaume Abrioux [Thu, 23 Nov 2023 16:08:18 +0000 (16:08 +0000)]
node-proxy: raise HTTPError 404 error when no host is found
Raise a 404 HTTPError when the different endpoints
are passed a nonexistent hostname.
Otherwise the code will fail with a `KeyError` exception.
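The idea can be sketched without cherrypy by using a local stand-in exception. `HTTPErrorSketch` and `get_host_data` are hypothetical names; the real code raises cherrypy's HTTPError instead.

```python
from typing import Any, Dict


class HTTPErrorSketch(Exception):
    """Local stand-in for an HTTP error carrying a status code."""

    def __init__(self, status: int, message: str) -> None:
        super().__init__(message)
        self.status = status
        self.message = message


def get_host_data(cache: Dict[str, Any], hostname: str) -> Any:
    """Return the cached data for a host or raise a 404-style error."""
    try:
        return cache[hostname]
    except KeyError:
        # Translate the lookup failure into an HTTP-level error
        # instead of letting the KeyError escape.
        raise HTTPErrorSketch(404, f'{hostname} not found.')


cache = {'node1': {'status': 'ok'}}
print(get_host_data(cache, 'node1'))
```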
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
bda2568d8c73bbdc77897d22a99f74c1f4a23511 )
Guillaume Abrioux [Wed, 22 Nov 2023 14:27:09 +0000 (14:27 +0000)]
node-proxy: run only when idrac details provided
This agent shouldn't run when no idrac details are
available.
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
a21779c39d495a06f6f908594c541e5aa818b4f6 )
Guillaume Abrioux [Mon, 20 Nov 2023 14:55:26 +0000 (14:55 +0000)]
cephadm: inventory.NodeProxyCache() refactor
This modifies fullreport(), summary() and common() methods
so they use the same logic as firmwares() and criticals()
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
1675b6fe4ee3a6c43204ddf698c845b09ab7a2db )
Guillaume Abrioux [Thu, 16 Nov 2023 13:35:51 +0000 (13:35 +0000)]
cephadm/agent: add docstring to NodeProxy class
This documents that part of the code and might
help to generate an API spec and documentation.
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
5e1051bdbdc4d1720aa58f5b584df89bd1dd3d6d )
Guillaume Abrioux [Mon, 30 Oct 2023 15:51:56 +0000 (15:51 +0000)]
node-proxy: implement criticals endpoint
This adds the required changes in order to implement the endpoint
'/criticals'.
The goal of this endpoint is to provide a report of all critical statuses
for either a given host or all hosts across the cluster.
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
ae791f8721027a9a508c7cd27e85f86f6fe7c492 )
Guillaume Abrioux [Thu, 26 Oct 2023 14:34:10 +0000 (14:34 +0000)]
orch/cephadm: implement `ceph orch hardware` command
This adds a first implementation of the `ceph orch hardware` CLI.
Usage:
```
ceph orch hardware status [<hostname>] [--category <value>]
```
Omitting the `[<hostname>]` argument will generate a report for all hosts.
The default for argument `[--category]` is `summary`.
Example with `--category` :
```
+------------+-------------+-------+--------+---------+
| HOST | NAME | SPEED | STATUS | STATE |
+------------+-------------+-------+--------+---------+
| ceph-00001 | eno8303 | 0 | OK | Enabled |
| ceph-00001 | eno8403 | 0 | OK | Enabled |
| ceph-00001 | eno12399np0 | 10000 | OK | Enabled |
| ceph-00001 | eno12409np1 | 10000 | OK | Enabled |
| ceph-00001 | bond0 | 10000 | OK | Enabled |
+------------+-------------+-------+--------+---------+
```
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
1665156eea9e57e533a2ded26a8f7b37df68f5c5 )
Guillaume Abrioux [Thu, 16 Nov 2023 09:48:02 +0000 (09:48 +0000)]
node-proxy: validate_node_proxy_data() refactor
raise cherrypy.HTTPError() when the received data is
not valid instead of returning `self.validate_msg`
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
b5814cd9278c857b7e09a1dbe229a7cdead10a29 )
Guillaume Abrioux [Wed, 25 Oct 2023 15:07:09 +0000 (15:07 +0000)]
node-proxy: implement http_query() helper function
This lets us drop the dependency on `requests` and
use the same helper function from both reporter.py and redfish_client.py.
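For illustration only, a helper of this kind could be built on the standard library's `urllib`. The signatures below (`build_request`, and this particular parameter list for `http_query`) are assumptions and do not claim to match the real helper.

```python
import json
from typing import Any, Dict, Optional, Tuple
from urllib.request import Request, urlopen


def build_request(addr: str,
                  port: str = '443',
                  data: Optional[Dict[str, Any]] = None,
                  method: Optional[str] = None,
                  headers: Optional[Dict[str, str]] = None,
                  endpoint: str = '',
                  scheme: str = 'https') -> Request:
    """Assemble a urllib Request; JSON-encode the payload if there is one."""
    url = f'{scheme}://{addr}:{port}{endpoint}'
    body = json.dumps(data).encode('utf-8') if data is not None else None
    return Request(url, data=body, headers=headers or {}, method=method)


def http_query(addr: str, timeout: int = 10, **kw: Any) -> Tuple[int, str]:
    """Send the request and return (status, body). Hypothetical signature."""
    req = build_request(addr, **kw)
    with urlopen(req, timeout=timeout) as response:
        return response.status, response.read().decode('utf-8')


req = build_request('10.10.10.11', '7150', data={'a': 1},
                    endpoint='/node-proxy/data')
print(req.get_method(), req.full_url)
```

Keeping request construction separate from sending makes the helper easy to test without a network connection.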
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
cae0e5e510eb3bad5132deb0332942aa294c6e8b )
Guillaume Abrioux [Tue, 24 Oct 2023 11:28:11 +0000 (11:28 +0000)]
node-proxy: address mypy and flake8 errors
This addresses some flake8 and python typing errors.
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
5b6f18d7ad602921a25c8b8acfaf7b454cdbba0b )
Guillaume Abrioux [Tue, 24 Oct 2023 08:43:53 +0000 (08:43 +0000)]
node-proxy: fetch idrac details from NodeProxyCache()
The class `NodeProxyCache()` is intended for that; it already
has this information so there's no need to make a call to `get_store()`
each time we want to access idrac details.
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
a0f96aa5f1a27ec84e09f0bd030f62e39203e4f7 )
Guillaume Abrioux [Mon, 23 Oct 2023 15:28:35 +0000 (15:28 +0000)]
node-proxy: parametrize idrac port
This adds the missing piece to make the idrac port
a parameter that one can customize.
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
69f1272cbf036f8388398093def5136f420635f5 )
Guillaume Abrioux [Mon, 23 Oct 2023 13:42:09 +0000 (13:42 +0000)]
cephadm: add new option to CLI
This adds the `--deploy-cephadm-agent` option to the cephadm
CLI's bootstrap subcommand.
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
4c3979788fccbe01ff23163ea61cbdf8f74d9cbd )
Guillaume Abrioux [Fri, 20 Oct 2023 16:12:55 +0000 (16:12 +0000)]
node-proxy: implement /led endpoint
This is the first 'act on node' feature implementation.
This adds the endpoint /led.
A GET request to this endpoint returns the current status
of the enclosure LED.
A PATCH request to this endpoint sets the
enclosure LED status.
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
76dd9aa47095f1fca644879656b1fe17a033b9c4 )
Guillaume Abrioux [Fri, 20 Oct 2023 09:21:16 +0000 (09:21 +0000)]
node-proxy: drop dispatch() in NodeProxy()
The current logic prevents the use of any cherrypy decorators
on actual endpoints as we use a set of 'proxy functions'
(index and dispatch) instead.
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
1ec59e6625bae7cd381d83817196bf8669f641ad )
Guillaume Abrioux [Thu, 19 Oct 2023 07:42:24 +0000 (07:42 +0000)]
node-proxy: local API (NodeProxy) refactor
- subclass cherrypy._cpserver.Server,
- drop cherrypy.quickstart() call,
- drop nested classes approach,
- make it run over https
- print tracebacks when an exception is raised
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
1c79d6493ac35ae0394c492616f95220fbe1fbb4 )
Guillaume Abrioux [Fri, 13 Oct 2023 12:15:21 +0000 (12:15 +0000)]
node-proxy: clean up node_proxy dir
This removes a legacy file that is not needed any longer.
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
fe41c29d9a135815c5b1937589aa31066763be63 )
Guillaume Abrioux [Fri, 13 Oct 2023 12:09:56 +0000 (12:09 +0000)]
node-proxy: collect firmwares details
This makes all the required changes in order to support
collecting, pushing and exposing data regarding firmwares
status and versions for all the underlying hardware.
This also refactors the redfish dell corresponding logic:
Having so many nested/inheritance classes seems unnecessary.
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
a9afa2f6adad2cff04b54bfd69e8883b4b9fb1cb )
Guillaume Abrioux [Thu, 12 Oct 2023 13:29:19 +0000 (13:29 +0000)]
node-proxy: update the JSON data structure
Change the data structure from:
```
{
    "storage": "ok",
    "processors": "ok",
    "network": "ok",
    "memory": "ok",
    "power": "ok",
    "fans": "ok"
}
```
to:
```
{
    "host": "node1",
    "sn": "xxxx",
    "status": {
        "storage": {
        }
    }
}
```
In order to provide a unique key (sn) which is more reliable at the top
level of the dict.
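As a rough sketch of why a top-level serial number helps, reports shaped like the new structure can be indexed by `sn`. The helpers `build_report` and `index_by_sn` are hypothetical and only illustrate one way a consumer might use the key.

```python
from typing import Any, Dict, List


def build_report(host: str, sn: str,
                 status: Dict[str, Any]) -> Dict[str, Any]:
    """Wrap per-component status in the new top-level structure."""
    return {'host': host, 'sn': sn, 'status': status}


def index_by_sn(reports: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Index reports by serial number (assumed unique per machine)."""
    return {report['sn']: report for report in reports}


report = build_report('node1', 'xxxx', {'storage': {}})
print(index_by_sn([report]))
```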
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
65d3f781f92505eb708716eb281c670a71ed503c )
Guillaume Abrioux [Wed, 11 Oct 2023 15:15:50 +0000 (15:15 +0000)]
node-proxy: quick clean up
This removes some files which are not needed any longer.
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
dcfeea4ea15d8bb566c4d40bc1ab2013a9c044a1 )
Guillaume Abrioux [Wed, 11 Oct 2023 14:50:40 +0000 (14:50 +0000)]
node-proxy: run all update functions in parallel
This makes the update logic run faster.
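One common way to run independent update functions concurrently is a thread pool. The sketch below assumes that approach; it does not claim to be the mechanism used in this commit, and `run_updates` is a hypothetical name.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict


def run_updates(update_funcs: Dict[str, Callable[[], Any]]) -> Dict[str, Any]:
    """Run every update function in its own worker and gather the results."""
    workers = max(1, len(update_funcs))
    with ThreadPoolExecutor(max_workers=workers) as executor:
        futures = {name: executor.submit(func)
                   for name, func in update_funcs.items()}
        # result() blocks until the corresponding function has finished
        # and re-raises any exception it raised.
        return {name: future.result() for name, future in futures.items()}


print(run_updates({'memory': lambda: 'ok', 'storage': lambda: 'ok'}))
```

Since these updates are mostly I/O bound (HTTP calls to the redfish API), threads are usually enough to overlap the waiting time.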
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
045e508f2e3a8c8367ceaeafe91ea0c397dceae5 )
Guillaume Abrioux [Wed, 11 Oct 2023 08:34:38 +0000 (08:34 +0000)]
cephadm/node-proxy: reset ceph warning when needed
This makes the mgr reset the warning when the alert is fixed.
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
e7d6b109a264d5964363eee0af2e0051e19bf2d6 )
Guillaume Abrioux [Tue, 10 Oct 2023 12:42:42 +0000 (12:42 +0000)]
node-proxy: rename server.py -> main.py
This is going to be the entrypoint of node-proxy, let's rename
this file to main.py
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
d08006f487abd3cecb957b4e82330d0b4ff27d6e )
Guillaume Abrioux [Tue, 10 Oct 2023 12:41:09 +0000 (12:41 +0000)]
node-proxy: subclass Thread class
The idea is to subclass Thread so I can catch
exceptions in threads from the main process.
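A minimal sketch of this technique: the subclass stores any exception raised by the target so the main thread can re-raise it later. The names `BaseThreadSketch` and `check_status` are hypothetical.

```python
import threading
from typing import Any, Optional


class BaseThreadSketch(threading.Thread):
    """Thread that remembers the exception raised by its target."""

    def __init__(self, *args: Any, **kwargs: Any) -> None:
        super().__init__(*args, **kwargs)
        self.exc: Optional[BaseException] = None

    def run(self) -> None:
        try:
            super().run()
        except Exception as e:
            # Keep the exception instead of letting the thread die silently.
            self.exc = e

    def check_status(self) -> None:
        """Re-raise, in the caller's thread, whatever the target raised."""
        if self.exc is not None:
            raise self.exc


def failing_task() -> None:
    raise ValueError('something went wrong')


thread = BaseThreadSketch(target=failing_task)
thread.start()
thread.join()
try:
    thread.check_status()
except ValueError as e:
    print(f'caught in main thread: {e}')
```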
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
c04a88c4d24d83a5d1a60341514a98c87fe6f833 )
Guillaume Abrioux [Tue, 10 Oct 2023 12:38:12 +0000 (12:38 +0000)]
node-proxy: drop current main.py
This file was there for devel purposes.
Let's drop it as it is not used any longer.
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
c45bb65fe88bbc6f2e6f9d82adc0bc4e594b1c43 )
Guillaume Abrioux [Fri, 6 Oct 2023 13:55:21 +0000 (13:55 +0000)]
cephadm/node-proxy: logging issues / error handling refactor
- fix multiple logging issues caused by a new handler
being added each time `Logger` is called
- do not propagate to the parent (root) logger, as that makes it log the messages too
- implement a new method `is_logged()` in `RedFishClient`
- refactor the logic regarding caught errors in `RedFishClient`
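The first two points can be sketched with the standard `logging` module as follows. `get_logger` is a hypothetical helper, not the actual `Logger` class from node-proxy.

```python
import logging


def get_logger(name: str, level: int = logging.INFO) -> logging.Logger:
    """Return a logger with exactly one handler and no propagation."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    # Add a handler only once: getLogger() returns the same object for
    # the same name, so repeated calls would otherwise stack handlers
    # and duplicate every message.
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter(
            '%(asctime)s - %(name)s - %(levelname)s - %(message)s'))
        logger.addHandler(handler)
    # Do not hand records to the parent (root) logger, which would
    # log them a second time.
    logger.propagate = False
    return logger


log = get_logger('reporter-sketch')
log = get_logger('reporter-sketch')
log.info('logged once')
```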
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
d43452d57f4342a8d0bf0b38e138e92945ba4eb6 )
Guillaume Abrioux [Fri, 6 Oct 2023 11:10:39 +0000 (11:10 +0000)]
mgr/cephadm: add NodeProxyCache class
This is for tracking and caching any node-proxy data.
The node-proxy API now uses this class to serve its data.
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
a48c34ef0034de335c1ec5d599272fc9d958a506 )
Guillaume Abrioux [Wed, 4 Oct 2023 10:00:26 +0000 (10:00 +0000)]
monitoring: add new alerts
This adds new hardware monitoring alerts.
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
76d8e0bbbf2c5130a325943ffe09791cbd4f2feb )
Guillaume Abrioux [Fri, 29 Sep 2023 13:05:31 +0000 (13:05 +0000)]
node-proxy: validate_node_proxy_data() refactor
This introduces minor changes in order to improve error
handling in validate_node_proxy_data()
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
05cc6afe4b76d35f549bb8928e459bc8fb93697c )
Guillaume Abrioux [Wed, 27 Sep 2023 13:00:17 +0000 (13:00 +0000)]
node-proxy: lower verbosity level
This reduces the verbosity level for some messages.
These generate a lot of messages while they are needed
only for debugging purposes.
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
9fee8362a91fc4c263a69003733d6c1fde37db5e )
Guillaume Abrioux [Wed, 27 Sep 2023 09:41:49 +0000 (09:41 +0000)]
node-proxy: update alert names
Given that the 'node-proxy' terminology is internal, let's change
the few node-proxy related alert names to something
more user friendly as they are intended to be seen by the user
(NODE_PROXY_xxx > HARDWARE_xxx).
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
14bfc07a1a3ad483f2f91c5d1fde9f073c12867e )
Guillaume Abrioux [Wed, 27 Sep 2023 08:27:28 +0000 (08:27 +0000)]
node-proxy: split redfishdell class
This refactor splits the redfishdell class in order
to collect power and thermal details from the redfish API.
'power' and 'thermal' details differ in several respects:
- not available at the same endpoint,
- data structure is different.
For these two reasons, let's split that class.
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
db0172186a753d57c357a5396378d1158e3167e3 )
Guillaume Abrioux [Thu, 21 Sep 2023 14:52:01 +0000 (14:52 +0000)]
cephadm/agent: endpoint refactor
These changes are required in order to be able to re-use
the existing agent endpoint. The current code doesn't make it easy
to add a new application. The idea here is to add a new class for
handling the '/node-proxy' endpoint.
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
27b7f98e5c0816d07327bae22d39453608860390 )
Guillaume Abrioux [Tue, 19 Sep 2023 11:49:44 +0000 (11:49 +0000)]
node-proxy: raise ceph warning(s) if needed
This makes the agent endpoint raise alert(s) when one or multiple
members of a component are critical.
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
b45ba22920afbd1471ad3163157f7dc612e6a1f1 )
Guillaume Abrioux [Tue, 19 Sep 2023 07:55:54 +0000 (07:55 +0000)]
node-proxy: drop redfish library dependency
Given that this library isn't packaged either
upstream or downstream, and we can achieve what it was used for
directly with a lib such as `urllib` (basically just auth), let's
drop this dependency.
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
9da37815ad48f00088ae028041b4621e91725985 )
Guillaume Abrioux [Tue, 19 Sep 2023 07:46:42 +0000 (07:46 +0000)]
node-proxy: logging refactor
This makes `logger` a class attribute so we don't have
the `Logger` instantiation outside of the different classes.
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
88ad166b21815c775c50d902c573a65206e40f3e )
Guillaume Abrioux [Tue, 19 Sep 2023 07:41:57 +0000 (07:41 +0000)]
node-proxy: add __init__.py file
In order to make node-proxy a package.
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
f4d3c59feb6fe5969bbd578850079226d1af6ad2 )
Guillaume Abrioux [Mon, 18 Sep 2023 06:50:24 +0000 (06:50 +0000)]
node-proxy: parametrize reporter url
node-proxy entrypoint (`server.main()`) now takes two parameters
(addr / port) in order to make the reporter agent know how to reach
the http agent endpoint hosted in the mgr daemon.
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
043e827c2d7c4ac78808efff5627d75a3ed5a3bb )
Guillaume Abrioux [Thu, 14 Sep 2023 16:10:01 +0000 (16:10 +0000)]
node-proxy: modify the endpoint url from default config
This updates the endpoint url from DEFAULT_CONFIG in order
to match the new endpoint recently added.
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
74df8b711f81138b38a960ac9cf39291f7d7d906 )
Guillaume Abrioux [Thu, 14 Sep 2023 16:08:26 +0000 (16:08 +0000)]
node-proxy: update reporter agent
This commit introduces the required changes in order to make
the reporter agent query the new mgr endpoint '/node-proxy/data'
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
9a305c5c8e94e12b6c103b3b3f4201f4fc3616c9 )
Guillaume Abrioux [Thu, 14 Sep 2023 15:53:34 +0000 (15:53 +0000)]
node-proxy: fetch idrac details from ceph
The idrac details are now fetched from ceph (monitor kv store) and
passed by the cephadm binary at the agent startup.
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
c5e705abaa9df28862f88ba319e8dd9c6d710fac )
Guillaume Abrioux [Thu, 14 Sep 2023 15:41:32 +0000 (15:41 +0000)]
mgr/cephadm: add node-proxy endpoints to the mgr
This adds 2 endpoints to the existing http agent endpoint:
- '/node_proxy/idrac': supports POST requests only, although this endpoint
is intended for fetching the idrac credentials of a given node. As we pass
sensitive details (ceph secret), I didn't want to pass them as a query parameter
in the url. Passing them in an HTTP header is perhaps a better approach, but we already
do a similar thing for endpoint '/data' (agent), so for consistency I stuck to
that.
- '/node_proxy/data': supports GET and POST requests. A GET will return the
aggregated data for all nodes within the cluster. node-proxy will use a POST
request to that endpoint to push its collected data.
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
c1324cd821ef005474eddd5d009e499de1a51ee3 )
Guillaume Abrioux [Thu, 14 Sep 2023 15:32:38 +0000 (15:32 +0000)]
cephadm/binary: add `query_endpoint()` method
This encapsulates the existing code in a new method
`query_endpoint()`.
The idea is to avoid duplicating code if we need to make multiple
calls to the agent endpoint from the `run()` method.
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
7544406be33a579b3d0c63ee4c78ae91b02dfb0e )
Guillaume Abrioux [Thu, 14 Sep 2023 15:27:45 +0000 (15:27 +0000)]
mgr/cephadm: store oob mgmt credentials in mon kv store
The idea is to store the oob mgmt credentials into the monitor kv store
when they are passed via a host spec.
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
08e3d7ff5f70a1c6faafb64d982be6c684cfef06 )
Guillaume Abrioux [Thu, 14 Sep 2023 15:16:57 +0000 (15:16 +0000)]
python-common: update HostSpec
This adds new parameters to the current spec 'HostSpec'.
The idea is to make it possible to pass idrac credentials so
it will be possible for the node-proxy agent to consume them in order
to communicate with the redfish API.
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
22247f0b1e39bc485fa66fbd7a802203eb5279a9 )
Guillaume Abrioux [Thu, 17 Aug 2023 09:21:00 +0000 (11:21 +0200)]
node-proxy: migrate to cephadm-agent
This moves the existing files to the new directory 'cephadmlib' so
we can make the existing code for node-proxy run within the cephadm
agent. Indeed, we can leverage the existing code for the cephadm agent
given that both daemons would achieve the same thing.
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
83661b6c1a25b2d40f3cefa9f5de094c644a1e4e )
Guillaume Abrioux [Thu, 17 Aug 2023 09:18:10 +0000 (11:18 +0200)]
node-proxy: rename directory
This renames the node-proxy directory (node-proxy > node_proxy).
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
400edcbd05305baed8d790aeefe48958a28d2b18 )
Guillaume Abrioux [Thu, 22 Jun 2023 13:54:55 +0000 (15:54 +0200)]
node-proxy: add unit tests for node-proxy endpoint
This adds some unit tests for the node-proxy endpoint recently added to
the mgr.
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
300c99a2f0afd5999938e7e614188b80ee61853b )
Guillaume Abrioux [Tue, 20 Jun 2023 12:35:02 +0000 (14:35 +0200)]
node-proxy: move administration operations to /admin path
This adds a new path /admin where all administrator operations are grouped.
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
2995c6a277159735002686d48484df7d6ae25ac0 )
Guillaume Abrioux [Tue, 20 Jun 2023 12:33:42 +0000 (14:33 +0200)]
node-proxy: add new endpoint for flushing the data
Although this is mostly for devel and debug purposes at the moment,
it might be useful to be able to flush the data whenever the user needs it.
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
6677a6838493d5c6c6600edcf02d17a95f36b965 )
Guillaume Abrioux [Tue, 20 Jun 2023 12:24:42 +0000 (14:24 +0200)]
node-proxy: try to acquire lock early in reporter's loop
The lock should be acquired early in this loop.
If the lock gets acquired by another call after we enter that condition *and*
before Reporter.loop() actually acquires it, it can lead to issues if the
value of `data_ready` gets modified during this short amount of time.
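The race described here is a check-then-act problem. A minimal sketch of the fix is to take the lock before reading the flag, so the check and the action happen atomically. `ReporterSketch` and its `step()` method are hypothetical; only `data_ready` is named in this commit message.

```python
import threading
from typing import List


class ReporterSketch:
    """Hypothetical stand-in for the reporter; one loop iteration per step()."""

    def __init__(self) -> None:
        self.lock = threading.Lock()
        self.data_ready = False
        self.sent: List[str] = []

    def step(self, payload: str) -> bool:
        # Acquire the lock *before* checking data_ready: nobody can
        # change the flag between the check and the send.
        with self.lock:
            if self.data_ready:
                self.sent.append(payload)
                self.data_ready = False
                return True
        return False


reporter = ReporterSketch()
print(reporter.step('report-1'))
reporter.data_ready = True
print(reporter.step('report-2'))
```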
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
3f7384c7e1a9656dcc91fcd9e34c9095371a2a1e )
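A minimal sketch of the check-then-act race described above, assuming a shared object with a `lock` and a `data_ready` flag. The `System` class and the `report_once()` helper are hypothetical stand-ins, not the actual node-proxy code:

```python
import threading


class System:
    """Hypothetical stand-in for the object holding the collected data."""

    def __init__(self):
        self.lock = threading.Lock()
        self.data_ready = False
        self.data = {}


def report_once(system, send):
    # Take the lock *before* testing data_ready, so that no other thread
    # can change the flag between the check and the send.
    with system.lock:
        if not system.data_ready:
            return False
        send(dict(system.data))
        system.data_ready = False
        return True
```

Testing the flag outside the `with` block would reopen the window in which another caller can modify `data_ready`.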
Guillaume Abrioux [Tue, 20 Jun 2023 11:33:14 +0000 (13:33 +0200)]
node-proxy: variabilize the observer_url
Create a new parameter in DEFAULT_CONFIG for the reporter agent.
The default value (especially the TCP port) still has to be defined though.
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
ecbbcb432f1b4d08f4e2d011d821a30e102dd89a )
Guillaume Abrioux [Tue, 20 Jun 2023 11:31:40 +0000 (13:31 +0200)]
node-proxy: update endpoint url in Reporter.loop()
Change the path of the endpoint to something more generic.
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
f71dad1a11abf73ab17028e8a983439401c3893f )
Guillaume Abrioux [Tue, 20 Jun 2023 11:30:36 +0000 (13:30 +0200)]
node-proxy: implement _update_memory() in redfish_dell.py
This implements the `_update_memory()` method in redfish_dell.py
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
b1d00d9a5a63fed9d866bc7c44c89b0b1580301d )
Guillaume Abrioux [Tue, 20 Jun 2023 11:28:55 +0000 (13:28 +0200)]
node-proxy: redfish_dell.py refactor
This commit introduces a small refactor of `redfish_dell.py` in order
to avoid code redundancy.
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
c538030f9e70afc687ed1e5734d0d603fc4b0a31 )
Guillaume Abrioux [Fri, 16 Jun 2023 11:09:48 +0000 (13:09 +0200)]
node-proxy: RedfishClient class refactor
This implements the BaseClient class and makes RedfishClient inherit from it.
This follows the same logic as BaseSystem / RedfishSystem, given that any other
backend could need to implement a new client for collecting the data.
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
5cd39211401fcbbcb8a8e3441fd42043b45238dd )
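The base-class split could look roughly like the sketch below. Only the class names and `get_path()` come from the commit messages in this log; the constructor argument and the URL-building body are assumptions made for illustration:

```python
from abc import ABC, abstractmethod


class BaseClient(ABC):
    """Common interface every backend client is expected to provide."""

    def __init__(self, host):
        self.host = host  # hypothetical constructor argument

    @abstractmethod
    def get_path(self, path):
        """Return the data found at `path` on the backend."""


class RedfishClient(BaseClient):
    def get_path(self, path):
        # A real client would issue an HTTP GET here; this sketch only
        # builds the URL it would query.
        return f"https://{self.host}{path}"
```

Another backend would subclass `BaseClient` in the same way and supply its own `get_path()`.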
Guillaume Abrioux [Fri, 16 Jun 2023 11:07:34 +0000 (13:07 +0200)]
node-proxy: fix mypy warning regarding Config.logging
Config's attributes are dynamically created, so mypy complains.
Using `__dict__['logging']` addresses that.
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
5b6e762383efa7d1e846ac6c3ec1f912f6d60248 )
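To illustrate the workaround, here is a small sketch with a hypothetical `Config` class whose attributes are only created at runtime (the real class in util.py may be built differently):

```python
class Config:
    def __init__(self, **options):
        # Attributes are created at runtime, so a static type checker
        # cannot know that `logging` exists on instances.
        for name, value in options.items():
            setattr(self, name, value)


config = Config(logging={"level": 20})

# `config.logging` works at runtime, but mypy reports a missing attribute.
# Going through the instance dictionary avoids the attribute lookup that
# mypy cannot verify.
level = config.__dict__["logging"]["level"]
```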
Guillaume Abrioux [Fri, 16 Jun 2023 11:06:03 +0000 (13:06 +0200)]
node-proxy: rename server-v2.py
As the previous version has been removed, let's rename this file.
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
37f33ec87e830989dadd17dbfc0dfde1f58877c1 )
Guillaume Abrioux [Fri, 16 Jun 2023 11:04:56 +0000 (13:04 +0200)]
node-proxy: drop old server.py
This version relies on flask.
In the end, we decided to migrate to cherrypy given that
we already use it quite a lot in ceph/ceph.
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
8c1036374008d1422e2f3485012231a3d1da77b8 )
Guillaume Abrioux [Fri, 16 Jun 2023 09:13:56 +0000 (11:13 +0200)]
node-proxy: create entrypoint main()
This creates a `main()` function in server.py that will be the
entrypoint of node-proxy.
This also implements arg parsing and adds a `--config` parameter
to specify the configuration file.
Finally, this introduces a small refactor of class `Config` and class
`Logger` in util.py because there was a circular dependency between them.
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
f2f87f4259bbfe1014f5a2309a82f5b08a8d78d3 )
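A minimal sketch of such an entrypoint, using the standard argparse module. The default path and the return value are assumptions for illustration only; the commit message does not state them:

```python
import argparse


def parse_args(argv=None):
    parser = argparse.ArgumentParser(prog="node-proxy")
    parser.add_argument(
        "--config",
        default="node-proxy.yml",  # hypothetical default, not from the source
        help="path to the configuration file",
    )
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    # A real entrypoint would load the configuration and start the
    # server here; this sketch just returns the selected path.
    return args.config
```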
Guillaume Abrioux [Fri, 16 Jun 2023 06:08:38 +0000 (08:08 +0200)]
node-proxy: rename System to BaseSystem
This avoids confusion or redefinition issues with the class System()
defined in server.py.
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
3bb2863d5ac14fbadd609cfb3c494acc3ba8c9f0 )
Guillaume Abrioux [Thu, 15 Jun 2023 14:23:13 +0000 (16:23 +0200)]
node-proxy: add a timeout when posting data
If this call is stuck for any reason, the reporter will block
the whole daemon given that at this point it has acquired a lock.
We need to make sure this call won't block the daemon for a long time,
so let's add a timeout.
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
a3aff1b848a3785dd2e3752a79c8c819e6445239 )
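A sketch of a bounded POST, assuming a hypothetical `post_json()` helper. The standard library's urllib is used here only to keep the example dependency-free; the HTTP client, URL, and timeout value used by node-proxy may differ:

```python
import json
import urllib.request


def post_json(url, payload, timeout=5.0):
    body = json.dumps(payload).encode()
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        # `timeout` bounds the blocking socket operations, so a silent
        # endpoint cannot hold the caller (and its lock) indefinitely.
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except OSError:
        # URLError and socket timeouts are both OSError subclasses.
        return None
```

Returning `None` rather than raising lets the calling loop release its lock and try again on the next iteration.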
Guillaume Abrioux [Thu, 15 Jun 2023 14:20:31 +0000 (16:20 +0200)]
node-proxy: (Redfish_System) reuse the existing client when possible
Otherwise, the method start_client() recreates a new client.
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
ee1d4e49d1431365ceed4043a59d9f91123c4506 )
Guillaume Abrioux [Thu, 15 Jun 2023 14:19:27 +0000 (16:19 +0200)]
node-proxy: remove a redundant message
This message is not needed given that the same message already exists in
the RedFishClient class.
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
415dc693ffaab4e6bbcfd5e2891625c4707bd7e3 )
Guillaume Abrioux [Mon, 12 Jun 2023 12:36:54 +0000 (14:36 +0200)]
node-proxy: add requirements.txt
This adds the requirements.txt file in order to manage the required
libraries.
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
31b46ff9b8901d0a54cfedaf219a280c4802676a )
Guillaume Abrioux [Fri, 9 Jun 2023 13:03:24 +0000 (15:03 +0200)]
node-proxy: add a retry on redfish_client.get_path() calls
The idea is to retry multiple times before concluding that the endpoint is
unreachable.
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
c8f31a1ef01d777e9ef8aae1a895dfcf0a6dea8b )
Guillaume Abrioux [Fri, 9 Jun 2023 12:58:02 +0000 (14:58 +0200)]
node-proxy: add a decorator 'retry'
This decorator will be useful for calls that should be attempted multiple
times before actually failing.
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
9b88e5a782b2c10ce782ca09ff2bb56bb0a82200 )
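One possible shape for such a decorator is sketched below. The parameter names (`exceptions`, `retries`, `delay`) are assumptions; the signature of the actual node-proxy decorator is not shown in this log:

```python
import functools
import time


def retry(exceptions=(Exception,), retries=3, delay=0.0):
    """Call the wrapped function up to `retries` times before giving up."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, retries + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == retries:
                        # Last attempt: let the exception propagate.
                        raise
                    time.sleep(delay)

        return wrapper

    return decorator
```

A call such as `redfish_client.get_path()` could then be wrapped with `@retry(exceptions=(ConnectionError,), retries=3, delay=1.0)`, where the exception type and values are again only illustrative.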
Guillaume Abrioux [Thu, 8 Jun 2023 16:31:38 +0000 (18:31 +0200)]
node-proxy: add type annotation
This commit adds type annotations to all files.
They have been missing since the initial implementation, so let's add
them before the project gets bigger.
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
ee8e28baafbe6861a21514c2af05b77a42d6f963 )
Guillaume Abrioux [Thu, 8 Jun 2023 16:22:26 +0000 (18:22 +0200)]
node-proxy: address some flake8 linting errors
This addresses some flake8 errors.
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
4d63a0a18dbbb5259dad098ea0184edd5c3655bb )
Guillaume Abrioux [Thu, 8 Jun 2023 13:12:16 +0000 (15:12 +0200)]
node-proxy: implement config & logging management
This adds the classes 'Config' and 'Logger' in order to manage
the logging and the configuration within the node-proxy daemon.
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
c5acf8183c7d6d02fb8fa301b2acdec096e37059 )
Guillaume Abrioux [Wed, 7 Jun 2023 12:23:57 +0000 (14:23 +0200)]
node-proxy: catch RequestException in reporter
This catches the requests.exceptions.RequestException
exception in the reporter agent so we can better handle the
case where it can't reach the endpoint when trying to send the
collected data.
Before this change, if for some reason the refreshed data couldn't be
sent to the endpoint, it wouldn't have retried because
`self.system.previous_data` was overwritten anyway.
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
6d9198519d7b0d51e00d785d7be1f06e2e7509e3 )
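The behaviour described above can be sketched as follows. `SendError` is a hypothetical stand-in for requests.exceptions.RequestException, and the `Reporter` class here is a simplified illustration rather than the real reporter agent:

```python
class SendError(Exception):
    """Hypothetical stand-in for requests.exceptions.RequestException."""


class Reporter:
    def __init__(self, send):
        self.send = send          # callable that pushes data to the endpoint
        self.previous_data = {}   # last payload known to have been delivered

    def report(self, current_data):
        if current_data == self.previous_data:
            return "unchanged"
        try:
            self.send(current_data)
        except SendError:
            # Leave previous_data untouched so the next iteration still
            # sees a difference and tries again.
            return "failed"
        self.previous_data = dict(current_data)
        return "sent"
```

Updating `previous_data` only after a successful send is what makes the retry happen on the next pass.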
Guillaume Abrioux [Wed, 7 Jun 2023 12:20:07 +0000 (14:20 +0200)]
node-proxy: catch more error in redfish_client
This catches more potential exceptions in the redfish_client
class, so that if an error is caught we can log a more accurate
and nicer message.
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
c8653e4cf64af5156d571d5e2ffe7e912ac0a78e )
Guillaume Abrioux [Mon, 22 May 2023 12:27:48 +0000 (14:27 +0200)]
node-proxy: add some logging in the reporter agent
This adds some calls to the logging module, mostly for
devel/debug purposes at the moment.
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
102a80fc298a4292e14554e7d57db6c541889468 )
Guillaume Abrioux [Mon, 22 May 2023 12:26:54 +0000 (14:26 +0200)]
node-proxy: fix a typo in redfish_system.get_status()
s/Status/status
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
7d30c787779078b653d29d31be812580a86602d6 )
Guillaume Abrioux [Mon, 22 May 2023 12:25:35 +0000 (14:25 +0200)]
node-proxy: redfish_system.get_system refactor
This method should return the 'unified structure' version of the
collected data instead of the huge json returned by redfish.
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
9f72e688c79ebf7883801f108cec3772b16e8d3c )
Guillaume Abrioux [Mon, 22 May 2023 12:20:54 +0000 (14:20 +0200)]
node-proxy: add a lock mechanism
The loop in the reporter agent has to wait until all the data are
collected before checking and pushing them to the ceph-mgr (if needed).
The idea is to use the lock mechanism offered by Python's threading
module.
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit
fe03bf3676ee2b351a0155491bc5eb4bb7b3d1a3 )
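As a rough illustration of the idea, the sketch below uses `threading.Lock` so that a reader waits for a collection in progress to finish. The `Collector` class and its method names are hypothetical and only show the locking pattern:

```python
import threading


class Collector:
    def __init__(self):
        self.lock = threading.Lock()
        self.data = {}

    def collect(self, readings):
        # Hold the lock for the whole refresh so that readers never
        # observe a half-updated dictionary.
        with self.lock:
            self.data.clear()
            for key, value in readings.items():
                self.data[key] = value

    def snapshot(self):
        # Blocks until any collection in progress has finished.
        with self.lock:
            return dict(self.data)
```

A reporter loop would call `snapshot()` and compare the result with what it last sent before pushing anything to the ceph-mgr.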