git.apps.os.sepia.ceph.com Git - ceph.git/log
17 months ago node-proxy: run only when idrac details provided
Guillaume Abrioux [Wed, 22 Nov 2023 14:27:09 +0000 (14:27 +0000)]
node-proxy: run only when idrac details provided

This agent shouldn't run when no idrac details are
available.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit a21779c39d495a06f6f908594c541e5aa818b4f6)

17 months ago cephadm: inventory.NodeProxyCache() refactor
Guillaume Abrioux [Mon, 20 Nov 2023 14:55:26 +0000 (14:55 +0000)]
cephadm: inventory.NodeProxyCache() refactor

This modifies fullreport(), summary() and common() methods
so they use the same logic as firmwares() and criticals()

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit 1675b6fe4ee3a6c43204ddf698c845b09ab7a2db)

17 months ago cephadm/agent: add docstring to NodeProxy class
Guillaume Abrioux [Thu, 16 Nov 2023 13:35:51 +0000 (13:35 +0000)]
cephadm/agent: add docstring to NodeProxy class

This documents that part of the code and might
help to generate an API spec and documentation.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit 5e1051bdbdc4d1720aa58f5b584df89bd1dd3d6d)

17 months ago node-proxy: implement criticals endpoint
Guillaume Abrioux [Mon, 30 Oct 2023 15:51:56 +0000 (15:51 +0000)]
node-proxy: implement criticals endpoint

This adds the required changes in order to implement the endpoint
'/criticals'.

The goal of this endpoint is to provide a report of all critical statuses
for either a given host or all hosts across the cluster.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit ae791f8721027a9a508c7cd27e85f86f6fe7c492)

17 months ago orch/cephadm: implement `ceph orch hardware` command
Guillaume Abrioux [Thu, 26 Oct 2023 14:34:10 +0000 (14:34 +0000)]
orch/cephadm: implement `ceph orch hardware` command

This adds a first implementation of the `ceph orch hardware` CLI.

Usage:

```
ceph orch hardware status [<hostname>] [--category <value>]
```

Omitting the `[<hostname>]` argument will generate a report for all hosts.
The default for argument `[--category]` is `summary`.

Example with `--category`:

```
+------------+-------------+-------+--------+---------+
|    HOST    |     NAME    | SPEED | STATUS |  STATE  |
+------------+-------------+-------+--------+---------+
| ceph-00001 |   eno8303   |   0   |   OK   | Enabled |
| ceph-00001 |   eno8403   |   0   |   OK   | Enabled |
| ceph-00001 | eno12399np0 | 10000 |   OK   | Enabled |
| ceph-00001 | eno12409np1 | 10000 |   OK   | Enabled |
| ceph-00001 |    bond0    | 10000 |   OK   | Enabled |
+------------+-------------+-------+--------+---------+
```

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit 1665156eea9e57e533a2ded26a8f7b37df68f5c5)

17 months ago node-proxy: validate_node_proxy_data() refactor
Guillaume Abrioux [Thu, 16 Nov 2023 09:48:02 +0000 (09:48 +0000)]
node-proxy: validate_node_proxy_data() refactor

raise cherrypy.HTTPError() when the received data is
not valid instead of returning `self.validate_msg`

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit b5814cd9278c857b7e09a1dbe229a7cdead10a29)

17 months ago node-proxy: implement http_query() helper function
Guillaume Abrioux [Wed, 25 Oct 2023 15:07:09 +0000 (15:07 +0000)]
node-proxy: implement http_query() helper function

so we can drop the dependency on `requests` and
use the same helper function from both reporter.py and redfish_client.py

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit cae0e5e510eb3bad5132deb0332942aa294c6e8b)

17 months ago node-proxy: address mypy and flake8 errors
Guillaume Abrioux [Tue, 24 Oct 2023 11:28:11 +0000 (11:28 +0000)]
node-proxy: address mypy and flake8 errors

This addresses some flake8 and python typing errors.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit 5b6f18d7ad602921a25c8b8acfaf7b454cdbba0b)

17 months ago node-proxy: fetch idrac details from NodeProxyCache()
Guillaume Abrioux [Tue, 24 Oct 2023 08:43:53 +0000 (08:43 +0000)]
node-proxy: fetch idrac details from NodeProxyCache()

The class `NodeProxyCache()` is intended for that: it already
has this information, so there's no need to make a call to `get_store()`
each time we want to access idrac details.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit a0f96aa5f1a27ec84e09f0bd030f62e39203e4f7)

17 months ago node-proxy: parametrize idrac port
Guillaume Abrioux [Mon, 23 Oct 2023 15:28:35 +0000 (15:28 +0000)]
node-proxy: parametrize idrac port

This adds the missing piece to make the idrac port
a parameter that one can customize.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit 69f1272cbf036f8388398093def5136f420635f5)

17 months ago cephadm: add new option to CLI
Guillaume Abrioux [Mon, 23 Oct 2023 13:42:09 +0000 (13:42 +0000)]
cephadm: add new option to CLI

This adds the `--deploy-cephadm-agent` option to the cephadm
CLI's bootstrap subcommand.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit 4c3979788fccbe01ff23163ea61cbdf8f74d9cbd)

17 months ago node-proxy: implement /led endpoint
Guillaume Abrioux [Fri, 20 Oct 2023 16:12:55 +0000 (16:12 +0000)]
node-proxy: implement /led endpoint

This is the first 'act on node' feature implementation.

This adds the endpoint /led

A GET request to this endpoint returns the current status
of the enclosure LED.
A PATCH request to this endpoint sets the
enclosure LED status.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit 76dd9aa47095f1fca644879656b1fe17a033b9c4)

17 months ago node-proxy: drop dispatch() in NodeProxy()
Guillaume Abrioux [Fri, 20 Oct 2023 09:21:16 +0000 (09:21 +0000)]
node-proxy: drop dispatch() in NodeProxy()

The current logic prevents the use of any cherrypy decorators
on actual endpoints as we use a set of 'proxy functions'
(index and dispatch) instead.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit 1ec59e6625bae7cd381d83817196bf8669f641ad)

17 months ago node-proxy: local API (NodeProxy) refactor
Guillaume Abrioux [Thu, 19 Oct 2023 07:42:24 +0000 (07:42 +0000)]
node-proxy: local API (NodeProxy) refactor

- subclass cherrypy._cpserver.Server,
  - drop cherrypy.quickstart() call,
  - drop nested classes approach,
- make it run over https
- print tracebacks when an exception is raised

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit 1c79d6493ac35ae0394c492616f95220fbe1fbb4)

17 months ago node-proxy: clean up node_proxy dir
Guillaume Abrioux [Fri, 13 Oct 2023 12:15:21 +0000 (12:15 +0000)]
node-proxy: clean up node_proxy dir

This removes a legacy file that is not needed any longer.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit fe41c29d9a135815c5b1937589aa31066763be63)

17 months ago node-proxy: collect firmwares details
Guillaume Abrioux [Fri, 13 Oct 2023 12:09:56 +0000 (12:09 +0000)]
node-proxy: collect firmwares details

This makes all the required changes in order to support
collecting, pushing and exposing data regarding firmwares
status and versions for all the underlying hardware.
This also refactors the corresponding redfish dell logic:
having so many nested/inherited classes seems unnecessary.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit a9afa2f6adad2cff04b54bfd69e8883b4b9fb1cb)

17 months ago node-proxy: update the JSON data structure
Guillaume Abrioux [Thu, 12 Oct 2023 13:29:19 +0000 (13:29 +0000)]
node-proxy: update the JSON data structure

Change the data structure from:
```
{
  "storage": "ok",
  "processors": "ok",
  "network": "ok",
  "memory": "ok",
  "power": "ok",
  "fans": "ok"
}
```
to:

```
{
    "host": "node1",
    "sn": "xxxx",
    "status": {
        "storage": {
        }
    }
}
```

This provides a unique key (sn), which is more reliable, at the top
level of the dict.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit 65d3f781f92505eb708716eb281c670a71ed503c)

17 months ago node-proxy: quick clean up
Guillaume Abrioux [Wed, 11 Oct 2023 15:15:50 +0000 (15:15 +0000)]
node-proxy: quick clean up

This removes some files which are not needed any longer.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit dcfeea4ea15d8bb566c4d40bc1ab2013a9c044a1)

17 months ago node-proxy: run all update functions in parallel
Guillaume Abrioux [Wed, 11 Oct 2023 14:50:40 +0000 (14:50 +0000)]
node-proxy: run all update functions in parallel

This makes the update logic run faster.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit 045e508f2e3a8c8367ceaeafe91ea0c397dceae5)
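
Illustration (not part of the commit): the log does not say how the update functions are parallelized. One common way, sketched here under that assumption, is a thread pool from `concurrent.futures`. The `update_*` functions and `run_updates()` are hypothetical stand-ins, not names taken from node-proxy.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


# Hypothetical stand-ins for the per-component update functions.
def update_memory() -> Dict[str, str]:
    return {'memory': 'ok'}


def update_storage() -> Dict[str, str]:
    return {'storage': 'ok'}


def update_network() -> Dict[str, str]:
    return {'network': 'ok'}


def run_updates(funcs: List[Callable[[], Dict[str, str]]]) -> Dict[str, str]:
    """Run every update function in its own worker thread and merge the results."""
    merged: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max(1, len(funcs))) as pool:
        futures = [pool.submit(f) for f in funcs]
        for future in futures:
            # result() re-raises any exception raised inside the worker.
            merged.update(future.result())
    return merged


status = run_updates([update_memory, update_storage, update_network])
```

Threads suit this kind of work because each update mostly waits on HTTP responses, so the total time approaches that of the slowest call rather than the sum of all calls.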

17 months ago cephadm/node-proxy: reset ceph warning when needed
Guillaume Abrioux [Wed, 11 Oct 2023 08:34:38 +0000 (08:34 +0000)]
cephadm/node-proxy: reset ceph warning when needed

This makes the mgr reset the warning when the alert is fixed.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit e7d6b109a264d5964363eee0af2e0051e19bf2d6)

17 months ago node-proxy: rename server.py -> main.py
Guillaume Abrioux [Tue, 10 Oct 2023 12:42:42 +0000 (12:42 +0000)]
node-proxy: rename server.py -> main.py

This is going to be the entrypoint of node-proxy, let's rename
this file to main.py

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit d08006f487abd3cecb957b4e82330d0b4ff27d6e)

17 months ago node-proxy: subclass Thread class
Guillaume Abrioux [Tue, 10 Oct 2023 12:41:09 +0000 (12:41 +0000)]
node-proxy: subclass Thread class

The idea is to subclass Thread so I can catch
exceptions in threads from the main process.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit c04a88c4d24d83a5d1a60341514a98c87fe6f833)
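
Illustration (not part of the commit): a minimal sketch of the idea, assuming the subclass stores the exception so the caller can inspect or re-raise it after `join()`. The class name `CatchingThread` and its `check()` method are hypothetical.

```python
import threading
from typing import Any, Callable, Optional


class CatchingThread(threading.Thread):
    """Thread that records an exception raised by its target (hypothetical name)."""

    def __init__(self, target: Callable[..., Any], *args: Any, **kwargs: Any) -> None:
        super().__init__(daemon=True)
        self._fn = target
        self._fn_args = args
        self._fn_kwargs = kwargs
        self.exc: Optional[BaseException] = None

    def run(self) -> None:
        try:
            self._fn(*self._fn_args, **self._fn_kwargs)
        except Exception as e:
            # Keep the exception instead of letting the thread die silently.
            self.exc = e

    def check(self) -> None:
        """Re-raise in the caller's thread whatever the target raised."""
        if self.exc is not None:
            raise self.exc


def failing_task() -> None:
    raise RuntimeError('collection failed')


t = CatchingThread(failing_task)
t.start()
t.join()
```

After `join()`, the caller can test `t.exc` or call `t.check()` inside its own `try`/`except` block.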

17 months ago node-proxy: drop current main.py
Guillaume Abrioux [Tue, 10 Oct 2023 12:38:12 +0000 (12:38 +0000)]
node-proxy: drop current main.py

This file was there for devel purposes.
Let's drop it as it is not used any longer.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit c45bb65fe88bbc6f2e6f9d82adc0bc4e594b1c43)

17 months ago cephadm/node-proxy: logging issues / error handling refactor
Guillaume Abrioux [Fri, 6 Oct 2023 13:55:21 +0000 (13:55 +0000)]
cephadm/node-proxy: logging issues / error handling refactor

- fix messages being logged multiple times because a new handler
  was added each time `Logger` is called
- do not propagate to the parent (root) logger, as it makes it log the messages too
- implement a new method `is_logged()` in `RedFishClient`
- refactor the logic regarding caught errors in `RedFishClient`

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit d43452d57f4342a8d0bf0b38e138e92945ba4eb6)
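
Illustration (not part of the commit): a minimal sketch of the two logging fixes with the standard `logging` module. The helper name `get_logger()` is hypothetical, and the real `Logger` class may be structured differently.

```python
import logging


def get_logger(name: str, level: int = logging.INFO) -> logging.Logger:
    """Return a configured logger without stacking handlers on repeated calls."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    # logging.getLogger() returns the same object for the same name, so
    # adding a handler unconditionally would duplicate every message.
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter('%(asctime)s %(name)s %(levelname)s %(message)s'))
        logger.addHandler(handler)
    # Stop records from also reaching the root logger's handlers.
    logger.propagate = False
    return logger


first = get_logger('node_proxy_example')
second = get_logger('node_proxy_example')
```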

17 months ago mgr/cephadm: add NodeProxyCache class
Guillaume Abrioux [Fri, 6 Oct 2023 11:10:39 +0000 (11:10 +0000)]
mgr/cephadm: add NodeProxyCache class

This is for tracking and caching any node-proxy data.
The node-proxy API now uses this class to serve its data.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit a48c34ef0034de335c1ec5d599272fc9d958a506)

17 months ago monitoring: add new alerts
Guillaume Abrioux [Wed, 4 Oct 2023 10:00:26 +0000 (10:00 +0000)]
monitoring: add new alerts

This adds new hardware monitoring alerts.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit 76d8e0bbbf2c5130a325943ffe09791cbd4f2feb)

17 months ago node-proxy: validate_node_proxy_data() refactor
Guillaume Abrioux [Fri, 29 Sep 2023 13:05:31 +0000 (13:05 +0000)]
node-proxy: validate_node_proxy_data() refactor

This introduces minor changes in order to improve error
handling in validate_node_proxy_data()

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit 05cc6afe4b76d35f549bb8928e459bc8fb93697c)

17 months ago node-proxy: lower verbosity level
Guillaume Abrioux [Wed, 27 Sep 2023 13:00:17 +0000 (13:00 +0000)]
node-proxy: lower verbosity level

This reduces the verbosity level for some messages.
These generate a lot of output although they are needed
only for debugging purposes.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit 9fee8362a91fc4c263a69003733d6c1fde37db5e)

17 months ago node-proxy: update alert names
Guillaume Abrioux [Wed, 27 Sep 2023 09:41:49 +0000 (09:41 +0000)]
node-proxy: update alert names

Given that the 'node-proxy' terminology is internal, let's change
the few node-proxy related alert names to something
more user friendly as they are intended to be seen by the user

(NODE_PROXY_xxx > HARDWARE_xxx).

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit 14bfc07a1a3ad483f2f91c5d1fde9f073c12867e)

17 months ago node-proxy: split redfishdell class
Guillaume Abrioux [Wed, 27 Sep 2023 08:27:28 +0000 (08:27 +0000)]
node-proxy: split redfishdell class

This refactor splits the redfishdell class in order
to collect power and thermal details from the redfish API.

'power' and 'thermal' details differ in several respects:

- not available at the same endpoint,
- data structure is different.

For these two reasons, let's split that class.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit db0172186a753d57c357a5396378d1158e3167e3)

17 months ago cephadm/agent: endpoint refactor
Guillaume Abrioux [Thu, 21 Sep 2023 14:52:01 +0000 (14:52 +0000)]
cephadm/agent: endpoint refactor

These changes are required in order to be able to re-use
the existing agent endpoint. The current code doesn't ease/allow
adding a new application. The idea here is to add a new class for
handling the '/node-proxy' endpoint.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit 27b7f98e5c0816d07327bae22d39453608860390)

17 months ago node-proxy: raise ceph warning(s) if needed
Guillaume Abrioux [Tue, 19 Sep 2023 11:49:44 +0000 (11:49 +0000)]
node-proxy: raise ceph warning(s) if needed

This makes the agent endpoint raise alert(s) when one or multiple
members of a component are critical.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit b45ba22920afbd1471ad3163157f7dc612e6a1f1)
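
Illustration (not part of the commit): a minimal sketch of how critical members could be picked out of a per-component status report. The nested layout `{component: {member: {'health': ...}}}` and the function name `critical_members()` are assumptions for the example. The actual node-proxy data structure is not shown in this log.

```python
from typing import Dict, List

# Assumed layout: {component: {member: {'health': <str>}}}
Status = Dict[str, Dict[str, Dict[str, str]]]


def critical_members(status: Status) -> Dict[str, List[str]]:
    """Map each component to the names of its members reported as critical."""
    found: Dict[str, List[str]] = {}
    for component, members in status.items():
        names = sorted(
            name for name, info in members.items()
            if info.get('health', '').lower() == 'critical'
        )
        if names:
            found[component] = names
    return found


sample: Status = {
    'storage': {'disk0': {'health': 'OK'}, 'disk1': {'health': 'Critical'}},
    'fans': {'fan0': {'health': 'OK'}},
}
alerts = critical_members(sample)
```

A non-empty result would be the trigger for raising a warning. An empty one would be the signal to clear it.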

17 months ago node-proxy: drop redfish library dependency
Guillaume Abrioux [Tue, 19 Sep 2023 07:55:54 +0000 (07:55 +0000)]
node-proxy: drop redfish library dependency

Given that this library isn't packaged for both
upstream and downstream and we can achieve what it was used for
directly with a lib such as `urllib` (basically just auth), let's
drop this dependency.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit 9da37815ad48f00088ae028041b4621e91725985)

17 months ago node-proxy: logging refactor
Guillaume Abrioux [Tue, 19 Sep 2023 07:46:42 +0000 (07:46 +0000)]
node-proxy: logging refactor

This makes `logger` a class attribute so we don't have
the `Logger` instantiation outside of the different classes.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit 88ad166b21815c775c50d902c573a65206e40f3e)

17 months ago node-proxy: add __init__.py file
Guillaume Abrioux [Tue, 19 Sep 2023 07:41:57 +0000 (07:41 +0000)]
node-proxy: add __init__.py file

In order to make node-proxy a package.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit f4d3c59feb6fe5969bbd578850079226d1af6ad2)

17 months ago node-proxy: parametrize reporter url
Guillaume Abrioux [Mon, 18 Sep 2023 06:50:24 +0000 (06:50 +0000)]
node-proxy: parametrize reporter url

node-proxy entrypoint (`server.main()`) now takes two parameters
(addr / port) in order to make the reporter agent know how to reach
the http agent endpoint hosted in the mgr daemon.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit 043e827c2d7c4ac78808efff5627d75a3ed5a3bb)

17 months ago node-proxy: modify the endpoint url from default config
Guillaume Abrioux [Thu, 14 Sep 2023 16:10:01 +0000 (16:10 +0000)]
node-proxy: modify the endpoint url from default config

This updates the endpoint url from DEFAULT_CONFIG in order
to match the new endpoint recently added.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit 74df8b711f81138b38a960ac9cf39291f7d7d906)

17 months ago node-proxy: update reporter agent
Guillaume Abrioux [Thu, 14 Sep 2023 16:08:26 +0000 (16:08 +0000)]
node-proxy: update reporter agent

This commit introduces the required changes in order to make
the reporter agent query the new mgr endpoint '/node-proxy/data'

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit 9a305c5c8e94e12b6c103b3b3f4201f4fc3616c9)

17 months ago node-proxy: fetch idrac details from ceph
Guillaume Abrioux [Thu, 14 Sep 2023 15:53:34 +0000 (15:53 +0000)]
node-proxy: fetch idrac details from ceph

The idrac details are now fetched from ceph (monitor kv store) and
passed by the cephadm binary at the agent startup.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit c5e705abaa9df28862f88ba319e8dd9c6d710fac)

17 months ago mgr/cephadm: add node-proxy endpoints to the mgr
Guillaume Abrioux [Thu, 14 Sep 2023 15:41:32 +0000 (15:41 +0000)]
mgr/cephadm: add node-proxy endpoints to the mgr

This adds 2 endpoints to the existing http agent endpoint:

- '/node_proxy/idrac': supports POST requests only, although this endpoint
  is intended for fetching the idrac credentials of a given node. As we pass
  sensitive details (ceph secret), I didn't want to pass them as a query parameter
  in the url. Passing them in an HTTP header is perhaps a better approach, but we already
  do a similar thing for the endpoint '/data' (agent), so for consistency I stick to
  that.

- '/node_proxy/data': supports GET and POST requests. A GET will return the
  aggregated data for all nodes within the cluster. node-proxy will use a POST
  request to that endpoint to push its collected data.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit c1324cd821ef005474eddd5d009e499de1a51ee3)

17 months ago cephadm/binary: add `query_endpoint()` method
Guillaume Abrioux [Thu, 14 Sep 2023 15:32:38 +0000 (15:32 +0000)]
cephadm/binary: add `query_endpoint()` method

This encapsulates the existing code in a new method
`query_endpoint()`.
The idea is to avoid duplicating code if we need to make multiple
calls to the agent endpoint from the `run()` method.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit 7544406be33a579b3d0c63ee4c78ae91b02dfb0e)

17 months ago mgr/cephadm: store oob mgmt credentials in mon kv store
Guillaume Abrioux [Thu, 14 Sep 2023 15:27:45 +0000 (15:27 +0000)]
mgr/cephadm: store oob mgmt credentials in mon kv store

The idea is to store the oob mgmt credentials into the monitor kv store
when they are passed via a host spec.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit 08e3d7ff5f70a1c6faafb64d982be6c684cfef06)

17 months ago python-common: update HostSpec
Guillaume Abrioux [Thu, 14 Sep 2023 15:16:57 +0000 (15:16 +0000)]
python-common: update HostSpec

This adds new parameters to the current spec 'HostSpec'.

The idea is to make it possible to pass idrac credentials so
it will be possible for the node-proxy agent to consume them in order
to communicate with the redfish API.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit 22247f0b1e39bc485fa66fbd7a802203eb5279a9)

17 months ago node-proxy: migrate to cephadm-agent
Guillaume Abrioux [Thu, 17 Aug 2023 09:21:00 +0000 (11:21 +0200)]
node-proxy: migrate to cephadm-agent

This moves the existing files to the new directory 'cephadmlib' so
we can make the existing code for node-proxy run within the cephadm
agent. Indeed, we can leverage the existing code for the cephadm agent
given that both daemons would achieve the same thing.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit 83661b6c1a25b2d40f3cefa9f5de094c644a1e4e)

17 months ago node-proxy: rename directory
Guillaume Abrioux [Thu, 17 Aug 2023 09:18:10 +0000 (11:18 +0200)]
node-proxy: rename directory

This renames the directory node-proxy > node_proxy.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit 400edcbd05305baed8d790aeefe48958a28d2b18)

17 months ago node-proxy: add unit tests for node-proxy endpoint
Guillaume Abrioux [Thu, 22 Jun 2023 13:54:55 +0000 (15:54 +0200)]
node-proxy: add unit tests for node-proxy endpoint

This adds some unit tests for the node-proxy endpoint recently added to
the mgr.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit 300c99a2f0afd5999938e7e614188b80ee61853b)

17 months ago node-proxy: move administration operations to /admin path
Guillaume Abrioux [Tue, 20 Jun 2023 12:35:02 +0000 (14:35 +0200)]
node-proxy: move administration operations to /admin path

This adds a new path /admin where all administrator operations are grouped.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit 2995c6a277159735002686d48484df7d6ae25ac0)

17 months ago node-proxy: add new endpoint for flushing the data
Guillaume Abrioux [Tue, 20 Jun 2023 12:33:42 +0000 (14:33 +0200)]
node-proxy: add new endpoint for flushing the data

Although this is mostly for devel and debug purposes at the moment,
it might be useful to be able to flush the data whenever the user needs it.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit 6677a6838493d5c6c6600edcf02d17a95f36b965)

17 months ago node-proxy: try to acquire lock early in reporter's loop
Guillaume Abrioux [Tue, 20 Jun 2023 12:24:42 +0000 (14:24 +0200)]
node-proxy: try to acquire lock early in reporter's loop

The lock should be acquired early in this loop.

If the lock gets acquired by another call after we enter that condition *and*
before Reporter.loop() actually acquires it, it can lead to issues if the value
of `data_ready` gets modified during this short amount of time.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit 3f7384c7e1a9656dcc91fcd9e34c9095371a2a1e)
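
Illustration (not part of the commit): a minimal sketch of the race being closed, assuming a `threading.Lock` shared between the collector and the reporter. The class `System` and the function `report_once()` are simplified, hypothetical stand-ins. Only the `data_ready` flag comes from the commit message.

```python
import threading
from typing import Dict, List


class System:
    """Simplified stand-in for the object holding the collected data."""

    def __init__(self) -> None:
        self.lock = threading.Lock()
        self.data: Dict[str, str] = {}
        self.data_ready = False

    def update(self) -> None:
        # The collector holds the lock for the whole refresh.
        with self.lock:
            self.data_ready = False
            self.data['memory'] = 'ok'
            self.data['storage'] = 'ok'
            self.data_ready = True


def report_once(system: System, sent: List[Dict[str, str]]) -> bool:
    """One reporter iteration: take the lock first, then test the flag."""
    # Acquiring the lock before reading data_ready means no other thread
    # can flip the flag between the check and the use of the data.
    with system.lock:
        if not system.data_ready:
            return False
        sent.append(dict(system.data))
        system.data_ready = False
        return True


system = System()
sent: List[Dict[str, str]] = []
before = report_once(system, sent)
system.update()
after = report_once(system, sent)
```

Checking `data_ready` first and taking the lock afterwards would leave a window in which the collector could start a new refresh and invalidate the check.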

17 months ago node-proxy: variabilize the observer_url
Guillaume Abrioux [Tue, 20 Jun 2023 11:33:14 +0000 (13:33 +0200)]
node-proxy: variabilize the observer_url

Create a new parameter in DEFAULT_CONFIG for the reporter agent.
The default value (especially the tcp port) still has to be defined though.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit ecbbcb432f1b4d08f4e2d011d821a30e102dd89a)

17 months ago node-proxy: update endpoint url in Reporter.loop()
Guillaume Abrioux [Tue, 20 Jun 2023 11:31:40 +0000 (13:31 +0200)]
node-proxy: update endpoint url in Reporter.loop()

change the path of the endpoint to something more generic

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit f71dad1a11abf73ab17028e8a983439401c3893f)

17 months ago node-proxy: implement _update_memory() in redfish_dell.py
Guillaume Abrioux [Tue, 20 Jun 2023 11:30:36 +0000 (13:30 +0200)]
node-proxy: implement _update_memory() in redfish_dell.py

This implements the `_update_memory()` method in redfish_dell.py

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit b1d00d9a5a63fed9d866bc7c44c89b0b1580301d)

17 months ago node-proxy: redfish_dell.py refactor
Guillaume Abrioux [Tue, 20 Jun 2023 11:28:55 +0000 (13:28 +0200)]
node-proxy: redfish_dell.py refactor

This commit introduces a small refactor of `redfish_dell.py` in order
to avoid code redundancy.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit c538030f9e70afc687ed1e5734d0d603fc4b0a31)

17 months ago node-proxy: RedfishClient class refactor
Guillaume Abrioux [Fri, 16 Jun 2023 11:09:48 +0000 (13:09 +0200)]
node-proxy: RedfishClient class refactor

This implements the BaseClient class and makes RedfishClient inherit from it.
Same logic as BaseSystem / RedfishSystem given that any other backend could
need to implement a new client for collecting the data.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit 5cd39211401fcbbcb8a8e3441fd42043b45238dd)

17 months ago node-proxy: fix mypy warning regarding Config.logging
Guillaume Abrioux [Fri, 16 Jun 2023 11:07:34 +0000 (13:07 +0200)]
node-proxy: fix mypy warning regarding Config.logging

Config's attributes are dynamically created, so mypy complains.
Using `__dict__['logging']` addresses that.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit 5b6e762383efa7d1e846ac6c3ec1f912f6d60248)
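
Illustration (not part of the commit): a minimal sketch of why mypy complains and how the `__dict__` lookup avoids it. The `Config` class below is a simplified assumption that sets its attributes with `setattr()`. The real class may build them differently.

```python
from typing import Any, Dict


class Config:
    """Simplified stand-in: attributes are created dynamically at runtime."""

    def __init__(self, **options: Any) -> None:
        for key, value in options.items():
            setattr(self, key, value)


config = Config(logging={'level': 20})

# `config.logging` works at runtime, but mypy cannot see the attribute in the
# class body and reports that "Config" has no attribute "logging".
# Going through the instance dictionary is opaque to the type checker:
log_options: Dict[str, Any] = config.__dict__['logging']
```

The trade-off is that the lookup is no longer type-checked at all, and a missing key raises `KeyError` instead of `AttributeError`.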

17 months ago node-proxy: rename server-v2.py
Guillaume Abrioux [Fri, 16 Jun 2023 11:06:03 +0000 (13:06 +0200)]
node-proxy: rename server-v2.py

As the previous version has been removed, let's rename this file.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit 37f33ec87e830989dadd17dbfc0dfde1f58877c1)

17 months ago node-proxy: drop old server.py
Guillaume Abrioux [Fri, 16 Jun 2023 11:04:56 +0000 (13:04 +0200)]
node-proxy: drop old server.py

This version relies on flask.
In the end, we decided to migrate to cherrypy given that
we already use it quite a lot in ceph/ceph.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit 8c1036374008d1422e2f3485012231a3d1da77b8)

17 months ago node-proxy: create entrypoint main()
Guillaume Abrioux [Fri, 16 Jun 2023 09:13:56 +0000 (11:13 +0200)]
node-proxy: create entrypoint main()

This creates a `main()` function in server.py that will be the
entrypoint of node-proxy.

This also implements arg parsing and adds a `--config` parameter
to specify the configuration file.

Finally, this introduces a small refactor of class `Config` and class
`Logger` in util.py because there was a circular dependency between them.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit f2f87f4259bbfe1014f5a2309a82f5b08a8d78d3)

17 months ago node-proxy: rename System to BaseSystem
Guillaume Abrioux [Fri, 16 Jun 2023 06:08:38 +0000 (08:08 +0200)]
node-proxy: rename System to BaseSystem

In order to avoid confusion or redefinition issue with class System()
defined in server.py.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit 3bb2863d5ac14fbadd609cfb3c494acc3ba8c9f0)

17 months ago node-proxy: add a timeout when posting data
Guillaume Abrioux [Thu, 15 Jun 2023 14:23:13 +0000 (16:23 +0200)]
node-proxy: add a timeout when posting data

If this call gets stuck for any reason, the report will block
the whole daemon, given that at this point it has acquired a lock.
We need to make sure this call won't block the daemon for a long time,
so let's add a timeout.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit a3aff1b848a3785dd2e3752a79c8c819e6445239)
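
Illustration (not part of the commit): a minimal sketch of a bounded POST using only the standard library. At this point in the history the reporter still used `requests`, so `urllib` is a swapped-in substitute here, and the function name `post_report()` and the default of 5 seconds are assumptions.

```python
import json
import urllib.request
from typing import Any, Dict


def post_report(url: str, data: Dict[str, Any], timeout: float = 5.0) -> bool:
    """POST the collected data as JSON; give up after `timeout` seconds."""
    body = json.dumps(data).encode('utf-8')
    req = urllib.request.Request(
        url,
        data=body,
        headers={'Content-Type': 'application/json'},
        method='POST',
    )
    try:
        # Without a timeout, a peer that never answers would block the caller
        # (and any lock it holds) indefinitely.
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:
        # Covers URLError, HTTPError and socket timeouts.
        return False
```

Returning `False` instead of raising lets the caller keep the unsent data and try again on the next iteration.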

17 months ago node-proxy: (Redfish_System) reuse the existing client when possible
Guillaume Abrioux [Thu, 15 Jun 2023 14:20:31 +0000 (16:20 +0200)]
node-proxy: (Redfish_System) reuse the existing client when possible

Otherwise, the method start_client() recreates a new client.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit ee1d4e49d1431365ceed4043a59d9f91123c4506)

17 months ago node-proxy: remove a redundant message
Guillaume Abrioux [Thu, 15 Jun 2023 14:19:27 +0000 (16:19 +0200)]
node-proxy: remove a redundant message

This message is not needed given that the same one is already
logged in the RedFishClient class.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit 415dc693ffaab4e6bbcfd5e2891625c4707bd7e3)

17 months ago node-proxy: add requirements.txt
Guillaume Abrioux [Mon, 12 Jun 2023 12:36:54 +0000 (14:36 +0200)]
node-proxy: add requirements.txt

This adds the requirements.txt file in order to manage the required
libraries.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit 31b46ff9b8901d0a54cfedaf219a280c4802676a)

17 months ago node-proxy: add a retry on redfish_client.get_path() calls
Guillaume Abrioux [Fri, 9 Jun 2023 13:03:24 +0000 (15:03 +0200)]
node-proxy: add a retry on redfish_client.get_path() calls

The idea is to retry multiple times before declaring the endpoint
unreachable for good.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit c8f31a1ef01d777e9ef8aae1a895dfcf0a6dea8b)

17 months ago node-proxy: add a decorator 'retry'
Guillaume Abrioux [Fri, 9 Jun 2023 12:58:02 +0000 (14:58 +0200)]
node-proxy: add a decorator 'retry'

This decorator will be useful for calls that should be attempted
multiple times before actually failing.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit 9b88e5a782b2c10ce782ca09ff2bb56bb0a82200)
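
Illustration (not part of the commit): a minimal sketch of what such a decorator can look like. The parameter names (`exceptions`, `retries`, `delay`) and their defaults are assumptions. The signature of the real decorator is not shown in this log.

```python
import functools
import time
from typing import Any, Callable, Dict


def retry(exceptions: Any = Exception, retries: int = 3, delay: float = 0.0) -> Callable:
    """Call the wrapped function up to `retries` times before giving up."""
    def decorator(f: Callable) -> Callable:
        @functools.wraps(f)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            for attempt in range(1, retries + 1):
                try:
                    return f(*args, **kwargs)
                except exceptions:
                    if attempt == retries:
                        # Last attempt: let the caller see the failure.
                        raise
                    time.sleep(delay)
        return wrapper
    return decorator


calls: Dict[str, int] = {'n': 0}


@retry(exceptions=ConnectionError, retries=3)
def flaky() -> str:
    """Fails twice, then succeeds (stand-in for an unreliable endpoint)."""
    calls['n'] += 1
    if calls['n'] < 3:
        raise ConnectionError('endpoint unreachable')
    return 'ok'


result = flaky()
```

Restricting `exceptions` to transient errors keeps programming errors such as `TypeError` from being retried pointlessly.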

17 months ago node-proxy: add type annotation
Guillaume Abrioux [Thu, 8 Jun 2023 16:31:38 +0000 (18:31 +0200)]
node-proxy: add type annotation

This commit adds the type annotation in all files.
This was missing since the initial implementation, let's add
it before the project gets bigger.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit ee8e28baafbe6861a21514c2af05b77a42d6f963)

17 months ago node-proxy: address some flake8 linting errors
Guillaume Abrioux [Thu, 8 Jun 2023 16:22:26 +0000 (18:22 +0200)]
node-proxy: address some flake8 linting errors

This addresses some flake8 errors.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit 4d63a0a18dbbb5259dad098ea0184edd5c3655bb)

17 months ago node-proxy: implement config & logging management
Guillaume Abrioux [Thu, 8 Jun 2023 13:12:16 +0000 (15:12 +0200)]
node-proxy: implement config & logging management

This adds the classes 'Config' and 'Logger' in order to manage
the logging and the configuration within the node-proxy daemon.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit c5acf8183c7d6d02fb8fa301b2acdec096e37059)

17 months ago node-proxy: catch RequestException in reporter
Guillaume Abrioux [Wed, 7 Jun 2023 12:23:57 +0000 (14:23 +0200)]
node-proxy: catch RequestException in reporter

This catches the requests.exceptions.RequestException
exception in the reporter agent so we can better handle the
case where it can't reach the endpoint when trying to send the
collected data.
Before this change, if for some reason the refreshed data couldn't be
sent to the endpoint, it wouldn't have retried because
`self.system.previous_data` was overwritten anyway.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit 6d9198519d7b0d51e00d785d7be1f06e2e7509e3)

17 months ago node-proxy: catch more error in redfish_client
Guillaume Abrioux [Wed, 7 Jun 2023 12:20:07 +0000 (14:20 +0200)]
node-proxy: catch more error in redfish_client

This catches more potential exceptions in the redfish_client
class.
So if an error is caught we can log a more accurate and nicer message.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit c8653e4cf64af5156d571d5e2ffe7e912ac0a78e)

17 months ago node-proxy: add some logging in the reporter agent
Guillaume Abrioux [Mon, 22 May 2023 12:27:48 +0000 (14:27 +0200)]
node-proxy: add some logging in the reporter agent

This adds some calls to the logging module, mostly for
devel/debug purposes at the moment.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit 102a80fc298a4292e14554e7d57db6c541889468)

17 months ago node-proxy: fix a typo in redfish_system.get_status()
Guillaume Abrioux [Mon, 22 May 2023 12:26:54 +0000 (14:26 +0200)]
node-proxy: fix a typo in redfish_system.get_status()

s/Status/status

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit 7d30c787779078b653d29d31be812580a86602d6)

17 months ago node-proxy: redfish_system.get_system refactor
Guillaume Abrioux [Mon, 22 May 2023 12:25:35 +0000 (14:25 +0200)]
node-proxy: redfish_system.get_system refactor

This method should return the 'unified structure' version of the
collected data instead of the huge JSON returned by redfish.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit 9f72e688c79ebf7883801f108cec3772b16e8d3c)

17 months agonode-proxy: add a lock mechanism
Guillaume Abrioux [Mon, 22 May 2023 12:20:54 +0000 (14:20 +0200)]
node-proxy: add a lock mechanism

The loop in the reporter agent has to wait until all the data has been
collected before checking it and pushing it to the ceph-mgr (if needed).
The idea is to use the lock mechanism offered by Python's threading
module.
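
A minimal sketch of that idea using `threading.Lock`, under stated
assumptions: `System`, `collect` and `read_snapshot` are hypothetical names,
and the component list is only illustrative. It is not the node-proxy
implementation.

```python
import threading
from typing import Any, Dict


class System:
    """Hypothetical collector whose data is guarded by a lock."""

    def __init__(self) -> None:
        self.lock = threading.Lock()
        self.data: Dict[str, Any] = {}

    def collect(self) -> None:
        # Hold the lock for the whole refresh so readers never see partial data.
        with self.lock:
            self.data = {}
            for component in ("processors", "network", "storage"):
                self.data[component] = f"{component} collected"


def read_snapshot(system: System) -> Dict[str, Any]:
    # The reporter side blocks here until any refresh in progress has finished.
    with system.lock:
        return dict(system.data)
```

The reporter loop would call something like `read_snapshot()` before
comparing and pushing the data, so it never acts on a half-filled structure.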

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit fe03bf3676ee2b351a0155491bc5eb4bb7b3d1a3)

17 months agonode-proxy: migrate to cherrypy
Guillaume Abrioux [Mon, 22 May 2023 12:19:09 +0000 (14:19 +0200)]
node-proxy: migrate to cherrypy

cherrypy is already widely used in Ceph.
Let's not add new dependencies and use cherrypy instead of
python-flask.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit 213320d33863b2e76bb81e8de33bb78d0970dd28)

17 months agonode-proxy: add method start_client() redfish_system class
Guillaume Abrioux [Mon, 22 May 2023 12:15:05 +0000 (14:15 +0200)]
node-proxy: add method start_client() redfish_system class

This is going to be useful for a new endpoint '/start'

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit 6f9d3d9e15305e80ea5797ff2f0dd0b929e70822)

17 months agonode-proxy: drop redfish_system._process_redfish_system method
Guillaume Abrioux [Mon, 22 May 2023 12:09:03 +0000 (14:09 +0200)]
node-proxy: drop redfish_system._process_redfish_system method

This method isn't needed, let's drop it.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit 9b2a3345b6e52b152ebf680abf319300dad513d2)

17 months agonode-proxy: display error messages when Exception is caught
Guillaume Abrioux [Thu, 11 May 2023 11:29:05 +0000 (13:29 +0200)]
node-proxy: display error messages when Exception is caught

This is mostly for development purposes.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit 4b9a4ec55fbcbeed48b1dc01594cf8ed65a23ef5)

17 months agonode-proxy: merge self._system with current values
Guillaume Abrioux [Thu, 11 May 2023 11:25:36 +0000 (13:25 +0200)]
node-proxy: merge self._system with current values

Otherwise `self._system` gets reset in each iteration.
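
As a rough illustration of the difference between resetting and merging,
assuming `_system` is a plain dict (`SystemState` and `update` are
hypothetical names, not taken from the code base):

```python
from typing import Any, Dict


class SystemState:
    """Hypothetical holder for the collected system data."""

    def __init__(self) -> None:
        self._system: Dict[str, Any] = {}

    def update(self, collected: Dict[str, Any]) -> None:
        # Merge instead of assigning, so keys gathered in earlier
        # iterations survive; newer values win on key collisions.
        self._system = {**self._system, **collected}
```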

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit 6ae1687f5f27b6d03dd2c46735de837c7429ae5b)

17 months agonode-proxy: add normalize_dict() function
Guillaume Abrioux [Thu, 11 May 2023 11:23:22 +0000 (13:23 +0200)]
node-proxy: add normalize_dict() function

This makes sure all keys are converted to
lowercase.
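
One possible shape for such a function, as a sketch only: the real
`normalize_dict()` signature and behaviour may differ. Recursing into nested
dicts and lists, and converting non-string keys with `str()`, are
assumptions made here.

```python
from typing import Any


def normalize_dict(data: Any) -> Any:
    """Return a copy of data with every dict key converted to lowercase."""
    if isinstance(data, dict):
        # Lowercase this level's keys and recurse into the values.
        return {str(key).lower(): normalize_dict(value)
                for key, value in data.items()}
    if isinstance(data, list):
        # Lists may contain dicts whose keys also need lowercasing.
        return [normalize_dict(item) for item in data]
    # Scalars are returned unchanged.
    return data
```

Only keys are changed; values such as `"OK"` keep their original case.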

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit 15b1122f7af6df3faac21ed4f1834ae826f5abb3)

17 months agonode-proxy: split RedfishSystem class
Guillaume Abrioux [Thu, 6 Apr 2023 15:29:28 +0000 (17:29 +0200)]
node-proxy: split RedfishSystem class

This class should be split because the logic will be different depending on the
hardware.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit 4454c64f94bb2fd93ed796a832dedb42faecf3f9)

17 months agonode-proxy: implement storage endpoint
Guillaume Abrioux [Thu, 6 Apr 2023 12:56:48 +0000 (14:56 +0200)]
node-proxy: implement storage endpoint

This adds the required logic for the endpoint '/system/storage'
to gather and return data about physical drives.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit d919132be3e69e5494e44519e46a864833802b96)

17 months agonode-proxy: implement network endpoint
Guillaume Abrioux [Thu, 6 Apr 2023 12:55:41 +0000 (14:55 +0200)]
node-proxy: implement network endpoint

This adds the required logic for the endpoint '/system/network'
to gather and return data about network interfaces.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit 4b9bc24f0dd1ab261c4b87667c6cdb77d3785185)

17 months agonode-proxy: implement processors endpoint
Guillaume Abrioux [Thu, 6 Apr 2023 12:53:41 +0000 (14:53 +0200)]
node-proxy: implement processors endpoint

This adds the required logic for the endpoint '/system/processors'
to gather and return data about CPUs.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit 693a05a0cb38f3ca91d8aa3a67c0da23a491aa23)

17 months agonode-proxy: use `use_reloader=False`
Guillaume Abrioux [Wed, 5 Apr 2023 12:18:19 +0000 (14:18 +0200)]
node-proxy: use `use_reloader=False`

This prevents the server from restarting in a loop
when an error shows up. Otherwise, it creates a bunch of new
redfish client sessions and quickly makes the redfish API unavailable
due to the session limit.
Probably not intended to be kept.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit dcbdfd33feda40df82dc898011bfbd690c7aca31)

17 months agonode-proxy: add a /shutdown endpoint
Guillaume Abrioux [Wed, 5 Apr 2023 12:16:29 +0000 (14:16 +0200)]
node-proxy: add a /shutdown endpoint

Add a '/shutdown' endpoint to force the client to log out and delete its current
session.
This is for devel purposes and probably not intended to be kept.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit e06e65b78bdf44d38a3d47ab2040dc88e5cd130f)

17 months agonode-proxy: logout from redfish api on Exception
Guillaume Abrioux [Wed, 5 Apr 2023 12:14:40 +0000 (14:14 +0200)]
node-proxy: logout from redfish api on Exception

Otherwise it ends up creating a new session each time while the previous session
is left open. After multiple failures, it reaches the session limit and the
leftover sessions need to be cleaned up manually.
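
The pattern can be sketched as follows. `FakeRedfishClient`, `get_path` and
`fetch` are hypothetical stand-ins used to keep the example self-contained;
they are not the node-proxy classes or a real redfish client API.

```python
from typing import Any, Dict


class FakeRedfishClient:
    """Hypothetical client that counts the sessions it has left open."""

    def __init__(self, fail: bool) -> None:
        self.fail = fail
        self.open_sessions = 0

    def login(self) -> None:
        self.open_sessions += 1

    def logout(self) -> None:
        self.open_sessions -= 1

    def get_path(self, path: str) -> Dict[str, Any]:
        if self.fail:
            raise RuntimeError(f"request to {path} failed")
        return {"path": path}


def fetch(client: FakeRedfishClient, path: str) -> Dict[str, Any]:
    client.login()
    try:
        return client.get_path(path)
    except Exception:
        # Close the session before propagating the error, so repeated
        # failures do not pile up sessions on the remote side.
        client.logout()
        raise
```

On success the session stays open for later requests; on failure it is
closed before the exception reaches the caller.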

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit 7c602947e45ceff719c105b0277a10a6e72831e5)

17 months agonode-proxy: variabilize the system_endpoint
Guillaume Abrioux [Wed, 5 Apr 2023 12:10:41 +0000 (14:10 +0200)]
node-proxy: variabilize the system_endpoint

This makes it possible to define the value of the 'System endpoint',
which can differ depending on the hardware.

This probably means that the class `RedfishSystem` itself should be split.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit e80cd0286a34a352fc098d72a9740e25156de9a8)

17 months agonode-proxy: improve logging
Guillaume Abrioux [Wed, 5 Apr 2023 12:08:38 +0000 (14:08 +0200)]
node-proxy: improve logging

This adds a new file `util.py` with a logger function in order
to improve the logging a bit.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit 906286426f02e068f5f8379e9330b2dcbaace050)

17 months agonode-proxy: various unified interface changes
Guillaume Abrioux [Tue, 21 Mar 2023 06:07:54 +0000 (07:07 +0100)]
node-proxy: various unified interface changes

This slightly modifies the data structure of the unified interface.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit b853761836febe92f6460a13d554cd966ff2e529)
(cherry picked from commit ecc84f5b5aa8f8e45d5068956117ca793f805e18)

17 months agoFirst hardware-monitoring draft version
Redouane Kachach [Wed, 8 Mar 2023 14:27:57 +0000 (15:27 +0100)]
First hardware-monitoring draft version

Signed-off-by: Redouane Kachach <rkachach@redhat.com>
(cherry picked from commit 1c402576529edafdb8aa0aef241965e06fa4c151)

17 months agoMerge pull request #54629 from guits/wip-63599-reef
Guillaume Abrioux [Wed, 24 Jan 2024 15:28:41 +0000 (16:28 +0100)]
Merge pull request #54629 from guits/wip-63599-reef

reef: ceph-volume: fixes fallback to stat in is_device and is_partition

17 months agoMerge pull request #54705 from k0ste/wip-63312-reef
Guillaume Abrioux [Wed, 24 Jan 2024 15:28:31 +0000 (16:28 +0100)]
Merge pull request #54705 from k0ste/wip-63312-reef

reef: ceph-volume: fix a bug in _check_generic_reject_reasons

17 months agoMerge pull request #55282 from zdover23/wip-doc-2024-01-24-backport-55278-to-reef
zdover23 [Wed, 24 Jan 2024 05:29:57 +0000 (15:29 +1000)]
Merge pull request #55282 from zdover23/wip-doc-2024-01-24-backport-55278-to-reef

reef: doc: specify correct fs type for mkfs

Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>
Reviewed-by: Ilya Dryomov <idryomov@redhat.com>

17 months agodoc: specify correct fs type for mkfs 55282/head
Himura Kazuto [Tue, 23 Jan 2024 12:59:10 +0000 (12:59 +0000)]
doc: specify correct fs type for mkfs

The default value is ext2, which is not supported (anymore?).

Signed-off-by: Vladislav Glagolev <vladislav.glagolev@devexpress.com>
(cherry picked from commit 886af37744847246b3e70f54b8577ed4f9815c20)

17 months agoMerge pull request #55271 from zdover23/wip-doc-2024-01-23-backport-55269-to-reef
Anthony D'Atri [Tue, 23 Jan 2024 14:16:10 +0000 (09:16 -0500)]
Merge pull request #55271 from zdover23/wip-doc-2024-01-23-backport-55269-to-reef

reef: doc/radosgw: edit "read/write global rate limit" admin.rst

17 months agodoc/radosgw: edit "read/write global rate limit" admin.rst 55271/head
Zac Dover [Tue, 23 Jan 2024 02:13:10 +0000 (12:13 +1000)]
doc/radosgw: edit "read/write global rate limit" admin.rst

Edit "Reading/Writing Global Rate Limit Configuration" in
doc/radosgw/admin.rst.

Signed-off-by: Zac Dover <zac.dover@proton.me>
(cherry picked from commit c67a5e5d4bad17e7ae799dd62a66d1e23ec18942)

18 months agoMerge pull request #55263 from zdover23/wip-doc-2024-01-22-backport-54993-to-reef
Anthony D'Atri [Sun, 21 Jan 2024 21:57:11 +0000 (16:57 -0500)]
Merge pull request #55263 from zdover23/wip-doc-2024-01-22-backport-54993-to-reef

reef: doc/rados/operations: document `ceph balancer status detail`

18 months agodoc/rados/operations: document `ceph balancer status detail` 55263/head
Laura Flores [Fri, 22 Dec 2023 22:55:29 +0000 (22:55 +0000)]
doc/rados/operations: document `ceph balancer status detail`

Document change in https://github.com/ceph/ceph/pull/54801

Signed-off-by: Laura Flores <lflores@ibm.com>
(cherry picked from commit 159751b68085fbe0fe10a881ff8bedecda11142f)

18 months agoMerge pull request #55260 from zdover23/wip-doc-2024-01-21-backport-55190-to-reef
Anthony D'Atri [Sun, 21 Jan 2024 15:33:40 +0000 (10:33 -0500)]
Merge pull request #55260 from zdover23/wip-doc-2024-01-21-backport-55190-to-reef

reef: doc/radosgw: edit "Enable/Disable Bucket Rate Limit"