PendingReleaseNotes: add rbd_diff_iterate2 note

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 40e8813e9c705838eee42c98de717b20851aed72)

Conflicts:
PendingReleaseNotes [ moved to >=18.2.2 section ]

librbd: try to preserve object map for diff-iterate in fast-diff mode

As an optimization, try to ensure that the object map for the end
version is preloaded through the acquisition of exclusive lock and
as a consequence remains around until exclusive lock is released.
If it's not around, DiffRequest would (re)load it on each call.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 89b0d9e7b40a5f962094428e613315d3697d261f)

librbd/object_map: potentially use in-memory object map in DiffRequest

If the object map for the end version is around (already loaded in
memory, either due to the end version being a snapshot or due to
exclusive lock being held), use it to run diff-iterate against the
beginning of time. Since it's the only object map needed in that
case, such calls would be satisfied locally.

Fixes: https://tracker.ceph.com/issues/63341
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 0c4bb58c900efa2356ea8526d3432b2922787afa)

librbd/object_map: decouple object map processing in DiffRequest

In preparation for potentially using in-memory object map, decouple
object map processing from loading object maps and place the logic in
prepare_for_object_map() and process_object_map().

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit dabb677ba5923f347c5b4b81b6a86214699a52bf)

common/bit_vector: fix iterator vs reference constness confusion

T (ConstIterator or Iterator) is confused with const T here:
IteratorImpl dereference operator is wrongly overloaded on const
and returns Reference instead of ConstReference for ConstIterator.
This then fails inside bufferlist bowels because Reference is
incompatible with bufferlist::const_iterator.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 45d534553abaca81e26574fd5a7b17b9219c0dd0)

librbd/object_map: make object map in handle_load_object_map() local

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 232ad1a5fb6248d7b3fbfaec5944a90a71a95806)

librbd/object_map: don't resize object map in handle_load_object_map()

Currently it's done in two cases:

- if the loaded object map is larger than expected based on byte size,
  it's truncated to expected number of objects
- in case of deep-copy, if the loaded object map is smaller than diff
  state, it's expanded to get "track the largest of all versions in the
  set" semantics

Both of these cases can be easily dealt with without modifying the
object map.  Being able to process a const object map is needed for
working on in-memory object map which is external to DiffRequest.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 275a299cd48d2ddac36608d6633a6b79c8927351)

common/bit_vector: fix IteratorImpl post-increment operator

It's totally broken: instead of returning the current position and
moving to the next position, it returns the next position and doesn't
move anywhere. Luckily it hasn't been used until now.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 2ab5b52f71c88cb55f8ed82f1dfd0115fdd6e022)

librbd: drop DiffIterate::diff_object_map() declaration

This is a leftover from commit 2b3a46801d39 ("librbd: switch
diff-iterate API to use new object-map diff helper").

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 1503b96bf91d358540fc56a69b3cf18aa7eab68e)

librbd: propagate diff-iterate range to parent in fast-diff mode

When getting parent diff, pass the overlap-reduced image extent instead
of the entire 0..overlap range to avoid a similar quadratic slowdown on
cloned images.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 7677d4b1b52ab68484545d0fcd7348f2f8e9f263)

librbd/object_map: add support for ranged diff-iterate

Currently diff-iterate in fast-diff mode is performed on the entire
image no matter what image extent is passed to the API.  Then, unused
diff just gets discarded as DiffIterate ends up querying only objects
that the passed image extent maps to.  This hasn't been an issue for
internal consumers ("rbd du", "rbd diff", etc) because they work on the
entire image, but turns out to lead to quadratic slowdown in some QEMU
use cases.

0..UINT64_MAX range is carved out for deep-copy which is unranged by
definition.  To get effectively unranged diff-iterate, 0..UINT64_MAX-1
range can be used.

Fixes: https://tracker.ceph.com/issues/63341
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 0b5ba5fedf704ada74a65108af129eae6baea5c5)

include/intarith: introduce round_down_to()

Same as with round_up_to(), d isn't required to be a power of two.
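
As an illustration only, the arithmetic can be restated in Python. The real
helpers are C++ and live in include/intarith; the bodies below are an
assumption based on the description above, not a copy of that code.

```python
def round_up_to(n: int, d: int) -> int:
    """Round n up to the nearest multiple of d; d need not be a power of two."""
    return n if n % d == 0 else n + d - n % d


def round_down_to(n: int, d: int) -> int:
    """Round n down to the nearest multiple of d; d need not be a power of two."""
    return n - n % d
```

Using the modulo operator rather than a bit mask is what lifts the
power-of-two restriction on d.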

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 94bf3a5d74cfd61e0bd881ca2863fe2205f27767)

test/librbd: expand TestMockObjectMapDiffRequest edge case coverage

For each covered edge case or error, run through the following
scenarios:

- where the edge case concerns snap_id_start
- where the edge case concerns snap_id_end
- where the edge case concerns intermediate snapshot and
  snap_id_start == 0 (diff against the beginning of time)
- where the edge case concerns intermediate snapshot and
  snap_id_start != 0 (diff from snapshot)

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 9931282bfd2260d654325970555cab8c617e8f14)

librbd/object_map: allow intermediate snaps to be skipped on diff-iterate

In case of diff-iterate against the beginning of time, the result
depends only on the end version. Loading and processing object maps
for intermediate snapshots is redundant and can be skipped.

This optimization is made possible by commit be507aaed15f ("librbd:
diff-iterate shouldn't ever report "new hole" against a hole") and, to
a lesser extent, the previous commit.

Getting FastDiffInvalid, LoadObjectMapError and ObjectMapTooSmall to
pass required tweaking not just expectations, but also start/end snap
ids and thus also the meaning of these tests. This is addressed in the
next commit.

Fixes: https://tracker.ceph.com/issues/63341
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 23c675f04a4b4bb00794bbc59493d9591625a0a7)

librbd/object_map: resurrect diff-iterate behavior when image is shrunk

The new "track the largest of all versions in the set, diff state is
only ever grown" semantics introduced in commit 330f2a7bb94f ("librbd:
helper state machine for computing diffs between object-maps") don't
make sense for diff-iterate. It's a waste because DiffIterate won't
query beyond the end version size -- this is baked into the API.

Limit this behavior to deep-copy and resurrect the original behavior
from 2015 for diff-iterate.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 19c7c4a5359fa9c1d06cc11187e300251249ad9e)

librbd/object_map: fix diff from snapshot when image is grown

Commit 399a45e11332 ("librbd/object_map: rbd diff between two
snapshots lists entire image content") fixed most of the damage caused
by commit b81cd2460de7 ("librbd/object_map: diff state machine should
track object existence"), but the case of a "resize diff" when diffing
from snapshot was missed. An area that was freshly allocated in image
resize is the same in principle as a freshly created image and objects
marked OBJECT_EXISTS_CLEAN are no exception. Diff for such objects in
such an area should be set to DIFF_STATE_DATA_UPDATED, however
currently when diffing from snapshot, it's set to DIFF_STATE_DATA.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 34386d29a8b96de79c38465062b8e93e7fc6e184)

librbd/object_map: drop bogus if in handle_load_object_map()

It became redundant with commit b81cd2460de7 ("librbd/object_map: diff
state machine should track object existence") -- it != end_it condition
in the loop is sufficient.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 4e036d65b9146f28a2f6c0dfb353120baea8d62d)

test/librbd: refactor TestMockObjectMapDiffRequest tests

In preparation for multiple similarly configured MockTestImageCtx
objects being used in a single test, centralize their creation and add
a couple of helpers for setting expectations from a callback.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 718f6b5546890179f66d5ffadbae9e9cb0e6c97b)

test/librbd: improve TestMockObjectMapDiffRequest.InvalidStartSnap

Use a range where only snap_id_start is invalid.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 64a5afcaad7b61a20f630aca064c2953380c70c1)

Merge pull request #55405 from guits/node-proxy-reef

reef: orch: implement hardware monitoring

Reviewed-by: Adam King <adking@redhat.com>

Merge pull request #55375 from rkachach/fix_issue_64176

reef backport: rook e2e testing related PRs

mgr/rook: adding deployment to ceph image built for rook e2e testing
Fixes: https://tracker.ceph.com/issues/64286
using reef image as base

Signed-off-by: Redouane Kachach <rkachach@redhat.com>

Merge pull request #55361 from rhcs-dashboard/wip-63426-reef

reef: mgr/dashboard: get object bucket policies for a bucket

Reviewed-by: Avan Thakkar <athakkar@redhat.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>

Merge pull request #54951 from idryomov/wip-53897-reef

reef: librbd: don't report HOLE_UPDATED when diffing against a hole

Reviewed-by: Mykola Golub <mgolub@suse.com>

backport mgr/rook: adding metrics monitoring e2e testing
Fixes: https://tracker.ceph.com/issues/64247
Signed-off-by: Redouane Kachach <rkachach@redhat.com>

backport mgr/rook: adding some basic rook e2e testing
Fixes: https://tracker.ceph.com/issues/64176
Signed-off-by: Redouane Kachach <rkachach@redhat.com>

backport mgr/rook: always recreate kvm default network + fix groups refresh
Fixes: https://tracker.ceph.com/issues/64079
This change also includes:
- adding ~/.local/bin to path so behave binary can be found
- adding requirements.txt file for testing dependencies
- increasing timeout used to wait for tools deployment to 90s
- increasing timeout used to wait for kvm network to 20s

Signed-off-by: Redouane Kachach <rkachach@redhat.com>

Merge pull request #55316 from ajarr/wip-64181-reef

reef: rbd-nbd: fix resize of images mapped using netlink

Reviewed-by: Ilya Dryomov <idryomov@redhat.com>

Merge pull request #55173 from ronen-fr/wip-64019-reef

reef: osd/scrub: increasing max_osd_scrubs to 3

Reviewed-by: Sridhar Seshasayee <sseshasa@redhat.com>

Merge pull request #55399 from zdover23/wip-doc-2024-01-31-backport-55396-to-reef

reef: doc/architecture: improve some paragraphs

doc: add node-proxy documentation

This commit adds some documentation about the
'hardware inventory / monitoring' feature (node-proxy agent).

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit b7c0a6a5b0e7d6ba063e1dd1715f938ecf7ec55d)

doc/architecture: improve some paragraphs

Improve paragraphs under the heading "The Ceph Storage Cluster". Remove
a sentence that was pleonastic in its context in the paragraph.

Signed-off-by: Zac Dover <zac.dover@proton.me>
(cherry picked from commit 2f0542d66901295cf875893de0ac15304578d917)

Merge pull request #55384 from zdover23/wip-doc-2024-01-31-backport-55372-to-reef

reef: doc/architecture.rst - fix typo

Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>

doc/architecture.rst - fix typo

s/requies/requires

Signed-off-by: Zac Dover <zac.dover@proton.me>
(cherry picked from commit 6c0417fbcbe6b9760b3836e5166d6bd929578096)

Merge pull request #55335 from guits/wip-64197-reef

reef: ceph-volume: use 'no workqueue' options with dmcrypt

agent/node-proxy: fix wrong host name used in data endpoint

data['cephx']['name'] will return something like:

node-proxy.hostname123

the prefix "node-proxy." has to be removed, otherwise there will be
a mismatch with what is actually expected.
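
A minimal sketch of the prefix handling described above. The helper name
`host_from_cephx_name` is hypothetical and only illustrates the idea; it is
not taken from the agent code.

```python
def host_from_cephx_name(cephx_name: str, prefix: str = 'node-proxy.') -> str:
    """Return the bare host name from a cephx entity name (hypothetical helper)."""
    # Strip the prefix only when it is actually present, so a plain
    # host name passes through unchanged.
    if cephx_name.startswith(prefix):
        return cephx_name[len(prefix):]
    return cephx_name
```

With this, `host_from_cephx_name('node-proxy.hostname123')` yields
`'hostname123'`.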

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit 224dd36df9c57e2162407ea0b98598f401884060)

cephadm: rewrite NodeProxy class for reef

Reef lacks some of the refactoring work in main that allowed writing
these daemon classes in a more standard way, so the NodeProxy class
is rewritten for reef.

Signed-off-by: Adam King <adking@redhat.com>

mgr/cephadm: update node-proxy unit tests

The recent migration to a separate daemon implied
some changes which have broken these tests.
This commit fixes them.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit 40fe3147a84452d1409f7b792736e090b01c7674)

mgr/cephadm: add a new config option 'oob_default_addr'

There is now a default value (169.254.1.1), which is the default
address for the 'OS to iDrac pass-through' interface.
Given that node-proxy will reach the RedFish API through this interface,
users no longer need to pass that address when providing the host spec
at bootstrap time.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit b09fd672c9838a091d6779047f3292acbb62070d)

Merge pull request #55040 from sseshasa/wip-63910-reef

reef: osd: Tune snap trim item cost to reflect a PGs' average object size for mClock scheduler

Reviewed-by: Laura Flores <lflores@redhat.com>

Merge pull request #55311 from afreen23/wip-64178-reef

reef: mgr/dashboard: Fix inconsistency in capitalisation of "Multi-site"

Reviewed-by: Nizamudeen A <nia@redhat.com>

Merge pull request #55362 from ljflores/wip-64234-reef

reef: mgr: pin pytest to version 7.4.4

Reviewed-by: Nizamudeen A <nia@redhat.com>

mgr: pin pytest to version 7.4.4

On 2024-01-27, pytest updated to 8.0.0,
which broke run-tox-mgr.

https://docs.pytest.org/en/stable/changelog.html

==================================== ERRORS ====================================
_____________________ ERROR collecting alerts/__init__.py ______________________
alerts/__init__.py:2: in <module>
    from .module import Alerts
alerts/module.py:6: in <module>
    from mgr_module import CLIReadCommand, HandleCommandResult, MgrModule, Option
mgr_module.py:1: in <module>
    import ceph_module  # noqa
E   ModuleNotFoundError: No module named 'ceph_module'
______________________ ERROR collecting alerts/module.py _______________________
alerts/module.py:6: in <module>
    from mgr_module import CLIReadCommand, HandleCommandResult, MgrModule, Option
mgr_module.py:1: in <module>
    import ceph_module  # noqa
E   ModuleNotFoundError: No module named 'ceph_module'
____________________ ERROR collecting balancer/__init__.py _____________________
balancer/__init__.py:2: in <module>
    from .module import Module
balancer/module.py:12: in <module>
    from mgr_module import CLIReadCommand, CLICommand, CommandResult, MgrModule, Option, OSDMap, CephReleases
mgr_module.py:1: in <module>
    import ceph_module  # noqa
E   ModuleNotFoundError: No module named 'ceph_module'
_____________________ ERROR collecting balancer/module.py ______________________
balancer/module.py:12: in <module>
    from mgr_module import CLIReadCommand, CLICommand, CommandResult, MgrModule, Option, OSDMap, CephReleases
mgr_module.py:1: in <module>
    import ceph_module  # noqa
E   ModuleNotFoundError: No module named 'ceph_module'

Fixes: https://tracker.ceph.com/issues/64200
Signed-off-by: Laura Flores <lflores@ibm.com>
(cherry picked from commit 5554e565ca7ca9c9d6bc70e245be63d947722eda)

Merge pull request #55167 from kamoltat/wip-ksirivad-backport-reef-52380

reef: mon/ConnectionTracker.cc: disregard connection scores from mon_rank = -1

Reviewed-by: Laura Flores <lflores@redhat.com>

mgr/dashboard: get object bucket policies for a bucket

Getting the bucket details will also fetch the bucket policy if it's set.

Fixes: https://tracker.ceph.com/issues/63221
Signed-off-by: Nizamudeen A <nia@redhat.com>
(cherry picked from commit 40f053aee0d3504d34545101a546b3eaf64f50d1)

Merge pull request #53972 from Matan-B/wip-63180-reef

reef: osd/OSD: introduce reset_purged_snaps_last

Reviewed-by: Samuel Just <sjust@redhat.com>
Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>

Merge pull request #55343 from zdover23/wip-doc-2024-01-29-backport-55341-to-reef

reef: doc/architecture.rst: improve rados definition

doc/architecture.rst: improve rados definition

Improve the definition of RADOS, and link to information about RADOS.

Signed-off-by: Zac Dover <zac.dover@proton.me>
(cherry picked from commit 763f6b7a22e846962f388d58fd7e699cbf16ffe7)

Merge pull request #55338 from zdover23/wip-doc-2024-01-28-backport-55333-to-reef

reef: doc/radosgw: fix verb disagreement - index.html

doc/radosgw: fix verb disagreement - index.html

Fix a tricky verb disagreement and rewrite a few sentences for what I
hope is greater clarity.

Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com>
Signed-off-by: Zac Dover <zac.dover@proton.me>
(cherry picked from commit 9f271093f4331381dc024cb4309f9f486d366818)

ceph-volume: fix partitions support in disk.get_devices()

The following:
```
is_part = get_file_contents(os.path.join(_sys_dev_block_path, item, 'partition')) == "1"
```
assumes any `/sys/dev/block/x:y/partition` contains '1' which is wrong.
This file actually contains the corresponding partition number.
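
One way to express the corrected check, as a sketch only: treat a device as
a partition when its sysfs `partition` file exists and holds a positive
number. The function name `is_partition` and its error handling are
assumptions, not the code from disk.get_devices().

```python
import os


def is_partition(sys_dev_block_path: str, item: str) -> bool:
    """Hypothetical check based on the sysfs 'partition' file."""
    path = os.path.join(sys_dev_block_path, item, 'partition')
    try:
        with open(path) as f:
            # The file holds the partition number (1, 2, 3, ...),
            # so comparing against "1" only matches the first partition.
            return int(f.read().strip()) > 0
    except (OSError, ValueError):
        # No such file (whole disk) or unexpected content.
        return False
```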

Fixes: https://tracker.ceph.com/issues/64195
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit f72100bbd17539d9774ae72215afefee16f20775)

ceph-volume: use 'no workqueue' options with dmcrypt

CloudFlare engineers did some testing and found that using
workqueues with encryption on flash devices hurts performance.

See [1] for details.

With this patch, ceph-volume calls cryptsetup with the
`--perf-no_read_workqueue` and `--perf-no_write_workqueue` options
when the device is not rotational.

[1] https://blog.cloudflare.com/speeding-up-linux-disk-encryption/
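
A rough sketch of how the options could be selected. Both helper names and
the `luksOpen` command layout are assumptions made for illustration; only
the two `--perf-*` flags come from the description above.

```python
from typing import List


def dmcrypt_perf_options(rotational: bool) -> List[str]:
    """Extra cryptsetup options for a device (hypothetical helper)."""
    if rotational:
        # Spinning disks keep the default workqueue behavior.
        return []
    return ['--perf-no_read_workqueue', '--perf-no_write_workqueue']


def luks_open_command(device: str, mapping: str, rotational: bool) -> List[str]:
    """Build an illustrative cryptsetup command line (not executed here)."""
    return ['cryptsetup', *dmcrypt_perf_options(rotational),
            'luksOpen', device, mapping]
```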

Fixes: https://tracker.ceph.com/issues/64195
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
Co-Authored-by: Stefan Kooman <stefan@kooman.org>
(cherry picked from commit 0985e201342fa53c014a811156aed661b4b8f994)

ceph-volume: fix util.get_partitions

The current logic makes it report only the first
partitions of devices.

Fixes: https://tracker.ceph.com/issues/63086
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit b14ff07e6344d9f097259265d468f6300818b053)

Merge pull request #55321 from zdover23/wip-doc-2024-01-26-backport-55307-to-reef

reef: doc/radosgw: edit "Usage" admin.rst

doc/radosgw: edit "Usage" admin.rst

Edit "Usage" in doc/radosgw/admin.rst.

Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com>
Signed-off-by: Zac Dover <zac.dover@proton.me>
(cherry picked from commit d8df6f61e817a34c2c3282224cff117ae43e3f98)

node-proxy: collect `LocationIndicatorActive` property (storage)

This makes node-proxy collect the `LocationIndicatorActive`
property for storage component.
This can be needed for the Blinkenlight feature.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit d4cfc5a96c9e6d04dedb21e7788325d7b00c533a)

node-proxy: add new attribute to BaseRedfishSystem()

This adds `self.component_list()` in order to parametrize
which categories the agent will collect.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit b49216bf8bb01fc8f11f4575cca644bd3ead5f5a)

node-proxy: add packaging related changes

This adds the required changes to build an RPM of node-proxy.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit 940ce782b5843ef1c0a80a74c5ad2af3f635a8b9)

node-proxy: reduce log level in reporter agent

The following messages get logged quite a lot, although they are
not very useful in a normal situation:

```
2024-01-12 09:09:40,604 - reporter - INFO - data ready to be sent to the mgr.
2024-01-12 09:09:40,604 - reporter - INFO - no diff, not sending data to the mgr.
2024-01-12 09:10:15,022 - reporter - INFO - data ready to be sent to the mgr.
2024-01-12 09:10:15,022 - reporter - INFO - no diff, not sending data to the mgr.
...
```

This commit changes the log level to DEBUG.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit b4091600f696fa8c3577876e071af3d53024f56f)

node-proxy: fix a thread/locking issue

This `sleep(5)` should be initiated *after* the lock is released.
Otherwise, it can cause trouble for the reporter loop, which may
never be able to acquire the lock.
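
The pattern can be sketched as follows. The names `collector_loop`, `lock`
and `shared` are hypothetical, and `stop.wait(interval)` stands in for the
`sleep(5)` so the loop can be interrupted.

```python
import threading

lock = threading.Lock()
shared = {'data': 0}


def collector_loop(stop: threading.Event, interval: float = 5.0) -> None:
    """Hypothetical update loop: hold the lock only while touching shared state."""
    while not stop.is_set():
        with lock:
            shared['data'] += 1
        # The pause happens after the `with` block, i.e. once the lock
        # is released, so another thread can acquire it in the meantime.
        stop.wait(interval)
```

Had the pause been placed inside the `with` block, the lock would be held
for nearly the whole cycle and a competing thread could starve.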

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit 06a4a637b5988a1b6d7bae5d74ae140ff9ba83b6)

node-proxy: address a typo

while checking logs, I noticed the following message:

```
2024-01-12 09:08:03,751 - reporter - INFO - Reporter url set to https:10.10.10.11:7150/node-proxy/data
```

Although this is only a cosmetic issue as this variable
is only used for logging messages, let's fix it.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit 1c4a212eb8d9608630c518cbbf46ab97051b1bc0)

node-proxy: make it a separate daemon

The current implementation requires the inclusion of all the recent
modifications in the cephadm binary, which won't be backported.

Since we need the node-proxy code backported to reef, let's move the
code and make it a separate daemon.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
Co-authored-by: Adam King <adking@redhat.com>
(cherry picked from commit 7e6bc179ae7e0d633bd63086775002182c861d3f)

node-proxy: rename attribute and class

This renames the mgr's NodeProxyCache attribute from
`self.node_proxy` to `self.node_proxy_cache` and the
class `NodeProxy` in agent.py to `NodeProxyEndpoint`
to make it clearer and avoid confusion.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit c62d1c82cf6155aba5e75e88ff3390ed5288e758)

node-proxy: enhance debug log messages for locking operations

This commit updates the debug log messages in the BaseRedfishSystem
and Reporter classes. The adjustments made enhance the clarity and
precision of the messages by specifically identifying acquired
and released locks, detailing their context, thereby improving the
understanding of the control flow during locking operations
in these components.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit e68dceb1d2d6b4e6871c77465e1e23f2e726f84c)

node-proxy: explicitly set NodeProxy's attributes

The current logic using `setattr()` makes mypy complain:

"NodeProxy" has no attribute "xxx"

Using `self.__dict__['xxx']` addresses this mypy error but the
downside is that the code is less clear and less readable.

Explicitly setting the different attributes makes the code clearer
and more readable.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit e71bf838428c297075df5515c342c0db0a9e31e3)

cephadm/tests: add pyyaml dependency

node-proxy requires this dependency so it needs to be added as
dependency for tox testing.

Typical failure:

```
ImportError while importing test module '/root/ceph/src/cephadm/tests/test_agent.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/lib64/python3.9/importlib/__init__.py:127: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/test_agent.py:10: in <module>
    _cephadm = import_cephadm()
tests/fixtures.py:14: in import_cephadm
    import cephadm as _cephadm
cephadm.py:32: in <module>
    from cephadmlib.node_proxy.main import NodeProxy
cephadmlib/node_proxy/main.py:2: in <module>
    from .redfishdellsystem import RedfishDellSystem
cephadmlib/node_proxy/redfishdellsystem.py:2: in <module>
    from .baseredfishsystem import BaseRedfishSystem
cephadmlib/node_proxy/baseredfishsystem.py:2: in <module>
    from .basesystem import BaseSystem
cephadmlib/node_proxy/basesystem.py:2: in <module>
    from .util import Config
cephadmlib/node_proxy/util.py:2: in <module>
    import yaml
E   ModuleNotFoundError: No module named 'yaml'
```

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit 6e7ea5172ac489c01e0a073acf869bcf6982a2b4)

node-proxy: send oob management requests to the MgrListener()

Note that this won't be a true out of band management.
In the case where the host hangs, this won't work. The oob
management should be reached directly but most of the time
the oob network is isolated. The idea is to send queries to the
tcp server exposed by the cephadm agent (MgrListener) so it
can itself send queries to the redfish API using the IP address
exposed on the OS.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit 323c8cb0fbee134b11267fdf1e2cbc26cda8b08a)

cephadm: add `types-PyYAML` dependency in mypy testing

In order to address the following error:

```
cephadmlib/node_proxy/util.py:2: error: Library stubs not installed for "yaml" (or incompatible with Python 3.9)
cephadmlib/node_proxy/util.py:2: note: Hint: "python3 -m pip install types-PyYAML"
cephadmlib/node_proxy/util.py:2: note: (or run "mypy --install-types" to install all missing stub packages)
cephadmlib/node_proxy/util.py:2: note: See https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-imports
```

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit 47e7d3ddac1fd61b57149e5e4305bb9e819ae52e)

node-proxy: address flake8 errors in tests

This addresses a lot of flake8 errors in node-proxy tests:

E121 continuation line under-indented for hanging indent

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit f2c809e33f4999c4c64c58112cd94835a3b4ba24)

node-proxy: move the output formatting logic to orchestrator

Implementing this in the cephadm module doesn't follow the general idea
of the orchestrator interface. This is where the output formatting should
be done so let's move the logic to the orchestrator module.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit aa170850b8e7f63120169b89e39b21bee2c5287e)

node-proxy: address a typing issue in agent.NodeProxy.query()

The current logic supports str and bytes types for parameter
`data`. This doesn't make sense, let's drop this logic.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit 6cdb6f65b4ab84fcc5484ee7c6b940dd27b29587)

node-proxy: address flake8 'Q000' warnings

This addresses the flake8 warning 'Q000':

`Q000 Double quotes found but single quotes preferred`

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit 6cd42b73cfa843301ac8f58fe4f39eaf0b855b66)

node-proxy: code change for hdd blinkenlight pre-requisites

This is mainly for anticipating the case where hdd blinkenlight via RedFish
works (testing has to be done). This introduces the required changes so the
endpoint `/led` can support blinkenlight for both chassis and disks.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit febfe0bf7588705785047bec49bf1a970ce180eb)

node-proxy: Add a `NodeProxyManager` class

The current approach with `init_node_proxy()` and `node_proxy_loop_check()`
is 'cumbersome' and gives the heebie-jeebies.

Sub-classing `Thread()` makes the code a bit clearer and more readable.
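
For illustration, a generic version of that shape. `PeriodicWorker` is a
hypothetical stand-in and does not reproduce `NodeProxyManager`; it only
shows setup and the periodic loop living in one `Thread` subclass.

```python
import threading


class PeriodicWorker(threading.Thread):
    """Hypothetical thread that owns its own periodic loop."""

    def __init__(self, interval: float = 1.0) -> None:
        super().__init__(daemon=True)
        self.interval = interval
        self.stop_event = threading.Event()
        self.iterations = 0

    def run(self) -> None:
        # Initialization and the periodic check sit in a single method
        # instead of being split across two free functions.
        while not self.stop_event.is_set():
            self.iterations += 1
            self.stop_event.wait(self.interval)

    def shutdown(self) -> None:
        """Ask the loop to exit at its next wake-up."""
        self.stop_event.set()
```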

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit 2a840d7ce4d64dd38f7aea381207bbecfef629cd)

rbd-nbd: log errors during netlink_resize() using derr

When using rbd CLI to map the images to NBD devices via netlink,
any errors that arose during image resizing in netlink_resize()
were not logged. Switching the error logging from using cerr to
derr helps log the errors from netlink_resize().

Signed-off-by: Ramana Raja <rraja@redhat.com>
(cherry picked from commit 1712b95c784c5ce381fbf4b09e8219ea40bd99a8)

rbd_nbd: fix resize of images mapped using netlink

Include device identifier or cookie in the message sent to the kernel
to resize images mapped to NBD devices using netlink. Otherwise,
netlink_resize() fails and the size of the device isn't updated.

Fixes: https://tracker.ceph.com/issues/64139
Signed-off-by: Ramana Raja <rraja@redhat.com>
(cherry picked from commit 1eebb7ba7903c6db0ab37a0457b263a1b2b00ff5)

cephadm: gracefully shutdown the agent prior to removing

When the agent is removed, the daemon is abruptly stopped.
Since the node-proxy logic runs from within the cephadm agent,
it leaves an active RedFish session. The idea is to gracefully
shutdown the agent so node-proxy can catch that event and make sure
it closes the current active RedFish session prior to shutting down.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit 79bfe642001a7f9e1da28f987d1edb45174f6e86)

orch/cephadm: add json format support to `ceph orch hardware`

This adds `--format json` option support to the `ceph orch hardware` CLI
command.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit 3a38755aef724d3039dc303a210b0972f7a71e63)

node-proxy: update the data structure for summary report

This extends the current data structure for the 'summary' report.
It adds `sn` (serial number information) and the `firmwares` dict
to the current data structure.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit 61d07e0a441aafd84a463868f777d6091f6e92fe)

node-proxy: drop local API

This was intended to address the case where the Ceph
manager can't talk directly to the oob management tool because
of network restrictions (subnets not interconnected, etc.).

If for any reason the host is stuck or unreachable, that local API won't
be helpful anyway, as a result any actions the Ceph mgr would be asked
to perform on the node would fail.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit 3607a305cf629f54a8c3c3f52e56d41210c21d28)

node-proxy: change 'idrac' terminology

The 'idrac' terminology is too specific, let's change this
to something more generic.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit a69b267cb9c425facf6fe68daa08993cf81d0816)

node-proxy: raise HTTPError 404 error when no host is found

Raise a 404 HTTPError when these different endpoints
are passed a nonexistent hostname.
Otherwise the code will fail with a `KeyError` exception.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit bda2568d8c73bbdc77897d22a99f74c1f4a23511)

node-proxy: run only when idrac details provided

This agent shouldn't run when no idrac details are
available.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit a21779c39d495a06f6f908594c541e5aa818b4f6)

cephadm: inventory.NodeProxyCache() refactor

This modifies fullreport(), summary() and common() methods
so they use the same logic as firmwares() and criticals()

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit 1675b6fe4ee3a6c43204ddf698c845b09ab7a2db)

cephadm/agent: add docstring to NodeProxy class

This documents that part of the code and might
help to generate an API spec and documentation.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit 5e1051bdbdc4d1720aa58f5b584df89bd1dd3d6d)

node-proxy: implement criticals endpoint

This adds the required changes in order to implement the endpoint
'/criticals'.

The goal of this endpoint is to provide a report of all critical statuses
for either a given host or all hosts across the cluster.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit ae791f8721027a9a508c7cd27e85f86f6fe7c492)

node-proxy: validate_node_proxy_data() refactor

raise cherrypy.HTTPError() when the received data is
not valid instead of returning `self.validate_msg`

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit b5814cd9278c857b7e09a1dbe229a7cdead10a29)

node-proxy: implement http_query() helper function

so we can drop the dependency on `requests` and
use the same helper function from both reporter.py and redfish_client.py

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit cae0e5e510eb3bad5132deb0332942aa294c6e8b)

node-proxy: address mypy and flake8 errors

This addresses some flake8 and python typing errors.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit 5b6f18d7ad602921a25c8b8acfaf7b454cdbba0b)

node-proxy: fetch idrac details from NodeProxyCache()

The class `NodeProxyCache()` is intended for that, it already
has this information so there's no need to make a call to `get_store()`
each time we want to access idrac details.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit a0f96aa5f1a27ec84e09f0bd030f62e39203e4f7)

node-proxy: parametrize idrac port

This adds the missing piece to make the idrac port
a parameter that one can customize.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit 69f1272cbf036f8388398093def5136f420635f5)

cephadm: add new option to CLI

this adds the `--deploy-cephadm-agent` option to the cephadm
CLI's bootstrap subcommand.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit 4c3979788fccbe01ff23163ea61cbdf8f74d9cbd)

node-proxy: implement /led endpoint

This is the first 'act on node' feature implementation.

This adds the endpoint /led

A GET request to this endpoint returns the current status
of the enclosure LED.
A PATCH request to this endpoint sets the
enclosure LED status.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit 76dd9aa47095f1fca644879656b1fe17a033b9c4)

node-proxy: drop dispatch() in NodeProxy()

The current logic prevents the use of any cherrypy decorators
on actual endpoints as we use a set of 'proxy functions'
(index and dispatch) instead.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit 1ec59e6625bae7cd381d83817196bf8669f641ad)

node-proxy: local API (NodeProxy) refactor

- subclass cherrypy._cpserver.Server,
- drop cherrypy.quickstart() call,
- drop nested classes approach,
- make it run over https
- print tracebacks when an exception is raised

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit 1c79d6493ac35ae0394c492616f95220fbe1fbb4)

node-proxy: clean up node_proxy dir

This removes a legacy file that is not needed any longer.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit fe41c29d9a135815c5b1937589aa31066763be63)

node-proxy: collect firmwares details

This makes all the required changes in order to support
collecting, pushing and exposing data regarding firmwares
status and versions for all the underlying hardware.
This also refactors the redfish dell corresponding logic:
Having so many nested/inheritance classes seems unnecessary.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit a9afa2f6adad2cff04b54bfd69e8883b4b9fb1cb)

node-proxy: update the JSON data structure

Change the data structure from:
```
{
  "storage": "ok",
  "processors": "ok",
  "network": "ok",
  "memory": "ok",
  "power": "ok",
  "fans": "ok"
}
```
to:

```
{
    "host": "node1",
    "sn": "xxxx",
    "status": {
        "storage": {
        }
    }
}
```

This provides a unique key (sn), which is more reliable, at the top
level of the dict.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit 65d3f781f92505eb708716eb281c670a71ed503c)

node-proxy: quick clean up

This removes some files which are not needed any longer.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit dcfeea4ea15d8bb566c4d40bc1ab2013a9c044a1)