Ilya Dryomov [Sat, 6 Jan 2024 16:08:04 +0000 (17:08 +0100)]
librbd: try to preserve object map for diff-iterate in fast-diff mode
As an optimization, try to ensure that the object map for the end
version is preloaded through the acquisition of exclusive lock and
as a consequence remains around until exclusive lock is released.
If it's not around, DiffRequest would (re)load it on each call.
Ilya Dryomov [Sat, 6 Jan 2024 16:05:39 +0000 (17:05 +0100)]
librbd/object_map: potentially use in-memory object map in DiffRequest
If the object map for the end version is around (already loaded in
memory, either due to the end version being a snapshot or due to
exclusive lock being held), use it to run diff-iterate against the
beginning of time. Since it's the only object map needed in that
case, such calls would be satisfied locally.
Ilya Dryomov [Fri, 5 Jan 2024 12:15:54 +0000 (13:15 +0100)]
librbd/object_map: decouple object map processing in DiffRequest
In preparation for potentially using in-memory object map, decouple
object map processing from loading object maps and place the logic in
prepare_for_object_map() and process_object_map().
Ilya Dryomov [Fri, 5 Jan 2024 11:23:24 +0000 (12:23 +0100)]
common/bit_vector: fix iterator vs reference constness confusion
T (ConstIterator or Iterator) is confused with const T here:
IteratorImpl dereference operator is wrongly overloaded on const
and returns Reference instead of ConstReference for ConstIterator.
This then fails inside bufferlist bowels because Reference is
incompatible with bufferlist::const_iterator.
Ilya Dryomov [Thu, 4 Jan 2024 10:39:20 +0000 (11:39 +0100)]
librbd/object_map: don't resize object map in handle_load_object_map()
Currently it's done in two cases:
- if the loaded object map is larger than expected based on byte size,
it's truncated to expected number of objects
- in case of deep-copy, if the loaded object map is smaller than diff
state, it's expanded to get "track the largest of all versions in the
set" semantics
Both of these cases can be easily dealt with without modifying the
object map. Being able to process a const object map is needed for
working on in-memory object map which is external to DiffRequest.
It's totally broken: instead of returning the current position and
moving to the next position, it returns the next position and doesn't
move anywhere. Luckily it hasn't been used until now.
Ilya Dryomov [Thu, 28 Dec 2023 09:14:18 +0000 (10:14 +0100)]
librbd: propagate diff-iterate range to parent in fast-diff mode
When getting parent diff, pass the overlap-reduced image extent instead
of the entire 0..overlap range to avoid a similar quadratic slowdown on
cloned images.
Ilya Dryomov [Wed, 27 Dec 2023 17:07:05 +0000 (18:07 +0100)]
librbd/object_map: add support for ranged diff-iterate
Currently diff-iterate in fast-diff mode is performed on the entire
image no matter what image extent is passed to the API. Then, unused
diff just gets discarded as DiffIterate ends up querying only objects
that the passed image extent maps to. This hasn't been an issue for
internal consumers ("rbd du", "rbd diff", etc) because they work on the
entire image, but turns out to lead to quadratic slowdown in some QEMU
use cases.
0..UINT64_MAX range is carved out for deep-copy which is unranged by
definition. To get effectively unranged diff-iterate, 0..UINT64_MAX-1
range can be used.
Ilya Dryomov [Sat, 23 Dec 2023 14:19:09 +0000 (15:19 +0100)]
test/librbd: expand TestMockObjectMapDiffRequest edge case coverage
For each covered edge case or error, run through the following
scenarios:
- where the edge case concerns snap_id_start
- where the edge case concerns snap_id_end
- where the edge case concerns intermediate snapshot and
snap_id_start == 0 (diff against the beginning of time)
- where the edge case concerns intermediate snapshot and
snap_id_start != 0 (diff from snapshot)
Ilya Dryomov [Sat, 23 Dec 2023 13:47:54 +0000 (14:47 +0100)]
librbd/object_map: allow intermediate snaps to be skipped on diff-iterate
In case of diff-iterate against the beginning of time, the result
depends only on the end version. Loading and processing object maps
or intermediate snapshots is redundant and can be skipped.
This optimization is made possible by commit be507aaed15f ("librbd:
diff-iterate shouldn't ever report "new hole" against a hole") and, to
a lesser extent, the previous commit.
Getting FastDiffInvalid, LoadObjectMapError and ObjectMapTooSmall to
pass required tweaking not just expectations, but also start/end snap
ids and thus also the meaning of these tests. This is addressed in the
next commit.
Ilya Dryomov [Fri, 22 Dec 2023 17:50:20 +0000 (18:50 +0100)]
librbd/object_map: resurrect diff-iterate behavior when image is shrunk
The new "track the largest of all versions in the set, diff state is
only ever grown" semantics introduced in commit 330f2a7bb94f ("librbd:
helper state machine for computing diffs between object-maps") don't
make sense for diff-iterate. It's a waste because DiffIterate won't
query beyond the end version size -- this is baked into the API.
Limit this behavior to deep-copy and resurrect the original behavior
from 2015 for diff-iterate.
Ilya Dryomov [Fri, 22 Dec 2023 15:10:12 +0000 (16:10 +0100)]
librbd/object_map: fix diff from snapshot when image is grown
Commit 399a45e11332 ("librbd/object_map: rbd diff between two
snapshots lists entire image content") fixed most of the damage caused
by commit b81cd2460de7 ("librbd/object_map: diff state machine should
track object existence"), but the case of a "resize diff" when diffing
from snapshot was missed. An area that was freshly allocated in image
resize is the same in principle as a freshly created image and objects
marked OBJECT_EXISTS_CLEAN are no exception. Diff for such objects in
such an area should be set to DIFF_STATE_DATA_UPDATED, however
currently when diffing from snapshot, it's set to DIFF_STATE_DATA.
Ilya Dryomov [Wed, 20 Dec 2023 11:22:17 +0000 (12:22 +0100)]
librbd/object_map: drop bogus if in handle_load_object_map()
It became redundant with commit b81cd2460de7 ("librbd/object_map: diff
state machine should track object existence") -- it != end_it condition
in the loop is sufficient.
In preparation for multiple similarly configured MockTestImageCtx
objects being used in a single test, centralize their creation and add
a couple of helpers for setting expectations from a callback.
bacport mgr/rook: always recreate kvm default network + fix groups refresh Fixes: https://tracker.ceph.com/issues/64079
This change also includes:
- adding ~/.local/bin to path so behave binary can be found
- adding requirements.txt file for testing dependencies
- increasing timeout used to wait for tools deployment to 90s
- increasing timeout used to wait for kvm network to 20s
mgr/cephadm: add a new config option 'oob_default_addr'
So there's a default value (169.254.1.1) which is the default
address for the 'OS to iDrac pass-through' interface.
Given that node-proxy will reach the RedFish API through this interface,
we can make users avoid to pass that addr when providing the host spec
at bootstrap time.
Laura Flores [Mon, 29 Jan 2024 00:58:25 +0000 (00:58 +0000)]
mgr: pin pytest to version 7.4.4
On 2024-01-27, pytest updated to 8.0.0,
which broke run-tox-mgr.
https://docs.pytest.org/en/stable/changelog.html
==================================== ERRORS ====================================
_____________________ ERROR collecting alerts/__init__.py ______________________
alerts/__init__.py:2: in <module>
from .module import Alerts
alerts/module.py:6: in <module>
from mgr_module import CLIReadCommand, HandleCommandResult, MgrModule, Option
mgr_module.py:1: in <module>
import ceph_module # noqa
E ModuleNotFoundError: No module named 'ceph_module'
______________________ ERROR collecting alerts/module.py _______________________
alerts/module.py:6: in <module>
from mgr_module import CLIReadCommand, HandleCommandResult, MgrModule, Option
mgr_module.py:1: in <module>
import ceph_module # noqa
E ModuleNotFoundError: No module named 'ceph_module'
____________________ ERROR collecting balancer/__init__.py _____________________
balancer/__init__.py:2: in <module>
from .module import Module
balancer/module.py:12: in <module>
from mgr_module import CLIReadCommand, CLICommand, CommandResult, MgrModule, Option, OSDMap, CephReleases
mgr_module.py:1: in <module>
import ceph_module # noqa
E ModuleNotFoundError: No module named 'ceph_module'
_____________________ ERROR collecting balancer/module.py ______________________
balancer/module.py:12: in <module>
from mgr_module import CLIReadCommand, CLICommand, CommandResult, MgrModule, Option, OSDMap, CephReleases
mgr_module.py:1: in <module>
import ceph_module # noqa
E ModuleNotFoundError: No module named 'ceph_module'
ceph-volume: fix partitions support in disk.get_devices()
The following:
```
is_part = get_file_contents(os.path.join(_sys_dev_block_path, item, 'partition')) == "1"
```
assumes any `/sys/dev/block/x:y/partition` contains '1' which is wrong.
This file actually contains the corresponding partition number.
ceph-volume: use 'no workqueue' options with dmcrypt
CloudFlare engineers made some testing and realized that using
workqueues with encryption on flash devices has a bad effect.
See [1] for details.
With this patch it will make ceph-volume call crypsetup with
`--perf-no_read_workqueue` and `--perf-no_write_workqueue` options
when the device is not a rotational.
the following messages get logged quite a lot while
this is not a very useful information in a normal situation:
```
2024-01-12 09:09:40,604 - reporter - INFO - data ready to be sent to the mgr.
2024-01-12 09:09:40,604 - reporter - INFO - no diff, not sending data to the mgr.
2024-01-12 09:10:15,022 - reporter - INFO - data ready to be sent to the mgr.
2024-01-12 09:10:15,022 - reporter - INFO - no diff, not sending data to the mgr.
...
```
This `sleep(5)` should be initiated *after* the lock is released.
Otherwise, it can cause troubles with the reporter loop which can
never acquire the lock.
The current implementation requires the inclusion of all the recent
modifications in the cephadm binary, which won't be backported.
Since we need the node-proxy code backported to reef, let's move the
code make it a separate daemon.
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com> Co-authored-by: Adam King <adking@redhat.com>
(cherry picked from commit 7e6bc179ae7e0d633bd63086775002182c861d3f)
This renames the mgr's NodeProxyCache attribute from
`self.node_proxy` to `self.node_proxy_cache` and the
class `NodeProxy` in agent.py from `NodeProxy` to
`NodeProxyEndpoint` to make it clearer and avoid confusion.
node-proxy: enhance debug log messages for locking operations
This commit updates the debug log messages in the BaseRedfishSystem
and Reporter classes. The adjustments made enhance the clarity and
precision of the messages by specifically identifying acquired
and released locks, detailing their context, thereby improving the
understanding of the control flow during locking operations
in these components.
node-proxy requires this dependency so it needs to be added as
dependency for tox testing.
Typical failure:
```
ImportError while importing test module '/root/ceph/src/cephadm/tests/test_agent.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/lib64/python3.9/importlib/__init__.py:127: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
tests/test_agent.py:10: in <module>
_cephadm = import_cephadm()
tests/fixtures.py:14: in import_cephadm
import cephadm as _cephadm
cephadm.py:32: in <module>
from cephadmlib.node_proxy.main import NodeProxy
cephadmlib/node_proxy/main.py:2: in <module>
from .redfishdellsystem import RedfishDellSystem
cephadmlib/node_proxy/redfishdellsystem.py:2: in <module>
from .baseredfishsystem import BaseRedfishSystem
cephadmlib/node_proxy/baseredfishsystem.py:2: in <module>
from .basesystem import BaseSystem
cephadmlib/node_proxy/basesystem.py:2: in <module>
from .util import Config
cephadmlib/node_proxy/util.py:2: in <module>
import yaml
E ModuleNotFoundError: No module named 'yaml'
```
node-proxy: send oob management requests to the MgrListener()
Note that this won't be a true out of band management.
In the case where the host hangs, this won't work. The oob
management should be reached directly but most of the time
the oob network is isolated. The idea is to send queries to the
the tcp server exposed by the cephadm agent (MgrListener) so it
can send itself queries to the redfish API using the IP address
exposed on the OS.
node-proxy: move the output formatting logic to orchestrator
Implementing this in the cephadm module doesn't follow the general idea
of the orchestrator interface. This is where the output formatting should
be done so let's move the logic to the orchestrator module.
node-proxy: code change for hdd blinkenlight pre-requisites
This is mainly for anticipating the case where hdd blinkenlight via RedFish
works (testing has to be done). This introduces the required changes so the
endpoint `/led` can support blinkenlight for both chassis and disks.
Ramana Raja [Tue, 23 Jan 2024 21:07:04 +0000 (16:07 -0500)]
rbd-nbd: log errors during netlink_resize() using derr
When using rbd CLI to map the images to NBD devices via netlink,
any errors that arose during image resizing in netlink_resize()
were not logged. Switching the error logging from using cerr to
derr helps log the errors from netlink_resize().
Ramana Raja [Mon, 22 Jan 2024 22:06:58 +0000 (17:06 -0500)]
rbd_nbd: fix resize of images mapped using netlink
Include device identifier or cookie in the message sent to the kernel
to resize images mapped to NBD devices using netlink. Otherwise,
netlink_resize() fails and the size of the device isn't updated.
cephadm: gracefully shutdown the agent prior to removing
When the agent is removed, the daemon is abruptly stopped.
Since the node-proxy logic runs from within the cephadm agent,
it leaves an active RedFish session. The idea is to gracefully
shutdown the agent so node-proxy can catch that event and make sure
it closes the current active RedFish session prior to shutting down.
node-proxy: update the data structure for summary report
This extends the current data structure for the 'summary' report.
It adds `sn` (serial number information) and the `firmwares` dict
to the current data structure.
This was intented to address the case where the Ceph
manager can't talk directly to the oob management tool because
of network restrictions (subnets not inter-connecter, etc.).
If for any reason the host is stuck or unreachable, that local API won't
be helpful anyway, as a result any actions the Ceph mgr would be asked
to perform on the node would fail.
node-proxy: fetch idrac details from NodeProxyCache()
The class ` NodeProxyCache()` is intended for that, it already
has this information so there's no need to make a call to `get_store()`
each time we want to access idrac details.
This is the first 'act on node' feature implementation.
This adds the endpoint /led
a GET request to this endpoint returns the current status
of the enclosure LED.
a PATCH request to this endpoint allows to set the
enclosure LED status.
- subclass cherrypy._cpserver.Server,
- drop cherrypy.quickstart() call,
- drop nested classes approach,
- make it run over https
- print tracebacks when an exception is raised
This makes all the required changes in order to support
collecting, pushing and exposing data regarding firmwares
status and versions for all the underlying hardware.
This also refactors the redfish dell corresponding logic:
Having so many nested/inheritance classes seems unnecessary.