Nizamudeen A [Sun, 8 May 2022 14:27:34 +0000 (19:57 +0530)]
mgr/dashboard: smart data for devices with scsi protocol
In the dashboard, we've been showing smart data only for hdd devices with the ata
protocol. Otherwise we show a "No Smart Data found" error, which is
clearly misleading since Smart Data is returned in the api call.
So this PR shows the smart data for hdd devices
that use the scsi protocol too.
Nizamudeen A [Fri, 6 May 2022 15:19:18 +0000 (20:49 +0530)]
mgr/dashboard: fix smart data error
The error in the log was:
```
"/usr/share/ceph/mgr/dashboard/services/ceph_service.py", line 253, in _get_smart_data_by_device
May 06 07:38:39 occldlr750-1.occl208.lab conmon[2142938]: svc_type, svc_id = daemon.split('.')
May 06 07:38:39 occldlr750-1.occl208.lab conmon[2142938]: ValueError: too many values to unpack (expected 2)
```
On the cluster, in the output of `ceph device ls-by-host`, the first device
is used by a mon whose daemon name is mon.occldlr750-1.occl208.lab.
In our dashboard code, when fetching the smart data we have a line like
this:
`svc_type, svc_id = daemon.split('.')`
So for the mon, the output of `daemon.split('.')` will be ['mon', 'occldlr750-1', 'occl208', 'lab']. The svc_id itself contains dots, so it gets split into three parts and the two-variable unpacking fails. I am changing that to split only on the first occurrence of the dot and then consider everything that comes after the dot as the svc_id of the device.
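A minimal illustration of the fix described above, using the mon daemon name from the log. `split_daemon_name` is a hypothetical helper name, not the dashboard's actual function; the only behavioral point is the `maxsplit=1` argument to `str.split`.

```python
from typing import Tuple


def split_daemon_name(daemon: str) -> Tuple[str, str]:
    """Split 'type.id' on the first dot only, so ids that contain dots stay intact."""
    # maxsplit=1 yields at most two parts: the service type and everything after it.
    svc_type, svc_id = daemon.split('.', 1)
    return svc_type, svc_id


# The old unbounded split breaks a FQDN-style mon id into four parts:
print('mon.occldlr750-1.occl208.lab'.split('.'))  # ['mon', 'occldlr750-1', 'occl208', 'lab']

# Splitting on the first dot keeps the full id:
print(split_daemon_name('mon.occldlr750-1.occl208.lab'))  # ('mon', 'occldlr750-1.occl208.lab')
print(split_daemon_name('osd.3'))  # ('osd', '3')
```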
Nizamudeen A [Thu, 5 May 2022 17:43:38 +0000 (23:13 +0530)]
mgr/dashboard: devices with same UID causes multiselection
In the Physical Disks page, multiple devices are coming in with the
same uid, and that causes the selection to go berserk and select multiple
rows with the same UID. The uid is generated in the frontend service call
itself. I just added some more parameters to it in order to make it more
unique.
The second issue is the number of selected items getting multiplied
exponentially. It's because each time the table is updated or refreshed,
we push the row again along with the items that were already selected before,
and that causes the number of selected items to multiply.
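A sketch of why a richer uid helps, assuming the rows carry a device id, a host and a daemon name. The field names and the `make_uid` helper are hypothetical; the commit does not say which extra parameters were added to the frontend uid.

```python
from typing import Dict, List


def make_uid(devid: str, host: str, daemon: str) -> str:
    """Hypothetical composite row key; the device id alone may repeat across rows."""
    return f"{devid}-{host}-{daemon}"


rows: List[Dict[str, str]] = [
    {"devid": "DISK-123", "host": "node1", "daemon": "osd.0"},
    {"devid": "DISK-123", "host": "node1", "daemon": "mon.node1"},
]

old_uids = {r["devid"] for r in rows}  # devid-only key: both rows share one uid
new_uids = {make_uid(r["devid"], r["host"], r["daemon"]) for r in rows}
print(len(old_uids), len(new_uids))  # 1 2
```

With distinct keys, selecting one row no longer matches its sibling.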
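A simplified model of the selection bug, under the assumption that each refresh re-pushes the previous selection on top of itself. Both functions are hypothetical and only illustrate the doubling and one way to avoid it (rebuilding the selection from the current rows and deduplicating).

```python
from typing import List


def refresh_buggy(selected: List[str]) -> List[str]:
    # The previous selection is pushed again on top of itself, so the list doubles.
    return selected + list(selected)


def refresh_fixed(selected: List[str], current_uids: List[str]) -> List[str]:
    # Rebuild from scratch: keep only uids still in the table, deduplicated, order kept.
    present = set(current_uids)
    return list(dict.fromkeys(uid for uid in selected if uid in present))


sel = ["a"]
for _ in range(3):
    sel = refresh_buggy(sel)
print(len(sel))  # 8
```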
In `master` the milestone step exits and causes the remaining tasks not to run. I previously tried the `continue-on-error` flag, but it didn't work, so let's try putting those steps at the end.
Laura Flores [Mon, 16 May 2022 22:59:42 +0000 (17:59 -0500)]
qa/suites/rados/thrash-erasure-code-big/thrashers: add `osd max backfills` setting to mapgap and pggrow
All `rados/thrash-erasure-code-big` tests that die due to the “wait_for_recovery” timeout have one thing in common: They contain either `thrashers/pggrow` or `thrashers/mapgap`.
The difference between pggrow and mapgap vs. all other non-offending thrashers (default, careful, fastread, and morepggrow) is that they lack an override setting for `osd max backfills`. `osd max backfills` is the max number of backfill operations allowed to/from an OSD. The higher the number, the quicker the recovery. By default, this value is 1. On all of the non-offending thrashers (default, careful, fastread, and morepggrow), the default 1 value gets overridden in their .yaml files with a value > 1. This is not the case for pggrow and mapgap, however, as they lack an `osd max backfills` override setting.
The mclock op scheduler is known to override `osd max backfills` with a high value, but all of the thrash-erasure-code-big thrashers have their op queue set to “debug_random”, which chooses randomly between op queues (the debug_random op queue is set to override the default mclock_scheduler in qa/config/rados.yaml). So, coupled with the “debug_random” op queue, the low `osd max backfills` setting is causing some tests to time out in recovery.
WITHOUT `osd max backfills`, as they are now, “mapgap” and “pggrow” tests die due to timed-out recovery about 17/100 times, as seen here with a pggrow test: http://pulpito.front.sepia.ceph.com/lflores-2022-05-18_14:24:29-rados:thrash-erasure-code-big-master-distro-default-smithi/
WITH `osd max backfills` specified, as I have suggested in this PR, 99/100 tests passed, with one test failing for a different reason:
http://pulpito.front.sepia.ceph.com/lflores-2022-05-17_22:40:27-rados:thrash-erasure-code-big-master-distro-default-smithi/
I also scheduled 145 tests WITH `osd max backfills` that are a mix of pggrow and mapgap thrashers. 144/145 tests passed, with one test failing for a different reason. http://pulpito.front.sepia.ceph.com/lflores-2022-05-17_15:27:54-rados:thrash-erasure-code-big-master-distro-default-smithi/
Fixes: https://tracker.ceph.com/issues/51076
Signed-off-by: Laura Flores <lflores@redhat.com>
(cherry picked from commit 40062676c2ceed49b9fa147127ffa83ba6118e2a)
Yaarit Hatuka [Wed, 23 Mar 2022 17:08:59 +0000 (13:08 -0400)]
doc/releases: update telemetry commands
Telemetry is already an 'always-on' module, so there is no need to enable it.
In Quincy, when telemetry is off, we use preview / preview-all to get a
sample report, and show / show-all to see what is actually being
reported.
but gcc-toolset-8-annobin provides this file. Upgrading to
gcc-toolset-11 does not help; see https://centos.pkgs.org/8-stream/centos-appstream-x86_64/gcc-toolset-11-annobin-plugin-gcc-10.23-1.el8.x86_64.rpm.html
So the intermediate solution would be to disable the plugin if
we want to use gcc-toolset to build rpm packages.
In this change, _annotated_build is undefined to prevent the compiler
from adding extra information to the binary. In general this change
should be safe, but without this information it'd be hard to tell whether
the binary is hardened or what ABI version it expects. See
also https://fedoraproject.org/wiki/Changes/Annobin
John Mulligan [Mon, 11 Apr 2022 19:32:42 +0000 (15:32 -0400)]
pybind/mgr: add a wrapper exception for use with Responder
In order to best convert a "real" exception into something
that can be cleanly sent as the mgr response, this new exception
type can be invoked directly, or via the wrap method to automatically
pull as many properties as possible from the original exception.
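A minimal sketch of such a wrapper exception. `WrappedError`, its attributes, and the `-EINVAL` default are hypothetical illustrations, not the mgr's actual class; the assumption is that a mgr reply is a `(return value, body, status)` triple and that an `errno` attribute on the original exception (as on `OSError`) is worth carrying over.

```python
import errno
from typing import Optional, Tuple


class WrappedError(Exception):
    """Hypothetical exception that can render itself as a mgr-style reply."""

    def __init__(self, status: str, return_value: Optional[int] = None) -> None:
        super().__init__(status)
        self.status = status
        # Assumed default when no better error code is known.
        self.return_value = return_value if return_value is not None else -errno.EINVAL

    def format_response(self) -> Tuple[int, str, str]:
        # (return value, output body, status message)
        return self.return_value, "", self.status

    @classmethod
    def wrap(cls, exc: Exception) -> "WrappedError":
        if isinstance(exc, cls):
            return exc  # already in the right shape
        # Pull what we can from the original: an errno attribute becomes a
        # negative return value, and the message becomes the status.
        code = getattr(exc, "errno", None)
        retval = -abs(code) if isinstance(code, int) else None
        return cls(str(exc), return_value=retval)


err = WrappedError.wrap(FileNotFoundError(errno.ENOENT, "no such file"))
print(err.format_response())
```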
John Mulligan [Mon, 11 Apr 2022 19:16:34 +0000 (15:16 -0400)]
pybind/mgr: add format arg to Responder's extra args
To ensure that the Responder can make use of a user-provided `--format=`
parameter even if the programmer doesn't explicitly add one to the
args of an endpoint function, we set the `extra_args` attribute on
our wrapper function so that CLICommand can later extract it.
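A sketch of the idea, assuming `extra_args` maps an argument name to its type. The `responder` function here is a hypothetical, stripped-down stand-in for the Responder decorator; only the attribute name `extra_args` comes from the commit.

```python
import functools
from typing import Any, Callable


def responder(func: Callable[..., Any]) -> Callable[..., Any]:
    """Hypothetical decorator that consumes a `format` argument itself."""

    @functools.wraps(func)
    def wrapper(*args: Any, format: str = "json", **kwargs: Any) -> Any:
        # `format` is handled here; the endpoint never sees it.
        result = func(*args, **kwargs)
        return f"{format}:{result}"

    # Advertise the extra argument so a CLICommand-like registrar can expose
    # `--format=` even though the endpoint function does not declare it.
    wrapper.extra_args = {"format": str}  # type: ignore[attr-defined]
    return wrapper


@responder
def status() -> str:
    return "ok"


print(status.extra_args)
print(status(format="yaml"))
```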
John Mulligan [Mon, 11 Apr 2022 19:03:12 +0000 (15:03 -0400)]
pybind/mgr: enhance CLICommand to fetch extra args from wrapped funcs
Previously, the CLICommand decorator "assumed" that the decorator was
applied directly to a mgr module api endpoint function. Now that we plan
on adding the Responder decorator into the mix we need a way of
properly fetching the arguments of the endpoint function. In addition,
the decorator itself needs to provide extra arguments to the mgr
(in cases where the endpoint function doesn't explicitly ask for it).
Thus we add a helper function to find the endpoint function when
wrapped as well as extract extra arguments when "walking" the stack
of __wrapped__ functions.
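The "walk" can be sketched with the `__wrapped__` attribute that `functools.wraps` sets. `with_extra` and `find_endpoint` are hypothetical names for illustration; the assumption is that each layer may carry its own `extra_args` dict and that the innermost function is the real endpoint.

```python
import functools
from typing import Any, Callable, Dict, Tuple

Func = Callable[..., Any]


def with_extra(**extra_args: Any) -> Callable[[Func], Func]:
    """Hypothetical decorator factory that advertises extra CLI arguments."""

    def decorator(func: Func) -> Func:
        @functools.wraps(func)  # also sets wrapper.__wrapped__ = func
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            return func(*args, **kwargs)

        wrapper.extra_args = dict(extra_args)  # type: ignore[attr-defined]
        return wrapper

    return decorator


def find_endpoint(func: Func) -> Tuple[Func, Dict[str, Any]]:
    """Follow __wrapped__ to the innermost function, collecting extra_args on the way."""
    extra: Dict[str, Any] = {}
    while True:
        extra.update(getattr(func, "extra_args", {}))
        inner = getattr(func, "__wrapped__", None)
        if inner is None:
            return func, extra
        func = inner


@with_extra(format=str)
@with_extra(verbose=bool)
def endpoint(name: str) -> str:
    return f"hello {name}"


inner, extras = find_endpoint(endpoint)
print(sorted(extras))  # ['format', 'verbose']
```

The innermost function is what CLICommand would inspect for the endpoint's own parameters.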
John Mulligan [Mon, 11 Apr 2022 18:46:37 +0000 (14:46 -0400)]
pybind/mgr: change to private _load_func_metadata classmethod
The load_func_metadata method had exactly one caller in the codebase, the
store_func_metadata method. It was also a staticmethod that referred to
a property of its class.
This change makes the function "private" by renaming it to
_load_func_metadata, removing it from the public "surface area" of the
type. It changes it to a classmethod so that it would work correctly
if used from a subclass of CLICommand.
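A small demonstration of why a classmethod behaves better under subclassing. The classes and the `REGISTRY` attribute are hypothetical; they only show that a staticmethod must name the base class explicitly, while a classmethod receives the class it was called on.

```python
from typing import Any, Dict


class Command:
    REGISTRY: Dict[str, Any] = {"base": 1}  # hypothetical class-level attribute

    @staticmethod
    def load_static() -> Dict[str, Any]:
        # Can only refer to the base class by name.
        return Command.REGISTRY

    @classmethod
    def _load(cls) -> Dict[str, Any]:
        # `cls` is whichever class the method was called on.
        return cls.REGISTRY


class SubCommand(Command):
    REGISTRY = {"sub": 2}


print(SubCommand.load_static())  # {'base': 1}
print(SubCommand._load())        # {'sub': 2}
```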
John Mulligan [Sat, 9 Apr 2022 19:19:37 +0000 (15:19 -0400)]
pybind/mgr: add a Responder decorator type
The Responder is the decorator that future endpoint functions in the mgr can
use to automatically handle conversions of returned types to serialized
data (JSON, YAML, etc) as well as automatically convert exceptions into
error responses.
The Responder makes use of format and return-value adapter types,
previously added to the module, to convert a returned value into a mgr
response. This change adds some exception types to return error
responses to the clients.
Simple customizations can be done by passing an alternate format adapter
type when the Responder is being constructed. Additional customization
can be done by subclassing the Responder.
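A much-simplified sketch of the Responder pattern, assuming a mgr reply is a `(return value, body, status)` triple. `ResponderSketch` is hypothetical: it uses a plain `dumps` callable where the real decorator uses format and return-value adapters, and it catches every exception where the real one works with dedicated exception types.

```python
import errno
import functools
import json
from typing import Any, Callable, Tuple

MgrResult = Tuple[int, str, str]


class ResponderSketch:
    """Hypothetical stand-in for the Responder decorator."""

    def __init__(self, dumps: Callable[[Any], str] = json.dumps) -> None:
        # Passing another `dumps` stands in for passing an alternate format adapter.
        self.dumps = dumps

    def __call__(self, func: Callable[..., Any]) -> Callable[..., MgrResult]:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> MgrResult:
            try:
                value = func(*args, **kwargs)
            except Exception as exc:
                # Convert the exception into an error response, not a traceback.
                return -errno.EINVAL, "", str(exc)
            return 0, self.dumps(value), ""

        return wrapper


@ResponderSketch()
def list_things() -> list:
    return [{"name": "a"}]


@ResponderSketch()
def broken() -> None:
    raise ValueError("bad input")
```

Subclassing `ResponderSketch` and overriding `__call__` would be the analogue of the deeper customization mentioned above.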
John Mulligan [Sat, 9 Apr 2022 19:13:41 +0000 (15:13 -0400)]
pybind/mgr: add CommonFormatter type and valid_formats method
A type that has a valid_formats method, and thus meets the
CommonFormatter protocol, supports distinguishing between formats
that are known but unsupported for a given API vs. unknown (possibly a typo).
To make working with the format names easier this also makes the Format
enum inherit from str.
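A sketch of the known-but-unsupported vs. unknown distinction. The `Format` members shown are an abbreviated, illustrative subset; `JsonOnly` and `classify` are hypothetical. Inheriting from `str` lets enum members compare equal to plain format strings.

```python
import enum
from typing import Protocol, Set


class Format(str, enum.Enum):
    # Abbreviated member list, for illustration only.
    json = "json"
    yaml = "yaml"
    plain = "plain"


class CommonFormatter(Protocol):
    def valid_formats(self) -> Set[str]: ...


class JsonOnly:
    def valid_formats(self) -> Set[str]:
        return {Format.json.value}


def classify(fmt: str, formatter: CommonFormatter) -> str:
    try:
        Format(fmt)
    except ValueError:
        return "unknown"  # not a Format at all, possibly a typo
    if fmt not in formatter.valid_formats():
        return "unsupported"  # a real format this API does not offer
    return "ok"


print(Format.json == "json")  # True
```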
John Mulligan [Sat, 9 Apr 2022 18:46:50 +0000 (14:46 -0400)]
pybind/mgr: add a ReturnValueAdapter type to object_format.py
The ReturnValueAdapter type fulfills a similar role to the
ObjectFormatAdapter, but instead of serializing data for the
body of a mgr response, it extracts a return value (error code)
to reply with.
Most of the time it is totally unnecessary to provide an explicit
return value, because if you are returning a valid object (as
opposed to raising an exception) the return value will be zero
(success). However, in the off chance a type needs to directly
communicate a return value for the mgr response, it can provide
the `mgr_return_value` method and the adapter will discover
and use it.
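A minimal sketch of that discovery rule. `ReturnValueAdapterSketch` and `PartialResult` are hypothetical; only the `mgr_return_value` method name comes from the commit, and `-EAGAIN` is an arbitrary example code.

```python
import errno
from typing import Any


class ReturnValueAdapterSketch:
    """Hypothetical adapter that extracts a mgr return value from an object."""

    def __init__(self, obj: Any) -> None:
        self.obj = obj

    def mgr_return_value(self) -> int:
        # Use the object's own mgr_return_value() when present;
        # otherwise a successfully returned object means success (0).
        method = getattr(self.obj, "mgr_return_value", None)
        if callable(method):
            return int(method())
        return 0


class PartialResult:
    def mgr_return_value(self) -> int:
        return -errno.EAGAIN  # arbitrary example code


print(ReturnValueAdapterSketch({"ok": True}).mgr_return_value())  # 0
```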
John Mulligan [Sat, 9 Apr 2022 18:29:25 +0000 (14:29 -0400)]
pybind/mgr: add ObjectFormatAdapter type to object_format.py
The ObjectFormatAdapter fills the role of bridging between types
that can return a simplified representation of themselves and
actually formatting objects as JSON and YAML.
Note that we generally do not want types that serialize themselves
to JSON/YAML strings. That approach makes it harder to standardize on
the final output formatting (indentation, multiple yaml docs, etc).
Additionally, we do not want the types to need to specialize between
JSON and YAML. So, by default, we try to use a method `to_simplified`
which is not specific to any serialization format. However, for
backwards compatibility with types that already have methods *that
return dicts/lists/etc* under the names `to_json` or `to_yaml` we
support using the `compatible` flag to enable the use of those methods.
If the adapter fails to find a conversion method on the object,
serialization of the object itself is attempted, so that return values
of simple lists, dicts, etc. also work.
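A sketch of the lookup order described above, JSON only to stay within the standard library (a YAML path would pass the same simplified value to PyYAML's `yaml.safe_dump`). `ObjectFormatAdapterSketch`, `Host` and `LegacySpec` are hypothetical; the method names `to_simplified` and `to_json` and the `compatible` flag come from the commit.

```python
import json
from typing import Any


class ObjectFormatAdapterSketch:
    """Hypothetical adapter: simplify an object, then serialize it in one place."""

    def __init__(self, obj: Any, compatible: bool = False) -> None:
        self.obj = obj
        self.compatible = compatible

    def _simplified(self) -> Any:
        # Preferred: a format-neutral to_simplified() method.
        if hasattr(self.obj, "to_simplified"):
            return self.obj.to_simplified()
        # Opt-in backwards compatibility: a to_json() that returns dicts/lists.
        if self.compatible and hasattr(self.obj, "to_json"):
            return self.obj.to_json()
        # Fallback: assume the object is already a plain list/dict/etc.
        return self.obj

    def format_json(self) -> str:
        # Final output formatting is decided here, not by each type.
        return json.dumps(self._simplified(), indent=2, sort_keys=True)


class Host:
    def __init__(self, hostname: str) -> None:
        self.hostname = hostname

    def to_simplified(self) -> dict:
        return {"hostname": self.hostname}


class LegacySpec:
    def to_json(self) -> dict:  # returns a dict, not a JSON string
        return {"service_type": "mon"}
```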
An earlier version of this patch tried to share the JSON/YAML
serialization logic found in src/pybind/mgr/orchestrator/module.py.
However, this approach was deemed too complicated and we also preferred
to use yaml safe dumping whenever possible. This does lead to a level
of code duplication. Dealing with this duplication is a task left for
the future.
John Mulligan [Fri, 8 Apr 2022 15:15:55 +0000 (11:15 -0400)]
pybind/mgr: reformat quoting in format enum
Whenever possible I use 'black' to reformat the python code.
It's strict and its formatting is a superset of what ceph's
formatting tools require. This change updates the code that was
moved into this file so that future uses of 'black' don't
reformat this section too.
John Mulligan [Mon, 14 Mar 2022 15:29:50 +0000 (11:29 -0400)]
pybind/mgr: start a new object_format.py for general formatting
Currently, there's some auto-formatting logic in the orchestrator
module and a lot of ad-hoc formatting scattered around the mgr modules.
This new module aims to bring some of that together in a central
location.
Start by moving the Format enum from the orchestrator.
Adam King [Fri, 1 Apr 2022 12:20:28 +0000 (08:20 -0400)]
mgr/cephadm: make UpgradeState from_json a bit safer
This way, for downgrades to any version from the one
this lands in onward, having added new parameters to
UpgradeState shouldn't break anything. We can't do much
about downgrades from this version to older versions,
but this should help in the future.
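One way to make `from_json` tolerant of newer fields is to drop keys the current `__init__` does not accept. `UpgradeStateSketch` and its parameters are a hypothetical, trimmed-down stand-in, not cephadm's actual `UpgradeState`.

```python
import inspect
from typing import Any, Dict, Optional


class UpgradeStateSketch:
    """Hypothetical, trimmed-down stand-in for an upgrade-state record."""

    def __init__(self, target_name: str, progress_id: str, paused: bool = False) -> None:
        self.target_name = target_name
        self.progress_id = progress_id
        self.paused = paused

    @classmethod
    def from_json(cls, data: Optional[Dict[str, Any]]) -> Optional["UpgradeStateSketch"]:
        if not data:
            return None
        # Keep only keys this version's __init__ accepts, so state saved by a
        # newer version (with extra parameters) still loads after a downgrade.
        valid = set(inspect.signature(cls.__init__).parameters) - {"self"}
        return cls(**{k: v for k, v in data.items() if k in valid})


saved = {"target_name": "quay.io/ceph/ceph:v17", "progress_id": "abc", "field_from_newer_version": 1}
state = UpgradeStateSketch.from_json(saved)
```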
Adam King [Mon, 28 Mar 2022 16:10:15 +0000 (12:10 -0400)]
mgr/cephadm: split _do_upgrade into sub functions
This function was around 500 lines and difficult to work
with. Splitting it into sub functions should hopefully make
it a bit easier to understand and make changes to.
Prashant D [Thu, 17 Mar 2022 14:29:40 +0000 (14:29 +0000)]
mgr, mgr/prometheus: Fix regression with prometheus metrics
The ceph daemons on a host are inheriting the ceph version from the host.
This introduces a wrong interpretation in prometheus metrics as well
as in dump_server. Each ceph daemon should report its own
ceph version, based on the ceph binary in use for that daemon.
Consider a situation where a partial upgrade is done on a host: the daemons
that are restarted should have the upgraded version as their ceph version tag,
and the rest should have the older ceph version, but presently all inherit the
host version. In a containerized environment, all daemons
use the ceph version of the last daemon registered as a service on the host.
Fixes: https://tracker.ceph.com/issues/54611
Signed-off-by: Prashant D <pdhange@redhat.com>
(cherry picked from commit aeca2e41ef560cf51c1ad935cfb6470e782aa8d5)
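A toy model of the regression, assuming version information is recorded as daemons register. The data and keys are hypothetical: keying the version by host lets the last registered daemon overwrite everyone else's, while keying it by daemon preserves the mixed state of a partial upgrade.

```python
from typing import Dict, List, Tuple

# Hypothetical registrations: (host, daemon, ceph_version).
reports: List[Tuple[str, str, str]] = [
    ("node1", "osd.0", "16.2.7"),
    ("node1", "osd.1", "17.2.0"),  # restarted after a partial upgrade
]

# Per-host storage: the last daemon registered wins for the whole host.
by_host: Dict[str, str] = {}
for host, daemon, version in reports:
    by_host[host] = version

# Per-daemon storage: each daemon keeps the version of the binary it runs.
by_daemon: Dict[Tuple[str, str], str] = {}
for host, daemon, version in reports:
    by_daemon[(host, daemon)] = version

for host, daemon, _ in reports:
    print(daemon, by_host[host], by_daemon[(host, daemon)])
```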
Redouane Kachach [Tue, 17 May 2022 09:40:15 +0000 (11:40 +0200)]
mgr/cephadm: adding support to copy ceph conf to per fsid config location
Fixes: https://tracker.ceph.com/issues/55685
Signed-off-by: Redouane Kachach <rkachach@redhat.com>
(cherry picked from commit c5e4aa6085dab3a8ce6087efa0fc1caf904ba4ae)
Adam King [Thu, 18 Nov 2021 20:22:39 +0000 (15:22 -0500)]
mgr/cephadm: re-use old ip when re-adding hosts if necessary
When a host is re-added without an explicit ip we can default to the old
ip we had stored for the host rather than either keeping the loopback
address or throwing an exception. We only want to actually error when
the only options left are to error or to use a resolved loopback address.
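A sketch of that fallback order. `choose_addr` is a hypothetical helper, and the position of a non-loopback resolved address ahead of the stored one is an assumption; the commit only states that the stored ip is preferred over keeping a loopback address or failing.

```python
import ipaddress
from typing import Optional


def choose_addr(explicit: Optional[str], resolved: str, old: Optional[str]) -> str:
    """Hypothetical helper: pick an address for a (re-)added host."""
    if explicit:
        return explicit  # an address given by the user always wins
    if not ipaddress.ip_address(resolved).is_loopback:
        return resolved  # assumed: a usable resolved address comes next
    if old:
        return old  # re-added host: reuse the ip stored earlier
    # Only erroring or using loopback remain: error out.
    raise ValueError(f"host resolved to loopback address {resolved}; specify an ip")
```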
Redouane Kachach [Tue, 17 May 2022 15:26:39 +0000 (17:26 +0200)]
mgr/cephadm: stripping out / from the end of the url
Fixes: https://tracker.ceph.com/issues/55638
Signed-off-by: Redouane Kachach <rkachach@redhat.com>
(cherry picked from commit 17032f6be22e9efc3e199d7e35091025bfaae965)
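A minimal sketch of the normalization, assuming the goal is to avoid a doubled slash when a path is appended to a user-supplied url. `normalize_url` and the example url are hypothetical; the commit does not say which url setting is affected.

```python
def normalize_url(url: str) -> str:
    """Hypothetical helper: drop trailing slashes before appending paths."""
    return url.rstrip("/")


base = normalize_url("https://grafana.example:3000/")
print(f"{base}/api/health")  # https://grafana.example:3000/api/health
```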
Redouane Kachach [Tue, 17 May 2022 10:32:50 +0000 (12:32 +0200)]
mgr/cephadm: do not use sudo for root user
Fixes: https://tracker.ceph.com/issues/55641
Signed-off-by: Redouane Kachach <rkachach@redhat.com>
(cherry picked from commit 7cfcc7ef089cb3458040b9b592a7d0bafbf4c2c2)
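A sketch of the rule in the subject line, assuming remote commands are built as argument lists. `build_remote_cmd` is a hypothetical helper, not cephadm's actual code path.

```python
from typing import List


def build_remote_cmd(ssh_user: str, cmd: List[str]) -> List[str]:
    """Hypothetical: prefix a remote command with sudo only for non-root users."""
    if ssh_user == "root":
        return list(cmd)  # root already has full privileges; no sudo prefix
    return ["sudo"] + list(cmd)


print(build_remote_cmd("root", ["ls", "/var/lib/ceph"]))
print(build_remote_cmd("cephadm", ["ls", "/var/lib/ceph"]))
```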