Jos Collin [Wed, 4 May 2022 13:03:12 +0000 (18:33 +0530)]
qa: fix is_addr_blocklisted() to get blocklisted clients from 'osd dump'
With the introduction of range blocklists, the 'blocklist ls' command now outputs
two lists. It's also simpler to get the blocklisted clients directly
from 'osd dump', which avoids a regression.
Fixes: https://tracker.ceph.com/issues/55516
Signed-off-by: Jos Collin <jcollin@redhat.com>
(cherry picked from commit 47de5d79b8190458847072aae1c29db7d6a9b66b)
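A minimal sketch of the lookup, assuming the JSON output of `ceph osd dump --format=json` has a top-level `blocklist` object keyed by the blocklisted address string. The helper below and its sample data are illustrative, not the qa suite's actual code.

```python
import json


def is_addr_blocklisted(osd_dump_json: str, addr: str) -> bool:
    """Return True if `addr` is in the entity/IP blocklist of an OSD map dump.

    Assumption: the dump has a top-level "blocklist" object mapping
    address strings to expiry timestamps.
    """
    dump = json.loads(osd_dump_json)
    # A missing key is treated as an empty blocklist.
    blocklist = dump.get("blocklist", {})
    return addr in blocklist


# Illustrative dump fragment; a real dump contains many more fields.
sample = json.dumps({
    "epoch": 42,
    "blocklist": {"10.0.0.5:0/3271": "2022-05-04T14:03:12.000000+0000"},
})
print(is_addr_blocklisted(sample, "10.0.0.5:0/3271"))  # True
print(is_addr_blocklisted(sample, "10.0.0.6:0/1"))     # False
```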
Greg Farnum [Tue, 30 Nov 2021 18:29:46 +0000 (18:29 +0000)]
osd: Check range_blocklist in is_blocklisted(): we actually blocklist ranges
Carry a parallel map from cidr addresses to a new
range_bits class (stored entirely as ephemeral state) so that we
don't need to re-compute masks and bit mappings too often, and to
separate out the unpleasant ipv6 bit mapping logic. Then check
against those with range_bits::matches() the same way we check
for equality on specific-entity matches. Nice and simple loops!
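A rough Python analogue of the approach, assuming plain IP strings (no ports or nonces) and a hypothetical `RangeBits` class standing in for the C++ `range_bits`. The mask and network bits are computed once per range, so each check is one AND plus a comparison, and the range map is scanned with a simple loop after the exact-match lookup.

```python
import ipaddress
from typing import Dict, Set


class RangeBits:
    """Hypothetical precomputed CIDR matcher (ephemeral, derived state)."""

    def __init__(self, cidr: str) -> None:
        net = ipaddress.ip_network(cidr, strict=False)
        self.version = net.version
        self.mask = int(net.netmask)
        self.network = int(net.network_address)

    def matches(self, addr: str) -> bool:
        ip = ipaddress.ip_address(addr)
        if ip.version != self.version:
            return False  # an IPv4 range never matches an IPv6 address, and vice versa
        return (int(ip) & self.mask) == self.network


def is_blocklisted(addr: str, blocklist: Set[str],
                   range_blocklist: Dict[str, RangeBits]) -> bool:
    # Exact-entity lookup first, then a simple loop over the parallel range map.
    if addr in blocklist:
        return True
    return any(bits.matches(addr) for bits in range_blocklist.values())


ranges = {cidr: RangeBits(cidr) for cidr in ("192.168.1.0/24", "2001:db8::/32")}
print(is_blocklisted("192.168.1.77", set(), ranges))  # True
print(is_blocklisted("192.168.2.1", set(), ranges))   # False
```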
Greg Farnum [Tue, 2 Nov 2021 00:38:50 +0000 (00:38 +0000)]
osdmap: convert get_blocklist() to provide the entity/IP and range blocklists
Providing a non-range-aware blocklist accessor would just be
asking for trouble, so don't.
The ugly part of this is how the Objecter is currently just
throwing the range blocklist on the end of its own list. The in-tree
callers are okay with this, and I'd like to look at removing the
blocklist events API from librados entirely -- it exposes "OSD-only"
state to clients and, as evidenced by this patch series, is not
particularly stable.
Greg Farnum [Wed, 8 Dec 2021 21:32:58 +0000 (21:32 +0000)]
mon: take blocklist ranges as a subcommand, not implicitly from address format
I discovered in testing with CephFS that parsing ranges implicitly from the
address format tends to interpret client IPs (which don't have ports, but do
have nonces) as invalid ranges. So give it a separate input keyword that has
to be applied first.
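The ambiguity is easy to see: a client address such as `10.0.0.5:0/3271` ends in `/nonce`, which reads like a CIDR prefix length, and `ipaddress.IPv4Network("10.0.0.5/3271")` simply raises `ValueError`. A sketch of keyword-driven parsing, assuming a simplified IPv4-only `ip[:port][/nonce]` syntax and a hypothetical `parse_blocklist_target()` helper rather than the monitor's real parser:

```python
import ipaddress
from typing import List, Tuple, Union

# Either an (ip, port, nonce) entity address or a CIDR range.
Target = Union[Tuple[str, int, int], ipaddress.IPv4Network]


def parse_blocklist_target(args: List[str]) -> Target:
    """Only an explicit leading 'range' keyword yields a CIDR range."""
    if args[0] == "range":
        # Explicitly requested: interpret the argument as a CIDR range.
        return ipaddress.IPv4Network(args[1], strict=False)
    # Otherwise treat it as an entity address: ip[:port][/nonce].
    addr, _, nonce = args[0].partition("/")
    ip, _, port = addr.partition(":")
    ipaddress.IPv4Address(ip)  # validate the IP part; raises ValueError if malformed
    return (ip, int(port or 0), int(nonce or 0))


print(parse_blocklist_target(["10.0.0.5:0/3271"]))       # ('10.0.0.5', 0, 3271)
print(parse_blocklist_target(["range", "10.0.0.0/24"]))  # 10.0.0.0/24
```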
Greg Farnum [Mon, 25 Oct 2021 19:53:04 +0000 (19:53 +0000)]
msg: common: allow entity_addr_t to store a CIDR address range
This required very little change to the existing code. Use with care, because
existing code expects an IP address instead of a range, but it saves on
writing a new parser.
Greg Farnum [Tue, 2 Nov 2021 00:34:34 +0000 (00:34 +0000)]
mds: Server: Simplify apply_blocklist and usage of the OSDMap's blocklist
This previously re-implemented a bunch of the OSDMap::is_blocklisted()
function, and wasn't actually any faster to run -- the list of new blocklists
may be smaller than the full set, but OSDMap::blocklist is an unordered_map
of constant lookup time so it shouldn't slow things down. More importantly,
this is much simpler, less likely to be buggy from duplicate code, and lets
the MDS off the hook for dealing with range blocklisting.
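The simplification amounts to deferring every check to one map query. A language-neutral sketch in Python, where the session map and the `is_blocklisted` callback are hypothetical stand-ins for the MDS session table and `OSDMap::is_blocklisted()`:

```python
from typing import Callable, Dict, List


def apply_blocklist(sessions: Dict[str, str],
                    is_blocklisted: Callable[[str], bool]) -> List[str]:
    """Return the ids of sessions whose client address is blocklisted.

    No matching logic lives here; the OSD-map query handles both
    exact-entity and range entries.
    """
    return [sid for sid, addr in sessions.items() if is_blocklisted(addr)]


# Stand-in for the OSD map: a hash set gives average O(1) lookups, so checking
# every session against the full blocklist stays cheap.
blocklisted = {"10.0.0.5:0/3271"}
sessions = {"client.4100": "10.0.0.5:0/3271", "client.4101": "10.0.0.6:0/88"}
print(apply_blocklist(sessions, blocklisted.__contains__))  # ['client.4100']
```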
Greg Farnum [Mon, 1 Nov 2021 23:52:53 +0000 (23:52 +0000)]
client: Simplify blocklist tracking and interface
I'm not sure if the blocklist events tracking in Client.cc was ever
the simplest way to track that state, but it definitely isn't now. We
can just hand our addr_vec to the OSDMap and ask it -- it handles
version compatibility issues and, happily, means the Client doesn't
need to learn to deal with ranges directly.
In `master`, the milestone step exits and causes the remaining tasks not to be run. I previously tried the `continue-on-error` flag, but it didn't work, so let's try putting that step at the end.
Laura Flores [Mon, 16 May 2022 22:59:42 +0000 (17:59 -0500)]
qa/suites/rados/thrash-erasure-code-big/thrashers: add `osd max backfills` setting to mapgap and pggrow
All `rados/thrash-erasure-code-big` tests that die due to the “wait_for_recovery” timeout have one thing in common: They contain either `thrashers/pggrow` or `thrashers/mapgap`.
The difference between pggrow and mapgap and all of the non-offending thrashers (default, careful, fastread, and morepggrow) is that pggrow and mapgap lack an override setting for `osd max backfills`. `osd max backfills` is the maximum number of backfill operations allowed to/from an OSD; the higher the number, the more backfills can run concurrently and the quicker the recovery. By default, this value is 1. In all of the non-offending thrashers, the default of 1 is overridden in their .yaml files with a value > 1, whereas pggrow and mapgap keep the default.
The mclock op scheduler is known to override `osd max backfills` with a high value, but all of the thrash-erasure-code-big thrashers have their op queue set to “debug_random”, which chooses randomly between op queues (the debug_random op queue is set to override the default mclock_scheduler in qa/config/rados.yaml). So, whenever “debug_random” picks a queue other than mclock, the low default `osd max backfills` setting stays in effect and causes some tests to time out in recovery.
WITHOUT `osd max backfills`, as they are now, “mapgap” and “pggrow” tests die due to timed-out recovery about 17/100 times, as seen here with a pggrow test: http://pulpito.front.sepia.ceph.com/lflores-2022-05-18_14:24:29-rados:thrash-erasure-code-big-master-distro-default-smithi/
WITH `osd max backfills` specified, as I have suggested in this PR, 99/100 tests passed, with one test failing for a different reason:
http://pulpito.front.sepia.ceph.com/lflores-2022-05-17_22:40:27-rados:thrash-erasure-code-big-master-distro-default-smithi/
I also scheduled 145 tests WITH `osd max backfills` that are a mix of pggrow and mapgap thrashers. 144/145 tests passed, with one test failing for a different reason. http://pulpito.front.sepia.ceph.com/lflores-2022-05-17_15:27:54-rados:thrash-erasure-code-big-master-distro-default-smithi/
Fixes: https://tracker.ceph.com/issues/51076
Signed-off-by: Laura Flores <lflores@redhat.com>
(cherry picked from commit 40062676c2ceed49b9fa147127ffa83ba6118e2a)
Yaarit Hatuka [Wed, 23 Mar 2022 17:08:59 +0000 (13:08 -0400)]
doc/releases: update telemetry commands
Telemetry is already an 'always-on' module, so there is no need to enable it.
In Quincy, when telemetry is off, we use preview / preview-all to get a
sample report, and show / show-all to see what is actually being
reported.
but gcc-toolset-8-annobin provides this file. Upgrading to
gcc-toolset-11 does not help; see https://centos.pkgs.org/8-stream/centos-appstream-x86_64/gcc-toolset-11-annobin-plugin-gcc-10.23-1.el8.x86_64.rpm.html
So the interim solution is to disable the plugin if
we want to use gcc-toolset to build RPM packages.
In this change, _annotated_build is undefined to prevent the compiler
from adding extra information to the binary. In general this change
should be safe, although without this information it would be hard to tell
whether the binary is hardened or which ABI version it expects. See
also https://fedoraproject.org/wiki/Changes/Annobin
John Mulligan [Mon, 11 Apr 2022 19:32:42 +0000 (15:32 -0400)]
pybind/mgr: add a wrapper exception for use with Responder
In order to best get a "real" exception converted to something
that can be cleanly sent to the mgr response, this new exception
type can be invoked directly, or with the wrap method to automatically
pull as many properties as possible from the original exception.
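A hypothetical sketch of such a wrapper exception. The class name, the `-EINVAL` default, and the rule of copying a negated `errno` attribute are assumptions for illustration, not the module's confirmed behavior.

```python
import errno
from typing import Optional


class ErrorResponse(Exception):
    """Hypothetical exception carrying a status message and a return value."""

    def __init__(self, status: str, return_value: Optional[int] = None) -> None:
        super().__init__(status)
        self.status = status
        # Assumed default when no better error code is known.
        self.return_value = return_value if return_value is not None else -errno.EINVAL

    @classmethod
    def wrap(cls, exc: Exception) -> "ErrorResponse":
        """Convert an arbitrary exception, pulling over what properties we can."""
        if isinstance(exc, ErrorResponse):
            return exc  # already in the right shape
        return_value = None
        code = getattr(exc, "errno", None)
        if isinstance(code, int):
            return_value = -abs(code)  # negative errno, by this sketch's convention
        err = cls(str(exc), return_value=return_value)
        err.__cause__ = exc  # keep the original exception for logging
        return err


e = ErrorResponse.wrap(FileNotFoundError(errno.ENOENT, "no such pool"))
print(e.return_value == -errno.ENOENT)  # True
```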
John Mulligan [Mon, 11 Apr 2022 19:16:34 +0000 (15:16 -0400)]
pybind/mgr: add format arg to Responder's extra args
To ensure that the Responder can make use of a user provided `--format=`
parameter even if the programmer doesn't explicitly add one to the
args of an endpoint function we set the `extra_args` attribute on
our wrapper function so that CLICommand can later extract it.
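A minimal sketch of the pattern: the wrapper consumes a `format` keyword itself and advertises it through an `extra_args` attribute. The decorator name, the `{"format": str}` value shape, and the string result are hypothetical simplifications.

```python
import functools
from typing import Any, Callable


def responder(func: Callable[..., Any]) -> Callable[..., str]:
    """Hypothetical decorator that handles a `format` argument on the endpoint's behalf."""

    @functools.wraps(func)  # also sets wrapper.__wrapped__ = func
    def wrapper(*args: Any, format: str = "json", **kwargs: Any) -> str:
        result = func(*args, **kwargs)
        return f"{format}:{result}"  # stand-in for real formatting

    # Advertise the extra parameter so a CLI layer can expose `--format=`
    # even though the endpoint function never declares it.
    wrapper.extra_args = {"format": str}  # type: ignore[attr-defined]
    return wrapper


@responder
def pool_count() -> int:
    return 3


print(pool_count(format="yaml"))  # yaml:3
```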
John Mulligan [Mon, 11 Apr 2022 19:03:12 +0000 (15:03 -0400)]
pybind/mgr: enhance CLICommand to fetch extra args from wrapped funcs
Previously, the CLICommand decorator "assumed" that the decorator was
applied directly to a mgr module api endpoint function. Now that we plan
on adding the Responder decorator into the mix we need a way of
properly fetching the arguments of the endpoint function. In addition,
the decorator itself needs to provide extra arguments to the mgr
(in cases where the endpoint function doesn't explicitly ask for it).
Thus we add a helper function to find the endpoint function when
wrapped as well as extract extra arguments when "walking" the stack
of __wrapped__ functions.
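A sketch of such a helper, assuming each wrapper exposes `__wrapped__` (as `functools.wraps` sets it) and optionally an `extra_args` dict; `find_endpoint()` is a hypothetical name. The standard library's `inspect.unwrap()` follows the same chain but does not collect attributes along the way.

```python
import functools
from typing import Any, Callable, Dict, Tuple


def find_endpoint(func: Callable[..., Any]) -> Tuple[Callable[..., Any], Dict[str, Any]]:
    """Walk the __wrapped__ chain to the endpoint, merging each layer's extra_args."""
    extra: Dict[str, Any] = {}
    while True:
        extra.update(getattr(func, "extra_args", {}))
        inner = getattr(func, "__wrapped__", None)
        if inner is None:
            return func, extra
        func = inner


def endpoint(name: str) -> str:
    return name


@functools.wraps(endpoint)
def wrapped(*args: Any, **kwargs: Any) -> str:
    return endpoint(*args, **kwargs)


wrapped.extra_args = {"format": str}  # type: ignore[attr-defined]
fn, extra = find_endpoint(wrapped)
print(fn is endpoint, extra)  # True {'format': <class 'str'>}
```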
John Mulligan [Mon, 11 Apr 2022 18:46:37 +0000 (14:46 -0400)]
pybind/mgr: change to private _load_func_metadata classmethod
The load_func_metadata had exactly one use in the codebase, the
store_func_metadata method. It was also a staticmethod that referred to
a property of its class.
This change makes the function "private" by renaming it to
_load_func_metadata, removing it from the public "surface area" of the
type. It changes it to a classmethod so that it would work correctly
if used from a subclass of CLICommand.
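Why the classmethod matters, shown with illustrative class and attribute names rather than CLICommand's real ones: a staticmethod has to name its class explicitly, so a subclass's override is ignored, while a classmethod receives the class it was called on.

```python
class Base:
    registry_attr = "base"

    @staticmethod
    def load_static() -> str:
        # Hard-codes the class, so subclass overrides are silently ignored.
        return Base.registry_attr

    @classmethod
    def _load(cls) -> str:
        # Receives whichever class the method was called on.
        return cls.registry_attr


class Sub(Base):
    registry_attr = "sub"


print(Sub.load_static())  # base
print(Sub._load())        # sub
```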
John Mulligan [Sat, 9 Apr 2022 19:19:37 +0000 (15:19 -0400)]
pybind/mgr: add a Responder decorator type
The Responder is the decorator that future endpoint functions in the mgr can
use to automatically handle conversions of returned types to serialized
data (JSON, YAML, etc) as well as automatically convert exceptions into
error responses.
The Responder makes use of format and return-value adapter types,
previously added to the module, to convert a returned value into a mgr
response. This change adds some exception types to return error
responses to the clients.
Simple customizations can be done by passing an alternate format adapter
type when the Responder is being constructed. Additional customization
can be done by subclassing the Responder.
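A much-simplified sketch of the idea, assuming a response shaped as `(return value, body, status)` and mapping every exception to `-EINVAL`. Both are assumptions; the real Responder uses the format and return-value adapters and dedicated error types.

```python
import errno
import functools
import json
from typing import Any, Callable, Tuple

# Assumed response shape: (return value, output body, status/error text).
MgrResult = Tuple[int, str, str]


class Responder:
    """Hypothetical Responder-style decorator."""

    def __init__(self, formatter: Callable[[Any], str] = json.dumps) -> None:
        # Simple customization: pass an alternate formatter at construction time.
        self.formatter = formatter

    def __call__(self, func: Callable[..., Any]) -> Callable[..., MgrResult]:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> MgrResult:
            try:
                value = func(*args, **kwargs)
            except Exception as exc:
                # Exceptions become error responses instead of propagating.
                return -errno.EINVAL, "", str(exc)
            return 0, self.formatter(value), ""
        return wrapper


@Responder()
def list_pools() -> list:
    return ["rbd", "cephfs_data"]


@Responder()
def broken() -> None:
    raise ValueError("pool name required")


print(list_pools())  # (0, '["rbd", "cephfs_data"]', '')
```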
John Mulligan [Sat, 9 Apr 2022 19:13:41 +0000 (15:13 -0400)]
pybind/mgr: add CommonFormatter type and valid_formats method
A type that has a valid_formats method, and thus meets the
CommonFormatter protocol, supports distinguishing between formats
that are known but unsupported for a given API vs. unknown (possibly a typo).
To make working with the format names easier this also makes the Format
enum inherit from str.
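A sketch of the distinction, with assumed enum members and a hypothetical `check_format()` helper. Because `Format` inherits from `str`, plain strings compare equal to its members, which keeps membership checks simple.

```python
import enum
from typing import Iterable, Protocol


class Format(str, enum.Enum):
    json = "json"
    json_pretty = "json-pretty"
    yaml = "yaml"
    plain = "plain"


class CommonFormatter(Protocol):
    def valid_formats(self) -> Iterable[str]: ...


class JsonOnly:
    """Hypothetical type that only supports JSON output."""

    def valid_formats(self) -> Iterable[str]:
        return [Format.json, Format.json_pretty]


def check_format(obj: CommonFormatter, fmt: str) -> str:
    if fmt not in {f.value for f in Format}:
        return "unknown"      # not a format at all, possibly a typo
    if fmt not in obj.valid_formats():
        return "unsupported"  # a known format this API does not offer
    return "ok"


print(check_format(JsonOnly(), "jsno"), check_format(JsonOnly(), "yaml"))  # unknown unsupported
```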
John Mulligan [Sat, 9 Apr 2022 18:46:50 +0000 (14:46 -0400)]
pybind/mgr: add a ReturnValueAdapter type to object_format.py
The ReturnValueAdapter type fulfills a similar role to the
ObjectFormatAdapter but instead of serializing data for the
body of a mgr response, extracts a return value (error code)
to reply with.
Most of the time it is totally unnecessary to provide an explicit
return value because if you are returning a valid object (as
opposed to raising an exception) the return value will be zero
(success). However, in the off chance a type needs to directly
communicate a return value for the mgr response, it can provide
the `mgr_return_value` method and the adapter will discover
and use it.
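A minimal sketch of the discovery logic described above, using the `mgr_return_value` method name from the text; the class layout and the `PartialResult` example type are assumptions.

```python
import errno
from typing import Any


class ReturnValueAdapter:
    """Hypothetical sketch: choose the return value for a mgr response."""

    def __init__(self, obj: Any) -> None:
        self.obj = obj

    def mgr_return_value(self) -> int:
        # Use the object's own method when it provides one...
        method = getattr(self.obj, "mgr_return_value", None)
        if callable(method):
            return int(method())
        # ...otherwise a successfully returned object means success.
        return 0


class PartialResult:
    """Hypothetical type that needs to signal a non-zero return value."""

    def mgr_return_value(self) -> int:
        return -errno.EAGAIN


print(ReturnValueAdapter([1, 2]).mgr_return_value())  # 0
```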
John Mulligan [Sat, 9 Apr 2022 18:29:25 +0000 (14:29 -0400)]
pybind/mgr: add ObjectFormatAdapter type to object_format.py
The ObjectFormatAdapter fills the role of bridging between types
that can return a simplified representation of themselves and
actually formatting objects as JSON and YAML.
Note that we generally do not want types that serialize themselves
to JSON/YAML strings. That approach makes it harder to standardize on
the final output formatting (indentation, multiple yaml docs, etc).
Additionally, we do not want the types to need to specialize between
JSON and YAML. So, by default, we try to use a method `to_simplified`
which is not specific to any serialization format. However, for
backwards compatibility with types that already have methods *that
return dicts/lists/etc* under the names `to_json` or `to_yaml` we
support using the `compatible` flag to enable the use of those methods.
If the adapter fails to find a conversion method on the object,
serialization of the object itself is attempted; this way return values
of simple lists, dicts, etc. also work.
An earlier version of this patch tried to share the JSON/YAML
serialization logic found in src/pybind/mgr/orchestrator/module.py.
However, this approach was deemed too complicated and we also preferred
to use yaml safe dumping whenever possible. This does lead to a level
of code duplication. Dealing with this duplication is a task left for
the future.
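A JSON-only sketch of the lookup order described above: `to_simplified` first, then `to_json` only when `compatible` is set, then the object itself. The class shape is an assumption. A YAML path would apply `yaml.safe_dump` to the same simplified data, but it is omitted here because PyYAML is not in the standard library.

```python
import json
from typing import Any, List


class ObjectFormatAdapter:
    """Hypothetical sketch: simplify an object, then serialize it as JSON."""

    def __init__(self, obj: Any, compatible: bool = False) -> None:
        self.obj = obj
        self.compatible = compatible

    def _simplified(self) -> Any:
        names: List[str] = ["to_simplified"]
        if self.compatible:
            names.append("to_json")  # legacy method that returns dicts/lists
        for name in names:
            method = getattr(self.obj, name, None)
            if callable(method):
                return method()
        # Fallback: assume the object is already a list, dict, etc.
        return self.obj

    def format_json(self) -> str:
        # Output formatting (indentation, key order) is decided here, not by the type.
        return json.dumps(self._simplified(), indent=2, sort_keys=True)


class Pool:
    def to_simplified(self) -> dict:
        return {"name": "rbd", "pg_num": 32}


class Legacy:
    def to_json(self) -> dict:
        return {"legacy": True}


print(ObjectFormatAdapter(Pool()).format_json())
```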
John Mulligan [Fri, 8 Apr 2022 15:15:55 +0000 (11:15 -0400)]
pybind/mgr: reformat quoting in format enum
Whenever possible I use 'black' to reformat the python code.
It's strict and its formatting is a superset of what ceph's
formatting tools require. This change updates the code that was
moved into this file so that future uses of 'black' don't
reformat this section too.
John Mulligan [Mon, 14 Mar 2022 15:29:50 +0000 (11:29 -0400)]
pybind/mgr: start a new object_format.py for general formatting
Currently, there's some auto-formatting logic in the orchestrator
module and a lot of ad-hoc formatting scattered around the mgr modules.
This new module aims to bring some of that together in a central
location.
Start by moving the Format enum from the orchestrator.