cephadm: fix osd adoption with custom cluster name
When adopting Ceph OSD containers from a Ceph cluster with a custom name, the adoption fails
because the custom name isn't propagated into unit.run.
The idea here is to change the LVM metadata and enforce 'ceph.cluster_name=ceph',
given that cephadm doesn't support custom cluster names anyway.
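A minimal sketch of the retagging idea in Python, assuming a hypothetical helper name and an illustrative LV path (the real logic belongs to cephadm/ceph-volume):
```python
import subprocess

def force_default_cluster_name(lv_path: str, old_name: str) -> None:
    """Reset a custom ceph.cluster_name LVM tag to the default 'ceph'.

    lv_path and old_name are illustrative; cephadm discovers these itself.
    """
    subprocess.run(
        ["lvchange",
         "--deltag", f"ceph.cluster_name={old_name}",
         "--addtag", "ceph.cluster_name=ceph",
         lv_path],
        check=True,
    )

# e.g. force_default_cluster_name("/dev/ceph-vg/osd-block-0", "mycluster")
```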
Fixes: https://tracker.ceph.com/issues/55654
Signed-off-by: Adam King <adking@redhat.com>
Co-authored-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e720a658d6a1582c0497bdf709ef4bd26bb5bb73)
Ronen Friedman [Tue, 17 May 2022 16:13:59 +0000 (16:13 +0000)]
osd/scrub: restart snap trimming after a failed scrub
A follow-up to PR#45640.
In PR#45640 snap trimming was restarted (if blocked) after all
successful scrubs, and after most scrub failures. Still, a few
failure scenarios did not handle snaptrim restart correctly.
The current PR cleans up and fixes the interaction between
scrub initiation/termination (for whatever cause) and snap
trimming.
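The fix itself lives in the OSD's C++ scrub machinery; as a rough Python sketch of the pattern (all names here are assumptions), the point is that every scrub termination, whatever the cause, funnels through one place that restarts snap trimming:
```python
class PG:
    """Toy model of the pattern; not the actual OSD scrubber code."""

    def __init__(self):
        self.snaptrim_blocked = False

    def on_scrub_terminated(self, success: bool) -> None:
        # Called on *every* scrub exit path, success or failure,
        # so no failure scenario can leave snap trimming stuck.
        self.restart_snap_trimming()

    def restart_snap_trimming(self) -> None:
        if self.snaptrim_blocked:
            self.snaptrim_blocked = False
            print("re-queueing snap trim")  # stand-in for the real re-queue
```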
Jos Collin [Wed, 4 May 2022 13:03:12 +0000 (18:33 +0530)]
qa: fix is_addr_blocklisted() to get blocklisted clients from 'osd dump'
With the introduction of the range blocklist, the 'blocklist ls' command now outputs
two lists. It's also straightforward to get the blocklisted clients directly
from 'osd dump', which avoids a regression.
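A minimal sketch of reading the blocklist straight from `osd dump` (Python; the helper names are ours, and the exact JSON shape can vary by release):
```python
import json
import subprocess

def blocklisted_entries() -> list:
    out = subprocess.check_output(["ceph", "osd", "dump", "--format", "json"])
    dump = json.loads(out)
    # plain entity entries, plus range entries on releases that have them
    return list(dump.get("blocklist", {})) + list(dump.get("range_blocklist", {}))

def is_addr_blocklisted(addr: str) -> bool:
    # entries carry port/nonce decorations, so match on substring
    return any(addr in entry for entry in blocklisted_entries())
```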
Fixes: https://tracker.ceph.com/issues/55516
Signed-off-by: Jos Collin <jcollin@redhat.com>
(cherry picked from commit 47de5d79b8190458847072aae1c29db7d6a9b66b)
Greg Farnum [Tue, 30 Nov 2021 18:29:46 +0000 (18:29 +0000)]
osd: Check range_blocklist in is_blocklisted(): we actually blocklist ranges
Carry a parallel map from CIDR addresses to a new
range_bits class (stored entirely as ephemeral state) so that we
don't need to re-compute masks and bit mappings too often, and to
separate out the unpleasant IPv6 bit-mapping logic. Then check
against those with range_bits::matches() the same way we check
for equality on specific-entity matches. Nice and simple loops!
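In Python, the same precompute-then-match idea might look like the sketch below; the real range_bits class does its own mask math in C++, while here the stdlib ipaddress module stands in and all names are illustrative:
```python
import ipaddress

class RangeBlocklist:
    def __init__(self):
        # parallel map: CIDR string -> precomputed network object,
        # so masks aren't recomputed on every lookup
        self._nets = {}

    def add(self, cidr: str) -> None:
        self._nets[cidr] = ipaddress.ip_network(cidr, strict=False)

    def matches(self, addr: str) -> bool:
        ip = ipaddress.ip_address(addr)
        return any(ip in net for net in self._nets.values())

bl = RangeBlocklist()
bl.add("192.168.0.0/24")   # IPv4 range
bl.add("2001:db8::/32")    # IPv6; ipaddress hides the ugly bit mapping
assert bl.matches("192.168.0.17")
assert not bl.matches("10.0.0.1")
```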
Greg Farnum [Tue, 2 Nov 2021 00:38:50 +0000 (00:38 +0000)]
osdmap: convert get_blocklist() to provide the entity/IP and range blocklists
Providing a non-range-aware blocklist accessor would just be
asking for trouble, so don't.
The ugly part of this is how the Objecter is currently just
throwing the range blocklist on the end of its own list. The in-tree
callers are okay with this, and I'd like to look at removing the
blocklist events API from librados entirely -- it exposes "OSD-only"
state to clients and, as evidenced by this patch series, is not
particularly stable.
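A sketch of the accessor shape being described (Python with assumed names; the real method is C++ and fills out-parameters):
```python
from dataclasses import dataclass, field

@dataclass
class OSDMap:
    """Toy stand-in for the C++ class; field names are assumptions."""
    entity_blocklist: list = field(default_factory=list)
    range_blocklist: list = field(default_factory=list)

    def get_blocklist(self):
        # hand back both lists together, so a caller can't
        # accidentally ignore the range entries
        return list(self.entity_blocklist), list(self.range_blocklist)

m = OSDMap(["v1:10.0.0.5:0/123"], ["10.1.0.0/16"])
entities, ranges = m.get_blocklist()
# Objecter-style caller: currently just appends the ranges
# onto the end of its own flat list.
events = entities + ranges
```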
Greg Farnum [Wed, 8 Dec 2021 21:32:58 +0000 (21:32 +0000)]
mon: take blocklist ranges as a subcommand, not implicitly from address format
I discovered in testing with CephFS that this tends to interpret client IPs
(which don't have ports, but do have nonces) as invalid ranges. So give it
a separate input keyword that has to be applied first.
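To see why a dedicated keyword helps: a client address carries a nonce after a slash (something like `10.0.0.1:0/3134966675`), which has the same shape as a CIDR string but is never a valid range. A quick illustration with Python's ipaddress module (the values are made up):
```python
import ipaddress

# A real CIDR range parses cleanly...
print(ipaddress.ip_network("10.0.0.0/24"))

# ...but an "addr/nonce" read as a range is simply invalid,
# since a nonce is far larger than any prefix length.
try:
    ipaddress.ip_network("10.0.0.1/3134966675")
except ValueError as err:
    print("not a range:", err)
```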
Greg Farnum [Mon, 25 Oct 2021 19:53:04 +0000 (19:53 +0000)]
msg: common: allow entity_addr_t to store a CIDR address range
This required very little change to the existing code. Use it with care, because
existing code expects an IP address rather than a range; still, reusing the
existing parser saves writing a new one.
Greg Farnum [Tue, 2 Nov 2021 00:34:34 +0000 (00:34 +0000)]
mds: Server: Simplify apply_blocklist and usage of the OSDMap's blocklist
This previously re-implemented a bunch of the OSDMap::is_blocklisted()
function and wasn't actually any faster to run: the list of new blocklist
entries may be smaller than the full set, but OSDMap::blocklist is an
unordered_map with constant lookup time, so it shouldn't slow things down.
More importantly, this is much simpler, less likely to be buggy from
duplicated code, and lets the MDS off the hook for dealing with range
blocklisting.
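A sketch of the simplification in Python (names assumed; the real code is the MDS Server in C++): instead of replaying blocklist deltas, just ask the map about each session:
```python
def sessions_to_evict(sessions, osdmap):
    # O(len(sessions)) with constant-time lookups inside
    # is_blocklisted(); range handling stays inside OSDMap.
    return [s for s in sessions if osdmap.is_blocklisted(s.addr)]
```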
Greg Farnum [Mon, 1 Nov 2021 23:52:53 +0000 (23:52 +0000)]
client: Simplify blocklist tracking and interface
I'm not sure if the blocklist events tracking in Client.cc was ever
the simplest way to track that state, but it definitely isn't now. We
can just hand our addr_vec to the OSDMap and ask it -- it handles
version compatibility issues and, happily, means the Client doesn't
need to learn to deal with ranges directly.
Laura Flores [Tue, 24 May 2022 17:21:46 +0000 (17:21 +0000)]
qa/config: override `bluestore_zero_block_detection` default for rados suite tests
By default, `bluestore_zero_block_detection` is false since it interacts
negatively with some RBD and CephFS features. We still want it enabled
for rados teuthology tests, though, so we can use it for large-scale testing.
Adding this setting to the rados teuthology config file overrides the global
config setting and enables `bluestore_zero_block_detection` only for tests in
the teuthology rados suite.
Laura Flores [Fri, 6 May 2022 17:14:10 +0000 (17:14 +0000)]
os/bluestore: turn bluestore zero block detection off by default
This commit adds a new global config option called `bluestore_zero_block_detection`.
This option is used at a dev level to skip zero bufferlist writes in large-scale
testing scenarios.
Store tests originally written for the BlueStore zero block detection feature
now account for the global config option; these tests will only run if
`bluestore_zero_block_detection` is set to "true".
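The detection itself amounts to a cheap all-zeros test on the incoming buffer. A hedged Python sketch of the idea (BlueStore's real check operates on a C++ bufferlist):
```python
def is_zero(buf: bytes) -> bool:
    # bytes.count is a fast C-level scan
    return buf.count(0) == len(buf)

def do_real_write(offset: int, buf: bytes) -> None:
    ...  # stand-in for the actual store write path

def write(offset: int, buf: bytes, detect_zero: bool) -> None:
    if detect_zero and is_zero(buf):
        return  # skip the write; reads of this extent return zeros anyway
    do_real_write(offset, buf)
```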
Fixes: https://tracker.ceph.com/issues/55521
Signed-off-by: Laura Flores <lflores@redhat.com>
(cherry picked from commit 4a8180952c292aa2f8ee656685fcdddc3f0f0a4f)
Nizamudeen A [Sun, 8 May 2022 14:27:34 +0000 (19:57 +0530)]
mgr/dashboard: smart data for devices with scsi protocol
In the dashboard, we've been showing SMART data only for HDD devices with the ATA
protocol. Otherwise we show a "No SMART data found" error, which is
clearly misleading since SMART data is returned even in the API call.
So this PR shows the SMART data for HDD devices
that use the SCSI protocol too.
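A hedged sketch of the dashboard-side branching (Python; the field names follow smartctl's JSON output, and the surrounding service code is assumed):
```python
from typing import Optional

def extract_smart_data(dev: dict) -> Optional[dict]:
    # Previously only the ATA branch existed, so SCSI drives fell
    # through to "No SMART data found" even though the API had data.
    protocol = dev.get("device", {}).get("protocol", "").lower()
    if protocol == "ata":
        return dev.get("ata_smart_attributes")
    if protocol == "scsi":
        return dev.get("scsi_error_counter_log")
    return None
```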
Nizamudeen A [Fri, 6 May 2022 15:19:18 +0000 (20:49 +0530)]
mgr/dashboard: fix smart data error
The error in the log was this:
```
"/usr/share/ceph/mgr/dashboard/services/ceph_service.py", line 253, in _get_smart_data_by_device
May 06 07:38:39 occldlr750-1.occl208.lab conmon[2142938]: svc_type, svc_id = daemon.split('.')
May 06 07:38:39 occldlr750-1.occl208.lab conmon[2142938]: ValueError: too many values to unpack (expected 2)
```
On the cluster, in the output of `ceph device ls-by-host`, the first device
belongs to a mon, and its daemon name is mon.occldlr750-1.occl208.lab.
In our dashboard code, when fetching the smart data we have a line like
this:
`svc_type, svc_id = daemon.split('.')`
For that mon, `daemon.split('.')` returns ['mon', 'occldlr750-1', 'occl208', 'lab'], so the svc_id gets split into three pieces. I am changing that to split only on the first occurrence of the dot and to treat everything that comes after the dot as the svc_id of the device.
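A minimal sketch of the corrected parsing, using the daemon name from the log above:
```python
# Split only on the first dot: the service type never contains one,
# but the service id (here a fully qualified hostname) may.
daemon = "mon.occldlr750-1.occl208.lab"
svc_type, svc_id = daemon.split(".", 1)
assert svc_type == "mon"
assert svc_id == "occldlr750-1.occl208.lab"
```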
Nizamudeen A [Thu, 5 May 2022 17:43:38 +0000 (23:13 +0530)]
mgr/dashboard: devices with same UID causes multiselection
In the Physical Disks page, multiple devices end up with the same uid, which
causes the selection to go berserk and select multiple rows at once. The uid
is generated in the frontend service call itself. I just added some more
parameters to it in order to make it unique.
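As a language-agnostic sketch of the idea (the real service is TypeScript; the field names are illustrative): build the uid from several fields so that two devices can no longer collide:
```python
def device_uid(dev: dict) -> str:
    # hostname + device id + path together are far less likely
    # to collide than a single field alone
    return "-".join([dev["hostname"], dev["device_id"], dev["path"]])
```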
The second issue is the count of selected items multiplying
exponentially. It's because each time the table is updated or refreshed,
we push the row with the number of selected items we had before, and that
causes the selection count to multiply.
In `master` the milestone step exits and causes the remaining tasks not to run. I previously tried the `continue-on-error` flag, but it didn't work, so let's try putting that step at the end.
Laura Flores [Mon, 16 May 2022 22:59:42 +0000 (17:59 -0500)]
qa/suites/rados/thrash-erasure-code-big/thrashers: add `osd max backfills` setting to mapgap and pggrow
All `rados/thrash-erasure-code-big` tests that die due to the `wait_for_recovery` timeout have one thing in common: they contain either `thrashers/pggrow` or `thrashers/mapgap`.
What sets pggrow and mapgap apart from the non-offending thrashers (default, careful, fastread, and morepggrow) is that they lack an override for `osd max backfills`, the maximum number of backfill operations allowed to or from a single OSD. The higher the number, the quicker the recovery. By default this value is 1; each of the non-offending thrashers overrides it in its .yaml file with a value greater than 1, while pggrow and mapgap do not.
The mclock op scheduler is known to override `osd max backfills` with a high value, but all of the thrash-erasure-code-big thrashers have their op queue set to `debug_random`, which chooses randomly between op queues (the debug_random op queue is set to override the default mclock_scheduler in qa/config/rados.yaml). Coupled with the debug_random op queue, the low `osd max backfills` setting is causing some tests to time out in recovery.
WITHOUT `osd max backfills`, as they are now, mapgap and pggrow tests die due to timed-out recovery about 17 times out of 100, as seen here with a pggrow test: http://pulpito.front.sepia.ceph.com/lflores-2022-05-18_14:24:29-rados:thrash-erasure-code-big-master-distro-default-smithi/
WITH `osd max backfills` specified, as I have suggested in this PR, 99/100 tests passed, with one test failing for a different reason:
http://pulpito.front.sepia.ceph.com/lflores-2022-05-17_22:40:27-rados:thrash-erasure-code-big-master-distro-default-smithi/
I also scheduled 145 tests WITH `osd max backfills`, a mix of pggrow and mapgap thrashers; 144/145 passed, with one test failing for a different reason. http://pulpito.front.sepia.ceph.com/lflores-2022-05-17_15:27:54-rados:thrash-erasure-code-big-master-distro-default-smithi/
Fixes: https://tracker.ceph.com/issues/51076
Signed-off-by: Laura Flores <lflores@redhat.com>
(cherry picked from commit 40062676c2ceed49b9fa147127ffa83ba6118e2a)