Yaarit Hatuka [Thu, 27 Aug 2020 03:04:34 +0000 (23:04 -0400)]
mgr/telemetry: fix device id splitting when anonymizing serial
Anonymizing the serial number in the device id string fails in rare
cases where 'vendor' and 'model' are missing from the device id
string. Ideally, device id is generated (in blkdev.cc) as
'vendor_model_serial', in case all fields were successfully retrieved
from the device. In cases where they were not, device id can also be
generated as 'model_serial' or 'serial'. Splitting by '_' fails in the
latter case (since 'serial' is the only element in the string).
In order to anonymize serial numbers in smartctl reports we now rely
on the serial number value as retrieved from the raw smartctl report
itself (as opposed to the one in device id). That's in order to avoid
possible inconsistencies between the serial retrieved from device id and
the one in the report.
In master we use Python 3's f-string formatting to create 'anon_devid':
anon_devid = f"{devid.rsplit('_', 1)[0]}_{uuid.uuid1()}"
The conflict happened since Nautilus still uses Python 2, and 'anon_devid'
is created via string concatenation:
anon_devid = devid[:devid.rfind('_')] + '_' + str(uuid.uuid1())
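The splitting problem can be illustrated with a minimal sketch. This is not the code in the telemetry module: `anonymize_devid` is a hypothetical helper, and it assumes the serial is always the last '_'-separated element of the device id. It uses `str.rpartition` so that a serial-only device id (no '_' at all) is handled without keeping any part of the serial.

```python
import uuid


def anonymize_devid(devid):
    """Hypothetical helper: swap the trailing serial for a random UUID.

    Works for 'vendor_model_serial', 'model_serial' and plain 'serial'.
    """
    # rpartition splits on the LAST '_'; when there is no '_' it returns
    # ('', '', devid), so prefix and sep are both empty strings.
    prefix, sep, _serial = devid.rpartition('_')
    return prefix + sep + str(uuid.uuid1())
```

With a full device id the vendor and model prefix is kept and only the last element is replaced; with a serial-only id the result is just the UUID.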
ceph.spec.in, debian/control: add smartmontools and nvme-cli dependencies
These packages are needed in order to scrape device health metrics from
devices used by OSD and MON daemons.
smartmontools' smartctl is what we use in order to scrape devices' SMART
attributes and general health metrics.
In addition, we use nvme-cli tool on NVMe devices, which fetches
vendor specific NVMe related health metrics.
Ceph relies on these tools for proper functioning of the underlying layers
of devicehealth mgr module, and other mgr modules which use devicehealth
functionality (such as diskprediction_local, telemetry, dashboard).
Essentially, most of devicehealth commands rely on proper functioning of
smartctl, otherwise they lack the device health metrics.
For example, in case smartctl is missing, the commands:
ceph device scrape-daemon-health-metrics <who>
ceph device scrape-health-metrics [<devid>]
will not be able to scrape health metrics, and the command:
ceph device predict-life-expectancy <devid>
will not provide any meaningful output (since there are no metrics).
In short, when we scrape a device by its daemon (be it an OSD or a MON):
ceph device scrape-daemon-health-metrics <who>
The devicehealth module command eventually invokes a
block_device_get_metrics() call in either osd/OSD.cc or mon/Monitor.cc,
which wraps calls to both
block_device_run_smartctl() (spawns smartctl)
block_device_run_vendor_nvme() (spawns nvme)
in common/blkdev.cc.
Minimum version requirements:
'smartmontools' is the package name, which contains two utility
programs: 'smartd' and 'smartctl'. Ceph uses the latter.
Version 6.7 of smartctl first introduced the --json option (beta), which
outputs the metrics in JSON format. Since then a few adjustments were
made and the feature was officially launched in smartctl
version 7.0.
Since we rely on the JSON format to process the metrics, we must have
smartmontools' smartctl version >= 7.
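The version requirement above can be expressed as a small check. This is a sketch only, not something Ceph does at runtime as far as this commit message says: `smartctl_supports_json` is a hypothetical helper, and it assumes the version banner starts with `smartctl <major>.<minor>`.

```python
import re


def smartctl_supports_json(version_line):
    """Return True if the banner reports smartctl >= 7.0.

    Assumption: the banner looks like 'smartctl 7.1 ...'.
    Anything that cannot be parsed is treated as unsupported.
    """
    match = re.search(r"smartctl\s+(\d+)\.(\d+)", version_line)
    if match is None:
        return False
    major, minor = int(match.group(1)), int(match.group(2))
    # Tuple comparison orders by major first, then minor.
    return (major, minor) >= (7, 0)
```

A 6.x banner (where --json was still beta) is rejected, and a 7.x banner is accepted.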
That said, we choose not to specify smartmontools version here on
purpose, since there might be a scenario where:
We specified smartmontools version to be >= 7.
smartmontools 7 is not available yet in rhel 8 / centos 8.
A user installs via rpm ceph-osd, for example.
smartmontools will not be installed (since version >= 7 is not available
in this repo yet).
Then the user upgrades to 8.3 (which should have smartmontools >= 7),
but smartmontools will not get upgraded (since it's not installed).
In the scenario where we do not specify a version, smartmontools 6.6
will be installed, but it will be upgraded to >= 7 when a user upgrades
(and if it's a fresh installation - version >= 7 would be installed
anyway).
nvme-cli does not have a minimum version.
We use 'Recommends' for both rpm and deb packages since we do not want
the installation to fail in case of conflicts. 'Recommends' weakens the
dependency: it is installed when possible, but ignored when it
conflicts with other dependencies.
It's worth mentioning that smartmontools and nvme-cli dependencies exist
in ceph-container builds.
We add them here for the cases of bare metal installations.
In the future we will add a separate package (with smartmontools and
nvme-cli dependencies) that can be installed on any node (running
rbd-mirror, rgw, mds, mgr, etc.), in order to be able to collect the
health metrics of its devices and offer their life expectancy
prediction.
Had to remove the line:
Requires: python%{python3_pkgversion}-ceph-common = %{_epoch_prefix}%{version}-%{release}
which slipped in between
Requires: libstoragemgmt
and
%if 0%{?weak_deps}
Also, removed from the cherry-picked commit the dependencies for mon package
from both ceph.spec.in and debian/control.
That's because in Nautilus we do not scrape the health metrics of mon devices
(please see commit d592e56e74d94c6a05b9240fcb0031868acefbab).
`ceph-volume simple activate --all` relies on the presence of json files
in `/etc/ceph/osd` that were created with the `ceph-volume simple scan`
command.
In a cluster lifecycle, it is very likely that an OSD which was deployed with
ceph-disk gets removed or replaced at some point. It means the corresponding
json file in `/etc/ceph/osd` becomes irrelevant. It makes `ceph-volume
simple activate --all` fail because it tries to mount non-existent
partitions.
The idea here is to simply warn the user that the osd described in the
json file doesn't exist anymore and exit properly instead of throwing an
error.
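The warn-and-skip idea can be sketched as follows. This is not the ceph-volume implementation: `activatable_osds` is a hypothetical helper, and the assumed JSON layout (`{"data": {"path": ...}}`) is an illustration of what a scan file might contain.

```python
import glob
import json
import logging
import os

logger = logging.getLogger(__name__)


def activatable_osds(osd_dir="/etc/ceph/osd"):
    """Yield (json_path, metadata) for OSDs whose data device still exists.

    Stale json files (device removed or replaced) only produce a warning.
    """
    for json_path in sorted(glob.glob(os.path.join(osd_dir, "*.json"))):
        with open(json_path) as f:
            metadata = json.load(f)
        # Assumed layout: the data partition lives under data.path.
        data_path = metadata.get("data", {}).get("path")
        if not data_path or not os.path.exists(data_path):
            logger.warning("%s: OSD device %s does not exist anymore, "
                           "skipping", json_path, data_path)
            continue
        yield json_path, metadata
```

A caller would then activate only the entries yielded here, so one stale file no longer aborts the whole `--all` run.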
Bryan Stillwell [Tue, 24 Mar 2020 21:15:41 +0000 (15:15 -0600)]
compressor: Set the Zstd default compression level to 1
The default compression level of 5 for Zstandard is too high for the majority
of use cases since it requires too many CPU cycles. This patch switches the
default to 1.
Dan van der Ster [Mon, 14 Sep 2020 14:23:53 +0000 (16:23 +0200)]
ceph.in: ignore failures to flush stdout
Catch an IOError exception when flushing ceph stdout.
Fixes: https://tracker.ceph.com/issues/47442
Signed-off-by: Dan van der Ster <daniel.vanderster@cern.ch>
(cherry picked from commit 48503413a28fbea32f8ef3d48cb765771216f165)
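A minimal sketch of the pattern, assuming a hypothetical helper name (`flush_ignoring_errors`); the actual change lives in ceph.in and may be structured differently. One typical trigger is a reader that has already closed its end of a pipe.

```python
import sys


def flush_ignoring_errors(stream=None):
    """Flush a stream, swallowing IOError instead of crashing the CLI."""
    stream = sys.stdout if stream is None else stream
    try:
        stream.flush()
    except IOError:
        # Nothing useful can be done once stdout is gone; exit quietly.
        pass
```

In Python 3 `IOError` is an alias of `OSError`, so this also covers `BrokenPipeError`.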
mgr/dashboard: Monitoring: Fix for the infinite loading bar action
Only seen in nautilus
Intended to fix the unusual behaviour in the All Alerts tab where the loading bar progresses continuously until one of the alerts is selected.
To reproduce:
Navigate to cluster -> Monitoring -> All Alerts tab. You can see the progress bar at the bottom of the table.
Fixes: https://tracker.ceph.com/issues/47435
Signed-off-by: Nizamudeen A <nia@redhat.com>
Jason Dillaman [Wed, 5 Aug 2020 16:36:26 +0000 (12:36 -0400)]
test/rbd-mirror: pool watcher registration error might result in race
The init finish context should be swapped out before it attempts to
re-register the watcher. This affects the test case which mocks the
timer to fire immediately instead of after 30 seconds.
Fixes: https://tracker.ceph.com/issues/46669
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit c89d31ebf6c412d609123979c63ebc600b70e179)
Conflicts:
src/tools/rbd_mirror/PoolWatcher.cc
- nautilus uses Mutex::Locker where master has std::lock_guard
RTD does not support installing system packages; the only ways to install
dependencies are setuptools and pip, while ditaa is a tool written in
Java. So we need to find a native Python tool allowing us to render ditaa
images. plantweb is able to use a web service for rendering the ditaa
diagram, so let's use it as a fallback if "ditaa" is not around.
Also start a new line after the directive, otherwise the plantweb server will
return 500 on seeing the diagram.
Conflicts:
doc/cephfs/cephfs-io-path.rst
doc/dev/deduplication.rst
doc/install/ceph-deploy/quick-cephfs.rst
doc/radosgw/vault.rst
doc/rbd/rbd-kubernetes.rst
doc/rbd/rbd-persistent-cache.rst: these files do not exist in
nautilus, so drop related changes
doc/conf.py: exclude pybindings docs from build for RTD
because it'd be difficult to prepare (dummy) librados, libcephfs and librbd for
their python bindings in the building environment offered by Read the Docs.
Jason Dillaman [Wed, 5 Aug 2020 13:12:41 +0000 (09:12 -0400)]
librbd: migration abort should revert data back to the original image
If the migration destination image was modified and then the migration
was aborted, we need to copy the data back to the source image to avoid
losing data. For simplicity we will only revert the HEAD revision state
and will not attempt to copy new snapshots on the destination image
back to the source.
Fixes: https://tracker.ceph.com/issues/41394
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit 5bd15da8be09a4e7644d411a0b0c132e5b795393)
We want to prevent the destination image from being used while an
abort is in-progress. Test that the image has no watchers prior to
permitting the abort, switch the migration state to ABORTING, and
treat the image as read-only if the migration state is ABORTING.
Jason Dillaman [Wed, 5 Feb 2020 20:27:39 +0000 (15:27 -0500)]
librbd: ensure deep-copy snapshot map includes all destination snap ids
When deep-copying from an arbitrary start snapshot id, the snap sequence
will be missing all older snapshots. Additionally, snapshot types that
are not deep-copied still need to be included in the destination snap
map.
Jason Dillaman [Wed, 5 Feb 2020 19:23:53 +0000 (14:23 -0500)]
librbd: deep-copy snapshots from a specified start/end position
Allow the snapshots to be arbitrarily copied from any source image
start/end snapshot ids. If the end snapshot is not a user-snapshot,
it will associate to the destination image HEAD revision.
Conflicts:
src/librbd/deep_copy/SnapshotCopyRequest.cc: different lock types
src/test/librbd/deep_copy/test_mock_SnapshotCopyRequest.cc: no mirror snapshot namespaces
Jason Dillaman [Wed, 5 Feb 2020 15:42:27 +0000 (10:42 -0500)]
librbd: deep-copy should accept a lower-bound for the destination snap_id
For snapshot-based mirroring, we will want to prevent the modification of
snapshots below the last sync snapshot and to prevent the copying of data
below that lower-bound as well. This commit just adds the new parameter and
future commits will update the snapshot and object copy behavior.
Greg Farnum [Wed, 12 Aug 2020 23:44:11 +0000 (23:44 +0000)]
mon: mark pgtemp messages as no_reply more consistently in preprocess_pgtemp
If a message is forwarded, it's conceivable the leader's and peon's evaluation
will disagree about whether the message is useful or not, which could result
in the leader ignoring it and the peon having a dangling forwarded message.
Fix this by marking the op as no_reply whenever ignoring it.
J. Eric Ivancich [Tue, 15 Sep 2020 18:20:04 +0000 (14:20 -0400)]
rgw: advance pseudo-folders properly in delimited ordered listing
The code mistakenly uses the current marker to figure out how to skip
past a pseudo-directory. This could allow for some entries in a bucket
to be skipped. The code should have used the current pseudo-directory
to determine what to skip past.
Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
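The skipping rule described in the rgw listing fix can be sketched on a plain sorted list of keys. This is an illustration of the technique only, not the rgw code: `list_delimited` and its parameters are hypothetical, and the bucket index is modelled as an in-memory sorted list.

```python
import bisect


def list_delimited(keys, prefix="", delimiter="/"):
    """Return (objects, pseudo_dirs) for a sorted list of object names."""
    objects, pseudo_dirs = [], []
    i = bisect.bisect_left(keys, prefix)
    while i < len(keys) and keys[i].startswith(prefix):
        key = keys[i]
        pos = key.find(delimiter, len(prefix))
        if pos < 0:
            # Plain entry at this level.
            objects.append(key)
            i += 1
            continue
        pseudo_dir = key[:pos + len(delimiter)]
        pseudo_dirs.append(pseudo_dir)
        # Skip past the pseudo-directory using the pseudo-directory itself
        # (not the current marker/key): the smallest string sorting after
        # every key that starts with pseudo_dir is pseudo_dir with its
        # last character incremented.
        skip_to = pseudo_dir[:-1] + chr(ord(pseudo_dir[-1]) + 1)
        i = bisect.bisect_left(keys, skip_to, i)
    return objects, pseudo_dirs
```

For example, with the keys `a`, `dir/x`, `dir/y`, `dir0` and `z`, the entry `dir0` sorts immediately after everything under `dir/` and must not be skipped.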
While this did fix https://tracker.ceph.com/issues/40905, it did so in
an unnecessarily complex manner. So we're reverting it to more easily
apply a cleaner solution.
Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
Ilya Dryomov [Sat, 29 Aug 2020 10:02:30 +0000 (12:02 +0200)]
msg/async/ProtocolV2: allow rxbuf/txbuf get bigger in testing
We have a kernel client test case that constructs huge auth tickets
to exercise the three related code paths in the kernel. One of the
tickets is bigger than 1000000 bytes, as required for triggering the
third code path.
We haven't bumped into this assert earlier because the kernel client
is still on msgr v1. However, "rbd map" and "rbd unmap" commands
started connecting to the cluster in commit 96f05a7956b3 ("rbd: delay
determination of default pool name") and that happens via msgr v2.
Satoru Takeuchi [Fri, 22 May 2020 01:45:32 +0000 (01:45 +0000)]
ceph-volume: show correct rejected reason in inventory if device type is not acceptable
If device type is not acceptable in `c-v inventory`, its rejected reason
becomes "Insufficient space (<5GB)" by mistake. It's because sys_api is
empty due to skipping devices that are neither `disk` nor `device`. We
should report the target device is not acceptable in this case.
osd: add and utilize OSD_FIXED_COLLECTION_LIST feature
If all osds from upacting set have this feature set
the backend can use the new "fixed" collection_list method,
otherwise it falls back to the legacy method.
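A rough sketch of this kind of feature gating, under stated assumptions: the bit value of `OSD_FIXED_COLLECTION_LIST` below is made up, `use_fixed_collection_list` is a hypothetical helper, and treating an empty set as "legacy" is a conservative choice of this sketch, not something the commit states.

```python
# Hypothetical bit position, for illustration only.
OSD_FIXED_COLLECTION_LIST = 1 << 0


def use_fixed_collection_list(upacting_features):
    """True only if every OSD in the up/acting set advertises the feature.

    upacting_features: iterable of per-OSD feature bitmasks.
    """
    features = list(upacting_features)
    # A single OSD without the bit forces the legacy method for all.
    return bool(features) and all(
        f & OSD_FIXED_COLLECTION_LIST for f in features)
```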
include/encoding: Fix encode/decode of float types on big-endian systems
Currently, floating-point types use "raw" encoding, which means they're
simply copied as byte stream.
This means that if the decoding happens on a machine that differs in
byte order from the source machine, the returned value will be
incorrect. As one effect of this problem, a big-endian OSD node cannot
join a cluster where the MON node is little-endian (or vice versa),
because the OSDMap (incremental) structure contains floating-point
values, and as a result of this conversion problem, the OSD node will
crash with an assertion failure as soon as it receives any OSDMap update
from the MON.
This should be fixed by always encoding floating-point values in
little-endian byte order just as is done for integers. (Note that this
still assumes source and target machines used the same floating-point
format except for byte order. But given that nearly all platforms these
days use IEEE binary32/binary64 for float/double, that seems a
reasonable assumption.)
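The byte-order issue can be demonstrated with Python's `struct` module. This is an illustration of the encoding rule, not the C++ code in include/encoding; the helper names are hypothetical.

```python
import struct


def encode_double(value):
    # '<' forces little-endian IEEE binary64 regardless of host byte order.
    return struct.pack("<d", value)


def decode_double(buf):
    return struct.unpack("<d", buf)[0]


def raw_copy_big_endian(value):
    # Simulates the bytes a big-endian host would emit with a "raw" copy.
    return struct.pack(">d", value)
```

A value written with `encode_double` round-trips on any host, whereas decoding the output of `raw_copy_big_endian(1.0)` as little-endian yields a different number, which is the mismatch described above.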
Andrew Schoen [Fri, 4 Sep 2020 14:44:49 +0000 (09:44 -0500)]
ceph-volume: simple scan should ignore tmpfs
When simple scan is run against a ceph-volume
OSD, util.encryption.legacy_encrypted returns
tmpfs. We want to avoid creating a Device
object with tmpfs and ignore the OSD as it's
not a ceph-disk created OSD.
Note the public_addr above being split at the first ':' of an IPv6
address.
Signed-off-by: Matthew Oliver <moliver@suse.com>
Fixes: https://tracker.ceph.com/issues/46846
(cherry picked from commit 985cce055bcee60b843806291458517c7ee890a3)
mon/OSDMonitor: only take in osd into consideration when trimming osdmaps
we should not take down osd into consideration when trimming osdmap. in
e62269c892, we decrease the upper bound of the range of osdmaps to be trimmed
if the given osd is out. but we should decrease it only if the
osd in question is still *in*.
so, in this change, the min_lec is decreased only if the osd in question
is *in*.
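The rule can be sketched as below. This is a simplified model, not the OSDMonitor code: `min_last_epoch_clean`, `osd_epochs` and `in_osds` are hypothetical names, and the starting `floor` stands in for whatever lower bound the monitor computed beforehand.

```python
def min_last_epoch_clean(floor, osd_epochs, in_osds):
    """Lower the trim bound only for OSDs that are still 'in'.

    floor:      starting upper bound for trimming (an epoch)
    osd_epochs: dict of osd id -> last epoch reported by that OSD
    in_osds:    set of osd ids currently marked 'in'
    """
    for osd_id, epoch in osd_epochs.items():
        # OSDs that are not 'in' must not hold back trimming.
        if osd_id in in_osds and epoch < floor:
            floor = epoch
    return floor
```

With `floor=100` and epochs `{0: 90, 1: 50}`, the result is 90 when only osd 0 is in, so the stale osd 1 no longer pins old osdmaps.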
Ilya Dryomov [Mon, 24 Aug 2020 17:01:46 +0000 (19:01 +0200)]
krbd: optionally skip waiting for udev events
Add support for noudev option to allow mapping and unmapping images
from a privileged container in a non-initial network namespace (e.g.
when using Multus CNI).
Conflicts:
doc/man/8/rbd.rst [ crush_location, read_from_replica and
compression_hint map options not in nautilus ]
src/krbd.cc [ commits 08bf0b628803 ("krbd: do away with
explicit memory management") and 1e67e240f4dd ("krbd: misc
cleanups") not in nautilus ]
src/tools/rbd/action/Kernel.cc [ commits 34f539d8af33 ("rbd:
delay parsing of default kernel map options") and da4ffd834fb8
("rbd: rename some MapOptions instances to unmap_options") not
in nautilus ]