ceph.spec.in, debian/control: add smartmontools and nvme-cli dependencies
These packages are needed in order to scrape device health metrics from
devices used by OSD and MON daemons.
smartmontools' smartctl is what we use in order to scrape devices' SMART
attributes and general health metrics.
In addition, we use the nvme-cli tool on NVMe devices, which fetches
vendor-specific NVMe health metrics.
Ceph relies on these tools for the proper functioning of the underlying
layers of the devicehealth mgr module, and of other mgr modules which use
devicehealth functionality (such as diskprediction_local, telemetry,
dashboard).
Essentially, most devicehealth commands rely on a working smartctl;
without it they have no device health metrics to work with.
For example, if smartctl is missing, the commands:
ceph device scrape-daemon-health-metrics <who>
ceph device scrape-health-metrics [<devid>]
will not be able to scrape health metrics, and the command:
ceph device predict-life-expectancy <devid>
will not provide any meaningful output (since there are no metrics).
In short, when we scrape a device by its daemon (be it an OSD or a MON):
ceph device scrape-daemon-health-metrics <who>
The devicehealth module command eventually invokes a
block_device_get_metrics() call in either osd/OSD.cc or mon/Monitor.cc,
which wraps calls to both
block_device_run_smartctl() (spawns smartctl)
block_device_run_vendor_nvme() (spawns nvme)
in common/blkdev.cc.
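For illustration, here is a minimal Python sketch of roughly what that
scrape amounts to; the exact flags Ceph passes to smartctl and nvme live
in common/blkdev.cc and may differ from these:

    import json
    import subprocess

    def scrape_device_health(dev):
        # smartctl >= 7 is needed for stable --json output
        out = subprocess.run(["smartctl", "-a", "--json", dev],
                             capture_output=True, text=True)
        metrics = json.loads(out.stdout)
        # on NVMe devices, nvme-cli supplies additional (vendor specific)
        # health metrics; "smart-log" is illustrative here
        if "nvme" in dev:
            nvme_out = subprocess.run(
                ["nvme", "smart-log", dev, "--output-format=json"],
                capture_output=True, text=True)
            metrics["nvme_smart_log"] = json.loads(nvme_out.stdout)
        return metrics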
Minimum version requirements:
'smartmontools' is the package name, which contains two utility
programs: 'smartd' and 'smartctl'. Ceph uses the latter.
Version 6.7 of smartctl first introduced the --json option (as a beta
feature), which allows outputting the metrics in JSON format. Since then
a few adjustments were made, and the feature officially launched in
smartctl version 7.0.
Since we rely on the JSON format to process the metrics, we must have
smartmontools' smartctl version >= 7.
That said, we deliberately do not specify a smartmontools version here,
since otherwise there might be a scenario where:
- We specify the smartmontools version to be >= 7.
- smartmontools 7 is not yet available in rhel 8 / centos 8.
- A user installs ceph-osd, for example, via rpm.
- smartmontools is not installed (since version >= 7 is not available in
  this repo yet).
- The user then upgrades to 8.3 (which should have smartmontools >= 7),
  but smartmontools does not get upgraded (since it was never installed).
When we do not specify a version, smartmontools 6.6 is installed instead,
and it will be upgraded to >= 7 when the user upgrades (and on a fresh
installation, version >= 7 is installed anyway).
nvme-cli does not have a minimum version requirement.
We use 'Recommends' for both rpm and deb packages, since we do not want
the installation to fail in case of conflicts. 'Recommends' weakens the
dependency: it is installed whenever possible, but ignored if it
conflicts with other dependencies.
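For reference, the added entries would look roughly like this (the exact
placement within ceph.spec.in and debian/control is illustrative):

    # ceph.spec.in -- weak dependencies are gated on rpm support:
    %if 0%{?weak_deps}
    Recommends:     smartmontools
    Recommends:     nvme-cli
    %endif

    # debian/control -- in the relevant package stanza:
    Recommends: smartmontools,
                nvme-cli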
It's worth mentioning that the smartmontools and nvme-cli dependencies
already exist in ceph-container builds; we add them here for bare metal
installations.
In the future we will add a separate package (with smartmontools and
nvme-cli dependencies) that can be installed on any node (running
rbd-mirror, rgw, mds, mgr, etc.), in order to be able to collect the
health metrics of its devices and offer their life expectancy
prediction.
Had to remove the line:
Requires: python%{python3_pkgversion}-ceph-common = %{_epoch_prefix}%{version}-%{release}
which slipped in between
Requires: libstoragemgmt
and
%if 0%{?weak_deps}
Also, removed from the cherry-picked commit the dependencies for the mon
package from both ceph.spec.in and debian/control, because in Nautilus we
do not scrape the health metrics of mon devices (please see commit
d592e56e74d94c6a05b9240fcb0031868acefbab).
mgr/dashboard: Monitoring: Fix for the infinite loading bar action
Only seen in nautilus
Intended to fix the unusual behaviour in the All Alerts tab, where the loading bar progresses continuously until one of the alerts is selected.
To reproduce:
Navigate to cluster -> Monitoring -> All Alerts tab. You can see the progress bar at the bottom of the table.
Fixes: https://tracker.ceph.com/issues/47435
Signed-off-by: Nizamudeen A <nia@redhat.com>
RTD does not support installing system packages; the only ways to
install dependencies are setuptools and pip, while ditaa is a tool
written in Java. So we need to find a native Python tool that allows us
to render ditaa images. plantweb is able to use a web service for
rendering ditaa diagrams, so let's use it as a fallback if "ditaa" is
not around.
Also start a new line after the directive, otherwise the plantweb server
will return 500 at seeing the diagram.
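A minimal sketch of what such a fallback in doc/conf.py might look like
(plantweb ships a Sphinx extension named plantweb.directive; the actual
conf.py change may differ in detail):

    import shutil

    # 'extensions' is the usual Sphinx extension list in conf.py
    if shutil.which('ditaa'):
        # a local ditaa binary (and Java) is available
        extensions.append('sphinxcontrib.ditaa')
    else:
        # no Java/ditaa, e.g. on Read the Docs: fall back to plantweb,
        # which renders ditaa diagrams via a web service
        extensions.append('plantweb.directive')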
Conflicts:
doc/cephfs/cephfs-io-path.rst
doc/dev/deduplication.rst
doc/install/ceph-deploy/quick-cephfs.rst
doc/radosgw/vault.rst
doc/rbd/rbd-kubernetes.rst
doc/rbd/rbd-persistent-cache.rst: these files do not exist in
nautilus, so drop related changes
doc/conf.py: exclude pybindings docs from build for RTD,
because it'd be difficult to prepare (dummy) librados, libcephfs and
librbd for their Python bindings in the build environment offered by
Read the Docs.
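A sketch of how that exclusion might be expressed in doc/conf.py (Read
the Docs sets the READTHEDOCS environment variable; the excluded paths
here are illustrative):

    import os

    if os.environ.get('READTHEDOCS') == 'True':
        # dummy librados/libcephfs/librbd cannot be built on RTD, so
        # skip the Python-binding API docs entirely
        exclude_patterns += ['rados/api/*',
                             'cephfs/api/*',
                             'rbd/api/*']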
Greg Farnum [Wed, 12 Aug 2020 23:44:11 +0000 (23:44 +0000)]
mon: mark pgtemp messages as no_reply more consistently in preprocess_pgtemp
If a message is forwarded, it's conceivable that the leader's and peon's
evaluations will disagree about whether the message is useful or not,
which could result
in the leader ignoring it and the peon having a dangling forwarded message.
Fix this by marking the op as no_reply whenever ignoring it.
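The pattern, sketched in Python-style pseudocode (the actual code is C++
in OSDMonitor::preprocess_pgtemp, and should_ignore() stands in for the
various ignore conditions):

    def preprocess_pgtemp(op):
        if should_ignore(op):   # stale epoch, redundant pgtemp, etc.
            # mark no_reply on *every* ignore path, so a peon that
            # forwarded this message is not left with a dangling
            # forwarded op waiting for the leader's reply
            mon.no_reply(op)
            return True         # handled; nothing to propose
        return False            # fall through to prepare_pgtemp()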
Ilya Dryomov [Sat, 29 Aug 2020 10:02:30 +0000 (12:02 +0200)]
msg/async/ProtocolV2: allow rxbuf/txbuf to get bigger in testing
We have a kernel client test case that constructs huge auth tickets
to exercise the three related code paths in the kernel. One of the
tickets is bigger than 1000000 bytes, as required for triggering the
third code path.
We haven't bumped into this assert earlier because the kernel client
is still on msgr v1. However, "rbd map" and "rbd unmap" commands
started connecting to the cluster in commit 96f05a7956b3 ("rbd: delay
determination of default pool name") and that happens via msgr v2.
Satoru Takeuchi [Fri, 22 May 2020 01:45:32 +0000 (01:45 +0000)]
ceph-volume: show correct rejected reason in inventory if device type is not acceptable
If the device type is not acceptable in `c-v inventory`, its rejected
reason becomes "Insufficient space (<5GB)" by mistake. That's because
sys_api is empty, due to skipping devices that are neither `disk` nor
`device`. We should report that the target device is not acceptable in
this case.
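A sketch of the intended logic (sys_api and the reason strings follow
the commit message; the actual ceph-volume code differs in detail):

    def rejected_reasons(device):
        reasons = []
        # devices that are neither `disk` nor `device` never get their
        # sys_api populated, so report the real reason instead of
        # falling through to a size check against an empty sys_api
        if device.device_type not in ('disk', 'device'):
            reasons.append('Device type is not acceptable')
            return reasons
        if device.size < 5 * 1024 ** 3:   # bytes
            reasons.append('Insufficient space (<5GB)')
        return reasons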
osd: add and utilize OSD_FIXED_COLLECTION_LIST feature
If all OSDs in the up/acting set have this feature set,
the backend can use the new "fixed" collection_list method;
otherwise it falls back to the legacy method.
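Sketched as pseudocode (the real code is C++ in the objectstore layer;
the feature-bit value and the _fixed/_legacy method names are
illustrative):

    OSD_FIXED_COLLECTION_LIST = 1 << 0   # illustrative feature bit

    def collection_list(backend, upacting_features):
        # use the new method only when every osd in the up/acting set
        # advertises OSD_FIXED_COLLECTION_LIST; mixed versions would
        # otherwise disagree on listing semantics
        if upacting_features & OSD_FIXED_COLLECTION_LIST:
            return backend.collection_list_fixed()
        return backend.collection_list_legacy()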
Andrew Schoen [Fri, 4 Sep 2020 14:44:49 +0000 (09:44 -0500)]
ceph-volume: simple scan should ignore tmpfs
When simple scan is run against a ceph-volume
OSD, util.encryption.legacy_encrypted returns
tmpfs. We want to avoid creating a Device
object with tmpfs, and instead ignore the OSD,
as it's not a ceph-disk created OSD.
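A sketch of the guard (treating legacy_encrypted's result as the string
the commit message describes; the real simple-scan code path differs):

    from ceph_volume.util import encryption
    from ceph_volume.util.device import Device

    def scan_osd(osd_path):
        if encryption.legacy_encrypted(osd_path) == 'tmpfs':
            # a ceph-volume OSD mounts tmpfs; it is not a ceph-disk
            # created OSD, so skip it rather than build Device('tmpfs')
            return None
        return Device(osd_path)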
mon/OSDMonitor: only take *in* OSDs into consideration when trimming osdmaps
We should not take a down OSD into consideration when trimming osdmaps.
In e62269c892, we decrease the upper bound of the range of osdmaps to be
trimmed if the given osd is out, but we should decrease it only if the
osd in question is still *in*.
So, in this change, the min_lec is decreased only if the osd in question
is *in*.
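Sketched in pseudocode (min_lec, the minimum last-epoch-clean bound that
caps how far osdmaps may be trimmed, follows the commit message):

    def update_trim_bound(osdmap, osd, lec, min_lec):
        # only an osd that is still *in* may hold back osdmap trimming;
        # the last_epoch_clean of a down/out osd is irrelevant here
        if osdmap.is_in(osd):
            min_lec = min(min_lec, lec)
        return min_lec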
Adam Kupczyk [Wed, 1 Jul 2020 21:09:17 +0000 (23:09 +0200)]
os/bluestore: Add documentation for large bluefs log recovery
Adds an additional paragraph to the ceph-bluestore-tool documentation,
describing how to use the *special* options --bluefs_replay_recovery
and --bluefs_replay_recovery_disable_compact to recover a large
bluefs log.
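A hypothetical invocation of the kind the new paragraph documents (the
subcommand to pair the options with, and the OSD path, are illustrative):

    # replay an oversized bluefs log, skipping compaction during replay
    ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-0 \
        --bluefs_replay_recovery=true \
        --bluefs_replay_recovery_disable_compact=true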
Fixes: https://tracker.ceph.com/issues/46714
Signed-off-by: Adam Kupczyk <akupczyk@redhat.com>
Sébastien Han [Tue, 18 Aug 2020 13:41:31 +0000 (15:41 +0200)]
ceph-volume: retry when acquiring lock fails
When preparing the osd device with --mkfs, the ceph-osd binary tries to
acquire an exclusive lock on the device (soon to become an OSD).
Unfortunately, when running in containers, we have seen cases where
there is a race between ceph-osd and systemd-udevd to acquire a lock on
the device. Sometimes systemd-udevd gets the lock and releases it soon
after, but sometimes the lock is still held when ceph-osd tries to take
it, and because ceph-osd uses LOCK_NB the command fails.
This commit retries if the lock cannot be acquired, up to 5 times for 5
seconds; this should be more than enough to acquire the lock and
proceed with the OSD mkfs.
Unfortunately, the race is so transient that locking earlier from c-v
would not do anything.
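A Python sketch of the retry loop (the actual change wraps the ceph-osd
--mkfs call inside ceph-volume; the 5-try/5-second figures follow the
commit message):

    import subprocess
    import time

    def run_mkfs_with_retry(cmd, tries=5, interval=5):
        for attempt in range(1, tries + 1):
            # ceph-osd --mkfs takes the device lock with LOCK_NB and
            # fails immediately if systemd-udevd still holds the lock
            if subprocess.run(cmd).returncode == 0:
                return
            time.sleep(interval)  # give udev a chance to release it
        raise RuntimeError('ceph-osd --mkfs failed after %d tries' % tries)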
Patrick Donnelly [Thu, 27 Aug 2020 20:43:01 +0000 (13:43 -0700)]
Merge PR #36833 into nautilus
* refs/pull/36833/head:
nautilus: mgr/volumes: convert uid and gid to integer type
mgr/volumes: Address python breakage in python 2
mgr/volumes: Update doc/cephfs/fs-volumes.rst for nautilus
mgr/volumes: Prevent subvolume recreate if trash is not-empty
mgr/volumes: Disallow subvolume group level snapshots
mgr/volumes: Add test case to ensure subvolume is marked
mgr/volumes: handle idempotent subvolume marks
mgr/volumes: Tests amended and added to ensure subvolume trash functionality
mgr/volumes: Mark subvolume root with the vxattr ceph.dir.subvolume
mgr/volumes: Move incarnations for v2 subvolumes, to subvolume trash
mgr/volumes: maintain per subvolume trash directory
mgr/volumes: make subvolume_v2::_is_retained() object property
mgr/volumes: Use snapshot root directory attrs when creating clone root
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Venky Shankar [Fri, 21 Aug 2020 14:07:37 +0000 (10:07 -0400)]
mgr/volumes: maintain per subvolume trash directory
PR https://github.com/ceph/ceph/pull/36472 introduces changes
that disallow nested snapshots in a subtree (subvolume)
and renames across subvolumes. This affects asynchronous purge
in mgr/volumes, as subvolumes are moved to a trash directory for
asynchronous deletion by purge threads.
To work around this, start maintaining a subvolume-specific
trash directory. Use the trash directory as an index to the
subvolume-specific trash directory entries.
This changes the subvolume deletion logic, which currently relies
on the `--retain-snapshots` flag to decide whether the subvolume user
directory should get purged or the subvolume base directory
itself. Deleting a subvolume moves the user-facing directory
to its specific trash directory. Purge threads take care of
deleting user-facing directories (in trash) and the subvolume
base directory if required (when certain conditions are met).
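A sketch of the deletion flow described above (paths and helpers are
illustrative, not the actual mgr/volumes code):

    import os
    import uuid

    def delete_subvolume(base_dir):
        # per-subvolume trash lives inside the subvolume base directory
        trash = os.path.join(base_dir, '.trash')
        os.makedirs(trash, exist_ok=True)
        # the rename stays within the subvolume subtree, so it remains
        # legal even though cross-subvolume renames are now disallowed
        user_dir = os.path.join(base_dir, 'user')
        os.rename(user_dir, os.path.join(trash, str(uuid.uuid4())))
        # purge threads later empty the trash and remove base_dir itself
        # when no snapshots are retained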
mgr/volumes: Use snapshot root directory attrs when creating clone root
If a subvolume's mode or uid/gid values are changed after a snapshot,
and a clone of a snapshot prior to the change is initiated, the clone
inherits the current source subvolume's attributes, rather than the
snapshot's attributes.
Fix this by using the snapshot's subvolume root attributes to create
the clone subvolume's root.
The following attributes are picked from the source subvolume snapshot:
- uid, gid, mode, data pool, pool namespace, quota
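Sketched below (the attribute names come from the commit message; the
get_attr/set_attr helpers are illustrative):

    SNAP_ATTRS = ('uid', 'gid', 'mode', 'data_pool',
                  'pool_namespace', 'quota')

    def create_clone_root(snap_root, clone_root):
        # read the attributes as they were at snapshot time, not the
        # source subvolume's *current* values
        for name in SNAP_ATTRS:
            set_attr(clone_root, name, get_attr(snap_root, name))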
Patrick Donnelly [Wed, 26 Aug 2020 23:35:08 +0000 (16:35 -0700)]
Merge PR #36804 into nautilus
* refs/pull/36804/head:
qa/workunits/fs: add test for subvolume
mds: don't move inode with nlink > 1 to global snaprealm if it's in subvolume
mds: disallow hardlink across subvolume
mds: disallow across subvolume rename
mds: disallow creating snapshot on descendent directory of subvolume
mds: add vxattr that marks/clears subvolume flag
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>