Jason Dillaman [Tue, 22 Sep 2020 19:24:38 +0000 (15:24 -0400)]
librbd: list-snaps needs to writeout fancy-striped extents
If the delta is being generated for a fancy-striped extent such that
the provided object extents are not within the actual snapshot deltas,
we still need to indicate that the object exists (as a whiteout) to
prevent the possibility of an incorrect read request to the parent.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Jason Dillaman [Mon, 14 Sep 2020 18:26:51 +0000 (14:26 -0400)]
librbd: don't attempt to read from object snapshots known to not exist
If there is an issue with the list-snaps command that results in a forced
read of the full object, don't attempt to read from any snapshots that
are known to not exist. This can occur if cache-tiering does not have the
full snapshot context or if there is a bug in the list-snaps op
implementation.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
The state machine now lists the snaps for the full overlap image-extent
using the new list-snaps API. Additionally, read operations are handled
via the image-extent read API.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Jason Dillaman [Wed, 9 Sep 2020 04:05:21 +0000 (00:05 -0400)]
librbd: sparse bufferlist read should return image-based extent map
It's currently returning a buffer-based extent map which is fine under
the existing use-case for copy-up but it does not support more advanced
features that need to know the actual image-extents for any data.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Jason Dillaman [Thu, 3 Sep 2020 13:40:50 +0000 (09:40 -0400)]
librbd: optionally disable read-from-parent for object-extent IO requests
Deep-copy (and eventually crypto) will need a way to prevent read-from-parent
IO requests. This introduces an optional flag at the object-extent IO layer
to disable that functionality.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Jason Dillaman [Wed, 2 Sep 2020 23:10:57 +0000 (19:10 -0400)]
librbd: generic image-extent list snapshot request
Convert image extents to object-extents and issue list snapshot requests
against each object. Once all the results are available, assemble the
snapshot deltas back into image extents.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
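As a rough illustration of the conversion described above, the following Python sketch maps an image extent to object extents and then assembles per-object deltas back into image extents. It assumes the simple default layout (no fancy striping) and a 4 MiB object size. All function names are hypothetical and are not librbd APIs; the real implementation is C++ and also has to account for the striping parameters.

```python
from typing import Dict, List, Tuple

# Assumption: 4 MiB objects and no fancy striping.
OBJECT_SIZE = 4 * 1024 * 1024


def image_to_object_extents(offset: int, length: int,
                            object_size: int = OBJECT_SIZE
                            ) -> List[Tuple[int, int, int]]:
    """Split one image extent into (object_no, object_offset, length) pieces."""
    extents = []
    end = offset + length
    while offset < end:
        object_no = offset // object_size
        object_off = offset % object_size
        # Never cross an object boundary within a single piece.
        chunk = min(object_size - object_off, end - offset)
        extents.append((object_no, object_off, chunk))
        offset += chunk
    return extents


def assemble_image_deltas(object_deltas: Dict[int, List[Tuple[int, int]]],
                          object_size: int = OBJECT_SIZE
                          ) -> List[Tuple[int, int]]:
    """Map per-object (offset, length) deltas to merged image extents."""
    image_extents = sorted(
        (object_no * object_size + off, length)
        for object_no, extents in object_deltas.items()
        for off, length in extents
    )
    merged: List[Tuple[int, int]] = []
    for off, length in image_extents:
        if merged and merged[-1][0] + merged[-1][1] == off:
            # Adjacent to the previous extent: extend it.
            merged[-1] = (merged[-1][0], merged[-1][1] + length)
        else:
            merged.append((off, length))
    return merged
```

In this sketch, an image extent that straddles an object boundary yields one piece per object, and the deltas reported for those objects merge back into a single contiguous image extent.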
Rishabh Dave [Wed, 16 Sep 2020 10:59:24 +0000 (16:29 +0530)]
mon/MonCap: check profile_grants too while checking caps
When checking if a certain fs subcommand can and should be executed in
FSCommands.cc, check permissions in "profile_grants" too when the caps
for that entity contains a cap profile.
Fixes: https://tracker.ceph.com/issues/47423
Signed-off-by: Rishabh Dave <ridave@redhat.com>
Only flush requests coming from the refresh state machine or from the
exclusive-lock dispatch layer initialization should be ignored. This is
because both can be initiated from the refresh state machine and
would therefore deadlock.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Jason Dillaman [Thu, 17 Sep 2020 20:22:20 +0000 (16:22 -0400)]
librbd: reorder exclusive-lock pre-release state steps
The exclusive-lock dispatch layer should be locked and flushed to
ensure no IO is waiting for a refresh. Once that is complete, interlock
with the refresh state machine and re-flush one last time w/ the
refresh dispatch layer skipped.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Jason Dillaman [Thu, 17 Sep 2020 19:09:39 +0000 (15:09 -0400)]
librbd: avoid blocking writes when initializing exclusive-lock
The exclusive-lock dispatch layer will already block IOs as required
so this second layer of blocking just increases the complexity and
the potential for deadlocks when attempting to flush.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
ceph.spec.in, debian/control: add smartmontools and nvme-cli dependencies
These packages are needed in order to scrape device health metrics from
devices used by OSD and MON daemons.
smartmontools' smartctl is what we use in order to scrape devices' SMART
attributes and general health metrics.
In addition, we use the nvme-cli tool on NVMe devices, which fetches
vendor-specific NVMe-related health metrics.
Ceph relies on these tools for proper functioning of the underlying layers
of the devicehealth mgr module, and of other mgr modules which use
devicehealth functionality (such as diskprediction_local, telemetry,
dashboard).
Essentially, most devicehealth commands rely on proper functioning of
smartctl; without it they lack the device health metrics.
For example, in case smartctl is missing, the commands:
ceph device scrape-daemon-health-metrics <who>
ceph device scrape-health-metrics [<devid>]
will not be able to scrape health metrics, and the command:
ceph device predict-life-expectancy <devid>
will not provide any meaningful output (since there are no metrics).
In short, when we scrape a device by its daemon (be it an OSD or a MON):
ceph device scrape-daemon-health-metrics <who>
The devicehealth module command eventually invokes a
block_device_get_metrics() call in either osd/OSD.cc or mon/Monitor.cc,
which wraps calls to both
block_device_run_smartctl() (spawns smartctl)
block_device_run_vendor_nvme() (spawns nvme)
in common/blkdev.cc.
Minimum version requirements:
'smartmontools' is the package name, which contains two utility
programs: 'smartd' and 'smartctl'. Ceph uses the latter.
Version 6.7 of smartctl first introduced the --json option (beta), which
outputs the metrics in JSON format. A few adjustments were made since
then, and the feature was officially launched in smartctl
version 7.0.
Since we rely on the JSON format to process the metrics, we must have
smartmontools' smartctl version >= 7.
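A minimal sketch of how such a version check could look, in Python. It assumes that the first line of the `smartctl --version` output starts with `smartctl <major>.<minor>`. The helper names are hypothetical and are not part of Ceph or of the devicehealth module.

```python
import re
import subprocess
from typing import Optional, Tuple

# JSON output is officially supported starting with smartctl 7.0.
MIN_JSON_VERSION = (7, 0)


def parse_smartctl_version(output: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from `smartctl --version` output, if present."""
    match = re.search(r"^smartctl (\d+)\.(\d+)", output, re.MULTILINE)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def supports_json(version: Optional[Tuple[int, int]]) -> bool:
    """True when the version is known and at least MIN_JSON_VERSION."""
    return version is not None and version >= MIN_JSON_VERSION


def installed_smartctl_version() -> Optional[Tuple[int, int]]:
    """Query the locally installed smartctl; None if it is missing."""
    try:
        result = subprocess.run(["smartctl", "--version"],
                                capture_output=True, text=True, check=False)
    except FileNotFoundError:
        return None
    return parse_smartctl_version(result.stdout)
```

With these helpers, `supports_json(installed_smartctl_version())` evaluates to False both when smartctl is absent and when it is older than 7.0.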
That said, we choose not to specify smartmontools version here on
purpose, since there might be a scenario where:
1. We specified smartmontools version to be >= 7.
2. smartmontools 7 is not available yet in RHEL 8 / CentOS 8.
3. A user installs ceph-osd via rpm, for example.
4. smartmontools will not be installed (since version >= 7 is not available
   in this repo yet).
5. Then the user upgrades to 8.3 (which should have smartmontools >= 7),
   but smartmontools will not get upgraded (since it's not installed).
In the scenario where we do not specify a version, smartmontools 6.6
will be installed, but it will be upgraded to >= 7 when a user upgrades
(and if it's a fresh installation - version >= 7 would be installed
anyway).
nvme-cli does not have a minimum version.
We use 'Recommends' for both rpm and deb packages since we do not want
the installation to fail in case of conflicts. 'Recommends' is a weak
dependency: the package is installed when possible, but is skipped when
it conflicts with other dependencies.
It's worth mentioning that smartmontools and nvme-cli dependencies exist
in ceph-container builds.
We add them here for the cases of bare metal installations.
In the future we will add a separate package (with smartmontools and
nvme-cli dependencies) that can be installed on any node (running
rbd-mirror, rgw, mds, mgr, etc.), in order to be able to collect the
health metrics of its devices and offer their life expectancy
prediction.
test/librados: fix endian bugs in checksum test cases
We're seeing test failures when running rados/test.sh in Teuthology
on a big-endian platform (IBM Z). These are all related to calls
to the checksum operations, which expect little-endian inputs and
outputs, but are in many places called with native-endian types
from the test code.
One test case, LibRadosAio::RoundTrip3 in aio.cc, already uses
ceph_le types to address this problem, and this test actually
completes successfully on IBM Z. This patch changes the other
test case performing checksum operations accordingly.
With this patch in place, rados/test.sh now completes successfully.
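The underlying issue can be illustrated with a small Python sketch (the tests themselves are C++ and use the ceph_le types mentioned above). It contrasts a native-endian encoding of a 32-bit value with an explicit little-endian one. The helper names are hypothetical.

```python
import struct
import sys


def encode_le32(value: int) -> bytes:
    """Encode a 32-bit unsigned value as little-endian on any host."""
    return struct.pack("<I", value)


def decode_le32(buf: bytes) -> int:
    """Decode a little-endian 32-bit unsigned value on any host."""
    return struct.unpack("<I", buf)[0]


def encode_native32(value: int) -> bytes:
    """Encode using the host byte order (what the buggy tests relied on)."""
    return struct.pack("=I", value)


# On a little-endian host both encodings agree, which hides the bug;
# on a big-endian host such as IBM Z they differ.
same_on_this_host = encode_native32(0x01020304) == encode_le32(0x01020304)
```

Here `same_on_this_host` is True only on little-endian machines, which is why such tests can pass there while failing on big-endian platforms.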
Patrick Donnelly [Wed, 16 Sep 2020 19:28:55 +0000 (12:28 -0700)]
mon: allow overriding the initial mon_host
This overrides what the CephContext believes to be the current quorum of
monitors (retrieved from other instances of the MonClient), introduced
by [1]. Tests need to be able to target a specific monitor for
exercising forwarding and other things.