if an existing object is cached with an object version, but it's
mutated without updating that version number, clear the OBJV flag so
that later cache reads asking for an object version result in a miss and
re-read the version from the osd
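a minimal Python sketch of the idea, not the rgw code. the class names, the flag values and the put()/get() methods are hypothetical stand-ins for the system object cache; only the OBJV flag name comes from the message above.

```python
from dataclasses import dataclass, field

FLAG_DATA = 0x1  # hypothetical flag: cached data is valid
FLAG_OBJV = 0x2  # hypothetical flag: cached object version is valid


@dataclass
class CacheEntry:
    data: bytes = b""
    version: int = 0
    flags: int = 0


@dataclass
class ObjectCache:
    entries: dict = field(default_factory=dict)

    def put(self, name, data, version=None):
        """Cache new data; keep OBJV only if the caller supplied a version."""
        entry = self.entries.setdefault(name, CacheEntry())
        entry.data = data
        entry.flags |= FLAG_DATA
        if version is not None:
            entry.version = version
            entry.flags |= FLAG_OBJV
        else:
            # mutated without a known version: the cached version is stale
            entry.flags &= ~FLAG_OBJV

    def get(self, name, want_flags):
        """Return the entry only if every requested flag is set, else None."""
        entry = self.entries.get(name)
        if entry is None or (entry.flags & want_flags) != want_flags:
            return None  # cache miss, the caller has to go to the osd
        return entry
```

a read that asks for FLAG_OBJV after an unversioned put() misses, while a read that only asks for FLAG_DATA still hits.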
Casey Bodley [Thu, 6 Aug 2020 16:57:13 +0000 (12:57 -0400)]
rgw: system object cache tracks version over increments
instead of checking write_version before the write (which doesn't take
cls_version_inc() into account), check read_version after apply_write()
has been called. only cache the result if we got a read_version != 0
Casey Bodley [Tue, 4 Aug 2020 19:03:35 +0000 (15:03 -0400)]
rgw: RGWObjVersionTracker tracks read version over increments
when no write_version is given, cls_version_inc() is used to increment
the version so other writers can use cls_version_check() to detect races
however, apply_write() will clear its cached read_version, which means
that later writes can no longer use cls_version_check() to detect other
racing writers
in cases where cls_version_inc() is used AND we know the previous version,
we can increment the cached read_version and preserve the ability to use
cls_version_check(). we know the previous version if we provided a valid
read_version to cls_version_check() and it succeeded
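the bookkeeping can be sketched in Python as follows. this is a toy model under stated assumptions, not RGWObjVersionTracker itself: StoredObject, VersionTracker and VersionConflict are hypothetical names, the comparison stands in for cls_version_check() and the increment for cls_version_inc().

```python
class VersionConflict(Exception):
    """Raised when the stored version differs from the one we read."""


class StoredObject:
    """Hypothetical stand-in for an object whose version lives on the osd."""

    def __init__(self):
        self.version = 1
        self.data = None


class VersionTracker:
    def __init__(self):
        self.read_version = 0  # 0 means "unknown"

    def read(self, obj):
        self.read_version = obj.version
        return obj.data

    def write(self, obj, data):
        checked = self.read_version != 0
        # models cls_version_check(): only possible with a known version
        if checked and obj.version != self.read_version:
            raise VersionConflict("object changed since it was read")
        obj.data = data
        obj.version += 1  # models cls_version_inc()
        if checked:
            # the check succeeded, so the previous version is known and
            # the cached read_version can follow the increment
            self.read_version += 1
        else:
            self.read_version = 0
```

with this, a writer that read the object once can keep writing with race detection, and a second writer holding an older read_version gets a conflict.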
Matthew Oliver [Thu, 9 Jul 2020 06:13:05 +0000 (06:13 +0000)]
rgw: Swift API anonymous access should 401
There was a previous patch to fix this, but it turns out that it only fixed
the Swift V1 auth. It actually broke keystone because it didn't
take into account the idiosyncrasies of multi-tenancy, which resulted in
incorrect behaviour for keystone. Worse, because it didn't take
tenants properly into account, keystone ACLs were broken.
This patch reworks and simplifies the original patch to work for both
auths. It even extends the ThirdPartyAccountApplier to check for an ANON
user and properly scope it to a tenant.
Fixes: https://tracker.ceph.com/issues/46295
Signed-off-by: Matthew Oliver <moliver@suse.com>
(cherry picked from commit 67081098dc2dddd80d52d5acd166e68954cae618)
Conflicts:
src/rgw/rgw_swift_auth.h
- only need to modify the user related code to rgw_user construct
rbd: make common options override krbd-specific options
ceph-csi has added support for passing custom map and unmap options via
mapOptions and unmapOptions storage class parameters. However, it also
uses --read-only for implementing ROX (ReadOnlyMany) PVs. If the user
supplies "mapOptions: rw", they will get around the intended read-only
restriction (at least on the block device).
ceph-csi could be patched to use "-o ro", but it actually makes sense
for common options to win over device type-specific equivalents.
Conflicts:
src/tools/rbd/action/Kernel.cc [ snapshot quiesce support and
commit 34f539d8af33 ("rbd: delay parsing of default kernel map
options") not in nautilus ]
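the precedence rule for the rbd map options above can be sketched in Python. this is only an illustration, assuming a comma-separated option string; effective_map_options() and its parameters are hypothetical and are not part of the rbd CLI code.

```python
def effective_map_options(krbd_options, read_only=False):
    """Merge krbd-specific options with the common read-only flag.

    krbd_options is a comma-separated string such as "rw,noshare".
    Option names are treated as opaque strings in this sketch.
    """
    opts = [o for o in krbd_options.split(",") if o]
    if read_only:
        # the common option wins: drop any conflicting device-specific
        # setting before adding "ro"
        opts = [o for o in opts if o not in ("rw", "ro")]
        opts.append("ro")
    return ",".join(opts)
```

here a user-supplied "rw" can no longer get around a read-only mapping, while other device-specific options pass through untouched.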
* extract get_ragweed_branch() out of download() task, for better
readability.
* use a loop for retry when the first clone fails
* drop the `raise ValueError()` clause as it never happens. we could use
an assert() here, but i don't think it is necessary anyway.
* use sh() instead of run() for better readability.
* always set ragweed_repo. before this change this variable is
unbound if `force-branch` is set.
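the retry loop mentioned in the list above could look roughly like this. it is a generic sketch, not the ragweed task: clone_with_retry() and the injected clone callable are hypothetical, and the real task may retry with different arguments.

```python
import time


def clone_with_retry(clone, repo, branch, attempts=2, delay=0.0):
    """Call clone(repo, branch) up to `attempts` times.

    Returns the first successful result and re-raises the last error
    if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return clone(repo, branch)
        except Exception as e:  # a real task would catch a narrower error
            last_error = e
            if attempt + 1 < attempts:
                time.sleep(delay)
    raise last_error
```

passing the clone function in keeps the loop independent of how the clone is actually run.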
Yaarit Hatuka [Thu, 27 Aug 2020 03:04:34 +0000 (23:04 -0400)]
mgr/telemetry: fix device id splitting when anonymizing serial
Anonymizing the serial number in the device id string fails in rare
cases where 'vendor' and 'model' are missing from the device id
string. Ideally, device id is generated (in blkdev.cc) as
'vendor_model_serial', in case all fields were successfully retrieved
from the device. In cases where they were not, device id can also be
generated as 'model_serial' or 'serial'. Splitting by '_' fails in the
latter case (since 'serial' is the only element in the string).
In order to anonymize serial numbers in smartctl reports we now rely
on the serial number value as retrieved from the raw smartctl report
itself (as opposed to the one in device id). That's in order to avoid
possible inconsistencies between the serial retrieved from device id and
the one in the report.
In master we use Python 3's f-string formatting to create 'anon_devid':
anon_devid = f"{devid.rsplit('_', 1)[0]}_{uuid.uuid1()}"
The conflict happened since Nautilus still uses Python 2, and 'anon_devid'
is created via string concatenation:
anon_devid = devid[:devid.rfind('_')] + '_' + str(uuid.uuid1())
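A small sketch of replacing the serial in a device id while coping with ids that have no '_' at all. This is an illustration only, not the telemetry module's code: anonymize_devid() and its new_serial parameter are hypothetical, and it assumes the serial itself contains no '_'.

```python
import uuid


def anonymize_devid(devid, new_serial=None):
    """Replace the trailing serial in a device id with a random identifier.

    Handles 'vendor_model_serial', 'model_serial' and a bare 'serial'.
    """
    anon = new_serial if new_serial is not None else str(uuid.uuid1())
    # rpartition returns ('', '', devid) when there is no '_', so a bare
    # serial is replaced as a whole. By contrast, str.rfind returns -1 in
    # that case, and devid[:-1] would only drop the last character.
    prefix, sep, _serial = devid.rpartition('_')
    return prefix + sep + anon
```

The optional new_serial argument exists only to make the function easy to check with a fixed value.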
mgr/dashboard: Fix many-to-many issue in host-details dashboard
The labels on one side do not match the labels of the other side, where
a label_replace is used. The fix uses the same label_replace on the
missing side.
Fixes: https://tracker.ceph.com/issues/47334
Signed-off-by: Patrick Seidensal <pseidensal@suse.com>
(cherry picked from commit fe64b9d1763ec9dbe78fe73c403929524ab4e253)
ceph.spec.in, debian/control: add smartmontools and nvme-cli dependencies
These packages are needed in order to scrape device health metrics from
devices used by OSD and MON daemons.
smartmontools' smartctl is what we use in order to scrape devices' SMART
attributes and general health metrics.
In addition, we use nvme-cli tool on NVMe devices, which fetches
vendor specific NVMe related health metrics.
Ceph relies on these tools for proper functioning of the underlying layers
of devicehealth mgr module, and other mgr modules which use devicehealth
functionality (such as diskprediction_local, telemetry, dashboard).
Essentially, most of devicehealth commands rely on proper functioning of
smartctl, otherwise they lack the device health metrics.
For example, in case smartctl is missing, the commands:
ceph device scrape-daemon-health-metrics <who>
ceph device scrape-health-metrics [<devid>]
will not be able to scrape health metrics, and the command:
ceph device predict-life-expectancy <devid>
will not provide any meaningful output (since there are no metrics).
In short, when we scrape a device by its daemon (be it an OSD or a MON):
ceph device scrape-daemon-health-metrics <who>
The devicehealth module command eventually invokes a
block_device_get_metrics() call in either osd/OSD.cc or mon/Monitor.cc,
which wraps calls to both
block_device_run_smartctl() (spawns smartctl)
block_device_run_vendor_nvme() (spawns nvme)
in common/blkdev.cc.
Minimum version requirements:
'smartmontools' is the package name, which contains two utility
programs: 'smartd' and 'smartctl'. Ceph uses the latter.
Version 6.7 of smartctl first introduced the --json option (beta), which
outputs the metrics in JSON format. Since then a few
adjustments were made and the feature officially launched in smartctl
version 7.0.
Since we rely on the JSON format to process the metrics, we must have
smartmontools' smartctl version >= 7.
That said, we choose not to specify smartmontools version here on
purpose, since there might be a scenario where:
1. We specified smartmontools version to be >= 7.
2. smartmontools 7 is not available yet in rhel 8 / centos 8.
3. A user installs via rpm ceph-osd, for example.
4. smartmontools will not be installed (since version >= 7 is not available
in this repo yet).
5. Then the user upgrades to 8.3 (which should have smartmontools >= 7),
but smartmontools will not get upgraded (since it's not installed).
In the scenario where we do not specify a version, smartmontools 6.6
will be installed, but it will be upgraded to >= 7 when a user upgrades
(and if it's a fresh installation - version >= 7 would be installed
anyway).
nvme-cli does not have a minimum version.
We use 'Recommends' for both rpm and deb packages since we do not want
the installation to fail in case of conflicts. 'Recommends' is a weak
dependency: the package is installed when possible, but it is skipped if it
conflicts with other dependencies.
It's worth mentioning that smartmontools and nvme-cli dependencies exist
in ceph-container builds.
We add them here for the cases of bare metal installations.
In the future we will add a separate package (with smartmontools and
nvme-cli dependencies) that can be installed on any node (running
rbd-mirror, rgw, mds, mgr, etc.), in order to be able to collect the
health metrics of its devices and offer their life expectancy
prediction.
Had to remove the line:
Requires: python%{python3_pkgversion}-ceph-common = %{_epoch_prefix}%{version}-%{release}
which slipped in between
Requires: libstoragemgmt
and
%if 0%{?weak_deps}
Also, removed from the cherry-picked commit the dependencies for mon package
from both ceph.spec.in and debian/control.
That's because in Nautilus we do not scrape the health metrics of mon devices
(please see commit d592e56e74d94c6a05b9240fcb0031868acefbab).
`ceph-volume simple activate --all` relies on the presence of json files
in `/etc/ceph/osd` that were created with the `ceph-volume simple scan`
command.
In a cluster lifecycle, it is very likely that an OSD which was deployed with
ceph-disk gets removed or replaced at some point. The corresponding
json file in `/etc/ceph/osd` then becomes irrelevant, and `ceph-volume
simple activate --all` fails because it tries to mount nonexistent
partitions.
The idea here is to simply warn the user that the osd described in the
json file doesn't exist anymore and exit properly instead of throwing an
error.
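A rough Python sketch of the "warn and skip" behaviour, not the ceph-volume implementation. find_activatable() is a hypothetical helper, and the assumption that each json file stores the data partition under data.path is made for illustration only.

```python
import glob
import json
import logging
import os

logger = logging.getLogger(__name__)


def find_activatable(json_dir, device_exists=os.path.exists):
    """Return the json files whose data device still exists.

    Files that describe a device that is gone are reported with a
    warning and skipped instead of raising an error.
    """
    usable = []
    for path in sorted(glob.glob(os.path.join(json_dir, "*.json"))):
        with open(path) as f:
            meta = json.load(f)
        # assumed layout: {"data": {"path": "/dev/..."}}
        device = meta.get("data", {}).get("path")
        if not device or not device_exists(device):
            logger.warning("skipping %s: device %s does not exist anymore",
                           path, device)
            continue
        usable.append(path)
    return usable
```

The device_exists argument defaults to os.path.exists and can be swapped for a stub when no real block devices are around.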
Patrick Donnelly [Wed, 16 Sep 2020 19:28:55 +0000 (12:28 -0700)]
mon: allow overriding the initial mon_host
This overrides what the CephContext believes to be the current quorum of
monitors (retrieved from other instances of the MonClient), introduced
by [1]. Tests need to be able to target a specific monitor for
exercising forwarding and other things.
mon: store mon updates in ceph context for future MonMap instantiation
MonMap builds initial mon list using provided sources, like
mon-host or monmap.
For future instantiations of MonClient, if mon addresses are
updated, stale information from the provided sources is used.
This commit retains mon updates that are processed by the
MonClient in CephContext, for use in MonMap instantiations,
so that updated information is used as required.
This is helpful in cases where librados or libcephfs
instantiate MonClient in the ceph-mgr daemon as required.
Fixes: https://tracker.ceph.com/issues/46645
Signed-off-by: Shyamsundar Ranganathan <srangana@redhat.com>
(cherry picked from commit 7a1f02acfe6b5d8a760efd16bb594a0656b39eac)
Conflicts:
src/mon/MonMap.h
Retain crimson namespace as ceph::common
src/mon/MonMap.cc
Address merge conflict due to linespace removed
src/common/ceph_context.cc
Remove WITH_ALIEN latch
src/common/ceph_context.h
Remove WITH_ALIEN latch
Bryan Stillwell [Tue, 24 Mar 2020 21:15:41 +0000 (15:15 -0600)]
compressor: Set the Zstd default compression level to 1
The default compression level of 5 for Zstandard is too high for the majority
of use cases since it requires too many CPU cycles. This patch switches the
default to 1.
Dan van der Ster [Mon, 14 Sep 2020 14:23:53 +0000 (16:23 +0200)]
ceph.in: ignore failures to flush stdout
Catch an IOError exception when flushing ceph stdout.
Fixes: https://tracker.ceph.com/issues/47442
Signed-off-by: Dan van der Ster <daniel.vanderster@cern.ch>
(cherry picked from commit 48503413a28fbea32f8ef3d48cb765771216f165)
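The pattern of ignoring a failed stdout flush can be sketched as below. This is not the code from ceph.in: flush_ignoring_errors() is a hypothetical helper that only shows the shape of the try/except.

```python
import sys


def flush_ignoring_errors(stream=None):
    """Flush a stream, returning False instead of raising on IOError."""
    stream = sys.stdout if stream is None else stream
    try:
        stream.flush()
    except IOError:
        # e.g. the reader at the other end of a pipe has gone away
        return False
    return True
```

In Python 3, IOError is an alias of OSError, so this also covers BrokenPipeError.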
mgr/dashboard: Monitoring: Fix for the infinite loading bar action
Only seen in nautilus
Intended to fix the unusual behaviour in the All Alerts tab where the loading bar progresses continuously until one of the alerts is selected.
To reproduce:
Navigate to cluster -> Monitoring -> All Alerts tab. You can see the progress bar at the bottom of the table.
Fixes: https://tracker.ceph.com/issues/47435
Signed-off-by: Nizamudeen A <nia@redhat.com>
Jason Dillaman [Wed, 5 Aug 2020 16:36:26 +0000 (12:36 -0400)]
test/rbd-mirror: pool watcher registration error might result in race
The init finish context should be swapped out before it attempts to
re-register the watcher. This affects the test case which mocks the
timer to fire immediately instead of after 30 seconds.
Fixes: https://tracker.ceph.com/issues/46669
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit c89d31ebf6c412d609123979c63ebc600b70e179)
Conflicts:
src/tools/rbd_mirror/PoolWatcher.cc
- nautilus uses Mutex::Locker where master has std::lock_guard
RTD does not support installing system packages; the only ways to install
dependencies are setuptools and pip. ditaa, however, is a tool written in
Java, so we need to find a native python tool that allows us to render ditaa
images. plantweb is able to use a web service for rendering the ditaa
diagram, so let's use it as a fallback if "ditaa" is not around.
also start a new line after the directive, otherwise the plantweb server will
return 500 on seeing the diagram.
Conflicts:
doc/cephfs/cephfs-io-path.rst
doc/dev/deduplication.rst
doc/install/ceph-deploy/quick-cephfs.rst
doc/radosgw/vault.rst
doc/rbd/rbd-kubernetes.rst
doc/rbd/rbd-persistent-cache.rst: these files do not exist in
nautilus, so drop related changes
doc/conf.py: exclude pybindings docs from build for RTD
because it'd be difficult to prepare (dummy) librados, libcephfs and librbd for
their python bindings in the building environment offered by Read the Docs.
Jason Dillaman [Wed, 5 Aug 2020 13:12:41 +0000 (09:12 -0400)]
librbd: migration abort should revert data back to the original image
If the migration destination image was modified and then the migration
was aborted, we need to copy the data back to the source image to avoid
losing data. For simplicity we will only revert the HEAD revision state
and will not attempt to copy new snapshots on the destination image
back to the source.
Fixes: https://tracker.ceph.com/issues/41394
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit 5bd15da8be09a4e7644d411a0b0c132e5b795393)
We want to prevent the destination image from being used while an
abort is in-progress. Test that the image has no watchers prior to
permitting the abort, switch the migration state to ABORTING, and
treat the image as read-only if the migration state is ABORTING.
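The abort guard described above can be modelled with a small state machine. This is a sketch under stated assumptions, not librbd code: DestinationImage, ImageBusy, begin_abort() and the EXECUTING state are hypothetical; only the ABORTING state and the three rules come from the message.

```python
from enum import Enum


class MigrationState(Enum):
    EXECUTING = "executing"  # hypothetical name for the non-aborting state
    ABORTING = "aborting"


class ImageBusy(Exception):
    """Raised when the image still has watchers."""


class DestinationImage:
    def __init__(self):
        self.state = MigrationState.EXECUTING
        self.watchers = 0

    @property
    def read_only(self):
        # the image is treated as read-only while an abort is in progress
        return self.state is MigrationState.ABORTING

    def begin_abort(self):
        # refuse to abort while the image is in use
        if self.watchers > 0:
            raise ImageBusy("image has watchers, cannot abort migration")
        self.state = MigrationState.ABORTING
```

Checking the watchers before switching state mirrors the order given in the message: test first, then mark ABORTING, then rely on the read-only view.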
Jason Dillaman [Wed, 5 Feb 2020 20:27:39 +0000 (15:27 -0500)]
librbd: ensure deep-copy snapshot map includes all destination snap ids
When deep-copying from an arbitrary start snapshot id, the snap sequence
will be missing all older snapshots. Additionally, snapshot types that
are not deep-copied still need to be included in the destination snap
map.
Jason Dillaman [Wed, 5 Feb 2020 19:23:53 +0000 (14:23 -0500)]
librbd: deep-copy snapshots from a specified start/end position
Allow the snapshots to be arbitrarily copied from any source image
start/end snapshot ids. If the end snapshot is not a user-snapshot,
it will be associated with the destination image HEAD revision.
Conflicts:
src/librbd/deep_copy/SnapshotCopyRequest.cc: different lock types
src/test/librbd/deep_copy/test_mock_SnapshotCopyRequest.cc: no mirror snapshot namespaces
Jason Dillaman [Wed, 5 Feb 2020 15:42:27 +0000 (10:42 -0500)]
librbd: deep-copy should accept a lower-bound for the destination snap_id
For snapshot-based mirroring, we will want to prevent the modification of
snapshots below the last sync snapshot and to prevent the copying of data
below that lower-bound as well. This commit just adds the new parameter and
future commits will update the snapshot and object copy behavior.
Greg Farnum [Wed, 12 Aug 2020 23:44:11 +0000 (23:44 +0000)]
mon: mark pgtemp messages as no_reply more consistently in preprocess_pgtemp
If a message is forwarded, it's conceivable the leader's and peon's evaluation
will disagree about whether the message is useful or not, which could result
in the leader ignoring it and the peon having a dangling forwarded message.
Fix this by marking the op as no_reply whenever ignoring it.
J. Eric Ivancich [Tue, 15 Sep 2020 18:20:04 +0000 (14:20 -0400)]
rgw: advance pseudo-folders properly in delimited ordered listing
The code mistakenly uses the current marker to figure out how to skip
past a pseudo-directory. This could allow for some entries in a bucket
to be skipped. The code should have used the current pseudo-directory
to determine what to skip past.
Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
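The skipping rule for the delimited ordered listing can be illustrated with a toy listing over a sorted list of keys. This is a sketch, not the rgw bucket listing code: list_delimited() is hypothetical and assumes a single-character delimiter and an in-memory sorted key list.

```python
import bisect


def list_delimited(keys, prefix="", delimiter="/", marker=""):
    """List keys after `marker`, collapsing pseudo-directories.

    Returns (entries, common_prefixes). `keys` must be sorted.
    """
    entries, common_prefixes = [], []
    i = bisect.bisect_right(keys, marker)
    while i < len(keys):
        key = keys[i]
        if not key.startswith(prefix):
            i += 1
            continue
        d = key.find(delimiter, len(prefix))
        if d < 0:
            entries.append(key)
            i += 1
            continue
        pseudo_dir = key[:d + 1]
        common_prefixes.append(pseudo_dir)
        # derive the skip target from the pseudo-directory itself, not
        # from the marker: the smallest string that sorts after every
        # key starting with pseudo_dir
        skip_to = pseudo_dir[:-1] + chr(ord(delimiter) + 1)
        i = bisect.bisect_left(keys, skip_to, i)
    return entries, common_prefixes
```

Because the skip target depends only on the pseudo-directory, resuming from a marker inside a pseudo-directory does not cause entries after it to be missed.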
While this did fix https://tracker.ceph.com/issues/40905, it did so in
an unnecessarily complex manner. So we're reverting it to more easily
apply a cleaner solution.
Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
Ilya Dryomov [Sat, 29 Aug 2020 10:02:30 +0000 (12:02 +0200)]
msg/async/ProtocolV2: allow rxbuf/txbuf get bigger in testing
We have a kernel client test case that constructs huge auth tickets
to exercise the three related code paths in the kernel. One of the
tickets is bigger than 1000000 bytes, as required for triggering the
third code path.
We haven't bumped into this assert earlier because the kernel client
is still on msgr v1. However, "rbd map" and "rbd unmap" commands
started connecting to the cluster in commit 96f05a7956b3 ("rbd: delay
determination of default pool name") and that happens via msgr v2.