Sebastian Krah [Tue, 19 Nov 2019 10:41:49 +0000 (11:41 +0100)]
mgr/dashboard: Don't use any xlf file when building the default language
The build-i18n script no longer uses an xlf file when building the default
language. This means that we don't need to keep messages.en-US.xlf in the repository anymore.
Fixes: https://tracker.ceph.com/issues/42693
Signed-off-by: Sebastian Krah <skrah@suse.com>
(cherry picked from commit 5efe0a3ab25e26421001033bd7ea36c86aca2b02)
Conflicts:
src/pybind/mgr/dashboard/frontend/src/locale/messages.en-US.xlf
- Deleted this file
Casey Bodley [Mon, 31 Aug 2020 15:19:34 +0000 (11:19 -0400)]
radosgw-admin: period pull command is not always a raw_storage_op
if a --url is given, 'period pull' does not depend on any zone/period
configuration and can be a raw_storage_op. if we get a --remote instead,
we do need to initialize the zone/period configuration to find the
correct endpoint/access keys
Aleksei Gutikov [Mon, 30 Mar 2020 12:27:45 +0000 (15:27 +0300)]
mgr: decrease pool stats if pg was removed
After a merge of placement groups, the resulting pg contains
its own objects and those of the merged pg.
PGMap::apply_incremental treats this growth as a pool stats delta,
but forgets to decrease the stats for the removed pg.
Fixes: https://tracker.ceph.com/issues/44815
Signed-off-by: Aleksei Gutikov <aleksey.gutikov@synesis.ru>
(cherry picked from commit 6090acdae4495e11f117df2330b579744eeada2a)
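The accounting described above can be illustrated with a minimal sketch. It is written in Python rather than the C++ of PGMap, and the function name, arguments and dictionary layout are assumptions made for illustration only, not the actual Ceph data structures.

```python
def apply_incremental(pg_stats, pool_objects, updates, removed):
    """Apply per-PG object-count changes to per-pool totals.

    pg_stats:     dict mapping (pool_id, pg_seed) -> object count
    pool_objects: dict mapping pool_id -> total object count
    updates:      dict of new per-PG object counts
    removed:      iterable of PG ids that no longer exist (e.g. merged away)
    """
    for pgid, new_count in updates.items():
        pool_id = pgid[0]
        old_count = pg_stats.get(pgid, 0)
        # Growth of the merge target is added as a delta to the pool total.
        pool_objects[pool_id] = pool_objects.get(pool_id, 0) + new_count - old_count
        pg_stats[pgid] = new_count

    for pgid in removed:
        old_count = pg_stats.pop(pgid, None)
        if old_count is not None:
            # Without this step the merged objects are counted twice.
            pool_objects[pgid[0]] -= old_count
    return pool_objects
```

With two PGs holding 10 and 5 objects, merging the second into the first (update to 15, removal of the second) leaves the pool total at 15. Skipping the removal step would report 20.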
This will crash the choose_acting() procedure, as it will mistakenly
conclude that peer 3 should continue to perform asynchronous recovery
(e.g., due to num_objects_missing = 1) rather than being considered
fully backfill-recovered.
While I did not dig into the real cause, there are a couple of
possible explanations of how num_objects can be off. I think that
if a roll forward or log replay could delete something twice, maybe
there would be an undercount. Or maybe something as simple as a
corruption.
Since _update_calc_stats() is going to fix num_objects_missing
for that peer anyway, let's make sure it always starts with a
clean state.
Casey Bodley [Fri, 29 May 2020 16:31:16 +0000 (12:31 -0400)]
rgw: fix shutdown crash in RGWAsyncReadMDLogEntries
RGWAsyncReadMDLogEntries must not store pointers into coroutine memory,
because it's not guaranteed to outlive our call. store these by-value
instead, and have RGWReadMDLogEntriesCR::request_complete() copy/move
them back on completion
When rgw_bucket_unlink_instance removes the last instance of a name, it
also clears the value of rgw_bucket_olh_entry.key. However, bucket index
resharding uses this key when choosing its shard placement, so an empty
key causes all of these olh entries to be misplaced in shard 0. After
reshard, all of the olh recovery/cleanup logic would be sent to the
correct shard, and these misplaced olh entries would never be cleaned
up.
Preserving the key's name on last unlink allows the olh entry to be
resharded correctly and cleaned up normally.
Soumya Koduri [Tue, 16 Jun 2020 12:40:08 +0000 (18:10 +0530)]
rgw: Empty reqs_change_state queue before unregistered_reqs
In RGWHTTPManager::manage_pending_request(), before unregistering
or unlinking the http requests, empty the reqs_change_state list
to avoid use after free.
if an existing object is cached with an object version, but it's
mutated without updating that version number, clear the OBJV flag so
that later cache reads asking for an object version result in a miss and
re-read the version from the osd
Casey Bodley [Thu, 6 Aug 2020 16:57:13 +0000 (12:57 -0400)]
rgw: system object cache tracks version over increments
instead of checking write_version before the write (which doesn't take
cls_version_inc() into account), check read_version after apply_write()
has been called. only cache the result if we got a read_version != 0
Casey Bodley [Tue, 4 Aug 2020 19:03:35 +0000 (15:03 -0400)]
rgw: RGWObjVersionTracker tracks read version over increments
when no write_version is given, cls_version_inc() is used to increment
the version so other writers can use cls_version_check() to detect races
however, apply_write() will clear its cached read_version, which means
that later writes can no longer use cls_version_check() to detect other
racing writers
in cases where cls_version_inc() is used AND we know the previous version,
we can increment the cached read_version and preserve the ability to use
cls_version_check(). we know the previous version if we provided a valid
read_version to cls_version_check() and it succeeded
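A simplified model of this rule, as a sketch only: the class below is a hypothetical Python stand-in for the tracker, with plain integers in place of the real version objects, and 0 meaning "no version".

```python
class VersionTracker:
    """Toy model of a tracker that caches the last version it read."""

    def __init__(self):
        self.read_version = 0   # 0 means the previous version is unknown
        self.write_version = 0  # 0 means no explicit version was requested

    def apply_write(self):
        """Update the cached versions after a write has succeeded."""
        checked = self.read_version != 0       # a version check was sent and passed
        incremented = self.write_version == 0  # the write used an increment

        if checked and incremented:
            # The previous version is known, so the new one is previous + 1
            # and later writes can still be checked against it.
            self.read_version += 1
        else:
            self.read_version = self.write_version
        self.write_version = 0
```

A tracker that read version 5 and then wrote without an explicit write_version ends up with read_version 6, so the next write can still detect a racing writer.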
Matthew Oliver [Thu, 9 Jul 2020 06:13:05 +0000 (06:13 +0000)]
rgw: Swift API anonymous access should 401
There was a previous patch to fix this, but it turns out that it only fixed
the problem for Swift V1 auth. It actually broke keystone because it didn't
take into account the idiosyncrasies of multi-tenancy, which resulted in
incorrect behaviour for keystone. Worse, because it didn't take
tenants properly into account, keystone ACLs were broken.
This patch reworks and simplifies the original patch to work for both
auths. It also extends the ThirdPartyAccountApplier to check for an ANON
user and properly scope it to a tenant.
Fixes: https://tracker.ceph.com/issues/46295
Signed-off-by: Matthew Oliver <moliver@suse.com>
(cherry picked from commit 67081098dc2dddd80d52d5acd166e68954cae618)
Conflicts:
src/rgw/rgw_swift_auth.h
- only need to modify the user related code to rgw_user construct
rbd: make common options override krbd-specific options
ceph-csi has added support for passing custom map and unmap options via
mapOptions and unmapOptions storage class parameters. However, it also
uses --read-only for implementing ROX (ReadOnlyMany) PVs. If the user
supplies "mapOptions: rw", they will get around the intended read-only
restriction (at least on the block device).
ceph-csi could be patched to use "-o ro", but it actually makes sense
for common options to win over device type-specific equivalents.
Conflicts:
src/tools/rbd/action/Kernel.cc [ snapshot quiesce support and
commit 34f539d8af33 ("rbd: delay parsing of default kernel map
options") not in nautilus ]
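The precedence rule for the rbd map options above can be sketched as follows. This is an illustration in Python, not the logic in Kernel.cc; the function name and its arguments are hypothetical, and only the ro/rw conflict is modelled.

```python
def effective_map_options(device_options, read_only=False):
    """Return a krbd-style option string with the common flag taking priority.

    device_options: comma-separated option string, e.g. "rw,noshare"
    read_only:      True when the common --read-only flag was given
    """
    opts = [o for o in device_options.split(",") if o]
    if read_only:
        # The common flag wins: drop any conflicting ro/rw and force "ro".
        opts = [o for o in opts if o not in ("ro", "rw")]
        opts.append("ro")
    return ",".join(opts)
```

Under this rule, a user-supplied "rw" combined with --read-only still yields a read-only mapping.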
* extract get_ragweed_branch() out of download() task, for better
readability.
* use a loop for retry when the first clone fails
* drop the `raise ValueError()` clause as it never happens. we could use
an assert() here, but I don't think it is necessary anyway.
* use sh() instead of run() for better readability.
* always set ragweed_repo. before this change this variable is
unbound if `force-branch` is set.
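The retry loop mentioned in the list above could look roughly like this. It is a generic sketch, not the code of the task itself: `clone_with_retry` and the `clone` callable are hypothetical names, and the attempt count and delay are arbitrary.

```python
import time


def clone_with_retry(clone, attempts=3, delay=0.0):
    """Call clone() until it succeeds or the attempts run out.

    clone is a hypothetical zero-argument callable that raises on failure.
    """
    for attempt in range(1, attempts + 1):
        try:
            return clone()
        except Exception:
            if attempt == attempts:
                # Out of attempts: let the last failure propagate.
                raise
            time.sleep(delay)
```

A loop keeps the first attempt and the retries on the same code path, so there is no separate fallback branch to maintain.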
Yaarit Hatuka [Thu, 27 Aug 2020 03:04:34 +0000 (23:04 -0400)]
mgr/telemetry: fix device id splitting when anonymizing serial
Anonymizing the serial number in the device id string fails in rare
cases where 'vendor' and 'model' are missing from the device id
string. Ideally, device id is generated (in blkdev.cc) as
'vendor_model_serial', in case all fields were successfully retrieved
from the device. In cases where they were not, device id can also be
generated as 'model_serial' or 'serial'. Splitting by '_' fails in the
latter case (since 'serial' is the only element in the string).
In order to anonymize serial numbers in smartctl reports we now rely
on the serial number value as retrieved from the raw smartctl report
itself (as opposed to the one in device id). That's in order to avoid
possible inconsistencies between the serial retrieved from device id and
the one in the report.
In master we use Python 3's f-string formatting to create 'anon_devid':
anon_devid = f"{devid.rsplit('_', 1)[0]}_{uuid.uuid1()}"
The conflict happened since Nautilus still uses Python 2, and 'anon_devid'
is created via string concatenation:
anon_devid = devid[:devid.rfind('_')] + '_' + str(uuid.uuid1())
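For illustration, one way to split a device id that tolerates all three shapes ('vendor_model_serial', 'model_serial' and bare 'serial') is str.rpartition, which is available in both Python 2 and Python 3. This is a sketch under that assumption and not the code of this commit; `anonymize_devid` is a hypothetical helper name.

```python
import uuid


def anonymize_devid(devid):
    """Replace the serial (the last '_'-separated field) with a fresh UUID."""
    prefix, sep, _serial = devid.rpartition('_')
    # rpartition returns ('', '', devid) when there is no '_', so a
    # serial-only id yields just the UUID, with no leading underscore.
    return prefix + sep + str(uuid.uuid1())
```

By contrast, devid.rfind('_') returns -1 for a serial-only id, so slicing with devid[:-1] would keep all but the last character of the real serial.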
mgr/dashboard: Fix many-to-many issue in host-details dashboard
The labels on one side do not match the labels of the other side, where
a label_replace is used. The fix uses the same label_replace on the
missing side.
Fixes: https://tracker.ceph.com/issues/47334
Signed-off-by: Patrick Seidensal <pseidensal@suse.com>
(cherry picked from commit fe64b9d1763ec9dbe78fe73c403929524ab4e253)
ceph.spec.in, debian/control: add smartmontools and nvme-cli dependencies
These packages are needed in order to scrape device health metrics from
devices used by OSD and MON daemons.
smartmontools' smartctl is what we use in order to scrape devices' SMART
attributes and general health metrics.
In addition, we use nvme-cli tool on NVMe devices, which fetches
vendor specific NVMe related health metrics.
Ceph relies on these tools for proper functioning of the underlying layers
of devicehealth mgr module, and other mgr modules which use devicehealth
functionality (such as diskprediction_local, telemetry, dashboard).
Essentially, most of devicehealth commands rely on proper functioning of
smartctl, otherwise they lack the device health metrics.
For example, in case smartctl is missing, the commands:
ceph device scrape-daemon-health-metrics <who>
ceph device scrape-health-metrics [<devid>]
will not be able to scrape health metrics, and the command:
ceph device predict-life-expectancy <devid>
will not provide any meaningful output (since there are no metrics).
In short, when we scrape a device by its daemon (be it an OSD or a MON):
ceph device scrape-daemon-health-metrics <who>
The devicehealth module command eventually invokes a
block_device_get_metrics() call in either osd/OSD.cc or mon/Monitor.cc,
which wraps calls to both
block_device_run_smartctl() (spawns smartctl)
block_device_run_vendor_nvme() (spawns nvme)
in common/blkdev.cc.
Minimum version requirements:
'smartmontools' is the package name, which contains two utility
programs: 'smartd' and 'smartctl'. Ceph uses the latter.
Version 6.7 of smartctl first introduced the --json option (beta), which
outputs the metrics in JSON format. Since then a few
adjustments were made and the feature officially launched in smartctl
version 7.0.
Since we rely on the JSON format to process the metrics, we must have
smartmontools' smartctl version >= 7.
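As an illustration of the version requirement, a caller could check the banner printed by the tool before relying on JSON output. The sketch below is not part of this change: `supports_json` is a hypothetical helper, and the assumed banner shape ("smartctl <major>.<minor> ...") should be verified against the installed tool.

```python
import re


def supports_json(version_output, minimum=(7, 0)):
    """Return True if the banner reports a smartctl version >= minimum.

    version_output is the text printed by the tool when asked for its
    version; the "smartctl <major>.<minor>" layout is an assumption.
    """
    match = re.search(r"smartctl (\d+)\.(\d+)", version_output)
    if match is None:
        return False
    return (int(match.group(1)), int(match.group(2))) >= minimum
```

Comparing (major, minor) tuples avoids the pitfalls of comparing version strings lexically.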
That said, we choose not to specify smartmontools version here on
purpose, since there might be a scenario where:
- We specified smartmontools version to be >= 7.
- smartmontools 7 is not available yet in rhel 8 / centos 8.
- A user installs ceph-osd via rpm, for example.
- smartmontools will not be installed (since version >= 7 is not available
  in this repo yet).
- Then the user upgrades to 8.3 (which should have smartmontools >= 7),
  but smartmontools will not get upgraded (since it's not installed).
In the scenario where we do not specify a version, smartmontools 6.6
will be installed, but it will be upgraded to >= 7 when a user upgrades
(and if it's a fresh installation - version >= 7 would be installed
anyway).
nvme-cli does not have a minimum version.
We use 'Recommends' for both rpm and deb packages since we do not want
the installation to fail in case of conflicts. 'Recommends' is a weak
dependency: the package is installed when possible, but skipped when it
conflicts with other dependencies.
It's worth mentioning that smartmontools and nvme-cli dependencies exist
in ceph-container builds.
We add them here for the cases of bare metal installations.
In the future we will add a separate package (with smartmontools and
nvme-cli dependencies) that can be installed on any node (running
rbd-mirror, rgw, mds, mgr, etc.), in order to be able to collect the
health metrics of its devices and offer their life expectancy
prediction.
Had to remove the line:
Requires: python%{python3_pkgversion}-ceph-common = %{_epoch_prefix}%{version}-%{release}
which slipped in between
Requires: libstoragemgmt
and
%if 0%{?weak_deps}
Also removed the dependencies for the mon package from the cherry-picked
commit, in both ceph.spec.in and debian/control.
That's because in Nautilus we do not scrape the health metrics of mon devices
(please see commit d592e56e74d94c6a05b9240fcb0031868acefbab).