mds_cluster.mds_fail() runs command "mds fail" not "fs fail". The reason
for failure was PR #32581 which accidentally changed the return code
from 0 to EINVAL. Since this was reversed in PR #37159, the change
introduced by 04ed58f is not only incorrect but also redundant.
ceph.spec.in, debian/control: add smartmontools and nvme-cli dependencies
These packages are needed in order to scrape device health metrics from
devices used by OSD and MON daemons.
smartmontools' smartctl is what we use in order to scrape devices' SMART
attributes and general health metrics.
In addition, we use nvme-cli tool on NVMe devices, which fetches
vendor specific NVMe related health metrics.
Ceph relies on these tools for proper functioning of the underlying layers
of the devicehealth mgr module, and of other mgr modules which use devicehealth
functionality (such as diskprediction_local, telemetry, dashboard).
Essentially, most of devicehealth commands rely on proper functioning of
smartctl, otherwise they lack the device health metrics.
For example, in case smartctl is missing, the commands:
ceph device scrape-daemon-health-metrics <who>
ceph device scrape-health-metrics [<devid>]
will not be able to scrape health metrics, and the command:
ceph device predict-life-expectancy <devid>
will not provide any meaningful output (since there are no metrics).
In short, when we scrape a device by its daemon (be it an OSD or a MON):
ceph device scrape-daemon-health-metrics <who>
The devicehealth module command eventually invokes a
block_device_get_metrics() call in either osd/OSD.cc or mon/Monitor.cc,
which wraps calls to both
block_device_run_smartctl() (spawns smartctl)
block_device_run_vendor_nvme() (spawns nvme)
in common/blkdev.cc.
Minimum version requirements:
'smartmontools' is the package name, which contains two utility
programs: 'smartd' and 'smartctl'. Ceph uses the latter.
Version 6.7 of smartctl first introduced the --json option (beta), which
outputs the metrics in JSON format. After a few adjustments, the feature
was officially released in smartctl version 7.0.
Since we rely on the JSON format to process the metrics, we must have
smartmontools' smartctl version >= 7.
That said, we choose not to specify smartmontools version here on
purpose, since there might be a scenario where:
- We specified smartmontools version to be >= 7.
- smartmontools 7 is not available yet in rhel 8 / centos 8.
- A user installs via rpm ceph-osd, for example.
- smartmontools will not be installed (since version >= 7 is not available
  in this repo yet).
- Then the user upgrades to 8.3 (which should have smartmontools >= 7),
  but smartmontools will not get upgraded (since it's not installed).
In the scenario where we do not specify a version, smartmontools 6.6
will be installed, but it will be upgraded to >= 7 when a user upgrades
(and if it's a fresh installation - version >= 7 would be installed
anyway).
nvme-cli does not have a minimum version.
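
The version requirement described above could be checked at runtime along
the following lines. This is only a sketch: the function names and the
`MIN_JSON_VERSION` constant are hypothetical and not part of Ceph, and it
assumes that the first line of `smartctl --version` begins with
`smartctl <major>.<minor>`.

```python
import re

# Oldest smartctl release with the officially released --json output,
# per the commit message above. The constant name is hypothetical.
MIN_JSON_VERSION = (7, 0)


def parse_smartctl_version(banner):
    """Return (major, minor) parsed from the output of `smartctl --version`.

    Returns None if the first line does not look like a smartctl banner.
    """
    lines = banner.splitlines()
    first_line = lines[0] if lines else ""
    match = re.match(r"smartctl (\d+)\.(\d+)", first_line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def supports_stable_json(banner):
    """True if the reported smartctl version is at least MIN_JSON_VERSION."""
    version = parse_smartctl_version(banner)
    return version is not None and version >= MIN_JSON_VERSION
```

The banner text would typically come from something like
`subprocess.run(["smartctl", "--version"], capture_output=True, text=True).stdout`.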
We use 'Recommends' for both rpm and deb packages since we do not want
the installation to fail in case of conflicts. With 'Recommends', the
dependency is installed when possible, but is skipped if it conflicts
with other dependencies.
It's worth mentioning that smartmontools and nvme-cli dependencies exist
in ceph-container builds.
We add them here for the cases of bare metal installations.
In the future we will add a separate package (with smartmontools and
nvme-cli dependencies) that can be installed on any node (running
rbd-mirror, rgw, mds, mgr, etc.), in order to be able to collect the
health metrics of its devices and offer their life expectancy
prediction.
`ceph-volume simple activate --all` relies on the presence of json files
in `/etc/ceph/osd` that were created with the `ceph-volume simple scan`
command.
Over a cluster's lifecycle, it is very likely that an OSD which was deployed
with ceph-disk gets removed or replaced at some point. This means the
corresponding json file in `/etc/ceph/osd` becomes irrelevant. It causes
`ceph-volume simple activate --all` to fail because it tries to mount
nonexistent partitions.
The idea here is to simply warn the user that the osd described in the
json file doesn't exist anymore and exit properly instead of throwing an
error.
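
The warn-and-skip behaviour described above might look roughly like the
sketch below. It is not the ceph-volume implementation: the function name
`find_activatable_osds` is hypothetical, and the `{"data": {"path": ...}}`
layout of the scanned json files is an assumption made for illustration.

```python
import json
import logging
import os

logger = logging.getLogger(__name__)


def find_activatable_osds(json_dir):
    """Split the scanned OSD files in json_dir into (activatable, stale).

    An entry is treated as stale when its data device path no longer
    exists. The {"data": {"path": ...}} layout is an assumption.
    """
    activatable, stale = [], []
    for name in sorted(os.listdir(json_dir)):
        if not name.endswith(".json"):
            continue
        full_path = os.path.join(json_dir, name)
        with open(full_path) as f:
            metadata = json.load(f)
        data_path = metadata.get("data", {}).get("path")
        if data_path is None or not os.path.exists(data_path):
            # Warn and skip instead of failing the whole `--all` run.
            logger.warning("OSD described in %s no longer exists, skipping",
                           full_path)
            stale.append(name)
            continue
        activatable.append(name)
    return activatable, stale
```

A caller would then activate only the entries in the first list and exit
normally even if the second list is not empty.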
crimson/common/tri_mutex: update the class comment
to explain why we have tri_mutex, how it is related to pipelined
read / write, and the mutual exclusion between read, write and rmw
operations.
ignore BrokenPipeError which is thrown when piping the output of ceph
CLI to a tool which might close its stdin before ceph CLI sends the
whole help message.
Follow approach suggested by Kefu: https://github.com/python/cpython/commit/7b0ed43af55c1e2844aa0ccd5e088b2ddd38dbdb
This doesn't manage the clean-up/exit logic, as that's deferred to the
last part of the __main__ code.
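
A minimal sketch of ignoring BrokenPipeError while printing a help message
is shown below. The `print_help` name and its return value are hypothetical
and are not taken from the ceph CLI; exit handling is deliberately left out,
in line with the note above.

```python
import sys


def print_help(text, out=None):
    """Write text to out; return False if the reader closed the pipe early."""
    out = sys.stdout if out is None else out
    try:
        out.write(text)
        # Flush inside the try block so that a closed pipe surfaces here,
        # not later during interpreter shutdown.
        out.flush()
    except BrokenPipeError:
        # The consumer (e.g. `head`) stopped reading; there is nothing
        # more to send, so ignore the error.
        return False
    return True
```

With this shape, piping the help output into a tool that exits early does
not produce a traceback; the caller decides what exit status to use.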
Signed-off-by: Lisa Li <xiaoyan.li@intel.com>
Signed-off-by: Mahati Chamarthy <mahati.chamarthy@intel.com>
Signed-off-by: Changcheng Liu <changcheng.liu@intel.com>
librbd/cache: Establish the framework to integrate RWL and SSD
- Create WriteLogCache class
- Rename ReplicatedWriteLog files to AbstractWriteLog and
modify the I/O method names
- fix the test
- Modify CMakeLists.txt to add newly created classes
Signed-off-by: Lisa Li <xiaoyan.li@intel.com>
Signed-off-by: Mahati Chamarthy <mahati.chamarthy@intel.com>
Signed-off-by: Changcheng Liu <changcheng.liu@intel.com>
crimson/osd: replace "ceph_abort_msg()" with assert()
these are programming errors, and are easy to detect. also, assert() does
not return, so the compiler won't complain about a branch that does not
return a value in a function that returns one.
* add tri_mutex::abort() to pass given exception to all waiters
* add ObjectContext::interrupt() to abort all pending consumers
of current object context
before this change, a seastar::shared_mutex, a RWState and a
shared_promise are used for tracking the consumers of ObjectContext.
all of the consumers are put into writers if the predicate function
evaluates to "false", and are awakened if the predicate function evaluates
to "true" afterwards in a polling loop waiting on the shared_promise,
which is in turn fulfilled once the last consumer of the given category
relinquishes the lock.
this approach has a couple of issues:
* it is heavyweight. seastar::shared_mutex already tracks each of
the waiters' continuations using a separate promise<>, and it does try
to reschedule them once a given consumer releases the last lock.
so it's like a design of a customized shared_mutex over a
shared_mutex.
* it is complicated. 3 variables are used for tracking the different
consumers of ObjectContext.
in this change,
* `tri_mutex` is introduced as a variant of the original
`seastar::shared_mutex` to track two different shared users in
addition to an exclusive user.
* replace `shared_mutex` with `tri_mutex` in `ObjectContext`, to
simplify the design.
* move recovery_read_marker into `ObjectContext`. assuming all
pending actions will be added as a waiter for the related
object context before they acquire the lock.
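
The idea of two shared classes of users plus one exclusive user can be
modelled as below. This is a conceptual, non-blocking sketch only: the
`TriLockState` class is hypothetical, it keeps counters instead of queueing
waiters, and it assumes that readers share only with readers, writers share
only with writers, and the exclusive user runs alone. It is not the crimson
`tri_mutex` implementation.

```python
class TriLockState:
    """Bookkeeping for two shared modes (read, write) and one exclusive mode."""

    def __init__(self):
        self.readers = 0
        self.writers = 0
        self.exclusive = False

    def try_lock_for_read(self):
        # Readers may share the lock with other readers only.
        if self.writers or self.exclusive:
            return False
        self.readers += 1
        return True

    def try_lock_for_write(self):
        # Writers may share the lock with other writers only.
        if self.readers or self.exclusive:
            return False
        self.writers += 1
        return True

    def try_lock_for_excl(self):
        # The exclusive user (e.g. a read-modify-write) needs the lock alone.
        if self.readers or self.writers or self.exclusive:
            return False
        self.exclusive = True
        return True

    def unlock_for_read(self):
        assert self.readers > 0
        self.readers -= 1

    def unlock_for_write(self):
        assert self.writers > 0
        self.writers -= 1

    def unlock_for_excl(self):
        assert self.exclusive
        self.exclusive = False
```

A single object with three modes replaces the separate mutex, state and
promise that the previous design needed to express the same exclusion rules.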
instead of reusing ObjectContext::get_recovery_read() for both the
sync call and the async call, just add a new method for the async call
for better readability.
Patrick Donnelly [Mon, 14 Sep 2020 19:21:10 +0000 (12:21 -0700)]
mds/FSMap: check parse_role return before filtering
If parse_role fails, then the fscid value is invalid.
This was caught in testing cephtool. Funnily enough, the command
rmfailed normally fails:
2020-09-06T00:05:51.020 INFO:tasks.workunit.client.0.smithi036.stderr:/home/ubuntu/cephtest/clone.client.0/qa/workunits/cephtool/test.sh:35: expect_false: set -x
2020-09-06T00:05:51.020 INFO:tasks.workunit.client.0.smithi036.stderr:/home/ubuntu/cephtest/clone.client.0/qa/workunits/cephtool/test.sh:36: expect_false: ceph mds rmfailed 0
2020-09-06T00:05:51.318 INFO:tasks.workunit.client.0.smithi036.stderr:Error EPERM: WARNING: this can make your filesystem inaccessible! Add --yes-i-really-mean-it if you are sure you wish to continue.
2020-09-06T00:05:51.321 INFO:tasks.workunit.client.0.smithi036.stderr:/home/ubuntu/cephtest/clone.client.0/qa/workunits/cephtool/test.sh:36: expect_false: return 0
2020-09-06T00:05:51.322 INFO:tasks.workunit.client.0.smithi036.stderr:/home/ubuntu/cephtest/clone.client.0/qa/workunits/cephtool/test.sh:989: test_mon_mds: ceph mds rmfailed 0 --yes-i-really-mean-it
2020-09-06T00:05:51.631 INFO:tasks.workunit.client.0.smithi036.stderr:Error EINVAL: Rank '0' not foundinvalid role '0'
2020-09-06T00:05:51.634 INFO:tasks.workunit.client.0.smithi036.stderr:/home/ubuntu/cephtest/clone.client.0/qa/workunits/cephtool/test.sh:990: test_mon_mds: set -e
are basically the same thing. They're all called directly
before the deployment of a daemon. All of them should be
unified. This PR makes this refactoring simpler.
By renaming `create` to `prepare_create`, `create` is
no longer the entrypoint for calling
`create_daemon`. Thus all the functions above
return some data structures.
Signed-off-by: Sebastian Wagner <sebastian.wagner@suse.com>
crimson/common: add comment to explain the partial specialization
it might be confusing why we don't use explicit specialization for
defining errorator::futurize::stored_to_future.
quote from item 16, § 17.7.3, n4659:
In an explicit specialization declaration for a member of a class
template or a member template that appears in namespace scope, the
member template and some of its enclosing class templates may remain
unspecialized, except that the declaration shall not explicitly
specialize a class member template if its enclosing class templates are
not explicitly specialized as well.
crimson/common: add specialization for futurize::invoke(Func, monostate)
this is a leftover of 260a702ba983f1bca29d4c8d1e28f3eef46c6699, where we
bumped up the Seastar API level to 5, in which seastar::internal::monostate
is used to represent the stored state of a future instead of a tuple<>.
instead of converting string constant to char*, construct string_views
from string constants
to silence GCC warnings like:
src/rgw/services/svc_sys_obj_cache.cc:512:7: warning: ISO C++ forbids converting a string constant to 'char*' [-Wwrite-strings]
512 | { "cache list name=filter,type=CephString,req=false",
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~