Sage Weil [Wed, 1 Aug 2018 01:16:30 +0000 (20:16 -0500)]
Merge PR #23223 into master
* refs/pull/23223/head:
osd/PG: kill dead functions and related options
osd/osd_type: kill unused input ec_pool for iterate_mayberw_back_to
common: kill dead options
osd/PG: do not initialize up/acting twice
osd/PG: clear missing_loc properly if last location is gone
Sage Weil [Tue, 31 Jul 2018 22:23:48 +0000 (17:23 -0500)]
Merge PR #22692 into master
* refs/pull/22692/head:
doc/mgr/devicehealth: document devicehealth module
doc/rados/operations/health-checks: document DEVICE_HEALTH* messages
mgr/devicehealth: fix style for returns
mgr/devicehealth: use constants for health warnings
mgr/devicehealth: deal with as many daemons as we can until limit
mgr/devicehealth: warn if too many daemons are expected to fail soon
mgr/devicehealth: set primary-affinity 0 for failing devices
mgr/devicehealth: fix config options
mgr/devicehealth: only fetch osdmap once from check_health
mgr/devicehealth: revise health messages
mgr/devicehealth: add 'device check-health' command and run periodically
mgr/devicehealth: fix new options
mgr/devicehealth: add helpers to life_expectancy_response()
mgr/devicehealth: simplify setting defaults
common/blkdev remove debug statements
Yaarit Hatuka [Mon, 25 Jun 2018 13:19:22 +0000 (08:19 -0500)]
mgr/devicehealth: add helpers to life_expectancy_response()
- if mark_out_threshold is met we write to log.warn instead of raising a
health warning.
- check that OSD is 'in' before calling mark_out().
- raise a health warning in case OSD is marked 'out' but still has PGs
attached to it.
- cast thresholds default values to string.
- add SCSI multipath support to health warning message.
- change health warning message.
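The checks listed above could be sketched roughly as follows. This is an
illustration only, not the module's code: `Osd`, `evaluate_osd`, the field
names, and the use of seconds for the threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Osd:
    # All fields are hypothetical stand-ins for what the module reads
    # from the osdmap and from the device life-expectancy data.
    osd_id: int
    is_in: bool
    num_pgs: int
    life_expectancy_secs: float


def evaluate_osd(osd, mark_out_threshold_secs):
    """Return (actions, log_warnings, health_warnings) for one OSD."""
    actions, log_warnings, health_warnings = [], [], []
    if osd.life_expectancy_secs < mark_out_threshold_secs:
        if osd.is_in:
            # Threshold met: write to the log instead of raising a health
            # warning, and only mark out an OSD that is currently 'in'.
            log_warnings.append(
                f"osd.{osd.osd_id} is expected to fail soon; marking out")
            actions.append(("mark_out", osd.osd_id))
        elif osd.num_pgs > 0:
            # Already 'out' but still has PGs attached: health warning.
            health_warnings.append(
                f"osd.{osd.osd_id} is out but still has {osd.num_pgs} PGs")
    return actions, log_warnings, health_warnings
```

Returning the decisions instead of acting on them keeps the sketch free of
any real mgr API calls.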
src/osd/PG.cc: remove redundant call to trim_log()
This change is motivated by the failure tracked in
https://tracker.ceph.com/issues/25198. The failure highlights a case where a
call to trim_log() after the PG has recovered races with the previous op
on a replica OSD. Since the previous operation has not completed, the
last_complete value for that OSD is not valid when we try to trim the
log. It is also worth noting that the race is due to MOSDPGTrim going through
the strict queue as a peering message, whereas regular ops go through the
non-strict queue.
During the investigation of this bug, we noticed that, with
https://tracker.ceph.com/issues/23979, we allow pg log trimming to
happen on the primary and replicas, whenever we cross the upper bound of
the pg log. This also ensures that pg log trimming happens while processing
any new op.
Therefore, the function trim_log(), which earlier served the purpose of
trimming logs on the primary and replicas just before the PG went into
the Recovered state, is no longer required. It acted as a last line of
defense to trim logs when we did not need them any more. But this call
seems redundant now, because we limit the pg log length at all times.
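The idea of bounding the log while processing every new op can be
illustrated with a toy model. Nothing here mirrors the actual PG log code:
`BoundedLog`, `max_entries`, and `trim_to` are hypothetical names chosen
for this sketch.

```python
from collections import deque


class BoundedLog:
    """Toy log that trims itself whenever it crosses an upper bound."""

    def __init__(self, max_entries, trim_to):
        if not 0 < trim_to <= max_entries:
            raise ValueError("need 0 < trim_to <= max_entries")
        self.max_entries = max_entries  # hypothetical upper bound
        self.trim_to = trim_to          # hypothetical target length
        self.entries = deque()

    def append(self, entry):
        self.entries.append(entry)
        # Trimming happens as part of handling the new op, so no separate
        # "trim once recovered" pass is needed to keep the log bounded.
        if len(self.entries) > self.max_entries:
            while len(self.entries) > self.trim_to:
                self.entries.popleft()
```

Because every append enforces the bound, a final trim at the end of
recovery has nothing left to do in this model.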
Sage Weil [Mon, 30 Jul 2018 19:18:07 +0000 (14:18 -0500)]
pybind/rados/rados: do not pass prval from stack
The prval is a pointer to an int to write the final completion code of
the rados op. This can't be on the stack since we immediately leave the
current scope after preparing the op (looong before we do the rados op).
We keep the tuple return value to avoid breaking users of this API
(devicehealth module, gnocchi at a minimum).
Fixes: http://tracker.ceph.com/issues/25175
Signed-off-by: Sage Weil <sage@redhat.com>
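The lifetime problem described above can be approximated in plain Python
with ctypes. This is an analogy under stated assumptions, not the binding's
code: `PendingOp`, `prepare_op`, and `complete_op` are hypothetical, and the
asynchronous completion is simulated by a direct write through the pointer.

```python
import ctypes


class PendingOp:
    """Owns the int that the completion code is eventually written to."""

    def __init__(self):
        # Kept alive by this object, so its address stays valid after the
        # function that prepared the op has returned.
        self.prval = ctypes.c_int(0)


def prepare_op(op):
    """Return a pointer to hand to the (simulated) asynchronous operation."""
    return ctypes.pointer(op.prval)


def complete_op(ptr, code):
    """Stand-in for the library writing the completion code much later."""
    ptr[0] = code
```

The point of the sketch is ownership: the storage lives as long as the op
object, not as long as the scope that prepared it.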
cmake,make-dist: build gperftools if WITH_STATIC_LIBSTDCXX
we could create a mini project to build a shared library, and use
try_compile() to test if the found gperftools is compiled with -fPIC.
but we are mostly targeting xenial when enabling
WITH_STATIC_LIBSTDCXX, and google-perftools on xenial is built
without -fPIC by default, so let's keep it simple.
- do not link libkv with ALLOC_LIBS, it turns out that if we link
tcmalloc *before* -static-libstdc++ -static-libgcc, libstdc++ and gcc
libs will show up in `ldd` output
- add `-static-libstdc++ -static-libgcc` to CMAKE_SHARED_LINKER_FLAGS
and CMAKE_EXE_LINKER_FLAGS instead of adding them to all shared
libraries and executables. simpler this way.
- link against libtcmalloc statically, because libtcmalloc is a C++
library, linking against it dynamically and linking against C++ runtime
statically will pull in dependencies on two versions of C++ runtime, which
will bring down the app at run-time.
- do not pass '-pie' to linker when building executable if
`WITH_STATIC_LIBSTDCXX` and tcmalloc is used, because the static tcmalloc
is not compiled with PIC.
- only apply '-pie' if ENABLE_SHARED is enabled.
Stephan Müller [Fri, 27 Jul 2018 14:15:07 +0000 (16:15 +0200)]
mgr/dashboard: Fix duplicate error messages
Duplicate error messages currently appear if the task wrapper service is
used. It calls 'notifyTask' on a failed task, which would be fine if
we didn't have the API interceptor, which watches all API requests and
triggers 'notifyTask' itself if an error appears.
seastar actually requires fmt 4.0.0 and up, as 3.0.2 does not offer
fmt/printf.h. see
https://github.com/fmtlib/fmt/blob/master/ChangeLog.rst#400---2017-06-27
Douglas Fuller [Fri, 29 Jun 2018 17:55:31 +0000 (13:55 -0400)]
mon/OSDMonitor: Warn if missing expected_num_objects
When creating a pool on filestore, warn if the user appears to be
creating a pool to store a large number of objects but omitted the
expected_num_objects parameter. Create the pool anyway.
Fixes: http://tracker.ceph.com/issues/24687
Signed-off-by: Douglas Fuller <dfuller@redhat.com>
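The warn-but-still-create behaviour can be sketched as below. The commit
message does not say how "a large number of objects" is detected, so using
a large pg_num as the signal, the `large_pg_num` cutoff, and the function
name `pool_create_warning` are all assumptions made for illustration.

```python
def pool_create_warning(pg_num, expected_num_objects, on_filestore,
                        large_pg_num=1024):
    """Return a warning string, or None. Never blocks pool creation."""
    # Assumption: a large pg_num hints that the pool will hold many objects.
    looks_large = pg_num >= large_pg_num
    if on_filestore and looks_large and expected_num_objects == 0:
        return ("pool looks large but expected_num_objects was not given; "
                "creating it anyway")
    return None
```

The caller would log or return the warning and then proceed with pool
creation in every case.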
https://bugzilla.redhat.com/show_bug.cgi?id=1603615 describes
a case where pg calc conflicts with mon_max_pg_per_osd, and pool
creation is not allowed when this limit is 200. Hence, increase the
limit to avoid this.
Andrew Schoen [Thu, 26 Jul 2018 15:09:17 +0000 (10:09 -0500)]
ceph-volume: update version of ansible to 2.6.x for simple tests
ceph-ansible now requires a 2.5.x or 2.6.x version of ansible if you're
using the master branch. This updates our functional tests for the
simple subcommand to use a 2.6.x version of ansible.
mgr/dashboard: Improve prettier scripts and documentation
Added prettier validation to the run-frontend-unittests script. This will make
sure we are always running prettier in our commits.
Added 2 new npm scripts:
- 'prettier', will run prettier formatter on all frontend files
- 'prettier:lint', will check all frontend files against prettier linter
Removed 'pretty-quick' and related scripts. Since we now have all files
prettified we can simply run prettier on them.
Remove 'tslint-eslint-rules' package and all related rules. Prettier can check
all the removed rules.
Updated HACKING.rst with some information about prettier.