qa/suite: whitelist PG_AVAILABILITY in rados_api_tests.yaml
PGs are created when pgp-num and pg-num are increased, and
PG_AVAILABILITY is reported at that moment. So whitelist it in all tests
that run rados/test.sh. That script exercises ceph_test_rados_api_list.
John Spray [Mon, 23 Apr 2018 21:15:22 +0000 (17:15 -0400)]
mgr: handle commands sent to unavailable modules
There are some up-front checks in DaemonServer
but it shouldn't assume that its checks are
necessarily going to match the choices about
how ActivePyModules composes its ::modules member,
so let's have some extra checks to avoid
risk of crashing mgr on commands sent to
unhealthy/unloaded modules.
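
The extra check described above can be sketched as follows. This is a minimal illustration in Python rather than the mgr's C++; `handle_command`, the `modules` dict and the returned error code are hypothetical stand-ins, not the actual DaemonServer or ActivePyModules interfaces.

```python
import errno


def handle_command(modules, module_name, command):
    """Dispatch a command, returning (retcode, message).

    `modules` is a hypothetical dict of loaded module handlers. The
    dispatcher does not trust earlier checks: it looks the module up
    again and reports an error if it is missing, instead of crashing.
    """
    handler = modules.get(module_name)
    if handler is None:
        # Module is unhealthy or was never loaded.
        return (-errno.ENOENT, "module '%s' is not available" % module_name)
    return (0, handler(command))
```

The point of the design is that the dispatcher stays safe even if the up-front checks and the module table disagree.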
John Spray [Mon, 23 Apr 2018 21:09:37 +0000 (17:09 -0400)]
mgr: execute modules even if can_run=false
We execute modules even if can_run=false, so that it is possible
to load them for running their selftest hooks. However,
we already raise health messages about the fact that they're
enabled but can't run, so we don't want to also raise
health messages about whatever exceptions they raise
from serve().
The influx plugin requires the influxdb Python module to function, but
the influx mgr plugin is optional for users who don't use InfluxDB,
so it is marked "Suggests" for now, until we use a more flexible
packaging scheme.
Sage Weil [Mon, 23 Apr 2018 18:22:26 +0000 (13:22 -0500)]
osd/ECBackend: wait for apply for luminous peers
Avoid completing a write until luminous peers also apply the change.
This is overkill, but works around a problem where completing our write
here allows the next read to start too early. This is because the
PrimaryLogPG is handling the write-to-read ordering, but completing the
write releases the write lock in the PG.
John Spray [Mon, 16 Apr 2018 16:10:13 +0000 (12:10 -0400)]
mgr: add MgrModule.OPTIONS definitions to modules
They need these to do upgrades properly. Where there
was an existing structure describing config, merge
these together to avoid having two lists of options.
John Spray [Mon, 16 Apr 2018 21:29:08 +0000 (17:29 -0400)]
mgr: introduce MgrModule.OPTIONS field
Now is a good time to start requiring
modules to explicitly list their configuration
settings, so that we can do a proper job of
migrating configuration from old config-key style,
i.e. knowing what's a config setting and what's
a KV store item.
Throw an exception if a module tries to
access a setting outside their schema, so
that we have some confidence that the schema
is complete.
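
A minimal sketch of the idea, assuming a list-of-dicts schema: `ModuleSketch`, its option names and the `_validate_option` helper are hypothetical and stand in for whatever MgrModule actually does.

```python
class ModuleSketch:
    # Hypothetical schema: each entry names one configuration setting.
    OPTIONS = [{"name": "interval"}, {"name": "hostname"}]

    def __init__(self, config):
        self._config = dict(config)

    def _validate_option(self, key):
        # Reject any setting the module did not declare, so that gaps
        # in the schema show up as errors rather than silent reads.
        if key not in {opt["name"] for opt in self.OPTIONS}:
            raise RuntimeError("config option '%s' is not in OPTIONS" % key)

    def get_config(self, key, default=None):
        self._validate_option(key)
        return self._config.get(key, default)
```

With this in place, `ModuleSketch({"interval": "5"}).get_config("password")` raises instead of returning a default, which is how an incomplete schema would be noticed.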
John Spray [Mon, 16 Apr 2018 09:39:44 +0000 (05:39 -0400)]
mgr: replace get_config_prefix with get_store_prefix
The _prefix variant was only used for data-ish things,
so we can just move it over to operate on store instead
of config, rather than having a _prefix variant for both.
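
As an illustration only, a prefix lookup over a key-value store might look like the following. The standalone function and the plain dict used as the store are assumptions made for the sketch, not the module API itself.

```python
def get_store_prefix(store, key_prefix):
    """Return every item in `store` whose key starts with `key_prefix`.

    `store` is a plain dict standing in for the mgr KV store.
    """
    return {k: v for k, v in store.items() if k.startswith(key_prefix)}
```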
John Spray [Wed, 11 Apr 2018 16:08:39 +0000 (12:08 -0400)]
mon: grant mgr profile "config" commands
...and remove redundant config-key lines (these are applied
to mgr anyway in the next block, and mgr even has a broader
config-key permission in the line above).
John Spray [Mon, 9 Apr 2018 19:25:45 +0000 (15:25 -0400)]
mgr: rework kv store load path
The locking and blocking around this was a bit
tricky. Do the simple thing, and pull the
load_store out to Mgr so that it can be safely
done as part of the background_init process (just drop
Mgr::lock across blocking actions).
osd/PG: prefer EC async_recovery_targets in reverse order of cost
This is a follow-up fix of https://github.com/ceph/ceph/pull/21578,
in which I forgot that erasure-coded pools share the same logic
when determining the async_recovery_targets.
osd/PGLog: assert out on performing overflowed log trimming
Performing overflowed log-trim can be a sign of big trouble, e.g.,
the **complete_to** iterator will now point to an invalid position
of the original pg-log list when the trimming is done, and hence
randomly trigger **Segmentation fault**s as below:
```
2018-03-07 17:38:46.109018 7f274a4ed700 -1 *** Caught signal (Segmentation fault) **
1: (()+0xa51f31) [0x7f278290bf31]
2: (()+0xf370) [0x7f277fb4f370]
3: (PrimaryLogPG::recover_got(hobject_t, eversion_t)+0x266) [0x7f2782512786]
4: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo const&, std::shared_ptr<ObjectContext>, bool, ObjectStore::Transaction*)+0x2a4) [0x7f278251f3b4]
5: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&, PushReplyOp*, ObjectStore::Transaction*)+0x2e2) [0x7f2782690f82]
6: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x194) [0x7f2782691224]
7: (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2f1) [0x7f278269fd41]
8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) [0x7f27825c2470]
```
The root cause of why PGs start to trim more log entries than
we expect is still unclear to me, but setting the trap here should
do no harm and will hopefully expose the above problem a little more often.
We'll see.
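
The trap can be sketched roughly as below. The message does not state the exact condition that PGLog asserts on, so this model is an assumption: the log is a list of integer versions, `complete_to` is the version the marker points at, and trimming must stop short of it.

```python
def trim_log(entries, complete_to, trim_to):
    """Drop entries with version <= trim_to and return the remaining list.

    entries: ascending list of integer versions (stand-in for pg-log entries).
    complete_to: version that the hypothetical complete_to marker points at.
    """
    # Trap: refuse to trim up to or past the marker, since that would
    # leave it pointing at an entry that no longer exists.
    assert trim_to < complete_to, (
        "overflowed log trim: trim_to=%d complete_to=%d" % (trim_to, complete_to))
    return [v for v in entries if v > trim_to]
```

Failing loudly at trim time moves the crash from a later, seemingly random dereference to the place where the invalid state is created.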
"ceph fs set cephfs allow_multimds false" is deprecated, and multimds is
enabled by default, so "ceph fs set cephfs max_mds 4" won't fail with
the default settings.
* to pick up the fix of https://svn.boost.org/trac10/ticket/11622
* also, the boost::python library name now includes the Python version
as a suffix, so update BuildBoost.cmake accordingly.
osd/PG: prefer async_recovery_targets in reverse order of cost
In theory, peers that have a longer list of objects to recover
take correspondingly longer to recover and hence have a
bigger chance of blocking client ops.
Also, to minimize the risk of data loss, we want to bring those broken
(inconsistent) peers back to normal as soon as possible. Putting them
into the async_recovery_targets queue, however, did quite the opposite.
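
One reading of the subject line is that candidates are considered from highest recovery cost to lowest. The sketch below illustrates only that ordering; the function name, the `(cost, peer)` tuples and the `min_size` cutoff are assumptions, not the logic in osd/PG.

```python
def choose_async_recovery_targets(candidates, acting_size, min_size):
    """Pick async recovery targets, most expensive peers first.

    candidates: list of (cost, peer) tuples, where cost approximates the
    number of objects the peer still has to recover.
    Stops once removing another peer would shrink the acting set to
    fewer than min_size members.
    """
    targets = []
    for cost, peer in sorted(candidates, reverse=True):
        if acting_size - len(targets) <= min_size:
            break
        targets.append(peer)
    return targets
```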
CMake Warning at CMakeLists.txt:73 (find_package):
  By not providing "Findgflags.cmake" in CMAKE_MODULE_PATH this project has
  asked CMake to find a package configuration file provided by "gflags", but
  CMake did not find one.
  Could not find a package configuration file provided by "gflags" with any
  of the following names:
    gflagsConfig.cmake
    gflags-config.cmake
  Add the installation prefix of "gflags" to CMAKE_PREFIX_PATH or set
  "gflags_DIR" to a directory containing one of the above files. If "gflags"
  provides a separate development package or SDK, be sure it has been
  installed.
cmake: hide symbols imported from other libraries in libcls_*
so they will not be involved when resolving symbols. ld tries to
keep a shared library around even if it fails to load it if it offers
some unique symbols. in that case, the library will not be properly
unloaded, and even worse it will interfere with following dlopen()
calls, because it is marked with NODELETE by dlopen(). if it has some
unresolved symbol and does offer some "unique" symbols required by
the library to be loaded, the library will fail to load, despite the
fact that the "unique" symbol is also offered by the executable.
for more details, see
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60731 and
https://sourceware.org/bugzilla/show_bug.cgi?id=14577
random: revert change from boost::optional to std::optional
somehow this was breaking the seeding of thread-local engines on gcc.
we'll have to investigate this further, but for now i'm reverting this
piece to get messengers working again