Sage Weil [Fri, 8 Nov 2019 19:28:45 +0000 (13:28 -0600)]
Merge PR #31493 into master
* refs/pull/31493/head:
ceph-daemon: 'profile ...' not 'allow profile ...'
mgr/ssh: 'profile ...' not 'allow profile ...'
mgr/orchestrator_cli: rearrange things a bit
doc/mgr/orchestrator_cli: remove irrelevant line
mgr/ssh: learn to deploy rbd-mirror daemons
mgr/orchestrator: add rbd-mirror commands and hooks
ceph-daemon: learn to deploy rbd-mirror daemon
mgr/ssh: handle lack of node hints more gracefully
mgr/ssh: factor out update_{rgw,mds} into common helper
mgr/ssh: fix update_rgw, update_mgr
* refs/pull/31400/head:
mds: establish session with mgr only after added to FSMap
mds: do not register as a service daemon
mds: do not try to diagnose cause of MDSMap removal
mds: fix handling of initial MDS states
mds: remove unnecessary const qualifier
mds: cleanup type decl and map iteration
mds: define stream operator for mds_info_t
This commit undoes the service daemon registration for the MDS. It doesn't look
absolutely necessary and it causes the MDS to be listed twice in the `ceph
versions` output:
Fixing that requires looking for duplicates or ignoring MDSs in the
service daemons when the mon processes `ceph versions`. I have a feeling
that it wasn't actually designed to be used by the MDS this way however.
Additionally, the "unknown" version appears because the metadata
sent to the mgr does not include "ceph_version".
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
- Make explicit the check for getting removed from the MDSMap. Previously
this was done only by checking whether the MDS held a rank, which misses
the case where a standby is removed from the FSMap.
- Use mds_info_t::dump to simplify various debug output.
- Add a few sanity asserts for invalid state transitions.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Sage Weil [Fri, 8 Nov 2019 17:49:00 +0000 (11:49 -0600)]
Merge PR #29437 into master
* refs/pull/29437/head:
mgr/diskprediction_local: Reverted dependencies, added HGST models
mgr/diskprediction_local: Updated dependencies in ceph.spec.in, debian/control to match requirements.txt
mgr/diskprediction_local: Updated Red Hat developed prediction model. Updated module options to choose between Red Hat and ProphetStor models.
mgr/diskprediction_local: Updated prediction models to use only supported python packages.
mgr/diskprediction_local: Replaced old models and updated predictor.
Sage Weil [Fri, 8 Nov 2019 16:12:27 +0000 (10:12 -0600)]
mgr/orchestrator: add rbd-mirror commands and hooks
This is somewhat different from the other services in that the name is
basically unused: we have a single pool of rbd-mirror daemons for the
whole cluster.
Thomas Bechtold [Thu, 7 Nov 2019 15:41:23 +0000 (16:41 +0100)]
ceph-daemon: Move ceph-daemon executable to own directory
Moving ceph-daemon into src/ceph-daemon/ makes it simpler to add extra
code (eg. tox.ini, README, unittests, ...) specific to ceph-daemon.
That way related files are in a single directory.
Signed-off-by: Thomas Bechtold <tbechtold@suse.com>
Sage Weil [Fri, 8 Nov 2019 13:10:53 +0000 (07:10 -0600)]
ceph-daemon: add --skip-pull
It occurs to me there might be cases where the user *doesn't* want to pull
the latest image (e.g., because it is a partially disconnected environment,
and they know the image is already in the local registry).
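A minimal sketch of how such a flag might gate the pull step. Only the `--skip-pull` name comes from the commit; the `--image` option, the placeholder image name, the `podman` runtime, and the helper names `build_parser` and `plan_commands` are assumptions made for illustration. The sketch only builds the command lists and executes nothing.

```python
import argparse


def build_parser():
    """Build a parser with a --skip-pull switch (other options are illustrative)."""
    parser = argparse.ArgumentParser()
    # Placeholder image name, not a real default of ceph-daemon.
    parser.add_argument('--image', default='example/ceph:latest')
    parser.add_argument('--skip-pull', action='store_true',
                        help='do not pull the image before using it')
    return parser


def plan_commands(args, runtime='podman'):
    """Return the container commands that would be run; nothing is executed."""
    cmds = []
    if not args.skip_pull:
        # Default behaviour: refresh the image first.
        cmds.append([runtime, 'pull', args.image])
    cmds.append([runtime, 'run', '--rm', args.image])
    return cmds
```

With `--skip-pull` given, the planned list contains only the `run` command, which matches the disconnected-environment case described above.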
Sage Weil [Fri, 8 Nov 2019 13:08:46 +0000 (07:08 -0600)]
Merge PR #31464 into master
* refs/pull/31464/head:
ceph-daemon: help users find the shell/CLI too
ceph-daemon: enable the dashboard during bootstrap
ceph-daemon: add CLI helper to bootstrap
Reviewed-by: Paul Cuzner <pcuzner@redhat.com>
Reviewed-by: Kai Wagner <kwagner@suse.com>
Reviewed-by: Sebastian Wagner <swagner@suse.com>
Sage Weil [Thu, 7 Nov 2019 23:14:52 +0000 (17:14 -0600)]
ceph-daemon: make mon container privileged
libudev needs to be privileged in order to query the underlying hardware
devices, as reported by the 'ceph device ...' command set, and to scrape
smart metrics, etc.
Sage Weil [Thu, 7 Nov 2019 18:54:00 +0000 (12:54 -0600)]
mon/MonMap: encode (more) valid compat monmap when we have v2-only addrs
If we have 1 or more mons with v2-only addrs, pre-nautilus clients can't
talk to them. If there is more than 1 such mon in the map, the clients also
fail when loading the map because they expect the addrs to be unique. In
such situations, lie by giving them v1 addrs that are actually v2 ip:port
(so not actually valid). Hopefully there are enough other mons that do
have v1 addrs that the clients can still connect.
Fixes: https://tracker.ceph.com/issues/42600
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Thu, 7 Nov 2019 18:51:04 +0000 (12:51 -0600)]
mgr/DaemonServer: warn when we reject reports
If this is triggered it can be disruptive, since we're marking the
connection down. It points to a real bug, so better visibility is good
(e.g., failing the teuthology jobs).
Sage Weil [Thu, 7 Nov 2019 16:57:56 +0000 (10:57 -0600)]
Merge PR #31064 into master
* refs/pull/31064/head:
test: Test balancer module commands
mgr: Improve balancer module status
mgr: Release GIL before calling OSDMap::calc_pg_upmaps()
Removed pandas from requirements.txt, ceph.spec.in, and debian/control
because of installation issues in RHEL/CentOS.
Replaced pandas usages in RHDiskFailurePredictor with similar numpy
counterparts (e.g. structured array instead of dataframe)
Replaced joblib usages with pickle because older versions of scikit-learn
did not list joblib as a dependency, so it wasn't getting installed.
Using joblib would have required specifying it as a separate dependency
in spec file and requirements.
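The two substitutions described above could look roughly like the following sketch. The field names and sample values are invented for illustration and are not the module's actual schema; the sketch only shows that a numpy structured array supports the named-column access and row filtering a dataframe would provide, and that the standard-library pickle module can round-trip a Python object.

```python
import pickle

import numpy as np

# Illustrative schema; the real attribute set used by the predictor is not shown here.
disk_dtype = np.dtype([
    ('smart_5_raw', 'f8'),
    ('smart_187_raw', 'f8'),
    ('user_capacity', 'f8'),
])

# Each tuple is one day of (made-up) readings for a single disk.
records = np.array([(0.0, 1.0, 4.0e12), (8.0, 3.0, 4.0e12)], dtype=disk_dtype)

# Column access by field name, similar to selecting a dataframe column.
mean_realloc = records['smart_5_raw'].mean()

# Row filtering with a boolean mask.
suspicious = records[records['smart_5_raw'] > 0]

# Serialize and restore an object with pickle instead of joblib.
blob = pickle.dumps({'threshold': 0.5})
restored = pickle.loads(blob)
```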
Karanraj Chauhan [Wed, 30 Oct 2019 16:17:35 +0000 (12:17 -0400)]
mgr/diskprediction_local: Updated dependencies in ceph.spec.in, debian/control to match requirements.txt
Added pandas dependency to ceph.spec.in and debian/control.
In the spirit of "if it ain't broke, don't fix it", I did NOT add
scikit-learn as a dependency in spec or control, because scikit-learn
was already a dependency in diskprediction_local, and so it should
already have been taken care of.
Also in the same spirit, removed joblib dependency from requirements.txt
because scikit-learn depends on it and therefore joblib will get
installed when scikit-learn gets installed.
Karanraj Chauhan [Tue, 15 Oct 2019 15:30:52 +0000 (11:30 -0400)]
mgr/diskprediction_local: Updated Red Hat developed prediction model. Updated module options to choose between Red Hat and ProphetStor models.
predictor.py contains definitions for the original ProphetStor-developed model as well
as the Red Hat-developed model (trained on the Backblaze dataset).
The user can choose which model to use by passing either 'prophetstor' or 'redhat' to the
`Module`'s `predictor_model` config option.
Updated the disk health data formatting code in `Module` to include `user_capacity`,
`vendor`, and other fields that are used by the RHDiskFailurePredictor. These will simply
be ignored by the PSDiskFailurePredictor.
Updated preprocessing in RH model to use the data passed from module directly instead
of restructuring again. Added logging instead of print statements.
Restructured pretrained models directory to accommodate both models' files.
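One way to picture the model selection is a small lookup from the option value to a predictor class. This is a hedged sketch, not the module's code: the class names and the two option values come from the commit message, while the empty class bodies, the `PREDICTORS` table, and the `make_predictor` helper are hypothetical.

```python
class PSDiskFailurePredictor:
    """Stand-in for the ProphetStor predictor (body omitted in this sketch)."""


class RHDiskFailurePredictor:
    """Stand-in for the Red Hat predictor (body omitted in this sketch)."""


# Hypothetical mapping from the `predictor_model` option value to a class.
PREDICTORS = {
    'prophetstor': PSDiskFailurePredictor,
    'redhat': RHDiskFailurePredictor,
}


def make_predictor(predictor_model):
    """Instantiate the predictor named by the option, rejecting unknown values."""
    try:
        return PREDICTORS[predictor_model]()
    except KeyError:
        raise ValueError(
            'unknown predictor_model: {!r}'.format(predictor_model)) from None
```

Keeping the mapping in one table means an unsupported value fails early with a clear error instead of silently falling back to one of the models.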
mgr/diskprediction_local: Updated prediction models to use only supported python packages.
Removed non-supported python packages from requirements.txt
Added scikit-learn based models, removed rgf-python based models.
Updated config.json and DiskPredictor.__preprocess for the same.
Also added manufacturer as argument to DiskPredictor.__preprocess
Updated manufacturer lookup - first check if available as smartctl field,
if not then try to infer from model name.
Updated predicted class to be the prediction for the most recent day in
time series data given.
Updated naming convention from "preprocessor" to "scaler".
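The manufacturer lookup order described above (explicit field first, then inference from the model name) might be sketched as follows. The dictionary keys `vendor` and `model_name`, the prefix table, and the function name `get_manufacturer` are assumptions for illustration; the actual mapping used by the module is not shown in this log.

```python
# Hypothetical prefix table used only when no explicit vendor is reported.
MODEL_PREFIXES = {
    'st': 'seagate',
    'wdc': 'wdc',
    'hgst': 'hgst',
    'toshiba': 'toshiba',
}


def get_manufacturer(smart_data):
    """Prefer an explicit vendor field; otherwise infer from the model name."""
    vendor = smart_data.get('vendor')
    if vendor:
        return vendor.strip().lower()
    model = smart_data.get('model_name', '').strip().lower()
    for prefix, manufacturer in MODEL_PREFIXES.items():
        if model.startswith(prefix):
            return manufacturer
    # Neither source identified the manufacturer.
    return None
```

Returning `None` for an unrecognized disk leaves the caller to decide whether to skip prediction or fall back to a generic model.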
mgr/diskprediction_local: Replaced old models and updated predictor.
ProphetStor models are replaced with in-house developed models.
Preprocessors are also stored in addition to the prediction models.
Objects are now stored using joblib instead of pickle, as recommended by
scikit-learn docs.
"Manufacturer-specific" models are used instead of "best-feature-match"
models; i.e., instead of models being trained (presumably) just based on what
features are available, models have been trained for each manufacturer.
This is because of variation in the meaning and availability of SMART
attributes across manufacturers.
Updated config.json, requirements.txt, and DiskFailurePredictor for these changes.
Thomas Bechtold [Thu, 7 Nov 2019 09:50:04 +0000 (10:50 +0100)]
ceph-daemon: Only run in the __main__ scope
That makes unit testing easier to set up because the main code path does
not run when ceph-daemon gets imported. Instead it only runs when the file
is executed.
For that, the parser was also moved into a function instead of living at
module level.
Signed-off-by: Thomas Bechtold <tbechtold@suse.com>
Merge pull request #31070 from sebastian-philipp/dashbaord-run-backend-zsh
mgr/dashboard: Fix zsh support in run-backend-api-tests.sh
Reviewed-by: Alfonso Martínez <almartin@redhat.com>
Reviewed-by: Laura Paduano <lpaduano@suse.com>
Reviewed-by: Patrick Seidensal <pseidensal@suse.com>