Kefu Chai [Wed, 10 Feb 2021 07:49:03 +0000 (15:49 +0800)]
mgr/prometheus: add prometheus to flake8 test
for the explanation why we should add a line break before a binary
operator. see
https://www.python.org/dev/peps/pep-0008/#should-a-line-break-before-or-after-a-binary-operator
Sage Weil [Mon, 15 Mar 2021 22:20:25 +0000 (17:20 -0500)]
mgr/cephadm: ensure mgr metadata is not none
This hunk is from aca45d7d08fd8c3f32849331eba4620e2726282a, a much
larger change in master that added type annotations all over the place.
It just brings src/pybind/mgr/cephadm fully in sync with master.
Sage Weil [Mon, 15 Mar 2021 16:55:36 +0000 (11:55 -0500)]
mgr/cephadm: tolerate failure to update daemon caps
If we're upgrading from 15.2.0, we may fail to update caps. Instead of
failing the upgrade hard, warn to the log and continue. This is less
than ideal, but the caps will get corrected the next time the daemon is
redeployed on the next upgrade, and most likely the previous caps will
continue to work (given they were presumably working before the upgrade).
Sage Weil [Fri, 12 Mar 2021 16:15:35 +0000 (10:15 -0600)]
mgr/cephadm: fix get_keyring_with_caps
1- Pass caps to 'auth get-or-create'
2- Only try 'auth caps' if the get-or-create failed
Note that the 'auth caps' step can fail if upgrading from 15.2.0 since
'profile mgr' didn't include 'auth caps' until 15.2.1. We're not
addressing that for now...
Fixes: 7c0d532f3a4839f4199a13773fb5fa8b6fb3f183 Signed-off-by: Sage Weil <sage@newdream.net>
(cherry picked from commit 6127d7f20bc8a6ad02d8ea144584eaf2bfc9590e)
Sage Weil [Wed, 10 Mar 2021 22:31:31 +0000 (17:31 -0500)]
python-common: count-per-host must be combined with label or hosts or host_pattern
I think this is better for the same reason we made PlacementSpec() not
mean 'all hosts' by default. If you really want N daemons for every host
in the cluster, be specific with 'count-per-host:2 *'.
Sage Weil [Wed, 10 Mar 2021 00:01:39 +0000 (19:01 -0500)]
mgr/cephadm/schedule: calculate additions/removals in place()
We already have to examine existing daemons to choose the placements.
There is no reason to make the caller call another method to (re)calculate
what the net additions and removals are.
Sage Weil [Sat, 6 Mar 2021 15:10:42 +0000 (10:10 -0500)]
mgr/cephadm/schedule: respect count-per-host
In the no-count cases, our job is simple: we have a set of hosts specified
via some other means (label, filter, explicit list) and simply need to
do N instances for each of those hosts.
Sage Weil [Thu, 4 Mar 2021 23:30:35 +0000 (18:30 -0500)]
mgr/cephadm: do not worry about even # of monitors
Ceph works just fine with an even number of monitors.
Upside: more copies of critical cluster data
Downside: we can tolerate the same number of down mons as N-1, and now
we are slightly more likely to have a failing mon because we have 1 more
that might fail.
On balance it's not clear that have one fewer mon is any better, so avoid
the confusion and complexity of second-guessing ourselves.
Adam King [Wed, 24 Feb 2021 21:13:01 +0000 (16:13 -0500)]
mgr/cephadm: update caps if necessary when getting keyring
If the caps change from the old version to the new one it causes
issues in the upgrade. This allows the caps to be updated. Currently
only seeing this with iscsi but changing it for other as a precaution
Sebastian Wagner [Mon, 22 Feb 2021 14:12:39 +0000 (15:12 +0100)]
cephadm: Get rid of injected_argv
Removed the injected_argv parameter and the injection of code in the cephadm
script we send to hosts.
Now the script is copied and after that we execute the cephadm command.
I would like to copy it only one time (when adding new hosts) but this will be
part of a future PR, together with other prs to:
- Introduce cephadm version
- Get rid of packaged/root mode
- Use pex or eggs
Signed-off-by: Juan Miguel Olmo MartÃnez <jolmomar@redhat.com> Signed-off-by: Sebastian Wagner <sebastian.wagner@suse.com>
(cherry picked from commit 2142dcfc2bac3159b7a24f92ae75df2a14599377)
Sage Weil [Mon, 8 Mar 2021 14:49:14 +0000 (09:49 -0500)]
python-common: continue to allow RGWSpec(realm=r,zone=z)
This is for backward compatibility: an octopus spec yaml can still be
applied to an existing cluster. Note that it might not work on a new
cluster, since cephadm no longer tries to create the realm or zone if
they don't exist.
Sage Weil [Fri, 5 Mar 2021 21:30:20 +0000 (16:30 -0500)]
qa/tasks/cephadm: drop realm.zone convention for rgw
Note that cephadm.py will no longer do anything with rgw realms and
zones. That means that the setup of rgw roles here is only useful
for the default zone and a non-multisite config.
Zac Dover [Tue, 2 Mar 2021 18:16:27 +0000 (04:16 +1000)]
doc/cephadm: add prompts to adoption.rst
This PR formats the bash prompts. It also formats the
bash output so that it appears in the correct (easily
copy-and-pasteable) format. This PR will be followed by
a grammar-improving PR, but this PR is just a
formatting PR.
Zac Dover [Fri, 5 Mar 2021 18:37:01 +0000 (04:37 +1000)]
doc/cephadm: rewrite part of adoption.rst
This PR adds a bit of explanatory text to the
section of the cephadm docs that explains how
to convert Ceph deployments that were not deployed
with cephadm to deployments that can be managed by
the cephadm command line tool.
This is just so we can load up a stored spec after upgrade. We'll silently
drop it, since we have the service_id, and this was only used to generate
that anyway.
Sage Weil [Mon, 15 Mar 2021 22:05:45 +0000 (17:05 -0500)]
mgr/orchestrator: drop $realm.$zone naming convention
- Let users name the rgw service(s) whatever they like
- Make the rgw_zone and rgw_realm arguments optional, so that they can
omit them and let radosgw start up in the generic single-cluster
configuration (whichk, notably, has no realm).
- Continue to set the rgw_realm and rgw_zone options for the daemon(s),
but only when those values are specified.
- Adjust the CLI accordingly. Drop the subcluster argument, which was
only used to generate a service_id.
- Adjust rook. This is actually a simplification and improved mapping onto
the rook CRD, *except* that for octopus we enforced the realm.zone
naming *and* realm==zone. I doubt a single user actually did this
so it is not be worth dealing with, but we need a special case for
where there is a . in the service name.
Sage Weil [Fri, 5 Mar 2021 18:13:56 +0000 (13:13 -0500)]
mgr/cephadm: rgw: do not mess with realm configuration
It is simpler to consider this out of scope for the orchestrator. The
user should set up their multisite realms/zones before deploying the
daemons (or the daemons will not start). In the future we can wrap this
with a more friendly tool, perhaps.
Paul Cuzner [Fri, 19 Feb 2021 02:09:02 +0000 (15:09 +1300)]
mgr/cephadm:Drop active healthcheck during a disable request
The healthcheck could already be active when the admin attempts
to disable it. This patch removes the related healthcheck if it's set
during a config-check disable request.
Paul Cuzner [Thu, 18 Feb 2021 23:50:22 +0000 (12:50 +1300)]
mgr/cephadm:skip an alert if the linkspeed is better than most
The logic was issuing a healthcheck if the linkspeed was different
to the majority. But if the difference is good (i.e. better!) we should
not be raising a healthcheck
Paul Cuzner [Thu, 18 Feb 2021 23:23:13 +0000 (12:23 +1300)]
mgr/cephadm:Minor updates to address review comments
Changes to reflect review comments
- picked up on subscribed = unknown state
- using get_daemon_types() call
- use log.exception more
- changed logic and errors from the public_network check
Paul Cuzner [Thu, 18 Feb 2021 03:17:22 +0000 (16:17 +1300)]
mgr/cephadm:Multiple updates related to the addition of the CLI
Some changes needed to support the introduction of the CLI commands
used to manage the cephadm checks. For example, the main Cephadm
check class now interacts with the keystore directly to determine
status, and provides support for commands like ls to list the
check definitions. In addition the main class now handles existing
configuration checks and ensure that the stored state in the keystore
matches the checks defined by the module
Paul Cuzner [Thu, 18 Feb 2021 03:11:55 +0000 (16:11 +1300)]
mgr/cephadm:Moved 'ownership' of the checker to cephadm
Initial implementation used the Serve class as the owner of the
configuration checker. This patch moves the checker up to the
cephadm module itself, to make the CLI command logic cleaner
Paul Cuzner [Tue, 16 Feb 2021 23:45:38 +0000 (12:45 +1300)]
mgr/cephadm:Adds unit tests for the CephadmConfigChecks class
Add unit tests to test suite to verify functionality. The unit tests use
a sample host definition and scale that to simulate a cluster to run
the tests against
Paul Cuzner [Wed, 3 Mar 2021 00:12:05 +0000 (13:12 +1300)]
mgr/cephadm:Document the intergration with libstoragemgmt
Updates the cephadm osd documentation to include details about
including integration with libstoragement - including the potential
hardware issues that may arise.
Paul Cuzner [Mon, 22 Feb 2021 01:35:03 +0000 (14:35 +1300)]
mgr/cephadm:Enable cephadm device scan to use LSM
Using libstoragemgmt (LSM) in ceph-volume was disabled by default,
(nov 2020) which meant cephadm's inventory never had a way to
request the LSM data. This patch adds a module option called
'device_enhanced_scan' (bool), that if set will append the
--with-lsm parameter to the ceph-volume inventory call.
Daniel Pivonka [Mon, 8 Mar 2021 19:04:29 +0000 (14:04 -0500)]
mgr/cephadm: prevent traceback when invalid osd id passed to 'orch osd rm stop'
orch osd rm exepcts a str that can be converted to an int passed to it
if the user passed something that cant be converted it shows a traceback
catching the ValueError prevents this traceback.
Sage Weil [Tue, 9 Mar 2021 18:15:20 +0000 (12:15 -0600)]
mgr/cephadm: do not prime service cache on reconfig
Ceph daemon reconfig does not need any daemon state refresh since we don't
do a restart--we just rewrite the ceph.conf. This also avoids priming
our cache with a 'starting' state when the daemon wasn't touched.