Sage Weil [Wed, 17 Mar 2021 19:39:15 +0000 (15:39 -0400)]
mgr/cephadm: stop conflicting daemon when deploying to a specific port
If we are deploying a daemon to bind to a specific port and there is
an existing daemon we are removing that also binds to that port, stop
it first, unless the two daemons bind to different IPs.
This resolves the case where daemons bind to * and we redeploy with a
subnet to bind to. It would eventually converge before, but would
throw a bind error in the process and take longer.
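A minimal sketch of the conflict rule above, assuming a binding is modelled as an `(ip, port)` pair where an `ip` of `None` stands for a wildcard (`*`) bind. The helper `binds_conflict()` is hypothetical and is not the cephadm implementation.

```python
from typing import Optional, Tuple

# Hypothetical model: (ip, port); ip=None means "bound to *" (all addresses).
Binding = Tuple[Optional[str], int]


def binds_conflict(new: Binding, old: Binding) -> bool:
    """Return True if the old daemon must be stopped before deploying the new one."""
    new_ip, new_port = new
    old_ip, old_port = old
    if new_port != old_port:
        return False  # different ports never collide
    if new_ip is None or old_ip is None:
        return True  # a wildcard bind overlaps with every specific IP
    return new_ip == old_ip  # two specific IPs collide only if identical


# Redeploying from * to a subnet IP on the same port: stop the old daemon first.
print(binds_conflict(("10.1.2.5", 80), (None, 80)))        # True
# Both bound to different specific IPs: nothing needs to be stopped.
print(binds_conflict(("10.1.2.5", 80), ("10.1.2.6", 80)))  # False
```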
Sage Weil [Thu, 11 Mar 2021 23:47:24 +0000 (18:47 -0500)]
mgr/cephadm/schedule: choose an IP from a subnet list
Choose an IP from the subnet list provided by the ServiceSpec.
A few caveats:
- we ignore hosts that don't have IPs in the given subnet
- the subnet matching is STRICT. That is, the CIDR name has to exactly
match what is configured on the host. That means you can't just say 10/8
to match any 10.whatever address; you need the exact network on the host
(e.g., 10.1.2.0/24).
- If you modify a ServiceSpec and change the networks when there are
already deployed daemons, we will try to deploy the new instances on
the same ports but bound to a specific IP instead of *, which will fail.
You need to remove the service first, or remove the old daemons manually
so that creating new ones will succeed.
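A rough sketch of the strict matching described above. It assumes, for illustration only, that a host's networks are available as a dict mapping a CIDR string to the host's IPs in that network, and that subnets are tried in spec order; `choose_ip()` is a hypothetical name, not the scheduler's API.

```python
from typing import Dict, List, Optional


def choose_ip(host_networks: Dict[str, List[str]], subnets: List[str]) -> Optional[str]:
    """Return the first host IP found in one of the spec's subnets.

    Matching is strict: the subnet string must equal a CIDR key configured
    on the host. No containment check (10.0.0.0/8 containing 10.1.2.0/24)
    is attempted.
    """
    for subnet in subnets:
        ips = host_networks.get(subnet)
        if ips:
            return ips[0]
    return None  # no IP in the given subnets: this host is ignored


host = {"10.1.2.0/24": ["10.1.2.5"], "192.168.0.0/16": ["192.168.3.7"]}
print(choose_ip(host, ["10.1.2.0/24"]))  # 10.1.2.5
print(choose_ip(host, ["10.0.0.0/8"]))   # None: a broader CIDR does not match
```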
Sage Weil [Tue, 16 Mar 2021 16:58:03 +0000 (12:58 -0400)]
mgr/cephadm: rgw: drop .crt extension when storing cert in config-key
This will not affect upgrades since we will run the config() method before
prepare_create() any time we deploy a new daemon on this service, which
means we'll re-store the cert in the new key location before we generate
a new rgw_frontends option that references it.
Sage Weil [Thu, 11 Mar 2021 23:40:22 +0000 (18:40 -0500)]
mgr/cephadm/schedule: match placement ip only combination with port
1- We only have an IP to bind to if we also have a port, and
2- If we do, we want an exact match: if the DaemonPlacement has ip of
None, then the DaemonDescription should have None too.
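The two matching rules above can be sketched as follows. `Placement` and `Daemon` are hypothetical stand-ins for DaemonPlacement and DaemonDescription with only the fields needed here; the real classes carry more (daemon type, name, and so on).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Placement:
    # Hypothetical stand-in for DaemonPlacement (single optional port).
    hostname: str
    ip: Optional[str] = None
    port: Optional[int] = None


@dataclass
class Daemon:
    # Hypothetical stand-in for DaemonDescription.
    hostname: str
    ip: Optional[str] = None
    ports: List[int] = field(default_factory=list)


def matches_daemon(p: Placement, d: Daemon) -> bool:
    if p.hostname != d.hostname:
        return False
    if p.port is None:
        # Rule 1: without a port there is no IP to bind to, so ignore ip.
        return True
    if d.ports != [p.port]:
        return False
    # Rule 2: exact match, so a placement ip of None only matches ip None.
    return p.ip == d.ip


print(matches_daemon(Placement("h1", None, 80), Daemon("h1", None, [80])))        # True
print(matches_daemon(Placement("h1", None, 80), Daemon("h1", "10.1.2.5", [80])))  # False
print(matches_daemon(Placement("h1"), Daemon("h1", "10.1.2.5", [80])))            # True
```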
Sage Weil [Mon, 15 Mar 2021 22:58:15 +0000 (18:58 -0400)]
mgr/cephadm: fall back to service spec port if none on DaemonDescription
For an RGW instance that was deployed by an older version, we won't have
IP or port metadata in the DaemonDescription. In that case, fall back
to the port in the ServiceSpec. This should be safe since any such
instance will only have 1 daemon per host.
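A minimal sketch of the fallback, assuming ports are passed around as plain lists; `daemon_ports()` is a hypothetical helper, not a cephadm function.

```python
from typing import List, Optional


def daemon_ports(dd_ports: Optional[List[int]], spec_port: Optional[int]) -> List[int]:
    """Ports for a daemon, falling back to the ServiceSpec port.

    Daemons deployed by older versions carry no port metadata, so use the
    spec's port instead. This is only safe because such daemons were
    deployed at most one per host.
    """
    if dd_ports:
        return dd_ports
    return [spec_port] if spec_port is not None else []


print(daemon_ports(None, 8080))  # [8080]: legacy daemon, use the spec port
print(daemon_ports([80], 8080))  # [80]: recorded metadata wins
```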
Sage Weil [Mon, 15 Mar 2021 19:34:13 +0000 (15:34 -0400)]
mgr/cephadm: fix redeploy when daemons have ip:port
The _daemon_action() method can be called directly by upgrade and by
the 'orch daemon <action> <name>' commands. When this happens, construct
a CephadmDaemonDeploySpec from the DaemonDescription that includes the
metadata we assigned when the service was created: the IP and port(s).
This fixes upgrade and the CLI.
Sage Weil [Wed, 10 Mar 2021 19:58:09 +0000 (14:58 -0500)]
mgr/cephadm: remove ssl_frontend_ssl_key from RGWSpec
Since this didn't work anyway, stop collecting and passing through the
private key portion of the certificate. Instead, users should include
both the certificate and the key in the certificate option. This is simpler, and provides consistency
across civetweb and beast rgw backends (for whatever that is worth).
Sage Weil [Wed, 10 Mar 2021 19:50:30 +0000 (14:50 -0500)]
mgr/cephadm: fix beast private key config option
This has always been broken. However, beast SSL will also accept a pem
(cert + key) via the ssl_certificate option, so any existing non-broken
users (if they exist) must be using that.
Sage Weil [Wed, 10 Mar 2021 18:59:18 +0000 (13:59 -0500)]
mgr/cephadm/schedule: only 1 port in DaemonPlacement
It would be weird to dynamically number multiple ports (although doable).
But until we have plans to support something like that, no need to handle
it here.
Sage Weil [Wed, 10 Mar 2021 17:24:32 +0000 (12:24 -0500)]
mgr/cephadm/schedule: return DaemonPlacement instead of HostPlacementSpec
Create a new type for the result of scheduling/place(). Include new fields
like ip:port. Introduce a matches_daemon() to see whether an existing
daemon matches a scheduling slot.
Adam King [Fri, 5 Mar 2021 22:01:12 +0000 (17:01 -0500)]
mgr/cephadm: add info to 'ceph orch upgrade status' in cephadm
Properly fill in the 'services_complete' field; add info messages to the 'message' field,
such as which daemon type is being upgraded or whether we're pulling an image; and add a
'progress' field that shows how many daemons have been upgraded so far.
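A rough sketch of how the 'services_complete' and 'progress' fields could be derived. It assumes, purely for illustration, a dict mapping daemon type to the version string of each daemon and compares against a target version; what cephadm actually compares (e.g. image digests) is not shown here, and `upgrade_status()` is a hypothetical name.

```python
from typing import Dict, List, Union


def upgrade_status(daemons: Dict[str, List[str]],
                   target: str) -> Dict[str, Union[List[str], str]]:
    """Hypothetical summary; input maps daemon type -> versions of its daemons."""
    total = sum(len(versions) for versions in daemons.values())
    done = sum(v == target for versions in daemons.values() for v in versions)
    complete = [dtype for dtype, versions in daemons.items()
                if all(v == target for v in versions)]
    return {
        "services_complete": complete,
        "progress": f"{done}/{total} daemons upgraded",
    }


status = upgrade_status({"mgr": ["16.2.0", "16.2.0"],
                         "mon": ["16.2.0", "15.2.9", "15.2.9"]}, "16.2.0")
print(status)
# {'services_complete': ['mgr'], 'progress': '3/5 daemons upgraded'}
```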
Sage Weil [Tue, 16 Mar 2021 20:14:19 +0000 (15:14 -0500)]
Merge PR #40135 into pacific
* refs/pull/40135/head:
pybind/mgr: correct a MgrModule annotation
mgr/ceph_module: add type annotation to BaseMgrModule
mgr/prometheus: fix warning of possibly unbound variables
mgr/prometheus: flake8 cleanups
mgr/prometheus: fix import failure (flake8)
mgr/prometheus: add type annotations
mgr/prometheus: raise at seeing unknown status
mgr/prometheus: implement command using CLIReadCommand
mgr/{prometheus,telemetry}: appease mypy
mgr/prometheus: add prometheus to flake8 test
mgr/prometheus: escape special chars using r-string
pybind/mgr/prometheus: PEP8 cleanups
pybind/mgr/prometheus: add typing annotations
mgr/prometheus: introduce metric for collection time
mgr/cephadm: fix 'auth caps' fallback
mgr/cephadm: ensure mgr metadata is not none
qa/suites/rados/cephadm: add back centos+rhel with kubic podman
qa/suites/rados/cephadm/upgrade: deploy a legacy r.z-style rgw
qa/suites/rados/cephadm/upgrade: start at 15.2.9 to test iscsi upgrade
qa/tasks/cephadm.py: don't set mgr count to +1
doc/cephadm: add note about deprecation of NFSv3
doc/cephadm: remove step to restart the mgr
doc/cephadm: use `reconfig` instead of `redeploy`
doc/cephadm: update custom j2 config-key name
doc/cephadm: use 'apt' to install cephadm on Ubuntu
mgr/cephadm: remove duplicate labels when adding a host
mgr/cephadm: tolerate failure to update daemon caps
mgr/cephadm: fix get_keyring_with_caps
python-common: fix PlacementSpec target size method
python-common: count-per-host must be combined with label or hosts or host_pattern
mgr/cephadm: handle bare 'count-per-host:NNN', fix comments
mgr/cephadm/schedule: remove Scheduler abstraction (for now at least)
mgr/cephadm/schedule: calculate additions/removals in place()
mgr/cephadm/schedule: allow colocation of certain daemon types
mgr/cephadm/schedule: shuffle candidates, not final placements
mgr/cephadm/schedule: pass per-type allow_colo to the scheduler
mgr/cephadm/services/cephadmservice: fix typo
mgr/cephadm/schedule: pass daemons, not get_daemons_func
mgr/cephadm: use local var
mgr/cephadm/schedule: move host filtering into get_candidates()
python-common/ceph/deployment/service_spec: disallow max-per-host + explicit placement
mgr/cephadm/schedule: respect count-per-host
mgr/cephadm: adjust deployment logic to allow multiple daemons per host
python-common: add count-per-host to PlacementSpec
mgr/cephadm: do not worry about even # of monitors
mgr/cephadm: add iscsi and nfs to upgrade
mgr/cephadm: update caps if necessary when getting keyring
mgr/cephadm: add cephfs-mirror to CEPH_UPGRADE_ORDER
cephadm: Add cephfs-mirror
qa/cephadm: Add cephfs-mirror test
qa/tasks: some type annotations
mgr/orch: Add cephfs-mirror to enum
mgr/cephadm: Add CephfsMirrorService
mgr/orch: replace def add_{type}(...) with generic add_daemon()
mgr/cephadm: drop `create_func` arg from _add_daemon
mgr/cephadm: move CephadmExporter to new module
mgr/cephadm: fix CephadmExporter deployment
cephadm: exporter: use os.path.realpath(__file__)
mgr/cephadm: root mode: call (and deploy) cephadm binary
cephadm: Get rid of injected_argv
cephadm: Make path to cephadm binary unique
python-common: continue to allow RGWSpec(realm=r,zone=z)
PendingReleaseNotes: note changes in cephadm rgw behavior
qa/tasks/cephadm: drop realm.zone convention for rgw
doc: update docs
doc/cephadm: rewrite "adoption process"
doc/cephadm: rewrite "preparation" in adoption.rst
doc/cephadm: add prompts to adoption.rst
doc/cephadm: rewrite part of adoption.rst
python-common/ceph/deployment: RGWSpec: accept (and drop) subcluster arg
mgr/orchestrator: drop $realm.$zone naming convention
mgr/cephadm: rgw: do not mess with realm configuration
mgr/cephadm:Document the cephadm config-check feature
mgr/cephadm:fix to resolve mypy issue
mgr/cephadm:add unit test for the lookup_check helper
mgr/cephadm:Drop active healthcheck during a disable request
mgr/cephadm:Added helper function to return a specific healthcheck
mgr/cephadm:unit test added for nics better than most
mgr/cephadm:skip an alert if the linkspeed is better than most
mgr/cephadm:fix mypy warning
mgr/cephadm:Remove check from ceph metadata gathering
mgr/cephadm:Add unit test for hosts without public network NIC
mgr/cephadm:Minor updates to address review comments
mgr/cephadm:Added CLI interface for the configuration checker
mgr/cephadm:Multiple updates related to the addition of the CLI
mgr/cephadm:Moved 'ownership' of the checker to cephadm
mgr/cephadm:Unit tests updated to account for upgrades
mgr/cephadm:Updates to CephadmConfigChecks class
mgr/cephadm:Adds unit tests for the CephadmConfigChecks class
mgr/cephadm:add module option to enable configuration checks
mgr/cephadm:added ceph version consistency check
mgr/cephadm: added config checker to main serve loop
mgr/cephadm: adding check logic
mgr/cephadm: resolve rebase conflicts
mgr/cephadm:Document the integration with libstoragemgmt
mgr/cephadm:Enable cephadm device scan to use LSM
mgr/cephadm: prevent traceback when invalid osd id passed to 'orch osd rm stop'
mgr/cephadm: do not prime service cache on reconfig
mgr/cephadm/osd: PEP-8 fix
mgr/cephadm: Activate existing OSDs
mgr/cephadm: osd: Use _run_cephadm_json()
mgr/cephadm: document ok_to_stop output argument for clarity
mgr/DaemonServer: make warning language a bit friendlier
mgr/cephadm/upgrade: improve language a bit
mgr/cephadm/upgrade: restart multiple osds at once
mgr/cephadm: gather other osds that are safe to stop
mgr/cephadm: optional pass 'known' through to ok_to_stop
mgr/cephadm/upgrade: log start/stop/pause/resume
mgr/cephadm: add CEPHADM_STRAY_DAEMON unittest
mgr/cephadm: alias rgw-nfs -> nfs
qa/tasks/cephadm: remove mirror code
cephadm: fixup `alrady` -> `already`
cephadm: Change outer quotes to avoid escaping inner quotes (Q003)
cephadm: Remove bad quotes from multiline string (Q001)
cephadm: Remove bad quotes (Q000)
cephadm: introduce flake8-quotes
cephadm: line break after binary operator (W504)
cephadm: blank line contains whitespace (W293)
cephadm: trailing whitespace (W291)
cephadm: local variable 'e' is assigned to but never used (F841)
cephadm: 'select' imported but unused (F401)
cephadm: ambiguous variable name 'l' (E741)
cephadm: do not use bare 'except' (E722)
cephadm: statement ends with a semicolon (E703)
cephadm: module level import not at top of file (E402)
cephadm: expected 1 blank line before a nested definition (E306)
cephadm: expected 2 blank lines after end of function or class (E305)
cephadm: too many blank lines (E303)
cephadm: expected 2 blank lines, found 1 (E302)
cephadm: expected 1 blank line, found 0 (E301)
cephadm: too many leading '#' for block comment (E266)
cephadm: block comment should start with '# ' (E265)
cephadm: at least two spaces before inline comment (E261)
cephadm: unexpected spaces around keyword / parameter equals (E251)
cephadm: multiple spaces after ',' (E241)
cephadm: missing whitespace after ':' (E231)
cephadm: missing whitespace around arithmetic operator (E226)
cephadm: missing whitespace around operator (E225)
cephadm: whitespace before ':' (E203)
cephadm: whitespace after '{' (E201)
cephadm: continuation line unaligned for hanging indent (E131)
cephadm: continuation line under-indented for visual indent (E128)
cephadm: continuation line over-indented for visual indent (E127)
cephadm: continuation line over-indented for hanging indent (E126)
cephadm: continuation line with same indent as next logical line (E125)
cephadm: closing bracket does not match visual indentation (E124)
cephadm: ... does not match indentation of opening bracket's line (E123)
cephadm: continuation line missing indentation or outdented (E122)
cephadm: continuation line under-indented for hanging indent (E121)
cephadm: over-indented (E117)
cephadm: introduce flake8
Reviewed-by: Juan Miguel Olmo <jolmomar@redhat.com>
Kefu Chai [Wed, 10 Feb 2021 07:49:03 +0000 (15:49 +0800)]
mgr/prometheus: add prometheus to flake8 test
For an explanation of why we should add a line break before a binary
operator, see
https://www.python.org/dev/peps/pep-0008/#should-a-line-break-before-or-after-a-binary-operator
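For reference, a minimal illustration of the style PEP 8 recommends, adapted from its own example with made-up numbers so it runs: each continuation line starts with the operator, so each operand's role is visible at the left margin. pycodestyle's W504 (line break after binary operator) does not fire on this form.

```python
gross_wages = 5000
taxable_interest = 120
dividends = 300
ira_deduction = 200
student_loan_interest = 150

# Break *before* each binary operator, as PEP 8 recommends.
income = (gross_wages
          + taxable_interest
          + dividends
          - ira_deduction
          - student_loan_interest)
print(income)  # 5070
```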
Sage Weil [Mon, 15 Mar 2021 22:20:25 +0000 (17:20 -0500)]
mgr/cephadm: ensure mgr metadata is not none
This hunk is from aca45d7d08fd8c3f32849331eba4620e2726282a, a much
larger change in master that added type annotations all over the place.
It just brings src/pybind/mgr/cephadm fully in sync with master.
Sage Weil [Mon, 15 Mar 2021 16:55:36 +0000 (11:55 -0500)]
mgr/cephadm: tolerate failure to update daemon caps
If we're upgrading from 15.2.0, we may fail to update caps. Instead of
failing the upgrade hard, warn to the log and continue. This is less
than ideal, but the caps will get corrected the next time the daemon is
redeployed on the next upgrade, and most likely the previous caps will
continue to work (given they were presumably working before the upgrade).
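A minimal sketch of the "warn and continue" behaviour, assuming a hypothetical `set_caps` callable that stands in for issuing 'auth caps' and raises on failure; the entity name and caps in the usage line are illustrative only.

```python
import logging
from typing import Callable, List

logger = logging.getLogger("cephadm-sketch")


def update_caps_tolerantly(entity: str, caps: List[str],
                           set_caps: Callable[[str, List[str]], None]) -> bool:
    """Try to update an entity's caps; warn and carry on if that fails.

    Returns True on success, False if the failure was tolerated.
    """
    try:
        set_caps(entity, caps)
        return True
    except Exception as e:
        # Do not fail the upgrade: the old caps most likely still work,
        # and the next redeploy gets another chance to fix them.
        logger.warning("unable to update caps for %s: %s", entity, e)
        return False


def denied(entity: str, caps: List[str]) -> None:
    # Simulates a mgr (e.g. one upgraded from 15.2.0) that may not run 'auth caps'.
    raise PermissionError("access denied")


print(update_caps_tolerantly("client.example", ["mon", "allow r"], denied))  # False
```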
Sage Weil [Fri, 12 Mar 2021 16:15:35 +0000 (10:15 -0600)]
mgr/cephadm: fix get_keyring_with_caps
1- Pass caps to 'auth get-or-create'
2- Only try 'auth caps' if the get-or-create failed
Note that the 'auth caps' step can fail if upgrading from 15.2.0 since
'profile mgr' didn't include 'auth caps' until 15.2.1. We're not
addressing that for now...
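The two-step flow above might look roughly like this. `run` is a hypothetical callable standing in for the mgr's mon command interface, assumed to take a command dict and return `(retval, stdout, stderr)`. The sketch relies on 'auth get-or-create' failing when the entity already exists with different caps, which is what makes the 'auth caps' fallback necessary.

```python
from typing import Callable, Dict, List, Tuple

MonCommand = Callable[[Dict[str, object]], Tuple[int, str, str]]


def get_keyring_with_caps(run: MonCommand, entity: str, caps: List[str]) -> str:
    # 1. Ask for the key with the desired caps in a single call.
    ret, keyring, err = run({"prefix": "auth get-or-create",
                             "entity": entity, "caps": caps})
    if ret != 0:
        # 2. Only if that failed (e.g. caps mismatch): fix caps, then fetch.
        ret, _, err = run({"prefix": "auth caps", "entity": entity, "caps": caps})
        if ret != 0:
            raise RuntimeError(f"unable to update caps for {entity}: {err}")
        ret, keyring, err = run({"prefix": "auth get", "entity": entity})
        if ret != 0:
            raise RuntimeError(f"unable to fetch keyring for {entity}: {err}")
    return keyring
```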
Fixes: 7c0d532f3a4839f4199a13773fb5fa8b6fb3f183
Signed-off-by: Sage Weil <sage@newdream.net>
(cherry picked from commit 6127d7f20bc8a6ad02d8ea144584eaf2bfc9590e)
Sage Weil [Wed, 10 Mar 2021 22:31:31 +0000 (17:31 -0500)]
python-common: count-per-host must be combined with label or hosts or host_pattern
I think this is better for the same reason we made PlacementSpec() not
mean 'all hosts' by default. If you really want N daemons for every host
in the cluster, be specific with 'count-per-host:2 *'.
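A minimal sketch of the validation rule, with a hypothetical function name and keyword arguments mirroring the PlacementSpec concepts mentioned above (label, hosts, host_pattern); it is not the python-common code.

```python
from typing import List, Optional


def validate_count_per_host(count_per_host: Optional[int],
                            label: Optional[str] = None,
                            hosts: Optional[List[str]] = None,
                            host_pattern: Optional[str] = None) -> None:
    """Hypothetical check: refuse count-per-host without an explicit host set."""
    if count_per_host is not None and not (label or hosts or host_pattern):
        raise ValueError(
            "count-per-host must be combined with label, hosts or host_pattern")


validate_count_per_host(2, host_pattern="*")  # 'count-per-host:2 *': accepted
try:
    validate_count_per_host(2)                # bare count-per-host: rejected
except ValueError as e:
    print(e)
```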
Sage Weil [Wed, 10 Mar 2021 00:01:39 +0000 (19:01 -0500)]
mgr/cephadm/schedule: calculate additions/removals in place()
We already have to examine existing daemons to choose the placements.
There is no reason to make the caller call another method to (re)calculate
what the net additions and removals are.
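The idea reduces to two set differences computed alongside the placement. In this hypothetical sketch, slots and existing daemons are both modelled as plain "host:port" strings; the real place() works on richer objects.

```python
from typing import Set, Tuple


def place(wanted: Set[str], existing: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Hypothetical: compute net additions and removals in the same pass."""
    to_add = wanted - existing      # slots with no matching daemon yet
    to_remove = existing - wanted   # daemons that no longer have a slot
    return to_add, to_remove


add, remove = place({"host1:80", "host2:80"}, {"host2:80", "host3:80"})
print(sorted(add), sorted(remove))  # ['host1:80'] ['host3:80']
```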
Sage Weil [Sat, 6 Mar 2021 15:10:42 +0000 (10:10 -0500)]
mgr/cephadm/schedule: respect count-per-host
In the no-count cases, our job is simple: we have a set of hosts specified
via some other means (label, filter, explicit list) and simply need to
deploy N instances on each of those hosts.
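In that no-count case, the slots amount to each selected host repeated N times. A hypothetical sketch, with a per-host rank added so slots on the same host stay distinct:

```python
from typing import List, Tuple


def expand_per_host(hosts: List[str], count_per_host: int) -> List[Tuple[str, int]]:
    """Hypothetical: N slots for each host selected by label/filter/list."""
    return [(host, rank) for host in hosts for rank in range(count_per_host)]


print(expand_per_host(["a", "b"], 2))
# [('a', 0), ('a', 1), ('b', 0), ('b', 1)]
```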
Sage Weil [Thu, 4 Mar 2021 23:30:35 +0000 (18:30 -0500)]
mgr/cephadm: do not worry about even # of monitors
Ceph works just fine with an even number of monitors.
Upside: more copies of critical cluster data
Downside: we can tolerate the same number of down mons as with N-1, and now
we are slightly more likely to have a failing mon because we have 1 more
that might fail.
On balance it's not clear that having one fewer mon is any better, so avoid
the confusion and complexity of second-guessing ourselves.
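The "same number of down mons" point follows from monitors needing a strict majority to form quorum. This small illustrative helper computes how many mons can be down while a majority remains:

```python
def tolerated_failures(num_mons: int) -> int:
    """Mons that may be down while a strict majority (quorum) remains."""
    quorum = num_mons // 2 + 1
    return num_mons - quorum


for n in range(3, 7):
    print(n, tolerated_failures(n))
# 3 1
# 4 1
# 5 2
# 6 2
```

Going from 3 to 4 (or 5 to 6) mons adds a copy of the data but no extra failure tolerance.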
Adam King [Wed, 24 Feb 2021 21:13:01 +0000 (16:13 -0500)]
mgr/cephadm: update caps if necessary when getting keyring
If the caps change from the old version to the new one it causes
issues in the upgrade. This allows the caps to be updated. Currently
only seeing this with iscsi, but changing it for other daemon types as a precaution.
Sebastian Wagner [Mon, 22 Feb 2021 14:12:39 +0000 (15:12 +0100)]
cephadm: Get rid of injected_argv
Removed the injected_argv parameter and the injection of code in the cephadm
script we send to hosts.
Now the script is copied and after that we execute the cephadm command.
I would like to copy it only once (when adding new hosts), but this will be
part of a future PR, together with other PRs to:
- Introduce cephadm version
- Get rid of packaged/root mode
- Use pex or eggs
Signed-off-by: Juan Miguel Olmo Martínez <jolmomar@redhat.com>
Signed-off-by: Sebastian Wagner <sebastian.wagner@suse.com>
(cherry picked from commit 2142dcfc2bac3159b7a24f92ae75df2a14599377)