Laura Flores [Mon, 1 May 2023 16:28:54 +0000 (16:28 +0000)]
mgr: add urllib3==1.26.15 to mgr/requirements.txt
We do not depend on any particular version of
urllib3, but as a workaround to the incompatibility
of urllib3 constraints between kubernetes and
requests, we need to pin it temporarily to
the version both are happy with.
Fixes: https://tracker.ceph.com/issues/59591
Signed-off-by: Laura Flores <lflores@redhat.com>
(cherry picked from commit 80d460005e44649191aa862fa78bd278644b5237)
Edit the "stretch mode" section in doc/rados/operations/stretch-mode.rst
so that the procedure is formatted as a procedure and the sentences
correctly have heads.
Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com>
Signed-off-by: Zac Dover <zac.dover@proton.me>
(cherry picked from commit a19ff7a5ea9bbd24365648a90abfa1b720c5b231)
Mistakenly removed in commit d79f2a81541c ("docs: warning and remove
few docs section for Filestore Update docs after filestore removal.").
The kernel client, however new, will continue to be able to talk to
FileStore OSDs for as long as they exist.
Adam King [Thu, 16 Feb 2023 17:34:06 +0000 (12:34 -0500)]
qa/distros: pass --allowerasing --nobest when installing container-tools
One of the tests in the orch suite is running distro install
commands from multiple distros, causing it to first install
container-tools 3.0 and then later install container-tools,
which fails, causing the test to fail. This is sort of a band-aid
fix to get the test to work. It causes whichever version of the
package is requested last to end up installed (without error),
which is what we want in the tests.
Redouane Kachach [Wed, 26 Oct 2022 09:33:38 +0000 (11:33 +0200)]
mgr/cephadm: Adding extra arguments support for RGW frontend
Fixes: https://tracker.ceph.com/issues/57931
Signed-off-by: Redouane Kachach <rkachach@redhat.com>
(cherry picked from commit 2c46c0741962e0e6a5ddbc960dfd21948daf0947)
John Mulligan [Thu, 30 Mar 2023 20:49:27 +0000 (16:49 -0400)]
python-common: add a dedicated tox env to run mypy
IMO it's not a good practice to overload a tox rule with multiple
different test tools. It forces the tools to share the same virtualenvs
and makes it impossible to run the tools individually. A separate mypy
env also better matches the other tox.ini files in the ceph tree.
Since the new 'mypy' env is in the default env list it will continue
to get run automatically when no specific envs are selected.
John Mulligan [Tue, 28 Mar 2023 20:42:41 +0000 (16:42 -0400)]
mypy: update pinned mypy version to 0.981
mypy version 0.981 fixes a bug where on newer python versions mypy
doesn't properly load pyi files with keyword only arguments.
As noted in the src/mypy-constrains.txt mypy version needs to be
manually bumped periodically, and ceph is overdue for an update too.
It's never been updated since the file was added in June 2021.
John Mulligan [Thu, 30 Mar 2023 20:48:02 +0000 (16:48 -0400)]
python-common: fix variable name reuse to make mypy happy
The variables high and low were being used as both `str`s and regex
match objects. Rename the vars in the if block to avoid this problem.
This change makes this file pass mypy checking on mypy 0.981.
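A minimal sketch of the kind of rename described above. The function name, the regex, and the `LOW:HIGH` input format are hypothetical and are not taken from the python-common source; the point is only that a variable should keep one type for its whole lifetime.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern, for illustration only.
_SIZE_RE = re.compile(r"^\d+[KMGT]?B?$")


def parse_range(spec: str) -> Tuple[Optional[str], Optional[str]]:
    """Split 'LOW:HIGH' and return each side only if it looks like a size."""
    low, high = spec.split(":", 1)  # low and high are str here

    # Assigning the match objects back to low/high would change their type
    # from str to Optional[re.Match], which mypy rejects. Distinct names
    # keep each variable at a single type.
    low_match = _SIZE_RE.match(low)
    high_match = _SIZE_RE.match(high)

    return (
        low_match.group(0) if low_match else None,
        high_match.group(0) if high_match else None,
    )
```

For example, `parse_range("10G:40G")` returns both sides, while a side that does not match the pattern comes back as `None`.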
John Mulligan [Tue, 28 Mar 2023 21:09:30 +0000 (17:09 -0400)]
mgr/dashboard: ignore type checking on mgr proxy object assignments
Add `# type: ignore` comments to two dashboard functions that attempt
to set manager properties. There appear to be two approaches to fixing
the problem:
1. The _MgrProxy object that the dashboard uses has a __getattr__ method
for pulling values from the underlying mgr object. It does not have a
__setattr__ method. This means that values set on _MgrProxy do not
propagate down to the original mgr.
mypy detects the fact that the object doesn't have __setattr__ and
complains. One could add a __setattr__ to the proxy type and mypy
is satisfied.
2. We can just suppress the type check with the comment.
Because I have no idea why the _MgrProxy exists or why it's implemented
the way it is, I feel that 2 is simpler. It is easy enough to go back
later and clean up the comments rather than me investing a lot of time
to understand the dashboard's approach just to bump up the version of
mypy.
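The behaviour in point 1 can be reproduced with a small stand-in. The classes below are hypothetical simplifications, not the dashboard's actual _MgrProxy; they only show that a proxy with __getattr__ but no __setattr__ forwards reads and keeps writes to itself.

```python
class Mgr:
    """Stand-in for the real manager object (hypothetical)."""

    def __init__(self) -> None:
        self.log_level = "info"


class MgrProxy:
    """Read-through proxy: defines __getattr__ only, no __setattr__."""

    def __init__(self, target: Mgr) -> None:
        self._target = target

    def __getattr__(self, name: str):
        # Only called when normal attribute lookup fails, so reads of
        # unknown attributes fall through to the wrapped object.
        return getattr(self._target, name)


real = Mgr()
proxy = MgrProxy(real)

# The write lands on the proxy instance, not on the wrapped object.
# mypy flags this assignment, hence the suppression comment.
proxy.log_level = "debug"  # type: ignore
```

After the assignment, `real.log_level` is still `"info"`, while `proxy.log_level` now returns the value stored on the proxy itself.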
John Mulligan [Tue, 28 Mar 2023 21:07:20 +0000 (17:07 -0400)]
mgr/dashboard: ignore type checking for exception handling module
Add a `# type: ignore` comment to the exception handling dashboard
module just like the instance two lines below. This module does not
already import typing, so I'm not going to add it.
This change is needed in order to run mypy 0.981.
Adam King [Mon, 30 Jan 2023 16:27:09 +0000 (11:27 -0500)]
qa/cephadm: add check that iscsi daemon /etc/hosts matches host /etc/hosts
To make sure we aren't being affected by any podman introduced
changes to the /etc/hosts file and test that we're properly
mounting /etc/hosts in our daemon containers
Adam King [Sat, 21 Jan 2023 23:44:22 +0000 (18:44 -0500)]
cephadm: mount host /etc/hosts for containers in podman deployments
Podman messes with the /etc/hosts file in certain versions. There
was already a past issue with it placing the container name
there fixed by https://github.com/ceph/ceph/pull/42242. This time
it is adding an entry for "host.containers.internal" (seems to be
podman 4.1 onward currently). Iscsi figures out the FQDN for a
host by running
which is resolving to "host.containers.internal" when run in
the container with the podman modified /etc/hosts.
There is also an issue with the grafana dashboard when
this entry is present.
Passing --no-hosts resolves this, but I think in the past
we avoided that due to not wanting to break deployments
where host name resolution was handled using /etc/hosts.
That's why we had that workaround previously linked. This
time I'm not sure such a workaround exists. The approach here
is to mount a copy of the host's version of /etc/hosts
into the iscsi container. That copy won't have the extra
entry podman adds in but will have any user created entries in
case they were actually using it for host name resolution.
If /etc/hosts file isn't present for whatever reason, we're
assuming that this user isn't using /etc/hosts for hostname
resolution, and just going back to passing --no-hosts.
Fixes: https://tracker.ceph.com/issues/58532
Fixes: https://tracker.ceph.com/issues/57018
Signed-off-by: Adam King <adking@redhat.com>
(cherry picked from commit dd8627bbe3ebc6d924912a37785859d8124f95e5)
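A rough sketch of the decision described in the commit above: mount a copy of the host's /etc/hosts when the file exists, otherwise fall back to --no-hosts. The function name, its parameters, and the read-only mount option are assumptions for illustration and are not the cephadm implementation.

```python
import os
import shutil
from typing import List


def hosts_args(host_file: str, copy_dest: str) -> List[str]:
    """Return extra podman arguments for handling /etc/hosts.

    host_file: path of the host's hosts file (normally /etc/hosts).
    copy_dest: where to place the copy that gets mounted (hypothetical).
    """
    if os.path.isfile(host_file):
        # The copy keeps user-created entries but not the extra entry
        # podman would add to the container's own file.
        shutil.copyfile(host_file, copy_dest)
        return ["-v", f"{copy_dest}:/etc/hosts:ro"]
    # No hosts file: assume it is not used for name resolution.
    return ["--no-hosts"]
```

The returned list would be appended to the container run command; a missing hosts file yields `["--no-hosts"]`.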
Adam King [Wed, 1 Mar 2023 15:58:43 +0000 (10:58 -0500)]
mgr/cephadm: add more aggressive force flag for host maintenance enter
Right now, we have safety checks that will either say things are okay to stop,
return warnings, or return "alerts". Warnings can be bypassed already with
the --force flag that exists for the command. However, the alerts cannot be
bypassed and cephadm will refuse to attempt to put the host in maintenance mode.
The idea here is for an additional flag that also bypasses those alerts, in cases
where a user really needs to put the host into maintenance mode even if that
means causing issues within the cluster.
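The three outcomes and two levels of force can be sketched as below. The enum, the function, and the parameter name `force_alerts` are hypothetical; the commit message does not name the new flag, so none is assumed here.

```python
from enum import Enum


class Check(Enum):
    OK = "ok"            # safe to stop
    WARNING = "warning"  # bypassable with the existing --force
    ALERT = "alert"      # bypassable only with the more aggressive flag


def may_enter_maintenance(result: Check,
                          force: bool = False,
                          force_alerts: bool = False) -> bool:
    """Decide whether the host may be put into maintenance mode."""
    if result is Check.OK:
        return True
    if result is Check.WARNING:
        # The stronger flag implies the weaker one.
        return force or force_alerts
    # Check.ALERT: plain force is not enough.
    return force_alerts
```

With this shape, `--force` alone still refuses on an alert, which matches the existing behaviour described above.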
Adam King [Wed, 15 Mar 2023 17:18:02 +0000 (13:18 -0400)]
mgr/cephadm: handle HostConnectionError when checking for valid addr
Otherwise, the error is not properly passed back up the chain
and the user can get an error message like
TypeError: __init__() missing 2 required positional arguments: 'hostname' and 'addr'
when trying to add a host, despite the actual problem being
cephadm.ssh.HostConnectionError: Failed to connect to vm-01 (192.168.122.248). Permission denied
The tracker shows a bit more, but generally trying to add a host
that doesn't have the proper pub-key set as an authorized key
will get a misleading error message. With this patch, the error message looks like
[ceph: root@vm-00 /]# ceph orch host add vm-01 192.168.122.29
Error EINVAL: Failed to connect to vm-01 (192.168.122.29). Permission denied
Log: Opening SSH connection to 192.168.122.29, port 22
[conn=1] Connected to SSH server at 192.168.122.29, port 22
[conn=1] Local address: 192.168.122.156, port 49552
[conn=1] Peer address: 192.168.122.29, port 22
[conn=1] Beginning auth for user root
[conn=1] Auth failed for user root
[conn=1] Connection failure: Permission denied
[conn=1] Aborting connection
Redouane Kachach [Wed, 29 Mar 2023 08:48:30 +0000 (10:48 +0200)]
mgr/cephadm: use a dedicated cephadm tmp dir to copy remote files
Fixes: https://tracker.ceph.com/issues/59189
Signed-off-by: Redouane Kachach <rkachach@redhat.com>
(cherry picked from commit ef958d47b44f8e4579ae380bd9d6890c0c62e13a)
Adam King [Wed, 15 Mar 2023 17:55:26 +0000 (13:55 -0400)]
cephadm: handle exceptions applying secondary services during bootstrap
Otherwise we risk hitting a mismatch between the cephadm binary version
and the container image version we're bootstrapping on, resulting in
bootstrap failing. Example in the tracker.
Filestore is no longer supported in cephadm and both the doc [1] and the
DriveGroupValidation [2] raise an exception if this method is used. This
patch removes the legacy code that is supposed to produce filestore
ceph-volume related commands.
Adam King [Fri, 3 Mar 2023 20:31:03 +0000 (15:31 -0500)]
mgr/prometheus: remove dependency on cephadm module
https://github.com/ceph/ceph/commit/f967ac061ebee362cdc82c458e955da75a9045e9
introduced an import of something in the cephadm module
in the prometheus module. This seems to break the prometheus
module in some non-cephadm setups. For example, the ceph-ansible
ci hit
failed: [mgr0 -> mon0] (item=prometheus) => changed=true
ansible_loop_var: item
cmd:
- ceph
- -n
- client.admin
- -k
- /etc/ceph/ceph.client.admin.keyring
- --cluster
- ceph
- mgr
- module
- enable
- prometheus
delta: '0:00:00.389965'
end: '2023-03-03 15:30:07.631308'
item: prometheus
rc: 2
start: '2023-03-03 15:30:07.241343'
stderr: 'Error ENOENT: module ''prometheus'' reports that it cannot run on the active manager daemon: No module named ''cephadm'' (pass --force to force enablement)'
stderr_lines: <omitted>
stdout: ''
stdout_lines: <omitted>
so we need to be a bit more careful with this import and
make sure the prometheus module works fine without cephadm
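One common way to be "more careful" with such an import is to guard it, sketched below. This is a generic pattern, not necessarily what the patch does; the helper name and the feature check are hypothetical.

```python
import importlib
from typing import Any, Optional


def optional_import(name: str) -> Optional[Any]:
    """Import a module if it is available, otherwise return None."""
    try:
        return importlib.import_module(name)
    except ImportError:
        # ModuleNotFoundError is a subclass of ImportError.
        return None


# On non-cephadm setups this is simply None instead of a startup failure.
cephadm = optional_import("cephadm")


def cephadm_features_available() -> bool:
    """Callers check this before touching anything cephadm-specific."""
    return cephadm is not None
```

Code paths that need the optional module test `cephadm_features_available()` first and skip that functionality when it returns False.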
Adam King [Wed, 15 Feb 2023 22:07:09 +0000 (17:07 -0500)]
mgr/cephadm: be aware of host's shortname and FQDN
The idea is to gather the shortname and FQDN as part
of gather-facts, and then if we ever try to check if a certain
host is in our internal inventory by hostname, we can check
these other known names. This should avoid issues where
we think a hostname specified by FQDN is not in our
inventory because we know the host by the shortname
or vice versa.
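The lookup described above might look roughly like this. The data class and function are hypothetical, as is the example domain; the real inventory structure in mgr/cephadm is not reproduced here.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class HostNames:
    """Names collected for a host via gather-facts (hypothetical shape)."""
    shortname: str
    fqdn: str


def find_host(inventory: Dict[str, HostNames], name: str) -> Optional[str]:
    """Return the inventory key for name, matching shortname or FQDN too."""
    if name in inventory:
        return name
    for known, names in inventory.items():
        if name in (names.shortname, names.fqdn):
            return known
    return None


inventory = {"vm-01": HostNames("vm-01", "vm-01.example.com")}
```

Here `find_host(inventory, "vm-01.example.com")` resolves to the key `"vm-01"`, so a host known by its shortname is still found when specified by FQDN.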
John Mulligan [Mon, 27 Feb 2023 19:38:50 +0000 (14:38 -0500)]
cephadm: fix timeout argument to call function
The timeout argument to call function, for executing sub-processes, did
not function - this patch makes timeout work as (probably) intended.
Use the `process.communicate()` method rather than `tee` functions to
handle IO collection. Since no logging is done until after the exit code
is known the tee calls are not necessary. Add calls to kill the child
process when the timeout occurs. This helps prevent event loop "leaks"
that generate python warnings.
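The approach described above, using `process.communicate()` under a timeout and killing the child when it expires, can be sketched with asyncio as follows. The function name, its return shape, and the 124 return code on timeout are assumptions for illustration, not cephadm's actual `call` function.

```python
import asyncio
from typing import List, Optional, Tuple

TIMEOUT_RC = 124  # assumed convention, borrowed from timeout(1)


async def run(cmd: List[str],
              timeout: Optional[float] = None) -> Tuple[str, str, int]:
    """Run cmd, collect its output, and enforce an optional timeout."""
    proc = await asyncio.create_subprocess_exec(
        *cmd,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    try:
        # communicate() gathers all output until the process exits.
        out, err = await asyncio.wait_for(proc.communicate(), timeout)
    except asyncio.TimeoutError:
        # Kill and reap the child so nothing is left attached to the
        # event loop when it closes.
        proc.kill()
        await proc.wait()
        return "", "", TIMEOUT_RC
    assert proc.returncode is not None  # set once communicate() returns
    return out.decode(), err.decode(), proc.returncode
```

A caller would use it as `asyncio.run(run(["some-command"], timeout=5))`; with `timeout=None` the call waits indefinitely.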
John Mulligan [Thu, 23 Feb 2023 19:51:13 +0000 (14:51 -0500)]
cephadm/tests: add initial test coverage for call function
The call function provides the ability to run subprocesses, log output,
and provides an optional timeout parameter. This timeout parameter does
not appear to function correctly today, so we make use of
pytest.param/pytest.mark.xfail to mark these cases as already known to
fail.
John Mulligan [Wed, 22 Feb 2023 18:57:21 +0000 (13:57 -0500)]
cephadm: disable coverage for some compatibility blocks
This change disables reporting missing coverage for blocks that
contain copy and pasted code from other python versions and exist
to make those functions available to older python versions.
drive_group: fix limit filter in drive_selection.selector
When multiple osd service specs with 'limit' filter are applied,
the current logic makes the second service spec
try to pick devices that are already used by the first service spec.