Sage Weil [Mon, 9 Aug 2021 18:15:28 +0000 (14:15 -0400)]
cephadm: fix container name detection
'enter' was broken because we weren't correctly identifying the container
name. Strip the newline from the inspect result so that we can reliably
match against the 'running' state.
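A minimal sketch of why the trailing newline matters, assuming the inspect command prints the state followed by a newline. The helper name `is_running` is hypothetical and not taken from cephadm.

```python
def is_running(inspect_output: str) -> bool:
    """Return True if the inspect output reports the 'running' state."""
    # Command output normally ends with a newline; strip it before comparing.
    return inspect_output.strip() == 'running'


raw = 'running\n'          # what a subprocess typically returns
print(raw == 'running')    # False: the newline defeats the comparison
print(is_running(raw))     # True
```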
Sebastian Wagner [Wed, 21 Jul 2021 12:13:52 +0000 (14:13 +0200)]
cephadm: Introduce unit.stop
The reason is that we now have to stop two differently named containers. This is possible
with `bash -c ... echo %i | tr . -`, but that gains us no readability
compared to putting it into a unit.stop script.
As not all daemons have this stop script, we still have to call podman for old daemons.
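For illustration only, the `tr . -` step above amounts to replacing every dot in the unit instance name with a dash. The function name and the sample instance name below are assumptions, not cephadm code.

```python
def container_name_from_instance(instance: str) -> str:
    """Python equivalent of `echo %i | tr . -`: turn every '.' into '-'."""
    return instance.replace('.', '-')


print(container_name_from_instance('mon.host1'))  # mon-host1
```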
podman adds the current container name to the /etc/hosts
file. It turns out that Python's `socket.getfqdn()` differs from
`hostname -f` when the container name contains dots.
This was partially done in b94c8de, but only for haproxy in the cephadm
mgr module, not in the cephadm binary.
This adds the same change for the keepalived container image.
Now both haproxy and keepalived container images are fully qualified
(registry + namespace + image).
This PR rewrites a section in the Troubleshooting
chapter of the Cephadm Guide. The material that this
section discusses has been covered already in the
Cephadm Guide in the Cephadm Operations chapter.
There's no reason to repeat this information twice,
unless adding technical debt to the documentation
is our goal (which of course it is not, and the
opposite of adding technical debt to the documentation
has been the aim that has guided my work these past
six months).
Adam King [Mon, 19 Jul 2021 16:07:39 +0000 (12:07 -0400)]
mgr/cephadm: stop removal of daemons from offline hosts
This check only looked at the status of the
host and not at the offline_hosts set, so
it wasn't actually stopping daemons from being removed
from offline hosts.
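A rough sketch of the intended check, assuming a mapping from daemon name to hostname and a set of offline hostnames. The function and variable names are hypothetical; the real module keeps this state differently.

```python
from typing import Dict, List, Set


def daemons_safe_to_remove(daemon_hosts: Dict[str, str],
                           offline_hosts: Set[str]) -> List[str]:
    """Return daemon names whose host is not in offline_hosts.

    daemon_hosts maps a daemon name to the hostname it runs on.
    """
    return [name for name, host in daemon_hosts.items()
            if host not in offline_hosts]


daemons = {'mon.a': 'host1', 'osd.3': 'host2'}
print(daemons_safe_to_remove(daemons, offline_hosts={'host2'}))  # ['mon.a']
```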
```
    cmd = ['ceph-volume', '--fsid', fsid] + cv_cmd
    with with_cephadm_ctx(cmd) as ctx:
        cd.command_ceph_volume(ctx)
        assert ctx.fsid == fsid
    s = get_ceph_conf(fsid=fsid)
    f = cephadm_fs.create_file('ceph.conf', contents=s)
    cmd = ['ceph-volume', '--fsid', fsid, '--config', f.path] + cv_cmd
    with with_cephadm_ctx(cmd) as ctx:
        cd.command_ceph_volume(ctx)
        assert ctx.fsid == fsid
    cmd = ['ceph-volume', '--fsid', '10000000-0000-0000-0000-0000deadbeef', '--config', f.path] + cv_cmd
    with with_cephadm_ctx(cmd) as ctx:
        err = 'fsid does not match ceph.conf'
        with pytest.raises(cd.Error, match=err):
            cd.command_ceph_volume(ctx)
>           assert ctx.fsid == None
E           AssertionError: assert '10000000-0000-0000-0000-0000deadbeef' == None
E            +  where '10000000-0000-0000-0000-0000deadbeef' = <cephadm.CephadmContext object at 0x7f1c7121c1c0>.fsid
```
The clean_cgroup method assumes that ctx.fsid is set. While this is
true for the bootstrap command, it isn't set for the adopt or deploy commands
(and maybe others).
This causes the adopt command to fail:
Traceback (most recent call last):
  File "/sbin/cephadm", line 8301, in <module>
    main()
  File "/sbin/cephadm", line 8289, in main
    r = ctx.func(ctx)
  File "/sbin/cephadm", line 1764, in _default_image
    return func(ctx)
  File "/sbin/cephadm", line 5091, in command_adopt
    command_adopt_ceph(ctx, daemon_type, daemon_id, fsid)
  File "/sbin/cephadm", line 5299, in command_adopt_ceph
    osd_fsid=osd_fsid)
  File "/sbin/cephadm", line 2884, in deploy_daemon_units
    clean_cgroup(ctx, unit_name)
  File "/sbin/cephadm", line 2724, in clean_cgroup
    if not ctx.fsid:
  File "/sbin/cephadm", line 155, in __getattr__
    return super().__getattribute__(name)
AttributeError: 'CephadmContext' object has no attribute 'fsid'
Since we already have the fsid value in deploy_daemon_units (which calls
clean_cgroup), we can pass the fsid value directly.
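The difference between reading the fsid from the context and passing it explicitly can be sketched as follows. `ContextSketch`, `unit_label_from_ctx` and `unit_label` are hypothetical names used only for this illustration; they do not reproduce what clean_cgroup actually does.

```python
class ContextSketch:
    """Hypothetical context object that, like adopt or deploy, has no fsid."""


def unit_label_from_ctx(ctx: ContextSketch, unit_name: str) -> str:
    # Fails with AttributeError when the command never set ctx.fsid.
    return f'{ctx.fsid}:{unit_name}'


def unit_label(fsid: str, unit_name: str) -> str:
    # The caller already knows the fsid, so it is passed in explicitly.
    return f'{fsid}:{unit_name}'


ctx = ContextSketch()
try:
    unit_label_from_ctx(ctx, 'mon.a')
except AttributeError as exc:
    print('lookup through ctx failed:', exc)

print(unit_label('1234', 'mon.a'))  # 1234:mon.a
```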
mgr/cephadm: ingress: fix typo in spec.virtual_interface_networks reference
When using virtual_interface_networks to identify the interface that should
hold the virtual IP, the code referenced spec.networks instead of
spec.virtual_interface_networks.
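A simplified sketch of the corrected lookup, assuming the host's networks are available as a mapping from subnet to interface names. `IngressSpecSketch` and `pick_vip_interface` are hypothetical stand-ins, not the module's real classes or functions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class IngressSpecSketch:
    """Hypothetical, reduced stand-in for an ingress spec."""
    networks: List[str] = field(default_factory=list)
    virtual_interface_networks: List[str] = field(default_factory=list)


def pick_vip_interface(spec: IngressSpecSketch,
                       host_networks: Dict[str, List[str]]) -> Optional[str]:
    """Return the first interface on a subnet listed in
    spec.virtual_interface_networks (not spec.networks)."""
    for subnet, interfaces in host_networks.items():
        if interfaces and subnet in spec.virtual_interface_networks:
            return interfaces[0]
    return None


spec = IngressSpecSketch(networks=['10.0.0.0/24'],
                         virtual_interface_networks=['192.168.1.0/24'])
host = {'10.0.0.0/24': ['eth0'], '192.168.1.0/24': ['eth1']}
print(pick_vip_interface(spec, host))  # eth1
```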
We don't need to run an extra command (mgr module ls) to obtain the mgr
modules list since we already have this information in the mgr_map.
This workflow is already done for the monitoring stack or for configuring
the iscsi integration within the dashboard (during creation) via the
config_dashboard method.
The mgr_map is mocked in the tests with the dashboard module enabled so we
don't need _mon_command_mock_mgr_module_ls anymore.
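As a sketch of the idea, assuming the mgr map is available as a dict whose `modules` key lists the enabled modules (that layout is an assumption here, and `module_enabled` is a hypothetical helper):

```python
from typing import Any, Dict


def module_enabled(mgr_map: Dict[str, Any], name: str) -> bool:
    """Return True if `name` is listed under the map's 'modules' key."""
    return name in mgr_map.get('modules', [])


mgr_map = {'modules': ['cephadm', 'dashboard']}  # made-up example data
print(module_enabled(mgr_map, 'dashboard'))  # True
print(module_enabled(mgr_map, 'iostat'))     # False
```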
Sebastian Wagner [Tue, 20 Jul 2021 14:09:57 +0000 (16:09 +0200)]
cephadm: haproxy 2.4 defaults to a different container user.
An alternative would be to investigate a different setup
leveraging `--sysctl net.ipv4.ip_unprivileged_port_start=0`,
but that would be a larger PR.
Fixes: https://tracker.ceph.com/issues/51355
Signed-off-by: Sebastian Wagner <sewagner@redhat.com>
(cherry picked from commit 250064bdcbe778b3cc245df843d14dd19cbb8772)
Michael Fritch [Fri, 9 Jul 2021 01:35:42 +0000 (19:35 -0600)]
cephadm: use CephadmContext rather than MagicMock
MagicMock hides attribute errors:
```
self = <cephadm.CephadmContext object at 0x7f1121e62370>, name = 'config_json'
    def __getattr__(self, name: str) -> Any:
        if '_conf' in self.__dict__ and hasattr(self._conf, name):
            return getattr(self._conf, name)
        elif '_args' in self.__dict__ and hasattr(self._args, name):
            return getattr(self._args, name)
        else:
>           return super().__getattribute__(name)
E           AttributeError: 'CephadmContext' object has no attribute 'config_json'
```
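The effect can be reproduced with the standard library alone. In this sketch `Context` is a hypothetical stand-in for a real context class; only `unittest.mock.MagicMock` is real API.

```python
from unittest.mock import MagicMock


class Context:
    """Hypothetical minimal context with a single known attribute."""

    def __init__(self) -> None:
        self.fsid = 'abc'


mock_ctx = MagicMock()
# A MagicMock creates any attribute on first access, so code that reads a
# missing attribute keeps running instead of failing.
print(isinstance(mock_ctx.config_json, MagicMock))  # True

real_ctx = Context()
try:
    real_ctx.config_json
except AttributeError as exc:
    # The real object surfaces the problem immediately.
    print('AttributeError:', exc)
```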
Javier Cacheiro [Mon, 12 Jul 2021 14:03:27 +0000 (16:03 +0200)]
Fetch the actually running selinux status.
The HostFacts should return the **actual** selinux mode in which the
kernel is running.
The actual mode can be different from the one in the configuration
if the server has not been rebooted or if the mode was changed
after boot using setenforce.
Instead of reading _selinux_path_list we should look at the output of
sestatus or getenforce.
The _selinux_path_list attribute is no longer needed.
Fixes: https://tracker.ceph.com/issues/51632
Signed-off-by: Javier Cacheiro <javier.cacheiro.lopez@cesga.es>
(cherry picked from commit c3c79fc44c34825384c59cbe962b9153e6b522b0)
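A minimal sketch of reading the running SELinux mode from `getenforce`, which prints Enforcing, Permissive or Disabled. The function names are hypothetical, and returning None when the tool is missing or fails is an assumption made for this example, not the behaviour of HostFacts.

```python
import subprocess
from typing import Optional


def parse_getenforce(output: str) -> str:
    """Normalize `getenforce` output, e.g. 'Enforcing\\n' -> 'enforcing'."""
    return output.strip().lower()


def running_selinux_mode() -> Optional[str]:
    """Ask the kernel's current mode via getenforce; None if unavailable."""
    try:
        result = subprocess.run(['getenforce'], capture_output=True,
                                text=True, check=True)
    except (OSError, subprocess.CalledProcessError):
        return None
    return parse_getenforce(result.stdout)


print(parse_getenforce('Permissive\n'))  # permissive
```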
When the mgr dashboard module isn't enabled, the iSCSI service deletion
gets stuck and the cluster state goes to ERR.
The `ceph dashboard` commands aren't available when the mgr dashboard module
isn't enabled.
This PR improves the readability and format
of the troubleshooting.rst file. It also
changes the markup of one of the
sub-subsection headers so that it uses tildes
(~) instead of carets (^), because that's
the RST standard.
- Rewrites the "Data Location" section of the Operations
docs
- Rewrites the "Health Checks" section of the Operations
docs
- Adds prompts to commands
- Adds console-output formatting to the places where it
is appropriate
- Adds several section headers where appropriate, to
signpost to the reader what is currently under discussion
For some reason, the sysctl directory might not exist if no package that drops
a custom sysctl file is installed on the host.
We now create the directory if it doesn't exist.
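Creating a directory only when it is missing is a one-liner in Python. The sketch below uses a temporary directory as a stand-in path; `ensure_dir` is a hypothetical helper name.

```python
import os
import tempfile


def ensure_dir(path: str) -> None:
    """Create `path` (and parents) if it is missing; do nothing otherwise."""
    os.makedirs(path, exist_ok=True)


with tempfile.TemporaryDirectory() as tmp:
    sysctl_dir = os.path.join(tmp, 'sysctl.d')  # stand-in path for the demo
    ensure_dir(sysctl_dir)
    ensure_dir(sysctl_dir)                      # second call is a no-op
    print(os.path.isdir(sysctl_dir))            # True
```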
Zac Dover [Tue, 29 Jun 2021 11:45:22 +0000 (21:45 +1000)]
doc/cephadm: improving "Starting the Upgrade"
This PR (slightly) improves the text in the section "Starting
the Upgrade" in the "Upgrading Ceph" chapter of the cephadm
documentation.
This is a very minor update, and does little but bring the sentences
into agreement with many other sentences that I've already written.
This is done to give the reader an almost tabular sense of what to
expect when looking at our docs.
胡玮文 [Sun, 13 Jun 2021 06:23:56 +0000 (14:23 +0800)]
cephadm: workaround unit replace failure
This appears to be a bug in systemd: it fails to clean up cgroups when stopping the
unit. If we then start a new unit with the same name, the 'ExecStartPre' command
fails with status=219/CGROUP (only when the systemd unified cgroup hierarchy is
enabled), because cgroup v2 does not allow processes in non-leaf groups. This
should be fixed in systemd commit e08dabfec7304dfa0d59997dc4219ffaf22af717.
For now, we just remove these leftover cgroups before starting the new unit.
Michael Fritch [Wed, 16 Jun 2021 20:20:38 +0000 (14:20 -0600)]
cephadm: fix regexp to strip `v1:` or `v2:` prefix from IPv6 addr
The regexp was stripping the first hextet of the IPv6 address:
```
FAILED tests/test_cephadm.py::TestBootstrap::test_mon_addrv[[0000:0000:0000:0000:0000:FFFF:C0A8:0101:1234]-list_networks5-None] - cephadm.Error: Cannot infer CIDR network for mon IP `0000:0000:0000:0000:FFFF:C0A8:0101`: pass --skip-mon-network to configure it later
```
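The failure mode can be illustrated with two patterns. The broad pattern below is an assumed example of a regexp that would eat the first hextet, and the narrow one is one possible fix; neither is quoted from cephadm.

```python
import re


def strip_with_broad_pattern(addr: str) -> str:
    # Illustrative too-broad pattern: `\w+:` also matches an IPv6 hextet.
    return re.sub(r'^\w+:', '', addr)


def strip_msgr_prefix(addr: str) -> str:
    # Only a literal 'v1:' or 'v2:' at the start is removed.
    return re.sub(r'^v[12]:', '', addr)


ipv6 = '0000:0000:0000:0000:0000:FFFF:C0A8:0101'
print(strip_with_broad_pattern(ipv6))   # 0000:0000:0000:0000:FFFF:C0A8:0101
print(strip_msgr_prefix(ipv6))          # address is left unchanged
print(strip_msgr_prefix('v2:' + ipv6))  # only the 'v2:' prefix is removed
```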
(1) improves syntax and formatting of "Logging to stdout"
(2) improves syntax and formatting of "Logging to files"
(3) replaces all carets with tildes in 3rd-level section
headers in operations.rst (./build-doc was crying
about inconsistency when I fed it tildes, but tildes
and not carets are the RST standard according to
https://docutils.sourceforge.io/docs/user/rst/quickstart.html#sections
so the carets had to go).
Sebastian Wagner [Tue, 15 Jun 2021 09:24:34 +0000 (11:24 +0200)]
python-common: fix mypy errors
Fixes:
```
py3 run-test: commands[2] | mypy --config-file=../mypy.ini -p ceph
ceph/deployment/service_spec.py: note: In member "yaml_representer" of class "ServiceSpec":
ceph/deployment/service_spec.py:659: error: Argument 1 to "represent_dict" of "SafeRepresenter" has incompatible type "_OrderedDictItemsView[str, Any]"; expected "Mapping[Any, Any]"
```
Sebastian Wagner [Tue, 15 Jun 2021 08:19:40 +0000 (10:19 +0200)]
mgr/orch: fix mypy errors
Fixes:
```
orchestrator/__init__.py:6: note: In module imported here:
orchestrator/_interface.py: note: In member "yaml_representer" of class "DaemonDescription":
orchestrator/_interface.py:1039: error: Argument 1 to "represent_dict" of "SafeRepresenter" has incompatible type "ItemsView[Any, Any]"; expected "Mapping[Any, Any]"
orchestrator/_interface.py: note: In member "yaml_representer" of class "ServiceDescription":
orchestrator/_interface.py:1178: error: Argument 1 to "represent_dict" of "SafeRepresenter" has incompatible type "ItemsView[Any, Any]"; expected "Mapping[Any, Any]"
orchestrator/_interface.py: note: At top level:
orchestrator/_interface.py:1181: error: Argument 2 to "add_representer" has incompatible type "Callable[[SafeDumper, DaemonDescription], Any]"; expected "Callable[[SafeDumper, ServiceDescription], Node]"
Found 3 errors in 1 file (checked 29 source files)
```
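The commit text does not say how these errors were resolved. As a hedged illustration of the type mismatch only: an items view is not a Mapping, while a dict built from it is. `describe` below is a hypothetical function standing in for any callee annotated with `Mapping[Any, Any]`.

```python
from collections import OrderedDict
from typing import Any, Mapping


def describe(data: Mapping[Any, Any]) -> str:
    """Hypothetical callee that, like the one in the error, wants a Mapping."""
    return ', '.join(f'{k}={v}' for k, v in data.items())


fields = OrderedDict([('service_type', 'mon'), ('placement', 'host1')])
# fields.items() is an items view, not a Mapping, so mypy rejects it here.
# Wrapping the view in a dict (insertion order is kept) yields a Mapping.
print(describe(dict(fields.items())))  # service_type=mon, placement=host1
```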