git-server-git.apps.pok.os.sepia.ceph.com Git - ceph.git/commit
cephadm: handle "systemctl start" failures during deployment better [60303/head]
author Adam King <adking@dhcp-41-165.bos.redhat.com>
Mon, 14 Oct 2024 17:44:03 +0000 (13:44 -0400)
committer Adam King <adking@redhat.com>
Thu, 24 Oct 2024 18:51:34 +0000 (14:51 -0400)
commit 5818305e8094f88949a7a63c93c6d76d0efa03d9
tree 9008ddc4fdde1f8d24137f3f94672e9e255d30f9
parent bd0160de81e216e42d835a3d4ce920c3bef81b16
cephadm: handle "systemctl start" failures during deployment better

Previously, it was assumed that when the deploy command fails, the
daemon we were trying to deploy does not exist on the host. However,
in the specific case where deploy fails while trying to start the
daemon's systemd unit, this is not the case. The assumption causes us
to clean up the keyring for the daemon and to skip the refresh of the
daemons on the host, which can make cephadm attempt to deploy another
daemon instead of just reporting the existing one as failed. To get
around this, we need to handle that specific failure as a success in
the mgr module's deploy workflow so that we refresh the daemons and
report the failure as intended.

https://tracker.ceph.com/issues/68536

Signed-off-by: Adam King <adking@redhat.com>
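
The control flow described in the commit message can be sketched roughly as
follows. This is a minimal illustration, not the code from the commit: the
exit code value `START_FAILED_RC`, the `DeployResult` and `HostState` types,
and the `handle_deploy_result` function are all hypothetical names invented
for this sketch. It assumes the deploy command signals a "systemctl start"
failure with a distinct return code that the mgr side can recognize.

```python
from dataclasses import dataclass, field
from typing import List, Set

# Hypothetical return code meaning "daemon was set up on the host, but
# starting its systemd unit failed". The value is an assumption made
# for this sketch only.
START_FAILED_RC = 17


@dataclass
class DeployResult:
    """Outcome of a (simulated) deploy command on a host."""
    returncode: int
    stderr: str = ""


@dataclass
class HostState:
    """Simplified stand-in for what the mgr module tracks per host."""
    keyrings: Set[str] = field(default_factory=set)
    refresh_requested: bool = False
    events: List[str] = field(default_factory=list)


def handle_deploy_result(daemon: str, result: DeployResult,
                         state: HostState) -> bool:
    """Return True if the daemon should be treated as present on the host."""
    if result.returncode == 0:
        state.refresh_requested = True
        state.events.append(f"{daemon}: deployed")
        return True
    if result.returncode == START_FAILED_RC:
        # The daemon exists on the host but did not start. Keep its
        # keyring and request a refresh so it is reported as failed
        # rather than replaced by another deployment attempt.
        state.refresh_requested = True
        state.events.append(f"{daemon}: deployed but failed to start")
        return True
    # Any other failure: assume nothing was created and clean up.
    state.keyrings.discard(daemon)
    state.events.append(f"{daemon}: deploy failed")
    return False
```

With this shape, only the start-failure case is promoted to a "success" for
the purposes of the workflow; every other non-zero result still takes the
cleanup path, which matches the narrow scope described in the message.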
src/cephadm/cephadm.py
src/cephadm/cephadmlib/constants.py
src/cephadm/cephadmlib/exceptions.py
src/pybind/mgr/cephadm/serve.py
src/pybind/mgr/cephadm/tests/test_cephadm.py
src/pybind/mgr/cephadm/tests/test_services.py