author	Adam King <adking@dhcp-41-165.bos.redhat.com>
	Mon, 14 Oct 2024 17:44:03 +0000 (13:44 -0400)
committer	Adam King <adking@redhat.com>
	Thu, 24 Oct 2024 18:51:34 +0000 (14:51 -0400)
commit	5818305e8094f88949a7a63c93c6d76d0efa03d9
tree	9008ddc4fdde1f8d24137f3f94672e9e255d30f9
parent	bd0160de81e216e42d835a3d4ce920c3bef81b16

cephadm: handle "systemctl start" failures during deployment better

Previously, it was assumed that when the deploy command fails, the
daemon we were trying to deploy does not exist on the host. However,
in the specific case where deploy fails while trying to start the
daemon's systemd unit, this is not the case. Acting on that assumption,
we both clean up the daemon's keyring and skip the refresh of the
daemons on the host, which can make cephadm attempt to deploy another
daemon instead of just reporting the existing one as failed. To get
around this, we need to handle that specific failure as a success in
the mgr module's deploy workflow so that we refresh the daemons and
report the failure as intended.
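
The control flow described above can be sketched roughly as follows.
This is a minimal illustration, not the code from this commit: it
assumes the start failure is signalled through a dedicated exit code,
and START_FAILED_RC, DeployResult, HostState and handle_deploy_result
are all hypothetical names invented for the example.

```python
from dataclasses import dataclass, field
from typing import List, Set

# Hypothetical exit code meaning "daemon was created, but 'systemctl start'
# failed". The real value and name used by cephadm are not shown here.
START_FAILED_RC = 17


@dataclass
class DeployResult:
    """Outcome of a (simulated) deploy command."""
    returncode: int
    stderr: str = ""


@dataclass
class HostState:
    """Toy stand-in for what the mgr module tracks per host."""
    keyrings: Set[str] = field(default_factory=set)
    needs_refresh: bool = False
    warnings: List[str] = field(default_factory=list)


def handle_deploy_result(daemon: str, result: DeployResult,
                         state: HostState) -> bool:
    """Return True if the daemon should be treated as present on the host."""
    if result.returncode == 0:
        # Normal success: the daemon exists, so refresh the host's daemons.
        state.needs_refresh = True
        return True
    if result.returncode == START_FAILED_RC:
        # The daemon exists but its unit failed to start. Treat this as a
        # success for the deploy workflow: keep the keyring and refresh, so
        # the existing daemon is later reported as failed.
        state.warnings.append(f"{daemon}: start failed: {result.stderr}")
        state.needs_refresh = True
        return True
    # Any other failure: assume the daemon was never created and clean up.
    state.keyrings.discard(daemon)
    return False
```

With this shape, a start failure keeps the keyring and schedules a
refresh, while any other deploy failure still removes the keyring.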

https://tracker.ceph.com/issues/68536

Signed-off-by: Adam King <adking@redhat.com>

src/cephadm/cephadm.py
src/cephadm/cephadmlib/constants.py
src/cephadm/cephadmlib/exceptions.py
src/pybind/mgr/cephadm/serve.py
src/pybind/mgr/cephadm/tests/test_cephadm.py
src/pybind/mgr/cephadm/tests/test_services.py