author	Sage Weil <sage@newdream.net>
	Thu, 6 May 2021 14:57:46 +0000 (10:57 -0400)
committer	Sage Weil <sage@newdream.net>
	Tue, 25 May 2021 14:15:45 +0000 (10:15 -0400)
commit	9ab674579f51a89febec69a93d179505e85a066a
tree	0129469473097071d3adfa6715662e373df64d12
parent	09be2e14cc8d1d06609e876b6f0b0a579c11cd58

cephadm: --stop-signal=SIGTERM

haproxy's container image tells docker|podman to send SIGUSR1 for a "clean"
shutdown.  For NFS, the connections never close, so we will always hit the
podman|docker 10s timeout and get a SIGKILL.  That, in turn, causes haproxy
to exit with 143, and puts the systemd unit in a failed state.

This highlights a general problem(?) with stopping containers: if they don't
stop within the timeout then we'll end up in this error state.  We don't
directly address that here.

Avoid this problem by always stopping containers with SIGTERM.  In the
haproxy case, that means an immediate shutdown (no graceful drain of
open connections).  In theory we could do this only for haproxy with
NFS, but we can easily imagine RGW connections that don't close in 10s
either, and we don't want containers exiting in error state--we just
want the proxy to stop quickly.
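
A minimal sketch of the idea, not the actual cephadm code: the helper
name build_run_cmd, its parameters, and the image and container names
are hypothetical.  It only shows where --stop-signal=SIGTERM (a real
podman|docker run option) would sit on the command line so that it
overrides the stop signal declared by the image.

```python
from typing import List, Optional

STOP_SIGNAL = 'SIGTERM'


def build_run_cmd(engine: str, image: str, cname: str,
                  extra_args: Optional[List[str]] = None) -> List[str]:
    """Return an argv list for `<engine> run` that stops on SIGTERM.

    Hypothetical helper for illustration; `engine` is 'podman' or 'docker'.
    """
    cmd = [
        engine, 'run', '--rm',
        # Override any stop signal baked into the image
        # (e.g. haproxy's SIGUSR1).
        f'--stop-signal={STOP_SIGNAL}',
        '--name', cname,
    ]
    # Options for the engine go before the image name.
    cmd += extra_args or []
    cmd.append(image)
    return cmd


# Placeholder image and container names, not taken from cephadm.
print(' '.join(build_run_cmd('podman', 'example.invalid/haproxy:latest',
                             'haproxy-nfs')))
```

Putting the flag on every run command, rather than only on haproxy's,
matches the "always" in the paragraph above.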

Signed-off-by: Sage Weil <sage@newdream.net>