From: Sage Weil
Date: Thu, 6 May 2021 14:57:46 +0000 (-0400)
Subject: cephadm: --stop-signal=SIGTERM
X-Git-Tag: v17.1.0~1854^2~12
X-Git-Url: http://git.apps.os.sepia.ceph.com/?a=commitdiff_plain;h=9ab674579f51a89febec69a93d179505e85a066a;p=ceph.git

cephadm: --stop-signal=SIGTERM

haproxy's container image tells docker|podman to send SIGUSR1 for a "clean"
shutdown.  For NFS, the connections never close, so we will always hit the
podman|docker 10s timeout and get a SIGKILL.  That, in turn, causes haproxy
to exit with 143, and puts the systemd unit in a failed state.

This highlights a general problem(?) with stopping containers: if they don't
do it quickly then we'll end up in this error state.  We don't directly
address that here.

Avoid this problem by always stopping containers with SIGTERM.  In the
haproxy case, that means an immediate shutdown (no graceful drain of open
connections).

In theory we could do this only for haproxy with NFS, but we can easily
imagine RGW connections that don't close in 10s either, and we don't want
containers exiting in error state--we just want the proxy to stop quickly.

Signed-off-by: Sage Weil
---

diff --git a/src/cephadm/cephadm b/src/cephadm/cephadm
index e4086a423a27a..425bf6c9cea66 100755
--- a/src/cephadm/cephadm
+++ b/src/cephadm/cephadm
@@ -3091,6 +3091,10 @@ class CephContainer:
             'run',
             '--rm',
             '--ipc=host',
+            # some containers (ahem, haproxy) override this, but we want a fast
+            # shutdown always (and, more importantly, a successful exit even if we
+            # fall back to SIGKILL).
+            '--stop-signal=SIGTERM',
         ]
 
         if isinstance(self.ctx.container_engine, Podman):
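
For illustration only, here is a minimal standalone sketch of the idea in this patch: every container "run" command line carries an explicit --stop-signal, so a STOPSIGNAL baked into the image (as with haproxy) no longer decides how the container is stopped. The helper name build_run_args, its parameters, and the image name are hypothetical and are not part of cephadm; only the 'run', '--rm', '--ipc=host' and '--stop-signal=SIGTERM' arguments come from the diff above.

```python
from typing import List


def build_run_args(engine_path: str, image: str,
                   stop_signal: str = 'SIGTERM') -> List[str]:
    """Hypothetical helper (not cephadm code): build a docker|podman
    'run' command line that always sets an explicit stop signal."""
    return [
        engine_path,
        'run',
        '--rm',
        '--ipc=host',
        # Override any STOPSIGNAL set by the image so that stopping the
        # container always sends the same signal.
        f'--stop-signal={stop_signal}',
        image,
    ]


if __name__ == '__main__':
    # Assumed engine path and image name, for demonstration only.
    print(' '.join(build_run_args('/usr/bin/podman', 'example/haproxy:latest')))
```

Passing the flag on the command line, rather than special-casing haproxy, mirrors the reasoning in the commit message: the same behavior then applies to any image that sets its own stop signal.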