From: Sage Weil <sage@newdream.net>
Date: Thu, 6 May 2021 14:57:46 +0000 (-0400)
Subject: cephadm: --stop-signal=SIGTERM
X-Git-Tag: v17.1.0~1854^2~12
X-Git-Url: http://git.apps.os.sepia.ceph.com/?a=commitdiff_plain;h=9ab674579f51a89febec69a93d179505e85a066a;p=ceph.git

cephadm: --stop-signal=SIGTERM

haproxy's container image tells docker|podman to send SIGUSR1 for a "clean"
shutdown.  For NFS, the connections never close, so the graceful drain never
finishes: we always hit the podman|docker 10s stop timeout and get a
SIGKILL.  That, in turn, causes haproxy to exit with 143, and puts the
systemd unit in a failed state.

This highlights a general problem(?) with stopping containers: if they don't
exit quickly after the stop signal, we'll end up in this error state.  We
don't directly address that here.

Avoid this problem by always stopping containers with SIGTERM.  In the
haproxy case, that means an immediate shutdown (no graceful drain of
open connections).  In theory we could do this only for haproxy with
NFS, but we can easily imagine RGW connections that don't close in 10s
either, and we don't want containers exiting in error state--we just
want the proxy to stop quickly.
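
As a rough illustration of the approach (not the actual cephadm code), the
sketch below builds a container "run" command line that always carries
--stop-signal=SIGTERM, so the engine's stop command sends SIGTERM no matter
what stop signal the image declares.  The helper name build_run_args and
its parameters are hypothetical; only the flags themselves come from the
change in this commit.

```python
from typing import List


def build_run_args(engine: str, image: str,
                   stop_signal: str = 'SIGTERM') -> List[str]:
    """Hypothetical helper: assemble `<engine> run ...` arguments.

    The explicit --stop-signal overrides any stop signal baked into the
    image (haproxy's image asks for SIGUSR1), so a stop request results
    in a prompt SIGTERM rather than a drain that may never finish.
    """
    return [
        engine,
        'run',
        '--rm',
        '--ipc=host',
        # Command-line flag takes precedence over the image's own setting.
        f'--stop-signal={stop_signal}',
        image,
    ]


if __name__ == '__main__':
    # Print the command line that would be executed for a podman container.
    print(' '.join(build_run_args('podman', 'haproxy')))
```

The flag is set unconditionally rather than per daemon type, matching the
reasoning above that slow-closing connections are not specific to NFS.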

Signed-off-by: Sage Weil <sage@newdream.net>
---

diff --git a/src/cephadm/cephadm b/src/cephadm/cephadm
index e4086a423a27a..425bf6c9cea66 100755
--- a/src/cephadm/cephadm
+++ b/src/cephadm/cephadm
@@ -3091,6 +3091,10 @@ class CephContainer:
             'run',
             '--rm',
             '--ipc=host',
+            # some containers (ahem, haproxy) override this, but we want a fast
+            # shutdown always (and, more importantly, a successful exit even if we
+            # fall back to SIGKILL).
+            '--stop-signal=SIGTERM',
         ]
 
         if isinstance(self.ctx.container_engine, Podman):