cephadm --image <container-image> deploy --fsid <fsid> --name mgr.hostname.smfvfd --config-json config-json.json
-Analyzing Core Dumps
+Capturing Core Dumps
---------------------
-When a Ceph daemon crashes, cephadm supports analyzing core dumps. To enable core dumps, run
+A Ceph cluster that uses cephadm can be configured to capture core dumps.
+Initial capture and processing of the coredump is performed by
+`systemd-coredump <https://www.man7.org/linux/man-pages/man8/systemd-coredump.8.html>`_.
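+
+On hosts where systemd-coredump is in use, it is registered as the kernel's
+core handler. As an optional check (not required for the steps below), the
+``kernel.core_pattern`` sysctl can be inspected:
+
+.. prompt:: bash #
+
+ # a pipe to systemd-coredump indicates that it will capture core dumps
+ sysctl kernel.core_pattern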
+
+
+To enable coredump handling, run:
.. prompt:: bash #
ulimit -c unlimited
-Core dumps will now be written to ``/var/lib/systemd/coredump``.
+Core dumps will be written to ``/var/lib/systemd/coredump``.
+This setting will persist only until the system is rebooted.
.. note::
they will be written to ``/var/lib/systemd/coredump`` on
the container host.
-Now, wait for the crash to happen again. To simulate the crash of a daemon, run e.g. ``killall -3 ceph-mon``.
+Now, wait for the crash to happen again. To simulate the crash of a daemon,
+run, for example, ``killall -3 ceph-mon``.
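+
+After the daemon has crashed, the capture can be confirmed from the container
+host using ``coredumpctl``, which is part of systemd-coredump. For example:
+
+.. prompt:: bash #
+
+ # list recorded core dumps for the crashed daemon
+ coredumpctl list ceph-mon
+ # show details (including the path of the stored core file) for the most recent dump
+ coredumpctl info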
+
+
+Running the Debugger with cephadm
+----------------------------------
+
+Running a single debugging session
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+A debugging session can be initiated using the ``cephadm shell`` command. From
+within the shell container, we need to install the debugger and debuginfo
+packages. To debug a core file captured by systemd, run the following:
+
+.. prompt:: bash #
+
+ # start the shell session
+ cephadm shell --mount /var/lib/systemd/coredump
+ # within the shell:
+ dnf install ceph-debuginfo gdb zstd
+ unzstd /mnt/coredump/core.ceph-*.zst
+ # point gdb at the decompressed core file (its name is the .zst name without the suffix)
+ gdb /usr/bin/ceph-mon /mnt/coredump/core.ceph-...
+
+You can then run debugger commands at gdb's prompt.
+
+.. code-block:: none
+
+ (gdb) bt
+ #0 0x00007fa9117383fc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
+ #1 0x00007fa910d7f8f0 in std::condition_variable::wait(std::unique_lock<std::mutex>&) () from /lib64/libstdc++.so.6
+ #2 0x00007fa913d3f48f in AsyncMessenger::wait() () from /usr/lib64/ceph/libceph-common.so.2
+ #3 0x0000563085ca3d7e in main ()
+
+
+Running repeated debugging sessions
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+When using ``cephadm shell``, as in the example above, any changes made to the
+container spawned by the shell command are ephemeral. Once the shell session
+exits, all of the files that were downloaded and installed are no longer
+available. One can simply re-run the same commands every time ``cephadm shell``
+is invoked, but to save time and resources one can instead create a new
+container image and use it for repeated debugging sessions.
+
+In the following example we create a simple Containerfile for constructing the
+container image. The commands below use podman, but they should work correctly
+if ``podman`` is replaced with ``docker``.
+
+.. prompt:: bash
+
+ cat >Containerfile <<EOF
+ ARG BASE_IMG=quay.io/ceph/ceph:v18
+ FROM \${BASE_IMG}
+ # install ceph debuginfo packages, gdb and other potentially useful packages
+ RUN dnf install --enablerepo='*debug*' -y ceph-debuginfo gdb zstd strace python3-debuginfo
+ EOF
+ podman build -t ceph:debugging -f Containerfile .
+ # pass --build-arg=BASE_IMG=<your image> to customize the base image
+
+The result should be a new local image named ``ceph:debugging``. This image can
+be used on the same machine that built it. Later, the image could be pushed to
+a container repository, or saved and copied to a node running other Ceph
+containers. Please consult the documentation for ``podman`` or ``docker`` for
+more details on the general container workflow.
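+
+As a sketch of one such workflow, the image can be saved to a tar archive,
+copied to the other node, and loaded there (the hostname ``ceph1`` and the use
+of ``scp`` are illustrative assumptions):
+
+.. prompt:: bash #
+
+ # save the local debugging image to a tar archive
+ podman save -o ceph-debugging.tar ceph:debugging
+ # copy the archive to the other node (example hostname)
+ scp ceph-debugging.tar root@ceph1:/tmp/
+ # on the other node, load the image into the local image store
+ podman load -i /tmp/ceph-debugging.tar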
+
+Once the image has been built, it can be used to initiate repeated debugging
+sessions without having to re-install the debug tools and debuginfo packages.
+To debug a core file using this image, in the same way as previously described,
+run:
+
+.. prompt:: bash #
+
+ cephadm --image ceph:debugging shell --mount /var/lib/systemd/coredump
+
+
+Debugging live processes
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+The gdb debugger can attach to running processes in order to debug them. For a
+containerized process, this can be accomplished by running the debug image in
+the same PID namespace as the process to be debugged.
+
+This requires running a container command with some custom arguments. We can
+generate a script that can be used to debug a process in a running container.
+
+.. prompt:: bash #
+
+ cephadm --image ceph:debugging shell --dry-run > /tmp/debug.sh
+
+This creates a script containing the container command cephadm would use to
+create a shell. Now, modify the script by removing the ``--init`` argument and
+replacing it with the argument that joins the PID namespace of a running
+container. For example, let's assume we want to debug the MGR and have
+determined that the MGR is running in a container named
+``ceph-bc615290-685b-11ee-84a6-525400220000-mgr-ceph0-sluwsk``. In that case,
+the argument
+``--pid=container:ceph-bc615290-685b-11ee-84a6-525400220000-mgr-ceph0-sluwsk``
+should be used.
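+
+The name of the daemon's container can be found by listing the running
+containers with ``podman ps`` (or ``docker ps``). As a sketch, the edit to the
+script can also be made non-interactively; the ``sed`` invocation below assumes
+the example container name used above:
+
+.. prompt:: bash #
+
+ # list running containers to find the name of the daemon's container
+ podman ps --format '{{.Names}}'
+ # swap the --init argument for the --pid=container:<name> argument
+ sed -i 's/--init/--pid=container:ceph-bc615290-685b-11ee-84a6-525400220000-mgr-ceph0-sluwsk/' /tmp/debug.sh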
+
+Now, we can run our debugging container with ``sh /tmp/debug.sh``. Within the
+shell we can run commands such as ``ps`` to get the PID of the MGR process. In
+the following example this will be ``2``. Running gdb, we can now attach to the
+running process:
-Install debug packages including ``ceph-debuginfo`` by entering the cephadm shelll::
+.. prompt:: bash (gdb)
- # cephadm shell --mount /var/lib/systemd/coredump
- [ceph: root@host1 /]# dnf install ceph-debuginfo gdb zstd
- [ceph: root@host1 /]# unzstd /mnt/coredump/core.ceph-*.zst
- [ceph: root@host1 /]# gdb /usr/bin/ceph-mon /mnt/coredump/core.ceph-...
- (gdb) bt
- #0 0x00007fa9117383fc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
- #1 0x00007fa910d7f8f0 in std::condition_variable::wait(std::unique_lock<std::mutex>&) () from /lib64/libstdc++.so.6
- #2 0x00007fa913d3f48f in AsyncMessenger::wait() () from /usr/lib64/ceph/libceph-common.so.2
- #3 0x0000563085ca3d7e in main ()
+ attach 2
+ info threads
+ bt