doc/cephadm: Add Troubleshooting

author Sebastian Wagner <sebastian.wagner@suse.com>

Fri, 21 Feb 2020 13:39:16 +0000 (14:39 +0100)

committer Sebastian Wagner <sebastian.wagner@suse.com>

Thu, 27 Feb 2020 19:00:28 +0000 (20:00 +0100)
diff --git a/doc/cephadm/administration.rst b/doc/cephadm/administration.rst

index 4e9ed4c174f36fcd2dbdb51c0cb640936e5e0975..30340d0fbb6ecb38fa064b734248011ddff94f89 100644 (file)
--- a/doc/cephadm/administration.rst
+++ b/doc/cephadm/administration.rst
@@ -180,3 +180,67 @@ Adoption Process
  
  #. Check the ``ceph health detail`` output for cephadm warnings about
     stray cluster daemons or hosts that are not yet managed.
+   
+Troubleshooting
+===============
+
+Sometimes there is a need to investigate why a cephadm command failed or why
+a specific service no longer runs properly.
+
+As cephadm deploys daemons as containers, troubleshooting daemons is slightly
+different. Here are a few tools and commands to help investigate issues.
+
+Gathering log files
+-------------------
+
+Use journalctl to gather the log files of all daemons:
+
+.. note:: By default cephadm now stores logs in journald. This means
+   that you will no longer find daemon logs in ``/var/log/ceph/``.
+
+To read the log file of one specific daemon, run::
+
+    cephadm logs --name <name-of-daemon>
+
+To fetch all log files of all daemons on a given host, run::
+
+    for name in $(cephadm ls | jq -r '.[].name') ; do
+      cephadm logs --name "$name" > "$name"
+    done
+
+Collecting systemd status
+-------------------------
+
+To print the state of a systemd unit, run::
+
+    systemctl status "ceph-$(cephadm shell ceph fsid)@<service name>.service"
+
+
+To fetch the state of all daemons on a given host, run::
+
+    fsid="$(cephadm shell ceph fsid)"
+    for name in $(cephadm ls | jq -r '.[].name') ; do
+      systemctl status "ceph-$fsid@$name.service" > "$name"
+    done
+
+
+List all downloaded container images
+------------------------------------
+
+To list all container images that are downloaded on a host:
+
+.. note:: ``Image`` might also be called ``ImageID``
+
+::
+
+    podman ps -a --format json | jq '.[].Image'
+    "docker.io/library/centos:8"
+    "registry.opensuse.org/opensuse/leap:15.2"
+
+
+Manually running containers
+---------------------------
+
+cephadm writes small wrappers that run containers. Refer to
+``/var/lib/ceph/<cluster-fsid>/<service-name>/unit.run`` for the command
+that is used to execute a container.
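
The log and status loops in the diff above can be combined into one helper that writes both files per daemon into a single directory. The following is a minimal sketch, not part of the commit: the function names ``unit_name`` and ``collect_daemon_info`` and the ``.log`` / ``.status`` file suffixes are hypothetical, and the unit name format is assumed to be exactly the ``ceph-<fsid>@<name>.service`` pattern used in the patch.

```shell
#!/bin/sh
# Hypothetical helper: build the systemd unit name for one daemon,
# using the "ceph-<fsid>@<name>.service" pattern from the patch.
unit_name() {
  printf 'ceph-%s@%s.service\n' "$1" "$2"
}

# Hypothetical helper: read daemon names from stdin (one per line) and
# store each daemon's log and systemd status under the given directory.
collect_daemon_info() {
  local fsid="$1" outdir="$2" name
  mkdir -p "$outdir"
  while IFS= read -r name; do
    [ -n "$name" ] || continue
    # Keep going even if one daemon fails, so the rest are still collected.
    cephadm logs --name "$name" > "$outdir/$name.log" 2>&1 || true
    systemctl status "$(unit_name "$fsid" "$name")" \
      > "$outdir/$name.status" 2>&1 || true
  done
}

# Usage on a cephadm host (the directory name is illustrative):
#   fsid="$(cephadm shell ceph fsid)"
#   cephadm ls | jq -r '.[].name' | collect_daemon_info "$fsid" ./ceph-debug
```

Reading names from stdin keeps the helper independent of how the daemon list is produced, so the ``cephadm ls | jq`` pipeline from the patch can be swapped for a hand-written list when only a few daemons are of interest.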