From: Sebastian Wagner
Date: Fri, 21 Feb 2020 13:39:16 +0000 (+0100)
Subject: doc/cephadm: Add Troubleshooting
X-Git-Tag: v15.1.1~219^2
X-Git-Url: http://git-server-git.apps.pok.os.sepia.ceph.com/?a=commitdiff_plain;h=refs%2Fpull%2F33460%2Fhead;p=ceph.git

doc/cephadm: Add Troubleshooting

Signed-off-by: Sebastian Wagner
---

diff --git a/doc/cephadm/administration.rst b/doc/cephadm/administration.rst
index 4e9ed4c174f3..30340d0fbb6e 100644
--- a/doc/cephadm/administration.rst
+++ b/doc/cephadm/administration.rst
@@ -180,3 +180,67 @@ Adoption Process
 #. Check the ``ceph health detail`` output for cephadm warnings about
    stray cluster daemons or hosts that are not yet managed.
+
+Troubleshooting
+===============
+
+Sometimes there is a need to investigate why a cephadm command failed or why
+a specific service no longer runs properly.
+
+As cephadm deploys daemons as containers, troubleshooting daemons is slightly
+different. Here are a few tools and commands to help investigate issues.
+
+Gathering log files
+-------------------
+
+Use journalctl to gather the log files of all daemons:
+
+.. note:: By default cephadm now stores logs in journald. This means
+   that you will no longer find daemon logs in ``/var/log/ceph/``.
+
+To read the log file of one specific daemon, run::
+
+    cephadm logs --name
+
+To fetch all log files of all daemons on a given host, run::
+
+    for name in $(cephadm ls | jq -r '.[].name') ; do
+      cephadm logs --name "$name" > "$name";
+    done
+
+Collecting systemd status
+-------------------------
+
+To print the state of a systemd unit, run::
+
+    systemctl status "ceph-$(cephadm shell ceph fsid)@.service";
+
+
+To fetch the state of all daemons on a given host, run::
+
+    fsid="$(cephadm shell ceph fsid)"
+    for name in $(cephadm ls | jq -r '.[].name') ; do
+      systemctl status "ceph-$fsid@$name.service" > "$name";
+    done
+
+
+List all downloaded container images
+------------------------------------
+
+To list all container images that are downloaded on a host:
+
+.. note:: ``Image`` might also be called ``ImageID``
+
+::
+
+    podman ps -a --format json | jq '.[].Image'
+    "docker.io/library/centos:8"
+    "registry.opensuse.org/opensuse/leap:15.2"
+
+
+Manually running containers
+---------------------------
+
+cephadm writes small wrappers that run containers. Refer to
+``/var/lib/ceph///unit.run`` for the command that is used
+to execute a container.
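
Note on the two collection loops in the patch: both redirect their output to a file named after the daemon in the current directory, so running them one after the other in the same directory would replace each log with the status output. A minimal sketch of one way to combine them follows. It is not part of the patch; the function name `collect_cephadm_diagnostics`, the default output directory, and the `.log` / `.status` suffixes are assumptions chosen for illustration. The `cephadm`, `jq`, and `systemctl` invocations are the ones the patch already uses.

```shell
#!/bin/sh
# Sketch only: gather the journald log and the systemd status of every
# cephadm daemon on this host into one directory.
# Hypothetical names (not from the patch): the function name, the default
# directory, and the ".log" / ".status" suffixes.

# The function body is a subshell, so fsid, names, and name stay local.
collect_cephadm_diagnostics() (
    outdir="${1:-cephadm-diagnostics}"
    mkdir -p "$outdir" || exit 1

    # Same fsid lookup as in the patch.
    fsid="$(cephadm shell ceph fsid)" || exit 1

    # Daemon names as printed by jq -r, one per line.
    names="$(cephadm ls | jq -r '.[].name')" || exit 1

    for name in $names; do
        # Distinct suffixes keep the log and the status apart.
        cephadm logs --name "$name" > "$outdir/$name.log" 2>&1
        # systemctl status exits non-zero for inactive units; keep going.
        systemctl status "ceph-$fsid@$name.service" \
            > "$outdir/$name.status" 2>&1 || true
    done
)
```

Sourcing the file and calling `collect_cephadm_diagnostics /tmp/ceph-diag` would leave one `.log` and one `.status` file per daemon under `/tmp/ceph-diag`, which can then be archived and attached to a bug report.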