From: Zac Dover Date: Sat, 3 Jun 2023 04:03:37 +0000 (+1000) Subject: doc/rados: edit troubleshooting-mon.rst (1 of x) X-Git-Tag: v17.2.7~339^2 X-Git-Url: http://git.apps.os.sepia.ceph.com/?a=commitdiff_plain;h=c0de219007d9bb163b17ac49886aae839ff9c577;p=ceph.git doc/rados: edit troubleshooting-mon.rst (1 of x) Edit the first 150 lines of doc/rados/troubleshooting/troubleshooting-mon.rst. https://tracker.ceph.com/issues/58485 Signed-off-by: Zac Dover (cherry picked from commit 0f0896d30ad677aca9e7a396f78d1e48ad64a90a) --- diff --git a/doc/rados/troubleshooting/troubleshooting-mon.rst b/doc/rados/troubleshooting/troubleshooting-mon.rst index fef181759466d..0939e086fa9c1 100644 --- a/doc/rados/troubleshooting/troubleshooting-mon.rst +++ b/doc/rados/troubleshooting/troubleshooting-mon.rst @@ -1,115 +1,132 @@ .. _rados-troubleshooting-mon: -================================= +========================== Troubleshooting Monitors -================================= +========================== .. index:: monitor, high availability -When a cluster encounters monitor-related troubles there's a tendency to -panic, and sometimes with good reason. Losing one or more monitors doesn't -necessarily mean that your cluster is down, so long as a majority are up, -running, and form a quorum. -Regardless of how bad the situation is, the first thing you should do is to -calm down, take a breath, and step through the below troubleshooting steps. +If a cluster encounters monitor-related problems, this does not necessarily +mean that the cluster is in danger of going down. Even if multiple monitors are +lost, the cluster can still be up and running, as long as there are enough +surviving monitors to form a quorum. +However serious your cluster's monitor-related problems might be, we recommend +that you take the following troubleshooting steps. -Initial Troubleshooting -======================== +Initial Troubleshooting +======================= **Are the monitors running?** - First of all, we need to make sure the monitor (*mon*) daemon processes - (``ceph-mon``) are running. You would be amazed by how often Ceph admins - forget to start the mons, or to restart them after an upgrade. There's no - shame, but try to not lose a couple of hours looking for a deeper problem. - When running Kraken or later releases also ensure that the manager - daemons (``ceph-mgr``) are running, usually alongside each ``ceph-mon``. - + First, make sure that the monitor (*mon*) daemon processes (``ceph-mon``) are + running. Sometimes Ceph admins either forget to start the mons or forget to + restart the mons after an upgrade. Checking for this simple oversight can + save hours of painstaking troubleshooting. It is also important to make sure + that the manager daemons (``ceph-mgr``) are running. Remember that typical + cluster configurations provide one ``ceph-mgr`` for each ``ceph-mon``. + + .. note:: Rook will not run more than two managers. + +**Can you reach the monitor nodes?** + + In certain rare cases, there may be ``iptables`` rules that block access to + monitor nodes or TCP ports. These rules might be left over from earlier + stress testing or rule development. To check for the presence of such rules, + SSH into the server and then try to connect to the monitor's ports + (``tcp/3300`` and ``tcp/6789``) using ``telnet``, ``nc``, or a similar tool. 
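+
+   For example, a quick reachability check from an admin node might look like
+   this (``mon-host`` is a placeholder for the address of the monitor node
+   being tested; substitute the real hostname or IP address):
+
+   .. prompt:: bash $
+
+      nc -zv mon-host 3300
+      nc -zv mon-host 6789
+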
-**Are you able to reach to the mon nodes?**
+**Does the ``ceph status`` command run and receive a reply from the cluster?**

-  Doesn't happen often, but sometimes there are ``iptables`` rules that
-  block access to mon nodes or TCP ports. These may be leftovers from
-  prior stress-testing or rule development. Try SSHing into
-  the server and, if that succeeds, try connecting to the monitor's ports
-  (``tcp/3300`` and ``tcp/6789``) using a ``telnet``, ``nc``, or similar tools.
+   If the ``ceph status`` command does receive a reply from the cluster, then the
+   cluster is up and running. The monitors will respond to a ``status`` request
+   only if there is a formed quorum. Confirm that one or more ``mgr`` daemons
+   are reported as running. Under ideal conditions, all ``mgr`` daemons will be
+   reported as running.

-**Does ceph -s run and obtain a reply from the cluster?**

-  If the answer is yes then your cluster is up and running. One thing you
-  can take for granted is that the monitors will only answer to a ``status``
-  request if there is a formed quorum. Also check that at least one ``mgr``
-  daemon is reported as running, ideally all of them.
+   If the ``ceph status`` command does not receive a reply from the cluster, then
+   there are probably not enough monitors ``up`` to form a quorum. The ``ceph
+   -s`` command with no further options specified connects to an arbitrarily
+   selected monitor. In certain cases, however, it might be helpful to connect
+   to a specific monitor (or to several specific monitors in sequence) by adding
+   the ``-m`` flag to the command: for example, ``ceph status -m mymon1``.

-  If ``ceph -s`` hangs without obtaining a reply from the cluster
-  or showing ``fault`` messages, then it is likely that your monitors
-  are either down completely or just a fraction are up -- a fraction
-  insufficient to form a majority quorum. This check will connect to an
-  arbitrary mon; in rare cases it may be illuminating to bind to specific
-  mons in sequence by adding e.g. ``-m mymon1`` to the command.

-**What if ceph -s doesn't come back?**
+**None of this worked. What now?**

-  If you haven't gone through all the steps so far, please go back and do.
+   If the above solutions have not resolved your problems, you might find it
+   helpful to examine each individual monitor in turn. Whether or not a quorum
+   has been formed, it is possible to contact each monitor individually and
+   request its status by using the ``ceph tell mon.ID mon_status`` command (here
+   ``ID`` is the monitor's identifier).

-  You can contact each monitor individually asking them for their status,
-  regardless of a quorum being formed. This can be achieved using
-  ``ceph tell mon.ID mon_status``, ID being the monitor's identifier. You should
-  perform this for each monitor in the cluster. In section `Understanding
-  mon_status`_ we will explain how to interpret the output of this command.
+   Run the ``ceph tell mon.ID mon_status`` command for each monitor in the
+   cluster. For more on this command's output, see :ref:`Understanding
+   mon_status
+   <rados_troubleshoting_troubleshooting_mon_understanding_mon_status>`.

-  You may instead SSH into each mon node and query the daemon's admin socket.
+   There is also an alternative method: SSH into each monitor node and query the
+   daemon's admin socket. See :ref:`Using the Monitor's Admin
+   Socket <rados_troubleshoting_troubleshooting_mon_using_admin_socket>`.
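+
+   As a sketch of the ``ceph tell`` approach described above (assuming, for
+   illustration, a cluster whose monitors are named ``a``, ``b``, and ``c``;
+   substitute your own monitor IDs):
+
+   .. prompt:: bash $
+
+      for id in a b c; do ceph tell mon.$id mon_status; done
+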
+.. _rados_troubleshoting_troubleshooting_mon_using_admin_socket:
+
 Using the monitor's admin socket
-=================================
+================================

-The admin socket allows you to interact with a given daemon directly using a
-Unix socket file. This file can be found in your monitor's ``run`` directory.
-By default, the admin socket will be kept in ``/var/run/ceph/ceph-mon.ID.asok``
-but this may be elsewhere if you have overridden the default directory. If you
-don't find it there, check your ``ceph.conf`` for an alternative path or
-run::
+A monitor's admin socket allows you to interact directly with a specific daemon
+by using a Unix socket file. This file is found in the monitor's ``run``
+directory. The admin socket's default directory is
+``/var/run/ceph/ceph-mon.ID.asok``, but this can be overridden and the admin
+socket might be elsewhere, especially if your cluster's daemons are deployed in
+containers. If you cannot find it, either check your ``ceph.conf`` for an
+alternative path or run the following command:
+
+.. prompt:: bash $

-  ceph-conf --name mon.ID --show-config-value admin_socket
+   ceph-conf --name mon.ID --show-config-value admin_socket

-Bear in mind that the admin socket will be available only while the monitor
-daemon is running. When the monitor is properly shut down, the admin socket
-will be removed. If however the monitor is not running and the admin socket
-persists, it is likely that the monitor was improperly shut down.
-Regardless, if the monitor is not running, you will not be able to use the
-admin socket, with ``ceph`` likely returning ``Error 111: Connection Refused``.
+The admin socket is available for use only when the monitor daemon is running.
+Whenever the monitor has been properly shut down, the admin socket is removed.
+However, if the monitor is not running and the admin socket persists, it is
+likely that the monitor has been improperly shut down. In any case, if the
+monitor is not running, it will be impossible to use the admin socket, and the
+``ceph`` command is likely to return ``Error 111: Connection Refused``.

-Accessing the admin socket is as simple as running ``ceph tell`` on the daemon
-you are interested in. For example::
+To access the admin socket, run a ``ceph tell`` command of the following form
+(specifying the daemon that you are interested in):

-  ceph tell mon.<id> mon_status
+.. prompt:: bash $

-Under the hood, this passes the command ``help`` to the running MON daemon
-``<id>`` via its "admin socket", which is a file ending in ``.asok``
-somewhere under ``/var/run/ceph``. Once you know the full path to the file,
-you can even do this yourself::
+   ceph tell mon.<id> mon_status

-  ceph --admin-daemon <full_path_to_admin_socket_file> <command>
+This command passes a ``mon_status`` command to the specific running monitor
+daemon ``<id>`` via its admin socket. If you know the full path to the admin
+socket file, this can be done more directly by running the following command:

-Using ``help`` as the command to the ``ceph`` tool will show you the
-supported commands available through the admin socket. Please take a look
-at ``config get``, ``config show``, ``mon stat`` and ``quorum_status``,
-as those can be enlightening when troubleshooting a monitor.
+.. prompt:: bash $

+   ceph --admin-daemon <full_path_to_admin_socket_file> <command>
+
+Passing ``help`` as the command shows all of the commands that are available
+through the admin socket. See especially ``config get``, ``config show``,
+``mon stat``, and ``quorum_status``.
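+
+For example, a minimal admin-socket session on a monitor node might look like
+this (``mon.a`` and the default socket path are used here for illustration;
+substitute your monitor's ID and the path reported by ``ceph-conf``):
+
+.. prompt:: bash $
+
+   ceph --admin-daemon /var/run/ceph/ceph-mon.a.asok help
+   ceph --admin-daemon /var/run/ceph/ceph-mon.a.asok quorum_status
+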
+.. _rados_troubleshoting_troubleshooting_mon_understanding_mon_status:
+
 Understanding mon_status
-=========================
+========================

-``mon_status`` can always be obtained via the admin socket. This command will
-output a multitude of information about the monitor, including the same output
-you would get with ``quorum_status``.
+The status of the monitor (as reported by the ``ceph tell mon.X mon_status``
+command) can always be obtained via the admin socket. This command outputs a
+great deal of information about the monitor (including the information found in
+the output of the ``quorum_status`` command).

-Take the following example output of ``ceph tell mon.c mon_status``::
+To understand this command's output, let us consider the following example, in
+which we see the output of ``ceph tell mon.c mon_status``::

-
   { "name": "c",
     "rank": 2,
     "state": "peon",
@@ -135,27 +152,30 @@ Take the following example output of ``ceph tell mon.c mon_status``::
             "name": "c",
             "addr": "127.0.0.1:6795\/0"}]}}

-A couple of things are obvious: we have three monitors in the monmap (*a*, *b*
-and *c*), the quorum is formed by only two monitors, and *c* is in the quorum
-as a *peon*.
+It is clear that there are three monitors in the monmap (*a*, *b*, and *c*),
+the quorum is formed by only two monitors, and *c* is in the quorum as a
+*peon*.
+
+**Which monitor is out of the quorum?**

-Which monitor is out of the quorum?
+   The answer is **a** (that is, ``mon.a``).

-  The answer would be **a**.
+**Why?**

-Why?
+   When the ``quorum`` set is examined, there are clearly two monitors in the
+   set: *1* and *2*. But these are not monitor names. They are monitor ranks, as
+   established in the current ``monmap``. The ``quorum`` set does not include
+   the monitor that has rank 0, and according to the ``monmap`` that monitor is
+   ``mon.a``.

-  Take a look at the ``quorum`` set. We have two monitors in this set: *1*
-  and *2*. These are not monitor names. These are monitor ranks, as established
-  in the current monmap. We are missing the monitor with rank 0, and according
-  to the monmap that would be ``mon.a``.
+**How are monitor ranks determined?**

-By the way, how are ranks established?
+   Monitor ranks are calculated (or recalculated) whenever monitors are added or
+   removed. The calculation of ranks follows a simple rule: ranks are assigned
+   in order of ``IP:PORT`` combination, so the monitor with the **lowest**
+   ``IP:PORT`` combination has rank 0. In this case, because ``127.0.0.1:6789``
+   is lower than the other two ``IP:PORT`` combinations, ``mon.a`` has rank 0.

-  Ranks are (re)calculated whenever you add or remove monitors and follow a
-  simple rule: the **greater** the ``IP:PORT`` combination, the **lower** the
-  rank is. In this case, considering that ``127.0.0.1:6789`` is lower than all
-  the remaining ``IP:PORT`` combinations, ``mon.a`` has rank 0.

 Most Common Monitor Issues
 ===========================
@@ -186,25 +206,25 @@ How to troubleshoot this?
   socket as explained in `Using the monitor's admin socket`_ and
   `Understanding mon_status`_.

-  If the monitor is out of the quorum, its state should be one of
-  ``probing``, ``electing`` or ``synchronizing``. If it happens to be either
-  ``leader`` or ``peon``, then the monitor believes to be in quorum, while
-  the remaining cluster is sure it is not; or maybe it got into the quorum
-  while we were troubleshooting the monitor, so check you ``ceph -s`` again
-  just to make sure. Proceed if the monitor is not yet in the quorum.
+  If the monitor is out of the quorum, its state should be one of ``probing``,
+  ``electing`` or ``synchronizing``. If it happens to be either ``leader`` or
+  ``peon``, then the monitor believes itself to be in quorum while the rest of
+  the cluster is sure that it is not; or maybe it got into the quorum while we
+  were troubleshooting the monitor, so run ``ceph status`` again just to make
+  sure. Proceed if the monitor is not yet in the quorum.

 What if the state is ``probing``?

   This means the monitor is still looking for the other monitors. Every time
-  you start a monitor, the monitor will stay in this state for some time
-  while trying to connect the rest of the monitors specified in the ``monmap``.
-  The time a monitor will spend in this state can vary. For instance, when on
-  a single-monitor cluster (never do this in production),
-  the monitor will pass through the probing state almost instantaneously.
-  In a multi-monitor cluster, the monitors will stay in this state until they
-  find enough monitors to form a quorum -- this means that if you have 2 out
-  of 3 monitors down, the one remaining monitor will stay in this state
-  indefinitely until you bring one of the other monitors up.
+  you start a monitor, the monitor will stay in this state for some time while
+  trying to connect to the rest of the monitors specified in the ``monmap``.
+  The time a monitor will spend in this state can vary. For instance, when on a
+  single-monitor cluster (never do this in production), the monitor will pass
+  through the probing state almost instantaneously. In a multi-monitor
+  cluster, the monitors will stay in this state until they find enough monitors
+  to form a quorum -- this means that if you have 2 out of 3 monitors down, the
+  one remaining monitor will stay in this state indefinitely until you bring
+  one of the other monitors up.

   If you have a quorum the starting daemon should be able to find the
   other monitors quickly, as long as they can be reached. If your
@@ -369,18 +389,18 @@ Can I increase the maximum tolerated clock skew?

 How do I know there's a clock skew?

-  The monitors will warn you via the cluster status ``HEALTH_WARN``. ``ceph health
-  detail`` or ``ceph status`` should show something like::
+  The monitors will warn you via the cluster status ``HEALTH_WARN``. ``ceph
+  health detail`` or ``ceph status`` should show something like::

      mon.c addr 10.10.0.1:6789/0 clock skew 0.08235s > max 0.05s (latency 0.0045s)

   That means that ``mon.c`` has been flagged as suffering from a clock skew.

-  On releases beginning with Luminous you can issue the
-  ``ceph time-sync-status`` command to check status. Note that the lead mon
-  is typically the one with the numerically lowest IP address. It will always
-  show ``0``: the reported offsets of other mons are relative to
-  the lead mon, not to any external reference source.
+  On releases beginning with Luminous you can issue the ``ceph
+  time-sync-status`` command to check status. Note that the lead mon is
+  typically the one with the numerically lowest IP address. It will always
+  show ``0``: the reported offsets of other mons are relative to the lead mon,
+  not to any external reference source.

 What should I do if there's a clock skew?
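+
+  As a first step, it can help to confirm that time synchronization is actually
+  working on each monitor node, both from Ceph's point of view and from the
+  node's. The following sketch assumes that ``chrony`` is the time daemon in
+  use; substitute the equivalent ``ntpq`` or ``timedatectl`` commands if it is
+  not:
+
+  .. prompt:: bash $
+
+     ceph time-sync-status
+     chronyc tracking
+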