Understanding how to configure a :term:`Ceph Monitor` is an important part of
building a reliable :term:`Ceph Storage Cluster`. **All Ceph Storage Clusters
-have at least one monitor**. The monitor complement usually remains fairly
-consistent, but you can add, remove or replace a monitor in a cluster. See
+have at least one Monitor**. The Monitor complement usually remains fairly
+consistent, but you can add, remove or replace a Monitor in a cluster. See
:ref:`adding-and-removing-monitors` for details.
The Ceph Monitor's primary function is to maintain a master copy of the cluster
map. Monitors also provide authentication and logging services. All changes in
-the monitor services are written by the Ceph Monitor to a single Paxos
+the Monitor services are written by the Ceph Monitor to a single Paxos
instance, and Paxos writes the changes to a key/value store. This provides
strong consistency. Ceph Monitors are able to query the most recent version of
the cluster map during sync operations, and they use the key/value store's
Monitor Quorum
--------------
-Our Configuring ceph section provides a trivial `Ceph configuration file`_ that
-provides for one monitor in the test cluster. A cluster will run fine with a
-single monitor; however, **a single monitor is a single-point-of-failure**. To
+The *Configuring Ceph* section provides a trivial `Ceph configuration file`_ that
+provides for one Monitor in the test cluster. A cluster will run fine with a
+single Monitor; however, **a single Monitor is a single-point-of-failure**. To
ensure high availability in a production Ceph Storage Cluster, you should run
-Ceph with multiple monitors so that the failure of a single monitor **WILL NOT**
+Ceph with multiple Monitors so that the failure of a single Monitor **WILL NOT**
bring down your entire cluster.
When a Ceph Storage Cluster runs multiple Ceph Monitors for high availability,
Ceph Monitors use `Paxos`_ to establish consensus about the master cluster map.
-A consensus requires a majority of monitors running to establish a quorum for
+A consensus requires a majority of Monitors running to establish a quorum for
consensus about the cluster map (e.g., 1; 2 out of 3; 3 out of 5; 4 out of 6;
etc.).
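The majority rule above can be sketched in a few lines of Python. This is an
illustrative calculation only, not Ceph code; the helper names ``quorum_size``
and ``tolerated_failures`` are hypothetical.

.. code-block:: python

    def quorum_size(num_monitors: int) -> int:
        """Smallest number of Monitors that forms a strict majority."""
        if num_monitors < 1:
            raise ValueError("a cluster needs at least one Monitor")
        return num_monitors // 2 + 1


    def tolerated_failures(num_monitors: int) -> int:
        """Monitors that can fail while a majority is still running."""
        return num_monitors - quorum_size(num_monitors)


    # Reproduce the cases listed in the text: 1; 2 of 3; 3 of 5; 4 of 6.
    for n in (1, 3, 5, 6):
        print(f"{n} Monitors: quorum={quorum_size(n)}, "
              f"tolerated failures={tolerated_failures(n)}")

Under this rule, six Monitors tolerate no more failures than five (two in each
case), which is one reason odd Monitor counts are commonly chosen.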
Consistency
-----------
-When you add monitor settings to your Ceph configuration file, you need to be
+When you add Monitor settings to your Ceph configuration file, you need to be
aware of some of the architectural aspects of Ceph Monitors. **Ceph imposes
-strict consistency requirements** for a Ceph monitor when discovering another
-Ceph Monitor within the cluster. Although Ceph Clients and other Ceph daemons
-use the Ceph configuration file to discover monitors, monitors discover each
+strict consistency requirements** for a Ceph Monitor when discovering another
+Ceph Monitor within the cluster. Although Ceph clients and other Ceph daemons
+use the Ceph configuration file to discover Monitors, Monitors discover each
other using the monitor map (monmap), not the Ceph configuration file.
A Ceph Monitor always refers to the local copy of the monmap when discovering
other Ceph Monitors in the Ceph Storage Cluster. Using the monmap instead of the
Ceph configuration file avoids errors that could break the cluster (e.g., typos
-in ``ceph.conf`` when specifying a monitor address or port). Since monitors use
+in ``ceph.conf`` when specifying a Monitor address or port). Since Monitors use
monmaps for discovery and they share monmaps with clients and other Ceph
-daemons, **the monmap provides monitors with a strict guarantee that their
+daemons, **the monmap provides Monitors with a strict guarantee that their
consensus is valid.**
Strict consistency also applies to updates to the monmap. As with any other
updates on the Ceph Monitor, changes to the monmap always run through a
distributed consensus algorithm called `Paxos`_. The Ceph Monitors must agree on
each update to the monmap, such as adding or removing a Ceph Monitor, to ensure
-that each monitor in the quorum has the same version of the monmap. Updates to
+that each Monitor in the quorum has the same version of the monmap. Updates to
the monmap are incremental so that Ceph Monitors have the latest agreed upon
version, and a set of previous versions. Maintaining a history enables a Ceph
Monitor that has an older version of the monmap to catch up with the current
- **Filesystem ID**: The ``fsid`` is the unique identifier for your
object store. Since you can run multiple clusters on the same
hardware, you must specify the unique ID of the object store when
- bootstrapping a monitor. Deployment tools usually do this for you
+ bootstrapping a Monitor. Deployment tools usually do this for you
(e.g., ``cephadm`` can call a tool like ``uuidgen``), but you
may specify the ``fsid`` manually too.
-- **Monitor ID**: A monitor ID is a unique ID assigned to each monitor within
+- **Monitor ID**: A Monitor ID is a unique ID assigned to each Monitor within
the cluster. It is an alphanumeric value, and by convention the identifier
usually follows an alphabetical increment (e.g., ``a``, ``b``, etc.). This
can be set in a Ceph configuration file (e.g., ``[mon.a]``, ``[mon.b]``, etc.),
- by a deployment tool, or using the ``ceph`` commandline.
+ by a deployment tool, or using the ``ceph`` command line.
-- **Keys**: The monitor must have secret keys. A deployment tool such as
+- **Keys**: The Monitor must have secret keys. A deployment tool such as
``cephadm`` usually does this for you, but you may
perform this step manually too. See `Monitor Keyrings`_ for details.
====================
To apply configuration settings to the entire cluster, enter the configuration
-settings under ``[global]``. To apply configuration settings to all monitors in
+settings under ``[global]``. To apply configuration settings to all Monitors in
your cluster, enter the configuration settings under ``[mon]``. To apply
-configuration settings to specific monitors, specify the monitor instance
-(e.g., ``[mon.a]``). By convention, monitor instance names use alpha notation.
+configuration settings to specific Monitors, specify the Monitor instance
+(e.g., ``[mon.a]``). By convention, Monitor instance names use alpha notation.
.. code-block:: ini
Minimum Configuration
---------------------
-The bare minimum monitor settings for a Ceph monitor via the Ceph configuration
-file include a hostname and a network address for each monitor. You can configure
-these under ``[mon]`` or under the entry for a specific monitor.
+The bare minimum Monitor settings for a Ceph Monitor via the Ceph configuration
+file include a hostname and a network address for each Monitor. You can configure
+these under ``[mon]`` or under the entry for a specific Monitor.
.. code-block:: ini
See the `Network Configuration Reference`_ for details.
-.. note:: This minimum configuration for monitors assumes that a deployment
+.. note:: This minimum configuration for Monitors assumes that a deployment
tool generates the ``fsid`` and the ``mon.`` key for you.
Once you deploy a Ceph cluster, you **SHOULD NOT** change the IP addresses of
-monitors. However, if you decide to change the monitor's IP address, you
+Monitors. However, if you decide to change the Monitor's IP address, you
must follow a specific procedure. See :ref:`Changing a Monitor's IP address` for
details.
---------------
We recommend running a production Ceph Storage Cluster with at least three Ceph
-Monitors to ensure high availability. When you run multiple monitors, you may
-specify the initial monitors that must be members of the cluster in order to
+Monitors to ensure high availability. When you run multiple Monitors, you may
+specify the initial Monitors that must be members of the cluster in order to
establish a quorum. This may reduce the time it takes for your cluster to come
online.
store is co-located with the OSD Daemons.
In Ceph versions 0.58 and earlier, Ceph Monitors store their data in plain files. This
-approach allows users to inspect monitor data with common tools like ``ls``
+approach allows users to inspect Monitor data with common tools like ``ls``
and ``cat``. However, this approach didn't provide strong consistency.
In Ceph versions 0.59 and later, Ceph Monitors store their data as key/value
----------------
When a Ceph Storage Cluster gets close to its maximum capacity
-(see``mon_osd_full ratio``), Ceph prevents you from writing to or reading from OSDs
+(see ``mon_osd_full_ratio``), Ceph prevents you from writing to or reading from OSDs
as a safety measure to prevent data loss. Therefore, letting a
production Ceph Storage Cluster approach its full ratio is not a good practice,
because it sacrifices high availability. The default full ratio is ``.95``, or
and writing to a 3TB drive. So this exemplary Ceph Storage Cluster has a maximum
actual capacity of 99TB. With a ``mon_osd_full_ratio`` of ``0.95``, if the Ceph
Storage Cluster falls to 5TB of remaining capacity, the cluster will not allow
-Ceph Clients to read and write data. So the Ceph Storage Cluster's operating
+Ceph clients to read and write data. So the Ceph Storage Cluster's operating
capacity is 95TB, not 99TB.
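The arithmetic behind this example can be checked with a short Python sketch.
It is illustrative only: ``usable_capacity_tb`` is a hypothetical helper, not a
Ceph API, and it assumes that the full ratio applies uniformly to the 99TB of
raw capacity. The exact result is about 94TB usable with about 5TB held in
reserve, so the figures in the prose above are rounded.

.. code-block:: python

    def usable_capacity_tb(raw_tb: float, full_ratio: float) -> float:
        """Capacity that can be written before the full ratio is reached."""
        if not 0.0 < full_ratio <= 1.0:
            raise ValueError("full_ratio must be in the range (0, 1]")
        return raw_tb * full_ratio


    raw_tb = 99.0      # maximum actual capacity from the example above
    full_ratio = 0.95  # value of mon_osd_full_ratio used in the example

    usable = usable_capacity_tb(raw_tb, full_ratio)
    reserve = raw_tb - usable
    print(f"usable: {usable:.2f} TB, reserved: {reserve:.2f} TB")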
.. ditaa::
reasonable scenario involves a rack's router or power supply failing, which
brings down multiple OSDs simultaneously (e.g., OSDs 7-12). In such a scenario,
you should still strive for a cluster that can remain operational and achieve an
-``active + clean`` state--even if that means adding a few hosts with additional
+``active+clean`` state--even if that means adding a few hosts with additional
OSDs in short order. If your capacity utilization is too high, you may not lose
data, but you could still sacrifice data availability while resolving an outage
within a failure domain if capacity utilization of the cluster exceeds the full
.. tip:: These settings only apply during cluster creation. Afterwards they need
to be changed in the OSDMap using ``ceph osd set-nearfull-ratio`` and
- ``ceph osd set-full-ratio``
+ ``ceph osd set-full-ratio``.
.. index:: heartbeat
Heartbeat
---------
-Ceph monitors know about the cluster by requiring reports from each OSD, and by
+Ceph Monitors know about the cluster by requiring reports from each OSD, and by
receiving reports from OSDs about the status of their neighboring OSDs. Ceph
-provides reasonable default settings for monitor/OSD interaction; however, you
+provides reasonable default settings for Monitor/OSD interaction; however, you
may modify them as needed. See `Monitor/OSD Interaction`_ for details.
Monitor Store Synchronization
-----------------------------
-When you run a production cluster with multiple monitors (recommended), each
-monitor checks to see if a neighboring monitor has a more recent version of the
-cluster map (e.g., a map in a neighboring monitor with one or more epoch numbers
-higher than the most current epoch in the map of the instant monitor).
-Periodically, one monitor in the cluster may fall behind the other monitors to
+When you run a production cluster with multiple Monitors (recommended), each
+Monitor checks to see if a neighboring Monitor has a more recent version of the
+cluster map (e.g., a map in a neighboring Monitor with one or more epoch numbers
+higher than the most current epoch in the map of the instant Monitor).
+Periodically, one Monitor in the cluster may fall behind the other Monitors to
the point where it must leave the quorum, synchronize to retrieve the most
current information about the cluster, and then rejoin the quorum. For the
-purposes of synchronization, monitors may assume one of three roles:
+purposes of synchronization, Monitors may assume one of three roles:
-#. **Leader**: The `Leader` is the first monitor to achieve the most recent
+#. **Leader**: The `Leader` is the first Monitor to achieve the most recent
Paxos version of the cluster map.
-#. **Provider**: The `Provider` is a monitor that has the most recent version
+#. **Provider**: The `Provider` is a Monitor that has the most recent version
of the cluster map, but wasn't the first to achieve the most recent version.
-#. **Requester:** A `Requester` is a monitor that has fallen behind the leader
+#. **Requester**: A `Requester` is a Monitor that has fallen behind the leader
and must synchronize in order to retrieve the most recent information about
the cluster before it can rejoin the quorum.
These roles enable a leader to delegate synchronization duties to a provider,
which prevents synchronization requests from overloading the leader--improving
performance. In the following diagram, the requester has learned that it has
-fallen behind the other monitors. The requester asks the leader to synchronize,
+fallen behind the other Monitors. The requester asks the leader to synchronize,
and the leader tells the requester to synchronize with a provider.
| |
-Synchronization always occurs when a new monitor joins the cluster. During
-runtime operations, monitors may receive updates to the cluster map at different
-times. This means the leader and provider roles may migrate from one monitor to
+Synchronization always occurs when a new Monitor joins the cluster. During
+runtime operations, Monitors may receive updates to the cluster map at different
+times. This means the leader and provider roles may migrate from one Monitor to
another. If this happens while synchronizing (e.g., a provider falls behind the
leader), the provider can terminate synchronization with a requester.
-----
Ceph daemons pass critical messages to each other, which must be processed
-before daemons reach a timeout threshold. If the clocks in Ceph monitors
+before daemons reach a timeout threshold. If the clocks in Ceph Monitors
are not synchronized, it can lead to a number of anomalies. For example:
- Daemons ignoring received messages (e.g., timestamps outdated)
See `Monitor Store Synchronization`_ for details.
-.. tip:: You must configure NTP or PTP daemons on your Ceph monitor hosts to
- ensure that the monitor cluster operates with synchronized clocks.
- It can be advantageous to have monitor hosts sync with each other
+.. tip:: You must configure NTP or PTP daemons on your Ceph Monitor hosts to
+ ensure that the Monitor cluster operates with synchronized clocks.
+ It can be advantageous to have Monitor hosts sync with each other
as well as with multiple quality upstream time sources.
Clock drift may still be noticeable with NTP even though the discrepancy is not
Troubleshooting OSDs
======================
-Before troubleshooting the cluster's OSDs, check the monitors
+Before troubleshooting the cluster's OSDs, check the Monitors
and the network.
-First, determine whether the monitors have a quorum. Run the ``ceph health``
-command or the ``ceph -s`` command and if Ceph shows ``HEALTH_OK`` then there
-is a monitor quorum.
+First, determine whether the Monitors have a quorum. Run the ``ceph health``
+command or the ``ceph -s`` command, and if Ceph shows ``HEALTH_OK``, then there
+is a Monitor quorum.
-If the monitors don't have a quorum or if there are errors with the monitor
-status, address the monitor issues before proceeding by consulting the material
+If the Monitors don't have a quorum or if there are errors with the Monitor
+status, address the Monitor issues before proceeding by consulting the material
in :ref:`rados-troubleshooting-mon`.
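When a quorum exists but some Monitors are missing from it, the output of
``ceph quorum_status --format json`` can be compared against the monmap. The
following Python sketch is one possible way to do this. It assumes that the
JSON contains the ``quorum_names`` and ``monmap.mons[].name`` fields; the
helper name ``monitors_out_of_quorum`` and the sample data are hypothetical.

.. code-block:: python

    import json


    def monitors_out_of_quorum(quorum_status_json: str) -> list[str]:
        """Return names of Monitors in the monmap that are not in the quorum."""
        status = json.loads(quorum_status_json)
        all_names = {mon["name"] for mon in status["monmap"]["mons"]}
        return sorted(all_names - set(status["quorum_names"]))


    # Hypothetical sample: three Monitors in the monmap, two in the quorum.
    # On a real cluster the JSON would come from the ceph command instead.
    sample = json.dumps({
        "quorum_names": ["a", "b"],
        "monmap": {"mons": [{"name": "a"}, {"name": "b"}, {"name": "c"}]},
    })
    print(monitors_out_of_quorum(sample))

An empty list means that every Monitor in the monmap is in the quorum.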
Next, check your networks to make sure that they are running properly. Networks
kernel.pid_max = 4194303
-- **Check ``nf_conntrack``:** This connection-tracking and connection-limiting
+- **Check nf_conntrack:** This connection-tracking and connection-limiting
system causes problems for many production Ceph clusters. The problems often
emerge slowly and subtly. As cluster topology and client workload grow,
mysterious and intermittent connection failures and performance glitches
release notes for each Ceph version in order to make sure that you have
addressed any issues related to your kernel.
-- **Segment Fault:** If there is a segment fault, increase log levels and
- restart the problematic daemon(s). If segment faults recur, search the Ceph
- bug tracker `https://tracker.ceph/com/projects/ceph
+- **Segmentation Fault:** If there is a segmentation fault, increase log levels and
+ restart the problematic daemon(s). If segmentation faults recur, search the Ceph
+ bug tracker `https://tracker.ceph.com/projects/ceph
<https://tracker.ceph.com/projects/ceph/>`_ and the ``dev`` and
``ceph-users`` mailing list archives `https://ceph.io/resources
<https://ceph.io/resources>`_ to see if others have experienced and reported
these issues. If this truly is a new and unique failure, post to the ``dev``
email list and provide the following information: the specific Ceph release
- being run, ``ceph.conf`` (with secrets XXX'd out), your monitor status
+ being run, ``ceph.conf`` (with secrets XXX'd out), your Monitor status
output, and excerpts from your log file(s).
When an OSD fails, this means that a ``ceph-osd`` process is unresponsive or
has died and that the corresponding OSD has been marked ``down``. Surviving
-``ceph-osd`` daemons will report to the monitors that the OSD appears to be
+``ceph-osd`` daemons will report to the Monitors that the OSD appears to be
down, and a new status will be visible in the output of the ``ceph health``
command, as in the following example:
If the OSD problem is the result of a software error (for example, a failed
assertion or another unexpected error), search for reports of the issue in the
-`bug tracker <https://tracker.ceph/com/projects/ceph>`_ , the `dev mailing list
+`bug tracker <https://tracker.ceph.com/projects/ceph>`_, the `dev mailing list
archives <https://lists.ceph.io/hyperkitty/list/dev@ceph.io/>`_, and the
`ceph-users mailing list archives
<https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/>`_. If there is no
-------------------
If an OSD is full, Ceph prevents data loss by ensuring that no new data is
-written to the OSD. In an properly running cluster, health checks are raised
+written to the OSD. In a properly running cluster, health checks are raised
when the cluster's OSDs and pools approach certain "fullness" ratios. The
``mon_osd_full_ratio`` threshold defaults to ``0.95`` (or 95% of capacity):
this is the point above which clients are prevented from writing data. The
Drive Configuration
-------------------
-An SAS or SATA storage drive should house only one OSD, but a NVMe drive can
+An SAS or SATA storage drive should house only one OSD, but an NVMe drive can
easily house two or more. However, it is possible for read and write throughput
to bottleneck if other processes share the drive. Such processes include:
-journals / metadata, operating systems, Ceph monitors, ``syslog`` logs, other
+journals / metadata, operating systems, Ceph Monitors, ``syslog`` logs, other
OSDs, and non-Ceph processes.
Because Ceph acknowledges writes *after* journaling, fast SSDs are an
drive errors include ``dmesg``, ``syslog`` logs, and ``smartctl`` (found in the
``smartmontools`` package).
-.. note:: ``smartmontools`` 7.0 and late provides NVMe stat passthrough and
+.. note:: ``smartmontools`` 7.0 and later provides NVMe stat passthrough and
JSON output.
Co-resident Monitors/OSDs
-------------------------
-Although monitors are relatively lightweight processes, performance issues can
-result when monitors are run on the same host machine as an OSD. Monitors issue
+Although Monitors are relatively lightweight processes, performance issues can
+result when Monitors are run on the same host machine as an OSD. Monitors issue
many ``fsync()`` calls and this can interfere with other workloads. The danger
-of performance issues is especially acute when the monitors are co-resident on
-the same storage drive as an OSD. In addition, if the monitors are running an
+of performance issues is especially acute when the Monitors are co-resident on
+the same storage drive as an OSD. In addition, if the Monitors are running an
older kernel (pre-3.0) or a kernel with no ``syncfs(2)`` syscall, then multiple
OSDs running on the same host might make so many commits as to undermine each
other's performance. This problem sometimes results in what is called "the
provides the following benefits:
#. Segregation of (1) heartbeat traffic and replication/recovery traffic
- (private) from (2) traffic from clients and between OSDs and monitors
+ (private) from (2) traffic from clients and between OSDs and Monitors
(public). This helps keep one stream of traffic from DoS-ing the other,
which could in turn result in a cascading failure.
When a private network (or even a single host link) fails or degrades while the
public network continues operating normally, OSDs may not handle this situation
well. In such situations, OSDs use the public network to report each other
-``down`` to the monitors, while marking themselves ``up``. The monitors then
+``down`` to the Monitors, while marking themselves ``up``. The Monitors then
send out--again on the public network--an updated cluster map with the
-affected OSDs marked `down`. These OSDs reply to the monitors "I'm not dead
-yet!", and the cycle repeats. We call this scenario 'flapping`, and it can be
+affected OSDs marked ``down``. These OSDs reply to the Monitors "I'm not dead
+yet!", and the cycle repeats. We call this scenario 'flapping', and it can be
difficult to isolate and remediate. Without a private network, this irksome
dynamic is avoided: OSDs are generally either ``up`` or ``down`` without
flapping.
If something does cause OSDs to 'flap' (repeatedly being marked ``down`` and
-then ``up`` again), you can force the monitors to halt the flapping by
+then ``up`` again), you can force the Monitors to halt the flapping by
temporarily freezing their states:
.. prompt:: bash