exit(1);
}
-In the end, you'll want to close your IO context and connection to RADOS with :c:func:`rados_ioctx_destroy()` and :c:func:`rados_shutdown()`::
+In the end, you will want to close your IO context and connection to RADOS with :c:func:`rados_ioctx_destroy()` and :c:func:`rados_shutdown()`::
rados_ioctx_destroy(io);
rados_shutdown(cluster);
If Ceph Monitors discovered each other through the Ceph configuration file
instead of through the monmap, it would introduce additional risks because the
-Ceph configuration files aren't updated and distributed automatically. Ceph
+Ceph configuration files are not updated and distributed automatically. Ceph
Monitors might inadvertently use an older Ceph configuration file, fail to
recognize a Ceph Monitor, fall out of a quorum, or develop a situation where
-`Paxos`_ isn't able to determine the current state of the system accurately.
+`Paxos`_ is not able to determine the current state of the system accurately.
.. index:: Ceph Monitor; bootstrapping monitors
.. tip:: You SHOULD install NTP on your Ceph monitor hosts to
ensure that the monitor cluster operates with synchronized clocks.
-Clock drift may still be noticeable with NTP even though the discrepancy isn't
+Clock drift may still be noticeable with NTP even though the discrepancy is not
yet harmful. Ceph's clock drift / clock skew warnings may get triggered even
though NTP maintains a reasonable level of synchronization. Increasing your
clock drift may be tolerable under such circumstances; however, a number of
:Description: Initial number of worker threads used by each Async Messenger instance.
- Should be at least equal to highest number of replicas, but you can
- decrease it if you're low on CPU core count and/or you host a lot of
- OSDs on single server.
+ Should be at least equal to the highest number of replicas, but you can
+ decrease it if you are low on CPU cores and/or you host a lot of
+ OSDs on a single server.
:Type: 64-bit Unsigned Integer
:Required: No
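In ``ceph.conf`` this corresponds to the ``ms async op threads`` option
(shown here with an illustrative value of ``3``; verify the exact option
name and default against your release's configuration reference):

```ini
[global]
ms async op threads = 3
```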
workers #1 and #2 to CPU cores #0 and #2, respectively.
- NOTE: when manually setting affinity, make sure to not assign workers to
- processors that are virtual CPUs created as an effect of Hyperthreading
- or similar technology, because they're slower than regular CPU cores.
+ NOTE: when manually setting affinity, make sure not to assign workers to
+ processors that are virtual CPUs created by Hyperthreading
+ or similar technology, because they are slower than regular CPU cores.
:Type: String
:Required: No
:Default: ``(empty)``
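Continuing the example above (workers #1 and #2 pinned to CPU cores #0 and
#2), the setting might look like the following in ``ceph.conf``. The option
name ``ms async affinity cores`` and the comma-separated value format are
taken from older releases; confirm both against your version before relying
on them:

```ini
[global]
ms async affinity cores = 0,2
```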
.. note:: Ceph uses `CIDR`_ notation for subnets (e.g., ``10.0.0.0/24``).
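The ``CIDR`` notation in the note above can be sanity-checked with a short
sketch; Python's standard ``ipaddress`` module parses the same
``10.0.0.0/24`` form that Ceph expects:

```python
import ipaddress

# The example subnet from the note above: 10.0.0.0/24 covers
# 10.0.0.0 through 10.0.0.255 (256 addresses).
net = ipaddress.ip_network("10.0.0.0/24")

print(net.num_addresses)                          # 256
print(ipaddress.ip_address("10.0.0.42") in net)   # True
print(ipaddress.ip_address("10.0.1.1") in net)    # False
```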
-When you've configured your networks, you may restart your cluster or restart
+When you have configured your networks, you may restart your cluster or restart
each daemon. Ceph daemons bind dynamically, so you do not have to restart the
entire cluster at once if you change your network configuration.
``ms bind ipv6``
:Description: Enables Ceph daemons to bind to IPv6 addresses. Currently the
- messenger *either* uses IPv4 or IPv6, but it can't do both.
+ messenger uses *either* IPv4 or IPv6, but it cannot do both.
:Type: Boolean
:Default: ``false``
:Required: No
.. note:: If you have specified multiple monitors in the setup of the cluster,
- make sure, that all monitors are up and running. If the monitors haven't
- formed quorum, ``ceph-create-keys`` will not finish and the keys aren't
+ make sure that all monitors are up and running. If the monitors have not
+ formed quorum, ``ceph-create-keys`` will not finish and the keys are not
generated.
Forget Keys
If monitors discovered each other through the Ceph configuration file instead of
through the monmap, it would introduce additional risks because the Ceph
-configuration files aren't updated and distributed automatically. Monitors
+configuration files are not updated and distributed automatically. Monitors
might inadvertently use an older ``ceph.conf`` file, fail to recognize a
-monitor, fall out of a quorum, or develop a situation where `Paxos`_ isn't able
+monitor, fall out of a quorum, or develop a situation where `Paxos`_ is not able
to determine the current state of the system accurately. Consequently, making
changes to an existing monitor's IP address must be done with great care.
-After that, you can observe the data migration which should come to its
-end. The difference between marking ``out`` the OSD and reweighting it
+After that, you can observe the data migration, which should eventually
+complete. The difference between marking the OSD ``out`` and reweighting it
to 0 is that in the first case the weight of the bucket which contains
- the OSD isn't changed whereas in the second case the weight of the bucket
+ the OSD is not changed whereas in the second case the weight of the bucket
- is updated (and decreased of the OSD weight). The reweight command could
- be sometimes favoured in the case of a "small" cluster.
+ is updated (and decreased by the OSD weight). The reweight command may
+ sometimes be favoured in the case of a "small" cluster.
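The distinction can be illustrated with a toy model (plain Python, not the
real CRUSH code; the class and method names here are purely illustrative):

```python
# Toy model of a host bucket containing OSDs.  Marking an OSD "out"
# leaves the containing bucket's weight alone, while reweighting the
# OSD to 0 also subtracts the OSD's weight from the bucket.
class Bucket:
    def __init__(self, osd_weights):
        self.osd_weights = dict(osd_weights)      # per-OSD weights
        self.weight = sum(osd_weights.values())   # bucket weight = sum of items

    def mark_out(self, osd):
        # Data is re-placed, but the bucket weight is unchanged.
        self.osd_weights[osd] = 0.0

    def reweight_to_zero(self, osd):
        # The bucket weight is decreased by the OSD's weight.
        self.weight -= self.osd_weights[osd]
        self.osd_weights[osd] = 0.0

host = Bucket({"osd.0": 1.0, "osd.1": 1.0})
host.mark_out("osd.0")
print(host.weight)   # 2.0: bucket weight untouched

host = Bucket({"osd.0": 1.0, "osd.1": 1.0})
host.reweight_to_zero("osd.0")
print(host.weight)   # 1.0: decreased by the OSD weight
```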
and forces CRUSH to re-place (1-weight) of the data that would
otherwise live on this drive. It does not change the weights assigned
to the buckets above the OSD in the crush map, and is a corrective
-measure in case the normal CRUSH distribution isn't working out quite
+measure in case the normal CRUSH distribution is not working out quite
right. For instance, if one of your OSDs is at 90% and the others are
at 50%, you could reduce this weight to try and compensate for it. ::
The crush location for an OSD is normally expressed via the ``crush location``
config option being set in the ``ceph.conf`` file. Each time the OSD starts,
-it verifies it is in the correct location in the CRUSH map and, if it isn't,
+it verifies it is in the correct location in the CRUSH map and, if it is not,
-it moved itself. To disable this automatic CRUSH map management, add the
+it moves itself. To disable this automatic CRUSH map management, add the
following to your configuration file in the ``[osd]`` section::
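    osd crush update on start = false

(``osd crush update on start`` is the standard option for this behavior;
verify the name against your release's configuration reference.)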
#. A default of ``root=default host=HOSTNAME`` where the hostname is
generated with the ``hostname -s`` command.
-This isn't useful by itself, as the OSD itself has the exact same
+This is not useful by itself, as the OSD itself has the exact same
behavior. However, the script can be modified to provide additional
location fields (for example, the rack or datacenter), and then the
hook enabled via the config option::
When a Ceph Client reads or writes data, it always contacts the primary OSD in
the acting set. For set ``[2, 3, 4]``, ``osd.2`` is the primary. Sometimes an
-OSD isn't well suited to act as a primary compared to other OSDs (e.g., it has
+OSD is not well suited to act as a primary compared to other OSDs (e.g., it has
a slow disk or a slow controller). To prevent performance bottlenecks
(especially on read operations) while maximizing utilization of your hardware,
you can set a Ceph OSD's primary affinity so that CRUSH is less likely to use
finding the `placement group`_ and the underlying OSDs at root of the problem.
.. tip:: A fault in one part of the cluster may prevent you from accessing a
- particular object, but that doesn't mean that you can't access other objects.
+ particular object, but that does not mean that you cannot access other objects.
When you run into a fault, don't panic. Just follow the steps for monitoring
your OSDs and placement groups. Then, begin troubleshooting.
If the number of OSDs that are ``in`` the cluster is more than the number of
OSDs that are ``up``, execute the following command to identify the ``ceph-osd``
-daemons that aren't running::
+daemons that are not running::
ceph osd tree
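As a sketch of the bookkeeping involved (plain Python; the OSD ids are made
up for illustration):

```python
# Hypothetical cluster state: OSDs registered "in" the cluster map
# versus OSD daemons that are actually "up".
in_osds = {0, 1, 2, 3}
up_osds = {0, 1, 3}

# Every OSD that is "in" but not "up" has a ceph-osd daemon that is not
# running and should be investigated.
not_running = sorted(in_osds - up_osds)
print(not_running)   # [2]
```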
few cases:
- You are reaching your ``near full ratio`` or ``full ratio``.
-- Your data isn't getting distributed across the cluster due to an
+- Your data is not getting distributed across the cluster due to an
error in your CRUSH configuration.
current state. During that time period, the OSD may reflect a ``recovering``
state.
-Recovery isn't always trivial, because a hardware failure might cause a
+Recovery is not always trivial, because a hardware failure might cause a
cascading failure of multiple OSDs. For example, a network switch for a rack or
cabinet may fail, which can cause the OSDs of a number of host machines to fall
behind the current state of the cluster. Each one of the OSDs must recover once
requests when it is ready.
During the backfill operations, you may see one of several states:
-``backfill_wait`` indicates that a backfill operation is pending, but isn't
+``backfill_wait`` indicates that a backfill operation is pending, but is not
underway yet; ``backfill`` indicates that a backfill operation is underway;
-and, ``backfill_too_full`` indicates that a backfill operation was requested,
-but couldn't be completed due to insufficient storage capacity. When a
+and ``backfill_too_full`` indicates that a backfill operation was requested
+but could not be completed due to insufficient storage capacity. When a
-placement group can't be backfilled, it may be considered ``incomplete``.
+placement group cannot be backfilled, it may be considered ``incomplete``.
Ceph provides a number of settings to manage the load spike associated with
reassigning placement groups to an OSD (especially a new OSD). By default,
-----
While Ceph uses heartbeats to ensure that hosts and daemons are running, the
-``ceph-osd`` daemons may also get into a ``stuck`` state where they aren't
+``ceph-osd`` daemons may also get into a ``stuck`` state where they are not
reporting statistics in a timely manner (e.g., a temporary network fault). By
default, OSD daemons report their placement group, up thru, boot and failure
statistics every half second (i.e., ``0.5``), which is more frequent than the
Identifying Troubled PGs
========================
-As previously noted, a placement group isn't necessarily problematic just
-because its state isn't ``active+clean``. Generally, Ceph's ability to self
-repair may not be working when placement groups get stuck. The stuck states
+As previously noted, a placement group is not necessarily problematic just
+because its state is not ``active+clean``. Generally, Ceph's ability to
+self-repair may not be working when placement groups get stuck. The stuck states
include:
To set the number of placement groups in a pool, you must specify the
number of placement groups at the time you create the pool.
-See `Create a Pool`_ for details. Once you've set placement groups for a
+See `Create a Pool`_ for details. Once you have set placement groups for a
pool, you may increase the number of placement groups (but you cannot
decrease the number of placement groups). To increase the number of
placement groups, execute the following::
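    ceph osd pool set {pool-name} pg_num {pg_num}

(This is the standard command form; depending on your release you may also
need to raise ``pgp_num`` to match before rebalancing begins.)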
.. note:: The Ceph Object Gateway daemon (``radosgw``) is a client of the
- Ceph Storage Cluster, so it isn't represented as a Ceph Storage
+ Ceph Storage Cluster, so it is not represented as a Ceph Storage
Cluster daemon type.
The following entries describe each capability.
ceph auth export {TYPE.ID}
The ``auth export`` command is identical to ``auth get``, but also prints
-out the internal ``auid``, which isn't relevant to end users.
+out the internal ``auid``, which is not relevant to end users.
Second, make sure you are able to connect to ``mon.a``'s server from the
other monitors' servers. Check the ports as well. Check ``iptables`` on
- all your monitor nodes and make sure you're not dropping/rejecting
+ all your monitor nodes and make sure you are not dropping/rejecting
connections.
If this initial troubleshooting doesn't solve your problems, then it's
If you have a quorum, however, the monitor should be able to find the
remaining monitors pretty fast, as long as they can be reached. If your
- monitor is stuck probing and you've gone through with all the communication
+ monitor is stuck probing and you have gone through with all the communication
troubleshooting, then there is a fair chance that the monitor is trying
to reach the other monitors on a wrong address. ``mon_status`` outputs the
``monmap`` known to the monitor: check if the other monitor's locations
-`Clock Skews`_ for more infos on that. If all your clocks are properly
+`Clock Skews`_ for more information on that. If all your clocks are properly
synchronized, it is best if you prepare some logs and reach out to the
community. This is not a state that is likely to persist and aside from
- (*really*) old bugs there isn't an obvious reason besides clock skews on
- why this would happen.
+ (*really*) old bugs there is no obvious reason besides clock skews
+ why this would happen.
What if state is ``synchronizing``?
What if state is ``leader`` or ``peon``?
This should not happen. There is a chance this might happen however, and
- it has a lot to do with clock skews -- see `Clock Skews`_. If you're not
+ it has a lot to do with clock skews -- see `Clock Skews`_. If you are not
suffering from clock skews, then please prepare your logs (see
`Preparing your logs`_) and reach out to us.
$ ceph mon getmap -o /tmp/monmap
2. No quorum? Grab the monmap directly from another monitor (this
- assumes the monitor you're grabbing the monmap from has id ID-FOO
+ assumes the monitor you are grabbing the monmap from has id ID-FOO
and has been stopped)::
$ ceph-mon -i ID-FOO --extract-monmap /tmp/monmap
- 3. Stop the monitor you're going to inject the monmap into.
+ 3. Stop the monitor you are going to inject the monmap into.
4. Inject the monmap::
to ensure you have addressed any issues related to your kernel.
- **Segment Fault:** If there is a segment fault, turn your logging up
- (if it isn't already), and try again. If it segment faults again,
+ (if it is not already), and try again. If it segment faults again,
contact the ceph-devel email list and provide your Ceph configuration
file, your monitor output and the contents of your log file(s).
.. tip:: Newer versions of Ceph provide better recovery handling by preventing
recovering OSDs from using up system resources so that ``up`` and ``in``
- OSDs aren't available or are otherwise slow.
+ OSDs do not become unavailable or otherwise slow.
Networking Issues
We recommend using both a public (front-end) network and a cluster (back-end)
network so that you can better meet the capacity requirements of object
replication. Another advantage is that you can run a cluster network such that
-it isn't connected to the internet, thereby preventing some denial of service
+it is not connected to the internet, thereby preventing some denial of service
attacks. When OSDs peer and check heartbeats, they use the cluster (back-end)
network when it's available. See `Monitor/OSD Interaction`_ for details.
Fewer OSDs than Replicas
------------------------
-If you've brought up two OSDs to an ``up`` and ``in`` state, but you still
+If you have brought up two OSDs to an ``up`` and ``in`` state, but you still
don't see ``active + clean`` placement groups, you may have an
``osd pool default size`` set to greater than ``2``.
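If two replicas are all the hardware can hold, the default can be lowered in
``ceph.conf`` before pools are created (standard option shown below; for an
existing pool, use ``ceph osd pool set {pool-name} size 2`` instead):

```ini
[global]
osd pool default size = 2
```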
mapped to OSDs, a small number of placement groups will not distribute across
your cluster. Try creating a pool with a placement group count that is a
multiple of the number of OSDs. See `Placement Groups`_ for details. The default
-placement group count for pools isn't useful, but you can change it `here`_.
+placement group count for pools is not useful, but you can change it `here`_.
Can't Write Data