# Python building things where it shouldn't
/src/python-common/build/
.cache
+
+# Doc build output
+src/pybind/cephfs/build/
+src/pybind/cephfs/cephfs.c
+src/pybind/cephfs/cephfs.egg-info/
+src/pybind/rados/build/
+src/pybind/rados/rados.c
+src/pybind/rados/rados.egg-info/
+src/pybind/rbd/build/
+src/pybind/rbd/rbd.c
+src/pybind/rbd/rbd.egg-info/
+src/pybind/rgw/build/
+src/pybind/rgw/rgw.c
+src/pybind/rgw/rgw.egg-info/
availability in the event that one of the monitor daemons or its host fails.
The Ceph monitor provides copies of the cluster map to storage cluster clients.
-A Ceph OSD Daemon checks its own state and the state of other OSDs and reports
+A Ceph OSD Daemon checks its own state and the state of other OSDs and reports
back to monitors.
A Ceph Manager serves as an endpoint for monitoring, orchestration, and plug-in
``librados``. The data received by the Ceph Storage Cluster is stored as RADOS
objects. Each object is stored on an :term:`Object Storage Device` (this is
also called an "OSD"). Ceph OSDs control read, write, and replication
-operations on storage drives. The default BlueStore back end stores objects
+operations on storage drives. The default BlueStore back end stores objects
in a monolithic, database-like fashion.
.. ditaa::
/------\ +-----+ +-----+
| obj |------>| {d} |------>| {s} |
\------/ +-----+ +-----+
-
+
Object OSD Drive
Ceph OSD Daemons store data as objects in a flat namespace. This means that
/------+------------------------------+----------------\
| ID | Binary Data | Metadata |
+------+------------------------------+----------------+
- | 1234 | 0101010101010100110101010010 | name1 = value1 |
+ | 1234 | 0101010101010100110101010010 | name1 = value1 |
| | 0101100001010100110101010010 | name2 = value2 |
| | 0101100001010100110101010010 | nameN = valueN |
- \------+------------------------------+----------------/
+ \------+------------------------------+----------------/
.. note:: An object ID is unique across the entire cluster, not just the local
filesystem.
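
For illustration only, the ``rados`` command line tool can store a file as a
RADOS object and attach name/value metadata to it. The pool name ``mypool``,
the object name ``hello-object``, and the input file are placeholders, and the
pool is assumed to exist already:

.. prompt:: bash $

   rados -p mypool put hello-object ./hello.txt
   rados -p mypool setxattr hello-object name1 value1
   rados -p mypool listxattr hello-object
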
the address, and the TCP port of each monitor. The monitor map specifies the
current epoch, the time of the monitor map's creation, and the time of the
monitor map's last modification. To view a monitor map, run ``ceph mon
- dump``.
-
+ dump``.
+
#. **The OSD Map:** Contains the cluster ``fsid``, the time of the OSD map's
creation, the time of the OSD map's last modification, a list of pools, a
list of replica sizes, a list of PG numbers, and a list of OSDs and their
statuses (for example, ``up``, ``in``). To view an OSD map, run ``ceph
- osd dump``.
-
+ osd dump``.
+
#. **The PG Map:** Contains the PG version, its time stamp, the last OSD map
epoch, the full ratios, and the details of each placement group. This
includes the PG ID, the `Up Set`, the `Acting Set`, the state of the PG (for
{decomp-crushmap-filename}``. Use a text editor or ``cat`` to view the
decompiled map.
-#. **The MDS Map:** Contains the current MDS map epoch, when the map was
- created, and the last time it changed. It also contains the pool for
+#. **The MDS Map:** Contains the current MDS map epoch, when the map was
+ created, and the last time it changed. It also contains the pool for
storing metadata, a list of metadata servers, and which metadata servers
are ``up`` and ``in``. To view an MDS map, execute ``ceph fs dump``.
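
As a convenience recap (not a new procedure), the map-viewing commands named
above can be run together when inspecting a cluster; the CRUSH map file names
used here are arbitrary:

.. prompt:: bash $

   ceph mon dump
   ceph osd dump
   ceph pg dump
   ceph osd getcrushmap -o crushmap.bin
   crushtool -d crushmap.bin -o crushmap.txt
   ceph fs dump
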
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The ``cephx`` authentication system is used by Ceph to authenticate users and
-daemons and to protect against man-in-the-middle attacks.
+daemons and to protect against man-in-the-middle attacks.
-.. note:: The ``cephx`` protocol does not address data encryption in transport
+.. note:: The ``cephx`` protocol does not address data encryption in transport
(for example, SSL/TLS) or encryption at rest.
``cephx`` uses shared secret keys for authentication. This means that both the
-client and the monitor cluster keep a copy of the client's secret key.
+client and the monitor cluster keep a copy of the client's secret key.
The ``cephx`` protocol makes it possible for each party to prove to the other
that it has a copy of the key without revealing it. This provides mutual
connections. The ``cephx`` authentication system establishes and sustains these
authenticated connections.
-The ``cephx`` protocol operates in a manner similar to `Kerberos`_.
+The ``cephx`` protocol operates in a manner similar to `Kerberos`_.
A user invokes a Ceph client to contact a monitor. Unlike Kerberos, each
monitor can authenticate users and distribute keys, which means that there is
monitors, and the monitors provide the client with a ticket that authenticates
the client against the OSDs that actually handle data. Ceph Monitors and OSDs
share a secret, which means that the clients can use the ticket provided by the
-monitors to authenticate against any OSD or metadata server in the cluster.
+monitors to authenticate against any OSD or metadata server in the cluster.
Like Kerberos tickets, ``cephx`` tickets expire. An attacker cannot use an
expired ticket or session key that has been obtained surreptitiously. This form
transmits the user's secret back to the ``client.admin`` user. This means that
the client and the monitor share a secret key.
-.. note:: The ``client.admin`` user must provide the user ID and
- secret key to the user in a secure manner.
+.. note:: The ``client.admin`` user must provide the user ID and
+ secret key to the user in a secure manner.
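
A minimal sketch of that workflow from the administrator's side; the user name
``client.alice``, the pool name, and the capabilities shown are illustrative
only:

.. prompt:: bash $

   ceph auth get-or-create client.alice mon 'allow r' osd 'allow rw pool=mypool'
   ceph auth get client.alice
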
.. ditaa::
| request to |
| create a user |
|-------------->|----------+ create user
- | | | and
+ | | | and
|<--------------|<---------+ store key
| transmit key |
| |
+---------+ +---------+
| authenticate |
|-------------->|----------+ generate and
- | | | encrypt
+ | | | encrypt
|<--------------|<---------+ session key
| transmit |
| encrypted |
| session key |
- | |
+ | |
|-----+ decrypt |
- | | session |
- |<----+ key |
+ | | session |
+ |<----+ key |
| |
| req. ticket |
|-------------->|----------+ generate and
- | | | encrypt
+ | | | encrypt
|<--------------|<---------+ ticket
| recv. ticket |
- | |
+ | |
|-----+ decrypt |
- | | ticket |
- |<----+ |
+ | | ticket |
+ |<----+ |
The ``cephx`` protocol authenticates ongoing communications between the clients
| Client | | Monitor | | MDS | | OSD |
+---------+ +---------+ +-------+ +-------+
| request to | | |
- | create a user | | |
+ | create a user | | |
|-------------->| mon and | |
|<--------------| client share | |
| receive | a secret. | |
| |<------------>| |
| |<-------------+------------>|
| | mon, mds, | |
- | authenticate | and osd | |
+ | authenticate | and osd | |
|-------------->| share | |
|<--------------| a secret | |
| session key | | |
| receive response (CephFS only) |
| |
| make request |
- |------------------------------------------->|
+ |------------------------------------------->|
|<-------------------------------------------|
receive response
accesses the Ceph client from a remote host, cephx authentication will not be
applied to the connection between the user's host and the client host.
-See `Cephx Config Guide`_ for more on configuration details.
+See `Cephx Config Guide`_ for more on configuration details.
See `User Management`_ for more on user management.
Monitors receive no such message after a configurable period of time,
then they mark the OSD ``down``. This mechanism is a failsafe, however.
Normally, Ceph OSD Daemons determine if a neighboring OSD is ``down`` and
- report it to the Ceph Monitors. This contributes to making Ceph Monitors
+ report it to the Ceph Monitors. This contributes to making Ceph Monitors
lightweight processes. See `Monitoring OSDs`_ and `Heartbeats`_ for
additional details.
Write (2) | | | | Write (3)
+------+ | | +------+
| +------+ +------+ |
- | | Ack (4) Ack (5)| |
+ | | Ack (4) Ack (5)| |
v * * v
+---------------+ +---------------+
| Secondary OSD | | Tertiary OSD |
The Ceph storage system supports the notion of 'Pools', which are logical
partitions for storing objects.
-
+
Ceph Clients retrieve a `Cluster Map`_ from a Ceph Monitor, and write RADOS
objects to pools. The way that Ceph places the data in the pools is determined
by the pool's ``size`` or number of replicas, the CRUSH rule, and the number of
+--------+ +---------------+
| Pool |---------->| CRUSH Rule |
+--------+ Selects +---------------+
-
+
Pools set at least the following parameters:
- Ownership/Access to Objects
-- The Number of Placement Groups, and
+- The Number of Placement Groups, and
- The CRUSH Rule to Use.
See `Set Pool Values`_ for details.
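
As a sketch of how these parameters are set in practice (the pool name,
placement-group count, and replica count below are arbitrary examples):

.. prompt:: bash $

   ceph osd pool create mypool 128
   ceph osd pool set mypool size 3
   ceph osd pool get mypool crush_rule
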
Each pool has a number of placement groups (PGs) within it. CRUSH dynamically
maps PGs to OSDs. When a Ceph Client stores objects, CRUSH maps each RADOS
-object to a PG.
+object to a PG.
This mapping of RADOS objects to PGs implements an abstraction and indirection
layer between Ceph OSD Daemons and Ceph Clients. The Ceph Storage Cluster must
be able to grow (or shrink) and redistribute data adaptively when the internal
-topology changes.
+topology changes.
If the Ceph Client "knew" which Ceph OSD Daemons were storing which objects, a
tight coupling would exist between the Ceph Client and the Ceph OSD Daemon.
+------+------+-------------+ |
| | | |
v v v v
- /----------\ /----------\ /----------\ /----------\
+ /----------\ /----------\ /----------\ /----------\
| | | | | | | |
| OSD #1 | | OSD #2 | | OSD #3 | | OSD #4 |
| | | | | | | |
- \----------/ \----------/ \----------/ \----------/
+ \----------/ \----------/ \----------/ \----------/
The client uses its copy of the cluster map and the CRUSH algorithm to compute
precisely which OSD it will use when reading or writing a particular object.
the `Cluster Map`_. When a client has been equipped with a copy of the cluster
map, it is aware of all the monitors, OSDs, and metadata servers in the
cluster. **However, even equipped with a copy of the latest version of the
-cluster map, the client doesn't know anything about object locations.**
+cluster map, the client doesn't know anything about object locations.**
**Object locations must be computed.**
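
You can watch this computation happen: ``ceph osd map`` performs the CRUSH
calculation for a named object and reports the placement group and OSDs that
the object maps to. The pool and object names here are placeholders:

.. prompt:: bash $

   ceph osd map mypool hello-object
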
section.
.. Note:: PGs that agree on the state of the cluster do not necessarily have
- the current data yet.
+ the current data yet.
The Ceph Storage Cluster was designed to store at least two copies of an object
(that is, ``size = 2``), which is the minimum requirement for data safety. For
The Ceph OSD daemons that are part of an *Acting Set* might not always be
``up``. When an OSD in the *Acting Set* is ``up``, it is part of the *Up Set*.
The *Up Set* is an important distinction, because Ceph can remap PGs to other
-Ceph OSD Daemons when an OSD fails.
+Ceph OSD Daemons when an OSD fails.
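
To see the *Up Set* and *Acting Set* that Ceph has computed for a particular
placement group, query the PG directly (the PG ID below is a placeholder):

.. prompt:: bash $

   ceph pg map 1.6c
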
.. note:: Consider a hypothetical *Acting Set* for a PG that contains
``osd.25``, ``osd.32`` and ``osd.61``. The first OSD (``osd.25``), is the
large clusters) where some, but not all of the PGs migrate from existing OSDs
(OSD 1, and OSD 2) to the new OSD (OSD 3). Even when rebalancing, CRUSH is
stable. Many of the placement groups remain in their original configuration,
-and each OSD gets some added capacity, so there are no load spikes on the
+and each OSD gets some added capacity, so there are no load spikes on the
new OSD after rebalancing is complete.
| | | |
| +-------+-------+ |
| ^ |
- | | |
+ | | |
| | |
+--+---+ +------+ +---+--+ +---+--+
name | NYAN | | NYAN | | NYAN | | NYAN |
.. ditaa::
Primary OSD
-
+
+-------------+
| OSD 1 | +-------------+
| log | Write Full | |
.. ditaa::
Primary OSD
-
+
+-------------+
| OSD 1 |
| log |
| +----+ +<------------+ Ceph Client |
| | v2 | |
| +----+ | +-------------+
- | |D1v1| 1,1 |
- | +----+ |
- +------+------+
- |
- |
+ | |D1v1| 1,1 |
+ | +----+ |
+ +------+------+
+ |
+ |
| +------+------+
| | OSD 2 |
| +------+ | log |
.. ditaa::
Primary OSD
-
+
+-------------+
| OSD 1 |
| log |
| +----+ +<------------+ Ceph Client |
| | v2 | |
| +----+ | +-------------+
- | |D1v1| 1,1 |
- | +----+ |
- +------+------+
- |
+ | |D1v1| 1,1 |
+ | +----+ |
+ +------+------+
+ |
| +-------------+
| | OSD 2 |
| | log |
| | |D2v1| 1,1 |
| | +----+ |
| +-------------+
- |
+ |
| +-------------+
| | OSD 3 |
| | log |
.. ditaa::
Primary OSD
-
+
+-------------+
| OSD 1 |
| log |
| (down) |
| c333 |
+------+------+
- |
+ |
| +-------------+
| | OSD 2 |
| | log |
| | +----+ |
| | |
| +-------------+
- |
+ |
| +-------------+
| | OSD 3 |
| | log |
| 1,1 |
| |
+------+------+
-
+
The log entry 1,2 found on **OSD 3** is divergent from the new authoritative log
provided by **OSD 4**: it is discarded and the file containing the ``C1v2``
chunk is removed. The ``D1v1`` chunk is rebuilt with the ``decode`` function of
-the erasure coding library during scrubbing and stored on the new primary
+the erasure coding library during scrubbing and stored on the new primary
**OSD 4**.
.. ditaa::
Primary OSD
-
+
+-------------+
| OSD 4 |
| log |
or relatively slower/cheaper devices configured to act as an economical storage
tier. The Ceph objecter handles where to place the objects and the tiering
agent determines when to flush objects from the cache to the backing storage
-tier. So the cache tier and the backing storage tier are completely transparent
+tier. So the cache tier and the backing storage tier are completely transparent
to Ceph clients.
| Ceph Client |
+------+------+
^
- Tiering is |
+ Tiering is |
Transparent | Faster I/O
to Ceph | +---------------+
- Client Ops | | |
+ Client Ops | | |
| +----->+ Cache Tier |
| | | |
| | +-----+---+-----+
- | | | ^
+ | | | ^
v v | | Active Data in Cache Tier
+------+----+--+ | |
| Objecter | | |
A Ceph class for a content management system that presents pictures of a
particular size and aspect ratio could take an inbound bitmap image, crop it
- to a particular aspect ratio, resize it and embed an invisible copyright or
- watermark to help protect the intellectual property; then, save the
+ to a particular aspect ratio, resize it and embed an invisible copyright or
+ watermark to help protect the intellectual property; then, save the
resulting bitmap image to the object store.
-See ``src/objclass/objclass.h``, ``src/fooclass.cc`` and ``src/barclass`` for
+See ``src/objclass/objclass.h``, ``src/fooclass.cc`` and ``src/barclass`` for
exemplary implementations.
+----------+ +----------+ +----------+ +---------------+
| | | |
| | | |
- | | Watch Object | |
+ | | Watch Object | |
|--------------------------------------------------->|
| | | |
|<---------------------------------------------------|
| | | |
| | |<-----------------|
| | | Ack/Commit |
- | | Notify | |
+ | | Notify | |
|--------------------------------------------------->|
| | | |
|<---------------------------------------------------|
| | Notify | |
| | |<-----------------|
| | | Notify |
- | | Ack | |
+ | | Ack | |
|----------------+---------------------------------->|
| | | |
| | Ack | |
| | | |
| | | Ack |
| | |----------------->|
- | | | |
+ | | | |
|<---------------+----------------+------------------|
| Complete
reliability of n-way RAID mirroring and faster recovery.
Ceph provides three types of clients: Ceph Block Device, Ceph File System, and
-Ceph Object Storage. A Ceph Client converts its data from the representation
+Ceph Object Storage. A Ceph Client converts its data from the representation
format it provides to its users (a block device image, RESTful objects, CephFS
-filesystem directories) into objects for storage in the Ceph Storage Cluster.
+filesystem directories) into objects for storage in the Ceph Storage Cluster.
-.. tip:: The objects Ceph stores in the Ceph Storage Cluster are not striped.
- Ceph Object Storage, Ceph Block Device, and the Ceph File System stripe their
- data over multiple Ceph Storage Cluster objects. Ceph Clients that write
+.. tip:: The objects Ceph stores in the Ceph Storage Cluster are not striped.
+ Ceph Object Storage, Ceph Block Device, and the Ceph File System stripe their
+ data over multiple Ceph Storage Cluster objects. Ceph Clients that write
directly to the Ceph Storage Cluster via ``librados`` must perform the
striping (and parallel I/O) for themselves to obtain these benefits.
| End cCCC | | End cCCC |
| Object 0 | | Object 1 |
\-----------/ \-----------/
-
+
If you anticipate large image sizes, large S3 or Swift objects (e.g., video),
or large CephFS directories, you may see considerable read/write performance
+-----------------+--------+--------+-----------------+
| | | | +--\
v v v v |
- /-----------\ /-----------\ /-----------\ /-----------\ |
+ /-----------\ /-----------\ /-----------\ /-----------\ |
| Begin cCCC| | Begin cCCC| | Begin cCCC| | Begin cCCC| |
| Object 0 | | Object 1 | | Object 2 | | Object 3 | |
+-----------+ +-----------+ +-----------+ +-----------+ |
| stripe | | stripe | | stripe | | stripe | |
| unit 0 | | unit 1 | | unit 2 | | unit 3 | |
+-----------+ +-----------+ +-----------+ +-----------+ |
- | stripe | | stripe | | stripe | | stripe | +-\
+ | stripe | | stripe | | stripe | | stripe | +-\
| unit 4 | | unit 5 | | unit 6 | | unit 7 | | Object
- +-----------+ +-----------+ +-----------+ +-----------+ +- Set
+ +-----------+ +-----------+ +-----------+ +-----------+ +- Set
| stripe | | stripe | | stripe | | stripe | | 1
| unit 8 | | unit 9 | | unit 10 | | unit 11 | +-/
+-----------+ +-----------+ +-----------+ +-----------+ |
| unit 12 | | unit 13 | | unit 14 | | unit 15 | |
+-----------+ +-----------+ +-----------+ +-----------+ |
| End cCCC | | End cCCC | | End cCCC | | End cCCC | |
- | Object 0 | | Object 1 | | Object 2 | | Object 3 | |
+ | Object 0 | | Object 1 | | Object 2 | | Object 3 | |
\-----------/ \-----------/ \-----------/ \-----------/ |
|
+--/
-
+
+--\
|
- /-----------\ /-----------\ /-----------\ /-----------\ |
+ /-----------\ /-----------\ /-----------\ /-----------\ |
| Begin cCCC| | Begin cCCC| | Begin cCCC| | Begin cCCC| |
- | Object 4 | | Object 5 | | Object 6 | | Object 7 | |
+ | Object 4 | | Object 5 | | Object 6 | | Object 7 | |
+-----------+ +-----------+ +-----------+ +-----------+ |
| stripe | | stripe | | stripe | | stripe | |
| unit 16 | | unit 17 | | unit 18 | | unit 19 | |
+-----------+ +-----------+ +-----------+ +-----------+ |
- | stripe | | stripe | | stripe | | stripe | +-\
+ | stripe | | stripe | | stripe | | stripe | +-\
| unit 20 | | unit 21 | | unit 22 | | unit 23 | | Object
+-----------+ +-----------+ +-----------+ +-----------+ +- Set
- | stripe | | stripe | | stripe | | stripe | | 2
+ | stripe | | stripe | | stripe | | stripe | | 2
| unit 24 | | unit 25 | | unit 26 | | unit 27 | +-/
+-----------+ +-----------+ +-----------+ +-----------+ |
| stripe | | stripe | | stripe | | stripe | |
| unit 28 | | unit 29 | | unit 30 | | unit 31 | |
+-----------+ +-----------+ +-----------+ +-----------+ |
| End cCCC | | End cCCC | | End cCCC | | End cCCC | |
- | Object 4 | | Object 5 | | Object 6 | | Object 7 | |
+ | Object 4 | | Object 5 | | Object 6 | | Object 7 | |
\-----------/ \-----------/ \-----------/ \-----------/ |
|
+--/
-Three important variables determine how Ceph stripes data:
+Three important variables determine how Ceph stripes data:
- **Object Size:** Objects in the Ceph Storage Cluster have a maximum
configurable size (e.g., 2MB, 4MB, etc.). The object size should be large
the stripe unit.
- **Stripe Width:** Stripes have a configurable unit size (e.g., 64kb).
- The Ceph Client divides the data it will write to objects into equally
- sized stripe units, except for the last stripe unit. A stripe width,
- should be a fraction of the Object Size so that an object may contain
+ The Ceph Client divides the data it will write to objects into equally
+ sized stripe units, except for the last stripe unit. A stripe width
+ should be a fraction of the Object Size so that an object may contain
many stripe units.
- **Stripe Count:** The Ceph Client writes a sequence of stripe units
- over a series of objects determined by the stripe count. The series
- of objects is called an object set. After the Ceph Client writes to
+ over a series of objects determined by the stripe count. The series
+ of objects is called an object set. After the Ceph Client writes to
the last object in the object set, it returns to the first object in
the object set.
-
+
.. important:: Test the performance of your striping configuration before
putting your cluster into production. You CANNOT change these striping
parameters after you stripe the data and write it to objects.
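
For RBD images, for example, these striping parameters can be supplied when an
image is created. The pool and image names and the particular values below are
illustrative only, not recommendations:

.. prompt:: bash $

   rbd create mypool/myimage --size 10G --object-size 4M --stripe-unit 64K --stripe-count 4
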
Once the Ceph Client has striped data to stripe units and mapped the stripe
units to objects, Ceph's CRUSH algorithm maps the objects to placement groups,
-and the placement groups to Ceph OSD Daemons before the objects are stored as
+and the placement groups to Ceph OSD Daemons before the objects are stored as
files on a storage drive.
.. note:: Since a client writes to a single pool, all data striped into objects
that uses ``librbd`` directly--avoiding the kernel object overhead for
virtualized systems.
-- **Object Storage:** The :term:`Ceph Object Storage` (a.k.a., RGW) service
+- **Object Storage:** The :term:`Ceph Object Storage` (a.k.a., RGW) service
provides RESTful APIs with interfaces that are compatible with Amazon S3
- and OpenStack Swift.
-
-- **Filesystem**: The :term:`Ceph File System` (CephFS) service provides
- a POSIX compliant filesystem usable with ``mount`` or as
+ and OpenStack Swift.
+
+- **Filesystem**: The :term:`Ceph File System` (CephFS) service provides
+ a POSIX compliant filesystem usable with ``mount`` or as
a filesystem in user space (FUSE).
Ceph can run additional instances of OSDs, MDSs, and monitors for scalability
and high availability. The following diagram depicts the high-level
-architecture.
+architecture.
.. ditaa::
+--------------+ +----------------+ +-------------+
| Block Device | | Object Storage | | CephFS |
- +--------------+ +----------------+ +-------------+
+ +--------------+ +----------------+ +-------------+
+--------------+ +----------------+ +-------------+
| librbd | | librgw | | libcephfs |
.. topic:: S3/Swift Objects and Store Cluster Objects Compared
Ceph's Object Storage uses the term *object* to describe the data it stores.
- S3 and Swift objects are not the same as the objects that Ceph writes to the
+ S3 and Swift objects are not the same as the objects that Ceph writes to the
Ceph Storage Cluster. Ceph Object Storage objects are mapped to Ceph Storage
- Cluster objects. The S3 and Swift objects do not necessarily
- correspond in a 1:1 manner with an object stored in the storage cluster. It
+ Cluster objects. The S3 and Swift objects do not necessarily
+ correspond in a 1:1 manner with an object stored in the storage cluster. It
is possible for an S3 or Swift object to map to multiple Ceph objects.
See `Ceph Object Storage`_ for details.
distributed, and the placement groups are spread across separate ``ceph-osd``
daemons throughout the cluster.
-.. important:: Striping allows RBD block devices to perform better than a single
+.. important:: Striping allows RBD block devices to perform better than a single
server could!
Thin-provisioned snapshottable Ceph Block Devices are an attractive option for
QEMU/KVM, where the host machine uses ``librbd`` to provide a block device
service to the guest. Many cloud computing stacks use ``libvirt`` to integrate
with hypervisors. You can use thin-provisioned Ceph Block Devices with QEMU and
-``libvirt`` to support OpenStack and CloudStack among other solutions.
+``libvirt`` to support OpenStack, OpenNebula and CloudStack
+among other solutions.
While we do not provide ``librbd`` support with other hypervisors at this time,
you may also use Ceph Block Device kernel objects to provide a block device to a
+-----------------------+ +------------------------+
| CephFS Kernel Object | | CephFS FUSE |
- +-----------------------+ +------------------------+
+ +-----------------------+ +------------------------+
+---------------------------------------------------+
| CephFS Library (libcephfs) |
and storing the file data in one or more objects in the Ceph Storage Cluster.
The Ceph filesystem aims for POSIX compatibility. ``ceph-mds`` can run as a
single process, or it can be distributed out to multiple physical machines,
-either for high availability or for scalability.
+either for high availability or for scalability.
-- **High Availability**: The extra ``ceph-mds`` instances can be `standby`,
+- **High Availability**: The extra ``ceph-mds`` instances can be `standby`,
ready to take over the duties of any failed ``ceph-mds`` that was
`active`. This is easy because all the data, including the journal, is
stored on RADOS. The transition is triggered automatically by ``ceph-mon``.
Installing Ceph
===============
-There are multiple ways to install Ceph.
+There are multiple ways to install Ceph.
Recommended methods
~~~~~~~~~~~~~~~~~~~
:ref:`Cephadm <cephadm_deploying_new_cluster>` is a tool that can be used to
-install and manage a Ceph cluster.
+install and manage a Ceph cluster.
* cephadm supports only Octopus and newer releases.
* cephadm is fully integrated with the orchestration API and fully supports the
`github.com/openstack/puppet-ceph <https://github.com/openstack/puppet-ceph>`_ installs Ceph via Puppet.
+`OpenNebula HCI clusters <https://docs.opennebula.io/stable/provision_clusters/hci_clusters/overview.html>`_ deploys Ceph on various cloud platforms.
+
Ceph can also be :ref:`installed manually <install-manual>`.
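
A minimal sketch of bootstrapping a new cluster with cephadm; replace
``{mon-ip}`` with the IP address of the first host:

.. prompt:: bash $

   cephadm bootstrap --mon-ip {mon-ip}
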
Ceph's block devices deliver high performance with vast scalability to
`kernel modules`_, or to :abbr:`KVMs (kernel virtual machines)` such as `QEMU`_, and
-cloud-based computing systems like `OpenStack`_ and `CloudStack`_ that rely on
-libvirt and QEMU to integrate with Ceph block devices. You can use the same cluster
-to operate the :ref:`Ceph RADOS Gateway <object-gateway>`, the
+cloud-based computing systems like `OpenStack`_, `OpenNebula`_ and `CloudStack`_
+that rely on libvirt and QEMU to integrate with Ceph block devices. You can use
+the same cluster to operate the :ref:`Ceph RADOS Gateway <object-gateway>`, the
:ref:`Ceph File System <ceph-file-system>`, and Ceph block devices simultaneously.
.. important:: To use Ceph Block Devices, you must have access to a running
.. _kernel modules: ./rbd-ko/
.. _QEMU: ./qemu-rbd/
.. _OpenStack: ./rbd-openstack
+.. _OpenNebula: https://docs.opennebula.io/stable/open_cluster_deployment/storage_setup/ceph_ds.html
.. _CloudStack: ./rbd-cloudstack
.. index:: Ceph Block Device; libvirt
-The ``libvirt`` library creates a virtual machine abstraction layer between
-hypervisor interfaces and the software applications that use them. With
-``libvirt``, developers and system administrators can focus on a common
+The ``libvirt`` library creates a virtual machine abstraction layer between
+hypervisor interfaces and the software applications that use them. With
+``libvirt``, developers and system administrators can focus on a common
management framework, common API, and common shell interface (i.e., ``virsh``)
-to many different hypervisors, including:
+to many different hypervisors, including:
- QEMU/KVM
- XEN
Ceph block devices support QEMU/KVM. You can use Ceph block devices with
software that interfaces with ``libvirt``. The following stack diagram
-illustrates how ``libvirt`` and QEMU use Ceph block devices via ``librbd``.
+illustrates how ``libvirt`` and QEMU use Ceph block devices via ``librbd``.
.. ditaa::
The most common ``libvirt`` use case involves providing Ceph block devices to
-cloud solutions like OpenStack or CloudStack. The cloud solution uses
+cloud solutions like OpenStack, OpenNebula or CloudStack. The cloud solution uses
``libvirt`` to interact with QEMU/KVM, and QEMU/KVM interacts with Ceph block
-devices via ``librbd``. See `Block Devices and OpenStack`_ and `Block Devices
-and CloudStack`_ for details. See `Installation`_ for installation details.
+devices via ``librbd``. See `Block Devices and OpenStack`_,
+`Block Devices and OpenNebula`_ and `Block Devices and CloudStack`_ for details.
+See `Installation`_ for installation details.
You can also use Ceph block devices with ``libvirt``, ``virsh`` and the
``libvirt`` API. See `libvirt Virtualization API`_ for details.
To configure Ceph for use with ``libvirt``, perform the following steps:
-#. `Create a pool`_. The following example uses the
+#. `Create a pool`_. The following example uses the
pool name ``libvirt-pool``.::
ceph osd pool create libvirt-pool
- Verify the pool exists. ::
+ Verify the pool exists. ::
ceph osd lspools
and references ``libvirt-pool``. ::
ceph auth get-or-create client.libvirt mon 'profile rbd' osd 'profile rbd pool=libvirt-pool'
-
- Verify the name exists. ::
-
+
+ Verify the name exists. ::
+
ceph auth ls
- **NOTE**: ``libvirt`` will access Ceph using the ID ``libvirt``,
- not the Ceph name ``client.libvirt``. See `User Management - User`_ and
- `User Management - CLI`_ for a detailed explanation of the difference
- between ID and name.
+ **NOTE**: ``libvirt`` will access Ceph using the ID ``libvirt``,
+ not the Ceph name ``client.libvirt``. See `User Management - User`_ and
+ `User Management - CLI`_ for a detailed explanation of the difference
+ between ID and name.
-#. Use QEMU to `create an image`_ in your RBD pool.
+#. Use QEMU to `create an image`_ in your RBD pool.
The following example uses the image name ``new-libvirt-image``
and references ``libvirt-pool``. ::
qemu-img create -f rbd rbd:libvirt-pool/new-libvirt-image 2G
- Verify the image exists. ::
+ Verify the image exists. ::
rbd -p libvirt-pool ls
admin socket = /var/run/ceph/$cluster-$type.$id.$pid.$cctid.asok
The ``client.libvirt`` section name should match the cephx user you created
- above.
+ above.
If SELinux or AppArmor is enabled, note that this could prevent the client
process (qemu via libvirt) from doing some operations, such as writing logs
or operating on the images or admin socket at the destination locations (``/var/
========================
You may use ``libvirt`` without a VM manager, but you may find it simpler to
-create your first domain with ``virt-manager``.
+create your first domain with ``virt-manager``.
#. Install a virtual machine manager. See `KVM/VirtManager`_ for details. ::
#. Download an OS image (if necessary).
-#. Launch the virtual machine manager. ::
+#. Launch the virtual machine manager. ::
sudo virt-manager
To create a VM with ``virt-manager``, perform the following steps:
-#. Press the **Create New Virtual Machine** button.
+#. Press the **Create New Virtual Machine** button.
#. Name the new virtual machine domain. In this example, we
use the name ``libvirt-virtual-machine``. You may use any name you wish,
- but ensure you replace ``libvirt-virtual-machine`` with the name you
- choose in subsequent commandline and configuration examples. ::
+ but ensure you replace ``libvirt-virtual-machine`` with the name you
+ choose in subsequent commandline and configuration examples. ::
libvirt-virtual-machine
/path/to/image/recent-linux.img
- **NOTE:** Import a recent image. Some older images may not rescan for
+ **NOTE:** Import a recent image. Some older images may not rescan for
virtual devices properly.
-
+
#. Configure and start the VM.
#. You may use ``virsh list`` to verify the VM domain exists. ::
commands, refer to `Virsh Command Reference`_.
-#. Open the configuration file with ``virsh edit``. ::
+#. Open the configuration file with ``virsh edit``. ::
sudo virsh edit {vm-domain-name}
- Under ``<devices>`` there should be a ``<disk>`` entry. ::
+ Under ``<devices>`` there should be a ``<disk>`` entry. ::
<devices>
<emulator>/usr/bin/kvm</emulator>
Replace ``/path/to/image/recent-linux.img`` with the path to the OS image.
- The minimum kernel for using the faster ``virtio`` bus is 2.6.25. See
+ The minimum kernel for using the faster ``virtio`` bus is 2.6.25. See
`Virtio`_ for details.
- **IMPORTANT:** Use ``sudo virsh edit`` instead of a text editor. If you edit
- the configuration file under ``/etc/libvirt/qemu`` with a text editor,
- ``libvirt`` may not recognize the change. If there is a discrepancy between
- the contents of the XML file under ``/etc/libvirt/qemu`` and the result of
- ``sudo virsh dumpxml {vm-domain-name}``, then your VM may not work
+ **IMPORTANT:** Use ``sudo virsh edit`` instead of a text editor. If you edit
+ the configuration file under ``/etc/libvirt/qemu`` with a text editor,
+ ``libvirt`` may not recognize the change. If there is a discrepancy between
+ the contents of the XML file under ``/etc/libvirt/qemu`` and the result of
+ ``sudo virsh dumpxml {vm-domain-name}``, then your VM may not work
properly.
-
-#. Add the Ceph RBD image you created as a ``<disk>`` entry. ::
+
+#. Add the Ceph RBD image you created as a ``<disk>`` entry. ::
<disk type='network' device='disk'>
<source protocol='rbd' name='libvirt-pool/new-libvirt-image'>
<target dev='vdb' bus='virtio'/>
</disk>
- Replace ``{monitor-host}`` with the name of your host, and replace the
- pool and/or image name as necessary. You may add multiple ``<host>``
+ Replace ``{monitor-host}`` with the name of your host, and replace the
+ pool and/or image name as necessary. You may add multiple ``<host>``
entries for your Ceph monitors. The ``dev`` attribute is the logical
- device name that will appear under the ``/dev`` directory of your
- VM. The optional ``bus`` attribute indicates the type of disk device to
- emulate. The valid settings are driver specific (e.g., "ide", "scsi",
+ device name that will appear under the ``/dev`` directory of your
+ VM. The optional ``bus`` attribute indicates the type of disk device to
+ emulate. The valid settings are driver specific (e.g., "ide", "scsi",
"virtio", "xen", "usb" or "sata").
-
+
See `Disks`_ for details of the ``<disk>`` element, and its child elements
and attributes.
-
+
#. Save the file.
-#. If your Ceph Storage Cluster has `Ceph Authentication`_ enabled (it does by
- default), you must generate a secret. ::
+#. If your Ceph Storage Cluster has `Ceph Authentication`_ enabled (it does by
+ default), you must generate a secret. ::
cat > secret.xml <<EOF
<secret ephemeral='no' private='no'>
ceph auth get-key client.libvirt | sudo tee client.libvirt.key
-#. Set the UUID of the secret. ::
+#. Set the UUID of the secret. ::
sudo virsh secret-set-value --secret {uuid of secret} --base64 $(cat client.libvirt.key) && rm client.libvirt.key secret.xml
- You must also set the secret manually by adding the following ``<auth>``
+ You must also set the secret manually by adding the following ``<auth>``
entry to the ``<disk>`` element you entered earlier (replacing the
``uuid`` value with the result from the command line example above). ::
<auth username='libvirt'>
<secret type='ceph' uuid='{uuid of secret}'/>
</auth>
- <target ...
+ <target ...
- **NOTE:** The exemplary ID is ``libvirt``, not the Ceph name
- ``client.libvirt`` as generated at step 2 of `Configuring Ceph`_. Ensure
- you use the ID component of the Ceph name you generated. If for some reason
- you need to regenerate the secret, you will have to execute
- ``sudo virsh secret-undefine {uuid}`` before executing
+ **NOTE:** The exemplary ID is ``libvirt``, not the Ceph name
+ ``client.libvirt`` as generated at step 2 of `Configuring Ceph`_. Ensure
+ you use the ID component of the Ceph name you generated. If for some reason
+ you need to regenerate the secret, you will have to execute
+ ``sudo virsh secret-undefine {uuid}`` before executing
``sudo virsh secret-set-value`` again.
following procedures.
-#. Check to see if Ceph is running::
+#. Check to see if Ceph is running::
ceph health
-#. Check to see if the VM is running. ::
+#. Check to see if the VM is running. ::
sudo virsh list
-#. Check to see if the VM is communicating with Ceph. Replace
- ``{vm-domain-name}`` with the name of your VM domain::
+#. Check to see if the VM is communicating with Ceph. Replace
+ ``{vm-domain-name}`` with the name of your VM domain::
sudo virsh qemu-monitor-command --hmp {vm-domain-name} 'info block'
#. Check to see if the device from ``<target dev='vdb' bus='virtio'/>`` exists::
-
+
virsh domblklist {vm-domain-name} --details
-If everything looks okay, you may begin using the Ceph block device
+If everything looks okay, you may begin using the Ceph block device
within your VM.
.. _Installation: ../../install
.. _libvirt Virtualization API: http://www.libvirt.org
.. _Block Devices and OpenStack: ../rbd-openstack
+.. _Block Devices and OpenNebula: https://docs.opennebula.io/stable/open_cluster_deployment/storage_setup/ceph_ds.html#datastore-internals
.. _Block Devices and CloudStack: ../rbd-cloudstack
.. _Create a pool: ../../rados/operations/pools#create-a-pool
.. _Create a Ceph User: ../../rados/operations/user-management#add-a-user
also supports snapshot layering, which allows you to clone images (for example,
VM images) quickly and easily. Ceph block device snapshots are managed using
the ``rbd`` command and several higher-level interfaces, including `QEMU`_,
-`libvirt`_, `OpenStack`_, and `CloudStack`_.
+`libvirt`_, `OpenStack`_, `OpenNebula`_ and `CloudStack`_.
.. important:: To use RBD snapshots, you must have a running Ceph cluster.
.. note:: Because RBD is unaware of any file system within an image (volume),
snapshots are merely `crash-consistent` unless they are coordinated within
the mounting (attaching) operating system. We therefore recommend that you
- pause or stop I/O before taking a snapshot.
-
+ pause or stop I/O before taking a snapshot.
+
If the volume contains a file system, the file system should be in an
internally consistent state before a snapshot is taken. Snapshots taken
without write quiescing could need an `fsck` pass before they are mounted
again. To quiesce I/O you can use the `fsfreeze` command. See the `fsfreeze(8)`
- man page for more details.
-
+ man page for more details.
+
For virtual machines, `qemu-guest-agent` can be used to automatically freeze
file systems when creating a snapshot.
When `cephx`_ authentication is enabled (it is by default), you must specify a
user name or ID and a path to the keyring containing the corresponding key. See
-:ref:`User Management <user-management>` for details.
+:ref:`User Management <user-management>` for details.
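
For example (a sketch only; the user ID, keyring path, pool, and image names
below are placeholders):

.. prompt:: bash $

   rbd --id admin --keyring /etc/ceph/ceph.client.admin.keyring snap ls mypool/myimage
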
.. prompt:: bash $
.. prompt:: bash $
rbd snap create rbd/foo@snapname
-
+
List Snapshots
--------------
.. prompt:: bash $
rbd snap rm {pool-name}/{image-name}@{snap-name}
-
+
For example:
.. prompt:: bash $
| | to Parent | |
| (read only) | | (writable) |
+-------------+ +-------------+
-
+
Parent Child
.. note:: The terms "parent" and "child" refer to a Ceph block device snapshot
(parent) and the corresponding image cloned from the snapshot (child).
These terms are important for the command line usage below.
-
+
Each cloned image (child) stores a reference to its parent image, which enables
the cloned image to open the parent snapshot and read it.
A copy-on-write clone of a snapshot behaves exactly like any other Ceph
block device image. You can read to, write from, clone, and resize cloned
images. There are no special restrictions with cloned images. However, the
-copy-on-write clone of a snapshot depends on the snapshot, so you must
+copy-on-write clone of a snapshot depends on the snapshot, so you must
protect the snapshot before you clone it. The diagram below depicts this
process.
| | | |
+----------------------------+ +-----------------------------+
|
- +--------------------------------------+
+ +--------------------------------------+
|
v
+----------------------------+ +-----------------------------+
---------------------
Clones access the parent snapshots. All clones would break if a user
-inadvertently deleted the parent snapshot. To prevent data loss, you must
+inadvertently deleted the parent snapshot. To prevent data loss, you must
protect the snapshot before you can clone it:
.. prompt:: bash $
.. prompt:: bash $
rbd clone {pool-name}/{parent-image-name}@{snap-name} {pool-name}/{child-image-name}
-
+
For example:
.. prompt:: bash $
rbd clone rbd/foo@snapname rbd/bar
-
+
.. note:: You may clone a snapshot from one pool to an image in another pool.
For example, you may maintain read-only images and snapshots as templates in
.. _cephx: ../../rados/configuration/auth-config-ref/
.. _QEMU: ../qemu-rbd/
.. _OpenStack: ../rbd-openstack/
+.. _OpenNebula: https://docs.opennebula.io/stable/management_and_operations/vm_management/vm_instances.html?highlight=ceph#managing-disk-snapshots
.. _CloudStack: ../rbd-cloudstack/
.. _libvirt: ../libvirt/
==================
You can help the Ceph project by contributing to the documentation. Even
-small contributions help the Ceph project.
+small contributions help the Ceph project.
The easiest way to suggest a correction to the documentation is to send an
email to `ceph-users@ceph.io`. Include the string "ATTN: DOCS" or
===============================================
The Ceph documentation source is in the ``ceph/doc`` directory of the Ceph
-repository. Python Sphinx renders the source into HTML and manpages.
+repository. Python Sphinx renders the source into HTML and manpages.
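
To render the documentation locally, you can run the ``build-doc`` script from
the top level of a Ceph repository clone (a sketch; the build prerequisites
are described later in this document):

.. prompt:: bash $

   cd ceph
   admin/build-doc
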
Viewing Old Ceph Documentation
==============================
The Ceph documentation is organized by component:
-- **Ceph Storage Cluster:** The Ceph Storage Cluster documentation is
+- **Ceph Storage Cluster:** The Ceph Storage Cluster documentation is
in the ``doc/rados`` directory.
-
-- **Ceph Block Device:** The Ceph Block Device documentation is in
+
+- **Ceph Block Device:** The Ceph Block Device documentation is in
the ``doc/rbd`` directory.
-
-- **Ceph Object Storage:** The Ceph Object Storage documentation is in
+
+- **Ceph Object Storage:** The Ceph Object Storage documentation is in
the ``doc/radosgw`` directory.
-- **Ceph File System:** The Ceph File System documentation is in the
+- **Ceph File System:** The Ceph File System documentation is in the
``doc/cephfs`` directory.
-
+
- **Installation (Quick):** Quick start documentation is in the
``doc/start`` directory.
-
+
- **Installation (Manual):** Documentation concerning the manual installation of
Ceph is in the ``doc/install`` directory.
-
+
- **Manpage:** Manpage source is in the ``doc/man`` directory.
-- **Developer:** Developer documentation is in the ``doc/dev``
+- **Developer:** Developer documentation is in the ``doc/dev``
directory.
- **Images:** Images including JPEG and PNG files are stored in the
git checkout main
-When you make changes to documentation that affect an upcoming release, use
+When you make changes to documentation that affect an upcoming release, use
the ``next`` branch. ``next`` is the second most commonly used branch:
.. prompt:: bash $
usually contains a TOC, where you can add the new file name. All documents must
have a title. See `Headings`_ for details.
-Your new document doesn't get tracked by ``git`` automatically. When you want
-to add the document to the repository, you must use ``git add
+Your new document doesn't get tracked by ``git`` automatically. When you want
+to add the document to the repository, you must use ``git add
{path-to-filename}``. For example, from the top level directory of the
repository, adding an ``example.rst`` file to the ``rados`` subdirectory would
look like this:
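A sketch of that invocation, using the paths named above:

.. prompt:: bash $

   git add doc/rados/example.rst
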
- graphviz
- ant
- ditaa
+- cython3
.. raw:: html
.. prompt:: bash $
sudo apt-get install gcc python-dev python3-pip libxml2-dev libxslt-dev doxygen graphviz ant ditaa
- sudo apt-get install python3-sphinx python3-venv
+ sudo apt-get install python3-sphinx python3-venv cython3
For Fedora distributions, execute the following:
- A commit MUST have a comment.
- A commit comment MUST be prepended with ``doc:``. (strict)
- The comment summary MUST be one line only. (strict)
-- Additional comments MAY follow a blank line after the summary,
+- Additional comments MAY follow a blank line after the summary,
but should be terse.
- A commit MAY include ``Fixes: https://tracker.ceph.com/issues/{bug number}``.
- Commits MUST include ``Signed-off-by: Firstname Lastname <email>``. (strict)
-.. tip:: Follow the foregoing convention particularly where it says
- ``(strict)`` or you will be asked to modify your commit to comply with
+.. tip:: Follow the foregoing convention particularly where it says
+ ``(strict)`` or you will be asked to modify your commit to comply with
this convention.
-The following is a common commit comment (preferred)::
+The following is a common commit comment (preferred)::
doc: Fixes a spelling error and a broken hyperlink.
-
+
Signed-off-by: John Doe <john.doe@gmail.com>
-The following comment includes a reference to a bug. ::
+The following comment includes a reference to a bug. ::
doc: Fixes a spelling error and a broken hyperlink.
Fixes: https://tracker.ceph.com/issues/1234
-
+
Signed-off-by: John Doe <john.doe@gmail.com>
The following comment includes a terse sentence following the comment summary.
-There is a carriage return between the summary line and the description::
+There is a carriage return between the summary line and the description::
doc: Added mon setting to monitor config reference
-
+
Describes 'mon setting', which is a new setting added
to config_opts.h.
-
+
Signed-off-by: John Doe <john.doe@gmail.com>
.. prompt:: bash $
git commit -a
-
+
An easy way to manage your documentation commits is to use visual tools for
``git``. For example, ``gitk`` provides a graphical interface for viewing the
cd {git-ceph-repo-path}
gitk
-
+
Finally, select **File->Start git gui** to activate the graphical user interface.
#. Make the commits that you will later squash.
#. Make the first commit.
-
+
::
-
+
doc/glossary: improve "CephX" entry
-
+
Improve the glossary entry for "CephX".
-
+
Signed-off-by: Zac Dover <zac.dover@proton.me>
-
+
# Please enter the commit message for your changes. Lines starting
# with '#' will be ignored, and an empty message aborts the commit.
#
# Changes to be committed:
# modified: glossary.rst
#
-
+
#. Make the second commit.
-
+
::
-
+
doc/glossary: add link to architecture doc
-
+
Add a link to a section in the architecture document, which link
will be used in the process of improving the "CephX" glossary entry.
-
+
Signed-off-by: Zac Dover <zac.dover@proton.me>
-
+
# Please enter the commit message for your changes. Lines starting
# with '#' will be ignored, and an empty message aborts the commit.
#
#
# Changes to be committed:
# modified: architecture.rst
-
+
#. Make the third commit.
-
+
::
-
+
doc/glossary: link to Arch doc in "CephX" glossary
-
+
Link to the Architecture document from the "CephX" entry in the
Glossary.
-
+
Signed-off-by: Zac Dover <zac.dover@proton.me>
-
+
# Please enter the commit message for your changes. Lines starting
# with '#' will be ignored, and an empty message aborts the commit.
#
# modified: glossary.rst
#. There are now three commits in the feature branch. We will now begin the
- process of squashing them into a single commit.
-
- #. Run the command ``git rebase -i main``, which rebases the current branch
+ process of squashing them into a single commit.
+
+ #. Run the command ``git rebase -i main``, which rebases the current branch
(the feature branch) against the ``main`` branch:
.. prompt:: bash
-
+
git rebase -i main
-
+
#. A list of the commits that have been made to the feature branch now
appear, and looks like this:
::
-
+
pick d395e500883 doc/glossary: improve "CephX" entry
pick b34986e2922 doc/glossary: add link to architecture doc
pick 74d0719735c doc/glossary: link to Arch doc in "CephX" glossary
-
+
# Rebase 0793495b9d1..74d0719735c onto 0793495b9d1 (3 commands)
#
# Commands:
#
# If you remove a line here THAT COMMIT WILL BE LOST.
- Find the part of the screen that says "pick". This is the part that you will
+ Find the part of the screen that says "pick". This is the part that you will
alter. There are three commits that are currently labeled "pick". We will
choose one of them to remain labeled "pick", and we will label the other two
commits "squash".
pick d395e500883 doc/glossary: improve "CephX" entry
squash b34986e2922 doc/glossary: add link to architecture doc
squash 74d0719735c doc/glossary: link to Arch doc in "CephX" glossary
-
+
# Rebase 0793495b9d1..74d0719735c onto 0793495b9d1 (3 commands)
#
# Commands:
like this:
::
-
+
# This is a combination of 3 commits.
# This is the 1st commit message:
-
+
doc/glossary: improve "CephX" entry
-
+
Improve the glossary entry for "CephX".
-
+
Signed-off-by: Zac Dover <zac.dover@proton.me>
-
+
# This is the commit message #2:
-
+
doc/glossary: add link to architecture doc
-
+
Add a link to a section in the architecture document, which link
will be used in the process of improving the "CephX" glossary entry.
-
+
Signed-off-by: Zac Dover <zac.dover@proton.me>
-
+
# This is the commit message #3:
-
+
doc/glossary: link to Arch doc in "CephX" glossary
-
+
Link to the Architecture document from the "CephX" entry in the
Glossary.
-
+
Signed-off-by: Zac Dover <zac.dover@proton.me>
-
+
# Please enter the commit message for your changes. Lines starting
# with '#' will be ignored, and an empty message aborts the commit.
#
# Changes to be committed:
# modified: doc/architecture.rst
# modified: doc/glossary.rst
-
- #. The commit messages have been revised into the simpler form presented here:
-
+
+ #. The commit messages have been revised into the simpler form presented here:
+
::
-
+
doc/glossary: improve "CephX" entry
-
+
Improve the glossary entry for "CephX".
-
+
Signed-off-by: Zac Dover <zac.dover@proton.me>
-
+
# Please enter the commit message for your changes. Lines starting
# with '#' will be ignored, and an empty message aborts the commit.
#
#. Force push the squashed commit from your local working copy to the remote
upstream branch. The force push is necessary because the newly squashed commit
- does not have an ancestor in the remote. If that confuses you, just run this
+ does not have an ancestor in the remote. If that confuses you, just run this
command and don't think too much about it:
- .. prompt:: bash $
+ .. prompt:: bash $
git push -f
-
+
::
Enumerating objects: 9, done.
Headings
--------
-#. **Document Titles:** Document titles use the ``=`` character overline and
- underline with a leading and trailing space on the title text line.
+#. **Document Titles:** Document titles use the ``=`` character overline and
+ underline with a leading and trailing space on the title text line.
See `Document Title`_ for details.
#. **Section Titles:** Section titles use the ``=`` character underline with no
- leading or trailing spaces for text. Two carriage returns should precede a
+ leading or trailing spaces for text. Two carriage returns should precede a
section title (unless an inline reference precedes it). See `Sections`_ for
details.
-#. **Subsection Titles:** Subsection titles use the ``_`` character underline
- with no leading or trailing spaces for text. Two carriage returns should
+#. **Subsection Titles:** Subsection titles use the ``_`` character underline
+ with no leading or trailing spaces for text. Two carriage returns should
precede a subsection title (unless an inline reference precedes it).
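
A compact sketch of the three heading conventions above (the titles are
invented for illustration)::

   =================
    Document Title
   =================

   Section Title
   =============

   Subsection Title
   ________________
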
possible, we prefer to maintain this convention with text, lists, literal text
(exceptions allowed), tables, and ``ditaa`` graphics.
-#. **Paragraphs**: Paragraphs have a leading and a trailing carriage return,
- and should be 80 characters wide or less so that the documentation can be
+#. **Paragraphs**: Paragraphs have a leading and a trailing carriage return,
+ and should be 80 characters wide or less so that the documentation can be
read in native format in a command line terminal.
#. **Literal Text:** To create an example of literal text (e.g., command line
usage), terminate the preceding paragraph with ``::`` or enter a carriage
return to create an empty line after the preceding paragraph; then, enter
``::`` on a separate line followed by another empty line. Then, begin the
- literal text with tab indentation (preferred) or space indentation of 3
+ literal text with tab indentation (preferred) or space indentation of 3
characters.
-#. **Indented Text:** Indented text such as bullet points
+#. **Indented Text:** Indented text such as bullet points
(e.g., ``- some text``) may span multiple lines. The text of subsequent
lines should begin at the same character position as the text of the
indented text (less numbers, bullets, etc.).
#. **Numbered Lists:** Numbered lists should use autonumbering by starting
a numbered indent with ``#.`` instead of the actual number so that
- numbered paragraphs can be repositioned without requiring manual
+ numbered paragraphs can be repositioned without requiring manual
renumbering.
-#. **Code Examples:** Ceph supports the use of the
- ``.. code-block::<language>`` role, so that you can add highlighting to
- source examples. This is preferred for source code. However, use of this
- tag will cause autonumbering to restart at 1 if it is used as an example
+#. **Code Examples:** Ceph supports the use of the
+ ``.. code-block:: <language>`` directive, so that you can add highlighting to
+ source examples. This is preferred for source code. However, use of this
+ tag will cause autonumbering to restart at 1 if it is used as an example
within a numbered list. See `Showing code examples`_ for details.
#. **Version Added:** Use the ``.. versionadded::`` directive for new features
or configuration settings so that users know the minimum release for using
a feature.
-
+
#. **Version Changed:** Use the ``.. versionchanged::`` directive for changes
in usage or configuration settings.
-#. **Deprecated:** Use the ``.. deprecated::`` directive when CLI usage,
- a feature or a configuration setting is no longer preferred or will be
+#. **Deprecated:** Use the ``.. deprecated::`` directive when CLI usage,
+ a feature or a configuration setting is no longer preferred or will be
discontinued.
#. **Topic:** Use the ``.. topic::`` directive to encapsulate text that is
documentation suite must be linked either (1) from another document in the
documentation suite or (2) from a table of contents (TOC). If any document in
the documentation suite is not linked in this way, the ``build-doc`` script
-generates warnings when it tries to build the documentation.
+generates warnings when it tries to build the documentation.
The Ceph project uses the ``.. toctree::`` directive. See `The TOC tree`_ for
details. When rendering a table of contents (TOC), specify the ``:maxdepth:``
For example, RST that links to the Sphinx Python Document Generator homepage
and generates a sentence reading "Click here to learn more about Python
-Sphinx." looks like this:
+Sphinx." looks like this:
::
``Click `here <https://www.sphinx-doc.org>`_ to learn more about Python
- Sphinx.``
+ Sphinx.``
And here it is, rendered:
-Click `here <https://www.sphinx-doc.org>`_ to learn more about Python Sphinx.
+Click `here <https://www.sphinx-doc.org>`_ to learn more about Python Sphinx.
Pay special attention to the underscore after the backtick. If you forget to
include it and this is your first day working with RST, there's a chance that
`inline text <http://www.foo.com>`_
.. note:: Do not fail to include the space between the inline text and the
- less-than sign.
-
+ less-than sign.
+
Do not fail to include the underscore after the final backtick.
To link to addresses that are external to the Ceph documentation, include a
:ref:`inline text<target>`
-.. note::
+.. note::
There is no space between "inline text" and the angle bracket that
immediately follows it. This is precisely the opposite of :ref:`the
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This section explains how to make certain letters within a word bold while
-leaving the other letters in the word regular (non-bold).
+leaving the other letters in the word regular (non-bold).
The following single-line paragraph provides an example of this:
==========================
Ceph is designed to run on commodity hardware, which makes building and
-maintaining petabyte-scale data clusters flexible and economically feasible.
-When planning your cluster's hardware, you will need to balance a number
+maintaining petabyte-scale data clusters flexible and economically feasible.
+When planning your cluster's hardware, you will need to balance a number
of considerations, including failure domains, cost, and performance.
-Hardware planning should include distributing Ceph daemons and
-other processes that use Ceph across many hosts. Generally, we recommend
-running Ceph daemons of a specific type on a host configured for that type
-of daemon. We recommend using separate hosts for processes that utilize your
-data cluster (e.g., OpenStack, CloudStack, Kubernetes, etc).
+Hardware planning should include distributing Ceph daemons and
+other processes that use Ceph across many hosts. Generally, we recommend
+running Ceph daemons of a specific type on a host configured for that type
+of daemon. We recommend using separate hosts for processes that utilize your
+data cluster (e.g., OpenStack, OpenNebula, CloudStack, Kubernetes, etc).
The requirements of one Ceph cluster are not the same as the requirements of
-another, but below are some general guidelines.
+another, but below are some general guidelines.
.. tip:: Check out the `Ceph blog`_ too.
configuration option.
- Setting the :confval:`osd_memory_target` below 2GB is not
- recommended. Ceph may fail to keep the memory consumption under 2GB and
+ recommended. Ceph may fail to keep the memory consumption under 2GB and
extremely slow performance is likely.
- Setting the memory target between 2GB and 4GB typically works but may result
OSD performance.
- Setting the :confval:`osd_memory_target` higher than 4GB can improve
- performance when there many (small) objects or when large (256GB/OSD
+ performance when there are many (small) objects or when large (256GB/OSD
or more) data sets are processed. This is especially true with fast
NVMe OSDs.
fragmented huge pages. Modern versions of Ceph disable transparent huge
pages at the application level to avoid this, but that does not
guarantee that the kernel will immediately reclaim unmapped memory. The OSD
- may still at times exceed its memory target. We recommend budgeting
+ may still at times exceed its memory target. We recommend budgeting
at least 20% extra memory on your system to prevent OSDs from going OOM
(**O**\ut **O**\f **M**\emory) during temporary spikes or due to delay in
the kernel reclaiming freed pages. That 20% value might be more or less than
.. tip:: Hosting multiple OSDs on a single SAS / SATA HDD
is **NOT** a good idea.
-.. tip:: Hosting an OSD with monitor, manager, or MDS data on a single
+.. tip:: Hosting an OSD with monitor, manager, or MDS data on a single
drive is also **NOT** a good idea.
.. tip:: With spinning disks, the SATA and SAS interface increasingly
- becomes a bottleneck at larger capacities. See also the `Storage Networking
+ becomes a bottleneck at larger capacities. See also the `Storage Networking
Industry Association's Total Cost of Ownership calculator`_.
------------------
Ceph performance is much improved when using solid-state drives (SSDs). This
-reduces random access time and reduces latency while increasing throughput.
+reduces random access time and latency while increasing throughput.
SSDs cost more per gigabyte than do HDDs but SSDs often offer
access times that are, at a minimum, 100 times faster than HDDs.
limitations though. When evaluating SSDs, it is important to consider the
performance of sequential and random reads and writes.
-.. important:: We recommend exploring the use of SSDs to improve performance.
+.. important:: We recommend exploring the use of SSDs to improve performance.
However, before making a significant investment in SSDs, we **strongly
recommend** reviewing the performance metrics of an SSD and testing the
- SSD in a test configuration in order to gauge performance.
+ SSD in a test configuration in order to gauge performance.
Relatively inexpensive SSDs may appeal to your sense of economy. Use caution.
Acceptable IOPS are not the only factor to consider when selecting SSDs for
purchases an annual maintenance contract or extended warranty.
.. tip:: The `Ceph blog`_ is often an excellent source of information on Ceph
- performance issues. See `Ceph Write Throughput 1`_ and `Ceph Write
+ performance issues. See `Ceph Write Throughput 1`_ and `Ceph Write
Throughput 2`_ for additional details.
an ``active + clean`` state, the better. Notably, fast recovery minimizes
the likelihood of multiple, overlapping failures that can cause data to become
temporarily unavailable or even lost. Of course, when provisioning your
-network, you will have to balance price against performance.
+network, you will have to balance price against performance.
Some deployment tools employ VLANs to make hardware and network cabling more
manageable. VLANs that use the 802.1q protocol require VLAN-capable NICs and
Additionally, BMCs as of 2023 rarely sport network connections faster than 1 Gb/s,
so dedicated and inexpensive 1 Gb/s switches for BMC administrative traffic
may reduce costs by wasting fewer expensive ports on faster host switches.
-
+
Failure Domains
===============