+++ /dev/null
-===========================================
- Hard Disk and File System Recommendations
-===========================================
-
-.. index:: hard drive preparation
-
-Hard Drive Prep
-===============
-
-Ceph aims for data safety, which means that when the :term:`Ceph Client`
-receives notice that data was written to a storage drive, that data was actually
-written to the storage drive. For old kernels (<2.6.33), disable the write cache
-if the journal is on a raw drive. Newer kernels should work fine.
-
-Use ``hdparm`` to disable write caching on the hard disk::
-
- sudo hdparm -W 0 /dev/hda 0
-
-In production environments, we recommend running a :term:`Ceph OSD Daemon` with
-separate drives for the operating system and the data. If you run data and an
-operating system on a single disk, we recommend creating a separate partition
-for your data.
-
-.. index:: filesystems
-
-Filesystems
-===========
-
-Ceph OSD Daemons rely heavily upon the stability and performance of the
-underlying filesystem.
-
-Recommended
------------
-
-We currently recommend ``XFS`` for production deployments.
-
-Not recommended
----------------
-
-We recommand *against* using ``btrfs`` due to the lack of a stable
-version to test against and frequent bugs in the ENOSPC handling.
-
-We recommend *against* using ``ext4`` due to limitations in the size
-of xattrs it can store, and the problems this causes with the way Ceph
-handles long RADOS object names. Although these issues will generally
-not surface with Ceph clusters using only short object names (e.g., an
-RBD workload that does not include long RBD image names), other users
-like RGW make extensive use of long object names and can break.
-
-Starting with the Jewel release, the ``ceph-osd`` daemon will refuse
-to start if the configured max object name cannot be safely stored on
-``ext4``. If the cluster is only being used with short object names
-(e.g., RBD only), you can continue using ``ext4`` by setting the
-following configuration option::
-
- osd max object name len = 256
- osd max object namespace len = 64
-
-.. note:: This may result in difficult-to-diagnose errors if you try
- to use RGW or other librados clients that do not properly
- handle or politely surface any resulting ENAMETOOLONG
- errors.
--- /dev/null
+=================
+ Storage Devices
+=================
+
+There are two Ceph daemons that store data on disk:
+
+* **Ceph OSDs** (or Object Storage Daemons) are where most of the
+  data is stored in Ceph. Generally speaking, each OSD is backed by
+  a single storage device, like a traditional hard disk (HDD) or
+  solid state disk (SSD). OSDs can also be backed by a combination
+  of devices, like an HDD for most data and an SSD (or partition of an
+  SSD) for some metadata. The number of OSDs in a cluster is
+  generally a function of how much data will be stored, how big each
+  storage device will be, and the level and type of redundancy
+  (replication or erasure coding); see the sizing sketch after this
+  list.
+* **Ceph Monitor** daemons manage critical cluster state like cluster
+ membership and authentication information. For smaller clusters a
+ few gigabytes is all that is needed, although for larger clusters
+ the monitor database can reach tens or possibly hundreds of
+ gigabytes.
+
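+As a rough sketch of the OSD sizing factors mentioned above (the
+numbers are hypothetical, not a recommendation), a replicated cluster
+might be sized like this::
+
+  usable data required : 100 TB
+  replication factor   : 3x   -> 300 TB of raw capacity needed
+  capacity per device  : 4 TB
+  minimum OSD count    : 300 TB / 4 TB = 75 OSDs
+
+In practice you would also add headroom, since clusters are operated
+well below full capacity, and round up so that OSDs can be spread
+evenly across hosts.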
+
+OSD Backends
+============
+
+There are two ways that OSDs can manage the data they store. Starting
+with the Luminous 12.2.z release, the new default (and recommended) backend is
+*BlueStore*. Prior to Luminous, the default (and only option) was
+*FileStore*.
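+
+If you are not sure which backend an existing OSD is using, one way to
+check (assuming a running cluster and an OSD with id ``0``) is to query
+the OSD's metadata and look for the reported object store::
+
+  ceph osd metadata 0 | grep osd_objectstore
+
+The output will contain either ``bluestore`` or ``filestore``.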
+
+BlueStore
+---------
+
+BlueStore is a special-purpose storage backend designed specifically
+for managing data on disk for Ceph OSD workloads. It is motivated by
+experience supporting and managing OSDs using FileStore over the
+last ten years. Key BlueStore features include:
+
+* Direct management of storage devices. BlueStore consumes raw block
+ devices or partitions. This avoids any intervening layers of
+ abstraction (such as local file systems like XFS) that may limit
+ performance or add complexity.
+* Metadata management with RocksDB. We embed RocksDB's key/value database
+ in order to manage internal metadata, such as the mapping from object
+ names to block locations on disk.
+* Full data and metadata checksumming. By default all data and
+ metadata written to BlueStore is protected by one or more
+ checksums. No data or metadata will be read from disk or returned
+ to the user without being verified.
+* Inline compression. Data may optionally be compressed before being
+  written to disk; a configuration sketch appears at the end of this
+  section.
+* Multi-device metadata tiering. BlueStore allows its internal
+  journal (write-ahead log) to be written to a separate, high-speed
+  device (like an SSD, NVMe, or NVDIMM) to increase performance. If
+  a significant amount of faster storage is available, internal
+  metadata can also be stored on the faster device; see the
+  provisioning sketch after this list.
+* Efficient copy-on-write. RBD and CephFS snapshots rely on a
+ copy-on-write *clone* mechanism that is implemented efficiently in
+ BlueStore. This results in efficient IO both for regular snapshots
+ and for erasure coded pools (which rely on cloning to implement
+ efficient two-phase commits).
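+
+As a provisioning sketch for the multi-device tiering described above
+(the device names are placeholders for your own hardware), a BlueStore
+OSD can be created with ``ceph-volume``, putting object data on an HDD
+and the RocksDB metadata on a faster NVMe partition::
+
+  sudo ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/nvme0n1p1
+
+Omitting ``--block.db`` keeps all data and metadata on the single
+``--data`` device.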
+
+For more information, see :doc:`bluestore-config-ref`.
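+
+As a configuration sketch for the inline compression feature (the pool
+name and values below are examples, not tuned recommendations),
+compression can be enabled on a single pool::
+
+  ceph osd pool set mypool compression_mode aggressive
+  ceph osd pool set mypool compression_algorithm snappy
+
+The corresponding ``bluestore compression`` options in ``ceph.conf``
+apply the same settings cluster-wide.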
+
+FileStore
+---------
+
+FileStore is the legacy approach to storing objects in Ceph. It
+relies on a standard file system (normally XFS) in combination with a
+key/value database (traditionally LevelDB, now RocksDB) for some
+metadata.
+
+FileStore is well-tested and widely used in production but suffers
+from many performance deficiencies due to its overall design and
+reliance on a traditional file system for storing object data.
+
+Although FileStore is generally capable of functioning on most
+POSIX-compatible file systems (including btrfs and ext4), we only
+recommend that XFS be used. Both btrfs and ext4 have known bugs and
+deficiencies and their use may lead to data loss. By default all Ceph
+provisioning tools will use XFS.
+
+For more information, see :doc:`filestore-config-ref`.
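+
+As a provisioning sketch for a FileStore OSD (the device names are
+placeholders; the data device will be formatted with XFS by default),
+the object data and the journal can be placed on separate devices::
+
+  sudo ceph-volume lvm create --filestore --data /dev/sdb --journal /dev/sdc1
+
+Keeping the journal on a faster device, such as an SSD partition, is a
+common way to reduce FileStore write latency.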