teuthology-suite --suite rados --subset 0/4000
will run as few tests as possible. The tradeoff in this case is that
-some tests will only run on ``xfs`` and not on ``ext4`` or ``btrfs``,
+not all combinations of test variations will be run together,
but no matter how small a ratio is provided in the ``--subset``,
teuthology will still ensure that all files in the suite are in at
least one test. Understanding the actual logic that drives this
"PrimaryLogPG" -> "OSDMap"
"ObjectStore" -> "FileStore"
+ "ObjectStore" -> "BlueStore"
+
+ "BlueStore" -> "rocksdb"
"FileStore" -> "xfs"
"FileStore" -> "btrfs"
limits until the background flusher catches up.
The relevant config options are filestore_wbthrottle*. There are
-different defaults for btrfs and xfs. Each set has hard and soft
+different defaults for xfs and btrfs. Each set has hard and soft
limits on bytes (total dirty bytes), ios (total dirty ios), and
inodes (total dirty fds). The WBThrottle will begin flushing
when any of these hits the soft limit and will block in throttle()
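
As a rough sketch (option names follow the ``filestore_wbthrottle_*``
family mentioned above; the values are illustrative examples, not the
shipped defaults), the ``xfs`` set could be tuned in ``ceph.conf`` like
so::

    [osd]
    # soft limits: flushing starts once any of these is exceeded
    filestore wbthrottle xfs bytes start flusher = 41943040
    filestore wbthrottle xfs ios start flusher = 500
    filestore wbthrottle xfs inodes start flusher = 500
    # hard limits: throttle() blocks while any of these is exceeded
    filestore wbthrottle xfs bytes hard limit = 419430400
    filestore wbthrottle xfs ios hard limit = 5000
    filestore wbthrottle xfs inodes hard limit = 5000
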
#. Prepare the OSD. ::
ssh {node-name}
- sudo ceph-disk prepare --cluster {cluster-name} --cluster-uuid {uuid} --fs-type {ext4|xfs|btrfs} {data-path} [{journal-path}]
+ sudo ceph-disk prepare --cluster {cluster-name} --cluster-uuid {uuid} {data-path} [{journal-path}]
For example::
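
    # Hypothetical example: the node name, cluster name, and cluster UUID
    # below are placeholders; substitute your own values.
    ssh node1
    sudo ceph-disk prepare --cluster ceph --cluster-uuid a1b2c3d4-e5f6-47a8-9b0c-d1e2f3a4b5c6 /dev/sdb
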
.. option:: --fs-type
- Filesystem to use to format disk ``(xfs, btrfs or ext4)``.
+ Filesystem to use to format disk ``(xfs, btrfs or ext4)``. Note that ``btrfs`` and ``ext4`` are no longer tested or recommended; please use ``xfs``.
.. option:: --fsid
system. It is responsible for storing objects on a local file system
and providing access to them over the network.
-The datapath argument should be a directory on a btrfs file system
+The datapath argument should be a directory on an xfs file system
where the object data resides. The journal is optional, and is only
useful performance-wise when it resides on a different disk than
datapath with low latency (ideally, an NVRAM device).
sudo mkfs -t {fstype} /dev/{disk}
sudo mount -o user_xattr /dev/{hdd} /var/lib/ceph/osd/ceph-{osd-number}
-We recommend using the ``xfs`` file system or the ``btrfs`` file system when
-running :command:`mkfs`.
+We recommend using the ``xfs`` file system when running
+:command:`mkfs`. (``btrfs`` and ``ext4`` are not recommended and no
+longer tested.)
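
For instance, with a hypothetical data disk ``/dev/sdb1`` and OSD number
``0`` (``xfs`` enables extended attributes by default, so no extra mount
options are needed)::

    sudo mkfs -t xfs /dev/sdb1
    sudo mount /dev/sdb1 /var/lib/ceph/osd/ceph-0
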
See the `OSD Config Reference`_ for additional configuration details.
We currently recommend ``XFS`` for production deployments.
-We used to recommend ``btrfs`` for testing, development, and any non-critical
-deployments becuase it has the most promising set of features. However, we
-now plan to avoid using a kernel file system entirely with the new BlueStore
-backend. ``btrfs`` is still supported and has a comparatively compelling
-set of features, but be mindful of its stability and support status in your
-Linux distribution.
-
Not recommended
---------------
+We recommend *against* using ``btrfs`` due to the lack of a stable
+version to test against and frequent bugs in the ENOSPC handling.
+
We recommend *against* using ``ext4`` due to limitations in the size
of xattrs it can store, and the problems this causes with the way Ceph
handles long RADOS object names. Although these issues will generally
to use RGW or other librados clients that do not properly
handle or politely surface any resulting ENAMETOOLONG
errors.
-
-
-Filesystem Background Info
-==========================
-
-The ``XFS``, ``btrfs`` and ``ext4`` file systems provide numerous
-advantages in highly scaled data storage environments when `compared`_
-to ``ext3``.
-
-``XFS``, ``btrfs`` and ``ext4`` are `journaling file systems`_, which means that
-they are more robust when recovering from crashes, power outages, etc. These
-filesystems journal all of the changes they will make before performing writes.
-
-``XFS`` was developed for Silicon Graphics, and is a mature and stable
-filesystem. By contrast, ``btrfs`` is a relatively new file system that aims
-to address the long-standing wishes of system administrators working with
-large scale data storage environments. ``btrfs`` has some unique features
-and advantages compared to other Linux filesystems.
-
-``btrfs`` is a `copy-on-write`_ filesystem. It supports file creation
-timestamps and checksums that verify metadata integrity, so it can detect
-bad copies of data and fix them with the good copies. The copy-on-write
-capability means that ``btrfs`` can support snapshots that are writable.
-``btrfs`` supports transparent compression and other features.
-
-``btrfs`` also incorporates multi-device management into the file system,
-which enables you to support heterogeneous disk storage infrastructure,
-data allocation policies. The community also aims to provide ``fsck``,
-deduplication, and data encryption support in the future.
-
-.. _copy-on-write: http://en.wikipedia.org/wiki/Copy-on-write
-.. _compared: http://en.wikipedia.org/wiki/Comparison_of_file_systems
-.. _journaling file systems: http://en.wikipedia.org/wiki/Journaling_file_system
write throughput can bottleneck if other processes share the drive, including
journals, operating systems, monitors, other OSDs and non-Ceph processes.
-Ceph acknowledges writes *after* journaling, so fast SSDs are an attractive
-option to accelerate the response time--particularly when using the ``XFS`` or
-``ext4`` filesystems. By contrast, the ``btrfs`` filesystem can write and journal
-simultaneously.
+Ceph acknowledges writes *after* journaling, so fast SSDs are an
+attractive option to accelerate the response time--particularly when
+using the ``XFS`` or ``ext4`` filesystems. By contrast, the ``btrfs``
+filesystem can write and journal simultaneously. (Note, however, that
+we recommend against using ``btrfs`` for production deployments.)
.. note:: Partitioning a drive does not change its total throughput or
sequential read/write limits. Running a journal in a separate partition
Filesystem Issues
-----------------
-Currently, we recommend deploying clusters with XFS. The btrfs
-filesystem has many attractive features, but bugs in the filesystem may
-lead to performance issues. We do not recommend ext4 because xattr size
-limitations break our support for long object names (needed for RGW).
+Currently, we recommend deploying clusters with XFS.
+
+We recommend against using btrfs or ext4. The btrfs filesystem has
+many attractive features, but bugs in the filesystem may lead to
+performance issues and spurious ENOSPC errors. We do not recommend
+ext4 because xattr size limitations break our support for long object
+names (needed for RGW).
For more information, see `Filesystem Recommendations`_.
Plan your data storage configuration carefully. There are significant cost and
performance tradeoffs to consider when planning for data storage. Simultaneous
OS operations, and simultaneous request for read and write operations from
-multiple daemons against a single drive can slow performance considerably. There
-are also file system limitations to consider: btrfs is not quite stable enough
-for production, but it has the ability to journal and write data simultaneously,
-whereas XFS does not.
+multiple daemons against a single drive can slow performance considerably.
.. important:: Since Ceph has to write all data to the journal before it can
send an ACK (for XFS at least), having the journal and OSD
lead to resource contention and diminish the overall throughput. You may store a
journal and object data on the same drive, but this may increase the time it
takes to journal a write and ACK to the client. Ceph must write to the journal
-before it can ACK the write. The btrfs filesystem can write journal data and
-object data simultaneously, whereas XFS cannot.
+before it can ACK the write.
Ceph best practices dictate that you should run operating systems, OSD data and
OSD journals on separate drives.
- **B-tree File System (Btrfs)**
- If you use the ``btrfs`` file system with Ceph, we recommend using a
- recent Linux kernel (3.14 or later).
+ We recommend *against* using ``btrfs`` with Ceph. However, if you
+ insist on using ``btrfs``, we recommend using a recent Linux kernel.
Platforms
=========
-----
- **1**: The default kernel has an older version of ``btrfs`` that we do not
- recommend for ``ceph-osd`` storage nodes. Upgrade to a recommended
- kernel or use ``XFS``.
+ recommend for ``ceph-osd`` storage nodes. We recommend using ``XFS``.
- **2**: The default kernel has an old Ceph client that we do not recommend
for kernel client (kernel RBD or the Ceph file system). Upgrade to a
recommended kernel.
- **3**: The default kernel regularly fails in QA when the ``btrfs``
- file system is used. We do not recommend using ``btrfs`` for backing Ceph OSDs.
+ file system is used. We do not recommend using ``btrfs`` for
+ backing Ceph OSDs.
Testing