From 0346998b5a581443872fce855334ef4df61dc155 Mon Sep 17 00:00:00 2001
From: Sage Weil
Date: Fri, 30 Jun 2017 13:54:18 -0400
Subject: [PATCH] doc: recommend against btrfs

Signed-off-by: Sage Weil
---
 doc/dev/index.rst                           |  2 +-
 doc/dev/object-store.rst                    |  3 ++
 doc/dev/osd_internals/osd_throttles.rst     |  2 +-
 doc/install/manual-deployment.rst           |  2 +-
 doc/man/8/ceph-deploy.rst                   |  2 +-
 doc/man/8/ceph-osd.rst                      |  2 +-
 doc/rados/configuration/ceph-conf.rst       |  5 ++-
 .../filesystem-recommendations.rst          | 43 ++-----------------
 .../troubleshooting/troubleshooting-osd.rst | 20 +++++----
 doc/start/hardware-recommendations.rst      |  8 +---
 doc/start/os-recommendations.rst            | 10 ++---
 11 files changed, 33 insertions(+), 66 deletions(-)

diff --git a/doc/dev/index.rst b/doc/dev/index.rst
index abefa34a2d2..9cc3c03e06c 100644
--- a/doc/dev/index.rst
+++ b/doc/dev/index.rst
@@ -1149,7 +1149,7 @@ reduce the number of tests that are triggered. For instance::
   teuthology-suite --suite rados --subset 0/4000
 
 will run as few tests as possible. The tradeoff in this case is that
-some tests will only run on ``xfs`` and not on ``ext4`` or ``btrfs``,
+not all combinations of test variations will be tested together,
 but no matter how small a ratio is provided in the ``--subset``,
 teuthology will still ensure that all files in the suite are in at
 least one test. Understanding the actual logic that drives this
diff --git a/doc/dev/object-store.rst b/doc/dev/object-store.rst
index 8e00d197085..355f5154832 100644
--- a/doc/dev/object-store.rst
+++ b/doc/dev/object-store.rst
@@ -57,6 +57,9 @@
   "PrimaryLogPG" -> "OSDMap"
 
   "ObjectStore" -> "FileStore"
+  "ObjectStore" -> "BlueStore"
+
+  "BlueStore" -> "rocksdb"
 
   "FileStore" -> "xfs"
   "FileStore" -> "btrfs"
diff --git a/doc/dev/osd_internals/osd_throttles.rst b/doc/dev/osd_internals/osd_throttles.rst
index e1142b3f799..6739bd9ea5a 100644
--- a/doc/dev/osd_internals/osd_throttles.rst
+++ b/doc/dev/osd_internals/osd_throttles.rst
@@ -17,7 +17,7 @@ flushing and block in FileStore::_do_op if we have exceeded any hard
 limits until the background flusher catches up.
 
 The relevant config options are filestore_wbthrottle*. There are
-different defaults for btrfs and xfs. Each set has hard and soft
+different defaults for xfs and btrfs. Each set has hard and soft
 limits on bytes (total dirty bytes), ios (total dirty ios), and
 inodes (total dirty fds). The WBThrottle will begin flushing
 when any of these hits the soft limit and will block in throttle()
diff --git a/doc/install/manual-deployment.rst b/doc/install/manual-deployment.rst
index e60c6beab2a..db06fdf3a49 100644
--- a/doc/install/manual-deployment.rst
+++ b/doc/install/manual-deployment.rst
@@ -323,7 +323,7 @@ on ``node2`` and ``node3``:
 
 #. Prepare the OSD. ::
 
     ssh {node-name}
-    sudo ceph-disk prepare --cluster {cluster-name} --cluster-uuid {uuid} --fs-type {ext4|xfs|btrfs} {data-path} [{journal-path}]
+    sudo ceph-disk prepare --cluster {cluster-name} --cluster-uuid {uuid} {data-path} [{journal-path}]
 
 For example::
diff --git a/doc/man/8/ceph-deploy.rst b/doc/man/8/ceph-deploy.rst
index 3f197504ed9..ff96574dff6 100644
--- a/doc/man/8/ceph-deploy.rst
+++ b/doc/man/8/ceph-deploy.rst
@@ -522,7 +522,7 @@ Options
 
 .. option:: --fs-type
 
-    Filesystem to use to format disk ``(xfs, btrfs or ext4)``.
+    Filesystem to use to format disk ``(xfs, btrfs or ext4)``. Note that support for btrfs and ext4 is no longer tested or recommended; please use xfs.
 
 .. option:: --fsid
 
diff --git a/doc/man/8/ceph-osd.rst b/doc/man/8/ceph-osd.rst
index 3b89740fc2c..388e339d975 100644
--- a/doc/man/8/ceph-osd.rst
+++ b/doc/man/8/ceph-osd.rst
@@ -20,7 +20,7 @@ Description
 system. It is responsible for storing objects on a local file system
 and providing access to them over the network.
 
-The datapath argument should be a directory on a btrfs file system
+The datapath argument should be a directory on an xfs file system
 where the object data resides. The journal is optional, and is only
 useful performance-wise when it resides on a different disk than
 datapath with low latency (ideally, an NVRAM device).
diff --git a/doc/rados/configuration/ceph-conf.rst b/doc/rados/configuration/ceph-conf.rst
index a56eee88000..c5cf27cb72c 100644
--- a/doc/rados/configuration/ceph-conf.rst
+++ b/doc/rados/configuration/ceph-conf.rst
@@ -383,8 +383,9 @@ use with Ceph, and mount it to the directory you just created::
 
     sudo mkfs -t {fstype} /dev/{disk}
     sudo mount -o user_xattr /dev/{hdd} /var/lib/ceph/osd/ceph-{osd-number}
 
-We recommend using the ``xfs`` file system or the ``btrfs`` file system when
-running :command:`mkfs`.
+We recommend using the ``xfs`` file system when running
+:command:`mkfs`. (``btrfs`` and ``ext4`` are not recommended and no
+longer tested.)
 
 See the `OSD Config Reference`_ for additional configuration details.
diff --git a/doc/rados/configuration/filesystem-recommendations.rst b/doc/rados/configuration/filesystem-recommendations.rst
index 6225dd379ad..c967d60ce07 100644
--- a/doc/rados/configuration/filesystem-recommendations.rst
+++ b/doc/rados/configuration/filesystem-recommendations.rst
@@ -34,16 +34,12 @@ Recommended
 
 We currently recommend ``XFS`` for production deployments.
 
-We used to recommend ``btrfs`` for testing, development, and any non-critical
-deployments becuase it has the most promising set of features. However, we
-now plan to avoid using a kernel file system entirely with the new BlueStore
-backend. ``btrfs`` is still supported and has a comparatively compelling
-set of features, but be mindful of its stability and support status in your
-Linux distribution.
-
 Not recommended
 ---------------
 
+We recommend *against* using ``btrfs`` due to the lack of a stable
+version to test against and frequent bugs in the ENOSPC handling.
+
 We recommend *against* using ``ext4`` due to limitations in the size
 of xattrs it can store, and the problems this causes with the way Ceph
 handles long RADOS object names. Although these issues will generally
@@ -64,36 +60,3 @@ following configuration option::
 
    to use RGW or other librados clients that do not properly handle or
    politely surface any resulting ENAMETOOLONG errors.
-
-
-Filesystem Background Info
-==========================
-
-The ``XFS``, ``btrfs`` and ``ext4`` file systems provide numerous
-advantages in highly scaled data storage environments when `compared`_
-to ``ext3``.
-
-``XFS``, ``btrfs`` and ``ext4`` are `journaling file systems`_, which means that
-they are more robust when recovering from crashes, power outages, etc. These
-filesystems journal all of the changes they will make before performing writes.
-
-``XFS`` was developed for Silicon Graphics, and is a mature and stable
-filesystem. By contrast, ``btrfs`` is a relatively new file system that aims
-to address the long-standing wishes of system administrators working with
-large scale data storage environments. ``btrfs`` has some unique features
-and advantages compared to other Linux filesystems.
-
-``btrfs`` is a `copy-on-write`_ filesystem. It supports file creation
-timestamps and checksums that verify metadata integrity, so it can detect
-bad copies of data and fix them with the good copies. The copy-on-write
-capability means that ``btrfs`` can support snapshots that are writable.
-``btrfs`` supports transparent compression and other features.
-
-``btrfs`` also incorporates multi-device management into the file system,
-which enables you to support heterogeneous disk storage infrastructure,
-data allocation policies. The community also aims to provide ``fsck``,
-deduplication, and data encryption support in the future.
-
-.. _copy-on-write: http://en.wikipedia.org/wiki/Copy-on-write
-.. _compared: http://en.wikipedia.org/wiki/Comparison_of_file_systems
-.. _journaling file systems: http://en.wikipedia.org/wiki/Journaling_file_system
diff --git a/doc/rados/troubleshooting/troubleshooting-osd.rst b/doc/rados/troubleshooting/troubleshooting-osd.rst
index f72c6a4adc1..85e8ced6cb4 100644
--- a/doc/rados/troubleshooting/troubleshooting-osd.rst
+++ b/doc/rados/troubleshooting/troubleshooting-osd.rst
@@ -286,10 +286,11 @@ A storage drive should only support one OSD. Sequential read and
 sequential write throughput can bottleneck if other processes share
 the drive, including journals, operating systems, monitors, other OSDs and
 non-Ceph processes.
 
-Ceph acknowledges writes *after* journaling, so fast SSDs are an attractive
-option to accelerate the response time--particularly when using the ``XFS`` or
-``ext4`` filesystems. By contrast, the ``btrfs`` filesystem can write and journal
-simultaneously.
+Ceph acknowledges writes *after* journaling, so fast SSDs are an
+attractive option to accelerate the response time--particularly when
+using the ``XFS`` or ``ext4`` filesystems. By contrast, the ``btrfs``
+filesystem can write and journal simultaneously. (Note, however, that
+we recommend against using ``btrfs`` for production deployments.)
 
 .. note:: Partitioning a drive does not change its total throughput or
    sequential read/write limits. Running a journal in a separate partition
@@ -364,10 +365,13 @@ might not have a recent enough version of ``glibc`` to support ``syncfs(2)``.
 Filesystem Issues
 -----------------
 
-Currently, we recommend deploying clusters with XFS. The btrfs
-filesystem has many attractive features, but bugs in the filesystem may
-lead to performance issues. We do not recommend ext4 because xattr size
-limitations break our support for long object names (needed for RGW).
+Currently, we recommend deploying clusters with XFS.
+
+We recommend against using btrfs or ext4. The btrfs filesystem has
+many attractive features, but bugs in the filesystem may lead to
+performance issues and spurious ENOSPC errors. We do not recommend
+ext4 because xattr size limitations break our support for long object
+names (needed for RGW).
 
 For more information, see `Filesystem Recommendations`_.
diff --git a/doc/start/hardware-recommendations.rst b/doc/start/hardware-recommendations.rst
index 779cf8fd54d..dda22284387 100644
--- a/doc/start/hardware-recommendations.rst
+++ b/doc/start/hardware-recommendations.rst
@@ -52,10 +52,7 @@ Data Storage
 Plan your data storage configuration carefully. There are significant cost and
 performance tradeoffs to consider when planning for data storage. Simultaneous
 OS operations, and simultaneous request for read and write operations from
-multiple daemons against a single drive can slow performance considerably. There
There -are also file system limitations to consider: btrfs is not quite stable enough -for production, but it has the ability to journal and write data simultaneously, -whereas XFS does not. +multiple daemons against a single drive can slow performance considerably. .. important:: Since Ceph has to write all data to the journal before it can send an ACK (for XFS at least), having the journal and OSD @@ -99,8 +96,7 @@ You may run multiple Ceph OSD Daemons per hard disk drive, but this will likely lead to resource contention and diminish the overall throughput. You may store a journal and object data on the same drive, but this may increase the time it takes to journal a write and ACK to the client. Ceph must write to the journal -before it can ACK the write. The btrfs filesystem can write journal data and -object data simultaneously, whereas XFS cannot. +before it can ACK the write. Ceph best practices dictate that you should run operating systems, OSD data and OSD journals on separate drives. diff --git a/doc/start/os-recommendations.rst b/doc/start/os-recommendations.rst index 65a3ba3d997..1aeb42dca04 100644 --- a/doc/start/os-recommendations.rst +++ b/doc/start/os-recommendations.rst @@ -34,8 +34,8 @@ Linux Kernel - **B-tree File System (Btrfs)** - If you use the ``btrfs`` file system with Ceph, we recommend using a - recent Linux kernel (3.14 or later). + We recommand *against* using ``btrfs`` with Ceph. However, if you + insist on using ``btrfs``, we recommend using a recent Linux kernel. Platforms ========= @@ -130,15 +130,15 @@ Notes ----- - **1**: The default kernel has an older version of ``btrfs`` that we do not - recommend for ``ceph-osd`` storage nodes. Upgrade to a recommended - kernel or use ``XFS``. + recommend for ``ceph-osd`` storage nodes. We recommend using ``XFS``. - **2**: The default kernel has an old Ceph client that we do not recommend for kernel client (kernel RBD or the Ceph file system). Upgrade to a recommended kernel. - **3**: The default kernel regularly fails in QA when the ``btrfs`` - file system is used. We do not recommend using ``btrfs`` for backing Ceph OSDs. + file system is used. We do not recommend using ``btrfs`` for + backing Ceph OSDs. Testing -- 2.39.5