From 7a2a5651018ac99beef022d6e05d537250661486 Mon Sep 17 00:00:00 2001
From: Anthony D'Atri
Date: Sun, 15 May 2022 14:26:58 -0700
Subject: [PATCH] doc/rados/configuration: Enhance BlueStore min_alloc_size section

Signed-off-by: Anthony D'Atri
---
 .../configuration/bluestore-config-ref.rst | 41 ++++++++++++++-----
 1 file changed, 31 insertions(+), 10 deletions(-)

diff --git a/doc/rados/configuration/bluestore-config-ref.rst b/doc/rados/configuration/bluestore-config-ref.rst
index 493e529f76b01..55fc9794ae1d9 100644
--- a/doc/rados/configuration/bluestore-config-ref.rst
+++ b/doc/rados/configuration/bluestore-config-ref.rst
@@ -395,16 +395,13 @@ is created on an HDD, BlueStore will be initialized with the current value of
 :confval:`bluestore_min_alloc_size_hdd`, and SSD OSDs (including NVMe devices)
 with the value of :confval:`bluestore_min_alloc_size_ssd`.
 
-Note that this BlueStore attribute takes effect *only* at OSD creation; if
-changed later, a given OSD's behavior will not change unless / until it is
-destroyed and redeployed.
-
 Through the Mimic release, the default values were 64KB and 16KB for rotational
-(HDD) and non-rotational (SSD) media respectively. Octopus and later releases
-default to a value of 4KB for all media types.
+(HDD) and non-rotational (SSD) media respectively. Octopus changed the default
+for SSD (non-rotational) media to 4KB, and Pacific changed the default for HDD
+(rotational) media to 4KB as well.
 
-This change was driven by the space amplification experienced by Ceph RADOS
-GateWay (RGW) deployments that host large numbers of relatively small files
+These changes were driven by space amplification experienced by Ceph RADOS
+GateWay (RGW) deployments that host large numbers of small files
 (S3/Swift objects).
 
 For example, when an RGW client stores a 1KB S3 object, it is written to a
@@ -446,12 +443,36 @@ the :confval:`bluestore_use_optimal_io_size_for_min_alloc_size` option that
 enables automatic discovery of the appropriate value as each OSD is created.
 Note that the use of ``bcache``, ``OpenCAS``, ``dmcrypt``,
 ``ATA over Ethernet``, `iSCSI`, or other device layering / abstraction
-technologies may confound the determination of appropriate values. We suggest
-inspecting such OSDs at startup via logs and admin sockets to ensure that
+technologies may confound the determination of appropriate values. OSD devices
+deployed on top of VMware VSAN virtual volumes have been reported to also
+sometimes report a ``rotational`` attribute that does not match the underlying
+hardware.
+
+We suggest inspecting such OSDs at startup via logs and admin sockets to ensure that
 behavior is appropriate. Note that this also may not work as desired with
 older kernels. You can check for this by examining the presence and value
 of ``/sys/block//queue/optimal_io_size``.
 
+You may also inspect a given OSD:
+
+  .. prompt:: bash #
+
+     ceph osd metadata osd.1701 | grep rotational
+
+This space amplification may manifest as an unusually high ratio of raw to
+stored data reported by ``ceph df``. ``ceph osd df`` may also report
+anomalously high ``%USE`` / ``VAR`` values when
+compared to other, ostensibly identical OSDs. A pool using OSDs with
+mismatched ``min_alloc_size`` values may experience unexpected balancer
+behavior as well.
+
+Note that this BlueStore attribute takes effect *only* at OSD creation; if
+changed later, a given OSD's behavior will not change unless / until it is
+destroyed and redeployed with the appropriate option value(s). Upgrading
+to a later Ceph release will *not* change the value used by OSDs deployed
+under older releases or with other settings.
+
+
 .. confval:: bluestore_min_alloc_size
 .. confval:: bluestore_min_alloc_size_hdd
 .. confval:: bluestore_min_alloc_size_ssd
-- 
2.39.5