From: Anthony D'Atri
Date: Thu, 23 Oct 2025 19:29:19 +0000 (-0400)
Subject: doc/start: Improve hardware-recommendations.rst
X-Git-Url: http://git-server-git.apps.pok.os.sepia.ceph.com/?a=commitdiff_plain;h=d24c3ac173c09018cd45d8932cde264d36cee257;p=ceph.git

doc/start: Improve hardware-recommendations.rst

Signed-off-by: Anthony D'Atri
---

diff --git a/doc/start/hardware-recommendations.rst b/doc/start/hardware-recommendations.rst
index 7a10315724b7..1047d56fc35c 100644
--- a/doc/start/hardware-recommendations.rst
+++ b/doc/start/hardware-recommendations.rst
@@ -68,15 +68,21 @@ is advised.
 on a small initial cluster footprint.
 
 There is an :confval:`osd_memory_target` setting for BlueStore OSDs that
-defaults to 4GB. Factor in a prudent margin for the operating system and
+defaults to 4 GiB. Factor in a prudent margin for the operating system and
 administrative tasks (like monitoring and metrics) as well as increased
-consumption during recovery: provisioning ~8GB *per BlueStore OSD* is thus
-advised.
+consumption during recovery. We recommend ensuring that total server RAM
+is greater than (number of OSDs * ``osd_memory_target`` * 2), which
+allows for usage by the OS and by other Ceph daemons. A 1U server with
+8-10 OSDs is thus well-provisioned with 128 GB of physical memory. Enabling
+:confval:`osd_memory_target_autotune` can help avoid OOMing under heavy load
+or when non-OSD daemons migrate onto a node. An effective
+:confval:`osd_memory_target` of at least 6 GiB can help mitigate slow
+requests on HDD OSDs.
+
 
 Monitors and Managers (ceph-mon and ceph-mgr)
 ---------------------------------------------
 
-Monitor and manager daemon memory usage scales with the size of the
+Monitor and Manager memory usage scales with the size of the
 cluster. Note that at boot-time and during topology changes and recovery these
 daemons will need more RAM than they do during steady-state operation, so plan
 for peak usage. For very small clusters, 32 GB suffices. For clusters of up to,
@@ -99,8 +105,8 @@ its cache. We recommend 1 GB as a minimum for most systems. See
 Memory
 ======
 
-Bluestore uses its own memory to cache data rather than relying on the
-operating system's page cache. In Bluestore you can adjust the amount of memory
+BlueStore uses its own memory to cache data rather than relying on the
+operating system's page cache. When using the BlueStore OSD back end, you can adjust the amount of memory
 that the OSD attempts to consume by changing the :confval:`osd_memory_target`
 configuration option.
 
@@ -140,10 +146,11 @@ configuration option.
 may result in lower performance, and your Ceph cluster may well be happier
 with a daemon that crashes vs one that slows to a crawl.
 
-When using the legacy FileStore back end, the OS page cache was used for caching
-data, so tuning was not normally needed. When using the legacy FileStore backend,
-the OSD memory consumption was related to the number of PGs per daemon in the
-system.
+When using the legacy Filestore back end, the OS page cache was used for caching
+data, so tuning was not normally needed. OSD memory consumption is related
+to the workload and number of PGs that it serves. BlueStore OSDs do not use
+the page cache, so the autotuner is recommended to ensure that RAM is used
+fully but prudently.
 
 
 Data Storage
@@ -174,7 +181,7 @@ drives:
 
 For more information on how to effectively use a mix of fast drives and slow
 drives in your Ceph cluster, see the :ref:`block and block.db `
-section of the Bluestore Configuration Reference.
+section of the BlueStore Configuration Reference.
 
 Hard Disk Drives
 ----------------
@@ -507,19 +514,19 @@ core / spine network switches or routers, often at least 40 Gb/s.
 Baseboard Management Controller (BMC)
 -------------------------------------
 
-Your server chassis should have a Baseboard Management Controller (BMC).
+Your server chassis likely has a Baseboard Management Controller (BMC).
 Well-known examples are iDRAC (Dell), CIMC (Cisco UCS), and iLO (HPE).
 Administration and deployment tools may also use BMCs extensively, especially
 via IPMI or Redfish, so consider the cost/benefit tradeoff of an out-of-band
-network for security and administration.  Hypervisor SSH access, VM image uploads,
+network for security and administration. Hypervisor SSH access, VM image uploads,
 OS image installs, management sockets, etc. can impose significant loads on a
 network. Running multiple networks may seem like overkill, but each traffic
 path represents a potential capacity, throughput and/or performance bottleneck
 that you should carefully consider before deploying a large scale data
 cluster.
 
-Additionally BMCs as of 2023 rarely sport network connections faster than 1 Gb/s,
+Additionally, BMCs as of 2025 rarely offer network connections faster than 1 Gb/s,
 so dedicated and inexpensive 1 Gb/s switches for BMC administrative traffic
-may reduce costs by wasting fewer expenive ports on faster host switches.
+may reduce costs by wasting fewer expensive ports on faster host switches.
 
 Failure Domains
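
The sizing rule added in the first hunk, total server RAM greater than
(number of OSDs * ``osd_memory_target`` * 2), can be sanity-checked with a
quick calculation. The following sketch is illustrative only and is not part
of the patch or of Ceph itself; the helper name ``recommended_ram_bytes`` and
the sample OSD counts are hypothetical, and the 4 GiB default comes from the
patched text::

    # Illustrative sketch of the RAM sizing rule above; not a Ceph tool.
    # Assumes the documented default osd_memory_target of 4 GiB.
    GIB = 1024 ** 3

    def recommended_ram_bytes(num_osds: int, osd_memory_target: int = 4 * GIB) -> int:
        """Minimum server RAM suggested by num_osds * osd_memory_target * 2.

        The factor of 2 leaves headroom for the OS, recovery, and other
        Ceph daemons, as recommended in the patched text.
        """
        return num_osds * osd_memory_target * 2

    # A 1U server with 10 OSDs at the 4 GiB default needs more than 80 GiB,
    # so 128 GB of physical memory is comfortably provisioned.
    for osds in (8, 10, 12):
        print(f"{osds} OSDs -> at least {recommended_ram_bytes(osds) / GIB:.0f} GiB RAM")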