From: John Wilkins
Date: Fri, 7 Sep 2012 03:31:46 +0000 (-0700)
Subject: :doc: Addresses Documentation #3096. Also added new information.
X-Git-Tag: v0.53~176
X-Git-Url: http://git-server-git.apps.pok.os.sepia.ceph.com/?a=commitdiff_plain;h=9bbe73cc1d2ea6dfbdea0d823aa6904c2e0873d7;p=ceph.git

:doc: Addresses Documentation #3096. Also added new information.

Signed-off-by: John Wilkins
---

diff --git a/doc/install/hardware-recommendations.rst b/doc/install/hardware-recommendations.rst
index a1d082717142..54349adba309 100644
--- a/doc/install/hardware-recommendations.rst
+++ b/doc/install/hardware-recommendations.rst
@@ -2,24 +2,103 @@
  Hardware Recommendations
 ==========================

-Ceph runs on commodity hardware and a Linux operating system over a TCP/IP
-network. The hardware recommendations for different processes/daemons differ
-considerably.
-
-* **OSDs:** OSD hosts should have ample data storage in the form of a hard drive
-  or a RAID. Ceph OSDs run the RADOS service, calculate data placement with
-  CRUSH, and maintain their own copy of the cluster map. Therefore, OSDs
-  should have a reasonable amount of processing power.
-
-* **Monitors:** Ceph monitor hosts require enough disk space for the cluster map,
-  but usually do not encounter heavy loads. Monitor hosts do not need to be
-  very powerful.
+Ceph was designed to run on commodity hardware, which makes building and
+maintaining petabyte-scale data clusters economically feasible. When planning
+out your cluster hardware, you will need to balance a number of
+considerations, including failure domains and potential performance issues.
+Hardware planning should include distributing Ceph daemons and other
+processes that use Ceph across many hosts. Generally, we recommend running
+Ceph daemons of a specific type on a host configured for that type of daemon.
+We recommend using other hosts for processes that utilize your data cluster
+(e.g., OpenStack, CloudStack, etc.).
+
+- **CPU**: Ceph metadata servers dynamically redistribute their load, which
+  is CPU intensive, so your metadata servers should have significant
+  processing power (e.g., quad-core or better CPUs). Ceph OSDs run the RADOS
+  service, calculate data placement with CRUSH, replicate data, and maintain
+  their own copy of the cluster map. Therefore, OSDs should have a reasonable
+  amount of processing power (e.g., dual-core processors). Monitors simply
+  maintain a master copy of the cluster map, so they are not CPU intensive.
+  You must also consider whether the host machine will run CPU-intensive
+  processes in addition to Ceph daemons. For example, if your hosts will run
+  computing VMs (e.g., OpenStack Nova), you will need to ensure that these
+  other processes leave sufficient processing power for Ceph daemons. We
+  recommend running additional CPU-intensive processes on separate hosts.
+
+- **RAM**: Metadata servers and monitors must be capable of serving their
+  data quickly, so they should have plenty of RAM (e.g., 1GB of RAM per
+  daemon instance). OSDs do not require as much RAM (e.g., 500MB of RAM per
+  daemon instance). Generally, more RAM is better.
+
+- **Data Storage**: Plan your data storage configuration carefully, because
+  there is a significant cost/performance tradeoff to consider: solid state
+  drives can improve performance considerably, but at a much higher cost per
+  gigabyte than hard disk drives. Metadata servers and monitors don't use a
+  lot of storage space. A metadata server requires approximately 1MB of
+  storage space per daemon instance. A monitor requires approximately 10GB of
+  storage space per daemon instance. One opportunity for performance
+  improvement is to use solid-state drives to reduce random access time and
+  read latency while accelerating throughput. Solid state drives cost more
+  than 10x as much per gigabyte as hard disk drives, but they often exhibit
+  access times that are at least 100x faster. Since the storage requirements
+  for metadata servers and monitors are so low, solid state drives may
+  provide an economical opportunity to improve performance. OSDs should have
+  plenty of disk space. We recommend a minimum disk size of 1 terabyte. We
+  recommend dividing the price of the hard disk drive by the number of
+  gigabytes to arrive at a cost per gigabyte, because larger drives may have
+  a significant impact on the cost-per-gigabyte. For example, a 1 terabyte
+  hard disk priced at $75.00 has a cost of $0.07 per gigabyte (i.e.,
+  $75 / 1024 = 0.0732). By contrast, a 3 terabyte hard disk priced at $150.00
+  has a cost of $0.05 per gigabyte (i.e., $150 / 3072 = 0.0488). In the
+  foregoing example, using the 1 terabyte disks would increase the cost per
+  gigabyte by roughly 50% (0.0732 vs. 0.0488)--rendering your cluster
+  substantially less cost efficient (see the worked example after this
+  list). For OSD hosts, we recommend using an OS disk for the operating
+  system and software, and one disk for each OSD daemon you run on the host.
+  While solid state drives are cost prohibitive for object storage, OSDs may
+  see a performance improvement by storing an OSD's journal on a solid state
+  drive and the OSD's object data on a hard disk drive. You may run multiple
+  OSDs per host, but you should ensure that the sum of the total throughput
+  of your OSD hard disks doesn't exceed the network bandwidth available to
+  serve client reads and writes. You should also consider what percentage of
+  the cluster's data storage is on each host. If the percentage is large and
+  the host fails, it can lead to problems such as exceeding the
+  ``full ratio``, which causes Ceph to halt operations as a safety precaution
+  that prevents data loss.

-* **Metadata Servers:** Ceph metadata servers distribute their load. However,
-  metadata servers must be capable of serving their data quickly. Metadata
-  servers should have strong processing capability and plenty of RAM.
+- **Networks**: We recommend that each host have at least two 1Gbps network
+  interface controllers (NICs). Since most commodity hard disk drives have a
+  throughput of approximately 100MB/sec., your NICs should be able to handle
+  the traffic for the OSD disks on your host. We recommend a minimum of two
+  NICs to account for a public (front-side) network and a cluster
+  (back-side) network. A cluster network (preferably not connected to the
+  internet) handles the additional load for data replication and helps stop
+  denial of service attacks that prevent the cluster from achieving
+  ``active + clean`` states for placement groups as OSDs replicate data
+  across the cluster. Consider starting with a 10Gbps network in your racks.
+  Replicating 1TB of data across a 1Gbps network takes 3 hours, and 3TBs (a
+  typical drive configuration) takes 9 hours. By contrast, with a 10Gbps
+  network, the replication times would be 20 minutes and 1 hour respectively
+  (see the worked example after this list). In a petabyte-scale cluster,
+  failure of an OSD disk should be an expectation, not an exception. System
+  administrators will appreciate PGs recovering from a ``degraded`` state to
+  an ``active + clean`` state as rapidly as possible, with price/performance
+  tradeoffs taken into consideration. Top-of-rack routers for each network
+  need to be able to communicate with spine routers that have even faster
+  throughput--e.g., 40Gbps to 100Gbps. Some experts suggest using a third
+  NIC per host for a management network (e.g., hypervisor SSH access, VM
+  image uploads, management sockets, etc.), and potentially a fourth NIC per
+  host to handle VM traffic between the cluster and compute stacks (e.g.,
+  OpenStack, CloudStack, etc.). Running three or four logical networks may
+  seem like overkill, but each traffic path represents a potential capacity,
+  throughput and/or performance bottleneck that you should carefully
+  consider before deploying a large scale data cluster.
+
+- **Failure Domains**: A failure domain is any failure that prevents access
+  to one or more OSDs. That could be a stopped daemon on a host, a hard disk
+  failure, an OS crash, a malfunctioning NIC, a failed power supply, a
+  network outage, a power outage, and so forth. When planning out your
+  hardware needs, you must balance the temptation to reduce costs by placing
+  too many responsibilities into too few failure domains against the added
+  costs of isolating every potential failure domain.
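+
+As a rough sanity check, the arithmetic above is easy to reproduce. The
+following sketch uses only the example figures from this section (the drive
+prices quoted above, plus an assumed ~100MB/sec of usable throughput per
+1Gbps of nominal bandwidth); substitute your own numbers::
+
+    # Example figures from the discussion above; substitute your own.
+    drives = {
+        "1TB disk at $75.00": (75.00, 1024),    # (price in USD, capacity in GB)
+        "3TB disk at $150.00": (150.00, 3072),
+    }
+    for name, (price, gigabytes) in drives.items():
+        print("%s: $%.4f per gigabyte" % (name, price / gigabytes))
+
+    # Rough replication times, assuming ~100MB/sec of usable throughput per
+    # 1Gbps of network bandwidth (the figure quoted for commodity drives).
+    for gbps in (1, 10):
+        for terabytes in (1, 3):
+            hours = (terabytes * 1024 * 1024) / (gbps * 100.0) / 3600
+            print("%dTB over a %dGbps network: about %.1f hours" % (terabytes, gbps, hours))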
+
+`Inktank`_ provides excellent premium support for hardware planning.
+
+.. _Inktank: http://www.inktank.com
-.. note:: If you are not using the Ceph File System, you do not need a meta data server.
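+
+Several of the recommendations above correspond to ordinary ``ceph.conf``
+settings. The fragment below is only a sketch: the subnets, paths, and
+ratio shown are hypothetical placeholders rather than recommended values::
+
+    [global]
+        # Keep client (front-side) and replication (back-side) traffic on
+        # separate networks. Example subnets only.
+        public network  = 192.168.0.0/24
+        cluster network = 192.168.1.0/24
+
+        # The full ratio at which Ceph halts operations as a safety
+        # precaution against data loss (.95 is the usual default).
+        mon osd full ratio = .95
+
+    [osd.0]
+        # Object data on a hard disk drive, journal on a solid state
+        # drive. Example paths only.
+        osd data    = /var/lib/ceph/osd/ceph-0
+        osd journal = /dev/ssd/osd.0.journal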

 Minimum Hardware Recommendations
 ================================
@@ -30,29 +109,29 @@ and development clusters can run successfully with modest hardware.
 +--------------+----------------+------------------------------------+
 | Process      | Criteria       | Minimum Recommended                |
 +==============+================+====================================+
-| ``ceph-osd`` | Processor      | 64-bit AMD-64/i386 dual-core       |
+| ``ceph-osd`` | Processor      | 1x 64-bit AMD-64/i386 dual-core    |
 |              +----------------+------------------------------------+
 |              | RAM            | 500 MB per daemon                  |
 |              +----------------+------------------------------------+
-|              | Volume Storage | 1-disk or RAID per daemon          |
+|              | Volume Storage | 1x Disk per daemon                 |
 |              +----------------+------------------------------------+
-|              | Network        | 2-1GB Ethernet NICs                |
+|              | Network        | 2x 1Gbps Ethernet NICs             |
 +--------------+----------------+------------------------------------+
-| ``ceph-mon`` | Processor      | 64-bit AMD-64/i386                 |
+| ``ceph-mon`` | Processor      | 1x 64-bit AMD-64/i386              |
 |              +----------------+------------------------------------+
 |              | RAM            | 1 GB per daemon                    |
 |              +----------------+------------------------------------+
 |              | Disk Space     | 10 GB per daemon                   |
 |              +----------------+------------------------------------+
-|              | Network        | 2-1GB Ethernet NICs                |
+|              | Network        | 2x 1Gbps Ethernet NICs             |
 +--------------+----------------+------------------------------------+
-| ``ceph-mds`` | Processor      | 64-bit AMD-64/i386 quad-core       |
+| ``ceph-mds`` | Processor      | 1x 64-bit AMD-64/i386 quad-core    |
 |              +----------------+------------------------------------+
 |              | RAM            | 1 GB minimum per daemon            |
 |              +----------------+------------------------------------+
 |              | Disk Space     | 1 MB per daemon                    |
 |              +----------------+------------------------------------+
-|              | Network        | 2-1GB Ethernet NICs                |
+|              | Network        | 2x 1Gbps Ethernet NICs             |
 +--------------+----------------+------------------------------------+

 .. important:: If you are running an OSD with a single disk, create a
@@ -73,34 +152,30 @@ configurations for Ceph OSDs, and a lighter configuration for monitors.
 +----------------+----------------+------------------------------------+
 | Configuration  | Criteria       | Minimum Recommended                |
 +================+================+====================================+
-| Dell PE R510   | Processor      | 2 64-bit quad-core Xeon CPUs       |
+| Dell PE R510   | Processor      | 2x 64-bit quad-core Xeon CPUs      |
 |                +----------------+------------------------------------+
 |                | RAM            | 16 GB                              |
 |                +----------------+------------------------------------+
-|                | Volume Storage | 8-2TB drives. 1-OS 7-Storage       |
+|                | Volume Storage | 8x 2TB drives. 1 OS, 7 Storage     |
 |                +----------------+------------------------------------+
-|                | Client Network | 2-1GB Ethernet NICs                |
+|                | Client Network | 2x 1Gbps Ethernet NICs             |
 |                +----------------+------------------------------------+
-|                | OSD Network    | 2-1GB Ethernet NICs                |
+|                | OSD Network    | 2x 1Gbps Ethernet NICs             |
 |                +----------------+------------------------------------+
-|                | NIC Mgmt.      | 2-1GB Ethernet NICs                |
+|                | NIC Mgmt.      | 2x 1Gbps Ethernet NICs             |
 +----------------+----------------+------------------------------------+
-| Dell PE R515   | Processor      | 1 hex-core Opteron CPU             |
+| Dell PE R515   | Processor      | 1x hex-core Opteron CPU            |
 |                +----------------+------------------------------------+
 |                | RAM            | 16 GB                              |
 |                +----------------+------------------------------------+
-|                | Volume Storage | 12-3TB drives. Storage             |
+|                | Volume Storage | 12x 3TB drives. Storage            |
 |                +----------------+------------------------------------+
-|                | OS Storage     | 1-500GB drive. Operating System.   |
+|                | OS Storage     | 1x 500GB drive. Operating System.  |
 |                +----------------+------------------------------------+
-|                | Client Network | 2-1GB Ethernet NICs                |
+|                | Client Network | 2x 1Gbps Ethernet NICs             |
 |                +----------------+------------------------------------+
-|                | OSD Network    | 2-1GB Ethernet NICs                |
+|                | OSD Network    | 2x 1Gbps Ethernet NICs             |
 |                +----------------+------------------------------------+
-|                | NIC Mgmt.      | 2-1GB Ethernet NICs                |
+|                | NIC Mgmt.      | 2x 1Gbps Ethernet NICs             |
 +----------------+----------------+------------------------------------+
-
-
-
-