From 3d2d6ffabb16a454135ae614f838f43c56ae5899 Mon Sep 17 00:00:00 2001
From: Kefu Chai
Date: Sun, 27 Mar 2022 00:45:29 +0800
Subject: [PATCH] doc/dpdk: improve the formatting

Signed-off-by: Kefu Chai
---
 doc/dev/dpdk.rst | 190 +++++++++++++++++++++++++++++++----------------
 1 file changed, 125 insertions(+), 65 deletions(-)

diff --git a/doc/dev/dpdk.rst b/doc/dev/dpdk.rst
index 79403ef0eb32b..7c87931b4055e 100644
--- a/doc/dev/dpdk.rst
+++ b/doc/dev/dpdk.rst
@@ -4,107 +4,167 @@ Ceph messenger DPDKStack
 Compiling DPDKStack
 ===================
-Ceph dpdkstack is not compiled by default.Therefore,you need to recompile and
+
+Ceph dpdkstack is not compiled by default. Therefore, you need to recompile and
 enable the DPDKstack component.
-Install dpdk and dpdk-devel,and compile do_cmake.sh -DWITH_DPDK=ON.
+Optionally install ``dpdk-devel`` or ``dpdk-dev`` on distros with precompiled DPDK packages, and compile with:
+
+.. prompt:: bash $
+
+   do_cmake.sh -DWITH_DPDK=ON
+
 Setting the DPDK Network Adapter
 ================================
+
 Most mainstream NICs support SR-IOV and can be virtualized into multiple VF NICs.
-Each OSD uses some dedicated NICs through DPDK. The mon,mgr,and client use the PF NICs
+Each OSD uses some dedicated NICs through DPDK. The mon, mgr, and client use the PF NICs
 through the POSIX protocol stack.
 
 Load the driver on which DPDK depends:
-modprobe vfio
-modprobe vfio_pci
-Configure Hugepage
-vm.nr_hugepages = xxx
+
+.. prompt:: bash #
+
+   modprobe vfio
+   modprobe vfio_pci
+
+Configure hugepages by editing ``/etc/sysctl.conf``::
+
+   vm.nr_hugepages = xxx
 
 Configure the number of VFs based on the number of OSDs:
-echo $numvfs > /sys/class/net/$port/device/sriov_numvfs
+
+.. prompt:: bash #
+
+   echo $numvfs > /sys/class/net/$port/device/sriov_numvfs
 
 Binding NICs to DPDK Applications:
-dpdk-devbind.py -b vfio-pci 0000:xx:yy.z
+
+.. prompt:: bash #
+
+   dpdk-devbind.py -b vfio-pci 0000:xx:yy.z
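+
+To verify that the VF NICs are now bound to the ``vfio-pci`` driver, you can,
+for example, list the binding status:
+
+.. prompt:: bash #
+
+   dpdk-devbind.py --status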
 
 Configuring OSD DPDKStack
 ==========================
-The DPDK RTE initialization process requires the root permission.
-Therefore,you need to grant the root permission to ceph.
-modify /etc/passwd to give ceph user root privilege and /var/run folder write:
-ceph:x:0:0:Ceph storage service:/var/lib/ceph:/bin/false:/var/run
-
-The OSD selects the NICs using ms_dpdk_devs_allowlist:
-1)Configure a single NIC.
-ms_dpdk_devs_allowlist=-a 0000:7d:010 or ms_dpdk_devs_allowlist=--allow=0000:7d:010
-2)Configure the Bond Network Adapter
-ms_dpdk_devs_allowlist=--allow=0000:7d:01.0 --allow=0000:7d:02.6
---vdev=net_bonding0,mode=2,slave=0000:7d:01.0,slave=0000:7d:02.6
+
+The DPDK RTE initialization process requires root privileges. Therefore, you
+need to grant root privileges to the ceph user: modify ``/etc/passwd`` to give
+the ceph user root privileges and write access to the ``/var/run`` folder::
+
+   ceph:x:0:0:Ceph storage service:/var/lib/ceph:/bin/false:/var/run
+
+The OSD selects the NICs using ``ms_dpdk_devs_allowlist``:
+
+#. Configure a single NIC.
+
+   .. code-block:: ini
+
+      ms_dpdk_devs_allowlist=-a 0000:7d:010
+
+   or
+
+   .. code-block:: ini
+
+      ms_dpdk_devs_allowlist=--allow=0000:7d:010
+
+#. Configure a bond network adapter.
+
+   .. code-block:: ini
+
+      ms_dpdk_devs_allowlist=--allow=0000:7d:01.0 --allow=0000:7d:02.6 --vdev=net_bonding0,mode=2,slave=0000:7d:01.0,slave=0000:7d:02.6
 
 DPDK-related configuration items are as follows:
-[osd]
-ms_type=async+dpdk
-ms_async_op_threads=1
-
-ms_dpdk_port_id=0
-ms_dpdk_gateway_ipv4_addr=172.19.36.1
-ms_dpdk_netmask_ipv4_addr=255.255.255.0
-ms_dpdk_hugepages=/dev/hugepages
-ms_dpdk_hw_flow_control=false
-ms_dpdk_lro=false
-ms_dpdk_enable_tso=false
-ms_dpdk_hw_queue_weight=1
-ms_dpdk_memory_channel=2
-ms_dpdk_debug_allow_loopback = true
-
-[osd.x]
-ms_dpdk_coremask=0xf0
-ms_dpdk_host_ipv4_addr=172.19.36.51
-public_addr=172.19.36.51
-cluster_addr=172.19.36.51
-ms_dpdk_devs_allowlist=--allow=0000:7d:01.1
+
+.. code-block:: ini
+
+   [osd]
+   ms_type=async+dpdk
+   ms_async_op_threads=1
+
+   ms_dpdk_port_id=0
+   ms_dpdk_gateway_ipv4_addr=172.19.36.1
+   ms_dpdk_netmask_ipv4_addr=255.255.255.0
+   ms_dpdk_hugepages=/dev/hugepages
+   ms_dpdk_hw_flow_control=false
+   ms_dpdk_lro=false
+   ms_dpdk_enable_tso=false
+   ms_dpdk_hw_queue_weight=1
+   ms_dpdk_memory_channel=2
+   ms_dpdk_debug_allow_loopback = true
+
+   [osd.x]
+   ms_dpdk_coremask=0xf0
+   ms_dpdk_host_ipv4_addr=172.19.36.51
+   public_addr=172.19.36.51
+   cluster_addr=172.19.36.51
+   ms_dpdk_devs_allowlist=--allow=0000:7d:01.1
 
 Debug and Optimization
 ======================
+
 Locate faults based on logs and adjust logs to a proper level:
-debug_dpdk=xx
-debug_ms=xx
+
+.. code-block:: ini
+
+   debug_dpdk=xx
+   debug_ms=xx
+
 if the log contains a large number of retransmit messages,reduce the value of
 ms_dpdk_tcp_wmem.
 
 Run the perf dump command to view DPDKStack statistics:
-ceph daemon osd.$i perf dump | grep dpdk
-if the "dpdk_device_receive_nombuf_errors" keeps increasing,check whether the
+
+.. prompt:: bash $
+
+   ceph daemon osd.$i perf dump | grep dpdk
+
+If ``dpdk_device_receive_nombuf_errors`` keeps increasing, check whether the
 throttling exceeds the limit:
-ceph daemon osd.$i perf dump | grep throttle-osd_client -A 7 | grep "get_or_fail_fail"
-ceph daemon osd.$i perf dump | grep throttle-msgr_dispatch_throttler -A 7 | grep "get_or_fail_fail"
-if the throttling exceeds the threshold,increase the throttling threshold or
+
+.. prompt:: bash $
+
+   ceph daemon osd.$i perf dump | grep throttle-osd_client -A 7 | grep "get_or_fail_fail"
+   ceph daemon osd.$i perf dump | grep throttle-msgr_dispatch_throttler -A 7 | grep "get_or_fail_fail"
+
+If the throttling exceeds the threshold, increase the throttling threshold or
 disable the throttling.
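+
+These throttles can be tuned with, for example, the ``osd_client_message_cap``,
+``osd_client_message_size_cap``, and ``ms_dispatch_throttle_bytes`` options;
+the values below are illustrative only and should be tuned for your cluster:
+
+.. code-block:: ini
+
+   [osd]
+   # illustrative values, not recommendations; larger values loosen the throttles
+   osd_client_message_cap = 10000
+   osd_client_message_size_cap = 1073741824
+   ms_dispatch_throttle_bytes = 209715200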
 
 Check whether the network adapter is faulty or abnormal.Run the following
 command to obtain the network adapter status and statistics:
-ceph daemon osd.$i show_pmd_stats
-ceph daemon osd.$i show_pmd_xstats
-Some DPDK versions(eg.dpdk-20.11-3.e18.aarch64)or NIC TSOs are abnormal,
+
+.. prompt:: bash $
+
+   ceph daemon osd.$i show_pmd_stats
+   ceph daemon osd.$i show_pmd_xstats
+
+TSO misbehaves with some DPDK versions (e.g., dpdk-20.11-3.e18.aarch64) or some NICs;
 try disabling tso:
-ms_dpdk_enable_tso=false
-if VF NICs support multiple queues,more NIC queues can be allocated to a
+
+.. code-block:: ini
+
+   ms_dpdk_enable_tso=false
+
+If VF NICs support multiple queues, more NIC queues can be allocated to a
 single core to improve performance:
-ms_dpdk_hw_queues_per_qp=4
+
+.. code-block:: ini
+
+   ms_dpdk_hw_queues_per_qp=4
 
 Status and Future Work
 ======================
-Compared with POSIX Stack,In the multi-concurrency test,DPDKStack has the same
-4K random write performance,8K random write performance is improved by 28%,and
-1 MB packets are unstable.In the single-latency test,the 4K and 8K random write
-latency is reduced by 15%(the lower the latency is,the better).
-
-At a high level,our future work plan is:
-OSD multiple network support (public network and cluster network)
-The public and cluster network adapters can be configured.When connecting or
-listening,the public or cluster network adapters can be selected based on the
-IP address.During msgr-work initialization,initialize both the public and cluster
-network adapters and create two DPDKQueuePairs.
+
+Compared with the POSIX stack in the multi-concurrency test, DPDKStack has the
+same 4K random write performance, improves 8K random write performance by 28%,
+and is unstable with 1 MB packets. In the single-latency test, the 4K and 8K
+random write latency is reduced by 15% (the lower the latency, the better).
+
+At a high level, our future work plan is:
+
+  OSD multiple network support (public network and cluster network)
+    The public and cluster network adapters can be configured. When connecting
+    or listening, the public or cluster network adapter can be selected based
+    on the IP address. During msgr-worker initialization, initialize both the
+    public and cluster network adapters and create two DPDKQueuePairs.
-- 
2.39.5