{"id":232783,"date":"2024-06-15T19:10:02","date_gmt":"2024-06-15T19:10:02","guid":{"rendered":"https:\/\/michigandigitalnews.com\/index.php\/2024\/06\/15\/nvidia-enhances-rdma-performance-with-doca-gpunetio\/"},"modified":"2025-06-25T17:16:59","modified_gmt":"2025-06-25T17:16:59","slug":"nvidia-enhances-rdma-performance-with-doca-gpunetio","status":"publish","type":"post","link":"https:\/\/michigandigitalnews.com\/index.php\/2024\/06\/15\/nvidia-enhances-rdma-performance-with-doca-gpunetio\/","title":{"rendered":"NVIDIA Enhances RDMA Performance with DOCA GPUNetIO"},"content":{"rendered":"<div>\n<figure class=\"figure mt-2\">\n<img decoding=\"async\" class=\"rounded\" src=\"https:\/\/image.blockchain.news:443\/features\/D8E08E86F8EDBDDCD68414CF49BDD8B1401B11A69515DFF98E6B2B03EE9CF9D7.jpg\" alt=\"NVIDIA Enhances RDMA Performance with DOCA GPUNetIO\"\/>\n<\/figure>\n<p>NVIDIA has unveiled new capabilities for its DOCA GPUNetIO library, enabling GPU-accelerated Remote Direct Memory Access (RDMA) for real-time inline GPU packet processing. This enhancement leverages technologies such as GPUDirect RDMA and GPUDirect Async, allowing a CUDA kernel to directly communicate with the network interface card (NIC), bypassing the CPU. 
This update aims to improve the performance of GPU-centric applications by reducing latency and CPU utilization, according to the <a rel=\"nofollow\" href=\"https:\/\/developer.nvidia.com\/blog\/unlocking-gpu-accelerated-rdma-with-nvidia-doca-gpunetio\/\">NVIDIA Technical Blog<\/a>.<\/p>\n<h2>Enhanced RDMA Functionality<\/h2>\n<p>Previously, DOCA GPUNetIO, together with DOCA Ethernet and DOCA Flow, was limited to packet transmission over the Ethernet transport layer. The latest release, DOCA 2.7, introduces a new set of APIs that enable RDMA communication directly from a CUDA kernel running on the GPU, over either the RoCE or the InfiniBand transport layer. Because the GPU now controls the data path of the RDMA application, data transfers can achieve high throughput and low latency.<\/p>\n<h2>RDMA GPU Data Path<\/h2>\n<p>RDMA provides direct access between the main memory of two hosts without involving the operating system, cache, or storage. This is achieved by registering a local memory area and sharing its access information with the remote host, enabling high-throughput, low-latency data transfers. An RDMA application follows three fundamental steps: local configuration (creating RDMA queues and registering memory buffers), exchange of connection and memory information with the remote peer, and data path execution (issuing the actual RDMA operations, such as reads and writes).<\/p>\n<p>With the new GPUNetIO RDMA functions, the data path step runs in a CUDA kernel on the GPU instead of on the CPU. This reduces latency and frees up CPU cycles, allowing the GPU to act as the main controller of the application.<\/p>\n<h2>Performance Comparison<\/h2>\n<p>NVIDIA compared the performance of the GPUNetIO RDMA functions with that of the IB Verbs RDMA functions using the perftest microbenchmark suite. The tests were run on a Dell R750 server with an NVIDIA H100 GPU and a ConnectX-7 network card. 
The results show that DOCA GPUNetIO RDMA performance is comparable to that of the IB Verbs perftest implementation, with both achieving similar peak bandwidth and elapsed times.<\/p>\n<p>The tests used one RDMA queue, 2,048 iterations, and 512 RDMA writes per iteration, with message sizes ranging from 64 to 4,096 bytes. When the number of queues was increased to four, both implementations reached a peak bandwidth of up to 16 GB\/s, demonstrating the scalability and efficiency of the new GPUNetIO RDMA functions.<\/p>\n<h2>Benefits and Applications<\/h2>\n<p>Offloading control of the RDMA data path to the GPU offers several benefits:<\/p>\n<ul>\n<li>Scalability: The GPU can manage multiple RDMA queues in parallel.<\/li>\n<li>Parallelism: Several CUDA threads can work on the data path simultaneously, providing a high degree of parallelism.<\/li>\n<li>Lower CPU Utilization: Because the CPU is minimally involved, performance is largely platform-independent.<\/li>\n<li>Reduced Bus Transactions: Fewer internal bus transactions are needed, as the CPU is no longer responsible for data synchronization.<\/li>\n<\/ul>\n<p>This update is particularly beneficial for network applications in which data processing already occurs on the GPU, enabling more efficient and scalable solutions. 
For more details, visit the <a rel=\"nofollow\" href=\"https:\/\/developer.nvidia.com\/blog\/unlocking-gpu-accelerated-rdma-with-nvidia-doca-gpunetio\/\">NVIDIA Technical Blog<\/a>.<\/p>\n<p><span><i>Image source: Shutterstock<\/i><\/span><\/p>\n<\/div>\n<p><a href=\"https:\/\/blockchain.news\/news\/nvidia-enhances-rdma-performance-doca-gpunetio\">Source link<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>NVIDIA has unveiled new capabilities for its DOCA GPUNetIO library, enabling GPU-accelerated Remote Direct Memory Access (RDMA) for real-time<\/p>\n","protected":false},"author":1,"featured_media":232784,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"_uf_show_specific_survey":0,"_uf_disable_surveys":false,"footnotes":""},"categories":[171],"tags":[],"_links":{"self":[{"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/posts\/232783"}],"collection":[{"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/comments?post=232783"}],"version-history":[{"count":0,"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/posts\/232783\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"hr
ef":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/media\/232784"}],"wp:attachment":[{"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/media?parent=232783"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/categories?post=232783"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/tags?post=232783"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}