{"id":231727,"date":"2024-06-13T05:48:50","date_gmt":"2024-06-13T05:48:50","guid":{"rendered":"https:\/\/michigandigitalnews.com\/index.php\/2024\/06\/13\/nvidia-breaks-records-in-generative-ai-with-mlperf-training-v4-0\/"},"modified":"2025-06-25T17:17:14","modified_gmt":"2025-06-25T17:17:14","slug":"nvidia-breaks-records-in-generative-ai-with-mlperf-training-v4-0","status":"publish","type":"post","link":"https:\/\/michigandigitalnews.com\/index.php\/2024\/06\/13\/nvidia-breaks-records-in-generative-ai-with-mlperf-training-v4-0\/","title":{"rendered":"NVIDIA Breaks Records in Generative AI with MLPerf Training v4.0"},"content":{"rendered":"<div>\n<figure class=\"figure mt-2\">\n<a href=\"https:\/\/image.blockchain.news:443\/features\/D8E08E86F8EDBDDCD68414CF49BDD8B1401B11A69515DFF98E6B2B03EE9CF9D7.jpg\" data-glightbox=\"\" data-gallery=\"image-popup\">\n<img decoding=\"async\" class=\"rounded\" src=\"https:\/\/image.blockchain.news:443\/features\/D8E08E86F8EDBDDCD68414CF49BDD8B1401B11A69515DFF98E6B2B03EE9CF9D7.jpg\" alt=\"NVIDIA Breaks Records in Generative AI with MLPerf Training v4.0\"\/>\n<\/a>\n<\/figure>\n<p>NVIDIA has set new performance and scale records in the generative AI domain, according to a recent submission to MLPerf Training v4.0. This achievement underscores the company&#8217;s ongoing dominance in AI training benchmarks, particularly in the realm of large language models (LLMs) and generative AI.<\/p>\n<h2>MLPerf Training v4.0 Updates<\/h2>\n<p>MLPerf Training, developed by the MLCommons consortium, is the industry-standard benchmark for evaluating end-to-end AI training performance. The latest version, v4.0, introduced two new tests to reflect popular industry workloads. 
The first test measures the fine-tuning speed of Llama 2 70B using the low-rank adaptation (LoRA) technique. The second test focuses on graph neural network (GNN) training, based on an implementation of the relational graph attention network (RGAT).<\/p>\n<p>The updated test suite includes a variety of workloads such as LLM pre-training (GPT-3 175B), LLM fine-tuning (Llama 2 70B with LoRA), text-to-image (Stable Diffusion v2), and several others, covering a wide range of AI applications.<\/p>\n<h2>NVIDIA&#8217;s Record-Breaking Performance<\/h2>\n<p>In the latest MLPerf Training round, NVIDIA achieved its results using its full hardware and software stack:<\/p>\n<ul>\n<li>NVIDIA Hopper GPUs<\/li>\n<li>Fourth-generation NVLink interconnect with third-generation NVSwitch chip<\/li>\n<li>NVIDIA Quantum-2 InfiniBand networking<\/li>\n<li>An optimized NVIDIA software stack<\/li>\n<\/ul>\n<p>These components have been further optimized since the last round, enabling NVIDIA to break previous records. For instance, NVIDIA cut its GPT-3 175B training time from 10.9 minutes on 3,584 H100 GPUs to 3.4 minutes on 11,616 H100 GPUs: roughly 3.2x faster on about 3.2x as many GPUs, demonstrating near-linear performance scaling.<\/p>\n<h2>Generative AI and LLM Fine-Tuning<\/h2>\n<p>NVIDIA also set new records in LLM fine-tuning, particularly with the Llama 2 70B model developed by Meta. Using the LoRA technique, a single DGX H100 with eight H100 GPUs completed the fine-tuning in just over 28 minutes. The NVIDIA H200 Tensor Core GPU further reduced this time to 24.7 minutes. NVIDIA&#8217;s submissions also showcased scalability, achieving a fine-tuning time of just 1.5 minutes using 1,024 H100 GPUs.<\/p>\n<p>The company leveraged the context parallelism capability available in the NVIDIA NeMo framework to achieve these results. 
Additionally, an FP8 implementation of self-attention in cuDNN improved performance by 15% at the 8-GPU scale.<\/p>\n<h2>Advancements in Visual Generative AI<\/h2>\n<p>MLPerf Training v4.0 also includes a benchmark for text-to-image generative AI based on Stable Diffusion v2. NVIDIA&#8217;s submissions delivered up to 80% more performance at the same scales through extensive software enhancements, such as the use of full-iteration CUDA Graphs and an optimized distributed optimizer for Stable Diffusion.<\/p>\n<h2>Graph Neural Network Training<\/h2>\n<p>NVIDIA set new records in GNN training as well. The company submitted results on 8, 64, and 512 H100 GPUs, reaching a record time of just 1.1 minutes at the largest scale. Eight H200 Tensor Core GPUs delivered a 47% boost over the H100 submission at the same scale.<\/p>\n<h2>Key Takeaways<\/h2>\n<p>NVIDIA continues to lead in AI training performance, showcasing the highest versatility and efficiency across a range of AI workloads. The company&#8217;s ongoing optimization of its software stack ensures more performance per GPU, reducing training costs and enabling the training of more demanding models.<\/p>\n<p>Looking ahead, the NVIDIA Blackwell platform, announced at GTC 2024, promises to democratize trillion-parameter AI, delivering up to 30x faster real-time trillion-parameter inference and up to 4x faster trillion-parameter training compared to NVIDIA Hopper GPUs.<\/p>\n<p>For more detailed information, visit the <a rel=\"nofollow\" href=\"https:\/\/developer.nvidia.com\/blog\/nvidia-sets-new-generative-ai-performance-and-scale-records-in-mlperf-training-v4-0\/\">NVIDIA Technical Blog<\/a>.<\/p>\n<p><span><i>Image source: Shutterstock<\/i><\/span><\/p>\n<p>. . 
.<\/p>\n<\/div>\n<p><a href=\"https:\/\/blockchain.news\/news\/nvidia-breaks-records-generative-ai-mlperf-training-v4-0\">Source link<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>NVIDIA has set new performance and scale records in the generative AI domain, according to a recent submission to<\/p>\n","protected":false},"author":1,"featured_media":231728,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"_uf_show_specific_survey":0,"_uf_disable_surveys":false,"footnotes":""},"categories":[171],"tags":[],"_links":{"self":[{"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/posts\/231727"}],"collection":[{"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/comments?post=231727"}],"version-history":[{"count":0,"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/posts\/231727\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/media\/231728"}],"wp:attachment":[{"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/media?parent=231727"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/michigan
digitalnews.com\/index.php\/wp-json\/wp\/v2\/categories?post=231727"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/tags?post=231727"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}