# NVIDIA Delves into RAPIDS cuVS IVF-PQ for Accelerated Vector Search

Zach Anderson · Jul 18, 2024 20:12

NVIDIA explores the RAPIDS cuVS IVF-PQ algorithm, enhancing vector search performance through compression and GPU acceleration.

In a detailed blog post, NVIDIA provides insights into its RAPIDS cuVS IVF-PQ algorithm, which accelerates vector search by leveraging GPU hardware and advanced
compression techniques. This is part one of a two-part series that continues NVIDIA's earlier exploration of the IVF-Flat algorithm.

## IVF-PQ Algorithm Introduction

The post introduces IVF-PQ (inverted file index with product quantization), an algorithm that improves search performance and reduces memory usage by storing database vectors in compressed form. The compression costs some accuracy, a trade-off examined further in the second part of the series.

IVF-PQ builds on IVF-Flat, which uses an inverted file index to restrict each search to a small subset of the data selected by clustering. Product quantization (PQ) adds a layer of compression by encoding the database vectors, making the approach practical for very large datasets.

## Performance Benchmarks

NVIDIA shared benchmarks on the DEEP dataset, which contains one billion 96-dimensional vectors, about 360 GiB of data. A typical IVF-PQ configuration compresses this into a 54 GiB index without significantly impacting search performance, or into one as small as 24 GiB with a slight slowdown. Either way, the compressed index fits into GPU memory.

Comparisons with HNSW, a popular CPU algorithm, on a 100-million-vector subset of DEEP show that cuVS IVF-PQ can significantly accelerate both index building and vector search.

## Algorithm Overview

IVF-PQ follows a two-step process: a coarse search followed by a fine search. The coarse search is identical to IVF-Flat's; the fine search computes distances between the query and the vectors in the probed clusters, but with those vectors stored in compressed format.

The compression is achieved through PQ, which approximates each vector using two-level quantization.
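The fine-search idea can be sketched on the CPU with NumPy. This is an illustrative sketch only, not the cuVS implementation: the subspace split, the 16-entry codebooks, and the toy k-means below are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: 1,000 vectors of dimension 8, split into 4 subspaces of 2 dims,
# each subspace quantized to one of 16 centroids (a 4-bit code per subspace).
n, dim, n_sub, n_codes = 1000, 8, 4, 16
sub_dim = dim // n_sub
data = rng.standard_normal((n, dim)).astype(np.float32)

# Train one codebook per subspace with a few rounds of plain k-means
# (cuVS trains its codebooks on the GPU; this only stands in for that step).
codebooks = np.empty((n_sub, n_codes, sub_dim), np.float32)
codes = np.empty((n, n_sub), np.uint8)
for s in range(n_sub):
    sub = data[:, s * sub_dim:(s + 1) * sub_dim]
    centers = sub[rng.choice(n, n_codes, replace=False)].copy()
    for _ in range(10):
        assign = np.argmin(((sub[:, None] - centers) ** 2).sum(-1), axis=1)
        for c in range(n_codes):
            if np.any(assign == c):
                centers[c] = sub[assign == c].mean(0)
    codebooks[s] = centers
    codes[:, s] = np.argmin(((sub[:, None] - centers) ** 2).sum(-1), axis=1)

# Fine search: build a lookup table (LUT) of squared distances from each query
# subvector to every centroid, then score a compressed vector by summing its
# LUT entries -- the database vectors are never decompressed.
query = rng.standard_normal(dim).astype(np.float32)
lut = np.stack([((query[s * sub_dim:(s + 1) * sub_dim] - codebooks[s]) ** 2).sum(-1)
                for s in range(n_sub)])            # shape (n_sub, n_codes)
approx = lut[np.arange(n_sub), codes].sum(axis=1)  # one approximate distance per vector
exact = ((data - query) ** 2).sum(axis=1)

# The PQ winner should rank among the closest true neighbours.
print(np.argsort(exact).tolist().index(int(np.argmin(approx))))
```

Note that the sketch skips the IVF layer entirely: in the real algorithm the coarse search first selects clusters to probe, and the PQ codes encode each vector relative to its cluster, which is what "two-level quantization" refers to above.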
This allows IVF-PQ to fit more data into GPU memory, improving memory-bandwidth utilization and speeding up the search.

## Optimizations and Performance

NVIDIA implemented several optimizations in cuVS so that IVF-PQ runs efficiently on GPUs:

- Fusing operations to reduce output size and make better use of memory bandwidth.
- Storing the lookup table (LUT) in GPU shared memory, when it fits, for faster access.
- Using a custom 8-bit floating-point data type in the LUT for faster data conversion.
- Aligning data in 16-byte chunks to optimize data transfers.
- Applying an "early stop" check to avoid unnecessary distance computations.

On a 100-million-vector dataset, NVIDIA's benchmarks show IVF-PQ outperforming IVF-Flat, particularly at larger batch sizes, achieving up to 3-4 times the number of queries per second.

## Conclusion

IVF-PQ is a robust approximate nearest neighbor (ANN) search algorithm that combines clustering and compression to raise search performance and throughput. The first part of NVIDIA's blog series gives a comprehensive overview of how the algorithm works and why it suits GPU platforms.
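The compression figures quoted earlier can be sanity-checked with simple arithmetic. The exact cuVS configuration behind the 54 GiB and 24 GiB indexes is not stated in the article, so the `pq_dim`, `pq_bits`, and per-row overhead below are assumptions chosen only to show how the per-vector cost scales:

```python
# Back-of-the-envelope sizing for a DEEP-like dataset:
# one billion 96-dimensional float32 vectors.
n_vectors = 1_000_000_000
dim = 96

raw_gib = n_vectors * dim * 4 / 2**30
print(f"raw float32: {raw_gib:.0f} GiB")   # 358 GiB, i.e. the ~360 GiB above

def pq_gib(pq_dim, pq_bits=8, row_overhead_bytes=8):
    """Index size if each vector stores pq_dim codes of pq_bits each,
    plus an assumed fixed per-row overhead (e.g. the source vector id)."""
    bytes_per_vector = pq_dim * pq_bits / 8 + row_overhead_bytes
    return n_vectors * bytes_per_vector / 2**30

for pq_dim in (48, 16):
    print(f"pq_dim={pq_dim}: {pq_gib(pq_dim):.0f} GiB")
# pq_dim=48 -> 52 GiB, pq_dim=16 -> 22 GiB: the same ballpark as the
# 54 GiB and 24 GiB figures, before cluster centers and codebooks are added.
```

The point is that index size is dominated by `pq_dim * pq_bits` per vector, which is why shrinking the code width trades a smaller index for the slight accuracy and speed cost described above.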
For more detailed performance-tuning recommendations, NVIDIA encourages readers to explore the second part of the series.

For more information, visit the [NVIDIA Technical Blog](https://developer.nvidia.com/blog/accelerating-vector-search-rapids-cuvs-ivf-pq-deep-dive-part-1/).

*Image source: Shutterstock*

[Source link](https://blockchain.news/news/nvidia-rapids-cuvs-ivf-pq-vector-search)