{"id":251426,"date":"2024-08-05T21:41:23","date_gmt":"2024-08-05T21:41:23","guid":{"rendered":"https:\/\/michigandigitalnews.com\/index.php\/2024\/08\/05\/nvidias-ai-team-reportedly-scraped-youtube-netflix-videos-without-permission\/"},"modified":"2025-06-25T17:13:07","modified_gmt":"2025-06-25T17:13:07","slug":"nvidias-ai-team-reportedly-scraped-youtube-netflix-videos-without-permission","status":"publish","type":"post","link":"https:\/\/michigandigitalnews.com\/index.php\/2024\/08\/05\/nvidias-ai-team-reportedly-scraped-youtube-netflix-videos-without-permission\/","title":{"rendered":"NVIDIA\u2019s AI team reportedly scraped YouTube, Netflix videos without permission"},"content":{"rendered":"<div>\n<p>In the latest example of a <a data-i13n=\"cpos:1;pos:1\" href=\"https:\/\/www.engadget.com\/ai-video-startup-runway-reportedly-trained-on-thousands-of-youtube-videos-without-permission-182314160.html\" data-ylk=\"slk:troubling industry pattern;cpos:1;pos:1;elm:context_link;itc:0;sec:content-canvas\" class=\"link \">troubling industry pattern<\/a>, NVIDIA appears to have scraped troves of copyrighted content for AI training. <em>On Monday, 404 Media\u2019s Samantha Cole reported<\/em> that the $2.4 trillion company asked workers to download videos from YouTube, Netflix and other datasets to develop commercial AI projects. 
The graphics card maker is among the tech companies appearing to have adopted a \u201cmove fast and break things\u201d ethos as they race to establish dominance in this feverish, <a data-i13n=\"cpos:2;pos:1\" href=\"https:\/\/www.engadget.com\/ai\/ai-startup-argues-scraping-every-song-on-the-internet-is-fair-use-233132459.html\" data-ylk=\"slk:too-often-shameful;cpos:2;pos:1;elm:context_link;itc:0;sec:content-canvas\" class=\"link \">too-often-shameful<\/a> AI gold rush.<\/p>\n<p>The training was reportedly to develop models for products like its Omniverse 3D world generator, self-driving car systems and \u201cdigital human\u201d efforts.<\/p>\n<p>NVIDIA defended its practice in an email to Engadget. A company spokesperson said its research is \u201cin full compliance with the letter and the spirit of copyright law\u201d while claiming IP laws protect specific expressions \u201cbut not facts, ideas, data, or information.\u201d The company equated the practice to a person\u2019s right to \u201clearn facts, ideas, data, or information from another source and use it to make their own expression.\u201d Human, computer\u2026 what\u2019s the difference?<\/p>\n<p>YouTube doesn\u2019t appear to agree. Spokesperson Jack Malon pointed us to a <a data-i13n=\"elm:context_link;elmt:doNotAffiliate;cpos:3;pos:1\" class=\"link \" href=\"https:\/\/www.bloomberg.com\/news\/articles\/2024-04-04\/youtube-says-openai-training-sora-with-its-videos-would-break-the-rules\" rel=\"nofollow noopener\" target=\"_blank\" data-ylk=\"slk:Bloomberg story;elm:context_link;elmt:doNotAffiliate;cpos:3;pos:1;itc:0;sec:content-canvas\"><em>Bloomberg story<\/em><\/a> from April, quoting CEO Neal Mohan saying using YouTube to train AI models would be a \u201cclear violation\u201d of its terms. 
\u201cOur previous comment still stands,\u201d the YouTube policy communications manager wrote to Engadget.<\/p>\n<p>That quote from Mohan in April was in response to reports that <a data-i13n=\"cpos:4;pos:1\" href=\"https:\/\/www.engadget.com\/youtube-ceo-warns-openai-that-training-models-on-its-videos-is-against-the-rules-121547513.html\" data-ylk=\"slk:OpenAI trained its Sora text-to-video generator on YouTube videos;cpos:4;pos:1;elm:context_link;itc:0;sec:content-canvas\" class=\"link \">OpenAI trained its Sora text-to-video generator on YouTube videos<\/a> without permission. Last month, a report showed that <a data-i13n=\"cpos:5;pos:1\" href=\"https:\/\/www.engadget.com\/ai-video-startup-runway-reportedly-trained-on-thousands-of-youtube-videos-without-permission-182314160.html\" data-ylk=\"slk:the startup Runway AI followed suit;cpos:5;pos:1;elm:context_link;itc:0;sec:content-canvas\" class=\"link \">the startup Runway AI followed suit<\/a>.<\/p>\n<p>NVIDIA employees who raised ethical and legal concerns about the practice were reportedly told by their managers that it had already been green-lit by the company&#8217;s highest levels. \u201cThis is an executive decision,\u201d Ming-Yu Liu, vice president of research at NVIDIA, replied. \u201cWe have an umbrella approval for all of the data.\u201d Others at the company allegedly described its scraping as an \u201copen legal issue\u201d they\u2019d tackle down the road.<\/p>\n<p>It all sounds similar to Facebook\u2019s (Meta\u2019s) old <a data-i13n=\"cpos:6;pos:1\" href=\"https:\/\/www.engadget.com\/2018-04-12-facebook-has-no-quick-solutions.html\" data-ylk=\"slk:\u201cmove fast and break things\u201d motto;cpos:6;pos:1;elm:context_link;itc:0;sec:content-canvas\" class=\"link \">\u201cmove fast and break things\u201d motto<\/a>, which has succeeded admirably at breaking quite a few things. 
That included <a data-i13n=\"cpos:7;pos:1\" href=\"https:\/\/www.engadget.com\/2018-03-19-facebook-and-cambridge-analytica-nightmare.html\" data-ylk=\"slk:the privacy of millions of people;cpos:7;pos:1;elm:context_link;itc:0;sec:content-canvas\" class=\"link \">the privacy of millions of people<\/a>.<\/p>\n<p>In addition to the YouTube and Netflix videos, NVIDIA reportedly instructed workers to train on movie trailer database MovieNet, internal libraries of video game footage and GitHub video datasets WebVid (now taken down after a cease-and-desist) and InternVid-10M. The latter is a dataset containing 10 million YouTube video IDs.<\/p>\n<p>Some of the data NVIDIA allegedly trained on was marked as eligible only for academic (or otherwise non-commercial) use. HD-VG-130M, a library of 130 million YouTube videos, includes a usage license specifying that it\u2019s only meant for academic research. NVIDIA reportedly brushed aside concerns about academic-only terms, insisting their batches were fair game for its commercial AI products.<\/p>\n<p>To evade detection by YouTube and avoid bans, NVIDIA reportedly downloaded content using virtual machines (VMs) with rotating IP addresses. 
In response to a worker\u2019s suggestion to use a third-party IP address-rotating tool, another NVIDIA employee reportedly wrote, \u201cWe are on Amazon Web Services and restarting a virtual machine instance gives a new public IP. So, that\u2019s not a problem so far.\u201d<\/p>\n<p><em>404 Media<\/em>\u2019s full report on NVIDIA\u2019s practices is <a data-i13n=\"elm:context_link;elmt:doNotAffiliate;cpos:8;pos:1\" class=\"link \" href=\"https:\/\/www.404media.co\/nvidia-ai-scraping-foundational-model-cosmos-project\/\" rel=\"nofollow noopener\" target=\"_blank\" data-ylk=\"slk:worth a read;elm:context_link;elmt:doNotAffiliate;cpos:8;pos:1;itc:0;sec:content-canvas\">worth a read<\/a>.<\/p>\n<\/div>\n<p><a href=\"https:\/\/www.engadget.com\/ai\/nvidias-ai-team-reportedly-scraped-youtube-netflix-videos-without-permission-204942022.html?src=rss\">Source link<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>In the latest example of a troubling industry pattern, NVIDIA appears to have scraped troves of copyrighted content for AI training. 
On Monday, 404<\/p>\n","protected":false},"author":1,"featured_media":251427,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_uf_show_specific_survey":0,"_uf_disable_surveys":false,"footnotes":""},"categories":[159],"tags":[],"_links":{"self":[{"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/posts\/251426"}],"collection":[{"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/comments?post=251426"}],"version-history":[{"count":0,"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/posts\/251426\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/media\/251427"}],"wp:attachment":[{"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/media?parent=251426"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/categories?post=251426"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/tags?post=251426"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}