{"id":221619,"date":"2024-04-08T13:02:19","date_gmt":"2024-04-08T13:02:19","guid":{"rendered":"https:\/\/michigandigitalnews.com\/index.php\/2024\/04\/08\/openai-and-google-accused-of-using-youtube-transcripts-for-ai\/"},"modified":"2025-06-25T17:19:05","modified_gmt":"2025-06-25T17:19:05","slug":"openai-and-google-accused-of-using-youtube-transcripts-for-ai","status":"publish","type":"post","link":"https:\/\/michigandigitalnews.com\/index.php\/2024\/04\/08\/openai-and-google-accused-of-using-youtube-transcripts-for-ai\/","title":{"rendered":"OpenAI and Google accused of using YouTube transcripts for AI"},"content":{"rendered":"<p><img decoding=\"async\" src=\"https:\/\/readwrite.com\/wp-content\/uploads\/2024\/04\/OpenAI-and-Google-accused-of-using-YouTube-transcripts-for-AI-900x600.png\" \/><\/p>\n<div>\n<p>OpenAI and Google have reportedly <a href=\"https:\/\/readwrite.com\/youtube-ceo-underlines-training-ai-models-on-its-videos-is-against-the-rules\/\">transcribed YouTube videos<\/a> to harvest text for their AI models, potentially violating creators\u2019 copyrights.<\/p>\n<p><a href=\"https:\/\/www.nytimes.com\/2024\/04\/06\/technology\/tech-giants-harvest-data-artificial-intelligence.html\" target=\"_blank\" rel=\"noopener\">According<\/a> to an investigation by The New York Times, OpenAI, Google and Meta allegedly cut corners to gather as much data as possible to train their AI models.<\/p>\n<p>OpenAI researchers reportedly created a speech recognition tool called <a href=\"https:\/\/readwrite.com\/openai-has-released-gpt-4-for-widespread-use\/\">Whisper<\/a>, which can transcribe the audio of YouTube videos. 
Such transcripts yield new conversational text that can make an AI system more capable.<\/p>\n<p>The investigation cites several sources who say that more than one million hours of YouTube videos were transcribed, despite internal discussions about how doing so could violate YouTube\u2019s rules. The transcripts were then fed into <a href=\"https:\/\/readwrite.com\/openai-ceo-sam-altman-says-gpt-4-kinda-sucks-hints-at-gpt-5-boost\/\">GPT-4<\/a>, the advanced AI model powering the most recent version of the ChatGPT chatbot. Google, YouTube\u2019s parent company, also reportedly transcribed videos to train its own AI models.<\/p>\n<p>OpenAI president Greg Brockman was personally involved in collecting the videos that were used, the Times writes.<\/p>\n<p>OpenAI\u2019s alleged use of YouTube videos could also breach Google\u2019s policies, which prohibit using YouTube content for \u201cindependent\u201d applications and accessing its videos by \u201cautomated means\u201d such as robots, botnets, or scrapers.<\/p>\n<h2>Are tech companies running out of training data?<\/h2>\n<p>The report also says that OpenAI had exhausted its supply of useful data by 2021 and, as a result, discussed transcribing podcasts, audiobooks and YouTube videos to train its next-generation model. 
By then, the company had reportedly mined the code repository GitHub and used up databases of chess moves, as well as data describing high school tests and homework assignments from the website Quizlet.<\/p>\n<p>The Times also reports that Google\u2019s legal department asked the company\u2019s privacy team to reword its policy to broaden what it could do with consumer data, including content from office tools like Google Docs.<\/p>\n<p>According to the Times, <a href=\"https:\/\/readwrite.com\/metas-responsible-ai-team-disbanded-for-generative-ai-focus\/\">Meta<\/a> is also facing a shortage of available training data. In recordings reviewed by the publication, its AI team was heard discussing the unauthorized use of copyrighted material in an effort to keep pace with OpenAI. Having exhausted \u201calmost every available English-language book, essay, poem and news article on the internet,\u201d the company reportedly considered options such as paying for book licenses or buying a major publishing house outright.<\/p>\n<p>Last week, YouTube CEO Neal Mohan said that using videos from the platform to train an AI model would be a \u201cclear violation\u201d of YouTube\u2019s terms and conditions. His comments followed OpenAI\u2019s CTO saying she \u201cdidn\u2019t know\u201d whether the company\u2019s video generator, Sora, had been trained on YouTube videos.<\/p>\n<p>Advanced AI systems built by OpenAI, Google, and others need vast amounts of data to learn. That demand is depleting the reservoir of high-quality public data on the internet, especially as some data owners restrict AI companies\u2019 access. 
The Wall Street Journal <a href=\"https:\/\/www.wsj.com\/tech\/ai\/ai-training-data-synthetic-openai-anthropic-9230f8d8\" target=\"_blank\" rel=\"noopener\">reports<\/a> that there is a 90 per cent chance that demand for high-quality data will outstrip supply by 2028.<\/p>\n<p>OpenAI, Google, and Meta have been approached for further comment.<\/p>\n<p><em>Featured image: Canva<\/em><\/p>\n<\/div>\n<p><a href=\"https:\/\/readwrite.com\/openai-and-google-accused-of-using-youtube-transcripts-for-ai\/\">Source link<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>OpenAI and Google have reportedly transcribed YouTube videos to harvest text for their AI models, potentially violating creators\u2019 copyrights. According to an investigation by<\/p>\n","protected":false},"author":1,"featured_media":221620,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"_uf_show_specific_survey":0,"_uf_disable_surveys":false,"footnotes":""},"categories":[152],"tags":[],"_links":{"self":[{"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/posts\/221619"}],"collection":[{"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/comments?post=221619"}],"version-history":[{"count":3,"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/posts\/221619\/revisions"}],"predecessor-version":[{"id":329912,"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/posts\/221619\/revisions\/329912"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/media\/221620"}],"wp:attachment":[{"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/media?parent=221619"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/categories?post=221619"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/tags?post=221619"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}