{"id":236521,"date":"2024-06-26T05:23:17","date_gmt":"2024-06-26T05:23:17","guid":{"rendered":"https:\/\/michigandigitalnews.com\/index.php\/2024\/06\/26\/reddit-puts-ai-scrapers-on-notice\/"},"modified":"2025-06-25T17:16:09","modified_gmt":"2025-06-25T17:16:09","slug":"reddit-puts-ai-scrapers-on-notice","status":"publish","type":"post","link":"https:\/\/michigandigitalnews.com\/index.php\/2024\/06\/26\/reddit-puts-ai-scrapers-on-notice\/","title":{"rendered":"Reddit puts AI scrapers on notice"},"content":{"rendered":"<div>\n<p>Reddit has a warning for AI companies and other scrapers: play by our rules or get blocked. The company said in <a data-i13n=\"cpos:1;pos:1\" href=\"https:\/\/www.redditinc.com\/blog\/robot-txt-update\" rel=\"nofollow noopener\" target=\"_blank\" data-ylk=\"slk:an update;cpos:1;pos:1;elm:context_link;itc:0;sec:content-canvas\" class=\"link \">an update<\/a> that it plans to update its Robots Exclusion Protocol (robots.txt file), which tells automated crawlers which parts of its platform they may not access.<\/p>\n<p>The company said it will also continue to block and rate-limit crawlers and other bots that don\u2019t have a prior agreement with the company. The changes, it said, shouldn\u2019t affect \u201cgood faith actors,\u201d like the Internet Archive and researchers.<\/p>\n<p>Reddit\u2019s notice comes shortly after multiple reports that Perplexity and other AI companies regularly <a data-i13n=\"cpos:2;pos:1\" href=\"https:\/\/www.engadget.com\/ai-companies-are-reportedly-still-scraping-websites-despite-protocols-meant-to-block-them-132308524.html\" data-ylk=\"slk:bypass;cpos:2;pos:1;elm:context_link;itc:0;sec:content-canvas\" class=\"link \">bypass<\/a> websites\u2019 robots.txt protocol, which is used by publishers to tell web crawlers they don\u2019t want their content accessed. 
Perplexity\u2019s CEO, in a recent <a data-i13n=\"cpos:3;pos:1\" href=\"https:\/\/www.fastcompany.com\/91144894\/perplexity-ai-ceo-aravind-srinivas-on-plagiarism-accusations\" rel=\"nofollow noopener\" target=\"_blank\" data-ylk=\"slk:interview;cpos:3;pos:1;elm:context_link;itc:0;sec:content-canvas\" class=\"link \">interview<\/a> with <em>Fast Company<\/em>, said that the protocol is \u201cnot a legal framework.\u201d<\/p>\n<p>In a statement, a Reddit spokesperson told Engadget that the company wasn\u2019t targeting a particular company. \u201cThis update isn\u2019t meant to single any one entity out; it\u2019s meant to protect Reddit while keeping the internet open,\u201d the spokesperson said. \u201cIn the next few weeks, we\u2019ll be updating our robots.txt instructions to be as clear as possible: if you are using an automated agent to access Reddit, regardless of what type of company you are, you need to abide by our terms and policies, and you need to talk to us. We believe in the open internet, but we do not believe in the misuse of public content.\u201d<\/p>\n<p>It\u2019s not the first time the company has taken a hard line when it comes to data access. The company cited AI companies\u2019 use of its platform when it began charging for <a data-i13n=\"cpos:4;pos:1\" href=\"https:\/\/www.engadget.com\/reddit-will-charge-companies-for-api-access-citing-ai-training-concerns-184935783.html\" data-ylk=\"slk:its API;cpos:4;pos:1;elm:context_link;itc:0;sec:content-canvas\" class=\"link \">its API<\/a> last year. 
Since then, it has struck licensing deals with some AI companies, including <a data-i13n=\"cpos:5;pos:1\" href=\"https:\/\/www.engadget.com\/reddit-is-licensing-its-content-to-google-to-help-train-its-ai-models-200013007.html\" data-ylk=\"slk:Google;cpos:5;pos:1;elm:context_link;itc:0;sec:content-canvas\" class=\"link \">Google<\/a> and <a data-i13n=\"cpos:6;pos:1\" href=\"https:\/\/www.engadget.com\/openai-strikes-deal-to-put-reddit-posts-in-chatgpt-224133045.html\" data-ylk=\"slk:OpenAI;cpos:6;pos:1;elm:context_link;itc:0;sec:content-canvas\" class=\"link \">OpenAI<\/a>. The agreements allow AI firms to train their models on Reddit\u2019s archive and have been a significant source of revenue for the newly public Reddit. The \u201ctalk to us\u201d part of that statement is likely a not-so-subtle reminder that the company is no longer in the business of handing out its content for free.<\/p>\n<\/div>\n<p><a href=\"https:\/\/www.engadget.com\/reddit-puts-ai-scrapers-on-notice-205734539.html?src=rss\">Source link<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Reddit has a warning for AI companies and other scrapers: play by our rules or get blocked. 
The company said in an update that it plans<\/p>\n","protected":false},"author":1,"featured_media":236522,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"_uf_show_specific_survey":0,"_uf_disable_surveys":false,"footnotes":""},"categories":[159],"tags":[],"_links":{"self":[{"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/posts\/236521"}],"collection":[{"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/comments?post=236521"}],"version-history":[{"count":0,"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/posts\/236521\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/media\/236522"}],"wp:attachment":[{"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/media?parent=236521"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/categories?post=236521"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/tags?post=236521"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}