{"id":247368,"date":"2024-07-24T21:33:37","date_gmt":"2024-07-24T21:33:37","guid":{"rendered":"https:\/\/michigandigitalnews.com\/index.php\/2024\/07\/24\/ai-search-engines-that-dont-pay-up-cant-index-reddit-content\/"},"modified":"2025-06-25T17:14:01","modified_gmt":"2025-06-25T17:14:01","slug":"ai-search-engines-that-dont-pay-up-cant-index-reddit-content","status":"publish","type":"post","link":"https:\/\/michigandigitalnews.com\/index.php\/2024\/07\/24\/ai-search-engines-that-dont-pay-up-cant-index-reddit-content\/","title":{"rendered":"AI search engines that don\u2019t pay up can\u2019t index Reddit content"},"content":{"rendered":"<div>\n<p>When Reddit said last month that it would block unauthorized data scraping from its site, everyone\u2019s (rightful) first reaction was \u201cAI, AI, AI.\u201d However, now that the change has taken effect, chatbot makers may not be the only ones being locked out. The widely used forum also appears to be blocking major search engines other than Brave and Google, the latter of which reportedly inked a deal earlier this year with Reddit <a data-i13n=\"cpos:1;pos:1\" href=\"https:\/\/www.engadget.com\/reddit-is-licensing-its-content-to-google-to-help-train-its-ai-models-200013007.html\" data-ylk=\"slk:worth $60 million annually.;cpos:1;pos:1;elm:context_link;itc:0;sec:content-canvas\" class=\"link \">worth $60 million annually.<\/a> However, a Reddit spokesperson told Engadget that the empty search results are about Google\u2019s rivals not agreeing to the company\u2019s requirements for AI training. 
It says it\u2019s in discussions with several of them.<\/p>\n<p><em>404 Media<\/em> <a data-i13n=\"elm:context_link;elmt:doNotAffiliate;cpos:2;pos:1\" class=\"link \" href=\"https:\/\/www.404media.co\/google-is-the-only-search-engine-that-works-on-reddit-now-thanks-to-ai-deal\/\" rel=\"nofollow noopener\" target=\"_blank\" data-ylk=\"slk:reported on Wednesday;elm:context_link;elmt:doNotAffiliate;cpos:2;pos:1;itc:0;sec:content-canvas\">reported on Wednesday<\/a> (and Engadget confirmed in our queries) that searching for Reddit results from the past week on rival engine Bing (using \u201csite:reddit.com\u201d) returns empty results. The publication reported that DuckDuckGo produced seven links without any descriptions, only providing the note, \u201cWe would like to show you a description here but the site won\u2019t allow us.\u201d The engine now appears to have removed even those, as our test only produced an empty page, reading, \u201cno results found.\u201d<\/p>\n<p>Reddit <a data-i13n=\"elm:context_link;elmt:doNotAffiliate;cpos:3;pos:1\" class=\"link \" href=\"https:\/\/www.reuters.com\/technology\/reddit-update-web-standard-block-automated-website-scraping-2024-06-25\/\" rel=\"nofollow noopener\" target=\"_blank\" data-ylk=\"slk:said last month;elm:context_link;elmt:doNotAffiliate;cpos:3;pos:1;itc:0;sec:content-canvas\">said last month<\/a> that it would update its Robots Exclusion Protocol (robots.txt) file to block automated data scraping, and it\u2019s now apparent that the change wasn\u2019t only meant to thwart AI companies like Perplexity and its controversial \u201canswer engine.\u201d Currently, Google appears to be the only search engine allowed to crawl Reddit and produce results from \u201cthe front page of the internet.\u201d<\/p>\n<p>A Reddit spokesperson told Engadget on Wednesday it isn\u2019t accurate to say the missing search results are a result of its Google deal. 
\u201cWe block all crawlers that are unwilling to commit to not using crawl data for AI training, which is in line with enforcing our Public Content Policy and updated robots.txt file,\u201d the company said. \u201cAnyone accessing Reddit content must abide by our policies, including those in place to protect redditors. We are selective about who we work with and trust with large-scale access to Reddit content.\u201d<\/p>\n<p>Meanwhile, a source familiar with Reddit\u2019s thinking told Engadget on Wednesday that Bing\u2019s omission is due to Microsoft refusing to agree to Reddit\u2019s terms regarding AI crawling. Instead, the Bing maker allegedly claimed its standard web controls were sufficient. The source claims Microsoft\u2019s stance conflicts with Reddit\u2019s data privacy policy, leading to the impasse and empty search results.<\/p>\n<p>The ubiquitous robots.txt is the web standard that communicates which parts of a site can be crawled. Although many crawlers are known to ignore its instructions, Google\u2019s standard procedure is to respect it. So, on the technical side, the companies in cahoots on the lucrative deal appear to have deployed some manual override.<\/p>\n<p>The saga could be seen as a trickle-down effect of <a data-i13n=\"cpos:4;pos:1\" href=\"https:\/\/www.engadget.com\/ai-companies-are-reportedly-still-scraping-websites-despite-protocols-meant-to-block-them-132308524.html\" data-ylk=\"slk:AI chatbots scraping the live web for results;cpos:4;pos:1;elm:context_link;itc:0;sec:content-canvas\" class=\"link \">AI chatbots scraping the live web for results<\/a>. 
With courts slow to determine <a data-i13n=\"cpos:5;pos:1\" href=\"https:\/\/www.engadget.com\/openai-admits-its-impossible-to-train-generative-ai-without-copyrighted-materials-103311496.html\" data-ylk=\"slk:how much of the open web is fair use to train chatbots on;cpos:5;pos:1;elm:context_link;itc:0;sec:content-canvas\" class=\"link \">how much of the open web is fair use to train chatbots on<\/a>, companies like Reddit, whose bottom lines now depend on safeguarding their data from those who don\u2019t pay, are building walls at the expense of the open web. (Although, given the integral role Microsoft has played in this AI era, <a data-i13n=\"cpos:6;pos:1\" href=\"https:\/\/www.engadget.com\/microsofts-openai-partnership-was-born-from-google-envy-202143989.html\" data-ylk=\"slk:cozying up with OpenAI;cpos:6;pos:1;elm:context_link;itc:0;sec:content-canvas\" class=\"link \">cozying up with OpenAI<\/a> early on, it seems ironic that Bing finds itself on the losing end of at least one aspect of the fallout.)<\/p>\n<p>Colin Hayhurst, CEO of lesser-known \u201cno-tracking\u201d search engine Mojeek, told <em>404 Media<\/em> that Reddit is \u201ckilling everything for search but Google.\u201d In addition, the executive said his attempts to contact Reddit were ignored. \u201cIt\u2019s never happened to us before,\u201d he said. \u201cBecause this happens to us, we get blocked, usually because of ignorance or stupidity or whatever, and when we contact the site you certainly can get that resolved, but we\u2019ve never had no reply from anybody before.\u201d<\/p>\n<p>Reddit has made no secret of its desire to block AI companies from scraping its treasure trove of data in this burgeoning age of AI. 
Last year, CEO Steve Huffman risked alienating large portions of its user base by <a data-i13n=\"cpos:7;pos:1\" href=\"https:\/\/www.engadget.com\/reddit-app-developer-says-the-sites-new-api-rules-will-cost-him-20-million-a-year-203911487.html\" data-ylk=\"slk:blocking third-party API requests;cpos:7;pos:1;elm:context_link;itc:0;sec:content-canvas\" class=\"link \">blocking third-party API requests<\/a>, leading to the <a data-i13n=\"cpos:8;pos:1\" href=\"https:\/\/www.engadget.com\/apollo-and-other-popular-third-party-reddit-apps-have-shut-down-123149140.html\" data-ylk=\"slk:demise;cpos:8;pos:1;elm:context_link;itc:0;sec:content-canvas\" class=\"link \">demise<\/a> of beloved apps like <a data-i13n=\"cpos:9;pos:1\" href=\"https:\/\/www.engadget.com\/reddit-app-developer-says-the-sites-new-api-rules-will-cost-him-20-million-a-year-203911487.html\" data-ylk=\"slk:Christian Selig\u2019s Apollo;cpos:9;pos:1;elm:context_link;itc:0;sec:content-canvas\" class=\"link \">Christian Selig\u2019s Apollo<\/a>. Despite <a data-i13n=\"cpos:10;pos:1\" href=\"https:\/\/www.engadget.com\/reddit-sees-more-than-6000-communities-go-dark-in-protest-over-api-changes-095311637.html\" data-ylk=\"slk:widespread protests among moderators and forum-goers;cpos:10;pos:1;elm:context_link;itc:0;sec:content-canvas\" class=\"link \">widespread protests among moderators and forum-goers<\/a>, the company only temporarily lost negligible numbers of users.<\/p>\n<p>The gamble appeared to pay off, and Reddit recovered. 
It <a data-i13n=\"cpos:11;pos:1\" href=\"https:\/\/www.engadget.com\/reddit-is-now-a-publicly-traded-company-144455403.html\" data-ylk=\"slk:went public in March;cpos:11;pos:1;elm:context_link;itc:0;sec:content-canvas\" class=\"link \">went public in March<\/a>.<\/p>\n<p><strong>Update, July 24, 2024, 5:00 PM ET<\/strong>: This story has been updated to add statements from Reddit and additional context from sources familiar with the company\u2019s thinking.<\/p>\n<\/div>\n<p><a href=\"https:\/\/www.engadget.com\/search-engines-that-dont-pay-up-cant-index-reddit-content-172949170.html?src=rss\">Source link <\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>When Reddit said last month that it would block unauthorized data scraping from its site, everyone\u2019s (rightful) first reaction was \u201cAI, AI, AI.\u201d However,<\/p>\n","protected":false},"author":1,"featured_media":247369,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"_uf_show_specific_survey":0,"_uf_disable_surveys":false,"footnotes":""},"categories":[159],"tags":[],"_links":{"self":[{"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/posts\/247368"}],"collection":[{"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/comments?post=247368"}],"version-history":[{"count":0,"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/posts\/247368\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"h
ref":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/media\/247369"}],"wp:attachment":[{"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/media?parent=247368"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/categories?post=247368"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/tags?post=247368"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}