{"id":237540,"date":"2024-06-28T13:52:16","date_gmt":"2024-06-28T13:52:16","guid":{"rendered":"https:\/\/michigandigitalnews.com\/index.php\/2024\/06\/28\/amazon-investigating-perplexity-ai-after-accusations-it-scrapes-websites-without-consent\/"},"modified":"2025-06-25T17:15:54","modified_gmt":"2025-06-25T17:15:54","slug":"amazon-investigating-perplexity-ai-after-accusations-it-scrapes-websites-without-consent","status":"publish","type":"post","link":"https:\/\/michigandigitalnews.com\/index.php\/2024\/06\/28\/amazon-investigating-perplexity-ai-after-accusations-it-scrapes-websites-without-consent\/","title":{"rendered":"Amazon investigating Perplexity AI after accusations it scrapes websites without consent"},"content":{"rendered":"<div>\n<p><a data-i13n=\"cpos:1;pos:1\" href=\"https:\/\/www.engadget.com\/amazon-aws-stops-accepting-new-customers-russia-belarus-230051139.html\" data-ylk=\"slk:Amazon Web Services;cpos:1;pos:1;elm:context_link;itc:0;sec:content-canvas\" class=\"link \">Amazon Web Services<\/a> has started an investigation to determine whether Perplexity AI is breaking its rules, according to <a data-i13n=\"cpos:2;pos:1\" href=\"https:\/\/www.wired.com\/story\/aws-perplexity-bot-scraping-investigation\/\" rel=\"nofollow noopener\" target=\"_blank\" data-ylk=\"slk:Wired;cpos:2;pos:1;elm:context_link;itc:0;sec:content-canvas\" class=\"link \"><em>Wired<\/em><\/a>. To be precise, the company&#8217;s cloud division is looking into allegations that the service is using a crawler, hosted on its servers, that ignores the Robots Exclusion Protocol. This protocol is a web standard under which developers place a robots.txt file on a domain, with instructions on whether bots can or can&#8217;t access a particular page. 
Complying with those instructions is voluntary, but crawlers from reputable companies have generally respected them since web developers started implementing the standard in the &#8217;90s.<\/p>\n<p>In an earlier piece, <em>Wired<\/em> <a data-i13n=\"cpos:3;pos:1\" href=\"https:\/\/www.wired.com\/story\/perplexity-is-a-bullshit-machine\/\" rel=\"nofollow noopener\" target=\"_blank\" data-ylk=\"slk:reported;cpos:3;pos:1;elm:context_link;itc:0;sec:content-canvas\" class=\"link \">reported<\/a> that it discovered a virtual machine that was bypassing its website&#8217;s robots.txt instructions. That machine was hosted on an Amazon Web Services server using the IP address 44.221.181.252 that&#8217;s &#8220;certainly operated by Perplexity.&#8221; It reportedly visited other Cond\u00e9 Nast properties hundreds of times over the past three months to scrape their content as well. <em>The Guardian<\/em>, <em>Forbes<\/em> and <em>The New York Times<\/em> had also detected it visiting their publications multiple times, <em>Wired<\/em> said. To confirm whether Perplexity truly was scraping its content, <em>Wired<\/em> entered headlines or short descriptions of its articles into the company&#8217;s chatbot. The tool then responded with results that closely paraphrased its articles &#8220;with minimal attribution.&#8221;<\/p>\n<p>A recent <em>Reuters<\/em> report claimed that <a data-i13n=\"cpos:4;pos:1\" href=\"https:\/\/www.engadget.com\/ai-companies-are-reportedly-still-scraping-websites-despite-protocols-meant-to-block-them-132308524.html\" data-ylk=\"slk:Perplexity isn't the only AI company;cpos:4;pos:1;elm:context_link;itc:0;sec:content-canvas\" class=\"link \">Perplexity isn&#8217;t the only AI company<\/a> that&#8217;s bypassing robots.txt files to gather content used to train large language models. However, Amazon&#8217;s investigation seems to be focused on Perplexity AI only. 
An Amazon spokesperson told <em>Wired<\/em> that its customers have to comply with robots.txt instructions when crawling websites. &#8220;AWS\u2019s terms of service prohibit customers from using our services for any illegal activity, and our customers are responsible for complying with our terms and all applicable laws,&#8221; they said.<\/p>\n<p>Perplexity spokesperson Sara Platnick told <em>Wired<\/em> that the company has already responded to Amazon&#8217;s inquiries and denied that its crawlers are bypassing the Robots Exclusion Protocol. &#8220;Our PerplexityBot \u2014 which runs on AWS \u2014 respects robots.txt, and we confirmed that Perplexity-controlled services are not crawling in any way that violates AWS Terms of Service,&#8221; she said. Platnick admitted, however, that PerplexityBot will ignore robots.txt when a user includes a specific URL in their chatbot inquiry.<\/p>\n<p>Aravind Srinivas, the CEO of Perplexity, also previously denied that his company is &#8220;ignoring the Robot Exclusions Protocol and then lying about it.&#8221; Srinivas did admit to <a data-i13n=\"cpos:5;pos:1\" href=\"https:\/\/www.fastcompany.com\/91144894\/perplexity-ai-ceo-aravind-srinivas-on-plagiarism-accusations\" rel=\"nofollow noopener\" target=\"_blank\" data-ylk=\"slk:Fast Company;cpos:5;pos:1;elm:context_link;itc:0;sec:content-canvas\" class=\"link \"><em>Fast Company<\/em><\/a> that Perplexity uses third-party web crawlers on top of its own, and that the bot <em>Wired<\/em> identified was one of them.<\/p>\n<\/div>\n<p><a href=\"https:\/\/www.engadget.com\/amazon-investigating-perplexity-ai-after-accusations-it-scrapes-websites-without-consent-133003374.html?src=rss\">Source link<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Amazon Web Services has started an investigation to determine whether Perplexity AI is breaking its rules, according to Wired. 
To be precise, the company&#8217;s<\/p>\n","protected":false},"author":1,"featured_media":237541,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"_uf_show_specific_survey":0,"_uf_disable_surveys":false,"footnotes":""},"categories":[159],"tags":[],"_links":{"self":[{"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/posts\/237540"}],"collection":[{"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/comments?post=237540"}],"version-history":[{"count":0,"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/posts\/237540\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/media\/237541"}],"wp:attachment":[{"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/media?parent=237540"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/categories?post=237540"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/tags?post=237540"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}