{"id":208599,"date":"2024-02-28T03:49:32","date_gmt":"2024-02-28T03:49:32","guid":{"rendered":"https:\/\/michigandigitalnews.com\/index.php\/2024\/02\/28\/tumblr-and-wordpress-posts-will-reportedly-be-used-for-openai-and-midjourney-training\/"},"modified":"2025-06-25T17:21:30","modified_gmt":"2025-06-25T17:21:30","slug":"tumblr-and-wordpress-posts-will-reportedly-be-used-for-openai-and-midjourney-training","status":"publish","type":"post","link":"https:\/\/michigandigitalnews.com\/index.php\/2024\/02\/28\/tumblr-and-wordpress-posts-will-reportedly-be-used-for-openai-and-midjourney-training\/","title":{"rendered":"Tumblr and WordPress posts will reportedly be used for OpenAI and Midjourney training"},"content":{"rendered":"<div>\n<p>Tumblr and WordPress are reportedly set to strike deals to sell user data to artificial intelligence companies OpenAI and Midjourney. <em>404 Media<\/em> <a data-i13n=\"elm:context_link;elmt:doNotAffiliate;cpos:1;pos:1\" class=\"link \" href=\"https:\/\/www.404media.co\/tumblr-and-wordpress-to-sell-users-data-to-train-ai-tools\/\" rel=\"nofollow noopener\" target=\"_blank\" data-ylk=\"slk:reports;elm:context_link;elmt:doNotAffiliate;cpos:1;pos:1;itc:0;sec:content-canvas\">reports<\/a> that the platforms\u2019 parent company, Automattic, is nearing completion of an agreement to provide data to help train the AI companies\u2019 models.<\/p>\n<p>It isn\u2019t clear which data will be included, but the report suggests Automattic may have overreached initially. An alleged internal post from Tumblr product manager Cyle Gage suggests Automattic prepared to send private or partner-related data that wasn\u2019t supposed to be included in the deal. 
The questionable content reportedly included private posts on public blogs, posts on deleted or suspended blogs, unanswered (therefore, not publicly posted) questions, private answers, posts marked explicit and content from premium partner blogs (like Apple\u2019s former music site).<\/p>\n<p>The internal post suggests Automattic\u2019s engineers are preparing a list of post IDs that should have been excluded. It isn\u2019t clear whether the data had already been sent to the AI companies.<\/p>\n<p>Engadget emailed Automattic to ask for comment on the report. The company replied with a <a data-i13n=\"elm:context_link;elmt:doNotAffiliate;cpos:2;pos:1\" class=\"link \" href=\"https:\/\/automattic.com\/2024\/02\/27\/protecting-user-choice\/\" rel=\"nofollow noopener\" target=\"_blank\" data-ylk=\"slk:published statement;elm:context_link;elmt:doNotAffiliate;cpos:2;pos:1;itc:0;sec:content-canvas\">published statement<\/a>, claiming, \u201cWe will share only public content that\u2019s hosted on WordPress.com and Tumblr from sites that haven\u2019t opted out.\u201d The statement notes that legal regulations don\u2019t currently require AI companies\u2019 web crawlers to abide by users\u2019 opt-out preferences.<\/p>\n<p>The final line of Automattic\u2019s statement appears to align with the reported deals. \u201cWe are also working directly with select AI companies as long as their plans align with what our community cares about: attribution, opt-outs, and control,\u201d Automattic wrote. \u201cOur partnerships will respect all opt-out settings. 
We also plan to take that a step further and regularly update any partners about people who newly opt out and ask that their content be removed from past sources and future training.\u201d<\/p>\n<figure class=\"caas-figure\">\n<div class=\"caas-figure-with-pb\" style=\"max-height: 640px\">\n<div>\n<div class=\"caas-img-container caas-img-loader\" style=\"padding-bottom:67%\"><img decoding=\"async\" class=\"caas-img caas-lazy has-preview\" alt=\"NEW YORK, NEW YORK - DECEMBER 12: Sam Altman speaks onstage during A Year In TIME at The Plaza Hotel on December 12, 2023 in New York City. (Photo by Mike Coppola\/Getty Images for TIME)\" src=\"https:\/\/s.yimg.com\/ny\/api\/res\/1.2\/vlcLlCO7Ysf.6n1ylpIyfw--\/YXBwaWQ9aGlnaGxhbmRlcjt3PTk2MDtoPTY0MA--\/https:\/\/s.yimg.com\/os\/creatr-uploaded-images\/2024-01\/517ef050-c034-11ee-bef5-bfb61e38e9dd\"\/><noscript><img decoding=\"async\" alt=\"NEW YORK, NEW YORK - DECEMBER 12: Sam Altman speaks onstage during A Year In TIME at The Plaza Hotel on December 12, 2023 in New York City. (Photo by Mike Coppola\/Getty Images for TIME)\" src=\"https:\/\/s.yimg.com\/ny\/api\/res\/1.2\/vlcLlCO7Ysf.6n1ylpIyfw--\/YXBwaWQ9aGlnaGxhbmRlcjt3PTk2MDtoPTY0MA--\/https:\/\/s.yimg.com\/os\/creatr-uploaded-images\/2024-01\/517ef050-c034-11ee-bef5-bfb61e38e9dd\" class=\"caas-img\"\/><\/noscript><\/div>\n<\/div>\n<\/div>\n<p><figcaption class=\"caption-collapse\"><em>OpenAI CEO Sam Altman<\/em><span class=\"caption-credit\"> (Mike Coppola via Getty Images)<\/span><\/figcaption><\/p>\n<\/figure>\n<p>The company reportedly plans to launch a new opt-out tool on Wednesday that claims to allow users to block third parties \u2014 including AI companies \u2014 from training on their data. <em>404 Media<\/em> reviewed an alleged internal FAQ Automattic prepared for the tool, which includes the answer, \u201cIf you opt out from the start, we will block crawlers from accessing your content by adding your site on a disallowed list. 
If you change your mind later, we also plan to update any partners about people who newly opt-out and ask that their content be removed from past sources and future training.\u201d<\/p>\n<p>The wording may matter: Automattic describes itself only as \u201casking\u201d the AI companies to remove the data.<\/p>\n<p>An alleged internal document from Automattic\u2019s AI head, Andrew Spittle, replying to a staff question about data-removal assurances when using the tool, explains, \u201cWe will notify existing partners on a regular basis about anyone who\u2019s opted out since the last time we provided a list. I want this to be an ongoing process where we regularly advocate for past content to be excluded based on current preferences. We will ask that content be deleted and removed from any future training runs. I believe partners will honor this based on our conversations with them to this point. I don\u2019t think they gain much overall by retaining it.\u201d<\/p>\n<p>So, if a Tumblr or WordPress user requests to opt out of AI training, Automattic will allegedly \u201cask\u201d and \u201cadvocate for\u201d their removal. And the company\u2019s AI boss \u201cbelieves\u201d the AI companies will find it in their best interest to comply \u201cbased on our conversations.\u201d (How\u2019s that for reassurance!)<\/p>\n<p>AI data training deals have become a lucrative opportunity for websites treading water in today\u2019s <a data-i13n=\"cpos:3;pos:1\" href=\"https:\/\/www.engadget.com\/after-layoffs-and-an-ai-scandal-cnets-staff-are-unionizing-161508890.html\" data-ylk=\"slk:slippery online publishing landscape;cpos:3;pos:1;elm:context_link;itc:0;sec:content-canvas\" class=\"link \">slippery online publishing landscape<\/a>. 
(Tumblr\u2019s staff was reportedly <a data-i13n=\"cpos:4;pos:1\" href=\"https:\/\/www.engadget.com\/tumblrs-staff-is-reportedly-reduced-to-a-skeleton-crew-215853169.html\" data-ylk=\"slk:reduced to a skeleton crew;cpos:4;pos:1;elm:context_link;itc:0;sec:content-canvas\" class=\"link \">reduced to a skeleton crew<\/a> in late 2023.) Last week, Google struck a deal with Reddit (ahead of the latter\u2019s IPO) to <a data-i13n=\"cpos:5;pos:1\" href=\"https:\/\/www.engadget.com\/reddit-is-licensing-its-content-to-google-to-help-train-its-ai-models-200013007.html\" data-ylk=\"slk:train on the platform\u2019s vast knowledge base of user-created content;cpos:5;pos:1;elm:context_link;itc:0;sec:content-canvas\" class=\"link \">train on the platform\u2019s vast knowledge base of user-created content<\/a>. Meanwhile, OpenAI rolled out a partnership program last year to <a data-i13n=\"cpos:6;pos:1\" href=\"https:\/\/www.engadget.com\/openai-wants-to-work-with-organizations-to-build-new-ai-training-datasets-214548902.html\" data-ylk=\"slk:collect datasets from third parties;cpos:6;pos:1;elm:context_link;itc:0;sec:content-canvas\" class=\"link \">collect datasets from third parties<\/a> to help train its AI models.<\/p>\n<p><strong>Update, February 27, 2024, 3:56 PM ET<\/strong>: This story has been updated to add a published statement from WordPress and Tumblr parent company Automattic.<\/p>\n<\/div>\n<p><a href=\"https:\/\/www.engadget.com\/tumblr-and-wordpress-posts-will-reportedly-be-used-for-openai-and-midjourney-training-204425798.html?src=rss\">Source link <\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Tumblr and WordPress are reportedly set to strike deals to sell user data to artificial intelligence companies OpenAI and Midjourney. 
404 Media reports that<\/p>\n","protected":false},"author":1,"featured_media":208600,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"_uf_show_specific_survey":0,"_uf_disable_surveys":false,"footnotes":""},"categories":[159],"tags":[],"_links":{"self":[{"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/posts\/208599"}],"collection":[{"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/comments?post=208599"}],"version-history":[{"count":4,"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/posts\/208599\/revisions"}],"predecessor-version":[{"id":341731,"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/posts\/208599\/revisions\/341731"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/media\/208600"}],"wp:attachment":[{"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/media?parent=208599"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/categories?post=208599"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/tags?post=208599"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}