{"id":231118,"date":"2024-06-11T21:42:14","date_gmt":"2024-06-11T21:42:14","guid":{"rendered":"https:\/\/michigandigitalnews.com\/index.php\/2024\/06\/11\/ai-tools-are-illegally-training-on-real-children-including-for-explicit-material\/"},"modified":"2025-06-25T17:17:20","modified_gmt":"2025-06-25T17:17:20","slug":"ai-tools-are-illegally-training-on-real-children-including-for-explicit-material","status":"publish","type":"post","link":"https:\/\/michigandigitalnews.com\/index.php\/2024\/06\/11\/ai-tools-are-illegally-training-on-real-children-including-for-explicit-material\/","title":{"rendered":"AI tools are illegally training on real children, including for explicit material"},"content":{"rendered":"<p><img decoding=\"async\" src=\"https:\/\/fortune.com\/img-assets\/wp-content\/uploads\/2024\/06\/GettyImages-2031368235-e1718131176974.jpg?w=2048\" \/><\/p>\n<p>AI\u2019s unstoppable quest for training data is hoovering up growing amounts of increasingly questionable content\u2014including details of children whose use by AI breaks the law, researchers have found.\u00a0<\/p>\n<div>\n<p>At least 170 links to photos and personal details of children in Brazil have been scraped from the internet and utilized to train AI systems without parental consent or knowledge, Human Rights Watch said in<a href=\"https:\/\/www.hrw.org\/news\/2024\/06\/10\/brazil-childrens-personal-photos-misused-power-ai-tools\" target=\"_blank\" aria-label=\"Go to https:\/\/www.hrw.org\/news\/2024\/06\/10\/brazil-childrens-personal-photos-misused-power-ai-tools\" rel=\"noopener\" class=\"sc-80b85506-0 ovBKL\"> a report this week<\/a>. 
Some of those AI systems have generated explicit and violent images of children, HRW said.\u00a0<\/p>\n<p>Brazilian law prohibits the processing of children\u2019s personal data without the consent of a child\u2019s guardian, Hye Jung Han, a children\u2019s technology rights researcher and the author of the report, told <em>Fortune<\/em>.\u00a0<\/p>\n<p>The links to the photos were scraped from personal blogs and social media sites into a large data set called LAION-5B, which has been used to train popular image generators such as <a href=\"https:\/\/fortune.com\/2024\/03\/27\/inside-stability-ai-emad-mostaque-bad-breakup-vc-investors-coatue-lightspeed\/\" target=\"_self\" aria-label=\"Go to https:\/\/fortune.com\/2024\/03\/27\/inside-stability-ai-emad-mostaque-bad-breakup-vc-investors-coatue-lightspeed\/\" class=\"sc-80b85506-0 ovBKL\" rel=\"noopener\">Stable Diffusion<\/a>. The figure of 170 photos is likely a \u201csignificant undercount,\u201d HRW said, since the group reviewed only 0.0001 percent of the 5.8 billion images captured in LAION-5B.\u00a0<\/p>\n<p>\u201cMy wider concern is that this is the tip of the iceberg,\u201d Han told <em>Fortune<\/em>. \u201cIt\u2019s likely that there\u2019s many more children and many more Brazilian children\u2019s images in the data set.\u201d<\/p>\n<p>LAION-5B contains scraped photos of children dating as far back as 1994, which were clearly posted with an expectation of privacy, Han said. One of the photos features a 2-year-old girl meeting her newborn sister, and the photo\u2019s caption includes not only both girls\u2019 names but also the name and address of the hospital where the baby was born.\u00a0<\/p>\n<p>That kind of information was available in the URLs or the metadata of many of the photos, Han said. 
Children\u2019s identities are often easily traceable from the photos, either through the caption or through information about where they were when the photo was taken.\u00a0<\/p>\n<p>Young children dancing in their underwear at home, students giving a presentation at school, and high schoolers at a carnival are only a few examples of the personal photos that were scraped. Many of them were posted on mommy blogs or were screenshots taken from personal family YouTube videos with small view counts, Han said. The photos \u201cspan the entirety of childhood,\u201d the report found.\u00a0<\/p>\n<p>\u201cIt\u2019s very likely that these were personal accounts, and [the people who uploaded the images] just wanted these videos shared with family and friends,\u201d Han added.\u00a0<\/p>\n<p>All publicly available versions of LAION-5B <a href=\"https:\/\/fortune.com\/2023\/12\/21\/ai-training-child-abuse-explicit-stanford\/\" target=\"_self\" aria-label=\"Go to https:\/\/fortune.com\/2023\/12\/21\/ai-training-child-abuse-explicit-stanford\/\" class=\"sc-80b85506-0 ovBKL\" rel=\"noopener\">were taken down last December<\/a> after a Stanford <a href=\"https:\/\/cyber.fsi.stanford.edu\/news\/investigation-finds-ai-image-generation-models-trained-child-abuse\" target=\"_blank\" aria-label=\"Go to https:\/\/cyber.fsi.stanford.edu\/news\/investigation-finds-ai-image-generation-models-trained-child-abuse\" rel=\"noopener\" class=\"sc-80b85506-0 ovBKL\">investigation<\/a> found that it had scraped images of child sexual abuse. 
Nate Tyler, a spokesperson for LAION, the nonprofit that runs the data set, said that the organization is working with the Internet Watch Foundation, the Canadian Centre for Child Protection, Stanford, and Human Rights Watch to remove all known references to illegal content from LAION-5B.\u00a0<\/p>\n<p>\u201cWe are grateful for their support and hope to republish a revised LAION-5B soon,\u201d Tyler said.\u00a0<\/p>\n<p>He added that because LAION-5B is built from URL links rather than copies of the photographs themselves, simply removing the links from the LAION data set won\u2019t remove any illegal content from the web.\u00a0<\/p>\n<p>However, the links themselves still contain identifying information about minors, Han said. She told <em>Fortune<\/em> she has asked LAION to do two things: first, prevent future ingestion of children\u2019s data, and second, regularly remove children\u2019s data from the data set.\u00a0<\/p>\n<p>\u201c[LAION] has not responded or committed to either of those things,\u201d Han said.\u00a0<\/p>\n<p>Tyler did not directly address this criticism but underscored the nonprofit\u2019s commitment to addressing the issue of illegal material in the database.<\/p>\n<p>\u201cThis is a larger and very concerning issue, and as a nonprofit, volunteer organization, we will do our part to help,\u201d Tyler said.<\/p>\n<p>Much of LAION-5B\u2019s data is sourced from Common Crawl, a data repository that copies swaths of the open internet. However, Common Crawl\u2019s executive director, Rich Skrenta, previously told the Associated Press that it is LAION\u2019s responsibility to filter what it takes before making use of it.\u00a0<\/p>\n<h2 class=\"wp-block-heading\"><strong>Potential for harm<\/strong><\/h2>\n<p>Once their photos are collected, children face real threats to their privacy, Han said. 
AI models, including those trained on LAION-5B data, have notoriously <a href=\"https:\/\/not-just-memorization.github.io\/extracting-training-data-from-chatgpt.html\" target=\"_blank\" aria-label=\"Go to https:\/\/not-just-memorization.github.io\/extracting-training-data-from-chatgpt.html\" rel=\"noopener\" class=\"sc-80b85506-0 ovBKL\">regurgitated<\/a> private information \u2013 such as <a href=\"https:\/\/arstechnica.com\/information-technology\/2022\/09\/artist-finds-private-medical-record-photos-in-popular-ai-training-data-set\/\" target=\"_blank\" aria-label=\"Go to https:\/\/arstechnica.com\/information-technology\/2022\/09\/artist-finds-private-medical-record-photos-in-popular-ai-training-data-set\/\" rel=\"noopener\" class=\"sc-80b85506-0 ovBKL\">medical records<\/a> or personal photographs \u2013 when prompted.<\/p>\n<p>AI models can now <a href=\"https:\/\/arstechnica.com\/information-technology\/2024\/04\/microsofts-vasa-1-can-deepfake-a-person-with-one-photo-and-one-audio-track\/\" target=\"_blank\" aria-label=\"Go to https:\/\/arstechnica.com\/information-technology\/2024\/04\/microsofts-vasa-1-can-deepfake-a-person-with-one-photo-and-one-audio-track\/\" rel=\"noopener\" class=\"sc-80b85506-0 ovBKL\">generate convincing clones<\/a> of a child from just one or two images, the report said.\u00a0<\/p>\n<p>\u201cIt is pretty safe to say that the photos that I found absolutely contributed to the model being able to produce realistic images of Brazilian kids, including sexually explicit imagery,\u201d Han said.\u00a0<\/p>\n<p>More maliciously, some users have used text-to-image AI sites to generate child pornography. One such site, Civitai, uses models trained on LAION-5B data and is overrun by requests for explicit content \u2013 60% of images generated on the platform are considered lewd. 
Some users asked for and were provided with images related to \u201cvery young girl\u201d and \u201csex with dog,\u201d an <a href=\"https:\/\/www.404media.co\/a16z-funded-ai-platform-generated-images-that-could-be-categorized-as-child-pornography-leaked-documents-show\/\" target=\"_blank\" aria-label=\"Go to https:\/\/www.404media.co\/a16z-funded-ai-platform-generated-images-that-could-be-categorized-as-child-pornography-leaked-documents-show\/\" rel=\"noopener\" class=\"sc-80b85506-0 ovBKL\">investigation<\/a> by 404 Media, a tech journalism company, found.\u00a0<\/p>\n<p>Upon request, Civitai even generated lewd images of girls who specifically did <em>not<\/em> look \u201cadult, old\u201d or \u201chave big breasts,\u201d 404 Media revealed.\u00a0<\/p>\n<p>After the investigation was released, Civitai\u2019s cloud computing provider, OctoML, ended its partnership with the company. Civitai now includes an NSFW filter, much to the dismay of some users, who said that the platform will now be like \u201cany other,\u201d according to 404 Media.\u00a0<\/p>\n<p>A spokesperson for Civitai told <em>Fortune<\/em> that the company immediately bans anyone who produces NSFW content involving minors and has introduced a \u201csemi-permeable membrane,\u201d referring to the filter that blocks inappropriate content.\u00a0<\/p>\n<p>Deepfake technology has already begun to affect young girls, Han said. At least 85 Brazilian girls have faced harassment from classmates who used AI to create sexually explicit deepfakes of them based on photos taken from their social media profiles, according to the report. Han said the consistency and realism of these deepfakes prompted her to start investigating the topic.\u00a0<\/p>\n<p>\u201cI started looking at what was it about this technology that was able to produce such realistic imagery, horrific imagery, of Brazilian kids, and that investigation led me to the training data set,\u201d Han added.\u00a0<\/p>\n<p>The U.S. 
has seen a number of similar incidents. At least two high schools have faced <a href=\"https:\/\/www.scientificamerican.com\/article\/teens-are-spreading-deepfake-nudes-of-one-another-its-no-joke\/\" target=\"_blank\" aria-label=\"Go to https:\/\/www.scientificamerican.com\/article\/teens-are-spreading-deepfake-nudes-of-one-another-its-no-joke\/\" rel=\"noopener\" class=\"sc-80b85506-0 ovBKL\">scandals<\/a> in which boys generated deepfake nude images of dozens of their female classmates.\u00a0<\/p>\n<p>Some states, including Florida, Louisiana, South Dakota, and Washington, have begun banning the creation of deepfake nudes of minors, and other states are considering similar bills. However, Han thinks lawmakers should go further and block children\u2019s data from being scraped into AI systems altogether, as a \u201cfutureproof\u201d measure.<\/p>\n<p>\u201cThe burden of responsibility should not be placed on children and parents to try and protect kids from a technology that\u2019s fundamentally impossible to protect against,\u201d Han said. 
\u201cParents should be able to post [photos] of kids to share with families and friends and not have to live in the fear that those photos might one day be weaponized and used against them.\u201d\u00a0<\/p>\n<\/div>\n<p><a href=\"https:\/\/fortune.com\/2024\/06\/11\/ai-models-training-real-children-explicit-materials-brazil\/\">Source link <\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>AI\u2019s unstoppable quest for training data is hoovering up growing amounts of increasingly questionable content\u2014including details of children whose use by AI breaks the<\/p>\n","protected":false},"author":1,"featured_media":231119,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_uf_show_specific_survey":0,"_uf_disable_surveys":false,"footnotes":""},"categories":[149],"tags":[],"_links":{"self":[{"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/posts\/231118"}],"collection":[{"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/comments?post=231118"}],"version-history":[{"count":0,"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/posts\/231118\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/media\/231119"}],"wp:attachment":[{"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/media?parent=231118"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/categories?post=231118"},{"taxonomy":"post_tag","embeddable"
:true,"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/tags?post=231118"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}