Creeps: AI Giants Are Training Systems on Pictures of Children Without Consent
An investigation by Human Rights Watch has uncovered a troubling practice in AI development: photos of children are being used to train AI models without consent, exposing the minors involved to privacy and safety risks.
According to Ars Technica, Human Rights Watch researcher Hye Jung Han found that widely used AI training datasets, including LAION-5B, contain numerous photos of Australian children scraped from a range of online platforms. These images are being used to train AI models without the knowledge or approval of the children or their families, raising serious concerns about the privacy and safety of minors online.
Although Han reviewed only a tiny fraction of LAION-5B's 5.85 billion images, she identified 190 photos of children from every Australian state and territory, which suggests the true number of affected children is far higher. The images span all stages of childhood, which could allow AI image generators to produce convincing deepfakes of real Australian children.
Moreover, some URLs in the dataset expose identifying details about the children, such as their names and locations. In one case, a single photo link revealed the full names and ages of two children along with the name of their preschool in Perth, Western Australia. Disclosure at this level puts children at risk of privacy breaches and potential safety threats.
The investigation also found that even photos posted under strict privacy settings were not safe from scraping. Han identified images from unlisted YouTube videos, which should be reachable only via a direct link, included in the dataset. The discovery raises questions about the effectiveness of existing privacy controls and about how well technology companies are safeguarding user data.
The use of these images in AI training sets poses distinct risks to Australian children, particularly Indigenous minors, who may be more vulnerable to harm. Han's report emphasizes that First Nations peoples restrict the reproduction of photos of deceased people during periods of mourning, and that incorporating such images into AI datasets risks violating those cultural protocols.
The potential for misuse is substantial. Recent incidents in Australia have already demonstrated the danger: around 50 girls from Melbourne reported that their social media photos had been manipulated with AI to create sexually explicit deepfakes. The episode underlines the pressing need for stronger protections and regulations governing the use of personal data in AI development.
LAION, the organization behind the dataset, has said it is committed to removing flagged images, but the process appears slow. And removing links from the dataset does not undo the training of AI models that have already ingested these images, nor does it stop the photos from being used in other AI datasets.
The exposure of children's images for AI training without consent underscores the need for robust regulation and sustained vigilance in protecting minors' privacy and security in the digital age. Awareness, transparency, and accountability are essential to mitigating the risks that come with AI development.