Title: OpenAI Faces Class Action Lawsuit for Allegedly Illegally Gathering Internet Content for ChatGPT
OpenAI, the creator of ChatGPT, faces a class action lawsuit claiming that its AI training methods violated privacy and copyright laws by using content from the internet without permission.
The lawsuit, filed in California, alleges that OpenAI unlawfully acquired an extensive amount of data from various online sources to train its advanced language models. This data included materials like Wikipedia articles, popular books, social media posts, and even explicit content from niche genres. The central allegation is that OpenAI obtained this data without seeking consent from the content creators.
The class action lawsuit argues that OpenAI’s failure to follow proper protocols, including obtaining permission from content creators, amounts to data theft. The filing states that rather than acquiring personal information through established procedures, OpenAI resorted to systematic scraping, gathering 300 billion words from the internet, including books, articles, websites, and posts, some of which contained personal information obtained without consent.
The concern raised by the lawsuit is that if individuals have been active online in recent years, their digital contributions may have been incorporated into OpenAI’s datasets. Consequently, any output generated by OpenAI’s language models, which is then used for profit, may contain fragments of personal data collected through silent scraping.
Ryan Clarkson, the managing partner of the law firm suing OpenAI, said that large amounts of information are being taken even though that information was never intended to be used by language models operating at such a massive scale.
The outcome of the case remains uncertain due to the complexities surrounding the internet’s infrastructure and the ownership of digital content. Online platforms typically have their own terms and agreements with users, and when users contribute content to these platforms, they generally grant the platform a broad license to use their content in various ways. This makes it challenging for ordinary users to claim entitlement to payment or compensation for the use of their data in training models.
Katherine Gardner, an intellectual-property lawyer, explained that users often grant social media platforms and websites broad licenses to use their uploaded content. As a result, the platform typically holds the rights to use that content, which makes it difficult for individual users to seek payment or compensation for the use of their data.
As the lawsuit unfolds, it raises important questions about consent, privacy, and the legal implications of gathering and using data from the internet without proper authorization. The outcome of this case could have significant implications for the future of AI training methods and the protection of individual rights in the digital landscape.