Prominent Authors File Class Action Lawsuits Against Meta and OpenAI for Copyright Infringement in Training AI Systems
A group of renowned authors has filed class action lawsuits against Meta and OpenAI, two major artificial intelligence (AI) companies, alleging copyright infringement in the training of their AI systems. The suits claim that the companies illegally harvested massive quantities of books from the internet to build AI systems whose output infringes the authors' copyrights. The authors, including Michael Chabon and other decorated writers, are seeking a court order requiring the companies to destroy any AI systems trained on their copyrighted works.
These lawsuits are part of a growing trend among creators who are challenging the legal implications of training AI systems using copyrighted materials. OpenAI is already facing a proposed class action lawsuit from author Paul Tremblay, as well as a suit filed by Sarah Silverman, which also includes Meta as a defendant. Artists have also pursued legal action against AI art generators Stability AI, Midjourney, and DeviantArt for copyright infringement.
The authors argue that AI systems like OpenAI's ChatGPT have been fed their books as training data, citing examples of the system generating summaries and detailed analyses of the themes in their novels. They claim this could only happen if the models had been trained on their copyrighted works. The suits further argue that because these AI systems rely on information extracted from copyright-protected material, the content they produce is itself an infringement of the authors' copyrights.
The authors allege that both OpenAI and Meta compiled their datasets by scraping text from the internet without consent, credit, or compensation. OpenAI previously acknowledged feeding its first large language model, GPT-1, a collection of over 7,000 novels from the BookCorpus dataset. The authors claim these novels were copied without authorization from Smashwords, a website that hosts self-published works. Similarly, OpenAI's subsequent models, including GPT-3, were trained in part on two internet-based book corpora known only as Books1 and Books2, whose sources OpenAI has not disclosed. The authors contend that these datasets were likely drawn from sites like Project Gutenberg and from shadow libraries such as Library Genesis, Z-Library, and Bibliotik.
Both OpenAI and Meta have faced criticism for not disclosing the origins of their training data. OpenAI has said it no longer reveals this information, citing competitive concerns and safety implications. Meta, for its part, has said only that its book data came from the Books3 section of The Pile, without providing further details.
The class actions seek to represent a nationwide class of U.S. authors whose works were used to train AI systems. The lawsuits allege direct copyright infringement, vicarious copyright infringement, violations of the Digital Millennium Copyright Act, unjust enrichment, and negligence. The outcome of the litigation may hinge on two court decisions. In one, a federal appeals court found that Google's digitization of books for a search function constituted fair use. In the other, a recent Supreme Court ruling emphasized that courts must consider whether the copying use commercially overlaps with the original work.
Some legal experts expect the courts to focus on the purpose and character of the use, and argue that a rigorous fair use analysis could favor the creators. Such a ruling could force AI companies to establish licensing frameworks with authors and artists. Neither OpenAI nor Meta has commented on the lawsuits.
In summary, the class action lawsuits brought by prominent authors against Meta and OpenAI highlight the ongoing legal battles surrounding the use of copyrighted materials for training AI systems. As the courts grapple with the issues at hand, the outcomes of these cases could have significant implications for the future of AI development and its relationship with copyrighted works.