Title: Comedian, TV Writers, and Authors Sue OpenAI and Meta for Content Theft
Comedian Sarah Silverman, along with authors Christopher Golden and Richard Kadrey, has taken legal action against OpenAI and Meta, accusing them of stealing their work and training AI models, namely ChatGPT and LLaMA, with unlawfully obtained datasets. The lawsuits, filed in a US District Court, allege copyright infringement against both companies.
The complaint alleges that the datasets used to train the AI models included content obtained from websites notorious for sharing books through torrent systems, such as Bibliotik, Library Genesis, and Z-Library. The plaintiffs claim that their works were included in these datasets without their consent.
Sarah Silverman, Christopher Golden, and Richard Kadrey assert that the AI models, when prompted, provide summaries of their books, thereby violating their copyrights. The exhibits presented as evidence include specific examples of ChatGPT summarizing Silverman’s Bedwetter, Golden’s Ararat, and Kadrey’s Sandman Slim. Notably, the chatbot’s summaries omit the copyright information originally included with the published works.
Meta, in particular, is targeted in a separate lawsuit pertaining to its LLaMA models, a series of open-source AI models unveiled in February. The authors assert that their books were included in the datasets used to train LLaMA, and they allege that these datasets were acquired unlawfully.
The complaints highlight Meta’s own documentation on LLaMA, which acknowledges the sources of its training datasets, including one called The Pile, compiled by EleutherAI. The lawsuit argues that The Pile was created using the content of the Bibliotik private tracker, among others. The authors claim that Bibliotik and the other shadow libraries named in the complaint are overtly illegal.
In both lawsuits, the authors stress that they never granted permission for their copyrighted books to be used as training material for these AI models. The allegations include multiple counts of copyright infringement, negligence, unjust enrichment, and unfair competition. The plaintiffs are seeking statutory damages, restitution of profits, and other appropriate remedies.
Joseph Saveri and Matthew Butterick, the authors’ legal representatives, state on their LLMlitigation website that they have been contacted by other writers, authors, and publishers who are concerned about ChatGPT’s ability to generate text similar to copyrighted materials, potentially encompassing thousands of books.
Saveri, together with Butterick, is also pursuing legal action against AI companies on behalf of programmers and artists. In a separate ongoing case, Getty Images has filed a lawsuit alleging that Stability AI, the creator of the AI image generation tool Stable Diffusion, trained its model using millions of copyrighted images. Mona Awad and Paul Tremblay are also being represented by Saveri and Butterick in a similar case involving OpenAI’s ChatGPT.
The lawsuits serve as a call to address the concerns of content creators regarding the unauthorized use and potential infringement of their copyrighted works by AI models. These developments signal the need for increased oversight and regulations within the AI industry to protect intellectual property rights and ensure fair compensation for creators.