Authors Mona Awad and Paul Tremblay have filed a lawsuit against OpenAI, the company behind ChatGPT, claiming that their copyrighted books were unlawfully used to train the AI model without their permission. ChatGPT is an artificial intelligence tool that generates human-like text in response to user prompts. The lawsuit, filed in a San Francisco federal court, asserts that the chatbot generated highly accurate summaries of the authors’ novels, leading Awad and Tremblay to believe that their works were ingested and used for training purposes.
This is the first copyright-related lawsuit against ChatGPT, according to legal expert Andres Guadamuz from the University of Sussex. The case brings into question the legal boundaries within the generative AI space. Books are considered ideal for training language models due to their high-quality and well-edited prose, making them the gold standard of idea storage for humanity, according to the authors’ lawyers.
The complaint alleges that OpenAI profits unfairly from stolen writing and ideas. The authors are seeking monetary damages on behalf of all US-based authors who had their works used to train ChatGPT without authorization. Although authors have strong legal protection for their copyrighted works, they are now facing companies like OpenAI that seemingly disregard these laws, according to Joseph Saveri and Matthew Butterick, the authors’ lawyers.
Proving that the authors suffered financial losses specifically because ChatGPT was trained on their copyrighted material may be challenging. Even if the books were ingested, ChatGPT might perform similarly without them, as it is trained on a wide range of internet text, including discussions between users about the novels, as explained by Guadamuz. The lawyers also highlight OpenAI’s increasing secrecy regarding its training data, noting that early iterations of ChatGPT referred to a dataset called Books2, estimated to contain 294,000 titles, which they speculate must have been sourced from shadow libraries such as Library Genesis (LibGen) and Z-Library.
The outcome of this lawsuit will likely hinge on whether courts view the use of copyrighted material as fair use or unauthorized copying, according to legal experts Lilian Edwards and Guadamuz. They note that a similar case in the UK would be decided differently due to the absence of a fair use defense. In recent months, the publishing industry has been discussing how to protect authors from potential harms associated with AI technology. The Society of Authors (SoA) has even published guidelines for its members to safeguard themselves and their work. The SoA’s chief executive, Nicola Solomon, expressed support for the authors’ lawsuit, explaining that the wholesale copying of authors’ work for training language models has long concerned her organization.
Richard Combes, head of rights and licensing at the Authors’ Licensing and Collecting Society (ALCS), believes that current regulations around AI are fragmented and struggling to keep up with technological advancements. He encourages policymakers to consider principles established by the ALCS that protect the value of human authorship. Saveri and Butterick anticipate that AI will eventually comply with copyright law, similar to what happened with digital music, TV, and movies. They believe future AI systems will be based on licensed data, with transparent sources.
The lawyers highlight the irony that so-called artificial intelligence tools depend entirely on human-created data and human creativity. If these tools bankrupt human creators, they will ultimately bankrupt themselves.