A group of authors has filed multiple lawsuits against OpenAI, claiming copyright infringement in how the company trained its artificial intelligence (AI) models. The lawsuits were initiated by authors Paul Tremblay and Mona Awad, followed by another class action lawsuit led by comedian Sarah Silverman, along with Christopher Golden and Richard Kadrey. All three cases were filed by Joseph Saveri, an antitrust litigation lawyer.
However, these lawsuits reveal a fundamental misunderstanding of copyright law. The authors argue that OpenAI’s models were trained on copyrighted material, which they believe violates their copyrights. But training an AI model does not involve copying the work; rather, it involves reading and learning from it. The notion that simply reading a copyright-protected work constitutes copyright infringement is flawed.
To support their infringement claims, the plaintiffs also argue that the datasets used to train the models were themselves infringing. For example, OpenAI disclosed in a 2018 paper that it trained its model on a dataset called BookCorpus, which contains thousands of unpublished books from various genres. The authors claim that these books were copied without consent, credit, or compensation.
However, even if the dataset is considered infringing, it does not mean that training on that data is also infringing. These lawsuits fail to provide sufficient evidence of actual copying by OpenAI. Their claims primarily rely on weak examples, such as asking OpenAI’s model to summarize books written by the suing authors. This argument would suggest that every schoolchild writing a book report is engaging in copyright infringement.
Furthermore, the lawsuits imply that AI systems learning from existing works would infringe on the originals. If this were the case, any musician who creates music in a specific genre after being inspired by pirated songs in that genre could also be accused of infringement. Such an interpretation goes against how creativity and inspiration work.
While these lawsuits may have an emotional appeal, they are unlikely to succeed. In fact, the plaintiffs may end up having to pay legal fees, as fee awards are common in copyright cases. It is important to recognize that AI systems and machine learning models are designed to learn from existing works to create something new. This process is not infringing but rather an essential aspect of human creativity.
Notwithstanding, it is worth noting that courts have occasionally interpreted copyright law in ways that deviate from its original intent, favoring desired outcomes over legal clarity. If this were to happen in these cases, it would be detrimental to human creativity at large.