The New York Times suit against OpenAI and Microsoft is the copyright case most likely to set intellectual property boundaries for the burgeoning industry of generative artificial intelligence, lawyers say.
In the suit, filed Wednesday, the New York Times argues that OpenAI and Microsoft used its content to train their large language models without permission, violating copyright law. Literary groups, Getty Images, and comedian Sarah Silverman had already sued AI companies for training their models on their work. But legal experts see the Times suit as the one with the best chance of a favorable result for content creators.
The newspaper argues that OpenAI and Microsoft should be held accountable for “billions of dollars in statutory and actual damages related to the unlawful copying and use of The Times’s uniquely valuable works.” It also argues that OpenAI’s unauthorized use of its content could hurt its profits. According to the complaint, the Times is the most heavily represented proprietary source in a dataset used to train OpenAI’s GPT models, the family behind the chatbot ChatGPT, and the third-largest source overall, behind only Wikipedia and a database of patent documents.
Sag argues that the New York Times’ most compelling argument lies in its exhibits. The legal team included 100 examples of ChatGPT copying the Times’ content word for word when prompted: the lawyers fed the bot the opening of news articles written by New York Times journalists and then asked it to complete them.
Each entry replicated the original article word for word.
Whether that evidence is sufficient to establish copyright infringement will hinge on whether such output is considered Fair Use.
Fair Use is the doctrine in U.S. copyright law that permits limited use of copyrighted material without the owner’s permission for purposes such as commentary, criticism, news reporting, teaching, and research.
Courts weigh four factors, set out in Section 107 of the Copyright Act, when determining whether a use is Fair Use: the purpose and character of the use, including whether it is commercial or for nonprofit educational purposes; the nature of the copyrighted work, such as how creative or factual it is; the amount and substantiality of the portion used; and the effect of the use on the potential market for, or value of, the original work.
Although Fair Use has been litigated extensively over the decades in the U.S., there is no significant case law relating to generative AI yet.
The next phase of the case is likely to focus on whether OpenAI’s use of Times content, both in training its models and in ChatGPT’s responses, qualifies as Fair Use under those four factors.
On one hand, OpenAI’s lawyers could argue that ChatGPT serves an educational, library-like function for the public good. OpenAI was initially launched as a nonprofit organization.
On the other hand, OpenAI also operates a for-profit business, which Thrive Capital valued on paper at $80 billion or more in October. OpenAI CEO Sam Altman told staff in October that the company was generating revenue at an annualized pace of $1.3 billion.
For nonfiction content, copyright protection works on a sliding scale, and where a particular use falls is ultimately for the court to decide. Facts themselves cannot be copyrighted, so if ChatGPT’s responses had merely drawn on the facts in the New York Times’s reporting, a court would likely be more lenient. Reproducing New York Times articles word for word, however, leaves OpenAI more exposed in legal terms.
This is where the outcome is uncertain. Although ChatGPT was trained on a large body of New York Times content, its output often transforms that material, for example by summarizing it. The exceptions are the examples above, in which ChatGPT reproduced articles word for word.
The New York Times made $2.31 billion in revenue in 2022, had 10 million subscribers as of November 2023, and spent over $202 million on expenses. The publisher will likely need to show that ChatGPT and other generative AI programs have cost it a significant portion of its readers and advertising revenue.
As of Friday, OpenAI had not responded in court to the New York Times’ suit, although it has publicly stated that it strives to respect the rights of the creators whose content it includes in its training data.
“We respect the rights of content creators and owners,” OpenAI said in a statement. “Our ongoing conversations with the New York Times have been productive and moving forward constructively, so we are surprised and disappointed with this development.”
If the court finds that OpenAI willfully infringed the Times’ copyrights, it could award statutory damages of up to $150,000 per infringed work.