The ongoing controversy over alleged copyright infringement by large language models has taken a significant turn: The New York Times has filed a lawsuit against OpenAI and Microsoft. The suit alleges that the companies used millions of NYT articles to train their systems without permission or compensation, exploiting the Times’ investment in its journalism to build competing products. It asserts that the defendants should be held accountable for billions of dollars in damages and asks for the destruction of any chatbot models and training data that use the Times’ copyrighted material. OpenAI argues that its use of NYT content falls under fair use because that use is transformative.
This lawsuit follows failed negotiations in August between the Times, OpenAI, and Microsoft over a potential licensing agreement. The Times had sought a deal that would allow OpenAI to legally train its GPT models on the paper’s material. Talks broke down, leading to the current legal action. Notably, OpenAI already holds agreements with the Associated Press and Axel Springer to use their content for training purposes.
Data scraping has drawn significant attention this year. In April, Elon Musk threatened to sue Microsoft, claiming that it was illegally using Twitter data to train AI models. Similarly, more than 8,000 authors, including prominent figures like James Patterson and Margaret Atwood, signed an open letter urging AI companies to obtain consent and provide compensation before using their works for model training. Separately, authors have filed multiple copyright infringement lawsuits against OpenAI. Earlier this year, artists also sued Stability AI, the company behind the Stable Diffusion art generator, and Midjourney over alleged copyright violations.
As this legal battle unfolds, courts will likely scrutinize whether training large language models on copyrighted material qualifies as fair use. The outcome could shape the future of fair use and compensation in the fast-moving field of AI.
Despite efforts to address the matter through negotiations and open letters, disputes continue to arise, highlighting the complex legal and ethical challenges that advances in artificial intelligence have created. The outcome of this lawsuit, along with others related to copyright infringement, could have far-reaching implications for AI innovation and the boundaries of intellectual property rights.
It remains to be seen how the courts will interpret and rule on the issues raised in the lawsuit. Until then, the controversy surrounding the use of copyrighted material for training data in AI models is likely to persist, with both sides advocating for their respective interests.