OpenAI Faces Lawsuit from NYT over Unauthorized Use of Content
OpenAI, the AI pioneer, and its partner Microsoft have been hit with a lawsuit by The New York Times (NYT) for allegedly training their models on copyrighted and paywalled NYT content without proper disclosure or compensation. The news publisher had been negotiating a licensing deal with OpenAI, but the two sides failed to reach an agreement. Other publishers, such as the Associated Press and Axel Springer, have already reached commercial agreements to license their content to OpenAI, though the amounts involved have not been disclosed.
However, reports suggest that OpenAI may have paid between $1 million and $5 million for these licensing deals. That amount, while significant, pales in comparison to the $50 million that Apple is rumored to be offering other publishers to train its own AI systems. Limited access to copyright-protected content may also have affected OpenAI’s flagship ChatGPT products, with users claiming a decline in capability. Vox Media has also blocked OpenAI’s GPT crawler from accessing its data.
This dispute highlights an important consideration for enterprises looking to integrate AI into their workflows. Commercially available models can fluctuate in performance when the datasets used to train them change. It is therefore crucial for firms to understand the AI systems they employ and the sources of data behind them.
The large language models (LLMs) in use today are trained with deep learning algorithms that rely on vast amounts of natural-language data. However, the companies behind these models often do not disclose the specific sources of their training data. News publishers, recognizing that copyrighted materials are being used for training, have engaged in discussions with OpenAI about licensing their content. The U.S. Copyright Office has also launched an initiative to study the use of copyrighted materials in AI training, suggesting that legislative or regulatory action may be necessary in the future.
Despite the legal challenges and potential costs associated with obtaining training data, AI companies continue to invest in generative AI software development. However, being required to pay for all scraped online data could have significant financial implications for these companies.
As the AI industry evolves, regulating the use of copyrighted materials and ensuring fair compensation for content creators remain complex and ongoing issues. It is crucial for AI companies to operate within legal boundaries and for publishers to explore licensing agreements for their valuable content. The outcome of the lawsuit between OpenAI and the NYT will likely shape the future of AI training and content usage in the industry.