The New York Times has filed a lawsuit against OpenAI and Microsoft, accusing them of copyright infringement. The Times claims that the two companies used millions of its articles to build AI models that now compete directly with the publication’s content. According to the lawsuit, OpenAI and Microsoft’s products built on large language models (LLMs), specifically ChatGPT and Copilot, can generate output that closely resembles the Times’ content, including verbatim recitations, summaries, and imitations of its expressive style. The Times argues that this not only undermines its relationship with readers but also deprives it of revenue from subscriptions, licensing, advertising, and affiliate referrals.
One of the key issues raised in the lawsuit is the alleged unauthorized use of the Times’ material to train successive versions of GPT. Although OpenAI publicly described its training data for models released before GPT-3.5, it no longer discloses those details for its more recent models. The lawsuit nonetheless contends that full-text New York Times articles are still part of the training process. It also highlights that a substantial amount of Times material, around 16 million unique records, was collected through the Common Crawl dataset, making the Times the third most represented source after Wikipedia and a database of US patents.
The Times also expresses concern that OpenAI-powered software allows users to bypass its paywall and attributes fabricated information to the publication. With these allegations, the lawsuit draws attention not only to copyright issues but also to the potentially harmful consequences that could arise from the unauthorized use and manipulation of AI-powered language models.
If the case proceeds, access to information about the training data is expected to be a crucial issue during the discovery phase. The lawsuit signals the Times’ determination to protect its content from what it views as copyright infringement by OpenAI and Microsoft, and it reflects broader concerns about the potential impact of machine-generated content on journalism and the media industry.
As the legal battle unfolds, its outcome could shape how AI technology is used and how copyright protections apply to it, raising questions about the boundaries between intellectual property, machine learning, and the freedom to develop language models trained on publicly available information. The New York Times’ lawsuit against OpenAI and Microsoft not only seeks accountability but also sheds light on the complex relationship between traditional media and emerging AI technologies.