NYT’s Copyright Lawsuit Against OpenAI Unveils Battle Between Technology and News Publishers

Last week’s copyright infringement lawsuit by The New York Times (NYT) against ChatGPT-maker OpenAI has opened another battlefront between big technology companies and news publishers. NYT has questioned the very essence of how large language models (LLMs), on which tools such as ChatGPT are built, are trained. The development could have broader ramifications for news media firms and for how courts value their role in helping train language models. While most generative AI companies are dealing with copyright issues after the fact, Apple has opened commercial discussions with publishers to sign multimillion-dollar deals to use licensed content.

In the lawsuit, The New York Times alleges that OpenAI and its largest investor, Microsoft, used millions of articles published by the news organization to train chatbots, accusing them of wide-scale copying. The lawsuit claims that OpenAI’s chatbots now compete with the media platform as a source of information. It further states that the GPT-3 models were trained in part on Common Crawl, a large dataset scraped from the internet by the non-profit web crawler of the same name, which includes data from sources such as Google and Wikipedia. The New York Times argues that OpenAI’s generative AI tools can generate output that recites Times content verbatim, closely summarizes it, and mimics its expressive style. The lawsuit asserts that OpenAI’s tools undermine and damage The Times’s relationship with its readers and, by bypassing its paywall, deprive The Times of subscription, licensing, advertising, and affiliate revenue, directly affecting its business model.

Meanwhile, Apple has adopted a different approach. The tech giant has reportedly approached media companies, including Conde Nast, NBC News, and IAC, with multiyear deals worth at least $50 million to license their archives of news articles. Apple is seeking permission to use content before training its generative AI models, unlike other platforms that seek deals only after their models have been trained. This strategy has received positive feedback from executives at publishing firms, although some concerns remain about the terms Apple has offered.

The issue of copyright infringement in the training of LLMs and natural language processing engines is multifaceted. Many generative AI companies use a process called web scraping to gather data from the internet and feed it into their LLMs. OpenAI, for instance, has been accused of scraping over 300 billion words from the internet without user consent. The New York Times says it raised concerns with OpenAI and Microsoft about the use of its material, but no resolution was reached, which led to the filing of the lawsuit.

Numerous copyright lawsuits have been filed by music labels, authors, and now news publishers, seeking to settle the question of fair use and the extent to which copyrighted material can be used in training LLMs. These lawsuits will test copyright laws in different jurisdictions, with courts weighing factors such as the amount of original material used, the purpose and commercial nature of the use, the value of the copyrighted material, and the impact of its use.

In response to potential copyright suits, OpenAI announced a copyright shield in November, offering indemnification to its enterprise users against such claims arising from the use of ChatGPT. Microsoft, Google, and Amazon have also introduced similar shields. However, these measures do not resolve the underlying legal issues and debates surrounding the use of copyrighted material in training AI models.

As the legal battles continue, publishers and technology companies are taking different paths. Some publishers, like The New York Times, are pursuing lawsuits, while some technology companies, like Apple, are seeking pre-emptive licensing agreements. The outcome of these cases will have significant implications not only for news publishers but also for every technology company that relies on large language models for its AI applications.

Frequently Asked Questions (FAQs) Related to the Above News

What is the copyright infringement lawsuit between The New York Times and OpenAI?

The New York Times (NYT) has filed a copyright infringement lawsuit against OpenAI, alleging that OpenAI and its investor Microsoft have used millions of NYT articles to train their chatbots. The lawsuit claims that OpenAI's chatbots now compete with NYT as a source of information.

What is the main argument of The New York Times in the lawsuit?

The New York Times argues that OpenAI's generative AI tools can generate output that recites NYT content verbatim, closely summarizes it, and mimics its expressive style, undermining the paper's relationship with its readers and reducing its subscription, licensing, advertising, and affiliate revenue.

How has Apple approached the issue?

Apple has taken a different approach, reportedly offering media companies multiyear deals worth at least $50 million to license their news article archives. It is seeking permission to use content before training its generative AI models, unlike other platforms.

How do generative AI companies gather data for training their models?

Many generative AI companies use a process called web scraping to gather data from the internet. OpenAI has been accused of scraping over 300 billion words, including content from The New York Times, without user consent.

What are the copyright lawsuits seeking to address?

Copyright lawsuits filed by music labels, authors, and news publishers aim to settle the question of fair use and the extent to which copyrighted material can be used in training language models. Courts will weigh factors such as the amount of original material used, the purpose and commercial nature of the use, the value of the copyrighted material, and the impact of the use.

How have AI companies responded to potential copyright suits?

OpenAI, Microsoft, Google, and Amazon have introduced copyright shields that offer indemnification to their enterprise users against copyright claims arising from the use of their AI models. However, these measures do not resolve the underlying legal issues surrounding the use of copyrighted material.

How are news publishers and technology companies approaching the copyright issue?

They are taking different approaches. Some publishers, like The New York Times, are pursuing lawsuits, while some technology companies, like Apple, are seeking pre-emptive licensing agreements with media companies. The outcome of these cases will have significant implications for both news publishers and technology companies relying on large language models.

Please note that the FAQs provided on this page are based on the news article published. While we strive to provide accurate and up-to-date information, it is always recommended to consult relevant authorities or professionals before making any decisions or taking action based on the FAQs or the news article.
