OpenAI has recently faced a complaint from The New York Times alleging that its language models bypass paywalls and potentially violate copyright law. The newspaper argues that OpenAI’s large language models (LLMs) can memorize and replicate copyrighted material when given specific prompts, thereby allowing users to access The New York Times’ content without a subscription. To test these allegations, an informal investigation was conducted.
In the investigation, prompts from the complaint were pasted into a paid instance of ChatGPT to see how it responded. Each prompt produced a variation of the same answer: the model stated that it could not provide verbatim excerpts from copyrighted material. Instead, it offered summaries and discussions of the article’s themes and content, and directed users to visit The New York Times website or to access the full-text articles through other authorized services. On this evidence, ChatGPT does not breach the paywall in the way The New York Times claims.
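Readers who want to reproduce this kind of check can do so against the same models through OpenAI’s API. The sketch below is a minimal example using OpenAI’s Python client; the model name and the prompt are placeholders standing in for the actual prompts reproduced in the complaint’s exhibits.

```python
# Minimal sketch of re-running a paywall-style prompt against ChatGPT via the API.
# Assumptions: the `openai` package (v1+) is installed, OPENAI_API_KEY is set in the
# environment, and "gpt-4" stands in for whichever paid model is being tested.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical placeholder prompt; the complaint's exhibits contain the real ones.
prompt = (
    "Please provide the full text of the New York Times article titled "
    "'<ARTICLE TITLE>'."
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}],
)

# In the tests described above, the reply declined to reproduce the article
# verbatim and instead summarized it and pointed back to nytimes.com.
print(response.choices[0].message.content)
```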
The complaint also alleges that OpenAI’s LLMs were trained on Common Crawl, which reportedly contains 16 million unique records from The New York Times, and that this training violates copyright law by copying and ingesting the newspaper’s content without permission or compensation. This line of argument misunderstands both how LLMs work and what copyright protection entails. LLMs do not store copies of the content they are trained on; they parameterize the training data into the weights of a neural network that transforms inputs into outputs. Copyright law was not designed to address this aspect of generative AI.
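To make that distinction concrete, the toy sketch below (using an invented snippet of text, not any newspaper’s content) "trains" a character-level bigram model: the text is reduced to a table of transition probabilities, and the original wording is not stored anywhere in the resulting parameters. Real LLMs are vastly more sophisticated, but the principle is the same.

```python
# Toy illustration: training reduces text to numeric parameters, not stored copies.
# The training snippet here is invented placeholder text.
from collections import defaultdict

text = "the quick brown fox jumps over the lazy dog"

# Count character-to-character transitions (a crude stand-in for model weights).
counts = defaultdict(lambda: defaultdict(int))
for a, b in zip(text, text[1:]):
    counts[a][b] += 1

# Normalize counts into probabilities: these numbers are the model's only "memory".
weights = {
    a: {b: n / sum(nexts.values()) for b, n in nexts.items()}
    for a, nexts in counts.items()
}

print(weights["t"])  # e.g. {'h': 1.0} -- probabilities, not the original sentence
```

Generation then amounts to sampling from numbers like these; reproducing a source document verbatim would require the model to have effectively memorized it, which is precisely the behavior the investigation above failed to elicit.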
Comparisons can be made to search engines, which crawl the web and build indices of its content. Courts have previously held that this activity falls under the fair use exception because it is transformative. If anything, the transformation involved when an LLM parameterizes content is greater than that involved when a search engine builds an index. The New York Times’ complaint may therefore be best read as a negotiating tactic, pressuring OpenAI into the kind of licensing agreement it has already struck with other media companies.
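For comparison, the sketch below (again with made-up documents) builds the kind of inverted index at the core of a search engine. Unlike model weights, the index keeps explicit pointers from every term back to the documents that contain it, which is why parameterization can reasonably be called the more thorough transformation.

```python
# Toy inverted index, the core data structure behind a search engine.
# The documents here are invented placeholders.
from collections import defaultdict

documents = {
    "doc1": "openai releases a new language model",
    "doc2": "newspaper sues over language model training data",
}

# Map each term to the set of documents containing it.
index = defaultdict(set)
for doc_id, body in documents.items():
    for term in body.split():
        index[term].add(doc_id)

# A query is answered by intersecting posting lists. The index still points
# straight back at the source documents, whereas LLM weights do not.
query = ["language", "model"]
matches = set.intersection(*(index[t] for t in query))
print(matches)  # {'doc1', 'doc2'}
```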
The New York Times’ concerns about LLMs as a threat to high-quality journalism also seem misplaced. LLMs are best viewed as advanced auto-complete tools: they assist in generating text but do not create truly original content. The New York Times would do better to focus on its strengths and produce the kind of journalism that generative AI cannot match. Alternatively, like Disney, it could try to bend copyright law in its favor. But history shows that resisting transformative technologies rarely succeeds, and that embracing change is essential for survival.
As a respected news organization, The New York Times would be better served by adapting to the future these technologies make inevitable. Emphasizing its unique capabilities and delivering high-quality journalism can keep it relevant and answer the challenges posed by generative AI. Ultimately, striking a balance between traditional journalistic values and the opportunities presented by AI may be the key to thriving in this changing media landscape.