NVIDIA is facing a lawsuit from three authors who claim that the company unlawfully used their copyrighted books to train its NeMo AI platform. Authors Brian Keene, Abdi Nazemian, and Stewart O’Nan allege that their works were among roughly 196,640 books in a dataset used to train NeMo to imitate everyday written language. The dataset was removed in October last year following accusations of copyright infringement. The authors are now seeking compensation for the alleged infringement.
The suit was filed as a proposed class action in a San Francisco court. Keene, Nazemian, and O’Nan argue that NVIDIA’s removal of the dataset amounts to an admission that it used their works without authorization to train NeMo, and thereby infringed their copyrights. They are seeking unspecified damages on behalf of people whose copyrighted works were used to train NeMo’s language models over the past three years. The lawsuit specifically names works including Keene’s “Ghost Walk,” Nazemian’s “Like a Love Story,” and O’Nan’s “Last Night at the Lobster.”
The lawsuit adds NVIDIA to a growing list of companies facing legal action over generative AI, which creates new content from inputs such as text, images, and sounds. The company promotes NeMo as a fast and cost-effective way to adopt generative AI. Other companies embroiled in similar disputes include OpenAI, the developer of ChatGPT, and its partner Microsoft.
In a separate case, The New York Times has accused both ChatGPT and Microsoft’s Copilot of infringing on its content and potentially diverting online readers away from its platform. The publication has stressed the importance of revenue from online readership, which supports its journalism. The NYT has also cited instances in which misinformation was wrongly attributed to it.
The AI industry contends that using freely available digital content from the internet is consistent with US copyright law, which permits limited use of copyrighted material for purposes such as research and education. Media outlets counter that using copyrighted content to train AI systems amounts to unfair competition that threatens their revenue and long-term sustainability.