Authors sue OpenAI for illegally ‘consuming’ their books

Authors Mona Awad and Paul Tremblay have filed a lawsuit against OpenAI, the organization behind ChatGPT, claiming that their copyrighted books were unlawfully used to train the AI model without their permission. ChatGPT is an artificial intelligence tool that responds to user-initiated commands with human-like text. The lawsuit, filed in a San Francisco federal court, asserts that the chatbot generated highly accurate summaries of the authors’ novels, leading Awad and Tremblay to believe that their works were ingested and used for training purposes.

This is the first copyright-related lawsuit against ChatGPT, according to legal expert Andres Guadamuz from the University of Sussex. The case brings into question the legal boundaries within the generative AI space. According to the authors’ lawyers, books are ideal for training language models because of their high-quality, well-edited prose, which makes them the gold standard of idea storage for humanity.

The complaint alleges that OpenAI profits unfairly from stolen writing and ideas. The authors are seeking monetary damages on behalf of all US-based authors who had their works used to train ChatGPT without authorization. Although authors have strong legal protection for their copyrighted works, they are now facing companies like OpenAI that seemingly disregard these laws, according to Joseph Saveri and Matthew Butterick, the authors’ lawyers.

Proving that the authors suffered financial losses specifically due to ChatGPT’s training on their copyrighted material may be challenging. Even if the books were ingested, ChatGPT might perform similarly without them, since it is trained on a wide range of internet content, including users’ discussions of the novels, Guadamuz explained. The lawyers also highlight OpenAI’s increasing secrecy regarding its training data. They note that early iterations of ChatGPT referred to a dataset called Books2, estimated to contain 294,000 titles, which they speculate must have been sourced from shadow libraries such as Library Genesis (LibGen) and Z-Library.

The outcome of this lawsuit will likely hinge on whether courts view the use of copyrighted material as fair use or unauthorized copying, according to legal experts Lilian Edwards and Guadamuz. They note that a similar case in the UK would be decided differently due to the absence of a fair use defense. In recent months, the publishing industry has been discussing how to protect authors from potential harms associated with AI technology. The Society of Authors (SoA) has published guidelines to help its members safeguard themselves and their work. The SoA’s chief executive, Nicola Solomon, expressed support for the authors’ lawsuit, explaining that the wholesale copying of authors’ work for training language models has long concerned the organization.

Richard Combes, head of rights and licensing at the Authors’ Licensing and Collecting Society (ALCS), believes that current regulations around AI are fragmented and struggling to keep up with technological advancements. He encourages policymakers to consider principles established by the ALCS that protect the value of human authorship. Saveri and Butterick anticipate that AI will eventually comply with copyright law, similar to what happened with digital music, TV, and movies. They believe future AI systems will be based on licensed data, with transparent sources.

The lawyers highlight the irony that so-called artificial intelligence tools depend entirely on human-created data, relying on human creativity. If these tools bankrupt human creators, they will ultimately bankrupt themselves.

Frequently Asked Questions (FAQs) Related to the Above News

What is the lawsuit against OpenAI about?

The lawsuit alleges that OpenAI illegally used copyrighted books written by Mona Awad and Paul Tremblay to train their artificial intelligence tool, ChatGPT, without permission.

What is ChatGPT?

ChatGPT is an artificial intelligence tool developed by OpenAI that responds to user commands with human-like text.

Why do the authors believe their works were used for training?

The authors claim that ChatGPT generated highly accurate summaries of their novels, leading them to believe that their works were ingested and used for training the AI model.

Is this the first lawsuit of its kind against ChatGPT?

Yes, according to legal expert Andres Guadamuz, this is the first copyright-related lawsuit against ChatGPT.

Why are books ideal for training language models?

Books are considered ideal because of their high-quality, well-edited prose, which makes them valuable training material for language models.

What are the authors seeking in this lawsuit?

The authors are seeking monetary damages on behalf of all US-based authors whose works were used without authorization to train ChatGPT.

Is it challenging to prove financial losses directly caused by ChatGPT's training on the copyrighted material?

Yes, it may be challenging, because ChatGPT is also trained on a wide range of internet information, including discussions about the novels, and might perform similarly without ingesting the books.

What is OpenAI's stance on training data availability?

According to the authors' lawyers, OpenAI has become increasingly secretive about its training data, which they cite as a concern.

What will determine the outcome of the lawsuit?

The outcome may depend on whether courts view the use of copyrighted material as fair use or unauthorized copying.

How has the publishing industry reacted to the lawsuit?

The publishing industry has been discussing measures to protect authors from potential harms of AI technology, and the Society of Authors has even published guidelines to aid authors in safeguarding their work.

What are the concerns raised by the Authors' Licensing and Collecting Society?

The organization believes that current regulations around AI are fragmented and struggling to keep up with technological advancements, calling for policymakers to consider principles that protect the value of human authorship.

Do the authors' lawyers believe AI will eventually comply with copyright law?

Yes, the lawyers believe that future AI systems will be based on licensed data with transparent sources, similar to what happened with digital music, TV, and movies.

Why do the lawyers highlight the irony of AI tools relying on human-created data?

The lawyers point out that if AI tools bankrupt human creators, they will ultimately bankrupt themselves, highlighting the dependence of AI on human creativity.

Please note that the FAQs provided on this page are based on the news article published. While we strive to provide accurate and up-to-date information, it is always recommended to consult relevant authorities or professionals before making any decisions or taking action based on the FAQs or the news article.

Aryan Sharma
Aryan is our dedicated writer and manager for the OpenAI category. With a deep passion for artificial intelligence and its transformative potential, Aryan brings a wealth of knowledge and insights to his articles. With a knack for breaking down complex concepts into easily digestible content, he keeps our readers informed and engaged.