The OpenAI Copyright Battles Have Commenced

The ongoing battle between copyright holders and OpenAI has taken a new turn as two novelists, Paul Tremblay and Mona Awad, have filed a lawsuit against the company in a federal court in San Francisco. The authors allege that OpenAI’s ChatGPT large language model was trained using data from their copyrighted books without their consent.

In the 16-page class action suit filed on June 28, Tremblay, the author of The Cabin at the End of the World, and Awad, the author of Bunny and 13 Ways of Looking at a Fat Girl, claim that ChatGPT is capable of generating highly accurate summaries of their literary works when prompted. They argue that this level of accuracy is only possible if the model was trained on the content of their books, which would be a violation of federal copyright law. The authors assert that OpenAI stands to profit commercially from the use of their copyrighted materials.

This lawsuit represents the first copyright-related legal claim against OpenAI, but it is unlikely to be the last. Intellectual property scholar Andres Guadamuz of the University of Sussex commented that this case could set a precedent for future claims.

The authors’ complaint references a 2018 paper in which OpenAI revealed that its GPT-1 model was trained on BookCorpus, a collection of over 7,000 unique unpublished books spanning various genres. In a subsequent 2020 paper introducing GPT-3, OpenAI disclosed that 15% of its training dataset came from Books1 and Books2, two internet-based book corpora that together comprise over 350,000 books.

Since the launch of ChatGPT, OpenAI has not publicly disclosed the specific data used to train the model or its source. The company stated in its 2020 paper that most of the training data was scraped from the web, including archived books and Wikipedia.

The lawsuit by Tremblay and Awad highlights the emerging battle between copyright holders and AI companies that use copyrighted materials to train their models. It also adds to the growing demands for damages caused by the unauthorized use of copyrighted works, raising questions about how to prove financial losses in such cases.

In previous instances, visual artists filed suits against the companies behind AI engines for using their artwork without permission, and music creators emphasized the need to protect their copyrights from generative AI systems. This latest lawsuit from Tremblay and Awad further pushes regulators and courts to define the rules surrounding copyright and AI. Regulators and courts may in turn require AI companies to disclose the sources and methods of their training data, allowing for greater transparency in these systems.

As this legal battle unfolds, it will shape the future landscape of copyrights in the context of AI. The outcome may have significant implications for generative AI companies, potentially opening up the once-opaque workings of these systems for public scrutiny.

Frequently Asked Questions (FAQs) Related to the Above News

What is the lawsuit between copyright holders and OpenAI about?

The lawsuit involves two novelists, Paul Tremblay and Mona Awad, who have filed a lawsuit against OpenAI, alleging that the company's ChatGPT large language model was trained using data from their copyrighted books without their consent.

What do Tremblay and Awad claim in their lawsuit?

They claim that ChatGPT is capable of generating highly accurate summaries of their literary works, suggesting that the model was trained using the content of their books, which would be a violation of federal copyright law. They also argue that OpenAI stands to profit commercially from the use of their copyrighted materials.

How significant is this lawsuit in the battle between copyright holders and AI companies?

This lawsuit represents the first copyright-related legal claim against OpenAI and could potentially set a precedent for future claims. It highlights the ongoing battle between copyright holders and AI companies that use copyrighted materials to train their models.

What evidence do the authors provide to support their claim?

The authors reference a 2018 paper in which OpenAI revealed that its GPT-1 model was trained on a collection of unpublished books called BookCorpus. They also mention a 2020 paper in which OpenAI disclosed that 15% of the training dataset for GPT-3 came from internet-based book corpora.

What are the implications of this lawsuit for copyright and AI?

The lawsuit adds to the growing demands for damages caused by the unauthorized use of copyrighted works in AI models. It may prompt regulators and courts to require AI companies to disclose the sources and methods of their training data, promoting greater transparency in these systems and potentially shaping the future landscape of copyrights in the context of AI.

Aryan Sharma