Authors File Lawsuit Against OpenAI, Claiming Unauthorized Use of Their Books in Training ChatGPT
Two authors have taken legal action against OpenAI, asserting that their copyrighted books were employed in training the company’s AI chatbot, ChatGPT, without their permission. Paul Tremblay, author of The Cabin at the End of the World, and Mona Awad, author of Bunny and 13 Ways of Looking at a Fat Girl, allege that ChatGPT generates remarkably accurate summaries of their works, suggesting that it utilized their books during its training process, which would violate copyright law.
OpenAI did not immediately respond to CNBC’s request for comment, nor did lawyers representing Tremblay and Awad.
ChatGPT is a chatbot that generates text in response to written prompts. It was developed by OpenAI, a San Francisco-based research company led by Sam Altman and backed by Microsoft. The chatbot is trained on vast amounts of text. OpenAI has not publicly disclosed the specific data used, but has said that it generally consists of web content, archived books, and Wikipedia.
The lawsuit, filed in federal court in San Francisco, claims that a significant portion of OpenAI’s training data is derived from copyrighted materials, including works by Tremblay and Awad. However, it may be difficult to prove exactly how and where ChatGPT obtained this information, or to demonstrate that the authors suffered financial damages.
The complaint includes exhibits showcasing the summaries generated by ChatGPT. While acknowledging some inaccuracies, Awad and Tremblay maintain that the majority of the summaries are correct, indicating that ChatGPT retains knowledge of particular works in the training dataset.
According to the complaint, ChatGPT never reproduced any of the copyright management information that the authors included with their published works.