Title: Sarah Silverman’s Lawsuit Against OpenAI Raises Concerns About AI Models and Copyright Infringement
Comedian Sarah Silverman has filed a lawsuit against OpenAI, the creator of the AI chatbot ChatGPT, accusing the company of copyright infringement. Silverman claims that the AI model copied her memoir, The Bedwetter, without her permission, credit, or compensation. The case has attracted attention because it could test the limits on how AI models may use copyrighted material.
ChatGPT’s ability to provide a detailed synopsis of Silverman’s book has raised questions about how it acquired its knowledge. Did it read and memorize a pirated copy or gather information from customer reviews and online discussions about the book? The answer to these questions may determine the course of the legal battle.
This lawsuit is part of a growing trend among writers who feel they have unwittingly contributed to the development of AI models. They argue that their works have been used, without their consent, as the foundation for generative AI products that are projected to contribute trillions of dollars to the global economy. The issue also raises ethical and legal questions about how these tools are built.
One of the lawyers representing Silverman and other authors described the use of books as data for AI models as “an open, dirty secret of the whole machine learning industry.” The allegation is that companies like OpenAI acquire book data from illicit sites known as shadow libraries, which host pirated works. OpenAI has refrained from commenting on the allegations, as have other companies facing similar lawsuits, including Meta, the parent company of Facebook and Instagram.
Legal experts believe that this case may be challenging for writers to win, drawing a parallel with Google’s successful defense in a copyright infringement case involving its online book library. In 2016, the U.S. Supreme Court let stand lower court rulings that found Google’s digitization of books and limited display of excerpts did not amount to copyright infringement.
Despite the challenges, concerns about the AI-building practices of tech companies have gained traction within the literary and author communities. Several prominent authors, such as Nora Roberts, Margaret Atwood, Louise Erdrich, and Jodi Picoult, signed an open letter accusing companies like OpenAI, Google, Microsoft, and Meta of exploitative practices. They contend that these AI developers are mimicking and regurgitating their language, style, and ideas without compensating the authors whose writings serve as the AI systems’ foundation.
The heart of the matter lies in the data used to train large language models like ChatGPT. Books have proven especially valuable in this regard, as OpenAI itself acknowledged in a 2018 paper cited in Silverman’s lawsuit. The focus on books stems from their long-form, well-edited, and coherent content, which is essential for developing high-quality language models.
While OpenAI and other AI developers have become more secretive about their data sources, the allegation that they rely on shadow libraries containing pirated content has drawn growing attention. Circumstantial evidence suggests that these libraries may have included the works of Sarah Silverman and other plaintiffs.
It remains to be seen how the case will unfold, but its progress could expose the sources of data used by tech companies and their AI models. The possibility of tech executives testifying under oath about their data sources looms on the horizon.
The authors behind the lawsuits are not demanding a complete overhaul of algorithms and training data, but they are advocating for fair compensation for the use of their writings. While some precedents exist for forcing companies to destroy ill-gotten AI data, finding a way to compensate writers for contributing to the development of AI models seems necessary. The debate surrounding AI and copyright infringement is likely to continue as these cases progress.