The recent lawsuit against OpenAI has sparked a renewed debate about the legal and ethical implications of data scraping by tech companies. The lawsuit, filed in California, accuses OpenAI of collecting private information from millions of internet users, including children, without their consent. This has raised concerns about privacy violations and the potential misuse of personal data.
Data scraping, the practice of extracting information from websites, is commonly used by AI companies to train their algorithms. While web scraping can have benefits for society, such as enhancing business transparency and aiding academic research, it also comes with risks. These risks include cybersecurity threats and scammers using scraped data for fraudulent activities.
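At its simplest, data scraping is automated HTML parsing: a script fetches a page and pulls out the fields it wants. The following minimal Python sketch shows the core idea using only the standard library; the `ProfileScraper` class and the sample HTML are hypothetical, for illustration only, and a real scraper would fetch pages over HTTP and respect robots.txt and site terms of service.

```python
# Illustrative sketch of data scraping: extracting structured fields from
# raw HTML. A real scraper would fetch pages over HTTP (e.g. with urllib)
# and must respect robots.txt and a site's terms of service.
from html.parser import HTMLParser

class ProfileScraper(HTMLParser):
    """Collects the text of every <span class="name"> element."""
    def __init__(self):
        super().__init__()
        self.in_name = False
        self.names = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (attribute, value) pairs
        if tag == "span" and ("class", "name") in attrs:
            self.in_name = True

    def handle_data(self, data):
        if self.in_name:
            self.names.append(data.strip())

    def handle_endtag(self, tag):
        if tag == "span":
            self.in_name = False

# A made-up page standing in for content fetched from the web.
page = '<div><span class="name">Ada Lovelace</span><span class="name">Alan Turing</span></div>'
scraper = ProfileScraper()
scraper.feed(page)
print(scraper.names)  # → ['Ada Lovelace', 'Alan Turing']
```

Run at scale across millions of pages, this same extraction pattern is what assembles the large training datasets at issue in these lawsuits.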
The lawsuit against OpenAI joins a growing list of legal challenges against companies that repurpose or reuse images, personal information, code, and other data for their own purposes. Last year, coders sued GitHub, Microsoft, and OpenAI over a code-generating tool called Copilot, claiming a violation of licensing agreements. Getty Images also sued Stability AI for alleged copyright infringement involving millions of images.
The widespread use of data scraping technology by AI companies has intensified the scale of web scraping and the potential harms associated with it. Lee Tiedrich, a faculty fellow at Duke University, highlights the numerous legal issues arising from the extensive scraping of code and data by these companies, particularly in relation to privacy and personally identifiable information.
The recent lawsuit against OpenAI places a strong emphasis on privacy concerns. It alleges that the company collected personal data by scraping the web without users' knowledge or informed consent. Timothy Edgar, a professor at Brown University, argues that OpenAI's actions constitute a privacy violation and potentially an ethical and legal violation as well. Using data that people shared for one purpose for an entirely different purpose, without their consent, is seen as a significant privacy infringement.
AI companies’ utilization of scraped data to train their models raises concerns about the potential consequences of privacy violations. Information obtained through scraping could inadvertently surface in generated responses, leading to unforeseen outcomes for individuals whose privacy has been compromised. Additionally, reclaiming such violated privacy is a challenging task.
Megan Iorio, senior counsel at the Electronic Privacy Information Center, compares the situation to the difficulty of controlling personal information obtained by data brokers. She anticipates a similar scenario where individuals struggle to exercise control over their information as they attempt to hold companies accountable for collecting it.
Data scraping has a lengthy legal history in the United States, with cases reaching the Supreme Court. The hiQ Labs v. LinkedIn case, which the Supreme Court weighed in on in 2022, involved the scraping of LinkedIn profiles and sparked a debate over whether scraping violated the Computer Fraud and Abuse Act. The Supreme Court declined to treat scraping as equivalent to hacking and returned the case to a lower court for resolution. Another example is Clearview AI, a facial recognition company sued for privacy violations in Europe and Illinois. Clearview AI settled the ACLU's Illinois lawsuit by committing to cease selling its image database to private companies.
Now, Microsoft, the parent company of LinkedIn, finds itself on the opposite side of the courtroom as a defendant in three related lawsuits alongside OpenAI. Tiedrich believes that the lawsuit against OpenAI, with its wide-ranging arguments, was inevitable given the escalating concerns surrounding data scraping and code scraping.
While recent court cases have provided some clarity on fair use of copyrighted materials, the legality of data scraping itself remains ambiguous. Tiedrich explains that data scraping has existed for decades, pointing to cases from twenty years ago involving the scraping of airline fare information. The repercussions of the OpenAI lawsuit could therefore extend well beyond the AI sector.
Although the privacy arguments put forth in the OpenAI lawsuit may face challenges, the plaintiffs, as individuals rather than companies, are in a better position to demonstrate the harm caused. However, the limited scope of federal privacy laws complicates bringing a data scraping case on privacy grounds. The lawsuit references three privacy statutes, and only the Illinois privacy law covers the publicly available information of all users.
Without a comprehensive privacy law that addresses publicly available data without a blanket exemption, there is a risk that the United States could become a safe haven for malicious web scrapers.