Title: OpenAI Faces Lawsuit for Allegedly Using Stolen Data to Train AI Models
OpenAI, the artificial intelligence research organization, is facing legal action brought by The Clarkson Law Firm for allegedly using stolen data to train its AI models. The complaint asserts that OpenAI’s models, including ChatGPT and DALL-E, have used private information belonging to millions of internet users, including minors, without their informed consent or knowledge.
To train its language models, OpenAI collected vast amounts of data from various sources on the internet, including personal information extracted from platforms like Twitter and Reddit. The law firm contends that OpenAI collected this data secretly and without adhering to the necessary regulations, stating that the organization failed to register as a data broker as mandated by applicable laws. OpenAI has also faced criticism for its data collection methods for ChatGPT, and for not giving users a clear way to opt out of the use of their personal conversations and information.
These concerns escalated to the point that Italy temporarily banned ChatGPT over inadequate user data protection measures, particularly for minors. The current lawsuit focuses primarily on OpenAI’s privacy policies regarding existing users, as well as the use of data collected from the internet without users’ knowledge or consent, specifically for ChatGPT.
The complaint alleges that while OpenAI has profited from this data through investments and subscriptions, it has not compensated the individuals whose data it used. The complaint comprises 15 counts, including privacy violations, insufficient protection of personal data, and the unauthorized acquisition of a significant volume of personal information for training purposes. Although datasets such as Common Crawl, Wikipedia, and Reddit may contain publicly accessible personal information, companies must abide by regulations when purchasing and using such data.
The allegation against OpenAI revolves around the unauthorized use of this data for ChatGPT, without obtaining explicit permission from the users. Despite personal information being publicly accessible on social media platforms, blogs, and articles, its utilization beyond the intended scope can be deemed a violation of privacy.
In Europe, the General Data Protection Regulation (GDPR) provides a clear distinction between publicly available data and data that can be freely used. However, this matter is still under debate in the United States.
Nader Henein, the Vice President of Privacy Research at Gartner, acknowledges the validity of the lawsuit’s sentiment, asserting that individuals should retain control over the usage of their data, even if it is publicly available. However, Henein expresses uncertainty regarding whether the US legal system will align with this perspective.
Ryan Clarkson, the Managing Partner of Clarkson Law Firm, emphasized the importance of taking immediate action within the framework of existing laws, rather than waiting for the government to enact new regulations. “As a society, the price we would all pay is far too steep,” Clarkson said. “We cannot afford to pay the cost of negative outcomes with AI like we’ve done with social media, or like we did with nuclear.”
As the legal battle between OpenAI and The Clarkson Law Firm unfolds, it underscores the significance of data privacy and the ethical responsibilities incumbent upon organizations utilizing AI technology. With the outcome of this lawsuit potentially shaping future regulations, the implications for the AI industry as a whole are far-reaching.