Title: ChatGPT Faces Scrutiny Amidst Big Data Scandal
Artificial intelligence (AI), particularly ChatGPT, has made significant strides in recent months, thanks to advances in large language models. These AI systems are designed to provide intelligent, articulate responses by ingesting vast amounts of text and generating new content based on learned parameters. However, the nature of the data used to train these systems has raised concerns, casting a shadow over their capabilities and raising potential privacy issues.
ChatGPT, released in November last year, relies on a staggering 175 billion parameters to function effectively. But the source and composition of the data used to train these models remain obscured. Although certain datasets and databases have been disclosed, their contents are not fully known. It is unclear whether personal blog posts or social media content were included in the training data, making it difficult to determine the origins of the information fed into these AI systems.
This lack of transparency has led to regulatory action in various countries. In Italy, ChatGPT was suspended in March due to concerns over potential data protection violations. Canadian regulators initiated an investigation into OpenAI, the organization behind ChatGPT, for its data collection and usage practices. The Federal Trade Commission (FTC) in the United States has also launched an investigation into potential harm caused to consumers and alleged privacy breaches by OpenAI. The Ibero-American Data Protection Network (RIPD), which includes 16 data authorities from 12 countries, is conducting its own investigation into OpenAI’s practices.
In Brazil, concerns have been raised about the use of personal data by AI models. Luca Belli, a professor of law, has petitioned the National Data Protection Authority (ANPD) to address the issue. Belli argues that individuals have the right to know how their personal data is used by ChatGPT and whether there is consent or a legal basis for its use in training these AI models. The ANPD has yet to respond.
This lack of clarity regarding data sources and usage is reminiscent of the Cambridge Analytica scandal, in which data from millions of Facebook users was misused. Privacy and data protection experts have long raised concerns about data usage on large platforms, but effective action and regulation have been lacking.
The misuse of data by AI models such as ChatGPT could result not only in a privacy scandal but also in a copyright one. OpenAI is facing lawsuits from authors who claim their books were used to train ChatGPT without authorization. Visual artists are likewise concerned about their work being used in AI-powered image generators.
Amid these concerns, Google recently updated its terms of use to specify that publicly available online data can be used to train its AI systems. Critics argue, however, that greater transparency and adherence to contextual integrity are crucial: respecting individuals' privacy and copyright is vital when training AI models on public data.
As the scrutiny on AI giants intensifies, it is clear that transparency and accountability should not be compromised. Regulatory bodies must ensure that proper rules and regulations are in place to protect individuals’ privacy and prevent the misuse of data. By prioritizing transparency and context-sensitive data usage, the AI industry can establish trust and provide responsible AI solutions that benefit society as a whole.