Good research, great examples, but OpenAI is not too excited about showing what data their models are trained on, so enforcing a policy change is the way to go.
A few days after it was disclosed, ChatGPT is now flagging the divergence attack that Google DeepMind (and others) found. Simply put, if you asked ChatGPT to repeat a word forever, it would eventually spew out the data that the model was trained on. And as the researchers showed, that data was accurate word-for-word on the Web.
This attack vector specifically targets ChatGPT 3.5 Turbo. The policy change has been verified by myself, as I was able to replicate the flagging of the attempt.
However, there is no specific mention in the content policy that explicitly prohibits this attack. Some argue that Section 2 Usage Requirements applies, as it restricts users from attempting to discover the underlying components of the model.
OpenAI has not responded to queries regarding this matter, as they have not engaged with any of my previous inquiries over the past year.
The primary reason for flagging this attack vector as a violation is the potential for lawsuits against OpenAI. The ability of the divergence attack to extract exact training data could lead to copyright infringement cases if someone were to extract and misuse valuable data.
Additionally, concerns exist regarding the extraction of Personally Identifiable Information (PII). The original researchers found numerous instances of personal information in their testing samples, raising potential privacy concerns and violating regulations such as the General Data Protection Regulation (GDPR) in the EU.
Interestingly, someone on Reddit discovered this trick four months prior, but it went relatively unnoticed. OpenAI likely considers this matter resolved.
It is crucial for OpenAI to address these issues and enforce policy changes to protect themselves from potential legal consequences. By ensuring transparency and preventing the unauthorized extraction of data, they can maintain trust and compliance with privacy regulations.
Ultimately, OpenAI must prioritize the security of user data and protect themselves from legal implications arising from the misuse of their models. As the landscape of AI continues to evolve, it is imperative for organizations like OpenAI to proactively address vulnerabilities and safeguard user privacy.
Note: This news article was written in response to a given set of details and does not represent the expressed views or opinions of any particular news agency or journalist.