Meta Uses Public Facebook and Instagram Posts to Train AI Assistant: Privacy Concerns Arise
Meta, formerly known as Facebook, has come under scrutiny after disclosing that it used public posts from Facebook and Instagram to train parts of its new AI virtual assistant. The company says it excluded users' private posts and messages shared with friends and family from its training data, but the disclosure has still raised privacy concerns.
In an interview with Reuters at Meta's Connect conference, Nick Clegg, the company's president of global affairs, said Meta had tried to exclude datasets containing personal information. He added that the vast majority of the training data was publicly available.
Last week, Meta launched a beta version of Meta AI, an advanced conversational assistant available on WhatsApp, Messenger, and Instagram. The assistant will also be built into the upcoming Ray-Ban Meta smart glasses and the Quest 3 headset. Currently available only in the US, Meta AI can provide real-time information and generate realistic images from text prompts.
Meta AI is powered by the LLaMA 2 language model, released in July, and the Emu text-to-image model. Both models were trained using publicly available Facebook and Instagram posts. Clegg cited LinkedIn as an example of a site whose content Meta chose not to use because of privacy concerns.
Generative AI has sparked ongoing debate over copyright in the content used to train large language models (LLMs). Artists and authors have filed copyright lawsuits against several AI companies this year, raising the question of whether training AI on copyrighted creative works qualifies as fair use. Clegg expects litigation over this question to increase.
Meta is not the only company using user-generated content to train AI. Elon Musk's xAI also uses users' tweets, and Google recently updated its privacy policy to state that it may use publicly available information to train its AI models.
Meta CEO Mark Zuckerberg also recently announced AI chatbots featuring the likenesses of celebrities and influencers. The 28 bots set to launch include characters based on Tom Brady, MrBeast, Paris Hilton, Kendall Jenner, and Snoop Dogg. Like Meta AI, these chatbots will be powered by the LLaMA 2 language model.
While Meta's latest announcements showcase its technological progress, privacy and copyright concerns persist. As Meta and other companies continue to use public content to train AI, the debate over where AI, privacy, and intellectual property rights intersect is far from over.